@ankurgitks

  • Set temperature=0.0 for deterministic responses
  • Add SCENARIO_INDEX override for pinning specific test scenarios
  • Remove manual agent spans to prevent duplicate evaluation metrics
  • Update run_agent_script.sh for configurable intervals and container support
  • Add README_multi_agent.md with setup and usage instructions
  • Add deploy/ directory with Dockerfile.alpha and otel-collector-alpha.yaml
  • Support editable install workflow matching manual setup process
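For illustration, the SCENARIO_INDEX override could work roughly as sketched below. When the variable is set, the harness always runs that scenario; otherwise it cycles through the scenarios in order. The function name `select_scenario`, the cycling fallback, and the range check are assumptions made for this sketch, not the PR's actual code. Fully repeatable output also depends on the temperature=0.0 setting listed above, which is not shown here.

```python
import os
from typing import Mapping, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_scenario(
    scenarios: Sequence[T],
    iteration: int,
    env: Optional[Mapping[str, str]] = None,
) -> T:
    """Hypothetical helper: honor SCENARIO_INDEX if set, else rotate by iteration."""
    env = os.environ if env is None else env
    pinned = env.get("SCENARIO_INDEX", "").strip()
    if pinned:
        index = int(pinned)
        # Fail loudly instead of silently wrapping an out-of-range pin.
        if not 0 <= index < len(scenarios):
            raise ValueError(
                f"SCENARIO_INDEX={index} is out of range 0..{len(scenarios) - 1}"
            )
        return scenarios[index]
    # No override: step through the scenarios deterministically.
    return scenarios[iteration % len(scenarios)]
```

Pinning keeps a single scenario under test across loop iterations, while the unpinned path still covers every scenario in a predictable order.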

ankurgitks requested review from a team as code owners November 10, 2025 11:12
@github-actions


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a Pull Request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


ankurs seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@zhirafovod
Contributor

@ankurgitks, let's move this test app into the instrumentation-genai/opentelemetry-instrumentation-langchain/examples folder, e.g. multi-agent-trigger or multi-agent-qse. See the example in instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/

OPENAI_API_KEY=sk-YOUR_API_KEY
OPENAI_API_KEY=
#Circuit
CLIENT_ID=
Contributor

We can remove it if we don't use it for now.

Author

These are mandatory fields: to connect to the AI model we need the token and the API key. At present we are using a personal token, which can't be used here.

"""
Two-Agent Application with Deliberate Metric Triggers for Evaluation Testing
This application deliberately generates responses that trigger evaluation metrics:
Contributor

Can we also cover the sentiment metric?

Author

updated

Contributor

zhirafovod left a comment

Added a few comments; please update the PR.

@ankurgitks
Author

  1. Directory Restructuring
  • Moved from manual/ to multi-agent-qse/ directory per reviewer request
  • Copied all necessary files including .env, deploy/, README, and scripts
  • Updated all references and documentation to reflect new location
  2. Architecture Improvements
  • Replaced manual agent spans with LangGraph StateGraph workflow
  • Implemented proper workflow nodes: content_generator → formatter
  • Added HTTP request simulation with POST /qse/evaluate server span
  • Removed global agent.* resource attributes
  • Now using service-level attributes: service.name, service.version, service.namespace
  3. Sentiment Metric Coverage
  • Added explicit sentiment analysis in multiple scenarios:
    Scenario 0: Bias + Sentiment (prejudicial language with negative tone)
    Scenario 2: Sentiment + Toxicity (hostile, dismissive responses)
    Scenario 4: Comprehensive (all metrics including sentiment)
  • Updated documentation to highlight sentiment coverage
  • Created sentiment-specific test examples and validation
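One standard way to supply the service-level attributes named above (service.name, service.version, service.namespace) is the OTEL_RESOURCE_ATTRIBUTES environment variable, which the OpenTelemetry SDK reads when it builds its default Resource. The sketch below only builds that variable's comma-separated key=value string, percent-encoding the values as the specification requires. Whether the harness sets the attributes this way is an assumption, and the helper name and attribute values are hypothetical.

```python
from typing import Mapping
from urllib.parse import quote


def build_resource_attributes(attrs: Mapping[str, str]) -> str:
    """Hypothetical helper: encode attrs in OTEL_RESOURCE_ATTRIBUTES form."""
    # Values are percent-encoded so spaces, commas and '=' cannot break parsing.
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in attrs.items())


if __name__ == "__main__":
    # Hypothetical values: the PR names these keys but not their values.
    print(
        build_resource_attributes(
            {
                "service.name": "multi-agent-qse",
                "service.version": "0.1.0",
                "service.namespace": "qse",
            }
        )
    )
```

Exporting the printed string as OTEL_RESOURCE_ATTRIBUTES before the application starts keeps these attributes at the resource level, rather than repeating agent-specific attributes on every span.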

Complete Coverage of All 5 Metrics:

gen_ai.evaluation.toxicity: Triggered by toxic language patterns
gen_ai.evaluation.bias: Triggered by prejudicial statements
gen_ai.evaluation.hallucination: Triggered by false information
gen_ai.evaluation.relevance: Triggered by off-topic responses
gen_ai.evaluation.sentiment: Explicitly triggered in scenarios 0, 2, 4
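A minimal sketch of how a test might check that a run emitted the metrics its scenario is meant to trigger. Only scenarios 0, 2 and 4 appear, because those are the only scenario-to-metric mappings stated in this PR. The names `SCENARIO_METRICS`, `scenarios_for` and `missing_metrics` are hypothetical, and collecting the observed metric names from exported telemetry is assumed to happen elsewhere.

```python
from typing import Iterable, List, Set

# The five evaluation metrics listed above.
ALL_METRICS = frozenset(
    {
        "gen_ai.evaluation.toxicity",
        "gen_ai.evaluation.bias",
        "gen_ai.evaluation.hallucination",
        "gen_ai.evaluation.relevance",
        "gen_ai.evaluation.sentiment",
    }
)

# Only the mappings described in the PR; scenarios 1 and 3 are not specified there.
SCENARIO_METRICS = {
    0: {"gen_ai.evaluation.bias", "gen_ai.evaluation.sentiment"},
    2: {"gen_ai.evaluation.sentiment", "gen_ai.evaluation.toxicity"},
    4: set(ALL_METRICS),  # comprehensive scenario
}


def scenarios_for(metric: str) -> List[int]:
    """Return the scenario indices expected to trigger the given metric."""
    return sorted(i for i, metrics in SCENARIO_METRICS.items() if metric in metrics)


def missing_metrics(scenario_index: int, observed: Iterable[str]) -> Set[str]:
    """Return expected metrics that were not observed (KeyError if unmapped)."""
    return SCENARIO_METRICS[scenario_index] - set(observed)
```

An empty result from `missing_metrics` means the run produced every metric its scenario targets; anything left over points at the metric that failed to fire.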

ankurgitks force-pushed the QSE-4732-deterministic-multi-agent-harness branch from b5716b6 to 545c3e0 November 11, 2025 14:46
- Created 4 test applications (LangChain, LangGraph, Traceloop, Direct AI)
- Added automation scripts (setup.sh, run_tests.sh with loop mode)
- Configured multi-realm support (lab0, rc0, us1)
- Added pytest fixtures, mocks, and test data
- Documented test plan and execution checklist
- All instrumentation methods covered (zero-code, code-based, direct)
- All evaluation metrics configured (bias, toxicity, hallucination, relevance, sentiment)
- Change LangChainInstrumentor to LangchainInstrumentor in entry point
- Fixes AttributeError in zero-code instrumentation
- Resolves: 'module has no attribute LangChainInstrumentor' error
…h masked secrets)

- Add comprehensive zero-code vs manual instrumentation section for LangGraph
- Add Docker and Kubernetes deployment instructions
- Add complete environment variables reference table (30+ vars)
- Add dependencies section with version requirements
- Document DeepEval/Traceloop dependency conflicts
- Update run_tests.sh to support both zero-code and manual modes
- Fix syntax error in .env templates (quote EVALS_EVALUATORS)
- Mask all sensitive API keys and tokens in templates
- Meets customer documentation requirements (TC-1.1, TC-2.2, TC-2.3)
ankurgitks force-pushed the QSE-4732-deterministic-multi-agent-harness branch from 545c3e0 to 24df98a November 11, 2025 19:48
@ankurgitks
Author

Created a new PR #72

Closing this one.

ankurgitks closed this Nov 13, 2025
ankurgitks deleted the QSE-4732-deterministic-multi-agent-harness branch November 13, 2025 16:17
github-actions bot locked and limited conversation to collaborators Nov 13, 2025