QSE-4732 deterministic multi agent harness #47
ankurgitks commented Nov 10, 2025
- Set temperature=0.0 for deterministic responses
- Add SCENARIO_INDEX override for pinning specific test scenarios
- Remove manual agent spans to prevent duplicate evaluation metrics
- Update run_agent_script.sh for configurable intervals and container support
- Add README_multi_agent.md with setup and usage instructions
- Add deploy/ directory with Dockerfile.alpha and otel-collector-alpha.yaml
- Support editable install workflow matching manual setup process
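Here is a minimal sketch of how a SCENARIO_INDEX override could pin a scenario. It assumes the harness keeps its scenarios in an indexed list. The scenario names, the seeded fallback pick, and `pick_scenario` itself are hypothetical. The PR only states that SCENARIO_INDEX overrides the scenario choice and that temperature is set to 0.0.

```python
import os
import random
from typing import Mapping, Optional, Sequence

# Hypothetical scenario list; the harness's real scenarios are not shown in this PR.
SCENARIOS = [
    "toxicity_trigger",
    "bias_trigger",
    "hallucination_trigger",
    "relevance_trigger",
    "sentiment_trigger",
]

# Assumed default seed so that unpinned runs are still reproducible.
DEFAULT_SEED = 42


def pick_scenario(
    scenarios: Sequence[str],
    env: Optional[Mapping[str, str]] = None,
    seed: int = DEFAULT_SEED,
) -> str:
    """Return the scenario pinned by SCENARIO_INDEX, else a seeded pick."""
    env = os.environ if env is None else env
    raw = env.get("SCENARIO_INDEX", "").strip()
    if raw:
        index = int(raw)  # raises ValueError for non-numeric input
        if not 0 <= index < len(scenarios):
            raise IndexError(
                f"SCENARIO_INDEX={index} out of range 0..{len(scenarios) - 1}"
            )
        return scenarios[index]
    # No override: a private, seeded RNG gives the same choice on every run.
    return random.Random(seed).choice(scenarios)


# The model side of the change (temperature=0.0) would be passed to the chat
# client, e.g. langchain_openai.ChatOpenAI(temperature=0.0); not executed here.
```

The environment mapping is passed explicitly so the function stays testable. Setting `temperature=0.0` reduces sampling variation, but hosted models don't promise identical outputs. Pinning the scenario as well helps make a run repeatable.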
I have read the CLA Document and I hereby sign the CLA. ankurs seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
@ankurgitks, let's move this test app into …
OPENAI_API_KEY=sk-YOUR_API_KEY
OPENAI_API_KEY=
#Circuit
CLIENT_ID=
We can remove it if we don't use it for now.
These are mandatory fields: to connect to the AI model we need the token and the API key. At present we are using a personal token, which can't be used here.
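Because these variables are mandatory, a fail-fast check at startup gives a clearer error than a failed model call. This sketch assumes the app reads `OPENAI_API_KEY` and `CLIENT_ID` from the environment. `require_env` is a hypothetical helper, not code from this PR.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# Names taken from the .env diff above; how the app actually reads them is assumed.
REQUIRED_VARS = ("OPENAI_API_KEY", "CLIENT_ID")


def require_env(
    names: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return the required variables, failing fast if any is missing or blank."""
    env = os.environ if env is None else env
    names = tuple(names)  # allow generators: the names are iterated twice below
    missing = [name for name in names if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in names}
```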
...pentelemetry-instrumentation-langchain/examples/manual/multi-agent-openai-metrics-trigger.py
"""
Two-Agent Application with Deliberate Metric Triggers for Evaluation Testing
This application deliberately generates responses that trigger evaluation metrics:
Can we also cover the sentiment metric?
updated
...pentelemetry-instrumentation-langchain/examples/manual/multi-agent-openai-metrics-trigger.py
zhirafovod left a comment
Added a few comments; please update the PR.
Complete coverage of all 5 metrics:
- gen_ai.evaluation.toxicity: triggered by toxic language patterns
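A coverage check over the trigger scenarios is one way to keep the "all 5 metrics" claim accurate. This sketch assumes every metric uses the `gen_ai.evaluation.` prefix. Only the toxicity name appears in this thread. The other four metrics are the ones listed in this PR's commits. The scenario names and `uncovered_metrics` are hypothetical.

```python
from typing import Dict, List, Set

# Assumed metric names: only gen_ai.evaluation.toxicity is confirmed in the thread.
EXPECTED_METRICS = {
    "gen_ai.evaluation.bias",
    "gen_ai.evaluation.toxicity",
    "gen_ai.evaluation.hallucination",
    "gen_ai.evaluation.relevance",
    "gen_ai.evaluation.sentiment",
}

# Hypothetical mapping from trigger scenario to the metrics it should fire.
SCENARIO_METRICS: Dict[str, Set[str]] = {
    "toxic_reply": {"gen_ai.evaluation.toxicity", "gen_ai.evaluation.sentiment"},
    "biased_reply": {"gen_ai.evaluation.bias"},
    "made_up_fact": {"gen_ai.evaluation.hallucination"},
    "off_topic_reply": {"gen_ai.evaluation.relevance"},
}


def uncovered_metrics(scenarios: Dict[str, Set[str]], expected: Set[str]) -> List[str]:
    """Return expected metrics that no scenario triggers, sorted for stable output."""
    covered: Set[str] = set().union(*scenarios.values())
    return sorted(expected - covered)
```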
b5716b6 to 545c3e0
- Created 4 test applications (LangChain, LangGraph, Traceloop, Direct AI)
- Added automation scripts (setup.sh, run_tests.sh with loop mode)
- Configured multi-realm support (lab0, rc0, us1)
- Added pytest fixtures, mocks, and test data
- Documented test plan and execution checklist
- All instrumentation methods covered (zero-code, code-based, direct)
- All evaluation metrics configured (bias, toxicity, hallucination, relevance, sentiment)
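The four apps and three realms listed in this commit form a 12-combination run matrix. This sketch shows how a loop-mode runner might enumerate that matrix. It assumes a hypothetical `REALM` variable that narrows the run to one realm. The app identifiers and `build_run_matrix` are illustrative and are not taken from `run_tests.sh`.

```python
import itertools
import os
from typing import List, Mapping, Optional, Tuple

# Apps and realms as listed in the commit message (identifiers are illustrative).
APPS = ("langchain", "langgraph", "traceloop", "direct_ai")
REALMS = ("lab0", "rc0", "us1")


def build_run_matrix(env: Optional[Mapping[str, str]] = None) -> List[Tuple[str, str]]:
    """Return (app, realm) pairs, optionally narrowed by a hypothetical REALM variable."""
    env = os.environ if env is None else env
    realm = env.get("REALM", "").strip()
    if realm and realm not in REALMS:
        raise ValueError(f"Unsupported realm {realm!r}; expected one of {REALMS}")
    realms = (realm,) if realm else REALMS
    return list(itertools.product(APPS, realms))
```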
- Change LangChainInstrumentor to LangchainInstrumentor in entry point
- Fixes AttributeError in zero-code instrumentation
- Resolves: 'module has no attribute LangChainInstrumentor' error
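The AttributeError fixed here comes from case-sensitive attribute lookup when an entry point is loaded. The sketch assumes the standard OpenTelemetry Python mechanism. In that mechanism, zero-code instrumentation loads entry points in the `opentelemetry_instrumentor` group, and each value has the form `module:ClassName`. The sketch reproduces the failure using the stdlib `json` module in place of the instrumentation package, so it runs anywhere.

```python
from importlib.metadata import EntryPoint

# Sketch: json stands in for the instrumentation package and JSONDecoder for the
# instrumentor class. load() imports the module named before ":" and then looks
# up the attribute with getattr, which is case-sensitive.
correct = EntryPoint(name="demo", value="json:JSONDecoder", group="demo_group")
wrong_case = EntryPoint(name="demo", value="json:JsonDecoder", group="demo_group")

decoder_cls = correct.load()  # returns the json.JSONDecoder class

try:
    wrong_case.load()
    wrong_case_failed = False
except AttributeError as exc:
    # Same failure mode as "module has no attribute LangChainInstrumentor".
    wrong_case_failed = True
    print(f"AttributeError: {exc}")
```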
…h masked secrets)
- Add comprehensive zero-code vs manual instrumentation section for LangGraph
- Add Docker and Kubernetes deployment instructions
- Add complete environment variables reference table (30+ vars)
- Add dependencies section with version requirements
- Document DeepEval/Traceloop dependency conflicts
- Update run_tests.sh to support both zero-code and manual modes
- Fix syntax error in .env templates (quote EVALS_EVALUATORS)
- Mask all sensitive API keys and tokens in templates
- Meets customer documentation requirements (TC-1.1, TC-2.2, TC-2.3)
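Quoting matters if a script sources the `.env` file with a shell, where an unquoted value containing spaces or parentheses breaks. The sketch uses `shlex.split`, which performs POSIX-style word splitting and quote removal. It does not reproduce the shell's syntax error for unescaped parentheses, but it does show an unquoted value splitting into several words. The `EVALS_EVALUATORS` value below is hypothetical because the real one isn't in this PR. `parse_env` is an illustrative helper.

```python
import shlex
from typing import Dict, Iterable

# Hypothetical value; the actual EVALS_EVALUATORS string is not shown in this PR.
UNQUOTED = "EVALS_EVALUATORS=deepeval(LLMInvocation(bias, toxicity))"
QUOTED = 'EVALS_EVALUATORS="deepeval(LLMInvocation(bias, toxicity))"'


def parse_env(lines: Iterable[str]) -> Dict[str, str]:
    """Parse KEY=VALUE lines with shell-style quoting; skip blanks and comments."""
    result: Dict[str, str] = {}
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        tokens = shlex.split(stripped)
        # An unquoted value with spaces splits into several words here.
        if len(tokens) != 1 or "=" not in tokens[0]:
            raise ValueError(f"Unparseable .env line (quote the value?): {line!r}")
        key, value = tokens[0].split("=", 1)
        result[key] = value
    return result
```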
545c3e0 to 24df98a
Created a new PR, #72. Closing this one.