QSE-4732 deterministic multi agent harness #47

ankurgitks · 2025-11-10T11:12:35Z

Set temperature=0.0 for deterministic responses
Add SCENARIO_INDEX override for pinning specific test scenarios
Remove manual agent spans to prevent duplicate evaluation metrics
Update run_agent_script.sh for configurable intervals and container support
Add README_multi_agent.md with setup and usage instructions
Add deploy/ directory with Dockerfile.alpha and otel-collector-alpha.yaml
Support editable install workflow matching manual setup process
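For reviewers, a minimal sketch of how the `SCENARIO_INDEX` override and the `temperature=0.0` setting could fit together. The scenario names, the helper `pick_scenario_index`, and the count of five scenarios are assumptions for illustration, not the code in this PR. Note that a temperature of 0.0 reduces run-to-run variation but may not remove it entirely with every provider.

```python
import os
import random

# Hypothetical scenario names; the real scenarios live in the trigger script.
SCENARIOS = ["scenario_0", "scenario_1", "scenario_2", "scenario_3", "scenario_4"]

# Sampling settings handed to the chat model; 0.0 asks for the least random output.
MODEL_KWARGS = {"temperature": 0.0}


def pick_scenario_index(env=None, rng=None):
    """Return the index pinned by SCENARIO_INDEX, or a random index if it is unset."""
    env = os.environ if env is None else env
    raw = env.get("SCENARIO_INDEX", "").strip()
    if raw:
        index = int(raw)  # raises ValueError on non-numeric input
        if not 0 <= index < len(SCENARIOS):
            raise ValueError(
                f"SCENARIO_INDEX must be between 0 and {len(SCENARIOS) - 1}, got {index}"
            )
        return index
    rng = rng or random.Random()
    return rng.randrange(len(SCENARIOS))
```

Running with `SCENARIO_INDEX=2` would then always select the third scenario, while leaving the variable unset keeps the random rotation.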

github-actions · 2025-11-10T11:12:47Z

Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a Pull Request comment in the format below.

I have read the CLA Document and I hereby sign the CLA

ankurs seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

zhirafovod · 2025-11-10T15:27:55Z

@ankurgitks, let's move this test app into the instrumentation-genai/opentelemetry-instrumentation-langchain/examples folder, e.g. as multi-agent-trigger or multi-agent-qse. See the example in instrumentation-genai/opentelemetry-instrumentation-langchain/examples/multi_agent_travel_planner/

zhirafovod · 2025-11-10T15:29:28Z

instrumentation-genai/opentelemetry-instrumentation-langchain/examples/manual/.env

-OPENAI_API_KEY=sk-YOUR_API_KEY
+OPENAI_API_KEY=
+#Circuit
+CLIENT_ID=


We can remove it if we don't use it for now.

These are mandatory fields: to connect to the AI model we need the token and the API key. At present we are using a personal token, which can't be used here.

zhirafovod · 2025-11-10T15:32:42Z

...pentelemetry-instrumentation-langchain/examples/manual/multi-agent-openai-metrics-trigger.py

+"""
+Two-Agent Application with Deliberate Metric Triggers for Evaluation Testing
+
+This application deliberately generates responses that trigger evaluation metrics:


Can we also cover the sentiment metric?

zhirafovod

Added a few comments; please update the PR.

ankurgitks · 2025-11-10T16:18:33Z

Directory Restructuring

Moved from manual/ to multi-agent-qse/ directory per reviewer request
Copied all necessary files including .env, deploy/, README, and scripts
Updated all references and documentation to reflect new location

Architecture Improvements

Replaced manual agent spans with LangGraph StateGraph workflow
Implemented proper workflow nodes: content_generator → formatter
Added HTTP request simulation with POST /qse/evaluate server span
Removed global agent.* resource attributes
Now using service-level attributes: service.name, service.version, service.namespace
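One way to supply the service-level attributes without touching application code is the standard `OTEL_RESOURCE_ATTRIBUTES` environment variable, which takes comma-separated `key=value` pairs. The sketch below builds that string; the attribute values and the helper name `build_resource_attributes` are placeholders chosen for illustration, not values taken from this PR.

```python
from urllib.parse import quote


def build_resource_attributes(attrs):
    """Join attributes as key=value pairs, percent-encoding each value."""
    return ",".join(
        f"{key}={quote(str(value), safe='')}" for key, value in attrs.items()
    )


# Placeholder values; substitute the real service identity.
SERVICE_ATTRS = {
    "service.name": "multi-agent-qse",
    "service.version": "0.1.0",
    "service.namespace": "qse",
}

# Value to export as OTEL_RESOURCE_ATTRIBUTES before starting the app.
env_value = build_resource_attributes(SERVICE_ATTRS)
```

Percent-encoding the values keeps a stray `,` or `=` inside a value from being read as a separator.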

Sentiment Metric Coverage

Added explicit sentiment analysis in multiple scenarios:
Scenario 0: Bias + Sentiment (prejudicial language with negative tone)
Scenario 2: Sentiment + Toxicity (hostile, dismissive responses)
Scenario 4: Comprehensive (all metrics including sentiment)
Updated documentation to highlight sentiment coverage
Created sentiment-specific test examples and validation

Complete Coverage of All 5 Metrics:

gen_ai.evaluation.toxicity: Triggered by toxic language patterns
gen_ai.evaluation.bias: Triggered by prejudicial statements
gen_ai.evaluation.hallucination: Triggered by false information
gen_ai.evaluation.relevance: Triggered by off-topic responses
gen_ai.evaluation.sentiment: Explicitly triggered in scenarios 0, 2, 4
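The coverage claim above can be checked mechanically. The sketch below encodes only the three scenarios described in this comment (0, 2 and 4) and verifies that together they reach all five metrics; scenarios 1 and 3 are left out because their metrics are not stated here. The names `SCENARIO_METRICS`, `uncovered_metrics`, and `scenarios_triggering` are illustrative and not part of the PR.

```python
PREFIX = "gen_ai.evaluation."

ALL_METRICS = {
    PREFIX + name
    for name in ("toxicity", "bias", "hallucination", "relevance", "sentiment")
}

# Only the scenarios described in this comment; others are intentionally omitted.
SCENARIO_METRICS = {
    0: {PREFIX + "bias", PREFIX + "sentiment"},
    2: {PREFIX + "sentiment", PREFIX + "toxicity"},
    4: set(ALL_METRICS),  # the comprehensive scenario triggers every metric
}


def uncovered_metrics(scenario_metrics):
    """Return the metrics that no listed scenario triggers."""
    covered = set().union(*scenario_metrics.values())
    return ALL_METRICS - covered


def scenarios_triggering(metric, scenario_metrics=SCENARIO_METRICS):
    """Return the sorted scenario indices that trigger the given metric."""
    return sorted(i for i, metrics in scenario_metrics.items() if metric in metrics)
```

A check like this could run in CI so that removing or editing a scenario cannot silently drop a metric.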

- Created 4 test applications (LangChain, LangGraph, Traceloop, Direct AI)
- Added automation scripts (setup.sh, run_tests.sh with loop mode)
- Configured multi-realm support (lab0, rc0, us1)
- Added pytest fixtures, mocks, and test data
- Documented test plan and execution checklist
- All instrumentation methods covered (zero-code, code-based, direct)
- All evaluation metrics configured (bias, toxicity, hallucination, relevance, sentiment)

- Change LangChainInstrumentor to LangchainInstrumentor in entry point
- Fixes AttributeError in zero-code instrumentation
- Resolves: 'module has no attribute LangChainInstrumentor' error
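To illustrate why a one-letter case difference breaks zero-code instrumentation: an entry point value of the form `module:attribute` is resolved with a case-sensitive attribute lookup on the imported module. The sketch below reproduces the failure with a stand-in module; the module name `fake_langchain_instrumentation` is hypothetical, and the group name `opentelemetry_instrumentor` is assumed to be the one the auto-instrumentation loader scans.

```python
import sys
import types
from importlib.metadata import EntryPoint


class LangchainInstrumentor:
    """Stand-in for the real instrumentor class."""


# Register a throwaway module so the entry point has something to import.
fake_module = types.ModuleType("fake_langchain_instrumentation")
fake_module.LangchainInstrumentor = LangchainInstrumentor
sys.modules[fake_module.__name__] = fake_module


def try_load(attr_name):
    """Load '<module>:<attr_name>' and return the object, or the AttributeError raised."""
    entry_point = EntryPoint(
        name="langchain",
        value=f"{fake_module.__name__}:{attr_name}",
        group="opentelemetry_instrumentor",  # assumed group name
    )
    try:
        return entry_point.load()
    except AttributeError as exc:
        return exc
```

Here `try_load("LangchainInstrumentor")` returns the class, while `try_load("LangChainInstrumentor")` returns an `AttributeError` of the same kind as the one quoted in the commit message.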

…h masked secrets)
- Add comprehensive zero-code vs manual instrumentation section for LangGraph
- Add Docker and Kubernetes deployment instructions
- Add complete environment variables reference table (30+ vars)
- Add dependencies section with version requirements
- Document DeepEval/Traceloop dependency conflicts
- Update run_tests.sh to support both zero-code and manual modes
- Fix syntax error in .env templates (quote EVALS_EVALUATORS)
- Mask all sensitive API keys and tokens in templates
- Meets customer documentation requirements (TC-1.1, TC-2.2, TC-2.3)

ankurgitks · 2025-11-13T16:16:53Z

Created a new PR #72

Closing this one.

ankurgitks requested review from a team as code owners November 10, 2025 11:12

zhirafovod reviewed Nov 10, 2025

zhirafovod reviewed Nov 10, 2025

zhirafovod requested changes Nov 10, 2025

ankurgitks force-pushed the QSE-4732-deterministic-multi-agent-harness branch from b5716b6 to 545c3e0 Compare November 11, 2025 14:46

ankurgitks added 3 commits November 12, 2025 01:17

fix: Correct entry point case for LangchainInstrumentor

0ae1ac6

ankurgitks force-pushed the QSE-4732-deterministic-multi-agent-harness branch from 545c3e0 to 24df98a Compare November 11, 2025 19:48

ankurgitks closed this Nov 13, 2025

ankurgitks deleted the QSE-4732-deterministic-multi-agent-harness branch November 13, 2025 16:17

github-actions bot locked and limited conversation to collaborators Nov 13, 2025
