Enable OTel instrumentation and propagation#650

Closed
javiermtorres wants to merge 4 commits into main from javiermtorres-otel-propagation

Conversation

@javiermtorres
Contributor

This PR adds optional flags to enable OTel distributed tracing in the httpx/starlette packages.

@javiermtorres
Contributor Author

There are comments in a test showing the use of the new flags. Maybe these should instead be added to one of the cookbooks (or a new one be created).

@javiermtorres javiermtorres marked this pull request as ready for review July 18, 2025 08:46
@javiermtorres javiermtorres requested review from a team and dpoulopoulos July 18, 2025 08:47
@njbrake
Contributor

njbrake commented Jul 18, 2025

Can you provide some more information about why this is required? I don't completely follow why the existing guidance from https://mozilla-ai.github.io/any-agent/tracing/#adding-an-opentelemetry-exporter isn't sufficient.

@javiermtorres
Contributor Author

The key point here is context propagation. Without instrumenting both the (outgoing) httpx library and the (incoming) starlette library, the traces of agents on either side of an A2A boundary will carry different trace ids. With this change, the trace id is carried in the traceparent header (following the W3C Trace Context specification), so the spans of both the caller and the callee agent can be grouped together in the OpenTelemetry collector, as shown here:
[Image: spans from the caller and callee agents grouped under one trace in the OpenTelemetry collector]

Otherwise, the spans would not be related and we wouldn't be able to get, for example, cost information related to the complete scenario. Note that the first invoke_agent in the picture would belong to the main agent in the test, while the last invoke_agent in the picture would belong to the A2A-served date agent.
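
To make the propagation step concrete, here is a minimal stdlib-only sketch of what the traceparent header does. It is not the code in this PR and does not use the OpenTelemetry libraries; the helper names `inject` and `extract` are hypothetical, and the sketch only accepts version `00` of the header, which is a simplification of the W3C rules.

```python
import re
import secrets

# traceparent layout (version 00): 00-<32 hex trace id>-<16 hex parent span id>-<2 hex flags>
TRACEPARENT_PATTERN = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def inject(headers: dict, trace_id: str, span_id: str, sampled: bool = True) -> None:
    """Caller side: write the current trace context into the outgoing headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(headers: dict):
    """Callee side: return (trace_id, parent_span_id), or None if absent or invalid."""
    match = TRACEPARENT_PATTERN.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_span_id, _flags = match.groups()
    # All-zero ids are invalid according to the W3C Trace Context specification.
    if set(trace_id) == {"0"} or set(parent_span_id) == {"0"}:
        return None
    return trace_id, parent_span_id


# Caller agent: starts a trace and sends a request with the header attached.
caller_trace_id = secrets.token_hex(16)  # 32 hex characters
caller_span_id = secrets.token_hex(8)    # 16 hex characters
outgoing_headers = {}
inject(outgoing_headers, caller_trace_id, caller_span_id)

# Callee agent: continues the caller's trace instead of starting a new one.
context = extract(outgoing_headers)
if context is None:
    callee_trace_id, callee_parent_span_id = secrets.token_hex(16), None
else:
    callee_trace_id, callee_parent_span_id = context

print(callee_trace_id == caller_trace_id)
```

When the header is missing, the callee falls into the `None` branch and starts its own trace, which is the unrelated-spans situation described above.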

@njbrake
Contributor

njbrake commented Jul 18, 2025

@javiermtorres thanks for the clarification. Are there any alternatives? Like, could we have some mechanism to pass a context id between a server and client?

I don't have a handle on what options are available to solve the problem.

@javiermtorres
Contributor Author

The basic info on distributed tracing can be found here: https://opentelemetry.io/docs/concepts/context-propagation/
As linked there, the main header we use is traceparent: https://www.w3.org/TR/trace-context/#traceparent-header
Afaict this issue is usually solved at the transport/session level rather than at the application level.

This solution is pretty standard: we only need to add the appropriate instrumentation packages, instrument a couple of objects, and we're done. It's optional and disabled by default, because the casual user won't need to correlate traces across A2A-distributed agents: they can always use https://mozilla-ai.github.io/any-agent/agents/tools/#callables and gather everything locally. But in a distributed setting, this will help.

Maybe I should add a small piece about how to instrument an httpx client, so that an A2A-served agent can be traced correctly when it is called through the A2A tool.
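
As a rough illustration of "instrument a couple of objects", the sketch below wraps the two instrumentation calls in one opt-in helper. It assumes the `opentelemetry-instrumentation-httpx` and `opentelemetry-instrumentation-starlette` packages are installed; the helper name `enable_otel_propagation` is hypothetical and is not the flag or API introduced by this PR. A tracer provider and exporter still have to be configured separately.

```python
def enable_otel_propagation(app=None, client=None):
    """Hypothetical opt-in helper: instrument only the objects that are passed in.

    Imports are done lazily so the OpenTelemetry instrumentation packages stay
    optional and nothing is instrumented by default.
    """
    instrumented = []
    if client is not None:
        # Outgoing side: adds the traceparent header to requests sent by this client.
        from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor

        HTTPXClientInstrumentor.instrument_client(client)
        instrumented.append("httpx")
    if app is not None:
        # Incoming side: reads traceparent from requests handled by this Starlette app.
        from opentelemetry.instrumentation.starlette import StarletteInstrumentor

        StarletteInstrumentor.instrument_app(app)
        instrumented.append("starlette")
    return instrumented
```

On the calling agent this would be invoked with the `httpx.AsyncClient` used for A2A requests, and on the serving agent with the Starlette app; calling it with no arguments does nothing and returns an empty list.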

@github-actions
Contributor

This PR is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 3 days.

@github-actions github-actions bot added the Stale label Jul 30, 2025
@javiermtorres
Contributor Author

This will be closed, since the user is expected to provide their own tracing facilities. A follow-up PR will provide the hooks a user would need to instrument the library appropriately.

@daavoo daavoo deleted the javiermtorres-otel-propagation branch August 26, 2025 08:53
2 participants