
End to end trace duration metric #37597

Open
decimalst opened this issue Jan 30, 2025 · 3 comments

Comments

@decimalst

Component(s)

connector/spanmetrics

Is your feature request related to a problem? Please describe.

Hi, I have a scenario with a trace that contains three spans from three different services. In terms of flow, service A always produces the root span for the trace, service B then does some work on the request, and service C finalizes the request and publishes it.
I want to calculate two metrics, one of which is the end-to-end duration of the trace, defined as the time between the start of the root span from service A and the end of the span from service C.

Describe the solution you'd like

Ideally, the ability to iterate through spans grouped by trace ID, and then output the end-to-end duration of the trace, defined as the time between the start of the root span from service A and the end of the span from service C.
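A minimal sketch of the requested behaviour, in Python for brevity, assuming all spans of a trace are already available in one place. The `Span` dataclass, its field names, the `end_to_end_durations` function and the `"service-c"` service name are hypothetical illustrations, not the spanmetrics connector's API. The root span is taken to be the span with no parent, and the end of the trace is the latest end time among spans emitted by the final service.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Hypothetical, minimal stand-in for an OpenTelemetry span."""
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]  # None marks the root span
    service: str
    start_ns: int  # start timestamp in nanoseconds
    end_ns: int    # end timestamp in nanoseconds


def end_to_end_durations(spans, final_service="service-c"):
    """Return {trace_id: duration_ns}, measured from the root span's start
    to the latest end of any span emitted by final_service."""
    by_trace = defaultdict(list)
    for span in spans:
        by_trace[span.trace_id].append(span)

    durations = {}
    for trace_id, trace_spans in by_trace.items():
        roots = [s for s in trace_spans if s.parent_span_id is None]
        finals = [s for s in trace_spans if s.service == final_service]
        if not roots or not finals:
            continue  # incomplete trace: root or final span not seen (yet)
        durations[trace_id] = max(s.end_ns for s in finals) - roots[0].start_ns
    return durations


if __name__ == "__main__":
    example = [
        Span("t1", "a", None, "service-a", 0, 100),
        Span("t1", "b", "a", "service-b", 50, 300),
        Span("t1", "c", "b", "service-c", 250, 900),
    ]
    print(end_to_end_durations(example))
```

Traces missing either the root span or a service C span are skipped, which is exactly the situation where not all spans of the trace have arrived yet.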

Describe alternatives you've considered

Perhaps the signaltometrics connector could handle this instead: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/connector/signaltometricsconnector ?
Alternatively, this might be achievable with a transform processor operating on the spanmetrics durations.

Additional context

No response

decimalst added the enhancement and needs triage labels on Jan 30, 2025
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@decimalst
Author

/label connector/signaltometricsconnector

@jamesmoessis
Contributor

To clarify, does your example trace contain an asynchronous operation? That is, the root span A does not encompass the whole transaction, and span C ends after span A does. Otherwise you could just use the duration of span A as the trace duration.
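Whether a trace is the asynchronous case can be checked by comparing the root span's bounds with the earliest start and latest end across all of its spans. This is an illustrative sketch only: the `Span` fields and `trace_duration_ns` are hypothetical, and it assumes the input holds the spans of exactly one trace with exactly one root span.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Hypothetical, minimal span with only the fields this check needs."""
    span_id: str
    parent_span_id: Optional[str]  # None marks the root span
    start_ns: int
    end_ns: int


def trace_duration_ns(spans):
    """Return (trace duration, root_covers_trace) for one trace's spans.

    If root_covers_trace is True, the root span's own duration already
    equals the trace duration; if False, some span starts before or ends
    after the root, so all spans are needed to compute the duration.
    """
    root = next(s for s in spans if s.parent_span_id is None)
    first_start = min(s.start_ns for s in spans)
    last_end = max(s.end_ns for s in spans)
    root_covers = root.start_ns <= first_start and root.end_ns >= last_end
    return last_end - first_start, root_covers


if __name__ == "__main__":
    sync_trace = [Span("a", None, 0, 1000), Span("b", "a", 100, 900)]
    print(trace_duration_ns(sync_trace))
```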

The problem you are describing is non-trivial because it requires the same machine to have access to all the spans of a trace at the same time. It makes the system stateful, and if you are running more than one collector, it would require using the loadbalancingexporter to shard by trace ID into a second layer of collectors; otherwise, spans from the same trace could end up in different collectors. You also need a decent amount of memory to cache these calculations, plus all the wonderful things that come with that, like how to deploy or restart without losing state.
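Sharding by trace ID is what makes the second layer work: every span carrying a given trace ID must land on the same collector. The loadbalancingexporter is documented as routing with a consistent hashing ring; the sketch below is a simplified, hypothetical ring (SHA-256 keys, 100 virtual nodes per backend, placeholder backend names) meant only to show that property, not the exporter's actual implementation.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring mapping trace IDs to collector backends."""

    def __init__(self, backends, vnodes=100):
        # Each backend is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (_hash(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def backend_for(self, trace_id: str) -> str:
        # Walk clockwise to the first ring point after the trace ID's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect(self._keys, _hash(trace_id)) % len(self._keys)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = HashRing(["collector-1:4317", "collector-2:4317", "collector-3:4317"])
    # Every span of the same trace is routed to the same second-layer collector.
    print(ring.backend_for("4bf92f3577b34da6a3ce929d0e0e4736"))
```

When a backend is added, a trace ID either keeps its old backend or moves to the new one, so only a fraction of in-flight traces get split across collectors during a scale-out.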

There is a processor that can calculate trace duration, the tail sampling processor, but it does so for the purposes of sampling, not span metrics.
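The memory and state cost can be illustrated with a buffer that holds spans per trace for a fixed wait window after the first span arrives, then emits the trace duration (latest span end minus earliest span start) and frees that trace's state. This is only loosely similar in spirit to how the tail sampling processor waits before deciding on a trace; the class, its `wait_ns` parameter and its eviction policy are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical, minimal span: trace ID plus start/end timestamps."""
    trace_id: str
    start_ns: int
    end_ns: int


class TraceDurationBuffer:
    """Hypothetical in-memory buffer that holds spans until a wait window elapses."""

    def __init__(self, wait_ns: int):
        self.wait_ns = wait_ns
        self._spans = defaultdict(list)  # trace_id -> buffered spans
        self._first_seen = {}            # trace_id -> arrival time of first span

    def add(self, span: Span, now_ns: int) -> None:
        # Start the wait window when the first span of a trace arrives.
        self._first_seen.setdefault(span.trace_id, now_ns)
        self._spans[span.trace_id].append(span)

    def flush_expired(self, now_ns: int):
        """Return [(trace_id, duration_ns)] for traces whose window has
        elapsed, and drop their buffered spans to free memory."""
        done = [t for t, seen in self._first_seen.items()
                if now_ns - seen >= self.wait_ns]
        results = []
        for trace_id in done:
            spans = self._spans.pop(trace_id)
            del self._first_seen[trace_id]
            duration = max(s.end_ns for s in spans) - min(s.start_ns for s in spans)
            results.append((trace_id, duration))
        return results
```

Any span that arrives after its trace has been flushed starts a new, partial entry, which is the trade-off between wait time, memory use and accuracy described in the previous comment.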

I don't work on or own the span metrics connector, though, so I'll wait for code owner input. Perhaps there has been some prior discussion on this, but I couldn't find any.

jamesmoessis added the waiting-for-code-owners label and removed the needs triage label on Jan 30, 2025