-
Notifications
You must be signed in to change notification settings - Fork 298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch to OTel metrics #348
base: main
Are you sure you want to change the base?
Conversation
97880a5
to
2cccf3e
Compare
f7d1475
to
e8bf800
Compare
metricTypes[md.ID] = md.Type | ||
switch typ := md.Type; typ { | ||
case MetricTypeCounter: | ||
counter, err := meter.Int64Counter(md.Name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using the name
field from metrics.json
here, another option would be to use field
which is "namespaced" and maybe easier to parse. Both options abide by the OTel instrument naming requirements.
I pushed #350 which should fix the ARM64 test failures. |
e8bf800
to
0e6e85e
Compare
ids := make([]uint32, nMetrics) | ||
values := make([]int64, nMetrics) | ||
|
||
ctx := context.Background() | ||
for i := 0; i < nMetrics; i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I measured elapsed time for this loop and it's in the tens-of-microseconds range, which means that performance is not an issue.
metrics/metrics.go
Outdated
@@ -37,44 +41,62 @@ var ( | |||
|
|||
// Used in fallback checks, e.g. to avoid sending "counters" with 0 values | |||
metricTypes map[MetricID]MetricType | |||
|
|||
// OTel metric instrumentation | |||
meter = otel.Meter("go.opentelemetry.io/ebpf-profiler") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a way so that this can be disabled by configuration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Configuration normally takes place at the meter provider level which is either part of OTel collector (if agent runs inside OTel collector) or if we introduce a meter provider and OTLP metrics exporter in the otlp_reporter
this is where we'd add it.
If a meter provider is not configured, every metering operation is a NOP.
ids := make([]uint32, nMetrics) | ||
values := make([]int64, nMetrics) | ||
|
||
ctx := context.Background() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be able to cancel the operation and not get blocked, it might be best to propagate the context to the function and us it.
ctx := context.Background() | |
func report(ctx context.Context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we add context
it will propagate out of report
and into every Add/AddSlice
operation we do in the agent, which is what I was trying to avoid as it could get ugly (e.g. ProcessManager
is caching AddSlice
). Based on the initial research that I did, it didn't seem possible that a metering operation would block (this is also incompatible with metering being perfomant enough to be inserted in hot loops) so that would make propagating context
for this reason - avoid blocking - unnecessary. I'll dig some more.
|
||
if reporterImpl != nil { | ||
reporterImpl.ReportMetrics(uint32(prevTimestamp), ids, values) | ||
metric := metricsBuffer[i] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we simplify this loop and just do a for _, metric := range metricsBuffer {
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I roughly agree with Florian here.
for _, metric := range metricsBuffer[0:nMetrics] {
would do it. Avoiding i
here increases readability - otherwise when reading the code the first time, one tries to find if i
is used in the loop body, which can be avoided.
Treat missing metric definitions as a hard error
Will switch to OTel metrics, MetricsReporter is no longer needed.
No longer used
No longer used
900e7e9
to
1917970
Compare
3eb37e9
to
6164846
Compare
6164846
to
0adf32a
Compare
Summary
This PR adds OTel metrics instrumentation and cleans up the relevant
reporter
interfaces. My initial implementation kept theMetricsReporter
abstraction but I then decided to remove all the previous metric reporting logic (that was not OTel-compliant) in order to simplify the code and keep OTel metrics instrumentation in a single place.This is an initial attempt at introducing OTel instrumentation, deeper architectural restructuring could be made but I think this is a good first step. Note that I'm not introducing a meter provider for OTLP metrics: if the agent is running as an OTel collector receiver, the expectation is that the global meter provider that the OTel collector configures will be used. If the agent is running standalone and reporting via OTLP, we could introduce a meter provider and an OTLP metrics exporter in a follow-up PR (assuming we think this is needed).
Review commit-by-commit might be easier. The last commit will be removed before merging and contains a meter provider, stdout exporter for testing. We'd follow a similar route if we wanted to add an OTLP metrics exporter to the OTLP reporter.
TODO:
Exporter example for testing (stdout)Mark unused metrics as obsolete