Switch to OTel metrics #348

christos68k · 2025-02-11T15:58:19Z

Summary

This PR adds OTel metrics instrumentation and cleans up the relevant reporter interfaces. My initial implementation kept the MetricsReporter abstraction, but I then decided to remove all of the previous metric reporting logic (which was not OTel-compliant) to simplify the code and keep the OTel metrics instrumentation in a single place.

This is an initial attempt at introducing OTel instrumentation. Deeper architectural restructuring is possible, but I think this is a good first step. Note that I'm not introducing a meter provider for OTLP metrics: if the agent is running as an OTel collector receiver, the expectation is that the global meter provider configured by the OTel collector will be used. If the agent is running standalone and reporting via OTLP, we could introduce a meter provider and an OTLP metrics exporter in a follow-up PR (assuming we think this is needed).

Reviewing commit-by-commit might be easier. The last commit, which will be removed before merging, adds a meter provider and a stdout exporter for testing. We'd follow a similar route if we wanted to add an OTLP metrics exporter to the OTLP reporter.

TODO:

~~Exporter example for testing (stdout)~~
~~Mark unused metrics as obsolete~~

christos68k · 2025-02-11T23:01:15Z

metrics/metrics.go

 		metricTypes[md.ID] = md.Type
+		switch typ := md.Type; typ {
+		case MetricTypeCounter:
+			counter, err := meter.Int64Counter(md.Name,


I'm using the "name" field from metrics.json here; another option would be to use the "field" field, which is "namespaced" and may be easier to parse. Both options abide by the OTel instrument naming requirements.
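To check a candidate name against those requirements, here is a stdlib-only sketch of the instrument name syntax from the OTel metrics API specification as I understand current versions of it: an ASCII letter followed by up to 254 letters, digits, '_', '.', '-' or '/'. The example names are hypothetical and not taken from metrics.json.

```go
package main

import (
	"fmt"
	"regexp"
)

// instrumentNameRe encodes the OTel instrument name syntax (assumed from the
// spec): a leading ASCII letter, then 0..254 of [A-Za-z0-9_./-].
var instrumentNameRe = regexp.MustCompile(`^[A-Za-z][A-Za-z0-9_./-]{0,254}$`)

// validInstrumentName reports whether name is usable as an OTel instrument name.
func validInstrumentName(name string) bool {
	return instrumentNameRe.MatchString(name)
}

func main() {
	// Hypothetical names, one per style discussed above plus two invalid ones.
	for _, name := range []string{
		"UnwindCallInterpreter",            // flat, "name"-style: valid
		"unwinder.native.call_interpreter", // dotted, "field"-style: valid
		"1bad",                             // starts with a digit: invalid
		"has space",                        // contains a space: invalid
	} {
		fmt.Println(name, validInstrumentName(name))
	}
}
```

Both styles pass, so the choice between them is about readability and grouping in backends rather than validity.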

christos68k · 2025-02-11T23:15:28Z

I pushed #350, which should fix the ARM64 test failures.

christos68k · 2025-02-11T23:22:58Z

metrics/metrics.go

-	ids := make([]uint32, nMetrics)
-	values := make([]int64, nMetrics)
-
+	ctx := context.Background()
 	for i := 0; i < nMetrics; i++ {


I measured the elapsed time for this loop: it's in the tens-of-microseconds range, so performance is not an issue.

florianl · 2025-02-12T09:59:34Z

metrics/metrics.go

@@ -37,44 +41,62 @@ var (

 	// Used in fallback checks, e.g. to avoid sending "counters" with 0 values
 	metricTypes map[MetricID]MetricType
+
+	// OTel metric instrumentation
+	meter    = otel.Meter("go.opentelemetry.io/ebpf-profiler")


Should there be a way to disable this via configuration?

christos68k · 2025-02-12

Configuration normally takes place at the meter provider level. If the agent runs inside the OTel collector, the meter provider is part of the collector; if we introduce a meter provider and an OTLP metrics exporter in the otlp_reporter, that is where we'd add the configuration.

If a meter provider is not configured, every metering operation is a NOP.

florianl · 2025-02-12T10:06:27Z

metrics/metrics.go

-	ids := make([]uint32, nMetrics)
-	values := make([]int64, nMetrics)
-
+	ctx := context.Background()


To be able to cancel the operation and not get blocked, it might be best to propagate the context to the function and use it.

Suggested change

ctx := context.Background()

func report(ctx context.Context)

christos68k · 2025-02-12

If we add a context parameter, it will propagate out of report and into every Add/AddSlice operation we do in the agent, which is what I was trying to avoid as it could get ugly (e.g. ProcessManager is caching AddSlice). Based on the initial research I did, it didn't seem possible for a metering operation to block (that would also be incompatible with metering being performant enough to be inserted in hot loops), so propagating a context for this reason (to avoid blocking) seems unnecessary. I'll dig some more.

florianl · 2025-02-12T10:09:20Z

metrics/metrics.go

-
-	if reporterImpl != nil {
-		reporterImpl.ReportMetrics(uint32(prevTimestamp), ids, values)
+		metric := metricsBuffer[i]


Could we simplify this loop and just do a for _, metric := range metricsBuffer { instead?

christos68k · 2025-02-12

metricsBuffer is statically sized for the worst case (IDMax), and the number of metrics it actually contains is given by nMetrics, so we can't iterate over the entire slice (we could check against Metric{} or Metric.ID == 0 during the iteration, but I think the current approach is cleaner).

rockdaboot · 2025-02-21

I roughly agree with Florian here.
for _, metric := range metricsBuffer[0:nMetrics] { would do it. Avoiding i here increases readability: otherwise, someone reading the code for the first time will look for uses of i in the loop body, which can be avoided.
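A minimal sketch of the two loop shapes discussed above, assuming a statically sized buffer whose first nMetrics entries are populated. Metric and idMax are hypothetical stand-ins for the agent's types and IDMax. Slicing with buf[:nMetrics] only creates a new slice header and does not copy the buffer, so the range form costs nothing extra.

```go
package main

import "fmt"

// Metric is a hypothetical stand-in for a buffered measurement.
type Metric struct {
	ID    uint32
	Value int64
}

// idMax is a hypothetical worst-case capacity, playing the role of IDMax.
const idMax = 8

// sumIndexed mirrors the loop in the PR: an index counts up to nMetrics.
func sumIndexed(buf []Metric, nMetrics int) (total int64) {
	for i := 0; i < nMetrics; i++ {
		metric := buf[i]
		total += metric.Value
	}
	return total
}

// sumRanged is the suggested form: range over the populated prefix only,
// so there is no index variable left for the reader to track.
func sumRanged(buf []Metric, nMetrics int) (total int64) {
	for _, metric := range buf[:nMetrics] {
		total += metric.Value
	}
	return total
}

func main() {
	var metricsBuffer [idMax]Metric // statically sized; most slots stay zero
	metricsBuffer[0] = Metric{ID: 3, Value: 5}
	metricsBuffer[1] = Metric{ID: 7, Value: 2}
	nMetrics := 2

	fmt.Println(sumIndexed(metricsBuffer[:], nMetrics), sumRanged(metricsBuffer[:], nMetrics)) // 7 7
	fmt.Println(len(metricsBuffer[:nMetrics]), "of", len(metricsBuffer), "slots in use")        // 2 of 8 slots in use
}
```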

christos68k requested review from a team as code owners February 11, 2025 15:58

christos68k marked this pull request as draft February 11, 2025 16:12

christos68k force-pushed the ck/otel-metrics branch from 97880a5 to 2cccf3e February 11, 2025 16:19

christos68k changed the title ~~WIP: Switch to OTel metrics~~ Switch to OTel metrics Feb 11, 2025

christos68k marked this pull request as ready for review February 11, 2025 22:40

christos68k force-pushed the ck/otel-metrics branch 2 times, most recently from f7d1475 to e8bf800 February 11, 2025 22:50

christos68k commented Feb 11, 2025

christos68k self-assigned this Feb 11, 2025

christos68k requested a review from dmathieu February 11, 2025 23:15

christos68k force-pushed the ck/otel-metrics branch from e8bf800 to 0e6e85e February 11, 2025 23:19

christos68k commented Feb 11, 2025

florianl reviewed Feb 12, 2025

christos68k added 7 commits February 19, 2025 09:28

metrics: Simplify GetDefinitions

dbf818d

Treat missing metric definitions as a hard error

Remove ReportMetricsInterval as it is unused

4a2fd6c

Remove MetricsReporter interface

785ab93

Will switch to OTel metrics, MetricsReporter is no longer needed.

Add OTel metrics

9ea4488

Remove Reporter.GetMetrics

dfdb3fb

No longer used

Remove RPC stats handler

c34b41b

No longer used

Mark unused metrics obsolete

d8e1a00

christos68k force-pushed the ck/otel-metrics branch from 900e7e9 to 1917970 February 19, 2025 14:29

christos68k requested review from florianl and rockdaboot February 19, 2025 14:53

florianl reviewed Feb 21, 2025

reporter/otlp_reporter.go (outdated, resolved)

Add version reporting

804d8db

christos68k force-pushed the ck/otel-metrics branch from 3eb37e9 to 6164846 February 21, 2025 14:47

Add new licenses

0adf32a

christos68k force-pushed the ck/otel-metrics branch from 6164846 to 0adf32a February 21, 2025 15:04

dmathieu approved these changes Feb 21, 2025

florianl approved these changes Feb 21, 2025

rockdaboot approved these changes Feb 23, 2025
