Switch to OTel metrics #348

Open
wants to merge 9 commits into
base: main
Conversation

christos68k
Member

@christos68k christos68k commented Feb 11, 2025

Summary

This PR adds OTel metrics instrumentation and cleans up the relevant reporter interfaces. My initial implementation kept the MetricsReporter abstraction but I then decided to remove all the previous metric reporting logic (that was not OTel-compliant) in order to simplify the code and keep OTel metrics instrumentation in a single place.

This is an initial attempt at introducing OTel instrumentation. Deeper architectural restructuring is possible, but I think this is a good first step. Note that I'm not introducing a meter provider for OTLP metrics: if the agent is running as an OTel collector receiver, the expectation is that the global meter provider configured by the OTel collector will be used. If the agent is running standalone and reporting via OTLP, we could introduce a meter provider and an OTLP metrics exporter in a follow-up PR (assuming we think this is needed).

Reviewing commit-by-commit might be easier. The last commit contains a meter provider and a stdout exporter for testing; it will be removed before merging. We'd follow a similar route if we wanted to add an OTLP metrics exporter to the OTLP reporter.

TODO:

  • Exporter example for testing (stdout)
  • Mark unused metrics as obsolete

@christos68k christos68k requested review from a team as code owners February 11, 2025 15:58
@christos68k christos68k marked this pull request as draft February 11, 2025 16:12
@christos68k christos68k changed the title WIP: Switch to OTel metrics Switch to OTel metrics Feb 11, 2025
@christos68k christos68k marked this pull request as ready for review February 11, 2025 22:40
@christos68k christos68k force-pushed the ck/otel-metrics branch 2 times, most recently from f7d1475 to e8bf800 Compare February 11, 2025 22:50
metricTypes[md.ID] = md.Type
switch typ := md.Type; typ {
case MetricTypeCounter:
counter, err := meter.Int64Counter(md.Name,
Member Author

I'm using the "name" field from metrics.json here. Another option would be to use the "field" entry, which is "namespaced" and maybe easier to parse. Both options abide by the OTel instrument naming requirements.
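
For reference, a minimal sketch of what checking a name against the OTel instrument naming requirements could look like. It assumes the rule as I understand it from the current metrics API specification (ASCII, starts with a letter, then letters, digits, '_', '.', '-' or '/', at most 255 characters). The helper validInstrumentName and the sample names are hypothetical and not taken from metrics.json.

```go
package main

import (
	"fmt"
	"regexp"
)

// instrumentName encodes the assumed naming rule: one leading ASCII letter
// followed by up to 254 letters, digits, '_', '.', '/' or '-'.
var instrumentName = regexp.MustCompile(`^[A-Za-z][A-Za-z0-9_./-]{0,254}$`)

// validInstrumentName reports whether name satisfies the assumed rule.
func validInstrumentName(name string) bool {
	return instrumentName.MatchString(name)
}

func main() {
	// Hypothetical names in the two styles discussed above.
	for _, name := range []string{"UnwindNativeAttempts", "agent.unwind.native_attempts", "1bad", ""} {
		fmt.Printf("%q valid: %v\n", name, validInstrumentName(name))
	}
}
```

Both a CamelCase name and a dotted, namespaced name pass this check, so the choice between the two is about readability and parsing rather than compliance.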

@christos68k
Member Author

christos68k commented Feb 11, 2025

I pushed #350 which should fix the ARM64 test failures.

ids := make([]uint32, nMetrics)
values := make([]int64, nMetrics)

ctx := context.Background()
for i := 0; i < nMetrics; i++ {
Member Author

I measured elapsed time for this loop and it's in the tens-of-microseconds range, which means that performance is not an issue.
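
A rough sketch of how such a measurement can be taken with wall-clock timing. The helper timeLoop and the metric count of 512 are made up for illustration; the loop body is a stand-in, not the actual reporting code.

```go
package main

import (
	"fmt"
	"time"
)

// timeLoop calls fn for i = 0..n-1 and returns the total wall-clock time.
func timeLoop(n int, fn func(i int)) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		fn(i)
	}
	return time.Since(start)
}

func main() {
	values := make([]int64, 512) // hypothetical number of metrics
	elapsed := timeLoop(len(values), func(i int) {
		values[i]++ // stand-in for recording one metric
	})
	fmt.Printf("loop over %d metrics took %s\n", len(values), elapsed)
}
```

A single run like this is noisy; a testing.B benchmark would give steadier numbers if the result ever gets close to mattering.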

@@ -37,44 +41,62 @@ var (

// Used in fallback checks, e.g. to avoid sending "counters" with 0 values
metricTypes map[MetricID]MetricType

// OTel metric instrumentation
meter = otel.Meter("go.opentelemetry.io/ebpf-profiler")
Contributor

Should there be a way so that this can be disabled by configuration?

Member Author

Configuration normally takes place at the meter provider level. The meter provider is either part of the OTel collector (if the agent runs inside the OTel collector) or, if we introduce a meter provider and an OTLP metrics exporter in the otlp_reporter, that is where we'd add it.

If a meter provider is not configured, every metering operation is a NOP.
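
To illustrate the idea, here is a stdlib-only sketch of the "no-op unless a provider is installed" pattern. It is not the OTel API: Counter, Provider, SetProvider, GetCounter and the in-memory provider are hypothetical names. Unlike the real global meter, this simplified version does not retroactively upgrade counters that were obtained before a provider was set.

```go
package main

import "fmt"

// Counter is a hypothetical instrument interface (not the OTel API).
type Counter interface{ Add(n int64) }

// Provider is a hypothetical factory for counters.
type Provider interface{ NewCounter(name string) Counter }

// noopCounter discards every measurement.
type noopCounter struct{}

func (noopCounter) Add(int64) {}

// noopProvider is the default: it only hands out no-op counters.
type noopProvider struct{}

func (noopProvider) NewCounter(string) Counter { return noopCounter{} }

// memCounter keeps a running total in memory.
type memCounter struct{ total int64 }

func (c *memCounter) Add(n int64) { c.total += n }

// memProvider hands out one memCounter per name.
type memProvider struct{ counters map[string]*memCounter }

func newMemProvider() *memProvider {
	return &memProvider{counters: map[string]*memCounter{}}
}

func (p *memProvider) NewCounter(name string) Counter {
	c, ok := p.counters[name]
	if !ok {
		c = &memCounter{}
		p.counters[name] = c
	}
	return c
}

// Total returns the accumulated value for name, or 0 if it was never created.
func (p *memProvider) Total(name string) int64 {
	if c, ok := p.counters[name]; ok {
		return c.total
	}
	return 0
}

// global starts out as the no-op provider, so instrumented code is always safe to call.
var global Provider = noopProvider{}

func SetProvider(p Provider)         { global = p }
func GetCounter(name string) Counter { return global.NewCounter(name) }

func main() {
	GetCounter("samples").Add(5) // discarded: no provider installed yet
	mp := newMemProvider()
	SetProvider(mp)
	GetCounter("samples").Add(3)
	fmt.Println(mp.Total("samples")) // prints 3
}
```

With this shape, enabling or disabling metrics is a decision made where the provider is installed, and the instrumented code does not need its own switch.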

ids := make([]uint32, nMetrics)
values := make([]int64, nMetrics)

ctx := context.Background()
Contributor

To be able to cancel the operation and not get blocked, it might be best to propagate the context to the function and use it.

Suggested change
- ctx := context.Background()
+ func report(ctx context.Context)

Member Author

@christos68k christos68k Feb 12, 2025

If we add context, it will propagate out of report and into every Add/AddSlice operation we do in the agent, which is what I was trying to avoid as it could get ugly (e.g. ProcessManager is caching AddSlice). Based on my initial research, it didn't seem possible for a metering operation to block (blocking would also be incompatible with metering being performant enough to be inserted in hot loops), so propagating context for this reason (to avoid blocking) would be unnecessary. I'll dig some more.


if reporterImpl != nil {
reporterImpl.ReportMetrics(uint32(prevTimestamp), ids, values)
metric := metricsBuffer[i]
Contributor

Could we simplify this loop and just do a for _, metric := range metricsBuffer { instead?

Member Author

@christos68k christos68k Feb 12, 2025

metricsBuffer is statically sized by worst-case (IDMax) and the number of metrics it actually contains is given by nMetrics, so we can't iterate over the entire slice (we could check against Metric{} or Metric.ID == 0 during the iteration but I think the current approach is cleaner).

Contributor

I roughly agree with Florian here.
for _, metric := range metricsBuffer[0:nMetrics] { would do it. Avoiding i here increases readability - otherwise when reading the code the first time, one tries to find if i is used in the loop body, which can be avoided.
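
A small self-contained sketch of that range-over-sub-slice form, assuming a buffer sized for the worst case with only the first n entries filled. The Metric fields and the sumFilled helper are hypothetical.

```go
package main

import "fmt"

// Metric is a simplified ID/value pair; the field types are assumptions.
type Metric struct {
	ID    uint32
	Value int64
}

// sumFilled adds up only the first n entries of a worst-case-sized buffer.
func sumFilled(buf []Metric, n int) int64 {
	var total int64
	for _, m := range buf[:n] { // slicing limits the range to the filled part
		total += m.Value
	}
	return total
}

func main() {
	buf := make([]Metric, 8) // stands in for a buffer sized by IDMax
	buf[0] = Metric{ID: 1, Value: 10}
	buf[1] = Metric{ID: 2, Value: 32}
	fmt.Println(sumFilled(buf, 2)) // prints 42
}
```

Slicing does not copy the underlying array, so this form costs the same as the indexed loop.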

4 participants