OTEL Prometheus Exporters Gap Evaluation and Recommendations #68

Open
erichsueh3 opened this issue Jan 19, 2022 · 0 comments

Description
This evaluation assessed whether the Prometheus pull exporters available in the OpenTelemetry (OTEL) Collector and language libraries are ready for metrics GA. The goals were to assess:

  1. the completeness of the OTEL specification for these exporters,
  2. whether the implemented pull exporters comply with the OTEL specification, and
  3. whether the existing pull exporters have complete unit test coverage.

Background
What are Prometheus pull exporters? The OTEL Collector collects metrics data and uses the Prometheus pull exporter to make those metrics available to Prometheus. The exporter exposes and maintains an HTTP server, and the Prometheus server scrapes the metrics data from it at a regular interval.

One use case is collecting metrics from an application that runs continuously with varying payloads. Because the metrics are published on an HTTP server, they can be scraped by querying that server's host name and port. CPU utilization, memory utilization, and the number of HTTP requests are some of the important metrics that can be used to monitor an application.
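
The pull model described above can be sketched with the Python standard library alone. This is only an illustration, not how any OTEL exporter is implemented: the metric names, the `METRICS` store, and the `render_exposition` and `start_server` helpers are all hypothetical, and the output is a simplified subset of the Prometheus text exposition format.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory store: metric name -> (Prometheus type, current value).
METRICS = {
    "http_requests_total": ("counter", 42.0),
    "cpu_utilization_ratio": ("gauge", 0.37),
}


def render_exposition(metrics):
    """Render metrics as simplified Prometheus text exposition lines."""
    lines = []
    for name, (kind, value) in sorted(metrics.items()):
        lines.append(f"# TYPE {name} {kind}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current metric values whenever a scraper requests /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_exposition(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the sketch quiet; a real exporter would log properly.
        pass


def start_server(host="127.0.0.1", port=0):
    """Start the scrape endpoint in a background thread (port 0 = any free port)."""
    server = HTTPServer((host, port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A Prometheus server configured with this host and port as a scrape target would then fetch `/metrics` at its scrape interval; the exporter itself never pushes data.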

Table 1: Prometheus pull exporter implemented across repositories

| Repositories | Prometheus Exporter | Do End to end (E2E) Tests Exist? |
| opentelemetry-java | Yes | Yes |
| opentelemetry-go | Yes | Yes |
| opentelemetry-js | Yes | Yes |
| opentelemetry-dotnet | Yes | Yes |
| opentelemetry-collector-contrib | Yes | Yes |
| opentelemetry-python | Yes* | N/A |
| opentelemetry-cpp | Yes* | Yes |
| opentelemetry-rust | Yes | N/A* |
| opentelemetry-php | Yes | N/A* |
| opentelemetry-swift | Yes | N/A* |
| opentelemetry-ruby | No* | N/A |

*Rust/PHP/Swift will not be tested because these libraries are not stable.
*Python/C++/Ruby are addressed in the Gaps and Recommendations section.

End-to-end (E2E) Testing

Test Criteria: Complete integration tests cover every instrument type and every aggregation type supported by the Prometheus Exporters. There are 6 instrument types and 3 aggregations, as listed below.

6 Instrument types

  • Counter
  • Async Counter (CounterObserver)
  • Histogram
  • Async Gauge (GaugeObserver)
  • UpDownCounter
  • Async UpDownCounter (UpDownCounterObserver)

Instrument → Aggregation

  • Counter → Sum Aggregation
  • Async Counter → Sum Aggregation
  • UpDownCounter → Sum Aggregation
  • Async UpDownCounter → Sum Aggregation
  • Async Gauge → Last Value Aggregation
  • Histogram → Histogram Aggregation
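
The mapping above can be written down as a small lookup table and used to check which instruments and aggregations a given test suite leaves uncovered. This is a minimal sketch: the `DEFAULT_AGGREGATION` table and the `coverage_gaps` helper are hypothetical names introduced for illustration, and the aggregation labels follow the short forms used in Table 2 (Sum, LastValue, Histogram).

```python
# Instrument -> aggregation, as listed in this issue.
DEFAULT_AGGREGATION = {
    "Counter": "Sum",
    "Async Counter": "Sum",
    "UpDownCounter": "Sum",
    "Async UpDownCounter": "Sum",
    "Async Gauge": "LastValue",
    "Histogram": "Histogram",
}


def coverage_gaps(tested_instruments):
    """Return (untested instruments, untested aggregations), both sorted."""
    tested = set(tested_instruments)
    unknown = tested - set(DEFAULT_AGGREGATION)
    if unknown:
        raise ValueError(f"unknown instruments: {sorted(unknown)}")
    covered = {DEFAULT_AGGREGATION[name] for name in tested}
    missing_instruments = set(DEFAULT_AGGREGATION) - tested
    missing_aggregations = set(DEFAULT_AGGREGATION.values()) - covered
    return sorted(missing_instruments), sorted(missing_aggregations)
```

For example, a suite that tests Counter, Histogram, Async Gauge, and UpDownCounter covers all three aggregations while still leaving the Async Counter and Async UpDownCounter instruments untested.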

Unit Tests vs Integration Tests

This section outlines the difference between unit tests and integration tests in the context of the Prometheus exporter.

  • Unit tests do not use the SDK. They pass pre-aggregated values to the Prometheus exporter and compare the exported values with the expected values.
  • Integration tests use the SDK (i.e. instruments) to generate metrics, pass them to the Prometheus exporter, and then compare the exported values with the expected values.
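
The distinction can be sketched with toy stand-ins. Nothing below is a real OTEL SDK or exporter API: `SumPoint`, `serialize`, and `ToyCounter` are hypothetical placeholders for a pre-aggregated data point, the exporter's serialization step, and an SDK instrument respectively.

```python
from dataclasses import dataclass


@dataclass
class SumPoint:
    """Hypothetical pre-aggregated data point (what a unit test feeds in)."""
    name: str
    value: float


def serialize(points):
    """Hypothetical stand-in for the exporter's Prometheus serialization."""
    return "".join(f"# TYPE {p.name} counter\n{p.name} {p.value}\n" for p in points)


class ToyCounter:
    """Hypothetical stand-in for an SDK Counter instrument."""

    def __init__(self, name):
        self.name = name
        self._total = 0.0

    def add(self, amount):
        if amount < 0:
            raise ValueError("a Counter only accepts non-negative increments")
        self._total += amount

    def collect(self):
        # The "SDK" performs the Sum aggregation and hands over a data point.
        return SumPoint(self.name, self._total)


def unit_style_output():
    # Unit style: bypass the instrument and pass a pre-aggregated value.
    return serialize([SumPoint("requests_total", 3.0)])


def integration_style_output():
    # Integration style: record through the instrument, then export.
    counter = ToyCounter("requests_total")
    counter.add(1)
    counter.add(2)
    return serialize([counter.collect()])
```

Both paths should yield the same exported text; the integration-style path additionally exercises the instrument and its aggregation.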

The Go, JS, dotnet, and Java tests all resemble integration tests. Important things to note:

  • JS and dotnet name their integration tests PrometheusSerializer tests, which leaves it ambiguous whether they were meant as unit or integration tests.
  • Java has both integration and unit tests.
  • Go added more tests to cover all the instruments.
  • No further tests will be added for Java, JS, and dotnet, since their existing tests already cover all aggregation types.
    * The Go tests had already been added before the decision not to add more tests was made.

Table 2: E2E Testing Breakdown

| Repositories | Available instruments | What instruments are tested? | Available Aggregations | What aggregations are tested? | Prometheus metrics tested (Counter, Gauge, Histogram, Summary) |
| opentelemetry-go | All instruments, in Float64 and Int64 types | Float64: Counter, UDCounter, Histogram; Int64: GaugeObserver | Sum, LastValue, Histogram | All | Counter, Gauge, Histogram |
| opentelemetry-js | All instruments, in number type | Counter, Histogram, Async Gauge, UpDownCounter | Sum, LastValue, Histogram | All | Counter, Gauge, Histogram |
| opentelemetry-dotnet | (long, int, short, byte, double, float) x (Counter, Async Counter, Async Gauge, Histogram) | Async Gauge, long/double counter, double histogram | LongSumIncomingCumulative, LongSumIncomingDelta, DoubleSumIncomingCumulative, DoubleSumIncomingDelta, DoubleGauge, LongGauge, Histogram, HistogramSumCount | LongSumIncomingDelta, DoubleSumIncomingDelta, Histogram, SumIncomingCumulative | Counter, Gauge, Histogram |
| opentelemetry-java | All instruments in float64 and int64 types | LongCounter and DoubleGauge | Sum, LastValue, Histogram, Exponential Histogram | Sum, LastValue, Histogram | Counter, Gauge, Histogram |
| opentelemetry-collector-contrib | N/A | N/A | All | All | All |

Findings

  1. Prometheus summary data points were removed according to this comment and this PR (#1412).
  2. The async gauges used for testing were created without a specified integral or floating point type, so the default is used. However, this documentation does not state what that default is, only that a default unit is used. The type of the async gauge and the type of the SumIncomingCumulative aggregator are therefore unknown.
  3. SDKs do not currently create Prometheus summaries; in OTEL, creating histograms is preferred. Summaries are considered to be for “Prometheus support only”. Note that we would like a Prometheus client library ↔ OTEL bridge at some point, and SDKs would then need to be able to represent summaries, but that discussion will happen in 2022.
  4. See the Exponential Histogram Aggregator Behavior section below (under the Recommendations section) for more detail about the Exponential Histogram data type and usage.
  5. The Collector may test Prometheus summaries since it is possible for it to receive aggregated data in Summary format from another source.

Gaps and Recommendations

Gaps:

  1. Spec gap: The specification for the Prometheus Exporter is currently minimal and has experimental status. It needs to be fleshed out further to reach stable status, and the various Prometheus exporters must then be kept compliant with it.
  2. Implementation gaps:
    1. Ruby does not have a Prometheus Exporter.
    2. Python is also missing a Prometheus Exporter, though one did exist before.
    3. The C++ Prometheus Exporter exists but is based on the old C++ metrics specification, so it is deprecated until a new implementation compliant with the spec is completed.
  3. Each Prometheus Exporter also has its own set of tests. As seen in Table 2, not all tests are complete. These gaps are addressed in the Issues and PRs section below.

Recommendations:

  1. Implementation should reflect the spec: We recommend that the Prometheus pull exporters that currently exist for specific language SDKs be updated to match the specification as it is fleshed out. The Prometheus Remote Write Exporter (push implementation) is available via the Collector.
  2. Exponential Histogram Aggregator Behavior: Sum, gauge, histogram, and exponential histogram are all part of the metric data streams in the OTEL protocol data model. The exponential histogram currently has experimental status, as mentioned in the specification document. Java is one of the first SDKs to support the exponential histogram and added it as a proto type. Perhaps the main difference from the regular histogram is that the exponential histogram is suited to conveying data with a high dynamic range at a small relative error. The exponential histogram should have unit tests for Prometheus conversion, since it is a separate SDK data type.
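
To illustrate why exponential buckets suit a high dynamic range, the sketch below maps positive values to bucket indexes using the base = 2^(2^-scale) definition from the OTEL data model, where bucket `index` covers the range (base^index, base^(index+1)]. The function names are hypothetical, zero and negative values are left out, and floating point rounding very close to bucket boundaries is not handled; it is not taken from any SDK.

```python
import math
from typing import Tuple


def base_for_scale(scale: int) -> float:
    """Bucket growth factor: base = 2 ** (2 ** -scale)."""
    return 2.0 ** (2.0 ** -scale)


def bucket_index(value: float, scale: int) -> int:
    """Index of the bucket (base**i, base**(i+1)] that contains value."""
    if value <= 0:
        raise ValueError("only positive values map to exponential buckets")
    # log_base(value) == log2(value) * 2**scale
    return math.ceil(math.log2(value) * (2.0 ** scale)) - 1


def bucket_bounds(index: int, scale: int) -> Tuple[float, float]:
    """Lower (exclusive) and upper (inclusive) bound of a bucket."""
    base = base_for_scale(scale)
    return base ** index, base ** (index + 1)
```

At scale 0 the base is 2, so 5.0 lands in the bucket (4, 8]. At scale 3 the base is about 1.09, so every bucket's upper bound is only about 9% above its lower bound, whether the value is 0.001 or 1,000,000. A Prometheus conversion test would need to check how such buckets are turned into the explicit bounds that Prometheus histograms expect.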

Issues and PRs

Go:
open-telemetry/opentelemetry-go#2466
open-telemetry/opentelemetry-go#2487 (MERGED)

JS:
open-telemetry/opentelemetry-js#2690

DotNet:
open-telemetry/opentelemetry-dotnet#2757

Java:
open-telemetry/opentelemetry-java#4031 (CLOSED)
