JMX implementation: feature parity for target systems #12158

Open · 13 tasks

SylvainJuge opened this issue Sep 3, 2024 · 13 comments
@SylvainJuge commented on Sep 3, 2024:

JMX Insight supports some values for otel.jmx.target.system; these are defined in YAML files here.

JMX Gatherer (in contrib) supports more values of otel.jmx.target.system; these are defined in Groovy scripts here.

While the Groovy scripts are convenient, moving to YAML seems a more future-proof solution:

  • removes the security risk of having executable Groovy scripts
  • YAML syntax is already widespread and usually does not require Java/Groovy knowledge
  • YAML could later allow inlining the configuration into a global OpenTelemetry YAML configuration once one is available; for now it has to be stored in a separate file.

Merging both implementations and bringing them to feature parity means we have to attempt to migrate/align all of the systems supported by JMX Gatherer and ensure they can be implemented with YAML. Doing so will highlight any missing features of the YAML implementation, which we can then add.

Once the alignment is complete, we should then be able to start on the next step: building a "JMX Scraper" in contrib based on the YAML implementation in instrumentation.
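
For context, here is a rough sketch of what these YAML metric definitions look like, assuming the rules / bean / mapping schema of the existing JMX Insight YAML files (the metric name, unit and description below are purely illustrative):

```yaml
# Illustrative sketch only: one rule reading a single MBean attribute.
rules:
  - bean: java.lang:type=Threading        # JMX ObjectName to query
    mapping:
      ThreadCount:                        # MBean attribute to read
        metric: example.jvm.thread.count  # illustrative metric name
        type: updowncounter               # counter | updowncounter | gauge
        unit: "{thread}"                  # illustrative unit
        desc: Current number of live threads
```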

For each system listed below, we need to ensure the following with JMX Insight:

  • add a YAML definition if the system is not supported yet
  • convert Groovy metrics to their YAML equivalents
  • deal with any inconsistency found for existing metrics by choosing to either
    • leave them as-is
    • fix the YAML or Groovy definitions (or both)
  • add any missing features to the YAML implementation if needed

List of systems to cover:

Once feature parity is achieved and the JMX Scraper allows capturing both:

  • current JMX Gatherer metrics
  • current JMX Insight metrics (maybe as opt-in)

Then we can start the next step: enhancing and aligning the metrics, as in the initial attempt in #11621.

When doing so, special care should be taken to ensure that we conform to current guidelines for metrics defined here, for example:

  • units using {noun} instead of 1
  • metric name with a namespace
  • metric attributes with a namespace
  • maybe defining a common strategy to map existing JMX metrics with minimal definitions (for example, stay close to the MBean attribute name by default; this is just a rough idea)

Follow-up tasks

  • open issues to enhance JMX metrics (maybe one per system)
SylvainJuge self-assigned this on Sep 3, 2024
@SylvainJuge commented:

Ping @robsunday: I can't co-assign you yet, as you are not part of the otel contributors group.

@SylvainJuge commented on Sep 3, 2024:

For Tomcat, the mapping is not the same but almost equivalent; there isn't anything we need to add for 1:1 support beyond aligning the metrics themselves.

Side note: using JMX object names and attributes is a convenient way to identify elements, as they are the part common to both mappings.

  • JMX : Catalina:type=Manager,host=localhost,context=* or Tomcat:type=GlobalRequestProcessor,name=*
    • activeSessions : tomcat.sessions (no attribute) <==> http.server.tomcat.sessions.activeSessions with context attribute
  • JMX: Catalina:type=GlobalRequestProcessor,name=* or Tomcat:type=GlobalRequestProcessor,name=*
    • JMX Gatherer: name => proto_handler, JMX Insight: name => name
    • errorCount: tomcat.errors with proto_handler attribute <==> http.server.tomcat.errorCount with name attribute
    • requestCount: tomcat.request_count with proto_handler attribute <==> http.server.tomcat.requestCount with name attribute
    • maxTime: tomcat.max_time with proto_handler attribute <==> http.server.tomcat.maxTime with name attribute
    • processingTime: tomcat.processing_time with proto_handler attribute <==> http.server.tomcat.processingTime with name attribute
    • bytesReceived / bytesSent: tomcat.traffic with proto_handler and direction = received|sent <==> http.server.tomcat.traffic with name, direction identical
  • JMX: Catalina:type=ThreadPool,name=* or Tomcat:type=ThreadPool,name=*
    • JMX Gatherer: name => proto_handler, JMX Insight: name => name
    • currentThreadCount : tomcat.threads with state = idle <==> http.server.tomcat.threads with name , state identical (state=idle reports the total number of threads, which is a bug mentioned here and here)
    • currentThreadsBusy: tomcat.threads with state = busy <==> http.server.tomcat.threads with name and state identical

Given the mapping differences, I think we probably need to leave this as-is for now.
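
If we later decide to align them, a rough sketch of the GlobalRequestProcessor part in YAML could look like the following, assuming a list of ObjectNames can be given under beans: (the multi-MBean form mentioned in a later comment) and that param(name) extracts the name property; metric names mirror the existing Groovy ones, units are illustrative:

```yaml
# Sketch only, not a definitive definition.
rules:
  - beans:
      - Catalina:type=GlobalRequestProcessor,name=*
      - Tomcat:type=GlobalRequestProcessor,name=*
    metricAttribute:
      proto_handler: param(name)   # JMX Gatherer uses proto_handler, JMX Insight uses name
    mapping:
      errorCount:
        metric: tomcat.errors
        type: counter
        unit: "{error}"            # illustrative unit
      requestCount:
        metric: tomcat.request_count
        type: counter
        unit: "{request}"          # illustrative unit
```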

@robsunday commented:

I'll look into Jetty.

@SylvainJuge commented:

For Wildfly, the mapping is also not the same but equivalent; there isn't anything we need to add for 1:1 support beyond aligning the metrics themselves (see the sketch after the list).

  • JMX: jboss.as:deployment=*,subsystem=undertow
    • Both map deployment => deployment attribute
    • sessionsCreated: wildfly.session.count <==> wildfly.session.sessionsCreated
    • activeSessions: wildfly.session.active <==> wildfly.session.activeSessions
    • expiredSessions: wildfly.session.expired <==> wildfly.session.expiredSessions
    • rejectedSessions: wildfly.session.rejected <==> wildfly.session.rejectedSessions
  • JMX: jboss.as:subsystem=undertow,server=*,http-listener=*
    • Both map server => server attribute and http-listener => value of listener
    • requestCount: wildfly.request.count <==> wildfly.request.requestCount
    • processingTime: wildfly.request.time <==> wildfly.request.processingTime
    • errorCount: wildfly.request.server_error <==> wildfly.request.errorCount
    • bytesSent: wildfly.network.io with extra state = out attribute <==> same
    • bytesReceived: wildfly.network.io with extra state = in attribute <==> same
  • JMX: jboss.as:subsystem=datasources,data-source=*,statistics=pool
    • Both map data-source => value of data_source
    • ActiveCount : wildfly.jdbc.connection.open with state = active <==> wildfly.db.client.connections.usage with state = used
    • IdleCount : wildfly.jdbc.connection.open with state = idle <==> wildfly.db.client.connections.usage with state = idle
    • WaitCount: wildfly.jdbc.request.wait <==> wildfly.db.client.connections.WaitCount
  • JMX: jboss.as:subsystem=transactions
    • numberOfTransactions: wildfly.jdbc.transaction.count <==> wildfly.db.client.transaction.NumberOfTransactions
    • numberOfSystemRollbacks: wildfly.jdbc.rollback.count with cause = system <==> wildfly.db.client.rollback.count with cause = system
    • numberOfResourceRollbacks: wildfly.jdbc.rollback.count with cause = resource <==> wildfly.db.client.rollback.count with cause = resource
    • numberOfApplicationRollbacks: wildfly.jdbc.rollback.count with cause = application <==> wildfly.db.client.rollback.count with cause = application
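
As a sketch of what the undertow listener part could translate to, assuming per-mapping metricAttribute and const(...) are supported for fixed attribute values; if they are not, that is precisely the kind of missing YAML feature this issue should surface (metric names mirror the Groovy ones, units are illustrative):

```yaml
# Sketch only: two MBean attributes folded into one metric with a
# constant `state` attribute (out / in).
rules:
  - bean: jboss.as:subsystem=undertow,server=*,http-listener=*
    metricAttribute:
      server: param(server)
      listener: param(http-listener)
    mapping:
      bytesSent:
        metric: wildfly.network.io
        type: counter
        unit: By
        metricAttribute:
          state: const(out)
      bytesReceived:
        metric: wildfly.network.io
        type: counter
        unit: By
        metricAttribute:
          state: const(in)
```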

@SylvainJuge commented:

For JVM metrics, JMX Insight does not provide a YAML file; the feature is implemented in the runtime-metrics module of instrumentation (link). The current definition is aligned with the semantic conventions for JVM metrics.

JMX Gatherer provides the following metrics, which are not aligned with semconv; all of them can easily be captured with the YAML configuration (see the sketch after this list):

  • java.lang:type=ClassLoading:
    • LoadedClassCount : jvm.classes.loaded
  • java.lang:type=GarbageCollector,* :
    • CollectionCount: jvm.gc.collections.count with name => name
    • CollectionTime: jvm.gc.collections.elapsed with name => name
  • java.lang:type=Memory
    • HeapMemoryUsage: jvm.memory.heap
    • NonHeapMemoryUsage: jvm.memory.nonheap
  • java.lang:type=MemoryPool,*
    • Usage: jvm.memory.pool with name => name
  • java.lang:type=Threading:
    • ThreadCount : jvm.threads.count
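
For example, the GarbageCollector mapping above could be expressed roughly as follows, assuming the rules / bean / mapping / metricAttribute schema with param(...) for ObjectName properties (units are illustrative):

```yaml
# Sketch only: reproduces the existing (non-semconv) Groovy metric names.
rules:
  - bean: java.lang:type=GarbageCollector,*
    metricAttribute:
      name: param(name)                    # GC name, e.g. "G1 Young Generation"
    mapping:
      CollectionCount:
        metric: jvm.gc.collections.count
        type: counter
        unit: "{collection}"               # illustrative unit
      CollectionTime:
        metric: jvm.gc.collections.elapsed
        type: counter
        unit: ms
```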

@SylvainJuge commented:

As a side note, after reviewing the differences for jvm, tomcat and wildfly, it becomes more and more obvious to me that there are too many differences to fix. Also, some of the Groovy definitions haven't been modified in 2 or 3 years, which suggests they are probably obsolete or not really used in practice.

As a consequence, I think the better option for now is to:

  • finish reviewing the mappings to ensure the ones currently in JMX Gatherer can be reproduced with YAML

The steps that will likely follow are:

  • build a new module that will use the JMX Insight implementation in contrib next to JMX Gatherer
  • provide a set of YAML definitions for this new module to capture the metrics as they currently are (just to preserve compatibility)
  • modify the collector jmxreceiver implementation to use this new way to capture JMX metrics
  • start deprecating the current JMX Gatherer
  • start improving the metrics definitions so we have a set of common YAML definitions that can be reused between Instrumentation and Contrib (from the consumer side of those metrics, they should be exactly the same).

@robsunday commented:

Here are my findings regarding jetty (a note on the type differences follows the list):

  • JMX: org.eclipse.jetty.server.session:context=*,type=sessionhandler,id=*

    • MBean property: sessionsCreated --> YAML: jetty.session.sessionsCreated <==> Groovy: jetty.session.count
    • MBean property: sessionTimeTotal --> YAML: jetty.session.sessionTimeTotal <==> Groovy: jetty.session.time.total
      • minor difference in type: YAML: counter / Groovy: UpDownCounter
    • MBean property: sessionTimeMax --> YAML: jetty.session.sessionTimeMax <==> Groovy: jetty.session.time.max
    • MBean property: sessionTimeMean --> YAML: jetty.session.sessionTimeMean, not used in Groovy
  • JMX: org.eclipse.jetty.util.thread:type=queuedthreadpool,id=*

    • MBean property: busyThreads --> YAML: jetty.threads.busyThreads <==> Groovy: jetty.thread.count with extra state=busy attribute
      • minor difference in type: YAML: updowncounter / Groovy: Value
    • MBean property: idleThreads --> YAML: jetty.threads.idleThreads <==> Groovy: jetty.thread.count with extra state=idle attribute
      • minor difference in type: YAML: updowncounter / Groovy: Value
    • MBean property: maxThreads --> YAML: jetty.threads.maxThreads, not used in Groovy
    • MBean property: queueSize --> YAML: jetty.threads.queueSize <==> Groovy: jetty.thread.queue.count
      • minor difference in type: YAML: updowncounter / Groovy: Value
  • JMX: org.eclipse.jetty.io:context=*,type=managedselector,id=*

    • MBean property: selectCount --> YAML: jetty.io.selectCount <==> Groovy: jetty.select.count
      • difference in units: YAML: 1 / Groovy: {operations}
  • JMX: org.eclipse.jetty.logging:type=jettyloggerfactory,id=* not used in Groovy
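
Regarding the "minor difference in type" notes above: assuming the instrument type is a per-mapping field in the YAML schema, aligning counter vs. updowncounter would be a one-line change per metric. A sketch, with an illustrative unit:

```yaml
# Sketch only: flipping the instrument type is a single field change.
rules:
  - bean: org.eclipse.jetty.util.thread:type=queuedthreadpool,id=*
    mapping:
      queueSize:
        metric: jetty.threads.queueSize
        type: updowncounter        # counter | updowncounter | gauge
        unit: "{thread}"           # illustrative unit
```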

@SylvainJuge commented:

For hbase, there isn't anything in JMX Insight yet; the mappings are simple, and it should be quite straightforward (but a bit tedious) to produce a YAML equivalent of hbase.groovy.

@SylvainJuge commented:

For hadoop:

The JMX attribute tag.Hostname is always mapped to the node_name metric attribute in both implementations.

JMX Hadoop:service=NameNode,name=FSNamesystem:

  • CapacityUsed : hadoop.name_node.capacity.usage <==> hadoop.capacity.CapacityUsed
  • CapacityTotal: hadoop.name_node.capacity.limit <==> hadoop.capacity.CapacityTotal
  • BlocksTotal: hadoop.name_node.block.count <==> hadoop.block.BlocksTotal
  • MissingBlocks: hadoop.name_node.block.missing <==> hadoop.block.MissingBlocks
  • CorruptBlocks: hadoop.name_node.block.corrupt <==> hadoop.block.CorruptBlocks
  • VolumeFailuresTotal: hadoop.name_node.volume.failed <==> hadoop.volume.VolumeFailuresTotal
  • FilesTotal: hadoop.name_node.file.count <==> hadoop.file.FilesTotal
  • TotalLoad: hadoop.name_node.file.load <==> hadoop.file.TotalLoad
  • NumLiveDataNodes: hadoop.name_node.data_node.count with state = live <==> hadoop.datenode.Count, same state value (yes, there is a typo in datanode)
  • NumDeadDataNodes: hadoop.name_node.data_node.count with state = dead <==> hadoop.datenode.Count, same state value

@SylvainJuge commented:

For cassandra:

There is no mapping in YAML; the mapping is verbose, and the lack of support for templates or string interpolation would make it quite tedious to write, but that is more an annoyance than a blocking issue.

For example, a few of the MBeans involved:

  • org.apache.cassandra.metrics:type=ClientRequest
  • org.apache.cassandra.metrics:type=ClientRequest,scope=RangeSlice
  • org.apache.cassandra.metrics:type=ClientRequest,scope=Read
  • org.apache.cassandra.metrics:type=ClientRequest,scope=Write
  • all of the above with scope= have 3 variants, adding ,name= with a value of Unavailables, Timeouts or Failures
  • org.apache.cassandra.metrics:type=Storage,name=Load

There isn't anything that could not be mapped using YAML syntax.
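
To illustrate the verbosity: without templating, each scope/name combination needs its own rule, roughly like this (a sketch with purely illustrative metric names and units, assuming the rules / bean / mapping schema):

```yaml
# Sketch only: each ClientRequest scope has to be spelled out separately.
rules:
  - bean: org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Timeouts
    mapping:
      Count:
        metric: cassandra.client.request.read.timeout.count    # illustrative name
        type: counter
        unit: "{timeout}"                                       # illustrative unit
  - bean: org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Timeouts
    mapping:
      Count:
        metric: cassandra.client.request.write.timeout.count   # illustrative name
        type: counter
        unit: "{timeout}"
  # ...and similarly for RangeSlice, and for Unavailables / Failures.
```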

@robsunday commented on Sep 4, 2024:

For activemq, everything except property descriptions seems to be in sync.
Metric attributes are consistent.

  • JMX: org.apache.activemq:type=Broker,brokerName=*,destinationType=Queue,destinationName=* and org.apache.activemq:type=Broker,brokerName=*,destinationType=Topic,destinationName=*
    • ProducerCount: activemq.producer.count <==> activemq.ProducerCount
    • ConsumerCount: activemq.consumer.count <==> activemq.ConsumerCount
    • MemoryPercentUsage: activemq.memory.usage <==> activemq.memory.MemoryPercentUsage
    • QueueSize: activemq.message.current <==> activemq.message.QueueSize
    • ExpiredCount: activemq.message.expired <==> activemq.message.ExpiredCount
    • EnqueueCount: activemq.message.enqueued <==> activemq.message.EnqueueCount
    • DequeueCount: activemq.message.dequeued <==> activemq.message.DequeueCount
    • AverageEnqueueTime: activemq.message.wait_time.avg <==> activemq.message.AverageEnqueueTime

All desc fields in the properties need to be synchronized because the wording is different.

  • JMX: org.apache.activemq:type=Broker,brokerName=*
    • CurrentConnectionsCount: activemq.connection.count <==> activemq.connections.CurrentConnectionsCount
    • StorePercentUsage: activemq.disk.store_usage <==> activemq.disc.StorePercentUsage
    • TempPercentUsage: activemq.disk.temp_usage <==> activemq.disc.TempPercentUsage

@robsunday commented:

The solr case is very similar to hbase: there is no YAML at the moment, but creating it should not be an issue.

@SylvainJuge commented:

For kafka, the YAML is kafka-broker.yaml

  • JMX: kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec
    • Count: kafka.message.count
  • JMX: kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec
    • Count: kafka.request.count with type = produce
  • JMX: kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec
    • Count: kafka.request.count with type = fetch
  • JMX: kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec
    • Count: kafka.request.failed with type = produce
  • JMX: kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec
    • Count: kafka.request.failed with type = fetch

I haven't checked all the others in detail, but they look identical between the two implementations.

I discovered that we have a way to use multiple MBean names with the same metric definition, as seen in kafka-broker.yaml.

For kafka-consumer.groovy and kafka-producer.groovy there is no equivalent YAML mapping though.
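
A sketch of how the per-request-type mappings above could be expressed, assuming const(...) is available for fixed attribute values (units illustrative). Note that the produce/fetch split still needs two rules here because the type attribute value differs, whereas the multi-ObjectName form mentioned above applies when several MBeans share exactly the same metric and attributes:

```yaml
# Sketch only, mirroring the existing kafka.request.count mapping.
rules:
  - bean: kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec
    metricAttribute:
      type: const(produce)
    mapping:
      Count:
        metric: kafka.request.count
        type: counter
        unit: "{request}"        # illustrative unit
  - bean: kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec
    metricAttribute:
      type: const(fetch)
    mapping:
      Count:
        metric: kafka.request.count
        type: counter
        unit: "{request}"
```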
