@swiatekm swiatekm commented Nov 20, 2025

What does this PR do?

When we generate the self-monitoring configuration, we do certain things differently depending on whether a component will run as a beat process or as a beat receiver in an otel collector. This PR ensures that this information is accurate. Until now, the decision was based on the runtime the component was configured to use, rather than the runtime it actually ended up using. If a component cannot run in the otel runtime (for example, because its output is not supported), it falls back to the process runtime, but this fallback happened after the self-monitoring configuration had already been generated, leading to inconsistencies.

This is achieved by making the following changes:

  1. Falling back to the process runtime is now done by a component modifier instead of a dedicated Coordinator method.
  2. Component modifiers are now applied immediately after the components are generated, before the self-monitoring configuration is generated. As a result, the monitoring manager sees the runtime each component will actually run in.
  3. For the self-monitoring components, we check whether the output is supported by the otel runtime and make decisions based on that.
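The ordering change described in the list above can be sketched in Go. This is a simplified, hypothetical model rather than the elastic-agent code: the `Component` and `ComponentsModifier` types, the `otelFallback`, `monitoringRuntime` and `buildRuntimes` functions, and the table of otel-supported outputs are all assumptions made for illustration. The property it shows is that every modifier, including the otel fallback, runs before anything reads a component's runtime, while the self-monitoring component, which is created after the modifiers have run, checks its own output separately and logs a warning when it falls back.

```go
package main

import (
	"fmt"
	"log"
)

// RuntimeManager says where a component runs (simplified, hypothetical type).
type RuntimeManager string

const (
	ProcessRuntime RuntimeManager = "process"
	OtelRuntime    RuntimeManager = "otel"
)

// Component is a hypothetical, trimmed-down component model.
type Component struct {
	ID         string
	Runtime    RuntimeManager
	OutputType string
}

// ComponentsModifier rewrites the generated component list (hypothetical signature).
type ComponentsModifier func([]Component) []Component

// otelSupportedOutputs is an ASSUMED support table; real support checks may
// also depend on output settings, not only on the output type.
var otelSupportedOutputs = map[string]bool{"elasticsearch": true}

func otelSupported(outputType string) bool { return otelSupportedOutputs[outputType] }

// otelFallback moves components whose output the otel runtime cannot handle
// back to the process runtime. It returns a new slice and leaves the input as-is.
func otelFallback(comps []Component) []Component {
	out := make([]Component, len(comps))
	for i, c := range comps {
		if c.Runtime == OtelRuntime && !otelSupported(c.OutputType) {
			c.Runtime = ProcessRuntime
		}
		out[i] = c
	}
	return out
}

// monitoringRuntime decides the runtime of a self-monitoring component from
// its own output, because modifiers have already run when it is created.
func monitoringRuntime(requested RuntimeManager, outputType string) RuntimeManager {
	if requested == OtelRuntime && !otelSupported(outputType) {
		log.Printf("output %q not supported by the otel runtime, using the process runtime for monitoring", outputType)
		return ProcessRuntime
	}
	return requested
}

// buildRuntimes applies every modifier first and only then records the
// runtime each component (plus a self-monitoring component) will actually use.
func buildRuntimes(comps []Component, monitoringOutputType string, modifiers ...ComponentsModifier) map[string]RuntimeManager {
	for _, m := range modifiers {
		comps = m(comps)
	}
	runtimes := make(map[string]RuntimeManager, len(comps)+1)
	for _, c := range comps {
		runtimes[c.ID] = c.Runtime
	}
	runtimes["monitoring"] = monitoringRuntime(OtelRuntime, monitoringOutputType)
	return runtimes
}

func main() {
	comps := []Component{
		{ID: "system/metrics-default", Runtime: OtelRuntime, OutputType: "logstash"},
		{ID: "filestream-default", Runtime: OtelRuntime, OutputType: "elasticsearch"},
	}
	// prints map[filestream-default:otel monitoring:process system/metrics-default:process]
	fmt.Println(buildRuntimes(comps, "logstash", otelFallback))
}
```

Under the assumed support table, the `logstash` component and the monitoring component both end up in the process runtime, and the monitoring step sees that final state instead of the configured `otel` value.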

Why is it important?

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

How to test this PR locally

Build the agent locally and run it with the following configuration:

agent:
  logging:
    to_stderr: true
inputs:
- data_stream:
    namespace: default
  id: unique-system-metrics-input
  streams:
  - data_stream:
      dataset: system.cpu
    metricsets:
    - cpu
  type: system/metrics
  use_output: default
outputs:
  default:
    username: elastic
    password: elastic
    hosts:
    - 127.0.0.1:9200
    type: elasticsearch
    indices: []

Check the agent status to verify that all components are running as beat processes, and that the prometheus monitoring component is not present.

Related issues

@swiatekm swiatekm added bug Something isn't working backport-8.19 Automated backport to the 8.19 branch backport-9.2 Automated backport to the 9.2 branch labels Nov 20, 2025
@swiatekm swiatekm force-pushed the feat/accurate-otel-support-detection-monitoring branch 8 times, most recently from 7e29478 to aba6cc1 November 22, 2025 19:41
@swiatekm swiatekm changed the title Ensure the self-monitoring configuration accounts for the runtime components actually run in Ensure the self-monitoring configuration knows the actual component runtime Nov 22, 2025
@swiatekm swiatekm force-pushed the feat/accurate-otel-support-detection-monitoring branch from 1dc723b to 0e8bc78 November 24, 2025 11:24
@swiatekm swiatekm force-pushed the feat/accurate-otel-support-detection-monitoring branch from 0e8bc78 to c20cea5 November 24, 2025 13:42
@swiatekm swiatekm marked this pull request as ready for review November 24, 2025 13:42
@swiatekm swiatekm requested a review from a team as a code owner November 24, 2025 13:42
@swiatekm swiatekm added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Nov 24, 2025
@elasticmachine
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@swiatekm swiatekm requested a review from cmacknz November 24, 2025 13:43
elasticmachine commented Nov 24, 2025

💛 Build succeeded, but was flaky

cc @swiatekm

@cmacknz cmacknz left a comment

Tested locally and it works; a couple of very minor comments.

A lot of this is mechanical, so we can probably backport it after 9.2.2 is released. I don't think we need to rush this into 9.2.2, as long as we are confident that the workaround already in place makes that unnecessary.

@swiatekm swiatekm removed the backport-9.2 Automated backport to the 9.2 branch label Nov 25, 2025
@swiatekm swiatekm requested a review from cmacknz November 25, 2025 16:12
@swiatekm swiatekm merged commit 2c4c615 into main Nov 26, 2025
21 checks passed
@swiatekm swiatekm deleted the feat/accurate-otel-support-detection-monitoring branch November 26, 2025 11:37
mergify bot pushed a commit that referenced this pull request Nov 26, 2025
…untime (#11300)

* Move ComponentsModifies to the component package

* Move Otel runtime determination to component modifier

* Check supported outputs in monitoring config generation

* Add changelog entry

* Log warning about switching to process runtime for monitoring

* Fix monitoring config types

* fix TestBeatsReceiverProcessRuntimeFallback

* Add logstash output to test cases

(cherry picked from commit 2c4c615)

# Conflicts:
#	internal/pkg/agent/application/monitoring/component/v1_monitor.go
#	internal/pkg/agent/install/componentvalidation/validation.go
#	internal/pkg/otel/translate/otelconfig.go
#	pkg/component/component.go
#	testing/integration/ess/beat_receivers_test.go
Development

Successfully merging this pull request may close these issues.

Remove the workaround for environment variable injection issues in beats

4 participants