TRACING-4752: Add OpenTelemetry-Collector as optional sub-package #4281

Open: wants to merge 13 commits into base: main

@copejon copejon commented Dec 6, 2024

Which issue(s) this PR addresses:

Closes #

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 6, 2024

openshift-ci bot commented Dec 6, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci bot commented Dec 6, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: copejon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 6, 2024
packaging/rpm/microshift.spec (outdated, resolved)
packaging/observability/opentelemetry-collector.yaml (outdated, resolved)
packaging/observability/opentelemetry-collector.yaml (outdated, resolved)
packaging/rpm/microshift.spec (resolved)
packaging/rpm/microshift.spec (resolved)
packaging/observability/microshift-observability.service (outdated, resolved)
@@ -0,0 +1,27 @@
[Unit]
Description=MicroShift Observability
BindsTo=microshift.service
Contributor

Do we want to run the collector even when MicroShift fails?

Contributor Author

I'd think yes. If MicroShift fails to start, the metrics and log data should still be collectable by the metrics/logging backend remotely.
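
For reference, a minimal sketch of the ordering-only variant this exchange points toward, assuming the collector unit should outlive a failed microshift.service. The directives are standard systemd, but the fragment is illustrative and is not the unit file from this PR.

```ini
[Unit]
Description=MicroShift Observability
# Ordering only: when both units are queued to start, this one waits for microshift.service.
# After= alone neither pulls microshift.service in nor stops this unit when microshift.service stops or fails.
After=microshift.service
# By contrast, BindsTo=microshift.service would also stop this unit whenever microshift.service stops.
```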

packaging/observability/microshift-observability.service (outdated, resolved)

ggiguash commented Dec 9, 2024

/retitle NO-ISSUE: OpenTelemetry certificates and service for MicroShift

@openshift-ci openshift-ci bot changed the title No issue generate otel cert NO-ISSUE: OpenTelemetry certificates and service for MicroShift Dec 9, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 9, 2024
@openshift-ci-robot

@copejon: This pull request explicitly references no jira issue.

In response to this:

Which issue(s) this PR addresses:

Closes #

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@copejon copejon changed the title NO-ISSUE: OpenTelemetry certificates and service for MicroShift TRACING-4752: Add OpenTelemetry-Collector as optional sub-package Dec 12, 2024

openshift-ci-robot commented Dec 12, 2024

@copejon: This pull request references TRACING-4752 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

Which issue(s) this PR addresses:

Closes #

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


copejon commented Dec 12, 2024

/jira refresh


openshift-ci-robot commented Dec 12, 2024

@copejon: This pull request references TRACING-4752 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@copejon copejon force-pushed the no-issue-generate-otel-cert branch from fa4f579 to fede276 Compare December 12, 2024 21:05
@copejon copejon marked this pull request as ready for review January 21, 2025 16:45
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 21, 2025
@openshift-ci openshift-ci bot requested review from agullon and eslutsky January 21, 2025 16:46
Implemented opentelemetry-collector in the packaging pipeline. The otel binary is unable to see the cert files, although the config paths are correct and the files exist. File permissions for the observability user have been checked and are good. WIP

Signed-off-by: Jon Cope <[email protected]>
ensure the otelcol process creates necessary dirs

added firewall port handling for otel-col exporter

Signed-off-by: Jon Cope <[email protected]>
… to allow the collection of pod logs from the host filesystem

Signed-off-by: Jon Cope <[email protected]>
…s more sense that observability should

a) not be killed if microshift fails during start or runtime

b) not cause the microshift service to start, and only be started after microshift completes

Signed-off-by: Jon Cope <[email protected]>
…e extension

systemd unit sets microshift.service for RequiredBy=, meaning that if observability fails to start, microshift should be killed.

otel config now includes microshift-etcd, microshift-tuned services in log collection. Removed debugging exporter. Added filelog pipeline

Signed-off-by: Jon Cope <[email protected]>
@copejon copejon force-pushed the no-issue-generate-otel-cert branch from 2042714 to ccfea22 Compare January 22, 2025 23:20
Signed-off-by: Jon Cope <[email protected]>
Requires: opentelemetry-collector

%description observability
Deploys the Red Hat build of Opentelemetry-collector as a systemd service on host. MicroShift provides client
Contributor

We need to be consistent in the naming case. Either fix this, or the Summary section, please.

Suggested change
- Deploys the Red Hat build of Opentelemetry-collector as a systemd service on host. MicroShift provides client
+ Deploys the Red Hat build of OpenTelemetry-Collector as a systemd service on host. MicroShift provides client

Comment on lines 232 to 234
certificates to permit access to the kube-apiserver metrics endpoints. If a user defined opentelemetry-collector exists
at /etc/microshift/opentelemetry-collector.yaml, this config is used. Otherwise, a default config is provided. Note that
the default configuration requires the backend endpoint be set by the user.
Contributor

Suggested change
- certificates to permit access to the kube-apiserver metrics endpoints. If a user defined opentelemetry-collector exists
- at /etc/microshift/opentelemetry-collector.yaml, this config is used. Otherwise, a default config is provided. Note that
- the default configuration requires the backend endpoint be set by the user.
+ certificates to permit access to the kube-apiserver metrics endpoints. If a user-defined configuration file exists
+ at /etc/microshift/opentelemetry-collector.yaml, this configuration is used. Otherwise, a default configuration is provided.
+ Note that the default configuration requires the backend endpoint be set by the user.

Contributor

For the backend endpoint, should we be specific on what we expect users to set?
I mean, should we say exporters.otlp section must be edited by users?

Contributor Author

Added more specific instructions

# EXAMPLE OTLP (Prometheus) ENDPOINT CONFIG
# The otlp exporter requires an endpoint listening for OTLP connections. To prevent spamming the log with Go
# stack traces, the exporter is disabled. The endpoint is not known at installation, thus a tire-kicking of the
# microshift-observability package would result in stack traces spam in logs.
Contributor

Let's think what we can do so that the logs are not "spammed" when the default configuration is used. It sounds as if we should copy this file with .example suffix so that users would have to explicitly rename the file when they enable the collector service.

Contributor
@ggiguash ggiguash Jan 23, 2025

In any case, the "style" of this comment should be reworded.

Contributor Author

I've tweaked the comment and made it a little more informational

@@ -0,0 +1,20 @@
[Unit]
Description=MicroShift Observability
After=microshift.service
Contributor

Should we use ConditionPathExists here for all the files the service expects to have before it starts?

Contributor Author

The opentelemetry-collector performs that check for us each time it starts.

Contributor

Right, but the point of the condition in systemd is not to attempt starting the service if the path does not exist.

Contributor

Could this help to avoid unnecessary restarts?
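
A minimal sketch of the condition under discussion, assuming a single certificate file is enough to gate startup. The path below is a hypothetical placeholder, since the final certificate location is still under review in this PR.

```ini
[Unit]
Description=MicroShift Observability
After=microshift.service
# Placeholder path (assumption): substitute the client certificate location the package actually uses.
# If the file is missing, systemd skips the start instead of launching a collector that exits on its own check.
ConditionPathExists=/path/to/client.crt
```

A unit skipped by a failed condition is not marked as failed, which is the behavioral difference from letting the collector exit on its own file check.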


# It takes a bit for the certs to be created. This service will reach its burst limit almost immediately, pretty much
# guaranteeing that it will reach the restart limit before it can possibly succeed.
RestartSec=200ms
Contributor

Do we really need this? We've configured the service to start After microshift, so microshift must report readiness to systemd before the current service startup is attempted. MicroShift only reports readiness after creating all certificates.
What am I missing?

Contributor Author

In earlier tests this was necessary to keep the service from crash looping, but that doesn't seem to be an issue in the latest opentelemetry-collector. Will remove
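
Should a restart policy be needed again, here is a sketch of how the restart delay and the start rate limit interact, assuming systemd's stock defaults of 5 starts per 10 seconds. All values and the Restart= policy are illustrative assumptions, not taken from this PR.

```ini
[Unit]
# Allow at most 5 start attempts per 60 seconds before systemd stops retrying.
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
# Illustrative policy (assumption): restart only on unclean exits.
Restart=on-failure
# With a 5s delay the five allowed attempts span roughly 20 seconds (ignoring process run time),
# versus about one second with RestartSec=200ms.
RestartSec=5s
```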

…es "debug" exporter instead, which writes to stderr

Signed-off-by: Jon Cope <[email protected]>
Signed-off-by: Jon Cope <[email protected]>
auth_type: tls
ca_file: /etc/pki/microshift-opentelemetry-collector-client/client-ca.crt
key_file: /etc/pki/microshift-opentelemetry-collector-client/client.key
cert_file: /etc/pki/microshift-opentelemetry-collector-client/client.crt
Contributor

These paths need to be updated too -> /var/lib/microshift/..../

certificates to permit access to the kube-apiserver metrics endpoints. If a user defined Opentelemetry-Collector exists
at /etc/microshift/opentelemetry-collector.yaml, this config is used. Otherwise, a default config is provided. Note that
the default configuration requires the backend endpoint be set by the user. The otlp export must also be specified as
.service.pipelines.$RECEIVER.exporter: "otlp". The specification for the otlp config is:
Contributor

Let's not use shortened words because it's a user-facing RPM description.

fixed cert signer to be kubelet, not kube-apiserver

Signed-off-by: Jon Cope <[email protected]>
@copejon copejon force-pushed the no-issue-generate-otel-cert branch from aee833a to ad4892d Compare January 30, 2025 13:13

openshift-ci bot commented Jan 30, 2025

@copejon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/test-rebase | ad4892d | link | false | /test test-rebase |
| ci/prow/verify | ad4892d | link | true | /test verify |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Requires: opentelemetry-collector

%description observability
Deploys the Red Hat build of Opentelemetry-Collector as a systemd service on host. MicroShift provides client
Contributor

Please fix the case of Opentelemetry -> OpenTelemetry to make it consistent with the summary text.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
4 participants