TRACING-4752: Add OpenTelemetry-Collector as optional sub-package #4281

Merged
4 changes: 4 additions & 0 deletions assets/optional/observability/00-namespace.yaml
@@ -0,0 +1,4 @@
kind: Namespace
apiVersion: v1
metadata:
  name: openshift-observability
5 changes: 5 additions & 0 deletions assets/optional/observability/01-service-account.yaml
@@ -0,0 +1,5 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: openshift-observability
  name: openshift-observability-client
66 changes: 66 additions & 0 deletions assets/optional/observability/02-cluster-role.yaml
@@ -0,0 +1,66 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: openshift-observability-client
rules:
# OpenTelemetry Component: kubeletstats receiver
- apiGroups: [""]
  resources: ["nodes/stats"]
  verbs: ["get"]
# OpenTelemetry Component: k8s_event receiver
- apiGroups:
  - ""
  resources:
  - events
  - namespaces
  - namespaces/status
  - nodes
  - nodes/spec
  - pods
  - pods/status
  - replicationcontrollers
  - replicationcontrollers/status
  - resourcequotas
  - services
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - daemonsets
  - deployments
  - replicasets
  - statefulsets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - jobs
  - cronjobs
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - get
  - list
  - watch
12 changes: 12 additions & 0 deletions assets/optional/observability/03-cluster-role-binding.yaml
@@ -0,0 +1,12 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: openshift-observability-client
  namespace: openshift-observability
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: openshift-observability-client
subjects:
- kind: User
  name: openshift-observability-client
7 changes: 7 additions & 0 deletions assets/optional/observability/kustomization.yaml
@@ -0,0 +1,7 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- 00-namespace.yaml
- 01-service-account.yaml
- 02-cluster-role.yaml
- 03-cluster-role-binding.yaml

Some generated files are not rendered by default.
@@ -0,0 +1 @@
enable microshift-observability.service
Member:

So this means the service will be enabled upon installation?
We don't do that with microshift.service, so given the dependencies between them, what happens if microshift.service is not enabled by the user?

Saw the explanation elsewhere.
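For background on the question above: a preset file is a one-line policy that `systemctl preset` consults, and the RPM scriptlet macros apply it on initial install. A sketch of the mechanism (the installed path follows the usual `_presetdir` convention and is illustrative):

```ini
# /usr/lib/systemd/system-preset/90-enable-microshift-observability.preset
# "%systemd_post microshift-observability.service" in the spec file runs
# "systemctl preset" on first install; this line tells it to enable the unit.
enable microshift-observability.service
```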

15 changes: 15 additions & 0 deletions packaging/observability/microshift-observability.service
@@ -0,0 +1,15 @@
[Unit]
Description=MicroShift Observability
After=microshift.service
Contributor:

Should we use ConditionPathExists here for all the files the service expects to exist before it starts?

Contributor Author:

The opentelemetry-collector performs that check for us each time it starts.

Contributor:

Right, but the point of the condition in systemd is to not attempt starting the service at all if the path does not exist.

Contributor:

Could this help to avoid unnecessary restarts?

PartOf=microshift.service
AssertPathExists=/var/lib/microshift/resources/observability-client/kubeconfig

[Service]
Environment=KUBECONFIG=/var/lib/microshift/resources/observability-client/kubeconfig
Environment=K8S_NODE_NAME="%l"
ExecStart=/usr/bin/opentelemetry-collector --config=/etc/microshift/opentelemetry-collector.yaml
Restart=always
User=root

[Install]
RequiredBy=microshift.service
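On the ConditionPathExists question in the thread above: the two systemd directives differ in failure semantics, which matters when combined with Restart=always. A sketch of the Condition variant (illustrative, not the merged unit):

```ini
[Unit]
Description=MicroShift Observability (illustrative variant)
After=microshift.service
PartOf=microshift.service
# ConditionPathExists=: if the kubeconfig is missing, systemd silently skips
# the start; the unit shows as inactive rather than failed, and no restart
# loop is triggered.
ConditionPathExists=/var/lib/microshift/resources/observability-client/kubeconfig
# AssertPathExists= (what the PR uses) instead fails the start job loudly,
# making the missing file visible in systemctl status and the journal.

[Service]
ExecStart=/usr/bin/opentelemetry-collector --config=/etc/microshift/opentelemetry-collector.yaml
Restart=always
```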
97 changes: 97 additions & 0 deletions packaging/observability/opentelemetry-collector.yaml
@@ -0,0 +1,97 @@
# opentelemetry-collector.yaml provides a minimal set of metrics and logs for monitoring system, node, and
# workload resource usage for CPU, memory, disk, and network. Also included are all cluster events of type "Warning".

# This configuration exports:
# - Container, pod, and node metrics
# - Kubernetes events (warnings only)
# - Host CPU, memory, disk, and network metrics
# - System journals for selected MicroShift services and dependencies, priority < Info

receivers:
  kubeletstats:
    auth_type: tls
    ca_file: /var/lib/microshift/certs/ca-bundle/client-ca.crt
    key_file: /var/lib/microshift/certs/admin-kubeconfig-signer/openshift-observability-client/client.key
    cert_file: /var/lib/microshift/certs/admin-kubeconfig-signer/openshift-observability-client/client.crt
    insecure_skip_verify: true
    collection_interval: 20s
    endpoint: "${env:K8S_NODE_NAME}:10250"
    node: ${env:K8S_NODE_NAME}
    k8s_api_config:
      auth_type: kubeConfig
  k8s_events:
    auth_type: kubeConfig
  hostmetrics:
    root_path: /
    collection_interval: 10s
    scrapers:
      cpu:
      memory:
      network:
      disk:
      filesystem:
  journald:
    units:
      - microshift
      - microshift-observability
      - microshift-etcd
      - crio
      - openvswitch.service

Member:

There are also ovsdb-server.service and ovs-vswitchd.service; should we include them?
In my experience, I've seen ovsdb-server failing more often than the others.

      - ovsdb-server.service
      - ovs-vswitchd.service

processors:
  batch:
  resourcedetection/system:
    detectors: [ "system" ]
    system:
      hostname_sources: [ "os" ]

exporters:
  otlp:
    sending_queue:
      storage: file_storage
    # Endpoint must point to an IP address or hostname and port of an OTLP service. Here, K8S_NODE_NAME is used
    # because it will be resolved to the local node's hostname automatically. An unreachable endpoint will be
    # reported in the logs of the microshift-observability service.
    endpoint: ${env:K8S_NODE_NAME}:4317
    tls:
      insecure: true

extensions:
  file_storage:
    directory: /var/lib/microshift-observability

service:
  extensions: [ file_storage ]
  pipelines:
    metrics/kubeletstats:
      receivers: [ kubeletstats ]
      processors: [ batch ]
      exporters: [ otlp ]
    metrics/hostmetrics:
      receivers: [ hostmetrics ]
      processors: [ resourcedetection/system, batch ]
      exporters: [ otlp ]
    logs/kube_events:
      receivers: [ k8s_events ]
      processors: [ resourcedetection/system, batch ]
      exporters: [ otlp ]
    logs/host:
      receivers: [ hostmetrics ]
      processors: [ resourcedetection/system ]
      exporters: [ otlp ]
    logs/journald:
      receivers: [ journald ]
      processors: [ resourcedetection/system ]
      exporters: [ otlp ]
  telemetry:
    metrics:
      readers:
        - periodic:
            exporter:
              otlp:
                # Endpoint must point to an IP address or hostname and port of an OTLP service. Here, K8S_NODE_NAME
                # is used because it will be resolved to the local node's hostname automatically. An unreachable
                # endpoint will be reported in the logs of the microshift-observability service.
                protocol: http/protobuf
                endpoint: http://${env:K8S_NODE_NAME}:4318
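The spec file notes that a user-defined configuration at /etc/microshift/opentelemetry-collector.yaml replaces this default when present. A minimal user override might look like the following sketch (the remote gateway endpoint is illustrative, not from the PR):

```yaml
# Hypothetical user-supplied /etc/microshift/opentelemetry-collector.yaml:
# collect host CPU and memory only, and forward to a remote OTLP endpoint.
receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      memory:
exporters:
  otlp:
    endpoint: otel-gateway.example.com:4317   # illustrative address
    tls:
      insecure: true
service:
  pipelines:
    metrics:
      receivers: [ hostmetrics ]
      exporters: [ otlp ]
```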
37 changes: 37 additions & 0 deletions packaging/rpm/microshift.spec
@@ -238,6 +238,17 @@
release. These files contain the list of container image references used by Model Serving
and can be used to embed those images into osbuilder blueprints or bootc containerfiles.

%package observability
Summary: OpenTelemetry-Collector configured for MicroShift
BuildArch: noarch
Requires: microshift = %{version}
Requires: opentelemetry-collector

%description observability
Deploys the Red Hat build of OpenTelemetry-Collector as a systemd service on the host. MicroShift provides client
certificates to permit access to the kube-apiserver metrics endpoints. If a user-defined OpenTelemetry-Collector
configuration exists at /etc/microshift/opentelemetry-collector.yaml, that configuration is used. Otherwise, a default
configuration is provided.

%prep
# Dynamic detection of the available golang version also works for non-RPM golang packages
golang_detected=$(go version | awk '{print $3}' | tr -d '[a-z]' | cut -f1-2 -d.)
@@ -553,6 +564,15 @@
mkdir -p -m755 %{buildroot}%{_datadir}/microshift/release
install -p -m644 assets/optional/ai-model-serving/release-ai-model-serving-x86_64.json %{buildroot}%{_datadir}/microshift/release/

#observability
install -d -m755 %{buildroot}%{_presetdir}
install -d -m755 %{buildroot}%{_sharedstatedir}/microshift-observability
install -p -m644 packaging/observability/opentelemetry-collector.yaml -D %{buildroot}%{_sysconfdir}/microshift/opentelemetry-collector.yaml
install -p -m644 packaging/observability/microshift-observability.service %{buildroot}%{_unitdir}/
install -p -m644 packaging/observability/90-enable-microshift-observability.preset %{buildroot}%{_presetdir}/
install -d -m755 %{buildroot}/%{_prefix}/lib/microshift/manifests.d/003-microshift-observability/
install -p -m644 assets/optional/observability/*.yaml %{buildroot}/%{_prefix}/lib/microshift/manifests.d/003-microshift-observability/

%pre networking

getent group hugetlbfs >/dev/null || groupadd -r hugetlbfs
@@ -610,6 +630,12 @@ if [ $1 -eq 1 ]; then
systemctl is-active --quiet crio && systemctl restart --quiet crio || true
fi

%post observability
%systemd_post microshift-observability.service

%preun observability
%systemd_preun microshift-observability.service

%files
%license LICENSE
%{_bindir}/microshift
@@ -727,10 +753,21 @@ fi
%files ai-model-serving-release-info
%{_datadir}/microshift/release/release-ai-model-serving-x86_64.json

%files observability
%dir %{_prefix}/lib/microshift/manifests.d/003-microshift-observability
%dir %{_sharedstatedir}/microshift-observability
%{_unitdir}/microshift-observability.service
%{_presetdir}/90-enable-microshift-observability.preset
Contributor:

Why do we need this preset?

Contributor Author:

If the observability RPM were installed without the preset, the service would be disabled by default, and the systemctl enable microshift command doesn't propagate to dependencies. That, coupled with the service's RequiredBy=microshift.service setting, means that microshift itself can't start until the user manually enables the observability service.

This way, the user isn't required to manage the service; it's handled automatically.

Contributor:

What about the other way around? When microshift and observability are both installed, microshift is disabled by default but observability is enabled. So the observability service will fail on reboot.

Contributor (@ggiguash, Mar 26, 2025):

We already have the following in the observability unit configuration.
Isn't it enough to couple service start/stop?

[Unit]
PartOf=microshift.service
After=microshift.service

Contributor Author:

> Isn't it enough to couple service start / stop?

Well, no. For the RequiredBy= directive to take effect, the service must be enabled. But as you said, if it's enabled, then after a reboot it would start and enter a failure loop. This is fixed by applying After=microshift.service.

So the .preset file plus RequiredBy= is what activates the dependency. After= ensures that the service doesn't start until microshift is active, and PartOf= propagates stops and restarts of the microshift service to the observability service.

But I think in the end this was a long walk to accomplish (basically) the same thing as adding microshift-observability to the microshift.service unit's Wants= directive.

%{_sysconfdir}/microshift/opentelemetry-collector.yaml
%{_prefix}/lib/microshift/manifests.d/003-microshift-observability/*
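The Wants= alternative mentioned in the review thread above would put the weak dependency in microshift.service itself rather than shipping a preset. A sketch of what that could look like (hypothetical drop-in, not what was merged):

```ini
# Hypothetical drop-in, e.g.
# /usr/lib/systemd/system/microshift.service.d/10-observability.conf
[Unit]
# Wants= starts microshift-observability.service whenever microshift starts,
# but a collector failure does not take microshift down (unlike Requires=).
Wants=microshift-observability.service
```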


# Use Git command to generate the log and replace the VERSION string
# LANG=C git log --date="format:%a %b %d %Y" --pretty="tformat:* %cd %an <%ae> VERSION%n- %s%n" packaging/rpm/microshift.spec
%changelog
* Mon Mar 17 2025 Jon Cope <[email protected]> 4.19.0
- Add an optional microshift-observability RPM to enable an OpenTelemetry collector preconfigured for MicroShift

* Thu Feb 13 2025 Patryk Matuszak <[email protected]> 4.19.0
- Add new RPM with AI Model Serving for MicroShift

22 changes: 21 additions & 1 deletion pkg/cmd/init.go
@@ -154,7 +154,15 @@ func certSetup(cfg *config.Config) (*certchains.CertificateChains, error) {
			ValidityDays: cryptomaterial.LongLivedCertificateValidityDays,
		},
		UserInfo: &user.DefaultInfo{Name: "system:admin", Groups: []string{"system:masters"}},
	}).WithClientCertificates(
		&certchains.ClientCertificateSigningRequestInfo{
			CSRMeta: certchains.CSRMeta{
				Name:         "openshift-observability-client",
				ValidityDays: cryptomaterial.ShortLivedCertificateValidityDays,
			},
			UserInfo: &user.DefaultInfo{Name: "openshift-observability-client", Groups: []string{""}},
		},
	),

	// kubelet + CSR signing chain
	certchains.NewCertificateSigner(
@@ -536,6 +544,18 @@ func initKubeconfigs(
	); err != nil {
		return err
	}
	observabilityClientCertPEM, observabilityClientKeyPEM, err := certChains.GetCertKey("admin-kubeconfig-signer", "openshift-observability-client")
	if err != nil {
		return err
	}
	if err := util.KubeConfigWithClientCerts(
		cfg.KubeConfigPath(config.ObservabilityClient),
		cfg.ApiServer.URL,
		internalTrustPEM,
		observabilityClientCertPEM, observabilityClientKeyPEM,
	); err != nil {
		return err
	}
	return nil
}

1 change: 1 addition & 0 deletions pkg/config/kubeconfig.go
@@ -12,6 +12,7 @@ const (
	Kubelet                 KubeConfigID = "kubelet"
	ClusterPolicyController KubeConfigID = "cluster-policy-controller"
	RouteControllerManager  KubeConfigID = "route-controller-manager"
	ObservabilityClient     KubeConfigID = "observability-client"
)

// KubeConfigPath returns the path to the specified kubeconfig file.
9 changes: 9 additions & 0 deletions scripts/auto-rebase/assets.yaml
@@ -294,3 +294,12 @@ assets:
      - file: 05-daemonset.yaml
      - file: release-kube-proxy-aarch64.json
      - file: release-kube-proxy-x86_64.json

  - dir: optional/observability/
    ignore: "they don't exist in upstream repository - only in microshift"
    files:
      - file: 00-namespace.yaml
      - file: 01-service-account.yaml
      - file: 02-cluster-role.yaml
      - file: 03-cluster-role-binding.yaml
      - file: kustomization.yaml
@@ -21,7 +21,8 @@ RUN dnf repoinfo --enabled && \
"microshift-ai-model-serving-{{ .Env.SOURCE_VERSION }}" \
"microshift-ai-model-serving-release-info-{{ .Env.SOURCE_VERSION }}" \
# {{- end }}
"microshift-gateway-api-{{ .Env.SOURCE_VERSION }}" && \
"microshift-gateway-api-{{ .Env.SOURCE_VERSION }}" \
"microshift-observability-{{ .Env.SOURCE_VERSION }}" && \
rm -vf /etc/yum.repos.d/microshift-*.repo && \
rm -rvf $USHIFT_RPM_REPO_PATH && \
dnf clean all
@@ -16,7 +16,8 @@ COPY ./bootc-images/$USHIFT_RPM_REPO_NAME.repo ./bootc-images/microshift-centos9
RUN dnf repoinfo --enabled && \
dnf install -y "microshift-olm-{{ .Env.SOURCE_VERSION }}" \
"microshift-multus-{{ .Env.SOURCE_VERSION }}" \
"microshift-gateway-api-{{ .Env.SOURCE_VERSION }}" && \
"microshift-gateway-api-{{ .Env.SOURCE_VERSION }}" \
"microshift-observability-{{ .Env.SOURCE_VERSION }}" && \
rm -vf /etc/yum.repos.d/microshift-*.repo && \
rm -rvf $USHIFT_RPM_REPO_PATH && \
dnf clean all