-# OpenTelemetry Kube Stack
+# OpenTelemetry Kube Stack – Base Configuration

-The OpenTelemetry Kube Stack is a comprehensive observability solution that provides a complete OpenTelemetry setup for Kubernetes clusters. It includes the OpenTelemetry Operator, collectors, and essential monitoring components.
+This directory contains the **base manifests** for deploying the [OpenTelemetry Kube Stack](https://opentelemetry.io/), a **unified observability framework** for collecting, processing, and exporting **traces and logs** from Kubernetes workloads and infrastructure components.
+It is designed to be **consumed by cluster repositories** as a remote base, allowing each cluster to apply **custom overrides** as needed.
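+
+As a minimal sketch of how a cluster repository might consume this base, the following Kustomization pulls it in as a remote resource and applies a cluster-specific patch. The repository URL, path, ref, and patch target name are hypothetical placeholders, not values shipped in this base:
+
+```yaml
+# kustomization.yaml in a cluster repository (illustrative values only)
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+namespace: observability
+resources:
+  # Remote base reference; replace with the real repository URL, path, and ref
+  - https://github.com/example-org/k8s-infrastructure//bases/opentelemetry-kube-stack?ref=main
+patches:
+  # Example cluster-specific override: raise collector memory limits
+  - patch: |-
+      apiVersion: opentelemetry.io/v1beta1
+      kind: OpenTelemetryCollector
+      metadata:
+        name: daemon
+      spec:
+        resources:
+          limits:
+            memory: 1Gi
+    target:
+      kind: OpenTelemetryCollector
+      name: daemon
+```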

-## Overview
+---

-This chart deploys:
-- **OpenTelemetry Operator**: Manages OpenTelemetry collectors and instrumentation
-- **OpenTelemetry Collector**: Collects, processes, and exports telemetry data
-- **Kube State Metrics**: Exposes cluster-level metrics about Kubernetes objects
-- **Node Exporter**: Collects hardware and OS metrics from cluster nodes
+## About OpenTelemetry Kube Stack

-## Configuration
-
-### Chart Information
-- **Chart**: opentelemetry-kube-stack
-- **Version**: 0.11.1
-- **App Version**: 0.129.1
-- **Repository**: https://open-telemetry.github.io/opentelemetry-helm-charts
-
-### Namespace
-Deployed in the `observability` namespace alongside other monitoring components.
-
-### Security Hardening
-
-The deployment includes comprehensive security configurations:
-
-#### Container Security
-- Non-root execution (`runAsNonRoot: true`)
-- Specific user ID (`runAsUser: 65534`)
-- Security profiles (`seccompProfile.type: RuntimeDefault`)
-- Capability dropping (`capabilities.drop: [ALL]`)
-- Read-only root filesystem (`readOnlyRootFilesystem: true`)
-- Privilege escalation disabled (`allowPrivilegeEscalation: false`)
-
-#### Resource Management
-- CPU and memory limits defined for all components
-- Resource requests set for proper scheduling
-- Memory limiter processor configured for collectors
-
-#### Network Security
-- OTLP receivers configured on standard ports (4317/4318)
-- Service monitors enabled for Prometheus integration
-- Node selectors for Linux-only deployment
-
-### Key Features
-
-#### OpenTelemetry Operator
-- Manages collector lifecycle and configuration
-- Supports auto-instrumentation for applications
-- Webhook-based configuration validation
-
-#### Collector Configuration
-- OTLP receivers for traces, metrics, and logs
-- Batch processing for efficient data handling
-- Memory limiting to prevent resource exhaustion
-- Logging exporter for initial setup (can be customized)
-
-#### Monitoring Integration
-- Prometheus ServiceMonitor resources enabled
-- Kube State Metrics for cluster-level observability
-- Node Exporter for infrastructure metrics
-- Compatible with existing Prometheus stack
-
-### Customization
-
-#### Collector Configuration
-The default collector configuration can be extended by modifying the `config` section in the hardened values file. Common customizations include:
-
-```yaml
-config:
-  exporters:
-    otlp:
-      endpoint: "your-backend:4317"
-      tls:
-        insecure: false
-    prometheusremotewrite:
-      endpoint: "https://prometheus.example.com/api/v1/write"
-```
-
-#### Resource Scaling
-Adjust resource limits based on cluster size and telemetry volume:
-
-```yaml
-resources:
-  requests:
-    memory: "256Mi"
-    cpu: "200m"
-  limits:
-    memory: "1Gi"
-    cpu: "1000m"
-```
-
-### Dependencies
-
-This chart has dependencies on:
-- OpenTelemetry CRDs (installed automatically)
-- Kubernetes 1.19+ for proper ServiceMonitor support
-- Prometheus Operator (for ServiceMonitor resources)
-
-### Compatibility
-
-#### Existing Services
-The configuration is designed to work alongside existing observability services:
-- **kube-prometheus-stack**: Kubernetes service monitors disabled to avoid conflicts
-- **Prometheus CRDs**: Installation disabled (uses existing CRDs)
-- **Grafana**: Compatible with OpenTelemetry data sources
-
-#### OpenTelemetry Operator
-This deployment may conflict with the existing `opentelemetry-operator` service. Consider:
-- Using this as a replacement for the standalone operator
-- Disabling the operator component if only collectors are needed
-- Coordinating CRD management between deployments
-
-### Monitoring and Observability
-
-#### Health Checks
-Monitor the deployment status:
-```bash
-kubectl get helmrelease opentelemetry-kube-stack -n observability
-kubectl get pods -n observability -l app.kubernetes.io/name=opentelemetry-kube-stack
-```
-
-#### Collector Status
-Check OpenTelemetry collector status:
-```bash
-kubectl get opentelemetrycollector -n observability
-kubectl logs -n observability -l app.kubernetes.io/component=opentelemetry-collector
-```
-
-#### Metrics Availability
-Verify metrics collection:
-```bash
-kubectl port-forward -n observability svc/opentelemetry-kube-stack-collector 8888:8888
-curl http://localhost:8888/metrics
-```
-
-### Troubleshooting
-
-#### Common Issues
-
-1. **CRD Conflicts**: If OpenTelemetry CRDs already exist, disable installation:
-   ```yaml
-   crds:
-     installOtel: false
-   ```
-
-2. **Resource Constraints**: Increase resource limits if collectors are OOMKilled:
-   ```yaml
-   resources:
-     limits:
-       memory: "1Gi"
-   ```
-
-3. **Webhook Failures**: If admission webhooks cause issues:
-   ```yaml
-   opentelemetry-operator:
-     admissionWebhooks:
-       failurePolicy: "Ignore"
-   ```
-
-#### Debug Commands
-```bash
-# Check operator logs
-kubectl logs -n observability -l app.kubernetes.io/name=opentelemetry-operator
-
-# Describe collector resources
-kubectl describe opentelemetrycollector -n observability
-
-# Check service monitor status
-kubectl get servicemonitor -n observability
-```
-
-### Integration Examples
-
-#### Application Instrumentation
-Enable auto-instrumentation for applications:
-```yaml
-apiVersion: opentelemetry.io/v1alpha1
-kind: Instrumentation
-metadata:
-  name: my-instrumentation
-spec:
-  exporter:
-    endpoint: http://opentelemetry-kube-stack-collector:4317
-  propagators:
-  - tracecontext
-  - baggage
-```
-
-#### Custom Exporters
-Configure exporters for your observability backend:
-```yaml
-config:
-  exporters:
-    jaeger:
-      endpoint: jaeger-collector:14250
-      tls:
-        insecure: true
-    prometheus:
-      endpoint: "0.0.0.0:8889"
-```
-
-This deployment provides a solid foundation for OpenTelemetry-based observability in Kubernetes environments with enterprise-grade security and monitoring capabilities.
+- Provides a **complete observability foundation** for Kubernetes clusters, integrating **traces and logs** under a single open standard.
+- Deployed using the **OpenTelemetry Operator**, which manages collectors, instrumentation, and telemetry pipelines declaratively via Kubernetes manifests (see the collector sketch after this list).
+- Collects telemetry data from:
+  - **Kubernetes system components** (API server, kubelet, scheduler, etc.)
+  - **Application workloads** instrumented with OpenTelemetry SDKs or auto-instrumentation (see the `Instrumentation` sketch after this list).
+- Processes data through **OpenTelemetry Collectors**, which perform transformation, filtering, batching, and enrichment before export.
+- Supports multiple backends, including **Prometheus**, **Tempo**, **Loki**, **Grafana**, **Jaeger**, and **OTLP-compatible endpoints**.
+- Enables **auto-discovery and dynamic configuration** for Kubernetes workloads, simplifying instrumentation and reducing manual setup.
+- Designed for **scalability and resilience**, supporting both **agent** and **gateway** modes for distributed telemetry collection.
+- Natively integrates with **Grafana** and other observability tools for unified dashboards and correlation between metrics, traces, and logs.
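+
+As a concrete reference for the collector and pipeline bullets above, here is a minimal sketch of an `OpenTelemetryCollector` resource the operator could manage. The resource name, backend endpoint, and chosen mode are illustrative assumptions, not values shipped in this base:
+
+```yaml
+# Hypothetical collector: receives OTLP traffic and forwards it to a backend.
+apiVersion: opentelemetry.io/v1beta1
+kind: OpenTelemetryCollector
+metadata:
+  name: example-gateway
+  namespace: observability
+spec:
+  mode: deployment          # "daemonset" would give agent-per-node collection
+  config:
+    receivers:
+      otlp:
+        protocols:
+          grpc:
+            endpoint: 0.0.0.0:4317
+          http:
+            endpoint: 0.0.0.0:4318
+    processors:
+      batch: {}             # batch telemetry before export for efficiency
+    exporters:
+      otlp:
+        endpoint: tempo.example.internal:4317  # assumed backend address
+    service:
+      pipelines:
+        traces:
+          receivers: [otlp]
+          processors: [batch]
+          exporters: [otlp]
+```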
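+
+And a minimal auto-instrumentation sketch, adapted from the example in the previous version of this README; the collector service name is an assumption that depends on how the collector is actually deployed:
+
+```yaml
+apiVersion: opentelemetry.io/v1alpha1
+kind: Instrumentation
+metadata:
+  name: my-instrumentation
+  namespace: observability
+spec:
+  exporter:
+    # Assumed collector service; adjust to the service name in your cluster
+    endpoint: http://opentelemetry-kube-stack-collector:4317
+  propagators:
+    - tracecontext
+    - baggage
+```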