A high-performance Apache Kafka® consumer group lag exporter written in Rust. Calculates both offset lag and time lag (latency in seconds) with accurate timestamp-based measurements.
- Accurate Time Lag Calculation — Directly reads message timestamps from Kafka® partitions instead of interpolating from lookup tables
- Dual Export Support — Native Prometheus HTTP endpoint (`/metrics`) and OpenTelemetry OTLP export
- Non-blocking Scrapes — Continuous background collection with instant metric reads (no Kafka® calls during Prometheus scrapes)
- Multi-cluster Support — Monitor multiple Kafka® clusters with independent collection loops and failure isolation
- Flexible Filtering — Regex-based whitelist/blacklist for consumer groups and topics
- Configurable Granularity — Topic-level (reduced cardinality) or partition-level metrics
- Custom Labels — Add environment, datacenter, or any custom labels to all metrics
- Full Authentication Support — SASL/PLAIN, SASL/SCRAM, SSL/TLS, and Kerberos via librdkafka
- Production Ready — Health (`/health`) and readiness (`/ready`) endpoints for Kubernetes deployments
- Resource Efficient — Written in Rust with async/await, minimal memory footprint, and bounded concurrency
| Feature | klag-exporter | kafka-lag-exporter | KMinion |
|---|---|---|---|
| Language | Rust | Scala (JVM) | Go |
| Time Lag | Direct timestamp reading | Interpolation from lookup table | Offset-only (requires PromQL) |
| Idle Producer Handling | Shows actual message age | Shows 0 (interpolation fails) | N/A |
| Memory Usage | ~20-50 MB | ~200-500 MB (JVM) | ~50-100 MB |
| Startup Time | < 1 second | 5-15 seconds (JVM warmup) | < 1 second |
| OpenTelemetry | Native OTLP | No | No |
| Blocking Scrapes | No | No | Yes |
Problem with interpolation (kafka-lag-exporter):
- Builds a lookup table of (offset, timestamp) pairs over time
- Interpolates to estimate lag — breaks when producers stop sending
- Shows 0 lag incorrectly for idle topics
- Requires many poll cycles to build accurate tables
Our approach (targeted timestamp sampling):
- Seeks directly to the consumer group's committed offset
- Reads actual message timestamp — always accurate
- Handles idle producers correctly (shows true message age)
- TTL-cached to prevent excessive broker load
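A minimal sketch of this targeted-sampling idea, assuming the `rdkafka` crate (the Rust bindings for the librdkafka library mentioned under authentication). The function below is illustrative, not the exporter's actual internals, and the TTL cache is omitted:

```rust
// Illustrative sketch only, not the exporter's actual code.
// Assumes the `rdkafka` crate (Rust bindings for librdkafka).
use std::time::Duration;

use rdkafka::consumer::{BaseConsumer, Consumer};
use rdkafka::{Message, Offset, TopicPartitionList};

/// Fetch the timestamp (ms since epoch) of the message sitting at a
/// consumer group's committed offset, by seeking straight to it rather
/// than interpolating from an (offset, timestamp) lookup table.
fn message_timestamp_at(
    consumer: &BaseConsumer,
    topic: &str,
    partition: i32,
    committed_offset: i64,
) -> Option<i64> {
    // Assign exactly one partition, positioned at the committed offset.
    let mut tpl = TopicPartitionList::new();
    tpl.add_partition_offset(topic, partition, Offset::Offset(committed_offset))
        .ok()?;
    consumer.assign(&tpl).ok()?;

    // Read the single message at that offset.
    let msg = consumer.poll(Duration::from_secs(5))?.ok()?;

    // Only timestamp metadata is touched; the payload is never inspected.
    msg.timestamp().to_millis()
    // (The real exporter caches this per partition with a TTL to avoid
    // hammering the brokers.)
}
```

Time lag is then simply `now_ms - timestamp_ms`, which stays correct even when the producer has been idle.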
```bash
docker run -d \
  -p 8000:8000 \
  -v $(pwd)/config.toml:/etc/klag-exporter/config.toml \
  klag-exporter:latest \
  --config /etc/klag-exporter/config.toml
```

```bash
# Build from source
cargo build --release

# Run with config file
./target/release/klag-exporter --config config.toml

# Run with debug logging
./target/release/klag-exporter -c config.toml -l debug
```

A complete demo environment with Kafka®, Prometheus, and Grafana is available in the `test-stack/` directory:
```bash
cd test-stack
docker-compose up -d --build

# Access points:
# - Grafana: http://localhost:3000 (admin/admin)
# - Prometheus: http://localhost:9090
# - klag-exporter: http://localhost:8000/metrics
# - Kafka UI: http://localhost:8080
```

Create a `config.toml` file:
```toml
[exporter]
poll_interval = "30s"
http_port = 8000
http_host = "0.0.0.0"
granularity = "partition"  # "topic" or "partition"

[exporter.timestamp_sampling]
enabled = true
cache_ttl = "60s"
max_concurrent_fetches = 10

[exporter.otel]
enabled = false
endpoint = "http://localhost:4317"
export_interval = "60s"

[[clusters]]
name = "production"
bootstrap_servers = "kafka1:9092,kafka2:9092"
group_whitelist = [".*"]
group_blacklist = []
topic_whitelist = [".*"]
topic_blacklist = ["__.*"]  # Exclude internal topics

[clusters.consumer_properties]
"security.protocol" = "SASL_SSL"
"sasl.mechanism" = "PLAIN"
"sasl.username" = "${KAFKA_USER}"
"sasl.password" = "${KAFKA_PASSWORD}"

[clusters.labels]
environment = "production"
datacenter = "us-east-1"
```

Use `${VAR_NAME}` syntax in config values; the exporter substitutes environment variable values at startup.
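As an illustration of what that substitution could look like (a naive sketch; `expand_env` is a hypothetical helper, not the exporter's actual parser):

```rust
// Hypothetical helper showing the ${VAR_NAME} substitution described
// above — not the exporter's actual implementation.
use std::env;

fn expand_env(value: &str) -> String {
    let mut out = value.to_string();
    // Replace each occurrence of ${KEY} with the value of env var KEY.
    for (key, val) in env::vars() {
        out = out.replace(&format!("${{{key}}}"), &val);
    }
    out
}

fn main() {
    // With KAFKA_USER=alice in the environment, prints: user = "alice"
    println!("{}", expand_env("user = \"${KAFKA_USER}\""));
}
```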
| Metric | Labels | Description |
|---|---|---|
| `kafka_partition_latest_offset` | `cluster_name`, `topic`, `partition` | High watermark offset |
| `kafka_partition_earliest_offset` | `cluster_name`, `topic`, `partition` | Low watermark offset |

| Metric | Labels | Description |
|---|---|---|
| `kafka_consumergroup_group_offset` | `cluster_name`, `group`, `topic`, `partition`, `member_host`, `consumer_id`, `client_id` | Committed offset |
| `kafka_consumergroup_group_lag` | `cluster_name`, `group`, `topic`, `partition`, `member_host`, `consumer_id`, `client_id` | Offset lag |
| `kafka_consumergroup_group_lag_seconds` | `cluster_name`, `group`, `topic`, `partition`, `member_host`, `consumer_id`, `client_id` | Time lag in seconds |

| Metric | Labels | Description |
|---|---|---|
| `kafka_consumergroup_group_max_lag` | `cluster_name`, `group` | Max offset lag across partitions |
| `kafka_consumergroup_group_max_lag_seconds` | `cluster_name`, `group` | Max time lag across partitions |
| `kafka_consumergroup_group_sum_lag` | `cluster_name`, `group` | Sum of offset lag |
| `kafka_consumergroup_group_topic_sum_lag` | `cluster_name`, `group`, `topic` | Sum of offset lag per topic |

| Metric | Labels | Description |
|---|---|---|
| `kafka_consumergroup_poll_time_ms` | `cluster_name` | Time to poll all offsets |
| `kafka_lag_exporter_scrape_duration_seconds` | `cluster_name` | Collection cycle duration |
| `kafka_lag_exporter_up` | — | 1 if healthy, 0 otherwise |
| Endpoint | Description |
|---|---|
| `GET /metrics` | Prometheus metrics |
| `GET /health` | Liveness probe (always 200 if running) |
| `GET /ready` | Readiness probe (200 when metrics available) |
| `GET /` | Basic info page |
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Main Application │
│ ┌──────────────┐ ┌──────────────────┐ ┌──────────────────────────────┐ │
│ │ Config │ │ HTTP Server │ │ Metrics Registry │ │
│ │ Loader │ │ (Prometheus + │ │ (In-memory Gauge Store) │ │
│ │ │ │ Health) │ │ │ │
│ └──────────────┘ └──────────────────┘ └──────────────────────────────┘ │
│ ▲ │
│ ┌──────────────────────────────────────────────────────┴────────────────┐ │
│ │ Cluster Manager (per cluster) │ │
│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────┐ │ │
│ │ │ Offset │ │ Timestamp │ │ Metrics │ │ │
│ │ │ Collector │ │ Sampler │ │ Calculator │ │ │
│ │ │ (Admin API) │ │ (Consumer) │ │ │ │ │
│ │ └─────────────────┘ └─────────────────┘ └─────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ Export Layer │ │
│ │ ┌─────────────────────────┐ ┌─────────────────────────────────┐ │ │
│ │ │ Prometheus Exporter │ │ OpenTelemetry Exporter │ │ │
│ │ │ (HTTP /metrics) │ │ (OTLP gRPC) │ │ │
│ │ └─────────────────────────┘ └─────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
```
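The non-blocking scrape path above can be pictured with a rough tokio sketch. All names here are placeholders (`run_cluster_loop`, the `Vec`-based store, the fixed partition list), not the real module layout:

```rust
// Hand-wavy sketch of the background-collection model, not the real
// module layout. Assumes tokio; names and types are placeholders.
use std::sync::Arc;
use std::time::Duration;

use tokio::sync::{RwLock, Semaphore};

/// Stand-in for the in-memory gauge store that the HTTP server reads.
type MetricStore = Arc<RwLock<Vec<(String, f64)>>>;

async fn run_cluster_loop(cluster: String, store: MetricStore, max_fetches: usize) {
    let permits = Arc::new(Semaphore::new(max_fetches));
    let mut tick = tokio::time::interval(Duration::from_secs(30)); // poll_interval

    loop {
        tick.tick().await;
        let mut tasks = Vec::new();
        for partition in 0..12 {
            // (hypothetical fixed partition list, for illustration)
            let permits = Arc::clone(&permits);
            let cluster = cluster.clone();
            tasks.push(tokio::spawn(async move {
                // Bounded concurrency: at most `max_fetches` timestamp
                // fetches in flight at once (max_concurrent_fetches).
                let _permit = permits.acquire_owned().await.expect("semaphore closed");
                // ... fetch committed offset + message timestamp here ...
                (format!("{cluster}/partition{partition}"), 0.0_f64)
            }));
        }

        let mut samples = Vec::new();
        for task in tasks {
            if let Ok(sample) = task.await {
                samples.push(sample);
            }
        }

        // Publish atomically. Prometheus scrapes only read this store,
        // so a scrape never triggers a Kafka call.
        *store.write().await = samples;
    }
}

#[tokio::main]
async fn main() {
    let store: MetricStore = Arc::new(RwLock::new(Vec::new()));
    // One independent loop per cluster gives failure isolation.
    tokio::spawn(run_cluster_loop("production".into(), Arc::clone(&store), 10));
    // ... an HTTP server serving /metrics would read `store` here ...
    tokio::time::sleep(Duration::from_secs(35)).await;
}
```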
- Rust 1.78 or later
- CMake (for librdkafka)
- OpenSSL development libraries
- SASL development libraries (optional, for Kerberos)
```bash
# Debug build
cargo build

# Release build (optimized)
cargo build --release

# Run tests
cargo test

# Run linter
cargo clippy
```

```bash
docker build -t klag-exporter:latest .
```

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: klag-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: klag-exporter
  template:
    metadata:
      labels:
        app: klag-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8000"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: klag-exporter
          image: klag-exporter:latest
          ports:
            - containerPort: 8000
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
          volumeMounts:
            - name: config
              mountPath: /etc/klag-exporter
          env:
            - name: KAFKA_USER
              valueFrom:
                secretKeyRef:
                  name: kafka-credentials
                  key: username
            - name: KAFKA_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: kafka-credentials
                  key: password
      volumes:
        - name: config
          configMap:
            name: klag-exporter-config
---
apiVersion: v1
kind: Service
metadata:
  name: klag-exporter
  labels:
    app: klag-exporter
spec:
  ports:
    - port: 8000
      targetPort: 8000
      name: metrics
  selector:
    app: klag-exporter
```

A pre-built Grafana dashboard is included in `test-stack/grafana/provisioning/dashboards/kafka-lag.json` with:
- Total consumer lag overview
- Max time lag per consumer group
- Per-partition lag breakdown
- Offset progression over time
- Message rate calculations
- Exporter health status
To calculate accurate time lag, klag-exporter fetches messages from Kafka® at the consumer group's committed offset to read the message timestamp. Only the timestamp metadata is extracted — the message payload (key and value) is never inspected, logged, or stored.
| Risk | Level | Notes |
|---|---|---|
| Data exposure in logs | None | Only topic/partition/offset/timestamp logged, never payload |
| Data in memory | Low | Payload briefly in process memory (~ms), then dropped |
| Data exfiltration | None | Payload never sent, stored, or exposed via API |
| Network exposure | Same as any consumer | Use TLS (`security.protocol=SSL`) for encryption |
For sensitive environments:
- Disable timestamp sampling with `timestamp_sampling.enabled = false` — you'll still get offset lag metrics
- Fetch size is limited to 256 KB per partition to minimize data transfer
- Run klag-exporter with minimal privileges and restricted network access
Gaps or missing values in `kafka_consumergroup_group_lag_seconds` are expected when:
- Consumer catches up completely (lag = 0)
- Timestamp cache expires and refetch is in progress
- Kafka® fetch times out
Solutions:
- Increase `cache_ttl` in config
- Use Grafana's "Connect null values" option
- For alerting, use `avg_over_time()` or `last_over_time()`, e.g. `last_over_time(kafka_consumergroup_group_lag_seconds[5m])`
To reduce collection load and metric cardinality:
- Reduce `max_concurrent_fetches`
- Use `granularity = "topic"` instead of `"partition"`
- Add more restrictive `group_blacklist` / `topic_blacklist` patterns
If the exporter cannot connect to a cluster:
- Verify `bootstrap_servers` are reachable
- Check authentication configuration in `consumer_properties`
- Ensure network policies allow connections to Kafka® brokers
MIT License — see LICENSE for details.
Contributions are welcome! Please open an issue or submit a pull request.
Apache®, Apache Kafka®, and Kafka® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. This project is not affiliated with, endorsed by, or sponsored by the Apache Software Foundation. For more information about Apache trademarks, please see the Apache Trademark Policy.
