
feat(otel): add OpenTelemetry ingest, query, and frontend traces UI #18

Open
dviejokfs wants to merge 4 commits into main from feat/add-otel

@dviejokfs
Contributor

Description

Add a complete OpenTelemetry observability stack to Temps, covering OTLP/HTTP protobuf ingest, TimescaleDB storage, and a frontend traces visualization UI.

Backend (temps-otel crate)

  • OTLP/HTTP protobuf ingest for traces, metrics, and logs (gzip/zstd decompression)
  • Dual auth: API keys (tk_) and deployment tokens (dt_), with header-based and path-based ingest routes
  • TimescaleDB storage with hypertables, continuous aggregates, compression, and retention
  • Query API: filter/list spans, get trace, query metrics with time_bucket, query logs, pipeline stats, health summaries, insights
  • Rate limiting, storage quota checks, anomaly detection, health compute service
  • OpenAPI annotations on all 12+ endpoints, 117 passing unit tests
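
For reviewers, the dual-auth bullet can be pictured roughly as follows. This is a minimal sketch, not the code in `temps-otel`: the `IngestCredential` enum and `classify_credential` function are hypothetical names, and the only facts taken from this PR are the `tk_` and `dt_` prefixes. Accepting an optional `Bearer ` prefix is an assumption about how the header-based route passes the token.

```rust
/// Hypothetical classification of an ingest credential by its prefix.
/// Only the `tk_` / `dt_` prefixes come from the PR description.
#[derive(Debug, PartialEq, Eq)]
enum IngestCredential<'a> {
    /// API key, prefixed with `tk_`.
    ApiKey(&'a str),
    /// Deployment token, prefixed with `dt_`.
    DeploymentToken(&'a str),
}

/// Accepts either a bare token (path-based route) or a header value such as
/// `Bearer <token>` (assumed header shape), and returns `None` for anything
/// that does not carry a known prefix followed by at least one character.
fn classify_credential(raw: &str) -> Option<IngestCredential<'_>> {
    let token = raw.strip_prefix("Bearer ").unwrap_or(raw).trim();
    if token.len() > 3 && token.starts_with("tk_") {
        Some(IngestCredential::ApiKey(token))
    } else if token.len() > 3 && token.starts_with("dt_") {
        Some(IngestCredential::DeploymentToken(token))
    } else {
        None
    }
}
```

Classifying once at the edge lets both ingest routes share the same downstream permission and quota checks; the actual lookup of the key or token in storage is out of scope for this sketch.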

Auth & Permissions

  • OtelRead/OtelWrite permissions added
  • deployment_id added to deployment tokens for full OTel context propagation
  • Migration for the new column

Frontend

  • Traces list with filtering (time range, service, status, trace ID search) and trace-level error aggregation
  • Trace detail with span waterfall visualization, span detail panel, and refresh button
  • Setup section with environment selector, OTLP endpoint, and Next.js code snippets
  • Sidebar nav item added
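
The trace-level error aggregation in the traces list can be sketched as below. The frontend itself is not written in Rust; this is only an illustration of the grouping logic, and `SpanRow`, `TraceSummary`, and `aggregate_by_trace` are hypothetical names. The one detail taken from this PR is that the API returns uppercase status codes (`ERROR`, `OK`).

```rust
use std::collections::BTreeMap;

/// Minimal span shape for illustration; the field names are assumptions.
struct SpanRow {
    trace_id: String,
    /// Uppercase status code as returned by the API, e.g. "OK" or "ERROR".
    status_code: String,
}

/// Per-trace rollup shown in the list view (hypothetical shape).
#[derive(Debug, Default, PartialEq, Eq)]
struct TraceSummary {
    span_count: usize,
    error_count: usize,
}

impl TraceSummary {
    /// A trace is flagged as failed if any of its spans reported ERROR.
    fn has_error(&self) -> bool {
        self.error_count > 0
    }
}

/// Groups spans by trace ID and counts spans and errors per trace.
fn aggregate_by_trace(spans: &[SpanRow]) -> BTreeMap<String, TraceSummary> {
    let mut out: BTreeMap<String, TraceSummary> = BTreeMap::new();
    for span in spans {
        let entry = out.entry(span.trace_id.clone()).or_default();
        entry.span_count += 1;
        // Exact, case-sensitive comparison: the API sends uppercase codes.
        if span.status_code == "ERROR" {
            entry.error_count += 1;
        }
    }
    out
}
```

With this rollup, a trace whose root span is `OK` but which contains a failing child span still shows up as an error in the list.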

Notable fixes

  • Protobuf Span.flags changed from uint32 to fixed32 per OTLP v1.1.0+ spec
  • Removed server-side tail sampling; sampling is the client SDK's responsibility
  • Fixed TraceDetail data extraction (data.data not data.spans)
  • Fixed status code comparisons (uppercase ERROR/OK from API)
  • Fixed waterfall duration label visibility for wide spans
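
To show why the `Span.flags` type matters on the wire, here is a small std-only sketch of how the same value is encoded as `uint32` (varint, wire type 0) versus `fixed32` (four little-endian bytes, wire type 5). The helper names are hypothetical and this is not the crate's generated code. The field number 16 used in the usage note is what the upstream OTLP `trace.proto` assigns to `flags` as far as I recall, so treat it as an assumption.

```rust
/// Protobuf wire types relevant to this fix.
const WIRE_VARINT: u64 = 0;
const WIRE_FIXED32: u64 = 5;

/// Encodes an unsigned integer as a protobuf base-128 varint.
fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        // Low 7 bits with the continuation bit set.
        out.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Encodes a field tag: (field_number << 3) | wire_type, as a varint.
fn encode_tag(field_number: u64, wire_type: u64, out: &mut Vec<u8>) {
    encode_varint((field_number << 3) | wire_type, out);
}

/// Field declared as `uint32`: tag with wire type 0, then a varint value.
fn encode_uint32_field(field_number: u64, value: u32) -> Vec<u8> {
    let mut out = Vec::new();
    encode_tag(field_number, WIRE_VARINT, &mut out);
    encode_varint(u64::from(value), &mut out);
    out
}

/// Field declared as `fixed32`: tag with wire type 5, then 4 LE bytes.
fn encode_fixed32_field(field_number: u64, value: u32) -> Vec<u8> {
    let mut out = Vec::new();
    encode_tag(field_number, WIRE_FIXED32, &mut out);
    out.extend_from_slice(&value.to_le_bytes());
    out
}
```

For field 16 and value `1`, the `uint32` form is `[0x80, 0x01, 0x01]` while the `fixed32` form is `[0x85, 0x01, 0x01, 0x00, 0x00, 0x00]`. Because the tag byte itself differs, a schema declaring the wrong type will not read what a spec-compliant SDK sends; depending on the decoder, the mismatch is typically either skipped or rejected.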

Type of change

  • New feature (non-breaking change that adds functionality)

Checklist

  • I have written tests that cover the changes
  • All new and existing tests pass (cargo test --lib)
  • cargo check --lib passes with no warnings
  • My commits follow the Conventional Commits format
  • I have updated documentation where necessary

Related issues

Ref #17

… integration

- Introduced `temps-otel` crate to the workspace and updated dependencies in `Cargo.toml`.
- Added `OtelRead` and `OtelWrite` permissions to the `Permission` enum in `temps-auth`.
- Registered `OtelPlugin` in the console API for OpenTelemetry metrics, traces, and logs collection.
- Created migration for OpenTelemetry tables in the database.
- Updated relevant files to integrate OpenTelemetry functionality across the application.
- Added `.env` to `.gitignore` to prevent sensitive information from being tracked.
- Updated `Cargo.toml` to include new crates: `temps-environments`, `temps-screenshots`, and `temps-embeddings`.
- Added `tower` and `uuid` dependencies to `Cargo.lock` and `Cargo.toml`.
- Enhanced `CHANGELOG.md` with new features related to PostgreSQL backups and preset providers.
- Updated `docker-compose.yml` for PostgreSQL configuration to support WAL-G for backups.
- Improved CLI error handling and added source map management commands in `temps-cli`.
- Refined analytics event handling and introduced console event ingestion in `temps-analytics-events`.
- Added resource monitoring tab in the project sidebar and a dedicated monitoring settings page with per-environment CPU, memory, and disk metrics.
- Introduced `status_code_class` query parameter for proxy log stats endpoints to filter by status code classes (e.g., "2xx", "3xx").
- Implemented TimescaleDB compression and retention policies for the `proxy_logs` hypertable, optimizing data management.
- Enabled `cargo clippy` pre-commit hook to catch lint issues before CI, improving code quality.
- Updated various components and API types to support new monitoring functionalities and enhance user experience.
- Complete temps-otel crate: OTLP/HTTP protobuf ingest (traces, metrics, logs),
  query handlers, TimescaleDB storage, rate limiting, quota checks, anomaly
  detection, health summaries, and sidecar config generation
- Auth: support tk_ (API key) and dt_ (deployment token) authentication for
  OTel ingest with path-based and header-based routes
- Frontend: Traces list page with filtering (time range, service, status),
  trace detail page with span waterfall visualization and span detail panel,
  setup section with OTLP endpoint and Next.js code snippets
- Add deployment_id to deployment tokens for OTel context propagation
- Fix protobuf Span.flags from uint32 to fixed32 per OTLP v1.1.0+ spec
- Remove server-side tail sampling (sampling is client SDK responsibility)
- Add OtelRead/OtelWrite permissions, plugin registered in console
- 117 passing unit tests, zero clippy warnings