MCP Observability gives your AI agents—and you—a single HTTPS endpoint to explore logs, metrics and traces for any application, without learning half-a-dozen vendor APIs.
- Standards-based: OpenTelemetry ingestion, Grafana/Loki/Tempo storage
- AI-ready: implements the Model Context Protocol (MCP) so agents can ask natural-language questions about performance issues
- Zero lock-in: self-host via Docker Compose, Kubernetes (Helm) or any cloud container platform
Imagine asking your favourite coding agent:
"The /checkout endpoint went 💥 at 09:33 UTC. Why?"
The agent queries the MCP endpoint, pulls traces, error logs and latency metrics, and replies with a root-cause analysis (RCA) and a pull-request suggestion. That is what this repo enables.
Choose the deployment style that fits you:
| Environment | Quick start | Full guide |
|---|---|---|
| Local laptop | `docker compose -f mcp-obs.yml up -d` | Docker guide |
| Kubernetes | `helm install mcp charts/mcp-obs` | Kubernetes & Helm |
| Cloud (ECS, Cloud Run, Azure CA) | see Terraform/CLI snippets | Cloud deployment |
After the stack is up:
Note – the Compose file now pulls the pre-built image `pmcfadin/mcp-observability:latest` from Docker Hub; no local Docker build is required.

    # Example value only: MCP_TOKEN must match the token you passed during install
    export MCP_TOKEN="$(openssl rand -hex 16)"

    # Health check (-k accepts the self-signed dev certificate)
    curl -k -H "Authorization: Bearer $MCP_TOKEN" https://<HOST>:8000/health

Open Grafana at https://<HOST>:3000 (admin / $GF_ADMIN_PASSWORD) to browse dashboards.
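
The same health check can be scripted. The following is a minimal Python sketch using only the standard library; the helper names (`build_health_request`, `insecure_dev_context`, `check_health`) are illustrative and not part of this project, and port 8000 is taken from the curl example above. Skipping certificate verification mirrors `curl -k` and is only reasonable for the self-signed dev setup.

```python
import ssl
import urllib.request


def build_health_request(host: str, token: str, port: int = 8000) -> urllib.request.Request:
    """Build a GET request for /health carrying the Bearer token."""
    url = f"https://{host}:{port}/health"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def insecure_dev_context() -> ssl.SSLContext:
    """Rough equivalent of `curl -k`: skip certificate checks (dev only)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before verify_mode is relaxed
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def check_health(host: str, token: str) -> int:
    """Return the HTTP status of /health; urllib raises HTTPError on 4xx/5xx."""
    req = build_health_request(host, token)
    with urllib.request.urlopen(req, context=insecure_dev_context(), timeout=5) as resp:
        return resp.status


# Usage (requires a running stack):
#   import os
#   print(check_health("localhost", os.environ["MCP_TOKEN"]))
```

A wrong or missing token surfaces as `urllib.error.HTTPError` with code 401 rather than a return value, so callers that want to report it should catch that exception.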
All requests use HTTPS and a Bearer token. Key endpoints:
| Path | What it's for |
|---|---|
| `/logs/errors?limit=100` | Latest error logs from Loki |
| `/metrics/latency?percentile=0.95` | 95th-percentile latency from Prometheus |
| `/resources` | Metadata describing the data sources your agent can query |
| `/prompts` | Parameterised prompt templates you can re-use |
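
As a rough illustration of calling the endpoints in the table, the Python sketch below builds the URLs and fetches a response. It makes several assumptions: the helper names `endpoint_url` and `fetch_json` are hypothetical, port 8000 is borrowed from the health-check example, and the response body is assumed to be JSON, which this README does not state for every endpoint.

```python
import json
import ssl
import urllib.request
from urllib.parse import urlencode


def endpoint_url(host: str, path: str, port: int = 8000, **params) -> str:
    """Compose an endpoint URL, appending query parameters when given."""
    query = f"?{urlencode(params)}" if params else ""
    return f"https://{host}:{port}{path}{query}"


def fetch_json(url: str, token: str, verify_tls: bool = True):
    """GET the URL with a Bearer token and decode the body as JSON (assumed format)."""
    ctx = ssl.create_default_context()
    if not verify_tls:
        # Dev only: accept the self-signed certificate, like `curl -k`.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires a running stack):
#   errors = fetch_json(endpoint_url("localhost", "/logs/errors", limit=100),
#                       token, verify_tls=False)
```

Keeping URL construction separate from the network call makes it easy to log or unit-test the exact request an agent is about to send.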
"Retrieve the top three slowest routes over the last hour and suggest an optimisation."
A request like this becomes an MCP SamplingRequest under the hood; the server stitches together data from Prometheus and Tempo and returns JSON your agent can read.
Pick your language guide and drop in the ready-made snippet:
- Python & FastAPI – docs/observability/python_fastapi.md
- Java / Spring Boot – docs/observability/java_spring.md
- Node.js – docs/observability/nodejs.md
- Go – docs/observability/go.md
- Next.js – docs/observability/nextjs.md
- React SPA – docs/observability/react.md
Need only a subset (e.g. logs-only)? Use the Partial-deployment guide.
- Bearer token – set `MCP_TOKEN` in your deployment and pass it in `Authorization` headers.
- TLS – self-signed by default (dev), or plug in cert-manager / ACM / Cloud CA.
- mTLS (optional) – enable client cert auth by mounting your CA & toggling `security.mtls=true` in `values.yaml`.
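
On the client side, the three options above map onto how the TLS context is built. The following Python sketch is one possible arrangement, not the project's own client code: `client_tls_context` and its parameters are hypothetical names, and the file paths in the usage comment are placeholders.

```python
import ssl
from typing import Optional


def client_tls_context(
    ca_file: Optional[str] = None,
    client_cert: Optional[str] = None,
    client_key: Optional[str] = None,
    insecure_dev: bool = False,
) -> ssl.SSLContext:
    """Build a client-side TLS context for talking to the MCP endpoint.

    ca_file       trust this CA bundle instead of the system store
    client_cert   present this certificate to the server (mTLS)
    client_key    private key for client_cert, if stored separately
    insecure_dev  skip verification for the self-signed dev certificate
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    if insecure_dev:
        ctx.check_hostname = False  # must be disabled before relaxing verify_mode
        ctx.verify_mode = ssl.CERT_NONE
    if client_cert:
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx


# Usage with mTLS enabled (placeholder paths):
#   ctx = client_tls_context(ca_file="ca.pem", client_cert="client.pem",
#                            client_key="client.key")
#   urllib.request.urlopen(request, context=ctx)
```

The Bearer token is still sent in the `Authorization` header when mTLS is on; the client certificate is an additional check, not a replacement.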
| Symptom | Check | Fix |
|---|---|---|
| `401 Unauthorized` | Correct token? | Pass `-H "Authorization: Bearer $MCP_TOKEN"` |
| No traces showing | Otel-collector healthy? Endpoint URL correct in client? | Verify port 4318 is reachable |
| Grafana empty dashboards | Components disabled | See Partial deployment |
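
For the "No traces showing" row, a quick way to confirm that the collector port accepts connections is a plain TCP probe. The Python sketch below uses a hypothetical helper, `port_reachable`; 4318 is the conventional OTLP/HTTP port, and a successful connection only shows that something is listening there, not that the collector is healthy.

```python
import socket


def port_reachable(host: str, port: int = 4318, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


# Usage:
#   print(port_reachable("localhost"))        # collector on the default port
#   print(port_reachable("localhost", 8000))  # the MCP endpoint itself
```

Run the probe from the machine or container where the instrumented application lives, since a port that is open locally may still be blocked by a network policy or firewall elsewhere.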
Want to contribute? Fantastic 🎉 Head over to docs/contributing for architecture diagrams, dev-environment setup, and the contributor workflow. The README you're reading will stay laser-focused on users and agents.