🚀 Describe the new functionality needed
The telemetry system llama stack has today is a good start, but more work is needed to polish it into something production-ready.
Goals
The goal of this feature is to significantly simplify the telemetry system in llama stack so that:
- The developer experience for testing and capturing telemetry for new or existing services uses OpenTelemetry in a way that is simple and consistent across the project
- The telemetry that is captured is well documented, and easy to use and integrate with popular telemetry provider offerings such as Datadog, New Relic, Dynatrace, Jaeger, Prometheus, and Grafana
- It is as simple as possible to export telemetry to an OTLP collector from llama stack
- Power-user features are available for advanced cases, but are not necessary to expose to regular users
- Telemetry that is captured is secure by default, but can be configured to be more detailed as needed.
Design
User Facing Changes
Since the telemetry API is being removed, the provider should also be removed from the config. There are two reasons for this:
- providers are generally API implementations, which telemetry will no longer be
- it can be confusing for users to set OpenTelemetry config options in a config file, since they are always overwritten by the environment variables.
Because llama stack uses uvicorn, and OpenTelemetry auto-instrumentation is known not to propagate to the Python subprocesses uvicorn starts for each worker, we will need to initialize telemetry manually in llama stack. As a result, llama stack will ship with telemetry enabled by default; it will capture data that is always secure and export it via http/protobuf unless otherwise configured.
We will defer OTel configuration as much as possible to its predefined environment variables unless there is a very good reason not to do so.
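As a rough sketch, assuming the standard opentelemetry-sdk and opentelemetry-exporter-otlp-proto-http packages (the function name here is illustrative, not the actual implementation), manual initialization that defers all endpoint and protocol configuration to the environment could look like:

```python
# Illustrative sketch of manual telemetry initialization; not the actual
# llama stack implementation. This must run in each uvicorn worker process,
# since auto-instrumentation is not inherited by the subprocesses uvicorn
# spawns for its workers.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

def setup_telemetry() -> None:
    # The OTLP exporter reads OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS,
    # etc. from the environment on its own, so no explicit config is passed here.
    provider = TracerProvider()
    provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
    trace.set_tracer_provider(provider)
```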
A warning log will be written if the OTLP exporter endpoint or protocol is empty or incorrectly set, to alert users to configuration errors; otherwise the telemetry system fails silently.
To export data with a different protocol, the OTEL_EXPORTER_OTLP_PROTOCOL environment variable can be used. To export data to an OTLP collector at a custom location, OTEL_EXPORTER_OTLP_ENDPOINT can be set. To disable telemetry entirely, users can set OTEL_SDK_DISABLED=true. To disable capturing telemetry from a given service, they can use OTEL_PYTHON_DISABLED_INSTRUMENTATIONS=sqlalchemy.
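A minimal sketch of the configuration warning described above (the helper name and exact checks are illustrative; the environment variables and protocol values are the standard OTel ones):

```python
import logging
import os

logger = logging.getLogger(__name__)

# Protocol values defined by the OTLP exporter specification.
_VALID_PROTOCOLS = {"grpc", "http/protobuf", "http/json"}

def warn_on_bad_otlp_config() -> None:
    # Illustrative check only; the real implementation may differ.
    endpoint = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", "")
    protocol = os.environ.get("OTEL_EXPORTER_OTLP_PROTOCOL", "http/protobuf")
    if not endpoint:
        logger.warning(
            "OTEL_EXPORTER_OTLP_ENDPOINT is not set; telemetry will use the "
            "default local endpoint or fail silently"
        )
    if protocol not in _VALID_PROTOCOLS:
        logger.warning("Unrecognized OTEL_EXPORTER_OTLP_PROTOCOL: %r", protocol)
```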
New Config Structure
To simplify the telemetry workflow, we are changing the config to have a simple top-level option for enabling and disabling telemetry, which can be expanded later.
```yaml
version: 2
telemetry:
  enabled: True
apis:
- inference
- safety
- vector_io
providers:
  inference:
  - provider_id: openai
    provider_type: remote::openai
    config:
      api_key: ${env.OPENAI_API_KEY:=}
  vector_io:
  - provider_id: faiss
    provider_type: inline::faiss
    config:
      kvstore:
        type: sqlite
        db_path: ~/.llama/faiss_store.db
  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config: {}
server:
  port: 8321
```
Documentation Changes
Maintain a Telemetry subsection of the llama stack docs, which keeps detailed records of what custom telemetry data we capture for each API endpoint. Customers can reasonably assume what data gets captured automatically by OpenTelemetry, since that is standardized, so it does not need to be documented.
Internal Changes
Once the testing changes land, we can use them to make sure that the same baseline quality of telemetry data is exported.
I would advocate for the following internal changes to be made:
- fix: remove broken tracing middleware #3723: remove the custom tracing middleware and lean on built-in FastAPI instrumentation
- WIP: auto instrument #3733: use automatic instrumentation installation to make sure all ingress and egress points are being traced and observed at a baseline level, then improve from there later with manual instrumentation (see the sketch after this list).
- make capturing request/response bodies something that is disabled by default, but can be enabled. This prevents accidental capture of sensitive data like prompts or images.
- capture attributes more efficiently. We capture the same attributes multiple times in a given trace, causing inflated data volumes.
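As a rough illustration of the auto-instrumentation approach, using the standard OpenTelemetry instrumentor packages (the exact set of instrumentors is an assumption, not a decided list):

```python
# Illustrative wiring only; not the actual llama stack implementation.
from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

app = FastAPI()

# Instrument ingress (HTTP routes) at a baseline level. Egress points such as
# HTTP clients or database engines would get their own instrumentors
# (e.g. HTTPXClientInstrumentor, SQLAlchemyInstrumentor) installed the same way.
FastAPIInstrumentor.instrument_app(app)
```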
Reference Architecture
Improve the scripts/telemetry library to work out of the box, and include a Grafana dashboard that shows off the telemetry data we capture across the stack.
💡 Why is this needed? What if we don't build it?
This makes the telemetry offering from llama stack complete, easy to use, and digestible to users. It gets out of their way as much as possible, and offers a reference architecture that they can adopt or lean on to consume llama stack telemetry with little or no effort.
Core Tasks
- Remove Telemetry API
- Create Instrumentation data test
- Implement automatic instrumentation installation
- Enrich auto-instrumentation so that it captures data in a way that conforms to the tested requirements
- Documentation for new user workflow
- Remove Telemetry Provider
Nice to Have
- Create a configurable way to enable/disable capture of request/response bodies
- Remove duplicate fields from telemetry data
- Power User documentation for things like enabling/disabling capture of HTTP bodies or headers