Summary
Add a new built-in survey plugin that exports metrics, and only metrics, over OTLP/HTTP to a configurable telemetry endpoint.
The purpose of this plugin is to crowd-source real-world model performance data across live mesh-llm nodes without using the mesh itself for reporting. We want to learn which models are being launched, on what hardware, with what context sizes, and how well they actually perform in the wild.
This should use the standard Rust OpenTelemetry crates and exporter stack. We should not implement a custom OTLP wire protocol or custom JSON sink.
Goals
- Collect real-world model performance data from live nodes
- Understand:
  - which models are being launched
  - which launches succeed or fail
  - what hardware the model ran on
  - what context length was used
  - how long launches take
  - how long models stay loaded
  - which models exit unexpectedly
- Build a dataset for comparing model viability across hardware and context settings
- Keep the implementation metrics-only
- Make the OTLP endpoint configurable
- Preserve privacy:
  - no raw prompts
  - no completions
  - no logs
  - no traces
Non-Goals
- No reporting through mesh gossip, mesh channels, or peer-to-peer relay
- No OTLP logs
- No OTLP traces
- No custom JSON/HTTP telemetry backend
- No raw prompt capture
- No prompt hashing or content-derived summaries
- No mesh protocol changes for this feature
Why This Exists
We need crowd-sourced performance data from real usage, not just local benchmarks.
The main questions we want this to answer are:
- Which models are people actually trying to run?
- Which models fail to launch in the wild?
- Which hardware/model combinations work reliably?
- What context lengths are actually being used successfully?
- Which models are unstable after launch?
- Which models stay loaded and useful versus churn quickly?
This is primarily a product/data feature, not just an operator telemetry feature.
High-Level Design
Implement survey as a built-in mesh-llm plugin with outbound OTLP metric export.
Although it is a plugin from a configuration/product perspective, the telemetry hooks should come from host/runtime lifecycle code, not from polling local APIs and inferring state changes.
Why host-emitted metrics instead of polling
The runtime already has clear source-of-truth transitions for:
- launch start
- launch success
- launch failure
- runtime load
- unload
- unexpected process exit
Those are the events we want to aggregate into crowd-sourced performance metrics. Polling would lose fidelity and make failure attribution weaker.
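As a sketch of what host-emitted hooks could look like, the runtime could surface these transitions as a small event type that the survey plugin consumes. The names below are illustrative, not existing mesh-llm APIs:
// Hypothetical lifecycle events observed by the survey plugin.
// Real hook points are the launch/runtime paths listed under
// "Integration Points"; names here are illustrative only.
pub enum ModelLifecycleEvent {
    LaunchStarted { model: String, launch_kind: String },
    LaunchSucceeded { model: String, duration_ms: u64, context_length: u32 },
    LaunchFailed { model: String, failure_reason: String },
    RuntimeLoaded { model: String },
    Unloaded { model: String, uptime_s: u64 },
    ExitedUnexpectedly { model: String, uptime_s: Option<u64> },
}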
Transport
Use OTLP/HTTP only.
Use official crates:
- opentelemetry
- opentelemetry_sdk
- opentelemetry-otlp
Do not add:
- tracing-opentelemetry
- opentelemetry-appender-tracing
This feature is metrics-only.
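A minimal initialization sketch with these crates, assuming the opentelemetry-otlp HTTP feature is enabled. Exact builder names and signatures vary between crate versions (older PeriodicReader builders also take an async runtime argument), so treat this as the shape rather than the exact API:
use opentelemetry::global;
use opentelemetry_otlp::{MetricExporter, WithExportConfig};
use opentelemetry_sdk::metrics::{PeriodicReader, SdkMeterProvider};
use std::time::Duration;

fn init_survey_metrics(endpoint: &str, interval_secs: u64) -> Result<(), Box<dyn std::error::Error>> {
    // OTLP/HTTP metric exporter pointed at the configured endpoint.
    let exporter = MetricExporter::builder()
        .with_http()
        .with_endpoint(endpoint)
        .build()?;
    // Periodic reader batches metrics and exports on a background task.
    let reader = PeriodicReader::builder(exporter)
        .with_interval(Duration::from_secs(interval_secs))
        .build();
    let provider = SdkMeterProvider::builder().with_reader(reader).build();
    global::set_meter_provider(provider);
    Ok(())
}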
Configuration
Add mesh-llm config for telemetry while still respecting standard OTel env vars.
Mesh config
[telemetry]
enabled = true
service_name = "mesh-llm"
endpoint = "https://otel.example.com"
headers = { "authorization" = "Bearer TOKEN" }
export_interval_secs = 15
queue_size = 2048
prompt_shape_metrics = false
Optional metrics override
[telemetry.metrics]
endpoint = "https://otel.example.com/v1/metrics"
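A possible serde shape for these tables. Field names mirror the TOML above, while the struct names and defaults are illustrative rather than existing mesh-llm code:
use serde::Deserialize;
use std::collections::HashMap;

#[derive(Debug, Deserialize)]
pub struct TelemetryConfig {
    #[serde(default)]
    pub enabled: bool,
    #[serde(default = "default_service_name")]
    pub service_name: String,
    pub endpoint: Option<String>,
    #[serde(default)]
    pub headers: HashMap<String, String>,
    #[serde(default = "default_export_interval")]
    pub export_interval_secs: u64,
    #[serde(default = "default_queue_size")]
    pub queue_size: usize,
    #[serde(default)]
    pub prompt_shape_metrics: bool,
    // Optional [telemetry.metrics] override.
    pub metrics: Option<TelemetryMetricsConfig>,
}

#[derive(Debug, Deserialize)]
pub struct TelemetryMetricsConfig {
    pub endpoint: Option<String>,
}

fn default_service_name() -> String { "mesh-llm".to_string() }
fn default_export_interval() -> u64 { 15 }
fn default_queue_size() -> usize { 2048 }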
Standard env vars to support
- OTEL_EXPORTER_OTLP_ENDPOINT
- OTEL_EXPORTER_OTLP_METRICS_ENDPOINT
Precedence (highest first)
- explicit mesh-llm config
- standard OTEL_* env vars
- disabled if no endpoint is configured
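A sketch of that precedence using the TelemetryConfig shape above; the env var names are the standard OTel ones, everything else is illustrative:
/// Resolve the metrics endpoint: explicit config wins, then standard
/// OTel env vars; if nothing is set, telemetry stays disabled.
fn resolve_metrics_endpoint(cfg: &TelemetryConfig) -> Option<String> {
    cfg.metrics
        .as_ref()
        .and_then(|m| m.endpoint.clone())
        .or_else(|| cfg.endpoint.clone())
        .or_else(|| std::env::var("OTEL_EXPORTER_OTLP_METRICS_ENDPOINT").ok())
        .or_else(|| std::env::var("OTEL_EXPORTER_OTLP_ENDPOINT").ok())
    // None => telemetry disabled
}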
Metrics Schema
Use a mesh-llm-specific schema with low-cardinality attributes that are useful for aggregate analysis.
Counters
- mesh_llm_model_launch_total
- mesh_llm_model_launch_success_total
- mesh_llm_model_launch_failure_total
- mesh_llm_model_unload_total
- mesh_llm_model_exit_unexpected_total
Gauges
- mesh_llm_loaded_models
- mesh_llm_model_loaded
- mesh_llm_model_context_length
Histograms
- mesh_llm_model_launch_duration_ms
- mesh_llm_model_uptime_s
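Created through the standard OpenTelemetry meter API, a subset of this instrument set could look roughly like the following; builder method names (.build() vs the older .init()) and gauge availability depend on the opentelemetry crate version:
use opentelemetry::global;
use opentelemetry::metrics::{Counter, Gauge, Histogram};

pub struct SurveyInstruments {
    pub launch_total: Counter<u64>,
    pub launch_failure_total: Counter<u64>,
    pub launch_duration_ms: Histogram<f64>,
    pub loaded_models: Gauge<u64>,
}

impl SurveyInstruments {
    pub fn new() -> Self {
        let meter = global::meter("mesh_llm_survey");
        Self {
            launch_total: meter.u64_counter("mesh_llm_model_launch_total").build(),
            launch_failure_total: meter.u64_counter("mesh_llm_model_launch_failure_total").build(),
            launch_duration_ms: meter.f64_histogram("mesh_llm_model_launch_duration_ms").build(),
            loaded_models: meter.u64_gauge("mesh_llm_loaded_models").build(),
        }
    }
}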
Attributes
Use only bounded, aggregatable attributes.
Model attributes
- mesh_llm.model
- mesh_llm.architecture
- mesh_llm.quantization
- mesh_llm.launch_kind
Hardware attributes
- mesh_llm.gpu_name
- mesh_llm.gpu_stable_id
- mesh_llm.backend_device
- mesh_llm.gpu_count
- mesh_llm.is_soc
Runtime attributes
- mesh_llm.backend
- mesh_llm.context_bucket
- mesh_llm.service_version
Outcome attributes
- mesh_llm.failure_reason
Suggested enums
mesh_llm.launch_kind:
- startup
- runtime_load
- multi_model
- moe_fallback
- moe_shard
mesh_llm.failure_reason:
- spawn_failed
- health_timeout
- exited_before_healthy
- backend_proxy_failed
- capacity_rejected
- known_kv_cache_crash
- mmproj_missing
- other
mesh_llm.context_bucket:
- <=8k
- 8k_16k
- 16k_32k
- 32k_64k
- 64k_128k
- >128k
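These value sets could be closed Rust enums plus a bucketing helper, so only the strings above can ever reach the exporter. A sketch, not existing code:
/// Closed set of failure classifications; raw error strings never
/// become attribute values.
#[derive(Clone, Copy, Debug)]
pub enum FailureReason {
    SpawnFailed,
    HealthTimeout,
    ExitedBeforeHealthy,
    BackendProxyFailed,
    CapacityRejected,
    KnownKvCacheCrash,
    MmprojMissing,
    Other,
}

impl FailureReason {
    pub fn as_str(self) -> &'static str {
        match self {
            Self::SpawnFailed => "spawn_failed",
            Self::HealthTimeout => "health_timeout",
            Self::ExitedBeforeHealthy => "exited_before_healthy",
            Self::BackendProxyFailed => "backend_proxy_failed",
            Self::CapacityRejected => "capacity_rejected",
            Self::KnownKvCacheCrash => "known_kv_cache_crash",
            Self::MmprojMissing => "mmproj_missing",
            Self::Other => "other",
        }
    }
}

/// Map an exact context length onto the bounded bucket set.
pub fn context_bucket(ctx: u32) -> &'static str {
    match ctx {
        0..=8_192 => "<=8k",
        8_193..=16_384 => "8k_16k",
        16_385..=32_768 => "16k_32k",
        32_769..=65_536 => "32k_64k",
        65_537..=131_072 => "64k_128k",
        _ => ">128k",
    }
}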
Privacy Rules
The exporter must use a strict allowlist.
Allowed
- model identifier
- model architecture / quantization
- hardware facts
- launch outcome classification
- context length bucket or exact context length if we decide it is safe enough
- durations
- counts and gauges
Not allowed
- raw prompts
- completions
- logs
- file paths
- URLs from user payloads
- hostnames from prompt content
- request IDs with unbounded cardinality
- raw error strings as metric attributes
If prompt-shape metrics are ever added later, they must remain disabled by default and still avoid content-bearing fields.
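One way to make the allowlist structural rather than a convention is to filter attributes against the schema keys before they reach any instrument; the helper below is illustrative:
use opentelemetry::KeyValue;

// Only keys named in the metrics schema may be attached to exported metrics.
const ALLOWED_ATTRIBUTE_KEYS: &[&str] = &[
    "mesh_llm.model",
    "mesh_llm.architecture",
    "mesh_llm.quantization",
    "mesh_llm.launch_kind",
    "mesh_llm.gpu_name",
    "mesh_llm.gpu_stable_id",
    "mesh_llm.backend_device",
    "mesh_llm.gpu_count",
    "mesh_llm.is_soc",
    "mesh_llm.backend",
    "mesh_llm.context_bucket",
    "mesh_llm.service_version",
    "mesh_llm.failure_reason",
];

/// Drop any attribute not on the allowlist before it reaches the exporter.
fn filter_attributes(attrs: Vec<KeyValue>) -> Vec<KeyValue> {
    attrs
        .into_iter()
        .filter(|kv| ALLOWED_ATTRIBUTE_KEYS.contains(&kv.key.as_str()))
        .collect()
}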
Integration Points
Hook metric emission at the runtime/launch source of truth.
Launch path
Emit:
- launch attempt
- launch success
- launch failure
- launch duration
- context length
- hardware used
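At this source of truth, recording amounts to a few calls against the instruments sketched earlier; attribute coverage is trimmed here and the helper signature is illustrative:
use opentelemetry::KeyValue;

// On launch completion, attribute the outcome and duration.
// `instruments` is the SurveyInstruments struct sketched above.
fn record_launch_result(
    instruments: &SurveyInstruments,
    model: &str,
    gpu_name: &str,
    context_length: u32,
    duration_ms: f64,
    failure: Option<FailureReason>,
) {
    let attrs = vec![
        KeyValue::new("mesh_llm.model", model.to_string()),
        KeyValue::new("mesh_llm.gpu_name", gpu_name.to_string()),
        KeyValue::new("mesh_llm.context_bucket", context_bucket(context_length)),
    ];
    instruments.launch_total.add(1, &attrs);
    instruments.launch_duration_ms.record(duration_ms, &attrs);
    if let Some(reason) = failure {
        let mut failure_attrs = attrs.clone();
        failure_attrs.push(KeyValue::new("mesh_llm.failure_reason", reason.as_str()));
        instruments.launch_failure_total.add(1, &failure_attrs);
    }
}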
Runtime model control
Emit:
- runtime load success/failure
- unload
Unexpected exit path
Emit:
- unexpected exit
- uptime if known
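Uptime can be derived from an Instant captured when the model became healthy. A rough sketch; in real code the instruments would be created once and reused rather than rebuilt per call:
use opentelemetry::{global, KeyValue};
use std::time::Instant;

// Captured when the model becomes healthy; consumed on exit or unload.
struct LoadedModelState {
    model: String,
    loaded_at: Instant,
}

// Hypothetical hook called from the unexpected-exit path.
fn on_unexpected_exit(state: &LoadedModelState) {
    let meter = global::meter("mesh_llm_survey");
    let attrs = [KeyValue::new("mesh_llm.model", state.model.clone())];
    meter
        .u64_counter("mesh_llm_model_exit_unexpected_total")
        .build()
        .add(1, &attrs);
    meter
        .f64_histogram("mesh_llm_model_uptime_s")
        .build()
        .record(state.loaded_at.elapsed().as_secs_f64(), &attrs);
}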
Current code areas to wire
Likely integration points:
- mesh-llm/src/inference/launch.rs
- mesh-llm/src/runtime/mod.rs
- mesh-llm/src/runtime/local.rs
- built-in plugin registration in mesh-llm/src/plugin/config.rs and mesh-llm/src/plugin/mod.rs
Runtime Behavior
- bounded async export queue
- batch OTLP export
- retry with backoff
- drop oldest metrics when queue is full
- never block model launch or unload on telemetry export
- failure to export metrics must not affect inference availability
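With the SDK's periodic reader, recording a metric is a cheap in-process call and export happens off the request path, so the main piece to wire explicitly is a bounded, best-effort flush at shutdown. A sketch assuming a tokio runtime:
use opentelemetry_sdk::metrics::SdkMeterProvider;
use std::time::Duration;

// Best-effort flush at process shutdown: give the exporter a bounded
// window, then proceed regardless so telemetry never delays shutdown.
async fn shutdown_telemetry(provider: SdkMeterProvider) {
    let _ = tokio::time::timeout(
        Duration::from_secs(5),
        tokio::task::spawn_blocking(move || {
            // shutdown() flushes any pending batch; errors are deliberately ignored.
            let _ = provider.shutdown();
        }),
    )
    .await;
}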
Plugin/Product Behavior
Expose survey as a built-in plugin that can be enabled/disabled in config.
Example:
[[plugin]]
name = "survey"
enabled = true
Telemetry-specific settings should live under [telemetry], not in a separate plugin-specific config tree.
Key Questions This Data Should Answer
- Which models are most frequently launched?
- Which models have the highest launch failure rate?
- Which hardware/model combinations are most reliable?
- Which context sizes are viable for each model/hardware combination?
- Which models have poor stability after successful launch?
- Which model/hardware combinations produce long launch times or short uptimes?
Acceptance Criteria
- survey can be enabled via config
- metrics export over OTLP/HTTP to a configurable endpoint
- no mesh transport is used for telemetry
- launch success increments success metrics
- launch failure increments failure metrics with classified reason
- unload increments unload metrics
- unexpected process exit increments unexpected-exit metrics
- loaded-model state is visible through gauges
- hardware and context metadata are attached
- exporter failures do not affect serving, startup, or unload flows
- no logs, traces, raw prompts, or content-bearing fields are exported
Validation
Minimum validation:
- local smoke test against a test OTLP endpoint
- verify launch success/failure metrics appear
- verify unload metrics appear
- verify unexpected exit metrics appear
- verify gauges update when models load/unload
- verify exporter misconfiguration does not break runtime behavior
Open Questions
- Should mesh_llm.model use the display name, canonical ref, or normalized local model name?
- Should exact context length be exported, or only a bucket?
- Is gpu_stable_id acceptable as-is, or should it be hashed before export?
- Do we want a lightweight “still loaded” gauge only, or also a periodic heartbeat metric?
- Should telemetry auto-enable when an OTLP endpoint is configured, or require explicit opt-in?