Description
OpenShell's policy engine enforces four protection layers (network, filesystem, process, inference) but has no concept of audio. As NemoClaw adds speech capabilities (see #1520 for /v1/audio/* gateway routes), we need policy controls for how audio flows through the sandbox.
Without this, NemoClaw agents can access speech services but operators have no way to control microphone access, audio storage, or whether audio leaves the local environment.
Enterprise voice agent deployments require answers to:
- Which agents can access the microphone? (Today it is blocked globally by the Permissions-Policy header; see openclaw/openclaw#51085.)
- Can audio be recorded or stored? Where? For how long?
- Can audio egress to a cloud endpoint, or must it stay local? (For example, force all audio to a local PersonaPlex sidecar and never to an external API.)
This is the gap between a "voice agent" and a "secure voice agent", and the secure part is NemoClaw's differentiator.
Example: openclaw-sandbox.yaml
audio:
  microphone:
    grant: per-agent          # "none" | "per-agent" | "all"
    allowed_agents:
      - voice-assistant
  recording:
    allow: false              # Can audio be written to disk?
    retention: 0              # Max seconds to retain (0 = no storage)
    allowed_paths: []         # If allow: true, restrict to these dirs
  egress:
    mode: local-only          # "local-only" | "allowlist" | "any"
    allowlist:                # Used when mode: allowlist
      - host: personaplex.local
        port: 8998
      - host: inference.local
        port: 8000
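
To make the intended semantics concrete, here is a minimal Python sketch of how the proposed `audio:` section might be evaluated. Nothing here exists in OpenShell today: `AUDIO_POLICY`, `LOOPBACK_HOSTS`, `mic_allowed`, and `egress_allowed` are hypothetical names, the policy is written as a plain dict rather than parsed from YAML, and treating "local-only" as "loopback addresses only" is an assumption, since what counts as local is still an open question.

```python
# Hypothetical in-memory form of the proposed `audio:` section (not a real OpenShell API).
AUDIO_POLICY = {
    "microphone": {"grant": "per-agent", "allowed_agents": ["voice-assistant"]},
    "recording": {"allow": False, "retention": 0, "allowed_paths": []},
    "egress": {
        "mode": "local-only",
        "allowlist": [
            {"host": "personaplex.local", "port": 8998},
            {"host": "inference.local", "port": 8000},
        ],
    },
}

# Assumption: "local" means loopback only. The real definition is undecided.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def mic_allowed(policy, agent):
    """Return True if `agent` may open the microphone under `policy`."""
    mic = policy.get("microphone", {})
    grant = mic.get("grant", "none")
    if grant == "all":
        return True
    if grant == "per-agent":
        return agent in mic.get("allowed_agents", [])
    return False  # "none" or any unknown value: deny by default


def egress_allowed(policy, host, port):
    """Return True if audio may be sent to host:port under `policy`."""
    egress = policy.get("egress", {})
    mode = egress.get("mode", "local-only")
    if mode == "any":
        return True
    if host in LOOPBACK_HOSTS:
        return True  # loopback is permitted in every mode
    if mode == "allowlist":
        return any(
            entry["host"] == host and entry["port"] == port
            for entry in egress.get("allowlist", [])
        )
    return False  # "local-only" or any unknown value: deny non-loopback targets
```

With the example policy as written, `mic_allowed(AUDIO_POLICY, "voice-assistant")` is true and any other agent is denied. The `allowlist` entries only take effect once `mode` is switched to `allowlist`; under `local-only` this sketch ignores them.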
Dependencies:
- #1520: /v1/audio/transcriptions and /v1/audio/speech gateway routes (audio can't flow at all until this ships)
- #409: WebSocket egress timeout (voice sessions are long-lived WebSocket connections; currently killed at ~2 min)
- openclaw/openclaw#51085: Permissions-Policy: microphone=() blocks the mic by default
Suggested steps:
- Per-agent mic grant in policy YAML + microphone=(self) header fix
- Audio egress allowlist (reuse existing network policy structure, add media-type: audio filter)
- Recording rules (storage path restriction, retention enforcement, audit log for audio writes)
- Audio encryption enforcement (require TLS/DTLS on audio streams)
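
The first and third steps could look roughly like the Python sketch below. It is illustrative only: `permissions_policy_header`, `recording_write_allowed`, and `recording_expired` are hypothetical helpers, not existing OpenShell functions, and it assumes that an empty `allowed_paths` denies every write and that `retention` is compared against the age of a recording in seconds.

```python
from pathlib import Path


def permissions_policy_header(grant, allowed_agents, agent):
    """Return a Permissions-Policy header value for one agent's session."""
    allowed = grant == "all" or (grant == "per-agent" and agent in allowed_agents)
    # microphone=(self) permits same-origin mic use; microphone=() disables it.
    return "microphone=(self)" if allowed else "microphone=()"


def recording_write_allowed(recording, target):
    """Return True if an audio file may be written at `target`."""
    if not recording.get("allow", False):
        return False
    # Resolve both sides so "../" segments cannot escape an allowed directory.
    resolved = Path(target).resolve()
    # Assumption: an empty allowed_paths list denies all writes.
    # Path.is_relative_to requires Python 3.9 or later.
    return any(
        resolved.is_relative_to(Path(directory).resolve())
        for directory in recording.get("allowed_paths", [])
    )


def recording_expired(recording, age_seconds):
    """Return True if a recording of this age has reached the retention limit."""
    # retention: 0 means nothing may be kept, so every recording is expired.
    return age_seconds >= recording.get("retention", 0)
```

A per-session header keeps the default of `microphone=()` for every agent that is not explicitly granted access, which matches the current global behavior while allowing exceptions. An audit log for audio writes would hook in at the point where `recording_write_allowed` is called.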
Reproduction Steps
This is a feature request, not a bug. To observe the gap:
- Deploy NemoClaw with a voice agent that needs mic access and audio streaming
- Attempt to configure per-agent microphone grants in openclaw-sandbox.yaml → no audio section exists
- Attempt to set recording rules or audio egress restrictions → no policy surface available
- Note that the only current audio-related control is the Permissions-Policy HTTP header, which is global (all-or-nothing) and blocks mic by default