Part I: Output & Parsing | Challenge §3
Severity: High | Frequency: Very Common | Detectability: Hard | Token Spend: Medium | Time: Low | Context: High
Unix convention: stdout = data, stderr = diagnostics. Many CLI tools violate this by mixing progress messages, warnings, and errors into stdout alongside the actual output.
$ tool export-data > output.json
Connecting to server...
Fetching records... (234 found)
Warning: 3 records skipped (missing required field)
{"records": [...]}
Export complete.
$ cat output.json
Connecting to server...
Fetching records... (234 found)
Warning: 3 records skipped (missing required field)
{"records": [...]}
Export complete.
# ← not valid JSON, parse fails
Agent captures both streams, but cannot separate what the tool has already mixed:
result = subprocess.run(cmd, capture_output=True)
output = result.stdout.decode()
# if tool mixed stderr into stdout, output is corrupted
Warnings that belong on stderr end up in stdout:
$ tool validate config.yaml
config.yaml is valid
Warning: deprecated key 'timeout' found at line 12
Agent parses first line as success, misses the warning.
- Output parsing fails when stdout is contaminated
- Warnings are invisible to agents (they read stdout, log stderr separately)
- Agent cannot distinguish data from diagnostics
Strict stream discipline:
stdout: ONLY the command's primary output (data, result, id)
stderr: progress indicators, warnings, debug info, timing, counts
# Good
$ tool create-user --name Alice 2>/dev/null
{"id": 42, "name": "Alice"}
$ tool create-user --name Alice 1>/dev/null
Creating user Alice...
Done. (45ms)
Structured warnings in JSON output:
{
"ok": true,
"data": {"records": [...]},
"warnings": [
{"code": "DEPRECATED_KEY", "message": "...", "location": "line 12"}
]
}
For framework design:
- Route all `log()`, `progress()`, `debug()` calls to stderr by default
- Only `print()` / `output()` writes to stdout
- Provide `--quiet` to suppress all stderr
- Provide `--warnings-as-errors` to exit non-zero on any warning
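The framework-design bullets above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real framework: the `Reporter` class, its method names, and the sample messages are hypothetical, and only the `--quiet` and `--warnings-as-errors` flags and the `ok` / `data` / `warnings` envelope come from the text above.

```python
import argparse
import json
import sys


class Reporter:
    """Hypothetical helper: data goes to stdout, everything else to stderr."""

    def __init__(self, quiet=False):
        self.quiet = quiet
        self.warnings = []

    def log(self, message):
        # Progress and debug text never touches stdout.
        if not self.quiet:
            print(message, file=sys.stderr)

    def warn(self, code, message):
        # Warnings are recorded for the JSON envelope and echoed to stderr.
        self.warnings.append({"code": code, "message": message})
        self.log(f"Warning: {message}")

    def output(self, data):
        # The only method that writes to stdout: a single JSON document.
        envelope = {"ok": True, "data": data, "warnings": self.warnings}
        print(json.dumps(envelope))


def main(argv=None):
    parser = argparse.ArgumentParser(prog="tool")
    parser.add_argument("--quiet", action="store_true")
    parser.add_argument("--warnings-as-errors", action="store_true")
    args = parser.parse_args(argv)

    reporter = Reporter(quiet=args.quiet)
    reporter.log("Connecting to server...")
    reporter.warn("DEPRECATED_KEY", "deprecated key 'timeout' found")
    reporter.output({"records": []})

    # Exit non-zero when any warning was raised and the flag is set.
    return 1 if (args.warnings_as_errors and reporter.warnings) else 0
```

A script would end with `sys.exit(main())`. Because warnings travel both in the `warnings` array and on stderr, an agent that reads only stdout still sees them, and `--quiet` can silence stderr without losing information.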
Merged from §39: The following content was originally a separate challenge. It is consolidated here because it describes a specific case of the same root problem.
Severity: High | Frequency: Common | Detectability: Medium | Token Spend: Medium | Time: Low | Context: High
In some frameworks, particularly Commander.js (the most widely deployed Node.js CLI framework), --help output goes to stdout by default. Most agent workflows capture stdout as the primary data channel. When an agent invokes a command that prints help (e.g., because a required argument is missing), help text floods stdout, corrupting what the agent expects to be structured data.
This is distinct from challenge #3 (Stderr vs Stdout Discipline), which focuses on error messages and diagnostic output mixing into stdout. Challenge #3 describes the general class of problem. This challenge specifically concerns help text — which has unique properties: it is very long (filling the context window), occurs in specific failure conditions (wrong invocation), and is not a runtime error but a framework-generated response. Many CLIs that correctly route runtime errors to stderr still route help text to stdout.
// Commander.js default: help written via --help, help(), or outputHelp() goes to stdout
program.parse(); // A tool that displays help this way on a usage error (e.g. required arg missing) prints it to stdout, then exits 1
// Agent capture:
const result = await exec('my-tool deploy'); // missing required --env
// result.stdout = "Usage: my-tool deploy [options]\n\nDeploy to an environment\n\nOptions:\n -e, --env..."
// result.stderr = ""
// result.exitCode = 1
// Agent tries to parse result.stdout as JSON → fails
The failure is compounded by timing: help output appears on the same stdout stream as normal output, meaning an agent that has successfully called a command dozens of times may suddenly receive help text when it makes a slightly wrong call, with no separator or content-type indicator to distinguish the two.
Python Fire is worse: it routes all its own output (help, trace, error messages) to stdout, making stdout an indistinguishable mix of framework messages and application data.
- Agent attempts to JSON-parse help text, fails, reasons incorrectly about the invocation
- Help text is long (hundreds of tokens), consuming context window for no useful purpose (the agent already has the schema from `--help --json` or `--schema`)
- Agent may mistake help text for successful output (e.g., a `deploy` command that prints usage text looks like it printed deployment instructions)
- If the agent does parse the failure correctly, it must distinguish help-on-failure from data-output, requiring content-sniffing heuristics
- In Python Fire specifically, there is no reliable way to separate framework messages from application output in the stdout stream
For agents invoking CLI tools:
import subprocess
result = subprocess.run(cmd, capture_output=True, text=True)
# Detect help text pollution before attempting to parse
if result.returncode != 0 and ("Usage:" in result.stdout or "Options:" in result.stdout):
    # This is help text, not data output
    raise ValueError(f"Command failed (usage error): {cmd}")
For CLI authors using Commander.js:
program.configureOutput({
writeOut: (str) => process.stderr.write(str), // ✓ route help to stderr
writeErr: (str) => process.stderr.write(str),
});
For framework design:
- Route all help output to stderr by default when stdout is not a TTY
- Never route help or usage text to stdout, regardless of TTY state
- When `isatty(stdout) == False`, replace help display with a structured JSON error on stdout: `{"ok": false, "error": {"code": "USAGE_ERROR", "message": "Missing required option --env", "hint": "Run with --schema for the full interface definition"}}`
- Ensure that exit-code-2 (usage error) is always accompanied by stderr-only output, never stdout output
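For a Python CLI, the structured-error bullet could look roughly like the sketch below. It assumes an `argparse`-based tool and overrides `ArgumentParser.error()`, the hook argparse calls on usage errors. The class name `AgentFriendlyParser`, the `build_parser` helper, and the `my-tool` / `--env` names are illustrative only. The sketch follows the JSON-on-stdout bullet; a tool that prefers the stderr-only rule would write the same payload to stderr instead.

```python
import argparse
import json
import sys


class AgentFriendlyParser(argparse.ArgumentParser):
    """Hypothetical ArgumentParser subclass that avoids prose on a captured stdout."""

    def error(self, message):
        if sys.stdout.isatty():
            # Interactive use: usage summary and message, both on stderr.
            self.print_usage(sys.stderr)
            print(f"{self.prog}: error: {message}", file=sys.stderr)
        else:
            # Piped or captured: one parseable JSON document instead of help text.
            payload = {
                "ok": False,
                "error": {"code": "USAGE_ERROR", "message": message},
            }
            print(json.dumps(payload))
        # Exit status 2 is what argparse itself uses for usage errors.
        sys.exit(2)


def build_parser():
    parser = AgentFriendlyParser(prog="my-tool")
    parser.add_argument("--env", required=True, help="target environment")
    return parser
```

Calling `build_parser().parse_args([])` with stdout captured then yields a single JSON object and exit status 2, so the agent can branch on `error.code` rather than sniffing for `Usage:`.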
| Score | Condition |
|---|---|
| 0 | Progress messages, warnings, and help text mixed into stdout alongside data |
| 1 | Most data on stdout but some prose leaks (e.g. "Done.", help text on bad invocation, or debug lines) |
| 2 | All diagnostic content on stderr; data only on stdout; warnings present in JSON warnings[] field |
| 3 | Help text routed to stderr in non-TTY mode; --quiet suppresses all stderr; --warnings-as-errors flag available |
Check: Run any command redirecting stdout to a file and stderr to /dev/null, then validate the file contains only valid JSON — any prose in the file is a failure.
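The check can be automated along these lines. This is a sketch: the function name `stdout_is_clean_json` is made up for illustration, and it captures stdout through a pipe rather than a file, which should be equivalent for this purpose.

```python
import json
import subprocess


def stdout_is_clean_json(cmd):
    """Return True if the command's stdout, taken alone, parses as JSON."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # same effect as 2>/dev/null
        text=True,
    )
    try:
        json.loads(result.stdout)
    except json.JSONDecodeError:
        # Any prose on stdout (or an empty stdout) fails the check.
        return False
    return True
```

Running it over every subcommand of a tool, including deliberately wrong invocations, covers the help-text case as well as ordinary progress and warning leaks.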
Always capture stderr and stdout separately; detect contamination before parsing:
import json
import logging
import subprocess
logger = logging.getLogger(__name__)
result = subprocess.run(cmd, capture_output=True, text=True)
stdout = result.stdout.strip()
stderr = result.stderr.strip()
# Detect help text on stdout (usage error from a wrong invocation)
HELP_MARKERS = ("Usage:", "Options:", "Commands:", "Examples:")
if any(m in stdout for m in HELP_MARKERS):
    # Don't try to parse; extract the actual error from stderr instead
    raise ValueError(f"Usage error: got help text on stdout. stderr: {stderr[:300]}")
# Treat stderr lines as diagnostic context, not data
if stderr:
    # Log for debugging but don't mix into parsed result
    logger.debug("tool stderr: %s", stderr)
parsed = json.loads(stdout)
For tools that route warnings to stdout as prose, strip leading non-JSON lines:
lines = stdout.splitlines()
json_start = next((i for i, line in enumerate(lines) if line.strip().startswith("{")), None)
if json_start is not None and json_start > 0:
    warnings_text = "\n".join(lines[:json_start])
    stdout = "\n".join(lines[json_start:])
Limitation: If a tool routes structured data to stderr or mixes help text and JSON in the same stream with no separator, there is no reliable parse strategy. The tool requires a fix from its author before it can be safely used by agents.
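Stripping leading lines does not help when prose also follows the JSON, as with the trailing `Export complete.` in the first example. One best-effort alternative is to scan for the first position where the standard library's `json.JSONDecoder.raw_decode` succeeds. The helper name `extract_first_json` is hypothetical, and this remains a heuristic: prose that happens to contain a valid JSON fragment (for example `[1]`) would be picked up as data.

```python
import json


def extract_first_json(text):
    """Return (value, leading_text, trailing_text) for the first JSON
    object or array embedded in text, or None if nothing parses."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at index and
            # returns the value plus the offset where it ended.
            value, end = decoder.raw_decode(text, index)
        except json.JSONDecodeError:
            continue  # not a valid JSON start; keep scanning
        return value, text[:index].strip(), text[end:].strip()
    return None
```

The leading and trailing text can be logged as diagnostics rather than discarded, which keeps warnings that the tool printed as prose visible to the agent.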