Navigation: Main Guide | Security Audit Reference | CVEs/GHSAs | Issue #1796 | Medium Article | ZeroLeeks | Post-merge Hardening | Open Issues | Open PRs | Ecosystem Threats | SecurityScorecard | Cisco AI Defense | Model Poisoning | Hudson Rock | Cline Supply Chain | ClawJacked | Model Comparison

Second security audit (Medium article)

In January 2026, a Medium article by Saad Khalid titled "Why Clawdbot is a Bad Idea: Critical Zero-days Found in My Audit" claimed 8 critical zero-day vulnerabilities (CVSS 7.5-10.0) based on a self-described "Complete White Box Penetration Test." This section provides a source-code-verified analysis.

How each model covered it

| Model | Coverage | Accuracy |
|---|---|---|
| Opus 4.5 | Most thorough: full 8-claim analysis with code file/line references, CVSS comparison, 3 legitimate gaps identified | All verdicts match source code review |
| Copilot GPT-5.2 | Covers all 8 claims individually with code references and nuanced "attacker needs admin access" framing | High accuracy; minor error on claim 3 (logs.tail called "partially accurate" when schema fully blocks arbitrary paths) |
| GLM 4.7 | 5-row table, but the claims analyzed do not match the article's actual findings | Inaccurate -- appears to have hallucinated or confused the article's claims with a different report (e.g., lists "CVE-2024-44946 Directory Traversal" and "Insecure Dependencies" which the article does not mention) |
| Gemini 3.0 Pro | Brief bullet-point summary; correctly notes DNS rebinding is mitigated | Mostly inaccurate -- accepted auth bypass (#5), arbitrary read (#3), and RCE (#1) claims at face value without verifying against RBAC, schema validation, or Docker isolation |
| Kimi K2.5 | Detailed coverage of all claims with CVSS scores, attack scenarios, "Auditor's Verdict" quote | Inaccurate -- accepts SSRF/DNS rebinding, logic bombs, self-approval bypass, and LD_PRELOAD claims at face value; does not verify against DNS pinning (ssrf.ts), Docker isolation, RBAC enforcement, or human approval flow; quotes auditor's "Do Not Deploy" verdict without challenge |

Key disagreements resolved:

  • Claim 3 (logs.tail traversal): Copilot GPT-5.2 calls it "partially accurate" and Gemini 3.0 Pro lists it as a "Data Risk." Code review confirms the LogsTailParamsSchema (src/gateway/protocol/schema/logs-chat.ts:4-11) has additionalProperties: false with only cursor/limit/maxBytes parameters -- there is no file path parameter at all. The file path comes from getResolvedLoggerSettings().file (config-derived). Verdict: false, not partially accurate.

  • Claim 5 (auth bypass / self-approving agent): Gemini 3.0 Pro states "Agents can self-approve dangerous commands (missing role check)." Code review confirms authorizeGatewayMethod() (src/gateway/server-methods.ts:100-157) enforces role checks on every call and agents are blocked from approval methods. Verdict: false.

  • GLM 4.7 claim set mismatch: GLM analyzed claims like "CVE-2024-44946 Directory Traversal" and "OS Command Injection via Filename" that do not appear in the Medium article. The article's actual 8 claims are about config injection, nodes outPath, logs.tail, DNS rebinding, RBAC, token format, regex validation, and env vars. This is a factual error in the analysis, not a disagreement about interpretation.

  • Kimi K2.5 disagreement: Kimi K2.5 quotes the auditor's "Do Not Deploy" recommendation without verifying it. Its analysis presents attack chains (e.g., "SSRF steals AWS credentials -> Environment injection achieves RCE -> Persistent backdoor via config.patch") that would require bypassing multiple layered controls: DNS pinning, Docker sandboxing, human approval flow, and RSA-signed tokens. Each link in these chains is independently blocked by existing code. Verdict: the recommendation is not supported by the code.
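
The claim 3 verdict turns on how a schema with additionalProperties: false behaves: a request carrying any key outside the declared set is rejected before a handler sees it. The following is a minimal hand-rolled sketch of that behavior, not the project's LogsTailParamsSchema. Only the parameter names cursor, limit, and maxBytes come from the text above; the function name, result shape, and the non-negative-integer rule are assumptions made for illustration.

```typescript
// Hypothetical stand-in for a schema declared with additionalProperties: false.
// Only the parameter names cursor/limit/maxBytes come from the text above.
const ALLOWED_KEYS: string[] = ["cursor", "limit", "maxBytes"];

interface ValidationResult {
  ok: boolean;
  error?: string;
  params?: Record<string, number>;
}

function validateLogsTailParams(input: Record<string, unknown>): ValidationResult {
  const params: Record<string, number> = {};
  for (const key of Object.keys(input)) {
    // Unknown keys are rejected outright rather than silently ignored.
    if (ALLOWED_KEYS.indexOf(key) === -1) {
      return { ok: false, error: "unexpected property: " + key };
    }
    const value = input[key];
    // Assumption: each parameter is a non-negative integer.
    if (typeof value !== "number" || !isFinite(value) || Math.floor(value) !== value || value < 0) {
      return { ok: false, error: "invalid value for " + key };
    }
    params[key] = value;
  }
  return { ok: true, params };
}

// A request that tries to smuggle in a file path is rejected during validation.
const traversalAttempt = validateLogsTailParams({ path: "../../etc/passwd", limit: 10 });
const normalRequest = validateLogsTailParams({ limit: 10 });
console.log(traversalAttempt);
console.log(normalRequest);
```

In this sketch the traversal attempt fails with "unexpected property: path", while the plain limit request passes. Because there is no path-like key to validate in the first place, there is nothing for a traversal payload to attach to.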

Synthesized verdict (all 8 claims)

| # | Claim | Verdict | Source code evidence |
|---|---|---|---|
| 1 | Config injection RCE via setupCommand | Partially true, overstated | setupCommand executes inside Docker container, not host (src/agents/sandbox/docker.ts:473-474). Config changes require gateway auth. |
| 2 | Arbitrary write via nodes:screen_record outPath | True but overstated | outPath lacks path validation (src/agents/tools/nodes-tool-media.ts:353-354), but writes to paired node device, not gateway. |
| 3 | Log traversal via logs.tail | False | Schema has additionalProperties: false, accepts only cursor/limit/maxBytes (src/gateway/protocol/schema/logs-chat.ts:4-11). File path from config, not request. |
| 4 | DNS rebinding SSRF via web-fetch | False | resolvePinnedHostname() + createPinnedDispatcher() pins DNS (src/infra/net/ssrf.ts:312-362). Redirect-to-private-IP tested and blocked (web-fetch.ssrf.test.ts:120-142). |
| 5 | Self-approving agent (no RBAC) | False | authorizeGatewayMethod() enforces role checks on every call (src/gateway/server-methods.ts:100-157). Agents blocked from approval methods. Further hardened by owner-only tool gating (392bbddf2), owner allowlist enforcement (385a7eba3), and nodes tool restricted to owners only (9692dc766). |
| 6 | Token field shifting via pipe injection | Misleading | Pipe-delimited format exists (src/gateway/device-auth.ts:34-47) but tokens are RSA-signed. Modified payload fails signature verification. |
| 7 | Shell injection via incomplete regex | False | isSafeExecutableValue() validates executable names, not commands (src/infra/exec-safety.ts:16-44). Strict allowlist: /^[A-Za-z0-9._+-]+$/. |
| 8 | Env variable injection (LD_PRELOAD) | Partially true, MITIGATED in PR #12; further hardened Feb 21 sync 7 | Gateway validates params.env via policy (src/infra/host-env-security-policy.json) and validateHostEnv() at src/agents/bash-tools.exec-runtime.ts:84 (enforced at src/agents/bash-tools.exec.ts:705). sanitizeHostExecEnv() at src/infra/host-env-security.ts:224 is the unified enforcement point. Node-host: sanitizeEnv() at src/node-host/invoke.ts:95 delegates to sanitizeHostExecEnv(). Requires human approval + localhost. Related to GHSA-82g8-464f-2mv7. |
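
To see why the claim 7 allowlist leaves no room for shell metacharacters, it helps to run the quoted pattern against a few inputs. The sketch below uses the regex exactly as cited in the table; the function name and sample strings are hypothetical, and the real isSafeExecutableValue() may apply further checks that are not reproduced here.

```typescript
// The pattern is quoted from the claim 7 row; everything else is illustrative.
const SAFE_EXECUTABLE_NAME = /^[A-Za-z0-9._+-]+$/;

// Hypothetical helper: true only if every character is in the allowlisted set.
function looksLikeSafeExecutableName(value: string): boolean {
  return SAFE_EXECUTABLE_NAME.test(value);
}

const samples: string[] = [
  "ffmpeg",        // plain name
  "python3.11",    // dots and digits are allowed
  "g++",           // plus signs are allowed
  "ls; rm -rf /",  // spaces, semicolons, and slashes are not
  "$(whoami)",     // command substitution characters are not
  "../bin/sh",     // path separators are not
  "",              // empty string fails the one-or-more quantifier
];

for (const sample of samples) {
  console.log(JSON.stringify(sample), looksLikeSafeExecutableName(sample));
}
```

The first three samples match and the remaining four do not. Since the anchored character class excludes whitespace, quotes, `;`, `|`, `&`, `$`, and `/`, a value that passes cannot carry a second command or an argument.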

Result: 0 of 8 claims are exploitable as described.

  • 5 are false or misleading (claims 3, 4, 5, and 7 are factually incorrect; claim 6 describes a real token format but misstates its risk)
  • 2 are partially true but heavily overstated (claims 1, 8)
  • 1 is a true observation with misleading risk framing (claim 2)
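
Claim 8 is rated only partially true because requested environment variables are checked against a policy before a command runs. A minimal sketch of that kind of filter follows. It is not the project's validateHostEnv() or sanitizeHostExecEnv(): the function name and result shape are hypothetical, and apart from LD_PRELOAD (the variable named in the claim) the denylist entries are assumptions rather than the contents of host-env-security-policy.json.

```typescript
// Hypothetical denylist. LD_PRELOAD is the variable named in claim 8; the other
// entries are illustrative assumptions, not the project's actual policy.
const BLOCKED_ENV_KEYS: string[] = ["LD_PRELOAD", "LD_LIBRARY_PATH", "DYLD_INSERT_LIBRARIES"];

interface EnvFilterResult {
  allowed: Record<string, string>;
  blocked: string[];
}

// Splits a requested environment into variables that may be passed through
// and variables that the policy refuses.
function filterRequestedEnv(requested: Record<string, string>): EnvFilterResult {
  const allowed: Record<string, string> = {};
  const blocked: string[] = [];
  for (const key of Object.keys(requested)) {
    if (BLOCKED_ENV_KEYS.indexOf(key) !== -1) {
      blocked.push(key);
    } else {
      allowed[key] = requested[key];
    }
  }
  return { allowed, blocked };
}

const envResult = filterRequestedEnv({ LD_PRELOAD: "/tmp/evil.so", LANG: "C" });
console.log(envResult.allowed, envResult.blocked);
```

Here LANG is passed through and LD_PRELOAD is reported as blocked. Whether a caller should drop blocked variables or reject the whole request is a design choice this sketch leaves open; per the table, the real path also requires human approval and a localhost connection.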

Methodology concerns

The article claims a "Complete White Box Penetration Test" but demonstrates a pattern consistent with static code reading without architectural context. Key security controls (Docker sandboxing, DNS pinning, RBAC enforcement, RSA signing, human approval flow) were either not tested or not acknowledged. This mirrors the first audit's weakness: analyzing code patterns in isolation without tracing the full execution path through layered defenses.
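
The point about tracing the full execution path can be made concrete by modeling an attack chain as a list of steps, each annotated with the controls that stop it. The sketch below is an illustrative model only: the types and function names are hypothetical, and the step-to-control mapping is one reading of the verdicts in this section, not output from any tool.

```typescript
// Hypothetical model of one attack chain quoted from the article, annotated
// with the controls this section says apply to each step.
interface ChainStep {
  description: string;
  blockedBy: string[]; // controls that independently stop this step
}

const chain: ChainStep[] = [
  { description: "SSRF steals AWS credentials", blockedBy: ["DNS pinning"] },
  { description: "Environment injection achieves RCE", blockedBy: ["env policy", "human approval"] },
  { description: "Persistent backdoor via config.patch", blockedBy: ["gateway auth"] },
];

// Returns the first step that at least one control stops, if any.
function firstBlockedStep(steps: ChainStep[]): ChainStep | undefined {
  for (const step of steps) {
    if (step.blockedBy.length > 0) {
      return step;
    }
  }
  return undefined;
}

// A chain is exploitable end to end only if no step is blocked.
function isExploitable(steps: ChainStep[]): boolean {
  return firstBlockedStep(steps) === undefined;
}

console.log(isExploitable(chain), firstBlockedStep(chain));
```

Reading code patterns in isolation amounts to evaluating each step with an empty blockedBy list, which is why a chain can look viable on paper while failing at its first link in practice.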

Comparison to first audit

| Aspect | Argus (Issue #1796) | Medium Article (Saad Khalid) |
|---|---|---|
| Methodology | Automated scanners + AI | Claims manual pentest |
| Findings | 512 total, 8 critical | 8 critical |
| Exploitable as described | 0 of 8 | 0 of 8 |
| Core weakness | Pattern matching without context | Code reading without architectural context |

For defense-in-depth gap status and post-merge hardening notes, see Post-merge security hardening.

For full detailed analysis: Opus 4.5 Security Audit Analysis

Article: Why Clawdbot is a Bad Idea (Medium)