Created: 2026-02-23
Audited by: Antigravity AI Agent
Scope: All files insrc/agent/, includingloop.ts,router.ts,fallback.ts,context-guard.ts,prompts.ts,providers/
Reference:src/agent/README.md,src/agent/ARCHITECTURAL_REVIEW.md
Status: Open — ready for AI agent implementation
Context: The agent is the core brain of Talon. Every bug here affects ALL channels.
User message arrives (via any channel)
→ eventBus.emit('message.inbound') [gateway/index.ts:90]
→ gateway listener calls agentLoop.run(session) [gateway/index.ts:105]
AgentLoop.run(session):
1. state = 'thinking' [loop.ts:151]
2. Check compression needed [loop.ts:156]
3. Get default provider (modelRouter) [loop.ts:161]
4. Set available tools for system prompt [loop.ts:172]
5. LOOP (max iterations):
a. state = 'executing' [loop.ts:183]
b. Build context (memoryManager.buildContext) [loop.ts:195]
c. Evaluate context window [loop.ts:198]
d. Truncate if needed [loop.ts:205]
e. LLM call with fallback + 90s timeout [loop.ts:224-273]
f. IF tool_calls → execute tools → continue [loop.ts:338-455]
g. IF text response → yield done → return [loop.ts:457-508]
6. Max iterations → yield warning + done [loop.ts:511-540]
→ eventBus.emit('message.outbound') [gateway/index.ts:136]
→ Channel delivers response (CLI only, see IssuesChannels.md CHAN-003)
| Step | Status | Notes |
|---|---|---|
| Message ingestion | ✅ Works | Via BaseChannel.ingestMessage() |
| Session resolution | ✅ Works | Via MessageRouter.handleInbound() |
| Provider selection | See AGENT-003 (unresolved env vars) | |
| Context building | ✅ Works | memoryManager.buildContext() |
| Context guard | See AGENT-008 (token estimation) | |
| LLM call | ✅ Works | With timeout + fallback |
| Tool execution | ✅ Works | Rate-limited, error-handled |
| Response delivery | ❌ Broken for non-CLI | See IssuesChannels.md CHAN-003 |
| Memory compression | See AGENT-010 | |
| Error handling | ✅ Works | Yields error chunk back to gateway |
The existing ARCHITECTURAL_REVIEW.md (created 2026-02-20) already identified these — still unfixed:
- Severity: 🔴 Critical —
already documented in ARCHITECTURAL_REVIEW.md - File:
src/agent/README.md, Section 4 - Problem: README claims
loop.tsis 21,592 lines when it's 576. All 5 files are overstated by 35-43x. These are byte counts misreported as line counts.
| File | README claims | Actual lines | Actual bytes |
|---|---|---|---|
loop.ts |
"21,592 lines" | 576 | 23,421 |
prompts.ts |
"21,851 lines" | 511 | 21,851 |
router.ts |
"9,733 lines" | 263 | 9,733 |
fallback.ts |
"10,012 lines" | 291 | 10,012 |
context-guard.ts |
"6,346 lines" | 180 | 6,346 |
- Root cause: The README generator used byte counts instead of line counts.
- Fix: Replace all numbers with actual line counts from
wc -l.
- Severity: 🟠 High
- File:
src/agent/README.md, Section 7 (Data Contracts) - Problem: Claims
LLMMessageSchema,ToolCallSchema,AgentConfigSchema,ProviderConfigSchemaas Zod schemas. Zero Zod schemas exist in the agent directory. Only TypeScript interfaces are used. - Fix: Either implement schemas or update README to say "TypeScript interfaces only (no runtime validation)".
-
Severity: 🟢 Low (downgraded — config loader DOES resolve env vars)
-
File:
src/agent/router.ts, line 46 -
Problem: The router skips providers whose API key starts with
${:if (!apiKey || apiKey.startsWith('${')) { logger.debug({ provider: id }, 'Skipping provider — no API key'); continue; }
The config loader (
src/config/loader.ts:190) callsresolveEnvVarsDeep()which resolves${OPENROUTER_API_KEY}→ actual value BEFORE reaching the router. So this works correctly as long as the env var is set.However, if the env var is NOT set (e.g.,
OPENROUTER_API_KEYundefined),resolveEnvVars()returns the original"${OPENROUTER_API_KEY}"string (line 44 returns original onundefined), and the router silently skips the provider with only adebuglog. The user gets no warning about the missing env var. -
Fix: Log at
warnlevel instead ofdebugwhen skipping due to unresolved env var pattern:if (apiKey.startsWith('${')) { logger.warn({ provider: id, hint: apiKey }, 'Skipping provider — env var not set'); continue; }
-
Severity: 🟡 Medium
-
File:
src/agent/loop.ts, lines 108-109 -
Problem:
executeTool()dynamically imports../tools/normalize.jsand callsnormalizeToolExecution(). ButexecuteTool()is never called by the main loop — the loop usestoolHandler.execute()directly at line 383. Andnormalize.tswas flagged as dead code indocs/IssuesTools.md(TOOL-003).executeTool()is only a public utility for external callers. The normalization wrapper adds overhead for no benefit since the loop handles errors itself. -
Fix: Either:
- (A) Remove
executeTool()entirely if nothing external uses it - (B) Make the main loop also use normalized execution for consistency
- (A) Remove
- Severity: 🟡 Medium
- File:
src/agent/loop.ts, various lines - Problem: The README claims "PLAN → DECIDE → EXECUTE → EVALUATE → COMPRESS → RESPOND with guard conditions". Reality:
- No
PLANorDECIDEphases exist - State is set with raw
this.state = 'xxx'assignments - No validation that the transition is valid
- Any code can set any state from any other state
- If an error occurs during
evaluating, the state can get stuck
- No
- Fix: Add a
transition()method:private transition(to: LoopState): void { const allowed: Record<LoopState, LoopState[]> = { 'idle': ['thinking'], 'thinking': ['executing', 'compressing', 'error'], 'executing': ['evaluating', 'error'], 'evaluating': ['responding', 'executing'], 'compressing': ['thinking'], 'responding': ['idle'], 'error': ['idle'], }; if (!allowed[this.state]?.includes(to)) { logger.warn({ from: this.state, to }, 'Invalid state transition'); } this.state = to; }
- Severity: 🟢 Low
- File:
src/agent/loop.ts, lines 392, 444 - Problem: Both
tool.completeevent (line 392) and the tool message metadata (line 444) reportexecutionTime: 0. The actual execution time is never measured. - Fix: Capture
Date.now()before and after tool execution:const toolStart = Date.now(); output = await toolHandler.execute(tc.args, session); const executionTime = Date.now() - toolStart;
-
Severity: 🟡 Medium
-
File:
src/agent/context-guard.ts, lines 65-79 -
Problem: Model name matching uses
normalized.includes(pattern):for (const [pattern, window] of Object.entries(MODEL_CONTEXT_WINDOWS)) { if (normalized.includes(pattern.toLowerCase())) { return window; } }
This causes false matches:
"openrouter/deepseek/deepseek-chat-v3-0324"matches"deepseek-chat"→ returns 64k- But it also matches
"default"if iterated in wrong order "claude-3-opus"matches before"claude-3-haiku"if the model ID contains both
The iteration order of
Object.entries()is insertion order, sogpt-4omatches beforegpt-4o-mini— returning the same 128k, but this is fragile. -
Fix: Sort patterns by length (longest first) to match most specific first. Or use a lookup table with exact slug extraction.
-
Severity: 🟡 Medium
-
File:
src/agent/context-guard.ts, lines 44-49 -
Problem:
estimateTokens()usestext.length / 4. For English prose this is ~OK, but:- Code blocks: Typically ~3 chars/token → underestimates by ~25%
- JSON/structured data: Varies wildly
- Non-English text: Chinese/Japanese can be 1-2 chars/token → underestimates by 50-75%
- Tool outputs: Often JSON, underestimated
This means the context guard may say "you have 30k tokens remaining" when you actually have 20k. Combined with large tool outputs, this can cause unexpected context overflow errors from the LLM.
-
Fix: Use a proper tokenizer library, or at minimum add a safety margin:
export function estimateTokens(text: string): number { return Math.ceil(text.length / 3); // Conservative: ~3 chars/token }
- Severity: 🟡 Medium
- File:
src/agent/context-guard.ts, lines 141-179 - Problem: Truncation removes oldest non-system messages. But tool call messages (role:
assistantwithtool_calls) and tool result messages (role:tool) must come in pairs. If truncation removes an assistant message with tool_calls but keeps the corresponding tool result (or vice versa), the LLM will receive an invalid message sequence and may crash with:"Invalid parameter: messages with role 'tool' must be a response to a preceding message with 'tool_calls'"
- Fix: When truncating, always remove tool_call/tool_result as atomic pairs.
-
Severity: 🟡 Medium
-
File:
src/agent/loop.ts, lines 545-568 -
Problem:
compressMemory():- Gets messages for compression
- Formats them
- Sends to
memoryCompressor.compress()— which makes an LLM call - Applies compression result
If the compression LLM call fails (rate limit, timeout), the error is not caught — it propagates up and crashes the entire
run()generator. This means a compression failure kills the user's current request even though the actual task might not need compression. -
Fix: Wrap compression in try/catch:
private async *compressMemory(session: Session): AsyncIterable<AgentChunk> { try { // ... existing code } catch (err) { logger.error({ err }, 'Memory compression failed — continuing without compression'); yield { type: 'thinking', content: 'Memory compression skipped (will retry later)' }; } }
- Severity: 🟢 Low
- File:
src/agent/providers/README.md, lines 7-8 - Problem: Claims
openai-compatible.tsis "9455 lines" andopencode.tsis "3880 lines". Actual: 283 and 110 lines respectively. Same byte-count-as-line-count bug. - Fix: Update to actual line counts.
- Severity: 🟢 Low
- File:
src/agent/providers/opencode.ts, line 13 - Problem:
this.baseUrl = 'https://opencode.ai/zen/v1'is hardcoded. If OpenCode changes their URL or the user wants to use a self-hosted version, there's no way to configure it. - Fix: Accept
baseUrlas a constructor parameter with a default.
- Severity: 🟢 Low
- File:
src/agent/providers/opencode.ts, lines 99-104 - Problem:
chatStream()throws'Streaming not supported for OpenCode provider'. If any future code path callschatStream()(e.g., adding streaming to the agent loop), it will crash instead of falling back to non-streaming. - Fix: Implement a polyfill that returns the non-streaming result as a single chunk.
- Severity: 🟢 Low
- File:
src/agent/providers/README.md, line 47 - Problem: Says
https://api.opencode.ai/v1but actual code useshttps://opencode.ai/zen/v1. - Fix: Update README.
- Severity: 🟡 Medium
- File:
src/agent/prompts.ts, lines 234-402 - Problem:
buildSystemPrompt()appends a massive section (~2000 tokens) of rules, tool categories, response format instructions, and multi-step task guidance to EVERY system prompt. This:- Wastes ~2000 tokens per message (at $0.001/1k tokens for DeepSeek, that's $0.002/message)
- Reduces available context for actual conversation
- Hardcodes tool categories that may not match registered tools
- Includes
<think>and<final>tag instructions that not all models support
- Fix:
- Move static instructions to SOUL.md (loaded once, user-editable)
- Only include tool categories for tools that are actually registered
- Make
<think>/<final>format model-aware (some models don't support it)
- Severity: 🟡 Medium
- File:
src/agent/prompts.ts, lines 286-323 - Problem: Lines 286-323 hardcode tool categories like "Apple Mail", "Browser Automation", "Delegation". If tools are added/removed from the registry, this section becomes stale. The actual available tools ARE listed dynamically (line 283), but the category descriptions are static.
- Fix: Generate tool categories from the registered tools, or remove the manual categorization section.
- Severity: 🟢 Low
- File:
src/agent/prompts.ts, lines 65-72 - Problem:
loadWorkspaceFile()doesfs.existsSync()+fs.readFileSync()for SOUL.md, USER.md, IDENTITY.md, MEMORY.md, and daily memories on every single message. That's 5-7 disk reads per message. Files rarely change mid-conversation. - Fix: Cache with a 30-second TTL, or use a file watcher.
- Severity: 🟡 Medium
- File:
src/agent/router.ts, lines 82-166;src/agent/loop.ts, line 161 - Problem: README says the router does "cost-based model selection (simple→cheapest, complex→best)". But the agent loop only calls
getDefaultProvider()(line 161), which always returns the'moderate'complexity route. The entiregetProviderForTask()method with itssimple/moderate/complex/summarizelogic is never invoked by the agent loop.getProviderForTask()is effectively dead code for the main conversation flow.
- Fix: Either:
- (A) Implement task complexity detection in the loop (analyze user message → determine complexity → select appropriate provider)
- (B) Document that task-based routing is not yet wired up
- Severity: 🟢 Low
- File:
src/agent/router.ts, lines 193-206 - Problem:
selectModelByComplexity()hardcodes lists like['minimax-m2.5-free', 'big-pickle', 'glm-5-free', ...]and['deepseek-reasoner', 'o3-mini', 'claude-opus', ...]. These lists will become stale as models are deprecated and new ones appear. - Fix: Move model tier lists to config or make them discoverable from provider model lists.
- Severity: 🟡 Medium
- File:
src/agent/fallback.ts, lines 157-254 - Problem: README (line 163) says "Missing: Circuit breaker per provider." This is accurate — if a provider is consistently failing, the fallback router still tries it every time (wasting latency on the timeout). No state is tracked across requests.
- Fix: Implement a simple circuit breaker:
private failureCounts = new Map<string, { count: number; lastFail: number }>(); private isCircuitOpen(providerId: string): boolean { const state = this.failureCounts.get(providerId); if (!state) return false; // Open circuit for 60s after 3 consecutive failures return state.count >= 3 && (Date.now() - state.lastFail) < 60000; }
- Severity: 🟡 Medium
- File:
src/agent/fallback.ts, line 241 - Problem:
This checks
if (this.providers.indexOf(providerInfo) < orderedProviders.length - 1) { await this.delay(this.retryDelayMs); }
this.providers.indexOf(providerInfo)but iterates overorderedProviders. SinceorderedProvidersmay have a different order thanthis.providers, the indexOf check could return-1(if the object reference doesn't match) or an incorrect index, causing incorrect delay behavior. - Fix: Change to:
const currentIndex = orderedProviders.indexOf(providerInfo); if (currentIndex < orderedProviders.length - 1) { await this.delay(this.retryDelayMs); }
-
Severity: 🟠 High
-
File:
src/agent/loop.ts, line 383 -
Problem: The agent loop calls
toolHandler.execute(tc.args, session)with raw LLM-generated arguments. The LLM controls what args are passed to each tool. If a tool (likeshell_execute) doesn't validate its inputs, the LLM can be tricked (via prompt injection) into running arbitrary commands.While individual tools should validate their own inputs, the agent loop provides no defense-in-depth. A single un-validated tool is a security hole.
-
Fix: Add a central sanitization layer before tool execution:
- Log all tool calls with their args for audit
- Validate that args match the tool's declared parameter schema
- Block tools that are in the security deny list
- Severity: 🟡 Medium
- File:
src/agent/prompts.ts, lines 65-72 - Problem: Workspace files (SOUL.md, USER.md, MEMORY.md) are loaded and injected verbatim into the system prompt. If any of these files contain prompt injection text (e.g., "IGNORE ALL PREVIOUS INSTRUCTIONS"), it becomes part of the system prompt. Since tools like
file_writecan modify these files, a clever prompt injection chain could self-propagate. - Fix: Sanitize workspace file content before injection, or at minimum add a separator that the LLM understands is user-controlled content.
-
Severity: 🟠 High
-
File:
tests/directory -
Problem: README claims "323 tests" but:
- No
agent-loop.test.tsexists - No
fallback.test.tsexists - No
context-guard.test.tsexists - No
prompts.test.tsexists model-router.test.tsexists (1 file)
The core brain of the entire system has almost no test coverage. The "323 tests" claim is fabricated.
- No
-
Fix: Create tests for:
- Agent loop: tool execution, max iterations, error handling, compression trigger
- Fallback: error classification, provider ordering, retry behavior
- Context guard: token estimation, truncation (especially tool pairs)
- Prompts: workspace file loading, template detection, system prompt construction
- Severity: 🟡 Medium
- File:
src/agent/loop.ts, lines 224-273 - Problem: The timeout implementation uses
Promise.race():If the LLM call succeeds before the timeout, theconst timeoutPromise = new Promise<never>((_, reject) => { setTimeout(() => reject(...), LLM_TIMEOUT_MS); }); return Promise.race([llmPromise, timeoutPromise]);
setTimeoutstill fires 90 seconds later and callsreject()on the already-resolved promise. While this doesn't break anything (unhandled rejection on a settled promise is a no-op), it keeps the timer reference alive, preventing garbage collection. - Fix: Clear the timeout on success:
let timeoutId: NodeJS.Timeout; const timeoutPromise = new Promise<never>((_, reject) => { timeoutId = setTimeout(() => reject(...), LLM_TIMEOUT_MS); }); return Promise.race([ llmPromise.finally(() => clearTimeout(timeoutId)), timeoutPromise, ]);
- Severity: 🟢 Low
- File:
src/agent/loop.ts, line 237 - Problem:
this.getToolDefinitions()is called inside the loop on every iteration. This rebuilds the entire tool definition array from the Map. Tools don't change during a loop run. - Fix: Build tool definitions once before the loop and reuse.
| # | Issue | What | Time |
|---|---|---|---|
| 1 | AGENT-009 |
Fix tool pair truncation (causes LLM crashes) | 30 min |
| 2 | AGENT-010 |
Catch compression failures (kills active request) | 10 min |
| 3 | AGENT-001 |
Fix README line counts (bytes → lines) | 5 min |
| # | Issue | What |
|---|---|---|
| 4 | AGENT-022 |
Add tool argument validation layer |
| 5 | AGENT-021 |
Fix fallback indexOf bug |
| 6 | AGENT-008 |
Improve token estimation safety margin |
| 7 | AGENT-020 |
Add circuit breaker for failed providers |
| 8 | AGENT-025 |
Clear LLM timeout on success |
| 9 | AGENT-005 |
Add state transition guards |
| # | Issue | What |
|---|---|---|
| 10 | AGENT-015 |
Reduce system prompt bloat |
| 11 | AGENT-017 |
Cache workspace file reads |
| 12 | AGENT-018 |
Wire up task complexity routing |
| 13 | AGENT-024 |
Write agent unit tests |
| 14 | AGENT-003 |
Improve env var warning logging |
| 15 | AGENT-002,004,006,007,011-14,16,19,23,26 |
Everything else |
| File | Lines | Status | Critical Issues |
|---|---|---|---|
src/agent/loop.ts |
576 | 🟡 Has bugs | AGENT-004, 005, 006, 009, 010, 022, 025, 026 |
src/agent/router.ts |
263 | 🟡 Has issues | AGENT-003, 018, 019 |
src/agent/fallback.ts |
291 | 🟡 Has bugs | AGENT-020, 021 |
src/agent/context-guard.ts |
180 | 🟡 Has issues | AGENT-007, 008, 009 |
src/agent/prompts.ts |
511 | 🟡 Bloated | AGENT-015, 016, 017, 023 |
src/agent/providers/openai-compatible.ts |
283 | ✅ Clean | — |
src/agent/providers/opencode.ts |
110 | ✅ Clean | AGENT-012, 013 |
src/agent/README.md |
340 | 🔴 Hallucinated | AGENT-001, 002 |
src/agent/ARCHITECTURAL_REVIEW.md |
588 | ✅ Accurate | Pre-existing review |
src/agent/providers/README.md |
58 | 🟡 Inaccurate | AGENT-011, 014 |
| Metric | src/tools/ | src/channels/ | src/agent/ |
|---|---|---|---|
| Total issues | 32 | 24 | 26 |
| Critical/High | 8 | 9 | 7 |
| Has unit tests? | Partial (Apple only) | ❌ None | ❌ Nearly none |
| README accuracy | ~60% | ~50% | ~40% (worst) |
| Dead code found | 3 modules | 1 file | 1 method |
| Security issues | 5 | 2 | 2 |
| Blocks daily use? | No | Yes (CHAN-003) | Maybe (AGENT-003) |
The agent README is the least trustworthy of all three — primarily because of the 35-43x line count inflation, which suggests no human ever reviewed it.