IMP-21: gate Stage 6 (knowledge_extract) on non-empty shortlist by youwangd · Pull Request #288 · aiming-lab/AutoResearchClaw

youwangd · 2026-06-01T21:25:52Z

When Stage 5 (LITERATURE_SCREEN) returns PAUSED with decision="rejected_all" because its strict screen rejects all candidates, no shortlist.jsonl is written. Stage 6 then runs on empty input, spends an LLM turn extracting cards from nothing, produces low-quality fallback content, and lets downstream stages cascade on garbage.

Add a defensive entry-point gate in _execute_knowledge_extract: if shortlist.jsonl is missing, empty, or whitespace-only, return PAUSED with a knowledge_meta.json explaining the upstream issue. The gate fires before any LLM call, so a misconfigured run does not waste budget.
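A minimal sketch of the gate logic described above, not the code in this PR. The `StageStatus` enum here is a stand-in for the project's own, and the function name `gate_on_shortlist`, its parameters, and the fields written to knowledge_meta.json are all hypothetical.

```python
import json
from enum import Enum
from pathlib import Path
from typing import Optional


class StageStatus(Enum):
    """Stand-in for the project's status enum (assumption)."""
    DONE = "done"
    PAUSED = "paused"


def gate_on_shortlist(shortlist_path: Path, out_dir: Path) -> Optional[StageStatus]:
    """Return PAUSED if the shortlist is missing, empty, or whitespace-only.

    Returns None when the shortlist has content, meaning the stage may proceed.
    """
    has_rows = (
        shortlist_path.is_file()
        and shortlist_path.read_text(encoding="utf-8").strip() != ""
    )
    if has_rows:
        return None

    # Hypothetical meta payload: the real field names are not shown in the PR text.
    meta = {
        "decision": "empty_shortlist",
        "reason": "shortlist.jsonl is missing or empty; check Stage 5 output",
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "knowledge_meta.json").write_text(
        json.dumps(meta, indent=2), encoding="utf-8"
    )
    return StageStatus.PAUSED
```

Because the check runs at the top of the stage, a caller can return early on a non-None result before constructing any LLM request.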

Why PAUSED, not BLOCKED_APPROVAL

The runner halts on PAUSED unconditionally (runner.py: 'if result.status == StageStatus.PAUSED: break'), whereas BLOCKED_APPROVAL only halts when stop_on_gate=True. --auto-approve explicitly sets stop_on_gate=False, which is exactly the scenario in which this gate must stop the cascade. Returning BLOCKED_APPROVAL would write the meta file but still allow Stages 7-23 to run on missing input, defeating the gate's purpose.

This mirrors the existing Stage 5 pattern, which also returns PAUSED with decision="rejected_all" when its strict screen rejects all candidates (literature.py:762-769). Stage 5's result only becomes BLOCKED_APPROVAL because Stage 5 is in GATE_STAGES and the executor wraps gate-stage results. KNOWLEDGE_EXTRACT is not a gate stage, so its PAUSED status is preserved end-to-end and halts the pipeline regardless of stop_on_gate.
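The halting difference can be illustrated with a toy runner loop. This is a simplified model built only from the behaviour described above. `run_stages` and the enum below are hypothetical stand-ins, not the project's runner.py.

```python
from enum import Enum
from typing import List, Sequence


class StageStatus(Enum):
    """Stand-in for the project's status enum (assumption)."""
    DONE = "done"
    PAUSED = "paused"
    BLOCKED_APPROVAL = "blocked_approval"


def run_stages(statuses: Sequence[StageStatus], stop_on_gate: bool) -> List[int]:
    """Walk per-stage results in order and return the stage numbers that executed."""
    executed = []
    for number, status in enumerate(statuses, start=1):
        executed.append(number)
        if status is StageStatus.PAUSED:
            break  # unconditional halt, independent of stop_on_gate
        if status is StageStatus.BLOCKED_APPROVAL and stop_on_gate:
            break  # only halts when the caller asked to stop on gates
    return executed


def pipeline_with_stage6(status: StageStatus) -> List[StageStatus]:
    """Build a 23-stage result list where only Stage 6 has the given status."""
    return [StageStatus.DONE] * 5 + [status] + [StageStatus.DONE] * 17
```

In this model, `run_stages(pipeline_with_stage6(StageStatus.PAUSED), stop_on_gate=False)` stops after Stage 6, while the same call with `StageStatus.BLOCKED_APPROVAL` runs all 23 stages.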

Tests

Adds 4 unit tests in TestIMP21_KnowledgeExtractEmptyShortlistGate (tests/test_rc_executor.py) covering: missing file, empty file, whitespace-only file, and the positive case where a single valid row passes the gate.
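The four cases can be laid out as a small table-driven check. This sketch does not reproduce the tests in tests/test_rc_executor.py. The predicate `shortlist_has_rows` and the helper `run_cases` are hypothetical names, and the sample row content is made up for illustration.

```python
import tempfile
from pathlib import Path
from typing import Dict


def shortlist_has_rows(path: Path) -> bool:
    """True only if the file exists and contains non-whitespace content."""
    return path.is_file() and bool(path.read_text(encoding="utf-8").strip())


def run_cases() -> Dict[str, bool]:
    """Evaluate the predicate on the four scenarios the unit tests cover."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "shortlist.jsonl"
        results["missing"] = shortlist_has_rows(path)  # file not created yet
        contents = [
            ("empty", ""),
            ("whitespace", " \n\t\n"),
            ("one_row", '{"title": "example"}\n'),  # illustrative row only
        ]
        for name, content in contents:
            path.write_text(content, encoding="utf-8")
            results[name] = shortlist_has_rows(path)
    return results
```

Only the `one_row` case should pass the gate. The other three should cause the stage to return PAUSED.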

Adds 1 runner-level integration test in tests/test_rc_runner.py that drives execute_pipeline with stop_on_gate=False (the --auto-approve scenario) and asserts:
(a) pipeline halts at Stage 6 with status=PAUSED
(b) Stage 7 (SYNTHESIS) and downstream stages 8-23 never execute
(c) pipeline_summary.json reports final_status="paused"
This regression test would fail if the gate returned BLOCKED_APPROVAL. This was verified by temporarily flipping the mock and observing all 23 stages execute.

Reproduction

Run rc-20260530-075008-438447 from our smoke tests hit this exact path (Stage 5 rejected all candidates → Stage 6 ran on empty shortlist → cascade failure into Stage 9 plan-budget overflow).


youwangd wants to merge 1 commit into aiming-lab:main from youwangd:fix/stage6-paused-on-empty-shortlist
