IMP-21: gate Stage 6 (knowledge_extract) on non-empty shortlist #288
Open
youwangd wants to merge 1 commit into
When Stage 5 (LITERATURE_SCREEN) returns PAUSED with decision="rejected_all" because its strict screen rejects all candidates, no shortlist.jsonl is written. Stage 6 then runs on empty input, spends an LLM turn extracting cards from nothing, produces low-quality fallback content, and lets downstream stages cascade on garbage.
Add a defensive entry-point gate in _execute_knowledge_extract: if shortlist.jsonl is missing, empty, or whitespace-only, return PAUSED with a knowledge_meta.json explaining the upstream issue. The gate fires before any LLM call, so a misconfigured run does not waste budget.
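A minimal sketch of the entry-point check described above, not the code from this PR. The function name check_shortlist_gate, the stand-in StageStatus enum, and the fields written to knowledge_meta.json are assumptions made for illustration; only the file names and the missing / empty / whitespace-only conditions come from the description.

```python
import json
from enum import Enum
from pathlib import Path
from typing import Optional


class StageStatus(Enum):
    """Hypothetical stand-in for the project's status enum."""
    DONE = "done"
    PAUSED = "paused"


def check_shortlist_gate(stage_dir: Path) -> Optional[StageStatus]:
    """Return PAUSED when shortlist.jsonl is unusable, or None to proceed."""
    shortlist = stage_dir / "shortlist.jsonl"
    if not shortlist.is_file():
        reason = "missing"
    elif not shortlist.read_text(encoding="utf-8").strip():
        # Covers both a zero-byte file and one holding only whitespace.
        reason = "empty or whitespace-only"
    else:
        return None  # at least one non-blank row: let extraction run

    # Assumed meta schema; the real knowledge_meta.json fields may differ.
    meta = {
        "status": StageStatus.PAUSED.value,
        "reason": f"shortlist.jsonl is {reason}",
        "hint": "check whether the upstream literature screen rejected all candidates",
    }
    (stage_dir / "knowledge_meta.json").write_text(
        json.dumps(meta, indent=2), encoding="utf-8"
    )
    return StageStatus.PAUSED
```

In this sketch the check runs before anything else in the stage, so a caller would return the PAUSED result immediately and never reach the LLM call.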
Why PAUSED, not BLOCKED_APPROVAL
The runner halts on PAUSED unconditionally (runner.py: 'if result.status == StageStatus.PAUSED: break'), whereas BLOCKED_APPROVAL only halts when stop_on_gate=True. --auto-approve explicitly sets stop_on_gate=False, which is exactly the scenario in which this gate must stop the cascade. Returning BLOCKED_APPROVAL would write the meta file but still allow Stages 7-23 to run on missing input, defeating the gate's purpose.
This mirrors the existing Stage 5 pattern, which also returns PAUSED with decision="rejected_all" when its strict screen rejects all candidates (literature.py:762-769). Stage 5 only becomes BLOCKED_APPROVAL because it is in GATE_STAGES and the executor wraps gate-stage results; KNOWLEDGE_EXTRACT is not a gate stage, so its PAUSED status is preserved end-to-end and halts the pipeline regardless of stop_on_gate.
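The control flow argued for above can be modelled with a toy loop. This is a simplified sketch under stated assumptions, not the project's runner or executor: run_pipeline and wrap_gate_result are hypothetical names, stages are represented by their numbers, GATE_STAGES is reduced to Stage 5 only, and the wrapping rule (a gate stage's PAUSED becomes BLOCKED_APPROVAL) is inferred from the description rather than copied from the code.

```python
from enum import Enum
from typing import Dict, List, Tuple


class StageStatus(Enum):
    """Hypothetical stand-in for the project's status enum."""
    DONE = "done"
    PAUSED = "paused"
    BLOCKED_APPROVAL = "blocked_approval"


# Assumption: only Stage 5 is modelled as a gate stage here.
GATE_STAGES = frozenset({5})


def wrap_gate_result(stage: int, status: StageStatus) -> StageStatus:
    """Assumed executor behaviour: a gate stage's PAUSED becomes BLOCKED_APPROVAL."""
    if stage in GATE_STAGES and status is StageStatus.PAUSED:
        return StageStatus.BLOCKED_APPROVAL
    return status


def run_pipeline(
    raw: Dict[int, StageStatus], stop_on_gate: bool, last_stage: int = 23
) -> List[Tuple[int, StageStatus]]:
    """Run stages 1..last_stage; stages absent from `raw` simply succeed."""
    executed = []
    for stage in range(1, last_stage + 1):
        status = wrap_gate_result(stage, raw.get(stage, StageStatus.DONE))
        executed.append((stage, status))
        if status is StageStatus.PAUSED:
            break  # unconditional halt, independent of stop_on_gate
        if status is StageStatus.BLOCKED_APPROVAL and stop_on_gate:
            break  # approval gates halt only when stop_on_gate is set
    return executed


# --auto-approve scenario: Stage 5 rejected everything, Stage 6 gate fires.
auto = run_pipeline(
    {5: StageStatus.PAUSED, 6: StageStatus.PAUSED}, stop_on_gate=False
)
# Counterfactual: the Stage 6 gate returns BLOCKED_APPROVAL instead.
leaky = run_pipeline(
    {5: StageStatus.PAUSED, 6: StageStatus.BLOCKED_APPROVAL}, stop_on_gate=False
)
```

Under these assumptions, auto stops after six stages with Stage 6 still PAUSED, while leaky runs all 23 stages, which is the cascade the gate is meant to prevent.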
Tests
Adds 4 unit tests in TestIMP21_KnowledgeExtractEmptyShortlistGate (tests/test_rc_executor.py) covering: missing file, empty file, whitespace-only file, and the positive case where a single valid row passes the gate.
Adds 1 runner-level integration test in tests/test_rc_runner.py that drives execute_pipeline with stop_on_gate=False (the --auto-approve scenario) and asserts:
(a) pipeline halts at Stage 6 with status=PAUSED
(b) Stage 7 (SYNTHESIS) and downstream stages 8-23 never execute
(c) pipeline_summary.json reports final_status="paused"
This regression test would fail if the gate returned BLOCKED_APPROVAL — verified by temporarily flipping the mock and seeing all 23 stages execute.
Reproduction
rc-20260530-075008-438447 in our smoke-test run hit this exact path (Stage 5 rejected all candidates → Stage 6 ran on empty shortlist → cascade failure into Stage 9 plan-budget overflow).