IMP-21: gate Stage 6 (knowledge_extract) on non-empty shortlist #288

Open: youwangd wants to merge 1 commit into aiming-lab:main from youwangd:fix/stage6-paused-on-empty-shortlist

youwangd commented Jun 1, 2026

When Stage 5 (LITERATURE_SCREEN) returns PAUSED with decision="rejected_all" because its strict screen rejects every candidate, no shortlist.jsonl is written. Stage 6 then runs on empty input, spends an LLM turn extracting cards from nothing, produces low-quality fallback content, and lets downstream stages build on that garbage.

Add a defensive entry-point gate in _execute_knowledge_extract: if shortlist.jsonl is missing, empty, or whitespace-only, return PAUSED with a knowledge_meta.json explaining the upstream issue. The gate fires before any LLM call, so a misconfigured run does not waste budget.
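A minimal sketch of the entry-point check described above, not the code in this PR. The `StageStatus` enum here is a stand-in for the project's own, and `gate_knowledge_extract`, `shortlist_is_empty`, and the fields inside `knowledge_meta.json` are hypothetical names chosen for illustration.

```python
import json
from enum import Enum
from pathlib import Path
from typing import Optional


class StageStatus(Enum):
    # Stand-in for the project's StageStatus; only the members used here.
    DONE = "done"
    PAUSED = "paused"


def shortlist_is_empty(path: Path) -> bool:
    """True if the shortlist is missing, zero-length, or whitespace-only."""
    if not path.is_file():
        return True
    return path.read_text(encoding="utf-8").strip() == ""


def gate_knowledge_extract(stage_dir: Path, shortlist: Path) -> Optional[StageStatus]:
    """Return PAUSED and write knowledge_meta.json when there is nothing to extract.

    Returns None when the shortlist has content and the stage may proceed
    to its (expensive) LLM call.
    """
    if not shortlist_is_empty(shortlist):
        return None
    meta = {
        # Field names are assumptions, not the PR's actual schema.
        "status": StageStatus.PAUSED.value,
        "reason": "empty_shortlist",
        "detail": f"{shortlist.name} is missing or empty; check the upstream screen stage.",
    }
    stage_dir.mkdir(parents=True, exist_ok=True)
    (stage_dir / "knowledge_meta.json").write_text(
        json.dumps(meta, indent=2), encoding="utf-8"
    )
    return StageStatus.PAUSED
```

Because the check runs before any other work in the stage, the only side effect of a gated run in this sketch is the meta file.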

Why PAUSED, not BLOCKED_APPROVAL

The runner halts on PAUSED unconditionally (runner.py: 'if result.status == StageStatus.PAUSED: break'), whereas BLOCKED_APPROVAL only halts when stop_on_gate=True. --auto-approve explicitly sets stop_on_gate=False, which is exactly the scenario in which this gate must stop the cascade. Returning BLOCKED_APPROVAL would write the meta file but still allow Stages 7-23 to run on missing input, defeating the gate's purpose.
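The difference between the two statuses can be illustrated with a toy runner loop. This is a simplified sketch under stated assumptions, not the real runner.py: `run_stages` and `pipeline` are hypothetical helpers, and the enum is a stand-in carrying only the three members needed here.

```python
from enum import Enum
from typing import Callable, List, Tuple


class StageStatus(Enum):
    # Stand-in for the project's StageStatus.
    DONE = "done"
    PAUSED = "paused"
    BLOCKED_APPROVAL = "blocked_approval"


Stage = Tuple[int, Callable[[], StageStatus]]


def run_stages(stages: List[Stage], stop_on_gate: bool) -> List[int]:
    """Run stages in order and return the numbers of the stages that executed."""
    executed: List[int] = []
    for number, stage in stages:
        status = stage()
        executed.append(number)
        if status == StageStatus.PAUSED:
            break  # unconditional halt
        if status == StageStatus.BLOCKED_APPROVAL and stop_on_gate:
            break  # halts only when gates are enforced
    return executed


def pipeline(stage6_status: StageStatus) -> List[Stage]:
    """Toy pipeline of stages 5-8 where only Stage 6 returns a non-DONE status."""
    def make(n: int) -> Callable[[], StageStatus]:
        return lambda: stage6_status if n == 6 else StageStatus.DONE
    return [(n, make(n)) for n in (5, 6, 7, 8)]


# With stop_on_gate=False (the --auto-approve case), PAUSED still halts the run:
paused_run = run_stages(pipeline(StageStatus.PAUSED), stop_on_gate=False)
# BLOCKED_APPROVAL does not, so later stages run on missing input:
blocked_run = run_stages(pipeline(StageStatus.BLOCKED_APPROVAL), stop_on_gate=False)
```

In this sketch `paused_run` stops after Stage 6, while `blocked_run` continues through Stage 8.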

This mirrors the existing Stage 5 pattern, which also returns PAUSED with decision="rejected_all" when its strict screen rejects all candidates (literature.py:762-769). Stage 5 only becomes BLOCKED_APPROVAL because it is in GATE_STAGES and the executor wraps gate-stage results; KNOWLEDGE_EXTRACT is not a gate stage, so its PAUSED status is preserved end-to-end and halts the pipeline regardless of stop_on_gate.

Tests

Adds 4 unit tests in TestIMP21_KnowledgeExtractEmptyShortlistGate (tests/test_rc_executor.py) covering: missing file, empty file, whitespace-only file, and the positive case where a single valid row passes the gate.
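For orientation, the four cases could look roughly like the following. This is an illustrative sketch using the standard library's unittest, not the tests added in tests/test_rc_executor.py. `shortlist_has_rows` is a hypothetical stand-in for the gate's check, and the real tests presumably exercise _execute_knowledge_extract itself.

```python
import tempfile
import unittest
from pathlib import Path


def shortlist_has_rows(path: Path) -> bool:
    """Hypothetical stand-in for the gate's check: any non-whitespace content."""
    return path.is_file() and path.read_text(encoding="utf-8").strip() != ""


class EmptyShortlistGateSketch(unittest.TestCase):
    def setUp(self) -> None:
        # Each test gets a fresh temporary directory with no shortlist in it.
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)
        self.shortlist = Path(self._tmp.name) / "shortlist.jsonl"

    def test_missing_file_is_gated(self) -> None:
        self.assertFalse(shortlist_has_rows(self.shortlist))

    def test_empty_and_whitespace_files_are_gated(self) -> None:
        for content in ("", " \n\t\n"):
            with self.subTest(content=content):
                self.shortlist.write_text(content, encoding="utf-8")
                self.assertFalse(shortlist_has_rows(self.shortlist))

    def test_single_valid_row_passes(self) -> None:
        self.shortlist.write_text('{"title": "example"}\n', encoding="utf-8")
        self.assertTrue(shortlist_has_rows(self.shortlist))
```

Saved as a module, the sketch can be run with `python -m unittest`.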

Adds 1 runner-level integration test in tests/test_rc_runner.py that drives execute_pipeline with stop_on_gate=False (the --auto-approve scenario) and asserts:
(a) pipeline halts at Stage 6 with status=PAUSED
(b) Stage 7 (SYNTHESIS) and downstream stages 8-23 never execute
(c) pipeline_summary.json reports final_status="paused"
This regression test would fail if the gate returned BLOCKED_APPROVAL. This was verified by temporarily flipping the mock and seeing all 23 stages execute.

Reproduction

rc-20260530-075008-438447 in our smoke-test run hit this exact path (Stage 5 rejected all candidates → Stage 6 ran on empty shortlist → cascade failure into Stage 9 plan-budget overflow).
