feat(routing): respect pinned default_model + observable empty-response handling by osvalois · Pull Request #11 · cuervo-ai/halcon

osvalois · 2026-04-29T09:17:11Z

Phase 1 of the routing-correctness audit

Full design + roadmap: docs/architecture/routing-correctness-audit.md — please read before reviewing the diff.

What this PR closes

The user-visible "agent doesn't respond" failure mode where the adaptive selector silently downgraded a user-pinned claude-sonnet-4-6 to gemini-2.0-flash, the request was forwarded to Cenzontle which routed everything via the OPENAI provider against an Azure AI Services deployment that does not recognise Gemini, returning HTTP 200 with an empty body — and Halcon's D1 retry path then re-issued the same misrouted request twice more before synthesising a generic fallback.

Five-stage failure cascade documented in the audit doc §1; this PR ships the Halcon side. The Cenzontle backend side (typed errors + 502 on empty body) is a sibling PR against cuervo-ai/cenzontle on a branch with the same name.

Changes (Halcon)

Area	Change
`ModelSelectionConfig::respect_default_model`	New opt-in `bool` (default `false`). When `true` and `general.default_model` is non-empty/non-`auto`, the adaptive selector is not constructed for the session.
`repl/mod.rs`	Extends the existing `!explicit_model` guard with `pinned_via_config`. Emits `tracing::info` when the selector is skipped so the audit trail shows why the override was bypassed.
`agent/mod.rs` D1 EmptyResponse	The silent `info!` becomes a structured `warn!` with `provider`, `model`, `fallbacks_available`, `fallbacks` list. The user-facing TUI message names the next viable fallback (`halcon -m <fallback> chat ...`) or directs to `halcon doctor` when none is configured.
`model_selector.rs`	New unit test `config_back_compat_default_false` proving prior configs still deserialise with the new field defaulting to `false`.
`docs/architecture/routing-correctness-audit.md`	Full audit: five-stage cascade, before/after flows, files modified, 8-item deferred roadmap with owners/estimates, operator validation steps, acceptance-criteria checklist.

Why a "light" fix on the empty-response branch

The clean version requires either plumbing effective_provider/selected_model mutability through round_setup for true per-iteration model swap, OR a new exhausted_models: Vec<String> on AgentState fed back into the selector and the Paloma router. Both cross 4-5 modules and warrant a focused PR with the agent-loop owner. This PR converts the silent failure into a loud, actionable warning naming the next fallback. Combined with the Cenzontle 502, the user-visible symptom is fully resolved. The deferred refactor is tracked as R1 in the roadmap.

Test plan

cargo fmt --all -- --check — clean
cargo check -p halcon-cli --no-default-features --features tui — clean
cargo test -p halcon-cli --lib --no-default-features --features tui model_selector — 54 passed including new back-compat test
cargo test -p halcon-cli --lib --no-default-features --features tui (full) — 4863 passed; 13 pre-existing failures in task_analyzer / hybrid_classifier (unrelated, verified by git stash + re-run before commit)
CI Test (ubuntu) green
CI Test (macos) green on merge to main
Operator smoke-test per audit doc §8 once Cenzontle PR also lands

Acceptance criteria status (from spec §7)

Criterion	Status
`claude-sonnet-4-6` cannot be silently substituted by Gemini	✅ via `respect_default_model=true`
Failures observable + actionable	✅ enriched warn + user-facing fallback hint
Fallback does not retry blindly	🔶 Halcon now warns with next fallback name; true automated failover scoped as R1
No `200 OK` with empty body	🔶 Cenzontle PR (sibling)
Typed errors at the wire	🔶 Cenzontle PR (sibling)

🤖 Generated with Claude Code

…se handling Phase 1 of the routing-correctness audit (docs/architecture/routing-correctness-audit.md). Closes the user-visible "agent doesn't respond" failure mode where the adaptive selector silently downgraded a user-pinned `claude-sonnet-4-6` to `gemini-2.0-flash`, the request was forwarded to Cenzontle which routed everything via the OPENAI provider against an Azure AI Services deployment that does not recognise Gemini, returning HTTP 200 with an empty body — and Halcon's D1 retry path then re-issued the same misrouted request twice more before synthesising a generic fallback. This commit ships the Halcon side of the fix (the Cenzontle backend side is a separate PR against cuervo-ai/cenzontle on the same branch name). CHANGES 1. `ModelSelectionConfig::respect_default_model: bool` (default false, opt-in for back-compat). When true AND `general.default_model` is a non-empty/non-`auto` value, the adaptive selector is not constructed for the session — the pinned model is used verbatim, equivalent to passing `--model` on the CLI. 2. `repl/mod.rs`: extends the existing `!explicit_model` guard with the new `pinned_via_config` condition. Emits `tracing::info` when the selector is skipped so the audit trail shows *why* the override was bypassed (Loki/Tempo correlation). 3. `agent/mod.rs` D1 EmptyResponse branch: - replaces the silent `info!` with a `warn!` carrying provider, model, and the count + identity of available fallback models - replaces the generic `[frontier] empty response detected (retry N/M)` user message with one that names the next viable fallback (or directs the user to `halcon doctor` when none is configured) - leaves a TODO with a pointer to the audit doc for the deferred true automated failover (R1 in the roadmap), which requires a cross-module mutability refactor scoped to its own PR - The same-model retry behaviour is preserved for now to avoid regressing legitimate transient failures; it just becomes loud and actionable instead of silent and confusing. 4. New unit test `config_back_compat_default_false` proves that configs written before this change still deserialise (they get `respect_default_model = false`, preserving prior behaviour). `config_serde_roundtrip` updated for the new field. 5. `docs/architecture/routing-correctness-audit.md` — full audit: five-stage failure cascade, before/after flow diagrams, files modified across both repos, eight-item roadmap of deferred work with owners/estimates, and operator validation steps. VERIFICATION - `cargo fmt --all -- --check` clean. - `cargo check -p halcon-cli --no-default-features --features tui` clean. - `cargo test -p halcon-cli --lib --no-default-features --features tui model_selector` → 54 passed (incl. the new back-compat test). - 13 failing tests in the full `halcon-cli` lib suite are all in `repl::domain::task_analyzer` / `hybrid_classifier` and are pre-existing on `main` (verified with `git stash` + re-run before this commit). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Cuts a patch release containing: - feat(routing): respect pinned default_model when the user fixes a capable model in halcon.config.yaml (#11) — the planner no longer silently downgrades a pinned Opus/Sonnet selection to Haiku. - observable empty-response handling — explicit RuntimeEvent emitted when an upstream provider returns no choices instead of degrading to a generic error. No behavioral changes outside the routing path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

osvalois merged commit de67462 into main Apr 29, 2026
16 checks passed

osvalois deleted the fix/routing-correctness-phase-1 branch April 29, 2026 17:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(routing): respect pinned default_model + observable empty-response handling#11

feat(routing): respect pinned default_model + observable empty-response handling#11
osvalois merged 1 commit intomainfrom
fix/routing-correctness-phase-1

osvalois commented Apr 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

osvalois commented Apr 29, 2026

Phase 1 of the routing-correctness audit

What this PR closes

Changes (Halcon)

Why a "light" fix on the empty-response branch

Test plan

Acceptance criteria status (from spec §7)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant