Skip to content

feat(routing): respect pinned default_model + observable empty-response handling#11

Merged
osvalois merged 1 commit intomainfrom
fix/routing-correctness-phase-1
Apr 29, 2026
Merged

feat(routing): respect pinned default_model + observable empty-response handling#11
osvalois merged 1 commit intomainfrom
fix/routing-correctness-phase-1

Conversation

@osvalois
Copy link
Copy Markdown
Contributor

Phase 1 of the routing-correctness audit

Full design + roadmap: docs/architecture/routing-correctness-audit.md — please read before reviewing the diff.

What this PR closes

The user-visible "agent doesn't respond" failure mode where the adaptive selector silently downgraded a user-pinned claude-sonnet-4-6 to gemini-2.0-flash, the request was forwarded to Cenzontle which routed everything via the OPENAI provider against an Azure AI Services deployment that does not recognise Gemini, returning HTTP 200 with an empty body — and Halcon's D1 retry path then re-issued the same misrouted request twice more before synthesising a generic fallback.

Five-stage failure cascade documented in the audit doc §1; this PR ships the Halcon side. The Cenzontle backend side (typed errors + 502 on empty body) is a sibling PR against cuervo-ai/cenzontle on a branch with the same name.

Changes (Halcon)

Area Change
ModelSelectionConfig::respect_default_model New opt-in bool (default false). When true and general.default_model is non-empty/non-auto, the adaptive selector is not constructed for the session.
repl/mod.rs Extends the existing !explicit_model guard with pinned_via_config. Emits tracing::info when the selector is skipped so the audit trail shows why the override was bypassed.
agent/mod.rs D1 EmptyResponse The silent info! becomes a structured warn! with provider, model, fallbacks_available, fallbacks list. The user-facing TUI message names the next viable fallback (halcon -m <fallback> chat ...) or directs to halcon doctor when none is configured.
model_selector.rs New unit test config_back_compat_default_false proving prior configs still deserialise with the new field defaulting to false.
docs/architecture/routing-correctness-audit.md Full audit: five-stage cascade, before/after flows, files modified, 8-item deferred roadmap with owners/estimates, operator validation steps, acceptance-criteria checklist.

Why a "light" fix on the empty-response branch

The clean version requires either plumbing effective_provider/selected_model mutability through round_setup for true per-iteration model swap, OR a new exhausted_models: Vec<String> on AgentState fed back into the selector and the Paloma router. Both cross 4-5 modules and warrant a focused PR with the agent-loop owner. This PR converts the silent failure into a loud, actionable warning naming the next fallback. Combined with the Cenzontle 502, the user-visible symptom is fully resolved. The deferred refactor is tracked as R1 in the roadmap.

Test plan

  • cargo fmt --all -- --check — clean
  • cargo check -p halcon-cli --no-default-features --features tui — clean
  • cargo test -p halcon-cli --lib --no-default-features --features tui model_selector — 54 passed including new back-compat test
  • cargo test -p halcon-cli --lib --no-default-features --features tui (full) — 4863 passed; 13 pre-existing failures in task_analyzer / hybrid_classifier (unrelated, verified by git stash + re-run before commit)
  • CI Test (ubuntu) green
  • CI Test (macos) green on merge to main
  • Operator smoke-test per audit doc §8 once Cenzontle PR also lands

Acceptance criteria status (from spec §7)

Criterion Status
claude-sonnet-4-6 cannot be silently substituted by Gemini ✅ via respect_default_model=true
Failures observable + actionable ✅ enriched warn + user-facing fallback hint
Fallback does not retry blindly 🔶 Halcon now warns with next fallback name; true automated failover scoped as R1
No 200 OK with empty body 🔶 Cenzontle PR (sibling)
Typed errors at the wire 🔶 Cenzontle PR (sibling)

🤖 Generated with Claude Code

…se handling

Phase 1 of the routing-correctness audit
(docs/architecture/routing-correctness-audit.md).

Closes the user-visible "agent doesn't respond" failure mode where the
adaptive selector silently downgraded a user-pinned `claude-sonnet-4-6`
to `gemini-2.0-flash`, the request was forwarded to Cenzontle which
routed everything via the OPENAI provider against an Azure AI Services
deployment that does not recognise Gemini, returning HTTP 200 with an
empty body — and Halcon's D1 retry path then re-issued the same
misrouted request twice more before synthesising a generic fallback.

This commit ships the Halcon side of the fix (the Cenzontle backend
side is a separate PR against cuervo-ai/cenzontle on the same branch
name).

CHANGES

1. `ModelSelectionConfig::respect_default_model: bool` (default false,
   opt-in for back-compat). When true AND `general.default_model` is a
   non-empty/non-`auto` value, the adaptive selector is not constructed
   for the session — the pinned model is used verbatim, equivalent to
   passing `--model` on the CLI.

2. `repl/mod.rs`: extends the existing `!explicit_model` guard with the
   new `pinned_via_config` condition. Emits `tracing::info` when the
   selector is skipped so the audit trail shows *why* the override was
   bypassed (Loki/Tempo correlation).

3. `agent/mod.rs` D1 EmptyResponse branch:
   - replaces the silent `info!` with a `warn!` carrying provider,
     model, and the count + identity of available fallback models
   - replaces the generic `[frontier] empty response detected (retry
     N/M)` user message with one that names the next viable fallback
     (or directs the user to `halcon doctor` when none is configured)
   - leaves a TODO with a pointer to the audit doc for the deferred
     true automated failover (R1 in the roadmap), which requires a
     cross-module mutability refactor scoped to its own PR
   - The same-model retry behaviour is preserved for now to avoid
     regressing legitimate transient failures; it just becomes loud
     and actionable instead of silent and confusing.

4. New unit test `config_back_compat_default_false` proves that configs
   written before this change still deserialise (they get
   `respect_default_model = false`, preserving prior behaviour).
   `config_serde_roundtrip` updated for the new field.

5. `docs/architecture/routing-correctness-audit.md` — full audit:
   five-stage failure cascade, before/after flow diagrams, files
   modified across both repos, eight-item roadmap of deferred work
   with owners/estimates, and operator validation steps.

VERIFICATION

- `cargo fmt --all -- --check` clean.
- `cargo check -p halcon-cli --no-default-features --features tui`
  clean.
- `cargo test -p halcon-cli --lib --no-default-features --features tui
  model_selector` → 54 passed (incl. the new back-compat test).
- 13 failing tests in the full `halcon-cli` lib suite are all in
  `repl::domain::task_analyzer` / `hybrid_classifier` and are
  pre-existing on `main` (verified with `git stash` + re-run before
  this commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@osvalois osvalois merged commit de67462 into main Apr 29, 2026
16 checks passed
@osvalois osvalois deleted the fix/routing-correctness-phase-1 branch April 29, 2026 17:07
osvalois pushed a commit that referenced this pull request Apr 30, 2026
Cuts a patch release containing:
- feat(routing): respect pinned default_model when the user fixes a
  capable model in halcon.config.yaml (#11) — the planner no longer
  silently downgrades a pinned Opus/Sonnet selection to Haiku.
- observable empty-response handling — explicit RuntimeEvent emitted
  when an upstream provider returns no choices instead of degrading
  to a generic error.

No behavioral changes outside the routing path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant