…se handling
Phase 1 of the routing-correctness audit
(docs/architecture/routing-correctness-audit.md).
Closes the user-visible "agent doesn't respond" failure mode, which
unfolded as follows. The adaptive selector silently downgraded a
user-pinned `claude-sonnet-4-6` to `gemini-2.0-flash`. The request was
forwarded to Cenzontle, which routed everything via the OPENAI provider
against an Azure AI Services deployment that does not recognise Gemini
and returned HTTP 200 with an empty body. Halcon's D1 retry path then
re-issued the same misrouted request twice more before synthesising a
generic fallback.
This commit ships the Halcon side of the fix (the Cenzontle backend
side is a separate PR against cuervo-ai/cenzontle on the same branch
name).
CHANGES
1. `ModelSelectionConfig::respect_default_model: bool` (default false,
opt-in for back-compat). When true AND `general.default_model` is a
non-empty/non-`auto` value, the adaptive selector is not constructed
for the session — the pinned model is used verbatim, equivalent to
passing `--model` on the CLI.
2. `repl/mod.rs`: extends the existing `!explicit_model` guard with the
new `pinned_via_config` condition. Emits `tracing::info` when the
selector is skipped so the audit trail shows *why* the override was
bypassed (Loki/Tempo correlation).
3. `agent/mod.rs` D1 EmptyResponse branch:
- replaces the silent `info!` with a `warn!` carrying provider,
model, and the count + identity of available fallback models
- replaces the generic `[frontier] empty response detected (retry
N/M)` user message with one that names the next viable fallback
(or directs the user to `halcon doctor` when none is configured)
- leaves a TODO with a pointer to the audit doc for the deferred
true automated failover (R1 in the roadmap), which requires a
cross-module mutability refactor scoped to its own PR
- The same-model retry behaviour is preserved for now to avoid
regressing legitimate transient failures; it just becomes loud
and actionable instead of silent and confusing.
4. New unit test `config_back_compat_default_false` proves that configs
written before this change still deserialise (they get
`respect_default_model = false`, preserving prior behaviour).
`config_serde_roundtrip` updated for the new field.
5. `docs/architecture/routing-correctness-audit.md` — full audit:
five-stage failure cascade, before/after flow diagrams, files
modified across both repos, eight-item roadmap of deferred work
with owners/estimates, and operator validation steps.
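The guard described in changes 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the code in `repl/mod.rs`: the struct is a hypothetical stand-in that carries only the new field, and the helper names `is_concrete_model` and `should_build_selector` are assumptions made for the example. The exact string comparison against `auto` is likewise assumed.

```rust
/// Hypothetical stand-in for the relevant slice of the real config struct.
/// `Default` yields `respect_default_model = false`, mirroring the opt-in
/// default described above.
#[derive(Debug, Default, Clone)]
pub struct ModelSelectionConfig {
    pub respect_default_model: bool,
}

/// True when `general.default_model` names a concrete model,
/// i.e. it is neither empty nor the literal `auto` (assumed exact match).
pub fn is_concrete_model(default_model: &str) -> bool {
    !default_model.is_empty() && default_model != "auto"
}

/// Decide whether the adaptive selector should be constructed for the session.
/// `explicit_model` corresponds to `--model` having been passed on the CLI.
pub fn should_build_selector(
    explicit_model: bool,
    cfg: &ModelSelectionConfig,
    default_model: &str,
) -> bool {
    // The new condition: the user opted in AND pinned a concrete model.
    let pinned_via_config = cfg.respect_default_model && is_concrete_model(default_model);
    // The selector is built only when neither form of pinning applies.
    !explicit_model && !pinned_via_config
}
```

With the flag left at its default, a configured `default_model` does not suppress the selector, which is the prior behaviour the back-compat test is meant to protect.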
VERIFICATION
- `cargo fmt --all -- --check` clean.
- `cargo check -p halcon-cli --no-default-features --features tui`
clean.
- `cargo test -p halcon-cli --lib --no-default-features --features tui
model_selector` → 54 passed (incl. the new back-compat test).
- 13 failing tests in the full `halcon-cli` lib suite are all in
`repl::domain::task_analyzer` / `hybrid_classifier` and are
pre-existing on `main` (verified with `git stash` + re-run before
this commit).
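For reference, the user-facing message from change 3 could be assembled along these lines. This is a sketch under stated assumptions: the function name `empty_response_notice`, its signature, and the message wording are all hypothetical. Only the `halcon -m <fallback> chat ...` hint and the `halcon doctor` pointer come from the PR description.

```rust
/// Build an actionable notice for an empty provider response.
/// `fallbacks` is the list of configured fallback models; the first entry
/// that differs from the failing `model` is suggested to the user.
pub fn empty_response_notice(provider: &str, model: &str, fallbacks: &[String]) -> String {
    match fallbacks.iter().find(|f| f.as_str() != model) {
        // A viable fallback exists: name it in a ready-to-run command hint.
        Some(next) => format!(
            "{provider}/{model} returned an empty response; try `halcon -m {next} chat ...`"
        ),
        // No fallback is configured: point the user at diagnostics instead.
        None => format!(
            "{provider}/{model} returned an empty response and no fallback model is configured; run `halcon doctor`"
        ),
    }
}
```

Skipping the failing model when picking the suggestion avoids recommending the same model that just returned nothing.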
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
osvalois pushed a commit that referenced this pull request on Apr 30, 2026
Cuts a patch release containing:
- feat(routing): respect pinned default_model when the user fixes a capable model in halcon.config.yaml (#11). The planner no longer silently downgrades a pinned Opus/Sonnet selection to Haiku.
- observable empty-response handling: explicit RuntimeEvent emitted when an upstream provider returns no choices instead of degrading to a generic error.
No behavioral changes outside the routing path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 of the routing-correctness audit
Full design + roadmap: `docs/architecture/routing-correctness-audit.md` (please read before reviewing the diff).

What this PR closes
The user-visible "agent doesn't respond" failure mode where the adaptive selector silently downgraded a user-pinned `claude-sonnet-4-6` to `gemini-2.0-flash`, the request was forwarded to Cenzontle which routed everything via the OPENAI provider against an Azure AI Services deployment that does not recognise Gemini, returning HTTP 200 with an empty body, and Halcon's D1 retry path then re-issued the same misrouted request twice more before synthesising a generic fallback.

Five-stage failure cascade documented in the audit doc §1; this PR ships the Halcon side. The Cenzontle backend side (typed errors + 502 on empty body) is a sibling PR against `cuervo-ai/cenzontle` on a branch with the same name.

Changes (Halcon)
- `ModelSelectionConfig::respect_default_model`: `bool` (default `false`). When `true` and `general.default_model` is non-empty/non-`auto`, the adaptive selector is not constructed for the session.
- `repl/mod.rs`: extends the `!explicit_model` guard with `pinned_via_config`. Emits `tracing::info` when the selector is skipped so the audit trail shows why the override was bypassed.
- `agent/mod.rs` D1 EmptyResponse: `info!` becomes a structured `warn!` with `provider`, `model`, `fallbacks_available`, and the `fallbacks` list. The user-facing TUI message names the next viable fallback (`halcon -m <fallback> chat ...`) or directs to `halcon doctor` when none is configured.
- `model_selector.rs`: `config_back_compat_default_false`, proving prior configs still deserialise with the new field defaulting to `false`.
- `docs/architecture/routing-correctness-audit.md`

Why a "light" fix on the empty-response branch
The clean version requires either plumbing `effective_provider`/`selected_model` mutability through `round_setup` for true per-iteration model swap, OR a new `exhausted_models: Vec<String>` on `AgentState` fed back into the selector and the Paloma router. Both cross 4-5 modules and warrant a focused PR with the agent-loop owner. This PR converts the silent failure into a loud, actionable warning naming the next fallback. Combined with the Cenzontle 502, the user-visible symptom is fully resolved. The deferred refactor is tracked as R1 in the roadmap.

Test plan
- `cargo fmt --all -- --check`: clean
- `cargo check -p halcon-cli --no-default-features --features tui`: clean
- `cargo test -p halcon-cli --lib --no-default-features --features tui model_selector`: 54 passed, including the new back-compat test
- `cargo test -p halcon-cli --lib --no-default-features --features tui` (full): 4863 passed; 13 pre-existing failures in `task_analyzer` / `hybrid_classifier` (unrelated, verified by `git stash` + re-run before commit)

Acceptance criteria status (from spec §7)
- `claude-sonnet-4-6` cannot be silently substituted by Gemini: `respect_default_model=true`
- `200 OK` with empty body

🤖 Generated with Claude Code