Multi-Headed Speculative Execution for Claude Code
"Cut off one head, two more shall take its place."
Except here β every head is doing your work faster and cheaper.
9 agents Β Β·Β 7 slash commands Β Β·Β 3 hooks Β Β·Β ~50% cost savings Β Β·Β Persistent memory Β Β·Β Integration integrity
You know how in the movies, Hydra had agents embedded everywhere, silently getting things done in the background? That's exactly what this framework does for your Claude Code sessions.
Hydra is a task-level speculative execution framework inspired by Speculative Decoding in LLM inference. Instead of making one expensive model (Opus 4.6) do everything β from searching files to writing entire modules β Hydra deploys a team of specialized "heads" running on faster, cheaper models that handle the grunt work.
The result? Opus becomes a manager, not a laborer. It classifies tasks, dispatches them to the right head, glances at the output, and moves on. The user never notices. It's invisible. It's always on.
New in v2.0.0: Every agent now has persistent memory β they learn your codebase patterns, conventions, and architectural decisions across sessions. The orchestrator (Opus) also maintains its own memory of fragile zones and routing decisions. Plus, the new hydra-sentinel automatically catches integration breakage after code changes β and code isn't presented to you until verification completes.
Four built-in speed optimizations reduce overhead at every stage: speculative pre-dispatch (scout launches in parallel with task classification), session indexing (codebase context persists across turns β no re-exploration), fire-and-forget dispatch (non-critical agents run in the background without blocking downstream work), and confidence-based auto-accept (raw factual outputs skip Opus review entirely).
Think of it this way:
Would you hire a $500/hr architect to carry bricks? No. You'd have them design the building and let the crew handle construction. That's Hydra.
One command. Done.
npx hail-hydra-cc@latest[OR]
npm i hail-hydra-cc@latestRuns the interactive installer β deploys 9 agents, 7 slash commands, 3 hooks, and registers the statusline and update checker. Done in seconds.
# Clone the repo
git clone https://github.com/AR6420/Hail_Hydra.git
cd hydra
# Deploy heads globally (recommended β always on, every project)
./scripts/install.sh --user
# π Hail Hydra! Framework active in all Claude Code sessions.
# β
9 agents β
7 commands β
3 hooks β
StatusLine β
VERSION# User-level β available in ALL your Claude Code projects
./scripts/install.sh --user
# Project-level β just this one project
./scripts/install.sh --project
# Both β maximum coverage
./scripts/install.sh --both
# Check what's deployed
./scripts/install.sh --status
# Remove everything
./scripts/install.sh --uninstall~/.claude/
βββ agents/ # 9 agent definitions (all with memory: project)
β βββ hydra-scout.md # π’ Haiku 4.5 β explore codebase
β βββ hydra-runner.md # π’ Haiku 4.5 β run tests/builds
β βββ hydra-scribe.md # π’ Haiku 4.5 β write documentation
β βββ hydra-guard.md # π’ Haiku 4.5 β security/quality gate
β βββ hydra-git.md # π’ Haiku 4.5 β git operations
β βββ hydra-sentinel-scan.md # π’ Haiku 4.5 β fast integration sweep
β βββ hydra-coder.md # π΅ Sonnet 4.6 β write/edit code
β βββ hydra-analyst.md # π΅ Sonnet 4.6 β debug/diagnose
β βββ hydra-sentinel.md # π΅ Sonnet 4.6 β deep integration analysis
βββ commands/hydra/ # 7 slash commands
β βββ help.md # /hydra:help
β βββ status.md # /hydra:status
β βββ update.md # /hydra:update
β βββ config.md # /hydra:config
β βββ guard.md # /hydra:guard
β βββ quiet.md # /hydra:quiet
β βββ verbose.md # /hydra:verbose
βββ hooks/ # 3 lifecycle hooks
β βββ hydra-check-update.js # SessionStart β version check (background)
β βββ hydra-statusline.js # StatusLine β status bar display
β βββ hydra-auto-guard.js # PostToolUse β file change tracker
βββ skills/
βββ hydra/ # Skill (Claude Code discoverable)
βββ SKILL.md # Orchestrator instructions
βββ VERSION # Installed version number
βββ config/
β βββ hydra.config.md # User configuration (created by --config)
βββ references/
βββ model-capabilities.md
βββ routing-guide.md
> **Note:** `settings.json` is at `~/.claude/settings.json` β hooks and statusLine are registered there.
Project-level (
--project): same files written to.claude/in your working directory instead of~/.claude/. Project-level takes precedence when both exist.
| Command | Description |
|---|---|
/hydra:help |
Show all commands and agents |
/hydra:status |
Show installed agents, version, and update availability |
/hydra:update |
Update Hydra to the latest version |
/hydra:config |
Show current configuration |
/hydra:guard [files] |
Run manual security & quality scan |
/hydra:quiet |
Suppress dispatch logs for this session |
/hydra:verbose |
Enable verbose dispatch logs with timing |
After installation, your Claude Code status bar shows real-time framework info:
π β Opus β Ctx: 37% ββββββββββ β $0.42 β my-project
| Element | What It Shows |
|---|---|
| π | Hydra is active |
| Model | Current Claude model (Opus, Sonnet, Haiku) |
| Ctx: XX% | Context window usage with visual bar |
| $X.XX | Session API cost so far |
| Directory | Current working directory |
| β Warning | Compaction warning (only at 70%+ context usage) |
Context bar colors:
- π’ Green (0β49%) β plenty of room
- π‘ Yellow (50β79%) β getting full, consider
/compact - π΄ Red (80%+) β context nearly full,
/compactor/clearrecommended
Compaction warnings (appended automatically at 70%+):
π β Opus β Ctx: 73% ββββββββββ β $1.87 β my-project β β Auto-compact at 85%
π β Opus β Ctx: 83% ββββββββββ β $3.14 β my-project β β Compacting soon!
- β Auto-compact at 85% (70β79%) β heads-up that compaction is approaching
- β Compacting soon! (80%+) β compaction is imminent, consider
/compactnow
Note: If you already have a custom
statusLineconfigured, the installer keeps yours and prints instructions for switching to Hydra's.
Hydra checks for updates once per session in the background (never blocks startup). When a new version is available, you'll see it in the status bar:
π β Opus β Ctx: 37% ββββββββββ β $0.42 β my-project β β‘ v1.1.0 available
Update with:
# From within Claude Code:
/hydra:update
# Or from your terminal:
npx hail-hydra-cc@latest --globalAfter updating, restart Claude Code to load the new files.
- Nine specialized heads β Haiku 4.5 (fast) and Sonnet 4.6 (capable) heads for every task type
- Sentinel integration integrity β Two-tier verification (fast scan + deep analysis) catches ~72% of integration bugs before runtime
- Persistent agent memory β Every agent remembers your codebase patterns, conventions, and past decisions across sessions
- Orchestrator memory β Opus maintains its own notes on fragile zones, routing patterns, and known issues via CLAUDE.md
- Quality-first pipeline β Code changes block until sentinel + guard verification completes; nothing reaches you unchecked
- Auto-Guard β hydra-guard (Haiku 4.5) automatically scans code changes for security issues after every hydra-coder run
- Configurable modes β
conservative,balanced(default), oraggressivedelegation viahydra.config.md - Slash commands β
/hydra:help,/hydra:status,/hydra:update,/hydra:config,/hydra:guard,/hydra:quiet,/hydra:verbosefor full session control - Quick commands β natural language shortcuts:
hydra status,hydra quiet,hydra verbose - Custom agent templates β Add your own heads using
templates/custom-agent.md - Session indexing β Codebase context persists across turns; no re-exploration on every prompt
- Speculative pre-dispatch β hydra-scout launches in parallel with task classification, saving 2β3 seconds per task
- Dispatch log β Transparent audit trail showing which agents ran, what model, and outcome
Most bugs don't come from bad code β they come from good code that doesn't fit together. A renamed export, a changed return type, a missing dependency after a refactor. These integration issues slip past linters, type-checkers, and even code review because no single file looks wrong.
hydra-sentinel catches them automatically.
Code change lands (hydra-coder finishes)
β
βΌ
ββββββββββββββββββββββββββββββββββββββββ
β π’ hydra-sentinel-scan (Haiku 4.5) β β Runs on EVERY code change (~1-2s)
β Fast sweep: imports, exports, β
β signatures, dependencies β
ββββββββββββββββ¬ββββββββββββββββββββββββ
β
Issues found?
βββ No: β
Pass β code proceeds to guard
β
βββ Yes: Escalate
β
βΌ
ββββββββββββββββββββββββββββββββββββββββ
β π΅ hydra-sentinel (Sonnet 4.6) β β Only when scan flags issues (~20-30%)
β Deep analysis: confirms real issues, β
β dismisses false positives, β
β proposes fixes β
ββββββββββββββββ¬ββββββββββββββββββββββββ
β
Fix decision:
βββ Trivial (import typo): Auto-fix
βββ Medium (signature mismatch): Offer fix to user
βββ Complex (architectural): Report with context
| Check Type | Priority | Example |
|---|---|---|
| Import/export mismatches | P0 | Importing a function that was renamed or removed |
| Function signature changes | P0 | Caller passes 2 args, function now expects 3 |
| Type contract violations | P1 | Function returns string but caller expects number |
| Missing dependency updates | P1 | New import added but package not in package.json |
| Cross-file rename gaps | P1 | Variable renamed in definition but not all call sites |
| Circular dependency introduction | P2 | New import creates A β B β C β A cycle |
| Dead code from refactoring | P2 | Exported function no longer imported anywhere |
| Environment/config mismatches | P2 | Code references env var that isn't in .env.example |
π‘οΈ Sentinel Report
ββββββββββββββββββββββββββββββββββββββ
β P0: src/auth.js imports `validateToken` from src/utils.js
but src/utils.js now exports `verifyToken` (renamed in this session)
β Fix: Update import to `verifyToken` [auto-fixable]
β P1: src/api/routes.js calls createUser(name, email)
but src/models/user.js:createUser now expects (name, email, role)
β Missing required parameter `role` added in this change
β 6 other integration points verified clean
ββββββββββββββββββββββββββββββββββββββ
| Category | Estimated Detection Rate | Notes |
|---|---|---|
| Import/export mismatches | ~95% | Direct string matching |
| Signature mismatches | ~80% | Requires type inference |
| Type contract violations | ~60% | Limited without full type system |
| Missing dependencies | ~90% | Package.json diffing |
| Cross-file rename gaps | ~70% | Heuristic-based |
| Circular dependencies | ~85% | Import graph traversal |
| Dead code | ~50% | Conservative β flags only obvious cases |
| Config mismatches | ~40% | Pattern matching on env references |
| Weighted average | ~72% | Based on typical issue distribution |
Memory makes it better over time. Sentinel remembers past false positives and known fragile integration points in your project. The more you use it, the more accurate it gets.
Without memory, every session starts cold. Agents re-discover your conventions, re-learn your project structure, and repeat the same questions. With memory, knowledge compounds.
| Aspect | Without Memory | With Memory |
|---|---|---|
| First task | Agent explores from scratch | Agent recalls project patterns |
| Conventions | May use wrong style | Remembers your naming, structure, patterns |
| Known issues | No awareness of past bugs | Recalls fragile areas and past fixes |
| Routing accuracy | Generic classification | Improved by past dispatch outcomes |
| False positives | Same false alarms repeat | Sentinel suppresses known non-issues |
Every agent has memory: project in its frontmatter. Claude Code automatically manages a
per-project memory directory (.claude/memory/) where agents store and retrieve learnings.
| Agent | What It Remembers |
|---|---|
| hydra-scout | Project structure patterns, key file locations, search shortcuts |
| hydra-runner | Test commands, common failure patterns, build quirks |
| hydra-scribe | Documentation style, preferred formats, terminology |
| hydra-guard | Known false positives, project-specific security patterns |
| hydra-git | Commit conventions, branch naming, merge preferences |
| hydra-sentinel-scan | Known fragile integration points, past false positives |
| hydra-coder | Coding style, architecture patterns, preferred libraries |
| hydra-analyst | Common bug patterns, performance hotspots, review focus areas |
| hydra-sentinel | Integration history, confirmed vs dismissed findings |
| Orchestrator (Opus) | Fragile zones, routing accuracy, escalation patterns (via CLAUDE.md Hydra Notes) |
- Automatic β agents read and write memory without any user action
- Project-scoped β each project has its own memory; no cross-contamination
- Persistent β survives across sessions; compounds over time
- Manageable β stored as plain markdown in
.claude/memory/; edit or delete anytime
Session 1: Agents learn your project. Session 5: They know your conventions. Session 20: They anticipate your patterns. The framework gets more efficient the more you use it β not because the models improve, but because context quality improves.
After Opus 4.6 dropped, I noticed something frustrating β code execution felt slowww. Reallyyy Slow. Not because the model was worse, but because I was feeding everything through one massive model. Every file read, every grep, every test run, every docstring β all burning through Opus-tier tokens. The result? Frequent context compaction, more hallucinations, and an API bill that made me wince.
So I started experimenting. I switched to Haiku for the simple stuff β running commands, tool calls, file exploration. Sonnet for code generation, refactoring, reviews. And kept Opus only for what it's actually good at: planning, architecture, and the hard decisions. The result surprised me. Same code quality. Sometimes better β because each model was operating within a focused context window instead of one overloaded one.
Five agents. Five separate context windows. Each with a clearly defined job. They do the work, and only pass results back to the brain β Opus. The outcome:
- Longer coding sessions (less compaction, less context blowup)
- Drastically reduced API costs (Haiku 4.5 is 5Γ cheaper than Opus 4.6)
- Faster execution (Haiku 4.5 responds ~10Γ faster)
- Same or better code quality (focused context > bloated context)
- Zero manual model switching (this is the big one)
Because that was the real pain β manually switching between models for every task tier to save costs. Every. Single. Time. So I built a framework that does it for me. And honestly? It does it better than I did. That was hard to admit, but here we are.
I also didn't want it to be boring. So I gave it teeth, heads, and a battle cry. If you prefer something more buttoned-up, the spec-exec branch has the same framework with zero theatrics.
Hail Hydra. Have fun.
Speculative decoding (Chen et al., 2023) accelerates LLM inference by having a small draft model propose tokens that a large target model verifies in parallel. Since verifying K tokens costs roughly the same as generating 1 token, you get 2β2.5Γ speedup with zero quality loss.
Hydra applies this at the task level:
βββββββββββββββββββββββββββββββββββ
β SPECULATIVE DECODING (tokens) β
β β
β Small model drafts K tokens β
β Big model verifies in parallel β
β Accept or reject + resample β
β Result: 2-2.5Γ speedup β
βββββββββββββββββββββββββββββββββββ
β
Same idea,
bigger scale
β
βΌ
βββββββββββββββββββββββββββββββββββ
β π HYDRA (tasks) β
β β
β Haiku/Sonnet drafts the task β
β Opus verifies (quick glance) β
β Accept or redo yourself β
β Result: 2-3Γ speedup β
βββββββββββββββββββββββββββββββββββ
The math is simple: if 70% of tasks can be handled by Haiku 4.5 (10Γ faster, 5Γ cheaper) and 20% by Sonnet 4.6 (3Γ faster, ~1.7Γ cheaper), your effective speed and cost improve dramatically β even accounting for the occasional rejection.
User Request
β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β β
βΌ βΌ
βββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββ
β π§ ORCHESTRATOR (Opus) β β π’ hydra-scout (Haiku 4.5) β
β Classifies task β β IMMEDIATE pre-dispatch: β
β Plans waves β β "Find files relevant to β
β Decides blocking / not β β [user's request]" β
ββββββββββ¬βββββββββββββββββββββ ββββββββββββββββ¬ββββββββββββββββ
β (unless Session Index already covers) β
ββββββββββββββββββββββββ¬βββββββββββββββββββββββββββ
β (scout + classification both ready)
[Session Index updated]
β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Wave N (parallel dispatch, index context injected)
βββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββ
β BLOCKING β NON-BLOCKING (fire & forget) β
βΌ βΌ β
[coder] [scribe] βββββββββββββββββββββββββββββββ
β
βΌ
Results arrive
β
βββ Raw data / clean pass? β AUTO-ACCEPT β (updates Session Index if scout)
βββ Code / analysis / user-facing docs? β Orchestrator verifies
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β π‘οΈ QUALITY GATE (blocks until complete) β
β β
β π’ sentinel-scan (Haiku) β fast integration sweep β
β βββ issues? β π΅ sentinel (Sonnet) β deep β
β π’ guard (Haiku) β security/quality scan β
β β
β Both must pass before result reaches user β
ββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββ
β
βΌ
User gets result + non-blocking outputs appended when ready
| Head | Model | Speed | Role | Personality |
|---|---|---|---|---|
| hydra-scout (Haiku 4.5) | π’ Haiku 4.5 | β‘β‘β‘ | Codebase exploration, file search, reading | "I've already found it." |
| hydra-runner (Haiku 4.5) | π’ Haiku 4.5 | β‘β‘β‘ | Test execution, builds, linting, validation | "47 passed, 3 failed. Here's why." |
| hydra-scribe (Haiku 4.5) | π’ Haiku 4.5 | β‘β‘β‘ | Documentation, READMEs, comments | "Documented before you finished asking." |
| hydra-guard (Haiku 4.5) | π’ Haiku 4.5 | β‘β‘β‘ | Security/quality gate after code changes | "No secrets. No injection. You're clean." |
| hydra-git (Haiku 4.5) | π’ Haiku 4.5 | β‘β‘β‘ | Git: commit, branch, diff, stash, log | "Committed. Conventional message. Clean diff." |
| hydra-sentinel-scan (Haiku 4.5) | π’ Haiku 4.5 | β‘β‘β‘ | Fast integration sweep after code changes | "Imports check out. Signatures match. Clean." |
| hydra-coder (Sonnet 4.6) | π΅ Sonnet 4.6 | β‘β‘ | Code implementation, refactoring, features | "Feature's done. Tests pass." |
| hydra-analyst (Sonnet 4.6) | π΅ Sonnet 4.6 | β‘β‘ | Code review, debugging, analysis | "Found 2 critical bugs and an N+1 query." |
| hydra-sentinel (Sonnet 4.6) | π΅ Sonnet 4.6 | β‘β‘ | Deep integration analysis (when scan flags issues) | "2 real issues confirmed. 1 false positive dismissed." |
Is it read-only? βββ Yes βββ Finding files?
β βββ Yes: hydra-scout (Haiku 4.5) π’
β βββ No: hydra-analyst (Sonnet 4.6) π΅
β
No βββ Is it a git operation? βββ Yes βββ hydra-git (Haiku 4.5) π’
β
No βββ Is it a security scan? βββ Yes βββ hydra-guard (Haiku 4.5) π’
β
No βββ Just running a command? βββ Yes βββ hydra-runner (Haiku 4.5) π’
β
No βββ Writing docs only? βββ Yes βββ hydra-scribe (Haiku 4.5) π’
β
No βββ Clear implementation approach? βββ Yes βββ hydra-coder (Sonnet 4.6) π΅
β
No βββ Needs deep reasoning? βββ Yes βββ π§ Opus 4.6 (handle it yourself)
Code was just changed? βββ Yes βββ hydra-sentinel-scan (Haiku 4.5) π’
β β
β Issues found?
β βββ No: Done β
β βββ Yes: hydra-sentinel (Sonnet 4.6) π΅
Customize Hydra's behavior with an optional config file:
# Create a default config (user-level β applies to all projects)
./scripts/install.sh --configThen edit ~/.claude/skills/hydra/config/hydra.config.md:
mode: balanced # conservative | balanced (default) | aggressive
dispatch_log: on # on (default) | off | verbose
auto_guard: on # on (default) | offProject-level config (overrides user-level):
Place at .claude/skills/hydra/config/hydra.config.md in your project root.
See config/hydra.config.md for the full reference with all options.
Add your own specialized head in three steps:
1. Copy the template:
cp templates/custom-agent.md agents/hydra-myspecialist.md2. Customize the agent β edit the name, description, tools, and instructions.
3. Deploy it:
./scripts/install.sh --user # or --projectYour new head is now discoverable by Claude Code alongside the built-in nine.
See templates/custom-agent.md for the full template with
instructions on writing effective agent descriptions, output formats, and collaboration protocols.
hydra/
βββ π SKILL.md # Core framework instructions
βββ π² agents/
β βββ hydra-scout.md # π’ Codebase explorer
β βββ hydra-runner.md # π’ Test & build executor
β βββ hydra-scribe.md # π’ Documentation writer
β βββ hydra-guard.md # π’ Security/quality gate
β βββ hydra-git.md # π’ Git operations
β βββ hydra-sentinel-scan.md # π’ Fast integration sweep
β βββ hydra-coder.md # π΅ Code implementer
β βββ hydra-analyst.md # π΅ Code reviewer & debugger
β βββ hydra-sentinel.md # π΅ Deep integration analysis
βββ π references/
β βββ routing-guide.md # 30+ task classification examples
β βββ model-capabilities.md # What each model excels at
βββ βοΈ config/
β βββ hydra.config.md # User configuration template
βββ π templates/
β βββ custom-agent.md # Template for adding your own heads
βββ π§ scripts/
βββ install.sh # One-command deployment
| Metric | Without Hydra | With Hydra | Improvement |
|---|---|---|---|
| Task Speed | 1Γ (Opus for everything) | 2β3Γ faster | π’ Haiku 4.5 heads respond ~10Γ faster |
| API Cost | 1Γ (Opus 4.6 for everything) | ~0.5Γ | ~50% cheaper |
| Quality | Opus-level | Opus-level | Zero degradation |
| User Experience | Normal | Normal | Invisible β zero friction |
| Overhead per turn (Turn 2+) | Full re-exploration each turn | Session index reused | π’ 2-4s saved per turn |
| Scout/runner verification | Opus reviews every output | Auto-accepted for factual data | π’ ~50-60% of outputs skip review |
| Integration bugs caught | 0% (no verification) | ~72% caught before runtime | π’ Sentinel auto-verification |
| Session knowledge | Starts cold every time | Compounds across sessions | π’ Persistent agent memory |
| Task Type | % of Work | Model Used | Input Cost vs Opus 4.6 | Output Cost vs Opus 4.6 |
|---|---|---|---|---|
| Exploration, search, tests, docs | ~50% | π’ Haiku 4.5 | 20% ($1 vs $5/MTok) | 20% ($5 vs $25/MTok) |
| Implementation, review, debugging | ~30% | π΅ Sonnet 4.6 | 60% ($3 vs $5/MTok) | 60% ($15 vs $25/MTok) |
| Architecture, hard problems | ~20% | π§ Opus 4.6 | 100% (no change) | 100% (no change) |
| Sentinel scan (fast) | Auto (every code change) | π’ Haiku 4.5 | 20% | 20% |
| Sentinel deep (conditional) | ~20-30% of code changes | π΅ Sonnet 4.6 | 60% | 60% |
| Blended effective cost | ~48% of all-Opus | ~48% of all-Opus |
Note: Blended input = (0.5Γ$1 + 0.3Γ$3 + 0.2Γ$5) / $5 = $2.40/$5 β 48%. Rounded to ~50% blended cost reduction overall. Savings calculated against Opus 4.6 ($5/$25 per MTok) as of February 2026.
The most accurate way to measure Hydra's impact β no estimation, real numbers:
- Start a Claude Code session without Hydra installed
- Complete a representative coding task
- Note the session cost from Claude Code's cost display
- Start a new session with Hydra installed
- Complete a similar task
- Compare the two costs
That's it. Real data beats theoretical calculations every time.
With a typical task distribution (50% Haiku 4.5, 30% Sonnet 4.6, 20% Opus 4.6):
- Input tokens: ~52% cheaper ($2.40 vs $5.00 per MTok)
- Output tokens: ~52% cheaper ($12.00 vs $25.00 per MTok)
- Blended: ~50% cost reduction
- Speed: 2β3Γ faster on delegated tasks
Note: Savings calculated against Opus 4.6 pricing ($5/$25 per MTok) as of February 2026. Savings would be significantly higher compared to Opus 4.1/4.0 ($15/$75 per MTok).
The user should never notice Hydra operating. No announcements, no permission requests, no process narration. If a head does the work, present the output as if Opus did it.
Don't overthink classification. Quick mental check: "Haiku? Sonnet? Me?" and go. If you spend 10 seconds classifying a 5-second task, you've defeated the purpose.
Independent subtasks launch in parallel. "Fix the bug AND add tests" β two heads working simultaneously.
If a head's output isn't good enough, Opus does it directly. No retries at the same tier. This mirrors speculative decoding's rejection sampling β when a draft token is rejected, the target model samples directly.
For those who want to go deeper, here's how Hydra maps to the original speculative decoding concepts:
| Speculative Decoding Concept | Hydra Equivalent |
|---|---|
| Target model (large) | π§ Opus 4.6 β the orchestrator |
| Draft model (small) | π’ Haiku / π΅ Sonnet heads |
| Draft K tokens | Heads draft the full task output |
| Parallel verification | Opus glances at the output |
| Modified rejection sampling | Accept β ship it. Reject β Opus redoes it. |
| Acceptance rate (~70-90%) | Target: 85%+ of delegated tasks accepted as-is |
| Guaranteed β₯1 token per loop | Every task produces a result β Opus catches failures |
| Temperature/nucleus compatibility | Works with any coding task type or domain |
- Accelerating Large Language Model Decoding with Speculative Sampling β Chen et al., 2023 (DeepMind)
- Fast Inference from Transformers via Speculative Decoding β Leviathan et al., 2022 (Google)
Will I notice any quality difference?
No. Hydra only delegates tasks that are within each model's capability band. If there's any doubt, the task stays with Opus. And Opus always verifies β if a head's output isn't up to standard, Opus redoes it before you ever see it.
Is this actually speculative decoding?
Not at the token level β that happens inside Anthropic's servers and we can't modify it. Hydra applies the same philosophy at the task level: draft with a fast model, verify with the powerful model, accept or reject. Same goals (speed + cost), same guarantees (zero quality loss), different granularity.
What if I'm not using Opus?
Hydra is designed for the Opus-as-orchestrator pattern, but the principles apply at any tier. If you're running Sonnet as your main model, you could adjust the heads to use Haiku for everything delegatable.
Can I customize which models the heads use?
Absolutely. Each head is a simple Markdown file with a
model: field in the frontmatter. Change model: haiku to model: sonnet (or any supported model) and you're done.
Do the heads work with subagents I already have?
Yes. Hydra heads coexist with any other subagents. Claude Code discovers all agents in the
.claude/agents/ directories. No conflicts.
How do I uninstall?
Removes all agents, commands, hooks, and cache files. Deregisters hooks from
~/.claude/settings.json. Your other Claude Code configuration is preserved.
./scripts/install.sh --uninstall
# or: npx hail-hydra-cc --uninstallWhat is Sentinel and how does it work?
Sentinel is a two-tier integration verification system. After every code change, hydra-sentinel-scan (Haiku 4.5) runs a fast sweep (~1-2s) checking imports, exports, function signatures, and dependencies. If it finds potential issues, hydra-sentinel (Sonnet 4.6) performs deep analysis to confirm real problems and dismiss false positives. The result is ~72% of integration bugs caught before they reach you.
Does Sentinel slow things down?
The fast scan adds ~1-2 seconds per code change. The deep analysis only triggers when the scan flags issues (~20-30% of changes), adding another ~3-5 seconds in those cases. For the ~70-80% of changes that are clean, you'll barely notice it. The time saved debugging integration issues far outweighs the scan overhead.
Will Sentinel auto-fix things without asking?
Only trivial fixes (like updating an import path after a rename). For medium-complexity fixes (signature mismatches), it offers the fix for your approval. For complex architectural issues, it reports the problem with context but doesn't attempt a fix. You stay in control.
Can I disable Sentinel?
Yes. Set
sentinel: off in your hydra.config.md. You can also set sentinel: scan-only to keep the fast sweep but skip deep analysis. The default is on (both tiers).
Does Agent Memory use extra tokens?
Memory is loaded as part of each agent's context when it starts, so it does use some tokens β but agent memory files are small (typically a few hundred tokens each). The improved accuracy from having project context usually saves tokens by reducing re-exploration and misclassification.
Where is agent memory stored?
In
.claude/memory/ within your project directory. Each agent stores its own memory as plain markdown files. You can read, edit, or delete them anytime. Memory is project-scoped β each project has its own memory, no cross-contamination.
Does Opus (the orchestrator) also have memory?
Yes. Opus maintains a "Hydra Notes" section in your project's
CLAUDE.md file. This includes fragile integration zones, routing accuracy observations, and known issues. Unlike agent memory (which is per-agent), orchestrator memory is visible to all agents and informs dispatch decisions.
Found a task type that gets misclassified? Have an idea for a new head? Contributions are welcome!
- Fork it
- Create your branch (
git checkout -b feature/hydra-new-head) - Commit (
git commit -m 'Add hydra-optimizer head for perf tuning') - Push (
git push origin feature/hydra-new-head) - Open a PR
MIT β Use it, fork it, deploy it. Just don't use it for world domination.
...unless it's code world domination. Then go ahead.
Built with π§ by Claude Opus 4.6 β ironically, the model this framework is designed to use less of.
v2.0.0 β Now with memory and integration integrity.
Prefer a clean, technical version? See the
spec-execbranch β same framework, zero theatrics.