
feat(inference): add layer-aligned KV cache management (#1964) #3074

Merged
mrveiss merged 1 commit into Dev_new_gui from feat/1964-kv-cache on Mar 31, 2026

Conversation

@mrveiss (Owner) commented Mar 31, 2026

Summary

  • LayerKVCache: per-layer (k, v) tensor pair storage with lazy allocation
  • KVCacheManager: memory estimation, max sequence length calculator, RTX 4070 preset
  • KVCacheConfig: validated config with dtype mapping (fp16/bf16/fp32 + aliases)
  • Lazy torch import — module loads without GPU/torch installed
  • 50 tests across 12 test classes

Closes #1964
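The config piece described above can be sketched in pure Python. This is an illustrative guess at the shape of `KVCacheConfig` — the field names, alias table, and error messages are assumptions, not the PR's actual implementation; only the class name, the fp16/bf16/fp32 support, and the alias behavior come from the summary.

```python
from dataclasses import dataclass

# Hypothetical dtype tables; the PR's actual alias set may differ.
_DTYPE_BYTES = {"fp16": 2, "bf16": 2, "fp32": 4}
_DTYPE_ALIASES = {"float16": "fp16", "half": "fp16",
                  "bfloat16": "bf16", "float32": "fp32"}

@dataclass
class KVCacheConfig:
    """Validated KV cache config with dtype alias normalization (sketch)."""
    num_layers: int
    num_heads: int
    head_dim: int
    dtype: str = "fp16"

    def __post_init__(self):
        # Normalize aliases first, then validate against the canonical set.
        self.dtype = _DTYPE_ALIASES.get(self.dtype, self.dtype)
        if self.dtype not in _DTYPE_BYTES:
            raise ValueError(f"unsupported dtype: {self.dtype!r}")
        if min(self.num_layers, self.num_heads, self.head_dim) <= 0:
            raise ValueError("num_layers, num_heads, head_dim must be positive")

    @property
    def dtype_bytes(self) -> int:
        return _DTYPE_BYTES[self.dtype]
```

Validating eagerly in `__post_init__` means every downstream consumer can trust `dtype_bytes` without re-checking, which is what makes a purely arithmetic memory estimator possible.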

Test plan

  • Config validation (dtype bytes, aliases, 6 error paths)
  • Cache get/update: single/multi-layer, multi-step accumulation
  • Shape validation: wrong batch/heads/head_dim/rank
  • Trim: reduce, to-zero, no-op, negative, content correctness
  • Memory estimation: formula, fp32 vs fp16, scaling
  • Max seq_len: positive, monotonic, tiny budget floor, round-trip
  • RTX 4070 helper: positive and within budget
  • flake8 passes (F821 suppressed for torch.Tensor type hints)
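The memory-estimation and max-sequence-length items above follow from the standard per-layer K/V accounting: one K and one V tensor of shape `[batch, num_heads, seq_len, head_dim]` per layer. A minimal sketch, assuming these function names and signatures (the PR's actual API may differ):

```python
def estimate_kv_bytes(num_layers: int, batch: int, num_heads: int,
                      head_dim: int, seq_len: int, dtype_bytes: int) -> int:
    """Total cache size: 2 tensors (K and V) per layer, each
    [batch, num_heads, seq_len, head_dim] at dtype_bytes per element."""
    return 2 * num_layers * batch * num_heads * seq_len * head_dim * dtype_bytes

def max_seq_len(budget_bytes: int, num_layers: int, batch: int,
                num_heads: int, head_dim: int, dtype_bytes: int) -> int:
    """Largest seq_len whose cache fits in budget_bytes, floored at 1
    so a tiny budget still yields a usable cache."""
    per_token = 2 * num_layers * batch * num_heads * head_dim * dtype_bytes
    return max(1, budget_bytes // per_token)
```

On this formula fp32 (4 bytes) costs exactly twice fp16 (2 bytes), and whenever the budget covers at least one token, `estimate_kv_bytes(..., max_seq_len(budget, ...), ...) <= budget` round-trips — the scaling, floor, and round-trip properties the test plan checks.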

🤖 Generated with Claude Code

Per-layer (k, v) tensor pair storage with VRAM budget calculator.
Supports fp16/bf16/fp32, trim, clear, and RTX 4070 memory estimation.
Lazy torch import — works without GPU. 50 tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mrveiss mrveiss merged commit 8008b48 into Dev_new_gui Mar 31, 2026
3 of 4 checks passed
@mrveiss mrveiss deleted the feat/1964-kv-cache branch March 31, 2026 17:53
@github-actions

✅ SSOT Configuration Compliance: Passing

🎉 No hardcoded values detected that have SSOT config equivalents!
