.--..--..--..--..--..--..--..--..--..--..--..--..--..--..--..--..--.
/ .. \.. \.. \.. \.. \.. \.. \.. \.. \.. \.. \.. \.. \.. \.. \.. \.. \
\ \/\ `'\ `'\ `'\ `'\ `'\ `'\ `'\ `'\ `'\ `'\ `'\ `'\ `'\ `'\ `'\ \/ /
\/ /`--'`--'`--'`--'`--'`--'`--'`--'`--'`--'`--'`--'`--'`--'`--'\/ /
/ /\ ▄▀▀█▄ ▄▀▀▄ ▀▄ ▄▀▀█▀▄ ▄▀▀▄ ▄▀▄ ▄▀▀▄ ▄▀▀▄ ▄▀▀▀▀▄ / /\
/ /\ \ ▐ ▄▀ ▀▄ █ █ █ █ █ █ █ █ █ ▀ █ █ █ █ █ █ ▐ / /\ \
\ \/ / █▄▄▄█ ▐ █ ▀█ ▐ █ ▐ ▐ █ █ ▐ █ █ ▀▄ \ \/ /
\/ / ▄▀ █ █ █ █ █ █ █ █ ▀▄ █ \/ /
/ /\ █ ▄▀ ▄▀ █ ▄▀▀▀▀▀▄ ▄▀ ▄▀ ▀▄▄▄▄▀ █▀▀▀ / /\
/ /\ \ ▐ ▐ █ ▐ █ █ █ █ ▐ / /\ \
\ \/ / ▐ ▐ ▐ ▐ ▐ \ \/ /
\/ / \/ /
/ /\.--..--..--..--..--..--..--..--..--..--..--..--..--..--..--./ /\
/ /\ \.. \.. \.. \.. \.. \.. \.. \.. \.. \.. \.. \.. \.. \.. \.. \/\ \
\ `'\ `'\ `'\ `'\ `'\ `'\ `'\ `'\ `'\ `'\ `'\ `'\ `'\ `'\ `'\ `'\ `' /
`--'`--'`--'`--'`--'`--'`--'`--'`--'`--'`--'`--'`--'`--'`--'`--'`--'
Local-first AI agent with intelligent code understanding and multi-strategy retrieval.
Animus is an experimental agentic system designed to run entirely on edge hardware (8W Jetson/Pi with Transformer hat to consumer GPUs) with sub-7B models. Features novel Animus: Manifold multi-strategy retrieval that combines vector search, knowledge graphs, and keyword search with hardcoded routing—no cloud dependencies, no LLM-based classification overhead.
Key Innovation: The first local-first system to achieve <400ms hybrid query latency by treating the entire codebase as "one surface" through AST-based knowledge graphs and contextual embeddings, eliminating naive chunking strategies.
git clone https://github.com/yourusername/animus.git
cd animus
pip install -e ".[all]" # Installs all dependenciesThis installs:
- Core dependencies (typer, rich, pydantic, tiktoken)
- Local inference (llama-cpp-python for GGUF models)
- Embeddings (sentence-transformers for semantic search)
- Testing framework (pytest)
animus initCreates ~/.animus/config.yaml with default settings. The config will auto-detect your hardware (GPU, CPU, OS) and set appropriate defaults.
# Recommended: 7B coder model (4.8 GB VRAM)
animus pull qwen-2.5-coder-7b
# Or see all available models
animus pull --listModels are downloaded to ~/.animus/models/ as GGUF files. The pull command auto-configures your settings.
# Build AST-based knowledge graph (Python code intelligence)
animus graph ./your-project
# Build vector store with contextual embeddings
animus ingest ./your-projectOrder matters: Run graph first, then ingest. The ingest process uses the knowledge graph to enrich embeddings with structural context (callers, callees, inheritance).
animus riseYou'll see:
[i] Provider: native Model: qwen-2.5-coder-7b
[i] [Manifold] Unified search tool registered
[i] Session: abc123def
[i] Type 'exit' or 'quit' to end.
You>
What it does: Multi-strategy retrieval router that automatically classifies queries and dispatches to the optimal search backend.
How it works:
- Hardcoded pattern matching (<1ms classification, no LLM)
- Four strategies: SEMANTIC, STRUCTURAL, HYBRID, KEYWORD
- Reciprocal Rank Fusion for multi-strategy result merging
- Contextual embeddings (graph-enriched vector search)
Example usage:
You> search for "how does authentication work?"
→ Routes to SEMANTIC (vector similarity)
→ Returns: Code snippets semantically similar to "authentication"
You> search for "what calls authenticate()?"
→ Routes to STRUCTURAL (knowledge graph)
→ Returns: All functions that call authenticate()
You> search for "find the auth code and what depends on it"
→ Routes to HYBRID (both strategies + RRF fusion)
→ Returns: Auth code (semantic) + its callers (structural), fused and ranked
You> search for "find TODO comments"
→ Routes to KEYWORD (exact grep match)
→ Returns: Lines containing "TODO"
When to use: Any time you need to understand, find, or navigate code. Manifold automatically picks the right strategy—you don't choose.
What it does: Agent executes tools (file operations, git, shell commands) with observation-reflection-action pattern.
How it works:
- Agent evaluates tool results (success/failure/empty/long)
- Provides contextual guidance for next actions
- Prevents infinite loops (repeat detection, thrashing detection, hard limits)
- Cumulative execution budgets (300s session limit)
Example usage:
You> Create a Python script that generates Pascal's triangle up to n layers
Agent plans:
[1/2] write_file("pascal.py", "def pascal_triangle(n): ...")
[2/2] Test the script by running it
Agent reflects:
[Tool write_file SUCCESS]: Successfully wrote 156 characters to pascal.py
The operation succeeded. This step is COMPLETE. Do NOT make additional tool calls to verify.
When to use: Any coding task, file manipulation, git operations, or shell commands. The agent handles multi-step workflows automatically.
What it does: Decomposes complex tasks into atomic steps, each executed with fresh context and filtered tools.
How it works:
- LLM generates numbered step plan (focused prompt, no tools, no history)
- Hardcoded parser extracts steps into structured format
- Each step executed independently with minimal context
- GBNF grammar constraints for valid JSON tool calls
Example usage:
You> Read all Python files in src/, find the longest one, and create a summary
Agent decomposes:
[1/4] list_dir("src/", recursive=true)
[2/4] read_file for each .py file
[3/4] Compare file lengths
[4/4] write_file("summary.txt", "...")
Each step gets fresh context (no accumulated history noise).
When to use: Automatically activated for small models (<7B) or complex multi-step tasks. Can be forced with /plan command.
What it does: Full code structure extraction with call graphs, inheritance trees, and import tracking.
How it works:
- Python AST parsing (classes, functions, methods, docstrings, args, decorators)
- Four edge types: CALLS, INHERITS, CONTAINS, IMPORTS
- Graph queries: search, callers, callees, inheritance, blast_radius
- Incremental updates (mtime + content hash change detection)
Example usage:
You> What functions call estimate_tokens()?
→ Graph query returns all callers with file locations
You> Show me the blast radius of changing Agent.run()
→ Returns all downstream code affected by the change
You> What does ModelProvider inherit from?
→ Returns inheritance tree (ABC base class)
When to use: Understanding code structure, impact analysis, refactoring planning, dependency mapping.
What it does: Enriches code chunks with structural context before embedding.
How it works:
- Queries knowledge graph for callers, callees, inheritance
- Prepends context:
[From path, function X, called by Y, calls Z] {code} - Embeds contextualized text (captures WHERE code lives)
- Stores original text (clean display to user)
Example benefit:
Without context: "def authenticate(token): ..." → embedding
With context: "[From src/auth/handler.py, function authenticate in AuthService,
called by middleware.verify and routes.login]
def authenticate(token): ..." → embedding
Query: "login flow"
→ Matches authenticate() because context mentions routes.login
When to use: Automatically active when both knowledge graph and vector store exist. Dramatically improves semantic search relevance.
What it does: AST-informed chunking and boundary detection for 7+ programming languages.
Supported languages:
- Python (full AST parsing)
- Go, Rust, C/C++, TypeScript, JavaScript, Shell (boundary detection)
How it works:
- Detects language from file extension
- Uses language-specific patterns for function/class boundaries
- Python: Full AST with semantic metadata
- Others: Regex-based boundary detection with future pluggable parser support
When to use: Automatically applied during ingestion. Works on polyglot codebases.
What it does: Permission system with dangerous operation blocking and optional Ornstein sandbox isolation.
Protections:
- Blocked paths:
/etc,/sys,C:\Windows, etc. - Blocked commands:
rm -rf /,mkfs, fork bombs - Dangerous command confirmation:
rm,sudo,shutdown - Execution budgets: 300s session limit, 6 tool calls per step max
- Loop prevention: Repeat detection, thrashing detection, hard limits
Example:
You> Delete all files in the project
→ [!] Allow dangerous command: rm -rf *? [y/N]
→ User must explicitly confirm
You> Run this command 50 times
→ Hard limit kicks in after 6 calls
→ [System]: Hard limit reached - 6 tool calls executed
When to use: Always active. Provides safety rails for autonomous agent operation.
What it does: Converts agent responses to speech using Piper TTS with voice profiles.
Features:
- Multiple voice profiles (balanced, narrative, technical, energetic)
- DSP processing (bass boost, treble, normalization, compression)
- Audio caching (identical responses reuse cached audio)
- Offline operation (Piper runs locally)
Enable:
# ~/.animus/config.yaml
audio:
enabled: true
voice_profile: balanced # or narrative, technical, energeticWhen to use: Hands-free operation, accessibility, multitasking while agent works.
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB |
| Storage | 10 GB free | 50 GB free (for multiple models) |
| CPU | 4 cores | 8+ cores |
| OS | Windows 10+, Linux (Ubuntu 20.04+), macOS 11+ | Any modern OS |
| GPU | None (CPU-only works) | NVIDIA with 6+ GB VRAM |
Performance tested on consumer hardware (RTX 2080 Ti 11GB, Ryzen 9 5900X) with Q4_K_M quantization:
| Model Size | VRAM (Q4) | Inference Speed | Tool Calling | Planning | Code Quality | Viable Use Cases |
|---|---|---|---|---|---|---|
| 1-3B | 1.2-2.4 GB | Fast (1-5s) | ✅ With GBNF | Single-step file ops, simple Q&A | ||
| 7B | 4.8 GB | Moderate (15-30s) | ✅ Reliable | ✅ Good (5-7 steps) | ✅ Production-ready | Multi-file coding, code review, refactoring |
| 14B | 8.9 GB | Slow (30-60s) | ✅ Excellent | ✅ Excellent (7-10 steps) | ✅ High quality | Complex agentic workflows, architecture |
| 20B | 12.3 GB | Very Slow (60-120s) | ✅ Excellent | ✅ Excellent (10+ steps) | ✅ Very high | Research-grade code generation |
| 30B | 18.3 GB | Multi-GPU needed | ✅ Near-perfect | ✅ Near-perfect | ✅ Exceptional | Professional development |
| 70B | 42 GB | Multi-GPU required | ✅ Near-perfect | ✅ Near-perfect | ✅ Exceptional | Frontier local capability |
VRAM formula: params_B × 0.6 + 0.3 GB base + ~15% runtime overhead for Q4_K_M quantization
Key thresholds:
- 7B: Minimum for production-quality code output
- 14B: Sweet spot for consumer hardware (single GPU, good quality)
- 30B+: Requires multi-GPU or workstation hardware (>$5K investment)
- 70B+: Typically exceeds cost-effectiveness vs API usage for most workflows
| Tier | GPU | VRAM | Best Model | Use Case |
|---|---|---|---|---|
| Entry | None (CPU-only) | N/A | API (GPT-4/Claude) | Learning, experimentation |
| Hobbyist | GTX 1660 / RTX 3050 | 6 GB | 7B models | Weekend projects |
| Enthusiast | RTX 3060 / 4060 Ti | 8-12 GB | 7-14B models | Serious development |
| Professional | RTX 4090 / A6000 | 24 GB | 20-30B models | Production workflows |
| Workstation | Multi-GPU (2-4×) | 48+ GB | 70B+ models | Research, frontier experiments |
| Edge | Jetson Orin Nano | 8 GB | 3-7B models | Embedded, air-gapped |
Reality check: For most users, API access to GPT-4 or Claude Sonnet is more cost-effective than dedicated GPU hardware for 30B+ models. Animus supports both—use local for privacy/air-gapped, use API for scale.
# 1. Install
git clone https://github.com/yourusername/animus.git
cd animus
pip install -e ".[all]"
# 2. Initialize
animus init
animus detect # Check your GPU
# 3. Download model (choose based on your VRAM)
animus pull qwen-2.5-coder-7b # 4.8 GB VRAM, best for coding
# 4. Start agent
animus rise
# 5. Test basic functionality
You> What files are in this directory?
You> Create a file called test.txt with "Hello World"
You> exit# 1. Build knowledge graph (AST parsing)
animus graph ./your-project
# → Extracts 1000s of nodes (classes, functions, methods)
# → Creates call graphs, inheritance trees, import maps
# 2. Build vector store (contextual embeddings)
animus ingest ./your-project
# → Chunks code with AST boundaries
# → Enriches with graph context
# → Embeds with sentence-transformers
# 3. Use intelligent search
animus rise
You> search for "how does configuration loading work?"
→ [Strategy: SEMANTIC] Returns conceptually relevant code
You> search for "what calls load_config()?"
→ [Strategy: STRUCTURAL] Returns all callers from graph
You> search for "find config code and everything that depends on it"
→ [Strategy: HYBRID] Fuses semantic + structural results
→ Results marked with ★ appear in both (high confidence)# 1. Install and init
pip install -e ".[dev]"
animus init
# 2. Set API key
export ANTHROPIC_API_KEY="sk-ant-..."
# or
export OPENAI_API_KEY="sk-..."
# 3. Configure provider
# Edit ~/.animus/config.yaml:
model:
provider: anthropic # or "openai"
model_name: claude-sonnet-4-5-20250929 # or "gpt-4"
# 4. Start
animus riseNo model download needed—uses cloud API directly.
You> search for "error handling patterns"
→ Manifold routes to SEMANTIC
→ Returns code chunks with try/except patterns
You> search for "what does ChunkContextualizer.contextualize call?"
→ Manifold routes to STRUCTURAL
→ Returns callees from knowledge graph
You> What's the blast radius of changing estimate_tokens()?
→ Agent uses get_blast_radius tool
→ Shows all downstream code affected
You> Read src/core/agent.py and add a debug logging statement to the _step method
Agent:
[1/3] read_file("src/core/agent.py")
[2/3] modify file with logging
[3/3] write_file with updated content
You> Now run the tests to make sure nothing broke
→ Agent runs pytest and reports results
You> What files have uncommitted changes?
→ Agent runs git_status
You> Show me the diff for src/core/agent.py
→ Agent runs git_diff
You> Commit the changes with message "Add debug logging"
→ [!] Commit with message: Add debug logging? [y/N]
→ User confirms, agent commits
You> Find all usages of estimate_tokens(), then consolidate them into
a single implementation in src/core/context.py
Agent (with 7B model):
[1/5] search for "estimate_tokens" # Uses Manifold
[2/5] read_file for each file with matches
[3/5] Analyze duplicate implementations
[4/5] Update all files to import from context.py
[5/5] Run tests to verify changes
→ Automatically handles complex multi-file operations
| Command | Description | Example |
|---|---|---|
animus init |
Initialize config | animus init |
animus detect |
Show hardware info | animus detect |
animus status |
System readiness check | animus status |
animus config --show |
View configuration | animus config --show |
| Command | Description | Example |
|---|---|---|
animus models |
List available models | animus models |
animus models --vram 6 |
Filter by VRAM | animus models --vram 6 |
animus models --role planner |
Filter by capability | animus models --role planner |
animus pull <model> |
Download model | animus pull qwen-2.5-coder-7b |
animus pull --list |
Show all downloadable models | animus pull --list |
| Command | Description | Example |
|---|---|---|
animus graph <path> |
Build knowledge graph | animus graph ./src |
animus ingest <path> |
Build vector store | animus ingest ./src |
| Command | Description | Example |
|---|---|---|
animus rise |
Start interactive session | animus rise |
animus rise --resume |
Resume last session | animus rise --resume |
animus rise --session <id> |
Resume specific session | animus rise --session abc123 |
animus sessions |
List all sessions | animus sessions |
| Command | Description |
|---|---|
/tools |
Show available tools |
/tokens |
Show context usage |
/plan |
Toggle plan mode |
/save |
Save session |
/clear |
Reset conversation |
/help |
List all commands |
exit or quit |
End session |
Initial hypothesis: Local models can compete with APIs through clever prompting.
Reality discovered: For production-scale agentic workflows, API costs scale linearly with usage, while local inference costs scale worse than linearly with quality requirements:
- 7B model: Fast but limited code quality
- 14B model: Better but 2x slower, requires 2x VRAM
- 30B+ model: Good quality but requires multi-GPU ($5K+) and 5-10x slower
Key finding: API almost always wins on total cost of ownership at scale. Local inference is for privacy, air-gapped environments, or specific low-latency scenarios—not cost savings.
Initial approach: Standard RAG chunking (sliding window, 512 tokens, 64 token overlap)
Problems discovered:
- Semantic boundaries ignored - Functions split mid-implementation
- No structural metadata - Chunks are anonymous text blobs
- Context-free embeddings - "def authenticate()" could be anywhere
- Search quality poor - "login flow" doesn't match relevant auth code
Attempted fix: Better chunking (paragraph-aware, code-aware regex)
Result: Marginal improvement, fundamental issues remained.
Breakthrough insight: Stop trying to make chunks self-contained. Instead, make the entire codebase one surface that the LLM can navigate through hardcoded tooling.
Key components:
- AST-based knowledge graph - Parse code structure, not text
- Graph queries as tools - "What calls X?" is a SQL query, not an LLM prompt
- Contextual embeddings - Enrich chunks with graph-derived context
- Hardcoded routing - Classify query intent with regex, not LLM
Why this works:
- Knowledge graph answers structural questions (<20ms SQL query)
- Vector search answers semantic questions (with graph-enriched embeddings)
- Router combines them intelligently (hardcoded, <1ms, no LLM overhead)
- LLM only used for understanding user intent and generating code—not navigation
The synthesis: Animus had all the pieces:
- ✅ Vector store (semantic search)
- ✅ Knowledge graph (structural queries)
- ✅ AST parser (code understanding)
- ✅ Tool framework (extensibility)
What was missing: Orchestration layer to make them work as one system.
Manifold implementation:
- Hardcoded query router (SEMANTIC/STRUCTURAL/HYBRID/KEYWORD)
- Reciprocal Rank Fusion (cross-strategy result merging)
- Contextual embeddings (graph context prepended before embedding)
- Unified search() tool (automatic strategy selection)
Result: <400ms hybrid queries on edge hardware. No cloud, no large models needed for code navigation. LLM used only for actual reasoning/generation, not for "finding the right code."
The experiment: Built an automated gauntlet test — identical multi-step task (create directory, write Python file, git init/add/commit) executed across three model sizes on the same hardware (RTX 2080 Ti).
Results:
- 1B (Llama-3.2-1B): 0/8 checks passed. Cannot produce structured tool calls at all — outputs shell scripts instead of JSON tool invocations. No amount of scaffolding (GBNF grammar, plan-then-execute, filtered tools) compensates for insufficient model capacity.
- 7B (Qwen2.5-Coder-7B): 8/8 checks passed, 238.9s. Clean plan, correct tool calls, some scope bleed (hallucinated a GitHub URL and attempted push). Minimum viable model for agentic use.
- 14B (Qwen2.5-Coder-14B): 8/8 checks passed, 373.9s. 56% slower than 7B, 58% more tool calls (19 vs 12), identical outcome. Extra parameters manifest as over-verification and unnecessary branch creation, not better task completion.
Key insight: Below ~3B parameters, models cannot follow the tool-call contract regardless of scaffolding. Above the threshold, returns diminish rapidly — 7B with good scaffolding (plan-then-execute, GBNF grammar, tool filtering) outperforms 14B with the same scaffolding in wall-clock efficiency. Infrastructure matters more than model size once you cross the viability threshold.
See LLM_GECK/Archival Assets/Phase_2_assessment.md for full empirical analysis and security audit.
Traditional RAG: LLM decides what to search for, LLM interprets results, LLM navigates codebase
Manifold approach:
- Hardcoded router decides strategy (<1ms vs LLM's 100-500ms)
- SQL queries answer structural questions (deterministic vs LLM's probabilistic)
- AST parsing extracts code structure (100% accurate vs LLM's ~80%)
- LLM only used where ambiguity/creativity actually needed
Philosophy: "Use LLMs only where ambiguity, creativity, or natural language understanding is required. Use hardcoded logic for everything else."
The result: A 7B model with Manifold outperforms a 30B model with naive RAG because the 7B model is doing less work—the infrastructure handles code navigation deterministically.
- All data stored locally (SQLite databases)
- No cloud dependencies for core functionality
- API providers available but not required
- Works offline after initial model download
- Task decomposition: Hardcoded parser (not LLM)
- Query routing: Regex patterns (not LLM classifier)
- Tool selection: Type-based filtering (not LLM decision)
- Error recovery: Exception classification (not LLM diagnosis)
- Designed for 8W Jetson to consumer GPUs
- <400ms query latency target
- Paginated vector search (constant memory)
- SIMD-accelerated KNN (sqlite-vec)
- Batched embedding generation
- Permission system (blocked paths/commands)
- Execution budgets (time limits per session)
- Loop prevention (repeat detection, thrashing detection, hard limits)
- Audit trails (write operations logged)
- Sandbox isolation (Ornstein for untrusted code)
# Run full test suite
pytest
# Run with coverage
pytest --cov=src --cov-report=html
# Run specific test module
pytest tests/test_router.py -vTest coverage: 546 tests across 15 modules CI/CD: Automated testing on Python 3.11, 3.12, 3.13 via GitHub Actions
animus/
├── src/
│ ├── core/ # Agent, planner, context management
│ │ ├── agent.py # Agentic loop with reflection
│ │ ├── planner.py # Plan-then-execute pipeline
│ │ ├── context.py # Token estimation, context budgeting
│ │ └── tool_parsing.py # Shared tool call parser
│ ├── llm/ # Model providers
│ │ ├── native.py # llama-cpp-python (GGUF)
│ │ ├── api.py # OpenAI/Anthropic APIs
│ │ └── base.py # Provider ABC
│ ├── memory/ # RAG pipeline
│ │ ├── chunker.py # Multi-language chunking
│ │ ├── embedder.py # Sentence-transformers
│ │ ├── vectorstore.py # SQLite vector store
│ │ ├── scanner.py # Directory walker
│ │ └── contextualizer.py # Contextual embedding
│ ├── knowledge/ # Code intelligence
│ │ ├── parser.py # AST-based code parser
│ │ ├── graph_db.py # Knowledge graph storage
│ │ └── indexer.py # Incremental graph builder
│ ├── retrieval/ # Manifold system
│ │ ├── router.py # Query classification
│ │ └── executor.py # Strategy dispatch + RRF
│ ├── tools/ # Agent tools
│ │ ├── filesystem.py # File operations
│ │ ├── shell.py # Shell commands
│ │ ├── git.py # Git operations
│ │ ├── graph.py # Graph queries
│ │ └── manifold_search.py # Unified search
│ ├── isolation/ # Sandboxing
│ │ └── ornstein.py # Lightweight sandbox
│ └── audio/ # TTS system
│ ├── engine.py # Piper TTS integration
│ └── voice_profile.py # Voice profiles + DSP
├── tests/ # Test suite (546 tests)
├── docs/ # Documentation
└── LLM_GECK/ # Development audits & blueprints
| Operation | Latency | Backend |
|---|---|---|
| Router classification | <1ms | Pure regex |
| Vector search (sqlite-vec) | 20-50ms | SIMD KNN |
| Graph query | 10-20ms | Indexed SQL |
| Keyword search (grep) | 50-100ms | Subprocess |
| RRF fusion | <1ms | Pure math |
| Total (cached embedding) | <200ms | Combined |
| Query embedding (MiniLM) | 50-200ms | GPU/CPU dependent |
| Total (cold query) | <400ms | End-to-end (Does not include LLM prompt processing time or plan formation) |
Tested on RTX 2080 Ti with 601 chunks, 1,240 graph nodes.
| Metric | Before Audit | After Audit | Improvement |
|---|---|---|---|
| Tool calls per step | 12-20+ (loops) | 1-2 | 92% reduction |
| Token estimation error | ±30% (4 char/token) | ±2% (tiktoken) | 93% accuracy gain |
| Vector search memory | 150MB+ spikes | Constant (paginated) | Bounded |
| Repeat detection | 3 identical calls | 2 identical calls | Stricter |
| Language support | Python only | 7+ languages | 700% expansion |
See CONTRIBUTING.md for development setup, testing guidelines, and contribution areas.
Areas open for contribution:
- Multi-language parsers (Go, Rust, TypeScript using tree-sitter)
- Additional tool implementations (web browsing, API calls, database access)
- Model provider integrations (Ollama, LM Studio, vLLM)
- Performance optimizations (Go sidecar architecture documented in LLM_GECK)
- Documentation and tutorials
Core principle: "Use LLMs only where ambiguity, creativity, or natural language understanding is required. Use hardcoded logic for everything else."
In practice:
- ✅ LLM for: Task decomposition, code generation, natural language understanding
- ❌ LLM for: File parsing (use AST), pattern matching (use regex), routing (use decision trees)
See LLM_GECK/README.md for development framework and LLM_GECK/MANIFOLD_BUILD_INSTRUCTIONS.md for Manifold architecture details.
Test coverage: 546 tests passing (100%) Supported: Python 3.11, 3.12, 3.13 Platforms: Windows, Linux, macOS
[Insert your license here - MIT, Apache 2.0, etc.]
Built with:
- llama-cpp-python for local GGUF inference
- sentence-transformers for semantic embeddings
- sqlite-vec for SIMD-accelerated vector search
- tiktoken for accurate token counting
- Piper TTS for voice synthesis
Inspired by the principle that the best code is the code you don't have to write—and the best LLM call is the one you hardcode away.
Status: Production-ready with 39 tasks completed, 8,000+ lines of improvements, and novel Manifold multi-strategy retrieval system.
"The name of the game isn't who has the biggest model. It's who gets the most signal per watt."