Goal: Minimal viable infrastructure
- Project scaffold
- Documentation structure
- uv project setup
- Basic package structure
- Configuration management
- Logging infrastructure
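Configuration management can be sketched as an environment-backed settings object. This is a minimal illustration, not the project's actual implementation; the variable names (`TELEGRAM_TOKEN`, `DB_PATH`, `LOG_LEVEL`) and defaults are assumptions.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Immutable settings loaded once at startup."""
    telegram_token: str
    db_path: str = "data/bot.db"
    log_level: str = "INFO"

    @classmethod
    def from_env(cls) -> "Settings":
        # Read from environment with safe defaults; the token has
        # no default on purpose so a missing value is easy to detect.
        return cls(
            telegram_token=os.environ.get("TELEGRAM_TOKEN", ""),
            db_path=os.environ.get("DB_PATH", "data/bot.db"),
            log_level=os.environ.get("LOG_LEVEL", "INFO"),
        )
```

A frozen dataclass keeps configuration read-only after startup, which makes it safe to share across agents.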
Goal: Single agent responding via Telegram
- LLM provider abstraction (Claude API)
- Basic Telegram interface
- Main dialog agent
- Simple prompt template
- CLI interface for testing
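One way the provider abstraction could look, sketched under assumptions: a small interface that concrete classes (Claude API, later OpenRouter or Ollama) implement, plus a stub provider so the CLI can be tested offline. `LLMResponse`, `EchoProvider`, and the field names are illustrative, not the actual implementation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class LLMResponse:
    """Normalized response shape shared by all providers."""
    text: str
    model: str
    input_tokens: int = 0
    output_tokens: int = 0


class LLMProvider(ABC):
    """Minimal provider interface; concrete classes wrap vendor SDKs."""

    @abstractmethod
    def complete(self, prompt: str, system: str = "") -> LLMResponse: ...


class EchoProvider(LLMProvider):
    """Stub provider for CLI testing without network access."""

    def complete(self, prompt: str, system: str = "") -> LLMResponse:
        return LLMResponse(text=f"echo: {prompt}", model="echo-stub")
```

Normalizing every provider to one response type keeps the dialog agent independent of vendor SDK details.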
Goal: Persistent context across sessions
- SQLite storage layer
- Working memory (session)
- Episodic memory (conversation logs)
- Basic retrieval (FTS5 + fallback)
- Memory injection into prompts
- Core memory blocks (Letta concept)
- Conversation summarization on session end
- Memory importance scoring
Exit criteria: Bot remembers previous conversations.
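The FTS5-with-fallback retrieval could be sketched like this, assuming a `memories_fts` virtual table exists and FTS5 is compiled into the sqlite3 build (true of most CPython distributions); the table and column names are illustrative.

```python
import sqlite3


def search_memories(conn: sqlite3.Connection, query: str) -> list[str]:
    """Try FTS5 full-text search; fall back to LIKE if FTS5 is unavailable."""
    try:
        rows = conn.execute(
            "SELECT content FROM memories_fts WHERE memories_fts MATCH ?",
            (query,),
        ).fetchall()
    except sqlite3.OperationalError:
        # Fallback path for builds without FTS5 or missing virtual table.
        rows = conn.execute(
            "SELECT content FROM memories WHERE content LIKE ?",
            (f"%{query}%",),
        ).fetchall()
    return [r[0] for r in rows]
```

FTS5 gives ranked token matching; the `LIKE` fallback trades quality for portability.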
Goal: LLM flexibility and cost optimization
- Provider router with fallback logic
- Cost tracking per request
- OpenRouter integration
- Local LLM support (Ollama/LM Studio)
- Model selection by task type (TaskType enum)
Exit criteria: Different queries route to appropriate LLM.
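The fallback logic in the provider router can be illustrated with a priority-ordered list of callables; provider names and the `ProviderError` type here are assumptions for the sketch, not the real API.

```python
class ProviderError(Exception):
    """Raised when a provider cannot serve the request."""


def route_with_fallback(providers, prompt):
    """Try (name, callable) pairs in priority order; return the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append((name, str(exc)))
    raise ProviderError(f"all providers failed: {errors}")


def flaky(prompt):
    # Stand-in for a rate-limited remote provider.
    raise ProviderError("rate limited")


def local(prompt):
    # Stand-in for a local model (e.g. served by Ollama).
    return f"local answer to: {prompt}"
```

Collecting per-provider errors before raising makes the final failure message useful for cost/health diagnostics.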
Goal: Proactive behavior
- Agent lifecycle management (Orchestrator)
- Sleep agent (memory consolidation)
- Awareness agent (proactive checks)
- Async task scheduler
- Semantic memory extraction
Exit criteria: Bot consolidates memories while idle and can proactively notify the user.
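A toy sketch of the idle-triggered sleep agent, assuming the orchestrator polls a last-activity timestamp; class and method names are illustrative only.

```python
import asyncio


class Orchestrator:
    """Toy lifecycle manager: runs a sleep-agent job once the dialog goes idle."""

    def __init__(self, idle_after: float):
        self.idle_after = idle_after
        self.consolidated: list[str] = []

    async def sleep_agent(self) -> None:
        # The real version would summarize episodic memory into semantic facts.
        self.consolidated.append("summary-of-session")

    async def watch_idle(self, last_activity: float) -> None:
        loop = asyncio.get_running_loop()
        # Poll until no activity has occurred for `idle_after` seconds.
        while loop.time() - last_activity < self.idle_after:
            await asyncio.sleep(0.01)
        await self.sleep_agent()


async def demo() -> list[str]:
    orch = Orchestrator(idle_after=0.05)
    await orch.watch_idle(asyncio.get_running_loop().time())
    return orch.consolidated
```

Running consolidation as a background coroutine keeps it off the dialog agent's critical path.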
Goal: Cost-optimized model selection by task type
- Task-based model selection (not fallback)
- Model difficulty registry (HARD/INTERMEDIATE/EASY)
- Task→Difficulty mapping (CHAT→HARD, SUMMARIZATION→INTERMEDIATE, etc.)
- Cost tracking per request with budget enforcement
- Automatic downgrade when approaching budget limit
- Model capability filtering (multimodal, context window)
- Provider fallback and health checking
- Integration tests for routing behavior
Exit criteria: Each task type routes to optimal model by cost/capability. ✅
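The difficulty registry and budget-based downgrade could look roughly like this; the model names and the 90% budget threshold are illustrative assumptions, not the project's actual configuration.

```python
from enum import Enum


class Difficulty(Enum):
    EASY = 1
    INTERMEDIATE = 2
    HARD = 3


# Illustrative mappings; a real registry would live in configuration.
TASK_DIFFICULTY = {
    "CHAT": Difficulty.HARD,
    "SUMMARIZATION": Difficulty.INTERMEDIATE,
    "CLASSIFICATION": Difficulty.EASY,
}
MODEL_BY_DIFFICULTY = {
    Difficulty.HARD: "frontier-model",
    Difficulty.INTERMEDIATE: "mid-tier-model",
    Difficulty.EASY: "small-local-model",
}


def pick_model(task: str, budget_used_ratio: float) -> str:
    difficulty = TASK_DIFFICULTY[task]
    # Automatic downgrade when approaching the budget limit.
    if budget_used_ratio >= 0.9 and difficulty is Difficulty.HARD:
        difficulty = Difficulty.INTERMEDIATE
    return MODEL_BY_DIFFICULTY[difficulty]
```

Routing by difficulty rather than by raw fallback order means cheap tasks never touch expensive models, even when the budget is healthy.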
Goal: Sandboxed environment for code execution
- Workspace directory structure (data/workspace/)
- Python environment isolation (venv per workspace)
- Script execution with output capture (stdout/stderr)
- File read/write within sandbox with path validation
- Execution timeout and resource limits
- Safety validator (blocks dangerous imports/functions)
- CodeAgent for LLM-driven script generation
- Comprehensive test coverage
Exit criteria: Bot can write and execute Python scripts and read results safely. ✅
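Path validation and timeout-limited execution can be sketched as below; this is a simplified illustration (a temp directory standing in for `data/workspace/`, no venv isolation or import safety checks), not the actual sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

WORKSPACE = Path(tempfile.mkdtemp())  # stands in for data/workspace/


def validate_path(relative: str) -> Path:
    """Resolve a path and ensure it stays inside the workspace sandbox."""
    resolved = (WORKSPACE / relative).resolve()
    if not resolved.is_relative_to(WORKSPACE.resolve()):
        raise PermissionError(f"path escapes sandbox: {relative}")
    return resolved


def run_script(code: str, timeout: float = 5.0):
    """Write a script into the sandbox and run it, capturing stdout/stderr.

    subprocess.run raises TimeoutExpired if the script exceeds `timeout`.
    """
    script = validate_path("script.py")
    script.write_text(code)
    proc = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
        timeout=timeout,
        cwd=WORKSPACE,
    )
    return proc.returncode, proc.stdout, proc.stderr
```

Resolving paths before comparison defeats `../` traversal; resource limits beyond the wall-clock timeout (memory, CPU) need OS-level mechanisms.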
Goal: Proactive async task execution
- Task type system (REMINDER, AGENT_TASK, API_CALL, WEB_SEARCH)
- Schedule parsing (delays: "5m", "2h"; patterns: "daily 9am", "weekdays 6pm")
- SQLite-backed task persistence with indexing
- Recurring task rescheduling after execution
- Task execution engine with notification callbacks
- Telegram commands: /remind, /schedule, /tasks, /cancel
- Integration with AwarenessAgent for proactive checks
- Comprehensive integration test suite
Exit criteria: Users can schedule one-time and recurring tasks via natural language. ✅ Documentation: task_system.md
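The delay half of schedule parsing ("5m", "2h") could be implemented roughly as follows; the regex and unit table are an illustrative sketch, and pattern schedules like "daily 9am" would need a separate parser.

```python
import re
from datetime import timedelta

DELAY_RE = re.compile(r"^(\d+)([smhd])$")
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_delay(spec: str) -> timedelta:
    """Parse compact delay specs like '5m' or '2h' into a timedelta."""
    match = DELAY_RE.match(spec.strip().lower())
    if not match:
        raise ValueError(f"unrecognized delay: {spec!r}")
    amount, unit = match.groups()
    return timedelta(**{UNITS[unit]: int(amount)})
```

Returning a `timedelta` lets the scheduler compute the absolute fire time once, at enqueue, so persisted tasks survive restarts.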
Goal: LLM-driven function execution via native APIs
- Tool definition system with @tool decorator
- Native API integration (Anthropic tool use, OpenAI function calling)
- Tool registry with provider-specific converters
- Built-in tools: task management, system utilities
- Tool executor with validation and error handling
- DialogAgent integration with two-pass approach
- Automatic tool result formatting for LLM
- Integration tests with live API calls
Exit criteria: Agent can call functions from natural language using native provider APIs. ✅ Documentation: tool_calling.md
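A minimal sketch of the @tool decorator and registry, assuming specs are later converted to provider-specific schemas; the registry shape and the everything-is-a-string simplification are assumptions, not the actual implementation.

```python
import inspect

TOOL_REGISTRY: dict[str, dict] = {}


def tool(fn):
    """Register fn with a minimal spec derived from its signature.

    Real converters would translate this spec into Anthropic tool-use
    or OpenAI function-calling JSON schemas.
    """
    params = {
        name: {"type": "string"}  # simplification: every parameter is a string
        for name in inspect.signature(fn).parameters
    }
    TOOL_REGISTRY[fn.__name__] = {
        "description": (fn.__doc__ or "").strip(),
        "parameters": params,
        "fn": fn,
    }
    return fn


@tool
def cancel_task(task_id):
    """Cancel a scheduled task by id."""
    return f"cancelled {task_id}"
```

Deriving the spec from the signature and docstring keeps tool definitions in one place, so the LLM-facing description can never drift from the code.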
Goal: Safe autonomous operation
- Action classification system
- Approval workflow
- Audit logging
- Test harness for code changes
- Self-modification sandbox (uses Tool Workspace)
Exit criteria: Agent can propose and safely apply code changes.
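The action classification system could start from a severity enum with an auto-approve threshold; the three levels mirror the read/write/execute split planned above, but the threshold and names here are illustrative assumptions.

```python
from enum import Enum


class Severity(Enum):
    READ = 1     # e.g. reading a file, searching memory
    WRITE = 2    # e.g. editing workspace files
    EXECUTE = 3  # e.g. running generated code, self-modification


# Actions at or below this level proceed without user sign-off.
AUTO_APPROVE_MAX = Severity.READ


def requires_approval(severity: Severity) -> bool:
    """High-risk actions above the auto-approve threshold need user sign-off."""
    return severity.value > AUTO_APPROVE_MAX.value
```

Keeping the threshold as data rather than scattered `if` checks means the approval policy can be tightened or relaxed in one place, and every decision is easy to audit-log.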
Goal: Full multimodal support
- Voice message handling (STT)
- Image understanding
- File processing
- Inline keyboards/actions
- Notification preferences
Exit criteria: Natural multimodal conversations.
- Code agent (uses Tool Workspace)
- Research agent (web search, synthesis)
- Calendar agent (scheduling)
- Calendar APIs
- Note-taking apps
- Smart home
- User isolation
- Shared knowledge base
- User-to-user introductions
- Vertical slices: Each phase delivers usable functionality
- Test-driven: Tests before features
- Documentation-first: Design before code
- Minimal dependencies: Add only when needed
- Machine-readable: Code optimized for AI understanding
Phase 7: Safety & Self-Modification (Not Started)
Phases 0-6 are complete, including:
- ✅ Core infrastructure and memory
- ✅ Multi-provider LLM routing with cost optimization
- ✅ Tool Workspace for sandboxed code execution
- ✅ Task scheduling system for proactive behavior
- ✅ Native API tool calling framework
Next actions:
- Design action classification system (read/write/execute severity levels)
- Implement approval workflow for high-risk actions
- Add audit logging for all agent actions
- Create test harness for proposed code changes
- Build self-modification sandbox using Tool Workspace