Development Roadmap

Phase 0: Foundation (Complete)

Goal: Minimal viable infrastructure

  • Project scaffold
  • Documentation structure
  • uv project setup
  • Basic package structure
  • Configuration management
  • Logging infrastructure

Phase 1: Core Loop (Complete)

Goal: Single agent responding via Telegram

  • LLM provider abstraction (Claude API)
  • Basic Telegram interface
  • Main dialog agent
  • Simple prompt template
  • CLI interface for testing

Phase 2: Memory (Complete)

Goal: Persistent context across sessions

  • SQLite storage layer
  • Working memory (session)
  • Episodic memory (conversation logs)
  • Basic retrieval (FTS5 + fallback)
  • Memory injection into prompts
  • Core memory blocks (Letta concept)
  • Conversation summarization on session end
  • Memory importance scoring

Exit criteria: Bot remembers previous conversations.
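The "FTS5 + fallback" retrieval item can be pictured roughly as follows. This is a minimal sketch, not the project's storage layer: the table names (`episodes`, `episodes_fts`) and the functions `open_memory_store`, `remember`, and `recall` are hypothetical, and the fallback is assumed to be a plain `LIKE` scan used when SQLite was built without FTS5.

```python
import sqlite3


def open_memory_store(path: str = ":memory:") -> tuple[sqlite3.Connection, bool]:
    """Create the tables; return the connection and whether FTS5 is usable."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episodes (id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
    )
    try:
        conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS episodes_fts USING fts5(text)")
        return conn, True
    except sqlite3.OperationalError:
        # This SQLite build has no FTS5 module; recall() will use LIKE instead.
        return conn, False


def remember(conn: sqlite3.Connection, has_fts: bool, text: str) -> None:
    """Store one conversation snippet, indexing it when FTS5 is available."""
    cur = conn.execute("INSERT INTO episodes (text) VALUES (?)", (text,))
    if has_fts:
        conn.execute(
            "INSERT INTO episodes_fts (rowid, text) VALUES (?, ?)", (cur.lastrowid, text)
        )


def recall(conn: sqlite3.Connection, has_fts: bool, query: str, limit: int = 5) -> list[str]:
    """Return stored snippets matching the query, best match first when FTS5 is used."""
    if has_fts:
        # Quote the query so user text is read as a phrase, not as FTS5 syntax.
        phrase = '"' + query.replace('"', '""') + '"'
        rows = conn.execute(
            "SELECT text FROM episodes_fts WHERE episodes_fts MATCH ? ORDER BY rank LIMIT ?",
            (phrase, limit),
        )
    else:
        rows = conn.execute(
            "SELECT text FROM episodes WHERE text LIKE ? LIMIT ?", (f"%{query}%", limit)
        )
    return [row[0] for row in rows]
```

Results from `recall` would then be injected into the prompt; how they are ranked against importance scores is left out of this sketch.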

Phase 3: Multi-Provider (Complete)

Goal: LLM flexibility and cost optimization

  • Provider router with fallback logic
  • Cost tracking per request
  • OpenRouter integration
  • Local LLM support (Ollama/LM Studio)
  • Model selection by task type (TaskType enum)

Exit criteria: Different queries route to the appropriate LLM.
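As an illustration of "provider router with fallback logic", the sketch below tries providers in order and returns the first successful completion. All names here (`Provider`, `ProviderRouter`, `ProviderError`, and the two stand-in providers) are assumptions made for the example; real providers would wrap the Claude API, OpenRouter, or a local model.

```python
from dataclasses import dataclass
from typing import Callable


class ProviderError(Exception):
    """Raised by a provider when a request fails."""


@dataclass
class Provider:
    name: str
    complete: Callable[[str], str]  # prompt -> completion text


class ProviderRouter:
    """Try each provider in priority order until one succeeds."""

    def __init__(self, providers: list[Provider]):
        self.providers = providers

    def complete(self, prompt: str) -> tuple[str, str]:
        errors = []
        for provider in self.providers:
            try:
                return provider.name, provider.complete(prompt)
            except ProviderError as exc:
                # Record the failure and fall through to the next provider.
                errors.append(f"{provider.name}: {exc}")
        raise ProviderError("all providers failed: " + "; ".join(errors))


# Stand-in providers for demonstration only.
def flaky(prompt: str) -> str:
    raise ProviderError("rate limited")


def local_echo(prompt: str) -> str:
    return f"echo: {prompt}"


router = ProviderRouter([Provider("primary", flaky), Provider("local", local_echo)])
used, reply = router.complete("hi")
```

Returning the provider name alongside the reply is one simple way to feed per-request cost tracking.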

Phase 4: Background Agents (Complete)

Goal: Proactive behavior

  • Agent lifecycle management (Orchestrator)
  • Sleep agent (memory consolidation)
  • Awareness agent (proactive checks)
  • Async task scheduler
  • Semantic memory extraction

Exit criteria: Bot consolidates memories while idle and can proactively notify the user.

Phase 5: Intelligent LLM Routing (Complete)

Goal: Cost-optimized model selection by task type

  • Task-based model selection (not fallback)
  • Model difficulty registry (HARD/INTERMEDIATE/EASY)
  • Task→Difficulty mapping (CHAT→HARD, SUMMARIZATION→INTERMEDIATE, etc.)
  • Cost tracking per request with budget enforcement
  • Automatic downgrade when approaching budget limit
  • Model capability filtering (multimodal, context window)
  • Provider fallback and health checking
  • Integration tests for routing behavior

Exit criteria: Each task type routes to optimal model by cost/capability. ✅
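The task→difficulty mapping and the automatic downgrade near the budget limit might look like the sketch below. Only the two mappings named above (CHAT→HARD, SUMMARIZATION→INTERMEDIATE) come from this roadmap; the model names, the `select_model` function, and the 80% downgrade threshold are assumptions chosen for illustration.

```python
from enum import Enum


class TaskType(Enum):
    CHAT = "chat"
    SUMMARIZATION = "summarization"


class Difficulty(Enum):
    EASY = 1
    INTERMEDIATE = 2
    HARD = 3


# Mappings taken from the roadmap; other task types are omitted here.
TASK_DIFFICULTY = {
    TaskType.CHAT: Difficulty.HARD,
    TaskType.SUMMARIZATION: Difficulty.INTERMEDIATE,
}

# Hypothetical model names standing in for the model difficulty registry.
MODEL_REGISTRY = {
    Difficulty.HARD: "large-model",
    Difficulty.INTERMEDIATE: "medium-model",
    Difficulty.EASY: "small-model",
}


def select_model(task: TaskType, spent: float, budget: float, downgrade_at: float = 0.8) -> str:
    """Pick a model by task difficulty, stepping down one tier near the budget limit."""
    difficulty = TASK_DIFFICULTY[task]
    near_limit = budget > 0 and spent / budget >= downgrade_at
    if near_limit and difficulty is not Difficulty.EASY:
        # Assumed policy: drop exactly one tier once the threshold is crossed.
        difficulty = Difficulty(difficulty.value - 1)
    return MODEL_REGISTRY[difficulty]
```

Capability filtering (multimodal, context window) would narrow the candidates before this step and is not shown.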

Phase 6: Tool Workspace (Complete)

Goal: Sandboxed environment for code execution

  • Workspace directory structure (data/workspace/)
  • Python environment isolation (venv per workspace)
  • Script execution with output capture (stdout/stderr)
  • File read/write within sandbox with path validation
  • Execution timeout and resource limits
  • Safety validator (blocks dangerous imports/functions)
  • CodeAgent for LLM-driven script generation
  • Comprehensive test coverage

Exit criteria: Bot can write and execute Python scripts and read their results safely. ✅
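For "file read/write within sandbox with path validation", one common approach is to resolve the requested path and reject anything that lands outside the workspace root. The sketch below assumes that approach; `resolve_in_workspace` and `SandboxViolation` are hypothetical names, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


class SandboxViolation(Exception):
    """Raised when a requested path escapes the workspace."""


def resolve_in_workspace(workspace: Path, user_path: str) -> Path:
    """Return the absolute path for user_path, or raise if it leaves the workspace."""
    root = workspace.resolve()
    # Joining with an absolute user_path discards root, so the check below
    # also catches inputs such as "/etc/passwd".
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise SandboxViolation(f"{user_path!r} is outside the workspace")
    return candidate
```

Resolving before the check means `..` segments and symlinks are collapsed first, so a path like `scripts/../../secrets.txt` is rejected rather than silently followed.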

Phase 6.5: Task Scheduling System (Complete)

Goal: Proactive async task execution

  • Task type system (REMINDER, AGENT_TASK, API_CALL, WEB_SEARCH)
  • Schedule parsing (delays: "5m", "2h"; patterns: "daily 9am", "weekdays 6pm")
  • SQLite-backed task persistence with indexing
  • Recurring task rescheduling after execution
  • Task execution engine with notification callbacks
  • Telegram commands: /remind, /schedule, /tasks, /cancel
  • Integration with AwarenessAgent for proactive checks
  • Comprehensive integration test suite

Exit criteria: Users can schedule one-time and recurring tasks via natural language. ✅

Documentation: task_system.md
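To make the schedule formats listed above concrete, here is a minimal parsing sketch covering the four example inputs ("5m", "2h", "daily 9am", "weekdays 6pm"). The function names `parse_delay` and `next_run` are hypothetical, and the actual grammar in task_system.md may accept more forms than these regular expressions do.

```python
import re
from datetime import datetime, timedelta

_DELAY = re.compile(r"^(\d+)([mh])$")
_PATTERN = re.compile(r"^(daily|weekdays) (\d{1,2})(am|pm)$")
_UNITS = {"m": "minutes", "h": "hours"}


def parse_delay(spec: str) -> timedelta:
    """Parse a relative delay such as '5m' or '2h'."""
    match = _DELAY.match(spec.strip().lower())
    if not match:
        raise ValueError(f"unrecognized delay: {spec!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def next_run(spec: str, now: datetime) -> datetime:
    """Return the next occurrence after now for 'daily 9am' or 'weekdays 6pm'."""
    match = _PATTERN.match(spec.strip().lower())
    if not match or not 1 <= int(match.group(2)) <= 12:
        raise ValueError(f"unrecognized pattern: {spec!r}")
    kind, hour_text, meridiem = match.groups()
    hour = int(hour_text) % 12 + (12 if meridiem == "pm" else 0)
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    # Skip Saturday (5) and Sunday (6) for weekday-only schedules.
    while kind == "weekdays" and candidate.weekday() >= 5:
        candidate += timedelta(days=1)
    return candidate
```

Calling `next_run` again with the previous run time as `now` gives the rescheduling step for recurring tasks.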

Phase 6.6: Tool Calling Framework (Complete)

Goal: LLM-driven function execution via native APIs

  • Tool definition system with @tool decorator
  • Native API integration (Anthropic tool use, OpenAI function calling)
  • Tool registry with provider-specific converters
  • Built-in tools: task management, system utilities
  • Tool executor with validation and error handling
  • DialogAgent integration with two-pass approach
  • Automatic tool result formatting for LLM
  • Integration tests with live API calls

Exit criteria: Agent can call functions from natural language using native provider APIs. ✅

Documentation: tool_calling.md
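A rough picture of the `@tool` decorator and the provider-specific converters: the decorator derives a JSON Schema from the function signature, and each converter wraps that schema in the shape the provider expects (`input_schema` for Anthropic tool use, `function.parameters` for OpenAI function calling). The registry layout, the converter names, and the example tool `add_reminder` are assumptions for this sketch; see tool_calling.md for the actual design.

```python
import inspect
from typing import Any, Callable, get_type_hints

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}
TOOL_REGISTRY: dict[str, dict[str, Any]] = {}


def tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register func and derive a JSON Schema from its signature."""
    hints = get_type_hints(func)
    params = inspect.signature(func).parameters
    properties = {
        # Unannotated or unsupported types default to "string" in this sketch.
        name: {"type": _JSON_TYPES.get(hints.get(name), "string")}
        for name in params
    }
    required = [n for n, p in params.items() if p.default is inspect.Parameter.empty]
    TOOL_REGISTRY[func.__name__] = {
        "func": func,
        "description": inspect.getdoc(func) or "",
        "schema": {"type": "object", "properties": properties, "required": required},
    }
    return func


def to_anthropic(name: str) -> dict[str, Any]:
    """Convert a registered tool to the Anthropic tool-use format."""
    entry = TOOL_REGISTRY[name]
    return {"name": name, "description": entry["description"], "input_schema": entry["schema"]}


def to_openai(name: str) -> dict[str, Any]:
    """Convert a registered tool to the OpenAI function-calling format."""
    entry = TOOL_REGISTRY[name]
    function = {"name": name, "description": entry["description"], "parameters": entry["schema"]}
    return {"type": "function", "function": function}


@tool
def add_reminder(message: str, delay_minutes: int = 5) -> str:
    """Schedule a reminder (hypothetical built-in tool)."""
    return f"Reminder '{message}' in {delay_minutes} min"
```

In the two-pass approach, the first LLM call would return a tool request, the executor would look up `TOOL_REGISTRY[name]["func"]` and call it with validated arguments, and the second call would receive the formatted result.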

Phase 7: Safety & Self-Modification

Goal: Safe autonomous operation

  • Action classification system
  • Approval workflow
  • Audit logging
  • Test harness for code changes
  • Self-modification sandbox (uses Tool Workspace)

Exit criteria: Agent can propose and safely apply code changes.

Phase 8: Rich Interface

Goal: Full multimodal support

  • Voice message handling (STT)
  • Image understanding
  • File processing
  • Inline keyboards/actions
  • Notification preferences

Exit criteria: Natural multimodal conversations.

Future Phases

Phase 9: Specialized Agents

  • Code agent (uses Tool Workspace)
  • Research agent (web search, synthesis)
  • Calendar agent (scheduling)

Phase 10: External Integrations

  • Calendar APIs
  • Note-taking apps
  • Smart home
  • Email

Phase 11: Multi-User

  • User isolation
  • Shared knowledge base
  • User-to-user introductions

Development Principles

  1. Vertical slices: Each phase delivers usable functionality
  2. Test-driven: Tests before features
  3. Documentation-first: Design before code
  4. Minimal dependencies: Add only when needed
  5. Machine-readable: Code optimized for AI understanding

Current Focus

Phase 7: Safety & Self-Modification (Not Started)

Phases 0 through 6.6 are complete, including:

  • ✅ Core infrastructure and memory
  • ✅ Multi-provider LLM routing with cost optimization
  • ✅ Tool Workspace for sandboxed code execution
  • ✅ Task scheduling system for proactive behavior
  • ✅ Native API tool calling framework

Next actions:

  1. Design action classification system (read/write/execute severity levels)
  2. Implement approval workflow for high-risk actions
  3. Add audit logging for all agent actions
  4. Create test harness for proposed code changes
  5. Build self-modification sandbox using Tool Workspace
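Since Phase 7 has not started, the following is only a possible starting point for the first three next actions, not a description of existing code. It assumes three ordered severity levels (read < write < execute), an approval threshold at the write level, and an in-memory audit list; the action names and every function here are hypothetical.

```python
from datetime import datetime, timezone
from enum import IntEnum


class Severity(IntEnum):
    READ = 1
    WRITE = 2
    EXECUTE = 3


# Hypothetical action names mapped to severity levels.
ACTION_SEVERITY = {
    "read_file": Severity.READ,
    "write_file": Severity.WRITE,
    "run_script": Severity.EXECUTE,
}

AUDIT_LOG: list[dict] = []


def classify(action: str) -> Severity:
    """Unknown actions are treated as the highest severity."""
    return ACTION_SEVERITY.get(action, Severity.EXECUTE)


def requires_approval(action: str, threshold: Severity = Severity.WRITE) -> bool:
    """Actions at or above the threshold need a human decision."""
    return classify(action) >= threshold


def perform(action: str, approved: bool = False) -> bool:
    """Log every attempt; run only if no approval is needed or approval was given."""
    allowed = approved or not requires_approval(action)
    AUDIT_LOG.append(
        {
            "time": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "severity": classify(action).name,
            "allowed": allowed,
        }
    )
    return allowed
```

Defaulting unknown actions to the highest severity keeps the system fail-closed: a newly added action cannot bypass approval just because nobody classified it yet.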