Open-Alita is a generalist agent designed to enable scalable agentic reasoning with minimal predefinition and maximal self-evolution. This project leverages the Model Context Protocol (MCP) to dynamically create, adapt, and reuse capabilities based on the demands of various tasks, moving away from the reliance on predefined tools and workflows.
Follow updates & connect: @ryan_tzr on Twitter
This project is inspired by the Alita project by CharlesQ9 and the concepts presented in the research paper "Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution".
- Original Alita Project: CharlesQ9/Alita on GitHub
- Research Paper: Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution (arXiv:2505.20286)
Full credits to the authors and contributors of these works for the foundational architecture and ideas.
Open-Alita now integrates with browser-use for powerful browser automation capabilities! The system intelligently routes tasks to the most appropriate agent:
- Browser Automation: YouTube videos, interactive websites, authentication, visual tasks
- Web Search: Simple information queries, static content, documentation
- Tool Creation: Mathematical calculations, API integrations, data processing
- Synthesis: Combining information into comprehensive answers
Open-Alita implements a modular, self-evolving architecture built around the Model Context Protocol (MCP) and LangGraph workflows. The system is designed to minimize hard-coded behaviors while maximizing adaptive capabilities.
View Complete Architecture Diagram
The architecture consists of multiple layers working together to provide intelligent query routing, dynamic tool creation, web search, and browser automation capabilities.
The intelligent workflow coordinator that routes tasks to the most appropriate agent.
Key Features:
- Smart Task Routing: Uses LLM to analyze queries and determine the best approach
- Browser Automation Detection: Automatically identifies tasks requiring browser interaction
- Multi-Agent Orchestration: Coordinates between web search, tool creation, and browser automation
- Iterative Refinement: Continuously improves results through multiple iterations
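The routing decision described in this list can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the `route_task` function, the `AgentType` names, and the keyword lists are all hypothetical, and a plain keyword heuristic stands in for the LLM analysis that the coordinator performs.

```python
from enum import Enum


class AgentType(Enum):
    """Hypothetical labels for the agents the coordinator can route to."""
    BROWSER = "browser_agent"
    WEB_SEARCH = "web_agent"
    TOOL_CREATION = "mcp_agent"


# Assumed keyword hints; the real system asks an LLM to analyze the query.
BROWSER_HINTS = ("youtube", "video", "login", "screenshot", "checkout")
TOOL_HINTS = ("calculate", "solve", "convert", "analyze", "csv")


def route_task(query: str) -> AgentType:
    """Pick an agent for a query using simple keyword matching."""
    text = query.lower()
    if any(hint in text for hint in BROWSER_HINTS):
        return AgentType.BROWSER
    if any(hint in text for hint in TOOL_HINTS):
        return AgentType.TOOL_CREATION
    # Default to web search for plain information queries.
    return AgentType.WEB_SEARCH


if __name__ == "__main__":
    for q in ("Calculate the area of a circle with radius 10",
              "What is the weather forecast for tomorrow?"):
        print(q, "->", route_task(q).value)
```

A keyword table like this is brittle, which is presumably why the project delegates the decision to an LLM; the sketch only shows the shape of the input and output.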
Handles complex web automation tasks using browser-use.
Key Features:
- Video Content: YouTube videos, video analysis, watching content
- Visual Tasks: Screenshots, image analysis, OCR, visual verification
- Interactive Websites: Login forms, shopping carts, social media interactions
- Dynamic Content: JavaScript-heavy sites, real-time updates, SPAs
- Authentication: OAuth flows, API keys, user credentials
- Platform Integration: GitHub, Twitter, LinkedIn, Discord, Slack
- E-commerce: Shopping, checkout processes, product browsing
- Multi-step Workflows: Job applications, form filling, complex processes
Handles web search and content extraction for simple information queries.
Key Features:
- Intelligent Query Decomposition: Breaks complex queries into focused searches
- Multiple Search Strategies: Targeted, broader, and verification searches
- Content Analysis: Relevance scoring, credibility assessment, summarization
- Follow-up Searches: Automatically identifies and fills information gaps
Creates and executes custom tools for computational and data processing tasks.
Key Features:
- Dynamic Tool Creation: Generates specialized tools based on query requirements
- Tool Chaining: Executes multiple tools in sequence or parallel
- Dependency Management: Handles tool dependencies and execution order
- Specialized Capabilities: API integrations, data analysis, system operations
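One common way to derive an execution order from tool dependencies is a topological sort. The sketch below assumes the dependencies are already known as a mapping from each tool to the tools it needs; the tool names and the `execution_order` helper are hypothetical, and the source does not say that Open-Alita orders tools this way.

```python
from graphlib import TopologicalSorter

# Hypothetical tool names; each maps to the set of tools it depends on.
tool_dependencies = {
    "fetch_ip": set(),
    "lookup_country": {"fetch_ip"},
    "format_report": {"lookup_country"},
}


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return tool names ordered so each runs after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(execution_order(tool_dependencies))
```

`graphlib` is in the standard library from Python 3.9 onward and raises `CycleError` when the dependencies are circular, which gives a natural place to reject an invalid tool chain.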
The central orchestrator that handles user interactions and coordinates all system components.
Key Features:
- Dynamic Intent Recognition: Uses LLM-powered natural language understanding
- Command Processing: Handles both streaming and batch processing modes
- MCP Lifecycle Management: Creates, caches, and reuses dynamic capabilities
- Fallback Mechanisms: Gracefully degrades from complex workflows to simple handling
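The create, cache, and reuse cycle mentioned under MCP Lifecycle Management might look roughly like the following. This is a sketch under stated assumptions: `CapabilityCache`, `get_or_create`, and `make_circle_area_tool` are hypothetical names, the cache lives only in memory, and a hand-written function stands in for generated tool code.

```python
import math
from typing import Callable, Dict


class CapabilityCache:
    """Hypothetical in-memory cache of generated tools, keyed by name."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., object]] = {}
        self.created = 0  # counts how many tools were actually built

    def get_or_create(
        self, name: str, factory: Callable[[], Callable[..., object]]
    ) -> Callable[..., object]:
        """Return the cached tool, building it with `factory` on first use."""
        if name not in self._tools:
            self._tools[name] = factory()
            self.created += 1
        return self._tools[name]


def make_circle_area_tool() -> Callable[[float], float]:
    """Stand-in for generated tool code."""
    return lambda radius: math.pi * radius ** 2


if __name__ == "__main__":
    cache = CapabilityCache()
    area = cache.get_or_create("circle_area", make_circle_area_tool)
    again = cache.get_or_create("circle_area", make_circle_area_tool)
    print(area(10), again is area, cache.created)
```

The second lookup returns the same object without calling the factory again, which is the property that makes reuse cheaper than regeneration.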
A Flask-based web application providing an intuitive chat interface.
Components:
- WebApp (web_app.py): Flask server with streaming response support
- Frontend (templates/index.html): Real-time chat interface with streaming
- Styling (static/style.css): Modern, responsive UI design
User Input → LangGraph Coordinator → LLM Analysis
    ↓
Route to appropriate agent:
├── Browser Agent (browser-use) → Complex web automation
├── Web Agent → Information search and extraction
├── MCP Agent → Tool creation and execution
└── Synthesizer → Final answer generation
    ↓
Evaluator → Assess completeness → Iterate if needed
    ↓
Return comprehensive result
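The evaluate-and-iterate step at the end of this flow can be sketched as a bounded loop. The names here are hypothetical (`run_workflow`, `toy_agent`, and the `max_iterations` cap are assumptions, not taken from the project), and the real coordinator is built on LangGraph rather than a plain `for` loop.

```python
from typing import Callable, List


def run_workflow(
    query: str,
    agent: Callable[[str, List[str]], str],
    is_complete: Callable[[List[str]], bool],
    max_iterations: int = 3,  # assumed cap to keep the loop bounded
) -> List[str]:
    """Call the agent repeatedly until the evaluator is satisfied."""
    results: List[str] = []
    for _ in range(max_iterations):
        results.append(agent(query, results))
        if is_complete(results):
            break
    return results


def toy_agent(query: str, previous: List[str]) -> str:
    """Stand-in agent that just labels each pass."""
    return f"step {len(previous) + 1} for {query}"


if __name__ == "__main__":
    # The toy evaluator accepts the answer after two passes.
    print(run_workflow("demo", toy_agent, lambda r: len(r) >= 2))
```

Capping the number of iterations is a design choice worth keeping in any variant, since an evaluator that is never satisfied would otherwise loop forever.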
To set up the project with browser automation support:
# Clone the repository
git clone https://github.com/ryantzr1/OpenAlita
cd OpenAlita
# Install dependencies
uv sync
# Install browser automation dependencies
uv add browser-use
playwright install chromium --with-deps --no-shell
# Set up environment variables
cp .env.example .env
# Edit .env and add your API keys:
# DEEPWISDOM_API_KEY=your_deepwisdom_key (required for all functionality)
# ANTHROPIC_API_KEY=your_anthropic_key (required for browser automation)
# FIRECRAWL_API_KEY=your_firecrawl_key (optional)

- Browser Automation: Full browser control for complex web tasks using browser-use
- Intelligent Web Search: Multi-strategy search with content analysis and follow-ups
- Dynamic Tool Creation: Generate specialized tools on-demand using MCP
- Smart Task Routing: LLM-powered analysis to choose the best approach
- Iterative Refinement: Continuously improve results through multiple iterations
- Real-time Streaming: Watch agents work in real-time with streaming output
- Capability Reuse: Automatically cache and reuse previously generated tools
- Natural Language Processing: Advanced intent recognition for intuitive interactions
To run the Alita agent with the web interface:
cd src/web
python web_app.py

This will start the server on http://127.0.0.1:5001/.
# Information queries
"What is the latest news about AI developments?"
"How many stars does the Python repository have on GitHub?"
"What is the weather forecast for tomorrow?"
# Research
"Find information about the history of machine learning"
"Research the latest trends in web development"

# Calculations
"Calculate the area of a circle with radius 10"
"Solve the quadratic equation x² + 5x + 6 = 0"
# Data processing
"Analyze this dataset and create a summary report"
"Convert this CSV file to JSON format"
# System operations
"Get my current IP address and check if it's in a specific country"
"List all files in the current directory and sort them by size"

Create a .env file with the following variables:
# Required for all functionality
LLM_MODEL_NAME="your_model_name"
LLM_API_KEY=your_llm_api_key
# Required for browser automation
ANTHROPIC_API_KEY=your_anthropic_api_key
# Optional for enhanced web search
FIRECRAWL_API_KEY=your_firecrawl_api_key

Note: LiteLLM support for multiple LLM providers will be added back in a future update, allowing you to use various AI model providers through a unified interface.
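A quick way to confirm the configuration before starting the app is to check which variables are missing. The sketch below uses the variable names from the .env listing; the `missing_vars` helper is hypothetical, and it assumes the values from .env have already been loaded into the process environment, since Python does not read .env files on its own.

```python
import os

# Variable names taken from the .env listing in this README.
REQUIRED_VARS = ("LLM_MODEL_NAME", "LLM_API_KEY")
BROWSER_VARS = ("ANTHROPIC_API_KEY",)
OPTIONAL_VARS = ("FIRECRAWL_API_KEY",)


def missing_vars(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    for label, names in (("required", REQUIRED_VARS),
                         ("browser automation", BROWSER_VARS),
                         ("optional", OPTIONAL_VARS)):
        missing = missing_vars(names)
        print(f"{label}: {'ok' if not missing else 'missing ' + ', '.join(missing)}")
```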
Ensure browser automation is properly configured:
# Install browser-use
uv add browser-use
# Install Playwright browser
playwright install chromium --with-deps --no-shell
# Verify installation
python -c "import browser_use; print('Browser automation ready!')"

- browser-use not installed:
  uv add browser-use
- Playwright browser not found:
  playwright install chromium --with-deps --no-shell
- Missing API keys:
  - Add your API keys to the .env file
  - Ensure the keys are valid and have sufficient credits
- Browser automation fails:
  - Check if the task requires authentication
  - Verify the website is accessible
  - Try rephrasing the request
- Import errors: Ensure all dependencies are installed
- API rate limits: Check your API key usage and limits
- Memory issues: Restart the application if it becomes unresponsive
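For import errors in particular, it can help to check which packages are visible to the interpreter without importing them. This is a small diagnostic sketch: the `is_installed` helper is hypothetical, and the module names `browser_use` and `playwright` are assumed to be the import names of the packages installed during setup.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Report whether a top-level module can be found without importing it."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    for module in ("browser_use", "playwright"):
        status = "ok" if is_installed(module) else "missing"
        print(f"{module}: {status}")
```

Run it with the same interpreter that launches the app, since a package installed in a different environment will still show as missing.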
Contributions are welcome! If you have ideas for improvements or new features, feel free to submit a pull request or open an issue.
This project is licensed under the MIT License. See the LICENSE file for more details.
Special Thanks to DeepWisdom
We extend our deepest gratitude to DeepWisdom for providing API credits during the development process. Their generous support has been absolutely crucial in making Open-Alita possible - huge huge huge thanks!
Special thanks to:
- browser-use for powerful browser automation capabilities
- LangGraph for workflow orchestration
- Model Context Protocol for dynamic tool creation
We would like to thank the contributors and the community for their support and feedback in developing Open-Alita.
Open-Alita includes comprehensive testing capabilities for evaluating agent performance on the GAIA benchmark. The GAIA test suite helps validate the agent's reasoning, tool usage, and problem-solving abilities.
The GAIA tests cover various domains including:
- Mathematical reasoning and calculations
- Web search and information retrieval
- Tool creation and execution
- Multi-step problem solving
- Real-world task completion