Sera (Smart Search Assistant) is an agentic web navigation system that autonomously carries out human-like browsing from natural-language goals. It can search, read, click, scroll, and complete structured tasks across arbitrary websites, even as their layouts and DOM structures change. Built on a modular design, Sera combines Large Language Models (LLMs) for task planning with DOM-aware execution modules and browser automation, aiming for a transparent, interpretable, and deployable way to automate complex web tasks. Users enter web search queries or task instructions as plain text, and the system feeds them into the planning and execution pipeline.
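As a rough illustration of the planning stage described above, the sketch below turns a planner reply into ordered plan steps. The prompt wording, the JSON reply shape, and the names `PlanStep`, `build_planner_prompt`, and `parse_plan` are assumptions made for this example; Sera's actual prompts and data structures are not given here.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class PlanStep:
    index: int          # 1-based position in the plan
    description: str    # natural-language instruction for the executor


# Hypothetical prompt template (an assumption, not Sera's real prompt).
PLANNER_PROMPT = (
    "Break the user's goal into an ordered list of browsing steps.\n"
    'Reply with JSON only: {{"steps": ["...", "..."]}}\n'
    "Goal: {goal}"
)


def build_planner_prompt(goal: str) -> str:
    """Fill the user's goal into the planner prompt."""
    return PLANNER_PROMPT.format(goal=goal)


def parse_plan(llm_reply: str) -> List[PlanStep]:
    """Turn the planner's JSON reply into ordered PlanStep objects."""
    data = json.loads(llm_reply)
    steps = data.get("steps")
    if not isinstance(steps, list) or not steps:
        raise ValueError("planner reply has no non-empty 'steps' list")
    return [PlanStep(i, str(s).strip()) for i, s in enumerate(steps, start=1)]
```

Validating the reply before execution means a malformed plan fails early, before any browser action is taken.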
- Develop an AI agent that converts natural-language goals into structured multi-step plans using prompt engineering.
- Build a second AI agent that translates each plan step into precise, DOM-aware browser actions.
- Integrate Playwright to fetch DOM snapshots, execute browser actions (click, type, scroll), and enable closed-loop navigation.
- Design a simple UI for users to input queries, track agent progress, and view final outputs.
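The closed-loop navigation in the items above (snapshot the DOM, pick an action, execute it, repeat) might look roughly like the following. This is a minimal sketch, not Sera's implementation: the `Action` shape, `execute`, `run_step`, and the `choose_action` callback (standing in for the second agent) are hypothetical names. The `page` argument is assumed to be a Playwright sync-API `Page`, of which only `content()`, `click()`, `fill()`, and `mouse.wheel()` are used.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    kind: str                       # "click", "type", "scroll", or "done"
    selector: Optional[str] = None  # CSS selector for click/type
    text: Optional[str] = None      # text to enter for "type"
    delta_y: int = 600              # pixels to scroll for "scroll"


def execute(page, action: Action) -> None:
    """Apply one action to a Playwright-style page object."""
    if action.kind == "click":
        page.click(action.selector)
    elif action.kind == "type":
        page.fill(action.selector, action.text or "")
    elif action.kind == "scroll":
        page.mouse.wheel(0, action.delta_y)
    else:
        raise ValueError(f"unsupported action: {action.kind}")


def run_step(
    page,
    step: str,
    choose_action: Callable[[str, str], Action],  # (step, dom_html) -> Action
    max_actions: int = 10,
) -> int:
    """Repeat snapshot -> decide -> act until the agent reports 'done'.

    Returns the number of browser actions executed for this step.
    """
    for executed in range(max_actions):
        dom = page.content()                # fresh DOM snapshot each iteration
        action = choose_action(step, dom)   # hypothetical LLM-backed agent
        if action.kind == "done":
            return executed
        execute(page, action)
    raise RuntimeError(f"step not finished after {max_actions} actions: {step}")
```

With a real browser, `page` would come from Playwright's `sync_playwright()` context via `browser.new_page()`. Re-reading the DOM before every decision lets the agent react to layout changes caused by its previous action, and the `max_actions` cap keeps a confused agent from looping indefinitely.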
References
- https://arxiv.org/pdf/2401.13919
- https://arxiv.org/pdf/2407.13032
- https://arxiv.org/pdf/2410.19609
- https://www.youtube.com/watch?v=wGr5rz8WGCE
- Agentic Systems
- AI Agents
- DOM Parsing
- Playwright
- NLP
- LLM/VLM
https://drive.google.com/file/d/1Q4ttkGFsgW10r34Tw-wXoAV5tZF0WCv3/view?usp=drivesdk