An AI-assisted web scraper that uses large language models to extract data from web pages and turn it into structured, analysis-ready formats. It automates Chrome through Selenium and is operated from a Streamlit interface.
- 🤖 AI-Powered Data Extraction - Uses multiple LLMs for intelligent data parsing
- 🎯 Custom Field Selection - Define exactly what data you want to extract
- 📊 Multi-Format Export - Export to JSON, Excel, and Markdown
- ⚡ Real-Time Processing - Watch the scraping process in action
- 🎨 Modern UI/UX - Clean, responsive interface built with Streamlit
- 🔄 Progress Tracking - Live updates on scraping status
- Python 3.11+
- Google Chrome 132+
- pip (Python package manager)
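The Python requirement above can be checked before installing anything. The snippet below is a minimal sketch using only the standard library; the name `meets_python_requirement` is illustrative and is not part of this project.

```python
import sys

# Minimum interpreter version listed in the prerequisites above.
MIN_PYTHON = (3, 11)


def meets_python_requirement(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) is new enough."""
    if version is None:
        version = sys.version_info[:2]
    # Compare only (major, minor) so patch releases do not matter.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if meets_python_requirement():
        print("Python version OK")
    else:
        print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required")
```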
- Clone the repository:
git clone https://github.com/yourusername/intelligent-web-scraper.git
cd intelligent-web-scraper
- Install dependencies:
pip install -r requirements.txt
- Launch the application:
streamlit run streamlit_app.py
Example output format:
{
  "listings": [
    {
      "train_number": "12345",
      "train_name": "Express",
      "departure": "10:00 AM",
      "arrival": "06:30 PM",
      "duration": "8h 30m"
    }
  ]
}
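To illustrate how output in this shape could be post-processed, here is a small sketch that loads the example JSON and renders selected fields as a Markdown table. It uses only the standard library. The function `listings_to_markdown` and its `fields` parameter are hypothetical names chosen for this example; they are not taken from the project's code, and the tool's own exporters may work differently.

```python
import json

# Sample payload mirroring the example output above.
SAMPLE = """
{
  "listings": [
    {
      "train_number": "12345",
      "train_name": "Express",
      "departure": "10:00 AM",
      "arrival": "06:30 PM",
      "duration": "8h 30m"
    }
  ]
}
"""


def listings_to_markdown(payload, fields=None):
    """Render the "listings" records as a Markdown table.

    `fields` optionally restricts and orders the columns; by default the
    keys of the first record are used. Values are not escaped, so a "|"
    inside a value would break the table.
    """
    listings = payload.get("listings", [])
    if not listings:
        return ""
    if fields is None:
        fields = list(listings[0].keys())
    header = "| " + " | ".join(fields) + " |"
    divider = "| " + " | ".join("---" for _ in fields) + " |"
    rows = [
        "| " + " | ".join(str(item.get(f, "")) for f in fields) + " |"
        for item in listings
    ]
    return "\n".join([header, divider, *rows])


if __name__ == "__main__":
    data = json.loads(SAMPLE)
    print(listings_to_markdown(data, fields=["train_name", "departure", "arrival"]))
```

Passing a `fields` list mirrors the idea of custom field selection: only the named columns appear, in the order given.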
- Web Automation: Selenium WebDriver
- AI Models: OpenAI GPT-4, Google Gemini, Llama
- Frontend: Streamlit
- Data Processing: Pandas, BeautifulSoup4
- Export Formats: JSON, Excel, Markdown
- Browser Driver: ChromeDriver
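One common step in a stack like this is reducing a fetched page to its visible text before handing it to a model. The sketch below shows that idea under stated assumptions: it uses the standard library's `html.parser` as a stand-in for BeautifulSoup4 so that it runs without extra packages, and the names `VisibleTextExtractor` and `html_to_text` are hypothetical. This README does not describe how the project itself prepares pages, so treat this as an illustration rather than the actual implementation.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = "<html><body><h1>Trains</h1><p>Express <b>10:00 AM</b></p></body></html>"
    print(html_to_text(page))
```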
intelligent-web-scraper/
├── streamlit_app.py    # Main application interface
├── scraper.py          # Core scraping engine
├── assets.py           # Utility functions and constants
├── requirements.txt    # Project dependencies
├── output/             # Exported data directory
└── chromedriver/       # Chrome WebDriver files
We welcome contributions! Here's how you can help:
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
If you encounter any issues or have questions:
- Open an issue in the GitHub repository
- Contact the maintainer at [email protected]
- Selenium Documentation Team
- Streamlit Community
- ChromeDriver Development Team
- All our contributors and users
Made by Priyankesh