🌐 Web Scraper

A web scraping tool that pairs Selenium-driven page retrieval with LLM-based data extraction. A Streamlit interface lets you choose the fields you want, and the results are exported as structured JSON, Excel, or Markdown.

🌟 Key Features

🤖 AI-Powered Data Extraction - Uses multiple LLMs for intelligent data parsing
🎯 Custom Field Selection - Define exactly what data you want to extract
📊 Multi-Format Export - Export to JSON, Excel, and Markdown
⚡ Real-Time Processing - Watch the scraping process in action
🎨 Modern UI/UX - Clean, responsive interface built with Streamlit
🔄 Progress Tracking - Live updates on scraping status
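
To make the custom field selection idea concrete, here is a minimal sketch of how user-chosen field names could be turned into an extraction prompt, and how a model reply could be trimmed back to those fields. The function names `build_prompt` and `parse_listings` are hypothetical and are not taken from `scraper.py`; the `listings` key is assumed from the example output format in this README.

```python
import json


def build_prompt(fields: list[str]) -> str:
    """Build an instruction asking a model for one JSON object per listing."""
    field_list = ", ".join(f'"{name}"' for name in fields)
    return (
        "Extract every listing from the page text. "
        'Respond with JSON of the form {"listings": [...]}, '
        f"where each listing has exactly these keys: {field_list}."
    )


def parse_listings(raw: str, fields: list[str]) -> list[dict[str, str]]:
    """Parse a model reply and keep only the requested fields."""
    data = json.loads(raw)
    rows = []
    for item in data.get("listings", []):
        # Missing fields become empty strings so every row has the same shape.
        rows.append({name: str(item.get(name, "")) for name in fields})
    return rows


# Hypothetical field selection and a hand-written stand-in for a model reply.
fields = ["train_number", "departure"]
reply = '{"listings": [{"train_number": "12345", "departure": "10:00 AM", "extra": "x"}]}'

print(build_prompt(fields))
print(parse_listings(reply, fields))
```

Filtering the reply against the requested fields keeps unexpected keys out of the exported data, which is one plausible way to keep every row the same shape.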

🚀 Getting Started

Prerequisites

Python 3.11+
Google Chrome 132+
pip (Python package manager)

Quick Install

Clone the repository:

git clone https://github.com/yourusername/intelligent-web-scraper.git
cd intelligent-web-scraper

Install dependencies:

pip install -r requirements.txt

Launch the application:

streamlit run streamlit_app.py

📸 Usage Example

1. Select Target Website

2. Configure Scraping Parameters

3. View Results

4. Access Exported Data

Example output format:

{
  "listings": [
    {
      "train_number": "12345",
      "train_name": "Express",
      "departure": "10:00 AM",
      "arrival": "06:30 PM",
      "duration": "8h 30m"
    }
  ]
}
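
As an illustration of working with the exported data, the sketch below loads JSON in the format shown above and renders it as a Markdown table using only the standard library. The helper name `to_markdown_table` is hypothetical; the project's own export code may differ.

```python
import json


def to_markdown_table(rows: list[dict[str, str]]) -> str:
    """Render a list of flat dictionaries as a Markdown table."""
    if not rows:
        return ""
    # Column order follows the keys of the first row.
    headers = list(rows[0].keys())
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        cells = (str(row.get(header, "")) for header in headers)
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# The sample mirrors the example output format from this README.
sample = """
{
  "listings": [
    {
      "train_number": "12345",
      "train_name": "Express",
      "departure": "10:00 AM",
      "arrival": "06:30 PM",
      "duration": "8h 30m"
    }
  ]
}
"""

rows = json.loads(sample)["listings"]
table = to_markdown_table(rows)
print(table)
```

For the Excel format, the same `rows` list could likely be passed to `pandas.DataFrame(rows)` and written with `DataFrame.to_excel`, assuming an Excel writer such as openpyxl is installed.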

🛠️ Technology Stack

Web Automation: Selenium WebDriver
AI Models: OpenAI GPT-4, Google Gemini, Llama
Frontend: Streamlit
Data Processing: Pandas, BeautifulSoup4
Export Formats: JSON, Excel, Markdown
Browser Driver: ChromeDriver
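
A typical flow for a stack like this is to fetch the rendered page, reduce the HTML to visible text, and pass that text to a model. The sketch below shows only the middle step. It uses the standard library's `html.parser` as a stand-in for BeautifulSoup4 so that it runs without extra dependencies; the class and function names are hypothetical and do not come from this repository.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of script and style tags."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not inside a skipped element.
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# A small hand-written page standing in for HTML returned by the browser.
page = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Trains</h1><script>var x=1;</script>"
    "<p>12345 Express</p></body></html>"
)
print(html_to_text(page))
```

Stripping scripts and styles before calling a model reduces the amount of text sent, which generally lowers cost and noise.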

📁 Project Structure

intelligent-web-scraper/
├── streamlit_app.py    # Main application interface
├── scraper.py          # Core scraping engine
├── assets.py           # Utility functions and constants
├── requirements.txt    # Project dependencies
├── output/             # Exported data directory
└── chromedriver/       # Chrome WebDriver files

🤝 Contributing

We welcome contributions! Here's how you can help:

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

👥 Support

If you encounter any issues or have questions:

Open an issue in the GitHub repository
Contact the maintainer at [email protected]

🙏 Acknowledgments

Selenium Documentation Team
Streamlit Community
ChromeDriver Development Team
All our contributors and users

Made by Priyankesh
