# Automated Web Scraper with Instant Data Scraper Extension

This project automates batch processing of URLs using the Instant Data Scraper Chrome extension. It programmatically controls the extension to extract tabular data from multiple web pages, eliminating the need for manual extension activation and data downloading for each URL.

The script will:

- Read URLs from `queue.txt`
- Process each URL automatically
- Move processed URLs to `crawled.txt`
- Save extracted data as CSV files
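
A minimal sketch of this queue-processing loop, assuming the extension automation lives in a hypothetical `process_url` helper:

```python
def process_url(url: str) -> None:
    """Placeholder for the Selenium/pyautogui extension workflow."""
    print(f"processing {url}")

def run_queue(queue_path: str = "queue.txt", crawled_path: str = "crawled.txt") -> None:
    # Read the pending URLs, skipping blank lines.
    with open(queue_path) as f:
        urls = [line.strip() for line in f if line.strip()]

    for i, url in enumerate(urls):
        process_url(url)
        # Append the finished URL to crawled.txt ...
        with open(crawled_path, "a") as crawled:
            crawled.write(url + "\n")
        # ... and rewrite queue.txt with whatever is still pending.
        with open(queue_path, "w") as queue:
            queue.writelines(u + "\n" for u in urls[i + 1:])

if __name__ == "__main__":
    run_queue()
```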

## Prerequisites

### Required Python Packages

```bash
pip install selenium
pip install pyautogui
pip install pandas
```

## Important Note

The current implementation uses hardcoded screen coordinates for automating the extension interface. These coordinates are set for a specific screen resolution and will need to be adjusted for different displays:

- Chrome extension button: (1762, 94)
- Extension activate button: (1460, 302)
- Download CSV button: (1003, 153)
- Extension close button: (1883, 14)
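
As an illustration, the click sequence over these coordinates might look like the following (a minimal sketch; the fixed delays are assumptions and depend on page load time):

```python
import time
import pyautogui

# Coordinates from the list above; adjust them for your display.
CLICK_SEQUENCE = (
    (1762, 94),   # Chrome extension button
    (1460, 302),  # extension activate button
    (1003, 153),  # download CSV button
    (1883, 14),   # extension close button
)

def drive_extension(delay: float = 3.0) -> None:
    """Click through the Instant Data Scraper UI at fixed coordinates."""
    for x, y in CLICK_SEQUENCE:
        pyautogui.click(x, y)
        time.sleep(delay)  # crude wait for the UI to catch up
```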

## Usage

1. Update the paths in `main.py`:

   ```python
   chromedriver_path = 'path/to/your/chromedriver.exe'
   instant_data_scraper_path = 'path/to/extension.crx'
   url = "your_target_url"
   ```

2. Add your target URLs to `queue.txt`, one URL per line.

3. Run the scraper:

   ```bash
   python main.py
   ```
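
For reference, these paths feed into the Selenium driver setup; loading a packed `.crx` extension typically looks like this (a minimal sketch in Selenium 4 style, which may differ from the script's exact setup):

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chromedriver_path = 'path/to/your/chromedriver.exe'
instant_data_scraper_path = 'path/to/extension.crx'

options = Options()
options.add_extension(instant_data_scraper_path)  # install the .crx into the session

driver = webdriver.Chrome(service=Service(chromedriver_path), options=options)
driver.get("your_target_url")
```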

## Features

- Batch URL processing from `queue.txt`
- Automated extension workflow for each URL:
  - Extension activation
  - Data extraction
  - CSV download
  - Extension closure
- Tracking of processed URLs in `crawled.txt`
- CAPTCHA detection
- Logging system
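
CAPTCHA detection can be as simple as scanning the rendered page source for challenge markers; a sketch of that heuristic follows (an assumption for illustration, not necessarily how `main.py` implements it):

```python
import logging

# Markers chosen for illustration; real pages may need a different list.
CAPTCHA_MARKERS = ("captcha", "recaptcha", "are you a robot")

def looks_like_captcha(page_source: str) -> bool:
    """Heuristically detect a CAPTCHA challenge in the page HTML."""
    lowered = page_source.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)

# With a Selenium driver in hand:
# if looks_like_captcha(driver.page_source):
#     logging.warning("CAPTCHA detected; skipping this URL")
```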

## Limitations

1. Screen resolution dependency
   - The automation relies on specific screen coordinates
   - It will only work on displays matching the hard-coded resolution
2. Single-page scraping
   - Currently handles only single pages
   - Pagination support is implemented but commented out
3. Error handling
   - Basic error handling for CAPTCHA detection
   - Limited handling of extension failures

## Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Disclaimer

This tool is for educational purposes only. Make sure to comply with the target website's terms of service and robots.txt file before scraping. Always include appropriate delays between requests to avoid overwhelming servers.
