Scraper of Tripadvisor reviews, parametric by date and language. The script can scrape:
- URLs of Tripadvisor points of interest (POIs) matching a string query
- POI metadata
- POI reviews, going back to a given minimum date and written in a specified language
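The date and language limits can be pictured as a stopping rule over a stream of reviews. The sketch below is not taken from the scraper. It assumes reviews arrive sorted from newest to oldest and that each review is a dict with hypothetical `date` and `lang` keys; `reviews_until` is an illustrative name.

```python
from datetime import date
from typing import Dict, Iterable, Iterator


def reviews_until(reviews: Iterable[Dict], min_date: date, lang: str) -> Iterator[Dict]:
    """Yield reviews in `lang` until one older than `min_date` appears.

    Hypothetical helper: assumes `reviews` is ordered newest-first and each
    review is a dict with "date" (datetime.date) and "lang" (str) keys.
    """
    for review in reviews:
        # With newest-first ordering, every remaining review is older too.
        if review["date"] < min_date:
            break
        if review["lang"] == lang:
            yield review
```

Stopping at the first older review, rather than filtering every review afterwards, avoids loading pages that cannot contain matching reviews (under the newest-first assumption).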
Follow these steps to use the scraper:
- Download ChromeDriver from here.
- Install the Python packages listed in the requirements file using pip, conda, or virtualenv, for example: `conda create --name scraping python=3.6 --file requirements.txt`
Note: Python >= 3.6 is required.
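A quick check of both prerequisites before a run can catch setup problems early. This is a hypothetical helper, not part of the repository; it assumes the ChromeDriver executable is named `chromedriver` and is expected on `PATH` (the scraper may instead be configured with an explicit driver path).

```python
import shutil
import sys
from typing import List


def check_environment() -> List[str]:
    """Return a list of setup problems; an empty list means the setup looks OK."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python >= 3.6 is required")
    # Assumption: the driver binary is called "chromedriver" and lives on PATH.
    if shutil.which("chromedriver") is None:
        problems.append("chromedriver not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("setup problem:", problem)
```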
The scraper accepts five command-line parameters:
- `--i`: input file containing a list of Tripadvisor URLs, each pointing to the first page of a place's reviews.
- `--lang`: language code used to filter reviews. Note: only the "select all languages" click is implemented.
- `--N`: number of reviews to scrape.
- `--q`: search query used to scrape place URLs.
- `--place`: boolean flag; when set, the scraper collects place metadata instead of reviews.
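The five parameters map naturally onto `argparse`. This is a sketch, not the scraper's actual parser: all default values are assumptions, and `build_parser` is a hypothetical name. `--place` is parsed as an integer so that `--place 1` works as in the examples; `type=bool` would be a trap, because `bool("0")` is `True`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the CLI; defaults are assumptions."""
    parser = argparse.ArgumentParser(description="Tripadvisor review scraper")
    parser.add_argument("--i", default="urls.txt",
                        help="file with Tripadvisor URLs (first page of reviews)")
    parser.add_argument("--lang", default="en",
                        help="language code used to filter reviews")
    parser.add_argument("--N", type=int, default=100,
                        help="number of reviews to scrape")
    parser.add_argument("--q", default=None,
                        help="search query used to scrape place URLs")
    # Integer flag: 1 means "scrape place metadata instead of reviews".
    parser.add_argument("--place", type=int, default=0,
                        help="1 = scrape place metadata instead of reviews")
    return parser
```

For instance, `build_parser().parse_args(["--q", "amsterdam"])` yields a namespace with `q == "amsterdam"` and `place == 0`.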
Some examples:
- `python scraper.py --q amsterdam`: generates the `urls.txt` file with the top 30 POIs in Amsterdam.
- `python scraper.py --place 1`: generates a CSV file containing metadata for the places listed in `urls.txt`.
- `python scraper.py`: generates a CSV file containing reviews of the places listed in `urls.txt`.
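The last two commands consume the `urls.txt` file produced by the first. A minimal reader for that file might look like this; the one-URL-per-line format is an assumption, and `read_urls` is a hypothetical name.

```python
from pathlib import Path
from typing import List


def read_urls(path: str = "urls.txt") -> List[str]:
    """Read place URLs, assuming one URL per line; blank lines are skipped."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Strip surrounding whitespace so stray spaces do not break the URLs.
    return [line.strip() for line in lines if line.strip()]
```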
The `config.json` file sets the directory where the output CSV files are stored, as well as their filenames.
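A sketch of how such a config could be loaded and turned into output paths. The keys `output_dir`, `reviews_file` and `places_file` are hypothetical; check the repository's `config.json` for the actual names.

```python
import json
import os


def load_config(path: str = "config.json") -> dict:
    """Load the config and make sure the output directory exists.

    Hypothetical: assumes an "output_dir" key holding the CSV directory.
    """
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    os.makedirs(config["output_dir"], exist_ok=True)
    return config


def output_path(config: dict, key: str) -> str:
    """Join the output directory with the filename stored under `key`
    (e.g. the hypothetical "reviews_file" or "places_file")."""
    return os.path.join(config["output_dir"], config[key])
```

With `{"output_dir": "data", "reviews_file": "reviews.csv"}`, reviews would be written to `data/reviews.csv` on POSIX systems.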
License: GNU GPLv3