
Zomato Scraper #2567

Merged 3 commits on Aug 9, 2023

12 changes: 12 additions & 0 deletions Zomato Scraper/readme.md
@@ -0,0 +1,12 @@
# Infinite Scroll Web Scraping

This Python script uses Selenium and BeautifulSoup to scrape the Zomato website, whose listings load via infinite scroll. It navigates to the Zomato page that lists cafes in Ahmedabad, India, and extracts each cafe's name, link, rating, cuisine, and price.
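
To illustrate the extraction half of that pairing in isolation, here is a minimal sketch that parses a simplified, hypothetical cafe card with BeautifulSoup; Zomato's real markup uses generated class names, as `zomato.py` below shows:

```python
from bs4 import BeautifulSoup

# A simplified, hypothetical cafe card; Zomato's real markup differs.
html = """
<div>
  <a href="/ahmedabad/some-cafe">
    <h4>Some Cafe</h4>
    <div class="rating">4.2</div>
    <p>Cafe, Beverages</p>
    <p>Rs. 500 for two</p>
  </a>
</div>
"""

card = BeautifulSoup(html, "html.parser")
print(card.find("a", href=True)["href"])           # link
print(card.find("h4").text)                        # name
print(card.find("div", class_="rating").text)      # rating
print(card.find("p").text)                         # cuisine
print(card.find("p").find_next_sibling("p").text)  # price
```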

## Requirements

- Python 3 with the packages pinned in `requirements.txt` (`beautifulsoup4`, `selenium`), installable via `pip install -r requirements.txt`
- Google Chrome
- ChromeDriver matching your installed Chrome version, available on your `PATH`

## How It Works

The script scrolls down the Zomato cafe listing and prints each cafe's details as new results load, repeating until you interrupt it (e.g., with Ctrl + C).
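
A minimal sketch of that loop, assuming Chrome and ChromeDriver are set up as above (the real script's selectors do the per-cafe extraction where the placeholder comment sits):

```python
import time

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://zomato.com/ahmedabad/restaurants/cafes?category=2")

try:
    while True:
        # Trigger the infinite scroll, then give new cards time to render.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)
        soup = BeautifulSoup(driver.page_source, "html.parser")
        # ... extract and print cafe details from `soup` here ...
except KeyboardInterrupt:
    # Ctrl + C lands here; close the browser before exiting.
    driver.quit()
```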
2 changes: 2 additions & 0 deletions Zomato Scraper/requirements.txt
@@ -0,0 +1,2 @@
beautifulsoup4==4.10.0
selenium==3.141.0
34 changes: 34 additions & 0 deletions Zomato Scraper/zomato.py
@@ -0,0 +1,34 @@
import re
import time

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
url = "https://zomato.com/ahmedabad/restaurants/cafes?category=2"
driver.get(url)

seen = set()  # links already printed, so each cafe is reported only once

while True:
    # Trigger Zomato's infinite scroll, then give new cards time to render.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)

    # Re-parse the page source on every pass so newly loaded cards are included.
    soup = BeautifulSoup(driver.page_source, "html.parser")
    container = soup.find("div", {"id": "root"})
    if container is None:
        continue

    first = True
    for items in container.find_all("div", class_=re.compile("sc-1mo3ldo-0 sc-")):
        # The first matching block is a header, not a cafe card; skip it.
        if first:
            first = False
            continue
        first_child = items.find("div")
        if first_child is None:
            continue
        # Each direct child div of the wrapper holds one cafe card.
        for item in first_child.find_all("div", recursive=False):
            anchor = item.find("a", href=True)
            if anchor is None or anchor["href"] in seen:
                continue
            seen.add(anchor["href"])
            print(anchor["href"])  # link
            name = item.find("h4")
            if name:
                print(name.text)  # name
            rating = item.find("div", {"class": "sc-1q7bklc-1 cILgox"})
            if rating:
                print(rating.text)  # rating
            cuisine = item.find("p")
            if cuisine:
                print(cuisine.text)  # cuisine
                rate = cuisine.find_next_sibling("p")
                if rate:
                    print(rate.text)  # price