Commit c6733a0

committed Oct 17, 2020
adding readme.md and requirements.txt
1 parent d869d1d commit c6733a0

4 files changed: +21 -1 lines changed


‎quotes/quotes/settings.py

+9
@@ -12,9 +12,18 @@
 SPIDER_MODULES = ['quotes.spiders']
 NEWSPIDER_MODULE = 'quotes.spiders'
 
+# PROXY_POOL_ENABLED = True
 
 # Crawl responsibly by identifying yourself (and your website) on the user-agent
 #USER_AGENT = 'quotes (+http://www.yourdomain.com)'
+DOWNLOADER_MIDDLEWARES = {
+    #The below two lines are for user agents
+    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
+    'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
+    # Enable the below line to use proxies
+    # 'scrapy_proxy_pool.middlewares.ProxyPoolMiddleware': 610,
+    # 'scrapy_proxy_pool.middlewares.BanDetectionMiddleware': 620,
+}
 
 # Obey robots.txt rules
 ROBOTSTXT_OBEY = True
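
A note on how this override takes effect: Scrapy merges the project's `DOWNLOADER_MIDDLEWARES` over its built-in defaults, drops entries set to `None`, and orders the rest by their number. The sketch below mimics that behavior with plain dictionaries so it runs without Scrapy installed. `effective_middlewares`, `BASE`, and `PROJECT` are hypothetical names, and `BASE` is only an illustrative two-entry subset of the defaults, not Scrapy's full table.

```python
# Illustrative subset of Scrapy's built-in downloader middleware defaults
# (assumption: only two entries are listed here to keep the sketch short).
BASE = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 500,
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": 550,
}

# The active (uncommented) entries this commit adds to settings.py.
PROJECT = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    "scrapy_user_agents.middlewares.RandomUserAgentMiddleware": 400,
}


def effective_middlewares(base, project):
    """Return enabled middleware paths, lowest order number first."""
    merged = {**base, **project}  # project entries win over defaults
    enabled = {path: order for path, order in merged.items() if order is not None}
    return sorted(enabled, key=enabled.get)


ORDER = effective_middlewares(BASE, PROJECT)
for path in ORDER:
    print(path)
```

In this sketch the stock `UserAgentMiddleware` disappears from the list and `RandomUserAgentMiddleware` takes its place ahead of the retry middleware, which appears to be the intent of the two active lines in the hunk.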

quotes/quotes/spiders/QuotesScrapper.py → quotes/quotes/spiders/QuotesScraper.py

+1 -1
@@ -10,7 +10,7 @@ class QuotesScraper(scrapy.Spider):
 
     def _parse(self, response, **kwargs):
         item = QuotesItem()
-        for quote in response.css(".quote"):
+        for quote in response.css(".quote")[:2]:
             title = quote.css(".quoteText::text").extract_first()
             author = quote.css(".authorOrTitle::text").extract_first()
             item["title"] = title

‎readme.md

+8
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
## **QuotesScrapy**
2+
This scraper is based on the scrapy framework with pagination feature. It uses fake user agents to bypass the security.
3+
4+
Steps to run the projects:-
5+
1. Activate virtual env with `. env/bin/activate`
6+
2. Install requirements using `pip install -r requirements.txt`
7+
3. Run the following commands:-
8+
<br>`scrapy crawl QuotesScraper`

‎requirements.txt

+3
@@ -0,0 +1,3 @@
+Scrapy==2.4.0
+scrapy-proxy-pool==0.1.9
+scrapy-user-agents==0.1.1
