Skip to content

A web scraper to retrieve data from gyakorikerdesek.hu

Notifications You must be signed in to change notification settings

DSuveges/gyik_scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

75 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GYK scraper

Codacy Badge

This code scrapes data from gyakorikerdesek.hu. The data is not too complex, but complex enough to make sense to load into an sqlite database. The loaded data then can be used to do some analytics.

Finally the code is getting into shape. User can decide to fetch all questions for a category, a range of a category by defining a start and end page or fetching a single question based on a provided URL.

Usage

python gyik_scraper.py --database <str> \
            --category <str> \
            --subCategory <str> \
            --startPage <int> \
            --endPage <int> \
            --directQuestion <str> \
            --logFile <str>

where

  • database: mandatory option, sqlite database file. If not exists, the script creates.
  • category: mandatory option. Main GyIK category (eg. tudomanyok).
  • subCategory: optionally the subcategory within the category can be specified to narow down the scope.
  • startPage: optional. First page of questions to load. Default: 1.
  • lastpage: optional. The last page of questions to load. Default: last page of questions.
  • directQuestion: optional. If present only this question will be downloaded. Mostly for testing purposes.
  • logFile: optional filename for the logs. Default filename: scraper.log

The start page has to be lower then last page. To retrieve all questions for a category these paremeters needs to be omitted.

SQLite schema

db schema

About

A web scraper to retrieve data from gyakorikerdesek.hu

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages