Web Crawling UI and HTTP API, based on Scrapy and Tornado

rmax-archive/arachnado

This branch is 4 commits behind TeamHG-Memex/arachnado:master.

Arachnado

Arachnado is a tool to crawl a specific website. It provides a Tornado-based HTTP API and a web UI for a Scrapy-based crawler.

License is MIT.

Install

Arachnado requires Python 2.7 or Python 3.5. To install Arachnado use pip:

pip install arachnado

Run

To start Arachnado, run the arachnado command:

arachnado

and then visit http://0.0.0.0:8888 (or whatever URL is configured).
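
When Arachnado is started from a script, it can help to confirm that the web UI is answering before opening it. The following is a minimal sketch using only the Python standard library; the helper name `is_up` is hypothetical and not part of Arachnado, and the URL in the usage line assumes the default port 8888 on the local machine.

```python
import urllib.error
import urllib.request


def is_up(url, timeout=2.0):
    """Return True if an HTTP server answers at `url` with any status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a 2xx status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar errors.
        return False
```

For example, `is_up("http://localhost:8888")` should return True once the server is listening, assuming the default address has not been changed in the configuration.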

To see the available command-line options, use:

arachnado --help

Arachnado can be configured using a config file. Put it in one of the common locations ('/etc/arachnado.conf', '~/.config/arachnado.conf' or '~/.arachnado.conf') or pass the file name as an argument when starting the server:

arachnado --config ./my-config.conf

For the available options, see https://github.com/TeamHG-Memex/arachnado/blob/master/arachnado/config/defaults.conf.
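
To see which config files are actually present on a machine, a small helper like the one below can be used. It is only a sketch and not Arachnado's own lookup code: the function name `find_config_files` is hypothetical, and the order in which the files are returned (common locations first, then the explicitly passed file) is an assumption, since the README does not say how multiple files are combined.

```python
import os

# Locations named in the README, in the order they are listed there.
# Whether Arachnado reads them in this order is an assumption.
COMMON_LOCATIONS = (
    "/etc/arachnado.conf",
    "~/.config/arachnado.conf",
    "~/.arachnado.conf",
)


def find_config_files(explicit=None, locations=COMMON_LOCATIONS):
    """Return the config files that exist, common locations first.

    `explicit` mirrors a path passed via `--config`; it is appended
    last so a caller can treat it as the most specific file.
    """
    candidates = [os.path.expanduser(path) for path in locations]
    if explicit:
        candidates.append(explicit)
    return [path for path in candidates if os.path.isfile(path)]
```

Calling `find_config_files("./my-config.conf")` returns a list of the paths that exist, which can be printed to check that the intended file is being picked up.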

Tests

To run the tests, make sure tox is installed, then run the tox command from the source root.

Development

Building Arachnado's static assets requires node.js and npm. Install all JavaScript requirements by running the following command from the repo root:

npm install

then rebuild static files (we use Webpack):

npm run build

or auto-build static files on each change during development:

npm run watch
