forked from ntblk/block-crawler
Commit d5df718 (1 parent: 18b605e), showing 2 changed files with 48 additions and 0 deletions.
# Installing and running block-crawler

Pre-requisites
--------------

You need Node.js (version >= 8) and the npm package manager.

Installing dependencies
-----------------------

    npm install

Running
-------

Simplest run:

    node index.js http://starting.point.example/

(The Node.js executable may be named "nodejs" on your system.)

Running with a collector:

    node index.js --collector https://collector.example/ http://starting.point.example/

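The collector must accept the results, which the crawler sends as HTTP
POST requests, and store or process them. A very limited collector,
written in Python (3) as a WSGI application, is:

    def store(environ, start_response):
        # WSGI callables receive (environ, start_response), in that order.
        # Read only the announced body length; reading further may block.
        length = int(environ.get('CONTENT_LENGTH') or 0)
        data = environ['wsgi.input'].read(length)
        # The body is bytes, so append to the log in binary mode.
        with open("/var/storage/store.log", 'ab') as fileo:
            fileo.write(data)
        status = '200 OK'
        # WSGI response bodies must be bytes as well.
        output = ("Stored %i bytes\n" % len(data)).encode('ascii')
        response_headers = [('Content-Type', 'text/plain'),
                            ('Content-Length', str(len(output)))]
        start_response(status, response_headers)
        return [output]
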
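To try a collector like the one above without deploying it, something along the following lines may help. It is only a sketch using the standard library's `wsgiref`: the names `LOG_PATH`, `call_locally` and `serve`, the temporary log location and port 8000 are all assumptions made for illustration, not part of block-crawler.

```python
import io
import os
import tempfile
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults

# Hypothetical log location for local testing (not the /var/storage path).
LOG_PATH = os.path.join(tempfile.gettempdir(), "block-crawler-store.log")


def store(environ, start_response):
    """Append the POSTed body to LOG_PATH and report how many bytes were stored."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    data = environ["wsgi.input"].read(length)
    with open(LOG_PATH, "ab") as fileo:
        fileo.write(data)
    output = ("Stored %i bytes\n" % len(data)).encode("ascii")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(output)))])
    return [output]


def call_locally(app, body):
    """Invoke a WSGI app in-process with a fake POST; return (status, response)."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update({"REQUEST_METHOD": "POST",
                    "CONTENT_LENGTH": str(len(body)),
                    "wsgi.input": io.BytesIO(body)})
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    response = b"".join(app(environ, start_response))
    return captured["status"], response


def serve(port=8000):
    """Serve the collector on localhost; the port is an arbitrary choice."""
    with make_server("127.0.0.1", port, store) as httpd:
        httpd.serve_forever()
```

`call_locally(store, b'{"status":451}')` exercises the collector without opening a socket. With `serve()` running, the crawler could presumably be pointed at it using `--collector http://127.0.0.1:8000/`; whether a plain-HTTP local collector is accepted is not stated here.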
The results (only the HTTP errors) will appear in JSON format in
/var/storage/store.log, for instance:

    {"date":"2017-11-11T12:10:07.314Z","creator":"block-crawler","version":"0.1","url":"http://httpstat.us/451","status":451,"statusText":"Unavailable For Legal Reasons"}
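Records like this one can be post-processed with a few lines of Python. The sketch below assumes one JSON object per line, which this document does not guarantee (the collector stores the POST body verbatim), and the helper names `parse_results` and `count_by_status` are made up for illustration.

```python
import json
from collections import Counter


def parse_results(lines):
    """Yield one dict per non-empty line, assuming one JSON object per line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_by_status(results):
    """Count results per HTTP status code."""
    return Counter(result["status"] for result in results)


# The sample record shown above, split only for readability.
sample = ('{"date":"2017-11-11T12:10:07.314Z","creator":"block-crawler",'
          '"version":"0.1","url":"http://httpstat.us/451","status":451,'
          '"statusText":"Unavailable For Legal Reasons"}')

results = list(parse_results([sample, ""]))
counts = count_by_status(results)
```

For a real log, `parse_results(open("/var/storage/store.log"))` would be the obvious call, provided the one-record-per-line assumption holds.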