A Node.js-based server that crawls websites on the Permaweb via ArNS names and publishes the results as Parquet files on Arweave.
- Install all dependencies: npm i
- Create a wallet.json with an Arweave JWK wallet. The wallet needs $AR for uploads larger than 100KB; crawling only a few pages should work without tokens.
- Ensure port 3000 is available and start the crawler: npm run dev
- Open http://localhost:3000/app/ in a browser to start a crawl.
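
Before starting the crawler, it can help to confirm that wallet.json has the shape of an RSA private key in JWK format, which is what Arweave key files use. The following is a minimal sketch, not part of the crawler itself; the function name looksLikeArweaveJwk is hypothetical, and the check only inspects field presence, not key validity.

```typescript
// Fields of an RSA private key in JWK format, as found in Arweave key files.
const REQUIRED_JWK_FIELDS = ["n", "e", "d", "p", "q", "dp", "dq", "qi"] as const;

// Hypothetical helper: returns true when the parsed JSON value has kty "RSA"
// and every required field is a non-empty string. It does not verify the key.
function looksLikeArweaveJwk(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const jwk = value as Record<string, unknown>;
  if (jwk.kty !== "RSA") return false;
  return REQUIRED_JWK_FIELDS.every(
    (field) => typeof jwk[field] === "string" && jwk[field] !== ""
  );
}

// Usage in Node.js (assumes the default wallet path ./wallet.json):
//   import { readFileSync } from "node:fs";
//   const parsed = JSON.parse(readFileSync("./wallet.json", "utf8"));
//   if (!looksLikeArweaveJwk(parsed)) throw new Error("wallet.json is not a JWK wallet");
```

A check like this only catches obvious mistakes, such as pointing the crawler at the wrong file; it says nothing about whether the wallet holds any $AR.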
- Start a crawl by entering an ArNS name (e.g., docs).
- Download the Parquet files from Arweave.
- Build the container image: npm run docker:build
- Run the container image: npm run docker:start
The crawler uses environment variables for configuration.
- LOG_LEVEL: The detail level of the logs. Takes debug, info, warn, or error. Default is info.
- PORT: The port of the webserver. Takes a number. Default is 3000.
- WALLET_PATH: The path to the Arweave JWK wallet used for Parquet file uploads. Takes a string. Default is ./wallet.json.
- FALLBACK_GATEWAY: The gateway used to download an HTML page when a network gateway fails. Takes a string. Default is permagate.io.
- MAX_TASKS: The number of tasks (finished or not) to keep around. Takes a number. Default is 100.
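
The variables and defaults above could be resolved roughly as in the sketch below. This is an illustration under stated assumptions, not the crawler's actual loader: the names CrawlerConfig, loadConfig, and parseNumber are hypothetical, and rejecting non-integer or negative numbers is an assumption about how strict the parsing should be.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

// Hypothetical config shape mirroring the documented variables.
interface CrawlerConfig {
  logLevel: LogLevel;
  port: number;
  walletPath: string;
  fallbackGateway: string;
  maxTasks: number;
}

const LOG_LEVELS: readonly LogLevel[] = ["debug", "info", "warn", "error"];

// Returns the fallback when the variable is unset or empty.
// Assumption: values must be non-negative integers.
function parseNumber(name: string, raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

// Builds the config from an environment-like object, applying the documented defaults.
function loadConfig(env: Record<string, string | undefined>): CrawlerConfig {
  const logLevel = env.LOG_LEVEL ?? "info";
  if (!LOG_LEVELS.includes(logLevel as LogLevel)) {
    throw new Error(`LOG_LEVEL must be one of ${LOG_LEVELS.join(", ")}, got "${logLevel}"`);
  }
  return {
    logLevel: logLevel as LogLevel,
    port: parseNumber("PORT", env.PORT, 3000),
    walletPath: env.WALLET_PATH ?? "./wallet.json",
    fallbackGateway: env.FALLBACK_GATEWAY ?? "permagate.io",
    maxTasks: parseNumber("MAX_TASKS", env.MAX_TASKS, 100),
  };
}

// Usage in Node.js: const config = loadConfig(process.env);
```

Passing the environment in as an argument, rather than reading process.env inside the function, keeps the sketch easy to test with plain objects.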
- GET /tasks/: Returns a list of tasks the crawler has handled or will handle.
- POST /tasks/: Creates a new task. Takes a JSON object with the task config options.
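
The two endpoints above could be called from a script along the lines of the sketch below. Several things here are assumptions: the base URL uses the default port on localhost, the helper names are hypothetical, and the task config options are not listed in this README, so the arnsName field in the usage comment is a placeholder rather than a documented option. The response format is also not specified, so the sketch returns the raw response text.

```typescript
// Hypothetical shape: the README does not list the task config options.
type TaskConfig = Record<string, unknown>;

interface PreparedRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

// Joins the base URL and the /tasks/ path without doubling slashes.
function tasksUrl(baseUrl: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/tasks/`;
}

// Prepares GET /tasks/, which returns the list of tasks.
function listTasksRequest(baseUrl: string): PreparedRequest {
  return { url: tasksUrl(baseUrl), method: "GET", headers: { Accept: "application/json" } };
}

// Prepares POST /tasks/ with the task config serialized as a JSON object.
function createTaskRequest(baseUrl: string, config: TaskConfig): PreparedRequest {
  return {
    url: tasksUrl(baseUrl),
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(config),
  };
}

// Sends a prepared request with the global fetch (Node.js 18+) and returns the body text.
async function send(request: PreparedRequest): Promise<string> {
  const response = await fetch(request.url, {
    method: request.method,
    headers: request.headers,
    body: request.body,
  });
  if (!response.ok) {
    throw new Error(`${request.method} ${request.url} failed with status ${response.status}`);
  }
  return response.text();
}

// Usage (arnsName is a placeholder field, not a documented option):
//   await send(createTaskRequest("http://localhost:3000", { arnsName: "docs" }));
//   console.log(await send(listTasksRequest("http://localhost:3000")));
```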