NorthData Scraper

Docker-based service that scrapes northdata.de with Puppeteer and exposes the results through a REST API.

Features

Docker setup with memory limits
Puppeteer with stealth plugin
Express REST API
In-memory queue for sequential processing
Request blocking for api.rupt.dev
Human-like behavior with slow typing
Debug mode with visible browser window
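
The request-blocking feature can be pictured as a small URL check that a request handler consults before letting a request through. The sketch below is an assumption about how such a check might look, not the repository's actual code; `BLOCKED_HOSTS` and `shouldBlockRequest` are hypothetical names.

```typescript
// Hypothetical block list; api.rupt.dev is the only host named in this README.
const BLOCKED_HOSTS: readonly string[] = ["api.rupt.dev"];

// Returns true when the URL's hostname equals a blocked host
// or is a subdomain of one.
function shouldBlockRequest(rawUrl: string): boolean {
  let hostname: string;
  try {
    hostname = new URL(rawUrl).hostname;
  } catch {
    // Not a parseable absolute URL; this sketch lets it through (assumption).
    return false;
  }
  return BLOCKED_HOSTS.some(
    (host) => hostname === host || hostname.endsWith("." + host)
  );
}
```

With Puppeteer, a check like this would typically run inside a `page.on('request', ...)` handler after `page.setRequestInterception(true)`, calling `request.abort()` for blocked URLs and `request.continue()` otherwise.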

Setup

Clone the repository and navigate to the directory
Create a .env file from .env.example
Add your northdata.de credentials to .env

Running

With Docker:

docker-compose up --build

For development:

npm install
npm run dev

API Endpoints

Search

POST /search
Content-Type: application/json
Body: {"query": "Company Name"}

Returns HTML content of search results.
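
A client call to this endpoint might be assembled as follows. This is a minimal sketch; `buildSearchRequest` is a hypothetical helper, and the base URL of a local instance on the default port is an assumption.

```typescript
// URL plus fetch options for one POST /search call.
interface SearchRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the request documented above: a JSON body with a single "query" field.
// baseUrl is an assumption, e.g. "http://localhost:3000" for a local instance.
function buildSearchRequest(baseUrl: string, query: string): SearchRequest {
  return {
    url: new URL("/search", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ query }),
    },
  };
}

// Usage (needs a running instance):
// const { url, init } = buildSearchRequest("http://localhost:3000", "Company Name");
// const html = await (await fetch(url, init)).text();
```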

Suggestions

GET /suggest?query=Company

Returns JSON suggestions from northdata.de's suggestion API.
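
Company names often contain spaces, umlauts, or ampersands, so the query should be URL-encoded. A small sketch, where `buildSuggestUrl` is a hypothetical helper and the base URL is an assumption:

```typescript
// Builds the GET /suggest URL and lets URLSearchParams handle the encoding.
function buildSuggestUrl(baseUrl: string, query: string): string {
  const url = new URL("/suggest", baseUrl);
  url.searchParams.set("query", query);
  return url.toString();
}

// Usage (needs a running instance):
// const res = await fetch(buildSuggestUrl("http://localhost:3000", "Müller & Söhne"));
// const suggestions = await res.json();
```

The same approach applies to the `url` parameter of the page endpoint, whose value is itself a full URL.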

Page Content

GET /page?url=https://www.northdata.de/...

Returns cleaned HTML content of a specific page, with:

Only the main content section
No JavaScript, CSS, links, images, or non-informational elements
Minimal whitespace to reduce token count for downstream AI processing
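
The cleanup described above could look roughly like the following. This is a simplified, regex-based sketch under stated assumptions, not the service's actual implementation, which more likely works on the parsed DOM. `cleanHtml` is a hypothetical name, "links" is read here as both `<link>` tags and hyperlinks, and selecting the main content section is left out.

```typescript
// Strips non-informational markup and collapses whitespace.
function cleanHtml(html: string): string {
  return html
    // Drop script and style elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, "")
    // Drop images and <link> tags.
    .replace(/<(img|link)\b[^>]*>/gi, "")
    // Unwrap hyperlinks but keep their text.
    .replace(/<\/?a\b[^>]*>/gi, "")
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Collapse whitespace runs, then remove whitespace between tags.
    .replace(/\s+/g, " ")
    .replace(/>\s+</g, "><")
    .trim();
}
```

Regex-based stripping is fragile on malformed markup; it is used here only to keep the example dependency-free.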

Health Check

GET /health

Returns status of the service and queue information.
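
The queue information presumably comes from the in-memory queue listed under Features. The sketch below shows one way to run tasks strictly one at a time while exposing a counter for a health response. `SequentialQueue`, `healthSnapshot`, and the response field names are hypothetical; the README does not specify the response shape.

```typescript
// Runs async tasks one at a time, in submission order.
class SequentialQueue {
  private tail: Promise<unknown> = Promise.resolve();
  private pendingCount = 0;

  // Number of tasks that are waiting or currently running.
  get pending(): number {
    return this.pendingCount;
  }

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    this.pendingCount++;
    // Start this task only after the previous one has settled.
    const result = this.tail.then(() => task());
    // The tail never rejects, so one failed task does not stall the queue.
    this.tail = result.then(
      () => { this.pendingCount--; },
      () => { this.pendingCount--; }
    );
    return result;
  }
}

// Hypothetical payload for GET /health.
function healthSnapshot(queue: SequentialQueue): { status: string; queue: { pending: number } } {
  return { status: "ok", queue: { pending: queue.pending } };
}
```

Chaining on a single promise keeps the browser session busy with only one scrape at a time, which matches the sequential processing the feature list describes.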

Configuration

Key environment variables:

PORT: Server port (default: 3000)
NORTHDATA_USERNAME: northdata.de username
NORTHDATA_PASSWORD: northdata.de password
BROWSER_HEADLESS: Set to 'false' for debug mode
TYPING_DELAY_MIN / TYPING_DELAY_MAX: Minimum and maximum keystroke delay in ms
WAIT_FOR_NETWORK_IDLE: Wait for network idle after navigation
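
As an illustration of how these variables might be consumed, the sketch below parses them into a typed object and derives a random keystroke delay. Only the `PORT` default of 3000 comes from this README; the other defaults, the `"true"` convention for `WAIT_FOR_NETWORK_IDLE`, and all function and field names are assumptions.

```typescript
interface ScraperConfig {
  port: number;
  username: string;
  password: string;
  headless: boolean;
  typingDelayMin: number;
  typingDelayMax: number;
  waitForNetworkIdle: boolean;
}

// Reads the variables from a key/value map such as process.env.
function loadConfig(env: Record<string, string | undefined>): ScraperConfig {
  const num = (value: string | undefined, fallback: number): number => {
    const parsed = Number(value);
    return value !== undefined && value !== "" && Number.isFinite(parsed) ? parsed : fallback;
  };
  return {
    port: num(env.PORT, 3000),
    username: env.NORTHDATA_USERNAME ?? "",
    password: env.NORTHDATA_PASSWORD ?? "",
    // Headless unless explicitly set to 'false' (debug mode).
    headless: env.BROWSER_HEADLESS !== "false",
    typingDelayMin: num(env.TYPING_DELAY_MIN, 50), // assumed default
    typingDelayMax: num(env.TYPING_DELAY_MAX, 150), // assumed default
    waitForNetworkIdle: env.WAIT_FOR_NETWORK_IDLE === "true", // assumed convention
  };
}

// Picks a per-keystroke delay in [min, max] ms; `random` is injectable for tests.
function randomTypingDelay(config: ScraperConfig, random: () => number = Math.random): number {
  const lo = Math.min(config.typingDelayMin, config.typingDelayMax);
  const hi = Math.max(config.typingDelayMin, config.typingDelayMax);
  return Math.round(lo + random() * (hi - lo));
}
```

Drawing a fresh delay for every keystroke, rather than using one fixed value, is one way to approximate the slow, human-like typing mentioned under Features.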

Debug Mode

Set BROWSER_HEADLESS=false in .env to see browser interactions.

For Docker debugging (Linux only):

docker-compose -f docker-compose.yml -f docker-compose.debug.yml up
