This repository is for the Site Scanning engine itself, the codebase that actually runs the scans and generates data. This is the new base scanner repository, which uses Headless Chrome, powered by Puppeteer, for scanning.
For more detailed documentation about the Site Scanning program, including who it's for, what it does, and its long-term goals, please visit the Site Scanning program website, especially the Technical Details page.
The project's issue tracker and other relevant repositories and links can be found here.
Development Requirements:
- git
- nodejs
- nvm (see .nvmrc for the current node version)
- docker
- docker-compose
- Cloud Foundry CLI (aka `cf`)
- redis-cli (optional)
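A quick way to confirm the tools above are installed is a small shell loop over the expected binaries. Note that nvm is typically a shell function rather than a binary, so it may report as "not found" in a non-interactive shell even when it is set up correctly:

```shell
# Print the version of each required tool, or note that it's missing.
for tool in git node nvm docker docker-compose cf redis-cli; do
  if command -v "$tool" >/dev/null 2>&1; then
    printf '%s: %s\n' "$tool" "$("$tool" --version 2>/dev/null | head -n 1)"
  else
    printf '%s: not found\n' "$tool"
  fi
done
```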
First clone the repository:
```
git clone https://github.com/GSA/site-scanning-engine/
```

From the project root, run:

```
nvm use
```

This will install the correct Node version for the project. Then run:

```
npm i
```

This will install all production and development Node dependencies.
The project uses a dotenv (.env) file for local development credentials.
Note that this file is not version-controlled and should only be used for
local development.
Before starting Docker, create a .env file in the project root and add
the following values, replacing `<add_a_key_here>` with local passwords
that are at least 8 characters long.
Note: this is only for local development and has no impact on the Cloud.gov configuration.
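As a convenience, values for the `<add_a_key_here>` placeholders can be generated with `openssl rand`; any sufficiently long random string works, this is just one way to produce one:

```shell
# Emit a random 32-character hex string, comfortably over the 8-character minimum.
openssl rand -hex 16
```

Run it once per placeholder so each credential gets a distinct value.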
```
# postgres configuration
DATABASE_HOST=localhost
DATABASE_PORT=5432
POSTGRES_USER=postgres
POSTGRES_PASSWORD=<add_a_key_here>

# redis configuration
QUEUE_HOST=localhost
QUEUE_PORT=6379

# Minio config -- Minio is an S3-API-compliant object store
MINIO_ACCESS_KEY=<add_a_key_here>
MINIO_SECRET_KEY=<add_a_key_here>
AWS_ACCESS_KEY_ID=<add_a_key_here>
AWS_SECRET_ACCESS_KEY=<add_a_key_here>
S3_HOSTNAME=localhost
S3_PORT=9000
S3_BUCKET_NAME=site-scanning-snapshot

# Sets the development environment name to dev
NODE_ENV=dev
```

From the project root, run:
```
docker-compose up --build -d
```

This will build (`--build`) all of the Docker containers and network interfaces listed in the docker-compose.yml file and start them running in the background (`-d`).
`docker-compose down` will stop and remove all containers and network interfaces.
Running `docker-compose up --build -d` again will rebuild all of the containers. This is useful if you need to wipe data from the database, for instance.
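One caveat worth noting: named Docker volumes can outlive `docker-compose down`, so Postgres data may persist across rebuilds. A guarded sketch of a full reset, where the `-v` flag removes the volumes as well:

```shell
# Tear everything down, including volumes (this wipes the local database),
# then rebuild from scratch. The guard makes this a no-op without Docker.
if command -v docker-compose >/dev/null 2>&1 && [ -f docker-compose.yml ]; then
  docker-compose down -v
  docker-compose up --build -d
else
  echo "docker-compose or docker-compose.yml not found; skipping"
fi
```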
If you encounter any issues starting the containers with
docker-compose, specifically OOM errors (or Exit 137),
try increasing the resources in your Docker preferences.
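To confirm that a container actually died from memory pressure, `docker ps -a` shows exit codes (137 means the process was killed, typically by the OOM killer) and `docker stats` shows live memory use. A guarded sketch:

```shell
# List containers with their status (look for "Exited (137)"), then take a
# one-shot memory snapshot. Skips cleanly where Docker isn't available.
if command -v docker >/dev/null 2>&1; then
  docker ps -a --format 'table {{.Names}}\t{{.Status}}' 2>/dev/null \
    || echo "docker daemon not running"
  docker stats --no-stream 2>/dev/null || true
else
  echo "docker not installed; skipping"
fi
```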
To build the application image, go to the project root and run:

```
docker build -f apps/scan-engine/Dockerfile .
```

To build the apps themselves, cd to the project root and run:

```
npm run build:all
```

This command will build the apps, compiling the TypeScript to JavaScript and doing any minification and optimization in the process. All of the app artifacts end up in the /dist directory. This is ultimately what gets pushed to Cloud Foundry.
Note that you can also build the apps separately:

```
npm run build:api
npm run build:scan-engine
npm run build:cli
```

Next, you can start the apps with the following command:
```
npm run start:all
```

The apps are started as follows: first the API starts, and then the Site Scanning worker follows. This is designed so that the API app runs any shared configuration against the database first.
Note that you can also start the apps individually:

```
npm run start:api
npm run start:scan-engine
```

The Site Scanning engine relies on a list of federal domains, and metadata about those domains, to operate. This list is ingested into the system from a public repository using the Ingest Service.
To run the ingest service, do the following:

```
npm run ingest -- --limit 200
```

The limit parameter is optional, but it can be useful to use a smaller subset of the total list for local development.
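After ingest (and once scans are enqueued), the optional `redis-cli` dependency is handy for peeking at the queue's keys. The key names depend on the queue library's conventions, so treat this purely as an inspection aid:

```shell
# Ping the redis instance from docker-compose and list up to 20 of its keys.
# Skips cleanly if redis-cli is absent or redis isn't running.
if command -v redis-cli >/dev/null 2>&1 \
    && redis-cli -h localhost -p 6379 ping >/dev/null 2>&1; then
  redis-cli -h localhost -p 6379 --scan | head -n 20
else
  echo "redis-cli not available or redis not running; skipping"
fi
```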
To enqueue scans for all sites in the website table:

```
npx nest start cli -- enqueue-scans
```

To scan a single site, which must be in the website table, run a command like:

```
npx nest start cli -- scan-site --url 18f.gov
```

NOTE: This is intended for testing scan behavior and doesn't currently write results to the database.
From the project root run:
```
npm run test:unit
```

This runs all unit tests.
First, log in to Cloud.gov using the CLI and choose the organization and space.
Then, you can use the `cloudgov-deploy.sh` script to build and deploy the apps.
You can optionally pass a different manifest file with `cloudgov-deploy.sh manifest-dev.yml`.
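A typical deploy session looks roughly like the following. The org and space names are placeholders, `api.fr.cloud.gov` is the standard cloud.gov API endpoint, and `cf login --sso` is interactive, so the login step only runs in an interactive shell:

```shell
# Placeholder org/space; override via environment or edit in place.
ORG="${CF_ORG:-my-org}"
SPACE="${CF_SPACE:-my-space}"

if command -v cf >/dev/null 2>&1 && [ -t 0 ]; then
  # Interactive: prompts for a one-time SSO passcode in the browser.
  cf login -a api.fr.cloud.gov --sso
  cf target -o "$ORG" -s "$SPACE"
  ./cloudgov-deploy.sh manifest-dev.yml
else
  echo "cf CLI not installed or shell not interactive; run these by hand"
fi
```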