datagov-catalog

catalog.data.gov is the public-facing dataset discovery and search application for Data.gov. It serves 515,000+ datasets from 120+ federal, state, municipal, university, and tribal publishing organizations.

This application is a custom Python/Flask web application that replaced the legacy CKAN-based catalog in 2025. It reads from the shared harvest database managed by datagov-harvester and uses OpenSearch for full-text search.

Production: catalog.data.gov
Legacy catalog (through fall 2026): catalog-old.data.gov

Architecture

Web app: Python/Flask, served via NGINX proxy on cloud.gov
Database: Shared Postgres instance managed by datagov-harvester (datagov-harvest-db service)
Search: OpenSearch (`((app_name))-opensearch` service on cloud.gov)
Storage: S3 for sitemaps and static assets
Monitoring: New Relic
Logging: Logstack (cloud.gov log drain)

The application does not write to the harvest database -- it reads only. All dataset metadata is written by datagov-harvester. The SQLAlchemy models are duplicated locally in app/models.py for isolation; interact with the shared DB through CatalogDBInterface (app/database/interface.py).
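
As a rough illustration of the Search component listed above, a full-text request body for OpenSearch might be assembled like this. This is a minimal sketch, not the application's actual query: the helper name `build_search_body`, the field names `title` and `description`, the boost, and the page size are all assumptions made for the example.

```python
import json


def build_search_body(query: str, page: int = 1, per_page: int = 20) -> dict:
    """Build a paginated full-text request body in OpenSearch query DSL."""
    return {
        # Offset-based pagination: skip the results of earlier pages.
        "from": (page - 1) * per_page,
        "size": per_page,
        "query": {
            "multi_match": {
                "query": query,
                # Hypothetical field names; "^2" boosts matches in the title.
                "fields": ["title^2", "description"],
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("air quality", page=2), indent=2))
```

The body is plain data, so it can be inspected or unit tested without a running OpenSearch instance.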

Local Development

Prerequisites

Docker and Docker Compose
Python 3.x and Poetry
Node.js and npm (for static assets and accessibility testing)

Setup

Copy the sample environment file:

   cp .env.sample .env

Update values in .env as needed for your local services (the file is ignored by Git).

Install static assets:

   make install-static

Start the app:

   make up

Load test data:

   make load-test-data

Running tests

Run the full Python test suite:

   make test

Run accessibility tests (requires a running app):

   make test-pa11y

Run linting:

   make lint-check

Auto-fix linting:

   make lint-fix

Poetry

CI uses the latest Poetry release. Keep your local Poetry up to date:

   make poetry-update

Environment variables

Variable	Description
`DATABASE_SERVER`	Postgres host (default: localhost)
`DATABASE_PORT`	Postgres port (default: 5432)
`DATABASE_NAME`	Postgres database name
`DATABASE_USER`	Postgres user
`DATABASE_PASSWORD`	Postgres password
`DATABASE_URI`	Full Postgres connection URI (auto-constructed from above)
`PORT`	App port (default: 8080)
`OPENSEARCH_HOST`	OpenSearch host (default: localhost)
`NEW_RELIC_LICENSE_KEY`	New Relic license key
`NEW_RELIC_APP_NAME`	New Relic app name
`NEW_RELIC_MONITOR_MODE`	Enable New Relic monitoring (true/false)
`NEW_RELIC_LOG`	New Relic log file path
`NEW_RELIC_LOG_LEVEL`	New Relic log level
`SITEMAP_AWS_REGION`	AWS region for sitemap S3 bucket
`SITEMAP_AWS_ACCESS_KEY_ID`	AWS access key for sitemap S3 bucket
`SITEMAP_AWS_SECRET_ACCESS_KEY`	AWS secret key for sitemap S3 bucket
`SITEMAP_S3_BUCKET`	S3 bucket name for sitemaps

For cloud.gov deployments, secrets are managed via user-provided services. See the cloud.gov wiki page for secrets management procedures.
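
The table above notes that `DATABASE_URI` is constructed from the individual `DATABASE_*` variables. One way that could work is sketched below. The function name `build_database_uri`, the `postgresql://` scheme, and the rule that an explicit `DATABASE_URI` takes precedence are assumptions for illustration; check the application's config module for the real behavior.

```python
import os
from urllib.parse import quote


def build_database_uri(env=None) -> str:
    """Assemble a Postgres connection URI from the component variables."""
    env = os.environ if env is None else env

    # Assumption: an explicitly provided DATABASE_URI wins over the parts.
    if env.get("DATABASE_URI"):
        return env["DATABASE_URI"]

    host = env.get("DATABASE_SERVER", "localhost")  # documented default
    port = env.get("DATABASE_PORT", "5432")         # documented default
    name = env.get("DATABASE_NAME", "")
    # Percent-encode credentials so characters like "@" or ":" stay valid.
    user = quote(env.get("DATABASE_USER", ""), safe="")
    password = quote(env.get("DATABASE_PASSWORD", ""), safe="")

    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


if __name__ == "__main__":
    print(build_database_uri())
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests.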

Deployment

Deployments are triggered automatically via GitHub Actions on push to main. The deploy workflow runs in this order:

Lint -- runs ruff Python linting
Deploy to staging -- deploys to the staging cloud.gov space and runs a smoke test
Deploy to prod -- deploys to the prod cloud.gov space and runs a smoke test (only runs after staging succeeds)
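
The gating described in the steps above (each stage runs only if the previous one succeeded) can be modeled in a few lines. This is only a sketch of the ordering logic, not the GitHub Actions workflow itself; `run_pipeline` and the stage callables are hypothetical stand-ins.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that were started.
    """
    started = []
    for name, step in stages:
        started.append(name)
        if not step():
            # A failed stage blocks everything after it.
            break
    return started


if __name__ == "__main__":
    # Stand-in callables: staging fails, so prod must never start.
    stages = [
        ("lint", lambda: True),
        ("deploy-staging", lambda: False),
        ("deploy-prod", lambda: True),
    ]
    print(run_pipeline(stages))
```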

Cloud.gov spaces:

staging
prod

For emergency deployments outside of the normal CI/CD pipeline, see Break Glass deployment.

dataset_view_count seeding

The dataset_view_count table stores a view count for each dataset slug; these counts populate the popularity column. The counts come primarily from Google Analytics. For local testing, seed the table with:

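CREATE OR REPLACE FUNCTION public.generate_popularity()
RETURNS integer
LANGUAGE plpgsql
VOLATILE AS $$
DECLARE
  -- Draw the tier selector once; calling random() in every WHEN would skew the tiers.
  tier double precision := random();
BEGIN
  RETURN CASE
    WHEN tier < 0.80 THEN (random() * 51)::integer         -- 80%: 0 to 51 views
    WHEN tier < 0.90 THEN (51 + random() * 50)::integer    -- 10%: 51 to 101 views
    WHEN tier < 0.95 THEN (101 + random() * 900)::integer  -- 5%: 101 to 1001 views
    ELSE (1001 + random() * 4000)::integer                 -- 5%: 1001 to 5001 views
  END;
END; $$;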

TRUNCATE TABLE dataset_view_count;

INSERT INTO dataset_view_count (id, dataset_slug, view_count)
SELECT gen_random_uuid()::VARCHAR(36) AS id,
       slug AS dataset_slug,
       generate_popularity() AS view_count
FROM dataset;
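
To see the shape of the data the seeding produces, the tiers can be mirrored in Python. The sketch below reads the 0.80 / 0.90 / 0.95 thresholds as cumulative cut-offs on a single draw (roughly 80% / 10% / 5% / 5%); that reading is an assumption, and the function is an illustration rather than part of the application.

```python
import random


def generate_popularity(rng: random.Random) -> int:
    """Return a synthetic view count drawn from four popularity tiers."""
    tier = rng.random()  # one draw decides which tier applies
    if tier < 0.80:
        return round(rng.random() * 51)         # most datasets: 0 to 51
    if tier < 0.90:
        return round(51 + rng.random() * 50)    # 51 to 101
    if tier < 0.95:
        return round(101 + rng.random() * 900)  # 101 to 1001
    return round(1001 + rng.random() * 4000)    # rare heavy hitters: 1001 to 5001


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    sample = [generate_popularity(rng) for _ in range(10_000)]
    low_share = sum(v <= 51 for v in sample) / len(sample)
    print(f"share of counts <= 51: {low_share:.2f}, max: {max(sample)}")
```

Most slugs end up with small counts and a few get large ones, which is enough to exercise popularity-based sorting locally.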

Local Accessibility Testing

We use pa11y-ci for accessibility testing.

Install dependencies: npm install
Load test data: make load-test-data
Run pa11y tests: make test-pa11y

Related resources

harvest.data.gov -- harvest pipeline UI
datagov-harvester -- harvester source code and shared DB
Data.gov wiki -- operational documentation
catalog.data.gov wiki page
