Welcome to Hades, a robust job scheduler designed with scalability in mind. Hades' primary mission is to provide a straightforward, scalable, and adaptable solution for executing containerized workloads in various environments, from educational programming courses to research computing clusters.
Hades embodies several core design principles:
- **Simplicity:** Hades focuses on delivering just the essentials required to execute containerized jobs efficiently, without unnecessary complexity.
- **Scalability:** Hades has scalability at its core, capable of queuing and executing a vast number of jobs in parallel, making it ideal for large-scale operations.
- **Container-based:** Hades executes jobs within containers, ensuring a high level of isolation and security between workloads.
- **Kubernetes-native:** As a Kubernetes-native solution, Hades leverages the power and flexibility of Kubernetes as its primary execution platform for production workloads.
- **Extensibility:** Hades is designed to be highly extensible, allowing for easy integration with other execution platforms and workflow systems as needed.
Hades is built upon the following key components:
- **API:** Serving as the main entry point, the API handles all incoming job requests and provides status information.
- **Queue:** Using NATS as a message queue, this component is responsible for managing the queue of jobs, ensuring efficient scheduling and reliable delivery.
- **Scheduler:** The scheduler orchestrates the execution of jobs, coordinating with the executor components to run each job step in the appropriate environment.
- **Docker Executor:** Designed for local development, the Docker executor is responsible for running jobs within Docker containers on a single host.
- **Hades Operator (recommended):** The modern, production-ready standard for Kubernetes. It implements a Kubernetes-native controller pattern using Custom Resource Definitions (CRDs). This mode offers superior scalability, automatic retries, and fine-grained RBAC integration.
- **Kubernetes Executor (deprecated):** The legacy Kubernetes execution mode.
- **Log Manager (local development only):** Subscribes to job status and log events on NATS, aggregates per-job logs in memory, and exposes them through an HTTP API (`GET /jobs`, `/jobs/:id/logs`, `/jobs/:id/status`; default port `8081`). Run it via `make run` for local workflows; it is not currently part of the Docker Compose stack or the production Helm deployment.
Hades processes jobs through a sequence of well-defined steps:
- Job Submission: Jobs are submitted to the API, defining a series of steps to execute.
- Queuing: The job is queued in NATS for asynchronous processing.
- Scheduling: The scheduler picks up the job and schedules it on the appropriate executor.
- Execution: Each step of the job runs in its own container, with steps sharing data through a common volume.
- Completion: Upon completion, results are stored and made available through the API.
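Before submitting anything, it can be handy to sanity-check the payload locally. A minimal sketch, assuming only the job format shown in the examples later in this README (the `job.json` filename and the "Smoke Test" contents are illustrative):

```bash
# Write a minimal single-step job definition
cat > job.json <<'EOF'
{
  "name": "Smoke Test",
  "steps": [
    { "id": 1, "name": "hello", "image": "alpine:latest", "script": "echo hello" }
  ]
}
EOF

# Verify the payload is well-formed JSON before posting it to the API
python3 -m json.tool job.json > /dev/null && echo "job.json is valid"
```

This catches malformed JSON before it ever reaches the queue.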
Prerequisites:

- Docker and Docker Compose for local development
- Kubectl and a Kubernetes cluster for production deployment
- Minikube for local Kubernetes testing (optional)
To run Hades in Docker mode for local development:
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/Hades.git
  cd Hades
  ```

- Copy the `.env.example` file to `.env` (the default configuration uses Docker as the executor, so no changes are necessary for local testing):

  ```bash
  cp .env.example .env
  ```
- Start the Hades services:

  - All components in the CLI (NATS still runs in Docker):

    ```bash
    make run
    ```

    This launches HadesAPI, HadesScheduler, and HadesLogManager via `go run` and streams their logs to the terminal. Press `Ctrl-C` to stop them; run `make docker-stop` to also shut NATS down.

  - Full stack in Docker:

    ```bash
    make docker-run
    ```

    Use `make docker-logs` to follow the output and `make docker-stop` to tear the stack down.
For production deployments, Hades is designed to run natively within a Kubernetes cluster using Helm. This is the recommended way to achieve full scalability and reliability.
- Prerequisites:

  - A Kubernetes cluster (v1.25+)
  - Helm (v3.12+) installed locally

- Deployment: We provide a comprehensive Helm chart that packages the API, Scheduler, and NATS broker. The scheduler uses a `ServiceAccount` to manage job lifecycles within the cluster.

  ```bash
  # Quick install
  helm repo add nats https://nats-io.github.io/k8s/helm/charts
  helm dependency build ./helm/hades/
  helm upgrade --install hades ./helm/hades -n hades --create-namespace
  ```
- Detailed documentation: For advanced configuration (Ingress, TLS, resource limits) and step-by-step setup, please refer to the Hades Helm Chart Guide.
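After installing, a quick way to confirm the release came up is to inspect the workloads and the Helm release itself. These are standard `kubectl`/`helm` commands and assume the `hades` namespace and release name used in the install above; they require access to the cluster:

```bash
# List the chart's workloads and confirm they reach the Running state
kubectl get pods -n hades

# Inspect the Helm release status and its revision history
helm status hades -n hades
helm history hades -n hades
```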
Here's an example of submitting a basic job to Hades:
```json
{
  "name": "Example Job",
  "metadata": {
    "GLOBAL": "test"
  },
  "steps": [
    {
      "id": 1,
      "name": "Hello World",
      "image": "alpine:latest",
      "script": "echo 'Hello, Hades!'"
    }
  ]
}
```

Submit this job using:
```bash
curl -X POST -H "Content-Type: application/json" -d @job.json http://localhost:8080/build
```

For more complex workflows, you can define multi-step jobs where each step runs in a different container:
```json
{
  "name": "Multi-Step Example",
  "steps": [
    {
      "id": 1,
      "name": "Step 1",
      "image": "alpine:latest",
      "script": "echo 'Setting up environment...' > /shared/output.txt"
    },
    {
      "id": 2,
      "name": "Step 2",
      "image": "ubuntu:latest",
      "script": "cat /shared/output.txt && echo 'Processing data...' >> /shared/output.txt"
    },
    {
      "id": 3,
      "name": "Step 3",
      "image": "python:3.9-alpine",
      "script": "cat /shared/output.txt && echo 'Finalizing...' >> /shared/output.txt && cat /shared/output.txt"
    }
  ]
}
```

Hades can be configured through environment variables or a `.env` file:
| Variable | Description | Default |
|---|---|---|
| `HADES_EXECUTOR` | Execution platform: `docker` or `k8s` | `docker` |
| `CONCURRENCY` | Number of jobs to process concurrently | `1` |
| `API_PORT` | Port for the Hades API | `8080` |
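For example, a minimal `.env` for local development using only the variables above (the values shown are simply the documented defaults, so this file is equivalent to no configuration at all):

```bash
# Use the local Docker executor (set to k8s for cluster execution)
HADES_EXECUTOR=docker

# Process one job at a time
CONCURRENCY=1

# Hades API listening port
API_PORT=8080
```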
A top-level Makefile wraps the most common development tasks. Run make help to see every target.
| Target | Purpose |
|---|---|
| `make run` | Run HadesAPI, HadesScheduler, and HadesLogManager locally via `go run` (NATS auto-starts in Docker). |
| `make run-api` / `make run-scheduler` / `make run-logmanager` / `make run-operator` | Run a single component locally via `go run`. |
| `make docker-run` / `make docker-stop` / `make docker-logs` | Start, stop, or tail the full Docker Compose stack. |
| `make docker-run-api` / `make docker-run-scheduler` / `make docker-run-nats` | Start an individual service via Docker Compose. |
| `make build` | Compile every Go module in the workspace. |
| `make docker-build` | Build all Hades container images. |
| `make test` | Run unit tests across every Go module. |
| `make test-race` | Same as `make test` with the race detector. |
| `make cover` | Generate and open the HadesAPI coverage report. |
| `make test-operator` / `make test-operator-e2e` | Run HadesOperator envtest unit tests, or Kind-based e2e tests. |
| `make fmt` / `make lint` | Format code with `gofmt` or run `go vet`. |
| `make vuln` | Run `govulncheck` (auto-installs it on first use). |
| `make deps-check` / `make deps-update` / `make deps-tidy` | List outdated direct dependencies, bump them, or run `go mod tidy` across all modules. |
| `make helm-deps` | Refresh the Helm chart subchart lock file. |
| `make ci` | Mirror the CI run locally (lint + test). |
Tests live alongside the code in each module, and CI (`.github/workflows/ci.yml`) currently runs the shared and HadesAPI suites on every push and pull request.
The HadesOperator e2e target requires Kind to be installed locally.
For production deployments in a VM:
- Ensure you have Docker installed in the VM.

- Copy the `.env.example` file to `.env` and update the configuration:

  ```bash
  cp .env.example .env
  ```

- Change the `LETSENCRYPT_EMAIL` variable to your email address in your `.env` file.

- Change the `HADES_API_HOST` variable to your domain name or IP address in your `.env` file.

- Create the Traefik configuration files:

  ```bash
  touch traefik/acme.json
  chmod 600 traefik/acme.json
  ```

- Deploy Hades:

  ```bash
  docker compose -f compose.yml -f docker-compose.deploy.yml up -d
  ```
Hades includes Ansible playbooks for automated deployment.
See the `ansible/hades/README.md` file for more details.
Hades uses Renovate (configured in `renovate.json`) to open automated PRs for dependency updates across Go modules, Helm charts, Docker base images, and GitHub Actions.
Prefer merging Renovate PRs whenever possible so lock files and changelog links stay consistent.
For manual checks (for example before cutting a release), the workspace is wired up through the top-level Makefile:
```bash
make deps-check   # list outdated direct dependencies in every Go module
make deps-update  # bump direct deps in every module and run go mod tidy
make helm-deps    # refresh helm/hades/Chart.lock
make vuln         # run govulncheck across every module
```

After running `make deps-update`, verify the workspace still builds and tests pass:

```bash
make build
make test
```

Major-version upgrades (for example `sigs.k8s.io/controller-runtime` v0.22 -> v0.24, or any `/v2`, `/v3` import path bump) often contain breaking API changes and should be reviewed one module at a time rather than via a blanket `make deps-update`.
Docker base images in the per-component Dockerfiles are tracked by Renovate; for a manual bump, look up the latest tag on the relevant registry and edit the `FROM` line.
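A manual `FROM`-line bump can be scripted with `sed`. The sketch below uses a throwaway `Dockerfile.example` with placeholder image tags (`golang:1.22-alpine` -> `golang:1.23-alpine`); the actual images and tags used by the Hades Dockerfiles may differ:

```bash
# Hypothetical Dockerfile for illustration; not one of Hades' actual files
printf 'FROM golang:1.22-alpine\nCOPY . .\n' > Dockerfile.example

# Rewrite the FROM line to the newer tag (GNU sed in-place edit)
sed -i 's|^FROM golang:1.22-alpine$|FROM golang:1.23-alpine|' Dockerfile.example

# The first line now references the bumped tag
head -n 1 Dockerfile.example
```

After a bump like this, rebuild and test the affected image (`make docker-build`, `make test`) before merging.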
```
┌─────────┐          ┌─────────┐           ┌───────────────┐
│         │   jobs   │         │   jobs    │               │
│   API   │────────▶│  NATS   │─────────▶│   Scheduler   │
│         │          │  Queue  │           │               │
└─────────┘          └────┬────┘           └───────┬───────┘
                          ▲                        │
                   status │ logs                   ▼
                          │             ┌──────────┴──────────┐
                   ┌──────┴──────┐      │                     │
                   │             │      ▼                     ▼
                   │    Log      │ ┌─────────────┐  ┌─────────────────┐
                   │   Manager   │ │   Docker    │  │   Kubernetes    │
                   │ (HTTP API)  │ │  Executor   │  │   / Operator    │
                   │             │ └─────────────┘  └─────────────────┘
                   └─────────────┘
```
- Special thanks to all contributors who have helped shape Hades
- Inspired by the need for a lightweight, scalable job execution system in educational environments
- Built with Go, Docker, Kubernetes, and NATS