Hades: A Scalable Job Scheduler for Container Workloads

Welcome to Hades, a robust job scheduler designed with scalability in mind. Hades' primary mission is to provide a straightforward, scalable, and adaptable solution for executing containerized workloads in various environments, from educational programming courses to research computing clusters.

License: MIT

Design Goals

Hades embodies several core design principles:

  • Simplicity: Hades focuses on delivering just the essentials required to execute containerized jobs efficiently, without unnecessary complexity.

  • Scalability: Hades has scalability at its core, capable of queuing and executing a vast number of jobs in parallel, making it ideal for large-scale operations.

  • Container-Based: Hades executes jobs within containers, ensuring a high level of isolation and security between workloads.

  • Kubernetes Native: As a Kubernetes-native solution, Hades leverages the power and flexibility of Kubernetes as its primary execution platform for production workloads.

  • Extensibility: Hades is designed to be highly extensible, allowing for easy integration with other execution platforms and workflow systems as needed.

Architecture

Hades is built upon the following key components:

  • API: Serving as the main entry point, the API handles all incoming job requests and provides status information.

  • Queue: Using NATS as a message queue, this component is responsible for managing the queue of jobs, ensuring efficient scheduling and reliable delivery.

  • Scheduler: The scheduler orchestrates the execution of jobs, coordinating with the executor components to run each job step in the appropriate environment.

    • Docker Executor: Designed for local development, the Docker executor is responsible for running jobs within Docker containers on a single host.

    • Hades Operator (Recommended): The modern, production-ready standard for Kubernetes. It implements a Kubernetes-native controller pattern using Custom Resource Definitions (CRDs). This mode offers superior scalability, automatic retries, and fine-grained RBAC integration.

    • Kubernetes Executor (Deprecated): The legacy Kubernetes execution mode.

  • Log Manager (local development only): Subscribes to job status and log events on NATS, aggregates per-job logs in memory, and exposes them through an HTTP API (GET /jobs, /jobs/:id/logs, /jobs/:id/status, default port 8081). Run via make run for local workflows; not currently part of the Docker compose stack or the production Helm deployment.

How It Works

Hades processes jobs through a sequence of well-defined steps:

  1. Job Submission: Jobs are submitted to the API, defining a series of steps to execute.
  2. Queuing: The job is queued in NATS for asynchronous processing.
  3. Scheduling: The scheduler picks up the job and schedules it on the appropriate executor.
  4. Execution: Each step of the job runs in its own container, with steps sharing data through a common volume.
  5. Completion: Upon completion, results are stored and made available through the API.

Getting Started

Prerequisites

  • Go (components are run locally via go run)
  • Docker and Docker Compose (used for NATS and the containerized stack)
  • GNU Make (wraps the common development tasks)

Running in Docker Mode

To run Hades in Docker mode for local development:

  1. Clone the repository:

    git clone https://github.com/ls1intum/Hades.git
    cd Hades
  2. Copy the .env.example file to .env (the default configuration uses Docker as the executor, so no changes are necessary for local testing):

    cp .env.example .env
  3. Start the Hades services:

    • All components in the CLI (NATS still runs in Docker):

      make run

      This launches HadesAPI, HadesScheduler, and HadesLogManager via go run and streams their logs to the terminal. Press Ctrl-C to stop them; run make docker-stop to also shut NATS down.

    • Full stack in Docker:

      make docker-run

      Use make docker-logs to follow the output and make docker-stop to tear the stack down.

Running in Kubernetes Mode

For production deployments, Hades is designed to run natively within a Kubernetes cluster using Helm. This is the recommended way to achieve full scalability and reliability.

  1. Prerequisites:

    • A Kubernetes cluster (v1.25+)
    • Helm (v3.12+) installed locally.
  2. Deployment: We provide a comprehensive Helm Chart that packages the API, Scheduler, and NATS broker. The scheduler uses a ServiceAccount to manage job lifecycles within the cluster.

    # Quick install
    helm repo add nats https://nats-io.github.io/k8s/helm/charts
    helm dependency build ./helm/hades/
    helm upgrade --install hades ./helm/hades -n hades --create-namespace
  3. Detailed Documentation: For advanced configuration (Ingress, TLS, resource limits) and step-by-step setup, see the Hades Helm Chart Guide.

Usage Examples

Creating a Simple Job

Here's an example of submitting a basic job to Hades:

{
  "name": "Example Job",
  "metadata": {
    "GLOBAL": "test"
  },
  "steps": [
    {
      "id": 1,
      "name": "Hello World",
      "image": "alpine:latest",
      "script": "echo 'Hello, Hades!'"
    }
  ]
}

Submit this job using:

curl -X POST -H "Content-Type: application/json" -d @job.json http://localhost:8080/build

Multi-Step Job Example

For more complex workflows, you can define multi-step jobs where each step runs in a different container:

{
  "name": "Multi-Step Example",
  "steps": [
    {
      "id": 1,
      "name": "Step 1",
      "image": "alpine:latest",
      "script": "echo 'Setting up environment...' > /shared/output.txt"
    },
    {
      "id": 2,
      "name": "Step 2",
      "image": "ubuntu:latest",
      "script": "cat /shared/output.txt && echo 'Processing data...' >> /shared/output.txt"
    },
    {
      "id": 3,
      "name": "Step 3",
      "image": "python:3.9-alpine",
      "script": "cat /shared/output.txt && echo 'Finalizing...' >> /shared/output.txt && cat /shared/output.txt"
    }
  ]
}

Configuration Options

Hades can be configured through environment variables or a .env file:

Variable        Description                             Default
HADES_EXECUTOR  Execution platform: docker or k8s       docker
CONCURRENCY     Number of jobs to process concurrently  1
API_PORT        Port for the Hades API                  8080
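For example, a minimal .env for local development using the defaults documented above might look like:

```
HADES_EXECUTOR=docker
CONCURRENCY=1
API_PORT=8080
```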

Development Workflow

A top-level Makefile wraps the most common development tasks. Run make help to see every target.

make run: Run HadesAPI, HadesScheduler, and HadesLogManager locally via go run (NATS auto-starts in Docker).
make run-api / make run-scheduler / make run-logmanager / make run-operator: Run a single component locally via go run.
make docker-run / make docker-stop / make docker-logs: Start, stop, or tail the full docker compose stack.
make docker-run-api / make docker-run-scheduler / make docker-run-nats: Start an individual service via docker compose.
make build: Compile every Go module in the workspace.
make docker-build: Build all Hades container images.
make test: Run unit tests across every Go module.
make test-race: Same as make test, with the race detector enabled.
make cover: Generate and open the HadesAPI coverage report.
make test-operator / make test-operator-e2e: Run HadesOperator envtest unit tests or Kind-based e2e tests.
make fmt / make lint: Format code with gofmt or run go vet.
make vuln: Run govulncheck (auto-installs it on first use).
make deps-check / make deps-update / make deps-tidy: List outdated direct dependencies, bump them, or run go mod tidy across all modules.
make helm-deps: Refresh the Helm chart subchart lock file.
make ci: Mirror the CI run locally (lint + test).

Tests live alongside the code in each module, and CI (.github/workflows/ci.yml) currently runs the shared and HadesAPI suites on every push and pull request. The HadesOperator e2e target requires Kind to be installed locally.

Deployment

Deploy into a VM

For production deployments in a VM:

  1. Ensure you have Docker installed in the VM

  2. Copy the .env.example file to .env and update the configuration:

    cp .env.example .env
  3. Change the LETSENCRYPT_EMAIL variable to your email address in your .env file.

  4. Change the HADES_API_HOST variable to your domain name or IP address in your .env file.

  5. Create Traefik configuration files

    touch traefik/acme.json
    chmod 600 traefik/acme.json
  6. Deploy Hades:

    docker compose -f compose.yml -f docker-compose.deploy.yml up -d

Ansible Deployment

Hades includes Ansible playbooks for automated deployment. See the ansible/hades/README.md file for more details.

Dependency Management

Hades uses Renovate (configured in renovate.json) to open automated PRs for dependency updates across Go modules, Helm charts, Docker base images, and GitHub Actions. Prefer merging Renovate PRs whenever possible so lock files and changelog links stay consistent.

For manual checks (for example before cutting a release), the workspace is wired up through the top-level Makefile:

make deps-check     # list outdated direct dependencies in every Go module
make deps-update    # bump direct deps in every module and run go mod tidy
make helm-deps      # refresh helm/hades/Chart.lock
make vuln           # run govulncheck across every module

After running make deps-update, verify the workspace still builds and tests pass:

make build
make test

Major-version upgrades (for example sigs.k8s.io/controller-runtime v0.22 -> v0.24, or any /v2, /v3 import path bump) often contain breaking API changes and should be reviewed one module at a time rather than via a blanket make deps-update.

Docker base images in the per-component Dockerfiles are tracked by Renovate; for a manual bump, look up the latest tag on the relevant registry and edit the FROM line.

High-Level Architecture Diagram

┌─────────┐         ┌─────────┐          ┌───────────────┐
│         │ jobs    │         │  jobs    │               │
│  API    │────────▶│  NATS   │─────────▶│  Scheduler    │
│         │         │ Queue   │          │               │
└─────────┘         └────┬────┘          └───────┬───────┘
                         ▲                       │
                  status │ logs                  ▼
                         │            ┌──────────┴──────────┐
                  ┌──────┴──────┐     │                     │
                  │             │     ▼                     ▼
                  │    Log      │  ┌─────────────┐    ┌─────────────────┐
                  │   Manager   │  │   Docker    │    │   Kubernetes    │
                  │  (HTTP API) │  │  Executor   │    │  / Operator     │
                  │             │  └─────────────┘    └─────────────────┘
                  └─────────────┘

Acknowledgments

  • Special thanks to all contributors who have helped shape Hades
  • Inspired by the need for a lightweight, scalable job execution system in educational environments
  • Built with Go, Docker, Kubernetes, and NATS

About

A flexible and stateless job scheduler / CI System
