A self-contained lab to practice building real-time streaming pipelines using Kafka, Flink, Schema Registry, MinIO (S3-compatible storage), Iceberg, and DynamoDB Local.
Use this repository as a ready-to-run scaffold for experimenting with modern event-driven data platforms.
This project began as a personal sandbox for learning Apache Flink locally. Once everything was operational, I decided to make it public and add short lessons so that others could learn data streaming too. The goal is straightforward: get introduced to a technology in 100 seconds and immediately have a working environment to experiment with.
New to streaming? Work through the guided lessons in lessons/, starting from Track 1 – Introduction.
## Quick start

1. Start everything:

   ```bash
   docker compose up -d
   ```

2. Open the main UIs:

   - Kafka UI: http://localhost:8080
   - Flink Dashboard: http://localhost:8081
   - MinIO Console: http://localhost:9001
   - DynamoDB Admin: http://localhost:8001

3. Check that data is flowing (a verification sketch follows these steps):

   - In Kafka UI, topics `orders.raw` and `orders.enriched` show new messages arriving.
   - In Flink Dashboard, the streaming job is in a RUNNING state and processing records.
   - In MinIO Console, under the warehouse path, new data files appear for the Iceberg `orders_enriched` table.
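If you'd rather verify from a terminal than from the UIs, the sketch below reads a few messages straight off `orders.raw`. This is a minimal sketch, assuming the broker is reachable on `localhost:9092` and the `confluent-kafka` Python package is installed; the group id is hypothetical, and values are printed as raw bytes because the topic carries Avro-encoded messages.

```python
# Smoke test: read a handful of messages from orders.raw.
# Assumes the broker is reachable on localhost:9092 (an assumption, check
# docker-compose.yml for the advertised listener) and that
# `pip install confluent-kafka` has been run.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "streamlab-smoke-test",  # hypothetical group id
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders.raw"])

try:
    for _ in range(10):  # read a few messages, then stop
        msg = consumer.poll(timeout=5.0)
        if msg is None:
            print("no message within 5s -- is the producer running?")
            break
        if msg.error():
            print(f"consumer error: {msg.error()}")
            continue
        # Values are Avro-encoded; print the raw bytes without decoding.
        print(f"partition={msg.partition()} offset={msg.offset()} "
              f"value={msg.value()[:60]!r}")
finally:
    consumer.close()
```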
## Architecture

```mermaid
flowchart LR
    subgraph Producers
        P["Python Producer<br/>(Order Events)"]
    end
    subgraph "Message Broker"
        K["Kafka"]
        SR["Schema Registry"]
        K <--> SR
    end
    subgraph "Stream Processing"
        F["Apache Flink<br/>(Enrich + Aggregate)"]
    end
    subgraph "Storage"
        KE["Kafka<br/>orders.enriched"]
        ICE["Iceberg<br/>(on MinIO)"]
        DDB["DynamoDB<br/>customer_state"]
    end
    P --> K
    K --> F
    F --> KE
    F --> ICE
    F --> DDB
```
## How it works

- **Event Generation**: a Python producer generates order events and publishes them to Kafka (`orders.raw`); see the producer sketch after this list.
- **Schema Validation**: Schema Registry ensures messages follow the Avro schema.
- **Stream Processing**: Flink consumes, enriches (adds EUR conversion, timestamps), and routes to:
  - Kafka (`orders.enriched`) for downstream consumers
  - Iceberg tables (on MinIO) for analytics (view them in the MinIO Console)
  - DynamoDB for real-time per-customer state aggregations (a query sketch follows the service table below)
- **Observability**: Prometheus + Grafana for metrics and dashboards
## Service URLs

| Service | URL | Credentials |
|---|---|---|
| Kafka UI | localhost:8080 | - |
| Flink Dashboard | localhost:8081 | - |
| MinIO Console | localhost:9001 | admin / password |
| Grafana | localhost:3000 | admin / admin |
| Prometheus | localhost:9090 | - |
| DynamoDB Admin | localhost:8001 | - |
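To peek at the per-customer state Flink writes without opening DynamoDB Admin, you can query DynamoDB Local directly. A minimal sketch, assuming DynamoDB Local listens on `localhost:8000` (its usual default, and 8000 appears in the port list below) and that the table is named `customer_state` as in the architecture diagram; DynamoDB Local accepts dummy credentials. Requires `pip install boto3`.

```python
# Scan a few items from the customer_state table on DynamoDB Local.
# Endpoint, region, and credentials are assumptions -- DynamoDB Local
# ignores credential values, so dummies are fine.
import boto3

dynamodb = boto3.client(
    "dynamodb",
    endpoint_url="http://localhost:8000",
    region_name="us-east-1",
    aws_access_key_id="dummy",
    aws_secret_access_key="dummy",
)

# A Scan is fine for a local smoke test (never for production-sized tables).
resp = dynamodb.scan(TableName="customer_state", Limit=5)
for item in resp.get("Items", []):
    print(item)
```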
## Prerequisites

- Docker and Docker Compose
- At least 8GB RAM allocated to Docker
- Ports available: 3000, 8000, 8080-8082, 8181, 9001-9002, 9090, 9092
## Makefile commands

```bash
make up          # Start all services
make down        # Stop all services
make clean       # Stop and delete all data
make logs        # Follow logs from all services
make status      # Show container status and URLs
make build-flink # Build the Flink job JAR
make rebuild     # Clean rebuild everything
```

## Lessons

This lab includes a set of short, focused lessons that make it easy to learn streaming step by step while reusing this project as a scaffold for your own experiments.
Start with Track 1 – Lesson 1: Introduction and follow the Lesson Index in that file for the full list of Track 1 and Track 2 lessons.
## Project structure

```text
├── docker-compose.yml  # All services configuration
├── flink-job/          # Flink streaming job (Java)
├── python-producer/    # Order event generator
├── schemas/            # Avro schemas for orders
├── scripts/            # Setup scripts
├── sql/                # Flink SQL scripts (Iceberg)
├── lessons/            # Learning materials
├── grafana/            # Grafana provisioning
├── prometheus/         # Prometheus config
└── terraform/          # Optional LocalStack config
```
## Troubleshooting

### Port conflicts

```bash
# Check what's using a port
lsof -i :8080
# Stop the conflicting service or change ports in docker-compose.yml
```

### Services not starting

```bash
# Check container logs
docker compose logs kafka
docker compose logs flink-jobmanager

# Restart unhealthy services
docker compose restart kafka
```

### Init container issues

```bash
# View init container logs
docker logs streamlab-init

# Re-run init manually
docker compose up -d streamlab-init
```

### Out of memory

Docker needs at least 8GB RAM. Check Docker Desktop settings.

```bash
# Check Docker resource usage
docker stats
```

### Flink job not running

```bash
# Check if the JAR was built
ls -la jars/

# Check builder logs
docker logs streamlab-flink-job-builder

# Rebuild the JAR manually if needed
make build-flink

# Restart init to resubmit the job
docker compose restart streamlab-init
```

### Network issues

All services use the `streamlab-network`. Verify it exists:

```bash
docker network ls | grep streamlab
```

## Manual setup

If you prefer to run setup steps manually instead of using the init container:

```bash
# Create topics
./scripts/create-topics.sh

# Register schemas
./scripts/register-schemas.sh

# Create DynamoDB table
./scripts/create-dynamodb-table.sh

# Submit the Flink job via the UI at http://localhost:8081
```

## License

MIT License - see LICENSE for details.