
Real-time Streaming Practice Lab

A self-contained lab to practice building real-time streaming pipelines using Kafka, Flink, Schema Registry, MinIO (S3-compatible storage), Iceberg, and DynamoDB Local.

Use this repository as a ready-to-run scaffold for experimenting with modern event-driven data platforms.

Why This Exists

This project began as a personal sandbox for learning Apache Flink locally. Once everything was working, I decided to make it public and add short lessons so others could pick up data streaming as well. The goal is straightforward: get introduced to each technology in 100 seconds and immediately have a working environment to experiment with. New to streaming? Work through the guided lessons in lessons/, starting from Track 1 – Introduction.

Quickstart TL;DR

  1. Start everything

    docker compose up -d
  2. Open the main UIs

    • Kafka UI: http://localhost:8080
    • Flink Dashboard: http://localhost:8081
    • MinIO Console: http://localhost:9001
    • DynamoDB Admin: http://localhost:8001
  3. Check that data is flowing

    • In Kafka UI, topics orders.raw and orders.enriched show new messages arriving (the snippet after this list performs the same check from code).
    • In Flink Dashboard, the streaming job is in a RUNNING state and processing records.
    • In MinIO Console, under the warehouse path, new data files appear for the Iceberg orders_enriched table.
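
If you'd rather verify from code than from the UIs, the sketch below counts messages arriving on orders.raw. It is a minimal sketch, assuming the broker is reachable on localhost:9092 (as listed under Prerequisites) and that the confluent-kafka Python package is installed; the payloads are Avro-encoded, so it only counts messages rather than decoding them.

# sanity_check.py - rough check that messages are flowing into orders.raw.
# Assumes: broker on localhost:9092, `pip install confluent-kafka`.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "sanity-check",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders.raw"])

count = 0
try:
    for _ in range(50):                  # poll a bounded number of times
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            print("Consumer error:", msg.error())
            continue
        count += 1                       # Avro payload; we only count it here
finally:
    consumer.close()

print(f"Received {count} messages from orders.raw")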

Architecture

flowchart LR
  subgraph Producers
    P["Python Producer<br/>(Order Events)"]
  end

  subgraph "Message Broker"
    K["Kafka"]
    SR["Schema Registry"]
    K <--> SR
  end

  subgraph "Stream Processing"
    F["Apache Flink<br/>(Enrich + Aggregate)"]
  end

  subgraph "Storage"
    KE["Kafka<br/>orders.enriched"]
    ICE["Iceberg<br/>(on MinIO)"]
    DDB["DynamoDB<br/>customer_state"]
  end

  P --> K
  K --> F
  F --> KE
  F --> ICE
  F --> DDB

Data Flow

  1. Event Generation: Python producer generates order events and publishes to Kafka (orders.raw)
  2. Schema Validation: Schema Registry ensures messages follow the Avro schema
  3. Stream Processing: Flink consumes, enriches (adds EUR conversion, timestamps), and routes to:
    • Kafka (orders.enriched) for downstream consumers
    • Iceberg Tables (on MinIO) for analytics - view in MinIO Console
    • DynamoDB for real-time per-customer state aggregations (see the sketch after this list)
  4. Observability: Prometheus + Grafana for metrics and dashboards
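
To peek at the per-customer state Flink writes in step 3, you can read it straight from DynamoDB Local. A minimal sketch, assuming DynamoDB Local listens on localhost:8000, the table is named customer_state as in the architecture diagram, and boto3 is installed (DynamoDB Local accepts any dummy credentials):

# peek_customer_state.py - inspect the per-customer aggregations in DynamoDB Local.
# Assumes: DynamoDB Local on localhost:8000, table name customer_state, `pip install boto3`.
import boto3

dynamodb = boto3.resource(
    "dynamodb",
    endpoint_url="http://localhost:8000",
    region_name="us-east-1",
    aws_access_key_id="dummy",       # DynamoDB Local ignores real credentials
    aws_secret_access_key="dummy",
)

table = dynamodb.Table("customer_state")
response = table.scan(Limit=5)       # grab a handful of items just to eyeball them

for item in response.get("Items", []):
    print(item)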

Web UIs

Service          URL             Credentials
Kafka UI         localhost:8080  -
Flink Dashboard  localhost:8081  -
MinIO Console    localhost:9001  admin / password
Grafana          localhost:3000  admin / admin
Prometheus       localhost:9090  -
DynamoDB Admin   localhost:8001  -
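
As a quick smoke test, you can ping every UI from a short script. A minimal sketch using only the URLs in the table above; it assumes the stack is already up and the requests package is installed:

# check_uis.py - confirm the web UIs listed above respond.
# Assumes: `pip install requests` and `docker compose up -d` already done.
import requests

UIS = {
    "Kafka UI": "http://localhost:8080",
    "Flink Dashboard": "http://localhost:8081",
    "MinIO Console": "http://localhost:9001",
    "Grafana": "http://localhost:3000",
    "Prometheus": "http://localhost:9090",
    "DynamoDB Admin": "http://localhost:8001",
}

for name, url in UIS.items():
    try:
        status = requests.get(url, timeout=3).status_code
        print(f"{name:16} {url:25} HTTP {status}")
    except requests.RequestException as exc:
        print(f"{name:16} {url:25} unreachable ({exc.__class__.__name__})")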

Prerequisites

  • Docker and Docker Compose
  • At least 8GB RAM allocated to Docker
  • Ports available: 3000, 8000, 8080-8082, 8181, 9001-9002, 9090, 9092

Common Commands

make up          # Start all services
make down        # Stop all services
make clean       # Stop and delete all data
make logs        # Follow logs from all services
make status      # Show container status and URLs
make build-flink # Build the Flink job JAR
make rebuild     # Clean rebuild everything

Learning Path

This lab includes a set of short, focused lessons that make it easy to learn streaming step by step while reusing this project as a scaffold for your own experiments.

Start with Track 1 – Lesson 1: Introduction and follow the Lesson Index in that file for the full list of Track 1 and Track 2 lessons.

Project Structure

├── docker-compose.yml      # All services configuration
├── flink-job/              # Flink streaming job (Java)
├── python-producer/        # Order event generator
├── schemas/                # Avro schemas for orders
├── scripts/                # Setup scripts
├── sql/                    # Flink SQL scripts (Iceberg)
├── lessons/                # Learning materials
├── grafana/                # Grafana provisioning
├── prometheus/             # Prometheus config
└── terraform/              # Optional LocalStack config

Troubleshooting

Port Already in Use

# Check what's using a port
lsof -i :8080

# Stop conflicting service or change ports in docker-compose.yml

Containers Not Starting

# Check container logs
docker compose logs kafka
docker compose logs flink-jobmanager

# Restart unhealthy services
docker compose restart kafka

Init Container Failing

# View init container logs
docker logs streamlab-init

# Re-run init manually
docker compose up -d streamlab-init

Not Enough Memory

Docker needs at least 8GB RAM. Check Docker Desktop settings.

# Check Docker resource usage
docker stats

Flink Job Not Running

# Check if JAR was built
ls -la jars/

# Check builder logs
docker logs streamlab-flink-job-builder

# Rebuild JAR manually if needed
make build-flink

# Restart init to resubmit
docker compose restart streamlab-init
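
You can also ask the JobManager directly which jobs it knows about and what state they are in, the programmatic equivalent of checking the dashboard. A minimal sketch against Flink's REST API on localhost:8081, assuming the requests package is installed:

# flink_jobs.py - list jobs and their state via the Flink REST API.
# Assumes: JobManager REST endpoint on localhost:8081, `pip install requests`.
import requests

overview = requests.get("http://localhost:8081/jobs/overview", timeout=5).json()

for job in overview.get("jobs", []):
    print(f"{job['jid']}  {job['name']}  {job['state']}")

if not overview.get("jobs"):
    print("No jobs found - the job may not have been submitted yet.")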

Network Issues

All services attach to the streamlab-network Docker network. Verify it exists:

docker network ls | grep streamlab

Manual Setup (Optional)

If you prefer to run setup steps manually instead of using the init container:

# Create topics
./scripts/create-topics.sh

# Register schemas
./scripts/register-schemas.sh

# Create DynamoDB table
./scripts/create-dynamodb-table.sh

# Submit Flink job (via UI at http://localhost:8081)
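
Instead of uploading the JAR through the dashboard, you can also submit it over Flink's REST API. A minimal sketch: the JAR filename below is a placeholder (check ls jars/ for the real artifact name), and the requests package must be installed:

# submit_flink_job.py - upload and run the job JAR via Flink's REST API.
# Assumes: JobManager on localhost:8081; JAR_PATH is a hypothetical name, adjust it.
import os
import requests

FLINK = "http://localhost:8081"
JAR_PATH = "jars/flink-job.jar"  # placeholder; use the JAR your build produces

# 1. Upload the JAR as multipart form data.
with open(JAR_PATH, "rb") as jar:
    upload = requests.post(
        f"{FLINK}/jars/upload",
        files={"jarfile": (os.path.basename(JAR_PATH), jar, "application/x-java-archive")},
    ).json()

jar_id = os.path.basename(upload["filename"])
print("Uploaded as", jar_id)

# 2. Run it. Add params={"entry-class": "..."} if the JAR has no main-class manifest.
run = requests.post(f"{FLINK}/jars/{jar_id}/run").json()
print("Started job", run.get("jobid"))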

License

MIT License - see LICENSE for details.
