
Real-time Streaming Practice Lab

A self-contained lab to practice building real-time streaming pipelines using Kafka, Flink, Schema Registry, MinIO (S3-compatible storage), Iceberg, and DynamoDB Local.

Use this repository as a ready-to-run scaffold for experimenting with modern event-driven data platforms.

Why This Exists

This project began as a personal sandbox for learning Apache Flink locally. Once everything was working, I decided to make it public and add short lessons so others could pick up data streaming as well. The goal is straightforward: get introduced to each technology in 100 seconds and immediately have a working environment to experiment with. New to streaming? Work through the guided lessons in lessons/, starting from Track 1 – Introduction.

Quickstart TL;DR

  1. Start everything

    docker compose up -d
  2. Open the main UIs

    • Kafka UI: http://localhost:8080
    • Flink Dashboard: http://localhost:8081
    • MinIO Console: http://localhost:9001
    • DynamoDB Admin: http://localhost:8001
  3. Check that data is flowing

    • In Kafka UI, topics orders.raw and orders.enriched show new messages arriving (the snippet after this list performs the same check from code).
    • In Flink Dashboard, the streaming job is in a RUNNING state and processing records.
    • In MinIO Console, under the warehouse path, new data files appear for the Iceberg orders_enriched table.
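
If you'd rather verify from code than from the UIs, the sketch below counts messages arriving on orders.raw. It is a minimal sketch, assuming the broker is reachable on localhost:9092 (as listed under Prerequisites) and that the confluent-kafka Python package is installed; the payloads are Avro-encoded, so it only counts messages rather than decoding them.

# sanity_check.py - rough check that messages are flowing into orders.raw.
# Assumes: broker on localhost:9092, `pip install confluent-kafka`.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "sanity-check",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders.raw"])

count = 0
try:
    for _ in range(50):                  # poll a bounded number of times
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            print("Consumer error:", msg.error())
            continue
        count += 1                       # Avro payload; we only count it here
finally:
    consumer.close()

print(f"Received {count} messages from orders.raw")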

Architecture

flowchart LR
  subgraph Producers
    P["Python Producer<br/>(Order Events)"]
  end

  subgraph "Message Broker"
    K["Kafka"]
    SR["Schema Registry"]
    K <--> SR
  end

  subgraph "Stream Processing"
    F["Apache Flink<br/>(Enrich + Aggregate)"]
  end

  subgraph "Storage"
    KE["Kafka<br/>orders.enriched"]
    ICE["Iceberg<br/>(on MinIO)"]
    DDB["DynamoDB<br/>customer_state"]
  end

  P --> K
  K --> F
  F --> KE
  F --> ICE
  F --> DDB

Data Flow

  1. Event Generation: Python producer generates order events and publishes to Kafka (orders.raw)
  2. Schema Validation: Schema Registry ensures messages follow the Avro schema
  3. Stream Processing: Flink consumes, enriches (adds EUR conversion, timestamps), and routes to:
    • Kafka (orders.enriched) for downstream consumers
    • Iceberg Tables (on MinIO) for analytics - view in MinIO Console
    • DynamoDB for real-time per-customer state aggregations (see the sketch after this list)
  4. Observability: Prometheus + Grafana for metrics and dashboards
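
To peek at the per-customer state Flink writes in step 3, you can read it straight from DynamoDB Local. A minimal sketch, assuming DynamoDB Local listens on localhost:8000, the table is named customer_state as in the architecture diagram, and boto3 is installed (DynamoDB Local accepts any dummy credentials):

# peek_customer_state.py - inspect the per-customer aggregations in DynamoDB Local.
# Assumes: DynamoDB Local on localhost:8000, table name customer_state, `pip install boto3`.
import boto3

dynamodb = boto3.resource(
    "dynamodb",
    endpoint_url="http://localhost:8000",
    region_name="us-east-1",
    aws_access_key_id="dummy",       # DynamoDB Local ignores real credentials
    aws_secret_access_key="dummy",
)

table = dynamodb.Table("customer_state")
response = table.scan(Limit=5)       # grab a handful of items just to eyeball them

for item in response.get("Items", []):
    print(item)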

Web UIs

Service          URL             Credentials
Kafka UI         localhost:8080  -
Flink Dashboard  localhost:8081  -
MinIO Console    localhost:9001  admin / password
Grafana          localhost:3000  admin / admin
Prometheus       localhost:9090  -
DynamoDB Admin   localhost:8001  -
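
As a quick smoke test, you can ping every UI from a short script. A minimal sketch using only the URLs in the table above; it assumes the stack is already up and the requests package is installed:

# check_uis.py - confirm the web UIs listed above respond.
# Assumes: `pip install requests` and `docker compose up -d` already done.
import requests

UIS = {
    "Kafka UI": "http://localhost:8080",
    "Flink Dashboard": "http://localhost:8081",
    "MinIO Console": "http://localhost:9001",
    "Grafana": "http://localhost:3000",
    "Prometheus": "http://localhost:9090",
    "DynamoDB Admin": "http://localhost:8001",
}

for name, url in UIS.items():
    try:
        status = requests.get(url, timeout=3).status_code
        print(f"{name:16} {url:25} HTTP {status}")
    except requests.RequestException as exc:
        print(f"{name:16} {url:25} unreachable ({exc.__class__.__name__})")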

Prerequisites

  • Docker and Docker Compose
  • At least 8GB RAM allocated to Docker
  • Ports available: 3000, 8000, 8080-8082, 8181, 9001-9002, 9090, 9092

Common Commands

make up          # Start all services
make down        # Stop all services
make clean       # Stop and delete all data
make logs        # Follow logs from all services
make status      # Show container status and URLs
make build-flink # Build the Flink job JAR
make rebuild     # Clean rebuild everything

Learning Path

This lab includes a set of short, focused lessons that make it easy to learn streaming step by step while reusing this project as a scaffold for your own experiments.

Start with Track 1 – Lesson 1: Introduction and follow the Lesson Index in that file for the full list of Track 1 and Track 2 lessons.

Project Structure

├── docker-compose.yml      # All services configuration
├── flink-job/              # Flink streaming job (Java)
├── python-producer/        # Order event generator
├── schemas/                # Avro schemas for orders
├── scripts/                # Setup scripts
├── sql/                    # Flink SQL scripts (Iceberg)
├── lessons/                # Learning materials
├── grafana/                # Grafana provisioning
├── prometheus/             # Prometheus config
└── terraform/              # Optional LocalStack config

Troubleshooting

Port Already in Use

# Check what's using a port
lsof -i :8080

# Stop conflicting service or change ports in docker-compose.yml

Containers Not Starting

# Check container logs
docker compose logs kafka
docker compose logs flink-jobmanager

# Restart unhealthy services
docker compose restart kafka

Init Container Failing

# View init container logs
docker logs streamlab-init

# Re-run init manually
docker compose up -d streamlab-init

Not Enough Memory

Docker needs at least 8GB RAM. Check Docker Desktop settings.

# Check Docker resource usage
docker stats

Flink Job Not Running

# Check if JAR was built
ls -la jars/

# Check builder logs
docker logs streamlab-flink-job-builder

# Rebuild JAR manually if needed
make build-flink

# Restart init to resubmit
docker compose restart streamlab-init
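
You can also ask the JobManager directly which jobs it knows about and what state they are in, the programmatic equivalent of checking the dashboard. A minimal sketch against Flink's REST API on localhost:8081, assuming the requests package is installed:

# flink_jobs.py - list jobs and their state via the Flink REST API.
# Assumes: JobManager REST endpoint on localhost:8081, `pip install requests`.
import requests

overview = requests.get("http://localhost:8081/jobs/overview", timeout=5).json()

for job in overview.get("jobs", []):
    print(f"{job['jid']}  {job['name']}  {job['state']}")

if not overview.get("jobs"):
    print("No jobs found - the job may not have been submitted yet.")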

Network Issues

All services attach to the streamlab-network Docker network. Verify it exists:

docker network ls | grep streamlab

Manual Setup (Optional)

If you prefer to run setup steps manually instead of using the init container:

# Create topics
./scripts/create-topics.sh

# Register schemas
./scripts/register-schemas.sh

# Create DynamoDB table
./scripts/create-dynamodb-table.sh

# Submit Flink job (via UI at http://localhost:8081)
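
Instead of uploading the JAR through the dashboard, you can also submit it over Flink's REST API. A minimal sketch: the JAR filename below is a placeholder (check ls jars/ for the real artifact name), and the requests package must be installed:

# submit_flink_job.py - upload and run the job JAR via Flink's REST API.
# Assumes: JobManager on localhost:8081; JAR_PATH is a hypothetical name, adjust it.
import os
import requests

FLINK = "http://localhost:8081"
JAR_PATH = "jars/flink-job.jar"  # placeholder; use the JAR your build produces

# 1. Upload the JAR as multipart form data.
with open(JAR_PATH, "rb") as jar:
    upload = requests.post(
        f"{FLINK}/jars/upload",
        files={"jarfile": (os.path.basename(JAR_PATH), jar, "application/x-java-archive")},
    ).json()

jar_id = os.path.basename(upload["filename"])
print("Uploaded as", jar_id)

# 2. Run it. Add params={"entry-class": "..."} if the JAR has no main-class manifest.
run = requests.post(f"{FLINK}/jars/{jar_id}/run").json()
print("Started job", run.get("jobid"))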

License

MIT License - see LICENSE for details.
