A complete streaming analytics stack that captures MySQL changes via CDC, stores them in Apache Paimon format, and visualizes them with Rill dashboards.
- Real-Time CDC: Captures every MySQL change using Flink CDC
- Lake Storage: Stores data in Apache Paimon format on S3-compatible storage
- Live Dashboard: Rill analytics with automated catalog management
- Automated Fixes: Sidecar container handles DuckDB catalog prefix issues
- One Command Start: Everything runs with `docker compose up`
```
MySQL → Flink CDC → Apache Paimon → MinIO → Rill Dashboard
  ↑                                              ↓
Manual inserts                               Analytics
```
Components (service layout sketched after this list):
- MySQL/MariaDB: Source database with sample product data
- Apache Flink: Real-time CDC processing engine
- Apache Paimon: Lake storage format optimized for streaming
- MinIO: S3-compatible object storage
- Rill: Modern analytics dashboard with DuckDB engine
- Rill Patcher: Automated sidecar handling catalog prefix issues
Prerequisites:
- Docker and Docker Compose
- 8GB+ RAM recommended
- Ports 3000, 3306, 8081, 9000-9001 available (quick check below)
Quick start:

```bash
git clone <your-repo>
cd flink_iceberg_anomaly_pipeline_paimon
docker compose up -d
./setup_cdc.sh
```
Navigate to: http://localhost:3000
The dashboard will show live data with automatic 60-second refresh.
Add new products to see live updates:
```bash
# Add some products
docker exec mariadb mysql -u root -prootpassword -e "
INSERT INTO mydatabase.products (name, price) VALUES
('New Product 1', 99.99),
('New Product 2', 199.99);"

# Check MySQL count
docker exec mariadb mysql -u root -prootpassword -e "SELECT COUNT(*) FROM mydatabase.products;"

# Wait 60 seconds for the dashboard to refresh
# You'll see the updated count automatically!
```
How it works:
- MySQL Changes: Any INSERT/UPDATE/DELETE in MySQL is captured (example after this list)
- Flink Processing: Flink CDC reads the MySQL binlog in real-time
- Paimon Storage: Changes are written to Paimon tables in MinIO
- Rill Dashboard: Visualizes data with 60-second refresh cycle
DuckDB assigns a random catalog prefix (e.g., `main8514e79c`) on startup. Our `rill-patcher` sidecar (sketched below):
- Waits for Rill to start
- Discovers the current catalog alias via SQL
- Patches the model file with the correct prefix
- Refreshes data every 60 seconds
- Re-patches if Rill restarts with a new prefix
Why Apache Paimon:
- Optimized for streaming updates with ACID guarantees
- Supports both batch and streaming workloads
- Compatible with multiple query engines (DuckDB sketch below)
- Efficient storage with automatic compaction
Verify the stack:

```bash
# Check all containers
docker ps

# Monitor the CDC job
curl -s http://localhost:8081/jobs | jq

# Test the Rill dashboard API
curl -s "http://localhost:3000/v1/instances/default/query" \
  -H "Content-Type: application/json" \
  -d '{"sql":"SELECT COUNT(*) FROM paimon_products"}'

# View Paimon files in MinIO
docker exec minio mc ls --recursive local/warehouse/
```
Compare the stores end to end:

```bash
# MySQL data
docker exec mariadb mysql -u root -prootpassword -e "SELECT COUNT(*) FROM mydatabase.products;"

# MinIO storage
docker exec minio mc ls --recursive local/warehouse/cdc_db.db/products_sink/

# Rill dashboard count
curl -s "http://localhost:3000/v1/instances/default/query" \
  -H "Content-Type: application/json" \
  -d '{"sql":"SELECT COUNT(*) FROM paimon_products"}' | jq '.data[0]'
```
Project structure:

```
├── docker-compose.yml         # Complete stack definition
├── conf/
│   └── flink-conf.yaml        # Flink configuration
├── rill/
│   ├── connectors/            # DuckDB S3 configuration
│   ├── models/                # SQL model definitions
│   ├── metrics/               # Metrics definitions
│   └── dashboards/            # Dashboard configs
├── rill-patcher.sh            # Automated catalog management
├── duckdb/
│   └── test_s3.py             # DuckDB query examples
├── sql/
│   ├── init.sql               # MySQL initial data
│   └── setup_paimon_cdc.sql   # CDC pipeline setup
└── setup_cdc.sh               # CDC initialization script
```
Key files:

Flink Config (`conf/flink-conf.yaml`, illustrated below):
- Configures the Flink job manager and task manager
- Sets checkpointing intervals
- Defines S3/MinIO credentials
CDC Setup (`sql/setup_paimon_cdc.sql`, sketched below):
- Creates the Paimon catalog
- Defines the source MySQL table
- Creates the sink Paimon table
- Starts the CDC pipeline
Troubleshooting:

CDC pipeline not starting:

```bash
# Check if the job started:
curl -s http://localhost:8081/jobs | jq

# If not, run setup again:
./setup_cdc.sh
```
No data in MinIO:

```bash
# Check Flink job status
curl -s http://localhost:8081/jobs

# Restart CDC setup
./setup_cdc.sh
```
Verify data flow:

```bash
# Check Flink job metrics
curl -s http://localhost:8081/jobs/<job-id>/metrics

# List Paimon files
docker exec minio mc ls local/warehouse/cdc_db.db/
```
Complete reset:

```bash
docker compose down -v
docker compose up -d
./setup_cdc.sh
# Wait 2-3 minutes for full initialization
```
Built with: Apache Flink • Apache Paimon • Rill • DuckDB • MinIO