Demo CDC pipeline: PostgreSQL (OLTP) → Debezium → Kafka → ClickHouse (OLAP).
Spring Boot generator melempar transaksi e-commerce ke Postgres, Debezium menangkap perubahan via logical replication, lalu ClickHouse meng-consume topik Kafka pakai Kafka engine + materialized view.
flowchart LR
APP[Spring Boot<br/>data generator]
PG[(PostgreSQL<br/>OLTP)]
DBZ[Debezium<br/>Kafka Connect]
K[(Apache Kafka<br/>shop.public.*)]
CH[(ClickHouse<br/>OLAP)]
APP -->|INSERT / UPDATE| PG
PG -->|logical replication<br/>via WAL + pgoutput| DBZ
DBZ -->|publish JSON<br/>+ unwrap SMT| K
K -->|Kafka engine table<br/>+ Materialized View| CH
- Podman 4+ (atau
podman-compose) — semua command pakaipodman compose curl,jq(opsional, untuk register connector)psqlclient (opsional)- Java 21 + Maven hanya kalau mau run generator dari host (
make generator-host); kalau via compose tidak perlu
# 1. Naikkan stack (Postgres, Kafka, Connect, ClickHouse, Generator)
# Pertama kali akan build image generator dari Dockerfile (~1-2 menit).
make up
# 2. Tunggu sampai Connect siap (~30 detik), lalu register connector
make register
# 3. Cek status connector — pastikan state RUNNING
make status
# 4. Pantau generator (sudah jalan otomatis di container)
make generator-logs
# 5. Buka ClickHouse client (terminal baru) dan jalankan query analytics
make chKalau code generator berubah, rebuild image-nya:
make rebuildGenerator juga punya UI terminal-style untuk eksplorasi data OLTP secara real-time. Buka di browser:
http://localhost:8080
Fitur:
- Top navigation: Customers / Products / Orders + shortcut tombol
1/2/3 - Pagination (
← / →arrow keys), pilih page size 25/50/100/250/500 - Sort by
createdAt(asc/desc) — atau bebas pakai query param - Auto-refresh tiap 5 detik di page pertama (data streaming dari
@Scheduledtick) - Live count badge per tab — flash kuning kalau ada baris baru
- Status pill warna-warni untuk order status (PLACED / PAID / SHIPPED / DELIVERED / CANCELLED)
Backend yang dipakai UI di atas (port 8080):
GET /api/customers
GET /api/products
GET /api/orders
Semua endpoint pakai pagination Spring Data (default size=100, sort createdAt DESC):
# default — 100 row terbaru
curl -s http://localhost:8080/api/orders | jq
# halaman ke-2, 50 row per halaman
curl -s "http://localhost:8080/api/orders?page=1&size=50" | jq
# sort ascending
curl -s "http://localhost:8080/api/customers?sort=createdAt,asc" | jq
# multi-sort
curl -s "http://localhost:8080/api/products?sort=category,asc&sort=price,desc" | jqResponse berbentuk Page<T> (Spring Data) — body utama di content[], metadata pagination di totalElements, totalPages, number, size.
Stop tanpa hapus data:
make downReset semua volume:
make cleanReplacingMergeTree tidak deduplikasi otomatis sampai background merge — gunakan modifier FINAL untuk view real-time, dan filter is_deleted = 0.
Top 5 produk by revenue:
SELECT
p.name AS product,
p.category AS category,
sum(oi.quantity) AS units_sold,
sum(oi.quantity * oi.unit_price) AS revenue
FROM order_items AS oi FINAL
JOIN products AS p FINAL ON p.id = oi.product_id
WHERE oi.is_deleted = 0 AND p.is_deleted = 0
GROUP BY p.name, p.category
ORDER BY revenue DESC
LIMIT 5;Order count per status:
SELECT status, count() AS jumlah
FROM orders FINAL
WHERE is_deleted = 0
GROUP BY status
ORDER BY jumlah DESC;Revenue per kota (join orders + customers + order_items):
SELECT
c.city AS city,
count(DISTINCT o.id) AS total_orders,
sum(oi.quantity * oi.unit_price) AS revenue
FROM order_items AS oi FINAL
JOIN orders AS o FINAL ON o.id = oi.order_id
JOIN customers AS c FINAL ON c.id = o.customer_id
WHERE o.is_deleted = 0 AND c.is_deleted = 0 AND oi.is_deleted = 0
GROUP BY c.city
ORDER BY revenue DESC
LIMIT 10;Order per menit (lihat real-time ingestion):
SELECT
toStartOfMinute(created_at) AS minute,
count() AS orders,
sum(total_amount) AS revenue
FROM orders FINAL
WHERE is_deleted = 0
GROUP BY minute
ORDER BY minute DESC
LIMIT 20;1. Cek topic Kafka terbentuk setelah register connector:
make topics
# harus muncul: shop.public.customers, shop.public.products, shop.public.orders, shop.public.order_items2. Cek isi tabel Postgres:
make psql
\dt
SELECT count(*) FROM orders;3. Cek data sampai di ClickHouse:
make ch
SELECT count() FROM orders;
SELECT count() FROM order_items;.
├── podman-compose.yml # Stack: postgres, kafka, connect, clickhouse
├── Makefile # Shortcut command
├── postgres/init.sql # Schema + REPLICA IDENTITY FULL + publication
├── clickhouse/init.sql # Kafka source tables + ReplacingMergeTree + MV
├── debezium/postgres-connector.json # Source connector dengan unwrap SMT
└── data-generator/ # Spring Boot 3 + JPA + Datafaker
├── pom.xml
└── src/main/...
- REPLICA IDENTITY FULL — supaya UPDATE/DELETE event di Debezium membawa
beforestate lengkap (bukan cuma PK). Cost-nya WAL lebih besar; untuk production biasanya pakaiDEFAULT+ PK saja. unwrapSMT — membuat payload jadi flat (langsung field-field tabel) sehingga ClickHouseJSONEachRowgampang parse. Dengandelete.handling.mode=rewritesetiap pesan punya kolom__deleted('true'/'false') alih-alih tombstone.time.precision.mode=connect— semua timestamp dikirim sebagai epoch millis (Int64), jadi ClickHouse pakaifromUnixTimestamp64Milliuntuk konversi.decimal.handling.mode=double— DECIMAL Postgres dikirim sebagai JSON number; di-cast keDecimal64(2)di MV. Untuk presisi penuh, ganti kestringdan parse manual.- ReplacingMergeTree(updated_at) — versi terbaru per primary key menang. Background merge mendeduplikasi; query ad-hoc pakai
FINALatauargMax(field, updated_at). - Kafka KRaft mode — tanpa Zookeeper, image apache/kafka 3.8.0.
- Connector REGISTRATION failed: replication slot already exists →
make cleanlalumake upulang, atau drop slot manual:SELECT pg_drop_replication_slot('debezium_shop'); - ClickHouse tidak menerima data → cek connector status
make status. Pastikan topik Kafka ada (make topics). Cek error:podman logs olap-clickhouse | grep -i kafka. - Generator gagal connect Postgres → host port di-set ke
15432(container internal tetap 5432) untuk menghindari bentrok dengan Postgres lokal di 5432 maupun port dev umum lain. Kalau 15432 juga sudah dipakai, ganti dipodman-compose.yml+data-generator/src/main/resources/application.yml+Makefile.