This example shows how to move data from PostgreSQL to YTSaurus in two main modes:

- Snapshot mode, via static tables
- Replication (CDC) mode, via sorted dynamic tables

It also includes an end-to-end Docker Compose setup that replicates changes from Postgres to YTSaurus in real time (CDC).
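The following is a minimal sketch of what the two modes listed above do to the target table. It models a table as a Python dict keyed by primary key. The event tuples (`kind`, `key`, `row`) and the helper names are hypothetical simplifications, not Transfer's actual data model. A snapshot copies the current rows once. CDC applies each insert, update, and delete in order and treats inserts and updates as upserts, which is roughly how a sorted dynamic table keyed by the primary key absorbs changes.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Dict[str, object]
# Hypothetical change event: (kind, primary key, new row or None for deletes)
Event = Tuple[str, int, Optional[Row]]


def snapshot(source: Dict[int, Row]) -> Dict[int, Row]:
    """Snapshot mode: copy the current state once (like writing a static table)."""
    return {key: dict(row) for key, row in source.items()}


def apply_cdc(target: Dict[int, Row], events: Iterable[Event]) -> Dict[int, Row]:
    """CDC mode: apply change events in order, keyed by primary key.

    Inserts and updates are both treated as upserts, mirroring how a sorted
    dynamic table keyed by the primary key can absorb changes.
    """
    for kind, key, row in events:
        if kind in ("insert", "update"):
            target[key] = dict(row)  # upsert the latest version of the row
        elif kind == "delete":
            target.pop(key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return target


if __name__ == "__main__":
    pg = {1: {"id": 1, "name": "alice"}}
    yt = snapshot(pg)  # initial one-time copy
    changes = [
        ("insert", 2, {"id": 2, "name": "bob"}),
        ("update", 1, {"id": 1, "name": "alice2"}),
        ("delete", 2, None),
    ]
    apply_cdc(yt, changes)
    print(yt)  # {1: {'id': 1, 'name': 'alice2'}}
```

A snapshot alone would leave `yt` frozen at `alice`. The CDC pass is what keeps the copy in step with later changes.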
```mermaid
graph LR
    subgraph Source
        A[Postgres]
    end
    subgraph Load_Generation
        B[Load Generator]
    end
    subgraph TRCLI
        C[Replication from PG]
    end
    subgraph Destination
        D[YTSaurus]
    end
    B -- Generate random CRUD load --> A
    A -- CRUD Operations --> C
    C -- Replicates Data --> D
    classDef source fill:#dff,stroke:#000,stroke-width:2px,rx:5px,ry:5px;
    classDef load fill:#ffefaa,stroke:#000,stroke-width:2px,rx:5px,ry:5px;
    classDef replication fill:#aaf,stroke:#000,stroke-width:2px,rx:5px,ry:5px;
    classDef destination fill:#afa,stroke:#000,stroke-width:2px,rx:5px,ry:5px;
    class A source
    class B load
    class C replication
    class D destination
```
- Postgres: a Postgres instance used as the source of data changes.
  - Database: `testdb`
  - User: `testuser`
  - Password: `testpassword`
  - Initialization: data is seeded using `init.sql`.
- Transfer CLI: a Go-based application that replicates changes from Postgres to YTSaurus.
  - Configuration: reads changes from Postgres and writes them to YTSaurus tables.
- YTSaurus: an open-source big data platform for distributed storage and processing.
  - Access URL: http://localhost:9981 (web UI)
- Load Generator: a CRUD load generator that runs operations against the Postgres database, which in turn produce CDC events.
Prerequisites: Docker and Docker Compose installed on your machine.
To run the example:

- Clone the repository:
  ```shell
  git clone https://github.com/doublecloud/transfer
  cd transfer/examples/pg2yt
  ```
- Build and run the Docker Compose setup:
  ```shell
  docker-compose up --build
  ```
- Access YTSaurus: open the web UI at http://localhost:9981 in your browser to view the resulting tables.
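The YTSaurus containers can take a while to start, so opening the UI immediately after `docker-compose up` may fail. The helper below is a minimal sketch that uses only the Python standard library. It polls the web UI address from this example until it answers. The function name and the timeout values are illustrative choices, not part of the example.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll `url` until it returns any HTTP response or `timeout_s` expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s):
                return True  # got a 2xx/3xx response
        except urllib.error.HTTPError:
            return True  # the server is up, even if it answered 4xx/5xx
        except OSError:
            pass  # not reachable yet (connection refused, reset, or timed out)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_for_http("http://localhost:9981")` returns `True` once the UI responds. It returns `False` if the UI is still unreachable after two minutes.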
Once the Docker containers are running:

- You can start performing CRUD operations on the Postgres database. The `load_gen` service simulates these operations.
- The `transfer` CLI listens for changes in the Postgres database and replicates them to YTSaurus.
- You can monitor the changes in the YTSaurus web UI.
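To illustrate the kind of workload `load_gen` produces, here is a minimal sketch of a random CRUD generator for the `users` table. The column layout (`id`, `name`) is a hypothetical simplification, and this is not the actual `load_gen` code. The function only builds parameterized statements, using the `%s` placeholder style that DB-API drivers such as psycopg accept. It tracks live ids so that updates and deletes always target rows that exist.

```python
import random
from typing import List, Tuple

Statement = Tuple[str, tuple]


def generate_crud_load(n: int, seed: int = 0) -> List[Statement]:
    """Produce n random INSERT/UPDATE/DELETE statements for a `users` table.

    The (id, name) layout is a hypothetical simplification of the real table.
    """
    rng = random.Random(seed)  # seeded, so the same seed gives the same load
    live_ids: List[int] = []  # ids inserted and not yet deleted
    next_id = 1
    statements: List[Statement] = []
    for _ in range(n):
        # Insert when there is nothing to update or delete yet.
        op = rng.choice(["insert", "update", "delete"]) if live_ids else "insert"
        if op == "insert":
            statements.append(
                ("INSERT INTO users (id, name) VALUES (%s, %s)", (next_id, f"user_{next_id}"))
            )
            live_ids.append(next_id)
            next_id += 1
        elif op == "update":
            uid = rng.choice(live_ids)
            new_name = f"user_{uid}_v{rng.randint(1, 1000)}"
            statements.append(("UPDATE users SET name = %s WHERE id = %s", (new_name, uid)))
        else:
            uid = live_ids.pop(rng.randrange(len(live_ids)))
            statements.append(("DELETE FROM users WHERE id = %s", (uid,)))
    return statements
```

With a driver such as psycopg, each pair would be run as `cursor.execute(sql, params)`. Every committed statement then becomes a change event for the `transfer` CLI to pick up.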
Configuration files:

- `transfer_cdc_embed.yaml`: source (Postgres) and destination (YTSaurus) settings for the CDC transfer that runs inside Docker Compose.
- `transfer_dynamic.yaml`: configuration of the CDC transfer for running outside Docker Compose.
- `transfer_static.yaml`: snapshot-only configuration that delivers a one-time copy to static tables.
Once Docker Compose is up and running, you will see the main YTSaurus page.
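The CDC configuration defines the destination as follows. The `path` key sets the target directory:

```yaml
dst:
  type: yt
  # params holds a JSON string; "path" is the target directory in YTSaurus
  params: |
    {
      "path": "//home/cdc",
      "cluster": "yt-backend:80",
      "cellbundle": "default",
      "primarymedium": "default"
    }
```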
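Because `params` is a YAML block scalar that holds JSON, its contents must be strict JSON. For example, a `#` comment inside the braces makes the JSON invalid. The sketch below uses hypothetical helper names. It parses that string, checks for the keys this example relies on, and joins `path` with a source table name. The join matches the `users` → `//home/cdc/users` mapping shown in this example. Naming for tables from other Postgres schemas is not covered here and may differ.

```python
import json
from typing import Dict

# Assumption: the minimal keys this example depends on.
REQUIRED_KEYS = ("path", "cluster")


def parse_yt_params(params: str) -> Dict[str, str]:
    """Parse the JSON string stored under dst.params and check required keys."""
    cfg = json.loads(params)  # raises json.JSONDecodeError on invalid JSON, e.g. '#' comments
    if not isinstance(cfg, dict):
        raise ValueError("dst.params must be a JSON object")
    missing = [k for k in REQUIRED_KEYS if k not in cfg]
    if missing:
        raise ValueError(f"missing keys in dst.params: {missing}")
    return cfg


def target_table_path(cfg: Dict[str, str], table: str) -> str:
    """Join the configured target directory and a source table name."""
    return cfg["path"].rstrip("/") + "/" + table
```

Applied to the configuration above, `target_table_path(parse_yt_params(params), "users")` yields `//home/cdc/users`.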
The transfer creates its tables inside the `//home/cdc` directory. There you can see two tables:

- `//home/cdc/__data_transfer_lsn`: a system table used to track snapshot LSNs so that data can be deduplicated after a failure.
- `//home/cdc/users`: the actual table replicated from Postgres.
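The general idea behind LSN-based deduplication is to remember the highest log sequence number already applied. After a restart, replayed events at or below that checkpoint are skipped. The following is an illustrative sketch of that idea only. Transfer's actual bookkeeping in `__data_transfer_lsn` may work differently. The `Sink` class and the event shape are hypothetical, and the sketch assumes every event carries a strictly increasing LSN.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical change event: (lsn, primary key, new row or None for a delete)
Event = Tuple[int, int, Optional[dict]]


@dataclass
class Sink:
    rows: Dict[int, dict] = field(default_factory=dict)
    checkpoint_lsn: int = 0  # highest LSN applied; a real system would persist this

    def apply(self, events: List[Event]) -> int:
        """Apply events in order, skipping any at or below the checkpoint.

        Returns the number of events actually applied.
        """
        applied = 0
        for lsn, key, row in events:
            if lsn <= self.checkpoint_lsn:
                continue  # already applied before a restart: drop the duplicate
            if row is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = row
            self.checkpoint_lsn = lsn
            applied += 1
        return applied
```

If a crash happens after LSN 20 was applied and the source replays from LSN 10, only events past 20 change the table.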
The `users` table contains all the data, transferred automatically and updated in real time.
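To check that the replicated table matches the source, you can export rows from both sides and compare them by primary key. Fetching the rows is out of scope here. The sketch assumes an `id` key column and uses hypothetical function names. It compares only the columns present in the source row, so any extra columns on the YTSaurus side are ignored. While the load generator is running, expect transient differences caused by replication lag.

```python
from typing import Dict, Iterable, List


def index_by_key(rows: Iterable[dict], key: str = "id") -> Dict[object, dict]:
    """Map primary-key value -> row."""
    return {row[key]: row for row in rows}


def diff_tables(source: Iterable[dict], target: Iterable[dict], key: str = "id") -> Dict[str, List]:
    """Report keys missing in the target, extra in the target, or with differing values."""
    src = index_by_key(source, key)
    dst = index_by_key(target, key)

    def differs(k) -> bool:
        # Compare only the source's columns; extra target columns are ignored.
        return any(dst[k].get(col) != val for col, val in src[k].items())

    return {
        "missing_in_target": sorted(k for k in src if k not in dst),
        "extra_in_target": sorted(k for k in dst if k not in src),
        "mismatched": sorted(k for k in src if k in dst and differs(k)),
    }
```

An empty result in all three lists means the snapshot of the target matched the source at the moment both exports were taken.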
To stop the Docker containers, run:

```shell
docker-compose down
```
This example provides a complete end-to-end CDC solution using Postgres, YTSaurus, and the Transfer application. You can use it to demonstrate how data is replicated from a relational database to YTSaurus for real-time processing.