The Lakehouse connector is a Pulsar IO connector for synchronizing data between Lakehouse (Delta Lake, Iceberg and Hudi) and Pulsar. It contains two types of connectors:
Lakehouse source connector
Currently supports Delta Lake.
This source connector captures data changes from Delta Lake through DSR (Delta Standalone Reader) and writes them to Pulsar topics.
Lakehouse sink connector
Currently supports Delta Lake, Hudi, and Iceberg.
This sink connector consumes data from Pulsar topics and writes it into the Lakehouse. Users can then process the resulting table data further with other big-data engines.
Currently, Lakehouse connector versions (x.y.z) are based on Pulsar versions (x.y.z).
Lakehouse connector version | Pulsar version | Doc |
---|---|---|
2.9.x | 2.9.2 | Lakehouse source connector, Lakehouse sink connector |
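The version rule above can be checked mechanically before deploying a connector build. The following is a minimal sketch, not part of the connector: the class `VersionCompat` and its methods are hypothetical, and treating a shared `major.minor` prefix as "compatible" is an assumption drawn from the single table row (connector 2.9.x against Pulsar 2.9.2).

```java
/** Hypothetical helper for illustration only; it is not shipped with the connector. */
final class VersionCompat {
    private VersionCompat() {}

    /** Returns the "major.minor" prefix of a version such as "2.9.2" or "2.9.x". */
    static String majorMinor(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected at least major.minor: " + version);
        }
        return parts[0] + "." + parts[1];
    }

    /** Assumption: a connector and a Pulsar release line up when major.minor is equal. */
    static boolean sameReleaseLine(String connectorVersion, String pulsarVersion) {
        return majorMinor(connectorVersion).equals(majorMinor(pulsarVersion));
    }

    public static void main(String[] args) {
        System.out.println(sameReleaseLine("2.9.x", "2.9.2"));   // prints "true"
        System.out.println(sameReleaseLine("2.9.x", "2.10.0"));  // prints "false"
    }
}
```

Comparing the parsed prefix rather than raw strings keeps `2.9` distinct from `2.10`, which a naive `startsWith` check could get wrong.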
Lakehouse Demos
Lakehouse | Demo |
---|---|
Delta Lake | Delta Lake Source and Sink Demo |
Below are the subfolders and files of this project, with a description of each.
├── conf // stores configuration examples.
├── docs // stores user guides.
├── src // stores source code.
│ ├── checkstyle // stores checkstyle configuration files.
│ ├── license // stores license headers. You can use `mvn license:format` to format the project with the stored license header.
│ │ └── ALv2
│ ├── main // stores all main source files.
│ │ └── java
│ ├── spotbugs // stores spotbugs configuration files.
│ └── test // stores all related tests.
│
Requirements:
Compile and install without cloud dependency:
$ mvn clean install -DskipTests
Compile and install with cloud dependency (including aws, gcs, and azure):
$ mvn clean install -P cloud -DskipTests
Run Unit Tests:
$ mvn test
Run Individual Unit Test:
$ mvn test -Dtest=unit-test-name   # for example: -Dtest=ParquetReaderTest
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0