Commit fc240c6

Merge branch 'dev_doris_partition' of github.com:beyond-up/bitsail into dev_doris_partition

2 parents 37a4631 + 21ee69f

89 files changed: +920 −653 lines


README.md

Lines changed: 17 additions & 16 deletions
```diff
@@ -89,13 +89,20 @@ In data synchronization scenarios, it covers batch, streaming, and incremental d
 In the Runtime layer, it supports multiple execution modes, such as yarn, local, and k8s is under development
 
 ## Supported Connectors
+
 <table>
 <tr>
 <th>DataSource</th>
 <th>Sub Modules</th>
 <th>Reader</th>
 <th>Writer</th>
 </tr>
+<tr>
+<td>ClickHouse</td>
+<td>-</td>
+<td>✅</td>
+<td>-</td>
+</tr>
 <tr>
 <td>Doris</td>
 <td>-</td>
@@ -109,7 +116,7 @@ In the Runtime layer, it supports multiple execution modes, such as yarn, local,
 <td>✅</td>
 </tr>
 <tr>
-<td>ElasticSearch</td>
+<td>Elasticsearch</td>
 <td>-</td>
 <td> </td>
 <td>✅</td>
@@ -127,19 +134,19 @@ In the Runtime layer, it supports multiple execution modes, such as yarn, local,
 <td> </td>
 </tr>
 <tr>
-<td>Hive</td>
+<td>Hadoop</td>
 <td>-</td>
 <td>✅</td>
 <td>✅</td>
 </tr>
 <tr>
-<td>Hadoop</td>
+<td>HBase</td>
 <td>-</td>
 <td>✅</td>
 <td>✅</td>
 </tr>
 <tr>
-<td>Hbase</td>
+<td>Hive</td>
 <td>-</td>
 <td>✅</td>
 <td>✅</td>
@@ -177,6 +184,12 @@ In the Runtime layer, it supports multiple execution modes, such as yarn, local,
 <td>✅</td>
 <td>✅</td>
 </tr>
+<tr>
+<td>LarkSheet</td>
+<td>-</td>
+<td>✅</td>
+<td> </td>
+</tr>
 <tr>
 <td>MongoDB</td>
 <td>-</td>
@@ -201,18 +214,6 @@ In the Runtime layer, it supports multiple execution modes, such as yarn, local,
 <td> </td>
 <td>✅</td>
 </tr>
-<tr>
-<td>LarkSheet</td>
-<td>-</td>
-<td>✅</td>
-<td> </td>
-</tr>
-<tr>
-<td>Clickhouse</td>
-<td>-</td>
-<td>✅</td>
-<td> </td>
-</tr>
 </table>
 
 Documentation for [Connectors](website/en/documents/connectors/README.md).
```

README_zh.md

Lines changed: 18 additions & 17 deletions
```diff
@@ -72,14 +72,21 @@ BitSail目前已被广泛使用,并支持数百万亿的大流量场景。同时
 
 在Runtime层,支持多种执行模式,比如yarn、local,k8s在开发中
 
-## 支持连接器列表
+## 支持的连接器
+
 <table>
 <tr>
 <th>DataSource</th>
 <th>Sub Modules</th>
 <th>Reader</th>
 <th>Writer</th>
 </tr>
+<tr>
+<td>ClickHouse</td>
+<td>-</td>
+<td>✅</td>
+<td>-</td>
+</tr>
 <tr>
 <td>Doris</td>
 <td>-</td>
@@ -93,7 +100,7 @@ BitSail目前已被广泛使用,并支持数百万亿的大流量场景。同时
 <td>✅</td>
 </tr>
 <tr>
-<td>ElasticSearch</td>
+<td>Elasticsearch</td>
 <td>-</td>
 <td> </td>
 <td>✅</td>
@@ -111,19 +118,19 @@ BitSail目前已被广泛使用,并支持数百万亿的大流量场景。同时
 <td> </td>
 </tr>
 <tr>
-<td>Hive</td>
+<td>Hadoop</td>
 <td>-</td>
 <td>✅</td>
 <td>✅</td>
 </tr>
 <tr>
-<td>Hadoop</td>
+<td>HBase</td>
 <td>-</td>
 <td>✅</td>
 <td>✅</td>
 </tr>
 <tr>
-<td>Hbase</td>
+<td>Hive</td>
 <td>-</td>
 <td>✅</td>
 <td>✅</td>
@@ -161,6 +168,12 @@ BitSail目前已被广泛使用,并支持数百万亿的大流量场景。同时
 <td>✅</td>
 <td>✅</td>
 </tr>
+<tr>
+<td>LarkSheet</td>
+<td>-</td>
+<td>✅</td>
+<td> </td>
+</tr>
 <tr>
 <td>MongoDB</td>
 <td>-</td>
@@ -185,18 +198,6 @@ BitSail目前已被广泛使用,并支持数百万亿的大流量场景。同时
 <td> </td>
 <td>✅</td>
 </tr>
-<tr>
-<td>LarkSheet</td>
-<td>-</td>
-<td>✅</td>
-<td> </td>
-</tr>
-<tr>
-<td>Clickhouse</td>
-<td>-</td>
-<td>✅</td>
-<td> </td>
-</tr>
 </table>
 
 详情见:[Connectors详细文档](website/zh/documents/connectors/README.md).
```

website/en/documents/components/conversion/introduction.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -6,7 +6,7 @@ Parent document: [bitsail-components](../README.md)
 
 ## Content
 
-When ***BitSail*** transmits data to a specified data source, it needs to convert the intermediate format (`bitsail rows`) used in the transmission process into a data type acceptable to the data source.
+When **BitSail** transmits data to a specified data source, it needs to convert the intermediate format (`bitsail rows`) used in the transmission process into a data type acceptable to the data source.
 This module provides convenient tools for converting.
 
 - In this context, `bitsail rows` means `com.bytedance.bitsail.common.column.Column` data wrapped by `org.apache.flink.types.Row`
```

website/en/documents/components/format/introduction.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -6,7 +6,7 @@ Parent document: [bitsail-components](../README.md)
 
 ## Content
 
-When ***BitSail*** uses flink as the engine, it uses `flink rows` as intermediate format.
+When **BitSail** uses flink as the engine, it uses `flink rows` as intermediate format.
 So developers need to convert data from data source into `flink rows`.
 This module offers convenient methods to convert some kinds of data into `flink rows`.
 The specific supported formats are as follows:
```

website/en/documents/connectors/README.md

Lines changed: 13 additions & 11 deletions
```diff
@@ -6,22 +6,24 @@ dir:
 
 # Connectors
 
-***BitSail*** supports the following connectors for different data sources:
+**BitSail** supports the following connectors for different data sources:
 
+- [ClickHouse connector](clickhouse/clickhouse.md)
+- [Doris connector](doris/doris.md)
+- [Druid connector](druid/druid.md)
 - [Elasticsearch connector](elasticsearch/elasticsearch.md)
 - [FTP/SFTP connector](ftp/ftp.md)
+- [FTP/SFTP-v1 connector](ftp/v1/ftp-v1.md)
 - [Hadoop connector](hadoop/hadoop.md)
-- [Hive connector](hive/hive.md)
 - [HBase connector](hbase/hbase.md)
+- [Hive connector](hive/hive.md)
 - [Hudi connector](hudi/hudi.md)
+- [JDBC connector](jdbc/jdbc.md)
 - [Kafka connector](kafka/kafka.md)
-- [RocketMQ connector](rocketmq/rocketmq.md)
-- [Redis connector](redis/redis-v1.md)
-- [MongoDB connector](mongodb/mongodb.md)
-- [Doris connector](doris/doris.md)
-- [StreamingFile connector (Hdfs streaming connector)](StreamingFile/StreamingFile.md)
-- [Jdbc connector](Jdbc/jdbc.md)
-- [LarkSheet connector](larksheet/larksheet.md)
 - [Kudu connector](kudu/kudu.md)
-- [Druid connector](druid/druid.md)
-
+- [LarkSheet connector](larksheet/larksheet.md)
+- [MongoDB connector](mongodb/mongodb.md)
+- [Redis connector](redis/redis.md)
+- [Redis-v1 connector](redis/v1/redis-v1.md)
+- [RocketMQ connector](rocketmq/rocketmq.md)
+- [StreamingFile(HDFS streaming) connector](streamingfile/streamingfile.md)
```
website/en/documents/connectors/clickhouse/clickhouse-example.md (new file)

Lines changed: 85 additions & 0 deletions
# ClickHouse connector example

Parent document: [ClickHouse connector](./clickhouse.md)

## ClickHouse configuration

Suppose the ClickHouse service is configured as:

- JDBC URL: `jdbc:clickhouse://127.0.0.1:8123`

Account information:

- Username: default
- Password: 1234567

Target database and table:

- Database name: default
- Table name: test_ch_table

The table creation statement is:

```sql
CREATE TABLE IF NOT EXISTS `default`.`test_ch_table` (
  `id` Int64,
  `int_type` Int32,
  `double_type` Float64,
  `string_type` String,
  `p_date` Date
)
ENGINE=MergeTree
PARTITION BY toYYYYMM(p_date)
PRIMARY KEY id
```

Insert some test data:

```sql
INSERT INTO `default`.`test_ch_table`
(*)
VALUES
(1, 100001, 100.001, 'text_0001', '2020-01-01'),
(2, 100002, 100.002, 'text_0002', '2020-01-02')
```

## ClickHouse reader

Example task configuration to read the above ClickHouse table:

```json
{
  "job": {
    "reader": {
      "class": "com.bytedance.bitsail.connector.clickhouse.source.ClickhouseSource",
      "jdbc_url": "jdbc:clickhouse://127.0.0.1:8123",
      "user_name": "default",
      "password": "1234567",
      "db_name": "default",
      "table_name": "test_ch_table",
      "split_field": "id",
      "split_config": "{\"lower_bound\": 0, \"upper_bound\": 10000, \"split_num\": 3}",
      "sql_filter": "( id % 2 == 0 )",
      "columns": [
        {
          "name": "id",
          "type": "int64"
        },
        {
          "name": "int_type",
          "type": "int32"
        },
        {
          "name": "double_type",
          "type": "float64"
        },
        {
          "name": "string_type",
          "type": "string"
        },
        {
          "name": "p_date",
          "type": "date"
        }
      ]
    }
  }
}
```
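Given the `sql_filter` in the configuration above, the reader only fetches rows whose `id` is even. A quick way to preview what the job will read is to run the equivalent filter directly in ClickHouse; this is a hand-written sanity check, not SQL emitted by BitSail:

```sql
-- Manual preview of the rows the reader configuration above selects:
-- the sql_filter "( id % 2 == 0 )" is applied as a WHERE condition.
SELECT `id`, `int_type`, `double_type`, `string_type`, `p_date`
FROM `default`.`test_ch_table`
WHERE ( id % 2 == 0 );
-- Of the two test rows inserted earlier, only id = 2 matches.
```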
website/en/documents/connectors/clickhouse/clickhouse.md (new file)

Lines changed: 86 additions & 0 deletions
# ClickHouse connector

Parent document: [Connectors](../README.md)

The **BitSail** ClickHouse connector can be used to read data from ClickHouse. It mainly supports the following:

- Batch reading of ClickHouse tables
- JDBC driver version: 0.3.2-patch11

## Maven dependency

```xml
<dependency>
  <groupId>com.bytedance.bitsail</groupId>
  <artifactId>connector-clickhouse</artifactId>
  <version>${revision}</version>
</dependency>
```

## ClickHouse reader

### Supported data types

The following basic data types are supported:

- Int8
- Int16
- Int32
- Int64
- UInt8
- UInt16
- UInt32
- UInt64
- Float32
- Float64
- Decimal
- Date
- String

### Parameters

Read connector parameters are configured under `job.reader`; mind the path prefix when writing an actual configuration. Example parameter configuration:

```json
{
  "job": {
    "reader": {
      "class": "com.bytedance.bitsail.connector.clickhouse.source.ClickhouseSource",
      "jdbc_url": "jdbc:clickhouse://127.0.0.1:8123",
      "user_name": "default",
      "password": "1234567",
      "db_name": "default",
      "table_name": "test_ch_table",
      "split_field": "id",
      "split_config": "{\"lower_bound\": 0, \"upper_bound\": 10000, \"split_num\": 3}",
      "sql_filter": "( id % 2 == 0 )"
    }
  }
}
```

#### Required parameters

| Parameter name | Required | Optional value | Description |
|:---------------|:---------|:----------------------------------------------------------------------|:---------------------------------------|
| class          | yes      | `com.bytedance.bitsail.connector.clickhouse.source.ClickhouseSource`  | Class name of the ClickHouse read connector |
| jdbc_url       | yes      |                | JDBC connection address of ClickHouse   |
| db_name        | yes      |                | ClickHouse database to read             |
| table_name     | yes      |                | ClickHouse table to read                |

<!--AGGREGATE<br/>DUPLICATE-->

#### Optional parameters

| Parameter name         | Required | Optional value | Description |
|:-----------------------|:---------|:---------------|:---------------------------------------------------|
| user_name              | no       |                | Username for accessing the ClickHouse service |
| password               | no       |                | Password of the above user |
| split_field            | no       |                | Field used to split batch queries; only the integer types Int8–Int64 and UInt8–UInt32 are supported |
| split_config           | no       |                | Split configuration for batch queries on `split_field`, specifying the lower bound, upper bound, and number of splits, e.g. `{"lower_bound": 0, "upper_bound": 10000, "split_num": 3}` |
| sql_filter             | no       |                | Filter condition for the query, e.g. `( id % 2 == 0 )`; it is appended to the WHERE clause of the generated SQL |
| reader_parallelism_num | no       |                | Parallelism of the ClickHouse reader |
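To make the split semantics concrete, the sketch below shows one plausible set of range queries implied by `split_field` = `id`, `split_config` = `{"lower_bound": 0, "upper_bound": 10000, "split_num": 3}`, and `sql_filter` = `( id % 2 == 0 )`. It illustrates the documented behavior (each split scans one sub-range of the bounded field, with the filter appended to every WHERE clause); the exact boundary arithmetic the connector uses may differ:

```sql
-- Hypothetical split queries: the [0, 10000] id range divided into 3 sub-ranges,
-- each executed as an independent batch query, with sql_filter appended.
SELECT * FROM `default`.`test_ch_table` WHERE id >= 0    AND id < 3334   AND ( id % 2 == 0 );
SELECT * FROM `default`.`test_ch_table` WHERE id >= 3334 AND id < 6667   AND ( id % 2 == 0 );
SELECT * FROM `default`.`test_ch_table` WHERE id >= 6667 AND id <= 10000 AND ( id % 2 == 0 );
```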
84+
## Related documents
85+
86+
Configuration example: [ClickHouse connector example](./clickhouse-example.md)
