Commit 5dd0412: update README
1 parent 99607a4

1 file changed (+31, -69 lines)
  • change_sorting_key_landing_data_source
# Tinybird Versions - Change Sorting Key to a Landing Data Source

Changing the sorting key of a Data Source requires rewriting the entire table.

You can do it in either of two ways:

1. Create a Materialized View whose target Data Source has the desired sorting key.
2. Create a new Data Source with the new sorting key, sync the data with the original table, and then exchange both tables.

## 1. New Materialized View

- Create a new Data Source

`datasources/analytics_events_sk.datasource`

```diff
SCHEMA >
...
ENGINE_TTL "timestamp + toIntervalDay(60)"
```
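The elided middle of the diff above is where the actual key change lives. A hypothetical illustration of the kind of change involved (the column names here are assumptions, not taken from this project):

```diff
  ENGINE "MergeTree"
- ENGINE_SORTING_KEY "timestamp"
+ ENGINE_SORTING_KEY "session_id, timestamp"
  ENGINE_TTL "timestamp + toIntervalDay(60)"
```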
- Create a new Materialized View `sync_analytics_events.pipe`, filtering by a date in the future for backfilling purposes:

```
NODE sync_analytics_events
SQL >
    SELECT * FROM analytics_events
    WHERE timestamp > '2024-05-28 00:00:00'

TYPE MATERIALIZED
DATASOURCE analytics_events_sk
```
> Use a future timestamp to [avoid problems while doing the backfill](https://versions.tinybird.co/docs/version-control/backfill-strategies.html#the-challenge-of-backfilling-real-time-data).

- Create a Copy Pipe `backfill_analytics_events.pipe` for backfilling:

```
NODE backfill_analytics_events
SQL >
    %
    SELECT * FROM analytics_events
    WHERE timestamp BETWEEN {{ DateTime(start_backfill_timestamp) }} AND {{ DateTime(end_backfill_timestamp) }}

TYPE COPY
TARGET_DATASOURCE analytics_events_sk
COPY_SCHEDULE @on-demand
```
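The cutoff date makes the two pipes complementary: the Materialized View takes rows strictly after it, while the Copy Pipe's `BETWEEN` is inclusive on both ends, so a row stamped exactly at the cutoff is backfilled once and never materialized as well. A quick sketch of that routing logic, with hypothetical event timestamps (not from the project):

```python
from datetime import datetime

CUTOFF = datetime(2024, 5, 28)   # same cutoff used in both pipes
START = datetime(2024, 1, 1)     # hypothetical backfill start

def goes_to_mv(ts):
    # Materialized View predicate: timestamp > cutoff (strict inequality)
    return ts > CUTOFF

def goes_to_copy(ts):
    # Copy Pipe predicate: SQL BETWEEN is inclusive on both ends
    return START <= ts <= CUTOFF

# Three representative events: well before, exactly at, and just after the cutoff.
events = [datetime(2024, 3, 15, 12, 0, 0), CUTOFF, datetime(2024, 5, 28, 0, 0, 1)]
routes = [(goes_to_mv(e), goes_to_copy(e)) for e in events]
# Each event is handled by exactly one of the two paths.
```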
- Run CI and merge the Pull Request. To run the backfill, do the following (you can test it in CI and then run it after the merge):

  - Wait for the first event ingested in `analytics_events_sk`.
  - Run the backfill, either with a single command or in chunks:

```
tb pipe copy run backfill_analytics_events --param start_backfill_timestamp='2024-01-01 00:00:00' --param end_backfill_timestamp='2024-05-28 00:00:00' --wait --yes
```
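Running the whole range in one copy can be slow on large tables; splitting it into windows lets each run be retried on its own. A minimal sketch that generates one `tb` command per window (the 30-day window size and the 1-second gap between windows, which assumes second-precision `DateTime` timestamps, are choices of this sketch, not from the project):

```python
from datetime import datetime, timedelta

CMD = ("tb pipe copy run backfill_analytics_events"
       " --param start_backfill_timestamp='{start}'"
       " --param end_backfill_timestamp='{end}' --wait --yes")

def chunked_backfill_commands(start, end, window_days=30):
    """Split [start, end] into windows and emit one copy command per window.

    Because BETWEEN is inclusive on both ends, intermediate windows stop one
    second before the next window starts (assumes second-precision DateTime).
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    commands = []
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=window_days), end)
        window_end = nxt if nxt == end else nxt - timedelta(seconds=1)
        commands.append(CMD.format(start=cur.strftime(fmt),
                                   end=window_end.strftime(fmt)))
        cur = nxt
    return commands

cmds = chunked_backfill_commands(datetime(2024, 1, 1), datetime(2024, 5, 28))
# 5 windows covering 2024-01-01 00:00:00 .. 2024-05-28 00:00:00
```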
- Cleanup: once you finish the backfill, you can remove the Copy Pipe and start using the new `analytics_events_sk` in your downstream dependencies.

## 2. New Data Source with exchange

Follow the same steps as above, but once the backfill over `analytics_events_sk` has finished, you can exchange the old and new Data Sources:

```
tb datasource exchange analytics_events analytics_events_sk
```

The exchange command swaps two tables that have the exact same schema. It is experimental; contact us at `support@tinybird.co` to enable it on your main Workspace.
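Before relying on the exchanged table, it is worth checking that both Data Sources agree up to the cutoff. A minimal sketch assuming an injected `run_sql` callable (a stand-in for whatever client you use against the Tinybird SQL API; it is not a real library call):

```python
def sources_in_sync(run_sql, old="analytics_events", new="analytics_events_sk",
                    cutoff="2024-05-28 00:00:00"):
    """Compare row counts up to the cutoff; past it, both tables get the same stream."""
    query = "SELECT count() FROM {table} WHERE timestamp <= '{cutoff}'"
    return (run_sql(query.format(table=old, cutoff=cutoff))
            == run_sql(query.format(table=new, cutoff=cutoff)))

# Usage with a stub runner in place of a real client:
fake_counts = {"analytics_events": 1000, "analytics_events_sk": 1000}

def stub_run_sql(query):
    # Crude parse of the table name out of the generated query, for the stub only.
    table = query.split("FROM ")[1].split(" WHERE")[0]
    return fake_counts[table]

ok = sources_in_sync(stub_run_sql)
```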
