# Tinybird Versions - Change Sorting Key to a Landing Data Source

Changing the sorting key of a Data Source requires rewriting the entire table.

You can do this in either of two ways:

1. Create a Materialized View where the target Data Source has the desired sorting key.
2. Create a new Data Source with the new sorting key, sync the data with the original table, and then exchange both tables.

## 1. New Materialized View

- Create a new Data Source:

`datasources/analytics_events_sk.datasource`

```diff
SCHEMA >
    ...

ENGINE_TTL "timestamp + toIntervalDay(60)"
```
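
For reference, here is a minimal sketch of what the full file could look like. The column list and the new `ENGINE_SORTING_KEY` value are assumptions (loosely based on Tinybird's web analytics starter kit), so adapt them to your actual schema; everything except the sorting key should match the original `analytics_events` Data Source:

```
SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `session_id` String `json:$.session_id`,
    `action` LowCardinality(String) `json:$.action`,
    `version` LowCardinality(String) `json:$.version`,
    `payload` String `json:$.payload`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "session_id, timestamp"
ENGINE_TTL "timestamp + toIntervalDay(60)"
```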

- Create a new Materialized View `sync_analytics_events.pipe` that filters by a date in the future. The future timestamp marks the handover point: the Materialized View syncs everything after it and the backfill copies everything before it, so the two never overlap.

```
NODE sync_analytics_events
SQL >
    SELECT * FROM analytics_events
    WHERE timestamp > '2024-05-28 00:00:00'

TYPE MATERIALIZED
DATASOURCE analytics_events_sk
```

- Create a Copy Pipe `backfill_analytics_events.pipe` for backfilling:

```
NODE backfill_analytics_events
SQL >
    %
    SELECT * FROM analytics_events
    WHERE timestamp BETWEEN {{ DateTime(start_backfill_timestamp) }} AND {{ DateTime(end_backfill_timestamp) }}

TYPE COPY
TARGET_DATASOURCE analytics_events_sk
COPY_SCHEDULE @on-demand
```

- Run CI and merge the Pull Request. To run the backfill, do the following (you can test it in CI first and then run it again after the merge):
- Wait for the first event to be ingested in `analytics_events_sk`.
- Run the backfill, either with a single command or in chunks (a chunked sketch follows the command below):

```
tb pipe copy run backfill_analytics_events --param start_backfill_timestamp='2024-01-01 00:00:00' --param end_backfill_timestamp='2024-05-28 00:00:00' --wait --yes
```

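If the table is large, you can run the same Copy Pipe once per time window instead. A sketch, splitting the range at an arbitrary boundary; note that `BETWEEN` is inclusive on both ends, so offset adjacent boundaries to avoid copying boundary events twice:

```
# Assumed chunk boundaries; pick windows that suit your data volume
tb pipe copy run backfill_analytics_events --param start_backfill_timestamp='2024-01-01 00:00:00' --param end_backfill_timestamp='2024-02-29 23:59:59' --wait --yes
tb pipe copy run backfill_analytics_events --param start_backfill_timestamp='2024-03-01 00:00:00' --param end_backfill_timestamp='2024-05-28 00:00:00' --wait --yes
```
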
- Cleanup

Once you finish the backfill, you can remove the Copy Pipe and start using the new `analytics_events_sk` in your downstream dependencies.

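If the project lives in Git, deleting `backfill_analytics_events.pipe` in a follow-up Pull Request is enough. Working directly against a Workspace, the CLI can drop it; a sketch, assuming the `tb pipe rm` command available in recent CLI versions:

```
# Remove the backfill Copy Pipe once the data is migrated
tb pipe rm backfill_analytics_events --yes
```
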
## 2. New Data Source with exchange

Follow the same steps as above, but once the backfill over `analytics_events_sk` has finished, exchange the old and new Data Sources:

```
tb datasource exchange analytics_events analytics_events_sk
```

The exchange command swaps two tables that share the exact same schema. It is experimental; contact us at `support@tinybird.co` to enable it on your main Workspace.
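
Before exchanging (and before switching dependencies in option 1), it is worth checking that both Data Sources hold the same data. A minimal sanity check with `tb sql`, comparing row counts:

```
# Both counts should match once the backfill and the Materialized View have caught up
tb sql "SELECT count() FROM analytics_events"
tb sql "SELECT count() FROM analytics_events_sk"
```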