Some SQLMesh models query "external" tables that were created by code unavailable to SQLMesh. External models are a special model kind used to store metadata about those external tables.

External models allow SQLMesh to provide column-level lineage and data type information for external tables queried with `SELECT *`.
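
To make the benefit concrete, the following Python sketch shows how known column metadata allows a `SELECT *` to be rewritten with an explicit column list, which is the kind of information column-level lineage relies on. It is illustrative only and does not reproduce SQLMesh's implementation; the `EXTERNAL_SCHEMAS` mapping and the `expand_star` helper are hypothetical names, and the table and column values are example data.

```python
# Hypothetical metadata of the kind an external model stores:
# table name -> {column name: data type}.
EXTERNAL_SCHEMAS = {
    "external_db.external_table": {"column_a": "int", "column_b": "text"},
}


def expand_star(table: str, schemas: dict) -> str:
    """Rewrite `SELECT * FROM table` with explicit columns when the schema is known."""
    columns = schemas.get(table)
    if columns is None:
        # Without metadata the star cannot be resolved to column names.
        return f"SELECT * FROM {table}"
    return f"SELECT {', '.join(columns)} FROM {table}"


print(expand_star("external_db.external_table", EXTERNAL_SCHEMAS))
print(expand_star("external_db.unknown_table", EXTERNAL_SCHEMAS))
```

The second call shows the fallback: a table without metadata keeps its unresolved star, so no per-column information can be derived for it.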

## Generating external models schema
An external model consists of an external table's schema information, stored in the `schema.yaml` file in the SQLMesh project's root directory.

You can add schema information to the file by either (i) writing the YAML by hand or (ii) letting SQLMesh query the external table and add the information itself with the `create_external_models` CLI command.

Consider this example `FULL` model that queries an external table `external_db.external_table`:

```sql
MODEL (
  name my_db.my_table,
  kind FULL
);

SELECT
  *
FROM
  external_db.external_table;
```

The following sections demonstrate how to create an external model containing metadata about `external_db.external_table`, which contains columns `column_a` and `column_b`.

### Writing YAML by hand
This example demonstrates how the `schema.yaml` file should be formatted.

```yaml
- name: external_db.external_table
  description: An external table
  columns:
    column_a: int
    column_b: text
```

All the external models in a SQLMesh project are stored in one `schema.yaml` file. With an additional external model, the file might look like this:

```yaml
- name: external_db.external_table
  description: An external table
  columns:
    column_a: int
    column_b: text
- name: external_db.external_table_2
  description: Another external table
  columns:
    column_c: bool
    column_d: float
```
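
If the entries are produced by a script rather than typed by hand, the layout shown above can be generated from plain Python data. The sketch below is a minimal illustration, not part of SQLMesh: `render_external_models` is a hypothetical helper, it performs no quoting or escaping, and a real project would more likely use a YAML library.

```python
def render_external_models(models: list) -> str:
    """Render external model entries as a YAML list of mappings."""
    lines = []
    for model in models:
        lines.append(f"- name: {model['name']}")
        if "description" in model:
            lines.append(f"  description: {model['description']}")
        lines.append("  columns:")
        # Each column is written as `column_name: data_type`.
        for column, data_type in model["columns"].items():
            lines.append(f"    {column}: {data_type}")
    return "\n".join(lines) + "\n"


models = [
    {
        "name": "external_db.external_table",
        "description": "An external table",
        "columns": {"column_a": "int", "column_b": "text"},
    },
    {
        "name": "external_db.external_table_2",
        "description": "Another external table",
        "columns": {"column_c": "bool", "column_d": "float"},
    },
]

print(render_external_models(models), end="")
```

Keeping the models in one list mirrors the single-file layout: every external model becomes one top-level list item.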

### Using the create_external_models CLI command
Instead of writing the external model YAML by hand, you can have SQLMesh create it for you with the [create_external_models](../../../reference/cli#create_external_models) CLI command.

The command locates all external tables queried in your SQLMesh project, executes the queries, and infers the tables' column names and types from the results.

It then writes that information to the `schema.yaml` file.
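
The general idea of collecting column names and types from a live database can be sketched with Python's built-in `sqlite3` module. This is an analogy under stated assumptions, not SQLMesh's implementation: it reads SQLite's table metadata instead of executing model queries, and `infer_columns` and the in-memory `external_table` are hypothetical names created for the example.

```python
import sqlite3


def infer_columns(conn: sqlite3.Connection, table: str) -> dict:
    """Return {column name: declared type} for a table, in column order."""
    # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {row[1]: row[2].lower() for row in rows}


# Stand-in for a table created by code outside the project.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE external_table (column_a INT, column_b TEXT)")

# Shape the result like one external model entry.
entry = {"name": "external_table", "columns": infer_columns(conn, "external_table")}
print(entry)
```

The resulting dictionary has the same name-plus-columns shape as a hand-written entry, which is why the generated and manual approaches can share one file.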

docs/concepts/models/model_kinds.md: 9 additions, 6 deletions

@@ -8,7 +8,7 @@ Models of the `INCREMENTAL_BY_TIME_RANGE` kind are computed incrementally based

Only missing time intervals are processed during each execution for `INCREMENTAL_BY_TIME_RANGE` models. This is in contrast to the [FULL](#full) model kind, where the entire dataset is recomputed every time the model is executed.

-An `INCREMENTAL_BY_TIME_RANGE` model query must contain an expression in its SQL `WHERE` clause that filters the upstream records by time range. SQLMesh provides special macros that represent the start and end of the time range being processed: `@start_date` / `@end_date` and `@start_ds` / `@end_ds`.
+An `INCREMENTAL_BY_TIME_RANGE` model query must contain an expression in its SQL `WHERE` clause that filters the upstream records by time range. SQLMesh provides special macros that represent the start and end of the time range being processed: `@start_date` / `@end_date` and `@start_ds` / `@end_ds`.

Refer to [Macros](../macros.md#predefined-variables) for more information.

@@ -30,7 +30,7 @@ WHERE
```

### Time column
-SQLMesh needs to know which column in the model's output represents the timestamp or date associated with each record.
+SQLMesh needs to know which column in the model's output represents the timestamp or date associated with each record.

The `time_column` is used to determine which records will be overridden during data [restatement](../plans.md#restatement-plans) and provides a partition key for engines that support partitioning (such as Apache Spark):

@@ -85,7 +85,7 @@ WHERE
```

### Idempotency
-It is recommended that queries of models of this kind are [idempotent](../glossary.md#idempotency) to prevent unexpected results during data [restatement](../plans.md#restatement-plans).
+It is recommended that queries of models of this kind are [idempotent](../glossary.md#idempotency) to prevent unexpected results during data [restatement](../plans.md#restatement-plans).

Note, however, that upstream models and tables can impact a model's idempotency. For example, referencing an upstream model of kind [FULL](#full) in the model query automatically causes the model to be non-idempotent.

@@ -167,7 +167,7 @@ Depending on the target engine, models of the `INCREMENTAL_BY_UNIQUE_KEY` kind a
| DuckDB | not supported |

## FULL
-Models of the `FULL` kind cause the dataset associated with a model to be fully refreshed (rewritten) upon each model evaluation.
+Models of the `FULL` kind cause the dataset associated with a model to be fully refreshed (rewritten) upon each model evaluation.

The `FULL` model kind is somewhat easier to use than incremental kinds due to the lack of special settings or additional query considerations. This makes it suitable for smaller datasets, where recomputing data from scratch is relatively cheap and doesn't require preservation of processing history. However, using this kind with datasets containing a large volume of records will result in significant runtime and compute costs.

@@ -201,7 +201,7 @@ Depending on the target engine, models of the `FULL` kind are materialized using
| DuckDB | CREATE OR REPLACE TABLE |

## VIEW
-The model kinds described so far cause the output of a model query to be materialized and stored in a physical table.
+The model kinds described so far cause the output of a model query to be materialized and stored in a physical table.

The `VIEW` kind is different, because no data is actually written during model execution. Instead, a non-materialized view (or "virtual table") is created or replaced based on the model's query.

@@ -222,7 +222,7 @@ FROM db.employees;
```

## EMBEDDED
-Embedded models are a way to share common logic between different models of other kinds.
+Embedded models are a way to share common logic between different models of other kinds.

There are no data assets (tables or views) associated with `EMBEDDED` models in the data warehouse. Instead, an `EMBEDDED` model's query is injected directly into the query of each downstream model that references it.

@@ -240,3 +240,6 @@ FROM db.employees;

## SEED
The `SEED` model kind is used to specify [seed models](./seed_models.md) for using static CSV datasets in your SQLMesh project.
+
+## EXTERNAL
+The `EXTERNAL` model kind is used to specify [external models](./external_models.md) that store metadata about external tables. External models are special; they are not specified in `.sql` files like the other model kinds. They are optional but useful for propagating column and type information for external tables queried in your SQLMesh project.