@@ -75,32 +75,136 @@ pipeline = dlt.pipeline(
 pipeline.run(source)
 ```
 
-## Learn
+## Supported features
+
+### Data loading
+
+Data is loaded into CrateDB using the most efficient method for the given data source.
+
+- For local files, the `psycopg2` library is used to load data directly into
+  CrateDB tables using the `INSERT` command, as sketched after this list.
+- For files in remote storage like S3 or Azure Blob Storage,
+  CrateDB data loading functions are used to read the files and insert the data into tables.
+
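+A minimal sketch of the direct loading path, assuming the `dlt-cratedb`
+adapter is installed and credentials for the `cratedb` destination are
+configured; pipeline, dataset, and table names are illustrative:
+
+```python
+import dlt
+
+# Small local/in-memory datasets take the direct path: dlt renders
+# INSERT statements and executes them against CrateDB via psycopg2.
+pipeline = dlt.pipeline(
+    pipeline_name="direct_load",
+    destination="cratedb",
+    dataset_name="doc",
+)
+
+info = pipeline.run(
+    [{"id": 1, "name": "apple"}, {"id": 2, "name": "banana"}],
+    table_name="fruits",
+)
+print(info)
+```
+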
+### Datasets
+
+Use `dataset_name="doc"` to address CrateDB's default schema `doc`.
+When addressing other schemas, make sure they contain at least one table.[^create-schema]
+
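+For example, a hedged sketch that addresses a non-default schema; the
+name `analytics` is illustrative:
+
+```python
+import dlt
+
+# `dataset_name` maps to the CrateDB schema. The schema becomes visible
+# in CrateDB once dlt creates the first table in it.
+pipeline = dlt.pipeline(
+    pipeline_name="schema_demo",
+    destination="cratedb",
+    dataset_name="analytics",
+)
+```
+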
+### File formats
+
+- The [SQL INSERT file format] is the preferred format for both direct loading and staging, as shown in the sketch below.
+
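+A short sketch of requesting that format explicitly via dlt's standard
+`loader_file_format` run parameter; names are illustrative:
+
+```python
+import dlt
+
+pipeline = dlt.pipeline(
+    pipeline_name="insert_format_demo",
+    destination="cratedb",
+    dataset_name="doc",
+)
+
+# "insert_values" selects the SQL INSERT file format for this load.
+pipeline.run(
+    [{"id": 1, "value": 42}],
+    table_name="demo",
+    loader_file_format="insert_values",
+)
+```
+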
+### Column types
+
+The `cratedb` destination deviates from the default SQL destinations in a few ways.
+
+- CrateDB does not support the `time` datatype. `time` values are loaded into a `text` column.
+- CrateDB does not support the `binary` datatype. Binary values are loaded into a `text` column.
+- CrateDB can produce rounding errors under certain conditions when using the `float`/`double` datatypes.
+  Use the `decimal` datatype if you cannot afford rounding errors; see the sketch after this list.
+
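+A hedged sketch of pinning a column to `decimal` through dlt's standard
+column hints; the resource and column names are illustrative:
+
+```python
+import dlt
+
+# Declare `price` as `decimal` so values are not subject to
+# float/double rounding errors.
+@dlt.resource(columns={"price": {"data_type": "decimal"}})
+def orders():
+    yield {"order_id": 1, "price": "19.99"}
+
+pipeline = dlt.pipeline(
+    pipeline_name="decimal_demo",
+    destination="cratedb",
+    dataset_name="doc",
+)
+pipeline.run(orders)
+```
+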
+### Column hints
+
+CrateDB supports the following [column hints].
+
+- `primary_key` - marks the column as part of the primary key. Multiple columns can have this hint to create a composite primary key, as shown in the sketch below.
+
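+A minimal sketch of a composite primary key declared on a resource; the
+resource shape is illustrative:
+
+```python
+import dlt
+
+# Both columns carry the `primary_key` hint, forming a composite key.
+@dlt.resource(
+    primary_key=("tenant_id", "user_id"),
+    write_disposition="merge",
+)
+def users():
+    yield {"tenant_id": "acme", "user_id": 1, "name": "Ada"}
+
+pipeline = dlt.pipeline(
+    pipeline_name="pk_demo",
+    destination="cratedb",
+    dataset_name="doc",
+)
+pipeline.run(users)
+```
+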
+### File staging
+
+CrateDB supports Amazon S3, Google Cloud Storage, and Azure Blob Storage as file staging destinations.
+
+`dlt` uploads CSV or JSONL files to the staging location and uses CrateDB data loading functions
+to load the data directly from the staged files.
+
+Please refer to the filesystem documentation to learn how to configure credentials for the staging destinations:
+
+- [AWS S3]
+- [Azure Blob Storage]
+- [Google Storage]
+
+Invoke a pipeline with staging enabled:
+
+```python
+pipeline = dlt.pipeline(
+    pipeline_name='chess_pipeline',
+    destination='cratedb',
+    staging='filesystem',  # add this to activate staging
+    dataset_name='chess_data'
+)
+```
+
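+A hedged sketch of wiring up the staging bucket via environment
+variables, using dlt's standard configuration naming; the bucket URL is
+illustrative, and credentials are assumed to be configured as described
+in the filesystem documentation linked above:
+
+```python
+import os
+
+import dlt
+
+# Equivalent to setting `bucket_url` in the [destination.filesystem]
+# section of secrets.toml.
+os.environ["DESTINATION__FILESYSTEM__BUCKET_URL"] = "s3://my-staging-bucket"
+
+pipeline = dlt.pipeline(
+    pipeline_name="chess_pipeline",
+    destination="cratedb",
+    staging="filesystem",
+    dataset_name="chess_data",
+)
+
+# Staged loads pass through JSONL or CSV files in the bucket.
+pipeline.run([{"id": 1}], table_name="games", loader_file_format="jsonl")
+```
+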
+### dbt support
+
+Integration with [dbt] is generally supported via [dbt-cratedb2] but not tested by us.
+
+### dlt state sync
+
+The CrateDB destination fully supports [dlt state sync].
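+
+For example, a minimal sketch that restores a pipeline's state and
+schemas from CrateDB using dlt's standard `sync_destination()` method:
+
+```python
+import dlt
+
+pipeline = dlt.pipeline(
+    pipeline_name="chess_pipeline",
+    destination="cratedb",
+    dataset_name="chess_data",
+)
+
+# Restores state and schemas from the destination, e.g. after the
+# local pipeline working directory was wiped.
+pipeline.sync_destination()
+```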
+
+
+## See also
+
+:::{rubric} Examples
+:::
 
 ::::{grid}
 
+:::{grid-item-card} Usage guide: Load API data with dlt
+:link: dlt-usage
+:link-type: ref
+Exercise a canonical `dlt init` example with CrateDB.
+:::
+
 :::{grid-item-card} Examples: Use dlt with CrateDB
 :link: https://github.com/crate/cratedb-examples/tree/main/framework/dlt
 :link-type: url
-Executable code examples that demonstrate how to use dlt with CrateDB.
+Executable code examples on GitHub that demonstrate how to use dlt with CrateDB.
+:::
+
+::::
+
+:::{rubric} Resources
 :::
 
-:::{grid-item-card} Adapter: The dlt destination adapter for CrateDB
-:link: https://github.com/crate/dlt-cratedb
+::::{grid}
+
+:::{grid-item-card} Package: `dlt-cratedb`
+:link: https://pypi.org/project/dlt-cratedb/
 :link-type: url
-Based on the dlt PostgreSQL adapter, the package enables you to work
-with dlt and CrateDB.
+The dlt destination adapter for CrateDB is
+based on the dlt PostgreSQL adapter.
 :::
 
-:::{grid-item-card} See also: ingestr
+:::{grid-item-card} Package: `ingestr`
 :link: ingestr
 :link-type: ref
-The ingestr data import/export application uses dlt.
+The ingestr data import/export application uses dlt as its workhorse.
 :::
 ::::
 
 
 
+:::{toctree}
+:maxdepth: 1
+:hidden:
+Usage <usage>
+:::
+
+
+[^create-schema]: CrateDB does not support `CREATE SCHEMA` yet, see [CRATEDB-14601].
+    By default, a schema does not appear to exist until it contains at least one table,
+    yet it cannot be created explicitly either. Schemas are currently created implicitly
+    when tables are created in them.
 
+[AWS S3]: https://dlthub.com/docs/dlt-ecosystem/destinations/filesystem#aws-s3
+[Azure Blob Storage]: https://dlthub.com/docs/dlt-ecosystem/destinations/filesystem#azure-blob-storage
+[column hints]: https://dlthub.com/docs/general-usage/schema#column-hint-rules
+[CRATEDB-14601]: https://github.com/crate/crate/issues/14601
 [databases supported by SQLAlchemy]: https://docs.sqlalchemy.org/en/20/dialects/
+[dbt]: https://dlthub.com/docs/hub/features/transformations/dbt-transformations
+[dbt-cratedb2]: https://pypi.org/project/dbt-cratedb2/
 [dlt]: https://dlthub.com/
+[dlt state sync]: https://dlthub.com/docs/general-usage/state#syncing-state-with-destination
+[Google Storage]: https://dlthub.com/docs/dlt-ecosystem/destinations/filesystem#google-storage
+[SQL INSERT file format]: https://dlthub.com/docs/dlt-ecosystem/file-formats/insert-format