
Commit eac089d

hammerheadamotl authored and committed
Trino: Index page and starter tutorial
1 parent 427d701 commit eac089d


3 files changed (+91, -0 lines changed)


docs/integrate/index.md

Lines changed: 1 addition & 0 deletions
@@ -68,6 +68,7 @@ streamsets/index
 superset/index
 tableau/index
 telegraf/index
+trino/index
 :::

docs/integrate/trino/index.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
(trino)=
# Trino

```{div} .float-right
[![Trino logo](https://upload.wikimedia.org/wikipedia/commons/thumb/5/57/Trino-logo-w-bk.svg/330px-Trino-logo-w-bk.svg.png){height=60px loading=lazy}][Trino]
```
```{div} .clearfix
```


:::{rubric} About
:::

[Trino] is a fast distributed SQL query engine for big data analytics
that helps you explore your data universe.

:::{rubric} Learn
:::

::::{grid} 2

:::{grid-item-card} Connecting to CrateDB in Trino
:link: trino-tutorial
:link-type: ref
Learn how to configure Trino to run queries against CrateDB.
:::

::::

:::{toctree}
:maxdepth: 1
:hidden:
Tutorial <tutorial>
:::

[Trino]: https://trino.io/

docs/integrate/trino/tutorial.md

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
(trino-tutorial)=
# Connecting to CrateDB in Trino

[Trino](https://trino.io/) (formerly known as PrestoSQL) is a distributed SQL query engine that runs analytical queries across different data sources. CrateDB can be one of those data sources, and this article looks at how to configure the connection.

## Prerequisites

We assume that a Trino client/server installation is already in place, following [Trino’s installation instructions](https://trino.io/docs/current/installation.html).

For this post, I installed Trino on macOS using Homebrew (`brew install trino`); my installation directory is `/usr/local/Cellar/trino/375`. How to start the Trino server depends on your installation method. For this post, I start it in my console from the installation directory with the command `./bin/trino-server run`.
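The two directories used throughout this post can be captured in shell variables. This is a minimal sketch assuming the Homebrew installation described above; the variable names `TRINO_HOME` and `CATALOG_DIR` are hypothetical and only used for illustration.

```bash
# Assumption: Homebrew installation of Trino as in this post;
# TRINO_HOME is a hypothetical variable, adjust it to your installation directory.
TRINO_HOME="/usr/local/Cellar/trino/375"
# Directory holding the catalog configuration files (see "Connector configuration")
CATALOG_DIR="$TRINO_HOME/libexec/etc/catalog"

if [ -x "$TRINO_HOME/bin/trino-server" ]; then
  # Start the server in the console from the installation directory, as in this post
  (cd "$TRINO_HOME" && ./bin/trino-server run)
else
  echo "trino-server not found below $TRINO_HOME; adjust TRINO_HOME" >&2
fi
```

If you installed Trino differently, only the two paths need to change; the rest of this post refers to the same locations.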

## Connector configuration

Thanks to CrateDB’s PostgreSQL wire protocol compatibility, we can use Trino’s [PostgreSQL connector](https://trino.io/docs/current/connector/postgresql.html). Create a new file `/usr/local/Cellar/trino/375/libexec/etc/catalog/postgresql.properties` to configure the connection:

```properties
connector.name=postgresql
connection-url=jdbc:postgresql://<CrateDB hostname>:5432/
connection-user=<CrateDB username>
connection-password=<CrateDB password>
insert.non-transactional-insert.enabled=true
```

Replace the placeholders for the CrateDB hostname, username, and password to match your setup. Besides the connection details, the configuration has two particularities:

* No database name: With PostgreSQL, a JDBC connection URL usually ends with a database name. CrateDB, however, consists of a single database with multiple schemas, so we intentionally omit the database name from the `connection-url`. If a database name is specified, certain operations fail with the error message `ERROR: Table with more than 2 QualifiedName parts is not supported. Only <schema>.<tableName> works`.
* Disabling transactions: Being a database with eventual consistency, CrateDB doesn’t support transactions. By default, the PostgreSQL connector wraps `INSERT` queries in a transaction and attempts to create a temporary table. We disable this behavior with the `insert.non-transactional-insert.enabled` parameter.
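If you set up Trino repeatedly, the catalog file can be generated by a small shell function. This is a minimal sketch, not part of Trino: the function name `write_cratedb_catalog` and its argument order are hypothetical. It writes exactly the properties shown above and refuses host values containing a `/`, so that no database name can slip into `connection-url`.

```bash
# Hypothetical helper: writes the CrateDB catalog file for Trino's PostgreSQL connector.
# Usage: write_cratedb_catalog <catalog-dir> <host> <user> <password>
write_cratedb_catalog() {
  local dir="$1" host="$2" user="$3" password="$4"
  # The JDBC URL must end right after the port, without a database name
  case "$host" in
    */*) echo "host must not contain a path or database name: $host" >&2; return 1 ;;
  esac
  mkdir -p "$dir" || return 1
  # The file name (postgresql.properties) determines the Trino catalog name
  cat > "$dir/postgresql.properties" <<EOF
connector.name=postgresql
connection-url=jdbc:postgresql://${host}:5432/
connection-user=${user}
connection-password=${password}
insert.non-transactional-insert.enabled=true
EOF
}
```

For the setup in this post, the call would be `write_cratedb_catalog /usr/local/Cellar/trino/375/libexec/etc/catalog <CrateDB hostname> <CrateDB username> <CrateDB password>`. Trino reads catalog files when it starts, so restart the server afterwards.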

## Running queries against CrateDB

Once the PostgreSQL connector is configured, we can connect to the Trino server using its CLI:

```bash
# --catalog matches the name of the catalog file (postgresql.properties);
# --schema refers to an existing CrateDB schema
$ ./bin/trino --catalog postgresql --schema doc
trino:doc>
```

A `SHOW TABLES` query should list all existing tables in the specified CrateDB schema, and you can proceed with querying them.
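For scripted checks, the Trino CLI can also run a single statement non-interactively with its `--execute` option. The sketch below wraps that in a shell function; the name `trino_query` and the `TRINO_HOME` variable are hypothetical, and the default path assumes the Homebrew installation used in this post.

```bash
# Hypothetical helper: runs one statement through the Trino CLI and exits.
# Usage: trino_query <statement> [schema]   (schema defaults to "doc")
trino_query() {
  # Assumption: TRINO_HOME points at the Trino installation directory
  local home="${TRINO_HOME:-/usr/local/Cellar/trino/375}"
  local schema="${2:-doc}"
  "$home/bin/trino" --catalog postgresql --schema "$schema" --execute "$1"
}
```

For example, `trino_query 'SHOW TABLES'` lists the tables of the CrateDB schema `doc` without opening the interactive prompt.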

As CrateDB differs from PostgreSQL in some aspects, there are a few particularities to consider for your queries:

* Querying `OBJECT` columns: In CrateDB, columns of the data type `OBJECT` are usually queried using bracket notation, e.g. `SELECT my_object_column['my_object_key'] FROM my_table`. In Trino’s SQL dialect, the whole expression needs to be wrapped in double quotes as an identifier, such as `SELECT "my_object_column['my_object_key']" FROM my_table`.
* `INSERT` queries: When inserting, Trino addresses tables as `catalog_name.schema_name.table_name`, which CrateDB currently doesn't support. See [crate/crate#12658](https://github.com/crate/crate/issues/12658) for the status of this issue.
* Data types: Not all of Trino’s [data types](https://trino.io/docs/current/language/types.html) can be mapped to CrateDB data types and vice versa.
  * For creating tables, it can be advisable to run the `CREATE TABLE` statement directly in CrateDB. This approach is also recommended if you want to configure custom table settings, such as sharding, partitioning, or replication.
  * For querying tables, one strategy is to create views that prepare data in a Trino-compatible way. For example, a `GEO_POINT` column can be split into two simple numerical values using the functions `LONGITUDE` and `LATITUDE`.
  * Columns with data types that cannot be mapped are skipped by Trino when importing metadata, so such columns cannot be queried through Trino. Creating a view can be a workaround (see the previous bullet point).
* There are [limitations in Trino](https://trino.io/docs/current/optimizer/pushdown.html) on which parts of a query are pushed down to the data source. As a result, a query run through Trino can perform significantly worse than the same query run directly on CrateDB.
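Two of the workarounds above can be sketched as small shell helpers that only print SQL text. Both are hypothetical illustrations: the function names, their arguments, and the `_lon`/`_lat` column suffixes are made up. `geo_point_view_sql` prints a CrateDB view definition that exposes a `GEO_POINT` column as two numeric columns; run its output directly in CrateDB, not through Trino. `trino_object_ref` prints the double-quoted identifier Trino needs for a key of an `OBJECT` column.

```bash
# Hypothetical helper: prints a CrateDB statement creating a Trino-friendly view.
# The GEO_POINT column is split via longitude()/latitude(); the original column
# stays in the view but is skipped by Trino, as it cannot be mapped.
# Usage: geo_point_view_sql <view> <table> <geo_point_column>
geo_point_view_sql() {
  local view="$1" table="$2" geo="$3"
  printf 'CREATE OR REPLACE VIEW %s AS\nSELECT *, longitude(%s) AS %s_lon, latitude(%s) AS %s_lat\nFROM %s;\n' \
    "$view" "$geo" "$geo" "$geo" "$geo" "$table"
}

# Hypothetical helper: prints the quoted identifier for an OBJECT key in Trino,
# e.g. "my_object_column['my_object_key']"
# Usage: trino_object_ref <object_column> <key>
trino_object_ref() {
  printf '"%s['\''%s'\'']"' "$1" "$2"
}
```

With a view created this way in CrateDB, Trino lists it like a table, and its numeric `_lon`/`_lat` columns can be queried from Trino.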

## Conclusion

With a few parameter tweaks, Trino can successfully connect to CrateDB. The information in this post is the result of a short compatibility test and is likely not exhaustive. If you use Trino with CrateDB and know of any additional aspects, please let us know!
