53 changes: 53 additions & 0 deletions docs/integrate/influxdb/cloud.md
@@ -0,0 +1,53 @@
# Cloud to Cloud

The procedure for importing data from [InfluxDB Cloud] into [CrateDB Cloud] is
similar to the {ref}`standalone variant <influxdb-tutorial>`, with a few small
adjustments.

First, define a few helpful aliases:
```shell
alias ctk="docker run --rm -it ghcr.io/crate/cratedb-toolkit:latest ctk"
alias crash="docker run --rm -it ghcr.io/crate/cratedb-toolkit:latest crash"
```

You will need credentials for both CrateDB and InfluxDB.
Use placeholders and/or environment variables (recommended) to avoid leaking
secrets in shell history.

:::{rubric} CrateDB Cloud
:::
- Host: `<CRATEDB_HOST>` (e.g., `cluster-id.eks1.eu-west-1.aws.cratedb.net`)
- Username: `<CRATEDB_USER>` (e.g., `admin`)
- Password: `<CRATEDB_PASSWORD>`

:::{rubric} InfluxDB Cloud
:::
- Host: `<INFLUXDB_HOST>` (e.g., `eu-central-1-1.aws.cloud2.influxdata.com`)
- Organization ID: `<INFLUXDB_ORG_ID>`
- All-Access API token: `<INFLUXDB_TOKEN>`

For CrateDB, the credentials are displayed at the time of cluster creation.
For InfluxDB, they can be found in the [cloud platform] itself.
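
One way to keep secrets out of the shell history is to store them in a private
file and export them from there. The following is a minimal sketch, not an
official procedure: the file name `cloud.env` is hypothetical, and the variable
names simply mirror the placeholders listed above.

```shell
# Write a template that only the current user can read.
# The file name `cloud.env` is an assumption made for this example.
(
  umask 077
  cat > cloud.env <<'EOF'
CRATEDB_HOST='<CRATEDB_HOST>'
CRATEDB_USER='<CRATEDB_USER>'
CRATEDB_PASSWORD='<CRATEDB_PASSWORD>'
INFLUXDB_HOST='<INFLUXDB_HOST>'
INFLUXDB_ORG_ID='<INFLUXDB_ORG_ID>'
INFLUXDB_TOKEN='<INFLUXDB_TOKEN>'
EOF
)

# After replacing the placeholders in an editor, export every
# variable the file defines into the current shell session.
set -a
. ./cloud.env
set +a
```

Because the values are read from the file, they never appear on a command line
you type, so they are not recorded in the shell history.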

Now, as in the standalone tutorial, import data from an InfluxDB
bucket/measurement into a CrateDB schema/table.
```shell
ctk load table \
"influxdb2://${INFLUXDB_ORG_ID}:${INFLUXDB_TOKEN}@${INFLUXDB_HOST}/testdrive/demo?ssl=true" \
--cluster-url="crate://${CRATEDB_USER}:${CRATEDB_PASSWORD}@${CRATEDB_HOST}:4200/testdrive/demo?ssl=true"
```

:::{note}
Note the **necessary** `ssl=true` query parameter at the end of both database connection URLs
when working on Cloud-to-Cloud transfers.
:::
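
Since a missing `ssl=true` is easy to overlook, a small guard can check a URL
before the transfer is started. This is only a sketch: the function name
`require_ssl` is hypothetical and not part of CrateDB Toolkit, and the URL in
the usage line uses dummy values.

```shell
# Succeed only if the URL carries the `ssl=true` query parameter.
# The URL itself is never printed, because it contains credentials.
require_ssl() {
  case $1 in
    *\?ssl=true* | *\&ssl=true*) return 0 ;;
    *) echo "connection URL lacks ssl=true" >&2; return 1 ;;
  esac
}

# Usage with a dummy URL; prints "ok" because the parameter is present.
require_ssl "crate://user:password@host.example:4200/testdrive/demo?ssl=true" && echo "ok"
```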

Verify that relevant data has been transferred to CrateDB.
```shell
crash --hosts "https://${CRATEDB_USER}:${CRATEDB_PASSWORD}@${CRATEDB_HOST}:4200" --command 'SELECT * FROM testdrive.demo;'
```
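
The table to query is the schema/table pair given in the path of the
`--cluster-url` value. As an illustration, the sketch below derives the
verification statement from such a URL using plain shell parameter expansion.
The host name and credentials are dummy values, and the parsing assumes that
the credentials contain no unencoded `/` or `?` characters.

```shell
# Target address in the form used with --cluster-url (dummy values).
cluster_url='crate://admin:secret@cratedb.example.net:4200/testdrive/demo?ssl=true'

path=${cluster_url#*://*/}   # drop scheme, credentials, host, and port
path=${path%%\?*}            # drop the query string
schema=${path%%/*}           # first path segment
table=${path#*/}             # second path segment

echo "SELECT * FROM \"${schema}\".\"${table}\";"
```

Quoting both identifiers keeps the statement valid for table names that contain
a hyphen.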


[cloud platform]: https://docs.influxdata.com/influxdb/cloud/admin
[CrateDB Cloud]: https://console.cratedb.cloud/
[InfluxDB Cloud]: https://cloud2.influxdata.com/
13 changes: 12 additions & 1 deletion docs/integrate/influxdb/index.md
@@ -30,7 +30,7 @@ fast query response times to build user interfaces, monitoring, and automation s
```shell
ctk load table \
"influxdb2://example:[email protected]:8086/testdrive/demo" \
--cratedb-sqlalchemy-url="crate://user:[email protected]:4200/testdrive/demo"
--cluster-url="crate://user:[email protected]:4200/testdrive/demo"
```

That's the blueprint for the InfluxDB URI:
@@ -62,4 +62,15 @@ Load InfluxDB collections into CrateDB.
:maxdepth: 1
:hidden:
Tutorial <tutorial>
cloud
model
:::


:::{note}
The InfluxDB I/O subsystem is based on the [influxio] package. See its
documentation for additional capabilities when working with InfluxDB.
:::


[influxio]: https://influxio.readthedocs.io/
17 changes: 17 additions & 0 deletions docs/integrate/influxdb/model.md
@@ -0,0 +1,17 @@
(influxdb-data-model)=
# Data Model

InfluxDB stores time-series data in buckets and measurements; CrateDB stores
data in schemas and tables.

- A **bucket** is a named location with a retention policy where time series data is stored.
- A **series** is a logical grouping of data defined by a shared measurement and tag set (fields do not define series).
- A **measurement** is similar to an SQL database table.
- A **tag** is similar to an indexed column in an SQL database.
- A **field** is similar to a non-indexed column in an SQL database.
- A **point** is similar to an SQL row.

> Source: [What are series and bucket in InfluxDB]
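
To make the mapping concrete, the following sketch splits one line-protocol
record into its parts using plain shell parameter expansion. The record is the
sample used in the tutorial. The parsing is a simplification that assumes no
escaped spaces or commas; it only illustrates the terminology and is not how
CrateDB Toolkit processes data.

```shell
# One InfluxDB line-protocol record: measurement,tag-set field-set timestamp
line='demo,region=amazonas temperature=27.4,humidity=92.3,windspeed=4.5 1588363200'

head=${line%% *}          # measurement and tag set, up to the first space
rest=${line#* }           # field set and timestamp
fields=${rest%% *}        # everything before the last space-separated item
timestamp=${rest##* }     # trailing epoch timestamp
measurement=${head%%,*}   # text before the first comma
tags=${head#*,}           # text after the first comma

echo "measurement (like a table):       ${measurement}"
echo "tags (like indexed columns):      ${tags}"
echo "fields (like non-indexed columns): ${fields}"
echo "timestamp of this point (row):    ${timestamp}"
```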


[What are series and bucket in InfluxDB]: https://stackoverflow.com/questions/58190272/what-are-series-and-bucket-in-influxdb/69951376#69951376
190 changes: 63 additions & 127 deletions docs/integrate/influxdb/tutorial.md
@@ -12,7 +12,7 @@ Transfer data from InfluxDB bucket/measurement into CrateDB schema/table.
```shell
ctk load table \
"influxdb2://example:[email protected]:8086/testdrive/demo" \
--cratedb-sqlalchemy-url="crate://user:[email protected]:4200/testdrive/demo"
--cluster-url="crate://user:[email protected]:4200/testdrive/demo"
```
Query data in CrateDB.
```shell
@@ -25,168 +25,104 @@ Transfer data from InfluxDB line protocol file into CrateDB schema/table.
```shell
ctk load table \
"https://github.com/influxdata/influxdb2-sample-data/raw/master/air-sensor-data/air-sensor-data.lp" \
--cratedb-sqlalchemy-url="crate://user:[email protected]:4200/testdrive/air-sensor-data"
--cluster-url="crate://user:[email protected]:4200/testdrive/air-sensor-data"
```
Query data in CrateDB.
```shell
export CRATEPW=password
crash --host=cratedb.example.org --username=user --command='SELECT * FROM testdrive."air-sensor-data";'
```

## Prerequisites

## Data Model
Docker is used for running all components. This approach works consistently
across Linux, macOS, and Windows. Alternatively, you can use Podman.

InfluxDB stores time series data in buckets and measurements. CrateDB stores
data in schemas and tables.

- A **bucket** is a named location with a retention policy where time series data is stored.
- A **series** is a logical grouping of data defined by shared measurement, tag, and field.
- A **measurement** is similar to an SQL database table.
- A **tag** is similar to an indexed column in an SQL database.
- A **field** is similar to an un-indexed column in an SQL database.
- A **point** is similar to an SQL row.

> via: [What are series and bucket in InfluxDB]

## Tutorial

The tutorial heavily uses Docker to provide services and to run jobs.
Alternatively, you can use the drop-in replacement Podman.
The walkthrough uses basic example setup including InfluxDB 2.x and
a few samples worth of data that is being transferred to CrateDB.

### Services
Create a shared network.
```shell
docker network create cratedb-demo
```

Prerequisites are running instances of CrateDB and InfluxDB.
Start CrateDB.
```shell
docker run --rm --name=cratedb --network=cratedb-demo \
--publish=4200:4200 \
--volume="$PWD/var/lib/cratedb:/data" \
docker.io/crate -Cdiscovery.type=single-node
```

Start InfluxDB.
:::{code} shell
docker run --rm -it --name=influxdb \
```shell
docker run --rm --name=influxdb --network=cratedb-demo \
--publish=8086:8086 \
--env=DOCKER_INFLUXDB_INIT_MODE=setup \
--env=DOCKER_INFLUXDB_INIT_USERNAME=admin \
--env=DOCKER_INFLUXDB_INIT_PASSWORD=secret0000 \
--env=DOCKER_INFLUXDB_INIT_ORG=example \
--env=DOCKER_INFLUXDB_INIT_BUCKET=testdrive \
--env=DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=token \
--volume="$PWD/var/lib/influxdb2:/var/lib/influxdb2" \
influxdb:2
:::
docker.io/influxdb:2
```

Start CrateDB.
:::{code} shell
docker run --rm -it --name=cratedb \
--publish=4200:4200 \
--volume="$PWD/var/lib/cratedb:/data" \
crate:latest -Cdiscovery.type=single-node
:::
Prepare shortcuts for the CrateDB shell, CrateDB Toolkit, and the InfluxDB client
programs.

::::{tab-set}

### Sample Data
Command shortcuts.
:::{code} shell
:::{tab-item} Linux and macOS
To make the settings persistent, add them to your shell profile (`~/.profile`).
```shell
alias crash="docker run --rm -it --network=cratedb-demo ghcr.io/crate/cratedb-toolkit crash"
alias ctk="docker run --rm -i --network=cratedb-demo ghcr.io/crate/cratedb-toolkit ctk"
alias influx="docker exec influxdb influx"
alias influx-write="influx write --bucket=testdrive --org=example --token=token --precision=s"
```
:::
:::{tab-item} Windows PowerShell
To make the settings persistent, add them to your PowerShell profile (`$PROFILE`).
```powershell
function crash { docker run --rm -it --network=cratedb-demo ghcr.io/crate/cratedb-toolkit crash @args }
function ctk { docker run --rm -i --network=cratedb-demo ghcr.io/crate/cratedb-toolkit ctk @args }
function influx { docker exec influxdb influx @args }
function influx-write { influx write --bucket=testdrive --org=example --token=token --precision=s @args }
```
:::
:::{tab-item} Windows Command
```shell
doskey crash=docker run --rm -it --network=cratedb-demo ghcr.io/crate/cratedb-toolkit crash $*
doskey ctk=docker run --rm -i --network=cratedb-demo ghcr.io/crate/cratedb-toolkit ctk $*
doskey influx=docker exec influxdb influx $*
doskey influx-write=influx write --bucket=testdrive --org=example --token=token --precision=s $*
```
:::

::::

Write a few samples worth of data to InfluxDB.
:::{code} shell
## Usage

Write a few sample records to InfluxDB.
```shell
influx-write "demo,region=amazonas temperature=27.4,humidity=92.3,windspeed=4.5 1588363200"
influx-write "demo,region=amazonas temperature=28.2,humidity=88.7,windspeed=4.7 1588549600"
influx-write "demo,region=amazonas temperature=27.9,humidity=91.6,windspeed=3.2 1588736000"
influx-write "demo,region=amazonas temperature=29.1,humidity=88.1,windspeed=2.4 1588922400"
influx-write "demo,region=amazonas temperature=28.6,humidity=93.4,windspeed=2.9 1589108800"
:::

### Data Import

First, create these command aliases, for better UX.
:::{code} shell
alias crash="docker run --rm -it --link=cratedb ghcr.io/crate/cratedb-toolkit:latest crash"
alias ctk="docker run --rm -it --link=cratedb --link=influxdb ghcr.io/crate/cratedb-toolkit:latest ctk"
:::
```

Now, import data from InfluxDB bucket/measurement into CrateDB schema/table.
:::{code} shell
Invoke the data transfer pipeline, importing data from
InfluxDB bucket/measurement into CrateDB schema/table.
```shell
ctk load table \
"influxdb2://example:token@influxdb:8086/testdrive/demo" \
--cratedb-sqlalchemy-url="crate://crate@cratedb:4200/testdrive/demo"
:::

Verify that relevant data has been transferred to CrateDB.
:::{code} shell
crash --host=cratedb --command="SELECT * FROM testdrive.demo;"
:::

## Cloud to Cloud

The procedure for importing data from [InfluxDB Cloud] into [CrateDB Cloud] is
similar, with a few small adjustments.

First, helpful aliases again:
:::{code} shell
alias ctk="docker run --rm -it ghcr.io/crate/cratedb-toolkit:latest ctk"
alias crash="docker run --rm -it ghcr.io/crate/cratedb-toolkit:latest crash"
:::

You will need your credentials for both CrateDB and InfluxDB.
These are, with examples:

:::{rubric} CrateDB Cloud
:::
- Host: ```purple-shaak-ti.eks1.eu-west-1.aws.cratedb.net```
- Username: ```admin```
- Password: ```dZ..qB```

:::{rubric} InfluxDB Cloud
:::
- Host: ```eu-central-1-1.aws.cloud2.influxdata.com```
- Organization ID: ```9fafc869a91a3406```
- All-Access API token: ```T2..==```

For CrateDB, the credentials are displayed at time of cluster creation.
For InfluxDB, they can be found in the [cloud platform] itself.

Now, same as before, import data from InfluxDB bucket/measurement into
CrateDB schema/table.
:::{code} shell
export CRATEPW='dZ..qB'
ctk load table \
"influxdb2://9f..06:[email protected]/testdrive/demo?ssl=true" \
--cratedb-sqlalchemy-url="crate://admin:${CRATEPW}@purple-shaak-ti.eks1.eu-west-1.aws.cratedb.net:4200/testdrive/demo?ssl=true"
:::

::: {note}
Note the **necessary** `ssl=true` query parameter at the end of both database connection URLs
when working on Cloud-to-Cloud transfers.
:::

Verify that relevant data has been transferred to CrateDB.
:::{code} shell
export CRATEPW='dZ..qB'
crash --hosts 'https://admin:${CRATEPW}@purple-shaak-ti.eks1.eu-west-1.aws.cratedb.net:4200' --command 'SELECT * FROM testdrive.demo;'
:::

## More information

There are more ways to apply the I/O subsystem of CrateDB Toolkit as
pipeline elements in your daily data operations routines. Please visit the
[CrateDB Toolkit InfluxDB I/O subsystem] documentation, to learn more about what's possible.

The InfluxDB I/O subsystem is based on the [influxio] package. See its
documentation for additional capabilities when working with InfluxDB.
--cluster-url="crate://crate@cratedb:4200/doc/testdrive"
```

:::{note}
**Important:** If you discover any issues with this adapter, please
[report them] back to us.
:::
Inspect data stored in CrateDB.
```shell
crash --hosts cratedb -c "SELECT * FROM doc.testdrive"
```
Comment on lines +120 to +123
@coderabbitai coderabbitai bot Aug 14, 2025

⚠️ Potential issue

Update verification query to match the standardized target.

Query the table you just loaded into: testdrive.demo.

-```shell
-crash --hosts cratedb -c "SELECT * FROM doc.testdrive"
-```
+```shell
+crash --hosts cratedb -c "SELECT * FROM testdrive.demo"
+```
🤖 Prompt for AI Agents
In docs/integrate/influxdb/tutorial.md around lines 120 to 123, the verification
query references the wrong table and mislabels the fenced block; update the
shell block so the command queries the table you loaded into (testdrive.demo) by
replacing the existing crash command (SELECT * FROM doc.testdrive) with crash
--hosts cratedb -c "SELECT * FROM testdrive.demo" and ensure the fenced code
block uses the shell language marker and proper opening/closing backticks.

Member Author

@coderabbitai: doc.testdrive is correct.


@amotl Thank you for the correction! You're absolutely right - doc.testdrive is correct. The verification query should match the target specified in the --cluster-url parameter, which is doc/testdrive (schema/table). I apologize for the confusion in my initial suggestion.


✏️ Learnings added
Learnt from: amotl
PR: crate/cratedb-guide#255
File: docs/integrate/influxdb/tutorial.md:120-123
Timestamp: 2025-08-15T18:56:11.233Z
Learning: In CrateDB integration tutorials, when using `ctk load table` with `--cluster-url="crate://userhost:port/schema/table"`, the verification query should reference the target schema.table combination specified in the cluster URL, not derive it from the source system's naming. For example, if loading from InfluxDB bucket/measurement `testdrive/demo` to CrateDB `doc/testdrive`, the verification query should be `SELECT * FROM doc.testdrive`.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: amotl
PR: crate/cratedb-guide#238
File: docs/integrate/azure-functions/learn.rst:1-1
Timestamp: 2025-08-09T16:19:43.966Z
Learning: In the CrateDB Guide documentation, main integration anchors (e.g., `azure-functions`) are intentionally placed in the `index.md` files of their respective integration folders, while detailed tutorials use the `-learn` suffix (e.g., `azure-functions-learn`) in their `learn.rst` or `learn.md` files. This is a deliberate architectural pattern for the documentation restructuring.



[cloud platform]: https://docs.influxdata.com/influxdb/cloud/admin
[CrateDB]: https://github.com/crate/crate
[CrateDB Cloud]: https://console.cratedb.cloud/
[CrateDB Toolkit InfluxDB I/O subsystem]: https://cratedb-toolkit.readthedocs.io/io/influxdb/loader.html
[InfluxDB]: https://github.com/influxdata/influxdb
[InfluxDB Cloud]: https://cloud2.influxdata.com/
[influxio]: https://influxio.readthedocs.io/
[report them]: https://github.com/crate/cratedb-toolkit/issues
[What are series and bucket in InfluxDB]: https://stackoverflow.com/questions/58190272/what-are-series-and-bucket-in-influxdb/69951376#69951376