Integrate/InfluxDB: Simplify starter tutorial #255
Changes from all commits
46bf7cb
f85d1de
92dd621
96ad601
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,53 @@ | ||
| # Cloud to Cloud | ||
|
|
||
| The procedure for importing data from [InfluxDB Cloud] into [CrateDB Cloud] is | ||
| similar to the {ref}`standalone variant <influxdb-tutorial>`, with a few small | ||
| adjustments. | ||
|
|
||
| First, helpful aliases: | ||
| ```shell | ||
| alias ctk="docker run --rm -it ghcr.io/crate/cratedb-toolkit:latest ctk" | ||
| alias crash="docker run --rm -it ghcr.io/crate/cratedb-toolkit:latest crash" | ||
| ``` | ||
|
|
||
| You will need credentials for both CrateDB and InfluxDB. | ||
| Use placeholders and/or environment variables (recommended) to avoid leaking | ||
| secrets in shell history. | ||
|
|
||
| :::{rubric} CrateDB Cloud | ||
| ::: | ||
| - Host: `<CRATEDB_HOST>` (e.g., `cluster-id.eks1.eu-west-1.aws.cratedb.net`) | ||
| - Username: `<CRATEDB_USER>` (e.g., `admin`) | ||
| - Password: `<CRATEDB_PASSWORD>` | ||
|
|
||
| :::{rubric} InfluxDB Cloud | ||
| ::: | ||
| - Host: `<INFLUXDB_HOST>` (e.g., `eu-central-1-1.aws.cloud2.influxdata.com`) | ||
| - Organization ID: `<INFLUXDB_ORG_ID>` | ||
| - All-Access API token: `<INFLUXDB_TOKEN>` | ||
|
|
||
| For CrateDB, the credentials are displayed at the time of cluster creation. | ||
| For InfluxDB, they can be found in the [cloud platform] itself. | ||
|
|
||
| Now, as in the standalone tutorial, import data from an InfluxDB | ||
| bucket/measurement into a CrateDB schema/table. | ||
| ```shell | ||
| ctk load table \ | ||
| "influxdb2://${INFLUX_ORG}:${INFLUX_TOKEN}@${INFLUX_HOST}/testdrive/demo?ssl=true" \ | ||
| --cluster-url="crate://${CRATEDB_USER}:${CRATEDB_PASSWORD}@${CRATEDB_HOST}:4200/testdrive/demo?ssl=true" | ||
| ``` | ||
|
|
||
| :::{note} | ||
| Note the **necessary** `ssl=true` query parameter at the end of both database connection URLs | ||
| when working on Cloud-to-Cloud transfers. | ||
| ::: | ||
|
|
||
| Verify that relevant data has been transferred to CrateDB. | ||
| ```shell | ||
| crash --hosts "https://${CRATEDB_USER}:${CRATEDB_PASSWORD}@${CRATEDB_HOST}:4200" --command 'SELECT * FROM testdrive.demo;' | ||
| ``` | ||
|
|
||
|
|
||
| [cloud platform]: https://docs.influxdata.com/influxdb/cloud/admin | ||
| [CrateDB Cloud]: https://console.cratedb.cloud/ | ||
| [InfluxDB Cloud]: https://cloud2.influxdata.com/ |
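
A note on the Cloud-to-Cloud command in this file: it reads `INFLUX_ORG`, `INFLUX_TOKEN`, and `INFLUX_HOST`, while the credential list names the placeholders `<INFLUXDB_ORG_ID>`, `<INFLUXDB_TOKEN>`, and `<INFLUXDB_HOST>`. The following is a minimal sketch of how the variables expected by the command could be set and checked before the URLs are assembled. All values are placeholders, and the `SOURCE_URL` and `TARGET_URL` names are introduced here for illustration only; they are not part of the PR.

```shell
# Placeholder values for illustration only; substitute your own credentials.
# In practice, consider sourcing these from a file readable only by you
# instead of typing them at the prompt, so they stay out of shell history.
export CRATEDB_HOST="cluster-id.eks1.eu-west-1.aws.cratedb.net"
export CRATEDB_USER="admin"
export CRATEDB_PASSWORD="replace-me"

# The load command reads INFLUX_* names, which correspond to the
# <INFLUXDB_*> placeholders in the credential list.
export INFLUX_HOST="eu-central-1-1.aws.cloud2.influxdata.com"
export INFLUX_ORG="replace-with-organization-id"
export INFLUX_TOKEN="replace-with-api-token"

# Abort with a message if any variable is unset or empty, so that no URL
# with missing parts is built.
: "${CRATEDB_HOST:?CRATEDB_HOST is not set}"
: "${CRATEDB_USER:?CRATEDB_USER is not set}"
: "${CRATEDB_PASSWORD:?CRATEDB_PASSWORD is not set}"
: "${INFLUX_HOST:?INFLUX_HOST is not set}"
: "${INFLUX_ORG:?INFLUX_ORG is not set}"
: "${INFLUX_TOKEN:?INFLUX_TOKEN is not set}"

# Assemble both connection URLs, including the ssl=true query parameter.
SOURCE_URL="influxdb2://${INFLUX_ORG}:${INFLUX_TOKEN}@${INFLUX_HOST}/testdrive/demo?ssl=true"
TARGET_URL="crate://${CRATEDB_USER}:${CRATEDB_PASSWORD}@${CRATEDB_HOST}:4200/testdrive/demo?ssl=true"

# Print only the non-secret parts as a sanity check.
echo "source host: ${INFLUX_HOST}"
echo "target host: ${CRATEDB_HOST}"
```

With the variables in place, the command from the diff could be invoked as `ctk load table "$SOURCE_URL" --cluster-url="$TARGET_URL"`. Passwords or tokens that contain URL-reserved characters would presumably need percent-encoding first, which this sketch does not handle.
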
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -30,7 +30,7 @@ fast query response times to build user interfaces, monitoring, and automation s | |
| ```shell | ||
| ctk load table \ | ||
| "influxdb2://example:token@influxdb.example.org:8086/testdrive/demo" \ | ||
| --cratedb-sqlalchemy-url="crate://user:password@cratedb.example.org:4200/testdrive/demo" | ||
| --cluster-url="crate://user:password@cratedb.example.org:4200/testdrive/demo" | ||
| ``` | ||
|
|
||
| That's the blueprint for the InfluxDB URI: | ||
|
|
@@ -62,4 +62,15 @@ Load InfluxDB collections into CrateDB. | |
| :maxdepth: 1 | ||
| :hidden: | ||
| Tutorial <tutorial> | ||
| cloud | ||
| model | ||
| ::: | ||
|
|
||
|
|
||
| :::{note} | ||
| The InfluxDB I/O subsystem is based on the [influxio] package. See its | ||
| documentation for additional capabilities when working with InfluxDB. | ||
| ::: | ||
|
|
||
|
|
||
| [influxio]: https://influxio.readthedocs.io/ | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| (influxdb-data-model)= | ||
| # Data Model | ||
|
|
||
| InfluxDB stores time-series data in buckets and measurements; CrateDB stores | ||
| data in schemas and tables. | ||
|
|
||
| - A **bucket** is a named location with a retention policy where time series data is stored. | ||
| - A **series** is a logical grouping of data defined by a shared measurement and tag set (fields do not define series). | ||
| - A **measurement** is similar to an SQL database table. | ||
| - A **tag** is similar to an indexed column in an SQL database. | ||
| - A **field** is similar to a non-indexed column in an SQL database. | ||
| - A **point** is similar to an SQL row. | ||
|
|
||
| > Source: [What are series and bucket in InfluxDB] | ||
|
|
||
|
|
||
| [What are series and bucket in InfluxDB]: https://stackoverflow.com/questions/58190272/what-are-series-and-bucket-in-influxdb/69951376#69951376 |
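
To make the terminology in this data model file concrete, here is a small sketch that takes one of the sample records from the tutorial and splits it into the parts named in the list: measurement, tags, fields, and the timestamp of the point. It assumes the simplest form of the line protocol, without escaped spaces or commas, and is meant as an illustration rather than a complete parser.

```shell
# One sample record in InfluxDB line protocol, taken from the tutorial.
record="demo,region=amazonas temperature=27.4,humidity=92.3,windspeed=4.5 1588363200"

# Layout: <measurement>[,<tag set>] <field set> [<timestamp>]
# Split on whitespace; this sketch assumes no escaped spaces or commas.
read -r head field_set timestamp <<EOF
$record
EOF

# The measurement is the text before the first comma.
measurement="${head%%,*}"

# The tag set is the text after the first comma, if there is one.
case "$head" in
  *,*) tag_set="${head#*,}" ;;
  *)   tag_set="" ;;
esac

echo "measurement (similar to a table): $measurement"
echo "tags (similar to indexed columns): $tag_set"
echo "fields (similar to non-indexed columns): $field_set"
echo "timestamp of the point (similar to a row): $timestamp"
```
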
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -12,7 +12,7 @@ Transfer data from InfluxDB bucket/measurement into CrateDB schema/table. | |
| ```shell | ||
| ctk load table \ | ||
| "influxdb2://example:token@influxdb.example.org:8086/testdrive/demo" \ | ||
| --cratedb-sqlalchemy-url="crate://user:password@cratedb.example.org:4200/testdrive/demo" | ||
| --cluster-url="crate://user:password@cratedb.example.org:4200/testdrive/demo" | ||
| ``` | ||
| Query data in CrateDB. | ||
| ```shell | ||
|
|
@@ -25,168 +25,104 @@ Transfer data from InfluxDB line protocol file into CrateDB schema/table. | |
| ```shell | ||
| ctk load table \ | ||
| "https://github.com/influxdata/influxdb2-sample-data/raw/master/air-sensor-data/air-sensor-data.lp" \ | ||
| --cratedb-sqlalchemy-url="crate://user:password@cratedb.example.org:4200/testdrive/air-sensor-data" | ||
| --cluster-url="crate://user:password@cratedb.example.org:4200/testdrive/air-sensor-data" | ||
| ``` | ||
| Query data in CrateDB. | ||
| ```shell | ||
| export CRATEPW=password | ||
| crash --host=cratedb.example.org --username=user --command='SELECT * FROM testdrive."air-sensor-data";' | ||
| ``` | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| ## Data Model | ||
| Docker is used for running all components. This approach works consistently | ||
| across Linux, macOS, and Windows. Alternatively, you can use Podman. | ||
|
|
||
| InfluxDB stores time series data in buckets and measurements. CrateDB stores | ||
| data in schemas and tables. | ||
|
|
||
| - A **bucket** is a named location with a retention policy where time series data is stored. | ||
| - A **series** is a logical grouping of data defined by shared measurement, tag, and field. | ||
| - A **measurement** is similar to an SQL database table. | ||
| - A **tag** is similar to an indexed column in an SQL database. | ||
| - A **field** is similar to an un-indexed column in an SQL database. | ||
| - A **point** is similar to an SQL row. | ||
|
|
||
| > via: [What are series and bucket in InfluxDB] | ||
|
|
||
| ## Tutorial | ||
|
|
||
| The tutorial heavily uses Docker to provide services and to run jobs. | ||
| Alternatively, you can use the drop-in replacement Podman. | ||
| The walkthrough uses a basic example setup, including InfluxDB 2.x and | ||
| a few sample records that are transferred to CrateDB. | ||
|
|
||
| ### Services | ||
| Create a shared network. | ||
| ```shell | ||
| docker network create cratedb-demo | ||
| ``` | ||
|
|
||
| Prerequisites are running instances of CrateDB and InfluxDB. | ||
| Start CrateDB. | ||
| ```shell | ||
| docker run --rm --name=cratedb --network=cratedb-demo \ | ||
| --publish=4200:4200 \ | ||
| --volume="$PWD/var/lib/cratedb:/data" \ | ||
| docker.io/crate -Cdiscovery.type=single-node | ||
| ``` | ||
|
|
||
| Start InfluxDB. | ||
| :::{code} shell | ||
| docker run --rm -it --name=influxdb \ | ||
| ```shell | ||
| docker run --rm --name=influxdb --network=cratedb-demo \ | ||
| --publish=8086:8086 \ | ||
| --env=DOCKER_INFLUXDB_INIT_MODE=setup \ | ||
| --env=DOCKER_INFLUXDB_INIT_USERNAME=admin \ | ||
| --env=DOCKER_INFLUXDB_INIT_PASSWORD=secret0000 \ | ||
| --env=DOCKER_INFLUXDB_INIT_ORG=example \ | ||
| --env=DOCKER_INFLUXDB_INIT_BUCKET=testdrive \ | ||
| --env=DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=token \ | ||
| --volume="$PWD/var/lib/influxdb2:/var/lib/influxdb2" \ | ||
| influxdb:2 | ||
| ::: | ||
| docker.io/influxdb:2 | ||
| ``` | ||
|
|
||
| Start CrateDB. | ||
| :::{code} shell | ||
| docker run --rm -it --name=cratedb \ | ||
| --publish=4200:4200 \ | ||
| --volume="$PWD/var/lib/cratedb:/data" \ | ||
| crate:latest -Cdiscovery.type=single-node | ||
| ::: | ||
| Prepare shortcuts for the CrateDB shell, CrateDB Toolkit, and the InfluxDB client | ||
| programs. | ||
|
|
||
| ::::{tab-set} | ||
|
|
||
| ### Sample Data | ||
| Command shortcuts. | ||
| :::{code} shell | ||
| :::{tab-item} Linux and macOS | ||
| To make the settings persistent, add them to your shell profile (`~/.profile`). | ||
| ```shell | ||
| alias crash="docker run --rm -it --network=cratedb-demo ghcr.io/crate/cratedb-toolkit crash" | ||
| alias ctk="docker run --rm -i --network=cratedb-demo ghcr.io/crate/cratedb-toolkit ctk" | ||
| alias influx="docker exec influxdb influx" | ||
| alias influx-write="influx write --bucket=testdrive --org=example --token=token --precision=s" | ||
| ``` | ||
| ::: | ||
| :::{tab-item} Windows PowerShell | ||
| To make the settings persistent, add them to your PowerShell profile (`$PROFILE`). | ||
| ```powershell | ||
| function crash { docker run --rm -it --network=cratedb-demo ghcr.io/crate/cratedb-toolkit crash @args } | ||
| function ctk { docker run --rm -i --network=cratedb-demo ghcr.io/crate/cratedb-toolkit ctk @args } | ||
| function influx { docker exec influxdb influx @args } | ||
| function influx-write { influx write --bucket=testdrive --org=example --token=token --precision=s @args } | ||
| ``` | ||
| ::: | ||
| :::{tab-item} Windows Command | ||
| ```shell | ||
| doskey crash=docker run --rm -it --network=cratedb-demo ghcr.io/crate/cratedb-toolkit crash $* | ||
| doskey ctk=docker run --rm -i --network=cratedb-demo ghcr.io/crate/cratedb-toolkit ctk $* | ||
| doskey influx=docker exec influxdb influx $* | ||
| doskey influx-write=influx write --bucket=testdrive --org=example --token=token --precision=s $* | ||
| ``` | ||
| ::: | ||
|
|
||
| :::: | ||
|
|
||
| Write a few samples worth of data to InfluxDB. | ||
| :::{code} shell | ||
| ## Usage | ||
|
|
||
| Write a few sample records to InfluxDB. | ||
| ```shell | ||
| influx-write "demo,region=amazonas temperature=27.4,humidity=92.3,windspeed=4.5 1588363200" | ||
| influx-write "demo,region=amazonas temperature=28.2,humidity=88.7,windspeed=4.7 1588549600" | ||
| influx-write "demo,region=amazonas temperature=27.9,humidity=91.6,windspeed=3.2 1588736000" | ||
| influx-write "demo,region=amazonas temperature=29.1,humidity=88.1,windspeed=2.4 1588922400" | ||
| influx-write "demo,region=amazonas temperature=28.6,humidity=93.4,windspeed=2.9 1589108800" | ||
| ::: | ||
|
|
||
| ### Data Import | ||
|
|
||
| First, create these command aliases, for better UX. | ||
| :::{code} shell | ||
| alias crash="docker run --rm -it --link=cratedb ghcr.io/crate/cratedb-toolkit:latest crash" | ||
| alias ctk="docker run --rm -it --link=cratedb --link=influxdb ghcr.io/crate/cratedb-toolkit:latest ctk" | ||
| ::: | ||
| ``` | ||
|
|
||
| Now, import data from InfluxDB bucket/measurement into CrateDB schema/table. | ||
| :::{code} shell | ||
| Invoke the data transfer pipeline, importing data from | ||
| InfluxDB bucket/measurement into CrateDB schema/table. | ||
| ```shell | ||
| ctk load table \ | ||
| "influxdb2://example:token@influxdb:8086/testdrive/demo" \ | ||
| --cratedb-sqlalchemy-url="crate://crate@cratedb:4200/testdrive/demo" | ||
| ::: | ||
|
|
||
| Verify that relevant data has been transferred to CrateDB. | ||
| :::{code} shell | ||
| crash --host=cratedb --command="SELECT * FROM testdrive.demo;" | ||
| ::: | ||
|
|
||
| ## Cloud to Cloud | ||
|
|
||
| The procedure for importing data from [InfluxDB Cloud] into [CrateDB Cloud] is | ||
| similar, with a few small adjustments. | ||
|
|
||
| First, helpful aliases again: | ||
| :::{code} shell | ||
| alias ctk="docker run --rm -it ghcr.io/crate/cratedb-toolkit:latest ctk" | ||
| alias crash="docker run --rm -it ghcr.io/crate/cratedb-toolkit:latest crash" | ||
| ::: | ||
|
|
||
| You will need your credentials for both CrateDB and InfluxDB. | ||
| These are, with examples: | ||
|
|
||
| :::{rubric} CrateDB Cloud | ||
| ::: | ||
| - Host: ```purple-shaak-ti.eks1.eu-west-1.aws.cratedb.net``` | ||
| - Username: ```admin``` | ||
| - Password: ```dZ..qB``` | ||
|
|
||
| :::{rubric} InfluxDB Cloud | ||
| ::: | ||
| - Host: ```eu-central-1-1.aws.cloud2.influxdata.com``` | ||
| - Organization ID: ```9fafc869a91a3406``` | ||
| - All-Access API token: ```T2..==``` | ||
|
|
||
| For CrateDB, the credentials are displayed at time of cluster creation. | ||
| For InfluxDB, they can be found in the [cloud platform] itself. | ||
|
|
||
| Now, same as before, import data from InfluxDB bucket/measurement into | ||
| CrateDB schema/table. | ||
| :::{code} shell | ||
| export CRATEPW='dZ..qB' | ||
| ctk load table \ | ||
| "influxdb2://9f..06:[email protected]/testdrive/demo?ssl=true" \ | ||
| --cratedb-sqlalchemy-url="crate://admin:${CRATEPW}@purple-shaak-ti.eks1.eu-west-1.aws.cratedb.net:4200/testdrive/demo?ssl=true" | ||
| ::: | ||
|
|
||
| ::: {note} | ||
| Note the **necessary** `ssl=true` query parameter at the end of both database connection URLs | ||
| when working on Cloud-to-Cloud transfers. | ||
| ::: | ||
|
|
||
| Verify that relevant data has been transferred to CrateDB. | ||
| :::{code} shell | ||
| export CRATEPW='dZ..qB' | ||
| crash --hosts 'https://admin:${CRATEPW}@purple-shaak-ti.eks1.eu-west-1.aws.cratedb.net:4200' --command 'SELECT * FROM testdrive.demo;' | ||
| ::: | ||
|
|
||
| ## More information | ||
|
|
||
| There are more ways to apply the I/O subsystem of CrateDB Toolkit as | ||
| pipeline elements in your daily data operations routines. Please visit the | ||
| [CrateDB Toolkit InfluxDB I/O subsystem] documentation, to learn more about what's possible. | ||
|
|
||
| The InfluxDB I/O subsystem is based on the [influxio] package. See its | ||
| documentation for additional capabilities when working with InfluxDB. | ||
| --cluster-url="crate://crate@cratedb:4200/doc/testdrive" | ||
| ``` | ||
|
|
||
| :::{note} | ||
| **Important:** If you discover any issues with this adapter, please | ||
| [report them] back to us. | ||
| ::: | ||
| Inspect data stored in CrateDB. | ||
| ```shell | ||
| crash --hosts cratedb -c "SELECT * FROM doc.testdrive" | ||
| ``` | ||
|
Comment on lines +120 to +123
Update verification query to match the standardized target. Query the table you just loaded into: testdrive.demo.
Member, Author
@coderabbitai:
@amotl Thank you for the correction! You're absolutely right - |
||
|
|
||
|
|
||
| [cloud platform]: https://docs.influxdata.com/influxdb/cloud/admin | ||
| [CrateDB]: https://github.com/crate/crate | ||
| [CrateDB Cloud]: https://console.cratedb.cloud/ | ||
| [CrateDB Toolkit InfluxDB I/O subsystem]: https://cratedb-toolkit.readthedocs.io/io/influxdb/loader.html | ||
| [InfluxDB]: https://github.com/influxdata/influxdb | ||
| [InfluxDB Cloud]: https://cloud2.influxdata.com/ | ||
| [influxio]: https://influxio.readthedocs.io/ | ||
| [report them]: https://github.com/crate/cratedb-toolkit/issues | ||
| [What are series and bucket in InfluxDB]: https://stackoverflow.com/questions/58190272/what-are-series-and-bucket-in-influxdb/69951376#69951376 | ||
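
The tutorial in this diff starts CrateDB and InfluxDB with `docker run` and then writes and transfers data right away. If the services have not finished starting, those steps can fail, so a small readiness check may be useful. The sketch below defines a hypothetical `wait_for` helper, which is not part of the PR: it retries an arbitrary probe command once per second until it succeeds or the attempts run out.

```shell
# wait_for LABEL ATTEMPTS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND up to ATTEMPTS times, one second apart,
# and returns 0 as soon as it succeeds, or 1 if it never does.
wait_for() {
  label="$1"
  attempts="$2"
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "$label is ready"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "$label did not become ready after $attempts attempts" >&2
  return 1
}

# Possible probes, assuming the published ports 4200 and 8086 from the
# tutorial and that curl is installed on the host:
#   wait_for CrateDB 30 curl --silent --fail http://localhost:4200/
#   wait_for InfluxDB 30 curl --silent --fail http://localhost:8086/health
```

The probe is passed as a command rather than hard-coded, so the same helper could wrap any other check, for example one issued through the `influx` alias from the tutorial.
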