5 5
6 6 :::{include} /_include/links.md
7 7 :::
8+ :::{include} /_include/styles.html
9+ :::
8 10
9 11 :::{div}
10- CrateDB provides many options to connect and integrate with third-party
12+ Options to connect and integrate CrateDB with third-party
11 13 ETL applications, mostly using [CrateDB's PostgreSQL interface].
12 14 CrateDB also provides native adapter components to leverage advanced
13 15 features.
14 16
15- This documentation section lists corresponding ETL applications and
17+ This documentation section lists ETL applications and
16 18 frameworks which can be used together with CrateDB, and outlines how
17 19 to use them optimally.
18 20 Please also take a look at support for {ref}`cdc` solutions.
19 21 :::
20 22
21 23
24+ :::{rubric} Grouped by category
25+ :::
26+
27+ :::::{grid} 1 2 2 2
28+ :margin: 4 4 0 0
29+ :padding: 0
30+ :gutter: 2
31+ :class-container: ul-li-wide
32+
33+
34+ ::::{grid-item-card} {material-outlined}`air;2em` Dataflow / Pipeline / Code-first
35+ - {ref}`apache-airflow`
36+
37+ Apache Airflow is an open source software platform to programmatically author,
38+ schedule, and monitor workflows. Pipelines are defined in Python, allowing for
39+ dynamic pipeline generation and on-demand, code-driven pipeline invocation.
40+
41+ - {ref}`apache-flink`
42+
43+ Apache Flink is a programming framework and distributed processing engine for
44+ stateful computations over unbounded and bounded data streams, written in Java.
45+
46+ - {ref}`apache-nifi`
47+
48+ Apache NiFi is a dataflow system based on the concepts of flow-based programming.
49+ It supports powerful and scalable directed graphs of data routing, transformation,
50+ and system mediation logic.
51+
52+ - {ref}`dbt`
53+
54+ dbt is an SQL-first platform for transforming data in data warehouses using
55+ Python and SQL. The data abstraction layer provided by dbt-core allows the
56+ decoupling of the models on which reports and dashboards rely from the source data.
57+
58+ - {ref}`kestra`
59+
60+ Kestra is an open source workflow automation and orchestration toolkit with a rich
61+ plugin ecosystem. It enables users to automate and manage complex workflows in a
62+ streamlined and efficient manner, defining them either declaratively or imperatively
63+ using any scripting language such as Python, Bash, or JavaScript.
64+
65+ - {ref}`meltano`
66+
67+ Meltano is a declarative code-first polyglot data integration engine adhering to
68+ the Singer specification. Singer is a composable open source ETL framework and
69+ specification, including powerful data extraction and consolidation elements.
70+
71+ +++
72+ Data pipeline programming frameworks and platforms.
73+ ::::
74+
75+
76+ ::::{grid-item-card} {material-outlined}`all_inclusive;2em` Low-code / No-code / Visual
77+ - {ref}`apache-hop`
78+
79+ Apache Hop aims to be the future of data integration. Visual development enables
80+ developers to be more productive than they can be through code.
81+
82+ - {ref}`estuary`
83+
84+ Estuary provides real-time data integration and modern ETL and ELT data pipelines
85+ as a fully managed solution. Estuary Flow is a real-time, reliable change data
86+ capture (CDC) solution.
87+
88+ - {ref}`node-red`
89+
90+ Node-RED is an open-source programming tool for wiring together hardware devices,
91+ APIs and online services within a low-code programming environment for event-driven
92+ applications.
93+
94+ +++
95+ Visual data flow and integration frameworks and platforms.
96+ ::::
97+
98+
99+ ::::{grid-item-card} {material-outlined}`storage;2em` Databases
100+ - {ref}`aws-dms`
101+
102+ AWS DMS is a managed migration and replication service that helps move your
103+ database and analytics workloads between different kinds of databases quickly,
104+ securely, and with minimal downtime and zero data loss.
105+
106+ - {ref}`aws-dynamodb`
107+
108+ DynamoDB is a fully managed NoSQL database service provided by Amazon Web Services (AWS).
109+
110+ - {ref}`influxdb`
111+
112+ InfluxDB is a scalable datastore for metrics, events, and real-time analytics to
113+ collect, process, transform, and store event and time series data.
114+
115+ - {ref}`mongodb`
116+
117+ MongoDB is a document database designed for ease of application development and scaling.
118+
119+ - {ref}`mysql`
120+
121+ MySQL and MariaDB are well-known free and open-source relational database management
122+ systems (RDBMS), available as standalone and managed variants.
123+
124+ - {ref}`sql-server`
125+
126+ Microsoft SQL Server Integration Services (SSIS) is a component of the Microsoft SQL
127+ Server database software that can be used to perform a broad range of data migration tasks.
128+
129+ +++
130+ Load data from database systems.
131+ ::::
132+
133+
134+ ::::{grid-item-card} {material-outlined}`fast_forward;2em` Streams
135+ - {ref}`apache-kafka`
136+
137+ Apache Kafka is an open-source distributed event streaming platform
138+ for high-performance data pipelines, streaming analytics, data integration,
139+ and mission-critical applications.
140+
141+ - {ref}`aws-kinesis`
142+
143+ Amazon Kinesis Data Streams is a serverless streaming data service that simplifies
144+ the capture, processing, and storage of data streams at any scale, such as
145+ application logs, website clickstreams, and IoT telemetry data, for machine
146+ learning (ML), analytics, and other applications.
147+
148+ - {ref}`risingwave`
149+
150+ RisingWave is a stream processing and management platform that allows configuring
151+ data sources, views on that data, and destinations where results are materialized.
152+ It provides both a Postgres-compatible SQL interface, like CrateDB, and a
153+ DataFrame-style Python interface.
154+ It delivers low-latency insights from real-time streams, database CDC, and
155+ time-series data, bringing streaming and batch together.
156+
157+ - {ref}`streamsets`
158+
159+ The StreamSets Data Collector is a lightweight and powerful engine that allows you
160+ to build streaming, batch and change-data-capture (CDC) pipelines that can ingest
161+ and transform data from a variety of sources.
162+
163+ +++
164+ Load data from streaming platforms.
165+ ::::
166+
167+
168+ ::::{grid-item-card} {material-outlined}`add_to_queue;2em` Serverless Compute
169+
170+ - {ref}`azure-functions`
171+
172+ An Azure Function is a short-lived, serverless computation that is triggered by
173+ external events. The trigger produces an input payload, which is delivered to
174+ the Azure Function. The Azure Function then does computation with this payload
175+ and subsequently outputs its result to other Azure Functions, computation
176+ services, or storage services.
177+ +++
178+ Use serverless compute units for custom import tasks.
179+ ::::
180+
181+
182+ ::::{grid-item-card} {material-outlined}`dataset;2em` Datasets
183+
184+ - {ref}`apache-iceberg`
185+
186+ Apache Iceberg is an open table format for analytic datasets.
187+
188+ +++
189+ Load data from datasets and open table formats.
190+ ::::
191+
192+
193+ :::::
194+
195+
196+ :::{rubric} Alphabetically sorted
197+ :::
198+
199+ :::{div}
22 200 - {ref}`apache-airflow`
23 201 - {ref}`apache-flink`
24 202 - {ref}`apache-hop`
25 203 - {ref}`apache-iceberg`
26 204 - {ref}`apache-kafka`
27 205 - {ref}`apache-nifi`
28- - {ref}`aws-dms`
29 206 - {ref}`aws-dynamodb`
30 207 - {ref}`aws-kinesis`
208+ - {ref}`aws-dms`
31 209 - {ref}`azure-functions`
32 210 - {ref}`dbt`
33 211 - {ref}`estuary`
@@ -40,3 +218,4 @@ Please also take a look at support for {ref}`cdc` solutions.
40 218 - {ref}`risingwave`
41 219 - {ref}`sql-server`
42 220 - {ref}`streamsets`
221+ :::