diff --git a/contribute/style-guide.md b/contribute/style-guide.md index 3d8fc8be8dc..02519179d42 100644 --- a/contribute/style-guide.md +++ b/contribute/style-guide.md @@ -340,7 +340,7 @@ When using URL parameters to control which version of documentation is displayed there are conventions to follow for reliable functionality. Here's how the `?v=v08` parameter relates to the snippet selection: -#### How It Works +#### How it works The URL parameter acts as a selector that matches against the `version` property in your component configuration. For example: @@ -393,3 +393,22 @@ show_related_blogs: true This will show it on the page, assuming there is a matching blog. If there is no match then it remains hidden. + +## Vale + +Vale is a command-line tool that brings code-like linting to prose. +We have a number of rules set up to ensure that our documentation is +consistent in style. + +The style rules are located at `/styles/ClickHouse` and are largely based +on the Google style, with some ClickHouse-specific adaptations. +If you want to check only a specific rule locally, you +can run: + +```bash +vale --filter='.Name == "ClickHouse.Headings"' docs/integrations +``` + +This will run only the rule named `Headings` on +the `docs/integrations` directory. You can also specify a single +Markdown file. diff --git a/docs/_snippets/_GCS_authentication_and_bucket.md b/docs/_snippets/_GCS_authentication_and_bucket.md index 6e3a45cd436..546666a8049 100644 --- a/docs/_snippets/_GCS_authentication_and_bucket.md +++ b/docs/_snippets/_GCS_authentication_and_bucket.md @@ -19,7 +19,7 @@ import Image from '@theme/IdealImage'; Creating a GCS bucket in US East 4 -### Generate an Access key {#generate-an-access-key} +### Generate an access key {#generate-an-access-key} ### Create a service account HMAC key and secret {#create-a-service-account-hmac-key-and-secret} diff --git a/docs/_snippets/_add_superset_detail.md b/docs/_snippets/_add_superset_detail.md index 9df8b5c920a..88e64ec9be1 100644 --- a/docs/_snippets/_add_superset_detail.md +++ b/docs/_snippets/_add_superset_detail.md @@ -13,7 +13,7 @@ There are a few tasks to be done before running `docker compose`: The commands below are to be run from the top level of the GitHub repo, `superset`. ::: -## Official ClickHouse Connect driver {#official-clickhouse-connect-driver} +## Official ClickHouse connect driver {#official-clickhouse-connect-driver} To make the ClickHouse Connect driver available in the Superset deployment add it to the local requirements file: diff --git a/docs/_snippets/_users-and-roles-common.md b/docs/_snippets/_users-and-roles-common.md index 0a78e88d7aa..29726229be9 100644 --- a/docs/_snippets/_users-and-roles-common.md +++ b/docs/_snippets/_users-and-roles-common.md @@ -269,7 +269,7 @@ Roles are used to define groups of users for certain privileges instead of manag Verify that only the above two rows are returned, rows with the value `B` in `column1` should be excluded. ::: -## Modifying Users and Roles {#modifying-users-and-roles} +## Modifying users and roles {#modifying-users-and-roles} Users can be assigned multiple roles for a combination of privileges needed. When using multiple roles, the system will combine the roles to determine privileges, the net effect will be that the role permissions will be cumulative.
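To illustrate the cumulative behaviour described above, here is a minimal sketch in ClickHouse SQL; the role and user names are hypothetical and not taken from the guide:

```sql
-- Hypothetical roles and user, for illustration only
CREATE ROLE IF NOT EXISTS column_a_reader;
CREATE ROLE IF NOT EXISTS column_b_reader;

-- Granting both roles gives the user the union of their privileges
GRANT column_a_reader, column_b_reader TO example_user;

-- Inspect the combined, cumulative grants
SHOW GRANTS FOR example_user;
```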
diff --git a/docs/about-us/beta-and-experimental-features.md b/docs/about-us/beta-and-experimental-features.md index 7ca6657d1cf..a38c8d9ab5b 100644 --- a/docs/about-us/beta-and-experimental-features.md +++ b/docs/about-us/beta-and-experimental-features.md @@ -14,7 +14,7 @@ Due to the uncertainty of when features are classified as generally available, w The sections below explicitly describe the properties of **Beta** and **Experimental** features: -## Beta Features {#beta-features} +## Beta features {#beta-features} - Under active development to make them generally available (GA) - Main known issues can be tracked on GitHub @@ -26,7 +26,7 @@ You can find below the features considered Beta in ClickHouse Cloud and are avai Note: please be sure to be using a current version of the ClickHouse [compatibility](/operations/settings/settings#compatibility) setting to be using a recently introduced feature. -## Experimental Features {#experimental-features} +## Experimental features {#experimental-features} - May never become GA - May be removed diff --git a/docs/about-us/distinctive-features.md b/docs/about-us/distinctive-features.md index 77b13be8702..75e4413cb47 100644 --- a/docs/about-us/distinctive-features.md +++ b/docs/about-us/distinctive-features.md @@ -7,9 +7,9 @@ title: 'Distinctive Features of ClickHouse' keywords: ['compression', 'secondary-indexes','column-oriented'] --- -# Distinctive Features of ClickHouse +# Distinctive features of ClickHouse -## True Column-Oriented Database Management System {#true-column-oriented-database-management-system} +## True column-oriented database management system {#true-column-oriented-database-management-system} In a real column-oriented DBMS, no extra data is stored with the values. This means that constant-length values must be supported to avoid storing their length "number" next to the values. For example, a billion UInt8-type values should consume around 1 GB uncompressed, or this strongly affects the CPU use. It is essential to store data compactly (without any "garbage") even when uncompressed since the speed of decompression (CPU usage) depends mainly on the volume of uncompressed data. @@ -17,29 +17,29 @@ This is in contrast to systems that can store values of different columns separa Finally, ClickHouse is a database management system, not a single database. It allows creating tables and databases in runtime, loading data, and running queries without reconfiguring and restarting the server. -## Data Compression {#data-compression} +## Data compression {#data-compression} Some column-oriented DBMSs do not use data compression. However, data compression plays a key role in achieving excellent performance. In addition to efficient general-purpose compression codecs with different trade-offs between disk space and CPU consumption, ClickHouse provides [specialized codecs](/sql-reference/statements/create/table.md#specialized-codecs) for specific kinds of data, which allow ClickHouse to compete with and outperform more niche databases, like time-series ones. -## Disk Storage of Data {#disk-storage-of-data} +## Disk storage of data {#disk-storage-of-data} Keeping data physically sorted by primary key makes it possible to extract data based on specific values or value ranges with low latency in less than a few dozen milliseconds. Some column-oriented DBMSs, such as SAP HANA and Google PowerDrill, can only work in RAM. This approach requires allocation of a larger hardware budget than necessary for real-time analysis. 
ClickHouse is designed to work on regular hard drives, which means the cost per GB of data storage is low, but SSD and additional RAM are also fully used if available. -## Parallel Processing on Multiple Cores {#parallel-processing-on-multiple-cores} +## Parallel processing on multiple cores {#parallel-processing-on-multiple-cores} Large queries are parallelized naturally, taking all the necessary resources available on the current server. -## Distributed Processing on Multiple Servers {#distributed-processing-on-multiple-servers} +## Distributed processing on multiple servers {#distributed-processing-on-multiple-servers} Almost none of the columnar DBMSs mentioned above have support for distributed query processing. In ClickHouse, data can reside on different shards. Each shard can be a group of replicas used for fault tolerance. All shards are used to run a query in parallel, transparently for the user. -## SQL Support {#sql-support} +## SQL support {#sql-support} ClickHouse supports [SQL language](/sql-reference/) that is mostly compatible with the ANSI SQL standard. @@ -47,29 +47,29 @@ Supported queries include [GROUP BY](../sql-reference/statements/select/group-by Correlated (dependent) subqueries are not supported at the time of writing but might become available in the future. -## Vector Computation Engine {#vector-engine} +## Vector computation engine {#vector-engine} Data is not only stored by columns but is processed by vectors (parts of columns), which allows achieving high CPU efficiency. -## Real-Time Data Inserts {#real-time-data-updates} +## Real-time data inserts {#real-time-data-updates} ClickHouse supports tables with a primary key. To quickly perform queries on the range of the primary key, the data is sorted incrementally using the merge tree. Due to this, data can continually be added to the table. No locks are taken when new data is ingested. -## Primary Indexes {#primary-index} +## Primary indexes {#primary-index} Having data physically sorted by primary key makes it possible to extract data based on specific values or value ranges with low latency in less than a few dozen milliseconds. -## Secondary Indexes {#secondary-indexes} +## Secondary indexes {#secondary-indexes} Unlike other database management systems, secondary indexes in ClickHouse do not point to specific rows or row ranges. Instead, they allow the database to know in advance that all rows in some data parts would not match the query filtering conditions and do not read them at all, thus they are called [data skipping indexes](../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-data_skipping-indexes). -## Suitable for Online Queries {#suitable-for-online-queries} +## Suitable for online queries {#suitable-for-online-queries} Most OLAP database management systems do not aim for online queries with sub-second latencies. In alternative systems, report building time of tens of seconds or even minutes is often considered acceptable. Sometimes it takes even more time, which forces systems to prepare reports offline (in advance or by responding with "come back later"). In ClickHouse "low latency" means that queries can be processed without delay and without trying to prepare an answer in advance, right at the same moment as the user interface page is loading. In other words, online. 
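As a concrete sketch of the data-skipping indexes mentioned above, assuming a hypothetical table and column rather than anything from the docs:

```sql
-- A minmax skipping index records min/max per block of granules, letting
-- ClickHouse skip data parts that cannot match the filter.
CREATE TABLE example_events
(
    event_time  DateTime,
    status_code UInt16,
    INDEX idx_status status_code TYPE minmax GRANULARITY 4
)
ENGINE = MergeTree
ORDER BY event_time;

-- Granule ranges whose min/max exclude 500 are never read
SELECT count() FROM example_events WHERE status_code = 500;
```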
-## Support for Approximated Calculations {#support-for-approximated-calculations} +## Support for approximated calculations {#support-for-approximated-calculations} ClickHouse provides various ways to trade accuracy for performance: @@ -77,11 +77,11 @@ ClickHouse provides various ways to trade accuracy for performance: 2. Running a query based on a part ([SAMPLE](../sql-reference/statements/select/sample.md)) of data and getting an approximated result. In this case, proportionally less data is retrieved from the disk. 3. Running an aggregation for a limited number of random keys, instead of for all keys. Under certain conditions for key distribution in the data, this provides a reasonably accurate result while using fewer resources. -## Adaptive Join Algorithm {#adaptive-join-algorithm} +## Adaptive join algorithm {#adaptive-join-algorithm} ClickHouse adaptively chooses how to [JOIN](../sql-reference/statements/select/join.md) multiple tables, by preferring hash-join algorithm and falling back to the merge-join algorithm if there's more than one large table. -## Data Replication and Data Integrity Support {#data-replication-and-data-integrity-support} +## Data replication and data integrity support {#data-replication-and-data-integrity-support} ClickHouse uses asynchronous multi-master replication. After being written to any available replica, all the remaining replicas retrieve their copy in the background. The system maintains identical data on different replicas. Recovery after most failures is performed automatically, or semi-automatically in complex cases. @@ -91,7 +91,7 @@ For more information, see the section [Data replication](../engines/table-engine ClickHouse implements user account management using SQL queries and allows for [role-based access control configuration](/guides/sre/user-management/index.md) similar to what can be found in ANSI SQL standard and popular relational database management systems. -## Features that Can Be Considered Disadvantages {#clickhouse-features-that-can-be-considered-disadvantages} +## Features that can be considered disadvantages {#clickhouse-features-that-can-be-considered-disadvantages} 1. No full-fledged transactions. 2. Lack of ability to modify or delete already inserted data with a high rate and low latency. There are batch deletes and updates available to clean up or modify data, for example, to comply with [GDPR](https://gdpr-info.eu). diff --git a/docs/about-us/history.md b/docs/about-us/history.md index 3546af768e7..9a888930af6 100644 --- a/docs/about-us/history.md +++ b/docs/about-us/history.md @@ -7,7 +7,7 @@ keywords: ['history','development','Metrica'] title: 'ClickHouse History' --- -# ClickHouse History {#clickhouse-history} +# ClickHouse history {#clickhouse-history} ClickHouse was initially developed to power [Yandex.Metrica](https://metrica.yandex.com/), [the second largest web analytics platform in the world](http://w3techs.com/technologies/overview/traffic_analysis/all), and continues to be its core component. With more than 13 trillion records in the database and more than 20 billion events daily, ClickHouse allows generating custom reports on the fly directly from non-aggregated data. This article briefly covers the goals of ClickHouse in the early stages of its development. @@ -15,7 +15,7 @@ Yandex.Metrica builds customized reports on the fly based on hits and sessions, As of April 2014, Yandex.Metrica was tracking about 12 billion events (page views and clicks) daily. 
All these events needed to be stored, in order to build custom reports. A single query may have required scanning millions of rows within a few hundred milliseconds, or hundreds of millions of rows in just a few seconds. -## Usage in Yandex.Metrica and Other Yandex Services {#usage-in-yandex-metrica-and-other-yandex-services} +## Usage in Yandex.Metrica and other Yandex services {#usage-in-yandex-metrica-and-other-yandex-services} ClickHouse serves multiple purposes in Yandex.Metrica. Its main task is to build reports in online mode using non-aggregated data. It uses a cluster of 374 servers, which store over 20.3 trillion rows in the database. The volume of compressed data is about 2 PB, without accounting for duplicates and replicas. The volume of uncompressed data (in TSV format) would be approximately 17 PB. @@ -30,7 +30,7 @@ ClickHouse also plays a key role in the following processes: Nowadays, there are a multiple dozen ClickHouse installations in other Yandex services and departments: search verticals, e-commerce, advertisement, business analytics, mobile development, personal services, and others. -## Aggregated and Non-aggregated Data {#aggregated-and-non-aggregated-data} +## Aggregated and non-aggregated data {#aggregated-and-non-aggregated-data} There is a widespread opinion that to calculate statistics effectively, you must aggregate data since this reduces the volume of data. diff --git a/docs/about-us/index.md b/docs/about-us/index.md index 6577c1464ab..d61d835efd8 100644 --- a/docs/about-us/index.md +++ b/docs/about-us/index.md @@ -9,11 +9,11 @@ description: 'Landing page for About ClickHouse' In this section of the docs you'll find information about ClickHouse. Refer to the table of contents below for a list of pages in this section of the docs. -| Page | Description | -|------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| [What is ClickHouse](/about-clickhouse) | Introduces ClickHouse's core features, architecture, and uses, providing a concise overview for new users. | -| [Adopters](/about-us/adopters) | A list of companies using ClickHouse and their success stories, assembled from public sources | -| [Support](/about-us/support) | An introduction to ClickHouse Cloud Support Services and their mission. | -| [Beta Features and Experimental](/beta-and-experimental-features) | Learn about how ClickHouse uses "Beta" and "Experimental" labels to distinguish between officially supported and early-stage, unsupported features due to varied development speeds from community contributions. | -| [Cloud Service](/about-us/cloud) | Discover ClickHouse Cloud - a fully managed service that allows users to spin up open-source ClickHouse databases and offers benefits like fast time to value, seamless scaling, and serverless operations. | -| [ClickHouse History](/about-us/history) | Learn more about the history of ClickHouse. 
| +| Page | Description | +|----------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [What is ClickHouse](/about-clickhouse) | Introduces ClickHouse's core features, architecture, and uses, providing a concise overview for new users. | +| [Adopters](/about-us/adopters) | A list of companies using ClickHouse and their success stories, assembled from public sources | +| [Support](/about-us/support) | An introduction to ClickHouse Cloud support services and their mission. | +| [Beta features and experimental features](/beta-and-experimental-features) | Learn about how ClickHouse uses "Beta" and "Experimental" labels to distinguish between officially supported and early-stage, unsupported features due to varied development speeds from community contributions. | +| [Cloud service](/about-us/cloud) | Discover ClickHouse Cloud - a fully managed service that allows users to spin up open-source ClickHouse databases and offers benefits like fast time to value, seamless scaling, and serverless operations. | +| [ClickHouse history](/about-us/history) | Learn more about the history of ClickHouse. | diff --git a/docs/about-us/support.md b/docs/about-us/support.md index 2b2b4225e1f..fb02f17832c 100644 --- a/docs/about-us/support.md +++ b/docs/about-us/support.md @@ -1,12 +1,12 @@ --- slug: /about-us/support sidebar_label: 'Support' -title: 'ClickHouse Cloud Support Services' +title: 'ClickHouse Cloud support services' sidebar_position: 30 description: 'Information on ClickHouse Cloud support services' --- -# ClickHouse Cloud Support Services +# ClickHouse Cloud support services ClickHouse provides Support Services for our ClickHouse Cloud users and customers. Our objective is a Support Services team that represents the ClickHouse product – unparalleled performance, ease of use, and exceptionally fast, high-quality results. For details, [visit our ClickHouse Support Program](https://clickhouse.com/support/program/) page. diff --git a/docs/architecture/cluster-deployment.md b/docs/architecture/cluster-deployment.md index af00312b8ac..4b7a2f81068 100644 --- a/docs/architecture/cluster-deployment.md +++ b/docs/architecture/cluster-deployment.md @@ -1,8 +1,8 @@ --- slug: /architecture/cluster-deployment -sidebar_label: 'Cluster Deployment' +sidebar_label: 'Cluster deployment' sidebar_position: 100 -title: 'Cluster Deployment' +title: 'Cluster deployment' description: 'By going through this tutorial, you will learn how to set up a simple ClickHouse cluster.' --- @@ -10,7 +10,7 @@ This tutorial assumes you've already set up a [local ClickHouse server](../getti By going through this tutorial, you'll learn how to set up a simple ClickHouse cluster. It'll be small, but fault-tolerant and scalable. Then we will use one of the example datasets to fill it with data and execute some demo queries. -## Cluster Deployment {#cluster-deployment} +## Cluster deployment {#cluster-deployment} This ClickHouse cluster will be a homogeneous cluster. 
Here are the steps: diff --git a/docs/best-practices/_snippets/_async_inserts.md b/docs/best-practices/_snippets/_async_inserts.md index 02e12fb39ee..186e699c890 100644 --- a/docs/best-practices/_snippets/_async_inserts.md +++ b/docs/best-practices/_snippets/_async_inserts.md @@ -21,7 +21,7 @@ When enabled (1), inserts are buffered and only written to disk once one of the This batching process is invisible to clients and helps ClickHouse efficiently merge insert traffic from multiple sources. However, until a flush occurs, the data cannot be queried. Importantly, there are multiple buffers per insert shape and settings combination, and in clusters, buffers are maintained per node - enabling fine-grained control across multi-tenant environments. Insert mechanics are otherwise identical to those described for [synchronous inserts](/best-practices/selecting-an-insert-strategy#synchronous-inserts-by-default). -### Choosing a Return Mode {#choosing-a-return-mode} +### Choosing a return mode {#choosing-a-return-mode} The behavior of asynchronous inserts is further refined using the [`wait_for_async_insert`](/operations/settings/settings#wait_for_async_insert) setting. diff --git a/docs/best-practices/_snippets/_avoid_optimize_final.md b/docs/best-practices/_snippets/_avoid_optimize_final.md index 262f1f8f7d5..9cc119a9bd9 100644 --- a/docs/best-practices/_snippets/_avoid_optimize_final.md +++ b/docs/best-practices/_snippets/_avoid_optimize_final.md @@ -18,7 +18,7 @@ OPTIMIZE TABLE FINAL; **you should avoid this operation in most cases** as it initiates resource intensive operations which may impact cluster performance. -## Why Avoid? {#why-avoid} +## Why avoid? {#why-avoid} ### It's expensive {#its-expensive} diff --git a/docs/best-practices/index.md b/docs/best-practices/index.md index 4f672250e63..5a3ae78ab5f 100644 --- a/docs/best-practices/index.md +++ b/docs/best-practices/index.md @@ -1,6 +1,6 @@ --- slug: /best-practices -keywords: ['Cloud', 'Primary key', 'Ordering key', 'Materialized Views', 'Best Practices', 'Bulk Inserts', 'Asynchronous Inserts', 'Avoid Mutations', 'Avoid Nullable Columns', 'Avoid Optimize Final', 'Partitioning Key'] +keywords: ['Cloud', 'Primary key', 'Ordering key', 'Materialized Views', 'Best Practices', 'Bulk Inserts', 'Asynchronous Inserts', 'Avoid Mutations', 'Avoid nullable columns', 'Avoid Optimize Final', 'Partitioning Key'] title: 'Overview' hide_title: true description: 'Landing page for Best Practices section in ClickHouse' diff --git a/docs/best-practices/select_data_type.md b/docs/best-practices/select_data_type.md index d9191294453..fa55d0a8a45 100644 --- a/docs/best-practices/select_data_type.md +++ b/docs/best-practices/select_data_type.md @@ -18,7 +18,7 @@ Some straightforward guidelines can significantly enhance the schema: * **Use Strict Types:** Always select the correct data type for columns. Numeric and date fields should use appropriate numeric and date types rather than general-purpose String types. This ensures correct semantics for filtering and aggregations. -* **Avoid Nullable Columns:** Nullable columns introduce additional overhead by maintaining separate columns for tracking null values. Only use Nullable if explicitly required to distinguish between empty and null states. Otherwise, default or zero-equivalent values typically suffice. For further information on why this type should be avoided unless needed, see [Avoid Nullable Columns](/best-practices/select-data-types#avoid-nullable-columns).
+* **Avoid nullable columns:** Nullable columns introduce additional overhead by maintaining separate columns for tracking null values. Only use Nullable if explicitly required to distinguish between empty and null states. Otherwise, default or zero-equivalent values typically suffice. For further information on why this type should be avoided unless needed, see [Avoid nullable columns](/best-practices/select-data-types#avoid-nullable-columns). * **Minimize Numeric Precision:** Select numeric types with minimal bit-width that still accommodate the expected data range. For instance, prefer [UInt16 over Int32](/sql-reference/data-types/int-uint) if negative values aren't needed, and the range fits within 0–65535. @@ -136,6 +136,6 @@ ENGINE = MergeTree ORDER BY tuple() ``` -## Avoid Nullable columns {#avoid-nullable-columns} +## Avoid nullable columns {#avoid-nullable-columns} diff --git a/docs/best-practices/selecting_an_insert_strategy.md b/docs/best-practices/selecting_an_insert_strategy.md index e4dfebf8ae7..f9ffca9edf5 100644 --- a/docs/best-practices/selecting_an_insert_strategy.md +++ b/docs/best-practices/selecting_an_insert_strategy.md @@ -130,7 +130,7 @@ When data arrives pre-sorted, ClickHouse can skip or simplify the internal sorti -## Choose an interface - HTTP or Native {#choose-an-interface} +## Choose an interface - HTTP or native {#choose-an-interface} ### Native {#choose-an-interface-native} diff --git a/docs/best-practices/sizing-and-hardware-recommendations.md b/docs/best-practices/sizing-and-hardware-recommendations.md index 6ef842bb49e..037c0ac6398 100644 --- a/docs/best-practices/sizing-and-hardware-recommendations.md +++ b/docs/best-practices/sizing-and-hardware-recommendations.md @@ -1,12 +1,12 @@ --- slug: /guides/sizing-and-hardware-recommendations -sidebar_label: 'Sizing and Hardware Recommendations' +sidebar_label: 'Sizing and hardware recommendations' sidebar_position: 4 -title: 'Sizing and Hardware Recommendations' +title: 'Sizing and hardware recommendations' description: 'This guide discusses our general recommendations regarding hardware, compute, memory, and disk configurations for open-source users.' --- -# Sizing and Hardware Recommendations +# Sizing and hardware recommendations This guide discusses our general recommendations regarding hardware, compute, memory, and disk configurations for open-source users. If you would like to simplify your setup, we recommend using [ClickHouse Cloud](https://clickhouse.com/cloud) as it automatically scales and adapts to your workloads while minimizing costs pertaining to infrastructure management. diff --git a/docs/best-practices/use_materialized_views.md b/docs/best-practices/use_materialized_views.md index 18737240b4c..ed49b160a94 100644 --- a/docs/best-practices/use_materialized_views.md +++ b/docs/best-practices/use_materialized_views.md @@ -28,7 +28,7 @@ ClickHouse supports two types of materialized views: [**incremental**](/material The choice between incremental and refreshable materialized views depends largely on the nature of the query, how frequently data changes, and whether updates to the view must reflect every row as it is inserted, or if a periodic refresh is acceptable. Understanding these trade-offs is key to designing performant, scalable materialized views in ClickHouse.
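As a point of reference for the sections that follow, a minimal incremental materialized view might look like the sketch below; the table and view names are illustrative only:

```sql
-- Hypothetical source table and aggregation target
CREATE TABLE page_views (ts DateTime, path String)
ENGINE = MergeTree ORDER BY ts;

CREATE TABLE page_views_per_day (day Date, views UInt64)
ENGINE = SummingMergeTree ORDER BY day;

-- Incremental materialized view: rows are aggregated at insert time
CREATE MATERIALIZED VIEW page_views_per_day_mv TO page_views_per_day AS
SELECT toDate(ts) AS day, count() AS views
FROM page_views
GROUP BY day;
```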
-## When to Use Incremental Materialized Views {#when-to-use-incremental-materialized-views} +## When to use incremental materialized views {#when-to-use-incremental-materialized-views} Incremental materialized views are generally preferred, as they update automatically in real-time whenever the source tables receive new data. They support all aggregation functions and are particularly effective for aggregations over a single table. By computing results incrementally at insert-time, queries run against significantly smaller data subsets, allowing these views to scale effortlessly even to petabytes of data. In most cases they will have no appreciable impact on overall cluster performance. @@ -40,7 +40,7 @@ Use incremental materialized views when: For examples of incremental materialized views see [here](/materialized-view/incremental-materialized-view). -## When to Use Refreshable Materialized Views {#when-to-use-refreshable-materialized-views} +## When to use refreshable materialized views {#when-to-use-refreshable-materialized-views} Refreshable materialized views execute their queries periodically rather than incrementally, storing the query result set for rapid retrieval. @@ -60,7 +60,7 @@ In summary, use refreshable materialized views when: For examples of refreshable materialized views see [here](/materialized-view/refreshable-materialized-view). -### APPEND vs REPLACE Mode {#append-vs-replace-mode} +### APPEND vs REPLACE mode {#append-vs-replace-mode} Refreshable materialized views support two modes for writing data to the target table: `APPEND` and `REPLACE`. These modes define how the result of the view's query is written when the view is refreshed. diff --git a/docs/chdb/install/bun.md b/docs/chdb/install/bun.md index 6f421e7fec7..f8e9e134621 100644 --- a/docs/chdb/install/bun.md +++ b/docs/chdb/install/bun.md @@ -36,7 +36,9 @@ var result = query("SELECT version()", "CSV"); console.log(result); // 23.10.1.1 ``` + ### Session.Query(query, *format) {#sessionqueryquery-format} + ```javascript import { Session } from 'chdb-bun'; diff --git a/docs/chdb/install/c.md b/docs/chdb/install/c.md index cc0a019b24a..9ff218aab17 100644 --- a/docs/chdb/install/c.md +++ b/docs/chdb/install/c.md @@ -6,7 +6,9 @@ description: 'How to install chDB for C and C++' keywords: ['chdb', 'embedded', 'clickhouse-lite', 'install'] --- + # Installing chDB for C and C++ + ## Requirements {#requirements} diff --git a/docs/chdb/install/python.md b/docs/chdb/install/python.md index a62b2cdb9d1..446251f7c2c 100644 --- a/docs/chdb/install/python.md +++ b/docs/chdb/install/python.md @@ -73,7 +73,7 @@ print(f"SQL read {res.rows_read()} rows, {res.bytes_read()} bytes, elapsed {res. chdb.query('select * from file("data.parquet", Parquet)', 'Dataframe') ``` -### Query On Table (Pandas DataFrame, Parquet file/bytes, Arrow bytes) {#query-on-table-pandas-dataframe-parquet-filebytes-arrow-bytes} +### Query on table (Pandas DataFrame, Parquet file/bytes, Arrow bytes) {#query-on-table-pandas-dataframe-parquet-filebytes-arrow-bytes} **Query On Pandas DataFrame** @@ -90,7 +90,7 @@ print(ret_tbl) print(ret_tbl.query('select b, sum(a) from __table__ group by b')) ``` -### Query with Stateful Session {#query-with-stateful-session} +### Query with stateful session {#query-with-stateful-session} Sessions will keep the state of query. All DDL and DML state will be kept in a directory. Directory path can be passed in as an argument. If it is not passed, a temporary directory will be created. 
@@ -169,7 +169,7 @@ Some notes on the chDB Python UDF (User Defined Function) decorator. see also: [test_udf.py](https://github.com/chdb-io/chdb/blob/main/tests/test_udf.py). -### Python Table Engine {#python-table-engine} +### Python table engine {#python-table-engine} ### Query on Pandas DataFrame {#query-on-pandas-dataframe} diff --git a/docs/cloud/bestpractices/index.md b/docs/cloud/bestpractices/index.md index c1d00a2b712..965e290d0d3 100644 --- a/docs/cloud/bestpractices/index.md +++ b/docs/cloud/bestpractices/index.md @@ -1,6 +1,6 @@ --- slug: /cloud/bestpractices -keywords: ['Cloud', 'Best Practices', 'Bulk Inserts', 'Asynchronous Inserts', 'Avoid Mutations', 'Avoid Nullable Columns', 'Avoid Optimize Final', 'Low Cardinality Partitioning Key', 'Multi Tenancy', 'Usage Limits'] +keywords: ['Cloud', 'Best Practices', 'Bulk Inserts', 'Asynchronous Inserts', 'Avoid mutations', 'Avoid nullable columns', 'Avoid Optimize Final', 'Low Cardinality Partitioning Key', 'Multi Tenancy', 'Usage Limits'] title: 'Overview' hide_title: true description: 'Landing page for Best Practices section in ClickHouse Cloud' diff --git a/docs/cloud/bestpractices/multitenancy.md b/docs/cloud/bestpractices/multitenancy.md index c0a82c725c4..9f47dc580d1 100644 --- a/docs/cloud/bestpractices/multitenancy.md +++ b/docs/cloud/bestpractices/multitenancy.md @@ -305,7 +305,7 @@ User management is similar to the approaches described previously, since all ser Note the number of child services in a warehouse is limited to a small number. See [Warehouse limitations](/cloud/reference/warehouses#limitations). -## Separate Cloud service {#separate-service} +## Separate cloud service {#separate-service} The most radical approach is to use a different ClickHouse service per tenant. diff --git a/docs/cloud/changelogs/24_02.md b/docs/cloud/changelogs/24_02.md index 01c0426db0f..fa418da75a8 100644 --- a/docs/cloud/changelogs/24_02.md +++ b/docs/cloud/changelogs/24_02.md @@ -9,7 +9,7 @@ sidebar_position: 8 ### ClickHouse release tag: 24.2.2.15987 {#clickhouse-release-tag-242215987} -#### Backward Incompatible Change {#backward-incompatible-change} +#### Backward incompatible change {#backward-incompatible-change} * Validate suspicious/experimental types in nested types. Previously we didn't validate such types (except JSON) in nested types like Array/Tuple/Map. [#59385](https://github.com/ClickHouse/ClickHouse/pull/59385) ([Kruglov Pavel](https://github.com/Avogar)). * The sort clause `ORDER BY ALL` (introduced with v23.12) is replaced by `ORDER BY *`. The previous syntax was too error-prone for tables with a column `all`. [#59450](https://github.com/ClickHouse/ClickHouse/pull/59450) ([Robert Schulze](https://github.com/rschu1ze)). * Add sanity check for number of threads and block sizes. [#60138](https://github.com/ClickHouse/ClickHouse/pull/60138) ([Raúl Marín](https://github.com/Algunenano)). @@ -21,9 +21,9 @@ sidebar_position: 8 * ClickHouse allows arbitrary binary data in the String data type, which is typically UTF-8. Parquet/ORC/Arrow Strings only support UTF-8. That's why you can choose which Arrow's data type to use for the ClickHouse String data type - String or Binary. This is controlled by the settings, `output_format_parquet_string_as_string`, `output_format_orc_string_as_string`, `output_format_arrow_string_as_string`. While Binary would be more correct and compatible, using String by default will correspond to user expectations in most cases. 
Parquet/ORC/Arrow supports many compression methods, including lz4 and zstd. ClickHouse supports each and every compression method. Some inferior tools lack support for the faster `lz4` compression method, that's why we set `zstd` by default. This is controlled by the settings `output_format_parquet_compression_method`, `output_format_orc_compression_method`, and `output_format_arrow_compression_method`. We changed the default to `zstd` for Parquet and ORC, but not Arrow (it is emphasized for low-level usages). [#61817](https://github.com/ClickHouse/ClickHouse/pull/61817) ([Alexey Milovidov](https://github.com/alexey-milovidov)). * Fix for the materialized view security issue, which allowed a user to insert into a table without required grants for that. Fix validates that the user has permission to insert not only into a materialized view but also into all underlying tables. This means that some queries, which worked before, now can fail with Not enough privileges. To address this problem, the release introduces a new feature of SQL security for views [https://clickhouse.com/docs/sql-reference/statements/create/view#sql_security](/sql-reference/statements/create/view#sql_security). [#54901](https://github.com/ClickHouse/ClickHouse/pull/54901) ([pufit](https://github.com/pufit)) -#### New Feature {#new-feature} +#### New feature {#new-feature} * Topk/topkweighed support mode, which return count of values and it's error. [#54508](https://github.com/ClickHouse/ClickHouse/pull/54508) ([UnamedRus](https://github.com/UnamedRus)). -* Added new syntax which allows to specify definer user in View/Materialized View. This allows to execute selects/inserts from views without explicit grants for underlying tables. [#54901](https://github.com/ClickHouse/ClickHouse/pull/54901) ([pufit](https://github.com/pufit)). +* Added new syntax which allows to specify definer user in view/materialized view. This allows to execute selects/inserts from views without explicit grants for underlying tables. [#54901](https://github.com/ClickHouse/ClickHouse/pull/54901) ([pufit](https://github.com/pufit)). * Implemented automatic conversion of merge tree tables of different kinds to replicated engine. Create empty `convert_to_replicated` file in table's data directory (`/clickhouse/store/xxx/xxxyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/`) and that table will be converted automatically on next server start. [#57798](https://github.com/ClickHouse/ClickHouse/pull/57798) ([Kirill](https://github.com/kirillgarbar)). * Added table function `mergeTreeIndex`. It represents the contents of index and marks files of `MergeTree` tables. It can be used for introspection. Syntax: `mergeTreeIndex(database, table, [with_marks = true])` where `database.table` is an existing table with `MergeTree` engine. [#58140](https://github.com/ClickHouse/ClickHouse/pull/58140) ([Anton Popov](https://github.com/CurtizJ)). * Try to detect file format automatically during schema inference if it's unknown in `file/s3/hdfs/url/azureBlobStorage` engines. Closes [#50576](https://github.com/ClickHouse/ClickHouse/issues/50576). [#59092](https://github.com/ClickHouse/ClickHouse/pull/59092) ([Kruglov Pavel](https://github.com/Avogar)). @@ -38,7 +38,7 @@ sidebar_position: 8 * Added function `toMillisecond` which returns the millisecond component for values of type`DateTime` or `DateTime64`. [#60281](https://github.com/ClickHouse/ClickHouse/pull/60281) ([Shaun Struwig](https://github.com/Blargian)). 
* Support single-argument version for the merge table function, as `merge(['db_name', ] 'tables_regexp')`. [#60372](https://github.com/ClickHouse/ClickHouse/pull/60372) ([豪肥肥](https://github.com/HowePa)). * Make all format names case insensitive, like Tsv, or TSV, or tsv, or even rowbinary. [#60420](https://github.com/ClickHouse/ClickHouse/pull/60420) ([豪肥肥](https://github.com/HowePa)). -* Added new syntax which allows to specify definer user in View/Materialized View. This allows to execute selects/inserts from views without explicit grants for underlying tables. [#60439](https://github.com/ClickHouse/ClickHouse/pull/60439) ([pufit](https://github.com/pufit)). +* Added new syntax which allows to specify definer user in view/materialized view. This allows to execute selects/inserts from views without explicit grants for underlying tables. [#60439](https://github.com/ClickHouse/ClickHouse/pull/60439) ([pufit](https://github.com/pufit)). * Add four properties to the `StorageMemory` (memory-engine) `min_bytes_to_keep, max_bytes_to_keep, min_rows_to_keep` and `max_rows_to_keep` - Add tests to reflect new changes - Update `memory.md` documentation - Add table `context` property to `MemorySink` to enable access to table parameter bounds. [#60612](https://github.com/ClickHouse/ClickHouse/pull/60612) ([Jake Bamrah](https://github.com/JakeBamrah)). * Added function `toMillisecond` which returns the millisecond component for values of type`DateTime` or `DateTime64`. [#60649](https://github.com/ClickHouse/ClickHouse/pull/60649) ([Robert Schulze](https://github.com/rschu1ze)). * Separate limits on number of waiting and executing queries. Added new server setting `max_waiting_queries` that limits the number of queries waiting due to `async_load_databases`. Existing limits on number of executing queries no longer count waiting queries. [#61053](https://github.com/ClickHouse/ClickHouse/pull/61053) ([Sergei Trifonov](https://github.com/serxa)). diff --git a/docs/cloud/changelogs/24_05.md b/docs/cloud/changelogs/24_05.md index dde199e3d84..62e97dd6e9c 100644 --- a/docs/cloud/changelogs/24_05.md +++ b/docs/cloud/changelogs/24_05.md @@ -7,11 +7,11 @@ sidebar_label: '24.5' sidebar_position: 7 --- -# v24.5 Changelog for Cloud +# V24.5 changelog for Cloud Relevant changes for ClickHouse Cloud services based on the v24.5 release. -## Breaking Changes {#breaking-changes} +## Breaking changes {#breaking-changes} * Change the column name from duration_ms to duration_microseconds in the system.zookeeper table to reflect the reality that the duration is in the microsecond resolution. [#60774](https://github.com/ClickHouse/ClickHouse/pull/60774) (Duc Canh Le). @@ -22,7 +22,7 @@ Relevant changes for ClickHouse Cloud services based on the v24.5 release. * Usage of functions neighbor, runningAccumulate, runningDifferenceStartingWithFirstValue, runningDifference deprecated (because it is error-prone). Proper window functions should be used instead. To enable them back, set allow_deprecated_error_prone_window_functions=1. [#63132](https://github.com/ClickHouse/ClickHouse/pull/63132) (Nikita Taranov). -## Backward Incompatible Changes {#backward-incompatible-changes} +## Backward incompatible changes {#backward-incompatible-changes} * In the new ClickHouse version, the functions geoDistance, greatCircleDistance, and greatCircleAngle will use 64-bit double precision floating point data type for internal calculations and return type if all the arguments are Float64. This closes #58476. 
In previous versions, the function always used Float32. You can switch to the old behavior by setting geo_distance_returns_float64_on_float64_arguments to false or setting compatibility to 24.2 or earlier. [#61848](https://github.com/ClickHouse/ClickHouse/pull/61848) (Alexey Milovidov). @@ -30,7 +30,7 @@ Relevant changes for ClickHouse Cloud services based on the v24.5 release. * Fix crash in largestTriangleThreeBuckets. This changes the behaviour of this function and makes it to ignore NaNs in the series provided. Thus the resultset might differ from previous versions. [#62646](https://github.com/ClickHouse/ClickHouse/pull/62646) (Raúl Marín). -## New Features {#new-features} +## New features {#new-features} * The new analyzer is enabled by default on new services. diff --git a/docs/cloud/changelogs/24_06.md b/docs/cloud/changelogs/24_06.md index d28786e0e8b..aeed4a90f3c 100644 --- a/docs/cloud/changelogs/24_06.md +++ b/docs/cloud/changelogs/24_06.md @@ -7,15 +7,15 @@ sidebar_label: '24.6' sidebar_position: 6 --- -# v24.6 Changelog for Cloud +# V24.6 changelog for Cloud Relevant changes for ClickHouse Cloud services based on the v24.6 release. -## Backward Incompatible Change {#backward-incompatible-change} +## Backward incompatible change {#backward-incompatible-change} * Rework parallel processing in `Ordered` mode of storage `S3Queue`. This PR is backward incompatible for Ordered mode if you used settings `s3queue_processing_threads_num` or `s3queue_total_shards_num`. Setting `s3queue_total_shards_num` is deleted, previously it was allowed to use only under `s3queue_allow_experimental_sharded_mode`, which is now deprecated. A new setting is added - `s3queue_buckets`. [#64349](https://github.com/ClickHouse/ClickHouse/pull/64349) ([Kseniia Sumarokova](https://github.com/kssenii)). * New functions `snowflakeIDToDateTime`, `snowflakeIDToDateTime64`, `dateTimeToSnowflakeID`, and `dateTime64ToSnowflakeID` were added. Unlike the existing functions `snowflakeToDateTime`, `snowflakeToDateTime64`, `dateTimeToSnowflake`, and `dateTime64ToSnowflake`, the new functions are compatible with function `generateSnowflakeID`, i.e. they accept the snowflake IDs generated by `generateSnowflakeID` and produce snowflake IDs of the same type as `generateSnowflakeID` (i.e. `UInt64`). Furthermore, the new functions default to the UNIX epoch (aka. 1970-01-01), just like `generateSnowflakeID`. If necessary, a different epoch, e.g. Twitter's/X's epoch 2010-11-04 aka. 1288834974657 msec since UNIX epoch, can be passed. The old conversion functions are deprecated and will be removed after a transition period: to use them regardless, enable setting `allow_deprecated_snowflake_conversion_functions`. [#64948](https://github.com/ClickHouse/ClickHouse/pull/64948) ([Robert Schulze](https://github.com/rschu1ze)). -## New Feature {#new-feature} +## New feature {#new-feature} * Support empty tuples. [#55061](https://github.com/ClickHouse/ClickHouse/pull/55061) ([Amos Bird](https://github.com/amosbird)). * Add Hilbert Curve encode and decode functions. [#60156](https://github.com/ClickHouse/ClickHouse/pull/60156) ([Artem Mustafin](https://github.com/Artemmm91)). diff --git a/docs/cloud/changelogs/24_08.md b/docs/cloud/changelogs/24_08.md index b2c81b2ec37..78acd3f5f98 100644 --- a/docs/cloud/changelogs/24_08.md +++ b/docs/cloud/changelogs/24_08.md @@ -9,7 +9,7 @@ sidebar_position: 5 Relevant changes for ClickHouse Cloud services based on the v24.8 release. 
-## Backward Incompatible Change {#backward-incompatible-change} +## Backward incompatible change {#backward-incompatible-change} - Change binary serialization of Variant data type: add compact mode to avoid writing the same discriminator multiple times for granules with single variant or with only NULL values. Add MergeTree setting use_compact_variant_discriminators_serialization that is enabled by default. Note that Variant type is still experimental and backward-incompatible change in serialization should not impact you unless you have been working with support to get this feature enabled earlier. [#62774](https://github.com/ClickHouse/ClickHouse/pull/62774) (Kruglov Pavel). @@ -30,7 +30,7 @@ Relevant changes for ClickHouse Cloud services based on the v24.8 release. - Fix REPLACE modifier formatting (forbid omitting brackets). [#67774](https://github.com/ClickHouse/ClickHouse/pull/67774) (Azat Khuzhin). -## New Feature {#new-feature} +## New feature {#new-feature} - Extend function tuple to construct named tuples in query. Introduce function tupleNames to extract names from tuples. [#54881](https://github.com/ClickHouse/ClickHouse/pull/54881) (Amos Bird). diff --git a/docs/cloud/changelogs/24_10.md b/docs/cloud/changelogs/24_10.md index d274cbf0417..c62c6c3fc68 100644 --- a/docs/cloud/changelogs/24_10.md +++ b/docs/cloud/changelogs/24_10.md @@ -9,7 +9,7 @@ sidebar_position: 4 Relevant changes for ClickHouse Cloud services based on the v24.10 release. -## Backward Incompatible Change {#backward-incompatible-change} +## Backward incompatible change {#backward-incompatible-change} - Allow to write `SETTINGS` before `FORMAT` in a chain of queries with `UNION` when subqueries are inside parentheses. This closes [#39712](https://github.com/ClickHouse/ClickHouse/issues/39712). Change the behavior when a query has the SETTINGS clause specified twice in a sequence. The closest SETTINGS clause will have a preference for the corresponding subquery. In the previous versions, the outermost SETTINGS clause could take a preference over the inner one. [#60197](https://github.com/ClickHouse/ClickHouse/pull/60197)[#68614](https://github.com/ClickHouse/ClickHouse/pull/68614) ([Alexey Milovidov](https://github.com/alexey-milovidov)). - Reimplement Dynamic type. Now when the limit of dynamic data types is reached new types are not cast to String but stored in a special data structure in binary format with binary encoded data type. Now any type ever inserted into Dynamic column can be read from it as subcolumn. [#68132](https://github.com/ClickHouse/ClickHouse/pull/68132) ([Pavel Kruglov](https://github.com/Avogar)). - Expressions like `a[b].c` are supported for named tuples, as well as named subscripts from arbitrary expressions, e.g., `expr().name`. This is useful for processing JSON. This closes [#54965](https://github.com/ClickHouse/ClickHouse/issues/54965). In previous versions, an expression of form `expr().name` was parsed as `tupleElement(expr(), name)`, and the query analyzer was searching for a column `name` rather than for the corresponding tuple element; while in the new version, it is changed to `tupleElement(expr(), 'name')`. In most cases, the previous version was not working, but it is possible to imagine a very unusual scenario when this change could lead to incompatibility: if you stored names of tuple elements in a column or an alias, that was named differently than the tuple element's name: `SELECT 'b' AS a, CAST([tuple(123)] AS 'Array(Tuple(b UInt8))') AS t, t[1].a`. 
It is very unlikely that you used such queries, but we still have to mark this change as potentially backward incompatible. [#68435](https://github.com/ClickHouse/ClickHouse/pull/68435) ([Alexey Milovidov](https://github.com/alexey-milovidov)). @@ -18,7 +18,7 @@ Relevant changes for ClickHouse Cloud services based on the v24.10 release. - Fix `optimize_functions_to_subcolumns` optimization (previously could lead to `Invalid column type for ColumnUnique::insertRangeFrom. Expected String, got LowCardinality(String)` error), by preserving `LowCardinality` type in `mapKeys`/`mapValues`. [#70716](https://github.com/ClickHouse/ClickHouse/pull/70716) ([Azat Khuzhin](https://github.com/azat)). -## New Feature {#new-feature} +## New feature {#new-feature} - Refreshable materialized views are production ready. [#70550](https://github.com/ClickHouse/ClickHouse/pull/70550) ([Michael Kolupaev](https://github.com/al13n321)). Refreshable materialized views are now supported in Replicated databases. [#60669](https://github.com/ClickHouse/ClickHouse/pull/60669) ([Michael Kolupaev](https://github.com/al13n321)). - Function `toStartOfInterval()` now has a new overload which emulates TimescaleDB's `time_bucket()` function, respectively PostgreSQL's `date_bin()` function. ([#55619](https://github.com/ClickHouse/ClickHouse/issues/55619)). It allows to align date or timestamp values to multiples of a given interval from an *arbitrary* origin (instead of 0000-01-01 00:00:00.000 as *fixed* origin). For example, `SELECT toStartOfInterval(toDateTime('2023-01-01 14:45:00'), INTERVAL 1 MINUTE, toDateTime('2023-01-01 14:35:30'));` returns `2023-01-01 14:44:30` which is a multiple of 1 minute intervals, starting from origin `2023-01-01 14:35:30`. [#56738](https://github.com/ClickHouse/ClickHouse/pull/56738) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). - MongoDB integration refactored: migration to new driver mongocxx from deprecated Poco::MongoDB, remove support for deprecated old protocol, support for connection by URI, support for all MongoDB types, support for WHERE and ORDER BY statements on MongoDB side, restriction for expression unsupported by MongoDB. [#63279](https://github.com/ClickHouse/ClickHouse/pull/63279) ([Kirill Nikiforov](https://github.com/allmazz)). diff --git a/docs/cloud/changelogs/24_12.md b/docs/cloud/changelogs/24_12.md index 827862f55cb..629ca84d8e2 100644 --- a/docs/cloud/changelogs/24_12.md +++ b/docs/cloud/changelogs/24_12.md @@ -9,7 +9,7 @@ sidebar_position: 3 Relevant changes for ClickHouse Cloud services based on the v24.12 release. -## Backward Incompatible Changes {#backward-incompatible-changes} +## Backward incompatible changes {#backward-incompatible-changes} - Functions `greatest` and `least` now ignore NULL input values, whereas they previously returned NULL if one of the arguments was NULL. For example, `SELECT greatest(1, 2, NULL)` now returns 2. This makes the behavior compatible with PostgreSQL. [#65519](https://github.com/ClickHouse/ClickHouse/pull/65519) ([kevinyhzou](https://github.com/KevinyhZou)). - Don't allow Variant/Dynamic types in ORDER BY/GROUP BY/PARTITION BY/PRIMARY KEY by default because it may lead to unexpected results. [#69731](https://github.com/ClickHouse/ClickHouse/pull/69731) ([Pavel Kruglov](https://github.com/Avogar)). @@ -24,7 +24,7 @@ Relevant changes for ClickHouse Cloud services based on the v24.12 release. - Remove support for `Enum` as well as `UInt128` and `UInt256` arguments in `deltaSumTimestamp`. 
Remove support for `Int8`, `UInt8`, `Int16`, and `UInt16` of the second ("timestamp") argument of `deltaSumTimestamp`. [#71790](https://github.com/ClickHouse/ClickHouse/pull/71790) ([Alexey Milovidov](https://github.com/alexey-milovidov)). - Added source query validation when ClickHouse is used as a source for a dictionary. [#72548](https://github.com/ClickHouse/ClickHouse/pull/72548) ([Alexey Katsman](https://github.com/alexkats)). -## New Features {#new-features} +## New features {#new-features} - Implement SYSTEM LOAD PRIMARY KEY command to load primary indexes for all parts of a specified table or for all tables if no table is specified. This will be useful for benchmarks and to prevent extra latency during query execution. [#66252](https://github.com/ClickHouse/ClickHouse/pull/66252) ([ZAWA_ll](https://github.com/Zawa-ll)). - Added statement `SYSTEM LOAD PRIMARY KEY` for loading the primary indexes of all parts in a specified table or for all tables if no table is specified. This can be useful for benchmarking and to prevent extra latency during query execution. [#67733](https://github.com/ClickHouse/ClickHouse/pull/67733) ([ZAWA_ll](https://github.com/Zawa-ll)). diff --git a/docs/cloud/changelogs/25_04.md b/docs/cloud/changelogs/25_04.md index 5e64a2ba436..3e7067dddc9 100644 --- a/docs/cloud/changelogs/25_04.md +++ b/docs/cloud/changelogs/25_04.md @@ -7,7 +7,7 @@ sidebar_label: '25.4' sidebar_position: 2 --- -## Backward Incompatible Changes {#backward-incompatible-changes} +## Backward incompatible changes {#backward-incompatible-changes} * Parquet output format converts Date and DateTime columns to date/time types supported by Parquet, instead of writing them as raw numbers. DateTime becomes DateTime64(3) (was: UInt32); setting `output_format_parquet_datetime_as_uint32` brings back the old behavior. Date becomes Date32 (was: UInt16). [#70950](https://github.com/ClickHouse/ClickHouse/pull/70950) ([Michael Kolupaev](https://github.com/al13n321)). * Don't allow comparable types (like JSON/Object/AggregateFunction) in ORDER BY and comparison functions `less/greater/equal/etc` by default. [#73276](https://github.com/ClickHouse/ClickHouse/pull/73276) ([Pavel Kruglov](https://github.com/Avogar)). @@ -26,7 +26,7 @@ sidebar_position: 2 * The legacy MongoDB integration has been removed. Server setting `use_legacy_mongodb_integration` became obsolete and now does nothing. [#77895](https://github.com/ClickHouse/ClickHouse/pull/77895) ([Robert Schulze](https://github.com/rschu1ze)). * Enhance SummingMergeTree validation to skip aggregation for columns used in partition or sort keys. [#78022](https://github.com/ClickHouse/ClickHouse/pull/78022) ([Pervakov Grigorii](https://github.com/GrigoryPervakov)). -## New Features {#new-features} +## New features {#new-features} * Added an in-memory cache for deserialized skipping index granules. This should make repeated queries that use skipping indexes faster. The size of the new cache is controlled by server settings `skipping_index_cache_size` and `skipping_index_cache_max_entries`. The original motivation for the cache were vector similarity indexes which became a lot faster now. [#70102](https://github.com/ClickHouse/ClickHouse/pull/70102) ([Robert Schulze](https://github.com/rschu1ze)). * A new implementation of the Userspace Page Cache, which allows caching data in the in-process memory instead of relying on the OS page cache. It is useful when the data is stored on a remote virtual filesystem without backing with the local filesystem cache. 
[#70509](https://github.com/ClickHouse/ClickHouse/pull/70509) ([Michael Kolupaev](https://github.com/al13n321)). @@ -612,7 +612,7 @@ sidebar_position: 2 * Fix crash in REFRESHABLE MV in case of ALTER after incorrect shutdown. [#78858](https://github.com/ClickHouse/ClickHouse/pull/78858) ([Azat Khuzhin](https://github.com/azat)). * Fix parsing of bad DateTime values in CSV format. [#78919](https://github.com/ClickHouse/ClickHouse/pull/78919) ([Pavel Kruglov](https://github.com/Avogar)). -## Build/Testing/Packaging Improvement {#build-testing-packaging-improvement} +## Build/testing/packaging improvement {#build-testing-packaging-improvement} * The internal dependency LLVM is bumped from 16 to 18. [#66053](https://github.com/ClickHouse/ClickHouse/pull/66053) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). * Restore deleted nats integration tests and fix errors. - fixed some race conditions in nats engine - fixed data loss when streaming data to nats in case of connection loss - fixed freeze of receiving the last chunk of data when streaming from nats ended - nats_max_reconnect is deprecated and has no effect, reconnect is performed permanently with nats_reconnect_wait timeout. [#69772](https://github.com/ClickHouse/ClickHouse/pull/69772) ([Dmitry Novikov](https://github.com/dmitry-sles-novikov)). diff --git a/docs/cloud/get-started/query-endpoints.md b/docs/cloud/get-started/query-endpoints.md index c861d2ac1ce..07c332e5fdc 100644 --- a/docs/cloud/get-started/query-endpoints.md +++ b/docs/cloud/get-started/query-endpoints.md @@ -14,11 +14,11 @@ import endpoints_completed from '@site/static/images/cloud/sqlconsole/endpoints- import endpoints_curltest from '@site/static/images/cloud/sqlconsole/endpoints-curltest.png'; import endpoints_monitoring from '@site/static/images/cloud/sqlconsole/endpoints-monitoring.png'; -# Query API Endpoints +# Query API endpoints The **Query API Endpoints** feature allows you to create an API endpoint directly from any saved SQL query in the ClickHouse Cloud console. You'll be able to access API endpoints via HTTP to execute your saved queries without needing to connect to your ClickHouse Cloud service via a native driver. -## Quick-start Guide {#quick-start-guide} +## Quick-start guide {#quick-start-guide} Before proceeding, ensure you have an API key and an Admin Console Role. You can follow this guide to [create an API key](/cloud/manage/openapi). @@ -55,7 +55,7 @@ Next step, we'll go ahead and save the query: More documentation around saved queries can be found [here](/cloud/get-started/sql-console#saving-a-query). -### Configuring the Query API Endpoint {#configuring-the-query-api-endpoint} +### Configuring the query API endpoint {#configuring-the-query-api-endpoint} Query API endpoints can be configured directly from query view by clicking the **Share** button and selecting `API Endpoint`. You'll be prompted to specify which API key(s) should be able to access the endpoint: @@ -79,7 +79,7 @@ After you've sent your first request, a new button should appear immediately to -## Implementation Details {#implementation-details} +## Implementation details {#implementation-details} ### Description {#description} @@ -91,11 +91,11 @@ This route runs a query on a specified query endpoint. It supports different ver - **Method**: Basic Auth via OpenAPI Key/Secret - **Permissions**: Appropriate permissions for the query endpoint. 
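The saved SQL behind an endpoint can use ordinary ClickHouse query parameters, which the `queryVariables` object described below fills in at request time. A hedged sketch, with a hypothetical parameter name:

```sql
-- {database:String} is a standard ClickHouse query parameter; a request's
-- queryVariables would supply its value (the name here is illustrative).
SELECT name, engine
FROM system.tables
WHERE database = {database:String}
ORDER BY name;
```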
-### URL Parameters {#url-parameters} +### URL parameters {#url-parameters} - `queryEndpointId` (required): The unique identifier of the query endpoint to run. -### Query Parameters {#query-parameters} +### Query parameters {#query-parameters} #### V1 {#v1} @@ -112,7 +112,7 @@ None - `x-clickhouse-endpoint-version` (optional): The version of the query endpoint. Supported versions are `1` and `2`. If not provided, the default version is last saved for the endpoint. - `x-clickhouse-endpoint-upgrade` (optional): Set this header to upgrade the endpoint version. This works in conjunction with the `x-clickhouse-endpoint-version` header. -### Request Body {#request-body} +### Request body {#request-body} - `queryVariables` (optional): An object containing variables to be used in the query. - `format` (optional): The format of the response. If Query API Endpoint is version 2 any ClickHouse supported format is possible. Supported formats for v1 are: @@ -132,19 +132,19 @@ None - **401 Unauthorized**: The request was made without authentication or with insufficient permissions. - **404 Not Found**: The specified query endpoint was not found. -### Error Handling {#error-handling} +### Error handling {#error-handling} - Ensure that the request includes valid authentication credentials. - Validate the `queryEndpointId` and `queryVariables` to ensure they are correct. - Handle any server errors gracefully, returning appropriate error messages. -### Upgrading the Endpoint Version {#upgrading-the-endpoint-version} +### Upgrading the endpoint version {#upgrading-the-endpoint-version} To upgrade the endpoint version from `v1` to `v2`, include the `x-clickhouse-endpoint-upgrade` header in the request and set it to `1`. This will trigger the upgrade process and allow you to use the features and improvements available in `v2`. ## Examples {#examples} -### Basic Request {#basic-request} +### Basic request {#basic-request} **Query API Endpoint SQL:** @@ -246,7 +246,7 @@ fetch( {"database":"INFORMATION_SCHEMA","num_tables":"REFERENTIAL_CONSTRAINTS"} ``` -### Request with Query Variables and Version 2 on JSONCompactEachRow Format {#request-with-query-variables-and-version-2-on-jsoncompacteachrow-format} +### Request with query variables and version 2 on JSONCompactEachRow format {#request-with-query-variables-and-version-2-on-jsoncompacteachrow-format} **Query API Endpoint SQL:** @@ -297,7 +297,7 @@ fetch( ["query_views_log", "system"] ``` -### Request with Array in the query variables that inserts data into a table {#request-with-array-in-the-query-variables-that-inserts-data-into-a-table} +### Request with array in the query variables that inserts data into a table {#request-with-array-in-the-query-variables-that-inserts-data-into-a-table} **Table SQL:** diff --git a/docs/cloud/get-started/query-insights.md b/docs/cloud/get-started/query-insights.md index 0f8047dfb41..5dbddedccb2 100644 --- a/docs/cloud/get-started/query-insights.md +++ b/docs/cloud/get-started/query-insights.md @@ -17,7 +17,7 @@ import insights_query_info from '@site/static/images/cloud/sqlconsole/insights_q The **Query Insights** feature makes ClickHouse's built-in query log easier to use through various visualizations and tables. ClickHouse's `system.query_log` table is a key source of information for query optimization, debugging, and monitoring overall cluster health and performance. 
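Everything Query Insights visualizes ultimately comes from this table, so it can be useful to keep a raw query handy. A minimal sketch, assuming the standard `system.query_log` columns in a current release, that lists the slowest queries of the last hour:

```bash
# Slowest finished queries over the last hour, straight from system.query_log --
# the same data the Query Insights UI is built on.
clickhouse client --query "
SELECT
    event_time,
    query_duration_ms,
    read_rows,
    formatReadableSize(memory_usage) AS memory,
    substring(query, 1, 80) AS query_preview
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 HOUR
ORDER BY query_duration_ms DESC
LIMIT 10
"
```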
-## Query Overview {#query-overview} +## Query overview {#query-overview} After selecting a service, the **Monitoring** navigation item in the left sidebar should expand to reveal a new **Query insights** sub-item. Clicking on this option opens the new Query insights page: diff --git a/docs/cloud/get-started/sql-console.md b/docs/cloud/get-started/sql-console.md index 5965623ad46..41f1e3bd94c 100644 --- a/docs/cloud/get-started/sql-console.md +++ b/docs/cloud/get-started/sql-console.md @@ -51,9 +51,9 @@ SQL console is the fastest and easiest way to explore and query your databases i - Execute queries and visualize result data in just a few clicks - Share queries with team members and collaborate more effectively. -### Exploring Tables {#exploring-tables} +### Exploring tables {#exploring-tables} -### Viewing Table List and Schema Info {#viewing-table-list-and-schema-info} +### Viewing table list and schema info {#viewing-table-list-and-schema-info} An overview of tables contained in your ClickHouse instance can be found in the left sidebar area. Use the database selector at the top of the left bar to view the tables in a specific database @@ -62,19 +62,19 @@ Tables in the list can also be expanded to view columns and types -### Exploring Table Data {#exploring-table-data} +### Exploring table data {#exploring-table-data} Click on a table in the list to open it in a new tab. In the Table View, data can be easily viewed, selected, and copied. Note that structure and formatting are preserved when copy-pasting to spreadsheet applications such as Microsoft Excel and Google Sheets. You can flip between pages of table data (paginated in 30-row increments) using the navigation in the footer. -### Inspecting Cell Data {#inspecting-cell-data} +### Inspecting cell data {#inspecting-cell-data} The Cell Inspector tool can be used to view large amounts of data contained within a single cell. To open it, right-click on a cell and select 'Inspect Cell'. The contents of the cell inspector can be copied by clicking the copy icon in the top right corner of the inspector contents. -## Filtering and Sorting Tables {#filtering-and-sorting-tables} +## Filtering and sorting tables {#filtering-and-sorting-tables} ### Sorting a table {#sorting-a-table} @@ -118,9 +118,9 @@ Filters and sorts are not mandatory when using the 'Create Query' feature. You can learn more about querying in the SQL console by reading the (link) query documentation. -## Creating and Running a Query {#creating-and-running-a-query} +## Creating and running a query {#creating-and-running-a-query} -### Creating a Query {#creating-a-query} +### Creating a query {#creating-a-query} There are two ways to create a new query in the SQL console. @@ -129,7 +129,7 @@ There are two ways to create a new query in the SQL console. -### Running a Query {#running-a-query} +### Running a query {#running-a-query} To run a query, type your SQL command(s) into the SQL Editor and click the 'Run' button or use the shortcut `cmd / ctrl + enter`. To write and run multiple commands sequentially, make sure to add a semicolon after each command. @@ -157,13 +157,13 @@ Running the command at the current cursor position can be achieved in two ways: The command present at the cursor position will flash yellow on execution. ::: -### Canceling a Query {#canceling-a-query} +### Canceling a query {#canceling-a-query} While a query is running, the 'Run' button in the Query Editor toolbar will be replaced with a 'Cancel' button. 
Simply click this button or press `Esc` to cancel the query. Note: Any results that have already been returned will persist after cancellation. -### Saving a Query {#saving-a-query} +### Saving a query {#saving-a-query} Saving queries allows you to easily find them later and share them with your teammates. The SQL console also allows you to organize your queries into folders. @@ -179,7 +179,7 @@ Alternatively, you can simultaneously name and save a query by clicking on "Unti -### Query Sharing {#query-sharing} +### Query sharing {#query-sharing} The SQL console allows you to easily share queries with your team members. The SQL console supports four levels of access that can be adjusted both globally and on a per-user basis: @@ -206,7 +206,7 @@ After selecting a team member, a new line item should appear with an access leve -### Accessing Shared Queries {#accessing-shared-queries} +### Accessing shared queries {#accessing-shared-queries} If a query has been shared with you, it will be displayed in the "Queries" tab of the SQL console left sidebar: @@ -218,7 +218,7 @@ Saved queries are also permalinked, meaning that you can send and receive links Values for any parameters that may exist in a query are automatically added to the saved query URL as query parameters. For example, if a query contains `{start_date: Date}` and `{end_date: Date}` parameters, the permalink can look like: `https://console.clickhouse.cloud/services/:serviceId/console/query/:queryId?param_start_date=2015-01-01¶m_end_date=2016-01-01`. -## Advanced Querying Features {#advanced-querying-features} +## Advanced querying features {#advanced-querying-features} ### Searching query results {#searching-query-results} @@ -246,7 +246,7 @@ Query result sets can be easily exported to CSV format directly from the SQL con -## Visualizing Query Data {#visualizing-query-data} +## Visualizing query data {#visualizing-query-data} Some data can be more easily interpreted in chart form. You can quickly create visualizations from query result data directly from the SQL console in just a few clicks. As an example, we'll use a query that calculates weekly statistics for NYC taxi trips: diff --git a/docs/cloud/manage/account-close.md b/docs/cloud/manage/account-close.md index fee12eb6cc0..6aed00bf62c 100644 --- a/docs/cloud/manage/account-close.md +++ b/docs/cloud/manage/account-close.md @@ -5,13 +5,13 @@ title: 'Account Close & Deletion' description: 'We know there are circumstances that sometimes necessitate account closure. This guide will help you through the process.' --- -## Account Close & Deletion {#account-close--deletion} +## Account closure and deletion {#account-close--deletion} Our goal is to help you be successful in your project. If you have questions that are not answered on this site or need help evaluating a unique use case, please contact us at [support@clickhouse.com](mailto:support@clickhouse.com). We know there are circumstances that sometimes necessitate account closure. This guide will help you through the process. -## Close vs Delete {#close-vs-delete} +## Close versus delete your account {#close-vs-delete} Customers may log back into closed accounts to view usage, billing and account-level activity logs. This enables you to easily access details that are useful for a variety of purposes, from documenting use cases to downloading invoices at the end of the year for tax purposes. You will also continue receiving product updates so that you know if a feature you may have been waiting for is now available. 
Additionally, @@ -23,7 +23,7 @@ be available. You will not receive product updates and may not reopen the accoun Newsletter subscribers can unsubscribe at any time by using the unsubscribe link at the bottom of the newsletter email without closing their account or deleting their information. -## Preparing for Closure {#preparing-for-closure} +## Preparing for closure {#preparing-for-closure} Before requesting account closure, please take the following steps to prepare the account. 1. Export any data from your service that you need to keep. @@ -31,7 +31,7 @@ Before requesting account closure, please take the following steps to prepare th 3. Remove all users except the admin that will request closure. This will help you ensure no new services are created while the process completes. 4. Review the 'Usage' and 'Billing' tabs in the control panel to verify all charges have been paid. We are not able to close accounts with unpaid balances. -## Request Account Closure {#request-account-closure} +## Request an account closure {#request-account-closure} We are required to authenticate requests for both closure and deletion. To ensure your request can be processed quickly, please follow the steps outlined below. @@ -51,7 +51,7 @@ Description: We would appreciate it if you would share a brief note about why yo 6. We will close your account and send a confirmation email to let you know when it is complete. -## Request Personal Data Deletion {#request-personal-data-deletion} +## Request deletion of your personal data {#request-personal-data-deletion} Please note, only account administrators may request personal data deletion from ClickHouse. If you are not an account administrator, please contact your ClickHouse account administrator to request to be removed from the account. diff --git a/docs/cloud/manage/api/api-overview.md b/docs/cloud/manage/api/api-overview.md index de31f91f9d8..0d006650519 100644 --- a/docs/cloud/manage/api/api-overview.md +++ b/docs/cloud/manage/api/api-overview.md @@ -25,14 +25,14 @@ consume the ClickHouse Cloud API docs, we offer a JSON-based Swagger endpoint via https://api.clickhouse.cloud/v1. You can also find the API docs via the [Swagger UI](https://clickhouse.com/docs/cloud/manage/api/swagger). -## Rate Limits {#rate-limits} +## Rate limits {#rate-limits} Developers are limited to 100 API keys per organization. Each API key has a limit of 10 requests over a 10-second window. 
If you'd like to increase the number of API keys or requests per 10-second window for your organization, please contact support@clickhouse.com -## Terraform Provider {#terraform-provider} +## Terraform provider {#terraform-provider} The official ClickHouse Terraform Provider lets you use [Infrastructure as Code](https://www.redhat.com/en/topics/automation/what-is-infrastructure-as-code-iac) to create predictable, version-controlled configurations to make deployments much diff --git a/docs/cloud/manage/backups/export-backups-to-own-cloud-account.md b/docs/cloud/manage/backups/export-backups-to-own-cloud-account.md index e255b79ad0a..d8e9e34f7cd 100644 --- a/docs/cloud/manage/backups/export-backups-to-own-cloud-account.md +++ b/docs/cloud/manage/backups/export-backups-to-own-cloud-account.md @@ -61,7 +61,7 @@ You will need the following details to export/restore backups to your own CSP st ## Backup / Restore to AWS S3 Bucket {#backup--restore-to-aws-s3-bucket} -### Take a DB Backup {#take-a-db-backup} +### Take a DB backup {#take-a-db-backup} **Full Backup** @@ -97,7 +97,7 @@ See: [Configuring BACKUP/RESTORE to use an S3 Endpoint](/operations/backup#confi ## Backup / Restore to Azure Blob Storage {#backup--restore-to-azure-blob-storage} -### Take a DB Backup {#take-a-db-backup-1} +### Take a DB backup {#take-a-db-backup-1} **Full Backup** @@ -128,7 +128,7 @@ See: [Configuring BACKUP/RESTORE to use an S3 Endpoint](/operations/backup#confi ## Backup / Restore to Google Cloud Storage (GCS) {#backup--restore-to-google-cloud-storage-gcs} -### Take a DB Backup {#take-a-db-backup-2} +### Take a DB backup {#take-a-db-backup-2} **Full Backup** diff --git a/docs/cloud/manage/backups/overview.md b/docs/cloud/manage/backups/overview.md index b2098ce5e7f..1e8f85b0c22 100644 --- a/docs/cloud/manage/backups/overview.md +++ b/docs/cloud/manage/backups/overview.md @@ -34,7 +34,7 @@ On Day 1, a full backup is taken to start the backup chain. On Day 2, an increme ## Default backup policy {#default-backup-policy} -In the Basic, Scale, and Enterprise tiers, backups are metered and billed separately from storage. All services will default to one backup with the ability to configure more, starting with the Scale tier, via the Settings tab of the Cloud Console. +In the Basic, Scale, and Enterprise tiers, backups are metered and billed separately from storage. All services will default to one backup with the ability to configure more, starting with the Scale tier, via the Settings tab of the Cloud console. ## Backup status list {#backup-status-list} @@ -171,7 +171,7 @@ SYNC SETTINGS max_table_size_to_drop=2097152 -- increases the limit to 2TB ``` ::: -## Configurable Backups {#configurable-backups} +## Configurable backups {#configurable-backups} If you want to set up a backups schedule different from the default backup schedule, take a look at [Configurable Backups](./configurable-backups.md). diff --git a/docs/cloud/manage/billing.md b/docs/cloud/manage/billing.md index 3be9a301f38..1def3aac523 100644 --- a/docs/cloud/manage/billing.md +++ b/docs/cloud/manage/billing.md @@ -175,7 +175,7 @@ Best for: large scale, mission critical deployments that have stringent security
-## Frequently Asked Questions {#faqs} +## Frequently asked questions {#faqs} ### How is compute metered? {#how-is-compute-metered} @@ -191,7 +191,7 @@ Storage costs are the same across tiers and vary by region and cloud service pro Storage and backups are counted towards storage costs and billed separately. All services will default to one backup, retained for a day. -Users who need additional backups can do so by configuring additional [backups](backups/overview.md) under the settings tab of the Cloud Console. +Users who need additional backups can do so by configuring additional [backups](backups/overview.md) under the settings tab of the Cloud console. ### How do I estimate compression? {#how-do-i-estimate-compression} diff --git a/docs/cloud/manage/billing/payment-thresholds.md b/docs/cloud/manage/billing/payment-thresholds.md index 97049d81c3a..0c2b6948d0e 100644 --- a/docs/cloud/manage/billing/payment-thresholds.md +++ b/docs/cloud/manage/billing/payment-thresholds.md @@ -6,7 +6,7 @@ description: 'Payment thresholds and automatic invoicing for ClickHouse Cloud.' keywords: ['billing', 'payment thresholds', 'automatic invoicing', 'invoice'] --- -# Payment Thresholds +# Payment thresholds When your amount due in a billing period for ClickHouse Cloud reaches $10,000 USD or the equivalent value, your payment method will be automatically charged. A failed charge will result in the suspension or termination of your services after a grace period. diff --git a/docs/cloud/manage/cloud-tiers.md b/docs/cloud/manage/cloud-tiers.md index fc56abe89a5..1cd784431ec 100644 --- a/docs/cloud/manage/cloud-tiers.md +++ b/docs/cloud/manage/cloud-tiers.md @@ -5,7 +5,7 @@ title: 'ClickHouse Cloud Tiers' description: 'Cloud tiers available in ClickHouse Cloud' --- -# ClickHouse Cloud Tiers +# ClickHouse Cloud tiers There are several tiers available in ClickHouse Cloud. Tiers are assigned at any organizational level. Services within an organization therefore belong to the same tier. diff --git a/docs/cloud/manage/dashboards.md b/docs/cloud/manage/dashboards.md index cbdc480e7a2..c33953aa82b 100644 --- a/docs/cloud/manage/dashboards.md +++ b/docs/cloud/manage/dashboards.md @@ -22,9 +22,9 @@ import dashboards_11 from '@site/static/images/cloud/dashboards/11_dashboards.pn The SQL Console's dashboards feature allows you to collect and share visualizations from saved queries. Get started by saving and visualizing queries, adding query visualizations to a dashboard, and making the dashboard interactive using query parameters. -## Core Concepts {#core-concepts} +## Core concepts {#core-concepts} -### Query Sharing {#query-sharing} +### Query sharing {#query-sharing} In order to share your dashboard with colleagues, please be sure to share the underlying saved query. To view a visualization, users must have, at a minimum, read-only access to the underlying saved query. @@ -34,11 +34,11 @@ Use [query parameters](/sql-reference/syntax#defining-and-using-query-parameters You can toggle the query parameter input via the **Global** filters side pane by selecting a “filter” type in the visualization settings. You can also toggle the query parameter input by linking to another object (like a table) on the dashboard. Please see the “[configure a filter](/cloud/manage/dashboards#configure-a-filter)” section of the quick start guide below. 
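As a minimal sketch of the kind of parameterized saved query a dashboard filter can drive (the parameter name, table, and time window are illustrative), the same parameter syntax can also be exercised from the command line with `clickhouse client --param_<name>`:

```bash
# A saved query with a {lookback_minutes:UInt32} parameter; in a dashboard,
# a "filter" bound to this parameter controls the time window interactively.
clickhouse client --param_lookback_minutes=60 --query "
SELECT
    toStartOfMinute(event_time) AS minute,
    count() AS queries
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time >= now() - INTERVAL {lookback_minutes:UInt32} MINUTE
GROUP BY minute
ORDER BY minute
"
```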
-## Quick Start {#quick-start} +## Quick start {#quick-start} Let's create a dashboard to monitor our ClickHouse service using the [query\_log](/operations/system-tables/query_log) system table. -## Quick Start {#quick-start-1} +## Quick start {#quick-start-1} ### Create a saved query {#create-a-saved-query} diff --git a/docs/cloud/manage/integrations.md b/docs/cloud/manage/integrations.md index fb2c23f8453..67e562aa23d 100644 --- a/docs/cloud/manage/integrations.md +++ b/docs/cloud/manage/integrations.md @@ -7,7 +7,7 @@ description: 'Integrations for ClickHouse' To see a full list of integrations for ClickHouse, please see [this page](/integrations). -## Proprietary Integrations for ClickHouse Cloud {#proprietary-integrations-for-clickhouse-cloud} +## Proprietary integrations for ClickHouse Cloud {#proprietary-integrations-for-clickhouse-cloud} Besides the dozens of integrations available for ClickHouse, there are also some proprietary integrations only available for ClickHouse Cloud: @@ -24,9 +24,9 @@ Looker Studio can be connected to ClickHouse Cloud by enabling the [MySQL interf ### MySQL Interface {#mysql-interface} -Some applications currently do not support the ClickHouse wire protocol. To use ClickHouse Cloud with these applications, you can enable the MySQL wire protocol through the Cloud Console. Please see [this page](/interfaces/mysql#enabling-the-mysql-interface-on-clickhouse-cloud) for details on how to enable the MySQL wire protocol through the Cloud Console. +Some applications currently do not support the ClickHouse wire protocol. To use ClickHouse Cloud with these applications, you can enable the MySQL wire protocol through the Cloud console. Please see [this page](/interfaces/mysql#enabling-the-mysql-interface-on-clickhouse-cloud) for details on how to enable the MySQL wire protocol through the Cloud console. -## Unsupported Integrations {#unsupported-integrations} +## Unsupported integrations {#unsupported-integrations} The following features for integrations are not currently available for ClickHouse Cloud as they are experimental features. If you need to support these features in your application, please contact support@clickhouse.com. diff --git a/docs/cloud/manage/jan2025_faq/backup.md b/docs/cloud/manage/jan2025_faq/backup.md index 706435db827..579788f8dec 100644 --- a/docs/cloud/manage/jan2025_faq/backup.md +++ b/docs/cloud/manage/jan2025_faq/backup.md @@ -7,7 +7,7 @@ description: 'Backup policy in new tiers' ## What is the backup policy? {#what-is-the-backup-policy} In Basic, Scale, and Enterprise tiers backups are metered and billed separately from storage. -All services will default to one daily backup with the ability to configure more, starting with the Scale tier, via the Settings tab of the Cloud Console. Each backup will be retained for at least 24 hours. +All services will default to one daily backup with the ability to configure more, starting with the Scale tier, via the Settings tab of the Cloud console. Each backup will be retained for at least 24 hours. ## What happens to current configurations that users have set up separate from default backups? 
{#what-happens-to-current-configurations-that-users-have-set-up-separate-from-default-backups} diff --git a/docs/cloud/manage/jan2025_faq/new_tiers.md b/docs/cloud/manage/jan2025_faq/new_tiers.md index 704e3e442f4..3e87d497bb6 100644 --- a/docs/cloud/manage/jan2025_faq/new_tiers.md +++ b/docs/cloud/manage/jan2025_faq/new_tiers.md @@ -15,7 +15,7 @@ description: 'Description of new tiers and features' - **Single Sign On (SSO):** This feature is offered in Enterprise tier and requires a support ticket to be enabled for an Organization. Users who have multiple Organizations should ensure all of their organizations are on the Enterprise tier to use SSO for each organization. -## Basic Tier {#basic-tier} +## Basic tier {#basic-tier} ### What are the considerations for the Basic tier? {#what-are-the-considerations-for-the-basic-tier} @@ -37,7 +37,7 @@ Yes, single replica services are supported on all three tiers. Users can scale o No, services on this tier are meant to support workloads that are small and fixed size (single replica `1x8GiB` or `1x12GiB`). If users need to scale up/down or add replicas, they will be prompted to upgrade to Scale or Enterprise tiers. -## Scale Tier {#scale-tier} +## Scale tier {#scale-tier} ### Which tiers on the new plans (Basic/Scale/Enterprise) support compute-compute separation? {#which-tiers-on-the-new-plans-basicscaleenterprise-support-compute-compute-separation} @@ -47,7 +47,7 @@ Only Scale and Enterprise tiers support compute-compute separation. Please also Compute-compute separation is not supported on existing Development and Production services, except for users who already participated in the Private Preview and Beta. If you have additional questions, please contact [support](https://clickhouse.com/support/program). -## Enterprise Tier {#enterprise-tier} +## Enterprise tier {#enterprise-tier} ### What different hardware profiles are supported for the Enterprise tier? {#what-different-hardware-profiles-are-supported-for-the-enterprise-tier} diff --git a/docs/cloud/manage/jan2025_faq/plan_migrations.md b/docs/cloud/manage/jan2025_faq/plan_migrations.md index f69e941127d..cce2fbe0fb9 100644 --- a/docs/cloud/manage/jan2025_faq/plan_migrations.md +++ b/docs/cloud/manage/jan2025_faq/plan_migrations.md @@ -28,19 +28,21 @@ Yes, see below for guidance on self-serve migrations: Users can upgrade during the trial and continue to use the trial credits to evaluate the new service tiers and the features it supports. However, if they choose to continue using the same Development and Production services, they can do so and upgrade to PAYG. They will still have to migrate before July 23, 2025. -### Can users upgrade their tiers, i.e. Basic → Scale, Scale → Enterprise, etc? {#can-users-upgrade-their-tiers-ie-basic--scale-scale--enterprise-etc} +### Can users upgrade their tiers {#can-users-upgrade-their-tiers-ie-basic--scale-scale--enterprise-etc} -Yes, users can upgrade self-serve and the pricing will reflect the tier selection after upgrade. +Can users upgrade their tiers, for example, Basic → Scale, Scale → Enterprise, etc. +Yes, users can upgrade self-serve, and the pricing will reflect the tier selection after upgrade. -### Can users move from a higher to a lower-cost tier, e.g., Enterprise → Scale, Scale → Basic, Enterprise → Basic self-serve? 
{#can-users-move-from-a-higher-to-a-lower-cost-tier-eg-enterprise--scale-scale--basic-enterprise--basic-self-serve} +### Can users move from a higher to a lower-cost tier {#can-users-move-from-a-higher-to-a-lower-cost-tier-eg-enterprise--scale-scale--basic-enterprise--basic-self-serve} +For example, Enterprise → Scale, Scale → Basic, Enterprise → Basic self-serve? No, we do not permit downgrading tiers. -### Can users with only Development services in the organization migrate to the Basic tier? {#can-users-with-only-development-services-in-the-organization-migrate-to-the-basic-tier} +### Can users with only development services in the organization migrate to the Basic tier? {#can-users-with-only-development-services-in-the-organization-migrate-to-the-basic-tier} Yes, this would be permitted. Users will be given a recommendation based on their past use and can select Basic `1x8GiB` or `1x12GiB`. -### Can users with a Development and Production service in the same organization move to the Basic Tier? {#can-users-with-a-development-and-production-service-in-the-same-organization-move-to-the-basic-tier} +### Can users with a development and production service in the same organization move to the basic tier? {#can-users-with-a-development-and-production-service-in-the-same-organization-move-to-the-basic-tier} No, if a user has both Development and Production services in the same organization, they can self-serve and migrate only to the Scale or Enterprise tier. If they want to migrate to Basic, they should delete all existing Production services. diff --git a/docs/cloud/manage/notifications.md b/docs/cloud/manage/notifications.md index abc9fde3064..708c41b2274 100644 --- a/docs/cloud/manage/notifications.md +++ b/docs/cloud/manage/notifications.md @@ -17,7 +17,7 @@ ClickHouse Cloud sends notifications about critical events related to your servi 2. **Notification severity**: Notification severity can be `info`, `warning`, or `critical` depending on how important a notification is. This is not configurable. 3. **Notification channel**: Channel refers to the mode by which the notification is received such as UI, email, Slack etc. This is configurable for most notifications. -## Receiving Notifications {#receiving-notifications} +## Receiving notifications {#receiving-notifications} Notifications can be received via various channels. For now, ClickHouse Cloud supports receiving notifications through email, ClickHouse Cloud UI, and Slack. You can click on the bell icon in the top left menu to view current notifications, which opens a flyout. Clicking the button **View All** the bottom of the flyout will take you to a page that shows an activity log of all notifications. @@ -27,7 +27,7 @@ Notifications can be received via various channels. For now, ClickHouse Cloud su ClickHouse Cloud notifications activity log -## Customizing Notifications {#customizing-notifications} +## Customizing notifications {#customizing-notifications} For each notification, you can customize how you receive the notification. You can access the settings screen from the notifications flyout or from the second tab on the notifications activity log. @@ -43,6 +43,6 @@ To configure delivery for a specific notification, click on the pencil icon to m Certain **required** notifications such as **Payment failed** are not configurable. 
::: -## Supported Notifications {#supported-notifications} +## Supported notifications {#supported-notifications} Currently, we send out notifications related to billing (payment failure, usage exceeded a certain threshold, etc.) as well as notifications related to scaling events (scaling completed, scaling blocked, etc.). diff --git a/docs/cloud/manage/openapi.md b/docs/cloud/manage/openapi.md index 4e1410bb53f..919cb38cc48 100644 --- a/docs/cloud/manage/openapi.md +++ b/docs/cloud/manage/openapi.md @@ -12,7 +12,7 @@ import image_04 from '@site/static/images/cloud/manage/openapi4.png'; import image_05 from '@site/static/images/cloud/manage/openapi5.png'; import Image from '@theme/IdealImage'; -# Managing API Keys +# Managing API keys ClickHouse Cloud provides an API utilizing OpenAPI that allows you to programmatically manage your account and aspects of your services. diff --git a/docs/cloud/manage/postman.md b/docs/cloud/manage/postman.md index 51655f3c536..d1917939568 100644 --- a/docs/cloud/manage/postman.md +++ b/docs/cloud/manage/postman.md @@ -28,16 +28,19 @@ This guide will help you test the ClickHouse Cloud API using [Postman](https://w The Postman Application is available for use within a web browser or can be downloaded to a desktop. ### Create an account {#create-an-account} + * Free accounts are available at [https://www.postman.com](https://www.postman.com). Postman site -### Create a Workspace {#create-a-workspace} +### Create a workspace {#create-a-workspace} + * Name your workspace and set the visibility level. Create workspace -### Create a Collection {#create-a-collection} +### Create a collection {#create-a-collection} + * Below "Explore" on the top left Menu click "Import": Explore > Import @@ -63,7 +66,7 @@ The Postman Application is available for use within a web browser or can be down Import complete -### Set Authorization {#set-authorization} +### Set authorization {#set-authorization} * Toggle the dropdown menu to select "Basic Auth": Basic auth @@ -72,9 +75,12 @@ The Postman Application is available for use within a web browser or can be down credentials -### Enable Variables {#enable-variables} +### Enable variables {#enable-variables} + * [Variables](https://learning.postman.com/docs/sending-requests/variables/) enable the storage and reuse of values in Postman allowing for easier API testing.
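If you'd like to confirm these values before storing them as Postman variables, the same endpoints can be called directly with `curl`. This is only a sketch: the key and IDs are placeholders, and the paths mirror the requests exercised in the steps below.

```bash
# KEY_ID / KEY_SECRET are an OpenAPI key; ORG_ID and SERVICE_ID correspond to
# the {{orgid}} and {{serviceid}} variables configured in Postman.
curl -s -u "$KEY_ID:$KEY_SECRET" \
  "https://api.clickhouse.cloud/v1/organizations"

curl -s -u "$KEY_ID:$KEY_SECRET" \
  "https://api.clickhouse.cloud/v1/organizations/$ORG_ID/services/$SERVICE_ID"
```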
-#### Set the Organization ID and Service ID {#set-the-organization-id-and-service-id} + +#### Set the organization ID and Service ID {#set-the-organization-id-and-service-id} + * Within the "Collection", click the "Variable" tab in the middle pane (The Base URL will have been set by the earlier API import): * Below `baseURL` click the open field "Add new value", and Substitute your organization ID and service ID: @@ -82,7 +88,9 @@ The Postman Application is available for use within a web browser or can be down ## Test the ClickHouse Cloud API functionalities {#test-the-clickhouse-cloud-api-functionalities} + ### Test "GET list of available organizations" {#test-get-list-of-available-organizations} + * Under the "OpenAPI spec for ClickHouse Cloud", expand the folder > V1 > organizations * Click "GET list of available organizations" and press the blue "Send" button on the right: @@ -93,6 +101,7 @@ The Postman Application is available for use within a web browser or can be down Status ### Test "GET organizational details" {#test-get-organizational-details} + * Under the `organizationid` folder, navigate to "GET organizational details": * In the middle frame menu under Params an `organizationid` is required. @@ -109,6 +118,7 @@ The Postman Application is available for use within a web browser or can be down * The returned results should deliver your organization details with "status": 200. (If you receive a "status" 400 with no organization information your configuration is not correct). ### Test "GET service details" {#test-get-service-details} + * Click "GET service details" * Edit the Values for `organizationid` and `serviceid` with `{{orgid}}` and `{{serviceid}}` respectively. * Press "Save" and then the blue "Send" button on the right. diff --git a/docs/cloud/manage/replica-aware-routing.md b/docs/cloud/manage/replica-aware-routing.md index a32f5de78c5..c220fba21bb 100644 --- a/docs/cloud/manage/replica-aware-routing.md +++ b/docs/cloud/manage/replica-aware-routing.md @@ -5,7 +5,7 @@ description: 'How to use Replica-aware routing to increase cache re-use' keywords: ['cloud', 'sticky endpoints', 'sticky', 'endpoints', 'sticky routing', 'routing', 'replica aware routing'] --- -# Replica-aware routing (Private Preview) +# Replica-aware routing (private preview) Replica-aware routing (also known as sticky sessions, sticky routing, or session affinity) utilizes [Envoy proxy's ring hash load balancing](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/load_balancers#ring-hash). The main purpose of replica-aware routing is to increase the chance of cache reuse. It does not guarantee isolation. @@ -31,6 +31,6 @@ Any disruption to the service, e.g. server pod restarts (due to any reason like Customers need to manually add a DNS entry to make name resolution work for the new hostname pattern. It is possible that this can cause imbalance in the server load if customers use it incorrectly. -## Configuring Replica-aware Routing {#configuring-replica-aware-routing} +## Configuring replica-aware routing {#configuring-replica-aware-routing} To enable Replica-aware routing, please contact [our support team](https://clickhouse.com/support). 
diff --git a/docs/cloud/manage/scaling.md b/docs/cloud/manage/scaling.md index 429bbc848d7..6da2eb2570f 100644 --- a/docs/cloud/manage/scaling.md +++ b/docs/cloud/manage/scaling.md @@ -15,7 +15,7 @@ import scaling_configure from '@site/static/images/cloud/manage/scaling-configur import scaling_memory_allocation from '@site/static/images/cloud/manage/scaling-memory-allocation.png'; import ScalePlanFeatureBadge from '@theme/badges/ScalePlanFeatureBadge' -# Automatic Scaling +# Automatic scaling Scaling is the ability to adjust available resources to meet client demands. Scale and Enterprise (with standard 1:4 profile) tier services can be scaled horizontally by calling an API programmatically, or changing settings on the UI to adjust system resources. Alternatively, these services can be **autoscaled** vertically to meet application demands. @@ -110,7 +110,7 @@ Once the service has scaled, the metrics dashboard in the cloud console should s Scaling memory allocation -## Automatic Idling {#automatic-idling} +## Automatic idling {#automatic-idling} In the **Settings** page, you can also choose whether or not to allow automatic idling of your service when it is inactive as shown in the image above (i.e. when the service is not executing any user-submitted queries). Automatic idling reduces the cost of your service, as you are not billed for compute resources when the service is paused. :::note @@ -123,7 +123,8 @@ The service may enter an idle state where it suspends refreshes of [refreshable Use automatic idling only if your use case can handle a delay before responding to queries, because when a service is paused, connections to the service will time out. Automatic idling is ideal for services that are used infrequently and where a delay can be tolerated. It is not recommended for services that power customer-facing features that are used frequently. ::: -## Handling bursty workloads {#handling-bursty-workloads} +## Handling spikes in workload {#handling-bursty-workloads} + If you have an upcoming expected spike in your workload, you can use the [ClickHouse Cloud API](/cloud/manage/api/api-overview) to preemptively scale up your service to handle the spike and scale it down once diff --git a/docs/cloud/manage/service-uptime.md b/docs/cloud/manage/service-uptime.md index cae47a221e3..3a31e459eaf 100644 --- a/docs/cloud/manage/service-uptime.md +++ b/docs/cloud/manage/service-uptime.md @@ -5,7 +5,7 @@ title: 'Service Uptime' description: 'Users can now see regional uptimes on the status page and subscribe to alerts on service disruptions.' --- -## Uptime Alerts {#uptime-alerts} +## Uptime alerts {#uptime-alerts} Users can now see regional uptimes on the [status page](https://status.clickhouse.com/) and subscribe to alerts on service disruptions. diff --git a/docs/cloud/manage/settings.md b/docs/cloud/manage/settings.md index 0ce24c99d60..a766ef59c13 100644 --- a/docs/cloud/manage/settings.md +++ b/docs/cloud/manage/settings.md @@ -8,7 +8,7 @@ description: 'How to configure settings for your ClickHouse Cloud service for a import Image from '@theme/IdealImage'; import cloud_settings_sidebar from '@site/static/images/cloud/manage/cloud-settings-sidebar.png'; -# Configuring Settings +# Configuring settings To specify settings for your ClickHouse Cloud service for a specific [user](/operations/access-rights#user-account-management) or [role](/operations/access-rights#role-management), you must use [SQL-driven Settings Profiles](/operations/access-rights#settings-profiles-management). 
Applying Settings Profiles ensures that the settings you configure persist, even when your services stop, idle, and upgrade. To learn more about Settings Profiles, please see [this page](/operations/settings/settings-profiles.md). diff --git a/docs/cloud/reference/architecture.md b/docs/cloud/reference/architecture.md index a84e0eef521..9c3d7cf5f56 100644 --- a/docs/cloud/reference/architecture.md +++ b/docs/cloud/reference/architecture.md @@ -7,7 +7,7 @@ description: 'This page describes the architecture of ClickHouse Cloud' import Architecture from '@site/static/images/cloud/reference/architecture.svg'; -# ClickHouse Cloud Architecture +# ClickHouse Cloud architecture @@ -43,10 +43,10 @@ For AWS, access to storage is controlled via AWS IAM, and each IAM role is uniqu For GCP and Azure, services have object storage isolation (all services have their own buckets or storage container). -## Compute-Compute Separation {#compute-compute-separation} +## Compute-compute separation {#compute-compute-separation} [Compute-compute separation](/cloud/reference/warehouses) lets users create multiple compute node groups, each with their own service URL, that all use the same shared object storage. This allows for compute isolation of different use cases such as reads from writes, that share the same data. It also leads to more efficient resource utilization by allowing for independent scaling of the compute groups as needed. -## Concurrency Limits {#concurrency-limits} +## Concurrency limits {#concurrency-limits} There is no limit to the number of queries per second (QPS) in your ClickHouse Cloud service. There is, however, a limit of 1000 concurrent queries per replica. QPS is ultimately a function of your average query execution time and the number of replicas in your service. diff --git a/docs/cloud/reference/byoc.md b/docs/cloud/reference/byoc.md index 36f66e6cd9a..d59ae44c00f 100644 --- a/docs/cloud/reference/byoc.md +++ b/docs/cloud/reference/byoc.md @@ -48,11 +48,11 @@ Metrics and logs are stored within the customer's BYOC VPC. Logs are currently s
-## Onboarding Process {#onboarding-process} +## Onboarding process {#onboarding-process} Customers can initiate the onboarding process by reaching out to [us](https://clickhouse.com/cloud/bring-your-own-cloud). Customers need to have a dedicated AWS account and know the region they will use. At this time, we are allowing users to launch BYOC services only in the regions that we support for ClickHouse Cloud. -### Prepare an AWS Account {#prepare-an-aws-account} +### Prepare an AWS account {#prepare-an-aws-account} Customers are recommended to prepare a dedicated AWS account for hosting the ClickHouse BYOC deployment to ensure better isolation. However, using a shared account and an existing VPC is also possible. See the details in *Setup BYOC Infrastructure* below. @@ -79,7 +79,7 @@ module "clickhouse_onboarding" { -### Setup BYOC Infrastructure {#setup-byoc-infrastructure} +### Set up BYOC infrastructure {#setup-byoc-infrastructure} After creating the CloudFormation stack, you will be prompted to set up the infrastructure, including S3, VPC, and the EKS cluster, from the cloud console. Certain configurations must be determined at this stage, as they cannot be changed later. Specifically: @@ -121,7 +121,7 @@ Create a support ticket with the following information: To create or delete VPC peering for ClickHouse BYOC, follow the steps: -#### Step 1 Enable Private Load Balancer for ClickHouse BYOC {#step-1-enable-private-load-balancer-for-clickhouse-byoc} +#### Step 1: Enable private load balancer for ClickHouse BYOC {#step-1-enable-private-load-balancer-for-clickhouse-byoc} Contact ClickHouse Support to enable Private Load Balancer. #### Step 2 Create a peering connection {#step-2-create-a-peering-connection} @@ -177,7 +177,7 @@ In the peering AWS account,
-#### Step 6 Edit Security Group to allow Peered VPC access {#step-6-edit-security-group-to-allow-peered-vpc-access} +#### Step 6: Edit security group to allow peered VPC access {#step-6-edit-security-group-to-allow-peered-vpc-access} In the ClickHouse BYOC account, you need to update the Security Group settings to allow traffic from your peered VPC. Please contact ClickHouse Support to request the addition of inbound rules that include the CIDR ranges of your peered VPC. --- @@ -189,7 +189,7 @@ To access ClickHouse privately, a private load balancer and endpoint are provisi Optional, after verifying that peering is working, you can request the removal of the public load balancer for ClickHouse BYOC. -## Upgrade Process {#upgrade-process} +## Upgrade process {#upgrade-process} We regularly upgrade the software, including ClickHouse database version upgrades, ClickHouse Operator, EKS, and other components. @@ -199,7 +199,7 @@ While we aim for seamless upgrades (e.g., rolling upgrades and restarts), some, Maintenance windows do not apply to security and vulnerability fixes. These are handled as off-cycle upgrades, with timely communication to coordinate a suitable time and minimize operational impact. ::: -## CloudFormation IAM Roles {#cloudformation-iam-roles} +## CloudFormation IAM roles {#cloudformation-iam-roles} ### Bootstrap IAM role {#bootstrap-iam-role} @@ -233,7 +233,7 @@ These roles are assumed by applications running within the customer's EKS cluste Lastly, **`data-plane-mgmt`** allows a ClickHouse Cloud Control Plane component to reconcile necessary custom resources, such as `ClickHouseCluster` and the Istio Virtual Service/Gateway. -## Network Boundaries {#network-boundaries} +## Network boundaries {#network-boundaries} This section covers different network traffic to and from the customer BYOC VPC: @@ -324,7 +324,7 @@ Besides Clickhouse instances (ClickHouse servers and ClickHouse Keeper), we run Currently we have 3 m5.xlarge nodes (one for each AZ) in a dedicated node group to run those workloads. -### Network and Security {#network-and-security} +### Network and security {#network-and-security} #### Can we revoke permissions set up during installation after setup is complete? {#can-we-revoke-permissions-set-up-during-installation-after-setup-is-complete} @@ -346,9 +346,9 @@ Contact support to schedule maintenance windows. Please expect a minimum of a we ## Observability {#observability} -### Built-in Monitoring Tools {#built-in-monitoring-tools} +### Built-in monitoring tools {#built-in-monitoring-tools} -#### Observability Dashboard {#observability-dashboard} +#### Observability dashboard {#observability-dashboard} ClickHouse Cloud includes an advanced observability dashboard that displays metrics such as memory usage, query rates, and I/O. This can be accessed in the **Monitoring** section of ClickHouse Cloud web console interface. @@ -358,7 +358,7 @@ ClickHouse Cloud includes an advanced observability dashboard that displays metr
-#### Advanced Dashboard {#advanced-dashboard} +#### Advanced dashboard {#advanced-dashboard} You can customize a dashboard using metrics from system tables like `system.metrics`, `system.events`, and `system.asynchronous_metrics` and more to monitor server performance and resource utilization in detail. diff --git a/docs/cloud/reference/changelog.md b/docs/cloud/reference/changelog.md index 94795beb8a5..7a2512e712f 100644 --- a/docs/cloud/reference/changelog.md +++ b/docs/cloud/reference/changelog.md @@ -156,7 +156,7 @@ In addition to this ClickHouse Cloud changelog, please see the [Cloud Compatibil ## February 21, 2025 {#february-21-2025} -### ClickHouse Bring Your Own Cloud (BYOC) for AWS is now generally available! {#clickhouse-byoc-for-aws-ga} +### ClickHouse Bring Your Own Cloud (BYOC) for AWS is now generally available {#clickhouse-byoc-for-aws-ga} In this deployment model, data plane components (compute, storage, backups, logs, metrics) run in the Customer VPC, while the control plane (web access, APIs, and billing) @@ -256,7 +256,7 @@ Users can schedule upgrades for their services. This feature is supported for En [Golang](https://github.com/ClickHouse/clickhouse-go/releases/tag/v2.30.1), [Python](https://github.com/ClickHouse/clickhouse-connect/releases/tag/v0.8.11), and [NodeJS](https://github.com/ClickHouse/clickhouse-js/releases/tag/1.10.1) clients added support for Dynamic, Variant, and JSON types. -### DBT support for Refreshable Materialized Views {#dbt-support-for-refreshable-materialized-views} +### DBT support for refreshable materialized views {#dbt-support-for-refreshable-materialized-views} DBT now [supports Refreshable Materialized Views](https://github.com/ClickHouse/dbt-clickhouse/releases/tag/v1.8.7) in the `1.8.7` release. @@ -299,15 +299,15 @@ Org Admins can now add more email addresses to a specific notification as additi ## December 6, 2024 {#december-6-2024} -### BYOC (Beta) {#byoc-beta} +### BYOC (beta) {#byoc-beta} Bring Your Own Cloud for AWS is now available in Beta. This deployment model allows you to deploy and run ClickHouse Cloud in your own AWS account. We support deployments in 11+ AWS regions, with more coming soon. Please [contact support](https://clickhouse.com/support/program) for access. Note that this deployment is reserved for large-scale deployments. -### Postgres Change-Data-Capture (CDC) Connector in ClickPipes {#postgres-change-data-capture-cdc-connector-in-clickpipes} +### Postgres Change Data Capture (CDC) connector in ClickPipes {#postgres-change-data-capture-cdc-connector-in-clickpipes} This turnkey integration enables customers to replicate their Postgres databases to ClickHouse Cloud in just a few clicks and leverage ClickHouse for blazing-fast analytics. You can use this connector for both continuous replication and one-time migrations from Postgres. -### Dashboards (Beta) {#dashboards-beta} +### Dashboards (beta) {#dashboards-beta} This week, we're excited to announce the Beta launch of Dashboards in ClickHouse Cloud. With Dashboards, users can turn saved queries into visualizations, organize visualizations onto dashboards, and interact with dashboards using query parameters. To get started, follow the [dashboards documentation](/cloud/manage/dashboards). @@ -333,15 +333,15 @@ To get started, follow the [Query API Endpoints documentation](/cloud/get-starte We are launching Beta for our native JSON support in ClickHouse Cloud. To get started, please get in touch with support[ to enable your cloud service](/cloud/support). 
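As a rough illustration of what the new type enables once the beta has been switched on for a service (the table, column, and connection values below are made up for the example):

```bash
# Sketch only: create a table with a JSON column, insert a document,
# and read sub-paths back. Assumes the native JSON beta is enabled.
clickhouse client --host "$HOST" --secure --password "$PASSWORD" --multiquery --query "
CREATE TABLE json_demo (data JSON) ENGINE = MergeTree ORDER BY tuple();
INSERT INTO json_demo VALUES ('{\"user\": \"alice\", \"score\": 42}');
SELECT data.user, data.score FROM json_demo;
"
```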
-### Vector search using vector similarity indexes (Early Access) {#vector-search-using-vector-similarity-indexes-early-access} +### Vector search using vector similarity indexes (early access) {#vector-search-using-vector-similarity-indexes-early-access} -We are announcing vector similarity indexes for approximate vector search in early access! +We are announcing vector similarity indexes for approximate vector search in early access. ClickHouse already offers robust support for vector-based use cases, with a wide range of [distance functions](https://clickhouse.com/blog/reinvent-2024-product-announcements#vector-search-using-vector-similarity-indexes-early-access) and the ability to perform linear scans. In addition, more recently, we added an experimental [approximate vector search](/engines/table-engines/mergetree-family/annindexes) approach powered by the [usearch](https://github.com/unum-cloud/usearch) library and the Hierarchical Navigable Small Worlds (HNSW) approximate nearest neighbor search algorithm. To get started, [please sign up for the early access waitlist](https://clickhouse.com/cloud/vector-search-index-waitlist). -### ClickHouse-Connect (Python) and ClickHouse-Kafka-Connect Users {#clickhouse-connect-python-and-clickhouse-kafka-connect-users} +### ClickHouse-connect (Python) and ClickHouse Kafka Connect users {#clickhouse-connect-python-and-clickhouse-kafka-connect-users} Notification emails went out to customers who had experienced issues where the clients could encounter a `MEMORY_LIMIT_EXCEEDED` exception. @@ -367,7 +367,7 @@ From now on, any new services created on AWS will allow a maximum available repl ### Built-in advanced observability dashboard for ClickHouse Cloud {#built-in-advanced-observability-dashboard-for-clickhouse-cloud} -Previously, the advanced observability dashboard that allows you to monitor ClickHouse server metrics and hardware resource utilization was only available in open-source ClickHouse. We are happy to announce that this feature is now available in the ClickHouse Cloud console! +Previously, the advanced observability dashboard that allows you to monitor ClickHouse server metrics and hardware resource utilization was only available in open-source ClickHouse. We are happy to announce that this feature is now available in the ClickHouse Cloud console. This dashboard allows you to view queries based on the [system.dashboards](/operations/system-tables/dashboards) table in an all-in-one UI. Visit **Monitoring > Service Health** page to start using the advanced observability dashboard today. @@ -375,11 +375,11 @@ This dashboard allows you to view queries based on the [system.dashboards](/oper ### AI-powered SQL autocomplete {#ai-powered-sql-autocomplete} -We've improved autocomplete significantly, allowing you to get in-line SQL completions as you write your queries with the new AI Copilot! This feature can be enabled by toggling the **"Enable Inline Code Completion"** setting for any ClickHouse Cloud service. +We've improved autocomplete significantly, allowing you to get in-line SQL completions as you write your queries with the new AI Copilot. This feature can be enabled by toggling the **"Enable Inline Code Completion"** setting for any ClickHouse Cloud service.
Animation showing the AI Copilot providing SQL autocompletion suggestions as a user types -### New "Billing" role {#new-billing-role} +### New "billing" role {#new-billing-role} You can now assign users in your organization to a new **Billing** role that allows them to view and manage billing information without giving them the ability to configure or manage services. Simply invite a new user or edit an existing user's role to assign the **Billing** role. @@ -405,7 +405,7 @@ Customers looking for increased security for protected health information (PHI) Services are available in GCP `us-central-1` to customers with the **Dedicated** service type and require a Business Associate Agreement (BAA). Contact [sales](mailto:sales@clickhouse.com) or [support](https://clickhouse.com/support/program) to request access to this feature or join the wait list for additional GCP, AWS, and Azure regions. -### Compute-Compute separation is now in Private Preview for GCP and Azure {#compute-compute-separation-is-now-in-private-preview-for-gcp-and-azure} +### Compute-compute separation is now in private preview for GCP and Azure {#compute-compute-separation-is-now-in-private-preview-for-gcp-and-azure} We recently announced the Private Preview for Compute-Compute Separation for AWS. We're happy to announce that it is now available for GCP and Azure. @@ -415,9 +415,9 @@ Compute-compute separation allows you to designate specific services as read-wri Customers using multi-factor authentication can now obtain recovery codes that can be used in the event of a lost phone or accidentally deleted token. Customers enrolling in MFA for the first time will be provided the code on set up. Customers with existing MFA can obtain a recovery code by removing their existing MFA token and adding a new one. -### ClickPipes Update: Custom Certificates, Latency Insights, and More! {#clickpipes-update-custom-certificates-latency-insights-and-more} +### ClickPipes update: custom certificates, latency insights, and more. {#clickpipes-update-custom-certificates-latency-insights-and-more} -We're excited to share the latest updates for ClickPipes, the easiest way to ingest data into your ClickHouse service! These new features are designed to enhance your control over data ingestion and provide greater visibility into performance metrics. +We're excited to share the latest updates for ClickPipes, the easiest way to ingest data into your ClickHouse service. These new features are designed to enhance your control over data ingestion and provide greater visibility into performance metrics. *Custom Authentication Certificates for Kafka* @@ -443,7 +443,7 @@ It is now possible to ingest an entire Kafka or Kinesis message without parsing ### New Terraform provider version - v1.0.0 {#new-terraform-provider-version---v100} -Terraform allows you to control your ClickHouse Cloud services programmatically, then store your configuration as code. Our Terraform provider has almost 200,000 downloads and is now officially v1.0.0! This new version includes improvements such as better retry logic and a new resource to attach private endpoints to your ClickHouse Cloud service. You can download the [Terraform provider here](https://registry.terraform.io/providers/ClickHouse/clickhouse/latest) and view the [full changelog here](https://github.com/ClickHouse/terraform-provider-clickhouse/releases/tag/v1.0.0). +Terraform allows you to control your ClickHouse Cloud services programmatically, then store your configuration as code. 
Our Terraform provider has almost 200,000 downloads and is now officially v1.0.0. This new version includes improvements such as better retry logic and a new resource to attach private endpoints to your ClickHouse Cloud service. You can download the [Terraform provider here](https://registry.terraform.io/providers/ClickHouse/clickhouse/latest) and view the [full changelog here](https://github.com/ClickHouse/terraform-provider-clickhouse/releases/tag/v1.0.0). ### 2024 SOC 2 Type II report and updated ISO 27001 certificate {#2024-soc-2-type-ii-report-and-updated-iso-27001-certificate} @@ -469,11 +469,11 @@ ClickPipes is the easiest way to ingest data into ClickHouse Cloud. We're happy ## July 18, 2024 {#july-18-2024} -### Prometheus Endpoint for Metrics is now Generally Available {#prometheus-endpoint-for-metrics-is-now-generally-available} +### Prometheus endpoint for metrics is now generally available {#prometheus-endpoint-for-metrics-is-now-generally-available} In our last cloud changelog, we announced the Private Preview for exporting [Prometheus](https://prometheus.io/) metrics from ClickHouse Cloud. This feature allows you to use the [ClickHouse Cloud API](/cloud/manage/api/api-overview) to get your metrics into tools like [Grafana](https://grafana.com/) and [Datadog](https://www.datadoghq.com/) for visualization. We're happy to announce that this feature is now **Generally Available**. Please see [our docs](/integrations/prometheus) to learn more about this feature. -### Table Inspector in Cloud Console {#table-inspector-in-cloud-console} +### Table inspector in Cloud console {#table-inspector-in-cloud-console} ClickHouse has commands like [`DESCRIBE`](/sql-reference/statements/describe-table) that allow you to introspect your table to examine schema. These commands output to the console, but they are often not convenient to use as you need to combine several queries to retrieve all pertinent data about your tables and columns. @@ -489,11 +489,11 @@ Our [Java Client](https://github.com/ClickHouse/clickhouse-java) is one of the m For the last couple of years, we've been working on a new analyzer for query analysis and optimization. This analyzer improves query performance and will allow us to make further optimizations, including faster and more efficient `JOIN`s. Previously, it was required that new users enable this feature using the setting `allow_experimental_analyzer`. This improved analyzer is now available on new ClickHouse Cloud services by default. -Stay tuned for more improvements to the analyzer as we have many more optimizations planned! +Stay tuned for more improvements to the analyzer as we have many more optimizations planned. ## June 28, 2024 {#june-28-2024} -### ClickHouse Cloud for Microsoft Azure is now Generally Available! {#clickhouse-cloud-for-microsoft-azure-is-now-generally-available} +### ClickHouse Cloud for Microsoft Azure is now generally available {#clickhouse-cloud-for-microsoft-azure-is-now-generally-available} We first announced Microsoft Azure support in Beta [this past May](https://clickhouse.com/blog/clickhouse-cloud-is-now-on-azure-in-public-beta). In this latest cloud release, we're happy to announce that our Azure support is transitioning from Beta to Generally Available. ClickHouse Cloud is now available on all the three major cloud platforms: AWS, Google Cloud Platform, and now Microsoft Azure. 
@@ -504,19 +504,19 @@ This release also includes support for subscriptions via the [Microsoft Azure Ma If you'd like any specific region to be supported, please [contact us](https://clickhouse.com/support/program). -### Query Log Insights {#query-log-insights} +### Query log insights {#query-log-insights} -Our new Query Insights UI in the Cloud Console makes ClickHouse's built-in query log a lot easier to use. ClickHouse's `system.query_log` table is a key source of information for query optimization, debugging, and monitoring overall cluster health and performance. There's just one caveat: with 70+ fields and multiple records per query, interpreting the query log represents a steep learning curve. This initial version of query insights provides a blueprint for future work to simplify query debugging and optimization patterns. We'd love to hear your feedback as we continue to iterate on this feature, so please reach out—your input will be greatly appreciated! +Our new Query Insights UI in the Cloud console makes ClickHouse's built-in query log a lot easier to use. ClickHouse's `system.query_log` table is a key source of information for query optimization, debugging, and monitoring overall cluster health and performance. There's just one caveat: with 70+ fields and multiple records per query, interpreting the query log represents a steep learning curve. This initial version of query insights provides a blueprint for future work to simplify query debugging and optimization patterns. We'd love to hear your feedback as we continue to iterate on this feature, so please reach out—your input will be greatly appreciated. ClickHouse Cloud Query Insights UI showing query performance metrics and analysis -### Prometheus Endpoint for Metrics (Private Preview) {#prometheus-endpoint-for-metrics-private-preview} +### Prometheus endpoint for metrics (private preview) {#prometheus-endpoint-for-metrics-private-preview} Perhaps one of our most requested features: you can now export [Prometheus](https://prometheus.io/) metrics from ClickHouse Cloud to [Grafana](https://grafana.com/) and [Datadog](https://www.datadoghq.com/) for visualization. Prometheus provides an open-source solution to monitor ClickHouse and set up custom alerts. Access to Prometheus metrics for your ClickHouse Cloud service is available via the [ClickHouse Cloud API](/integrations/prometheus). This feature is currently in Private Preview. Please reach out to the [support team](https://clickhouse.com/support/program) to enable this feature for your organization. Grafana dashboard showing Prometheus metrics from ClickHouse Cloud -### Other features: {#other-features} +### Other features {#other-features} - [Configurable backups](/cloud/manage/backups/configurable-backups) to configure custom backup policies like frequency, retention, and schedule are now Generally Available. ## June 13, 2024 {#june-13-2024} @@ -536,7 +536,7 @@ The following options are available: ### Enroll services to the Fast release channel {#enroll-services-to-the-fast-release-channel} -The Fast release channel allows your services to receive updates ahead of the release schedule. Previously, this feature required assistance from the support team to enable. Now, you can use the ClickHouse Cloud console to enable this feature for your services directly. Simply navigate to **Settings**, and click **Enroll in fast releases**. Your service will now receive updates as soon as they are available! 
+The Fast release channel allows your services to receive updates ahead of the release schedule. Previously, this feature required assistance from the support team to enable. Now, you can use the ClickHouse Cloud console to enable this feature for your services directly. Simply navigate to **Settings**, and click **Enroll in fast releases**. Your service will now receive updates as soon as they are available. ClickHouse Cloud settings page showing the option to enroll in fast releases @@ -558,13 +558,13 @@ We're happy to announce that you can now easily share queries via the ClickHouse ### ClickHouse Cloud for Microsoft Azure is now in beta {#clickhouse-cloud-for-microsoft-azure-is-now-in-beta} -We've finally launched the ability to create ClickHouse Cloud services on Microsoft Azure! We already have many customers using ClickHouse Cloud on Azure in production as part of our Private Preview program. Now, anyone can create their own service on Azure. All of your favorite ClickHouse features that are supported on AWS and GCP will also work on Azure. +We've finally launched the ability to create ClickHouse Cloud services on Microsoft Azure. We already have many customers using ClickHouse Cloud on Azure in production as part of our Private Preview program. Now, anyone can create their own service on Azure. All of your favorite ClickHouse features that are supported on AWS and GCP will also work on Azure. We expect to have ClickHouse Cloud for Azure ready for General Availability in the next few weeks. [Read this blog post](https://clickhouse.com/blog/clickhouse-cloud-is-now-on-azure-in-public-beta) to learn more, or create your new service using Azure via the ClickHouse Cloud console. Note: **Development** services for Azure are not supported at this time. -### Set up Private Link via the Cloud Console {#set-up-private-link-via-the-cloud-console} +### Set up Private Link via the Cloud console {#set-up-private-link-via-the-cloud-console} Our Private Link feature allows you to connect your ClickHouse Cloud services with internal services in your cloud provider account without having to direct traffic to the public internet, saving costs and enhancing security. Previously, this was difficult to set up and required using the ClickHouse Cloud API. @@ -574,15 +574,15 @@ You can now configure private endpoints in just a few clicks directly from the C ## May 17, 2024 {#may-17-2024} -### Ingest data from Amazon Kinesis using ClickPipes (Beta) {#ingest-data-from-amazon-kinesis-using-clickpipes-beta} +### Ingest data from Amazon Kinesis using ClickPipes (beta) {#ingest-data-from-amazon-kinesis-using-clickpipes-beta} -ClickPipes is an exclusive service provided by ClickHouse Cloud to ingest data without code. Amazon Kinesis is AWS's fully managed streaming service to ingest and store data streams for processing. We are thrilled to launch the ClickPipes beta for Amazon Kinesis, one of our most requested integrations. We're looking to add more integrations to ClickPipes, so please let us know which data source you'd like us to support! Read more about this feature [here](https://clickhouse.com/blog/clickpipes-amazon-kinesis). +ClickPipes is an exclusive service provided by ClickHouse Cloud to ingest data without code. Amazon Kinesis is AWS's fully managed streaming service to ingest and store data streams for processing. We are thrilled to launch the ClickPipes beta for Amazon Kinesis, one of our most requested integrations. 
We're looking to add more integrations to ClickPipes, so please let us know which data source you'd like us to support. Read more about this feature [here](https://clickhouse.com/blog/clickpipes-amazon-kinesis). You can try the new Amazon Kinesis integration for ClickPipes in the cloud console: ClickPipes interface showing Amazon Kinesis integration configuration options -### Configurable Backups (Private Preview) {#configurable-backups-private-preview} +### Configurable backups (private preview) {#configurable-backups-private-preview} Backups are important for every database (no matter how reliable), and we've taken backups very seriously since day 1 of ClickHouse Cloud. This week, we launched Configurable Backups, which allows for much more flexibility for your service's backups. You can now control start time, retention, and frequency. This feature is available for **Production** and **Dedicated** services and is not available for **Development** services. As this feature is in private preview, please contact support@clickhouse.com to enable this for your service. Read more about configurable backups [here](https://clickhouse.com/blog/configurable-backups-in-clickhouse-cloud). @@ -646,7 +646,7 @@ Other changes: ## April 4, 2024 {#april-4-2024} -### Introducing the new ClickHouse Cloud Console {#introducing-the-new-clickhouse-cloud-console} +### Introducing the new ClickHouse Cloud console {#introducing-the-new-clickhouse-cloud-console} This release introduces a private preview for the new cloud console. @@ -656,7 +656,7 @@ Thousands of ClickHouse Cloud users execute billions of queries on our SQL conso Select customers will receive a preview of our new cloud console experience – a unified and immersive way to explore and manage your data in ClickHouse. Please reach out to us at support@clickhouse.com if you'd like priority access. -Animation showing the new ClickHouse Cloud Console interface with integrated SQL editor and management features +Animation showing the new ClickHouse Cloud console interface with integrated SQL editor and management features ## March 28, 2024 {#march-28-2024} @@ -690,10 +690,10 @@ This release introduces support for Microsoft Azure, Horizontal Scaling via API, ## March 14, 2024 {#march-14-2024} -This release makes available in early access the new Cloud Console experience, ClickPipes for bulk loading from S3 and GCS, and support for Avro format in ClickPipes for Kafka. It also upgrades the ClickHouse database version to 24.1, bringing support for new functions as well as performance and resource usage optimizations. +This release makes available in early access the new Cloud console experience, ClickPipes for bulk loading from S3 and GCS, and support for Avro format in ClickPipes for Kafka. It also upgrades the ClickHouse database version to 24.1, bringing support for new functions as well as performance and resource usage optimizations. ### Console changes {#console-changes-2} -- New Cloud Console experience is available in early access (please contact support if you're interested in participating). +- New Cloud console experience is available in early access (please contact support if you're interested in participating). - ClickPipes for bulk loading from S3 and GCS are available in early access (please contact support if you're interested in participating). - Support for Avro format in ClickPipes for Kafka is available in early access (please contact support if you're interested in participating). 
@@ -911,7 +911,7 @@ This release brings general availability of ClickPipes for Kafka, Confluent Clou
### Console changes {#console-changes-11}
- Added a self-service workflow to secure [access to Amazon S3 via IAM roles](/cloud/security/secure-s3)
-- Introduced AI-assisted query suggestions in private preview (please [contact ClickHouse Cloud support](https://console.clickhouse.cloud/support) to try it out!)
+- Introduced AI-assisted query suggestions in private preview (please [contact ClickHouse Cloud support](https://console.clickhouse.cloud/support) to try it out)
### Integrations changes {#integrations-changes-11}
- Announced general availability of ClickPipes - a turnkey data ingestion service - for Kafka, Confluent Cloud, and Amazon MSK (see the [release blog](https://clickhouse.com/blog/clickpipes-is-generally-available))
@@ -1049,9 +1049,21 @@ This release brings the public release of the ClickHouse Cloud Programmatic API
## May 11, 2023 {#may-11-2023}
-This release brings the ~~public beta~~ (now GA, see June 20th entry above) of ClickHouse Cloud on GCP (see [blog](https://clickhouse.com/blog/clickhouse-cloud-on-gcp-available-in-public-beta) for details), extends administrators rights to grant terminate query permissions, and adds more visibility into the status of MFA users in the Cloud console.
+This release brings the public beta of ClickHouse Cloud on GCP
+(see [blog](https://clickhouse.com/blog/clickhouse-cloud-on-gcp-available-in-public-beta)
+for details), extends administrators' rights to grant terminate query permissions,
+and adds more visibility into the status of MFA users in the Cloud console.
+
+:::note Update
+ClickHouse Cloud on GCP is now GA, see the [June 20th](#june-20-2023) entry above.
+:::
+
+### ClickHouse Cloud on GCP is now available in public beta {#clickhouse-cloud-on-gcp-is-now-available-in-public-beta-now-ga-see-june-20th-entry-above}
+
+:::note
+ClickHouse Cloud on GCP is now GA, see the [June 20th](#june-20-2023) entry above.
+:::
-### ClickHouse Cloud on GCP ~~(Public Beta)~~ (now GA, see June 20th entry above) {#clickhouse-cloud-on-gcp-public-beta-now-ga-see-june-20th-entry-above}
- Launches a fully-managed separated storage and compute ClickHouse offering, running on top of Google Compute and Google Cloud Storage
- Available in Iowa (us-central1), Netherlands (europe-west4), and Singapore (asia-southeast1) regions
- Supports both Development and Production services in all three initial regions
diff --git a/docs/cloud/reference/cloud-compatibility.md b/docs/cloud/reference/cloud-compatibility.md index 8250ca62d7f..62cdfcb8710 100644 --- a/docs/cloud/reference/cloud-compatibility.md +++ b/docs/cloud/reference/cloud-compatibility.md @@ -5,11 +5,11 @@ title: 'Cloud Compatibility' description: 'This guide provides an overview of what to expect functionally and operationally in ClickHouse Cloud.' --- -# ClickHouse Cloud — Compatibility Guide +# ClickHouse Cloud compatibility guide This guide provides an overview of what to expect functionally and operationally in ClickHouse Cloud. While ClickHouse Cloud is based on the open-source ClickHouse distribution, there may be some differences in architecture and implementation. You may find this blog on [how we built ClickHouse Cloud](https://clickhouse.com/blog/building-clickhouse-cloud-from-scratch-in-a-year) interesting and relevant to read as background. -## ClickHouse Cloud Architecture {#clickhouse-cloud-architecture} +## ClickHouse Cloud architecture {#clickhouse-cloud-architecture} ClickHouse Cloud significantly simplifies operational overhead and reduces the costs of running ClickHouse at scale. There is no need to size your deployment upfront, set up replication for high availability, manually shard your data, scale up your servers when your workload increases, or scale them down when you are not using them — we handle this for you. These benefits come as a result of architectural choices underlying ClickHouse Cloud: @@ -99,7 +99,7 @@ The [Kafka Table Engine](/integrations/data-ingestion/kafka/index.md) is not gen [Named collections](/operations/named-collections) are not currently supported in ClickHouse Cloud. -## Operational Defaults and Considerations {#operational-defaults-and-considerations} +## Operational defaults and considerations {#operational-defaults-and-considerations} The following are default settings for ClickHouse Cloud services. In some cases, these settings are fixed to ensure the correct operation of the service, and in others, they can be adjusted. ### Operational limits {#operational-limits} diff --git a/docs/cloud/reference/index.md b/docs/cloud/reference/index.md index a56fdbffe54..a4e19f99a7c 100644 --- a/docs/cloud/reference/index.md +++ b/docs/cloud/reference/index.md @@ -1,20 +1,20 @@ --- slug: /cloud/reference -keywords: ['Cloud', 'reference', 'architecture', 'SharedMergeTree', 'Compute-Compute Separation', 'Bring Your Own Cloud', 'Changelogs', 'Supported Cloud Regions', 'Cloud Compatibility'] +keywords: ['Cloud', 'reference', 'architecture', 'SharedMergeTree', 'Compute-compute Separation', 'Bring Your Own Cloud', 'Changelogs', 'Supported Cloud Regions', 'Cloud Compatibility'] title: 'Overview' hide_title: true description: 'Landing page for the Cloud reference section' --- -# Cloud Reference +# Cloud reference This section acts as a reference guide for some of the more technical details of ClickHouse Cloud and contains the following pages: | Page | Description | |-----------------------------------|-----------------------------------------------------------------------------------------------------------| | [Architecture](/cloud/reference/architecture) | Discusses the architecture of ClickHouse Cloud, including storage, compute, administration, and security. | -| [SharedMergeTree](/cloud/reference/shared-merge-tree) | Explainer on SharedMergeTree, the cloud-native replacement for the ReplicatedMergeTree and analogues. 
| -| [Warehouses](/cloud/reference/warehouses) | Explainer on what Warehouses and Compute-Compute separation are in ClickHouse Cloud. | +| [SharedMergeTree](/cloud/reference/shared-merge-tree) | Explainer on SharedMergeTree, the cloud-native replacement for the ReplicatedMergeTree and analogues. | +| [Warehouses](/cloud/reference/warehouses) | Explainer on what Warehouses and compute-compute separation are in ClickHouse Cloud. | | [BYOC (Bring Your Own Cloud)](/cloud/reference/byoc)| Explainer on the Bring Your Own Cloud (BYOC) service available with ClickHouse Cloud. | | [Changelogs](/cloud/reference/changelogs) | Cloud Changelogs and Release Notes. | | [Cloud Compatibility](/whats-new/cloud-compatibility) | A guide to what to expect functionally and operationally in ClickHouse Cloud. | diff --git a/docs/cloud/reference/shared-catalog.md b/docs/cloud/reference/shared-catalog.md index b70b82d6019..fa474c41b74 100644 --- a/docs/cloud/reference/shared-catalog.md +++ b/docs/cloud/reference/shared-catalog.md @@ -6,7 +6,7 @@ keywords: ['SharedCatalog', 'SharedDatabaseEngine'] description: 'Describes the Shared Catalog component and the Shared database engine in ClickHouse Cloud' --- -# Shared Catalog and Shared Database Engine {#shared-catalog-and-shared-database-engine} +# Shared catalog and shared database engine {#shared-catalog-and-shared-database-engine} **Available exclusively in ClickHouse Cloud (and first party partner cloud services)** @@ -21,7 +21,7 @@ It supports replication of the following database engines: - MySQL - DataLakeCatalog -## Architecture and Metadata Storage {#architecture-and-metadata-storage} +## Architecture and metadata storage {#architecture-and-metadata-storage} All metadata and DDL query history in Shared Catalog is stored centrally in ZooKeeper. Nothing is persisted on local disk. This architecture ensures: @@ -29,7 +29,7 @@ All metadata and DDL query history in Shared Catalog is stored centrally in ZooK - Statelessness of compute nodes - Fast, reliable replica bootstrapping -## Shared Database Engine {#shared-database-engine} +## Shared database engine {#shared-database-engine} The **Shared database engine** works in conjunction with Shared Catalog to manage databases whose tables use **stateless table engines** such as `SharedMergeTree`. These table engines do not write persistent state to disk and are compatible with dynamic compute environments. diff --git a/docs/cloud/reference/shared-merge-tree.md b/docs/cloud/reference/shared-merge-tree.md index be0009a40c5..589e96a7a0e 100644 --- a/docs/cloud/reference/shared-merge-tree.md +++ b/docs/cloud/reference/shared-merge-tree.md @@ -11,7 +11,7 @@ import shared_merge_tree_2 from '@site/static/images/cloud/reference/shared-merg import Image from '@theme/IdealImage'; -# SharedMergeTree Table Engine +# SharedMergeTree table engine The SharedMergeTree table engine family is a cloud-native replacement of the ReplicatedMergeTree engines that is optimized to work on top of shared storage (e.g. Amazon S3, Google Cloud Storage, MinIO, Azure Blob Storage). There is a SharedMergeTree analog for every specific MergeTree engine type, i.e. ReplacingSharedMergeTree replaces ReplacingReplicatedMergeTree. 
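As a small, hedged illustration of that mapping, you can list the engines your Cloud tables actually report; the service host and credentials below are placeholders:

```bash
# Illustrative only: SharedMergeTree analogs are visible directly in system.tables.
clickhouse client --host "<your-service>.clickhouse.cloud" --secure --password \
  --query "SELECT name, engine FROM system.tables WHERE database = currentDatabase()"
```

On a Cloud service you would expect a table created with `ENGINE = MergeTree` to report `SharedMergeTree` here, and a `ReplacingMergeTree` table to report `ReplacingSharedMergeTree`.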
diff --git a/docs/cloud/reference/supported-regions.md b/docs/cloud/reference/supported-regions.md index 4775688d5e7..b1c6d3bb7ed 100644 --- a/docs/cloud/reference/supported-regions.md +++ b/docs/cloud/reference/supported-regions.md @@ -8,9 +8,9 @@ slug: /cloud/reference/supported-regions import EnterprisePlanFeatureBadge from '@theme/badges/EnterprisePlanFeatureBadge' -# Supported Cloud Regions +# Supported cloud regions -## AWS Regions {#aws-regions} +## AWS regions {#aws-regions} - ap-northeast-1 (Tokyo) - ap-south-1 (Mumbai) @@ -75,7 +75,7 @@ Key considerations for private regions: Additional requirements may apply for HIPAA compliance (including signing a BAA). Note that HIPAA is currently available only for Enterprise tier services -## HIPAA Compliant Regions {#hipaa-compliant-regions} +## HIPAA compliant regions {#hipaa-compliant-regions} @@ -88,7 +88,7 @@ Customers must sign a Business Associate Agreement (BAA) and request onboarding - GCP us-central1 (Iowa) - GCP us-east1 (South Carolina) -## PCI Compliant Regions {#pci-compliant-regions} +## PCI compliant regions {#pci-compliant-regions} diff --git a/docs/cloud/reference/warehouses.md b/docs/cloud/reference/warehouses.md index 9beec53a19c..675c3f51b5e 100644 --- a/docs/cloud/reference/warehouses.md +++ b/docs/cloud/reference/warehouses.md @@ -16,7 +16,7 @@ import Image from '@theme/IdealImage'; # Warehouses -## What is Compute-Compute Separation? {#what-is-compute-compute-separation} +## What is compute-compute separation? {#what-is-compute-compute-separation} Compute-compute separation is available for Scale and Enterprise tiers. @@ -47,7 +47,7 @@ _Fig. 2 - compute separation in ClickHouse Cloud_ It is possible to create extra services that share the same data with your existing services, or create a completely new setup with multiple services sharing the same data. -## What is a Warehouse? {#what-is-a-warehouse} +## What is a warehouse? {#what-is-a-warehouse} In ClickHouse Cloud, a _warehouse_ is a set of services that share the same data. Each warehouse has a primary service (this service was created first) and secondary service(s). For example, in the screenshot below you can see a warehouse "DWH Prod" with two services: @@ -153,9 +153,9 @@ Compute prices are the same for all services in a warehouse (primary and seconda - As all services in a single warehouse share the same storage, backups are made only on the primary (initial) service. By this, the data for all services in a warehouse is backed up. - If you restore a backup from a primary service of a warehouse, it will be restored to a completely new service, not connected to the existing warehouse. You can then add more services to the new service immediately after the restore is finished. -## Using Warehouses {#using-warehouses} +## Using warehouses {#using-warehouses} -### Creating a Warehouse {#creating-a-warehouse} +### Creating a warehouse {#creating-a-warehouse} To create a warehouse, you need to create a second service that will share the data with an existing service. This can be done by clicking the plus sign on any of the existing services: @@ -167,7 +167,7 @@ _Fig. 7 - Click the plus sign to create a new service in a warehouse_ On the service creation screen, the original service will be selected in the dropdown as the source for the data of the new service. Once created, these two services will form a warehouse. 
-### Renaming a Warehouse {#renaming-a-warehouse} +### Renaming a warehouse {#renaming-a-warehouse} There are two ways to rename a warehouse: diff --git a/docs/cloud/security/accessing-s3-data-securely.md b/docs/cloud/security/accessing-s3-data-securely.md index f20673e1d08..7503cdccf9c 100644 --- a/docs/cloud/security/accessing-s3-data-securely.md +++ b/docs/cloud/security/accessing-s3-data-securely.md @@ -22,7 +22,7 @@ This approach allows customers to manage all access to their S3 buckets in a sin ## Setup {#setup} -### Obtaining the ClickHouse service IAM role Arn {#obtaining-the-clickhouse-service-iam-role-arn} +### Obtaining the ClickHouse service IAM role ARN {#obtaining-the-clickhouse-service-iam-role-arn} 1 - Login to your ClickHouse cloud account. @@ -71,7 +71,7 @@ This approach allows customers to manage all access to their S3 buckets in a sin CloudFormation stack output showing IAM Role ARN -#### Option 2: Manually create IAM role. {#option-2-manually-create-iam-role} +#### Option 2: Manually create IAM role {#option-2-manually-create-iam-role} 1 - Login to your AWS Account in the web browser with an IAM user that has permission to create & manage IAM role. @@ -128,7 +128,7 @@ IAM policy (Please replace `{BUCKET_NAME}` with your bucket name): 4 - Copy the new **IAM Role Arn** after creation. This is what needed to access your S3 bucket. -## Access your S3 bucket with the ClickHouseAccess Role {#access-your-s3-bucket-with-the-clickhouseaccess-role} +## Access your S3 bucket with the ClickHouseAccess role {#access-your-s3-bucket-with-the-clickhouseaccess-role} ClickHouse Cloud has a new feature that allows you to specify `extra_credentials` as part of the S3 table function. Below is an example of how to run a query using the newly created role copied from above. diff --git a/docs/cloud/security/aws-privatelink.md b/docs/cloud/security/aws-privatelink.md index d61dfa89984..56bde44c5c6 100644 --- a/docs/cloud/security/aws-privatelink.md +++ b/docs/cloud/security/aws-privatelink.md @@ -39,12 +39,12 @@ ClickHouse Cloud currently supports [cross-region PrivateLink](https://aws.amazo Find Terraform examples [here](https://github.com/ClickHouse/terraform-provider-clickhouse/tree/main/examples/). -## Attention {#attention} +## Points of attention {#attention} ClickHouse attempts to group your services to reuse the same published [service endpoint](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-share-your-services.html#endpoint-service-overview) within the AWS region. However, this grouping is not guaranteed, especially if you spread your services across multiple ClickHouse organizations. -If you already have PrivateLink configured for other services in your ClickHouse organization, you can often skip most of the steps because of that grouping and proceed directly to the final step: [Add ClickHouse "Endpoint ID" to ClickHouse service allow list](#add-endpoint-id-to-services-allow-list). +If you already have PrivateLink configured for other services in your ClickHouse organization, you can often skip most of the steps because of that grouping and proceed directly to the final step: Add ClickHouse "Endpoint ID" to ClickHouse service allow list. -## Prerequisites {#prerequisites} +## Prerequisites for this process {#prerequisites} Before you get started you will need: @@ -55,7 +55,7 @@ Before you get started you will need: Follow these steps to connect your ClickHouse Cloud services via AWS PrivateLink. 
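For reference, a hedged sketch of the `extra_credentials` usage described in the S3 access section above; the service host, bucket URL, and role ARN are placeholders, and the authoritative example lives in that section:

```bash
# Illustrative only: query S3 data by assuming the ClickHouseAccess IAM role created earlier.
clickhouse client --host "<your-service>.clickhouse.cloud" --secure --password --query "
  SELECT count()
  FROM s3(
    'https://<BUCKET_NAME>.s3.amazonaws.com/data/*.csv',
    'CSVWithNames',
    extra_credentials(role_arn = 'arn:aws:iam::111111111111:role/ClickHouseAccessRole-001')
  )
"
```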
-### Obtain Endpoint "Service name" {#obtain-endpoint-service-info} +### Obtain endpoint "Service name" {#obtain-endpoint-service-info} #### Option 1: ClickHouse Cloud console {#option-1-clickhouse-cloud-console} @@ -105,7 +105,7 @@ This command should return something like: Make a note of the `endpointServiceId` and `privateDnsHostname` [move onto next step](#create-aws-endpoint). -### Create AWS Endpoint {#create-aws-endpoint} +### Create AWS endpoint {#create-aws-endpoint} :::important This section covers ClickHouse-specific details for configuring ClickHouse via AWS PrivateLink. AWS-specific steps are provided as a reference to guide you on where to look, but they may change over time without notice from the AWS cloud provider. Please consider AWS configuration based on your specific use case. @@ -187,7 +187,7 @@ resource "aws_vpc_endpoint" "this" { After creating the VPC Endpoint, make a note of the `Endpoint ID` value; you'll need it for an upcoming step. -#### Set Private DNS Name for Endpoint {#set-private-dns-name-for-endpoint} +#### Set private DNS name for endpoint {#set-private-dns-name-for-endpoint} :::note There are various ways to configure DNS. Please set up DNS according to your specific use case. @@ -269,7 +269,7 @@ curl --silent --user "${KEY_ID:?}:${KEY_SECRET:?}" \ Each service with Private Link enabled has a public and private endpoint. In order to connect using Private Link, you need to use a private endpoint which will be `privateDnsHostname`API or `DNS Name`console taken from [Obtain Endpoint "Service name"](#obtain-endpoint-service-info). -#### Getting Private DNS Hostname {#getting-private-dns-hostname} +#### Getting private DNS hostname {#getting-private-dns-hostname} ##### Option 1: ClickHouse Cloud console {#option-1-clickhouse-cloud-console-3} @@ -328,7 +328,7 @@ Please refer [here](#attention) - Most likely Endpoint ID was not added to service allow list, please visit [step](#add-endpoint-id-to-services-allow-list) -### Checking Endpoint filters {#checking-endpoint-filters} +### Checking endpoint filters {#checking-endpoint-filters} Set the following environment variables before running any commands: diff --git a/docs/cloud/security/azure-privatelink.md b/docs/cloud/security/azure-privatelink.md index e3d99975381..f213496c7a7 100644 --- a/docs/cloud/security/azure-privatelink.md +++ b/docs/cloud/security/azure-privatelink.md @@ -99,7 +99,7 @@ curl --silent --user "${KEY_ID:?}:${KEY_SECRET:?}" "https://api.clickhouse.cloud Make a note of the `endpointServiceId`. You'll use it in the next step. -## Create a Private Endpoint in Azure {#create-private-endpoint-in-azure} +## Create a private endpoint in Azure {#create-private-endpoint-in-azure} :::important This section covers ClickHouse-specific details for configuring ClickHouse via Azure Private Link. Azure-specific steps are provided as a reference to guide you on where to look, but they may change over time without notice from the Azure cloud provider. Please consider Azure configuration based on your specific use case. @@ -111,7 +111,7 @@ For any issues related to Azure configuration tasks, contact Azure Support direc In this section, we're going to create a Private Endpoint in Azure. You can use either the Azure Portal or Terraform. 
-### Option 1: Using Azure Portal to create a Private Endpoint in Azure {#option-1-using-azure-portal-to-create-a-private-endpoint-in-azure} +### Option 1: Using Azure Portal to create a private endpoint in Azure {#option-1-using-azure-portal-to-create-a-private-endpoint-in-azure} In the Azure Portal, open **Private Link Center → Private Endpoints**. @@ -180,7 +180,7 @@ Open the network interface associated with Private Endpoint and copy the **Priva Private Endpoint IP Address -### Option 2: Using Terraform to create a Private Endpoint in Azure {#option-2-using-terraform-to-create-a-private-endpoint-in-azure} +### Option 2: Using Terraform to create a private endpoint in Azure {#option-2-using-terraform-to-create-a-private-endpoint-in-azure} Use the template below to use Terraform to create a Private Endpoint: @@ -199,7 +199,7 @@ resource "azurerm_private_endpoint" "example_clickhouse_cloud" { } ``` -### Obtaining the Private Endpoint `resourceGuid` {#obtaining-private-endpoint-resourceguid} +### Obtaining the private endpoint `resourceGuid` {#obtaining-private-endpoint-resourceguid} In order to use Private Link, you need to add the Private Endpoint connection GUID to your service allow list. @@ -427,7 +427,7 @@ curl --silent --user "${KEY_ID:?}:${KEY_SECRET:?}" -X PATCH -H "Content-Type: ap Each service with Private Link enabled has a public and private endpoint. In order to connect using Private Link, you need to use a private endpoint which will be `privateDnsHostname`API or `DNS name`console taken from [Obtain Azure connection alias for Private Link](#obtain-azure-connection-alias-for-private-link). -### Obtaining the Private DNS Hostname {#obtaining-the-private-dns-hostname} +### Obtaining the private DNS hostname {#obtaining-the-private-dns-hostname} #### Option 1: ClickHouse Cloud console {#option-1-clickhouse-cloud-console-3} @@ -488,7 +488,7 @@ Address: 10.0.0.4 Most likely, the Private Endpoint GUID was not added to the service allow-list. Revisit the [_Add Private Endpoint GUID to your services allow-list_ step](#add-private-endpoint-guid-to-services-allow-list). -### Private Endpoint is in Pending state {#private-endpoint-is-in-pending-state} +### Private Endpoint is in pending state {#private-endpoint-is-in-pending-state} Most likely, the Private Endpoint GUID was not added to the service allow-list. Revisit the [_Add Private Endpoint GUID to your services allow-list_ step](#add-private-endpoint-guid-to-services-allow-list). @@ -523,7 +523,7 @@ Early data was not sent Verify return code: 0 (ok) ``` -### Checking Private Endpoint filters {#checking-private-endpoint-filters} +### Checking private endpoint filters {#checking-private-endpoint-filters} Set the following environment variables before running any commands: diff --git a/docs/cloud/security/cloud-access-management/cloud-authentication.md b/docs/cloud/security/cloud-access-management/cloud-authentication.md index 3c4b5b4f8d0..c0138a1d18e 100644 --- a/docs/cloud/security/cloud-access-management/cloud-authentication.md +++ b/docs/cloud/security/cloud-access-management/cloud-authentication.md @@ -8,11 +8,11 @@ description: 'This guide explains some good practices for configuring your authe import ScalePlanFeatureBadge from '@theme/badges/ScalePlanFeatureBadge' import EnterprisePlanFeatureBadge from '@theme/badges/EnterprisePlanFeatureBadge' -# Cloud Authentication +# Cloud authentication ClickHouse Cloud provides a number of ways to authenticate. This guide explains some good practices for configuring your authentication. 
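A hedged sketch of what those variables typically look like; the names mirror the `${KEY_ID:?}` and `${KEY_SECRET:?}` placeholders used by the `curl` calls in this guide, and the organization and service ID variable names are assumptions:

```bash
# Illustrative placeholders; substitute the values for your own organization and service.
export KEY_ID="<clickhouse-cloud-api-key-id>"
export KEY_SECRET="<clickhouse-cloud-api-key-secret>"
export ORG_ID="<clickhouse-cloud-organization-id>"
export INSTANCE_ID="<clickhouse-cloud-service-id>"
```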
Always check with your security team when selecting authentication methods. -## Password Settings {#password-settings} +## Password settings {#password-settings} Minimum password settings for our console and services (databases) currently comply with [NIST 800-63B](https://pages.nist.gov/800-63-3/sp800-63b.html#sec4) Authenticator Assurance Level 1: - Minimum 12 characters @@ -22,15 +22,15 @@ Minimum password settings for our console and services (databases) currently com - 1 number - 1 special character -## Email + Password {#email--password} +## Email and password {#email--password} ClickHouse Cloud allows you to authenticate with an email address and password. When using this method the best way to protect your ClickHouse account use a strong password. There are many online resources to help you devise a password you can remember. Alternatively, you can use a random password generator and store your password in a password manager for increased security. -## SSO Using Google or Microsoft Social Authentication {#sso-using-google-or-microsoft-social-authentication} +## SSO using Google or Microsoft social authentication {#sso-using-google-or-microsoft-social-authentication} If your company uses Google Workspace or Microsoft 365, you can leverage your current single sign-on setup within ClickHouse Cloud. To do this, simply sign up using your company email address and invite other users using their company email. The effect is that your users must login using your company's login flows, whether via your identity provider or directly through Google or Microsoft authentication, before they can authenticate into ClickHouse Cloud. -## Multi-Factor Authentication {#multi-factor-authentication} +## Multi-factor authentication {#multi-factor-authentication} Users with email + password or social authentication can further secure their account using multi-factor authentication (MFA). To set up MFA: 1. Log into console.clickhouse.cloud @@ -118,7 +118,7 @@ Users with email + password or social authentication can further secure their ac ClickHouse Cloud also supports security assertion markup language (SAML) single sign on (SSO). For more information, see [SAML SSO Setup](/cloud/security/saml-setup). -## Database User ID + Password {#database-user-id--password} +## Database user ID and password {#database-user-id--password} Use the SHA256_hash method when [creating user accounts](/sql-reference/statements/create/user.md) to secure passwords. 
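A minimal sketch of that pattern from the command line; the user name, password, and host are placeholders, and the hash is computed locally so the plaintext password is never sent to the server:

```bash
# Illustrative only: hash the password locally, then create the database user with it.
PASSWORD='use-a-long-random-password-here'                   # placeholder
HASH=$(echo -n "${PASSWORD}" | sha256sum | cut -d ' ' -f 1)

clickhouse client --host "<your-service>.clickhouse.cloud" --secure --password \
  --query "CREATE USER app_reader IDENTIFIED WITH sha256_hash BY '${HASH}'"
```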
diff --git a/docs/cloud/security/cloud-access-management/index.md b/docs/cloud/security/cloud-access-management/index.md index 0aa6b9e3c4c..a61339a7258 100644 --- a/docs/cloud/security/cloud-access-management/index.md +++ b/docs/cloud/security/cloud-access-management/index.md @@ -1,6 +1,6 @@ --- slug: /cloud/security/cloud-access-management -title: 'Cloud Access Management' +title: 'Cloud access management' description: 'Cloud Access Management Table Of Contents' --- diff --git a/docs/cloud/security/cmek.md b/docs/cloud/security/cmek.md index a62982207da..f5be60ec17f 100644 --- a/docs/cloud/security/cmek.md +++ b/docs/cloud/security/cmek.md @@ -9,7 +9,7 @@ import Image from '@theme/IdealImage'; import EnterprisePlanFeatureBadge from '@theme/badges/EnterprisePlanFeatureBadge' import cmek_performance from '@site/static/images/_snippets/cmek-performance.png'; -# ClickHouse Enhanced Encryption +# ClickHouse enhanced encryption @@ -89,15 +89,15 @@ Once a service is encrypted with TDE, customers may update the key to enable CME -## Key Rotation {#key-rotation} +## Key rotation {#key-rotation} Once you set up CMEK, rotate the key by following the procedures above for creating a new KMS key and granting permissions. Return to the service settings to paste the new ARN (AWS) or Key Resource Path (GCP) and save the settings. The service will restart to apply the new key. -## Backup and Restore {#backup-and-restore} +## Backup and restore {#backup-and-restore} Backups are encrypted using the same key as the associated service. When you restore an encrypted backup, it creates an encrypted instance that uses the same KMS key as the original instance. If needed, you can rotate the KMS key after restoration; see [Key Rotation](#key-rotation) for more details. -## KMS Key Poller {#kms-key-poller} +## KMS key poller {#kms-key-poller} When using CMEK, the validity of the provided KMS key is checked every 10 minutes. If access to the KMS key is invalid, the ClickHouse service will stop. To resume service, restore access to the KMS key by following the steps in this guide, and then restart the service. diff --git a/docs/cloud/security/common-access-management-queries.md b/docs/cloud/security/common-access-management-queries.md index ddaf0581275..24b98073491 100644 --- a/docs/cloud/security/common-access-management-queries.md +++ b/docs/cloud/security/common-access-management-queries.md @@ -7,7 +7,7 @@ description: 'This article shows the basics of defining SQL users and roles and import CommonUserRolesContent from '@site/docs/_snippets/_users-and-roles-common.md'; -# Common Access Management Queries +# Common access management queries :::tip Self-managed If you are working with self-managed ClickHouse please see [SQL users and roles](/guides/sre/user-management/index.md). diff --git a/docs/cloud/security/compliance-overview.md b/docs/cloud/security/compliance-overview.md index 52bdcd5a3f7..4653c0f09c1 100644 --- a/docs/cloud/security/compliance-overview.md +++ b/docs/cloud/security/compliance-overview.md @@ -8,10 +8,10 @@ description: 'This page describes the security and compliance measures implement import BetaBadge from '@theme/badges/BetaBadge'; import EnterprisePlanFeatureBadge from '@theme/badges/EnterprisePlanFeatureBadge'; -# Security and Compliance Reports +# Security and compliance reports ClickHouse Cloud evaluates the security and compliance needs of our customers and is continuously expanding the program as additional reports are requested. 
For additional information or to download the reports visit our [Trust Center](https://trust.clickhouse.com). -### SOC 2 Type II (Since 2022) {#soc-2-type-ii-since-2022} +### SOC 2 Type II (since 2022) {#soc-2-type-ii-since-2022} System and Organization Controls (SOC) 2 is a report focusing on security, availability, confidentiality, processing integrity and privacy criteria contained in the Trust Services Criteria (TSC) as applied to an organization's systems and is designed to provide assurance about these controls to relying parties (our customers). ClickHouse works with independent external auditors to undergo an audit at least once per year addressing security, availability and processing integrity of our systems and confidentiality and privacy of the data processed by our systems. The report addresses both our ClickHouse Cloud and Bring Your Own Cloud (BYOC) offerings. @@ -19,11 +19,11 @@ System and Organization Controls (SOC) 2 is a report focusing on security, avail International Standards Organization (ISO) 27001 is an international standard for information security. It requires companies to implement an Information Security Management System (ISMS) that includes processes for managing risks, creating and communicating policies, implementing security controls, and monitoring to ensure components remain relevant and effective. ClickHouse conducts internal audits and works with independent external auditors to undergo audits and interim inspections for the 2 years between certificate issuance. -### U.S. DPF (Since 2024) {#us-dpf-since-2024} +### U.S. DPF (since 2024) {#us-dpf-since-2024} The U.S. Data Privacy Framework was developed to provide U.S. organizations with reliable mechanisms for personal data transfers to the United States from the European Union/ European Economic Area, the United Kingdom, and Switzerland that are consistent with EU, UK and Swiss law (https://dataprivacyframework.gov/Program-Overview). ClickHouse self-certified to the framework and is listed on the [Data Privacy Framework List](https://dataprivacyframework.gov/list). -### HIPAA (Since 2024) {#hipaa-since-2024} +### HIPAA (since 2024) {#hipaa-since-2024} @@ -31,7 +31,7 @@ Customers that wish to deploy services to a HIPAA compliant region to load elect The Health Insurance Portability and Accountability Act (HIPAA) of 1996 is a United States based privacy law focused on management of protected health information (PHI). HIPAA has several requirements, including the [Security Rule](https://www.hhs.gov/hipaa/for-professionals/security/index.html), which is focused on protecting electronic personal health information (ePHI). ClickHouse has implemented administrative, physical and technical safeguards to ensure the confidentiality, integrity and security of ePHI stored in designated services. These activities are incorporated in our SOC 2 Type II report available for download in our [Trust Center](https://trust.clickhouse.com). -### PCI Service Provider (Since 2025) {#pci-service-provider-since-2025} +### PCI service provider (since 2025) {#pci-service-provider-since-2025} @@ -39,27 +39,27 @@ Customers that wish to deploy services to PCI compliant regions to load cardhold The [Payment Card Industry Data Security Standard (PCI DSS)](https://www.pcisecuritystandards.org/standards/pci-dss/) is a set of rules created by the PCI Security Standards Council to protect credit card payment data. 
ClickHouse has undergone an external audit with a Qualified Security Assessor (QSA) that resulted in a passing Report on Compliance (ROC) against PCI criteria relevant to storing credit card data. To download a copy of our Attestation on Compliance (AOC) and PCI responsibility overview, please visit our [Trust Center](https://trust.clickhouse.com). -# Privacy Compliance +# Privacy compliance In addition to the items above, ClickHouse maintains internal compliance programs addressing the General Data Protection Regulation (GDPR), California Consumer Privacy Act (CCPA) and other relevant privacy frameworks. Details on personal data that ClickHouse collects, how it is used, how it is protected and other privacy related information can be found in the following locations. -### Legal Documents {#legal-documents} +### Legal documents {#legal-documents} - [Privacy Policy](https://clickhouse.com/legal/privacy-policy) - [Cookie Policy](https://clickhouse.com/legal/cookie-policy) - [Data Privacy Framework Notification](https://clickhouse.com/legal/data-privacy-framework) - [Data Processing Addendum (DPA)](https://clickhouse.com/legal/agreements/data-processing-addendum) -### Processing Locations {#processing-locations} +### Processing locations {#processing-locations} - [Sub-Processors and Affiliates](https://clickhouse.com/legal/agreements/subprocessors) - [Data Processing Locations](https://trust.clickhouse.com) -### Additional Procedures {#additional-procedures} +### Additional procedures {#additional-procedures} - [Personal Data Access](/cloud/security/personal-data-access) - [Delete Account](/cloud/manage/close_account) -# Payment Compliance +# Payment compliance ClickHouse provides a secure method to pay by credit card that is compliant with [PCI SAQ A v4.0](https://www.pcisecuritystandards.org/document_library/). diff --git a/docs/cloud/security/gcp-private-service-connect.md b/docs/cloud/security/gcp-private-service-connect.md index 45135092a10..1113b3346c9 100644 --- a/docs/cloud/security/gcp-private-service-connect.md +++ b/docs/cloud/security/gcp-private-service-connect.md @@ -23,7 +23,7 @@ import gcp_privatelink_pe_dns from '@site/static/images/cloud/security/gcp-priva -Private Service Connect(PSC) is a Google Cloud networking feature that allows consumers to access managed services privately inside their virtual private cloud (VPC) network. Similarly, it allows managed service producers to host these services in their own separate VPC networks and offer a private connection to their consumers. +Private Service Connect (PSC) is a Google Cloud networking feature that allows consumers to access managed services privately inside their virtual private cloud (VPC) network. Similarly, it allows managed service producers to host these services in their own separate VPC networks and offer a private connection to their consumers. Service producers publish their applications to consumers by creating Private Service Connect services. Service consumers access those Private Service Connect services directly through one of these Private Service Connect types. @@ -65,7 +65,7 @@ Code examples are provided below to show how to set up Private Service Connect w - GCP VPC in customer GCP project: `default` ::: -You'll need to retrieve information about your ClickHouse Cloud service. You can do this either via the ClickHouse Cloud Console or the ClickHouse API. 
If you are going to use the ClickHouse API, please set the following environment variables before proceeding: +You'll need to retrieve information about your ClickHouse Cloud service. You can do this either via the ClickHouse Cloud console or the ClickHouse API. If you are going to use the ClickHouse API, please set the following environment variables before proceeding: ```shell REGION= @@ -129,7 +129,7 @@ For any issues related to GCP configuration tasks, contact GCP Support directly. In this section, we're going to create a service endpoint. -### Adding a Private Service Connection {#adding-a-private-service-connection} +### Adding a private service connection {#adding-a-private-service-connection} First up, we're going to create a Private Service Connection. @@ -137,7 +137,7 @@ First up, we're going to create a Private Service Connection. In the Google Cloud console, navigate to **Network services -> Private Service Connect**. -Open Private Service Connect in Google Cloud Console +Open Private Service Connect in Google Cloud console Open the Private Service Connect creation dialog by clicking on the **Connect Endpoint** button. @@ -209,7 +209,7 @@ output "psc_connection_id" { use `endpointServiceId`API or `Service name`console from [Obtain GCP service attachment for Private Service Connect](#obtain-gcp-service-attachment-and-dns-name-for-private-service-connect) step ::: -## Set Private DNS Name for Endpoint {#setting-up-dns} +## Set private DNS name for endpoint {#set-private-dns-name-for-endpoint} :::note There are various ways to configure DNS. Please set up DNS according to your specific use case. @@ -336,7 +336,7 @@ curl --silent --user "${KEY_ID:?}:${KEY_SECRET:?}" -X PATCH -H "Content-Type: ap Each service with Private Link enabled has a public and private endpoint. In order to connect using Private Link, you need to use a private endpoint which will be `privateDnsHostname` taken from [Obtain GCP service attachment for Private Service Connect](#obtain-gcp-service-attachment-and-dns-name-for-private-service-connect). 
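As a hedged aside, once you have the `privateDnsHostname` (obtained in the next step), you can check from a VM inside your VPC that it resolves and that the service answers over the Private Service Connect endpoint; the hostname below is a placeholder:

```bash
# Illustrative only: run from a host inside the VPC where the PSC endpoint was created.
PRIVATE_DNS_HOSTNAME="<privateDnsHostname-from-the-next-step>"   # placeholder
nslookup "${PRIVATE_DNS_HOSTNAME}"                               # should return the endpoint's private IP
openssl s_client -connect "${PRIVATE_DNS_HOSTNAME}:9440" -brief </dev/null   # 9440 is the secure native port; 8443 is HTTPS
```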
-### Getting Private DNS Hostname {#getting-private-dns-hostname} +### Getting private DNS hostname {#getting-private-dns-hostname} #### Option 1: ClickHouse Cloud console {#option-1-clickhouse-cloud-console-3} @@ -412,7 +412,7 @@ Early data was not sent Verify return code: 0 (ok) ``` -### Checking Endpoint filters {#checking-endpoint-filters} +### Checking endpoint filters {#checking-endpoint-filters} #### REST API {#rest-api} diff --git a/docs/cloud/security/index.md b/docs/cloud/security/index.md index 481708bf471..b6a2d56ab1b 100644 --- a/docs/cloud/security/index.md +++ b/docs/cloud/security/index.md @@ -6,7 +6,7 @@ hide_title: true description: 'Landing page for ClickHouse Cloud Security' --- -# ClickHouse Cloud Security +# ClickHouse Cloud security This section delves into security in ClickHouse Cloud and contains the following pages: diff --git a/docs/cloud/security/inviting-new-users.md b/docs/cloud/security/inviting-new-users.md index 8af59db9301..38dc099a6cb 100644 --- a/docs/cloud/security/inviting-new-users.md +++ b/docs/cloud/security/inviting-new-users.md @@ -1,7 +1,7 @@ --- sidebar_label: 'Inviting new users' slug: /cloud/security/inviting-new-users -title: 'Inviting New Users' +title: 'Inviting new users' description: 'This page describes how administrators can invite new users to their organisation and assign roles to them' --- diff --git a/docs/cloud/security/personal-data-access.md b/docs/cloud/security/personal-data-access.md index 8682c52eb60..bcf4514b301 100644 --- a/docs/cloud/security/personal-data-access.md +++ b/docs/cloud/security/personal-data-access.md @@ -20,7 +20,7 @@ Depending on where you are located, applicable law may also provide you addition Please review ClickHouse's Privacy Policy for details on personal data that ClickHouse collects and how it may be used. -## Self Service {#self-service} +## Self service {#self-service} By default, ClickHouse empowers users to view their personal data directly from the ClickHouse console. @@ -38,7 +38,7 @@ Below is a summary of the data ClickHouse collects during account setup and serv Note: URLs with `OrgID` need to be updated to reflect the `OrgID` for your specific account. -### Current Customers {#current-customers} +### Current customers {#current-customers} If you have an account with us and the self-service option has not resolved your personal data issue, you can submit a Data Subject Access Request under the Privacy Policy. To do so, log into your ClickHouse account and open a [support case](https://console.clickhouse.cloud/support). This helps us verify your identity and streamline the process to address your request. @@ -51,11 +51,11 @@ Please be sure to include the following details in your support case: Support Case Form in ClickHouse Cloud -### Individuals Without an Account {#individuals-without-an-account} +### Individuals without an account {#individuals-without-an-account} If you do not have an account with us and the self-service option above has not resolved your personal-data issue, and you wish to make a Data Subject Access Request pursuant to the Privacy Policy, you may submit these requests by email to [privacy@clickhouse.com](mailto:privacy@clickhouse.com). -## Identity Verification {#identity-verification} +## Identity verification {#identity-verification} Should you submit a Data Subject Access Request through email, we may request specific information from you to help us confirm your identity and process your request. Applicable law may require or permit us to decline your request. 
If we decline your request, we will tell you why, subject to legal restrictions.
diff --git a/docs/cloud/security/privacy-compliance-overview.md b/docs/cloud/security/privacy-compliance-overview.md
index e81afc20fd2..e47d422c0a8 100644
--- a/docs/cloud/security/privacy-compliance-overview.md
+++ b/docs/cloud/security/privacy-compliance-overview.md
@@ -5,7 +5,7 @@ title: 'Privacy and Compliance'
description: 'Landing page for privacy and compliance'
---
-# Privacy and Compliance
+# Privacy and compliance
This section contains the following pages:
diff --git a/docs/cloud/security/private-link-overview.md b/docs/cloud/security/private-link-overview.md
index 9e3ada28a27..183362a8e58 100644
--- a/docs/cloud/security/private-link-overview.md
+++ b/docs/cloud/security/private-link-overview.md
@@ -1,14 +1,14 @@
---
-sidebar_label: 'Private Link Overview'
+sidebar_label: 'Private link overview'
slug: /cloud/security/private-link-overview
-title: 'Private Link Overview'
-description: 'Landing page for Private Link'
+title: 'Private link overview'
+description: 'Landing page for private link'
---
-# Private Link Overview
+# Private link overview
ClickHouse Cloud provides the ability to connect your services to your cloud virtual network. Refer to the guides below for your provider:
-- [AWS Private Link](/cloud/security/aws-privatelink.md)
-- [GCP Private Service Connect](/cloud/security/gcp-private-service-connect.md)
-- [Azure Private Link](/cloud/security/azure-privatelink.md)
+- [AWS private link](/cloud/security/aws-privatelink.md)
+- [GCP private service connect](/cloud/security/gcp-private-service-connect.md)
+- [Azure private link](/cloud/security/azure-privatelink.md)
diff --git a/docs/cloud/security/saml-sso-setup.md b/docs/cloud/security/saml-sso-setup.md
index 00134800a43..ea2ad2f2613 100644
--- a/docs/cloud/security/saml-sso-setup.md
+++ b/docs/cloud/security/saml-sso-setup.md
@@ -13,13 +13,13 @@ import samlAzureApp from '@site/static/images/cloud/security/saml-azure-app.png'
import samlAzureClaims from '@site/static/images/cloud/security/saml-azure-claims.png';
import EnterprisePlanFeatureBadge from '@theme/badges/EnterprisePlanFeatureBadge'
-# SAML SSO Setup
+# SAML SSO setup
ClickHouse Cloud supports single-sign on (SSO) via security assertion markup language (SAML). This enables you to sign in securely to your ClickHouse Cloud organization by authenticating with your identity provider (IdP).
-We currently support service provider initiated SSO, multiple organizations using separate connections, and just-in-time provisioning. We do not yet support a system for cross-domain identity management (SCIM) or attribute mapping.
+We currently support service provider-initiated SSO, multiple organizations using separate connections, and just-in-time provisioning. We do not yet support a system for cross-domain identity management (SCIM) or attribute mapping.
## Before you begin {#before-you-begin}
@@ -27,7 +27,7 @@ You will need Admin permissions in your IdP and the **Admin** role in your Click
We recommend setting up a **direct link to your organization** in addition to your SAML connection to simplify the login process. Each IdP handles this differently. Read on for how to do this for your IdP.
-## How to Configure Your IdP {#how-to-configure-your-idp}
+## How to configure your IdP {#how-to-configure-your-idp}
### Steps {#steps}
@@ -51,7 +51,7 @@ We recommend setting up a **direct link to your organization** in addition to yo
Configure your SAML integration
- ClickHouse uses service provider initiated SAML connections. This means you can log in via https://console.clickhouse.cloud or via a direct link. We do not currently support identity provider initiated connections. Basic SAML configurations include the following:
+ ClickHouse uses service provider-initiated SAML connections. This means you can log in via https://console.clickhouse.cloud or via a direct link. We do not currently support identity provider-initiated connections. Basic SAML configurations include the following:
- SSO URL or ACS URL: `https://auth.clickhouse.cloud/login/callback?connection={organizationid}`
@@ -336,35 +336,35 @@ Azure (Microsoft) SAML may also be referred to as Azure Active Directory (AD) or
-## How It Works {#how-it-works}
+## How it works {#how-it-works}
-### Service Provider Initiated SSO {#service-provider-initiated-sso}
+### Service provider-initiated SSO {#service-provider-initiated-sso}
-We only utilize service provider initiated SSO. This means users go to `https://console.clickhouse.cloud` and enter their email address to be redirected to the IdP for authentication. Users already authenticated via your IdP can use the direct link to automatically log in to your organization without entering their email address at the login page.
+We only utilize service provider-initiated SSO. This means users go to `https://console.clickhouse.cloud` and enter their email address to be redirected to the IdP for authentication. Users already authenticated via your IdP can use the direct link to automatically log in to your organization without entering their email address at the login page.
-### Assigning User Roles {#assigning-user-roles}
+### Assigning user roles {#assigning-user-roles}
Users will appear in your ClickHouse Cloud console after they are assigned to your IdP application and log in for the first time. At least one SSO user should be assigned the Admin role in your organization and additional users that log in with SSO will be created with the role of ["Member"](/cloud/security/cloud-access-management/overview#console-users-and-roles), meaning that by default they do not have access to any services and should have their access and roles updated by an Admin.
Use social login or `https://console.clickhouse.cloud/?with=email` to log in with your original authentication method to update your SSO role.
-### Removing Non-SSO Users {#removing-non-sso-users}
+### Removing non-SSO users {#removing-non-sso-users}
Once you have SSO users set up and have assigned at least one user the Admin role, the Admin can remove users using other methods (e.g. social authentication or user ID + password). Google authentication will continue to work after SSO is set up. User ID + password users will be automatically redirected to SSO based on their email domain unless users use `https://console.clickhouse.cloud/?with=email`.
-### Managing Users {#managing-users}
+### Managing users {#managing-users}
ClickHouse Cloud currently implements SAML for SSO. We have not yet implemented SCIM to manage users. This means SSO users must be assigned to the application in your IdP to access your ClickHouse Cloud organization. Users must log in to ClickHouse Cloud once to appear in the **Users** area in the organization. When users are removed in your IdP, they will not be able to log in to ClickHouse Cloud using SSO. However, the SSO user will still show in your organization until an administrator manually removes the user.
-### Multi-Org SSO {#multi-org-sso}
+### Multi-org SSO {#multi-org-sso}
ClickHouse Cloud supports multi-organization SSO by providing a separate connection for each organization. Use the direct link (`https://console.clickhouse.cloud/?connection={organizationid}`) to log in to each respective organization. Be sure to log out of one organization before logging into another.
-## Additional Information {#additional-information}
+## Additional information {#additional-information}
Security is our top priority when it comes to authentication. For this reason, we made a few decisions when implementing SSO that we need you to know.
-- **We only process service provider initiated authentication flows.** Users must navigate to `https://console.clickhouse.cloud` and enter an email address to be redirected to your identity provider. Instructions to add a bookmark application or shortcut are provided for your convenience so your users don't need to remember the URL. +- **We only process service provider-initiated authentication flows.** Users must navigate to `https://console.clickhouse.cloud` and enter an email address to be redirected to your identity provider. Instructions to add a bookmark application or shortcut are provided for your convenience so your users don't need to remember the URL. - **All users assigned to your app via your IdP must have the same email domain.** If you have vendors, contractors or consultants you would like to have access to your ClickHouse account, they must have an email address with the same domain (e.g. user@domain.com) as your employees. diff --git a/docs/cloud/security/setting-ip-filters.md b/docs/cloud/security/setting-ip-filters.md index 7036229b17b..3d8586ef11c 100644 --- a/docs/cloud/security/setting-ip-filters.md +++ b/docs/cloud/security/setting-ip-filters.md @@ -9,7 +9,7 @@ import Image from '@theme/IdealImage'; import ip_filtering_after_provisioning from '@site/static/images/cloud/security/ip-filtering-after-provisioning.png'; import ip_filter_add_single_ip from '@site/static/images/cloud/security/ip-filter-add-single-ip.png'; -## Setting IP Filters {#setting-ip-filters} +## Setting IP filters {#setting-ip-filters} IP access lists filter traffic to ClickHouse services or API keys by specifying which source addresses are permitted to connect. These lists are configurable for each service and each API key. Lists can be configured during service or API key creation, or afterward. diff --git a/docs/concepts/olap.md b/docs/concepts/olap.md index f0033debb20..253bd85310b 100644 --- a/docs/concepts/olap.md +++ b/docs/concepts/olap.md @@ -19,7 +19,7 @@ keywords: ['OLAP'] **Online** …in real-time. -## OLAP from the Business Perspective {#olap-from-the-business-perspective} +## OLAP from the business perspective {#olap-from-the-business-perspective} In recent years business people started to realize the value of data. Companies who make their decisions blindly more often than not fail to keep up with the competition. The data-driven approach of successful companies forces them to collect all data that might be even remotely useful for making business decisions, and imposes on them a need for mechanisms which allow them to analyze this data in a timely manner. Here's where OLAP database management systems (DBMS) come in. @@ -27,7 +27,7 @@ In a business sense, OLAP allows companies to continuously plan, analyze, and re ClickHouse is an OLAP database management system that is pretty often used as a backend for those SaaS solutions for analyzing domain-specific data. However, some businesses are still reluctant to share their data with third-party providers and so an in-house data warehouse scenario is also viable. -## OLAP from the Technical Perspective {#olap-from-the-technical-perspective} +## OLAP from the technical perspective {#olap-from-the-technical-perspective} All database management systems could be classified into two groups: OLAP (Online **Analytical** Processing) and OLTP (Online **Transactional** Processing). The former focuses on building reports, each based on large volumes of historical data, but by doing it less frequently. 
The latter usually handles a continuous stream of transactions, constantly modifying the current state of data. diff --git a/docs/concepts/why-clickhouse-is-so-fast.md b/docs/concepts/why-clickhouse-is-so-fast.md index 272865cc4d4..d3fb45a5bfe 100644 --- a/docs/concepts/why-clickhouse-is-so-fast.md +++ b/docs/concepts/why-clickhouse-is-so-fast.md @@ -15,7 +15,7 @@ We will next explain in more detail what makes ClickHouse so fast, especially co From an architectural perspective, databases consist (at least) of a storage layer and a query processing layer. While the storage layer is responsible for saving, loading, and maintaining the table data, the query processing layer executes user queries. Compared to other databases, ClickHouse provides innovations in both layers that enable extremely fast inserts and Select queries. -## Storage Layer: Concurrent inserts are isolated from each other {#storage-layer-concurrent-inserts-are-isolated-from-each-other} +## Storage layer: concurrent inserts are isolated from each other {#storage-layer-concurrent-inserts-are-isolated-from-each-other} @@ -29,7 +29,7 @@ This approach has several advantages: All data processing can be [offloaded to b 🤿 Deep dive into this in the [On-Disk Format](/docs/academic_overview#3-1-on-disk-format) section of the web version of our VLDB 2024 paper. -## Storage Layer: Concurrent inserts and selects are isolated {#storage-layer-concurrent-inserts-and-selects-are-isolated} +## Storage layer: concurrent inserts and selects are isolated {#storage-layer-concurrent-inserts-and-selects-are-isolated} @@ -37,7 +37,7 @@ Inserts are fully isolated from SELECT queries, and merging inserted data parts 🤿 Deep dive into this in the [Storage Layer](/docs/academic_overview#3-storage-layer) section of the web version of our VLDB 2024 paper. -## Storage Layer: Merge-time computation {#storage-layer-merge-time-computation} +## Storage layer: merge-time computation {#storage-layer-merge-time-computation} @@ -57,7 +57,7 @@ On the other hand, the majority of the runtime of merges is consumed by loading 🤿 Deep dive into this in the [Merge-time Data Transformation](/docs/academic_overview#3-3-merge-time-data-transformation) section of the web version of our VLDB 2024 paper. -## Storage Layer: Data pruning {#storage-layer-data-pruning} +## Storage layer: data pruning {#storage-layer-data-pruning} @@ -73,7 +73,7 @@ All three techniques aim to skip as many rows during full-column reads as possib 🤿 Deep dive into this in the [Data Pruning](/docs/academic_overview#3-2-data-pruning) section of the web version of our VLDB 2024 paper. -## Storage Layer: Data compression {#storage-layer-data-compression} +## Storage layer: data compression {#storage-layer-data-compression} diff --git a/docs/data-modeling/projections/1_projections.md b/docs/data-modeling/projections/1_projections.md index 2e42944a166..48654ca248a 100644 --- a/docs/data-modeling/projections/1_projections.md +++ b/docs/data-modeling/projections/1_projections.md @@ -22,7 +22,7 @@ queries by creating a reordering of data by attributes of interest. This can be: 1. A complete reordering 2. A subset of the original table with a different order -3. A precomputed aggregation (similar to a Materialized View) but with an ordering +3. A precomputed aggregation (similar to a materialized view) but with an ordering aligned to the aggregation.
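To make option 3 above concrete, here is a hedged sketch (hypothetical `events` table and `user_id` column, not taken from this guide) of adding and backfilling a projection:

```sql
-- Hypothetical sketch: store a second ordering of the data inside the table,
-- so queries that filter on user_id can read the projection instead of the base order.
ALTER TABLE events ADD PROJECTION events_by_user
(
    SELECT *
    ORDER BY user_id
);

-- Rewrite existing parts so the projection also covers already-inserted data;
-- new inserts populate it automatically.
ALTER TABLE events MATERIALIZE PROJECTION events_by_user;
```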
@@ -88,8 +88,8 @@ users should be aware of and thus should be deployed sparingly.
- Projections don't allow using different TTL for the source table and the (hidden) target table, materialized views allow different TTLs.
- Lightweight updates and deletes are not supported for tables with projections.
-- Materialized Views can be chained: the target table of one Materialized View
- can be the source table of another Materialized View, and so on. This is not
+- Materialized views can be chained: the target table of one materialized view
+ can be the source table of another materialized view, and so on. This is not
possible with projections.
- Projections don't support joins, but Materialized Views do.
- Projections don't support filters (`WHERE` clause), but Materialized Views do.
diff --git a/docs/data-modeling/schema-design.md b/docs/data-modeling/schema-design.md
index 0027e84390a..51e8dbee58f 100644
--- a/docs/data-modeling/schema-design.md
+++ b/docs/data-modeling/schema-design.md
@@ -139,7 +139,7 @@ Compression in ClickHouse will be impacted by 3 main factors: the ordering key,
The largest initial improvement in compression and query performance can be obtained through a simple process of type optimization. A few simple rules can be applied to optimize the schema:
- **Use strict types** - Our initial schema used Strings for many columns which are clearly numerics. Usage of the correct types will ensure the expected semantics when filtering and aggregating. The same applies to date types, which have been correctly provided in the Parquet files.
-- **Avoid Nullable Columns** - By default the above columns have been assumed to be Null. The Nullable type allows queries to determine the difference between an empty and Null value. This creates a separate column of UInt8 type. This additional column has to be processed every time a user works with a nullable column. This leads to additional storage space used and almost always negatively affects query performance. Only use Nullable if there is a difference between the default empty value for a type and Null. For example, a value of 0 for empty values in the `ViewCount` column will likely be sufficient for most queries and not impact results. If empty values should be treated differently, they can often also be excluded from queries with a filter.
+- **Avoid nullable columns** - By default the above columns have been assumed to be Null. The Nullable type allows queries to determine the difference between an empty and Null value. This creates a separate column of UInt8 type. This additional column has to be processed every time a user works with a nullable column. This leads to additional storage space used and almost always negatively affects query performance. Only use Nullable if there is a difference between the default empty value for a type and Null. For example, a value of 0 for empty values in the `ViewCount` column will likely be sufficient for most queries and not impact results. If empty values should be treated differently, they can often also be excluded from queries with a filter.
Use the minimal precision for numeric types - ClickHouse has a number of numeric types designed for different numeric ranges and precision. Always aim to minimize the number of bits used to represent a column. As well as integers of different size e.g. Int16, ClickHouse offers unsigned variants whose minimum value is 0. These can allow fewer bits to be used for a column e.g. UInt16 has a maximum value of 65535, twice that of an Int16.
Prefer these types over larger signed variants if possible.
- **Minimal precision for date types** - ClickHouse supports a number of date and datetime types. Date and Date32 can be used for storing pure dates, with the latter supporting a larger date range at the expense of more bits. DateTime and DateTime64 provide support for date times. DateTime is limited to second granularity and uses 32 bits. DateTime64, as the name suggests, uses 64 bits but provides support up to nanosecond granularity. As ever, choose the more coarse version acceptable for queries, minimizing the number of bits needed.
- **Use LowCardinality** - Numbers, strings, Date or DateTime columns with a low number of unique values can potentially be encoded using the LowCardinality type. This dictionary encodes values, reducing the size on disk. Consider this for columns with less than 10k unique values.
diff --git a/docs/deployment-guides/horizontal-scaling.md b/docs/deployment-guides/horizontal-scaling.md
index 7215e598799..1aac487a345 100644
--- a/docs/deployment-guides/horizontal-scaling.md
+++ b/docs/deployment-guides/horizontal-scaling.md
@@ -19,7 +19,7 @@ This example architecture is designed to provide scalability. It includes three
## Environment {#environment}
-### Architecture Diagram {#architecture-diagram}
+### Architecture diagram {#architecture-diagram}
Architecture diagram for 2 shards and 1 replica
@@ -41,7 +41,7 @@ Install Clickhouse on three servers following the [instructions for your archive
-## chnode1 configuration {#chnode1-configuration}
+## Configure `chnode1` {#chnode1-configuration}
For `chnode1`, there are five configuration files. You may choose to combine these files into a single file, but for clarity in the documentation it may be simpler to look at them separately. As you read through the configuration files, you will see that most of the configuration is the same between `chnode1` and `chnode2`; the differences will be highlighted.
@@ -183,7 +183,7 @@ Up above a few files ClickHouse Keeper was configured. This configuration file
```
-## chnode2 configuration {#chnode2-configuration}
+## Configure `chnode2` {#chnode2-configuration}
As the configuration is very similar on `chnode1` and `chnode2`, only the differences will be pointed out here.
@@ -309,7 +309,7 @@ The macros configuration has one of the differences between `chnode1` and `chnod
```
-## chnode3 configuration {#chnode3-configuration}
+## Configure `chnode3` {#chnode3-configuration}
As `chnode3` is not storing data and is only used for ClickHouse Keeper to provide the third node in the quorum, `chnode3` has only two configuration files, one to configure the network and logging, and one to configure ClickHouse Keeper.
@@ -481,7 +481,7 @@ SELECT * FROM db1.table1_dist; ``` -## More information about: {#more-information-about} +## More information about {#more-information-about} - The [Distributed Table Engine](/engines/table-engines/special/distributed.md) - [ClickHouse Keeper](/guides/sre/keeper/index.md) diff --git a/docs/deployment-guides/index.md b/docs/deployment-guides/index.md index 7543137f33b..81f83e1c016 100644 --- a/docs/deployment-guides/index.md +++ b/docs/deployment-guides/index.md @@ -4,7 +4,7 @@ title: 'Deployment Guides Overview' description: 'Landing page for the deployment and scaling section' --- -# Deployment and Scaling +# Deployment and scaling This section covers the following topics: @@ -13,4 +13,4 @@ This section covers the following topics: | [Introduction](/architecture/introduction) | | [Scaling Out](/architecture/horizontal-scaling) | | [Replication for fault tolerance](/architecture/replication) | -| [Cluster Deployment](/architecture/cluster-deployment) | +| [Cluster deployment](/architecture/cluster-deployment) | diff --git a/docs/deployment-guides/replicated.md b/docs/deployment-guides/replicated.md index 2fd2463ea28..3821ed80118 100644 --- a/docs/deployment-guides/replicated.md +++ b/docs/deployment-guides/replicated.md @@ -20,7 +20,7 @@ In this architecture, there are five servers configured. Two are used to host co ## Environment {#environment} -### Architecture Diagram {#architecture-diagram} +### Architecture diagram {#architecture-diagram} Architecture diagram for 1 shard and 2 replicas with ReplicatedMergeTree @@ -46,7 +46,7 @@ Install ClickHouse Keeper on the three servers `clickhouse-keeper-01`, `clickhou -## clickhouse-01 configuration {#clickhouse-01-configuration} +## Configure `clickhouse-01` {#clickhouse-01-configuration} For clickhouse-01 there are five configuration files. You may choose to combine these files into a single file, but for clarity in the documentation it may be simpler to look at them separately. As you read through the configuration files you will see that most of the configuration is the same between clickhouse-01 and clickhouse-02; the differences will be highlighted. @@ -143,7 +143,7 @@ This configuration file `use-keeper.xml` is configuring ClickHouse Server to use ``` -## clickhouse-02 configuration {#clickhouse-02-configuration} +## Configure `clickhouse-02` {#clickhouse-02-configuration} As the configuration is very similar on clickhouse-01 and clickhouse-02 only the differences will be pointed out here. @@ -232,7 +232,7 @@ This file is the same on both clickhouse-01 and clickhouse-02. ``` -## clickhouse-keeper-01 configuration {#clickhouse-keeper-01-configuration} +## Configure `clickhouse-keeper-01` {#clickhouse-keeper-01-configuration} @@ -286,7 +286,7 @@ If for any reason a Keeper node is replaced or rebuilt, do not reuse an existing ``` -## clickhouse-keeper-02 configuration {#clickhouse-keeper-02-configuration} +## Configure `clickhouse-keeper-02` {#clickhouse-keeper-02-configuration} There is only one line difference between `clickhouse-keeper-01` and `clickhouse-keeper-02`. `server_id` is set to `2` on this node. @@ -334,7 +334,7 @@ There is only one line difference between `clickhouse-keeper-01` and `clickhouse ``` -## clickhouse-keeper-03 configuration {#clickhouse-keeper-03-configuration} +## Configure `clickhouse-keeper-03` {#clickhouse-keeper-03-configuration} There is only one line difference between `clickhouse-keeper-01` and `clickhouse-keeper-03`. `server_id` is set to `3` on this node. 
diff --git a/docs/dictionary/index.md b/docs/dictionary/index.md index 7eb5e1e8655..5bf2bb9bb5b 100644 --- a/docs/dictionary/index.md +++ b/docs/dictionary/index.md @@ -314,7 +314,7 @@ LIMIT 4 Peak memory usage: 666.82 MiB. ``` -## Advanced Dictionary Topics {#advanced-dictionary-topics} +## Advanced dictionary topics {#advanced-dictionary-topics} ### Choosing the Dictionary `LAYOUT` {#choosing-the-dictionary-layout} diff --git a/docs/faq/general/columnar-database.md b/docs/faq/general/columnar-database.md index 8e5202d499f..41f8c69497f 100644 --- a/docs/faq/general/columnar-database.md +++ b/docs/faq/general/columnar-database.md @@ -10,7 +10,7 @@ import Image from '@theme/IdealImage'; import RowOriented from '@site/static/images/row-oriented.gif'; import ColumnOriented from '@site/static/images/column-oriented.gif'; -# What Is a Columnar Database? {#what-is-a-columnar-database} +# What is a columnar database? {#what-is-a-columnar-database} A columnar database stores the data of each column independently. This allows reading data from disk only for those columns that are used in any given query. The cost is that operations that affect whole rows become proportionally more expensive. The synonym for a columnar database is a column-oriented database management system. ClickHouse is a typical example of such a system. diff --git a/docs/faq/general/dbms-naming.md b/docs/faq/general/dbms-naming.md index 8376ba4cfdf..5c54e43fe07 100644 --- a/docs/faq/general/dbms-naming.md +++ b/docs/faq/general/dbms-naming.md @@ -6,7 +6,7 @@ slug: /faq/general/dbms-naming description: 'Learn about What does "ClickHouse" mean?' --- -# What Does "ClickHouse" Mean? {#what-does-clickhouse-mean} +# What does "ClickHouse" mean? {#what-does-clickhouse-mean} It's a combination of "**Click**stream" and "Data ware**House**". It comes from the original use case at Yandex.Metrica, where ClickHouse was supposed to keep records of all clicks by people from all over the Internet, and it still does the job. You can read more about this use case on [ClickHouse history](../../about-us/history.md) page. diff --git a/docs/faq/general/index.md b/docs/faq/general/index.md index d735adcdfae..abe0a8decd2 100644 --- a/docs/faq/general/index.md +++ b/docs/faq/general/index.md @@ -7,7 +7,7 @@ title: 'General Questions About ClickHouse' description: 'Index page listing general questions about ClickHouse' --- -# General Questions About ClickHouse +# General questions about ClickHouse - [What is ClickHouse?](../../intro.md) - [Why is ClickHouse so fast?](../../concepts/why-clickhouse-is-so-fast.md) diff --git a/docs/faq/general/mapreduce.md b/docs/faq/general/mapreduce.md index 369632883db..b056ea32858 100644 --- a/docs/faq/general/mapreduce.md +++ b/docs/faq/general/mapreduce.md @@ -7,7 +7,7 @@ description: 'This page explains why you would use ClickHouse over MapReduce' keywords: ['MapReduce'] --- -# Why Not Use Something Like MapReduce? {#why-not-use-something-like-mapreduce} +# Why not use something like MapReduce? {#why-not-use-something-like-mapreduce} We can refer to systems like MapReduce as distributed computing systems in which the reduce operation is based on distributed sorting. The most common open-source solution in this class is [Apache Hadoop](http://hadoop.apache.org). 
diff --git a/docs/faq/general/ne-tormozit.md b/docs/faq/general/ne-tormozit.md index b7eb9dbda62..ec09494e9a8 100644 --- a/docs/faq/general/ne-tormozit.md +++ b/docs/faq/general/ne-tormozit.md @@ -7,7 +7,7 @@ description: 'This page explains what "Не тормозит" means' keywords: ['Yandex'] --- -# What Does "Не тормозит" Mean? {#what-does-ne-tormozit-mean} +# What does "Не тормозит" mean? {#what-does-ne-tormozit-mean} We often get this question when people see vintage (limited production) ClickHouse t-shirts. They have the words **"ClickHouse не тормозит"** written in big bold text on the front. diff --git a/docs/faq/general/olap.md b/docs/faq/general/olap.md index c786e01b212..1d3c9a99c13 100644 --- a/docs/faq/general/olap.md +++ b/docs/faq/general/olap.md @@ -20,7 +20,7 @@ Analytical Online : ...in real-time. -## OLAP from the Business Perspective {#olap-from-the-business-perspective} +## OLAP from the business perspective {#olap-from-the-business-perspective} In recent years, business people started to realize the value of data. Companies who make their decisions blindly, more often than not fail to keep up with the competition. The data-driven approach of successful companies forces them to collect all data that might be remotely useful for making business decisions and need mechanisms to timely analyze them. Here's where OLAP database management systems (DBMS) come in. @@ -28,7 +28,7 @@ In a business sense, OLAP allows companies to continuously plan, analyze, and re ClickHouse is an OLAP database management system that is pretty often used as a backend for those SaaS solutions for analyzing domain-specific data. However, some businesses are still reluctant to share their data with third-party providers and an in-house data warehouse scenario is also viable. -## OLAP from the Technical Perspective {#olap-from-the-technical-perspective} +## OLAP from the technical perspective {#olap-from-the-technical-perspective} All database management systems could be classified into two groups: OLAP (Online **Analytical** Processing) and OLTP (Online **Transactional** Processing). Former focuses on building reports, each based on large volumes of historical data, but doing it not so frequently. While the latter usually handle a continuous stream of transactions, constantly modifying the current state of data. diff --git a/docs/faq/integration/json-import.md b/docs/faq/integration/json-import.md index 6e1776a3a18..6363b725a52 100644 --- a/docs/faq/integration/json-import.md +++ b/docs/faq/integration/json-import.md @@ -26,7 +26,7 @@ $ echo '{"foo":"bar"}' | clickhouse-client --query="INSERT INTO test FORMAT JSO Instead of inserting data manually, you might consider to use an [integration tool](../../integrations/index.mdx) instead. -## Useful Settings {#useful-settings} +## Useful settings {#useful-settings} - `input_format_skip_unknown_fields` allows to insert JSON even if there were additional fields not present in table schema (by discarding them). - `input_format_import_nested_json` allows to insert nested JSON objects into columns of [Nested](../../sql-reference/data-types/nested-data-structures/index.md) type. 
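As a hedged sketch of how these settings might be combined (assuming the simple `test` table from the example above), the insert could look like:

```sql
-- Sketch only: discard JSON fields that are not part of the table schema instead of failing.
SET input_format_skip_unknown_fields = 1;

-- The extra "unused" key is dropped on insert.
INSERT INTO test FORMAT JSONEachRow {"foo": "bar", "unused": 42};
```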
diff --git a/docs/faq/integration/oracle-odbc.md b/docs/faq/integration/oracle-odbc.md index 9cf9e0d2a9b..630e99c089c 100644 --- a/docs/faq/integration/oracle-odbc.md +++ b/docs/faq/integration/oracle-odbc.md @@ -6,7 +6,7 @@ toc_priority: 20 description: 'This page provides guidance on what to do if you have a problem with encodings when using Oracle via ODBC' --- -# What If I Have a Problem with Encodings When Using Oracle Via ODBC? {#oracle-odbc-encodings} +# What if I have a problem with encodings when using Oracle via ODBC? {#oracle-odbc-encodings} If you use Oracle as a source of ClickHouse external dictionaries via Oracle ODBC driver, you need to set the correct value for the `NLS_LANG` environment variable in `/etc/default/clickhouse`. For more information, see the [Oracle NLS_LANG FAQ](https://www.oracle.com/technetwork/products/globalization/nls-lang-099431.html). diff --git a/docs/faq/operations/delete-old-data.md b/docs/faq/operations/delete-old-data.md index 4cbd8b65f08..2db4ff949b4 100644 --- a/docs/faq/operations/delete-old-data.md +++ b/docs/faq/operations/delete-old-data.md @@ -6,7 +6,7 @@ toc_priority: 20 description: 'This page answers the question of whether it is possible to delete old records from a ClickHouse table' --- -# Is It Possible to Delete Old Records from a ClickHouse Table? {#is-it-possible-to-delete-old-records-from-a-clickhouse-table} +# Is it possible to delete old records from a ClickHouse table? {#is-it-possible-to-delete-old-records-from-a-clickhouse-table} The short answer is “yes”. ClickHouse has multiple mechanisms that allow freeing up disk space by removing old data. Each mechanism is aimed for different scenarios. diff --git a/docs/faq/operations/production.md b/docs/faq/operations/production.md index 14505193002..193e01358cc 100644 --- a/docs/faq/operations/production.md +++ b/docs/faq/operations/production.md @@ -6,7 +6,7 @@ toc_priority: 10 description: 'This page provides guidance on which ClickHouse version to use in production' --- -# Which ClickHouse Version to Use in Production? {#which-clickhouse-version-to-use-in-production} +# Which ClickHouse version to use in production? {#which-clickhouse-version-to-use-in-production} First of all, let's discuss why people ask this question in the first place. There are two key reasons: @@ -15,7 +15,7 @@ First of all, let's discuss why people ask this question in the first place. The The second reason is more fundamental, so we'll start with that one and then get back to navigating through various ClickHouse releases. -## Which ClickHouse Version Do You Recommend? {#which-clickhouse-version-do-you-recommend} +## Which ClickHouse version do you recommend? {#which-clickhouse-version-do-you-recommend} It's tempting to hire consultants or trust some known experts to get rid of responsibility for your production environment. You install some specific ClickHouse version that someone else recommended; if there's some issue with it - it's not your fault, it's someone else's. This line of reasoning is a big trap. No external person knows better than you what's going on in your company's production environment. @@ -46,7 +46,7 @@ When you have your pre-production environment and testing infrastructure in plac As you might have noticed, there's nothing specific to ClickHouse in the approach described above - people do that for any piece of infrastructure they rely on if they take their production environment seriously. -## How to Choose Between ClickHouse Releases? 
{#how-to-choose-between-clickhouse-releases} +## How to choose between ClickHouse releases? {#how-to-choose-between-clickhouse-releases} If you look into the contents of the ClickHouse package repository, you'll see two kinds of packages: diff --git a/docs/faq/troubleshooting.md b/docs/faq/troubleshooting.md index ddb2f074267..4b1221d7dea 100644 --- a/docs/faq/troubleshooting.md +++ b/docs/faq/troubleshooting.md @@ -4,7 +4,7 @@ slug: /faq/troubleshooting description: 'How to troubleshoot common ClickHouse Cloud error messages.' --- -## ClickHouse Cloud Troubleshooting {#clickhouse-cloud-troubleshooting} +## ClickHouse Cloud troubleshooting {#clickhouse-cloud-troubleshooting} ### Unable to access a ClickHouse Cloud service {#unable-to-access-a-clickhouse-cloud-service} diff --git a/docs/faq/use-cases/index.md b/docs/faq/use-cases/index.md index e86365bdc28..6331eb4d6f0 100644 --- a/docs/faq/use-cases/index.md +++ b/docs/faq/use-cases/index.md @@ -6,7 +6,7 @@ title: 'Questions About ClickHouse Use Cases' description: 'Landing page listing common questions about ClickHouse use cases' --- -# Questions About ClickHouse Use Cases +# Questions about ClickHouse use cases - [Can I use ClickHouse as a time-series database?](/knowledgebase/time-series) - [Can I use ClickHouse as a key-value storage?](/knowledgebase/key-value) diff --git a/docs/faq/use-cases/key-value.md b/docs/faq/use-cases/key-value.md index 49e5e134e6a..b044eb38ef1 100644 --- a/docs/faq/use-cases/key-value.md +++ b/docs/faq/use-cases/key-value.md @@ -6,7 +6,7 @@ toc_priority: 101 description: 'Answers the frequently asked question of whether or not ClickHouse can be used as a key-value storage?' --- -# Can I Use ClickHouse As a Key-Value Storage? {#can-i-use-clickhouse-as-a-key-value-storage} +# Can I use ClickHouse as a key-value storage? {#can-i-use-clickhouse-as-a-key-value-storage} The short answer is **"no"**. The key-value workload is among top positions in the list of cases when **NOT** to use ClickHouse. It's an [OLAP](../../faq/general/olap.md) system after all, while there are many excellent key-value storage systems out there. diff --git a/docs/faq/use-cases/time-series.md b/docs/faq/use-cases/time-series.md index dc4ea344caa..1db9ac6cba3 100644 --- a/docs/faq/use-cases/time-series.md +++ b/docs/faq/use-cases/time-series.md @@ -6,7 +6,7 @@ toc_priority: 101 description: 'Page describing how to use ClickHouse as a time-series database' --- -# Can I Use ClickHouse As a Time-Series Database? {#can-i-use-clickhouse-as-a-time-series-database} +# Can I use ClickHouse as a time-series database? 
{#can-i-use-clickhouse-as-a-time-series-database} _Note: Please see the blog [Working with Time series data in ClickHouse](https://clickhouse.com/blog/working-with-time-series-data-and-functions-ClickHouse) for additional examples of using ClickHouse for time series analysis._ diff --git a/docs/getting-started/example-datasets/brown-benchmark.md b/docs/getting-started/example-datasets/brown-benchmark.md index bde0445dfdb..59f07d3197e 100644 --- a/docs/getting-started/example-datasets/brown-benchmark.md +++ b/docs/getting-started/example-datasets/brown-benchmark.md @@ -91,7 +91,7 @@ clickhouse-client --query "INSERT INTO mgbench.logs2 FORMAT CSVWithNames" < mgbe clickhouse-client --query "INSERT INTO mgbench.logs3 FORMAT CSVWithNames" < mgbench3.csv ``` -## Run benchmark queries: {#run-benchmark-queries} +## Run benchmark queries {#run-benchmark-queries} ```sql USE mgbench; diff --git a/docs/getting-started/example-datasets/cell-towers.md b/docs/getting-started/example-datasets/cell-towers.md index 22bda6de53a..5626cb4bf89 100644 --- a/docs/getting-started/example-datasets/cell-towers.md +++ b/docs/getting-started/example-datasets/cell-towers.md @@ -40,7 +40,7 @@ Here is a preview of the dashboard created in this guide: Dashboard of cell towers by radio type in mcc 204 -## Get the Dataset {#get-the-dataset} +## Get the dataset {#get-the-dataset} This dataset is from [OpenCelliD](https://www.opencellid.org/) - The world's largest Open Database of Cell Towers. @@ -167,7 +167,7 @@ Based on the above query and the [MCC list](https://en.wikipedia.org/wiki/Mobile You may want to create a [Dictionary](../../sql-reference/dictionaries/index.md) in ClickHouse to decode these values. -## Use case: Incorporate geo data {#use-case} +## Use case: incorporate geo data {#use-case} Using the [`pointInPolygon`](/sql-reference/functions/geo/coordinates.md/#pointinpolygon) function. @@ -312,7 +312,7 @@ To build a Superset dashboard using the OpenCelliD dataset you should: If **ClickHouse Connect** is not one of your options, then you will need to install it. The command is `pip install clickhouse-connect`, and more info is [available here](https://pypi.org/project/clickhouse-connect/). ::: -#### Add your connection details: {#add-your-connection-details} +#### Add your connection details {#add-your-connection-details} :::tip Make sure that you set **SSL** on when connecting to ClickHouse Cloud or other ClickHouse systems that enforce the use of SSL. diff --git a/docs/getting-started/example-datasets/github.md b/docs/getting-started/example-datasets/github.md index 55c90fc2d9f..e598f655766 100644 --- a/docs/getting-started/example-datasets/github.md +++ b/docs/getting-started/example-datasets/github.md @@ -2414,7 +2414,7 @@ FORMAT PrettyCompactMonoBlock 3 rows in set. Elapsed: 0.170 sec. Processed 611.53 thousand rows, 41.76 MB (3.60 million rows/s., 246.07 MB/s.) ``` -## Unsolved Questions {#unsolved-questions} +## Unsolved questions {#unsolved-questions} ### Git blame {#git-blame} diff --git a/docs/getting-started/example-datasets/menus.md b/docs/getting-started/example-datasets/menus.md index 3c8d6c77ce5..bace0087323 100644 --- a/docs/getting-started/example-datasets/menus.md +++ b/docs/getting-started/example-datasets/menus.md @@ -14,7 +14,7 @@ The data is in public domain. The data is from library's archive and it may be incomplete and difficult for statistical analysis. Nevertheless it is also very yummy. 
The size is just 1.3 million records about dishes in the menus — it's a very small data volume for ClickHouse, but it's still a good example. -## Download the Dataset {#download-dataset} +## Download the dataset {#download-dataset} Run the command: @@ -28,7 +28,7 @@ md5sum 2021_08_01_07_01_17_data.tgz Replace the link to the up to date link from http://menus.nypl.org/data if needed. Download size is about 35 MB. -## Unpack the Dataset {#unpack-dataset} +## Unpack the dataset {#unpack-dataset} ```bash tar xvf 2021_08_01_07_01_17_data.tgz @@ -42,7 +42,7 @@ The data is normalized consisted of four tables: - `MenuPage` — Information about the pages in the menus, because every page belongs to some menu. - `MenuItem` — An item of the menu. A dish along with its price on some menu page: links to dish and menu page. -## Create the Tables {#create-tables} +## Create the tables {#create-tables} We use [Decimal](../../sql-reference/data-types/decimal.md) data type to store prices. @@ -109,7 +109,7 @@ CREATE TABLE menu_item ) ENGINE = MergeTree ORDER BY id; ``` -## Import the Data {#import-data} +## Import the data {#import-data} Upload data into ClickHouse, run: @@ -128,7 +128,7 @@ We disable [input_format_null_as_default](/operations/settings/formats#input_for The setting [date_time_input_format best_effort](/operations/settings/formats#date_time_input_format) allows to parse [DateTime](../../sql-reference/data-types/datetime.md) fields in wide variety of formats. For example, ISO-8601 without seconds like '2000-01-01 01:02' will be recognized. Without this setting only fixed DateTime format is allowed. -## Denormalize the Data {#denormalize-data} +## Denormalize the data {#denormalize-data} Data is presented in multiple tables in [normalized form](https://en.wikipedia.org/wiki/Database_normalization#Normal_forms). It means you have to perform [JOIN](/sql-reference/statements/select/join) if you want to query, e.g. dish names from menu items. For typical analytical tasks it is way more efficient to deal with pre-JOINed data to avoid doing `JOIN` every time. It is called "denormalized" data. @@ -180,7 +180,7 @@ FROM menu_item JOIN menu ON menu_page.menu_id = menu.id; ``` -## Validate the Data {#validate-data} +## Validate the data {#validate-data} Query: @@ -196,7 +196,7 @@ Result: └─────────┘ ``` -## Run Some Queries {#run-queries} +## Run some queries {#run-queries} ### Averaged historical prices of dishes {#query-averaged-historical-prices} @@ -240,7 +240,7 @@ Result: Take it with a grain of salt. -### Burger Prices {#query-burger-prices} +### Burger prices {#query-burger-prices} Query: @@ -354,6 +354,6 @@ Result: At least they have caviar with vodka. Very nice. -## Online Playground {#playground} +## Online playground {#playground} The data is uploaded to ClickHouse Playground, [example](https://sql.clickhouse.com?query_id=KB5KQJJFNBKHE5GBUJCP1B). diff --git a/docs/getting-started/example-datasets/metrica.md b/docs/getting-started/example-datasets/metrica.md index e17b38016ab..b06074140bd 100644 --- a/docs/getting-started/example-datasets/metrica.md +++ b/docs/getting-started/example-datasets/metrica.md @@ -6,7 +6,7 @@ slug: /getting-started/example-datasets/metrica title: 'Anonymized Web Analytics' --- -# Anonymized Web Analytics Data +# Anonymized web analytics data This dataset consists of two tables containing anonymized web analytics data with hits (`hits_v1`) and visits (`visits_v1`). @@ -14,7 +14,7 @@ The tables can be downloaded as compressed `tsv.xz` files. 
In addition to the sa ## Download and ingest the data {#download-and-ingest-the-data} -### Download the hits compressed TSV file: {#download-the-hits-compressed-tsv-file} +### Download the hits compressed TSV file {#download-the-hits-compressed-tsv-file} ```bash curl https://datasets.clickhouse.com/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv @@ -41,7 +41,7 @@ Or for hits_100m_obfuscated clickhouse-client --query="CREATE TABLE default.hits_100m_obfuscated (WatchID UInt64, JavaEnable UInt8, Title String, GoodEvent Int16, EventTime DateTime, EventDate Date, CounterID UInt32, ClientIP UInt32, RegionID UInt32, UserID UInt64, CounterClass Int8, OS UInt8, UserAgent UInt8, URL String, Referer String, Refresh UInt8, RefererCategoryID UInt16, RefererRegionID UInt32, URLCategoryID UInt16, URLRegionID UInt32, ResolutionWidth UInt16, ResolutionHeight UInt16, ResolutionDepth UInt8, FlashMajor UInt8, FlashMinor UInt8, FlashMinor2 String, NetMajor UInt8, NetMinor UInt8, UserAgentMajor UInt16, UserAgentMinor FixedString(2), CookieEnable UInt8, JavascriptEnable UInt8, IsMobile UInt8, MobilePhone UInt8, MobilePhoneModel String, Params String, IPNetworkID UInt32, TraficSourceID Int8, SearchEngineID UInt16, SearchPhrase String, AdvEngineID UInt8, IsArtifical UInt8, WindowClientWidth UInt16, WindowClientHeight UInt16, ClientTimeZone Int16, ClientEventTime DateTime, SilverlightVersion1 UInt8, SilverlightVersion2 UInt8, SilverlightVersion3 UInt32, SilverlightVersion4 UInt16, PageCharset String, CodeVersion UInt32, IsLink UInt8, IsDownload UInt8, IsNotBounce UInt8, FUniqID UInt64, OriginalURL String, HID UInt32, IsOldCounter UInt8, IsEvent UInt8, IsParameter UInt8, DontCountHits UInt8, WithHash UInt8, HitColor FixedString(1), LocalEventTime DateTime, Age UInt8, Sex UInt8, Income UInt8, Interests UInt16, Robotness UInt8, RemoteIP UInt32, WindowName Int32, OpenerName Int32, HistoryLength Int16, BrowserLanguage FixedString(2), BrowserCountry FixedString(2), SocialNetwork String, SocialAction String, HTTPError UInt16, SendTiming UInt32, DNSTiming UInt32, ConnectTiming UInt32, ResponseStartTiming UInt32, ResponseEndTiming UInt32, FetchTiming UInt32, SocialSourceNetworkID UInt8, SocialSourcePage String, ParamPrice Int64, ParamOrderID String, ParamCurrency FixedString(3), ParamCurrencyID UInt16, OpenstatServiceName String, OpenstatCampaignID String, OpenstatAdID String, OpenstatSourceID String, UTMSource String, UTMMedium String, UTMCampaign String, UTMContent String, UTMTerm String, FromTag String, HasGCLID UInt8, RefererHash UInt64, URLHash UInt64, CLID UInt32) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192" ``` -### Import the hits data: {#import-the-hits-data} +### Import the hits data {#import-the-hits-data} ```bash cat hits_v1.tsv | clickhouse-client --query "INSERT INTO datasets.hits_v1 FORMAT TSV" --max_insert_block_size=100000 @@ -57,7 +57,7 @@ clickhouse-client --query "SELECT COUNT(*) FROM datasets.hits_v1" 8873898 ``` -### Download the visits compressed TSV file: {#download-the-visits-compressed-tsv-file} +### Download the visits compressed TSV file {#download-the-visits-compressed-tsv-file} ```bash curl https://datasets.clickhouse.com/visits/tsv/visits_v1.tsv.xz | unxz --threads=`nproc` > visits_v1.tsv @@ -131,7 +131,7 @@ FORMAT PrettyCompact" └────────────┴─────────┴────────┘ ``` -## Next Steps {#next-steps} +## Next steps {#next-steps} [A Practical Introduction to 
Sparse Primary Indexes in ClickHouse](/guides/best-practices/sparse-primary-indexes.md) uses the hits dataset to discuss the differences in ClickHouse indexing compared to traditional relational databases, how ClickHouse builds and uses a sparse primary index, and indexing best practices. diff --git a/docs/getting-started/example-datasets/nyc-taxi.md b/docs/getting-started/example-datasets/nyc-taxi.md index 680fbe2fbb6..2471a4137bb 100644 --- a/docs/getting-started/example-datasets/nyc-taxi.md +++ b/docs/getting-started/example-datasets/nyc-taxi.md @@ -56,7 +56,7 @@ ENGINE = MergeTree PRIMARY KEY (pickup_datetime, dropoff_datetime); ``` -## Load the Data directly from Object Storage {#load-the-data-directly-from-object-storage} +## Load the data directly from object storage {#load-the-data-directly-from-object-storage} Users' can grab a small subset of the data (3 million rows) for getting familiar with it. The data is in TSV files in object storage, which is easily streamed into ClickHouse Cloud using the `s3` table function. @@ -126,7 +126,7 @@ FROM gcs( -## Sample Queries {#sample-queries} +## Sample queries {#sample-queries} The following queries are executed on the sample described above. Users can run the sample queries on the full dataset in [sql.clickhouse.com](https://sql.clickhouse.com/?query=U0VMRUNUIGNvdW50KCkgRlJPTSBueWNfdGF4aS50cmlwcw&chart=eyJ0eXBlIjoibGluZSIsImNvbmZpZyI6eyJ0aXRsZSI6IlRlbXBlcmF0dXJlIGJ5IGNvdW50cnkgYW5kIHllYXIiLCJ4YXhpcyI6InllYXIiLCJ5YXhpcyI6ImNvdW50KCkiLCJzZXJpZXMiOiJDQVNUKHBhc3Nlbmdlcl9jb3VudCwgJ1N0cmluZycpIn19), modifying the queries below to use the table `nyc_taxi.trips`. @@ -183,7 +183,7 @@ GROUP BY passenger_count ORDER BY passenger_count ASC ``` -## Download of Prepared Partitions {#download-of-prepared-partitions} +## Download of prepared partitions {#download-of-prepared-partitions} :::note The following steps provide information about the original dataset, and a method for loading prepared partitions into a self-managed ClickHouse server environment. @@ -209,7 +209,7 @@ $ clickhouse-client --query "select count(*) from datasets.trips_mergetree" If you will run the queries described below, you have to use the full table name, `datasets.trips_mergetree`. ::: -## Results on Single Server {#results-on-single-server} +## Results on single server {#results-on-single-server} Q1: diff --git a/docs/getting-started/example-datasets/nypd_complaint_data.md b/docs/getting-started/example-datasets/nypd_complaint_data.md index ea0e240f7ee..71412ea946d 100644 --- a/docs/getting-started/example-datasets/nypd_complaint_data.md +++ b/docs/getting-started/example-datasets/nypd_complaint_data.md @@ -365,7 +365,7 @@ of `ORDER BY` or `PRIMARY KEY` must be specified. Here are some guidelines on d columns to includes in `ORDER BY`, and more information is in the *Next Steps* section at the end of this document. -### Order By and Primary Key clauses {#order-by-and-primary-key-clauses} +### `ORDER BY` and `PRIMARY KEY` clauses {#order-by-and-primary-key-clauses} - The `ORDER BY` tuple should include fields that are used in query filters - To maximize compression on disk the `ORDER BY` tuple should be ordered by ascending cardinality @@ -485,7 +485,7 @@ table: NYPD_Complaint 1 row in set. Elapsed: 0.001 sec. ``` -## Preprocess and Import Data {#preprocess-import-data} +## Preprocess and import data {#preprocess-import-data} We will use `clickhouse-local` tool for data preprocessing and `clickhouse-client` to upload it. 
@@ -539,7 +539,7 @@ cat ${HOME}/NYPD_Complaint_Data_Current__Year_To_Date_.tsv \ | clickhouse-client --query='INSERT INTO NYPD_Complaint FORMAT TSV' ``` -## Validate the Data {#validate-data} +## Validate the data {#validate-data} :::note The dataset changes once or more per year, your counts may not match what is in this document. @@ -580,7 +580,7 @@ Result: ``` -## Run Some Queries {#run-queries} +## Run some queries {#run-queries} ### Query 1. Compare the number of complaints by month {#query-1-compare-the-number-of-complaints-by-month} @@ -618,7 +618,7 @@ Query id: 7fbd4244-b32a-4acf-b1f3-c3aa198e74d9 12 rows in set. Elapsed: 0.006 sec. Processed 208.99 thousand rows, 417.99 KB (37.48 million rows/s., 74.96 MB/s.) ``` -### Query 2. Compare total number of complaints by Borough {#query-2-compare-total-number-of-complaints-by-borough} +### Query 2. Compare total number of complaints by borough {#query-2-compare-total-number-of-complaints-by-borough} Query: @@ -648,6 +648,6 @@ Query id: 8cdcdfd4-908f-4be0-99e3-265722a2ab8d 6 rows in set. Elapsed: 0.008 sec. Processed 208.99 thousand rows, 209.43 KB (27.14 million rows/s., 27.20 MB/s.) ``` -## Next Steps {#next-steps} +## Next steps {#next-steps} [A Practical Introduction to Sparse Primary Indexes in ClickHouse](/guides/best-practices/sparse-primary-indexes.md) discusses the differences in ClickHouse indexing compared to traditional relational databases, how ClickHouse builds and uses a sparse primary index, and indexing best practices. diff --git a/docs/getting-started/example-datasets/ontime.md b/docs/getting-started/example-datasets/ontime.md index 2c7b7c9f7e4..add5a46654a 100644 --- a/docs/getting-started/example-datasets/ontime.md +++ b/docs/getting-started/example-datasets/ontime.md @@ -125,7 +125,7 @@ CREATE TABLE `ontime` ORDER BY (Year, Quarter, Month, DayofMonth, FlightDate, IATA_CODE_Reporting_Airline); ``` -## Import from Raw Data {#import-from-raw-data} +## Import from raw data {#import-from-raw-data} Downloading data: diff --git a/docs/getting-started/example-datasets/uk-price-paid.md b/docs/getting-started/example-datasets/uk-price-paid.md index 1a1fa2d14ca..a80137b76b9 100644 --- a/docs/getting-started/example-datasets/uk-price-paid.md +++ b/docs/getting-started/example-datasets/uk-price-paid.md @@ -14,7 +14,7 @@ This data contains prices paid for real-estate property in England and Wales. Th - Description of the fields: https://www.gov.uk/guidance/about-the-price-paid-data - Contains HM Land Registry data © Crown copyright and database right 2021. This data is licensed under the Open Government Licence v3.0. -## Create the Table {#create-table} +## Create the table {#create-table} ```sql CREATE DATABASE uk; @@ -40,7 +40,7 @@ ENGINE = MergeTree ORDER BY (postcode1, postcode2, addr1, addr2); ``` -## Preprocess and Insert the Data {#preprocess-import-data} +## Preprocess and insert the data {#preprocess-import-data} We will use the `url` function to stream the data into ClickHouse. We need to preprocess some of the incoming data first, which includes: - splitting the `postcode` to two different columns - `postcode1` and `postcode2`, which is better for storage and queries @@ -93,7 +93,7 @@ FROM url( Wait for the data to insert - it will take a minute or two depending on the network speed. -## Validate the Data {#validate-data} +## Validate the data {#validate-data} Let's verify it worked by seeing how many rows were inserted: @@ -112,11 +112,11 @@ WHERE name = 'uk_price_paid' Notice the size of the table is just 221.43 MiB! 
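As a hedged aside (not part of the original guide, and assuming the table lives in the `uk` database created above), one way to see where that small footprint comes from is to compare the compressed and uncompressed sizes recorded in `system.parts`:

```sql
-- Sketch: on-disk (compressed) size vs. uncompressed size for the uk_price_paid table.
SELECT
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.parts
WHERE active AND (database = 'uk') AND (table = 'uk_price_paid');
```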
-## Run Some Queries {#run-queries}
+## Run some queries {#run-queries}
Let's run some queries to analyze the data:
-### Query 1. Average Price Per Year {#average-price}
+### Query 1. Average price per year {#average-price}
```sql runnable
SELECT
@@ -129,7 +129,7 @@ GROUP BY year
ORDER BY year
```
-### Query 2. Average Price per Year in London {#average-price-london}
+### Query 2. Average price per year in London {#average-price-london}
```sql runnable
SELECT
@@ -145,7 +145,7 @@ ORDER BY year
Something happened to home prices in 2020! But that is probably not a surprise...
-### Query 3. The Most Expensive Neighborhoods {#most-expensive-neighborhoods}
+### Query 3. The most expensive neighborhoods {#most-expensive-neighborhoods}
```sql runnable
SELECT
@@ -168,7 +168,7 @@ LIMIT 100
We can speed up these queries with projections. See ["Projections"](/data-modeling/projections) for examples with this dataset.
-### Test it in the Playground {#playground}
+### Test it in the playground {#playground}
The dataset is also available in the [Online Playground](https://sql.clickhouse.com?query_id=TRCWH5ZETY4SEEK8ISCCAX).
diff --git a/docs/getting-started/index.md b/docs/getting-started/index.md
index 31100239e0b..54e73320931 100644
--- a/docs/getting-started/index.md
+++ b/docs/getting-started/index.md
@@ -8,7 +8,7 @@ slug: /getting-started/example-datasets/
title: 'Tutorials and Example Datasets'
---
-# Tutorials and Example Datasets
+# Tutorials and example datasets
We have a lot of resources for helping you get started and learn how ClickHouse works:
diff --git a/docs/getting-started/install/_snippets/_macos.md b/docs/getting-started/install/_snippets/_macos.md
index 7b2eb975d4d..d3b21560ff4 100644
--- a/docs/getting-started/install/_snippets/_macos.md
+++ b/docs/getting-started/install/_snippets/_macos.md
@@ -16,7 +16,7 @@ the ClickHouse community [homebrew formula](https://formulae.brew.sh/cask/clickh
brew install --cask clickhouse
```
-## Fix the developer verification error in MacOS {#fix-developer-verification-error-macos}
+## Fix the developer verification error in macOS {#fix-developer-verification-error-macos}
If you install ClickHouse using `brew`, you may encounter an error from macOS. By default, macOS will not run applications or tools created by a developer who cannot be verified.
diff --git a/docs/getting-started/playground.md b/docs/getting-started/playground.md
index 60fe41cba74..90371035308 100644
--- a/docs/getting-started/playground.md
+++ b/docs/getting-started/playground.md
@@ -7,7 +7,7 @@ slug: /getting-started/playground
title: 'ClickHouse Playground'
---
-# ClickHouse Playground
+# ClickHouse playground
[ClickHouse Playground](https://sql.clickhouse.com) allows people to experiment with ClickHouse by running queries instantly, without setting up their server or cluster. Several example datasets are available in Playground.
diff --git a/docs/guides/best-practices/avoidnullablecolumns.md b/docs/guides/best-practices/avoidnullablecolumns.md
index bcd9f6073a1..b7666559481 100644
--- a/docs/guides/best-practices/avoidnullablecolumns.md
+++ b/docs/guides/best-practices/avoidnullablecolumns.md
@@ -1,7 +1,7 @@
---
slug: /optimize/avoid-nullable-columns
-sidebar_label: 'Avoid Nullable Columns'
-title: 'Avoid Nullable Columns'
+sidebar_label: 'Avoid nullable columns'
+title: 'Avoid nullable columns'
description: 'Why Nullable Columns should be avoided in ClickHouse'
---
diff --git a/docs/guides/best-practices/index.md b/docs/guides/best-practices/index.md
index 1b7acbac54a..6ff0bd04c5f 100644
--- a/docs/guides/best-practices/index.md
+++ b/docs/guides/best-practices/index.md
@@ -5,7 +5,7 @@ description: 'Overview page of Performance and Optimizations'
title: 'Performance and Optimizations'
---
-# Performance and Optimizations
+# Performance and optimizations
This section contains tips and best practices for improving performance with ClickHouse.
We recommend users read [Core Concepts](/parts) as a precursor to this section,
@@ -22,7 +22,7 @@ which covers the main concepts required to improve performance.
| [Bulk Inserts](/optimize/bulk-inserts) | Explains the benefits of using bulk inserts in ClickHouse. |
| [Asynchronous Inserts](/optimize/asynchronous-inserts) | Focuses on ClickHouse's asynchronous inserts feature. It likely explains how asynchronous inserts work (batching data on the server for efficient insertion) and their benefits (improved performance by offloading insert processing). It might also cover enabling asynchronous inserts and considerations for using them effectively in your ClickHouse environment. |
| [Avoid Mutations](/optimize/avoid-mutations) | Discusses the importance of avoiding mutations (updates and deletes) in ClickHouse. It recommends using append-only inserts for optimal performance and suggests alternative approaches for handling data changes. |
-| [Avoid Nullable Columns](/optimize/avoid-nullable-columns) | Discusses why you may want to avoid Nullable columns to save space and increase performance. Demonstrates how to set a default value for a column. |
+| [Avoid nullable columns](/optimize/avoid-nullable-columns) | Discusses why you may want to avoid nullable columns to save space and increase performance. Demonstrates how to set a default value for a column. |
| [Avoid Optimize Final](/optimize/avoidoptimizefinal) | Explains how the `OPTIMIZE TABLE ... FINAL` query is resource-intensive and suggests alternative approaches to optimize ClickHouse performance. |
| [Analyzer](/operations/analyzer) | Looks at the ClickHouse Analyzer, a tool for analyzing and optimizing queries. Discusses how the Analyzer works, its benefits (e.g., identifying performance bottlenecks), and how to use it to improve your ClickHouse queries' efficiency. |
| [Query Profiling](/operations/optimizing-performance/sampling-query-profiler) | Explains ClickHouse's Sampling Query Profiler, a tool that helps analyze query execution.
|
diff --git a/docs/guides/best-practices/query-optimization.md b/docs/guides/best-practices/query-optimization.md
index 4ed659e9c7b..842cc5bb450 100644
--- a/docs/guides/best-practices/query-optimization.md
+++ b/docs/guides/best-practices/query-optimization.md
@@ -11,7 +11,7 @@ import Image from '@theme/IdealImage';
# A simple guide for query optimization
-This section aims to illustrate through common scenarios how to use different performance and optimization techniques, such as [analyzer](/operations/analyzer), [query profiling](/operations/optimizing-performance/sampling-query-profiler) or [avoid Nullable Columns](/optimize/avoid-nullable-columns), in order to improve your ClickHouse query performances.
+This section aims to illustrate through common scenarios how to use different performance and optimization techniques, such as [analyzer](/operations/analyzer), [query profiling](/operations/optimizing-performance/sampling-query-profiler) or [avoid nullable columns](/optimize/avoid-nullable-columns), in order to improve your ClickHouse query performance.
## Understand query performance {#understand-query-performance}
diff --git a/docs/guides/best-practices/skipping-indexes.md b/docs/guides/best-practices/skipping-indexes.md
index 60d2c7ed0e6..d2488ab8ab0 100644
--- a/docs/guides/best-practices/skipping-indexes.md
+++ b/docs/guides/best-practices/skipping-indexes.md
@@ -10,7 +10,7 @@ import simple_skip from '@site/static/images/guides/best-practices/simple_skip.p
import bad_skip from '@site/static/images/guides/best-practices/bad_skip.png';
import Image from '@theme/IdealImage';
-# Understanding ClickHouse Data Skipping Indexes
+# Understanding ClickHouse data skipping indexes
## Introduction {#introduction}
@@ -22,7 +22,7 @@ In a traditional relational database, one approach to this problem is to attach
Instead, ClickHouse provides a different type of index, which in specific circumstances can significantly improve query speed. These structures are labeled "Skip" indexes because they enable ClickHouse to skip reading significant chunks of data that are guaranteed to have no matching values.
-## Basic Operation {#basic-operation}
+## Basic operation {#basic-operation}
Users can only employ Data Skipping Indexes on the MergeTree family of tables. Each data skipping index has four primary arguments:
@@ -113,9 +113,11 @@ example, the debug log shows that the skip index dropped all but two granules:
```sql
default.skip_table (933d4b2c-8cea-4bf9-8c93-c56e900eefd1) (SelectExecutor): Index `vix` has dropped 6102/6104 granules.
```
-## Skip Index Types {#skip-index-types}
+## Skip index types {#skip-index-types}
+
### minmax {#minmax}
+
This lightweight index type requires no parameters. It stores the minimum and maximum values of the index expression for each block (if the expression is a tuple, it separately stores the values for each member of the element
of the tuple). This type is ideal for columns that tend to be loosely sorted by value. This type of index only works correctly with a scalar or tuple expression -- the index will never be applied to expressions that return an array or map data type.
+
### set {#set}
+
This lightweight index type accepts a single parameter of the max_size of the value set per block (0 permits an unlimited number of discrete values). This set contains all values in the block (or is empty if the number of values exceeds the max_size).
This index type works well with columns with low cardinality within each set of granules (essentially, "clumped together") but higher cardinality overall. The cost, performance, and effectiveness of this index is dependent on the cardinality within blocks. If each block contains a large number of unique values, either evaluating the query condition against a large index set will be very expensive, or the index will not be applied because the index is empty due to exceeding max_size. -### Bloom Filter Types {#bloom-filter-types} +### Bloom filter types {#bloom-filter-types} A *Bloom filter* is a data structure that allows space-efficient testing of set membership at the cost of a slight chance of false positives. A false positive is not a significant concern in the case of skip indexes because the only disadvantage is reading a few unnecessary blocks. However, the potential for false positives does mean that the indexed expression should be expected to be true, otherwise valid data may be skipped. @@ -149,7 +153,7 @@ This index works only with String, FixedString, and Map datatypes. The input exp ``` This index can also be useful for text searches, particularly languages without word breaks, such as Chinese. -## Skip Index Functions {#skip-index-functions} +## Skip index functions {#skip-index-functions} The core purpose of data-skipping indexes is to limit the amount of data analyzed by popular queries. Given the analytic nature of ClickHouse data, the pattern of those queries in most cases includes functional expressions. Accordingly, skip indexes must interact correctly with common functions to be efficient. This can happen either when: * data is inserted and the index is defined as a functional expression (with the result of the expression stored in the index files), or @@ -158,7 +162,7 @@ The core purpose of data-skipping indexes is to limit the amount of data analyze Each type of skip index works on a subset of available ClickHouse functions appropriate to the index implementation listed [here](/engines/table-engines/mergetree-family/mergetree/#functions-support). In general, set indexes and Bloom filter based indexes (another type of set index) are both unordered and therefore do not work with ranges. In contrast, minmax indexes work particularly well with ranges since determining whether ranges intersect is very fast. The efficacy of partial match functions LIKE, startsWith, endsWith, and hasToken depend on the index type used, the index expression, and the particular shape of the data. -## Skip Index Settings {#skip-index-settings} +## Skip index settings {#skip-index-settings} There are two available settings that apply to skip indexes. @@ -170,7 +174,7 @@ queries. In circumstances where querying a table is too expensive unless a skip names will return an exception for any query that does not use the listed index. This would prevent poorly written queries from consuming server resources. -## Skip Best Practices {#skip-best-practices} +## Skip index best practices {#skip-best-practices} Skip indexes are not intuitive, especially for users accustomed to secondary row-based indexes from the RDMS realm or inverted indexes from document stores. To get any benefit, applying a ClickHouse data skipping index must avoid enough granule reads to offset the cost of calculating the index. Critically, if a value occurs even once in an indexed block, it means the entire block must be read into memory and evaluated, and the index cost has been needlessly incurred. 
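As an illustration of the index types and settings discussed above, here is a minimal sketch (the table, columns, and index names are hypothetical, and `force_data_skipping_indices` is assumed to be the setting the settings section refers to):

```sql
-- Hypothetical table combining the three skip index types discussed above
CREATE TABLE skip_index_demo
(
    timestamp DateTime,
    vix       Float64,
    status    UInt8,
    url       String,
    INDEX vix_minmax vix    TYPE minmax             GRANULARITY 4,
    INDEX status_set status TYPE set(100)           GRANULARITY 4,
    INDEX url_bloom  url    TYPE bloom_filter(0.01) GRANULARITY 4
)
ENGINE = MergeTree
ORDER BY timestamp;

-- Indexes can also be added to an existing table and materialized for existing parts
ALTER TABLE skip_index_demo ADD INDEX vix_set vix TYPE set(100) GRANULARITY 4;
ALTER TABLE skip_index_demo MATERIALIZE INDEX vix_set;

-- Require a listed index to be used; the query fails if it cannot be applied
SELECT count()
FROM skip_index_demo
WHERE vix > 10
SETTINGS force_data_skipping_indices = 'vix_minmax';
```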
diff --git a/docs/guides/best-practices/sparse-primary-indexes.md b/docs/guides/best-practices/sparse-primary-indexes.md index 10ef83e9b21..dc65f34fc8c 100644 --- a/docs/guides/best-practices/sparse-primary-indexes.md +++ b/docs/guides/best-practices/sparse-primary-indexes.md @@ -33,7 +33,7 @@ import sparsePrimaryIndexes15a from '@site/static/images/guides/best-practices/s import sparsePrimaryIndexes15b from '@site/static/images/guides/best-practices/sparse-primary-indexes-15b.png'; import Image from '@theme/IdealImage'; -# A Practical Introduction to Primary Indexes in ClickHouse +# A practical introduction to primary indexes in ClickHouse ## Introduction {#introduction} @@ -52,7 +52,7 @@ For ClickHouse [secondary data skipping indexes](/engines/table-engines/mergetre ::: -### Data Set {#data-set} +### Data set {#data-set} Throughout this guide we will use a sample anonymized web traffic data set. @@ -66,7 +66,7 @@ With these three columns we can already formulate some typical web analytics que - "What are the top 10 users that most frequently clicked a specific URL?" - "What are the most popular times (e.g. days of the week) at which a user clicks on a specific URL?" -### Test Machine {#test-machine} +### Test machine {#test-machine} All runtime numbers given in this document are based on running ClickHouse 22.2.1 locally on a MacBook Pro with the Apple M1 Pro chip and 16GB of RAM. @@ -157,7 +157,7 @@ ClickHouse client's result output indicates that ClickHouse executed a full tabl To make this (way) more efficient and (much) faster, we need to use a table with a appropriate primary key. This will allow ClickHouse to automatically (based on the primary key's column(s)) create a sparse primary index which can then be used to significantly speed up the execution of our example query. -## ClickHouse Index Design {#clickhouse-index-design} +## ClickHouse index design {#clickhouse-index-design} ### An index design for massive data scales {#an-index-design-for-massive-data-scales} diff --git a/docs/guides/creating-tables.md b/docs/guides/creating-tables.md index 924955d1c39..fe1ecf915df 100644 --- a/docs/guides/creating-tables.md +++ b/docs/guides/creating-tables.md @@ -47,7 +47,7 @@ The table engine determines: There are many engines to choose from, but for a simple table on a single-node ClickHouse server, [MergeTree](/engines/table-engines/mergetree-family/mergetree.md) is your likely choice. ::: -## A Brief Intro to Primary Keys {#a-brief-intro-to-primary-keys} +## A brief intro to primary keys {#a-brief-intro-to-primary-keys} Before you go any further, it is important to understand how primary keys work in ClickHouse (the implementation of primary keys might seem unexpected!): diff --git a/docs/guides/developer/alternative-query-languages.md b/docs/guides/developer/alternative-query-languages.md index 0ba281b61ce..9ca05f7ac0d 100644 --- a/docs/guides/developer/alternative-query-languages.md +++ b/docs/guides/developer/alternative-query-languages.md @@ -24,7 +24,7 @@ Standard SQL is the default query language of ClickHouse. SET dialect = 'clickhouse' ``` -## Pipelined Relational Query Language (PRQL) {#pipelined-relational-query-language-prql} +## Pipelined relational query language (PRQL) {#pipelined-relational-query-language-prql} @@ -48,7 +48,7 @@ aggregate { Under the hood, ClickHouse uses transpilation from PRQL to SQL to run PRQL queries. 
-## Kusto Query Language (KQL) {#kusto-query-language-kql} +## Kusto query language (KQL) {#kusto-query-language-kql} diff --git a/docs/guides/developer/cascading-materialized-views.md b/docs/guides/developer/cascading-materialized-views.md index 9b160601a3f..31f9aad7426 100644 --- a/docs/guides/developer/cascading-materialized-views.md +++ b/docs/guides/developer/cascading-materialized-views.md @@ -5,9 +5,9 @@ description: 'How to use multiple materialized views from a source table.' keywords: ['materialized view', 'aggregation'] --- -# Cascading Materialized Views +# Cascading materialized views -This example demonstrates how to create a Materialized View, and then how to cascade a second Materialized View on to the first. In this page, you will see how to do it, many of the possibilities, and the limitations. Different use cases can be answered by creating a Materialized view using a second Materialized view as the source. +This example demonstrates how to create a materialized view, and then how to cascade a second materialized view onto the first. In this page, you will see how to do it, many of the possibilities, and the limitations. Different use cases can be answered by creating a materialized view using a second materialized view as the source. @@ -54,7 +54,7 @@ You can create a materialized view on a Null table. So the data written to the t ## Monthly aggregated table and materialized view {#monthly-aggregated-table-and-materialized-view} -For the first Materialized View, we need to create the `Target` table, for this example, it will be `analytics.monthly_aggregated_data` and we will store the sum of the views by month and domain name. +For the first materialized view, we need to create the `Target` table, for this example, it will be `analytics.monthly_aggregated_data` and we will store the sum of the views by month and domain name. ```sql CREATE TABLE analytics.monthly_aggregated_data @@ -67,7 +67,7 @@ ENGINE = AggregatingMergeTree ORDER BY (domain_name, month) ``` -The Materialized View that will forward the data on the target table will look like this: +The materialized view that will forward the data on the target table will look like this: ```sql CREATE MATERIALIZED VIEW analytics.monthly_aggregated_data_mv @@ -103,7 +103,7 @@ ORDER BY (domain_name, year) This step defines the cascade. The `FROM` statement will use the `monthly_aggregated_data` table, this means the data flow will be: 1. The data comes to the `hourly_data` table. -2. ClickHouse will forward the data received to the first Materialized View `monthly_aggregated_data` table, +2. ClickHouse will forward the data received to the first materialized view `monthly_aggregated_data` table, 3. Finally, the data received in step 2 will be forwarded to the `year_aggregated_data`. ```sql diff --git a/docs/guides/developer/deduplication.md b/docs/guides/developer/deduplication.md index af77b949b72..24ad29c694e 100644 --- a/docs/guides/developer/deduplication.md +++ b/docs/guides/developer/deduplication.md @@ -10,7 +10,7 @@ import deduplication from '@site/static/images/guides/developer/de_duplication.p import Image from '@theme/IdealImage'; -# Deduplication Strategies +# Deduplication strategies **Deduplication** refers to the process of ***removing duplicate rows of a dataset***. In an OLTP database, this is done easily because each row has a unique primary key-but at the cost of slower inserts. Every inserted row needs to first be searched for and, if found, needs to be replaced. 
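As a brief sketch of the append-only approach the deduplication guide describes (the table and columns below are hypothetical), `ReplacingMergeTree` collapses rows sharing the same sorting key during background merges:

```sql
-- Hypothetical table: duplicates on (user_id, event) are removed at merge time,
-- keeping the row with the highest `updated` value
CREATE TABLE user_events
(
    user_id UInt64,
    event   String,
    updated DateTime
)
ENGINE = ReplacingMergeTree(updated)
ORDER BY (user_id, event);

-- Merges are eventual, so use FINAL to force deduplicated results at query time
SELECT * FROM user_events FINAL;
```

The guide below covers this engine, along with `CollapsingMergeTree`, in more detail.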
@@ -164,7 +164,7 @@ Grouping as shown in the query above can actually be more efficient (in terms of Our [Deleting and Updating Data training module](https://learn.clickhouse.com/visitor_catalog_class/show/1328954/?utm_source=clickhouse&utm_medium=docs) expands on this example, including how to use a `version` column with `ReplacingMergeTree`. -## Using CollapsingMergeTree for Updating Columns Frequently {#using-collapsingmergetree-for-updating-columns-frequently} +## Using CollapsingMergeTree for updating columns frequently {#using-collapsingmergetree-for-updating-columns-frequently} Updating a column involves deleting an existing row and replacing it with new values. As you have already seen, this type of mutation in ClickHouse happens _eventually_ - during merges. If you have a lot of rows to update, it can actually be more efficient to avoid `ALTER TABLE..UPDATE` and instead just insert the new data alongside the existing data. We could add a column that denotes whether or not the data is stale or new... and there is actually a table engine that already implements this behavior very nicely, especially considering that it deletes the stale data automatically for you. Let's see how it works. @@ -248,7 +248,7 @@ INSERT INTO hackernews_views(id, author, sign) VALUES ``` ::: -## Real-time Updates from Multiple Threads {#real-time-updates-from-multiple-threads} +## Real-time updates from multiple threads {#real-time-updates-from-multiple-threads} With a `CollapsingMergeTree` table, rows cancel each other using a sign column, and the state of a row is determined by the last row inserted. But this can be problematic if you are inserting rows from different threads where rows can be inserted out of order. Using the "last" row does not work in this situation. diff --git a/docs/guides/developer/index.md b/docs/guides/developer/index.md index 76ac534f25b..aad644e8cda 100644 --- a/docs/guides/developer/index.md +++ b/docs/guides/developer/index.md @@ -5,7 +5,7 @@ description: 'Overview of the advanced guides' title: 'Advanced Guides' --- -# Advanced Guides +# Advanced guides This section contains the following advanced guides: diff --git a/docs/guides/developer/mutations.md b/docs/guides/developer/mutations.md index 04d8e24661f..6f19b0cadd7 100644 --- a/docs/guides/developer/mutations.md +++ b/docs/guides/developer/mutations.md @@ -2,19 +2,21 @@ slug: /guides/developer/mutations sidebar_label: 'Updating and Deleting Data' sidebar_position: 1 -keywords: ['UPDATE', 'DELETE'] +keywords: ['UPDATE', 'DELETE', 'mutations'] title: 'Updating and deleting ClickHouse data' description: 'Describes how to perform update and delete operations in ClickHouse' show_related_blogs: false --- -# Updating and deleting ClickHouse data +# Updating and deleting ClickHouse data with mutations -Although ClickHouse is geared toward high volume analytic workloads, it is possible in some situations to modify or delete existing data. These operations are labeled "mutations" and are executed using the `ALTER TABLE` command. You can also `DELETE` a row using the lightweight -delete capability of ClickHouse. +Although ClickHouse is geared toward high volume analytic workloads, it is possible in some situations to modify or +delete existing data. These operations are labeled "mutations" and are executed using the `ALTER TABLE` command. 
:::tip -If you need to perform frequent updates, consider using [deduplication](../developer/deduplication.md) in ClickHouse, which allows you to update and/or delete rows without generating a mutation event. +If you need to perform frequent updates, consider using [deduplication](../developer/deduplication.md) in ClickHouse, which allows you to update +and/or delete rows without generating a mutation event. Alternatively, use [lightweight updates](/guides/developer/lightweight-update) +or [lightweight deletes](/guides/developer/lightweight-delete). ::: ## Updating data {#updating-data} diff --git a/docs/guides/developer/replacing-merge-tree.md b/docs/guides/developer/replacing-merge-tree.md index 9b8f3f79ca9..cb9d33b01ea 100644 --- a/docs/guides/developer/replacing-merge-tree.md +++ b/docs/guides/developer/replacing-merge-tree.md @@ -312,15 +312,15 @@ ORDER BY year ASC As shown, partitioning has significantly improved query performance in this case by allowing the deduplication process to occur at a partition level in parallel. -## Merge Behavior Considerations {#merge-behavior-considerations} +## Merge behavior considerations {#merge-behavior-considerations} ClickHouse's merge selection mechanism goes beyond simple merging of parts. Below, we examine this behavior in the context of ReplacingMergeTree, including configuration options for enabling more aggressive merging of older data and considerations for larger parts. -### Merge Selection Logic {#merge-selection-logic} +### Merge selection logic {#merge-selection-logic} While merging aims to minimize the number of parts, it also balances this goal against the cost of write amplification. Consequently, some ranges of parts are excluded from merging if they would lead to excessive write amplification, based on internal calculations. This behavior helps prevent unnecessary resource usage and extends the lifespan of storage components. -### Merging Behavior on Large Parts {#merging-behavior-on-large-parts} +### Merging behavior on large parts {#merging-behavior-on-large-parts} The ReplacingMergeTree engine in ClickHouse is optimized for managing duplicate rows by merging data parts, keeping only the latest version of each row based on a specified unique key. However, when a merged part reaches the max_bytes_to_merge_at_max_space_in_pool threshold, it will no longer be selected for further merging, even if min_age_to_force_merge_seconds is set. As a result, automatic merges can no longer be relied upon to remove duplicates that may accumulate with ongoing data insertion. @@ -328,11 +328,11 @@ To address this, users can invoke OPTIMIZE FINAL to manually merge parts and rem For a more sustainable solution that maintains performance, partitioning the table is recommended. This can help prevent data parts from reaching the maximum merge size and reduces the need for ongoing manual optimizations. -### Partitioning and Merging Across Partitions {#partitioning-and-merging-across-partitions} +### Partitioning and merging across partitions {#partitioning-and-merging-across-partitions} As discussed in Exploiting Partitions with ReplacingMergeTree, we recommend partitioning tables as a best practice. Partitioning isolates data for more efficient merges and avoids merging across partitions, particularly during query execution. This behavior is enhanced in versions from 23.12 onward: if the partition key is a prefix of the sorting key, merging across partitions is not performed at query time, leading to faster query performance. 
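A minimal sketch of the layout described above (hypothetical table and columns): the partition key `year` is a prefix of the sorting key, so on 23.12 and later merging across partitions can be avoided at query time:

```sql
-- Hypothetical deduplicated table whose partition key is a prefix of its sorting key
CREATE TABLE events_deduplicated
(
    year    UInt16,
    id      UInt64,
    version UInt32,
    payload String
)
ENGINE = ReplacingMergeTree(version)
PARTITION BY year
ORDER BY (year, id);

-- Until background merges catch up, FINAL still deduplicates within each partition
SELECT count() FROM events_deduplicated FINAL WHERE year = 2024;
```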
-### Tuning Merges for Better Query Performance {#tuning-merges-for-better-query-performance} +### Tuning merges for better query performance {#tuning-merges-for-better-query-performance} By default, min_age_to_force_merge_seconds and min_age_to_force_merge_on_partition_only are set to 0 and false, respectively, disabling these features. In this configuration, ClickHouse will apply standard merging behavior without forcing merges based on partition age. @@ -340,7 +340,7 @@ If a value for min_age_to_force_merge_seconds is specified, ClickHouse will igno This behavior can be further tuned by setting min_age_to_force_merge_on_partition_only=true, requiring all parts in the partition to be older than min_age_to_force_merge_seconds for aggressive merging. This configuration allows older partitions to merge down to a single part over time, which consolidates data and maintains query performance. -### Recommended Settings {#recommended-settings} +### Recommended settings {#recommended-settings} :::warning Tuning merge behavior is an advanced operation. We recommend consulting with ClickHouse support before enabling these settings in production workloads. diff --git a/docs/guides/developer/ttl.md b/docs/guides/developer/ttl.md index 4b29ca64cfc..49361a6c6cc 100644 --- a/docs/guides/developer/ttl.md +++ b/docs/guides/developer/ttl.md @@ -10,7 +10,7 @@ show_related_blogs: true import CloudNotSupportedBadge from '@theme/badges/CloudNotSupportedBadge'; -# Manage Data with TTL (Time-to-live) +# Manage data with TTL (time-to-live) ## Overview of TTL {#overview-of-ttl} @@ -24,7 +24,7 @@ TTL (time-to-live) refers to the capability of having rows or columns moved, del TTL can be applied to entire tables or specific columns. ::: -## TTL Syntax {#ttl-syntax} +## TTL syntax {#ttl-syntax} The `TTL` clause can appear after a column definition and/or at the end of the table definition. Use the `INTERVAL` clause to define a length of time (which needs to be a `Date` or `DateTime` data type). For example, the following table has two columns with `TTL` clauses: @@ -48,7 +48,7 @@ ORDER BY tuple() TTL rules can be altered or deleted. See the [Manipulations with Table TTL](/sql-reference/statements/alter/ttl.md) page for more details. ::: -## Triggering TTL Events {#triggering-ttl-events} +## Triggering TTL events {#triggering-ttl-events} The deleting or aggregating of expired rows is not immediate - it only occurs during table merges. If you have a table that's not actively merging (for whatever reason), there are two settings that trigger TTL events: @@ -67,7 +67,7 @@ OPTIMIZE TABLE example1 FINAL `OPTIMIZE` initializes an unscheduled merge of the parts of your table, and `FINAL` forces a reoptimization if your table is already a single part. ::: -## Removing Rows {#removing-rows} +## Removing rows {#removing-rows} To remove entire rows from a table after a certain amount of time, define the TTL rule at the table level: @@ -100,7 +100,7 @@ TTL time + INTERVAL 1 MONTH DELETE WHERE event != 'error', time + INTERVAL 6 MONTH DELETE WHERE event = 'error' ``` -## Removing Columns {#removing-columns} +## Removing columns {#removing-columns} Instead of deleting the entire row, suppose you want just the balance and address columns to expire. 
Let's modify the `customers` table and add a TTL for both columns to be 2 hours: @@ -110,7 +110,7 @@ MODIFY COLUMN balance Int32 TTL timestamp + INTERVAL 2 HOUR, MODIFY COLUMN address String TTL timestamp + INTERVAL 2 HOUR ``` -## Implementing a Rollup {#implementing-a-rollup} +## Implementing a rollup {#implementing-a-rollup} Suppose we want to delete rows after a certain amount of time but hang on to some of the data for reporting purposes. We don't want all the details - just a few aggregated results of historical data. This can be implemented by adding a `GROUP BY` clause to your `TTL` expression, along with some columns in your table to store the aggregated results. Suppose in the following `hits` table we want to delete old rows, but hang on to the sum and maximum of the `hits` columns before removing the rows. We will need a field to store those values in, and we will need to add a `GROUP BY` clause to the `TTL` clause that rolls up the sum and maximum: diff --git a/docs/guides/developer/understanding-query-execution-with-the-analyzer.md b/docs/guides/developer/understanding-query-execution-with-the-analyzer.md index 6b8cd015399..24620bf6eb1 100644 --- a/docs/guides/developer/understanding-query-execution-with-the-analyzer.md +++ b/docs/guides/developer/understanding-query-execution-with-the-analyzer.md @@ -12,7 +12,7 @@ import analyzer4 from '@site/static/images/guides/developer/analyzer4.png'; import analyzer5 from '@site/static/images/guides/developer/analyzer5.png'; import Image from '@theme/IdealImage'; -# Understanding Query Execution with the Analyzer +# Understanding query execution with the analyzer ClickHouse processes queries extremely quickly, but the execution of a query is not a simple story. Let's try to understand how a `SELECT` query gets executed. To illustrate it, let's add some data in a table in ClickHouse: @@ -241,7 +241,7 @@ GROUP BY type You can now see all the inputs, functions, aliases, and data types that are being used. You can see some of the optimizations that the planner is going to apply [here](https://github.com/ClickHouse/ClickHouse/blob/master/src/Processors/QueryPlan/Optimizations/Optimizations.h). -## Query Pipeline {#query-pipeline} +## Query pipeline {#query-pipeline} A query pipeline is generated from the query plan. The query pipeline is very similar to the query plan, with the difference that it's not a tree but a graph. It highlights how ClickHouse is going to execute a query and what resources are going to be used. Analyzing the query pipeline is very useful to see where the bottleneck is in terms of inputs/outputs. Let's take our previous query and look at the query pipeline execution: diff --git a/docs/guides/examples/aggregate_function_combinators/anyIf.md b/docs/guides/examples/aggregate_function_combinators/anyIf.md index 57e38ad0352..a0368717512 100644 --- a/docs/guides/examples/aggregate_function_combinators/anyIf.md +++ b/docs/guides/examples/aggregate_function_combinators/anyIf.md @@ -14,7 +14,7 @@ The [`If`](/sql-reference/aggregate-functions/combinators#-if) combinator can be aggregate function to select the first encountered element from a given column that matches the given condition. 
-## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores sales data with success flags, and we'll use `anyIf` to select the first `transaction_id`s which are above and diff --git a/docs/guides/examples/aggregate_function_combinators/argMaxIf.md b/docs/guides/examples/aggregate_function_combinators/argMaxIf.md index e490e789c3a..f3ebb5dc512 100644 --- a/docs/guides/examples/aggregate_function_combinators/argMaxIf.md +++ b/docs/guides/examples/aggregate_function_combinators/argMaxIf.md @@ -18,7 +18,7 @@ The `argMaxIf` function is useful when you need to find the value associated wit the maximum value in a dataset, but only for rows that satisfy a specific condition. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll use a sample dataset of product sales to demonstrate how `argMaxIf` works. We'll find the product name that has the highest price, but diff --git a/docs/guides/examples/aggregate_function_combinators/argMinIf.md b/docs/guides/examples/aggregate_function_combinators/argMinIf.md index 2134379106a..9fd236e7ac7 100644 --- a/docs/guides/examples/aggregate_function_combinators/argMinIf.md +++ b/docs/guides/examples/aggregate_function_combinators/argMinIf.md @@ -18,7 +18,7 @@ The `argMinIf` function is useful when you need to find the value associated with the minimum value in a dataset, but only for rows that satisfy a specific condition. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores product prices and their timestamps, and we'll use `argMinIf` to find the lowest price for each product when it's in stock. diff --git a/docs/guides/examples/aggregate_function_combinators/avgIf.md b/docs/guides/examples/aggregate_function_combinators/avgIf.md index ad9ccb523c4..a77130bef53 100644 --- a/docs/guides/examples/aggregate_function_combinators/avgIf.md +++ b/docs/guides/examples/aggregate_function_combinators/avgIf.md @@ -14,7 +14,7 @@ The [`If`](/sql-reference/aggregate-functions/combinators#-if) combinator can be function to calculate the arithmetic mean of values for rows where the condition is true, using the `avgIf` aggregate combinator function. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores sales data with success flags, and we'll use `avgIf` to calculate the average sale amount for successful transactions. diff --git a/docs/guides/examples/aggregate_function_combinators/avgMap.md b/docs/guides/examples/aggregate_function_combinators/avgMap.md index 1dcdbd0e51b..51f73f3cf48 100644 --- a/docs/guides/examples/aggregate_function_combinators/avgMap.md +++ b/docs/guides/examples/aggregate_function_combinators/avgMap.md @@ -14,7 +14,7 @@ The [`Map`](/sql-reference/aggregate-functions/combinators#-map) combinator can function to calculate the arithmetic mean of values in a Map according to each key, using the `avgMap` aggregate combinator function. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores status codes and their counts for different timeslots, where each row contains a Map of status codes to their corresponding counts. 
We'll use diff --git a/docs/guides/examples/aggregate_function_combinators/avgMerge.md b/docs/guides/examples/aggregate_function_combinators/avgMerge.md index 72fb2c9dbf5..34a9827561f 100644 --- a/docs/guides/examples/aggregate_function_combinators/avgMerge.md +++ b/docs/guides/examples/aggregate_function_combinators/avgMerge.md @@ -14,7 +14,7 @@ The [`Merge`](/sql-reference/aggregate-functions/combinators#-state) combinator can be applied to the [`avg`](/sql-reference/aggregate-functions/reference/avg) function to produce a final result by combining partial aggregate states. -## Example Usage {#example-usage} +## Example usage {#example-usage} The `Merge` combinator is closely related to the `State` combinator. Refer to ["avgState example usage"](/examples/aggregate-function-combinators/avgState/#example-usage) diff --git a/docs/guides/examples/aggregate_function_combinators/avgMergeState.md b/docs/guides/examples/aggregate_function_combinators/avgMergeState.md index d4c132b16a3..916e21fb12a 100644 --- a/docs/guides/examples/aggregate_function_combinators/avgMergeState.md +++ b/docs/guides/examples/aggregate_function_combinators/avgMergeState.md @@ -18,7 +18,7 @@ can be applied to the [`avg`](/sql-reference/aggregate-functions/reference/avg) function to merge partial aggregate states of type `AverageFunction(avg, T)` and return a new intermediate aggregation state. -## Example Usage {#example-usage} +## Example usage {#example-usage} The `MergeState` combinator is particularly useful for multi-level aggregation scenarios where you want to combine pre-aggregated states and maintain them as @@ -43,7 +43,7 @@ ORDER BY (region, server_id, timestamp); ``` We'll create a server-level aggregation target table and define an Incremental -Materialized View acting as an insert trigger to it: +materialized view acting as an insert trigger to it: ```sql CREATE TABLE server_performance @@ -88,7 +88,7 @@ AS SELECT FROM server_performance GROUP BY region, datacenter; --- datacenter level table and Materialized View +-- datacenter level table and materialized view CREATE TABLE datacenter_performance ( diff --git a/docs/guides/examples/aggregate_function_combinators/avgResample.md b/docs/guides/examples/aggregate_function_combinators/avgResample.md index 029efdecdc4..bdbeb9f91d5 100644 --- a/docs/guides/examples/aggregate_function_combinators/avgResample.md +++ b/docs/guides/examples/aggregate_function_combinators/avgResample.md @@ -15,7 +15,7 @@ combinator can be applied to the [`count`](/sql-reference/aggregate-functions/re aggregate function to count values of a specified key column in a fixed number of intervals (`N`). -## Example Usage {#example-usage} +## Example usage {#example-usage} ### Basic example {#basic-example} diff --git a/docs/guides/examples/aggregate_function_combinators/avgState.md b/docs/guides/examples/aggregate_function_combinators/avgState.md index fa6536ffb8e..e0e2317701d 100644 --- a/docs/guides/examples/aggregate_function_combinators/avgState.md +++ b/docs/guides/examples/aggregate_function_combinators/avgState.md @@ -15,7 +15,7 @@ can be applied to the [`avg`](/sql-reference/aggregate-functions/reference/avg) function to produce an intermediate state of `AggregateFunction(avg, T)` type where `T` is the specified type for the average. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll look at how we can use the `AggregateFunction` type, together with the `avgState` function to aggregate website traffic data. 
@@ -49,7 +49,7 @@ ENGINE = AggregatingMergeTree() ORDER BY page_id; ``` -Create an Incremental Materialized View that will act as an insert trigger to +Create an Incremental materialized view that will act as an insert trigger to new data and store the intermediate state data in the target table defined above: ```sql diff --git a/docs/guides/examples/aggregate_function_combinators/countIf.md b/docs/guides/examples/aggregate_function_combinators/countIf.md index 842c77397a8..53aac092dd4 100644 --- a/docs/guides/examples/aggregate_function_combinators/countIf.md +++ b/docs/guides/examples/aggregate_function_combinators/countIf.md @@ -14,7 +14,7 @@ The [`If`](/sql-reference/aggregate-functions/combinators#-if) combinator can be function to count the number of rows where the condition is true, using the `countIf` aggregate combinator function. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores user login attempts, and we'll use `countIf` to count the number of successful logins. diff --git a/docs/guides/examples/aggregate_function_combinators/countResample.md b/docs/guides/examples/aggregate_function_combinators/countResample.md index c20f0aca74d..f90bb6a168c 100644 --- a/docs/guides/examples/aggregate_function_combinators/countResample.md +++ b/docs/guides/examples/aggregate_function_combinators/countResample.md @@ -15,7 +15,7 @@ combinator can be applied to the [`count`](/sql-reference/aggregate-functions/re aggregate function to count values of a specified key column in a fixed number of intervals (`N`). -## Example Usage {#example-usage} +## Example usage {#example-usage} ### Basic example {#basic-example} diff --git a/docs/guides/examples/aggregate_function_combinators/groupArrayDistinct.md b/docs/guides/examples/aggregate_function_combinators/groupArrayDistinct.md index fd98073b543..dd7350258fc 100644 --- a/docs/guides/examples/aggregate_function_combinators/groupArrayDistinct.md +++ b/docs/guides/examples/aggregate_function_combinators/groupArrayDistinct.md @@ -14,7 +14,7 @@ The [`groupArrayDistinct`](/sql-reference/aggregate-functions/combinators#-forea can be applied to the [`groupArray`](/sql-reference/aggregate-functions/reference/sum) aggregate function to create an array of distinct argument values. -## Example Usage {#example-usage} +## Example usage {#example-usage} For this example we'll make use of the `hits` dataset available in our [SQL playground](https://sql.clickhouse.com/). diff --git a/docs/guides/examples/aggregate_function_combinators/groupArrayResample.md b/docs/guides/examples/aggregate_function_combinators/groupArrayResample.md index 9abea133bb0..38176eaa49f 100644 --- a/docs/guides/examples/aggregate_function_combinators/groupArrayResample.md +++ b/docs/guides/examples/aggregate_function_combinators/groupArrayResample.md @@ -17,7 +17,7 @@ and construct the resulting array by selecting one representative value (corresponding to the minimum key) from the data points falling into each interval. It creates a downsampled view of the data rather than collecting all values. -## Example Usage {#example-usage} +## Example usage {#example-usage} Let's look at an example. 
We'll create a table which contains the `name`, `age` and `wage` of employees, and we'll insert some data into it: diff --git a/docs/guides/examples/aggregate_function_combinators/maxMap.md b/docs/guides/examples/aggregate_function_combinators/maxMap.md index dd2e524ac04..e1ffe4907fb 100644 --- a/docs/guides/examples/aggregate_function_combinators/maxMap.md +++ b/docs/guides/examples/aggregate_function_combinators/maxMap.md @@ -14,7 +14,7 @@ The [`Map`](/sql-reference/aggregate-functions/combinators#-map) combinator can function to calculate the maximum value in a Map according to each key, using the `maxMap` aggregate combinator function. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores status codes and their counts for different timeslots, where each row contains a Map of status codes to their corresponding counts. We'll use diff --git a/docs/guides/examples/aggregate_function_combinators/maxSimpleState.md b/docs/guides/examples/aggregate_function_combinators/maxSimpleState.md index 2b758124b06..729637ce814 100644 --- a/docs/guides/examples/aggregate_function_combinators/maxSimpleState.md +++ b/docs/guides/examples/aggregate_function_combinators/maxSimpleState.md @@ -14,7 +14,7 @@ The [`SimpleState`](/sql-reference/aggregate-functions/combinators#-simplestate) function to return the maximum value across all input values. It returns the result with type `SimpleAggregateState`. -## Example Usage {#example-usage} +## Example usage {#example-usage} The example given in [`minSimpleState`](/examples/aggregate-function-combinators/minSimpleState/#example-usage) demonstrates a usage of both `maxSimpleState` and `minSimpleState`. diff --git a/docs/guides/examples/aggregate_function_combinators/minMap.md b/docs/guides/examples/aggregate_function_combinators/minMap.md index e72d0d86954..e8843244f64 100644 --- a/docs/guides/examples/aggregate_function_combinators/minMap.md +++ b/docs/guides/examples/aggregate_function_combinators/minMap.md @@ -14,7 +14,7 @@ The [`Map`](/sql-reference/aggregate-functions/combinators#-map) combinator can function to calculate the minimum value in a Map according to each key, using the `minMap` aggregate combinator function. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores status codes and their counts for different timeslots, where each row contains a Map of status codes to their corresponding counts. We'll use diff --git a/docs/guides/examples/aggregate_function_combinators/minSimpleState.md b/docs/guides/examples/aggregate_function_combinators/minSimpleState.md index fdad0a2e374..0d7fa86a4ac 100644 --- a/docs/guides/examples/aggregate_function_combinators/minSimpleState.md +++ b/docs/guides/examples/aggregate_function_combinators/minSimpleState.md @@ -14,7 +14,7 @@ The [`SimpleState`](/sql-reference/aggregate-functions/combinators#-simplestate) function to return the minimum value across all input values. It returns the result with type [`SimpleAggregateFunction`](/docs/sql-reference/data-types/simpleaggregatefunction). -## Example Usage {#example-usage} +## Example usage {#example-usage} Let's look at a practical example using a table that tracks daily temperature readings. For each location, we want to maintain the lowest temperature recorded. 
@@ -49,7 +49,7 @@ ENGINE = AggregatingMergeTree() ORDER BY location_id; ``` -Create an Incremental Materialized View that will act as an insert trigger +Create an Incremental materialized view that will act as an insert trigger for inserted data and maintains the minimum, maximum temperatures per location. ```sql @@ -74,7 +74,7 @@ INSERT INTO raw_temperature_readings (location_id, location_name, temperature) V (4, 'East', 8); ``` -These readings are automatically processed by the Materialized View. Let's check +These readings are automatically processed by the materialized view. Let's check the current state: ```sql diff --git a/docs/guides/examples/aggregate_function_combinators/quantilesTimingArrayIf.md b/docs/guides/examples/aggregate_function_combinators/quantilesTimingArrayIf.md index 651734121f2..3df6548f982 100644 --- a/docs/guides/examples/aggregate_function_combinators/quantilesTimingArrayIf.md +++ b/docs/guides/examples/aggregate_function_combinators/quantilesTimingArrayIf.md @@ -15,7 +15,7 @@ combinator can be applied to the [`quantilesTiming`](/sql-reference/aggregate-fu function to calculate quantiles of timing values in arrays for rows where the condition is true, using the `quantilesTimingArrayIf` aggregate combinator function. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores API response times for different endpoints, and we'll use `quantilesTimingArrayIf` to calculate response time quantiles for successful requests. diff --git a/docs/guides/examples/aggregate_function_combinators/quantilesTimingIf.md b/docs/guides/examples/aggregate_function_combinators/quantilesTimingIf.md index 6d33707032c..4dc1ae5743b 100644 --- a/docs/guides/examples/aggregate_function_combinators/quantilesTimingIf.md +++ b/docs/guides/examples/aggregate_function_combinators/quantilesTimingIf.md @@ -14,7 +14,7 @@ The [`If`](/sql-reference/aggregate-functions/combinators#-if) combinator can be function to calculate quantiles of timing values for rows where the condition is true, using the `quantilesTimingIf` aggregate combinator function. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores API response times for different endpoints, and we'll use `quantilesTimingIf` to calculate response time quantiles for successful requests. diff --git a/docs/guides/examples/aggregate_function_combinators/sumArray.md b/docs/guides/examples/aggregate_function_combinators/sumArray.md index e29fdb7e845..0f1fdabf461 100644 --- a/docs/guides/examples/aggregate_function_combinators/sumArray.md +++ b/docs/guides/examples/aggregate_function_combinators/sumArray.md @@ -18,7 +18,7 @@ aggregate combinator function. The `sumArray` function is useful when you need to calculate the total sum of all elements across multiple arrays in a dataset. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll use a sample dataset of daily sales across different product categories to demonstrate how `sumArray` works. 
We'll calculate the total diff --git a/docs/guides/examples/aggregate_function_combinators/sumForEach.md b/docs/guides/examples/aggregate_function_combinators/sumForEach.md index 1b3be84900e..8184ccf01f2 100644 --- a/docs/guides/examples/aggregate_function_combinators/sumForEach.md +++ b/docs/guides/examples/aggregate_function_combinators/sumForEach.md @@ -15,7 +15,7 @@ can be applied to the [`sum`](/sql-reference/aggregate-functions/reference/sum) function which operates on row values to an aggregate function which operates on array columns, applying the aggregate to each element in the array across rows. -## Example Usage {#example-usage} +## Example usage {#example-usage} For this example we'll make use of the `hits` dataset available in our [SQL playground](https://sql.clickhouse.com/). diff --git a/docs/guides/examples/aggregate_function_combinators/sumIf.md b/docs/guides/examples/aggregate_function_combinators/sumIf.md index f6dc4e02f8c..7b70e7117ea 100644 --- a/docs/guides/examples/aggregate_function_combinators/sumIf.md +++ b/docs/guides/examples/aggregate_function_combinators/sumIf.md @@ -14,7 +14,7 @@ The [`If`](/sql-reference/aggregate-functions/combinators#-if) combinator can be function to calculate the sum of values for rows where the condition is true, using the `sumIf` aggregate combinator function. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores sales data with success flags, and we'll use `sumIf` to calculate the total sales amount for successful transactions. diff --git a/docs/guides/examples/aggregate_function_combinators/sumMap.md b/docs/guides/examples/aggregate_function_combinators/sumMap.md index 17829dbb144..fda6b895388 100644 --- a/docs/guides/examples/aggregate_function_combinators/sumMap.md +++ b/docs/guides/examples/aggregate_function_combinators/sumMap.md @@ -14,7 +14,7 @@ The [`Map`](/sql-reference/aggregate-functions/combinators#-map) combinator can function to calculate the sum of values in a Map according to each key, using the `sumMap` aggregate combinator function. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores status codes and their counts for different timeslots, where each row contains a Map of status codes to their corresponding counts. We'll use diff --git a/docs/guides/examples/aggregate_function_combinators/sumSimpleState.md b/docs/guides/examples/aggregate_function_combinators/sumSimpleState.md index d048462ba85..35b8759ea35 100644 --- a/docs/guides/examples/aggregate_function_combinators/sumSimpleState.md +++ b/docs/guides/examples/aggregate_function_combinators/sumSimpleState.md @@ -14,7 +14,7 @@ The [`SimpleState`](/sql-reference/aggregate-functions/combinators#-simplestate) function to return the sum across all input values. It returns the result with type [`SimpleAggregateFunction`](/docs/sql-reference/data-types/simpleaggregatefunction). 
-## Example Usage {#example-usage} +## Example usage {#example-usage} ### Tracking upvotes and downvotes {#tracking-post-votes} @@ -51,7 +51,7 @@ ENGINE = AggregatingMergeTree() ORDER BY post_id; ``` -We then create a Materialized View with `SimpleAggregateFunction` type columns: +We then create a materialized view with `SimpleAggregateFunction` type columns: ```sql CREATE MATERIALIZED VIEW mv_vote_processor TO vote_aggregates @@ -79,7 +79,7 @@ INSERT INTO raw_votes VALUES (3, 'downvote'); ``` -Query the Materialized View using the `SimpleState` combinator: +Query the materialized view using the `SimpleState` combinator: ```sql SELECT diff --git a/docs/guides/examples/aggregate_function_combinators/uniqArray.md b/docs/guides/examples/aggregate_function_combinators/uniqArray.md index 3fa5f29a381..07c651fb298 100644 --- a/docs/guides/examples/aggregate_function_combinators/uniqArray.md +++ b/docs/guides/examples/aggregate_function_combinators/uniqArray.md @@ -19,7 +19,7 @@ The `uniqArray` function is useful when you need to count unique elements across multiple arrays in a dataset. It's equivalent to using `uniq(arrayJoin())`, where `arrayJoin` first flattens the arrays and then `uniq` counts the unique elements. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll use a sample dataset of user interests across different categories to demonstrate how `uniqArray` works. We'll compare it with diff --git a/docs/guides/examples/aggregate_function_combinators/uniqArrayIf.md b/docs/guides/examples/aggregate_function_combinators/uniqArrayIf.md index 04caee8afc3..31470be7e31 100644 --- a/docs/guides/examples/aggregate_function_combinators/uniqArrayIf.md +++ b/docs/guides/examples/aggregate_function_combinators/uniqArrayIf.md @@ -21,7 +21,7 @@ condition is true, using the `uniqArrayIf` aggregate combinator function. This is useful when you want to count unique elements in an array based on specific conditions without having to use `arrayJoin`. -## Example Usage {#example-usage} +## Example usage {#example-usage} ### Count unique products viewed by segment type and engagement level {#count-unique-products} diff --git a/docs/guides/inserting-data.md b/docs/guides/inserting-data.md index 2c9a9e2fd5a..4e26d2c13ff 100644 --- a/docs/guides/inserting-data.md +++ b/docs/guides/inserting-data.md @@ -10,7 +10,7 @@ show_related_blogs: true import postgres_inserts from '@site/static/images/guides/postgres-inserts.png'; import Image from '@theme/IdealImage'; -## Basic Example {#basic-example} +## Basic example {#basic-example} You can use the familiar `INSERT INTO TABLE` command with ClickHouse. Let's insert some data into the table that we created in the start guide ["Creating Tables in ClickHouse"](./creating-tables). @@ -38,7 +38,7 @@ user_id message timestamp 102 Sort your data based on your commonly-used queries 2024-11-13 00:00:00 2.718 ``` -## Inserting into ClickHouse vs. OLTP Databases {#inserting-into-clickhouse-vs-oltp-databases} +## Inserting into ClickHouse vs. OLTP databases {#inserting-into-clickhouse-vs-oltp-databases} As an OLAP (Online Analytical Processing) database, ClickHouse is optimized for high performance and scalability, allowing potentially millions of rows to be inserted per second. This is achieved through a combination of a highly parallelized architecture and efficient column-oriented compression, but with compromises on immediate consistency. 
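As a rough sketch of the insert pattern recommended below (the table, columns, and values are hypothetical, and the asynchronous variant assumes the `async_insert` and `wait_for_async_insert` settings), batch rows on the client where possible, or let the server batch them for you:

```sql
-- Hypothetical table: one larger batch is cheaper than many single-row inserts
INSERT INTO events (user_id, message, timestamp) VALUES
    (101, 'first row of a batch', now()),
    (102, 'second row of the same batch', now());

-- When client-side batching is impractical, let the server buffer and batch the rows
INSERT INTO events (user_id, message, timestamp)
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (103, 'small frequent insert', now());
```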
@@ -51,7 +51,7 @@ These transactions can potentially involve a small number of rows at a time, wit To achieve high insert performance while maintaining strong consistency guarantees, users should adhere to the simple rules described below when inserting data into ClickHouse. Following these rules will help to avoid issues users commonly encounter the first time they use ClickHouse, and try to replicate an insert strategy that works for OLTP databases. -## Best Practices for Inserts {#best-practices-for-inserts} +## Best practices for inserts {#best-practices-for-inserts} ### Insert in large batch sizes {#insert-in-large-batch-sizes} @@ -118,7 +118,7 @@ These are optimized to ensure that inserts are performed correctly and natively See [Clients and Drivers](/interfaces/cli) for a full list of available ClickHouse clients and drivers. -### Prefer the Native format {#prefer-the-native-format} +### Prefer the native format {#prefer-the-native-format} ClickHouse supports many [input formats](/interfaces/formats) at insert (and query) time. This is a significant difference with OLTP databases and makes loading data from external sources much easier - especially when coupled with [table functions](/sql-reference/table-functions) and the ability to load data from files on disk. diff --git a/docs/guides/joining-tables.md b/docs/guides/joining-tables.md index 25fd10ee073..77ba8d118f6 100644 --- a/docs/guides/joining-tables.md +++ b/docs/guides/joining-tables.md @@ -123,7 +123,7 @@ WHERE (VoteTypeId = 2) AND (PostId IN ( Peak memory usage: 250.66 MiB. ``` -## Choosing a join algorithm {#choosing-a-join-algorithm} +## Choosing a JOIN algorithm {#choosing-a-join-algorithm} ClickHouse supports a number of [join algorithms](https://clickhouse.com/blog/clickhouse-fully-supports-joins-part1). These algorithms typically trade memory usage for performance. The following provides an overview of the ClickHouse join algorithms based on their relative memory consumption and execution time: diff --git a/docs/guides/manage-and-deploy-index.md b/docs/guides/manage-and-deploy-index.md index 68159deef47..388bfcaa4c1 100644 --- a/docs/guides/manage-and-deploy-index.md +++ b/docs/guides/manage-and-deploy-index.md @@ -4,7 +4,7 @@ description: 'Overview page for Manage and Deploy' slug: /guides/manage-and-deploy-index --- -# Manage and Deploy +# Manage and deploy This section contains the following topics: @@ -12,7 +12,7 @@ This section contains the following topics: |-------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------| | [Deployment and Scaling](/deployment-guides/index) | Working deployment examples based on the advice provided to ClickHouse users by the ClickHouse Support and Services organization. | | [Separation of Storage and Compute](/guides/separation-storage-compute) | Guide exploring how you can use ClickHouse and S3 to implement an architecture with separated storage and compute. | -| [Sizing and Hardware Recommendations](/guides/sizing-and-hardware-recommendations) | Guide discussing general recommendations regarding hardware, compute, memory, and disk configurations for open-source users. | +| [Sizing and hardware recommendations](/guides/sizing-and-hardware-recommendations) | Guide discussing general recommendations regarding hardware, compute, memory, and disk configurations for open-source users. 
| | [Configuring ClickHouse Keeper](/guides/sre/keeper/clickhouse-keeper) | Information and examples on how to configure ClickHouse Keeper. | | [Network ports](/guides/sre/network-ports) | List of network ports used by ClickHouse. | | [Re-balancing Shards](/guides/sre/scaling-clusters) | Recommendations on re-balancing shards. | diff --git a/docs/guides/separation-storage-compute.md b/docs/guides/separation-storage-compute.md index 8fe6d258ac8..b404f55e7f1 100644 --- a/docs/guides/separation-storage-compute.md +++ b/docs/guides/separation-storage-compute.md @@ -10,7 +10,7 @@ import Image from '@theme/IdealImage'; import BucketDetails from '@site/docs/_snippets/_S3_authentication_and_bucket.md'; import s3_bucket_example from '@site/static/images/guides/s3_bucket_example.png'; -# Separation of Storage and Compute +# Separation of storage and compute ## Overview {#overview} @@ -170,7 +170,7 @@ For fault tolerance, you can use multiple ClickHouse server nodes distributed ac Replication with S3 disks can be accomplished by using the `ReplicatedMergeTree` table engine. See the following guide for details: - [Replicating a single shard across two AWS regions using S3 Object Storage](/integrations/s3#s3-multi-region). -## Further Reading {#further-reading} +## Further reading {#further-reading} - [SharedMergeTree table engine](/cloud/reference/shared-merge-tree) - [SharedMergeTree announcement blog](https://clickhouse.com/blog/clickhouse-cloud-boosts-performance-with-sharedmergetree-and-lightweight-updates) diff --git a/docs/guides/sre/keeper/index.md b/docs/guides/sre/keeper/index.md index 73165938590..940e1f2b9bb 100644 --- a/docs/guides/sre/keeper/index.md +++ b/docs/guides/sre/keeper/index.md @@ -160,7 +160,7 @@ If you don't have the symlink (`clickhouse-keeper`) you can create it or specify clickhouse keeper --config /etc/your_path_to_config/config.xml ``` -### Four Letter Word Commands {#four-letter-word-commands} +### Four letter word commands {#four-letter-word-commands} ClickHouse Keeper also provides 4lw commands which are almost the same with Zookeeper. Each command is composed of four letters such as `mntr`, `stat` etc. There are some more interesting commands: `stat` gives some general information about the server and connected clients, while `srvr` and `cons` give extended details on server and connections respectively. @@ -409,7 +409,7 @@ AIOWriteBytes 0 Number of bytes written with Linux or FreeBSD AIO interf ... ``` -### HTTP Control {#http-control} +### HTTP control {#http-control} ClickHouse Keeper provides an HTTP interface to check if a replica is ready to receive traffic. It may be used in cloud environments, such as [Kubernetes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-readiness-probes). @@ -679,11 +679,11 @@ curl 127.0.0.1:9363/metrics Please also see the ClickHouse Cloud [Prometheus integration](/integrations/prometheus). -## ClickHouse Keeper User Guide {#clickhouse-keeper-user-guide} +## ClickHouse Keeper user guide {#clickhouse-keeper-user-guide} This guide provides simple and minimal settings to configure ClickHouse Keeper with an example on how to test distributed operations. This example is performed using 3 nodes on Linux. -### 1. Configure Nodes with Keeper settings {#1-configure-nodes-with-keeper-settings} +### 1. Configure nodes with Keeper settings {#1-configure-nodes-with-keeper-settings} 1. Install 3 ClickHouse instances on 3 hosts (`chnode1`, `chnode2`, `chnode3`). 
(View the [Quick Start](/getting-started/install/install.mdx) for details on installing ClickHouse.) @@ -954,7 +954,7 @@ this avoids having to wait several minutes for Keeper garbage collection to remove path entries as each time a path is created a new `uuid` is used in that path; paths are never reused. -### Example Environment {#example-environment} +### Example environment {#example-environment} A three node cluster that will be configured to have ClickHouse Keeper on all three nodes, and ClickHouse on two of the nodes. This provides ClickHouse Keeper with three nodes (including a tiebreaker node), and @@ -1347,7 +1347,7 @@ Sometimes it's necessary to extend experimental keeper node into a cluster. Here To get confident with the process, here's a [sandbox repository](https://github.com/ClickHouse/keeper-extend-cluster). -## Unsupported Features {#unsupported-features} +## Unsupported features {#unsupported-features} While ClickHouse Keeper aims to be fully compatible with ZooKeeper, there are some features that are currently not implemented (although development is ongoing): diff --git a/docs/guides/sre/scaling-clusters.md b/docs/guides/sre/scaling-clusters.md index d2663f78f12..b9a6bfecbcd 100644 --- a/docs/guides/sre/scaling-clusters.md +++ b/docs/guides/sre/scaling-clusters.md @@ -6,7 +6,7 @@ description: 'ClickHouse does not support automatic shard rebalancing, so we pro title: 'Rebalancing Data' --- -# Rebalancing Data +# Rebalancing data ClickHouse does not support automatic shard rebalancing. However, there are ways to rebalance shards in order of preference: diff --git a/docs/guides/sre/user-management/configuring-ldap.md b/docs/guides/sre/user-management/configuring-ldap.md index e660e6ac459..e2ca9b41230 100644 --- a/docs/guides/sre/user-management/configuring-ldap.md +++ b/docs/guides/sre/user-management/configuring-ldap.md @@ -8,7 +8,7 @@ description: 'Describes how to configure ClickHouse to use LDAP for authenticati import SelfManaged from '@site/docs/_snippets/_self_managed_only_no_roadmap.md'; -# Configuring ClickHouse to Use LDAP for Authentication and Role Mapping +# Configuring ClickHouse to use LDAP for authentication and role mapping diff --git a/docs/guides/sre/user-management/index.md b/docs/guides/sre/user-management/index.md index c25d07c17ca..8e344d246bb 100644 --- a/docs/guides/sre/user-management/index.md +++ b/docs/guides/sre/user-management/index.md @@ -7,7 +7,7 @@ keywords: ['ClickHouse Cloud', 'Access Control', 'User Management', 'RBAC', 'Sec description: 'Describes access control and account management in ClickHouse Cloud' --- -# Creating Users and Roles in ClickHouse +# Creating users and roles in ClickHouse ClickHouse supports access control management based on [RBAC](https://en.wikipedia.org/wiki/Role-based_access_control) approach. @@ -33,7 +33,7 @@ You can't manage the same access entity by both configuration methods simultaneo ::: :::note -If you are looking to manage ClickHouse Cloud Console users, please refer to this [page](/cloud/security/cloud-access-management) +If you are looking to manage ClickHouse Cloud console users, please refer to this [page](/cloud/security/cloud-access-management) ::: To see all users, roles, profiles, etc. and all their grants use [`SHOW ACCESS`](/sql-reference/statements/show#show-access) statement. @@ -48,13 +48,13 @@ If you just started using ClickHouse, consider the following scenario: 2. Log in to the `default` user account and create all the required users. 
Don't forget to create an administrator account (`GRANT ALL ON *.* TO admin_user_account WITH GRANT OPTION`). 3. [Restrict permissions](/operations/settings/permissions-for-queries) for the `default` user and disable SQL-driven access control and account management for it. -### Properties of Current Solution {#access-control-properties} +### Properties of current solution {#access-control-properties} - You can grant permissions for databases and tables even if they do not exist. - If a table is deleted, all the privileges that correspond to this table are not revoked. This means that even if you create a new table with the same name later, all the privileges remain valid. To revoke privileges corresponding to the deleted table, you need to execute, for example, the `REVOKE ALL PRIVILEGES ON db.table FROM ALL` query. - There are no lifetime settings for privileges. -### User Account {#user-account-management} +### User account {#user-account-management} A user account is an access entity that allows to authorize someone in ClickHouse. A user account contains: @@ -75,7 +75,7 @@ Management queries: - [SHOW CREATE USER](/sql-reference/statements/show#show-create-user) - [SHOW USERS](/sql-reference/statements/show#show-users) -### Settings Applying {#access-control-settings-applying} +### Settings applying {#access-control-settings-applying} Settings can be configured differently: for a user account, in its granted roles and in settings profiles. At user login, if a setting is configured for different access entities, the value and constraints of this setting are applied as follows (from higher to lower priority): @@ -106,7 +106,7 @@ Management queries: Privileges can be granted to a role by the [GRANT](/sql-reference/statements/grant.md) query. To revoke privileges from a role ClickHouse provides the [REVOKE](/sql-reference/statements/revoke.md) query. -#### Row Policy {#row-policy-management} +#### Row policy {#row-policy-management} Row policy is a filter that defines which of the rows are available to a user or a role. Row policy contains filters for one particular table, as well as a list of roles and/or users which should use this row policy. @@ -122,7 +122,7 @@ Management queries: - [SHOW CREATE ROW POLICY](/sql-reference/statements/show#show-create-row-policy) - [SHOW POLICIES](/sql-reference/statements/show#show-policies) -### Settings Profile {#settings-profiles-management} +### Settings profile {#settings-profiles-management} Settings profile is a collection of [settings](/operations/settings/index.md). Settings profile contains settings and constraints, as well as a list of roles and/or users to which this profile is applied. @@ -149,7 +149,7 @@ Management queries: - [SHOW QUOTA](/sql-reference/statements/show#show-quota) - [SHOW QUOTAS](/sql-reference/statements/show#show-quotas) -### Enabling SQL-driven Access Control and Account Management {#enabling-access-control} +### Enabling SQL-driven access control and account management {#enabling-access-control} - Setup a directory for configuration storage. @@ -160,7 +160,7 @@ Management queries: By default, SQL-driven access control and account management is disabled for all users. You need to configure at least one user in the `users.xml` configuration file and set the values of the [`access_management`](/operations/settings/settings-users.md#access_management-user-setting), `named_collection_control`, `show_named_collections`, and `show_named_collections_secrets` settings to 1. 
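Once that bootstrap user has `access_management` enabled, the administrator account from the scenario above can be created entirely in SQL. A minimal sketch, using the `admin_user_account` name from step 2 and a placeholder password:

```sql
-- Run as the bootstrap user configured in users.xml with access_management = 1
CREATE USER admin_user_account IDENTIFIED WITH sha256_password BY 'choose-a-strong-password';
GRANT ALL ON *.* TO admin_user_account WITH GRANT OPTION;
```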
-## Defining SQL Users and Roles {#defining-sql-users-and-roles} +## Defining SQL users and roles {#defining-sql-users-and-roles} :::tip If you are working in ClickHouse Cloud, please see [Cloud access management](/cloud/security/cloud-access-management). @@ -201,7 +201,7 @@ This article shows the basics of defining SQL users and roles and applying those GRANT ALL ON *.* TO clickhouse_admin WITH GRANT OPTION; ``` -## ALTER permissions {#alter-permissions} +## Alter permissions {#alter-permissions} This article is intended to provide you with a better understanding of how to define permissions, and how permissions work when using `ALTER` statements for privileged users. diff --git a/docs/guides/sre/user-management/ssl-user-auth.md b/docs/guides/sre/user-management/ssl-user-auth.md index 606c3c41330..c7135ff5261 100644 --- a/docs/guides/sre/user-management/ssl-user-auth.md +++ b/docs/guides/sre/user-management/ssl-user-auth.md @@ -6,7 +6,7 @@ title: 'Configuring SSL User Certificate for Authentication' description: 'This guide provides simple and minimal settings to configure authentication with SSL user certificates.' --- -# Configuring SSL User Certificate for Authentication +# Configuring SSL user certificate for authentication import SelfManaged from '@site/docs/_snippets/_self_managed_only_no_roadmap.md'; diff --git a/docs/integrations/data-ingestion/apache-spark/spark-jdbc.md b/docs/integrations/data-ingestion/apache-spark/spark-jdbc.md index 215a639e9e3..746126aeacf 100644 --- a/docs/integrations/data-ingestion/apache-spark/spark-jdbc.md +++ b/docs/integrations/data-ingestion/apache-spark/spark-jdbc.md @@ -347,7 +347,7 @@ reading in parallel from multiple workers. Please visit Apache Spark's official documentation for more information on [JDBC configurations](https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html#data-source-option). -## JDBC Limitations {#jdbc-limitations} +## JDBC limitations {#jdbc-limitations} * As of today, you can insert data using JDBC only into existing tables (currently there is no way to auto create the table on DF insertion, as Spark does with other connectors). diff --git a/docs/integrations/data-ingestion/apache-spark/spark-native-connector.md b/docs/integrations/data-ingestion/apache-spark/spark-native-connector.md index fc8697e7f24..f017436207e 100644 --- a/docs/integrations/data-ingestion/apache-spark/spark-native-connector.md +++ b/docs/integrations/data-ingestion/apache-spark/spark-native-connector.md @@ -11,7 +11,7 @@ import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import TOCInline from '@theme/TOCInline'; -# Spark Connector +# Spark connector This connector leverages ClickHouse-specific optimizations, such as advanced partitioning and predicate pushdown, to improve query performance and data handling. 
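As a quick illustration of what the connector enables: once a catalog named `clickhouse` has been registered (see the catalog registration section below), ClickHouse tables can be queried with plain Spark SQL. A minimal sketch against the `default.example_table` table used in later examples:

```sql
-- Spark SQL; assumes a catalog registered under the name `clickhouse`
SELECT count(*) AS total_rows
FROM clickhouse.default.example_table;
```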
@@ -35,7 +35,7 @@ catalog feature, it is now possible to add and work with multiple catalogs in a - Scala 2.12 or 2.13 - Apache Spark 3.3 or 3.4 or 3.5 -## Compatibility Matrix {#compatibility-matrix} +## Compatibility matrix {#compatibility-matrix} | Version | Compatible Spark Versions | ClickHouse JDBC version | |---------|---------------------------|-------------------------| @@ -50,7 +50,7 @@ catalog feature, it is now possible to add and work with multiple catalogs in a | 0.2.1 | Spark 3.2 | Not depend on | | 0.1.2 | Spark 3.2 | Not depend on | -## Installation & Setup {#installation--setup} +## Installation & setup {#installation--setup} For integrating ClickHouse with Spark, there are multiple installation options to suit different project setups. You can add the ClickHouse Spark connector as a dependency directly in your project's build file (such as in `pom.xml` @@ -146,7 +146,7 @@ for production. -### Download The Library {#download-the-library} +### Download the library {#download-the-library} The name pattern of the binary JAR is: @@ -172,7 +172,7 @@ In any case, ensure that the package versions are compatible according to the [Compatibility Matrix](#compatibility-matrix). ::: -## Register The Catalog (required) {#register-the-catalog-required} +## Register the catalog (required) {#register-the-catalog-required} In order to access your ClickHouse tables, you must configure a new Spark catalog with the following configs: @@ -222,7 +222,7 @@ That way, you would be able to access clickhouse1 table `.` fro ::: -## ClickHouse Cloud Settings {#clickhouse-cloud-settings} +## ClickHouse Cloud settings {#clickhouse-cloud-settings} When connecting to [ClickHouse Cloud](https://clickhouse.com), make sure to enable SSL and set the appropriate SSL mode. For example: @@ -231,7 +231,7 @@ spark.sql.catalog.clickhouse.option.ssl true spark.sql.catalog.clickhouse.option.ssl_mode NONE ``` -## Read Data {#read-data} +## Read data {#read-data} @@ -338,7 +338,7 @@ df.show() -## Write Data {#write-data} +## Write data {#write-data} @@ -472,7 +472,7 @@ df.writeTo("clickhouse.default.example_table").append() -## DDL Operations {#ddl-operations} +## DDL operations {#ddl-operations} You can perform DDL operations on your ClickHouse instance using Spark SQL, with all changes immediately persisted in ClickHouse. @@ -530,7 +530,7 @@ The following are the adjustable configurations available in the connector: | spark.clickhouse.write.retryInterval | 10s | The interval in seconds between write retry. | 0.1.0 | | spark.clickhouse.write.retryableErrorCodes | 241 | The retryable error codes returned by ClickHouse server when write failing. | 0.1.0 | -## Supported Data Types {#supported-data-types} +## Supported data types {#supported-data-types} This section outlines the mapping of data types between Spark and ClickHouse. The tables below provide quick references for converting data types when reading from ClickHouse into Spark and when inserting data from Spark into ClickHouse. @@ -596,7 +596,7 @@ for converting data types when reading from ClickHouse into Spark and when inser | `Object` | | ❌ | | | | `Nested` | | ❌ | | | -## Contributing and Support {#contributing-and-support} +## Contributing and support {#contributing-and-support} If you'd like to contribute to the project or report any issues, we welcome your input! 
Visit our [GitHub repository](https://github.com/ClickHouse/spark-clickhouse-connector) to open an issue, suggest diff --git a/docs/integrations/data-ingestion/azure-data-factory/using_azureblobstorage.md b/docs/integrations/data-ingestion/azure-data-factory/using_azureblobstorage.md index cd8a908b8f4..910b145215a 100644 --- a/docs/integrations/data-ingestion/azure-data-factory/using_azureblobstorage.md +++ b/docs/integrations/data-ingestion/azure-data-factory/using_azureblobstorage.md @@ -78,7 +78,7 @@ SELECT * FROM azureBlobStorage( This allows you to efficiently pull external data into ClickHouse without needing intermediate ETL steps. -## A simple example using the Environmental Sensors Dataset {#simple-example-using-the-environmental-sensors-dataset} +## A simple example using the Environmental sensors dataset {#simple-example-using-the-environmental-sensors-dataset} As an example we will download a single file from the Environmental Sensors Dataset. @@ -152,7 +152,7 @@ inference from input data](https://clickhouse.com/docs/interfaces/schema-inferen Your sensors table is now populated with data from the `2019-06_bmp180.csv.zst` file stored in Azure Blob Storage. -## Additional Resources {#additional-resources} +## Additional resources {#additional-resources} This is just a basic introduction to using the azureBlobStorage function. For more advanced options and configuration details, please refer to the official diff --git a/docs/integrations/data-ingestion/azure-data-factory/using_http_interface.md b/docs/integrations/data-ingestion/azure-data-factory/using_http_interface.md index 1f211c7a33c..7d245d5fdab 100644 --- a/docs/integrations/data-ingestion/azure-data-factory/using_http_interface.md +++ b/docs/integrations/data-ingestion/azure-data-factory/using_http_interface.md @@ -34,7 +34,7 @@ import adfCopyDataSource from '@site/static/images/integr import adfCopyDataSinkSelectPost from '@site/static/images/integrations/data-ingestion/azure-data-factory/adf-copy-data-sink-select-post.png'; import adfCopyDataDebugSuccess from '@site/static/images/integrations/data-ingestion/azure-data-factory/adf-copy-data-debug-success.png'; -# Using ClickHouse HTTP Interface in Azure Data Factory {#using-clickhouse-http-interface-in-azure-data-factory} +# Using ClickHouse HTTP interface in Azure Data Factory {#using-clickhouse-http-interface-in-azure-data-factory} The [`azureBlobStorage` Table Function](https://clickhouse.com/docs/sql-reference/table-functions/azureBlobStorage) is a fast and convenient way to ingest data from Azure Blob Storage into @@ -118,7 +118,7 @@ Service to your ClickHouse instance, define a Dataset for the [REST sink](https://learn.microsoft.com/en-us/azure/data-factory/connector-rest), and create a Copy Data activity to send data from Azure to ClickHouse. ## Creating an Azure Data Factory instance {#create-an-azure-data-factory-instance} This guide assumes that you have access to Microsoft Azure account, and you already have configured a subscription and a resource group. If you have @@ -321,7 +321,7 @@ Now that we've configured both the input and output datasets, we can set up a 6. Once complete, click **Publish all** to save your pipeline and dataset changes.
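Once the pipeline has run, a quick sanity check is to count the rows that arrived in ClickHouse. The table name below is a placeholder for whatever you configured as the sink target:

```sql
-- Replace my_adf_sink_table with the table used by your ClickHouse sink
SELECT count() AS rows_ingested
FROM my_adf_sink_table;
```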
-## Additional Resources {#additional-resources-1} +## Additional resources {#additional-resources-1} - [HTTP Interface](https://clickhouse.com/docs/interfaces/http) - [Copy and transform data from and to a REST endpoint by using Azure Data Factory](https://learn.microsoft.com/en-us/azure/data-factory/connector-rest?tabs=data-factory) - [Selecting an Insert Strategy](https://clickhouse.com/docs/best-practices/selecting-an-insert-strategy) diff --git a/docs/integrations/data-ingestion/azure-synapse/index.md b/docs/integrations/data-ingestion/azure-synapse/index.md index 68010511e5e..9da775c7d7b 100644 --- a/docs/integrations/data-ingestion/azure-synapse/index.md +++ b/docs/integrations/data-ingestion/azure-synapse/index.md @@ -72,7 +72,7 @@ Please visit the [ClickHouse Spark configurations page](/integrations/apache-spa When working with ClickHouse Cloud Please make sure to set the [required Spark settings](/integrations/apache-spark/spark-native-connector#clickhouse-cloud-settings). ::: -## Setup Verification {#setup-verification} +## Setup verification {#setup-verification} To verify that the dependencies and configurations were set successfully, please visit your session's Spark UI, and go to your `Environment` tab. There, look for your ClickHouse related settings: @@ -80,7 +80,7 @@ There, look for your ClickHouse related settings: Verifying ClickHouse settings using Spark UI -## Additional Resources {#additional-resources} +## Additional resources {#additional-resources} - [ClickHouse Spark Connector Docs](/integrations/apache-spark) - [Azure Synapse Spark Pools Overview](https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-overview) diff --git a/docs/integrations/data-ingestion/clickpipes/index.md b/docs/integrations/data-ingestion/clickpipes/index.md index d9dc1276a20..1f1c9700eb4 100644 --- a/docs/integrations/data-ingestion/clickpipes/index.md +++ b/docs/integrations/data-ingestion/clickpipes/index.md @@ -30,7 +30,7 @@ import Image from '@theme/IdealImage'; ClickPipes stack -## Supported Data Sources {#supported-data-sources} +## Supported data sources {#supported-data-sources} | Name | Logo |Type| Status | Description | |----------------------------------------------------|--------------------------------------------------------------------------------------------------|----|------------------|------------------------------------------------------------------------------------------------------| @@ -90,7 +90,7 @@ Errors related to the operation of the ClickPipe will be stored in the `system.c If ClickPipes cannot connect to a data source after 15 min or to a destination after 1 hr, the ClickPipes instance stops and stores an appropriate message in the system error table (provided the ClickHouse instance is available). -## F.A.Q {#faq} +## FAQ {#faq} - **What is ClickPipes?** ClickPipes is a ClickHouse Cloud feature that makes it easy for users to connect their ClickHouse services to external data sources, specifically Kafka. With ClickPipes for Kafka, users can easily continuously load data into ClickHouse, making it available for real-time analytics. diff --git a/docs/integrations/data-ingestion/clickpipes/kinesis.md b/docs/integrations/data-ingestion/clickpipes/kinesis.md index 772e4a5ab24..396e29208d8 100644 --- a/docs/integrations/data-ingestion/clickpipes/kinesis.md +++ b/docs/integrations/data-ingestion/clickpipes/kinesis.md @@ -62,7 +62,7 @@ You have familiarized yourself with the [ClickPipes intro](./index.md) and setup 8. 
Finally, you can configure permissions for the internal ClickPipes user. **Permissions:** ClickPipes will create a dedicated user for writing data into a destination table. You can select a role for this internal user using a custom role or one of the predefined role: - - `Full access`: with the full access to the cluster. It might be useful if you use Materialized View or Dictionary with the destination table. + - `Full access`: with the full access to the cluster. It might be useful if you use a materialized view or dictionary with the destination table. - `Only destination table`: with the `INSERT` permissions to the destination table only. Permissions @@ -84,12 +84,12 @@ You have familiarized yourself with the [ClickPipes intro](./index.md) and setup 10. **Congratulations!** you have successfully set up your first ClickPipe. If this is a streaming ClickPipe it will be continuously running, ingesting data in real-time from your remote data source. Otherwise it will ingest the batch and complete. -## Supported Data Formats {#supported-data-formats} +## Supported data formats {#supported-data-formats} The supported formats are: - [JSON](../../../interfaces/formats.md/#json) -## Supported Data Types {#supported-data-types} +## Supported data types {#supported-data-types} ### Standard types support {#standard-types-support} The following ClickHouse data types are currently supported in ClickPipes: @@ -125,7 +125,7 @@ have to submit a support ticket to enable it on your service. JSON fields that are always a JSON object can be assigned to a JSON destination column. You will have to manually change the destination column to the desired JSON type, including any fixed or skipped paths. -## Kinesis Virtual Columns {#kinesis-virtual-columns} +## Kinesis virtual columns {#kinesis-virtual-columns} The following virtual columns are supported for Kinesis stream. When creating a new destination table virtual columns can be added by using the `Add Column` button. diff --git a/docs/integrations/data-ingestion/clickpipes/mysql/index.md b/docs/integrations/data-ingestion/clickpipes/mysql/index.md index d0966fc46e7..1b0a2a00a9f 100644 --- a/docs/integrations/data-ingestion/clickpipes/mysql/index.md +++ b/docs/integrations/data-ingestion/clickpipes/mysql/index.md @@ -49,7 +49,7 @@ Once your source MySQL database is set up, you can continue creating your ClickP Make sure you are logged in to your ClickHouse Cloud account. If you don't have an account yet, you can sign up [here](https://cloud.clickhouse.com/). [//]: # ( TODO update image here) -1. In the ClickHouse Cloud Console, navigate to your ClickHouse Cloud Service. +1. In the ClickHouse Cloud console, navigate to your ClickHouse Cloud Service. ClickPipes service @@ -74,7 +74,7 @@ Make sure you are logged in to your ClickHouse Cloud account. If you don't have Fill in connection details -#### (Optional) Set up SSH Tunneling {#optional-setting-up-ssh-tunneling} +#### (Optional) Set up SSH tunneling {#optional-setting-up-ssh-tunneling} You can specify SSH tunneling details if your source MySQL database is not publicly accessible. diff --git a/docs/integrations/data-ingestion/clickpipes/mysql/source/aurora.md b/docs/integrations/data-ingestion/clickpipes/mysql/source/aurora.md index dace34f1304..395d7e18feb 100644 --- a/docs/integrations/data-ingestion/clickpipes/mysql/source/aurora.md +++ b/docs/integrations/data-ingestion/clickpipes/mysql/source/aurora.md @@ -73,7 +73,7 @@ Then click on `Save Changes` in the top-right.
You may need to reboot your insta If you have a MySQL cluster, the above parameters would be found in a [DB Cluster](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_WorkingWithParamGroups.CreatingCluster.html) parameter group and not the DB instance group. ::: -## Enabling GTID Mode {#gtid-mode-aurora} +## Enabling GTID mode {#gtid-mode-aurora} Global Transaction Identifiers (GTIDs) are unique IDs assigned to each committed transaction in MySQL. They simplify binlog replication and make troubleshooting more straightforward. If your MySQL instance is MySQL 5.7, 8.0 or 8.4, we recommend enabling GTID mode so that the MySQL ClickPipe can use GTID replication. diff --git a/docs/integrations/data-ingestion/clickpipes/mysql/source/gcp.md b/docs/integrations/data-ingestion/clickpipes/mysql/source/gcp.md index 90a78354300..3a87882f74c 100644 --- a/docs/integrations/data-ingestion/clickpipes/mysql/source/gcp.md +++ b/docs/integrations/data-ingestion/clickpipes/mysql/source/gcp.md @@ -65,14 +65,14 @@ Connect to your Cloud SQL MySQL instance as the root user and execute the follow ## Configure network access {#configure-network-access-gcp-mysql} If you want to restrict traffic to your Cloud SQL instance, please add the [documented static NAT IPs](../../index.md#list-of-static-ips) to the allowlisted IPs of your Cloud SQL MySQL instance. -This can be done either by editing the instance or by heading over to the `Connections` tab in the sidebar in Cloud Console. +This can be done either by editing the instance or by heading over to the `Connections` tab in the sidebar in Cloud console. IP allowlisting in GCP MySQL -## Download and Use Root CA certificate {#download-root-ca-certificate-gcp-mysql} +## Download and use root CA certificate {#download-root-ca-certificate-gcp-mysql} To connect to your Cloud SQL instance, you need to download the root CA certificate. -1. Go to your Cloud SQL instance in the Cloud Console. +1. Go to your Cloud SQL instance in the Cloud console. 2. Click on `Connections` in the sidebar. 3. Click on the `Security` tab. 4. In the `Manage server CA certificates` section, click on the `DOWNLOAD CERTIFICATES` button at the bottom. diff --git a/docs/integrations/data-ingestion/clickpipes/object-storage.md b/docs/integrations/data-ingestion/clickpipes/object-storage.md index 6ab6284b697..22d0a10102c 100644 --- a/docs/integrations/data-ingestion/clickpipes/object-storage.md +++ b/docs/integrations/data-ingestion/clickpipes/object-storage.md @@ -23,7 +23,7 @@ import cp_destination from '@site/static/images/integrations/data-ingestion/clic import cp_overview from '@site/static/images/integrations/data-ingestion/clickpipes/cp_overview.png'; import Image from '@theme/IdealImage'; -# Integrating Object Storage with ClickHouse Cloud +# Integrating object storage with ClickHouse Cloud Object Storage ClickPipes provide a simple and resilient way to ingest data from Amazon S3, Google Cloud Storage, Azure Blob Storage, and DigitalOcean Spaces into ClickHouse Cloud. Both one-time and continuous ingestion are supported with exactly-once semantics. @@ -67,7 +67,7 @@ You can also map [virtual columns](../../sql-reference/table-functions/s3#virtua 7. Finally, you can configure permissions for the internal ClickPipes user. **Permissions:** ClickPipes will create a dedicated user for writing data into a destination table. You can select a role for this internal user using a custom role or one of the predefined role: - - `Full access`: with the full access to the cluster. 
Required if you use Materialized View or Dictionary with the destination table. + - `Full access`: with the full access to the cluster. Required if you use materialized view or Dictionary with the destination table. - `Only destination table`: with the `INSERT` permissions to the destination table only. Permissions @@ -89,7 +89,7 @@ You can also map [virtual columns](../../sql-reference/table-functions/s3#virtua Image 9. **Congratulations!** you have successfully set up your first ClickPipe. If this is a streaming ClickPipe it will be continuously running, ingesting data in real-time from your remote data source. Otherwise it will ingest the batch and complete. -## Supported Data Sources {#supported-data-sources} +## Supported data sources {#supported-data-sources} | Name |Logo|Type| Status | Description | |----------------------|----|----|-----------------|------------------------------------------------------------------------------------------------------| @@ -100,7 +100,7 @@ Image More connectors will get added to ClickPipes, you can find out more by [contacting us](https://clickhouse.com/company/contact?loc=clickpipes). -## Supported Data Formats {#supported-data-formats} +## Supported data formats {#supported-data-formats} The supported formats are: - [JSON](/interfaces/formats/JSON) @@ -108,11 +108,11 @@ The supported formats are: - [Parquet](/interfaces/formats/Parquet) - [Avro](/interfaces/formats/Avro) -## Exactly-Once Semantics {#exactly-once-semantics} +## Exactly-once semantics {#exactly-once-semantics} Various types of failures can occur when ingesting large dataset, which can result in a partial inserts or duplicate data. Object Storage ClickPipes are resilient to insert failures and provides exactly-once semantics. This is accomplished by using temporary "staging" tables. Data is first inserted into the staging tables. If something goes wrong with this insert, the staging table can be truncated and the insert can be retried from a clean state. Only when an insert is completed and successful, the partitions in the staging table are moved to target table. To read more about this strategy, check-out [this blog post](https://clickhouse.com/blog/supercharge-your-clickhouse-data-loads-part3). -### View Support {#view-support} +### View support {#view-support} Materialized views on the target table are also supported. ClickPipes will create staging tables not only for the target table, but also any dependent materialized view. We do not create staging tables for non-materialized views. This means that if you have a target table with one of more downstream materialized views, those materialized views should avoid selecting data via a view from the target table. Otherwise, you may find that you are missing data in the materialized view. @@ -173,7 +173,7 @@ Currently only protected buckets are supported for DigitalOcean spaces. You requ ### Azure Blob Storage {#azureblobstorage} Currently only protected buckets are supported for Azure Blob Storage. Authentication is done via a connection string, which supports access keys and shared keys. For more information, read [this guide](https://learn.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string). -## F.A.Q. 
{#faq} +## FAQ {#faq} - **Does ClickPipes support GCS buckets prefixed with `gs://`?** diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/deduplication.md b/docs/integrations/data-ingestion/clickpipes/postgres/deduplication.md index 9b282bb43a4..a46c74122fa 100644 --- a/docs/integrations/data-ingestion/clickpipes/postgres/deduplication.md +++ b/docs/integrations/data-ingestion/clickpipes/postgres/deduplication.md @@ -179,9 +179,9 @@ ORDER BY viewcount DESC LIMIT 10 ``` -#### Refreshable Material view {#refreshable-material-view} +#### Refreshable materialized view {#refreshable-material-view} -Another approach is to use a [Refreshable Materialized View](/materialized-view/refreshable-materialized-view), which enables you to schedule query execution for deduplicating rows and storing the results in a destination table. With each scheduled refresh, the destination table is replaced with the latest query results. +Another approach is to use a [refreshable materialized view](/materialized-view/refreshable-materialized-view), which enables you to schedule query execution for deduplicating rows and storing the results in a destination table. With each scheduled refresh, the destination table is replaced with the latest query results. The key advantage of this method is that the query using the FINAL keyword runs only once during the refresh, eliminating the need for subsequent queries on the destination table to use FINAL. diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/faq.md b/docs/integrations/data-ingestion/clickpipes/postgres/faq.md index 297eadb909d..279b5410929 100644 --- a/docs/integrations/data-ingestion/clickpipes/postgres/faq.md +++ b/docs/integrations/data-ingestion/clickpipes/postgres/faq.md @@ -186,17 +186,17 @@ For manually created publications, please add any tables you want to the publica ::: -## Recommended `max_slot_wal_keep_size` Settings {#recommended-max_slot_wal_keep_size-settings} +## Recommended `max_slot_wal_keep_size` settings {#recommended-max_slot_wal_keep_size-settings} - **At Minimum:** Set [`max_slot_wal_keep_size`](https://www.postgresql.org/docs/devel/runtime-config-replication.html#GUC-MAX-SLOT-WAL-KEEP-SIZE) to retain at least **two days' worth** of WAL data. - **For Large Databases (High Transaction Volume):** Retain at least **2-3 times** the peak WAL generation per day. - **For Storage-Constrained Environments:** Tune this conservatively to **avoid disk exhaustion** while ensuring replication stability. -### How to Calculate the Right Value {#how-to-calculate-the-right-value} +### How to calculate the right value {#how-to-calculate-the-right-value} To determine the right setting, measure the WAL generation rate: -#### For PostgreSQL 10+: {#for-postgresql-10} +#### For PostgreSQL 10+ {#for-postgresql-10} ```sql SELECT pg_wal_lsn_diff(pg_current_wal_insert_lsn(), '0/0') / 1024 / 1024 AS wal_generated_mb; @@ -213,7 +213,7 @@ SELECT pg_xlog_location_diff(pg_current_xlog_insert_location(), '0/0') / 1024 / * Multiply that number by 2 or 3 to provide sufficient retention. * Set `max_slot_wal_keep_size` to the resulting value in MB or GB. -#### Example: {#example} +#### Example {#example} If your database generates 100 GB of WAL per day, set: @@ -229,7 +229,7 @@ The most common cause of replication slot invalidation is a low `max_slot_wal_ke In rare cases, we have seen this issue occur even when `max_slot_wal_keep_size` is not configured. 
This could be due to an intricate and rare bug in PostgreSQL, although the cause remains unclear. -## I am seeing Out Of Memory (OOMs) on ClickHouse while my ClickPipe is ingesting data. Can you help? {#i-am-seeing-out-of-memory-ooms-on-clickhouse-while-my-clickpipe-is-ingesting-data-can-you-help} +## I am seeing out of memory (OOMs) on ClickHouse while my ClickPipe is ingesting data. Can you help? {#i-am-seeing-out-of-memory-ooms-on-clickhouse-while-my-clickpipe-is-ingesting-data-can-you-help} One common reason for OOMs on ClickHouse is that your service is undersized. This means that your current service configuration doesn't have enough resources (e.g., memory or CPU) to handle the ingestion load effectively. We strongly recommend scaling up the service to meet the demands of your ClickPipe data ingestion. diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/index.md b/docs/integrations/data-ingestion/clickpipes/postgres/index.md index ff7513b65e8..85923d822ec 100644 --- a/docs/integrations/data-ingestion/clickpipes/postgres/index.md +++ b/docs/integrations/data-ingestion/clickpipes/postgres/index.md @@ -56,7 +56,7 @@ Once your source Postgres database is set up, you can continue creating your Cli Make sure you are logged in to your ClickHouse Cloud account. If you don't have an account yet, you can sign up [here](https://cloud.clickhouse.com/). [//]: # ( TODO update image here) -1. In the ClickHouse Cloud Console, navigate to your ClickHouse Cloud Service. +1. In the ClickHouse Cloud console, navigate to your ClickHouse Cloud Service. ClickPipes service @@ -87,7 +87,7 @@ You can use AWS Private Link to connect to your source Postgres database if it i want to keep your data transfer private. You can follow the [setup guide to set up the connection](/integrations/clickpipes/aws-privatelink). -#### (Optional) Setting up SSH Tunneling {#optional-setting-up-ssh-tunneling} +#### (Optional) Setting up SSH tunneling {#optional-setting-up-ssh-tunneling} You can specify SSH tunneling details if your source Postgres database is not publicly accessible. @@ -114,7 +114,7 @@ Once the connection details are filled in, click on "Next". Select replication slot -#### Advanced Settings {#advanced-settings} +#### Advanced settings {#advanced-settings} You can configure the Advanced settings if needed. A brief description of each setting is provided below: diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/ordering_keys.md b/docs/integrations/data-ingestion/clickpipes/postgres/ordering_keys.md index fd7817c6a42..91337e28051 100644 --- a/docs/integrations/data-ingestion/clickpipes/postgres/ordering_keys.md +++ b/docs/integrations/data-ingestion/clickpipes/postgres/ordering_keys.md @@ -11,7 +11,7 @@ As describe in the [migration guide](/migrations/postgresql/data-modeling-techni By default with CDC, choosing an ordering key different from the Postgres primary key can cause data deduplication issues in ClickHouse. This happens because the ordering key in ClickHouse serves a dual role: it controls data indexing and sorting while acting as the deduplication key. The easiest way to address this issue is by defining refreshable materialized views. -## Use Refreshable Materialized Views {#use-refreshable-materialized-views} +## Use refreshable materialized views {#use-refreshable-materialized-views} A simple way to define custom ordering keys (ORDER BY) is using [refreshable materialized views](/materialized-view/refreshable-materialized-view) (MVs). 
These allow you to periodically (e.g., every 5 or 10 minutes) copy the entire table with the desired ordering key. diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/postgres_generated_columns.md b/docs/integrations/data-ingestion/clickpipes/postgres/postgres_generated_columns.md index 2d4a6e83fdd..8bb90c40cec 100644 --- a/docs/integrations/data-ingestion/clickpipes/postgres/postgres_generated_columns.md +++ b/docs/integrations/data-ingestion/clickpipes/postgres/postgres_generated_columns.md @@ -6,13 +6,13 @@ description: 'Page describing important considerations to keep in mind when usin When using PostgreSQL's generated columns in tables that are being replicated, there are some important considerations to keep in mind. These gotchas can affect the replication process and data consistency in your destination systems. -## The Problem with Generated Columns {#the-problem-with-generated-columns} +## The problem with generated columns {#the-problem-with-generated-columns} 1. **Not Published via `pgoutput`:** Generated columns are not published through the `pgoutput` logical replication plugin. This means that when you're replicating data from PostgreSQL to another system, the values of generated columns are not included in the replication stream. 2. **Issues with Primary Keys:** If a generated column is part of your primary key, it can cause deduplication problems on the destination. Since the generated column values are not replicated, the destination system won't have the necessary information to properly identify and deduplicate rows. -## Best Practices {#best-practices} +## Best practices {#best-practices} To work around these limitations, consider the following best practices: diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/source/aurora.md b/docs/integrations/data-ingestion/clickpipes/postgres/source/aurora.md index b251c331f94..8a569f4f0a2 100644 --- a/docs/integrations/data-ingestion/clickpipes/postgres/source/aurora.md +++ b/docs/integrations/data-ingestion/clickpipes/postgres/source/aurora.md @@ -14,13 +14,13 @@ import security_group_in_rds_postgres from '@site/static/images/integrations/dat import edit_inbound_rules from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/source/rds/edit_inbound_rules.png'; import Image from '@theme/IdealImage'; -# Aurora Postgres Source Setup Guide +# Aurora Postgres source setup guide ## Supported Postgres versions {#supported-postgres-versions} ClickPipes supports Aurora PostgreSQL-Compatible Edition version 12 and later. 
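Before changing any parameters, it can help to confirm the server version and the current WAL level from `psql`; the steps below are only needed if logical replication is not already enabled:

```sql
SELECT version();  -- should report PostgreSQL 12 or later
SHOW wal_level;    -- must end up as 'logical' for CDC to work
```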
-## Enable Logical Replication {#enable-logical-replication} +## Enable logical replication {#enable-logical-replication} You can skip this section if your Aurora instance already has the following settings configured: - `rds.logical_replication = 1` @@ -62,7 +62,7 @@ If not already configured, follow these steps: Reboot Aurora PostgreSQL -## Configure Database User {#configure-database-user} +## Configure database user {#configure-database-user} Connect to your Aurora PostgreSQL writer instance as an admin user and execute the following commands: @@ -93,9 +93,9 @@ Connect to your Aurora PostgreSQL writer instance as an admin user and execute t ``` -## Configure Network Access {#configure-network-access} +## Configure network access {#configure-network-access} -### IP-based Access Control {#ip-based-access-control} +### IP-based access control {#ip-based-access-control} If you want to restrict traffic to your Aurora cluster, please add the [documented static NAT IPs](../../index.md#list-of-static-ips) to the `Inbound rules` of your Aurora security group. @@ -103,11 +103,11 @@ If you want to restrict traffic to your Aurora cluster, please add the [document Edit inbound rules for the above security group -### Private Access via AWS PrivateLink {#private-access-via-aws-privatelink} +### Private access via AWS PrivateLink {#private-access-via-aws-privatelink} To connect to your Aurora cluster through a private network, you can use AWS PrivateLink. Follow our [AWS PrivateLink setup guide for ClickPipes](/knowledgebase/aws-privatelink-setup-for-clickpipes) to set up the connection. -### Aurora-Specific Considerations {#aurora-specific-considerations} +### Aurora-specific considerations {#aurora-specific-considerations} When setting up ClickPipes with Aurora PostgreSQL, keep these considerations in mind: @@ -119,7 +119,7 @@ When setting up ClickPipes with Aurora PostgreSQL, keep these considerations in 4. **Storage Considerations**: Aurora's storage layer is shared across all instances in a cluster, which can provide better performance for logical replication compared to standard RDS. -### Dealing with Dynamic Cluster Endpoints {#dealing-with-dynamic-cluster-endpoints} +### Dealing with dynamic cluster endpoints {#dealing-with-dynamic-cluster-endpoints} While Aurora provides stable endpoints that automatically route to the appropriate instance, here are some additional approaches for ensuring consistent connectivity: diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/source/azure-flexible-server-postgres.md b/docs/integrations/data-ingestion/clickpipes/postgres/source/azure-flexible-server-postgres.md index b0700ffe6f2..391438a9112 100644 --- a/docs/integrations/data-ingestion/clickpipes/postgres/source/azure-flexible-server-postgres.md +++ b/docs/integrations/data-ingestion/clickpipes/postgres/source/azure-flexible-server-postgres.md @@ -11,11 +11,11 @@ import restart from '@site/static/images/integrations/data-ingestion/clickpipes/ import firewall from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/source/azure-flexible-server-postgres/firewall.png'; import Image from '@theme/IdealImage'; -# Azure Flexible Server for Postgres Source Setup Guide +# Azure flexible server for Postgres source setup guide ClickPipes supports Postgres version 12 and later. -## Enable Logical Replication {#enable-logical-replication} +## Enable logical replication {#enable-logical-replication} **You don't need** to follow the below steps if `wal_level` is set to `logical`. 
This setting should mostly be pre-configured if you are migrating from another data replication tool. @@ -31,7 +31,7 @@ ClickPipes supports Postgres version 12 and later. Restart server after changing wal_level -## Creating ClickPipes User and Granting permissions {#creating-clickpipes-user-and-granting-permissions} +## Creating ClickPipes users and granting permissions {#creating-clickpipes-user-and-granting-permissions} Connect to your Azure Flexible Server Postgres through the admin user and run the below commands: diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/source/crunchy-postgres.md b/docs/integrations/data-ingestion/clickpipes/postgres/source/crunchy-postgres.md index d5a56f9d30f..865690d1ffb 100644 --- a/docs/integrations/data-ingestion/clickpipes/postgres/source/crunchy-postgres.md +++ b/docs/integrations/data-ingestion/clickpipes/postgres/source/crunchy-postgres.md @@ -9,12 +9,12 @@ import firewall_rules_crunchy_bridge from '@site/static/images/integrations/data import add_firewall_rules_crunchy_bridge from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/source/setup/crunchy-postgres/add_firewall_rules_crunchy_bridge.png' import Image from '@theme/IdealImage'; -# Crunchy Bridge Postgres Source Setup Guide +# Crunchy Bridge Postgres source setup guide ClickPipes supports Postgres version 12 and later. -## Enable Logical Replication {#enable-logical-replication} +## Enable logical replication {#enable-logical-replication} Crunchy Bridge comes with logical replication enabled by [default](https://docs.crunchybridge.com/how-to/logical-replication). Ensure that the settings below are configured correctly. If not, adjust them accordingly. @@ -24,7 +24,7 @@ SHOW max_wal_senders; -- should be 10 SHOW max_replication_slots; -- should be 10 ``` -## Creating ClickPipes User and Granting permissions {#creating-clickpipes-user-and-granting-permissions} +## Creating ClickPipes user and granting permissions {#creating-clickpipes-user-and-granting-permissions} Connect to your Crunchy Bridge Postgres through the `postgres` user and run the below commands: diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/source/generic.md b/docs/integrations/data-ingestion/clickpipes/postgres/source/generic.md index a247f46089f..22cf9cd6c88 100644 --- a/docs/integrations/data-ingestion/clickpipes/postgres/source/generic.md +++ b/docs/integrations/data-ingestion/clickpipes/postgres/source/generic.md @@ -5,7 +5,7 @@ slug: /integrations/clickpipes/postgres/source/generic title: 'Generic Postgres Source Setup Guide' --- -# Generic Postgres Source Setup Guide +# Generic Postgres source setup guide :::info @@ -16,7 +16,7 @@ If you use one of the supported providers (in the sidebar), please refer to the ClickPipes supports Postgres version 12 and later. -## Enable Logical Replication {#enable-logical-replication} +## Enable logical replication {#enable-logical-replication} 1. 
To enable replication on your Postgres instance, we need to make sure that the following settings are set: diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/source/google-cloudsql.md b/docs/integrations/data-ingestion/clickpipes/postgres/source/google-cloudsql.md index 4aa6190cab0..2f4cca66aaa 100644 --- a/docs/integrations/data-ingestion/clickpipes/postgres/source/google-cloudsql.md +++ b/docs/integrations/data-ingestion/clickpipes/postgres/source/google-cloudsql.md @@ -15,7 +15,7 @@ import firewall1 from '@site/static/images/integrations/data-ingestion/clickpipe import firewall2 from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/source/google-cloudsql/firewall2.png'; import Image from '@theme/IdealImage'; -# Google Cloud SQL Postgres Source Setup Guide +# Google Cloud SQL Postgres source setup guide :::info @@ -28,7 +28,7 @@ If you use one of the supported providers (in the sidebar), please refer to the Anything on or after Postgres 12 -## Enable Logical Replication {#enable-logical-replication} +## Enable logical replication {#enable-logical-replication} **You don't need** to follow the below steps if the settings `cloudsql. logical_decoding` is on and `wal_sender_timeout` is 0. These settings should mostly be pre-configured if you are migrating from another data replication tool. @@ -43,7 +43,7 @@ Anything on or after Postgres 12 Restart Server -## Creating ClickPipes User and Granting permissions {#creating-clickpipes-user-and-granting-permissions} +## Creating ClickPipes user and granting permissions {#creating-clickpipes-user-and-granting-permissions} Connect to your Cloud SQL Postgres through the admin user and run the below commands: diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/source/neon-postgres.md b/docs/integrations/data-ingestion/clickpipes/postgres/source/neon-postgres.md index ecdbc6baf2b..8d332071e7f 100644 --- a/docs/integrations/data-ingestion/clickpipes/postgres/source/neon-postgres.md +++ b/docs/integrations/data-ingestion/clickpipes/postgres/source/neon-postgres.md @@ -12,7 +12,7 @@ import neon_ip_allow from '@site/static/images/integrations/data-ingestion/click import neon_conn_details from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/source/setup/neon-postgres/neon-conn-details.png' import Image from '@theme/IdealImage'; -# Neon Postgres Source Setup Guide +# Neon Postgres source setup guide This is a guide on how to setup Neon Postgres, which you can use for replication in ClickPipes. Make sure you're signed in to your [Neon console](https://console.neon.tech/app/projects) for this setup. @@ -42,7 +42,7 @@ Here, we can run the following SQL commands: Click on **Run** to have a publication and a user ready. -## Enable Logical Replication {#enable-logical-replication} +## Enable logical replication {#enable-logical-replication} In Neon, you can enable logical replication through the UI. This is necessary for ClickPipes's CDC to replicate data. Head over to the **Settings** tab and then to the **Logical Replication** section. @@ -59,13 +59,13 @@ SHOW max_wal_senders; -- should be 10 SHOW max_replication_slots; -- should be 10 ``` -## IP Whitelisting (For Neon Enterprise plan) {#ip-whitelisting-for-neon-enterprise-plan} +## IP whitelisting (for Neon enterprise plan) {#ip-whitelisting-for-neon-enterprise-plan} If you have Neon Enterprise plan, you can whitelist the [ClickPipes IPs](../../index.md#list-of-static-ips) to allow replication from ClickPipes to your Neon Postgres instance. 
To do this you can click on the **Settings** tab and go to the **IP Allow** section. Allow IPs screen -## Copy Connection Details {#copy-connection-details} +## Copy connection details {#copy-connection-details} Now that we have the user, publication ready and replication enabled, we can copy the connection details to create a new ClickPipe. Head over to the **Dashboard** and at the text box where it shows the connection string, change the view to **Parameters Only**. We will need these parameters for our next step. diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/source/rds.md b/docs/integrations/data-ingestion/clickpipes/postgres/source/rds.md index 811a9a988fe..573d4a4a9c3 100644 --- a/docs/integrations/data-ingestion/clickpipes/postgres/source/rds.md +++ b/docs/integrations/data-ingestion/clickpipes/postgres/source/rds.md @@ -14,13 +14,13 @@ import security_group_in_rds_postgres from '@site/static/images/integrations/dat import edit_inbound_rules from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/source/rds/edit_inbound_rules.png'; import Image from '@theme/IdealImage'; -# RDS Postgres Source Setup Guide +# RDS Postgres source setup guide ## Supported Postgres versions {#supported-postgres-versions} ClickPipes supports Postgres version 12 and later. -## Enable Logical Replication {#enable-logical-replication} +## Enable logical replication {#enable-logical-replication} You can skip this section if your RDS instance already has the following settings configured: - `rds.logical_replication = 1` @@ -62,7 +62,7 @@ If not already configured, follow these steps: Reboot RDS Postgres -## Configure Database User {#configure-database-user} +## Configure database user {#configure-database-user} Connect to your RDS Postgres instance as an admin user and execute the following commands: @@ -93,9 +93,9 @@ Connect to your RDS Postgres instance as an admin user and execute the following ``` -## Configure Network Access {#configure-network-access} +## Configure network access {#configure-network-access} -### IP-based Access Control {#ip-based-access-control} +### IP-based access control {#ip-based-access-control} If you want to restrict traffic to your RDS instance, please add the [documented static NAT IPs](../../index.md#list-of-static-ips) to the `Inbound rules` of your RDS security group. diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/source/supabase.md b/docs/integrations/data-ingestion/clickpipes/postgres/source/supabase.md index b0f2a554a33..7890d0e87cc 100644 --- a/docs/integrations/data-ingestion/clickpipes/postgres/source/supabase.md +++ b/docs/integrations/data-ingestion/clickpipes/postgres/source/supabase.md @@ -9,7 +9,7 @@ import supabase_commands from '@site/static/images/integrations/data-ingestion/c import supabase_connection_details from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/source/setup/supabase/supabase-connection-details.jpg' import Image from '@theme/IdealImage'; -# Supabase Source Setup Guide +# Supabase source setup guide This is a guide on how to setup Supabase Postgres for usage in ClickPipes. 
diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/source/timescale.md b/docs/integrations/data-ingestion/clickpipes/postgres/source/timescale.md index 885d27d43aa..3c94bdb774f 100644 --- a/docs/integrations/data-ingestion/clickpipes/postgres/source/timescale.md +++ b/docs/integrations/data-ingestion/clickpipes/postgres/source/timescale.md @@ -8,7 +8,7 @@ keywords: ['TimescaleDB'] import BetaBadge from '@theme/badges/BetaBadge'; -# Postgres with TimescaleDB Source Setup Guide +# Postgres with TimescaleDB source setup guide @@ -35,7 +35,7 @@ to the process of replicating them, which is why the ability to replicate Timesc ClickPipes supports Postgres version 12 and later. -## Enable Logical Replication {#enable-logical-replication} +## Enable logical replication {#enable-logical-replication} The steps to be follow depend on how your Postgres instance with TimescaleDB is deployed. @@ -115,7 +115,7 @@ ERROR: transparent decompression only supports tableoid system column (SQLSTATE You may need to disable [compression](https://docs.timescale.com/api/latest/compression/decompress_chunk) or [hypercore columnstore](https://docs.timescale.com/api/latest/hypercore/convert_to_rowstore) for these tables. -## Configure Network Access {#configure-network-access} +## Configure network access {#configure-network-access} If you want to restrict traffic to your Timescale instance, please allowlist the [documented static NAT IPs](../../index.md#list-of-static-ips). Instructions to do this will vary across providers, please consult the sidebar if your provider is listed or raise a diff --git a/docs/integrations/data-ingestion/data-formats/csv-tsv.md b/docs/integrations/data-ingestion/data-formats/csv-tsv.md index 773d592214d..4602df0d703 100644 --- a/docs/integrations/data-ingestion/data-formats/csv-tsv.md +++ b/docs/integrations/data-ingestion/data-formats/csv-tsv.md @@ -160,7 +160,7 @@ SELECT * FROM file('nulls.csv') ``` -## TSV (Tab-separated) files {#tsv-tab-separated-files} +## TSV (tab-separated) files {#tsv-tab-separated-files} Tab-separated data format is widely used as a data interchange format. To load data from a [TSV file](assets/data_small.tsv) to ClickHouse, the [TabSeparated](/interfaces/formats.md/#tabseparated) format is used: diff --git a/docs/integrations/data-ingestion/data-formats/json/formats.md b/docs/integrations/data-ingestion/data-formats/json/formats.md index f5a4068b0bd..a14049df1b3 100644 --- a/docs/integrations/data-ingestion/data-formats/json/formats.md +++ b/docs/integrations/data-ingestion/data-formats/json/formats.md @@ -227,7 +227,7 @@ SELECT * FROM file('objects.json', JSONObjectEachRow) Note how the `id` column has been populated by key values correctly. -## JSON Arrays {#json-arrays} +## JSON arrays {#json-arrays} Sometimes, for the sake of saving space, JSON files are encoded in arrays instead of objects. In this case, we deal with a [list of JSON arrays](../assets/arrays.json): diff --git a/docs/integrations/data-ingestion/data-formats/json/loading.md b/docs/integrations/data-ingestion/data-formats/json/loading.md index 6b4d556f2ce..6b2c97bd5d5 100644 --- a/docs/integrations/data-ingestion/data-formats/json/loading.md +++ b/docs/integrations/data-ingestion/data-formats/json/loading.md @@ -12,7 +12,7 @@ score: 15 The following examples provide a very simple example of loading structured and semi-structured JSON data. For more complex JSON, including nested structures, see the guide [**Designing JSON schema**](/integrations/data-formats/json/schema). 
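As a minimal sketch of the structured case (the table and column names here are illustrative and not part of the guide's dataset), NDJSON rows can be inserted directly using the `JSONEachRow` format:

```sql
CREATE TABLE json_demo
(
    `ts` DateTime,
    `message` String
)
ENGINE = MergeTree
ORDER BY ts;

INSERT INTO json_demo FORMAT JSONEachRow
{"ts": "2024-01-01 00:00:00", "message": "hello"}
{"ts": "2024-01-01 00:00:01", "message": "world"}
```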
-## Loading Structured JSON {#loading-structured-json} +## Loading structured JSON {#loading-structured-json} In this section, we assume the JSON data is in [`NDJSON`](https://github.com/ndjson/ndjson-spec) (Newline delimited JSON) format, known as [`JSONEachRow`](/interfaces/formats#jsoneachrow) in ClickHouse, and well structured i.e. the column names and types are fixed. `NDJSON` is the preferred format for loading JSON due to its brevity and efficient use of space, but others are supported for both [input and output](/interfaces/formats#json). @@ -116,7 +116,7 @@ FORMAT JSONEachRow These examples assume the use of the `JSONEachRow` format. Other common JSON formats are supported, with examples of loading these provided [here](/integrations/data-formats/json/other-formats). -## Loading Semi-structured JSON {#loading-semi-structured-json} +## Loading semi-structured JSON {#loading-semi-structured-json} Our previous example loaded JSON which was static with well known key names and types. This is often not the case - keys can be added or their types can change. This is common in use cases such as Observability data. diff --git a/docs/integrations/data-ingestion/data-formats/json/other.md b/docs/integrations/data-ingestion/data-formats/json/other.md index 33b744ab987..6df2fb5b12f 100644 --- a/docs/integrations/data-ingestion/data-formats/json/other.md +++ b/docs/integrations/data-ingestion/data-formats/json/other.md @@ -13,7 +13,7 @@ keywords: ['json', 'formats'] Different techniques may be applied to different objects in the same schema. For example, some objects can be best solved with a `String` type and others with a `Map` type. Note that once a `String` type is used, no further schema decisions need to be made. Conversely, it is possible to nest sub-objects within a `Map` key - including a `String` representing JSON - as we show below: ::: -## Using String {#using-string} +## Using the String type {#using-string} If the objects are highly dynamic, with no predictable structure and contain arbitrary nested objects, users should use the `String` type. Values can be extracted at query time using JSON functions as we show below. @@ -150,7 +150,7 @@ String functions are appreciably slower (> 10x) than explicit type conversions w This approach's flexibility comes at a clear performance and syntax cost, and it should be used only for highly dynamic objects in the schema. -### Simple JSON Functions {#simple-json-functions} +### Simple JSON functions {#simple-json-functions} The above examples use the JSON* family of functions. These utilize a full JSON parser based on [simdjson](https://github.com/simdjson/simdjson), that is rigorous in its parsing and will distinguish between the same field nested at different levels. These functions are able to deal with JSON that is syntactically correct but not well-formatted, e.g. double spaces between keys. @@ -202,7 +202,7 @@ Peak memory usage: 211.49 MiB. The above query uses the `simpleJSONExtractString` to extract the `created` key, exploiting the fact we want the first value only for the published date. In this case, the limitations of the `simpleJSON*` functions are acceptable for the gain in performance. -## Using Map {#using-map} +## Using the Map type {#using-map} If the object is used to store arbitrary keys, mostly of one type, consider using the `Map` type. Ideally, the number of unique keys should not exceed several hundred. 
The `Map` type can also be considered for objects with sub-objects, provided the latter have uniformity in their types. Generally, we recommend the `Map` type be used for labels and tags, e.g. Kubernetes pod labels in log data. @@ -358,7 +358,7 @@ The application of maps in this case is typically rare, and suggests that the da -## Using Nested {#using-nested} +## Using the Nested type {#using-nested} The [Nested type](/sql-reference/data-types/nested-data-structures/nested) can be used to model static objects which are rarely subject to change, offering an alternative to `Tuple` and `Array(Tuple)`. We generally recommend avoiding using this type for JSON as its behavior is often confusing. The primary benefit of `Nested` is that sub-columns can be used in ordering keys. @@ -599,7 +599,7 @@ ORDER BY c DESC LIMIT 5; 5 rows in set. Elapsed: 0.007 sec. ``` -### Using Pairwise Arrays {#using-pairwise-arrays} +### Using pairwise arrays {#using-pairwise-arrays} Pairwise arrays provide a balance between the flexibility of representing JSON as Strings and the performance of a more structured approach. The schema is flexible in that any new fields can be potentially added to the root. This, however, requires a significantly more complex query syntax and isn't compatible with nested structures. diff --git a/docs/integrations/data-ingestion/data-sources-index.md b/docs/integrations/data-ingestion/data-sources-index.md index 483061d3eb4..2fd5ba7fae3 100644 --- a/docs/integrations/data-ingestion/data-sources-index.md +++ b/docs/integrations/data-ingestion/data-sources-index.md @@ -5,7 +5,7 @@ description: 'Datasources overview page' title: 'Data Sources' --- -# Data Sources +# Data sources ClickHouse allows you to easily ingest data into your database from a variety of sources. For further information see the pages listed below: diff --git a/docs/integrations/data-ingestion/dbms/dynamodb/index.md b/docs/integrations/data-ingestion/dbms/dynamodb/index.md index 8ce600102db..f67bf5787ef 100644 --- a/docs/integrations/data-ingestion/dbms/dynamodb/index.md +++ b/docs/integrations/data-ingestion/dbms/dynamodb/index.md @@ -28,7 +28,7 @@ Data will be ingested into a `ReplacingMergeTree`. This table engine is commonly * [Change Data Capture (CDC) with PostgreSQL and ClickHouse - Part 1](https://clickhouse.com/blog/clickhouse-postgresql-change-data-capture-cdc-part-1?loc=docs-rockest-migrations) * [Change Data Capture (CDC) with PostgreSQL and ClickHouse - Part 2](https://clickhouse.com/blog/clickhouse-postgresql-change-data-capture-cdc-part-2?loc=docs-rockest-migrations) -## 1. Set up Kinesis Stream {#1-set-up-kinesis-stream} +## 1. Set up Kinesis stream {#1-set-up-kinesis-stream} First, you will want to enable a Kinesis stream on your DynamoDB table to capture changes in real-time. We want to do this before we create the snapshot to avoid missing any data. Find the AWS guide located [here](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/kds.html). @@ -61,12 +61,12 @@ The snapshot data from DynamoDB will look something this: } ``` -Observe that the data is in a nested format. We will need to flatten this data before loading it into ClickHouse. This can be done using the `JSONExtract` function in ClickHouse in a Materialized View. +Observe that the data is in a nested format. We will need to flatten this data before loading it into ClickHouse. This can be done using the `JSONExtract` function in ClickHouse in a materialized view. We will want to create three tables: 1. 
A table to store the raw data from DynamoDB 2. A table to store the final flattened data (destination table) -3. A Materialized View to flatten the data +3. A materialized view to flatten the data For the example DynamoDB data above, the ClickHouse tables would look like this: diff --git a/docs/integrations/data-ingestion/dbms/postgresql/connecting-to-postgresql.md b/docs/integrations/data-ingestion/dbms/postgresql/connecting-to-postgresql.md index 305c80b4865..58513cc9844 100644 --- a/docs/integrations/data-ingestion/dbms/postgresql/connecting-to-postgresql.md +++ b/docs/integrations/data-ingestion/dbms/postgresql/connecting-to-postgresql.md @@ -18,7 +18,7 @@ This page covers following options for integrating PostgreSQL with ClickHouse: - using the `PostgreSQL` table engine, for reading from a PostgreSQL table - using the experimental `MaterializedPostgreSQL` database engine, for syncing a database in PostgreSQL with a database in ClickHouse -## Using the PostgreSQL Table Engine {#using-the-postgresql-table-engine} +## Using the PostgreSQL table engine {#using-the-postgresql-table-engine} The `PostgreSQL` table engine allows **SELECT** and **INSERT** operations on data stored on the remote PostgreSQL server from ClickHouse. This article is to illustrate basic methods of integration using one table. diff --git a/docs/integrations/data-ingestion/emqx/index.md b/docs/integrations/data-ingestion/emqx/index.md index d8e7aef207e..285abbad5ef 100644 --- a/docs/integrations/data-ingestion/emqx/index.md +++ b/docs/integrations/data-ingestion/emqx/index.md @@ -54,7 +54,7 @@ With the infrastructure provided by cloud providers, EMQX Cloud serves dozens of * We are using [MQTT X](https://mqttx.app/) as an MQTT client testing tool to connect the deployment of EMQX Cloud to publish MQTT data. Or other methods connecting to the MQTT broker will do the job as well. -## Get Your ClickHouse Cloud Service {#get-your-clickhouse-cloudservice} +## Get your ClickHouse Cloud service {#get-your-clickhouse-cloudservice} During this setup, we deployed the ClickHouse instance on AWS in N. Virginia (us-east -1), while an EMQX Cloud instance was also deployed in the same region. @@ -107,7 +107,7 @@ Start at the [EMQX Cloud sign up](https://accounts.emqx.com/signup?continue=http ### Create an MQTT cluster {#create-an-mqtt-cluster} -Once logged in, click on "Cloud Console" under the account menu and you will be able to see the green button to create a new deployment. +Once logged in, click on "Cloud console" under the account menu and you will be able to see the green button to create a new deployment. EMQX Cloud Create Deployment Step 1 showing deployment options @@ -121,7 +121,7 @@ Now click the panel to go to the cluster view. On this dashboard, you will see t EMQX Cloud Overview Dashboard showing broker metrics -### Add Client Credential {#add-client-credential} +### Add client credential {#add-client-credential} EMQX Cloud does not allow anonymous connections by default,so you need add a client credential so you can use the MQTT client tool to send data to this broker. @@ -152,7 +152,7 @@ EMQX Cloud offers more than 30 native integrations with popular data systems. Cl EMQX Cloud ClickHouse Data Integration connector details -### Create ClickHouse Resource {#create-clickhouse-resource} +### Create ClickHouse resource {#create-clickhouse-resource} Click "Data Integrations" on the left menu and click "View All Resources". You will find the ClickHouse in the Data Persistence section or you can search for ClickHouse. 
@@ -166,7 +166,7 @@ Click the ClickHouse card to create a new resource. EMQX Cloud ClickHouse Resource Setup form with connection details -### Create A New Rule {#create-a-new-rule} +### Create a new rule {#create-a-new-rule} During the creation of the resource, you will see a popup, and clicking 'New' will leads you to the rule creation page. @@ -214,7 +214,7 @@ INSERT INTO temp_hum (client_id, timestamp, topic, temp, hum) VALUES ('${client_ This is a template for inserting data into Clickhouse, you can see the variables are used here. -### View Rules Details {#view-rules-details} +### View rules details {#view-rules-details} Click "Confirm" and "View Details". Now, everything should be well set. You can see the data integration works from rule details page. diff --git a/docs/integrations/data-ingestion/etl-tools/apache-beam.md b/docs/integrations/data-ingestion/etl-tools/apache-beam.md index f1881d3b120..bd2ccbd2413 100644 --- a/docs/integrations/data-ingestion/etl-tools/apache-beam.md +++ b/docs/integrations/data-ingestion/etl-tools/apache-beam.md @@ -14,7 +14,7 @@ import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported'; **Apache Beam** is an open-source, unified programming model that enables developers to define and execute both batch and stream (continuous) data processing pipelines. The flexibility of Apache Beam lies in its ability to support a wide range of data processing scenarios, from ETL (Extract, Transform, Load) operations to complex event processing and real-time analytics. This integration leverage ClickHouse's official [JDBC connector](https://github.com/ClickHouse/clickhouse-java) for the underlying insertion layer. -## Integration Package {#integration-package} +## Integration package {#integration-package} The integration package required to integrate Apache Beam and ClickHouse is maintained and developed under [Apache Beam I/O Connectors](https://beam.apache.org/documentation/io/connectors/) - an integrations bundle of many popular data storage systems and databases. `org.apache.beam.sdk.io.clickhouse.ClickHouseIO` implementation located within the [Apache Beam repo](https://github.com/apache/beam/tree/0bf43078130d7a258a0f1638a921d6d5287ca01e/sdks/java/io/clickhouse/src/main/java/org/apache/beam/sdk/io/clickhouse). @@ -40,7 +40,7 @@ Earlier versions may not fully support the connector's functionality. The artifacts could be found in the [official maven repository](https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-io-clickhouse). 
-### Code Example {#code-example} +### Code example {#code-example} The following example reads a CSV file named `input.csv` as a `PCollection`, converts it to a Row object (using the defined schema) and inserts it into a local ClickHouse instance using `ClickHouseIO`: @@ -100,7 +100,7 @@ public class Main { ``` -## Supported Data Types {#supported-data-types} +## Supported data types {#supported-data-types} | ClickHouse | Apache Beam | Is Supported | Notes | |------------------------------------|----------------------------|--------------|------------------------------------------------------------------------------------------------------------------------------------------| @@ -126,7 +126,7 @@ public class Main { | | `Schema.TypeName#DECIMAL` | ❌ | | | | `Schema.TypeName#MAP` | ❌ | | -## ClickHouseIO.Write Parameters {#clickhouseiowrite-parameters} +## ClickHouseIO.Write parameters {#clickhouseiowrite-parameters} You can adjust the `ClickHouseIO.Write` configuration with the following setter functions: @@ -149,6 +149,6 @@ Please consider the following limitations when using the connector: * The connector doesn't perform any DDL statements; therefore, the target table must exist prior insertion. -## Related Content {#related-content} +## Related content {#related-content} * `ClickHouseIO` class [documentation](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/clickhouse/ClickHouseIO.html). * `Github` repository of examples [clickhouse-beam-connector](https://github.com/ClickHouse/clickhouse-beam-connector). diff --git a/docs/integrations/data-ingestion/etl-tools/dbt/index.md b/docs/integrations/data-ingestion/etl-tools/dbt/index.md index 07fd94c8cbc..a206c8f3814 100644 --- a/docs/integrations/data-ingestion/etl-tools/dbt/index.md +++ b/docs/integrations/data-ingestion/etl-tools/dbt/index.md @@ -49,7 +49,7 @@ For the following guides, we assume you have a ClickHouse instance available. ## Setup of dbt and the ClickHouse plugin {#setup-of-dbt-and-the-clickhouse-plugin} ### dbt {#dbt} We assume the use of the dbt CLI for the following examples. Users may also wish to consider[ dbt Cloud](https://docs.getdbt.com/docs/dbt-cloud/cloud-overview), which offers a web-based Integrated Development Environment (IDE) allowing users to edit and run projects. @@ -305,7 +305,7 @@ In the later guides, we will convert this query into a model - materializing it Confirm the response includes `Connection test: [OK connection ok]` indicating a successful connection. -## Creating a Simple View Materialization {#creating-a-simple-view-materialization} +## Creating a simple view materialization {#creating-a-simple-view-materialization} When using the view materialization, a model is rebuilt as a view on each run, via a `CREATE VIEW AS` statement in ClickHouse. This doesn't require any additional storage of data but will be slower to query than table materializations. @@ -433,7 +433,7 @@ When using the view materialization, a model is rebuilt as a view on each run, v +------+------------+----------+------------------+------+---------+-------------------+ ``` -## Creating a Table Materialization {#creating-a-table-materialization} +## Creating a table materialization {#creating-a-table-materialization} In the previous example, our model was materialized as a view. While this might offer sufficient performance for some queries, more complex SELECTs or frequently executed queries may be better materialized as a table.
This materialization is useful for models that will be queried by BI tools to ensure users have a faster experience. This effectively causes the query results to be stored as a new table, with the associated storage overheads - effectively, an `INSERT TO SELECT` is executed. Note that this table will be reconstructed each time i.e., it is not incremental. Large result sets may therefore result in long execution times - see [dbt Limitations](#limitations). @@ -811,7 +811,7 @@ WHERE id > (SELECT max(id) FROM imdb_dbt.actor_summary) OR updated_at > (SELECT In this run, only the new rows are added straight to `imdb_dbt.actor_summary` table and there is no table creation involved. -### Delete+Insert mode (Experimental) {#deleteinsert-mode-experimental} +### Delete and insert mode (experimental) {#deleteinsert-mode-experimental} Historically ClickHouse has had only limited support for updates and deletes, in the form of asynchronous [Mutations](/sql-reference/statements/alter/index.md). These can be extremely IO-intensive and should generally be avoided. @@ -835,7 +835,7 @@ This process is shown below: lightweight delete incremental -### insert_overwrite mode (Experimental) {#insert_overwrite-mode-experimental} +### insert_overwrite mode (experimental) {#insert_overwrite-mode-experimental} Performs the following steps: 1. Create a staging (temporary) table with the same structure as the incremental model relation: `CREATE TABLE {staging} AS {target}`. @@ -852,7 +852,7 @@ This approach has the following advantages: insert overwrite incremental -## Creating a Snapshot {#creating-a-snapshot} +## Creating a snapshot {#creating-a-snapshot} dbt snapshots allow a record to be made of changes to a mutable model over time. This in turn allows point-in-time queries on models, where analysts can "look back in time" at the previous state of a model. This is achieved using [type-2 Slowly Changing Dimensions](https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row) where from and to date columns record when a row was valid. This functionality is supported by the ClickHouse plugin and is demonstrated below. @@ -1028,7 +1028,7 @@ Note how a table actor_summary_snapshot has been created in the snapshots db (de For further details on dbt snapshots see [here](https://docs.getdbt.com/docs/building-a-dbt-project/snapshots). -## Using Seeds {#using-seeds} +## Using seeds {#using-seeds} dbt provides the ability to load data from CSV files. This capability is not suited to loading large exports of a database and is more designed for small files typically used for code tables and [dictionaries](../../../../sql-reference/dictionaries/index.md), e.g. mapping country codes to country names. For a simple example, we generate and then upload a list of genre codes using the seed functionality. diff --git a/docs/integrations/data-ingestion/etl-tools/dlt-and-clickhouse.md b/docs/integrations/data-ingestion/etl-tools/dlt-and-clickhouse.md index 1d0d833b187..641f33f511c 100644 --- a/docs/integrations/data-ingestion/etl-tools/dlt-and-clickhouse.md +++ b/docs/integrations/data-ingestion/etl-tools/dlt-and-clickhouse.md @@ -21,7 +21,7 @@ import CommunityMaintainedBadge from '@theme/badges/CommunityMaintained'; pip install "dlt[clickhouse]" ``` -## Setup Guide {#setup-guide} +## Setup guide {#setup-guide} ### 1. Initialize the dlt Project {#1-initialize-the-dlt-project} @@ -61,7 +61,7 @@ GRANT CREATE TEMPORARY TABLE, S3 ON *.* TO dlt; ``` -### 3. Add Credentials {#3-add-credentials} +### 3. 
Add credentials {#3-add-credentials} Next, set up the ClickHouse credentials in the `.dlt/secrets.toml` file as shown below: @@ -97,7 +97,7 @@ destination.clickhouse.credentials="clickhouse://dlt:Dlt*12345789234567@localhos ``` -## Write Disposition {#write-disposition} +## Write disposition {#write-disposition} All [write dispositions](https://dlthub.com/docs/general-usage/incremental-loading#choosing-a-write-disposition) are supported. @@ -110,7 +110,7 @@ Write dispositions in the dlt library define how the data should be written to t **Append**: This is the default disposition. It will append the data to the existing data in the destination, ignoring the `primary_key` field. -## Data Loading {#data-loading} +## Data loading {#data-loading} Data is loaded into ClickHouse using the most efficient method depending on the data source: - For local files, the `clickhouse-connect` library is used to directly load files into ClickHouse tables using the `INSERT` command. @@ -120,7 +120,7 @@ Data is loaded into ClickHouse using the most efficient method depending on the `Clickhouse` does not support multiple datasets in one database, whereas `dlt` relies on datasets due to multiple reasons. In order to make `Clickhouse` work with `dlt`, tables generated by `dlt` in your `Clickhouse` database will have their names prefixed with the dataset name, separated by the configurable `dataset_table_separator`. Additionally, a special sentinel table that does not contain any data will be created, allowing `dlt` to recognize which virtual datasets already exist in a `Clickhouse` destination. -## Supported File Formats {#supported-file-formats} +## Supported file formats {#supported-file-formats} - jsonl is the preferred format for both direct loading and staging. - parquet is supported for both direct loading and staging. @@ -133,12 +133,12 @@ The `clickhouse` destination has a few specific deviations from the default sql 5. `Clickhouse` accepts adding columns to a populated table that are not null. 6. `Clickhouse` can produce rounding errors under certain conditions when using the float or double datatype. If you cannot afford to have rounding errors, make sure to use the decimal datatype. For example, loading the value 12.7001 into a double column with the loader file format set to `jsonl` will predictably produce a rounding error. -## Supported Column Hints {#supported-column-hints} +## Supported column hints {#supported-column-hints} ClickHouse supports the following column hints: - `primary_key` - marks the column as part of the primary key. Multiple columns can have this hint to create a composite primary key. -## Table Engine {#table-engine} +## Table engine {#table-engine} By default, tables are created using the `ReplicatedMergeTree` table engine in ClickHouse. You can specify an alternate table engine using the `table_engine_type` with the clickhouse adapter: ```bash @@ -158,7 +158,7 @@ Supported values are: - `merge_tree` - creates tables using the `MergeTree` engine - `replicated_merge_tree` (default) - creates tables using the `ReplicatedMergeTree` engine -## Staging Support {#staging-support} +## Staging support {#staging-support} ClickHouse supports Amazon S3, Google Cloud Storage and Azure Blob Storage as file staging destinations. 
@@ -181,7 +181,7 @@ pipeline = dlt.pipeline( ) ``` -### Using Google Cloud Storage as a Staging Area {#using-google-cloud-storage-as-a-staging-area} +### Using Google Cloud Storage as a staging area {#using-google-cloud-storage-as-a-staging-area} dlt supports using Google Cloud Storage (GCS) as a staging area when loading data into ClickHouse. This is handled automatically by ClickHouse's GCS table function which dlt uses under the hood. The clickhouse GCS table function only supports authentication using Hash-based Message Authentication Code (HMAC) keys. To enable this, GCS provides an S3 compatibility mode that emulates the Amazon S3 API. ClickHouse takes advantage of this to allow accessing GCS buckets via its S3 integration. @@ -221,7 +221,7 @@ There is active work in progress to simplify and improve the GCS staging setup f - Make filesystem destination work with gcs in s3 compatibility mode - Google Cloud Storage staging area support -### dbt Support {#dbt-support} +### dbt support {#dbt-support} Integration with dbt is generally supported via dbt-clickhouse. ### Syncing of `dlt` state {#syncing-of-dlt-state} diff --git a/docs/integrations/data-ingestion/gcs/index.md b/docs/integrations/data-ingestion/gcs/index.md index 003c8249e66..04c5110c387 100644 --- a/docs/integrations/data-ingestion/gcs/index.md +++ b/docs/integrations/data-ingestion/gcs/index.md @@ -19,13 +19,13 @@ If you are using ClickHouse Cloud on [Google Cloud](https://cloud.google.com), t ClickHouse recognizes that GCS represents an attractive storage solution for users seeking to separate storage and compute. To help achieve this, support is provided for using GCS as the storage for a MergeTree engine. This will enable users to exploit the scalability and cost benefits of GCS, and the insert and query performance of the MergeTree engine. -## GCS Backed MergeTree {#gcs-backed-mergetree} +## GCS backed MergeTree {#gcs-backed-mergetree} -### Creating a Disk {#creating-a-disk} +### Creating a disk {#creating-a-disk} To utilize a GCS bucket as a disk, we must first declare it within the ClickHouse configuration in a file under `conf.d`. An example of a GCS disk declaration is shown below. This configuration includes multiple sections to configure the GCS "disk", the cache, and the policy that is specified in DDL queries when tables are to be created on the GCS disk. Each of these are described below. -#### storage_configuration > disks > gcs {#storage_configuration--disks--gcs} +#### Storage configuration > disks > gcs {#storage_configuration--disks--gcs} This part of the configuration is shown in the highlighted section and specifies that: - Batch deletes are not to be performed. GCS does not currently support batch deletes, so the autodetect is disabled to suppress error messages. @@ -61,7 +61,7 @@ This part of the configuration is shown in the highlighted section and specifies ``` -#### storage_configuration > disks > cache {#storage_configuration--disks--cache} +#### Storage configuration > disks > cache {#storage_configuration--disks--cache} The example configuration highlighted below enables a 10Gi memory cache for the disk `gcs`. @@ -98,7 +98,7 @@ The example configuration highlighted below enables a 10Gi memory cache for the ``` -#### storage_configuration > policies > gcs_main {#storage_configuration--policies--gcs_main} +#### Storage configuration > policies > gcs_main {#storage_configuration--policies--gcs_main} Storage configuration policies allow choosing where data is stored.
The policy highlighted below allows data to be stored on the disk `gcs` by specifying the policy `gcs_main`. For example, `CREATE TABLE ... SETTINGS storage_policy='gcs_main'`. @@ -170,12 +170,12 @@ Depending on the hardware, this latter insert of 1m rows may take a few minutes SELECT passenger_count, avg(tip_amount) AS avg_tip, avg(total_amount) AS avg_amount FROM trips_gcs GROUP BY passenger_count; ``` -### Handling Replication {#handling-replication} +### Handling replication {#handling-replication} Replication with GCS disks can be accomplished by using the `ReplicatedMergeTree` table engine. See the [replicating a single shard across two GCP regions using GCS](#gcs-multi-region) guide for details. -### Learn More {#learn-more} +### Learn more {#learn-more} The [Cloud Storage XML API](https://cloud.google.com/storage/docs/xml-api/overview) is interoperable with some tools and libraries that work with services such as Amazon Simple Storage Service (Amazon S3). @@ -201,7 +201,7 @@ Sample requirements for high availability: ClickHouse Keeper requires two nodes to function, hence a requirement for three nodes for high availability. -### Prepare VMs {#prepare-vms} +### Prepare virtual machines {#prepare-vms} Deploy five VMS in three regions: @@ -295,7 +295,7 @@ All of the ClickHouse Keeper nodes have the same configuration file except for t ``` -### Configure ClickHouse Server {#configure-clickhouse-server} +### Configure ClickHouse server {#configure-clickhouse-server} :::note best practice Some of the steps in this guide will ask you to place a configuration file in `/etc/clickhouse-server/config.d/`. This is the default location on Linux systems for configuration override files. When you put these files into that directory ClickHouse will merge the content with the default configuration. By placing these files in the `config.d` directory you will avoid losing your configuration during an upgrade. @@ -624,7 +624,7 @@ formatReadableSize(total_bytes): 36.42 MiB 1 row in set. Elapsed: 0.002 sec. ``` -#### Verify in Google Cloud Console {#verify-in-google-cloud-console} +#### Verify in Google Cloud console {#verify-in-google-cloud-console} Looking at the buckets you will see that a folder was created in each bucket with the name that was used in the `storage.xml` configuration file. Expand the folders and you will see many files, representing the data partitions. #### Bucket for replica one {#bucket-for-replica-one} diff --git a/docs/integrations/data-ingestion/google-dataflow/dataflow.md b/docs/integrations/data-ingestion/google-dataflow/dataflow.md index c05b6b0396e..f982e6c1618 100644 --- a/docs/integrations/data-ingestion/google-dataflow/dataflow.md +++ b/docs/integrations/data-ingestion/google-dataflow/dataflow.md @@ -16,19 +16,19 @@ import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported'; There are two main ways to use Google Dataflow with ClickHouse, both are leveraging [`ClickHouseIO Apache Beam connector`](/integrations/apache-beam): -## 1. Java Runner {#1-java-runner} +## 1. Java runner {#1-java-runner} The [Java Runner](./java-runner) allows users to implement custom Dataflow pipelines using the Apache Beam SDK `ClickHouseIO` integration. This approach provides full flexibility and control over the pipeline logic, enabling users to tailor the ETL process to specific requirements. However, this option requires knowledge of Java programming and familiarity with the Apache Beam framework. 
-### Key Features {#key-features} +### Key features {#key-features} - High degree of customization. - Ideal for complex or advanced use cases. - Requires coding and understanding of the Beam API. -## 2. Predefined Templates {#2-predefined-templates} +## 2. Predefined templates {#2-predefined-templates} ClickHouse offers [predefined templates](./templates) designed for specific use cases, such as importing data from BigQuery into ClickHouse. These templates are ready-to-use and simplify the integration process, making them an excellent choice for users who prefer a no-code solution. -### Key Features {#key-features-1} +### Key features {#key-features-1} - No Beam coding required. - Quick and easy setup for simple use cases. - Suitable also for users with minimal programming expertise. diff --git a/docs/integrations/data-ingestion/google-dataflow/java-runner.md b/docs/integrations/data-ingestion/google-dataflow/java-runner.md index 08926d6c423..fe6abe1e201 100644 --- a/docs/integrations/data-ingestion/google-dataflow/java-runner.md +++ b/docs/integrations/data-ingestion/google-dataflow/java-runner.md @@ -8,13 +8,13 @@ title: 'Dataflow Java Runner' import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported'; -# Dataflow Java Runner +# Dataflow Java runner The Dataflow Java Runner lets you execute custom Apache Beam pipelines on Google Cloud's Dataflow service. This approach provides maximum flexibility and is well-suited for advanced ETL workflows. -## How It Works {#how-it-works} +## How it works {#how-it-works} 1. **Pipeline Implementation** To use the Java Runner, you need to implement your Beam pipeline using the `ClickHouseIO` - our official Apache Beam connector. For code examples and instructions on how to use the `ClickHouseIO`, please visit [ClickHouse Apache Beam](/integrations/apache-beam). diff --git a/docs/integrations/data-ingestion/google-dataflow/templates.md b/docs/integrations/data-ingestion/google-dataflow/templates.md index a5dc4c98569..f2006282b3c 100644 --- a/docs/integrations/data-ingestion/google-dataflow/templates.md +++ b/docs/integrations/data-ingestion/google-dataflow/templates.md @@ -8,19 +8,19 @@ title: 'Google Dataflow Templates' import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported'; -# Google Dataflow Templates +# Google Dataflow templates Google Dataflow templates provide a convenient way to execute prebuilt, ready-to-use data pipelines without the need to write custom code. These templates are designed to simplify common data processing tasks and are built using [Apache Beam](https://beam.apache.org/), leveraging connectors like `ClickHouseIO` for seamless integration with ClickHouse databases. By running these templates on Google Dataflow, you can achieve highly scalable, distributed data processing with minimal effort. -## Why Use Dataflow Templates? {#why-use-dataflow-templates} +## Why use Dataflow templates? {#why-use-dataflow-templates} - **Ease of Use**: Templates eliminate the need for coding by offering preconfigured pipelines tailored to specific use cases. - **Scalability**: Dataflow ensures your pipeline scales efficiently, handling large volumes of data with distributed processing. - **Cost Efficiency**: Pay only for the resources you consume, with the ability to optimize pipeline execution costs. 
-## How to Run Dataflow Templates {#how-to-run-dataflow-templates} +## How to run Dataflow templates {#how-to-run-dataflow-templates} As of today, the ClickHouse official template is available via the Google Cloud Console, CLI or Dataflow REST API. For detailed step-by-step instructions, refer to the [Google Dataflow Run Pipeline From a Template Guide](https://cloud.google.com/dataflow/docs/templates/provided-templates). diff --git a/docs/integrations/data-ingestion/google-dataflow/templates/bigquery-to-clickhouse.md b/docs/integrations/data-ingestion/google-dataflow/templates/bigquery-to-clickhouse.md index 038fc3da643..7b8040d38da 100644 --- a/docs/integrations/data-ingestion/google-dataflow/templates/bigquery-to-clickhouse.md +++ b/docs/integrations/data-ingestion/google-dataflow/templates/bigquery-to-clickhouse.md @@ -29,7 +29,7 @@ The template can read the entire table or filter specific records using a provid * The target ClickHouse table must exist. * The ClickHouse host must be accessible from the Dataflow worker machines. -## Template Parameters {#template-parameters} +## Template parameters {#template-parameters}

@@ -72,7 +72,7 @@ Having said that, your BigQuery dataset (either table or query) must have the ex target table. ::: -## Data Types Mapping {#data-types-mapping} +## Data type mapping {#data-types-mapping} The BigQuery types are converted based on your ClickHouse table definition. Therefore, the above table lists the recommended mapping you should have in your target ClickHouse table (for a given BigQuery table/query): @@ -134,7 +134,7 @@ To add it, please scroll down to the `Password for ClickHouse Endpoint` option. in [this guide](https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates#before-you-begin) to set up the required configurations, settings, and permissions for running the DataFlow template. -### Run Command {#run-command} +### Run command {#run-command} Use the [`gcloud dataflow flex-template run`](https://cloud.google.com/sdk/gcloud/reference/dataflow/flex-template/run) command to run a Dataflow job that uses the Flex Template. @@ -172,7 +172,7 @@ job: -### Monitor the Job {#monitor-the-job} +### Monitor the job {#monitor-the-job} Navigate to the [Dataflow Jobs tab](https://console.cloud.google.com/dataflow/jobs) in your Google Cloud Console to monitor the status of the job. You'll find the job details, including progress and any errors: @@ -181,7 +181,7 @@ monitor the status of the job. You'll find the job details, including progress a ## Troubleshooting {#troubleshooting} -### Code: 241. DB::Exception: Memory limit (total) exceeded {#code-241-dbexception-memory-limit-total-exceeded} +### Memory limit (total) exceeded error (code 241) {#code-241-dbexception-memory-limit-total-exceeded} This error occurs when ClickHouse runs out of memory while processing large batches of data. To resolve this issue: diff --git a/docs/integrations/data-ingestion/insert-local-files.md b/docs/integrations/data-ingestion/insert-local-files.md index bd1d5c4bc47..778afef7ca9 100644 --- a/docs/integrations/data-ingestion/insert-local-files.md +++ b/docs/integrations/data-ingestion/insert-local-files.md @@ -7,7 +7,7 @@ description: 'Learn about Insert Local Files' show_related_blogs: true --- -# Insert Local Files +# Insert local files You can use `clickhouse-client` to stream local files into your ClickHouse service. This allows you the ability to preprocess the data using the many powerful and convenient ClickHouse functions. Let's look at an example... diff --git a/docs/integrations/data-ingestion/kafka/confluent/custom-connector.md b/docs/integrations/data-ingestion/kafka/confluent/custom-connector.md index e40af985a27..f2651bfc0f0 100644 --- a/docs/integrations/data-ingestion/kafka/confluent/custom-connector.md +++ b/docs/integrations/data-ingestion/kafka/confluent/custom-connector.md @@ -38,12 +38,12 @@ For more details, please refer to the [official Confluent documentation](https:/ #### Create a Topic {#create-a-topic} Creating a topic on Confluent Platform is fairly simple, and there are detailed instructions [here](https://docs.confluent.io/cloud/current/client-apps/topics/manage.html). -#### Important Notes {#important-notes} +#### Important notes {#important-notes} * The Kafka topic name must be the same as the ClickHouse table name. The way to tweak this is by using a transformer (for example [`ExtractTopic`](https://docs.confluent.io/platform/current/connect/transforms/extracttopic.html)). * More partitions does not always mean more performance - see our upcoming guide for more details and performance tips. 
-#### Install Connector {#install-connector} +#### Install connector {#install-connector} You can download the connector from our [repository](https://github.com/ClickHouse/clickhouse-kafka-connect/releases) - please feel free to submit comments and issues there as well! Navigate to "Connector Plugins" -> "Add plugin" and using the following settings: @@ -59,7 +59,7 @@ Example: #### Gather your connection details {#gather-your-connection-details} -#### Configure the Connector {#configure-the-connector} +#### Configure the connector {#configure-the-connector} Navigate to `Connectors` -> `Add Connector` and use the following settings (note that the values are examples only): ```json @@ -93,7 +93,7 @@ You must specify HTTP(S) port. The Connector doesn't support Native protocol yet You should be all set! -#### Known Limitations {#known-limitations} +#### Known limitations {#known-limitations} * Custom Connectors must use public internet endpoints. Static IP addresses aren't supported. * You can override some Custom Connector properties. See the fill [list in the official documentation.](https://docs.confluent.io/cloud/current/connectors/bring-your-connector/custom-connector-manage.html#override-configuration-properties) * Custom Connectors are available only in [some AWS regions](https://docs.confluent.io/cloud/current/connectors/bring-your-connector/custom-connector-fands.html#supported-aws-regions) diff --git a/docs/integrations/data-ingestion/kafka/confluent/kafka-connect-http.md b/docs/integrations/data-ingestion/kafka/confluent/kafka-connect-http.md index 2a107bec3b4..21589e54ed8 100644 --- a/docs/integrations/data-ingestion/kafka/confluent/kafka-connect-http.md +++ b/docs/integrations/data-ingestion/kafka/confluent/kafka-connect-http.md @@ -14,7 +14,7 @@ import httpAdvanced from '@site/static/images/integrations/data-ingestion/kafka/ import createMessageInTopic from '@site/static/images/integrations/data-ingestion/kafka/confluent/create_message_in_topic.png'; -# Confluent HTTP Sink Connector +# Confluent HTTP sink connector The HTTP Sink Connector is data type agnostic and thus does not need a Kafka schema as well as supporting ClickHouse specific data types such as Maps and Arrays. This additional flexibility comes at a slight increase in configuration complexity. Below we describe a simple installation, pulling messages from a single Kafka topic and inserting rows into a ClickHouse table. @@ -29,7 +29,7 @@ Below we describe a simple installation, pulling messages from a single Kafka to -#### 2. Run Kafka Connect and the HTTP Sink Connector {#2-run-kafka-connect-and-the-http-sink-connector} +#### 2. Run Kafka Connect and the HTTP sink connector {#2-run-kafka-connect-and-the-http-sink-connector} You have two options: @@ -109,7 +109,7 @@ From the [Sink documentation](https://docs.confluent.io/kafka-connectors/http/cu 1. Verify your Kafka records have the same key. 2. When you add parameters to the HTTP API URL, each record can result in a unique URL. For this reason, batching is disabled when using additional URL parameters. -#### 400 Bad Request {#400-bad-request} +#### 400 bad request {#400-bad-request} ##### CANNOT_PARSE_QUOTED_STRING {#cannot_parse_quoted_string} If HTTP Sink fails with the following message when inserting a JSON object into a `String` column: @@ -123,7 +123,7 @@ Set `input_format_json_read_objects_as_strings=1` setting in URL as encoded stri Note that this example preserves the Array fields of the Github dataset. 
We assume you have an empty github topic in the examples and use [kcat](https://github.com/edenhill/kcat) for message insertion to Kafka. -##### 1. Prepare Configuration {#1-prepare-configuration} +##### 1. Prepare configuration {#1-prepare-configuration} Follow [these instructions](https://docs.confluent.io/cloud/current/cp-component/connect-cloud-config.html#set-up-a-local-connect-worker-with-cp-install) for setting up Connect relevant to your installation type, noting the differences between a standalone and distributed cluster. If using Confluent Cloud, the distributed setup is relevant. diff --git a/docs/integrations/data-ingestion/kafka/kafka-clickhouse-connect-sink.md b/docs/integrations/data-ingestion/kafka/kafka-clickhouse-connect-sink.md index b3191a56626..d8dcb3f4969 100644 --- a/docs/integrations/data-ingestion/kafka/kafka-clickhouse-connect-sink.md +++ b/docs/integrations/data-ingestion/kafka/kafka-clickhouse-connect-sink.md @@ -29,7 +29,7 @@ The [Kafka Connect](https://docs.confluent.io/platform/current/connect/index.htm |----------------------------------|--------------------|---------------|--------------------| | 1.0.0 | > 23.3 | > 2.7 | > 6.1 | -### Main Features {#main-features} +### Main features {#main-features} - Shipped with out-of-the-box exactly-once semantics. It's powered by a new ClickHouse core feature named [KeeperMap](https://github.com/ClickHouse/ClickHouse/pull/39976) (used as a state store by the connector) and allows for minimalistic architecture. - Support for 3rd-party state stores: Currently defaults to In-memory but can use KeeperMap (Redis to be added soon). @@ -44,7 +44,7 @@ The [Kafka Connect](https://docs.confluent.io/platform/current/connect/index.htm -#### General Installation Instructions {#general-installation-instructions} +#### General installation instructions {#general-installation-instructions} The connector is distributed as a single JAR file containing all the class files necessary to run the plugin. @@ -118,7 +118,7 @@ The full table of configuration options: | `tolerateStateMismatch` | Allows the connector to drop records "earlier" than the current offset stored AFTER_PROCESSING (e.g. if offset 5 is sent, and offset 250 was the last recorded offset) | `"false"` | | `ignorePartitionsWhenBatching` | Will ignore partition when collecting messages for insert (though only if `exactlyOnce` is `false`). Performance Note: The more connector tasks, the fewer kafka partitions assigned per task - this can mean diminishing returns. | `"false"` | -### Target Tables {#target-tables} +### Target tables {#target-tables} ClickHouse Connect Sink reads messages from Kafka topics and writes them to appropriate tables. ClickHouse Connect Sink writes data into existing tables. Please, make sure a target table with an appropriate schema was created in ClickHouse before starting to insert data into it. @@ -129,7 +129,7 @@ Each topic requires a dedicated target table in ClickHouse. The target table nam If you need to transform outbound messages before they are sent to ClickHouse Kafka Connect Sink, use [Kafka Connect Transformations](https://docs.confluent.io/platform/current/connect/transforms/overview.html). 
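To make the requirement for a pre-existing target table concrete, here is a hedged sketch (topic, table, and column names are hypothetical, not from the connector documentation). Since the target table name matches the topic name by default, a JSON topic called `events` could be paired with:

```sql
-- Hypothetical target table for a Kafka topic named `events`;
-- column names and types should line up with the fields in the messages the sink delivers
CREATE TABLE events
(
    id         UInt64,
    type       String,
    created_at DateTime
)
ENGINE = MergeTree
ORDER BY (type, created_at);
```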
-### Supported Data types {#supported-data-types} +### Supported data types {#supported-data-types} **With a schema declared:** @@ -163,11 +163,11 @@ Sink, use [Kafka Connect Transformations](https://docs.confluent.io/platform/cur A record is converted into JSON and sent to ClickHouse as a value in [JSONEachRow](../../../sql-reference/formats.mdx#jsoneachrow) format. -### Configuration Recipes {#configuration-recipes} +### Configuration recipes {#configuration-recipes} These are some common configuration recipes to get you started quickly. -#### Basic Configuration {#basic-configuration} +#### Basic configuration {#basic-configuration} The most basic configuration to get you started - it assumes you're running Kafka Connect in distributed mode and have a ClickHouse server running on `localhost:8443` with SSL enabled, data is in schemaless JSON. @@ -196,7 +196,7 @@ The most basic configuration to get you started - it assumes you're running Kafk } ``` -#### Basic Configuration with Multiple Topics {#basic-configuration-with-multiple-topics} +#### Basic configuration with multiple topics {#basic-configuration-with-multiple-topics} The connector can consume data from multiple topics @@ -212,7 +212,7 @@ The connector can consume data from multiple topics } ``` -#### Basic Configuration with DLQ {#basic-configuration-with-dlq} +#### Basic configuration with DLQ {#basic-configuration-with-dlq} ```json { @@ -229,7 +229,7 @@ The connector can consume data from multiple topics #### Using with different data formats {#using-with-different-data-formats} -##### Avro Schema Support {#avro-schema-support} +##### Avro schema support {#avro-schema-support} ```json { @@ -244,7 +244,7 @@ The connector can consume data from multiple topics } ``` -##### Protobuf Schema Support {#protobuf-schema-support} +##### Protobuf schema support {#protobuf-schema-support} ```json { @@ -261,7 +261,7 @@ The connector can consume data from multiple topics Please note: if you encounter issues with missing classes, not every environment comes with the protobuf converter and you may need an alternate release of the jar bundled with dependencies. -##### JSON Schema Support {#json-schema-support} +##### JSON schema support {#json-schema-support} ```json { @@ -274,7 +274,7 @@ Please note: if you encounter issues with missing classes, not every environment } ``` -##### String Support {#string-support} +##### String support {#string-support} The connector supports the String Converter in different ClickHouse formats: [JSON](/interfaces/formats#jsoneachrow), [CSV](/interfaces/formats#csv), and [TSV](/interfaces/formats#tabseparated). @@ -328,11 +328,11 @@ ClickHouse Kafka Connect reports the following metrics: - Batch size is inherited from the Kafka Consumer properties. - When using KeeperMap for exactly-once and the offset is changed or re-wound, you need to delete the content from KeeperMap for that specific topic. (See troubleshooting guide below for more details) -### Tuning Performance {#tuning-performance} +### Tuning performance {#tuning-performance} If you've ever though to yourself "I would like to adjust the batch size for the sink connector", then this is the section for you. -##### Connect Fetch vs Connector Poll {#connect-fetch-vs-connector-poll} +##### Connect fetch vs connector poll {#connect-fetch-vs-connector-poll} Kafka Connect (the framework our sink connector is built on) will fetch messages from kafka topics in the background (independent of the connector). 
diff --git a/docs/integrations/data-ingestion/kafka/kafka-connect-jdbc.md b/docs/integrations/data-ingestion/kafka/kafka-connect-jdbc.md index 2f56b1086fd..501d2abf857 100644 --- a/docs/integrations/data-ingestion/kafka/kafka-connect-jdbc.md +++ b/docs/integrations/data-ingestion/kafka/kafka-connect-jdbc.md @@ -8,7 +8,7 @@ title: 'JDBC Connector' import ConnectionDetails from '@site/docs/_snippets/_gather_your_details_http.mdx'; -# JDBC Connector +# JDBC connector :::note This connector should only be used if your data is simple and consists of primitive data types e.g., int. ClickHouse specific types such as maps are not supported. @@ -47,7 +47,7 @@ Common Issue: the docs suggest copying the jar to `share/java/kafka-connect-jdbc ::: -#### 3. Prepare Configuration {#3-prepare-configuration} +#### 3. Prepare configuration {#3-prepare-configuration} Follow [these instructions](https://docs.confluent.io/cloud/current/cp-component/connect-cloud-config.html#set-up-a-local-connect-worker-with-cp-install) for setting up a Connect relevant to your installation type, noting the differences between a standalone and distributed cluster. If using Confluent Cloud the distributed setup is relevant. @@ -147,7 +147,7 @@ SELECT count() FROM default.github; | 10000 | ``` -### Recommended Further Reading {#recommended-further-reading} +### Recommended further reading {#recommended-further-reading} * [Kafka Sink Configuration Parameters](https://docs.confluent.io/kafka-connect-jdbc/current/sink-connector/sink_config_options.html#sink-config-options) * [Kafka Connect Deep Dive – JDBC Source Connector](https://www.confluent.io/blog/kafka-connect-deep-dive-jdbc-source-connector) diff --git a/docs/integrations/data-ingestion/kafka/kafka-table-engine-named-collections.md b/docs/integrations/data-ingestion/kafka/kafka-table-engine-named-collections.md index da200f96ddc..81edf388e47 100644 --- a/docs/integrations/data-ingestion/kafka/kafka-table-engine-named-collections.md +++ b/docs/integrations/data-ingestion/kafka/kafka-table-engine-named-collections.md @@ -5,7 +5,7 @@ keywords: ['named collection', 'how to', 'kafka'] slug: /integrations/data-ingestion/kafka/kafka-table-engine-named-collections --- -# Integrating ClickHouse with Kafka using Named Collections +# Integrating ClickHouse with Kafka using named collections ## Introduction {#introduction} @@ -89,24 +89,24 @@ Add the following section to your ClickHouse `config.xml` file: ``` -### Configuration Notes {#configuration-notes} +### Configuration notes {#configuration-notes} 1. Adjust Kafka addresses and related configurations to match your Kafka cluster setup. 2. The section before `` contains ClickHouse Kafka engine parameters. For a full list of parameters, refer to the [Kafka engine parameters ](/engines/table-engines/integrations/kafka). 3. The section within `` contains extended Kafka configuration options. For more options, refer to the [librdkafka configuration](https://github.com/confluentinc/librdkafka/blob/master/CONFIGURATION.md). 4. This example uses the `SASL_SSL` security protocol and `PLAIN` mechanism. Adjust these settings based on your Kafka cluster configuration. -## Creating Tables and Databases {#creating-tables-and-databases} +## Creating tables and databases {#creating-tables-and-databases} Create the necessary databases and tables on your ClickHouse cluster. If you run ClickHouse as a single node, omit the cluster part of the SQL command and use any other engine instead of `ReplicatedMergeTree`. 
-### Create the Database {#create-the-database} +### Create the database {#create-the-database} ```sql CREATE DATABASE kafka_testing ON CLUSTER LAB_CLICKHOUSE_CLUSTER; ``` -### Create Kafka Tables {#create-kafka-tables} +### Create Kafka tables {#create-kafka-tables} Create the first Kafka table for the first Kafka cluster: @@ -132,7 +132,7 @@ CREATE TABLE kafka_testing.second_kafka_table ON CLUSTER STAGE_CLICKHOUSE_CLUSTE ENGINE = Kafka(cluster_2); ``` -### Create Replicated Tables {#create-replicated-tables} +### Create replicated tables {#create-replicated-tables} Create a table for the first Kafka table: @@ -158,7 +158,7 @@ CREATE TABLE kafka_testing.second_replicated_table ON CLUSTER STAGE_CLICKHOUSE_C ORDER BY id; ``` -### Create Materialized Views {#create-materialized-views} +### Create materialized views {#create-materialized-views} Create a materialized view to insert data from the first Kafka table into the first replicated table: @@ -182,7 +182,7 @@ SELECT FROM second_kafka_table; ``` -## Verifying the Setup {#verifying-the-setup} +## Verifying the setup {#verifying-the-setup} You should now see the relative consumer groups on your Kafka clusters: - `cluster_1_clickhouse_consumer` on `cluster_1` diff --git a/docs/integrations/data-ingestion/kafka/kafka-table-engine.md b/docs/integrations/data-ingestion/kafka/kafka-table-engine.md index ef102450bd5..8d9d6bafa1c 100644 --- a/docs/integrations/data-ingestion/kafka/kafka-table-engine.md +++ b/docs/integrations/data-ingestion/kafka/kafka-table-engine.md @@ -212,7 +212,7 @@ You should see 200,000 rows: └─────────┘ ``` -#### Common Operations {#common-operations} +#### Common operations {#common-operations} ##### Stopping & restarting message consumption {#stopping--restarting-message-consumption} @@ -228,7 +228,7 @@ This will not impact the offsets of the consumer group. To restart consumption, ATTACH TABLE github_queue; ``` -##### Adding Kafka Metadata {#adding-kafka-metadata} +##### Adding Kafka metadata {#adding-kafka-metadata} It can be useful to keep track of the metadata from the original Kafka messages after it's been ingested into ClickHouse. For example, we may want to know how much of a specific topic or partition we have consumed. For this purpose, the Kafka table engine exposes several [virtual columns](../../../engines/table-engines/index.md#table_engines-virtual_columns). These can be persisted as columns in our target table by modifying our schema and materialized view's select statement. @@ -290,7 +290,7 @@ The result looks like: | Oxonium | CommitCommentEvent | 2011-02-12 12:31:28 | github | 0 | -##### Modify Kafka Engine Settings {#modify-kafka-engine-settings} +##### Modify Kafka engine settings {#modify-kafka-engine-settings} We recommend dropping the Kafka engine table and recreating it with the new settings. The materialized view does not need to be modified during this process - message consumption will resume once the Kafka engine table is recreated. @@ -459,7 +459,7 @@ wc -l Although an elaborate example, this illustrates the power of materialized views when used in conjunction with the Kafka engine. 
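Stripped down to its essentials, that pattern combines three objects. The sketch below is illustrative only - the broker address, topic, and columns are assumptions, not the guide's full schema:

```sql
-- Kafka engine table: consumes messages from the topic
CREATE TABLE github_queue
(
    event_type  String,
    actor_login String,
    created_at  DateTime
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'github',
         kafka_group_name = 'clickhouse',
         kafka_format = 'JSONEachRow';

-- MergeTree table: stores the consumed rows
CREATE TABLE github_events
(
    event_type  String,
    actor_login String,
    created_at  DateTime
)
ENGINE = MergeTree
ORDER BY created_at;

-- Materialized view: moves rows from the queue to the target table as they arrive
CREATE MATERIALIZED VIEW github_mv TO github_events AS
SELECT event_type, actor_login, created_at
FROM github_queue;
```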
-### Clusters and Performance {#clusters-and-performance} +### Clusters and performance {#clusters-and-performance} #### Working with ClickHouse Clusters {#working-with-clickhouse-clusters} @@ -469,7 +469,7 @@ Multiple ClickHouse instances can all be configured to read from a topic using t Kafka table engine with ClickHouse clusters diagram -#### Tuning Performance {#tuning-performance} +#### Tuning performance {#tuning-performance} Consider the following when looking to increase Kafka Engine table throughput performance: @@ -483,7 +483,7 @@ Consider the following when looking to increase Kafka Engine table throughput pe Any settings changes should be tested. We recommend monitoring Kafka consumer lags to ensure you are properly scaled. -#### Additional Settings {#additional-settings} +#### Additional settings {#additional-settings} Aside from the settings discussed above, the following may be of interest: diff --git a/docs/integrations/data-ingestion/kafka/msk/index.md b/docs/integrations/data-ingestion/kafka/msk/index.md index 3af1223addd..ca17e60b88b 100644 --- a/docs/integrations/data-ingestion/kafka/msk/index.md +++ b/docs/integrations/data-ingestion/kafka/msk/index.md @@ -78,7 +78,7 @@ consumer.max.partition.fetch.bytes=1048576 You can find more details (both implementation and other considerations) in the official [Kafka](https://kafka.apache.org/documentation/#consumerconfigs) and [Amazon MSK](https://docs.aws.amazon.com/msk/latest/developerguide/msk-connect-workers.html#msk-connect-create-custom-worker-config) documentation. -## Notes on Networking for MSK Connect {#notes-on-networking-for-msk-connect} +## Notes on networking for MSK Connect {#notes-on-networking-for-msk-connect} In order for MSK Connect to connect to ClickHouse, we recommend your MSK cluster to be in a private subnet with a Private NAT connected for internet access. Instructions on how to set this up are provided below. Note that public subnets are supported but not recommended due to the need to constantly assign an Elastic IP address to your ENI, [AWS provides more details here](https://docs.aws.amazon.com/msk/latest/developerguide/msk-connect-internet-access.html) diff --git a/docs/integrations/data-ingestion/redshift/index.md b/docs/integrations/data-ingestion/redshift/index.md index 27bdd455c7f..5c013339e38 100644 --- a/docs/integrations/data-ingestion/redshift/index.md +++ b/docs/integrations/data-ingestion/redshift/index.md @@ -15,9 +15,9 @@ import s3_1 from '@site/static/images/integrations/data-ingestion/redshift/s3-1. import s3_2 from '@site/static/images/integrations/data-ingestion/redshift/s3-2.png'; import Image from '@theme/IdealImage'; -# Migrating Data from Redshift to ClickHouse +# Migrating data from Redshift to ClickHouse -## Related Content {#related-content} +## Related content {#related-content}
diff --git a/docs/materialized-view/refreshable-materialized-view.md b/docs/materialized-view/refreshable-materialized-view.md index 1d797380ac7..9b2362b868d 100644 --- a/docs/materialized-view/refreshable-materialized-view.md +++ b/docs/materialized-view/refreshable-materialized-view.md @@ -1,6 +1,6 @@ --- slug: /materialized-view/refreshable-materialized-view -title: 'Refreshable Materialized View' +title: 'Refreshable materialized view' description: 'How to use materialized views to speed up queries' keywords: ['refreshable materialized view', 'refresh', 'materialized views', 'speed up queries', 'query optimization'] --- diff --git a/docs/migrations/bigquery/equivalent-concepts.md b/docs/migrations/bigquery/equivalent-concepts.md index f12bab2bb28..48352b93c42 100644 --- a/docs/migrations/bigquery/equivalent-concepts.md +++ b/docs/migrations/bigquery/equivalent-concepts.md @@ -9,7 +9,7 @@ show_related_blogs: true import bigquery_1 from '@site/static/images/migrations/bigquery-1.png'; import Image from '@theme/IdealImage'; -# BigQuery vs ClickHouse Cloud: Equivalent and different concepts +# BigQuery vs ClickHouse Cloud: equivalent and different concepts ## Resource organization {#resource-organization} @@ -21,7 +21,7 @@ The way resources are organized in ClickHouse Cloud is similar to [BigQuery's re Similar to BigQuery, organizations are the root nodes in the ClickHouse cloud resource hierarchy. The first user you set up in your ClickHouse Cloud account is automatically assigned to an organization owned by the user. The user may invite additional users to the organization. -### BigQuery Projects vs ClickHouse Cloud Services {#bigquery-projects-vs-clickhouse-cloud-services} +### BigQuery projects vs ClickHouse Cloud services {#bigquery-projects-vs-clickhouse-cloud-services} Within organizations, you can create services loosely equivalent to BigQuery projects because stored data in ClickHouse Cloud is associated with a service. There are [several service types available](/cloud/manage/cloud-tiers) in ClickHouse Cloud. Each ClickHouse Cloud service is deployed in a specific region and includes: @@ -29,15 +29,15 @@ Within organizations, you can create services loosely equivalent to BigQuery pro 2. An object storage folder where the service stores all the data. 3. An endpoint (or multiple endpoints created via ClickHouse Cloud UI console) - a service URL that you use to connect to the service (for example, `https://dv2fzne24g.us-east-1.aws.clickhouse.cloud:8443`) -### BigQuery Datasets vs ClickHouse Cloud Databases {#bigquery-datasets-vs-clickhouse-cloud-databases} +### BigQuery datasets vs ClickHouse Cloud databases {#bigquery-datasets-vs-clickhouse-cloud-databases} ClickHouse logically groups tables into databases. Like BigQuery datasets, ClickHouse databases are logical containers that organize and control access to table data. -### BigQuery Folders {#bigquery-folders} +### BigQuery folders {#bigquery-folders} ClickHouse Cloud currently has no concept equivalent to BigQuery folders. -### BigQuery Slot reservations and Quotas {#bigquery-slot-reservations-and-quotas} +### BigQuery slot reservations and quotas {#bigquery-slot-reservations-and-quotas} Like BigQuery slot reservations, you can [configure vertical and horizontal autoscaling](/manage/scaling#configuring-vertical-auto-scaling) in ClickHouse Cloud. For vertical autoscaling, you can set the minimum and maximum size for the memory and CPU cores of the compute nodes for a service. 
The service will then scale as needed within those bounds. These settings are also available during the initial service creation flow. Each compute node in the service has the same size. You can change the number of compute nodes within a service with [horizontal scaling](/manage/scaling#manual-horizontal-scaling). @@ -78,7 +78,7 @@ When presented with multiple options for ClickHouse types, consider the actual r ## Query acceleration techniques {#query-acceleration-techniques} -### Primary and Foreign keys and Primary index {#primary-and-foreign-keys-and-primary-index} +### Primary and foreign keys and primary index {#primary-and-foreign-keys-and-primary-index} In BigQuery, a table can have [primary key and foreign key constraints](https://cloud.google.com/bigquery/docs/information-schema-table-constraints). Typically, primary and foreign keys are used in relational databases to ensure data integrity. A primary key value is normally unique for each row and is not `NULL`. Each foreign key value in a row must be present in the primary key column of the primary key table or be `NULL`. In BigQuery, these constraints are not enforced, but the query optimizer may use this information to optimize queries better. diff --git a/docs/migrations/bigquery/migrating-to-clickhouse-cloud.md b/docs/migrations/bigquery/migrating-to-clickhouse-cloud.md index a146df70e1d..8c89305f4ac 100644 --- a/docs/migrations/bigquery/migrating-to-clickhouse-cloud.md +++ b/docs/migrations/bigquery/migrating-to-clickhouse-cloud.md @@ -66,7 +66,7 @@ Before trying the following examples, we recommend users review the [permissions Change Data Capture (CDC) is the process by which tables are kept in sync between two databases. This is significantly more complex if updates and deletes are to be handled in near real-time. One approach is to simply schedule a periodic export using BigQuery's [scheduled query functionality](https://cloud.google.com/bigquery/docs/scheduling-queries). Provided you can accept some delay in the data being inserted into ClickHouse, this approach is easy to implement and maintain. An example is given in [this blog post](https://clickhouse.com/blog/clickhouse-bigquery-migrating-data-for-realtime-queries#using-scheduled-queries). -## Designing Schemas {#designing-schemas} +## Designing schemas {#designing-schemas} The Stack Overflow dataset contains a number of related tables. We recommend focusing on migrating the primary table first. This may not necessarily be the largest table but rather the one on which you expect to receive the most analytical queries. This will allow you to familiarize yourself with the main ClickHouse concepts. This table may require remodeling as additional tables are added to fully exploit ClickHouse features and obtain optimal performance. We explore this modeling process in our [Data Modeling docs](/data-modeling/schema-design#next-data-modeling-techniques). @@ -515,7 +515,7 @@ MaxViewCount: 66975 Peak memory usage: 377.26 MiB. ``` -## Conditionals and Arrays {#conditionals-and-arrays} +## Conditionals and arrays {#conditionals-and-arrays} Conditional and array functions make queries significantly simpler. The following query computes the tags (with more than 10000 occurrences) with the largest percentage increase from 2022 to 2023. Note how the following ClickHouse query is succinct thanks to conditionals, array functions, and the ability to reuse aliases in the `HAVING` and `SELECT` clauses. 
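The query body itself falls outside this hunk. Purely as an illustration of the pattern - assuming a hypothetical `posts` table with a `Tags Array(String)` column and a `CreationDate` column, which is not necessarily the schema used in the guide - such a query could combine `arrayJoin`, `countIf`, and alias reuse:

```sql
-- Illustrative sketch: year-over-year change per tag using conditionals and arrays
SELECT
    arrayJoin(Tags) AS tag,
    countIf(toYear(CreationDate) = 2022) AS count_2022,
    countIf(toYear(CreationDate) = 2023) AS count_2023,
    round(100 * (count_2023 - count_2022) / count_2022, 1) AS pct_change -- aliases reused in SELECT
FROM posts
GROUP BY tag
HAVING count_2022 > 10000 -- alias reused in HAVING
ORDER BY pct_change DESC
LIMIT 5;
```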
diff --git a/docs/migrations/postgres/appendix.md b/docs/migrations/postgres/appendix.md index 99e0286be0c..6e4b762cfbd 100644 --- a/docs/migrations/postgres/appendix.md +++ b/docs/migrations/postgres/appendix.md @@ -13,7 +13,7 @@ import Image from '@theme/IdealImage'; Users coming from OLTP systems who are used to ACID transactions should be aware that ClickHouse makes deliberate compromises in not fully providing these in exchange for performance. ClickHouse semantics can deliver high durability guarantees and high write throughput if well understood. We highlight some key concepts below that users should be familiar with prior to working with ClickHouse from Postgres. -### Shards vs Replicas {#shards-vs-replicas} +### Shards vs replicas {#shards-vs-replicas} Sharding and replication are two strategies used for scaling beyond one Postgres instance when storage and/or compute become a bottleneck to performance. Sharding in Postgres involves splitting a large database into smaller, more manageable pieces across multiple nodes. However, Postgres does not support sharding natively. Instead, sharding can be achieved using extensions such as [Citus](https://www.citusdata.com/), in which Postgres becomes a distributed database capable of scaling horizontally. This approach allows Postgres to handle higher transaction rates and larger datasets by spreading the load across several machines. Shards can be row or schema-based in order to provide flexibility for workload types, such as transactional or analytical. Sharding can introduce significant complexity in terms of data management and query execution as it requires coordination across multiple machines and consistency guarantees. diff --git a/docs/migrations/postgres/data-modeling-techniques.md b/docs/migrations/postgres/data-modeling-techniques.md index 3e34f294cf1..f864bd8fb3e 100644 --- a/docs/migrations/postgres/data-modeling-techniques.md +++ b/docs/migrations/postgres/data-modeling-techniques.md @@ -71,7 +71,7 @@ PARTITION BY toYear(CreationDate) For a full description of partitioning see ["Table partitions"](/partitions). -### Applications of Partitions {#applications-of-partitions} +### Applications of partitions {#applications-of-partitions} Partitioning in ClickHouse has similar applications as in Postgres but with some subtle differences. More specifically: @@ -114,7 +114,7 @@ Ok. - **Query optimization** - While partitions can assist with query performance, this depends heavily on the access patterns. If queries target only a few partitions (ideally one), performance can potentially improve. This is only typically useful if the partitioning key is not in the primary key and you are filtering by it. However, queries that need to cover many partitions may perform worse than if no partitioning is used (as there may possibly be more parts as a result of partitioning). The benefit of targeting a single partition will be even less pronounced to non-existence if the partitioning key is already an early entry in the primary key. Partitioning can also be used to [optimize GROUP BY queries](/engines/table-engines/mergetree-family/custom-partitioning-key#group-by-optimisation-using-partition-key) if values in each partition are unique. However, in general, users should ensure the primary key is optimized and only consider partitioning as a query optimization technique in exceptional cases where access patterns access a specific predictable subset of the day, e.g., partitioning by day, with most queries in the last day. 
-### Recommendations for Partitions {#recommendations-for-partitions} +### Recommendations for partitions {#recommendations-for-partitions} Users should consider partitioning a data management technique. It is ideal when data needs to be expired from the cluster when operating with time series data e.g. the oldest partition can [simply be dropped](/sql-reference/statements/alter/partition#drop-partitionpart). diff --git a/docs/migrations/postgres/overview.md b/docs/migrations/postgres/overview.md index ed3173ef9ee..ca1d195b914 100644 --- a/docs/migrations/postgres/overview.md +++ b/docs/migrations/postgres/overview.md @@ -23,7 +23,7 @@ When migrating from PostgreSQL to ClickHouse, the right strategy depends on your Below section describes the two main strategies for migration: **Real-Time CDC** and **Manual Bulk Load + Periodic Updates**. -### Real-Time replication (CDC) {#real-time-replication-cdc} +### Real-time replication (CDC) {#real-time-replication-cdc} Change Data Capture (CDC) is the process by which tables are kept in sync between two databases. It is the most efficient approach for most migration from PostgreSQL, but yet more complex as it handles insert, updates and deletes from PostgreSQL to ClickHouse in near real-time. It is ideal for use cases where real-time analytics are important. diff --git a/docs/native-protocol/columns.md b/docs/native-protocol/columns.md index 4f1b1bf960d..e0b65368ac0 100644 --- a/docs/native-protocol/columns.md +++ b/docs/native-protocol/columns.md @@ -5,7 +5,7 @@ title: 'Column types' description: 'Column types for the native protocol' --- -# Column Types +# Column types See [Data Types](/sql-reference/data-types/) for general reference. @@ -77,7 +77,7 @@ Alias of `FixedString(16)`, UUID value represented as binary. Alias of `Int8` or `Int16`, but each integer is mapped to some `String` value. -## Low Cardinality {#low-cardinality} +## `LowCardinality` type {#low-cardinality} `LowCardinality(T)` consists of `Index T, Keys K`, where `K` is one of (UInt8, UInt16, UInt32, UInt64) depending on size of `Index`. diff --git a/docs/native-protocol/hash.md b/docs/native-protocol/hash.md index 16a89325ab6..3de9ac70315 100644 --- a/docs/native-protocol/hash.md +++ b/docs/native-protocol/hash.md @@ -7,12 +7,13 @@ description: 'Native protocol hash' # CityHash -ClickHouse uses **one of previous** versions of [CityHash from Google](https://github.com/google/cityhash). +ClickHouse uses **one of the previous** versions of [CityHash from Google](https://github.com/google/cityhash). :::info CityHash has changed the algorithm after we have added it into ClickHouse. -CityHash documentation specifically notes that the user should not rely to specific hash values and should not save it anywhere or use it as sharding key. +CityHash documentation specifically notes that the user should not rely on +specific hash values and should not save them anywhere or use them as a sharding key. But as we exposed this function to the user, we had to fix the version of CityHash (to 1.0.2). And now we guarantee that the behaviour of CityHash functions available in SQL will not change. 
diff --git a/docs/use-cases/observability/build-your-own/grafana.md b/docs/use-cases/observability/build-your-own/grafana.md index 33ab7bf899c..0cb500048b4 100644 --- a/docs/use-cases/observability/build-your-own/grafana.md +++ b/docs/use-cases/observability/build-your-own/grafana.md @@ -83,7 +83,7 @@ This query returns the column names expected by Grafana, rendering a table of tr Users wishing to write more complex queries can switch to the `SQL Editor`. -### View Trace details {#view-trace-details} +### View trace details {#view-trace-details} As shown above, Trace ids are rendered as clickable links. On clicking on a trace Id, a user can choose to view the associated spans via the link `View Trace`. This issues the following query (assuming OTel columns) to retrieve the spans in the required structure, rendering the results as a waterfall. diff --git a/docs/use-cases/observability/build-your-own/integrating-opentelemetry.md b/docs/use-cases/observability/build-your-own/integrating-opentelemetry.md index ea8bce63995..00517e5c3ae 100644 --- a/docs/use-cases/observability/build-your-own/integrating-opentelemetry.md +++ b/docs/use-cases/observability/build-your-own/integrating-opentelemetry.md @@ -15,7 +15,7 @@ import observability_8 from '@site/static/images/use-cases/observability/observa import observability_9 from '@site/static/images/use-cases/observability/observability-9.png'; import Image from '@theme/IdealImage'; -# Integrating OpenTelemetry for Data Collection +# Integrating OpenTelemetry for data collection Any Observability solution requires a means of collecting and exporting logs and traces. For this purpose, ClickHouse recommends [the OpenTelemetry (OTel) project](https://opentelemetry.io/). @@ -83,7 +83,7 @@ This approach requires users to instrument their code with their [appropriate la [`otelbin.io`](https://www.otelbin.io/) is useful to validate and visualize configurations. ::: -## Structured vs Unstructured {#structured-vs-unstructured} +## Structured vs unstructured {#structured-vs-unstructured} Logs can either be structured or unstructured. @@ -202,7 +202,7 @@ The above messages don't have a `TraceID` or `SpanID` field. If present, e.g. in For users needing to collect local or Kubernetes log files, we recommend users become familiar with the configuration options available for the [filelog receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/filelogreceiver/README.md#configuration) and how [offsets](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/filelogreceiver#offset-tracking) and [multiline log parsing is handled](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/filelogreceiver#example---multiline-logs-parsing). -## Collecting Kubernetes Logs {#collecting-kubernetes-logs} +## Collecting Kubernetes logs {#collecting-kubernetes-logs} For the collection of Kubernetes logs, we recommend the [OpenTelemetry documentation guide](https://opentelemetry.io/docs/kubernetes/). The [Kubernetes Attributes Processor](https://opentelemetry.io/docs/kubernetes/collector/components/#kubernetes-attributes-processor) is recommended for enriching logs and metrics with pod metadata. This can potentially produce dynamic metadata e.g. labels, stored in the column `ResourceAttributes`. ClickHouse currently uses the type `Map(String, String)` for this column. 
See [Using Maps](/use-cases/observability/schema-design#using-maps) and [Extracting from maps](/use-cases/observability/schema-design#extracting-from-maps) for further details on handling and optimizing this type. @@ -602,7 +602,7 @@ From the collector's perspective, (1) and (2) can be hard to distinguish. Howeve We recommend users use the [batch processor](https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/batchprocessor/README.md) shown in earlier configurations to satisfy the above. This ensures inserts are sent as consistent batches of rows satisfying the above requirements. If a collector is expected to have high throughput (events per second), and at least 5000 events can be sent in each insert, this is usually the only batching required in the pipeline. In this case the collector will flush batches before the batch processor's `timeout` is reached, ensuring the end-to-end latency of the pipeline remains low and batches are of a consistent size. -### Use Asynchronous inserts {#use-asynchronous-inserts} +### Use asynchronous inserts {#use-asynchronous-inserts} Typically, users are forced to send smaller batches when the throughput of a collector is low, and yet they still expect data to reach ClickHouse within a minimum end-to-end latency. In this case, small batches are sent when the `timeout` of the batch processor expires. This can cause problems and is when asynchronous inserts are required. This case typically arises when **collectors in the agent role are configured to send directly to ClickHouse**. Gateways, by acting as aggregators, can alleviate this problem - see [Scaling with Gateways](#scaling-with-gateways). @@ -624,7 +624,7 @@ Finally, the previous deduplication behavior associated with synchronous inserts Full details on configuring this feature can be found [here](/optimize/asynchronous-inserts#enabling-asynchronous-inserts), with a deep dive [here](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse). -## Deployment Architectures {#deployment-architectures} +## Deployment architectures {#deployment-architectures} Several deployment architectures are possible when using the OTel collector with Clickhouse. We describe each below and when it is likely applicable. @@ -642,7 +642,7 @@ Users should consider migrating to a Gateway-based architecture once the number - **Processing at the edge** - Any transformations or event processing has to be performed at the edge or in ClickHouse in this architecture. As well as being restrictive this can either mean complex ClickHouse materialized views or pushing significant computation to the edge - where critical services may be impacted and resources scarce. - **Small batches and latencies** - Agent collectors may individually collect very few events. This typically means they need to be configured to flush at a set interval to satisfy delivery SLAs. This can result in the collector sending small batches to ClickHouse. While a disadvantage, this can be mitigated with Asynchronous inserts - see [Optimizing inserts](#optimizing-inserts). -### Scaling with Gateways {#scaling-with-gateways} +### Scaling with gateways {#scaling-with-gateways} OTel collectors can be deployed as Gateway instances to address the above limitations. These provide a standalone service, typically per data center or per region. These receive events from applications (or other collectors in the agent role) via a single OTLP endpoint. 
Typically a set of gateway instances are deployed, with an out-of-the-box load balancer used to distribute the load amongst them. diff --git a/docs/use-cases/observability/build-your-own/introduction.md b/docs/use-cases/observability/build-your-own/introduction.md index 6862cf3da06..ef2cacbcb33 100644 --- a/docs/use-cases/observability/build-your-own/introduction.md +++ b/docs/use-cases/observability/build-your-own/introduction.md @@ -10,7 +10,7 @@ import observability_1 from '@site/static/images/use-cases/observability/observa import observability_2 from '@site/static/images/use-cases/observability/observability-2.png'; import Image from '@theme/IdealImage'; -# Using ClickHouse for Observability +# Using ClickHouse for observability ## Introduction {#introduction} @@ -84,7 +84,7 @@ We currently recommend ClickHouse for storing two types of observability data: While ClickHouse can be used to store metrics data, this pillar is less mature in ClickHouse with pending support for features such as support for the Prometheus data format and PromQL. ::: -### Distributed Tracing {#distributed-tracing} +### Distributed tracing {#distributed-tracing} Distributed tracing is a critical feature of Observability. A distributed trace, simply called a trace, maps the journey of a request through a system. The request will originate from an end user or application and proliferate throughout a system, typically resulting in a flow of actions between microservices. By recording this sequence, and allowing the subsequent events to be correlated, it allows an observability user or SRE to be able to diagnose issues in an application flow irrespective of how complex or serverless the architecture is. diff --git a/docs/use-cases/observability/build-your-own/managing-data.md b/docs/use-cases/observability/build-your-own/managing-data.md index e8c2fe6d6b1..840ac333880 100644 --- a/docs/use-cases/observability/build-your-own/managing-data.md +++ b/docs/use-cases/observability/build-your-own/managing-data.md @@ -10,7 +10,7 @@ import observability_14 from '@site/static/images/use-cases/observability/observ import Image from '@theme/IdealImage'; -# Managing Data +# Managing data Deployments of ClickHouse for Observability invariably involve large datasets, which need to be managed. ClickHouse offers a number of features to assist with data management. @@ -153,7 +153,7 @@ The above illustrates how data can be efficiently moved and manipulated by parti We explore both of these in detail below. -### Query Performance {#query-performance} +### Query performance {#query-performance} While partitions can assist with query performance, this depends heavily on the access patterns. If queries target only a few partitions (ideally one), performance can potentially improve. This is only typically useful if the partitioning key is not in the primary key and you are filtering by it. However, queries which need to cover many partitions may perform worse than if no partitioning is used (as there may possibly be more parts). The benefit of targeting a single partition will be even less pronounced to non-existent if the partitioning key is already an early entry in the primary key. Partitioning can also be used to [optimize GROUP BY queries](/engines/table-engines/mergetree-family/custom-partitioning-key#group-by-optimisation-using-partition-key) if values in each partition are unique. 
However, in general, users should ensure the primary key is optimized and only consider partitioning as a query optimization technique in exceptional cases where access patterns access a specific predictable subset of the data, e.g., partitioning by day, with most queries in the last day. See [here](https://medium.com/datadenys/using-partitions-in-clickhouse-3ea0decb89c4) for an example of this behavior. diff --git a/docs/use-cases/observability/build-your-own/schema-design.md b/docs/use-cases/observability/build-your-own/schema-design.md index 67e9cd1e381..bba6f764bfc 100644 --- a/docs/use-cases/observability/build-your-own/schema-design.md +++ b/docs/use-cases/observability/build-your-own/schema-design.md @@ -225,7 +225,7 @@ Materialized columns will, by default, not be returned in a `SELECT *`. This is [Materialized views](/materialized-views) provide a more powerful means of applying SQL filtering and transformations to logs and traces. -Materialized Views allow users to shift the cost of computation from query time to insert time. A ClickHouse Materialized View is just a trigger that runs a query on blocks of data as they are inserted into a table. The results of this query are inserted into a second "target" table. +Materialized Views allow users to shift the cost of computation from query time to insert time. A ClickHouse materialized view is just a trigger that runs a query on blocks of data as they are inserted into a table. The results of this query are inserted into a second "target" table. Materialized view @@ -490,7 +490,7 @@ We don't recommend using dots in Map column names and may deprecate its use. Use ::: -## Using Aliases {#using-aliases} +## Using aliases {#using-aliases} Querying map types is slower than querying normal columns - see ["Accelerating queries"](#accelerating-queries). In addition, it's more syntactically complicated and can be cumbersome for users to write. To address this latter issue we recommend using Alias columns. @@ -571,7 +571,7 @@ By default, `SELECT *` excludes ALIAS columns. This behavior can be disabled by The [general Clickhouse best practices](/data-modeling/schema-design#optimizing-types) for optimizing types apply to the ClickHouse use case. -## Using Codecs {#using-codecs} +## Using codecs {#using-codecs} In addition to type optimizations, users can follow the [general best practices for codecs](/data-compression/compression-in-clickhouse#choosing-the-right-column-compression-codec) when attempting to optimize compression for ClickHouse Observability schemas. @@ -579,7 +579,7 @@ In general, users will find the `ZSTD` codec highly applicable to logging and tr Furthermore, timestamps, while benefiting from delta encoding with respect to compression, have been shown to cause slow query performance if this column is used in the primary/ordering key. We recommend users assess the respective compression vs. query performance tradeoffs. -## Using Dictionaries {#using-dictionaries} +## Using dictionaries {#using-dictionaries} [Dictionaries](/sql-reference/dictionaries) are a [key feature](https://clickhouse.com/blog/faster-queries-dictionaries-clickhouse) of ClickHouse providing in-memory [key-value](https://en.wikipedia.org/wiki/Key%E2%80%93value_database) representation of data from various internal and external [sources](/sql-reference/dictionaries#dictionary-sources), optimized for super-low latency lookup queries. @@ -604,7 +604,7 @@ We recommend that users familiarize themselves with the basics of dictionaries. 
For simple enrichment examples see the guide on Dictionaries [here](/dictionary). Below, we focus on common observability enrichment tasks. -### Using IP Dictionaries {#using-ip-dictionaries} +### Using IP dictionaries {#using-ip-dictionaries} Geo-enriching logs and traces with latitude and longitude values using IP addresses is a common Observability requirement. We can achieve this using `ip_trie` structured dictionary. @@ -825,7 +825,7 @@ Users are likely to want the ip enrichment dictionary to be periodically updated The above countries and coordinates offer visualization capabilities beyond grouping and filtering by country. For inspiration see ["Visualizing geo data"](/observability/grafana#visualizing-geo-data). -### Using Regex Dictionaries (User Agent parsing) {#using-regex-dictionaries-user-agent-parsing} +### Using regex dictionaries (user agent parsing) {#using-regex-dictionaries-user-agent-parsing} The parsing of [user agent strings](https://en.wikipedia.org/wiki/User_agent) is a classical regular expression problem and a common requirement in log and trace based datasets. ClickHouse provides efficient parsing of user agents using Regular Expression Tree Dictionaries. @@ -1335,7 +1335,7 @@ The CTE here identifies the minimum and maximum timestamp for the trace id `ae92 This same approach can be applied for similar access patterns. We explore a similar example in Data Modeling [here](/materialized-view/incremental-materialized-view#lookup-table). -### Using Projections {#using-projections} +### Using projections {#using-projections} ClickHouse projections allow users to specify multiple `ORDER BY` clauses for a table. @@ -1444,7 +1444,7 @@ Peak memory usage: 27.85 MiB. In the above example, we specify the columns used in the earlier query in the projection. This will mean only these specified columns will be stored on disk as part of the projection, ordered by Status. If alternatively, we used `SELECT *` here, all columns would be stored. While this would allow more queries (using any subset of columns) to benefit from the projection, additional storage will be incurred. For measuring disk space and compression, see ["Measuring table size & compression"](#measuring-table-size--compression). -### Secondary/Data Skipping indices {#secondarydata-skipping-indices} +### Secondary/data skipping indices {#secondarydata-skipping-indices} No matter how well the primary key is tuned in ClickHouse, some queries will inevitably require full table scans. While this can be mitigated using Materialized views (and projections for some queries), these require additional maintenance and users to be aware of their availability in order to ensure they are exploited. While traditional relational databases solve this with secondary indexes, these are ineffective in column-oriented databases like ClickHouse. Instead, ClickHouse uses "Skip" indexes, which can significantly improve query performance by allowing the database to skip over large data chunks with no matching values. diff --git a/docs/use-cases/observability/index.md b/docs/use-cases/observability/index.md index 17c73a27d75..31b03f1448c 100644 --- a/docs/use-cases/observability/index.md +++ b/docs/use-cases/observability/index.md @@ -9,7 +9,7 @@ keywords: ['observability', 'logs', 'traces', 'metrics', 'OpenTelemetry', 'Grafa ClickHouse offers unmatched speed, scale, and cost-efficiency for observability. 
This guide provides two paths depending on your needs: -## ClickStack - The ClickHouse Observability Stack {#clickstack} +## ClickStack - the ClickHouse observability stack {#clickstack} The ClickHouse Observability Stack is our **recommended approach** for most users. @@ -28,7 +28,7 @@ The ClickHouse Observability Stack is our **recommended approach** for most user | [Production](/use-cases/observability/clickstack/production) | Best practices for production deployment | -## Build-Your-Own Stack {#build-your-own-stack} +## Build-your-own stack {#build-your-own-stack} For users with **custom requirements** — such as highly specialized ingestion pipelines, schema designs, or extreme scaling needs — we provide guidance to build a custom observability stack with ClickHouse as the core database. diff --git a/docs/use-cases/time-series/analysis-functions.md b/docs/use-cases/time-series/analysis-functions.md index 9312f1d4376..64329b30817 100644 --- a/docs/use-cases/time-series/analysis-functions.md +++ b/docs/use-cases/time-series/analysis-functions.md @@ -7,7 +7,7 @@ keywords: ['time-series'] show_related_blogs: true --- -# Time-Series analysis functions +# Time-series analysis functions Time series analysis in ClickHouse can be performed using standard SQL aggregation and window functions. When working with time series data, you'll typically encounter three main types of metrics: diff --git a/docs/use-cases/time-series/query-performance.md b/docs/use-cases/time-series/query-performance.md index f0029675764..84151cbf642 100644 --- a/docs/use-cases/time-series/query-performance.md +++ b/docs/use-cases/time-series/query-performance.md @@ -7,7 +7,7 @@ keywords: ['time-series'] show_related_blogs: true --- -# Time-Series query performance +# Time-series query performance After optimizing storage, the next step is improving query performance. This section explores two key techniques: optimizing `ORDER BY` keys and using materialized views. diff --git a/docs/use-cases/time-series/storage-efficiency.md b/docs/use-cases/time-series/storage-efficiency.md index 6f3c76c1ed9..27cc735c7aa 100644 --- a/docs/use-cases/time-series/storage-efficiency.md +++ b/docs/use-cases/time-series/storage-efficiency.md @@ -7,7 +7,7 @@ keywords: ['time-series'] show_related_blogs: true --- -# Time-Series storage efficiency +# Time-series storage efficiency After exploring how to query our Wikipedia statistics dataset, let's focus on optimizing its storage efficiency in ClickHouse. This section demonstrates practical techniques to reduce storage requirements while maintaining query performance. diff --git a/docs/whats-new/changelog/2021.md b/docs/whats-new/changelog/2021.md index f107acb1791..391f8e08d1f 100644 --- a/docs/whats-new/changelog/2021.md +++ b/docs/whats-new/changelog/2021.md @@ -989,7 +989,7 @@ description: 'Changelog for 2021' * Fix limit/offset settings for distributed queries (ignore on the remote nodes). [#24940](https://github.com/ClickHouse/ClickHouse/pull/24940) ([Azat Khuzhin](https://github.com/azat)). * Fix possible heap-buffer-overflow in `Arrow` format. [#24922](https://github.com/ClickHouse/ClickHouse/pull/24922) ([Kruglov Pavel](https://github.com/Avogar)). * Fixed possible error 'Cannot read from istream at offset 0' when reading a file from DiskS3 (S3 virtual filesystem is an experimental feature under development that should not be used in production). [#24885](https://github.com/ClickHouse/ClickHouse/pull/24885) ([Pavel Kovalenko](https://github.com/Jokser)). 
-* Fix "Missing columns" exception when joining Distributed Materialized View. [#24870](https://github.com/ClickHouse/ClickHouse/pull/24870) ([Azat Khuzhin](https://github.com/azat)). +* Fix "Missing columns" exception when joining distributed materialized view. [#24870](https://github.com/ClickHouse/ClickHouse/pull/24870) ([Azat Khuzhin](https://github.com/azat)). * Allow `NULL` values in postgresql compatibility protocol. Closes [#22622](https://github.com/ClickHouse/ClickHouse/issues/22622). [#24857](https://github.com/ClickHouse/ClickHouse/pull/24857) ([Kseniia Sumarokova](https://github.com/kssenii)). * Fix bug when exception `Mutation was killed` can be thrown to the client on mutation wait when mutation not loaded into memory yet. [#24809](https://github.com/ClickHouse/ClickHouse/pull/24809) ([alesapin](https://github.com/alesapin)). * Fixed bug in deserialization of random generator state with might cause some data types such as `AggregateFunction(groupArraySample(N), T))` to behave in a non-deterministic way. [#24538](https://github.com/ClickHouse/ClickHouse/pull/24538) ([tavplubix](https://github.com/tavplubix)). @@ -1000,7 +1000,7 @@ description: 'Changelog for 2021' * When user authentication is managed by LDAP. Fixed potential deadlock that can happen during LDAP role (re)mapping, when LDAP group is mapped to a nonexistent local role. [#24431](https://github.com/ClickHouse/ClickHouse/pull/24431) ([Denis Glazachev](https://github.com/traceon)). * In "multipart/form-data" message consider the CRLF preceding a boundary as part of it. Fixes [#23905](https://github.com/ClickHouse/ClickHouse/issues/23905). [#24399](https://github.com/ClickHouse/ClickHouse/pull/24399) ([Ivan](https://github.com/abyss7)). * Fix drop partition with intersect fake parts. In rare cases there might be parts with mutation version greater than current block number. [#24321](https://github.com/ClickHouse/ClickHouse/pull/24321) ([Amos Bird](https://github.com/amosbird)). -* Fixed a bug in moving Materialized View from Ordinary to Atomic database (`RENAME TABLE` query). Now inner table is moved to new database together with Materialized View. Fixes [#23926](https://github.com/ClickHouse/ClickHouse/issues/23926). [#24309](https://github.com/ClickHouse/ClickHouse/pull/24309) ([tavplubix](https://github.com/tavplubix)). +* Fixed a bug in moving materialized view from Ordinary to Atomic database (`RENAME TABLE` query). Now inner table is moved to new database together with materialized view. Fixes [#23926](https://github.com/ClickHouse/ClickHouse/issues/23926). [#24309](https://github.com/ClickHouse/ClickHouse/pull/24309) ([tavplubix](https://github.com/tavplubix)). * Allow empty HTTP headers. Fixes [#23901](https://github.com/ClickHouse/ClickHouse/issues/23901). [#24285](https://github.com/ClickHouse/ClickHouse/pull/24285) ([Ivan](https://github.com/abyss7)). * Correct processing of mutations (ALTER UPDATE/DELETE) in Memory tables. Closes [#24274](https://github.com/ClickHouse/ClickHouse/issues/24274). [#24275](https://github.com/ClickHouse/ClickHouse/pull/24275) ([flynn](https://github.com/ucasfl)). * Make column LowCardinality property in JOIN output the same as in the input, close [#23351](https://github.com/ClickHouse/ClickHouse/issues/23351), close [#20315](https://github.com/ClickHouse/ClickHouse/issues/20315). [#24061](https://github.com/ClickHouse/ClickHouse/pull/24061) ([Vladimir](https://github.com/vdimir)). 
@@ -1109,7 +1109,7 @@ description: 'Changelog for 2021' * Fixed the behavior when query `SYSTEM RESTART REPLICA` or `SYSTEM SYNC REPLICA` is being processed infinitely. This was detected on server with extremely little amount of RAM. [#24457](https://github.com/ClickHouse/ClickHouse/pull/24457) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). * Fix incorrect monotonicity of `toWeek` function. This fixes [#24422](https://github.com/ClickHouse/ClickHouse/issues/24422) . This bug was introduced in [#5212](https://github.com/ClickHouse/ClickHouse/pull/5212), and was exposed later by smarter partition pruner. [#24446](https://github.com/ClickHouse/ClickHouse/pull/24446) ([Amos Bird](https://github.com/amosbird)). * Fix drop partition with intersect fake parts. In rare cases there might be parts with mutation version greater than current block number. [#24321](https://github.com/ClickHouse/ClickHouse/pull/24321) ([Amos Bird](https://github.com/amosbird)). -* Fixed a bug in moving Materialized View from Ordinary to Atomic database (`RENAME TABLE` query). Now inner table is moved to new database together with Materialized View. Fixes [#23926](https://github.com/ClickHouse/ClickHouse/issues/23926). [#24309](https://github.com/ClickHouse/ClickHouse/pull/24309) ([tavplubix](https://github.com/tavplubix)). +* Fixed a bug in moving materialized view from Ordinary to Atomic database (`RENAME TABLE` query). Now inner table is moved to new database together with materialized view. Fixes [#23926](https://github.com/ClickHouse/ClickHouse/issues/23926). [#24309](https://github.com/ClickHouse/ClickHouse/pull/24309) ([tavplubix](https://github.com/tavplubix)). * Allow empty HTTP headers in client requests. Fixes [#23901](https://github.com/ClickHouse/ClickHouse/issues/23901). [#24285](https://github.com/ClickHouse/ClickHouse/pull/24285) ([Ivan](https://github.com/abyss7)). * Set `max_threads = 1` to fix mutation fail of `Memory` tables. Closes [#24274](https://github.com/ClickHouse/ClickHouse/issues/24274). [#24275](https://github.com/ClickHouse/ClickHouse/pull/24275) ([flynn](https://github.com/ucasFL)). * Fix typo in implementation of `Memory` tables, this bug was introduced at [#15127](https://github.com/ClickHouse/ClickHouse/issues/15127). Closes [#24192](https://github.com/ClickHouse/ClickHouse/issues/24192). [#24193](https://github.com/ClickHouse/ClickHouse/pull/24193) ([张中南](https://github.com/plugine)). @@ -1245,7 +1245,7 @@ description: 'Changelog for 2021' * Correct aliases handling if subquery was optimized to constant. Fixes [#22924](https://github.com/ClickHouse/ClickHouse/issues/22924). Fixes [#10401](https://github.com/ClickHouse/ClickHouse/issues/10401). [#23191](https://github.com/ClickHouse/ClickHouse/pull/23191) ([Maksim Kita](https://github.com/kitaisreal)). * Server might fail to start if `data_type_default_nullable` setting is enabled in default profile, it's fixed. Fixes [#22573](https://github.com/ClickHouse/ClickHouse/issues/22573). [#23185](https://github.com/ClickHouse/ClickHouse/pull/23185) ([tavplubix](https://github.com/tavplubix)). * Fixed a crash on shutdown which happened because of wrong accounting of current connections. [#23154](https://github.com/ClickHouse/ClickHouse/pull/23154) ([Vitaly Baranov](https://github.com/vitlibar)). -* Fixed `Table .inner_id... doesn't exist` error when selecting from Materialized View after detaching it from Atomic database and attaching back. 
[#23047](https://github.com/ClickHouse/ClickHouse/pull/23047) ([tavplubix](https://github.com/tavplubix)). +* Fixed `Table .inner_id... doesn't exist` error when selecting from materialized view after detaching it from Atomic database and attaching back. [#23047](https://github.com/ClickHouse/ClickHouse/pull/23047) ([tavplubix](https://github.com/tavplubix)). * Fix error `Cannot find column in ActionsDAG result` which may happen if subquery uses `untuple`. Fixes [#22290](https://github.com/ClickHouse/ClickHouse/issues/22290). [#22991](https://github.com/ClickHouse/ClickHouse/pull/22991) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). * Fix usage of constant columns of type `Map` with nullable values. [#22939](https://github.com/ClickHouse/ClickHouse/pull/22939) ([Anton Popov](https://github.com/CurtizJ)). * fixed `formatDateTime()` on `DateTime64` and "%C" format specifier fixed `toDateTime64()` for large values and non-zero scale. [#22937](https://github.com/ClickHouse/ClickHouse/pull/22937) ([Vasily Nemkov](https://github.com/Enmk)). @@ -1764,7 +1764,7 @@ description: 'Changelog for 2021' * Uninitialized memory read was possible in encrypt/decrypt functions if empty string was passed as IV. This closes [#19391](https://github.com/ClickHouse/ClickHouse/issues/19391). [#19397](https://github.com/ClickHouse/ClickHouse/pull/19397) ([alexey-milovidov](https://github.com/alexey-milovidov)). * Fix possible buffer overflow in Uber H3 library. See https://github.com/uber/h3/issues/392. This closes [#19219](https://github.com/ClickHouse/ClickHouse/issues/19219). [#19383](https://github.com/ClickHouse/ClickHouse/pull/19383) ([alexey-milovidov](https://github.com/alexey-milovidov)). * Fix system.parts _state column (LOGICAL_ERROR when querying this column, due to incorrect order). [#19346](https://github.com/ClickHouse/ClickHouse/pull/19346) ([Azat Khuzhin](https://github.com/azat)). -* Fixed possible wrong result or segfault on aggregation when Materialized View and its target table have different structure. Fixes [#18063](https://github.com/ClickHouse/ClickHouse/issues/18063). [#19322](https://github.com/ClickHouse/ClickHouse/pull/19322) ([tavplubix](https://github.com/tavplubix)). +* Fixed possible wrong result or segfault on aggregation when materialized view and its target table have different structure. Fixes [#18063](https://github.com/ClickHouse/ClickHouse/issues/18063). [#19322](https://github.com/ClickHouse/ClickHouse/pull/19322) ([tavplubix](https://github.com/tavplubix)). * Fix error `Cannot convert column now64() because it is constant but values of constants are different in source and result`. Continuation of [#7156](https://github.com/ClickHouse/ClickHouse/issues/7156). [#19316](https://github.com/ClickHouse/ClickHouse/pull/19316) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). * Fix bug when concurrent `ALTER` and `DROP` queries may hang while processing ReplicatedMergeTree table. [#19237](https://github.com/ClickHouse/ClickHouse/pull/19237) ([alesapin](https://github.com/alesapin)). * Fixed `There is no checkpoint` error when inserting data through http interface using `Template` or `CustomSeparated` format. Fixes [#19021](https://github.com/ClickHouse/ClickHouse/issues/19021). [#19072](https://github.com/ClickHouse/ClickHouse/pull/19072) ([tavplubix](https://github.com/tavplubix)). @@ -1803,7 +1803,7 @@ description: 'Changelog for 2021' * Add functions `countMatches`/`countMatchesCaseInsensitive`. 
[#17459](https://github.com/ClickHouse/ClickHouse/pull/17459) ([Azat Khuzhin](https://github.com/azat)). * Implement `countSubstrings()`/`countSubstringsCaseInsensitive()`/`countSubstringsCaseInsensitiveUTF8()` (Count the number of substring occurrences). [#17347](https://github.com/ClickHouse/ClickHouse/pull/17347) ([Azat Khuzhin](https://github.com/azat)). * Add information about used databases, tables and columns in system.query_log. Add `query_kind` and `normalized_query_hash` fields. [#17726](https://github.com/ClickHouse/ClickHouse/pull/17726) ([Amos Bird](https://github.com/amosbird)). -* Add a setting `optimize_on_insert`. When enabled, do the same transformation for INSERTed block of data as if merge was done on this block (e.g. Replacing, Collapsing, Aggregating...). This setting is enabled by default. This can influence Materialized View and MaterializeMySQL behaviour (see detailed description). This closes [#10683](https://github.com/ClickHouse/ClickHouse/issues/10683). [#16954](https://github.com/ClickHouse/ClickHouse/pull/16954) ([Kruglov Pavel](https://github.com/Avogar)). +* Add a setting `optimize_on_insert`. When enabled, do the same transformation for INSERTed block of data as if merge was done on this block (e.g. Replacing, Collapsing, Aggregating...). This setting is enabled by default. This can influence materialized view and MaterializeMySQL behaviour (see detailed description). This closes [#10683](https://github.com/ClickHouse/ClickHouse/issues/10683). [#16954](https://github.com/ClickHouse/ClickHouse/pull/16954) ([Kruglov Pavel](https://github.com/Avogar)). * Kerberos Authenticaiton for HDFS. [#16621](https://github.com/ClickHouse/ClickHouse/pull/16621) ([Ilya Golshtein](https://github.com/ilejn)). * Support `SHOW SETTINGS` statement to show parameters in system.settings. `SHOW CHANGED SETTINGS` and `LIKE/ILIKE` clause are also supported. [#18056](https://github.com/ClickHouse/ClickHouse/pull/18056) ([Jianmei Zhang](https://github.com/zhangjmruc)). * Function `position` now supports `POSITION(needle IN haystack)` synax for SQL compatibility. This closes [#18701](https://github.com/ClickHouse/ClickHouse/issues/18701). ... [#18779](https://github.com/ClickHouse/ClickHouse/pull/18779) ([Jianmei Zhang](https://github.com/zhangjmruc)). diff --git a/docs/whats-new/changelog/2022.md b/docs/whats-new/changelog/2022.md index 1b9fc200cc1..4d7dcf4d498 100644 --- a/docs/whats-new/changelog/2022.md +++ b/docs/whats-new/changelog/2022.md @@ -95,7 +95,7 @@ Refer to this issue on GitHub for more details: https://github.com/ClickHouse/Cl * Fix functions `arrayFirstOrNull` and `arrayLastOrNull` or null when the array contains `Nullable` elements. [#43274](https://github.com/ClickHouse/ClickHouse/pull/43274) ([Duc Canh Le](https://github.com/canhld94)). * Fix incorrect `UserTimeMicroseconds`/`SystemTimeMicroseconds` accounting related to Kafka tables. [#42791](https://github.com/ClickHouse/ClickHouse/pull/42791) ([Azat Khuzhin](https://github.com/azat)). * Do not suppress exceptions in `web` disks. Fix retries for the `web` disk. [#42800](https://github.com/ClickHouse/ClickHouse/pull/42800) ([Azat Khuzhin](https://github.com/azat)). -* Fixed (logical) race condition between inserts and dropping materialized views. 
A race condition happened when a Materialized View was dropped at the same time as an INSERT, where the MVs were present as a dependency of the insert at the begining of the execution, but the table has been dropped by the time the insert chain tries to access it, producing either an `UNKNOWN_TABLE` or `TABLE_IS_DROPPED` exception, and stopping the insertion. After this change, we avoid these exceptions and just continue with the insert if the dependency is gone. [#43161](https://github.com/ClickHouse/ClickHouse/pull/43161) ([AlfVII](https://github.com/AlfVII)). +* Fixed (logical) race condition between inserts and dropping materialized views. A race condition happened when a materialized view was dropped at the same time as an INSERT, where the MVs were present as a dependency of the insert at the beginning of the execution, but the table has been dropped by the time the insert chain tries to access it, producing either an `UNKNOWN_TABLE` or `TABLE_IS_DROPPED` exception, and stopping the insertion. After this change, we avoid these exceptions and just continue with the insert if the dependency is gone. [#43161](https://github.com/ClickHouse/ClickHouse/pull/43161) ([AlfVII](https://github.com/AlfVII)). * Fix undefined behavior in the `quantiles` function, which might lead to uninitialized memory. Found by fuzzer. This closes [#44066](https://github.com/ClickHouse/ClickHouse/issues/44066). [#44067](https://github.com/ClickHouse/ClickHouse/pull/44067) ([Alexey Milovidov](https://github.com/alexey-milovidov)). * Additional check on zero uncompressed size is added to `CompressionCodecDelta`. [#43255](https://github.com/ClickHouse/ClickHouse/pull/43255) ([Nikita Taranov](https://github.com/nickitat)). * Flatten arrays from Parquet to avoid an issue with inconsistent data in arrays. These incorrect files can be generated by Apache Iceberg. [#43297](https://github.com/ClickHouse/ClickHouse/pull/43297) ([Arthur Passos](https://github.com/arthurpassos)). @@ -1789,7 +1789,7 @@ Refer to this issue on GitHub for more details: https://github.com/ClickHouse/Cl * Out of band `offset` and `limit` settings may be applied incorrectly for views. Close [#33289](https://github.com/ClickHouse/ClickHouse/issues/33289) [#33518](https://github.com/ClickHouse/ClickHouse/pull/33518) ([hexiaoting](https://github.com/hexiaoting)). * Fix an exception `Block structure mismatch` which may happen during insertion into table with default nested `LowCardinality` column. Fixes [#33028](https://github.com/ClickHouse/ClickHouse/issues/33028). [#33504](https://github.com/ClickHouse/ClickHouse/pull/33504) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). * Fix dictionary expressions for `range_hashed` range min and range max attributes when created using DDL. Closes [#30809](https://github.com/ClickHouse/ClickHouse/issues/30809). [#33478](https://github.com/ClickHouse/ClickHouse/pull/33478) ([Maksim Kita](https://github.com/kitaisreal)). -* Fix possible use-after-free for INSERT into Materialized View with concurrent DROP ([Azat Khuzhin](https://github.com/azat)). +* Fix possible use-after-free for INSERT into materialized view with concurrent DROP ([Azat Khuzhin](https://github.com/azat)). * Do not try to read pass EOF (to workaround for a bug in the Linux kernel), this bug can be reproduced on kernels (3.14..5.9), and requires `index_granularity_bytes=0` (i.e. turn off adaptive index granularity). [#33372](https://github.com/ClickHouse/ClickHouse/pull/33372) ([Azat Khuzhin](https://github.com/azat)). 
* The commands `SYSTEM SUSPEND` and `SYSTEM ... THREAD FUZZER` missed access control. It is fixed. Author: Kevin Michel. [#33333](https://github.com/ClickHouse/ClickHouse/pull/33333) ([alexey-milovidov](https://github.com/alexey-milovidov)). * Fix when `COMMENT` for dictionaries does not appear in `system.tables`, `system.dictionaries`. Allow to modify the comment for `Dictionary` engine. Closes [#33251](https://github.com/ClickHouse/ClickHouse/issues/33251). [#33261](https://github.com/ClickHouse/ClickHouse/pull/33261) ([Maksim Kita](https://github.com/kitaisreal)). diff --git a/docs/whats-new/changelog/2024.md b/docs/whats-new/changelog/2024.md index d74962ebe45..1f32611026e 100644 --- a/docs/whats-new/changelog/2024.md +++ b/docs/whats-new/changelog/2024.md @@ -1239,7 +1239,7 @@ description: 'Changelog for 2024' * Fix analyzer: only interpolate expression should be used for DAG [#64096](https://github.com/ClickHouse/ClickHouse/pull/64096) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). * Fix azure backup writing multipart blocks by 1 MiB (read buffer size) instead of `max_upload_part_size` (in non-native copy case) [#64117](https://github.com/ClickHouse/ClickHouse/pull/64117) ([Kseniia Sumarokova](https://github.com/kssenii)). * Correctly fallback during backup copy [#64153](https://github.com/ClickHouse/ClickHouse/pull/64153) ([Antonio Andelic](https://github.com/antonio2368)). -* Prevent LOGICAL_ERROR on CREATE TABLE as Materialized View [#64174](https://github.com/ClickHouse/ClickHouse/pull/64174) ([Raúl Marín](https://github.com/Algunenano)). +* Prevent LOGICAL_ERROR on CREATE TABLE as materialized view [#64174](https://github.com/ClickHouse/ClickHouse/pull/64174) ([Raúl Marín](https://github.com/Algunenano)). * Query Cache: Consider identical queries against different databases as different [#64199](https://github.com/ClickHouse/ClickHouse/pull/64199) ([Robert Schulze](https://github.com/rschu1ze)). * Ignore `text_log` for Keeper [#64218](https://github.com/ClickHouse/ClickHouse/pull/64218) ([Antonio Andelic](https://github.com/antonio2368)). * Fix Logical error: Bad cast for Buffer table with prewhere. [#64388](https://github.com/ClickHouse/ClickHouse/pull/64388) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). @@ -1588,7 +1588,7 @@ description: 'Changelog for 2024' * Fix for the materialized view security issue, which allowed a user to insert into a table without required grants for that. Fix validates that the user has permission to insert not only into a materialized view but also into all underlying tables. This means that some queries, which worked before, now can fail with `Not enough privileges`. To address this problem, the release introduces a new feature of SQL security for views https://clickhouse.com/docs/sql-reference/statements/create/view#sql_security. [#54901](https://github.com/ClickHouse/ClickHouse/pull/54901) [#60439](https://github.com/ClickHouse/ClickHouse/pull/60439) ([pufit](https://github.com/pufit)). #### New Feature {#new-feature-10} -* Added new syntax which allows to specify definer user in View/Materialized View. This allows to execute selects/inserts from views without explicit grants for underlying tables. So, a View will encapsulate the grants. [#54901](https://github.com/ClickHouse/ClickHouse/pull/54901) [#60439](https://github.com/ClickHouse/ClickHouse/pull/60439) ([pufit](https://github.com/pufit)). +* Added new syntax which allows to specify definer user in view/materialized view. 
This allows to execute selects/inserts from views without explicit grants for underlying tables. So, a View will encapsulate the grants. [#54901](https://github.com/ClickHouse/ClickHouse/pull/54901) [#60439](https://github.com/ClickHouse/ClickHouse/pull/60439) ([pufit](https://github.com/pufit)). * Try to detect file format automatically during schema inference if it's unknown in `file/s3/hdfs/url/azureBlobStorage` engines. Closes [#50576](https://github.com/ClickHouse/ClickHouse/issues/50576). [#59092](https://github.com/ClickHouse/ClickHouse/pull/59092) ([Kruglov Pavel](https://github.com/Avogar)). * Implement auto-adjustment for asynchronous insert timeouts. The following settings are introduced: async_insert_poll_timeout_ms, async_insert_use_adaptive_busy_timeout, async_insert_busy_timeout_min_ms, async_insert_busy_timeout_max_ms, async_insert_busy_timeout_increase_rate, async_insert_busy_timeout_decrease_rate. [#58486](https://github.com/ClickHouse/ClickHouse/pull/58486) ([Julia Kartseva](https://github.com/jkartseva)). * Allow to set up a quota for maximum sequential login failures. [#54737](https://github.com/ClickHouse/ClickHouse/pull/54737) ([Alexey Gerasimchuck](https://github.com/Demilivor)). diff --git a/docs/whats-new/roadmap.md b/docs/whats-new/roadmap.md index d30f4c68a49..93f7d5313dd 100644 --- a/docs/whats-new/roadmap.md +++ b/docs/whats-new/roadmap.md @@ -5,13 +5,13 @@ sidebar_position: 50 description: 'Present and past ClickHouse road maps' --- -## Current Roadmap {#current-roadmap} +## Current roadmap {#current-roadmap} The current roadmap is published for open discussion: - [2025](https://github.com/ClickHouse/ClickHouse/issues/74046) -## Previous Roadmaps {#previous-roadmaps} +## Previous roadmaps {#previous-roadmaps} - [2024](https://github.com/ClickHouse/ClickHouse/issues/58392) - [2023](https://github.com/ClickHouse/ClickHouse/issues/44767) diff --git a/docs/whats-new/security-changelog.md b/docs/whats-new/security-changelog.md index 1c79dca658c..2f02469a7e5 100644 --- a/docs/whats-new/security-changelog.md +++ b/docs/whats-new/security-changelog.md @@ -1,12 +1,12 @@ --- slug: /whats-new/security-changelog sidebar_position: 20 -sidebar_label: 'Security Changelog' -title: 'Security Changelog' +sidebar_label: 'Security changelog' +title: 'Security changelog' description: 'Security changelog detailing security related updates and changes' --- -# Security Changelog +# Security changelog ## Fixed in ClickHouse v25.1.5.5, 2025-01-05 {#fixed-in-clickhouse-release-2025-01-05} diff --git a/i18n/jp/docusaurus-plugin-content-docs/current/chdb/install/bun.md b/i18n/jp/docusaurus-plugin-content-docs/current/chdb/install/bun.md index 59e5bac14db..058cd1c3c64 100644 --- a/i18n/jp/docusaurus-plugin-content-docs/current/chdb/install/bun.md +++ b/i18n/jp/docusaurus-plugin-content-docs/current/chdb/install/bun.md @@ -42,7 +42,9 @@ var result = query("SELECT version()", "CSV"); console.log(result); // 23.10.1.1 ``` + ### Session.Query(query, *format) {#sessionqueryquery-format} + ```javascript const sess = new Session('./chdb-bun-tmp'); diff --git a/i18n/jp/docusaurus-plugin-content-docs/current/cloud/get-started/cloud-quick-start.mdx b/i18n/jp/docusaurus-plugin-content-docs/current/cloud/get-started/cloud-quick-start.mdx index 02b9c0a8828..e1afdf787f2 100644 --- a/i18n/jp/docusaurus-plugin-content-docs/current/cloud/get-started/cloud-quick-start.mdx +++ b/i18n/jp/docusaurus-plugin-content-docs/current/cloud/get-started/cloud-quick-start.mdx @@ -26,8 +26,6 @@ import client_details 
from '@site/static/images/_snippets/client_details.png'; import new_rows_from_csv from '@site/static/images/_snippets/new_rows_from_csv.png'; import SQLConsoleDetail from '@site/i18n/jp/docusaurus-plugin-content-docs/current/_snippets/_launch_sql_console.md'; -```md - # ClickHouse Cloud クイックスタート ClickHouse を始める最も迅速で簡単な方法は、[ClickHouse Cloud](https://console.clickhouse.cloud) に新しいサービスを作成することです。 diff --git a/i18n/jp/docusaurus-plugin-content-docs/current/cloud/manage/billing/marketplace/index.md b/i18n/jp/docusaurus-plugin-content-docs/current/cloud/manage/billing/marketplace/index.md index 41113920eed..b9c1b2dd71a 100644 --- a/i18n/jp/docusaurus-plugin-content-docs/current/cloud/manage/billing/marketplace/index.md +++ b/i18n/jp/docusaurus-plugin-content-docs/current/cloud/manage/billing/marketplace/index.md @@ -8,8 +8,6 @@ - 'GCP' --- - - このセクションでは、マーケットプレイスに関連する請求トピックについて詳しく説明します。 | ページ | 説明 | diff --git a/i18n/jp/docusaurus-plugin-content-docs/current/cloud/reference/changelog.md b/i18n/jp/docusaurus-plugin-content-docs/current/cloud/reference/changelog.md index 1dc21f9d681..a266bd73f75 100644 --- a/i18n/jp/docusaurus-plugin-content-docs/current/cloud/reference/changelog.md +++ b/i18n/jp/docusaurus-plugin-content-docs/current/cloud/reference/changelog.md @@ -146,7 +146,7 @@ ClickHouse Cloudの安定した使用を確保し、ベストプラクティス [Golang](https://github.com/ClickHouse/clickhouse-go/releases/tag/v2.30.1)、[Python](https://github.com/ClickHouse/clickhouse-connect/releases/tag/v0.8.11)、および[NodeJS](https://github.com/ClickHouse/clickhouse-js/releases/tag/1.10.1)クライアントが、Dynamic、Variant、およびJSONタイプリクエストをサポートしました。 -### DBT support for Refreshable Materialized Views {#dbt-support-for-refreshable-materialized-views} +### DBT support for refreshable materialized views {#dbt-support-for-refreshable-materialized-views} DBTは、`1.8.7`リリースで[リフレッシュ可能なマテリアライズドビュー](https://github.com/ClickHouse/dbt-clickhouse/releases/tag/v1.8.7)をサポートしています。 @@ -297,7 +297,7 @@ ClickHouse Cloudは、いくつかの請求およびスケーリングイベン 多要素認証を使用している顧客は、電話を失ったりトークンを誤って削除した場合に使用できる回復コードを取得できるようになりました。初めてMFAに登録する顧客には、設定時にコードが提供されます。既存のMFAを持っている顧客は、既存のMFAトークンを削除し新しいトークンを追加することで回復コードを取得できます。 -### ClickPipes Update: Custom Certificates, Latency Insights, and More! {#clickpipes-update-custom-certificates-latency-insights-and-more} +### ClickPipes update: custom certificates, latency insights, and more! 
{#clickpipes-update-custom-certificates-latency-insights-and-more} ClickPipes、データをClickHouseサービスに取り込むための最も簡単な方法に関する最新の更新情報をお知らせできることを嬉しく思います!これらの新機能は、データ取り込みの制御を強化し、パフォーマンスメトリクスへの可視化を提供することを目的としています。 diff --git a/i18n/jp/docusaurus-plugin-content-docs/current/cloud/security/personal-data-access.md b/i18n/jp/docusaurus-plugin-content-docs/current/cloud/security/personal-data-access.md index 1ea5b38d0dd..6e72e55f9b9 100644 --- a/i18n/jp/docusaurus-plugin-content-docs/current/cloud/security/personal-data-access.md +++ b/i18n/jp/docusaurus-plugin-content-docs/current/cloud/security/personal-data-access.md @@ -39,7 +39,7 @@ ClickHouseが収集する個人データやその使用方法については、C 注意: `OrgID`を含むURLは、特定のアカウントの`OrgID`を反映するように更新する必要があります。 -### Current Customers {#current-customers} +### Current customers {#current-customers} 弊社とアカウントをお持ちで、セルフサービスオプションで個人データの問題が解決しない場合、プライバシーポリシーに基づきデータ主体アクセス要求を提出できます。そのためには、ClickHouseアカウントにログインし、[サポートケース](https://console.clickhouse.cloud/support)を開いてください。これにより、あなたの身元を確認し、リクエストに対応するプロセスをスムーズに進めることができます。 diff --git a/i18n/jp/docusaurus-plugin-content-docs/current/managing-data/core-concepts/shards.md b/i18n/jp/docusaurus-plugin-content-docs/current/managing-data/core-concepts/shards.md index 689171ef683..9072126e4e5 100644 --- a/i18n/jp/docusaurus-plugin-content-docs/current/managing-data/core-concepts/shards.md +++ b/i18n/jp/docusaurus-plugin-content-docs/current/managing-data/core-concepts/shards.md @@ -52,6 +52,7 @@ CREATE TABLE uk.uk_price_paid_simple_dist ON CLUSTER test_cluster price UInt32 ) ENGINE = Distributed('test_cluster', 'uk', 'uk_price_paid_simple', rand()) +``` `ON CLUSTER` 句により、DDL ステートメントは [分散 DDL ステートメント](/sql-reference/distributed-ddl) となり、ClickHouse に `test_cluster` [クラスター定義](/architecture/horizontal-scaling#replication-and-sharding-configuration) にリストされているすべてのサーバーでテーブルを作成するよう指示します。分散 DDL には、[クラスターアーキテクチャ](/architecture/horizontal-scaling#architecture-diagram) において追加の [Keeper](https://clickhouse.com/clickhouse/keeper) コンポーネントが必要です。 diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/cloud/reference/changelog.md b/i18n/zh/docusaurus-plugin-content-docs/current/cloud/reference/changelog.md index 567c07d020c..0d0e7992055 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/cloud/reference/changelog.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/cloud/reference/changelog.md @@ -201,7 +201,7 @@ Users can schedule upgrades for their services. This feature is supported for En [Golang](https://github.com/ClickHouse/clickhouse-go/releases/tag/v2.30.1), [Python](https://github.com/ClickHouse/clickhouse-connect/releases/tag/v0.8.11), and [NodeJS](https://github.com/ClickHouse/clickhouse-js/releases/tag/1.10.1) clients added support for Dynamic, Variant, and JSON types. -### DBT support for Refreshable Materialized Views {#dbt-support-for-refreshable-materialized-views} +### DBT support for refreshable materialized views {#dbt-support-for-refreshable-materialized-views} DBT now [supports Refreshable Materialized Views](https://github.com/ClickHouse/dbt-clickhouse/releases/tag/v1.8.7) in the `1.8.7` release. @@ -360,7 +360,7 @@ Compute-compute separation allows you to designate specific services as read-wri Customers using multi-factor authentication can now obtain recovery codes that can be used in the event of a lost phone or accidentally deleted token. Customers enrolling in MFA for the first time will be provided the code on set up. Customers with existing MFA can obtain a recovery code by removing their existing MFA token and adding a new one. 
-### ClickPipes Update: Custom Certificates, Latency Insights, and More! {#clickpipes-update-custom-certificates-latency-insights-and-more} +### ClickPipes update: custom certificates, latency insights, and more! {#clickpipes-update-custom-certificates-latency-insights-and-more} We're excited to share the latest updates for ClickPipes, the easiest way to ingest data into your ClickHouse service! These new features are designed to enhance your control over data ingestion and provide greater visibility into performance metrics. @@ -414,7 +414,7 @@ ClickPipes is the easiest way to ingest data into ClickHouse Cloud. We're happy ## July 18, 2024 {#july-18-2024} -### Prometheus Endpoint for Metrics is now Generally Available {#prometheus-endpoint-for-metrics-is-now-generally-available} +### Prometheus endpoint for metrics is now generally available {#prometheus-endpoint-for-metrics-is-now-generally-available} In our last cloud changelog, we announced the Private Preview for exporting [Prometheus](https://prometheus.io/) metrics from ClickHouse Cloud. This feature allows you to use the [ClickHouse Cloud API](/cloud/manage/api/api-overview) to get your metrics into tools like [Grafana](https://grafana.com/) and [Datadog](https://www.datadoghq.com/) for visualization. We're happy to announce that this feature is now **Generally Available**. Please see [our docs](/integrations/prometheus) to learn more about this feature. @@ -449,13 +449,13 @@ This release also includes support for subscriptions via the [Microsoft Azure Ma If you'd like any specific region to be supported, please [contact us](https://clickhouse.com/support/program). -### Query Log Insights {#query-log-insights} +### Query log insights {#query-log-insights} Our new Query Insights UI in the Cloud Console makes ClickHouse's built-in query log a lot easier to use. ClickHouse's `system.query_log` table is a key source of information for query optimization, debugging, and monitoring overall cluster health and performance. There's just one caveat: with 70+ fields and multiple records per query, interpreting the query log represents a steep learning curve. This initial version of query insights provides a blueprint for future work to simplify query debugging and optimization patterns. We'd love to hear your feedback as we continue to iterate on this feature, so please reach out—your input will be greatly appreciated! ClickHouse Cloud Query Insights UI showing query performance metrics and analysis -### Prometheus Endpoint for Metrics (Private Preview) {#prometheus-endpoint-for-metrics-private-preview} +### Prometheus endpoint for metrics (private preview) {#prometheus-endpoint-for-metrics-private-preview} Perhaps one of our most requested features: you can now export [Prometheus](https://prometheus.io/) metrics from ClickHouse Cloud to [Grafana](https://grafana.com/) and [Datadog](https://www.datadoghq.com/) for visualization. Prometheus provides an open-source solution to monitor ClickHouse and set up custom alerts. Access to Prometheus metrics for your ClickHouse Cloud service is available via the [ClickHouse Cloud API](/integrations/prometheus). This feature is currently in Private Preview. Please reach out to the [support team](https://clickhouse.com/support/program) to enable this feature for your organization. 
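The paragraph above only names the ClickHouse Cloud API as the access path for the Prometheus metrics. As a minimal sketch of that flow, assuming the organization-level Prometheus endpoint path and HTTP basic authentication with a Cloud API key (the organization ID and key values below are placeholders, and the exact path should be confirmed against the linked Prometheus integration docs), metrics could be pulled like this:

```python
import requests  # assumes the 'requests' package is installed

# Placeholders -- substitute your own organization ID and Cloud API key.
ORG_ID = "<organization-id>"
KEY_ID = "<api-key-id>"
KEY_SECRET = "<api-key-secret>"

# Assumed organization-level Prometheus endpoint of the ClickHouse Cloud API;
# verify the exact path in the Prometheus integration documentation.
url = f"https://api.clickhouse.cloud/v1/organizations/{ORG_ID}/prometheus"

response = requests.get(url, auth=(KEY_ID, KEY_SECRET), timeout=30)
response.raise_for_status()

# The body is plain-text Prometheus exposition format, ready to be scraped
# or inspected directly.
print(response.text[:500])
```

In a real deployment a Prometheus server (or a monitoring agent) would scrape this endpoint on a schedule for visualization in Grafana or Datadog; the sketch only shows what such a scrape would return.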
diff --git a/styles/ClickHouse/Headings.yml b/styles/ClickHouse/Headings.yml index fbc5c83da1b..df83f9a0cef 100644 --- a/styles/ClickHouse/Headings.yml +++ b/styles/ClickHouse/Headings.yml @@ -7,38 +7,464 @@ match: $sentence indicators: - ":" exceptions: - - ClickHouse - - Cloud + - "^[a-z]+[A-Z][a-zA-Z]*$" # Matches camelCase pattern + - "^[A-Z]+$" # Matches words that are entirely uppercase + - "^[A-Z][a-zA-Z]*[A-Z][a-zA-Z]*$" # Matches PascalCase system metrics + - "^[A-Z][a-zA-Z]*[A-Z][a-zA-Z]*_.*$" # Matches system metrics with underscores + - "^[a-z][a-z_]*[a-z]$" # Matches configuration parameter names with underscores + - API + - ANN + - APPEND + - DELETE FROM + - ALTER DELETE + - DROP PARTITION + - TRUNCATE + - Arrow + - AWS + - AWS + - Apache + - AggregatingMergeTree + - Amazon + - Amazon Web Services - Azure + - Azure Blob Storage + - B2B + - BigQuery + - Bring Your Own Cloud + - Bun + - BYOC + - BYOC + - CPU + - CSV + - CDC + - CMEK - CLI + - ClickHouse + - ClickHouse Cloud + - ClickHouse Keeper + - ClickPipe + - ClickPipes + - ClickPipes Connector + - ClickStack + - ClickStack + - CloudFormation - Cosmos + - Customer Managed Encryption Keys + - DBT + - DBT + - DDL + - DNS + - Docker - Docker + - Docker Compose + - DigitalOcean Spaces + - Duo SAML + - EDOT + - Elastic Agent - Emmet - - gRPC + - FAQ + - Filebeat + - Frequently Asked Questions + - GA + - GCS + - GET + - GitHub + - Go + - Google + - Google Cloud + - Google Cloud Platform + - Google Cloud Storage + - HTTP + - Helm + - HyperDX + - HyperDX - I + - IdP + - JWT + - Java + - JSON + - JSON + - JSON format settings + - JSON settings + - Kafka + - Kafka + - Kafka Connect + - KMS - Kubernetes + - LZ4 - Linux - - macOS - - Marketplace + - MERGETREE + - Microsoft + - Middle East - MongoDB + - NPM + - NodeJS + - NodeJs + - OLAP + - OLTP + - ORC + - OSS + - OTel + - Okta + - OpenTelemetry + - Pandas + - Parquet + - PlanetScale + - Postgres + - PostgreSQL + - PSC + - Private Link + - Private Service Connect + - PrivateLink + - Prometheus + - Python + - RBAC - REPL + - REPLACE + - REST + - Role-Based Access Control + - SAML + - SOC + - SQL + - SQL + - SSO + - S3 + - SaaS + - SDK - Studio + - TCP + - TDE + - TTL + - TiDB + - Time To Live + - Transparent Data Encryption - TypeScript + - U.S. + - UDF + - UI - URLs - - Visual + - UAE + - User Defined Functions + - VLDB - VS + - VPC + - Vitess + - Visual - Windows - - JSON + - Yandex.Metrica + - ZSTD + - chDB + - ch_bucket_us_east1 + - ch_bucket_us_east4 + - gRPC + - macOS - MergeTree + - MySQL - ReplacingMergeTree + - SLA + - SLAs + - Beta + - Preview + - Private Preview + - CDC + - DB + - URL + - Build + - Testing + - Packaging + - Tier + - Tiers + - Overview + - Console + - Endpoints + - Backups + - Thresholds + - Keys + - Routing + - Cloud + - Change Data Capture + - PostLinks + - PostHistory + - DateTime + - Stack Overflow + - Homebrew + - WSL + - London + - LowCardinality + - Query 2. 
Average price per year in London + - V24.5 changelog for Cloud + - V24.6 changelog for Cloud + - SELECT + - PARTITION BY + - Common Table Expressions + - FixedString(N) + - AvgMerge + - AvgMergeState + - AvgState + - CountResample + - Avgmap + - Avgmergestate + - Avgstate + - Countresample + - Grouparrayresample + - Grouparraydistinct + - Maxmap + - Minsimplestate + - Minmap + - Sumarray + - Summap + - Uniqarray + - Sumsimplestate + - CityHash + - FixedString(N) + - Avg merge + - Avg merge state + - Avg state + - Count resample + - avgMerge + - avgMergeState + - avgState + - countResample + - PREWHERE + - Splunk + - Retool + - Airbyte + - RawBLOB + - RowBinary + - MessagePack + - Protocol Buffers + - Cap'n Proto + - TabSeparated + - TabSeparatedRaw + - TabSeparatedWithNames + - TabSeparatedWithNamesAndTypes + - TabSeparatedRawWithNames + - TabSeparatedRawWithNamesAndTypes + - CustomSeparated + - CustomSeparatedWithNames + - CustomSeparatedWithNamesAndTypes + - BSONEachRow + - PrettyNoEscapes + - PrettyMonoBlock + - PrettyNoEscapesMonoBlock + - PrettyCompact + - PrettyCompactNoEscapes + - PrettyCompactMonoBlock + - PrettyCompactNoEscapesMonoBlock + - PrettySpace + - PrettySpaceNoEscapes + - PrettySpaceMonoBlock + - PrettySpaceNoEscapesMonoBlock + - CapnProto + - ProtobufSingle + - ProtobufList + - AvroConfluent + - MsgPack + - predefined_query_handler + - dynamic_query_handler + - static + - clickhouse-local + - agx + - ch-ui + - HouseOps + - LightHouse + - qryn + - clickhouse-cli + - clickhouse-flamegraph + - clickhouse-plantuml + - xeus-clickhouse + - ClickCat + - ClickHouse-Mate + - clickhouse-monitoring + - CKibana + - SeekTable + - chproxy + - KittenHouse + - ClickHouse-Bulk + - input_format_json_try_infer_numbers_from_strings + - input_format_json_try_infer_named_tuples_from_objects + - input_format_json_use_string_type_for_ambiguous_paths_in_named_tuples_inference_from_objects + - input_format_json_read_objects_as_strings + - input_format_json_read_numbers_as_strings + - input_format_json_read_bools_as_numbers + - input_format_json_read_bools_as_strings + - input_format_json_read_arrays_as_strings + - input_format_json_infer_incomplete_types_as_strings + - input_format_csv_try_infer_numbers_from_strings + - input_format_max_rows_to_read_for_schema_inference + - input_format_max_bytes_to_read_for_schema_inference + - column_names_for_schema_inference + - schema_inference_hints + - schema_inference_make_columns_nullable + - input_format_try_infer_integers + - input_format_try_infer_datetimes + - input_format_try_infer_datetimes_only_datetime64 + - input_format_try_infer_dates + - input_format_try_infer_exponent_floats + - TSV + - TSKV + - format_template_row + - format_template_rows_between_delimiter + - format_template_resultset + - DataLens + - Avro + - Yandex + - Holistics + - ClickHouse Schema Flow Visualizer + - Holistics Software + - UPDATEs + - DELETEs + - INSERTs + - MERGEs + - DBT + - insert_overwrite + - Vector + - Nginx + - Dataflow + - QueryResult + - AsyncClient + - QueryContexts + - StreamContexts + - Easypanel + - Date32 + - Node.js + - EventHubs + - Kafka Connect Sink + - String + - Map + - Nested + - flatten_nested + - Tableau Online + - SSL/TLS + - Delta Lake + - Embeddable + - Draxlr + - Rust + - Rockset + - Astrato + - Chartbrew + - Deepnote + - Zing Data + - Mitzu + - Tableau + - DataGrip + - Grafana + - Datadog + - Splunk Enterprise + - Explo + - Hashboard + - Luzmo + - QuickSight + - Metabase + - Superset + - clickhouse-static-files-disk-uploader + - 
OpenTelemetry + - Kubernetes + - Data Lake + - Unity + - Databricks + - Read Delta + - Google Cloud Run + - ClickStack + - Build-Your-Own + - Time-Series + - New Features + - Bug Fixes + - Bug Fix + - Backward Incompatible Changes + - Backward Incompatible Change + - Build Improvements + - Main Changes + - New Feature + - Experimental Feature + - Security Fix + - Performance Improvements + - Performance Improvement + - Documentation Updates + - Documentation Update + - Improved Workflow for Developing and Assembling ClickHouse + - Please Note When Upgrading + - Complete List of Changes + - Minor Changes + - ClickHouse Release + - Improvements + - Build Changes + - Experimental Features + - Table of Contents + - Upgrade Notes + - CVE-2021-42390 + - CVE-2021-42391 + - CVE-2021-25263 + - CVE-2019-15024 + - CVE-2019-16535 + - CVE-2019-16536 + - CVE-2019-18657 + - CVE-2018-14672 + - CVE-2018-14671 + - CVE-2018-14668 + - CVE-2018-14669 + - CVE-2018-14670 + - Performance Optimizations + - Code Cleanup + - Cloud Changelog + - CVE-2025-1385 + - CVE-2024-6873 + - CVE-2024-22412 + - CVE-2023-47118 + - CVE-2023-48298 + - CVE-2023-48704 + - CVE-2022-44011 + - CVE-2022-44010 + - CVE-2021-43304 + - CVE-2021-43305 + - CVE-2021-42387 + - CVE-2021-42388 + - CVE-2021-42389 + - cmake-variants.yaml + - launch.json + - settings.json + - run-debug.sh + - tasks.json + - Third-Party Libraries + - Build Clickhouse with DEFLATE_QPL + - Run Benchmark with DEFLATE_QPL + - Rust Libraries + - DEFLATE_QPL + - Star Schema + - DockerHub + - cpp + - codespell + - aspell + - mypy + - AST + - "C\\+\\+" + - clang-tidy + - DataLakeCatalog + - MergeTree + - ReplacingMergeTree + - SummingMergeTree - AggregatingMergeTree - - DigitalOcean Spaces - - Azure Blob Storage - - VPC - - BYOC - - TiDB - - PlanetScale - - Vitess + - CollapsingMergeTree + - VersionedCollapsingMergeTree + - GraphiteMergeTree + - TinyLog + - StripeLog + - MaterializedView + - DROP/DETACH TABLE + - ReplicatedMergeTree + - AzureQueue + - DeltaLake + - AzureBlobStorage + - Hudi + - MinMax + - Bloom + - N-gram + - EmbeddedRocksDB + - HDFS + - Hive + - ORC + - Parquet + - NATS - MySQL - ClickPipe - ClickPipes @@ -69,3 +495,395 @@ exceptions: - OTel - SQL - OSS + - URL + - RabbitMQ + - Iceberg + - TimeSeries + - AggregateFunction + - SimpleAggregateFunction + - FileLog + - system.asynchronous_insert_log + - system.asynchronous_loader + - system.asynchronous_metrics + - system.backups + - system.backup_log + - system.iceberg_history + - system.latency_buckets + - system.latency_log + - system.histogram_metrics + - system.licenses + - system.merges + - system.merge_tree_settings + - system.metric_log + - system.moves + - system.numbers + - system.numbers_mt + - system.one + - system.opentelemetry_span_log + - system.mutations + - system.parts_columns + - system.projections + - system.processors_profile_log + - system.query_cache + - system.query_condition_cache + - system.query_metric_log + - system.query_thread_log + - system.quota_limits + - system.query_views_log + - system.quotas_usage + - system.replicated_fetches + - system.role_grants + - system.replication_queue + - system.s3_queue_settings + - system.row_policies + - system.schema_inference_cache + - system.server_settings + - system.settings_changes + - system.session_log + - system.settings_profile_elements + - system.settings_profiles + - system.stack_trace + - system.storage_policies + - system.table_engines + - system.user_processes + - system.view_refreshes + - system.zookeeper_connection + - 
system.zookeeper_log + - AsynchronousHeavyMetricsCalculationTimeSpent + - AsynchronousHeavyMetricsUpdateInterval + - AsynchronousMetricsCalculationTimeSpent + - AsynchronousMetricsUpdateInterval + - BlockActiveTime_ + - BlockDiscardBytes_ + - BlockDiscardMerges_ + - BlockDiscardOps_ + - BlockDiscardTime_ + - BlockInFlightOps_ + - BlockQueueTime_ + - BlockReadBytes_ + - BlockReadMerges_ + - BlockReadOps_ + - BlockReadTime_ + - BlockWriteBytes_ + - BlockWriteMerges_ + - BlockWriteOps_ + - BlockWriteTime_ + - CPUFrequencyMHz_ + - AggregatorThreads + - AggregatorThreadsActive + - TablesLoaderForegroundThreads + - TablesLoaderForegroundThreadsActive + - TablesLoaderBackgroundThreads + - TablesLoaderBackgroundThreadsActive + - AsynchronousReadWait + - BackgroundBufferFlushSchedulePoolSize + - BackgroundBufferFlushSchedulePoolTask + - BackgroundCommonPoolSize + - BackgroundCommonPoolTask + - BackgroundDistributedSchedulePoolSize + - BackgroundDistributedSchedulePoolTask + - BackgroundFetchesPoolSize + - BackgroundFetchesPoolTask + - BackgroundMergesAndMutationsPoolSize + - BackgroundMergesAndMutationsPoolTask + - BackgroundMessageBrokerSchedulePoolSize + - BackgroundMessageBrokerSchedulePoolTask + - BackgroundMovePoolSize + - BackgroundMovePoolTask + - BackgroundSchedulePoolSize + - BackgroundSchedulePoolTask + - CacheDetachedFileSegments + - CacheDictionaryThreads + - CacheDictionaryThreadsActive + - CacheDictionaryUpdateQueueBatches + - CacheFileSegments + - ContextLockWait + - DatabaseCatalogThreads + - DatabaseCatalogThreadsActive + - DatabaseOnDiskThreads + - DatabaseOnDiskThreadsActive + - DestroyAggregatesThreads + - DestroyAggregatesThreadsActive + - DictCacheRequests + - DiskObjectStorageAsyncThreads + - DiskObjectStorageAsyncThreadsActive + - DiskSpaceReservedForMerge + - DistributedSend + - EphemeralNode + - FilesystemCacheElements + - FilesystemCacheReadBuffers + - FilesystemCacheSize + - QueryCacheBytes + - QueryCacheEntries + - UncompressedCacheBytes + - UncompressedCacheCells + - CompiledExpressionCacheBytes + - CompiledExpressionCacheCount + - MarkCacheBytes + - MarkCacheFiles + - GlobalThread + - LocalThread + - MemoryCode + - MemoryShared + - NetworkSend + - OSGuestTime + - OSNiceTime + - OSOpenFiles + - OSUptime + - OSUserTime + - PartMutation + - PartsActive + - PartsCompact + - PartsWide + - QueryThread + - RemoteRead + - SendScalars + - keeper_response_time_ms_bucket + - KEY_COLUMN_USAGE + - user_name/password + - username/ssh-key + - access_management + - grants + - user_name/networks + - user_name/profile + - user_name/quota + - user_name/databases + - readonly + - allow_ddl + - max_pipeline_depth + - histogram_metrics + - latency_log + - system.part_log + - system.parts + - system.processes + - system.query_log + - backup_log + - backups + - clickhouse-benchmark + - clickhouse-format utility + - compression + - core_dump + - crash_log + - encryption + - error_log + - Example + - File System + - graphite + - hsts_max_age + - Huge Pages + - include_from + - ldap_servers + - Linux Kernel + - listen_host + - listen_try + - logger + - macros + - merge_tree + - metric_log + - mysql_port + - openSSL + - part_log + - path + - prometheus + - proxy + - query_cache + - query_log + - s3queue_log + - ssh_server + - system.metrics + - system.quotas + - system.replicas + - system.resources + - system.roles + - system.scheduler + - system.settings + - system.table_engine + - system.tables + - system.text_log + - system.time_zones + - system.trace_log + - system.users + - system.warnings 
+ - system.workloads + - system.zookeeper + - tcp_port + - tcp_ssh_port + - text_log + - timezone + - trace_log + - users_config + - zookeeper + - clickhouse_backupview + - jemalloc.active + - jemalloc.allocated + - jemalloc.arenas.all.dirty_purged + - jemalloc.arenas.all.muzzy_purged + - jemalloc.arenas.all.pactive + - jemalloc.arenas.all.pdirty + - jemalloc.arenas.all.pmuzzy + - jemalloc.background_thread.num_runs + - jemalloc.background_thread.num_threads + - jemalloc.background_thread.run_intervals + - jemalloc.epoch + - jemalloc.mapped + - jemalloc.metadata + - jemalloc.metadata_thp + - jemalloc.resident + - jemalloc.retained + - jemalloc.prof.active + - clickhouse-keeper-client + - toBFloat16 + - toFloat32 + - toFloat64 + - toDecimal32 + - toDecimal64 + - toDecimal128 + - toDecimal256 + - toFloat32OrZero + - toFloat32OrNull + - toFloat32OrDefault + - toFloat64OrZero + - toFloat64OrNull + - toFloat64OrDefault + - toBFloat16OrZero + - toBFloat16OrNull + - toDecimal32OrZero + - toDecimal32OrNull + - toDecimal32OrDefault + - toDecimal64OrZero + - toDecimal64OrNull + - toDecimal64OrDefault + - toDecimal128OrZero + - toDecimal128OrNull + - toDecimal128OrDefault + - toDecimal256OrZero + - toDecimal256OrNull + - toDecimal256OrDefault + - reinterpretAsFloat32 + - reinterpretAsFloat64 + - accurateCast + - accurateCastOrNull + - accurateCastOrDefault + - toUnixTimestamp64Second + - toUnixTimestamp64Milli + - toUnixTimestamp64Micro + - toUnixTimestamp64Nano + - fromUnixTimestamp64Second + - fromUnixTimestamp64Milli + - fromUnixTimestamp64Micro + - fromUnixTimestamp64Nano + - accurateCast.* + - accurateCastOrNull.* + - accurateCastOrDefault.* + - fromDaysSinceYearZero32 + - now64 + - getMacro + - blockSize + - currentUser + - transform.* + - runningDifference + - lengthUTF8 + - leftUTF8 + - leftPadUTF8 + - rightUTF8 + - rightPadUTF8 + - lowerUTF8 + - upperUTF8 + - isValidUTF8 + - toValidUTF8 + - reverseUTF8 + - substringUTF8 + - endsWithUTF8 + - startsWithUTF8 + - normalizeUTF8NFC + - normalizeUTF8NFD + - normalizeUTF8NFKC + - normalizeUTF8NFKD + - editDistanceUTF8 + - initcapUTF8 + - sparseGramsUTF8 + - sparseGramsHashesUTF8 + - base32Encode + - base32Decode + - tryBase32Decode + - base58Encode + - base58Decode + - tryBase58Decode + - base64Encode + - base64Decode + - tryBase64Decode + - CRC32 + - CRC64 + - positionUTF8 + - multiSearchAllPositionsUTF8 + - multiSearchFirstPositionUTF8 + - multiSearchAnyUTF8 + - ngramDistanceUTF8 + - ngramSearchUTF8 + - hasSubsequenceUTF8 + - halfMD5 + - MD4 + - MD5 + - sipHash64 + - sipHash64Keyed + - sipHash128 + - sipHash128Keyed + - sipHash128Reference + - sipHash128ReferenceKeyed + - cityHash64 + - intHash32 + - intHash64 + - SHA1 + - SHA224 + - SHA256 + - SHA512 + - SHA512_256 + - SHA1, SHA224, SHA256, SHA512, SHA512_256 + - BLAKE3 + - farmFingerprint64 + - farmHash64 + - javaHashUTF16LE + - metroHash64 + - murmurHash2_32 + - murmurHash2_64 + - murmurHash3_32 + - murmurHash3_64 + - murmurHash3_128 + - xxh3 + - xxHash32 + - xxHash64 + - ngramSimHashUTF8 + - wordShingleSimHashUTF8 + - wyHash64 + - ngramMinHashUTF8 + - ngramMinHashArgUTF8 + - wordShingleMinHashUTF8 + - wordShingleMinHashArgUTF8 + - keccak256 + - uniqUpTo.* + - quantileBFloat16Weighted + - quantiles + - quantileExact + - uniqHLL12 + - uniqCombined64 + - nonNegativeDerivative.* + - MySQL Docs + - Postgres Docs + - simpleJSON.*functions + - atan2 + - e + - exp10 + - exp2 + - intExp10 + - intExp2 + - log10 + - log1p + - log2 + - bitmaskToList.* + - bitmaskToArray.* + - bitPositionsToArray.* + - 
h3kRing + - stringToH3 + - h3PointDistM + - h3PointDistKm + - h3PointDistRads
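The regex entries at the top of the expanded `exceptions` list above are meant to exempt whole families of identifiers (camelCase function names, all-caps acronyms, PascalCase metric names, metric names with trailing underscores, and snake_case setting names) so that each one does not need a literal entry. As a rough, non-authoritative illustration of what those five patterns cover, the following Python sketch applies the same expressions to a handful of sample heading words; the sample words are chosen for illustration, and checking words one at a time is only an approximation of how Vale applies heading exceptions.

```python
import re

# The regex exceptions added to styles/ClickHouse/Headings.yml, copied verbatim.
PATTERNS = {
    "camelCase": r"^[a-z]+[A-Z][a-zA-Z]*$",
    "all uppercase": r"^[A-Z]+$",
    "PascalCase metric": r"^[A-Z][a-zA-Z]*[A-Z][a-zA-Z]*$",
    "metric with underscore": r"^[A-Z][a-zA-Z]*[A-Z][a-zA-Z]*_.*$",
    "snake_case setting": r"^[a-z][a-z_]*[a-z]$",
}

# Illustrative heading words; a word may match more than one pattern.
SAMPLES = ["currentUser", "TTL", "MarkCacheBytes", "BlockReadBytes_", "query_log", "Features"]

for word in SAMPLES:
    hits = [name for name, pattern in PATTERNS.items() if re.match(pattern, word)]
    if hits:
        print(f"{word}: exempt via {', '.join(hits)}")
    else:
        print(f"{word}: no regex exception; sentence-case rules apply")
```

Running the sketch shows, for example, that `MarkCacheBytes` is exempted by the PascalCase pattern while a plain word such as `Features` matches none of them, so it would still need to follow sentence case unless it is listed literally.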