S3 hive style writes #697

Open
wants to merge 33 commits into base: antalya

arthurpassos
Collaborator

More info on ClickHouse#76802

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add support for hive partition style writes

Documentation entry for user-facing changes

@arthurpassos
Collaborator Author

Depends on #700. That and writing more tests are the only things missing, I guess.

@altinity-robot
Collaborator

altinity-robot commented Mar 27, 2025

This is an automated comment for commit abdd84a with description of existing statuses. It's updated for the latest CI running

❌ Click here to open a full report in a separate page

| Check name | Description | Status |
| --- | --- | --- |
| Builds | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ❌ failure |
| Integration tests | The integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests | ❌ failure |
| Regression aarch64 Tiered Storage s3amazon | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ❌ failure |
| Regression aarch64 Tiered Storage s3gcs | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ❌ failure |
| Regression release Tiered Storage s3amazon | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ❌ failure |
| Regression release Tiered Storage s3gcs | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ❌ failure |
| Sign aarch64 | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ❌ error |
| Sign release | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ❌ error |
| Stateless tests | Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc | ❌ failure |

Successful checks

| Check name | Description | Status |
| --- | --- | --- |
| Compatibility check | Checks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help | ✅ success |
| Docker keeper image | The check to build and optionally push the mentioned image to docker hub | ✅ success |
| Docker server image | The check to build and optionally push the mentioned image to docker hub | ✅ success |
| Install packages | Checks that the built packages are installable in a clear environment | ✅ success |
| Ready for release | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 Alter move partition | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 Benchmark aws_s3 | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 Benchmark gcs | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 Benchmark minio | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 Clickhouse Keeper SSL | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 LDAP authentication | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 LDAP external_user_directory | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 LDAP role_mapping | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 Parquet aws_s3 | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 Parquet minio | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 Parquet | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 Tiered Storage minio | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 aes_encryption | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 atomic_insert | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 base_58 | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 clickhouse_keeper | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 data_types | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 datetime64_extended_range | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 disk_level_encryption | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 dns | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 engines | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 example | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 extended_precision_data_types | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 kafka | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 kerberos | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 key_value | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 lightweight_delete | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 memory | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 part_moves_between_shards | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 selects | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 session_timezone | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 tiered_storage | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression aarch64 window_functions | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression release Alter move partition | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression release Benchmark aws_s3 | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression release Benchmark minio | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression release LDAP authentication | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression release LDAP external_user_directory | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression release LDAP role_mapping | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression release atomic_insert | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression release base_58 | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression release data_types | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression release datetime64_extended_range | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression release dns | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression release engines | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression release example | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression release extended_precision_data_types | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression release kafka | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression release kerberos | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression release key_value | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression release part_moves_between_shards | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Regression release session_timezone | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Stateful tests | Runs stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc | ✅ success |

throw Exception(ErrorCodes::LOGICAL_ERROR, "Table level partition expression and query level partition expression can't be specified together, this is a bug");
}

static std::unordered_map<std::string, bool> partitioning_style_to_wildcard_acceptance =

This looks a little overengineered. It makes sense if we expect more variants in the future.

Collaborator Author

That's my assumption. Do you think we could keep it, or do you want it changed?

The code works, so I think it can be kept.
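
For readers following the thread, a minimal sketch of the lookup-table approach under discussion. The style names `"auto"` and `"hive"` and their values are assumptions made for illustration (the map's actual contents are not shown in this excerpt), and `acceptsWildcard` is a hypothetical helper, not a function from the PR.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical contents: the real map in the PR may use other names or values.
static const std::unordered_map<std::string, bool> style_accepts_wildcard = {
    {"auto", true},   // assumed: wildcard in the path is allowed
    {"hive", false},  // assumed: wildcard in the path is rejected
};

// Returns whether the given style permits a partition wildcard in the path.
// An unknown style raises std::invalid_argument with a readable message,
// rather than the std::out_of_range that unordered_map::at would throw.
bool acceptsWildcard(const std::string & style)
{
    const auto it = style_accepts_wildcard.find(style);
    if (it == style_accepts_wildcard.end())
        throw std::invalid_argument("Unknown partitioning style: " + style);
    return it->second;
}
```

With only two variants a plain comparison would do the same job; the table pays off once more styles are added, which is the trade-off raised above.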

configuration->partitioning_style);
}

if (configuration->withPartitionWildcard() && !partitioning_style_to_wildcard_acceptance.at(configuration->partitioning_style))

The withPartitionWildcard method searches for a substring on every call. It would be better to call it once and keep the result in a local variable.

Collaborator Author

Makes sense
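
A rough sketch of the suggested change, assuming `withPartitionWildcard` scans the path for the `{_partition_id}` placeholder. The `Configuration` struct, the `WildcardCheck` result type, and `checkWildcard` are hypothetical stand-ins, not the PR's actual types.

```cpp
#include <string>

// Hypothetical stand-in for the configuration object seen in the diff.
struct Configuration
{
    std::string path;
    std::string partitioning_style;

    // Assumed behavior: a substring search over the path on every call.
    bool withPartitionWildcard() const
    {
        return path.find("{_partition_id}") != std::string::npos;
    }
};

// Hypothetical result type so the cached flag is visibly reused.
struct WildcardCheck
{
    bool has_wildcard;
    bool allowed;
};

WildcardCheck checkWildcard(const Configuration & configuration, bool style_accepts_wildcard)
{
    // Run the substring search once and reuse the result below.
    const bool has_partition_wildcard = configuration.withPartitionWildcard();

    // A wildcard is a problem only when the style does not accept one.
    const bool allowed = !has_partition_wildcard || style_accepts_wildcard;

    return {has_partition_wildcard, allowed};
}
```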

/// - https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html
/// - https://cloud.ibm.com/apidocs/cos/cos-compatibility#putobject

if (str.empty() || str.size() > 1024)

Just a comment, not a change request for now, since this code already existed and was only moved into a namespace, but I don't like it. As I understand it, the key is generated inside ClickHouse code and the customer can't fully control its length. When they get this error, what's next? "OK, the key is too long, how can I fix it?"
Maybe we should add a TODO item to think about automatically shortening the key in cases like this.

Collaborator Author

It is not fully generated by ClickHouse. The key here represents the path without the bucket, and part of it can be specified by the user upon table creation.
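
To illustrate the point, a small sketch of how a user-supplied prefix and a generated hive-style `column=value` segment might both contribute to the key length. `isValidKeyLength` mirrors the length condition in the diff; `buildKey` and its parameters are hypothetical and only show the general layout, not the PR's actual path construction.

```cpp
#include <string>

// Mirrors the length condition from the diff: an object key must be
// non-empty and at most 1024 bytes long.
bool isValidKeyLength(const std::string & key)
{
    return !key.empty() && key.size() <= 1024;
}

// Hypothetical helper: the user controls user_prefix (chosen at table
// creation), while the "column=value" directory and the file name are
// generated at write time.
std::string buildKey(
    const std::string & user_prefix,
    const std::string & column,
    const std::string & value,
    const std::string & file_name)
{
    return user_prefix + "/" + column + "=" + value + "/" + file_name;
}
```

Under these assumptions, a shorter prefix at table creation is one thing the user can change when the check fails, while the length of the partition value depends on the data.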

*/
std::string formatToFileExtension(const std::string & format)
{
std::string lower_case_format;

How is this different from just return Poco::toLower(format)?

Collaborator Author

I'll use Poco::toLower
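
For comparison, a standard-library-only sketch of the same lowercasing, so it compiles without Poco. The name `formatToFileExtensionSketch` is hypothetical, and the sketch assumes the function does nothing beyond lowercasing the format name; the actual change would call `Poco::toLower` as agreed above.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical equivalent of the simplified function: lowercases an ASCII
// format name such as "Parquet" so it can serve as a file extension.
std::string formatToFileExtensionSketch(std::string format)
{
    std::transform(
        format.begin(), format.end(), format.begin(),
        // Cast through unsigned char: std::tolower is undefined for negative values.
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return format;
}
```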

@arthurpassos
Collaborator Author

arthurpassos commented Apr 2, 2025

Depends on #710 and #711; tests will not pass otherwise.
