Table has more than one bucket key, but "show create table xxx" only displays one #11090

Open
madeirak opened this issue Sep 6, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@madeirak

madeirak commented Sep 6, 2024

Apache Iceberg version

1.4.3

Query engine

Spark

Please describe the bug 🐞

[Screenshot: output of "select * from xx.xx.partitions"]
The output of "select * from xx.xx.partitions" above shows that this table has two bucket keys.
But "show create table xx.xx", shown below, displays only one bucket key:
[Screenshot: output of "show create table xx.xx"]

@madeirak madeirak added the bug Something isn't working label Sep 6, 2024
@manuzhang
Contributor

manuzhang commented Sep 19, 2024

The table has two partition keys from two partition transforms, one of which is a bucket transform.

@madeirak
Author

madeirak commented Sep 19, 2024

[Screenshot]
Are these two partition transforms, name_bucket_10 and id_bucket_10, equivalent?

Are both based on hashing?
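Both name_bucket_10 and id_bucket_10 come from the same transform, bucket(10, ...), applied to different columns. Per the Iceberg table spec, the bucket transform hashes a value with 32-bit Murmur3 (x86 variant, seed 0) and computes (hash & Integer.MAX_VALUE) % N; int and long values are hashed as 8-byte little-endian longs, strings as their UTF-8 bytes. The following is a minimal Python sketch of that rule, limited to int and string values (other types are omitted). The function names `murmur3_x86_32`, `hash_value`, and `bucket` are illustrative only, not Iceberg's API.

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit unsigned value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """32-bit Murmur3 (x86 variant), returned as a signed int like Java's."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4
    # Body: mix each 4-byte little-endian block into the hash state.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i : 4 * i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        k = (k * c2) & MASK32
        h = _rotl32(h ^ k, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    # Tail: mix the remaining 1 to 3 bytes, if any.
    tail = data[4 * nblocks :]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = _rotl32((k * c1) & MASK32, 15)
        k = (k * c2) & MASK32
        h ^= k
    # Finalization (fmix32).
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h - (1 << 32) if h & 0x80000000 else h


def hash_value(value) -> int:
    """Hash a value the way this sketch assumes Iceberg does for int/string."""
    if isinstance(value, int):
        # int and long are both hashed as an 8-byte little-endian long
        return murmur3_x86_32(struct.pack("<q", value))
    if isinstance(value, str):
        return murmur3_x86_32(value.encode("utf-8"))
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def bucket(num_buckets: int, value) -> int:
    """bucket(N, v): non-negative part of the hash, modulo N."""
    return (hash_value(value) & 0x7FFFFFFF) % num_buckets
```

For the example row `(1, '1')`, `bucket(10, 1)` and `bucket(10, "1")` would compute the id and name bucket indices. The principle is the same hash for both fields; only the input column differs.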

@manuzhang
Contributor

Sorry, I missed name_bucket_10 part. How did you create your table? With which catalog?

@madeirak
Author

Something similar to the following:

create table dbxx.tbxx (id INT COMMENT '11', name STRING COMMENT '') USING iceberg PARTITIONED BY (name, bucket(10, name), bucket(10, id));
insert into dbxx.tbxx values (1, '1');
show create table dbxx.tbxx;
select * from dbxx.tbxx.partitions;

@madeirak
Author

With HiveCatalog

@lurnagao-dahua
Contributor

I am quite puzzled why name is used both as a partition column and as a bucket column. In this case, all the data in a given name partition falls into the same bucket, so the bucketing has no effect.

@madeirak
Author

madeirak commented Sep 25, 2024

This is just an example, not a real table. The main issue is that when a table has multiple bucket fields, "show create table xxx" displays only one of them.

@manuzhang
Contributor

The show create table output follows Spark SQL syntax, which supports only one bucket field.

@madeirak
Author

OK, fine. It would be better if the output matched the syntax shown in the Iceberg documentation:
[Screenshot from the Iceberg documentation]
ref: https://iceberg.apache.org/docs/latest/spark-ddl/#partitioned-by
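The difference between the two outputs can be modeled in a few lines. The following is a minimal Python sketch, assuming a simplified partition spec that contains only identity and bucket transforms. One renderer lists every transform inside PARTITIONED BY, as the Iceberg Spark DDL docs do. The other has room for a single bucket spec, mirroring the one-bucket-field limitation of the Spark SQL syntax. `PartitionField`, `render_iceberg_ddl`, and `render_single_bucket_spec` are hypothetical names, not Spark or Iceberg APIs. The thread does not show which bucket field the real Spark output keeps, so this sketch arbitrarily keeps the last one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PartitionField:
    transform: str                    # "identity" or "bucket" in this sketch
    column: str
    num_buckets: Optional[int] = None  # only set for bucket transforms


def describe(field: PartitionField) -> str:
    """Render one partition field in Iceberg DDL form."""
    if field.transform == "bucket":
        return f"bucket({field.num_buckets}, {field.column})"
    return field.column


def render_iceberg_ddl(fields: List[PartitionField]) -> str:
    """Every transform goes inside PARTITIONED BY, as in the Iceberg docs."""
    return "PARTITIONED BY (" + ", ".join(describe(f) for f in fields) + ")"


def render_single_bucket_spec(fields: List[PartitionField]) -> str:
    """Hypothetical Spark-style rendering with a single bucket-spec slot."""
    partitioned = [describe(f) for f in fields if f.transform != "bucket"]
    buckets = [f for f in fields if f.transform == "bucket"]
    lines = []
    if partitioned:
        lines.append("PARTITIONED BY (" + ", ".join(partitioned) + ")")
    if buckets:
        kept = buckets[-1]  # arbitrary: later bucket fields overwrite earlier ones
        lines.append(f"CLUSTERED BY ({kept.column})")
        lines.append(f"INTO {kept.num_buckets} BUCKETS")
    return "\n".join(lines)


# The partition spec from the reproduction in this thread.
spec = [
    PartitionField("identity", "name"),
    PartitionField("bucket", "name", 10),
    PartitionField("bucket", "id", 10),
]
print(render_iceberg_ddl(spec))
print(render_single_bucket_spec(spec))
```

With one bucket slot, one of the two bucket fields is necessarily dropped. Listing every transform in PARTITIONED BY, as in the Iceberg form, preserves all of them.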

3 participants