[feat](iceberg) Supports using rest type catalog to read tables in unity catalog
#43525
base: master
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
clang-tidy made some suggestions
@@ -259,6 +259,10 @@ std::vector<tparquet::KeyValue> ParquetReader::get_metadata_key_values() {
    return _t_metadata->key_value_metadata;
}

const FieldDescriptor ParquetReader::get_file_metadata_schema() {
warning: return type 'const std::doris::vectorized::FieldDescriptor' is 'const'-qualified at the top level, which may reduce code readability without improving const correctness [readability-const-return-type]
- const FieldDescriptor ParquetReader::get_file_metadata_schema() {
+ FieldDescriptor ParquetReader::get_file_metadata_schema() {
be/src/vec/exec/format/parquet/vparquet_reader.h:151:
- const FieldDescriptor get_file_metadata_schema();
+ FieldDescriptor get_file_metadata_schema();
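The warning can also have a practical cost: a `const`-qualified value return cannot bind to a move constructor or move assignment operator, so the caller may fall back to a copy. The sketch below illustrates this with a hypothetical `Descriptor` type that counts copies and moves. It is a stand-in for illustration only, not the actual `FieldDescriptor` from this PR, and it assumes C++17 for `static inline` members.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a schema object; it records how it is transferred.
struct Descriptor {
    static inline int copies = 0;
    static inline int moves = 0;
    std::string name;

    Descriptor() = default;
    explicit Descriptor(std::string n) : name(std::move(n)) {}
    Descriptor(const Descriptor& o) : name(o.name) { ++copies; }
    Descriptor(Descriptor&& o) noexcept : name(std::move(o.name)) { ++moves; }
    Descriptor& operator=(const Descriptor& o) {
        name = o.name;
        ++copies;
        return *this;
    }
    Descriptor& operator=(Descriptor&& o) noexcept {
        name = std::move(o.name);
        ++moves;
        return *this;
    }
};

// Top-level const on the return type: the temporary is a const rvalue,
// which can only bind to the copy operations.
const Descriptor make_const() { return Descriptor("schema"); }

// Plain value return: the temporary can be moved from.
Descriptor make_plain() { return Descriptor("schema"); }
```

Assigning the result of `make_const()` to an existing object selects the copy assignment operator, while `make_plain()` selects the move assignment operator.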
run buildall
TeamCity be ut coverage result:
run buildall
TeamCity be ut coverage result:
icebergCatalog = ((HMSExternalCatalog) key.catalog).getIcebergHiveCatalog();
Catalog icebergCatalog = ((HMSExternalCatalog) key.catalog).getIcebergHiveCatalog();
icebergTable = HiveMetaStoreClientHelper.ugiDoAs(((ExternalCatalog) key.catalog).getConfiguration(),
        () -> icebergCatalog.loadTable(TableIdentifier.of(key.dbName, key.tableName)));
We should unify the interface: use either catalog.loadTable() or metadataOps.loadTable() in both places.
@@ -39,6 +39,7 @@ public abstract class IcebergExternalCatalog extends ExternalCatalog {
    public static final String ICEBERG_HADOOP = "hadoop";
    public static final String ICEBERG_GLUE = "glue";
    public static final String ICEBERG_DLF = "dlf";
    public static final String EXTERNAL_SERVER_CATALOG_NAME = "external_server_catalog_name";
- public static final String EXTERNAL_SERVER_CATALOG_NAME = "external_server_catalog_name";
+ public static final String EXTERNAL_SERVER_CATALOG_NAME = "external_catalog.name";
        _has_schema_change = true;
    }
}
Status IcebergParquetReader::_gen_col_name_maps(FieldDescriptor field_desc) {
- Status IcebergParquetReader::_gen_col_name_maps(FieldDescriptor field_desc) {
+ Status IcebergParquetReader::_gen_col_name_maps(const FieldDescriptor& field_desc) {
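The reason for this suggestion is that a by-value parameter copies the whole descriptor on every call, while a `const` reference only reads the caller's object. A minimal sketch follows, using a hypothetical `FieldSchema` type with a copy counter (not the real `FieldDescriptor`) and assuming C++17:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical schema type that counts how often it is copied.
struct FieldSchema {
    static inline int copies = 0;
    std::vector<std::string> names;

    FieldSchema() = default;
    FieldSchema(const FieldSchema& o) : names(o.names) { ++copies; }
};

// By value: the argument is copied into the parameter on each call.
std::size_t count_by_value(FieldSchema schema) { return schema.names.size(); }

// By const reference: no copy, and the function still cannot modify the argument.
std::size_t count_by_ref(const FieldSchema& schema) { return schema.names.size(); }
```

For a read-only helper such as a name-mapping builder, the reference version gives the same result without duplicating the vector of field names.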
@@ -149,6 +149,7 @@ class ParquetReader : public GenericReader {
            const std::unordered_map<std::string, VExprContextSPtr>& missing_columns) override;

    std::vector<tparquet::KeyValue> get_metadata_key_values();
Can this method be removed?
@@ -218,7 +218,7 @@ class IcebergParquetReader final : public IcebergTableReader {
        parquet_reader->set_delete_rows(&_iceberg_delete_rows);
    }

-    Status _gen_col_name_maps(std::vector<tparquet::KeyValue> parquet_meta_kv);
+    Status _gen_col_name_maps(FieldDescriptor field_desc);
This method also needs to be modified in IcebergOrcReader.
The ORC reader already matches columns by field id, so it does not need to be modified.
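To illustrate what matching by field id means for schema evolution, here is a rough sketch. The `FileField` and `TableColumn` structs and the `map_table_to_file` helper are hypothetical names introduced for this example; they are not the types or functions used in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical: a column as recorded in the data file's schema.
struct FileField {
    int32_t field_id;
    std::string name;
};

// Hypothetical: a column as defined in the current table schema.
struct TableColumn {
    int32_t field_id;
    std::string name;
};

// Builds a map from table column name to file column name by joining on field id.
// Table columns whose id is absent from the file (e.g. added later) are skipped.
std::unordered_map<std::string, std::string> map_table_to_file(
        const std::vector<TableColumn>& table, const std::vector<FileField>& file) {
    std::unordered_map<int32_t, std::string> id_to_file_name;
    for (const auto& field : file) {
        id_to_file_name.emplace(field.field_id, field.name);
    }
    std::unordered_map<std::string, std::string> table_to_file;
    for (const auto& column : table) {
        auto it = id_to_file_name.find(column.field_id);
        if (it != id_to_file_name.end()) {
            table_to_file.emplace(column.name, it->second);
        }
    }
    return table_to_file;
}
```

Because the join key is the id rather than the name, a column renamed in the table schema still resolves to its original name in an older data file.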
run buildall
clang-tidy made some suggestions
@@ -147,6 +150,14 @@ Status FieldDescriptor::parse_from_thrift(const std::vector<tparquet::SchemaElem
    return Status::OK();
}

const doris::Slice FieldDescriptor::get_column_name_from_field_id(int32_t id) const {
warning: return type 'const doris::Slice' is 'const'-qualified at the top level, which may reduce code readability without improving const correctness [readability-const-return-type]
- const doris::Slice FieldDescriptor::get_column_name_from_field_id(int32_t id) const {
+ doris::Slice FieldDescriptor::get_column_name_from_field_id(int32_t id) const {
be/src/vec/exec/format/parquet/schema_desc.h:137:
- const doris::Slice get_column_name_from_field_id(int32_t id) const;
+ doris::Slice get_column_name_from_field_id(int32_t id) const;
// This is for iceberg schema evolution.
std::vector<tparquet::KeyValue> ParquetReader::get_metadata_key_values() {
    return _t_metadata->key_value_metadata;
const FieldDescriptor ParquetReader::get_file_metadata_schema() {
warning: return type 'const std::doris::vectorized::FieldDescriptor' is 'const'-qualified at the top level, which may reduce code readability without improving const correctness [readability-const-return-type]
- const FieldDescriptor ParquetReader::get_file_metadata_schema() {
+ FieldDescriptor ParquetReader::get_file_metadata_schema() {
be/src/vec/exec/format/parquet/vparquet_reader.h:150:
- const FieldDescriptor get_file_metadata_schema();
+ FieldDescriptor get_file_metadata_schema();
run buildall
run buildall
What problem does this PR solve?
Supports using a rest type catalog to read tables in Unity Catalog (https://github.com/unitycatalog/unitycatalog).
Example:
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)