support rcbinary/rctext/sequence file for external file table in jni connector #41067

Closed
0130w wants to merge 13 commits

Conversation

@0130w 0130w commented Sep 20, 2024

Proposed changes

Issue Number: #30669

This change supports reading the contents of external file tables from rcbinary, rctext, and sequence files via the JNI connector.

To-do list:

  • Support reading the table structure from Hive Metastore
  • Support S3/HDFS file system types
  • Reword commit messages

Example:

mysql> select * from local( "file_path" = "ziqitest/row_1.seq", "format" = "sequence", "backend_id" = "10011");
+-------------+--------------+---------+-------------+-----------+------------+-------------+------------+------------+-------------+-------------+----------------------------+------------+-----------------+----------------------+---------------------------+
| col_tinyint | col_smallint | col_int | col_bigint  | col_float | col_double | col_decimal | col_string | col_char   | col_varchar | col_boolean | col_timestamp              | col_date   | col_array       | col_map              | col_struct                |
+-------------+--------------+---------+-------------+-----------+------------+-------------+------------+------------+-------------+-------------+----------------------------+------------+-----------------+----------------------+---------------------------+
|           7 |           13 |      74 | 13000000000 |      6.15 |      4.376 |       57.30 | world      | Char       | Varchar     |           1 | 2022-01-01 10:00:00.000000 | 2022-01-01 | ["A", "B", "C"] | {"key2":2, "key1":1} | {"name":"John", "age":30} |
+-------------+--------------+---------+-------------+-----------+------------+-------------+------------+------------+-------------+-------------+----------------------------+------------+-----------------+----------------------+---------------------------+
1 row in set (1.12 sec)

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@0130w 0130w closed this Sep 20, 2024
public static final String NAME = "local";
public static final String PROP_FILE_PATH = "file_path";
public static final String PROP_BACKEND_ID = "backend_id";
public static final String PROP_SHARED_STORAGE = "shared_storage";

private static final Logger LOG = LogManager.getLogger(LocalTableValuedFunction.class);
Contributor

No need to move this code

@@ -144,6 +143,11 @@ public BrokerDesc getBrokerDesc() {
return new BrokerDesc("LocalTvfBroker", StorageType.LOCAL, locationProperties);
}

@Override
Contributor

Do not move code that is unrelated to your feature.

@@ -258,6 +258,15 @@ public void analyze(Analyzer analyzer, List<Expr> resultExprs, List<String> colL
headerType = FileFormatConstants.FORMAT_CSV_WITH_NAMES_AND_TYPES;
fileFormatType = TFileFormatType.FORMAT_CSV_PLAIN;
break;
case "rcbinary":
Contributor

Do you support writing sequence/rcfile too?

protected Optional<String> resourceName = Optional.empty();

// User specified csv columns, it will override columns got from file
private final List<Column> csvSchema = Lists.newArrayList();
Contributor

Do not move code

private String useMetastore;
// Comma-separated list of column names.
// Only applicable when useMetastore is false and for formats like RCFile and SequenceFile.
private String columnNamesStr;
Contributor

Could you reuse FileFormatUtils.parseCsvSchema()?
Although it is called "csv schema", it can actually be used for any file format that needs a user-specified schema.
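To illustrate the suggestion, here is a minimal, self-contained sketch of parsing one user-supplied schema string into column names and types, independent of the file format. It is not the actual FileFormatUtils.parseCsvSchema() implementation: the class SchemaStringSketch, the ColumnSpec holder, and the "name:type;name:type" layout are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Doris: parses "name:type;name:type"
// (an assumed layout) into a list of column specs.
final class SchemaStringSketch {

    /** One parsed column: its name and its raw, lower-cased type string. */
    static final class ColumnSpec {
        final String name;
        final String type;

        ColumnSpec(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    static List<ColumnSpec> parse(String schema) {
        if (schema == null || schema.trim().isEmpty()) {
            throw new IllegalArgumentException("schema must not be empty");
        }
        List<ColumnSpec> columns = new ArrayList<>();
        for (String entry : schema.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing ';'
            }
            // Split on the first ':' only, so the type part may itself contain ':'.
            int sep = trimmed.indexOf(':');
            if (sep < 0) {
                throw new IllegalArgumentException("expected name:type but got: " + trimmed);
            }
            String name = trimmed.substring(0, sep).trim();
            String type = trimmed.substring(sep + 1).trim().toLowerCase(Locale.ROOT);
            if (name.isEmpty() || type.isEmpty()) {
                throw new IllegalArgumentException("expected name:type but got: " + trimmed);
            }
            columns.add(new ColumnSpec(name, type));
        }
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("schema contains no columns");
        }
        return columns;
    }

    public static void main(String[] args) {
        // Prints one "name -> type" line per parsed column.
        for (ColumnSpec c : parse("col_int:int;col_string:string")) {
            System.out.println(c.name + " -> " + c.type);
        }
    }
}
```

With a single parser of this kind, rcbinary, rctext, and sequence files could share one schema property instead of separate column-name and column-type strings.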

throw new AnalysisException("use metastore is disable, should set column names and column types.");
}
}
// TODO: set metastore address in else branch
Contributor

I think we don't need to support use_hivemetastore in tvf.

@@ -377,6 +410,22 @@ public List<Column> getTableColumns() throws AnalysisException {
return columns;
}

@Override
Contributor

Do not remove code.

3 participants