Conversation

@uzadude (Collaborator) commented Nov 2, 2021

Summary

Adding support for indexing CSV files.

How was it tested?

Added unit tests.

```scala
/**
 * Regular seek. Called once per offset (block).
 */
override def seek(offset: Long): Unit = {
  // @TODO - need to better optimize. not to re-init on every seek
```
Collaborator:

So I guess this is only for random fetches, right? The performance will be impractical for batch...

Collaborator:

But then again, why not just keep the reader open? Am I missing anything?

uzadude (Author):

Currently, the requested use case is only point fetches, but sure, I'd like to solve for both; I started with just something that works.
In CSV we only have the offset; the sub-offset is almost always zero.
Say we want to read every other row: then we need to figure out when it's cheaper to just skip a row versus re-initializing from a new offset. A rough sketch of that trade-off is below.
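A minimal sketch of the skip-vs-re-init heuristic being discussed, in Scala. Everything here is illustrative: `initReaderAt`, `skipRow`, `currentOffset`, and the byte threshold are assumptions for the sketch, not names or values from this PR.

```scala
// Sketch only, not this PR's implementation. A CSV-style seek that keeps
// the underlying reader open and skips rows for small forward jumps,
// re-initializing only on backward seeks or large jumps. All names and
// the threshold are illustrative assumptions.
class SeekSketch(initReaderAt: Long => Unit, // hypothetical: (re)open the reader at a byte offset
                 skipRow: () => Long) {      // hypothetical: skip one row, return the new byte offset
  private var currentOffset: Long = 0L
  private val ReinitThresholdBytes: Long = 1L << 20 // tune: row-skip cost vs. re-open cost

  def seek(offset: Long): Unit = {
    val delta = offset - currentOffset
    if (delta >= 0 && delta < ReinitThresholdBytes) {
      // Small forward jump: cheaper to skip rows on the already-open reader.
      while (currentOffset < offset) currentOffset = skipRow()
    } else {
      // Backward seek or a big jump: re-initialize at the new offset.
      initReaderAt(offset)
      currentOffset = offset
    }
  }
}
```

The threshold is the knob: below it, skipping rows on the open reader avoids the re-initialization cost; above it, or on any backward seek, re-opening at the target offset wins.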

Collaborator:

OK, I got it... In the Parquet implementation we have basically the same thing, right? I mean, just one offset when the row group is 128MB or more.
