Tests for parquet filter pushdown project. #256

Open
wants to merge 1 commit into base: master

rhou1 (Contributor) commented Jan 31, 2017

Hi Rahul,

Can you review these?

Thanks.

--Robert

rchallapalli (Contributor)

Review still in progress...

At a glance, the tests look good. I am curious how you generated the baselines. If you verified them manually, I have some concerns and would ask you to generate them with MySQL or Postgres instead.

rhou1 (Contributor, Author) commented Feb 10, 2017

I verified the baselines manually. The tests were designed to produce known answers that I could check. For example, I wrote a test that should return 3020 rows, and it returned 3020 rows.

rchallapalli (Contributor)

The tests look good, Robert. Once we generate the baselines using Hive/MySQL, we should be ready to commit (after re-basing onto the latest framework master and running the regression tests again). Below are a few observations from looking at the tests:

  1. Did we test the flag "planner.store.parquet.rowgroup.filter.pushdown.threshold"? It is present in our documentation.
  2. Metadata caching: If we have a table with 4 row groups and we add a new parquet file to the table folder, do we update the metadata cache and then consider the row groups from the newly added file? Are there any tests around this use case?
  3. CTAS auto-partitioned datasets: It's hard to tell whether any of your datasets are auto-partitioned. Do we have tests around this?
  4. The documentation says we support '<>' for parquet filter pushdown. However, I couldn't find any tests for it. Did I miss something obvious?
  5. Do we have files with multiple row groups, and tests around them?
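
For points 4 and 5, a minimal sketch of how min/max row group pruning typically decides whether a row group can be skipped may help when designing the '<>' and multi-row-group tests. This is a generic illustration, not Drill's actual code, and it assumes every row group carries exact, non-null integer min/max statistics for the filtered column; the names RowGroupStats, can_skip and surviving_row_groups are hypothetical. It shows why '<>' deserves its own test: a row group is only prunable when all of its values equal the literal, so a file with several row groups, one of them constant, is the smallest case that exercises it.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Hypothetical model: min/max statistics of one column in one row group."""
    min_value: int
    max_value: int


def can_skip(stats: RowGroupStats, op: str, literal: int) -> bool:
    """Return True if no row in the row group can satisfy `column <op> literal`.

    Generic min/max pruning rules, assuming the statistics are present and
    exact. This is an illustration, not Drill's implementation.
    """
    lo, hi = stats.min_value, stats.max_value
    if op == "=":
        # The literal lies outside [min, max].
        return literal < lo or literal > hi
    if op == "<>":
        # Prunable only when every value in the row group equals the literal.
        return lo == hi == literal
    if op == "<":
        return lo >= literal
    if op == "<=":
        return lo > literal
    if op == ">":
        return hi <= literal
    if op == ">=":
        return hi < literal
    raise ValueError(f"unsupported operator: {op}")


def surviving_row_groups(groups, op, literal):
    """Indices of the row groups that must still be scanned."""
    return [i for i, g in enumerate(groups) if not can_skip(g, op, literal)]
```

With three row groups whose ranges are [1, 10], [5, 5] and [20, 30], the filter `<> 5` keeps row groups [0, 2] (only the constant group is pruned), while `= 5` keeps [0, 1]. A test file shaped like this covers both the '<>' case and the multiple-row-group case in one dataset.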

2 participants