Vs 1736 - changing import to use parquet #9301
Draft
koncheto-broad wants to merge 44 commits into ah_var_store
… make file names unique
… make file names unique
…can be made accurate
…ting in no matches when directory path was searched
…at caused an opaque localization issue
* Update to latest ah_var_store
* Fix some WDL syntax errors
* Disable by default, for now 'ConfigureParquetLifecycle'
* Pin gcnvkernel dependency for Python 3.10, other build fixes [VS-1789] (#9316)
---------
Co-authored-by: Miguel Covarrubias <mcovarr@users.noreply.github.com>
* Include sample id in Parquet file names.
* Store sample id in Parquet tracking table.
* Added checking for None in parsing out sample_id from parquet file name.
---------
Co-authored-by: Miguel Covarrubias <mcovarr@broadinstitute.org>
* Fixed the tests.
* Updated the gatk docker.
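The commit above that adds None checking when parsing sample_id out of a Parquet file name could look roughly like the sketch below. The naming scheme (`<table>_<sample_id>_<suffix>.parquet`), the regular expression, and the function name `parse_sample_id` are all assumptions made for illustration; the PR's actual file name format is not shown here.

```python
import re
from typing import Optional

# Hypothetical naming scheme, NOT taken from the PR:
#   <table>_<sample_id>_<unique hex suffix>.parquet
_PARQUET_NAME = re.compile(
    r"^(?P<table>[a-z_]+?)_(?P<sample_id>\d+)_(?P<suffix>[0-9a-f]+)\.parquet$"
)


def parse_sample_id(path: Optional[str]) -> Optional[int]:
    """Return the sample id embedded in a Parquet file name, or None.

    None is returned for a missing path and for any name that does not
    follow the assumed scheme, so callers never hit an AttributeError
    from calling .group() on a failed match.
    """
    if not path:
        return None
    # Keep only the final path component (works for gs:// URIs too).
    name = path.rsplit("/", 1)[-1]
    match = _PARQUET_NAME.match(name)
    if match is None:
        return None
    return int(match.group("sample_id"))
```

Returning None rather than raising lets a caller skip or log files it does not recognize instead of failing the whole ingest step.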
This PR updates the lifecycle config strategy for Parquet so that updates are possible.
This PR adds a task to delete the Parquet files once they are no longer in use. Because there was disagreement about how best to delete large numbers of files, the task supports an alternate deletion strategy.
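One common way to remove large numbers of objects from a Google Cloud Storage bucket is a lifecycle rule that deletes everything under a prefix once it reaches a given age. Whether this corresponds to either deletion strategy in this PR is an assumption. The sketch below only builds the lifecycle configuration document; the prefix, the age, and the function name `build_delete_lifecycle` are hypothetical.

```python
import json
from typing import Any, Dict


def build_delete_lifecycle(prefix: str, age_days: int) -> Dict[str, Any]:
    """Build a GCS lifecycle config that deletes objects under `prefix`
    once they are at least `age_days` days old.

    `prefix` and `age_days` are illustrative parameters, not values
    taken from the PR.
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return {
        "rule": [
            {
                "action": {"type": "Delete"},
                "condition": {"age": age_days, "matchesPrefix": [prefix]},
            }
        ]
    }


if __name__ == "__main__":
    # Print the JSON document so it can be saved to a file.
    print(json.dumps(build_delete_lifecycle("parquet/", 1), indent=2))
```

The printed JSON can be saved to a file and applied with `gsutil lifecycle set <file> gs://<bucket>`. Setting a lifecycle configuration replaces the bucket's existing one, so a workflow that needs to change the rule later has to resubmit the whole document.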
This pull request extends much older work that modified ingest to produce Parquet files (for Azure, at the time) instead of writing to BigQuery tables. As part of the 1736 spike, this PR modifies our ingest process to load those Parquet files directly into BigQuery using free APIs instead of the costly Write API.
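To illustrate what loading Parquet files directly into BigQuery can involve, here is a minimal sketch that builds the job configuration for a batch load job in the shape the BigQuery REST API expects. It assumes the "free APIs" mentioned above are batch load jobs, which the description does not state explicitly. The project, dataset, table, and bucket names are placeholders, and `build_parquet_load_job` is a hypothetical helper, not code from this PR.

```python
from typing import Any, Dict, List


def build_parquet_load_job(
    project_id: str,
    dataset_id: str,
    table_id: str,
    source_uris: List[str],
) -> Dict[str, Any]:
    """Build a BigQuery load-job body for Parquet files stored in GCS.

    All argument values are supplied by the caller; nothing here is
    taken from the PR itself.
    """
    if not source_uris:
        raise ValueError("at least one source URI is required")
    return {
        "configuration": {
            "load": {
                "sourceUris": list(source_uris),
                "sourceFormat": "PARQUET",
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                # Append to the table rather than replacing its contents.
                "writeDisposition": "WRITE_APPEND",
            }
        }
    }
```

A body like this can be submitted through the `jobs.insert` method. The rough command-line equivalent is `bq load --source_format=PARQUET <dataset>.<table> gs://<bucket>/<path>/*.parquet`.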