Wish: extract parts of gzip file

My users often have huge .gz files that they would like to process in parallel.

Can gzrt be adapted so it can extract a valid gz-file in blocks?

Assume I have a 1 GB file.gz and want to extract it in parallel, in chunks of roughly 1 MB of compressed data. First I need to identify the byte positions where a valid gz-block starts:

$ gzrt --next-start-of-block 0
0
$ gzrt --next-start-of-block 1000000
1234888
$ gzrt --next-start-of-block 2000000
2123488
...
$ gzrt --next-start-of-block 999000000
999348877

The idea is to seek to the given byte position and then search forward for the next valid gz-block. When one is found, print its byte position and exit.
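To make the lookup concrete, here is a minimal Python sketch. It is not gzrt code, and the `--next-start-of-block` option above is only a proposal. The sketch assumes that a "gz-block" means a gzip member, as in a file built by concatenating several .gz files, because members always start on a byte boundary. In a plain single-member gzip file, deflate blocks can begin at any bit offset and may refer back to up to 32 KB of earlier output, so this sketch does not cover that case. The names `next_member_start`, `GZIP_MAGIC` and `probe` are made up for illustration.

```python
import gzip
import zlib

# First three bytes of every gzip member: ID1, ID2, CM=8 (deflate), per RFC 1952.
GZIP_MAGIC = b"\x1f\x8b\x08"


def next_member_start(data: bytes, offset: int, probe: int = 65536) -> int:
    """Return the first position >= offset where a gzip member appears to
    start, or -1 if none is found.

    A candidate is accepted when zlib can decode the first `probe` bytes
    from that position without raising an error. This is a heuristic:
    the magic bytes can also occur by chance inside compressed data.
    """
    pos = data.find(GZIP_MAGIC, offset)
    while pos != -1:
        decoder = zlib.decompressobj(wbits=31)  # 31 = expect a gzip header
        try:
            decoder.decompress(data[pos:pos + probe])
            return pos
        except zlib.error:
            # False hit: keep scanning from the next byte.
            pos = data.find(GZIP_MAGIC, pos + 1)
    return -1
```

For a real 1 GB file the same scan could run over an `mmap` of the file instead of an in-memory `bytes` object, so that only the probed region is read.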

After identifying where the blocks start, I would then be able to extract from one block boundary to the next:

gzrt --from-byte 0 --to-byte 1234888 | my_program &
gzrt --from-byte 1234888 --to-byte 2123488 | my_program &
gzrt --from-byte 2123488 --to-byte 3212348 | my_program &
...
gzrt --from-byte 998374753 --to-byte 999348877 | my_program &
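The range extraction can be sketched in Python in the same spirit. Again, this is an illustration under assumptions, not gzrt behaviour: it assumes every boundary is the start of a complete gzip member (a concatenated .gz file), and `extract_range` and `extract_parallel` are hypothetical names. Threads are used because CPython's zlib module releases the GIL while decompressing.

```python
import gzip
import zlib
from concurrent.futures import ThreadPoolExecutor


def extract_range(data: bytes, start: int, end: int) -> bytes:
    """Decompress every gzip member contained in data[start:end].

    Assumes `start` is the first byte of a member and `end` is the byte
    just past the last member in the range.
    """
    chunk = data[start:end]
    out = []
    while chunk:
        decoder = zlib.decompressobj(wbits=31)  # 31 = expect a gzip header
        out.append(decoder.decompress(chunk))
        if not decoder.eof:
            raise ValueError("range ends in the middle of a gzip member")
        chunk = decoder.unused_data  # bytes of the next member, if any
    return b"".join(out)


def extract_parallel(data: bytes, boundaries: list[int]) -> list[bytes]:
    """Decompress each [boundaries[i], boundaries[i+1]) range in a thread pool.

    Results are returned in the same order as the ranges.
    """
    ranges = list(zip(boundaries, boundaries[1:]))
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda r: extract_range(data, r[0], r[1]), ranges))
```

Each returned element corresponds to what one of the `gzrt --from-byte ... --to-byte ...` invocations above would write to its pipe.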



Wish: extract parts of gzip file #2
