[FEA] Add CPU-GPU co-processing for higher decompression throughput #17974

Open
GregoryKimball opened this issue Feb 10, 2025 · 0 comments
Labels: cuIO · feature request · libcudf


Is your feature request related to a problem? Please describe.

We've added host decompression support for GZIP, with Snappy and ZSTD coming soon. These implementations use a CPU thread pool to decompress the large compressed blocks found in JSONL data: blocks that exceed the maximum allowed size in nvCOMP (for ZSTD) or that are too large to process efficiently on the GPU (for Snappy and GZIP).
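A minimal sketch of the size-based routing described above, in C++ to match libcudf. The type `block_routing`, the function `route_by_size`, and the threshold `device_max_block_bytes` are hypothetical; the threshold stands in for either the nvCOMP ZSTD size limit or a tuned cutoff above which Snappy and GZIP blocks run poorly on the GPU. No actual nvCOMP or libcudf API is shown.

```cpp
#include <cstddef>
#include <vector>

// Result of splitting a batch of compressed blocks by size (hypothetical type).
struct block_routing {
  std::vector<std::size_t> host_blocks;    // indices sent to the CPU thread pool
  std::vector<std::size_t> device_blocks;  // indices kept for the batched GPU call
};

// Sends every block larger than `device_max_block_bytes` to the host.
// The threshold is an assumed parameter standing in for a codec size limit
// or a tuned efficiency cutoff; it is not a real nvCOMP or libcudf constant.
block_routing route_by_size(std::vector<std::size_t> const& block_sizes,
                            std::size_t device_max_block_bytes)
{
  block_routing routing;
  for (std::size_t i = 0; i < block_sizes.size(); ++i) {
    if (block_sizes[i] > device_max_block_bytes) {
      routing.host_blocks.push_back(i);
    } else {
      routing.device_blocks.push_back(i);
    }
  }
  return routing;
}
```

With a 100-byte threshold, blocks of sizes `{10, 200, 50, 300}` are split into host indices `{1, 3}` and device indices `{0, 2}`; a block exactly at the threshold stays on the device.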

We should also explore other ways to use host decompression and compression to make the Parquet and ORC readers and writers more efficient.

Describe the solution you'd like

  • (decompression) Sorting compressed blocks for nvCOMP. We use a batched decompress call for nvCOMP, and nvCOMP processes the compressed blocks first-in-first-out. We should test the impact of sorting the compressed buffers by size so that the largest buffers are processed first. This could reduce load imbalance in the nvCOMP kernels.
  • (decompression) Shared host and device co-processing. We could also split the compressed blocks between host and device. If the GPU can only decompress 1000 blocks at a time and 1028 (1000 + 28) blocks are received, we could send the 28 largest blocks to the host for processing. We could use estimates of host and device decompression throughput, plus the GPU architecture and host thread-pool size, to decide how to split the blocks between host and device.
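The sorting idea can be sketched as computing a largest-first permutation of block indices and applying it to every per-block array before the batched call. This assumes, as stated above, that nvCOMP consumes the batch in order; `largest_first_order` and `gather` are hypothetical helpers, and the nvCOMP call itself is omitted.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns block indices ordered by compressed size, largest first.
// std::stable_sort keeps the original order among equal-sized blocks.
std::vector<std::size_t> largest_first_order(std::vector<std::size_t> const& compressed_sizes)
{
  std::vector<std::size_t> order(compressed_sizes.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return compressed_sizes[a] > compressed_sizes[b];
  });
  return order;
}

// Applies `order` to one per-block array (e.g. input pointers, input sizes,
// output pointers) so that all arrays passed to a batched call stay aligned.
template <typename T>
std::vector<T> gather(std::vector<T> const& values, std::vector<std::size_t> const& order)
{
  std::vector<T> out;
  out.reserve(order.size());
  for (auto i : order) {
    out.push_back(values[i]);
  }
  return out;
}
```

Because each block carries its own output pointer, permuting the input, size, and output-pointer arrays together writes decompressed data to the right place. Results indexed by batch position, such as per-block status or actual output sizes, would still need to be mapped back through `order`.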
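One way to decide the host/device split is a greedy heuristic: always move the blocks beyond the GPU's concurrent capacity (the 28 largest in the 1000 + 28 example) to the host, then keep moving the next-largest block while the estimated host time stays at or below the estimated device time. The `coprocess_params` fields and `pick_host_blocks` are hypothetical; the throughput figures would have to come from benchmarks, the GPU architecture, and the host thread-pool size, and this is not an existing libcudf policy.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical tuning inputs; real values would come from benchmarks,
// the GPU architecture, and the host thread-pool size.
struct coprocess_params {
  std::size_t device_batch_capacity;  // blocks the GPU can decompress concurrently
  double host_bytes_per_sec;          // aggregate host thread-pool throughput
  double device_bytes_per_sec;        // device decompression throughput
};

// Returns the indices of blocks to decompress on the host, largest first;
// all remaining blocks go to the batched GPU call.
std::vector<std::size_t> pick_host_blocks(std::vector<std::size_t> const& sizes,
                                          coprocess_params const& p)
{
  std::vector<std::size_t> order(sizes.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) { return sizes[a] > sizes[b]; });

  double device_bytes = std::accumulate(sizes.begin(), sizes.end(), 0.0);
  double host_bytes   = 0.0;
  std::size_t n_host  = 0;

  // Blocks beyond the device's concurrent capacity would need a second GPU
  // wave, so the largest overflow blocks are always moved to the host.
  std::size_t const overflow =
    sizes.size() > p.device_batch_capacity ? sizes.size() - p.device_batch_capacity : 0;

  while (n_host < order.size()) {
    double const next = static_cast<double>(sizes[order[n_host]]);
    bool const must_move = n_host < overflow;
    // Move one more block only if the host would still finish no later than the device.
    bool const worth_moving = (host_bytes + next) / p.host_bytes_per_sec <=
                              (device_bytes - next) / p.device_bytes_per_sec;
    if (!must_move && !worth_moving) { break; }
    host_bytes += next;
    device_bytes -= next;
    ++n_host;
  }
  return std::vector<std::size_t>(order.begin(),
                                  order.begin() + static_cast<std::ptrdiff_t>(n_host));
}
```

The loop stops at the first block that is not worth moving, so it can miss smaller blocks that would still fit on the host; that trade-off keeps the heuristic simple and cheap to evaluate per batch.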