
[BUG]: Malloc exception while reading large parquet file #487

Open
shamimashik opened this issue Dec 30, 2024 · 3 comments

@shamimashik commented Dec 30, 2024

Issue Description

I'm having issues reading large checkpoint Parquet files.
I'm using code like this:
var results = rowGroupReader.Column(0).LogicalReader<string>().ReadAll(numRows);

I get the following error:
class parquet::ParquetStatusException (message: 'Out of memory: malloc of size 104478272 failed')

Environment Information

  • ParquetSharp Version: [e.g. 1.0.1]
  • .NET Framework/SDK Version: [e.g. .NET Framework 4.7.2]
  • Operating System: [e.g. Windows 10]

Steps To Reproduce

var results = rowGroupReader.Column(0).LogicalReader<string>().ReadAll(numRows);

Expected Behavior

Reading the file should not throw an exception.

Additional Context (Optional)

No response

shamimashik changed the title from "[BUG]: <title>" to "[BUG]: Malloc exception while reading large parquet file" on Dec 31, 2024
@adamreeve (Contributor)

Hi @shamimashik. Is numRows the total number of rows in the row group? Reading a smaller number of rows at a time might help reduce memory usage, e.g. something like:

const int bufferSize = 1024;
var buffer = new string[bufferSize];
using var columnReader = rowGroupReader.Column(0).LogicalReader<string>();
while (columnReader.HasNext)
{
    var rowsRead = columnReader.ReadBatch(buffer);
    var values = buffer.AsSpan(0, rowsRead);
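    // Process 'values' here before the next ReadBatch call overwrites the buffer.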
}

@pathacke

Hi @adamreeve, thanks for the response.

How can we determine an ideal buffer size that works for tables of all sizes? We might have tables with a large number of columns as well as tables with fewer columns. A table with many columns could cause an exception if the buffer size isn't appropriate.

@adamreeve (Contributor)

The buffer would be used to read one column at a time, so the number of columns shouldn't matter. I think you'd need to test with your own data to determine the buffer size that works best for you.
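
For illustration, here's a minimal sketch of that approach applied to every column of a row group, reusing the same fixed-size buffer for each one. This assumes, for simplicity, that every column is a string column; the file path is hypothetical, and the column count lookup via rowGroupReader.MetaData is how I'd expect to obtain it:

using ParquetSharp;

using var fileReader = new ParquetFileReader("data.parquet"); // hypothetical path
using var rowGroupReader = fileReader.RowGroup(0);
var numColumns = rowGroupReader.MetaData.NumColumns;

const int bufferSize = 1024;
var buffer = new string[bufferSize];

for (var columnIndex = 0; columnIndex < numColumns; ++columnIndex)
{
    // The same buffer is reused for every column, so peak memory
    // stays bounded regardless of how many columns the table has.
    using var columnReader = rowGroupReader.Column(columnIndex).LogicalReader<string>();
    while (columnReader.HasNext)
    {
        var rowsRead = columnReader.ReadBatch(buffer);
        var values = buffer.AsSpan(0, rowsRead);
        // Process 'values' before the next ReadBatch call overwrites the buffer.
    }
}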
