Having issues reading large checkpoint Parquet files.
I'm using code like this - var results = rowGroupReader.Column(0).LogicalReader<string>().ReadAll(numRows);
Getting the below error -
class parquet::ParquetStatusException (message: 'Out of memory: malloc of size 104478272 failed')
Hi @shamimashik. Is numRows the total number of rows in the row group? Reading a smaller number of rows at a time might help reduce memory usage, e.g. by reading the column in batches into a reusable buffer instead of calling ReadAll on the whole row group.
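A minimal sketch of that batched approach, not necessarily the exact code from the original reply. It assumes ParquetSharp's LogicalColumnReader exposes HasNext and ReadBatch; the file path, the batch size of 4096, and the Process helper are hypothetical placeholders:

```csharp
using System;
using ParquetSharp;

// Hypothetical values: substitute your own file and tune the batch size on your data.
const string path = "checkpoint.parquet";
const int batchSize = 4096;

using var fileReader = new ParquetFileReader(path);
for (var rowGroup = 0; rowGroup < fileReader.FileMetaData.NumRowGroups; rowGroup++)
{
    using var rowGroupReader = fileReader.RowGroup(rowGroup);
    using var columnReader = rowGroupReader.Column(0).LogicalReader<string>();

    // One reusable managed buffer: at most batchSize strings are held at a time,
    // instead of an array sized to the whole row group as with ReadAll(numRows).
    var buffer = new string[batchSize];
    while (columnReader.HasNext)
    {
        // ReadBatch returns how many values were actually written to the buffer.
        var read = columnReader.ReadBatch(buffer);
        for (var i = 0; i < read; i++)
        {
            Process(buffer[i]);
        }
    }
}

// Hypothetical consumer: handle each value here without accumulating all of them.
static void Process(string value)
{
    Console.WriteLine(value?.Length ?? 0);
}
```

This only bounds the managed buffer. The failed malloc in the error is on the native side, so a smaller batch may reduce memory pressure but is not guaranteed to remove it. Collecting every value into a list inside the loop would also bring the memory usage back.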
How can we determine an ideal buffer size that works for tables of all sizes? We might have tables with a large number of columns as well as tables with fewer columns. A table with many columns could cause an exception if the buffer size isn't appropriate.
The buffer would be used to read one column at a time so the number of columns shouldn't matter. I think you'd need to do testing using your own data to determine a buffer size that works best for you.
Issue Description
Having issues reading large checkpoint Parquet files.
I'm using code like this -
var results = rowGroupReader.Column(0).LogicalReader<string>().ReadAll(numRows);
Getting the below error -
class parquet::ParquetStatusException (message: 'Out of memory: malloc of size 104478272 failed')
Environment Information
Steps To Reproduce
var results = rowGroupReader.Column(0).LogicalReader<string>().ReadAll(numRows);
Expected Behavior
The column should be read without throwing an exception.
Additional Context (Optional)
No response