Binary storage and serialization #227
-
That's one of the things on my todo list. Binary storage could definitely come in handy in some cases. But in general, binary is not always more efficient than strings, at least in size. Consider options market data. Option premiums are small numbers like 2, 0.5, 3.25, ..., and option sizes are small too. Such values can be stored in a few bytes each as strings, but in binary they always take 8 bytes each. On a separate note, I am looking for contributors. If you or somebody you know wants to contribute, for example by implementing the binary format, please let me know.
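To make the size argument concrete, here is a minimal sketch that counts bytes for delimited text versus raw 8-byte values. It assumes the binary form is a `double` per value and one delimiter per text field; the names `text_bytes`, `binary_bytes`, and `compare_sizes` are made up for illustration and are not part of DataFrame.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Bytes needed to store the fields as delimited text:
// the characters of each field plus one delimiter (or newline) per field.
std::size_t text_bytes(const std::vector<std::string> &fields) {
    std::size_t total = 0;
    for (const std::string &f : fields)
        total += f.size() + 1;
    return total;
}

// Bytes needed to store the same number of values as raw doubles
// (assumption: one double per value, no header or padding).
std::size_t binary_bytes(std::size_t count) {
    return count * sizeof(double);
}

// Short premiums favor text; long, full-precision values favor binary.
void compare_sizes() {
    const std::vector<std::string> premiums{"2", "0.5", "3.25"};
    assert(text_bytes(premiums) < binary_bytes(premiums.size()));

    const std::vector<std::string> prices{"1234.56789012345", "98765.4321098765"};
    assert(text_bytes(prices) > binary_bytes(prices.size()));
}
```

So which format is smaller depends on the data: short values win as text, while values written at full precision win as binary.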
-
I might be up for contributing on that topic. I still need to play around with the library to understand what I am getting myself into. By the way, have you shared your todo list or a roadmap?
-
I don't have an official todo list. But these are the things I am thinking about, with no idea when, or if, I will find time to do them:
-
Re: HDF5, you are correct. I put them there a few years ago as placeholders. I think that, at this point, Parquet would be the highest priority, since it would make DataFrame compatible with popular packages like Arrow and Hadoop. One requirement that I have kept since I started developing this package is that it should be self-contained: DataFrame should not depend on any library other than the STL. So, if someone can write Parquet read/write routines from scratch, that would be great. Re: 4, yes, there are places where we can look to simplify the interface. I have no requirement to stay compatible with C++17; C++20 is just fine. I just never got around to incorporating C++20 upgrades.
-
Reading/writing in binary format is now implemented in DataFrame.
-
HH's library became part of .. but he does have a very useful page with 'the best' low-level algorithms
-
I am only at the browse-the-doc stage. I see three file formats are supported, all text-based (csv, csv2, json). In addition, there is string-based serialization.
Has there been no need for writing in binary format directly? I would think that would save quite a bit of disk space and parsing time when handling large data sets (there seems to be a financial-industry background to this library, and I have heard it matters there). It also matters when exact representation of floating-point values is a concern (a float->string->float round trip can inject noise unless enough digits are written).
Was there any thought given to this already? Any major obstacles?
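The float->string->float concern can be illustrated with a small sketch. It uses only the standard library and does not reflect how DataFrame itself writes text; the helper names `text_round_trip`, `binary_round_trip`, and `check_round_trips` are hypothetical. It assumes the default "C" locale and a correctly rounding `strtod`, which is what mainstream standard libraries provide.

```cpp
#include <cassert>
#include <cstring>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Write a double as text with the given number of significant digits,
// then parse the text back into a double.
double text_round_trip(double value, int digits) {
    std::ostringstream out;
    out << std::setprecision(digits) << value;
    return std::stod(out.str());
}

// Copy the raw bytes of a double into a buffer and back again.
double binary_round_trip(double value) {
    unsigned char buffer[sizeof(double)];
    std::memcpy(buffer, &value, sizeof(double));
    double restored = 0.0;
    std::memcpy(&restored, buffer, sizeof(double));
    return restored;
}

void check_round_trips() {
    const double x = 0.1 + 0.2;  // not exactly 0.3 in binary floating point

    // The default stream precision (6 significant digits) writes "0.3",
    // which parses back to a different double.
    assert(text_round_trip(x, 6) != x);

    // max_digits10 (17 for an IEEE 754 double) is enough to round-trip.
    assert(text_round_trip(x, std::numeric_limits<double>::max_digits10) == x);

    // Raw bytes are preserved exactly.
    assert(binary_round_trip(x) == x);
}
```

So text can be made lossless by writing `max_digits10` digits, at the cost of longer fields, while binary is exact by construction.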