change to writing Avro B-tree blocks in pre-order #71
Summary
Following the discussion in #70:
It looks like we currently write the blocks of the Avro B-tree in "reverse post-order". Because the blocks end up stored almost in reverse of sorted order, iterating over the file in sorted order requires many hops within the file.
This PR changes the writer to emit the blocks in "pre-order" instead.
Once this is in, we can continue with the caching PR (#70).
The change appears to be backward compatible, since no code changes were required in the "reader" part.
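To make the ordering difference in the Summary concrete, here is a minimal sketch on a toy tree. The `Node` class, the key names, and the helper functions are hypothetical and are not taken from this repository; each list element stands in for one block in the file.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical stand-in for one B-tree block.
    key: str
    children: List["Node"] = field(default_factory=list)


def pre_order(node):
    """Parent first, then children left to right."""
    out = [node.key]
    for child in node.children:
        out.extend(pre_order(child))
    return out


def post_order(node):
    """Children left to right, then the parent."""
    out = []
    for child in node.children:
        out.extend(post_order(child))
    out.append(node.key)
    return out


def leaf_positions(layout, leaves):
    """Position in the file layout of each leaf, taken in sorted order."""
    return [layout.index(key) for key in leaves]


tree = Node("root", [
    Node("L", [Node("L1"), Node("L2")]),
    Node("R", [Node("R1"), Node("R2")]),
])
sorted_leaves = ["L1", "L2", "R1", "R2"]

old_layout = list(reversed(post_order(tree)))  # reverse post-order
new_layout = pre_order(tree)                   # pre-order

print(leaf_positions(old_layout, sorted_leaves))  # [6, 5, 3, 2]
print(leaf_positions(new_layout, sorted_leaves))  # [2, 3, 5, 6]
```

In the reverse post-order layout a sorted scan visits descending positions, so the reader keeps jumping backwards; in the pre-order layout the positions only increase.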
Detailed Description
Instead of writing directly to the inMemoryBuffer (the "flush"), we keep all of the tree's data in memory.
Only at the end do we write the blocks to the inMemoryBuffer in "reverse pre-order" and update the parent records with the "real" offsets.
Finally, we reverse the result and write it to the file buffer, which yields a file whose blocks are in pre-order.
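The steps above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: block positions are list indexes rather than byte offsets, and `Node`, `write_blocks`, and the `buffer` list (standing in for the inMemoryBuffer) are hypothetical names. The point it shows is that in reverse pre-order every child is written before its parent, so the parent can record where each child will end up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical stand-in for one B-tree block kept in memory.
    key: str
    children: List["Node"] = field(default_factory=list)


def pre_order(root):
    """Return nodes parent first, children left to right."""
    out = []
    stack = [root]
    while stack:
        node = stack.pop()
        out.append(node)
        # Push children reversed so the leftmost child is popped first.
        stack.extend(reversed(node.children))
    return out


def write_blocks(root):
    """Return the file layout as (key, child_positions) in pre-order."""
    nodes = pre_order(root)
    total = len(nodes)
    reverse_index = {}  # id(node) -> position in the reversed buffer
    buffer = []         # assumption: stands in for the inMemoryBuffer

    # Write in reverse pre-order: each child is written before its parent.
    for node in reversed(nodes):
        # Translate each child's reversed position to its final position.
        child_positions = [
            total - 1 - reverse_index[id(child)] for child in node.children
        ]
        reverse_index[id(node)] = len(buffer)
        buffer.append((node.key, child_positions))

    # Reversing the buffer gives the pre-order file layout.
    buffer.reverse()
    return buffer


tree = Node("root", [
    Node("L", [Node("L1"), Node("L2")]),
    Node("R", [Node("R1"), Node("R2")]),
])
layout = write_blocks(tree)
for position, (key, child_positions) in enumerate(layout):
    print(position, key, child_positions)
```

With real byte offsets the translation step would depend on the block sizes and on how the format encodes offsets, which this sketch deliberately leaves out.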
How was it tested?
Added a unit test that explicitly verifies the file's block order.