Remove Compression & Decompression from RLPMemo #456
Conversation
I think I either don't follow or there's a mistake in the approach of removing the Compress. Let's assume that we have a Branch whose children have the following first nibbles {A, B, C}. How does this approach distinguish cases where:

1. A and B are encoded to Keccak while C's RLP is less than 31 bytes
2. A and C are encoded to Keccak while B's RLP is less than 31 bytes

Where is the information stored now?
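To make the ambiguity concrete, an illustrative layout (not actual code): without per-child presence information, both cases produce an identical 64-byte memo.

// case 1: [ keccak(A) | keccak(B) ]  // C's RLP is short and kept inline
// case 2: [ keccak(A) | keccak(C) ]  // B's RLP is short and kept inline
// Two 32-byte slots either way; nothing records which child each slot belongs to.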
Force-pushed from 253eef4 to eeec2e7
A few remarks, but almost ready to merge.
{
    var bits = leftover[^emptyBytesLength..];
    NibbleSet.Readonly.ReadFrom(bits, out empty);
    var bits = _buffer[^indexLength..];
This might be nitpicking, but if we stored the index at the beginning, no span slice would be required and we could get it from the start, just by passing the buffer.
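For illustration, a sketch of the two layouts (names mirror the diff above; ReadFrom usage as quoted there):

// Index at the end (current): a slice from the back is needed first.
var bits = _buffer[^indexLength..];
NibbleSet.Readonly.ReadFrom(bits, out var index);

// Index at the beginning (the suggestion): the buffer could be passed as-is.
NibbleSet.Readonly.ReadFrom(_buffer, out var indexAtFront);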
I tried this particular change and for some reason it started breaking the CalculateStateRootHash. I tried a bunch of ways to test/debug, but it wasn't clear what exactly was causing the test failure. Moving back to the original index at the end resolved the issues.
Do you have any pointers on whether that is expected? Either way, I prefer to keep the underlying data consistent with the original representation after the Compress operation, by keeping the index at the end, and not deal with state root mismatch issues 😓
As discussed offline, I added new tests with grow/shrink which were able to catch the issue with header placement/incorrect implementation. I also modified the Large_random_operations test, which caught failures too. The current coverage for the RlpMemo class is 100%. All the tests pass with the current implementation.
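A rough shape such a grow/shrink test could take (SetRaw and Exists are taken from the diff in this PR; CreateEmptyMemo and Clear are hypothetical stand-ins for the actual construction and removal APIs):

[Test]
public void Grow_and_shrink()
{
    var memo = CreateEmptyMemo(); // hypothetical helper; the real setup is project-specific

    Span<byte> keccak = stackalloc byte[Keccak.Size];
    keccak.Fill(0xAB);

    memo.SetRaw(keccak, 0x3);          // grow: one 32-byte entry plus the 2-byte index
    Assert.That(memo.Exists(0x3), Is.True);

    memo.Clear(0x3);                   // shrink: hypothetical removal API
    Assert.That(memo.Exists(0x3), Is.False);
}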
Massive.
if (value.Length == Keccak.Size)
{
    memoizedUpdated = true;
    memo.SetRaw(value, i);
    if (memo.Exists(i))
I like the clear approach here, where different cases are split.
// Optimization, omitting some of the branches to memoize.
// It omits only these with two children where the cost of the recompute is not big.
// To prevent an attack of spawning multiple levels of such branches, only even are skipped
if (children.SetCount == 2 && (key.Path.Length + key.StoragePath.Length) % 2 == 0)
This shaved off some memory for small branches. It wasn't the biggest saving but it did something. Is it intentionally removed?
Good catch, I moved this code to InspectBeforeApply, since this optimization was done during compression, which we don't do now. But note that for this particular check, we now have to parse the branch data in InspectBeforeApply and then find out the children.SetCount value. Is the extra parsing in InspectBeforeApply still worth it for the savings on small branches?
The reason why it was applied for small branches is exactly this. A small branch that has only 2 children will have 50% of its Keccaks recalculated on any update. Why bother keeping the other one if we can save 64 + 2 bytes for them? Could you run a db import with this PR and provide the size it uses?
I didn't fully get you... we can save that space, but I just wanted to point out that here we are doing an additional bit of parsing to find out the children count. I think it is fine to do for every branch, since we would potentially be saving 64 + 2 bytes with the smaller branches.
We are. This is due to the nature of unmaterialized data that needs to be reparsed on each pass. I cannot provide you with numbers (#286 has none), but saving 66 bytes out of 68 for small branches (the two 32-byte Keccaks plus the 2-byte index) seems to be a good deal 👀
{
    var bits = leftover[^emptyBytesLength..];
A potential compression opportunity is lost here. Let's consider a branch that has only one Keccak memoized in RlpMemo. In such a case, we could encode this one on a single byte. Even more, we could encode up to 2 on a single byte. For cases where only 1 or 2 Keccaks are memoized, this would make them lighter by one byte: for 2 Keccaks it would be a ~1.5% saving, while for one it would be 3%. It would make things a bit more complicated though. Another option would be a special 1-byte case that keeps the not-set nibbles (which would save some space for 15 and 14 set, but would be much less impactful size-wise, I think). We could have it set in NibbleSet so that it's capable of writing/reading 1-2 bytes instead of only 2.
I know that the original decompress/compress had no such feature:
Paprika/src/Paprika/Merkle/RlpMemo.cs, lines 171 to 176 in 61c326a:

if (empty.SetCount > 0)
{
    var dest = writeTo.Slice(at * Keccak.Size, NibbleSet.MaxByteSize);
    new NibbleSet.Readonly(empty).WriteToWithLeftover(dest);
    return at * Keccak.Size + NibbleSet.MaxByteSize;
}
Paprika/src/Paprika/Merkle/Node.cs, lines 405 to 416 in 61c326a:

if (Children.SetCount == ChildCount2)
{
    leftover[0] = (byte)(Children.BiggestNibbleSet | (Children.BiggestNibbleSet << NibblePath.NibbleShift));
}
else
{
    Debug.Assert(Children.SetCount == ChildCount3);
    // remove the smallest nibble set and then get the smallest
    var mid = Children.Remove(Children.SmallestNibbleSet).SmallestNibbleSet;
    leftover[0] = (byte)(mid | (Children.BiggestNibbleSet << NibblePath.NibbleShift));
}
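A sketch of how the proposed single-byte encoding could reuse the same nibble-shift trick as the Node.cs snippet above (helper names are hypothetical; when only one index is set, it would be repeated in both nibbles, mirroring the ChildCount2 case):

// Pack two memoized indices (each 0..15) into one byte.
static byte Pack(byte smaller, byte bigger) =>
    (byte)(smaller | (bigger << NibblePath.NibbleShift));

// Recover the two indices from the packed byte.
static (byte Smaller, byte Bigger) Unpack(byte packed) =>
    ((byte)(packed & 0x0F), (byte)(packed >> NibblePath.NibbleShift));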
That's a good point! I was also thinking of another approach and wanted to run it by you to check which one might be better (before completing the code changes):
To make maximum use of one byte, it could store a partial bitset index. We can basically divide the 16 bits into two halves and use only 1 byte to represent the index for cases where all the set indices fall in the same half.
Original (2 bytes): [0, 1, 2, ..., 15]
Modified (1 byte): [x, 0, 1, 2, 3, 4, 5, 6] or [x, 7, 8, 9, 10, 11, 12, 13]
x can be 0 or 1 depending on which half it is representing.
As long as the children fall within one half, we would only need 1 byte. Disadvantages of this approach:
a) even if only 2 children exist but in two different halves, we would have to fall back on 2 bytes of storage.
b) nibbles 14 and 15 can never be represented with one byte.
Does this make sense? Do you think it is better/worse than your proposed approach?
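A minimal sketch of this half-range encoding under the layout described above (the helper name and the choice of bit 7 as the half selector x are assumptions):

static bool TryPackHalfRange(ushort fullSet, out byte packed)
{
    const ushort LowHalf = 0x007F;   // nibbles 0..6
    const ushort HighHalf = 0x3F80;  // nibbles 7..13

    if ((fullSet & ~LowHalf) == 0)
    {
        packed = (byte)fullSet;                  // x = 0: all indices in the low half
        return true;
    }
    if ((fullSet & ~HighHalf) == 0)
    {
        packed = (byte)(0x80 | (fullSet >> 7));  // x = 1: all indices in the high half
        return true;
    }

    packed = 0;                                  // spans both halves, or uses 14/15: needs 2 bytes
    return false;
}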
What an interesting idea! One would need to count the collisions and probabilities 😅
- for 2 nibbles, the chance of picking both from one of the halves is 7/15
- for 3 nibbles, we get 1/5
- for 4 nibbles, we get 1/13
As we're trying to optimize for the small branches (the big ones won't benefit from removing 1 byte), it's a bet. With the encoding proposed before, we get 1 byte for up to 2 nibbles and always 2 bytes for other cases. Here we optimize up to any number, but with diminishing returns, right? With 4 Keccaks it'll be only 1% saved.
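For reference, these fractions follow from counting unordered picks of k set nibbles out of 16, split into two halves of 8 each (a verification sketch, not from the thread):

$$P(k) = \frac{2\binom{8}{k}}{\binom{16}{k}}: \quad P(2) = \frac{2 \cdot 28}{120} = \frac{7}{15}, \quad P(3) = \frac{2 \cdot 56}{560} = \frac{1}{5}, \quad P(4) = \frac{2 \cdot 70}{1820} = \frac{1}{13}$$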
What do you think then?
I would like to take up this optimization in another PR, since it would require extra logic and additional testing to ensure it works fine.
Having said this, should this PR no longer be labeled with Breaking?
Yes, removed the label.
}

[Test]
public void All_children_set_with_all_Keccaks_set()
public void Random_delete()
The random tests are nice for fuzzing, but it would be beneficial to have some explicit A, B, C cases where A, B or A, C are set, so that we cover what was previously missing. Or maybe such tests do exist and I'm just not seeing them 👀
Yes, that test is missing, but I am not quite sure how to capture this in a test - a child with RLP < 31 bytes while the others have Keccaks? I looked at existing tests to figure it out (e.g. Three_accounts_sharing_start_nibble, Branch_two_leafs), but I couldn't find a way to simulate this scenario:
[Test]
public void Insert_get_operation()
{
    var commit = new Commit();

    const string key0 = "A0000001";
    const string key1 = "B0000002";
    const string key2 = "C0000003";

    commit.Set(key0);
    commit.Set(key1);
    commit.Set(key2);

    NibbleSet.Readonly children = new NibbleSet(0xA, 0xB, 0xC);
    commit.SetBranch(Key.Merkle(NibblePath.Empty), children);

    var merkle = new ComputeMerkleBehavior();
    merkle.BeforeCommit(commit, CacheBudget.Options.None.Build());
}
I am probably missing some obvious way to complete this test scenario 🫤
I think I'd maybe replace the Branch-based test with a direct KeccakMemo test? Not sure if covering the compounded Branch + memo is doable in a nice way.
Added basic new tests in RlpMemo with fixed children. The Branch + memo test couldn't be done without exposing a few internal details as public, so I'm sticking to independent unit tests.
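For the record, such a fixed-children test could take roughly this shape, covering the A/B vs A/C case from the top of the review (CreateEmptyMemo is the same hypothetical stand-in as in the grow/shrink sketch above):

[Test]
public void Two_keccaks_one_inlined_child()
{
    var memo = CreateEmptyMemo(); // hypothetical helper

    Span<byte> keccak = stackalloc byte[Keccak.Size];
    keccak.Fill(0x42);

    // children at nibbles A, B, C; only A and B have their Keccaks memoized
    memo.SetRaw(keccak, 0xA);
    memo.SetRaw(keccak, 0xB);

    Assert.That(memo.Exists(0xA), Is.True);
    Assert.That(memo.Exists(0xB), Is.True);
    Assert.That(memo.Exists(0xC), Is.False); // the stored index disambiguates this case
}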
Force-pushed from 100d382 to abafbc9
A few more clarifications.
Currently the memoized RLPs are stored in a compressed form when flushing to the PageDb, but when stored in memory, they are in the decompressed form.
This PR removes compression & decompression altogether by always storing the memoized RLPs in compressed form. To achieve this, a NibbleSet index (2 bytes) is stored at the end of the RlpMemo buffer, indicating whether each element is currently stored in the memo. For example, [k1 | k2 | k5] in RlpMemo would also store NibbleSet{1, 2, 5}, or 0b0000_0000_0010_0110, to show that only the children at indices 1, 2 & 5 have their Keccaks currently memoized.
Also note that due to an existing optimization done during the DB write operation to not store memoized RLPs for top-level state branches (SkipRlpMemoizationForTopLevelsCount), the RlpMemo data structure supports storing an empty buffer even if there are children for that branch.
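A sketch of the resulting buffer layout for the example above (entry sizes taken from the description; the bit math is illustrative only):

// [ k1 (32B) | k2 (32B) | k5 (32B) | index (2B) ]
//
// index bits for NibbleSet{1, 2, 5}:
ushort index = (1 << 1) | (1 << 2) | (1 << 5); // 0b0000_0000_0010_0110 == 0x0026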