
Remove Compression & Decompression from RLPMemo #456

Open · wants to merge 12 commits into main

Conversation

@dipkakwani (Contributor) commented Dec 26, 2024

Currently the memoized RLPs are stored in a compressed form when flushing to the PageDb, but when stored in memory they are in the decompressed form.

This PR removes compression & decompression altogether by always storing the memoized RLPs in compressed form. To achieve this, a NibbleSet index (2 bytes) is stored at the end of the RlpMemo buffer, indicating which elements are currently stored in the memo. For example, [k1 | k2 | k5] in RlpMemo would also store NibbleSet {1, 2, 5}, i.e. 0b0000_0000_0010_0110, to show that only the children's Keccaks at indices 1, 2 & 5 are currently memoized.

Also note that, due to an existing optimization during DB write operations that skips storing memoized RLPs for top-level state branches (SkipRlpMemoizationForTopLevelsCount), the RlpMemo data structure supports storing an empty buffer even if the branch has children.
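For illustration, a minimal sketch of how a lookup against this layout could work (the type name, constructor, helper names, and little-endian byte order here are assumptions for the example, not the PR's actual API):

using System;
using System.Buffers.Binary;
using System.Numerics;

// Sketch only: a compressed memo is [keccak | keccak | ... | 2-byte NibbleSet index].
public readonly ref struct CompressedMemo
{
    private const int KeccakSize = 32;
    private const int IndexLength = 2; // trailing NibbleSet index

    private readonly ReadOnlySpan<byte> _buffer;

    public CompressedMemo(ReadOnlySpan<byte> buffer) => _buffer = buffer;

    public bool Exists(int nibble)
    {
        // Top-level state branches may store an empty buffer even with children present.
        if (_buffer.IsEmpty) return false;
        var index = BinaryPrimitives.ReadUInt16LittleEndian(_buffer[^IndexLength..]);
        return (index & (1 << nibble)) != 0;
    }

    public ReadOnlySpan<byte> GetAt(int nibble)
    {
        var index = BinaryPrimitives.ReadUInt16LittleEndian(_buffer[^IndexLength..]);
        // The entry's offset equals the count of memoized children below this nibble.
        var position = BitOperations.PopCount((uint)(index & ((1 << nibble) - 1)));
        return _buffer.Slice(position * KeccakSize, KeccakSize);
    }
}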

@dipkakwani changed the title from "Remove Compression/Decompression Overhead from RLPMemo" to "Remove Compression & Decompression from RLPMemo" on Dec 26, 2024
@dipkakwani linked an issue on Dec 27, 2024 that may be closed by this pull request
@Scooletz (Contributor) left a comment

I think I either don't follow, or there's a mistake in the approach to removing the Compress.

Let's assume that we have a Branch whose children have the following first nibbles: {A, B, C}. How does this approach distinguish the cases where:

1. A and B are encoded to Keccak while C's RLP is less than 31 bytes
2. A and C are encoded to Keccak while B's RLP is less than 31 bytes

Where is the information stored now?

@dipkakwani marked this pull request as ready for review on January 6, 2025
@Scooletz (Contributor) left a comment

A few remarks, but almost ready to merge.

 {
-    var bits = leftover[^emptyBytesLength..];
-    NibbleSet.Readonly.ReadFrom(bits, out empty);
+    var bits = _buffer[^indexLength..];
Contributor:

This might be nitpicking, but if we stored the index at the beginning, no span slice would be required; we could read it from the start just by passing the buffer.
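For illustration, with the index at the front the read becomes a single call (a sketch assuming the same 2-byte little-endian index and _buffer field):

// No `^`-based slicing needed when the index leads the buffer:
var index = BinaryPrimitives.ReadUInt16LittleEndian(_buffer);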

Contributor Author:

I tried this particular change, and for some reason it started breaking CalculateStateRootHash. I tried a bunch of ways to test/debug it, but it wasn't clear what exactly was causing the test failure. Moving the index back to the end resolved the issues.
Do you have any pointers on whether that is expected? Either way, I prefer to keep the underlying data consistent with the original representation after the Compress operation, by keeping the index at the end, and not deal with state root mismatch issues 😓

Contributor Author:

As discussed offline, I added new tests with grow/shrink which were able to catch the issue with the header placement/incorrect implementation. I also modified the Large_random_operations test, which caught failures too. The current coverage for the RlpMemo class is 100%. All the tests pass with the current implementation.

Contributor:

Massive.

if (value.Length == Keccak.Size)
{
    memoizedUpdated = true;
    memo.SetRaw(value, i);

if (memo.Exists(i))
Contributor:

I like the clear approach here, where different cases are split.

// Optimization, omitting some of the branches to memoize.
// It omits only those with two children, where the cost of the recompute is not big.
// To prevent an attack of spawning multiple levels of such branches, only even levels are skipped.
if (children.SetCount == 2 && (key.Path.Length + key.StoragePath.Length) % 2 == 0)
Contributor:

This shaved off some memory for small branches. It wasn't the biggest saving but it did something. Is it intentionally removed?

Contributor Author:

Good catch. I moved this code to InspectBeforeApply, since this optimization was done during compression, which we don't do anymore. But note that for this particular check, we now have to parse the branch data in InspectBeforeApply to find out the children.SetCount value. Is the extra parsing in InspectBeforeApply still worth it to save space for small branches?

@Scooletz (Contributor) commented Jan 15, 2025

The reason why it was applied to small branches is exactly this. A small branch that has only 2 children will have 50% of its Keccaks recalculated on any update. Why bother keeping the other one if we can save 64 + 2 bytes? Could you import a db with this PR and provide the size it uses?

Contributor Author:
The reason will be displayed to describe this comment to others. Learn more.

I didn't fully get you... we can save that space; I just wanted to point out that we are doing an additional bit of parsing here to find out the children count. I think it is fine to do for every branch, since we would potentially be saving 64 + 2 bytes with the smaller branches.

Contributor:

We are. This is due to the nature of unmaterialized data that needs to be reparsed on each pass. I cannot provide you with numbers (#286 has none), but saving 66 bytes out of 68 for small branches seems to be a good deal 👀

{
    var bits = leftover[^emptyBytesLength..];
@Scooletz (Contributor) commented Jan 6, 2025

A potential compression opportunity is lost here. Let's consider a branch that has only one Keccak memoized in RlpMemo. In such a case, we could encode this one on a single byte. Even more, we could encode up to 2 on a single byte. This would, for cases where only 1 or 2 Keccaks are memoized, make them lighter by one byte. For 2 Keccaks it would be a ~1.5% saving, while for one it would be 3%. This would make it a bit more complicated though. Another option would be to treat the 1-byte case specially to keep the not-set nibbles (which would save some space for 15 and 14 set, but would be much less impactful size-wise, I think). We could have this in NibbleSet so that it's capable of writing/reading from/to 1-2 bytes instead of only 2.

I know that the original decompress/compress had no such feature:

if (empty.SetCount > 0)
{
    var dest = writeTo.Slice(at * Keccak.Size, NibbleSet.MaxByteSize);
    new NibbleSet.Readonly(empty).WriteToWithLeftover(dest);
    return at * Keccak.Size + NibbleSet.MaxByteSize;
}

but we use something similar for the branches themselves:

if (Children.SetCount == ChildCount2)
{
    leftover[0] = (byte)(Children.BiggestNibbleSet | (Children.BiggestNibbleSet << NibblePath.NibbleShift));
}
else
{
    Debug.Assert(Children.SetCount == ChildCount3);
    // remove the smallest nibble set and then get the smallest
    var mid = Children.Remove(Children.SmallestNibbleSet).SmallestNibbleSet;
    leftover[0] = (byte)(mid | (Children.BiggestNibbleSet << NibblePath.NibbleShift));
}
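For illustration, a sketch of how the 1-2 Keccak case could borrow that trick for the memo index (the NibbleSet member names are taken from the snippet above; memoized and destination are assumed, and distinguishing the 1-byte form from the 2-byte form would still need an extra signal):

// Sketch only: pack the nibbles of up to two memoized children into one byte.
// A duplicated nibble signals that only one child is memoized.
if (memoized.SetCount == 1)
{
    destination[0] = (byte)(memoized.BiggestNibbleSet | (memoized.BiggestNibbleSet << NibblePath.NibbleShift));
}
else if (memoized.SetCount == 2)
{
    destination[0] = (byte)(memoized.SmallestNibbleSet | (memoized.BiggestNibbleSet << NibblePath.NibbleShift));
}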

Contributor Author:

That's a good point! I was also thinking of another approach and wanted to run it by you to check which one might be better (before completing the code changes):
To make maximum use of one byte, it could store a partial bitset index. We basically divide the 2 bytes into two halves and use only 1 byte to represent the index for cases where the set indices fall within one half.

Original (2 bytes): [0, 1, 2, .. 15]
Modified (1 byte): [x, 0, 1, 2, 3, 4, 5, 6] or [x, 7, 8, 9, 10, 11, 12, 13]

x can be 0 or 1 depending on which half it is representing.

As long as the children fall in one such contiguous range, we would only need 1 byte. Disadvantages of this approach:
a) even if only 2 children exist but in different halves, we would have to fall back to 2 bytes of storage.
b) children 14 and 15 can never be represented with one byte.

Does this make sense? Do you think it is better or worse than your proposed approach?
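Purely as an illustration of the half-range idea (the helper name and exact bit layout are assumptions, not proposed code):

// Returns a single encoded byte if all set nibbles fit in one half, otherwise -1.
static int TryEncodeHalfIndex(ushort nibbleSet)
{
    // Nibbles 14 and 15 can never be represented in one byte (disadvantage b).
    if ((nibbleSet & 0b1100_0000_0000_0000) != 0) return -1;

    var low = nibbleSet & 0b0111_1111;         // indices 0..6
    var high = (nibbleSet >> 7) & 0b0111_1111; // indices 7..13

    if (high == 0) return low;        // x = 0: all set indices in the lower half
    if (low == 0) return 0x80 | high; // x = 1: all set indices in the upper half
    return -1;                        // spans both halves (disadvantage a): 2 bytes needed
}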

Contributor:

What an interesting idea! One would need to count the collisions and probabilities 😅

  • for 2 nibbles, the chance of picking both from one of the ranges is 7/15
  • for 3 nibbles, we get 1/5
  • for 4 nibbles, we get 1/13
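(These follow from the same-half probability 2·C(8,k)/C(16,k) for k nibbles: 2·28/120 = 7/15, 2·56/560 = 1/5, and 2·70/1820 = 1/13.)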

As we're trying to optimize for the small branches (the big ones won't benefit from removing 1 byte), it's a bet. With the encoding proposed before, we get 1 byte for up to 2 nibbles and always 2 bytes for the other cases. Here we optimize for any number, but with diminishing returns, right? With 4 Keccaks it'll be only 1% saved.

What do you think then?

Contributor Author:

I would like to take up this optimization in a separate PR, since it would require extra logic and additional testing to ensure that it works fine.

Contributor:

Having said this, should this PR no longer be labeled Breaking?

Contributor Author:

Yes, removed the label.

 }

 [Test]
-public void All_children_set_with_all_Keccaks_set()
+public void Random_delete()
Contributor:

The random tests are nice for fuzzing, but it would be beneficial to have some explicit A, B, C cases where A, B or A, C are set, so that we cover what was previously missing. Or maybe I'm just not seeing such tests, but they do exist 👀

Contributor Author:

Yes, that test is missing, but I am not quite sure how to capture this in a test - a child with RLP < 31 bytes while the others have Keccaks?
I looked at existing tests to figure it out (e.g. Three_accounts_sharing_start_nibble, Branch_two_leafs), but I couldn't find a way to simulate this scenario:

[Test]
public void Insert_get_operation()
{
    var commit = new Commit();

    const string key0 = "A0000001";
    const string key1 = "B0000002";
    const string key2 = "C0000003";

    commit.Set(key0);
    commit.Set(key1);
    commit.Set(key2);

    NibbleSet.Readonly children = new NibbleSet(0xA, 0xB, 0xC);
    commit.SetBranch(Key.Merkle(NibblePath.Empty), children);

    var merkle = new ComputeMerkleBehavior();

    merkle.BeforeCommit(commit, CacheBudget.Options.None.Build());
}

I am probably missing some obvious way to complete this test scenario 🫤

Contributor:

I think I'd maybe replace the Branch-based test with a direct KeccakMemo test? Not sure if covering the compounded Branch + memo is doable in a nice way.

Contributor Author:

Added basic new tests in RlpMemo with fixed children. A Branch + memo test couldn't be done without exposing a few internal details as public, so I'm sticking to independent unit tests.
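For reference, a minimal shape such a direct test could take (the RlpMemo constructor and buffer sizing here are assumptions; SetRaw and Exists appear earlier in this diff):

[Test]
public void Two_keccak_children_one_short_rlp_child()
{
    // Hypothetical sizing: room for 16 Keccaks plus the trailing 2-byte index.
    Span<byte> buffer = stackalloc byte[16 * 32 + 2];
    var memo = new RlpMemo(buffer); // constructor shape assumed

    Span<byte> keccak = stackalloc byte[32];
    keccak.Fill(0xAB);

    memo.SetRaw(keccak, 0xA); // A: memoized Keccak
    memo.SetRaw(keccak, 0xB); // B: memoized Keccak
    // C (0xC) intentionally not set: its RLP is shorter than 32 bytes.

    Assert.That(memo.Exists(0xA), Is.True);
    Assert.That(memo.Exists(0xB), Is.True);
    Assert.That(memo.Exists(0xC), Is.False);
}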

@dipkakwani added the 💥Breaking ("The change introduces a storage breaking change.") label on Jan 13, 2025
@dipkakwani requested a review from Scooletz on January 14, 2025
Code Coverage

Package   Line Rate           Branch Rate         Health
Paprika   80%                 76%
Summary   80% (5040 / 6297)   76% (1755 / 2312)

Minimum allowed line rate is 75%

@Scooletz (Contributor) left a comment

A few more clarifications.

@dipkakwani removed the 💥Breaking ("The change introduces a storage breaking change.") label on Jan 15, 2025
Successfully merging this pull request may close these issues:

RlpMemo should work on compressed data