Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Alignment Support for Custom Extensions in Encoder/Decoder #245

Closed
wants to merge 2 commits into from

Conversation

EddiG
Copy link
Contributor

@EddiG EddiG commented Aug 23, 2024

This update introduces a solution to handle data alignment in custom extensions, ensuring that encoded data is aligned in memory according to specified requirements. The main benefit of this approach is enabling zero-copy deserialization, which significantly enhances performance when working with data types like Float32Array that have strict alignment needs.

Key Steps:

  1. Alignment Handling in Encoding:

    • In the Encoder class, the code checks if an alignment requirement (align) is specified for the extension before encoding it.
    • The necessary padding is calculated to align the data properly in memory. Padding bytes (0xc1, known as noop bytes) are inserted before the extension type to ensure the data starts at an aligned memory position.
  2. Padding Skipping in Decoding:

    • The Decoder class is enhanced to recognize and skip padding bytes (0xc1) when decoding an extension. This ensures the actual extension type is correctly identified, regardless of any alignment padding added during encoding.
  3. Storing Alignment Information:

    • The ExtensionCodec class is modified to store the alignment requirement for each registered extension. This alignment information is utilized during encoding to ensure that data is correctly aligned in memory.
  4. Modification of the ExtData Class:

    • The ExtData class is extended to include an optional align property. This property stores the alignment requirement for each extension, allowing the encoder to handle data alignment appropriately.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@codecov-commenter
Copy link

codecov-commenter commented Jan 26, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.10%. Comparing base (3b6ef80) to head (c7e0d03).
Report is 27 commits behind head on main.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #245      +/-   ##
==========================================
+ Coverage   98.07%   98.10%   +0.02%     
==========================================
  Files          16       16              
  Lines        1092     1106      +14     
  Branches      249      251       +2     
==========================================
+ Hits         1071     1085      +14     
  Misses         21       21              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gfx
Copy link
Member

gfx commented Jan 26, 2025

Hi, @EddiG. Thank you for your contribution.

This PR could boot performance in particular cases, but extending the MessagePack format might break compatibility with other MessagePack processors.

Padding is okay, but changing Decoder might not be a good idea. Could you please re-consider the format that does not affect Decoder in order not to break the compatibility?

@EddiG
Copy link
Contributor Author

EddiG commented Jan 26, 2025

Hi @gfx

I agree that breaking compatibility is possible in the current state.

Let me go ahead and write down my considerations so far.

There are two places to add the padding I see.

  1. Before the type, so that the final format is [padding] + [extType] + [data]
  2. After the type, so that the final format is [extType] + [padding] + [data]

In the first case, we must be sure that the padding byte doesn't intersect with the other extension type bytes.
The byte 0xc1 was chosen due to its special meaning in the msgpack format. I understand though that I could misinterpret the meaning of that byte and de facto it could be used broadly as an extension type. In that case, users who use that particular byte to register their extension codec will be affected by changes introduced in that PR.
To protect them from the compatibility issue we can extend the DecoderOptions class with the useAlignment boolean option. That option is false by default and does not change the current behaviour of the decoder. To enable the alignment logic in the decoder user must enable it explicitly like in the code below.

      const decoded = decode(encoded, { extensionCodec, useAlignment: true });

In the second case, we must be sure that we do not mess up the padding and data. So far I see only one way to do that, it is by adding an extra byte that tells us the number of the padding bytes we should offset to reach the actual data. That solution means that we add an extra byte for any data encoded with the extension codec with enabled alignment even if the data is already aligned.
The implementation of that solution could be done in different ways. I can think of two of them.
One implementation could allow utilisation of the encode/decode functions of the extension codec to make the encoding/decoding with the alignment. That solution requires having access to the current position of the final buffer that is filling in (const dataPos = this.pos + 1;) not only in the Encoder but in the encode function of the extension codec. Due to the current implementation of the Encoder, ExtData and ExtensionCodec it involves significant change.
Another implementation could be done as a continuation of the proposed PR. It still requires extra options to protect the current users from compatibility issues.

What do you think?

@EddiG
Copy link
Contributor Author

EddiG commented Jan 27, 2025

For me, the ideal solution would be to provide the current position in the filling buffer to the encode function of the extension codec. In that case, I can calculate how many padding bytes I should add to my data to keep it aligned. Then in the decode function, I skip the padding bytes and build a typed array.
I like it as the API is still clean, with no need for extra parameters such as align and realignment. It gives me more freedom to choose the way of the data alignment, such as the padding byte, format, and possible optimizations.

I'll give it a try to verify that it is conceptually doable.

@EddiG
Copy link
Contributor Author

EddiG commented Jan 27, 2025

I've proposed another solution here #248
Please, take a look.

@gfx
Copy link
Member

gfx commented Feb 3, 2025

Yes, we should conentrait on #248

@gfx gfx closed this Feb 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants