Implement Alignment Support for Custom Extensions in Encoder/Decoder #245

EddiG · 2024-08-23T10:46:29Z

This update introduces a solution to handle data alignment in custom extensions, ensuring that encoded data is aligned in memory according to specified requirements. The main benefit of this approach is enabling zero-copy deserialization, which significantly enhances performance when working with data types like Float32Array that have strict alignment needs.

Key Steps:

Alignment Handling in Encoding:
- In the Encoder class, the code checks if an alignment requirement (align) is specified for the extension before encoding it.
- The necessary padding is calculated to align the data properly in memory. Padding bytes (0xc1, known as noop bytes) are inserted before the extension type to ensure the data starts at an aligned memory position.
Padding Skipping in Decoding:
- The Decoder class is enhanced to recognize and skip padding bytes (0xc1) when decoding an extension. This ensures the actual extension type is correctly identified, regardless of any alignment padding added during encoding.
Storing Alignment Information:
- The ExtensionCodec class is modified to store the alignment requirement for each registered extension. This alignment information is utilized during encoding to ensure that data is correctly aligned in memory.
Modification of the ExtData Class:
- The ExtData class is extended to include an optional align property. This property stores the alignment requirement for each extension, allowing the encoder to handle data alignment appropriately.

codecov-commenter · 2025-01-26T02:27:56Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.10%. Comparing base (3b6ef80) to head (c7e0d03).
Report is 27 commits behind head on main.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #245      +/-   ##
==========================================
+ Coverage   98.07%   98.10%   +0.02%     
==========================================
  Files          16       16              
  Lines        1092     1106      +14     
  Branches      249      251       +2     
==========================================
+ Hits         1071     1085      +14     
  Misses         21       21

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

gfx · 2025-01-26T02:40:12Z

Hi, @EddiG. Thank you for your contribution.

This PR could boot performance in particular cases, but extending the MessagePack format might break compatibility with other MessagePack processors.

Padding is okay, but changing Decoder might not be a good idea. Could you please re-consider the format that does not affect Decoder in order not to break the compatibility?

EddiG · 2025-01-26T15:53:02Z

Hi @gfx

I agree that breaking compatibility is possible in the current state.

Let me go ahead and write down my considerations so far.

There are two places to add the padding I see.

Before the type, so that the final format is [padding] + [extType] + [data]
After the type, so that the final format is [extType] + [padding] + [data]

In the first case, we must be sure that the padding byte doesn't intersect with the other extension type bytes.
The byte 0xc1 was chosen due to its special meaning in the msgpack format. I understand though that I could misinterpret the meaning of that byte and de facto it could be used broadly as an extension type. In that case, users who use that particular byte to register their extension codec will be affected by changes introduced in that PR.
To protect them from the compatibility issue we can extend the DecoderOptions class with the useAlignment boolean option. That option is false by default and does not change the current behaviour of the decoder. To enable the alignment logic in the decoder user must enable it explicitly like in the code below.

      const decoded = decode(encoded, { extensionCodec, useAlignment: true });

In the second case, we must be sure that we do not mess up the padding and data. So far I see only one way to do that, it is by adding an extra byte that tells us the number of the padding bytes we should offset to reach the actual data. That solution means that we add an extra byte for any data encoded with the extension codec with enabled alignment even if the data is already aligned.
The implementation of that solution could be done in different ways. I can think of two of them.
One implementation could allow utilisation of the encode/decode functions of the extension codec to make the encoding/decoding with the alignment. That solution requires having access to the current position of the final buffer that is filling in (const dataPos = this.pos + 1;) not only in the Encoder but in the encode function of the extension codec. Due to the current implementation of the Encoder, ExtData and ExtensionCodec it involves significant change.
Another implementation could be done as a continuation of the proposed PR. It still requires extra options to protect the current users from compatibility issues.

What do you think?

EddiG · 2025-01-27T16:12:23Z

For me, the ideal solution would be to provide the current position in the filling buffer to the encode function of the extension codec. In that case, I can calculate how many padding bytes I should add to my data to keep it aligned. Then in the decode function, I skip the padding bytes and build a typed array.
I like it as the API is still clean, with no need for extra parameters such as align and realignment. It gives me more freedom to choose the way of the data alignment, such as the padding byte, format, and possible optimizations.

I'll give it a try to verify that it is conceptually doable.

EddiG · 2025-01-27T20:19:13Z

I've proposed another solution here #248
Please, take a look.

gfx · 2025-02-03T23:57:05Z

Yes, we should conentrait on #248

Support data alignment in the custom extensions

2796aef

EddiG force-pushed the main branch from 5c88f17 to 2796aef Compare August 23, 2024 10:56

Merge branch 'main' into main

c7e0d03

EddiG mentioned this pull request Jan 27, 2025

Allow the data alignment to support zero-copy decoding #248

Merged

gfx closed this Feb 3, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Implement Alignment Support for Custom Extensions in Encoder/Decoder #245

Implement Alignment Support for Custom Extensions in Encoder/Decoder #245

Uh oh!

EddiG commented Aug 23, 2024

Uh oh!

codecov-commenter commented Jan 26, 2025 •

edited

Loading

Uh oh!

gfx commented Jan 26, 2025

Uh oh!

EddiG commented Jan 26, 2025

Uh oh!

EddiG commented Jan 27, 2025

Uh oh!

EddiG commented Jan 27, 2025

Uh oh!

gfx commented Feb 3, 2025 •

edited

Loading

Uh oh!

Uh oh!

Implement Alignment Support for Custom Extensions in Encoder/Decoder #245

Implement Alignment Support for Custom Extensions in Encoder/Decoder #245

Uh oh!

Conversation

EddiG commented Aug 23, 2024

Key Steps:

Uh oh!

codecov-commenter commented Jan 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

gfx commented Jan 26, 2025

Uh oh!

EddiG commented Jan 26, 2025

Uh oh!

EddiG commented Jan 27, 2025

Uh oh!

EddiG commented Jan 27, 2025

Uh oh!

gfx commented Feb 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

codecov-commenter commented Jan 26, 2025 •

edited

Loading

gfx commented Feb 3, 2025 •

edited

Loading