
Recycle serialization buffers on transmission #342

Open

wants to merge 16 commits into rolling from buffer-pool

Conversation

fuzzypixelz
Contributor

Adds a LIFO buffer pool in the context to reuse buffers allocated during serialization. The aim is not only to avoid the overhead of dynamic allocation but also to improve the cache locality of serialization buffers.
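
For readers unfamiliar with the pattern, a minimal sketch of such a bounded LIFO pool is shown below. The class and member names are illustrative assumptions, not the actual rmw_zenoh_cpp implementation; the real code may manage raw byte buffers and integrate with the RMW context differently.

```cpp
// Minimal sketch of a bounded LIFO buffer pool (illustrative only; names and
// details are assumptions, not the actual rmw_zenoh_cpp implementation).
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

class BufferPool
{
public:
  explicit BufferPool(std::size_t max_buffers)
  : max_buffers_(max_buffers) {}

  // Borrow a buffer of at least `size` bytes, preferring the most recently
  // returned one (LIFO) so its memory is more likely to still be in cache.
  std::vector<std::uint8_t> acquire(std::size_t size)
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (!free_list_.empty()) {
        std::vector<std::uint8_t> buffer = std::move(free_list_.back());
        free_list_.pop_back();
        buffer.resize(size);
        return buffer;
      }
    }
    // Pool is empty: fall back to a fresh allocation.
    return std::vector<std::uint8_t>(size);
  }

  // Return a buffer to the pool; because the pool is bounded, excess buffers
  // are simply dropped and freed.
  void release(std::vector<std::uint8_t> buffer)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (free_list_.size() < max_buffers_) {
      free_list_.push_back(std::move(buffer));
    }
  }

private:
  std::mutex mutex_;
  std::vector<std::vector<std::uint8_t>> free_list_;  // used as a LIFO stack
  std::size_t max_buffers_;
};
```

The LIFO discipline is the point: the most recently returned buffer is handed out first, so its memory is more likely to still be resident in cache when the next message is serialized.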

@fuzzypixelz

This comment was marked as outdated.

@fuzzypixelz fuzzypixelz changed the title from "Recycle serialization buffers on transmission." to "Recycle serialization buffers on transmission" on Dec 16, 2024
@clalancette
Collaborator

All right, now that we've merged in #327, we can consider this one. Please rebase this onto the latest, then we can do a full review of it. Until then, I'll mark it as a draft.

@clalancette clalancette marked this pull request as draft December 17, 2024 21:41
@YuanYuYuan YuanYuYuan mentioned this pull request Dec 18, 2024
@fuzzypixelz fuzzypixelz force-pushed the buffer-pool branch 2 times, most recently from 068cf50 to 21006d0 on December 19, 2024 at 11:32
Contributor

@ahcorde ahcorde left a comment


There are also many changes unrelated to the goal of this PR.

@fuzzypixelz fuzzypixelz force-pushed the buffer-pool branch 2 times, most recently from 7ca544b to bb6fd88 on December 19, 2024 at 14:46
@fuzzypixelz
Contributor Author

There are also many changes unrelated to the goal of this PR.

There was a formatting error from my IDE. I've restored the files and manually re-applied the patches.

@fuzzypixelz fuzzypixelz force-pushed the buffer-pool branch 5 times, most recently from 8dd9bf5 to bcc36a1 on December 20, 2024 at 16:00
Collaborator

@clalancette clalancette left a comment


Besides the comments inline, do you have any updated performance numbers here?

@fuzzypixelz

This comment was marked as outdated.

@fuzzypixelz fuzzypixelz marked this pull request as ready for review January 3, 2025 15:20
@Yadunund Yadunund self-assigned this Jan 3, 2025
@fuzzypixelz fuzzypixelz force-pushed the buffer-pool branch 2 times, most recently from 6c48aa8 to 6143959 Compare January 17, 2025 09:58
Member

@Yadunund Yadunund left a comment


Overall this looks good. I've left some feedback to re-structure the code a bit.

@fuzzypixelz
Contributor Author

@clalancette @Yadunund The numbers I've shown are for single-process (session-local) communication. I would like to investigate the impact of this pull request on multi-process communication on the same host as well, to make sure we're not degrading performance in that case.

Once that's done, I will post numbers and address your remaining review comments.

fuzzypixelz and others added 9 commits February 6, 2025 16:22
Adds a bounded LIFO buffer pool in the context to reuse buffers
allocated on serialization. The aim is not (only) to avoid the
overhead of dynamic allocation but rather to enhance the cache
locality of serialization buffers.
Co-authored-by: Chris Lalancette <[email protected]>
Signed-off-by: Mahmoud Mazouz <[email protected]>
@Yadunund
Member

Yadunund commented Feb 7, 2025

@clalancette @Yadunund The numbers I've shown are for single-process (session-local) communication. I would like to investigate the impact of this pull request on multi-process communication on the same host as well, to make sure we're not degrading performance in that case.

Once that's done, I will post numbers and address your remaining review comments.

@fuzzypixelz any update on performance metrics after the recent set of changes? Is this ready for another review?

@fuzzypixelz
Contributor Author

fuzzypixelz commented Feb 7, 2025

@clalancette @Yadunund Here are the benchmarking results. I used the iRobot benchmark in single-process and multi-process modes, with the Mont Blanc topology and IPC disabled (IPC is only relevant in single-process mode). I ran the benchmark on four machines with varying specs to make sure we perform well on both low-end and high-end devices.

Host 1

System information

  • CPU: AMD EPYC 7502 32-Core Processor
  • MEM: 512G

Single-process

[Plots: Q2, Q3, and mean latency (µs)]

Multi-process

[Plots: Q2, Q3, and mean latency (µs)]

Host 2

System information

  • CPU: 12th Gen Intel(R) Core(TM) i5-1240P
  • MEM: 16G

Single-process

[Plots: Q2, Q3, and mean latency (µs)]

Multi-process

[Plots: Q2, Q3, and mean latency (µs)]

Host 3

System information

  • CPU: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
  • MEM: 8G

Single-process

[Plots: Q2, Q3, and mean latency (µs)]

Multi-process

[Plots: Q2, Q3, and mean latency (µs)]

Host 4

System information

  • CPU: ARM Cortex-A76
  • MEM: 8G

Single-process

[Plots: Q2, Q3, and mean latency (µs)]

Multi-process

[Plots: Q2, Q3, and mean latency (µs)]

@fuzzypixelz
Contributor Author

fuzzypixelz commented Feb 7, 2025

The above comment turned out too long; I should probably use collapsed sections.

@Yadunund My conclusion is that this pull request consistently improves latency in intra-process communication for relatively large topics, namely columbia (250 KB) and tagus (250 KB). However, there are data points where latency is worse, especially for some topics in intra-process communication.

My attempts to ascertain the root cause of these apparent regressions have not been successful. However, I believe this pull request solves a real problem: I consistently observe lower latency with it on large topics (hundreds of kilobytes).

I'm not very confident in the reliability of the iRobot benchmark; there are issues with the accuracy of its measurements. I ran all tests for 60 seconds on idle machines, as results would otherwise vary significantly from run to run.

Of course, I'm not saying that these problematic numbers are meaningless; they could be signs of a real problem. But my confidence in them is low, especially when the difference is on the order of tens of microseconds.

I still think that this change is a necessary first step that "just makes sense" and brings rmw_zenoh in line with other RMWs. But there are clearly opportunities for refinement, which can be the subject of future work.
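
To make the intended access pattern concrete, here is a hedged usage sketch built on the hypothetical BufferPool outlined under the PR description above; the serialization and transmission steps are simulated stand-ins, not the real rmw_zenoh_cpp publish path.

```cpp
// Usage sketch (relies on the hypothetical BufferPool above): borrow a buffer,
// "serialize" into it, pretend to transmit, then return it for reuse.
#include <cstring>
#include <iostream>

int main()
{
  BufferPool pool(4);  // keep at most 4 idle buffers around

  for (int i = 0; i < 3; ++i) {
    // Borrow a buffer roughly the size of a large Mont Blanc topic (~250 KB).
    std::vector<std::uint8_t> buffer = pool.acquire(250 * 1024);
    std::memset(buffer.data(), 0x42, buffer.size());              // stand-in for CDR serialization
    std::cout << "transmitting " << buffer.size() << " bytes\n";  // stand-in for the Zenoh put
    pool.release(std::move(buffer));  // the same allocation becomes available for the next iteration
  }
  return 0;
}
```

After the first iteration, acquire() returns the buffer released in the previous iteration rather than allocating a fresh 250 KB block, which is exactly the reuse this pull request aims for.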

@Yadunund
Member

@fuzzypixelz thanks a lot for the detailed study! I'll take a closer look later this week.

@fuzzypixelz
Contributor Author

@Yadunund This should be ready to merge.

@fuzzypixelz fuzzypixelz requested review from Yadunund and ahcorde March 11, 2025 15:15
Contributor

@ahcorde ahcorde left a comment


I merged this branch with rolling, and if you run

colcon test --merge-install --event-handlers console_direct+ --packages-select rc

you will see some new failures:

	107 - test_publisher__rmw_zenoh_cpp (Failed)
	108 - test_publisher_wait_all_ack__rmw_zenoh_cpp (Failed)
	110 - test_subscription__rmw_zenoh_cpp (Failed)
	113 - test_logging_rosout__rmw_zenoh_cpp (Failed)
	117 - test_service_event_publisher__rmw_zenoh_cpp (Failed)

@Yadunund
Member

@fuzzypixelz let's revisit this after the kilted freeze.
