Recycle serialization buffers on transmission #342
base: rolling
Conversation
All right, now that we've merged in #327, we can consider this one. Please rebase this onto the latest, then we can do a full review of it. Until then, I'll mark it as a draft.
There are also many changes unrelated to the goal of this PR.
There was a formatting error from my IDE. I've restored the files and manually re-applied the patches.
Besides the comments inline, do you have any updated performance numbers here?
Overall this looks good. I've left some feedback to re-structure the code a bit.
@clalancette @Yadunund The numbers I've shown are for single-process (session-local) communication. I would like to investigate the impact of this pull request on multi-process communication on the same host as well, to make sure we're not degrading performance in that case. Once that's done, I will post numbers and address your remaining review comments.
Adds a bounded LIFO buffer pool in the context to reuse buffers allocated on serialization. The aim is not (only) to avoid the overhead of dynamic allocation but rather to enhance the cache locality of serialization buffers.
Co-authored-by: Chris Lalancette <[email protected]> Signed-off-by: Mahmoud Mazouz <[email protected]>
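For illustration, here is a minimal sketch of what such a bounded LIFO pool could look like in C++. This is not the PR's actual code; the class name, the fixed bound, and the locking strategy are assumptions made for the example.

```cpp
// Illustrative sketch only, not the code from this PR.
// A bounded LIFO pool of serialization buffers: release() pushes a
// buffer onto a stack and acquire() pops the most recently released
// one, so the buffer most likely to still be in cache is reused first.
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

class BufferPool  // hypothetical name
{
public:
  std::vector<uint8_t> acquire(std::size_t size)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!buffers_.empty()) {
      std::vector<uint8_t> buffer = std::move(buffers_.back());
      buffers_.pop_back();
      buffer.resize(size);  // reuses existing capacity when it suffices
      return buffer;
    }
    return std::vector<uint8_t>(size);  // pool empty: allocate fresh
  }

  void release(std::vector<uint8_t> && buffer)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (buffers_.size() < kMaxPooledBuffers) {  // bounded: drop overflow
      buffers_.push_back(std::move(buffer));
    }
  }

private:
  static constexpr std::size_t kMaxPooledBuffers = 16;  // assumed bound
  std::mutex mutex_;
  std::vector<std::vector<uint8_t>> buffers_;  // used as a LIFO stack
};

int main()
{
  BufferPool pool;
  auto buffer = pool.acquire(1024);  // first use: allocates
  // ... serialize a message into `buffer`, hand it to the transport ...
  pool.release(std::move(buffer));
  // A later publish of similar size reuses the same, still-warm allocation.
  auto reused = pool.acquire(1024);
  pool.release(std::move(reused));
  return 0;
}
```

The LIFO discipline is what drives the cache-locality benefit described above: the next serialization gets the buffer that was touched most recently, and the bound keeps the pool from pinning memory after a burst of large messages.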
@fuzzypixelz any update on performance metrics after the recent set of changes? Is this ready for another review?
@clalancette @Yadunund Here are the benchmarking results. I have used the iRobot benchmark in single-process and multi-process modes, using the Mont Blanc topology with IPC disabled (only relevant for single-process mode). I used four benchmarking machines with varying specs to make sure we perform properly on both low-end and high-end devices.
[Collapsed results: system information plus single-process and multi-process numbers for Hosts 1 through 4.]
@Yadunund My conclusion is that this pull request consistently improves latency in intra-process communication for relatively large topics (hundreds of KBs), while some of the other numbers show apparent regressions. My attempts to ascertain the root cause of these apparent regressions have not been successful. However, I believe this pull request definitely solves a real problem: I consistently observe lower latency for this pull request on large topics.
I'm not very confident in the reliability of the iRobot benchmark. For one, there are issues with the accuracy of measurements; I ran all tests for 60 seconds on idle machines, otherwise results would vary significantly from run to run. I'm not saying that these problematic numbers are meaningless, as they could be signs of a real problem, but my confidence in them is rather low, especially when the difference is on the order of tens of microseconds.
I still think that this change is a necessary first step that "just makes sense" and brings rmw_zenoh in line with other RMWs. But there are clearly opportunities for refinement, which can be the subject of future work.
@fuzzypixelz thanks a lot for the detailed study! I'll take a closer look later this week. |
Co-authored-by: yadunund <[email protected]> Signed-off-by: Mahmoud Mazouz <[email protected]>
@Yadunund This should be ready to merge.
I merged this branch with rolling, and if you run
colcon test --merge-install --event-handlers console_direct+ --packages-select rc
you will see some new failures:
107 - test_publisher__rmw_zenoh_cpp (Failed)
108 - test_publisher_wait_all_ack__rmw_zenoh_cpp (Failed)
110 - test_subscription__rmw_zenoh_cpp (Failed)
113 - test_logging_rosout__rmw_zenoh_cpp (Failed)
117 - test_service_event_publisher__rmw_zenoh_cpp (Failed)
@fuzzypixelz let's revisit this after the Kilted freeze.