Hello dora-rs maintainers and community,
I am currently architecting a dataflow for a dynamic Gaussian Splatting SLAM (GS-SLAM) system using dora-rs.
While reviewing past performance work in the framework, I studied Issue #651, where PCL alignment caused abnormal CPU spikes. That issue highlighted the performance ceilings, specifically memory alignment and cache misses, that appear when interfacing heavy external libraries with Dora's Arrow-based shared memory on the CPU side.
While the CPU-side PCL bottleneck has been resolved, my GS-SLAM implementation introduces a different architectural stress test, this time on the GPU side. We are streaming massive, high-frequency updates of 3D Gaussian primitives: large parameter matrices (typically N×14, covering covariance and spherical-harmonics coefficients) that live on the GPU as CUDA tensors.
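For concreteness, here is a minimal sketch of how such a parameter matrix could be packed for a dora output. The N×14 split in the comment (3 position + 4 rotation + 3 scale + 1 opacity + 3 SH DC coefficients) is just one plausible layout, and the `"width"` metadata key is my own convention, not part of the dora API:

```python
import numpy as np
import pyarrow as pa

N = 1_000_000
params = np.random.rand(N, 14).astype(np.float32)  # one row per Gaussian

# Arrow arrays are one-dimensional, so flatten the matrix and carry the
# row width in metadata so the receiver can reshape it back.
flat = pa.array(params.ravel())
# node.send_output("gaussians", flat, metadata={"width": "14"})
```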
Building upon the lessons from #651, I would like to discuss the current architectural boundaries regarding GPU memory in dora-rs:
1. Cross-Language VRAM Mapping Overhead: When bridging Arrow-backed shared memory from the Rust core into Python (e.g., a PyTorch node for GS rendering/optimization), what overhead should we expect when these arrays are mapped into GPU VRAM? Does the current PyArrow FFI implementation trigger an implicit Host-to-Device memory copy that negates the Arrow zero-copy advantage? (The first sketch after this list marks where that copy shows up in a typical consumer node.)
2. CUDA IPC Roadmap: Are there any existing paradigms, experimental features, or roadmap plans (perhaps relevant to GSoC 2026 ideas) for supporting direct GPU-to-GPU memory sharing across nodes, such as CUDA IPC? Ideally, high-frequency tensor streams would bypass CPU host memory allocation entirely; the second sketch below shows that pattern outside of dora.
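To illustrate question 1, here is a hypothetical consumer node that makes the Host-to-Device boundary explicit. It assumes the standard dora Python API (`dora.Node`, with `event["value"]` arriving as a pyarrow array backed by host shared memory) and the N×14 flattening from the earlier sketch; the input name `"gaussians"` is my own:

```python
import torch
from dora import Node

node = Node()
for event in node:
    if event["type"] == "INPUT" and event["id"] == "gaussians":
        arr = event["value"]                      # pyarrow view on host shared memory
        host = arr.to_numpy(zero_copy_only=True)  # still zero-copy on the host side
        # PyTorch warns that the view is read-only; fine here, since we only
        # copy it to the device and never mutate the host buffer.
        gauss = torch.from_numpy(host).reshape(-1, 14)  # host tensor, no copy yet
        gauss_gpu = gauss.to("cuda")              # <- the Host-to-Device copy in question
        # ... GS rendering/optimization on gauss_gpu ...
```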
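And for question 2, a sketch of the CUDA IPC pattern itself, outside of dora: `torch.multiprocessing` shares CUDA tensors between processes via `cudaIpcGetMemHandle`/`cudaIpcOpenMemHandle` under the hood, so the consumer maps the producer's VRAM directly with no host staging buffer. A dora integration would presumably ship the serialized handle as message metadata rather than through an `mp.Queue`:

```python
import torch
import torch.multiprocessing as mp

def consumer(queue):
    gauss = queue.get()  # rebuilt from the IPC handle: same VRAM, no H2D copy
    gauss += 1.0         # writes directly into the producer's device buffer

if __name__ == "__main__":
    mp.set_start_method("spawn")  # required for sharing CUDA tensors
    queue = mp.Queue()
    gauss = torch.zeros(100_000, 14, device="cuda")  # N x 14 Gaussian parameters
    proc = mp.Process(target=consumer, args=(queue,))
    proc.start()
    queue.put(gauss)              # the IPC handle, not the data, crosses processes
    proc.join()
    torch.cuda.synchronize()
    print(gauss[0, 0].item())     # prints 1.0: the update landed in place
```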
I am very interested in exploring how dora-rs can be pushed to handle high-bandwidth neural rendering dataflows, and whether this direction aligns with the community's future priorities. Any architectural insights or pointers would be highly appreciated.
Thank you.