We hit CUDA OOM during trellis2 mesh post-processing (fill_holes → get_edges). The error originates in src/connectivity.cu:get_edges(), and fill_holes() additionally triggers multiple CUB operations (sort/scan/select), each allocating large temporary buffers.
Stack (excerpt)
- trellis2_image_to_3d.py: decode_latent -> m.fill_holes()
- trellis2/representations/mesh/base.py: fill_holes
- cumesh.py: get_edges
- src/connectivity.cu: get_edges
- Error: [CuMesh] CUDA error ... Error text: out of memory
Hotspots
- src/connectivity.cu::CuMesh::get_edges()
- edges.resize(F*3)
- temp_storage.resize(F*3*sizeof(uint64_t))
- cub::DeviceRadixSort::SortKeys
- cub::DeviceRunLengthEncode::Encode
- src/clean_up.cu::CuMesh::fill_holes()
- multiple cudaMalloc + DeviceSegmentedReduce/Select/Scan
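To see why the peak is proportional to F*3, a back-of-envelope estimator helps. This is a hypothetical memory model (three buffers of 64-bit keys: the edge-key buffer, the radix-sort output, and CUB temp storage), not CuMesh's actual allocation scheme:

```python
def peak_edge_bytes(num_faces: int, key_bytes: int = 8, num_buffers: int = 3) -> int:
    """Rough upper bound on get_edges peak VRAM: each of the ~num_buffers
    device buffers holds F*3 64-bit edge keys. Illustrative model only."""
    return num_faces * 3 * key_bytes * num_buffers

# e.g. a 50M-face mesh under this model peaks around
# peak_edge_bytes(50_000_000) / 2**30 ≈ 3.35 GiB for edge keys alone,
# before counting vertex/face storage and other fill_holes buffers.
```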
Likely causes
- get_edges allocates several buffers proportional to F*3 plus CUB temp storage → high peak VRAM.
- CuMesh caches are tied to the instance and are not automatically freed. Even if users create a new CuMesh() per call, Python GC may not destroy objects immediately, so cudaMalloc-backed buffers can linger, causing fragmentation or sustained high usage.
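The GC point is easy to demonstrate in miniature. The toy class below stands in for a cudaMalloc-backed buffer (it is not CuMesh's API): an explicit free() releases deterministically, while relying on `del` / `__del__` is only prompt under CPython refcounting and can be delayed by reference cycles:

```python
import gc

class DeviceBuffer:
    """Toy stand-in for a cudaMalloc-backed buffer (illustrative only)."""
    live = 0  # counts buffers still holding (pretend) device memory

    def __init__(self):
        DeviceBuffer.live += 1
        self.freed = False

    def free(self):
        if not self.freed:  # deterministic, explicit release
            self.freed = True
            DeviceBuffer.live -= 1

    def __del__(self):  # GC-driven release: timing is not guaranteed
        self.free()

buf = DeviceBuffer()
buf.free()                  # VRAM back immediately
assert DeviceBuffer.live == 0

buf2 = DeviceBuffer()
del buf2                    # drops a reference; CPython happens to free now,
gc.collect()                # but with cycles the buffer can outlive the call
```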
Addendum: lack of clear_cache()
In long-lived services, if the call path doesn't explicitly call clear_cache(), cached buffers persist and OOM becomes more likely. It would help to aggressively free temp/cache buffers after heavy ops like fill_holes, or to clearly document that callers must do so.
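Until such cleanup is automatic, service code can enforce it with a context manager. The sketch below assumes a clear_cache()-style method (proposed here, not confirmed current API) and guards against it being absent:

```python
from contextlib import contextmanager

@contextmanager
def freed_after(mesh):
    """Release cached device buffers even if the heavy op raises.
    Assumes a clear_cache()-style method exists (hypothetical API)."""
    try:
        yield mesh
    finally:
        if hasattr(mesh, "clear_cache"):
            mesh.clear_cache()

# usage in a long-lived service:
# with freed_after(CuMesh(verts, faces)) as m:
#     m.fill_holes()
```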
Suggestions
- Free temp buffers earlier in get_edges / fill_holes
- Provide low-mem / chunked path
- Reuse CUB temp storage
- Expose stats (E/B/L) for upstream guard
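For the low-mem / chunked suggestion, here is a CPU-side sketch of the idea: instead of materialising all F*3 edge keys at once, build and deduplicate keys per chunk of faces, bounding the working set by the chunk size. This is a hypothetical reference shape, not the CUDA implementation:

```python
def edges_chunked(faces, chunk_faces=1_000_000):
    """Hypothetical low-memory path: emit the three undirected edges of each
    face and deduplicate incrementally, one chunk of faces at a time."""
    seen = set()
    for start in range(0, len(faces), chunk_faces):
        for a, b, c in faces[start:start + chunk_faces]:
            for u, v in ((a, b), (b, c), (c, a)):
                seen.add((u, v) if u < v else (v, u))  # canonical order
    return sorted(seen)

# two triangles sharing edge (0, 2):
# edges_chunked([(0, 1, 2), (0, 2, 3)])
```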
Env
- CuMesh commit: 8290b77
- GPU: NVIDIA L20 (48GB)
- PyTorch 2.6.0 / CUDA 12.4
- OS: Linux x86_64
- Stage: fill_holes / get_edges