Conversation
There was a problem hiding this comment.
It looks like remove_duplicate_faces() has gone through a significant refactor and now introduces Thrust as a dependency.
Could you share more context on the motivation behind this change? Specifically, which CUB feature is broken or unavailable in CUDA 11.8 that necessitated this switch?
Also, were any alternative approaches considered to avoid adding a new dependency?
|
Thanks for your contribution! I have some concerns before merging the change. It would be great if you could add more context. |
|
Thanks for your timely response. This PR addresses compatibility issues with CUDA 11.8 by updating two CUB-based operations that rely on newer API features not available in older CUDA toolkits.:
Problem: Compilation failed with the error:
This occurred because CUDA 11.8’s bundled CUB version requires explicit input and output iterators for ExclusiveSum. The original code used an in-place pattern that relies on overloads only available in CUDA 12.4+. Solution:
Problem: Compilation failed due to:
The compilation error occurs because cub::DeviceRadixSort operates on bitwise representations. While newer versions of CUB allow flexible custom Decomposers to handle structs like int3, the version included in CUDA 11.8 is strictly limited to arithmetic types (or types with explicit UnsignedBits traits). The int3_decomposer pattern used in the original code does not match the template signature required by the older CUB API, causing the no instance matches argument list error. I attempted to stay within CUB by using cub::DeviceMergeSort with a custom comparator (which usually handles structs better than RadixSort). However, this also failed to compile on CUDA 11.8 due to issues resolving the comparator for the int3 type within the device lambda context of that specific CUB version. Solution:
I’m more than happy to run additional experiments to confirm the stability and correctness of these changes. |
|
I see. Then let’s use Thrust for CUDA < 12.4. Example: #if defined(CUDART_VERSION) && (CUDART_VERSION < 12040)
#include <thrust/...>
#endif
...
#if defined(CUDART_VERSION) && (CUDART_VERSION < 12040)
// CUDA < 12.4: use Thrust implementation
// Thrust-based code here
#else
// CUDA >= 12.4: keep existing behavior
// current implementation
#endifThis way we can keep the new path well isolated and minimize risk for newer CUDA versions. |
|
Implemented as requested: the Thrust-based remove_duplicate_faces path is now isolated behind #if defined(CUDART_VERSION) && (CUDART_VERSION < 12040), and CUDA >= 12.4 keeps the original CUB DeviceRadixSort::SortPairs behavior. Validation:
Please take another look when convenient. |
As some issues stated https://github.com/microsoft/TRELLIS.2/issues/34, current implementation uses some new features on CUB on cu124. I changes the implementation so that the code could be correctly built on cu118.