RAPIDS accelerated UDF examples build environment does not match spark-rapids-jni environment #362
RAPIDS may drop support for CentOS 7 in the upcoming release, and has Ubuntu 20.04 as the minimum required version (https://docs.rapids.ai/install#system-req). Does that change what we need to do here? Or do we still need to ensure the Dockerfile used by the examples uses the same setup as spark-rapids-jni, and update the spark-rapids-jni setup to account for the new minimum OS versions? Ref: https://endoflife.software/operating-systems/linux/red-hat-enterprise-linux-rhel
This. The bottom line is that the examples need to build in the same environment as spark-rapids-jni does, regardless of what that environment actually is. Note that we still want to build spark-rapids-jni in a way that allows a single binary to run on all supported OSes, and I'm doubtful we can simply build with Ubuntu 20.04's default toolchain and still satisfy that requirement.
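For context on the toolchain concern above, here is a minimal sketch (not from this issue, and assuming GCC with libstdc++) that prints which std::string ABI a given toolchain targets by default. CentOS 7's devtoolset compilers can only target the pre-C++11 ABI (the macro is 0), while Ubuntu's default GCC targets the new ABI (the macro is 1); objects built with different settings mangle std::string differently and cannot resolve each other's symbols.

```cpp
// Minimal sketch, illustration only: report the libstdc++ dual-ABI setting
// this toolchain compiles against by default.
#include <iostream>
#include <string>

int main() {
  // _GLIBCXX_USE_CXX11_ABI is defined by the libstdc++ headers: 0 = old ABI
  // (CentOS 7 / devtoolset), 1 = new ABI (Ubuntu 18.04/20.04 default GCC).
  std::cout << "_GLIBCXX_USE_CXX11_ABI = " << _GLIBCXX_USE_CXX11_ABI << '\n';
  std::cout << "std::string is "
            << (_GLIBCXX_USE_CXX11_ABI ? "std::__cxx11::basic_string<char>"
                                       : "std::basic_string<char>")
            << '\n';
  return 0;
}
```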
If we plan to update the spark-rapids-jni build setup in 24.04, we can address this issue after spark-rapids-jni has changed.
It looks like RAPIDS will deprecate CentOS 7 in 24.04 and stop supporting it in 24.06, per rapidsai/docs#475. For 24.04 we should make sure the Dockerfile used for the examples matches the one used for spark-rapids-jni (CentOS 7 + devtoolset). In parallel we should figure out what our minimum toolchain will be, so we are ready in 24.06.
Hi @YanxuanLiu, is it possible to use the same Dockerfile to build the UDF examples as the JNI?
Sorry, but I think @NvTimLiu could help with this issue; I haven't dealt with it.
Hi @NvTimLiu, can you check whether it's possible to use the same Docker image to build the UDF examples as the JNI?
It would be good for CI to use the same Docker image as the rapids JNI to build the UDF examples. We currently have a Dockerfile specified for building them: https://github.com/NVIDIA/spark-rapids-examples/blob/main/examples/UDF-Examples/RAPIDS-accelerated-UDFs/Dockerfile. Shall we remove it and document that we build the UDF examples with the rapids JNI Docker image?
Discussed with Gary: we'll use the same Docker image in the CI job and document a link to the Dockerfile in spark-rapids-jni. I'll handle it.
The Dockerfile used for the RAPIDS accelerated native UDF example build environment is using Ubuntu 18.04, but the build environment used by spark-rapids-jni for the libcudf.so that is placed in the RAPIDS Accelerator jar is CentOS 7 + devtoolset. That means code could be crossing the GCC CXX11 ABI streams, leading to failures to find symbols at runtime when trying to load the native UDF shared library, e.g.:
which, when run through cu++filt, shows this is a failure to find:
The Dockerfile used by the examples should use the same setup as spark-rapids-jni to avoid this. We should also add a RAPIDS accelerated native UDF that uses a string_scalar with a std::string argument to help catch this ABI mismatch in the future.
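As a sketch of the kind of UDF helper that last sentence suggests (hypothetical names and file layout; the actual example added to the repo may differ), something like the following pulls in a libcudf entry point whose mangled name contains std::string, so a CXX11 ABI mismatch between the UDF build environment and the libcudf inside the RAPIDS Accelerator jar fails loudly at load time instead of going unnoticed:

```cpp
// Hedged sketch, not the actual repo example: exercise a std::string-taking
// libcudf API so that an ABI mismatch surfaces as an undefined-symbol error
// when the native UDF shared library is loaded.
#include <memory>
#include <string>

#include <cudf/column/column.hpp>
#include <cudf/column/column_factories.hpp>
#include <cudf/scalar/scalar.hpp>
#include <cudf/types.hpp>

namespace udf_examples {

// Builds a column containing `num_rows` copies of `value`. Constructing the
// cudf::string_scalar from a std::string references a libcudf symbol whose
// mangled name encodes the std::string ABI in use, which is exactly what an
// ABI-mismatch canary needs.
std::unique_ptr<cudf::column> make_repeated_string_column(std::string const& value,
                                                          cudf::size_type num_rows)
{
  cudf::string_scalar repeated{value};  // std::string constructor of string_scalar
  return cudf::make_column_from_scalar(repeated, num_rows);
}

}  // namespace udf_examples
```

If this translation unit is compiled in the same CentOS 7 + devtoolset image used by spark-rapids-jni (and therefore with the same _GLIBCXX_USE_CXX11_ABI setting as the bundled libcudf), the symbol resolves; built with a mismatched toolchain, loading the UDF library fails immediately, which is the behavior we want the example to catch.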