Describe the bug
I am building CUTLASS 3.4.1 with the Intel LLVM (oneAPI) compiler, version 2024.0. Everything builds, and all the tests pass, except for one: TensorRef.rank2_column_major_interleaved, part of ctest_unit_core. If I swap out Intel with GCC 7.4, the entire test suite passes clean.
Because the test is operating on a matrix of integers, I'm scratching my head as to what could be going on here; this clearly isn't some minor floating-point instability, and it's quite a difference to chalk up to some alternate interpretation of C++. I augmented the test code to print out the content of matrix_data[], placed output from the two builds side by side, and copied it below. I am hoping some pattern will be apparent:
Steps/Code to reproduce bug
In theory, installing the oneAPI compiler and building with it should be enough to reproduce the issue.
Expected behavior
The test at issue should pass equally well with both compilers.
Environment details:
Building and running inside Docker, Red Hat Linux UBI8-based container.
Compiler: Intel(R) oneAPI DPC++/C++ Compiler 2024.0.0 (2024.0.0.20231017)
Compiler builds against the GCC 7.4.0 C/C++ runtime
Additional context
The test fails with or without my warning fixes (#1380) applied.
This is curious indeed. Can you check whether it's optimizing the host code to use SIMD instructions and, if so, disable them and try again? It's unclear to me whether this is a bug in CUTLASS or in the oneAPI compiler.
This is, in fact, meant to be a debug build. The exact CXXFLAGS I'm using are --gcc-install-dir=/path/to/gcc/7.4.0/lib/gcc/x86_64-pc-linux-gnu/7.4.0 -std=c++17 -ffp-model=precise -fmessage-length=0 -fstack-protector-strong -fno-strict-aliasing -O0 -g -fstack-security-check.
CUTLASS is adding -O3 to CUDA_FLAGS, if that makes a difference (not sure if that gets passed to the host compiler).
I've found that the -O3 is coming from CMake, via the default CMAKE_BUILD_TYPE=Release. If I set CMAKE_BUILD_TYPE=None (so that it only uses the flags I specify), then the -O3 goes away, and so does the test failure.
Indications thus appear to be some kind of optimizer bug in the host compiler, so I am closing this issue for now. If I find any CUTLASS-side problems, I'll re-file here.
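For anyone who wants to check a suspected host-optimizer problem like this outside of CUTLASS, a minimal standalone sketch along the following lines may help. It is not the CUTLASS test or the CUTLASS layout class: `interleaved_offset` and `count_mismatches` are hypothetical helpers, and the offset formula is an assumed form of a column-major layout with columns interleaved in groups of `kInterleave`. It fills an integer matrix through that mapping, reads it back, and counts elements that do not match.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not CUTLASS code): maps (row, column) to a linear
// offset for a column-major layout whose columns are interleaved in groups
// of kInterleave. `stride` is the element distance between column groups.
template <int kInterleave>
std::size_t interleaved_offset(int row, int column, std::size_t stride) {
  std::size_t column_major = static_cast<std::size_t>(column / kInterleave);
  std::size_t column_minor = static_cast<std::size_t>(column % kInterleave);
  return column_major * stride +
         static_cast<std::size_t>(row) * kInterleave + column_minor;
}

// Hypothetical helper: writes a distinct integer to every (row, column)
// through the mapping, then counts slots that were never written plus
// elements that read back with a different value. A correct build returns 0.
template <int kInterleave>
int count_mismatches(int rows, int columns) {
  assert(columns % kInterleave == 0);  // assumption: whole column groups only
  std::size_t stride = static_cast<std::size_t>(rows) * kInterleave;
  std::vector<int> data(static_cast<std::size_t>(rows) * columns, -1);

  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < columns; ++c) {
      data[interleaved_offset<kInterleave>(r, c, stride)] = r * columns + c;
    }
  }

  int mismatches = 0;
  for (int value : data) {
    if (value == -1) ++mismatches;  // slot never written: mapping has a hole
  }
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < columns; ++c) {
      int expected = r * columns + c;
      if (data[interleaved_offset<kInterleave>(r, c, stride)] != expected) {
        ++mismatches;
      }
    }
  }
  return mismatches;
}
```

Calling `count_mismatches<4>(16, 8)` from a small `main` and building that program once with -O0 and once with -O3 using the same compiler gives a quick comparison. A nonzero result only at -O3 would point toward the host optimizer rather than the library, though a clean result here does not rule out an issue specific to the actual CUTLASS code.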