-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix several integer-signedness warnings #1380
base: main
Are you sure you want to change the base?
Conversation
Hi! Thanks a lot for this MR! We are about to release CUTLASS 3.5 in the next 10 days or so. This update includes tons of fixes for warnings as we move towards a warnings as errors build system. Would you mind waiting for this MR until we release 3.5 and then rebasing to see if there is anything we have missed? |
Hi @thakkarV, is there a repo where I can try out the new version? I have some additional (non-warning) fixes for the current tree, so I'll push those out ASAP. |
No other repo unfortunately. We expect 3.5 to be available in this repo by March 15th. |
include/cutlass/arch/mma_sm60.h
Outdated
@@ -84,7 +84,7 @@ struct Mma< | |||
|
|||
#else | |||
CUTLASS_PRAGMA_UNROLL | |||
for (int i = 0; i < 2; ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that size_t
is normally unsigned and the integer literal 2
is normally signed, it seems like this change would actually introduce a build warning, rather than fix one. Is size_t
signed on your platform?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, size_t
is effectively unsigned long
. But the compiler doesn't appear to like int
used as an array index. It doesn't complain about mixing size_t
with integer literals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the compiler doesn't appear to like
int
used as an array index.
We don't have that experience with Clang as a host compiler. We would generally prefer to retain use of int
where it makes sense to do so. P0907 ("Signed Integers are Two’s Complement"), which was adopted into C++20, was revised in R1 "to maintain undefined behavior when signed integer overflow occurs, instead of defining wrapping behavior," as unsigned integers do. Compilers historically have relied on that undefined behavior to optimize loops with signed integer indices. This explains the cross-compiler experience (including with Intel's compilers, at least the pre-LLVM one) that signed integers give better vectorization.
In this case, it may not matter, because the loop has an integer literal bound, cannot overflow, and can be trivially unrolled. Nevertheless, introducing size_t
makes the index have a different signedness than the bound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the compiler doesn't appear to like int used as an array index.
What warning does it issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I know what's going on -- the compiler must be warning about cutlass::Array::operator[]
, which takes size_type
(which is size_t
).
It's utterly conventional to use int
to index, say, std::vector
. I'm a bit surprised that the compiler warns about this. We could consider adding int
overloads of cutlass::Array::operator[]
as a work-around. If you have a chance, could you try doing that? They would go in include/cutlass/array.h
. Thanks! : - )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, that doesn't seem to help. Either the overload is too similar for the compiler's taste...
/tmp/cutlass-git/include/cutlass/gemm/warp/mma_complex_tensor_op_fast_f32.h(161): error: more than one operator "[]" matches these operands:
function "cutlass::Array<T, N, true>::operator[](int) const [with T=cutlass::complex<float>, N=8]"
/tmp/cutlass-git/include/cutlass/array.h(396): here
function "cutlass::Array<T, N, true>::operator[](cutlass::Array<T, N, true>::size_type) const [with T=cutlass::complex<float>, N=8]"
/tmp/cutlass-git/include/cutlass/array.h(401): here
operand types are: const cutlass::Array<cutlass::complex<float>, 8, true> [ cutlass::layout::ColumnMajor::LongIndex ]
detected during:
[instantiation contexts mercifully elided]
or the conversion warnings still occur. I tried having the operators take int
only, deleting the size_t
ones. But the []
operators in include/cutlass/array_subbyte.h
also appear to be involved, and they're implemented using at()
which also takes a size_t
... this quickly turns into a rabbit hole.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iskunk Thanks for trying this out! I appreciate you taking the time.
We'll try out some possible fixes with the most recent Clang as host compiler, and NVCC as device compiler. That is currently the only supported LLVM configuration. If we can't get it to work, then unfortunately we won't be able to accept these particular index type changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iskunk FYI, I added a unit test specifically to exercise cutlass::Array
in host code, but couldn't replicate these build errors with Clang 17.0.6 (as the host compiler).
We have two reasons not to add operator[]
overloads to cutlass::Array
.
- It's not consistent with C++ Standard Library array types such as
std::array
andstd::vector
. - It introduces the possibility of ambiguity in
operator[]
overload resolution.
Do you normally get build warnings when using int
indices with std::array
or std::vector
?
@@ -404,7 +404,7 @@ class TileIteratorTensorOpMixed<WarpShape_, OperatorShape_, int32_t, 32, OutputS | |||
TensorRef const &ref, | |||
unsigned lane_id | |||
): | |||
stride_(ref.stride()[0] / AccessType::kElements) { | |||
stride_(size_t(ref.stride()[0]) / AccessType::kElements) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stride_
is an int
. It's not clear to me how casting to size_t
would fix an integer conversion warning. Is size_t
signed on your platform?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No; the compiler was complaining about an implicit conversion from Index
(i.e. long
) to size_t
in the divide operation, I believe due to AccessType::kElements
being a size_t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining!
@@ -133,7 +133,7 @@ struct TestbedUniversal { | |||
else if (dist_kind == cutlass::Distribution::Sequential) { | |||
|
|||
cutlass::reference::host::BlockFillSequential( | |||
view.data(), view.capacity()); | |||
view.data(), int64_t(view.capacity())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: BlockFillSequential
's int64_t capacity
parameter gets cast internally anyway, so it might make sense instead to change BlockFillSequential
instead.
include/cutlass/numeric_conversion.h
Outdated
@@ -3705,7 +3705,7 @@ struct PackPredicates { | |||
int word_idx = (i / kWordSize); | |||
int bit_idx = (i % kWordSize); | |||
|
|||
uint8_t mask = ((predicates[i] ? 1u : 0u) << bit_idx); | |||
uint8_t mask = uint8_t((predicates[i] ? 1u : 0u) << bit_idx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "Don't Repeat Yourself" principle suggests using auto
here.
uint8_t mask = uint8_t((predicates[i] ? 1u : 0u) << bit_idx); | |
auto mask = uint8_t((predicates[i] ? 1u : 0u) << bit_idx); |
include/cutlass/layout/permute.h
Outdated
@@ -514,7 +514,7 @@ class Tensor4DPermuteBMM0321ColumnMajorInverse : public PermuteBase { | |||
CUTLASS_HOST_DEVICE | |||
LongIndex operator()(MatrixCoord coord) const { | |||
|
|||
Index BMM_batch_idx = blockIdx.z; | |||
Index BMM_batch_idx = Index(blockIdx.z); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "Don't Repeat Yourself" principle suggests using auto
here.
Index BMM_batch_idx = Index(blockIdx.z); | |
auto BMM_batch_idx = Index(blockIdx.z); |
include/cutlass/layout/permute.h
Outdated
@@ -453,7 +453,7 @@ class Tensor4DPermuteBMM0321ColumnMajor : public PermuteBase { | |||
CUTLASS_HOST_DEVICE | |||
LongIndex operator()(MatrixCoord coord) const { | |||
|
|||
Index BMM_batch_idx = blockIdx.z; | |||
Index BMM_batch_idx = Index(blockIdx.z); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "Don't Repeat Yourself" principle suggests using auto
here.
Index BMM_batch_idx = Index(blockIdx.z); | |
auto BMM_batch_idx = Index(blockIdx.z); |
include/cutlass/layout/permute.h
Outdated
@@ -381,7 +381,7 @@ class Tensor4DPermuteBMM0213RowMajorInverse : public PermuteBase { | |||
LongIndex operator()(MatrixCoord coord) const { | |||
|
|||
// The batch index for BMM | |||
Index BMM_batch_idx = blockIdx.z; | |||
Index BMM_batch_idx = Index(blockIdx.z); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "Don't Repeat Yourself" principle suggests using auto
here.
Index BMM_batch_idx = Index(blockIdx.z); | |
auto BMM_batch_idx = Index(blockIdx.z); |
include/cutlass/layout/permute.h
Outdated
@@ -318,7 +318,7 @@ class Tensor4DPermuteBMM0213RowMajor : public PermuteBase { | |||
LongIndex operator()(MatrixCoord coord) const { | |||
|
|||
// The batch index for BMM | |||
Index BMM_batch_idx = blockIdx.z; | |||
Index BMM_batch_idx = Index(blockIdx.z); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "Don't Repeat Yourself" principle suggests using auto
here.
Index BMM_batch_idx = Index(blockIdx.z); | |
auto BMM_batch_idx = Index(blockIdx.z); |
include/cutlass/half.h
Outdated
@@ -248,7 +248,7 @@ struct alignas(2) half_t { | |||
|
|||
if (exp >= -14) { | |||
// normal fp32 to normal fp16 | |||
exp = uint16_t(exp + uint16_t(15)); | |||
exp = exp + 15; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it undoes a fix, rather than fixing a warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the only way I could find to shut the warning up. (Not even exp += 15;
worked.)
No implication of this being the best solution, of course. It did need attention, however, as this was one of the top two sources of warnings in the build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining! Now that I look more closely, this makes sense, as exp
is int16_t
, a signed integer. Nevertheless, I would expect the compiler to warn about exp + 15
being a narrowing conversion (first exp
promotes to int
, then the result is int
, and the assignment narrows to 16 bits). Would the following line also work?
exp = exp + 15; | |
exp = static_cast<decltype(exp)>(exp + 15); |
Use of decltype(exp)
here follows the Don't Repeat Yourself principle and prevents possible bugs if the type of exp
later changes above.
include/cutlass/half.h
Outdated
@@ -224,7 +224,7 @@ struct alignas(2) half_t { | |||
#endif | |||
|
|||
uint16_t sign = uint16_t((s >> 16) & 0x8000); | |||
int16_t exp = uint16_t(((s >> 23) & 0xff) - 127); | |||
int16_t exp = int16_t(((s >> 23) & 0xff) - 127); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "Don't Repeat Yourself" principle suggests using auto
here.
int16_t exp = int16_t(((s >> 23) & 0xff) - 127); | |
auto exp = int16_t(((s >> 23) & 0xff) - 127); |
@@ -100,7 +100,7 @@ struct ConvertAndPack<bfloat16_t, float, N, Round> { | |||
|
|||
CUTLASS_PRAGMA_UNROLL | |||
for (int i = 0; i < N; ++i) { | |||
int idx = (((i << 1) & 2) | ((i >> 1) & 1) | (i & 0xfffffffc)); | |||
int idx = (((i << 1) & 2) | ((i >> 1) & 1) | (i & 0x7ffffffc)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing the intent of these two changes was to remove the sign bit; is that correct?
We're already in trouble if i
rolls over to a negative number here, so this is probably OK. However, I'm generally nervous about changes to constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the sign bit was causing the compiler to recognize the constant as a negative signed value, but then the bit operations implied unsignedness.
My fix is focused on the warning, but you're probably better off replacing int
with uint32_t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean like this?
int idx = (((i << 1) & 2) | ((i >> 1) & 1) | (i & 0x7ffffffc)); | |
uint32_t idx = (((i << 1) & 2u) | ((i >> 1) & 1u) | (i & 0xfffffffc)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think I misremembered the effect of the sign bit: it didn't cause the literal to get interpreted as negative/signed, but as unsigned, and then the complaint was with i
that was signed.
Here is the full warning from the suggested code above (which I copied in verbatim):
/tmp/cutlass-git/include/cutlass/gemm/warp/mma_tensor_op.h:103:55: warning: implicit conversion changes signedness: 'int' to 'unsigned int' [-Wsign-conversion]
103 | uint32_t idx = (((i << 1) & 2U) | ((i >> 1) & 1U)) | (i & 4294967292U);
| ^ ~
/tmp/cutlass-git/include/cutlass/gemm/warp/mma_tensor_op.h:103:39: warning: implicit conversion changes signedness: 'int' to 'unsigned int' [-Wsign-conversion]
103 | uint32_t idx = (((i << 1) & 2U) | ((i >> 1) & 1U)) | (i & 4294967292U);
| ~~^~~~ ~
/tmp/cutlass-git/include/cutlass/gemm/warp/mma_tensor_op.h:103:21: warning: implicit conversion changes signedness: 'int' to 'unsigned int' [-Wsign-conversion]
103 | uint32_t idx = (((i << 1) & 2U) | ((i >> 1) & 1U)) | (i & 4294967292U);
| ~~^~~~ ~
Rather than casting i
, I'll just make it uint32_t
as well. The u
suffixes don't appear to be needed.
(element_C_bytes_ * size_t(problem_size.m()) * size_t(problem_size.n())) + | ||
(element_A_bytes_ * size_t(problem_size.m()) * size_t(problem_size.k())) + | ||
(element_B_bytes_ * size_t(problem_size.k()) * size_t(problem_size.n())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit surprised that this was necessary, as the normal integer conversion rules should turn these into size_t
already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think originally it was size_t * int * int
... is conversion of the latter two in the C++ standard? I'm assuming GCC didn't warn about these as it is more lax, but I doubt Intel's compiler (which is basically a repackaged LLVM) would be stricter in a way that breaks the standard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is conversion of the latter two in the C++ standard?
It's valid C++. The expression evaluates left to right. For size_t * int
, size_t
has higher conversion rank (assuming a 64-bit compiler), so int
converts to size_t
.
Compilers might still warn, though.
auto log_swizzle_size = UnderlyingParams::get_log_swizzle_size(problem_blocks.x, problem_blocks.y, max_swizzle); | ||
problem_blocks.x = round_up(problem_blocks.x, (1 << log_swizzle_size) * cluster_shape.m()); | ||
problem_blocks.y = round_up(problem_blocks.y, (1 << log_swizzle_size) * cluster_shape.n()); | ||
auto log_swizzle_size = UnderlyingParams::get_log_swizzle_size(int(problem_blocks.x), int(problem_blocks.y), max_swizzle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: Look more closely at this.
@@ -647,7 +647,7 @@ struct PersistentTileSchedulerSm90StreamKParams { | |||
|
|||
// Calculate the number of stream-K units that would be needed if each stream-K unit | |||
// computed the minimum allowable k iterations. Truncate this to be in units of clusters. | |||
auto cluster_size = cluster_shape.m() * cluster_shape.n(); | |||
uint64_t cluster_size = cluster_shape.m() * cluster_shape.n(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how these three changes (that explicitly specify uint64_t
instead of using auto
) would fix warnings.
cluster_shape
is a GemmCoord
, so m()
and n()
should be int
. Thus, explicitly specifying uint64_t
here should actually either introduce a warning (signed to unsigned conversion), or mask possible overflow in the product of two int
values (by converting some negative value due to overflow to a large unsigned value).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part of the code changed since my original push, and I'm no longer getting a warning from the plain auto
. I don't recall if I was getting an overflow warning to make it 64-bit... but in any event, I've reverted this.
@@ -605,7 +605,7 @@ class TileIteratorTensorOpMixed<WarpShape_, OperatorShape_, int32_t, 32, OutputS | |||
TensorRef const &ref, | |||
unsigned lane_id | |||
): | |||
stride_(ref.stride()[0] / AccessType::kElements) { | |||
stride_(size_t(ref.stride()[0]) / AccessType::kElements) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stride_
is an int
. It's not clear to me how casting to size_t
would fix an integer conversion warning. Is size_t
signed on your platform?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -246,7 +246,7 @@ template <typename value_t> | |||
CUTLASS_HOST_DEVICE value_t find_log2(value_t x) { | |||
int a = int(31 - clz(x)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit surprised that you didn't have to change this line, as clz(x)
returns value_t
(see the clz
function template immediately above this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I didn't fix all the warnings, only the ones that tripped a thousand times or more. The full list in the form that I provided is 1650 lines long. I had to draw the line somewhere :-]
(incidentally, if I filter out all -Wsign-conversion
, -Wimplicit-int-conversion
, and -Wimplicit-float-size-conversion
warnings, I am left with eight unique warnings overall.)
include/cutlass/fast_math.h
Outdated
@@ -365,7 +365,7 @@ struct FastDivmod { | |||
FastDivmod(int divisor): divisor(divisor) { | |||
|
|||
if (divisor != 1) { | |||
unsigned int p = 31 + find_log2(divisor); | |||
unsigned int p = 31 + unsigned(find_log2(divisor)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This relies on "int
+ unsigned
" implicitly converting to unsigned
, and also mentions "unsigned
" twice. Would you instead consider converting the entire expression to unsigned int
and using auto
on the left-hand side, to avoid repetition?
unsigned int p = 31 + unsigned(find_log2(divisor)); | |
auto p = static_cast<unsigned int>(31 + find_log2(divisor)); |
Hi @mhoemmen, thanks for looking at this. To help answer many of the above questions, let me provide you with an updated list of the warnings generated from my build, specifically for an unmodified Git revision f9ece1b. This was generated with
You might consider it worth the trouble to reproduce these with your own installation of the Intel oneAPI compiler, to cut out the middleman. I will read through your review comments and respond/revise as appropriate. Then a test build, and (hopefully) an update to the PR. |
28915a5
to
699178b
Compare
I've updated and rebased my PR. With these changes, the most frequent warning occurs 969 times. |
@iskunk Thanks for your interest in CUTLASS and for all your work!
Please note that the Intel oneAPI compiler is not currently a supported or tested build configuration. We do support and test Clang as host compiler plus NVCC as device compiler. |
This PR has been labeled |
699178b
to
21c0d6b
Compare
Hello @mhoemmen, have you had a chance to try out these changes? I've rebased the PR to current main. |
@iskunk Thank you for your interest in CUTLASS! Changes like the following, that replace CUTLASS_PRAGMA_UNROLL
for (size_t k = 0; k < 2; ++k) { cannot possibly reduce the number of warnings, given that the C++ Standard guarantees that
We do not support the Intel oneAPI compiler. We would consider code changes that happen to improve the oneAPI compiler experience as long as they improve code quality overall, but changes like the above objectively don't do that. |
I think the oneAPI warnings on that bit come down to |
@iskunk Thanks for the suggestion! We contemplated overloading I've implemented an example and run it through the latest icx (and the 2024 and 2023 releases). https://godbolt.org/z/P3ch9T9x1 For x[0] = 1.0f;
x[1] = 2.0f;
for (int k = 0; k < 10; ++k) {
x[k] = static_cast<float>(k) + 1.0f;
} Please let me know if I'm using |
This PR has been labeled |
21c0d6b
to
d99c9bc
Compare
Argh, this slipped through the cracks. My apologies. I was not able to reproduce those particular warnings in Godbolt. It is possible that the specific C++ backend in use has something to do with it. In any event, given that those fixes are only a few out of the bunch, I've rebased the PR and updated it to remove them. The remainder should be, as I've understood, the ones on which there is agreement. Please let me know if any further work is needed. |
This PR has been labeled |
Please let me know if there are any further concerns here. My understanding is that the remaining changes are reasonable disambiguations that have not drawn any objections. |
This PR has been labeled |
I am building cutlass 3.4.1 using the Intel LLVM (oneAPI) compiler. There are a lot of warnings being produced related to implicit integer sign conversions.
This PR does not fix all of them, but it addresses all the instances that individually result in over a thousand warnings. Test suite results are left no worse off, but please review these changes carefully.
Here is the bulk of the warnings removed from the build output, sorted in order of decreasing count: