[BUG] Failure in TensorRef.rank2_column_major_interleaved test with Intel LLVM compiler #1388

Closed
iskunk opened this issue Mar 8, 2024 · 3 comments
Labels: bug

Comments

@iskunk
Contributor

iskunk commented Mar 8, 2024

Describe the bug
I am building CUTLASS 3.4.1 with the Intel LLVM (oneAPI) compiler, version 2024.0. Everything builds, and all the tests pass, except for one: TensorRef.rank2_column_major_interleaved, part of ctest_unit_core. If I swap out Intel with GCC 7.4, the entire test suite passes clean.

Because the test is operating on a matrix of integers, I'm scratching my head as to what could be going on here; this clearly isn't some minor floating-point instability, and it's quite a difference to chalk up to some alternate interpretation of C++. I augmented the test code to print out the content of matrix_data[], placed output from the two builds side by side, and copied it below. I am hoping some pattern will be apparent:

# GCC build (GOOD)	Intel LLVM build (BAD)
matrix_data[0] = 0	matrix_data[0] = 48
matrix_data[1] = 16	matrix_data[1] = 0
matrix_data[2] = 32	matrix_data[2] = 0
matrix_data[3] = 48	matrix_data[3] = 0
matrix_data[4] = 1	matrix_data[4] = 0
matrix_data[5] = 17	matrix_data[5] = 49
matrix_data[6] = 33	matrix_data[6] = 0
matrix_data[7] = 49	matrix_data[7] = 0
matrix_data[8] = 2	matrix_data[8] = 0
matrix_data[9] = 18	matrix_data[9] = 0
matrix_data[10] = 34	matrix_data[10] = 50
matrix_data[11] = 50	matrix_data[11] = 0
matrix_data[12] = 3	matrix_data[12] = 0
matrix_data[13] = 19	matrix_data[13] = 0
matrix_data[14] = 35	matrix_data[14] = 0
matrix_data[15] = 51	matrix_data[15] = 51
matrix_data[16] = 4	matrix_data[16] = 52
matrix_data[17] = 20	matrix_data[17] = 0
matrix_data[18] = 36	matrix_data[18] = 0
matrix_data[19] = 52	matrix_data[19] = 0
matrix_data[20] = 5	matrix_data[20] = 0
matrix_data[21] = 21	matrix_data[21] = 53
matrix_data[22] = 37	matrix_data[22] = 0
matrix_data[23] = 53	matrix_data[23] = 0
matrix_data[24] = 6	matrix_data[24] = 0
matrix_data[25] = 22	matrix_data[25] = 0
matrix_data[26] = 38	matrix_data[26] = 54
matrix_data[27] = 54	matrix_data[27] = 0
matrix_data[28] = 7	matrix_data[28] = 0
matrix_data[29] = 23	matrix_data[29] = 0
matrix_data[30] = 39	matrix_data[30] = 0
matrix_data[31] = 55	matrix_data[31] = 55
matrix_data[32] = 8	matrix_data[32] = 56
matrix_data[33] = 24	matrix_data[33] = 0
matrix_data[34] = 40	matrix_data[34] = 0
matrix_data[35] = 56	matrix_data[35] = 0
matrix_data[36] = 9	matrix_data[36] = 0
matrix_data[37] = 25	matrix_data[37] = 57
matrix_data[38] = 41	matrix_data[38] = 0
matrix_data[39] = 57	matrix_data[39] = 0
matrix_data[40] = 10	matrix_data[40] = 0
matrix_data[41] = 26	matrix_data[41] = 0
matrix_data[42] = 42	matrix_data[42] = 58
matrix_data[43] = 58	matrix_data[43] = 0
matrix_data[44] = 11	matrix_data[44] = 0
matrix_data[45] = 27	matrix_data[45] = 0
matrix_data[46] = 43	matrix_data[46] = 0
matrix_data[47] = 59	matrix_data[47] = 59
matrix_data[48] = 12	matrix_data[48] = 60
matrix_data[49] = 28	matrix_data[49] = 0
matrix_data[50] = 44	matrix_data[50] = 0
matrix_data[51] = 60	matrix_data[51] = 0
matrix_data[52] = 13	matrix_data[52] = 0
matrix_data[53] = 29	matrix_data[53] = 61
matrix_data[54] = 45	matrix_data[54] = 0
matrix_data[55] = 61	matrix_data[55] = 0
matrix_data[56] = 14	matrix_data[56] = 0
matrix_data[57] = 30	matrix_data[57] = 0
matrix_data[58] = 46	matrix_data[58] = 62
matrix_data[59] = 62	matrix_data[59] = 0
matrix_data[60] = 15	matrix_data[60] = 0
matrix_data[61] = 31	matrix_data[61] = 0
matrix_data[62] = 47	matrix_data[62] = 0
matrix_data[63] = 63	matrix_data[63] = 63
matrix_data[64] = 64	matrix_data[64] = 112
matrix_data[65] = 80	matrix_data[65] = 0
matrix_data[66] = 96	matrix_data[66] = 0
matrix_data[67] = 112	matrix_data[67] = 0
matrix_data[68] = 65	matrix_data[68] = 0
matrix_data[69] = 81	matrix_data[69] = 113
matrix_data[70] = 97	matrix_data[70] = 0
matrix_data[71] = 113	matrix_data[71] = 0
matrix_data[72] = 66	matrix_data[72] = 0
matrix_data[73] = 82	matrix_data[73] = 0
matrix_data[74] = 98	matrix_data[74] = 114
matrix_data[75] = 114	matrix_data[75] = 0
matrix_data[76] = 67	matrix_data[76] = 0
matrix_data[77] = 83	matrix_data[77] = 0
matrix_data[78] = 99	matrix_data[78] = 0
matrix_data[79] = 115	matrix_data[79] = 115
matrix_data[80] = 68	matrix_data[80] = 116
matrix_data[81] = 84	matrix_data[81] = 0
matrix_data[82] = 100	matrix_data[82] = 0
matrix_data[83] = 116	matrix_data[83] = 0
matrix_data[84] = 69	matrix_data[84] = 0
matrix_data[85] = 85	matrix_data[85] = 117
matrix_data[86] = 101	matrix_data[86] = 0
matrix_data[87] = 117	matrix_data[87] = 0
matrix_data[88] = 70	matrix_data[88] = 0
matrix_data[89] = 86	matrix_data[89] = 0
matrix_data[90] = 102	matrix_data[90] = 118
matrix_data[91] = 118	matrix_data[91] = 0
matrix_data[92] = 71	matrix_data[92] = 0
matrix_data[93] = 87	matrix_data[93] = 0
matrix_data[94] = 103	matrix_data[94] = 0
matrix_data[95] = 119	matrix_data[95] = 119
matrix_data[96] = 72	matrix_data[96] = 120
matrix_data[97] = 88	matrix_data[97] = 0
matrix_data[98] = 104	matrix_data[98] = 0
matrix_data[99] = 120	matrix_data[99] = 0
matrix_data[100] = 73	matrix_data[100] = 0
matrix_data[101] = 89	matrix_data[101] = 121
matrix_data[102] = 105	matrix_data[102] = 0
matrix_data[103] = 121	matrix_data[103] = 0
matrix_data[104] = 74	matrix_data[104] = 0
matrix_data[105] = 90	matrix_data[105] = 0
matrix_data[106] = 106	matrix_data[106] = 122
matrix_data[107] = 122	matrix_data[107] = 0
matrix_data[108] = 75	matrix_data[108] = 0
matrix_data[109] = 91	matrix_data[109] = 0
matrix_data[110] = 107	matrix_data[110] = 0
matrix_data[111] = 123	matrix_data[111] = 123
matrix_data[112] = 76	matrix_data[112] = 124
matrix_data[113] = 92	matrix_data[113] = 0
matrix_data[114] = 108	matrix_data[114] = 0
matrix_data[115] = 124	matrix_data[115] = 0
matrix_data[116] = 77	matrix_data[116] = 0
matrix_data[117] = 93	matrix_data[117] = 125
matrix_data[118] = 109	matrix_data[118] = 0
matrix_data[119] = 125	matrix_data[119] = 0
matrix_data[120] = 78	matrix_data[120] = 0
matrix_data[121] = 94	matrix_data[121] = 0
matrix_data[122] = 110	matrix_data[122] = 126
matrix_data[123] = 126	matrix_data[123] = 0
matrix_data[124] = 79	matrix_data[124] = 0
matrix_data[125] = 95	matrix_data[125] = 0
matrix_data[126] = 111	matrix_data[126] = 0
matrix_data[127] = 127	matrix_data[127] = 127
matrix_data[128] = 128	matrix_data[128] = 176
matrix_data[129] = 144	matrix_data[129] = 0
matrix_data[130] = 160	matrix_data[130] = 0
matrix_data[131] = 176	matrix_data[131] = 0
matrix_data[132] = 129	matrix_data[132] = 0
matrix_data[133] = 145	matrix_data[133] = 177
matrix_data[134] = 161	matrix_data[134] = 0
matrix_data[135] = 177	matrix_data[135] = 0
matrix_data[136] = 130	matrix_data[136] = 0
matrix_data[137] = 146	matrix_data[137] = 0
matrix_data[138] = 162	matrix_data[138] = 178
matrix_data[139] = 178	matrix_data[139] = 0
matrix_data[140] = 131	matrix_data[140] = 0
matrix_data[141] = 147	matrix_data[141] = 0
matrix_data[142] = 163	matrix_data[142] = 0
matrix_data[143] = 179	matrix_data[143] = 179
matrix_data[144] = 132	matrix_data[144] = 180
matrix_data[145] = 148	matrix_data[145] = 0
matrix_data[146] = 164	matrix_data[146] = 0
matrix_data[147] = 180	matrix_data[147] = 0
matrix_data[148] = 133	matrix_data[148] = 0
matrix_data[149] = 149	matrix_data[149] = 181
matrix_data[150] = 165	matrix_data[150] = 0
matrix_data[151] = 181	matrix_data[151] = 0
matrix_data[152] = 134	matrix_data[152] = 0
matrix_data[153] = 150	matrix_data[153] = 0
matrix_data[154] = 166	matrix_data[154] = 182
matrix_data[155] = 182	matrix_data[155] = 0
matrix_data[156] = 135	matrix_data[156] = 0
matrix_data[157] = 151	matrix_data[157] = 0
matrix_data[158] = 167	matrix_data[158] = 0
matrix_data[159] = 183	matrix_data[159] = 183
matrix_data[160] = 136	matrix_data[160] = 184
matrix_data[161] = 152	matrix_data[161] = 0
matrix_data[162] = 168	matrix_data[162] = 0
matrix_data[163] = 184	matrix_data[163] = 0
matrix_data[164] = 137	matrix_data[164] = 0
matrix_data[165] = 153	matrix_data[165] = 185
matrix_data[166] = 169	matrix_data[166] = 0
matrix_data[167] = 185	matrix_data[167] = 0
matrix_data[168] = 138	matrix_data[168] = 0
matrix_data[169] = 154	matrix_data[169] = 0
matrix_data[170] = 170	matrix_data[170] = 186
matrix_data[171] = 186	matrix_data[171] = 0
matrix_data[172] = 139	matrix_data[172] = 0
matrix_data[173] = 155	matrix_data[173] = 0
matrix_data[174] = 171	matrix_data[174] = 0
matrix_data[175] = 187	matrix_data[175] = 187
matrix_data[176] = 140	matrix_data[176] = 188
matrix_data[177] = 156	matrix_data[177] = 0
matrix_data[178] = 172	matrix_data[178] = 0
matrix_data[179] = 188	matrix_data[179] = 0
matrix_data[180] = 141	matrix_data[180] = 0
matrix_data[181] = 157	matrix_data[181] = 189
matrix_data[182] = 173	matrix_data[182] = 0
matrix_data[183] = 189	matrix_data[183] = 0
matrix_data[184] = 142	matrix_data[184] = 0
matrix_data[185] = 158	matrix_data[185] = 0
matrix_data[186] = 174	matrix_data[186] = 190
matrix_data[187] = 190	matrix_data[187] = 0
matrix_data[188] = 143	matrix_data[188] = 0
matrix_data[189] = 159	matrix_data[189] = 0
matrix_data[190] = 175	matrix_data[190] = 0
matrix_data[191] = 191	matrix_data[191] = 191
matrix_data[192] = 192	matrix_data[192] = 240
matrix_data[193] = 208	matrix_data[193] = 0
matrix_data[194] = 224	matrix_data[194] = 0
matrix_data[195] = 240	matrix_data[195] = 0
matrix_data[196] = 193	matrix_data[196] = 0
matrix_data[197] = 209	matrix_data[197] = 241
matrix_data[198] = 225	matrix_data[198] = 0
matrix_data[199] = 241	matrix_data[199] = 0
matrix_data[200] = 194	matrix_data[200] = 0
matrix_data[201] = 210	matrix_data[201] = 0
matrix_data[202] = 226	matrix_data[202] = 242
matrix_data[203] = 242	matrix_data[203] = 0
matrix_data[204] = 195	matrix_data[204] = 0
matrix_data[205] = 211	matrix_data[205] = 0
matrix_data[206] = 227	matrix_data[206] = 0
matrix_data[207] = 243	matrix_data[207] = 243
matrix_data[208] = 196	matrix_data[208] = 244
matrix_data[209] = 212	matrix_data[209] = 0
matrix_data[210] = 228	matrix_data[210] = 0
matrix_data[211] = 244	matrix_data[211] = 0
matrix_data[212] = 197	matrix_data[212] = 0
matrix_data[213] = 213	matrix_data[213] = 245
matrix_data[214] = 229	matrix_data[214] = 0
matrix_data[215] = 245	matrix_data[215] = 0
matrix_data[216] = 198	matrix_data[216] = 0
matrix_data[217] = 214	matrix_data[217] = 0
matrix_data[218] = 230	matrix_data[218] = 246
matrix_data[219] = 246	matrix_data[219] = 0
matrix_data[220] = 199	matrix_data[220] = 0
matrix_data[221] = 215	matrix_data[221] = 0
matrix_data[222] = 231	matrix_data[222] = 0
matrix_data[223] = 247	matrix_data[223] = 247
matrix_data[224] = 200	matrix_data[224] = 248
matrix_data[225] = 216	matrix_data[225] = 0
matrix_data[226] = 232	matrix_data[226] = 0
matrix_data[227] = 248	matrix_data[227] = 0
matrix_data[228] = 201	matrix_data[228] = 0
matrix_data[229] = 217	matrix_data[229] = 249
matrix_data[230] = 233	matrix_data[230] = 0
matrix_data[231] = 249	matrix_data[231] = 0
matrix_data[232] = 202	matrix_data[232] = 0
matrix_data[233] = 218	matrix_data[233] = 0
matrix_data[234] = 234	matrix_data[234] = 250
matrix_data[235] = 250	matrix_data[235] = 0
matrix_data[236] = 203	matrix_data[236] = 0
matrix_data[237] = 219	matrix_data[237] = 0
matrix_data[238] = 235	matrix_data[238] = 0
matrix_data[239] = 251	matrix_data[239] = 251
matrix_data[240] = 204	matrix_data[240] = 252
matrix_data[241] = 220	matrix_data[241] = 0
matrix_data[242] = 236	matrix_data[242] = 0
matrix_data[243] = 252	matrix_data[243] = 0
matrix_data[244] = 205	matrix_data[244] = 0
matrix_data[245] = 221	matrix_data[245] = 253
matrix_data[246] = 237	matrix_data[246] = 0
matrix_data[247] = 253	matrix_data[247] = 0
matrix_data[248] = 206	matrix_data[248] = 0
matrix_data[249] = 222	matrix_data[249] = 0
matrix_data[250] = 238	matrix_data[250] = 254
matrix_data[251] = 254	matrix_data[251] = 0
matrix_data[252] = 207	matrix_data[252] = 0
matrix_data[253] = 223	matrix_data[253] = 0
matrix_data[254] = 239	matrix_data[254] = 0
matrix_data[255] = 255	matrix_data[255] = 255
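
One pattern does stand out in the dump above. The following is a minimal C++ sketch, assuming a 16x16 matrix, an interleave of 4, and element (m, n) holding the value m + n * 16; these parameters are inferred from the printed output, not copied from the test source. The GCC column matches `expected_offset`. The Intel column is reproduced if the minor index is taken from the row instead of the column, so that the four columns of a block collide on one slot and the last write wins. The name `observed_offset` and its formula are a hypothesis that describes the symptom only; they say nothing about how the compiler arrives at it.

```cpp
#include <array>
#include <cassert>

// Dimensions inferred from the 256-entry dump (assumptions, not test source).
constexpr int kM = 16;          // rows
constexpr int kN = 16;          // columns
constexpr int kInterleave = 4;  // columns interleaved per block

// Offset implied by the GCC column: block of 4 columns, then row, then
// column-within-block.
constexpr int expected_offset(int m, int n) {
  return (n / kInterleave) * (kM * kInterleave) + m * kInterleave +
         (n % kInterleave);
}

// Hypothetical offset that reproduces the Intel column: the minor index
// comes from the row (m) rather than the column (n).
constexpr int observed_offset(int m, int n) {
  return (n / kInterleave) * (kM * kInterleave) + m * kInterleave +
         (m % kInterleave);
}

using Matrix = std::array<int, kM * kN>;

// Fill a zeroed buffer with m + n * kM at the slot chosen by `offset`.
template <typename OffsetFn>
Matrix fill(OffsetFn offset) {
  Matrix data{};  // zero-initialized
  for (int m = 0; m < kM; ++m) {
    for (int n = 0; n < kN; ++n) {
      data[offset(m, n)] = m + n * kM;  // a later write overwrites an earlier one
    }
  }
  return data;
}
```

Calling `fill(expected_offset)` yields the GCC column and `fill(observed_offset)` yields the Intel column. Under this hypothesis only one slot in every group of four is ever written, which is consistent with the three zeros per group in the bad output.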

Steps/Code to reproduce bug
In theory, installing the oneAPI compiler and building with it should be enough to reproduce the issue.

Expected behavior
The test at issue should pass equally well with both compilers.

Environment details:
Building and running inside Docker, Red Hat Linux UBI8-based container.
Compiler: Intel(R) oneAPI DPC++/C++ Compiler 2024.0.0 (2024.0.0.20231017)
Compiler builds against the GCC 7.4.0 C/C++ runtime

Additional context
The test fails with or without my warning fixes (#1380) applied.

iskunk added the "? - Needs Triage" and "bug" labels on Mar 8, 2024
@thakkarV
Collaborator

thakkarV commented Mar 8, 2024

This is curious indeed. Can you check whether the host code is being optimized to use SIMD instructions and, if so, disable them and try again? It's unclear to me whether this is a bug in CUTLASS or in the oneAPI compiler.
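
One way to try this outside of CUTLASS is a standalone loop with vectorization switched off through a loop hint. The sketch below assumes a Clang-based host compiler that honors `#pragma clang loop`; the function `fill_interleaved_no_simd` is a hypothetical stand-in for the test's write loop, not CUTLASS code, and whether the hint changes the outcome of the failing test is not known.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the test's write loop (not CUTLASS code).
// Writes m + n * rows into a column-major interleaved buffer.
std::vector<int> fill_interleaved_no_simd(int rows, int cols, int interleave) {
  std::vector<int> data(rows * cols, 0);
  for (int m = 0; m < rows; ++m) {
    // Ask Clang-based compilers not to vectorize or interleave the inner loop.
    // Other compilers skip the hint because of the preprocessor guard.
#if defined(__clang__)
#pragma clang loop vectorize(disable) interleave(disable)
#endif
    for (int n = 0; n < cols; ++n) {
      int offset = (n / interleave) * (rows * interleave) + m * interleave +
                   (n % interleave);
      data[offset] = m + n * rows;
    }
  }
  return data;
}
```

If this loop produces the expected layout with the hint and a wrong one without it at the same optimization level, that would point toward the vectorizer; identical results either way would suggest looking elsewhere.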

@iskunk
Contributor Author

iskunk commented Mar 8, 2024

This is, in fact, meant to be a debug build. The exact CXXFLAGS I'm using are --gcc-install-dir=/path/to/gcc/7.4.0/lib/gcc/x86_64-pc-linux-gnu/7.4.0 -std=c++17 -ffp-model=precise -fmessage-length=0 -fstack-protector-strong -fno-strict-aliasing -O0 -g -fstack-security-check.

CUTLASS is adding -O3 to CUDA_FLAGS, if that makes a difference (not sure if that gets passed to the host compiler).

@iskunk
Contributor Author

iskunk commented Apr 8, 2024

I've found that the -O3 is coming from CMake, via the default CMAKE_BUILD_TYPE=Release. If I set CMAKE_BUILD_TYPE=None (so that only the flags I specify are used), the -O3 goes away, and so does the test failure.

This points to some kind of optimizer bug in the host compiler, so I am closing this issue for now. If I find any CUTLASS-side problems, I'll re-file here.

iskunk closed this as completed on Apr 8, 2024