Commit fb769ff

base: disable SIMD on MSVC x86_64 by default
This recreates the case as of commit f169822 (tag v0.4.0-alpha.4), in that, by default (without #define'ing a macro or passing an /arch:ETC compiler flag), Wuffs does not use SIMD on MSVC x86_64.

Commit b64a761 (after tag v0.4.0-alpha.4, before tag v0.4.0-alpha.5) changed the default so that x86_64_v2 (roughly equivalent to SSE4.2) was enabled by default, since the user from issue #148 was enabling that anyway (in an unsupported way, by #define'ing a macro that was a private implementation detail) with no problems (and better performance).

However, another user later reported (in issue #151) that enabling SIMD on MSVC x86_64 somehow led to ICEs (Internal Compiler Errors). This commit restores the default to "no SIMD" and it is up to the MSVC user to opt in to the SIMD code paths.

Clang and GCC are unaffected: SIMD remains enabled by default.

Updates #148
Updates #151
1 parent 5e0b2ae commit fb769ff
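
For readers who want to opt back in, the following is a minimal sketch of what that might look like from the library user's side, assuming the usual single-file setup where `WUFFS_IMPLEMENTATION` is defined before the Wuffs C file is included. The include path in the comment and the `demo_` helpers are hypothetical and exist only for illustration. Only the `WUFFS_CONFIG__ENABLE_MSVC_CPU_ARCH__X86_64_V2` macro name comes from this commit.

```c
#include <assert.h>
#include <stdbool.h>

// Opt in only when compiling with MSVC's cl.exe for x64. clang-cl (which also
// defines _MSC_VER), Clang and GCC keep SIMD enabled without any macro.
#if defined(_MSC_VER) && defined(_M_X64) && !defined(__clang__)
#define WUFFS_CONFIG__ENABLE_MSVC_CPU_ARCH__X86_64_V2
#endif

// The Wuffs include would follow the macro definition. The path below is an
// assumption about the local project layout, so it is left commented out:
//
//   #define WUFFS_IMPLEMENTATION
//   #include "path/to/wuffs-v0.4.c"

// Hypothetical helper: reports whether this translation unit opted in.
static bool demo_msvc_simd_opt_in(void) {
#if defined(WUFFS_CONFIG__ENABLE_MSVC_CPU_ARCH__X86_64_V2)
  return true;
#else
  return false;
#endif
}

// Hypothetical self-check: the opt-in should be active exactly when the
// compiler is cl.exe targeting x64, and inactive everywhere else.
static void demo_check_opt_in(void) {
#if defined(_MSC_VER) && defined(_M_X64) && !defined(__clang__)
  assert(demo_msvc_simd_opt_in());
#else
  assert(!demo_msvc_simd_opt_in());
#endif
}
```

Given the ICE reports in issue #151, it seems prudent to try a build with the macro defined on each MSVC version and configuration you ship with before relying on it.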

3 files changed: +111 -36 lines changed
doc/changelog.md

+3
@@ -26,7 +26,10 @@ The LICENSE has changed from a single license (Apache 2) to a dual license
 - Added `std/xxhash32`.
 - Added `std/xxhash64`.
 - Added `std/xz`.
+- Added `WUFFS_CONFIG__DISABLE_MSVC_CPU_ARCH__X86_64_FAMILY`.
 - Added `WUFFS_CONFIG__DST_PIXEL_FORMAT__ENABLE_ALLOWLIST`.
+- Added `WUFFS_CONFIG__ENABLE_MSVC_CPU_ARCH__X86_64_V2`.
+- Added `WUFFS_CONFIG__ENABLE_MSVC_CPU_ARCH__X86_64_V3`.
 - Added `wuffs_base__status__is_truncated_input_error`.
 - Changed `lzw.set_literal_width` to `lzw.set_quirk`.
 - Changed `set_quirk_enabled!(quirk: u32, enabled: bool)` to `set_quirk!(key:

internal/cgen/base/fundamental-public.h

+54-18
@@ -119,12 +119,62 @@
 #elif defined(_MSC_VER) // (#if-chain ref AVOID_CPU_ARCH_1)
 
 #if defined(_M_X64)
-// We need <intrin.h> for the __cpuid function.
-#include <intrin.h>
+
+// On X86_64, Microsoft Visual C/C++ (MSVC) only supports SSE2 by default.
+// There are /arch:SSE2, /arch:AVX and /arch:AVX2 compiler flags (the AVX2 one
+// is roughly equivalent to X86_64_V3), but there is no /arch:SSE42 compiler
+// flag that's equivalent to X86_64_V2.
+//
+// For getting maximum performance with X86_64 MSVC and Wuffs, pass /arch:AVX2
+// (and then test on the oldest hardware you intend to support).
+//
+// Absent that compiler flag, either define one of the three macros listed
+// below or else the X86_64 SIMD code will be disabled and you'll get a #pragma
+// message stating this library "performs best with /arch:AVX2". This message
+// is harmless and ignorable, in that the non-SIMD code is still correct and
+// reasonably performant, but is a reminder that when combining Wuffs and MSVC,
+// some compiler configuration is required for maximum performance.
+//
+// - WUFFS_CONFIG__DISABLE_MSVC_CPU_ARCH__X86_64_FAMILY
+// - WUFFS_CONFIG__ENABLE_MSVC_CPU_ARCH__X86_64_V2 (enables SSE4.2 and below)
+// - WUFFS_CONFIG__ENABLE_MSVC_CPU_ARCH__X86_64_V3 (enables AVX2 and below)
+//
+// Defining the first one (WUFFS_CONFIG__DISABLE_MSVC_CPU_ARCH__X86_64_FAMILY)
+// or defining none of those three (the default state) are equivalent (in that
+// both disable the SIMD code paths), other than that pragma message.
+//
+// When defining these WUFFS_CONFIG__ENABLE_ETC macros with MSVC, be aware that
+// some users report it leading to ICEs (Internal Compiler Errors), but other
+// users report no problems at all (and improved performance). It's unclear
+// exactly what combination of SIMD code and MSVC configuration lead to ICEs.
+// Do your own testing with your own MSVC version and configuration.
+//
+// https://github.com/google/wuffs/issues/148
+// https://github.com/google/wuffs/issues/151
+// https://developercommunity.visualstudio.com/t/fatal--error-C1001:-Internal-compiler-er/10703305
+//
+// Clang (including clang-cl) and GCC don't need this WUFFS_CONFIG__ETC macro
+// machinery, or having the Wuffs-the-library user to fiddle with compiler
+// flags, because they support "__attribute__((target(arg)))".
+#if defined(__AVX2__) || defined(__clang__) || \
+    defined(WUFFS_CONFIG__ENABLE_MSVC_CPU_ARCH__X86_64_V3)
+#define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64
+#define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64_V2
+#define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64_V3
+#elif defined(WUFFS_CONFIG__ENABLE_MSVC_CPU_ARCH__X86_64_V2)
 #define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64
 #define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64_V2
-#if defined(__AVX2__) || defined(__clang__)
+#elif !defined(WUFFS_CONFIG__DISABLE_MSVC_CPU_ARCH__X86_64_FAMILY)
+#pragma message("Wuffs with MSVC+X64 performs best with /arch:AVX2")
+#endif // defined(__AVX2__) || defined(__clang__) || etc
 
+#if defined(WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64)
+
+#if defined(WUFFS_CONFIG__DISABLE_MSVC_CPU_ARCH__X86_64_FAMILY)
+#error "MSVC_CPU_ARCH simultaneously enabled and disabled"
+#endif
+
+#include <intrin.h>
 // intrin.h isn't enough for X64 SIMD, with clang-cl, if we want to use
 // "__attribute__((target(arg)))" without e.g. "/arch:AVX".
 //
@@ -134,23 +184,9 @@
 #include <immintrin.h> // AVX, AVX2, FMA, POPCNT
 #include <nmmintrin.h> // SSE4.2
 #include <wmmintrin.h> // AES, PCLMUL
-#define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64_V3
 
-#else // defined(__AVX2__) || defined(__clang__)
-
-// clang-cl (which defines both __clang__ and _MSC_VER) supports
-// "__attribute__((target(arg)))".
-//
-// For MSVC's cl.exe (unlike clang or gcc), SIMD capability is a compile-time
-// property of the source file (e.g. a /arch:AVX2 or -mavx2 compiler flag), not
-// of individual functions (that can be conditionally selected at runtime).
-#if !defined(WUFFS_CONFIG__I_KNOW_THAT_WUFFS_MSVC_PERFORMS_BEST_WITH_ARCH_AVX2)
-#pragma message("Wuffs with MSVC+IX86/X64 performs best with /arch:AVX2")
-#endif
-
-#endif // defined(__AVX2__) || defined(__clang__)
+#endif // defined(WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64)
 #endif // defined(_M_X64)
-
 #endif // (#if-chain ref AVOID_CPU_ARCH_1)
 #endif // (#if-chain ref AVOID_CPU_ARCH_0)
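
The preprocessor chain in this hunk can be hard to follow in diff form, so here is a rough model of it as an ordinary C function. This is only a sketch for reasoning about the outcomes: the `demo_` names, the struct and the return codes are all hypothetical, and each boolean field stands in for whether the corresponding macro is defined (`has_avx2_flag` models `__AVX2__`, which MSVC defines under /arch:AVX2).

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical outcome codes; the real header defines macros instead.
enum { DEMO_CONFLICT = -1, DEMO_NO_SIMD = 0, DEMO_V2 = 2, DEMO_V3 = 3 };

// Each field models "is this macro defined?" for one translation unit.
typedef struct {
  bool has_avx2_flag;   // __AVX2__
  bool is_clang;        // __clang__ (true for clang-cl)
  bool enable_v3;       // WUFFS_CONFIG__ENABLE_MSVC_CPU_ARCH__X86_64_V3
  bool enable_v2;       // WUFFS_CONFIG__ENABLE_MSVC_CPU_ARCH__X86_64_V2
  bool disable_family;  // WUFFS_CONFIG__DISABLE_MSVC_CPU_ARCH__X86_64_FAMILY
} demo_config;

// Mirrors the #if / #elif / #elif chain, then the #error check. Sets
// *shows_message when the "performs best with /arch:AVX2" pragma would fire.
static int demo_select_tier(demo_config c, bool* shows_message) {
  int tier = DEMO_NO_SIMD;
  *shows_message = false;
  if (c.has_avx2_flag || c.is_clang || c.enable_v3) {
    tier = DEMO_V3;
  } else if (c.enable_v2) {
    tier = DEMO_V2;
  } else if (!c.disable_family) {
    *shows_message = true;
  }
  // SIMD selected while the family is also disabled: the #error case.
  if ((tier != DEMO_NO_SIMD) && c.disable_family) {
    return DEMO_CONFLICT;
  }
  return tier;
}

// Walks through the default state and the explicit opt-out.
static void demo_self_check(void) {
  bool msg = false;
  demo_config c = {false, false, false, false, false};
  assert(demo_select_tier(c, &msg) == DEMO_NO_SIMD && msg);
  c.disable_family = true;
  assert(demo_select_tier(c, &msg) == DEMO_NO_SIMD && !msg);
}
```

One consequence visible in this model is that passing /arch:AVX2 (or using clang-cl) while also defining the DISABLE macro lands in the conflict branch, which matches the `#error` in the diff.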

release/c/wuffs-unsupported-snapshot.c

+54-18
@@ -175,12 +175,62 @@ extern "C" {
 #elif defined(_MSC_VER) // (#if-chain ref AVOID_CPU_ARCH_1)
 
 #if defined(_M_X64)
-// We need <intrin.h> for the __cpuid function.
-#include <intrin.h>
+
+// On X86_64, Microsoft Visual C/C++ (MSVC) only supports SSE2 by default.
+// There are /arch:SSE2, /arch:AVX and /arch:AVX2 compiler flags (the AVX2 one
+// is roughly equivalent to X86_64_V3), but there is no /arch:SSE42 compiler
+// flag that's equivalent to X86_64_V2.
+//
+// For getting maximum performance with X86_64 MSVC and Wuffs, pass /arch:AVX2
+// (and then test on the oldest hardware you intend to support).
+//
+// Absent that compiler flag, either define one of the three macros listed
+// below or else the X86_64 SIMD code will be disabled and you'll get a #pragma
+// message stating this library "performs best with /arch:AVX2". This message
+// is harmless and ignorable, in that the non-SIMD code is still correct and
+// reasonably performant, but is a reminder that when combining Wuffs and MSVC,
+// some compiler configuration is required for maximum performance.
+//
+// - WUFFS_CONFIG__DISABLE_MSVC_CPU_ARCH__X86_64_FAMILY
+// - WUFFS_CONFIG__ENABLE_MSVC_CPU_ARCH__X86_64_V2 (enables SSE4.2 and below)
+// - WUFFS_CONFIG__ENABLE_MSVC_CPU_ARCH__X86_64_V3 (enables AVX2 and below)
+//
+// Defining the first one (WUFFS_CONFIG__DISABLE_MSVC_CPU_ARCH__X86_64_FAMILY)
+// or defining none of those three (the default state) are equivalent (in that
+// both disable the SIMD code paths), other than that pragma message.
+//
+// When defining these WUFFS_CONFIG__ENABLE_ETC macros with MSVC, be aware that
+// some users report it leading to ICEs (Internal Compiler Errors), but other
+// users report no problems at all (and improved performance). It's unclear
+// exactly what combination of SIMD code and MSVC configuration lead to ICEs.
+// Do your own testing with your own MSVC version and configuration.
+//
+// https://github.com/google/wuffs/issues/148
+// https://github.com/google/wuffs/issues/151
+// https://developercommunity.visualstudio.com/t/fatal--error-C1001:-Internal-compiler-er/10703305
+//
+// Clang (including clang-cl) and GCC don't need this WUFFS_CONFIG__ETC macro
+// machinery, or having the Wuffs-the-library user to fiddle with compiler
+// flags, because they support "__attribute__((target(arg)))".
+#if defined(__AVX2__) || defined(__clang__) || \
+    defined(WUFFS_CONFIG__ENABLE_MSVC_CPU_ARCH__X86_64_V3)
+#define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64
+#define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64_V2
+#define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64_V3
+#elif defined(WUFFS_CONFIG__ENABLE_MSVC_CPU_ARCH__X86_64_V2)
 #define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64
 #define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64_V2
-#if defined(__AVX2__) || defined(__clang__)
+#elif !defined(WUFFS_CONFIG__DISABLE_MSVC_CPU_ARCH__X86_64_FAMILY)
+#pragma message("Wuffs with MSVC+X64 performs best with /arch:AVX2")
+#endif // defined(__AVX2__) || defined(__clang__) || etc
 
+#if defined(WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64)
+
+#if defined(WUFFS_CONFIG__DISABLE_MSVC_CPU_ARCH__X86_64_FAMILY)
+#error "MSVC_CPU_ARCH simultaneously enabled and disabled"
+#endif
+
+#include <intrin.h>
 // intrin.h isn't enough for X64 SIMD, with clang-cl, if we want to use
 // "__attribute__((target(arg)))" without e.g. "/arch:AVX".
 //
@@ -190,23 +240,9 @@ extern "C" {
 #include <immintrin.h> // AVX, AVX2, FMA, POPCNT
 #include <nmmintrin.h> // SSE4.2
 #include <wmmintrin.h> // AES, PCLMUL
-#define WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64_V3
 
-#else // defined(__AVX2__) || defined(__clang__)
-
-// clang-cl (which defines both __clang__ and _MSC_VER) supports
-// "__attribute__((target(arg)))".
-//
-// For MSVC's cl.exe (unlike clang or gcc), SIMD capability is a compile-time
-// property of the source file (e.g. a /arch:AVX2 or -mavx2 compiler flag), not
-// of individual functions (that can be conditionally selected at runtime).
-#if !defined(WUFFS_CONFIG__I_KNOW_THAT_WUFFS_MSVC_PERFORMS_BEST_WITH_ARCH_AVX2)
-#pragma message("Wuffs with MSVC+IX86/X64 performs best with /arch:AVX2")
-#endif
-
-#endif // defined(__AVX2__) || defined(__clang__)
+#endif // defined(WUFFS_PRIVATE_IMPL__CPU_ARCH__X86_64)
 #endif // defined(_M_X64)
-
 #endif // (#if-chain ref AVOID_CPU_ARCH_1)
 #endif // (#if-chain ref AVOID_CPU_ARCH_0)
