fix: BMI2 crash on AVX-only CPUs (Intel Ivy Bridge/Sandy Bridge) #7864
Conversation
Signed-off-by: coffeerunhobby <[email protected]>
backend/cpp/llama-cpp/CMakeLists.txt
Outdated
@@ -1,6 +1,6 @@
set(TARGET grpc-server)
# CUDA Toolkit 13.x compatibility: CMake 3.31.9+ fixes toolchain detection/arch table issues
Is this really needed? If we need to do that, it requires compiling CMake in the build process. Doable, but it adds to compilation and CI times. If there is no specific reason, I would avoid doing so for now.
I had trouble compiling for RTX 5060 (SM_120) and the only version that consistently worked was CMake 3.31.10. I tried multiple 4.0.x versions and lower CMake versions, but none succeeded. I’d prefer we standardize on 3.31.10 for now - it looks like the safest option, and PyTorch also uses it. Also worth noting: 3.31.9 includes a fix related to CUDA 13, which may be connected to what we’re seeing.
backend/cpp/llama-cpp/Makefile
Outdated
ifeq ($(OS),Darwin)
	CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off" $(MAKE) VARIANT="llama-cpp-avx-build" build-llama-cpp-grpc-server
-else ifeq ($(ARCH),$(filter $(ARCH),aarch64 arm64))
+else ifneq ($(filter $(ARCH),aarch64 arm64),)
Any specific reason? I find ifeq more readable.
The reason here is:
- Skip BMI flags on Darwin (macOS)
- Skip BMI flags on ARM (aarch64/arm64)
- Add BMI flags on x86_64 (the else case)
The ifneq checks if ARCH matches aarch64 or arm64. When the filter finds a match, the result is non-empty, so we skip the flags.
I can change to ifeq if you prefer, but the logic would need to invert:
ifeq ($(OS),Darwin)
# No BMI flags (Darwin)
else ifeq ($(filter $(ARCH),aarch64 arm64),)
# This is x86_64 - ADD BMI flags here
else
# This is ARM - NO BMI flags
endif
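For illustration, the `$(filter ...)` check above can be exercised with a tiny makefile fed to make on stdin. This is a hedged sketch assuming GNU make; the echo messages are placeholders, not the real recipes:

```shell
# Sketch (assumes GNU make): demonstrate the $(filter ...) arch check.
# filter keeps the words of "aarch64 arm64" that match $(ARCH), so a
# non-empty result means we are on ARM and should skip the BMI flags.
make -s -f - ARCH=arm64 <<'EOF'
ifneq ($(filter $(ARCH),aarch64 arm64),)
all: ; @echo ARM: skip BMI flags
else
all: ; @echo x86_64: add BMI flags
endif
EOF
```

Running with `ARCH=arm64` should take the ARM branch, and with `ARCH=x86_64` the x86_64 branch, matching the skip/add behavior described above.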
backend/cpp/llama-cpp/Makefile
Outdated
	CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off" $(MAKE) VARIANT="llama-cpp-avx-build" build-llama-cpp-grpc-server
else
-	CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DCMAKE_C_FLAGS=-mno-bmi2 -DCMAKE_CXX_FLAGS=-mno-bmi2" $(MAKE) VARIANT="llama-cpp-avx-build" build-llama-cpp-grpc-server
+	CFLAGS="-mno-bmi2" CXXFLAGS="-mno-bmi2" CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI=off -DGGML_BMI2=off" $(MAKE) VARIANT="llama-cpp-avx-build" build-llama-cpp-grpc-server
GGML_BMI does not exist: https://github.com/ggml-org/ggml/blob/ebc3a0f4a56be1c9424a89fbec09962ac34fde85/CMakeLists.txt#L155
Do we also need CFLAGS/CXXFLAGS? If we don't, let's drop them. GGML_BMI2 should be enough.
You're right here. `-DGGML_BMI2=off` alone works (as I tested in the successful build); the CFLAGS/CXXFLAGS were just me being overly cautious after fighting with compiler flags for too long.
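For reference, a minimal configure sketch of an AVX-only variant with only the GGML flag and no extra compiler flags. The build directory name is illustrative; the `-D` flags match the ones discussed in this thread (this is a sketch against a llama.cpp source tree, not the exact LocalAI build invocation):

```shell
# Illustrative configure of an AVX-only llama.cpp build with BMI2 disabled.
# Only -DGGML_BMI2=off is needed; no -mno-bmi2 compiler flags required.
cmake -B build-avx \
  -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off \
  -DGGML_FMA=off -DGGML_F16C=off \
  -DGGML_BMI2=off
cmake --build build-avx
```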
backend/cpp/llama-cpp/Makefile
Outdated
ifeq ($(OS),Darwin)
	CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off" $(MAKE) VARIANT="llama-cpp-fallback-build" build-llama-cpp-grpc-server
-else ifeq ($(ARCH),$(filter $(ARCH),aarch64 arm64))
+else ifneq ($(filter $(ARCH),aarch64 arm64),)
ditto about logic inversion
backend/cpp/llama-cpp/Makefile
Outdated
	CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off" $(MAKE) VARIANT="llama-cpp-fallback-build" build-llama-cpp-grpc-server
else
-	CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DCMAKE_C_FLAGS='-mno-bmi -mno-bmi2' -DCMAKE_CXX_FLAGS='-mno-bmi -mno-bmi2'" $(MAKE) VARIANT="llama-cpp-fallback-build" build-llama-cpp-grpc-server
+	CFLAGS="-mno-bmi -mno-bmi2" CXXFLAGS="-mno-bmi -mno-bmi2" CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI=off -DGGML_BMI2=off" $(MAKE) VARIANT="llama-cpp-fallback-build" build-llama-cpp-grpc-server
ditto above
backend/cpp/llama-cpp/Makefile
Outdated
ifeq ($(OS),Darwin)
	CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_RPC=ON -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off" TARGET="--target grpc-server --target rpc-server" $(MAKE) VARIANT="llama-cpp-grpc-build" build-llama-cpp-grpc-server
-else ifeq ($(ARCH),$(filter $(ARCH),aarch64 arm64))
+else ifneq ($(filter $(ARCH),aarch64 arm64),)
ditto
backend/cpp/llama-cpp/Makefile
Outdated
	CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_RPC=ON -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off" TARGET="--target grpc-server --target rpc-server" $(MAKE) VARIANT="llama-cpp-grpc-build" build-llama-cpp-grpc-server
else
-	CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_RPC=ON -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DCMAKE_C_FLAGS='-mno-bmi -mno-bmi2' -DCMAKE_CXX_FLAGS='-mno-bmi -mno-bmi2'" TARGET="--target grpc-server --target rpc-server" $(MAKE) VARIANT="llama-cpp-grpc-build" build-llama-cpp-grpc-server
+	CFLAGS="-mno-bmi -mno-bmi2" CXXFLAGS="-mno-bmi -mno-bmi2" CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_RPC=ON -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI=off -DGGML_BMI2=off" TARGET="--target grpc-server --target rpc-server" $(MAKE) VARIANT="llama-cpp-grpc-build" build-llama-cpp-grpc-server
ditto
@@ -1,4 +1,5 @@
-cmake_minimum_required(VERSION 3.12)
+# CUDA Toolkit 13.x compatibility: CMake 3.31.9+ fixes toolchain detection/arch table issues
+cmake_minimum_required(VERSION 3.31.10)
ditto
Signed-off-by: Ettore Di Giacinto <[email protected]>
Description
Fix: Prevent BMI2 instruction crash on AVX-only CPUs
Problem
The `llama-cpp-avx` binary incorrectly includes BMI2 instructions despite being built for AVX-only compatibility. This causes crashes on CPUs that have AVX but lack BMI2 support (e.g., Intel Sandy Bridge and Ivy Bridge, 2011-2013).

Error symptoms:

rpc error: code = Unavailable desc = error reading from server: EOF

Root Cause
llama.cpp's CMake automatically enables BMI2 when it detects an x86_64 architecture, even when building AVX-only binaries. The `llama-cpp-avx` target is intended for older CPUs that have AVX but lack newer instruction sets.

Solution
Add `-DGGML_BMI2=off` (and `-DGGML_BMI=off` for the fallback) to the CMake args for:
- `llama-cpp-avx`: disable BMI2 for AVX-only CPUs
- `llama-cpp-fallback`: disable both BMI and BMI2 for maximum compatibility
- `llama-cpp-grpc`: disable both BMI and BMI2 for RPC server compatibility

Notes for Reviewers
Testing
Verified the binaries no longer contain BMI2 instructions by checking the actual CPU inference code (lower addresses, typically the 0x400000-0xffffff range).
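A check along those lines can be sketched with objdump and grep. The binary path here is a hypothetical placeholder, and the mnemonic list covers the BMI2 instruction set:

```shell
# Hedged sketch: scan a binary's disassembly for BMI2 mnemonics
# (pdep, pext, bzhi, mulx, rorx, sarx, shlx, shrx).
# BIN is a placeholder path; point it at the built llama-cpp-avx binary.
BIN="${BIN:-./llama-cpp-avx}"
if objdump -d "$BIN" 2>/dev/null | grep -Eqw 'pdep|pext|bzhi|mulx|rorx|sarx|shlx|shrx'; then
  echo "BMI2 instructions present"
else
  echo "no BMI2 instructions found"
fi
```

Note that symbol names containing these substrings could in principle cause false positives; for a stricter check, restrict the grep to the mnemonic column of the disassembly.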
Also added a CUDA_DOCKER_ARCH option to compile llama-cpp for Blackwell GPUs (CUDA Toolkit 13.x compatibility: CMake 3.31.9+ fixes toolchain detection/arch table issues).
Tested on Ubuntu 24.04, Intel E3-1240 v2, GeForce RTX 5060 Ti 16GB (SM_120), NVIDIA driver 570-open, CUDA version 12.8.
./LocalAI > DOCKER_BUILDKIT=1 docker build --pull --progress=plain -f backend/Dockerfile.llama-cpp \
    --build-arg CMAKE_FROM_SOURCE=true \
    --build-arg CMAKE_VERSION=3.31.10 \
    --build-arg BUILD_TYPE=cublas \
    --build-arg CUDA_MAJOR_VERSION=12 \
    --build-arg CUDA_MINOR_VERSION=8 \
    --build-arg UBUNTU_VERSION=2204 \
    --build-arg CUDA_DOCKER_ARCH='75;86;89;120' \
    -t localai-llama-cpp-backend:cuda128 .