
Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#18186

These additional codenames cover CPU feature sets not represented by the current variant list:

  • ivybridge
  • piledriver
  • cascadelake
  • cooperlake
  • zen4

Resolves: #17966

@loci-agentic-ai

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #620

Overview

PR #620 adds support for five additional x86_64 CPU backend variants (ivybridge, piledriver, cascadelake, cooperlake, zen4) when building with GGML_CPU_ALL_VARIANTS=On. The changes consolidate SIMD header includes by removing duplicate __F16C__ preprocessor checks from ggml-impl.h and simd-mappings.h, while adding __F16C__ explicitly to the conditional in ggml-cpu-impl.h. This refactoring introduces a consistent 6-10 ns degradation in parameter accessor functions across all CPU backend variants.
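For context, a build that exercises these variants could look like the following sketch. `GGML_CPU_ALL_VARIANTS` is the flag named in the PR; pairing it with `GGML_BACKEND_DL` reflects ggml's CMake setup, where per-variant CPU backends are built as dynamically loaded libraries. Treat the exact invocation as illustrative rather than the project's documented build recipe.

```shell
# Illustrative build: enable all CPU backend variants (including the new
# ivybridge/piledriver/cascadelake/cooperlake/zen4 codenames) as loadable
# backends selected at runtime by CPU feature detection.
cmake -B build \
      -DGGML_BACKEND_DL=ON \
      -DGGML_CPU_ALL_VARIANTS=ON \
      -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
```

At runtime the loader probes the host CPU and picks the best matching variant, which is why the same inline functions end up compiled once per variant.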

Key Findings

Impacted Functions in Performance-Critical Areas

The degradation affects parameter accessor functions compiled across multiple CPU backend variants:

Most Impacted Functions (by absolute change):

  • repack.cpp_ggml_set_op_params in libggml-cpu.so: +10 ns throughput (110 ns → 120 ns)
  • binary-ops.cpp_ggml_get_op_params_i32 in libggml-cpu.so: +7 ns throughput (76 ns → 83 ns)
  • ggml-backend-reg.cpp_ggml_get_op_params_i32 in libggml.so: +6 ns throughput (76 ns → 83 ns)
  • amx.cpp_ggml_get_op_params_i32 in libggml-cpu.so: +6 ns throughput (69 ns → 75 ns)
  • ggml-quants.c_ggml_get_op_params_i32 in libggml-base.so: +6 ns throughput (71 ns → 77 ns)

All affected functions are variants of ggml_get_op_params_i32 and ggml_set_op_params located in ggml-impl.h lines 151-154. These are static inline functions that extract operation parameters from tensor structures. The degradation originates from altered compiler optimization decisions when <immintrin.h> is included in different preprocessor contexts due to the addition of __F16C__ to the conditional check.

Impact on Inference Performance (Tokens per Second)

No impact on tokenization or inference throughput. The affected functions (ggml_get_op_params_i32, ggml_set_op_params) are parameter accessors used during graph construction and operation dispatch, not during the core inference loop. The functions responsible for token generation (llama_decode, llama_encode, llama_tokenize) show no performance changes in this PR.

Reference context: for the test model (ollama://smollm:135m on a 12th Gen Intel Core i7-1255U), a 2 ms slowdown in llama_decode causes a 7% reduction in tokens per second. The 6-10 ns changes observed here are roughly five orders of magnitude smaller and occur in infrastructure functions called during setup, not in per-token processing.

Power Consumption Analysis

Power consumption increases are minimal across affected binaries:

  • libggml.so: +0.054% (+2.25 nJ absolute, 4152 nJ → 4154 nJ)
  • libggml-cpu.so: +0.017% (+20 nJ absolute, 119986 nJ → 120006 nJ)
  • libggml-base.so: +0.012% (+7 nJ absolute, 59263 nJ → 59270 nJ)
  • libllama.so: +0.000% (no measurable change)
  • All other binaries: no measurable change

The power consumption changes correlate with the throughput degradations in parameter accessor functions. The increases are negligible in absolute terms and do not impact production energy consumption meaningfully.

Code Change Analysis

The PR implements legitimate functionality: expanding CPU variant support for better hardware optimization. The header refactoring consolidates SIMD intrinsic management by centralizing <immintrin.h> inclusion in ggml-cpu-impl.h. The performance regression is an unintended side effect of this consolidation, where the addition of __F16C__ to the preprocessor conditional altered the compilation context for inline functions across all CPU backend variants. The changes do not modify algorithmic logic or add computational overhead; the degradation stems from different compiler optimization decisions for the same source code when compiled with different preprocessor definitions.

@loci-dev loci-dev force-pushed the main branch 20 times, most recently from f002844 to 25154fc Compare December 21, 2025 21:07