Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fmaf is an empty unreachable function with optimizations on cortex-m33 #123387

Closed
PiJoules opened this issue Jan 17, 2025 · 3 comments · Fixed by #124200
Closed

fmaf is an empty unreachable function with optimizations on cortex-m33 #123387

PiJoules opened this issue Jan 17, 2025 · 3 comments · Fixed by #124200
Assignees
Labels
cmake Build system in general and CMake in particular libc

Comments

@PiJoules
Copy link
Contributor

PiJoules commented Jan 17, 2025

The generic fmaf implementation in llvm-libc is

namespace LIBC_NAMESPACE_DECL {

LLVM_LIBC_FUNCTION(float, fmaf, (float x, float y, float z)) {
  return fputil::fma<float>(x, y, z);
}

} // namespace LIBC_NAMESPACE_DECL

On cortex-m33, fputil::fma<float> is

template <> LIBC_INLINE float fma(float x, float y, float z) {
  return __builtin_fmaf(x, y, z);
}

However, clang replaces the __builtin_fmaf call with a call to a normal fmaf at -O0

; Function Attrs: mustprogress noinline nounwind optnone
define linkonce_odr hidden noundef float @_ZN22__llvm_libc_20_0_0_git6fputil3fmaIffEET_T0_S3_S3_(float noundef %x, float noundef %y, float noundef %z) #0 comdat {
entry:
  %x.addr = alloca float, align 4
  %y.addr = alloca float, align 4
  %z.addr = alloca float, align 4
  store float %x, ptr %x.addr, align 4
  store float %y, ptr %y.addr, align 4
  store float %z, ptr %z.addr, align 4
  %0 = load float, ptr %x.addr, align 4
  %1 = load float, ptr %y.addr, align 4
  %2 = load float, ptr %z.addr, align 4
  %call = call float @fmaf(float noundef %0, float noundef %1, float noundef %2) #2
  ret float %call
}

Since fmaf just calls fputil::fma<float>, we end up with mutual recursion. With -Os or -O3, clang removes the function body for fmaf

define hidden noundef float @fmaf(float %x, float %y, float %z) #0 {
entry:
  unreachable
}

and since the entire function is unreachable, this just ends up becoming an empty function

fmaf:
        .fnstart
@ %bb.0:                                @ %entry
.Lfunc_end0:
        .size   fmaf, .Lfunc_end0-fmaf
        .cantunwind
        .fnend

It seems either (1) clang is incorrectly lowering the __builtin_fmaf to fmaf or (2) llvm-libc should have tighter checks on ensuring that a builtin call won't be lowered to a libcall. I would assume this shouldn't be lowered to a libcall in the first place since __ARM_FEATURE_FMA is defined, but perhaps it might be permissible for any compiler to lower a builtin call to a normal function call?

This can be reproduced by building armv8m.main-unknown-none-eabi runtimes from the fuchsia cache file. Alternatively, the fma.cpp code expands to

namespace [[gnu::visibility("hidden")]] __llvm_libc_20_0_0_git {

namespace fputil {
inline float fma(float x, float y, float z) {
  return __builtin_fmaf(x, y, z);
}
}  // namespace fputil

float fmaf(float x, float y, float z);

decltype(__llvm_libc_20_0_0_git::fmaf) __fmaf_impl__ __asm__("fmaf");
decltype(__llvm_libc_20_0_0_git::fmaf) fmaf [[gnu::alias("fmaf")]];
float __fmaf_impl__ (float x, float y, float z) {
  return fputil::fma(x, y, z);
}

}  // namespace __llvm_libc_20_0_0_git

which can be compiled with

clang++ --target=armv8m.main-none-eabi -mthumb -mfloat-abi=softfp -march=armv8m.main+fp+dsp -mcpu=cortex-m33 -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Os -ffreestanding -nostdlibinc -fno-builtin -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti /tmp/test.cc
@PiJoules PiJoules added libc bug Indicates an unexpected problem or unintended behavior labels Jan 17, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 17, 2025

@llvm/issue-subscribers-bug

Author: None (PiJoules)

The generic `fmaf` implementation in llvm-libc is
namespace LIBC_NAMESPACE_DECL {

LLVM_LIBC_FUNCTION(float, fmaf, (float x, float y, float z)) {
  return fputil::fma&lt;float&gt;(x, y, z);
}

} // namespace LIBC_NAMESPACE_DECL

On cortex-m33, fputil::fma&lt;float&gt; is

template &lt;&gt; LIBC_INLINE float fma(float x, float y, float z) {
  return __builtin_fmaf(x, y, z);
}

However, clang replaces the __builtin_fmaf call with a call to a normal fmaf at -O0

; Function Attrs: mustprogress noinline nounwind optnone
define linkonce_odr hidden noundef float @<!-- -->_ZN22__llvm_libc_20_0_0_git6fputil3fmaIffEET_T0_S3_S3_(float noundef %x, float noundef %y, float noundef %z) #<!-- -->0 comdat {
entry:
  %x.addr = alloca float, align 4
  %y.addr = alloca float, align 4
  %z.addr = alloca float, align 4
  store float %x, ptr %x.addr, align 4
  store float %y, ptr %y.addr, align 4
  store float %z, ptr %z.addr, align 4
  %0 = load float, ptr %x.addr, align 4
  %1 = load float, ptr %y.addr, align 4
  %2 = load float, ptr %z.addr, align 4
  %call = call float @<!-- -->fmaf(float noundef %0, float noundef %1, float noundef %2) #<!-- -->2
  ret float %call
}

Since fmaf just calls fputil::fma&lt;float&gt;, we end up with a circular dependency. With -Os or -O3, clang removes the function body for fmaf

define hidden noundef float @<!-- -->fmaf(float %x, float %y, float %z) #<!-- -->0 {
entry:
  unreachable
}

and since the entire function is unreachable, this just ends up becoming an empty function

fmaf:
        .fnstart
@ %bb.0:                                @ %entry
.Lfunc_end0:
        .size   fmaf, .Lfunc_end0-fmaf
        .cantunwind
        .fnend

It seems either (1) clang is incorrectly lowering the __builtin_fmaf to fmaf or (2) llvm-libc should have tighter checks on ensuring that a builtin call won't be lowered to a libcall. I would assume this shouldn't be lowered to a libcall in the first place since __ARM_FEATURE_FMA is defined, but perhaps it might be permissible for any compiler to lower a builtin call to a normal function call?

This can be reproduced by building armv8m.main-unknown-none-eabi runtimes from the fuchsia cache file. Alternatively, the fma.cpp code expands to

namespace [[gnu::visibility("hidden")]] __llvm_libc_20_0_0_git {

namespace fputil {
inline float fma(float x, float y, float z) {
  return __builtin_fmaf(x, y, z);
}
}  // namespace fputil

float fmaf(float x, float y, float z);

decltype(__llvm_libc_20_0_0_git::fmaf) __fmaf_impl__ __asm__("fmaf");
decltype(__llvm_libc_20_0_0_git::fmaf) fmaf [[gnu::alias("fmaf")]];
float __fmaf_impl__ (float x, float y, float z) {
  return fputil::fma(x, y, z);
}

}  // namespace __llvm_libc_20_0_0_git

which can be compiled with

clang++ --target=armv8m.main-none-eabi -mthumb -mfloat-abi=softfp -march=armv8m.main+fp+dsp -mcpu=cortex-m33 -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Os -ffreestanding -nostdlibinc -fno-builtin -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti /tmp/test.cc

@llvmbot
Copy link
Member

llvmbot commented Jan 17, 2025

@llvm/issue-subscribers-libc

Author: None (PiJoules)

The generic `fmaf` implementation in llvm-libc is
namespace LIBC_NAMESPACE_DECL {

LLVM_LIBC_FUNCTION(float, fmaf, (float x, float y, float z)) {
  return fputil::fma&lt;float&gt;(x, y, z);
}

} // namespace LIBC_NAMESPACE_DECL

On cortex-m33, fputil::fma&lt;float&gt; is

template &lt;&gt; LIBC_INLINE float fma(float x, float y, float z) {
  return __builtin_fmaf(x, y, z);
}

However, clang replaces the __builtin_fmaf call with a call to a normal fmaf at -O0

; Function Attrs: mustprogress noinline nounwind optnone
define linkonce_odr hidden noundef float @<!-- -->_ZN22__llvm_libc_20_0_0_git6fputil3fmaIffEET_T0_S3_S3_(float noundef %x, float noundef %y, float noundef %z) #<!-- -->0 comdat {
entry:
  %x.addr = alloca float, align 4
  %y.addr = alloca float, align 4
  %z.addr = alloca float, align 4
  store float %x, ptr %x.addr, align 4
  store float %y, ptr %y.addr, align 4
  store float %z, ptr %z.addr, align 4
  %0 = load float, ptr %x.addr, align 4
  %1 = load float, ptr %y.addr, align 4
  %2 = load float, ptr %z.addr, align 4
  %call = call float @<!-- -->fmaf(float noundef %0, float noundef %1, float noundef %2) #<!-- -->2
  ret float %call
}

Since fmaf just calls fputil::fma&lt;float&gt;, we end up with a circular dependency. With -Os or -O3, clang removes the function body for fmaf

define hidden noundef float @<!-- -->fmaf(float %x, float %y, float %z) #<!-- -->0 {
entry:
  unreachable
}

and since the entire function is unreachable, this just ends up becoming an empty function

fmaf:
        .fnstart
@ %bb.0:                                @ %entry
.Lfunc_end0:
        .size   fmaf, .Lfunc_end0-fmaf
        .cantunwind
        .fnend

It seems either (1) clang is incorrectly lowering the __builtin_fmaf to fmaf or (2) llvm-libc should have tighter checks on ensuring that a builtin call won't be lowered to a libcall. I would assume this shouldn't be lowered to a libcall in the first place since __ARM_FEATURE_FMA is defined, but perhaps it might be permissible for any compiler to lower a builtin call to a normal function call?

This can be reproduced by building armv8m.main-unknown-none-eabi runtimes from the fuchsia cache file. Alternatively, the fma.cpp code expands to

namespace [[gnu::visibility("hidden")]] __llvm_libc_20_0_0_git {

namespace fputil {
inline float fma(float x, float y, float z) {
  return __builtin_fmaf(x, y, z);
}
}  // namespace fputil

float fmaf(float x, float y, float z);

decltype(__llvm_libc_20_0_0_git::fmaf) __fmaf_impl__ __asm__("fmaf");
decltype(__llvm_libc_20_0_0_git::fmaf) fmaf [[gnu::alias("fmaf")]];
float __fmaf_impl__ (float x, float y, float z) {
  return fputil::fma(x, y, z);
}

}  // namespace __llvm_libc_20_0_0_git

which can be compiled with

clang++ --target=armv8m.main-none-eabi -mthumb -mfloat-abi=softfp -march=armv8m.main+fp+dsp -mcpu=cortex-m33 -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Os -ffreestanding -nostdlibinc -fno-builtin -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti /tmp/test.cc

@EugeneZelenko EugeneZelenko removed the bug Indicates an unexpected problem or unintended behavior label Jan 17, 2025
@PiJoules
Copy link
Contributor Author

I believe for the short term, we can replace the __builtin_fmaf with either a vfma.f32 instruction or the normal equivalent C code. For a freestanding environment, it should still be permissible for the compiler to lower the __builtin_ equivalent for a libc function to the actual libc function.

For the long term, it might be useful to have an audit of all the __builtin_ functions that have libc-equivalents and ensuring that they don't end up calling that libc-equivalent. Ideally for each __builtin_ with a libc-equivalent, we would have some target-checking preprocessor checks that ensure it gets lowered to either (1) a target-specific intrinsic (ie. it doesn't have a libc-equivalent), or (2) inline asm to the actual intended instruction, or (3) language-level source code.

@lntue lntue closed this as completed in b11529b Jan 24, 2025
@EugeneZelenko EugeneZelenko added the cmake Build system in general and CMake in particular label Jan 24, 2025
github-actions bot pushed a commit to arm/arm-toolchain that referenced this issue Jan 24, 2025
bherrera pushed a commit to misttech/mist-os that referenced this issue Feb 4, 2025
…or __builtin_fma* to generate fma instructions. (#124200)

Fixes llvm/llvm-project#123387

GitOrigin-RevId: 47a46c9da8c911f951bbc979f0b31bb4705e24da
Original-Revision: 4c7924433f6ce1e9343a94c11139a1ad1e74fa1b
Roller-URL: https://cr-buildbucket.appspot.com/build/8724889844667415329
CQ-Do-Not-Cancel-Tryjobs: true
Change-Id: Ifc9df381dd10512bebb738fb0752ff5f6ee2f2db
Reviewed-on: https://fuchsia-review.googlesource.com/c/fuchsia/+/1192714
bherrera pushed a commit to misttech/integration that referenced this issue Feb 4, 2025
…ltin_fma* to generate fma instructions. (#124200)

Fixes llvm/llvm-project#123387

GitOrigin-RevId: 47a46c9da8c911f951bbc979f0b31bb4705e24da
Original-Revision: 4c7924433f6ce1e9343a94c11139a1ad1e74fa1b
Change-Id: I9ffb4627ed2350df101bc1f9d222785f4e30ab00
bherrera pushed a commit to misttech/integration that referenced this issue Feb 4, 2025
…-fno-math-errno to for __builtin_fma* to generate fma instructions. (#124200)

Fixes llvm/llvm-project#123387

GitOrigin-RevId: 8b8370bf031ebb9fb78e960e97f189212d191acb
Original-Revision: 4c7924433f6ce1e9343a94c11139a1ad1e74fa1b
Original-Reviewed-on: https://fuchsia-review.googlesource.com/c/fuchsia/+/1192714
Original-Revision: caea7397c036f6f9caa5996620f598445d2d3103
Change-Id: Ic1b0b752966ee2c5fc8210afdd234fd9a51baac0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cmake Build system in general and CMake in particular libc
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants