RISCV64-CI: don't rely on dependency resolution for qemu-user #5506

martin-frbg · 2025-10-14T20:47:23Z

No description provided.

martin-frbg · 2025-10-31T12:02:42Z

@ChipKerchner do we expect casts from bfloat16 to float32 to "just work" for C code on RISCV64? AFAICT this is not implemented, at least in the cross-compiler setup that this gh workflow uses (even when using the latest LLVM with the latest riscv-gnu-toolchain). This causes test failures, as the intermediate result0 = (float)A[ai] * (float)B[bi] in your sbgemm kernel turns the small bfloat16 numbers into huge floats...

ChipKerchner · 2025-10-31T12:19:08Z

Scalar casting should just work from bfloat16 to float. I don't see any issue. These are the qemu flags I use.

qemu-riscv64 -cpu rv64,g=true,f=true,d=true,c=true,v=true,vlen=256,elen=64,vext_spec=v1.0,zfh=true,zvfh=true,zvfbfwma=true,rvv_ma_all_1s=true,rvv_ta_all_1s=true,zbc=true,zvbc=true -L /home/ckerchner/tools/tt-riscv-toolchain-ae8a01f3/sysroot

ChipKerchner · 2025-10-31T12:50:06Z

Actually after I sync, I'm seeing a failure in sbgemm - sbgemv seems fine. BTW, I didn't write sbgemm.

martin-frbg · 2025-10-31T13:13:54Z

Thanks for the flags - unfortunately adding the missing ones did not change the outcome for me. And I'm getting SGEMV FAILURES: 789504 as well with that setup, while the BGEMM test passes (as do all float16 ones). Most likely your TT toolchain is more advanced, and I should just leave out the SB tests in this CI job for now?
I just noticed the use of plain (float) casts in some of the code, while the tests all go through sbf16tos() for conversions.

ChipKerchner · 2025-10-31T13:19:07Z

Are you saying that some architectures besides RISC-V are using plain casts to float while others are using an external function?

ChipKerchner · 2025-10-31T13:24:19Z

BTW, I tried an external function and I'm still getting failures.

martin-frbg · 2025-10-31T13:36:39Z

> Are you saying that some architectures besides RISC-V are using plain casts to float while others are using an external function?

No, on the contrary I see RISC-V using plain casts while everything else uses an external function.
And at least the first few intermediate calculations in the sbgemm_kernel_16x8_zvl256 seem to make more sense now that I've changed them from casts to using the float16to32 wrapper around sbf16tos as in the test helper header

ChipKerchner · 2025-10-31T15:48:59Z

Strange thing is SHGEMM uses the same type casting and all pass there.

martin-frbg · 2025-10-31T17:04:18Z

Yes, this got me thinking that maybe there is a conflict between the compiler having (or being expected to have) some "native" support for a floating point "bf16" type and OpenBLAS' fallback solution of assuming bfloat16 is a uint16_t.
Replacing all obvious casts with calls to the conversion function did not solve the test errors for me, however - a lot of the result matrix elements became similar enough to their SGEMM counterparts, but not all. And I have no way of finding out whether the cross-compiler is at fault, or qemu-riscv64 10.1 is not handling all aspects of bfloat16 correctly. My Banana Pi F3 does great for checking fp16 code but appears to lack support for the bfloat16 extensions.

ChipKerchner · 2025-10-31T17:39:33Z

Yes, unfortunately the BananaPi does NOT support the bf16 format.

Another weird thing is that the tests pass for sizes 1 -> 100 but fail for size = 256.

ChipKerchner · 2025-11-03T15:39:41Z

Are you sure you don't need to set this environment variable instead of LD_LIBRARY_PATH?

QEMU_LD_PREFIX=/proj_sw/user_dev/ckerchner/tmp/tt-riscv-toolchain-20250709/sysroot

martin-frbg · 2025-11-03T16:19:23Z

Agree that QEMU_LD_PREFIX would be more elegant than abusing LD_LIBRARY_PATH combined with the ugly hack of crosslinking the riscv64 ld-linux into the host system path. But unfortunately this has no bearing on the main issue that this toolchain (or the most recent stable qemu) appears to produce completely bogus intermediate results (in the 2e5 to 2e6 range) from __riscv_vfwmaccbf16_vf_f32m1(result0, B0,A0,gvl). I trust your statement that it works on actual hardware, but having this in the CI job is going to be useless if basically every matrix element gets flagged as wrong.

ChipKerchner · 2025-11-03T16:30:23Z

Actually I don't have actual HW for BF16 - it's all QEMU.

Maybe the initialization values should be between [-0.5, +0.5] for the test rather than [+0.5,+1.5]

martin-frbg · 2025-11-03T16:46:32Z

Curious - that would leave the toolchain difference if you're also using a regular release version of qemu.
No particular preference for the test values, but I note that the range provided by a simple rand/rand_max+0.5 worked well on all other platforms so far (and works for the BGEMM test too). Maybe the conversion between OpenBLAS' bfloat16 and the __bf16 type is doing something unexpected in clang-21.4+riscv-gnu-toolchain

ChipKerchner · 2025-11-03T17:26:10Z

Maybe additional extensions are required.

https://github.com/riscv/riscv-bfloat16/blob/main/doc/riscv-bfloat16-zvfbfwma.adoc

Zvfbfwma - Vector BF16 widening mul-add
This extension provides a vector widening BF16 mul-add instruction that accumulates into FP32.

This extension requires the Zvfbfmin extension and the Zfbfmin extension.

martin-frbg · 2025-11-03T21:23:47Z

Hmm, I had always assumed these to be implied by the zvfbfwma. And indeed adding them to the compiler options does not change anything (and they were already in the qemu options).

ChipKerchner · 2025-11-04T19:10:30Z

I have 2 ideas of why test_sbgemm/v is failing.

1. There is some conflict between BUILD_BFLOAT16 and BUILD_HFLOAT16
2. On RISC-V, BF16 type is __bf16 and not bfloat16. Maybe it affects the conversions?

Maybe we should only test BUILD_BFLOAT16 and see if it still fails.

ChipKerchner · 2025-11-04T19:21:51Z

This looks wrong in gemmkernel_2x2.c:

             C0[0] = TO_OUTPUT(TO_F32(C0[0])+res0);

C0 is already a float32 and TO_F32 converts BF16 -> F32.

The same in gemv_n/t.c

            y[iy] = TO_OUTPUT(ALPHA * temp + BETA * TO_F32(y[iy]));

y is already an F32.

P.S. FP16 seems correct for TO_F32

martin-frbg · 2025-11-04T19:38:13Z

> I have 2 ideas of why test_sbgemm/v is failing.
>
> 1. There is some conflict between BUILD_BFLOAT16 and BUILD_HFLOAT16
> 2. On RISC-V, BF16 type is __bf16 and not bfloat16. Maybe it affects the conversions?
>
> Maybe we should only test BUILD_BFLOAT16 and see if it still fails.

I don't think so - I see the same problem when building just BFLOAT16.
I had commented on the __bf16 type 4 days ago. In the sbgemm kernel, assignments from bfloat16 to this type produce an additional truncation of the value to just a single decimal digit, but so far I do not see why subsequent calculations then produce six-digit float32 results.

ChipKerchner · 2025-11-04T20:23:15Z

There shouldn't be a conversion from bfloat16 to __bf16. It should be picked up as a __bf16 - something like *(__bf16 *)(&B[0]) instead of a cast.

Though maybe it would be easy to change the pointer types to __bf16?

martin-frbg added 20 commits October 14, 2025 22:46

install qemu-user package directly

87c0cd2

add riscv64 elf loader&libraries package

466314f

add library and loader paths

08109f3

Update riscv64_vector.yml

822a7f3

Merge branch 'OpenMathLib:develop' into fixup5496

8412041

add sbgemm/shgemm test for zvl256b target

ec03b83

Enable relevant b/hfloat extensions in qemu cpu string

c1878de

comment out the sh/sb options and tests as they require a newer qemu

1ef1100

add local build of qemu-10.1.1

c26e223

typo fix

0a79085

Update riscv64_vector.yml

928e4e9

Update riscv64_vector.yml

211f1ed

fix gist link to qemu

f305d25

Update riscv64_vector.yml

aaf5329

Update riscv64_vector.yml

d2ec4e0

Update riscv64_vector.yml

99196a6

Update riscv64_vector.yml

18da9c3

Update riscv64_vector.yml

db84577

Update riscv64_vector.yml

8c234ce

fix gcc version

d5d0ce9

martin-frbg added 4 commits November 2, 2025 22:46

remove sbgemm/gemv tests for now

294bbf3

Merge branch 'OpenMathLib:develop' into fixup5496

a431012

remove bgemm test as well

43e5114

Update riscv64_vector.yml

6ba3c4b

Update riscv64_vector.yml

f88681f

remove the failing sbgemv test again

b9c3ec1

martin-frbg merged commit 3a9da52 into OpenMathLib:develop Nov 4, 2025
86 of 88 checks passed
