F16 intrinsics standalone #5

Narsil · 2023-08-01T11:51:39Z

This is a very dirty PR, more a POC than anything else at this point.

It seems to work and to be correct (it passes every scenario I tried), and it is faster than without it.

The half-rs dependency points to a fork (VoidStarKat/half-rs#98) to get some intrinsics for pure f16 computation that do not currently exist.

I then hackishly added them into gemm:

I copy-pasted the existing f16 gemm code (which does f16 -> f32 SIMD -> matmul -> f16) to make a purely f16 -> f16 version.
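One numerical consequence of the two paths can be shown with a scalar model. The sketch below assumes the pure path keeps its running sum in f16, while the widened path accumulates in f32 and rounds once at the end. It stores f16 as raw binary16 bits in a `u16` (since `f16` is not a stable Rust type) with hand-written conversions; it does not use the `half` crate or gemm's kernels, and every function name here is hypothetical.

```rust
/// binary16 bits -> f32. Exact, since every f16 value is representable in f32.
fn f16_to_f32(h: u16) -> f32 {
    let sign = ((h as u32) & 0x8000) << 16;
    let exp = ((h >> 10) & 0x1f) as u32;
    let mant = (h & 0x3ff) as u32;
    match exp {
        // Zero or subnormal: value = mant * 2^-24 (exact in f32).
        0 => {
            let v = mant as f32 / 16_777_216.0;
            if sign != 0 { -v } else { v }
        }
        // Infinity or NaN.
        0x1f => f32::from_bits(sign | 0x7f80_0000 | (mant << 13)),
        // Normal: rebias the exponent from 15 to 127.
        _ => f32::from_bits(sign | ((exp + 112) << 23) | (mant << 13)),
    }
}

/// f32 -> binary16 bits with round-to-nearest, ties-to-even.
fn f32_to_f16(x: f32) -> u16 {
    let bits = x.to_bits();
    let sign = ((bits >> 16) & 0x8000) as u16;
    let exp = ((bits >> 23) & 0xff) as i32;
    let mant = bits & 0x7f_ffff;
    if exp == 0xff {
        // Infinity stays infinity; any NaN becomes a quiet NaN.
        let nan_bit = if mant != 0 { 0x200 } else { 0 };
        return sign | 0x7c00 | nan_bit;
    }
    let e = exp - 127 + 15; // exponent rebiased for f16
    if e >= 0x1f {
        return sign | 0x7c00; // too large: overflow to infinity
    }
    if e <= 0 {
        if e < -10 {
            return sign; // below half the smallest subnormal: rounds to +/-0
        }
        // Subnormal result: shift the full 24-bit significand into place.
        let m = mant | 0x80_0000;
        let shift = (14 - e) as u32;
        let mut h = m >> shift;
        let rem = m & ((1 << shift) - 1);
        let half = 1 << (shift - 1);
        if rem > half || (rem == half && (h & 1) == 1) {
            h += 1;
        }
        return sign | h as u16;
    }
    let mut h = ((e as u32) << 10) | (mant >> 13);
    let rem = mant & 0x1fff;
    if rem > 0x1000 || (rem == 0x1000 && (h & 1) == 1) {
        h += 1; // a carry into the exponent rounds up to the next binade (or infinity)
    }
    sign | h as u16
}

/// Widened path: convert inputs to f32, accumulate in f32, round once at the end.
fn dot_widened(a: &[u16], b: &[u16]) -> u16 {
    let mut acc = 0.0f32;
    for (&x, &y) in a.iter().zip(b) {
        acc += f16_to_f32(x) * f16_to_f32(y);
    }
    f32_to_f16(acc)
}

/// Pure f16 path: round the running sum to f16 after every multiply-add.
/// Computing each step in f32 then rounding approximates, but is not
/// guaranteed to match bit for bit, a hardware f16 FMA.
fn dot_pure_f16(a: &[u16], b: &[u16]) -> u16 {
    let mut acc: u16 = 0;
    for (&x, &y) in a.iter().zip(b) {
        acc = f32_to_f16(f16_to_f32(acc) + f16_to_f32(x) * f16_to_f32(y));
    }
    acc
}

fn main() {
    let ones = vec![f32_to_f16(1.0); 4096];
    // f32 accumulation reaches 4096 exactly. f16 accumulation stalls at 2048:
    // above 2048 the f16 spacing is 2, so 2048 + 1 ties and rounds to even (2048).
    println!("widened:  {}", f16_to_f32(dot_widened(&ones, &ones)));
    println!("pure f16: {}", f16_to_f32(dot_pure_f16(&ones, &ones)));
}
```

This is a worst case chosen to make the effect visible; real activations and weights usually lose far less, but it is the kind of check worth running against the f32-accumulating path.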

The code currently requires black_box for the compiler to be happy. This is most likely a mistake of mine in the half-rs intrinsics implementation (I used the asm! macro but do not understand how that affects the compiler).

I didn't re-optimize anything afterwards (e.g. making sure the blocking is adapted to cache lines, or anything of the sort).

Current results:

GGML without Accelerate (f32xf16) -> f32: 220ms (1 thread), 197ms (8 threads)
GEMM (f16xf16) -> f16: 136ms (1 thread), 68ms (8 threads)
M, N, K: 4096 x 128 x 11108
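To put these timings on a common scale, here is the throughput arithmetic as a small sketch. It assumes the standard 2·M·N·K flop count for a GEMM (one multiply and one add per inner-product term) and that each timing covers a single matmul of the stated shape; `gemm_flops` and `gflops` are hypothetical helpers, not part of gemm.

```rust
/// Flop count of C[m x n] = A[m x k] * B[k x n]: one multiply and one add per term.
fn gemm_flops(m: u64, n: u64, k: u64) -> u64 {
    2 * m * n * k
}

/// Throughput in GFLOP/s for a run that took `ms` milliseconds.
fn gflops(flops: u64, ms: f64) -> f64 {
    flops as f64 / (ms * 1e-3) / 1e9
}

fn main() {
    let flops = gemm_flops(4096, 128, 11108);
    println!("{flops} flops per matmul");
    // Timings as reported in this PR.
    for (label, ms) in [
        ("GGML f32xf16, 1 thread", 220.0),
        ("GGML f32xf16, 8 threads", 197.0),
        ("GEMM f16xf16, 1 thread", 136.0),
        ("GEMM f16xf16, 8 threads", 68.0),
        ("Accelerate (reference)", 25.0),
    ] {
        println!("{label}: {:.1} GFLOP/s", gflops(flops, ms));
    }
}
```

Under that assumption each call is about 11.6 GFLOP, so the f16 GEMM runs at roughly 86 GFLOP/s on 1 thread and 171 GFLOP/s on 8 threads.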

For reference, Accelerate seems to do ~25ms for the same op, and threading seems to decrease its performance, which I guess is because Accelerate already uses threading underneath.

~-25% overall: 97ms (1 thread), 52ms (8 threads).
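As a quick check of the ~-25% figure, assuming it is measured against the GEMM (f16xf16) -> f16 timings listed earlier (136ms and 68ms); `pct_change` is a hypothetical helper:

```rust
/// Relative change in percent from `before_ms` to `after_ms` (negative = faster).
fn pct_change(before_ms: f64, after_ms: f64) -> f64 {
    (after_ms - before_ms) / before_ms * 100.0
}

fn main() {
    // Baseline: the earlier GEMM f16 -> f16 timings.
    println!("1 thread:  {:+.1}%", pct_change(136.0, 97.0));
    println!("8 threads: {:+.1}%", pct_change(68.0, 52.0));
}
```

That gives about -28.7% on 1 thread and -23.5% on 8 threads, consistent with the rounded ~-25%.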

Narsil added 3 commits August 1, 2023 10:39

Using m1 intrinsics for f16xf16

c7a1ceb

Removing black box.

a8f0280

Cleanup.

c2d2173

Narsil requested a review from LaurentMazare August 1, 2023 11:51

Following @sarah-ek's advice, adding more registers helped!

c39304a
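One common way a GEMM microkernel puts more registers to use is a larger MR×NR accumulator tile: per step of the k loop it loads MR + NR values and performs MR·NR multiply-adds, so a bigger tile does more arithmetic per load. The scalar sketch below illustrates that idea under the assumption that this is what "more registers" refers to here; it is not gemm's kernel (which uses SIMD, packed panels and, in this PR, f16), and `matmul_naive`, `microkernel` and `matmul_tiled` are hypothetical names.

```rust
/// Naive reference: C = A * B with A m x k, B k x n, all row-major.
fn matmul_naive(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    let mut c = vec![0.0f32; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut s = 0.0f32;
            for p in 0..k {
                s += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = s;
        }
    }
    c
}

/// Register-blocked microkernel: computes the MR x NR tile of C at (i0, j0).
/// The accumulators live in a small fixed-size local array that the compiler
/// can keep in registers for the whole k loop.
fn microkernel<const MR: usize, const NR: usize>(
    a: &[f32], b: &[f32], c: &mut [f32], k: usize, n: usize, i0: usize, j0: usize,
) {
    let mut acc = [[0.0f32; NR]; MR];
    for p in 0..k {
        // Load MR elements of A and NR elements of B once...
        let a_col: [f32; MR] = std::array::from_fn(|r| a[(i0 + r) * k + p]);
        let b_row: [f32; NR] = std::array::from_fn(|s| b[p * n + j0 + s]);
        // ...and reuse them across MR * NR multiply-adds (a rank-1 tile update).
        for r in 0..MR {
            for s in 0..NR {
                acc[r][s] += a_col[r] * b_row[s];
            }
        }
    }
    for r in 0..MR {
        let row = (i0 + r) * n + j0;
        c[row..row + NR].copy_from_slice(&acc[r]);
    }
}

/// Tiles the whole output with the microkernel.
fn matmul_tiled<const MR: usize, const NR: usize>(
    a: &[f32], b: &[f32], m: usize, k: usize, n: usize,
) -> Vec<f32> {
    // Edge tiles are omitted to keep the sketch short.
    assert!(m % MR == 0 && n % NR == 0, "sketch assumes m, n divisible by the tile");
    let mut c = vec![0.0f32; m * n];
    for i0 in (0..m).step_by(MR) {
        for j0 in (0..n).step_by(NR) {
            microkernel::<MR, NR>(a, b, &mut c, k, n, i0, j0);
        }
    }
    c
}

fn main() {
    let (m, k, n) = (8, 5, 12);
    let a: Vec<f32> = (0..m * k).map(|x| (x % 7) as f32).collect();
    let b: Vec<f32> = (0..k * n).map(|x| (x % 5) as f32 - 2.0).collect();
    let reference = matmul_naive(&a, &b, m, k, n);
    // 4x4 tile: 16 multiply-adds per 8 loads; 8x12 tile: 96 per 20 loads.
    assert_eq!(matmul_tiled::<4, 4>(&a, &b, m, k, n), reference);
    assert_eq!(matmul_tiled::<8, 12>(&a, &b, m, k, n), reference);
    println!("tiled results match the naive reference");
}
```

The upper bound on the tile is the register file: once MR·NR accumulators plus the loaded A and B values no longer fit, the compiler spills to the stack and the gain disappears.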


LaurentMazare approved these changes Aug 1, 2023


Narsil merged commit e7ef6f9 into main Aug 1, 2023

Narsil mentioned this pull request Aug 3, 2023

Apple silicon (MPS backends) support? huggingface/candle#313 (Closed)
