
Conversation

Narsil commented Jul 6, 2023

Hi there.

I am essentially attempting to port the ggml matrix multiplication into a standalone crate: https://github.com/Narsil/ggblas

For most of the operations, I was able to leverage intrinsics: https://doc.rust-lang.org/core/arch/arm/index.html
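
For illustration, a minimal sketch (not ggblas code) of the kind of kernel those existing intrinsics already cover for f32:

```rust
// Minimal sketch, not ggblas code: an f32 dot product using the stable
// core::arch::aarch64 NEON intrinsics that already cover this case.
#[cfg(target_arch = "aarch64")]
use core::arch::aarch64::{vaddvq_f32, vdupq_n_f32, vfmaq_f32, vld1q_f32};

#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "neon")]
pub unsafe fn dot_f32(a: &[f32], b: &[f32]) -> f32 {
    let len = a.len().min(b.len());
    let chunks = len & !3; // four f32 lanes per NEON register
    let mut acc = vdupq_n_f32(0.0);
    let mut i = 0;
    while i < chunks {
        let va = vld1q_f32(a.as_ptr().add(i));
        let vb = vld1q_f32(b.as_ptr().add(i));
        acc = vfmaq_f32(acc, va, vb); // acc += va * vb, per lane
        i += 4;
    }
    // Horizontal sum of the four partial sums, then the scalar tail.
    let mut sum = vaddvq_f32(acc);
    for j in chunks..len {
        sum += a[j] * b[j];
    }
    sum
}
```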
However, for the M1 (ARM aarch64), core::arch is missing some of the SIMD f16 intrinsics:

https://developer.arm.com/documentation/101028/0012/13--Advanced-SIMD--Neon--intrinsics
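
By contrast, here is a minimal sketch of how one of the missing operations could be written today with inline assembly; the name vfmaq_f16 mirrors the C intrinsic, and carrying the lanes as uint16x8_t bit patterns is an assumption, since Rust has no stable SIMD f16 vector type:

```rust
// Minimal sketch of one missing operation, expressed with inline assembly
// until core::arch grows a proper intrinsic. The name mirrors the C
// intrinsic vfmaq_f16; passing the lanes as uint16x8_t bit patterns is an
// assumption, since Rust has no stable SIMD f16 vector type.
#[cfg(target_arch = "aarch64")]
use core::arch::{aarch64::uint16x8_t, asm};

#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "fp16")]
pub unsafe fn vfmaq_f16(mut acc: uint16x8_t, a: uint16x8_t, b: uint16x8_t) -> uint16x8_t {
    // acc[i] += a[i] * b[i] over eight IEEE binary16 lanes.
    asm!(
        "fmla {acc:v}.8h, {a:v}.8h, {b:v}.8h",
        acc = inout(vreg) acc,
        a = in(vreg) a,
        b = in(vreg) b,
        options(pure, nomem, nostack),
    );
    acc
}
```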

I am not sure the approach I suggest here is viable; my understanding of low-level primitives such as these is fairly limited.

Happy to add a more complete set of operations if this is indeed deemed interesting.

It seems the proper implementation in the compiler itself would be something like rust-lang/stdarch#344.

That's why I felt the intrinsics would have their place here.

Cheers!

Other refs: rust-lang/rfcs#3451

HuggingFace-MacMini-Wozniak and others added 5 commits July 6, 2023 22:00
Narsil changed the title from "[Tentative] Adding new intrinsics for ggblas." to "[Tentative] Adding new intrinsics for gemm." on Aug 1, 2023
VoidStarKat (Owner) commented Aug 5, 2023

I'm fine with putting these in the crate. Please make sure they don't overlap with the existing aarch64 assembly already in the crate, and if there is any overlap, update the existing code to use the new names.

However, I don't want to publicly expose the binary16 module; that's an internal structural implementation detail. Perhaps just expose these at half::arch::aarch64?
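
For example, a hypothetical layout (the internal binary16 path is assumed, not necessarily the crate's actual structure) that keeps the implementation private and only re-exports it there:

```rust
// Hypothetical layout; the internal `binary16` path below is an assumption
// used only to illustrate keeping the implementation private while
// re-exporting the intrinsics at half::arch::aarch64.
pub mod arch {
    #[cfg(target_arch = "aarch64")]
    pub mod aarch64 {
        // Users only ever see these as half::arch::aarch64::*.
        pub use crate::binary16::arch::aarch64::*;
    }
}
```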
