RISC-V Vector 1.0 Support: If and where to start? #772
Thanks for your interest in the topic. We don't have a fixed roadmap. However, we're interested in supporting as many platforms as possible, as long as we can maintain that support. RISC-V checks all those boxes; we already build and test on RISC-V. Next steps that would be great to tackle are …
In particular, the infrastructure together with a first kernel for RISC-V would be great. More optimized kernels should be much easier to add afterwards. Depending on the compiler, simply having the extension and a vector machine available may already be very beneficial. That needs benchmarking, of course.
What compiler are you using? I'm told GCC 14 supports RISC-V vector instructions.
FYI, to avoid duplicated work: I'm starting to implement some of the kernels, in alphabetical order. I haven't worked with the project before, so I'm unfamiliar with its build structure and CI.
Quick update: I'm now about halfway through the kernels.
The GNU Radio (GR) use case is probably mostly in the 1k to 10k element range, though this obviously varies. Our default benchmark uses 2^17-1 elements; that is typically too large, but it's a historical artifact. Since your changes would require a fairly recent compiler, I suggest …
Good to see @camel-cdr here. The DVB-T2 transmitter in GNU Radio uses quite a few kernels with fairly large vectors. I also have "bit-perfect" test files for the example flow graphs (although for floating point, you have to compare with some margin). UPDATE: The DVB-T2 flow graph I'm considering uses pretty big vectors: 32768 * 19 = 622,592 complex elements (1,245,184 floats). Let me know if you want to use that strategy for testing, and I'll set you up with a set of test files. Also, there's some discussion about infrastructure in #625.
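Because floating-point output can't be expected to match reference files bit for bit, the comparison needs a margin. A minimal sketch in C of such a check, assuming a combined absolute/relative tolerance; the function name `first_mismatch` and the tolerance scheme are illustrative, not part of VOLK or of the test files mentioned above. Interleaved complex data can be passed as a float array of twice the length.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: compare a kernel's output against reference data
 * with a margin. An element passes if
 *     |out - ref| <= abs_tol + rel_tol * |ref|.
 * Returns the index of the first failing element, or n if all pass.
 * The tolerance values are chosen by the caller. */
static size_t first_mismatch(const float *out, const float *ref, size_t n,
                             float abs_tol, float rel_tol)
{
    assert(abs_tol >= 0.0f && rel_tol >= 0.0f);
    for (size_t i = 0; i < n; i++) {
        float diff = fabsf(out[i] - ref[i]);
        /* Written as !(a <= b) so that a NaN in the output also fails,
         * since every comparison involving NaN is false. */
        if (!(diff <= abs_tol + rel_tol * fabsf(ref[i])))
            return i;
    }
    return n;
}
```

A purely relative tolerance breaks down for reference values near zero, which is why the sketch adds an absolute floor.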
Obviously, there's no one-size-fits-all. @drmpeg, these are quite large, typical DVB values. I hope that most kernels perform comparably well. I suppose testing short vectors, vectors that roughly fill the L1 cache, etc. makes the most sense.
As it turns out, I was mistaken. Now that I remember what I implemented, the vector size is only 32768 complex elements.
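To turn "short" and "fills the L1 cache" into concrete test sizes, a small helper can compute how many elements per buffer fit when all of a kernel's input and output streams must fit in the cache together. This is a hypothetical helper; the cache size is a parameter, and the 32 KiB used below is an assumed value, not a measured property of any particular board.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: elements per buffer such that n_streams buffers
 * (inputs plus outputs) of bytes_per_elem-sized elements together fit in
 * cache_bytes. Integer division rounds down, so the result always fits. */
static size_t elems_fitting_cache(size_t cache_bytes, size_t bytes_per_elem,
                                  size_t n_streams)
{
    assert(bytes_per_elem > 0 && n_streams > 0);
    return cache_bytes / (bytes_per_elem * n_streams);
}
```

For example, with an assumed 32 KiB L1 data cache, a kernel with two complex-float inputs and one complex-float output (8 bytes per element, 3 streams) gets `elems_fitting_cache(32768, 8, 3)`, i.e. 1365 elements. For scale, one buffer of 32768 complex floats is 256 KiB, and the default 2^17-1 elements of a 32-bit float kernel take just under 512 KiB per buffer, so both sizes exercise memory well beyond such an L1.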
I asked about the input size because I'm writing the kernels to maximize LMUL without causing register spills. For benchmarking I was just planning to run volk_profile, but if there is something else I can easily test, I'd be interested. One annoyance is that the RISC-V toolchain doesn't provide a way to add single extensions with a command-line argument; you can only set … The machine definitions could look something like this:

<machine name="rv64gcv">
  <archs>generic riscv64 rvv orc|</archs>
</machine>
<!--machine name="rva22v">
  <archs>generic riscv64 rvv rvb rva22v orc|</archs>
</machine>
<machine name="rva23">
  <archs>generic riscv64 rvv rvb rva22v rva23 orc|</archs>
</machine-->

RVA22 and RVA23 are profiles, but …
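The trade-off behind maximizing LMUL without spilling can be illustrated with a portable C model of an RVV strip-mined loop. It uses no RVV intrinsics; it only mimics how `vsetvl` picks the vector length, using the simplest policy the spec permits, vl = min(avl, VLMAX), with VLMAX = LMUL × VLEN / SEW. The `VLEN_BITS` value of 256 and the names `vlmax` and `add_strip_mined` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: vector register width (VLEN) of the target, in bits. */
#define VLEN_BITS 256

/* Maximum elements one vector instruction processes for a given register
 * grouping (LMUL) and element width (SEW, in bits). */
static size_t vlmax(size_t lmul, size_t sew_bits)
{
    return lmul * VLEN_BITS / sew_bits;
}

/* Hypothetical scalar model of an RVV float add: each outer iteration
 * stands for one vsetvl plus one vector add over vl elements. Returns the
 * number of outer iterations (chunks) needed. */
static size_t add_strip_mined(float *c, const float *a, const float *b,
                              size_t n, size_t lmul)
{
    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    const size_t max = vlmax(lmul, 32); /* SEW = 32 for float */
    size_t iters = 0;
    while (n > 0) {
        size_t vl = n < max ? n : max; /* vl = min(avl, VLMAX) */
        for (size_t i = 0; i < vl; i++) /* one vector instruction */
            c[i] = a[i] + b[i];
        a += vl;
        b += vl;
        c += vl;
        n -= vl;
        iters++;
    }
    return iters;
}
```

With VLEN = 256 and 32-bit floats, LMUL = 8 gives VLMAX = 64, so 1000 elements take 16 iterations instead of 125 at LMUL = 1. The catch is that at LMUL = 8 only four register groups (v0, v8, v16, v24) are addressable, so a kernel that keeps more live values than that will spill; hence the goal of the largest LMUL that still fits.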
For x86, we do … Is …
Your machine definitions look sane to me. My gut feeling is that we need to get started with RISC-V kernels; potentially, we'll need to reorganize (or extend) our support code if we find that our approach doesn't work long-term. For now, I'd encourage you to do what you think makes the most sense.
I’ll just add that the rva22u64 profile also includes bit-manipulation instructions, which some of the kernels might be able to use (e.g. there’s a popcount instruction).
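As a small illustration of where a scalar popcount instruction could help, here is a popcount over a buffer written with the GCC/Clang builtin `__builtin_popcountll`. Whether the compiler actually emits Zbb's `cpop` depends on the `-march` string including Zbb (rva22u64 mandates it), so treat that lowering as compiler-dependent. `count_set_bits` is a hypothetical name, not an existing VOLK kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: total number of set bits in n 64-bit words.
 * With Zbb enabled, GCC/Clang can lower __builtin_popcountll to a single
 * cpop instruction; otherwise they emit a software sequence or a libgcc
 * call. */
static uint64_t count_set_bits(const uint64_t *words, size_t n)
{
    assert(words != NULL || n == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (uint64_t)__builtin_popcountll(words[i]);
    return total;
}
```

The vector counterpart, `vcpop.v`, comes with Zvbb, which the rva23 target mentioned below also includes.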
Yeah, I didn't want to create too many different targets, so I chose base rvv, rva22+v, and rva23, which also includes Zvbb. I've also created a pseudo-target, rvvseg, that uses segmented loads/stores for complex numbers; it's separate because segmented accesses aren't fast on all current hardware (e.g. the C910). The regular rvv target instead uses vnsrl to deinterleave the complex number components. I'll try to get it ready for a PR this weekend.
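The vnsrl deinterleaving trick can be modeled in portable C to show why it works. Each interleaved (re, im) pair of 32-bit floats is treated as one 64-bit element; a narrowing shift right by 0 keeps the low half (the real part on a little-endian machine) and a shift by 32 keeps the high half (the imaginary part). This is a scalar sketch of the idea, not the rvv target's actual code; `deinterleave_nsrl` is a hypothetical name, and the code assumes a little-endian host, which it checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Hypothetical scalar model: split n interleaved complex floats into
 * separate real and imaginary arrays using 64-bit "wide" elements and
 * two narrowing shifts, as vnsrl does per lane. */
static void deinterleave_nsrl(const float *iq, float *re, float *im, size_t n)
{
    assert(host_is_little_endian()); /* RISC-V is little-endian */
    for (size_t i = 0; i < n; i++) {
        uint64_t pair;
        memcpy(&pair, &iq[2 * i], sizeof pair); /* load as SEW=64 */
        uint32_t lo = (uint32_t)(pair >> 0);    /* vnsrl by 0: real */
        uint32_t hi = (uint32_t)(pair >> 32);   /* vnsrl by 32: imag */
        memcpy(&re[i], &lo, sizeof lo);
        memcpy(&im[i], &hi, sizeof hi);
    }
}
```

A segmented load (`vlseg2e32.v`) produces the same two register groups in a single instruction, which is what the rvvseg pseudo-target relies on; the narrowing-shift route costs extra shift instructions but does not depend on segmented accesses being fast.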
Hello,
RISC-V's Vector extension was ratified a few years back, and vector-capable boards have recently come out, many based on the octa-core SpacemiT K1/M1. I have been using one such board for a while now, with GNU Radio and more, but the performance is quite lacking: when profiling with volk_profile, the generic implementation is around an order of magnitude faster than the alternatives.
All that to say: I think RVV 1.0 has many instructions useful for VOLK, and I am willing to help, but I have no idea where to start.
If this is in the project's plans, is there a roadmap?