perf: Add arm64 decompression assembly #29
Only tested on QEMU.
Pull request overview
This PR adds an ARM64 assembly implementation of decompression to the minlz package. The main goal is to provide optimized decompression on ARM64, matching the existing AMD64 assembly support. The PR also adds comprehensive test coverage and updates the CI configuration to test on ARM64 runners.
Key changes:
- New ARM64 assembly decoder with fast and slow loops, NEON SIMD optimizations, and special handling for overlapping copies
- Comprehensive test suite with edge cases for overlapping copies, long offsets, and various data patterns
- CI updates to include ARM64 testing and newer toolchain versions
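To illustrate why overlapping copies need special handling, here is a minimal Go sketch of an LZ-style match copy. It is not the PR's assembly or the package's Go decoder; `copyMatch` and its parameters are hypothetical names used only for this example. When the offset is smaller than the length, the source range runs into bytes that the same copy is still producing, so a plain bulk copy would read stale data and the bytes have to be emitted in order.

```go
package main

import "fmt"

// copyMatch is a hypothetical helper, not part of minlz. It appends length
// bytes to dst, reading from offset bytes before the current end of dst.
func copyMatch(dst []byte, offset, length int) ([]byte, error) {
	if offset <= 0 || offset > len(dst) {
		return dst, fmt.Errorf("invalid offset %d with %d bytes decoded", offset, len(dst))
	}
	if length < 0 {
		return dst, fmt.Errorf("invalid length %d", length)
	}
	start := len(dst) - offset
	if offset >= length {
		// No overlap: the whole source range already exists, so a bulk copy is safe.
		return append(dst, dst[start:start+length]...), nil
	}
	// Overlap: later source bytes are written by this same copy, so copy
	// one byte at a time. This repeats the last offset bytes as a pattern.
	for i := 0; i < length; i++ {
		dst = append(dst, dst[start+i])
	}
	return dst, nil
}

func main() {
	// Offset 2, length 6 on "ab" repeats the two-byte pattern three more times.
	out, err := copyMatch([]byte("ab"), 2, 6)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", out)
}
```

A SIMD decoder typically cannot use wide loads and stores directly on the overlapping case, which is presumably why the assembly treats it separately from the fast path.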
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 18 comments.
| File | Description |
|---|---|
| decode_test.go | Adds comprehensive tests for ARM64 decoder covering edge cases, overlapping copies, random data, and regression testing |
| decode_other.go | Updates build tags to exclude ARM64 from the Go fallback implementation |
| decode_arm64.go | ARM64-specific wrapper that calls the assembly implementation with race detection support |
| asm_arm64.go | Go stub declaration for the ARM64 assembly decoder function |
| asm_arm64.s | Complete ARM64 assembly implementation with ~986 lines covering all decompression tag types and copy operations |
| .github/workflows/release.yml | Updates Go version to 1.25.x and goreleaser to 2.13.2 |
| .github/workflows/go.yml | Adds ARM64 runner, updates Go versions to 1.25.x, updates goreleaser, and expands fuzz test matrix |
This is great! I will do some testing with CockroachDB.
Also, I think GitHub actions support arm: https://github.com/orgs/community/discussions/148648
Already added: https://github.com/minio/minlz/pull/29/files#diff-678682767f2477de3e3c546746f8568b9a1942b2c647d32331d7e774b8ff8d9fR16
@RaduBerinde I would definitely be happy if you could verify that it is at least on par with the Go code. Probably the easiest would be to build the For example using this as a testset: (here multithreaded just hits memory bandwidth, but single thread should be fine).
This is on an Apple M1 laptop: I will also check on a GCE arm machine.
👍🏼 Small improvement. I will have some ARM hardware available for further investigation - I can test on a wider data set. Mostly just making sure it wasn't a regression.
On T2A (older): On C4A (newer): For comparison, this is what a recent x86 (C4) looks like: |
Only tested on QEMU.
See perf on real HW below.