-
Notifications
You must be signed in to change notification settings - Fork 118
chore(l2): use SP1 patched crate for ecmul precompile #5133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Lines of code reportTotal lines added: Detailed view |
Benchmark Results ComparisonNo significant difference was registered for any benchmark run. Detailed ResultsBenchmark Results: BubbleSort
Benchmark Results: ERC20Approval
Benchmark Results: ERC20Mint
Benchmark Results: ERC20Transfer
Benchmark Results: Factorial
Benchmark Results: FactorialRecursive
Benchmark Results: Fibonacci
Benchmark Results: FibonacciRecursive
Benchmark Results: ManyHashes
Benchmark Results: MstoreBench
Benchmark Results: Push
Benchmark Results: SstoreBench_no_opt
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors BN254 elliptic curve precompile functions (ecmul and ecpairing) by extracting common parsing and validation logic into reusable helper functions. The changes improve code organization and maintainability by separating concerns.
Key Changes:
- Extracted BN254 point parsing logic into
parse_bn254_g1,parse_bn254_g2, andparse_bn254_scalarhelper functions - Split validation logic from
validate_bn254_coordsinto separatevalidate_bn254_g1_coordsandvalidate_bn254_g2_coordsfunctions - Added SP1 feature-gated implementation of
bn254_g1_mulusing substrate_bn library
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
ilitteri
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks good. I'll run some replays to ensure it works properly. Left some comments.
| #[expect(unsafe_code, reason = "chunks_exact ensures the conversion is valid")] | ||
| let input: [u8; 192] = unsafe { input.try_into().unwrap_unchecked() }; | ||
| let (g1, g2) = parse_bn254_coords(&input); | ||
| if validate_bn254_coords(&g1, &g2)? { | ||
| batch.push((g1, g2)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we remove this unsafe? We should double-check with @edg-l, @azteca1998, or @Oppen if this is not counterproductive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we keep this behaviour when neither sp1 nor risc0 features are enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function to parse bn254 points was changed to take a slice instead of a reference to a fixed size array (so it can be reused for precompiles whose input have different sizes), so this unsafe is not necessary anymore.
The branching on the error case (which should never happen) could signify a perf. hit, I could change this to an unreachable_unchecked!() and it should be equivalent to before the change, but I'm not sure how impactful that can be (consider that we have other, more evident optimization opportunities, like not cloning and slicing bytes many times while parsing and converting to the crypto primitive types)
**Motivation** The ecmul precompile represents 10% of the total proving cycles, and we are not using the corresponding SP1 patched crate for this operation that uses a zkVM precompile. This PR fixes that and reduces 81k cycles (86% of ecmul, 10% of total) **Flamegraphs** before: <img width="2936" height="1134" alt="image" src="https://github.com/user-attachments/assets/42212408-8f2e-4dd4-80ea-c9b87a8383ef" /> after: <img width="2716" height="1220" alt="image" src="https://github.com/user-attachments/assets/f82e25f0-a7cb-43e9-ae4d-307a39bd9d0a" /> **Proving times** Mainnet blocks proven in a RTX 4090 | Block number | main | l2/ecmul_sp1 | |---------------|---------|----------------| | 23426993 | 13m 19s | 13m 20s | | 23426994 | 08m 42s | 08m 05s | | 23426995 | 06m 39s | 06m 35s | | 23426996 | 11m 24s | 10m 50s | --------- Co-authored-by: Copilot <[email protected]> Co-authored-by: Ivan Litteri <[email protected]> Co-authored-by: ilitteri <[email protected]>
Motivation
The ecmul precompile represents 10% of the total proving cycles, and we are not using the corresponding SP1 patched crate for this operation that uses a zkVM precompile.
This PR fixes that and reduces 81k cycles (86% of ecmul, 10% of total)
Flamegraphs

before:
after:

Proving times
Mainnet blocks proven in a RTX 4090