
@davxy
Member

@davxy davxy commented Oct 28, 2025

IIUC, someone will soon need to benchmark the performance gains from using EC hostcalls in the PoP project.
This makes it easier to instantiate an ark-vrf Bandersnatch suite that takes advantage of those hostcalls.

@davxy davxy added the dependencies label Oct 28, 2025
@burdges

burdges commented Oct 28, 2025

We should do another PR here and in https://github.com/paritytech/arkworks-substrate that switches these projective calls back to affine. This was a moment of stupidity on my part when I told the original dev to do this, sorry everyone.

It should simplify the code since affine points are easier to serialize. Anyone who needs this for performance on local networks can use it as is.

@davxy davxy requested a review from a team October 28, 2025 14:49
@davxy davxy marked this pull request as ready for review October 28, 2025 14:49
@davxy davxy added T17-primitives and removed dependencies labels Oct 28, 2025
@davxy
Member Author

davxy commented Oct 28, 2025

> We should do another PR here and in https://github.com/paritytech/arkworks-substrate that switches these projective calls back to affine. This was a moment of stupidity on my part when I told the original dev to do this, sorry everyone.

@burdges I see what you mean. Why don't you want to serialize projective? What is the drawback?

@burdges

burdges commented Oct 28, 2025

A conversion projective to affine is free by adding a coordinate that's just 1, while affine to projective costs a division: not so bad, but not free, though not necessarily worth having a hostcall for either. As a rule, regular code leaves everything in projective until it needs affine for serialization, hashing, or pairings, hence the use of projective here. There are however three problems with doing this:

First, affine to projective costs way more in the runtime than in native, so it's better to have done it already if we immediately do serialization, hashing, or pairings. I now think hashing or pairings would almost always occur immediately after doing an MSM, so it's easier to simply return affine.

Any time we wrap an arkworks function that returns projective, the affine to projective conversion is free, and if no more EC operations occur, converting back from projective to affine remains free, since the conversion checks whether the extra coordinate is still 1.

Second, affine coordinates have a semi-standard representation related to serialization, but projective coordinates have no standard, since they're an implementation detail. I asked whether anyone felt the projective coordinates might change, but never quite liked the answers, so we face some risk that arkworks changes its internal projective coordinates.

Third, there are multiple libraries that implement the same curves. In fact, even affine points often need minor conversion when passing between libraries, since they're not quite serialized yet. We have good examples of how affine conversion between the different libraries works, though:
https://github.com/search?q=repo%3AMystenLabs%2Ffastcrypto+conversion+arkworks&type=pullrequests

We could improve performance by switching from arkworks to blst, but still make everything look like arkworks to the runtime, like we do with curve25519-dalek, for example. We'd have to figure all the conversion shit out ourselves for projective, which sounds more complex than for affine.

It's more likely that someone writes wrappers for our hostcalls using the traits in RustCrypto, Zcash, or some blst crate; maybe someone does this with curve25519-dalek too. Again, this could require figuring out conversions, which sounds more complex than for affine.

@davxy
Member Author

davxy commented Oct 28, 2025

> A conversion projective to affine is free by adding a coordinate that's just 1, while affine to projective costs a division,

I imagine you mean the other way around :-)
Converting from affine to projective is free, since you just set the z coordinate to 1,
while converting from projective to affine requires a division.

And that is the main reason I asked "why you want to do that"
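The cost asymmetry described above (affine to projective just sets Z = 1, projective to affine needs a division) and the "is Z still 1?" shortcut can be sketched in a few lines of Python over a toy prime field. This is not arkworks code: the modulus `P`, the `STATS` counter, and all function names are hypothetical, and `rescale` merely stands in for any group operation that leaves Z != 1.

```python
# Toy model of affine vs projective coordinates over a prime field.
# P, STATS and all function names are hypothetical, for illustration only.
P = 1_000_003  # small prime modulus; real curves use ~255-bit primes

STATS = {"inversions": 0}  # counts the expensive operation


def inv(a: int) -> int:
    """Field inversion via Fermat's little theorem (counted: it is the costly step)."""
    STATS["inversions"] += 1
    return pow(a, P - 2, P)


def to_projective(pt):
    """Affine (x, y) -> projective (X, Y, Z): free, just set Z = 1."""
    x, y = pt
    return (x, y, 1)


def to_affine(pt):
    """Projective (X, Y, Z) -> affine (X/Z, Y/Z): one inversion unless Z == 1."""
    X, Y, Z = pt
    if Z == 1:  # fast path: already normalized, no division needed
        return (X, Y)
    z_inv = inv(Z)
    return (X * z_inv % P, Y * z_inv % P)


def rescale(pt, lam: int):
    """Another representative (lam*X : lam*Y : lam*Z) of the same point, standing in
    for the output of a group operation, which generally has Z != 1."""
    X, Y, Z = pt
    return (X * lam % P, Y * lam % P, Z * lam % P)


# Round trip: projective -> affine -> projective -> affine.
r = rescale(to_projective((123, 456)), 789)  # Z != 1, as after an addition
a1 = to_affine(r)                            # pays the single inversion
a2 = to_affine(to_projective(a1))            # Z == 1 again, so this one is free
```

Only the first conversion pays for an inversion; once a result has been normalized (for example by the host before returning it), converting to projective and back hits the Z == 1 fast path.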

> Second, affine coordinates have a semi-standard representation

I agree that the standard encoding is better, but since this is purely internal data marshaling, I'm not sure it's worth the overhead. E.g. for a product there will be 2 divisions: one to enter the host call and another to return the result.

> Third, there are multiple libraries that implement the same curves...

This (IIUC your reasoning) makes sense. You mean that if we ever switch libraries in the runtime, then by using affine coordinates in the ABI between the host and the runtime, we're more likely to have a form that's already compatible.

@davxy
Member Author

davxy commented Oct 28, 2025

I'd like to add that the code change to use affine is trivial and I can do it in this PR. I just need to better understand the argument and weigh the benefits against the drawbacks.

@burdges

burdges commented Oct 29, 2025

I'd keep it as a separate PR if that's just as easy, but whatever you like.

> I imagine you mean the other way around :-)

Yup oops, affine to projective is free.

> I agree that the standard encoding is better, but since this is purely internal data marshaling, I'm not sure it's worth the overhead.

As I said backwards above: as projective to affine costs way more in the runtime than in native, ..., it's easier to simply return affine always.

> E.g. for a product there will be 2 divisions: one to enter the host call and another to return the result

That's not the correct analysis: We have projective + projective -> projective and affine + projective -> projective, which give us scalar * projective -> projective and scalar * affine -> projective, respectively. Affine inputs should be faster, but projective inputs are faster than converting to affine plus doing the affine version.
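A minimal sketch of the addition-level claim, counting field multiplications on a toy twisted Edwards curve in Python. It uses the plain projective addition formula that the Explicit-Formulas Database lists as add-2008-bbjlp; arkworks' twisted Edwards code uses extended coordinates, so its exact counts differ. The modulus, curve coefficients, and helper names are hypothetical and are not Bandersnatch's. When the second input is affine (Z2 = 1), the product Z1*Z2 drops out, so the mixed addition does one multiplication less; `to_affine` is deliberately left uncounted.

```python
# Toy twisted Edwards curve  a*x^2 + y^2 = 1 + d*x^2*y^2  over F_P.
# P, A_COEF, D_COEF and the helper names are hypothetical, chosen only so the
# addition law is complete (a is a square, d is a non-square).
P = 1_000_003  # prime with P % 4 == 3
A_COEF = 1
D_COEF = next(d for d in range(2, P) if pow(d, (P - 1) // 2, P) == P - 1)

MULS = {"count": 0}  # field multiplications done by the counted formulas


def fmul(x: int, y: int) -> int:
    MULS["count"] += 1
    return x * y % P


def _add_core(zz, x1, y1, x2, y2):
    # Projective twisted Edwards addition (EFD "add-2008-bbjlp");
    # zz is Z1*Z2, or just Z1 when the second input is affine (Z2 == 1).
    b = fmul(zz, zz)
    c = fmul(x1, x2)
    d = fmul(y1, y2)
    e = fmul(fmul(D_COEF, c), d)
    f, g = (b - e) % P, (b + e) % P
    x3 = fmul(fmul(zz, f), (fmul((x1 + y1) % P, (x2 + y2) % P) - c - d) % P)
    y3 = fmul(fmul(zz, g), (d - fmul(A_COEF, c)) % P)
    return (x3, y3, fmul(f, g))


def add_proj_proj(p1, p2):
    """projective + projective -> projective (13 counted multiplications)."""
    (x1, y1, z1), (x2, y2, z2) = p1, p2
    return _add_core(fmul(z1, z2), x1, y1, x2, y2)


def add_proj_affine(p1, q):
    """projective + affine -> projective: Z2 == 1 skips Z1*Z2 (12 multiplications)."""
    (x1, y1, z1), (x2, y2) = p1, q
    return _add_core(z1, x1, y1, x2, y2)


def to_affine(pt):
    """Projective -> affine with one inversion (not counted here)."""
    x, y, z = pt
    z_inv = pow(z, -1, P)
    return (x * z_inv % P, y * z_inv % P)


def find_point(x_start: int):
    """First affine point with x >= x_start; square roots use P % 4 == 3."""
    for x in range(x_start, P):
        y2 = (1 - A_COEF * x * x) * pow((1 - D_COEF * x * x) % P, -1, P) % P
        y = pow(y2, (P + 1) // 4, P)
        if y * y % P == y2:
            return (x, y)
```

A field inversion costs far more than one multiplication, so converting a single projective input to affine just to use the mixed formula does not pay off. Over a full scalar multiplication the one-multiplication saving repeats on every addition, so whether it outweighs a single inversion is something to benchmark rather than read off this count.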

We never have .. -> affine without a conversion, so yes, my proposal makes the operation itself more expensive, but we'll frequently convert after doing these operations, if only for hashing. If the host runs maybe 6x faster than the runtime, then we break even if we convert after 1 in 6 operations, so whenever we convert more often than that, the seemingly expensive way will wind up cheaper.
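The break-even arithmetic above can be written out as a small cost model. This is a sketch under the comment's own assumption of a roughly 6x host/runtime speed ratio; costs are in arbitrary units of one host-side conversion, and the function names are hypothetical.

```python
from fractions import Fraction


def conversion_cost_per_op(convert_fraction: Fraction, runtime_slowdown: int):
    """Expected projective->affine conversion cost per group operation,
    in units of one conversion done in the host.

    always_affine: the host normalizes every result, so every call pays 1.
    projective:    only results that later get converted pay, but they pay
                   in the runtime, which is `runtime_slowdown` times slower.
    """
    always_affine = Fraction(1)
    projective = convert_fraction * runtime_slowdown
    return always_affine, projective


def break_even_fraction(runtime_slowdown: int) -> Fraction:
    """Fraction of operations followed by a conversion at which both options cost the same."""
    return Fraction(1, runtime_slowdown)
```

Above the 1-in-6 mark, always returning affine is cheaper; since conversions are expected to be frequent, if only for hashing, the fraction is likely to sit above it.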

Also, projective->affine->projective->affine only costs the first projective->affine, which runs in the host. It follows that all our return values should be affine, even if we immediately convert them to projective and back.

Alright that's return values, but you asked about inputs too.

Now <G as ScalarMul>::MulBase is affine, so MSMs always consume affine, because the

A correct analysis would be: if MSMs always consume affine, then you could turn the above remarks into an argument that we should convert from projective to affine in the host, not the runtime. This is true. It's impossible though, since we must support existing msm methods that all consume affine, i.e. all existing arkworks code would still convert into affine in the runtime, no matter what we do here.

We could only add functions to the arkworks interface and hope people use them, but the dependency injection msm calls that Achim added to TECurveConfig and SWCurveConfig already consume affine, so even our own code requires affine here. It follows that all MSM inputs should be affine.

As an aside, CurveGroup::normalize_batch does arbitrarily many projective-to-affine conversions using only one division, but still costs 2n multiplications.
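The batch conversion mentioned above rests on Montgomery's batch-inversion trick: invert the product of all Z coordinates once, then peel off the individual inverses with multiplications. The Python sketch below shows the technique over a toy prime field; it is not arkworks' `normalize_batch` code, and `P`, `STATS`, and the function names are hypothetical. In this textbook form, n inversions become one inversion plus 3(n - 1) multiplications, and scaling X and Y costs 2 more multiplications per point.

```python
# Montgomery's batch-inversion trick over a toy prime field.
# P, STATS and the function names are hypothetical, for illustration only.
P = 1_000_003
STATS = {"mul": 0, "inv": 0}


def fmul(a: int, b: int) -> int:
    STATS["mul"] += 1
    return a * b % P


def finv(a: int) -> int:
    STATS["inv"] += 1
    return pow(a, P - 2, P)


def batch_invert(zs):
    """Invert every (non-zero) element of zs with one inversion and 3(n-1) multiplications."""
    n = len(zs)
    if n == 0:
        return []
    prefix = [zs[0]]
    for z in zs[1:]:
        prefix.append(fmul(prefix[-1], z))  # prefix[i] = z0 * ... * zi
    acc = finv(prefix[-1])                  # the only inversion: 1/(z0 * ... * z_{n-1})
    out = [0] * n
    for i in range(n - 1, 0, -1):
        out[i] = fmul(acc, prefix[i - 1])   # 1/zi = 1/(z0..zi) * (z0..z_{i-1})
        acc = fmul(acc, zs[i])              # acc becomes 1/(z0..z_{i-1})
    out[0] = acc
    return out


def normalize_batch(points):
    """Projective (X, Y, Z) -> affine (X/Z, Y/Z) for a whole slice at once."""
    z_invs = batch_invert([z for _, _, z in points])
    return [(fmul(x, zi), fmul(y, zi)) for (x, y, _), zi in zip(points, z_invs)]
```

Only the single `finv` call performs a real inversion, which is why batching pays off as soon as several results need converting at once.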

Alright, imho the above arguments settle the returns case and the MSM input case, without appeal to stability concerns. We've one final case though: single scalar mult inputs. Should we do scalar * projective -> affine or scalar * affine -> affine or both?

We support scalar * point -> point because even one per tx adds up quickly. Yet, if verifier code does scalar * point -> point calls then either (a) you care little that they are slow for some reason, or (b) you should optimize it to use MSMs, maybe doing batch verification.

Imho we could've justified risks for MSMs, but here performance matters less and maintenance costs matter more, including the projective format risks, so only doing scalar * affine -> affine makes sense.

I've made a weaker argument here than above though. We could benchmark s * (x+y) vs s * (x+y).into_affine() with x, y random points and s a random scalar. Here the x+y ensures you have a projective point with a slow conversion into affine, so this accurately tells us the cost of doing the affine conversion first.

> You mean that if we ever switch libraries in the runtime, then by using affine coordinates in the ABI between the host and the runtime, we're more likely to have a form that's already compatible.

Yes: We've good odds of "adding" more libraries in the runtime, not "switching" them. We might "switch" them in the host, if arkworks never catches up to blst, or Gav wants GPUs, or whatever.

We want both for ed25519, but that's an extremely special case, and not here currently.

@davxy davxy added this pull request to the merge queue Nov 4, 2025
Merged via the queue into master with commit ef07f24 Nov 4, 2025
379 of 398 checks passed
@davxy davxy deleted the davxy/bump-ark-versions branch November 4, 2025 23:35

Labels

T17-primitives Changes to primitives that are not covered by any other label.


5 participants