let value = P::broadcast(value);

for element in slice.as_slice_mut() {
    *element = value;
}
Nice!
Can be simplified even further:
slice.as_slice_mut().fill(P::broadcast(value));
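To illustrate why the one-liner is a drop-in replacement, here is a minimal sketch on plain slices. It assumes `as_slice_mut()` yields an ordinary `&mut [T]` and that the packed type is `Copy`; `fill_with_loop` and `fill_with_std` are hypothetical names used only for this comparison, not functions from the PR.

```rust
/// Overwrite every element with `value` using an explicit loop,
/// mirroring the shape of the code under review.
fn fill_with_loop<T: Copy>(slice: &mut [T], value: T) {
    for element in slice.iter_mut() {
        *element = value;
    }
}

/// Same effect via the standard library's `slice::fill`.
fn fill_with_std<T: Copy>(slice: &mut [T], value: T) {
    slice.fill(value);
}
```

Both helpers leave the slice in the same state, so the choice is purely about brevity.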
.for_each(|(chunk, output)| {
    let scalar_iter = P::iter_slice(chunk)
        .tuples()
        .map(|(left, right)| left * right);
    *output = P::from_scalars(scalar_iter);
});
}
Potentially it would be faster to re-use the packed multiplication by interleaving values:
let (lhs, rhs) = PackedField::interleave(chunk[0], chunk[1]);
let mults = lhs*rhs;
PackedField::from_scalars(mults.iter().step_by(2).copied(), mults.iter().skip(1).step_by(2).copied())
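A rough sketch of the underlying idea, using plain arrays rather than the library's types: gather the left and right operands of every pair into two packed values, then do a single lane-wise multiply. Everything here is an assumption for illustration: `Packed` is a four-lane stand-in with wrapping `u32` arithmetic instead of field arithmetic, and `split_even_odd` is a hypothetical helper that is not the same operation as `PackedField::interleave`.

```rust
/// Stand-in for a packed field element: four lanes of a toy scalar type.
type Packed = [u32; 4];

/// Lane-wise multiplication, standing in for a packed field multiply.
fn lanewise_mul(a: Packed, b: Packed) -> Packed {
    std::array::from_fn(|i| a[i].wrapping_mul(b[i]))
}

/// Gather the even-indexed scalars (left operands) and the odd-indexed
/// scalars (right operands) of two consecutive packed values.
fn split_even_odd(c0: Packed, c1: Packed) -> (Packed, Packed) {
    (
        [c0[0], c0[2], c1[0], c1[2]],
        [c0[1], c0[3], c1[1], c1[3]],
    )
}

/// Compute the four adjacent-pair products of eight scalars
/// with a single lane-wise multiplication.
fn pairwise_products_packed(c0: Packed, c1: Packed) -> Packed {
    let (lefts, rights) = split_even_odd(c0, c1);
    lanewise_mul(lefts, rights)
}
```

Whether this beats the scalar iterator depends on how cheap the shuffle is relative to the multiply for the concrete packed type, so it would need benchmarking.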
344ff0b to 1de23a6
e0e8aef to f1b3aa2
1de23a6 to 07af836
    Some(log_num_inputs) => log_num_inputs,
};
let expected_round_outputs_len = log_num_inputs;
if round_outputs.len() != expected_round_outputs_len as usize {
nit: maybe move the logic verifying the input and output dimensions to a separate helper function to share between the reference and fast implementations?
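Something along these lines could work as the shared helper. This is only a sketch: the name `checked_log_num_inputs`, the `String` error type, and the assumption that the input length must be a power of two are all hypothetical; the real code would presumably return the crate's own error type.

```rust
/// Hypothetical shared validation helper: checks that the number of inputs
/// is a power of two and that there is exactly one output slot per round.
/// Returns log2 of the input length on success.
fn checked_log_num_inputs(input_len: usize, round_outputs_len: usize) -> Result<u32, String> {
    // `is_power_of_two` is false for 0, so empty inputs are rejected here too.
    if !input_len.is_power_of_two() {
        return Err(format!("input length {input_len} is not a power of two"));
    }
    let log_num_inputs = input_len.trailing_zeros();

    // One output per reduction round is expected.
    if round_outputs_len != log_num_inputs as usize {
        return Err(format!(
            "expected {log_num_inputs} round outputs, got {round_outputs_len}"
        ));
    }
    Ok(log_num_inputs)
}
```

Both the reference and the fast implementation could then call this once up front and keep their bodies focused on the arithmetic.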

TL;DR
Implemented the pairwise_product_reduce function for the FastCpuLayer and added tests for it.

What changed?
- [...] pairwise_product_reduce function in the ComputeLayerExecutor trait for FastCpuLayer
- [...] fill method from SmallOwnedChunk
- [...] fill_constant implementation to use the new slice abstraction
- [...] (Itertools and PackedMemorySlice)

How to test?
Run the new tests:
Why make this change?
This change implements a previously unimplemented function that is needed for computing pairwise products and reducing them, which is a common operation in cryptographic protocols. The implementation is optimized for CPU execution using parallel processing where possible.
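For readers unfamiliar with the operation, here is a scalar sketch of what a pairwise product reduction might look like. It is an illustration under assumptions, not the PR's implementation: it uses `u64` arithmetic modulo a toy prime instead of a real field, assumes each round multiplies adjacent pairs (as the `.tuples()` call in the diff suggests), and assumes one output vector is recorded per round. The function names are hypothetical.

```rust
/// Toy modulus standing in for a real field (assumption, not from the PR).
const MODULUS: u64 = 97;

/// One round: multiply adjacent pairs, halving the length.
fn pairwise_product_round(input: &[u64]) -> Vec<u64> {
    input
        .chunks_exact(2)
        .map(|pair| (pair[0] % MODULUS) * (pair[1] % MODULUS) % MODULUS)
        .collect()
}

/// Repeat the round until one element remains, recording each round's output.
/// For 2^n inputs this yields n round outputs of lengths 2^(n-1), ..., 2, 1.
fn pairwise_product_reduce_scalar(input: &[u64]) -> Vec<Vec<u64>> {
    assert!(
        input.len().is_power_of_two(),
        "input length must be a power of two"
    );
    let mut rounds = Vec::new();
    let mut current = input.to_vec();
    while current.len() > 1 {
        current = pairwise_product_round(&current);
        rounds.push(current.clone());
    }
    rounds
}
```

With inputs 1 through 8 this produces three rounds, and the final single element equals the product of all inputs modulo 97.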