feat(gpu): implement shuffle by enzodimaria · Pull Request #3472 · zama-ai/tfhe-rs

enzodimaria · 2026-04-14T12:53:19Z

This PR contains the new FHE feature bitonic_shuffle.

Benchmarks:

  +----------+-----------+-------------------------------------+----------+----------+
  | Size (n) | Parameter | Operation                           | GPU      | CPU      |
  +----------+-----------+-------------------------------------+----------+----------+
  | 8        | MULTIBIT  | unchecked_bitonic_shuffle_with_keys | 266 ms   | -        |
  |          |           | bitonic_shuffle                     | 271 ms   | -        |
  |          |           | OPRF (estimated)                    | 5 ms     | -        |
  |          |-----------+-------------------------------------+----------+----------+
  |          | CLASSICAL | unchecked_bitonic_shuffle_with_keys | 301 ms   | 2000 ms  |
  |          |           | bitonic_shuffle                     | 309 ms   | 2100 ms  |
  |          |           | OPRF (estimated)                    | 8 ms     | 100 ms   |
  +----------+-----------+-------------------------------------+----------+----------+
  | 16       | MULTIBIT  | unchecked_bitonic_shuffle_with_keys | 826 ms   | -        |
  |          |           | bitonic_shuffle                     | 836 ms   | -        |
  |          |           | OPRF (estimated)                    | 10 ms    | -        |
  |          |-----------+-------------------------------------+----------+----------+
  |          | CLASSICAL | unchecked_bitonic_shuffle_with_keys | 824 ms   | 4800 ms  |
  |          |           | bitonic_shuffle                     | 839 ms   | 5100 ms  |
  |          |           | OPRF (estimated)                    | 15 ms    | 300 ms   |
  +----------+-----------+-------------------------------------+----------+----------+
  | 32       | MULTIBIT  | unchecked_bitonic_shuffle_with_keys | 2420 ms  | -        |
  |          |           | bitonic_shuffle                     | 2442 ms  | -        |
  |          |           | OPRF (estimated)                    | 22 ms    | -        |
  |          |-----------+-------------------------------------+----------+----------+
  |          | CLASSICAL | unchecked_bitonic_shuffle_with_keys | 2246 ms  | 12500 ms |
  |          |           | bitonic_shuffle                     | 2279 ms  | 13000 ms |
  |          |           | OPRF (estimated)                    | 33 ms    | 500 ms   |
  +----------+-----------+-------------------------------------+----------+----------+
  | 64       | MULTIBIT  | unchecked_bitonic_shuffle_with_keys | 6704 ms  | -        |
  |          |           | bitonic_shuffle                     | 6741 ms  | -        |
  |          |           | OPRF (estimated)                    | 37 ms    | -        |
  |          |-----------+-------------------------------------+----------+----------+
  |          | CLASSICAL | unchecked_bitonic_shuffle_with_keys | 5988 ms  | -        |
  |          |           | bitonic_shuffle                     | 6069 ms  | -        |
  |          |           | OPRF (estimated)                    | 81 ms    | -        |
  +----------+-----------+-------------------------------------+----------+----------+

GPU timings: 1 x H100-SXM. A "-" marks entries with no reported measurement.
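In every row of the table, the bitonic_shuffle timing equals the unchecked_bitonic_shuffle_with_keys timing plus the OPRF estimate (e.g. 266 + 5 = 271 ms). This suggests that bitonic_shuffle first draws random keys (OPRF) and then runs the keyed shuffle. The following is a plaintext reference for what the two rows compute, under that assumption. Python's `random` module stands in for the OPRF, and the function names are hypothetical. They only mirror the benchmark labels and are not the tfhe-rs API.

```python
import random


def shuffle_with_keys(keys, data):
    """Plaintext model of a keyed shuffle: reorder `data` by ascending `keys`."""
    order = sorted(range(len(data)), key=lambda i: keys[i])
    return [data[i] for i in order]


def shuffle(data, key_bits=64, rng=random):
    """Plaintext model of a full shuffle: random keys (OPRF stand-in) + keyed shuffle."""
    keys = [rng.getrandbits(key_bits) for _ in data]
    return shuffle_with_keys(keys, data)


print(shuffle_with_keys([30, 10, 20], ["a", "b", "c"]))  # ['b', 'c', 'a']
```

With wide random keys, ties are unlikely. A tie would leave the tied elements in an order not chosen by the keys, which slightly biases the permutation.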

andrei-stoian-zama

Should the API work with GpuFheXYArray?

enzodimaria · 2026-04-24T07:43:45Z

> Should the API work with GpuFheXYArray?

Yes, but I have to write the HL API first; I'll do it after the CPU shuffle is merged 👍

github-actions · 2026-04-29T14:47:25Z

✅ Backward-compat snapshot: everything looks good! No backward-compatibility issues detected.

andrei-stoian-zama

Looks good, but please remove:

Two dead functions:

batched_tree_sign_reduction — only called from host_batched_unsigned_comparison
host_batched_unsigned_comparison — never called (explicitly flagged "Unused" in its own comment)

Nine dead fields on int_bitonic_sort_buffer, each with alloc + release:

batch_cmp_packed — only in host_batched_unsigned_comparison
batch_cmp_comparisons — only in host_batched_unsigned_comparison
batch_identity_lut — only in host_batched_unsigned_comparison
batch_is_non_zero_lut — only in host_batched_unsigned_comparison
batch_cmp_tree_x — only in batched_tree_sign_reduction
batch_cmp_tree_y — only in batched_tree_sign_reduction
batch_inner_tree_leaf_lut — only in batched_tree_sign_reduction
batch_last_tree_leaf_lut — only in batched_tree_sign_reduction
preallocated_h_lut — only in batched_tree_sign_reduction

andrei-stoian-zama · 2026-05-12T15:16:37Z

For a follow-up PR:

Batch the comparisons (based on the initial work in host_batched_unsigned_comparison).
The CMUX implemented here (shuffle.cuh:283) performs a "message extract" step to clean the noise. We might not need it, because the subsequent comparison first performs a subtraction and then cleans the noise itself (so that it can pack). This only works when message_modulus == carry_modulus.
Explore whether a single kernel can perform the various copy operations that are currently done in loops.
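The following check shows why batching the comparisons is possible. Within one bitonic substep, the pairs (i, i XOR j) are disjoint and together cover every index. As a result, all n/2 comparisons of a substep are independent and could, in principle, be issued as a single batch. This is a small plaintext check. The helper names are hypothetical, and n is assumed to be a power of 2.

```python
def substep_pairs(n, j):
    """Pairs (i, l) compared in one bitonic substep with distance j (n a power of 2)."""
    return [(i, i ^ j) for i in range(n) if (i ^ j) > i]


def pairs_are_independent(n, j):
    """True if no index appears in two pairs and every index is covered."""
    pairs = substep_pairs(n, j)
    touched = [x for pair in pairs for x in pair]
    return len(touched) == len(set(touched)) == n


# Every substep of a 16-element network is 8 independent comparisons.
assert all(pairs_are_independent(16, j) for j in (8, 4, 2, 1))
```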

andrei-stoian-zama

I'd also like to have a better explanation of the algorithm in the code comments.

I asked Claude to produce pseudocode. Could you confirm this pseudocode is correct, then break it up and copy it into comments next to the code blocks that implement it?


High-Level Pseudocode

  FUNCTION bitonic_shuffle_with_keys(keys[], data[], n):
      # Pad to next power of 2
      padded_n = next_power_of_2(n)
      FOR i IN [n, padded_n):
          keys[i] = MAX_VALUE   # sentinel: always sorts to end
          data[i] = 0

      # Bitonic network
      FOR k = 2, 4, 8, ... while k <= padded_n:
          FOR j = k/2, k/4, ... while j >= 1:
              bitonic_substep(keys, data, padded_n, k, j)

      RETURN keys[0..n], data[0..n]   # drop sentinels


  FUNCTION bitonic_substep(keys[], data[], n, k, j):
      # Step 1: Compare all pairs in parallel (one PBS per block)
      FOR each i where (i XOR j) > i:
          l = i XOR j
          ascending = ((i AND k) == 0)                # plaintext: depends only on i and k
          sign[i] = FHE_compare(keys[i], keys[l])   # → {INF, EQ, SUP}
          should_swap[i] = ascending ? (sign[i] == SUP) : (sign[i] == INF)   # encrypted 0/1

      # Step 2: Conditional swap keys (batched CMUX, applied element-wise to the pair)
      FOR each pair (i, l) from Step 1:
          (keys[i], keys[l]) = CMUX(should_swap[i],
                                     (keys[l], keys[i]),   # swapped
                                     (keys[i], keys[l]))   # unchanged

      # Step 3: Same CMUX for data (reuse the Step 1 condition)
      FOR each pair (i, l) from Step 1:
          (data[i], data[l]) = CMUX(should_swap[i],
                                     (data[l], data[i]),
                                     (data[i], data[l]))

  ---
  CMUX (Conditional Multiplexing) — the core primitive

  Since we can't branch on encrypted values, swaps are done via:

  FUNCTION CMUX(condition, true_val, false_val):
      # condition: encrypted 0/1 block (e.g. should_swap, derived from the
      #            comparison sign ∈ {INF=0, EQ=1, SUP=2}); 1 selects true_val

      # Bivariate PBS: zero out the losing branch
      out_true  = bivariate_PBS(true_val,  condition,
                      LUT: (b, c) -> if c == 1 then b else 0)
      out_false = bivariate_PBS(false_val, condition,
                      LUT: (b, c) -> if c != 1 then b else 0)

      # Add: the losing branch is zero, so the sum is the selected value
      result = HE_add(out_true, out_false)
      result = message_extract(result)    # clean up noise
      RETURN result
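The following plaintext Python model can help check the pseudocode above. It is a sketch under explicit assumptions:

- Keys and data are plain non-negative integers rather than radix ciphertexts.
- `fhe_compare` stands in for FHE_compare.
- The two bivariate-PBS LUTs of CMUX are modeled as ordinary conditionals.
- `message_extract` is omitted because it has no effect on noiseless integers.
- `max_key` is an added parameter that plays the role of MAX_VALUE.

All names are illustrative, not the tfhe-rs API.

```python
INF, EQ, SUP = 0, 1, 2  # comparison sign encoding used in the pseudocode


def fhe_compare(a, b):
    """Plaintext stand-in for FHE_compare: sign of a - b as INF, EQ or SUP."""
    if a < b:
        return INF
    return EQ if a == b else SUP


def cmux(condition, true_val, false_val):
    """Two 'LUTs' zero out the losing branch, then the results are added."""
    out_true = true_val if condition == 1 else 0    # LUT (b, c) -> b if c == 1 else 0
    out_false = false_val if condition != 1 else 0  # LUT (b, c) -> b if c != 1 else 0
    return out_true + out_false


def bitonic_substep(keys, data, n, k, j):
    # Pairs (i, i ^ j) are disjoint, so visiting them in order matches doing them in parallel.
    for i in range(n):
        l = i ^ j
        if l <= i:
            continue
        ascending = (i & k) == 0  # plaintext: depends only on the indices
        sign = fhe_compare(keys[i], keys[l])
        should_swap = int(sign == SUP) if ascending else int(sign == INF)
        # Steps 2 and 3: the same condition drives the key swap and the data swap.
        keys[i], keys[l] = (cmux(should_swap, keys[l], keys[i]),
                            cmux(should_swap, keys[i], keys[l]))
        data[i], data[l] = (cmux(should_swap, data[l], data[i]),
                            cmux(should_swap, data[i], data[l]))


def bitonic_shuffle_with_keys(keys, data, max_key):
    # Assumes every real key is < max_key: a real key equal to the sentinel
    # could be sorted past position n and then dropped with the sentinels.
    n = len(keys)
    padded_n = 1
    while padded_n < n:
        padded_n *= 2
    keys = list(keys) + [max_key] * (padded_n - n)  # sentinels sort to the end
    data = list(data) + [0] * (padded_n - n)
    k = 2
    while k <= padded_n:
        j = k // 2
        while j >= 1:
            bitonic_substep(keys, data, padded_n, k, j)
            j //= 2
        k *= 2
    return keys[:n], data[:n]  # drop the sentinels


print(bitonic_shuffle_with_keys([5, 1, 4, 2, 3], [50, 10, 40, 20, 30], max_key=100))
# ([1, 2, 3, 4, 5], [10, 20, 30, 40, 50])
```

The model also highlights one edge case worth confirming in the real code. Because the sentinel is MAX_VALUE, a genuine key equal to MAX_VALUE would tie with the padding.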

andrei-stoian-zama · 2026-05-12T15:24:28Z

Could you please also look into refactoring CMUX: could the FheUintXY CMUX call into the batched version with a batch of 1? Then a single function could be used both in the shuffle and in the single-value CMUX operation.
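The suggested refactoring can be sketched as follows. This is a hypothetical, plaintext-level Python illustration of the design only. `cmux_batched` and `cmux` are invented names, not tfhe-rs functions, and each "ciphertext" is modeled as a list of integer blocks. The single-value CMUX is a batch of one, so the per-block LUT logic lives in one place.

```python
def cmux_batched(conditions, true_vals, false_vals):
    """Hypothetical batched CMUX over plaintext radix values (lists of blocks).

    For each batch element, every block goes through the two LUTs
    (keep if condition == 1 / keep if condition != 1) and the results are added.
    """
    results = []
    for cond, t, f in zip(conditions, true_vals, false_vals):
        results.append([
            (tb if cond == 1 else 0) + (fb if cond != 1 else 0)
            for tb, fb in zip(t, f)
        ])
    return results


def cmux(condition, true_val, false_val):
    """Single-value CMUX implemented as a batch of one."""
    return cmux_batched([condition], [true_val], [false_val])[0]


print(cmux(1, [1, 2], [3, 0]))  # [1, 2]
```

With this shape, every single-value CMUX call also exercises the batched path. That path then only needs to be tested and optimized once.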

cla-bot added the cla-signed label Apr 14, 2026

enzodimaria force-pushed the edm/shuffle branch 6 times, most recently from 0ae7754 to f6b8be2 April 15, 2026 13:31

andrei-stoian-zama reviewed Apr 16, 2026

enzodimaria requested a review from andrei-stoian-zama April 24, 2026 07:44

enzodimaria marked this pull request as ready for review April 24, 2026 07:44

enzodimaria requested review from a team, SouchonTheo, nsarlin-zama, soonum and tmontaigu as code owners April 24, 2026 07:44

enzodimaria marked this pull request as draft April 24, 2026 07:44

enzodimaria force-pushed the edm/shuffle branch 5 times, most recently from 9c4dd42 to 67dad3e April 29, 2026 14:38

enzodimaria force-pushed the edm/shuffle branch from 67dad3e to 9a793c6 May 5, 2026 07:54

andrei-stoian-zama requested changes May 5, 2026

andrei-stoian-zama reviewed May 12, 2026

enzodimaria force-pushed the edm/shuffle branch 2 times, most recently from 6617bbb to b096209 May 19, 2026 08:31

enzodimaria force-pushed the edm/shuffle branch 3 times, most recently from 16535f9 to acfa092 May 19, 2026 14:18

enzodimaria marked this pull request as ready for review May 20, 2026 08:04

enzodimaria force-pushed the edm/shuffle branch from acfa092 to 52233a0 May 20, 2026 08:07

enzodimaria marked this pull request as draft May 20, 2026 08:19

enzodimaria force-pushed the edm/shuffle branch 3 times, most recently from 9179e98 to d4d3465 May 20, 2026 13:22

feat(gpu): implement shuffle (aa282f7)

enzodimaria force-pushed the edm/shuffle branch from d4d3465 to aa282f7 May 21, 2026 08:16

enzodimaria wants to merge 1 commit into main from edm/shuffle

2 participants
