chore(gpu): save state of trivium/kreyvium#3429

Open
enzodimaria wants to merge 1 commit into main from edm/kreyvium-init-next-drop

@enzodimaria enzodimaria commented Mar 26, 2026

This PR contains the following changes.

About Kreyvium:

  • Noise fix: previously, 6 additions were performed.
  • Persistent version implemented.
  • Benchmarks for the monolithic and persistent versions, with classical and multi-bit parameters.

About Trivium:

  • Noise fix: previously, 6 additions were performed.
  • Persistent version implemented.
  • Benchmarks for the monolithic and persistent versions, with classical and multi-bit parameters.

================================================================================
                    BEFORE: MONOLITHIC ARCHITECTURE
================================================================================

Request: Generate N bits
------------------------
[ Rust Client ] ----(Key, IV, N)----> [ GPU Backend ]
                                            |
                                            |-- 1. Allocate massive VRAM buffers
                                            |-- 2. WARMUP (1152 steps)
                                            |-- 3. GENERATE (N steps)
                                            |-- 4. Free VRAM buffers
                                            v
[ Rust Client ] <-----(N bits)------- [ GPU Backend ]

* Problem: To get M more bits, you must call it again, wait for another
  1152-step warmup, and allocate buffers for N+M bits.


================================================================================
                    NEW: STATEFUL ARCHITECTURE
================================================================================

WHAT IS "STATE"?
-----------------------
  The State is a persistent object kept in the Client's 
  CPU RAM between calls. It contains the fully encrypted internal memory of 
  the Kreyvium stream cipher at a given time T:
   - FHE Registers : A (93 bits), B (84 bits), C (111 bits)
   - FHE Key & IV  : (128 bits each) + their current rotation offsets
   
  It acts as a "bookmark" allowing the stateless GPU to resume calculation 
  exactly where it left off, without leaking memory.

PHASE 1: Initialization
-----------------------
[ Rust Client ] -----(Key, IV)------> [ GPU Backend ]
                                            |
                                            |-- 1. Allocate temporary VRAM
                                            |-- 2. WARMUP (1152 steps)
                                            |-- 3. Free VRAM
                                            v
[ Rust Client ] <----(State T0)------ [ GPU Backend ]
  (Keeps State T0 in RAM)


PHASE 2: Continuous Generation (Call 1)
---------------------------------------
[ Rust Client ] -----(State T0, N)--> [ GPU Backend ]
                                            |
                                            |-- 1. Allocate VRAM for N bits
                                            |-- 2. Inject State T0
                                            |-- 3. GENERATE (N steps)
                                            |-- 4. Extract new State T1
                                            |-- 5. Free VRAM
                                            v
[ Rust Client ] <----(N bits + State T1)- [ GPU Backend ]
  (Updates RAM to State T1)


PHASE 3: Continuous Generation (Call 2)
---------------------------------------
[ Rust Client ] -----(State T1, M)--> [ GPU Backend ]
                                            |
                                            |-- 1. Allocate VRAM for M bits
                                            |-- 2. Inject State T1
                                            |-- 3. GENERATE (M steps)
                                            |-- 4. Extract new State T2
                                            |-- 5. Free VRAM
                                            v
[ Rust Client ] <----(M bits + State T2)- [ GPU Backend ]
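
To make the init/next split above concrete, here is a minimal plaintext sketch in Rust. It is not the code from this PR: it models Trivium on plain bits on the CPU (no FHE, no GPU), and the names `TriviumState`, `next_bits`, and `monolithic` are hypothetical. It only illustrates that resuming from a saved state yields the same stream as one fused call, without repeating the 1152-step warmup.

```rust
/// Plaintext Trivium model, for illustration only (assumption: not the PR's
/// encrypted GPU implementation). `s[0]` is s1 in the Trivium specification.
#[derive(Clone)]
struct TriviumState {
    s: [u8; 288],
}

const WARMUP_STEPS: usize = 1152;

impl TriviumState {
    /// Phase 1: load key and IV, then run the warmup exactly once.
    fn init(key: &[u8; 80], iv: &[u8; 80]) -> Self {
        let mut s = [0u8; 288];
        s[..80].copy_from_slice(key); // register A (93 bits)
        s[93..173].copy_from_slice(iv); // register B (84 bits)
        s[285] = 1; // register C (111 bits) ends with three ones
        s[286] = 1;
        s[287] = 1;
        let mut st = TriviumState { s };
        for _ in 0..WARMUP_STEPS {
            st.step(); // warmup output is discarded
        }
        st
    }

    /// One clock: returns the keystream bit and advances the three registers.
    fn step(&mut self) -> u8 {
        let s = &mut self.s;
        let t1 = s[65] ^ s[92];
        let t2 = s[161] ^ s[176];
        let t3 = s[242] ^ s[287];
        let z = t1 ^ t2 ^ t3;
        let n1 = t1 ^ (s[90] & s[91]) ^ s[170];
        let n2 = t2 ^ (s[174] & s[175]) ^ s[263];
        let n3 = t3 ^ (s[285] & s[286]) ^ s[68];
        s.copy_within(0..92, 1); // shift register A
        s.copy_within(93..176, 94); // shift register B
        s.copy_within(177..287, 178); // shift register C
        s[0] = n3;
        s[93] = n1;
        s[177] = n2;
        z
    }

    /// Phases 2 and 3: resume from the saved state and produce `n` more bits.
    fn next_bits(&mut self, n: usize) -> Vec<u8> {
        (0..n).map(|_| self.step()).collect()
    }
}

/// "Before" flow: warmup and generation fused into a single call.
fn monolithic(key: &[u8; 80], iv: &[u8; 80], n: usize) -> Vec<u8> {
    TriviumState::init(key, iv).next_bits(n)
}

fn main() {
    let key = [1u8; 80];
    let iv = [0u8; 80];
    let mut state = TriviumState::init(&key, &iv); // State T0
    let mut resumed = state.next_bits(64); // call 1, state becomes T1
    resumed.extend(state.next_bits(32)); // call 2, state becomes T2
    assert_eq!(resumed, monolithic(&key, &iv, 96));
    println!("resumed stream matches monolithic stream");
}
```

In this model the monolithic path pays the warmup on every call, while the stateful path pays it once in `init`; the encrypted version in the PR follows the same shape but keeps the state as ciphertexts on the client side.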

@cla-bot cla-bot bot added the cla-signed label Mar 26, 2026
@enzodimaria enzodimaria force-pushed the edm/kreyvium-init-next-drop branch from cd614a8 to 08e50ca Compare March 26, 2026 09:55
@zama-bot zama-bot removed the approved label Mar 27, 2026
@github-actions

ℹ️ Backward-compat snapshot: neutral changes

Only neutral changes were detected. This is expected when introducing new versioned types.


➕ Neutral
  • New enum backward_compatibility::ComputeLoadVersions (1 variants)
  • New enum backward_compatibility::SerializableProjectiveVersions (1 variants)

If you encounter any errors or have doubts, you can verify locally by running:

make backward_correctness BASE_REF=<base_branch_or_commit>

Where BASE_REF is the reference branch or commit to check against.

@enzodimaria enzodimaria force-pushed the edm/kreyvium-init-next-drop branch from 5b781a7 to ec2c809 Compare March 27, 2026 13:37
@enzodimaria enzodimaria changed the title chore(gpu): kreyvium -> init + next + drop chore(gpu): kreyvium -> init + next Mar 27, 2026
@enzodimaria enzodimaria force-pushed the edm/kreyvium-init-next-drop branch 5 times, most recently from 70ee626 to 6faeddf Compare March 30, 2026 08:03
@enzodimaria enzodimaria force-pushed the edm/kreyvium-init-next-drop branch 2 times, most recently from 502bcc7 to 5daf458 Compare April 7, 2026 15:37
@enzodimaria enzodimaria changed the title chore(gpu): kreyvium -> init + next chore(gpu): save state of trivium/kreyvium Apr 7, 2026
@enzodimaria enzodimaria force-pushed the edm/kreyvium-init-next-drop branch 2 times, most recently from af695bf to 4c45454 Compare April 13, 2026 07:42
@zama-bot zama-bot removed the approved label Apr 13, 2026
@enzodimaria enzodimaria force-pushed the edm/kreyvium-init-next-drop branch 3 times, most recently from e8d9fbb to 99b5f69 Compare April 13, 2026 08:15
@enzodimaria enzodimaria marked this pull request as ready for review April 14, 2026 07:57
auto active_streams_flush =
    streams.active_gpu_subset(flush_ops, params.pbs_type);
this->flush_lut->broadcast_lut(active_streams_flush);
this->flush_lut->setup_gemm_batch_ks_temp_buffers(size_tracker);

Contributor

You don't want to enable GEMM anymore?

Contributor

Reminder: GEMM is useful when there are more than 128 PBS in the batch.

Contributor Author

You're right, it's an oversight. I'll fix it.

auto active_streams_and =
    streams.active_gpu_subset(and_ops, params.pbs_type);
this->and_lut->broadcast_lut(active_streams_and);
this->and_lut->setup_gemm_batch_ks_temp_buffers(size_tracker);

Contributor

You don't want GEMM anymore?

    return Err("Input key and IV must contain 128 encrypted bits.".into());
}

let mut state = CudaKreyviumState {

Contributor

The state here exists only temporarily: we are not yet able to persist (save/load) this state through serialization, right?

Contributor Author

Not yet, indeed. This could be a topic for a follow-up PR.

    iv: &CudaUnsignedRadixCiphertext,
    streams: &CudaStreams,
) -> crate::Result<CudaTriviumState> {
    let num_key_bits = 80;

Contributor

Could you add named constants somewhere instead of magic values?

Contributor Author

Done!
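
For readers following this thread, a minimal sketch of what replacing the magic values could look like. The constant names and the `check_input_len` helper are hypothetical and are not taken from this PR; the values are the public Trivium and Kreyvium parameters (80-bit key and IV for Trivium, 128-bit key and IV for Kreyvium, 1152 warmup steps).

```rust
// Hypothetical names: the PR's actual constants may be named and placed differently.
pub const TRIVIUM_KEY_BITS: usize = 80;
pub const TRIVIUM_IV_BITS: usize = 80;
pub const KREYVIUM_KEY_BITS: usize = 128;
pub const KREYVIUM_IV_BITS: usize = 128;
pub const WARMUP_STEPS: usize = 1152;

/// Hypothetical helper: checks that an input holds the expected number of
/// encrypted bits. Uses a plain `String` error instead of `crate::Result`.
fn check_input_len(got: usize, expected: usize, what: &str) -> Result<(), String> {
    if got != expected {
        return Err(format!(
            "{what} must contain {expected} encrypted bits, got {got}."
        ));
    }
    Ok(())
}

fn main() {
    // A correctly sized Trivium key passes the check.
    assert!(check_input_len(80, TRIVIUM_KEY_BITS, "Input key").is_ok());
    // A Kreyvium-sized IV is rejected where a Trivium IV is expected.
    assert!(check_input_len(KREYVIUM_IV_BITS, TRIVIUM_IV_BITS, "Input IV").is_err());
    println!("warmup steps: {WARMUP_STEPS}, Kreyvium key bits: {KREYVIUM_KEY_BITS}");
}
```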

@enzodimaria enzodimaria force-pushed the edm/kreyvium-init-next-drop branch from 99b5f69 to a4b8474 Compare April 16, 2026 08:22

@andrei-stoian-zama andrei-stoian-zama left a comment

Do you want setup_gemm_batch_ks_temp_buffers in Kreyvium too?

@enzodimaria
Contributor Author

> Do you want setup_gemm_batch_ks_temp_buffers in Kreyvium too?

@andrei-stoian-zama Yes. When I was working on it a few months ago,
performance was slightly better with it 👍

@enzodimaria enzodimaria force-pushed the edm/kreyvium-init-next-drop branch from a4b8474 to 9d43a82 Compare April 17, 2026 15:35