-
Notifications
You must be signed in to change notification settings - Fork 261
LP reg telescoping #6240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: drazen/lp-reg
Are you sure you want to change the base?
LP reg telescoping #6240
Conversation
Add in-memory state storage for handshake and session state to enable stateless transport layer. This is the foundation for packet-per-connection forwarding and future UDP support. Changes: - Add handshake_states: Arc<DashMap<[u8; 32], LpStateMachine>> Keyed by client Ed25519 public key for in-progress handshakes - Add session_states: Arc<DashMap<u32, LpSession>> Keyed by session_id for established sessions - Initialize both maps in production code (node/mod.rs) - Initialize both maps in test code (handler.rs) No logic changes yet - pure infrastructure addition. All existing tests pass. Fixes: nym-qtji
Major architectural change: decouple handshake state from TCP connection to enable packet-per-connection forwarding and future UDP support. ## Changes ### Gateway Handler (handler.rs) - Refactor `handle()` from persistent-connection loop to single-packet processing - Add `handle_client_hello()`: Process ClientHello (session_id=0), create LpStateMachine, store in handshake_states map - Add `handle_handshake_packet()`: Process handshake packets incrementally, transition to session_states when complete - Add `handle_transport_packet()`: Handle registration/forward requests via established sessions ### State Management - Use handshake_states (DashMap<u32, LpStateMachine>) for in-progress handshakes - Use session_states (DashMap<u32, LpSession>) for established sessions - Session ID computed deterministically from X25519 keys immediately after ClientHello - State persists across connection closes ### Packet Flow ``` packet arrives → parse session_id from header if session_id == 0: handle_client_hello (create state) elif in handshake_states: handle_handshake_packet (advance state) elif in session_states: handle_transport_packet (decrypt, process) else: error (unknown session) send response → close connection ``` ## Testing - All 13 existing unit tests pass - Tests verify low-level packet I/O, timestamp validation, etc. - Integration testing via nym-gateway-probe (next step) ## Backwards Compatibility - Old handshake.rs methods kept but unused (may remove later) - Metrics unchanged - Protocol wire format unchanged ## Next Steps - Test with docker/localnet topology - Verify telescoping works (nym-gateway-probe) - Apply same architecture to entry gateway (nym-31hl) Fixes: nym-21th Blocks: nym-31hl
Gateway responder's StartHandshake doesn't return a packet - it just transitions
to KKTExchange state and waits for client's KKT request. Fixed handle_client_hello()
to not expect a response packet.
## Changes
### Add BOOTSTRAP_SESSION_ID constant
- `common/nym-lp/src/packet.rs`: Add `pub const BOOTSTRAP_SESSION_ID: u32 = 0`
- Document that session_id=0 is only used for ClientHello bootstrap
- Export from lib.rs for public use
### Fix handle_client_hello() logic
- `gateway/src/node/lp_listener/handler.rs:201-225`: - Remove expectation of SendPacket action from StartHandshake
- Responder transitions to KKTExchange without sending
- Store state machine and close connection
- Client sends KKT request on next connection with computed session_id
### Use constant throughout codebase
- `gateway/src/node/lp_listener/handler.rs:115`: Use BOOTSTRAP_SESSION_ID in routing
- `nym-registration-client/src/lp_client/client.rs:259`: Use constant instead of literal 0
## Protocol Flow (Fixed)
```
Connection 1: Client sends ClientHello (session_id=0)
→ Gateway stores state, closes (no response)
Connection 2: Client sends KKT request (session_id=X)
→ Gateway finds state, processes, responds
Connection 3+: Handshake continues until complete...
```
## Testing
- All 13 unit tests pass
- Real test: docker/localnet + nym-gateway-probe (next step)
Fixes: nym-v9un
Unblocks: nym-21th
- Extend handle_transport_packet() to conditionally deserialize both LpRegistrationRequest and ForwardPacketData messages - Add handle_registration_request() helper for registration flow - Add handle_forwarding_request() helper for telescoping flow - Delete dead handle_forwarding_loop() method (persistent connection model) - Clean up unused imports and dead code warnings - All 13 tests passing Entry gateway now fully supports single-packet-per-connection architecture for both direct client registration and packet forwarding to exit gateways. Each ForwardPacket arrives on a new connection, gets processed, response sent, and connection closes - consistent with exit gateway behavior from nym-21th.
- Create TimestampedState<T> wrapper with created_at and last_activity tracking - Add TTL config fields to LpConfig: * handshake_ttl_secs (default: 90s) * session_ttl_secs (default: 24h) * state_cleanup_interval_secs (default: 5min) - Update LpHandlerState maps to use TimestampedState wrappers - Update all handler.rs access points to wrap/unwrap states - Implement background cleanup task in LpListener: * Spawns on startup, stops on shutdown * Removes handshakes older than handshake_ttl * Removes sessions with no activity > session_ttl * Tracks metrics: lp_states_cleanup_handshake_removed, lp_states_cleanup_session_removed - Touch last_activity on every packet in handle_transport_packet() - All 13 tests passing Prevents memory leaks from abandoned handshakes and expired sessions in long-running gateways. Configurable TTLs for different use cases.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
2 Skipped Deployments
|
- Add OuterAeadKey derived from PSK via Blake3 KDF for packet encryption - Add LpMessage::Ack (0x0008) for ClientHello acknowledgment - Gateway sends Ack after processing ClientHello (packet-per-connection) - Update codec with AEAD encrypt/decrypt using ChaCha20-Poly1305 - Header remains cleartext (AAD), payload encrypted after PSK derivation - Add parse_lp_header_only() for routing before session lookup - Update session to expose outer_aead_key() getter - Various LP protocol improvements and test coverage Closes: nym-f4v1, nym-n9dr
…8wuj, nym-k0tb] Refactor LpRegistrationClient from persistent TCP connection model to packet-per-connection model, matching gateway's stateless connection pattern. Each LP packet exchange opens a new TCP connection, sends one packet, receives one response, then closes. State persists in LpStateMachine locally. Key changes: - Add connect_send_receive() helper for packet-per-connection exchanges - Rewrite perform_handshake() with clean loop-based approach - Remove LpTransport (packet-per-connection doesn't need persistent transport) - Combine send_registration_request + receive_registration_response into register() - Remove connect() method - handshake now handles connection internally NestedLpSession refactoring: - Add send_and_receive_via_forward() helper (consolidates 9 outer_key extractions) - Rewrite perform_handshake() from 6-level nesting to clean loop - Use BOOTSTRAP_RECEIVER_IDX constant instead of hardcoded 0 - Fix stale doc comment referencing removed connect() method Result: 438 fewer lines, cleaner control flow, DRY outer_key handling.
- Validate Ack response in NestedLpSession and LP client final handshake - Replace manual Drop with derive(Zeroize, ZeroizeOnDrop) for OuterAeadKey - Add replay counter check before AEAD decryption to prevent DoS [nym-qm2q, nym-z82d, nym-9ik3, nym-62fs]
Restructure LP packet format so cleartext fields (receiver_idx, counter) are always first, enabling trivial header parsing for routing before session lookup. Protocol version and reserved fields are now encrypted in the inner payload for encrypted packets. Wire format change: - Before: proto(1B) + reserved(3B) + receiver_idx(4B) + counter(8B) - After: receiver_idx(4B) + counter(8B) | proto(1B) + reserved(3B) + ... Key changes: - Add OuterHeader struct (12 bytes) for routing/replay protection - Update serialize_lp_packet/parse_lp_packet for unified format - parse_lp_header_only now returns OuterHeader - Gateway handler uses OuterHeader for session lookup - Update DESIGN.md with new wire format diagrams Security improvement: Only receiver_idx and counter visible after PSK establishment (was also exposing protocol version and reserved).
- Add subsession message types: SubsessionKK1, KK2, Ready, Request, Abort - Implement SubsessionHandshake for Noise KKpsk0 tunneled through parent - Add subsession PSK derivation from parent's PQ shared secret - Handle simultaneous initiation with X25519 key comparison tie-breaker - Add stale SubsessionAbort handler for message interleaving scenarios - Add test for simultaneous subsession initiation race condition Subsessions provide forward secrecy via periodic rekeying while inheriting PQ protection from the parent session's ML-KEM shared secret.
- Store LpStateMachine in session_states (not LpSession) for subsession handling - Add LpStateMachine::from_subsession() factory for promoted sessions - Rewrite handle_transport_packet() to use state machine for all messages - Add handle_subsession_complete() for session promotion flow - Add collision check for new_receiver_index before insert (nym-90rw) - Add demoted_session_ttl_secs config (default 60s) for ReadOnlyTransport sessions to be cleaned up quickly after subsession promotion (nym-atza) - Track demoted session cleanup separately with lp_states_cleanup_demoted_removed
jstuczyn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another batch of comments from me : )
- LpPacket is length-prefixed when put on the wire. but given we want them to fit inside UDP datagrams, isn't using u32 for that purpose a wasteful overkill? wouldn't u16 been more than enough? especially since we do actually make explicit checks to see if data is not bigger than what would have fit in u16 (plus 1):
const MAX_PACKET_SIZE: usize = 65536; // 64KB max
if packet_len > MAX_PACKET_SIZE {
return Err(GatewayError::LpProtocolError(format!(
"Packet size {} exceeds maximum {}",
packet_len, MAX_PACKET_SIZE
)));
}
- I'm really not a fan of
ClientHelloserialisation and depending on magic value denoting bincode overhead. the payload structure is simple enough to get rid of it (we could just concat all fields since they have constant lengths) - if we're going to use bincode for lp messages serialisation, we really should have an explicit function creating the serialiser to ensure fixed int encoding
- when generating receiver_index we have to ensure it's never the same as
BOOTSTRAP_RECEIVER_IDXotherwise we risk incorrect handling of data - I'm not convinced by the single packet per connection approach for registration - I can imagine it might cause some friction down the line especially if attempting to register with gateways far away from the user. then the overhead of creating multiple connections to complete the exchange will be very much non-negligible
| - `handshake_states`: Maps `session_id → LpStateMachine` (during handshake) | ||
| - `session_states`: Maps `session_id → LpSession` (after handshake complete) | ||
|
|
||
| Both maps use TTL-based cleanup to remove stale entries (default: 5 min handshake, 1 hour session). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope that's not unconditional because sessions can easily last more than 1h
| ▼ | ||
| ┌────────────────┐ | ||
| │ Entry Gateway │ Sees: Client IP | ||
| │ │ Doesn't see: Exit destination |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but it must see exit destination, otherwise how would it forward the packet?
| #[derive(Debug, Clone, Serialize, Deserialize)] | ||
| pub struct ForwardPacketData { | ||
| /// Target gateway's Ed25519 identity (32 bytes) | ||
| pub target_gateway_identity: [u8; 32], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't seem like we're using the identity at all when forwarding the packet and only rely on the lp_address -> is it actually redundant or will it be used further down the PR stack?
|
|
||
| ### Session ID | ||
|
|
||
| Sessions are identified by a deterministic 32-bit ID computed from both parties' X25519 public keys: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this contradicts what's written in DESIGN.md:
The client generates a random 4-byte receiver_index and includes it in ClientHello. The gateway uses this as the session lookup key. This replaces the previous approach of computing a deterministic session_id from both parties' keys.
|
|
||
| // Send Collision response to tell client to retry with new receiver_index | ||
| // No outer key - this is before PSK derivation | ||
| let collision_packet = LpPacket::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question for @georgio for whether this is an actual problem. couldn't client technically perform an enumeration attack to figure indices of other connections? what would happen if they actually were successful?
| LpState::Transport { session } | ||
| } else { | ||
| // 4. Deliver data | ||
| result_action = Some(Ok(LpAction::DeliverData(BytesMut::from(plaintext.as_slice())))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to store the data in BytesMut given they should be delivered as they are, i.e. no further mutability? (ultimately they seem to be always just converted into Vec<u8> or dereferences into &[u8] to be further deserialised)
| .ok() | ||
| .and_then(|s| s.outer_aead_key()) | ||
| } else { | ||
| // Unknown session - will error during routing, parse cleartext |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so why not just error out immediately and abandon this handler?
| rand::thread_rng().fill_bytes(&mut salt[8..]); | ||
|
|
||
| Self { | ||
| receiver_index: rand::random(), // Auto-generate random receiver index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can't use rand:random() directly as (even though incredibly unlikely) it could be set to BOOTSTRAP_RECEIVER_IDX and thus it would be incorrectly handled
| /// ``` | ||
| pub struct NestedLpSession { | ||
| /// Exit gateway's Ed25519 identity (32 bytes) | ||
| exit_identity: [u8; 32], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not keep it more strongly typed, i.e. as ed25519::PublicKey?
| exit_identity: [u8; 32], | ||
|
|
||
| /// Exit gateway's LP address (e.g., "2.2.2.2:41264") | ||
| exit_address: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, but as SocketAddr
This change is