@durch durch commented Dec 12, 2025

Summary

  • Combines the LP registration protocol implementation
  • Adds telescoping/nested sessions support
  • Adds localnet mode for gateway-probe testing
  • Integrates KKT + PSQ cryptographic primitives

This PR consolidates work from:

  • drazen/lp-reg
  • drazen/lp-reg-telescoping
  • drazen/gateway-probe-localnet-mode

Test plan

  • Run localnet with LP mode enabled
  • Test gateway-probe with --gateway-ip flag against localnet
  • Verify WireGuard tunnel establishment through LP registration
  • Run existing integration tests


@durch durch requested a review from jstuczyn as a code owner December 12, 2025 14:44

georgio and others added 4 commits December 12, 2025 16:30
Post-quantum Key Encapsulation Mechanism (KEM) Key Transfer protocol.
Enables efficient distribution of post-quantum KEM public keys.

Squashed from georgio/noise-psq branch.

Initial implementation of the Lewes Protocol (LP) for gateway registration:
- Add nym-lp crate with Noise protocol handshake
- Add LP listener to gateway for handling registrations
- Add LP client for registration flow
- Integrate KKT for post-quantum KEM key exchange
- Integrate PSQ for post-quantum PSK derivation
- Add Ed25519 authentication throughout
- Add docker/localnet support for testing

Co-authored-by: Jędrzej Stuczyński <[email protected]>

Extends LP protocol with telescoping architecture for nested sessions:
- Add nested session support with KKpsk0 rekeying
- Add subsession support with collision detection
- Implement unified packet format with outer header
- Refactor gateway handlers for single-packet forwarding
- Add TTL-based state cleanup for stale sessions
- Add outer AEAD encryption layer
- Refactor registration client for packet-per-connection model

Adds localnet testing mode to gateway-probe for LP development:
- Add TestMode enum for different probe configurations
- Add --gateway-ip flag for direct gateway testing
- Implement two-hop WireGuard tunnel for localnet
- Add mock ecash support for testing without real credentials
- Add netstack Go bindings for userspace networking
- Restructure probe with mode and common modules
- Update README with localnet mode documentation

durch added 12 commits December 16, 2025 11:28
- Change frg field from u8 to u16 in packet header (25 bytes total)
- Update encode/decode to use get_u16_le/put_u16_le
- Update Segment struct frg field to u16
- Remove truncating cast in session.rs
- Max message size now ~91MB (65,535 fragments × MTU)
- Internal protocol only, no interop concerns

Nym uses KCP for reliability and multiplexing rather than for the
real-time use cases it usually targets, so the u8 limit (255 fragments,
~355KB) was insufficient.

Addresses: nym-yih9
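The size limits quoted above follow from multiplying the largest fragment count by the per-fragment payload. A minimal sketch of that arithmetic, assuming roughly 1,400 bytes per fragment (the PR does not state the exact value, so `ASSUMED_FRAGMENT_BYTES` and the helper names are hypothetical):

```rust
/// Largest message, in bytes, that fits when the fragment counter can
/// address at most `max_fragments` fragments of `bytes_per_fragment` each.
fn max_message_bytes(max_fragments: u64, bytes_per_fragment: u64) -> u64 {
    max_fragments * bytes_per_fragment
}

/// Assumed per-fragment payload size; not taken from the PR.
const ASSUMED_FRAGMENT_BYTES: u64 = 1_400;

/// Limit with a u8 `frg` field (at most 255 fragments).
fn old_limit() -> u64 {
    max_message_bytes(u64::from(u8::MAX), ASSUMED_FRAGMENT_BYTES)
}

/// Limit with a u16 `frg` field (at most 65,535 fragments).
fn new_limit() -> u64 {
    max_message_bytes(u64::from(u16::MAX), ASSUMED_FRAGMENT_BYTES)
}
```

Under that assumption the old limit is a few hundred kilobytes and the new one is on the order of 91MB, which matches the orders of magnitude quoted in the commit message.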
Wrap hash and x25519_bytes in zeroize::Zeroizing to ensure private
key material is cleared from memory after use.

Closes: nym-k55g

Change KcpSession::input() to return Result<(), KcpError> so callers
can detect invalid packets instead of silently ignoring them.

- Add ConvMismatch error variant for conversation ID mismatches
- Update driver to propagate errors from session.input()
- Update all test and example callers

Closes: nym-n0kk
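As an illustration of the change described above, here is a minimal sketch of an `input()` that reports a conversation ID mismatch instead of dropping the packet silently. The struct layout, the `TooShort` variant, the fields of `ConvMismatch`, and the little-endian decoding of the conversation ID are assumptions, not the crate's actual definitions:

```rust
#[derive(Debug, PartialEq, Eq)]
enum KcpError {
    /// Packet is shorter than the 4-byte conversation ID (hypothetical variant).
    TooShort,
    /// Packet carries a conversation ID other than the session's own.
    ConvMismatch { expected: u32, received: u32 },
}

struct KcpSession {
    conv: u32,
}

impl KcpSession {
    /// Validate the conversation ID and surface a typed error on mismatch.
    fn input(&mut self, packet: &[u8]) -> Result<(), KcpError> {
        if packet.len() < 4 {
            return Err(KcpError::TooShort);
        }
        // Assumption: the conversation ID is the first 4 bytes, little-endian.
        let received = u32::from_le_bytes([packet[0], packet[1], packet[2], packet[3]]);
        if received != self.conv {
            return Err(KcpError::ConvMismatch { expected: self.conv, received });
        }
        // A real session would continue by parsing the segments here.
        Ok(())
    }
}
```

A driver can then propagate the error with `?` rather than continuing with a packet that belongs to another conversation.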
The from_bytes() function expects &[u8], so the Zeroizing wrapper must
be dereferenced to get the inner array.

Limits concurrent outbound connections when forwarding LP packets to
prevent file descriptor exhaustion under high load.

Key changes:
- Add max_concurrent_forwards config (default 1000)
- Add forward_semaphore to LpHandlerState
- Acquire semaphore permit before connecting in handle_forward_packet
- Return "Gateway at forward capacity" error when at limit

This provides load signaling so clients can choose another gateway
when the current one is overloaded.

Design note: Connection pooling was considered but provides minimal
benefit since telescope setup is one-time and targets are distributed
across many different gateways. See AIDEV-NOTE in LpHandlerState for
full analysis.

Closes: nym-xi3m
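The capacity check described above can be sketched with the standard library alone. The PR's `forward_semaphore` is presumably an async semaphore; `ForwardLimiter` and `ForwardPermit` below are hypothetical stand-ins that show the same "acquire a permit or report capacity" behaviour:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Counts in-flight forwards against a fixed ceiling.
struct ForwardLimiter {
    in_flight: AtomicUsize,
    max_concurrent_forwards: usize,
}

/// Held for the duration of one outbound connection; frees the slot on drop.
struct ForwardPermit {
    limiter: Arc<ForwardLimiter>,
}

impl ForwardLimiter {
    fn new(max_concurrent_forwards: usize) -> Arc<Self> {
        Arc::new(Self {
            in_flight: AtomicUsize::new(0),
            max_concurrent_forwards,
        })
    }

    /// Take a slot without waiting, or fail immediately when at the limit.
    fn try_acquire(limiter: &Arc<Self>) -> Result<ForwardPermit, &'static str> {
        let mut current = limiter.in_flight.load(Ordering::Acquire);
        loop {
            if current >= limiter.max_concurrent_forwards {
                return Err("Gateway at forward capacity");
            }
            match limiter.in_flight.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Ok(ForwardPermit { limiter: Arc::clone(limiter) }),
                // Another thread changed the count; retry with the fresh value.
                Err(actual) => current = actual,
            }
        }
    }
}

impl Drop for ForwardPermit {
    fn drop(&mut self) {
        self.limiter.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}
```

Failing fast, rather than queueing, is what lets the error double as a load signal: the client learns right away that it should try another gateway.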
Replace .session().ok() with proper error handling to fail fast when
session is Closed or Processing after state machine processing.

Previously, the code silently continued with outer_key = None, which
could cause protocol errors downstream.

Closes: nym-8de0

Add bincode_options() helper that returns DefaultOptions with explicit
big_endian and varint_encoding configuration. This future-proofs against
bincode 1.x/2.x default changes and makes serialization format explicit.

Updated all 4 bincode usages in nested_session.rs to use the helper.

Extract common state_machine.session().ok().and_then(...) pattern into
two helper methods:
- get_send_key() for encryption (outer_aead_key_for_sending)
- get_recv_key() for decryption (outer_aead_key)

Updated 6 call sites to use the helpers, reducing verbosity.

- Create config.rs with LpConfig struct (kem_algorithm, psk_ttl, enable_kkt)
- Export LpConfig from lib.rs
- Add AIDEV-NOTE to psk.rs explaining:
  - Why PSQ is embedded in Noise (single round-trip, PSK binding)
  - KEM migration path (X25519 → MlKem768 → XWing)
- Add AIDEV-NOTE to state_machine.rs explaining protocol flow:
  - KKTExchange → Handshaking → Transport state transitions
  - PSK derivation formula (ECDH || PSQ || salt)

Add forward_timeout (30s default) to LpConfig and wrap send_forward_packet's
connect_send_receive call with tokio::time::timeout, matching the pattern
used by register() with registration_timeout.

This prevents indefinite hangs when forwarding packets through entry gateway.

Add AtomicU8 field to store the protocol version from handshake packet
headers. Includes getter and setter methods for future version negotiation
and compatibility checks.

- negotiated_version() returns current version (defaults to 1)
- set_negotiated_version() allows setting during handshake
- Subsessions inherit version 1 (can be enhanced to inherit parent's)
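A minimal sketch of the getter and setter pair described above. The `Session` struct name and the memory orderings are assumptions; only the `AtomicU8` field, the default of 1, and the two method names come from the commit message:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical stand-in for the session type that owns the field.
struct Session {
    negotiated_version: AtomicU8,
}

impl Session {
    fn new() -> Self {
        // Default to protocol version 1 until the handshake says otherwise.
        Self { negotiated_version: AtomicU8::new(1) }
    }

    /// Current protocol version for this session.
    fn negotiated_version(&self) -> u8 {
        self.negotiated_version.load(Ordering::Acquire)
    }

    /// Record the version read from a handshake packet header.
    fn set_negotiated_version(&self, version: u8) {
        self.negotiated_version.store(version, Ordering::Release);
    }
}
```

Because the field is atomic, the setter takes `&self`, so the handshake path can update the version without exclusive access to the session.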
Breaking wire protocol change: MessageType field increased from 2 bytes
to 4 bytes in LP packets. This future-proofs the message type space and
aligns with other u32 fields.

Changes:
- message.rs: #[repr(u32)], from_u32(), to_u32()
- error.rs: InvalidMessageType(u32)
- codec.rs: All serialization/deserialization updated to 4-byte msg_type
  - Cleartext parsing: inner_bytes[4..8], content at [8..]
  - AEAD parsing: decrypted[4..8], content at [8..]
  - Serialization: 4 bytes for message type
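The byte offsets listed above can be illustrated with a small parser that reads the message type from bytes 4..8 and treats everything from byte 8 onward as content. The variant names, the `TooShort` error, and the little-endian byte order are assumptions made for this sketch; the first four bytes are left opaque because the commit message does not describe them:

```rust
/// Hypothetical message types; the real enum has its own variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
enum MessageType {
    Handshake = 1,
    Data = 2,
}

impl MessageType {
    fn from_u32(raw: u32) -> Option<Self> {
        match raw {
            1 => Some(Self::Handshake),
            2 => Some(Self::Data),
            _ => None,
        }
    }

    fn to_u32(self) -> u32 {
        self as u32
    }
}

#[derive(Debug, PartialEq, Eq)]
enum LpError {
    /// Fewer than 8 bytes were available (hypothetical variant).
    TooShort(usize),
    InvalidMessageType(u32),
}

/// Split decrypted or cleartext inner bytes into (message type, content).
fn split_inner(inner_bytes: &[u8]) -> Result<(MessageType, &[u8]), LpError> {
    if inner_bytes.len() < 8 {
        return Err(LpError::TooShort(inner_bytes.len()));
    }
    // Assumption: the 4-byte message type is little-endian.
    let raw = u32::from_le_bytes([
        inner_bytes[4],
        inner_bytes[5],
        inner_bytes[6],
        inner_bytes[7],
    ]);
    let msg_type = MessageType::from_u32(raw).ok_or(LpError::InvalidMessageType(raw))?;
    Ok((msg_type, &inner_bytes[8..]))
}
```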
/// Create explicit bincode options for consistent serialization across versions.
///
/// Using explicit options future-proofs against bincode 1.x/2.x default changes.
fn bincode_options() -> impl Options {

And it'd be amazing to extend it to other bincode instances in LP, for example inside LpMessage::encode_content

subsession_counter: AtomicU64::new(0),
read_only: AtomicBool::new(false),
successor_session_id: Mutex::new(None),
negotiated_version: std::sync::atomic::AtomicU8::new(1), // Default to version 1

shouldn't this be somehow derived from the exchange rather than being hardcoded? just thinking of how backwards compatibility is going to work once we bump it up to v2


do we really expect more than 65k message types? I'm not sure that's really realistic. if anything, u8 would have probably been enough. and worst case we can use the reserved bytes to bump up the capacity

Gateway (handler.rs):
- Add bound_receiver_idx field for session-affine connections
- Convert handle() from single-packet to loop with EOF detection
- Add validate_or_set_binding() for receiver_idx validation
- Set binding in handle_client_hello after collision check
- Centralize emit_lifecycle_metrics in main loop only
- Add is_connection_closed() helper for graceful EOF

Client (client.rs):
- Add stream field for persistent TCP connection
- Add ensure_connected(), send_packet(), receive_packet(), close() methods
- Modify perform_handshake_inner() to use persistent stream
- Modify register_with_credential() to use persistent stream
- Modify send_forward_packet() to use persistent stream
- Keep connect_send_receive() for reference (marked dead_code)

This reduces handshake overhead from ~5 TCP connections to 1.

Drive-by: Fix log::info! -> info! in wireguard peer_controller.rs

Entry gateway now maintains a persistent TCP connection to the exit
gateway per client session, reusing it for all forward requests from
that client. This reduces TCP handshake overhead significantly.

Key changes:
- Add exit_stream: Option<(TcpStream, SocketAddr)> to LpConnectionHandler
- Modify handle_forward_packet() to open on first forward, reuse after
- Clear exit_stream on connection errors (auto-reconnect on next forward)
- Semaphore only acquired for connection opens, not reuse (sequential access)

- Add 30s timeout to exit stream I/O operations (nym-df31)
  Prevents handler from hanging on unresponsive exit gateway

- Return error on forward target address mismatch (nym-zegu)
  Previously warned and proceeded, which could mask bugs

- Close client stream on handshake error paths (nym-scvm)
  Prevents state machine inconsistency on timeout or failure

Make LP registration resilient to network failures that could waste
credentials. When registration succeeds on the gateway but the response
is lost (e.g., network drop), clients can retry with the same WG key
and get the cached result instead of spending another credential.

Gateway-side:
- Add check_existing_registration() helper that looks up WG peer and
  returns cached GatewayData if already registered
- Add idempotency check in process_registration() dVPN branch
- Only return cached response if bandwidth > 0 (ensures registration
  was actually completed, not just peer created)
- Track idempotent registrations with lp_registration_dvpn_idempotent metric

Client-side:
- Add register_with_retry() to LpRegistrationClient that acquires
  credential once and retries handshake+registration on failure
- Add handshake_and_register_with_retry() to NestedLpSession for
  exit gateway registration via forwarding
- Add exponential backoff with jitter between retry attempts
- Verify outer session validity before nested session retry

Both retry methods clear state machine before retry to ensure fresh
handshake, and reuse the same credential across all attempts.
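The retry behaviour described above can be sketched as two small pieces: a delay function for exponential backoff with jitter, and a loop that reuses one credential across attempts. All names, the "half fixed, half random" jitter scheme, and the parameters are assumptions; the caller supplies the random jitter value so the sketch stays free of external crates:

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based). The exponential ceiling
/// is `base * 2^attempt`, capped at `cap`; half of it is fixed and the other
/// half is scaled by `jitter_permille` (0..=1000, chosen randomly by the caller).
fn backoff_delay(attempt: u32, base: Duration, cap: Duration, jitter_permille: u32) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    let ceiling = base.saturating_mul(factor).min(cap);
    let half = ceiling / 2;
    half + half * jitter_permille.min(1000) / 1000
}

/// Run `attempt` up to `max_attempts` times with the same credential,
/// calling `pause(n)` between failures. The credential is only borrowed,
/// so a lost response never consumes a second one.
fn retry_with_same_credential<C, T, E>(
    credential: &C,
    max_attempts: u32,
    mut attempt: impl FnMut(&C) -> Result<T, E>,
    mut pause: impl FnMut(u32),
) -> Result<T, E> {
    let mut n = 0;
    loop {
        match attempt(credential) {
            Ok(value) => return Ok(value),
            Err(err) if n + 1 >= max_attempts => return Err(err),
            Err(_) => {
                pause(n);
                n += 1;
            }
        }
    }
}
```

In a real client, `pause` would sleep for `backoff_delay(n, ...)` and `attempt` would clear the state machine before redoing the handshake and registration. The gateway-side idempotency check is what makes resending the same credential safe.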
When enabled, mix nodes skip ack extraction and forwarding entirely.
The full payload (including ack portion) is returned as the message.

Closes: nym-3wrr

- Created tools/nym-lp-speedtest/ with Cargo.toml
- Added main.rs with CLI argument parsing
- Created stub modules: client.rs, speedtest.rs, topology.rs
- Added to workspace members
- Verified compilation with cargo check

- Add topology.rs with NymTopology integration
- Fetch mix nodes and gateways from nym-api
- Build GatewayInfo with LP addresses (port 41264)
- Provide random_route_to_gateway() for Sphinx routing
- Add required Cargo.toml dependencies

- Add send_data() and send_data_with_surbs() methods for mixnet data
- Integrate KCP reliable delivery with Sphinx packet construction
- Add x25519 encryption keypair for SURB reply mechanism
- Wire up main.rs to test LP handshake and data path
- Add NymRouteProvider support in topology for SURB construction
- Refactor send_data() to delegate to send_data_with_surbs(0) (DRY)

The client can now:
- Perform LP handshake with gateways
- Send data through the mixnet wrapped in KCP + Sphinx packets
- Attach SURBs for bidirectional communication
- Return encryption keys for decrypting replies

- Rename crate from nym-lp-speedtest to nym-lp-client
- Fix KCP bug: add driver.update() call before fetch_outgoing()
  Without update(), KCP never moves segments from snd_queue to snd_buf
- Update CLI name, about string, and user agent to match new name

- Extend RegistrationMode::Mixnet to include client_ed25519_pubkey
  and client_x25519_pubkey for nym address construction
- Add LpGatewayData struct containing gateway_identity and
  gateway_sphinx_key for SURB reply routing
- Add lp_gateway_data field to LpRegistrationResponse for mixnet mode
- Implement success_mixnet() constructor for mixnet registrations
- Update gateway registration to insert clients into ActiveClientsStore
  for SURB reply delivery, matching the websocket flow

- Add LpDataHandler for UDP data plane (port 51264)
- Decrypt LP layer and forward Sphinx packets to mixnet
- Add outbound_mix_sender to LpHandlerState
- Integrate data handler spawn into LpListener::run()
- Add metrics for data packets received/forwarded/errors

Implements nym-yzzm

Use state machine process_input() instead of manual decryption to ensure
proper replay protection:
- Counter check against receiving window
- Counter marking after successful decryption

Also handle subsession actions gracefully (SendPacket ignored on UDP,
clients should use TCP control plane for rekeying).

Security fix for nym-yzzm implementation.
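The two replay protection steps named above (check the counter before decrypting, mark it only after decryption succeeds) can be sketched with a sliding bitmap window. The 64-entry window size and the `ReplayWindow` type are assumptions for illustration and are not taken from the LP state machine:

```rust
/// Sliding window over the last 64 counters; bit 0 tracks `highest`.
struct ReplayWindow {
    highest: u64,
    bitmap: u64,
    seen_any: bool,
}

impl ReplayWindow {
    fn new() -> Self {
        Self { highest: 0, bitmap: 0, seen_any: false }
    }

    /// Step 1: is this counter fresh enough to bother decrypting?
    fn check(&self, counter: u64) -> bool {
        if !self.seen_any || counter > self.highest {
            return true;
        }
        let age = self.highest - counter;
        // Too old to track, or already marked as received.
        age < 64 && self.bitmap & (1u64 << age) == 0
    }

    /// Step 2: record the counter, only after decryption succeeded.
    fn mark(&mut self, counter: u64) {
        if !self.seen_any {
            self.seen_any = true;
            self.highest = counter;
            self.bitmap = 1;
        } else if counter > self.highest {
            let shift = counter - self.highest;
            // Slide the window forward; everything falls out if the jump is large.
            self.bitmap = if shift >= 64 { 0 } else { self.bitmap << shift };
            self.bitmap |= 1;
            self.highest = counter;
        } else {
            let age = self.highest - counter;
            if age < 64 {
                self.bitmap |= 1u64 << age;
            }
        }
    }
}
```

Keeping the two steps separate matters: marking before authentication would let a forged packet burn a counter that a legitimate packet still needs.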
- Add fetch_incoming() and recv() methods to KcpDriver for retrieving
  reassembled messages
- Create KcpSessionManager in ip-packet-router that manages KCP sessions
  keyed by conv_id (first 4 bytes of KCP packet header)
- Store ReplySurbs per session for sending anonymous replies
- Implement session timeout (5 min) and max sessions limit (10000)
- Add comprehensive tests for session lifecycle and KCP roundtrip
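A minimal sketch of the bookkeeping described above: sessions keyed by the conversation ID from the first 4 bytes of the packet, with an idle timeout and a session cap. It tracks only last-seen timestamps. The real manager also holds KCP state and ReplySurbs, and the type names, error variants, and little-endian decoding here are assumptions:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
enum ManagerError {
    TooShort,
    AtCapacity,
}

/// Read the conversation ID from the first 4 bytes (assumed little-endian).
fn conv_id(packet: &[u8]) -> Option<u32> {
    if packet.len() < 4 {
        return None;
    }
    Some(u32::from_le_bytes([packet[0], packet[1], packet[2], packet[3]]))
}

struct SessionManager {
    last_seen: HashMap<u32, Instant>,
    max_sessions: usize,
    timeout: Duration,
}

impl SessionManager {
    fn new(max_sessions: usize, timeout: Duration) -> Self {
        Self { last_seen: HashMap::new(), max_sessions, timeout }
    }

    /// Refresh an existing session or create one, respecting the cap.
    fn touch(&mut self, packet: &[u8], now: Instant) -> Result<u32, ManagerError> {
        let conv = conv_id(packet).ok_or(ManagerError::TooShort)?;
        if !self.last_seen.contains_key(&conv) && self.last_seen.len() >= self.max_sessions {
            return Err(ManagerError::AtCapacity);
        }
        self.last_seen.insert(conv, now);
        Ok(conv)
    }

    /// Drop sessions idle for at least `timeout`; returns how many were removed.
    fn cleanup(&mut self, now: Instant) -> usize {
        let before = self.last_seen.len();
        let timeout = self.timeout;
        self.last_seen
            .retain(|_, seen| now.saturating_duration_since(*seen) < timeout);
        before - self.last_seen.len()
    }
}
```

With the values from the commit message this would be constructed as `SessionManager::new(10_000, Duration::from_secs(300))`.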
- Add KcpSessionManager field to MixnetListener struct
- Add is_kcp_message() helper to detect KCP-wrapped payloads
- Add on_kcp_message() to process LP client KCP messages
- Refactor on_reconstructed_message() to route KCP vs regular IPR
- Add KCP tick timer (100ms) for session updates and cleanup
- Initialize KcpSessionManager in IpPacketRouter::run_service_provider()

KCP messages are detected by checking byte 4 for valid KCP commands
(81-84), which doesn't conflict with IPR protocol version bytes (6-8)
at position 0.

Closes: nym-96zl

Add secondary check in is_kcp_message() to exclude messages that match
IPR protocol header pattern (version 6-8 at byte 0, ServiceProviderType
0-2 at byte 1). This prevents false positives where IPR messages with
byte 4 in range 81-84 would be incorrectly routed to KCP processing.

Added 4 unit tests to validate the detection logic.

Closes: nym-6f3x
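The two-stage detection described in the last two commits can be sketched as follows. The byte positions and value ranges come from the commit messages; the helper name `looks_like_ipr_header` and the exact structure are assumptions rather than the code in the PR:

```rust
/// IPR header pattern: protocol version 6-8 at byte 0 and
/// ServiceProviderType 0-2 at byte 1.
fn looks_like_ipr_header(payload: &[u8]) -> bool {
    payload.len() >= 2 && (6..=8).contains(&payload[0]) && payload[1] <= 2
}

/// A payload is treated as KCP when byte 4 holds a KCP command (81-84)
/// and the first two bytes do not match the IPR header pattern.
fn is_kcp_message(payload: &[u8]) -> bool {
    match payload.get(4) {
        Some(cmd) if (81..=84).contains(cmd) => !looks_like_ipr_header(payload),
        _ => false,
    }
}
```

For example, `[7, 1, 0, 0, 82]` has a KCP-looking command byte but is routed as IPR because its first two bytes match the IPR pattern, whereas `[1, 0, 0, 0, 81]` is routed to KCP processing. A KCP packet whose conversation ID happens to start with bytes in those IPR ranges would be misclassified the other way, so the check is a heuristic.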