Skip to content

Conversation

@durch
Copy link
Contributor

@durch durch commented Oct 23, 2025

Summary

This PR implements the Lewes Protocol (LP) registration path for direct TCP-based gateway connections, providing a more efficient alternative to mixnet-based registration while maintaining security through Noise protocol
handshake and credential verification.

What's Implemented

Server-Side (Gateway)

  • LP Listener on port 41264 accepting TCP connections
  • Noise protocol handshake (XKpsk3_25519_ChaChaPoly_SHA256) for secure channel establishment
  • Registration handler supporting both dVPN and Mixnet modes
  • Credential verification using existing EcashManager infrastructure
  • WireGuard peer management integration for dVPN registrations
  • Session management with replay protection using bitmap-based counter validation

Client-Side (RegistrationClient)

  • LpRegistrationClient module with TCP connection handling
  • State machine for LP handshake as initiator
  • Registration flow with request/response handling
  • Transport layer for post-handshake data transfer
  • Integration with existing RegistrationClient via register_lp() method
  • Result types (LpRegistrationResult) matching existing patterns

Protocol Features

  • Length-prefixed framing for TCP packet boundaries
  • Random PSK generation per registration (32-byte)
  • Replay protection with 1024-packet reorder window
  • Registration modes: dVPN (WireGuard) and Mixnet

Configuration & Defaults

  • Connection timeout: 10 seconds
  • Handshake timeout: 15 seconds
  • Registration timeout: 30 seconds
  • TCP optimizations: TCP_NODELAY enabled, no keepalive needed
  • No persistent connections: Registration-only protocol

Architecture Changes

New Modules

  • common/nym-lp/ - Core LP protocol implementation
  • common/nym-lp-common/ - Shared LP types
  • gateway/src/node/lp_listener/ - Gateway LP listener
  • nym-registration-client/src/lp_client/ - Client LP implementation

Modified Components

  • NymNode struct now includes optional lp_address field
  • RegistrationClient supports RegistrationMode::Lp
  • Gateway startup includes LP listener initialization

What's Outstanding

Testing

  • Unit tests for LP protocol components
  • Integration tests for end-to-end registration flow
  • Real gateway testing with LP-enabled nodes

Documentation

  • API documentation for LP registration usage
  • Configuration guide for LP-specific settings
  • Protocol specification documentation

Future Enhancements (Not blocking)

  • Metrics collection for LP operations
  • Gateway probe support for LP endpoints
  • Advanced PSK management (currently using random generation)
  • Error handling improvements and categorization

Breaking Changes

None - LP is an additional registration path that doesn't affect existing WireGuard or Mixnet registration modes.

Performance Impact

LP registration provides:

  • Lower latency: Direct TCP connection vs multi-hop mixnet
  • Reduced overhead: No Sphinx packet wrapping
  • Faster registration: ~1-2 seconds vs 10-30 seconds for mixnet

Security Considerations

  • Noise protocol provides mutual authentication and forward secrecy
  • Random PSK per registration prevents replay attacks
  • Credential verification ensures proper authorization
  • No persistent state reduces attack surface

Migration Path

No migration required. Clients can opt-in to LP registration by:

  1. Checking if gateway advertises LP address
  2. Using RegistrationMode::Lp when building RegistrationClient
  3. Falling back to existing modes if LP unavailable

This change is Reviewable

@vercel
Copy link

vercel bot commented Oct 23, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
nym-explorer-v2 Ready Ready Preview Comment Nov 24, 2025 11:16am
2 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
docs-nextra Ignored Ignored Preview Nov 24, 2025 11:16am
nym-node-status Ignored Ignored Preview Nov 24, 2025 11:16am

@durch durch marked this pull request as ready for review November 20, 2025 16:19
@durch durch requested a review from jstuczyn as a code owner November 20, 2025 16:19
durch and others added 14 commits November 21, 2025 13:49
Implement LP (Lewes Protocol) registration flow testing in nym-gateway-probe
to validate gateway LP registration capabilities alongside existing WireGuard
and mixnet tests.

Changes:
- Add LpProbeResults struct to track LP registration test results
  (can_connect, can_handshake, can_register, error)
- Add lp_registration_probe() function that tests full registration flow:
  * TCP connection to LP listener (port 41264)
  * Noise protocol handshake with PSK derivation
  * Registration request with bandwidth credentials
  * Registration response validation
- Integrate LP test into main probe flow - runs automatically if gateway
  has LP address (derived from gateway IP + port 41264)
- Export LpRegistrationClient from nym-registration-client for probe use
- Add LP address field to TestedNodeDetails

The probe tests only successful registration without additional traffic,
keeping the implementation simple and focused.
* Add nymkkt with KKT convenience wrappers for nym-lp integration

Integrates nymkkt module from georgio/noise-psq branch to enable
post-quantum key distribution for nym-lp.

Changes:
- Add common/nymkkt from georgio/noise-psq (KKT protocol implementation)
- Add convenience wrapper layer (kkt.rs) with simplified API:
  - request_kem_key() - Client requests gateway's KEM key
  - validate_kem_response() - Client validates signed response
  - handle_kem_request() - Gateway handles requests
- Add nymkkt to workspace members in root Cargo.toml
- Export kkt module in lib.rs

The KKT (Key Encapsulation Mechanism Transport) protocol enables efficient
distribution of post-quantum KEM public keys. Instead of storing large PQ
keys in the directory (1KB-500KB), we store 32-byte hashes and fetch actual
keys on-demand via this authenticated protocol.

Tests: All 5 unit tests passing (authenticated, anonymous, signature
verification, hash validation)

* feat(lp): add Ed25519 authentication to PSQ protocol

Replace basic PSQ v0 API with authenticated v1 API that includes
cryptographic authentication via Ed25519 signatures.

Changes:
- PSQ initiator now signs encapsulated keys with Ed25519 private key
- PSQ responder verifies Ed25519 signatures before deriving PSK
- Prevents MITM attacks through mutual authentication
- Fixed test helpers to use role-based Ed25519 keypair assignment
  (initiator uses [1u8;32], responder uses [2u8;32])

Security: This adds a critical authentication layer to the post-quantum
PSK derivation protocol, ensuring both parties can verify each other's
identity during the handshake.

Tests: All 77 tests passing (was 11 failures, now 0)

* feat(lp): integrate PSQ post-quantum PSK derivation

Complete integration of Post-Quantum Secure (PSQ) protocol for PSK
derivation in the Lewes Protocol, replacing simple Blake3 derivation
with cryptographically secure DHKEM-based PSK establishment.

This commit encompasses three completed tasks:

- Add KKTRequest/KKTResponse message types to LpMessage enum
- Update codec to handle KKT message serialization/deserialization
- Add kkt_orchestrator.rs with high-level KKT API wrappers
- Enable key exchange orchestration for PSQ protocol

- Add set_psk() method to NoiseProtocol for dynamic PSK injection
- Integrate PSQ derivation into LpSession handshake flow
- PSQ payload embedded in first Noise message (ClientHello)
- Derive PSK using libcrux-psq before Noise handshake completion
- Add helper functions for X25519 to KEM conversions

- Add comprehensive PSQ integration tests in session_integration/
- Test PSQ handshake end-to-end flow
- Validate PSK derivation correctness between initiator/responder
- Test PSQ + Noise combined protocol operation

Dependencies:
- libcrux-psq: Post-quantum PSK protocol implementation
- libcrux-kem: Key Encapsulation Mechanism primitives
- nym-kkt: KKT key exchange protocol wrappers
- rand 0.9: Required for KKT compatibility

Security: This adds Harvest-Now-Decrypt-Later (HNDL) resistance by
combining classical ECDH with post-quantum KEM for PSK derivation.
Even if X25519 is broken by quantum computers, the PSK remains secure.

Tests: All 77 tests passing

* feat(lp): add PSQ error handling documentation and tests (nym-bbi)

Formalize the "always abort" error handling strategy for PSQ failures.
PSQ errors indicate attacks, misconfigurations, or protocol violations
that should not be silently ignored or worked around.

Changes:
- Add comprehensive error handling documentation to psk.rs module
- Add diagnostic logging with error categorization:
  * CredError → warn about potential attack
  * TimestampElapsed → warn about potential replay
  * Other errors → log as errors
- Add 4 error scenario tests:
  * test_psq_deserialization_failure
  * test_handshake_abort_on_psq_failure
  * test_psq_invalid_signature
  * test_psq_state_unchanged_on_error
- Add log dependency to Cargo.toml

Error handling strategy: All PSQ failures abort the handshake cleanly
with no retry or fallback. This prevents silent security degradation
and ensures misconfigurations are detected early.

State guarantees: PSQ errors leave session in clean state - dummy PSK
remains, Noise HandshakeState unchanged, no partial data, no cleanup needed.

Tests: 81 tests passing (77 original + 4 new error tests)

Closes: nym-bbi

* feat(lp): add PSK injection tracking to prevent dummy PSK usage (nym-ep2)

Add safety mechanism to ensure real post-quantum PSK was injected before
allowing transport mode operations (encrypt/decrypt). This prevents
accidentally using the insecure dummy PSK [0u8; 32] if PSQ injection fails.

Changes:
- Add `psk_injected: AtomicBool` field to LpSession
- Initialize to `false` in LpSession::new()
- Set to `true` after successful PSK injection:
  * Initiator: In prepare_handshake_message() after set_psk()
  * Responder: In process_handshake_message() after set_psk()
- Add NoiseError::PskNotInjected error variant
- Add PSK injection checks in encrypt_data() and decrypt_data()
  * Check happens before handshake completion check
  * Returns PskNotInjected if flag is false
- Add comprehensive PSK injection lifecycle documentation to LpSession
- Add test_transport_fails_without_psk_injection test
- Update test_encrypt_decrypt_before_handshake to expect PskNotInjected

PSK Injection Lifecycle:
1. Session created with dummy PSK [0u8; 32] in Noise HandshakeState
2. During handshake, PSQ runs and derives real post-quantum PSK
3. Real PSK injected via set_psk() - psk_injected flag set to true
4. Handshake completes, transport mode available
5. Transport operations check psk_injected flag for safety

This is defensive programming - normal PSQ flow always injects the real PSK.
The safety check prevents transport mode if PSQ somehow fails silently or is
bypassed due to implementation bugs.

Tests: 82 tests passing (81 original + 1 new)

Closes: nym-ep2

* docs(lp): fix PSK state documentation inaccuracy

Correct error handling documentation to clarify that PSK slot 3
remains unmodified only on error, not in all cases.

Previous: "PSK slot 3 = dummy [0u8; 32] (never modified)"
Corrected: "PSK slot 3 = dummy [0u8; 32] (not modified on error)"

This is more accurate since:
- On error: PSK remains as dummy value (never injected)
- On success: PSK is replaced with real post-quantum PSK

Documentation-only change, no functional impact.

* feat(lp): add KKTExchange state to state machine for pre-handshake KEM key transfer (nym-4za)

Add KKTExchange state to LpStateMachine to properly orchestrate KKT (KEM Key Transfer)
protocol before Noise handshake begins. This enables dynamic KEM public key exchange,
allowing post-quantum KEM algorithms to be used without pre-published keys.

Changes:
- Add KKTExchange state and KKTComplete action to state machine
- Implement automatic KKT exchange on StartHandshake:
  * Initiator: sends KKT request → waits for response → validates signature
  * Responder: waits for request → validates → sends signed KEM key
- Update process_kkt_response() to accept Option<&[u8]> for hash validation:
  * Some(hash): full KKT validation with directory hash (future)
  * None: signature-only mode (current deployment)
- Add local_x25519_public() helper for responder KEM key derivation
- Update state flow: ReadyToHandshake → KKTExchange → Handshaking → Transport
- Add PSK handle storage (psk_handle) for future re-registration
- Export generate_fresh_salt() for session creation
- Update psq_responder_process_message() to return encrypted PSK handle (ctxt_B)
- Add comprehensive tests:
  * test_kkt_exchange_initiator_flow
  * test_kkt_exchange_responder_flow
  * test_kkt_exchange_full_roundtrip
  * test_kkt_exchange_close
  * test_kkt_exchange_rejects_invalid_inputs
  * Updated test_state_machine_simplified_flow for KKT phase

All tests passing. Ready for nym-8y5 (PSQ handshake KKT integration).

* docs(lp): add state machine and post-quantum security protocol documentation

Add comprehensive documentation of the Lewes Protocol state machine and
post-quantum security architecture to LP_PROTOCOL.md.

New sections:
- State Machine and Security Protocol overview
- Detailed state transition diagram (ReadyToHandshake → KKTExchange → Handshaking → Transport)
- Complete message sequence diagram showing KKT + PSQ + Noise flow
- KKT (KEM Key Transfer) protocol specification
- PSQ (Post-Quantum Secure PSK) protocol details
- Security guarantees and implementation status
- Algorithm choices (current X25519, future ML-KEM-768)
- Message type specifications for KKT
- Version 1.1 changelog entry documenting KKT/PSQ integration

Documentation includes:
- ASCII art state machine diagram
- Message sequence diagram with all protocol phases
- PSK derivation formulas
- Security properties checklist
- Migration path to post-quantum KEMs
- Integration details (PSQ embedded in Noise, no extra round-trips)

Related to nym-4za (KKTExchange state implementation).

* feat(lp): use KKT-authenticated KEM key in PSQ handshake (nym-8y5)

Replace direct X25519→KEM conversion with KKT-derived authenticated key
in PSQ initiator flow. This ensures PSQ uses the responder's authenticated
KEM public key obtained via KKT protocol instead of blindly converting
their X25519 key, properly completing the post-quantum security chain.

Changes:
- session.rs: Extract KEM key from KKTState::Completed in prepare_handshake_message()
- session.rs: Add set_kkt_completed_for_test() helper for test initialization
- session.rs: Update create_handshake_test_session() to initialize KKT state
- session.rs: Fix test_handshake_abort_on_psq_failure and test_psq_invalid_signature
- session_manager.rs: Add init_kkt_for_test() for integration test setup
- session_integration/mod.rs: Update tests for KKT-first flow (6 rounds total)
- session_integration/mod.rs: Fix state machine test expectations for KKTExchange state

All 87 tests passing. Unblocks nym-w8f (KKT tests) and nym-m15 (production integration).

* feat(lp): simplify API to Ed25519-only, derive X25519 internally

Refactored LP state machine to use Ed25519 keys exclusively in the public
API, with X25519 keys derived internally via RFC 7748. This simplifies the
API from 6 parameters to 4 while maintaining protocol security.

**Core API Changes:**
- LpStateMachine::new(): Removed explicit X25519 keypair parameters
- Old: new(is_initiator, local_keypair, local_ed25519_keypair,
         remote_public_key, remote_ed25519_key, salt)
- New: new(is_initiator, local_ed25519_keypair, remote_ed25519_key, salt)
- X25519 keys now derived internally from Ed25519 using RFC 7748
- lp_id calculation moved inside state machine (uses derived X25519 keys)

**Protocol Changes:**
- ClientHello message extended from 65 to 97 bytes
- Now includes client_ed25519_public_key field (32 bytes)
- Required for PSQ authentication in KKT + PSQ handshake flow
- Breaking change: gateway must extract Ed25519 from ClientHello

**Gateway Updates:**
- receive_client_hello() now extracts Ed25519 public key
- LpGatewayHandshake::new_responder() accepts Ed25519 keys only
- Removed manual X25519 conversion (handled by state machine)

**Registration Client Updates:**
- LpRegistrationClient now uses Ed25519 keypairs
- Generate fresh ephemeral Ed25519 keys for LP registration
- ClientHello includes Ed25519 public key for gateway authentication
- Fixed 7 pre-existing build errors:
  * mixnet_client_startup_timeout field removal
  * IprClientConnect API change (async → sync)
  * Error variant renames (use helper function)
  * LP client key type mismatches (X25519 → Ed25519)

**Test Suite:**
- Updated 16+ test functions to use new 4-parameter constructor
- Fixed 5 integration test failures caused by lp_id mismatch
- Tests now derive X25519 from Ed25519 (matching production behavior)
- Added missing PublicKey imports in test modules
- All 87 tests passing (100% success rate)

**Implementation Details:**
- Added Ed25519RecoveryError variant to LpError enum
- Type conversion: nym_crypto X25519 → nym_lp keypair types
- Maintained backward compatibility for PSQ/KKT protocol flow
- Session manager updated to use new API signature

This change completes the Ed25519-only API migration, hiding X25519 as an
implementation detail while preserving all security properties of the
KKT-authenticated PSQ handshake protocol.

* chore: run cargo fmt

* chore: run cargo clippy --fix to resolve simple linter issues

* Basic handshake working

* Final tweaks

* Wrap PR comments, 2024

---------

Co-authored-by: Jędrzej Stuczyński <[email protected]>
Copy link
Contributor

@jstuczyn jstuczyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a small set of comments based on a smaller subset of changes ( didn't have a chance to look into LP proper UPDATE: LP-related comments are in the following messages ):

  • I'm unsure about the changes to nym-node bonding -> why couldn't LP information be put into the self-described endpoint (which has to be queried anyway to retrieve, for example, the sphinx key)? making changes to the contract is always quite tricky especially when we want to ensure backwards compatibility
  • I think there might be an issue with the KCP driver and the way it handles new input data. it is defined as follows:
pub fn input(&mut self, data: &[u8]) -> Result<Vec<KcpPacket>, KcpError> {
    self.buffer.extend_from_slice(data);
    let mut pkts = Vec::new();
	while let Ok(Some(pkt)) = KcpPacket::decode(&mut self.buffer) {
	    debug!(
	        "Decoded packet, cmd: {}, sn: {}, frg: {}",
	        pkt.command(),
	        pkt.sn(),
	        pkt.frg()
	    );
	    self._input(&pkt)?;
	    pkts.push(pkt);
	}
	Ok(pkts)
}

However, if we receive malformed data, instead of returning error back the caller and terminating the processing, the driver will happily continue with Ok(Vec::new()) on all subsequent calls and will never advance its buffer (and continue filling it up with new, unprocessable data!). For each instance of KcpPacket::decode we should check for error and continue processing until we hit a None

  • might be my misunderstanding of KCP, but should new Segment have ts of 0 or should they use the current value within the KCPSession?
  • I think we're putting acks onto the send queue too eagerly without making all the checks
  • parse_data - on duplicates immediately return, meanwhile in C they still continue with moving items from the buffer onto the queue
  • some other minor nits regarding dependencies used: ideally all versions of the deps should be specified in the workspace and then be referred to via workspace = true. also, if possible, we shouldn't be including log and instead use tracing for consistency with other non-legacy crates

/// Optional LP (Lewes Protocol) listener address for direct gateway connections.
/// Format: "host:port", for example "1.1.1.1:41264" or "gateway.example.com:41264"
#[serde(default)]
pub lp_address: Option<String>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

under what situation will lp host be any different from the existing host field? shouldn't we be just announcing the lp port (assuming it's non-default)?
as a matter of fact - do we even need to do that at all? why couldn't everything be retrieved via the self-described information that's available on the http endpoint? (like we're doing with say the sphinx keys and whatnot)

host: "1.2.3.4".to_string(),
custom_http_port: None,
identity_key,
lp_address: None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as I said before, I'm not sure this is the right place for the lp address

break; // Cannot start assembling a message now
}

// Scan ahead in rcv_buf to find if a complete message exists contiguously
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we concerned in delivering full messages as opposed to next fragment in order? should it be KCP's concern to de-fragment data?

Copy link
Contributor Author

@durch durch Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, thats what that does, in case it got stuff out of order it will break, and wait for the next one. The fragment will remain, and wait its turn

// time to retransmit
need_send = true;
seg.xmit += 1;
// Exponential backoff: double RTO for this segment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait. wasn't the whole idea of KCP that it does NOT have expontential backoff?

In TCP, retransmission timeout (RTO) increases exponentially—each failure doubles the timeout interval (RTO × 2). This means three consecutive losses can escalate to RTO × 8, leading to serious transmission delays. KCP, in contrast, uses a more responsive approach in fast mode: the RTO increases by a factor of 1.5x

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats interesting, if i remember correctly in the mixnet its too noisy to use the defaults, but maybe this is not the right fix

);
self.out_pkts.push(pkt);

// if too many xmit => dead_link check, etc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so that's still to be done, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was supposed to come when we implement backpressure, we just never got to it

Copy link
Contributor

@jstuczyn jstuczyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another set of comments, this time regarding nym-lp:

  • Lp Packet trailer: it's meant to be authentication tag, but it seems it's always set to 0?
  • Lp Packet Client Hello: contains unimplemented!() macro placeholders; uses internal system clock which goes against the sans-io idea
  • should LP keypair be using sphinx' keys? like why should it care about sphinx key rotation?
  • LP key recovery might blow up
  • there's no support for no legacy PSK, just PSQ, even though comments would suggest otherwise
  • private keys are a bit mistreated in LpSession and converted back and forth between bytes to clone them - do they need to be cloned?
  • no initial state validation when processing KKT request (process_kkt_request)
  • other bits and pieces as pointed out by specific code comments

LpMessage::Busy => &[],
LpMessage::Handshake(payload) => payload.0.as_slice(),
LpMessage::EncryptedData(payload) => payload.0.as_slice(),
LpMessage::ClientHello(_) => unimplemented!(), // Structured data, serialized in encode_content
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given ClientHelloData is defined, this shouldn't be difficult to include, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorted it in a less bad way later on, ie just returning &[]. Tension here is that early message types were just Vecs, but then I added these structured ones, and now that whole thing is neither here nor there

/// Data structure for the ClientHello message
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ClientHelloData {
/// Client's LP x25519 public key (32 bytes) - derived from Ed25519 key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you want to be returning a nice slice in LpMessage::payload you might have to unfortunately get rid if the explicit structure in favour of a single [u8; 96] and then just return slices to corresponding bits

)));
}

let psq_payload = &payload[2..2 + psq_len];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should be a separate type, or at least set of function for serialising and deserialising initial PSQ handshake message so it'd be

  1. easier to test
  2. easier to read
  3. less error prone

Comment on lines +747 to +772
let local_private_bytes = &self.local_x25519_private.to_bytes();
let libcrux_private_key = libcrux_kem::PrivateKey::decode(
libcrux_kem::Algorithm::X25519,
local_private_bytes,
)
.map_err(|e| {
LpError::KKTError(format!(
"Failed to convert X25519 private key to libcrux PrivateKey: {:?}",
e
))
})?;
let dec_key = DecapsulationKey::X25519(libcrux_private_key);

let local_public_key = self.local_x25519_private.public_key();
let local_public_bytes = local_public_key.as_bytes();
let libcrux_public_key = libcrux_kem::PublicKey::decode(
libcrux_kem::Algorithm::X25519,
local_public_bytes,
)
.map_err(|e| {
LpError::KKTError(format!(
"Failed to convert X25519 public key to libcrux PublicKey: {:?}",
e
))
})?;
let enc_key = EncapsulationKey::X25519(libcrux_public_key);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given this is happening in each lp session, can't we just provide encapsulated keys straightaway in the constructor (and ideally Arc'd) to prevent this constant conversion?

let mut result_action: Option<Result<LpAction, LpError>> = None;

// 2. Match on the owned current_state. Each arm calculates and returns the NEXT state.
let next_state = match (current_state, input) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd help readibility if we split that match given it's over 300 lines long : D

Copy link
Contributor

@jstuczyn jstuczyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

penultimate comments:

  • IpPool allocation seems quite inefficient - essentially we have to iterate through the entire pool (both free and allocated entries) for every iteration
  • IpPair addresses are automatically released after 1h, even if in use!
  • Peer addresses might not get marked as used if they don't have a valid ipv6 address
  • LpPacket wire serialisation/deserialisation is duplicated in multiple places (say handshake.rs and handler.rs)
  • we're not listening for shutdown anyway within the LP handling -> we could very easily just block there

Comment on lines +54 to +59
/// Error message if registration failed
pub error: Option<String>,

/// Gateway configuration data (same as returned by authenticator)
/// This matches what WireguardRegistrationResult expects
pub gateway_data: Option<GatewayData>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to maybe combine those fields into a single enum to remove possibility (and required matches) of denegerate cases where both are set (or not set)?

Comment on lines +109 to +112
let free_ip = pool
.iter_mut()
.filter(|(_, state)| matches!(state, AllocationState::Free))
.choose(&mut rand::thread_rng())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't this form of allocation be quite costly as we'd have to iterate the whole map? why not

  1. have two structures, one for allocated and one for free IpPair
  2. allocate addresses sequentially from the free pool

}
_ = self.ip_cleanup_interval.next() => {
// Periodically cleanup stale IP allocations
let freed = self.ip_pool.cleanup_stale(DEFAULT_IP_STALE_AGE).await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that seems quite drastic given the staleness is set to 1h. what if there's still a connected client using the addresses and then the pair is re-allocated?

use nym_lp::codec::parse_lp_packet;

// Read 4-byte length prefix (u32 big-endian)
let mut len_buf = [0u8; 4];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tihs length prefixing looks like something that should have been handled by a codec and not dealt with manually

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that codec is designed for datagram parsing, and not stream parsing, so this is a sort of adapter where we convert the stream to what the codec wants, and the use it

}

/// Receive an LP packet from the stream with proper length-prefixed framing
async fn receive_packet(&self, stream: &mut TcpStream) -> Result<LpPacket, GatewayError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we duplicating packet serialisation/deserialisation (from handler.rs)? Can't we wrap TcpStream with a codec?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at it again, this is debt from moving from TcpStreams to one Tcp connection per packet (to remain sans-io-ish), it can be consolidated nicely

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And after more careful looking, I'd just leave it, if we want a generalized helper, and if we want it to live in nym-lp where it should be, then we need to drag tokio in there, all in all, it seems simpler to just special case it, I'm also a bit uncomfortable with 4 places where we duplicate this, but it seems simpler to just leave it

// Continue handshake until complete
loop {
// Read incoming packet
let packet = self.receive_packet(stream).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is one instance of many, but we're not checking for shutdown - what if we end up waiting until the timeout and block the whole process?


// For mock ecash mode (local testing), skip cryptographic verification
// and just return a dummy bandwidth value since we don't have blockchain access
let allocated = if ecash_verifier.is_mock() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the explicit is_mock() is a bit redundant. couldn't just the MockVerifier immediately return the mock amount in its verify() method?

inc!("lp_registration_attempts_total");

// 1. Validate timestamp for replay protection
if !request.validate_timestamp(30) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasn't the timestamp skew set in the config? shouldn't we use that value instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config probably came later, nice catch

Copy link
Contributor

@jstuczyn jstuczyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • when registering with the exit gateway (LP), we seem to connect to that node directly - doesn't that go against the whole idea of 2-hop? This comment, however, might be completely redundant because I can assume the telescoping PR changes this behaviour
  • LpTranport could easily be wrapped into Framed<TcpStream, LpCodec>, because that's essentially what send_packet and receive_packet are doing. (the same is true for the LpRegistrationClient) plus I already mentioned duplication of this very functionality in different places.

@@ -46,7 +50,7 @@ impl RegistrationClientBuilder {
let builder = MixnetClientBuilder::new_with_storage(mixnet_client_storage)
.event_tx(EventSender(event_tx));
let mixnet_client = tokio::time::timeout(
self.config.mixnet_client_startup_timeout,
MIXNET_CLIENT_STARTUP_TIMEOUT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit-ish: shouldn't this still be configurable like before? I can imagine different scenarios might require different timeouts

mixnet_client: None,
source: RegistrationClientError::Cancelled,
})?;
let (entry, exit) = Box::pin(async { tokio::join!(entry_fut, exit_fut) }).await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you seem to have removed checking for shutdown here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a messed up rebase

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants