Skip to content

Conversation

@durch
Copy link
Contributor

@durch durch commented Dec 5, 2025

Add gateway-probe localnet mode with LP-based two-hop WireGuard testing

Summary

  • Add localnet mode to gateway-probe for testing LP registration and two-hop WireGuard tunnels in a containerized environment
  • Implement two-hop WireGuard tunnel support (Client → Entry Gateway → Exit Gateway → Internet)
  • Add mock ecash credentials for local testing without real blockchain
  • Add Docker/Container-based localnet infrastructure for multi-gateway testing

Key Changes

Gateway Probe Localnet Mode (nym-gateway-probe/)

  • New --localnet flag to test against container-based local network
  • TestMode enum to separate mainnet/localnet behavior
  • Two-hop WireGuard via UDP forwarder pattern (ported from VPN client)
  • LP-based registration flow testing with nested sessions
  • Mock ecash support via --mock-ecash flag

Localnet Infrastructure (docker/localnet/)

  • localnet.sh - Orchestrates multi-container test network (nym-api, validators, gateways)
  • Dockerfile.localnet - Container image with nym binaries + test tooling
  • build_topology.py - Generates network topology from running containers
  • NAT/iptables configuration for proper gateway routing

Gateway LP Registration (gateway/src/node/lp_listener/)

  • Removed unnecessary PSK from WireGuard peer registration (was breaking handshakes)

Known Issues

  • Container networking to external internet becomes unstable after probe runs. Internal container traffic continues working. Workaround: restart containers between probe runs. Needs validation on real machines

This change is Reviewable

durch added 11 commits December 1, 2025 10:18
- Add CLI args for localnet testing (no HTTP API needed):
  --entry-gateway-identity, --exit-gateway-identity
  --entry-lp-address, --exit-lp-address, --lp-port

- Add TestMode enum (Mixnet, SingleHop, TwoHop, LpOnly) with
  --mode CLI arg and auto-inference from legacy flags

- Add TestedNodeDetails::from_cli() for localnet mode

- Add Probe::new_localnet() constructor

- Fix LpRegistrationClient API calls for packet-per-connection model
The mode_to_flags() function was discarding the original only_wireguard
flag. Now we preserve args.only_wireguard since it's orthogonal to the
test mode (it means "skip ping tests" in mixnet mode).
- Extract TestMode enum to mode/mod.rs for cleaner organization
- Add common/wireguard.rs with shared WireGuard tunnel testing
- Deduplicate netstack code from wg_probe() and wg_probe_lp()
- Net reduction of 174 lines in lib.rs
- from_flags() tests for all flag combinations
- Helper method tests (needs_mixnet, uses_lp, tests_wireguard, needs_exit_gateway)
- Display and FromStr tests with alternate formats
- Roundtrip test ensuring Display/FromStr consistency

Closes: nym-39mt
- Replace only_lp_registration and test_lp_wg boolean params with TestMode
- Keep only_wireguard separate (controls ping behavior in Mixnet mode)
- Use TestMode helper methods for cleaner control flow:
  - needs_mixnet() && !only_wireguard → run ping tests
  - tests_wireguard() → run WG tests
  - uses_lp() → use LP path instead of authenticator
- Convert legacy flags to TestMode at call sites for backward compatibility

Closes: nym-dd70
- Add early validation for --mode two-hop without exit gateway (nym-c0hl)
  Clear error message instead of silent failure deep in probe

- Add safe_ratio() helper for division by zero protection (nym-ktvz)
  Returns 0.0 when sent_hosts/sent_ips is 0 instead of NaN/Inf

- Refactor run_tunnel_tests to take &mut WgProbeResults (nym-4v1p)
  Eliminates 40+ lines of field-by-field copying at call sites

Closes: nym-c0hl, nym-ktvz, nym-4v1p
- Remove unused _storage parameter from wg_probe_lp (nym-r3w9)
- Fix common/mod.rs docs to match implemented features (nym-1nsv)

Closes: nym-r3w9, nym-1nsv, nym-inol (epic)
- Add Test Modes section explaining mixnet/single-hop/two-hop/lp-only
- Add Localnet Mode (run-local) usage examples
- Add Split Network Configuration for docker setups
- Add CLI Reference with all new flags
- Add Output section with JSON example

Closes: nym-mj2q
Key changes:
- Add outer_aead_key_for_sending() to gate outer encryption on PSQ completion
  (fixes bug where initiator encrypted msg 1 before responder could decrypt)
- Add handshake_and_register_with_credential() to NestedLpSession for mock ecash
- Update PSQState::InitiatorWaiting to store PSK instead of ciphertext
- Add probe-localnet.sh script for two-hop localnet testing
- Update gateway handler with connection lifecycle statistics

The PSK timing fix ensures the first Noise message is sent in cleartext
because the responder hasn't derived the PSK yet from the PSQ payload.
Add UDP forwarder pattern (copied from VPN client) to enable proper two-hop
tunneling where traffic flows: Client → Entry Gateway → Exit Gateway → Internet.

Key changes:
- Add udp_forwarder.go for tunnel-in-tunnel traffic forwarding
- Add wgPingTwoHop() Go function and Rust FFI bindings
- Configure NAT/iptables in localnet for gateway routing
- Remove unnecessary PSK from gateway LP registration (was breaking handshakes)
- Document known container networking instability issue (nym-vbdo)

The probe now correctly uses the entry tunnel to reach the exit gateway's
WireGuard endpoint, rather than trying to connect directly to unreachable
container-internal IPs.
@vercel
Copy link

vercel bot commented Dec 5, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Review Updated (UTC)
nym-explorer-v2 Ready Ready Preview, Comment Dec 22, 2025 9:16am
2 Skipped Deployments
Project Deployment Review Updated (UTC)
docs-nextra Ignored Ignored Preview Dec 22, 2025 9:16am
nym-node-status Ignored Ignored Preview Dec 22, 2025 9:16am


impl TestMode {
/// Infer test mode from legacy boolean flags (backward compatibility)
pub fn from_flags(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than doing this overcomplicated flag combination matching, why not just expose a --test-mode flag in CLI instead? it would then directly parse into a TestMode

}
} else if only_wireguard {
// WireGuard via authenticator (still uses mixnet path)
TestMode::Mixnet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so what's the point of only-wireguard if in the end it ends up with the mixnet mode?

--entry-gateway-identity "$ENTRY_ID" \
--entry-lp-address '127.0.0.1:41264' \
--exit-gateway-identity "$EXIT_ID" \
--exit-lp-address '192.168.65.6:41264' \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is exit api on the local host, but this one assumes address on a container network (that I assume is unstable and X.X.65.6 will change once restarted?

///
/// Note: For sending packets during handshake, use `outer_aead_key_for_sending()`
/// which checks PSQ state to avoid encrypting before the responder can decrypt.
pub fn outer_aead_key(&self) -> Option<OuterAeadKey> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whilst this code hasn't been changed directly, I'm not sure if it's a good practice to clone the key -> wouldn't we be able to get away by returning a reference instead?

let outer_key = state_machine.session().ok().and_then(|s| s.outer_aead_key());
let outer_key = state_machine
.session()
.ok()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this pattern is not going to cause us issues later on. we fail to get a session (i.e. we're in either Closed or Processing state), we ignore it and then just try continue with what is going to be an empty key. why not just stop here because realistically we can't do anything useful?
I assume the same thing would apply in multiple places

));
}
};
let mut target_stream =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just traced the callstack of this guy - do I read it right that for every received client packet, we open a new connection to the target gateway every single time?
even ignoring hitting the open file limit pretty quickly, that's rather inefficient. couldn't we keep connections alive, at least for some time, from previous attempts?

exitIpc.WriteString("private_key=")
exitIpc.WriteString(req.ExitPrivateKey)
// Set listen_port so the forwarder knows which port to accept packets from
exitIpc.WriteString(fmt.Sprintf("\nlisten_port=%d", DEFAULT_EXIT_WG_CLIENT_PORT))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we be hardcoding the default exit port? what if the gateway has configured it to something else? (assuming we allow such configuration to happen in the first place)

/// Pre-queried exit gateway node (used when --exit-gateway-ip is specified for LP forwarding)
exit_gateway_node: Option<DirectoryNode>,
/// Localnet entry gateway info (used when --entry-gateway-identity is specified)
localnet_entry: Option<TestedNodeDetails>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than calling it "localnet", how about something that implies you provide explicit details yourself, i.e. you could very well run it on mainnet but provide all the information manually to avoid the extra queries

tracing::trace!("Built registration request: {:?}", request);

// Step 4: Serialize the request
let request_bytes = bincode::serialize(&request).map_err(|e| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rathen than using the opaque bincode::serialize, can you make a helper that explicitly creates impl bincode::Options like we do elsewhere in such situations? this is because there are some changes in default behaviour between bincode 1.X and 2.X which would be quite painful to debug once (if) we migrate

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants