@D-Stacks commented Dec 25, 2025

For the main reference, the corresponding paper can be found here: https://arxiv.org/abs/2006.14186

This branch is a "more" stable and tested version; I also maintain a dev branch here: https://github.com/D-Stacks/rusty-kaspa/tree/perigee-dev. As a general disclaimer, both are currently considered beta and work-in-progress; in the dev branch things may change and break more quickly, and I won't keep an updated changelog.

  • The implementation scores neighbors via joint subset scoring. Using the empirical 90th percentile of individual delay scores, the top perigee peer is chosen; the individual delay scores on which it was the top-performing peer are then removed from subsequent scoring, and the remaining peers are in turn rated on the empirical 90th percentile of the remaining individual delay scores. This process is repeated until the specified number of peers to leverage has been chosen. This ensures that the leveraged perigee peers are chosen holistically, minimizing delay to different parts of the block-producing network, rather than being scored individually with overlap relative to one another. A sketch of this selection loop follows below.
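To make the selection loop concrete, here is a minimal, self-contained sketch of how I read the subset-scoring round; the function name, peer-id type, percentile indexing, and tie-breaking are illustrative assumptions, not the actual PR code:

use std::collections::HashMap;

/// Sketch of joint subset scoring: repeatedly pick the peer with the best
/// empirical 90th-percentile delay over the still-unclaimed block columns,
/// then drop the columns where that peer was the top performer, so the
/// remaining peers are scored on what the winners do NOT already cover.
/// `delays[peer]` holds per-block delays for the round, aligned by index,
/// with missing timestamps filled as u64::MAX.
fn select_perigee_peers(delays: &HashMap<u64, Vec<u64>>, target: usize) -> Vec<u64> {
    let num_blocks = delays.values().next().map_or(0, |d| d.len());
    let mut remaining_blocks: Vec<usize> = (0..num_blocks).collect();
    let mut candidates: Vec<u64> = delays.keys().copied().collect();
    let mut selected = Vec::with_capacity(target);

    // Empirical 90th percentile of a peer's delays over the given columns.
    let p90 = |peer: u64, blocks: &[usize]| -> u64 {
        let mut sample: Vec<u64> = blocks.iter().map(|&b| delays[&peer][b]).collect();
        sample.sort_unstable();
        sample[((sample.len() as f64 * 0.9) as usize).min(sample.len() - 1)]
    };

    while selected.len() < target && !candidates.is_empty() && !remaining_blocks.is_empty() {
        // Choose the candidate with the lowest 90th-percentile delay.
        let best = *candidates.iter().min_by_key(|&&peer| p90(peer, &remaining_blocks)).unwrap();
        candidates.retain(|&p| p != best);
        selected.push(best);
        // Remove the block columns on which the chosen peer was fastest.
        remaining_blocks.retain(|&b| delays.iter().any(|(&p, d)| p != best && d[b] < delays[&best][b]));
    }
    selected
}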

  • Rounds are encapsulated within the connection manager's event loop, which means they have a granularity of 30 seconds; specifying --perigee-round-frequency=x will trigger a round every x * 30 seconds. I still have a bug whereby a round may occasionally be skipped. Additionally, if for whatever reason the number of perigee peers overflows the perigee peer target, an intermittent evaluation and trimming is done within the connection manager's handle_outbound_connections to ensure target consistency. A sketch of the tick logic follows below.
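As I read it, the round cadence amounts to a counter inside the 30-second connection-manager tick; a minimal sketch with hypothetical names (not the actual event-loop code):

/// Sketch: perigee rounds piggyback on the connection manager's event loop,
/// which runs once every 30 seconds; a round fires every `round_frequency`
/// ticks, i.e. every round_frequency * 30 seconds of wall-clock time.
struct PerigeeScheduler {
    tick: u64,
    round_frequency: u64, // from --perigee-round-frequency
}

impl PerigeeScheduler {
    /// Called once per 30-second connection-manager iteration; returns
    /// whether a perigee round should be evaluated on this tick.
    fn on_tick(&mut self) -> bool {
        self.tick += 1;
        self.tick % self.round_frequency == 0
    }
}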

  • Delays are measured relative to the node's first-seen timestamp; the paper also offers a method based on the block creation timestamp, which is not the method used here.

  • Some delays might be calculated erroneously, specifically at perigee round boundaries: for example, block hash A might be registered from peer A in perigee round A, while peer B sends it in perigee round B. As a rule, perigee only evaluates blocks that were consensus-verified within the round; timestamps that land in the wrong round are ignored and, as with all missing timestamps, evaluated as u64::MAX. In practice this can probably be dismissed as mere noise, since such entries are filtered out by using the empirical 90th percentile as the scoring mechanism. A sketch of how the per-round delay table is built follows below.
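For concreteness, here is a minimal sketch of how I understand a round's delay table to be assembled; all names and types are hypothetical, and only the u64::MAX convention for missing or out-of-round timestamps is taken from the description above:

use std::collections::{HashMap, HashSet};

type Hash = [u8; 32]; // stand-in for the node's block-hash type
type PeerId = u64;

/// Sketch: for each block consensus-verified within the round, a peer's delay
/// is its own relay timestamp minus the node's first-seen timestamp for that
/// block; a peer that never relayed the block (or whose timestamp landed in
/// the wrong round) scores u64::MAX, to be washed out by the 90th percentile.
fn build_delay_table(
    verified_in_round: &HashSet<Hash>,
    first_seen: &HashMap<Hash, u64>, // earliest timestamp from any source
    peer_timestamps: &HashMap<PeerId, HashMap<Hash, u64>>,
) -> HashMap<PeerId, Vec<u64>> {
    let blocks: Vec<&Hash> = verified_in_round.iter().collect();
    peer_timestamps
        .iter()
        .map(|(&peer, seen)| {
            let delays = blocks
                .iter()
                .map(|&b| match (seen.get(b), first_seen.get(b)) {
                    // saturating_sub guards against a peer timestamp that
                    // somehow precedes the recorded first-seen time.
                    (Some(&t), Some(&first)) => t.saturating_sub(first),
                    _ => u64::MAX, // missing or out-of-round timestamp
                })
                .collect();
            (peer, delays)
        })
        .collect()
}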

  • Parameters such as the number of perigee peers (--perigeepeers=<usize>), the exploration rate (--perigee-exploration-rate=<f64>), and the leverage rate (--perigee_leverage_rate=<f64>) are all provided via the command line, not hard-coded. Specifying values which do not result in enough exploration, or leverage, to maintain perigee will cause perigee to simply not trigger and revert to default random graph routing (with an info message). For testing purposes, having flexibility here is an advantage, but we may simply provide good constants later on. A sketch of how such arguments might be declared follows below.
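The flags above suggest a declaration roughly like the following; this is an illustrative clap-style sketch (assuming the clap v4 derive API), and the default values are placeholders, not the PR's actual argument wiring:

use clap::Parser;

/// Sketch of the perigee-related CLI surface described above.
#[derive(Parser, Debug)]
struct PerigeeArgs {
    /// Number of outbound peers managed by perigee routing.
    #[arg(long = "perigeepeers", default_value_t = 0)]
    perigee_peers: usize,
    /// Fraction of perigee peers rotated out for exploration each round (placeholder default).
    #[arg(long = "perigee-exploration-rate", default_value_t = 0.25)]
    perigee_exploration_rate: f64,
    /// Fraction of perigee peers retained across rounds (placeholder default).
    #[arg(long = "perigee_leverage_rate", default_value_t = 0.75)]
    perigee_leverage_rate: f64,
    /// Trigger a perigee round every N 30-second connection-manager ticks.
    #[arg(long = "perigee-round-frequency", default_value_t = 1)]
    perigee_round_frequency: u64,
    /// Log comparison statistics between perigee and random-graph peers.
    #[arg(long = "perigee-statistics")]
    perigee_statistics: bool,
}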

  • Current routing can be (even partially) preserved, i.e. specifying --perigeepeers=<usize> < --outpeers=<usize> will leave the remaining peers under the current random graph routing paradigm, using both routings side by side.

  • Statistics are exposed via --perigee-statistics; this also automatically logs random-graph-routed peer timestamps and compares them to the perigee outbound peers, for the sake of performance comparison. Ideally a node should be run in this scenario with perigeepeers == outpeers / 2, to ensure a fair statistical comparison.

  • In current field testing I am getting around a 30-80% decrease in observed delay times of signaled blocks compared to random graph routing. This might shrink significantly if perigee is more widely adopted (as more perigee nodes will compete for, and cluster around, well-connected and block-producing nodes), or in the case of public nodes with a lot of inbound peers, where a larger sample size of first-seen timestamps can be used to calculate delays.

  • Some performance characteristics I gathered: on my machine with the start args ./kaspad --loglevel=info,kaspa_perigeemanager=debug --outpeers=16 --perigeepeers=8 --perigee-statistics --perigee-round-frequency=3, a typical perigee round with about 600 timestamps (~1 minute worth of data) and 8 peers executes in about 1-2 ms, with virtually all time spent on building the peer table. Because I am currently on a bad internet connection due to the Christmas holidays, I can't really test more peers, as more peers clog up my connection. I also do not currently maintain a public node; since perigee also logs global first-seen timestamps from all available sources (i.e. including inbound peers), this might impact performance, and is something I cannot currently test.

  • As a side product of implementing perigee, I had to categorize outbound peers accordingly; each outgoing connection to a new peer now holds an enum, in both the Router and the Peer struct, according to its expected connection and routing behavior:

#[derive(Copy, Debug, Clone)]
pub enum PeerOutboundType {
    Perigee,
    RandomGraph,
    /// A user-specified persistent connection, established either via the command line `--connectpeer`, or the add_peer RPC (with is_permanent=true).
    /// These peers do not count towards the outbound limit; if they disconnect, the node will keep trying to reconnect to them indefinitely.
    Persistent,
    /// A user-specified temporary connection, established either via the command line `--addpeer`, or the add_peer RPC (with is_permanent=false).
    /// These peers do not count towards the outbound limit; if they disconnect, no effort is made to reconnect.
    Temporary,
}

One caveat here is that I needed to categorize user-specified peers; these do not technically fall under either the random graph or the perigee routing mechanism, and depending on whether they are specified as is_permanent or not, they display different persistence behavior. Previously, when these peers were added to the node they occupied slots beyond the outbound limit, but when another connection was lost they replaced it from the outbound limit's point of view. As I was unsure how to handle these connections without altering or adding a bunch of new args, these peers now simply do not count towards the outbound limit. This may not be the desired handling, but it is a change in this PR currently; a sketch of the resulting accounting is below.
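Under the new handling, outbound-slot accounting reduces to something like the following hypothetical helper (not code from the PR):

/// Sketch: only routed peers (perigee or random graph) consume outbound
/// slots; user-specified peers live entirely outside the outbound limit.
fn counts_towards_outbound_limit(peer_type: PeerOutboundType) -> bool {
    matches!(peer_type, PeerOutboundType::Perigee | PeerOutboundType::RandomGraph)
}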

  • Regarding storage: first-seen timestamps, as well as the set of verified blocks per perigee round, are kept in the perigee manager; the values of individual perigee participants are held within the router. These are stored in raw hash maps / sets, not in any kind of cached stores (see the sketch below). If a malicious peer were to flood the node with erroneous BlockInvRelayMessages, these would be saved and then cleared after the next perigee round evaluates. The security guarantee here rests on the fact that we expect erroneous BlockInvRelayMessages to quickly result in a protocol error, and expulsion, for that peer, with clean-up happening after the next round. In case we decide this is not enough, to strengthen security against such attacks we could also implement a max cache size for the timestamp hash maps, or consider running perigee clean-up operations right after a relevant protocol error is encountered. It is worth noting that erroneous timestamps cannot be falsely evaluated, as all evaluation is done only on blocks that have been consensus-verified within the round.
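The storage layout described above amounts to something like the following; the field names and types are my guesses for illustration, not the PR's actual definitions:

use std::collections::{HashMap, HashSet};

type Hash = [u8; 32]; // stand-in for the node's block-hash type

/// Sketch of the round-scoped state held by the perigee manager.
struct PerigeeManagerState {
    /// Earliest timestamp each block was seen from any source, incl. inbound peers.
    first_seen: HashMap<Hash, u64>,
    /// Blocks consensus-verified within the current round; only these are scored.
    verified_in_round: HashSet<Hash>,
}

/// Sketch of the per-peer state held within the router.
struct RouterPerigeeState {
    /// This peer's relay timestamps for the current round; cleared after each
    /// round evaluates, which also bounds damage from an inv-flooding peer.
    timestamps: HashMap<Hash, u64>,
}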

  • Some thoughts on actual adoption: in essence, perigee routing only benefits nodes that actually require low-latency block arrivals, which in the case of Kaspa I believe are predominantly miners. Although the network as a whole would benefit if everyone adopted perigee, it may be more advantageous if this routing is only advertised to miners, or potentially made available only when --enable-mainnet-mining is set, or enabled as the default with that flag. Alternatively, I believe running only a few perigee nodes in the wild could act as a poor man's fast relay network, serving as a relatively direct and implicit relay between miners and thereby benefiting the network as a whole. Also, having a lot of perigee nodes might put more pressure on the inbound limit of well-connected nodes, so it might be worthwhile revisiting inbound eviction logic / policy (#431) first.

  • The code could probably use some rudimentary tests and some clean-up, maybe catching some bugs, etc., but I decided to post it as a draft so there can be some discussion about it on the side.

@D-Stacks marked this pull request as draft December 25, 2025 23:01
@D-Stacks changed the title Feat: Implementation of Perigee p2p Routing. P2P / Networking: — Implementation of Perigee p2p Routing. Jan 7, 2026
@D-Stacks changed the title P2P / Networking: — Implementation of Perigee p2p Routing. P2P / Networking — Implementation of Perigee p2p Routing. Jan 7, 2026