
Stop decaying liquidity information during scoring #2656

Merged
Changes from all commits (17 commits)
6471eb0
Depend on `libm` in `no-std` for `powf`(64)
TheBlueMatt Nov 1, 2023
6c366cf
Pass the current time through `ScoreUpdate` methods
TheBlueMatt Oct 2, 2023
b84842a
Add a scoring decay method to the `ScoreUpdate` trait
TheBlueMatt Oct 2, 2023
f0f8194
Track historical liquidity update time separately from the bounds
TheBlueMatt Oct 2, 2023
9659c06
Impl decaying in `ProbabilisticScorer::decay_liquidity_certainty`
TheBlueMatt Oct 2, 2023
35b4964
Stop decaying historical liquidity information during scoring
TheBlueMatt Oct 9, 2023
6f8838f
Stop decaying liquidity information during bounds-based scoring
TheBlueMatt Oct 9, 2023
5ac68c1
Update history bucket last_update time immediately on update
TheBlueMatt Nov 29, 2023
d54c930
Pipe `Duration`-based time information through scoring pipeline
TheBlueMatt Oct 9, 2023
2288842
Use `Duration` based time info in scoring rather than `Time`
TheBlueMatt Oct 9, 2023
d15a354
Drop now-unused `T: Time` bound on `ProbabilisticScorer`
TheBlueMatt Oct 9, 2023
512f44c
Drop now-trivial `decayed_offset_msat` helper utility
TheBlueMatt Oct 9, 2023
40b4094
Add a benchmark for decaying a 100k channel scorer's liquidity info
TheBlueMatt Oct 9, 2023
81389de
Drop warning about mixing `no-std` and `std` `ProbabilisticScorer`s
TheBlueMatt Oct 12, 2023
21facd0
Make scorer decay + persistence more frequent
TheBlueMatt Nov 29, 2023
18b4231
Drop half-life-based bucket decay in `update_history_buckets`
TheBlueMatt Nov 29, 2023
f8fb70a
Drop fake time advancing in scoring tests
TheBlueMatt Dec 5, 2023
3 changes: 2 additions & 1 deletion bench/benches/bench.rs
@@ -21,5 +21,6 @@ criterion_group!(benches,
lightning_persister::fs_store::bench::bench_sends,
lightning_rapid_gossip_sync::bench::bench_reading_full_graph_from_file,
lightning::routing::gossip::benches::read_network_graph,
lightning::routing::gossip::benches::write_network_graph);
lightning::routing::gossip::benches::write_network_graph,
lightning::routing::scoring::benches::decay_100k_channel_bounds);
criterion_main!(benches);
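The registered `decay_100k_channel_bounds` benchmark lives in `lightning::routing::scoring::benches`; its body is not part of this diff. For readers unfamiliar with Criterion, a benchmark registered this way generally has the shape below — a sketch only, with the scorer construction elided.

```rust
use criterion::Criterion;

// Hypothetical outline of the new benchmark: build a scorer holding liquidity data for
// ~100k channels (elided here), then measure a single decay pass over all of them.
pub fn decay_100k_channel_bounds(bench: &mut Criterion) {
    bench.bench_function("decay_100k_channel_bounds", |b| b.iter(|| {
        // scorer.time_passed(duration_since_epoch)
    }));
}
```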
93 changes: 63 additions & 30 deletions lightning-background-processor/src/lib.rs
@@ -113,7 +113,7 @@ const ONION_MESSAGE_HANDLER_TIMER: u64 = 1;
const NETWORK_PRUNE_TIMER: u64 = 60 * 60;

#[cfg(not(test))]
const SCORER_PERSIST_TIMER: u64 = 60 * 60;
const SCORER_PERSIST_TIMER: u64 = 60 * 5;
Contributor:

Not sure if we should use a constant here. It should be no more than the user-defined half-life, ideally such that the half-life is divisible by it.

Collaborator Author:

Hmm, I guess? If a user sets an aggressive half-life I'm not entirely convinced we want to spin their CPU trying to decay liquidity bounds. Doing it a bit too often when they set a super high decay also seems fine-ish? I agree it'd be a bit nicer to switch to some function of the configured half-life, but I'm not sure it's worth adding some accessor to ScoreUpdate.
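As a rough illustration of the alternative floated above (deriving the interval from the configured half-life rather than hardcoding it), something like the sketch below could work. The `liquidity_offset_half_life` name mirrors LDK's `ProbabilisticScoringDecayParameters` field; the divisor and clamping bounds are made-up values, not anything from this PR.

```rust
use core::time::Duration;

// Hypothetical helper: pick a decay/persist interval as a fraction of the configured
// half-life, clamped so an aggressive half-life doesn't spin the CPU and a very long
// one still gets persisted within the hour.
fn scorer_persist_timer_secs(liquidity_offset_half_life: Duration) -> u64 {
    (liquidity_offset_half_life.as_secs() / 4).clamp(60, 60 * 60)
}
```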

#[cfg(test)]
const SCORER_PERSIST_TIMER: u64 = 1;

@@ -244,30 +244,30 @@ fn handle_network_graph_update<L: Deref>(
/// Updates scorer based on event and returns whether an update occurred so we can decide whether
/// to persist.
fn update_scorer<'a, S: 'static + Deref<Target = SC> + Send + Sync, SC: 'a + WriteableScore<'a>>(
scorer: &'a S, event: &Event
scorer: &'a S, event: &Event, duration_since_epoch: Duration,
) -> bool {
match event {
Event::PaymentPathFailed { ref path, short_channel_id: Some(scid), .. } => {
let mut score = scorer.write_lock();
score.payment_path_failed(path, *scid);
score.payment_path_failed(path, *scid, duration_since_epoch);
},
Event::PaymentPathFailed { ref path, payment_failed_permanently: true, .. } => {
// Reached if the destination explicitly failed it back. We treat this as a successful probe
// because the payment made it all the way to the destination with sufficient liquidity.
let mut score = scorer.write_lock();
score.probe_successful(path);
score.probe_successful(path, duration_since_epoch);
},
Event::PaymentPathSuccessful { path, .. } => {
let mut score = scorer.write_lock();
score.payment_path_successful(path);
score.payment_path_successful(path, duration_since_epoch);
},
Event::ProbeSuccessful { path, .. } => {
let mut score = scorer.write_lock();
score.probe_successful(path);
score.probe_successful(path, duration_since_epoch);
},
Event::ProbeFailed { path, short_channel_id: Some(scid), .. } => {
let mut score = scorer.write_lock();
score.probe_failed(path, *scid);
score.probe_failed(path, *scid, duration_since_epoch);
},
_ => return false,
}
Comment on lines 249 to 273
Contributor:

Won't this mean channels along recently used paths will have their offsets decayed but other channels will not?

Collaborator Author:

Rather the opposite - by the end of the patchset, we only decay in the timer method. When updating we just set the last-update to duration_since_epoch. In theory if a channel is updated in between each timer tick it won't be materially decayed, but I think that's kinda okay, I mean it's not a lot of time anyway. If we want to be more pedantically correct I could decay the old data before update.

Contributor:

Maybe I'm confused, but it looks like we only decay once per hour in the background processor.

Collaborator Author:

Plus once on startup. I'm not understanding the issue you're raising; are you saying we should reduce the hour to something less?

Contributor:

Yeah, I was pointing out that we are left in a state of partial decay. Added a comment elsewhere, but if you modify last_updated and set, say, the max offset, then you need to decay the min offset. Otherwise, it won't be properly decayed on the timer tick. So -- after fixing that -- you'll end up with recently used channels decayed while the others are not.

Contributor:

> All that said, I'm not really convinced either is a super critical issue, at least if we decay more often, at max we'd be off by a small part of a half-life.

Hmm... if one offset is updated frequently, you'll get into a state where the other offset is only ever partially decayed even though it may have been given that value many half-lives ago. So would really depend on both payment and decay frequency.

Collaborator Author:

If we're regularly sending some sats over a channel successfully, so we're constantly reducing our upper bound by the amount we're sending, I think it's fine to not decay the lower bound? We'll eventually pick some other channel to send over because we ran out of estimated liquidity, and we'll decay at that point.

Contributor:

FWIW, that's not the only scenario. Failures at a channel and downstream from it adjust its upper and lower bounds, respectively. So if you fail downstream with increasing amounts, the upper bound may not be properly decayed.

Collaborator Author:

Right, but presumably repeatedly failing downstream of a channel with higher and higher amounts isn't super likely.

Contributor:

Not necessarily for the same payment or at the same downstream channel. From the perspective of the scored channel, it's simply the normal case of learning a more accurate lower bound on its liquidity as a consequence of knowing a payment was routed through it but failed downstream.
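To make the tradeoff being debated here concrete, the half-life decay under discussion looks roughly like the sketch below. This is an illustrative model only (the field names and structure are not LDK's actual internals), but it shows why decaying both offsets from a single `last_updated` timestamp in one place, rather than inline during scoring, ties the two bounds together.

```rust
use core::time::Duration;

// Illustrative model of per-channel liquidity bounds sharing one update timestamp.
struct ChannelLiquidity {
    min_liquidity_offset_msat: u64,
    max_liquidity_offset_msat: u64,
    last_updated: Duration, // wall-clock time of the last update or decay pass
}

impl ChannelLiquidity {
    // Apply half-life decay once, e.g. from `ScoreUpdate::time_passed`, instead of
    // decaying on every scoring call.
    fn decay(&mut self, now: Duration, half_life: Duration) {
        let elapsed = now.saturating_sub(self.last_updated).as_secs_f64();
        let factor = 0.5f64.powf(elapsed / half_life.as_secs_f64());
        // Both offsets decay together, so a frequently-updated bound cannot leave the
        // other bound stuck at a stale, only-partially-decayed value.
        self.min_liquidity_offset_msat = (self.min_liquidity_offset_msat as f64 * factor) as u64;
        self.max_liquidity_offset_msat = (self.max_liquidity_offset_msat as f64 * factor) as u64;
        self.last_updated = now;
    }
}
```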

@@ -280,7 +280,7 @@ macro_rules! define_run_body {
$channel_manager: ident, $process_channel_manager_events: expr,
$peer_manager: ident, $process_onion_message_handler_events: expr, $gossip_sync: ident,
$logger: ident, $scorer: ident, $loop_exit_check: expr, $await: expr, $get_timer: expr,
$timer_elapsed: expr, $check_slow_await: expr
$timer_elapsed: expr, $check_slow_await: expr, $time_fetch: expr,
) => { {
log_trace!($logger, "Calling ChannelManager's timer_tick_occurred on startup");
$channel_manager.timer_tick_occurred();
@@ -294,6 +294,7 @@
let mut last_scorer_persist_call = $get_timer(SCORER_PERSIST_TIMER);
let mut last_rebroadcast_call = $get_timer(REBROADCAST_TIMER);
let mut have_pruned = false;
let mut have_decayed_scorer = false;

loop {
$process_channel_manager_events;
@@ -383,11 +384,10 @@
if should_prune {
// The network graph must not be pruned while rapid sync completion is pending
if let Some(network_graph) = $gossip_sync.prunable_network_graph() {
#[cfg(feature = "std")] {
if let Some(duration_since_epoch) = $time_fetch() {
log_trace!($logger, "Pruning and persisting network graph.");
network_graph.remove_stale_channels_and_tracking();
}
#[cfg(not(feature = "std"))] {
network_graph.remove_stale_channels_and_tracking_with_time(duration_since_epoch.as_secs());
} else {
log_warn!($logger, "Not pruning network graph, consider enabling `std` or doing so manually with remove_stale_channels_and_tracking_with_time.");
log_trace!($logger, "Persisting network graph.");
}
@@ -402,9 +402,24 @@
last_prune_call = $get_timer(prune_timer);
}

if !have_decayed_scorer {
if let Some(ref scorer) = $scorer {
if let Some(duration_since_epoch) = $time_fetch() {
log_trace!($logger, "Calling time_passed on scorer at startup");
scorer.write_lock().time_passed(duration_since_epoch);
}
}
have_decayed_scorer = true;
}

if $timer_elapsed(&mut last_scorer_persist_call, SCORER_PERSIST_TIMER) {
if let Some(ref scorer) = $scorer {
log_trace!($logger, "Persisting scorer");
if let Some(duration_since_epoch) = $time_fetch() {
log_trace!($logger, "Calling time_passed and persisting scorer");
scorer.write_lock().time_passed(duration_since_epoch);
} else {
log_trace!($logger, "Persisting scorer");
}
if let Err(e) = $persister.persist_scorer(&scorer) {
log_error!($logger, "Error: Failed to persist scorer, check your disk and permissions {}", e)
}
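For users driving the scorer outside the background processor, the same decay-then-persist pattern can be replicated by hand. A minimal sketch, assuming a scorer implementing the updated `ScoreUpdate` trait and an arbitrary persist callback; none of these helper names come from this PR.

```rust
use std::time::SystemTime;
use lightning::routing::scoring::ScoreUpdate;

// Hypothetical helper mirroring what the background processor now does on each
// SCORER_PERSIST_TIMER tick: apply pending decay, then write the scorer out.
fn decay_and_persist<S: ScoreUpdate, E>(
    scorer: &mut S, persist: impl Fn(&S) -> Result<(), E>,
) -> Result<(), E> {
    let now = SystemTime::now()
        .duration_since(SystemTime::UNIX_EPOCH)
        .expect("Time should be sometime after 1970");
    scorer.time_passed(now); // decay liquidity bounds before persisting
    persist(scorer)
}
```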
@@ -510,12 +525,16 @@ use core::task;
/// are unsure, you should set the flag, as the performance impact of it is minimal unless there
/// are hundreds or thousands of simultaneous process calls running.
///
/// The `fetch_time` parameter should return the current wall clock time, if one is available. If
/// no time is available, some features may be disabled, however the node will still operate fine.
///
/// For example, in order to process background events in a [Tokio](https://tokio.rs/) task, you
/// could setup `process_events_async` like this:
/// ```
/// # use lightning::io;
/// # use std::sync::{Arc, RwLock};
/// # use std::sync::atomic::{AtomicBool, Ordering};
/// # use std::time::SystemTime;
/// # use lightning_background_processor::{process_events_async, GossipSync};
/// # struct MyStore {}
/// # impl lightning::util::persist::KVStore for MyStore {
@@ -584,6 +603,7 @@ use core::task;
/// Some(background_scorer),
/// sleeper,
/// mobile_interruptable_platform,
/// || Some(SystemTime::now().duration_since(SystemTime::UNIX_EPOCH).unwrap())
/// )
/// .await
/// .expect("Failed to process events");
@@ -620,11 +640,12 @@ pub async fn process_events_async<
S: 'static + Deref<Target = SC> + Send + Sync,
SC: for<'b> WriteableScore<'b>,
SleepFuture: core::future::Future<Output = bool> + core::marker::Unpin,
Sleeper: Fn(Duration) -> SleepFuture
Sleeper: Fn(Duration) -> SleepFuture,
FetchTime: Fn() -> Option<Duration>,
>(
persister: PS, event_handler: EventHandler, chain_monitor: M, channel_manager: CM,
gossip_sync: GossipSync<PGS, RGS, G, UL, L>, peer_manager: PM, logger: L, scorer: Option<S>,
sleeper: Sleeper, mobile_interruptable_platform: bool,
sleeper: Sleeper, mobile_interruptable_platform: bool, fetch_time: FetchTime,
) -> Result<(), lightning::io::Error>
where
UL::Target: 'static + UtxoLookup,
@@ -648,15 +669,18 @@ where
let scorer = &scorer;
let logger = &logger;
let persister = &persister;
let fetch_time = &fetch_time;
async move {
if let Some(network_graph) = network_graph {
handle_network_graph_update(network_graph, &event)
}
if let Some(ref scorer) = scorer {
if update_scorer(scorer, &event) {
log_trace!(logger, "Persisting scorer after update");
if let Err(e) = persister.persist_scorer(&scorer) {
log_error!(logger, "Error: Failed to persist scorer, check your disk and permissions {}", e)
if let Some(duration_since_epoch) = fetch_time() {
if update_scorer(scorer, &event, duration_since_epoch) {
log_trace!(logger, "Persisting scorer after update");
if let Err(e) = persister.persist_scorer(&scorer) {
log_error!(logger, "Error: Failed to persist scorer, check your disk and permissions {}", e)
}
}
}
}
@@ -688,7 +712,7 @@ where
task::Poll::Ready(exit) => { should_break = exit; true },
task::Poll::Pending => false,
}
}, mobile_interruptable_platform
}, mobile_interruptable_platform, fetch_time,
)
}

@@ -810,7 +834,10 @@ impl BackgroundProcessor {
handle_network_graph_update(network_graph, &event)
}
if let Some(ref scorer) = scorer {
if update_scorer(scorer, &event) {
use std::time::SystemTime;
Contributor:

Since this function uses system time now, it should probably be `#[cfg(all(feature = "std", not(feature = "no-std")))]` to handle when you'll be able to use both flags together in the future.

Collaborator Author:

I think just feature=std is correct. If two crates depend on LDK, with one setting std and another setting no-std, LDK should build with all features. Otherwise, the crate relying on std features will fail to compile because of an unrelated crate also in the dependency tree.
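Relatedly, callers of the async variant can gate their `fetch_time` argument on the `std` feature themselves. A sketch (not code from this PR) of what that closure's target might look like:

```rust
use core::time::Duration;

// Sketch of a feature-gated time source to pass as `fetch_time`: wall-clock time when
// `std` is available, `None` otherwise (time-based features are then skipped).
#[cfg(feature = "std")]
fn fetch_time() -> Option<Duration> {
    use std::time::SystemTime;
    Some(SystemTime::now().duration_since(SystemTime::UNIX_EPOCH)
        .expect("Time should be sometime after 1970"))
}

#[cfg(not(feature = "std"))]
fn fetch_time() -> Option<Duration> {
    None
}
```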

let duration_since_epoch = SystemTime::now().duration_since(SystemTime::UNIX_EPOCH)
.expect("Time should be sometime after 1970");
if update_scorer(scorer, &event, duration_since_epoch) {
log_trace!(logger, "Persisting scorer after update");
if let Err(e) = persister.persist_scorer(&scorer) {
log_error!(logger, "Error: Failed to persist scorer, check your disk and permissions {}", e)
@@ -829,7 +856,12 @@ impl BackgroundProcessor {
channel_manager.get_event_or_persistence_needed_future(),
chain_monitor.get_update_future()
).wait_timeout(Duration::from_millis(100)); },
|_| Instant::now(), |time: &Instant, dur| time.elapsed().as_secs() > dur, false
|_| Instant::now(), |time: &Instant, dur| time.elapsed().as_secs() > dur, false,
|| {
use std::time::SystemTime;
Some(SystemTime::now().duration_since(SystemTime::UNIX_EPOCH)
.expect("Time should be sometime after 1970"))
},
)
});
Self { stop_thread: stop_thread_clone, thread_handle: Some(handle) }
@@ -1117,7 +1149,7 @@ mod tests {
}

impl ScoreUpdate for TestScorer {
fn payment_path_failed(&mut self, actual_path: &Path, actual_short_channel_id: u64) {
fn payment_path_failed(&mut self, actual_path: &Path, actual_short_channel_id: u64, _: Duration) {
if let Some(expectations) = &mut self.event_expectations {
match expectations.pop_front().unwrap() {
TestResult::PaymentFailure { path, short_channel_id } => {
@@ -1137,7 +1169,7 @@
}
}

fn payment_path_successful(&mut self, actual_path: &Path) {
fn payment_path_successful(&mut self, actual_path: &Path, _: Duration) {
if let Some(expectations) = &mut self.event_expectations {
match expectations.pop_front().unwrap() {
TestResult::PaymentFailure { path, .. } => {
@@ -1156,7 +1188,7 @@
}
}

fn probe_failed(&mut self, actual_path: &Path, _: u64) {
fn probe_failed(&mut self, actual_path: &Path, _: u64, _: Duration) {
if let Some(expectations) = &mut self.event_expectations {
match expectations.pop_front().unwrap() {
TestResult::PaymentFailure { path, .. } => {
@@ -1174,7 +1206,7 @@
}
}
}
fn probe_successful(&mut self, actual_path: &Path) {
fn probe_successful(&mut self, actual_path: &Path, _: Duration) {
if let Some(expectations) = &mut self.event_expectations {
match expectations.pop_front().unwrap() {
TestResult::PaymentFailure { path, .. } => {
@@ -1192,6 +1224,7 @@
}
}
}
fn time_passed(&mut self, _: Duration) {}
}

#[cfg(c_bindings)]
@@ -1469,7 +1502,7 @@ mod tests {
tokio::time::sleep(dur).await;
false // Never exit
})
}, false,
}, false, || Some(Duration::ZERO),
);
match bp_future.await {
Ok(_) => panic!("Expected error persisting manager"),
@@ -1600,7 +1633,7 @@

loop {
let log_entries = nodes[0].logger.lines.lock().unwrap();
let expected_log = "Persisting scorer".to_string();
let expected_log = "Calling time_passed and persisting scorer".to_string();
if log_entries.get(&("lightning_background_processor", expected_log)).is_some() {
break
}
@@ -1699,7 +1732,7 @@
_ = exit_receiver.changed() => true,
}
})
}, false,
}, false, || Some(Duration::from_secs(1696300000)),
Contributor:

What's behind the choice of this number?

Collaborator Author:

It's, basically, when I wrote the patch.

Contributor:

Any reason why it can't be Duration::ZERO like in the other tests?

Collaborator Author:

Not really, it just seemed a bit more realistic.

Contributor:

Given the value doesn't affect the test, it's just curious for the reader to see something different from all the other places.

Collaborator Author:

Ah, I tried to switch to ZERO but the test fails - it expects to prune entries from the network graph against a static RGS snapshot that has a timestamp in it.

);

let t1 = tokio::spawn(bp_future);
@@ -1874,7 +1907,7 @@ mod tests {
_ = exit_receiver.changed() => true,
}
})
}, false,
}, false, || Some(Duration::ZERO),
);
let t1 = tokio::spawn(bp_future);
let t2 = tokio::spawn(async move {
3 changes: 2 additions & 1 deletion lightning/Cargo.toml
@@ -31,7 +31,7 @@ unsafe_revoked_tx_signing = []
# Override signing to not include randomness when generating signatures for test vectors.
_test_vectors = []

no-std = ["hashbrown", "bitcoin/no-std", "core2/alloc"]
no-std = ["hashbrown", "bitcoin/no-std", "core2/alloc", "libm"]
std = ["bitcoin/std"]

# Generates low-r bitcoin signatures, which saves 1 byte in 50% of the cases
@@ -48,6 +48,7 @@ regex = { version = "1.5.6", optional = true }
backtrace = { version = "0.3", optional = true }

core2 = { version = "0.3.0", optional = true, default-features = false }
libm = { version = "0.2", optional = true, default-features = false }

[dev-dependencies]
regex = "1.5.6"
1 change: 1 addition & 0 deletions lightning/src/lib.rs
@@ -69,6 +69,7 @@ extern crate hex;
#[cfg(any(test, feature = "_test_utils"))] extern crate regex;

#[cfg(not(feature = "std"))] extern crate core2;
#[cfg(not(feature = "std"))] extern crate libm;

#[cfg(ldk_bench)] extern crate criterion;

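The new `libm` dependency above exists because `f64::powf` comes from `std`, so the scorer's exponential decay needs a software fallback in `no-std` builds. The usual pattern is a small feature-gated dispatch helper along these lines (the exact helper name and libm call used by the PR may differ):

```rust
// Use the std intrinsic when available; fall back to libm's software pow under no-std.
#[cfg(feature = "std")]
fn powf64(n: f64, exp: f64) -> f64 {
    n.powf(exp)
}

#[cfg(not(feature = "std"))]
fn powf64(n: f64, exp: f64) -> f64 {
    libm::pow(n, exp)
}
```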
5 changes: 3 additions & 2 deletions lightning/src/routing/router.rs
@@ -8160,6 +8160,7 @@ mod tests {
pub(crate) mod bench_utils {
use super::*;
use std::fs::File;
use std::time::Duration;

use bitcoin::hashes::Hash;
use bitcoin::secp256k1::{PublicKey, Secp256k1, SecretKey};
@@ -8308,10 +8309,10 @@
if let Ok(route) = route_res {
for path in route.paths {
if seed & 0x80 == 0 {
scorer.payment_path_successful(&path);
scorer.payment_path_successful(&path, Duration::ZERO);
} else {
let short_channel_id = path.hops[path.hops.len() / 2].short_channel_id;
scorer.payment_path_failed(&path, short_channel_id);
scorer.payment_path_failed(&path, short_channel_id, Duration::ZERO);
}
seed = seed.overflowing_mul(6364136223846793005).0.overflowing_add(1).0;
}