
Conversation

@valentinewallace (Contributor) commented Sep 2, 2025

Implements the sender side of sending payments from an often-offline sender to an often-offline recipient.

  • Fail back hold HTLCs if we are an often-offline node
  • Finalize design
  • Finish tests

Partially addresses #2298
Based on #4044, #4045

ldk-reviews-bot commented Sep 2, 2025

👋 Thanks for assigning @joostjager as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

codecov bot commented Sep 3, 2025

Codecov Report

❌ Patch coverage is 92.37845% with 58 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.73%. Comparing base (c71334e) to head (9d71121).

Files with missing lines Patch % Lines
lightning/src/ln/channelmanager.rs 78.35% 17 Missing and 4 partials ⚠️
lightning/src/ln/async_payments_tests.rs 96.32% 13 Missing and 7 partials ⚠️
lightning/src/ln/channel.rs 87.93% 5 Missing and 2 partials ⚠️
lightning/src/ln/functional_test_utils.rs 73.33% 3 Missing and 1 partial ⚠️
lightning/src/offers/flow.rs 78.94% 3 Missing and 1 partial ⚠️
lightning/src/util/config.rs 33.33% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4046      +/-   ##
==========================================
+ Coverage   88.63%   88.73%   +0.10%     
==========================================
  Files         176      176              
  Lines      131920   132601     +681     
  Branches   131920   132601     +681     
==========================================
+ Hits       116927   117667     +740     
+ Misses      12325    12254      -71     
- Partials     2668     2680      +12     
Flag Coverage Δ
fuzzing 21.58% <16.74%> (-0.02%) ⬇️
tests 88.57% <92.74%> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown.


@joostjager (Contributor) left a comment

Just a few initial comments.

Awesome to see how you bring everything that has been planned and discussed upfront together in this PR. Great work Val.

/// Parameters for the reply path to a [`HeldHtlcAvailable`] onion message.
pub enum HeldHtlcReplyPath {
/// The reply path to the [`HeldHtlcAvailable`] message should terminate at our node.
ToUs {
Contributor:

It's too late now, but I think it would have been okay to just not implement the always-online sender to mostly-offline receiver case yet and focus purely on LSP clients.

fn hold_htlc_channels(&self) -> Result<Vec<ChannelDetails>, ()> {
let should_send_async = {
let cfg = self.config.read().unwrap();
cfg.hold_outbound_htlcs_at_next_hop
Contributor:

Isn't just hold_outbound_htlcs_at_next_hop enough as a condition?

Contributor Author:

That option is true by default, and since users generally don't change defaults, my thinking was that we should override it if the user has proactively configured themselves to be an announced node.

Contributor:

I see. I think I'd prefer a single clean flag that defines the node behavior. It's not as if we're talking about an end-user checkbox somewhere in a config menu. This is for integrators who know what they are doing?

But no blocker.

Collaborator:

Maybe we should make the config knob an Option<bool>? Then we can default it to None, which bases on peers and announced-channels but users can override it? I don't feel super strongly but I agree that it does feel weird.
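The `Option<bool>` knob suggested above could be resolved roughly like this. A minimal sketch, assuming hypothetical names; this is not LDK's actual config API:

```rust
// Sketch of an Option<bool> config knob: None means "infer from node state",
// Some(_) is an explicit user override. Field and method names are
// illustrative, not LDK's.
struct UserConfig {
    /// `None`: hold outbound HTLCs only if we are not an announced node.
    /// `Some(_)`: explicit user choice.
    hold_outbound_htlcs_at_next_hop: Option<bool>,
}

impl UserConfig {
    /// Resolve the knob: an explicit setting wins; otherwise default based on
    /// whether we have announced channels.
    fn should_hold_htlcs(&self, have_announced_channels: bool) -> bool {
        self.hold_outbound_htlcs_at_next_hop
            .unwrap_or(!have_announced_channels)
    }
}
```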

@joostjager (Contributor) Sep 17, 2025:

I am not in favor of dealing with uncertainty by adding flexibility. There is a cost to flexibility. Not just code, but also maintenance, dev questions, user questions, testing. Although of course in this particular case it isn't much.

What I think is the better option when faced with an unknown is to implement the absolute simplest and only extend it when there is a need.

@@ -2647,11 +2647,12 @@ pub fn get_route(send_node: &Node, route_params: &RouteParameters) -> Result<Rou
let scorer = TestScorer::new();
let keys_manager = TestKeysInterface::new(&[0u8; 32], Network::Testnet);
let random_seed_bytes = keys_manager.get_secure_random_bytes();
let first_hops = send_node.node.list_usable_channels();
Contributor:

Would this be discoverable through #4045 (comment) ?

Contributor Author:

Discussed with @TheBlueMatt offline, this is not something we should be concerned about apparently: "An ldk user can always create a lock inversion by calling ldk things in the same expression 🤷‍♂️ nothing we can really do to stop them"

Contributor:

Yes okay, but I meant just to auto-discover the issue in this test.

Contributor Author:

Oh sorry, yes, that's right. In general, tests moving forward will probably use ChannelManager::send_payment rather than ::send_to_route; the latter API is mostly used by older tests written before we had send_payment.

Contributor:

I tried to see if lock ordering problems are indeed automatically detected in tests when running with --features backtrace as @TheBlueMatt suggested in the comment above, but I didn't succeed. Tried with this patch:

diff --git a/lightning/src/ln/channelmanager.rs b/lightning/src/ln/channelmanager.rs
index 727617028..2c475d771 100644
--- a/lightning/src/ln/channelmanager.rs
+++ b/lightning/src/ln/channelmanager.rs
@@ -10866,6 +10866,7 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
                                                        },
                                                }
                                        } else {
+                                               let tmp = self.pending_intercepted_htlcs.lock().unwrap();
                                                match self.forward_htlcs.lock().unwrap().entry(scid) {
                                                        hash_map::Entry::Occupied(mut entry) => {
                                                                entry.get_mut().push(HTLCForwardInfo::AddHTLC(pending_add));
@@ -10874,6 +10875,7 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
                                                                entry.insert(vec![HTLCForwardInfo::AddHTLC(pending_add)]);
                                                        },
                                                }
+                                               tmp.is_empty();
                                        }
                                }
                        }

Contributor:

Hm, yes, that actually makes sense. But it isn't as though the lock tree in the comments is unnecessary, and it also isn't fully enforced by debug_sync.

Contributor Author:

Yeah the lock tree in comments isn't really necessary and it is human-maintained and can become outdated. I do find it useful to reference sometimes when I run into lock issues, personally.

Contributor:

The tree prescribes that some two locks can't be held at the same time (such as the forward_htlcs and intercepted_htlcs). I thought that would somehow be checked automatically, but it doesn't appear to be the case. So the tree in comments is useful?

Contributor Author:

Yeah, in some cases you can hold two such locks at the same time; you should just update the lock order docs if you do, since that creates a dependency, so it's a bit confusing. I'm not married to keeping the tree; I've just found it useful to reference when taking locks. Before we had it I remember having to scan the file to see what order locks are taken in.

Contributor:

This becomes a total side discussion, but it would be nice if that tree could be enforced in debug in some way or the other.
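One way such a tree could be enforced in debug builds: assign each lock a rank and keep a thread-local stack of held ranks. A minimal sketch under stated assumptions; this is not LDK's actual debug_sync, and all names are illustrative:

```rust
// Each lock is assigned a numeric rank matching its position in the documented
// lock tree. A thread-local stack of currently-held ranks lets us assert, in
// debug builds, that locks are only ever taken in increasing-rank order.
use std::cell::RefCell;

thread_local! {
    static HELD_RANKS: RefCell<Vec<u8>> = RefCell::new(Vec::new());
}

/// Record that a lock with `rank` is about to be taken. Panics (when debug
/// assertions are enabled) if a lock of equal or higher rank is already held
/// on this thread, which would violate the prescribed ordering.
fn acquire_rank(rank: u8) {
    HELD_RANKS.with(|h| {
        let mut held = h.borrow_mut();
        debug_assert!(
            held.last().map_or(true, |&top| top < rank),
            "lock order violation: rank {} taken while {:?} held",
            rank,
            held
        );
        held.push(rank);
    });
}

/// Record that the most recently taken lock with `rank` was released.
fn release_rank(rank: u8) {
    HELD_RANKS.with(|h| {
        assert_eq!(h.borrow_mut().pop(), Some(rank));
    });
}
```

This would catch the forward_htlcs/pending_intercepted_htlcs case above: taking the lower-ranked lock while the higher-ranked one is held trips the debug assertion.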

// Create this network topology:
//        LSP1
//       / |  \
// sender  |   recipient
Contributor:

Pretty advanced scenario. It seems more likely that a multi-part payment would be sent through one LSP, with that LSP holding both HTLCs? Or maybe not likely at all with splicing.

Contributor Author:

Tis advanced and unlikely... We do support it though so I wanted to check that it works. We have been discussing liquidity ads lately so maybe multiple LSPs will become more common in the future.

Contributor:

Actually, a multi-part payment through a single LSP doesn't necessarily mean that that LSP client has more than one channel. It could still be one channel, but the payment sharded because there isn't enough capacity at a more distant point in the network.

I don't know if this is a scenario worth covering. Doesn't seem essential, but suggesting nonetheless.

@valentinewallace (Contributor Author):

Rebased since #4045, will address feedback/CI issues when I next push.

unify_blockheight_across_nodes(&nodes);

// We need to be able to predict the offer_nonce to hardcode blinded payment paths containing a
// JIT channel scid for the recipient.
Contributor:

It isn't immediately clear why this is needed. Is that for the expect?

@valentinewallace (Contributor Author) Sep 15, 2025:

I elaborated in the comment, is it clearer?

Edit: updated the comment again based on offline discussion

ldk-reviews-bot:

🔔 1st Reminder

Hey @TheBlueMatt! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@valentinewallace (Contributor Author):

Rebased due to conflicts after #4049 landed, will address feedback on the next push

@valentinewallace valentinewallace added the weekly goal Someone wants to land this this week label Sep 15, 2025
@TheBlueMatt TheBlueMatt added this to the 0.2 milestone Sep 15, 2025
@joostjager (Contributor):

Does this PR also handle a duplicate StaticInvoice onion message correctly, regardless of arrival order, as introduced with #4049?


} else {
const ONE_DAY_BLOCKS: u16 = 144;
for chan in hold_htlc_channels.iter_mut() {
// Increase the CLTV delta for this channel so that they will hold the HTLC for up to one
Collaborator:

Seems weird to modify the ChannelDetails that we'll pass to the router (which may get the wrong idea) vs editing the resulting Route, no?

Contributor Author:

Can you explain why the router would get the wrong idea? The router adds a random offset based on how much CLTV delta is remaining to not exceed the max_total_cltv_expiry_delta, so I was thinking it would be good for it to have accurate numbers when it does that. I think we'd need some extra checks if we edit the route after the fact.

Collaborator:

The router might very reasonably refuse to pay over channels with an extremely high minimum-cltv-expiry-delta, for example. Indeed, there's a bit of a dance to do here, but I don't think it's wildly complicated.

Contributor Author:

Ok, I could see that. Moving this item to the follow-up PR.
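The "edit the resulting Route" alternative discussed above could look roughly like this: leave ChannelDetails untouched so the router sees accurate channel parameters, then add the hold buffer afterward. A sketch only; `Route` and `RouteHop` here are pared-down stand-ins, not LDK's real types:

```rust
// After routing completes, bump the first hop's CLTV delta on every path so
// the counterparty holding the HTLC has up to a day of extra margin before
// the HTLC risks expiring. Types are illustrative stand-ins.
const ONE_DAY_BLOCKS: u32 = 144;

struct RouteHop {
    cltv_expiry_delta: u32,
}

struct Route {
    paths: Vec<Vec<RouteHop>>,
}

fn add_hold_buffer(route: &mut Route) {
    for path in route.paths.iter_mut() {
        if let Some(first_hop) = path.first_mut() {
            first_hop.cltv_expiry_delta += ONE_DAY_BLOCKS;
        }
    }
}
```

The "dance" mentioned above would include re-checking that the inflated total CLTV still fits under the payment's max_total_cltv_expiry_delta, which the router normally enforces itself.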

@valentinewallace (Contributor Author):

Updated with diff, some minor test tweaks and whitespace fixes. Holding off on altering the config option until we're convinced we have a better option; I'm not super sure yet.

@TheBlueMatt (Collaborator) left a comment

Few responses, at least for now


8513341 purported to remove a lock dependency
between ChannelManager::pending_intercepted_htlcs and ::forward_htlcs. However,
in practice the dependency remained because the locks of the two maps were
still held at the same time in some cases. Here we fix this.
Useful to filter for channel peers that support a specific feature, in this
case the hold_htlc feature, in upcoming commits.
As part of supporting sending payments as an often-offline sender, the sender
needs to be able to set a flag in their update_add_htlc message indicating that
the HTLC should be held until receipt of a release_held_htlc onion message from
the often-offline payment recipient.

We don't yet ever set this flag, but lay the groundwork by including the field
in Channel serialization.

See-also BOLTs PR 989
As part of supporting sending payments as an often-offline sender, the sender
needs to be able to set a flag in their update_add_htlc message indicating that
the HTLC should be held until receipt of a release_held_htlc onion message from
the often-offline payment recipient.

We don't yet ever set this flag, but lay the groundwork by including a
parameter for it in internal channel send_htlc APIs, including serializing the
field when the parameter is set.

See-also BOLTs PR 989
Makes an upcoming commit cleaner.

In the util we allow manually passing in the first_hops that should be used
when paying a static invoice, which is useful for only passing in next-hop
channels that support the hold_htlcs feature.
As part of supporting sending payments as an often-offline sender, the sender
needs to be able to set a flag in their update_add_htlc message indicating that
the HTLC should be held until receipt of a release_held_htlc onion message from
the often-offline payment recipient.

We don't yet ever set this flag, but lay the groundwork by including the
parameter in the pay_route method.

See-also <lightning/bolts#989>
If the flag is set to true, then if we as an often-offline payer receive a
static invoice to pay, we will attempt to hold the corresponding outbound HTLCs
with our next-hop channel counterparty(s) that support the feature. This
allows our node to go offline once the HTLCs are locked in even though the
recipient may not yet be online to receive them.
Returns a list of channels where our counterparty supports
InitFeatures::supports_htlc_hold, or an error if there are none or we detect
that we are an announced node. Useful for sending async payments to
StaticInvoices.
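The filtering this commit message describes can be sketched as follows. `ChannelDetails` here is a stand-in struct, and the announced-node check mentioned above is omitted; this is not LDK's actual implementation:

```rust
// Keep only channels whose counterparty advertises support for holding
// HTLCs; error if no such channel exists.
struct ChannelDetails {
    counterparty_supports_htlc_hold: bool,
}

fn hold_htlc_channels(chans: Vec<ChannelDetails>) -> Result<Vec<ChannelDetails>, ()> {
    let filtered: Vec<ChannelDetails> = chans
        .into_iter()
        .filter(|c| c.counterparty_supports_htlc_hold)
        .collect();
    if filtered.is_empty() {
        Err(())
    } else {
        Ok(filtered)
    }
}
```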
As part of supporting sending payments as an often-offline sender, the sender
needs to be able to set a flag in their update_add_htlc message indicating that
the HTLC should be held until receipt of a release_held_htlc onion message from
the often-offline payment recipient.

The prior commits laid groundwork to finally set the flag here in this commit.

See-also BOLTs PR 989
As part of supporting sending payments as an often-offline sender, the sender
needs to send held_htlc_available onion messages such that the reply path to
the message terminates at their always-online channel counterparty that is
holding the HTLC. That way when the recipient responds with release_held_htlc,
the sender's counterparty will receive that message.

Here we lay some groundwork for using a counterparty-created reply path when
sending held_htlc_available as an async sender in the next commit.
As part of supporting sending payments as an often-offline sender, the sender
needs to send held_htlc_available onion messages such that the reply path to
the message terminates at their always-online channel counterparty that is
holding the HTLC. That way when the recipient responds with release_held_htlc,
the sender's counterparty will receive that message.

After laying groundwork over some past commits, here we as an async sender send
held_htlc_available messages using reply paths created by our always-online
channel counterparty.
Now that we support the feature of sending payments as an often-offline sender
to an often-offline recipient, including the sender's LSP-side, we can start
conditionally advertising the feature bit to other nodes on the network.

Also remove an outdated comment; in a previous version of the code we had more
logic to check whether we're an always-online node before advertising the
feature bit, but it has since been removed.
Previously we were taking the NetworkGraph::{channels,nodes} locks before
ChannelManager::per_peer_state's peer_state locks in some tests, which violated
a lock order requirement we have in ChannelManager to take netgraph locks
*after* peer_state locks. See OffersMessageFlow::path_for_release_htlc which is
called while a peer_state lock is held and takes netgraph locks while creating
blinded paths.
Add testing for sending payments from an often-offline sender to an
often-offline recipient.
@valentinewallace (Contributor Author):

Rebased due to conflicts that were also causing CI failures.
Diff before rebase: link

Also added another test comment post-rebase

diff --git a/lightning/src/ln/async_payments_tests.rs b/lightning/src/ln/async_payments_tests.rs
index 5a0fedb5f..307510bbe 100644
--- a/lightning/src/ln/async_payments_tests.rs
+++ b/lightning/src/ln/async_payments_tests.rs
@@ -3113,6 +3113,14 @@ fn intercepted_hold_htlc() {
        create_announced_chan_between_nodes_with_value(&nodes, 1, 2, 1_000_000, 0);
        unify_blockheight_across_nodes(&nodes);

+       // Typically, JIT channels are created by an LSP providing the recipient with special "intercept"
+       // scids out-of-band, to be put in the recipient's BOLT 11 invoice route hints. The intercept
+       // scids signal to the LSP to open a JIT channel.
+       //
+       // Below we hardcode blinded payment paths containing intercept scids, to be used in the
+       // recipient's eventual static invoice, because we don't yet support intercept scids in the normal
+       // static invoice server flow.
+
        // We need to be able to predict the offer_nonce in order to hardcode acceptable blinded payment
        // paths containing a JIT channel scid for the recipient below.
        // Without the offer_nonce used below in the `AsyncBolt12OfferContext` matching the eventual offer
