
Batch commitment_signed messages for splicing #3651

Open
jkczyz wants to merge 8 commits into main from 2025-03-multiple-funding-scopes
Conversation

@jkczyz (Contributor) commented Mar 6, 2025

Once a channel is funded, it may be spliced to add or remove funds. The new funding transaction is pending until confirmed on chain and thus needs to be tracked. Additionally, it may be replaced by another transaction using RBF with a higher fee. Hence, there may be more than one pending FundingScope to track for a splice.

This PR adds support for tracking pending funding scopes and accounting for any pending scopes where applicable (e.g., when handling and sending commitment_signed messages).
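For orientation, here is a minimal sketch of the shape this takes; the funding/pending_funding field names match the FundedChannel diff quoted later in this thread, while FundingScope's contents are stand-ins rather than LDK's actual definitions:

struct FundingScope {
    funding_txid: [u8; 32], // hypothetical field; the real scope state is richer
    // ... per-funding-transaction state (value, commitment keys, etc.)
}

struct FundedChannel {
    // The confirmed funding the channel currently operates under.
    funding: FundingScope,
    // Candidate fundings from a splice plus any RBF replacements; all are
    // tracked until one confirms and the others become invalid.
    pending_funding: Vec<FundingScope>,
}

impl FundedChannel {
    // Iterate the current funding and every pending splice/RBF candidate.
    fn all_scopes(&self) -> impl Iterator<Item = &FundingScope> + '_ {
        core::iter::once(&self.funding).chain(self.pending_funding.iter())
    }
}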

@ldk-reviews-bot commented Mar 6, 2025

👋 Thanks for assigning @wpaulino as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@jkczyz requested a review from wpaulino on March 6, 2025 22:59
@jkczyz (Contributor, Author) commented Mar 6, 2025

@wpaulino Just looking for a quick concept ACK. Still needed:

  • serialization of pending_funding, presumably?
  • tests for commitment_signed
  • update get_available_balances

@wpaulino (Contributor) left a comment

Yeah this makes sense. We'll need to support sending a commitment_signed for each scope as well.

    claimed_htlcs: ref mut update_claimed_htlcs, ..
} = &mut update {
    debug_assert!(update_claimed_htlcs.is_empty());
    *update_claimed_htlcs = claimed_htlcs.clone();
Contributor:

Hm, it'd be nice to not have this duplicated data, but I guess it's pretty small anyway.

Somewhat related, in #3606 we're introducing a new update variant (for the counterparty commitment only, but we'll need to do the same for the holder commitment as well) that only tracks the commitment transaction. I wonder if we can get away with using it for the additional funding scopes as a way to simplify the transition to the new variant, as you wouldn't be allowed to downgrade with a pending spliced channel anyway.

@ldk-reviews-bot:
👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

@jkczyz (Contributor, Author) commented Mar 7, 2025

> Yeah this makes sense. We'll need to support sending a commitment_signed for each scope as well.

By this do you mean we'll need msgs::CommitmentUpdate to contain a Vec of msgs::CommitmentSigned messages instead of a single one?

@jkczyz (Contributor, Author) commented Mar 7, 2025

Pushed another commit for get_available_balances.

@wpaulino (Contributor) commented:

> By this do you mean we'll need msgs::CommitmentUpdate to contain a Vec of msgs::CommitmentSigned messages instead of a single one?

Yeah we'll need to go through each case where we owe the counterparty a commitment_signed (except for the initial one sent in dual funding/splicing) and make sure we always consider all scopes.
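As a rough illustration of "consider all scopes" on the sending side (hypothetical types, not LDK's actual message definitions), each funding scope would get its own commitment_signed carrying its funding_txid and the shared batch_size:

struct CommitmentSigned {
    channel_id: [u8; 32],
    funding_txid: [u8; 32], // which funding this signature commits to
    batch_size: u16,        // number of messages in the logical batch
    // signature fields omitted
}

// Build one commitment_signed per scope: the current funding plus every
// pending splice/RBF candidate (scopes given here as funding txids).
fn commitment_signed_for_all_scopes(
    channel_id: [u8; 32], scope_txids: &[[u8; 32]],
) -> Vec<CommitmentSigned> {
    let batch_size = scope_txids.len() as u16;
    scope_txids.iter().map(|txid| CommitmentSigned {
        channel_id,
        funding_txid: *txid,
        batch_size,
    }).collect()
}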

@jkczyz force-pushed the 2025-03-multiple-funding-scopes branch from eac7be9 to 8b4e46a on March 11, 2025 22:44
@jkczyz changed the title from "Add pending funding scopes to FundedChannel" to "Batch commitment_signed messages for splicing" on Mar 11, 2025
// May or may not have a pending splice
Some(batch) => {
    self.commitment_signed_batch.push(msg.clone());
    if self.commitment_signed_batch.len() < batch.batch_size as usize {
Contributor:

We should also consider the number of scopes available. We shouldn't receive a batch with anything other than that number.

Contributor Author:

There may be an edge case to consider. I started a discussion on the spec: https://github.com/lightning/bolts/pull/1160/files/8c907f6b8d26fad8ec79ad1fe3078eb92e5285a6#r1990528673

Though if pending_funding is empty, the spec states we should ignore any batched commitment_signed messages that don't match the new funding_txid.

• If batch is set:
  ...
  • Otherwise (no pending splice transactions):
    • MUST ignore commitment_signed where funding_txid does not match
      the current funding transaction.
    • If commitment_signed is missing for the current funding transaction:
      • MUST send an error and fail the channel.
    • Otherwise:
      • MUST respond with a revoke_and_ack message.
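A hedged sketch of that receive-side rule (a hypothetical CommitmentSigned shape; the real handling lives in channel.rs): with no pending splices, batched messages for unknown funding txids are dropped, but the channel fails if none covers the current funding transaction.

struct CommitmentSigned {
    funding_txid: [u8; 32],
    // signature fields omitted
}

// Returns the one message to validate; the caller then responds with
// revoke_and_ack. Err means: send an error and fail the channel.
fn handle_batch_without_pending_splice(
    current_funding_txid: [u8; 32], batch: Vec<CommitmentSigned>,
) -> Result<CommitmentSigned, &'static str> {
    batch.into_iter()
        // MUST ignore commitment_signed where funding_txid does not match
        // the current funding transaction.
        .find(|msg| msg.funding_txid == current_funding_txid)
        // If commitment_signed is missing for the current funding
        // transaction, MUST send an error and fail the channel.
        .ok_or("no commitment_signed for the current funding transaction")
}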

@jkczyz (Contributor, Author) commented Mar 12, 2025

> Yeah we'll need to go through each case where we owe the counterparty a commitment_signed (except for the initial one sent in dual funding/splicing) and make sure we always consider all scopes.

Pushed a couple commits that I think accomplish this. Though I'm not sure about the following line from build_commitment_no_status_check:

let (mut htlcs_ref, counterparty_commitment_tx) =
    self.build_commitment_no_state_update(&self.funding, logger);

It is called from methods like send_htlc_and_commit but for producing a ChannelMonitorUpdate. IIUC, I'll need to do this for all funding scopes?

@jkczyz force-pushed the 2025-03-multiple-funding-scopes branch from c66e554 to 2db5c60 on March 12, 2025 22:34
@jkczyz marked this pull request as ready for review on March 12, 2025 22:35
@jkczyz requested a review from wpaulino on March 12, 2025 22:35
@jkczyz force-pushed the 2025-03-multiple-funding-scopes branch 2 times, most recently from 3872586 to e371143 on March 12, 2025 22:46
@jkczyz added the "weekly goal" label (Someone wants to land this this week) on Mar 12, 2025
@jkczyz (Contributor, Author) commented Mar 12, 2025

Rebased on main.

@jkczyz requested a review from dunxen on March 17, 2025 15:45
@jkczyz force-pushed the 2025-03-multiple-funding-scopes branch from e371143 to 0362159 on March 18, 2025 23:01
@jkczyz requested a review from wpaulino on March 18, 2025 23:02
@ldk-reviews-bot:
🔔 1st Reminder

Hey @dunxen @wpaulino! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@jkczyz force-pushed the 2025-03-multiple-funding-scopes branch from 0362159 to 7021e01 on March 19, 2025 21:45
@jkczyz (Contributor, Author) commented Mar 19, 2025

Responded and addressed a couple lingering comments.

@jkczyz force-pushed the 2025-03-multiple-funding-scopes branch 3 times, most recently from 48bdd52 to 6db1c42 on March 21, 2025 14:42
@jkczyz (Contributor, Author) commented Mar 21, 2025

Recent pushes should have fixed CI.

@ldk-reviews-bot:
🔔 2nd Reminder

Hey @dunxen @wpaulino! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot:
🔔 3rd Reminder

Hey @dunxen @wpaulino! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@jkczyz force-pushed the 2025-03-multiple-funding-scopes branch from 6db1c42 to 43cd9b5 on March 24, 2025 21:55
@jkczyz (Contributor, Author) commented Mar 24, 2025

Squashed per @wpaulino's request.

        )?
    },
    // May or may not have a pending splice
    Some(batch) => {
Contributor:

Error if batch_size == 1?

Contributor Author:

Hmm... the spec puts the requirement on the sender but doesn't specify that N pending splices can't be zero. I suppose if there are zero and the sender uses 1, we potentially would use LatestHolderCommitmentTX (once added; see #3651 (comment)) rather than LatestHolderCommitmentTXInfo. Would that be a problem for downgrades?

Contributor:

That would be a problem for downgrades indeed. Maybe we should clarify the spec then; that wording does make it seem like a batch of 1 is allowed.


@jkczyz (Contributor, Author) commented Mar 25, 2025

Pushed some squashed fixups addressing feedback. Going to rebase to resolve merge conflicts.

jkczyz added 2 commits March 25, 2025 12:23
Once a channel is funded, it may be spliced to add or remove funds. The
new funding transaction is pending until confirmed on chain and thus
needs to be tracked. Additionally, it may be replaced by another
transaction using RBF with a higher fee. Hence, there may be more than
one pending FundingScope to track for a splice.

This commit adds support for tracking pending funding scopes. The
following commits will account for any pending scopes where applicable
(e.g., when handling commitment_signed).
@jkczyz force-pushed the 2025-03-multiple-funding-scopes branch from 135278e to 84ea044 on March 25, 2025 18:56
@jkczyz (Contributor, Author) commented Mar 25, 2025

Rebased on main.

@jkczyz requested a review from TheBlueMatt on March 25, 2025 18:58
@@ -4927,6 +4930,7 @@ pub(super) struct DualFundingChannelContext {
pub(super) struct FundedChannel<SP: Deref> where SP::Target: SignerProvider {
    pub funding: FundingScope,
    pending_funding: Vec<FundingScope>,
    commitment_signed_batch: BTreeMap<Txid, msgs::CommitmentSigned>,
Collaborator:

Dear god WHAT? Who thought this was a good idea? 🤦

Maybe we should un-fuck the protocol at the PeerManager level rather than doing it here? It's kinda insane that we have to keep a pending list of messages in a queue just to process them later; obviously the protocol should have sent them as one message, but absent that it seems like something the PeerManager should do - it's logically one message with 5 parts on the wire, which seems like something we shouldn't have to care about in channel.rs but rather our message de-framing logic should properly de-frame the N CommitmentSigneds into a CommitmentSignedBatch.

Contributor Author:

FWIW, the reason given for this was the 65K message size limitation.

lightning/bolts#1160 (comment)

I can look into moving the logic to PeerManager.

Collaborator:

Then we should add the ability to frame larger messages at the wire. The 64K limit is supposed to be a nice anti-DoS limit by ensuring you never really need a large buffer to store pending messages, but if we're gonna work around it by sending multiple messages which get stored in a pending message queue then the whole point is kinda moot.

If we don't, the spec still needs to treat them as a logical message - no other messages should be allowed to come in between, and we need some kind of init/complete message before/after so that the PeerManager can handle it without protocol-specific logic.

Contributor Author:

So some concerns:

  • Currently, ChannelManager also checks the message's channel_id before delegating to Channel, so PeerManager would need to track batches by channel, too. Should we batch using PeerState in ChannelManager instead of in PeerManager then?
  • Should we have a "raw" CommitmentSigned type with the optional batch data used for parsing (as it currently is written) and one without the batch data where Channel is given either a single one or a BTreeMap for the entire batch? (i.e., never expose the optional batch TLV to Channel -- only infer it from whether the method taking a BTreeMap is called).

Contributor Author:

FYI, wrote the last comment before seeing your latest comment.


Collaborator:

> Currently, ChannelManager also checks the message's channel_id before delegating to Channel, so PeerManager would need to track batches by channel, too. Should we batch using PeerState in ChannelManager instead of in PeerManager then?

Right, kinda drives home my comment, I think, that the spec needs to make this explicit, but if it doesn't, I think that's fine? Batches are (channel_id, count)-indexed.

> Should we have a "raw" CommitmentSigned type with the optional batch data used for parsing (as it currently is written) and one without the batch data where Channel is given either a single one or a BTreeMap for the entire batch? (i.e., never expose the optional batch TLV to Channel -- only infer it from whether the method taking a BTreeMap is called).

I would imagine we'd have to, though there's not really any reason to not just give the Channel a Vec<RawCommitmentSigned> or whatever. The Channel would just ignore the batch field cause it doesn't need it.

(FWIW IMO if we do this we should swap where the btree is - store the funding scopes in the btree and make the message a vec, tho it doesn't really matter that much).

The other question is how much work is it actually to handle them iteratively? I guess we'd have to have similar state as now - some logic to make it so that no messages get processed in between commitment_signeds and then a flag on each funding state to track whether it has validated the new commit-sig or not and then some way to check them all when we're done. It's pretty gross and I'd kinda prefer to handle it all in one go, but if we do end up with in-Channel state maybe that's cleaner?

The only reason to do that, I think, would be worry over the amount of memory pending-commit-sigs can use up (~50K * in-flight splice count, afaict, which we'd have to set some hard limit on in PeerManager), especially if we do it for any connected peer. I don't think I'm worried about something like 500K-1M for an inbound buffer, but others may feel differently.

Contributor Author:

> Right, kinda drives home my comment, I think, that the spec needs to make this explicit, but if it doesn't, I think that's fine? Batches are (channel_id, count)-indexed.

Why not just channel_id? There's edge cases to consider around what you would do if a peer misbehaved and sent a different batch_size in each message. Here we could naively add another entry to the map (never accumulating enough messages if batch_size increases each time), whereas in the current approach we'd simply append the item to the cache and could hardcode a limit?

> I would imagine we'd have to, though there's not really any reason to not just give the Channel a Vec<RawCommitmentSigned> or whatever. The Channel would just ignore the batch field cause it doesn't need it.

If we don't care about Channel ignoring the batch field, then there doesn't seem to be much reason to introduce another message, IMO. Maybe to avoid duplicating the funding_txid if storing in a map?

> (FWIW IMO if we do this we should swap where the btree is - store the funding scopes in the btree and make the message a vec, tho it doesn't really matter that much).

Not sure if that is better. We need to handle the edge case where we receive a batch after the splice is no longer pending. There we want to ignore any commitment_signed messages that don't match the funding_txid. So we'd rather do one look-up than N.

> The other question is how much work is it actually to handle them iteratively? I guess we'd have to have similar state as now - some logic to make it so that no messages get processed in between commitment_signeds and then a flag on each funding state to track whether it has validated the new commit-sig or not and then some way to check them all when we're done. It's pretty gross and I'd kinda prefer to handle it all in one go, but if we do end up with in-Channel state maybe that's cleaner?

In PeerManager? It wouldn't have access to funding state, IIUC. Not quite sure I follow what you're proposing here. By "iteratively" do you mean compared to batching into one message for Channel?

FWIW, the current approach seems much simpler. And if you squint hard enough each message cached looks like a channel state transition. At one point, I thought about validating the commitment_signed when received to avoid cloning the message, storing a LatestHolderCommitmentTXInfo to make into a ChannelMonitorUpdateStep once all messages in the batch were received. But it seemed it could be DoS-y?

> The only reason to do that, I think, would be worry over the amount of memory pending-commit-sigs can use up (~50K * in-flight splice count, afaict, which we'd have to set some hard limit on in PeerManager), especially if we do it for any connected peer. I don't think I'm worried about something like 500K-1M for an inbound buffer, but others may feel differently.

Not sure I understand. What's the worry compared to the alternative(s)?

Collaborator:

> Why not just channel_id? There's edge cases to consider around what you would do if a peer misbehaved and sent a different batch_size in each message. Here we could naively add another entry to the map (never accumulating enough messages if batch_size increases each time), whereas in the current approach we'd simply append the item to the cache and could hardcode a limit?

Sorry, I misspoke, I meant indexed by channel-id but also storing count.

> If we don't care about Channel ignoring the batch field, then there doesn't seem to be much reason to introduce another message, IMO. Maybe to avoid duplicating the funding_txid if storing in a map?

Just because we should have the complexity of re-building a multi-part message in PeerManager where we can stomach that complexity, rather than in Channel where we have to go add logic in every message handler to make sure we aren't getting a random other message in between a commit-sig message stream.

> Not sure if that is better. We need to handle the edge case where we receive a batch after the splice is no longer pending. There we want to ignore any commitment_signed messages that don't match the funding_txid. So we'd rather do one look-up than N.

If we're iterating the list of commit-sigs then we still just do one lookup per commit-sig (which I guess might be one more than per commit-sig-we-want, but I'm not sure it matters?). But, yea, I mean it doesn't really matter honestly.

> In PeerManager? It wouldn't have access to funding state, IIUC. Not quite sure I follow what you're proposing here. By "iteratively" do you mean compared to batching into one message for Channel?

No I meant doing it in Channel iteratively, like this v

> FWIW, the current approach seems much simpler. And if you squint hard enough each message cached looks like a channel state transition. At one point, I thought about validating the commitment_signed when received to avoid cloning the message, storing a LatestHolderCommitmentTXInfo to make into a ChannelMonitorUpdateStep once all messages in the batch were received. But it seemed it could be DoS-y?

Not sure it's any more DoS-y than the current approach? I guess a ChannelMonitorUpdateStep is a bit bigger so it's nice to avoid, but mostly it just seems like a lot of complexity to shove in Channel; we have so many different in-flight state things going on and adding another one just adds to the already-jumbo cognitive load trying to figure out which states we might be in at any point.

> Not sure I understand. What's the worry compared to the alternative(s)?

I wasn't thinking that you'd have to store the ChannelMonitorUpdateSteps so ignore that, we're just as DoS-y either way :(

Contributor Author:

Pushed the batching logic based on some offline discussion, in case you want an early look. Still need to re-organize how that is consumed in the channel state machine.

core::iter::once(funding)
    .chain(pending_funding.iter())
    .map(|funding| self.get_available_balances_for_scope(funding, fee_estimator))
    .min_by_key(|balances| balances.next_outbound_htlc_limit_msat)
Collaborator:

I don't think this is correct in the case where our counterparty spliced-out some funds - our next_outbound_htlc_limit_msat might be the same across two splices but inbound_capacity_msat is lower on one.

Contributor Author:

Hmmm.. are we able to pick one as the AvailableBalance? Or do we need to merge them somehow?

Collaborator:

ISTM we should always report the lowest available balance for each type of balance (maybe we can do that more cleverly, though?)

Contributor Author:

Hmmm... actually, won't we need the maximum for next_outbound_htlc_minimum_msat and the minimum for the remaining fields?

Contributor Author:

FYI, I pushed a change to reduce all AvailableBalances into one instead of selecting one.
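A sketch of what reducing all AvailableBalances into one might look like, per the discussion above: take the maximum of next_outbound_htlc_minimum_msat and the minimum of the remaining fields, so the result is valid under every scope. Field names follow the thread; the struct is a simplified stand-in for LDK's AvailableBalances.

use core::cmp::{max, min};

#[derive(Clone, Copy)]
struct AvailableBalances {
    inbound_capacity_msat: u64,
    outbound_capacity_msat: u64,
    next_outbound_htlc_limit_msat: u64,
    next_outbound_htlc_minimum_msat: u64,
}

// An HTLC must be sendable under the confirmed funding and every pending
// scope (one of which may eventually confirm), so combine element-wise.
fn reduce_balances(
    per_scope: impl Iterator<Item = AvailableBalances>,
) -> Option<AvailableBalances> {
    per_scope.reduce(|a, b| AvailableBalances {
        inbound_capacity_msat: min(a.inbound_capacity_msat, b.inbound_capacity_msat),
        outbound_capacity_msat: min(a.outbound_capacity_msat, b.outbound_capacity_msat),
        next_outbound_htlc_limit_msat:
            min(a.next_outbound_htlc_limit_msat, b.next_outbound_htlc_limit_msat),
        next_outbound_htlc_minimum_msat:
            max(a.next_outbound_htlc_minimum_msat, b.next_outbound_htlc_minimum_msat),
    })
}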

jkczyz added 5 commits March 26, 2025 12:03
A FundedChannel may have more than one pending FundingScope during
splicing, one for the splice attempt and one or more for any RBF
attempts. The counterparty will send a commitment_signed message for
each pending splice transaction and the current funding transaction.

Defer handling these commitment_signed messages until the entire batch
has arrived. Then validate them individually, also checking if all the
pending splice transactions and the current funding transaction have a
corresponding commitment_signed in the batch.
A FundedChannel may have more than one pending FundingScope during
splicing, one for the splice attempt and one or more for any RBF
attempts. When this is the case, send a commitment_signed message to the
counterparty for each FundingScope, including the necessary batch
information (i.e., batch_size and funding_txid).
A FundedChannel may have more than one pending FundingScope during
splicing, one for the splice attempt and one or more for any RBF
attempts. When calling get_available_balances, consider all funding
scopes and take the minimum by next_outbound_htlc_limit_msat. This is
used both informationally and to determine which channel to use to
forward an HTLC.

The choice of next_outbound_htlc_limit_msat is somewhat arbitrary but
matches the field used when determining which channel to use to forward an
HTLC. Any field should do since each field should be adjusted by the
same amount relative to another FundingScope given the nature of the
fields (i.e., inbound/outbound capacity, min/max HTLC limit).

Using the minimum was chosen since, in order for an HTLC to be sent over
the channel, it must be possible to send it under each funding scope --
both the confirmed one and any pending scopes, one of which may eventually
confirm.
@jkczyz force-pushed the 2025-03-multiple-funding-scopes branch from 84ea044 to 4caac9c on March 26, 2025 20:14
@jkczyz requested review from TheBlueMatt and wpaulino on March 26, 2025 20:55
During splicing, commitment_signed messages need to be collected into a
single batch before they are handled. Rather than including this as part
of the channel state machine logic, batch when reading messages from the
wire since they can be considered one logical message.
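A hedged sketch of that de-framing step, with assumed types rather than LDK's actual API: messages are buffered per channel_id (storing the expected count, as discussed above) and the channel only sees the completed logical batch. A real implementation would also bound the buffer (the thread estimates roughly 50K per in-flight splice) and reject inconsistent batch_size values.

use std::collections::HashMap;

#[derive(Clone)]
struct CommitmentSigned {
    channel_id: [u8; 32],
    batch_size: u16, // from the optional batch TLV; 0 or 1 means unbatched
    // signature and funding_txid fields omitted
}

#[derive(Default)]
struct CommitmentBatcher {
    pending: HashMap<[u8; 32], Vec<CommitmentSigned>>, // keyed by channel_id
}

impl CommitmentBatcher {
    // Buffer `msg`, returning the whole batch once `batch_size` messages
    // for this channel have arrived.
    fn on_commitment_signed(&mut self, msg: CommitmentSigned) -> Option<Vec<CommitmentSigned>> {
        if msg.batch_size <= 1 {
            return Some(vec![msg]); // not part of a batch: deliver as-is
        }
        let channel_id = msg.channel_id;
        let batch_size = msg.batch_size as usize;
        let pending = self.pending.entry(channel_id).or_default();
        pending.push(msg);
        if pending.len() >= batch_size {
            self.pending.remove(&channel_id)
        } else {
            None
        }
    }
}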