
Batch commitment_signed messages for splicing #3651

Open
jkczyz wants to merge 8 commits into main from 2025-03-multiple-funding-scopes
Conversation

@jkczyz (Contributor) commented Mar 6, 2025

Once a channel is funded, it may be spliced to add or remove funds. The new funding transaction is pending until confirmed on chain and thus needs to be tracked. Additionally, it may be replaced by another transaction using RBF with a higher fee. Hence, there may be more than one pending FundingScope to track for a splice.

This PR adds support for tracking pending funding scopes and accounting for any pending scopes where applicable (e.g., when handling and sending commitment_signed messages).
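For orientation, here is a minimal sketch of the shape this takes; the funding/pending_funding field names match the FundedChannel diff quoted later in this thread, while FundingScope's contents are stand-ins rather than LDK's actual definitions:

struct FundingScope {
    funding_txid: [u8; 32], // hypothetical field; the real scope state is richer
    // ... per-funding-transaction state (value, commitment keys, etc.)
}

struct FundedChannel {
    // The confirmed funding the channel currently operates under.
    funding: FundingScope,
    // Candidate fundings from a splice plus any RBF replacements; all are
    // tracked until one confirms and the others become invalid.
    pending_funding: Vec<FundingScope>,
}

impl FundedChannel {
    // Iterate the current funding and every pending splice/RBF candidate.
    fn all_scopes(&self) -> impl Iterator<Item = &FundingScope> + '_ {
        core::iter::once(&self.funding).chain(self.pending_funding.iter())
    }
}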

@ldk-reviews-bot commented Mar 6, 2025

👋 Thanks for assigning @wpaulino as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@jkczyz requested a review from wpaulino on March 6, 2025 22:59
@jkczyz (Contributor, Author) commented Mar 6, 2025

@wpaulino Just looking for a quick concept ACK. Still needed:

  • serialization of pending_funding, presumably?
  • tests for commitment_signed
  • update get_available_balances

@wpaulino (Contributor) left a comment

Yeah this makes sense. We'll need to support sending a commitment_signed for each scope as well.

    claimed_htlcs: ref mut update_claimed_htlcs, ..
} = &mut update {
    debug_assert!(update_claimed_htlcs.is_empty());
    *update_claimed_htlcs = claimed_htlcs.clone();
Contributor:

Hm, it'd be nice to not have this duplicated data, but I guess it's pretty small anyway.

Somewhat related, in #3606 we're introducing a new update variant (for the counterparty commitment only, but we'll need to do the same for the holder commitment as well) that only tracks the commitment transaction. I wonder if we can get away with using it for the additional funding scopes as a way to simplify the transition to the new variant, as you wouldn't be allowed to downgrade with a pending spliced channel anyway.

@ldk-reviews-bot:
👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

@jkczyz (Contributor, Author) commented Mar 7, 2025

> Yeah this makes sense. We'll need to support sending a commitment_signed for each scope as well.

By this do you mean we'll need msgs::CommitmentUpdate to contain a Vec of msgs::CommitmentSigned messages instead of a single one?

@jkczyz (Contributor, Author) commented Mar 7, 2025

Pushed another commit for get_available_balances.

@wpaulino (Contributor) commented:

> By this do you mean we'll need msgs::CommitmentUpdate to contain a Vec of msgs::CommitmentSigned messages instead of a single one?

Yeah we'll need to go through each case where we owe the counterparty a commitment_signed (except for the initial one sent in dual funding/splicing) and make sure we always consider all scopes.
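As a rough illustration of "consider all scopes" on the sending side (hypothetical types, not LDK's actual message definitions), each funding scope would get its own commitment_signed carrying its funding_txid and the shared batch_size:

struct CommitmentSigned {
    channel_id: [u8; 32],
    funding_txid: [u8; 32], // which funding this signature commits to
    batch_size: u16,        // number of messages in the logical batch
    // signature fields omitted
}

// Build one commitment_signed per scope: the current funding plus every
// pending splice/RBF candidate (scopes given here as funding txids).
fn commitment_signed_for_all_scopes(
    channel_id: [u8; 32], scope_txids: &[[u8; 32]],
) -> Vec<CommitmentSigned> {
    let batch_size = scope_txids.len() as u16;
    scope_txids.iter().map(|txid| CommitmentSigned {
        channel_id,
        funding_txid: *txid,
        batch_size,
    }).collect()
}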

@jkczyz force-pushed the 2025-03-multiple-funding-scopes branch from eac7be9 to 8b4e46a on March 11, 2025 22:44
@jkczyz changed the title from "Add pending funding scopes to FundedChannel" to "Batch commitment_signed messages for splicing" on Mar 11, 2025
// May or may not have a pending splice
Some(batch) => {
    self.commitment_signed_batch.push(msg.clone());
    if self.commitment_signed_batch.len() < batch.batch_size as usize {
Contributor:

We should also consider the number of scopes available. We shouldn't receive a batch with anything other than that number.

Contributor Author:

There may be an edge case to consider. I started a discussion on the spec: https://github.com/lightning/bolts/pull/1160/files/8c907f6b8d26fad8ec79ad1fe3078eb92e5285a6#r1990528673

Though if pending_funding is empty, the spec states we should ignore any batched commitment_signed messages that don't match the new funding_txid.

• If batch is set:
  ...
  • Otherwise (no pending splice transactions):
    • MUST ignore commitment_signed where funding_txid does not match
      the current funding transaction.
    • If commitment_signed is missing for the current funding transaction:
      • MUST send an error and fail the channel.
    • Otherwise:
      • MUST respond with a revoke_and_ack message.
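A hedged sketch of that receive-side rule (a hypothetical CommitmentSigned shape; the real handling lives in channel.rs): with no pending splices, batched messages for unknown funding txids are dropped, but the channel fails if none covers the current funding transaction.

struct CommitmentSigned {
    funding_txid: [u8; 32],
    // signature fields omitted
}

// Returns the one message to validate; the caller then responds with
// revoke_and_ack. Err means: send an error and fail the channel.
fn handle_batch_without_pending_splice(
    current_funding_txid: [u8; 32], batch: Vec<CommitmentSigned>,
) -> Result<CommitmentSigned, &'static str> {
    batch.into_iter()
        // MUST ignore commitment_signed where funding_txid does not match
        // the current funding transaction.
        .find(|msg| msg.funding_txid == current_funding_txid)
        // If commitment_signed is missing for the current funding
        // transaction, MUST send an error and fail the channel.
        .ok_or("no commitment_signed for the current funding transaction")
}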

@jkczyz (Contributor, Author) commented Mar 12, 2025

> Yeah we'll need to go through each case where we owe the counterparty a commitment_signed (except for the initial one sent in dual funding/splicing) and make sure we always consider all scopes.

Pushed a couple commits that I think accomplish this. Though I'm not sure about the following line from build_commitment_no_status_check:

let (mut htlcs_ref, counterparty_commitment_tx) =
    self.build_commitment_no_state_update(&self.funding, logger);

It is called from methods like send_htlc_and_commit but for producing a ChannelMonitorUpdate. IIUC, I'll need to do this for all funding scopes?

@jkczyz force-pushed the 2025-03-multiple-funding-scopes branch from c66e554 to 2db5c60 on March 12, 2025 22:34
@jkczyz marked this pull request as ready for review on March 12, 2025 22:35
@jkczyz requested a review from wpaulino on March 12, 2025 22:35
@jkczyz force-pushed the 2025-03-multiple-funding-scopes branch 2 times, most recently from 3872586 to e371143 on March 12, 2025 22:46
@jkczyz added the "weekly goal" label (Someone wants to land this this week) on Mar 12, 2025
@jkczyz (Contributor, Author) commented Mar 12, 2025

Rebased on main.

@jkczyz requested a review from dunxen on March 17, 2025 15:45
@jkczyz force-pushed the 2025-03-multiple-funding-scopes branch from e371143 to 0362159 on March 18, 2025 23:01
@jkczyz requested a review from wpaulino on March 18, 2025 23:02
@ldk-reviews-bot:
🔔 1st Reminder

Hey @dunxen @wpaulino! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@jkczyz force-pushed the 2025-03-multiple-funding-scopes branch from 0362159 to 7021e01 on March 19, 2025 21:45
@jkczyz (Contributor, Author) commented Mar 19, 2025

Responded and addressed a couple lingering comments.

@jkczyz force-pushed the 2025-03-multiple-funding-scopes branch 3 times, most recently from 48bdd52 to 6db1c42 on March 21, 2025 14:42
@jkczyz (Contributor, Author) commented Mar 21, 2025

Recent pushes should have fixed CI.

@ldk-reviews-bot:
🔔 2nd Reminder

Hey @dunxen @wpaulino! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot:
🔔 3rd Reminder

Hey @dunxen @wpaulino! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@jkczyz force-pushed the 2025-03-multiple-funding-scopes branch from 6db1c42 to 43cd9b5 on March 24, 2025 21:55
@jkczyz (Contributor, Author) commented Mar 24, 2025

Squashed per @wpaulino's request.

        )?
    },
    // May or may not have a pending splice
    Some(batch) => {
Contributor:

Error if batch_size == 1?

Contributor Author:

Hmm... the spec puts the requirement on the sender but doesn't specify that N pending splices can't be zero. I suppose if there are zero and the sender uses 1, we potentially would use LatestHolderCommitmentTX (once added; see #3651 (comment)) rather than LatestHolderCommitmentTXInfo. Would that be a problem for downgrades?

Contributor:

That would be a problem for downgrades indeed. Maybe we should clarify the spec then; that wording does make it seem like a batch of 1 is allowed.


@jkczyz (Contributor, Author) commented Mar 25, 2025

Pushed some squashed fixups addressing feedback. Going to rebase to resolve merge conflicts.

jkczyz added 2 commits March 25, 2025 12:23
Once a channel is funded, it may be spliced to add or remove funds. The
new funding transaction is pending until confirmed on chain and thus
needs to be tracked. Additionally, it may be replaced by another
transaction using RBF with a higher fee. Hence, there may be more than
one pending FundingScope to track for a splice.

This commit adds support for tracking pending funding scopes. The
following commits will account for any pending scopes where applicable
(e.g., when handling commitment_signed).
@jkczyz force-pushed the 2025-03-multiple-funding-scopes branch from 135278e to 84ea044 on March 25, 2025 18:56
@jkczyz (Contributor, Author) commented Mar 25, 2025

Rebased on main.

@jkczyz requested a review from TheBlueMatt on March 25, 2025 18:58
@@ -4927,6 +4930,7 @@ pub(super) struct DualFundingChannelContext {
pub(super) struct FundedChannel<SP: Deref> where SP::Target: SignerProvider {
    pub funding: FundingScope,
    pending_funding: Vec<FundingScope>,
    commitment_signed_batch: BTreeMap<Txid, msgs::CommitmentSigned>,
Collaborator:

Dear god WHAT? Who thought this was a good idea? 🤦

Maybe we should un-fuck the protocol at the PeerManager level rather than doing it here? It's kinda insane that we have to keep a pending list of messages in a queue just to process them later; obviously the protocol should have sent them as one message, but absent that it seems like something the PeerManager should do - it's logically one message with 5 parts on the wire, which seems like something we shouldn't have to care about in channel.rs but rather our message de-framing logic should properly de-frame the N CommitmentSigneds into a CommitmentSignedBatch.

Contributor Author:

FWIW, the reason given for this was the 65K message size limitation.

lightning/bolts#1160 (comment)

I can look into moving the logic to PeerManager.

Collaborator:

Then we should add the ability to frame larger messages at the wire. The 64K limit is supposed to be a nice anti-DoS limit by ensuring you never really need a large buffer to store pending messages, but if we're gonna work around it by sending multiple messages which get stored in a pending message queue then the whole point is kinda moot.

If we don't, the spec still needs to treat them as a logical message - no other messages should be allowed to come in between, and we need some kind of init/complete message before/after so that the PeerManager can handle it without protocol-specific logic.

Contributor Author:

So some concerns:

  • Currently, ChannelManager also checks the message's channel_id before delegating to Channel, so PeerManager would need to track batches by channel, too. Should we batch using PeerState in ChannelManager instead of in PeerManager then?
  • Should we have a "raw" CommitmentSigned type with the optional batch data used for parsing (as it currently is written) and one without the batch data where Channel is given either a single one or a BTreeMap for the entire batch? (i.e., never expose the optional batch TLV to Channel -- only infer it from whether the method taking a BTreeMap is called).

Contributor Author:

FYI, wrote the last comment before seeing your latest comment.


Collaborator:

> Currently, ChannelManager also checks the message's channel_id before delegating to Channel, so PeerManager would need to track batches by channel, too. Should we batch using PeerState in ChannelManager instead of in PeerManager then?

Right, kinda drives home my comment, I think, that the spec needs to make this explicit, but if it doesn't, I think that's fine? Batches are (channel_id, count)-indexed.

> Should we have a "raw" CommitmentSigned type with the optional batch data used for parsing (as it currently is written) and one without the batch data where Channel is given either a single one or a BTreeMap for the entire batch? (i.e., never expose the optional batch TLV to Channel -- only infer it from whether the method taking a BTreeMap is called).

I would imagine we'd have to, though there's not really any reason to not just give the Channel a Vec<RawCommitmentSigned> or whatever. The Channel would just ignore the batch field cause it doesn't need it.

(FWIW IMO if we do this we should swap where the btree is - store the funding scopes in the btree and make the message a vec, tho it doesn't really matter that much).

The other question is how much work is it actually to handle them iteratively? I guess we'd have to have similar state as now - some logic to make it so that no messages get processed in between commitment_signeds and then a flag on each funding state to track whether it has validated the new commit-sig or not and then some way to check them all when we're done. It's pretty gross and I'd kinda prefer to handle it all in one go, but if we do end up with in-Channel state maybe that's cleaner?

The only reason to do that, I think, would be worry over the amount of memory pending-commit-sigs can use up (~50K * in-flight splice count, afaict, which we'd have to set some hard limit on in PeerManager), especially if we do it for any connected peer. I don't think I'm worried about something like 500K-1M for an inbound buffer, but others may feel differently.

Contributor Author:

> Right, kinda drives home my comment, I think, that the spec needs to make this explicit, but if it doesn't, I think that's fine? Batches are (channel_id, count)-indexed.

Why not just channel_id? There's edge cases to consider around what you would do if a peer misbehaved and sent a different batch_size in each message. Here we could naively add another entry to the map (never accumulating enough messages if batch_size increases each time), whereas in the current approach we'd simply append the item to the cache and could hardcode a limit?

> I would imagine we'd have to, though there's not really any reason to not just give the Channel a Vec<RawCommitmentSigned> or whatever. The Channel would just ignore the batch field cause it doesn't need it.

If we don't care about Channel ignoring the batch field, then there doesn't seem to be much reason to introduce another message, IMO. Maybe to avoid duplicating the funding_txid if storing in a map?

> (FWIW IMO if we do this we should swap where the btree is - store the funding scopes in the btree and make the message a vec, tho it doesn't really matter that much).

Not sure if that is better. We need to handle the edge case where we receive a batch after the splice is no longer pending. There we want to ignore any commitment_signed messages that don't match the funding_txid. So we'd rather do one look-up than N.

> The other question is how much work is it actually to handle them iteratively? I guess we'd have to have similar state as now - some logic to make it so that no messages get processed in between commitment_signeds and then a flag on each funding state to track whether it has validated the new commit-sig or not and then some way to check them all when we're done. It's pretty gross and I'd kinda prefer to handle it all in one go, but if we do end up with in-Channel state maybe that's cleaner?

In PeerManager? It wouldn't have access to funding state, IIUC. Not quite sure I follow what you're proposing here. By "iteratively" do you mean compared to batching into one message for Channel?

FWIW, the current approach seems much simpler. And if you squint hard enough each message cached looks like a channel state transition. At one point, I thought about validating the commitment_signed when received to avoid cloning the message, storing a LatestHolderCommitmentTXInfo to make into a ChannelMonitorUpdateStep once all messages in the batch were received. But it seemed it could be DoS-y?

> The only reason to do that, I think, would be worry over the amount of memory pending-commit-sigs can use up (~50K * in-flight splice count, afaict, which we'd have to set some hard limit on in PeerManager), especially if we do it for any connected peer. I don't think I'm worried about something like 500K-1M for an inbound buffer, but others may feel differently.

Not sure I understand. What's the worry compared to the alternative(s)?

Collaborator:

> Why not just channel_id? There's edge cases to consider around what you would do if a peer misbehaved and sent a different batch_size in each message. Here we could naively add another entry to the map (never accumulating enough messages if batch_size increases each time), whereas in the current approach we'd simply append the item to the cache and could hardcode a limit?

Sorry, I misspoke, I meant indexed by channel-id but also storing count.

> If we don't care about Channel ignoring the batch field, then there doesn't seem to be much reason to introduce another message, IMO. Maybe to avoid duplicating the funding_txid if storing in a map?

Just because we should have the complexity of re-building a multi-part message in PeerManager where we can stomach that complexity, rather than in Channel where we have to go add logic in every message handler to make sure we aren't getting a random other message in between a commit-sig message stream.

> Not sure if that is better. We need to handle the edge case where we receive a batch after the splice is no longer pending. There we want to ignore any commitment_signed messages that don't match the funding_txid. So we'd rather do one look-up than N.

If we're iterating the list of commit-sigs then we still just do one lookup per commit-sig (which I guess might be one more than per commit-sig-we-want, but I'm not sure it matters?). But, yea, I mean it doesn't really matter honestly.

> In PeerManager? It wouldn't have access to funding state, IIUC. Not quite sure I follow what you're proposing here. By "iteratively" do you mean compared to batching into one message for Channel?

No I meant doing it in Channel iteratively, like this v

> FWIW, the current approach seems much simpler. And if you squint hard enough each message cached looks like a channel state transition. At one point, I thought about validating the commitment_signed when received to avoid cloning the message, storing a LatestHolderCommitmentTXInfo to make into a ChannelMonitorUpdateStep once all messages in the batch were received. But it seemed it could be DoS-y?

Not sure it's any more DoS-y than the current approach? I guess a ChannelMonitorUpdateStep is a bit bigger so it's nice to avoid, but mostly it just seems like a lot of complexity to shove in Channel; we have so many different in-flight state things going on and adding another one just adds to the already-jumbo cognitive load trying to figure out which states we might be in at any point.

> Not sure I understand. What's the worry compared to the alternative(s)?

I wasn't thinking that you'd have to store the ChannelMonitorUpdateSteps so ignore that, we're just as DoS-y either way :(

Contributor Author:

Pushed the batching logic based on some offline discussion, in case you want an early look. Still need to re-organize how that is consumed in the channel state machine.

core::iter::once(funding)
    .chain(pending_funding.iter())
    .map(|funding| self.get_available_balances_for_scope(funding, fee_estimator))
    .min_by_key(|balances| balances.next_outbound_htlc_limit_msat)
Collaborator:

I don't think this is correct in the case where our counterparty spliced-out some funds - our next_outbound_htlc_limit_msat might be the same across two splices but inbound_capacity_msat is lower on one.

Contributor Author:

Hmmm.. are we able to pick one as the AvailableBalance? Or do we need to merge them somehow?

Collaborator:

ISTM we should always report the lowest available balance for each type of balance (maybe we can do that more cleverly, though?)

Contributor Author:

Hmmm... actually, won't we need the maximum for next_outbound_htlc_minimum_msat and the minimum for the remaining fields?

Contributor Author:

FYI, I pushed a change to reduce all AvailableBalances into one instead of selecting one.
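A sketch of what reducing all AvailableBalances into one might look like, per the discussion above: take the maximum of next_outbound_htlc_minimum_msat and the minimum of the remaining fields, so the result is valid under every scope. Field names follow the thread; the struct is a simplified stand-in for LDK's AvailableBalances.

use core::cmp::{max, min};

#[derive(Clone, Copy)]
struct AvailableBalances {
    inbound_capacity_msat: u64,
    outbound_capacity_msat: u64,
    next_outbound_htlc_limit_msat: u64,
    next_outbound_htlc_minimum_msat: u64,
}

// An HTLC must be sendable under the confirmed funding and every pending
// scope (one of which may eventually confirm), so combine element-wise.
fn reduce_balances(
    per_scope: impl Iterator<Item = AvailableBalances>,
) -> Option<AvailableBalances> {
    per_scope.reduce(|a, b| AvailableBalances {
        inbound_capacity_msat: min(a.inbound_capacity_msat, b.inbound_capacity_msat),
        outbound_capacity_msat: min(a.outbound_capacity_msat, b.outbound_capacity_msat),
        next_outbound_htlc_limit_msat:
            min(a.next_outbound_htlc_limit_msat, b.next_outbound_htlc_limit_msat),
        next_outbound_htlc_minimum_msat:
            max(a.next_outbound_htlc_minimum_msat, b.next_outbound_htlc_minimum_msat),
    })
}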

jkczyz added 5 commits March 26, 2025 12:03
A FundedChannel may have more than one pending FundingScope during
splicing, one for the splice attempt and one or more for any RBF
attempts. The counterparty will send a commitment_signed message for
each pending splice transaction and the current funding transaction.

Defer handling these commitment_signed messages until the entire batch
has arrived. Then validate them individually, also checking if all the
pending splice transactions and the current funding transaction have a
corresponding commitment_signed in the batch.
A FundedChannel may have more than one pending FundingScope during
splicing, one for the splice attempt and one or more for any RBF
attempts. When this is the case, send a commitment_signed message to the
counterparty for each FundingScope, including the necessary batch
information (i.e., batch_size and funding_txid).
A FundedChannel may have more than one pending FundingScope during
splicing, one for the splice attempt and one or more for any RBF
attempts. When calling get_available_balances, consider all funding
scopes and take the minimum by next_outbound_htlc_limit_msat. This is
used both informationally and to determine which channel to use to
forward an HTLC.

The choice of next_outbound_htlc_limit_msat is somewhat arbitrary but
matches the field used when determining which channel to use to forward an
HTLC. Any field should do since each field should be adjusted by the
same amount relative to another FundingScope given the nature of the
fields (i.e., inbound/outbound capacity, min/max HTLC limit).

Using the minimum was chosen since, in order for an HTLC to be sent over
the channel, it must be possible to send it under each funding scope --
both the confirmed one and any pending scopes, one of which may eventually
confirm.
@jkczyz force-pushed the 2025-03-multiple-funding-scopes branch from 84ea044 to 4caac9c on March 26, 2025 20:14
@jkczyz requested review from TheBlueMatt and wpaulino on March 26, 2025 20:55
During splicing, commitment_signed messages need to be collected into a
single batch before they are handled. Rather than including this as part
of the channel state machine logic, batch when reading messages from the
wire since they can be considered one logical message.
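A hedged sketch of that de-framing step, with assumed types rather than LDK's actual API: messages are buffered per channel_id (storing the expected count, as discussed above) and the channel only sees the completed logical batch. A real implementation would also bound the buffer (the thread estimates roughly 50K per in-flight splice) and reject inconsistent batch_size values.

use std::collections::HashMap;

#[derive(Clone)]
struct CommitmentSigned {
    channel_id: [u8; 32],
    batch_size: u16, // from the optional batch TLV; 0 or 1 means unbatched
    // signature and funding_txid fields omitted
}

#[derive(Default)]
struct CommitmentBatcher {
    pending: HashMap<[u8; 32], Vec<CommitmentSigned>>, // keyed by channel_id
}

impl CommitmentBatcher {
    // Buffer `msg`, returning the whole batch once `batch_size` messages
    // for this channel have arrived.
    fn on_commitment_signed(&mut self, msg: CommitmentSigned) -> Option<Vec<CommitmentSigned>> {
        if msg.batch_size <= 1 {
            return Some(vec![msg]); // not part of a batch: deliver as-is
        }
        let channel_id = msg.channel_id;
        let batch_size = msg.batch_size as usize;
        let pending = self.pending.entry(channel_id).or_default();
        pending.push(msg);
        if pending.len() >= batch_size {
            self.pending.remove(&channel_id)
        } else {
            None
        }
    }
}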