Batch commitment_signed messages for splicing #3651
base: main
Conversation
👋 Thanks for assigning @wpaulino as a reviewer!
@wpaulino Just looking for a quick concept ACK. Still needed:
Yeah this makes sense. We'll need to support sending a commitment_signed for each scope as well.
claimed_htlcs: ref mut update_claimed_htlcs, ..
} = &mut update {
	debug_assert!(update_claimed_htlcs.is_empty());
	*update_claimed_htlcs = claimed_htlcs.clone();
Hm, it'd be nice to not have this duplicated data, but I guess it's pretty small anyway.
Somewhat related, in #3606 we're introducing a new update variant (for the counterparty commitment only, but we'll need to do the same for the holder commitment as well) that only tracks the commitment transaction. I wonder if we can get away with using it for the additional funding scopes as a way to simplify the transition to the new variant, as you wouldn't be allowed to downgrade with a pending spliced channel anyway.
By this do you mean we'll need
Pushed another commit for
Yeah we'll need to go through each case where we owe the counterparty a
Force-pushed eac7be9 to 8b4e46a
FundedChannel commitment_signed messages for splicing
// May or may not have a pending splice
Some(batch) => {
	self.commitment_signed_batch.push(msg.clone());
	if self.commitment_signed_batch.len() < batch.batch_size as usize {
We should also consider the number of scopes available. We shouldn't receive a batch with anything other than that number.
There may be an edge case to consider. I started a discussion on the spec: https://github.com/lightning/bolts/pull/1160/files/8c907f6b8d26fad8ec79ad1fe3078eb92e5285a6#r1990528673
Though if pending_funding is empty, the spec states we should ignore any batched commitment_signed messages that don't match the new funding_txid.
- If batch is set:
  ...
  - Otherwise (no pending splice transactions):
    - MUST ignore commitment_signed where funding_txid does not match the current funding transaction.
    - If commitment_signed is missing for the current funding transaction:
      - MUST send an error and fail the channel.
    - Otherwise:
      - MUST respond with a revoke_and_ack message.
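For illustration, a minimal sketch of how the branch quoted above might be checked once an entire batch has been collected; the types and the helper name are simplified stand-ins, not the PR's actual code:

```rust
use std::collections::BTreeMap;

// Simplified stand-ins for the real rust-lightning types.
type Txid = [u8; 32];
struct CommitmentSigned { /* signature fields omitted */ }

/// Sketch of the "no pending splice transactions" case quoted above: messages
/// whose funding_txid does not match the current funding transaction are
/// ignored, but a missing commitment_signed for the current funding
/// transaction fails the channel.
fn check_batch_with_no_pending_splices(
	batch: &BTreeMap<Txid, CommitmentSigned>, current_funding_txid: Txid,
) -> Result<&CommitmentSigned, &'static str> {
	// Entries keyed by any other funding_txid are simply ignored here; the
	// caller would go on to validate the returned message and reply with
	// revoke_and_ack.
	batch
		.get(&current_funding_txid)
		.ok_or("commitment_signed missing for the current funding transaction; fail the channel")
}
```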
Pushed a couple commits that I think accomplish this. Though I'm not sure about the following line from rust-lightning/lightning/src/ln/channel.rs Lines 8621 to 8622 in c66e554
It is called from methods like
Force-pushed c66e554 to 2db5c60
Force-pushed 3872586 to e371143
Rebased on main.
Force-pushed e371143 to 0362159
Force-pushed 0362159 to 7021e01
Responded and addressed a couple lingering comments.
Force-pushed 48bdd52 to 6db1c42
Recent pushes should have fixed CI.
Force-pushed 6db1c42 to 43cd9b5
Squashed per @wpaulino's request.
)?
},
// May or may not have a pending splice
Some(batch) => {
Error if batch_size == 1?
Hmm... the spec puts the requirement on the sender but doesn't specify that N pending splices can't be zero. I suppose if there are zero and the sender uses 1, we potentially would use LatestHolderCommitmentTX (once added; see #3651 (comment)) rather than LatestHolderCommitmentTXInfo. Would that be a problem for downgrades?
That would be a problem for downgrades indeed. Maybe we should clarify the spec then, that wording does make it seem like a batch of 1 is allowed.
Added a comment to the spec: https://github.com/lightning/bolts/pull/1160/files#r2014581638
Force-pushed 43cd9b5 to 135278e
Pushed some squashed fixups addressing feedback. Going to rebase to resolve merge conflicts.
Once a channel is funded, it may be spliced to add or remove funds. The new funding transaction is pending until confirmed on chain and thus needs to be tracked. Additionally, it may be replaced by another transaction using RBF with a higher fee. Hence, there may be more than one pending FundingScope to track for a splice. This commit adds support for tracking pending funding scopes. The following commits will account for any pending scopes where applicable (e.g., when handling commitment_signed).
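As a rough sketch of the tracking pattern this commit describes (struct contents elided; only the shape of the fields from the later diff is assumed), per-scope logic can walk the confirmed scope plus every pending splice/RBF scope:

```rust
struct FundingScope { /* funding outpoint, balances, commitment data, ... */ }

struct FundedChannelSketch {
	funding: FundingScope,
	pending_funding: Vec<FundingScope>,
}

impl FundedChannelSketch {
	// Hypothetical helper: yields the current funding scope followed by any
	// pending splice/RBF scopes so callers can apply checks to all of them.
	fn all_funding_scopes(&self) -> impl Iterator<Item = &FundingScope> {
		core::iter::once(&self.funding).chain(self.pending_funding.iter())
	}
}
```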
Force-pushed 135278e to 84ea044
Rebased on main.
@@ -4927,6 +4930,7 @@ pub(super) struct DualFundingChannelContext {
pub(super) struct FundedChannel<SP: Deref> where SP::Target: SignerProvider {
	pub funding: FundingScope,
	pending_funding: Vec<FundingScope>,
	commitment_signed_batch: BTreeMap<Txid, msgs::CommitmentSigned>,
Dear god WHAT? Who thought this was a good idea? 🤦

Maybe we should un-fuck the protocol at the PeerManager level rather than doing it here? It's kinda insane that we have to keep a pending list of messages in a queue just to process them later; obviously the protocol should have sent them as one message, but absent that it seems like something the PeerManager should do - it's logically one message with 5 parts on the wire, which seems like something we shouldn't have to care about in channel.rs but rather our message de-framing logic should properly de-frame the N CommitmentSigneds into a CommitmentSignedBatch.
FWIW, the reason given for this was the 65K message size limitation.
lightning/bolts#1160 (comment)
I can look into moving the logic to PeerManager.
Then we should add the ability to frame larger messages at the wire. The 64K limit is supposed to be a nice anti-DoS limit by ensuring you never really need a large buffer to store pending messages, but if we're gonna work around it by sending multiple messages which get stored in a pending message queue then the whole point is kinda moot.
If we don't, the spec still needs to treat them as a logical message - no other messages should be allowed to come in between, and we need some kind of init/complete message before/after so that the PeerManager can handle it without protocol-specific logic.
So some concerns:
- Currently, ChannelManager also checks the message's channel_id before delegating to Channel, so PeerManager would need to track batches by channel, too. Should we batch using PeerState in ChannelManager instead of in PeerManager then?
- Should we have a "raw" CommitmentSigned type with the optional batch data used for parsing (as it currently is written) and one without the batch data where Channel is given either a single one or BTreeMap for the entire batch? (i.e., never expose the optional batch TLV to Channel -- only infer it from when the method taking BTreeMap is called)
FYI, wrote the last comment before seeing your latest comment.
@TheBlueMatt Thoughts on #3651 (comment)?
Currently, ChannelManager also checks the message's channel_id before delegating to Channel, so PeerManager would need to track batches by channel, too. Should we batch using PeerState in ChannelManager instead of in PeerManager then?
Right, kinda drives home my comment, I think, that the spec needs to make this explicit, but if it doesn't, I think that's fine? Batches are (channel_id, count)-indexed.
Should we have a "raw" CommitmentSigned type with the optional batch data used for parsing (as it currently is written) and one without the batch data where Channel is given either a single one or BTreeMap for the entire batch? (i.e., never expose the optional batch TLV to Channel -- only infer it from when the method taking BTreeMap is called).
I would imagine we'd have to, though there's not really any reason to not just give the Channel a Vec<RawCommitmentSigned> or whatever. The Channel would just ignore the batch field cause it doesn't need it.
(FWIW IMO if we do this we should swap where the btree is - store the funding scopes in the btree and make the message a vec, tho it doesn't really matter that much).
The other question is how much work is it actually to handle them iteratively? I guess we'd have to have similar state as now - some logic to make it so that no messages get processed in between commitment_signeds and then a flag on each funding state to track whether it has validated the new commit-sig or not and then some way to check them all when we're done. It's pretty gross and I'd kinda prefer to handle it all in one go, but if we do end up with in-Channel state maybe that's cleaner?
The only reason to do that, I think, would be worry over the amount of memory pending-commit-sigs can use up (~50K * in-flight splice count, afaict, which we'd have to set some hard limit on in PeerManager), especially if we do it for any connected peer. I don't think I'm worried about something like 500K-1M for an inbound buffer, but others may feel differently.
Right, kinda drives home my comment, I think, that the spec needs to make this explicit, but if it doesn't, I think that's fine? Batches are (channnel_id, count)-indexed.
Why not just channel_id? There are edge cases to consider around what you would do if a peer misbehaved and sent a different batch_size in each message. Here we could naively add another entry to the map (never accumulating enough messages if batch_size increases each time), whereas in the current approach we'd simply append the item to the cache and could hardcode a limit?
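For illustration, here is a rough sketch of the per-channel accumulation with a hardcoded cap being discussed; the type names, the cap, and the buffer layout are hypothetical, not the PR's actual code:

```rust
use std::collections::HashMap;

// Simplified stand-ins for the real rust-lightning types.
type ChannelId = [u8; 32];
struct CommitmentSigned {
	batch_size: Option<u16>, // set when the message is part of a splice batch
	// signature fields omitted
}

// Hard cap so a misbehaving peer cannot grow the buffer without bound by
// claiming ever larger batch sizes.
const MAX_COMMITMENT_SIGNED_BATCH: usize = 20;

#[derive(Default)]
struct CommitmentSignedBatches {
	pending: HashMap<ChannelId, Vec<CommitmentSigned>>,
}

impl CommitmentSignedBatches {
	/// Appends to the per-channel cache and returns the full batch once the
	/// claimed batch size has been collected.
	fn push(
		&mut self, channel_id: ChannelId, msg: CommitmentSigned,
	) -> Result<Option<Vec<CommitmentSigned>>, &'static str> {
		let claimed_size = msg.batch_size.unwrap_or(1) as usize;
		if claimed_size > MAX_COMMITMENT_SIGNED_BATCH {
			return Err("peer claimed an unreasonably large commitment_signed batch");
		}
		let complete = {
			let buffer = self.pending.entry(channel_id).or_default();
			buffer.push(msg);
			buffer.len() >= claimed_size
		};
		if complete {
			Ok(self.pending.remove(&channel_id))
		} else {
			Ok(None)
		}
	}
}
```

Note the sketch naively checks completeness against the latest claimed batch_size, which is the "simply append to the cache and hardcode a limit" behavior described above.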
I would imagine we'd have to, though there's not really any reason to not just give the Channel a Vec<RawCommitmentSigned> or whatever. The Channel would just ignore the batch field cause it doesn't need it.
If we don't care about Channel ignoring the batch field, then there doesn't seem to be much reason to introduce another message, IMO. Maybe to avoid duplicating the funding_txid if storing in a map?
(FWIW IMO if we do this we should swap where the btree is - store the funding scopes in the btree and make the message a vec, tho it doesn't really matter that much).
Not sure if that is better. We need to handle the edge case where we receive a batch after the splice is no longer pending. There we want to ignore any commitment_signed messages that don't match the funding_txid. So we'd rather do one look-up than N.
The other question is how much work is it actually to handle them iteratively? I guess we'd have to have similar state as now - some logic to make it so that no messages get processed in between commitment_signeds and then a flag on each funding state to track whether it has validated the new commit-sig or not and then some way to check them all when we're done. It's pretty gross and I'd kinda prefer to handle it all in one go, but if we do end up with in-Channel state maybe that's cleaner?
In PeerManager? It wouldn't have access to funding state, IIUC. Not quite sure I follow what you're proposing here. By "iteratively" do you mean compared to batching into one message for Channel?
FWIW, the current approach seems much simpler. And if you squint hard enough each message cached looks like a channel state transition. At one point, I thought about validating the commitment_signed when received to avoid cloning the message, storing a LatestHolderCommitmentTXInfo to make into a ChannelMonitorUpdateStep once all messages in the batch were received. But it seemed it could be DoS-y?
The only reason to do that, I think, would be worry over the amount of memory pending-commit-sigs can use up (~50K * in-flight splice count, afaict, which we'd have to set some hard limit on in PeerManager), especially if we do it for any connected peer. I don't think I'm worried about something like 500K-1M for an inbound buffer, but others may feel differently.
Not sure I understand. What's the worry compared to the alternative(s)?
Why not just channel_id? There are edge cases to consider around what you would do if a peer misbehaved and sent a different batch_size in each message. Here we could naively add another entry to the map (never accumulating enough messages if batch_size increases each time), whereas in the current approach we'd simply append the item to the cache and could hardcode a limit?
Sorry, I misspoke, I meant indexed by channel-id but also storing count.
If we don't care about Channel ignoring the batch field, then there doesn't seem to be much reason to introduce another message, IMO. Maybe to avoid duplicating the funding_txid if storing in a map?
Just because we should have the complexity of re-building a multi-part message in PeerManager where we can stomach that complexity, rather than in Channel where we have to go add logic in every message handler to make sure we aren't getting a random other message in between a commit-sig message stream.
Not sure if that is better. We need to handle the edge case where we receive a batch after the splice is no longer pending. There we want to ignore any commitment_signed messages that don't match the funding_txid. So we'd rather do one look-up than N.
If we're iterating the list of commit-sigs then we still just do one lookup per commit-sig (which I guess might be one more than per commit-sig-we-want, but I'm not sure it matters?). But, yea, I mean it doesn't really matter honestly.
In PeerManager? It wouldn't have access to funding state, IIUC. Not quite sure I follow what you're proposing here. By "iteratively" do you mean compared to batching into one message for Channel?
No I meant doing it in Channel iteratively, like this v
FWIW, the current approach seems much simpler. And if you squint hard enough each message cached looks like a channel state transition. At one point, I thought about validating the commitment_signed when received to avoid cloning the message, storing a LatestHolderCommitmentTXInfo to make into a ChannelMonitorUpdateStep once all messages in the batch were received. But it seemed it could be DoS-y?
Not sure it's any more DoS-y than the current approach? I guess a ChannelMonitorUpdateStep is a bit bigger so it's nice to avoid, but mostly it just seems like a lot of complexity to shove in Channel; we have so many different in-flight state things going on and adding another one just adds to the already-jumbo cognitive load trying to figure out which states we might be in at any point.
Not sure I understand. What's the worry compared to the alternative(s)?
I wasn't thinking that you'd have to store the ChannelMonitorUpdateSteps so ignore that, we're just as DoS-y either way :(
Pushed the batching logic based on some offline discussion, in case you want an early look. Still need to re-organize how that is consumed in the channel state machine.
lightning/src/ln/channel.rs (Outdated)
core::iter::once(funding)
	.chain(pending_funding.iter())
	.map(|funding| self.get_available_balances_for_scope(funding, fee_estimator))
	.min_by_key(|balances| balances.next_outbound_htlc_limit_msat)
I don't think this is correct in the case where our counterparty spliced-out some funds - our next_outbound_htlc_limit_msat might be the same across two splices but inbound_capacity_msat is lower on one.
Hmmm.. are we able to pick one as the AvailableBalance? Or do we need to merge them somehow?
ISTM we should always report the lowest available balance for each type of balance (maybe we can do that more cleverly, though?)
Hmmm... actually, won't we need the maximum for next_outbound_htlc_minimum_msat and the minimum for the remaining fields?
FYI, I pushed a change to reduce all AvailableBalances into one instead of selecting one.
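A rough sketch of such a reduction, assuming a simplified AvailableBalances with only a few of its fields: the HTLC minimum takes the maximum across scopes while everything else takes the minimum, so the result is satisfiable under every funding scope.

```rust
// Simplified stand-in for the real AvailableBalances struct.
#[derive(Clone, Copy)]
struct AvailableBalances {
	inbound_capacity_msat: u64,
	outbound_capacity_msat: u64,
	next_outbound_htlc_limit_msat: u64,
	next_outbound_htlc_minimum_msat: u64,
}

// Reduce per-scope balances into a single value: an HTLC must be valid for the
// confirmed scope and for every pending splice/RBF scope, so each field keeps
// its most restrictive value.
fn reduce_balances(
	per_scope: impl Iterator<Item = AvailableBalances>,
) -> Option<AvailableBalances> {
	per_scope.reduce(|acc, b| AvailableBalances {
		inbound_capacity_msat: acc.inbound_capacity_msat.min(b.inbound_capacity_msat),
		outbound_capacity_msat: acc.outbound_capacity_msat.min(b.outbound_capacity_msat),
		next_outbound_htlc_limit_msat: acc
			.next_outbound_htlc_limit_msat
			.min(b.next_outbound_htlc_limit_msat),
		next_outbound_htlc_minimum_msat: acc
			.next_outbound_htlc_minimum_msat
			.max(b.next_outbound_htlc_minimum_msat),
	})
}
```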
A FundedChannel may have more than one pending FundingScope during splicing, one for the splice attempt and one or more for any RBF attempts. The counterparty will send a commitment_signed message for each pending splice transaction and the current funding transaction. Defer handling these commitment_signed messages until the entire batch has arrived. Then validate them individually, also checking if all the pending splice transactions and the current funding transaction have a corresponding commitment_signed in the batch.
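For illustration, a minimal sketch of the completeness check described in this commit, assuming the batch is keyed by funding_txid and using simplified stand-in types:

```rust
use std::collections::BTreeMap;

// Simplified stand-ins for the real rust-lightning types.
type Txid = [u8; 32];
struct CommitmentSigned { /* signature fields omitted */ }
struct FundingScope { funding_txid: Txid }

/// Once the whole batch has arrived, the current funding transaction and every
/// pending splice transaction must each have a corresponding commitment_signed.
fn validate_commitment_signed_batch(
	batch: &BTreeMap<Txid, CommitmentSigned>, funding: &FundingScope,
	pending_funding: &[FundingScope],
) -> Result<(), &'static str> {
	for scope in core::iter::once(funding).chain(pending_funding.iter()) {
		let _msg = batch
			.get(&scope.funding_txid)
			.ok_or("missing commitment_signed for a pending funding transaction")?;
		// Validate _msg against this scope's commitment transaction here
		// (signature checks elided).
	}
	Ok(())
}
```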
A FundedChannel may have more than one pending FundingScope during splicing, one for the splice attempt and one or more for any RBF attempts. When this is the case, send a commitment_signed message for each FundingScope and include the necessary batch information (i.e., batch_size and funding_txid) to the counterparty.
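Illustratively, the sending side might look roughly like this; the message fields mirror the batch information from the splicing spec, but the struct and helper are simplified stand-ins rather than the PR's actual API:

```rust
// Simplified stand-ins for the real rust-lightning types.
type Txid = [u8; 32];

struct CommitmentSigned {
	batch_size: Option<u16>,
	funding_txid: Option<Txid>,
	// signature fields omitted
}

struct FundingScope { funding_txid: Txid }

/// Builds one commitment_signed per funding scope; the batch information is
/// only attached when a splice is pending (i.e., more than one scope exists).
fn build_commitment_signed_batch(
	funding: &FundingScope, pending_funding: &[FundingScope],
) -> Vec<CommitmentSigned> {
	let scopes: Vec<&FundingScope> =
		core::iter::once(funding).chain(pending_funding.iter()).collect();
	let batch_size = scopes.len();
	scopes
		.into_iter()
		.map(|scope| CommitmentSigned {
			batch_size: (batch_size > 1).then(|| batch_size as u16),
			funding_txid: (batch_size > 1).then(|| scope.funding_txid),
			// the per-scope signatures would be computed here
		})
		.collect()
}
```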
A FundedChannel may have more than one pending FundingScope during splicing, one for the splice attempt and one or more for any RBF attempts. When calling get_available_balances, consider all funding scopes and take the minimum by next_outbound_htlc_limit_msat. This is used both informationally and to determine which channel to use to forward an HTLC. The choice of next_outbound_htlc_limit_msat is somewhat arbitrary but matches the field used when determining which channel to use to forward an HTLC. Any field should do since each field should be adjusted by the same amount relative to another FundingScope given the nature of the fields (i.e., inbound/outbound capacity, min/max HTLC limit). Using the minimum was chosen since, in order for an HTLC to be sent over the channel, sending it must be possible under each funding scope -- both the confirmed one and any pending scopes, one of which may eventually confirm.
Force-pushed 84ea044 to 4caac9c
During splicing, commitment_signed messages need to be collected into a single batch before they are handled. Rather than including this as part of the channel state machine logic, batch when reading messages from the wire since they can be considered one logical message.
Once a channel is funded, it may be spliced to add or remove funds. The new funding transaction is pending until confirmed on chain and thus needs to be tracked. Additionally, it may be replaced by another transaction using RBF with a higher fee. Hence, there may be more than one pending FundingScope to track for a splice.

This PR adds support for tracking pending funding scopes and accounting for any pending scopes where applicable (e.g., when handling and sending commitment_signed messages).