Support generic HTLC interception #4300
Conversation
Codecov Report

```
@@           Coverage Diff           @@
##             main    #4300    +/-  ##
=======================================
  Coverage   86.59%   86.59%
=======================================
  Files         158      158
  Lines      102368   102402    +34
=======================================
+ Hits        88641    88671    +30
- Misses      11313    11316     +3
- Partials     2414     2415     +1
=======================================
```
IIRC, in #3843 and other prior contexts we had discussed that we might want to move away from Event-based interception to a dedicated handler or trait-based interface. What do you think about going that way instead? Do you think it would still be easily possible with the approach you took here?
joostjager
left a comment
No review yet, just posting some comments.
```rust
// `ChannelManager`. Locks can be held at the same time if they are on the same branch in the tree,
// and should then be taken in the order of the lowest to the highest level in the tree.
// Note that locks on different branches shall not be taken at the same time, as doing so will
// create a new lock order for those specific locks in the order they were taken.
```
I actually found this a useful explanation. Agreed that it isn't ideal to keep this updated. Can't the comment be formulated in a generic way? Something along the lines of what you put in the commit message, maybe.
Hmm, maybe something in CONTRIBUTING.md? It's not specific to channelmanager.rs or ChannelManager mutexes; generally we kinda just expect devs to "put mutexes where they're needed, make sure you have test coverage, and only worry about lockorder if the test suite complains".
CONTRIBUTING.md seems to describe process rather than architecture. In #4296 I am proposing to add README.md for the lightning crate. It could go there perhaps.
I'm not sure about removing every explanation about this, but if other reviewers agree that it is redundant, I guess it's ok.
CONTRIBUTING.md is important things that contributors need to know, README.md is high-level things that downstream devs need to know. This is definitely not something that should go in README.md, IMO, as downstream devs shouldn't need to think about this at all.
+1 on this. I think having these docs is valuable, and the tree gives at least somewhat of a structured overview of what important fields are there. This is probably useful for everybody, but in particular for new contributors. channelmanager.rs is still 20k lines; let's not further reduce the few docs we have that help people orient themselves in this huge collection of objects and methods?
> somewhat of a structured overview of what important fields are there
I mean we can give a "list of important fields", then? The explicit lockorder documentation is generally likely to be wrong (does everyone reviewing PRs really check that no lockorder changes happened that invalidated the documentation? I certainly don't), which makes its presence worse than simply not having it.
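As a concrete illustration of why stale lockorder documentation is dangerous, here is a minimal, hypothetical example (not LDK code, all names invented) of the kind of inversion the automated lockorder detection is designed to catch:

```rust
use std::sync::Mutex;

// Two locks taken in opposite orders on two code paths. This is exactly the
// class of bug lockorder testing flags in test-reachable code.
struct Manager {
    channels: Mutex<Vec<u64>>,
    forwards: Mutex<Vec<u64>>,
}

impl Manager {
    fn path_a(&self) {
        // channels -> forwards
        let _c = self.channels.lock().unwrap();
        let _f = self.forwards.lock().unwrap();
    }

    fn path_b(&self) {
        // forwards -> channels: inverted relative to path_a. If the two paths
        // ever run concurrently they can deadlock, regardless of whether a doc
        // comment describing the intended order was kept up to date.
        let _f = self.forwards.lock().unwrap();
        let _c = self.channels.lock().unwrap();
    }
}

fn main() {
    let m = Manager { channels: Mutex::new(vec![]), forwards: Mutex::new(vec![]) };
    m.path_a(); // fine sequentially; the hazard only appears under concurrency
    m.path_b();
}
```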
lightning/src/ln/channelmanager.rs (Outdated)

```rust
	});
	let shared_secret = next_hop.shared_secret().secret_bytes();

	macro_rules! fail_htlc {
```
Is there no way to avoid this? Macros are so annoying in navigation and debugging.
It can be a function, but then it can't include the `continue` at the end, which is what materially reduces the chance of screwing it up (you can't obviously see that there's a missing `continue` just by looking at the callsite). Given it's local to a function and trivial, I don't see much issue with the macro vs trying to use a function and taking the extra chance of future bugs.
I'd argue the opposite. A continue hidden inside a macro actually hurts readability at the call site.
I can rename the macro fail_htlc_continue if you prefer.
Not that it matters much, but also +1 here; I would prefer not to hide away control-flow statements in macros, even if that means that the code is not as DRY.
fail_htlc_continue to me conveys something like "the htlc is going to continue". Macro names signaling control flow don't work for me.
Can also rename it fail_htlc_and_continue or even fail_htlc_continue_to_next. It's not that I love control flow in macros either; it's that the alternative (requiring the control flow at the callsite) has a much higher chance of future bugs. Not really sure what the right answer is there.
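To make the trade-off concrete, here is a minimal, hypothetical sketch (not the PR's actual macro; the names and loop are invented) of a local failure macro whose expansion carries the `continue`, so no callsite can forget it:

```rust
// The macro both queues the failure and skips to the next HTLC in the
// enclosing loop. An unlabeled `continue` inside a macro body applies to the
// loop at the expansion site.
macro_rules! fail_htlc {
    ($failures:expr, $htlc:expr, $reason:expr) => {{
        $failures.push(($htlc, $reason));
        continue;
    }};
}

fn main() {
    let htlcs = vec![1u64, 2, 3, 4];
    let mut failures: Vec<(u64, &str)> = Vec::new();
    for htlc in htlcs {
        if htlc % 2 == 0 {
            // With a plain function instead, every callsite would need its
            // own `continue`, and a missing one is easy to overlook in review.
            fail_htlc!(failures, htlc, "policy violation");
        }
        println!("forwarding HTLC {htlc}");
    }
    println!("failed: {failures:?}");
}
```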
I'm not sure if it's necessary. I do think that we need to not have a simple boolean "intercept everything" vs "intercept nothing", but the flags approach here seemed simple and I think likely covers the types of decisions a "do you want to consider intercepting this" trait would make. It's possible we're missing some granularity, but likely not too much (maybe the one thing would be to intercept things coming from private channels so that you can give them different fees?).

Now, the alternative trait would be one that could return intercept/forward/reject, which there might be some argument for here, mostly for cases where someone wants to limit HTLCs in some cases (eg policies like "only allow forwards to channels with limited liquidity from my private channels/LSP clients, but allow forwards to channels with more liquidity from anyone") where avoiding the event pipeline would be nice.

That said, currently we're basically entirely single-threaded when forwarding no matter what - it all happens in a single method called from the BP. Pushing forwards through the event pipeline just makes forwarding decisions natively-async; it doesn't actually slow them down any more than they already were. Now, there's likely some future case where we want to parallelize forwards a bit and move some of the forwarding work into something that's called in parallel (no idea what that would be? Maybe just fetching only some fresh commitment transactions for peers in a
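For reference, a hedged sketch of what the intercept/forward/reject trait described above might look like; every name here is invented and nothing below is actual LDK API:

```rust
enum InterceptDecision { Forward, Intercept, Reject }

// Simplified stand-in for the HTLC details such a trait would receive.
struct HtlcDetails {
    outgoing_scid: u64,
    from_private_channel: bool,
}

trait HtlcInterceptor {
    // Called for each forwardable HTLC. `Intercept` would hand the HTLC to
    // the user, `Reject` would fail it back, instead of pushing an event.
    fn decide(&self, htlc: &HtlcDetails) -> InterceptDecision;
}

// Example of the kind of policy mentioned above: only allow forwards from
// private channels into an explicit set of outgoing channels.
struct LspPolicy { allowed_scids: Vec<u64> }

impl HtlcInterceptor for LspPolicy {
    fn decide(&self, htlc: &HtlcDetails) -> InterceptDecision {
        if htlc.from_private_channel && !self.allowed_scids.contains(&htlc.outgoing_scid) {
            InterceptDecision::Reject
        } else {
            InterceptDecision::Forward
        }
    }
}

fn main() {
    let policy = LspPolicy { allowed_scids: vec![42] };
    let htlc = HtlcDetails { outgoing_scid: 7, from_private_channel: true };
    assert!(matches!(policy.decide(&htlc), InterceptDecision::Reject));
}
```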
I don't believe it would be a major change, no, just more boilerplate.

I'd say that moving to a non-event-based interface would be too much scope creep. The flags-based approach is the simplest way to enable the requested functionality?
🔔 1st Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 2nd Reminder Hey @tnull! This PR has been waiting for your review. |
carlaKC
left a comment
First pass, haven't looked at the tests yet!
While it's nice to document things, the lockorder comment at the top of `ChannelManager` is just annoying to always update and doesn't add all that much value. Developers likely shouldn't be checking it while writing code; our automated lockorder issue detection framework more than suffices to catch any bugs in test-reachable code. That makes it basically write-only, which isn't exactly a useful comment.
When we added support for async payments (which requires holding HTLCs until we receive an onion message), we added the hold logic to `ChannelManager::forward_htlcs`. This made sense as we reused the forwarding datastructure in the holding logic, so we already had the right types in place, but it turns out only a single call of `forward_htlcs` should ever result in an HTLC being held. All of the other calls (un-holding an HTLC, forwarding an intercepted HTLC, forwarding an HTLC decoded by LDK prior to 0.2, or processing a phantom receive) should never result in an HTLC being held.

Instead, HTLCs should actually only ever be held when the HTLC is decoded in `process_pending_update_add_htlcs` before forwarding. Because of this, and because we want to move the interception (and thus also the holding logic) out of `forward_htlcs`, here we move the holding logic into `process_pending_update_add_htlcs`.
If a peer is offline, but only recently went offline and thus the channel has not yet been marked disabled in our gossip, we should be returning `LocalHTLCFailureReason::PeerOffline` rather than `LocalHTLCFailureReason::ChannelNotReady`. Here we fix the error returned and tweak documentation to make the cases clearer.
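As an illustrative-only sketch of that case split (the two reason names come from the commit text above; the types and logic below are simplified stand-ins, not LDK's actual code):

```rust
#[derive(Debug, PartialEq)]
enum Reason { PeerOffline, ChannelNotReady }

fn forwarding_failure(channel_ready: bool, peer_connected: bool) -> Option<Reason> {
    if !channel_ready {
        // The channel itself is not usable yet.
        Some(Reason::ChannelNotReady)
    } else if !peer_connected {
        // The channel is fine but the peer recently went offline (and may not
        // yet be marked disabled in gossip): blame the peer, not the channel.
        Some(Reason::PeerOffline)
    } else {
        None
    }
}

fn main() {
    assert_eq!(forwarding_failure(true, false), Some(Reason::PeerOffline));
}
```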
Force-pushed f2cd231 to 9af0db4.
Well, as users have been asking for this for quite a while and it's particularly important to LSPs, the plan is to eventually expose this in the LDK Node interface, and then also in LDK Server, potentially even via the RPC interface if we don't have a better idea. So we will need to find a more user-friendly abstraction, and I'd personally prefer if we could make first steps towards that at the LDK layer already.
Yes, users have been asking for the ability to reject HTLCs explicitly, also since so far LDK has no way to disable forwarding at all.
Granted, it's all wired back to the BP anyways, but in the user logic HTLC interception and event processing might be in completely unrelated parts of the codebase. And it seems most users end up with a pub/sub architecture where one thread/task continuously only publishes events so that other parts of the codebase can consume them. From what I've seen this also invites cases where they ~blindly mark events as processed, potentially risking loss of events in case they crash. So, sure, that's not our responsibility, but I'm mentioning this as I think a more user-friendly design that exposes dedicated interfaces for things like HTLC interception would help them avoid such risks.
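To illustrate the risk being described, a minimal, hypothetical sketch (plain `std::sync::mpsc`, nothing LDK-specific) of a pub/sub relay that effectively acknowledges events as soon as they are republished:

```rust
use std::sync::mpsc;

#[derive(Debug)]
enum Event { HtlcIntercepted { id: u64 } }

fn publisher(events: Vec<Event>, tx: mpsc::Sender<Event>) {
    for ev in events {
        // Risky pattern: if the event is "marked processed" at the moment it
        // is republished here, a subscriber crash after this point loses it
        // forever. Durable handling should only acknowledge once the real
        // work (e.g. deciding the fate of an intercepted HTLC) is done.
        tx.send(ev).expect("subscriber hung up");
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    publisher(vec![Event::HtlcIntercepted { id: 1 }], tx);
    for ev in rx {
        println!("handled {ev:?}");
    }
}
```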
Excuse the delay here! Did a first pass and in terms of logic changes nothing material stood out - though as noted above I think a more user-friendly interface would be preferable. But I still need to spend more time on the details of the last two commits.
```rust
	incoming_accept_underpaying_htlcs,
	next_packet_details_opt.map(|d| d.next_packet_pubkey),
) {
	Ok(info) => htlc_forwards.push((info, update_add_htlc.htlc_id)),
```
Re: "but it turns out only a single call of forward_htlcs should ever result in an HTLC being held": just want to note that AFAIU this might change when introducing receiver-side delays, as we might want to reuse (some of) the HTLC-holding logic for receives. But AFAICT the changes in this PR might even make this easier (mostly thinking out loud here).
Hmmmmmmm, maybe. Sadly I think the last case ("processing a phantom receive") will want to also have a receiver delay so it will result in duplicating the delay logic once. Not quite sure what to do about it, though, the alternative is to duplicate the hold logic in decode and in forward_htlcs which isn't really cleaner either.
| "Intercepted HTLC, generating intercept event with ID {intercept_id}" | ||
| ); | ||
| let ev_entry = (intercept_ev, None); | ||
| // It's possible we processed this intercept forward, |
Want to note that if we did not use the event pipeline we could avoid edge cases like this altogether and make the flow easier to reason about.
This is a side-effect of the WIP reconstruction logic. We now have code to not persist intercepted HTLCs but do not have corresponding code to not persist the intercept event that came with it. The fix is to not persist the events :)
joostjager
left a comment
I think the current commit structure is a great improvement over the previous iteration.
The diff of "Move HTLC interception decisions to forward_htlcs callsites" does remain difficult for me to verify, but it seems no further split is possible that would help much with that. It probably requires more familiarity with the pipeline, too.
lightning/src/ln/channelmanager.rs (Outdated)

```rust
match held_htlcs.entry(intercept_id) {
	hash_map::Entry::Vacant(entry) => {
		log_trace!(
			self.logger,
```
In the previous location this was a context logger.
Hmm, this is fixed two commits later. I guess I could try to fix it in this commit but not sure it matters?
```rust
// TODO: We do the fake SCID namespace check a bunch of times here (and indirectly via
// `forward_needs_intercept`, including as called in
// `can_forward_htlc_to_outgoing_channel`), we should find a way to reduce the number of
// times we do it.
```
What are the options for that?
We need to change the fake_scid API to allow us to ask which namespace an SCID is in, rather than asking it repeatedly "is it of namespace X?". Not 100% trivial though, so I left it for future work.
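A hypothetical sketch of that API shape, with invented names and a toy classification (real fake-SCID namespaces are derived from encrypted SCID bytes, not simple ranges):

```rust
// Classify an SCID's fake namespace once, instead of repeated
// "is it namespace X?" boolean checks scattered across the pipeline.
#[derive(Debug, PartialEq)]
enum FakeScidNamespace { Intercept, Phantom }

fn classify_scid(scid: u64) -> Option<FakeScidNamespace> {
    // Toy stand-in classification for illustration only.
    match scid % 3 {
        0 => Some(FakeScidNamespace::Intercept),
        1 => Some(FakeScidNamespace::Phantom),
        _ => None, // a real (or at least non-fake) SCID
    }
}

fn main() {
    // One classification call can now drive every downstream branch.
    assert_eq!(classify_scid(3), Some(FakeScidNamespace::Intercept));
    assert_eq!(classify_scid(2), None);
}
```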
What would you propose? In terms of "an API that makes sense to expose over an RPC", I'd think that the current flags-based API is way better than a trait that makes the same decisions. An RPC interface would presumably have to be async (ie really can't be a trait) and can't decide whether to intercept live, but rather somewhat has to have a simpler filter language (like the flags proposed here).

This API certainly allows for that? Though of course in general we don't want people to "disable forwarding" or to reject HTLCs being forwarded to public channels, cause they should just make the channel not-public if they want that.

Mmm, that's a fair point, but I think the "it has to be async if it's gonna go over an RPC" issue noted above is pretty constraining. Luckily, in this case if they fail to process an HTLC we will eventually fail it back before it times out (and on restart may regenerate the intercepted event now, thanks to val's work), so I don't think it's super urgent here.
In the next commit we'll substantially expand the types of HTLCs which can be intercepted. In order to do so, we want to make forwarding decisions with access to the (specified) destination channel. Sadly, this isn't available in `forward_htlcs`, so here we move interception decisions out of `forward_htlcs` and into `process_pending_update_add_htlcs` and `handle_release_held_htlc`.

Note that we do not handle HTLC interception when forwarding an HTLC which was decoded in LDK versions prior to 0.2, which is noted in a suggested release note. This is due to a gap where such an HTLC might have had its routing decision made already and be waiting for an interception decision in `forward_htlcs`, but now we will only make an interception decision when decoding the onion.
At various points we've had requests to support more generic HTLC interception in LDK. In most cases, full HTLC interception was not, in fact, the right way to accomplish what the developer wanted, but there have been various times when it might have been. Here, we finally add full HTLC interception support, doing so with a configurable bitfield to allow developers to intercept only certain classes of HTLCs (see the sketch after this list). Specifically, we currently support intercepting HTLCs:

* which were to be forwarded to intercept SCIDs (as was already supported),
* which were to be forwarded to offline private channels (for LSPs to accept HTLCs for offline clients so that they can attempt to wake them before failing the HTLC),
* which were to be forwarded to online private channels (for LSPs to take additional fees or enforce certain policies),
* which were to be forwarded over public channels (for general forwarding policy enforcement),
* which were to be forwarded to unknown SCIDs (for everything else).
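A hypothetical sketch of what such a configurable bitfield could look like; the constant names below are invented for illustration, not the PR's actual API:

```rust
// One bit per interceptable HTLC class, mirroring the list above.
const INTERCEPT_SCIDS: u32 = 1 << 0; //          intercept SCIDs (pre-existing)
const OFFLINE_PRIVATE_CHANNELS: u32 = 1 << 1; // wake offline LSP clients
const ONLINE_PRIVATE_CHANNELS: u32 = 1 << 2; //  extra fees / client policies
const PUBLIC_CHANNELS: u32 = 1 << 3; //          general forwarding policy
const UNKNOWN_SCIDS: u32 = 1 << 4; //            everything else

fn should_intercept(config: u32, htlc_class: u32) -> bool {
    config & htlc_class != 0
}

fn main() {
    // An LSP that wants to wake offline clients and police client channels:
    let config = OFFLINE_PRIVATE_CHANNELS | ONLINE_PRIVATE_CHANNELS;
    assert!(should_intercept(config, OFFLINE_PRIVATE_CHANNELS));
    assert!(!should_intercept(config, PUBLIC_CHANNELS));
}
```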
Force-pushed 1ebe8a6 to c0121d4.
For LSPS5 (and, generally, being an LSP), we need the ability to intercept and hold HTLCs destined for offline private channels, attempt to wake the clients, and forward the HTLC after they connect (assuming they do). While we're adding interception for additional destinations, there isn't any additional complexity in just making it a bitmask and supporting arbitrary interception, so we do that as well.