Skip to content

Conversation

@TheBlueMatt (Collaborator)

At various points we've had requests to support more generic HTLC
interception in LDK. In most cases, full HTLC interception was not,
in fact, the right way to accomplish what the developer wanted, but
there have been various times when it might have been.

Here, we finally add full HTLC interception support, doing so with
a configurable bitfield to allow developers to intercept only
certain classes of HTLCs.

Specifically, we currently support intercepting HTLCs:
 * which were to be forwarded to intercept SCIDs (as was already
   supported),
 * which were to be forwarded to offline private channels (for LSPs
   to accept HTLCs for offline clients so that they can attempt to
   wake them before failing the HTLC),
 * which were to be forwarded to online private channels (for LSPs
   to take additional fees or enforce certain policies),
 * which were to be forwarded over public channels (for general
   forwarding policy enforcement),
 * which were to be forwarded to unknown SCIDs (for everything
   else).

Similar to a few commits ago, this requires inlining the HTLC
interception logic from `forward_htlcs` to its callsites,
specifically `process_pending_update_add_htlcs` and
`handle_release_held_htlc`.

Note that we do not handle HTLC interception when forwarding an
HTLC which was decoded in LDK versions prior to 0.2, which is noted
in a suggested release note.

For LSPS5 (and, generally, being an LSP), we need the ability to intercept and hold HTLCs destined for offline private channels, attempt to wake the clients, and forward the HTLCs after they connect (assuming they do). While we're adding interception for additional destinations anyway, there's no real additional complexity in making it a bitmask and supporting arbitrary interception, so we do that as well.
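
As a rough illustration only (the flag names, values, and config field below are hypothetical, not LDK's actual API), a flags-style interception config might look something like this:

```rust
/// Hypothetical bitmask of HTLC classes to intercept (illustrative only; LDK's
/// actual flag names, values, and config field may differ).
pub mod intercept_flags {
    pub const INTERCEPT_SCIDS: u8 = 1 << 0; // fake SCIDs allocated for interception
    pub const OFFLINE_PRIVATE_CHANNELS: u8 = 1 << 1; // LSP clients that are currently offline
    pub const ONLINE_PRIVATE_CHANNELS: u8 = 1 << 2; // connected private channels
    pub const PUBLIC_CHANNELS: u8 = 1 << 3; // announced channels
    pub const UNKNOWN_SCIDS: u8 = 1 << 4; // everything else
}

/// Whether a forward of the given class should generate an interception event.
fn should_intercept(configured_flags: u8, htlc_class: u8) -> bool {
    configured_flags & htlc_class != 0
}

fn main() {
    // An LSP that only wants to hold HTLCs destined for intercept SCIDs and
    // offline clients, letting everything else forward normally.
    let flags = intercept_flags::INTERCEPT_SCIDS | intercept_flags::OFFLINE_PRIVATE_CHANNELS;
    assert!(should_intercept(flags, intercept_flags::OFFLINE_PRIVATE_CHANNELS));
    assert!(!should_intercept(flags, intercept_flags::PUBLIC_CHANNELS));
}
```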

@ldk-reviews-bot commented Jan 5, 2026

👋 Thanks for assigning @joostjager as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@tnull tnull self-requested a review January 5, 2026 20:25
@codecov (bot) commented Jan 5, 2026

Codecov Report

❌ Patch coverage is 82.75862% with 35 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.59%. Comparing base (c5d7b13) to head (c0121d4).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| lightning/src/ln/channelmanager.rs | 84.77% | 28 Missing and 2 partials ⚠️ |
| lightning/src/util/config.rs | 20.00% | 3 Missing and 1 partial ⚠️ |
| lightning/src/ln/functional_test_utils.rs | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4300   +/-   ##
=======================================
  Coverage   86.59%   86.59%           
=======================================
  Files         158      158           
  Lines      102368   102402   +34     
  Branches   102368   102402   +34     
=======================================
+ Hits        88641    88671   +30     
- Misses      11313    11316    +3     
- Partials     2414     2415    +1     
| Flag | Coverage | Δ |
| --- | --- | --- |
| fuzzing | 36.93% <37.43%> | -0.01% ⬇️ |
| tests | 85.88% <83.16%> | +<0.01% ⬆️ |


@tnull (Contributor) left a comment

IIRC, in #3843 and other prior contexts we had discussed that we might want to move away from Event-based interception to a dedicated handler or trait based interface. What do you think about going that way instead? Do you think it would still be easily possible with the approach you took here?

@joostjager (Contributor) left a comment

No review yet, just posting form comments.

// `ChannelManager`. Locks can be held at the same time if they are on the same branch in the tree,
// and should then be taken in the order of the lowest to the highest level in the tree.
// Note that locks on different branches shall not be taken at the same time, as doing so will
// create a new lock order for those specific locks in the order they were taken.
Contributor

I actually found this a useful explanation. Agreed that it isn't ideal to keep this updated. Can't the comment be formulated in a generic way? Something along the lines of what you put in the commit msg maybe.

Collaborator Author

Hmm, maybe something in CONTRIBUTING.md? It's not specific to channelmanager.rs or ChannelManager mutexes; generally we kinda just expect devs to "put mutexes where they're needed, make sure you have test coverage, and only worry about lockorder if the test suite complains".

Contributor

CONTRIBUTING.md seems to describe process rather than architecture. In #4296 I am proposing to add README.md for the lightning crate. It could go there perhaps.

Removing every explanation about this, not sure about that. But if other reviewers agree that it is redundant, I guess it's ok.

Collaborator Author

CONTRIBUTING.md is important things that contributors need to know, README.md is high-level things that downstream devs need to know. This is definitely not something that should go in README.md, IMO, as downstream devs shouldn't need to think about this at all.

Contributor

+1 on this. I think having these docs is valuable and the tree gives at least somewhat of a structured overview of what important fields are there. This is probably useful for everybody, but in particular for new contributors. channelmanager.rs is still 20k lines, so let's not further reduce the few docs we have that help people orient themselves in this huge collection of objects and methods?

Collaborator Author

somewhat of a structured overview of what important fields are there

I mean we can give a "list of important fields", then? The explicit lockorder documentation is generally likely to be wrong (does everyone reviewing PRs really check that no lockorder changes happened that invalidated the documentation? I certainly don't), which makes its presence worse than simply not having it.

});
let shared_secret = next_hop.shared_secret().secret_bytes();

macro_rules! fail_htlc {
Contributor

Is there no way to avoid this? Macros are so annoying in navigation and debugging.

Collaborator Author

It can be a function, but then it can't include the `continue` at the end, and having the `continue` in the macro materially reduces the chance of screwing it up (you can't obviously see that there's a missing `continue` just by looking at the callsite). Given it's local to a function and trivial, I don't see much issue with the macro vs trying to use a function and taking the extra chance of future bugs.

Contributor

I'd argue the opposite. A continue hidden inside a macro actually hurts readability at the call site.

Collaborator Author

I can rename the macro fail_htlc_continue if you prefer.

@tnull (Contributor) Jan 14, 2026

Not that it matters much, but also +1 here, would prefer not to hide-away control flow statements in macros, even if that means that the code is not as DRY.

Contributor

fail_htlc_continue to me conveys something like "the HTLC is going to continue". Macro names signaling control flow don't work for me.

Collaborator Author

Can also rename fail_htlc_and_continue or even fail_htlc_continue_to_next. It's not that I love control flow in macros either, it's that the alternative (a function with required control flow at the callsite) has a much higher chance of future bugs. Not really sure what the right answer is there.
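
To make the trade-off concrete, here is a toy sketch contrasting the two approaches; all names are invented for this example and are not LDK's actual internals:

```rust
// Illustrative only: a local macro that carries the `continue` vs. a helper
// function that leaves the `continue` to the caller.
macro_rules! fail_htlc_and_continue {
    ($errors:expr, $htlc_id:expr, $reason:expr) => {{
        $errors.push(($htlc_id, $reason));
        continue; // control flow lives inside the macro, so callers can't forget it
    }};
}

fn fail_htlc(errors: &mut Vec<(u64, &'static str)>, htlc_id: u64, reason: &'static str) {
    errors.push((htlc_id, reason));
    // the caller must remember to `continue` after calling this
}

fn main() {
    let mut errors: Vec<(u64, &'static str)> = Vec::new();
    for htlc_id in 0..3u64 {
        if htlc_id == 1 {
            fail_htlc_and_continue!(errors, htlc_id, "policy");
        }
        if htlc_id == 2 {
            fail_htlc(&mut errors, htlc_id, "policy");
            continue; // easy to miss without the macro
        }
    }
    assert_eq!(errors.len(), 2);
}
```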

@joostjager joostjager self-requested a review January 6, 2026 13:17
@TheBlueMatt (Collaborator Author)

IIRC, in #3843 and other prior contexts we had discussed that we might want to move away from Event-based interception to a dedicated handler or trait based interface. What do you think about going that way instead?

I'm not sure if it's necessary. I do think that we need to not have a simple boolean "intercept everything" vs "intercept nothing", but the flags approach here seemed simple and I think likely covers the types of decisions a "do you want to consider intercepting this" trait would make. It's possible we're missing some granularity, but likely not too much (maybe the one thing would be to intercept things coming from private channels so that you can give them different fees?).

Now the alternative trait would be one that could return intercept/forward/reject, which there might be some argument for here, mostly for cases where someone wants to limit HTLCs in some cases (eg policies like "only allow forwards to channels with limited liquidity from my private channels/LSP clients, but allow forwards to channels with more liquidity from anyone") where avoiding the event pipeline would be nice.

That said, currently we're basically entirely single-threaded when forwarding no matter what - it all happens in a single method called from the BP. Pushing forwards through the event pipeline just makes forwarding decisions natively-async, it doesn't actually slow them down any more than they already were. Now, there's likely some future case where we want to parallelize forwards a bit and move some of the forwarding work into something that's called in parallel (no idea what that would be - maybe just fetching some fresh commitment transactions for peers in a get_and_clear_pending_msg_events call?), but even there the actual forwarding decision should be quite cheap, so hopefully not something we care too much about. There's some extra memory allocation involved in pushing into the event queue, but it should be pretty amortized with the event queue vec.

Do you think it would still be easily possible with the approach you took here?

I don't believe it would be a major change, no, just more boilerplate.
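
For comparison, a rough sketch of what such an intercept/forward/reject trait could look like; this is not an actual or proposed LDK interface, just the general idea made concrete with invented names and fields:

```rust
// Hypothetical shape of a trait-based interception handler (illustrative only).
pub enum InterceptDecision {
    Forward,
    Intercept,
    Reject { reason_code: u16 },
}

pub struct ForwardInfo {
    pub incoming_amount_msat: u64,
    pub outgoing_scid: u64,
    pub outgoing_channel_is_public: bool,
}

pub trait HtlcInterceptHandler {
    /// Called synchronously while forwards are processed, so it must be cheap.
    fn decide(&self, forward: &ForwardInfo) -> InterceptDecision;
}

/// Example policy: reject large forwards destined for private channels, intercept
/// the remaining private-channel forwards, and let public-channel forwards through.
struct LspPolicy;

impl HtlcInterceptHandler for LspPolicy {
    fn decide(&self, forward: &ForwardInfo) -> InterceptDecision {
        if !forward.outgoing_channel_is_public && forward.incoming_amount_msat > 10_000_000 {
            InterceptDecision::Reject { reason_code: 0x2002 } // e.g. a temporary failure code
        } else if !forward.outgoing_channel_is_public {
            InterceptDecision::Intercept
        } else {
            InterceptDecision::Forward
        }
    }
}

fn main() {
    let policy = LspPolicy;
    let forward = ForwardInfo {
        incoming_amount_msat: 50_000_000,
        outgoing_scid: 42,
        outgoing_channel_is_public: false,
    };
    assert!(matches!(policy.decide(&forward), InterceptDecision::Reject { .. }));
}
```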

@joostjager (Contributor)

I'd say that moving to a non-event-based interface would be too much scope creep. The flags-based approach is the simplest way to enable the requested functionality?

@joostjager joostjager removed their request for review January 8, 2026 07:36
@tnull tnull self-requested a review January 8, 2026 13:41
@TheBlueMatt TheBlueMatt self-assigned this Jan 8, 2026
@ldk-reviews-bot

🔔 1st Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

🔔 2nd Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@carlaKC (Contributor) left a comment

First pass, haven't looked at the tests yet!

While it's nice to document things, the lockorder comment at the top
of `ChannelManager` is just annoying to always update and doesn't
add all that much value. Developers likely shouldn't be checking it
while writing code, our automated lockorder issue detection
framework more than suffices to catch any bugs in test-reachable
code. That makes it basically write-only which isn't exactly a
useful comment.
When we added support for async payments (which requires holding
HTLCs until we receive an onion message), we added the hold logic
to `ChannelManager::forward_htlcs`. This made sense as we reused
the forwarding data structure in the holding logic and so already
had the right types in place, but it turns out only a single call
of `forward_htlcs` should ever result in an HTLC being held.

All of the other calls (un-holding an HTLC, forwarding an
intercepted HTLC, forwarding an HTLC decoded by LDK prior to 0.2,
or processing a phantom receive) should never result in an HTLC
being held. Instead, HTLCs should actually only ever be held when
the HTLC is decoded in `process_pending_update_add_htlcs` before
forwarding.

Because of this, and because we want to move the interception (and
thus also the holding logic) out of `forward_htlcs`, here we move
the holding logic into `process_pending_update_add_htlcs`.
If a peer is offline, but only recently went offline and thus the
channel has not yet been marked disabled in our gossip, we should
be returning `LocalHTLCFailureReason::PeerOffline` rather than
`LocalHTLCFailureReason::ChannelNotReady`. Here we fix the error
returned and tweak documentation to make the cases clearer.
@TheBlueMatt TheBlueMatt force-pushed the 2025-12-full-interception branch 5 times, most recently from f2cd231 to 9af0db4 Compare January 13, 2026 20:29
@tnull (Contributor) commented Jan 14, 2026

I'm not sure if it's necessary. I do think that we need to not have a simple boolean "intercept everything" vs "intercept nothing", but the flags approach here seemed simple and I think likely covers the types of decisions a "do you want to consider intercepting this" trait would make. It's possible we're missing some granularity, but likely not too much (maybe the one thing would be to intercept things coming from private channels so that you can give them different fees?).

Well, as users have been asking for this for quite a while and it's particularly important to LSPs, the plan is to eventually expose this in the LDK Node interface. And then also in LDK Server, potentially even via the RPC interface if we don't have a better idea. So we will need to find a more user-friendly abstraction, and I'd personally prefer if we could make first steps towards that at the LDK layer already.

Now the alternative trait would be one that could return intercept/forward/reject, which there might be some argument for here, mostly for cases where someone wants to limit HTLCs in some cases (eg policies like "only allow forwards to channels with limited liquidity from my private channels/LSP clients, but allow forwards to channels with more liquidity from anyone") where avoiding the event pipeline would be nice.

Yes, users have been asking for the ability to reject HTLCs explicitly, also since so far LDK has no way to disable forwarding at all.

That said, currently we're basically entirely single-threaded when forwarding no matter what - it all happens in a single method called from the BP. Pushing forwards through the event pipeline just makes forwarding decisions natively-async, it doesn't actually slow them down any more than they already were.

Granted that it's all wired back to BP anyways, but in the user logic HTLC interception and event processing might be in completely unrelated parts of the codebase. And it seems most users end up with a pub/sub architecture where one thread/task continuously only publishes events so that other parts of the codebase can consume them. From what I've seen this also invites cases where they ~blindly mark events as processed, potentially risking loss of events in case they crash. So, sure, that's not our responsibility, but I'm mentioning this as I think a more user-friendly design that exposes dedicated interfaces for things like HTLC interception would help them avoid such risks.

@tnull (Contributor) left a comment

Excuse the delay here! Did a first pass and in terms of logic changes nothing material stood out - though as noted above I think a more user-friendly interface would be preferable. But I still need to spend more time on the details of the last two commits.


incoming_accept_underpaying_htlcs,
next_packet_details_opt.map(|d| d.next_packet_pubkey),
) {
Ok(info) => htlc_forwards.push((info, update_add_htlc.htlc_id)),
Contributor

Re: "but it turns out only a single call of `forward_htlcs` should ever result in an HTLC being held": just want to note that AFAIU that might change when introducing receiver-side delays, as we might want to reuse (some of) the HTLC-holding logic for receives. But, AFAICT, the changes in this PR might even make this easier (mostly thinking out loud here).

Collaborator Author

Hmmmmmmm, maybe. Sadly I think the last case ("processing a phantom receive") will want to also have a receiver delay so it will result in duplicating the delay logic once. Not quite sure what to do about it, though, the alternative is to duplicate the hold logic in decode and in forward_htlcs which isn't really cleaner either.


"Intercepted HTLC, generating intercept event with ID {intercept_id}"
);
let ev_entry = (intercept_ev, None);
// It's possible we processed this intercept forward,
Contributor

Want to note that if we didn't use the event pipeline we could avoid edge cases like this altogether and make the flow easier to reason about.

Collaborator Author

This is a side-effect of the WIP reconstruction logic. We now have code to not persist intercepted HTLCs but do not have corresponding code to not persist the intercept event that came with it. The fix is to not persist the events :)

@joostjager (Contributor) left a comment

I think the current commit structure is a great improvement over the previous iteration.

The diff of "Move HTLC interception decisions to forward_htlcs callsites" does remain difficult to verify for me, but it seems no further split is possible that would help much with that. It probably requires more familiarity with the pipeline too.

match held_htlcs.entry(intercept_id) {
hash_map::Entry::Vacant(entry) => {
log_trace!(
self.logger,
Contributor

In the previous location this was a context logger.

Collaborator Author

Hmm, this is fixed two commits later. I guess I could try to fix it in this commit but not sure it matters?


// TODO: We do the fake SCID namespace check a bunch of times here (and indirectly via
// `forward_needs_intercept`, including as called in
// `can_forward_htlc_to_outgoing_channel`), we should find a way to reduce the number of
// times we do it.
Contributor

What are the options for that?

Collaborator Author

Need to change the fake_scid API to allow us to ask which namespace an SCID is in, rather than asking it repeatedly "is it of namespace X?". Not 100% trivial though, so I left it for future work.
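
For illustration, a sketch of the kind of API shape described here; the names are hypothetical and LDK's actual fake_scid module and namespace derivation work differently (real namespaces come from encrypted data, not a simple modulus):

```rust
// Hypothetical "classify once" API, illustrative only.
#[derive(Debug, PartialEq)]
enum FakeScidNamespace {
    Intercept,
    Phantom,
    OutboundAlias,
}

/// Classify an SCID once and reuse the answer, instead of asking
/// "is it of namespace X?" separately at every callsite.
fn classify_fake_scid(scid: u64) -> Option<FakeScidNamespace> {
    // Placeholder classification purely for illustration.
    match scid % 4 {
        0 => Some(FakeScidNamespace::Intercept),
        1 => Some(FakeScidNamespace::Phantom),
        2 => Some(FakeScidNamespace::OutboundAlias),
        _ => None, // a real (or unrecognized) SCID
    }
}

fn main() {
    let namespace = classify_fake_scid(8);
    assert_eq!(namespace, Some(FakeScidNamespace::Intercept));
}
```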

@TheBlueMatt (Collaborator Author)

Well, as users have been asking for this for quite a while and it's particularly important to LSPs, the plan is to eventually expose this in the LDK Node interface. And then also in LDK Server, potentially even via the RPC interface if we don't have a better idea. So we will need to find a more user-friendly abstraction, and I'd personally prefer if we could make first steps towards that at the LDK layer already.

What would you propose? In terms of "an API that makes sense to expose over an RPC" I'd think that the current flags-based API is way better than a trait that makes the same decisions. An RPC interface would presumably have to be async (ie really can't be a trait) and can't decide whether to intercept live, but rather somewhat has to have a simpler filter language (like the flags proposed here).

Yes, users have been asking for the ability to reject HTLCs explicitly, also since so far LDK has no way to disable forwarding at all.

This API certainly allows for that? Though of course in general we don't want people to "disable forwarding" or reject HTLCs being forwarded to public channels, 'cause they should just make the channel not-public if they want that.

Granted that it's all wired back to BP anyways, but in the user logic HTLC interception and event processing might be in completely unrelated parts of the codebase. And it seems most users end up with a pub/sub architecture where one thread/task continuously only publishes events so that other parts of the codebase can consume them. From what I've seen this also invites cases where they ~blindly mark events as processed, potentially risking loss of events in case they crash. So, sure, that's not our responsibility, but I'm mentioning this as I think a more user-friendly design that exposes dedicated interfaces for things like HTLC interception would help them avoid such risks.

Mmm, that's a fair point, but I think the "it has to be async if it's gonna go over an RPC" issue noted above is pretty constraining. Luckily in this case if they fail to process an HTLC we will eventually fail it back before it times out (and on restart may regenerate the intercepted event now, thanks to val's work), so I don't think it's super urgent here.

In the next commit we'll substantially expand the types of HTLCs
which can be intercepted. In order to do so, we want to make
forwarding decisions with access to the (specified) destination
channel. Sadly, this isn't available in `forward_htlcs`, so here we
move interception decisions out of `forward_htlcs` and into
`process_pending_update_add_htlcs` and `handle_release_held_htlc`.

Note that we do not handle HTLC interception when forwarding an
HTLC which was decoded in LDK versions prior to 0.2, which is noted
in a suggested release note. This is due to a gap where such an
HTLC might have had its routing decision made already and be
waiting for an interception decision in forward_htlcs, but now we
will only make an interception decision when decoding the onion.
@TheBlueMatt TheBlueMatt force-pushed the 2025-12-full-interception branch from 1ebe8a6 to c0121d4 Compare January 14, 2026 13:29