[bug]: on-chain intercepts evicted after one block due to unset AutoFailHeight #10892

@kovin-muun

Pre-Submission Checklist

  • I have searched the existing issues and believe this is a new bug.
  • I am not asking a question about how to use lnd, but reporting a bug.

LND Version

v0.19.3-beta (the bug is present unchanged on current master, afeb9e1)

LND Configuration

Relevant settings:

requireinterceptor=true

A gRPC client is permanently connected to routerrpc.Router/HtlcInterceptor and resolves intercepted HTLCs (settle with preimage / fail / resume).

Backend Version

Bitcoin Core (the bug is backend-independent and reproducible on regtest)

Backend Configuration

Not related to this bug.

OS/Distribution

Linux (Kubernetes)

Bug Details & Steps to Reproduce

Summary: HTLCs offered to the interceptor through the on-chain resolution flow (witness_beacon.go, added in #6219) are silently evicted from the interceptor held set on the first new block after being offered, because their AutoFailHeight is never set and the interceptor watchdog sweep (added in #6831) treats the zero value as "already expired". After the eviction, a Settle from the interceptor returns fwd not found (and tears down the interceptor stream), the preimage never reaches the witness beacon, and the HTLC is eventually claimed by the counterparty via the timeout path. This caused a direct loss of funds for us (incident details below).

Mechanism

When a channel force-closes with an unresolved intercepted HTLC, the incoming contest resolver offers it to the interceptor again so that it can still supply the preimage for an on-chain claim. The packet is built without AutoFailHeight (left at its zero value):

lnd/witness_beacon.go, lines 96 to 110 in afeb9e1:

    packet := &htlcswitch.InterceptedPacket{
        Hash:           htlc.RHash,
        IncomingExpiry: htlc.RefundTimeout,
        IncomingAmount: htlc.Amt,
        IncomingCircuit: models.CircuitKey{
            ChanID: chanID,
            HtlcID: htlc.HtlcIndex,
        },
        OutgoingChanID:       payload.FwdInfo.NextHop,
        OutgoingExpiry:       payload.FwdInfo.OutgoingCTLV,
        OutgoingAmount:       payload.FwdInfo.AmountToForward,
        InOnionCustomRecords: payload.CustomRecords(),
        InWireCustomRecords:  htlc.CustomRecords,
    }
    copy(packet.OnionBlob[:], nextHopOnionBlob)

The forward enters the same heldHtlcSet as off-chain intercepts. On every new block, failExpiredHtlcs calls popAutoFails:

    func (h *heldHtlcSet) popAutoFails(height uint32, cb func(InterceptedForward)) {
        for key, fwd := range h.set {
            // For on-chain forwards AutoFailHeight is 0, so this is
            // 0 > height: never true.
            if uint32(fwd.Packet().AutoFailHeight) > height {
                continue
            }

            // FailWithCode -> ErrCannotFail for on-chain forwards:
            // logged and ignored.
            cb(fwd)

            // Entry removed unconditionally.
            delete(h.set, key)
        }
    }

For an on-chain forward, FailWithCode returns ErrCannotFail ("cannot fail in the on-chain flow") by design — but the entry is deleted from the set regardless. The net effect is that every on-chain intercepted HTLC survives in the held set for at most one block (~10 minutes). Any Settle arriving after that gets fwd not found, which additionally terminates the whole HtlcInterceptor stream for all other in-flight intercepts.
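The eviction can be reproduced in isolation with a minimal sketch. The types below (circuitKey, heldForward) are simplified stand-ins invented for this illustration, not lnd's real types; only the loop body follows the popAutoFails logic quoted above.

```go
package main

import "fmt"

// circuitKey and heldForward are hypothetical, simplified stand-ins for
// lnd's models.CircuitKey and InterceptedForward.
type circuitKey struct {
	chanID uint64
	htlcID uint64
}

type heldForward struct {
	autoFailHeight int32 // stays zero when the on-chain path never sets it
	onChain        bool
}

// popAutoFails mirrors the quoted logic: every entry whose autoFailHeight
// is not strictly greater than height is passed to cb and then deleted,
// regardless of what cb did with it.
func popAutoFails(set map[circuitKey]heldForward, height uint32,
	cb func(heldForward)) {

	for key, fwd := range set {
		if uint32(fwd.autoFailHeight) > height {
			continue
		}
		cb(fwd)
		delete(set, key)
	}
}

func main() {
	const tip = 952545

	set := map[circuitKey]heldForward{
		// Off-chain intercept: its deadline is well above the tip.
		{chanID: 1, htlcID: 0}: {autoFailHeight: 952900},
		// On-chain intercept: autoFailHeight was never set.
		{chanID: 2, htlcID: 0}: {onChain: true},
	}

	popAutoFails(set, tip, func(f heldForward) {
		fmt.Printf("evicting forward, onChain=%v\n", f.onChain)
	})
	fmt.Println("entries left:", len(set))
	// Prints:
	// evicting forward, onChain=true
	// entries left: 1
}
```

Only the on-chain entry is removed on the first block, while the off-chain entry with a real deadline survives.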

The off-chain interception path sets the field correctly:

    intercepted := &interceptedForward{
        htlc:       htlc,
        packet:     packet,
        htlcSwitch: s.htlcSwitch,
        autoFailHeight: int32(packet.incomingTimeout -
            s.cltvRejectDelta),
    }

Only the on-chain rescue path is therefore affected. The two interacting changes were merged about six months apart (#6219 in Apr 2022, #6831 in Oct 2022), so every release since v0.16.0 is affected.

Steps to reproduce (regtest)

  1. Three nodes A -> B -> C. B runs with requireinterceptor=true and has an interceptor client connected.
  2. Pay an invoice from A to C. Have the interceptor hold the intercepted HTLC at B (no resolution yet).
  3. Force-close the A–B channel. Once the close confirms, the contest resolver offers the HTLC to the interceptor through the on-chain flow.
  4. Mine one block. B logs [ERR] HSWC: Cannot fail packet: cannot fail in the on-chain flow and the held entry is gone.
  5. Have the interceptor send Settle with the correct preimage: the resolution fails with fwd not found, the interceptor stream is terminated, and B never claims the HTLC on-chain even though it had the preimage well before the HTLC's expiry.

Production incident (all times UTC)

| When | What happened |
| --- | --- |
| Jun 6, 00:01 | Peer force-closed the incoming channel (block 952544) carrying an unresolved intercepted HTLC (~78k sats, expiry height 952959, about 415 blocks / ~3 days away). |
| Jun 6, 04:26 | lnd restarted (unrelated operational event). On startup the contest resolver offered the HTLC to the interceptor → held. |
| Jun 6, 04:31 | First new block after startup: two "Cannot fail packet: cannot fail in the on-chain flow" errors; both on-chain intercepts from that close were evicted from the held set. |
| Jun 6, 07:29 | Interceptor sent Settle with the correct preimage → fwd (Chan ID=951172:715:0, HTLC ID=19881) not found, stream torn down. |
| Jun 6–9 | No further offers (no restart), so the preimage never reached the witness beacon. |
| Jun 9 | Expiry passed; the counterparty claimed the timeout path. ~78k sats lost despite the interceptor holding the preimage ~2.5 days before expiry. |

Severity

Silent loss of funds. The exposure is burst-shaped rather than a steady drip: any incident that produces force closes of channels carrying pending intercepted HTLCs disables the on-chain rescue path for all of them at once, with only an easily-missed ERR log line as a trace. Effectively, #6219's on-chain interception recovery has been broken since #6831 for any Settle that arrives more than one block after the offer.

Proposed fix

The current guidelines indicate:

If you spot a glaring issue, we may still merge the fix or take it over ourselves. And if you're a new developer who notices an issue with the code, consider opening a detailed issue instead of a PR.

However, the criticality of the bug and the narrow scope of the fix convinced us to submit a proposed fix as PR #10893.

Expected Behavior

An on-chain intercepted forward should remain available to the interceptor until the HTLC's actual on-chain expiry (RefundTimeout). A Settle arriving at any time before expiry should hand the preimage to the witness beacon so the resolver can claim the output. There is no reason to auto-fail these entries earlier: the channel is already closed (so there is no force close left to prevent), and failing them back is impossible by construction (FailWithCode returns ErrCannotFail).
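As a rough illustration of this expected behavior, here is a minimal sketch of a sweep that keeps on-chain entries until their incoming expiry and never tries to fail them back. It is not necessarily the approach taken in #10893; heldForward, evictionHeight and popExpired are hypothetical names introduced for this example only.

```go
package main

import "fmt"

// heldForward is a hypothetical, simplified stand-in for an entry in the
// interceptor's held set.
type heldForward struct {
	autoFailHeight int32  // set by the off-chain path only
	incomingExpiry uint32 // RefundTimeout of the incoming HTLC
	onChain        bool
}

// evictionHeight returns the height at which an entry may leave the set:
// the on-chain expiry for on-chain forwards, the auto-fail height otherwise.
func evictionHeight(f heldForward) uint32 {
	if f.onChain {
		return f.incomingExpiry
	}
	return uint32(f.autoFailHeight)
}

// popExpired removes entries whose eviction height has been reached. The
// fail callback is invoked for off-chain forwards only, because an on-chain
// forward cannot be failed back.
func popExpired(set map[uint64]heldForward, height uint32,
	fail func(heldForward)) {

	for key, fwd := range set {
		if evictionHeight(fwd) > height {
			continue
		}
		if !fwd.onChain {
			fail(fwd)
		}
		delete(set, key)
	}
}

func main() {
	set := map[uint64]heldForward{
		19881: {onChain: true, incomingExpiry: 952959},
	}
	noFail := func(heldForward) {}

	popExpired(set, 952545, noFail)
	fmt.Println("held at 952545:", len(set)) // held at 952545: 1

	popExpired(set, 952959, noFail)
	fmt.Println("held at 952959:", len(set)) // held at 952959: 0
}
```

With this shape, a Settle arriving at any height below the incoming expiry would still find the entry in the held set.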

Debug Information

The two log lines an affected HTLC produces (default log levels):

2026-06-06 04:31:14.509 [ERR] HSWC: Cannot fail packet: cannot fail in the on-chain flow
2026-06-06 07:29:01.230 [ERR] RPCS: [/routerrpc.Router/HtlcInterceptor]: fwd (Chan ID=951172:715:0, HTLC ID=19881) not found

The first line fires exactly once per evicted on-chain intercept, so grepping for cannot fail in the on-chain flow over historical logs counts how many times the bug has fired on a given node.
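For nodes where grep is inconvenient, the same count can be sketched in Go. The helper names (countEvictions, countEvictionsInText) are hypothetical; the marker string is the message from the first log line above.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// evictionMarker is the message logged once per evicted on-chain intercept.
const evictionMarker = "cannot fail in the on-chain flow"

// countEvictions scans a log stream line by line and counts the lines that
// contain the eviction marker.
func countEvictions(r io.Reader) (int, error) {
	scanner := bufio.NewScanner(r)
	count := 0
	for scanner.Scan() {
		if strings.Contains(scanner.Text(), evictionMarker) {
			count++
		}
	}
	return count, scanner.Err()
}

// countEvictionsInText is a convenience wrapper for in-memory log text.
func countEvictionsInText(text string) (int, error) {
	return countEvictions(strings.NewReader(text))
}

func main() {
	logs := "2026-06-06 04:31:14.509 [ERR] HSWC: Cannot fail packet: " +
		"cannot fail in the on-chain flow\n" +
		"2026-06-06 07:29:01.230 [ERR] RPCS: " +
		"[/routerrpc.Router/HtlcInterceptor]: " +
		"fwd (Chan ID=951172:715:0, HTLC ID=19881) not found\n"

	n, err := countEvictionsInText(logs)
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	fmt.Println("evicted on-chain intercepts:", n)
	// Prints: evicted on-chain intercepts: 1
}
```

To scan a real log file, pass an opened file to countEvictions instead of the in-memory string.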

Environment

lnd runs in Kubernetes; the interceptor client is a JVM service connected over the local network. The intercepted HTLCs are payments to our users, which is why the interceptor may legitimately hold a forward for hours before settling (waiting for the recipient to come online) — the window in which this bug destroys the rescue path.
