Fix spurious MPP pathfinding failure #3707
Conversation
👋 Thanks for assigning @TheBlueMatt as a reviewer!
// Aggregate the fees of the hops that come after this one, and use those fees to compute the
// maximum amount that this hop can contribute to the final value received by the payee.
let (next_hops_aggregated_base, next_hops_aggregated_prop) =
Doesn't using the aggregate version have rounding issues? Sadly if we're even one msat off I think the bug will persist :/
Yeah, I'm not sure there's an easy way to get rid of the rounding issues :(
Will move forward with adding a fudge factor when checking whether we've hit prevented_redundant_path_selection. It might also be good to prevent that code from exhausting last-hop channels.
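To make the rounding concern concrete, here is a minimal standalone sketch; the fold below is a simplified stand-in for the router's actual aggregation, and all fee values are hypothetical. Applying BOLT-style fees hop by hop truncates at different points than folding the hops into a single (base, proportional) pair up front, so the two can disagree by an msat:

```rust
const MILLIONTHS: u64 = 1_000_000;

// BOLT 7-style forwarding fee: base + amt * proportional / 1e6, with
// integer (truncating) division.
fn hop_fee(amt_msat: u64, base_msat: u64, prop_millionths: u64) -> u64 {
    base_msat + amt_msat * prop_millionths / MILLIONTHS
}

fn main() {
    // Two downstream hops, each charging 1 sat base + 0.1% proportional.
    let (base, prop) = (1_000u64, 1_000u64);
    let recv_msat = 999_999u64; // amount the payee should receive

    // Exact: apply the hops one at a time, innermost first.
    let after_hop2 = recv_msat + hop_fee(recv_msat, base, prop);
    let exact_send = after_hop2 + hop_fee(after_hop2, base, prop);

    // Aggregated: fold both hops into a single (base, prop) pair first
    // (one plausible fold, not the router's exact formula; each
    // multiplication truncates once).
    let agg_base = base + base + base * prop / MILLIONTHS;
    let agg_prop = prop + prop + prop * prop / MILLIONTHS;
    let approx_send = recv_msat + hop_fee(recv_msat, agg_base, agg_prop);

    // Truncation happens at different points, so the results disagree:
    assert_eq!(exact_send, 1_003_999);
    assert_eq!(approx_send, 1_004_000); // 1 msat off
}
```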
Force-pushed from ce7b12b to 9f0eb8f
Updated some docs and added a fudge factor when deciding if we've fully utilized a path.

diff --git a/lightning/src/routing/router.rs b/lightning/src/routing/router.rs
index 1d529b9b9..73cc6d687 100644
--- a/lightning/src/routing/router.rs
+++ b/lightning/src/routing/router.rs
@@ -2111,9 +2111,10 @@ impl<'a> PaymentPath<'a> {
cur_hop.hop_use_fee_msat = new_fee;
total_fee_paid_msat += new_fee;
} else {
- // It should not be possible because this function is called only to reduce the
- // value. In that case, compute_fee was already called with the same fees for
- // larger amount and there was no overflow.
+ // It should not be possible because this function is only called either to reduce the
+ // value or with a larger amount that was already checked for overflow in
+ // `compute_max_final_value_contribution`. In the former case, compute_fee was already
+ // called with the same fees for larger amount and there was no overflow.
unreachable!();
}
}
@@ -2121,6 +2122,9 @@ impl<'a> PaymentPath<'a> {
value_msat + extra_contribution_msat
}
+ // Returns the maximum contribution that this path can make to the final value of the payment. May
+ // be slightly lower than the actual max due to rounding errors when aggregating fees along the
+ // path.
fn compute_max_final_value_contribution(
&self, used_liquidities: &HashMap<CandidateHopId, u64>, channel_saturation_pow_half: u8
) -> u64 {
@@ -3331,9 +3335,15 @@ where L::Target: Logger {
.or_insert(spent_on_hop_msat);
let hop_capacity = hop.candidate.effective_capacity();
let hop_max_msat = max_htlc_from_capacity(hop_capacity, channel_saturation_pow_half);
- if *used_liquidity_msat == hop_max_msat {
- // If this path used all of this channel's available liquidity, we know
- // this path will not be selected again in the next loop iteration.
+
+ // If this path used all of this channel's available liquidity, we know this path will not
+ // be selected again in the next loop iteration.
+ //
+ // Allow the used amount to be slightly below the max when deciding whether a channel is
+ // fully utilized, to account for rounding errors in
+ // `PaymentPath::compute_max_final_value_contribution`.
+ let maxed_out_hop_liquidity_range = hop_max_msat.saturating_sub(1000)..(hop_max_msat.saturating_add(1));
+ if maxed_out_hop_liquidity_range.contains(used_liquidity_msat) {
prevented_redundant_path_selection = true;
}
debug_assert!(*used_liquidity_msat <= hop_max_msat);
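A side note on the range bounds in the diff above, as a standalone sketch with a made-up hop_max_msat: Rust ranges are half-open, so the saturating_add(1) on the end bound is what lets a hop sitting exactly at its maximum count as fully utilized.

```rust
fn main() {
    let hop_max_msat: u64 = 100_000;
    // Half-open range [max - 1000, max + 1): the saturating_add(1) on the end
    // bound is what makes hop_max_msat itself test as "fully utilized".
    let maxed_out = hop_max_msat.saturating_sub(1000)..hop_max_msat.saturating_add(1);
    assert!(maxed_out.contains(&hop_max_msat));            // exactly at the max
    assert!(maxed_out.contains(&(hop_max_msat - 1_000)));  // within the fudge window
    assert!(!maxed_out.contains(&(hop_max_msat - 1_001))); // below the window
}
```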
🔔 1st Reminder Hey @TheBlueMatt! This PR has been waiting for your review.
let max_path_contribution_msat = payment_path.compute_max_final_value_contribution(
	&used_liquidities, channel_saturation_pow_half
);
let desired_value_contribution = cmp::min(max_path_contribution_msat, final_value_msat);
Rather than being fuzzy, can we not have these determine whether we need to limit at a random hop?
That does simplify things a bit. Pushed with the following diff:
diff --git a/lightning/src/routing/router.rs b/lightning/src/routing/router.rs
index 73cc6d687..bc4210b5c 100644
--- a/lightning/src/routing/router.rs
+++ b/lightning/src/routing/router.rs
@@ -3326,7 +3326,6 @@ where L::Target: Logger {
// might have been computed considering a larger value.
// Remember that we used these channels so that we don't rely
// on the same liquidity in future paths.
- let mut prevented_redundant_path_selection = false;
for (hop, _) in payment_path.hops.iter() {
let spent_on_hop_msat = value_contribution_msat + hop.next_hops_fee_msat;
let used_liquidity_msat = used_liquidities
@@ -3335,20 +3334,9 @@ where L::Target: Logger {
.or_insert(spent_on_hop_msat);
let hop_capacity = hop.candidate.effective_capacity();
let hop_max_msat = max_htlc_from_capacity(hop_capacity, channel_saturation_pow_half);
-
- // If this path used all of this channel's available liquidity, we know this path will not
- // be selected again in the next loop iteration.
- //
- // Allow the used amount to be slightly below the max when deciding whether a channel is
- // fully utilized, to account for rounding errors in
- // `PaymentPath::compute_max_final_value_contribution`.
- let maxed_out_hop_liquidity_range = hop_max_msat.saturating_sub(1000)..(hop_max_msat.saturating_add(1));
- if maxed_out_hop_liquidity_range.contains(used_liquidity_msat) {
- prevented_redundant_path_selection = true;
- }
debug_assert!(*used_liquidity_msat <= hop_max_msat);
}
- if !prevented_redundant_path_selection {
+ if max_path_contribution_msat > value_contribution_msat {
// If we weren't capped by hitting a liquidity limit on a channel in the path,
// we'll probably end up picking the same path again on the next iteration.
// Decrease the available liquidity of a hop in the middle of the path.
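To spell out why the simpler comparison sidesteps the rounding fuzz, here's a sketch with hypothetical numbers: the router caps each path's contribution at the minimum of its computed max and the remaining payment amount, so both sides of the new comparison derive from the same (possibly rounded-down) max, and the test is exact regardless of rounding.

```rust
use std::cmp;

// Hypothetical values and a simplified decision helper; not the router's
// real control flow.
fn exhaust_middle_hop(max_path_contribution_msat: u64, remaining_value_msat: u64) -> bool {
    // The router never sends more than the path's computed max...
    let value_contribution_msat = cmp::min(max_path_contribution_msat, remaining_value_msat);
    // ...so this comparison is exact even if that max was rounded down: it is
    // true only when the path was capped by the remaining payment amount,
    // meaning a middle hop should be exhausted to force path diversity.
    max_path_contribution_msat > value_contribution_msat
}

fn main() {
    // Path capped by its own capacity: do not pre-exhaust anything.
    assert!(!exhaust_middle_hop(100_000, 250_000));
    // Path could carry more than the payment still needs: the same path would
    // be found again, so a middle hop gets pre-emptively exhausted.
    assert!(exhaust_middle_hop(100_000, 60_000));
}
```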
This bug was recently surfaced to us by a user who wrote a test where the sender is attempting to route an MPP payment split across the two channels that it has with its LSP, where the LSP has a single large channel to the recipient.

Previously this led to a pathfinding failure because our router was not sending the maximum possible value over the first MPP path it found due to overestimating the fees needed to cover the following hops. Because the path that had just been found was not maxed out, our router assumed that we had already found enough paths to cover the full payment amount and that we were finding additional paths for the purpose of redundant path selection. This caused the router to mark the recipient's only channel as exhausted, with the intention of choosing more unique paths in future iterations. In reality, this ended up with the recipient's only channel being disabled and subsequently failing to find a route entirely.

Update the router to fully utilize the capacity of any paths it finds in this situation, preventing the "redundant path selection" behavior from kicking in.
Force-pushed from 9f0eb8f to 209cb2a
Nice to see a fix in the works, thanks @valentinewallace!
A question that's somewhat related - will this PR also have the effect of generally reducing the # of shards over which MPPs are sent, since found paths are maxed out? We're currently dealing with issues in prod where sending an MPP payment which requires only two of the sender's outbound channels (i.e. two MPP shards) is actually producing an MPP route with seven shards. With a per-path success rate of 85% (our observed probing success rate), the overall payment success rate drops to a measly 0.85^7 ≈ 32%. And the computed maximum per-path value contribution accounts for the scorer's liquidity bounds, correct?
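(An aside on the arithmetic above, as a quick sketch with nothing LDK-specific: with independent per-shard success probability p, the whole payment succeeds only if every shard does, i.e. with probability p^n.)

```rust
fn main() {
    // Each MPP shard must succeed for the payment to succeed, so the overall
    // success probability compounds as p^n.
    let p: f64 = 0.85;
    for n in [2, 7] {
        println!("{n} shards: {:.1}% overall success", p.powi(n) * 100.0);
    }
    // Prints roughly 72% for 2 shards and 32% for 7 shards.
}
```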
Hey @MaxFangX, this PR should reduce the number of shards in some cases but I'm not sure that it fully accounts for 7 shards vs 2... Let me know if you have logs for this case or any way to reproduce 👀
Actually no, the liquidity bounds are used when sorting paths to pick the paths that are the most likely to succeed. Interesting point though, maybe something to consider.
I opened an issue to continue the discussion at #3717
I think for us it's not imperative that the liquidity bounds are incorporated while computing the max per-path value contribution since the limiting factor is usually the channel that the user has with our LSP, rather than a public channel in the network. Just speculating here, but I wonder if it would be a good strategy to:
This would have the effect of simultaneously sending the highest value along the highest probability paths, as well as reducing the number of paths overall.
✅ Added second reviewer: @wpaulino
This bug was recently surfaced to us by a user who wrote a test where the sender is attempting to route an MPP payment split across the two channels that it has with its LSP, where the LSP has a single large channel to the recipient.
Previously this led to a pathfinding failure because our router was not sending the maximum possible value over the first MPP path it found due to overestimating the fees needed to cover a later hop in the route. Because the path that had just been found was not maxed out, our router assumed that we had already found enough paths to cover the full payment amount and that we were finding additional paths for the purpose of redundant path selection. This caused the router to mark the recipient's only channel as exhausted, with the intention of choosing more unique paths in future iterations. In reality, this ended up with the recipient's only channel being disabled and subsequently failing to find a route entirely.
Update the router to fully utilize the capacity of any paths it finds in this situation, preventing the "redundant path selection" behavior from kicking in.
Closes #3685