Skip to content

perf: eliminate triple-collect and double-pass in uphill routing#4247

Open
Basedfloppa wants to merge 3 commits into
freenet:mainfrom
Basedfloppa:perf/routing-engine-triple-collect
Open

perf: eliminate triple-collect and double-pass in uphill routing#4247
Basedfloppa wants to merge 3 commits into
freenet:mainfrom
Basedfloppa:perf/routing-engine-triple-collect

Conversation

@Basedfloppa
Copy link
Copy Markdown
Contributor

Closes items 1 and 3 from #4243.

Why

select_uphill_hop is on the CONNECT hot path — it runs on every uphill routing decision when a peer is at terminus but cannot accept. The function had two unnecessary allocation patterns that add latency proportional to candidate count:

  1. Triple-collect chain (scoredbest_candidatespartition): Two intermediate Vec allocations before the final partition. The best_candidates Vec served only as a filtering step with no semantic need to materialize.

  2. Double-pass reliability scoring (filter → map → fold → filter): Reliability was computed per-candidate in a .map(), then re-iterated in a .fold() to find the maximum. Since iteration is lazy until .collect(), this is a second pass over the entire candidate set.

Each allocation and pass is cheap in isolation, but on the hot path — especially with many candidates — they compound into measurable GC pressure and cache misses.

What

Issue 1 — Triple-collect chain (item 1 from #4243)

Chain filter().map() directly into .partition(), eliminating the intermediate Vec<PeerKeyLocation> (best_candidates). The scored Vec becomes the only intermediate allocation (necessary because we need the reliability-score data before partitioning).

Before: scored: Vec<(f64, PKL)>filter().map()best_candidates: Vec<PKL>partition()
After: scored: Vec<(f64, PKL)>filter().map()partition() (one less Vec)

Issue 3 — Single-pass reliability scoring (item 3 from #4243)

Merge the fold pass into the initial filter_map iterator by tracking best_reliability inline via best_reliability.max(reliability). The filter_map now serves triple duty: recency filtering, reliability scoring, and best-reliability tracking.

Before: filter() + map() to collect scored → iter().map().fold() for max
After: filter_map() with inline best_reliability tracking (single pass)

Correctness

All three changes (recency filter, reliability scoring, partition logic) preserve the exact same semantics:

  • best_reliability = max over all reliability scores — same result whether computed inline or via fold
  • filter(|(r, _)| best - *r < tol) operates on the same scored Vec in both cases
  • partition() into close/far by distance threshold — unchanged
  • Final router.select_peer() call — unchanged

Testing

  • cargo fmt --check — clean
  • cargo clippy --all-targets -- -D warnings — clean
  • cargo test -p freenet --lib -- operations::connect60/60 passed
  • cargo test -p freenet --lib — 2594/2596 passed; 2 pre-existing failures (IPv6-unavailable, flaky delegate test — both unrelated)

Related

Issue 1 — Triple-collect chain: chain filter+map directly into
partition() inside select_uphill_hop, eliminating the intermediate
Vec<PeerKeyLocation> allocation (best_candidates).

Issue 3 — Single-pass reliability scoring: merge the separate
reliability-scoring fold pass into the initial filter_map by tracking
best_reliability inline with best_reliability.max(reliability).

Closes first two items from freenet#4243.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 25, 2026

I now have enough context to write the review.

Rule Review: Missing boundary regression test for k=0 guard

Rules checked: git-workflow.md, code-style.md, testing.md, operations.md
Files reviewed: 2 (crates/core/src/operations/connect.rs, crates/core/src/router.rs)


Warnings

  • crates/core/src/router.rs:549–553 — The fix: commit adds a k > 0 guard to prevent select_nth_unstable_by(usize::MAX, ...) when consider_n_closest_peers = 0 with a non-empty peer list. No test covers this scenario. The existing test_select_k_best_empty_candidates (line 1359) sets considering_n_closest_peers(5) with empty peers, so k = 5.min(0) = 0 — but peer_distances.len() is also 0, meaning the old k < peer_distances.len() guard would have been false anyway, and the panic path is never reached in that test. A dedicated test with considering_n_closest_peers(0) and a non-empty peer list is the missing regression that would have caught (and now guards against) this specific panic. (rule: testing.md — boundary values, zero)

Info

  • crates/core/src/router.rs:548 — Comment ends with See PR #4247 review: https://github.com/freenet/freenet-core/pull/4247. Per code-style rules, comments must not reference the current PR/task — they rot. The useful part of the comment (explaining why k > 0 is the correct guard rather than k == 0) should stand alone without the URL. (rule: code-style.md)

  • Commit 58385372 — Subject line fix skip select_nth_unstable_by(0, ...) is missing the conventional-commit prefix (fix: + space before description). (rule: git-workflow.md)

  • PR title and commits 9da80af4 / 4fc752ff use perf: prefix. The project's git-workflow lists valid prefixes as feat|fix|docs|refactor|test|build; perf: is not among them. Consider using refactor: or adding perf: to the documented list. (rule: git-workflow.md)


Rule review against .claude/rules/. WARNING findings block merge. ⚠️ 1 warning(s) — fix or add review-override label

Issue 2 — Vec capacity hints: pre-allocate scored/fallbacks with
candidates.len() and expired with forward_attempts.len() to avoid
geometric reallocations on the CONNECT hot path.

Issue 6 — Partial sort: use select_nth_unstable_by in select_k_best_peers
to find the k closest peers in O(n) instead of O(n log n) full sort.
k = consider_n_closest_peers (default 25) is typically 2-4× smaller
than the candidate pool.

Part of freenet#4243.
@Basedfloppa
Copy link
Copy Markdown
Contributor Author

Closes items 1, 2, 3, and 6 from #4243.

Why

Four allocator-level hot spots identified in the routing engine, concentrated in
select_uphill_hop (CONNECT uphill routing), select_k_best_peers (peer
selection), and expire_forward_attempts.

Triple-collect chain (select_uphill_hop)

scoredbest_candidatespartition: two intermediate Vec allocations
before the final partition. The best_candidates Vec existed only to be
consumed by .partition() on the next line.

Double-pass reliability scoring (select_uphill_hop)

Reliability computed per-candidate in .map(), then re-iterated in .fold()
to find the maximum. Since iteration is lazy until .collect(), this is a
second pass over the entire candidate set.

Vec::new() without capacity hints (connect hot path)

scored, fallbacks, and expired Vecs grew from 0 capacity via geometric
doubling (0→1→2→4→8→16→32→64), causing 4–6 reallocations each despite the
final size being known in advance.

Full sort where partial sort suffices (select_k_best_peers)

sort_by_key is O(n log n) but only top k = 25 elements are needed.
select_nth_unstable_by partitions around the k-th element in O(n), then only
the truncated k elements are sorted (O(k log k)).

What

Issue 1 — Triple-collect chain

Chain filter().map() directly into .partition(), eliminating the
intermediate best_candidates Vec.

Before: scored: Vec<(f64, PKL)>filter().map()best_candidates: Vec<PKL>partition()

After: scored: Vec<(f64, PKL)>filter().map()partition()
(one less Vec)

Issue 2 — Vec capacity hints

Location Before After
select_next_hopscored Vec::new() Vec::with_capacity(candidates.len())
select_next_hopfallbacks Vec::new() Vec::with_capacity(candidates.len())
expire_forward_attemptsexpired Vec::new() Vec::with_capacity(self.forward_attempts.len())

Issue 3 — Single-pass reliability scoring

Merge the fold pass into the initial filter_map by tracking
best_reliability inline via best_reliability.max(reliability). The
filter_map now serves triple duty: recency filtering, reliability scoring,
and best-reliability tracking.

Issue 6 — Partial sort

Replace sort_by_key (O(n log n)) with select_nth_unstable_by (O(n)) for
finding the k closest peers, then sort only the truncated k.

What's intentionally excluded from this PR

Item Reason
4 — LRU cache Requires careful invalidation design; estimator state changes on every add_event(). Worth a focused PR.
5 — Router write-lock Architectural (channel-based handoff). ~1-2 days. Marginal gain outside dense gateways.
7 — PeerKeyLocation clones The 3→2 reduction conflicts with the return value ownership; function returns peer after insertion. Deeper redesign needed.

Testing

  • cargo fmt / cargo clippy --all-targets -- -D warnings — clean
  • cargo test -p freenet --lib -- operations::connect60/60 passed
  • cargo test -p freenet --lib -- router87/87 passed
  • Full lib suite — 2594/2596 passed; 2 pre-existing failures
    (IPv6-unavailable, flaky delegate test — both unrelated)

@Basedfloppa
Copy link
Copy Markdown
Contributor Author

Why items 4, 5, and 7 are excluded from this PR

Item 4 — LRU cache for predict_routing_outcome

Problem: predict_routing_outcome(peer, target) is recomputed for every
(peer, target) pair on every routing decision. Multi-hop connect: 30
candidates × 3 hops = 90 predictions instead of 30 unique.

Why not here: The cache would need invalidation on every add_event()
because the internal estimators (isotonic regression, renegade ML predictor)
change their state with each event. A naive TTL-based cache could serve stale
predictions, causing the router to systematically prefer peers whose scores
are no longer accurate.

Two viable approaches exist:

  1. Per-batch cache: Clear the entire cache at the start of each
    select_k_best_peers call. Only deduplicates repeated prediction calls
    within a single routing decision — no staleness risk.
  2. Write-through cache: Invalidate specific (peer, target) entries when
    add_event receives an event for that peer. Complex but preserves
    cross-call reuse.

Both approaches need careful benchmarking to confirm the ~60% hit rate
estimate from the issue. Worth a focused PR with a benchmark harness.

Item 5 — Router write lock blocks routing readers

Problem: router.write().add_event(event) holds an exclusive RwLock
write guard for ~5–15 µs (2–5 isotonic estimator updates + ML prediction).
During that window, all concurrent router.read() calls (routing decisions)
are blocked. On a dense gateway (500+ peers, 50+ ops/s) this causes ~2–3%
collision rate with rare 10–50 µs latency spikes.

Why not here: The fix requires routing events through a channel to a
background writer task, so readers never contend with writers. This is an
architectural change touching the event loop structure and the synchronisation
boundary between OpManager and Router — estimated at 1–2 days of work.
The gain is only measurable on dense gateways; typical nodes see ~0.004%
collision (negligible). A separate PR with #[bench] coverage makes sense.

Item 7 — PeerKeyLocation cloned 3× in forward_to_peer

Problem: forward_to_peer clones peer three times: once for
self.forwarded_to, once as the HashMap key in forward_attempts, and once
for the ForwardAttempt.peer field. The issue suggests cutting to two clones
by moving peer into the HashMap key and cloning from forwarded_to for the
field.

Why not here: The function returns (PeerKeyLocation, ConnectRequest),
and the returned PeerKeyLocation is consumed by callers (e.g. stored in
actions.forward, verified in test assertions). Moving peer into
forward_attempts.insert(peer, ...) means it's no longer available for the
return tuple. The suggested "3→2" pattern leaves a use-after-move error.

To actually eliminate a clone, one would need to either:

  • Change the return type to a reference (&PeerKeyLocation), propagating the
    lifetime change through Actions and all call sites, or
  • Store an Arc<PeerKeyLocation> so that clones are reference-counted.

Both are deeper API changes than the scope of this PR. The 3 clones amount to
~144 bytes + ~0.3 µs per hop — barely measurable (~12 KB/hr on a busy
gateway). Code hygiene issue, not a performance bottleneck.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant