perf: eliminate triple-collect and double-pass in uphill routing #4247
Basedfloppa wants to merge 3 commits into
Issue 1 — Triple-collect chain: chain filter+map directly into partition() inside select_uphill_hop, eliminating the intermediate Vec<PeerKeyLocation> allocation (best_candidates). Issue 3 — Single-pass reliability scoring: merge the separate reliability-scoring fold pass into the initial filter_map by tracking best_reliability inline with best_reliability.max(reliability). Closes first two items from freenet#4243.
I now have enough context to write the review.
Rule Review: Missing boundary regression test for k=0 guard
Rules checked: git-workflow.md, code-style.md, testing.md, operations.md
Warnings
Info
Rule review against |
Issue 2 — Vec capacity hints: pre-allocate scored/fallbacks with candidates.len() and expired with forward_attempts.len() to avoid geometric reallocations on the CONNECT hot path. Issue 6 — Partial sort: use select_nth_unstable_by in select_k_best_peers to find the k closest peers in O(n) instead of O(n log n) full sort. k = consider_n_closest_peers (default 25) is typically 2-4× smaller than the candidate pool. Part of freenet#4243.
Closes items 1, 2, 3, and 6 from #4243.
Why
Four allocator-level hot spots identified in the routing engine, concentrated in Triple-collect chain (
| Location | Before | After |
|---|---|---|
| `select_next_hop` (`scored`) | `Vec::new()` | `Vec::with_capacity(candidates.len())` |
| `select_next_hop` (`fallbacks`) | `Vec::new()` | `Vec::with_capacity(candidates.len())` |
| `expire_forward_attempts` (`expired`) | `Vec::new()` | `Vec::with_capacity(self.forward_attempts.len())` |
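
As a rough illustration of the capacity-hint change in the table above, the sketch below pre-sizes two output vectors from the input length so that pushing never triggers a reallocation. The function name `split_candidates`, the `u32` element type, and the even/odd split are hypothetical stand-ins, not the actual `select_next_hop` logic.

```rust
/// Hypothetical stand-in for the scored/fallbacks split in `select_next_hop`.
/// The even/odd rule is illustrative only; the real predicate is not shown here.
fn split_candidates(candidates: &[u32]) -> (Vec<u32>, Vec<u32>) {
    // Each output can hold at most `candidates.len()` items, so reserving
    // that much up front means no push below can trigger a reallocation.
    let mut scored = Vec::with_capacity(candidates.len());
    let mut fallbacks = Vec::with_capacity(candidates.len());

    for &c in candidates {
        if c % 2 == 0 {
            scored.push(c);
        } else {
            fallbacks.push(c);
        }
    }
    (scored, fallbacks)
}
```

The trade-off is that both vectors reserve the worst-case size even when most candidates land in only one of them, which seems acceptable when the candidate pool is small.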
Issue 3 — Single-pass reliability scoring
Merge the fold pass into the initial filter_map by tracking
best_reliability inline via best_reliability.max(reliability). The
filter_map now serves triple duty: recency filtering, reliability scoring,
and best-reliability tracking.
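
A minimal sketch of the single-pass idea, assuming a hypothetical `Candidate` record with a `recent_failure` flag and a precomputed `reliability` value (neither is taken from the actual code). The two-pass version is included only to show that both produce the same result.

```rust
/// Hypothetical candidate record; the field names are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    id: u32,
    recent_failure: bool,
    reliability: f64,
}

/// Two passes: collect the scored list, then fold over it again for the maximum.
fn score_two_pass(candidates: &[Candidate]) -> (Vec<(f64, u32)>, f64) {
    let scored: Vec<(f64, u32)> = candidates
        .iter()
        .filter(|c| !c.recent_failure)
        .map(|c| (c.reliability, c.id))
        .collect();
    let best = scored.iter().map(|(r, _)| *r).fold(f64::NEG_INFINITY, f64::max);
    (scored, best)
}

/// One pass: the `filter_map` closure filters, scores, and tracks the maximum inline.
fn score_single_pass(candidates: &[Candidate]) -> (Vec<(f64, u32)>, f64) {
    // Starting value is an assumption; the real code may seed this differently.
    let mut best_reliability = f64::NEG_INFINITY;
    let scored: Vec<(f64, u32)> = candidates
        .iter()
        .filter_map(|c| {
            if c.recent_failure {
                return None; // recency filtering
            }
            best_reliability = best_reliability.max(c.reliability); // inline max
            Some((c.reliability, c.id)) // reliability scoring
        })
        .collect();
    (scored, best_reliability)
}
```

Note that the closure mutates captured state, so the maximum is only final once `collect()` has consumed the whole iterator.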
Issue 6 — Partial sort
Replace sort_by_key (O(n log n)) with select_nth_unstable_by (O(n)) for
finding the k closest peers, then sort only the truncated k.
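
The partial-sort approach might look roughly like the sketch below. It works on bare `f64` distances rather than peers, and the function name `k_closest` is hypothetical; only `select_nth_unstable_by`, `truncate`, and `sort_unstable_by` are real standard-library calls. The `k == 0` guard is needed because `select_nth_unstable_by(k - 1, ..)` would otherwise underflow.

```rust
/// Returns the `k` smallest distances in ascending order.
/// Hypothetical helper; the real `select_k_best_peers` operates on peers, not bare floats.
fn k_closest(mut distances: Vec<f64>, k: usize) -> Vec<f64> {
    if k == 0 {
        return Vec::new(); // guard: `k - 1` below would underflow
    }
    if k < distances.len() {
        // Partition so that the k smallest values occupy indices 0..k (in
        // arbitrary order); the index passed must be < len, which k - 1 is here.
        distances.select_nth_unstable_by(k - 1, |a, b| a.total_cmp(b));
        distances.truncate(k);
    }
    // Only the surviving k (or fewer) elements are fully sorted.
    distances.sort_unstable_by(|a, b| a.total_cmp(b));
    distances
}
```

When `k` is at least the input length the selection step is skipped and the function degrades to a plain sort of the whole input.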
What's intentionally excluded from this PR
| Item | Reason |
|---|---|
| 4 — LRU cache | Requires careful invalidation design; estimator state changes on every add_event(). Worth a focused PR. |
| 5 — Router write-lock | Architectural (channel-based handoff). ~1-2 days. Marginal gain outside dense gateways. |
| 7 — PeerKeyLocation clones | The 3→2 reduction conflicts with the return value ownership; function returns peer after insertion. Deeper redesign needed. |
Testing
- `cargo fmt` / `cargo clippy --all-targets -- -D warnings`: clean
- `cargo test -p freenet --lib -- operations::connect`: 60/60 passed
- `cargo test -p freenet --lib -- router`: 87/87 passed
- Full lib suite: 2594/2596 passed; 2 pre-existing failures (IPv6-unavailable, flaky delegate test; both unrelated)
Why items 4, 5, and 7 are excluded from this PR
Item 4 — LRU cache for
Closes items 1 and 3 from #4243.
Why
`select_uphill_hop` is on the CONNECT hot path: it runs on every uphill routing decision when a peer is at terminus but cannot accept. The function had two unnecessary allocation patterns that add latency proportional to candidate count:

- Triple-collect chain (`scored` → `best_candidates` → `partition`): Two intermediate `Vec` allocations before the final partition. The `best_candidates` Vec served only as a filtering step with no semantic need to materialize.
- Double-pass reliability scoring (filter → map → fold → filter): Reliability was computed per-candidate in a `.map()`, then re-iterated in a `.fold()` to find the maximum. Because the `.fold()` runs over the already collected `scored` Vec, this is a second pass over the entire candidate set.

Each allocation and pass is cheap in isolation, but on the hot path, especially with many candidates, they compound into measurable allocator pressure and cache misses.
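
To make the triple-collect pattern concrete, here is a small sketch under simplifying assumptions: each scored entry is a `(reliability, distance)` pair of `f64` values instead of a `(f64, PeerKeyLocation)` pair, and the parameter names `best`, `tol`, and `threshold` are hypothetical. The first function materializes the filtered list before partitioning; the second feeds the lazy iterator straight into `partition()`.

```rust
/// Pattern with an intermediate Vec: filter + map, collect, then partition.
fn partition_with_intermediate(
    scored: &[(f64, f64)],
    best: f64,
    tol: f64,
    threshold: f64,
) -> (Vec<f64>, Vec<f64>) {
    // Extra allocation that exists only to be consumed on the next line.
    let best_candidates: Vec<f64> = scored
        .iter()
        .filter(|(r, _)| best - *r < tol)
        .map(|(_, dist)| *dist)
        .collect();
    best_candidates.into_iter().partition(|dist| *dist < threshold)
}

/// Chained pattern: the lazy filter + map adapter is consumed by `partition` directly.
fn partition_chained(
    scored: &[(f64, f64)],
    best: f64,
    tol: f64,
    threshold: f64,
) -> (Vec<f64>, Vec<f64>) {
    scored
        .iter()
        .filter(|(r, _)| best - *r < tol)
        .map(|(_, dist)| *dist)
        .partition(|dist| *dist < threshold)
}
```

Both functions visit the kept elements in the same order, so the close and far groups come out identical; the chained form simply skips one allocation.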
What
Issue 1 — Triple-collect chain (item 1 from #4243)
Chain `filter().map()` directly into `.partition()`, eliminating the intermediate `Vec<PeerKeyLocation>` (`best_candidates`). The `scored` Vec becomes the only intermediate allocation (necessary because we need the reliability-score data before partitioning).

Before: `scored: Vec<(f64, PKL)>` → `filter().map()` → `best_candidates: Vec<PKL>` → `partition()`

After: `scored: Vec<(f64, PKL)>` → `filter().map()` → `partition()` (one less Vec)

Issue 3 — Single-pass reliability scoring (item 3 from #4243)
Merge the `fold` pass into the initial `filter_map` iterator by tracking `best_reliability` inline via `best_reliability.max(reliability)`. The `filter_map` now serves triple duty: recency filtering, reliability scoring, and best-reliability tracking.

Before: `filter()` + `map()` to collect `scored` → `iter().map().fold()` for max

After: `filter_map()` with inline `best_reliability` tracking (single pass)

Correctness
All three changes (recency filter, reliability scoring, partition logic) preserve the exact same semantics:
- `best_reliability` = max over all reliability scores: same result whether computed inline or via `fold`
- `filter(|(r, _)| best - *r < tol)` operates on the same `scored` Vec in both cases
- `partition()` into close/far by distance threshold: unchanged
- `router.select_peer()` call: unchanged

Testing
- `cargo fmt --check`: clean
- `cargo clippy --all-targets -- -D warnings`: clean
- `cargo test -p freenet --lib -- operations::connect`: 60/60 passed
- `cargo test -p freenet --lib`: 2594/2596 passed; 2 pre-existing failures (IPv6-unavailable, flaky delegate test; both unrelated)

Related