fix(transport): meter recv-side wire bytes post-authentication + LRU evict per-peer table by sanity · Pull Request #4024 · freenet/freenet-core

sanity · 2026-05-04T20:17:31Z

Problem

PR #3996 moved dashboard wire-byte accounting from stream-completion to the UDP socket layer so the local-peer dashboard reflects keep-alives, ACKs, and small contract ops — not just stream transfers. That fix exposed a pre-existing weakness in the per-peer DashMap:

TransportMetrics::record_packet_received ran from Socket::recv_from on every received UDP packet, before any decryption or peer-validation.
record_per_peer had no eviction policy and a 256-entry cap.

Combined, an attacker that fabricates UDP packets from many spoofed source IPs could:

Fill the per-peer DashMap to capacity. After 256 distinct source IPs, legitimate new peers' bytes were silently dropped from per-peer accounting forever.
Inflate cumulative_bytes_received arbitrarily — making the dashboard's "Total Data Received" gauge attacker-controlled.

The same no-eviction policy also caused stale entries to accumulate on long-running gateways as peer connections churned, even without a spoof attack — so a gateway with high peer turnover would eventually display "—" for SENT/RECV on legitimate new peers.

Approach

Two coordinated changes that follow the fix proposed in #3999, plus an eager-cleanup complement:

1. Move recv-side metering past authentication. record_packet_received is now called from PeerConnection::recv immediately after a packet passes try_decrypt_sym against the established symmetric session key — i.e. only for packets we know came from a peer holding the shared secret. The call was removed from Socket::recv_from. Outbound metering stays at the socket layer because send targets are controlled by us; there is no spoof vector on send.

2. LRU-evict the per-peer table when full. PeerTransferStats gained a last_seen_tick atomic stamped from a global monotonic counter on every update. When the table is full and a new peer arrives, record_per_peer evicts the entry with the smallest tick before inserting. Pure tick-based ordering avoids a wall-clock dependency and keeps simulation determinism intact (the rules in .claude/rules/code-style.md discourage Instant::now() in crates/core/).

3. Eager cleanup on disconnect. TRANSPORT_METRICS.remove_peer is called from record_peer_disconnected, so authenticated peer teardowns free their slot immediately rather than waiting for LRU eviction.

Testing

udp_socket_recv_from_does_not_record_pre_authentication (new) — pins the spoof regression: a forged UDP packet at the kernel handoff point creates no per-peer entry and contributes nothing to the cumulative counter on its own.
test_per_peer_lru_evicts_oldest (new) — pins the LRU ordering: refreshing the second-oldest entry's tick changes which entry is evicted on the next insert into a full table.
test_per_peer_capacity_bound (updated) — reflects the new LRU semantics. Pre-fix the over-capacity peer was silently dropped; post-fix it is tracked at the cost of evicting the oldest entry.
test_remove_peer_frees_slot and test_record_per_peer_is_idempotent_under_remove (new) — pin the disconnect-cleanup path and the no-op behavior of removing a non-existent / already-evicted peer.
udp_socket_records_packet_metrics (updated) — only the send side now asserts cumulative + per-peer movement; recv-side accounting moved to the post-auth path.

All 12 metrics unit tests + 13 transport dual-stack tests + full freenet lib suite (2538 tests) pass locally.

Fixes

Closes #3999

[AI-assisted - Claude]

…evict per-peer table ## Problem PR #3996 moved dashboard wire-byte accounting from stream-completion to the UDP socket layer so the local-peer dashboard reflects keep-alives, ACKs, and small contract ops. That fix exposed a pre-existing weakness in the per-peer DashMap: - `TransportMetrics::record_packet_received` ran from `Socket::recv_from` on every received UDP packet, before any decryption or peer-validation. - `record_per_peer` had no eviction policy and a 256-entry cap. Combined, an attacker who fabricates UDP packets from many spoofed source IPs could (a) fill the per-peer table to capacity, after which legitimate new peers were silently dropped from the dashboard's per-peer view forever, and (b) inflate `cumulative_bytes_received` arbitrarily — making the dashboard's "Total Data Received" gauge attacker-controlled. The same no-eviction policy also caused stale entries to accumulate on long-running gateways as peer connections churned, even without a spoof attack. ## Approach Two coordinated changes in line with the issue's proposed fix: 1. Move recv-side metering past authentication. `record_packet_received` is now called from `PeerConnection::recv` immediately after a packet passes `try_decrypt_sym` against the established symmetric session key — i.e. only for packets we know came from a peer holding the shared secret. The call is removed from `Socket::recv_from`. Outbound metering stays at the socket layer because send targets are controlled by us; there is no spoof vector on send. 2. LRU-evict the per-peer table when full. `PeerTransferStats` gained a `last_seen_tick` atomic stamped from a global monotonic counter on every update. When the table is full and a new peer arrives, `record_per_peer` evicts the entry with the smallest tick before inserting. Pure tick-based ordering avoids a wall-clock dependency and keeps simulation determinism intact. 3. As an eager-cleanup complement to LRU, `TRANSPORT_METRICS.remove_peer` is called from `record_peer_disconnected`, so authenticated peer teardowns free their slot immediately rather than waiting for LRU. ## Testing - `udp_socket_recv_from_does_not_record_pre_authentication`: pins the contract that a forged UDP packet at the kernel handoff point creates no per-peer entry and does not contribute to the cumulative counter. - `test_per_peer_lru_evicts_oldest`: pins the LRU ordering — refreshing the second-oldest entry's tick changes which entry is evicted on the next insert into a full table. - `test_per_peer_capacity_bound`: updated to reflect the LRU semantics (a new peer is now tracked at the cost of evicting the oldest, where pre-fix the new peer was silently dropped). - `test_remove_peer_frees_slot` / `test_record_per_peer_is_idempotent_under_remove`: pin the disconnect-cleanup path and the no-op behavior of removing a non-existent / already-evicted peer. - `udp_socket_records_packet_metrics`: updated — only the send side asserts cumulative + per-peer movement; recv-side accounting is exercised by the post-auth path in the integration suite. Closes #3999 [AI-assisted - Claude]

github-actions · 2026-05-04T20:19:30Z

Now I have everything I need to write the review.

Rule Review: No blocking issues; one test-isolation note

Rules checked: git-workflow.md, code-style.md, testing.md, transport.md
Files reviewed: 5 (network_status.rs, transport.rs, connection_handler.rs, metrics.rs, peer_connection.rs)

Warnings

None.

Info

crates/core/src/transport/peer_connection.rs — recv_records_packet_metrics_post_authentication uses the process-global TRANSPORT_METRICS singleton and guards its precondition with an is_none() check against the hardcoded address 10.99.99.1:50001. Under cargo test --test-threads > 1, two parallel invocations of the same test can race on that check: Thread B can find the entry inserted by Thread A before Thread A reaches the remove_peer cleanup, causing B's precondition assertion to fail. Zero risk under cargo nextest (per-process isolation). Consider documenting the nextest requirement above the test or obtaining the port dynamically with bind(0) + local_addr(). (rule: testing.md — concurrent/racy scenarios)

Summary of checks:

Rule area	Result
`fix:` PR has regression tests	✓ — 5 new test functions pinning specific bug scenarios
`.unwrap()` in new production code	✓ — none; all new `.unwrap()` are in test code
Post-authentication metering placement	✓ — `record_packet_received` called only after successful `try_decrypt_sym` / asymmetric decrypt at all four new call sites
LRU eviction bounded and non-atomic acknowledged	✓ — comment in `record_per_peer` explicitly accepts transient overrun
`remove_peer` on disconnect (GC exemption TTL rule)	✓ — eager cleanup in `record_peer_disconnected`, LRU as safety net
No fire-and-forget spawns, no unguarded `biased;`	✓
PR description sections	Not verifiable from diff alone

Rule review against .claude/rules/. WARNING findings block merge.

Cleanup pass on the #3999 fix: - Drop "Fast path"/"Slow path" labels in `record_per_peer`; the branch structure is self-evident and the load-bearing comment about non-atomic capacity/eviction stays. - In `udp_socket_records_packet_metrics`, remove the dead `cumulative_recv_before` capture and the `let _ = ...` discard that replaced the recv-side assertion. Fold the two adjacent comments about the "no per-peer entry for sender_addr" invariant into one block. - In `udp_socket_recv_from_does_not_record_pre_authentication`, drop the trivially-monotonic `cumulative_bytes_received >= cumulative_recv_before` assertion (cumulative is monotonic by construction, so the check pinned nothing). The per-peer absence assertion is the load-bearing pin for #3999. - Rename `test_record_per_peer_is_idempotent_under_remove` to `test_remove_peer_is_idempotent` — the test exercises `remove_peer`, not `record_per_peer`. No behavioral change; LRU eviction, post-auth metering, and remove_peer wiring are unchanged. [AI-assisted - Claude] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…post-auth metering test Address review findings on PR #4024: 1. **Meter handshake-completion bytes** (code-first reviewer MEDIUM, skeptical reviewer H2). Three previously-unmetered authenticated inbound paths now call `record_packet_received`: - `peer_connection.rs::recv` intro-packet asymmetric decrypt success path (peer restart-detection) - `connection_handler.rs::gateway_connection` symmetric ACK reception - `connection_handler.rs::traverse_nat` outbound-handshake symmetric ACK reception Pre-fix these counted at the socket layer; without these calls the "Total Data Received" gauge would silently undercount handshake traffic on a busy gateway. 2. **Add post-auth metering regression test** (testing reviewer CRITICAL #1, #2). New `recv_records_packet_metrics_post_authentication` exercises the full path through `PeerConnection::recv`: encrypts a 100-byte ShortMessage, feeds it via the inbound channel, and asserts both that (a) the per-peer entry advanced by the **wire (encrypted) size**, not the decrypted size, and (b) the cumulative counter moved by at least the wire size. Pins both the call-site-presence and the byte-semantics invariants so a future refactor can't silently rot the dashboard. 3. **Fix stale field comment** (big-picture reviewer MEDIUM). The field-level comment on `cumulative_bytes_sent`/`cumulative_bytes_received` claimed both counters update at the socket layer "for every packet"; after the post-auth move that is only true for the send side. Rewritten to describe the asymmetry and cite #3999. [AI-assisted - Claude]

sanity · 2026-05-04T20:36:29Z

Review responses

Five reviews ran in parallel (code-first, testing, skeptical, big-picture, Codex). Summary of how each finding was addressed in 83ae73b:

Addressed in code

Code-first MEDIUM — handshake-completion bytes silently dropped from dashboard. Added record_packet_received at the three authenticated-decrypt-success sites identified by the reviewer plus the skeptical reviewer:

connection_handler.rs::gateway_connection (gateway-side symmetric ACK)
connection_handler.rs::traverse_nat (client-side symmetric ACK)
peer_connection.rs::recv intro-packet asymmetric-decrypt success path

Big-picture MEDIUM — stale field comment. The struct comment on cumulative_bytes_sent/cumulative_bytes_received still claimed both update at the socket layer "for every packet"; rewrote to describe the post-auth asymmetry and cite #3999.

Testing CRITICAL #1 + #2 — post-auth recv hook untested; encrypted-byte semantics unpinned. Added recv_records_packet_metrics_post_authentication that builds an encrypted ShortMessage, feeds it via the inbound channel, drives one iteration of PeerConnection::recv, and asserts both that the per-peer entry advanced by the wire (encrypted) size (not the decrypted size) and that the cumulative counter moved.

Considered but not changed

Codex P2 — backpressure-dropped packets at try_send_established no longer counted. Real but acceptable tradeoff. INBOUND_CHANNEL_CAPACITY = 2048 so this only fires under sustained heavy load, and the alternative (counting at the listener pre-decrypt) reintroduces the spoof inflation attack the issue is closing. Under-reporting under load is "users see truth modulo backpressure drops"; spoof-inflated counters are "users see fiction".

Skeptical H1 — replay/retransmit double-counts. Both pre-fix and post-fix count retransmits N times because they ARE on the wire N times — that's correct semantics for a wire-byte counter. The post-fix change does not introduce a new threat: an authenticated peer who replays packets to inflate their own dashboard entry can do the same by sending fresh valid traffic. Anonymous spoof inflation (the actual issue) is closed.

Skeptical H3 — concurrent-burst over-eviction race. Real but bounded. With N concurrent inserts at capacity, up to N evictions can fire instead of 1. In practice rare and self-healing; for a dashboard counter the imprecision is acceptable. Inline comment in record_per_peer already acknowledges the transient over-capacity. Adding a Mutex would serialize the hot path for an aesthetic improvement that doesn't change correctness.

Skeptical H4 — remove_peer racing with record_per_peer orphans the increment. DashMap 6.1's Ref<K, V> holds a per-shard RwLockReadGuard for its lifetime, so a concurrent remove (which needs the write lock) cannot interleave with our fetch_add on the held guard. The race the reviewer described is specific to Arc<RwLock<HashMap>> patterns, not DashMap. Verified against dashmap 6.1.0.

Big-picture LOW (other suggestions). Documentation polish, follow-up scope. Filing or addressing these would belong in a follow-up PR rather than block #3999.

Tests

cargo test -p freenet --lib transport:: — 565 passed
New test recv_records_packet_metrics_post_authentication — passes
cargo clippy --all-targets -- -D warnings — clean
cargo fmt — clean

[AI-assisted - Claude]

sanity and others added 2 commits May 4, 2026 15:23

sanity added this pull request to the merge queue May 4, 2026

Merged via the queue into main with commit 7738928 May 4, 2026
17 checks passed

sanity deleted the fix-3999 branch May 4, 2026 21:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(transport): meter recv-side wire bytes post-authentication + LRU evict per-peer table#4024

fix(transport): meter recv-side wire bytes post-authentication + LRU evict per-peer table#4024
sanity merged 3 commits intomainfrom
fix-3999

sanity commented May 4, 2026

Uh oh!

github-actions Bot commented May 4, 2026 •

edited

Loading

Uh oh!

sanity commented May 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

sanity commented May 4, 2026

Problem

Approach

Testing

Fixes

Uh oh!

github-actions Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rule Review: No blocking issues; one test-isolation note

Warnings

Info

Uh oh!

sanity commented May 4, 2026

Review responses

Addressed in code

Considered but not changed

Tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented May 4, 2026 •

edited

Loading