Skip to content

fix(transport): meter recv-side wire bytes post-authentication + LRU evict per-peer table#4024

Merged
sanity merged 3 commits intomainfrom
fix-3999
May 4, 2026
Merged

fix(transport): meter recv-side wire bytes post-authentication + LRU evict per-peer table#4024
sanity merged 3 commits intomainfrom
fix-3999

Conversation

@sanity
Copy link
Copy Markdown
Collaborator

@sanity sanity commented May 4, 2026

Problem

PR #3996 moved dashboard wire-byte accounting from stream-completion to the UDP socket layer so the local-peer dashboard reflects keep-alives, ACKs, and small contract ops — not just stream transfers. That fix exposed a pre-existing weakness in the per-peer DashMap:

  • TransportMetrics::record_packet_received ran from Socket::recv_from on every received UDP packet, before any decryption or peer-validation.
  • record_per_peer had no eviction policy and a 256-entry cap.

Combined, an attacker that fabricates UDP packets from many spoofed source IPs could:

  1. Fill the per-peer DashMap to capacity. After 256 distinct source IPs, legitimate new peers' bytes were silently dropped from per-peer accounting forever.
  2. Inflate cumulative_bytes_received arbitrarily — making the dashboard's "Total Data Received" gauge attacker-controlled.

The same no-eviction policy also caused stale entries to accumulate on long-running gateways as peer connections churned, even without a spoof attack — so a gateway with high peer turnover would eventually display "—" for SENT/RECV on legitimate new peers.

Approach

Two coordinated changes that follow the fix proposed in #3999, plus an eager-cleanup complement:

1. Move recv-side metering past authentication. record_packet_received is now called from PeerConnection::recv immediately after a packet passes try_decrypt_sym against the established symmetric session key — i.e. only for packets we know came from a peer holding the shared secret. The call was removed from Socket::recv_from. Outbound metering stays at the socket layer because send targets are controlled by us; there is no spoof vector on send.

2. LRU-evict the per-peer table when full. PeerTransferStats gained a last_seen_tick atomic stamped from a global monotonic counter on every update. When the table is full and a new peer arrives, record_per_peer evicts the entry with the smallest tick before inserting. Pure tick-based ordering avoids a wall-clock dependency and keeps simulation determinism intact (the rules in .claude/rules/code-style.md discourage Instant::now() in crates/core/).

3. Eager cleanup on disconnect. TRANSPORT_METRICS.remove_peer is called from record_peer_disconnected, so authenticated peer teardowns free their slot immediately rather than waiting for LRU eviction.

Testing

  • udp_socket_recv_from_does_not_record_pre_authentication (new) — pins the spoof regression: a forged UDP packet at the kernel handoff point creates no per-peer entry and contributes nothing to the cumulative counter on its own.
  • test_per_peer_lru_evicts_oldest (new) — pins the LRU ordering: refreshing the second-oldest entry's tick changes which entry is evicted on the next insert into a full table.
  • test_per_peer_capacity_bound (updated) — reflects the new LRU semantics. Pre-fix the over-capacity peer was silently dropped; post-fix it is tracked at the cost of evicting the oldest entry.
  • test_remove_peer_frees_slot and test_record_per_peer_is_idempotent_under_remove (new) — pin the disconnect-cleanup path and the no-op behavior of removing a non-existent / already-evicted peer.
  • udp_socket_records_packet_metrics (updated) — only the send side now asserts cumulative + per-peer movement; recv-side accounting moved to the post-auth path.

All 12 metrics unit tests + 13 transport dual-stack tests + full freenet lib suite (2538 tests) pass locally.

Fixes

Closes #3999

[AI-assisted - Claude]

…evict per-peer table

## Problem

PR #3996 moved dashboard wire-byte accounting from stream-completion to
the UDP socket layer so the local-peer dashboard reflects keep-alives,
ACKs, and small contract ops. That fix exposed a pre-existing weakness
in the per-peer DashMap:

- `TransportMetrics::record_packet_received` ran from
  `Socket::recv_from` on every received UDP packet, before any
  decryption or peer-validation.
- `record_per_peer` had no eviction policy and a 256-entry cap.

Combined, an attacker who fabricates UDP packets from many spoofed
source IPs could (a) fill the per-peer table to capacity, after which
legitimate new peers were silently dropped from the dashboard's
per-peer view forever, and (b) inflate `cumulative_bytes_received`
arbitrarily — making the dashboard's "Total Data Received" gauge
attacker-controlled. The same no-eviction policy also caused stale
entries to accumulate on long-running gateways as peer connections
churned, even without a spoof attack.

## Approach

Two coordinated changes in line with the issue's proposed fix:

1. Move recv-side metering past authentication. `record_packet_received`
   is now called from `PeerConnection::recv` immediately after a packet
   passes `try_decrypt_sym` against the established symmetric session
   key — i.e. only for packets we know came from a peer holding the
   shared secret. The call is removed from `Socket::recv_from`.
   Outbound metering stays at the socket layer because send targets
   are controlled by us; there is no spoof vector on send.

2. LRU-evict the per-peer table when full. `PeerTransferStats` gained
   a `last_seen_tick` atomic stamped from a global monotonic counter
   on every update. When the table is full and a new peer arrives,
   `record_per_peer` evicts the entry with the smallest tick before
   inserting. Pure tick-based ordering avoids a wall-clock dependency
   and keeps simulation determinism intact.

3. As an eager-cleanup complement to LRU, `TRANSPORT_METRICS.remove_peer`
   is called from `record_peer_disconnected`, so authenticated peer
   teardowns free their slot immediately rather than waiting for LRU.

## Testing

- `udp_socket_recv_from_does_not_record_pre_authentication`: pins the
  contract that a forged UDP packet at the kernel handoff point creates
  no per-peer entry and does not contribute to the cumulative counter.
- `test_per_peer_lru_evicts_oldest`: pins the LRU ordering — refreshing
  the second-oldest entry's tick changes which entry is evicted on the
  next insert into a full table.
- `test_per_peer_capacity_bound`: updated to reflect the LRU semantics
  (a new peer is now tracked at the cost of evicting the oldest, where
  pre-fix the new peer was silently dropped).
- `test_remove_peer_frees_slot` / `test_record_per_peer_is_idempotent_under_remove`:
  pin the disconnect-cleanup path and the no-op behavior of removing a
  non-existent / already-evicted peer.
- `udp_socket_records_packet_metrics`: updated — only the send side
  asserts cumulative + per-peer movement; recv-side accounting is
  exercised by the post-auth path in the integration suite.

Closes #3999

[AI-assisted - Claude]
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

Now I have everything I need to write the review.

Rule Review: No blocking issues; one test-isolation note

Rules checked: git-workflow.md, code-style.md, testing.md, transport.md
Files reviewed: 5 (network_status.rs, transport.rs, connection_handler.rs, metrics.rs, peer_connection.rs)


Warnings

None.


Info

  • crates/core/src/transport/peer_connection.rsrecv_records_packet_metrics_post_authentication uses the process-global TRANSPORT_METRICS singleton and guards its precondition with an is_none() check against the hardcoded address 10.99.99.1:50001. Under cargo test --test-threads > 1, two parallel invocations of the same test can race on that check: Thread B can find the entry inserted by Thread A before Thread A reaches the remove_peer cleanup, causing B's precondition assertion to fail. Zero risk under cargo nextest (per-process isolation). Consider documenting the nextest requirement above the test or obtaining the port dynamically with bind(0) + local_addr(). (rule: testing.md — concurrent/racy scenarios)

Summary of checks:

Rule area Result
fix: PR has regression tests ✓ — 5 new test functions pinning specific bug scenarios
.unwrap() in new production code ✓ — none; all new .unwrap() are in test code
Post-authentication metering placement ✓ — record_packet_received called only after successful try_decrypt_sym / asymmetric decrypt at all four new call sites
LRU eviction bounded and non-atomic acknowledged ✓ — comment in record_per_peer explicitly accepts transient overrun
remove_peer on disconnect (GC exemption TTL rule) ✓ — eager cleanup in record_peer_disconnected, LRU as safety net
No fire-and-forget spawns, no unguarded biased;
PR description sections Not verifiable from diff alone

Rule review against .claude/rules/. WARNING findings block merge.

sanity and others added 2 commits May 4, 2026 15:23
Cleanup pass on the #3999 fix:

- Drop "Fast path"/"Slow path" labels in `record_per_peer`; the
  branch structure is self-evident and the load-bearing comment
  about non-atomic capacity/eviction stays.
- In `udp_socket_records_packet_metrics`, remove the dead
  `cumulative_recv_before` capture and the `let _ = ...` discard
  that replaced the recv-side assertion. Fold the two adjacent
  comments about the "no per-peer entry for sender_addr" invariant
  into one block.
- In `udp_socket_recv_from_does_not_record_pre_authentication`,
  drop the trivially-monotonic `cumulative_bytes_received >=
  cumulative_recv_before` assertion (cumulative is monotonic by
  construction, so the check pinned nothing). The per-peer
  absence assertion is the load-bearing pin for #3999.
- Rename `test_record_per_peer_is_idempotent_under_remove` to
  `test_remove_peer_is_idempotent` — the test exercises
  `remove_peer`, not `record_per_peer`.

No behavioral change; LRU eviction, post-auth metering, and
remove_peer wiring are unchanged.

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…post-auth metering test

Address review findings on PR #4024:

1. **Meter handshake-completion bytes** (code-first reviewer MEDIUM,
   skeptical reviewer H2). Three previously-unmetered authenticated
   inbound paths now call `record_packet_received`:
   - `peer_connection.rs::recv` intro-packet asymmetric decrypt success
     path (peer restart-detection)
   - `connection_handler.rs::gateway_connection` symmetric ACK
     reception
   - `connection_handler.rs::traverse_nat` outbound-handshake
     symmetric ACK reception

   Pre-fix these counted at the socket layer; without these calls the
   "Total Data Received" gauge would silently undercount handshake
   traffic on a busy gateway.

2. **Add post-auth metering regression test** (testing reviewer
   CRITICAL #1, #2). New `recv_records_packet_metrics_post_authentication`
   exercises the full path through `PeerConnection::recv`: encrypts
   a 100-byte ShortMessage, feeds it via the inbound channel, and
   asserts both that (a) the per-peer entry advanced by the **wire
   (encrypted) size**, not the decrypted size, and (b) the cumulative
   counter moved by at least the wire size. Pins both the
   call-site-presence and the byte-semantics invariants so a future
   refactor can't silently rot the dashboard.

3. **Fix stale field comment** (big-picture reviewer MEDIUM). The
   field-level comment on `cumulative_bytes_sent`/`cumulative_bytes_received`
   claimed both counters update at the socket layer "for every packet";
   after the post-auth move that is only true for the send side.
   Rewritten to describe the asymmetry and cite #3999.

[AI-assisted - Claude]
@sanity
Copy link
Copy Markdown
Collaborator Author

sanity commented May 4, 2026

Review responses

Five reviews ran in parallel (code-first, testing, skeptical, big-picture, Codex). Summary of how each finding was addressed in 83ae73b:

Addressed in code

Code-first MEDIUM — handshake-completion bytes silently dropped from dashboard. Added record_packet_received at the three authenticated-decrypt-success sites identified by the reviewer plus the skeptical reviewer:

  • connection_handler.rs::gateway_connection (gateway-side symmetric ACK)
  • connection_handler.rs::traverse_nat (client-side symmetric ACK)
  • peer_connection.rs::recv intro-packet asymmetric-decrypt success path

Big-picture MEDIUM — stale field comment. The struct comment on cumulative_bytes_sent/cumulative_bytes_received still claimed both update at the socket layer "for every packet"; rewrote to describe the post-auth asymmetry and cite #3999.

Testing CRITICAL #1 + #2 — post-auth recv hook untested; encrypted-byte semantics unpinned. Added recv_records_packet_metrics_post_authentication that builds an encrypted ShortMessage, feeds it via the inbound channel, drives one iteration of PeerConnection::recv, and asserts both that the per-peer entry advanced by the wire (encrypted) size (not the decrypted size) and that the cumulative counter moved.

Considered but not changed

Codex P2 — backpressure-dropped packets at try_send_established no longer counted. Real but acceptable tradeoff. INBOUND_CHANNEL_CAPACITY = 2048 so this only fires under sustained heavy load, and the alternative (counting at the listener pre-decrypt) reintroduces the spoof inflation attack the issue is closing. Under-reporting under load is "users see truth modulo backpressure drops"; spoof-inflated counters are "users see fiction".

Skeptical H1 — replay/retransmit double-counts. Both pre-fix and post-fix count retransmits N times because they ARE on the wire N times — that's correct semantics for a wire-byte counter. The post-fix change does not introduce a new threat: an authenticated peer who replays packets to inflate their own dashboard entry can do the same by sending fresh valid traffic. Anonymous spoof inflation (the actual issue) is closed.

Skeptical H3 — concurrent-burst over-eviction race. Real but bounded. With N concurrent inserts at capacity, up to N evictions can fire instead of 1. In practice rare and self-healing; for a dashboard counter the imprecision is acceptable. Inline comment in record_per_peer already acknowledges the transient over-capacity. Adding a Mutex would serialize the hot path for an aesthetic improvement that doesn't change correctness.

Skeptical H4 — remove_peer racing with record_per_peer orphans the increment. DashMap 6.1's Ref<K, V> holds a per-shard RwLockReadGuard for its lifetime, so a concurrent remove (which needs the write lock) cannot interleave with our fetch_add on the held guard. The race the reviewer described is specific to Arc<RwLock<HashMap>> patterns, not DashMap. Verified against dashmap 6.1.0.

Big-picture LOW (other suggestions). Documentation polish, follow-up scope. Filing or addressing these would belong in a follow-up PR rather than block #3999.

Tests

  • cargo test -p freenet --lib transport:: — 565 passed
  • New test recv_records_packet_metrics_post_authentication — passes
  • cargo clippy --all-targets -- -D warnings — clean
  • cargo fmt — clean

[AI-assisted - Claude]

@sanity sanity added this pull request to the merge queue May 4, 2026
Merged via the queue into main with commit 7738928 May 4, 2026
17 checks passed
@sanity sanity deleted the fix-3999 branch May 4, 2026 21:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Move recv-side per-peer metering past authentication + add LRU eviction

1 participant