fix(transport): meter recv-side wire bytes post-authentication + LRU evict per-peer table #4024
…evict per-peer table

## Problem

PR #3996 moved dashboard wire-byte accounting from stream-completion to the UDP socket layer so the local-peer dashboard reflects keep-alives, ACKs, and small contract ops. That fix exposed a pre-existing weakness in the per-peer DashMap:

- `TransportMetrics::record_packet_received` ran from `Socket::recv_from` on every received UDP packet, before any decryption or peer-validation.
- `record_per_peer` had no eviction policy and a 256-entry cap.

Combined, an attacker who fabricates UDP packets from many spoofed source IPs could (a) fill the per-peer table to capacity, after which legitimate new peers were silently dropped from the dashboard's per-peer view forever, and (b) inflate `cumulative_bytes_received` arbitrarily, making the dashboard's "Total Data Received" gauge attacker-controlled.

The same no-eviction policy also caused stale entries to accumulate on long-running gateways as peer connections churned, even without a spoof attack.

## Approach

Two coordinated changes in line with the issue's proposed fix:

1. Move recv-side metering past authentication. `record_packet_received` is now called from `PeerConnection::recv` immediately after a packet passes `try_decrypt_sym` against the established symmetric session key, i.e. only for packets we know came from a peer holding the shared secret. The call is removed from `Socket::recv_from`. Outbound metering stays at the socket layer because send targets are controlled by us; there is no spoof vector on send.
2. LRU-evict the per-peer table when full. `PeerTransferStats` gained a `last_seen_tick` atomic stamped from a global monotonic counter on every update. When the table is full and a new peer arrives, `record_per_peer` evicts the entry with the smallest tick before inserting. Pure tick-based ordering avoids a wall-clock dependency and keeps simulation determinism intact.
3. As an eager-cleanup complement to LRU, `TRANSPORT_METRICS.remove_peer` is called from `record_peer_disconnected`, so authenticated peer teardowns free their slot immediately rather than waiting for LRU.

## Testing

- `udp_socket_recv_from_does_not_record_pre_authentication`: pins the contract that a forged UDP packet at the kernel handoff point creates no per-peer entry and does not contribute to the cumulative counter.
- `test_per_peer_lru_evicts_oldest`: pins the LRU ordering. Refreshing the second-oldest entry's tick changes which entry is evicted on the next insert into a full table.
- `test_per_peer_capacity_bound`: updated to reflect the LRU semantics (a new peer is now tracked at the cost of evicting the oldest, where pre-fix the new peer was silently dropped).
- `test_remove_peer_frees_slot` / `test_record_per_peer_is_idempotent_under_remove`: pin the disconnect-cleanup path and the no-op behavior of removing a non-existent / already-evicted peer.
- `udp_socket_records_packet_metrics`: updated. Only the send side asserts cumulative + per-peer movement; recv-side accounting is exercised by the post-auth path in the integration suite.

Closes #3999

[AI-assisted - Claude]
Rule Review: No blocking issues; one test-isolation note

Warnings: None.
Cleanup pass on the #3999 fix:

- Drop "Fast path"/"Slow path" labels in `record_per_peer`; the branch structure is self-evident and the load-bearing comment about non-atomic capacity/eviction stays.
- In `udp_socket_records_packet_metrics`, remove the dead `cumulative_recv_before` capture and the `let _ = ...` discard that replaced the recv-side assertion. Fold the two adjacent comments about the "no per-peer entry for sender_addr" invariant into one block.
- In `udp_socket_recv_from_does_not_record_pre_authentication`, drop the trivially-monotonic `cumulative_bytes_received >= cumulative_recv_before` assertion (cumulative is monotonic by construction, so the check pinned nothing). The per-peer absence assertion is the load-bearing pin for #3999.
- Rename `test_record_per_peer_is_idempotent_under_remove` to `test_remove_peer_is_idempotent`: the test exercises `remove_peer`, not `record_per_peer`.

No behavioral change; LRU eviction, post-auth metering, and `remove_peer` wiring are unchanged.

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…post-auth metering test

Address review findings on PR #4024:

1. **Meter handshake-completion bytes** (code-first reviewer MEDIUM, skeptical reviewer H2). Three previously-unmetered authenticated inbound paths now call `record_packet_received`:
   - `peer_connection.rs::recv` intro-packet asymmetric decrypt success path (peer restart-detection)
   - `connection_handler.rs::gateway_connection` symmetric ACK reception
   - `connection_handler.rs::traverse_nat` outbound-handshake symmetric ACK reception

   Pre-fix these counted at the socket layer; without these calls the "Total Data Received" gauge would silently undercount handshake traffic on a busy gateway.
2. **Add post-auth metering regression test** (testing reviewer CRITICAL #1, #2). New `recv_records_packet_metrics_post_authentication` exercises the full path through `PeerConnection::recv`: it encrypts a 100-byte ShortMessage, feeds it via the inbound channel, and asserts both that (a) the per-peer entry advanced by the **wire (encrypted) size**, not the decrypted size, and (b) the cumulative counter moved by at least the wire size. Pins both the call-site-presence and the byte-semantics invariants so a future refactor can't silently rot the dashboard.
3. **Fix stale field comment** (big-picture reviewer MEDIUM). The field-level comment on `cumulative_bytes_sent`/`cumulative_bytes_received` claimed both counters update at the socket layer "for every packet"; after the post-auth move that is only true for the send side. Rewritten to describe the asymmetry and cite #3999.

[AI-assisted - Claude]
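The byte-semantics invariant in item 2 (meter the wire size, and only after the packet authenticates) can be illustrated with a minimal sketch. Everything here is hypothetical: `Metrics`, `toy_encrypt`, `toy_try_decrypt`, and `recv` are illustrative names, and the XOR-plus-checksum scheme is a toy stand-in for `try_decrypt_sym`, not real cryptography and not the project's code.

```rust
/// Hypothetical stand-in for the cumulative recv counter.
#[derive(Default)]
struct Metrics {
    bytes_received: u64,
}

/// Toy "encryption": XOR every byte with the key, then append a one-byte
/// checksum tag. This is NOT real cryptography; it only gives the sketch a
/// ciphertext that is longer than the plaintext and can fail to validate.
fn toy_encrypt(key: u8, plain: &[u8]) -> Vec<u8> {
    let mut wire: Vec<u8> = plain.iter().map(|b| b ^ key).collect();
    let tag = wire.iter().fold(key, |acc, b| acc ^ b);
    wire.push(tag);
    wire
}

/// Toy stand-in for the authentication step: returns the plaintext only if
/// the trailing tag matches, otherwise `None`.
fn toy_try_decrypt(key: u8, wire: &[u8]) -> Option<Vec<u8>> {
    let body_len = wire.len().checked_sub(1)?;
    let (body, tag) = wire.split_at(body_len);
    let expected = body.iter().fold(key, |acc, b| acc ^ b);
    if tag[0] != expected {
        return None;
    }
    Some(body.iter().map(|b| b ^ key).collect())
}

/// Receive path: metering happens only after the packet validates, and it
/// records the wire (encrypted) length rather than the decrypted length.
fn recv(metrics: &mut Metrics, key: u8, wire: &[u8]) -> Option<Vec<u8>> {
    let plain = toy_try_decrypt(key, wire)?; // forged packets return here, unmetered
    metrics.bytes_received += wire.len() as u64;
    Some(plain)
}
```

In this sketch a 100-byte payload occupies 101 bytes on the wire, so a successful `recv` advances the counter by 101, while a packet whose tag does not validate leaves it unchanged.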
Review responses

Five reviews ran in parallel (code-first, testing, skeptical, big-picture, Codex). Summary of how each finding was addressed in 83ae73b:

Addressed in code

- Code-first MEDIUM: handshake-completion bytes silently dropped from dashboard. Added
- Big-picture MEDIUM: stale field comment. The struct comment on
- Testing CRITICAL #1 + #2: post-auth recv hook untested; encrypted-byte semantics unpinned. Added

Considered but not changed

- Codex P2: backpressure-dropped packets at
- Skeptical H1: replay/retransmit double-counts. Both pre-fix and post-fix count retransmits N times because they ARE on the wire N times; that is the correct semantics for a wire-byte counter. The post-fix change does not introduce a new threat: an authenticated peer who replays packets to inflate their own dashboard entry can do the same by sending fresh valid traffic. Anonymous spoof inflation (the actual issue) is closed.
- Skeptical H3: concurrent-burst over-eviction race. Real but bounded. With N concurrent inserts at capacity, up to N evictions can fire instead of 1. In practice this is rare and self-healing; for a dashboard counter the imprecision is acceptable. Inline comment in
- Skeptical H4:
- Big-picture LOW (other suggestions). Documentation polish, follow-up scope. Filing or addressing these would belong in a follow-up PR rather than block #3999.

Tests

[AI-assisted - Claude]
Problem
PR #3996 moved dashboard wire-byte accounting from stream-completion to the UDP socket layer so the local-peer dashboard reflects keep-alives, ACKs, and small contract ops — not just stream transfers. That fix exposed a pre-existing weakness in the per-peer DashMap:
- `TransportMetrics::record_packet_received` ran from `Socket::recv_from` on every received UDP packet, before any decryption or peer-validation.
- `record_per_peer` had no eviction policy and a 256-entry cap.

Combined, an attacker that fabricates UDP packets from many spoofed source IPs could:

- fill the per-peer table to capacity, after which legitimate new peers were silently dropped from the dashboard's per-peer view forever;
- inflate `cumulative_bytes_received` arbitrarily, making the dashboard's "Total Data Received" gauge attacker-controlled.

The same no-eviction policy also caused stale entries to accumulate on long-running gateways as peer connections churned, even without a spoof attack, so a gateway with high peer turnover would eventually display "—" for SENT/RECV on legitimate new peers.
Approach
Two coordinated changes that follow the fix proposed in #3999, plus an eager-cleanup complement:
1. Move recv-side metering past authentication.

`record_packet_received` is now called from `PeerConnection::recv` immediately after a packet passes `try_decrypt_sym` against the established symmetric session key, i.e. only for packets we know came from a peer holding the shared secret. The call was removed from `Socket::recv_from`. Outbound metering stays at the socket layer because send targets are controlled by us; there is no spoof vector on send.

2. LRU-evict the per-peer table when full.

`PeerTransferStats` gained a `last_seen_tick` atomic stamped from a global monotonic counter on every update. When the table is full and a new peer arrives, `record_per_peer` evicts the entry with the smallest tick before inserting. Pure tick-based ordering avoids a wall-clock dependency and keeps simulation determinism intact (the rules in `.claude/rules/code-style.md` discourage `Instant::now()` in `crates/core/`).

3. Eager cleanup on disconnect.

`TRANSPORT_METRICS.remove_peer` is called from `record_peer_disconnected`, so authenticated peer teardowns free their slot immediately rather than waiting for LRU eviction.

Testing

- `udp_socket_recv_from_does_not_record_pre_authentication` (new): pins the spoof regression. A forged UDP packet at the kernel handoff point creates no per-peer entry and contributes nothing to the cumulative counter on its own.
- `test_per_peer_lru_evicts_oldest` (new): pins the LRU ordering. Refreshing the second-oldest entry's tick changes which entry is evicted on the next insert into a full table.
- `test_per_peer_capacity_bound` (updated): reflects the new LRU semantics. Pre-fix the over-capacity peer was silently dropped; post-fix it is tracked at the cost of evicting the oldest entry.
- `test_remove_peer_frees_slot` and `test_record_per_peer_is_idempotent_under_remove` (new): pin the disconnect-cleanup path and the no-op behavior of removing a non-existent / already-evicted peer.
- `udp_socket_records_packet_metrics` (updated): only the send side now asserts cumulative + per-peer movement; recv-side accounting moved to the post-auth path.

All 12 metrics unit tests + 13 transport dual-stack tests + full freenet lib suite (2538 tests) pass locally.
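The tick-based LRU behavior that `test_per_peer_lru_evicts_oldest` and `test_remove_peer_frees_slot` pin can be sketched roughly as follows. This is a single-threaded illustration under stated assumptions, not the code in the PR: `PeerTable`, `PeerStats`, and `record` are hypothetical names, a plain `HashMap` with `u64` fields stands in for the DashMap and atomics, and the capacity is a constructor parameter rather than the fixed 256-entry cap.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Hypothetical per-peer entry; the real struct uses atomics.
#[derive(Default)]
struct PeerStats {
    bytes_received: u64,
    last_seen_tick: u64,
}

/// Hypothetical bounded per-peer table with tick-based LRU eviction.
struct PeerTable {
    capacity: usize,
    tick: u64, // monotonic counter, no wall-clock dependency
    peers: HashMap<SocketAddr, PeerStats>,
}

impl PeerTable {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "table must hold at least one peer");
        Self { capacity, tick: 0, peers: HashMap::new() }
    }

    /// Record bytes for `peer`, evicting the least recently seen entry
    /// first if the table is full and `peer` is not yet tracked.
    fn record(&mut self, peer: SocketAddr, bytes: u64) {
        self.tick += 1;
        if !self.peers.contains_key(&peer) && self.peers.len() >= self.capacity {
            // The entry with the smallest tick is the least recently updated.
            let oldest = self
                .peers
                .iter()
                .min_by_key(|(_, stats)| stats.last_seen_tick)
                .map(|(addr, _)| *addr);
            if let Some(addr) = oldest {
                self.peers.remove(&addr);
            }
        }
        let entry = self.peers.entry(peer).or_default();
        entry.bytes_received += bytes;
        entry.last_seen_tick = self.tick;
    }

    /// Eager cleanup on disconnect. Returns whether an entry was removed;
    /// removing an absent or already-evicted peer is a no-op.
    fn remove_peer(&mut self, peer: &SocketAddr) -> bool {
        self.peers.remove(peer).is_some()
    }
}
```

With a capacity of two, recording peers A and B, then A again, then a new peer C evicts B rather than A, because A's tick was refreshed. The linear scan for the minimum tick is acceptable in this sketch because it only runs when a new peer arrives at a full table.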
Fixes
Closes #3999
[AI-assisted - Claude]