Description
Problem
PRs #103, #104, and #105 weakened the six-peer regression test's mesh topology assertion to work around connection formation failures in Docker NAT environments. This is the wrong approach — weakening tests to make them pass masks real issues.
What was changed (needs reverting)
- PR #103 ("fix: improve mesh topology assertion to wait for connection formation"): Rewrote assert_mesh_topology with a polling loop requiring >= 2 P2P connections per peer (90s timeout). This part was reasonable.
- PR #104 ("fix: add post-subscription propagation delay for six-peer test"): Added a 15s post-subscription propagation delay. This is a reasonable mitigation but may mask timing bugs.
- PR #105 ("fix: lower mesh assertion to 1 P2P connection for Docker NAT reliability"): Lowered the minimum from 2 to 1 P2P connection and reduced the timeout from 90s to 60s. This weakened the test and should be reverted.
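For reference, the stricter assertion described above can be sketched roughly as follows. This is not the actual assert_mesh_topology implementation: the function names, the `probe` closure (a stand-in for however the test queries each peer's P2P connection count), and the polling interval are all assumptions made for illustration. Only the threshold of 2 and the 90s timeout come from this issue.

```rust
use std::collections::HashMap;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Values this issue asks to restore (the state before PR #105).
const MIN_P2P_PER_PEER: usize = 2;
const MAX_WAIT: Duration = Duration::from_secs(90);

/// Returns the peers whose P2P connection count is below `min`, sorted by name.
fn peers_below_minimum(counts: &HashMap<String, usize>, min: usize) -> Vec<String> {
    let mut lagging: Vec<String> = counts
        .iter()
        .filter(|(_, n)| **n < min)
        .map(|(peer, _)| peer.clone())
        .collect();
    lagging.sort();
    lagging
}

/// Polls `probe` until every peer reports at least `min` P2P connections,
/// or returns the lagging peers once `max_wait` has elapsed.
/// `probe` is hypothetical: it stands in for the test's real per-peer query.
fn wait_for_mesh<F>(
    mut probe: F,
    min: usize,
    max_wait: Duration,
    interval: Duration,
) -> Result<(), Vec<String>>
where
    F: FnMut() -> HashMap<String, usize>,
{
    let deadline = Instant::now() + max_wait;
    loop {
        let lagging = peers_below_minimum(&probe(), min);
        if lagging.is_empty() {
            return Ok(());
        }
        if Instant::now() >= deadline {
            return Err(lagging);
        }
        sleep(interval);
    }
}
```

A test would call `wait_for_mesh(probe, MIN_P2P_PER_PEER, MAX_WAIT, interval)` and fail with the returned peer names. Reporting which peers lag, rather than only a pass/fail result, gives the root cause investigation something to start from.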
What should happen instead
The mesh assertion should require >= 2 P2P connections per peer. If peers can't form enough connections in Docker NAT within the timeout, the root cause needs investigation:
- Why can't some peers form a second P2P connection? The topology manager sets min_connections=4, but peers consistently end up with only 1-3 P2P connections after 90+ seconds.
- Is the topology manager's connection selection algorithm too slow for small networks? With DENSITY_SELECTION_THRESHOLD=5, peers with < 5 connections stay in "local neighborhood" mode, which may not aggressively add connections.
- Is Docker NAT hole punching unreliable for certain peer pairs? Some peer pairs may fail NAT traversal consistently.
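One way to approach the last question is to record which peer pairs were connected at the end of each test run, then list the pairs that never connected in any run. The sketch below assumes such per-run pair lists can be collected. The input format and function names are hypothetical and not part of freenet-core or the test harness.

```rust
use std::collections::BTreeSet;

/// An unordered peer pair, stored with the lexicographically smaller name first.
type Pair = (String, String);

fn normalize(a: &str, b: &str) -> Pair {
    if a <= b {
        (a.to_string(), b.to_string())
    } else {
        (b.to_string(), a.to_string())
    }
}

/// Returns every pair of `peers` that was not connected in any of `runs`.
/// Each run is a list of (peer, peer) connections observed at the end of that run.
fn never_connected(peers: &[&str], runs: &[Vec<(&str, &str)>]) -> BTreeSet<Pair> {
    // Collect every pair seen connected at least once, ignoring direction.
    let mut seen: BTreeSet<Pair> = BTreeSet::new();
    for run in runs {
        for (a, b) in run {
            seen.insert(normalize(a, b));
        }
    }
    // Enumerate all possible pairs and keep the ones never observed.
    let mut missing = BTreeSet::new();
    for (i, a) in peers.iter().enumerate() {
        for b in &peers[i + 1..] {
            let pair = normalize(a, b);
            if !seen.contains(&pair) {
                missing.insert(pair);
            }
        }
    }
    missing
}
```

If the same pairs show up as missing across many runs, that points toward pair-specific NAT traversal failures. If the missing pairs vary from run to run, the topology manager's selection speed is the more likely suspect.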
Root cause investigation needed
The topology manager in freenet-core (crates/core/src/topology/mod.rs) adjusts connections periodically. In a 6-peer network:
- min_connections=4 means each peer should have at least 4 connections (including gateway)
- After 120+ seconds (30s stabilization + 90s polling), some peers still have only 2 total connections (gateway + 1 P2P)
- The topology manager's adjust_topology runs every ~15s but isn't forming connections fast enough
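To put the figures above in perspective, the following back-of-the-envelope calculation estimates how many adjustment passes fit in the observation window and how far a lagging peer remains from the configured minimum. It assumes the ~15s interval is exact and that passes run back to back from time zero, which is a simplification. The helper names are illustrative only.

```rust
/// Number of complete adjustment passes that fit in the observation window.
/// Returns `None` if `interval_secs` is zero.
fn adjustment_passes(stabilization_secs: u64, polling_secs: u64, interval_secs: u64) -> Option<u64> {
    (stabilization_secs + polling_secs).checked_div(interval_secs)
}

/// Connections a peer still needs in order to reach the configured minimum.
fn connection_deficit(min_connections: u64, observed: u64) -> u64 {
    min_connections.saturating_sub(observed)
}
```

With the numbers from this issue, `adjustment_passes(30, 90, 15)` gives `Some(8)` and `connection_deficit(4, 2)` gives `2`. Under these assumptions, a lagging peer has had about eight adjustment passes and is still two connections short of min_connections.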
Additionally, subscription propagation (freenet-core issue #3037) relies on proximity cache announcements that need time to propagate bidirectionally. This is a separate but related issue.
Action items
- Revert PR #105 ("fix: lower mesh assertion to 1 P2P connection for Docker NAT reliability") by restoring min_p2p_per_peer=2 and max_wait=90s
- Keep the subscription propagation delay from PR #104 ("fix: add post-subscription propagation delay for six-peer test"); this is a valid mitigation
- Investigate why the topology manager can't form enough connections in 90+ seconds in Docker NAT
- Fix the connection formation issue in freenet-core
[AI-assisted - Claude]