
fix: revert weakened mesh assertion and properly debug six-peer connection formation #106

@sanity


Problem

PRs #103, #104, and #105 weakened the six-peer regression test's mesh topology assertion to work around connection formation failures in Docker NAT environments. This is the wrong approach: weakening a test so that it passes masks real issues.

What was changed (needs reverting)

  1. PR #103 (fix: improve mesh topology assertion to wait for connection formation): rewrote assert_mesh_topology as a polling loop requiring >= 2 P2P connections per peer with a 90s timeout. This part was reasonable.
  2. PR #104 (fix: add post-subscription propagation delay for six-peer test): added a 15s post-subscription propagation delay. This is a reasonable mitigation but may mask timing bugs.
  3. PR #105 (fix: lower mesh assertion to 1 P2P connection for Docker NAT reliability): lowered the minimum from 2 to 1 P2P connection and reduced the timeout from 90s to 60s. This weakened the test and should be reverted.
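For reference, the polling approach from #103 (with min_p2p_per_peer=2 and max_wait=90s) can be sketched as below. This is a hypothetical Rust sketch, not the test's actual code: `wait_for_mesh` is an illustrative name, and the `p2p_counts` closure stands in for however the real assert_mesh_topology queries each peer's P2P connections.

```rust
use std::collections::HashMap;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical polling assertion: waits until every peer reports at least
/// `min_p2p_per_peer` P2P connections, or fails once `max_wait` elapses.
/// `p2p_counts` is a stand-in for the real per-peer connection query.
fn wait_for_mesh<F>(
    mut p2p_counts: F,
    min_p2p_per_peer: usize,
    max_wait: Duration,
    poll_interval: Duration,
) -> Result<(), String>
where
    F: FnMut() -> HashMap<String, usize>,
{
    let start = Instant::now();
    loop {
        // Collect peers that are still below the required P2P connection count.
        let mut lagging: Vec<(String, usize)> = p2p_counts()
            .into_iter()
            .filter(|(_, n)| *n < min_p2p_per_peer)
            .collect();
        if lagging.is_empty() {
            return Ok(());
        }
        if start.elapsed() >= max_wait {
            // Report which peers fell short instead of failing with a bare timeout.
            lagging.sort();
            return Err(format!(
                "mesh not formed after {:?}; peers below {} P2P connections: {:?}",
                max_wait, min_p2p_per_peer, lagging
            ));
        }
        thread::sleep(poll_interval);
    }
}

fn main() {
    // Simulated peers: peer-b gains its second P2P connection on the third poll.
    let mut polls = 0;
    let result = wait_for_mesh(
        || {
            polls += 1;
            let mut m = HashMap::new();
            m.insert("peer-a".to_string(), 2);
            m.insert("peer-b".to_string(), if polls >= 3 { 2 } else { 1 });
            m
        },
        2,
        Duration::from_secs(90),
        Duration::from_millis(10),
    );
    println!("{:?}", result);
}
```

With the simulated counts in `main`, peer-b reaches its second P2P connection on the third poll, so the sketch prints `Ok(())`. Listing the lagging peers on timeout, rather than only failing, gives the root-cause investigation something concrete to start from.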

What should happen instead

The mesh assertion should require >= 2 P2P connections per peer. If peers can't form enough connections in Docker NAT within the timeout, the root cause needs investigation:

  1. Why can't some peers form a second P2P connection? The topology manager sets min_connections=4 but peers consistently end up with only 1-3 P2P connections after 90+ seconds
  2. Is the topology manager's connection selection algorithm too slow for small networks? With DENSITY_SELECTION_THRESHOLD=5, peers with < 5 connections stay in "local neighborhood" mode, which may not add connections aggressively
  3. Is Docker NAT hole punching unreliable for certain peer pairs? Some peer pairs may fail NAT traversal consistently
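Question 2 follows from the two constants alone. The sketch below is a hypothetical model, not the topology manager's actual code: it only encodes the thresholds as this issue describes them (min_connections=4, DENSITY_SELECTION_THRESHOLD=5, "local neighborhood" mode below the threshold). Under that reading, every peer still short of min_connections is necessarily in local neighborhood mode, so that mode alone has to close the gap.

```rust
/// Hypothetical model of the two thresholds named in this issue. The real
/// topology manager in freenet-core may apply them differently.
const MIN_CONNECTIONS: usize = 4;
const DENSITY_SELECTION_THRESHOLD: usize = 5;

#[derive(Debug, PartialEq)]
enum SelectionMode {
    LocalNeighborhood,
    DensityBased,
}

/// Mode a peer would be in for a given total connection count (assumed rule).
fn selection_mode(total_connections: usize) -> SelectionMode {
    if total_connections < DENSITY_SELECTION_THRESHOLD {
        SelectionMode::LocalNeighborhood
    } else {
        SelectionMode::DensityBased
    }
}

/// Whether a peer is still below the minimum connection target.
fn needs_more_connections(total_connections: usize) -> bool {
    total_connections < MIN_CONNECTIONS
}

fn main() {
    for total in 1..=6 {
        println!(
            "total={} needs_more={} mode={:?}",
            total,
            needs_more_connections(total),
            selection_mode(total)
        );
    }
    // Every count short of MIN_CONNECTIONS is also below the density threshold,
    // so an under-connected peer is always in local neighborhood mode.
    assert!((0..MIN_CONNECTIONS).all(|c| selection_mode(c) == SelectionMode::LocalNeighborhood));
}
```

For totals 1 through 4 the sketch reports `LocalNeighborhood`; only at 5 and above does it switch to `DensityBased`.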

Root cause investigation needed

The topology manager in freenet-core (crates/core/src/topology/mod.rs) adjusts connections periodically. In a 6-peer network:

  • min_connections=4 means each peer should have at least 4 connections (including gateway)
  • After 120+ seconds (30s stabilization + 90s polling), some peers still have only 2 total connections (gateway + 1 P2P)
  • The topology manager's adjust_topology runs every ~15s but isn't forming connections fast enough
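A rough count of adjustment rounds makes the gap concrete. Assuming (hypothetically) that each successful adjust_topology round adds at least one connection to an under-connected peer, a peer at 2 total connections would reach min_connections=4 in about two rounds (~30s), yet roughly 8 rounds elapse in the 120s window above. The helper names below are illustrative only.

```rust
/// Hypothetical back-of-the-envelope helpers using the figures in this issue.
/// Assumption: each successful round adds at least one connection.
fn adjustment_rounds(elapsed_secs: u64, interval_secs: u64) -> u64 {
    elapsed_secs / interval_secs
}

/// Connections still missing to reach the minimum (0 if already met).
fn connection_deficit(min_connections: u32, observed_total: u32) -> u32 {
    min_connections.saturating_sub(observed_total)
}

fn main() {
    let interval = 15; // adjust_topology period, ~15s
    let elapsed = 30 + 90; // stabilization + polling, in seconds
    let rounds = adjustment_rounds(elapsed, interval);
    let deficit = connection_deficit(4, 2); // gateway + 1 P2P observed
    // Time to close the deficit if every round added one connection.
    let expected_secs = deficit as u64 * interval;
    println!("rounds={rounds} deficit={deficit} expected_close_secs={expected_secs}");
}
```

This prints `rounds=8 deficit=2 expected_close_secs=30`, which suggests most rounds are not adding connections for the lagging peers (or added connections are being dropped).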

Additionally, subscription propagation (freenet-core issue #3037) relies on proximity cache announcements that need time to propagate bidirectionally. This is a separate but related issue.

Action items

  1. Revert PR #105's changes (restore min_p2p_per_peer=2 and max_wait=90s)
  2. Keep PR #104's subscription propagation delay (this is a valid mitigation)
  3. Investigate why the topology manager can't form enough connections in 90+ seconds in Docker NAT
  4. Fix the connection formation issue in freenet-core
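For action item 3, one way to test the NAT hypothesis is to tally connection attempts per peer pair and look for pairs that never succeed. The sketch below is hypothetical: the `Attempt` record and its fields are illustrative, and real data would have to be extracted from the test's peer logs in whatever form they take.

```rust
use std::collections::BTreeMap;

/// Hypothetical record of one connection attempt between two peers.
struct Attempt {
    from: &'static str,
    to: &'static str,
    succeeded: bool,
}

/// Tally attempts per unordered peer pair and return pairs that never
/// succeeded, which would point at consistent NAT traversal failures.
fn never_connected(attempts: &[Attempt]) -> Vec<(&'static str, &'static str, usize)> {
    // Key: normalized pair; value: (attempts, successes).
    let mut tally: BTreeMap<(&'static str, &'static str), (usize, usize)> = BTreeMap::new();
    for a in attempts {
        // Normalize the pair so (a, b) and (b, a) are counted together.
        let key = if a.from <= a.to { (a.from, a.to) } else { (a.to, a.from) };
        let entry = tally.entry(key).or_insert((0, 0));
        entry.0 += 1;
        if a.succeeded {
            entry.1 += 1;
        }
    }
    tally
        .into_iter()
        .filter(|(_, (_, ok))| *ok == 0)
        .map(|((x, y), (tries, _))| (x, y, tries))
        .collect()
}

fn main() {
    let attempts = vec![
        Attempt { from: "peer-a", to: "peer-b", succeeded: true },
        Attempt { from: "peer-c", to: "peer-d", succeeded: false },
        Attempt { from: "peer-d", to: "peer-c", succeeded: false },
        Attempt { from: "peer-a", to: "peer-c", succeeded: false },
        Attempt { from: "peer-c", to: "peer-a", succeeded: true },
    ];
    for (x, y, tries) in never_connected(&attempts) {
        println!("{x} <-> {y}: 0/{tries} attempts succeeded");
    }
}
```

Here only `peer-c <-> peer-d` is reported. If the same pairs fail across repeated runs, that points at NAT traversal; if failures are spread randomly, the topology manager's timing is the more likely culprit.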

[AI-assisted - Claude]
