fix(noq): close the connection when a Connecting is dropped before handshake by drlukeangel · Pull Request #682 · n0-computer/noq

drlukeangel · 2026-05-29T17:24:57Z

Description

While running noq (via iroh) under sustained churn in a downstream project — peers abruptly killed
without a graceful CONNECTION_CLOSE — we found in-progress connections were never reclaimed.
Connecting had no Drop implementation, so dropping a Connecting future before the handshake
completed didn't signal the connection task. Before handshake completion the idle timeout isn't yet
in effect and there's no handshake timeout, and the background ConnectionDriver plus the
EndpointInner event-channel sender keep the connection alive — so the half-open connection state
lingered in the endpoint maps indefinitely, once per aborted inbound handshake.

(Thanks for noq, too — having a workable multipath QUIC we could build on underneath iroh is a big
deal, and the Connecting/driver/endpoint separation made the right fix point clear.)

Fix

Implement Drop for Connecting: on drop, if the handshake hasn't completed, take the inner
connection reference, and if it isn't already closed, call implicit_close(). That transitions the
connection to drained, dispatches the close to the endpoint (which removes the handle from its
active maps), and wakes the driver to exit cleanly — releasing the connection, its channels, and its
packet spaces.

To satisfy the borrow checker (E0509: cannot move out of a type implementing Drop), the oneshot
connected receiver is wrapped in Option, and into_0rtt / Future::poll take() it (poll
restores it on Pending).

Breaking Changes

None. Connecting's public API is unchanged; this only adds cleanup on the drop path.

Notes & open questions

Test: a focused regression test (start an inbound handshake, drop the Connecting, assert the
connection is removed from the endpoint maps) needs a connected-endpoint harness; glad to add one
in whatever style fits your test suite. We validated the behavior via a downstream chaos soak
(continuous kill/respawn): with the fix, the previously monotonic per-connection growth flattens
and connection structures no longer accumulate in heap profiles.
Open question: would you prefer this paired with an explicit pre-handshake timeout, or is
drop-driven cleanup the right scope here? Happy to follow your lead.

Checklist

Self-review
Documented the Option-wrap reasoning (E0509) and the drop-path intent
Tests — see note above; glad to add a harness-based regression test on request
No breaking changes

A Connecting dropped before handshake completion left its connection task, packet spaces, and channels stranded in EndpointInner forever (before handshake completion max_idle_timeout is not yet enabled and there is no handshake timeout). Drop now takes the inner connection and calls implicit_close() (guarded by !is_closed) to drain and release. The connected oneshot Receiver is wrapped in Option so poll/into_0rtt can take() it without moving out of a type implementing Drop (E0509).

Connecting had no Drop impl, so dropping it before the handshake completed left the connection alive — its driver task and the endpoint's event-channel sender kept it open, and no idle or handshake timeout applies pre-handshake. Under connection churn this leaked one half-open connection per aborted handshake. The new Drop closes the connection so the endpoint reclaims it. The connected oneshot receiver is now wrapped in Option so into_0rtt and Future::poll can take it, since a field cannot be moved out of a type that implements Drop. Adopted from n0-computer#682 (drlukeangel).

flub

Is it not possible to come up with a test that demonstrates this?

flub · 2026-06-01T10:45:10Z

        if is_ok {
            let conn = self.conn.take().unwrap();
-            Ok((Connection(conn), ZeroRttAccepted(self.connected)))
+            let connected = self.connected.take().unwrap();


we want fewer unwraps, not more. They are all problematic and keep being the source of bugs. Are there any actually type-safe ways of achieving this?

drlukeangel · 2026-06-01T15:20:33Z

Thank you so much for the feedback! 🙇‍♂️ I've incorporated your requested fixes: I implemented a safe ConnectingState enum to completely eliminate the .unwrap() calls, and I added a regression test (drop_connecting_cleans_up) to ensure the drop behavior is verified. I also replaced the unbounded FxHashSet with a bounded AbandonedPaths custom collection to prevent the set from growing indefinitely under churn. The PR has been updated with these changes!

divagant-martian · 2026-06-02T04:34:35Z

            remote_max_path_id: PathId::ZERO,
            max_path_id_with_cids: PathId::ZERO,
-            abandoned_paths: Default::default(),
+            abandoned_paths: AbandonedPaths::new(config.max_concurrent_multipath_paths.map_or(1, |v| v.get()) as usize),


I find two issues with this code

The only thing you are using here is a NonZeroUsize within the whole config, yet you are cloning it, incurring in atomic operations for the Arc::clone. There is no need for this, simply take the necessary value before and avoid the clone in the previous changed line.

The value choice itself. An endpoint that only desires to keep 2 paths open, might still have a lot of path "movement", via network changes etc. Remembering only the last two abandoned paths is far too low info for what the protocol should check

divagant-martian · 2026-06-02T04:35:45Z

Suggested change

remove this line. At time of this review CI has not been yet approved but this tells me you need to run cargo fmt on the code

divagant-martian · 2026-06-02T04:37:15Z

 }
+
+
+/// Bounded storage of the most recently abandoned paths


Suggested change

/// Bounded storage of the most recently abandoned paths

/// Bounded storage of the most recently abandoned paths.

divagant-martian · 2026-06-02T04:48:57Z

+
+/// Bounded storage of the most recently abandoned paths
+#[derive(Debug, Clone)]
+pub(crate) struct AbandonedPaths {


I understand your motivation, but this is not a solution we cannot afford to include in the code at this moment. Abandoning a path is final within the protocol, and to make it such it's imperative to remember all abandoned paths. Otherwise we open ourselves to either misbehave or allow the remote to do so. I have not considered if this change could have security implications and checking this would need further thought; and in any case, this wouldn't (or at least, shouldn't) remain in the code for long even if we decide to include it.

This does no mean we need to keep an ever-growing set of ids, there are other solutions to this, that we have not prioritized. @flub's comment inside the code makes a reference to this. The idea is to store contiguous ranges of abandoned paths instead of the explicit list. Think that instead of storing 1,2,3,6,7,8 we want to store [1-3], [6-8]

divagant-martian · 2026-06-02T05:02:37Z

+            if let ConnectingState::Active { conn, connected, .. } = std::mem::replace(&mut self.state, ConnectingState::Consumed) {
+                Ok((Connection(conn), ZeroRttAccepted(connected)))
+            } else {
+                unreachable!()


looking at the code it's clear why this is the case, but it's still preferable to write something here

divagant-martian · 2026-06-02T05:10:24Z

        // simple.
-        if let Some(x) = self.handshake_data_ready.take() {
+        let ConnectingState::Active { conn, handshake_data_ready, .. } = &mut self.state else {
+            panic!("used after yielding Ready");


this is a very confusing panic here. I see this was copied from local_ip. But this function has clear docs that would need to be added here for this to be clear for the reader. Verbatim from those ones:

/// Will panic if called after `poll` has returned `Ready`.

This was referenced May 29, 2026

fix(iroh-gossip): reap connection tasks and prune peer state on churn n0-computer/iroh-gossip#146

Open

fix(iroh): evict cached mapped addrs when a RemoteStateActor shuts down n0-computer/iroh#4294

Open

drlukeangel marked this pull request as ready for review May 29, 2026 17:53

n0bot Bot added this to iroh May 29, 2026

github-project-automation Bot moved this to 🚑 Needs Triage in iroh May 29, 2026

caiogondim mentioned this pull request May 31, 2026

fix: bound Connection::abandoned_paths (prune on discard) #683

Open

flub requested changes Jun 1, 2026

View reviewed changes

github-project-automation Bot moved this from 🚑 Needs Triage to 🏗 In progress in iroh Jun 1, 2026

fix: address PR feedback for noq

b49aa33

divagant-martian requested changes Jun 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(noq): close the connection when a Connecting is dropped before handshake#682

fix(noq): close the connection when a Connecting is dropped before handshake#682
drlukeangel wants to merge 2 commits into
n0-computer:mainfrom
drlukeangel:fix/churn-memory-leak

drlukeangel commented May 29, 2026 •

edited

Loading

Uh oh!

flub left a comment

Uh oh!

flub Jun 1, 2026

Uh oh!

drlukeangel commented Jun 1, 2026

Uh oh!

divagant-martian Jun 2, 2026

Uh oh!

divagant-martian Jun 2, 2026

Uh oh!

divagant-martian Jun 2, 2026

Uh oh!

divagant-martian Jun 2, 2026 •

edited

Loading

Uh oh!

divagant-martian Jun 2, 2026

Uh oh!

divagant-martian Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	/// Bounded storage of the most recently abandoned paths
	/// Bounded storage of the most recently abandoned paths.

Conversation

drlukeangel commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Fix

Breaking Changes

Notes & open questions

Checklist

Related

Uh oh!

flub left a comment

Choose a reason for hiding this comment

Uh oh!

flub Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

drlukeangel commented Jun 1, 2026

Uh oh!

divagant-martian Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

divagant-martian Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

divagant-martian Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

divagant-martian Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

divagant-martian Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

divagant-martian Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

drlukeangel commented May 29, 2026 •

edited

Loading

divagant-martian Jun 2, 2026 •

edited

Loading