Fix _Once.ensure() to propagate handshake failure to concurrent waiters by bysiber · Pull Request #3397 · python-trio/trio

bysiber · 2026-02-20T06:26:07Z

Problem

_Once.ensure() in SSLStream can leave concurrent waiters hanging forever when the handshake fails.

When two tasks share an SSLStream (one sending, one receiving), both call ensure() to lazily perform the TLS handshake. The first task sets started = True and begins the handshake. The second task sees started=True, finds _done not yet set, and enters _done.wait().

If the handshake fails (certificate error, connection reset, etc.), the exception propagates to the first task — but _done.set() is never called. The second task is stuck forever in _done.wait(): the Event will never be signalled, and started is permanently True, so re-entry won't help either.

Reproduction scenario

Task A calls send_all() → enters ensure(), starts handshake
Task B calls receive_some() → enters ensure(), waits on _done
Handshake fails (remote peer rejects cert)
Task A gets BrokenResourceError — correct
Task B hangs indefinitely
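
The scenario can be modelled without a TLS stack. The sketch below is a simplified stand-in for the pattern described above, not trio's actual `_Once` code. It assumes the standard library's `asyncio` in place of trio so that it runs without extra dependencies, and `Once`, `failing_handshake`, and `demo` are hypothetical names. A timeout bounds the second task so that the hang becomes observable.

```python
import asyncio


class Once:
    """Simplified model of a run-once guard (hypothetical, not trio's code)."""

    def __init__(self, afn):
        self._afn = afn
        self.started = False
        self._done = asyncio.Event()

    async def ensure(self):
        if not self.started:
            self.started = True
            await self._afn()  # if this raises, the next line never runs
            self._done.set()
        else:
            await self._done.wait()


async def failing_handshake():
    # Yield once so the second task can start waiting, then fail.
    await asyncio.sleep(0)
    raise ConnectionResetError("simulated handshake failure")


async def demo():
    once = Once(failing_handshake)
    task_a = asyncio.create_task(once.ensure())  # starts the "handshake"
    task_b = asyncio.create_task(once.ensure())  # waits on _done
    (first_outcome,) = await asyncio.gather(task_a, return_exceptions=True)
    try:
        # Without the timeout this await would never return.
        await asyncio.wait_for(task_b, timeout=0.1)
        second_hung = False
    except asyncio.TimeoutError:
        second_hung = True
    return first_outcome, second_hung


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this model the first task sees the original error, while the second task only ends because the timeout cancels it.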

Fix

Store the exception on failure and still signal _done, so that concurrent waiters (and any future callers) wake up and receive a BrokenResourceError chained from the original handshake exception.
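
A minimal sketch of that idea follows. It is an illustration under stated assumptions and not the patch itself: it uses `asyncio` rather than trio, `BrokenResourceError` is a local stand-in for `trio.BrokenResourceError`, and the `Once` class, the error message, and the `demo` helper are hypothetical.

```python
import asyncio


class BrokenResourceError(Exception):
    """Local stand-in for trio.BrokenResourceError (assumption for this sketch)."""


class Once:
    """Run-once guard that wakes waiters on failure as well as on success."""

    def __init__(self, afn):
        self._afn = afn
        self.started = False
        self._done = asyncio.Event()
        self._failure = None

    async def ensure(self):
        if not self.started:
            self.started = True
            try:
                await self._afn()
            except BaseException as exc:
                self._failure = exc  # remember why the call failed
                raise
            finally:
                self._done.set()  # signal waiters on success and on failure
        else:
            await self._done.wait()
            if self._failure is not None:
                raise BrokenResourceError("handshake failed") from self._failure


async def failing_handshake():
    await asyncio.sleep(0)  # let the second caller start waiting
    raise ConnectionResetError("simulated handshake failure")


async def demo():
    once = Once(failing_handshake)
    first, second = await asyncio.gather(
        once.ensure(), once.ensure(), return_exceptions=True
    )
    try:
        await once.ensure()  # a caller arriving after the failure
        late = None
    except BrokenResourceError as exc:
        late = exc
    return first, second, late


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Here the concurrent waiter and the late caller both receive the stand-in `BrokenResourceError`, with the original exception available as `__cause__`.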

When two tasks use an SSLStream concurrently (one sending, one receiving), both call _Once.ensure() to trigger the lazy handshake. If the first task starts the handshake and it fails (e.g. certificate error, connection reset), the exception propagates to that task but _done is never set. The second task, already waiting on _done.wait(), blocks indefinitely — the Event is never signalled and started is permanently True, so there is no recovery path. Store the failure exception and set _done even on error, so that all waiters wake up and receive a BrokenResourceError chained from the original handshake exception.

codecov · 2026-02-20T06:35:57Z

Codecov Report

❌ Patch coverage is 66.66667% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 99.97942%. Comparing base (2fd138e) to head (aa3256a).
⚠️ Report is 9 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/trio/_ssl.py | 66.66667% | 2 Missing and 2 partials ⚠️ |

❌ Your patch status has failed because the patch coverage (66.66667%) is below the target coverage (100.00000%). You can increase the patch coverage or adjust the target coverage.
❌ Your project status has failed because the head coverage (99.97942%) is below the target coverage (100.00000%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@                 Coverage Diff                  @@
##                 main       #3397         +/-   ##
====================================================
- Coverage   100.00000%   99.97942%   -0.02058%     
====================================================
  Files             128         128                 
  Lines           19424       19434         +10     
  Branches         1318        1320          +2     
====================================================
+ Hits            19424       19430          +6     
- Misses              0           2          +2     
- Partials            0           2          +2

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/trio/_ssl.py | `98.14815% <66.66667%> (-1.85185%)` | ⬇️ |


A5rocks

Could you create an issue before making a PR fixing a bug like this? I'd like to confirm that this is a real problem first!

A5rocks · 2026-02-20T08:12:19Z

src/trio/_ssl.py

+            try:
+                await self._afn(*self._args)
+            except BaseException as exc:
+                self._failure = exc


This is a bad idea, because a) the stack frames for exc will be mutated, so you will have weird stack traces for raise ... from self._failure and b) this will lead to a refcycle I believe.
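
One possible way to sidestep both concerns, offered only as a hedged sketch and not something proposed in this thread, is to keep a plain string summary of the failure instead of the exception object, and raise a fresh error for each waiter. The sketch assumes `asyncio` rather than trio, `BrokenResourceError` is a local stand-in for `trio.BrokenResourceError`, and `_failure_summary` is a hypothetical attribute name. The trade-off is that waiters lose the `__cause__` chain.

```python
import asyncio


class BrokenResourceError(Exception):
    """Local stand-in for trio.BrokenResourceError (assumption for this sketch)."""


class Once:
    """Run-once guard that records a text summary, never the exception object."""

    def __init__(self, afn):
        self._afn = afn
        self.started = False
        self._done = asyncio.Event()
        self._failure_summary = None  # plain str, holds no traceback or frames

    async def ensure(self):
        if not self.started:
            self.started = True
            try:
                await self._afn()
            except BaseException as exc:
                self._failure_summary = repr(exc)
                raise
            finally:
                self._done.set()  # wake waiters on success and on failure
        else:
            await self._done.wait()
            if self._failure_summary is not None:
                # A fresh exception per waiter; nothing stored is re-raised.
                raise BrokenResourceError(
                    f"handshake failed earlier: {self._failure_summary}"
                )


async def failing_handshake():
    await asyncio.sleep(0)  # let the second caller start waiting
    raise ConnectionResetError("simulated handshake failure")


async def demo():
    once = Once(failing_handshake)
    return await asyncio.gather(
        once.ensure(), once.ensure(), return_exceptions=True
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because only a string is retained, the guard object does not keep the original exception or its traceback alive.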

A5rocks requested changes Feb 20, 2026


A5rocks closed this Mar 17, 2026

bysiber wants to merge 1 commit into python-trio:main from bysiber:fix/ssl-once-ensure-hang-on-failure

2 participants
