Replies: 1 comment
-
|
@acul71 , @sumanjeet0012 : Opening a project tracking issue keeping this discussion page for ensuring all the important points are documented here. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
PR: #1046 — Fix QUIC interop: Track Connection IDs for proper packet routing
Related Issue: #1044
Author: @acul71 (with additional contributions from @sumanjeet0012)
Review Date: 2025-11-17
Status: OPEN — Investigation Required Before Merge
1. Summary
PR #1046 provides a comprehensive fix for QUIC interoperability issues where Go-to-Python pings failed after the identify stream closed.
Root cause: Python QUIC listener was not tracking new QUIC Connection IDs, causing packets using new IDs to be dropped.
This PR introduces:
listener.pyThe implementation is high-quality, well-tested, and substantially improves maintainability.
However, a blocker remains: flakiness (~4% failure rate) in
test_yamux_stress_ping, indicating a likely concurrency race condition.2. Strengths
✔ Comprehensive Fix
✔ Excellent Code Organization
ConnectionIDRegistry✔ Performance Improvements
✔ Comprehensive Testing
local_ping_test.py)✔ Documentation and Newsfragments
3. Critical Blocker: Stress Test Flakiness
❗ Failing Test
tests/core/transport/quic/test_integration.pytest_yamux_stress_ping(lines 341–515)CI shows ~4% failure rate (96/100 streams).
Local runs: occasional failures (98–100/100 streams).
Why This Matters
This indicates:
ConnectionIDRegistryMasked by
@pytest.mark.flaky— this hides the root cause.MUST FIX BEFORE MERGE
This affects production-level correctness of QUIC routing.
4. Required Follow-Up Work (Blocking)
🔥 Priority 1 — Concurrency Testing of ConnectionIDRegistry
Develop high-concurrency tests exercising:
register_connection()+register_pending()add_connection_id()for same connectionpromote_pending()andremove_connection_id()🔥 Priority 2 — Race Condition Investigation
Likely points of failure:
🔥 Priority 3 — Remove flaky decorator
Restore strict correctness requirements.
5. Secondary Tasks (Non-blocking)
Consider state machine pattern for Connection ID transitions
(e.g., as referenced in
QUIC-STATE-MACHINE.md)Add observability/metrics for:
Additional integration tests:
6. Maintainers’ Discussion & Comments
Co-maintainer Note (from PR discussion)
Reviewer Response
7. Final Assessment
✔ The PR is high-quality and addresses the core issue.
✔ All previous blockers resolved.
❗ ONE CRITICAL BLOCKER REMAINS:
test_yamux_stress_pingflakiness — likely concurrency issue in ConnectionIDRegistry.Merge Readiness: NOT READY — Investigation Required
8. Task List
Blocking Tasks
@pytest.mark.flakyonce stableNon-Blocking Tasks
Beta Was this translation helpful? Give feedback.
All reactions