
Commit b54711d

rfc: client route liveness probing / eventual default-on
1 parent 3f66924 commit b54711d

1 file changed: +12 -14 lines changed

rfcs/rfc7-client-route-liveness-probing.md

Lines changed: 12 additions & 14 deletions
@@ -51,7 +51,7 @@ ICMP echo probing was selected for its simplicity, universality, and backward-co

 ### Integration Context

-The probing subsystem integrates with the existing **BGP plugin** in `doublezerod`. Each service type (IBRL, IBRL with allocated IP, multicast) can declare whether route probing is active. In this proposal, probing is **enabled only for IBRL (without allocated IP)** mode.
+The probing subsystem integrates with the existing BGP plugin in `doublezerod`. Each service type (IBRL, IBRL with allocated IP, multicast) can declare whether route probing is active. In this proposal, probing is enabled only for IBRL (without allocated IP) mode.

 <details>

@@ -187,9 +187,9 @@ sequenceDiagram

 The probing worker periodically sends ICMP echo requests toward each destination.

-- Echo replies are received either by the **user-space ICMP listener** or directly by the kernel’s ICMP stack when the route is already installed.
-- The **Scheduler** determines when each route is probed, introducing jitter to avoid synchronized bursts.
-- The **Limiter** bounds the number of concurrent probes to control resource usage.
+- Echo replies are received either by the user-space ICMP listener or directly by the kernel’s ICMP stack when the route is already installed.
+- The Scheduler determines when each route is probed, introducing jitter to avoid synchronized bursts.
+- The Limiter bounds the number of concurrent probes to control resource usage.
 - Each probe runs with a per-probe timeout, and the listener automatically restarts with exponential backoff if it fails.
 - **Liveness Evaluation**
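
Editorial sketch (not part of the commit): the Scheduler, Limiter, and per-probe timeout described in the hunk above could interact roughly as follows. This is a minimal illustration in Python, not the `doublezerod` implementation; the names `next_probe_delay`, `probe_all`, and `send_probe`, and all parameter values, are assumptions introduced here.

```python
import asyncio
import random


def next_probe_delay(base_interval: float, jitter_fraction: float,
                     rng: random.Random) -> float:
    """Return base_interval shifted by up to +/- jitter_fraction of itself.

    Randomizing each delay keeps routes from being probed in lockstep.
    """
    offset = rng.uniform(-jitter_fraction, jitter_fraction) * base_interval
    return base_interval + offset


async def probe_all(destinations, send_probe, max_concurrent: int,
                    timeout: float) -> dict:
    """Probe every destination with at most max_concurrent probes in flight.

    send_probe is a caller-supplied coroutine function (hypothetical) that
    returns once an echo reply arrives. A probe that exceeds `timeout`
    seconds is recorded as failed.
    """
    limiter = asyncio.Semaphore(max_concurrent)  # plays the Limiter role
    results = {}

    async def probe_one(dest):
        async with limiter:
            try:
                await asyncio.wait_for(send_probe(dest), timeout)
                results[dest] = True
            except asyncio.TimeoutError:
                results[dest] = False

    await asyncio.gather(*(probe_one(d) for d in destinations))
    return results
```

A caller would sleep for `next_probe_delay(...)` between rounds and run `asyncio.run(probe_all(...))` with its own probe coroutine. Listener restart with exponential backoff is deliberately left out of this sketch.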

@@ -328,7 +328,7 @@ The feature introduces no new externally reachable services or credentials, and

 ## Backward Compatibility

-Route liveness probing is designed to be **interoperable across mixed client versions**, ensuring that enabling it does not break communication between upgraded and non-upgraded peers.
+Route liveness probing is designed to be interoperable across mixed client versions, ensuring that enabling it does not break communication between upgraded and non-upgraded peers.

 ### Compatibility Matrix

@@ -347,19 +347,17 @@ Route liveness probing is designed to be **interoperable across mixed client ver

 ### Deployment Considerations

-Initial testing indicates that **approximately 7% of existing clients do not currently respond to ICMP probes**.
+Route probing will ship as a disabled feature flag in initial releases. Operators can opt in for testing, but it will remain off by default until the subsystem has proven stable and client ICMP responsiveness is broadly consistent.

-These clients will appear unreachable to peers performing liveness probing, even though routing and forwarding may still function correctly over the control plane.
+Early testing shows that about 7% of existing clients do not currently respond to ICMP probes. These nodes will appear unreachable to peers performing liveness probing, even though control-plane routes and forwarding may continue to function normally.

-To ensure consistent behavior, the **first phase of rollout** should focus on enabling ICMP responsiveness across all clients, regardless of whether route probing itself is enabled.
+The first rollout phase should focus on ensuring ICMP reachability across all clients. During this period, probing should be enabled only in controlled or opt-in deployments, where its behavior can be observed without affecting reachability. Once ICMP handling is consistent and the subsystem demonstrates stable performance, the feature can be gradually enabled for broader use and eventually made default-on in later releases.

-Once universal ICMP handling is confirmed, **subsequent upgrades** can enable route probing selectively or by default.
+During the transition:

-During this transition:
-
-- **Mixed environments remain compatible**, as unupgraded peers still respond via their kernel-space ICMP stack when their routes are installed.
-- **When overlay ICMP is unavailable on the destination**, replies will return over the public internet rather than `doublezero0`, and are treated as probe failures.
-- **Full overlay-level reachability validation over `doublezero0`** becomes reliable once all clients are ICMP-responsive.
+- Mixed environments remain compatible, as unupgraded peers still respond via their kernel-space ICMP stack when their routes are installed.
+- When overlay ICMP is unavailable on the destination, replies will traverse the public internet instead of `doublezero0` and be treated as probe failures.
+- Full overlay-level reachability validation via `doublezero0` becomes authoritative once all clients handle ICMP properly.

 ## Open Questions
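
Editorial sketch (not part of the commit): the Deployment Considerations text in the hunk above implies two simple rules. A probe succeeds only when its reply arrives over `doublezero0`, and probing stays off unless an operator opts in. The Python below illustrates those rules under stated assumptions; `EchoReply`, `probe_succeeded`, and `should_probe` are hypothetical names, and the RFC does not say how the ingress interface is actually determined.

```python
from dataclasses import dataclass
from typing import Optional

OVERLAY_INTERFACE = "doublezero0"  # overlay interface named in the RFC text


@dataclass(frozen=True)
class EchoReply:
    """Hypothetical record of a received ICMP echo reply."""
    source: str
    ingress_interface: str  # assumed to be known to the listener


def probe_succeeded(reply: Optional[EchoReply]) -> bool:
    """Count a probe as successful only if a reply arrived over the overlay.

    No reply (for example, from a client that ignores ICMP) and a reply that
    came back over another interface are both treated as failures.
    """
    if reply is None:
        return False
    return reply.ingress_interface == OVERLAY_INTERFACE


def should_probe(opted_in: bool, default_on: bool = False) -> bool:
    """Probing runs only on explicit opt-in until a release flips the default."""
    return opted_in or default_on
```

For instance, `probe_succeeded(EchoReply("10.0.0.2", "eth0"))` is `False` even though a reply was received, which matches the second bullet of the transition list.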
