Commit 4e604ac

RFC: Client Route Liveness Probing
1 parent 5340989 commit 4e604ac

2 files changed (+282, -1 lines changed)

Lines changed: 282 additions & 0 deletions
# DoubleZero Client Route Liveness Probing

## Summary

This proposal introduces **Route Liveness Probing** to the `doublezerod` client daemon.

The goal is to enable active data-plane validation of BGP-learned routes from DoubleZero Devices (DZDs), so that only reachable routes are installed in the local kernel routing table.

Each route is probed periodically with ICMP echo requests and transitions between `Up` and `Down` states according to a hysteresis-based policy. Routes marked `Up` are installed in the kernel routing table; routes marked `Down` are removed from it until they recover.

The feature will initially be available only for the IBRL service type (unicast without an allocated IP), where traffic can still fall back to the public internet path when a route is withdrawn.

## Motivation

Currently, routes learned from DZDs over BGP are installed unconditionally. If a DZD or its tunnel fails while the BGP session remains established, these routes stay in the kernel routing table even though traffic is no longer deliverable, silently blackholing it until standard BGP timers expire or an operator intervenes.

Route liveness probing provides an independent, data-plane-based signal of reachability. This allows `doublezerod` to suppress failed routes locally without disturbing control-plane stability.

It improves operational reliability, reduces convergence time after partial failures, and supports the goal of making the DoubleZero client resilient to asymmetric or silent path failures.

## New Terminology

- **Route Liveness Probe**: A periodic ICMP echo request sent by the client to verify that traffic can reach a given BGP-learned destination.
- **Liveness State**: The local classification of a route as `Unknown`, `Up`, or `Down`, based on recent probe outcomes.
- **Liveness Policy**: The hysteresis-based decision logic that determines when a route transitions between states, using configurable thresholds for consecutive successes or failures.
- **Probing Worker**: The component that executes probes on a fixed interval and reports results to the liveness tracker.
- **User-Space ICMP Listener**: A lightweight responder on `doublezero0` that sends echo replies via `doublezero0` even when the route back to the requester isn't in the kernel table (in which case the kernel ICMP stack would send the reply over the public internet).
- **Probing Subsystem**: The overall module within `doublezerod` that coordinates probing, evaluation, and route installation and withdrawal.

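At its core, the user-space ICMP listener turns each echo request it receives into a matching echo reply. The Go sketch below shows only that transformation, assuming ICMPv4 (RFC 792) messages passed without the IP header and the RFC 1071 Internet checksum; `echoReplyFor` and `icmpChecksum` are hypothetical names, and receiving on and sending over a socket bound to `doublezero0` is left out.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// ICMPv4 message types from RFC 792.
const (
	icmpEchoReply   = 0
	icmpEchoRequest = 8
)

// icmpChecksum computes the RFC 1071 Internet checksum over b
// (ICMP header plus payload, no IP header).
func icmpChecksum(b []byte) uint16 {
	var sum uint32
	for i := 0; i+1 < len(b); i += 2 {
		sum += uint32(binary.BigEndian.Uint16(b[i:]))
	}
	if len(b)%2 == 1 {
		sum += uint32(b[len(b)-1]) << 8 // pad an odd trailing byte with zero
	}
	for sum>>16 != 0 {
		sum = (sum & 0xffff) + (sum >> 16) // fold carries into the low 16 bits
	}
	return ^uint16(sum)
}

// echoReplyFor builds the reply for an ICMPv4 echo request: identifier,
// sequence number and payload are copied unchanged, the type becomes
// Echo Reply, and the checksum is recomputed.
func echoReplyFor(req []byte) ([]byte, error) {
	if len(req) < 8 {
		return nil, errors.New("icmp message shorter than the 8-byte echo header")
	}
	if req[0] != icmpEchoRequest || req[1] != 0 {
		return nil, fmt.Errorf("not an echo request: type=%d code=%d", req[0], req[1])
	}
	reply := make([]byte, len(req))
	copy(reply, req)
	reply[0] = icmpEchoReply
	reply[2], reply[3] = 0, 0 // checksum field must be zero while computing
	binary.BigEndian.PutUint16(reply[2:4], icmpChecksum(reply))
	return reply, nil
}

func main() {
	// Echo request: type 8, code 0, checksum, identifier 0x1234, sequence 1, payload "dz".
	req := []byte{8, 0, 0, 0, 0x12, 0x34, 0x00, 0x01, 'd', 'z'}
	binary.BigEndian.PutUint16(req[2:4], icmpChecksum(req))

	reply, err := echoReplyFor(req)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("reply type=%d checksum-valid=%v\n", reply[0], icmpChecksum(reply) == 0)
}
```

Copying the identifier, sequence number and payload unchanged is what lets the probing side correlate each reply with the request that produced it.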
## Alternatives Considered

### Passive Monitoring (existing `doublezero-monitor-tool`)

A passive approach could infer route health from forwarding statistics such as `nftables` or kernel FIB counters. However, it cannot distinguish an idle route from an unreachable one and provides no proactive assurance of data-plane reachability. Detection is reactive and happens only after user traffic has already been impacted.

### BGP-Only (current in-client behavior)

Relying solely on BGP session state and withdrawals, as done today, limits detection to control-plane failures. It cannot detect partial or asymmetric data-plane failures in which the session remains established but forwarding has stopped, leading to silent blackholing until standard hold timers expire.

### Active Liveness Probing via TWAMP

TWAMP would provide a standards-based active probing mechanism, but it requires reflector support on the remote side and coordinated upgrades across all participating devices. Because existing clients already answer ICMP echo requests through the kernel's ICMP stack, ICMP-based probing can be deployed incrementally without disrupting reachability between mixed-version peers.

### Active Liveness Probing via ICMP (selected)

ICMP echo probing was selected for its simplicity, universality, and backward-compatible deployment. It reuses existing ICMP handling paths, requires no additional coordination between clients, and provides a reliable binary reachability signal suitable for gating route installation.

## Detailed Design

### Integration Context

The probing subsystem integrates with the existing **BGP plugin** in `doublezerod`. Each service type (IBRL, IBRL with allocated IP, multicast) can declare whether route probing is active. In this proposal, probing is **enabled only for IBRL mode (without an allocated IP)**.

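The per-service declaration could be as small as a method on the service type. In this sketch, assuming an enum-style `ServiceType` (hypothetical, as are `RouteProbingActive` and `probingEnabled`), probing runs only when the global `--route-probing-enable` flag is set and the service is IBRL without an allocated IP.

```go
package main

import "fmt"

// ServiceType lists the doublezerod service types named in this RFC.
type ServiceType int

const (
	IBRL                ServiceType = iota // unicast without an allocated IP
	IBRLWithAllocatedIP                    // unicast with an allocated IP
	Multicast
)

func (s ServiceType) String() string {
	switch s {
	case IBRL:
		return "IBRL"
	case IBRLWithAllocatedIP:
		return "IBRL with allocated IP"
	case Multicast:
		return "multicast"
	}
	return fmt.Sprintf("ServiceType(%d)", int(s))
}

// RouteProbingActive is the per-service declaration. Only IBRL without an
// allocated IP opts in, because a withdrawn route there still leaves the
// public internet path as a fallback.
func (s ServiceType) RouteProbingActive() bool {
	return s == IBRL
}

// probingEnabled combines the global --route-probing-enable flag with the
// per-service declaration.
func probingEnabled(flagEnabled bool, s ServiceType) bool {
	return flagEnabled && s.RouteProbingActive()
}

func main() {
	for _, s := range []ServiceType{IBRL, IBRLWithAllocatedIP, Multicast} {
		fmt.Printf("%-24s probing=%v\n", s, probingEnabled(true, s))
	}
}
```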
<details>

<summary>System context diagram</summary>

```mermaid
graph TB
  DZD[Connected DZD Peer]
  DESTS[Destinations in Advertised Prefixes]
  INTERNET[Public Internet Path]

  subgraph CLIENT[Client Host]
    DZIF[doublezero0 Interface]

    subgraph DZD_PROC[doublezerod Process]
      BGP[BGP Plugin]
      RM[Route Manager]
      PW[Probing Worker]
      LT[Liveness Tracker]
      UL[User-Space ICMP Listener]
    end

    NL[Netlink API]
    KRT[Kernel Routing Table]
  end

  %% Control Plane
  DZD -->|BGP updates: advertise / withdraw| BGP
  BGP -->|Learned routes| RM

  %% Probing Workflow
  RM --> PW
  PW -->|ICMP echo via doublezero0| DESTS
  DESTS -->|ICMP reply| UL
  UL --> PW
  PW -->|Probe results| LT
  LT -->|State: Up / Down| RM

  %% Routing Integration
  RM -->|Add / Delete route| NL
  NL --> KRT
  DZIF --- KRT

  %% Fallback Path
  DESTS -. "When route is down, kernel replies may return via" .-> INTERNET
```

</details>

<details>

<summary>Workflow sequence diagram</summary>

```mermaid
sequenceDiagram
  autonumber
  participant DZD as DZD Peer
  participant BGP as BGP Plugin
  participant RM as Route Manager
  participant PW as Probing Worker
  participant UL as User-space ICMP Listener
  participant LT as Liveness Tracker
  participant NL as Netlink
  participant KRT as Kernel Routing Table
  participant DST as Destination Host

  DZD->>BGP: BGP UPDATE (new/changed route)
  BGP->>RM: Learned route notification
  RM->>PW: Register route for probing
  RM->>LT: Initialize liveness (Unknown)

  loop every probe interval
    PW->>DST: ICMP Echo via doublezero0
    alt echo reply received
      DST-->>UL: ICMP Echo Reply on doublezero0
      UL->>PW: Deliver reply
      PW->>LT: Record success
    else timeout or error
      PW->>LT: Record failure
    end

    alt transition to UP
      LT-->>RM: State = UP
      RM->>NL: Install route
      NL->>KRT: Add route entry
    else transition to DOWN
      LT-->>RM: State = DOWN
      RM->>NL: Withdraw route
      NL->>KRT: Delete route entry
    else no change
      LT-->>RM: No state change
    end
  end

  Note over DST,UL: If route is DOWN, host may reply via public internet path instead of doublezero0
```

</details>

### Workflow

1. **Route Announcement**

   When a new route is learned via BGP, it is registered with the route manager, which initializes its liveness state to `Unknown`.

2. **Probing**

   The probing worker periodically sends ICMP echo requests toward each destination.

   - Echo traffic on `doublezero0` is handled by the **user-space ICMP listener** bound to that interface.
   - On the destination, the listener makes sure replies return over the overlay interface; without it, the kernel's ICMP stack would send them over the public internet whenever the route back to the source isn't installed.

3. **Liveness Evaluation**

   Results are fed into the liveness policy tracker:

   - A run of consecutive successes reaching the up threshold transitions the route to `Up`.
   - A run of consecutive failures reaching the down threshold transitions it to `Down`.
   - Any other result causes no state change.

4. **Routing Synchronization**

   The route manager reflects state changes into the kernel routing table:

   - Routes marked `Up` are installed.
   - Routes marked `Down` are withdrawn.
   - BGP session state is unaffected.

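Steps 3 and 4 can be sketched together: a per-route hysteresis tracker and a route-manager hook that touches the kernel table only on transitions. This is a minimal sketch, assuming a hypothetical `RouteTable` interface in place of the netlink calls; `hysteresis`, `onProbeResult` and `fakeTable` are illustrative names, not the `doublezerod` API.

```go
package main

import "fmt"

// LivenessState mirrors the states defined in this RFC.
type LivenessState int

const (
	Unknown LivenessState = iota
	Up
	Down
)

func (s LivenessState) String() string { return [...]string{"Unknown", "Up", "Down"}[s] }

// hysteresis applies the consecutive-success / consecutive-failure policy to one route.
type hysteresis struct {
	upThreshold, downThreshold int
	state                      LivenessState
	successes, failures        int
}

// Record feeds one probe outcome and returns the current state and whether it just changed.
func (h *hysteresis) Record(ok bool) (LivenessState, bool) {
	if ok {
		h.successes, h.failures = h.successes+1, 0 // a success breaks any failure run
		if h.state != Up && h.successes >= h.upThreshold {
			h.state = Up
			return h.state, true
		}
		return h.state, false
	}
	h.failures, h.successes = h.failures+1, 0 // a failure breaks any success run
	if h.state != Down && h.failures >= h.downThreshold {
		h.state = Down
		return h.state, true
	}
	return h.state, false
}

// RouteTable is a hypothetical stand-in for the netlink-backed kernel routing table.
type RouteTable interface {
	Install(prefix string) error
	Withdraw(prefix string) error
}

// onProbeResult is a hypothetical route-manager hook: only state transitions
// touch the kernel table, and BGP state is never modified.
func onProbeResult(rt RouteTable, h *hysteresis, prefix string, ok bool) error {
	state, changed := h.Record(ok)
	if !changed {
		return nil
	}
	if state == Up {
		return rt.Install(prefix)
	}
	return rt.Withdraw(prefix)
}

// fakeTable records installs and withdrawals so the sketch runs without root or netlink.
type fakeTable struct{ installed map[string]bool }

func (f *fakeTable) Install(p string) error  { f.installed[p] = true; return nil }
func (f *fakeTable) Withdraw(p string) error { delete(f.installed, p); return nil }

func main() {
	rt := &fakeTable{installed: map[string]bool{}}
	h := &hysteresis{upThreshold: 3, downThreshold: 3}
	const prefix = "203.0.113.0/24"
	for _, ok := range []bool{true, true, true, false, true, false, false, false} {
		if err := onProbeResult(rt, h, prefix, ok); err != nil {
			fmt.Println("route update failed:", err)
		}
		fmt.Printf("probe ok=%-5v state=%-7v installed=%v\n", ok, h.state, rt.installed[prefix])
	}
}
```

Any opposite result resets the current run, so a route whose probes alternate between success and failure keeps its state instead of flapping.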
### Configuration Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| `--route-probing-enable` | Enables the probing subsystem | disabled |
| `--route-probing-interval` | Probe interval per route | 1s |
| `--route-probing-timeout` | Timeout per probe | 1s |
| `--route-probing-up-threshold` | Consecutive successes to mark a route `Up` | 3 |
| `--route-probing-down-threshold` | Consecutive failures to mark a route `Down` | 3 |

### Policy Design

The initial liveness policy is **hysteresis-based**, trading responsiveness for stability: because a transition requires a run of consecutive results, isolated probe losses are ignored, at the cost of delaying each transition by several probe intervals.

The policy layer is designed to be pluggable, so it can later be replaced with alternative evaluation strategies such as EWMA-based smoothing, weighted failure scoring, or adaptive thresholds that respond to observed probe variance.

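To illustrate the pluggable policy layer, the sketch below defines a hypothetical `LivenessPolicy` interface and one of the alternatives named above: an EWMA-smoothed success score with separate up and down thresholds. The interface shape, the smoothing factor and both thresholds are assumptions chosen for the example, not proposed defaults.

```go
package main

import "fmt"

// LivenessPolicy is a hypothetical pluggable policy: it consumes one probe
// outcome at a time and reports whether the route should be treated as up.
type LivenessPolicy interface {
	Observe(success bool) (up, changed bool)
}

// ewmaPolicy keeps an exponentially weighted moving average of probe success
// in [0, 1]. With upAt > downAt it still has hysteresis: the score must reach
// upAt to come up and fall to downAt or below to go down. In this simplified
// sketch the initial "not up" state stands for both Unknown and Down.
type ewmaPolicy struct {
	alpha  float64 // weight of the newest sample, 0 < alpha <= 1
	upAt   float64
	downAt float64
	score  float64
	up     bool
}

func (p *ewmaPolicy) Observe(success bool) (bool, bool) {
	sample := 0.0
	if success {
		sample = 1.0
	}
	p.score = p.alpha*sample + (1-p.alpha)*p.score
	switch {
	case !p.up && p.score >= p.upAt:
		p.up = true
		return true, true
	case p.up && p.score <= p.downAt:
		p.up = false
		return false, true
	}
	return p.up, false
}

func main() {
	var policy LivenessPolicy = &ewmaPolicy{alpha: 0.5, upAt: 0.8, downAt: 0.3}
	for _, ok := range []bool{true, true, true, false, true, false, false} {
		up, changed := policy.Observe(ok)
		fmt.Printf("probe ok=%v up=%v changed=%v\n", ok, up, changed)
	}
}
```

With `alpha = 0.5`, three successes from a zero score reach 0.875 and bring the route up, a single loss only drops the score to 0.4375, and a second consecutive loss (0.21875) takes it down.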
## Failure Scenarios

### Probing Subsystem Failure

If the probing subsystem crashes, deadlocks, or hits runtime errors (e.g., socket exhaustion), route liveness state stops updating. Routes remain in their last known state, either `Up` or `Down`, until the subsystem recovers. This may temporarily leave stale routes installed or withdrawn, but forwarding continuity is preserved.

### ICMP Unavailability on Destination Clients

If a destination DoubleZero client disables ICMP handling or filters echo replies, its peers will mark the associated routes as `Down` and withdraw them from their local routing tables. Traffic to that destination is then sent over the public internet path instead of the `doublezero0` interface. This preserves reachability but bypasses the DoubleZero overlay until the client resumes responding to ICMP.

### False Negatives and Transient Misclassification

ICMP rate limiting, temporary congestion, or asymmetric paths can cause sporadic probe failures and transient misclassification of route state. The hysteresis policy mitigates short-lived noise by requiring consecutive failures or successes before a transition, but thresholds set too low could still cause unnecessary route churn.

### Resource Exhaustion

In deployments with many routes, the probing loop may keep a large number of probes outstanding at once or consume excessive file descriptors. Concurrency limits and probe scheduling mitigate this risk, but misconfiguration or extreme route churn could still degrade performance.

## Impact

### Operational Reliability

Ensures that only routes verified as reachable remain installed, preventing blackholes caused by stale BGP state.

### Convergence

Enables faster local convergence after data-plane failures, without affecting BGP session timers or advertisements.

### Resource Usage

Adds lightweight background ICMP traffic (one echo request per route per probe interval, i.e., one per second per route with the default 1s interval) and minimal CPU overhead; concurrency and rate limits keep this scalable to large route tables.

### Observability

Exposes route state transitions through logs and metrics, giving operators visibility into data-plane reachability.

## Security Considerations

The route liveness probing subsystem does not materially alter DoubleZero's trust or threat model. It operates entirely within the client's existing control and data planes, sending ICMP echo requests only to destinations learned through the trusted DZD control plane.

Probes are sent only toward prefixes advertised by connected DZDs, so there is no risk of arbitrary or unscoped network scanning. Probe frequency and concurrency are bounded to prevent overload or amplification. Replies are generated either by the remote `doublezerod` process, when the peer runs the user-space ICMP listener, or by the kernel's ICMP stack on remote peers running earlier versions.

The feature introduces no new externally reachable services or credentials, and ICMP payloads contain no sensitive information. The main operational requirement is that ICMP must be permitted between peers for liveness detection to work accurately.

## Backward Compatibility

Route liveness probing is designed to be **interoperable across mixed client versions**, so enabling it does not break communication between upgraded and non-upgraded peers.

### Compatibility Matrix

- **Probing enabled on source only:**

  The source client can still perform reachability checks, because destinations without probing answer with their kernel-space ICMP stack over the public internet path. Replies are routed normally, so liveness detection keeps working even if the remote side has not yet upgraded.

- **Probing enabled on both source and destination:**

  Both clients use the DoubleZero user-space ICMP listener to exchange echo replies over the `doublezero0` interface, even when the route is not installed in the kernel table. This gives accurate overlay-level reachability and preserves end-to-end validation within the DoubleZero fabric.

- **Probing disabled on both sides:**

  Behavior is unchanged from current deployments: routes are installed and withdrawn solely based on BGP control-plane updates.

### Deployment Considerations

Initial testing indicates that **approximately 7% of existing clients do not currently respond to ICMP probes**.

These clients will appear unreachable to peers performing liveness probing, even though traffic to them may still be forwarded correctly.

To ensure consistent behavior, the **first phase of rollout** should focus on enabling ICMP responsiveness across all clients, regardless of whether route probing itself is enabled.

Once universal ICMP handling is confirmed, **subsequent upgrades** can enable route probing selectively or by default.

During this transition:

- Mixed environments remain compatible, as non-upgraded peers still respond via the kernel-space ICMP path.
- Probing-capable clients fall back to receiving replies over the public internet path when the remote peer cannot answer over the overlay.
- Full overlay-level reachability validation over `doublezero0` becomes reliable once all clients are ICMP-responsive.

## Open Questions

- **Liveness Policy**: Is the current hysteresis approach good enough, or do we need something smoother, like an EWMA or loss-weighted model, to better handle intermittent loss and jitter?
- **Thresholds & Convergence**: What probe interval and success/failure counts give us fast enough convergence without spamming probes or creating churn?
- **Route Weighting**: Should all routes count the same, or should liveness results be weighted by stake or reputation (as in `doublezero-monitor-tool`)?
- **Probe Concurrency**: With lots of routes, how many probes can safely run at once, and do we need a global rate cap?
- **Visibility & Monitoring**: How do we detect and debug flapping or systemic probe loss across clients? Should we collect telemetry or metrics from all clients to build an aggregate view of reachability and probe health?
- **ICMP Reachability Rollout**: About 7% of clients don't currently answer ICMP. What's "good enough" coverage before we can safely make probing the default?

tools/twamp/pkg/light/stub_fallback.go

Lines changed: 0 additions & 1 deletion

    @@ -1,5 +1,4 @@
     //go:build !linux
    -// +build !linux

     package twamplight