Commit 6c941dc: rfc: per-link metrics (parent a74472e)

rfcs/rfc6-per-link-metrics.md (+130, -0)
# Methodology For Setting Per-link Metrics

## Summary

To optimize packet forwarding across DoubleZero, we need a methodology for setting per-link metrics so that an ingress DoubleZero device can derive the shortest path to an egress DoubleZero device, with min(latency) as the key optimization function. This document proposes using a metric value derived directly from per-link telemetry measurements written on-chain by each device's telemetry agent over the previous epoch. If a link has latency data populated for the previous epoch, the DoubleZero controller will generate per-link configuration to apply the derived metric value, which in turn influences the shortest-path decisions of an ingress DoubleZero device.
## Motivation

Per-link metric derivation happens on DoubleZero testnet today, but the process is manual: a human pings across a link at a single point in time and applies a metric on a DoubleZero device by hand. While this functionally works, there are several pitfalls to this process:

- Human error is inevitable, and given hundreds of links, an incorrect metric value (100 instead of 1000) can make what should be a high-latency link appear to be a low-latency link and suboptimally impact packet forwarding across DoubleZero.
- Latency on a link can change over time, which a single point-in-time measurement does not reflect. For example, a bandwidth contributor may be using a protected circuit, and during a fiber cut the underlying provider may reroute the physical link onto a longer fiber path to restore service. While this link may have been the most performant prior to the fiber cut, the additional latency on the protected path may make it a less than optimal link from DoubleZero's perspective.
- Utilizing measured latency values written on-chain by DoubleZero device agents provides transparency into how each per-link metric is calculated.
## New Terminology

N/A
## Alternatives Considered

### Use Static Metric Values Set By The Bandwidth Contributor and/or Foundation

This is by far the simplest approach. The on-chain link data structure already has fields for both delay (`delay_ms`) and jitter (`jitter_ms`), which could be used to form a composite latency-based metric and consumed by the DoubleZero controller when rendering per-link configuration. These can be set by either the bandwidth contributor or the foundation at turnup time.

The downside to this approach is all of the issues mentioned in the motivation section of this document.
### Use Real-Time Metric Values Set By The Telemetry Agent

The DoubleZero controller could consume the most recent telemetry samples written on-chain within an epoch and immediately apply updated per-link metrics. While this is fairly straightforward to implement, without a smoothing function applied, sample-to-sample outliers would directly impact packet forwarding within DoubleZero, as jitter could trigger shortest-path recalculations on devices.

The controller could use a windowing function internally to smooth latency outliers, but unless the window were aligned to a strict boundary, it may be difficult to understand which samples contributed to a particular metric value. Additionally, the smoothed latency value and metric would need to be written back on-chain, both for transparency and to provide durability across controller restarts, as the architecture is currently stateless and based solely on on-chain data.
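For illustration only, a minimal sketch of the kind of in-memory window this alternative would push into the controller; the window size, microsecond units, and the mean as the smoothing function are all assumptions rather than existing DoubleZero code.

```rust
use std::collections::VecDeque;

/// Hypothetical fixed-size sliding window over recent latency samples
/// (microseconds). Nothing here is persisted, which is exactly the
/// durability problem noted above for a stateless controller.
struct LatencySmoother {
    window: VecDeque<u64>,
    capacity: usize,
}

impl LatencySmoother {
    fn new(capacity: usize) -> Self {
        Self { window: VecDeque::with_capacity(capacity), capacity }
    }

    /// Push a new sample, evicting the oldest once the window is full.
    fn push(&mut self, sample_us: u64) {
        if self.window.len() == self.capacity {
            self.window.pop_front();
        }
        self.window.push_back(sample_us);
    }

    /// Mean over the current window; None until a sample arrives.
    fn smoothed_us(&self) -> Option<u64> {
        if self.window.is_empty() {
            return None;
        }
        Some(self.window.iter().sum::<u64>() / self.window.len() as u64)
    }
}
```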
## Detailed Design

The goal of this design is to leverage the existing telemetry pipeline and the latency samples already written on-chain to derive and configure per-link metrics. To optimize for stability, metric values will be derived from aggregated telemetry data over the previous epoch. The side effect is that a link cannot be brought into service until a previous epoch's worth of telemetry data exists.

Since the DoubleZero devices currently leverage IS-IS as the underlay routing protocol, link metrics are restricted to a minimum value of 1 and a maximum value of 16,777,215 (2^24 - 1), the maximum allowed in the IS extended neighbor TLV used for wide metrics.

The per-link metric calculation can be kept simple to start: directly use the p99 latency in microseconds over the previous epoch as the metric value. For example, a p99 latency of 400us equates to a metric of 400, whereas a p99 latency of 220ms equates to a metric of 220,000. One could argue for a composite metric such as p99 latency (us) + p99 jitter (us), which would select for circuits with min(latency + jitter), and they would not be wrong, but we should bias towards a crawl/walk/run approach, especially until we have more confidence in telemetry sampling.

In the event a link has no telemetry data from the previous epoch, the link could either be assigned the maximum metric of 16,777,215 or not placed into the routing protocol at all.
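As a sketch of the calculation above (the function and constant names are illustrative, not an existing DoubleZero API), the microsecond-to-metric mapping, the IS-IS wide-metric clamp, and the maximum-metric fallback might look like:

```rust
/// IS-IS wide metrics allow link metric values in [1, 2^24 - 1].
const MIN_METRIC: u32 = 1;
const MAX_METRIC: u32 = (1 << 24) - 1; // 16,777,215

/// Derive a per-link metric from the previous epoch's p99 latency.
/// `p99_latency_us` is None when the link has no telemetry for that
/// epoch, in which case this sketch applies the maximum metric (one of
/// the two fallback behaviors discussed above).
fn derive_metric(p99_latency_us: Option<u64>) -> u32 {
    match p99_latency_us {
        // 400us -> 400; 220ms (220,000us) -> 220,000; clamp into range.
        Some(us) => us.clamp(MIN_METRIC as u64, MAX_METRIC as u64) as u32,
        None => MAX_METRIC,
    }
}
```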
One of the major decision points is how and where we will store aggregated telemetry data. Below are two options.

### Option #1: Store Aggregated Telemetry In Telemetry Program

1. Telemetry samples are written on-chain by the telemetry agent running on each DZD.
2. At the end of each epoch, an aggregation function runs to calculate p99 latency and jitter values. Since this is already required for rewards calculation, it could be reused here. To make the results queryable, the output can be written to an on-chain record as part of the telemetry program (a sketch of this aggregation follows the diagram below).
3. The controller fetches the aggregated telemetry for all links from the telemetry program, as well as all link metadata from the serviceability program. Given the serviceability program provides metadata such as interface names and desired operational state, and the telemetry program contains latency values, enough information is available to render templated configuration for each DoubleZero device.
4. Each DoubleZero device's config agent receives updated metrics as part of the configuration it fetches during its normal polling clock tick to the controller.
```mermaid
flowchart TD

telemetry_agent["Telemetry Agent"]
config_agent["Config Agent"]

controller["DoubleZero Controller"]

subgraph s2["Telemetry Program"]
direction LR
raw_telemetry["Raw Telemetry Samples Per DZD/Link/Epoch"] --> n4(("End of Epoch")) --> agg_telemetry["Aggregated Telemetry Per DZD/Link/Epoch"]
end

subgraph s3["Serviceability Program"]
links["DoubleZero Links"]
end

telemetry_agent --> raw_telemetry
agg_telemetry -- "Poll aggregated telemetry per link" --> controller
links -- "Poll links" --> controller
controller -- "Send per-link metrics" --> config_agent
```
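To make step 2 above concrete, here is a minimal sketch of an end-of-epoch aggregation, assuming raw samples are per-link latencies in microseconds and a nearest-rank percentile definition; the rewards pipeline's actual percentile method is not specified in this RFC.

```rust
/// Nearest-rank p99 over one epoch's raw latency samples (microseconds).
/// Returns None for an empty sample set, i.e. a link with no telemetry,
/// which maps to the maximum-metric fallback described earlier.
fn p99_us(samples: &[u64]) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest-rank method: ceil(0.99 * n) is a 1-based rank into the
    // sorted samples.
    let rank = ((sorted.len() as f64) * 0.99).ceil() as usize;
    Some(sorted[rank - 1])
}
```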
### Option #2: Store Aggregated Telemetry In Serviceability Program

This option is similar to #1, but instead of storing aggregated telemetry data in a separate table within the telemetry program, the aggregation function could store it within the links table of the serviceability program itself. There are already delay (`delay_ms`) and jitter (`jitter_ms`) fields available in each row, which could be reused, or new fields could be created. This allows the controller to poll a single table for both link and aggregated telemetry information.

One downside of this is ownership of these records: contributors have access to edit their links and are currently able to populate delay/jitter values. These could be removed from the doublezero CLI, but more nefarious users could bypass this.
```mermaid
flowchart TD

telemetry_agent["Telemetry Agent"]
config_agent["Config Agent"]

controller["DoubleZero Controller"]

subgraph s2["Telemetry Program"]
direction LR
raw_telemetry["Raw Telemetry Samples Per DZD/Link/Epoch"] --> epoch(("End of Epoch"))
end

subgraph s3["Serviceability Program"]
links["DoubleZero Links"]
end

telemetry_agent --> raw_telemetry
epoch -- "Update links table w/ aggregated telemetry" --> links
links -- "Poll links" --> controller
controller -- "Send per-link metrics" --> config_agent
```
## Impact

### Serviceability Program

- Update the link data structure to include (see the sketch below):
  - A drained flag boolean, which would either set affinity on the link or apply the maximum metric (2^24 - 1)
  - Last-epoch latency + jitter (if option #2 is selected)
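For illustration, a hypothetical shape for the updated link record, assuming option #2; every field name below is an assumption rather than the serviceability program's actual schema.

```rust
/// Hypothetical additions to the serviceability program's link record;
/// field names are illustrative only.
struct Link {
    // ...existing fields such as delay_ms / jitter_ms...

    /// When true, the controller either sets affinity on the link or
    /// applies the maximum metric (2^24 - 1) instead of the
    /// telemetry-derived value.
    drained: bool,

    /// Aggregated telemetry from the previous epoch (option #2 only);
    /// None when the link produced no samples that epoch.
    last_epoch_p99_latency_us: Option<u64>,
    last_epoch_p99_jitter_us: Option<u64>,
}
```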
### Telemetry Program

- Build or reuse the rewards telemetry aggregation function and adapt it to write results on-chain, either to a new table in the telemetry program (option #1) or to the links table (option #2)

### Controller

- Update the controller to fetch the link table and aggregated telemetry, transform p99 latency into a metric value, and render the appropriate interface template
## Security Considerations

As mentioned in the open questions, it seems useful to have a mechanism to override a metric value for short-term traffic engineering. This could live in the links table, but if it can be modified by a bandwidth contributor, packet forwarding across DoubleZero can be impacted.

## Backwards Compatibility

N/A
## Open Questions

- Should we include an override field in the link record? We are solely dependent on telemetry behaving correctly, and even when it does, there are times when manual traffic engineering will be necessary. This is dangerous if it can be set by the bandwidth contributor, however, so it may belong somewhere else.

- I don't love the circular dependencies we're building, with the network decoupled from on-chain data. If a device loses management connectivity, we risk being unable to update metric values in the event of an operational issue. We specifically didn't want to run validators directly on the DZDs, since a network issue could also prevent us from ever reaching consensus.
