RFC2 - DoubleZero Exchange - DZX #509
ben-malbeclabs
commented
Jun 7, 2025
- Details the use-case for DZX
- Offers options for physical/logical DZX deployments
- Considers use of community and vendor design, as well as hybrid
Not sure if we have an approach for keeping rfc numbers unique, but looks like there's already an rfc1 ahead of this in the queue https://github.com/malbeclabs/doublezero/pull/439/files
The doc outlines 3 options for physical topology, then it outlines 4 options for logical topology. The doc provides a recommendation for logical topology (hybrid model) but I don't see a recommendation for physical topology. Did I miss it?
rfcs/rfc2-doublezero_exchange.md
Outdated
- Vendor solution abstracts away operational overhead of operating DZX for a cost
- Lack of visibility into network fabric

Cons
With DZX as a vendor fabric solution, is it a con that we won't be able to have a single DZX that spans multiple data center providers in any given metro?
Can you clarify what you mean by data center provider? Data center vendor like Equinix, Coresite etc.?
All of the vendors we have evaluated so far have POPs in multiple data centers.
Yes, I meant Equinix, Coresite, etc., and I was alluding to the fact that if we have a DZX that spans multiple data center vendors (like Equinix + Coresite) then their fabric won't be able to provide connectivity between all the DZDs in the DZX.
Potentially yes, hopefully they would cover the vast majority. If we ever choose to use one of them, then we should probably create a separate design doc to review that.
rfcs/rfc2-doublezero_exchange.md
Outdated
- New automation and configuration model to support DZX switch use-case
- Small switch buffers with Tomahawk may not be sufficient to absorb traffic bursts towards busy DZDs in the metro
- Limitations of scaling a single broadcast domain, for example the DZX topology MUST remain loop-free and MUST NOT rely on spanning-tree protocols
- Limited to three DZX switches
Where does the three switch limit come from?
On reflection, this is an inaccurate statement. If we build physical topology option 3, we could have loop free and any number of DZX switches. If we build physical topology option 2, we are limited to 2 DZX switches to keep it loop free if the fabric is layer 2.
rfcs/rfc2-doublezero_exchange.md
Outdated
- Limited to three DZX switches
- Is this a real limitation?
- Complex to integrate within a hybrid community/vendor model
- Layer 2 hand-off complexity
Can you be more specific about layer 2 hand-off complexity?
The complexity lies in creating a VLAN mapping database that represents virtual cross-connects between the different contributors. This will need to exist for the vendor fabric if we use that tech at all, but if we combine it into a hybrid model then that needs to be extended through a DZX layer 2 fabric, potentially mapping to trunk ports and sub-interfaces on the DZDs.
How much complexity that is, I suppose, is open to debate and will have to be factored into the cost of using layer 3 switches. Having a layer 3 hand-off feels much simpler to reason about.
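To make the hand-off complexity concrete, here is a minimal sketch of what such a VLAN mapping database might look like. Everything in it is an assumption for illustration: the `Endpoint` and `CrossConnectDB` names, the one-cross-connect-per-contributor-pair rule, and the `port.vlan` sub-interface naming are hypothetical and not taken from the RFC. The only hard fact used is the usable 802.1Q VLAN ID range of 1 to 4094.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """One side of a virtual cross-connect (hypothetical model)."""
    contributor: str
    dzd: str
    trunk_port: str
    vlan_id: int  # 802.1Q tag carried on the trunk port

    @property
    def sub_interface(self) -> str:
        # Illustrative naming only; real device syntax may differ.
        return f"{self.trunk_port}.{self.vlan_id}"


class CrossConnectDB:
    """Maps contributor pairs to the VLAN-tagged endpoints that join them."""

    def __init__(self) -> None:
        self._by_pair = {}   # frozenset({contributor_a, contributor_b}) -> (Endpoint, Endpoint)
        self._used = set()   # (dzd, trunk_port, vlan_id) tuples already allocated

    def add(self, a: Endpoint, b: Endpoint) -> None:
        for ep in (a, b):
            if not 1 <= ep.vlan_id <= 4094:
                raise ValueError(f"VLAN {ep.vlan_id} is outside the usable 802.1Q range")
        pair = frozenset((a.contributor, b.contributor))
        if len(pair) != 2:
            raise ValueError("a cross-connect must join two different contributors")
        if pair in self._by_pair:
            raise ValueError("these contributors already have a cross-connect")
        keys = [(ep.dzd, ep.trunk_port, ep.vlan_id) for ep in (a, b)]
        if keys[0] == keys[1] or any(k in self._used for k in keys):
            raise ValueError("VLAN tag already in use on that trunk port")
        self._used.update(keys)
        self._by_pair[pair] = (a, b)

    def lookup(self, contributor_a: str, contributor_b: str):
        """Return the endpoint pair for two contributors, or None if unmapped."""
        return self._by_pair.get(frozenset((contributor_a, contributor_b)))


# Example with made-up contributor and device names.
db = CrossConnectDB()
db.add(
    Endpoint("contributor-a", "dzd-a1", "Ethernet1", 100),
    Endpoint("contributor-b", "dzd-b1", "Ethernet3", 100),
)
```

Even this toy version has to track tag uniqueness per trunk port and a record per contributor pair, which is the bookkeeping a layer 3 hand-off would avoid.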
rfcs/rfc2-doublezero_exchange.md
Outdated
Cons
- New automation and configuration model to support DZX switch use-case
- Small switch buffers with Tomahawk may not be sufficient to absorb traffic bursts towards busy DZDs in the metro
- Limitations of scaling a single broadcast domain, for example the DZX topology MUST remain loop-free and MUST NOT rely on spanning-tree protocols
Might be worth calling out that we can bundle multiple links between two switches using link aggregation without creating loops
"DZX topology MUST remain loop-free" - if a contributor accidentally cables up a loop, do we have any mitigations? If not, that seems like a deal breaker that takes the L2 DZX off the table
We should have a mechanism that enforces the topology and refuses to enable a link that breaks these rules, as part of link RFS testing.
If we have a layer 2 network, we should run any protocols that are necessary to protect against layer 2 loops.
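One way such a topology check could work is sketched below: a union-find structure over the DZX switches that refuses any new inter-switch link whose two ends are already connected, since that link would close a layer 2 loop. This is an illustration under stated assumptions, not the project's mechanism. The `L2Topology` class and switch names are hypothetical, and the sketch assumes that a second link between the same two switches is bundled into a link aggregation group, so it is treated as part of the existing logical link rather than as a loop.

```python
class L2Topology:
    """Tracks enabled inter-switch links and rejects any that would form a loop."""

    def __init__(self) -> None:
        self._parent = {}   # union-find forest over switch names
        self.links = set()  # logical links, each a frozenset of two switch names

    def _find(self, node: str) -> str:
        # Find the representative of node's connected component,
        # halving the path as we walk up.
        self._parent.setdefault(node, node)
        while self._parent[node] != node:
            self._parent[node] = self._parent[self._parent[node]]
            node = self._parent[node]
        return node

    def can_enable(self, a: str, b: str) -> bool:
        if a == b:
            return False  # a switch cabled to itself is always a loop
        if frozenset((a, b)) in self.links:
            return True   # assumed to join the existing LAG between a and b
        # A new logical link is safe only if a and b are not yet connected.
        return self._find(a) != self._find(b)

    def enable(self, a: str, b: str) -> bool:
        """Enable the link if it keeps the fabric loop-free; return the decision."""
        if not self.can_enable(a, b):
            return False
        self.links.add(frozenset((a, b)))
        self._parent[self._find(a)] = self._find(b)
        return True


# Example with made-up switch names.
fabric = L2Topology()
fabric.enable("dzx1", "dzx2")
fabric.enable("dzx2", "dzx3")
closes_loop = fabric.enable("dzx3", "dzx1")   # rejected: dzx1 and dzx3 are already connected
lag_member = fabric.enable("dzx1", "dzx2")    # accepted: parallel link, assumed bundled
```

A check like this only covers links the controller knows about. Miscabling that bypasses it would still need data-plane protection such as storm control.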
rfcs/rfc2-doublezero_exchange.md
Outdated
- Vendor solution abstracts away operational overhead of operating DZX for a cost
- Lack of visibility into network fabric

Cons
Is cost a con for vendor fabrics?
We don't have an economic model for vendor fabrics yet; it is being worked on. List prices make it prohibitively expensive. It is one of the open questions.
Fair point, there is no explicit recommendation for the physical topology currently. It is probably option 2 if we can answer the question about ownership and cost for the interconnect. Option 1 would work in small deployments. Option 3 is probably the best technical solution but most likely the most expensive to implement.
| Option | Description | Pros | Cons | Recommendation |
|---|---|---|---|---|
| **1. DZX as a Layer 2 Fabric** | - DZX operated by DZ community members<br>- Each DZX switch operates at layer 2<br>- A single subnet (broadcast domain) is used to address all DZD NNIs facing the DZX<br>- The existing DZ network routing is extended across the DZX<br> - A full-mesh of IS-IS/PIM neighbors is formed across the DZX<br> - The DZX is transparent from a routing perspective<br>- All links are assumed to be of equal latency<br> - Helps incentivize diversification out of a single data center<br> - Prevents an arms race in the metro<br>- The DZX switch maintains a layer 3 connection for telemetry | - The DZX switch requires a simpler feature set (layer 2 only) than a DZD<br>- Allows the DZX operator to purchase less expensive switch platforms e.g. Arista 7060X6 Tomahawk<br>- All links are assumed to be of equal latency<br> - Helps incentivize diversification out of a single data center<br> - Prevents an arms race in the metro<br>- Full visibility for monitoring and troubleshooting | - New automation and configuration model to support DZX layer 2 switch use-case<br>- Small switch buffers with Tomahawk may not be sufficient to absorb traffic bursts towards busy DZDs in the metro<br> - Would likely require traffic shaping on DZDs<br>- Limitations of scaling a single broadcast domain, for example the DZX topology MUST remain loop-free and MUST NOT rely on spanning-tree protocols<br> - Limited to two DZX switches<br> - Requires uses of layer 2 protocols such as spanning-tree protocol, storm-control<br>- Complex to integrate within a hybrid community/vendor model<br> - Layer 2 hand-off complexity e.g. mapping between VLANs | |
Definitely a nitpick but I can't help it: "MUST NOT rely on spanning-tree protocols".... "Requires uses of layer 2 protocols such as spanning-tree protocol" 😬
Not a nitpick. That definitely needs to be made clearer, and I will address it.
## New Terminology

- DoubleZero Exchange (DZX): a network fabric creating a contiguous network between all network contributors with a metro
- DoubleZero Device / DoubleZero eXchange (DZDx): a single DZ network device that acts as both a DZD and a DZX
I like this (DZDx), but we'll just use it internally, right? Otherwise it'll be confusing to folks while we only have one dedicated DZX (galaxy-ny) and mostly DZDxs.
This is awesome. Ship it. Sorry for being the slow poke on this review.