File: `rfcs/rfc3-simultaneous-tunnels.md` (new file, 101 additions, 0 deletions)
# Supporting Multiple Tunnels

## Summary

DoubleZero needs to support multiple tunnels of the same or different types. To terminate multiple tunnels to the same DZD from a user machine, each tunnel needs a unique (source, destination) pair of IP endpoints so that the Linux kernel can correctly demux traffic. This document defines the changes necessary to allow this, and bootstraps the storage of network interface metadata needed for upcoming work. The outcome of this work is a new on-chain interfaces table, minimally populated with the interfaces used for tunnel termination, with all downstream systems deriving tunnel endpoints from this table rather than from the `public_ip` field of the devices table.
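To illustrate why the unique endpoint pair matters, kernel tunnel demux can be thought of as a lookup keyed on (source, destination): two tunnels sharing the same pair are indistinguishable to the kernel. The following Rust sketch is purely illustrative (the names and addresses are invented, not DoubleZero code):

```rust
use std::collections::HashMap;

// Illustrative model of kernel tunnel demux: a received packet is matched
// to a tunnel by its (source, destination) IP pair. Two tunnels with the
// same pair collide, which is why each additional tunnel to the same DZD
// needs a distinct termination IP.
type Endpoint = (&'static str, &'static str); // (src, dst)

/// Returns false if a tunnel with the same (src, dst) pair already exists.
fn register_tunnel(
    table: &mut HashMap<Endpoint, String>,
    src: &'static str,
    dst: &'static str,
    name: &str,
) -> bool {
    if table.contains_key(&(src, dst)) {
        return false;
    }
    table.insert((src, dst), name.to_string());
    true
}

fn main() {
    let mut table = HashMap::new();
    assert!(register_tunnel(&mut table, "203.0.113.10", "198.51.100.1", "ibrl0"));
    // Same endpoint pair: the kernel cannot demux a second tunnel.
    assert!(!register_tunnel(&mut table, "203.0.113.10", "198.51.100.1", "mcast0"));
    // A second termination IP on the same DZD resolves the collision.
    assert!(register_tunnel(&mut table, "203.0.113.10", "198.51.100.2", "mcast0"));
}
```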

## Motivation

Multiple tunnel support is required now that DoubleZero supports IBRL and multicast. In fact, multicast cannot be publicly released without multiple tunnel support due to restrictions in the Linux kernel.

## New Terminology

## Alternatives Considered

1. **Only support one tunnel on a user's machine.** In the current DoubleZero architecture, we're unable to support both unicast and multicast forwarding on a single tunnel. This would require a user to make a choice between using DoubleZero for unicast traffic or multicast traffic, which is not a user friendly tradeoff.

2. **Require users to obtain a second public address.** While this would satisfy the requirement of a unique (source, destination) tunnel IP endpoint per tunnel, it pushes the problem onto DoubleZero's users and risks hindering uptake in exchange for saving engineering work.

3. **Adapt the devices table in the current smart contract to fit a second tunnel (i.e. multicast) endpoint.** While this seems like significantly less work on its face, we would end up touching the same portions of the stack as the more general solution, since each layer would need to be taught to understand the new field.
> **Reviewer:** This also has the downside of limiting us to two tunnels, so it's not a general solution.
4. **Use GRE keys to identify tunnels.** GRE keys enable a router to de-encapsulate packets and identify the correct tunnel. This would have been a good approach, except that at rates of about 250 Mbps packets were being dropped, which makes it an unviable option.

## Detailed Design

TBD

### Data Structure Changes

A new data structure, `Interface`, will be defined that is attached to a parent `Device`'s public key. The relationship between an `Interface` and `Device` is many-to-one.

```mermaid
classDiagram
    class Interface {
        AccountType account_type
        Pubkey owner
        Pubkey device_pk
        string name
        string device_type
        IpV4Inet ip4_addr
        bool tunnel_termination
    }
    class Device {
        AccountType account_type
        Pubkey owner
        u128 index
        u8 bump_seed
        Pubkey location_pk
        Pubkey exchange_pk
        DeviceType device_type
        IpV4 public_ip
        DeviceStatus status
        String code
        NetworkV4List dz_prefixes
    }

    Interface --> Device : device_pk
```

> **Reviewer:** Today we can infer the interface that a Link is associated with by looking at the link's `tunnel_net`. But now that we have `Interface`, should we make this explicit by adding an optional relationship between `Interface` and `Link`?

> **Reviewer:** Something worth considering would be to have a field in the `Device` class, such as the `device_type` or an equivalent, that allows us to validate that the interface name is sensible.
>
> **Reviewer:** Sorry, the `device_type` is there as a switch; something more like `device_model` could capture this.
>
> **Author:** Is there a list of approved device models that we support and could validate against, or is this more of a future check?
>
> **Reviewer:** We have two models: 7130LBR and 7280CR3. Both have very different conventions around interface naming, e.g. swA/B/C on the former and EthA/B on the latter. This can be future work, but we need to figure out how we want to fail if these sorts of fields prove to have garbage data.
> **Reviewer:** @juan-malbeclabs you probably understand this better than me. Wouldn't this design require loading all interfaces in order to reduce to the set for `device_pk`? Would it be better to have a list of interface pubkeys in the device, so they can be looked up by account pubkey?
>
> **Reviewer:** Although I conceptually agree with the diagram, I would store all interfaces in a second account linked to the `Device`: a `DeviceInterfaceList` account. We would then read both the list of Devices and the list of Interfaces (one account per Device, with multiple Interfaces). The goal is to be able to modify this account whenever an interface is created or deleted, without having to load the Device information.
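The proposed data structure, along with the reviewers' `DeviceInterfaceList` idea, might be sketched in Rust as follows. These are stand-in types for illustration only; the real program would use `solana_program::pubkey::Pubkey` and the existing on-chain enums:

```rust
// Sketch only: stand-in types for the proposed on-chain accounts.
type Pubkey = [u8; 32];

#[derive(Debug, Clone)]
struct Interface {
    account_type: u8,         // stand-in for the AccountType discriminant
    owner: Pubkey,
    device_pk: Pubkey,        // many-to-one link back to the parent Device
    name: String,             // validation against device model is an open question
    device_type: String,
    ip4_addr: [u8; 4],        // stand-in for IpV4Inet
    tunnel_termination: bool, // true if usable as a tunnel endpoint
}

// Reviewers suggested a second account holding the interface pubkeys for a
// device, so it can be updated without loading the Device account itself.
struct DeviceInterfaceList {
    device_pk: Pubkey,
    interfaces: Vec<Pubkey>,
}

fn main() {
    let lo = Interface {
        account_type: 1,
        owner: [0u8; 32],
        device_pk: [1u8; 32],
        name: "Loopback1".to_string(), // hypothetical interface name
        device_type: "switch".to_string(),
        ip4_addr: [100, 64, 0, 2],
        tunnel_termination: true,
    };
    assert!(lo.tunnel_termination);
    let list = DeviceInterfaceList { device_pk: lo.device_pk, interfaces: vec![[2u8; 32]] };
    assert_eq!(list.interfaces.len(), 1);
}
```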


### Network Changes

Tunnel endpoint IPs will be assigned from a general pool of IP addresses. The pool is initially sourced from the minimum /29 that each network contributor provides; these IPs are already used to assign source IPs for multicast tunnels. The supply is limited and will be exhausted fairly quickly. To mitigate this, DoubleZero can either request more IPs from network contributors or, if necessary, pull IPs from the /21 that DoubleZero owns. The /21 is set aside for edge filtration, so it should only be used as a last resort.
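The allocation described above amounts to handing out addresses from a finite pool until it is exhausted. A minimal sketch, with invented names and addresses (how the /29 is carved into usable host addresses is glossed over):

```rust
// Illustrative pool of tunnel-termination IPs sourced from contributor /29s.
struct IpPool {
    available: Vec<String>,
}

impl IpPool {
    fn new(ips: &[&str]) -> Self {
        IpPool { available: ips.iter().map(|s| s.to_string()).collect() }
    }

    /// Allocate the next free IP, or None when the pool is exhausted --
    /// the point at which more contributor space (or the reserved /21)
    /// would be needed.
    fn allocate(&mut self) -> Option<String> {
        self.available.pop()
    }

    /// Return an IP to the pool when its tunnel is torn down.
    fn release(&mut self, ip: String) {
        self.available.push(ip);
    }
}

fn main() {
    // e.g. two usable addresses carved from a contributor-provided /29
    let mut pool = IpPool::new(&["100.64.0.2", "100.64.0.3"]);
    assert!(pool.allocate().is_some());
    assert!(pool.allocate().is_some());
    assert!(pool.allocate().is_none()); // exhausted
    pool.release("100.64.0.2".to_string());
    assert!(pool.allocate().is_some());
}
```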

### Service Changes

#### CLI
1. The CLI currently selects the tunnel termination endpoint for a user connection based on min(latency) across all DZDs. If a tunnel is already terminated on the selected DZD, we need to select the next best endpoint on the same DZD.
> **Reviewer:** Is it not fair to just assume that the latency of one interface is as good as another on the same DZD, and just iterate the interfaces as more tunnels are added?

2. Users need to be able to perform CRUD operations on the on-chain interfaces, i.e. `doublezero interface create`.
> **Reviewer:** I would think contributors need to do this, not users.
>
> **Reviewer:** Are the interfaces associated with the devices created by the contributor? In this context, who is the user? Who has the authority to execute the `doublezero interface create` command?
3. Users need to be able to display interfaces listed on-chain via `doublezero interfaces list` or some derivative command.
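The selection rule in step 1 — min(latency) across DZDs, then the next unused termination interface on the chosen DZD — could look roughly like the sketch below. All names are hypothetical; this is not the CLI's actual code:

```rust
#[derive(Debug, Clone)]
struct Candidate {
    device: String,       // DZD identifier
    interface_ip: String, // tunnel termination endpoint on that DZD
    latency_ms: f64,
}

/// Pick the lowest-latency candidate whose interface IP is not already in
/// use by an existing tunnel on this machine. Because every interface on a
/// DZD shares that DZD's latency (as a reviewer notes above), skipping
/// in-use IPs naturally falls through to the next endpoint on the same DZD.
fn select_endpoint<'a>(candidates: &'a [Candidate], in_use: &[String]) -> Option<&'a Candidate> {
    candidates
        .iter()
        .filter(|c| !in_use.contains(&c.interface_ip))
        .min_by(|a, b| a.latency_ms.partial_cmp(&b.latency_ms).unwrap())
}

fn main() {
    let candidates = vec![
        Candidate { device: "dzd-nyc".into(), interface_ip: "100.64.0.2".into(), latency_ms: 4.0 },
        Candidate { device: "dzd-nyc".into(), interface_ip: "100.64.0.3".into(), latency_ms: 4.1 },
        Candidate { device: "dzd-lon".into(), interface_ip: "100.64.1.2".into(), latency_ms: 70.0 },
    ];
    // First tunnel: lowest latency overall.
    let first = select_endpoint(&candidates, &[]).unwrap();
    assert_eq!(first.interface_ip, "100.64.0.2");
    // Second tunnel: that endpoint is in use, so take the next one on the same DZD.
    let second = select_endpoint(&candidates, &[first.interface_ip.clone()]).unwrap();
    assert_eq!(second.interface_ip, "100.64.0.3");
}
```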

#### Daemon
Latency probing changes are needed, as the current implementation probes each DZD using the `public_ip` field of its device record:
> **Reviewer:** I'm not understanding the benefit (or necessity) of testing latency to every tunnel termination interface on every DZD. Why would we expect different IPs on the same router to have different latency characteristics?

1. Look at the device table, then the interface table keyed by device pubkey
2. Filter to tunnel termination interfaces per device
3. Initiate latency probes per termination interface
4. Store results as `<Device: Interface: LatencyResult>` and serve them via the `/latency` endpoint for the CLI
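Steps 1–4 above amount to filtering interfaces and building a map keyed by (device, interface). A minimal sketch, with assumed types and a stubbed-out probe (not the daemon's actual code):

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct LatencyResult {
    rtt_ms: f64,
}

/// Step 2: from (device, interface, tunnel_termination) tuples, keep only
/// the interfaces flagged for tunnel termination.
fn termination_interfaces(all: &[(String, String, bool)]) -> Vec<(String, String)> {
    all.iter()
        .filter(|(_, _, term)| *term)
        .map(|(dev, iface, _)| (dev.clone(), iface.clone()))
        .collect()
}

/// Steps 3-4: probe each termination interface (stubbed here) and store
/// results keyed by (device, interface) for the /latency endpoint.
fn probe_all(targets: &[(String, String)]) -> HashMap<(String, String), LatencyResult> {
    targets
        .iter()
        .map(|t| (t.clone(), LatencyResult { rtt_ms: 0.0 })) // stub probe
        .collect()
}

fn main() {
    let all = vec![
        ("dzd-nyc".to_string(), "Loopback1".to_string(), true),
        ("dzd-nyc".to_string(), "Ethernet1".to_string(), false),
        ("dzd-lon".to_string(), "Loopback1".to_string(), true),
    ];
    let targets = termination_interfaces(&all);
    assert_eq!(targets.len(), 2); // non-termination interfaces filtered out
    let results = probe_all(&targets);
    assert!(results.contains_key(&("dzd-nyc".to_string(), "Loopback1".to_string())));
}
```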

#### Activator
> **Author:** @elitegreg are there any things missing from the activator / smart contract side?
>
> **Reviewer:** Not that I can think of, but I'm not sure I'm grasping the activator change listed.

* Logic for assigning an IP will need to be modified to account for `n` > 1 IPs instead of just the first IP available
* Smart contract will need to be amended to associate `n` > 1 interfaces with a particular device
* Initial bootstrapping of a device may have to be revisited


#### Controller
* *optional*: configuration for tunnel termination loopbacks generated in device template
> **Reviewer:** My preference would be to do this in the controller templating.
>
> **Author (@bgm-malbeclabs, Jun 10, 2025):** Agreed; can we launch multicast without the controller templating? (My thought is yes, but I could be convinced otherwise.) The templating feels like it could be a separate RFC to sunset Ansible, so we don't block multicast too long, but I'm open to suggestions.
>
> **Reviewer (@ben-malbeclabs, Jun 12, 2025):** I think we can move parts of the Ansible config-generation process to the controller over time. Here we just need to add loopback interfaces, and I would hope it isn't too much work. We need a way to allocate these from the `dz_prefix` pool and pass them from the activator to the controller, I guess. Feel free to correct me if it's more work than I appreciate; we can continue to push it manually for now.

* *optional*: migrate logic from Ansible into the controller to reduce the need for Ansible

## Security Considerations

While this RFC introduces more tunnels, the same security mechanisms remain in place: unauthorized actors are guarded against by the allowlist generated through the smart contract. Any security vulnerabilities that exist apply to any and all tunnels, not just the new ones.

There is more information exposed on-chain, namely the `Interface` struct. Someone could perhaps use that information to assemble a fuller picture of a contributor's topology, but network contributors are providing resources to be used in an open and transparent way, so this is likely not an issue.


## Backwards Compatibility

New logic will introduce a breaking change as this RFC covers the initial rollout of multicast. This release will be tagged with a minor version of 0.2.0 to signify the breaking change.

## Open Questions
* While not necessary for this initial multiple tunnels RFC, should logic be added to the controller to start handling some of the ansible functionality?
* Updating the smart contract seems non-trivial; must it be this way or are there things that can reduce the friction to smart contract changes?
* What kind of data validation / sanitization is required to ensure that bad data isn't entered? In a SQL database, indexes (or an ORM) can be used to ensure data conforms, but it's not clear what kind of on-chain validation can or should be done.
* Should a user be able to provide their own "termination point" or should it be assigned by DoubleZero? To start, it makes sense to not allow this but is this functionality that a user would want?