Skip to content

Conversation

@bgm-malbeclabs
Copy link
Contributor

@bgm-malbeclabs bgm-malbeclabs commented Jun 9, 2025

Summary of Changes

@bgm-malbeclabs bgm-malbeclabs marked this pull request as draft June 9, 2025 22:17
@bgm-malbeclabs bgm-malbeclabs changed the title !#475 - simultaneous tunnels RFC RFC3: simultaneous tunnels Jun 9, 2025
AccountType account_type
Pubkey owner
Pubkey device_pk
string name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something worth considering would be to have in the Device class the device_type or an equivalent field that allows us validate that the interface name is sensible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, the device_type is there as a switch, but something more like device_model could capture this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a list of approved device_models that we support that we could validate against? or is this more for something that we could check in the future

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have two models: 7130LBR and 7280CR3. Both have very different conventions around interface naming e.g. swA/B/C on the former and EthA/B on the latter.

Can be future, but we need to figure out how we want to fail if these sorts of fields prove to have garbage data.

* can a user set their own termination "points" etc

#### Controller
* *optional*: configuration for tunnel termination loopbacks generated in device template
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference would be to do this in the controller templating.

Copy link
Contributor Author

@bgm-malbeclabs bgm-malbeclabs Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed; can we launch multicast without the controller templating? (my thought is yes but could be convinced otherwise of course). Feels like the templating could be a separate RFC to sunset ansible so that we don't block multicast too long but again open to suggestions otherwise.

Copy link
Contributor

@ben-malbeclabs ben-malbeclabs Jun 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can take parts of Ansible config generation process and move it to the controller over time. I think here we just need to add loopback interfaces, and I would hope it isn't too much work. We need a way to allocate these from the dz_prefix pool and pass them from the activator to the controller I guess.

Feel free to correct me if it is more work than I appreciate, we can continue to push it manually for now.

3. Initiate latency probes per tunnel termination
4. Store results as <Device: Interface: LatencyResult> and serve via /latency endpoint for CLI

#### Activator
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elitegreg are there any things that are missing from the activator / smart contract side?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I can think of, but I'm not sure I'm grasping the Activator change listed.


## Motivation

Multiple tunnel support is required now that DoubleZero supports IBRL and multicast. In fact, multicast can not be publicly released wthout multiple tunnel support due to restrictions in the Linux kernel.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a huge deal, but the limitations aren't from Linux but rather GRE.

}

Interface --> Device : device_pk
````
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@juan-malbeclabs you probably understand this better than me. Wouldn't this design require loading all interfaces in order to reduce to the set for device_pk? Would it be better to have a list of interface pubkeys in the device? Then they can be looked up by the account pubkey?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I conceptually agree with the diagram, I would store all interfaces in a second account linked to the Device. We would create a DeviceInterfaceList account. This way, we would need to read both the list of Devices and the list of Interfaces (one account per Device with multiple Interfaces).

The goal is to be able to modify this account whenever an interface is created or deleted, without having to load the Device information.

### Service Changes

#### CLI
1. The CLI currently selects the tunnel termination endpoint for a user connection based on min(latency) across all DZDs. In the event there is an existing tunnel terminated on the DZD, we need to select the next best endpoint on the same DZD.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not fair to just assume that the latency of one interface is as good as another on the same DZD and just iterate the interfaces as more tunnels are added?

3. Initiate latency probes per tunnel termination
4. Store results as <Device: Interface: LatencyResult> and serve via /latency endpoint for CLI

#### Activator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I can think of, but I'm not sure I'm grasping the Activator change listed.

@bgm-malbeclabs bgm-malbeclabs requested a review from Copilot June 12, 2025 18:29
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces RFC3, a design proposal for enabling simultaneous tunnel support in DoubleZero by defining a new Interface data structure, updating service components, and outlining related on-chain and network changes.

  • Add initial RFC document outlining motivation, design, and components affected.
  • Define new Interface structure alongside Device, and detail CLI, daemon, activator, and controller updates.
  • Document security considerations, backwards compatibility impact, and open questions.
Comments suppressed due to low confidence (1)

rfcs/rfc3-simultaneous-tunnels.md:54

  • The Mermaid code block is opened with three backticks but closed with four; use three backticks consistently.
</details>


#### CLI
1. The CLI currently selects the tunnel termination endpoint for a user connection based on min(latency) across all DZDs. In the event there is an existing tunnel terminated on the DZD, we need to select the next best endpoint on the same DZD.
2. Users need to be able perform CRUD operations on the on-chain interfaces i.e. `doublezero interface create`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the interfaces associated with the devices created by the contributor? In this context, who is the User?
Who has the authority to execute the doublezero interface create command?


#### CLI
1. The CLI currently selects the tunnel termination endpoint for a user connection based on min(latency) across all DZDs. In the event there is an existing tunnel terminated on the DZD, we need to select the next best endpoint on the same DZD.
2. Users need to be able perform CRUD operations on the on-chain interfaces i.e. `doublezero interface create`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think contributors need to do this, not users.

3. Users need to be able to display interfaces listed on-chain via `doublezero interfaces list` or some derivative command.

#### Daemon
Latency probing changes are needed for this as the current implementation looks at the public_ip field of device record to probe each DZD:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not understanding the benefit (or necessity) of testing latency to every tunnel termination interface on every DZD. Why would we expect different IPs on the same router to have different latency characteristics?


2. **Require users to obtain a second public address.** While this would satisfy the requirement of a unique (source, destination) tunnel IP endpoint per tunnel, it pushes this issue back on the users of DoubleZero and possibly prevents user uptake at the expense of more engineering work.

3. **Adapt the devices table in the current smart contract to fit a second tunnel (i.e. multicast) endpoint.** While this seems like significantly less work on its face, we end up needing to touch the same portions of the stack as a more ideal solution as they all need to be taught to understand this field.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also has the downside of limiting us to 2 tunnels, so it's not a general solution


```mermaid
classDiagram
class Interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today we can infer the interface that a Link is associated with by looking at the link's tunnel_net. But now that we have Interface, should we make this explicit by adding an optional relationship between Interface and Link?

@bgm-malbeclabs bgm-malbeclabs changed the title RFC3: simultaneous tunnels [on hold] RFC3: simultaneous tunnels Jul 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants