Public MCTP Medium Physical Address to MCTP Endpoint D-Bus Object path #52

Open
ThuBaNguyen opened this issue Oct 2, 2024 · 31 comments


@ThuBaNguyen
Contributor

  1. Background
  • One MCTP endpoint can support multiple medium interfaces: I2C, USB, PCIe-VDM. These interfaces can run at different speeds. Sensor polling and event monitoring should use the fastest interface to improve performance.
  • An upper-layer application such as pldmd can support multiple MCTP binding interfaces at the same time, and it can choose the fastest one. So pldmd needs to know the medium type to decide which medium interface to use to communicate with the endpoint (EP) when there are several.
  2. Proposal
  • Add a property named Type to au.com.codeconstruct.MCTP.Endpoint1, with data type byte, to define the medium type as in Table 2 – MCTP physical medium identifiers in DSP0239 V1.3.0.
  • Add a property named Address to au.com.codeconstruct.MCTP.Endpoint1, with data type string or byte array, as in Table 27 – Routing Table Entry format in DSP0236 V1.3.0. The size of this Address property can differ depending on the MCTP binding type. Maybe we can add an AddressSize property to give the size of the address in bytes.
  • Following Table 27 – Routing Table Entry format, the port number should also be published on the MCTP Endpoint D-Bus object path. When the EP is a bridge, we need to identify the particular bus connection under that bridge for which the entry's physical address is defined.
@amboar
Contributor

amboar commented Oct 2, 2024

By DSP0236 v1.3.3 Figure 9 this statement isn't necessarily true:

The upper application such as pldmd can support multiple MCTP-Binding interface at the same time. It can also choose of interface with highest speed. So pldmd need to know the medium type to decide which medium interface should it use to communicate with EP when there are many medium interfaces.

The one EID can be assigned to multiple ports on a device, in which case the interface selection must be done by the kernel route table.

Given that I think that we need to adjust your proposal a little. What thoughts do you have on accommodating multiple interfaces for a given EID?

@ThuBaNguyen
Contributor Author

ThuBaNguyen commented Oct 3, 2024

The one EID can be assigned to multiple ports on a device, in which case the interface selection must be done by the kernel route table.

Given that I think that we need to adjust your proposal a little. What thoughts do you have on accommodating multiple interfaces for a given EID?

In that case, in my opinion, the Type property can be integrated into the Address property.
The address type would change from an array of bytes to a vector of structs {typeAddress, portNumber, array of bytes}.
So the Address property would list the address structs for the interfaces supported by one EID. Each entry includes the address type, the port number, and the address bytes.
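
A minimal Python sketch of that shape, for illustration only. The names `PhysAddressEntry`, `medium_type`, `port` and `address` are hypothetical and not part of mctpd, and the address bytes are made-up values.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PhysAddressEntry:
    """One {typeAddress, portNumber, address bytes} entry (hypothetical layout)."""
    medium_type: int  # physical medium identifier, one byte
    port: int         # port number from the routing table entry
    address: bytes    # physical address; length depends on the binding


def address_sizes(entries: List[PhysAddressEntry]) -> List[int]:
    """Return the byte length of each entry's address."""
    return [len(e.address) for e in entries]


# Illustrative values only: one I2C-style and one PCIe-style address.
entries = [
    PhysAddressEntry(medium_type=0x03, port=0, address=bytes([0x1D])),
    PhysAddressEntry(medium_type=0x0D, port=3, address=bytes([0x01, 0x00])),
]
print(address_sizes(entries))
```

Because each entry carries its own byte string, a separate AddressSize property would probably not be needed with this layout.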

@amboar
Contributor

amboar commented Oct 4, 2024

So changing the address type isn't feasible without changing the object version. I think it's something we should try to avoid. Bear in mind that the object path is keyed by the EID, so using an array of addresses in the interface feels like it's shoe-horning the information into places it's not designed to belong.

Further, if one EID is mapped across multiple ports on a device, exposing this information in userspace isn't useful anyway as it shouldn't be the responsibility of userspace to make the choice.

I think we need something more generic, possibly:

/au/com/codeconstruct/mctp1/devices/<UUID>

On this we can have one interface describing the network in which the device is participating, and another interface describing its physical ports, their properties, and the EID associated with each port. That way you can use the current known EID to look up the endpoint in mctpd, pull the UUID from the xyz.openbmc_project.Common.UUID interface, and then look it up under the devices path to find the associated interfaces. If all interfaces are mapped to one EID then it's up to the kernel to make the routing choice, but if distinct ports on the device have distinct EIDs then at least userspace can choose an EID based on the transport properties.

@ThuBaNguyen
Contributor Author

/au/com/codeconstruct/mctp1/devices/<UUID>

I think we need to trim some characters from the UUID to make it compatible with the D-Bus object path requirements. Right?


On this we can have one interface describing the network in which the device is participating, 

Do you mean the interface like below?

xyz.openbmc_project.MCTP.Network    interface -         -                                      -
.NetworkIds                         property  au         1 2 3                                 const

and another interface describing its physical ports, their properties, and the EID associated with each port.

Do you have any ideas about the physical ports interface?
I think we can use the name xyz.openbmc_project.MCTP.Ports.
What do you think about the layout below?

xyz.openbmc_project.Common.Port     interface -         -                                      -
.EIDs                               property  ay        8 9                                    const
.Types                              property  ay        3 d                                    const
.PortNumber                         property  ay        0 3                                    const
.Address                            property  v{ay}     {I2Caddress} {PCIe Address}            const

So the device has 2 ports:

  • Port 0: EID = 8, medium type 3 (SMBus), port number (in routing entry) 0, physical address {I2CAddress}
  • Port 1: EID = 9, medium type d (PCIe 5.0), port number (in routing entry) 3, physical address {PCIeAddress}

That way you can use the current known EID to look up the endpoint in mctpd, pull the UUID from the xyz.openbmc_project.Common.UUID interface, and then look it up under the devices path to find the associated interfaces. If all interfaces are mapped to one EID then its up to the kernel to make the routing choice, but if distinct ports on the device have distinct EIDs then at least userspace can choose an EID based on the transport properties.

@jk-ozlabs
Member

I think we need trims some character from UUID to make it compatible with D-Bus object path requirement. Right?

There is no limit on object path length, but we would need to comply with the object name requirements, and format the UUID accordingly (mainly: no - characters)
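
As a rough sketch of that formatting step: D-Bus object path elements may only contain ASCII letters, digits and underscores, so one option is to use the 32-digit hex form of the UUID. The function names and the `DEVICES_BASE` constant below are hypothetical; the base path is just the one proposed earlier in this thread.

```python
import re
import uuid

# D-Bus object path elements may only contain ASCII letters, digits and '_'.
_VALID_ELEMENT = re.compile(r"[A-Za-z0-9_]+")

# Hypothetical base path, taken from the proposal in this thread.
DEVICES_BASE = "/au/com/codeconstruct/mctp1/devices"


def uuid_to_path_element(value: str) -> str:
    """Format a UUID string as a single valid object path element."""
    element = uuid.UUID(value).hex  # 32 lowercase hex digits, no '-'
    if not _VALID_ELEMENT.fullmatch(element):
        raise ValueError(f"not a valid path element: {element!r}")
    return element


def device_path(value: str) -> str:
    """Build the (hypothetical) per-device object path for a UUID."""
    return f"{DEVICES_BASE}/{uuid_to_path_element(value)}"


print(device_path("12345678-1234-5678-1234-567812345678"))
```

Replacing each `-` with `_` would work equally well; dropping the separators simply keeps the element shorter.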

On this we can have one interface describing the network in which the device is participating,

Do you mean the interface like below?

xyz.openbmc_project.MCTP.Network    interface -         -                                      -
.NetworkIds                         property  au         1 2 3                                 const

Probably not. We would want to describe the connectivity to a device, rather than a set of network numbers (and then EIDs/hardware addresses elsewhere). We may be able to do that with an object path reference, but can discuss with Andrew on the design there.

That way you can use the current known EID to look up the endpoint in mctpd, pull the UUID from the xyz.openbmc_project.Common.UUID interface, and then look it up under the devices path to find the associated interfaces. If all interfaces are mapped to one EID then its up to the kernel to make the routing choice, but if distinct ports on the device have distinct EIDs then at least userspace can choose an EID based on the transport properties.

I think we're missing part of the requirements here: how are you discovering those devices in the first place?

If we just have the Medium Type available on the endpoint object, you should then have all you need to choose an endpoint, no?

Overall, I think we need to make some overall design choices on whether we would "prefer" the split vs. unified EID approach here, and target our design decisions accordingly (of course, we'd still want to support both approaches regardless). I'll have a think about that and update shortly.

@amboar
Contributor

amboar commented Oct 9, 2024

If we just have the Medium Type available on the endpoint object, you should then have all you need to choose an endpoint, no?

But it's a matter of finding the set of EIDs associated with a specific device, right? If we don't have an index object such as something addressed by the UUID, then we have to iterate all the EID objects to correlate the UUIDs to find the faster interface.
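
To illustrate the iteration cost being described, here is a minimal sketch of what a client would have to do without a UUID-keyed index: walk every endpoint and group them by UUID. The tuple layout, the function name and the sample values are assumptions for illustration, not an mctpd interface.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Endpoint = Tuple[int, int, str]  # (network id, EID, UUID string)


def index_by_uuid(endpoints: Iterable[Endpoint]) -> Dict[str, List[Tuple[int, int]]]:
    """Group (network, EID) pairs by the UUID reported for each endpoint."""
    index: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for net, eid, dev_uuid in endpoints:
        index[dev_uuid].append((net, eid))
    return dict(index)


# Illustrative data: one device visible as two EIDs, plus an unrelated device.
endpoints = [
    (1, 8, "uuid-a"),
    (2, 31, "uuid-a"),
    (1, 9, "uuid-b"),
]
print(index_by_uuid(endpoints))
```

An object addressed by UUID would let mctpd publish this grouping once, instead of every client rebuilding it.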

@ThuBaNguyen
Contributor Author

I think we need trims some character from UUID to make it compatible with D-Bus object path requirement. Right?

There is no limit on object path length, but we would need to comply with the object name requirements, and format the UUID accordingly (mainly: no - characters)

On this we can have one interface describing the network in which the device is participating,

Do you mean the interface like below?

xyz.openbmc_project.MCTP.Network    interface -         -                                      -
.NetworkIds                         property  au         1 2 3                                 const

Probably not. We would want to describe the connectivity to a device, rather than a set of network numbers (and then EIDs/hardware addresses elsewhere). We may be able to do that with an object path reference, but can discuss with Andrew on the design there.

That way you can use the current known EID to look up the endpoint in mctpd, pull the UUID from the xyz.openbmc_project.Common.UUID interface, and then look it up under the devices path to find the associated interfaces. If all interfaces are mapped to one EID then its up to the kernel to make the routing choice, but if distinct ports on the device have distinct EIDs then at least userspace can choose an EID based on the transport properties.

I think we're missing part of the requirements here: how are you discovering those devices in the first place?

There are two ways mctpd can discover a device.

  1. In the current mctpd, when the BMC is the bus owner (BO), we use the MCTP D-Bus methods such as .AssignEndpoint, .AssignEndpointStatic, .LearnEndpoint, and .SetupEndpoint. In this case the EID and address are known, and the port number is 0. The network is assigned by mctpd.
  2. In the future, when the BMC is an endpoint (EP), the mctpd service can send GetRoutingTable to get the routing table from the BO; each routing entry has the EID, address, and port number. The network is assigned by mctpd.

If we just have the Medium Type available on the endpoint object, you should then have all you need to choose an endpoint, no?

It is not enough because we can't map the medium type of one EID.

The other thing we need to consider is the name of the endpoint (terminus name), which is required to create the PLDM sensors. Currently, the terminus name can come from the PLDM "Overall system entity AUX name PDR" in pldmd. The other solution is static configuration in mctpd.

  1. About the "Overall system entity AUX name PDR": this PDR is optional, and I found that many devices do not support it. They don't even support the sensor name PDRs. CXL cards are an example. When there are multiple cards, the terminus names from the PDRs can be the same, so pldmd needs to use the physical address to index the cards.
  2. About the static configuration: when the BMC is an EP it can't assign a static EID to a terminus. Moreover, on the PCIe interface the physical address is enumerated at boot time, so static configuration will not work either.
    In that case, mctpd needs to provide the routing table information on D-Bus so that other applications such as pldmd can map the port number and physical address to a terminus name.

Overall, I think we need to make some overall design choices on whether we would "prefer" the split vs. unified EID approach here, and target our design decisions accordingly (of course, we'd still want to support both approaches regardless). I'll have a think about that and update shortly.

Yes, please think about that. In the current design, only mctpd knows the physical address (the physical address is mandatory for an endpoint on an MCTP interface). If mctpd does not share that information, other applications can run into trouble.

@ThuBaNguyen
Contributor Author

Hi Jeremy, I checked the peer struct in the mctp code: each peer has one MCTP physical address (dest_phys phys;) in the current struct.
Do you think we can add a PhyAddress1 interface to the current Endpoint object path, following the current peer struct design?

busctl introspect au.com.codeconstruct.MCTP1 /au/com/codeconstruct/mctp1/networks/1/endpoints/22
NAME                                TYPE      SIGNATURE RESULT/VALUE                           FLAGS
au.com.codeconstruct.MCTP.Physical1 interface -         -                                      -
.Address                            property  ay        1 78                                   const
.Interface                          property  s         "mctpi2c3"                             const
.Port                               property  y         0                                      const

@jk-ozlabs
Member

Having a single physical addressing data on an endpoint object does not allow for the full set of addressing models proposed in DSP0236. As you say above, one endpoint may have multiple physical interfaces. It may also use one EID

So: no, that proposal will not work here.

I'm concerned about why we're attempting to expose the physical transport mechanism here; that's one of the primary implementation details that MCTP is aimed at abstracting! :)

Is the motivation for this just to allow upper-layer applications (ie, pldmd) to select the best transport available?

@ThuBaNguyen
Contributor Author

Is the motivation for this just to allow upper-layer applications (ie, pldmd) to select the best transport available?

Yes. That is the common use case for a physical address D-Bus interface.

The additional use case (at least on Ampere systems) is matching the PLDM sensors of one PLDM terminus with the specific PCIe card that supports PLDM messages. We can match the MCTP physical address of the PLDM-MCTP endpoint with the PCIe address of that card from SMBIOS.

@jk-ozlabs
Member

Is the motivation for this just to allow upper-layer applications (ie, pldmd) to select the best transport available?

Yes. That is common use case of physical address D-Bus interface.

ok, I think we can do much better with route metrics instead. That way, the application does not need to perform its own routing for bandwidth optimisation.

The addition use case (At least in Ampere system) is matching the PLDM sensors of one PLDM terminus with one specific PCIe card which supports PLDM message. We can match MCTP physical address of PLDM-MCTP endpoint with PCIe address of that card from SMBios.

This sounds like an inventory management objective; wouldn't this have been handled at the time that the endpoint was first enumerated?

@ThuBaNguyen
Contributor Author

ThuBaNguyen commented Jan 8, 2025

This sounds like an inventory management objective; wouldn't this have been handed at the time that the endpoint was first enumerated?

On Ampere systems, the BMC is an endpoint on PCIe-VDM, and the EIDs of PCIe devices are set by the SoC. But those EIDs are not static. The PCIe addresses of PCIe devices are also enumerated at SoC boot time (the PCIe address of a card is not static either). So to match the PLDM sensors with a PCIe card we need to match the MCTP PCIe address with the SMBIOS address every time the SoC boots.

Not many OEMs/ODMs support PCIe-VDM yet, but I think at some point other OEMs/ODMs will face a similar issue to Ampere. Because the PCIe address of a PLDM-VDM endpoint is not static, locating and naming the PLDM sensors of a terminus will be troublesome.
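
A rough sketch of that matching step, under two assumptions that are not confirmed anywhere in this thread: that the physical address arrives as two bytes holding the 16-bit PCI bus/device/function value with the most significant byte first, and that the SMBIOS slot data has already been parsed into a table keyed by (bus, device, function). The names `decode_bdf`, `slot_for_endpoint` and `smbios_slots` are hypothetical.

```python
from typing import Dict, Optional, Tuple

Bdf = Tuple[int, int, int]  # (bus, device, function)


def decode_bdf(addr: bytes) -> Bdf:
    """Split a 16-bit PCI address into bus (8 bits), device (5), function (3)."""
    if len(addr) != 2:
        raise ValueError("expected a 2-byte PCIe physical address")
    value = int.from_bytes(addr, "big")  # byte order is an assumption
    return (value >> 8) & 0xFF, (value >> 3) & 0x1F, value & 0x07


def slot_for_endpoint(addr: bytes, smbios_slots: Dict[Bdf, str]) -> Optional[str]:
    """Look up the slot name for an endpoint's physical address, if any."""
    return smbios_slots.get(decode_bdf(addr))


# Hypothetical table built from SMBIOS data after the SoC has booted.
smbios_slots = {(0x3B, 1, 0): "Slot 3"}
print(slot_for_endpoint(bytes([0x3B, 0x08]), smbios_slots))
```

Since both the EIDs and the PCIe addresses change across boots, the lookup table would need to be rebuilt every time the SoC boots.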

@jk-ozlabs
Member

ok, what does the "SoC" refer to here? the BMC, or something else?

@ThuBaNguyen
Contributor Author

ok, what does the "SoC" refer to here? the BMC, or something else?

SoC is Ampere CPU.

@amboar
Contributor

amboar commented Jan 8, 2025

@ThuBaNguyen can you draw a diagram of your MCTP network(s), annotated with (TM)BOs? It would be helpful for me at least.

Also, the discussion feels like it echoes some of that at #46 (comment)

@ThuBaNguyen
Contributor Author

ThuBaNguyen commented Jan 9, 2025

@ThuBaNguyen can you draw a diagram of your MCTP network(s), annotated with (TM)BOs? It would be helpful for me at least.

Below is the diagram; the two medium interfaces (PCIe-VDM and SMBus) are two different networks.

[Image: mctp_network]

@amboar
Contributor

amboar commented Jan 9, 2025

If S0 is the BO for the D1P1 link and the BMC is a TMBO, then it must be the case that the BMC is only the TMBO for the D1P2 link, as MCTP networks must be hierarchical. Further, there cannot be multiple TMBOs in a given network, so it must be the case that we have two networks:

  1. N0 = { S0:P1, S0:P3, D3:P1, D4:P1, BMC:P1 }
  2. N1 = { BMC:P2, S0:P2 }

The BMC is the TMBO for N1, and I presume S0 is the TMBO for N0.

As there's no routing between networks (by definition in DSP0236 - we can set aside your statements about no cross-connection), the concept of using the physical transport medium of BMC:P1 and BMC:P2 to choose how to talk to S0 is an error of reasoning. They're incomparable as they're in separate networks. You must first choose the network you wish to use. In the context of your diagram, choosing the network uniquely determines the EID you must use with no regard for the physical transport, as there's only one relevant EID in each network.

How you choose the networks is another matter, but I think consideration of physical transports is a misdirection.

@ThuBaNguyen
Contributor Author

If S0 is the BO for the D1P1 link and the BMC is a TMBO, then it must be the case that the BMC is only the TMBO for the D1P2 link, as MCTP networks must be hierarchical. Further, there cannot be multiple TMBOs in a given network, so it must be the case that we have two networks:

  1. N0 = { S0:P1, S0:P3, D3:P1, D4:P1, BMC:P1 }
  2. N1 = { BMC:P2, S0:P2 }

The BMC is the TMBO for N1, and I presume S0 is the TMBO for N0.

As there's no routing between networks (by definition in DSP0236 - we can set aside your statements about no cross-connection), the concept of using the physical transport medium of BMC:P1 and BMC:P2 to choose how to talk to S0 is an error of reasoning.

"there's no routing between networks" yes, in the mctp network there is not cross-connection between PCIe and SMBus.
But in the upper layer such as pldmd it is different story. The S0 has two EIDs (EID 20 in N1 use SMBus and EID 30 in N0 use PCIe) identify by the same UUID. When the S0 boots up, N0 will be up first, BMC can send/receive the PLDM messages with S0 uses EID 20 but when N1 is up (PCIe) with higher bandwidth, BMC can switch to use EID 30 to communicate with S0

They're incomparable as they're in separate networks. You must first choose the network you wish to use. In the context of your diagram, choosing the network uniquely determines the EID you must use with no regard for the physical transport, as there's only one relevant EID in each network.

I agree with this.

How you choose the networks is another matter, but I think consideration of physical transports is a misdirection.

We still need to know how to choose the correct network (choose N1 between N0 and N1 in the Ampere case).

@amboar
Contributor

amboar commented Jan 9, 2025

"there's no routing between networks" yes, in the mctp network there is not cross-connection between PCIe and SMBus.

I'm a bit wary of this statement with its "in the mctp network"; there shouldn't be one network for your scenario, you need two. Have you configured the MCTP stack with two?

When the S0 boots up, N0 will be up first, BMC can send/receive the PLDM messages with S0 uses EID 20 but when N1 is up (PCIe) with higher bandwidth, BMC can switch to use EID 30 to communicate with S0

Just to clarify, N1 is the SMBus link in my description above, N0 is the PCIe link.

We still need know how to choose correct network (Chose N1 between N0 and N1 in Ampere case).

Rather than over-generalise, can you exploit the host power state for your specific circumstances?

@ThuBaNguyen
Contributor Author

"there's no routing between networks" yes, in the mctp network there is not cross-connection between PCIe and SMBus.

I'm a bit wary of this statement with its "in the mctp network"; there shouldn't be one network for your scenario, you need two. Have you configured the MCTP stack with two?

Yes, I configured the MCTP stack with two, similar to the tree below.

busctl tree au.com.codeconstruct.MCTP1
`- /au
  `- /au/com
    `- /au/com/codeconstruct
      `- /au/com/codeconstruct/mctp1
        |- /au/com/codeconstruct/mctp1/interfaces
        | |- /au/com/codeconstruct/mctp1/interfaces/lo
        | |- /au/com/codeconstruct/mctp1/interfaces/mctpi2c3
        | `- /au/com/codeconstruct/mctp1/interfaces/mctppcie0
        `- /au/com/codeconstruct/mctp1/networks
          |- /au/com/codeconstruct/mctp1/networks/1
          | `- /au/com/codeconstruct/mctp1/networks/1/endpoints
          |   `- /au/com/codeconstruct/mctp1/networks/1/endpoints/8
          `- /au/com/codeconstruct/mctp1/networks/2
            `- /au/com/codeconstruct/mctp1/networks/2/endpoints
              `- /au/com/codeconstruct/mctp1/networks/2/endpoints/31
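
For reference, a small sketch of how a client could recover the network ID and EID from endpoint paths like the ones in this tree. The path layout is taken from the output above; the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Path layout as shown in the busctl tree output above.
_ENDPOINT_PATH = re.compile(
    r"/au/com/codeconstruct/mctp1/networks/(\d+)/endpoints/(\d+)"
)


def parse_endpoint_path(path: str) -> Optional[Tuple[int, int]]:
    """Return (network id, EID) for an endpoint object path, else None."""
    match = _ENDPOINT_PATH.fullmatch(path)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


for p in (
    "/au/com/codeconstruct/mctp1/networks/1/endpoints/8",
    "/au/com/codeconstruct/mctp1/networks/2/endpoints/31",
    "/au/com/codeconstruct/mctp1/interfaces/mctpi2c3",
):
    print(p, parse_endpoint_path(p))
```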

When the S0 boots up, N0 will be up first, BMC can send/receive the PLDM messages with S0 uses EID 20 but when N1 is up (PCIe) with higher bandwidth, BMC can switch to use EID 30 to communicate with S0

Just to clarify, N1 is the SMBus link in my description above, N0 is the PCIe link.

Sorry,
"When the S0 boots up, N1 will be up first, BMC can send/receive the PLDM messages with S0 uses EID 20 but when N0 is up (PCIe) with higher bandwidth, BMC can switch to use EID 30 to communicate with S0"

We still need know how to choose correct network (Chose N1 between N0 and N1 in Ampere case).

Rather than over-generalise, can you exploit the host power state for your specific circumstances?

On Ampere systems, N1 (SMBus) is still available when N0 (PCIe) is up. So when the BMC reboots while the host is on, N1 and N0 can be up at the same time. So in my opinion, we can't use the host power state in that case.

@amboar
Contributor

amboar commented Jan 9, 2025

In Ampere system, N1 (SMBus) is still available when N0 (PCIe) is up. So when BMC reboots with the host is on. N1 and N0 can be up at the same time. So as my opinion, we can't use the host power state in that case.

If pldmd knows of the host power state and is aware of the network topology, the host power state can be used to select the network to use (regardless of the fact that both N0 and N1 are valid), right?

Edit: perhaps this is getting a bit off-topic for the mctp repo, might be best to discuss it in OpenBMC forums?

@ThuBaNguyen
Contributor Author

In Ampere system, N1 (SMBus) is still available when N0 (PCIe) is up. So when BMC reboots with the host is on. N1 and N0 can be up at the same time. So as my opinion, we can't use the host power state in that case.

If pldmd knows of the host power state and is aware of the network topology,

busctl introspect au.com.codeconstruct.MCTP1 /au/com/codeconstruct/mctp1/networks/1
NAME                                TYPE      SIGNATURE RESULT/VALUE FLAGS
au.com.codeconstruct.MCTP.Network1  interface -         -            -
.LocalEIDs                          property  ay        1 8          const
org.freedesktop.DBus.Introspectable interface -         -            -
.Introspect                         method    -         s            -
org.freedesktop.DBus.Peer           interface -         -            -
.GetMachineId                       method    -         s            -
.Ping                               method    -         -            -
org.freedesktop.DBus.Properties     interface -         -            -
.Get                                method    ss        v            -
.GetAll                             method    s         a{sv}        -
.Set                                method    ssv       -            -
.PropertiesChanged                  signal    sa{sv}as  -            -
busctl introspect au.com.codeconstruct.MCTP1 /au/com/codeconstruct/mctp1/networks/2
NAME                                TYPE      SIGNATURE RESULT/VALUE FLAGS
au.com.codeconstruct.MCTP.Network1  interface -         -            -
.LocalEIDs                          property  ay        1 9          const
org.freedesktop.DBus.Introspectable interface -         -            -
.Introspect                         method    -         s            -
org.freedesktop.DBus.Peer           interface -         -            -
.GetMachineId                       method    -         s            -
.Ping                               method    -         -            -
org.freedesktop.DBus.Properties     interface -         -            -
.Get                                method    ss        v            -
.GetAll                             method    s         a{sv}        -
.Set                                method    ssv       -            -
.PropertiesChanged                  signal    sa{sv}as  -            -

Above are the D-Bus properties of the MCTP networks. Do you have any idea how to choose which network to use?

the host power state can be used to select the network to use (regardless of the fact that both N0 and N1 are valid), right?

Edit: perhaps this is getting a bit off-topic for the mctp repo, might be best to discuss it in OpenBMC forums?

@santoshpuranik

We have a similar use case where MCTP application protocol services (PLDM/SPDM/...) need to select the fastest transport when we have multiple paths to the same device. I don't think it matters whether or not the devices are on the same network.

On our platforms, both paths to the device are active at all times (on the same power domain) and the BMC is the BO on both of them.

I see @jk-ozlabs's comment above about using route metrics, and while I agree that seems like the "right" way to approach this problem, how does one build these metrics in the first place? Wouldn't that entail running some dummy traffic for the first selection?

Would it be OK if every endpoint object in mctpd exposes the underlying binding type as described in DSP0239? The intent being to use that in lieu of port metrics as a "simpler" way to choose the right EID for a given device?

@jk-ozlabs
Member

I see @jk-ozlabs 's comment above about using route metrics above, and while I agree that seems like the "right" way to approach this problem, how does one build these metrics in the first place?

That would happen as part of the initial route setup: we would assign a suitable metric at that time, likely based on a heuristic from the transport type.

This would probably just be a fairly simple configuration on mctpd's side, just something like PCIe VDM transports get a metric of 1, SMBus transports get a metric of 2, ...

Wouldn't that entail running some dummy traffic for the first selection?

I'd rather this be based on a configured logic, rather than runtime measurement.
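
A minimal sketch of what such configured logic could look like. The metric values 1 and 2 are the ones suggested above; the transport names, the fallback value and the function name are assumptions for illustration, not existing mctpd configuration.

```python
# Lower metric means the route is preferred.
TRANSPORT_METRICS = {
    "pcie-vdm": 1,
    "smbus": 2,
}

# Hypothetical fallback for transports without a configured metric.
DEFAULT_METRIC = 100


def metric_for_transport(transport: str) -> int:
    """Return the configured route metric for a transport type."""
    return TRANSPORT_METRICS.get(transport.lower(), DEFAULT_METRIC)


for name in ("PCIe-VDM", "smbus", "usb"):
    print(name, metric_for_transport(name))
```

The point of a static table like this is that no probe traffic is needed: the metric is known as soon as the transport type of the link is known.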

@jk-ozlabs
Member

Would it be OK if every endpoint object in mctpd exposes the underlying binding type as described in DSP0239? The intent being to use that in-lieu of port metrics as a "simpler" way to choose the right EID for a given device?

The issue is that "underlying binding type" is not well defined. For example:

  • For an endpoint that is not a direct neighbour: do we use the binding type for connectivity to the neighbour (bridge) instead? or the "slowest" binding in the path? or something else? (or do we even expose those endpoints?)
  • For an endpoint that is reachable over multiple transports (with the same EID), there is no single binding type to expose

@santoshpuranik

This would probably just be a fairly simple configuration on mctpd's side, just something like PCIe VDM transports get a metric of 1, SMBus transports get a metric of 2, ...

Ah, okay. I misunderstood your statement about metrics, then. So what you are proposing is rather than expose the binding type and have various users interpret them and duplicate that logic, we'd just expose a simple rank that apps can choose the "lowest" number from in case the same device (identified via its MCTP UUID) is reachable via multiple EIDs, did I get that right?

@jk-ozlabs
Member

So what you are proposing is rather than expose the binding type and have various users interpret them and duplicate that logic, we'd just expose a simple rank that apps can choose the "lowest" number from in case the same device (identified via its MCTP UUID) is reachable via multiple EIDs, did I get that right?

Partially :)

You're correct on not having every application decide on the best message routing internally. As you say, we would have mctpd set the priorities on specific transport types, so we only need to define that logic in one place.

The metrics themselves would be attributes of a route, and are set on the kernel MCTP routing table. The kernel just chooses the route with the best metric for any given destination EID. So, if a PCIe VDM transport is available, it will be used in preference to SMBus. If that VDM transport goes down, SMBus becomes the route with the best metric, so we still have connectivity.

Then, applications just send to a single EID of the destination endpoint, and the kernel handles the rest, based on the information that mctpd has provided in the initial route setup.

This is based on a fairly standard setup of multiple IP links used in linux too; here my laptop ethernet link is used in preference to the wifi:

default via 192.168.72.1 dev enxf4a80d76fdcc proto dhcp src 192.168.72.171 metric 100 
default via 172.16.128.1 dev wlp1s0 proto dhcp src 172.16.160.229 metric 600 

This behaviour does depend on adopting a MCTP network topology where each endpoint uses a single EID across all links though; otherwise we're back to a situation where each application needs to be making its own routing decisions, and also be aware of which EIDs are routable at any specific time.
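
The selection and failover behaviour described above can be modelled in a few lines. This is a simplified model for illustration, not the kernel implementation; the `Route` class, its fields and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    eid: int         # destination EID
    interface: str   # outgoing interface name
    metric: int      # lower is preferred
    up: bool = True  # whether the underlying link is usable


def select_route(routes: List[Route], dest_eid: int) -> Optional[Route]:
    """Pick the usable route to dest_eid with the lowest metric."""
    candidates = [r for r in routes if r.eid == dest_eid and r.up]
    return min(candidates, key=lambda r: r.metric, default=None)


# One endpoint (EID 20) reachable over two links.
routes = [
    Route(eid=20, interface="mctppcie0", metric=1),
    Route(eid=20, interface="mctpi2c3", metric=2),
]
print(select_route(routes, 20).interface)  # PCIe VDM preferred while it is up

routes[0].up = False                       # the VDM link goes down
print(select_route(routes, 20).interface)  # falls back to SMBus
```

Note that the model only works because both routes share the destination EID, which is the single-EID condition mentioned above.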

@santoshpuranik

The metrics themselves would be attributes of a route, and are set on the kernel MCTP routing table. The kernel just chooses the route with the best metric for any given destination EID. So, if a PCIe VDM transport is available, it will be used in preference to SMBus. If that VDM transport goes down, SMBus becomes the route with the best metric, so we still have connectivity.

Then, applications just send to a single EID of the destination endpoint, and the kernel handles the rest, based on the information that mctpd has provided in the initial route setup.

This is based on a fairly standard setup of multiple IP links used in linux too; here my laptop ethernet link is used in preference to the wifi:

default via 192.168.72.1 dev enxf4a80d76fdcc proto dhcp src 192.168.72.171 metric 100 
default via 172.16.128.1 dev wlp1s0 proto dhcp src 172.16.160.229 metric 600 

This behaviour does depend on adopting a MCTP network topology where each endpoint uses a single EID across all links though; otherwise we're back to a situation where each application needs to be making its own routing decisions, and also be aware of which EIDs are routable at any specific time.

Ah! this all does make sense (esp. the parallels to IP networks). However, I am not sure we can impose the "single EID for a device" restriction. Would it hurt to have both mechanisms and leave it to the applications to choose in the case where EIDs are different?

@santoshpuranik

There can also be a case where the device supports only certain MCTP message types over certain transports (Ex. PLDM over PCIe and SPDM over I2C). I think any automatic route selection in the kernel would break such a case?
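
To make the concern concrete, here is a small sketch in which each path advertises the MCTP message types it supports and the choice is filtered by message type before comparing metrics. The tuple layout and function name are hypothetical; the message type codes are the DSP0239 values for PLDM (0x01) and SPDM (0x05).

```python
from typing import FrozenSet, List, Optional, Tuple

MSG_TYPE_PLDM = 0x01
MSG_TYPE_SPDM = 0x05

# (path name, metric, supported message types); layout is illustrative only.
Path = Tuple[str, int, FrozenSet[int]]


def pick_path(paths: List[Path], msg_type: int) -> Optional[Path]:
    """Pick the lowest-metric path that supports the given message type."""
    usable = [p for p in paths if msg_type in p[2]]
    return min(usable, key=lambda p: p[1], default=None)


paths = [
    ("pcie", 1, frozenset({MSG_TYPE_PLDM})),
    ("i2c", 2, frozenset({MSG_TYPE_SPDM})),
]
print(pick_path(paths, MSG_TYPE_PLDM)[0])
print(pick_path(paths, MSG_TYPE_SPDM)[0])
```

A purely metric-based choice would always pick the PCIe path here, so SPDM traffic would be sent over a transport that does not carry it.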

@ThuBaNguyen
Contributor Author

This behaviour does depend on adopting a MCTP network topology where each endpoint uses a single EID across all links though;

"a MCTP network topology" includes the BMC and multiple MCTP devices (CPU, PCIe cards) from different OEMs/ODMs, so a "single EID for a device" restriction also requires adoption in the MCTP stack of those devices' software.

otherwise we're back to a situation where each application needs to be making its own routing decisions, and also be aware of which EIDs are routable at any specific time.

This is my situation: on SMBus the BMC is the BO and it will set EID 20 for S0. But on PCIe VDM, the S0 CPU firmware is the BO and it will use a different EID. I can't force the CPU firmware to reuse EID 20.
Moreover, we use different networks for the SMBus and PCIe medium interfaces. As I understand it, the route metrics above can only be applied within a single network.

@jk-ozlabs
Member

As my understanding using route metrics above can only be applied for single network.

Yes, and where an endpoint only uses one EID.

Would it hurt to have both mechanisms and leave it to the applications to choose in the case where EIDs are different?

I think that would still be okay. It is reasonable to expose the physical medium details for cases where the routing is managed externally (ie, by the application selecting an EID).

(while I still think that's a suboptimal design, we can still accommodate it)

However, we don't yet have a scheme for that defined which satisfies the issues I have outlined above. I suspect we will need a set of physical address details per endpoint, where that set may contain zero or more items. We'll also need to define the policy for physical addressing details on non-local endpoints.

@mkj mkj changed the title Puplic MCTP Medium Physical Address to MCTP Endpoint D-Bus Object path Public MCTP Medium Physical Address to MCTP Endpoint D-Bus Object path Jan 17, 2025

4 participants