Mctpd does not track network interface changes correctly #55
Comments
In our solution we decided to use the small fix described in the first option:
Thanks for the report; for some reason this had not been visible to me until now. I'll take a look and review the proposed fix too.
Excellent question; this was a while ago, and we should probably see if we can unify this. Your approach of sharing the linkmap seems sound (but I think we could restructure this a little to create a "secondary socket" that shares the original nl state). However, I would like to revisit whether we still need the separate sockets in the first place.
OK, we need the separate netlink sockets to split the monitor and query operations. For example:
So, having the separate [...] However, those two sockets could definitely share the linkmap data. I'll work on a structure to allow that.
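The idea of two sockets sharing one linkmap can be sketched as follows. This is only an illustration under stated assumptions, not the structure that was eventually implemented in mctp: every name here (`demo_linkmap`, `demo_nl`, `demo_nl_new_secondary`, and so on) is hypothetical, the `sd` field is a plain integer placeholder and no socket is opened, and the table is reduced to a list of interface indexes. A reference count lets the primary and secondary handles be freed in either order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_MAX_LINKS 8

/* Hypothetical shared interface table; reference-counted so that
 * several handles can point at the same data. */
struct demo_linkmap {
    int refs;
    int n_links;
    int ifindex[DEMO_MAX_LINKS];
};

/* Hypothetical netlink handle: its own socket, but a shared table. */
struct demo_nl {
    int sd; /* placeholder for a socket descriptor; never opened here */
    struct demo_linkmap *links;
};

/* Create a primary handle that owns a fresh, empty table. */
struct demo_nl *demo_nl_new(int sd)
{
    struct demo_nl *nl = calloc(1, sizeof(*nl));
    if (!nl)
        return NULL;
    nl->links = calloc(1, sizeof(*nl->links));
    if (!nl->links) {
        free(nl);
        return NULL;
    }
    nl->links->refs = 1;
    nl->sd = sd;
    return nl;
}

/* Create a secondary handle that reuses the primary's table. */
struct demo_nl *demo_nl_new_secondary(struct demo_nl *primary, int sd)
{
    struct demo_nl *nl;

    assert(primary && primary->links);
    nl = calloc(1, sizeof(*nl));
    if (!nl)
        return NULL;
    nl->sd = sd;
    nl->links = primary->links;
    nl->links->refs++;
    return nl;
}

/* Drop a handle; the table is freed together with its last user. */
void demo_nl_free(struct demo_nl *nl)
{
    if (!nl)
        return;
    if (--nl->links->refs == 0)
        free(nl->links);
    free(nl);
}

/* Record an interface index; visible through every sharing handle. */
bool demo_link_add(struct demo_nl *nl, int ifindex)
{
    struct demo_linkmap *m = nl->links;

    if (m->n_links >= DEMO_MAX_LINKS)
        return false;
    m->ifindex[m->n_links++] = ifindex;
    return true;
}

/* Check whether an interface index is present in the shared table. */
bool demo_link_known(const struct demo_nl *nl, int ifindex)
{
    for (int i = 0; i < nl->links->n_links; i++)
        if (nl->links->ifindex[i] == ifindex)
            return true;
    return false;
}
```

With this layout, an interface added through the monitoring handle is immediately visible through the query handle, because both point at the same `demo_linkmap`.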
Maybe then there is no point in keeping the second socket (sd_monitor) in the nl_query context?
The mctpd daemon was running. While the OS was up, mctpi2c* interfaces were repeatedly created and deleted.
After some time, a device was added via the SetupEndpoint D-Bus interface. The device was added, but no communication with it was possible via the mctpd daemon. Only restarting the daemon and re-adding the device helped.
I enabled debug output (by adding "-v" to mctpd) and received the following entries (the log was cleaned up a bit):
Then I analyzed and debugged mctpd and found the root of the problem. Receiving information from the kernel and working with it (managing addresses, routes and neighbors) is done via Netlink sockets. For this purpose, the ctx context defines two context structures [1], "mctp_nl *nl" and "mctp_nl *nl_query". Each contains its own list of kernel interfaces [2], which is populated when the nl and nl_query contexts are created.
When mctpd is running and receives information from the kernel about network interface changes, only the interface list in the nl context is updated (in the mctp_nl_handle_monitor function). The interface list in the nl_query context does not change, so it becomes stale. Since the nl_query context is used to execute the mctp_nl_route_del/mctp_nl_route_add, peer_neigh_update and peer_route_update queries, these queries operate on a stale interface table.
As an experiment, I replaced nl_query with nl in the above queries, and mctpd began to work correctly. I do not know why the kernel Netlink interface is accessed through 2 contexts. In addition, each context contains 2 sockets (sd, sd_monitor), although sd_monitor is not used for the nl_query context, so in practice the work is carried out via 3 netlink sockets. Therefore, my approach may not be the correct one.
I would like to emphasize that this is only about exchanging netlink messages with the kernel and has nothing to do with exchanges with other MCTP devices.
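The failure mode can be reproduced in a toy model. The sketch below is not mctpd code: the types and functions (`toy_linkmap`, `toy_nl`, `toy_handle_relink`, and so on) are hypothetical stand-ins, and the interface indexes 5 and 9 are made-up values. It only illustrates how two contexts that each own a private interface table diverge once change events are applied to just one of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_LINKS 8

/* Hypothetical, simplified entry of an interface table. */
struct toy_link {
    int ifindex;
    char name[16];
};

/* Hypothetical stand-in for the per-context interface list. */
struct toy_linkmap {
    struct toy_link entries[TOY_MAX_LINKS];
    size_t count;
};

/* Hypothetical stand-in for a netlink context with a private table. */
struct toy_nl {
    struct toy_linkmap links;
};

bool toy_link_add(struct toy_linkmap *m, int ifindex, const char *name)
{
    struct toy_link *e;

    if (m->count >= TOY_MAX_LINKS)
        return false;
    e = &m->entries[m->count++];
    e->ifindex = ifindex;
    snprintf(e->name, sizeof(e->name), "%s", name);
    return true;
}

void toy_link_del(struct toy_linkmap *m, int ifindex)
{
    for (size_t i = 0; i < m->count; i++) {
        if (m->entries[i].ifindex == ifindex) {
            /* Fill the hole with the last entry. */
            m->entries[i] = m->entries[m->count - 1];
            m->count--;
            return;
        }
    }
}

/* Look up an interface index by name; returns 0 if the name is unknown. */
int toy_ifindex_byname(const struct toy_linkmap *m, const char *name)
{
    for (size_t i = 0; i < m->count; i++)
        if (strcmp(m->entries[i].name, name) == 0)
            return m->entries[i].ifindex;
    return 0;
}

/* Model of a monitor event: an interface is deleted and re-created under
 * the same name with a new index. Only the context passed in is updated,
 * which mirrors the behaviour described above. */
void toy_handle_relink(struct toy_nl *nl, int old_ifindex, int new_ifindex,
                       const char *name)
{
    toy_link_del(&nl->links, old_ifindex);
    toy_link_add(&nl->links, new_ifindex, name);
}
```

If both contexts start with the same entry and `toy_handle_relink` is then called only on the monitoring context, a name lookup through the second context still returns the old index, so any route or neighbour request built from it would target an interface that no longer exists.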
I also considered other ways to solve this situation. Since there is no need to keep 2 tables of network interfaces in each context, there were other options:
[1] https://github.com/CodeConstruct/mctp/blob/main/src/mctpd.c#L190
[2] https://github.com/CodeConstruct/mctp/blob/main/src/mctp-netlink.c#L44