Hardware interface lifecycle behavior does not follow ros2 design document #2079
Hello @Martin-Oehler! Regarding the issue: at startup of ros2_control, this is indeed the desired behaviour. Once the startup is properly done, if any of the methods returns ERROR, the behaviour is different. When I use this code in production, if the robot is not able to start properly and set up all the controllers, I want it to crash and fail instead of continuing. For your case, you can control the initial state by setting the corresponding controller manager parameter. I hope this answers your questions :)
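For reference, here is a sketch of what such a controller manager configuration might look like. The parameter name and component name below are my assumptions from reading the ros2_control docs, so verify them against the current documentation:

```yaml
# Sketch only: parameter and component names are assumptions,
# check the current ros2_control documentation before using.
controller_manager:
  ros__parameters:
    update_rate: 100  # Hz
    hardware_components_initial_state:
      unconfigured:
        - my_robot_hardware  # hypothetical component: stays unconfigured at startup
```

With such a setting, the component is not configured or activated during startup, so a startup failure cannot crash the controller manager at launch; you transition the component later yourself.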
Also, one thing is pretty certain: you should never expect to have lifecycle node services for anything in ros2_control. That would massively violate the framework design. We use the API and try to align with the state machine as closely as possible so there is something familiar for people.
I agree in terms of services and CLI, but should the CM really throw an exception if something is wrong in on_configure() or on_activate()?
Ah, that it shouldn't. We just haven't gotten to make that bulletproof enough.
I believe this was intentional only at startup; it was introduced in #1729 (see ros2_control/controller_manager/src/controller_manager.cpp, lines 552 to 564 at 3aa38b0).
If the user's intent is to set an initial state on the hardware component and it cannot be set, it is better to fail initially than later.
Post startup, any change in component state between the lifecycle states is bulletproofed already.
Thank you very much for your answers. I cannot follow the reasoning why it should be better to crash the whole process instead of going back into the ErrorProcessing state. Consider the following two use-cases:

Neither of these cases can be covered by the current implementation, because the whole process just crashes and takes all interfaces down with it. From my understanding, the whole point of the lifecycle system is to increase robustness and prevent exactly this case.
This is something worth discussing. @saikishor's point of view is that if someone sets the initial state (explicitly or implicitly) of the components in the configuration files and the launch fails, then this is unintended behaviour and should blow up.
Yes, as @christophfroehlich explained: if the user's intention is to bring certain hardware components to a specific initial state and that state is not respected, I think it's better to crash, so that you can encode other recovery behaviours; all your startup controller spawners will no longer work anyway, because they cannot get their interfaces while the hardware is not in the expected state. For instance, for EtherCAT-based systems you sometimes have to restart the EtherCAT master if something is not working. If my application crashes, I can decide based on the exit code whether to restart my EtherCAT master, and then restart my ros2_control_node app. If the process just stays active, there is currently no way to stop the spawners from spawning.

Moreover, if your hardware needs time to boot up and you need to wait until it is ready, why not do this in your hardware component? You can take as long as you want, and in theory it shouldn't affect the RT loop, as this happens in the non-RT loop. If that's not what you want, then better set the hardware initial state parameter to put your specific hardware in the unconfigured state and activate it with the hardware spawner.
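The exit-code idea can be sketched in plain Python. Everything here (the exit code value and both callables) is a hypothetical placeholder made up for illustration, not a real ros2_control API; in practice the callables would wrap launching ros2_control_node and a site-specific EtherCAT reset script:

```python
# Sketch of the crash-and-supervise pattern described above.
# EXIT_NEEDS_MASTER_RESET and both callables are hypothetical placeholders.

EXIT_NEEDS_MASTER_RESET = 2  # assumed site-specific exit code


def supervise(start_control_node, reset_master, max_attempts=3):
    """Restart the control process, resetting the EtherCAT master first
    when the previous exit code asks for it; returns True on a clean exit."""
    for _ in range(max_attempts):
        code = start_control_node()  # would launch ros2_control_node and wait
        if code == 0:
            return True  # clean shutdown, nothing to do
        if code == EXIT_NEEDS_MASTER_RESET:
            reset_master()  # placeholder for the site-specific reset script
    return False  # gave up after max_attempts
```

The design point is that a crash turns "hardware not in expected state" into a machine-readable signal (the exit code) that an outer supervisor can act on, which an endlessly-running process cannot provide.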
I do not think this would be a solution for these two examples. Even if you delay the startup process by initiating the state transition at a later point with a service call, the setup might still fail and crash unless you know with absolute certainty that the hardware is already available, leading to duplicate logic.
Sorry, I do not have a deep knowledge about the controller startup process, but shouldn't controllers just wait until the respective hardware interfaces become active? I assume there is already some logic implemented for that as the configuration time of a hardware interface is unknown.
In some situations the correct error-handling behavior might be crashing, but in others it may not be. My point is that the current implementation decides for you that the error will be handled by crashing.
If you do it at a later point via a service call, it won't crash anymore. The crash only happens when the initial state is not respected; from then on, via service calls, the Lifecycle design is respected.
Not really; they do not wait for the hardware interface to become available. If a controller requests an interface that doesn't exist yet, it doesn't activate and aborts. There can also be the case where the controller requests an invalid interface, and there is no way we can keep it waiting, expecting someone to expose that particular interface. Do you get my point?
As long as you don't set the corresponding parameter.
I see, thanks. I will try to implement my desired behavior with your suggestions and update this ticket with my results.
So I found a pretty simple solution to my problem. As suggested, I do not initialize the hardware interface on startup; I set the corresponding controller manager parameter instead. Next, I start spawners for the hardware interface as well as for all mandatory controllers with respawn enabled.
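A launch-file sketch of this setup might look as follows. The executable names, arguments, and the component/controller names are assumptions; check the controller_manager spawner and hardware_spawner documentation for the exact usage:

```xml
<!-- Sketch: names and arguments are assumptions, verify against the docs. -->
<launch>
  <!-- Periodically (re)try to activate the hardware component. -->
  <node pkg="controller_manager" exec="hardware_spawner"
        args="--activate my_robot_hardware"
        respawn="true" respawn_delay="5.0"/>
  <!-- Periodically (re)try to spawn/activate the mandatory controllers. -->
  <node pkg="controller_manager" exec="spawner"
        args="joint_state_broadcaster"
        respawn="true" respawn_delay="5.0"/>
</launch>
```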
This ensures that hardware and controllers are always transitioned to the active state when possible. However, I still find it very unintuitive that the lifecycle transitions differ depending on whether the state is progressed by the controller manager or via the services interface. But as the latter works as expected, this solution works fine for me now.
Independent of the crash-or-no-crash question, your solution here should be added to our docs. Do you have suggestions where to put that?
I think this page seems the most related: https://control.ros.org/rolling/doc/ros2_control/hardware_interface/doc/hardware_components_userdoc.html It also does not contain information regarding the different handling of errors (transition by controller manager vs. service interface). I suggest reworking and extending the last paragraph, "Handling of errors that happen during read() and write() calls", into a more general "Error handling" section. Regarding my solution, I noticed a minor nit-pick: respawning the hardware spawner repeatedly to transition the hardware interface to active still works when the interface is already active. However, the controller spawner complains in these cases because the controller is already active, and it exits with an error. This error spam can make it hard to spot actual errors.
If you can open a PR, we will be happy to review and merge it
I'm not sure I understand your case. Do you mind explaining it again? What do you mean when the interface is already active? I don't understand how the controller can activate if the HW is not yet available.
Unfortunately, I will change employment soon and will lose access to my ROS 2 system. I will see what I can do.
For my solution, I want to ensure that A) all hardware interfaces are always active and B) the necessary controllers (e.g. the joint state broadcaster) are always active. B) is necessary because controllers are disabled when the hardware interface registers an error. This ensures robust behavior in case of any errors that can be recovered from. I achieve this by starting two spawners with "respawn" and "respawn_delay" set. These periodically attempt to switch the hardware interface as well as the controllers to the active state. Naturally, because this happens periodically and unconditionally, the hardware interface and controllers will already be in the active state unless an error occurred in the meantime. When they are already active, the hardware spawner prints a success message on the console because the interface is already in the desired active state. The controller spawner, on the other hand, prints an error because it was not able to switch the controller from "active" to "active". As this happens periodically, the console gets cluttered with (expected) error messages. I hope this explanation makes it clear.
I don't understand why you have to periodically keep commanding the hardware interface and controllers to be active just to recover from an error state and go back to the active state.
This behavior might have been fixed with #2088. Can you try it out and see if it works for you :)
Describe the bug
The hardware interface class SystemInterface implements the LifecycleNodeInterface, but its behavior does not follow the ROS 2 design document for managed nodes. As described in the document, raising an error in the primary states should lead to a transition to the ErrorProcessing state. Instead, the controller manager crashes with an exception. This prevents any error handling in these situations and, in my opinion, somewhat defeats the purpose of the lifecycle logic. It even takes down any other hardware interfaces running in the same manager, which is fatal.

Furthermore, custom logic is implemented for state transitions via two dedicated services instead of the interfaces usually offered by lifecycle nodes. This prevents re-using lifecycle management logic and code. There is also no latched/transient-local topic for transitions, which makes external management of the lifecycle challenging.
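For illustration, the error path the design document describes can be sketched as a tiny state machine. This is a simplification written for this issue, not ros2_control or rclcpp_lifecycle code: an ERROR result from a transition callback routes the node into ErrorProcessing, whose outcome decides between Unconfigured and Finalized, instead of terminating the process:

```python
# Simplified sketch of the managed-node error path from the ROS 2 design
# doc: a failed transition callback leads to ErrorProcessing; if on_error()
# succeeds the node returns to Unconfigured, otherwise it is Finalized.
# Illustration only, not ros2_control code.

SUCCESS, FAILURE, ERROR = "success", "failure", "error"


class ManagedNodeSketch:
    def __init__(self, on_configure, on_error=lambda: SUCCESS):
        self.state = "unconfigured"
        self._on_configure = on_configure
        self._on_error = on_error

    def configure(self):
        result = self._on_configure()
        if result == SUCCESS:
            self.state = "inactive"
        elif result == FAILURE:
            self.state = "unconfigured"  # transition simply did not happen
        else:
            # ERROR: run error processing instead of crashing the process
            self.state = ("unconfigured" if self._on_error() == SUCCESS
                          else "finalized")
        return self.state
```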
To Reproduce
Return CallbackReturn::ERROR in either on_configure() or on_activate() (others not tested). The controller manager node will crash with an exception.
Expected behavior
The hardware interface should obey the design document and transition to the ErrorProcessing state.

Environment (please complete the following information):