Contributor

@tahliar tahliar commented Oct 9, 2025

PR creator: Description

This article describes various ways to change an existing SBD configuration. It is almost entirely brand new content for the new crm sbd command.

PDF:
HA-sbd-changing-configuration_en.pdf (Updated 14 Oct)

PR creator: Are there any relevant issues/feature requests?

  • jsc#DOCTEAM-1962

PR reviewer: Checklist for editorial review

Apart from the usual checks, please also double-check the following:

Comment on lines +19 to +37
<para>
If you need to replace an &sbd; device, you can use <command>crm sbd device add</command>
to add the new device and <command>crm sbd device remove</command> to remove the old device.
If the cluster has two &sbd; devices, you can run these commands in any order. However, if
the cluster has one or three &sbd; devices, you must run these commands in a specific order:
</para>
<itemizedlist>
<listitem>
<para>
One device: <command>crm sbd device remove</command> cannot remove the only device,
so you must add the new device before you can remove the old device.
</para>
</listitem>
<listitem>
<para>
Three devices: <command>crm sbd device add</command> cannot add a fourth device,
so you must remove the old device before you can add the new device.
</para>
</listitem>
Contributor Author

Currently the procedure is a bit bloated because of having to swap the order of the steps depending on the number of devices. Did I overthink this, and there's actually a simpler way to do it?

If not, would it be possible to simplify it in a future release so you don't have to consider the order? 🙏
Maybe something like crm sbd device replace OLD-DEVICE NEW-DEVICE?

@liangxin1300 liangxin1300 Oct 13, 2025

Yes. If users want to replace a device, they must consider the current number of devices, and the cluster must be restarted twice:

  • If there is only one device, you must add first, then remove and restart
  • If there are three devices, you must remove first, then add and restart

@zzhou1 @gao-yan, what do you think about adding a new option, crm sbd device replace?
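
For illustration only, the ordering rule discussed in this thread can be sketched as a small Python helper. The function name `replacement_order` and the placeholder device names are hypothetical, and passing the device as a positional argument to the crm commands is an assumption. This is not how crmsh implements anything; it only encodes the rule that one device means add first, three devices means remove first, and two devices allows either order.

```python
from typing import List


def replacement_order(device_count: int, old_device: str, new_device: str) -> List[str]:
    """Return the crm commands for replacing one SBD device, in a valid order."""
    # Assumption: the device is given as a positional argument.
    add = f"crm sbd device add {new_device}"
    remove = f"crm sbd device remove {old_device}"
    if device_count == 1:
        # The only device cannot be removed, so add the new one first.
        return [add, remove]
    if device_count == 3:
        # A fourth device cannot be added, so remove the old one first.
        return [remove, add]
    if device_count == 2:
        # Either order works; add-first is an arbitrary choice in this sketch.
        return [add, remove]
    raise ValueError("expected between 1 and 3 SBD devices")


if __name__ == "__main__":
    # OLD-DEVICE and NEW-DEVICE are placeholders, not real paths.
    for command in replacement_order(3, "OLD-DEVICE", "NEW-DEVICE"):
        print(command)
```

A single replace-style command, as proposed here, would move this decision out of the procedure and into crmsh itself.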

Contributor Author

> Yes. If users want to replace a device, they must consider the current number of devices, and the cluster must be restarted twice:

Ahhh, thank you for pointing this out. It seemed to work okay for me just doing crm maintenance on/crm cluster restart --all/crm maintenance off once, after doing both add and remove. But I only have a basic cluster with one IP address resource, so I can't say what effect it would have on a proper working cluster.

To restart the cluster twice, can I put the cluster in maintenance mode first, then do the add/restart/remove/restart, then take the cluster out of maintenance only at the end?
Or does each change only apply after you remove maintenance mode (so you have to do crm maintenance on/crm cluster restart --all/crm maintenance off twice)?

E.g. here is what happened when I tried doing maintenance mode first with both step orders:

Add then remove:

  1. Put cluster in maintenance mode
  2. Added a new device
  3. The cluster restarted automatically
  4. Removed old device
  5. Had to restart the cluster manually
  6. Exited maintenance mode

Remove then add:

  1. Put cluster in maintenance mode
  2. Removed old device
  3. Had to restart the cluster manually
  4. Added a new device
  5. The cluster restarted automatically
  6. Exited maintenance mode

This seemed to work fine in both cases, but again I only have a minimal test cluster.
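
As a rough sketch of the two sequences above, the following Python helper builds the command list for a single maintenance window. The helper name `maintenance_plan` and the placeholder device names are hypothetical, and the positional device argument is an assumption. The rule that only a removal is followed by a manual restart reflects what was observed on this minimal test cluster, not documented crmsh behavior.

```python
from typing import List, Tuple

RESTART = "crm cluster restart --all"


def maintenance_plan(operations: List[Tuple[str, str]]) -> List[str]:
    """Wrap SBD device operations in one maintenance-mode window.

    operations is a list of (action, device) pairs; action is "add" or "remove".
    """
    plan = ["crm maintenance on"]
    for action, device in operations:
        if action not in ("add", "remove"):
            raise ValueError(f"unknown action: {action}")
        plan.append(f"crm sbd device {action} {device}")
        if action == "remove":
            # Observed in the test above: removing needed a manual restart,
            # whereas adding restarted the cluster automatically.
            plan.append(RESTART)
    plan.append("crm maintenance off")
    return plan


if __name__ == "__main__":
    # NEW-DEVICE and OLD-DEVICE are placeholders, not real paths.
    for step in maintenance_plan([("add", "NEW-DEVICE"), ("remove", "OLD-DEVICE")]):
        print(step)
```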

@zzhou1 zzhou1 Oct 13, 2025

I would propose a syntax like crm sbd device add xxx remove yyy, where the order is interchangeable with crm sbd device remove yyy add xxx. Let crmsh ensure the correct order internally.

I would also prefer to avoid adding another subcommand, as I feel the crmsh options are already quite crowded. However, if the consensus is to implement a 'replace' command, I suggest a self-explanatory syntax like: crm sbd device replace xxx by yyy. Bash completion can display "by" to guide the sysadmin.

Copy link

> can I put the cluster in maintenance mode first, then do the add/restart/remove/restart, then take the cluster out of maintenance only at the end?

This is a good idea indeed! With maintenance mode on, restarting the cluster twice will be very quick, and sbd will pick up the latest configuration just as quickly.

Contributor Author

I've updated this procedure, and the PDF in the description. Let me know if it still needs any adjustments :)

@liangxin1300

Hi @tahliar
Since ClusterLabs/crmsh#1880, we have the command crm help TimeoutFormulas, which shows all the related formulas.
I think it would be good to mention this interface somewhere.

@tahliar tahliar requested a review from lvicoun October 14, 2025 23:45
Contributor

@lvicoun lvicoun left a comment

Hi Tahlia,
LGTM. Thanks!

Contributor

@dariavladykina dariavladykina left a comment

LGTM, just minor nits, really. Thanks!

&sbd; relies on multiple different timeout settings to manage node fencing. When you
configure &sbd; using the &crmshell;, these timeouts are automatically calculated and
adjusted. The automatic values are sufficient for most use cases, but if you need to
change them you can use the <command>crm sbd configure</command> command.
Contributor

Suggested change
- change them you can use the <command>crm sbd configure</command> command.
+ change them, you can use the <command>crm sbd configure</command> command.

<para>
When you change a timeout with <command>crm sbd configure</command>, the global
&stonith; timeouts are also adjusted automatically. The automatic values are
sufficient for most use cases, but if you need to change them you can use the
Contributor

Suggested change
- sufficient for most use cases, but if you need to change them you can use the
+ sufficient for most use cases, but if you need to change them, you can use the

Comment on lines +63 to +64
If you change one timeout, the other timeout is automatically adjusted so the
<literal>msgwait-timeout</literal> is double the <literal>watchdog-timeout</literal>. You
Contributor

Suggested change
- If you change one timeout, the other timeout is automatically adjusted so the
- <literal>msgwait-timeout</literal> is double the <literal>watchdog-timeout</literal>. You
+ If you change one timeout, the other timeout is automatically adjusted so that the
+ <literal>msgwait-timeout</literal> is twice the <literal>watchdog-timeout</literal>. You
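
The relationship described in this paragraph can be illustrated with a short Python sketch. The function name `paired_timeouts` is hypothetical, and rejecting odd msgwait values is a simplification of this sketch. How crmsh actually rounds or validates such values is not stated in this thread.

```python
from typing import Optional, Tuple


def paired_timeouts(
    watchdog: Optional[int] = None, msgwait: Optional[int] = None
) -> Tuple[int, int]:
    """Return (watchdog, msgwait) in seconds, with msgwait twice the watchdog."""
    if (watchdog is None) == (msgwait is None):
        raise ValueError("set exactly one of watchdog or msgwait")
    if watchdog is not None:
        # Changing the watchdog timeout doubles it to get msgwait.
        return watchdog, 2 * watchdog
    if msgwait % 2 != 0:
        # Assumption of this sketch only: refuse values that cannot be halved exactly.
        raise ValueError("msgwait must be even in this sketch")
    # Changing msgwait halves it to get the watchdog timeout.
    return msgwait // 2, msgwait


if __name__ == "__main__":
    print(paired_timeouts(watchdog=15))
    print(paired_timeouts(msgwait=60))
```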

</step>
<step>
<para>
Check the status of &sbd; to make sure the device was removed from all of the nodes:
Contributor

Suggested change
- Check the status of &sbd; to make sure the device was removed from all of the nodes:
+ Check the status of &sbd; to make sure the device was removed from all the nodes:
