Skip to content

Commit 757cce1

Browse files
committed
Fix: fencer: Refresh CIB devices on change to nodes section
This fixes a regression introduced by bf7ffcd. As of that commit, the fake local node is created after all resources have been unpacked. So it doesn't get added to resources' allowed_nodes tables. This prevents registration of fencing devices when the fencer receives a CIB diff that removes the local node. For example, the user may have replaced the CIB with a boilerplate configuration that has an empty nodes section. Registering a fencing device requires that the local node be in the resource's allowed nodes table. One option would be to add the fake local node to all resources' allowed nodes tables immediately after creating it. However, it shouldn't necessarily be an allowed node for all resources. For example, if symmetric-cluster=false, then a node should not be placed in a resource's allowed nodes table by default. It's difficult to ensure correctness of the allowed nodes tables when a fake local node is involved. It may involve repeated code or a fragile and confusing dependence on the order of unpacking. Since the fake local node is a hack in the first place, it seems better to avoid using it where possible. Currently the only code that even sets the local_node_name member of scheduler->priv is in: * the fencer * crm_mon when showing the "times" section This commit works as follows. If the fencer receives a CIB diff notification that contains the nodes section, it triggers a full refresh of fencing device registrations. In our example above, where a user has replaced the CIB with a configuration that has an empty nodes section, this means all fencing device registrations will be removed. However, the controller also has a CIB diff notification callback: do_cib_updated(). The controller's callback repopulates the nodes section with up-to-date information from the cluster layer (or its node cache) if it finds that an untrusted client like cibadmin has sent a modified the nodes section. Then it updates the CIB accordingly. The fencer will receive this updated CIB and refresh fencing device registrations again, re-registering the fencing devices that were just removed. Note that in the common case, we're not doing all this wasted work. The "remove and then re-register" sequence should happen only when a user has modified the CIB in a sloppy way (for example, by deleting nodes from the CIB's nodes section that have not been removed from the cluster). In short, it seems better to rely on the controller's maintenance of the CIB's node list, than to rely on a "fake local node" hack in the scheduler. See the following pull requests from Hideo Yamauchi and their discussions: #3849 #3852 Thanks to Hideo for the report and finding the cause. Signed-off-by: Reid Wahl <[email protected]>
1 parent 80161d5 commit 757cce1

File tree

4 files changed

+43
-38
lines changed

4 files changed

+43
-38
lines changed

daemons/fenced/fenced_cib.c

Lines changed: 17 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -294,7 +294,23 @@ update_cib_stonith_devices(const xmlNode *patchset)
294294
}
295295
}
296296

297-
if (strstr(xpath, "/" PCMK_XE_RESOURCES)
297+
/* @FIXME Suppose a node gets deleted in one CIB diff and then re-added
298+
* in another. For example, this can happen if a client deletes a node
299+
* from the CIB when it's still part of the cluster, and then the
300+
* controller's diff callback repopulates the nodes section. If
301+
* responding to the deletion here causes a stonith device to get
302+
* unregistered, then the next monitor will fail. This is because when
303+
* the device gets re-registered after nodes are repopulated, the
304+
* api_registered flag doesn't get set. Only the cib_registered flag
305+
* gets set.
306+
*
307+
* See commit message in eaee0fc for some more info.
308+
*
309+
* The fencer should not refuse to run a monitor action for a resource
310+
* that the controller knows is started.
311+
*/
312+
if (strstr(xpath, "/" PCMK_XE_NODES)
313+
|| strstr(xpath, "/" PCMK_XE_RESOURCES)
298314
|| strstr(xpath, "/" PCMK_XE_CONSTRAINTS)
299315
|| strstr(xpath, "/" PCMK_XE_RSC_DEFAULTS)) {
300316

daemons/fenced/fenced_scheduler.c

Lines changed: 4 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -52,35 +52,6 @@ fenced_scheduler_init(void)
5252
return pcmk_rc_ok;
5353
}
5454

55-
/*!
56-
* \internal
57-
* \brief Set the local node name for scheduling purposes
58-
*
59-
* \param[in] node_name Name to set as local node name
60-
*/
61-
void
62-
fenced_set_local_node(const char *node_name)
63-
{
64-
pcmk__assert(scheduler != NULL);
65-
66-
scheduler->priv->local_node_name = pcmk__str_copy(node_name);
67-
}
68-
69-
/*!
70-
* \internal
71-
* \brief Get the local node name
72-
*
73-
* \return Local node name
74-
*/
75-
const char *
76-
fenced_get_local_node(void)
77-
{
78-
if (scheduler == NULL) {
79-
return NULL;
80-
}
81-
return scheduler->priv->local_node_name;
82-
}
83-
8455
/*!
8556
* \internal
8657
* \brief Free all scheduler-related resources
@@ -112,14 +83,15 @@ fenced_scheduler_cleanup(void)
11283
static pcmk_node_t *
11384
local_node_allowed_for(const pcmk_resource_t *rsc)
11485
{
115-
if ((rsc != NULL) && (scheduler->priv->local_node_name != NULL)) {
86+
const char *local_node = fenced_get_local_node();
87+
88+
if ((rsc != NULL) && (local_node != NULL)) {
11689
GHashTableIter iter;
11790
pcmk_node_t *node = NULL;
11891

11992
g_hash_table_iter_init(&iter, rsc->priv->allowed_nodes);
12093
while (g_hash_table_iter_next(&iter, NULL, (void **) &node)) {
121-
if (pcmk__str_eq(node->priv->name, scheduler->priv->local_node_name,
122-
pcmk__str_casei)) {
94+
if (pcmk__str_eq(node->priv->name, local_node, pcmk__str_casei)) {
12395
return node;
12496
}
12597
}

daemons/fenced/pacemaker-fenced.c

Lines changed: 19 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
/*
2-
* Copyright 2009-2024 the Pacemaker project contributors
2+
* Copyright 2009-2025 the Pacemaker project contributors
33
*
44
* The version control history for this file may have further details.
55
*
@@ -40,6 +40,9 @@
4040

4141
#define SUMMARY "daemon for executing fencing devices in a Pacemaker cluster"
4242

43+
//! Local node's name in cluster
44+
static char *local_node_name = NULL;
45+
4346
// @TODO This should be guint
4447
long long stonith_watchdog_timeout_ms = 0;
4548

@@ -68,6 +71,18 @@ crm_exit_t exit_code = CRM_EX_OK;
6871

6972
static void stonith_cleanup(void);
7073

74+
/*!
75+
* \internal
76+
* \brief Get the local node name
77+
*
78+
* \return Local node name
79+
*/
80+
const char *
81+
fenced_get_local_node(void)
82+
{
83+
return local_node_name;
84+
}
85+
7186
static int32_t
7287
st_ipc_accept(qb_ipcs_connection_t * c, uid_t uid, gid_t gid)
7388
{
@@ -628,7 +643,7 @@ main(int argc, char **argv)
628643
crm_crit("Cannot sign in to the cluster... terminating");
629644
goto done;
630645
}
631-
fenced_set_local_node(cluster->priv->node_name);
646+
local_node_name = pcmk__str_copy(cluster->priv->node_name);
632647

633648
if (!options.stand_alone) {
634649
setup_cib();
@@ -654,6 +669,8 @@ main(int argc, char **argv)
654669
pcmk_cluster_free(cluster);
655670
fenced_scheduler_cleanup();
656671

672+
free(local_node_name);
673+
657674
pcmk__output_and_clear_error(&error, out);
658675

659676
if (out != NULL) {

daemons/fenced/pacemaker-fenced.h

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
/*
2-
* Copyright 2009-2024 the Pacemaker project contributors
2+
* Copyright 2009-2025 the Pacemaker project contributors
33
*
44
* This source code is licensed under the GNU General Public License version 2
55
* or later (GPLv2+) WITHOUT ANY WARRANTY.
@@ -13,6 +13,8 @@
1313
#include <crm/stonith-ng.h>
1414
#include <crm/fencing/internal.h>
1515

16+
const char *fenced_get_local_node(void);
17+
1618
/*!
1719
* \internal
1820
* \brief Check whether target has already been fenced recently
@@ -296,8 +298,6 @@ void setup_cib(void);
296298
void fenced_cib_cleanup(void);
297299

298300
int fenced_scheduler_init(void);
299-
void fenced_set_local_node(const char *node_name);
300-
const char *fenced_get_local_node(void);
301301
void fenced_scheduler_cleanup(void);
302302
void fenced_scheduler_run(xmlNode *cib);
303303

0 commit comments

Comments
 (0)