@@ -27,6 +27,58 @@ enough masterless master-eligible nodes to complete an election. If neither of
these occur quickly enough then the node will retry after
`discovery.find_peers_interval` which defaults to `1s`.
+ Once a master is elected, it will normally remain as the elected master until
+ it is deliberately stopped. It may also stop acting as the master if
+ <<cluster-fault-detection,fault detection>> determines the cluster to be
+ faulty. When a node stops being the elected master, it begins the discovery
+ process again.
+
+ [[modules-discovery-troubleshooting]]
+ ==== Troubleshooting discovery
+
+ In most cases, the discovery process completes quickly, and the master node
+ remains elected for a long period of time. If the cluster has no master for
+ more than a few seconds or the master is unstable, the logs for each node will
+ contain information explaining why:
+
+ * All nodes repeatedly log messages indicating that a master cannot be
+ discovered or elected. These messages use a logger called
+ `org.elasticsearch.cluster.coordination.ClusterFormationFailureHelper` and,
+ by default, are logged every 10 seconds.
+
+ * If a node wins the election, it logs a message containing
+ `elected-as-master`. If this happens repeatedly, the master node is unstable.
+
+ * When a node discovers the master or believes the master to have failed, it
+ logs a message containing `master node changed`.
+
+ * If a node is unable to discover or elect a master for several minutes, it
+ starts to report additional details about the failures in its logs. Be sure to
+ capture log messages covering at least five minutes of discovery problems.
+
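When collecting logs from several nodes, a small script can help tally the messages listed above. The following Python sketch is hypothetical and not part of {es}: it assumes you already have the log lines as plain strings, and it simply counts lines containing each of the substrings named above. The labels in `MARKERS` and the `max_elections` threshold for "repeatedly" are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Iterable, List

# Substrings taken from the troubleshooting notes; the dictionary keys are
# hypothetical labels chosen for this sketch.
MARKERS = {
    "formation_failure": "ClusterFormationFailureHelper",
    "elected": "elected-as-master",
    "master_changed": "master node changed",
}


def summarize_discovery_log(lines: Iterable[str]) -> Counter:
    """Count how many log lines contain each discovery-related substring."""
    counts: Counter = Counter()
    for line in lines:
        for label, marker in MARKERS.items():
            if marker in line:
                counts[label] += 1
    return counts


def diagnose(counts: Counter, max_elections: int = 1) -> List[str]:
    """Turn marker counts into rough findings.

    `max_elections` is an assumed threshold: more election wins than this
    within the captured window is treated as a sign of an unstable master.
    """
    findings = []
    if counts["formation_failure"]:
        findings.append("master could not be discovered or elected")
    if counts["elected"] > max_elections:
        findings.append("repeated elections: master may be unstable")
    return findings


if __name__ == "__main__":
    # Hypothetical log excerpts; real Elasticsearch log lines contain more detail.
    sample = [
        "node-1 elected-as-master ...",
        "node-2 master node changed ...",
        "node-1 elected-as-master ...",
    ]
    counts = summarize_discovery_log(sample)
    print(dict(counts))
    print(diagnose(counts))
```

Substring matching is used deliberately so that the check still works if the logger name appears in an abbreviated form in your log layout.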
+ If your cluster doesn't have a stable master, many of its features won't work
+ correctly, and the cluster may report many kinds of error to clients and in its
+ logs. You must fix the master node's instability before addressing these other
+ issues, because none of them can be solved while the master node is unstable.
+
+ The logs from the `ClusterFormationFailureHelper` may indicate that a master
+ election requires a certain set of nodes and that the node has not discovered
+ enough of them to form a quorum. If so, you must address whatever is preventing
+ {es} from discovering the missing nodes. The missing nodes are needed to
+ reconstruct the cluster metadata, and without the cluster metadata the data in
+ your cluster is meaningless. The cluster metadata is stored on a subset of the
+ master-eligible nodes in the cluster. If a quorum cannot be discovered, then
+ the missing nodes were the ones holding the cluster metadata. If you cannot
+ bring the missing nodes back into the cluster, start a new cluster and restore
+ data from a recent snapshot. Refer to <<modules-discovery-quorums>> for more
+ information.
+
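As described in <<modules-discovery-quorums>>, an election needs more than half of the votes in the voting configuration. The following Python sketch illustrates only that majority arithmetic; {es} performs this check internally. The function names, the node names, and the idea of holding both node lists as plain Python sets are assumptions for this example.

```python
from typing import AbstractSet


def votes_held(discovered: AbstractSet[str], voting_config: AbstractSet[str]) -> int:
    """Number of voting-configuration nodes among the discovered nodes."""
    # Discovered nodes outside the voting configuration contribute no votes.
    return len(discovered & voting_config)


def has_quorum(discovered: AbstractSet[str], voting_config: AbstractSet[str]) -> bool:
    """True if the discovered nodes hold more than half of the votes."""
    return 2 * votes_held(discovered, voting_config) > len(voting_config)


def votes_still_needed(discovered: AbstractSet[str], voting_config: AbstractSet[str]) -> int:
    """How many more voting nodes must be discovered to reach a majority."""
    majority = len(voting_config) // 2 + 1
    return max(0, majority - votes_held(discovered, voting_config))


if __name__ == "__main__":
    # Hypothetical three-node voting configuration.
    voting = {"master-a", "master-b", "master-c"}
    print(has_quorum({"master-a"}, voting))
    print(votes_still_needed({"master-a"}, voting))
    print(has_quorum({"master-a", "master-b"}, voting))
```

For example, a three-node voting configuration needs two discovered nodes, and a four-node one needs three.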
+ The logs from the `ClusterFormationFailureHelper` may also indicate that the
+ node has discovered a possible quorum of master-eligible nodes. If so, the usual
+ reason that the cluster cannot elect a master is that one of the other nodes
+ cannot discover a quorum. Inspect the logs on the other master-eligible nodes
+ and ensure that every node has discovered a quorum.
+
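To apply this check across the whole cluster, you could note, from each master-eligible node's logs, which nodes it reports having discovered. Then compare each node's list against the set of nodes the election requires. The Python sketch below is hypothetical. It assumes you have already transcribed each node's discovered set by hand, that each set includes the reporting node itself, and that the required set is known. The function and node names are illustrative only.

```python
from typing import Dict, Set


def nodes_without_quorum(views: Dict[str, Set[str]], required: Set[str]) -> Set[str]:
    """Return the nodes whose own discovered set lacks a majority of `required`.

    `views` maps each node name to the nodes it reports having discovered;
    each set is assumed to already include the reporting node itself.
    """
    lacking = set()
    for node, discovered in views.items():
        votes = len(discovered & required)
        # A quorum needs strictly more than half of the required nodes.
        if 2 * votes <= len(required):
            lacking.add(node)
    return lacking


if __name__ == "__main__":
    required = {"master-a", "master-b", "master-c"}
    # Hypothetical per-node views gathered from each node's logs.
    views = {
        "master-a": {"master-a", "master-b"},
        "master-b": {"master-b"},
        "master-c": {"master-a", "master-b", "master-c"},
    }
    print(sorted(nodes_without_quorum(views, required)))
```

In this made-up example, only `master-b` lacks a quorum, so its logs are the ones to inspect for the discovery problem.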
[[built-in-hosts-providers]]
==== Seed hosts providers