Zookeeper pod keeps crashing when scaling down and up #513

@hoyhbx

Description

(Describe the feature, bug, question, proposal that you are requesting)
We found that a ZooKeeper pod keeps crashing after a scale-down followed by a scale-up.
The crash occurs because the new pods reuse PVCs that were not deleted during the scale-down. The problem persists even when we specify reclaimPolicy: Delete.

The root cause is that nothing prevents a scale-up from proceeding before the PVCs have been deleted. zookeeper-operator checks whether the upgrade has failed before updating the statefulSet, but this check does not account for undeleted orphan PVCs. When an old PVC is reused, the startup script mistakenly treats the membership list on the old PVC as up to date and skips updating it. When ZooKeeper starts, it crashes because of the inconsistent membership list.

The orphaned PVCs never get deleted, because the operator always waits for the number of ready replicas to equal the number of desired replicas, which never happens while one pod keeps crashing.

Importance

(Indicate the importance of this issue to you (blocker, must-have, should-have, nice-to-have))
Importance: must-have

Location

(Where is the piece of code, package, or document affected by this issue?)
The PVC cleanup code is here:

func (r *ZookeeperClusterReconciler) cleanupOrphanPVCs(instance *zookeeperv1beta1.ZookeeperCluster) (err error) {

The logic to upgrade the statefulSet is here:

func (r *ZookeeperClusterReconciler) upgradeStatefulSet(instance *zookeeperv1beta1.ZookeeperCluster, foundSts *appsv1.StatefulSet) (err error) {

Suggestions for an improvement

(How do you suggest to fix or proceed with this issue?)
We think the operator should verify that all orphan PVCs have been deleted before proceeding to the next upgrade.
We suggest adding a check that no orphan PVCs remain before updating the statefulSet.
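
To make the suggestion concrete, here is a minimal sketch of such a guard. It is not the operator's actual code: the `pvc` type and the `ordinalOf`, `orphanPVCs`, and `safeToScaleUp` helpers are hypothetical, and it assumes that a PVC counts as orphaned when the ordinal at the end of its name (StatefulSet PVCs are named `<claim>-<statefulset>-<ordinal>`) is greater than or equal to the current replica count.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pvc is a simplified, hypothetical stand-in for a PersistentVolumeClaim.
// Only the name matters for this sketch.
type pvc struct {
	Name string
}

// ordinalOf extracts the trailing ordinal from a PVC name such as "data-zk-2".
// It returns false when the name does not end in "-<number>".
func ordinalOf(name string) (int, bool) {
	idx := strings.LastIndex(name, "-")
	if idx < 0 || idx == len(name)-1 {
		return 0, false
	}
	n, err := strconv.Atoi(name[idx+1:])
	if err != nil {
		return 0, false
	}
	return n, true
}

// orphanPVCs returns the PVCs whose ordinal is >= replicas, i.e. PVCs that
// belonged to pods removed by an earlier scale-down (assumed definition).
func orphanPVCs(pvcs []pvc, replicas int) []pvc {
	var orphans []pvc
	for _, p := range pvcs {
		if ord, ok := ordinalOf(p.Name); ok && ord >= replicas {
			orphans = append(orphans, p)
		}
	}
	return orphans
}

// safeToScaleUp reports whether the statefulSet may be updated.
// A scale-up is blocked while any orphan PVC from a previous scale-down
// still exists; scale-downs and no-op updates are always allowed.
func safeToScaleUp(pvcs []pvc, currentReplicas, desiredReplicas int) bool {
	if desiredReplicas <= currentReplicas {
		return true
	}
	return len(orphanPVCs(pvcs, currentReplicas)) == 0
}

func main() {
	// State after scaling down from 3 to 2: the PVC of pod 2 is still present.
	pvcs := []pvc{{Name: "data-zk-0"}, {Name: "data-zk-1"}, {Name: "data-zk-2"}}
	fmt.Println("orphans:", orphanPVCs(pvcs, 2))
	fmt.Println("safe to scale up to 3:", safeToScaleUp(pvcs, 2, 3))

	// Once the orphan PVC has been deleted, the scale-up can proceed.
	pvcs = pvcs[:2]
	fmt.Println("safe to scale up to 3:", safeToScaleUp(pvcs, 2, 3))
}
```

In the operator, a check along these lines would presumably run before the statefulSet update, and the PVC cleanup would need to run even when not all replicas are ready, otherwise the orphan PVCs would still never be removed.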
