Add RabbitMQ version upgrade and queue type migration basic support #526

Closed
lmiccini wants to merge 4 commits into openstack-k8s-operators:main from lmiccini:rmq4upg

lmiccini commented Jan 27, 2026

Implements RabbitMQ version upgrade for major/minor version changes
(e.g., 3.9 → 4.2) and queue type migrations (Mirrored → Quorum) using
a storage wipe approach with init containers.

## Version Tracking

- Status.CurrentVersion tracks the deployed RabbitMQ version
- target-version annotation (set by openstack-operator) triggers upgrades
- Detects existing 3.9 deployments on operator upgrade by checking for
  RabbitMQCluster resource existence during CurrentVersion initialization
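
The initialization rule above could be sketched roughly as follows. This is a minimal illustration, not the operator's code: the function name, its parameters, and the choice to fall back to the target version for fresh deployments are all assumptions.

```go
package main

import "fmt"

// legacyVersion is the version assumed for clusters that existed before
// version tracking was introduced (the "existing 3.9 deployments" case).
const legacyVersion = "3.9"

// initialCurrentVersion is a hypothetical helper. It keeps an already
// recorded version, treats a pre-existing RabbitMQCluster as a 3.9
// deployment, and otherwise (an assumption) starts at the target version.
func initialCurrentVersion(recorded string, clusterExists bool, target string) string {
	if recorded != "" {
		return recorded
	}
	if clusterExists {
		return legacyVersion
	}
	return target
}

func main() {
	fmt.Println(initialCurrentVersion("", true, "4.2"))
	fmt.Println(initialCurrentVersion("", false, "4.2"))
}
```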

## Multi-Phase State Machine

UpgradePhase status field tracks upgrade progress and enables resumption
after interruptions:
- "" → "DeletingResources" → "WaitingForPods" → "WaitingForCluster" → ""

Phases:
- DeletingResources: Delete RabbitMQCluster and ha-all policy (once)
- WaitingForPods: Wait for pod termination without re-deleting resources
- WaitingForCluster: Add the storage-wipe-needed annotation so the cluster is recreated with the wipe-data init container
- "": Upgrade complete, clear phase when cluster ready
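
As a rough sketch of how these transitions might be expressed, the following models the phases as a small state machine. The phase strings come from the list above; the `clusterState` struct, its fields, and `nextPhase` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// UpgradePhase holds the phase strings from the description.
type UpgradePhase string

const (
	PhaseNone              UpgradePhase = ""
	PhaseDeletingResources UpgradePhase = "DeletingResources"
	PhaseWaitingForPods    UpgradePhase = "WaitingForPods"
	PhaseWaitingForCluster UpgradePhase = "WaitingForCluster"
)

// clusterState is a hypothetical snapshot of what one reconcile observes.
type clusterState struct {
	UpgradeNeeded    bool
	ResourcesDeleted bool
	PodsTerminated   bool
	ClusterReady     bool
}

// nextPhase returns the phase to persist after one reconcile pass.
// When the condition for leaving a phase is not met, the phase is kept,
// which is what allows resumption after an interruption.
func nextPhase(p UpgradePhase, s clusterState) UpgradePhase {
	switch p {
	case PhaseNone:
		if s.UpgradeNeeded {
			return PhaseDeletingResources
		}
	case PhaseDeletingResources:
		if s.ResourcesDeleted {
			return PhaseWaitingForPods
		}
	case PhaseWaitingForPods:
		if s.PodsTerminated {
			return PhaseWaitingForCluster
		}
	case PhaseWaitingForCluster:
		if s.ClusterReady {
			return PhaseNone
		}
	}
	return p
}

func main() {
	p := nextPhase(PhaseNone, clusterState{UpgradeNeeded: true})
	fmt.Printf("%q\n", p)
}
```

Because the phase is stored in status, a restarted operator in this sketch re-enters the `switch` at the recorded phase instead of deleting resources a second time.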

## Storage Wipe Implementation

Data-wipe init container runs before RabbitMQ container:
- Wipes /var/lib/rabbitmq (Mnesia data format changes between versions)
- Version-specific marker files (.operator-wipe-{version}) prevent re-wipe
- Reuses existing PVCs/PVs (no PVC/PV deletion required)
- Triggered by temporary storage-wipe-needed annotation on RabbitMQCluster

## Upgrade Workflow

1. openstack-operator sets target-version annotation
2. requiresStorageWipe() checks version compatibility (major/minor changes)
3. Status.UpgradePhase → "DeletingResources": Delete cluster and policy
4. Status.UpgradePhase → "WaitingForPods": Wait for pods to terminate
5. Status.UpgradePhase → "WaitingForCluster": Add storage-wipe-needed annotation
6. CreateOrPatch creates cluster with wipe-data init container
7. Init container wipes data based on marker file check
8. Cluster becomes ready: Update Status.CurrentVersion, clear UpgradePhase
9. Next reconcile: Remove storage-wipe-needed annotation via CreateOrPatch sync
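
Step 2 depends on a major/minor comparison. A minimal sketch of such a check is shown below; the real `requiresStorageWipe()` signature is not given in this description, so the two-string form, the error return, and the `majorMinor` helper are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorMinor parses "X.Y" or "X.Y.Z" into its major and minor numbers.
func majorMinor(v string) (int, int, error) {
	parts := strings.Split(v, ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("version %q has no minor component", v)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid major in %q: %w", v, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid minor in %q: %w", v, err)
	}
	return major, minor, nil
}

// requiresStorageWipe reports whether current and target differ in their
// major or minor component. Patch-level changes do not trigger a wipe.
func requiresStorageWipe(current, target string) (bool, error) {
	cMaj, cMin, err := majorMinor(current)
	if err != nil {
		return false, err
	}
	tMaj, tMin, err := majorMinor(target)
	if err != nil {
		return false, err
	}
	return cMaj != tMaj || cMin != tMin, nil
}

func main() {
	wipe, err := requiresStorageWipe("3.9", "4.2")
	fmt.Println(wipe, err)
}
```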

## Configuration Version Selection

Uses target-version (not current version) for cluster configuration during
upgrades to prevent mid-upgrade configuration changes (e.g., TLS versions).

## RabbitMQ 4.x Changes

- TLS 1.3 enabled (RabbitMQ 4.x+)
- default_queue_type=quorum configuration
- deprecated_features.permit.classic_queue_mirroring=false
- quorum_queue.property_equivalence.relaxed_checks_on_redeclaration=true
- Automatic Mirrored → Quorum migration on 3.x → 4.x upgrades
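
One way to picture how the target version drives these settings is a helper that emits the extra `rabbitmq.conf` lines only for 4.x targets. The three keys are the ones listed above; the function name and the idea of passing the target major version as an integer are illustrative assumptions, and TLS settings are left out because their exact keys are not stated here.

```go
package main

import "fmt"

// extraConfigFor returns the additional rabbitmq.conf lines for the given
// target major version. Versions below 4 get no extra lines.
func extraConfigFor(targetMajor int) []string {
	if targetMajor < 4 {
		return nil
	}
	return []string{
		"default_queue_type = quorum",
		"deprecated_features.permit.classic_queue_mirroring = false",
		"quorum_queue.property_equivalence.relaxed_checks_on_redeclaration = true",
	}
}

func main() {
	for _, line := range extraConfigFor(4) {
		fmt.Println(line)
	}
}
```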

## Webhook Validation

- Automatically overrides Mirrored to Quorum on RabbitMQ 4.x updates
- Blocks Mirrored → Quorum migration on RabbitMQ 3.x (no server enforcement)
- Allows migration when concurrent version upgrade to 4.x is present
- DefaultForUpdate() sets queueType=Quorum for RabbitMQ 4.x+ target versions
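
The defaulting and validation rules above might look roughly like this. It is a simplified model, not the webhook itself: the function names, the plain-string queue types, and the integer target major are assumptions made to keep the example self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	queueMirrored = "Mirrored"
	queueQuorum   = "Quorum"
)

// errMigrationBlocked is returned when a migration is requested on 3.x.
var errMigrationBlocked = errors.New("queue type migration from Mirrored to Quorum requires a RabbitMQ 4.x target version")

// defaultQueueType overrides Mirrored with Quorum when the target is 4.x+.
func defaultQueueType(requested string, targetMajor int) string {
	if targetMajor >= 4 && requested == queueMirrored {
		return queueQuorum
	}
	return requested
}

// validateQueueTypeChange rejects Mirrored -> Quorum unless the target
// version is 4.x or later, which covers the concurrent-upgrade case.
func validateQueueTypeChange(oldType, newType string, targetMajor int) error {
	if oldType == queueMirrored && newType == queueQuorum && targetMajor < 4 {
		return errMigrationBlocked
	}
	return nil
}

func main() {
	fmt.Println(defaultQueueType(queueMirrored, 4))
	fmt.Println(validateQueueTypeChange(queueMirrored, queueQuorum, 3))
}
```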

## Annotation Management

CreateOrPatch() in rabbitmqcluster.go now syncs annotations bidirectionally:
- Adds annotations from desired spec
- Removes annotations not in desired spec
Enables automatic cleanup of temporary storage-wipe-needed annotation.
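
A minimal sketch of bidirectional annotation syncing, assuming annotations are plain `map[string]string` values. `syncAnnotations` is a hypothetical helper, and the bare `storage-wipe-needed` key stands in for whatever fully qualified annotation name the operator uses.

```go
package main

import "fmt"

// syncAnnotations makes existing match desired in place: keys missing
// from desired are deleted, and keys in desired are added or updated.
// existing must be non-nil. It reports whether anything changed.
func syncAnnotations(existing, desired map[string]string) bool {
	changed := false
	for k := range existing {
		if _, ok := desired[k]; !ok {
			delete(existing, k)
			changed = true
		}
	}
	for k, v := range desired {
		if cur, ok := existing[k]; !ok || cur != v {
			existing[k] = v
			changed = true
		}
	}
	return changed
}

func main() {
	existing := map[string]string{"storage-wipe-needed": "true"}
	changed := syncAnnotations(existing, map[string]string{})
	fmt.Println(changed, len(existing))
}
```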

## Init Container Preservation

Checks the existing RabbitMQCluster for the wipe-data init container and
preserves it across reconciles to avoid pod restarts. Combined with the
marker file check in the wipe script, this prevents duplicate data wipes.
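
The preservation rule can be sketched with a simplified container type. The `wipe-data` name comes from the description; the `container` struct and both helpers are hypothetical stand-ins for the Kubernetes types the operator would actually use.

```go
package main

import "fmt"

// container is a simplified stand-in for a Kubernetes container spec.
type container struct {
	Name string
}

const wipeContainerName = "wipe-data"

// hasWipeContainer reports whether the init containers already include
// the wipe-data container.
func hasWipeContainer(initContainers []container) bool {
	for _, c := range initContainers {
		if c.Name == wipeContainerName {
			return true
		}
	}
	return false
}

// desiredInitContainers keeps the wipe-data container when a wipe is
// requested or when it is already present, so the pod template does not
// change between reconciles.
func desiredInitContainers(existing []container, wipeNeeded bool) []container {
	if wipeNeeded || hasWipeContainer(existing) {
		return []container{{Name: wipeContainerName}}
	}
	return nil
}

func main() {
	existing := []container{{Name: wipeContainerName}}
	fmt.Println(len(desiredInitContainers(existing, false)))
}
```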

## Queue Type Migration

Spec.QueueType change from Mirrored to Quorum triggers storage wipe workflow
(same phases as version upgrades). Required because mirrored queues cannot
be converted to quorum queues in-place.

Jira: https://issues.redhat.com/browse/OSPRH-22219

Depends-On: openstack-k8s-operators/openstack-operator#1805

openshift-ci bot commented Jan 27, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lmiccini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

lmiccini requested review from stuggi and removed request for viroel January 27, 2026 06:44
lmiccini force-pushed the rmq4upg branch 5 times, most recently from dfeb678 to 9be78b8 Compare January 27, 2026 09:32
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/d9308d8b92f146ef93533d3002d49ade

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 32m 04s
podified-multinode-edpm-deployment-crc FAILURE in 1h 01m 24s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 15m 20s

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/645d32e739b74209b129928de9e5af66

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 27m 41s
podified-multinode-edpm-deployment-crc FAILURE in 1h 03m 01s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 12m 32s

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/203560e945fd459095e9862a1dfebfa2

openstack-k8s-operators-content-provider NODE_FAILURE Node request 100-0008160425 failed in 0s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/0565bc902cc540cbbb4a71ec46c402cf

openstack-k8s-operators-content-provider NODE_FAILURE Node request 100-0008160510 failed in 0s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/1940f844c9064cc9ab906819f6a331ab

openstack-k8s-operators-content-provider NODE_FAILURE Node request 100-0008160517 failed in 0s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/f45533ae9ce047e08720f67fdfa9bb2d

openstack-k8s-operators-content-provider NODE_FAILURE Node request 100-0008160529 failed in 0s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

lmiccini force-pushed the rmq4upg branch 7 times, most recently from 451d65d to 397e29b Compare February 10, 2026 16:38
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/baa1ae88035349b8b7136b862bdae9c9

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 32m 31s
podified-multinode-edpm-deployment-crc FAILURE in 1h 01m 16s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 15m 49s

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/fcbfa49ea336474b860478d0d8104e3c

openstack-k8s-operators-content-provider FAILURE in 8m 37s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@softwarefactory-project-zuul

This change depends on a change that failed to merge.

Change openstack-k8s-operators/openstack-operator#1805 is needed.

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/c72cd82c495d42ff98b0a0b9c47b592d

openstack-k8s-operators-content-provider FAILURE in 8m 35s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/1b969bac6a05422fb4478e9acd5ce427

openstack-k8s-operators-content-provider FAILURE in 9m 06s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4f0a23c0e0914374bbe1d6b4d5740e6a

openstack-k8s-operators-content-provider FAILURE in 9m 23s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/676ad5d593304be1969004cd51bc2ab3

openstack-k8s-operators-content-provider FAILURE in 8m 47s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/f1fee4f17bdd4914955473a02c2d3c9b

openstack-k8s-operators-content-provider FAILURE in 9m 05s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/d6f58427e11f4adea3da1e29f4e1dafc

openstack-k8s-operators-content-provider FAILURE in 9m 03s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/904f9e29066c4ae3a2787e54c1bcaf0c

openstack-k8s-operators-content-provider FAILURE in 9m 05s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

lmiccini force-pushed the rmq4upg branch 2 times, most recently from ee7ff70 to 310f1fe Compare February 16, 2026 11:50
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/f45d05971e364f339fba88f2b94fbc2a

openstack-k8s-operators-content-provider FAILURE in 9m 30s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

lmiccini force-pushed the rmq4upg branch 5 times, most recently from f72d845 to f303091 Compare February 16, 2026 12:59

openshift-ci bot commented Feb 16, 2026

@lmiccini: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/functional | 79e4436 | link | true | /test functional |
| ci/prow/precommit-check | 79e4436 | link | true | /test precommit-check |
| ci/prow/infra-operator-build-deploy-kuttl | 79e4436 | link | true | /test infra-operator-build-deploy-kuttl |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
