Skip to content

CNTRLPLANE-343: certrotationcontroller: use minutes instead of days when FeatureShortCertRotation is enabled#817

Merged
openshift-merge-bot[bot] merged 2 commits intoopenshift:masterfrom
vrutkovs:short-rotation-featuregate
Jun 20, 2025
Merged

CNTRLPLANE-343: certrotationcontroller: use minutes instead of days when FeatureShortCertRotation is enabled#817
openshift-merge-bot[bot] merged 2 commits intoopenshift:masterfrom
vrutkovs:short-rotation-featuregate

Conversation

@vrutkovs
Copy link
Copy Markdown
Contributor

No description provided.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 12, 2024
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Aug 12, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@vrutkovs vrutkovs force-pushed the short-rotation-featuregate branch 2 times, most recently from cbc637b to 455f645 Compare August 16, 2024 09:36
@vrutkovs vrutkovs force-pushed the short-rotation-featuregate branch from 455f645 to c5c2087 Compare November 7, 2024 09:07
@vrutkovs vrutkovs force-pushed the short-rotation-featuregate branch 3 times, most recently from 79eac26 to 1a59b1d Compare November 19, 2024 13:51
@openshift-bot
Copy link
Copy Markdown
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci Bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 18, 2025
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 18, 2025
@vrutkovs vrutkovs force-pushed the short-rotation-featuregate branch from 1a59b1d to 84869ad Compare March 19, 2025 08:41
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 19, 2025
@vrutkovs vrutkovs changed the title certrotationcontroller: use minutes instead of days when FeatureShortCertRotation is enabled CNTRLPLANE-343: certrotationcontroller: use minutes instead of days when FeatureShortCertRotation is enabled Mar 19, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 19, 2025
@vrutkovs vrutkovs marked this pull request as ready for review March 19, 2025 08:42
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Mar 19, 2025

@vrutkovs: This pull request references CNTRLPLANE-343 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "4.19.0" version, but no target version was set.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 19, 2025
@vrutkovs
Copy link
Copy Markdown
Contributor Author

/remove-lifecycle stale

@openshift-ci openshift-ci Bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 19, 2025
@openshift-ci openshift-ci Bot requested review from atiratree and ingvagabund March 19, 2025 08:43
@vrutkovs vrutkovs force-pushed the short-rotation-featuregate branch from 84869ad to 4b82d7b Compare March 19, 2025 09:42
@vrutkovs vrutkovs changed the title CNTRLPLANE-343: certrotationcontroller: use minutes instead of days when FeatureShortCertRotation is enabled WIP CNTRLPLANE-343: certrotationcontroller: use minutes instead of days when FeatureShortCertRotation is enabled Mar 19, 2025
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 19, 2025
@vrutkovs vrutkovs changed the title WIP CNTRLPLANE-343: certrotationcontroller: use minutes instead of days when FeatureShortCertRotation is enabled CNTRLPLANE-343: certrotationcontroller: use minutes instead of days when FeatureShortCertRotation is enabled Mar 19, 2025
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 19, 2025
@vrutkovs vrutkovs force-pushed the short-rotation-featuregate branch 2 times, most recently from 7ce56f6 to 0d18cbc Compare March 21, 2025 09:45
@vrutkovs vrutkovs force-pushed the short-rotation-featuregate branch 3 times, most recently from a9cf919 to 8b1f4fc Compare April 22, 2025 08:03
@vrutkovs
Copy link
Copy Markdown
Contributor Author

/retest

@vrutkovs
Copy link
Copy Markdown
Contributor Author

/retest-required

@vrutkovs vrutkovs force-pushed the short-rotation-featuregate branch 2 times, most recently from df6e4c4 to 22bfdea Compare May 15, 2025 08:58
@vrutkovs vrutkovs force-pushed the short-rotation-featuregate branch 2 times, most recently from 63ea291 to 496cfce Compare June 18, 2025 10:28
rotationDay = day
klog.Warningf("!!! UNSUPPORTED VALUE SET !!!")
klog.Warningf("Certificate rotation base set to %q", rotationDay)
monthPeriod := time.Hour * 24 * 30
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove the defaultRotationDay constant if we do not need it anymore

Copy link
Copy Markdown
Member

@atiratree atiratree Jun 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const is still present

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment thread pkg/operator/certrotationcontroller/certrotationcontroller.go Outdated
Validity: 30 * rotationDay,
Refresh: 15 * rotationDay,
Validity: monthPeriod,
Refresh: monthPeriod / 2,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be useful to have different variables for the CA and the signer

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, in ckao we have "month", "tenMonth" and "year" periods - so we can combine those.

I think a single refresh period is okay for now, it can be extended later

return nil, fmt.Errorf("unable to get FeatureGates: %w", err)
}

if featureGates.Enabled(features.FeatureShortCertRotation) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we do not support disabling the FG during runtime. Note: FeatureShortCertRotation is DevPreviewNoUpgrade currently and it seems the plan is to use it only for testing?

Maybe we could add a simple comment above to document the use case?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, added

Comment thread pkg/cmd/recoverycontroller/cmd.go Outdated
Comment thread pkg/cmd/recoverycontroller/cmd.go Outdated
Comment thread pkg/operator/starter.go
// we need to establish some kind of delay or back pressure to prevent the rollout. This ensures we don't trigger kas restart
// during e2e tests for now.
certRotationScale*8,
featureGateAccessor,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Feature is the direct successor of the certRotationScale right?

Are these comments by David obsolete? I suppose it was tuned to work with a specific scale and specific hard-coded value of 8 before.

Can we run into a similar problem with 2h/1h rotation?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its a replacement for cert rotation scale, yes.

The comments are obsolete - instead of tweaking scales in some operators we define a featuregate observed by every operator. The feature gate also makes your cluster unsupportable (unlike cert scale config via "secret" configmap). And also we run a test which ensures that no certs issued are longer than 8 hrs - so the config is platform wide.

Can we run into a similar problem with 2h/1h rotation?

Its possible - but we have e2e tests which ensure that rollouts happen without disruption

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, great to hear that we have it covered and can remove the scale+comment.

Comment on lines +658 to +661
required.Spec.Containers[i].Env = append(container.Env, corev1.EnvVar{
Name: "OPERATOR_IMAGE_VERSION",
Value: operatorImageVersion,
})
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we need the OPERATOR_IMAGE_VERSION env var for?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Operands needs current operator version to make sure observed feature gates are from expected version

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we pass the feature gates to operands directly? Or is this something planned? Or do you have a link where this is used?

Copy link
Copy Markdown
Contributor Author

@vrutkovs vrutkovs Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is used only for the operator command (RunOperator) and not for any of the operand containers as far as I can see.

I am not against adding the OPERATOR_IMAGE_VERSION env var if we plan to use it. But if not, I would prefer to keep the KCM pod cleaner/simpler.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right, we no longer need it (featuregates are now hardcoded in operand sidecar)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we can add the env later if we find the need.

@vrutkovs vrutkovs force-pushed the short-rotation-featuregate branch from 15a364e to 84b4a48 Compare June 19, 2025 13:10

if featureGates.Enabled(features.FeatureShortCertRotation) {
refreshPeriod = time.Hour * 2
klog.Infof("Setting monthPeriod to %v", refreshPeriod)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
klog.Infof("Setting monthPeriod to %v", refreshPeriod)
klog.Infof("Setting refreshPeriod to %v", refreshPeriod)

rotationDay = day
klog.Warningf("!!! UNSUPPORTED VALUE SET !!!")
klog.Warningf("Certificate rotation base set to %q", rotationDay)
monthPeriod := time.Hour * 24 * 30
Copy link
Copy Markdown
Member

@atiratree atiratree Jun 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const is still present

@vrutkovs vrutkovs force-pushed the short-rotation-featuregate branch 2 times, most recently from fa01763 to a746627 Compare June 20, 2025 09:42
@atiratree
Copy link
Copy Markdown
Member

As discussed #817 (comment), it seems we can remove the whole last commit now: a746627

@vrutkovs vrutkovs force-pushed the short-rotation-featuregate branch from a746627 to 50e9fe2 Compare June 20, 2025 10:48
@atiratree
Copy link
Copy Markdown
Member

Thanks!

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 20, 2025
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 20, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: atiratree, vrutkovs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 20, 2025
@openshift-ci-robot
Copy link
Copy Markdown

/retest-required

Remaining retests: 0 against base HEAD 0740142 and 2 for PR HEAD 50e9fe2 in total

@vrutkovs
Copy link
Copy Markdown
Contributor Author

/test e2e-aws-ovn

@openshift-ci-robot
Copy link
Copy Markdown

/retest-required

Remaining retests: 0 against base HEAD 0740142 and 2 for PR HEAD 50e9fe2 in total

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 20, 2025

@vrutkovs: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn 50e9fe2 link false /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit 4c58d4c into openshift:master Jun 20, 2025
10 of 11 checks passed
@openshift-bot
Copy link
Copy Markdown
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-cluster-kube-controller-manager-operator
This PR has been included in build ose-cluster-kube-controller-manager-operator-container-v4.20.0-202506202242.p0.g4c58d4c.assembly.stream.el9.
All builds following this will include this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants