Add support to disable CAPZ components through a manager flag #5552

Draft
wants to merge 2 commits into base: main

Conversation

bryan-cox
Contributor

@bryan-cox bryan-cox commented Apr 7, 2025

What type of PR is this?
/kind bug

What this PR does / why we need it:
Adds the ability to disable CAPZ components through a manager flag. Flags are added for disabling the ASO Secret Controller and the Azure JSON Machine Controller.

Which issue(s) this PR fixes:
Fixes #5472

Special notes for your reviewer:

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • cherry-pick candidate

Release note:

Adds the ability to disable CAPZ components through a manager flag. Flags are added for disabling the ASO Secret Controller and the Azure JSON Machine Controller.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 7, 2025
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Apr 7, 2025
@bryan-cox
Contributor Author

/assign @nawazkh

codecov bot commented Apr 7, 2025

Codecov Report

Attention: Patch coverage is 12.82051% with 34 lines in your changes missing coverage. Please review.

Project coverage is 53.25%. Comparing base (3a9eb5b) to head (ebc3066).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| main.go | 0.00% | 34 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5552      +/-   ##
==========================================
- Coverage   53.27%   53.25%   -0.02%     
==========================================
  Files         272      273       +1     
  Lines       29522    29541      +19     
==========================================
+ Hits        15727    15732       +5     
- Misses      12980    12994      +14     
  Partials      815      815              

@nojnhuh
Contributor

nojnhuh commented Apr 8, 2025

> the AzureManagedControlPlane CR which is behind the ASOAPI feature gate.

This is incorrect. The ASOAPI feature gate only enables controllers for the AzureASOManagedControlPlane and the other AzureASO... resources. This ASO secret controller is necessary for all of the resource types that CAPZ manages with ASO, including resource groups and vnets which are created for every AzureCluster and AzureManagedControlPlane. I suspect that if you disable the ASOAPI feature gate in CI, then all of the e2e tests will blow up, not just the ones exercising the AzureASO... APIs.

I think I mentioned somewhere, maybe in #5099 or in a Slack thread related to that PR, that a better solution to this general problem IMO would be a generic toggle to enable or disable individual controllers and webhooks with command line flags. Either something like that, or we explicitly disclaim all support for any installation that is not exactly equivalent to the CRDs and other manifests we publish for releases, since we generally assume the controller manager is running when all of the CRDs are installed. Changing the meaning of existing feature gates isn't a sustainable way to solve the general "I didn't install a CRD and now the controller manager is crashing" problem.
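
To make the idea of a generic toggle concrete, here is a minimal sketch of how a comma-separated flag value could gate which components start. It is not the code in this PR: the component names, the comma-separated format, and the helper functions are assumptions for illustration, and it uses Go's standard `flag` package rather than whatever the CAPZ manager actually wires up. Only the flag name `disable-controllers-or-webhooks` comes from this PR's commit message.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"sort"
	"strings"
)

// Hypothetical component names; the real identifiers in CAPZ may differ.
const (
	ASOSecretController        = "ASOSecretController"
	AzureJSONMachineController = "AzureJSONMachineController"
)

// parseDisabled turns a comma-separated flag value into a set of names,
// ignoring surrounding whitespace and empty entries.
func parseDisabled(value string) map[string]bool {
	disabled := map[string]bool{}
	for _, name := range strings.Split(value, ",") {
		if name = strings.TrimSpace(name); name != "" {
			disabled[name] = true
		}
	}
	return disabled
}

// enabledComponents returns, in sorted order, every component in all
// that is not present in the disabled set.
func enabledComponents(all []string, disabled map[string]bool) []string {
	var enabled []string
	for _, name := range all {
		if !disabled[name] {
			enabled = append(enabled, name)
		}
	}
	sort.Strings(enabled)
	return enabled
}

func main() {
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	disable := fs.String("disable-controllers-or-webhooks", "",
		"comma-separated list of components to skip at startup (assumed format)")

	// Fixed arguments keep the sketch deterministic; a real binary would
	// parse os.Args[1:] instead.
	args := []string{"--disable-controllers-or-webhooks=" + ASOSecretController}
	if err := fs.Parse(args); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}

	all := []string{ASOSecretController, AzureJSONMachineController}
	for _, name := range enabledComponents(all, parseDisabled(*disable)) {
		// A real manager would call the component's setup function here.
		fmt.Println("starting", name)
	}
}
```

With the fixed arguments above, only `AzureJSONMachineController` is reported as starting. Keeping the parsing in small pure functions also makes the behavior unit-testable outside of main.go.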

@enxebre
Member

enxebre commented Apr 8, 2025

> that a better solution to this general problem IMO would be a generic toggle to enable or disable individual controllers and webhooks with command line flags.

I agree. FWIW, I created #5294 some time ago to track that effort.

@nawazkh
Member

nawazkh commented Apr 8, 2025

> This is incorrect. The ASOAPI feature gate only enables controllers for the AzureASOManagedControlPlane and the other AzureASO... resources. This ASO secret controller is necessary for all of the resource types that CAPZ manages with ASO, including resource groups and vnets which are created for every AzureCluster and AzureManagedControlPlane. I suspect that if you disable the ASOAPI feature gate in CI, then all of the e2e tests will blow up, not just the ones exercising the AzureASO... APIs.

Thank you for adding more context on this Jon.

> that a better solution to this general problem IMO would be a generic toggle to enable or disable individual controllers and webhooks with command line flags.

> I agree, fwiw created this some time ago to track that effort #5294

I agree as well. We could start by updating manager.yaml with a set of environment variables that enable or disable individual controllers and webhooks.
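
As a rough sketch of the environment-variable variant, the manager could resolve a per-component boolean at startup and fall back to "enabled" when the variable is unset or malformed. The variable names and the `controllerEnabled` helper below are hypothetical and only illustrate the idea; nothing here reflects an existing CAPZ setting.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// controllerEnabled reports whether a component should run, based on an
// environment variable looked up through lookup. Unset or unparsable
// values fall back to def.
func controllerEnabled(lookup func(string) (string, bool), key string, def bool) bool {
	raw, ok := lookup(key)
	if !ok {
		return def
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		return def
	}
	return enabled
}

func main() {
	// Hypothetical variable names that manager.yaml could set on the
	// controller manager container.
	keys := []string{
		"ENABLE_ASO_SECRET_CONTROLLER",
		"ENABLE_AZURE_JSON_MACHINE_CONTROLLER",
	}
	for _, key := range keys {
		fmt.Printf("%s=%t\n", key, controllerEnabled(os.LookupEnv, key, true))
	}
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, lets a test supply a fake environment without mutating the process state.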


So do we essentially close out this PR, @bryan-cox?

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from nawazkh. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 8, 2025
@bryan-cox bryan-cox marked this pull request as draft April 8, 2025 18:07
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 8, 2025
@bryan-cox bryan-cox changed the title Move ASO secret controller behind feature gate Add support to disable CAPZ components through a manager flag Apr 8, 2025
@bryan-cox
Contributor Author

/test all

@bryan-cox
Contributor Author

/test all

@bryan-cox
Contributor Author

@nojnhuh @nawazkh - when you have a moment, can I get another look at this PR? If we are good with moving in this direction, I can follow up with some docs on how this can be used.

@nawazkh
Member

nawazkh commented May 2, 2025

> @nojnhuh @nawazkh - when you have a moment, can I get another look at this PR? If we are good with moving in this direction, I can follow up with some docs on how this can be used.

I like the idea of being able to toggle controllers, so green flag from me on this approach.

However, we need to ensure that the management cluster stays functional when a controller is turned off (maybe all of them in the future?), so maybe we should also add an e2e test to validate that.
We could add that scenario as an optional test.

@bryan-cox, what do you say?

@nawazkh nawazkh mentioned this pull request May 6, 2025
4 tasks
bryan-cox added 2 commits May 12, 2025 09:14
This commit adds some minor documentation on how to use the new flag,
disable-controllers-or-webhooks.

Signed-off-by: Bryan Cox <[email protected]>
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/bug Categorizes issue or PR as related to a bug. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

Self-managed infrastructure of CAPZ crash loops when AzureManagedControlPlane is not installed
5 participants