OCI provider: Remove unprovisioned nodes in error state as requested instead of returning an error #8806
+45
−20
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR updates the behavior of OCI's node pool implementation of DeleteNodes() when it is called with an unfulfilled placeholder node that is in an error state. Typically, the error state is due to capacity or quota issues. Rather than returning an error because the placeholder node has an empty instance ID, the DeleteNodes() request effectively cancels the failed scale-up, allowing the Cluster Autoscaler to recover more quickly and try scaling up a different node pool that meets the scheduling requirements.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
These changes are currently running in an environment where we have proactively configured multiple node pools with different shape configurations to make the Cluster Autoscaler more resilient against capacity issues.
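For reviewers who want a quick mental model, the behavior change described in this PR can be sketched roughly as below. This is not the code in the diff: the types placeholderNode and nodePool, their fields, and the shrink helper are hypothetical stand-ins, and the handling of a node with an empty instance ID that is not in an error state is an assumption made only to keep the example complete.

```go
package main

import "fmt"

// placeholderNode is a hypothetical stand-in for a node tracked by the
// autoscaler. An empty InstanceID means the node was never provisioned.
type placeholderNode struct {
	Name         string
	InstanceID   string
	InErrorState bool
}

// nodePool is a hypothetical in-memory node pool, not the OCI provider type.
type nodePool struct {
	targetSize int
	deleted    []string // instance IDs that were actually terminated
}

// shrink lowers the target size by one without going below zero.
func (np *nodePool) shrink() {
	if np.targetSize > 0 {
		np.targetSize--
	}
}

// DeleteNodes sketches the behavior described above: an unprovisioned
// placeholder in an error state cancels the failed scale-up instead of
// producing an error.
func (np *nodePool) DeleteNodes(nodes []placeholderNode) error {
	for _, n := range nodes {
		if n.InstanceID == "" {
			if !n.InErrorState {
				// Assumption: other nodes without an instance ID are still rejected.
				return fmt.Errorf("node %q has no instance ID", n.Name)
			}
			// Nothing to terminate; only give back the requested capacity.
			np.shrink()
			continue
		}
		// Provisioned node: record the termination and shrink the pool.
		np.deleted = append(np.deleted, n.InstanceID)
		np.shrink()
	}
	return nil
}

func main() {
	np := &nodePool{targetSize: 3}
	failed := placeholderNode{Name: "placeholder-1", InErrorState: true}
	err := np.DeleteNodes([]placeholderNode{failed})
	fmt.Println(err, np.targetSize)
}
```

In this sketch the failed placeholder lowers the target size from 3 to 2 and no instance is terminated, which is what lets the autoscaler move on to a different node pool.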
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: