Commit f29a483

AKS min node improvements Azure#2683 (Azure#2697)
* AKS min node improvements Azure#2683 * Bump docs
1 parent 46f537b commit f29a483

12 files changed

+791
-44
lines changed

.vscode/settings.json

+1
@@ -75,6 +75,7 @@
7575
"GREATEROREQUALS",
7676
"Hashtable",
7777
"inheritdoc",
78+
"konnectivity",
7879
"kube",
7980
"kubelet",
8081
"kubenet",

docs/CHANGELOG-v1.md

+14
@@ -34,6 +34,20 @@ See [upgrade notes][1] for helpful information when upgrading from previous vers
3434

3535
What's changed since v1.33.2:
3636

37+
- New rules:
38+
- Azure Kubernetes Service:
39+
- Check that user mode pools have a minimum number of nodes by @BernieWhite.
40+
[#2683](https://github.com/Azure/PSRule.Rules.Azure/issues/2683)
41+
- Added configuration to support changing the minimum number of nodes and to exclude node pools.
42+
- Set `AZURE_AKS_CLUSTER_USER_POOL_MINIMUM_NODES` to set the minimum number of user nodes.
43+
- Set `AZURE_AKS_CLUSTER_USER_POOL_EXCLUDED_FROM_MINIMUM_NODES` to exclude a specific node pool by name.
44+
- Updated rules:
45+
- Azure Kubernetes Service:
46+
- Updated `Azure.AKS.MinNodeCount` to count nodes in system node pools by @BernieWhite.
47+
[#2683](https://github.com/Azure/PSRule.Rules.Azure/issues/2683)
48+
- Improved guidance and examples specifically for system node pools.
49+
- Added configuration to support changing the minimum number of nodes.
50+
- Set `AZURE_AKS_CLUSTER_MINIMUM_SYSTEM_NODES` to set the minimum number of system nodes.
3751
- Engineering:
3852
- Bump Microsoft.NET.Test.Sdk to v17.9.0.
3953
[#2680](https://github.com/Azure/PSRule.Rules.Azure/pull/2680)
+257-9
@@ -1,29 +1,277 @@
11
---
2+
reviewed: 2024-02-21
23
severity: Important
34
pillar: Reliability
4-
category: Load balancing and failover
5+
category: RE:05 Redundancy
56
resource: Azure Kubernetes Service
67
online version: https://azure.github.io/PSRule.Rules.Azure/en/rules/Azure.AKS.MinNodeCount/
78
ms-content-id: 320afea5-5c19-45ad-b9a5-c1a63ae6e114
89
---
910

10-
# Azure.AKS.MinNodeCount
11+
# Minimum number of system nodes in an AKS cluster
1112

1213
## SYNOPSIS
1314

14-
AKS clusters should have minimum number of nodes for failover and updates.
15+
AKS clusters should have a minimum number of system nodes for failover and updates.
1516

1617
## DESCRIPTION
1718

18-
Kubernetes clusters should have minimum number of three (3) nodes for high availability and planned maintenance.
19+
Azure Kubernetes Service (AKS) clusters support multiple nodes and node pools.
20+
Each node is a virtual machine (VM) that runs Kubernetes components and a container runtime.
21+
A node pool is a grouping of nodes that run the same configuration.
22+
Application or system pods can be scheduled to run across multiple nodes to ensure resiliency and high availability.
23+
AKS supports configuring one or more system node pools, and zero or more user node pools.
24+
25+
System node pools are intended for pods that perform important management and infrastructure functions for cluster operation.
26+
This includes CoreDNS, konnectivity, and Azure Policy to name a few.
27+
The number of pods that are scheduled to run on system node pools varies based on the configuration of your cluster.
28+
29+
User node pools are intended for application pods.
30+
In general, schedule application workloads to run on user node pools to avoid disrupting the operation of system pods.
31+
32+
A minimum number of nodes in each node pool should be maintained to ensure resiliency during node failures or disruptions.
33+
Also consider how your nodes are distributed across availability zones when deploying to a supported region.
34+
Keep in mind that adding new nodes to a node pool can take time.
35+
36+
For example, in a three-node node pool:
37+
38+
- If one node fails, about 33% of capacity is lost until a new node is created to replace the failed node.
39+
- The pods running on the failed node may be rescheduled to run on the remaining two nodes if there is enough capacity.
40+
However, a number of factors affect which pods will be scheduled to run on the two remaining nodes.
41+
42+
For example, with two two-node node pools:
43+
44+
- If two two-node node pools are both deployed with availability zones `1` and `2`,
45+
AKS will automatically spread the nodes across the two availability zones as it scales out.
46+
- If availability zone `1` fails, the remaining nodes in availability zone `2` (50% of capacity) will continue to run pods.
47+
- Pods running on the failed nodes in availability zone `1` will be rescheduled if there is enough capacity on the remaining nodes.
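
The capacity figures in the two examples above can be checked with a little arithmetic. The following is a minimal sketch, not part of the rule itself. It assumes every node is the same size and that nodes are spread evenly across zones; the function name `remaining_capacity` is hypothetical.

```python
def remaining_capacity(total_nodes: int, failed_nodes: int) -> float:
    """Return the fraction of node capacity still available after a failure.

    Assumes all nodes are the same size (an illustrative simplification).
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= failed_nodes <= total_nodes:
        raise ValueError("failed_nodes must be between 0 and total_nodes")
    return (total_nodes - failed_nodes) / total_nodes


# Three-node pool, one node fails: about 67% remains, so about 33% is lost.
three_node_pool = remaining_capacity(total_nodes=3, failed_nodes=1)

# Two two-node pools spread evenly over zones 1 and 2, zone 1 fails:
# two of the four nodes remain.
two_pools_zone_failure = remaining_capacity(total_nodes=4, failed_nodes=2)

print(f"{three_node_pool:.0%} {two_pools_zone_failure:.0%}")  # prints "67% 50%"
```

Whether the remaining nodes can actually host the rescheduled pods still depends on pod resource requests and scheduling constraints, which this sketch does not model.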
1948

2049
## RECOMMENDATION
2150

22-
Use at least three (3) agent nodes.
23-
Consider deploying additional nodes as required to provide enough resiliency during nodes failures or planned maintenance.
51+
Consider configuring AKS clusters with at least three (3) agent nodes in system node pools.
52+
53+
## EXAMPLES
54+
55+
### Configure with Azure template
56+
57+
To deploy AKS clusters that pass this rule:
60+
61+
- For a single system mode node pool in `properties.agentPoolProfiles`:
62+
- Set the `minCount` property to at least `3` for node pools with auto-scale. _OR_
63+
- Set the `count` property to at least `3` for node pools without auto-scale. _OR_
64+
- Deploy an additional system mode node pool so the total number of nodes is at least `3` across all pools.
65+
For example, two node pools with `minCount` set to `2` totalling _4_ nodes.
66+
67+
For example:
68+
69+
```json
70+
{
71+
"type": "Microsoft.ContainerService/managedClusters",
72+
"apiVersion": "2023-11-01",
73+
"name": "[parameters('name')]",
74+
"location": "[parameters('location')]",
75+
"identity": {
76+
"type": "UserAssigned",
77+
"userAssignedIdentities": {
78+
"[format('{0}', resourceId('Microsoft.ManagedIdentity/userAssignedIdentities', parameters('identityName')))]": {}
79+
}
80+
},
81+
"properties": {
82+
"kubernetesVersion": "[parameters('kubernetesVersion')]",
83+
"disableLocalAccounts": true,
84+
"enableRBAC": true,
85+
"dnsPrefix": "[parameters('dnsPrefix')]",
86+
"agentPoolProfiles": [
87+
{
88+
"name": "system",
89+
"osDiskSizeGB": 0,
90+
"minCount": 3,
91+
"maxCount": 5,
92+
"enableAutoScaling": true,
93+
"maxPods": 50,
94+
"vmSize": "Standard_D4s_v5",
95+
"type": "VirtualMachineScaleSets",
96+
"vnetSubnetID": "[parameters('clusterSubnetId')]",
97+
"mode": "System",
98+
"osDiskType": "Ephemeral"
99+
},
100+
{
101+
"name": "user",
102+
"osDiskSizeGB": 0,
103+
"minCount": 3,
104+
"maxCount": 20,
105+
"enableAutoScaling": true,
106+
"maxPods": 50,
107+
"vmSize": "Standard_D4s_v5",
108+
"type": "VirtualMachineScaleSets",
109+
"vnetSubnetID": "[parameters('clusterSubnetId')]",
110+
"mode": "User",
111+
"osDiskType": "Ephemeral"
112+
}
113+
],
114+
"aadProfile": {
115+
"managed": true,
116+
"enableAzureRBAC": true,
117+
"adminGroupObjectIDs": "[parameters('clusterAdmins')]",
118+
"tenantID": "[subscription().tenantId]"
119+
},
120+
"networkProfile": {
121+
"networkPlugin": "azure",
122+
"networkPolicy": "azure",
123+
"loadBalancerSku": "standard",
124+
"serviceCidr": "[variables('serviceCidr')]",
125+
"dnsServiceIP": "[variables('dnsServiceIP')]"
126+
},
127+
"apiServerAccessProfile": {
128+
"authorizedIPRanges": [
129+
"0.0.0.0/32"
130+
]
131+
},
132+
"autoUpgradeProfile": {
133+
"upgradeChannel": "stable"
134+
},
135+
"oidcIssuerProfile": {
136+
"enabled": true
137+
},
138+
"addonProfiles": {
139+
"azurepolicy": {
140+
"enabled": true
141+
},
142+
"omsagent": {
143+
"enabled": true,
144+
"config": {
145+
"logAnalyticsWorkspaceResourceID": "[parameters('workspaceId')]"
146+
}
147+
},
148+
"azureKeyvaultSecretsProvider": {
149+
"enabled": true,
150+
"config": {
151+
"enableSecretRotation": "true"
152+
}
153+
}
154+
}
155+
},
156+
"dependsOn": [
157+
"[resourceId('Microsoft.ManagedIdentity/userAssignedIdentities', parameters('identityName'))]"
158+
]
159+
}
160+
```
161+
162+
### Configure with Bicep
163+
164+
To deploy AKS clusters that pass this rule:
165+
166+
- For a single system mode node pool in `properties.agentPoolProfiles`:
167+
- Set the `minCount` property to at least `3` for node pools with auto-scale. _OR_
168+
- Set the `count` property to at least `3` for node pools without auto-scale. _OR_
169+
- Deploy an additional system mode node pool so the total number of nodes is at least `3` across all pools.
170+
For example, two node pools with `minCount` set to `2` totalling _4_ nodes.
171+
172+
For example:
173+
174+
```bicep
175+
resource clusterWithPools 'Microsoft.ContainerService/managedClusters@2023-11-01' = {
176+
location: location
177+
name: name
178+
identity: {
179+
type: 'UserAssigned'
180+
userAssignedIdentities: {
181+
'${identity.id}': {}
182+
}
183+
}
184+
properties: {
185+
kubernetesVersion: kubernetesVersion
186+
disableLocalAccounts: true
187+
enableRBAC: true
188+
dnsPrefix: dnsPrefix
189+
agentPoolProfiles: [
190+
{
191+
name: 'system'
192+
osDiskSizeGB: 0
193+
minCount: 3
194+
maxCount: 5
195+
enableAutoScaling: true
196+
maxPods: 50
197+
vmSize: 'Standard_D4s_v5'
198+
type: 'VirtualMachineScaleSets'
199+
vnetSubnetID: clusterSubnetId
200+
mode: 'System'
201+
osDiskType: 'Ephemeral'
202+
}
203+
{
204+
name: 'user'
205+
osDiskSizeGB: 0
206+
minCount: 3
207+
maxCount: 20
208+
enableAutoScaling: true
209+
maxPods: 50
210+
vmSize: 'Standard_D4s_v5'
211+
type: 'VirtualMachineScaleSets'
212+
vnetSubnetID: clusterSubnetId
213+
mode: 'User'
214+
osDiskType: 'Ephemeral'
215+
}
216+
]
217+
aadProfile: {
218+
managed: true
219+
enableAzureRBAC: true
220+
adminGroupObjectIDs: clusterAdmins
221+
tenantID: subscription().tenantId
222+
}
223+
networkProfile: {
224+
networkPlugin: 'azure'
225+
networkPolicy: 'azure'
226+
loadBalancerSku: 'standard'
227+
serviceCidr: serviceCidr
228+
dnsServiceIP: dnsServiceIP
229+
}
230+
apiServerAccessProfile: {
231+
authorizedIPRanges: [
232+
'0.0.0.0/32'
233+
]
234+
}
235+
autoUpgradeProfile: {
236+
upgradeChannel: 'stable'
237+
}
238+
oidcIssuerProfile: {
239+
enabled: true
240+
}
241+
addonProfiles: {
242+
azurepolicy: {
243+
enabled: true
244+
}
245+
omsagent: {
246+
enabled: true
247+
config: {
248+
logAnalyticsWorkspaceResourceID: workspaceId
249+
}
250+
}
251+
azureKeyvaultSecretsProvider: {
252+
enabled: true
253+
config: {
254+
enableSecretRotation: 'true'
255+
}
256+
}
257+
}
258+
}
259+
}
260+
```
261+
262+
## NOTES
263+
264+
### Rule configuration
265+
266+
<!-- module:config rule AZURE_AKS_CLUSTER_MINIMUM_SYSTEM_NODES -->
267+
268+
This rule fails by default if you have fewer than three (3) nodes in the cluster across all system node pools.
269+
To change the default, set the `AZURE_AKS_CLUSTER_MINIMUM_SYSTEM_NODES` configuration option.
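
The counting described above can be illustrated with a short sketch. This is not the rule's actual implementation. It is a hypothetical Python approximation that assumes the guidance in this document: `minCount` is used for pools with auto-scale, `count` otherwise, and only pools with `mode` set to `System` are tallied. The function names are made up for illustration, and the `minimum` parameter stands in for the `AZURE_AKS_CLUSTER_MINIMUM_SYSTEM_NODES` option.

```python
from typing import Any


def count_system_nodes(pools: list[dict[str, Any]]) -> int:
    """Sum the guaranteed node count across system mode node pools."""
    total = 0
    for pool in pools:
        # Assumption: pools without an explicit System mode are ignored.
        if str(pool.get("mode", "")).lower() != "system":
            continue
        if pool.get("enableAutoScaling"):
            # With auto-scale, only the minimum is guaranteed.
            total += int(pool.get("minCount", 0))
        else:
            total += int(pool.get("count", 0))
    return total


def passes_minimum_system_nodes(pools: list[dict[str, Any]], minimum: int = 3) -> bool:
    """Return True when the system node total meets the configured minimum."""
    return count_system_nodes(pools) >= minimum


# Two system pools with minCount 2 total 4 nodes; the user pool is not counted.
pools = [
    {"name": "system1", "mode": "System", "enableAutoScaling": True, "minCount": 2},
    {"name": "system2", "mode": "System", "enableAutoScaling": True, "minCount": 2},
    {"name": "user", "mode": "User", "enableAutoScaling": True, "minCount": 3},
]
print(count_system_nodes(pools), passes_minimum_system_nodes(pools))  # prints "4 True"
```

Passing a different `minimum` mirrors overriding the default of three with the configuration option.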
24270

25271
## LINKS
26272

27-
- [Baseline architecture for an Azure Kubernetes Service (AKS) cluster](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks/secure-baseline-aks)
28-
- [Create an AKS cluster](https://docs.microsoft.com/azure/aks/use-multiple-node-pools#create-an-aks-cluster)
29-
- [Azure deployment reference](https://docs.microsoft.com/azure/templates/microsoft.containerservice/managedclusters)
273+
- [RE:05 Redundancy](https://learn.microsoft.com/azure/well-architected/reliability/redundancy)
274+
- [Azure Well-Architected Framework review - Azure Kubernetes Service (AKS)](https://learn.microsoft.com/azure/well-architected/service-guides/azure-kubernetes-service)
275+
- [Manage node pools for a cluster in Azure Kubernetes Service (AKS)](https://learn.microsoft.com/azure/aks/manage-node-pools)
276+
- [Manage system node pools in Azure Kubernetes Service (AKS)](https://learn.microsoft.com/azure/aks/use-system-pools)
277+
- [Azure deployment reference](https://learn.microsoft.com/azure/templates/microsoft.containerservice/managedclusters)
