1 | 1 | ---
| 2 | +reviewed: 2024-02-21 |
2 | 3 | severity: Important
3 | 4 | pillar: Reliability
4 |
| -category: Load balancing and failover |
| 5 | +category: RE:05 Redundancy |
5 | 6 | resource: Azure Kubernetes Service
6 | 7 | online version: https://azure.github.io/PSRule.Rules.Azure/en/rules/Azure.AKS.MinNodeCount/
7 | 8 | ms-content-id: 320afea5-5c19-45ad-b9a5-c1a63ae6e114
8 | 9 | ---
9 | 10 |
10 |
| -# Azure.AKS.MinNodeCount |
| 11 | +# Minimum number of system nodes in an AKS cluster |
11 | 12 |
12 | 13 | ## SYNOPSIS
13 | 14 |
14 |
| -AKS clusters should have minimum number of nodes for failover and updates. |
| 15 | +AKS clusters should have a minimum number of system nodes for failover and updates. |
15 | 16 |
16 | 17 | ## DESCRIPTION
17 | 18 |
18 |
| -Kubernetes clusters should have minimum number of three (3) nodes for high availability and planned maintenance. |
| 19 | +Azure Kubernetes Service (AKS) clusters support multiple nodes and node pools. |
| 20 | +Each node is a virtual machine (VM) that runs Kubernetes components and a container runtime. |
| 21 | +A node pool is a grouping of nodes that run the same configuration. |
| 22 | +Application or system pods can be scheduled to run across multiple nodes to ensure resiliency and high availability. |
| 23 | +AKS supports configuring one or more system node pools, and zero or more user node pools. |
| 24 | + |
| 25 | +System node pools are intended for pods that perform important management and infrastructure functions for cluster operation. |
| 26 | +These include CoreDNS, konnectivity, and Azure Policy, to name a few. |
| 27 | +The number of pods that are scheduled to run on system node pools varies based on the configuration of your cluster. |
| 28 | + |
| 29 | +User node pools are intended for application pods. |
| 30 | +In general, schedule application workloads to run on user node pools to avoid disrupting the operation of system pods. |
| 31 | + |
| 32 | +A minimum number of nodes in each node pool should be maintained to ensure resiliency during node failures or disruptions. |
| 33 | +Also consider how your nodes are distributed across availability zones when deploying to a supported region. |
| 34 | +Keep in mind that adding new nodes to a node pool can take time. |
| 35 | + |
| 36 | +For example, in a three-node node pool: |
| 37 | + |
| 38 | +- If one node fails, ~33% of capacity is lost until a new node is created to replace the failed node. |
| 39 | +- The pods running on the failed node may be rescheduled to run on the remaining two nodes if there is enough capacity. |
| 40 | + However, a number of factors affect which pods will be scheduled to run on the two remaining nodes. |
| 41 | + |
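The capacity arithmetic in the three-node example can be checked with a short Python sketch. It assumes every node in the pool is the same VM size and that load is spread evenly; the function names are illustrative and are not part of AKS or PSRule. The second function is a deliberately simplified aggregate check that ignores pod-level scheduling factors such as resource requests, affinity rules, and taints.

```python
def capacity_lost(total_nodes: int, failed_nodes: int = 1) -> float:
    """Fraction of node pool capacity lost (assumes equal-sized nodes)."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= failed_nodes <= total_nodes:
        raise ValueError("failed_nodes must be between 0 and total_nodes")
    return failed_nodes / total_nodes


def can_absorb_one_failure(total_nodes: int, utilization: float) -> bool:
    """Rough check: can the surviving nodes take over the load of one failed node?

    Assumes equal-sized, evenly loaded nodes where `utilization` is the average
    fraction of each node's capacity in use. Ignores pod-level scheduling
    constraints, so a True result does not guarantee every pod is rescheduled.
    """
    if total_nodes < 2:
        return False  # No surviving node to reschedule onto.
    # Total load (total_nodes * utilization) must fit on total_nodes - 1 nodes.
    return total_nodes * utilization <= total_nodes - 1


for nodes in (2, 3, 5):
    print(f"{nodes} nodes: {capacity_lost(nodes):.0%} of capacity lost if one node fails")

print(can_absorb_one_failure(3, 0.6))  # True: 1.8 nodes of load fit on 2 nodes
print(can_absorb_one_failure(3, 0.7))  # False: 2.1 nodes of load exceed 2 nodes
```

Under these assumptions, a three-node pool can only absorb a single node failure while average utilization stays at or below roughly two thirds.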
| 42 | +For example, with two node pools of two nodes each: |
| 43 | + |
| 44 | +- If both two-node pools are deployed with availability zones `1` and `2`, |
| 45 | + AKS automatically spreads the nodes across the two availability zones as it scales out. |
| 46 | +- If availability zone `1` fails, the remaining nodes in availability zone `2` (50% of capacity) continue to run pods. |
| 47 | +- Pods that were running on the failed nodes in availability zone `1` are rescheduled onto the remaining nodes, provided there is enough capacity. |
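The zone example follows the same arithmetic, except that a zone failure removes every node in that zone at once. The Python sketch below assumes equal-sized nodes and takes a hypothetical list recording the availability zone of each node; it is an illustration only, not an AKS API.

```python
def remaining_capacity(node_zones: list[str], failed_zone: str) -> float:
    """Fraction of nodes still running after every node in `failed_zone` is lost.

    `node_zones` holds the availability zone of each node; nodes are assumed
    to be the same VM size, so node count stands in for capacity.
    """
    if not node_zones:
        raise ValueError("node_zones must not be empty")
    return 1 - node_zones.count(failed_zone) / len(node_zones)


# Two two-node pools, each spread across zones 1 and 2 (four nodes in total).
two_zone_nodes = ["1", "2"] + ["1", "2"]
print(f"{remaining_capacity(two_zone_nodes, '1'):.0%}")  # 50%

# The same four nodes placed in a single zone lose all capacity if that zone fails.
print(f"{remaining_capacity(['1', '1', '1', '1'], '1'):.0%}")  # 0%
```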
19 | 48 |
20 | 49 | ## RECOMMENDATION
21 | 50 |
22 |
| -Use at least three (3) agent nodes. |
23 |
| -Consider deploying additional nodes as required to provide enough resiliency during nodes failures or planned maintenance. |
| 51 | +Consider configuring AKS clusters with at least three (3) agent nodes in system node pools. |
| 52 | + |
| 53 | +## EXAMPLES |
| 54 | + |
| 55 | +### Configure with Azure template |
| 56 | + |
| 57 | +To deploy AKS clusters that pass this rule: |
| 58 | + |
| 61 | +- For a single system mode node pool `properties.agentPoolProfiles`: |
| 62 | + - Set the `minCount` property to at least `3` for node pools with auto-scale. _OR_ |
| 63 | + - Set the `count` property to at least `3` for node pools without auto-scale. _OR_ |
| 64 | +- Deploy an additional system mode node pool so the total number of nodes is at least `3` across all pools. |
| 65 | + For example, two node pools with `minCount` set to `2` totalling _4_ nodes. |
| 66 | + |
| 67 | +For example: |
| 68 | + |
| 69 | +```json |
| 70 | +{ |
| 71 | + "type": "Microsoft.ContainerService/managedClusters", |
| 72 | + "apiVersion": "2023-11-01", |
| 73 | + "name": "[parameters('name')]", |
| 74 | + "location": "[parameters('location')]", |
| 75 | + "identity": { |
| 76 | + "type": "UserAssigned", |
| 77 | + "userAssignedIdentities": { |
| 78 | + "[format('{0}', resourceId('Microsoft.ManagedIdentity/userAssignedIdentities', parameters('identityName')))]": {} |
| 79 | + } |
| 80 | + }, |
| 81 | + "properties": { |
| 82 | + "kubernetesVersion": "[parameters('kubernetesVersion')]", |
| 83 | + "disableLocalAccounts": true, |
| 84 | + "enableRBAC": true, |
| 85 | + "dnsPrefix": "[parameters('dnsPrefix')]", |
| 86 | + "agentPoolProfiles": [ |
| 87 | + { |
| 88 | + "name": "system", |
| 89 | + "osDiskSizeGB": 0, |
| 90 | + "minCount": 3, |
| 91 | + "maxCount": 5, |
| 92 | + "enableAutoScaling": true, |
| 93 | + "maxPods": 50, |
| 94 | + "vmSize": "Standard_D4s_v5", |
| 95 | + "type": "VirtualMachineScaleSets", |
| 96 | + "vnetSubnetID": "[parameters('clusterSubnetId')]", |
| 97 | + "mode": "System", |
| 98 | + "osDiskType": "Ephemeral" |
| 99 | + }, |
| 100 | + { |
| 101 | + "name": "user", |
| 102 | + "osDiskSizeGB": 0, |
| 103 | + "minCount": 3, |
| 104 | + "maxCount": 20, |
| 105 | + "enableAutoScaling": true, |
| 106 | + "maxPods": 50, |
| 107 | + "vmSize": "Standard_D4s_v5", |
| 108 | + "type": "VirtualMachineScaleSets", |
| 109 | + "vnetSubnetID": "[parameters('clusterSubnetId')]", |
| 110 | + "mode": "User", |
| 111 | + "osDiskType": "Ephemeral" |
| 112 | + } |
| 113 | + ], |
| 114 | + "aadProfile": { |
| 115 | + "managed": true, |
| 116 | + "enableAzureRBAC": true, |
| 117 | + "adminGroupObjectIDs": "[parameters('clusterAdmins')]", |
| 118 | + "tenantID": "[subscription().tenantId]" |
| 119 | + }, |
| 120 | + "networkProfile": { |
| 121 | + "networkPlugin": "azure", |
| 122 | + "networkPolicy": "azure", |
| 123 | + "loadBalancerSku": "standard", |
| 124 | + "serviceCidr": "[variables('serviceCidr')]", |
| 125 | + "dnsServiceIP": "[variables('dnsServiceIP')]" |
| 126 | + }, |
| 127 | + "apiServerAccessProfile": { |
| 128 | + "authorizedIPRanges": [ |
| 129 | + "0.0.0.0/32" |
| 130 | + ] |
| 131 | + }, |
| 132 | + "autoUpgradeProfile": { |
| 133 | + "upgradeChannel": "stable" |
| 134 | + }, |
| 135 | + "oidcIssuerProfile": { |
| 136 | + "enabled": true |
| 137 | + }, |
| 138 | + "addonProfiles": { |
| 139 | + "azurepolicy": { |
| 140 | + "enabled": true |
| 141 | + }, |
| 142 | + "omsagent": { |
| 143 | + "enabled": true, |
| 144 | + "config": { |
| 145 | + "logAnalyticsWorkspaceResourceID": "[parameters('workspaceId')]" |
| 146 | + } |
| 147 | + }, |
| 148 | + "azureKeyvaultSecretsProvider": { |
| 149 | + "enabled": true, |
| 150 | + "config": { |
| 151 | + "enableSecretRotation": "true" |
| 152 | + } |
| 153 | + } |
| 154 | + } |
| 155 | + }, |
| 156 | + "dependsOn": [ |
| 157 | + "[resourceId('Microsoft.ManagedIdentity/userAssignedIdentities', parameters('identityName'))]" |
| 158 | + ] |
| 159 | +} |
| 160 | +``` |
| 161 | + |
| 162 | +### Configure with Bicep |
| 163 | + |
| 164 | +To deploy AKS clusters that pass this rule: |
| 165 | + |
| 166 | +- For a single system mode node pool `properties.agentPoolProfiles`: |
| 167 | + - Set the `minCount` property to at least `3` for node pools with auto-scale. _OR_ |
| 168 | + - Set the `count` property to at least `3` for node pools without auto-scale. _OR_ |
| 169 | +- Deploy an additional system mode node pool so the total number of nodes is at least `3` across all pools. |
| 170 | + For example, two node pools with `minCount` set to `2` totalling _4_ nodes. |
| 171 | + |
| 172 | +For example: |
| 173 | + |
| 174 | +```bicep |
| 175 | +resource clusterWithPools 'Microsoft.ContainerService/managedClusters@2023-11-01' = { |
| 176 | + location: location |
| 177 | + name: name |
| 178 | + identity: { |
| 179 | + type: 'UserAssigned' |
| 180 | + userAssignedIdentities: { |
| 181 | + '${identity.id}': {} |
| 182 | + } |
| 183 | + } |
| 184 | + properties: { |
| 185 | + kubernetesVersion: kubernetesVersion |
| 186 | + disableLocalAccounts: true |
| 187 | + enableRBAC: true |
| 188 | + dnsPrefix: dnsPrefix |
| 189 | + agentPoolProfiles: [ |
| 190 | + { |
| 191 | + name: 'system' |
| 192 | + osDiskSizeGB: 0 |
| 193 | + minCount: 3 |
| 194 | + maxCount: 5 |
| 195 | + enableAutoScaling: true |
| 196 | + maxPods: 50 |
| 197 | + vmSize: 'Standard_D4s_v5' |
| 198 | + type: 'VirtualMachineScaleSets' |
| 199 | + vnetSubnetID: clusterSubnetId |
| 200 | + mode: 'System' |
| 201 | + osDiskType: 'Ephemeral' |
| 202 | + } |
| 203 | + { |
| 204 | + name: 'user' |
| 205 | + osDiskSizeGB: 0 |
| 206 | + minCount: 3 |
| 207 | + maxCount: 20 |
| 208 | + enableAutoScaling: true |
| 209 | + maxPods: 50 |
| 210 | + vmSize: 'Standard_D4s_v5' |
| 211 | + type: 'VirtualMachineScaleSets' |
| 212 | + vnetSubnetID: clusterSubnetId |
| 213 | + mode: 'User' |
| 214 | + osDiskType: 'Ephemeral' |
| 215 | + } |
| 216 | + ] |
| 217 | + aadProfile: { |
| 218 | + managed: true |
| 219 | + enableAzureRBAC: true |
| 220 | + adminGroupObjectIDs: clusterAdmins |
| 221 | + tenantID: subscription().tenantId |
| 222 | + } |
| 223 | + networkProfile: { |
| 224 | + networkPlugin: 'azure' |
| 225 | + networkPolicy: 'azure' |
| 226 | + loadBalancerSku: 'standard' |
| 227 | + serviceCidr: serviceCidr |
| 228 | + dnsServiceIP: dnsServiceIP |
| 229 | + } |
| 230 | + apiServerAccessProfile: { |
| 231 | + authorizedIPRanges: [ |
| 232 | + '0.0.0.0/32' |
| 233 | + ] |
| 234 | + } |
| 235 | + autoUpgradeProfile: { |
| 236 | + upgradeChannel: 'stable' |
| 237 | + } |
| 238 | + oidcIssuerProfile: { |
| 239 | + enabled: true |
| 240 | + } |
| 241 | + addonProfiles: { |
| 242 | + azurepolicy: { |
| 243 | + enabled: true |
| 244 | + } |
| 245 | + omsagent: { |
| 246 | + enabled: true |
| 247 | + config: { |
| 248 | + logAnalyticsWorkspaceResourceID: workspaceId |
| 249 | + } |
| 250 | + } |
| 251 | + azureKeyvaultSecretsProvider: { |
| 252 | + enabled: true |
| 253 | + config: { |
| 254 | + enableSecretRotation: 'true' |
| 255 | + } |
| 256 | + } |
| 257 | + } |
| 258 | + } |
| 259 | +} |
| 260 | +``` |
| 261 | + |
| 262 | +## NOTES |
| 263 | + |
| 264 | +### Rule configuration |
| 265 | + |
| 266 | +<!-- module:config rule AZURE_AKS_CLUSTER_MINIMUM_SYSTEM_NODES --> |
| 267 | + |
| 268 | +This rule fails by default if you have fewer than three (3) nodes in the cluster across all system node pools. |
| 269 | +To change the default, set the `AZURE_AKS_CLUSTER_MINIMUM_SYSTEM_NODES` configuration option. |
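The pass criteria listed under the examples, combined with this configurable threshold, can be sketched in Python as follows. This is a hypothetical illustration, not the PSRule for Azure implementation: it only inspects `properties.agentPoolProfiles` (node pools deployed as separate `agentPools` resources are ignored), treats pools whose `mode` is not `System` as user pools, and counts a missing `minCount` or `count` as zero. The `minimum` parameter stands in for the `AZURE_AKS_CLUSTER_MINIMUM_SYSTEM_NODES` option.

```python
from typing import Any

DEFAULT_MINIMUM_SYSTEM_NODES = 3  # Default threshold described for this rule.


def system_node_count(agent_pool_profiles: list[dict[str, Any]]) -> int:
    """Sum the guaranteed node count across all pools with mode 'System'.

    Auto-scaling pools contribute `minCount`; other pools contribute `count`.
    Missing values are treated as 0 (an assumption of this sketch).
    """
    total = 0
    for pool in agent_pool_profiles:
        if str(pool.get("mode", "")).lower() != "system":
            continue  # User pools do not count towards the minimum.
        key = "minCount" if pool.get("enableAutoScaling") else "count"
        total += int(pool.get(key) or 0)
    return total


def passes_min_node_count(
    agent_pool_profiles: list[dict[str, Any]],
    minimum: int = DEFAULT_MINIMUM_SYSTEM_NODES,
) -> bool:
    """Return True when system node pools provide at least `minimum` nodes."""
    return system_node_count(agent_pool_profiles) >= minimum


# Mirrors the node pools in the template examples above.
example = [
    {"name": "system", "mode": "System", "enableAutoScaling": True, "minCount": 3, "maxCount": 5},
    {"name": "user", "mode": "User", "enableAutoScaling": True, "minCount": 3, "maxCount": 20},
]
print(passes_min_node_count(example))  # True

single = [{"name": "system", "mode": "System", "count": 1}]
print(passes_min_node_count(single))  # False
print(passes_min_node_count(single, minimum=1))  # True
```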
24 | 270 |
25 | 271 | ## LINKS
26 | 272 |
27 |
| -- [Baseline architecture for an Azure Kubernetes Service (AKS) cluster](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks/secure-baseline-aks) |
28 |
| -- [Create an AKS cluster](https://docs.microsoft.com/azure/aks/use-multiple-node-pools#create-an-aks-cluster) |
29 |
| -- [Azure deployment reference](https://docs.microsoft.com/azure/templates/microsoft.containerservice/managedclusters) |
| 273 | +- [RE:05 Redundancy](https://learn.microsoft.com/azure/well-architected/reliability/redundancy) |
| 274 | +- [Azure Well-Architected Framework review - Azure Kubernetes Service (AKS)](https://learn.microsoft.com/azure/well-architected/service-guides/azure-kubernetes-service) |
| 275 | +- [Manage node pools for a cluster in Azure Kubernetes Service (AKS)](https://learn.microsoft.com/azure/aks/manage-node-pools) |
| 276 | +- [Manage system node pools in Azure Kubernetes Service (AKS)](https://learn.microsoft.com/azure/aks/use-system-pools) |
| 277 | +- [Azure deployment reference](https://learn.microsoft.com/azure/templates/microsoft.containerservice/managedclusters) |