Node autoprovisioning (preview)

When you deploy workloads onto AKS, you need to make a decision about the node pool configuration regarding the VM size needed. As your workloads become more complex, and require different CPU, memory, and capabilities to run, the overhead of having to design your VM configuration for numerous resource requests becomes difficult.

Node autoprovisioning (NAP) (preview) decides based on pending pod resource requirements the optimal VM configuration to run those workloads in the most efficient and cost effective manner.

NAP is based on the Open Source Karpenter project, and the AKS provider is also Open Source. NAP automatically deploys and configures and manages Karpenter on your AKS clusters.

Important

Node autoprovisioning (NAP) for AKS is currently in PREVIEW. See the Supplemental Terms of Use for Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.

Before you begin

Install the aks-preview CLI extension

  1. Install the aks-preview CLI extension using the az extension add command.

    az extension add --name aks-preview
    
  2. Update the extension to ensure you have the latest version installed using the az extension update command.

    az extension update --name aks-preview
    

Register the NodeAutoProvisioningPreview feature flag

  1. Register the NodeAutoProvisioningPreview feature flag using the az feature register command.

    az feature register --namespace "Microsoft.ContainerService" --name "NodeAutoProvisioningPreview"
    

    It takes a few minutes for the status to show Registered.

  2. Verify the registration status using the az feature show command.

    az feature show --namespace "Microsoft.ContainerService" --name "NodeAutoProvisioningPreview"
    
  3. When the status reflects Registered, refresh the registration of the Microsoft.ContainerService resource provider using the az provider register command.

    az provider register --namespace Microsoft.ContainerService
    

Limitations

  • The only network configuration allowed is Cilium + Overlay + Azure
  • You can't enable in a cluster where node pools have cluster autoscaler enabled

Unsupported features

  • Windows node pools
  • Applying custom configuration to the node kubelet
  • IPv6 clusters
  • Service Principals

    Note

    You can use either a system-assigned or user-assigned managed identity.

  • Disk encryption sets
  • CustomCATrustCertificates
  • Start Stop mode
  • HTTP proxy
  • OutboundType mutation. All OutboundTypes are supported, however you can't change them after creation.
  • Private cluster (and BYO private DNS)

Enable node autoprovisioning

Enable node autoprovisioning on a new cluster

  • Enable node autoprovisioning on a new cluster using the az aks create command and set --node-provisioning-mode to Auto. You also need to set the --network-plugin to azure, --network-plugin-mode to overlay, and --network-dataplane to cilium.

    az aks create \
        --name $CLUSTER_NAME \
        --resource-group $RESOURCE_GROUP_NAME \
        --node-provisioning-mode Auto \
        --network-plugin azure \
        --network-plugin-mode overlay \
        --network-dataplane cilium \
        --generate-ssh-keys
    

Enable node autoprovisioning on an existing cluster

  • Enable node autoprovisioning on an existing cluster using the az aks update command and set --node-provisioning-mode to Auto. You also need to set the --network-plugin to azure, --network-plugin-mode to overlay, and --network-dataplane to cilium.

    az aks update --name $CLUSTER_NAME --resource-group $RESOURCE_GROUP_NAME --node-provisioning-mode Auto --network-plugin azure --network-plugin-mode overlay --network-dataplane cilium
    

Node pools

Node autoprovision uses a list of VM SKUs as a starting point to decide which is best suited for the workloads that are in a pending state. Having control over what SKU you want in the initial pool allows you to specify specific SKU families, or VM types and the maximum amount of resources a provisioner uses.

If you have specific VM SKUs that are reserved instances, for example, you may wish to only use those VMs as the starting pool.

You can have multiple node pool definitions in a cluster, but AKS deploys a default node pool definition that you can modify:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: Never
  template:
    spec:
      nodeClassRef:
        name: default

      # Requirements that constrain the parameters of provisioned nodes.
      # These requirements are combined with pod.spec.affinity.nodeAffinity rules.
      # Operators { In, NotIn, Exists, DoesNotExist, Gt, and Lt } are supported.
      # https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#operators
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: kubernetes.io/os
        operator: In
        values:
        - linux
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - on-demand
      - key: karpenter.azure.com/sku-family
        operator: In
        values:
        - D

Supported node provisioner requirements

SKU selectors with well known labels

Selector Description Example
karpenter.azure.com/sku-family VM SKU Family D, F, L etc.
karpenter.azure.com/sku-name Explicit SKU name Standard_A1_v2
karpenter.azure.com/sku-version SKU version (without "v", can use 1) 1 , 2
karpenter.sh/capacity-type VM allocation type (Spot / On Demand) spot or on-demand
karpenter.azure.com/sku-cpu Number of CPUs in VM 16
karpenter.azure.com/sku-memory Memory in VM in MiB 131072
karpenter.azure.com/sku-gpu-name GPU name A100
karpenter.azure.com/sku-gpu-manufacturer GPU manufacturer nvidia
karpenter.azure.com/sku-gpu-count GPU count per VM 2
karpenter.azure.com/sku-networking-accelerated Whether the VM has accelerated networking [true, false]
karpenter.azure.com/sku-storage-premium-capable Whether the VM supports Premium IO storage [true, false]
karpenter.azure.com/sku-storage-ephemeralos-maxsize Size limit for the Ephemeral OS disk in Gb 92
topology.kubernetes.io/zone The Availability Zone(s) [chinaeast2,chinanorth3]
kubernetes.io/os Operating System (Linux only during preview) linux
kubernetes.io/arch CPU architecture (AMD64 or ARM64) [amd64, arm64]

To list the VM SKU capabilities and allowed values, use the vm list-skus command from the Azure CLI.

az vm list-skus --resource-type virtualMachines --location <location> --query '[].name' --output table

Node pool limits

By default, NAP attempts to schedule your workloads within the Azure quota you have available. You can also specify the upper limit of resources that is used by a node pool, specifying limits within the node pool spec.

  # Resource limits constrain the total size of the cluster.
  # Limits prevent Karpenter from creating new instances once the limit is exceeded.
  limits:
    cpu: "1000"
    memory: 1000Gi

Node pool weights

When you have multiple node pools defined, it's possible to set a preference of where a workload should be scheduled. Define the relative weight on your Node pool definitions.

  # Priority given to the node pool when the scheduler considers which to select. Higher weights indicate higher priority when comparing node pools.
  # Specifying no weight is equivalent to specifying a weight of 0.
  weight: 10

Kubernetes and node image updates

AKS with NAP manages the Kubernetes version upgrades and VM OS disk updates for you by default.

Kubernetes upgrades

Kubernetes upgrades for NAP node pools follows the Control Plane Kubernetes version. If you perform a cluster upgrade, your NAP nodes are updated automatically to follow the same versioning.

Node image updates

By default NAP node pool virtual machines are automatically updated when a new image is available. If you wish to pin a node pool at a certain node image version, you can set the imageVersion on the node class:

kubectl edit aksnodeclass default

Within the node class definition, set the imageVersion to one of the published releases listed on the AKS Release notes. You can also see the availability of images in regions by referring to the AKS release tracker

The imageVersion is the date portion on the Node Image as only Ubuntu 22.04 is supported, for example, "AKSUbuntu-2204-202311.07.0" would be "202311.07.0"

apiVersion: karpenter.azure.com/v1alpha2
kind: AKSNodeClass
metadata:
  annotations:
    kubernetes.io/description: General purpose AKSNodeClass for running Ubuntu2204
      nodes
    meta.helm.sh/release-name: aks-managed-karpenter-overlay
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2023-11-16T23:59:06Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
    helm.toolkit.fluxcd.io/name: karpenter-overlay-main-adapter-helmrelease
    helm.toolkit.fluxcd.io/namespace: 6556abcb92c4ce0001202e78
  name: default
  resourceVersion: "1792"
  uid: 929a5b07-558f-4649-b78b-eb25e9b97076
spec:
  imageFamily: Ubuntu2204
  imageVersion: 202311.07.0
  osDiskSizeGB: 128

Removing the imageVersion spec would revert the node pool to be updated to the latest node image version.

Node disruption

When the workloads on your nodes scale down, NAP uses disruption rules on the Node pool specification to decide when and how to remove those nodes and potentially reschedule your workloads to be more efficient.

You can remove a node manually using kubectl delete node, but NAP can also control when it should optimize your nodes.

  disruption:
    # Describes which types of Nodes NAP should consider for consolidation
    consolidationPolicy: WhenUnderutilized | WhenEmpty
    # 'WhenUnderutilized', NAP will consider all nodes for consolidation and attempt to remove or replace Nodes when it discovers that the Node is underutilized and could be changed to reduce cost

    #  `WhenEmpty`, NAP will only consider nodes for consolidation that contain no workload pods
    
    # The amount of time NAP should wait after discovering a consolidation decision
    # This value can currently only be set when the consolidationPolicy is 'WhenEmpty'
    # You can choose to disable consolidation entirely by setting the string value 'Never'
    consolidateAfter: 30s

Monitoring selection events

Node autoprovision produces cluster events that can be used to monitor deployment and scheduling decisions being made. You can view events through the Kubernetes events stream.

kubectl get events -A --field-selector source=karpenter -w