Connect to Azure Kubernetes Service (AKS) cluster nodes for maintenance or troubleshooting

Throughout the lifecycle of your Azure Kubernetes Service (AKS) cluster, you might eventually need to access an AKS node directly. This access could be for maintenance, log collection, or troubleshooting operations.

You access a node through authentication, and the available methods vary depending on your node OS and connection method. This article discusses two options for securely authenticating against AKS Linux and Windows nodes. One requires Kubernetes API access, and the other goes through the AKS ARM API, which provides direct private IP information. For security reasons, AKS nodes aren't exposed to the internet. Instead, to connect directly to any AKS node, you need to use either kubectl debug or the host's private IP address.

Access nodes using the Kubernetes API

This method requires the kubectl debug command.

Before you begin

This guide shows you how to create a connection to an AKS node and update the SSH key of your AKS cluster. To follow the steps, you need Azure CLI version 2.0.64 or later. Run az --version to check the version. If you need to install or upgrade, see Install Azure CLI.
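As a rough sketch of the version check, the following shell function compares two dotted version strings. The function name version_at_least is hypothetical, and the sketch assumes a sort command that supports the -V (version sort) option, as GNU coreutils does. Read your installed version from the az --version output and pass it as the first argument.

```shell
# Succeeds (exit status 0) when version $1 is greater than or equal to version $2.
# Assumption: sort supports -V (version sort), as in GNU coreutils.
version_at_least() {
  # After a version sort, the smaller version comes first.
  # $1 meets the minimum when that smallest entry is the minimum itself.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

For example, version_at_least 2.50.0 2.0.64 succeeds, while version_at_least 2.0.30 2.0.64 fails.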

Complete these steps if you don't have an SSH key. Create an SSH key by following the instructions for macOS and Linux or for Windows, depending on your node OS image. Make sure you save the key pair in the OpenSSH format and avoid unsupported formats such as .ppk. Next, refer to Manage SSH configuration to add the key to your cluster.

Linux and macOS

Linux and macOS users can access their nodes using kubectl debug or the node's private IP address. Windows users should skip to the Windows Server Proxy section for a workaround to SSH via proxy.

Connect using kubectl debug

To create an interactive shell connection, use the kubectl debug command to run a privileged container on your node.

  1. To list your nodes, use the kubectl get nodes command:

    kubectl get nodes -o wide
    

    Sample output:

    NAME                                STATUS   ROLES   AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE
    aks-nodepool1-37663765-vmss000000   Ready    agent   166m   v1.25.6   10.224.0.33   <none>        Ubuntu 22.04.2 LTS
    aks-nodepool1-37663765-vmss000001   Ready    agent   166m   v1.25.6   10.224.0.4    <none>        Ubuntu 22.04.2 LTS
    aksnpwin000000                      Ready    agent   160m   v1.25.6   10.224.0.62   <none>        Windows Server 2022 Datacenter
    
  2. Use the kubectl debug command to start a privileged container on your node and connect to it.

    kubectl debug node/aks-nodepool1-37663765-vmss000000 -it --image=mcr.azk8s.cn/cbl-mariner/busybox:2.0
    

    Sample output:

    Creating debugging pod node-debugger-aks-nodepool1-37663765-vmss000000-bkmmx with container debugger on node aks-nodepool1-37663765-vmss000000.
    If you don't see a command prompt, try pressing enter.
    root@aks-nodepool1-37663765-vmss000000:/#
    

    You now have access to the node through a privileged container as a debugging pod.

    Note

    You can interact with the node session by running chroot /host from the privileged container.
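The two steps above can be combined in a small shell sketch. The function names list_linux_nodes and debug_node and the DEBUG_IMAGE variable are hypothetical helpers introduced here for illustration. The image is the one used in step 2, and the sketch assumes your nodes carry the standard kubernetes.io/os label.

```shell
# Image taken from step 2; set DEBUG_IMAGE beforehand to use a different one.
DEBUG_IMAGE="${DEBUG_IMAGE:-mcr.azk8s.cn/cbl-mariner/busybox:2.0}"

# Print the names of all Linux nodes, one per line, in node/<name> form.
list_linux_nodes() {
  kubectl get nodes -l kubernetes.io/os=linux -o name
}

# Start an interactive debugging pod on the given node.
# $1 is the node in node/<name> form, as printed by list_linux_nodes.
debug_node() {
  kubectl debug "$1" -it --image="$DEBUG_IMAGE"
}
```

For example, debug_node "$(list_linux_nodes | head -n 1)" opens a debugging session on the first Linux node in the list.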

Exit kubectl debug mode

When you're done with your node, enter the exit command to end the interactive shell session. After the interactive container session closes, use kubectl delete pod to delete the debugging pod.

kubectl delete pod node-debugger-aks-nodepool1-37663765-vmss000000-bkmmx
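If several debugging pods are left over, a sketch like the following can clean them up in one pass. It assumes the pods live in the current namespace and keep the node-debugger- name prefix shown in the sample output. The function names list_debug_pods and delete_debug_pods are hypothetical.

```shell
# Print leftover node debugging pods in the current namespace, in pod/<name> form.
# Assumption: debugging pods keep the default node-debugger- name prefix.
list_debug_pods() {
  kubectl get pods -o name | grep '^pod/node-debugger-' || true
}

# Delete every pod printed by list_debug_pods, one at a time.
delete_debug_pods() {
  list_debug_pods | while read -r pod; do
    # Redirect stdin so kubectl can't consume the remaining pod names.
    kubectl delete "$pod" < /dev/null
  done
}
```

Run list_debug_pods first to review what would be removed, then run delete_debug_pods.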

Use Host Process Container to access Windows node

  1. Create hostprocess.yaml with the following content, replacing AKSWINDOWSNODENAME with the AKS Windows node name.

    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        pod: hpc
      name: hpc
    spec:
      securityContext:
        windowsOptions:
          hostProcess: true
          runAsUserName: "NT AUTHORITY\\SYSTEM"
      hostNetwork: true
      containers:
        - name: hpc
          image: mcr.azk8s.cn/windows/servercore:ltsc2022 # Use servercore:1809 for WS2019
          command:
            - powershell.exe
            - -Command
            - "Start-Sleep 2147483"
          imagePullPolicy: IfNotPresent
      nodeSelector:
        kubernetes.io/os: windows
        kubernetes.io/hostname: AKSWINDOWSNODENAME
      tolerations:
        - effect: NoSchedule
          key: node.kubernetes.io/unschedulable
          operator: Exists
        - effect: NoSchedule
          key: node.kubernetes.io/network-unavailable
          operator: Exists
        - effect: NoExecute
          key: node.kubernetes.io/unreachable
          operator: Exists
    
  2. Run kubectl apply -f hostprocess.yaml to deploy the Windows host process container (HPC) on the specified Windows node.

  3. Use kubectl exec -it [HPC-POD-NAME] -- powershell to open a PowerShell session in the HPC container.

  4. You can run any PowerShell commands inside the HPC container to access the Windows node.

Note

You need to switch the root folder to C:\ inside the HPC container to access the files in the Windows node.
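The steps above can also be scripted. The following sketch keeps hostprocess.yaml as a template with the AKSWINDOWSNODENAME placeholder intact and substitutes the node name at deploy time. The function names render_hpc_manifest and connect_hpc are hypothetical, the pod name hpc comes from the manifest's metadata, and the 120-second timeout is an arbitrary choice.

```shell
# Read a manifest from stdin and replace the AKSWINDOWSNODENAME placeholder
# with the node name given as $1. Node names contain no "/" characters,
# so "/" is safe to use as the sed delimiter.
render_hpc_manifest() {
  sed "s/AKSWINDOWSNODENAME/$1/"
}

# Wait until the hpc pod is ready, then open a PowerShell session in it.
connect_hpc() {
  kubectl wait --for=condition=Ready pod/hpc --timeout=120s &&
    kubectl exec -it hpc -- powershell
}
```

For example, render_hpc_manifest aksnpwin000000 < hostprocess.yaml | kubectl apply -f - deploys the pod to the Windows node from the earlier sample output, and connect_hpc then opens the session.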

Next steps

If you need more troubleshooting data, you can view the kubelet logs or view the Kubernetes control plane logs.

To learn about managing your SSH keys, see Manage SSH configuration.