Pod Sandboxing (preview) with Azure Kubernetes Service (AKS)
To help secure and protect your container workloads from untrusted or potentially malicious code, AKS now includes a mechanism called Pod Sandboxing (preview). Pod Sandboxing provides an isolation boundary between the container application, and the shared kernel and compute resources of the container host. For example CPU, memory, and networking. Pod Sandboxing complements other security measures or data protection controls with your overall architecture to help you meet regulatory, industry, or governance compliance requirements for securing sensitive information.
This article helps you understand this new feature, and how to implement it.
Prerequisites
The Azure CLI version 2.44.1 or later. Run
az --version
to find the version, and runaz upgrade
to upgrade the version. If you need to install or upgrade, see Install Azure CLI.The
aks-preview
Azure CLI extension version 0.5.123 or later.Register the
KataVMIsolationPreview
feature in your Azure subscription.AKS supports Pod Sandboxing (preview) on version 1.24.0 and higher with all AKS network plugins.
To manage a Kubernetes cluster, use the Kubernetes command-line client kubectl. Azure Power Shell comes with
kubectl
. You can install kubectl locally using the az aks install-cli command.
Install the aks-preview Azure CLI extension
Important
AKS preview features are available on a self-service, opt-in basis. Previews are provided "as is" and "as available," and they're excluded from the service-level agreements and limited warranty. AKS previews are partially covered by customer support on a best-effort basis. As such, these features aren't meant for production use. For more information, see the following support articles:
To install the aks-preview extension, run the following command:
az extension add --name aks-preview
Run the following command to update to the latest version of the extension released:
az extension update --name aks-preview
Register the KataVMIsolationPreview feature flag
Register the KataVMIsolationPreview
feature flag by using the az feature register command, as shown in the following example:
az feature register --namespace "Microsoft.ContainerService" --name "KataVMIsolationPreview"
It takes a few minutes for the status to show Registered. Verify the registration status by using the az feature show command:
az feature show --namespace "Microsoft.ContainerService" --name "KataVMIsolationPreview"
When the status reflects Registered, refresh the registration of the Microsoft.ContainerService resource provider by using the az provider register command:
az provider register --namespace "Microsoft.ContainerService"
Limitations
The following are constraints with this preview of Pod Sandboxing (preview):
Kata containers may not reach the IOPS performance limits that traditional containers can reach on Azure Files and high performance local SSD.
Microsoft Defender for Containers doesn't support assessing Kata runtime pods.
Kata host-network isn't supported.
How it works
To achieve this functionality on AKS, Kata Containers running on the Azure Linux container host for AKS stack delivers hardware-enforced isolation. Pod Sandboxing extends the benefits of hardware isolation such as a separate kernel for each Kata pod. Hardware isolation allocates resources for each pod and doesn't share them with other Kata Containers or namespace containers running on the same host.
The solution architecture is based on the following components:
- The Azure Linux container host for AKS
- Microsoft Hyper-V Hypervisor
- Azure-tuned Dom0 Linux Kernel
- Open-source Cloud-Hypervisor Virtual Machine Monitor (VMM)
- Integration with Kata Container framework
Deploying Pod Sandboxing using Kata Containers is similar to the standard containerd workflow to deploy containers. The deployment includes kata-runtime options that you can define in the pod template.
To use this feature with a pod, the only difference is to add runtimeClassName kata-mshv-vm-isolation to the pod spec.
When a pod uses the kata-mshv-vm-isolation runtimeClass, it creates a VM to serve as the pod sandbox to host the containers. The VM's default memory is 2 GB and the default CPU is one core if the Container resource manifest (containers[].resources.limits
) doesn't specify a limit for CPU and memory. When you specify a limit for CPU or memory in the container resource manifest, the VM has containers[].resources.limits.cpu
with the 1
argument to use one + xCPU, and containers[].resources.limits.memory
with the 2
argument to specify 2 GB + yMemory. Containers can only use CPU and memory to the limits of the containers. The containers[].resources.requests
are ignored in this preview while we work to reduce the CPU and memory overhead.
Deploy new cluster
Perform the following steps to deploy an Azure Linux AKS cluster using the Azure CLI.
Create an AKS cluster using the az aks create command and specifying the following parameters:
- --workload-runtime: Specify KataMshvVmIsolation to enable the Pod Sandboxing feature on the node pool. With this parameter, these other parameters shall satisfy the following requirements. Otherwise, the command fails and reports an issue with the corresponding parameter(s).
- --os-sku: AzureLinux. Only the Azure Linux os-sku supports this feature in this preview release.
- --node-vm-size: Any Azure VM size that is a generation 2 VM and supports nested virtualization works. For example, Dsv3 VMs.
The following example creates a cluster named myAKSCluster with one node in the myResourceGroup:
az aks create --name myAKSCluster \ --resource-group myResourceGroup \ --os-sku AzureLinux \ --workload-runtime KataMshvVmIsolation \ --node-vm-size Standard_D4s_v3 \ --node-count 1 \ --generate-ssh-keys
Run the following command to get access credentials for the Kubernetes cluster. Use the az aks get-credentials command and replace the values for the cluster name and the resource group name.
az aks get-credentials --resource-group myResourceGroup --name myAKSCluster
List all Pods in all namespaces using the kubectl get pods command.
kubectl get pods --all-namespaces
Deploy to an existing cluster
To use this feature with an existing AKS cluster, the following requirements must be met:
- Follow the steps to register the KataVMIsolationPreview feature flag.
- Verify the cluster is running Kubernetes version 1.24.0 and higher.
Use the following command to enable Pod Sandboxing (preview) by creating a node pool to host it.
Add a node pool to your AKS cluster using the az aks nodepool add command. Specify the following parameters:
- --resource-group: Enter the name of an existing resource group to create the AKS cluster in.
- --cluster-name: Enter a unique name for the AKS cluster, such as myAKSCluster.
- --name: Enter a unique name for your clusters node pool, such as nodepool2.
- --workload-runtime: Specify KataMshvVmIsolation to enable the Pod Sandboxing feature on the node pool. Along with the
--workload-runtime
parameter, these other parameters shall satisfy the following requirements. Otherwise, the command fails and reports an issue with the corresponding parameter(s).- --os-sku: AzureLinux. Only the Azure Linux os-sku supports this feature in the preview release.
- --node-vm-size: Any Azure VM size that is a generation 2 VM and supports nested virtualization works. For example, Dsv3 VMs.
The following example adds a node pool to myAKSCluster with one node in nodepool2 in the myResourceGroup:
az aks nodepool add --cluster-name myAKSCluster --resource-group myResourceGroup --name nodepool2 --os-sku AzureLinux --workload-runtime KataMshvVmIsolation --node-vm-size Standard_D4s_v3
Run the az aks update command to enable pod sandboxing (preview) on the cluster.
az aks update --name myAKSCluster --resource-group myResourceGroup
Deploy a trusted application
To demonstrate deployment of a trusted application on the shared kernel in the AKS cluster, perform the following steps.
Create a file named trusted-app.yaml to describe a trusted pod, and then paste the following manifest.
kind: Pod apiVersion: v1 metadata: name: trusted spec: containers: - name: trusted image: mcr.azk8s.cn/aks/fundamental/base-ubuntu:v0.0.11 command: ["/bin/sh", "-ec", "while :; do echo '.'; sleep 5 ; done"]
Deploy the Kubernetes pod by running the kubectl apply command and specify your trusted-app.yaml file:
kubectl apply -f trusted-app.yaml
The output of the command resembles the following example:
pod/trusted created
Deploy an untrusted application
To demonstrate the deployment of an untrusted application into the pod sandbox on the AKS cluster, perform the following steps.
Create a file named untrusted-app.yaml to describe an untrusted pod, and then paste the following manifest.
kind: Pod apiVersion: v1 metadata: name: untrusted spec: runtimeClassName: kata-mshv-vm-isolation containers: - name: untrusted image: mcr.azk8s.cn/aks/fundamental/base-ubuntu:v0.0.11 command: ["/bin/sh", "-ec", "while :; do echo '.'; sleep 5 ; done"]
The value for runtimeClassNameSpec is
kata-mhsv-vm-isolation
.Deploy the Kubernetes pod by running the kubectl apply command and specify your untrusted-app.yaml file:
kubectl apply -f untrusted-app.yaml
The output of the command resembles the following example:
pod/untrusted created
Verify Kernel Isolation configuration
To access a container inside the AKS cluster, start a shell session by running the kubectl exec command. In this example, you're accessing the container inside the untrusted pod.
kubectl exec -it untrusted -- /bin/bash
Kubectl connects to your cluster, runs
/bin/sh
inside the first container within the untrusted pod, and forward your terminal's input and output streams to the container's process. You can also start a shell session to the container hosting the trusted pod.After starting a shell session to the container of the untrusted pod, you can run commands to verify that the untrusted container is running in a pod sandbox. You'll notice that it has a different kernel version compared to the trusted container outside the sandbox.
To see the kernel version run the following command:
uname -r
The following example resembles output from the pod sandbox kernel:
root@untrusted:/# uname -r 5.15.48.1-8.cm2
Start a shell session to the container of the trusted pod to verify the kernel output:
kubectl exec -it trusted -- /bin/bash
To see the kernel version run the following command:
uname -r
The following example resembles output from the VM that is running the trusted pod, which is a different kernel than the untrusted pod running within the pod sandbox:
5.15.80.mshv2-hvl1.m2
Cleanup
When you're finished evaluating this feature, to avoid Azure charges, clean up your unnecessary resources. If you deployed a new cluster as part of your evaluation or testing, you can delete the cluster using the az aks delete command.
az aks delete --resource-group myResourceGroup --name myAKSCluster
If you enabled Pod Sandboxing (preview) on an existing cluster, you can remove the pod(s) using the kubectl delete pod command.
kubectl delete pod pod-name
Next steps
Learn more about Azure Dedicated hosts for nodes with your AKS cluster to use hardware isolation and control over Azure platform maintenance events.