Microsoft Adds DRA-Backed NVIDIA vGPU Support to AKS
Briefly

"Dynamic Resource Allocation (DRA) is now the standard for GPU resource use in Kubernetes. Instead of static resources like nvidia.com/gpu, GPUs are allocated dynamically using DeviceClasses and ResourceClaims. This change enhances scheduling and improves integration with virtualization technologies like NVIDIA vGPU."
"Virtual accelerators like NVIDIA vGPU often handle smaller tasks. They allow one physical GPU to be split among many users or applications. This setup is helpful for enterprise AI/ML development, fine-tuning, and audio/visual processing. vGPU offers predictable performance while still providing CUDA capabilities to containerized workloads."
"On the infrastructure side, this feature relies on Azure's NVadsA10_v5 virtual machine series. Instead of assigning the whole GPU to one VM, vGPU technology partitions it into multiple fixed-size slices at the hypervisor layer. From Kubernetes' view, each VM shows one clear GPU device."
Azure Kubernetes Service now supports Dynamic Resource Allocation (DRA) as the standard for GPU resource management, replacing static allocation methods. DRA uses DeviceClasses and ResourceClaims for dynamic GPU allocation, improving scheduling and virtualization integration. NVIDIA vGPU technology partitions physical GPUs into multiple virtual slices, allowing one GPU to serve multiple users or applications while maintaining predictable performance and CUDA capabilities. This approach benefits enterprise AI/ML development, fine-tuning, and audio/visual processing. The infrastructure uses Azure's NVadsA10_v5 virtual machine series, with vGPU partitioning occurring at the hypervisor layer. Implementation requires Kubernetes 1.34 or newer, provisioning node pools with appropriate labels, and deploying the NVIDIA DRA driver via Helm with specific configuration flags.
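The implementation steps summarized above might be sketched as follows. This is a hedged outline, not the documented procedure: the resource-group, cluster, and node-pool names are placeholders; the Helm repository URL and chart name are assumptions based on NVIDIA's public Helm repository; and the article does not name the node-pool labels or the specific configuration flags, so they are deliberately left out rather than guessed.

```shell
# 1. Provision a node pool on the NVadsA10_v5 series (SKU shown is one
#    real size in that series; the labels AKS expects for DRA/vGPU are
#    not named in the article and are omitted here).
az aks nodepool add \
  --resource-group my-rg \
  --cluster-name my-aks \
  --name vgpu \
  --node-vm-size Standard_NV36ads_A10_v5

# 2. Deploy the NVIDIA DRA driver via Helm (repo URL and chart name are
#    assumptions; the specific --set flags the feature requires are in
#    the AKS documentation and not invented here).
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
  --namespace nvidia-dra-driver --create-namespace
```

Note the Kubernetes 1.34 prerequisite: that is the release in which DRA's `resource.k8s.io/v1` API reached general availability, so older clusters cannot serve these claims.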
Read at InfoQ