Microsoft expands AKS with RAG functionality and vLLM support
Briefly

Microsoft has expanded its Azure Kubernetes Service (AKS) during KubeCon by introducing Retrieval Augmented Generation (RAG) capabilities in KAITO, giving developers advanced search functionality. With RAG support, the engine can be deployed quickly, enabling fast indexing of large datasets. In addition, the vLLM serving engine is now the default for the AI toolchain operator add-on, delivering a notable boost in request processing speed. AKS also lets users install custom GPU drivers, adding operational flexibility for NVIDIA GPU support across different node pools.
During KubeCon, Microsoft announced support for Retrieval Augmented Generation (RAG) in KAITO on Azure Kubernetes Service (AKS) clusters, enhancing advanced search features for developers.
The AI toolchain operator add-on now runs model inference workloads on the vLLM serving engine by default, significantly accelerating the processing of incoming requests.
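The vLLM default applies once the AI toolchain operator (KAITO) add-on is enabled on a cluster; a minimal sketch, assuming the `--enable-ai-toolchain-operator` flag and placeholder resource names:

```shell
# Enable the AI toolchain operator (KAITO) add-on on an existing cluster.
# Resource group and cluster names below are placeholders.
az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-ai-toolchain-operator
```

With the add-on enabled, models deployed through KAITO workspaces are served by vLLM unless another runtime is specified.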
Developers can choose to install custom GPU drivers or use the GPU Operator on AKS, giving them flexibility in managing NVIDIA GPU installations.
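The driver choice is made per node pool when it is created; a minimal sketch, assuming the `--skip-gpu-driver-install` option and placeholder names, so that a custom driver or the NVIDIA GPU Operator can manage drivers instead of the AKS default installation:

```shell
# Create a GPU node pool without the default NVIDIA driver installation.
# Names and the VM size are placeholders.
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name gpunp \
  --node-vm-size Standard_NC6s_v3 \
  --skip-gpu-driver-install
```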
With RAG support in KAITO, users can deploy the RAG engine and a supported embedding model to index and search large datasets with minimal setup.
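KAITO manages deployments declaratively through custom resources; an illustrative manifest sketch for the RAG engine, assuming a `RAGEngine` kind in the `kaito.sh` API group (the field names are assumptions, not verified against the published CRD):

```yaml
# Illustrative RAGEngine resource; apiVersion, kind, and field names
# are assumptions based on KAITO's CRD-driven design.
apiVersion: kaito.sh/v1alpha1
kind: RAGEngine
metadata:
  name: ragengine-example
spec:
  compute:
    instanceType: Standard_NC6s_v3      # placeholder GPU VM size
  embedding:
    local:
      modelID: BAAI/bge-small-en-v1.5   # placeholder embedding model
  inferenceService:
    url: http://workspace-inference/v1/completions  # placeholder endpoint
```

Applying such a manifest with `kubectl apply -f` would let the controller stand up the indexing and query service against the chosen embedding model.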
Read at Techzine Global