
"The pair already combined forces on the Dell AI Factory with Nvidia, a fully integrated AI system, but this is now supported by Dell's Automation Platform, which offers centralized delivery and management of IT operations. The aim of this system is to help businesses rapidly deploy AI agents, the new must-have capability in the AI world, according to Varun Chhabra, Dell's Senior Vice President of Infrastructure."
"This enables KV cache offloading, which moves the large key-value (KV) cache data for processing large language models from GPU memory to cheaper storage, reducing GPU memory usage and improving performance. According to Dell, its own testing of this configuration delivered a one-second time to first token (TTFT), even with a full context window of 131,000 tokens, compared with the standard vLLM configuration, which took over 17 seconds."
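The offloading idea described above can be sketched in a few lines of plain Python. This is a toy illustration of the general technique, not Nvidia Dynamo's or the NIXL library's actual interface; the class name, the two-tier dictionaries, and the capacity parameter are all illustrative stand-ins for GPU HBM and the PowerScale/ObjectScale storage tier.

```python
import numpy as np

class KVCacheOffloader:
    """Toy sketch of KV-cache offloading: keep only the most recent
    sequences' key/value tensors in fast "GPU" memory and evict older
    ones to a cheaper tier, reloading on demand instead of recomputing
    the expensive prefill."""

    def __init__(self, gpu_capacity):
        self.gpu_capacity = gpu_capacity  # max entries held in fast memory
        self.gpu = {}                     # stands in for GPU HBM
        self.storage = {}                 # stands in for the cheap storage tier

    def put(self, seq_id, kv):
        self.gpu[seq_id] = kv
        # Evict the oldest entries to storage when over capacity
        # (dicts preserve insertion order, so the first key is oldest).
        while len(self.gpu) > self.gpu_capacity:
            oldest = next(iter(self.gpu))
            self.storage[oldest] = self.gpu.pop(oldest)

    def get(self, seq_id):
        if seq_id in self.gpu:
            return self.gpu[seq_id]
        # Hit in cheap storage: reload the cached K/V tensors rather
        # than recomputing them, which is what cuts time to first token.
        kv = self.storage.pop(seq_id)
        self.put(seq_id, kv)
        return kv

cache = KVCacheOffloader(gpu_capacity=2)
for i in range(4):
    cache.put(i, np.zeros((2, 8)))  # pretend per-sequence K/V tensors
print(sorted(cache.gpu), sorted(cache.storage))  # → [2, 3] [0, 1]
```

The performance claim rests on the reload path: pulling a cached context back from storage is far cheaper than re-running prefill over 131,000 tokens, which is why Dell's figures compare against a standard vLLM configuration that must recompute.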
Dell positions itself as a one-stop shop for enterprise AI infrastructure by expanding servers, storage, and software for AI and HPC workloads. The Dell AI Factory integrates with Nvidia and is now supported by the Dell Automation Platform for centralized delivery and IT operations management. The automation platform provides a curated catalog of validated workload blueprints to enable rapid deployment of AI agents on AI Factory with Nvidia configurations. Integration of Nvidia's Dynamo inference framework with PowerScale and ObjectScale via the NIXL library enables KV cache offloading to cheaper storage, reducing GPU memory usage and improving performance. Dell reported a one-second time to first token with a 131,000-token context window in its tests and is adding the PowerEdge XE8712 server for rack-scale, self-monitoring deployments.
Read at The Register