
"On paper, Positron's next-gen Asimov accelerators, no doubt named for the beloved science fiction author, don't look like much of a match for Nvidia's Rubin GPUs. Yet, the Arm-backed AI startup boasts its inference chip will churn out five times as many tokens per dollar while using one-fifth the power of Nvidia's latest accelerators to do it. Those are certainly some bold claims, which the company contends are possible because the chip was designed to support large-scale inference workloads."
"Unlike its prior generation Atlas systems, which used high-bandwidth memory (HBM), the Asimov uses LPDDR5x memory, which can be expanded using Compute Express Link (CXL) from 864GB to 2.3TB per chip. Higher memory capacity means more room for LLM parameters and the key-value caches used to keep track of the model state. But while LPDDR5x is both cheaper and higher capacity than HBM, it's also glacially slow by comparison."
"Nvidia's newly announced Rubin GPUs pack 288GB of HBM4 good for 22 TB/s of peak bandwidth. By comparison, Asimov appears to top out at around 3 TB/s. The difference, the company claims, is its chips can actually saturate 90 percent of that bandwidth, while GPUs are lucky to hit 30 percent in the real world. However, that stat appears only to apply to the on-package LPDDR5x memory. Any CXL memory expansion is going to be limited by the chip's 32 PCIe 3.0 lanes, which are enough for about 256 GB/s of bandwidth."
Positron claims its Asimov inference accelerator delivers five times as many tokens per dollar while consuming one-fifth the power of Nvidia's Rubin GPUs. The chip uses LPDDR5x memory, expandable via Compute Express Link (CXL) from 864GB to 2.3TB per chip, increasing capacity for LLM parameters and key-value (KV) caches. LPDDR5x offers higher capacity at lower cost than HBM but much lower peak bandwidth. Positron reports the design can saturate roughly 90 percent of its on-package LPDDR5x bandwidth, compared with about 30 percent real-world HBM utilization on GPUs. CXL expansion will be constrained by 32 PCIe 6.0 lanes, providing about 256 GB/s for KV-cache storage.
Read at The Register