Data science
from The Register
1 day ago: DeepSeek's new models offer big inference cost savings
DeepSeek V4 introduces a new large language model that rivals top American models while reducing inference costs and supporting Huawei's AI accelerators.
PolarQuant does most of the compression, but a second step is needed to clean up the rough spots. Google proposes smoothing those out with a technique called Quantized Johnson-Lindenstrauss (QJL).
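The core idea behind JL-style quantization can be sketched in a few lines: project vectors through a random Gaussian matrix, keep only the sign bit per coordinate, and recover approximate similarities from how often the sign bits agree. This is a minimal SimHash-style illustration of the random-projection-plus-1-bit idea, not Google's actual QJL algorithm; the dimensions and estimator here are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 64, 2048                      # original dim, projected dim (illustrative)
S = rng.standard_normal((m, d))      # random Johnson-Lindenstrauss projection

def quantize(x):
    """Project and keep only the sign of each coordinate: 1 bit per dim."""
    return np.sign(S @ x)

def est_cos(bits_a, bits_b):
    """Estimate cosine similarity from the fraction of agreeing sign bits.

    For Gaussian projections, P(signs agree) = 1 - angle/pi, so the angle
    (and hence the cosine) can be recovered from the agreement rate.
    """
    agree = np.mean(bits_a == bits_b)
    return np.cos(np.pi * (1.0 - agree))

a = rng.standard_normal(d)
b = a + 0.3 * rng.standard_normal(d)   # a nearby vector
true_cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
approx = est_cos(quantize(a), quantize(b))
```

Each stored vector shrinks from `d` floats to `m` bits, while `approx` tracks `true_cos` closely; increasing `m` trades storage for accuracy.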
Dr. Conor Boland explained that red-light timing can erase small speed advantages, allowing a slower car to catch up again and again. He noted, 'You pass a car, and then a few minutes later, it ends up beside you again.' This phenomenon is partly psychological, as we remember surprising moments when the same car shows up again, but it is also built into how traffic works.
You settle in for a quick scroll through your feed, maybe just to unwind for a minute or two. But somewhere between a cooking hack and a clip you've already forgotten, forty minutes have vanished. It's all a blur. Welcome to the era of infinite content and finite attention, where our brains are working overtime just to keep up with the deluge.
Four generations, MTIA 300, 400, 450, and 500, have been produced within less than two years, with several already in production and others scheduled for mass deployment in 2026 and 2027. The quick pace is deliberate. Rather than betting on a single chip generation and waiting years for results, Meta has adopted a roughly six-month cadence per generation, using modular chiplet architecture to enable incremental upgrades without replacing entire rack systems.
The condition is described as mental fatigue that can occur when people use AI tools to an extent that exceeds their cognitive capacity. Symptoms can include mental fog, difficulty concentrating, slower decision-making, and sometimes headaches.
What happens under the hood? How does the search engine take that simple query and search through the billions, even trillions, of images available online? How does it find this one photo, or similar ones, among all of them? Usually, an embedding model is doing this work behind the scenes.
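The mechanism can be sketched as nearest-neighbour lookup in embedding space: every image is encoded as a vector, and a query is answered by ranking stored vectors by cosine similarity. This is a toy sketch; `embed` is a hypothetical stand-in for a real trained encoder (production systems use learned image/text models, and approximate-nearest-neighbour indexes rather than a full scan).

```python
import numpy as np

def embed(item_id, dim=128):
    # Hypothetical encoder: a deterministic pseudo-embedding per item,
    # standing in for a real trained image encoder.
    seed = hash(item_id) % (2**32)
    return np.random.default_rng(seed).standard_normal(dim)

# "Index" of stored images: name -> embedding vector.
index = {name: embed(name) for name in ["cat", "dog", "sunset", "car"]}

def search(query_vec, k=2):
    # Rank stored images by cosine similarity to the query embedding.
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return sorted(index, key=lambda name: -cos(index[name], query_vec))[:k]

results = search(embed("cat"))
```

Querying with an image's own embedding returns that image first, since its cosine similarity with itself is exactly 1; a real engine embeds the user's query photo the same way and scans an index of billions of vectors.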
Since AlexNet (ref. 5), deep learning has replaced heuristic hand-crafted features by unifying feature learning with deep neural networks. Later, Transformers (ref. 6) and GPT-3 (ref. 1) further advanced sequence learning at scale, unifying structured tasks such as natural language processing. However, multimodal learning, spanning modalities such as images, video and text, has remained fragmented, relying on separate diffusion-based generation or compositional vision-language pipelines with many hand-crafted designs.
"To enable the next generation of foundation models, we must solve the problem of continual learning: enabling AI systems to keep learning and improving over time, similar to how humans accumulate knowledge and refine skills throughout their lives," the researchers noted. Reinforcement learning offers a way to train on data generated by the model's own policy, which reduces forgetting. However, it typically requires explicit reward functions, which are not easy to define in every situation.
Each of these achievements would have been a remarkable breakthrough on its own. Solving them all with a single technique is like discovering a master key that unlocks every door at once. Why now? Three pieces converged: algorithms, computing power, and massive amounts of data. We can even put faces to them, because behind each element is a person who took a gamble.
AI companies are competing for ever-greater quantities of memory chips. The problem? The industry is heavily supply-constrained. Costs have skyrocketed, supplies have been tied up, and some companies, especially those in consumer electronics, are raising prices. On the AI front, Google DeepMind CEO Demis Hassabis told CNBC that physical challenges were "constraining a lot of deployment."