#model-scaling

Artificial intelligence
from TechCrunch
1 week ago

In 2026, AI will move from hype to pragmatism

2026 shifts AI from brute-force scaling to practical deployment: smaller models, embedded intelligence, human-centered systems, and new architecture research.
from Techzine Global
1 week ago

DeepSeek breakthrough gives LLMs the highways they have long needed

As LLMs cannot grow infinitely large yet do improve with size, researchers must find ways to make the technology effective at smaller scales. One well-known method is Mixture-of-Experts, in which an LLM activates only a portion of itself to generate a response (text, image, or video) from a prompt. This makes a large model effectively smaller and faster at inference time. mHC promises to be even more fundamental: it offers a way to increase model complexity without the scaling pain points of the past.
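The routing idea behind Mixture-of-Experts can be sketched in a few lines. The following is a purely illustrative top-k gate in plain Python; all names, gate weights, and "experts" here are invented for the sketch (production MoE layers, such as those in DeepSeek's models, use learned neural gates inside a transformer):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

class TinyMoE:
    """Toy Mixture-of-Experts layer: route each input to only top_k experts."""

    def __init__(self, experts, gate_weights, top_k=2):
        self.experts = experts            # list of callables: vector -> vector
        self.gate_weights = gate_weights  # one gating weight vector per expert
        self.top_k = top_k

    def __call__(self, x):
        # Score every expert against the input, but keep only the top_k.
        scores = [sum(w * xi for w, xi in zip(gw, x)) for gw in self.gate_weights]
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        chosen = ranked[: self.top_k]
        probs = softmax([scores[i] for i in chosen])
        # Only the chosen experts actually run; the rest stay inactive,
        # which is what makes the large model "effectively smaller".
        out = [0.0] * len(x)
        for p, i in zip(probs, chosen):
            for j, v in enumerate(self.experts[i](x)):
                out[j] += p * v
        return out

# Four toy "experts" (fixed scalings); only two run per input.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
gates = [[1, 0], [0, 1], [-1, 0], [0, -1]]
moe = TinyMoE(experts, gates, top_k=2)
out = moe([1.0, 0.5])  # experts 0 and 1 win the gate; 2 and 3 never execute
```

The key property the sketch shows: compute per input scales with `top_k`, not with the total number of experts, so capacity can grow without a matching growth in inference cost.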
from Business Insider
1 week ago

China's DeepSeek kicked off 2026 with a new AI training method that analysts say is a 'breakthrough' for scaling

DeepSeek developed Manifold-Constrained Hyper-Connections (mHC), a training method that enables richer internal model communication while preserving training stability and efficiency as models scale.
from Business Insider
1 month ago

Databricks CEO says AGI is already here - and Silicon Valley just keeps moving the goalposts

"Everybody would say yes, but we kept moving the goalposts," Ghodsi said in the discussion, which was published Tuesday.
from WIRED
2 months ago

The AI Industry's Scaling Obsession Is Headed for a Cliff

Very large, compute-heavy AI models will likely yield diminishing performance returns over the next decade, while efficiency improvements will make smaller models increasingly capable.
from Ars Technica
3 months ago

Anthropic says its new AI model "maintained focus" for 30 hours on multistep tasks

On Monday, Anthropic released Claude Sonnet 4.5, a new AI language model the company calls its "most capable model to date," with improved coding and computer use capabilities. The company also revealed Claude Code 2.0, a command-line AI agent for developers, and the Claude Agent SDK, which is a tool developers can use to build their own AI coding agents.
from HackerNoon
1 year ago

Empirical Validation of Multi-Token Prediction for LLMs

Multi-token prediction improves model performance, with gains that grow with model size; it also speeds up inference and helps models learn longer-range patterns.
from HackerNoon

Multi-Token Prediction: Architecture for Memory-Efficient LLM Training

Multi-token prediction improves language modeling by forecasting several future tokens simultaneously from a single forward pass.
The performance improvement grows with model size.
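The structural idea behind multi-token prediction is a shared trunk whose output feeds several small heads, one per future offset. Below is a purely illustrative plain-Python sketch; the trunk and heads here are trivial invented functions standing in for the transformer trunk and linear output heads described in the research:

```python
# Minimal sketch of multi-token prediction: one shared trunk, n heads,
# each head forecasting a different future offset (t+1, t+2, ...).

def trunk(x):
    # Shared representation, computed once per input position.
    # (Stand-in for the expensive transformer trunk.)
    return [v * 0.5 for v in x]

def make_head(offset):
    # Each head has its own parameters; here just a distinct shift.
    def head(h):
        return [v + offset for v in h]
    return head

def predict_multi(x, n_future=3):
    h = trunk(x)  # ONE forward pass through the trunk
    heads = [make_head(k) for k in range(1, n_future + 1)]
    # All n_future forecasts reuse the same trunk output, so predicting
    # extra tokens costs only the cheap per-head computation.
    return [head(h) for head in heads]

preds = predict_multi([2.0, 4.0])  # forecasts for offsets t+1, t+2, t+3
```

The design point the sketch captures is the cost asymmetry: the trunk runs once regardless of how many future tokens are predicted, which is why the technique can speed up inference and remain memory-efficient during training.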