#training-stability

Artificial intelligence
from Business Insider
13 hours ago

China's DeepSeek kicked off 2026 with a new AI training method that analysts say is a 'breakthrough' for scaling

DeepSeek developed Manifold-Constrained Hyper-Connections (mHC), a training method that enables richer internal model communication while preserving training stability and efficiency as models scale.
Artificial intelligence
from InfoQ
1 month ago

Kimi's K2 Opensource Language Model Supports Dynamic Resource Availability and New Optimizer

Kimi K2 is a Mixture-of-Experts LLM (32B activated parameters, 1.04T total) trained on 15.5T tokens using the MuonClip optimizer to improve training stability.