"The Chinese AI startup published a research paper on Wednesday, describing a method to train large language models that could shape "the evolution of foundational models," it said. The paper, co-authored by its founder Liang Wenfeng, introduces what DeepSeek calls "Manifold-Constrained Hyper-Connections," or mHC, a training approach designed to scale models without them becoming unstable or breaking altogether. As language models grow, researchers often try to improve performance by allowing different parts of a model to share more information internally."
"As language models grow, researchers often try to improve performance by allowing different parts of a model to share more information internally. However, this increases the risk of the information becoming unstable, the paper said. DeepSeek's latest research enables models to share richer internal communication in a constrained manner, preserving training stability and computational efficiency even as models scale, it added."
DeepSeek introduced Manifold-Constrained Hyper-Connections (mHC) to enable model scaling without training instability. mHC allows richer internal communication among model components while constraining those interactions to preserve stability and computational efficiency. The approach is designed to limit additional training cost and to improve performance even with modest extra compute, and it supports an end-to-end redesign of the training stack for rapid experimentation with unconventional architectures. The method positions DeepSeek to scale toward larger flagship models and to mitigate compute bottlenecks during model development.
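The article does not give the mathematical details of mHC, so the sketch below is only an illustration of the general idea it describes: several parallel residual streams exchange information through a learned mixing matrix whose weights are projected onto a constrained set so the combined signal stays bounded as depth and scale grow. The class name ManifoldConstrainedHyperConnection, the stream count n_streams, and the simplex-style constraint used here are hypothetical stand-ins, not the paper's actual formulation; PyTorch is assumed as the framework.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ManifoldConstrainedHyperConnection(nn.Module):
    """Illustrative sketch (not DeepSeek's implementation): a sub-block whose
    residual path is widened to several parallel streams. The streams exchange
    information through a learned mixing matrix that is re-normalized at each
    forward pass so mixing cannot amplify the signal as the model scales."""

    def __init__(self, d_model: int, n_streams: int = 4):
        super().__init__()
        self.n_streams = n_streams
        self.norm = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        # Unconstrained parameters for stream-to-stream mixing; constrained at
        # forward time so each output stream is a convex combination of inputs.
        self.mix_logits = nn.Parameter(torch.zeros(n_streams, n_streams))
        # Per-stream gate controlling how much sub-block output is injected back.
        self.inject = nn.Parameter(torch.full((n_streams,), 1.0 / n_streams))

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (n_streams, batch, seq, d_model)
        # Constrain mixing weights to the probability simplex (a simple stand-in
        # for a manifold constraint): rows are non-negative and sum to 1.
        mix = F.softmax(self.mix_logits, dim=-1)
        mixed = torch.einsum("ij,jbtd->ibtd", mix, streams)
        # Run the sub-block on the averaged stream, then inject it per stream.
        h = self.ffn(self.norm(mixed.mean(dim=0)))
        return mixed + self.inject.view(-1, 1, 1, 1) * h.unsqueeze(0)


if __name__ == "__main__":
    block = ManifoldConstrainedHyperConnection(d_model=64, n_streams=4)
    x = torch.randn(4, 2, 16, 64)  # (streams, batch, seq, d_model)
    y = block(x)
    print(y.shape)  # torch.Size([4, 2, 16, 64])
```

The simplex constraint here is only one possible choice; the point of the sketch is that the extra cross-stream communication is never left unbounded, which is the stability property the article attributes to mHC.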
#manifold-constrained-hyper-connections-mhc #model-scaling #training-stability #computational-efficiency
Read at Business Insider