
"The success of DeepSeek's powerful artificial intelligence (AI) model R1 that made the US stock market plummet when it was released in January did not hinge on being trained on the output of its rivals, researchers at the Chinese firm have said. The statement came in documents released alongside a peer-reviewed version of the R1 model, published today in Nature."
"Its supplementary material reveals for the first time how much R1 cost to train: the equivalent of just US$294,000. This comes on top of the $6 million or so that the company, based in Hangzhou, spent to make the base LLM that R1 is built on, but the total amount is still substantially less than the tens of millions of dollars that rival models are thought to have cost."
"The paper updates a preprint released in January, which describes how DeepSeek augmented a standard large language model (LLM) to tackle reasoning tasks. R1 is designed to excel at 'reasoning' tasks such as mathematics and coding, and is a cheaper rival to tools developed by US technology firms. As an 'open-weight' model, it is available for anyone to download and is the most popular such model on the AI community platform Hugging Face to date, having been downloaded 10.9 million times."
R1 is an AI model optimized for reasoning tasks such as mathematics and coding, positioned as a cheaper rival to tools from US technology firms. The model is open-weight and has been downloaded 10.9 million times from Hugging Face, making it the most popular such model on the platform to date. Supplementary materials reveal for the first time that R1's incremental training cost was about US$294,000, on top of roughly $6 million to develop the base LLM, leaving the total far below the tens of millions of dollars that rival models are thought to have cost. Training was conducted mainly on Nvidia H800 chips, which became subject to US export controls in 2023.
#large-language-models #model-training-cost #open-weight-models #nvidia-h800-export-controls #ai-reasoning
Read at www.nature.com