
"According to The Information, internal tests by DeepSeek employees have shown that V4 may outperform rivals such as Anthropic's Claude and OpenAI's GPT series, specifically in coding tasks. The latest V4 model has also achieved breakthroughs in processing extremely long code prompts. This could be a significant advantage for developers working on complex software projects. The processing capacity for long contexts builds on the sparse attention technology in V3.2-Exp."
"DeepSeek uses a Mixture of Experts (MoE) architecture that is more energy-efficient than classic dense models. V3 already had 671 billion parameters, with only a portion being activated per prompt. DeepSeek has attracted worldwide attention with its efficient approach. Training the R1 model reportedly cost only $294,000, significantly less than what US companies estimate for comparable models. Nevertheless, the company is under increasing scrutiny."
According to the report, internal tests indicate V4 may outperform rivals such as Anthropic's Claude and OpenAI's GPT series in coding tasks, and that it achieves breakthroughs in processing extremely long code prompts, a benefit for developers working on complex software projects. The long-context processing builds on the sparse attention technology used in V3.2-Exp. DeepSeek employs a Mixture of Experts (MoE) architecture that is more energy-efficient than classic dense models; V3 already had 671 billion parameters, of which only a portion is activated per prompt. Training the R1 model reportedly cost $294,000, far lower than comparable estimates from US companies. The company nevertheless faces increasing scrutiny and investigations over its security and privacy practices ahead of V4's mid-February launch.
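To illustrate the MoE idea of activating only a fraction of the model's parameters per token, here is a minimal, hypothetical sketch of top-k expert routing. The function names, gating scheme, and dimensions are illustrative assumptions only and do not reflect DeepSeek's actual implementation.

```python
import numpy as np

def moe_forward(x, experts, gate_weights, top_k=2):
    """Illustrative top-k Mixture-of-Experts routing (not DeepSeek's code).

    Only the top_k experts selected by the gate are evaluated for a given
    token, so most expert parameters remain inactive on each forward pass.
    """
    # Gate scores: one logit per expert for this token.
    logits = x @ gate_weights                      # shape: (num_experts,)
    selected = np.argsort(logits)[-top_k:]         # indices of the chosen experts
    weights = np.exp(logits[selected])
    weights /= weights.sum()                       # softmax over the chosen experts only
    # Combine only the selected experts' outputs, weighted by the gate.
    return sum(w * experts[i](x) for w, i in zip(weights, selected))

# Toy usage: 4 experts, only 2 of them run per token.
rng = np.random.default_rng(0)
d = 8
experts = [lambda x, W=rng.standard_normal((d, d)): x @ W for _ in range(4)]
gate_weights = rng.standard_normal((d, 4))
token = rng.standard_normal(d)
print(moe_forward(token, experts, gate_weights).shape)  # (8,)
```

In a sketch like this, the compute cost per token scales with the few experts that are routed to rather than with the full parameter count, which is the efficiency argument behind sparse MoE models.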
Read at Techzine Global