DeepSeek introduces self-learning AI models
Briefly

DeepSeek has partnered with Tsinghua University to rework how its AI models are trained, with the goal of cutting operational costs. The approach uses reinforcement learning that rewards models for accurate responses, aligning AI outputs more closely with human preferences. The new method, called self-principled critique tuning, has shown superior performance on benchmarks while requiring less computing power. The resulting models, termed DeepSeek-GRM, will be released as open source. DeepSeek also uses a mixture-of-experts architecture to optimize resource use, as other companies such as Alibaba and OpenAI pursue similar advances.
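The core mechanism here is a reward signal: candidate answers are scored, and the model is nudged toward the higher-scoring ones. The toy critic below is a minimal Python sketch of that idea; the score_answer and pick_reward functions and the word-overlap heuristic are hypothetical stand-ins, not DeepSeek's actual self-principled critique tuning.

```python
# Toy reward signal for RL-style alignment. The scoring heuristic is a
# hypothetical stand-in for a learned critic, purely for illustration.

def score_answer(answer: str, reference: str) -> float:
    """Toy critic: reward word overlap with a reference (accuracy proxy)
    and penalise rambling (comprehensibility proxy)."""
    overlap = len(set(answer.lower().split()) & set(reference.lower().split()))
    length_penalty = max(0, len(answer.split()) - 30) * 0.1
    return overlap - length_penalty

def pick_reward(candidates, reference):
    """Return per-candidate rewards; an RL loop would push the policy
    toward the answers that score highest."""
    return {c: score_answer(c, reference) for c in candidates}

rewards = pick_reward(
    ["Paris is the capital of France.", "It might be Lyon, or maybe Paris?"],
    reference="The capital of France is Paris.",
)
print(rewards)
```

In practice the critic would itself be a learned reward model rather than a heuristic; the name self-principled critique tuning suggests the model generates its own evaluation principles and critiques to produce these scores.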
DeepSeek is collaborating with Tsinghua University to develop AI models that are cheaper and more efficient, using a new reinforcement learning approach.
The new method aligns AI outputs with human preferences by rewarding accurate and comprehensible answers, with the aim of making the models more efficient across a broader range of applications.
DeepSeek's self-principled critique tuning outperforms existing methods on benchmarks, reducing the computing power needed while improving performance across a variety of tasks.
The startup's models use a mixture-of-experts architecture, similar to Meta's Llama 4; DeepSeek has not yet announced its next flagship model.
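For context on the mixture-of-experts idea mentioned above: only a small subset of "expert" sub-networks runs for each token, which is where the resource savings come from. Below is a minimal numpy sketch of top-k routing; the dimensions, the router, and the moe_forward function are illustrative assumptions, not DeepSeek's actual architecture.

```python
import numpy as np

# Hypothetical toy mixture-of-experts layer; sizes and names are
# illustrative, not taken from DeepSeek's models.
rng = np.random.default_rng(0)

DIM, NUM_EXPERTS, TOP_K = 8, 4, 2

# Each "expert" is a small feed-forward weight matrix.
experts = [rng.normal(size=(DIM, DIM)) for _ in range(NUM_EXPERTS)]
# The router scores each expert for a given token.
router = rng.normal(size=(DIM, NUM_EXPERTS))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(token):
    """Route a token to its top-k experts and mix their outputs."""
    scores = softmax(token @ router)            # router probabilities
    top = np.argsort(scores)[-TOP_K:]           # pick the k best experts
    weights = scores[top] / scores[top].sum()   # renormalise over the k
    # Only the selected experts run, which is the efficiency win:
    return sum(w * (token @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=DIM)
print(moe_forward(token).shape)  # (8,)
```

Because only TOP_K of the NUM_EXPERTS weight matrices are touched per token, compute scales with the experts actually used rather than with the full parameter count.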
Read at Techzine Global