DeepSeek's distilled new R1 AI model can run on a single GPU | TechCrunch
Briefly

DeepSeek has released a smaller, distilled AI model, DeepSeek-R1-0528-Qwen3-8B, that outperforms comparably sized models on several math reasoning benchmarks. Built on the Qwen3-8B foundation, it beats Google's Gemini 2.5 Flash on the AIME 2025 math challenge and performs close to Microsoft's Phi 4 on the HMMT test. The model is a far less resource-intensive alternative to the full R1, making it attractive to both researchers and developers, and it is available under an MIT license for broad, including commercial, use.
DeepSeek's new distilled model, DeepSeek-R1-0528-Qwen3-8B, challenges larger AI counterparts, delivering strong math reasoning performance while demanding far less compute.
Derived from Qwen3-8B, the smaller model outperforms comparable models such as Google's Gemini 2.5 Flash on demanding math benchmarks like AIME 2025.
DeepSeek-R1-0528-Qwen3-8B is suited to both academic and industrial use, reflecting a trend toward more accessible AI models that require less computational power.
Released under an MIT license, DeepSeek-R1-0528-Qwen3-8B is available for commercial use, part of a growing trend toward openly shared AI resources.
Read at TechCrunch