#deepseek-r1

Researchers Hack DeepSeek to Speak Freely About Tiananmen Square

Researchers used quantum-inspired tensor-network compression to shrink DeepSeek R1 by 55% and remove its censorship while maintaining performance.

Artificial intelligence

from WIRED

8 months ago

Distillation Can Make AI Models Smaller and Cheaper

Knowledge distillation lets smaller models mimic larger ones efficiently, which helps explain DeepSeek R1's claims and the industry reaction that followed.

Artificial intelligence

from IT Pro

8 months ago

DeepSeek's R1 model training costs pour cold water on big tech's massive AI spending

DeepSeek trained its R1 reasoning model for about $294,000 using 512 Nvidia H800 chips, plus ~$6M for its base LLM.
