How a top Chinese AI model overcame US sanctions

Accepted submission by AnonTechie at 2025-01-25 11:17:59 from the Learning From Your Mistakes dept.
/dev/random

An interesting article about the development of DeepSeek R1:

The AI community is abuzz over DeepSeek R1, a new open-source reasoning model.

The model was developed by the Chinese AI startup DeepSeek, which claims that R1 matches or even surpasses OpenAI’s ChatGPT o1 on multiple key benchmarks but operates at a fraction of the cost.

“This could be a truly equalizing breakthrough that is great for researchers and developers with limited resources, especially those from the Global South,” says Hancheng Cao, an assistant professor in information systems at Emory University.

DeepSeek’s success is even more remarkable given the constraints facing Chinese AI companies in the form of increasing US export controls on cutting-edge chips. But early evidence shows that these measures are not working as intended. Rather than weakening China’s AI capabilities, the sanctions appear to be driving startups like DeepSeek to innovate in ways that prioritize efficiency, resource-pooling, and collaboration.

To create R1, DeepSeek had to rework its training process to reduce the strain on its GPUs, a variant Nvidia released for the Chinese market with performance capped at half the speed of its top products, according to Zihan Wang, a former DeepSeek employee and current PhD student in computer science at Northwestern University.

DeepSeek R1 has been praised by researchers for its ability to tackle complex reasoning tasks, particularly in mathematics and coding. The model employs a “chain of thought” approach similar to that used by ChatGPT o1, which lets it solve problems by processing queries step by step.
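
For readers unfamiliar with the technique, here is a minimal sketch of what step-by-step (“chain of thought”) prompting looks like against an OpenAI-compatible chat API; the endpoint and model name below are illustrative assumptions, not details from the article.

    # Chain-of-thought sketch using the OpenAI Python client; the
    # base URL and model name are assumptions for illustration.
    from openai import OpenAI

    client = OpenAI(base_url="https://api.deepseek.com",  # assumed endpoint
                    api_key="YOUR_API_KEY")

    # Ask the model to reason step by step before answering, rather
    # than emitting the final answer directly.
    response = client.chat.completions.create(
        model="deepseek-reasoner",  # assumed model identifier
        messages=[{
            "role": "user",
            "content": ("A train travels 120 km in 90 minutes. What is "
                        "its average speed in km/h? Think through the "
                        "problem step by step, then state the answer."),
        }],
    )
    print(response.choices[0].message.content)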

Dimitris Papailiopoulos, principal researcher at Microsoft’s AI Frontiers research lab, says what surprised him the most about R1 is its engineering simplicity. “DeepSeek aimed for accurate answers rather than detailing every logical step, significantly reducing computing time while maintaining a high level of effectiveness,” he says.
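
The article doesn’t spell out the training recipe, but Papailiopoulos’s description suggests outcome-based scoring, where only the final answer is graded rather than every intermediate step. A toy sketch under that assumption (the helper below is hypothetical, not DeepSeek’s actual training code):

    # Toy outcome-based reward: grade only the final answer and
    # ignore the intermediate reasoning. Hypothetical helper.
    def outcome_reward(model_output: str, reference_answer: str) -> float:
        final_line = model_output.strip().splitlines()[-1]
        return 1.0 if final_line.strip() == reference_answer.strip() else 0.0

    # The chain of thought is not graded, only the conclusion.
    sample = "90 min = 1.5 h\n120 / 1.5 = 80\n80 km/h"
    print(outcome_reward(sample, "80 km/h"))  # 1.0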

DeepSeek has also released six smaller versions of R1, compact enough to run locally on laptops. It claims that one of them even outperforms OpenAI’s o1-mini on certain benchmarks. “DeepSeek has largely replicated o1-mini and has open sourced it,” tweeted Perplexity CEO Aravind Srinivas. DeepSeek did not reply to MIT Technology Review’s request for comment.
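
For anyone wanting to try one of the smaller releases on their own hardware, a minimal sketch using Hugging Face transformers follows; the checkpoint name is an assumption, so check DeepSeek’s model listings for the actual identifiers.

    # Local-inference sketch with Hugging Face transformers. The
    # checkpoint name is assumed; see DeepSeek's model hub page for
    # the real distilled R1 identifiers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed id
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    inputs = tokenizer("What is 17 * 24? Reason step by step.",
                       return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))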

MIT Technology Review [technologyreview.com]


Original Submission