Title: DeepDive into DeepSeek
Date: Sunday February 02, @08:08AM
Author: hubie
from the dept.
https://www.wired.com/story/deepseek-china-model-ai/
https://web.archive.org/web/20250125102155/https://www.wired.com/story/deepseek-china-model-ai/
On January 20, DeepSeek, a relatively unknown AI research lab from China, released an open source model that's quickly become the talk of the town in Silicon Valley. According to a paper authored by the company, DeepSeek-R1 beats the industry's leading models like OpenAI o1 on several math and reasoning benchmarks. In fact, on many metrics that matter—capability, cost, openness—DeepSeek is giving Western AI giants a run for their money.
https://arstechnica.com/ai/2025/01/china-is-catching-up-with-americas-best-reasoning-ai-models/
The releases immediately caught the attention of the AI community because most existing open-weights models—which can often be run and fine-tuned on local hardware—have lagged behind proprietary models like OpenAI's o1 in so-called reasoning benchmarks. Having these capabilities available in an MIT-licensed model that anyone can study, modify, or use commercially potentially marks a shift in what's possible with publicly available AI models.
https://github.com/deepseek-ai/DeepSeek-R1
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
NOTE: Before running DeepSeek-R1 series models locally, we kindly recommend reviewing the Usage Recommendation section.
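For readers who want to try one of the distilled checkpoints on their own hardware, here is a minimal sketch using the Hugging Face transformers library. The model id, the test question, and the sampling settings (roughly temperature 0.6 with no system prompt, following the repository's usage recommendations) are illustrative assumptions, not official usage instructions.

    # Minimal local-inference sketch, not the project's official usage guide.
    # Assumptions: the model id names one of the distilled R1 checkpoints on
    # Hugging Face; sampling at ~0.6 temperature with no system prompt follows
    # the repository's stated recommendations.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # smallest distilled variant
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # Reasoning models are typically given the problem directly, with no system prompt.
    messages = [{"role": "user", "content": "How many primes are there between 10 and 30?"}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(input_ids, max_new_tokens=1024, do_sample=True, temperature=0.6)
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))

The larger distilled variants (up to 32B and 70B parameters) can be loaded the same way, given enough memory, and the full R1 model is available under the same MIT license.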
Check the leaderboard and compare models at Chatbot Arena: https://lmarena.ai/
China's DeepSeek AI dethrones ChatGPT on App Store: Here's what you should know:
Some American tech CEOs are scrambling to respond before clients switch to potentially cheaper offerings from DeepSeek, with Meta reportedly setting up four DeepSeek-related "war rooms" within its generative AI department.
Microsoft CEO Satya Nadella wrote on X that the DeepSeek phenomenon was just an example of the Jevons paradox, writing, "As AI gets more efficient and accessible, we will see its use skyrocket, turning it into a commodity we just can't get enough of." OpenAI CEO Sam Altman tweeted a quote he attributed to Napoleon, writing, "A revolution can be neither made nor stopped. The only thing that can be done is for one of several of its children to give it a direction by dint of victories."
Yann LeCun, Meta's chief AI scientist, wrote on LinkedIn that DeepSeek's success is indicative of changing tides in the AI sector to favor open-source technology.
LeCun wrote that DeepSeek has profited from some of Meta's own technology, i.e., its Llama models, and that the startup "came up with new ideas and built them on top of other people's work. Because their work is published and open source, everyone can profit from it. That is the power of open research and open source."
Alexandr Wang, CEO of Scale AI, told CNBC last week that DeepSeek's last AI model was "earth-shattering" and that its R1 release is even more powerful.
"What we've found is that DeepSeek ... is the top performing, or roughly on par with the best American models," Wang said, adding that the AI race between the U.S. and China is an "AI war." Wang's company provides training data to key AI players including OpenAI, Google and Meta.
Earlier this week, President Donald Trump announced a joint venture with OpenAI, Oracle and SoftBank to invest billions of dollars in U.S. AI infrastructure. The project, Stargate, was unveiled at the White House by Trump, SoftBank CEO Masayoshi Son, Oracle co-founder Larry Ellison and OpenAI CEO Sam Altman. Key initial technology partners will include Microsoft, Nvidia and Oracle, as well as semiconductor company Arm. They said they would invest $100 billion to start and up to $500 billion over the next four years.
An interesting article about the development of DeepSeek R1:
The AI community is abuzz over DeepSeek R1, a new open-source reasoning model.
The model was developed by the Chinese AI startup DeepSeek, which claims that R1 matches or even surpasses OpenAI's ChatGPT o1 on multiple key benchmarks but operates at a fraction of the cost.
"This could be a truly equalizing breakthrough that is great for researchers and developers with limited resources, especially those from the Global South," says Hancheng Cao, an assistant professor in information systems at Emory University.
DeepSeek's success is even more remarkable given the constraints facing Chinese AI companies in the form of increasing US export controls on cutting-edge chips. But early evidence shows that these measures are not working as intended. Rather than weakening China's AI capabilities, the sanctions appear to be driving startups like DeepSeek to innovate in ways that prioritize efficiency, resource-pooling, and collaboration.
To create R1, DeepSeek had to rework its training process to reduce the strain on its GPUs, a variety released by Nvidia for the Chinese market that have their performance capped at half the speed of its top products, according to Zihan Wang, a former DeepSeek employee and current PhD student in computer science at Northwestern University.
DeepSeek R1 has been praised by researchers for its ability to tackle complex reasoning tasks, particularly in mathematics and coding. The model employs a "chain of thought" approach similar to that used by ChatGPT o1, which lets it solve problems by processing queries step by step.
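To make the step-by-step behavior concrete, here is a small sketch of separating a model's reasoning trace from its final answer. It assumes the reasoning is wrapped in <think>...</think> tags, as the released R1 chat template does; the helper function and the example completion are illustrative, not taken from the article.

    # Minimal sketch: split an R1-style completion into its chain-of-thought
    # and final answer. Assumes the reasoning is delimited by <think>...</think>
    # tags (an assumption based on the released chat template, not the article).
    import re

    def split_reasoning(completion: str) -> tuple[str, str]:
        """Return (reasoning, answer) from a raw completion string."""
        match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
        if match is None:
            return "", completion.strip()
        return match.group(1).strip(), completion[match.end():].strip()

    reasoning, answer = split_reasoning(
        "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408</think>The answer is 408."
    )
    print(reasoning)
    print(answer)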
Dimitris Papailiopoulos, principal researcher at Microsoft's AI Frontiers research lab, says what surprised him the most about R1 is its engineering simplicity. "DeepSeek aimed for accurate answers rather than detailing every logical step, significantly reducing computing time while maintaining a high level of effectiveness," he says.
DeepSeek has also released six smaller versions of R1 that are small enough to run locally on laptops. It claims that one of them even outperforms OpenAI's o1-mini on certain benchmarks. "DeepSeek has largely replicated o1-mini and has open sourced it," tweeted Perplexity CEO Aravind Srinivas. DeepSeek did not reply to MIT Technology Review's request for comment.
Original Submission #1 Original Submission #2 Original Submission #3