posted by hubie on Sunday February 02, @08:08AM   Printer-friendly

How Chinese AI Startup DeepSeek Made a Model that Rivals OpenAI

https://www.wired.com/story/deepseek-china-model-ai/
https://web.archive.org/web/20250125102155/https://www.wired.com/story/deepseek-china-model-ai/

On January 20, DeepSeek, a relatively unknown AI research lab from China, released an open source model that's quickly become the talk of the town in Silicon Valley. According to a paper authored by the company, DeepSeek-R1 beats the industry's leading models like OpenAI o1 on several math and reasoning benchmarks. In fact, on many metrics that matter—capability, cost, openness—DeepSeek is giving Western AI giants a run for their money.

https://arstechnica.com/ai/2025/01/china-is-catching-up-with-americas-best-reasoning-ai-models/

The releases immediately caught the attention of the AI community because most existing open-weights models—which can often be run and fine-tuned on local hardware—have lagged behind proprietary models like OpenAI's o1 in so-called reasoning benchmarks. Having these capabilities available in an MIT-licensed model that anyone can study, modify, or use commercially potentially marks a shift in what's possible with publicly available AI models.

https://github.com/deepseek-ai/DeepSeek-R1

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

NOTE: Before running DeepSeek-R1 series models locally, we kindly recommend reviewing the Usage Recommendation section.

Check leaderboard and compare at Chatbot Arena: https://lmarena.ai/

China's DeepSeek AI Dethrones ChatGPT on App Store: Here's What You Should Know


Some American tech CEOs are scrambling to respond before clients switch to potentially cheaper offerings from DeepSeek, with Meta reportedly starting four DeepSeek-related "war rooms" within its generative AI department.

Microsoft CEO Satya Nadella wrote on X that the DeepSeek phenomenon was just an example of the Jevons paradox, writing, "As AI gets more efficient and accessible, we will see its use skyrocket, turning it into a commodity we just can't get enough of." OpenAI CEO Sam Altman tweeted a quote he attributed to Napoleon, writing, "A revolution can be neither made nor stopped. The only thing that can be done is for one of several of its children to give it a direction by dint of victories."

Yann LeCun, Meta's chief AI scientist, wrote on LinkedIn that DeepSeek's success is indicative of changing tides in the AI sector to favor open-source technology.

LeCun wrote that DeepSeek has profited from some of Meta's own technology, i.e., its Llama models, and that the startup "came up with new ideas and built them on top of other people's work. Because their work is published and open source, everyone can profit from it. That is the power of open research and open source."

Alexandr Wang, CEO of Scale AI, told CNBC last week that DeepSeek's last AI model was "earth-shattering" and that its R1 release is even more powerful.

"What we've found is that DeepSeek ... is the top performing, or roughly on par with the best American models," Wang said, adding that the AI race between the U.S. and China is an "AI war." Wang's company provides training data to key AI players including OpenAI, Google and Meta.

Earlier this week, President Donald Trump announced a joint venture with OpenAI, Oracle and SoftBank to invest billions of dollars in U.S. AI infrastructure. The project, Stargate, was unveiled at the White House by Trump, SoftBank CEO Masayoshi Son, Oracle co-founder Larry Ellison and OpenAI CEO Sam Altman. Key initial technology partners will include Microsoft, Nvidia and Oracle, as well as semiconductor company Arm. They said they would invest $100 billion to start and up to $500 billion over the next four years.

How a top Chinese AI model overcame US sanctions

An interesting article about the development of DeepSeek R1

The AI community is abuzz over DeepSeek R1, a new open-source reasoning model.

The model was developed by the Chinese AI startup DeepSeek, which claims that R1 matches or even surpasses OpenAI's ChatGPT o1 on multiple key benchmarks but operates at a fraction of the cost.

"This could be a truly equalizing breakthrough that is great for researchers and developers with limited resources, especially those from the Global South," says Hancheng Cao, an assistant professor in information systems at Emory University.

DeepSeek's success is even more remarkable given the constraints facing Chinese AI companies in the form of increasing US export controls on cutting-edge chips. But early evidence shows that these measures are not working as intended. Rather than weakening China's AI capabilities, the sanctions appear to be driving startups like DeepSeek to innovate in ways that prioritize efficiency, resource-pooling, and collaboration.

To create R1, DeepSeek had to rework its training process to reduce the strain on its GPUs, a variant released by Nvidia for the Chinese market whose performance is capped at half the speed of Nvidia's top products, according to Zihan Wang, a former DeepSeek employee and current PhD student in computer science at Northwestern University.

DeepSeek R1 has been praised by researchers for its ability to tackle complex reasoning tasks, particularly in mathematics and coding. The model employs a "chain of thought" approach similar to that used by ChatGPT o1, which lets it solve problems by processing queries step by step.
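As an illustration of what "processing queries step by step" looks like in practice: R1-style reasoning models emit their intermediate chain of thought inside `<think>...</think>` tags before the final answer, so client code typically separates the two. The sketch below (not DeepSeek's own code; the sample response string is hypothetical) shows a minimal way to do that split:

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Split a reasoning model's raw output into (chain of thought, final answer).

    Assumes the model wraps its step-by-step reasoning in <think>...</think>
    tags, as the DeepSeek-R1 family does; anything after the closing tag is
    treated as the final answer.
    """
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if match:
        return match.group(1).strip(), response[match.end():].strip()
    # No tags found: treat the whole response as the answer.
    return "", response.strip()

# Hypothetical model output illustrating the format:
raw = "<think>17 * 3 = 51, and 51 + 9 = 60.</think>The answer is 60."
thought, answer = split_reasoning(raw)
print(thought)  # 17 * 3 = 51, and 51 + 9 = 60.
print(answer)   # The answer is 60.
```

Showing or hiding the `thought` part is purely a presentation choice; only the text after the closing tag is the model's answer.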

Dimitris Papailiopoulos, principal researcher at Microsoft's AI Frontiers research lab, says what surprised him the most about R1 is its engineering simplicity. "DeepSeek aimed for accurate answers rather than detailing every logical step, significantly reducing computing time while maintaining a high level of effectiveness," he says.

DeepSeek has also released six smaller versions of R1 that are small enough to run locally on laptops. It claims that one of them even outperforms OpenAI's o1-mini on certain benchmarks. "DeepSeek has largely replicated o1-mini and has open sourced it," tweeted Perplexity CEO Aravind Srinivas. DeepSeek did not reply to MIT Technology Review's request for comment.

MIT Technology Review


Original Submission #1 | Original Submission #2 | Original Submission #3

This discussion was created by hubie (1068) for logged-in users only.
  • (Score: 5, Insightful) by gnuman on Sunday February 02, @09:37AM (3 children)

    by gnuman (5013) on Sunday February 02, @09:37AM (#1391250)

    How will China get ahead of US? With effort and *because* of the sanctions...

If you want Chinese products to stay behind, you allow them to buy whatever they want. Even top of the line stuff, doesn't matter. Because as long as they are using what others are using, the playing field is more or less level. And people who like to "catch up" by copying things instead of making their own are perpetually behind. Cutting corners by copying is NOT how you get ahead, and there is no danger of someone copying you in order to get ahead; they will always be behind by definition. So, the answer is to drop these idiotic restrictions and allow Chinese companies to buy all they want. Isn't Trump obsessed with the stupid trade deficit anyway?

    If we need analogies, here are some.

Are you afraid of someone copying your test answers because they could "get ahead"? Copying doesn't give you understanding... And even teaching them gives the teacher a better understanding and drives innovation.

Are you afraid of someone competing with your OS via an emulator/simulator/whatever, like Wine? Or is it only when someone makes a competing system that you lose?

    Sell the Chinese almost everything they want, because those are known quantities. They are not inventing anything this way. You only lose if you stop inventing new things. Forcing China to invent new things is exactly how you produce a competitor that may just out-compete you. And when you start copying the Chinese, you know you are behind and will never catch up.

    • (Score: 2) by c0lo on Sunday February 02, @09:49AM

      by c0lo (156) Subscriber Badge on Sunday February 02, @09:49AM (#1391251) Journal

      You only lose if you stop inventing new things.

Well, playing NYC realtor with your friends and allies is an invention in the world of geopolitics; I don't know, however, whether it will keep you from losing anyway.

In other words, inventing is necessary to avoid losing, but not sufficient. That seems to be the case for both China and the US ATM.

      --
      https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
    • (Score: 3, Interesting) by Mojibake Tengu on Sunday February 02, @11:03AM

      by Mojibake Tengu (8598) on Sunday February 02, @11:03AM (#1391253) Journal

American sanctions were not only completely useless, they also came far too late.

DeepSeek runs its training on Huawei Atlas servers: Kunpeng architecture, with an 8x8 array of Huawei Ascend 910 accelerators (4 NPU cores each) per rack drawer.
For serious industrial inference, the weaker Ascend 310 is more than enough.

Those machines are actually a 2020 design, at least according to marketing papers I have archived since then. I vaguely remember already mentioning them here on SoylentNews years ago, in a note about SUSE not following the sanctions and making a base Linux system for Kunpeng.
The Chinese (Alibaba, Tencent, Youku) used those machines to render dozens of Donghua (Chinese 3D anime) series and movies throughout the Covid-19 period.

What DeepSeek actually did anew is that its people took those (old?) machines and started to use them for some serious stuff.

That means the sanctions on NVIDIA exports were completely useless; they only damaged NVIDIA's market value (and its dumb investors) and revealed the fun fact that the CIA understands nothing about current Chinese technology.

      --
      Rust programming language offends both my Intelligence and my Spirit.
    • (Score: 3, Insightful) by Anonymous Coward on Sunday February 02, @01:01PM

      by Anonymous Coward on Sunday February 02, @01:01PM (#1391255)

      > How will China get ahead of US? With effort and *because* of the sanctions...

I agree with your argument; it's the same in motor racing, where the winning margin is often tiny ("in the noise") and the temptation to copy what others are doing is high. There is even a good dose of superstition, of completely non-scientific reasoning, on the part of the "follower" teams. The best teams (imo) don't do a lot of copying and are at the top because they do the work and, perhaps most importantly, do their own thinking.

      The place where your argument may fall apart is that you haven't taken the last step in:
      1. --
      2. --
      3. ?????
      4. Profit!
      Here in the US, profit rarely counts unless it's in the most recent quarter. Long term performance seems to mean next to nothing to the MBA class. This short term thinking is what promotes short term solutions like the sanctions.

  • (Score: 1, Interesting) by Anonymous Coward on Monday February 03, @01:08PM

    by Anonymous Coward on Monday February 03, @01:08PM (#1391363)

    DeepSeek, a relatively unknown AI research lab from China, released an open source model

    Having these capabilities available in an MIT-licensed model that anyone can study, modify, or use commercially potentially marks a shift in what's possible with publicly available AI models.

    DeepSeek has largely replicated o1-mini and has open sourced it

    I like this title: https://gizmodo.com/openai-claims-deepseek-plagiarized-its-plagiarism-machine-2000556339 [gizmodo.com]

    OpenAI Claims DeepSeek Plagiarized Its Plagiarism Machine

    🤣

    I actually do have some content on the Internet (youtube, code etc). If my content is going to be plagiarized anyway, I think I'd rather it become open-sourced and make life more difficult for the corporations and billionaires trying to charge subscriptions for plagiarized stuff.

    p.s. To those who keep saying "AIs are just doing what humans are doing". Nope, wrong. Most people don't redistribute the content to the public (much less try to charge subscriptions for it). There are lots of people with very good memories. Few authors have problems with them reading books and remembering everything. BUT if they write the stuff down from memory and redistribute it to the public without proper licensing/permission they could still be liable for copyright infringement. And in many cases it doesn't have to be a 100% copy for it to be infringement. In at least one case it's photo vs sculpture: https://en.wikipedia.org/wiki/Rogers_v._Koons [wikipedia.org]
