SoylentNews
SoylentNews is people
https://soylentnews.org/

Title    Microsoft and Nvidia Create 105-Layer, 530 Billion Parameter Language Model That Needs 280 A100 GPUs
Date    Tuesday October 12 2021, @09:42PM
Author    martyb
Topic   
from the dept.
https://soylentnews.org/article.pl?sid=21/10/12/2254224

takyon writes:

Microsoft and Nvidia create 105-layer, 530 billion parameter language model that needs 280 A100 GPUs, but it's still biased

Nvidia and Microsoft have teamed up to create the Megatron-Turing Natural Language Generation model, which the duo claims is the "most powerful monolithic transformer language model trained to date".

The AI model has 105 layers, 530 billion parameters, and operates on chunky supercomputer hardware like Selene. By comparison, the vaunted GPT-3 has 175 billion parameters.

"Each model replica spans 280 NVIDIA A100 GPUs, with 8-way tensor-slicing within a node, and 35-way pipeline parallelism across nodes," the pair said in a blog post.

[...] However, the need to operate with languages and samples from the real world meant an old problem with AI reappeared: Bias. "While giant language models are advancing the state of the art on language generation, they also suffer from issues such as bias and toxicity," the duo said.

Related: OpenAI's New Language Generator GPT-3 is Shockingly Good
A College Student Used GPT-3 to Write a Fake Blog Post that Ended Up at the Top of Hacker News
A Robot Wrote This Entire Article. Are You Scared Yet, Human?
OpenAI's Text-Generating System GPT-3 Is Now Spewing Out 4.5 Billion Words a Day


Original Submission

Links

  1. "takyon" - https://soylentnews.org/~takyon/
  2. "Microsoft and Nvidia create 105-layer, 530 billion parameter language model that needs 280 A100 GPUs, but it's still biased" - https://www.zdnet.com/article/microsoft-and-nvidia-create-105-layer-530-billion-parameter-language-model-that-needs-280-a100-gpus-and-its-biased/
  3. "chunky supercomputer hardware" - https://www.zdnet.com/article/how-nvidia-built-selene-the-worlds-seventh-fastest-computer-in-three-weeks/
  4. "vaunted GPT-3" - https://www.zdnet.com/article/what-is-gpt-3-everything-business-needs-to-know-about-openais-breakthrough-ai-language-program/
  5. "blog post" - https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/
  6. "old problem with AI reappeared" - https://www.zdnet.com/article/what-is-bias-in-ai-really-and-why-cant-ai-neutralize-it/
  7. "OpenAI's New Language Generator GPT-3 is Shockingly Good" - https://soylentnews.org/article.pl?sid=20/08/14/012235
  8. "A College Student Used GPT-3 to Write a Fake Blog Post that Ended Up at the Top of Hacker News" - https://soylentnews.org/article.pl?sid=20/08/18/031243
  9. "A Robot Wrote This Entire Article. Are You Scared Yet, Human?" - https://soylentnews.org/article.pl?sid=20/09/10/1659234
  10. "OpenAI's Text-Generating System GPT-3 Is Now Spewing Out 4.5 Billion Words a Day" - https://soylentnews.org/article.pl?sid=21/03/30/0336258
  11. "Original Submission" - https://soylentnews.org/submit.pl?op=viewsub&subid=51906

© Copyright 2025 - SoylentNews, All Rights Reserved

printed from SoylentNews, Microsoft and Nvidia Create 105-Layer, 530 Billion Parameter Language Model That Needs 280 A100 GPUs on 2025-03-24 00:46:27