
SoylentNews is people

posted by takyon on Tuesday November 13 2018, @09:26PM
from the good-on-paper dept.

Naples, Rome, Milan, Zen 4: An Interview with AMD CTO, Mark Papermaster

The goal of AMD's event in the middle of the fourth quarter of the year was to put into perspective two elements of AMD's strategy: firstly, its commitment to delivering a 7nm Vega based product by the end of the year, as the company promised in early 2018, but also to position its 7nm capabilities as some of the best by disclosing the layout of its next generation enterprise processor set to hit shelves in 2019. [...] We sat down with AMD's CTO, Mark Papermaster, to see if we could squeeze some of the finer details about both AMD's strategy and the finer points of some of the products from the morning sessions.

[...] Ian Cutress: Forrest explained on the stage that the datacenter of today is very different to the datacenter ten years ago (or even 3-5 years ago). What decisions are you making today to predict the datacenter of the future?

Mark Papermaster: We believe we will be positioned very well – it all ties back to my opening comments on Moore's Law. We all accept that the traditional Moore's Law is slowing down, and that while process does still matter you have to be agile about how you put the pieces together, otherwise you cannot win. We leveraged ourselves to have scalability in our first EPYC launch. We leveraged our ability in our chiplet approach here to combine really small 7nm CPU dies with tried and proven 14nm for the IO die. That modularity only grows in importance going forward. We've stated our case as to where we believe it is necessary to keep pace on a traditional Moore's Law growth despite the slowing of the process gains per node and the length of time between major semiconductor nodes. I think you'll see others adopt what we've done with the chiplet approach, and I can tell you we are committed.

[...] IC: Where does Rome sit with CCIX support?

MP: We didn't announce specifically those attributes beyond PCIe 4.0 today, but I can say we are a member of CCIX as we are with Gen Z. Any further detail there you will have to wait until launch. Any specific details about the speeds, feeds, protocols, are coming in 2019.

IC: There have been suggestions that because AMD is saying that Rome is coming in 2019 then that means Q4 2019.

MP: We're not trying to imply any specific quarter or time frame in 2019. If we look at today's event, it was timed to launch our MI60 GPU in 7nm which is imminent. We wanted to really share with the industry how we've embraced 7nm, and preview what's coming out very soon with MI60, and really share our approach on CPU on Zen 2 and Rome. We're not implying any particular time in 2019, but we'll be forthcoming with that. Even though the GPU is PCIe 3.0 backwards compatible, it helps for a PCIe 4.0 GPU to have a PCIe 4.0 CPU to connect to!

[...] IC: One of the key aspects in AMD's portfolio is the Infinity Fabric, and with Rome you have stated that AMD is now on its second generation IF. Do you see an end in its ability to scale down in process node but also scale out to more chiplets and different IP?

MP: I don't see an end because the IF is made of both a Scalable Data Fabric and a Scalable Control Fabric. The SCF is the key to giving the modularity and that's an architectural product. With our SDF we are very confident on the protocols we developed. The SCF protocols are based on the rich history we have with HyperTransport and we are committed in it generationally to improve bandwidth and latency every generation. IF is important when it applies to on chip connectivity, but it can go chip to chip like we did with EPYC, and also with Vega Radeon Instinct in connecting GPU to GPU. For the chip to chip IF, you are also dependent on the package technology. We see tremendous improvements in package technology over the next five years.

See also: AMD Shows Off "Rome" Data Center CPU, Signs Amazon as Cloud Chip Customer

Previously: AMD Previews Zen 2 Epyc CPUs with up to 64 Cores, New "Chiplet" Design



Related Stories

AMD Previews Zen 2 Epyc CPUs with up to 64 Cores, New "Chiplet" Design 9 comments

AMD has announced the next generation of its Epyc server processors, with up to 64 cores (128 threads) each. Instead of an 8-core "core complex" (CCX), AMD's 64-core chips will feature 8 "chiplets" with 8 cores each:

AMD on Tuesday formally announced its next-generation EPYC processor code-named Rome. The new server CPU will feature up to 64 cores featuring the Zen 2 microarchitecture, thus providing at least two times higher performance per socket than existing EPYC chips.

As discussed in a separate story covering AMD's new 'chiplet' design approach, AMD EPYC 'Rome' processor will carry multiple CPU chiplets manufactured using TSMC's 7 nm fabrication process as well as an I/O die produced at a 14 nm node. As it appears, high-performance 'Rome' processors will use eight CPU chiplets offering 64 x86 cores in total.

Why chiplets?

Separating CPU chiplets from the I/O die has its advantages because it enables AMD to make the CPU chiplets smaller as physical interfaces (such as DRAM and Infinity Fabric) do not scale that well with shrinks of process technology. Therefore, instead of making CPU chiplets bigger and more expensive to manufacture, AMD decided to incorporate DRAM and some other I/O into a separate chip. Besides lower costs, the added benefit that AMD is going to enjoy with its 7 nm chiplets is ability to easier[sic] bin new chips for needed clocks and power, which is something that is hard to estimate in case of servers.

AMD also announced that Zen 4 is under development. It could be made on a "5nm" node, although that is speculation. The Zen 3 microarchitecture will be made on TSMC's N7+ process ("7nm" with more extensive use of extreme ultraviolet lithography).

AMD's Epyc CPUs will now be offered on Amazon Web Services.

AnandTech live blog of New Horizon event.

Previously: AMD Epyc 7000-Series Launched With Up to 32 Cores
TSMC Will Make AMD's "7nm" Epyc Server CPUs
Intel Announces 48-core Xeons Using Multiple Dies, Ahead of AMD Announcement

Related: Cray CS500 Supercomputers to Include AMD's Epyc as a Processor Option
Oracle Offers Servers with AMD's Epyc to its Cloud Customers



This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 2) by bzipitidoo on Tuesday November 13 2018, @11:10PM (13 children)

    by bzipitidoo (4388) Subscriber Badge on Tuesday November 13 2018, @11:10PM (#761493) Journal

    I've read that the Spectre bug won't be completely fixed on Zen+ AMD CPUs, have to wait until Zen 2.

    The way manufacturers have been handling this issue is sadly all too reminiscent of the infamous Pentium division bug. Claiming that it's not all that important, that their lame microcode bandages are good enough fixes, and it doesn't need to be completely fixed, etc. I wonder if AMD will keep their word about having Spectre completely fixed in Zen 2.

    I would certainly like a Zen based PC, preferably with AV1 decoding in hardware as well as the Vega graphics. But, only if there are good open graphics drivers for Vega. And I'd like fixes for all conceivable variants of Spectre.

    • (Score: 3, Interesting) by RamiK on Wednesday November 14 2018, @12:16AM (11 children)

      by RamiK (1813) on Wednesday November 14 2018, @12:16AM (#761523)

      If you're holding off for a speculative execution vulnerability fix, you should give up now. After Meltdown and Spectre there were a dozen other disclosures of varying natures and severity with some requiring firmware fixes and future hardware redesigns. As soon as they patch the L$, a TLB issue pops up. When they're done with that, something at the RAM level shows up...

      Having said that, waiting for PCIe 4 and the vega discretes that's to follow might be worth it. Also, looking at the ZhongShan Subor Z+, I'm quite curious about their next generation of APUs especially in laptops.

      --
      compiling...
      • (Score: 2) by takyon on Wednesday November 14 2018, @12:56AM (6 children)

        by takyon (881) <reversethis-{gro ... s} {ta} {noykat}> on Wednesday November 14 2018, @12:56AM (#761537) Journal

        I'm quite curious about their next generation of APUs especially in laptops.

        These have been pretty neglected, with Raven Ridge coming out much later than desktop parts, and with fewer cores.

        I'm looking forward to 6-8 cores (a single "chiplet") for laptop APUs, and maybe something on the sub 5 Watt end that could be used in fanless Chromebooks/tablets. I don't want to feed Intel any more $$$, and getting AMD to compete in all segments would make that a lot easier.

        --
        [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
        • (Score: 2) by RamiK on Wednesday November 14 2018, @01:43PM (5 children)

          by RamiK (1813) on Wednesday November 14 2018, @01:43PM (#761725)

          Personally I don't see the point of an x86 at that performance range. That kind of hardware can't do any useful client-side tasks other than simple spreadsheets and word processing and that much I can do with android as well.

          Moreover, the few things it can do, ARM does well as well.

          That leaves development and gaming. The former I've already relegated to server-side and could basically stick anything in there if it has the core count and RAM. For the latter any recent mid-range graphics card will satisfy considering the little I game is pixel art titles and that isn't about to change until some VR revolution will suck $10k out of my pocket in half a decade.

          So, personally, a good APU is what I'm after.

          --
          compiling...
          • (Score: 2) by takyon on Wednesday November 14 2018, @04:06PM (3 children)

            by takyon (881) <reversethis-{gro ... s} {ta} {noykat}> on Wednesday November 14 2018, @04:06PM (#761766) Journal

            Personally I don't see the point of an x86 at that performance range. That kind of hardware can't do any useful client-side tasks other than simple spreadsheets and word processing and that much I can do with android as well.

            I prefer having the larger screen and keyboard. The devices are cheaper than the top-end Android smartphones that would rival them. If I was able to get an AMD APU Chromebook, it could potentially perform a lot better than what I have now.

            You wait 5 years or so, and any particular form factor is going to perform a lot better than it used to. My chip is from mid-2014 [notebookcheck.net]. I'll probably hold out until 2021 for hardware AV1 support and "7nm" with use of EUV.

            --
            [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
            • (Score: 2) by RamiK on Wednesday November 14 2018, @06:41PM (2 children)

              by RamiK (1813) on Wednesday November 14 2018, @06:41PM (#761835)

              I prefer having the larger screen and keyboard.

              I think you can attach a powered usb hub via an otg adapter to most smartphones and get the keyboard and mouse going that way. For the screen there's MHL cables (usb-to-hdmi) or casting which should suffice for casual word processing and video viewing.

              If I was able to get an AMD APU Chromebook, it could potentially perform a lot better than what I have now.

              The price will be at the Pixel range I'm afraid. Closest anyone got to that market was Intel with their Atom lines a few years ago. Apparently they gave up after not breaking even and only stuck to it to prevent ARM from entering the low-end laptop segment.

              You wait 5 years or so, and any particular form factor is going to perform a lot better than it used to.

              Not tablets and laptops. Not for the same price at the very least. Well, unless you consider putting Windows 10 in an 8" tablet form an improvement... I, for one, don't.

              Btw, case in point regarding why you shouldn't hold off for a speculative vulnerabilities fix: https://www.zdnet.com/article/researchers-discover-seven-new-meltdown-and-spectre-attacks/ [zdnet.com]

              --
              compiling...
              • (Score: 2) by takyon on Wednesday November 14 2018, @07:45PM (1 child)

                by takyon (881) <reversethis-{gro ... s} {ta} {noykat}> on Wednesday November 14 2018, @07:45PM (#761870) Journal

                My CB3-111 [notebookcheck.net] cost about $95 on Black Friday. It gets what I need it to do done and is quite portable. It has a dual-core Intel Celeron N2840, which is a 7.5 Watt Intel Atom chip from mid-2014. Intel has continued to update this line of chips, and the Intel Celeron N4000 from late 2017 is about 40% faster at a lower TDP and base clock. A pretty big improvement for less than 3 years.

                Considering that AMD has put out plenty of 15 Watt A4-series APUs for cheap laptops, I'm sure they could create a 6-7.5 Watt one intended for fanless laptops, using the improved power efficiency of the TSMC "7nm" process.

                Oh, and I said nothing about holding off for speculative execution fixes.

                --
                [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
                • (Score: 2) by RamiK on Wednesday November 14 2018, @10:56PM

                  by RamiK (1813) on Wednesday November 14 2018, @10:56PM (#761954)

                  It gets what I need it to do done and is quite portable...the Intel Celeron N4000 from late 2017 is about 40% faster at a lower TDP and base clock. A pretty big improvement for less than 3 years.

                  First of all, Black Friday or otherwise, the price doubled in-between the models so that's apples to oranges right there.

                  Secondly, it's not a big improvement. It's just scaling production nodes. In fact, it's quite poor compared to ARM's progress between 2014 and 2017.

                  Thirdly, the functionality neither increased nor expanded so it's hard to claim an improvement to the form factor. I mean, it's not like the machine suddenly stopped being a word processor and started being capable of some real world practical usage the previous iteration couldn't pull off. It won't edit videos. It won't run games. And considering web pages just got worse javascript and assets wise, I'd argue that 40% figure might actually not be enough to keep up and the real world usage experience has only gotten worse.

                  Even Apple did better with their latest tablet by comparison: Their price is as ridiculous as ever but now it actually has the horsepower to substitute for laptops where it previously couldn't.

                  --
                  compiling...
          • (Score: 2) by takyon on Wednesday November 14 2018, @06:19PM

            by takyon (881) <reversethis-{gro ... s} {ta} {noykat}> on Wednesday November 14 2018, @06:19PM (#761824) Journal

            I forgot to add: It would be nice to see more APUs with added High Bandwidth Memory.

            --
            [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
      • (Score: 3, Interesting) by bzipitidoo on Wednesday November 14 2018, @12:46PM (3 children)

        by bzipitidoo (4388) Subscriber Badge on Wednesday November 14 2018, @12:46PM (#761717) Journal

        Well, yes, it looks impractical to hold off for Spectre fixes. Currently, if you want to be completely immune to Spectre, have to dig out 25 year old Pentiums, the last CPUs that did not have speculative execution. Those are of course impractically slow by today's standards. And as you say there are plenty of other bugs, like the hyperthreading problem in Skylake and Kaby Lake processors that was revealed a few months before news about Spectre broke.

        > waiting for PCIe 4 and the vega discretes that's to follow might be worth it.

        DDR5 would be nice to have as well.

        • (Score: 2) by RamiK on Wednesday November 14 2018, @02:00PM (2 children)

          by RamiK (1813) on Wednesday November 14 2018, @02:00PM (#761730)

          DDR5 would be nice to have as well.

          Yup.

          Currently, if you want to be completely immune to Spectre, have to dig out 25 year old Pentiums

          Itanium wasn't out-of-order so it shouldn't be affected by speculative execution vulnerabilities. MIPS is in-order too. I'm sure if you'd look hard enough you could find hardware equivalent to 10yr/old Intel CPUs that can work as a simple desktop or even a server. But nothing in production since most of those companies switched to ASIC cryptomining a few years ago.

          --
          compiling...
          • (Score: 2) by bzipitidoo on Thursday November 15 2018, @05:48AM (1 child)

            by bzipitidoo (4388) Subscriber Badge on Thursday November 15 2018, @05:48AM (#762060) Journal

            Out of order execution alone isn't speculative execution. Speculative execution is executing both continuations after a branch, then discarding whichever one was not taken. If no branch is involved, everything will eventually be executed in whatever order the CPU logic determines is best, but it will all be executed, there's no speculation.

            The root of the problem is that the checks for permission to access any memory that might be involved are performed after speculating, when it should be performed before doing any speculation. Obviously it's a big performance boost to delay such checks. Evidently, the designers thought they could get away with not ever having to make those checks at all if the code is not ultimately on the execution path. It's kind of like just letting someone log in and have a few seconds use of an account, before checking the password, then kicking them out if the password is wrong.
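The login analogy above can be made concrete with a toy model. This is only a sketch in Python, not a real exploit and not how any actual CPU is built: `ToyCPU`, its `cached_lines` set, and both load methods are hypothetical names invented for illustration. The point it shows is that if a dependent access leaves a trace in the cache before the permission check runs, discarding the architectural result does not erase that trace.

```python
class ToyCPU:
    """Toy model of a load whose permission check may come too late.

    All names here are hypothetical; real hardware is far more involved.
    """

    def __init__(self, memory, protected):
        self.memory = memory        # address -> small integer value
        self.protected = protected  # addresses user code may not read
        self.cached_lines = set()   # probe-array lines currently "in cache"

    def load_late_check(self, addr):
        # Speculate first: read the value and touch a cache line chosen by it.
        value = self.memory[addr]
        self.cached_lines.add(value)
        # Check permission only afterwards and discard the visible result.
        if addr in self.protected:
            return None             # the cache side effect above remains
        return value

    def load_early_check(self, addr):
        # Check permission before doing any work that depends on the value.
        if addr in self.protected:
            return None
        value = self.memory[addr]
        self.cached_lines.add(value)
        return value


def probe_cache(cpu):
    """Stand-in for a timing probe: report which lines ended up cached."""
    return sorted(cpu.cached_lines)


leaky = ToyCPU({0: 7, 1: 42}, protected={1})
leaky.load_late_check(1)            # returns None, yet line 42 is now cached

safe = ToyCPU({0: 7, 1: 42}, protected={1})
safe.load_early_check(1)            # returns None and leaves no trace
```

In this model `probe_cache(leaky)` reveals the protected value while `probe_cache(safe)` stays empty, which is the difference between checking after and checking before the speculative work.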

            • (Score: 2) by RamiK on Thursday November 15 2018, @05:34PM

              by RamiK (1813) on Thursday November 15 2018, @05:34PM (#762256)

              Out of order execution alone isn't speculative execution...

              That's irrelevant since the cache is unaffected by the mispredicts if you're waiting for the previous instruction to complete between dispatch and issue as you do in an in-order machine. That is, regardless if you're exploiting out-of-order execution (Meltdown) or speculative execution (Spectre), you still need the machine to be out-of-order.

              --
              compiling...
    • (Score: 0) by Anonymous Coward on Wednesday November 14 2018, @06:22PM

      by Anonymous Coward on Wednesday November 14 2018, @06:22PM (#761825)

      it would be nice if these spyware producers would let you buy a chip without their backdoors too. if everyone would just wait a year or two to buy any newish product and tell amd why, they would quit that shit real quick. too bad everyone makes excuses and willingly funds their own prison.

  • (Score: 0) by Anonymous Coward on Tuesday November 13 2018, @11:14PM (3 children)

    by Anonymous Coward on Tuesday November 13 2018, @11:14PM (#761494)

    After playing with the 2990wx for a bit I've discovered that even if you have 128 GB ram, you start running into lots of memory issues when you get up to 60 threads. Each task is then going to be limited to 1-2 gb, which will often mean you need to play games to get it all to fit or cut the cores.

    Now I have read in different places that the actual limit is 2 TB, 1 TB, or 256 GB. But I am pretty sure 128 GB is going to be the biggest kit usually available for a non-server. With 64 threads (or the upcoming 128) I can definitely see needing 1-2 TB of ram to get the best use from it.

    • (Score: 0) by Anonymous Coward on Tuesday November 13 2018, @11:22PM

      by Anonymous Coward on Tuesday November 13 2018, @11:22PM (#761496)

      It would apparently be ~$24k per TB, assuming there is motherboard and cpu support:
      https://www.amazon.com/MemoryMasters-Supermicro-MEM-DR412L-HL01-LR26-1x128GB-Reduced/dp/B07HKGGZDX/ [amazon.com]

    • (Score: 2) by takyon on Wednesday November 14 2018, @12:51AM (1 child)

      by takyon (881) <reversethis-{gro ... s} {ta} {noykat}> on Wednesday November 14 2018, @12:51AM (#761531) Journal

      https://www.anandtech.com/show/13124/the-amd-threadripper-2990wx-and-2950x-review [anandtech.com]

      For 32 cores, AMD takes the same 32-core EPYC silicon, but upgrades it to Zen+ on 12nm for a higher frequency and lower power. However, to make it socket compatible with the first generation, it is slightly neutered: we have to go back to four memory channels and 60 lanes of PCIe. AMD wants users to think of this as an upgraded first generation product, with more cores, rather than a cut enterprise part. The easy explanation is to do with product segmentation, a tactic both companies have used over time to offer a range of products.

      As a result, one way of visioning the new second generation 32-core and 24-core products is bi-modal: half the chip has access to the full resources, similar to the first generation product, while the other half of the chip doubles the same compute resources but has additional memory and PCIe latency compared to the first half. For any user that is entirely compute bound, and not memory or PCIe bound, then AMD has the product for you.

      --
      [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
      • (Score: 1, Informative) by Anonymous Coward on Wednesday November 14 2018, @01:12AM

        by Anonymous Coward on Wednesday November 14 2018, @01:12AM (#761540)

        I'm not even talking about that. I am saying that when you want to do a bunch of stuff in parallel you need to allocate memory to each individual process since they don't see what each other sees (for the most part: with UNIX, ie non-windows, systems you can use fork() to let each process share the same ram as long as nothing changed from the fork point).
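A minimal sketch of the fork() point, assuming a Unix system and Python: the helper name `sum_in_forked_child` is made up for illustration. The child reads data the parent built before the fork without receiving an explicit copy; the kernel shares those pages copy-on-write until one side writes to them.

```python
import os
import struct


def sum_in_forked_child(data):
    """Fork a child that sums `data` from the inherited address space."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()                      # Unix only
    if pid == 0:
        # Child: `data` is visible here although nothing was sent to us.
        status = 1
        try:
            os.close(read_fd)
            os.write(write_fd, struct.pack("q", sum(data)))
            status = 0
        finally:
            os._exit(status)             # never fall through into parent code
    # Parent: collect the 8-byte result and reap the child.
    os.close(write_fd)
    payload = os.read(read_fd, 8)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return struct.unpack("q", payload)[0]
```

One caveat for CPython specifically: merely reading an object updates its reference count, which writes to the page holding it, so shared pages can still be copied over time. Large buffers such as NumPy arrays or `bytes` objects tend to share better than lists of small Python objects.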

        So let's say your task needs to manipulate 3 gb of data, to do it in serial would take only that 3 gb but in parallel will be nCores*3 gb. So you won't be using more than ~40 threads for that (if limited to 128 gb of ram).
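The arithmetic behind that "~40 threads" figure can be sketched as follows. The function name and the `reserved_gb` parameter (RAM set aside for the OS and everything else) are assumptions for illustration, not anything the poster specified.

```python
def max_parallel_workers(total_ram_gb, per_worker_gb, reserved_gb=0.0):
    """How many independent workers fit if each needs its own copy of the data."""
    if per_worker_gb <= 0:
        raise ValueError("per_worker_gb must be positive")
    usable = total_ram_gb - reserved_gb
    return max(0, int(usable // per_worker_gb))


# 128 GB of RAM, 3 GB per worker, nothing held back.
no_reserve = max_parallel_workers(128, 3)

# Same machine with a hypothetical 8 GB kept free for the OS.
with_reserve = max_parallel_workers(128, 3, reserved_gb=8)
```

With nothing reserved the limit is 42 workers, and with 8 GB held back it is 40, both well short of the 64 threads the 2990WX offers. Run the other way, 128 GB spread over 60 threads is about 2.1 GB each, which matches the "1-2 gb" per task mentioned upthread.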

        As an example I just ran something with 48 cores that took ~100 gb of ram in 119 min. Then I tested 32 cores on the same thing, which took about ~60 gb in 137 min.
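Those two runs can be turned into a rough scaling estimate. This is a sketch only: the function name is hypothetical, and two data points from one workload say little about scaling in general.

```python
def scaling_summary(cores_a, minutes_a, cores_b, minutes_b):
    """Compare a run on cores_a with a run on cores_b (cores_b > cores_a)."""
    speedup = minutes_a / minutes_b      # observed gain from the extra cores
    ideal = cores_b / cores_a            # gain if scaling were perfectly linear
    return {
        "speedup": speedup,
        "ideal": ideal,
        "efficiency": speedup / ideal,   # fraction of the ideal gain achieved
    }


# Figures from the comment: 32 cores in 137 min, 48 cores in 119 min.
summary = scaling_summary(32, 137, 48, 119)

# Approximate memory per worker in each run, in GB.
gb_per_worker_48 = 100 / 48
gb_per_worker_32 = 60 / 32
```

Going from 32 to 48 cores gave roughly a 1.15x speedup against an ideal of 1.5x, so about three quarters of the linear gain, while memory use rose from about 1.9 GB to about 2.1 GB per worker.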

  • (Score: 2) by takyon on Wednesday November 14 2018, @12:53AM (3 children)

    by takyon (881) <reversethis-{gro ... s} {ta} {noykat}> on Wednesday November 14 2018, @12:53AM (#761534) Journal
    • (Score: 0) by Anonymous Coward on Wednesday November 14 2018, @02:13AM (2 children)

      by Anonymous Coward on Wednesday November 14 2018, @02:13AM (#761547)

      Holy crap, does that say 64 cores at 235 GHz?

  • (Score: 2) by Bot on Wednesday November 14 2018, @09:52AM

    by Bot (3902) on Wednesday November 14 2018, @09:52AM (#761673) Journal

    So when Rome boards show half the advertised speed, Naples boards refuse to wake up before 11AM and Milan boards emit a nasty smell of plastics, the repairman will reply "Well, what else did you expect?".
    Remember to call the liquid cooled one Venice and watch out for the periodical leaks.

    --
    Account abandoned.