Qualcomm Launches 48-core Centriq for $1995: Arm Servers for Cloud Native Applications
Following on from the SoC disclosure at Hot Chips, Qualcomm has this week announced the formal launch of its new Centriq 2400 family of Arm-based SoCs for cloud applications. The top processor is a 48-core, Arm v8-compliant design made using Samsung's 10LPE FinFET process, with 18 billion transistors in a 398mm2 design. The cores are 64-bit only, and are grouped into duplexes – pairs of cores with a shared 512KB of L2 cache, and the top end design will also have 60 MB of L3 cache. The full design has 6 channels of DDR4 (supporting up to 768 GB) with 32 PCIe Gen 3.0 lanes and support for Arm TrustZone, all within a TDP of 120W and for $1995.
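The headline figures above can be turned into a few per-core numbers. The following is a minimal Python sketch that uses only the values quoted in the summary; the variable names are illustrative, and the watts-per-core figure is just the SoC-level TDP divided evenly across cores, not a measured per-core power draw.

```python
# Figures quoted in the summary for the top Centriq 2400 part.
CORES = 48
PRICE_USD = 1995
TDP_W = 120
L2_PER_DUPLEX_KB = 512   # one shared L2 per pair of cores
L3_MB = 60

# Cores are grouped in pairs ("duplexes").
duplexes = CORES // 2
total_l2_mb = duplexes * L2_PER_DUPLEX_KB / 1024

# Simple averages; TDP covers the whole SoC, not only the cores.
price_per_core = PRICE_USD / CORES
watts_per_core = TDP_W / CORES
l3_per_core_mb = L3_MB / CORES

print(f"duplexes:        {duplexes}")
print(f"total L2:        {total_l2_mb:.0f} MB")
print(f"price per core:  ${price_per_core:.2f}")
print(f"TDP per core:    {watts_per_core:.2f} W")
print(f"L3 per core:     {l3_per_core_mb:.2f} MB")
```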
We covered the design of Centriq extensively in our Hot Chips overview, including the microarchitecture, security and new power features. What we didn't know were the exact configurations, L3 cache sizes, and a few other minor details. One detail of interest to semiconductor professionals is the confirmation that the chip uses Samsung's 10LPE process, which Qualcomm states gave it 18 billion transistors in a 398mm2 die (45.2MTr/mm2). This was compared to Intel's Skylake XCC chip on 14nm (37.5MTr/mm2, from an Intel talk), but we should also add in Huawei's Kirin 970 on TSMC 10nm (55MTr/mm2).
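The density figure is plain arithmetic: transistor count divided by die area. Here is a minimal Python sketch of that calculation; the function name is illustrative, and the Skylake XCC and Kirin 970 densities are taken as quoted in the text rather than recomputed.

```python
def density_mtr_per_mm2(transistors: float, die_area_mm2: float) -> float:
    """Return transistor density in millions of transistors per mm^2."""
    return transistors / die_area_mm2 / 1e6

# Centriq 2400: 18 billion transistors on a 398 mm^2 die.
centriq = density_mtr_per_mm2(18e9, 398)
print(f"Centriq 2400: {centriq:.1f} MTr/mm2")   # prints 45.2

# Densities quoted in the text for the comparison chips.
quoted = {"Skylake XCC (14nm)": 37.5, "Kirin 970 (TSMC 10nm)": 55.0}
for name, value in quoted.items():
    print(f"Centriq vs {name}: {centriq / value:.2f}x")
```

Density comparisons across foundries and chip types are only a rough guide, since the mix of logic, cache and I/O differs between a server SoC and a mobile SoC.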
Previously: Qualcomm's Centriq 2400 Demoed: A 48-Core ARM SoC for Servers
Related Stories
Qualcomm this month demonstrated its 48-core Centriq 2400 SoC in action and announced that it had started to sample its first server processor with select customers. The live showcase is an important milestone for the SoC because it proves that the part is functional and is on track for commercialization in the second half of next year.
Qualcomm announced plans to enter the server market more than two years ago, in November 2014, but the first rumors about the company's intentions to develop server CPUs emerged long before that. In fact, being one of the largest designers of ARM-based SoCs for mobile devices, Qualcomm was well prepared to move beyond smartphones and tablets. However, while it is not easy to develop a custom ARMv8 processor core and build a server-grade SoC, building an ecosystem around such a chip is even more complicated in a world where ARM-based servers are typically used in isolated cases. From the very start, Qualcomm has been rather serious not only about the processors themselves but also about the ecosystem and support from third parties (Facebook was one of the first companies to support Qualcomm's server efforts). In 2015, Qualcomm teamed up with Xilinx and Mellanox to ensure that its server SoCs are compatible with FPGA-based accelerators and data-center connectivity solutions (the fruits of this partnership will likely emerge in 2018 at best). Then it released a development platform featuring its custom 24-core ARMv8 SoC, which it made available to customers and various partners among ISVs, IHVs and so on. Earlier this year the company co-founded the CCIX consortium to standardize various special-purpose accelerators for data-centers and make certain that its processors can support them. Taking into account all the evangelization and preparation work that Qualcomm has disclosed so far, it is evident that the company is very serious about its server business.
From the hardware standpoint, Qualcomm's initial server platform will rely on the company's Centriq 2400-series family of microprocessors that will be made using a 10 nm FinFET fabrication process in the second half of next year. Qualcomm does not name the exact manufacturing technology, but the timeframe points to either Samsung's performance-optimized 10LPP or TSMC's CLN10FF (keep in mind that TSMC has a lot of experience fabbing large chips and a 48-core SoC is not going to be small). The key element of the Centriq 2400 will be Qualcomm's custom ARMv8-compliant 64-bit core code-named Falkor. Qualcomm has yet to disclose more information about Falkor, but the important thing here is that this core was purpose-built for data-center applications, which means that it will likely be faster than the company's cores used inside mobile SoCs when running appropriate workloads.
Here's an older article about Qualcomm's ARM server efforts.
(Score: 0) by Anonymous Coward on Thursday November 16 2017, @01:24AM (2 children)
And does it contain an unlocked stage0 bootloader, or is it locked down similar to modern OEM Windows systems with something akin to SecureBoot, even though it could have been made fully optional for the plebs but left unlocked/self-configured for security-minded users?
Additionally: How does the TALOS II compare to this chip/system? The prices look to be in a similar ballpark, but this chip has 6 times the cores compared to a dual-processor TALOS II.
(Score: 1, Interesting) by Anonymous Coward on Thursday November 16 2017, @02:38AM
TrustZone isn't like IME. If you are in control of TrustZone, it is a good thing. If some phone vendor uses it to protect DRM, it is a bad thing. I have dev boards where I am in control of TrustZone.
(Score: 3, Informative) by TheRaven on Thursday November 16 2017, @09:03AM
sudo mod me up
(Score: 4, Insightful) by bob_super on Thursday November 16 2017, @01:47AM (10 children)
They run at over 2GHz, decently priced for that market.
As more and more tools get ported to ARM, and more and more people realize that their phones are running apps quite competently on ARM, Intel is potentially going to get a run for its money on the desktop too.
That's an interesting development, after all those years of cheering for all the much better architectures, which got crushed by x86 for lack of SW support...
(Score: 3, Interesting) by takyon on Thursday November 16 2017, @01:51AM
Epyc 2 (AMD x86 Ryzen for servers) could ship with 64 cores.
https://hothardware.com/news/amd-epyc-2-64-cores-128-threads-and-256mb-l3-cache [hothardware.com]
[SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
(Score: 2) by c0lo on Thursday November 16 2017, @02:38AM (5 children)
The "we should also add in Huawei's Kirin 970 on TSMC 10nm" at the end of TFS.
Made me curious and I got this [digitaltrends.com]:
- the designer - Huawei [wikipedia.org] - Chinese. Largest telecommunications equipment manufacturer in the world.
- the foundry - TMSC [wikipedia.org] - Chinese. Largest dedicated independent semiconductor foundry
Ooopsie. Mandarin, do you speak it...?
https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
(Score: 2) by takyon on Thursday November 16 2017, @02:47AM (4 children)
1. TSMC, not TMSC. It's a mistake many have made, including me.
2. TSMC is Taiwanese rather than Chinese. A distinction that might help you avoid getting beat up by a Taiwanese nationalist.
(Score: 2) by c0lo on Thursday November 16 2017, @03:40AM
Mmm... the "for now" qualifier seems needed.
Uhhh... again a matter of choice: being beaten by a Taiwanese nationalist or facing jail in mainland China [thediplomat.com]?
'Cause the latter is quite strong-minded [thediplomat.com] with regard to "one country, two systems".
(Score: 1, Disagree) by WillR on Thursday November 16 2017, @03:16PM (2 children)
...where they speak Cantonese. Now you have two languages to learn.
(Score: 2) by bob_super on Thursday November 16 2017, @05:28PM (1 child)
That Mandarin- and Taiwanese-speaking Taiwanese nationalist is very disappointed, though not entirely surprised, at your ignorance.
(Score: 1) by WillR on Friday November 17 2017, @10:27PM
(Score: 2) by JoeMerchant on Thursday November 16 2017, @03:47AM (2 children)
So, this doesn't seem to compare well with a rack full of Raspberry Pis. Granted, there are plenty of applications that will run better on the Qualcomm configuration, but if you're looking at flops per dollar, or fault tolerance, it would seem like a re-spin of the basic Raspberry Pi concept (but with a respectable gigabit network connection this time) could get you there with just a slightly larger equipment rack.
🌻🌻 [google.com]
(Score: 4, Insightful) by TheRaven on Thursday November 16 2017, @09:19AM (1 child)
sudo mod me up
(Score: 3, Interesting) by JoeMerchant on Thursday November 16 2017, @12:43PM
As I mentioned, the Pi's I/O is lame - but the basic approach of a rack full of inexpensive ARM systems would still seem to address a lot of needs, even and perhaps especially cloud needs without resorting to the BIG IRON.
IBM tried making BIG IRON systems for the cloud about 10 years ago (ahead of their time, right?). Their pitch was to create an (apparently) "overpowered" system which they would divvy up with a hypervisor and then lease you the capacity you needed out of it. The appeal was supposed to be that the hardware would "scale" with demand and you only paid for what you used.
We did an analysis of those IBM systems vs a rack of Mac Pros, and, in the end, if you were really using all the capacity of the system, the IBM was coming in at roughly 25% of the flops per dollar of dedicated Intel cores.
So much cloud is just a stone simple app accessing a tiny sliver of data, scaled up to many users accessing many different slivers of data - things that one Pi could probably service many many active users simultaneously without a struggle. Instead of scaling by having a massively powerful chip that you slice up into thousands of services, the approach of having a modest unit that you replicate thousands of times gives you the ability to match capacity to need much more closely.
Of course, the sales appeal is: I can't predict my need, it could be MASSIVE - to which the BIG IRON sellers reply "don't worry, we've got you covered." But a system of many little workers doing many little jobs can scale up too. Have you ever seen a sweatshop full of laborers? There are many good reasons businesses use that model, and CPUs don't need healthcare coverage.
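The capacity-matching argument above can be put into a rough Python sketch. Every number here is a made-up assumption for illustration (the demand figure and both unit capacities are hypothetical, not measurements of any real system), and the sketch ignores networking, management overhead and per-unit fixed costs.

```python
import math

def provisioned(demand, unit_capacity):
    """Return (units needed, idle capacity) when capacity comes in fixed-size units."""
    units = math.ceil(demand / unit_capacity)
    idle = units * unit_capacity - demand
    return units, idle

# Hypothetical workload and unit sizes, in requests per second.
DEMAND = 1310
BIG_UNIT = 1000    # one large server
SMALL_UNIT = 50    # one small board

for label, capacity in (("big units", BIG_UNIT), ("small units", SMALL_UNIT)):
    units, idle = provisioned(DEMAND, capacity)
    print(f"{label}: {units} needed, {idle} req/s idle")
```

With these assumed numbers the large units leave far more capacity idle than the small ones, which is the granularity point being made; whether that survives real-world overheads is a separate question.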