posted by martyb on Tuesday March 17 2020, @02:36AM
from the more-in-less-more-or-less dept.

Marvell Announces ThunderX3: 96 Cores & 384 Thread 3rd Gen Arm Server Processor

The Arm server ecosystem is very much alive and thriving, finally getting into serious motion after several years of false-start attempts. Among the original pioneers in this space was Cavium, which went on to be acquired by Marvell in 2018. Among the company's server CPU products is the ThunderX line; while the first generation ThunderX left quite a lot to be desired, the ThunderX2 was the first Arm server silicon that we deemed viable and competitive against Intel and AMD products. Since then, the ecosystem has accelerated quite a lot, and only last week we saw how impressive the new Amazon Graviton2 with the N1 chips ended up. Marvell didn't stop at the ThunderX2, and had big ambitions for its newly acquired CPU division, and today is announcing the new ThunderX3.

The ThunderX3 is a continuation and successor to then-Cavium's custom microarchitecture found in the TX2, adopting a lot of the key characteristics, most notably the capability of 4-way SMT. Adopting a new microarchitecture with higher IPC capabilities, the new TX3 also ups the clock frequencies, and now hosts up to a whopping 96 CPU cores, allowing the chip to scale up to 384 threads in a single socket.
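The headline thread count follows directly from the core count and the SMT width. A minimal sketch of that arithmetic, using only the figures quoted above (the function name is purely illustrative):

```python
def hardware_threads(cores: int, smt_ways: int, sockets: int = 1) -> int:
    """Hardware threads exposed by `sockets` CPUs, each with `cores` cores and `smt_ways`-way SMT."""
    return cores * smt_ways * sockets

# ThunderX3 figures from the announcement: 96 cores, 4-way SMT, single socket.
tx3_threads = hardware_threads(cores=96, smt_ways=4)
print(tx3_threads)  # 384
```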

Related: Marvell Technology to Buy Cavium for $6 Billion
ARM "Project Trillium", Cambricon MLU-100, and Cavium ThunderX2
HPE Delivers World's Largest Arm Supercomputer for U.S. Department of Energy
Ampere Launches its First ARM-Based Server Processors in Challenge to Intel
Amazon Announces 64-core Graviton2 Arm CPU
80-Core Arm CPU To Bring Lower Power, Higher Density To A Rack Near You


Original Submission

Related Stories

Marvell Technology to Buy Cavium for $6 Billion 6 comments

Marvell is buying Cavium. Both are "fabless" semiconductor manufacturers:

Chipmaker Marvell Technology Group Ltd (MRVL.O) said it would buy smaller rival Cavium Inc (CAVM.O) in a $6 billion deal, as it seeks to expand its wireless connectivity business in a fast consolidating semiconductor industry.

[...] Hamilton, Bermuda-based Marvell makes chips for storage devices while San Jose, California-based Cavium builds network equipment. "With Marvell facing secular challenges on its core chip business, this acquisition is a smart strategic move which puts the company in a stronger competitive position for the coming years," said GBH Insights analyst Daniel Ives.

Marvell, which has been trying to diversify from its storage devices business, had come under pressure from Starboard Value LP last year, when the activist investor called the company undervalued. "This is an exciting combination of two very complementary companies that together equal more than the sum of their parts," Marvell's Chief Executive Matt Murphy said in a statement.

Also at Ars Technica.

Related: HPC Chips Abound


Original Submission

ARM "Project Trillium", Cambricon MLU-100, and Cavium ThunderX2 2 comments

ARM Details "Project Trillium" Machine Learning Processor Architecture

[ARM has detailed] more of the architecture of what Arm now seems to more consistently call their "machine learning processor" or MLP from here on. The MLP IP started off as a blank sheet in terms of architecture implementation and the team consists of engineers pulled off from the CPU and GPU teams.

With the MLP Arm set out to provide three key aspects that are demanded in machine learning IPs: Efficiency of convolutional computations, efficient data movement, and sufficient programmability. From a high level perspective the MLP seems no different than many other neural network accelerator IPs out there. It still has a set of MAC engines for the raw computational power, while offering some sort of programmable control flow block alongside a sufficiently robust memory subsystem.

HPE Delivers World’s Largest Arm Supercomputer for U.S. Department of Energy 12 comments

HPE is building the world's first petascale supercomputer powered by ARM processors. It will reach 2.3 petaflops of peak performance:

PALO ALTO, Calif., June 18, 2018 – Hewlett Packard Enterprise (HPE) today announced its collaboration with Sandia National Laboratories and the U.S. Department of Energy (DOE) to deliver the world's largest Arm supercomputer. As part of the Vanguard program, Astra, the new Arm-based system, will be used by the National Nuclear Security Administration (NNSA) to run advanced modeling and simulation workloads for addressing areas such as national security, energy and science.

[...] Astra will be deployed at Sandia National Laboratories and will run on the HPE Apollo 70. This purpose-built HPC platform is based on the Cavium ThunderX2 Arm processor. Astra is comprised of over 145,000 cores in 2,592 dual-processor servers and offers greater density with four compute nodes in a 2U form factor.

The supercomputer will draw 1.2 MW, giving a possible efficiency of 1.92 gigaflops per Watt. That's only good enough to put it around #131 on the November 2017 Green500 list (the top 5 systems exceed 14 gigaflops per Watt).
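The efficiency figure can be checked from the two numbers quoted above. This is a rough sketch that divides peak performance by the stated power draw, which is why the result is only a "possible" efficiency rather than a measured Green500 result; the function name is illustrative:

```python
PEAK_PETAFLOPS = 2.3   # peak performance quoted for Astra
POWER_MEGAWATTS = 1.2  # power draw quoted for Astra

def gigaflops_per_watt(peak_petaflops: float, power_megawatts: float) -> float:
    """Convert peak petaflops and megawatts into gigaflops per watt."""
    flops = peak_petaflops * 1e15
    watts = power_megawatts * 1e6
    return flops / watts / 1e9

efficiency = gigaflops_per_watt(PEAK_PETAFLOPS, POWER_MEGAWATTS)
print(f"{efficiency:.2f} gigaflops per watt")  # 1.92 gigaflops per watt
```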


Original Submission

Ampere Launches its First ARM-Based Server Processors in Challenge to Intel 20 comments

Submitted via IRC for takyon

Ampere is launching two versions of its first ARM-based 64-bit server processor today in a challenge to Intel's dominance of data center chips.

Intel dominates about 99 percent of the server chip market with its x86-based processors, but Ampere is targeting power-efficient, high-performance, and high-memory capacity features with its Ampere eMAG processors for data centers.

Renee James, former president of Intel and CEO of Ampere, said in an interview with VentureBeat that customers can now order the chip from the company's website. The chips are aimed at hyperscale cloud and edge computing, using the ARMv8-A cores. The chips target big data and in-memory databases.

[...] Based on the SPECint benchmark performance, Ampere's eMAG processor can deliver about twice the performance of the Intel Xeon Gold 6130 processor at about the same price, the company said. The eMAG with 32 cores at 3.3 GHz will sell for $850, and with 16 cores at 3.3 GHz will sell for $550.

[...] Ampere designed its cores, which feature eight DDR4-2667 memory controllers, 42 lanes of PCIe 3.0 for high bandwidth I/O, 125W TDP for maximum power efficiency, and a 16-nanometer FinFET manufacturing process at contract manufacturer TSMC.

Source: https://venturebeat.com/2018/09/18/ampere-launches-its-first-arm-based-server-processors-in-challenge-to-intel/

Previously: Former Intel President Launches New Chip Company With Backing From Carlyle Group


Original Submission

Amazon Announces 64-core Graviton2 Arm CPU 14 comments

Amazon Announces Graviton2 SoC Along With New AWS Instances: 64-Core Arm With Large Performance Uplifts

The new Graviton2 SoC is a custom design by Amazon's own in-house silicon design teams and is a successor to the first-generation Graviton chip. The new chip quadruples the core count from 16 cores to 64 cores and employs Arm's newest Neoverse N1 cores. Amazon is using the highest performance configuration available, with 1MB L2 caches per core, with all 64 cores connected by a mesh fabric supporting 2TB/s aggregate bandwidth as well as integrating 32MB of L3 cache.

Amazon claims the new Graviton2 chip is[sic] can deliver up to 7x higher performance than the first generation based A1 instances in total across all cores, up to 2x the performance per core, and delivers memory access speed of up to 5x compared to its predecessor. The chip comes in at a massive 30B transistors on a 7nm manufacturing node - if Amazon is using similar high density libraries to mobile chips (they have no reason to use HPC libraries), then I estimate the chip to fall around 300-350mm² if I was forced to put out a figure.

The memory subsystem of the new chip is supported by 8 DDR4-3200 channels with support for hardware AES256 memory encryption. Peripherals of the system are supported by 64 PCIe4 lanes.


Original Submission

80-Core Arm CPU To Bring Lower Power, Higher Density To A Rack Near You 9 comments

Arthur T Knackerbracket has found the following story:

ARM CPU vendor Ampere announced an 80-core CPU called the Altra on Tuesday. If the core count didn't clue you in already, the Altra is aimed at data-center computing rather than home or even typical business needs. The Altra's 80 cores do not offer hyperthreading, so 80 cores here means 80 threads as well.

Before we go into too much detail about the Altra—which is currently sampling but is not yet generally available and does not have any third-party benchmarks—it's instructive to take a look slightly backward to its little sibling, the 32-core eMAG 8180.

The Altra is not Ampere's first entry into data-center ARM computing. Its last processor, the eMAG 8180, is a 32-core part running at up to 3.3GHz turbo. The eMAG 8180 is available in packet.net's c2.large.arm package, in the form of Lenovo's ThinkSystem HR330A 1U single-socket systems.

Kinvolk, a Berlin-based Linux development company, did some pretty extensive benchmarking of a single-socket eMAG 8180 system—comparing it to a 24-core AMD Epyc 7401P (24c/48t) and a dual-socket Xeon Gold 5120 (28c/56t total).

[...] Like the eMAG, the Altra does not offer SMT (Simultaneous Multi Threading), so its 80 cores mean 80 threads. Unlike the eMAG, the Altra is designed for either single or dual-socket operation—so we can expect to see 160-core Altra-powered systems later in 2020. We know that there will be multiple SKUs, with a TDP range the data sheet specifies at 45W to 210W. But we don't know their individual details.

Ampere Announces Altra ARM CPUs with Up to 80 Cores, Going to 128 Cores by 2021 41 comments

Ampere's Product List: 80 Cores, up to 3.3 GHz at 250 W; 128 Core in Q4

The Ampere Altra range, as part of today's release, will offer parts from 32 cores up to 80 cores, up to 3.3 GHz, with a variety of TDPs up to 250 W. As we've described in our previous news items on the chip, this is an Arm v8.2 core with a few 8.3+8.5 features, offers support for FP16 and INT8, supports 8 channels of DDR4-3200 ECC at 2 DIMMs per channel, and up to 4 TiB of memory per socket in a 1P or 2P configuration. Each CPU will offer 128 PCIe 4.0 lanes, 32 of which can be used for socket-to-socket communications implemented with the CCIX protocol over PCIe. This means 50 GB/s in each direction, and 192 PCIe 4.0 lanes in a dual socket system for add-in cards. Each of the PCIe lanes can bifurcate down to x2.
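The lane and memory figures in that paragraph fit together with simple arithmetic. The sketch below reproduces the 192-lane dual-socket number and derives the DIMM size implied by the memory limit. It assumes that a single-socket system keeps all 128 lanes for add-in cards and that memory is spread evenly across all DIMM slots; neither is stated in the article, and the names are illustrative:

```python
LANES_PER_CPU = 128       # PCIe 4.0 lanes per CPU (from the article)
CCIX_LANES_PER_CPU = 32   # lanes per CPU usable for socket-to-socket links (from the article)
MEMORY_CHANNELS = 8       # DDR4-3200 channels per socket (from the article)
DIMMS_PER_CHANNEL = 2     # from the article
MAX_MEMORY_TIB = 4        # maximum memory per socket (from the article)

def add_in_card_lanes(sockets: int) -> int:
    """PCIe lanes left for add-in cards in a 1P or 2P system."""
    if sockets == 1:
        # Assumption: with no second socket, no lanes are reserved for CCIX.
        return LANES_PER_CPU
    if sockets == 2:
        return sockets * (LANES_PER_CPU - CCIX_LANES_PER_CPU)
    raise ValueError("only 1P and 2P configurations are described")

dimm_slots = MEMORY_CHANNELS * DIMMS_PER_CHANNEL     # 16 slots per socket
gib_per_dimm = MAX_MEMORY_TIB * 1024 // dimm_slots   # assumes an even split across slots

print(add_in_card_lanes(2))       # 192
print(dimm_slots, gib_per_dimm)   # 16 256
```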

[...] Previously Ampere had stated they were going for 80 cores at 3.0 GHz at 210 W, however the Q80-33 is pushing that frequency another 300 MHz for another 40 W, and we understand that the tapeout of silicon from TSMC performed better than expected, hence this new top processor.

[...] If that wasn't enough, Ampere dropped a sizeable nugget into our pre-announcement briefing. The company is set to launch a 128-core version of Altra later this year.

This will be a new silicon design, beyond Ampere's initial layout of 80 cores for Altra, however Ampere states that while they are using the same platform as the regular Altra, they have done extensive tweaking and optimizations within the mesh interconnect for Altra Max to hide the additional contention that might occur when using the same main memory speeds.

Altra Max will be socket and pin-compatible with Altra, also support dual socket deployments, and Ampere states that the silicon will be ready for early sampling with partners in Q4, and is looking to move into high volume in mid-2021.

Previously: Ampere Launches its First ARM-Based Server Processors in Challenge to Intel
80-Core Arm CPU To Bring Lower Power, Higher Density To A Rack Near You

Related: Amazon Announces 64-core Graviton2 Arm CPU
Marvell Announces ThunderX3, an ARM Server CPU With 96 Cores, 384 Threads
AMD and Intel Have a Formidable New Foe (Amazon)


Original Submission

Marvell ThunderX3 ARM Server CPU Will Have Up to 60 Cores Per Die, with 96-Core Dual-Die Option 7 comments

Hot Chips 2020: Marvell Details ThunderX3 CPUs - Up to 60 Cores Per Die, 96 Dual-Die in 2021

Today as part of HotChips 2020 we saw Marvell finally reveal some details on the microarchitecture of their new ThunderX3 server CPUs and core microarchitectures. The company had announced the existence of the new server and infrastructure processor back in March, and is now able to share more concrete specifications about how the in-house CPU design team promises to distinguish itself from the quickly growing competition that is the Arm server market.

[...] Marvell started off the HotChips presentation with a roadmap of its products, detailing that the ThunderX3 generation isn't merely just a single design, but actually represents a flexible approach using multiple dies, with the first generation 60-core CN110xx SKUs using a single die as a monolithic design in 2020, and next year seeing the release of a 96-core dual-die variant aiming for higher performance.

The use of a dual-die approach like this is very interesting as it represents a mid-point between a completely monolithic design, and a chiplet approach from vendors such as AMD. Each die here is identical in the sense that it can be used independently as standalone products.

Some details about the CPUs and the 4-way SMT were given in the presentation. TDPs will range from 100 Watts to 240 Watts.

Previously: Marvell Announces ThunderX3, an ARM Server CPU With 96 Cores, 384 Threads


Original Submission

Marvell Announces PCIe 5.0 SSD Controllers Capable of 14 GB/s Sequential Reads 10 comments

Marvell Announces First PCIe 5.0 NVMe SSD Controllers: Up To 14 GB/s

Today Marvell is announcing the first NVMe SSD controllers to support PCIe 5.0, and a new branding strategy for Marvell's storage controllers. The new SSD controllers are the first under the umbrella of Marvell's Bravera brand, which will also encompass HDD controllers and other storage accelerator products. The Bravera SC5 family of PCIe 5.0 SSD controllers will consist of two controller models: the 8-channel MV-SS1331 and the 16-channel MV-SS1333.

These new SSD controllers roughly double the performance available from PCIe 4.0 SSDs, meaning sequential read throughput hits 14 GB/s and random read performance of around 2M IOPS. To reach this level of performance while staying within the power and thermal limits of common enterprise SSD form factors, Marvell has had to improve power efficiency by 40% over their previous generation SSD controllers. That goes beyond the improvement that can be gained simply from smaller fab process nodes, so Marvell has had to significantly alter the architecture of their controllers. The Bravera SC5 controllers still include a mix of Arm cores (Cortex-R8, Cortex-M7 and a Cortex-M3), but now includes much more fixed-function hardware to handle the basic tasks of the controller with high throughput and consistently low latency.

Top-of-the-line PCIe 4.0 controllers from Phison and Silicon Motion are capable of 7.4 GB/s of sequential reads.

Related: Marvell Looking to Integrate Machine Learning Engines Onto SSD Controllers
Marvell Announces ThunderX3, an ARM Server CPU With 96 Cores, 384 Threads
Marvell ThunderX3 ARM Server CPU Will Have Up to 60 Cores Per Die, with 96-Core Dual-Die Option
Silicon Motion Launches PCIe 4.0 NVMe SSD Controllers


Original Submission

This discussion has been archived. No new comments can be posted.
  • (Score: 1, Funny) by Anonymous Coward on Tuesday March 17 2020, @03:24AM (1 child)

    by Anonymous Coward on Tuesday March 17 2020, @03:24AM (#972086)

    I am looking to replace my iPhone 4S.

  • (Score: 1, Funny) by Anonymous Coward on Tuesday March 17 2020, @04:35AM (3 children)

    by Anonymous Coward on Tuesday March 17 2020, @04:35AM (#972100)

    It's all very well to have 96 cores and 384 threads, but if the compiler hasn't optimized the idle loop for parallelism you're just wasting that many more cycles.

    • (Score: 2) by FunkyLich on Tuesday March 17 2020, @07:54AM (1 child)

      by FunkyLich (4689) on Tuesday March 17 2020, @07:54AM (#972119)

      Then what is known about how it is done? I would think it is similar to how it works in the countless processors present in mobile devices. How does it work there?

      • (Score: 0) by Anonymous Coward on Tuesday March 17 2020, @12:12PM

        by Anonymous Coward on Tuesday March 17 2020, @12:12PM (#972146)

        https://www.theregister.co.uk/2018/12/18/arm_cortex_a65ae/ [theregister.co.uk]

        ARM is starting to do its own SMT, but others have had their own implementations. You can either use vanilla ARM designs or put in the work to customize them.

    • (Score: 2) by DannyB on Tuesday March 17 2020, @02:06PM

      by DannyB (5839) Subscriber Badge on Tuesday March 17 2020, @02:06PM (#972200) Journal

      It's all very well to have 96 cores and 384 threads, but if the compiler hasn't optimized the idle loop for parallelism you're just wasting that many more cycles.

      <almost-no-sarcasm>
      If a Java[1] workload[2] already can effectively use many cores and threads in order to print Hello World on Intel, then why would it be any different on ARM? There are languages and frameworks in use to help programmers exploit parallel programming. Why would the underlying principles be different for ARM servers?
      </almost-no-sarcasm>

      [1] actually, any language running on the JVM runtime
      [2] such as a Hello World [github.com] or FizzBuzz [github.com] program

      --
      To transfer files: right-click on file, pick Copy. Unplug mouse, plug mouse into other computer. Right-click, paste.
  • (Score: 2) by epitaxial on Tuesday March 17 2020, @05:09PM (1 child)

    by epitaxial (3165) on Tuesday March 17 2020, @05:09PM (#972337)

    If people really cared about ARM processors in the datacenter then companies would already be offering them.

  • (Score: 2) by FunkyLich on Wednesday March 18 2020, @11:44AM (1 child)

    by FunkyLich (4689) on Wednesday March 18 2020, @11:44AM (#972717)

    I don't think hyperthreading is relevant or helpful in most cases. I have never found the performance of a system with hyperthreading enabled to be higher than with it disabled; in fact, the contrary has been true. The first thing I do when setting up a server is disable hyperthreading in the BIOS. I think that hyperthreading is a good marketing tool, but realistically offers nothing for performance.
    And with 96 cores in a single system, the need for hyperthreading becomes all the more distant for me.

    • (Score: 2) by takyon on Wednesday March 18 2020, @12:04PM

      by takyon (881) <takyonNO@SPAMsoylentnews.org> on Wednesday March 18 2020, @12:04PM (#972722) Journal

      Obviously, your experience does not apply to all workloads. Or the feature wouldn't be offered. Especially since they have gone for 4-way, which offers a smaller theoretical benefit over 2-way.

      Also, different implementations of simultaneous multithreading work at least slightly differently. Intel, AMD, Cavium/Marvell, IBM (which does 8-way SMT)...

      --
      [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]