from the but-does-it-run...-OK,-you've-heard-it-before dept.
The TOP500 List of the world's fastest supercomputers for June 2015 has been released. China's Tianhe-2 remains the leader with 33.86 petaflops on the LINPACK benchmark. It has topped the list since June 2013. The only new supercomputer in the top 10 is the Shaheen II in Saudi Arabia, a 5.536 PFlop/s Cray XC40 system using 196,608 Intel Xeon E5-2698v3 cores.
The Platform has an analysis of the results. Although performance growth is slowing, pre-exascale supercomputers (100+ petaflops) can be expected within the next two to three years. The U.S. Department of Energy's Aurora supercomputer will deliver 180 petaflops of performance in 2018. Around the same time, the Summit supercomputer is expected to reach 150-300 petaflops while Sierra will reach 100+ petaflops. Supercomputers of roughly 1 exaflop are expected to appear around 2018-2022.
The June 2015 Green500 list ranking supercomputers by megaflops per watt will be available sometime later in the month. Here is the November 2014 Green500 list. The Piz Daint supercomputer appears within the top 10 on both lists.
Stats from the press release:
The United States remains the top country in terms of overall systems with 233, up from 231 six months ago, the same as in June 2014, and down from 265 on the November 2013 list. The U.S. is nearing its historical low number on the list. The number of European systems rose to 141, up from 130 on the last list, while the number of systems across Asia dropped to 108 from 120. The number of Chinese systems on the list also dropped to 37, compared to 61 last November; China has only half as many systems on the newest list as it did one year ago. Japan continues to increase its count on the list, claiming 39 spots this time, up from 32 last November. However, China's role in high performance computing is increasing in the manufacturing arena, with Lenovo now being counted among the vendors of systems on the TOP500 list. Three new systems are solely attributed to Lenovo, while 20 systems previously listed as IBM are now labeled jointly between IBM and Lenovo.
Cray Inc., a company long associated with supercomputers, is on a resurgence and emerges in the latest list as the clear leader in performance, claiming a 24 percent share of installed total performance (up from 18.2 percent). IBM takes the second spot with a 22.2 percent share, down from 28 percent last November. On the latest edition of the list, the No. 500 system recorded a performance of 153.6 teraflop/s (trillions of calculations per second), compared to 133.7 teraflop/s six months ago. The last system on the newest list was listed at position 421 in the previous TOP500. This represents the lowest turnover rate in the list in two decades.
- Total combined performance of all 500 systems has grown to 363 Pflop/s, compared to 309 Pflop/s last November and 274 Pflop/s one year ago. This increase in installed performance also exhibits a noticeable slowdown in growth compared to the previous long-term trend.
- There are 68 systems with performance greater than 1 petaflop/s on the list, up from 50 last November.
- A total of 88 systems on the list are using accelerator/co-processor technology, up from 75 in November 2014. Fifty-two (52) of these use NVIDIA chips, four use ATI Radeon, and there are now 33 systems with Intel MIC technology (Xeon Phi). Four systems use a combination of NVIDIA and Intel Xeon Phi accelerators/co-processors.
- HP has the lead in the total number of systems with 178 (35.6 percent) compared to IBM with 111 systems (22.2 percent). Last November, HP had 179 systems and IBM had 153 systems. In the system category, Cray remains third with 71 systems (14.2 percent).
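The percentages and growth figures in the press release can be checked with simple arithmetic. The following Python sketch recomputes the vendor shares of the 500 listed systems and the growth in total combined performance, using only the numbers quoted above. The helper names `share_percent` and `growth_percent` are illustrative and do not come from the TOP500 release.

```python
# Figures quoted in the TOP500 press release for June 2015.
TOTAL_SYSTEMS = 500
vendor_systems = {"HP": 178, "IBM": 111, "Cray": 71}

# Total combined performance of all 500 systems, in Pflop/s.
TOTAL_JUNE_2015 = 363
TOTAL_NOV_2014 = 309
TOTAL_JUNE_2014 = 274


def share_percent(count, total=TOTAL_SYSTEMS):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100.0 * count / total, 1)


def growth_percent(new, old):
    """Return the relative increase from old to new, in percent."""
    return round(100.0 * (new - old) / old, 1)


shares = {vendor: share_percent(n) for vendor, n in vendor_systems.items()}
six_month_growth = growth_percent(TOTAL_JUNE_2015, TOTAL_NOV_2014)
one_year_growth = growth_percent(TOTAL_JUNE_2015, TOTAL_JUNE_2014)

for vendor, pct in shares.items():
    print(f"{vendor}: {pct} percent of systems")
print(f"Growth over six months: {six_month_growth} percent")
print(f"Growth over one year: {one_year_growth} percent")
```

The computed shares match the 35.6, 22.2 and 14.2 percent stated in the release, and the aggregate performance works out to roughly 17.5 percent growth over six months and 32.5 percent over one year.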
The Register's new sister site, The Platform, broke news of an upcoming 180 petaflops supercomputer named "Aurora" to be installed at the Argonne National Laboratory. The system will reportedly use 2.7x the power (from 4.8 megawatts to 13 megawatts) to deliver 18x the peak performance of Argonne's existing Mira supercomputer (more detail here).
Aurora will use Intel's upcoming 10nm "Knights Hill" Xeon Phi processors and a second-generation Omni-Path optical interconnect with far greater bandwidth than current designs. The storage capacity will exceed 150 petabytes. Cray Inc. will manufacture the system, which will cost $200 million and round out the CORAL trio of supercomputers, including the 150-300 PFLOPS Summit at Oak Ridge National Laboratory and the 100+ PFLOPS Sierra at Lawrence Livermore National Laboratory. The other two systems will use IBM Power9 and NVIDIA Volta chips.
An 8.5 petaflops, 1.7 MW secondary system named Theta will be built in 2016.
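As a rough check on the reported figures, the Python sketch below derives the power ratio, the implied peak performance of Mira, and peak gigaflops per watt for each system. It uses only the numbers reported above; the function name `gflops_per_watt` is illustrative. These are peak figures, so they are not directly comparable to the measured LINPACK efficiency that the Green500 ranks.

```python
# Reported figures: peak performance in petaflops, power in megawatts.
AURORA_PFLOPS, AURORA_MW = 180.0, 13.0
THETA_PFLOPS, THETA_MW = 8.5, 1.7
MIRA_MW = 4.8
PERFORMANCE_RATIO = 18.0  # Aurora peak relative to Mira, as reported


def gflops_per_watt(petaflops, megawatts):
    """Convert petaflops and megawatts to gigaflops per watt."""
    gigaflops = petaflops * 1e6  # 1 petaflop = 1e6 gigaflops
    watts = megawatts * 1e6      # 1 megawatt = 1e6 watts
    return gigaflops / watts


# Mira's peak is implied by the 18x claim rather than stated directly.
mira_pflops = AURORA_PFLOPS / PERFORMANCE_RATIO

power_ratio = AURORA_MW / MIRA_MW
efficiency_gain = PERFORMANCE_RATIO / power_ratio

print(f"Power ratio Aurora/Mira: {power_ratio:.2f}x")
print(f"Implied Mira peak: {mira_pflops:.1f} petaflops")
print(f"Peak efficiency gain: {efficiency_gain:.2f}x")
print(f"Aurora: {gflops_per_watt(AURORA_PFLOPS, AURORA_MW):.2f} Gflops/W")
print(f"Theta: {gflops_per_watt(THETA_PFLOPS, THETA_MW):.2f} Gflops/W")
print(f"Mira: {gflops_per_watt(mira_pflops, MIRA_MW):.2f} Gflops/W")
```

Under these assumptions, 13 MW over 4.8 MW gives about 2.7x the power, consistent with the report, and 18x the performance at 2.7x the power implies roughly a 6.6x improvement in peak performance per watt.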
According to Intel and Argonne National Laboratory:
Research goals for the Aurora system include: more powerful, efficient and durable batteries and solar panels; improved biofuels and more effective disease control; improving transportation systems and enabling production of more highly efficient and quieter engines; and wind turbine design and placement for improved efficiency and reduced noise.
Mflop/s is a rate of execution, millions of floating point operations per second. Whenever this term is used it will refer to 64 bit floating point operations and the operations will be either addition or multiplication. Gflop/s refers to billions of floating point operations per second and Tflop/s refers to trillions of floating point operations per second.
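The prefixes differ by factors of 1,000, which a short Python sketch can make concrete. The `convert` helper and the `PREFIX` table are illustrative names, and the peta and exa entries extend the definition above to cover the figures quoted elsewhere in this story.

```python
# Floating point operations per second for each prefix.
PREFIX = {
    "M": 1e6,   # Mflop/s: millions
    "G": 1e9,   # Gflop/s: billions
    "T": 1e12,  # Tflop/s: trillions
    "P": 1e15,  # Pflop/s: quadrillions
    "E": 1e18,  # Eflop/s: quintillions
}


def convert(value, from_prefix, to_prefix):
    """Convert a flop/s rate from one prefix to another."""
    return value * PREFIX[from_prefix] / PREFIX[to_prefix]


# Tianhe-2's 33.86 Pflop/s LINPACK result expressed in Tflop/s.
tianhe2_tflops = convert(33.86, "P", "T")

# The No. 500 system's 153.6 Tflop/s expressed in Pflop/s.
last_system_pflops = convert(153.6, "T", "P")

print(f"Tianhe-2: {tianhe2_tflops:.0f} Tflop/s")
print(f"No. 500 system: {last_system_pflops:.4f} Pflop/s")
```

In these units, the leading system is about 220 times faster than the last system on the list.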
The Platform reports that CPU export restrictions to Chinese supercomputing centers may have backfired. Tianhe-2 has remained the world's top supercomputer for the last five iterations of the TOP500 list using a heterogeneous architecture that mixes Intel's Xeon and Xeon Phi chips. Tianhe-2 will likely be upgraded to Tianhe-2A within the next year (rather than by the end of 2015 as originally planned), nearly doubling its peak performance from 54.9 petaflops to around 100 petaflops, while barely raising peak power usage. However, instead of using a new Intel Xeon Phi chip, a homegrown "China Accelerator" and novel architecture will be used.
A few details about the accelerator are known:
Unlike other [digital signal processor (DSP)] efforts that were aimed at snapping into supercomputing systems, this one is not a 32-bit part, but is capable of supporting 64-bit and further, it can also support both single (as others do) and double-precision. As seen below, the performance for both single and double precision is worth remarking upon (around 2.4 single, 4.8 double teraflops for one card) in a rather tiny power envelope. It will support high bandwidth memory as well as PCIe 3.0. In other words, it gives GPUs and Xeon Phi a run for the money—but the big question has far less to do with hardware capability and more to do with how the team at NUDT will be able to build out the required software stack to support applications that can gobble millions of cores on what is already by far the most core-dense machine on the planet.