
posted by martyb on Tuesday June 20 2017, @03:34PM   Printer-friendly
from the Is-that-a-Cray-in-your-pocket? dept.

A new list has been published on top500.org. It may be noteworthy that the NSA, Google, Amazon, Microsoft, etc. do not submit information to this list. The top two places are currently held by China, with a comfortable lead of about 400% in peak performance and 370% in Rmax over third place (Switzerland). The US first appears at rank 4, Japan at rank 7, and Germany is not in the top ten at all.

All operating systems in the top 10 are Linux and derivatives. It seems obvious that, since this is highly optimized hardware, only operating systems that can be fine-tuned are viable (so, either open source or with vendor support for such customizations). Still, I would have thought that, since a lot of effort needs to be invested anyway, other systems (BSD?) could be equally suited to the task.

Rank | Site | System | Cores | Rmax (TFlop/s) | Rpeak (TFlop/s) | Power (kW)
1 | China: National Supercomputing Center in Wuxi | Sunway TaihuLight - Sunway MPP, Sunway SW26010 260C 1.45GHz, Sunway - NRCPC | 10,649,600 | 93,014.6 | 125,435.9 | 15,371
2 | China: National Super Computer Center in Guangzhou | Tianhe-2 (MilkyWay-2) - TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P - NUDT | 3,120,000 | 33,862.7 | 54,902.4 | 17,808
3 | Switzerland: Swiss National Supercomputing Centre (CSCS) | Piz Daint - Cray XC50, Xeon E5-2690v3 12C 2.6GHz, Aries interconnect, NVIDIA Tesla P100 - Cray Inc. | 361,760 | 19,590.0 | 25,326.3 | 2,272
4 | U.S.: DOE/SC/Oak Ridge National Laboratory | Titan - Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x - Cray Inc. | 560,640 | 17,590.0 | 27,112.5 | 8,209
5 | U.S.: DOE/NNSA/LLNL | Sequoia - BlueGene/Q, Power BQC 16C 1.60GHz, Custom - IBM | 1,572,864 | 17,173.2 | 20,132.7 | 7,890
6 | U.S.: DOE/SC/LBNL/NERSC | Cori - Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries interconnect - Cray Inc. | 622,336 | 14,014.7 | 27,880.7 | 3,939
7 | Japan: Joint Center for Advanced High Performance Computing | Oakforest-PACS - PRIMERGY CX1640 M1, Intel Xeon Phi 7250 68C 1.4GHz, Intel Omni-Path - Fujitsu | 556,104 | 13,554.6 | 24,913.5 | 2,719
8 | Japan: RIKEN Advanced Institute for Computational Science (AICS) | K computer, SPARC64 VIIIfx 2.0GHz, Tofu interconnect - Fujitsu | 705,024 | 10,510.0 | 11,280.4 | 12,660
9 | U.S.: DOE/SC/Argonne National Laboratory | Mira - BlueGene/Q, Power BQC 16C 1.60GHz, Custom - IBM | 786,432 | 8,586.6 | 10,066.3 | 3,945
10 | U.S.: DOE/NNSA/LANL/SNL | Trinity - Cray XC40, Xeon E5-2698v3 16C 2.3GHz, Aries interconnect - Cray Inc. | 301,056 | 8,100.9 | 11,078.9 | 4,233

takyon: TSUBAME3.0 leads the Green500 list with 14.110 gigaflops per Watt. Piz Daint is #3 on the TOP500 and #6 on the Green500 list, at 10.398 gigaflops per Watt.

According to TOP500, this is only the second time in the history of the list that the U.S. has not secured one of the top 3 positions.

The #100 and #500 positions on June 2017's list have an Rmax of 1.193 petaflops and 432.2 teraflops respectively. Compare to 1.0733 petaflops and 349.3 teraflops for the November 2016 list.

[Update: Historical lists can be found on https://www.top500.org/lists/. There was a time when you only needed 0.4 gigaflops to make the original Top500 list — how do today's mobile phones compare? --martyb]


Original Submission

 
  • (Score: 4, Interesting) by Anonymous Coward on Tuesday June 20 2017, @03:58PM (#528530) (9 children)

    I hope Cray doesn't get pissed off and fire me for this:

    I work as an engineer at Cray Inc., and I would be willing to answer questions about the Top 500 list as well as Cray, with the understanding that the responses are all my own opinions and do not represent Cray's position in any official way whatsoever. I'm only responding as an individual, not as a Cray employee representing the company.

    That said, I've personally run a number of codes on the Titan supercomputer in particular, as well as a number of other large Cray machines. I am bound by certain NDAs, but I'm happy to share whatever I can, as appropriate.

    Please, feel free to ask me anything; I'll be back on later after work, and try to answer some Qs.
  • (Score: 2) by VLM (445) on Tuesday June 20 2017, @04:02PM (#528535) (1 child)

    Y U no *BSD?

    My guess, from purely public non-NDA sources, is that your interconnect driver code is more complicated than a cut-paste-compile job, although theoretically you could use BSD if you wanted.

    With a side dish of license issues: the GPL means people have to share advances, and BSD doesn't require it, so naturally Linux will develop faster (for good, or in the case of the systemd cancer, bad).

    • (Score: 1, Informative) by Anonymous Coward on Tuesday June 20 2017, @04:12PM (#528546)

      Y U no *BSD? [...]

      I can't be sure. If I were to hazard a guess, it would be that Cray has already invested a lot in Linux. Cray uses the Cray Linux Environment (CLE 5.2 [cray.com]) on its supercomputer line (Cray has a cluster product lineup as well, which provides more software flexibility to customers), which I think is currently based on SLES 11 SP3. While I don't personally know of any technical reason that BSD couldn't be made to work, it would probably take a lot of money and effort to switch.

  • (Score: 0) by Anonymous Coward on Tuesday June 20 2017, @04:40PM (#528574) (1 child)

    Doesn't "Piz Daint Cray" sound like something you'd hear on the streets of Philadelphia?

    Seriously now, how do you design software and toolchains to take advantage of a large number of cores? Do you need to use specialized languages?

    How do you manage I/O? I/O is a bottleneck in desktop computing, so it must be a serious consideration in supercomputing.

    How hot does the room get when you have all of those cores running at once? If you wanted to build a supercomputer that could run at room temperature with minimal air conditioning, how big a performance hit would you have to take?

    • (Score: 1, Insightful) by Anonymous Coward on Tuesday June 20 2017, @06:32PM (#528636)

      [...] how do you design software and toolchains to take advantage of a large number of cores? Do you need to use specialized languages?

      How do you manage I/O? I/O is a bottleneck in desktop computing, so it must be a serious consideration in supercomputing.

      How hot does the room get when you have all of those cores running at once? If you wanted to build a supercomputer that could run at room temperature with minimal air conditioning, how big a performance hit would you have to take?

      1) A lot can be done with fairly standard languages. Fortran does very well in the HPC (high-performance computing) space. Many younger people think of Fortran as an outdated dinosaur, but this couldn't be further from the truth. Fortran is a modern language with features such as closures, object-oriented support, and much, much more. Now, I don't personally care for some aspects of the syntax, but Fortran is no dog. In fact, because arrays in Fortran are first-class language constructs (unlike C, where everything is just a bare pointer to a chunk of memory and an offset), Fortran code often (nearly always) outperforms other languages like C and C++. This is because more information is available to the compiler during optimization; the compiler just can't "see" as much of what's going on in C code, with bare pointers all over the place. That said, more and more HPC codes are being written in or ported to C/C++ now than ever before. C and C++ work just fine on supercomputers as long as the code is very carefully written. Start with MPI and OpenMP in Fortran or plain C if you want the easiest way into programming supercomputers. The fun part is, you can even run these MPI+OMP codes on your home desktop to test them out (at a tiny scale). -- That said, there are many custom languages, frameworks, libraries, etc. available on these machines, if one feels like getting really fancy.
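
      To make that concrete, here is a minimal hybrid MPI+OpenMP sketch in plain C that you can run on an ordinary desktop with a few ranks. This is just my own illustration (nothing Cray-specific; the file name and array size are arbitrary): each rank sums its slice of an array with OpenMP threads, then MPI_Reduce combines the per-rank partial sums.

        /* Minimal MPI+OpenMP sketch (illustration only, not Cray code).
         * Each rank sums its slice of an array with OpenMP threads; the
         * per-rank partial sums are then combined with MPI_Reduce.
         * Build: mpicc -fopenmp hybrid.c -o hybrid
         * Run:   mpirun -np 4 ./hybrid                                  */
        #include <mpi.h>
        #include <omp.h>
        #include <stdio.h>

        #define N 1000000              /* elements per rank (arbitrary) */

        int main(int argc, char **argv) {
            MPI_Init(&argc, &argv);

            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            static double data[N];
            for (int i = 0; i < N; i++) data[i] = 1.0;

            /* Thread-level parallelism inside the rank (OpenMP). */
            double local = 0.0;
            #pragma omp parallel for reduction(+:local)
            for (int i = 0; i < N; i++) local += data[i];

            /* Process-level parallelism across ranks (MPI). */
            double total = 0.0;
            MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

            if (rank == 0)
                printf("ranks=%d, max threads/rank=%d, total=%.1f\n",
                       size, omp_get_max_threads(), total);

            MPI_Finalize();
            return 0;
        }

      The same two-level structure (MPI between nodes, OpenMP within a node) is the most common starting point when scaling a code up from a laptop to a large machine.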

      2) I/O is a big issue. Cray tends to use the Lustre parallel filesystem on its machines, but other supercomputers use different parallel filesystems like GPFS. If you've ever set up an NFS server, you can think of what supercomputers use as the same basic idea, but "on steroids": one filesystem spans many servers and many disks, so as to provide a high level of parallelism to the highly parallel application code. Cray also offers nodes with "Burst Buffers", which are just SSDs sitting on the compute nodes along with some nice software to expose them to the compute processes in an easy-to-use way. That said, I should make a comment for those not in the HPC space: the best way to avoid I/O is not to do it. So, at supercomputer scales, many people take the stance that nothing should touch disk unless it has to. Data is not passed around in temp files on disk; instead, it is communicated directly between compute nodes, in memory, using the high-speed interconnect whenever possible. I/O is a large and complicated subject, and I'm just scratching the surface here.
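
      To make the "one shared filesystem, many writers" idea concrete, here is a rough MPI-IO sketch (again just my own illustration; the parallel filesystem underneath, whether Lustre, GPFS, or something else, is invisible to this code, and the file name is made up): every rank writes its own slice of a single shared file collectively, instead of each rank dropping a private temp file.

        /* Sketch of parallel output with MPI-IO (illustration only).
         * Each rank writes a contiguous slice of one shared file. */
        #include <mpi.h>
        #include <stdlib.h>

        int main(int argc, char **argv) {
            MPI_Init(&argc, &argv);

            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            enum { COUNT = 1 << 20 };                 /* doubles per rank */
            double *buf = malloc(COUNT * sizeof(double));
            for (int i = 0; i < COUNT; i++) buf[i] = (double)rank;

            MPI_File fh;
            MPI_File_open(MPI_COMM_WORLD, "output.dat",
                          MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

            /* Collective write: rank r lands at byte offset r * COUNT * 8. */
            MPI_Offset offset = (MPI_Offset)rank * COUNT * sizeof(double);
            MPI_File_write_at_all(fh, offset, buf, COUNT, MPI_DOUBLE,
                                  MPI_STATUS_IGNORE);

            MPI_File_close(&fh);
            free(buf);
            MPI_Finalize();
            return 0;
        }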

      3) I'm going to guess that this is mostly a cost issue, and a lot of it comes down to compute density, in terms of floor space in the datacenter. If you have air cooling, that's not really a problem per se, but the same amount of compute power will take more floor space, because it can't be packed as densely and still be cooled as well as if it were liquid cooled. So there's no real performance reason that things are liquid cooled; it's just that you can build denser, and that means a smaller datacenter footprint for the same performance. Remember that the datacenter can at times cost just as much as, if not more than, the machine (depending on the machine and datacenter in question).

  • (Score: 2) by LoRdTAW (3755) on Tuesday June 20 2017, @05:05PM (#528588) (2 children)

    I have a few:
    1) What's next in terms of supercomputing hardware? Are we still building what amounts to an Intel PC with a video card and a fancy interconnect? Or will we see more exotic hardware like those Google AI chips or the proposed DARPA CPU: https://soylentnews.org/article.pl?sid=17/06/12/1959259 [soylentnews.org]? What about Xeon Phis, ARM, AMD, GPUs/APUs, FPGAs, or ASICs?

    2) What bottlenecks do you currently have to deal with, and how do you get around them? E.g., I/O, bandwidth, storage, memory, CPU/GPU, etc.?

    3) Is AI becoming a factor in supercomputing?

    4) Lastly, outside of AI and large government research projects, do you see any future applications for supercomputers?

    • (Score: 2, Interesting) by Anonymous Coward on Tuesday June 20 2017, @06:09PM (#528619) (1 child)

      1) What's next in terms of supercomputing hardware? Are we still building what amounts to an Intel PC with a video card and a fancy interconnect? Or will we see more exotic hardware like those Google AI chips or the proposed DARPA CPU: https://soylentnews.org/article.pl?sid=17/06/12/1959259 [soylentnews.org]? What about Xeon Phis, ARM, AMD, GPUs/APUs, FPGAs, or ASICs?

      2) What bottlenecks do you currently have to deal with, and how do you get around them? E.g., I/O, bandwidth, storage, memory, CPU/GPU, etc.?

      3) Is AI becoming a factor in supercomputing?

      4) Lastly, outside of AI and large government research projects, do you see any future applications for supercomputers?

      1) I have to mostly recuse myself from this particular question due to NDA concerns. However, I could perhaps point you to coverage of a recent announcement from the US government on funding for exascale research: Six Exascale PathForward Vendors Selected; DoE Providing $258M [hpcwire.com]. I could also say that Moore's Law and Dennard scaling are showing signs of slowing down, which means the easy performance gains from moving to a smaller manufacturing node (making transistors smaller) are not coming in like they used to. That leaves a little more room for clever hardware design instead of just scaling the same old things down, and perhaps a little more room for a real HPC-oriented CPU instead of just commodity/server parts, just maybe. -- Cray does currently sell systems with the Xeon Phi: consider Cori, #6 on the list.

      2) Yes. All of those are issues, and they all matter. If I were to pick one to focus on, I would probably pick the interconnect. I would say something like: a large number of CPUs in the same room does not a supercomputer make. That's just a lot of individual computers. To make a true supercomputer, you need all those tens of thousands of CPUs working together on the same problem, which requires a very high-bandwidth, low-latency interconnect. Cray has historically placed a very strong emphasis on the interconnect, and has created several custom interconnects in the past when commodity parts were simply not good enough. Today, one can limp along with EDR InfiniBand and do OK at the smaller end of the supercomputer market; at the top end, IB actually gets very expensive. Cray's Aries interconnect still holds its own on real-world workloads, despite being a smidge old now. I can't comment on if or when Cray plans to introduce a new interconnect as a follow-on to Aries. In addition to tackling the communication issues with a high-performance interconnect, a lot of work is also done on the software side. Cray has an optimized MPI library (if you want to learn to drive a supercomputer, learn MPI and OMP) and decades of experience scaling and optimizing codes. To give just one tip, always write your code to overlap communication and computation as much as possible: initiate some communication, do some other work without waiting for the comms to finish, and only then wait for confirmation that the previously initiated communication has completed. Writing code which does a good job of this comm/comp overlap, where possible, is a good place to start; a small sketch of the pattern follows.
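
      Here is a bare-bones sketch of that comm/comp overlap tip using plain nonblocking MPI (my own illustration, nothing Cray-specific; the buffer size and the fake "interior work" loop are arbitrary):

        /* Overlapping communication with computation (illustration only).
         * Start the transfer, do independent work, then wait on it. */
        #include <mpi.h>
        #include <stdio.h>

        #define HALO 1024

        int main(int argc, char **argv) {
            MPI_Init(&argc, &argv);

            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);
            if (size < 2) MPI_Abort(MPI_COMM_WORLD, 1);   /* needs 2+ ranks */

            double halo[HALO] = {0};
            MPI_Request req = MPI_REQUEST_NULL;

            /* 1. Initiate the communication, but do not wait for it yet. */
            if (rank == 0)
                MPI_Isend(halo, HALO, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
            else if (rank == 1)
                MPI_Irecv(halo, HALO, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);

            /* 2. Useful "interior" work that does not touch the halo data. */
            double interior = 0.0;
            for (long i = 0; i < 10000000; i++) interior += (double)i * 1e-9;

            /* 3. Only now block until the transfer has completed. */
            MPI_Wait(&req, MPI_STATUS_IGNORE);

            /* 4. Safe to read halo[] on rank 1 (and reuse it on rank 0). */
            if (rank == 1) printf("halo received, interior=%.3f\n", interior);

            MPI_Finalize();
            return 0;
        }

      Whether the transfer actually progresses in the background depends on the MPI implementation and the network hardware, but structuring the code this way at least gives it the chance to.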

      3) Yes. While NVIDIA likes to tell stories about being able to put a "supercomputer on a desk" or make a "supercomputer fit in a PCIe slot", people actually in the HPC (High Performance Computing) industry tend to laugh at this in private. While Deep Learning / Machine Learning does run very well on GPUs, a computer with 16 GPUs is not a supercomputer. Consider that Piz Daint, #3 on the list, has 5,320 P100 GPUs. Cray is actually uniquely positioned here: some of the highest-performing Deep Learning training runs in the world have taken place on machines they've built. This has a lot to do with the interconnect and communications stack, but that's not Cray's only advantage here. Note that the Deep Neural Networks in use in industry are getting bigger. These nets and their training data sets are growing very quickly, and soon they may not fit well on small machines that can't scale efficiently to 1000s of nodes, whether each of those nodes is a CPU, GPU, or other accelerator.
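
      For a feel of why the interconnect dominates at scale, here is the core communication step of data-parallel training, sketched with plain MPI (my own illustration, not any vendor's training stack; the "model" is just a flat array of fake gradients): every rank averages its gradients with everyone else's, once per training step.

        /* Data-parallel gradient averaging (illustration only). */
        #include <mpi.h>
        #include <stdlib.h>

        #define NPARAMS (1 << 20)          /* pretend model, ~1M parameters */

        int main(int argc, char **argv) {
            MPI_Init(&argc, &argv);

            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            float *grad = malloc(NPARAMS * sizeof(float));
            for (int i = 0; i < NPARAMS; i++) grad[i] = (float)rank;  /* fake local gradients */

            /* Sum every rank's gradients in place, then divide by the rank
             * count. This allreduce happens every training step, so its cost
             * grows with both the model size and the number of nodes, which
             * is exactly where interconnect bandwidth and latency bite. */
            MPI_Allreduce(MPI_IN_PLACE, grad, NPARAMS, MPI_FLOAT, MPI_SUM,
                          MPI_COMM_WORLD);
            for (int i = 0; i < NPARAMS; i++) grad[i] /= (float)size;

            free(grad);
            MPI_Finalize();
            return 0;
        }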

      4) Yes. Actually, I'm very bullish about the future of the supercomputing market, particularly in the commercial (non-government) space. More and more companies are increasing their investment in, and reliance on, computing in general, and this is also true at the high end of the market. From oil and gas (reservoir simulation) to traditional manufacturing (CFD, etc.), supercomputing is no longer solely an activity of national governments. While the largest machines may continue to be owned by governments, more and more companies are realizing the competitive advantages that come from effective use of true supercomputer-class machines.

  • (Score: 2) by takyon (881) on Tuesday June 20 2017, @08:35PM (#528700) (1 child)

    What's your take on the new storage and memory technologies?

    Examples:

    High Bandwidth Memory
    Hybrid Memory Cube
    GDDR6/GDDR5X/DDR5
    3D QLC NAND
    NAND with 64-96 layers
    Intel/Micron 3D XPoint (the only significant post-NAND technology to make it to market)

    and last but probably least,

    helium-filled shingled magnetic recording hard drives (because HAMR is nowhere to be found)

    • (Score: 1, Informative) by Anonymous Coward on Wednesday June 21 2017, @12:27AM (#528807)

      What's your take on the new storage and memory technologies?
      Examples:
      High Bandwidth Memory
      Hybrid Memory Cube
      GDDR6/GDDR5X/DDR5
      3D QLC NAND
      NAND with 64-96 layers
      Intel/Micron 3D XPoint (the only significant post-NAND technology to make it to market)

      and last but probably least,

      helium-filled shingled magnetic recording hard drives (because HAMR is nowhere to be found)

      Memory with high bandwidth (and low latency) is becoming more and more critical in HPC. CPU core performance has, for some time now, been increasing faster than memory performance; that is, it's getting harder and harder to keep the cores fed with data. This is often more true for HPC (high-performance computing) workloads than for workloads in many other spaces. Now, there are different technologies which attempt to deliver this bandwidth (don't forget latency), as you point out. I wish I could say more here, but I can't due to NDA concerns. However, perhaps I can paint a very rough picture from publicly available information. HBM ("High Bandwidth Memory") is good stuff compared to DDR, and HBM2+ is better. There are some concerns here, as the HBM stacks use a very wide, parallel bus and need to be placed very close to the CPU/SoC die; I think we're talking about distances on the order of 1mm or so. There's only so much room close to a CPU/SoC die to place HBM stacks, given this distance requirement. Don't forget that the HBM stacks and the CPU/SoC die probably have to share a (likely silicon) interposer. The nice thing about HMC ("Hybrid Memory Cube") is that it uses a serial interface rather than a parallel one like HBM. Thus, HMC can be placed further away from the CPU/SoC die(s), which can increase total memory capacity. The issue is that you then pay for this extra capacity in latency, as you have to introduce a SerDes step, etc. Also, one needs to think about power consumption: consider the amount of energy, on average, that it takes to move one bit of data to or from memory with HBM vs. HMC; think picojoules per bit. -- I can't speak much to the GDDR or NAND stuff myself. XPoint sure sounds interesting, but there has been a lot of hype there; I wonder how it will turn out in the near to mid term. Again, keep an eye on power consumption, as that will limit the solution space XPoint can compete in. I don't know much about helium-filled drives either, sorry.
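
      As a rough way to see the "keeping the cores fed" problem for yourself, here is a STREAM-style triad you can run on any desktop (my own illustration, not tied to HBM or HMC specifically; the array size is arbitrary): the loop does almost no arithmetic, so the rate it reports is essentially your machine's usable memory bandwidth.

        /* Memory-bandwidth-bound triad (illustration only).
         * Build: cc -O2 -fopenmp triad.c -o triad */
        #include <omp.h>
        #include <stdio.h>
        #include <stdlib.h>

        #define N (1L << 24)           /* 16M doubles per array, ~128 MB each */

        int main(void) {
            double *a = malloc(N * sizeof(double));
            double *b = malloc(N * sizeof(double));
            double *c = malloc(N * sizeof(double));
            for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

            double t0 = omp_get_wtime();
            #pragma omp parallel for
            for (long i = 0; i < N; i++)
                a[i] = b[i] + 3.0 * c[i];   /* 2 loads + 1 store per element */
            double t1 = omp_get_wtime();

            /* Three 8-byte arrays stream through memory once. */
            double gbytes = 3.0 * N * sizeof(double) / 1e9;
            printf("triad: %.2f GB in %.3f s -> %.1f GB/s\n",
                   gbytes, t1 - t0, gbytes / (t1 - t0));

            free(a); free(b); free(c);
            return 0;
        }

      Compare the number you get against your CPU's peak flop rate, and the gap between the two is exactly the problem HBM, HMC, and friends are trying to close.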