The Platform reports that CPU export restrictions [soylentnews.org] to Chinese supercomputing centers may have backfired [theplatform.net]. Tianhe-2, which uses a heterogeneous architecture that mixes Intel's Xeon and Xeon Phi chips, has remained the world's top supercomputer for the last five iterations of the TOP500 list. Tianhe-2 will likely be upgraded to Tianhe-2A within the next year (rather than by the end of 2015 as originally planned), nearly doubling its peak performance from 54.9 petaflops to around 100 petaflops while barely raising peak power usage. However, instead of a new Intel Xeon Phi chip, the upgrade will use a homegrown "China Accelerator" and a novel architecture.
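To put "nearly doubling" into numbers, the short Python sketch below divides the planned peak by the current one, using only the two figures quoted above. The 100 petaflops value is the article's approximate target, and the note about efficiency assumes, as the article suggests, that peak power stays roughly flat; neither is a measured result.

```python
# Peak performance figures quoted in the article, in petaflops.
TIANHE2_PEAK_PF = 54.9
TIANHE2A_PEAK_PF = 100.0  # "around 100 petaflops": an approximate target, not a measurement


def speedup(old_pf: float, new_pf: float) -> float:
    """Return the ratio of the new peak performance to the old one."""
    return new_pf / old_pf


ratio = speedup(TIANHE2_PEAK_PF, TIANHE2A_PEAK_PF)
print(f"Peak performance ratio: {ratio:.2f}x")

# Assumption: if peak power barely changes, peak flops per watt
# improves by roughly the same factor as peak performance.
```

The ratio comes out at about 1.82x, which is why the upgrade is described as nearly, rather than fully, doubling peak performance.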
A few details about the accelerator [theplatform.net] are known:
Unlike other [digital signal processor (DSP)] efforts that were aimed at snapping into supercomputing systems, this one is not a 32-bit part but is capable of supporting 64-bit, and further, it supports both single (as others do) and double precision. The performance for both single and double precision is worth remarking upon (around 4.8 teraflops single precision and 2.4 teraflops double precision for one card) in a rather tiny power envelope. It will support high bandwidth memory as well as PCIe 3.0. In other words, it gives GPUs and Xeon Phi a run for the money, but the big question has far less to do with hardware capability and more to do with how the team at NUDT will be able to build out the required software stack to support applications that can gobble millions of cores on what is already by far the most core-dense machine on the planet.