

posted by n1 on Thursday April 10 2014, @07:39PM   Printer-friendly
from the will-it-play-crysis-though dept.

A $1,499 supercomputer on a card? That's what I thought when reading El Reg's report of AMD's Radeon R9 295X2 graphics card, which is rated at 11.5 TFlop/s(*). It is water-cooled, contains 5632 stream processors, has 8 GB of GDDR5 RAM, and runs at 1018 MHz.

AMD's announcement claims it's "the world's fastest, period". The $1,499 MSRP compares favorably to the $2,999 NVidia GTX Titan Z which is rated at 8 TFlop/s.

From a quick skim of the reviews (at: Hard OCP, Hot Hardware, and Tom's Hardware), it appears AMD has some work to do on its drivers to get the most out of this hardware. The twice-as-expensive NVidia Titan in many cases outperformed it (especially at lower resolutions). At higher resolutions (3840x2160 and 5760x1200) the R9 295X2 really started to shine.

For comparison, consider that this 500 watt, $1,499 card is rated better than the world's fastest supercomputer listed in the top 500 list of June 2001.

(*) Trillion FLoating-point OPerations per Second.

 
This discussion has been archived. No new comments can be posted.
  • (Score: 1) by opinionated_science on Thursday April 10 2014, @08:03PM

    by opinionated_science (4031) on Thursday April 10 2014, @08:03PM (#29677)

I read this review too, and it looks very exciting to have ever-increasing computational density!!

I understand, though, that there is a discrepancy between single precision and double precision that may be not so much a physical limitation as a software one. I believe the 11 TFlop/s is SP, and I imagine the DP performance might be 2.3 or so?

    Anyone know how they got that 11Tf number? Was it a real code?

In addition, although I understand OpenCL has improved, the AMD drivers are not considered as stable as Nvidia's, nor as the proprietary CUDA tools.

    My interest is in molecular biophysics (MD simulation), and I would really like to see a supercomputer on the desktop, or at least a fraction of Anton....

    • (Score: 2, Informative) by Bytram on Thursday April 10 2014, @08:10PM

      by Bytram (4043) on Thursday April 10 2014, @08:10PM (#29681) Journal

      I have not seen what specific code AMD (or NVidia) ran to get their numbers, but here's a link to the TOP 500 list's Linpack Benchmark Page [top500.org] and to the Linpack FAQ [netlib.org].

      • (Score: 1) by opinionated_science on Thursday April 10 2014, @08:20PM

        by opinionated_science (4031) on Thursday April 10 2014, @08:20PM (#29687)

Well, I was looking for LINPACK numbers and I found this:

        http://devgurus.amd.com/message/1285375#1285375 [amd.com] (OpenCL 8 GPU DGEMM (5.1 TFlop/s double precision). Heterogeneous HPL (High Performance Linpack from Top500).)

They got > 5 Tflop/s DP using 3 older Radeon cards, and it was posted Mar 2014, so an update will be interesting.

One thing LINPACK helps with is that it gives a measure of *some* useful work, so you can relate practical performance characteristics. OK, not ideal, but it stops the marketing fluff getting in the way ;-)

  • (Score: 2) by n1 on Thursday April 10 2014, @08:05PM

    by n1 (993) on Thursday April 10 2014, @08:05PM (#29678) Journal

Never ever thought I'd say this, but it seems like a good deal for a $1500 graphics card.

    • (Score: 3, Insightful) by Lazarus on Thursday April 10 2014, @08:14PM

      by Lazarus (2769) on Thursday April 10 2014, @08:14PM (#29684)

      It's a pretty bad deal for a graphics card, but an excellent one for a high-speed computing platform.

      • (Score: 1) by Bytram on Friday April 11 2014, @02:34AM

        by Bytram (4043) on Friday April 11 2014, @02:34AM (#29819) Journal

        It's a pretty bad deal for a graphics card, but an excellent one for a high-speed computing platform.

Excellent value, indeed! The supercomputer in question was called ASCI White [wikipedia.org], made by IBM and installed at the Lawrence Livermore National Laboratory. (ASCI = Accelerated Strategic Computing Initiative.) LLNL has a great write-up [llnl.gov] about it, including this picture [llnl.gov].

The ASCI White system contained 8,192 375 MHz processors, had 6 TB of memory, and held 160 TB of disk storage across about 7,000 disk drives. It weighed 106 tons, needed 3 MW of electricity to run, and another 3 MW for cooling. The system cost $110 million and was installed in a 20,000 sq ft computer room.

        By comparison, the Radeon R9 295x2 card comes up rather short on memory and storage, but compares quite favorably when looking at weight, power consumption, price, and size. =)

    • (Score: 1) by jasassin on Friday April 11 2014, @07:17AM

      by jasassin (3566) <jasassin@gmail.com> on Friday April 11 2014, @07:17AM (#29899) Homepage Journal

      At least your 5450 still works with fglrx and the new xorgs (I think HD5450 is the oldest to still work). My 3450 is doomed to Windows.

      --
      jasassin@gmail.com GPG Key ID: 0xE6462C68A9A3DB5A
  • (Score: 1) by GlennC on Thursday April 10 2014, @08:06PM

    by GlennC (3656) on Thursday April 10 2014, @08:06PM (#29680)

    All that power, and it's on a graphics card?

    I'm sure there's a market for it. I'm just as sure I'm not part of that market.

    --
    Sorry folks...the world is bigger and more varied than you want it to be. Deal with it.
    • (Score: 4, Informative) by maxwell demon on Thursday April 10 2014, @09:13PM

      by maxwell demon (1608) on Thursday April 10 2014, @09:13PM (#29717) Journal

      The market would be scientific computing. That is, using the graphics card as parallel computing coprocessor. The fact that it also has video out (if it has, there are actually cards that don't) is irrelevant for that purpose.

      --
      The Tao of math: The numbers you can count are not the real numbers.
      • (Score: 2, Interesting) by opinionated_science on Thursday April 10 2014, @09:24PM

        by opinionated_science (4031) on Thursday April 10 2014, @09:24PM (#29723)

Not entirely irrelevant. There is a use in molecular simulation for visualizing the system in question, and even for "steering" it while running.

      • (Score: 2, Interesting) by Kymation on Thursday April 10 2014, @09:32PM

        by Kymation (1047) Subscriber Badge on Thursday April 10 2014, @09:32PM (#29728)

        The manufacturer's page doesn't list video output in the specifications. Odd. The picture of the card shows five output connectors, so I suspect that it does actually have video out though.

    • (Score: 1) by _NSAKEY on Thursday April 10 2014, @09:55PM

      by _NSAKEY (16) on Thursday April 10 2014, @09:55PM (#29735)

      The more hardcore guys who post on hashcat.net's forum will probably have 4 of these running in one box within a week of the card's launch. Granted, it will take more than one power supply, but the kind of person who would bulk order these cards is also the same kind of person who has used multiple PSUs in the same rig before.

      • (Score: 1) by opinionated_science on Thursday April 10 2014, @10:19PM

        by opinionated_science (4031) on Thursday April 10 2014, @10:19PM (#29739)

It would be useful for those of us who want to do scientific calculations if they would run the computational benchmarks on their rigs!! I would wager a few scientists would have a crack at replicating the best-performing designs.

        We might get the vendors to start optimizing for reproducible calculation, rather than marketing numbers...

    • (Score: 2) by zim on Friday April 11 2014, @05:20AM

      by zim (1251) on Friday April 11 2014, @05:20AM (#29877)
      I AM part of that market! But the money is not part of me... So :(

      But hey. I can buy one in a few years when they're the new $150 card.
  • (Score: 3, Informative) by takyon on Thursday April 10 2014, @08:23PM

    by takyon (881) <reversethis-{gro ... s} {ta} {noykat}> on Thursday April 10 2014, @08:23PM (#29690) Journal

    AMD: [anandtech.com]

          AMD Radeon R9 295X2        AMD Radeon R9 290X     AMD Radeon HD 7990      AMD Radeon HD 7970 GHz Edition
    FP64  1/8                        1/8                     1/4                     1/4

    NVIDIA: [anandtech.com]

          GTX Titan Black            GTX 780 Ti              GTX Titan               GTX 780
    FP64  1/3 FP32                   1/24 FP32               1/3 FP32                1/24 FP32

    "Today NVIDIA is letting its compute-at-home customers have their cake and eat it too with the GeForce GTX Titan Black. The Titan Black is a full GK110 implementation, just like the GTX 780 Ti, with all of the compute focused-ness of the old GTX Titan. That means you get FP64 performance that's only 1/3 of the card's FP32 performance (compared to 1/24 with the 780 Ti)."

    Double-precision floating-point format [wikipedia.org]
    FLOPS - Floating-point operation and integer operation [wikipedia.org]

    --
    [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
    • (Score: 1) by opinionated_science on Thursday April 10 2014, @08:39PM

      by opinionated_science (4031) on Thursday April 10 2014, @08:39PM (#29700)

Ahh, thank you for the summary of the crippled cards ;-)

From what I understand, DP = 1/2 SP from a memory bandwidth point of view.

Are there any technical reasons that the best FP64 performance is 1/3 of FP32, other than marketing?

  • (Score: 3, Insightful) by Dunbal on Thursday April 10 2014, @08:35PM

    by Dunbal (3515) on Thursday April 10 2014, @08:35PM (#29697)

    AMD has some work to do on its drivers? Seriously? OK. Hold your breath. Any decade now. AMD's shortcoming has ALWAYS been its drivers. I wouldn't expect much more because crappy drivers are just business as usual for them. Hey but maybe they'll release their code so then it's not their fault anymore, is it? It's yours and mine.

  • (Score: 1) by dbe on Thursday April 10 2014, @09:17PM

    by dbe (1422) on Thursday April 10 2014, @09:17PM (#29721)

    Kind of related to this, for someone familiar with standard 'linear' programming, what is the best way to approach these monsters?
    If you want to do signal/image processing or other embarrassingly parallel tasks, what would you recommend learning, openmp?
Also, after dealing with hand-optimization on modern SIMD processors (NEON/ARM), is it realistic to understand these cards' pipeline and cache structure to really get the best performance when writing a computation kernel?

    -dbe

    • (Score: 2, Insightful) by No.Limit on Thursday April 10 2014, @10:38PM

      by No.Limit (1965) on Thursday April 10 2014, @10:38PM (#29744)

I think for GPU computing you have the option of OpenCL, Nvidia's CUDA, and OpenACC (there may be more that I don't know of). OpenMP is still CPU-only as far as I know, though I believe I've read that OpenMP plans to support GPUs too at some point.

      I don't know much about OpenCL, but it's an open standard and supported on many platforms. I believe it's quite similar to CUDA.

Nvidia's CUDA works only on Nvidia GPUs (so certainly not on this AMD one). It has lots of good tools (profilers, debuggers, etc.), documentation, examples, and video tutorials. It works very well and gives you a lot of control over the GPU.

OpenACC is a younger standard for GPU coding. It's on a much higher level than both OpenCL and CUDA, but you still get a pretty good amount of control by specifying additional information for the compiler. There are some proprietary compilers that support it (e.g. from Cray or PGI). GCC wants to support OpenACC as well, but I don't think they're very far along at the moment.

Now for SIMD instructions, pipelining and cache structures: GPUs are fundamentally different from CPUs.
A GPU core is much, much simpler than a CPU core. To improve sequential execution, CPUs have added a lot of complexity (branch prediction, caching, out-of-order execution, pipelining, etc.).
GPUs, however, have mostly focused on parallel performance. So instead they kept the cores simple and made sure that adding more cores scales well.

So because GPUs are already so well optimized for parallel computing, you don't have to do a lot yourself when it comes to the details. You may not even be able to code in assembly, but only in C.
You mainly want to make sure that the overall structure is optimized well.

      So that means using caches efficiently (the usual struct of arrays instead of array of structs, cache friendly access patterns etc). In CUDA the cores are divided into blocks that have a shared faster memory (like a cache) over which you have control meaning you can load data manually.
      You want to make sure that you divide the work well over the blocks and cores. And if you have to transfer a lot of data from or to GPU memory (over the slow PCIe), then you want to make sure that you don't block computation with the transfer (you can transfer data and compute things at the same time).

  • (Score: 2, Insightful) by VanessaE on Thursday April 10 2014, @11:03PM

    by VanessaE (3396) <vanessa.e.dannenberg@gmail.com> on Thursday April 10 2014, @11:03PM (#29758) Journal

Ok, fifteen hundred bucks if you BUY IT NAOW! Sure, we all know it'll come down to a more reasonable price eventually, but what about on the back end, *after* you buy it? 500 watts for just the GPU, assuming that's when it's maxed out? I'm sorry, but last I knew, hardcore gamers who would buy such a card tend to play for hours on end, let alone folks who would buy them for mining coins, and that kind of power usage is just insane.

    My three computers, four decent DFP monitors, and all their ancillary gadgetry all combined use between 790 and 815 watts (according to my Kill-a-Watt) when they're all running at full blast, and they are *not* low-end hardware at all.

  • (Score: 1) by monster on Friday April 11 2014, @10:45AM

    by monster (1260) on Friday April 11 2014, @10:45AM (#29967) Journal

Acronym nitpicking: It's not 'Trillion FLoating-point OPerations per Second', it's 'Tera FLoating-point OPerations per Second'. The fact that Tera starts with the same letter as the English 'trillion' is just a coincidence. You can see the difference with other units, like GFLOP (Giga) and not BFLOP (Billion).

    • (Score: 1) by Bytram on Friday April 11 2014, @01:08PM

      by Bytram (4043) on Friday April 11 2014, @01:08PM (#30014) Journal

      If you go to The Linpack Benchmark [top500.org] on the TOP 500 site, there's a link to Frequently Asked Questions on the Linpack Benchmark and Top500 [netlib.org]. At the entry for "What is a Mflop/s?" which you can reach directly [netlib.org], it states:

      What is a Mflop/s?

      Mflop/s is a rate of execution, millions of floating point operations per second. Whenever this term is used it will refer to 64 bit floating point operations and the operations will be either addition or multiplication. Gflop/s refers to billions of floating point operations per second and Tflop/s refers to trillions of floating point operations per second.

      As these are the folks who came up with the list, I defer to their historical and continued use of this definition.

      • (Score: 1) by monster on Friday April 11 2014, @01:18PM

        by monster (1260) on Friday April 11 2014, @01:18PM (#30022) Journal

Thanks for the clarification, but unless they are being inconsistent in their naming, their comment validates my point: it's Mflops (Mega), Gflops (Giga) and Tflops (Tera), even if they put them side by side with their numerical values.

        Anyway, enough nitpicking for now.