
SoylentNews is people

posted by janrinok on Thursday June 25 2015, @06:07PM
from the one-step-at-a-time dept.

NVIDIA's latest sign of its newfound open-source goodwill is that it has begun providing open-source hardware reference headers for its latest GK20A/GM20B Tegra GPUs, and it is working to provide hardware header files for its older GPUs as well. These programming headers will in turn help development of the open-source Nouveau video driver, much of which has so far had to be done through reverse engineering.

To make Nouveau NVIDIA's primary development environment for Tegra, they are looking at adding "official" hardware reference headers to Nouveau. Ken explained, "The headers are derived from the information we use internally. I have arranged the definitions such that the similarities and differences between GPUs is made explicit. I am happy to explain the rationale for any design choices and since I wrote the generator I am able to tweak them in almost any way the community prefers."

So far he has been cleared to provide the programming headers for the GK20A and GM20B. For those concerned that this is just a way to drive up future Tegra sales, Ken added, "over the long-term I'm confident any information we need to fill-in functionality >= NV50/G80 will be made public eventually. We just need to go through the internal steps necessary to make that happen."

Perhaps most interesting is that moving forward they would like to use the Nouveau kernel driver code-base as the primary development environment for new hardware. In 2012 Torvalds sent a public "fuck you!" to Nvidia. Also, don't forget Intel and AMD offerings.


  • (Score: 3, Interesting) by gman003 on Thursday June 25 2015, @11:45PM

    by gman003 (4155) on Thursday June 25 2015, @11:45PM (#201303)

    You have quite clearly never done any serious GPU programming.

    "Smart" sound cards, which did heavy audio processing, fell by the wayside for two reasons. First, low-end, dumb, integrated audio was good enough for enough people that dedicated sound cards became rare, too rare for most applications to bother supporting. Second, and more importantly, audio processing was easy and efficient to do on the CPU. Audio processing is basically multiply-accumulate over buffers: attenuate this source by this much, then add it to the sound buffer. The only thing I can think of that wouldn't be done with FMA is pitch shifting, which I would guess is an FFT, still easy enough to do on a CPU. And it's a small, one-dimensional buffer: a second of stereo audio is only 172KiB, so you can have quite a number of buffers before the data set becomes too large.

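    To make the multiply-accumulate point concrete, here is a minimal sketch in Python, not how any real mixer is written. The name mix_into and the 44.1 kHz, 16-bit stereo format are assumptions for illustration; the comment does not state a format, but those numbers do reproduce the 172KiB figure.

```python
# Illustrative sketch only: mix_into and the audio format constants are
# assumptions, not taken from any real audio API.

SAMPLE_RATE = 44_100      # samples per second per channel (assumed CD quality)
CHANNELS = 2              # stereo
BYTES_PER_SAMPLE = 2      # 16-bit samples (assumed)


def mix_into(dest, source, gain):
    """Attenuate `source` by `gain` and accumulate it into `dest` in place."""
    for i, sample in enumerate(source):
        dest[i] += gain * sample   # one multiply-accumulate per sample
    return dest


def one_second_bytes():
    """Size in bytes of one second of audio in the assumed format."""
    return SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE


if __name__ == "__main__":
    out = [0.0, 0.0, 0.0]
    mix_into(out, [1.0, -1.0, 0.5], gain=0.5)
    print(out)                          # [0.5, -0.5, 0.25]
    print(one_second_bytes() / 1024)    # 172.265625
```

    The whole job is one linear pass over a small array, which is exactly the kind of work a CPU handles comfortably.
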
    Compare graphics. First, the scale of the problem grows enormously because the buffers are 2D, not 1D, and there are a lot more of them (instead of left/right audio, you have red, green, blue, depth, and alpha, and you may be compositing several full-screen buffers for one frame). A single 1080p buffer is about 8MiB, and you'll be using many of those.
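
    The 8MiB figure can be checked with a quick back-of-the-envelope calculation. This is only a sketch: buffer_bytes is a hypothetical helper, and 4 bytes per pixel (for example 8-bit RGBA) is an assumption, since the comment does not say which pixel format it has in mind.

```python
# Illustrative sketch only: buffer_bytes is a hypothetical helper, and
# 4 bytes per pixel (e.g. 8-bit RGBA) is an assumption.

MIB = 1024 * 1024


def buffer_bytes(width, height, bytes_per_pixel=4):
    """Size in bytes of one uncompressed full-screen buffer."""
    return width * height * bytes_per_pixel


if __name__ == "__main__":
    frame = buffer_bytes(1920, 1080)
    print(frame)                      # 8294400
    print(f"{frame / MIB:.2f} MiB")   # 7.91 MiB
```

    Under that assumption, a single 1080p buffer is roughly 47 times the size of the 172KiB one-second audio buffer, before counting depth buffers or multiple compositing passes.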

    Next, rasterization is an inherently parallel task. Big, efficient CPU cores (superscalar, out-of-order, speculatively executing) aren't dramatically faster at it than a single GPU "core", which is scalar, in-order and literally dumber than a Pentium Pro core. But a decent GPU has hundreds of ALUs, and a top-end one has thousands, while the widest CPU I know of is just 18 cores. GPUs are an embarrassingly parallel solution to an embarrassingly parallel problem. There are some old algorithms that could exploit CPU efficiencies better, but even modern software renderers don't use them. It's easier to just throw cores at the problem.

    And all those cores need to be fed. GPU caches can be small, because for the most part you process one tile and then move on, but the memory bandwidth needed is staggering. An entry-level gaming GPU will push about 80GiB/s in memory I/O, which beats even a top-end server CPU. The top end GPUs can push about 600GiB/s, and once HBM matures, we're looking at terabytes per second of data being pushed.

    We're never going to move graphics back to the CPU. Best-case, we'll have an onboard GPU that's fully-programmable, with a published ISA and the few remaining fixed-function blocks moved into code, attached to a CPU with a far, far wider memory bus than today. But if we hit a wall on processor scaling before we hit a wall on LCD resolutions, we'll still have discrete GPUs just to manage the heat dissipation.
