posted by janrinok on Thursday June 25 2015, @06:07PM   Printer-friendly
from the one-step-at-a-time dept.

Nvidia's latest mark of newly discovered open-source kindness: the company has begun providing open-source hardware reference headers for its latest GK20A/GM20B Tegra GPUs, and is working to provide hardware header files for its older GPUs as well. These programming headers will in turn help development of the open-source Nouveau video driver, which up to this point has had to rely heavily on reverse-engineering.

In order to make Nouveau viable as NVIDIA's primary development environment for Tegra, they are looking at adding "official" hardware reference headers to Nouveau. Ken explained, "The headers are derived from the information we use internally. I have arranged the definitions such that the similarities and differences between GPUs is made explicit. I am happy to explain the rationale for any design choices and since I wrote the generator I am able to tweak them in almost any way the community prefers."

So far he has been cleared to provide the programming headers for the GK20A and GM20B. For those concerned that this is just about driving up future Tegra sales, Ken added, "over the long-term I'm confident any information we need to fill-in functionality >= NV50/G80 will be made public eventually. We just need to go through the internal steps necessary to make that happen."
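
NVIDIA has not published these particular files yet as of this story, so the exact contents are an assumption, but vendor reference headers of this kind are generally just register offsets and bit-field definitions consumed by a driver's MMIO accessors. A minimal, hypothetical sketch in Linux-kernel style (all names and values are invented for illustration, not the real GK20A/GM20B definitions):

    #include <linux/io.h>

    /* Hypothetical reference-header fragment: register offsets and bit fields.
     * All names and values are invented for illustration; they are NOT the
     * actual GK20A/GM20B definitions. */
    #define NV_PFIFO_BASE                  0x00002000
    #define NV_PFIFO_INTR                  (NV_PFIFO_BASE + 0x0100)
    #define NV_PFIFO_INTR_EN               (NV_PFIFO_BASE + 0x0140)
    #define NV_PFIFO_INTR_BIND_ERROR       (1U << 0)
    #define NV_PFIFO_INTR_SCHED_ERROR      (1U << 8)

    /* A driver then programs the hardware through its MMIO mapping: */
    static inline void fifo_intr_enable(void __iomem *mmio)
    {
            writel(NV_PFIFO_INTR_BIND_ERROR | NV_PFIFO_INTR_SCHED_ERROR,
                   mmio + NV_PFIFO_INTR_EN);
    }

Having the offsets and bit names from the vendor removes the guesswork that Nouveau previously had to do by probing and diffing register state.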

Perhaps most interesting is that, moving forward, they would like to use the Nouveau kernel driver code-base as the primary development environment for new hardware; quite a turnaround from 2012, when Torvalds sent a public "fuck you!" to Nvidia. The open-source offerings from Intel and AMD are also worth keeping in mind.


Original Submission

 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 1, Insightful) by Anonymous Coward on Thursday June 25 2015, @09:36PM

    by Anonymous Coward on Thursday June 25 2015, @09:36PM (#201236)

    I can give a good reason why they don't want to be so forthcoming with the intricate details of how their hardware works. Remember the days when a Sound Blaster branded card was the sound card of choice if you didn't want to be laughed at? Shortly after Creative Labs' peak popularity, clones started appearing.

    As another anon states, the clones just provided the same interface to the hardware; by tracing the circuitry it was easy to see how the SB worked. However, there is another reason that GPU makers are holding their proprietary driver blobs dear: look at what happened to audio cards. There were all sorts of features: more mixing channels, bigger instrument banks, etc. Now none of that matters, as it's all mixed CPU-side; we just need an interface to select which speaker gets which batch of samples at what rate and we're done. The same is happening to GPUs.

    At first there was the fixed-function pipeline, and GPU code could optimize around its interface. Now the programmable pipeline has made all such functional offerings obsolete. E.g., offerings based on how many dynamic lights the "card" supported are now completely redundant, as pixel shader code can support as many or as few lights, with as many or as few attributes, as it wants per pass. The number of lights is not "hardware" dependent anymore, and it largely never was (there was some algorithmic hardware acceleration for some aspects of the fixed-function pipe at first, but it quickly migrated to firmware [supplied by drivers]).

    With GPU compute shaders and shared memory architectures, the line between GPU and CPU is dissolving. Soon we'll have back the freedom of pure software rasterization (which could freely interact with CPU memory, with no need to keep geometry in memory separate from the networking, physics, etc., and no bottleneck between the physics / networking / rendering RAM). Just as the once-separate FPU or math co-processor is now integrated into the CPU, the parallelization hardware of GPUs will merge with CPUs, and the only distinguishing feature will then be the driver code that OSes provide to developers.
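
    A rough sketch of the point above about dynamic lights, assuming nothing about any particular GPU or API: once shading is ordinary code (shader or pure software), the light count is just a loop bound rather than a hardware capability. Everything below is invented for illustration.

        #include <stddef.h>

        /* Illustrative per-pixel diffuse lighting with an arbitrary number of
         * lights. The fixed-function pipeline capped this (typically at 8);
         * in shader or software code the limit is simply how long you let
         * this loop run. */
        struct light {
                float dir[3];    /* normalized direction toward the light */
                float color[3];  /* light color/intensity */
        };

        static void shade_pixel(const float normal[3],
                                const struct light *lights, size_t nlights,
                                float out_rgb[3])
        {
                out_rgb[0] = out_rgb[1] = out_rgb[2] = 0.0f;
                for (size_t i = 0; i < nlights; i++) {
                        float ndotl = normal[0] * lights[i].dir[0]
                                    + normal[1] * lights[i].dir[1]
                                    + normal[2] * lights[i].dir[2];
                        if (ndotl < 0.0f)
                                ndotl = 0.0f;
                        for (int c = 0; c < 3; c++)
                                out_rgb[c] += ndotl * lights[i].color[c];
                }
        }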

    It's a losing battle to keep the drivers proprietary. Eventually they must be made openly available if we're ever to do online banking, email, etc. on hardware that uses the "GPU" as a component of ordinary processing, the way the FPU is used now. GPU vendors know that future is coming and are vying to position their proprietary vectorization solutions as the de facto standard, and thus gain more control over their competitors.

  • (Score: 2) by kaszz on Thursday June 25 2015, @11:11PM

    by kaszz (4211) on Thursday June 25 2015, @11:11PM (#201284) Journal

    Will the GPU become generic and efficient enough to really make the CPU obsolete, so that merging them is meaningful?
    As far as I know, GPUs are very good at specific algorithms on large amounts of data, but generic processing is another thing.

    • (Score: 2) by TheRaven on Friday June 26 2015, @10:00AM

      by TheRaven (270) on Friday June 26 2015, @10:00AM (#201455) Journal
      The nVidia Project Denver architecture looks like it could be the start of this kind of convergence. It's descended from the Transmeta designs (nVidia picked up Transmeta's technology): it has an internal (private, undocumented) VLIW instruction set and an ARM decoder that performs trivial translation from ARM to VLIW. There's also a JIT that traces ARM operations and compiles hot code paths to a much more efficient encoding. The structure of the VLIW pipeline borrows a lot from nVidia GPU designs, and it is easy to imagine a variant with a much wider set of FPU pipes for SIMT code and explicit instructions for turning them on and off, which would allow the control and register-renaming logic, as well as a subset of the pipelines, to be shared between the CPU and GPU.
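
      Denver's actual translation layer is private, so the following is only a toy sketch of the general idea of trace-based translation (interpret, count, translate hot blocks); none of it reflects nVidia's implementation.

          #include <stdint.h>
          #include <stdio.h>

          /* Toy sketch of trace-based dynamic translation: run guest code
           * blocks through a slow interpreter, count executions, and switch
           * a block to a "translated" fast path once it becomes hot. */
          #define HOT_THRESHOLD 1000

          struct block {
                  void (*interpret)(void);   /* slow path: decode and execute */
                  void (*translated)(void);  /* fast path from the translator */
                  uint64_t hits;
          };

          static void run_block(struct block *b)
          {
                  if (b->translated) {
                          b->translated();   /* hot block: optimised version */
                          return;
                  }
                  b->interpret();
                  if (++b->hits == HOT_THRESHOLD) {
                          /* A real translator would emit optimised native
                           * (e.g. VLIW) code for the traced path here; this
                           * stub reuses the interpreter so the sketch stays
                           * self-contained. */
                          printf("block %p is hot; translating\n", (void *)b);
                          b->translated = b->interpret;
                  }
          }
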
      --
      sudo mod me up
      • (Score: 2) by kaszz on Friday June 26 2015, @10:32AM

        by kaszz (4211) on Friday June 26 2015, @10:32AM (#201462) Journal

        If they try to keep instruction sets secret when delving into the generic CPU area, they will have a problem.

        Would be nice if sellers started with a sticker [Nvidia free]. ;)

        • (Score: 0) by Anonymous Coward on Friday June 26 2015, @04:32PM

          by Anonymous Coward on Friday June 26 2015, @04:32PM (#201574)

          So Intel does publish the microcode instruction set of their CPUs? I don't think so.

        • (Score: 2) by TheRaven on Monday June 29 2015, @08:40AM

          by TheRaven (270) on Monday June 29 2015, @08:40AM (#202694) Journal
          As the other poster points out, x86 vendors do this: their CPUs translate x86 instructions into something totally undocumented (and do optimisations at this level, fusing micro-ops from different instructions into single micro-ops and even doing some quite clever things like recognising memcpy idioms). There's no real difference for nVidia doing this. It's quite nice to keep the real ISA secret, because it means that you can change it periodically. IBM has done this with their mainframes quite explicitly since the '60s, with a public ISA that's completely decoupled from the implementation.
          --
          sudo mod me up
  • (Score: 3, Interesting) by gman003 on Thursday June 25 2015, @11:45PM

    by gman003 (4155) on Thursday June 25 2015, @11:45PM (#201303)

    You have quite clearly never done any serious GPU programming.

    "Smart" sound cards, that did heavy audio processing, fell by the wayside for two reasons: low-end, dumb, integrated audio was good enough for enough people that dedicated sound cards became rare, too rare for most applications to bother supporting, and ore importantly, audio processing was easy and efficient to do on the CPU. Audio processing is basically multiply-accumulate buffers - attenuate this source by this much, then add it to the sound buffer. The only thing I can think of that wouldn't be done with FMA is pitch shifting, which I would guess is a FFT, still easy enough to do on a CPU. And it's a small, one-dimensional buffer - a second of stereo audio is only 172KiB, so you can have quite a number of buffers before the data set becomes too large.

    Compare graphics. First, the scale of the problem goes up by orders of magnitude, because you're dealing with 2D buffers rather than 1D ones, and there are a lot more of them (instead of left/right audio, you have red, green, blue, depth, and alpha, and you may be compositing several full-screen buffers for one frame). A single 32-bit 1080p buffer is about 8 MiB, and you'll be using many of those.

    Next, rasterization is an implicitly parallel task. Efficient cores (big superscalar, out-of-order, speculatively executing CPU cores) aren't particularly faster than a single GPU "core", which is scalar, in-order, and literally dumber than a Pentium Pro core. But a decent GPU has hundreds of ALUs, and a top-end one has thousands, while the widest CPU I know of has just 18 cores. GPUs are an embarrassingly parallel solution to an embarrassingly parallel problem. There are some old algorithms that could exploit CPU efficiencies better, but even modern software renderers don't use them; it's easier to just throw cores at the problem.
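
    The "throw cores at it" structure looks roughly like this sketch (details invented): the framebuffer splits into tiles with no dependencies between them, so every tile can be shaded on a different core or GPU "core".

        #include <stdint.h>

        /* Sketch of why rasterization parallelises so well: each framebuffer
         * tile can be processed independently. Tile size and the "shading"
         * are placeholders; a real renderer would rasterize triangles per tile. */
        #define TILE 16

        static void shade_tile(uint32_t *fb, int fb_w,
                               int x0, int y0, int w, int h)
        {
                for (int y = y0; y < y0 + h; y++)
                        for (int x = x0; x < x0 + w; x++)
                                fb[y * fb_w + x] = 0xff000000u; /* stand-in for real work */
        }

        static void draw_frame(uint32_t *fb, int w, int h)
        {
                /* Every iteration is independent, so a "#pragma omp parallel for"
                 * (or a thread pool, or thousands of GPU threads) fans the
                 * tiles out across however many cores exist. */
                for (int ty = 0; ty < h; ty += TILE)
                        for (int tx = 0; tx < w; tx += TILE) {
                                int tw = (tx + TILE > w) ? w - tx : TILE;
                                int th = (ty + TILE > h) ? h - ty : TILE;
                                shade_tile(fb, w, tx, ty, tw, th);
                        }
        }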

    And all those cores need to be fed. GPU caches can be small, because for the most part you process one tile and then move on, but the memory bandwidth needed is staggering. An entry-level gaming GPU will push about 80 GiB/s in memory I/O, which beats even a top-end server CPU. The top-end GPUs can push about 600 GiB/s, and once HBM matures, we're looking at terabytes per second of data being pushed.

    We're never going to move graphics back to the CPU. Best-case, we'll have an onboard GPU that's fully-programmable, with a published ISA and the few remaining fixed-function blocks moved into code, attached to a CPU with a far, far wider memory bus than today. But if we hit a wall on processor scaling before we hit a wall on LCD resolutions, we'll still have discrete GPUs just to manage the heat dissipation.