
posted by martyb on Thursday October 05 2017, @12:12PM
from the please-let-the-vapors-condense dept.

From the lowRISC project's blog:

A high quality, upstream RISC-V backend for LLVM is perhaps the most frequently requested missing piece of the RISC-V software ecosystem… As always, you can track status here and find the code here.

RV32

100% of the GCC torture suite passes for RV32I at -O0, -O1, -O2, -O3, and -Os (after masking gcc-only tests). MC-layer (assembler) support for RV32IMAFD has now been implemented, as well as code generation for RV32IM.

RV64

This is the biggest change versus my last update. LLVM recently gained support for parameterising backends by register size, which allows code duplication to be massively reduced for architectures like RISC-V. As planned, I've gone ahead and implemented RV64I MC-layer and code generation support making use of this feature. I'm happy to report that 100% of the GCC torture suite passes for RV64I at O1, O2, O3 and Os (and there's a single compilation failure at O0). I'm very grateful for Krzysztof Parzyszek's (QUIC) work on variable-sized register classes, which has made it possible to parameterise the backend on XLEN in this way. That LLVM feature was actually motivated by requirements of the Hexagon architecture - I think this is a great example of how we can all benefit by contributing upstream to projects, even across different ISAs.
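(To illustrate what parameterising on XLEN buys, here is a hypothetical C++ sketch; it is not LLVM or lowRISC code, just the shape of the idea: one definition written once serves both the 32-bit and 64-bit register widths instead of being duplicated per target.)

```cpp
#include <cstdint>
#include <cstdio>
#include <type_traits>

// Toy illustration only: one routine, parameterised on the register width (XLEN),
// instead of separate RV32 and RV64 copies of the same logic.
template <typename XLenT>  // XLenT would be uint32_t for RV32, uint64_t for RV64
XLenT sign_extend_imm12(uint32_t raw) {
    static_assert(std::is_unsigned<XLenT>::value, "XLEN type must be unsigned");
    XLenT value = raw & 0xFFF;                    // RISC-V I-type immediates are 12 bits
    if (value & 0x800)                            // sign bit of the 12-bit field
        value |= ~static_cast<XLenT>(0xFFF);      // extend it up to XLEN
    return value;
}

int main() {
    std::printf("%08x\n", (unsigned)sign_extend_imm12<uint32_t>(0xFFF));              // ffffffff
    std::printf("%016llx\n", (unsigned long long)sign_extend_imm12<uint64_t>(0xFFF));  // ffffffffffffffff
}
```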

[...] Community members Luís Marques and David Craven have been experimenting with D and Rust support respectively.

[...] Approach and philosophy

As enthusiastic supporters of RISC-V, I think we all want to see a huge range of RISC-V core implementations, making different trade-offs or targeting different classes of applications. But we don't want to see that variety in the RISC-V ecosystem result in dozens of different vendor-specific compiler toolchains and a fractured software ecosystem. Unfortunately most work on LLVM for RISC-V has been invested in private/proprietary code bases or short-term prototypes. The work described in this post has been performed out in the open from the start, with a strong focus on code quality, testing, and on moving development upstream as quickly as possible - i.e. a solution for the long term.


Original Submission

 
  • (Score: 3, Informative) by Anonymous Coward on Thursday October 05 2017, @12:52PM (31 children)

    by Anonymous Coward on Thursday October 05 2017, @12:52PM (#577421)

    They have a 128-bit variant of the CPU architecture available, and while it isn't available in FPGA, never mind ASIC form yet AFAIK, it would really help to start clearing up the datatypes and working on 128-bit errata in the codebase now, rather than sitting on it like the industry did with the 64-bit CPU variants 15-25 years ago. That way first the compiler, and then the C/C++ code it can compile, would be 128-bit clean long before we have working hardware implementations, so only issues with the silicon, rather than the already-known theoretical problems, would hold things up or cause bugs.
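    As a concrete illustration of what "N-bit clean" means in practice, here is a minimal C++ sketch (my own example, not from the comment): the classic breakage is code that stuffs a pointer into a fixed-width integer type, which silently truncates once pointers grow past that width.

    ```cpp
    #include <cstdint>
    #include <cstdio>

    int main() {
        int x = 42;
        void* p = &x;

        // Not N-bit clean: assumes a pointer always fits in 32 bits.
        // uint32_t bad = (uint32_t)(uintptr_t)p;   // truncates on 64-bit (and 128-bit) targets

        // N-bit clean: uintptr_t tracks whatever the pointer width is on this target.
        uintptr_t ok = reinterpret_cast<uintptr_t>(p);

        std::printf("pointer width on this target: %zu bits\n", sizeof(void*) * 8);
        std::printf("round-tripped value: %d\n", *reinterpret_cast<int*>(ok));
    }
    ```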

    Personally I would like RISC-V to succeed, but until they tape out a trustworthy SoC or desktop CPU as well as an independent motherboard chipset, the whole project seems mostly like navel gazing to me. (I don't care as much about the bus latencies as AMD/Intel chose to; I would personally rather keep the CPU the CPU and have memory/peripheral bus support on dedicated chips. That makes it easier to support different CPU architectures: hopefully the J-series SuperH variants, and perhaps in the future x86 clones or other CPUs people might get produced, if there were an open motherboard chipset and a patent-free CPU-to-northbridge interconnect bus.)

    Embedded CPUs are already both plenty open and plenty cheap. AiO machines (think 8-bit computers/Raspberry Pi) are moderately closed, but with open variants for some chips/boards/models, whereas desktop hardware, from the CPU to the motherboard to the display card, is heavily closed and now basically impossible to modify at the software/firmware level. That is the segment of the market that needs immediate attention, because without it all the other markets are already untrustworthy, since the toolchains, memory, capacity, etc. for building software for the smaller devices are almost always hosted on desktop-grade hardware.

    You can barely compile anything past C on a Raspberry Pi anymore, for instance. (Seriously, Firefox needs something like 4 gigs for some parts, and linking may be more now. LLVM can barely be built in 1 gig, and only if you ensure the linking steps are done single-job, since every compilation thread takes between 128 and 512 megs of RAM EACH, not including linking or link-time optimizations.)

  • (Score: 2) by DannyB on Thursday October 05 2017, @01:14PM (7 children)

    by DannyB (5839) Subscriber Badge on Thursday October 05 2017, @01:14PM (#577425) Journal

    1. +1 Interesting
    2. Wow. I thought I wrote run on sentences.
    3. Isn't it possible to cross compile / link Raspberry Pi code from bigger beefier* machines? Or is that a problem?
    4. LLVM is something like a dream come true. Something I could only have dreamed of in the 90's. Not that I care to work at that level. But I had come to accept the idea of C source as the "low level" intermediate language target for toy high level languages.

    * or some suitable alternative architecture for vegans

    --
    The lower I set my standards the more accomplishments I have.
    • (Score: 4, Informative) by Anonymous Coward on Thursday October 05 2017, @02:11PM (6 children)

      by Anonymous Coward on Thursday October 05 2017, @02:11PM (#577448)

      1. ...
      2. ...
      3. The bigger beefier machines now have either TrustZone or Intel ME with signed firmware running on them, rendering them even less trustworthy than prior concerns over CPU microcode updates (which at least would require unverified code to trigger any backdoors). While the RPi SoC still has some issues, it has firmware signing disabled (I believe it also supported an earlier TrustZone implementation that uses a shared CPU state, similar to the hypervisor support in the RPi 2/3 SoC, but disabled for the RPi boards; it's an optional feature for people creating proprietary tamper-resistant devices using the Broadcom SoC). Given that neither Intel nor AMD seems willing to offer minimal documented versions of their management engine firmware, and given that the capabilities each has involve network access and 'offer' you cryptographic storage of your 'assumed private' encryption keys, there is a very real danger that modern CPUs are intended either to give up said keys on command, or to be capable of being triggered to automatically submit keys at some point in the future. Similar concerns to Windows 10/8.1/7 telemetry, only at a much lower level. On top of that come the restrictions on BIOS replacements of the kind coreboot/LinuxBIOS has been working on for the past 15+ years, work which has become much more restricted in scope since 2009 and then 2013 (when first Intel and then AMD started pushing signed firmware blobs into their CPU/motherboard chipset initialization). In many cases the BIOS can no longer be replaced without an external reflasher, since they now software-strap the write pin in the firmware rather than using either a physical jumper on the motherboard or a software jumper in the BIOS, so you cannot even replace the BIOS if you have administrator access to the machine, because they choose not to let you.

      This is why the pushes for RISC-V and the J-series SuperH fork are so important. x86 is not trustworthy. Depending on copyright status (the patents for anything up to the Pentium 3 should be expired now), it might be possible to build new clone x86 systems up to Super Socket 7, Slot 1/2, and Socket 370, including the original AGTL bus (I believe it was GTL, then AGTL, with AGTL+ being the P4 variant). This might offer an option for people who need x86 for backwards compatibility as well as a trustworthy and audited implementation. But as Intel and AMD themselves have shown, errata are very common with x86 implementations, especially the out-of-order variants, and while you might produce chips that can interpret the same code, there is no guarantee they will be close enough to the originals for whatever legacy code you actually want to run.

      4. Wait and see. LLVM has two different 'good things' going for it. One is that it is clean and relatively easy to extend. GCC was actually like this, but Stallman and the dumbasses steering the project decided to start obfuscating code and breaking interfaces to keep other companies from doing like NeXT and outright stealing the code (go read up on how long it took to get them to start releasing the ObjC frontend changes), or from writing plugins for GCC and keeping them proprietary (which was considered a grey area of linking and might have allowed them to keep proprietary plugins that came to be considered 'critical' to developers using GCC, effectively making the open source compiler proprietary). The other good thing about LLVM is the license, which ties back to the reason GCC went down the path it did: the license makes it easier for proprietary companies to extend, enhance, and monetize the platform. The downside to this, as mentioned previously, is that they can find something critical to the future expansion of the compiler, patent/copyright/whatever it, and become tollkeepers to the use of the compiler suite for possibly an entire range of critical uses. There is the possibility that the open source community or another corporation's workers will find a workaround and be able to circumvent whatever patent/code/copyright scheme is used to do this, but as is sometimes the case with patents, there isn't ALWAYS a way to work around something and still provide the output needed to support the intended use case, whether it is matching an ABI, ensuring code will fit into a certain limited amount of memory, or some form of error checking or correction that is considered critical to security today.

      There is a lot of history in the past 30 years that gets glossed over or forgotten, and much like the old saying, we are dooming ourselves to repeat the mistakes of the past, whether regarding proprietary computer systems, weak-copyright open source, or too much corporate involvement. Others will obviously disagree with this assessment, and the pendulum swings both ways (as mentioned, GCC shot itself in the foot trying to protect itself with both code roadblocks and copyright roadblocks, which ended up weakening it so that LLVM could compellingly take over from it). GCC 7.2 apparently has gotten much better about it, but C++ is a horrible language for ABI stability, even above and beyond early C implementations.

      • (Score: 2) by DannyB on Thursday October 05 2017, @02:30PM (4 children)

        by DannyB (5839) Subscriber Badge on Thursday October 05 2017, @02:30PM (#577452) Journal

        The bigger beefier machines now have either Trustzone or Intel ME with signed firmware

        As soon as I read that much, I was immediately going OMG . . . . yikes! I see your point there immediately.

        Even without the caffeine from drinking a Lake made of Coffee1, I can see the possibilities. If a TLA can interfere with the compilation process, as in the famous "Reflections on Trusting Trust" attack, which I read about many aeons ago, they can compromise even trusted systems by compromising the binaries of the OSes, the compilers, etc. Of course such an attack would be immensely sophisticated.

        patent/copyright/whatever it and become tollkeepers

        You misspelled trollkeepers.

        I remember the daze of NeVR NeXT. In hindsight, it doesn't surprise me that a huckster would try to steal an open source compiler, or at least the massive work that went into it, without contributing back important changes. (eg, "think different") Now that I consider what you pointed out, it just reinforces my deep dislike of Steve Jobs and Apple. To think in my younger more naive days I was an Apple fanboy back in the 80's and early 90's. Wow.

        but C++ is a horrible language for ABI stability, even above and beyond early C implementations.

        I like to work at a much higher level than that. But I remember my C++ days in the 90's. I realized the ABI incompatibilities back then. Not to mention that every vendor compiler was a (different) subset of the overall C++ language. I fled to Java and have remained there, including higher level languages on top of the JVM. So that also makes me very interested in ways that JIT improvements can benefit me. (or my employer)

        1lets call it a Coffee Lake

        --
        The lower I set my standards the more accomplishments I have.
        • (Score: 0) by Anonymous Coward on Thursday October 05 2017, @03:27PM (3 children)

          by Anonymous Coward on Thursday October 05 2017, @03:27PM (#577468)

          Yeah.

          Mind you you might have a predisposition to root for assholes if you chose to jump from Apple and C++ to first Sun then Oracle and Java :P

          Not really a big fan of Java as a language, although I do appreciate the work it has led to in VMs, especially JamVM. The promised write once, run anywhere never really became compelling over write once, compile anywhere, as both the systems monoculture and the number of platform-specific Java APIs increased.

          • (Score: 2) by DannyB on Thursday October 05 2017, @05:47PM (2 children)

            by DannyB (5839) Subscriber Badge on Thursday October 05 2017, @05:47PM (#577543) Journal

            I don't root for Sun. I especially don't root for Oracle. Back when I did root for Apple, they were leading the industry in every way. But then Apple stumbled. Steve Jobs was brought back. And then everything became about fashion. Now Apple products aren't necessarily even user friendly. Let alone being leading tech. (using the highest end components is not leading tech) I could elaborate about how Apple was way ahead of PCs back in the day, the list of ways is long.

            Java is okay as a language. But if you don't like Java, then you're probably not using the tooling of modern IDEs. Modern IDEs do amazing things with Java source code. The compiler isn't an "executable"; it is a library with an API, and the compiler executable is a small driver around that huge library. The source editor is deeply integrated with the compiler and keeps a database of the AST of all code in your project. Want to rename a method? The IDE precisely renames it everywhere, not by doing a dumb search and replace, but based on understanding of the AST of every other reference to this identifier. You could use the same identifier name in some other context in the same project, say an unrelated method of the same name in another class; it won't be touched, because the editor knows, just as the compiler knows, that it is something different. And renaming is just the tip of the iceberg of source code manipulations that the editor can do for you.

            I'm glad you recognize the VM. It is industrial strength. Choice of GC algorithms. Many tuning parameters. There are vendors such as Azul that have specialized GC's to offer. You can have dozens or hundreds of gigabytes and still have short GC times. I don't think most other language runtimes can boast that yet.

            The Java VM JIT is something. Two compilers C1 and C2. When your code becomes "hot" enough (using enough CPU according to dynamic profiling) it is rapidly compiled to native code using C1, and it is scheduled to be compiled later with C2. When C2 comes along, it will do a high quality compile of that method, spending more time optimizing. This is why Java applications seem to "warm up" and become very fast.

            The C2 compiler aggressively inlines. What if your method calls my method. Your method got compiled with C2 and aggressively inlined code from my method. Now, for whatever reason, a new version of the class containing my method is dynamically reloaded. Oh, no! Your method now has stale code inlined from my method. Not to worry. The JVM immediately invalidates your method's compiled code, and it goes back to interpreted mode again. If your method is hot enough to get recompiled again, it will go through the C1 / C2 process again.

            If you don't like Java, there are other languages that run on the JVM.

            Here is something I've seen before. People want something with gobs of amazing features. Eventually somebody builds it, and it has all your wish list items. But then the complaint is that it is big and complex. I want a simple text editor instead of an IDE; an IDE is too big and complicated. That's like complaining that you'd rather dig a ditch with a shovel instead of dealing with the noise and complexity of operating a backhoe. Or this: I don't like this big complex word processor compared to a simple text editor. Okay, but that word processor has a lot of features not in the text editor.

            Similarly with languages and runtimes. When languages get to be high level and offer tons of abstractions, some people, who like to work closer to the hardware, complain about the cost of those abstract languages. It's the same battle from the 1970's about using Pascal or Fortran vs Assembly language. Guess which languages won? The thing is: human productivity is a cost also. Would you rather have your new whizbang program six months sooner if it only used twice as much memory and 50% more cpu? A lot of people would say yes. But here's a better argument: if I can write in Java and get to market six months (or more) sooner than the C++ guy, my boss won't blink an eye if I want six times as much RAM. Memory is cheap -- in these terms. You can't buy back opportunity.

            --
            The lower I set my standards the more accomplishments I have.
            • (Score: 0) by Anonymous Coward on Thursday October 05 2017, @07:03PM (1 child)

              by Anonymous Coward on Thursday October 05 2017, @07:03PM (#577576)

              ... then you're not working on something worthwhile. It's just consumer crap.

              Anyway, as a language, C++ is superior because one of its guiding design principles is that you shouldn't have to pay for a feature you don't use; this has led to a language that can be used for coding from the lowest levels of abstraction to the highest, and without any overhead being imposed unnecessarily.
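              A small hedged illustration of that zero-overhead point (my example, not the poster's): the higher-level version below goes through std::array and std::accumulate, yet with optimisation enabled mainstream compilers typically generate essentially the same code as the hand-rolled loop, so the abstraction costs nothing you didn't ask for.

              ```cpp
              #include <array>
              #include <numeric>

              // Hand-written, C-style summation.
              int sum_raw(const int* p, int n) {
                  int s = 0;
                  for (int i = 0; i < n; ++i) s += p[i];
                  return s;
              }

              // The same computation through higher-level abstractions; the template and
              // iterator machinery is resolved at compile time, so there is no run-time fee.
              int sum_abstract(const std::array<int, 1000>& a) {
                  return std::accumulate(a.begin(), a.end(), 0);
              }

              int main() {
                  std::array<int, 1000> a{};
                  a.fill(2);
                  return sum_abstract(a) == sum_raw(a.data(), 1000) ? 0 : 1;  // 0 = they agree
              }
              ```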

              That is why people don't like C++; it actually provides nearly everything that everyone wants, and people hate both having to make choices and having to deal with other people's choices.

      • (Score: 1, Touché) by Anonymous Coward on Thursday October 05 2017, @04:08PM

        by Anonymous Coward on Thursday October 05 2017, @04:08PM (#577491)

        GCC was actually like this, but Stallman and the dumbasses steering the project decided to start obfuscating code and breaking interfaces to keep other companies from doing like NeXT and outright stealing the code

        Ironically, by doing so, by his own definition Stallman made GCC nonfree. Because one of the four essential freedoms [gnu.org] is:

        The freedom to study how the program works, and change it so it does your computing as you wish

        And he goes on:

        A program is free software if it gives users adequately all of these freedoms. Otherwise, it is nonfree.

        And a bit further you find:

        Obfuscated “source code” is not real source code and does not count as source code.

  • (Score: 5, Interesting) by TheRaven on Thursday October 05 2017, @01:24PM (11 children)

    by TheRaven (270) on Thursday October 05 2017, @01:24PM (#577429) Journal

    The work that Alex has done is now using some work from Krzysztof Parzyszek at CodeAurora to allow instruction definitions to support different register sizes depending on the current target. This means that most of the RV32 and RV64 code is shared, and it would be fairly easy to add support for RV128.

    That said, the vast majority of existing '64-bit' systems only support a maximum of 48- or 56-bit virtual addresses (often only 36- or 40-bit physical addresses), so it's going to be a long time before RV128 makes sense. Even if your demand for address space doubles every year, it's going to be 16 years before 64 bits starts to feel cramped (going from the 2^48 bytes that 48-bit addressing covers to 2^64 is 16 doublings), and so far address space demand has been growing a lot more slowly than that. Operating system support for RV128 is going to take a lot longer than compiler support.
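    A quick sanity check of that 16-year figure (my arithmetic, not part of the comment): start from the roughly 2^48 bytes that today's 48-bit virtual addressing covers and double yearly until you hit the 2^64-byte ceiling of a flat 64-bit address space.

    ```cpp
    #include <cstdio>

    int main() {
        double demand = 281474976710656.0;             // 2^48 bytes, exactly representable
        const double limit = 18446744073709551616.0;   // 2^64 bytes
        int years = 0;
        while (demand < limit) { demand *= 2.0; ++years; }
        std::printf("%d years of yearly doubling\n", years);  // prints 16
    }
    ```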

    Oh, and in related news, I'm in the process of assembling the working group for the RISC-V J extension, to make RISC-V a better target for JITs. Anyone doing research in this space, watch the RISC-V mailing lists for the official announcement in a week or two.

    --
    sudo mod me up
    • (Score: 2, Interesting) by Anonymous Coward on Thursday October 05 2017, @03:30PM (8 children)

      by Anonymous Coward on Thursday October 05 2017, @03:30PM (#577470)

      Glad to hear we have members of the soylent community so closely involved in projects like this :)

      While most people still scoff at the need for 128-bit addressing/hardware, I don't foresee it taking as long as many others do before we end up needing the added address space, especially given how many of the current 64-bit arches chose to cripple their 64-bit addressing to values much less than 64 bits, in some cases guaranteeing a near-future kludge or a break in backwards compatibility before userspace applications can utilize the full pointer size.

      • (Score: 4, Interesting) by Anonymous Coward on Thursday October 05 2017, @03:40PM (6 children)

        by Anonymous Coward on Thursday October 05 2017, @03:40PM (#577476)

        128 bit addressing would also be useful for IPv6 since it already uses 128 bit address fields and a 128 bit RISC-V chip might make a good alternative for routers and other devices that need to work with such values in as few cycles as possible.
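        To make that concrete, a small C++ sketch (my own, relying on the non-standard but widely available unsigned __int128 GCC/Clang extension): with a native 128-bit integer type, an IPv6 prefix comparison is one mask and one compare, where a 64-bit machine has to juggle two words.

        ```cpp
        #include <cstdint>
        #include <cstdio>

        // Assumes the GCC/Clang unsigned __int128 extension; on 128-bit hardware this
        // could in principle live in a single register rather than a register pair.
        using u128 = unsigned __int128;

        u128 load_addr(const uint8_t bytes[16]) {
            u128 v = 0;
            for (int i = 0; i < 16; ++i) v = (v << 8) | bytes[i];  // big-endian IPv6 order
            return v;
        }

        bool same_64bit_prefix(u128 a, u128 b) {
            const u128 mask = ~u128(0) << 64;      // keep the top 64 bits (the network part)
            return (a & mask) == (b & mask);       // one AND plus one compare per operand
        }

        int main() {
            uint8_t a[16] = {0x20,0x01,0x0d,0xb8,0,0,0,0, 0,0,0,0,0,0,0,1};
            uint8_t b[16] = {0x20,0x01,0x0d,0xb8,0,0,0,0, 0,0,0,0,0,0,0,2};
            std::printf("same /64? %s\n", same_64bit_prefix(load_addr(a), load_addr(b)) ? "yes" : "no");
        }
        ```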

        • (Score: 2) by Azuma Hazuki on Thursday October 05 2017, @07:22PM

          by Azuma Hazuki (5086) on Thursday October 05 2017, @07:22PM (#577585) Journal

          I was just about to ask what the use of a 128-bit ISA would be, and then saw this. Thanks :) That is absolutely brilliant, as it means a single IPv6 address can fit in, if I understand this right, one register of said CPU.

          --
          I am "that girl" your mother warned you about...
        • (Score: 0) by Anonymous Coward on Friday October 06 2017, @07:00AM (3 children)

          by Anonymous Coward on Friday October 06 2017, @07:00AM (#577851)

          Wouldn't it be better to use vector instructions for this?

          • (Score: 2) by maxwell demon on Friday October 06 2017, @08:10AM (2 children)

            by maxwell demon (1608) on Friday October 06 2017, @08:10AM (#577879) Journal

            Is there a fundamental reason why it should not be possible to interpret one and the same register as a single 128 bit value for normal instructions, and as eight 16-bit values for vector instructions? Note that for some instructions (like bitwise operations) the difference is nonexistent, and for others (addition/subtraction) the only difference is that a few carry/borrow lines need to be disabled. And I guess even for multiplication, a lot of the circuitry could be shared between normal and vector instructions.
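            As a rough software analogue of the "disable a few carry lines" observation (a hedged sketch of my own construction): the classic SWAR trick below performs four independent 16-bit lane additions inside one 64-bit word by masking the lane sign bits so carries cannot cross lane boundaries; the same idea scales to eight lanes in a 128-bit word.

            ```cpp
            #include <cstdint>
            #include <cstdio>

            // Add four 16-bit lanes packed in a 64-bit word, with no carry crossing lanes.
            // (The hardware version of this is literally gating the carry chain.)
            uint64_t add_u16_lanes(uint64_t a, uint64_t b) {
                const uint64_t H = 0x8000800080008000ULL;   // top bit of each 16-bit lane
                uint64_t low = (a & ~H) + (b & ~H);         // carries stay inside each lane
                return low ^ ((a ^ b) & H);                 // fold the lane sign bits back in
            }

            int main() {
                uint64_t a = 0x0001'7FFF'8000'FFFFULL;
                uint64_t b = 0x0001'0001'8000'0001ULL;
                std::printf("%016llx\n", (unsigned long long)add_u16_lanes(a, b));
                // lanes: 0001+0001=0002, 7FFF+0001=8000, 8000+8000=0000, FFFF+0001=0000
            }
            ```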

            --
            The Tao of math: The numbers you can count are not the real numbers.
            • (Score: 3, Informative) by TheRaven on Friday October 06 2017, @11:49AM (1 child)

              by TheRaven (270) on Friday October 06 2017, @11:49AM (#577951) Journal

              You might want to look at Sun's MAJC architecture, which had a single register file that could be used for different types (integer, floating point, vector) depending on the instructions. More practically, architectural registers are a fiction on modern CPUs. The hardware has a lot more physical registers than it has architectural registers. These are mapped to physical registers on demand by the register rename unit (which is one of the most complex parts of a modern CPU). Typically, you have different banks of fixed-size registers, because it complicates rename logic to split them and you have redundant data in wires (i.e. heat) if you use larger rename registers than required, but there's nothing stopping you from using the same 128-bit rename registers for vectors and pointers (except that 128 bits is pretty small for a vector register these days).

              To the other part of your post, sharing ALUs between 16-bit vectors and 128-bit integers: there's a huge difference in the circuitry between eight 16-bit adders and one 128-bit adder. If you don't have a carry in, then your entire structure is different. You could create a 128-bit adder and put an extra AND gate on 7 of the carry lines, connected to an is-this-a-128-bit-integer control signal (with it becoming an 8-element 16-bit vector if the bit is not set), but it would be a staggeringly inefficient vector adder. You'd also be optimising for the wrong thing. Since the end of Dennard scaling (about a decade ago), transistors are cheap; transistors that you are actually using are expensive. Having two separate pipelines, one for adding integers, one for adding vectors, and only using one at a time is only marginally more expensive than having just one, and having a combined one that does both and is less efficient than either is more expensive than having both.

              --
              sudo mod me up
              • (Score: 2) by maxwell demon on Friday October 06 2017, @03:35PM

                by maxwell demon (1608) on Friday October 06 2017, @03:35PM (#578067) Journal

                Thanks; learned something new today.

                --
                The Tao of math: The numbers you can count are not the real numbers.
        • (Score: 2) by TheRaven on Friday October 06 2017, @11:08AM

          by TheRaven (270) on Friday October 06 2017, @11:08AM (#577936) Journal
          Several things are wrong with this argument:
          • Unless you want to memory-map the Internet, your address size doesn't really matter for this kind of computation.
          • Routing decisions don't care about the size of the address, they care about the size of the network part of the address, which in IPv6 is usually 48 or 64 bits (the low 64 bits are all a link-local address).
          • Routers that care about performance have large TCAMs in hardware and don't do most of the address mapping in software anyway.
          --
          sudo mod me up
      • (Score: 2) by TheRaven on Friday October 06 2017, @11:10AM

        by TheRaven (270) on Friday October 06 2017, @11:10AM (#577937) Journal
        The only people I've worked with who are even starting to think about 64 bits not being enough, in a practical sense, are the HP guys working on The Machine. They're looking at a single address space OS, with all persistent storage directly addressable. 64 bits gives you 'only' 16EiB. That's far more than most data centres, but it's within the realms of possibility for a single rack-scale machine in a few years, assuming NVRAM densities keep increasing. It's a very long way off for any other use case, and most people aren't building single system image, flat address space machines, even when they are building rack-scale machines.
        --
        sudo mod me up
    • (Score: 2) by jmorris on Thursday October 05 2017, @08:19PM (1 child)

      by jmorris (4844) on Thursday October 05 2017, @08:19PM (#577614)

      Even if your demand for address space doubles every year, it's going to be 16 years before 64 bits starts to feel cramped

      Probably not. You are assuming everything remains constant. What happens if The Machine's notion of -everything- memory mapped proves itself? Goodbye 64 bits is what, because we can already threaten that limit hard enough that nobody would build a flat architecture around it. And since x86 clips to 56 bits of maximum possible virtual address space, it could be exceeded today with a big enough cluster.

      • (Score: 2) by TheRaven on Friday October 06 2017, @11:12AM

        by TheRaven (270) on Friday October 06 2017, @11:12AM (#577939) Journal
        Even with The Machine, a single machine with 16 exabytes of storage is quite a few years off. Maybe not 16, but at least 5-10. The HP folk are the only people I work with who are serious about 56 bits not being enough, most people are fine with 48 (and most JavaScript implementations, for example, assume that only the low 48 bits of an address are significant, so going beyond 48 bits is going to require some fairly significant rewrites of various things).
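        For anyone wondering why a JavaScript engine would care, a small hedged C++ sketch of the usual trick (my illustration, not any particular engine's code): the top 16 bits of a 64-bit pointer are unused on current hardware, so implementations stash type tags there, and a machine that hands out addresses above 2^48 breaks that packing.

        ```cpp
        #include <cassert>
        #include <cstdint>
        #include <cstdio>

        // Pack a 16-bit type tag into the (currently unused) top bits of a 64-bit pointer.
        // This only works while real addresses fit in the low 48 bits.
        constexpr int      kAddrBits = 48;
        constexpr uint64_t kAddrMask = (uint64_t(1) << kAddrBits) - 1;

        uint64_t pack(const void* p, uint16_t tag) {
            uint64_t addr = reinterpret_cast<uint64_t>(p);
            assert((addr & ~kAddrMask) == 0 && "address uses more than 48 bits: packing breaks");
            return (uint64_t(tag) << kAddrBits) | addr;
        }

        const void* unpack_ptr(uint64_t v) { return reinterpret_cast<const void*>(v & kAddrMask); }
        uint16_t    unpack_tag(uint64_t v) { return uint16_t(v >> kAddrBits); }

        int main() {
            int x = 7;
            uint64_t boxed = pack(&x, 0x0003);
            std::printf("tag=%u value=%d\n", unpack_tag(boxed),
                        *static_cast<const int*>(unpack_ptr(boxed)));
        }
        ```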
        --
        sudo mod me up
  • (Score: 2) by turgid on Thursday October 05 2017, @01:55PM (2 children)

    by turgid (4318) Subscriber Badge on Thursday October 05 2017, @01:55PM (#577438) Journal

    64 bits ought to be enough for anyone. You can quote me on that.

    • (Score: 3, Interesting) by DannyB on Thursday October 05 2017, @02:08PM (1 child)

      by DannyB (5839) Subscriber Badge on Thursday October 05 2017, @02:08PM (#577445) Journal

      Instead of focusing on 128 bits. Then 256 bits, etc. I wish they would focus on putting MORE CORES into cpus. How about 16 cores. 32 cores. 64 cores. Etc. IMO, that is where the future lies as it becomes harder and more turgid to squeeze out the last drops of performance from a single core.

      After years of thinking about how to do it, we software dudes1 have accepted that we need to recognize and look for opportunities where things can be turned from loops into "work units" of smaller loops. These work units can then be queued and spread out across cores. Then the results are combined, and the main thread continues about its business. Of course, this works best if you have tens of millions of items you are going to process in a loop, like some big data set. Modern languages have modern libraries that make it easy to use ForkJoin and other approaches to solve this problem. And yes, if you have enough items in a loop to significantly exceed the cost of the multi-threading overhead, then it adds a significant speedup, even in a desktop application. (Trust me, I did this just a few months back.)

      1not intended to exclude females
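      The same "work units spread across cores, then combined" pattern, as a hedged C++ sketch (function names and the chunk count are my own choices), using std::async in place of Java's ForkJoin:

      ```cpp
      #include <algorithm>
      #include <future>
      #include <iostream>
      #include <numeric>
      #include <vector>

      // Split one big loop into work units, run them on separate cores, combine results.
      long long parallel_sum(const std::vector<int>& data, std::size_t units) {
          std::vector<std::future<long long>> parts;
          const std::size_t step = (data.size() + units - 1) / units;
          for (std::size_t begin = 0; begin < data.size(); begin += step) {
              const std::size_t end = std::min(begin + step, data.size());
              parts.push_back(std::async(std::launch::async, [&data, begin, end] {
                  return std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
              }));
          }
          long long total = 0;
          for (auto& part : parts) total += part.get();   // main thread combines the results
          return total;
      }

      int main() {
          std::vector<int> data(10'000'000, 1);       // only worth it when the loop is big enough
          std::cout << parallel_sum(data, 8) << "\n"; // prints 10000000
      }
      ```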

      --
      The lower I set my standards the more accomplishments I have.
      • (Score: 0) by Anonymous Coward on Thursday October 05 2017, @02:17PM

        by Anonymous Coward on Thursday October 05 2017, @02:17PM (#577449)

        Would be a far better solution, although it would cause variance in code size and execution, as well as issues with current languages that make assumptions about pointer and other type sizes.

        A number of processors over the years have used 'chunks' or 'segments' for this exact reason. The problem being that some code/problems just work better with larger address spaces, the chunk swapping costs time and bandwidth, and capacity is always increasing while many programmers either size their code to match or write code sloppy enough to need the added RAM, rather than engineering it to scale efficiently regardless.

  • (Score: 2) by jmorris on Thursday October 05 2017, @08:28PM (1 child)

    by jmorris (4844) on Thursday October 05 2017, @08:28PM (#577619)

    Actually we might not have as much of a problem in the future with motherboards. Everything seems to be converging on a single base standard of differential-pair signaling with 128b/130b encoding. A modern CPU basically has a sea of pins using that base signaling and hooks groups of them up to PCIe, inter-CPU links, and soon memory controllers. USB even uses basically the same low-level signaling. All that is needed now is a standardized CPU socket, some standards mandating that certain pins always do X, etc. Assuming we can stop making ever larger packages with ever more pins, and we don't switch to optical interconnects or something else unplanned for. And that is the bottom line: the industry can't really settle down to any sort of long-term standard as long as everything keeps being upended every couple of years. Every time we think Moore's Law has hit the end of the line, somebody gets clever.

    • (Score: 0) by Anonymous Coward on Thursday October 05 2017, @08:33PM

      by Anonymous Coward on Thursday October 05 2017, @08:33PM (#577620)

      Why has nobody put together a rough draft for a standard, so we don't have to speak in riddles?

  • (Score: 2) by LoRdTAW on Friday October 06 2017, @12:03AM

    by LoRdTAW (3755) on Friday October 06 2017, @12:03AM (#577707) Journal

    If we're talking about busses, have a look at RapidIO. Been around for a long time, supported in Linux, and heavily used in telecom, aerospace, and military systems. It's open and can handle the same tasks as PCIe and USB3 as it is a switched packet based protocol. It allows for chip-chip, board-board, and chassis-chassis connectivity so it covers all use cases. It also follows the Ethernet physical interface standards so existing ethernet hardware like SFP's for fiber or plain old 8p8c (RJ45) copper can be used for chassis-chassis connectivity. It behaves more like a network of devices and is not a master-slave system as in every one of Intel's hamstrung patent encumbered designs we are burdened with (USB, PCIe, Thunderbolt). It can also replace the front-side bus AND it can handle CC-NUMA for global shared memory across multiple CPU's/GPU's/DSP's/FPGA's etc. One bus to rule them all.

    Let's have a look at USB3, PCIe, Thunderbolt, and Displayport. All four are different protocols, each delivering a different type of data, YET they all use the same basic principles: packet-switched gigabit serial connections. All of those can be replaced by RapidIO. A reversible 2-lane RIO miniplug can replace USB3, Thunderbolt and Displayport. Three ports down, one to go. Redesign the current PCIe slots with a new key position and PCIe can be replaced by RIO, end of story. Bonus: we don't need any new standards, eliminating NIH and early-adopter pains. Now you can expand the number of CPU's as easily as we can expand the number of GPU's, just plug em in! Taking that idea even further, a dual-role port could be integrated which can switch from RIO to Ethernet. This way you could use cheap readily available CAT6/7 cables to hook up hardware at 10Gb. Switch from 10GbE to 10Gb RIO and hook up another GPU, or to an RIO switch and multiple GPU's. Or hard drives, or whatever. Maybe that port can multiplex both ethernet and RIO with dual-role switches for data center and supercomputing.

    Dream SoC for desktop and laptop use:
    4 cores, 2+GHz, 64 bit w/128 bit for SIMD, etc.
    IOMMU, virtualization extensions, essentially all the modern desktop/server CPU functionality.
    Some kind of GPU.
    Single low power core for a big-little like setup that can allow for very low power idling.
    Dual channel 64 bit DDR4 memory with ECC support.
    Lots of RIO lanes, say 32.
    PCIe root complex with 16 lanes and four endpoints. Config options for 1x16, 2x8, 1x8 & 3x1, or 4x4 (quad NVME).
    USB 2 controller and possibly a USB 3 controller for "legacy" support.
    single or dual integrated 100/1Gb/10GbE ports with RIO switching. Or just GbE.
    The rest of the peripherals can easily fit on board like SPI/Queued SPI/Quad SPI (for SD cards & eMMC), i2c/i2s/SMbus etc, audio, UART/USART, PIT's, RTC's and other basic hardware.

    I say keep the memory controller on the CPU die; memory bandwidth and latency are critical, and moving it off the chip could increase latencies and reduce bandwidth. Keeping it on die is also very beneficial for multi-socket systems. The 32x RIO lanes would allow you to hook that SoC to another for 8 cores, and/or divide them up for internal/external RIO ports or route them to a switch, etc. Maybe you could build a small cluster with multiple SoC's via RIO switches, or an internal on-die RIO switch that can allow the SoC's to network to each other without external silicon. The PCIe root complex allows you to utilize existing PCIe hardware until/if RIO peripherals such as GPU's, mass storage controllers, networking and other I/O devices become available.

    Oh, I'm not done. see here: https://soylentnews.org/comments.pl?sid=21917 [soylentnews.org]

  • (Score: 2) by RamiK on Sunday October 08 2017, @12:34PM (4 children)

    by RamiK (1813) on Sunday October 08 2017, @12:34PM (#578860)

    The RV128 specs serve to prevent future (like, circa 2030) fragmentation, with a few people pushing for larger-address-space instructions while others try to code crappy hacks around it:

    The primary reason to extend integer register width is to support larger address spaces. Although some applications would benefit from wider integer support, including cryptography, these are best added as packed-SIMD extensions to the f registers to avoid growing the size of address pointers. It is not clear when a flat address space larger than 64 bits will be required. At the time of writing, the fastest supercomputer in the world as measured by the Top500 benchmark had over 1 PB of DRAM, and would require over 50 bits of address space if all the DRAM resided in a single address space. Some warehouse-scale computers already contain even larger quantities of DRAM, and new dense solid-state non-volatile memories and fast interconnect technologies might drive a demand for even larger memory spaces. Exascale systems research is targeting 100 PB memory systems, which occupy 57 bits of address space. At historic rates of growth, it is possible that greater than 64 bits of address space might be required before 2030. History suggests that whenever it becomes clear that more than 64 bits of address space is needed, architects will repeat intensive debates about alternatives to extending the address space, including segmentation, 96-bit address spaces, and software workarounds, until, finally, flat 128-bit address spaces will be adopted as the simplest and best solution.

    http://digitalassets.lib.berkeley.edu/techreports/ucb/text/EECS-2014-54.pdf [berkeley.edu] (p.91)

    If the strategy ends up a success, around the time RV128 sees compiler and verilog work done for it, RV256 will be announced just to further future proof the ISA.

    --
    compiling...
    • (Score: 2) by TheRaven on Monday October 09 2017, @08:57AM (3 children)

      by TheRaven (270) on Monday October 09 2017, @08:57AM (#579199) Journal

      History suggests that whenever it becomes clear that more than 64 bits of address space is needed, architects will repeat intensive debates about alternatives to extending the address space, including segmentation, 96-bit address spaces, and software workarounds, until, finally, flat 128-bit address spaces will be adopted as the simplest and best solution.

      That's by no means certain. Even with current machines with more than a few tens of GBs of RAM, NUMA concerns are increasingly important. RAM density is increasing, but it's hitting the same scaling limitations as other ICs: the power consumption isn't dropping proportionally. This means that it's very likely that, long before you need 65 bits of address space, you'll be talking about lots of different speeds of DRAM. It's not clear that a uniform naming scheme for resources with very different characteristics (e.g. 300 cycle to 30,000 cycle response times) is actually beneficial. It's also not clear that languages with a simple pointers-are-integers abstraction will be dominant by the time that this happens. From the perspective of a high-level language compiler, address vs segment + offset are not that different and the latter can make garbage collection easier (in fact, high-performance JVM implementations effectively emulate segments by splitting the address bits so that the high ones indicate a GC region and the low ones indicate an offset). For languages with immutable types, having a cheap way of telling the difference between local and distant RAM and copying immutable types from distant to local RAM gives a nice speedup.
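      A hedged sketch of the "split the address bits" idea described above (a toy encoding of my own, not any particular JVM's): treat the high bits of a reference as a GC region id and the low bits as an offset within that region, so deciding which region owns an object is a shift rather than a table lookup.

      ```cpp
      #include <cstdint>
      #include <cstdio>

      // Toy reference encoding: high bits name a GC region, low bits are an offset
      // within that region. Region membership checks become simple bit operations.
      constexpr int      kOffsetBits = 40;                        // 1 TiB per region (illustrative)
      constexpr uint64_t kOffsetMask = (uint64_t(1) << kOffsetBits) - 1;

      uint64_t make_ref(uint32_t region, uint64_t offset) {
          return (uint64_t(region) << kOffsetBits) | (offset & kOffsetMask);
      }
      uint32_t region_of(uint64_t ref) { return uint32_t(ref >> kOffsetBits); }
      uint64_t offset_of(uint64_t ref) { return ref & kOffsetMask; }

      int main() {
          uint64_t ref = make_ref(/*region=*/5, /*offset=*/0x1234);
          std::printf("region=%u offset=%llx\n", region_of(ref),
                      (unsigned long long)offset_of(ref));
          // A collector can test "same region?" with one shift-and-compare:
          std::printf("same region as region 5? %s\n", region_of(ref) == 5 ? "yes" : "no");
      }
      ```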

      --
      sudo mod me up
      • (Score: 2) by RamiK on Monday October 09 2017, @10:18AM (2 children)

        by RamiK (1813) on Monday October 09 2017, @10:18AM (#579216)

        you'll be talking about lots of different speeds of DRAM.

        We already have that. It's called cache levels. Peripheral or not, it will be handled by the hardware.

        It's also not clear that languages with a simple pointers-are-integers abstraction will be dominant by the time that this happens.

        RISC-V already does base-and-bound schemes, and the physical memory protection and virtual memory proposals I've seen so far are fairly conventional, so I fail to see the relevance.

        and the latter can make garbage collection easier

        Beyond what was previously said about base-and-bound, hardware-assisted garbage collection goes back to Oberon and was fairly recently discussed here [berkeley.edu]. They're using Berkeley's out-of-order machine [github.io] (RV64G) to benchmark stuff, and I think they actually influenced some of the changes that recently made it into version 2 [berkeley.edu], but I don't see them doing anything regarding the ISA's addressing (though they do mix different SRAM types, which I guess is somewhat relevant to your previous DRAM comment?), so it doesn't appear to be that much of an issue, all things considered.

        Overall, the way I see it, non-integer pointers and garbage collection haven't got much to do with the width of the address space.

        --
        compiling...
        • (Score: 2) by TheRaven on Monday October 09 2017, @11:11AM (1 child)

          by TheRaven (270) on Monday October 09 2017, @11:11AM (#579235) Journal

          We already have that. It's called cache levels. Peripheral or not, it will be handled by the hardware

          Caches are one of the biggest obstacles to writing efficient code today and they don't scale to very large systems. Take a look at the big SGI machines: not exposing varying latency for their flat address space is one of the major sources of inefficiency and the reason why people tend to use MPI interfaces rather than the distributed shared memory schemes when writing high-performance code on them.

          RISC-V already does base and bound schemes and the physical protection memory and virtual memory proposals I've seen so far are fairly conventional so I fail to see the relevance.

          The relevance is that having a linear flat address space is very useful if you care primarily about C code and far less important for other languages. In 10-15 years, when people start really caring about 65-bit address spaces, if they aren't primarily interested in C code then you won't see the same set of core requirements.

          Overall, the way I see it, non-integer pointers and garbage collection hasn't got much to do with the with of the address space width.

          They have nothing to do with address space width, but they have a lot to do with address representation.

          --
          sudo mod me up
          • (Score: 0) by Anonymous Coward on Monday October 09 2017, @06:05PM

            by Anonymous Coward on Monday October 09 2017, @06:05PM (#579343)

            nothing to do with address space width, but they have a lot to do with address representation

            You know, it's customary to start new discussion threads when going off topic like that. As a friendly reminder, OP was questioning why RISC-V's RV128 isn't being implemented to which RamiK responded by quoting the official stance from ISA papers regarding why they bothered to specify it and extended that logic to why the implementer decided not to implement it. If you have your own points regarding the future of memory representation and how it will obsolete RV128 that's all well and good, but it doesn't change what the specs and implementers are saying and doing.

            sudo mod me up

            -2 Off-topic

            +1 Interesting