
posted by Cactus on Saturday March 08 2014, @07:30AM   Printer-friendly
from the don't-tell-me-upgrade-PCs dept.

Subsentient writes:

"I'm a C programmer and Linux enthusiast. For some time, I've had it on my agenda to build the new version of my i586/Pentium 1 compatible distro, since I have a lot of machines that aren't i686 that are still pretty useful.

Let me tell you, since I started working on this, I've been in hell these last few days! The Pentium Pro was the first chip to support CMOV (Conditional move), and although that was many years ago, lots of chips were still manufactured that didn't support this (or had it broken), including many semi-modern VIA chips, and the old AMD K6.

Just about every package that has to deal with multimedia has lots of inline assembler, and most of it contains CMOV. Most packages let you disable it, either with a switch like ./configure --disable-asm or by tricking the build into thinking your chip doesn't support it, but some of them (like MPlayer, libvpx/vp9) do NOT. This means that although my machines are otherwise full-blown, good, honest x86-32 chips, I cannot use that software at all, because it always builds in bad instructions, thanks to these huge amounts of inline assembly!

Of course, then there's the fact that these packages, that could otherwise possibly build and work on all types of chips, are now limited to what's usually the ARM/PPC/x86 triumvirate (sorry, no SPARC Linux!), and the small issue that inline assembly is not actually supported by the C standard.

Is assembly worth it for the handicaps and trouble that it brings? Personally I am a language lawyer/standard Nazi, so inline ASM doesn't sit well with me for additional reasons."

This discussion has been archived. No new comments can be posted.
  • (Score: 5, Insightful) by neagix on Saturday March 08 2014, @08:04AM

    by neagix (25) on Saturday March 08 2014, @08:04AM (#13128)
    Maybe we need the equivalent of Clang for assembly (maybe not).

    Back when I learned from the Ketman assembly language tutorial [oocities.org], I was thrilled by the opportunity of having such an "intimate" connection with the machine. Perhaps assembly needs to be taken more seriously as a learning tool as well, before we can use it outside of expensive IDEs or as C speed-up hacks.
    • (Score: 5, Funny) by jt on Saturday March 08 2014, @09:17AM

      by jt (2890) on Saturday March 08 2014, @09:17AM (#13136)

      C _is_ a nice, portable macro assembly language :)

      • (Score: 3, Insightful) by neagix on Saturday March 08 2014, @09:47AM

        by neagix (25) on Saturday March 08 2014, @09:47AM (#13139)

        Sure, but is it maintainable to extend C with the CPU-specific trove of shiny optimized instructions for very specific tasks?

        But yeah, I get it's a flaming discussion..

        • (Score: 2) by mojo chan on Saturday March 08 2014, @03:07PM

          by mojo chan (266) on Saturday March 08 2014, @03:07PM (#13200)

          It's easy to maintain such a collection: just add them to the compiler's optimizer. Your compiler is open source, right? ;-)

          --
          const int one = 65536; (Silvermoon, Texture.cs)
        • (Score: 1, Funny) by Anonymous Coward on Saturday March 08 2014, @06:30PM

          by Anonymous Coward on Saturday March 08 2014, @06:30PM (#13259)

          > Sure, but is it maintainable to extend C with the CPU-specific trove of shiny optimized instructions for very specific tasks?

          Yes, we call those "libraries." :)

        • (Score: 2) by TheRaven on Sunday March 09 2014, @10:14AM

          by TheRaven (270) on Sunday March 09 2014, @10:14AM (#13519) Journal
          That's what most compilers do. Very often, the SSE or equivalent instructions are exposed as intrinsics. You use them as if they were functions from C, but they will use the compiler's register allocator, will typically be expanded to a single instruction, can be mixed with other constructs, and have known semantics and so can be reordered based on the compiler's knowledge of the target pipeline (or to reduce register pressure elsewhere). When clang didn't support some inline assembly from a package, we tried replacing it with some C using compiler intrinsics and found that the result ran faster, with both gcc and clang, than the original.
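          For instance, a minimal sketch of what such intrinsics look like in C (my own illustration, not code from that package; add4 is a made-up name, the _mm_* calls are the standard SSE intrinsics from xmmintrin.h):

          #include <xmmintrin.h>  /* SSE intrinsics */

          /* Add four packed floats; each intrinsic maps to one instruction,
             but the compiler still does register allocation and scheduling. */
          void add4(float *dst, const float *a, const float *b)
          {
              __m128 va = _mm_loadu_ps(a);             /* movups */
              __m128 vb = _mm_loadu_ps(b);             /* movups */
              _mm_storeu_ps(dst, _mm_add_ps(va, vb));  /* addps + movups */
          }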
          --
          sudo mod me up
          • (Score: 2) by neagix on Sunday March 09 2014, @12:27PM

            by neagix (25) on Sunday March 09 2014, @12:27PM (#13547)

            Although it's a nice anecdotal example, since my OP I have been referring to all the cases where you can't ignore assembly for extra juice or features, but would still prefer a graceful fallback to the #ifdef jungle.

            Please don't make me open the "bestiary" - we would fight over each case with anecdotes - but I am talking about emulation (also dynamic recompilation), GPU/CPU integration quirks, codecs, and architecture-specific gotchas for embedded devices; more generally, cases where, as I said, you want the best available OR a graceful fallback, without having to lay down boilerplate.

            I know libraries are the most obvious answer, but they are not a perfect solution (higher cost, less elegant because of the very tiny functional payload). Another way could be CMake sorcery, but CMake sits too far up the food chain not to strip away the power of being in contact with the machine. In a perfectly standardized world [xkcd.com] we wouldn't have these problems in the first place (discussing this is OT).
            So I was wondering if there could be a "framework-like" approach/solution to the problem (maybe not). The premise to discuss is whether there is enough redundancy to find common patterns/development shortcuts.

            I know the riddle, "let's make a factory to make factories of hammers" and so on, but I would consider such an idea IIF (if and only if) it doesn't clash with C. It's quite a theoretical discussion, but there is basically nothing new in thinking about putting a portability patch on the mess vendors make of the hardware we buy (just look at the hard work in the Linux kernel over the years).

      • (Score: 1, Informative) by Anonymous Coward on Saturday March 08 2014, @07:00PM

        by Anonymous Coward on Saturday March 08 2014, @07:00PM (#13267)

        > C _is_ a nice, portable macro assembly language :)

        I know this is an old joke, but it still bugs me when I hear it. Coding in C is nothing like coding in assembly. Assembly coding is about more than just fast, compact code.

        • (Score: 1) by HiThere on Saturday March 08 2014, @07:59PM

          by HiThere (866) Subscriber Badge on Saturday March 08 2014, @07:59PM (#13290) Journal

          FWIW, it's not entirely a joke. I've seen an implementation of C that was nearly complete, done in M6800 (or possibly M68000) assembly language. It was in an old issue of Byte. And it was nearly as complete, though possibly not as fast, as Lifeboat C, which was then a commercial C (subset) compiler for the i8080 or Z80.

          That said, I agree that the thought processes used in assembler coding are different from those used in C coding...but they CAN be made to be the same.

          --
          Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
        • (Score: 2) by jt on Saturday March 08 2014, @09:29PM

          by jt (2890) on Saturday March 08 2014, @09:29PM (#13310)

          Having spent years writing C and various assembly languages, as well as higher-level stuff more recently, I'd say it is only partially a joke. Much of the C language design makes sense when you consider the transformation of language features into machine code structures and typical general purpose CPU instruction sets.

    • (Score: 0) by Anonymous Coward on Sunday March 09 2014, @09:09PM

      by Anonymous Coward on Sunday March 09 2014, @09:09PM (#13637)

      > Back then when I learnt with Ketman Assembly language tutorial

      Whatever happened to Ketman? Its page is long dead. Is there a similar, more up-to-date tutorial available?

  • (Score: 5, Informative) by SGT CAPSLOCK on Saturday March 08 2014, @08:24AM

    by SGT CAPSLOCK (118) on Saturday March 08 2014, @08:24AM (#13130) Journal

    I used to love writing assembly code back when it felt meaningful, but it feels like it's mostly just a bother now. Modern C compilers generate and optimize much better code than I could ever write by hand, but maybe there are still some tricks that the hardcore asm guys know about...

    The last time I wrote any assembly code was to make use of some special instructions on a chip in the PlayStation Portable for vector floating-point arithmetic, for some homebrew stuff I was doing. I haven't touched a line of asm since, and it's been years... and years... and years...

    • (Score: 2, Interesting) by neagix on Saturday March 08 2014, @09:51AM

      by neagix (25) on Saturday March 08 2014, @09:51AM (#13141)

      I think here we are talking about exactly the "specialty of the day" of various houses, across time.

      It is also a game of competitiveness vs. portability, where users are pawns.

    • (Score: 5, Insightful) by mojo chan on Saturday March 08 2014, @10:58AM

      by mojo chan (266) on Saturday March 08 2014, @10:58AM (#13152)

      Modern C compilers generate and optimize much better code than I could ever write by hand

      This. x86 assembler in particular is now so convoluted - and, as the OP points out, varies so much from CPU to CPU - that the only sane thing to do is let the compiler handle it. On other platforms that have a nice stable ISA, or where performance is a real issue (like microcontrollers), assembler makes sense, but not on x86.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      • (Score: 3, Informative) by mashdar on Saturday March 08 2014, @03:18PM

        by mashdar (3505) on Saturday March 08 2014, @03:18PM (#13207)

        Even for embedded systems, if you are not intimately familiar with the architecture, writing assembly instead of C is probably a waste of time.
        Knowing the timings for all of the pipeline stages, branch prediction schemes, out-of-order-execution schemes, etc, is hardly trivial. And then you need to be good at algorithm design to take advantage of that knowledge.
        C compilers are pretty darn good these days, and 99.9% of applications do not require cycle-optimum operation thanks to higher clock speeds and IPC on modern uCs.

        • (Score: 3, Informative) by Anonymous Coward on Saturday March 08 2014, @05:27PM

          by Anonymous Coward on Saturday March 08 2014, @05:27PM (#13239)

          However, video players belong to the few applications which still profit from such optimization.

        • (Score: 3, Interesting) by mojo chan on Saturday March 08 2014, @06:07PM

          by mojo chan (266) on Saturday March 08 2014, @06:07PM (#13251)

          I think you are confusing embedded computer systems based on 32/64 bit architectures with microcontrollers, which are mostly 8 or 16 bit and rarely have features like pipelining, branch prediction or out of order execution. Those things would actually be highly detrimental on micros, because they break deterministic timing and use a lot of power.

          --
          const int one = 65536; (Silvermoon, Texture.cs)
    • (Score: 5, Interesting) by mindriot on Saturday March 08 2014, @01:19PM

      by mindriot (928) on Saturday March 08 2014, @01:19PM (#13174)

      I used to love writing assembly code back when it felt meaningful, but it feels like it's mostly just a bother now. Modern C compilers generate and optimize much better code than I could ever write by hand, but maybe there are still some tricks that the hardcore asm guys know about...

      That depends on what kind of code you're working with. When you need to make the most of SIMD instructions, there's no way around assembly. I use the Eigen matrix library [tuxfamily.org] a lot in my day-to-day work (IMHO the best C++ matrix library out there in terms of usability and performance), and under the hood it makes extensive use of vectorized code (for an example, dig into the Eigen sources [bitbucket.org]). The library user never has to see this or worry about it; but it is the underlying assembly code for vectorization that makes such libraries as fast as they are.

      That said, personally I'm quite happy I haven't had to deal with such code myself for quite some time. Such assembly coding is necessary for these library developers, but I'd rather stick to developing my higher-level code than having to fiddle with such details. YMMV :)

      --
      soylent_uid=$(echo $slash_uid|cut -c1,3,5)
      • (Score: 2, Interesting) by FatPhil on Saturday March 08 2014, @07:40PM

        by FatPhil (863) <{pc-soylent} {at} {asdf.fi}> on Saturday March 08 2014, @07:40PM (#13281) Homepage
        > When you need to make the most of SIMD instructions, there's no way around assembly.

        > I use the Eigen matrix library a lot in my day-to-day work (IMHO the best C++ matrix library out there in terms of usability and performance)

        How the fuck is watching such obvious self-contradiction "interesting"?

        "Use assembly!"

        followed by

        "Use C++!"

        You, and some fuck-nutted moderators, have fundamentally misunderstood the purpose of high level languages such as C++ and of libraries. *Someone else* writes the asm, and packages it up in a library that you can link to from your high level language of choice.
        --
        Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
        • (Score: 3, Informative) by mindriot on Sunday March 09 2014, @03:12AM

          by mindriot (928) on Sunday March 09 2014, @03:12AM (#13421)

          Wow, that's quite some language there. Had a bad day?

          You must have missed the part where Eigen is written in C++ code which uses ... inline assembly statements.

          In other words, while the user of this library may get to avoid inline assembly code, the developer doesn't. So, getting back to the original question, if you'd like me to be more specific: most of the time, you'll be fine writing high-level code and using libraries which may or may not use inline assembly. If you're the one developing said library, or you seriously need to get the most performance out of your code (especially with SIMD instructions), assembly can still be worth it.

          --
          soylent_uid=$(echo $slash_uid|cut -c1,3,5)
  • (Score: 5, Insightful) by Anonymous Coward on Saturday March 08 2014, @08:27AM

    by Anonymous Coward on Saturday March 08 2014, @08:27AM (#13131)

    Let me get this straight: you're asking if the significant performance gains for 99.99% of users are worth the trouble they give you in your niche hobby?

    • (Score: 5, Funny) by M. Baranczak on Saturday March 08 2014, @03:37PM

      by M. Baranczak (1673) on Saturday March 08 2014, @03:37PM (#13212)

      Seriously, how many people feel the burning desire to run multi-media apps under Linux on a Pentium 1? Three, four? Why don't you just call these people on the phone and summarize the plot of the movie for them?

      • (Score: 2) by maxwell demon on Saturday March 08 2014, @06:22PM

        by maxwell demon (1608) on Saturday March 08 2014, @06:22PM (#13255) Journal

        Because the movie industry would sue him for copyright infringement. Duh.

        --
        The Tao of math: The numbers you can count are not the real numbers.
      • (Score: 0) by Anonymous Coward on Sunday March 09 2014, @07:18AM

        by Anonymous Coward on Sunday March 09 2014, @07:18AM (#13478)

        Regarding the Pentium 1 / Pentium Pro dividing line, even if folks dodge the CMOV issue there is another gotcha: distros are starting to require support for PAE [wikipedia.org].

        I can't even count the number of times I've seen folks in forums say "I have a Pentium M" without specifying Banias or Dothan [wikipedia.org]. (Banias does not support PAE.)

        -- gewg_

        • (Score: 0) by Anonymous Coward on Monday March 10 2014, @09:14AM

          by Anonymous Coward on Monday March 10 2014, @09:14AM (#13797)

          Banias or Dothan

          And my axe!

  • (Score: 4, Insightful) by maxwell demon on Saturday March 08 2014, @08:41AM

    by maxwell demon (1608) on Saturday March 08 2014, @08:41AM (#13134) Journal

    A simple solution: Just don't support those multimedia packages in your distro. Or alternatively, get a sufficiently old version which still runs on those old computers (they probably are better optimized for those computers anyway).

    --
    The Tao of math: The numbers you can count are not the real numbers.
  • (Score: 5, Informative) by jt on Saturday March 08 2014, @09:15AM

    by jt (2890) on Saturday March 08 2014, @09:15AM (#13135)

    Quoth Knuth: 'Premature optimization is the root of all evil.' But sometimes you really need it: when your compiler doesn't support the machine instructions you need, or the optimizer doesn't generate optimal code and you can do better.

    That said, most hand-crafted assembly is trash; both inefficient and buggy. It's really hard to employ all the clever tricks available to a modern compiler and optimizer. Even harder if you want your code to work right for all the awkward edge cases. Seriously, compile some simple algorithms and then try to work back from the object code to the algorithm. Good luck with that.

    So it really depends on your goals. Squeezing more performance where you know more than the compiler? Yes. Portability and readability? No.

    • (Score: 3, Insightful) by HonestFlames on Saturday March 08 2014, @10:13AM

      by HonestFlames (3704) on Saturday March 08 2014, @10:13AM (#13148)

      Knowing more than the compiler is *very* difficult. Back when a 50MHz 68030 was considered relatively quick, it was possible to really understand every intricate detail of the CPU, RAM and DMA timing interactions. Amazingly optimised code was produced in the 90's and 00's by people who intuitively understood how things were moving through the machine.

      I got to the point where I understood it if I kept the Motorola CPU reference books on hand.

      You have either got to be arrogant, a genius, or an arrogant genius to think you can outsmart a compiler in more than a handful of situations.

      Are you *sure* your routine is quicker if the CPU has just failed a branch prediction, if the other 2 cores are munching through a video encode, or if you're running on a second-generation AMD FX chip instead of a first?

      There are just too many things going on and too many variables in the pot to be able to accurately second-guess a modern compiler, especially if you add to the mix all the tools available outside of the main compiler... static code analysis, profilers that actually run through the compiled software to identify common code paths... probably other things that I don't even know exist.

      • (Score: 5, Informative) by hankwang on Saturday March 08 2014, @11:53AM

        by hankwang (100) on Saturday March 08 2014, @11:53AM (#13161) Homepage

        Knowing more than the compiler is *very* difficult.

        Depends. Around 2001 I tried to implement a calculation algorithm in C/C++ that would use table lookups with interpolation in order to avoid spending too much time calculating some expensive function. Compiled with gcc -O2, I think running on an Athlon. It was something like this:

        int i = (int) floor((x - x0)*(1.0/step_size));
        double y = table[i];
        // do something with y

        It was slow as a dog. It turned out that the combination of gcc and glibc turned this into something like

        save FPU status
        set FPU status to "floor" rounding
        round
        restore FPU status
        save FPU status
        set FPU status to "truncate" rounding
        assign to int
        restore FPU status

        Each of these four FPU status changes would flush the entire CPU/FPU pipeline, and this happened in some inner loop that was called a hundred million times. Replacing this with a few lines of assembler sped up the program by a factor of 10 or so.
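        (For comparison, a sketch of the kind of fix - not the original assembler; function and parameter names are made up, and it assumes SSE2 and x >= x0, so that truncation equals floor:)

        #include <emmintrin.h>  /* SSE2 intrinsics */

        /* cvttsd2si truncates in a single instruction and never touches
           the x87 control word, so there is no pipeline flush. */
        static inline int table_index(double x, double x0, double inv_step)
        {
            return _mm_cvttsd_si32(_mm_set_sd((x - x0) * inv_step));
        }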

        I'm also not so convinced that gcc and g++ do a very good job at emitting vector instructions (e.g. SSE) all by themselves. If I write a loop such as

        float x[8], y[8];
        // ...
        for (i = 0; i < 8; ++i)
            x[i] = x[i]*0.5 + y[i];

        Just tried on gcc 4.5 (-O2 -S -march=core2 -mfpmath=sse). It will happily use the SSE registers but not actually use vector instructions. With -mavx I get vector instructions, but only if the compiler knows the size of the array at compile-time. If I do something like this with a variable array size n, and decide that n=10000 during program execution, it will not vectorize at all even with -mavx, and that is even if I ensure that the compiler can assume non-aliased data.

        Now gcc of course has a zillion options to tweak the code generation, but I can imagine that at some point someone prefers to simply write assembler code in order to make sure that vectorization is used in places where it makes sense.

        • (Score: 2, Interesting) by cubancigar11 on Saturday March 08 2014, @04:50PM

          by cubancigar11 (330) on Saturday March 08 2014, @04:50PM (#13222) Homepage Journal

          That looks like quite a common form of code. Have you tried contacting the gcc guys? They would love this kind of info, and maybe we will learn about a way to generate optimized code.

        • (Score: 3, Informative) by mojo chan on Saturday March 08 2014, @05:58PM

          by mojo chan (266) on Saturday March 08 2014, @05:58PM (#13248)

          The problem with GCC not vectorizing the code is that you didn't tell it all the assumptions you made when you expected it to. GCC will only vectorize when it knows it is absolutely safe to do so, and you need to communicate that. When you wrote your own assembler version, you did so based on those same assumptions.

          In the specific example you cite, have a look at the FFT code in ffdshow, specifically the ARM assembler stuff that uses NEON. To get good performance there is a hell of a lot of duplicated code, since it processes stuff in power-of-2 block sizes. If you had specified -O3, that's the signal to the compiler to go nuts and generate massive amounts of unrolled code like that. Even then it might not be worth it in all cases, because if the array were only say 5 elements long you might spend more time setting up the vector stuff than it would save. So what you need to do is create your own functions to break the array down into fixed-size units that can be heavily optimized, just like they did in the ffdshow assembler code. The compiler isn't psychic; unless you tell it this stuff it can't know what kind of data your code will be processing or how big variable-length arrays are likely to be at run time.
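          A sketch of that blocking idea (function names are made up; restrict carries the no-aliasing promise):

          /* A fixed-size block the compiler can fully unroll and vectorize. */
          static void block8(float *restrict x, const float *restrict y, float c)
          {
              for (int i = 0; i < 8; ++i)
                  x[i] = c*x[i] + y[i];
          }

          void scale_add(float *restrict x, const float *restrict y, float c, int n)
          {
              int i = 0;
              for (; i + 8 <= n; i += 8)   /* vector-friendly fixed-size blocks */
                  block8(x + i, y + i, c);
              for (; i < n; ++i)           /* scalar tail */
                  x[i] = c*x[i] + y[i];
          }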

          --
          const int one = 65536; (Silvermoon, Texture.cs)
          • (Score: 3, Interesting) by hankwang on Saturday March 08 2014, @06:38PM

            by hankwang (100) on Saturday March 08 2014, @06:38PM (#13262) Homepage

            The problem with GCC not vectorizing code is due to you not telling it all the assumptions you made when you expected it to. GCC will only vectorize when it knows it is absolutely safe to do so

            For the record, this is the full test code:

            void calc(float x[], float c, int veclen)
            {
              int i, k;
              for (i = 0; i < 10000; ++i) {
                for (k = 0; k < veclen*4; ++k)
                  x[k] = c*x[k] + x[k+veclen*4];  /* x must hold at least veclen*8 floats */
              }
            }

            The compiler should know that there cannot be any aliasing issues within the array 'x', so it *is* safe. But I wasn't aware that -O2 and -O3 make such a big difference; with -O3 I do indeed get vector instructions. From now on, my number-crunching code will be built with -O3...

            • (Score: 3, Informative) by mojo chan on Sunday March 09 2014, @12:14AM

              by mojo chan (266) on Sunday March 09 2014, @12:14AM (#13370)

              O2 doesn't make the compiler check if x is safe from aliasing and so forth because it is an expensive operation, and the resulting code can be problematic to debug on some architectures. Moving to O3 does check, so the compiler uses vector instructions. C can be somewhat expensive to optimize because there is a lot of stuff you can do legally that has to be checked for, and often that involves checking entire modules.

              --
              const int one = 65536; (Silvermoon, Texture.cs)
  • (Score: 4, Insightful) by tftp on Saturday March 08 2014, @09:24AM

    by tftp (806) on Saturday March 08 2014, @09:24AM (#13137) Homepage

    Is assembly worth it for the handicaps and trouble that it brings?

    Of course, it depends on your values. If 5 lines of assembly code can save the Earth from destruction... OTOH, bad example :-)

    In most cases, in most commercial software, assembly code is neither needed nor wanted. Still, it is in use in a few places... but look at Linux: where is the assembly code there? Primarily in the code that boots up the CPU and prepares the C runtime. Once that is done, C is your assembly code. Compilers indeed do a better job today. A compiler is written once, and then it optimizes millions of LOC. If you do it by hand, you do it every time, for each line - and after every code change.

    Assembly instructions are not only machine-dependent; they are also not intuitive; they do what they do because it makes sense to do it that way in hardware. Instruction sets are made for compilers, not for humans. It was somewhat reasonable to use assembly code on an Intel 8080 that ran at a few MHz and had just a kilobyte of ROM. You simply had no options at that time. Today C is available and preferred even for similar AVR/PIC systems.

    I cannot say whether MPlayer or VLC or any other piece of code really *needs* assembly patches to do what it needs to do. Chances are, this is just old code, designed for older and slower computers and for older and slower C compilers. It's always possible, of course, to write code that can only deliver results in your lifetime if it is written in assembly. But the penalties for writing such code (unmaintainable, undebuggable, undocumentable, etc.) are so high that it is often easier to use a higher-level language even when performance *is* a concern (such as in every embedded system). The autopilot on your aircraft may need an additional 5 W of power, and it may cost $10 more, but it will not be as likely to lock up, or to feed perfectly wrong data to the control surfaces when you least expect it.

    • (Score: 5, Informative) by maxwell demon on Saturday March 08 2014, @09:52AM

      by maxwell demon (1608) on Saturday March 08 2014, @09:52AM (#13142) Journal

      undocumentable

      Assembly language is not undocumentable. For an example of documented assembly code, see here. [literateprograms.org]

      Now of course the same in C code needs less documentation (and probably would have been fairly understandable even with no documentation at all), but needing more effort is entirely different from being impossible.

      Note that I'm not advocating using more assembly language (it is still unportable, prone to errors, hard to maintain, and in most cases just a waste of effort); I'm only objecting to your claim that it is undocumentable. If you don't manage to document your assembly code, it's your failure, not the failure of assembly language.
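      A trivial sketch of what documented inline assembly can look like (GCC syntax, x86 only; my own example, not an excerpt from that page):

      #include <stdint.h>

      /* Reverse the byte order of a 32-bit value.
         The "+r" constraint says v is kept in a register and is both
         read and written by the instruction. */
      static inline uint32_t swap32(uint32_t v)
      {
          __asm__("bswap %0"   /* single-instruction byte swap */
                  : "+r"(v));
          return v;
      }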

      --
      The Tao of math: The numbers you can count are not the real numbers.
      • (Score: 4, Insightful) by anubi on Saturday March 08 2014, @12:22PM

        by anubi (2828) on Saturday March 08 2014, @12:22PM (#13164) Journal

        Personally, I only like to use assembler when bit-banging a *specific* processor in a *specific* application.

        Example: I wanted an I2C interface which needs only to interface to 8574 and ADS1110 chips. I bit banged one in assembler. For a specific microprocessor - for a specific port. That's all my code will do. Extremely limited, but what it does do, it does very fast.

        The code is extremely fast, but it is not portable... it only runs on that specific machine with that specific architecture.

        If you elect to use assembler, please be generous with the comments. I find the only way I have ever been able to document my code is to explain what every line does in a comment.

        Maxwell dug up an excellent link for an example. Maxwell is so right in his urging to document what you did, as assembly is not well known and will confuse the hell out of most people. I consider myself only halfway formidable with assembly: I can read and write MY code, but I can have a helluva time trying to understand what someone else is doing unless they do a good documentation job.

        An assembler is a nice tool to know, but it's kinda like superglue. If they try to port this thing to another processor, they are apt to have to rewrite everything I did. My algorithm may be re-used, but there is a snowball's chance in hell the code will neatly drop in.

        --
        "Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]
      • (Score: 1) by tftp on Saturday March 08 2014, @09:21PM

        by tftp (806) on Saturday March 08 2014, @09:21PM (#13308) Homepage

        Perhaps I was a bit harsh in claiming that assembly code is undocumentable. However, the problem is that your execution environment changes as you move from one area of the code to another. You use registers, or parts of registers, for one function here, and then for another function there. There is no consistency; you can't tell a coherent story. You can have a single-bit flag in R3 bit 0 in one place; but then you optimize the code and notice that 100 lines down R3 comes loaded with something that you need, so in that place you have a similar flag in R2 bit 3. The compiler couldn't care less about such things; a bit is a bit. A human, however, is very likely to lose track of things. Coupled with the lack of verification of the code, aside from mere syntax checks, a program in assembly is not an easy thing to work on. Small, tight loops are often OK, as they can be inserted into otherwise C code (for GCC.) But a whole 'wc' in assembly is pure masochism. One could do it for a very clear, specific and justified purpose, like when coding something for a 4-bit MCU that runs a doll and costs 7.02 cents.

  • (Score: 4, Insightful) by Anonymous Coward on Saturday March 08 2014, @09:47AM

    by Anonymous Coward on Saturday March 08 2014, @09:47AM (#13140)

    Seems worth it to me, if it only causes problems to the market segment that is:
    a) still using old pentiums
    AND
    b) trying to run multimedia stuff.
    AND
    c) trying to use newer software to do it.

    I've used 386s, 486s, Pentiums and the other CPUs around 15-20 years ago, and they weren't that great at doing multimedia stuff. I can't remember what percentage of CPU was required to do full-screen video, but it's likely to have been double digits for the weaker CPUs, and back then full-screen video was typically at best 640x480 (24-30 fps was an achievement on slower Pentiums), and more likely NTSC/PAL (352x240) or 320x240.

    Which was probably why some of the new instructions you grumble about were added - so that slightly more powerful CPUs like the Pentium Pro could do things more efficiently. IIRC back then they even had stuff like hardware assistance for MPEG playback. It was even common for users back then to complain that programs like Winamp were using too much CPU (20+% on P133) and that's merely for a fancy MP3 player.

    So even if you replaced all those pesky instructions with working code, you might only be able to do 352x240 video at 24 fps on old Pentiums, and only if it was encoded with a not-so-CPU-intensive encoding. Here's a random USENET thread from the past: https://groups.google.com/forum/#!search/pentium$20133$20max$20fps$20movie/alt.comp.periphs.mainboard.tyan/NUxgNVPI_EQ/F1BoFGtgmc0J [google.com]
    Search for more relevant ones if you wish.

    It'll definitely be a noteworthy achievement if you can play back video encoded with modern encodings fast enough on a Pentium - please submit that story if you manage it!

    You seem a bit like someone with a collection of ancient cars finding that the newer car audio stuff doesn't fit in them without a lot of work. You should be enjoying the challenge actually. If you don't enjoy it then maybe you need to adjust the focus/direction of your hobby a bit.

    • (Score: 2) by hankwang on Saturday March 08 2014, @03:44PM

      by hankwang (100) on Saturday March 08 2014, @03:44PM (#13213) Homepage

      Seems worth it to me, if it only causes problems to the market segment that is:
      a) still using old pentiums

      I wonder what one could possibly want to use a Pentium for these days, other than running some kind of computer museum or some ancient industrial ISA hardware that is difficult to replace. (Even then, one can buy industrial computers that combine modern CPUs with ISA buses.) A Raspberry Pi has much more computing power than that old Pentium, at 5-10% of the power consumption. (At Dutch electricity prices, the break-even is at about 3 months of continuous use - EUR 0.22/kWh, 75 versus 5 W.)
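      (Checking that arithmetic: the 70 W difference over three months is roughly 70 W x 2200 h ≈ 150 kWh, or about EUR 34 at EUR 0.22/kWh - which is indeed on the order of a Pi's purchase price, assuming around EUR 35 for the board.)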

      • (Score: 0) by Anonymous Coward on Saturday March 08 2014, @06:52PM

        by Anonymous Coward on Saturday March 08 2014, @06:52PM (#13264)

        I'm wondering if an x86 emulator on a modern smartphone could run Linux, mplayer and be faster than one of his pentium machines... ;)

        For some perspective: Intel Core 2 came out in 2006, and the first Opterons and Athlon 64s in 2003.
        Compare an Athlon 64 vs a Pentium 100 MHz: http://www.cpu-world.com/Compare/255/AMD_Athlon_64_3800+_(45W)_vs_Intel_Pentium_100_MHz_(A80502-100).html [cpu-world.com]
        So even 10-year-old machines are about 50-200x faster than a Pentium 100 MHz (about 20 years old).

        If you replace the offending ASM instructions, how much of the current multimedia stuff out there will work in practice if the CPU is 100x slower? Good luck playing Full HD videos ;).

        So yeah, use pentium systems as a hobby if you wish, but complaining that mplayer doesn't work? What next - complaining that mplayer can't play 1080p on your pentium?

    • (Score: 1) by epitaxial on Saturday March 08 2014, @04:24PM

      by epitaxial (3165) on Saturday March 08 2014, @04:24PM (#13218)

      I see more powerful computers sitting out for the trash and for sale in thrift stores. Throw away those Pentium 1 based boxes already. Hell you could replace them with $45 BeagleBone Black boards and recoup your electricity costs in the first year alone.

  • (Score: 5, Informative) by marcello_dl on Saturday March 08 2014, @10:01AM

    by marcello_dl (2685) on Saturday March 08 2014, @10:01AM (#13145)

    The Picolisp creator chose assembler across different architectures, and not for performance reasons.
    See his opinions for his use case [software-lab.de].

  • (Score: 1) by quitte on Saturday March 08 2014, @11:11AM

    by quitte (306) on Saturday March 08 2014, @11:11AM (#13157) Journal

    I remember them making the switch from 386 to 486. I don't think they switched again.

    • (Score: 1) by suxen on Monday March 10 2014, @10:13AM

      by suxen (3225) on Monday March 10 2014, @10:13AM (#13806)

      Which means applications written in C and C++ are compiled with that CPU arch as their target; it doesn't do anything for assembler code targeting other architectures.

  • (Score: 4, Interesting) by d on Saturday March 08 2014, @12:32PM

    by d (523) on Saturday March 08 2014, @12:32PM (#13166)

    I'm actually surprised that no one has mentioned this yet. When you're exploiting a buffer overflow and have very little storage to take advantage of, it might be useful to be an assembly wizard. Also, I guess that you need some assembly skills to implement a JIT compiler or the juicy details of a kernel.

    • (Score: 2, Interesting) by AnythingGoes on Saturday March 08 2014, @02:24PM

      by AnythingGoes (3345) on Saturday March 08 2014, @02:24PM (#13182)

      I think you will find that most OSes nowadays have only a very small assembly language layer that is used for the initial setup. Everything else is written in C or higher-level languages. That also helps in making the OS multi-platform (Linux, the BSDs, OS X, Windows, etc.).

      For JIT compilers, and compilers in general, yes, there is a need to understand assembly language, but the failure of the Itanium should be a strong indication that relying on the compiler to perform "magic" is not feasible. Even Intel's best programmers and hardware designers, with access to the underlying hardware characteristics, could not create an Itanium compiler that made the assembly perform much faster than most x86 systems. IOW, a very good understanding of the underlying assembly language does not really help in squeezing out the last iota of speed (due to the vagaries of out-of-order execution, pipeline stalls and cache hits/flushes).

      Regarding hacking: yes, it is still an important part of the hacker's arsenal, but really, how many people need to do that? When was the last time most non-kernel programmers had to debug into the kernel?

      The only other case where assembly is really useful is for the programmer to understand what is happening under the hood of what they write. Getting programmers to think about the consequences of what they write is useful, because it does help in debugging and performance analysis. As almost every language compiler is way better at performance optimization than humans, the gains from actually replacing code segments with assembly tend to be great only in corner cases - and sometimes only for one specific version of a CPU, where the replacement is no longer needed for the next version because there is a better instruction to use (x86 assembly usually). Most other platforms do not have this problem, as they have a very reduced instruction set compared to Intel...

      Practically, assembly language should be taught and every programmer should understand it, but when it comes to hand-coding assembly language modules, anything more than a few lines usually ends up being not much faster than machine-generated code, at the risk of much nastier bugs.

      For the original poster: while CMOV does in theory perform faster, there are issues with pipeline stalls and cache flushes which can cause performance slowdowns. Just because it performs faster for one set of tests does not mean that every other processor will perform equally fast, or get the same speedup, or that every new processor from now on will perform faster. Hence, it is probably easier to just leave it to the compiler.
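      For reference, CMOV is what compilers typically emit for a branchless select like this (an illustrative sketch; whether it actually beats a branch depends on the CPU and how predictable the data is):

      /* With gcc -O2 on x86 this usually compiles to cmp + cmovge
         rather than a conditional jump. */
      int imax(int a, int b)
      {
          return (a >= b) ? a : b;
      }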

    • (Score: 3, Insightful) by jt on Saturday March 08 2014, @02:54PM

      by jt (2890) on Saturday March 08 2014, @02:54PM (#13196)

      Good points. Writing exploits, shellcode, and that sort of thing will benefit from some hardcore knowledge even if you end up calling out to library functions for much of the time. In general purpose coding it is very useful to understand what the cpu will be doing behind the scenes; maybe we will actually write it in C but with an awareness of impact on cache, pipeline, and so on.

  • (Score: 4, Insightful) by robp on Saturday March 08 2014, @01:17PM

    by robp (3485) on Saturday March 08 2014, @01:17PM (#13172)

    I'm not fully sure I understand the question, but here are my (probably uninformed) two cents.

      -Should mplayer, libavcodec, libvpx et al. be using inline assembly?
        I would hope that if all of this inlined code is being used for multimedia acceleration, someone has profiled it and found it to indeed be faster. I am optimistic and assume that this is the case. Also, realistically, many modern codecs are not going to run well on systems as old as the 586 and K5/K6 class processors. You could try replacing the inlined bits with generic C code, and you might find that it runs unbearably slowly.

      -Should anyone use inline assembly in userland programs?
      If you want to use SIMD extensions, there are a lot of vector libraries out there that will let you avoid coding this directly. I believe this was part of how Apple handled switching from AltiVec to SSE when they switched architectures. For other situations, you really need to profile the hell out of the C code and figure out exactly what needs optimization before jumping into assembly by hand.

      -Should people understand assembly?
      Absolutely. Particularly with how gnarly the x86 architecture is, with the strange GDT, IDT, LDT structures, and specific quirks like VM86 mode and such. Understanding assembly is really important for understanding backtraces and getting a better feel for what's going on under the hood.

    (Personally, I'm crap at assembly, but I have written a bootloader that gets into 32-bit protected mode, and lets me run flat compiled C code. Doesn't do anything useful, but it was a great learning experience.)

  • (Score: 1) by Aiwendil on Saturday March 08 2014, @02:46PM

    by Aiwendil (531) on Saturday March 08 2014, @02:46PM (#13190) Journal

    It all depends on why the assembler is in use, really. And yes, it is a tradeoff.

    But in essence it all boils down to the very question you ask: "When is it worth it?" If you are working on a machine with only a few hundred bytes of memory, the case for assembler is quite strong; but when you work on machines where you can just throw more hardware at the code, it normally isn't worth it - unless you need to do something the compiler can't do, or the compiler tends to optimize badly (in the case of bad optimization it is usually a good idea to maintain a portable equivalent in C or whatever, for when the compiler catches up - or until someone comes along who knows how to hint to the compiler about what you are trying to do).

    Also, there are times when one wants to violate the assumptions of the language and compiler- in these cases assembler usually is the better option.

  • (Score: 1) by deif on Saturday March 08 2014, @04:29PM

    by deif (92) on Saturday March 08 2014, @04:29PM (#13219)

    I think using assembly nowadays is not worth the trouble, unless you have to (e.g. because there's no compiler yet for a platform, etc).

    BUT, the same applies for using pentiums nowadays. Seriously.
    Go buy a Raspberry Pi or something. You will save electricity and maintenance time, and I'm sure there are other benefits as well.

    --
    ∀(x, y ∈ A ∪ B; x ≠ y) x² - y² ≥ 0
    • (Score: 3, Funny) by Subsentient on Saturday March 08 2014, @05:19PM

      by Subsentient (1111) on Saturday March 08 2014, @05:19PM (#13236) Homepage Journal

      I am infinitely too cheap for that. I have powerful machines, I just like being able to use my old dinosaurs. Some of these machines are so old that when you crack open the case, you hear Jitterbug music, but nonetheless, I do have use for them :^)

      --
      "It is no measure of health to be well adjusted to a profoundly sick society." -Jiddu Krishnamurti
      • (Score: 1) by Fwip on Saturday March 08 2014, @11:30PM

        by Fwip (953) on Saturday March 08 2014, @11:30PM (#13347)

        Penny-wise, pound foolish.
        How long does your dinosaur have to run to consume a Pi's cost in electricity?

  • (Score: 1) by jackb_guppy on Saturday March 08 2014, @04:51PM

    by jackb_guppy (3560) on Saturday March 08 2014, @04:51PM (#13223)

    The idea of higher-level languages is to remove the hardware from the coding question. This allows for portability between different systems and OSes. Now if the language could also be independent of the OS, that would be even better.

    ASM/machine-level coding is there to meet special needs. It is more of a "business" decision and less of a design consideration.

    Years ago I was supporting a hotel system, written in RPG (a language found mainly on IBM midrange machines) with ASM objects that were linked in at compile time to handle date conversions and currency formatting. The business need for functionality and speed was more important than the ease of porting the code to another platform. Later we also wrote ASM routines to handle B-trees and inventory and room searches. These high-level functions became the new bottleneck in the system, preventing better use of the underlying hardware and the price point of equipment.

    Yes, we could have done the job without ASM, but the price point would have been higher, along with the maintenance costs. But we gave up portability.

  • (Score: 2, Informative) by dacut on Sunday March 09 2014, @06:38AM

    by dacut (1766) on Sunday March 09 2014, @06:38AM (#13467) Homepage

    It used to be necessary for high-speed integer code due to pointer aliasing. For an example (a tad contrived*, but it displays the issue with brevity):

    #include <stddef.h>  /* size_t */

    void add_array_to_constant(int *dest, const int *src, size_t len) {
        while (len-- > 0) {
            *dest++ += *src;
        }
    }

    You might have intended for src and dest to be completely different arrays, but the compiler doesn't know that. Instead, it reloads *src on each iteration through the loop, e.g.:

    .L7:
            add     eax, 4
            mov     ecx, DWORD PTR [ebx]  /* Reload *src on each iteration */
            sub     edx, 1
            add     DWORD PTR [eax-4], ecx
            cmp     edx, -1
            jne     .L7

    Back in ye olden days, you would time this, scratch your head, examine the assembly output, curse, and then recode this in assembly to move the reload instruction outside of the loop. (If there was a Fortran guy in the house, he would laugh at you and your pitiful C compiler to rub some salt into the wound.)

    Thankfully, C99 (sadly not C++11, though it is available as a nonstandard extension in almost every C++ compiler) adds the restrict qualifier, which fixes this issue:

    void add_array_to_constant(int *restrict dest, const int *restrict src, size_t len) {
        while (len-- > 0) {
            *dest++ += *src;
        }
    }

    The generated assembly (on gcc 4.8.1, at least) ends up moving the load outside of the loop. Huzzah! No need for assembly here!
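    Roughly what the hoisted version looks like (a hand sketch in the style of the listing above, not verbatim gcc output):

            mov     ecx, DWORD PTR [ebx]  /* *src loaded once, before the loop */
    .L7:
            add     eax, 4
            sub     edx, 1
            add     DWORD PTR [eax-4], ecx
            cmp     edx, -1
            jne     .L7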

    * There are cases where this might not be so contrived; for example, if you have to code to a certain interface so this can be used as a callback function.