posted by LaminatorX on Sunday March 16 2014, @03:28AM   Printer-friendly
from the premature-optimization-is-the-root-of-all-evil dept.

Subsentient writes:

"I've been writing C for quite some time, but I never followed good conventions I'm afraid, and I never paid much attention to the optimization tricks of the higher C programmers. Sure, I use const when I can, I use the pointer methods for manual string copying, I even use register for all the good that does with modern compilers. But now I'm trying to write a C-string handling library for personal use, and I need speed, and I really don't want to use inline ASM. So, I am wondering, what would other Soylenters do to write efficient, pure, standards-compliant C?"

 
  • (Score: 4, Interesting) by jackb_guppy on Sunday March 16 2014, @02:22PM

    by jackb_guppy (3560) on Sunday March 16 2014, @02:22PM (#17182)

    Read it! And look to simpler coding

    I did this once, many years ago, to find out why an array process was slow.

        a(i++) = b(j++);
    vs
        a(i) = b(j);
        i++;
        j++;

    The second code was much faster! The first case produced 3 temp variables and multiple lines of ASM. The second produced 5 lines of ASM and no temp variables.

    Yes, compilers are faster and better now, but knowing what it does is important.

    Starting Score:    1  point
    Moderation   +3  
       Interesting=2, Informative=1, Total=3
    Extra 'Interesting' Modifier   0  

    Total Score:   4  
  • (Score: 0) by Anonymous Coward on Sunday March 16 2014, @03:29PM

    by Anonymous Coward on Sunday March 16 2014, @03:29PM (#17190)

    The first thing I noticed is you're calling functions instead of indexing arrays. You know we use square brackets for arrays in C, right? :)

  • (Score: 1) by fnj on Sunday March 16 2014, @06:14PM

    by fnj (1654) on Sunday March 16 2014, @06:14PM (#17227)

    The operative phrase is "many years ago". It is highly unlikely using gcc and targeting current sophisticated CPUs that there is going to be any execution speed difference whatsoever between the two styles.

  • (Score: 4, Informative) by frojack on Sunday March 16 2014, @08:13PM

    by frojack (1554) on Sunday March 16 2014, @08:13PM (#17254) Journal

    This is so true.

    For many years we practiced code optimization with a ruler. Sometimes we needed a yardstick. (Usually for string-handling code, oddly enough.)

    We would literally print out the assembly (an option offered by our compiler, which included the source code in comments), spread it out, and measure how many INCHES of assembly each source code line would generate.

    Almost always, a series of individual hand-coded operations resulted in shorter segments of assembly than the complex single statement would. We would compile it both ways and simply count the total number of assembly statements in each.

    We learned to identify two foot language constructs from two inch constructs.

    After a while we learned to avoid those constructs that would generate a mountain of code, and to write simpler structures to do the same work. We might use a small amount of code in hand-coded loop(s) to process an array, and avoid the complex code generated by the language's array operations.

    We wrote another program to scan the assembly and assign how many clocks each assembly operation took, and sum them up, on the theory that many long sequences of assembly might be faster than smaller sequences performed many times. (We were prepared to sacrifice a little speed for being able to fit the code into memory.)

    But it turns out that we always gained speed, and a lot of it. Simple code seldom generates a complex high-clock instruction sequence.

    Sadly, hardly anybody does this analysis anymore; they just use the shortest high-level code (smugly congratulating themselves on its obtuseness), forcing the compiler to generate whatever mess it might spew, then turn on the compiler's optimization option and assume the result is the best.

    It's frequently not even close to the best.

    --
    No, you are mistaken. I've always had this sig.
    • (Score: 4, Insightful) by maxwell demon on Sunday March 16 2014, @08:45PM

      by maxwell demon (1608) on Sunday March 16 2014, @08:45PM (#17269) Journal

      However, today neither counting instructions nor adding up cycles is going to give a good estimate of your running time (unless it turns out, e.g., that the loop is slightly too large to fit into the instruction cache), because the processors do a lot behind the scenes (register renaming, branch prediction, out-of-order execution, speculative execution, ...). Far more important issues are things like cache locality (this alone can get you quite a bit of speedup, and can be analyzed entirely at the C level). And of course no amount of micro-optimization can save you from a badly chosen algorithm.

      --
      The Tao of math: The numbers you can count are not the real numbers.
    • (Score: 1) by jackb_guppy on Sunday March 16 2014, @09:22PM

      by jackb_guppy (3560) on Sunday March 16 2014, @09:22PM (#17286)

      I so agree. You have to understand what the compiler and underlying hardware will do.

      One day back in the XT & AT days, we had two young programmers trying to write ASM for simple screen processing. They tried to save as many instructions as possible, figuring fewer was faster. One such instruction was a multiply by 80. It looked good, but I showed that a store, 2 shifts, an add, and 4 more shifts was way faster, even though it took 8 instructions instead of 1.

    • (Score: 1) by rev0lt on Sunday March 16 2014, @09:33PM

      by rev0lt (3125) on Sunday March 16 2014, @09:33PM (#17288)

      For many years we practiced code optimization with a ruler

      Assuming x86, after Pentium Pro, you ought to have a big ruler. As an example, hand-optimization for P4 is a *f****** nightmare*.

      After a while we learned to avoid those constructs that that would generate a mountain of code, and write simpler structures to do the same work. We might use a small amount of code in hand coded loop(s) to process an array, and avoid the complex code generated by the language's array operations.

      The problem is, reducing the number of instructions isn't necessarily good. In your example, short loops were discouraged on long-pipeline CPUs such as the Prescott line, but would actually be faster on the Centrino/Pentium M line. Also, you'd gain a huge amount of speed if you respected 32-byte boundaries on cache lines. So having 15 instructions to avoid 2 odd out-of-boundary memory operations would be faster than having a simple, compact loop that causes a cache miss. And don't even get me started on parallelization: reducing the number of instructions doesn't necessarily mean the code will be faster.

      It's frequently not even close to the best.

      It's not the best. But unless you're an asm wizard, the result will be fast enough regardless of the CPU. For most purposes, handwritten assembly or hand-optimizing listings is a waste of time. However, I do agree that knowing what the compiler generates will make you a better programmer and help you avoid some generic pitfalls.

      • (Score: 2) by frojack on Sunday March 16 2014, @10:12PM

        by frojack (1554) on Sunday March 16 2014, @10:12PM (#17302) Journal

        You haven't a clue about how to quote, do you?

        You also assume way more than I've said.

        You also assume that changes in pipelining make all efforts at optimization useless and unnecessary. Nothing could be further from the truth. The techniques one might adopt with knowledge of current processors might be different from what you would use before, but there are many more things you can do in your code today than you could do before.

        The "fast enough" mentality is exactly part of the problem.

        --
        No, you are mistaken. I've always had this sig.
        • (Score: 0) by Anonymous Coward on Monday March 17 2014, @12:10AM

          by Anonymous Coward on Monday March 17 2014, @12:10AM (#17325)
          To be fair to him, it's easy to assume the quoting mechanism is still <quote> instead of <blockquote>.
          • (Score: 1, Troll) by frojack on Monday March 17 2014, @12:18AM

            by frojack (1554) on Monday March 17 2014, @12:18AM (#17326) Journal

            When the screen you post from clearly shows the supported syntax?

            --
            No, you are mistaken. I've always had this sig.
          • (Score: 2) by maxwell demon on Monday March 17 2014, @05:26PM

            by maxwell demon (1608) on Monday March 17 2014, @05:26PM (#17707) Journal

            Actually <blockquote> is the old one. It worked already before Slashdot introduced <quote> with its slightly different spacing behaviour, and it never stopped working.

            --
            The Tao of math: The numbers you can count are not the real numbers.
        • (Score: 1) by rev0lt on Monday March 17 2014, @05:11AM

          by rev0lt (3125) on Monday March 17 2014, @05:11AM (#17408)

          You haven't a clue about how to quote, do you?

          No, not really (Tnx AC). Is that relevant to the topic?

          You also assume that changes in pipelining make all efforts at optimization useless and unnecessary.

          Well, you assume I said that. I didn't. What I said was that producing a blend of optimized code for all common CPUs at a given time is complex, and one of the most obvious examples was when you had both Prescott and Pentium M in the market. Totally different CPUs in terms of optimization.

          Nothing could be further from the truth. The techniques one might adopt with knowledge of current processors might be different than what you would use before

          Well, I've worked extensively with handwritten and hand-optimized assembly for most (all?) Intel x86 CPUs up to the Pentium 4. Just because you optimize it doesn't necessarily mean it's faster (as an old-fart example, think about all those integer-only Bresenham line algorithms vs. having a div per pixel). And even if it is generically faster, it is usually model-specific. And it is very easy to make it run slower (e.g. by direct and indirect stalls, cache misses, branch prediction misses, etc.). The Intel Optimization Manual is more than 600 pages (http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html); if you can generically beat a good compiler, good for you. Or you can stop wasting time and use a profiling tool like http://software.intel.com/en-us/intel-vtune-amplifier-xe [intel.com] to get a concrete idea of what and when to optimize, instead of having to know all the little details yourself.

  • (Score: 0) by Anonymous Coward on Sunday March 16 2014, @08:57PM

    by Anonymous Coward on Sunday March 16 2014, @08:57PM (#17273)

    I did this once, many years ago, to find out why an array process was slow.

            a(i++) = b(j++);


    foobar.c: error: expression is not assignable
    a(i++) = b(j++);
    ~~~~~~ ^

    • (Score: 2) by maxwell demon on Monday March 17 2014, @05:31PM

      by maxwell demon (1608) on Monday March 17 2014, @05:31PM (#17710) Journal

      You forgot to include the header file containing

      #define a(x) (*(p->foo.bar(baz((x))->qux)->abc))

      :-)

      --
      The Tao of math: The numbers you can count are not the real numbers.
      • (Score: 0) by Anonymous Coward on Monday March 17 2014, @06:12PM

        by Anonymous Coward on Monday March 17 2014, @06:12PM (#17735)

        I thought we were talking about _efficient_ C, sorry. :)

  • (Score: 2) by mojo chan on Monday March 17 2014, @04:43PM

    by mojo chan (266) on Monday March 17 2014, @04:43PM (#17687)

    Unfortunately it is hard to diagnose why this was not properly optimized without seeing the rest of the code, but it looks like a compiler bug. GCC will produce well optimized code in both cases, either converting to a memcpy or at least a tight and minimal loop.

    The problem is that once you start trying to second-guess the compiler, you end up with code that is both horrible and probably won't optimize so well in a year or two when the compiler has improved. Fortunately we now have a couple of really good free C compilers, so we can for the most part just write portable code and not worry about it.

    I write firmware for microcontrollers in C so optimization is a big deal for me, but most of the time it is better to let GCC worry about it.

    --
    const int one = 65536; (Silvermoon, Texture.cs)