
posted by LaminatorX on Sunday March 16 2014, @03:28AM   Printer-friendly
from the premature-optimization-is-the-root-of-all-evil dept.

Subsentient writes:

"I've been writing C for quite some time, but I never followed good conventions I'm afraid, and I never payed much attention to the optimization tricks of the higher C programmers. Sure, I use const when I can, I use the pointer methods for manual string copying, I even use register for all the good that does with modern compilers, but now, I'm trying to write a C-string handling library for personal use, but I need speed, and I really don't want to use inline ASM. So, I am wondering, what would other Soylenters do to write efficient, pure, standards-compliant C?"

 
This discussion has been archived. No new comments can be posted.
  • (Score: 5, Insightful) by Anonymous Coward on Sunday March 16 2014, @03:39AM

    by Anonymous Coward on Sunday March 16 2014, @03:39AM (#17057)

    If you want to really optimize your code, most likely you will need to write it so that it compiles into the best machine code for your platform. And sometimes, things that look 'clever' in C actually produce horrible machine code.

    • (Score: 5, Informative) by rev0lt on Sunday March 16 2014, @03:45AM

      by rev0lt (3125) on Sunday March 16 2014, @03:45AM (#17064)

      +1
      If you want compliance, stick with C (without monkey stuff). If you want speed, go with assembly, but identify your bottlenecks first. You can probably squeeze more performance by choosing the right algorithm for your problem than by optimizing for the language. And 9 times out of 10, the compiler will still beat you at assembly - unless you're working with well-defined architectures.

      • (Score: 4, Interesting) by jackb_guppy on Sunday March 16 2014, @02:22PM

        by jackb_guppy (3560) on Sunday March 16 2014, @02:22PM (#17182)

        Read it! And look to simpler coding

        I did this at one point many years ago to find out why an array process was slow.

            a(i++) = b(j++);
        vs
              a(i) = b(j);
              i++;
              j++;

        The second code was much faster! The first case produced 3 temp variables and multiple lines of ASM. The second produced 5 lines of ASM and no temp variables.

        Yes, compilers are faster and better now, but knowing what they do is important.
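
        As a rough, modern restatement of that comparison (with standard square-bracket indexing; the function names here are invented), both styles can be compiled with something like "cc -S -O2" and the emitted assembly compared directly:

            /* Build with: cc -S -O2 copy.c  -- then inspect copy.s */
            void copy_combined(int *a, const int *b, int n)
            {
                int i = 0, j = 0;
                while (i < n)
                    a[i++] = b[j++];   /* post-increments inside the expression */
            }

            void copy_split(int *a, const int *b, int n)
            {
                int i = 0, j = 0;
                while (i < n) {
                    a[i] = b[j];       /* separate, simpler statements */
                    i++;
                    j++;
                }
            }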

        • (Score: 0) by Anonymous Coward on Sunday March 16 2014, @03:29PM

          by Anonymous Coward on Sunday March 16 2014, @03:29PM (#17190)

          The first thing I noticed is you're calling functions instead of indexing arrays. You know we use square brackets for arrays in C, right? :)

        • (Score: 1) by fnj on Sunday March 16 2014, @06:14PM

          by fnj (1654) on Sunday March 16 2014, @06:14PM (#17227)

          The operative phrase is "many years ago". It is highly unlikely using gcc and targeting current sophisticated CPUs that there is going to be any execution speed difference whatsoever between the two styles.

        • (Score: 4, Informative) by frojack on Sunday March 16 2014, @08:13PM

          by frojack (1554) on Sunday March 16 2014, @08:13PM (#17254) Journal

          This is so true.

          For many years we practiced code optimization with a ruler. Sometimes we needed a yardstick. (Usually for string handling code, oddly enough.)

          We would literally print out the assembly, (an option offered by our compiler, which included the source code in comments) spread it out, and measure how many INCHES of assembly each source code line would generate.

          Almost always, a series of individual hand coded operations resulted in shorter segments of assembly than would the complex single statement. We would compile it both ways, and simply measure the total amount of assembly statements in each.

          We learned to identify two foot language constructs from two inch constructs.

          After a while we learned to avoid those constructs that would generate a mountain of code, and write simpler structures to do the same work. We might use a small amount of code in hand coded loop(s) to process an array, and avoid the complex code generated by the language's array operations.

          We wrote another program to scan the assembly and assign how many clocks each assembly operation took, and sum them up, on the theory that many long sequences of assembly might be faster than smaller sequences performed many times. (We were prepared to sacrifice a little speed for being able to fit the code into memory.)

          But it turns out that we always gained speed, and a lot of it. Simple code seldom generates a complex high-clock instruction sequence.

          Sadly, hardly anybody does this analysis anymore; they just write the shortest high-level code (smugly congratulating themselves on its obtuseness), let the compiler generate whatever mess it might spew, turn on the compiler's optimization option, and assume the result is the best.

          It's frequently not even close to the best.

          --
          No, you are mistaken. I've always had this sig.
          • (Score: 4, Insightful) by maxwell demon on Sunday March 16 2014, @08:45PM

            by maxwell demon (1608) on Sunday March 16 2014, @08:45PM (#17269) Journal

            However, today neither counting instructions nor adding up cycles is going to give a good estimate of your running time (unless it turns out, e.g., that the loop is slightly too large to fit into the instruction cache), because processors do a lot behind the scenes (register renaming, branch prediction, out-of-order execution, speculative execution, ...). Far more important are things like cache locality (this alone can get you quite a bit of speedup, and can be analyzed entirely at the C level). And of course no amount of micro-optimization can save you from a badly chosen algorithm.
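
            As a minimal illustration of the cache-locality point (the array name and sizes are invented for the example), the two functions below do the same arithmetic, but the first walks memory sequentially while the second strides across cache lines:

                #define ROWS 1024
                #define COLS 1024
                static double grid[ROWS][COLS];

                /* Cache-friendly: the inner loop touches contiguous memory. */
                double sum_row_major(void)
                {
                    double s = 0.0;
                    for (int r = 0; r < ROWS; r++)
                        for (int c = 0; c < COLS; c++)
                            s += grid[r][c];
                    return s;
                }

                /* Same result, but the inner loop jumps a whole row of
                   doubles per step, so it misses the cache far more often. */
                double sum_column_major(void)
                {
                    double s = 0.0;
                    for (int c = 0; c < COLS; c++)
                        for (int r = 0; r < ROWS; r++)
                            s += grid[r][c];
                    return s;
                }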

            --
            The Tao of math: The numbers you can count are not the real numbers.
          • (Score: 1) by jackb_guppy on Sunday March 16 2014, @09:22PM

            by jackb_guppy (3560) on Sunday March 16 2014, @09:22PM (#17286)

            I so agree. You have to understand what the compiler and underlying hardware will do.

            One day back in the XT & AT days, we had two young programmers trying to write ASM for simple screen processing. They tried to use as few instructions as possible, figuring that was faster. So one instruction was a multiply by 80. It looked good. I showed them that a store, 2 shifts, an add, then 4 more shifts was way faster, even though it took 8 instructions instead of 1.
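
            In C terms, that shift-and-add sequence computes row * 80 as ((row * 4) + row) * 16; a hedged sketch (the function name is invented, and today's compilers typically perform this strength reduction on their own):

                /* Screen offset for an 80-column text mode: row * 80,
                   written the way the hand-tuned asm did it:
                   ((row << 2) + row) << 4  ==  (row*4 + row) * 16  ==  row * 80 */
                unsigned screen_offset(unsigned row)
                {
                    return ((row << 2) + row) << 4;
                }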

          • (Score: 1) by rev0lt on Sunday March 16 2014, @09:33PM

            by rev0lt (3125) on Sunday March 16 2014, @09:33PM (#17288)

            For many years we practiced code optimization with a ruler

            Assuming x86, after Pentium Pro, you ought to have a big ruler. As an example, hand-optimization for P4 is a *f****** nightmare*.

            After a while we learned to avoid those constructs that would generate a mountain of code, and write simpler structures to do the same work. We might use a small amount of code in hand coded loop(s) to process an array, and avoid the complex code generated by the language's array operations.

            The problem is, reducing the number of instructions isn't necessarily good. In your example, short loops were discouraged on long-pipeline CPUs such as the Prescott line, but would actually be faster on the Centrino/Pentium M line. Also, you'd gain a huge amount of speed if you respected 32-byte boundaries on cache lines. So, having 15 instructions to avoid 2 odd out-of-boundary memory operations would be faster than having a simple, compact loop that causes a cache miss. And don't even get me started on parallelization - reducing the number of instructions doesn't necessarily mean the code will be faster.

            It's frequently not even close to the best.

            It's not the best. But unless you're an asm wizard, the result will be fast enough regardless of the CPU. For most purposes, handwritten assembly or optimizing listings is a waste of time. However, I do agree that knowing what the compiler generates will make you a better programmer and help you avoid some generic pitfalls.

            • (Score: 2) by frojack on Sunday March 16 2014, @10:12PM

              by frojack (1554) on Sunday March 16 2014, @10:12PM (#17302) Journal

              You haven't a clue about how to quote, do you?

              You also assume way more than I've said.

              You also assume that changes in pipelining make all efforts at optimization useless and unnecessary. Nothing could be further from the truth. The techniques one might adopt with knowledge of current processors might be different than what you would use before, but there are many more things you can do in your code today than you could do before.

              The "fast enough" mentality is exactly part of the problem.

              --
              No, you are mistaken. I've always had this sig.
              • (Score: 0) by Anonymous Coward on Monday March 17 2014, @12:10AM

                by Anonymous Coward on Monday March 17 2014, @12:10AM (#17325)
                To be fair to him, it's easy to assume the quoting mechanism is still <quote> instead of <blockquote> .
                • (Score: 1, Troll) by frojack on Monday March 17 2014, @12:18AM

                  by frojack (1554) on Monday March 17 2014, @12:18AM (#17326) Journal

                  When the screen you post from clearly shows the supported syntax?

                  --
                  No, you are mistaken. I've always had this sig.
                • (Score: 2) by maxwell demon on Monday March 17 2014, @05:26PM

                  by maxwell demon (1608) on Monday March 17 2014, @05:26PM (#17707) Journal

                  Actually <blockquote> is the old one. It worked already before Slashdot introduced <quote> with its slightly different spacing behaviour, and it never stopped working.

                  --
                  The Tao of math: The numbers you can count are not the real numbers.
              • (Score: 1) by rev0lt on Monday March 17 2014, @05:11AM

                by rev0lt (3125) on Monday March 17 2014, @05:11AM (#17408)

                You haven't a clue about how to quote, do you?

                No, not really (Tnx AC). Is that relevant to the topic?

                You also assume that changes in pipelining make all efforts at optimization useless and unnecessary.

                Well, you assume I said that. I didn't. What I said was that producing a blend of optimized code for all common CPUs at a given time is complex, and one of the most obvious examples was when you had both Prescott and Pentium M in the market. Totally different CPUs in terms of optimization.

                Nothing could be further from the truth. The techniques one might adopt with knowledge of current processors might be different than what you would use before

                Well, I've worked extensively with handwritten and hand-optimized assembly for most (all?) Intel x86 CPUs up to the Pentium 4. Just because you optimize it doesn't necessarily mean it's faster (as an old fart example, think about all those integer-only Bresenham line algorithms vs having a div per pixel). And even if it is generically faster, it is usually model-specific. And it is very easy to make it run slower (e.g. by direct and indirect stalls, cache misses, branch prediction misses, etc). The Intel Optimization Manual is more than 600 pages (http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html), so if you can generically beat a good compiler, good for you. Or you can stop wasting time and use a profiling tool like http://software.intel.com/en-us/intel-vtune-amplifier-xe [intel.com] to get a concrete idea of what and when to optimize, instead of having to know all the little details yourself.


        • (Score: 0) by Anonymous Coward on Sunday March 16 2014, @08:57PM

          by Anonymous Coward on Sunday March 16 2014, @08:57PM (#17273)

          I did this at one point many years ago to find out why an array process was slow.

                  a(i++) = b(j++);


          foobar.c: error: expression is not assignable
          a(i++) = b(j++);
          ~~~~~~ ^

          • (Score: 2) by maxwell demon on Monday March 17 2014, @05:31PM

            by maxwell demon (1608) on Monday March 17 2014, @05:31PM (#17710) Journal

            You forgot to include the header file containing

            #define a(x) (*(p->foo.bar(baz((x))->qux)->abc))

            :-)

            --
            The Tao of math: The numbers you can count are not the real numbers.
            • (Score: 0) by Anonymous Coward on Monday March 17 2014, @06:12PM

              by Anonymous Coward on Monday March 17 2014, @06:12PM (#17735)

              I thought we were talking about _efficient_ C, sorry. :)

        • (Score: 2) by mojo chan on Monday March 17 2014, @04:43PM

          by mojo chan (266) on Monday March 17 2014, @04:43PM (#17687)

          Unfortunately it is hard to diagnose why this was not properly optimized without seeing the rest of the code, but it looks like a compiler bug. GCC will produce well optimized code in both cases, either converting to a memcpy or at least a tight and minimal loop.

          The problem is that once you start trying to second guess the compiler you end up with code that is both horrible and probably won't optimize so well in a year or two when the compiler has been improved. Fortunately we now have a couple of really good free C compilers so can for the most part just write portable code and not worry about it.

          I write firmware for microcontrollers in C so optimization is a big deal for me, but most of the time it is better to let GCC worry about it.

          --
          const int one = 65536; (Silvermoon, Texture.cs)
  • (Score: 5, Interesting) by ls671 on Sunday March 16 2014, @03:42AM

    by ls671 (891) Subscriber Badge on Sunday March 16 2014, @03:42AM (#17061) Homepage

    Look at the Java implementation of Strings. It is pretty interesting. Strings are final and are reused. So if you have 1000 instances of the String "yes" in your program at once, only one area of memory will contain "yes".

    There are other nifty tricks as well in the string copy operations.

    Get the source code.

    --
    Everything I write is lies, including this sentence.
    • (Score: -1, Troll) by Ethanol-fueled on Sunday March 16 2014, @03:56AM

      by Ethanol-fueled (2792) on Sunday March 16 2014, @03:56AM (#17069) Homepage

      We call those "const" in my hood, nigga.

      • (Score: 3, Informative) by ls671 on Sunday March 16 2014, @04:33AM

        by ls671 (891) Subscriber Badge on Sunday March 16 2014, @04:33AM (#17075) Homepage

        "const" don't have the copy on write feature. You can't even write them. So sorry, we are not talking about the same thing..

        --
        Everything I write is lies, including this sentence.
      • (Score: 1) by sjames on Sunday March 16 2014, @06:32AM

        by sjames (2882) on Sunday March 16 2014, @06:32AM (#17106) Journal

        Not the same thing at all. Const can't be assigned at all (other than the initial assignment). This is entirely different. There is a string pool containing one and only one copy of every string assigned anywhere. If you have char *a="hello " and char *b="there" and char *c = "hello there", there will be 3 strings stored. If you concatenate a and b assigned to char *d, d will point to the same instance of "hello there" as c does (and its reference count will be incremented).

        Unlike a const, you can then do d = "Never Mind".

        • (Score: 0) by Anonymous Coward on Sunday March 16 2014, @03:32PM

          by Anonymous Coward on Sunday March 16 2014, @03:32PM (#17191)

          You mean whenever I change a string in Java, the JVM runs through ALL my strings in memory to make sure I don't have a duplicate string already? and there's no opting out??

          Yikes, that sounds like a huge waste of CPU cycles. I'll happily trade memory to get that performance back, thanks.

    • (Score: 3, Informative) by zip on Sunday March 16 2014, @04:04AM

      by zip (702) on Sunday March 16 2014, @04:04AM (#17072)

      I'm not sure if this still applies with newer versions, but maybe it's unrelated: https://stackoverflow.com/questions/16123446/java-7-string-substring-complexity [stackoverflow.com]

      But as far as I know, Qt strings still behave this way.

      • (Score: 2) by ls671 on Monday March 17 2014, @02:07AM

        by ls671 (891) Subscriber Badge on Monday March 17 2014, @02:07AM (#17346) Homepage

        Before:

        String a = "1234567890";
        a = a.substring(0,2); // a is now "12"

        1) No memory copy has occurred.
        2) "1234567890" is still kept in memory with a reference count of 1

        Now (apparently according to the link you posted):

        String a = "1234567890";
        a = a.substring(0,2); // a is now "12"

        1) A memory copy has occurred.
        2) "1234567890" is eligible for garbage collection while a new area of memory has been assigned to "12".

        IMHO, there is work to be done to combine the 2 approaches. Hint: keep the old behavior, but revert to the new behavior in the background after a given TTL during which "1234567890" hasn't been accessed, then transparently copy the substring "12" to a new memory area.

        I know it sounds very complicated but it is pretty simple compared to the many flavors of garbage collection floating around ;-)

        --
        Everything I write is lies, including this sentence.
    • (Score: 1) by germanbird on Sunday March 16 2014, @05:17AM

      by germanbird (2619) on Sunday March 16 2014, @05:17AM (#17090)

      As cool as the Java implementation is, it may not be what the original poster is looking for. Sure it is very memory efficient, but optimizing for memory tends to reduce performance (as optimizing for performance often increases memory usage).

      Maintaining a "library" of final/immutable strings means that every time you create or modify a string, you will have to search the list of existing strings for an existing copy. Likewise any destroyed or modified strings may need to remove an entry from the list. At this point, you are implementing some sort of a reference counting mechanism and/or garbage collection. Interesting (well, at least interesting to certain code geeks) problems to solve and play with, but maybe not exactly what you are looking for.

      The big implication of all this is that anytime you modify a string, a new immutable string is (possibly) created. So think about appending text in a loop. For every iteration of that loop, a new immutable string is added (or found) in your library of strings. This can make certain text operations pretty expensive.

      I think all of this stuff is worth thinking through, but if you are just looking to write a library to help you with or abstract common string operations, this is probably more work than you need to do. Either way from your original post, it sounds like you care more about cpu performance than memory usage, which probably means this is the wrong direction for you.

      Either way, food for thought and a good mental exercise.

      • (Score: 3, Insightful) by jorl17 on Sunday March 16 2014, @05:19AM

        by jorl17 (3747) on Sunday March 16 2014, @05:19AM (#17091)

        It really depends on the context of the application. If he only wants to manipulate strings every so often, then that seems like overkill. If he's going to be crunching strings like beta crunches F-bombs, then it might be a good idea to go beyond that.

        I often say that Context Information is everything. And it really is.

      • (Score: 1) by TheGratefulNet on Sunday March 16 2014, @06:00AM

        by TheGratefulNet (659) on Sunday March 16 2014, @06:00AM (#17102)

        deduplication is great for disks, but if I have something in memory, I want it to be fast.

        it sounds 'elegant' for Java to de-dupe strings but I'm not sure it's going to be faster. I never pick elegant over speed, personally. Elegant is for school. Simple and obvious is much better for production code. And not having to search for instances of a string, just to de-dupe it, is not at all simple or fast.

        --
        "It is now safe to switch off your computer."
        • (Score: 2) by Aighearach on Monday March 17 2014, @09:10PM

          by Aighearach (2621) on Monday March 17 2014, @09:10PM (#17799)

          As a Rubyist I challenge the idea that elegant is at odds with simple. The simpler code is generally more elegant.
          It is the more clever code that uses the excuse of being faster, not the more elegant. Elegant code is faster where the algorithm it uses is simpler. That is the essence of an elegant improvement; the code gets simpler and typically faster.

      • (Score: 1) by sjames on Sunday March 16 2014, @07:20AM

        by sjames (2882) on Sunday March 16 2014, @07:20AM (#17113) Journal

        There are tradeoffs. Yes, you do extra work in some cases, but in others you do a lot less. For example, getting the length of the string becomes trivial rather than scanning the buffer (potentially wiping out your cache). Comparisons are, of course, trivial. You do more accounting when you cat the strings, but you do less scanning and there's no chance of overrunning the buffer. It can also be a win if you have good optimization.

        Which is overall better and faster will strongly depend on what you're doing. I do suspect that on average final strings will be a bit slower but safer.

        • (Score: 2) by maxwell demon on Sunday March 16 2014, @12:15PM

          by maxwell demon (1608) on Sunday March 16 2014, @12:15PM (#17158) Journal

          Any sane string implementation (that is, not C's) will have a length field (or, alternatively, an additional pointer to the end), so you don't have to scan the string to find out its length (an O(n) operation which, as you noted, will do bad things to your cache), but can just read the length (or calculate it in O(1) by simple pointer subtraction). This also means you'll not arbitrarily restrict the characters which can be stored in your string (C applications are generally easily identified by being tripped up by a zero byte in positions supposed to contain text).
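
          A minimal sketch of such a counted-string layout in C (the type and helper are invented for illustration):

              #include <stddef.h>
              #include <stdlib.h>
              #include <string.h>

              struct cstr {
                  size_t len;    /* O(1) length, no scanning for a terminator */
                  char  *data;   /* may legally contain embedded zero bytes   */
              };

              /* Build a counted string from a C string; len 0 / NULL data on failure. */
              struct cstr cstr_from(const char *s)
              {
                  struct cstr r = { 0, NULL };
                  size_t n = strlen(s);
                  r.data = malloc(n);
                  if (r.data) {
                      memcpy(r.data, s, n);
                      r.len = n;
                  }
                  return r;
              }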

          --
          The Tao of math: The numbers you can count are not the real numbers.
    • (Score: 2) by Debvgger on Sunday March 16 2014, @08:30AM

      by Debvgger (545) on Sunday March 16 2014, @08:30AM (#17124)

      In C, if the strings are constant (I mean, if they are hard-coded) their memory positions are reused, too. This is one of the most basic optimizations C compilers do.

    • (Score: 1) by koja on Sunday March 16 2014, @08:02PM

      by koja (3832) on Sunday March 16 2014, @08:02PM (#17250)

      No need to go as far afield as Java. std::string in C++ is much nearer and has the same COW optimization.
      https://en.wikipedia.org/wiki/Copy-on-write [wikipedia.org]

  • (Score: 5, Insightful) by jorl17 on Sunday March 16 2014, @04:34AM

    by jorl17 (3747) on Sunday March 16 2014, @04:34AM (#17077)

    Many people try to do this and then resort to clever tricks. They use hand-made stacks for some algorithms, they sort the hell out of many things (using their own sorting functions), and micro-optimize for whatever gets in their head. Newsflash: you will never anticipate everything...

    Also try to use the standard. Sorting? use qsort. Copying? Use memcpy, strcpy and the likes. Don't try to beat the implementation!

    Most often, simple code turns out to be efficient, and the compiler can optimize it more easily. Remember that super awesome triple pointer you built to cache something? Well, those 3 dereferences slowed the fuck out of your application, too.

    Simple is mostly gold.

    Of course if you really want speed, I suggest you look at the disassembly for many architectures and study it.

    Lastly: optimize the algorithms, not the method. If you implement an algorithm badly, in 1-2 years the advance in CPU power will make it run faster. It won't change its complexity, though.

    • (Score: 5, Informative) by forsythe on Sunday March 16 2014, @05:02AM

      by forsythe (831) on Sunday March 16 2014, @05:02AM (#17088)

      I completely agree, and offer Rob Pike's notes [lysator.liu.se]. In particular, rule 3 of complexity stands out:

      Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy.

      • (Score: 1) by Fry on Sunday March 16 2014, @07:38AM

        by Fry (642) on Sunday March 16 2014, @07:38AM (#17114)

        Fancy algorithms are slow when n is small, and n is usually small.

        Which is why I only use bubble sort!

        • (Score: 2) by Debvgger on Sunday March 16 2014, @08:27AM

          by Debvgger (545) on Sunday March 16 2014, @08:27AM (#17123)

          I only use Bubble Bobble, sort of.

    • (Score: 4, Insightful) by Anonymous Coward on Sunday March 16 2014, @05:23AM

      by Anonymous Coward on Sunday March 16 2014, @05:23AM (#17094)

      Main thing about "don't be clever" is to use OTHER CLEVERER people's code (and benchmark).

      There are other C string libraries out there, why not use those? Why are you rolling your own? If you said you wanted to learn about stuff then rolling your own makes sense, BUT you said you needed speed.

      Less code to write, fewer bugs you're responsible for and less documentation to do.

    • (Score: 5, Insightful) by germanbird on Sunday March 16 2014, @05:32AM

      by germanbird (2619) on Sunday March 16 2014, @05:32AM (#17096)

      This.

      It is a constant battle to keep programmers from prematurely optimizing things as that tends to make problems more "interesting". However, focusing your time and energy on simple code and good algorithms tends to get better results (code that is more maintainable, easier for the compiler to optimize, etc.).

      The time for optimizing comes once you have a working library. Use a profiler. Spend your time on the bits that are actually making a difference in the overall performance of your program(s). That might be the time where you find a few portions of code that could benefit from some assembly or code tricks. Most of the time, though, when I profile a program, I just find more algorithmic improvements I can make to avoid that scenario or other optimizations such as caching or reworking a data structure.

      Now if you are wanting to do a bare metal string implementation using lots of assembler just for the fun of it, I totally understand and say go for it. Hobby projects like that keep your geek skills sharp. But in the grand scheme of things, spending time on algorithms and data structures and profiling is probably a better investment into your overall code quality.

      Good luck!

      • (Score: 1) by khallow on Sunday March 16 2014, @01:27PM

        by khallow (3766) Subscriber Badge on Sunday March 16 2014, @01:27PM (#17170) Journal

        This^N. Get something that works first, then figure out how to make it better.

        But having said that, your first few passes through the profiler may be quite enlightening as to where your program is spending its time. If I were profiling this string library, I'd start by just performing each distinct operation of the code a few thousand to few million times each, including creation and deleting of strings. Also, if your code is optimized for certain types of strings (eg, long string operations - see forsythe's comment [soylentnews.org]), then try it on stuff that it isn't optimized for (like short strings).

        Stress testing (where you keep performing zillions of operations until something breaks) would be useful too. This is a more likely way to catch memory leaks and other subtle problems that slowly build up over time.
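
        A bare-bones sketch of such a measurement loop (POSIX clock_gettime; string_op_under_test is a stand-in for whichever library call is being exercised):

            #include <stdio.h>
            #include <time.h>

            extern void string_op_under_test(void);   /* hypothetical operation */

            /* Wall-clock time for N repetitions of one operation. */
            double time_op(long iterations)
            {
                struct timespec t0, t1;
                clock_gettime(CLOCK_MONOTONIC, &t0);
                for (long i = 0; i < iterations; i++)
                    string_op_under_test();
                clock_gettime(CLOCK_MONOTONIC, &t1);
                return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
            }

            int main(void)
            {
                printf("%ld iterations took %.3f s\n", 1000000L, time_op(1000000L));
                return 0;
            }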

  • (Score: 0) by Anonymous Coward on Sunday March 16 2014, @04:39AM

    by Anonymous Coward on Sunday March 16 2014, @04:39AM (#17079)

    Don't be afraid of memcpy. The compiler can optimize it well. If you insist, you can even pick the optimization strategy from the command line.

    Use the "restrict" keyword. It's available as __restrict on most ANSI C compilers, including Visual Studio and the default setting for gcc. Put gcc in C99 mode with -std=gnu99 if you want it without the underscores, or just #define it.

    • (Score: 2) by forsythe on Sunday March 16 2014, @05:42AM

      by forsythe (831) on Sunday March 16 2014, @05:42AM (#17099)

      To be pedantic, using gnu99 might run against the goal of being standards-compliant. Using -std=c99 would probably be better for the submitter's purpose.

    • (Score: 3, Insightful) by cmbrandenburg on Sunday March 16 2014, @01:52PM

      by cmbrandenburg (3744) on Sunday March 16 2014, @01:52PM (#17174)

      No, do be afraid of memcpy, as well as malloc. These are two of the four horsemen of poor performance [pl.atyp.us].

      After you've picked the best algorithm for your needs, one of the next best optimizations you can do is to avoid unnecessary memory allocation and copying. The key is developing a good eye for "unnecessary." For example, a common anti-pattern in intra-process communication is passing messages by copying one buffer to another. There's usually a better way, one that involves passing a pointer to the message rather than all of its content.
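
      A rough sketch of the difference (struct and function names are invented): the first hand-off allocates and copies the whole payload, the second just hands the consumer a pointer to the existing message:

          #include <stdlib.h>
          #include <string.h>

          struct message { size_t len; char payload[256]; };

          /* Copying anti-pattern: every hand-off duplicates the payload. */
          void deliver_by_copy(const struct message *in)
          {
              struct message *dup = malloc(sizeof *dup);
              if (!dup)
                  return;
              memcpy(dup, in, sizeof *dup);
              /* ... consumer works on dup, then frees it ... */
              free(dup);
          }

          /* Passing a pointer: no allocation, no copy. */
          void deliver_by_pointer(struct message *in)
          {
              /* ... consumer works on in directly ... */
              (void)in;
          }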

      What makes extraneous memory allocation and copying insidious is that no single memory allocation or copy will ruin your program. However, when they're repeated throughout a large code base, the inefficiency adds up--all while becoming a diffuse inefficiency that poses no single bottleneck. That is, once you realize that memory allocation and copying are a problem, it's too late to be an easy fix.

  • (Score: 2) by istartedi on Sunday March 16 2014, @06:03AM

    by istartedi (123) on Sunday March 16 2014, @06:03AM (#17103) Journal

    "Steal" in quotes of course. For string libraries? I'd seek out the best Free/Open source or find out what happens when I link against a black-box proprietary lib. The odds that I'll come up with something better are low. I say this as somebody who has been writing C since the late 90s. Some of it was professional work that earned good marks for speed. None of it used assembly. It hit the bit bucket before it needed to run any faster. I read a book by John Abrash (worked for Id Software) and it had a lot interesting things about how he beat the standard library--with assembly. As some of the others mentioned, you might find some older libraries that aren't using the new restrict keyword and thus don't run as fast as they could. AFAIK, the GNU stuff uses it now. IMHO, you won't beat any of it without assembly.

    For something less well known than a string library, it could be an entirely different story. There are probably still a lot of projects that haven't been taken to the limit. Performance is also subjective in some ways (e.g., latency or throughput?).

    --
    Appended to the end of comments you post. Max: 120 chars.
    • (Score: 4, Informative) by c0lo on Sunday March 16 2014, @07:03AM

      by c0lo (156) Subscriber Badge on Sunday March 16 2014, @07:03AM (#17111) Journal

      "Steal" in quotes of course. For string libraries? I'd seek out the best Free/Open source or find out what happens when I link against a black-box proprietary lib.

      A comparison [sourceforge.net] between string libraries.
      bstring seems to be GPL/BSD double licensed, you'd be able to fork under whichever you like.

      --
      https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
      • (Score: 1) by smaug9 on Sunday March 16 2014, @01:50PM

        by smaug9 (96) on Sunday March 16 2014, @01:50PM (#17173)

        Yeah, bstr is nice. Better to not fork. Wrap it in the interface you like, but leave the library itself alone. Maybe writing the one or two missing features you are looking for, but otherwise reusing existing work.

    • (Score: 3, Informative) by jones_supa on Sunday March 16 2014, @09:43AM

      by jones_supa (554) on Sunday March 16 2014, @09:43AM (#17135)

      I read a book by John Abrash (worked for Id Software) and it had a lot interesting things about how he beat the standard library--with assembly.

      Michael Abrash's Graphics Programming Black Book.

    • (Score: 2, Informative) by sorle on Sunday March 16 2014, @09:52AM

      by sorle (3778) on Sunday March 16 2014, @09:52AM (#17138)

      That would be **Mike** Abrash, optimization guru, and his black book is legendary. He's doing virtual reality work at Valve now, still pushing the limits of real-time performance.

  • (Score: 4, Insightful) by prospectacle on Sunday March 16 2014, @06:19AM

    by prospectacle (3422) on Sunday March 16 2014, @06:19AM (#17104) Journal

    "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil" - Donald Knuth.

    Write some example/load-testing applications that will use your library heavily, and keep track of which operations slow the programs down in practice. This might be because of how they're written, or because of how often they're called. A fairly well-written operation that you'll need to call millions of times needs more optimisation than a poorly written operation that's used rarely.

    This is doubly true if it's for personal use, because you probably know what you want it to do a lot, and what you need it to do only occasionally.

    --
    If a plan isn't flexible it isn't realistic
    • (Score: 1, Insightful) by Anonymous Coward on Sunday March 16 2014, @03:55PM

      by Anonymous Coward on Sunday March 16 2014, @03:55PM (#17196)

      > "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil" - Donald Knuth.

      This correlates with some other helpful axioms, though:

      * Don't repeat yourself
      * The best optimization is a more efficient algorithm
      * Don't do more work than necessary, etc.

      So don't spend 90% of your time trying to squeeze out that extra 10% performance, but DO think about efficiency from the beginning of your design.

      I like to refer people to the parable about Shlemiel the Painter [fogcreek.com] to emphasize why good planning is necessary BEFORE you start to code!
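
      The C-string version of Shlemiel is strcat in a loop: each call re-scans the growing destination from the start, so building the result is quadratic. A hedged sketch of the anti-pattern and the usual fix (both functions assume dst is large enough; the names are invented):

          #include <string.h>

          /* O(n^2): strcat walks the whole buffer to find the end every time. */
          void join_slow(char *dst, const char *parts[], int count)
          {
              dst[0] = '\0';
              for (int i = 0; i < count; i++)
                  strcat(dst, parts[i]);
          }

          /* O(n): remember where the string ends instead of re-scanning it. */
          void join_fast(char *dst, const char *parts[], int count)
          {
              char *end = dst;
              for (int i = 0; i < count; i++) {
                  size_t len = strlen(parts[i]);
                  memcpy(end, parts[i], len);
                  end += len;
              }
              *end = '\0';
          }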

    • (Score: 2, Insightful) by fnj on Sunday March 16 2014, @06:43PM

      by fnj (1654) on Sunday March 16 2014, @06:43PM (#17232)
      With respect, no, just no. Profile when you NEED to profile. If the code is not too slow in real applications, and usually that is the case with C, there is no point in profiling at all - let alone optimizing.

      Remember, ALL optimization is "premature" unless the code is actually too slow in operation. I borrow the wording for that thought from here [stackoverflow.com] because it is expressed so well.
      • (Score: 0) by Anonymous Coward on Sunday March 16 2014, @07:21PM

        by Anonymous Coward on Sunday March 16 2014, @07:21PM (#17240)

        Sorry, going to have to part with conventional wisdom here. The best reason for profiling without intending to optimize is just so that you know what your code is actually doing, and how it performs under load. Testing and measurement are part of the process, and that includes the binary end-product, not just source code.

        Also, responsiveness is always important...always. You can never have too much optimization. Every millisecond counts. As Steve Jobs pointed out to Larry Kenyon [folklore.org], if you shave 10 seconds off the boot time and multiply that times 5 million users, "that's 50 million seconds, every single day. Over a year, that's probably dozens of lifetimes. So if you make it boot ten seconds faster, you've saved a dozen lives!"

        However, you CAN waste too much time optimizing for diminishing returns. I agree with Michael Abrash as he puts it in his Black Book:

        "Before we can create high-performance code, we must understand what high performance is. The objective (not always attained) in creating high-performance software is to make the software able to carry out its appointed tasks so rapidly that it responds instantaneously, as far as the user is concerned. In other words, high-performance code should ideally run so fast that any further improvement in the code would be pointless. (emphasis added)

        "Notice that the above definition most emphatically does not say anything about making the software as fast as possible. It also does not say anything about using assembly language, or an optimizing compiler, or, for that matter, a compiler at all. It also doesn't say anything about how the code was designed and written. What it does say is that high-performance code shouldn't get in the user's way--and that's all."

  • (Score: 2, Insightful) by TGV on Sunday March 16 2014, @08:13AM

    by TGV (2838) on Sunday March 16 2014, @08:13AM (#17120)

    The key to efficient coding is not in using dirty tricks, but in efficient algorithms. C, however, does allow you to express certain algorithms more efficiently than other languages. I had a string problem once, and I ended up with an AVL tree of char* strings (this was before const). Every time the same string was created (and in text processing that happens very, very often), the same pointer to the string in that tree was returned. Elsewhere, string comparisons were done by comparing the pointer. Strings were identical if and only if the pointer was identical. For my particular problem, that saved a huge amount of time and memory.
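
    A toy sketch of that interning idea (a linear table instead of an AVL tree, purely to show the intern-then-compare-pointers pattern; all names are invented, and strdup is POSIX):

        #include <stdlib.h>
        #include <string.h>

        #define MAX_INTERNED 4096
        static const char *interned[MAX_INTERNED];
        static int interned_count;

        /* Return the canonical pointer for s, adding it on first sight. */
        const char *intern(const char *s)
        {
            for (int i = 0; i < interned_count; i++)
                if (strcmp(interned[i], s) == 0)
                    return interned[i];
            if (interned_count < MAX_INTERNED)
                return interned[interned_count++] = strdup(s);
            return s;   /* table full; fall back to the caller's copy */
        }

        /* After interning, equality testing is a single pointer compare. */
        int same_string(const char *a, const char *b)
        {
            return a == b;
        }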

    However, for generic string libraries, there is not one algorithm guaranteed to work optimally. So, my advice is to analyze what your library has to be good at, find proper algorithms to do that, and code them efficiently (and safely).

    • (Score: 1) by Platinumrat on Sunday March 16 2014, @09:09PM

      by Platinumrat (395) on Sunday March 16 2014, @09:09PM (#17278)
      I agree with the above, but add the following proviso...

      You need an architecture that supports your algorithms

      What I mean is that people tend to micro-optimise, but then go use CORBA, JSON, XML, HTML, REST (name your web service) architecture for inter-process communication or data access, when it's not appropriate. Sometimes mbufs, dmesg, SQL or TCP/UDP are perfectly appropriate means of passing data around.

      No compiler can help, if you include inappropriate frameworks.

  • (Score: 5, Interesting) by Debvgger on Sunday March 16 2014, @09:50AM

    by Debvgger (545) on Sunday March 16 2014, @09:50AM (#17136)
    From a gaming background, and having MAXIMUM speed in mind, ten things to remember:

    1) Never trust the compiler.
    2) Never trust the Standard Library.
    3) Never trust the compiler.
    4) Never trust any library you did not write.
    5) Never trust the compiler.
    6) Never trust any code you did not write.
    7) Never trust the compiler.
    8) Never trust your code.
    9) Never trust the compiler.
    A) ?

    Disassemble the code to gain understanding about how your compiler works and what is your code REALLY doing. How many parameters can you pass to a function before it resorts to using the stack? How does it recycle variable values? What is it doing with your array accesses? So many things to have in mind.

    Get used to reading assembler code. It is not THAT hard: there are too many myths about assembler, but in fact it is one of the easiest languages you can use because of its small size and few rules. Yes, it produces long and, at first, somewhat cryptic-looking code. But it is NOT: what you read is half people whining because they don't want to spend a few months improving their skills, half social pressure making them learn some tools they later can't stop using. You will be amazed at how unbelievably bad compilers are in general, and also be amazed at how sometimes the compiler does an amazing trick you were not expecting. It is truly amazing to see how compilers struggle to implement basic SIMD instruction optimizations, how bad the floating point code is, etc. If you work with architectures like ARM, the high-level code is laughable due to short-sighted use of the fantastic ARM conditional execution.

    Every compiler VERSION behaves differently. Compilers improve constantly. But since I started programming I have been hearing the same good old "compilers produce better code than humans" and "computers today have so many MHz that asm is unnecessary" bullshit. The first time I was told the latter we were using a 486SX clocked at 25 MHz. Eat my balls, dude.

    The thing is, to produce SHARP code you need to understand what's happening in the lower levels. Don't be in a hurry. Good things need time, and need work. Don't be sucked into these modern ideas of writing a lot of code in as little time as possible. You are not a slave, you are an artist. You will find many slaves, angry with life, throwing bullshit at you while trying to deceive themselves.

    Having said that, the tenth rule:

    A) Do not trust ANYONE: Always check yourself how things work on YOUR code for YOUR purpose.

    Everyone writes different code for different purposes. My words in this post can be a pile of bullshit for many people. For me, it is the golden rule I try to follow daily, and it works beautifully for me, for the type of stuff I work on.

    Also, as someone already pointed, profile often. It is a very good way to understand where your code is slowing down. Start disassembling the functions your program spends the most time on! You are usually only going to need to optimize a few things to get an immense speed gain.

    Oh, and by the way, always try to start optimizing the algorithms, of course :-)

    Suggested reading: http://graphics.stanford.edu/~seander/bithacks.html [stanford.edu]
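
    One entry from that page that is directly relevant to string code is the test for a zero byte anywhere in a 32-bit word, the core trick behind many word-at-a-time strlen/strcpy implementations. A hedged sketch:

        #include <stdint.h>

        /* Nonzero iff any of the four bytes of v is 0x00.
           (The "Determine if a word has a zero byte" hack from the page above.) */
        static int has_zero_byte(uint32_t v)
        {
            return ((v - 0x01010101UL) & ~v & 0x80808080UL) != 0;
        }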
    • (Score: 5, Funny) by Fry on Sunday March 16 2014, @10:41AM

      by Fry (642) on Sunday March 16 2014, @10:41AM (#17143)

      1) Never trust the compiler.
      2) Never trust the Standard Library.
      3) Never trust the compiler.
      4) Never trust any library you did not write.
      5) Never trust the compiler.
      6) Never trust any code you did not write.
      7) Never trust the compiler.
      8) Never trust your code.
      9) Never trust the compiler.
      A) ?

      A one-based list? The horror...

    • (Score: 4, Insightful) by FatPhil on Sunday March 16 2014, @11:15AM

      by FatPhil (863) <{pc-soylent} {at} {asdf.fi}> on Sunday March 16 2014, @11:15AM (#17148) Homepage
      I got through about half of your post, and didn't see the word "measure" once. Anyone who doesn't start with saying "measure" on the first line of their comment simply doesn't know anything about optimising.

      If the compiler produces shitty object code from your C, you're a shitty C programmer.

      I once ran one of my number-crunching inner loops through (vendor-supplied) simulators for Alpha and Power processors a couple of years back (I've not done any number crunching since then). The same portable C source for both. And the pipeline analyses told me that every single pipeline was 100% saturated, except where instruction dispatch was unable to get new instructions to them, but the instruction dispatch unit was totally saturated every clock tick. Both compilers had produced absolutely perfect object code, and my C code was un-microoptimisable. And as 99.9% of the time was spent in that routine, my job was done. Thank goodness for the high quality of C compilers!
      --
      Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
    • (Score: 3, Insightful) by maxwell demon on Sunday March 16 2014, @12:26PM

      by maxwell demon (1608) on Sunday March 16 2014, @12:26PM (#17160) Journal

      OK, reading your post, I conclude the right way to write code is to start with your own operating system (because otherwise you didn't write the kernel code, and thus cannot trust it).

      Oh, and only run it on a RISC architecture, because you simply cannot trust microcode written by someone else ... so, sorry, no x86.

      BTW, am I at least allowed to trust VHDL code, or am I supposed to build my own computer chips as well?

      --
      The Tao of math: The numbers you can count are not the real numbers.
    • (Score: 4, Interesting) by lajos on Sunday March 16 2014, @01:04PM

      by lajos (528) on Sunday March 16 2014, @01:04PM (#17164)

      Oh come on, this is just pure nonsense.

      Don't trust the compiler? Don't trust stl?

      What ends up happening is people will waste time rewriting basic things like string classes, containers, and you get a code base where you constantly have to interface between multiple implementations of the same thing.

      You of course won't implement half of what the stl gives you, because you say you don't need it. But others will, and will reimplement another quarter of the stls.

      You'll have bugs that are hard to find, but you'll tout that your code runs 2.1% faster. You of course still won't support UTF-8 or wchar, because those are stupid. Which will make your library work on one single architecture on one single operating system.

      Then some 16 year old comes in, links Mono to your game, writes stuff in C#, and his code runs twice as fast as yours. It supports UTF-8, wchar, and a thousand utility functions you thought were stupid, hence producing better results and more of them.

      Now there is of course place for asm. But there is also one for the compiler, and the stl. Let's not fall for the extremes.

      Measure your performance, and optimize where needed.

      And a string class? Unless you are doing it for learning, just grab a well tested library.

    • (Score: 5, Insightful) by Anonymous Coward on Sunday March 16 2014, @01:56PM

      by Anonymous Coward on Sunday March 16 2014, @01:56PM (#17175)

      I've dealt with plenty of "don't worry I'm a game developer, I'm the best kind of programmer there is" programmers, your post is almost cliche. It's almost entertaining to tear their illusions down on StackOverflow, because they are so full of themselves, both when asking and answering questions. It's easy to spot them, because they always start their posts with "I'm a game programmer".

      Here are some hard truths:

      1. You are not as awesome as you think you are. Don't suffer from Not-Invented-Here-Syndrome. People that write code for a living have to accomplish goals, not rewrite that blitting routine for the 20th time.
      2. Don't bother reading assembly, rewriting routines, etc, until you've measured it. That's the glaring omission in your list.
      3. Writing code is not an art. It's creative work, but it requires a lot of well-established techniques. If you want to be employable, you have to show that you can construct programs that make sense to other programmers (using names, concepts, structures, algorithms, etc. that are familiar to them) and that are maintainable.
      • (Score: 2) by Debvgger on Sunday March 16 2014, @04:33PM

        by Debvgger (545) on Sunday March 16 2014, @04:33PM (#17204)
        Sir, I wonder why it would have been different if I said "I work on banking applications". Too bad for you if you have a problem with someone starting a post by putting their next words into proper context, or with someone having a sort of (honest) job. I was talking about getting MAXIMUM (in CAPS) speed, from a gaming perspective. Sorry if what I do and how I have to do it is not what you do and how you do it. The world is full of differences, learn to bare with it.

        About the points you make:

        1) I'm not bragging about anything, and probably would be getting more in another field, for example yours. I'm glad for you. I'm also glad so many people suffer the "not invented here syndrome" that brought to the development of all the libraries you use, because thanks to that we are not still eating bananas in a tree.

        2) My post clearly says to profile to find out which functions are the time hogs, investigate better algorithms, and disassemble them to learn how your compiler interprets your code, so you can write them in the best way you can think of. I'm surprised that my post was too long to read for someone who writes code in as little time as possible for a living to accomplish goals, as you should surely have more time than me. Now tell me about wives and kids. Yeah, I have those too, and can still afford to spend some extra time studying my own code to improve it.

        3) I understand painting walls and facades is not what you study at beaux-arts at the university. There are many types of people who use paint. All are necessary, and all are respectable. There are many types of people who use a compiler, too. There are many metrics you can use to measure code quality, but we were talking about maximum speed, and if you truly think anything I said is to the detriment of good code quality I'm afraid you have a deep understanding problem, a programming problem, or both, but in any case you would not last long at my workplace. Not that you would want it, of course. Usually people painting facades earn much more than those trying to change the world with a brush. I also compose music. Maybe not an art either, because you can hit bongos with your balls :-)

        No need to be angry with the rest of the world, dude :-)
        • (Score: 0) by Anonymous Coward on Sunday March 16 2014, @04:53PM

          by Anonymous Coward on Sunday March 16 2014, @04:53PM (#17207)

          "The world is full of differences, learn to bare with it."

          Indeed, if we all bare with it then some major differences that split the world population in two will become obvious.

    • (Score: 1) by rochrist on Sunday March 16 2014, @05:11PM

      by rochrist (3737) on Sunday March 16 2014, @05:11PM (#17212)

      So....you're saying never trust the compiler?

    • (Score: 1) by fnj on Sunday March 16 2014, @06:07PM

      by fnj (1654) on Sunday March 16 2014, @06:07PM (#17224)

      Worst possible advice. Yes, SOME of it had SOME validity in the 1980s. It is all bunk now.
      1) For compilers on the order of gcc and clang, and probably even Visual C++, the compiler writers know more than you could possibly imagine about creating efficient machine code.
      2) For the 8088 you could do a lot with hand optimization. For CPUs of the last 20 years, you will only make the performance worse.
      3) Do not fixate on optimization. Straightforward C code on today's CPUs is orders of magnitude better in performance than you will ever need or notice for almost all tasks, and the benefits of trickery and cleverness, if any, will be miniscule and far exceeded by a few months of CPU tech progress.
      4) If the time comes that you ever do need to optimize, remember the 99%/1% rule. 99% of potential speed gains can be harvested by optimizing the critical 1% of the code.
      5) Write for human comprehension and maintenance. Do not waste time trying to "help" the compiler be "smarter", and do not show off by using obscure tricks no one else is going to understand.
      6) Staring at assembly level compiler output isn't going to do crap for code efficiency. The days when a table of individual instruction timing values told you anything of value, or was even possible to create, are long gone. Pipelining, out of order execution, speculative branch prediction, and the like are the order of the day.
      7) Leverage existing libraries. They have been through more debugging using more distributed brain power than anything you write is ever going to get.

      • (Score: 0) by Anonymous Coward on Sunday March 16 2014, @06:34PM

        by Anonymous Coward on Sunday March 16 2014, @06:34PM (#17230)

        For compilers on the order of gcc and clang, and probably even Visual C++, the compiler writers know more than you could possibly imagine about creating efficient machine code.

        Citation needed. Who are these superhuman compiler gods who have powers of logic comprehension beyond mere mortals like us?

      • (Score: 1) by smellotron on Sunday March 16 2014, @08:30PM

        by smellotron (3346) on Sunday March 16 2014, @08:30PM (#17260)

        For compilers on the order of gcc and clang, and probably even Visual C++, the compiler writers know more than you could possibly imagine about creating efficient machine code.

        When optimizing, a large part of the value in inspecting the compiler-generated object code is in identifying pessimizations that the compiler was forced to make to follow the language standard. It is very common in code that I write (not games, but still latency-sensitive) that some trivial reordering of operations in the C code will reduce or remove stack usage, or replace a nested function call with a tail call. Especially because compiler writers tend to be the sharpest optimizers in the room, there is value in double-checking their output to detect data-dependency/optimization bugs in your code.

  • (Score: 2, Interesting) by ld a, b on Sunday March 16 2014, @10:44AM

    by ld a, b (2414) on Sunday March 16 2014, @10:44AM (#17144)

    Sure, I use const when I can, I use the pointer methods for manual string copying, I even use register for all the good that does with modern compilers, but now, I'm trying to write a C-string handling library for personal use

    The above gives no indication that you understand what you are currently doing, let alone what you intend to do.
    My advice would be to learn the basics before jumping into the pilot's cabin.
    Once you know what you are doing with C, google yourself a good string algorithm that matches your intended use patterns and implement that in non-clever C.
    Once that is done, take the code using your string library and profile it. If it's good enough then just use it as is.
    Once you identify the bottleneck is say the raw character copy, you google again for the fastest way of moving stuff around. You replace your old copying code and then see if it works.
    If you start out looking for fast copying hacks, or improvising with algorithms, you'll end up with a slow unusable piece of crap.
    Intended use is the most important part!
    You can see people here raving about immutable strings or whatever, but "appending" stuff to a simple immutable string object can easily bring a system down. Usually, seemingly dumb algorithms have their strong points as well. It's not "What were System.String or strcat(3) designers smoking?" but "What was I smoking when I used that for my log storage?".

    --
    10 little-endian boys went out to dine, a big-endian carp ate one, and then there were -246.
  • (Score: 3, Insightful) by VLM on Sunday March 16 2014, @12:46PM

    by VLM (445) on Sunday March 16 2014, @12:46PM (#17161)

    "I'm trying to write a C-string handling library for personal use, but I need speed"

    Why?

    For the past decade or so, one possible way to interpret my job has been to do weird technical analysis of data that arrives in the form of what boils down to immense strings. Everything from plain text to csv to xml to who knows. Mush them up against each other and output something entirely different that is useful, or at least profitable.

    The point being that data storage, error detection / flagging / correction / handling, analysis, and reporting are such that mere string handling just isn't relevant, a tiny fraction of total CPU cycles.

    So I'm curious what you're "really" doing.

    What you call a "string handling problem" might in reality be "solving a traveling salesman problem, while using lots of strings as an input". Or implementing what amounts to a homemade RDBMS, using strings as data input. Or implementing a massive data warehouse, or sorting huge amounts of records, or some kind of search algo.

    The point being that magically making string manipulations execute instantly wouldn't really save me much processor time, and I feel I'm in a pretty "string heavy" line of work, so you must be doing something real interesting... real time process control (process as in ChemEng/SCADA not sysadmin). Or real time security analysis like an IDS. Or maybe some kind of weird data mining thing? Or high freq trading? Those are the only things I can think of that I'm not (directly) involved in that could be more "string heavy". So you must have an interesting story. Especially "for personal use".

    Generally, when running up against an optimization wall, I've had my biggest successes by walking around the wall, not trying to hit it even harder. So that makes the subject interesting.

    • (Score: 2) by hubie on Sunday March 16 2014, @01:56PM

      by hubie (1068) Subscriber Badge on Sunday March 16 2014, @01:56PM (#17177) Journal

      I am curious what language you use for most of your work.

      • (Score: 2) by VLM on Sunday March 16 2014, @02:16PM

        by VLM (445) on Sunday March 16 2014, @02:16PM (#17180)

        Depends on "most" and "work". And how old the legacy (if any) code base is and what it used.

        Everything runs under shell scripts run by cron and a batching system mostly to prevent hopeless thrashing meltdowns if cron fired off everything at once or there was a backlog. You generally clean up a mess like that once and then make your own system or implement off the shelf like torque or whatever.

        Something that is a couple of stereotypical unix tools ends up as a shell script. If fundamentally all you're trying to do is run "grep -c something somefile" and then pipe that number into an email for automated system status alerts, then it's possible to turn that one line into 10 lines of perl or 100 lines of java, but why?

        "harder" problems that require large-ish state machines seem to live in Perl. Especially if there exists a CPAN library than makes a simple solution. Historical and compatibility reasons. Some fooling around with Ruby and other languages have made an appearance.

        You can run into definition games: a shell script runs an old perl script to slightly cook some raw data before feeding it repeatedly into R or octave for serious mathematical analysis, then some new ruby eats the octave output and produces something gnuplot likes, plus a simple html wrapper to reference it - is that shell, perl, octave, ruby...

        Sometimes I get the depressing feeling that if a language package exists in Debian, it's probably system-critical somewhere here, which is annoying.

  • (Score: 2) by bucc5062 on Sunday March 16 2014, @01:13PM

    by bucc5062 (699) on Sunday March 16 2014, @01:13PM (#17167)

    "what would other Soylenters do to write efficient, pure, standards-compliant C?"

    Hire someone. Then again, the last time I wrote in C was back in college. Hell, back then we wrote C on punch cards, fed it into the Univac machine where it spit out assembler cards that we then fed to the Multivac to solve life's questions. Good times. Kids have it easy these days.

    --
    The more things change, the more they look the same
    • (Score: 0) by Anonymous Coward on Monday March 17 2014, @10:51AM

      by Anonymous Coward on Monday March 17 2014, @10:51AM (#17502)

      Univac couldn't handle C code.

  • (Score: 0) by Anonymous Coward on Sunday March 16 2014, @03:42PM

    by Anonymous Coward on Sunday March 16 2014, @03:42PM (#17195)

    Are you better at writing a string routine than the standard library authors who have written platform-optimized code that has been worked on for two, three, four decades? I'd be wary of rewriting anything in the standard library. This code has been around a very long time and has been scrutinized by many programmers.

  • (Score: 1) by morpheus on Sunday March 16 2014, @10:31PM

    by morpheus (1989) on Sunday March 16 2014, @10:31PM (#17308)

    Well, we are predictably hearing a chorus of professionals crying out: do not do it. This does not answer his question though. How about these simple steps:

    1) Think of the standard library/outside code as prototyping tools; put together a good test case.

    2) Try to identify and measure a small task that in your opinion slows you down.

    3) Educate yourself on how to improve this one task and make it run faster than what you have

    You have to have an absolutely clear mind about 2) and 3). The truth is, a lot of `standard code' is suboptimal but good enough (for one, `use qsort() for sorting' is bad advice; sometimes even bubble sort is a lot more appropriate). If you are truly dissatisfied with speed, look at the algorithm first. C is a fantastic language but it does have limitations simply because it is based on a specific model of memory and computation (all those malloc's and register qualifiers have a specific class of computer architectures in mind). Sometimes writing in assembly makes the code easier to write and understand (I did not say more portable). Check this example out:
            http://software-lab.de/doc64/README [software-lab.de]
    If you are convinced it is C you want, get a deep understanding of what happens when a function is called, a pointer is dereferenced, etc. Optimization is not magic, either, just a lot of linear algebra. Once you have looked at all this you will come around ... to what everybody else was saying: learn more but do not reinvent the wheel ... unless you really want to.