"I've been writing C for quite some time, but I'm afraid I never followed good conventions, and I never paid much attention to the optimization tricks of more advanced C programmers. Sure, I use const when I can, I use the pointer methods for manual string copying, and I even use register, for all the good that does with modern compilers. Now I'm trying to write a C-string handling library for personal use. I need speed, and I really don't want to use inline ASM. So, I am wondering, what would other Soylenters do to write efficient, pure, standards-compliant C?"
Many people try to do this and then resort to clever tricks. They use hand-made stacks for some algorithms, they sort the hell out of many things (using their own sorting functions), and micro-optimize for whatever gets in their head. Newsflash: you will never anticipate everything...
Also, try to use the standard library. Sorting? Use qsort. Copying? Use memcpy, strcpy and the like. Don't try to beat the implementation!
Most often, simple code turns out to be efficient, and the compiler can optimize it more easily. Remember that super awesome triple pointer you built to cache something? Well, those 3 dereferences slowed the fuck out of your application, too.
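For what it's worth, here is a minimal sketch of what "use qsort" looks like for an array of C strings. The names cmp_cstr and sort_words are made up for illustration; only qsort and strcmp come from the standard library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort over an array of `const char *`.
   qsort hands us pointers to the elements, i.e. pointers to the pointers. */
static int cmp_cstr(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Hypothetical helper: sort an array of C strings in place,
   in strcmp order, using the library's qsort. */
static void sort_words(const char **words, size_t count)
{
    assert(words != NULL);
    qsort(words, count, sizeof words[0], cmp_cstr);
}
```

The only part that tends to trip people up is the comparator: it receives pointers to the array elements, so for an array of strings it has to dereference once before calling strcmp.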
Simple is mostly gold.
Of course if you really want speed, I suggest you look at the disassembly for many architectures and study it.
Lastly: optimize the algorithm, not its implementation details. If you implement an algorithm badly, a year or two of CPU advances will make it run faster. They won't change its complexity, though.
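A string-library flavored illustration of that last point, as a rough sketch: joining many pieces with repeated strcat is quadratic, because every call rescans the destination to find its end, while remembering the write position makes it linear. Both function names (join_slow, join_fast) are hypothetical, and the (size_t)-1 error return is just one possible convention.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Quadratic: each strcat walks dst from the start to find the terminator.
   The caller must guarantee dst is large enough for the joined result. */
static void join_slow(char *dst, const char *const *parts, size_t count)
{
    dst[0] = '\0';
    for (size_t i = 0; i < count; i++)
        strcat(dst, parts[i]);
}

/* Linear: track how many bytes are already written and append there.
   Returns the length of the result, or (size_t)-1 if it does not fit
   in cap bytes (dst is still left NUL-terminated in that case). */
static size_t join_fast(char *dst, size_t cap,
                        const char *const *parts, size_t count)
{
    size_t used = 0;
    assert(cap > 0);
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(parts[i]);
        if (len >= cap - used) {      /* no room for this part plus the NUL */
            dst[used] = '\0';
            return (size_t)-1;
        }
        memcpy(dst + used, parts[i], len);
        used += len;
    }
    dst[used] = '\0';
    return used;
}
```

No amount of micro-tuning inside join_slow fixes the rescanning; changing what the function remembers does.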
I completely agree, and offer Rob Pike's notes [lysator.liu.se]. In particular, rule 3 of complexity stands out:
Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy.
Fancy algorithms are slow when n is small, and n is usually small.
Which is why I only use bubble sort!
I only use Bubble Bobble, sort of.
Main thing about "don't be clever" is to use OTHER CLEVERER people's code (and benchmark).
There are other C string libraries out there, why not use those? Why are you rolling your own? If you said you wanted to learn about stuff then rolling your own makes sense, BUT you said you needed speed.
Less code to write, fewer bugs you're responsible for and less documentation to do.
It is a constant battle to keep programmers from prematurely optimizing things as that tends to make problems more "interesting". However, focusing your time and energy on simple code and good algorithms tends to get better results (code that is more maintainable, easier for the compiler to optimize, etc.).
The time for optimizing comes once you have a working library. Use a profiler. Spend your time on the bits that are actually making a difference in the overall performance of your program(s). That might be the time where you find a few portions of code that could benefit from some assembly or code tricks. Most of the time, though, when I profile a program, I just find more algorithmic improvements I can make to avoid that scenario or other optimizations such as caching or reworking a data structure.
Now if you want to do a bare metal string implementation using lots of assembler just for the fun of it, I totally understand and say go for it. Hobby projects like that keep your geek skills sharp. But in the grand scheme of things, spending time on algorithms and data structures and profiling is probably a better investment in your overall code quality.
This^N. Get something that works first, then figure out how to make it better.
But having said that, your first few passes through the profiler may be quite enlightening as to where your program is spending its time. If I were profiling this string library, I'd start by just performing each distinct operation of the code a few thousand to a few million times each, including creation and deletion of strings. Also, if your code is optimized for certain types of strings (e.g., long string operations - see forsythe's comment [soylentnews.org]), then try it on stuff that it isn't optimized for (like short strings).
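A crude timing loop along those lines might look like the sketch below. It only uses clock() from the standard library; the str_op type, op_strlen and time_op are names I made up, and op_strlen merely stands in for whatever operation the library actually provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature for "one library operation on one input". */
typedef size_t (*str_op)(const char *input);

/* Stand-in operation; swap in the function you actually want to measure. */
static size_t op_strlen(const char *input)
{
    return strlen(input);
}

/* Results are stored here so the loop has an observable effect. An
   optimizer may still hoist or fold pure calls, so check the generated
   code before trusting the numbers. */
static volatile size_t sink;

/* Run op on input reps times and return the CPU time used, in seconds. */
static double time_op(str_op op, const char *input, unsigned long reps)
{
    size_t total = 0;
    clock_t start = clock();
    for (unsigned long i = 0; i < reps; i++)
        total += op(input);
    clock_t end = clock();
    sink = total;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling something like time_op(op_strlen, some_input, 1000000UL) once with a short string and once with a long one gives a first rough comparison; a real profiler will tell you far more.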
Stress testing (where you keep performing zillions of operations until something breaks) would be useful too. This is a more likely way to catch memory leaks and other subtle problems that slowly build up over time.
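As a rough sketch of that idea, assuming the library's allocations can be routed through a counting wrapper: create and destroy strings in a loop and check that the number of live allocations returns to zero. All names here (counted_malloc, counted_free, str_dup_counted, stress_create_destroy) are hypothetical, and str_dup_counted only stands in for the library's real "create a string" call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static long live_allocs; /* allocations made but not yet freed */

static void *counted_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_allocs++;
    return p;
}

static void counted_free(void *p)
{
    if (p != NULL) {
        live_allocs--;
        free(p);
    }
}

/* Hypothetical stand-in for the library's string creation routine. */
static char *str_dup_counted(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = counted_malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Create and destroy strings `rounds` times, mixing empty, short and
   longer inputs. Returns the number of allocations still live afterwards;
   anything other than 0 points at a leak. */
static long stress_create_destroy(unsigned long rounds)
{
    static const char *const samples[] = {
        "", "a", "short", "a somewhat longer sample string"
    };
    const size_t nsamples = sizeof samples / sizeof samples[0];

    for (unsigned long i = 0; i < rounds; i++) {
        const char *src = samples[i % nsamples];
        char *s = str_dup_counted(src);
        if (s == NULL)
            break;                    /* out of memory: stop early */
        assert(strcmp(s, src) == 0);  /* the copy must match its source */
        counted_free(s);
    }
    return live_allocs;
}
```

A counter like this only catches leaks that go through the wrapper, so it complements rather than replaces running the same loop under a memory checker.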