"I've been writing C for quite some time, but I never followed good conventions I'm afraid, and I never payed much attention to the optimization tricks of the higher C programmers. Sure, I use const when I can, I use the pointer methods for manual string copying, I even use register for all the good that does with modern compilers, but now, I'm trying to write a C-string handling library for personal use, but I need speed, and I really don't want to use inline ASM. So, I am wondering, what would other Soylenters do to write efficient, pure, standards-compliant C?"
It is a constant battle to keep programmers from prematurely optimizing things as that tends to make problems more "interesting". However, focusing your time and energy on simple code and good algorithms tends to get better results (code that is more maintainable, easier for the compiler to optimize, etc.).
The time for optimizing comes once you have a working library. Use a profiler. Spend your time on the bits that are actually making a difference in the overall performance of your program(s). That might be the time where you find a few portions of code that could benefit from some assembly or code tricks. Most of the time, though, when I profile a program, I just find more algorithmic improvements I can make to avoid that scenario or other optimizations such as caching or reworking a data structure.
Now if you are wanting to do a bare metal string implementation using lots of assembler just for the fun of it, I totally understand and say go for it. Hobby projects like that keep your geek skills sharp. But in the grand scheme of things, spending time on algorithms and data structures and profiling is probably a better investment into your overall code quality.
This^N. Get something that works first, then figure out how to make it better.
But having said that, your first few passes through the profiler may be quite enlightening as to where your program is spending its time. If I were profiling this string library, I'd start by just performing each distinct operation of the code a few thousand to few million times each, including creation and deleting of strings. Also, if your code is optimized for certain types of strings (eg, long string operations - see forsythe's comment [soylentnews.org]), then try it on stuff that it isn't optimized for (like short strings).
Stress testing (where you keep performing zillions of operations until something breaks) would be useful too. This is a more likely way to catch memory leaks and other subtle problems that slowly build up over time.