"I've been writing C for quite some time, but I never followed good conventions I'm afraid, and I never payed much attention to the optimization tricks of the higher C programmers. Sure, I use const when I can, I use the pointer methods for manual string copying, I even use register for all the good that does with modern compilers, but now, I'm trying to write a C-string handling library for personal use, but I need speed, and I really don't want to use inline ASM. So, I am wondering, what would other Soylenters do to write efficient, pure, standards-compliant C?"
However today neither counting instruction, nor adding cycles is going to give a good estimate about your running time (unless it turns out e.g. that the loop is slightly too large to fit into the instruction cache), because the processors tend to do a lot behind the scenes (register renaming, branch prediction, out-of-order execution, speculative execution, ...). Far more important issues are things like cache locality (this alone can get you quite a bit of speedup, and can be analyzed entirely on the C level). And of course no amount of micro-optimization can save you from a badly chosen algorithm.