I've recently done some touchups to my SubStrings library, and have reminded myself of my own undying glory, er, I mean, come to the conclusion that it's useful enough to warrant me advertising it a little bit. I use it in a large percentage of my projects nowadays. I thought I'd showcase part of what makes it awesome to me.
Before we begin, there is one thing I must mention: SubStrings is designed to work with null-terminated strings only. It's almost certainly unsafe to use it for anything else.
And, one last thing: This is indeed a C library, written in C89 and works in C. The OOP appearance, such as SubStrings.Length(), is function pointer trickery for cleanliness' sake. SubStrings is also a non-hosted library, meaning it has no dependencies, and can thus even be used in your bootloader's source code.
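For readers who haven't seen the function pointer trick before, here is a minimal sketch of the general idea in plain C89. It is not SubStrings' actual source: DemoStrings, demo_length and demo_equal are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mini-library; none of these names come from SubStrings. */
static size_t demo_length(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != '\0') ++n;
    return n;
}

/* Returns 1 if both strings are identical, 0 otherwise. */
static int demo_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' && *a == *b)
    {
        ++a;
        ++b;
    }
    return *a == *b;
}

/* A struct type whose members are function pointers... */
struct DemoStringsAPI
{
    size_t (*Length)(const char *s);
    int (*Equal)(const char *a, const char *b);
};

/* ...and a single const instance of it, which then reads like a namespace. */
const struct DemoStringsAPI DemoStrings = { demo_length, demo_equal };
```

With that in place, a call such as DemoStrings.Length("abc") is ordinary C: a member access followed by a call through the pointer. The real functions stay static, so only one symbol is exported.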
String copy and concatenation, truly safe
SubStrings' copy and concatenation functions have precise bounds checking and always produce a null-terminated string. If the size parameter is given accurately, they never overflow the buffer. Let's illustrate.
#include "substrings/substrings.h"
void MyFunc(void)
{
    char Array[1024];

    /*So, the maximum string data copied will be sizeof Array - 1, so there is room for the '\0'.*/
    SubStrings.Copy(Array, "My string is awesome!", sizeof Array);
    /*SubStrings.Cat's size parameter needs to be the *maximum capacity* of the destination. You don't need to subtract from
    the max size each iteration of a loop. SubStrings knows how to do it.*/
    SubStrings.Cat(Array, " And it merges really nice too!", sizeof Array);
}
strcpy() and strcat() are, in general, unsafe, and strncat() and strncpy() have differing and confusing behavior. SubStrings eliminates these problems with one consistent approach for concatenation and copy operations.
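To make the "one consistent approach" concrete, here is a minimal sketch in standard C of a copy and a concatenate that both take the total capacity of the destination and always terminate the result. This is an illustration under my own assumptions, not SubStrings' implementation: sketch_copy and sketch_cat are hypothetical names, and returning the resulting length is a choice made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not SubStrings' code. 'cap' is the TOTAL size of dest. */

/* Copies as much of src as fits, always terminates, returns characters written. */
size_t sketch_copy(char *dest, const char *src, size_t cap)
{
    size_t i = 0;

    assert(dest != NULL && src != NULL);
    if (cap == 0) return 0; /* No room even for the terminator. */
    while (i < cap - 1 && src[i] != '\0')
    {
        dest[i] = src[i];
        ++i;
    }
    dest[i] = '\0'; /* Terminate even when src was truncated. */
    return i;
}

/* Appends src to dest within the same total capacity, returns the new length. */
size_t sketch_cat(char *dest, const char *src, size_t cap)
{
    size_t used = 0;

    assert(dest != NULL && src != NULL);
    /* Find the end of the existing string without looking past cap. */
    while (used < cap && dest[used] != '\0') ++used;
    if (used == cap) return used; /* dest is not terminated within cap: leave it alone. */
    return used + sketch_copy(dest + used, src, cap - used);
}
```

Contrast this with the standard functions: strncpy() leaves the destination unterminated when the source fills it, and strncat()'s size argument is the number of characters to append rather than the capacity of the destination. Here both functions take the same kind of size, so sizeof works for either call.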
All the necessities provided
SubStrings also provides all the other functions you might want for basic string operations, including Length(), Compare(), NCompare() [analogous to strncmp()], Find(), CFind() [finds a single character], and FindAnyOf() [analogous to strchr()].
Find() and CFind() accept an additional argument that lets you directly request the N-th occurrence of the matching string or character. There are also IsLowerS, IsLowerC, IsUpperC, and conversion functions like LowerS, LowerC, UpperS, and UpperC.
In general, functions ending in S deal with strings, and functions ending in C deal with single characters.
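As a rough idea of what requesting the N-th occurrence involves, here is a sketch built on the standard strstr(). The name sketch_find_nth, the argument order, the 1-based counting and the non-overlapping matches are all assumptions of this example, not a description of how Find() is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the nth (1-based) occurrence
   of needle inside haystack, or NULL if there are fewer than nth matches. */
const char *sketch_find_nth(const char *needle, const char *haystack, unsigned nth)
{
    const char *hit = haystack;
    size_t needle_len;

    assert(needle != NULL && haystack != NULL);
    needle_len = strlen(needle);
    if (needle_len == 0 || nth == 0) return NULL;

    while ((hit = strstr(hit, needle)) != NULL)
    {
        if (--nth == 0) return hit;
        hit += needle_len; /* Resume after this match, so matches never overlap. */
    }
    return NULL;
}
```

Without such a parameter, the caller has to write this loop by hand every time the second or third match is wanted.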
High level stuff is here too
Some of the cooler functions are ones you might expect to find in Python: StartsWith, EndsWith, Replace, Strip, Reverse, StripLeadingChars, and StripTrailingChars.
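For anyone used to the Python equivalents, a few of these are easy to picture in standard C. The sketch below uses hypothetical sketch_ names. It assumes the prefix or suffix comes first in the argument list, as in the StartsWith call later in this post, and that the trailing-character stripper takes a set of characters to remove. SubStrings' real signatures may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers, not SubStrings' code. */

/* Returns nonzero if s begins with prefix. */
int sketch_starts_with(const char *prefix, const char *s)
{
    assert(prefix != NULL && s != NULL);
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Returns nonzero if s ends with suffix. */
int sketch_ends_with(const char *suffix, const char *s)
{
    size_t suffix_len, s_len;

    assert(suffix != NULL && s != NULL);
    suffix_len = strlen(suffix);
    s_len = strlen(s);
    if (suffix_len > s_len) return 0;
    return strcmp(s + (s_len - suffix_len), suffix) == 0;
}

/* Removes, in place, any trailing characters that appear in 'chars'. */
void sketch_strip_trailing(char *s, const char *chars)
{
    size_t len;

    assert(s != NULL && chars != NULL);
    len = strlen(s);
    while (len > 0 && strchr(chars, s[len - 1]) != NULL) --len;
    s[len] = '\0';
}
```

Stripping "\r\n" from the end of a line read from a file is the typical use for the last one.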
And some original ideas too
There are other functions, like Extract(), which pulls the string content that's in between two sequences.
Let's have an example.
#include "substrings/substrings.h"
int main(int argc, char **argv)
{
    int Inc = 1;
    char Buf[256];

    for (; Inc < argc; ++Inc)
    {
        if (SubStrings.StartsWith("--config=", argv[Inc]))
        {
            SubStrings.Extract(Buf, sizeof Buf, "=", NULL, argv[Inc]);
            DoSomething(Buf);
        }
    }
    return 0;
}
What's happening here is that Extract is pulling the data that starts after the =, and since the next parameter is NULL, it reads on until the end of the string. This makes handling command line arguments marginally simpler.
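The behavior described above can be sketched with nothing but strstr() and memcpy(). This is an illustration, not SubStrings' source: sketch_extract is a hypothetical name, its parameter order simply mirrors the call shown in the example, and the return value and truncation behavior are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. Copies the text found after the first 'start' and
   before the next 'end' into dest. A NULL 'end' means "up to the end of src".
   Returns 1 on success, 0 if a delimiter is missing (dest is then untouched). */
int sketch_extract(char *dest, size_t cap, const char *start,
                   const char *end, const char *src)
{
    const char *from, *to;
    size_t len;

    assert(dest != NULL && start != NULL && src != NULL);
    if (cap == 0) return 0;

    from = strstr(src, start);
    if (from == NULL) return 0;
    from += strlen(start); /* Skip the opening delimiter itself. */

    to = (end != NULL) ? strstr(from, end) : from + strlen(from);
    if (to == NULL) return 0;

    len = (size_t)(to - from);
    if (len > cap - 1) len = cap - 1; /* Truncate rather than overflow. */
    memcpy(dest, from, len);
    dest[len] = '\0';
    return 1;
}
```

Called with "=" and NULL on "--config=my.conf", this sketch leaves "my.conf" in the buffer. Called with "<b>" and "</b>" on "x<b>hi</b>y", it leaves "hi".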
There are other goodies, like SubStrings.CopyUntil(). Let's take a look.
#include <stdio.h>
#include "substrings/substrings.h"
void MyFunc(void)
{
    const char *const String = "Wibble[END]Nurble[END]Aburble[END]Farts";
    const char *Iter = String;
    char Buf[256];

    while (SubStrings.CopyUntil(Buf, sizeof Buf, &Iter, "[END]", true))
    {
        puts(Buf);
    }
}
This produces the output:
Wibble
Nurble
Aburble
Farts
I actually find myself using CopyUntil and its sister function CopyUntilC quite often. Then there's SubStrings.Line.GetLine(), which is a specialized CopyUntil that helps with processing multi-line C strings.
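The iterator pattern behind a function like this is worth seeing spelled out. Below is a minimal sketch in standard C, not SubStrings' implementation. sketch_copy_until is a hypothetical name, and it leaves out the final boolean argument from the example above, because what that argument controls isn't covered in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. Copies text from *iter up to the next 'delim' into
   dest, then advances *iter past the delimiter. Returns 1 if a token was
   produced and 0 once the input is exhausted. */
int sketch_copy_until(char *dest, size_t cap, const char **iter, const char *delim)
{
    const char *start, *stop;
    size_t len, delim_len;

    assert(dest != NULL && iter != NULL && delim != NULL);
    start = *iter;
    delim_len = strlen(delim);
    if (start == NULL || cap == 0 || delim_len == 0) return 0;

    stop = strstr(start, delim);
    if (stop == NULL)
    {
        len = strlen(start);
        *iter = NULL; /* Last token: nothing left for the next call. */
    }
    else
    {
        len = (size_t)(stop - start);
        *iter = stop + delim_len;
    }

    if (len > cap - 1) len = cap - 1; /* Truncate rather than overflow. */
    memcpy(dest, start, len);
    dest[len] = '\0';
    return 1;
}
```

Because the position lives in the caller's pointer, the source string is never modified, unlike with strtok(). In this sketch, a delimiter at the very end of the input produces one final empty token.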
Lastly, there's another useful function, Split().
#include "substrings/substrings.h"
void MyFunc(void)
{
    const char *String = "Gerbil|Wibble";
    char One[256], Two[256];

    SubStrings.Split(One, Two, "|", String, SPLIT_HALFONE);
}
Now, One[] contains "Gerbil|" and Two[] contains "Wibble". You can specify to discard the split tokens, or put them in half one or two. The options are SPLIT_NOKEEP, SPLIT_HALFONE, and SPLIT_HALFTWO. Because Split() doesn't ask for buffer sizes for the sake of convenience, the way to do it safely is to make sure that both One and Two will be able to hold the entire length of String, if needed.
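If you can't guarantee that both buffers are big enough, the same three token modes are simple to reproduce with explicit capacities. The sketch below is a hypothetical variant written for this post's readers, not part of SubStrings: sketch_split, sketch_copy_n and the SKETCH_ constants are made-up names, and splitting at the first occurrence of the token is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical modes mirroring the three options described above. */
enum SketchSplitMode { SKETCH_NOKEEP, SKETCH_HALFONE, SKETCH_HALFTWO };

/* Copies at most cap - 1 characters of src[0..len) and terminates dest.
   Callers guarantee cap > 0. */
static void sketch_copy_n(char *dest, size_t cap, const char *src, size_t len)
{
    if (len > cap - 1) len = cap - 1;
    memcpy(dest, src, len);
    dest[len] = '\0';
}

/* Splits src at the first occurrence of token. Returns 1 on success and
   0 if the token is absent (both buffers are then left untouched). */
int sketch_split(char *one, size_t cap_one, char *two, size_t cap_two,
                 const char *token, const char *src, enum SketchSplitMode mode)
{
    const char *hit, *second;
    size_t token_len, first_len;

    assert(one != NULL && two != NULL && token != NULL && src != NULL);
    token_len = strlen(token);
    if (cap_one == 0 || cap_two == 0 || token_len == 0) return 0;

    hit = strstr(src, token);
    if (hit == NULL) return 0;

    /* The first half ends before the token, or just after it for HALFONE. */
    first_len = (size_t)(hit - src);
    if (mode == SKETCH_HALFONE) first_len += token_len;

    /* The second half starts at the token for HALFTWO, otherwise after it. */
    second = (mode == SKETCH_HALFTWO) ? hit : hit + token_len;

    sketch_copy_n(one, cap_one, src, first_len);
    sketch_copy_n(two, cap_two, second, strlen(second));
    return 1;
}
```

The trade-off is two extra arguments per call in exchange for not having to size both buffers for the worst case.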
This library is still getting touchups. You can find the GitHub here, and the SubStrings homepage here.
Thoughts, ideas or suggestions? Let me know.
(Score: 0) by Anonymous Coward on Tuesday November 17 2015, @03:59PM
if (SubStrings.StartsWith("--config=", argv[Inc]))
{
SubStrings.Extract(Buf, sizeof Buf, "=", NULL, argv[Inc]);
DoSomething(Buf);
}
How efficient is this? I'd like to see some benchmark comparisons of your library against other options. A quick peek at the code shows StartsWith() calls Length(), Extract() calls Length() and it also calls multiple Find()s, and Find() in turn calls another Length()... Length() is already O(n), so that's quite a few iterations over your entire string!
This is kind of a silly example too, since you already know that if your string starts with "--config=" that your argument will begin at argv[Inc] + sizeof("--config=") - 1. No need for all the Find()-ing that is implicit in using Extract().
(Score: 3, Informative) by Subsentient on Tuesday November 17 2015, @08:28PM
Some of the fancier functions tend to be a little expensive, but stuff like Compare(), Copy(), and Cat() performs very well in gprof. SubStrings does have some overhead from internal function calls, but in general, even advanced operations with SubStrings perform far, far better than Python/Perl/etc equivalents. The example you mention is something I've been doing because I'm lazy and don't like typing repeated string literals. If anyone thinks the performance of argv parsing is that important, bite me. :^)
"It is no measure of health to be well adjusted to a profoundly sick society." -Jiddu Krishnamurti
(Score: 2, Insightful) by https on Tuesday November 17 2015, @11:06PM
"...if the size parameter is given accurately..."
But otherwise, neat.
Offended and laughing about it.
(Score: 1) by Ethanol-fueled on Friday November 20 2015, @05:21AM
Babby questions: in your snippet, why not initialize the variable within the loop? And why use a capital letter?
int Inc = 1;
char Buf[256];
for (; Inc < argc; ++Inc)
(Score: 2) by Subsentient on Friday November 20 2015, @07:56PM
It's a stylistic choice for both. In C, the scope of a variable declared inside a loop ends when the loop does. So, if you plan to use the value of Inc after the loop's exit, it's wise to keep it separate.
"It is no measure of health to be well adjusted to a profoundly sick society." -Jiddu Krishnamurti