C Strings without the OO Adventure

Often, I need a place to create a string, and then have the string go away when I no longer need it.

What I mean is something like:

char *
yymmdd (time_t now)
{
    static char buf [7];
    struct tm *tm;

    tm = localtime (&now);
    strftime (buf, sizeof (buf), "%y%m%d", tm);
    return (buf);
}

Easy peasy. Granted, the string doesn't "go away," but I certainly don't allocate more of them. But of course, you run into this:

printf ("Date range %s to %s\n", yymmdd (start), yymmdd (end));

Oopsy. (In case you missed it: since buf is a static variable in yymmdd(), both calls return the same pointer, and whichever call runs second overwrites the result of the first, meaning that both dates print out the same.)
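If you'd rather see the aliasing than take my word for it, here's a minimal sketch. The names yymmdd_static() and yymmdd_static_aliases() are made up for this illustration; the first is just the function above under another name, and the second only compares the two pointers it gets back.

```c
#include <time.h>

/* The single-static-buffer version from above, renamed for this sketch. */
char *
yymmdd_static (time_t now)
{
    static char buf [7];
    struct tm *tm;

    tm = localtime (&now);
    strftime (buf, sizeof (buf), "%y%m%d", tm);
    return (buf);
}

/* Hypothetical helper: returns 1 when both calls hand back the same storage. */
int
yymmdd_static_aliases (time_t start, time_t end)
{
    char *a = yymmdd_static (start);
    char *b = yymmdd_static (end);

    return (a == b);
}
```

The helper returns 1 no matter which two times you give it, which is exactly why the printf() above can never show two different dates.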

This has traditionally been solved by allocating storage in the yymmdd() function, and then putting the onus on the caller to free() the string when they are done with it. I'm thinking, "bad idea."
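For the record, a sketch of that traditional approach might look something like the following. The name yymmdd_alloc() and the NULL-on-failure convention are my assumptions for illustration, not anybody's official API.

```c
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical malloc()-based variant. The caller owns the returned
 * string and must free() it. Returns NULL if anything goes wrong.
 */
char *
yymmdd_alloc (time_t now)
{
    struct tm *tm;
    char *buf;

    buf = malloc (7);                   /* six digits plus the NUL */
    if (buf == NULL) {
        return (NULL);
    }
    tm = localtime (&now);
    if (tm == NULL || strftime (buf, 7, "%y%m%d", tm) == 0) {
        free (buf);
        return (NULL);
    }
    return (buf);
}
```

Notice what this does to the caller: the one-line printf() now needs two temporaries, two NULL checks, and two free() calls, and forgetting any of them is a leak or a crash.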

Other times, it's solved by having the caller pass a buffer:

...
    char buf1 [HOW_BIG?], buf2 [HOW_BIG?];

    printf ("Date range %s to %s\n", yymmdd (buf1, start), yymmdd (buf2, end));
...

char *
yymmdd (char *buf, time_t now)
{
    struct tm *tm;

    tm = localtime (&now);
    strftime (buf, what_size_value_to_put_here?, "%y%m%d", tm);
    return (buf);
}

But this has two problems: how big a buffer should the caller allocate, and what size do I pass to strftime()? You could guess the size, and add another parameter to yymmdd() to pass it along. Yuck! Our beautiful little function yymmdd() has sprouted horns! (But not the good *BSD horns, either!)
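Horns and all, the extra-parameter version might be sketched like this. YYMMDD_SIZE, yymmdd_buf(), and the NULL return for a too-small buffer are names and conventions I'm assuming for the example.

```c
#include <stddef.h>
#include <time.h>

#define YYMMDD_SIZE  7      /* hypothetical constant: six digits plus the NUL */

/*
 * Hypothetical caller-supplied-buffer variant. Returns buf on success,
 * or NULL if the time can't be converted or the buffer is too small.
 */
char *
yymmdd_buf (char *buf, size_t size, time_t now)
{
    struct tm *tm;

    tm = localtime (&now);
    if (tm == NULL || strftime (buf, size, "%y%m%d", tm) == 0) {
        return (NULL);
    }
    return (buf);
}
```

The caller would declare `char buf1 [YYMMDD_SIZE], buf2 [YYMMDD_SIZE];` and pass `sizeof (buf1)` and `sizeof (buf2)` along. It works, but every call site now carries a buffer, a size, and a constant it has to know about.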

OO languages solve this with "garbage collection" (GC). For me, that's just too much work, and it's not predictable in terms of realtime runtime performance.

Statically allocated strings are generally considered a poor solution, but they are even worse when you are dealing with a multi-threaded program. In this case, two threads can call yymmdd() simultaneously and trip all over each other's use of the static buffer. A mutex doesn't help, kids! :-) (The caller is still reading the buffer long after yymmdd() has released the lock.)

While it would be nice to come up with a "general purpose solution" (and I do generally try to do that), in this case the "use case" is that there are only "a few" instances of the string lying around at any one time. Here's the patch:

#define N_STRINGS  4

char *
yymmdd (time_t now)
{
    static char buf [N_STRINGS][7];
    static int rotator = 0;
    struct tm *tm;
    char *ptr;

    /* Grab the next slot, then wrap the rotator back to the start. */
    ptr = buf [rotator++];
    if (rotator >= N_STRINGS) {
        rotator = 0;
    }
    tm = localtime (&now);
    strftime (ptr, sizeof (buf [0]), "%y%m%d", tm);
    return (ptr);
}

I'm back to "Easy peasy." Obviously, if you're in a multithreaded environment, you'll want to put a mutex around the whole thing (if you want to be cheesy), or rework it so that the critical section is minimized:

#define N_STRINGS  4

char *
yymmdd (time_t now)
{
    static char buf [N_STRINGS][7];
    static int rotator = 0;
    static pthread_mutex_t yymmdd_mutex = PTHREAD_MUTEX_INITIALIZER;
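    struct tm tm;
    char *ptr;

    /* Only the slot selection needs the lock. */
    pthread_mutex_lock (&yymmdd_mutex);
        ptr = buf [rotator++];
        if (rotator >= N_STRINGS) {
            rotator = 0;
        }
    pthread_mutex_unlock (&yymmdd_mutex);
    /* localtime() shares one static struct tm among all threads, */
    /* so use the reentrant localtime_r() with our own struct tm. */
    localtime_r (&now, &tm);
    strftime (ptr, sizeof (buf [0]), "%y%m%d", &tm);
    return (ptr);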
}

Ta da! Obviously, the use-case will dictate the value of N_STRINGS. (Yes, the indentation of the mutex-locked area is intentional, and yes, testing rotator with ">=" rather than "==" is unnecessary, but I call it "defensive programming": it can't "run away" on you.)

There are those who will complain, "but this uses a lot of string space for nothing!" Suck it up, buttercup. In your worst-case system design, with the maximum number of threads (or whatever) contending for this resource at once, you will use as much string space (or more) as this implementation. It's up to you to select the value of N_STRINGS to be appropriate to your worst-case situation. The only time this solution will be worse than a GC one is if you have large data areas that are mutually exclusive. In that case, yes, our worst-case memory usage is guaranteed to be the size of all data areas, whereas in a GC situation it may be smaller.
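What "appropriate to your worst case" means in practice: a returned pointer stays good only until yymmdd() has been called N_STRINGS more times. Here's a minimal single-threaded sketch of that; yymmdd_rot() is a name I made up so the example can stand alone, and the empty-string fallback for a failed localtime() is my addition.

```c
#include <time.h>

#define N_STRINGS  4

/* Single-threaded rotating-buffer version, renamed for this sketch. */
char *
yymmdd_rot (time_t now)
{
    static char buf [N_STRINGS][7];
    static int rotator = 0;
    struct tm *tm;
    char *ptr;

    ptr = buf [rotator++];
    if (rotator >= N_STRINGS) {
        rotator = 0;
    }
    tm = localtime (&now);
    if (tm == NULL) {
        ptr [0] = '\0';                 /* assumption: empty string on failure */
        return (ptr);
    }
    strftime (ptr, sizeof (buf [0]), "%y%m%d", tm);
    return (ptr);
}
```

With N_STRINGS set to 4, the next three calls after the first each return a different slot, and the fourth hands the first slot back out and overwrites it. If a caller can hold on to more than N_STRINGS results at once, N_STRINGS is too small.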

Hacks

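It has been pointed out that the rotator logic could be simplified. There are a number of ways of doing this:

1. Make N_STRINGS a power of two, and mask the rotator instead of testing and resetting it.
2. Make N_STRINGS exactly 256, and make rotator an 8-bit unsigned variable, so that it wraps around all by itself.

The second one is a very special case, and does lead to some readability problems (i.e., if you don't analyze the code in depth, you'll be wondering why it doesn't go off to elements 256, 257, 258, etc.). The first one is easily done via: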

#define N_STRINGS_POWER     4 // Only ever adjust this number
#define N_STRINGS           (1 << N_STRINGS_POWER)
#define N_STRINGS_MASK      (N_STRINGS - 1)

And then the relevant code part becomes:

ptr = buf [rotator++ & N_STRINGS_MASK];

which, granted, does look a lot nicer. Notice that we allow rotator to overflow; we really don't care about that, so long as the resulting index is within range. (Do declare rotator as unsigned for this, though: unsigned arithmetic wraps around by definition, whereas overflowing a signed int is undefined behavior in C. And if you modify this to not be a power-of-2, and instead use the modulo operator, you are in for a world of grief when rotator hits its integer-size boundary, because it doesn't align nicely with the modulo-imposed boundary. Good luck with that.) The power-of-2 solution does limit your tunability: you can only tune to a power of two, which isn't a big deal for small values but can be overkill for larger ones.
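Putting the masked version together, here's a minimal single-threaded sketch. I've assumed a uint32_t rotator (so the wraparound is well defined), a power of 2 instead of 4 to keep it small, and two made-up names, next_slot() and yymmdd_masked(); the slot selection is pulled out into its own function only so that the wraparound is easy to poke at.

```c
#include <stdint.h>
#include <time.h>

#define N_STRINGS_POWER     2 // Only ever adjust this number
#define N_STRINGS           (1u << N_STRINGS_POWER)
#define N_STRINGS_MASK      (N_STRINGS - 1)

/* Hypothetical helper: returns the current slot and advances the rotator. */
unsigned
next_slot (uint32_t *rotator)
{
    return ((*rotator)++ & N_STRINGS_MASK);
}

/* Masked rotating-buffer version; not thread-safe as written. */
char *
yymmdd_masked (time_t now)
{
    static char buf [N_STRINGS][7];
    static uint32_t rotator = 0;        /* unsigned, so overflow just wraps */
    struct tm *tm;
    char *ptr;

    ptr = buf [next_slot (&rotator)];
    tm = localtime (&now);
    if (tm == NULL) {
        ptr [0] = '\0';                 /* assumption: empty string on failure */
        return (ptr);
    }
    strftime (ptr, sizeof (buf [0]), "%y%m%d", tm);
    return (ptr);
}
```

Start the rotator at UINT32_MAX - 1 and the mask hands out slots 2, 3, 0, 1 straight across the overflow. Swap the mask for "% 3" and the same four rotator values give 2, 0, 0, 1, so the slot you were just handed gets reused on the very next call. That's the world of grief.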