The Scrabble Optimization

This article discusses implementing a Scrabble game in C++ and the 3+ orders of magnitude optimizations encountered and lessons learned.

It seems so simple...

As most people know, Scrabble is a word game that takes place on a 15 x 15 board. Each player is given seven random letters, and places them on the board to make words. There are other rules, but that's the gist of it.

Of course, I want to write this in software and have AIs play against each other using massive dictionaries that are beyond the ability of most people to remember, let alone manage.

In theory, then, it seems rather simple — all you need to do is take the seven letters that you have in your “rack” and put them on the board in such a way as to make one or more valid words (it's like a crossword; you can join up with other words) and of course maximize your score (certain letters are worth more than others, certain board positions inflict a letter or word bonus, and if you use up all seven letters, you get 50 more points).

Sounds like a couple of nested for loops, right?

Yes ... but.

Architecture

The first design decision you need to make is, “how am I going to represent everything?” That includes the board, the “bag” (a little black plastic bag that holds 100 letters that you randomly draw from), the rack, and, especially, the dictionary (the universe of “valid” words that you can choose from).

As usual, let's start with the requirements.

If you don't have requirements, you don't know what you're making. If you don't know what you're making, then you're making a mess.

The Bag

The bag of letters is easy. There are 100 letters, with some pre-defined distribution of “A” through “Z”, and two wildcards which are blank — they can take on any letter A through Z but are worth zero points.

The simplest implementation, then, is a simple C++ string that contains 9 As, 2 Bs, 2 Cs, and so on (as per the game's distribution), totalling 100 letters.

The functions you need are pick, which removes a random letter from the bag, give_back which returns a letter back to the bag (for exchanging letters), and, for convenience, score that tells you how much a letter is worth.

We're not even going to look at the code here — it's nothing more than rand, erase, push_back and so on — basic operations.

The Rack

The rack is equally simple — seven letters — I just use a C++ string and some operations; much like the bag, above.

Now, using the rack, that's a different matter.

If I have, for example, ABCDEFG as my seven letters, I can construct all kinds of “words”: A, B, C, D, E, F, G, AB, AC, AD, AE, ... ABCDEFG, ABCDEGF, ABCDFEG, ABCDFGE, ... GFEDCBA.

The number of words I can construct is computed using basic combinatorial math: 7 one letter words, 7 x 6 two letter words, 7 x 6 x 5 three letter words, and so on, for a total of 13,699.

Placing Letters

But that's just the words that can be formed with the letters in the rack, placed one after another! That doesn't take into account the way that letters are actually placed on the board in Scrabble — that is, they can be added after, before, or interspersed amongst other letters on the board. Worse still, when you place a set of letters say above another word, the placed letters must form valid words along the horizontal and vertical. For example, if a part of the board had:

. . . . . . .
. . . . . . .
. W I T H . .
. . . . . . .

And you placed the letters ATE above the T:

. . . . . . .
. . . A T E .
. W I T H . .
. . . . . . .

That would not be valid — sure, the horizontal ATE is good, as is the vertical AT, but the next vertical TH doesn't spell a valid word.

What this means is that every time I place a word, I need to make sure that it's valid. This means keeping track of which letters I placed, and verifying horizontally and vertically, if applicable.

Dictionary

Another important piece of the game is the dictionary. This is simply a list of valid words. As input, it can be an unsorted list of words, one word per line.

As a first pass, I'd sort all the words and then use a binary search to index into them in order to discover if the set of letters that I have generated is a real word.

I have several dictionaries; one is very simple, and contains around 2500 words. The second (much bigger) dictionary contains over 170k words. Doing the math, using a binary search means that I need 12 searches for the 2500 word one, and 18 searches for the 170k word one (12 searches gives me up to 4096, 18 gives me up to 262144 words). We'll keep these numbers in mind as we analyze more about this.

Where to place?

The first decision I need to make is, where can I place letters, and how many? To that end, I simply construct two shadow matrixes -- one tells me the minimum number of letters I can place at any given square horizontally, and the other tells me the same, except vertically.

Thus, if a given element has the value zero, it means that I can't place any letters there. If it had 3, for example, then it means that I'd need to place at least 3 letters, horizontally or vertically (depending on which of the two matrixes it was), in order to connect to an anchor letter.

It turns out, this is very useful for multithreading! (You knew we were going to multithread the crap out of this, right?)

Multithreading

I'll just dive in and show you the driver loop for multithreading, and then we'll discuss the implementation. This is a simplified outline, with some details omitted.

static void
buzz (void)
{
#define MULTITHREADED
#ifdef  MULTITHREADED
  // create a pool of worker threads
  int nthreads = thread::hardware_concurrency();
  if (!nthreads) {
    nthreads = 8;
  }
  vector  threads (nthreads);

  for (auto & t : threads) t = thread (probe_thread);
  for (auto & t : threads) t.join ();
#else // MULTITHREADED
  probe_thread();
#endif  // MULTITHREADED
}

Ok, that wasn't very exciting. That's just the high level demux point that lets me decide if I'm multithreading or just going single thread (for debugging). Here's where the real work is done:

void
probe_thread(void)
{
  // this is the cleanest way to do "for (auto dir : Direction)" :-)
  Direction directions [] = { Horizontal, Vertical };

  for (int major = 0; major < BOARD; major++) {
    for (int minor = 0; minor < BOARD; minor++) {
      for (auto dir : directions) {
        // there might be work
        if (the_board.minimum (dir, major, minor)) {
          // using the mutex, verify that there's actually work
          bool execute = false;
          int thresh = the_board.minimum (dir, major, minor);
          the_mutex.lock ();
            if (the_board.minimum (dir, major, minor)) {
              the_board.minimum_reset (dir, major, minor);
              execute = true;
            }
          the_mutex.unlock ();

          if (execute) {
            Probe p (dir, major, minor, thresh);

            // compare "best" results while protected by mutex
            the_mutex.lock ();
              if (p.best.score > the_best.score) {
                if (debug_scoring) cout << "RESULT: " << p.best.score << " beats " << the_best.score << endl;
                the_best = p.best;
              }
            the_mutex.unlock ();
          }
        }
      }
    }
  }
}

That's a fair bit of code, so let's look at it line by line. The three nested for loops are responsible for visiting each and every square on the board, and for trying both a horizontal and vertical direction.

Direction directions [] = { Horizontal, Vertical };
for (int major = 0; major < BOARD; major++) {
  for (int minor = 0; minor < BOARD; minor++) {
    for (auto dir : directions) {
    }
  }
}

One of the tricks that I latched onto early on in development was to abstract away from the direction — I don't care if I'm going horizontally or vertically, they're both the “same” algorithm. So, in the inner body of the loop, I have a major, minor, and dir that indicates what work I need to be doing.

Each worker thread finds work for itself. It does that by looking at the matrix I discussed above — that is, the “minimum number of letters required” matrix. If the value is zero, or more than the number of letters available in the rack, the thread moves on to the next entry. Otherwise, it instantiates a Probe class which probes the given square in the given direction.

For optimization, we know that if there isn't any work available, we don't have to think further about it — it's not like work might suddenly “appear” in that position. But, if there is work available, we need to grab a mutex and take another look. That's because some other thread may be looking at the exact same position as we are, and it might start working on it. If we both work on it, we're wasting CPU time. So, grabbing the mutex, we verify that there is indeed work. If not, no harm done, move on to the next one.

If there is work available, we reset the position (while we still have the mutex), and make a note of where the work is. We release the mutex, and then go off and do the work.

When the Probe class has completed, we once again lock the mutex, and see if the results that we came up with had a better score than any other thread's results — if so, we store ours, otherwise we ignore them. We then release the mutex.

The important thing here with multithreading is that:

  • we hold the mutex for the absolute minimum amount of time,
  • we use the mutex for exclusive access when contention would be bad, and
  • each thread has all the context it needs in order to do its work.

(We're getting to the optimization stuff. Patience grasshopper!)

Optimizations

In the first version of the game, I permuted all the letters, and then looked them up to see if they happened to form valid words.

Permutations

The permutation is straightforward, actually, I surprised myself with how simple it was:

void
permute (string base, string rest)
{
  for (int i = 0; i < rest.size(); i++) {
    permute (base + rest [i], string (rest).remove (i, 1));
  }
}

That's the gist of it — you go through and select one letter, and then recursively deal with the rest. Given ABC, for example, the outer for loop gets the A and calls permute with BC. That iteration gets the B, and calls permute with C.

This constitutes the permutation ABC.

The innermost permute doesn't have any more letters, so it returns. The middle permute now moves on to C, and calls permute with B.

That constitutes the permutation ACB.

And so on. Of course, someone is actually doing something with the letters as they go by — this was just to show the “guts” of the algo.

So, as mentioned above, I permuted all the letters and then looked to see if they form a valid word. Turns out, that's kind of a dumb idea. If I've already got QK as my first two letters, then no matter what permutations I try, I certainly will not get a valid word, because nothing starts with “QK” :-).

The optimization, therefore, is to see if the permutation-so-far is actually the beginning of a word. If not, then there's no point continuing!

Dictionary

And ... that brings us back, full circle, to the dictionary. As implemented, the dictionary is good for telling us if a given word is, or is not, in the dictionary. It tells us nothing about if the letters that we have so far are a valid prefix for other words in the dictionary.

Do you remember rotary dial telephones? Do you know how they worked, and how the phone network connected calls?

It's a fascinating story, actually.

The rotary telephone has a dial, and you select a digit, put your finger in the hole corresponding to the digit, and move it as far clock-wise as it will go. Then you release. The dial turns back (counter clock-wise), and emits a number of pulses (corresponding to the digit you selected, except 0 which emits ten pulses).

In a world without software, mechanical relays were used for switching. Your phone would start off connected to a relay when you went off hook. When you dialled a number, say 5, the relay would move five positions up, and then would hunt across for the next available relay. Your first relay would stay in that position until you hung up. When you dialled your second digit, say 9, the second relay would move nine positions up, and then hunt across for the next relay. And so on, until you dialled all of the digits. The final relay would connect you to the actual telephone that you had dialled.

In the software world, this is called a trie.

In the context of a dictionary, what we want to do then is look at the first character. Say we're trying to see if “app” is a valid prefix (we know that it is, it forms the words “apple”, “apples”, “appliance”, “application” and so on).

Assume there's a root array, with 26 elements in it. You put the first letter, “A”, into the array, and it gives you a pointer to another array, also with 26 elements in it. You put the second letter, “P”, into that array, and it gives you a pointer to a third array, again with 26 elements in it.

If the letters that you're feeding into this algorithm do indeed form a prefix of a word, then you will be able to traverse this network of arrays. Much like the rotary dial telephone traversed the public switched telephone network.

If, however, you are unable to traverse to the next array (for example, our nasty word “QK”), then you know that it does not form a prefix. Here, we'd look at the “Q” index of the first array, and that would lead us to a second array. You can think of that second array as “all the words that start with the letter Q.” Obviously, looking into that array at the “K” index gives you a NULL pointer — nothing in English starts with “QK”.

Here's the code that constructs the dictionary:

struct Dict
{
  struct Dict *next [26] = { 0 };
  bool terminal = false;
};

static Dict root;

void
load_dictionary (string fname)
{
  Mapfile f (fname);
  const char *end = (const char *) f.ptr() + f.size();
  const char *start = (const char *) f.ptr();
  for (const char *index = start; index < end; index++) {
    if (*index == '\n') {
      // skip comments and blank lines
      if (*start != '#' && start != index) {

        // add word to dictionary
        Dict *d = &root;
        for (; start < index; start++) {
          if (!d -> next [*start - 'a']) {
            d -> next [*start - 'a'] = new Dict;
          }
          d = d -> next [*start - 'a'];
        }
        d -> terminal = true;
      }

      // put start at beginning of next word
      start = index + 1;
    }
  }
}

I have a helper class called Mapfile that takes a filename, mmaps it into memory, and gives me a pointer and a size. Then, I walk through that memory, find the beginnings of words and process them. The innermost for loop walks through the letters in the word, and puts them into the dictionary (creating new entries as required). (We're assuming the input is sanitized and has just lowercase letters; you can add checks of course.)

To test if a word is valid, we have the following:

bool
isvalid (string word)
{
  Dict *d = &root;
  for (auto & c : word) {
    c = tolower (c) - 'a';
    if (!d -> next [c]) return (false);
    d = d -> next [c];
  }
  return (d -> terminal);
}

I can just hear the relays in the central office clicking away! Each letter is used as an index into the 26 element array, and if we can't proceed to the next letter, then it's not a word! If we were able to process all the letters, then it might be a word. We use the terminal bool to tell if that letter is the end of a word. For example, “app” is a word, and so is “apple” — but if I looked up “app”, there are more letters possible. The terminal flag will be set when I encounter “app,” letting me know that that sub-string is a word on its own and is thus ok.

The opposite works too — just getting “flou” isn't enough — it's not a real word. The word “flour” is good, or “flounder,” but not “flou”. Thus, “flou” would have terminal set to false.

Finally, here's the code for isprefix. The implementation is just the same as isvalid, except we're ok with the word just being a prefix, and not a full word.

bool
isprefix (string word)
{
  Dict *d = &root;
  for (auto & c : word) {
    c = tolower (c) - 'a';
    if (!d -> next [c]) return (false);
    d = d -> next [c];
  }
  return (true);
}

Size?

Yes. It's big. On a 64-bit architecture, each pointer is 64 bits. Assuming some padding, each Dict structure is thus 27 x 8 = 216 bytes:

Dictionary Words Characters Dict objects Bytes
small 2,294 16,172 5,183 1.1MB
big 171,855 1,691,993 380,468 82.2MB
(ratio) 74.9X 104.6X 73.4X

Interesting, no? The ratio of object sizes is smaller (albeit slightly) than the number of word sizes.

Comparing objects to words gives a 2.26X ratio for the small dictionary and a 2.21X ratio for the big one.

Speed?

I'm glad you asked — it's the fastest. Whereas a binary search would have to do some number (we had said between 12 and 18, above) of lookups, each of which would require a comparison of at least one letter, this just walks along the letter and does a direct lookup.