Rob Krten's Web Page

Preventing (or at least mitigating) priority inversion

Introduction

In this article, I'll explore the problems of priority inversion by defining the problem, presenting traditional solutions, and then showing how the traditional solutions don't necessarily work well with servers. A solution to the server problem is presented.

The problem

Priority inversion is defined as an interference between two threads at different priorities, with the thread at the lower priority preventing the thread at the higher priority from getting CPU time. There are a number of variations on this theme, which we'll explore as well.

Priority inversion can happen in a number of ways. In the examples below, I'll use three threads, identified as L, M, and H, standing for, respectively, a low priority thread, a mid priority thread, and a high priority thread.

In the first example, L obtains a mutex, which H will need to obtain later. Since L owns the mutex, H cannot run, until it obtains the mutex. In this case, L has prevented H from running (a classical priority inversion). This problem can be exacerbated when M runs. Because M is now running, it's preventing L from running (and rightly so, because M's priority is higher than L's). L, however, cannot get the CPU time that it needs in order to perform whatever operations need to be done in order to release the mutex. Thus, because M is preventing L from getting CPU time, H cannot get CPU time because it's waiting for L to release the mutex, and L is being prevented by M from getting CPU time. We say this is a priority inversion because M is preventing H from running.

In another scenario, L sends a request to H, and H runs (at its higher priority) on behalf of L. In this case, another thread, M, doesn't get CPU time, because H is running. The problem is that L leveraged CPU away from M (by way of H), and thus caused a priority inversion.

Traditional solutions

Traditionally, these problems are solved by priority inheritance.

In the mutex case, above, the priority of L is boosted to match the priority of the highest priority thread that's waiting for the mutex (H in this case). This means that L is now running at the same priority as the highest priority waiting thread, H. Since L is running at the higher priority, M will not get to run, because its priority is lower than the now-boosted L's priority. Granted, this is still a priority inversion (in that L is now running at a higher priority than what it should have been running at), but it prevents a worse priority inversion from happening, namely, it prevents H from being blocked. The assumption is that L is coded to release the mutex as soon as possible; once the mutex is released, L's priority drops to its original value, and H now obtains the mutex and runs.

In the server case above, a similar situation occurs, but with an extra step. When L sent the message to H, in order to prevent a priority inversion, H's priority was dropped to match that of L. This means that M can still run, whereas before it would have been prevented from running by H's higher priority. Let's suppose that M is now indeed running. Let's introduce another thread (X), whose priority is higher than that of M, but lower than H's original priority. X now requests a service from H. Recall that H is running at the lower priority of L (in this case, it means that H is in fact not running, because M has a higher priority than L, and M is running). Here we once again encounter a priority inversion problem, in that X is prevented from having its request completed by H because M is running. The priority inversion problem stems from the fact that M is lower priority than X, and a lower priority thread should not prevent a higher priority thread from running. The way this is solved is similar to the mutex case — the priority of H is boosted to be the same as the priority of the highest waiting client (in this case X). This means that H is now running at the priority of X, but is performing services on behalf of L. Again, as with the mutex, this is a slight priority inversion, but it prevents a worse priority inversion from happening, namely, it prevents X from being blocked by M.

In both cases above, the priority inversion was solved by changing the priority of a thread. This in itself is a slight priority inversion, but the reason it's considered acceptable is because it prevents an even worse priority inversion from happening. In effect, when the priority of a low priority thread is raised, it's an attempt to clear out the blocking condition.

Not an ideal solution

Focusing on the server case, this still isn't an ideal solution.

In a typical server, the problem with priority inversions (and the traditional solutions) is that the operations requested of the server might take a considerable amount of time.

Let's take the example of a filesystem server (H). L requests a write() of 1 gigabyte. Immediately thereafter, M requests a read() of 256 bytes.

The server, H, goes through the steps outlined above. It drops its priority to match that of L, and begins the procedure of writing out 1 gigabyte of data. When M's request arrives, H's priority is raised to match the highest priority of all waiting clients, so H gets its priority boosted to the same priority as M. Now H is busy writing 1 gigabyte of data at priority M.

The first thing that's wrong with this approach is that H is now running at priority M, causing anything below priority M to be starved of CPU. M has inadvertantly allowed L to leverage CPU away from any threads running at a priority of less than M. Since L's request will take a substatial amount of time to complete, this means that a substantial priority inversion has now occurred. Not only that, but because of the size of L's request, all other clients of H are waiting for H to complete L's work. This is a slightly more subtle form of priority inversion, in that it's the nature of L's work that is preventing clients of H from getting their work done.

This problem too has a "traditional" solution. Generally, a server will have multiple threads within it, each ready to service clients. Thus, when L sends its request for a 1 gigabyte write(), a thread within the server is assigned to handle the request (and, as you would expect, that thread is lowered in priority to match the priority of L). Therefore, when M's request for the 256 byte read() arrives, another thread within the server is assigned to handle that request (and its priority is set to match M's priority). Since M is a higher priority than L, the 256 byte read() operation proceeds at a higher priority and gets satisfied before L's 1 gigabyte write() is completed.

While this might seem to solve everything, there remain two problems:

in a filesystem, caches are involved, so while L's writing operation may indeed proceed at L's priority, all it's doing is filling the cache at L's priority — when it comes time to flush the data to disk, this may be done at a completely different priority (i.e., there might not be any association between cache blocks and their respective priorities).
there are only a finite number of threads in the server; once all of the threads are busy servicing clients, new clients will block until a thread becomes available. This is partially alleviated by boosting the priority of the existing threads to match that of the highest priority waiting clients.

The priority queueing solution

Coupled with the above problems, there are two other problems with the current design of certain servers:

they are not sensitive to the nature of the requests,
they don't track priorities all the way through to the completion of the request

These problems aren't relevant to all servers; only servers that have a varying amount of work that can be done on behalf of a client, and ones where additional work may need to take place on behalf of the client.

The solution to this involves complexity in the server design. At the highest level, it means that the server needs to look at the nature of the request, and parcel out large requests into smaller, more manageable chunks. Secondly, the server needs to track these requests through to their completion.

In a filesystem, this would mean that when a client requested a 1 gigabyte write(), the filesystem would note the priority of the client, and packetize the client's request into much smaller chunks — say 16 thousand 64k writes (these don't need to be stored as 16 thousand individual data structures!). Then, this packetized request structure would be enqueued onto a priority-based queue, and the client-facing side of the filesystem server would be done with that particular request, and be able to service other requests (the client's request, however, is not yet completed, it's merely queued).

As other requests arrive, they too are packetized (if required) and stored on the priority-based queue. The packetization is in place for minimizing contention to shared resources. For example, if a mutex must be acquired before a write() and held for the entire time of the write(), then it's best to do the write() in smaller chunks, allowing the mutex to be released between chunks. For some servers, this packetization may not be necessary.

The priority-based queue is an array with lists of requests. The number of array elements is equal to the number of priorities in the operating system (64 in the current version of QNX/Neutrino, for example). Each priority has a service thread associated with it (running at the associated priority), which is blocked, waiting for a request to arrive on its list. When a request arrives, the thread is unblocked. Since each thread is operating at a different priority, this scheme effectively preserves the priority of the request from the client right through to the servicing of the requests. In this way, apart from short contentions for mutual exclusion (like obtaining a mutex to access a data structure), all service requests will proceed at the same priority as the client.

Now, in our example above, we need only one thread on the client-facing side of the filesystem, since that thread's job is simply to enqueue the request. This means that L's 1 gigabyte write() request was enqueued onto the priority queue at L's priority, and M's 256 byte read() request was enqueued onto the priority queue at M's priority. Two threads in the filesystem (FS-L, and FS-M, at L's and M's priority) are now ready, and are servicing the requests. This means that FS-L is filling up the filesystem cache with data from L, and FS-M is requesting a 256 byte read() from disk. However, the second part of the solution is that the cache blocks are tagged with a priority. This means that even though FS-L may be filling cache blocks at a fast rate, because FS-M has requested a read() from disk (to fill a cache block), priority is given to M's request. The additional complication is that the filesystem has to keep track of the requests' priorities, and handle them in priority order.

It may seem tempting to optimize the situation with the individual priority-based threads so that there is only one thread running at a high priority which scans the queue. The problem with this is that this one thread will need to constantly monitor the queue for higher priority requests as it's servicing lower priority ones. While this isn't difficult, it does add extra processing overhead. Another, more significant, problem would be that ideally we want all processing for a priority N client done at priority N — running a high priority thread over the queues means that this goal isn't met, unless we modified the priority of the thread. Unfortunately, that's not possible because if we lowered the priority of the thread, it may now become pre-empted by some other thread in the system, which would throw the whole scheme into disarray. A possible optimization would be to use thread pools for the priority array. This would be useful if the actual required number of threads was less than the number of priorities (as might be the case if there were few clients, or only a few priorities were in use, or a very large number of priority levels).

Conclusions

While this approach introduces additional complexities (requests must be queued) and resource requirements (many more threads are required), it solves the problem of priority inversion in servers by tracking the priorities of clients' requests from beginning to end, and allowing large requests to be packetized. Note that this approach is recommended only for servers where long or resource intensive operations are occuring; for example, it makes no sense to do this for a server that completes all requests almost instantly, and all requests are the same length. For lengthy, mutually exclusive atomic operations, it may be necessary to provide a "backout" scheme, where partially completed requests can be backed out and retried or aborted.