RTOS debugging, part 4: Priority inversion – when the important stuff has to wait
October 10, 2017
The central idea underlying an RTOS with a fixed-priority scheduler is that a high-priority task should be scheduled ahead of one with lower priority. If necessary, the RTOS can even pre-empt the running task, forcing it to yield the CPU to a higher priority task. Yet, as a developer you have to watch out for programming pitfalls that can result in a higher priority task having to wait for a lower priority task – this condition is known as priority inversion.
Priority inversions can occur in conjunction with a mutex, message queue or other type of synchronization object. The best way to describe the problem is probably to step through an example.
In the timeline diagram below, captured with Tracealyzer, we have a low-priority task (green) executing. It takes a binary semaphore to protect some critical section and continues to execute code within the critical section. When the high-priority task (red) enters the ready state, the RTOS pre-empts the green task and lets the red run. The red task tries to grab the same binary semaphore, but is blocked as the low-priority green task is holding it.
So far, everything is fine – this is expected behavior. In general, the green task would now run and quickly release the semaphore, at which time it is again pre-empted and the red task can obtain the semaphore and proceed. This time, however, an inversion occurs instead. For some reason, maybe a timed wait that has expired, a medium-priority (orange) task has entered the ready state and is allowed to execute ahead of the green task. As the orange task has no knowledge of the contested semaphore, it happily runs to completion. Only then does the green task finally run so that it can release the semaphore and hand over execution to the red, high-priority task.
So, the high-priority task was blocked and had to wait for an indeterminate time while a medium-priority task ran to completion. That is priority inversion at work.
It is important to realize that the three tasks involved here were essentially helpless. None of them could have done anything to avoid the inversion, at least not without some support from the RTOS. Luckily, such support is available in many RTOSs in the form of mutexes with priority inheritance. A mutex (short for Mutual Exclusion) is a kind of semaphore intended for protecting a shared resource. Priority inheritance means that if a high-priority task blocks while attempting to obtain a mutex that is currently held by a lower-priority task, then the priority of the task holding the mutex is temporarily raised to that of the blocked task. In our scenario, when the red task was blocked the green task would have been elevated to the red task's priority, preventing the orange task from running ahead of it.
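To make the effect concrete, here is a minimal sketch that replays the green/orange/red scenario in a toy, tick-based simulation of a fixed-priority scheduler. Everything in it is an assumption made for illustration: the `SimTask` type, the function `red_finish_tick`, and the release times and workloads are hypothetical and do not come from any RTOS API or from the trace described above. Each lock-using task is assumed to hold the lock for its whole run.

```c
#include <assert.h>
#include <stdbool.h>

enum { GREEN, ORANGE, RED, NUM_TASKS };

/* Hypothetical task model for this simulation only. */
typedef struct {
    int  base_prio;     /* assigned priority, higher number = more urgent */
    int  prio;          /* effective priority, may be raised by inheritance */
    int  release_tick;  /* tick at which the task becomes ready */
    int  work_left;     /* ticks of CPU time still needed */
    bool uses_lock;     /* task holds the shared lock for its whole run */
    bool blocked;       /* waiting for the lock */
} SimTask;

/* Returns the tick at which the red (high-priority) task finishes. */
int red_finish_tick(bool priority_inheritance)
{
    SimTask t[NUM_TASKS] = {
        [GREEN]  = { 1, 1, 0, 3, true,  false },
        [ORANGE] = { 2, 2, 2, 5, false, false },
        [RED]    = { 3, 3, 1, 2, true,  false },
    };
    int owner = -1; /* index of the task holding the lock, -1 if free */

    for (int tick = 0; tick < 100; tick++) {
        for (;;) {
            /* Pick the ready task with the highest effective priority. */
            int run = -1;
            for (int i = 0; i < NUM_TASKS; i++) {
                bool ready = t[i].release_tick <= tick &&
                             t[i].work_left > 0 && !t[i].blocked;
                if (ready && (run < 0 || t[i].prio > t[run].prio))
                    run = i;
            }
            if (run < 0)
                break; /* nothing ready: idle tick */

            if (t[run].uses_lock && owner != run) {
                if (owner >= 0) {
                    /* Lock is taken: block, optionally boost the owner,
                       then let the scheduler pick another task. */
                    assert(t[owner].work_left > 0);
                    t[run].blocked = true;
                    if (priority_inheritance && t[owner].prio < t[run].prio)
                        t[owner].prio = t[run].prio;
                    continue;
                }
                owner = run; /* lock was free: take it */
            }

            t[run].work_left--; /* run the chosen task for one tick */
            if (t[run].work_left == 0) {
                if (owner == run) {
                    /* Release the lock, drop any inherited priority,
                       and wake the waiters. */
                    owner = -1;
                    t[run].prio = t[run].base_prio;
                    for (int i = 0; i < NUM_TASKS; i++)
                        t[i].blocked = false;
                }
                if (run == RED)
                    return tick + 1;
            }
            break;
        }
    }
    return -1; /* red never finished within the simulated window */
}
```

With these made-up numbers, `red_finish_tick(false)` returns 10 because the red task waits for the orange task's five ticks, while `red_finish_tick(true)` returns 5 because the green task inherits the red task's priority and releases the lock before the orange task gets to run. On a real system you would use the RTOS's own mechanism rather than anything like this. For instance, FreeRTOS mutexes created with `xSemaphoreCreateMutex()` include priority inheritance while its binary semaphores do not, and POSIX offers the `PTHREAD_PRIO_INHERIT` mutex protocol.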
Priority inheritance does not really cure priority inversion; it just minimizes its effect in some situations. Hard real-time applications should still be carefully designed so that priority inversion does not happen in the first place.
Generally, avoid blocking on shared resources whenever possible. For example, if your task writes data to a message queue that might become full, you could instead use a queue large enough that it never fills up and, as an extra precaution, write in a non-blocking manner and check the return value for failed writes. And instead of scattering multiple critical sections (sharing a mutex) all over the code, you can create a “server” task that performs all direct operations on the resource and takes requests from “client” tasks via a message queue, in a non-blocking manner. The server can send any replies via other message queues, specified in the requests, that are owned by the client tasks.
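The two suggestions above can be sketched in plain C as follows. This is a single-threaded illustration under stated assumptions, not an RTOS implementation: the `Queue` and `Message` types and the functions `queue_try_send`, `queue_try_receive` and `server_poll` are hypothetical names, and a real queue would come from your RTOS and be safe to use across tasks and interrupts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 8  /* assumed size; pick one that never fills in practice */

typedef struct Queue Queue;

typedef struct {
    int    value;     /* request payload, or result in a reply */
    Queue *reply_to;  /* client-owned queue for the reply, NULL if none wanted */
} Message;

struct Queue {
    Message slots[QUEUE_CAPACITY];
    size_t  head;   /* index of the oldest message */
    size_t  count;  /* number of messages stored */
};

/* Non-blocking write: returns false at once if the queue is full. */
bool queue_try_send(Queue *q, Message m)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAPACITY)
        return false;
    q->slots[(q->head + q->count) % QUEUE_CAPACITY] = m;
    q->count++;
    return true;
}

/* Non-blocking read: returns false at once if the queue is empty. */
bool queue_try_receive(Queue *q, Message *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0)
        return false;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return true;
}

/* The shared resource. Only the server touches it, so no mutex is needed. */
static int shared_counter;

/* One pass of the server: handle every pending request and reply to the
   queue named in each request. Returns the number of replies that could
   not be delivered because a client's reply queue was full. */
int server_poll(Queue *requests)
{
    int undelivered = 0;
    Message req;

    while (queue_try_receive(requests, &req)) {
        shared_counter += req.value;  /* the only direct access to the resource */
        if (req.reply_to != NULL) {
            Message reply = { shared_counter, NULL };
            if (!queue_try_send(req.reply_to, reply))
                undelivered++;        /* never block on a slow client */
        }
    }
    return undelivered;
}
```

A client would build a `Message` with its own reply queue in `reply_to`, call `queue_try_send` on the server's request queue, and treat a `false` return as an error to log or handle rather than something to wait on. In an actual RTOS the server task would typically sleep on its request queue while idle, which is harmless because no other task is waiting for a resource it holds. The equivalent non-blocking write in FreeRTOS, for example, is `xQueueSend()` with a zero block time, checking the return value.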