Mutex madness?

I recently received an interesting kernel trace from a customer. It showed several threads going from STATE_RUNNING to STATE_MUTEX without an intervening SyncMutexLock kernel call. He wondered if perhaps the trace was corrupted.

When I looked at the trace it seemed sane, but the behaviour it showed was somewhat puzzling. It is explainable, however, and it has to do with our implementation of mutexes.

No kernel calls?

One of the first things you must realize, when looking at a QNX kernel trace, is that a pthread_mutex_lock() doesn’t always result in a SyncMutexLock() kernel call. This is because we implement an optimisation where an uncontested mutex can be acquired without entering the kernel. Likewise, unlocking a mutex on which no one is waiting will not result in a SyncMutexUnlock() kernel call. Both fast paths are implemented using atomic compare-and-exchange operations, and how those are implemented depends on which CPU you are using. Take a look at $QNX_TARGET/usr/include/<cpu>/smpcmp.h if you are interested in the various ways of doing this.
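To make the fast path concrete, here is a minimal sketch using C11 atomics. It is not the QNX implementation (that is CPU-specific and lives in the header mentioned above); the lock-word layout, the SKETCH_WAITERS bit and all sketch_* names are assumptions invented for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: 0 means free, otherwise the owner's id, with the top bit
 * set when at least one thread is blocked waiting for the mutex. */
#define SKETCH_WAITERS ((uint_fast32_t)1 << 31)

typedef struct {
    atomic_uint_fast32_t word;
} sketch_mutex_t;

/* Fast-path lock: succeeds only if the mutex is free AND nobody is waiting.
 * A false return means the caller would have to make the kernel call. */
bool sketch_try_fast_lock(sketch_mutex_t *m, uint_fast32_t self)
{
    uint_fast32_t expected = 0;
    return atomic_compare_exchange_strong(&m->word, &expected, self);
}

/* Fast-path unlock: succeeds only if we own the mutex and the waiters bit is
 * clear. With waiters present the exchange fails, and the caller would have
 * to enter the kernel so that a waiter can be made ready. */
bool sketch_try_fast_unlock(sketch_mutex_t *m, uint_fast32_t self)
{
    uint_fast32_t expected = self;
    return atomic_compare_exchange_strong(&m->word, &expected, 0);
}
```

In this sketch, a lock or unlock that returns true never touches the kernel, which is why it would leave no kernel call in a trace.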

Not our man though…

State changes with no visible kernel call might therefore point us in that direction, but here we are seeing a change to STATE_MUTEX, which indicates a contested mutex, so the uncontested fast path cannot be the explanation.

The next clue, though, is that we ARE, in fact, seeing these state changes while in the kernel. The kernel call in progress was a MsgReceivev(). This led me to the answer.

When we unlock a mutex that other threads are waiting on, we call SyncMutexUnlock(), which checks a priority-ordered list of waiting threads. It removes the first thread from the list and marks it runnable (i.e. it transitions to STATE_READY). It also sets a thread flag, _NTO_TF_ACQUIRE_MUTEX, which indicates that the thread should acquire the mutex it was waiting on when it exits the kernel. The address of the mutex was saved in the thread's kernel call arguments when it called SyncMutexLock(), some time in the past.

The point is, though, that the thread has not yet acquired the mutex. This can lead to some interesting behaviour.

In the log that was sent to me, the holding thread unlocked the mutex and then immediately tried to relock it. The unlock makes a waiting thread ready. SyncMutexUnlock() is not a blocking call, though, so the unlocking thread remains running.

At this point, no one holds the lock, although the lock indicates that threads are waiting on it.

The following call to SyncMutexLock() detects that there are threads waiting, and readies another thread (it has no knowledge that there was already a thread readied) to attempt to acquire the mutex. It then blocks the calling thread, and at this point the first thread that wanted to acquire the mutex will be made running.

Time passes, and at some point the second thread that was marked ready may actually be made running. As it exits the kernel, the kernel exit processing notices that the acquire flag is set on the thread, and attempts to acquire the mutex. Of course it is still held by the earlier thread, and thus we need to block this thread again and choose another. In the trace this shows up as a thread going from STATE_RUNNING to STATE_MUTEX with no SyncMutexLock() call of its own in between.
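The sequence can be replayed with a toy model in C. This is only a sketch of the steps as described in this post, not kernel source: every type and function name is hypothetical, the acquire_flag field merely stands in for _NTO_TF_ACQUIRE_MUTEX, priority ordering is reduced to array order, and model_lock() covers only the contested path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum state { ST_RUNNING, ST_READY, ST_MUTEX };

struct thread {
    const char *name;
    enum state  state;
    bool        acquire_flag;   /* stands in for _NTO_TF_ACQUIRE_MUTEX */
};

#define MAX_WAITERS 4

struct mutex {
    struct thread *owner;                 /* NULL when nobody holds it */
    struct thread *waiters[MAX_WAITERS];  /* index 0 is the first waiter */
    size_t         nwaiters;
};

/* Remove the first waiter, mark it ready, and flag it to acquire the mutex
 * when it later exits the kernel. It does NOT own the mutex yet. */
void ready_first_waiter(struct mutex *m)
{
    if (m->nwaiters == 0)
        return;
    struct thread *t = m->waiters[0];
    for (size_t i = 1; i < m->nwaiters; i++)
        m->waiters[i - 1] = m->waiters[i];
    m->nwaiters--;
    t->state = ST_READY;
    t->acquire_flag = true;
}

/* Put a thread (back) on the wait list in STATE_MUTEX. */
void block_on(struct mutex *m, struct thread *t)
{
    assert(m->nwaiters < MAX_WAITERS);
    m->waiters[m->nwaiters++] = t;
    t->acquire_flag = false;
    t->state = ST_MUTEX;
}

/* Unlock with waiters present: release, ready one waiter, caller keeps running. */
void model_unlock(struct mutex *m, struct thread *caller)
{
    assert(m->owner == caller);
    m->owner = NULL;
    ready_first_waiter(m);
}

/* Contested lock: ready another waiter, then block the caller. */
void model_lock(struct mutex *m, struct thread *caller)
{
    ready_first_waiter(m);
    block_on(m, caller);
}

/* The thread has been chosen to run and is leaving the kernel. Returns true
 * if it really leaves running, false if it had to be blocked again. */
bool model_kernel_exit(struct mutex *m, struct thread *t)
{
    t->state = ST_RUNNING;
    if (!t->acquire_flag)
        return true;
    if (m->owner == NULL) {
        m->owner = t;
        t->acquire_flag = false;
        return true;
    }
    block_on(m, t);   /* RUNNING -> MUTEX with no lock call by this thread */
    return false;
}
```

Starting with a holder and two waiters, calling model_unlock() and model_lock() for the holder, then model_kernel_exit() for the first and second waiter in turn, leaves the first waiter owning the mutex and sends the second one straight from running back to the mutex-blocked state.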


And there you have it. I’m sure it’s all a little odd without better knowledge of how the kernel is implemented. The good news is that you can check out the source and see for yourself. Also don’t forget to check out the evolving docs on the project wiki page.

The moral of this story is to not get too upset by multiple thread state changes while we are within the kernel. The kernel can change its mind several times about which thread will ultimately make it out of the kernel and become running. The last state change before you see a kernel exit is the one to believe.
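If you post-process traces yourself, that rule of thumb is easy to encode. The sketch below assumes a simplified, made-up event record (struct ev and its fields are hypothetical and are not the real trace event format); it scans forward from a given index and returns the last state recorded for a thread before the next kernel exit.

```c
#include <assert.h>
#include <stddef.h>

enum ev_kind { EV_KER_ENTER, EV_KER_EXIT, EV_STATE };

/* Hypothetical, simplified trace record. */
struct ev {
    enum ev_kind kind;
    int          tid;    /* thread the event refers to */
    int          state;  /* new state; meaningful only for EV_STATE */
};

/* Return the last state recorded for `tid` between index `from` and the next
 * EV_KER_EXIT, or -1 if that window holds no state change for the thread. */
int settled_state(const struct ev *evs, size_t n, size_t from, int tid)
{
    int last = -1;
    for (size_t i = from; i < n; i++) {
        if (evs[i].kind == EV_KER_EXIT)
            break;
        if (evs[i].kind == EV_STATE && evs[i].tid == tid)
            last = evs[i].state;   /* later changes override earlier ones */
    }
    return last;
}
```

Intermediate changes inside the window are deliberately ignored; only the final one is reported.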


