Mutex or Semaphore for Performance?
Someone who was porting code over to Neutrino asked me the other day which synchronization primitive to use in a "semaphore" callout that the software required in order to provide mutual exclusion for its data structures.
The developer’s initial thought was to use a semaphore, since that is what the name of the callout implied and other ports had used named semaphores. QNX/POSIX does have semaphores, both named and un-named. But the comment describing the callout said it was there to provide mutual exclusion, which suggested that, despite the naming, a mutex would be a better choice for performance. That is when he asked me about it.
A named semaphore trades performance for API convenience* (a pathname-style location): every operation goes through a resource manager (procnto in later versions of Neutrino, mqueue in earlier versions) that serializes access to the semaphore count. While operationally it is the same as any other semaphore, this extra messaging imposes an extra bit of overhead.
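As a rough illustration of the pathname-style API, here is a minimal sketch of a named semaphore used as a binary lock. The name `/demo_binary_sem`, the `shared_counter` variable and the `demo_named_*` helpers are all made up for this example; only the `sem_*` calls are standard POSIX.

```c
#include <fcntl.h>      /* O_CREAT */
#include <semaphore.h>
#include <sys/stat.h>   /* permission bits */

/* Hypothetical name: any process that knows it can open the same semaphore. */
#define DEMO_SEM_NAME "/demo_binary_sem"

static int shared_counter = 0;   /* stand-in for the protected data */

/* Open (creating if needed) a named semaphore with an initial count of 1,
 * so it behaves like a binary semaphore. Returns SEM_FAILED on error. */
sem_t *demo_named_open(void)
{
    return sem_open(DEMO_SEM_NAME, O_CREAT, 0600, 1);
}

/* Enter the critical section, touch the data, leave again.
 * Returns 0 on success, -1 on failure. */
int demo_named_update(sem_t *sem)
{
    if (sem_wait(sem) == -1)
        return -1;
    shared_counter++;
    return sem_post(sem);
}

/* Close our handle and remove the name from the system. */
void demo_named_cleanup(sem_t *sem)
{
    sem_close(sem);
    sem_unlink(DEMO_SEM_NAME);
}
```

Every `sem_wait`/`sem_post` here is one of the operations that ends up being serviced by the resource manager described above.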
An un-named semaphore is not identified by name, and its operation is managed by the kernel. For each semaphore operation (sem_wait/sem_post) a kernel call is made to handle the management of the semaphore data. This is a smaller overhead than the message passing and server operations required for a named semaphore, but it is still a kernel call for each semaphore call.
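For comparison, a minimal sketch of the same idea with an un-named semaphore. The `guarded_list` structure and its helpers are hypothetical; the point is only that the `sem_t` lives in your own memory, next to the data it protects, rather than behind a name.

```c
#include <semaphore.h>

/* Hypothetical structure: the semaphore sits beside the data it guards. */
struct guarded_list {
    sem_t lock;     /* used as a binary semaphore, count starts at 1 */
    int   length;   /* stand-in for the real data */
};

/* Second argument 0: shared between the threads of this process only. */
int guarded_list_init(struct guarded_list *gl)
{
    gl->length = 0;
    return sem_init(&gl->lock, 0, 1);
}

/* Returns 0 on success, -1 on failure. */
int guarded_list_append(struct guarded_list *gl)
{
    if (sem_wait(&gl->lock) == -1)
        return -1;
    gl->length++;
    return sem_post(&gl->lock);
}

int guarded_list_destroy(struct guarded_list *gl)
{
    return sem_destroy(&gl->lock);
}
```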
While a mutex doesn’t provide the same semantics as a counting semaphore, it can be a great high-performance alternative to a binary semaphore. Under Neutrino, mutexes are highly optimized: they use the processor’s atomic operations to do an in-place compare and exchange, and only if the mutex is contended is there any requirement to enter the kernel. This means that in most cases, where there is only minimal contention for the synchronization primitive, there is no additional kernel call overhead.
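A minimal sketch of the mutex version of such a callout, using only standard pthread calls. The `guarded_counter` type, the worker and the `guarded_counter_demo` driver are hypothetical names for this example, and the cap of 16 threads is arbitrary.

```c
#include <pthread.h>
#include <stddef.h>

struct guarded_counter {
    pthread_mutex_t lock;
    long            value;   /* stand-in for the protected data */
};

struct worker_args {
    struct guarded_counter *gc;
    int iterations;
};

/* Each worker takes the mutex, bumps the counter and releases it again. */
static void *worker(void *p)
{
    struct worker_args *wa = p;
    for (int i = 0; i < wa->iterations; i++) {
        pthread_mutex_lock(&wa->gc->lock);
        wa->gc->value++;
        pthread_mutex_unlock(&wa->gc->lock);
    }
    return NULL;
}

/* Run up to 16 workers and return the final count, or -1 on error. */
long guarded_counter_demo(int nthreads, int iterations)
{
    struct guarded_counter gc = { .value = 0 };
    struct worker_args wa = { &gc, iterations };
    pthread_t tid[16];
    int started = 0;

    if (nthreads < 1 || nthreads > 16)
        return -1;
    if (pthread_mutex_init(&gc.lock, NULL) != 0)
        return -1;
    while (started < nthreads &&
           pthread_create(&tid[started], NULL, worker, &wa) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    pthread_mutex_destroy(&gc.lock);
    return started == nthreads ? gc.value : -1;
}
```

If the locking is correct, `guarded_counter_demo(4, 10000)` should come back as 40000 no matter how the threads interleave.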
For most operations, the standard mutex is going to provide much better performance than a binary named semaphore and also better performance than the normal semaphore. However, if you really want to crank performance (and still use standard primitives) then you might have noticed that there are also inline mutex operations defined in pthread.h.
So in general, I tend to favour mutex operations over semaphores just because they are so nicely optimized. If you need a counting semaphore, then you might want to consider using a condition variable instead (which relies on a mutex base) for greater flexibility and potentially higher throughput. If that is the case, reading through the post "A condvar is not a semaphore" might be useful.
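To make the condition variable suggestion concrete, here is one way counting behaviour could be layered on a mutex and a condvar. This is only a sketch: the `counting_gate` type and the `gate_*` functions are invented for illustration and are not a drop-in replacement for `sem_t`.

```c
#include <pthread.h>
#include <stddef.h>

struct counting_gate {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;   /* signalled whenever count goes up */
    unsigned        count;
};

int gate_init(struct counting_gate *g, unsigned initial)
{
    int rc = pthread_mutex_init(&g->lock, NULL);
    if (rc != 0)
        return rc;
    rc = pthread_cond_init(&g->nonzero, NULL);
    if (rc != 0)
        pthread_mutex_destroy(&g->lock);
    g->count = initial;
    return rc;
}

/* Rough equivalent of sem_wait(): block until a unit is available. */
void gate_acquire(struct counting_gate *g)
{
    pthread_mutex_lock(&g->lock);
    while (g->count == 0)                  /* loop guards against spurious wakeups */
        pthread_cond_wait(&g->nonzero, &g->lock);
    g->count--;
    pthread_mutex_unlock(&g->lock);
}

/* Non-blocking variant: returns 1 if a unit was taken, 0 otherwise. */
int gate_try_acquire(struct counting_gate *g)
{
    int got = 0;
    pthread_mutex_lock(&g->lock);
    if (g->count > 0) {
        g->count--;
        got = 1;
    }
    pthread_mutex_unlock(&g->lock);
    return got;
}

/* Rough equivalent of sem_post(): return a unit and wake one waiter. */
void gate_release(struct counting_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->count++;
    pthread_cond_signal(&g->nonzero);
    pthread_mutex_unlock(&g->lock);
}
```

The flexibility comes from the `while` condition: because the predicate is ordinary code under the mutex, it can test anything you like, not just a count.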
*Named semaphores, normal semaphores and mutexes can all be used for synchronization between threads or between processes. For named semaphores this is inherent in the API for creating them; un-named semaphores and mutexes need to be created in a block of shared memory to be shared between processes.