Can you wait a little bit faster?
I was having a conversation the other day with a customer who was very concerned about a pidin listing where he was seeing something like the following:
# pidin
     pid tid name       prio STATE     Blocked
     ...
      20   1 server_app 255o RECEIVE   1
      20   1 server_app  10o RUNNING
      20   1 server_app 255o RECEIVE   1
     ...
      32   1 client_app  10o REPLY     20
      32   2 client_app  10f NANOSLEEP
     ...
He was concerned because most of his system generally operated at priority 10; if there were threads sitting at priority 255, wouldn't all that other work be starved out and never get a chance to run!?
Of course, if you take a few minutes to reflect on what is really being displayed, peace and calm return quite quickly to the world.
The threads marked as high priority are all in blocking states. That is to say, they are not RUNNING, nor are they in the READY state wanting to run. They are all blocked, waiting for some event to occur or for a non-CPU resource to become available. While their reported priorities are high, they are not going to interfere with the operation of the system ... at least not until they transition to a RUNNING or READY state. In the case of threads which are RECEIVE blocked, as is the case here, unless the communication channel was created specifically to avoid priority inheritance, the receiving thread will end up READY/RUNNING at the priority of the sending client.
If that is the case, you may wonder, why have pidin report priority information for blocked states at all if it is just going to cause people this sense of panic and trigger these types of false alarms?
Well, the information can be insightful: in most blocking scenarios the thread was READY or RUNNING just prior to being blocked, so its priority gives a hint as to what it might have been doing. Also, in the case of priority-inheritance states (MUTEX, SEND), you can better see who is causing priorities to be automatically shifted in the system.
In this case, 255 was a curious number to be seeing, and we eventually traced it back to their code: they were generating asynchronous pulse events to a server, but the pulses were not being properly initialized and were picking up junk off the stack, setting bad priorities ... hence the 255 value.