Archive for May, 2007

You’ve been waiting long enough!

In my previous post on the problems with a polling waitfor, I mentioned that there was a timing window in which we could miss the notification event.

If the path we were looking for was attached in between the stat() and the procmgr_event_notify() call (admittedly a small window) then we would end up waiting the entire timeout duration before noticing that the path had appeared.

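Here’s a new version that closes that hole by using a pulse as the notification event. It also registers for the notification before checking for the path, so a path that appears after the check still generates a pulse.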

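#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sched.h>
#include <time.h>
#include <sys/stat.h>
#include <sys/netmgr.h>
#include <sys/neutrino.h>

#include <sys/procmgr.h>
#include <sys/siginfo.h>

#define PULSE_CODE_TIMEOUT      _PULSE_CODE_MINAVAIL
#define PULSE_CODE_PATHSPACE    (PULSE_CODE_TIMEOUT + 1)

int main( int argc, char *argv[] )
{
    struct itimerspec   timeout;
    timer_t             timer_id;
    struct sigevent     event;
    struct stat         sbuf;
    int                 chid, coid;
    char                *path;
    struct _pulse       pulse;

    if ( argc < 3 ) {
        fprintf( stderr, "usage: %s path timeout_seconds\n", argv[0] );
        return EXIT_FAILURE;
    }

    path = argv[1];
    memset( &timeout, 0, sizeof(timeout) );
    timeout.it_value.tv_sec = atoi(argv[2]);

    /* A private channel, plus a side-channel connection the pulses arrive on */
    chid = ChannelCreate(0);
    if ( chid == -1 ) {
        perror( "ChannelCreate" );
        return EXIT_FAILURE;
    }
    coid = ConnectAttach( ND_LOCAL_NODE, 0, chid, _NTO_SIDE_CHANNEL, 0 );
    if ( coid == -1 ) {
        perror( "ConnectAttach" );
        return EXIT_FAILURE;
    }

    /* The timer delivers a PULSE_CODE_TIMEOUT pulse at our own priority */
    SIGEV_PULSE_INIT( &event, coid, getprio(0), PULSE_CODE_TIMEOUT, 0 );
    if ( timer_create( CLOCK_REALTIME, &event, &timer_id ) == -1 ) {
        perror( "timer_create" );
        return EXIT_FAILURE;
    }

    /* make sure PATHSPACE event has higher priority */
    SIGEV_PULSE_INIT( &event, coid, getprio(0)+1, PULSE_CODE_PATHSPACE, 0 );
    if ( procmgr_event_notify( PROCMGR_EVENT_PATHSPACE, &event ) == -1 ) {
        perror( "procmgr_event_notify" );
        return EXIT_FAILURE;
    }

    /* Check to make sure we don't wait for something already there...
       We registered for PATHSPACE events first, so anything attached
       after this stat() still generates a pulse: no timing window. */
    if ( stat( path, &sbuf ) == 0 ) {
        printf( "Found %s\n", path );
        return EXIT_SUCCESS;
    }

    timer_settime( timer_id, 0, &timeout, NULL );

    while( MsgReceivePulse( chid, &pulse, sizeof(pulse), NULL ) == 0 ) {
        switch(pulse.code) {
        case PULSE_CODE_PATHSPACE:
            /* The pathname space changed; is it the path we want? */
            if ( stat( path, &sbuf ) == 0 ) {
                printf( "Found %s\n", path );
                return EXIT_SUCCESS;
            }
            break;      /* some other path; keep waiting */
        case PULSE_CODE_TIMEOUT:
            printf( "Timed out waiting for %s\n", path );
            return EXIT_FAILURE;
        default:
            printf( "Unknown pulse code %d waiting for %s\n", pulse.code, path );
            return EXIT_FAILURE;
        }
    }
    return EXIT_FAILURE;
}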

Note that I use a pulse for the timeout too, and give the timeout pulse a lower priority than the notification pulse. Pulses queue in priority order, so if both are pending, I want to make sure that I receive the path event pulse first.
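To see why the priority split matters, here is a toy model in portable C of a channel’s pulse queue. It assumes, as described above, that pulses are queued highest priority first, and it additionally assumes FIFO order among pulses of equal priority. The names toy_pulse, pulse_queue, pulse_enqueue() and pulse_dequeue() are hypothetical; the real queueing is done by the Neutrino kernel, not by code like this.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not a Neutrino API. */
enum { TOY_CODE_TIMEOUT = 1, TOY_CODE_PATHSPACE = 2 };
#define QUEUE_MAX 8

struct toy_pulse {
    int priority;   /* priority the pulse was sent at */
    int code;       /* pulse code */
};

struct pulse_queue {
    struct toy_pulse items[QUEUE_MAX];
    size_t count;
};

/* Insert while keeping items sorted by descending priority; a new pulse
   goes behind existing pulses of the same priority (assumed FIFO). */
static void pulse_enqueue(struct pulse_queue *q, int priority, int code)
{
    assert(q->count < QUEUE_MAX);
    size_t i = q->count;
    while (i > 0 && q->items[i - 1].priority < priority) {
        q->items[i] = q->items[i - 1];   /* shift lower-priority pulse back */
        i--;
    }
    q->items[i].priority = priority;
    q->items[i].code = code;
    q->count++;
}

/* Remove and return the head: highest priority, oldest first. */
static struct toy_pulse pulse_dequeue(struct pulse_queue *q)
{
    assert(q->count > 0);
    struct toy_pulse head = q->items[0];
    for (size_t i = 1; i < q->count; i++)
        q->items[i - 1] = q->items[i];
    q->count--;
    return head;
}
```

In the waitfor program the timeout pulse is sent at getprio(0) and the path event pulse at getprio(0)+1, so in this model, even if the timer fires just before the path appears, the first dequeue returns the path event.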

Cheers,

Colin

Resource manager dispatching … using your own channels

Ever write a resource manager? Ever wonder why you were doing that rather than just using the raw operating system primitives?

If you have used Neutrino for any length of time, you have probably done the former, and if it was a trivial system, the latter thought has probably crossed your mind.

QNX’s message-passing strategy is all about the Send/Receive/Reply mantra: clients send messages to servers and stay blocked until the server replies. Under QNX4 there were officially no threads, so when you sent a message, you only needed to supply the node (transparent network message passing!) and the process identifier as routing information. When Neutrino (QNX6) was introduced, threads became first-class citizens, and to make the target of message operations completely unambiguous, message channels were added. This meant that instead of just a node and process id, you also had to specify the channel identifier.

How does a client get these three pieces of information about the server it wants to talk to? There are lots of different schemes, depending on the structure of the system you are building, ranging from sticking the information into a piece of shared memory to passing it from parent to child processes via a fork/exec/spawn type of operation. In a simple system, simple sharing policies work fine; however, as a system grows more complicated, with more clients and servers, things get out of hand quickly.

The preferred mechanism, however, is to let the OS do the heavy lifting: replace the node, process id and channel id with a symbolic entry that clients can easily discover and use, and let the operating system handle the resolution.

Enter the raison d’être for the resource manager. It was introduced to facilitate that binding on both the client side (name resolution) and the server side (node, process id and channel id binding), as well as to provide dispatching capabilities that demultiplex common messages to specific handlers. The Neutrino Programmer’s Guide tells you all about this, and Rob Krten’s Neutrino books do a great job of providing further enlightenment.
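The demultiplexing half of that job can be pictured as a table of handlers indexed by message type. The sketch below is a toy model in portable C, not the resource manager API: enum msg_type, struct dispatch_table and the handler functions are hypothetical. It mirrors the real pattern in which iofunc_func_init() fills the connect and I/O function tables with default handlers, and your server then overrides only the entries it cares about, such as the read handler.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: message types, table and handlers are hypothetical. */
enum msg_type { MSG_OPEN, MSG_READ, MSG_WRITE, MSG_TYPE_COUNT };

/* Every handler takes the message payload and returns a result or -1. */
typedef int (*msg_handler)(const char *payload);

struct dispatch_table {
    msg_handler handlers[MSG_TYPE_COUNT];
};

/* Default handler: treat the operation as unsupported. */
static int default_handler(const char *payload)
{
    (void)payload;
    return -1;
}

/* Put the default in every slot before the server overrides anything. */
static void table_init_defaults(struct dispatch_table *t)
{
    for (int i = 0; i < MSG_TYPE_COUNT; i++)
        t->handlers[i] = default_handler;
}

/* Route one incoming message to the handler registered for its type. */
static int dispatch_message(const struct dispatch_table *t,
                            enum msg_type type, const char *payload)
{
    assert((unsigned)type < (unsigned)MSG_TYPE_COUNT);
    return t->handlers[type](payload);
}

/* Example override: a "read" handler that just reports the payload length. */
static int my_read_handler(const char *payload)
{
    return (int)strlen(payload);
}
```

A server built this way writes only the handlers that differ from the defaults; every other message type falls through to default_handler().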

About every six months or so I run into someone who is stuck in a very specific situation. They have created their own channel via ChannelCreate(), perhaps to turn off priority inheritance or because they need to avoid using certain flags, and would like to use it with the resource manager framework. The resource manager framework does such a good job of abstracting and hiding the channel that this seems impossible.

Do not lose hope, though! There is an undocumented function that allows exactly this:

dispatch_t *_dispatch_create(int chid, unsigned flags)

It takes a channel identifier that you have created yourself and serves as a replacement for dispatch_create(), the documented and preferred way to initialize dispatching. As always, when you use something that is undocumented (but publicly exposed), there are some caveats to consider:

  • The flags argument should be passed as 0 at this time.
  • For the resource manager to operate properly, you must create the channel with _NTO_CHF_UNBLOCK | _NTO_CHF_DISCONNECT. Without these flags, your resource manager may fail to properly clean up connections from clients that disconnect abnormally.

One situation where this function is required is when you are trying to include a resource manager in an application that also uses Photon. While combining server and interface functionality this way is generally frowned upon (a clean separation is always preferred), there are some situations where it simply isn’t practical. In this case, Photon generally grabs the _NTO_CHF_COID_DISCONNECT flag for itself, but the current resource manager framework assumes that it will be able to use that flag. Since this flag can be set on only one channel per process, your resource manager will end up failing during initialization with a mysterious error … A future release will clean this up and allow everyone to play nice, but for the current 6.3.x releases of Neutrino, you will need to use this function to work around the problem.

 Thomas

Can you wait a little bit faster?

I was having a conversation with a customer the other day who was very concerned about a pidin listing where he was seeing something like the following:

# pidin
     pid tid name               prio STATE       Blocked
...
      20   1 server_app         255o RECEIVE     1
      20   1 server_app          10o RUNNING
      20   1 server_app         255o RECEIVE     1
...
      32   1 client_app          10o REPLY       20
      32   2 client_app          10f NANOSLEEP
...

He was concerned because most of his system operated at priority 10, and if there were tasks at priority 255, wouldn’t those other tasks be starved out and never get a chance to run!?

Of course if you take a few minutes to reflect on what is really being displayed then peace and calm return quite quickly to the world. 

The threads marked as high priority are all in blocked states. That is, they are neither RUNNING nor READY (wanting to run); they are all blocked waiting for some other event to occur or some non-CPU resource to become available. While their reported priorities are high, they are not going to interfere with the operation of the system … at least not until they transition to the RUNNING or READY state. For RECEIVE-blocked threads, as in this case, unless the channel was created specifically to avoid priority inheritance (for example with _NTO_CHF_FIXED_PRIORITY), the receiving thread will become READY/RUNNING at the priority of the sending client.
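The reasoning above fits in a toy scheduler model in portable C. It is not the Neutrino scheduler: struct thread_info, pick_next() and receive_message() are hypothetical, and receive_message() assumes a channel with priority inheritance left on. What it illustrates is that only READY or RUNNING threads compete for the CPU, so a RECEIVE-blocked thread reporting priority 255 cannot starve anything.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical types, not the Neutrino scheduler. */
enum thread_state { STATE_RUNNING, STATE_READY, STATE_RECEIVE,
                    STATE_REPLY, STATE_NANOSLEEP };

struct thread_info {
    int pid, tid;
    int priority;            /* priority as pidin would report it */
    enum thread_state state;
};

/* Only READY and RUNNING threads compete for the CPU. */
static int is_runnable(enum thread_state s)
{
    return s == STATE_RUNNING || s == STATE_READY;
}

/* Index of the highest-priority runnable thread, or -1 if none;
   blocked threads are skipped whatever priority they report. */
static int pick_next(const struct thread_info *threads, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!is_runnable(threads[i].state))
            continue;
        if (best < 0 || threads[i].priority > threads[best].priority)
            best = (int)i;
    }
    return best;
}

/* With priority inheritance, a RECEIVE-blocked server thread that gets a
   message becomes READY at the sender's priority, not its old one. */
static void receive_message(struct thread_info *server, int sender_priority)
{
    assert(server->state == STATE_RECEIVE);
    server->priority = sender_priority;
    server->state = STATE_READY;
}
```

Fed threads shaped like the pidin listing, pick_next() chooses the priority 10 RUNNING thread, and once a priority 10 client sends to one of the blocked server threads, that thread becomes READY at priority 10, not 255.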

If that is the case, you may wonder why pidin reports priority information for blocked states at all, if it just causes this sense of panic and triggers these kinds of false alarms.

Well, the information can be insightful: in most blocking scenarios, the thread was READY or RUNNING before it blocked, so its priority gives a hint about what it might have been doing. Also, for the priority-inheritance states (MUTEX, SEND), you can better see who is causing priorities to shift automatically in the system.

In this case, 255 was a curious number to see, and we eventually traced it back to their code: they were generating asynchronous pulse events to a server, but the events were not being properly initialized, so they picked up junk off the stack and set bad priorities … hence the 255 value.

QNX Coding Tip of the Day: Don’t initialize a sigevent by setting its members directly; use the SIGEV_* macros!
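Here is the idea behind the tip, in miniature. The sketch uses a hypothetical struct toy_sigevent and TOY_PULSE_INIT() macro rather than the real QNX definitions, whose layout differs; real code should use SIGEV_PULSE_INIT(&event, coid, priority, code, value), as the waitfor example does. The toy macro clears the whole structure and then assigns every field in one place, so an event that lives on the stack cannot carry junk, such as a stray priority, into the pulse. The 1..255 priority check in make_pulse_event() is also an assumption of this toy.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct sigevent; the real QNX layout differs. */
struct toy_sigevent {
    int   notify;    /* delivery type (pulse, signal, ...) */
    int   coid;      /* connection the pulse is delivered on */
    short priority;  /* priority the pulse is delivered at */
    short code;      /* pulse code */
    int   value;     /* pulse value */
};

enum { TOY_NOTIFY_PULSE = 1 };   /* hypothetical delivery-type constant */

/* Toy initializer in the spirit of SIGEV_PULSE_INIT(): clear the whole
   structure, then assign every field, so no stack junk survives. */
#define TOY_PULSE_INIT(ev, c, prio, cd, val) do { \
        memset((ev), 0, sizeof(*(ev)));          \
        (ev)->notify   = TOY_NOTIFY_PULSE;       \
        (ev)->coid     = (c);                    \
        (ev)->priority = (short)(prio);          \
        (ev)->code     = (short)(cd);            \
        (ev)->value    = (val);                  \
    } while (0)

/* Build a pulse event in stack storage that may start out holding junk.
   The 1..255 range check is an assumption of this toy. */
static struct toy_sigevent make_pulse_event(int coid, int prio,
                                            int code, int value)
{
    struct toy_sigevent ev;
    assert(prio >= 1 && prio <= 255);
    TOY_PULSE_INIT(&ev, coid, prio, code, value);
    return ev;
}
```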

 Thomas