Archive for April, 2007

What are you waiting for?

One of the common problems I’ve helped customers with is optimising their device startup time.

There are many things you can do to squeeze a few precious seconds out of your startup sequence, but one of the simplest may not be immediately obvious.

The sequence

devb-eide
waitfor /dev/hd0 30

is an all too common one, but it’s surprisingly damaging if you are trying not to waste cycles.

For one thing, it means that you can’t really start anything after the waitfor until devb-eide has gone through its sometimes lengthy initialization sequence (sorry devb, don’t mean to pick on you…).

Really all the system can do is to finish off the initialization of any servers you already started and poll for the device to appear.

Wait a second – POLL??!!!

Hmmm, in RTOS speak, Poll=BAD, WaitForAnEvent=GOOD!

Yes, my friends, it’s sad to say that /bin/waitfor (actually the amazingly versatile /bin/on) does actually poll, with a period of 100ms to boot, as does the builtin version that procnto provides when running your image’s startup script.

There are a number of nasties associated with this, one of the most annoying being that if your device appears 1ms after that 100ms poll (which is just a stat()) then you’re going to wait around for ANOTHER 99ms before waitfor notices!

And unless you have plenty else going on in the meantime then that means the dreaded idle thread will happily be using up your precious cpu. Another good reason to have plenty of servers started BEFORE you do the waitfor.

So how about a remedy?

Well, the basic version of waitfor looks something like this…

while we haven’t reached the max limit
do
stat the device
if it’s there, break out
delay 100ms
done

Wouldn’t it be nice if we could do something more like

while we haven’t reached the max limit
do
sleep for the remaining timeout, but wakeup if the device appears?
done

Well, as it happens, if you are running QNX 6.3.0 SP2 or later, then there is actually a facility you can use. It’s a new flag to procmgr_event_notify(), PROCMGR_EVENT_PATHSPACE, which will see a sigevent winging its way to you whenever the pathspace changes, which is basically whenever someone attaches or detaches a pathname.

So here’s a revised version of waitfor that waits until something changes, then hopefully stats the device.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/stat.h>
#include <sys/procmgr.h>
#include <sys/siginfo.h>

int main( int argc, char *argv[] )
{
    struct sigevent event;
    struct stat sbuf;
    int timeout;
    char *path;
    time_t start, now;

    if ( argc < 3 ) {
        fprintf( stderr, "usage: waitfor <path> <timeout>\n" );
        return EXIT_FAILURE;
    }

    path = argv[1];
    timeout = atoi(argv[2]);

    if ( stat( path, &sbuf ) == 0 ) {
        printf( "Found %s!\n", path );
        return EXIT_SUCCESS;
    }

    /* Ask procnto to unblock this thread whenever the pathname space changes. */
    SIGEV_UNBLOCK_INIT( &event );
    procmgr_event_notify( PROCMGR_EVENT_PATHSPACE, &event );

    start = time(NULL);
    do {
        /* Sleep for the remaining timeout; a pathspace change wakes us early. */
        sleep( timeout );
        if ( stat( path, &sbuf ) == 0 ) {
            printf( "Found %s!\n", path );
            return EXIT_SUCCESS;
        }
        now = time(NULL);
        timeout -= (now - start);
        start = now;
    } while( timeout > 0 );
    return EXIT_FAILURE;
}

If 99ms of idle time upsets you, this might help you sleep better.

Colin

PS – yes, this will be fixed in the next release of the kernel/utils, if I have anything to do with it.

PPS – I should mention that there is yet another place where waitfor is defined: the ewaitfor builtin in /bin/fesh. This version at least lets you set the poll period, as a third argument to ewaitfor. Again, that should be in 6.3.0 SP2 or later.


From file to device …

An interesting question arose last week from one of the media developers at QNX working with a third party library. The third party’s API required him to provide a file path to the library; the path was then stored and later handed back to a plugin of his that was supposed to use it for some specialized device access (in this case, custom CD drive handling).

The problem, other than the not so elegant API required by this third party library, was that while the library needed a filename to work with, his plugin needed to send drive level control commands in the form of custom devctl()‘s.

This reminded me of some control commands that had been put into the block/filesystem level in the early days of Neutrino. Hunting around, I found the devctl() commands I was looking for in <sys/dcmd_blk.h>:

#define DCMD_FSYS_MOUNTED_ON  __DIOF(_DCMD_FSYS,  16, char[256]) 
#define DCMD_FSYS_MOUNTED_AT  __DIOF(_DCMD_FSYS,  17, char[256]) 
#define DCMD_FSYS_MOUNTED_BY  __DIOF(_DCMD_FSYS,  18, char[256])

When you have a file descriptor that points to something handled by one of the block-oriented filesystems or block drivers (i.e. devb-*), then you can use these handy commands to dig down or up, or to determine where an entry is mounted in the filesystem.

DCMD_FSYS_MOUNTED_ON
This command will return the pathname of the underlying device, partition or file the current entry was mounted on. For example a file descriptor to a filesystem file will generally point back to a partition (or the raw block device), a partition file descriptor will point back to the raw device. In this way you can burrow your way down to the controlling device.

DCMD_FSYS_MOUNTED_BY
This command tries to root out and determine where the file descriptor you are referencing has been mounted if it is a singular mount point such as a partition. This is an upward version of MOUNTED_ON, but since multiple users can exist for a single device, it isn’t as generally useful as the MOUNTED_ON command.

DCMD_FSYS_MOUNTED_AT
This command tells you where the object that you are referencing was mounted at. This is similar to the type of information that you get back from the mount utility, but in this case the MOUNTED_AT command is a specific command for the block filesystem.

Due to the overlay nature of the Neutrino pathname space, these commands are really only applicable for file entries since directory entry resolution isn’t as explicit as for files*, but can come in handy if you are in a pinch like we were with this silly third party library API.
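To make the burrowing-down idea concrete, here is a rough sketch of a utility that walks from a file down to its controlling device. It is QNX-specific, so treat it as illustrative rather than tested; in particular, the loop’s stopping condition assumes DCMD_FSYS_MOUNTED_ON fails or returns an empty name once you reach the raw device, which you would want to verify on your target.

```c
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <devctl.h>
#include <sys/dcmd_blk.h>

int main( int argc, char *argv[] )
{
    char name[256];
    int fd;

    if ( argc < 2 ) {
        fprintf( stderr, "usage: burrow <path>\n" );
        return 1;
    }

    /* Start with the file the library handed us a pathname for. */
    if ( (fd = open( argv[1], O_RDONLY )) == -1 ) {
        perror( argv[1] );
        return 1;
    }

    /* Repeatedly ask "what was this mounted on?" until we hit bottom:
       file -> partition -> raw block device. */
    while ( devctl( fd, DCMD_FSYS_MOUNTED_ON, name, sizeof(name), NULL ) == EOK
            && name[0] != '\0' ) {
        printf( "mounted on %s\n", name );
        close( fd );
        if ( (fd = open( name, O_RDONLY )) == -1 )
            break;
    }
    if ( fd != -1 )
        close( fd );
    return 0;
}
```

The final pathname printed is what the plugin would open in order to send its custom drive-level devctl()s.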

Thomas


* We’ll leave the magic of mountpoints, overlays and the contents of the /proc/mount directory for a rain(ier) day I think.

Hello World (SIGSEGV remix)

Hi there, I’m Colin. Look Ma, I can write programs…

#./hello
Process 5861407 (hello) terminated SIGSEGV code=1 fltno=11 ip=b0348cec mapaddr=00048cec. ref=00000000

Oops, that’s not the start to this blog that I was hoping to make. Two lines in and already my efforts are riddled with bugs. Oh my.

In actual fact though, I wanted to start off by sharing a tidbit I found the other day, while staring at the termination code in procnto.

The message you see above is generated by the termer thread, which runs some procnto code on behalf of your dearly departed process to clean up leftover resources, such as file descriptors, memory mappings etc. And one of the things it does, if you have added a -v to procnto in your build file, is to print out information about the exit status of processes.

What it tells us is that your process died of a SIGSEGV, and the code=1 (SEGV_MAPERR from sys/siginfo.h) and fltno=11 (FLTPAGE from sys/fault.h) tell me that it was trying to access a page that wasn’t mapped.

It died at the address 0xb0348cec, which is all very well, but if I load my hello binary into gdb then this address is clearly not in my application.

The clue is mapaddr=00048cec. This tells me that the object that contains the offending code was relocated when it was mapped into memory, and thus we are crashing in a shared object. But where/what is 0x48cec? The long winded approach is to figure out which shared object contains 0xb0348cec (using pidin mem) and then load that shared object into either gdb or objdump. But that’s all rather grungy.

Which brings me to the paydirt – there is code in the termer thread to actually fish out the name of the shared library and the function name in which you crashed!

So why weren’t we seeing this? Well a little investigation revealed that a linked list that was supposed to point to the process’ shared objects was not being initialised (Note this bug has been fixed now and will be available in a newer version of libc.so).

The initialisation function is an internal function, but you can force it to be re-called by trying to figure out the address of a symbol with dladdr().

So now, here’s my hello world with dladdr invocation added. You might already note my error, but let’s run it just in case…

#include <stdio.h>
#include <string.h>
#include <dlfcn.h>

int main(int argc, char **argv)
{
    char *world = NULL;
    Dl_info info;

    /* Force libc's link-map initialisation so the termer
       thread can resolve crash addresses to symbol names. */
    dladdr( main, &info );

    printf( "Hello %s!\n", strdup(argv[1]) );

    world = "World";

    return 0;
}

#./hello
Process 5951519 (hello) terminated SIGSEGV code=1 fltno=11 ip=b0348cec(libc.so.2@strdup+0x20) mapaddr=00048cec. ref=00000000

Well silly me – I simply forgot to pass a command line argument (not to mention some nice error checking). Serves me right for writing such a contrived example…

#./hello World
Hello World!

Process 6004767 (hello) exited status=0.

Ah, that’s better…

Mutex or Semaphore for Performance?

I had someone who was porting code over to Neutrino ask the other day about the choice of synchronization primitive to use in a semaphore callout that the software used to provide mutual exclusion for its data structures.

The developer’s initial thought was to use a semaphore, since that is what the name of the callout implied, other ports had used named semaphores, and QNX/POSIX has semaphores, both named and un-named. But the comment about the callout being used for mutual exclusion seemed to imply that, despite the naming, a mutex would be a better choice for performance, which is when he asked me about it.

Assuming that you are using a binary semaphore and not a counting semaphore then purely from a performance point of view here is your ranking from worst to best choices:

Named Semaphore

This is a semaphore that trades performance for API convenience* (a pathname style location), since you are going to be going through a resource manager (procnto in later versions of Neutrino, or mqueue in earlier versions) which serializes access to the semaphore count. While operationally it is the same, this extra messaging imposes a bit of extra overhead.

Normal Semaphore
This is a semaphore that is not identified by name, and whose operation is managed by the kernel. For each semaphore operation (sem_wait/sem_post) a kernel call is made to handle the management of the semaphore data. This is a smaller overhead than the message passing and server operations required for a named semaphore, but there is still a kernel call for each semaphore operation.

Mutex
While a mutex doesn’t provide the same semantics as a counting semaphore, it can be a great high performance alternative to a binary semaphore. Under Neutrino, mutexes are highly optimized such that they use the processor’s atomic operations to do an in-place compare and exchange. Only if the mutex is contested is there any requirement to enter the kernel. This means that in most cases, where there is only minimal contention for the synchronization primitive, there is no additional kernel call overhead.

Inline Mutex
For most operations, the standard mutex is going to provide much better performance than a binary named semaphore and also better performance than the normal semaphore. However, if you really want to crank performance (and still use standard primitives) then you might have noticed that there are also inline mutex operations defined in pthread.h.

So in general, I tend to favour mutex operations over semaphores just because they are so nicely optimized. If you need a counting semaphore, then you might want to consider using a condition variable instead (which relies on a mutex base) for greater flexibility and potentially higher throughput. If that is the case, reading through the post a condvar is not a semaphore might be useful.


*Named semaphores, normal semaphores and mutexes can all be used as synchronization tools that can be used between threads or processes. For named semaphores, this is inherent in the API for creating them, for un-named semaphores and mutexes they need to be created in a block of shared memory.

Embedding help into your binaries

Since QNX Neutrino is designed primarily to be deployed on embedded systems, but also maintains many of the self-hosted characteristics that make development on those platforms efficient (no reboots, easy access, familiar command line tools), its help system is dually organized. There is a rich set of standard documentation (HTML or PDF) for the operating system, its programming environment, various libraries and utility functions.

However, since that documentation is not always at hand when you want to run a quick command*, each target binary (ls, find, pidin etc.) also contains an embedded text segment that holds a description of how to use the utility. To get at this information, you use the imaginatively named use utility.

For example, I never remember all the options to the find utility. If I were on a target then I could type in:

 % use find 
find  - find files (POSIX) 
find  ... [operand_expression] 
Operand expressions: 
...

In some ways this information is equivalent to unix style man pages, but since it is embedded directly into the binaries you never have to do any fancy configuration to prune information in (or out). Compared to rooting through and finding the utilities you use and then pruning the man pages appropriately, this is wicked easy.

In addition to the usage type of information, you can also print out build version information that is included with QNX binaries:

 % use -i find 
NAME=find 
DESCRIPTION=find files 
DATE=2004/12/16-00:01:41-EST 
STATE=Stable 
HOST=uberbuild 
USER=toolsbuild 
VERSION=6.3.0SP1 
TAGID=329

If you are ever contacting QNX support, this is great stuff to send them so that they can match up your binaries directly with the source for a given release if need be.

Under Neutrino the usage information is stuffed into two different ELF sections of the binary.

 % ntox86-objdump -h find 
/usr/bin/find:     file format elf32-i386 
Sections: 
Idx Name          Size      VMA       LMA       File off  Algn 
... 
 19 QNX_usage     00004fb7  00000000  00000000  0000d304  2**0 
                  CONTENTS, READONLY 
 20 QNX_info      00000085  00000000  00000000  000122bb  2**0 
                  CONTENTS, READONLY

This approach is a double-edged sword. On the one hand it means you don’t pay a memory penalty when loading the process, since the QNX loader knows that these sections aren’t executable and basically ignores them, and since they are standard ELF sections, you can use all of the standard binary manipulation tools to work with them. The downside is that these sections can be removed by utilities such as strip, although the Neutrino image build process, using mkifs, will preserve these sections in the OS filesystem image.

The cool thing is that this information stuffing isn’t something just reserved for QNX utilities and shared objects! It can be added to any Neutrino binary via the usemsg utility. There is no set format for the general usage section (QNX_usage), so you can put whatever you like there. The convention is to put command line parameter explanations and example invocations in this free form text section. The information section (QNX_info) is a structured key=value area and has a date and time field that is automatically updated when information is added. It is easy to use this technique to integrate notes that can be easily retrieved on both host and target systems.

% use -i ./a.out 
No info available in ./a.out. 
% usemsg -i qnx=cool ./a.out 
% use -i ./a.out 
NAME=a.out 
DATE=2007-03-25EDT-20:06:24 
QNX=cool

Use messages are definitely a handy bit of infrastructure that make things just a bit easier for you to work with and manage your system software.


* For example working remotely connected to a customer site with no development environment at hand.