Hello World (SIGSEGV remix)

Hi there, I’m Colin. Look Ma, I can write programs…

#./hello
Process 5861407 (hello) terminated SIGSEGV code=1 fltno=11 ip=b0348cec mapaddr=00048cec. ref=00000000

Oops, that’s not the start to this blog that I was hoping to make. Two lines in and already my efforts are riddled with bugs. Oh my.

In actual fact, though, I wanted to start off by sharing a tidbit I found the other day while staring at the termination code in procnto.

The message you see above is generated by the termer thread, which runs some procnto code on behalf of your dearly departed process to clean up leftover resources such as file descriptors and memory mappings. One of the things it does, if you have added a -v to procnto in your build file, is print out information about the exit status of processes.

What it tells us is that the process died of a SIGSEGV. The code=1 (SEGV_MAPERR from sys/siginfo.h) and fltno=11 (FLTPAGE from sys/fault.h) tell me that it was trying to access a page that wasn’t mapped.

It died at the address 0xb0348cec, which is all very well, but if I load my hello binary into gdb then this address is clearly not in my application.

The clue is mapaddr=00048cec. This tells me that the object containing the offending code was relocated when it was mapped into memory, and thus we are crashing in a shared object. But where/what is 0x48cec? The long-winded approach is to figure out which shared object contains 0xb0348cec (using pidin mem) and then load that shared object into either gdb or objdump. But that’s all rather grungy.
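The arithmetic behind the long-winded approach is small enough to sketch. I am assuming here that mapaddr is simply ip minus the amount by which the object was relocated, which is what the two numbers in the message suggest; the function names are mine, not anything from procnto.

```c
#include <stdint.h>

/* How far the object was moved when it was mapped:
 * the runtime address minus the address as seen in the file. */
uintptr_t relocation_delta(uintptr_t ip, uintptr_t mapaddr)
{
    return ip - mapaddr;
}

/* Does addr fall inside a mapping of `size` bytes starting at `base`?
 * This is the check you do by eye against the pidin mem listing. */
int in_mapping(uintptr_t addr, uintptr_t base, uintptr_t size)
{
    return addr >= base && addr - base < size;
}
```

For the crash above, relocation_delta(0xb0348cec, 0x00048cec) gives 0xb0300000, so the object to look for in pidin mem is the one mapped there, and 0x48cec is the address to hand to gdb or objdump.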

Which brings me to the paydirt – there is code in the termer thread to actually fish out the name of the shared library and the function name in which you crashed!

So why weren’t we seeing this? Well, a little investigation revealed that a linked list that was supposed to point to the process’ shared objects was not being initialised. (Note: this bug has now been fixed, and the fix will be available in a newer version of libc.so.)

The initialisation function is an internal function, but you can force it to be called by asking dladdr() to resolve an address to its symbol.
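As an aside, dladdr() on its own can already produce an object@symbol+offset description of an address from user code. The sketch below only illustrates that; it is not the termer thread's code, describe_addr is a hypothetical helper, and the _GNU_SOURCE define is an assumption that matters on glibc rather than on QNX.

```c
#define _GNU_SOURCE   /* glibc only exposes dladdr() with this; assumed harmless elsewhere */
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: describe addr as "object@symbol+0xoffset".
 * Returns 0 on success, -1 if no loaded object contains addr. */
int describe_addr(void *addr, char *buf, size_t len)
{
    Dl_info info;

    if (dladdr(addr, &info) == 0 || info.dli_fname == NULL)
        return -1;

    if (info.dli_sname != NULL) {
        /* Offset from the nearest preceding exported symbol. */
        uintptr_t off = (uintptr_t)addr - (uintptr_t)info.dli_saddr;
        snprintf(buf, len, "%s@%s+0x%lx",
                 info.dli_fname, info.dli_sname, (unsigned long)off);
    } else {
        /* No dynamic symbol covers addr: fall back to the object base. */
        uintptr_t off = (uintptr_t)addr - (uintptr_t)info.dli_fbase;
        snprintf(buf, len, "%s+0x%lx", info.dli_fname, (unsigned long)off);
    }
    return 0;
}
```

The fallback branch matters in practice: a function with no entry in the dynamic symbol table yields an object name but no symbol name. On older glibc versions you may also need to link with -ldl.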

So now, here’s my hello world with a dladdr() invocation added. You might already have spotted my error, but let’s run it just in case…

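#include <stdio.h>
#include <string.h>   /* strdup() */
#include <dlfcn.h>

int main(int argc, char **argv)
{
    char *world = NULL;
    Dl_info info;

    /* Called only for its side effect: it makes libc initialise
       its list of shared objects. */
    dladdr( (void *)main, &info );

    printf( "Hello %s!\n", strdup(argv[1]) );

    world = "World";

    return 0;
}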

#./hello
Process 5951519 (hello) terminated SIGSEGV code=1 fltno=11 ip=b0348cec(libc.so.2@strdup+0x20) mapaddr=00048cec. ref=00000000

Well, silly me: I simply forgot to pass a command-line argument (not to mention some nice error checking). Serves me right for writing such a contrived example…

#./hello World
Hello World!

Process 6004767 (hello) exited status=0.

Ah, that’s better…

9 comments so far

  1. Mario on

    Hello Colin,

    Tried that, and all I get is :

    #./a.out
    Memory fault
    #

    I did notice that in some circumstances a program that hits a SIGSEGV will simply display “Memory fault”, while at other times it will display a more complete message. I have not yet been able to figure out why.

    Suggestion?

  2. colinburgess on

    “Memory fault” is actually a message from the shell. You probably don’t have the verbose option set on procnto. Check your build file; you need at least one -v.

  3. Mario on

    Woops, just noticed it was mentioned in the original post. Sorry.

    Thanks!

  4. konstantin on

    Does it make any difference while dealing with multithreaded applications?

    Also, is it possible somehow to get the full “stack” of nested functions to that one, which caused the signal?

    Thanks in advance.
    Konstantin.

  6. colinburgess on

    No, it doesn’t make any difference with multi-threaded apps, although the thread number is not printed out.

    To get the full backtrace you would have to load the core file into gdb.

    Doing a backtrace is non-trivial and would bloat procnto beyond belief, especially on some of the RISC architectures.

  7. konstantin on

    Hello,
    I have another question: what could it mean if I can’t see the failing function even though I’m using this technique? When I try ‘printf(“%s”, NULL)’, everything works as expected.

    I’m using a self-written dynamically linked library in my code; could this be the reason? (Doubtful, because I can still trace libc functions.)

    Thanks.
    Konstantin.

  8. colinburgess on

    It could be that the failing function is a static function, and thus would not have an entry in the dynamic symbol table.

    If this is true, then linking with -Wl,-E should solve that issue.

