Wednesday, July 14, 2010

Writing a Debugger

You must have debugged your code using gdb before or traced some process with strace, if so, you must wonder how it works ? well it's not magic :) and it's not really that difficult either !

Today I'm going to walk you through writing a debugger, it's a small, yet completely functional, debugger, called mdb, using a mixture of ptrace, libbfd and libopcodes so enjoy... :)

Overview
The task of writing a debugger can be broken down into a set of smaller tasks:
  • Tracing a process.
  • Setting break point.
  • Disassembling instructions.
The main function will open a binary, read the symbol table, run it in a traceable process and then wait for commands from the user to set break points, single step through the code or continue execution normally. We already have a lot to do so let's get started...

Tracing a process:
Tracing a process is examining and controlling the execution of that process, In order for a process to be traced, it must either explicitly express its wish of being traced using ptrace(PTRACE_TRACEME,...) or another process may attach and trace it using ptrace(PTRACE_ATTACH,...).

However, since we're writing a debugger, attaching to a running process is not really an option, since we would like to debug it from the beginning, and we may not always have access to the source code of the process to explicitly start tracing, so what should we do ?

We can call fork(), creating a new process, and then call ptrace(PTRACE_TRACEME,...) on the newly created process, doing so makes it traceable, if we then exec() the binary we wish to trace we end up having our code executing in a traceable process:
pid_t pid;
switch (pid = fork()) {
    case -1: /*error*/
        perror("fork()");
        exit(-1);
    case 0:/*child process*/
        ptrace(PTRACE_TRACEME, NULL, NULL); /*allow process to be traced*/
        execl(path, name, NULL);            /*child will be stopped here*/
        perror("execl()");
        exit(-1);
}
/*parent continues execution here*/ 
After calling ptrace((PTRACE_TRACEME...) any signal, except for SIGKILL, sent to the traced process will cause it to stop executing, Calling exec from the traced process causes a SIGTRAP being sent to it, also causing it to stop.

The parent process can then be notified of the status of the child process using wait() and then resume the execution of the child process using PTRACE_SINGLESTEP or PTRACE_CONT.

Setting break points
Right, so we now have control over the execution of the process, not a very fine one though, a finer control of execution can be achived using breakpoints.

We can use the address of a symbol we wish to break at for setting a breakpoint, however, for obvious reasons, we would like to set break points using the symbol and not it's memory address. In order to do so, we must first read the symbol table of the binary, hopefully the binary in question is not stripped, and save it in hash table, so that when the user wishes to break on a symbol we look up that symbol in the hash and find its address. 

libbfd, part of the binutils, is used to read the symbol table from the binary it also happens to provide hash tables, but, uthash is more flexible: 
/* load symbol table*/
long size;
long nsym;
asymbol **asymtab;

bfd_init();
bfd *abfd = bfd_openr(path, NULL);

bfd_check_format(abfd, bfd_object);
size = bfd_get_symtab_upper_bound(abfd);
asymtab = malloc(size);
nsym = bfd_canonicalize_symtab(abfd, asymtab); /*reads symtab*/

/*create symbol table hash*/
long i;
SymbolTable *symtab=NULL, *symbol;
for (i = 0; i < nsym; i++) {
    symbol = malloc(sizeof(Symbol));
    symbol->sym  = bfd_asymbol_name(asymtab[i]);
    symbol->addr = bfd_asymbol_value(asymtab[i]);
    HASH_ADD_KEYPTR(hh, symtab, symbol->sym, strlen(symbol->sym), symbol);
}
Once we have the address we need a way of making the process stop executing at that address, that is, we need the process to generate a software interrupt or a trap, namly INT 3, which is defined specifically for use by debuggers.

Injecting the opcode of INT 3, which is either 0xCC or 0xCD03 when using 0xCD<imm8>, at the address of the breakpoint should be enough to get the job done:
case BREAK: {/*set break point*/
    /*look up the symbol in the symbol table*/
    HASH_FIND_STR(symtab, arg, symbol);
    if (symbol) {
        /*insert new break point*/
        brp = malloc(sizeof(Symbol));
        brp->sym  = symbol->sym;
        brp->addr = symbol->addr;        
        /*save instruction at eip*/
        brp->opc = ptrace(PTRACE_PEEKTEXT, pid, symbol->addr, NULL);        
        /*insert break point*/
        ptrace(PTRACE_POKETEXT, pid, symbol->addr, 0xcc);
        /*add break point to hash*/
        HASH_ADD_INT(brptab, addr, brp);
        printf("break %lx<%s>\n", brp->addr, brp->sym);
    } else {
        printf("symbol not found <%s>\n", arg);
    }                  
    free(arg);
    break;
} 
However, one minor "detail" is left, injecting opcode at an aribitrary address will most likely miss up the instructions that follow the breakpoint address, causing the processor to raise an exception and our poor process eventually being killed!

The solution is to first backup the instruction at the breakpoint address and then restore it when we reach this break point after the process being stopped, and before executing the next instruction we also need to decrement eip so that it points at the beginning of the restored word:
if (WIFSTOPPED(status)) {
    /*read eip*/
    long rip =  ptrace(PTRACE_PEEKUSER, pid, 8 * RIP, NULL)-1;
    
    /*look up eip in the breakpoint hashtable*/
    HASH_FIND_INT(brptab, &rip, brp);
    if (brp) {
        HASH_DEL(brptab, brp);
        /*restore instruction(s)*/
        ptrace(PTRACE_POKETEXT, pid, brp->addr, brp->opc);
        /*decrement eip*/ 
        ptrace(PTRACE_POKEUSER, pid, 8 * RIP, rip);
        printf("process %d stopped at 0x%lx<%s>\n", pid, rip, brp->sym);
    } else {
        printf("process %d stopped at 0x%lx\n", pid, rip);
    }
}

Disassembling instructions
We would like our debugger to print something readable instead of machine code, to do so we need something that can disassemble machine code from memory, libopcodes, also part of binutils, allows you to disassemble and print machine code of multiple architectures:
/*disassembly routine*/ 
void disassemble(void *buf, unsigned long size)
{
    struct disassemble_info info;
    init_disassemble_info (&info, stdout, fprintf);
    info.mach = bfd_mach_x86_64;
    info.endian = BFD_ENDIAN_LITTLE;
    info.buffer = buf;
    info.buffer_length = size;
    print_insn_i386(0, &info);
    printf("\n");
}
Cool now that we got that out of the way, one small problem left, since this is a CISC architecture then instructions, most likely, are of variable length, they could be 1 byte or 16 bytes long!

Two solutions jump to my mind, we can either let libopcodes decode whole words in this case it will output one or more instruction(s) and some garbage or we can use the program counter to find out the exact length of the instruction.

The program counter, or eip, is a register that keeps count of how many bytes executed so far, so at any given time, eip points to the next instruction to be executed, and when that instruction is executed it gets incremented with the size of the instruction in bytes, if we save the current eip until the next instruction and then subtract it form the new eip we get the size of the previous instruction in bytes, pretty neat, no ?

Not so fast, this might work for normal instructions, branching however, is quite a different story, branch instructions like jmp, will either increment the eip too much, if jumping forward, or decrement it, if jumping backward, if so, we will ignore the eip difference if it's to big or if it's a negative number, okay now we're ready:
case NEXT: {/*next instruction*/
    /*read instruction pointer*/
    long rip =  ptrace(PTRACE_PEEKUSER, pid, 8 * RIP, NULL);
    
    if (old_rip) {
        /*calculate instruction size*/
        long oplen = rip - old_rip;
        if (oplen > 0 && oplen < 16) {
            disassemble(&opcode, oplen);
        }
    }

    /*read two words at eip*/
    opcode[0] = ptrace(PTRACE_PEEKDATA, pid, rip, NULL);
    opcode[1] = ptrace(PTRACE_PEEKDATA, pid, rip+sizeof(long), NULL);
    old_rip = rip;

    /*next instruction*/
    ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL);
    wait(&status);
    break;
}
Test run
Time for fun :) we will debug a small program with mdb, this is the source code for the program:
#include <stdio.h>
#include <string.h>
int say_hello(int x, int y, int z)
{
    printf("Hello\n");
    return 0;
}

int main(int argc, char **argv)
{
    int x = 0x55;
    int y = 0x56;
    int z = 0x57;
    say_hello(0x255, 0x256, 0x257);
    return 0;
}
And this is the complete session:
./mdb test
process 15384 stopped at 0x7f519defcaef
(mdb) break main
breakpoint at 400546<main>
process 15384 stopped at 0x7f519defcaef
(mdb) continue
process 15384 stopped at 0x400546<main>
(mdb) next
process 15384 stopped at 0x400546
(mdb) next
push   %rbp
process 15384 stopped at 0x400549
(mdb) next
mov    %rsp,%rbp
process 15384 stopped at 0x40054d
(mdb) next
sub    $0x20,%rsp
process 15384 stopped at 0x400550
(mdb) next
mov    %edi,-0x14(%rbp)
process 15384 stopped at 0x400554
(mdb) next
mov    %rsi,-0x20(%rbp)
process 15384 stopped at 0x40055b
(mdb) next
movl   $0x55,-0x4(%rbp)
process 15384 stopped at 0x400562
(mdb) next
movl   $0x56,-0x8(%rbp)
process 15384 stopped at 0x400569
(mdb) next
movl   $0x57,-0xc(%rbp)
process 15384 stopped at 0x40056e
(mdb) next
mov    $0x257,%edx
process 15384 stopped at 0x400573
(mdb) next
mov    $0x256,%esi
process 15384 stopped at 0x400578
(mdb) next
mov    $0x255,%edi
process 15384 stopped at 0x400523
(mdb) break say_hello
breakpoint at 400524<say_hello>
process 15384 stopped at 0x400523
(mdb) continue
process 15384 stopped at 0x400524<say_hello>
(mdb) registers
rax 0x7f519def8ec8
rbx 0x0
rcx 0x0
rdx 0x257
rsi 0x256
rdi 0x255
rbp 0x7fff2395c140
rsp 0x7fff2395c118
rip 0x400524
process 15384 stopped at 0x400523
(mdb) kill
process 15384 terminated
The source code of the debugger is around 250 lines it's hosted on mercurial along with uthash, test.c and a Makefile. 
hg clone https://iabdelkader@bitbucket.org/iabdelkader/mdb
A port for x86-32 bit is also available, written by hnd, you may contact him at drona89@live.com for any questions or comments:
http://code.google.com/p/xbugger/

14 comments:

  1. I'm wondering if you can bypass some anti-debugging stuff ... like the RDTSC trick, by calling this instruction twice you get the current time stamp counter value and calculate the time a specific set (single?) of instruction it took.

    Or to bypass the ptrace(TRACEME) trick, which detects a debugger in use. I know most of all the tricks can be bypassed by a skilled person, I'm just suggesting that you look into those and figure a way to semi-automatically detect these tricks or even completely automatically if you can swing it.

    Anyways ... great stuff :-)

    ReplyDelete
  2. yes, you can easily and automatically bypass both of them

    a) rdtsc can be disabled for a process with prctl(2).

    b) ptrace(PTRACE_TRACEME,...) is easier to bypass you get a signal before each syscall,

    if syscall == ptrace
    single_step(); //ptrace(PTRACE_TRACEME) -1
    set_eax(0);

    or you can always hack the plt :)

    if you like, I can give you write access to the repo and you can implement both anti anti-debugging techniques.

    Thanks for the feedback !

    ReplyDelete
  3. GREAT ARTICLE!! I was coding my own debugger too and its my luck that i stumbled upon this, though this is for 64 bit machines, its quite easy to understand this and port it to 32 bit, which I did. Really, helped me a lot. There is but one issue, while compiling my own debugger, I need libbfd.a which is at /usr/lib but when I gcc -l /usr/lib/libbfd.a it simply says /usr/lib/libbfd.a not found. Any help (or abuse, as I always forget important gcc compilation options) is welcome :)

    ReplyDelete
  4. Thanks, it's nice to know that it useful :)

    Try gcc -lbfd and it should work.

    ReplyDelete
  5. yea i figured it out, i was trying to use the math.h when it gave me the same error so i did gcc -lm and it worked so i thought to give it a try as "gcc -lbfd" and it worked :D thanks anyways, one last question, where can i upload mine so that other developers (maybe) can develop it further? and also am planning to make a scheduler for Linux (just like that, for my final year project maybe, Linux already has a good one), so any idea on that?

    ReplyDelete
  6. well, feel free to host it any where you like but if you have a bitbucket account I can give you write access to the repo so you can add it yourself as a branch maybe, however, if you host it somewhere else, post the link here along with your name/email, if you like, and I will edit the post and include your port :)

    As for the scheduler, I haven't tried to write one before but I recommend the "Linux Kernel Development 3rd Edition" and also you may want to look at the FreeRTOS scheduler.

    ReplyDelete
  7. yup yup :D I'll post it somewhere and will give the email asap :D :D thanks mux :) I'll look at FreeRTOS scheduler and I'm using Understanding the Linux Kernel by Bovet and Cesati so I think I'm going on the right track :) thanks once again for this awesome article and all the help... may the force be with you :D

    ReplyDelete
  8. hi again mux :) its quite long but you said that i should post the link to where I'd host my port for your debugger, i hosted it at google codes at http://code.google.com/p/xbugger/
    hopefully, this port may be useful as its quite generic (the disassemble function in specific)
    any comments or thereby can be mailed to me at drona89@live.com thanks again mux, may the force be with you :)

    ReplyDelete
  9. This is very useful for me. Thanks for the nice post :)

    ReplyDelete
  10. Nice post. One question: Why should we write our own debugger? Can you please throw some specific examples.

    ReplyDelete
    Replies
    1. no specific application, just for fun and learning...

      Delete
  11. This code does not handles the exceptions. If I try to define a signal with its specific handler I see that the code never gets back from the signal handler function. How can I handle that?

    ReplyDelete
  12. Great post
    can you explain how to implement watch points to this debugger?

    ReplyDelete
  13. teach me the ways of the wizard your a dangerous man my friend (thats a compliment)

    ReplyDelete