Sunday, March 14, 2010

X86 Machine Code

A while ago someone asked how to encode and execute x86 machine code. I replied with a small tutorial on the subject, which I thought I should expand and share here.

x86 instructions range from 1 to 15 bytes long. An example of a 1-byte instruction is NOP (no operation), encoded as 10010000b or 0x90. From a bird's-eye view, the x86 instruction format looks like this:
byte: 0,1,2,3   4,5      6         7                    8,9,10,11          12,13,14,15
func: prefix    opcode   reg/mem   scaled indexed mem   mem displacement   imm data
*Although this layout adds up to 16 bytes, an actual instruction cannot exceed 15 bytes, because not every field can be at its maximum length at the same time.

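Now let's encode a simple x86 instruction, "mov $0x8888, %eax". The encoding of MOV varies with the operands, the memory addressing scheme and so on. Here we want to move an immediate operand (0x8888) to register EAX. The value fits in 2 bytes, but because EAX is a 32-bit register the immediate is stored as 4 bytes, least significant byte first, which gives the following format:
opcode  w (0=8bit, 1=16/32bit)  reg (eax)  byte0     byte1     byte2     byte3
1011    1                       000        10001000  10001000  00000000  00000000   or 0xB8 0x88 0x88 0x00 0x00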
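
The bit fields above can be packed mechanically. The following is a minimal sketch, not a general assembler: it assumes 32-bit operand size (so the immediate occupies four bytes after the opcode byte, five bytes in total), and the names encode_mov_imm32 and REG_EAX are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register numbers as they appear in the 3-bit reg field. */
enum { REG_EAX = 0, REG_ECX = 1, REG_EDX = 2, REG_EBX = 3 };

/*
 * Encode "mov $imm, %reg" for a 32-bit register (hypothetical helper).
 * Writes 5 bytes to out and returns the number of bytes written.
 */
size_t encode_mov_imm32(uint8_t *out, unsigned reg, uint32_t imm)
{
    assert(out != NULL);
    assert(reg < 8);                 /* the reg field is only 3 bits wide */

    /* 1011 | w=1 | reg  ->  0xB8 + reg */
    out[0] = (uint8_t)((0xBu << 4) | (1u << 3) | reg);

    /* The immediate follows in little-endian order. */
    out[1] = (uint8_t)(imm & 0xFF);
    out[2] = (uint8_t)((imm >> 8) & 0xFF);
    out[3] = (uint8_t)((imm >> 16) & 0xFF);
    out[4] = (uint8_t)((imm >> 24) & 0xFF);
    return 5;
}
```

Calling encode_mov_imm32(buf, REG_EAX, 0x8888) fills buf with B8 88 88 00 00. The first three bytes are the ones worked out by hand above, and the two zero bytes complete the 32-bit immediate.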

Now comes the fun part, executing the instruction :) I will use a simple C program to call the code and watch the results with gdb. There are two ways to call this code. The first and easy way is to just call it: cast the buffer to a function pointer and call it.
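
A rough sketch of the first way follows. Several things in it are my assumptions rather than part of the original tutorial: the helper name call_machine_code, copying the bytes into a page obtained with POSIX mmap so that it is executable (many current systems refuse to execute ordinary data buffers), and appending a ret (0xC3) so that control comes back to the caller. On a non-x86 target the helper simply reports failure.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Copy len bytes of machine code into executable memory, call it as a
 * function taking no arguments, and store the value it leaves in EAX in *out.
 * Returns 0 on success, -1 if the target is not x86 or mmap fails.
 * Hypothetical helper; the code must end with a ret instruction.
 */
int call_machine_code(const unsigned char *code, size_t len, unsigned *out)
{
    assert(code != NULL && out != NULL);
#if defined(__i386__) || defined(__x86_64__)
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    memcpy(mem, code, len);

    /* Cast the buffer to a function pointer and call it. */
    unsigned (*fn)(void) = (unsigned (*)(void))mem;
    *out = fn();                    /* the integer return value arrives in EAX */

    munmap(mem, len);
    return 0;
#else
    (void)len;
    return -1;                      /* not an x86 target: nothing to run */
#endif
}

/* mov $0x8888, %eax ; ret */
const unsigned char mov_eax_8888[] = { 0xB8, 0x88, 0x88, 0x00, 0x00, 0xC3 };
```

Passing mov_eax_8888 and its size to call_machine_code should leave 0x8888 in the output variable on an x86 machine, since the integer return value of a function is read from EAX.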

The second way, which is harder but more educational, involves poking around the stack a little. I won't fully explain it, but basically we call a function that changes its own return address on the stack to the address of the opcode, so that execution continues at the buffer (think buffer overflow):
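char opcode[] = "\xB8\x88\x88\x00\x00";  /* mov $0x8888, %eax (32-bit immediate) */
void run(void)
{
    long *ret;
    ret = (long *)&ret + 2;  /* return address on stack (past ret and the saved frame pointer) */
    *ret = (long)opcode;     /* now run() will return to opcode */
}
int main(void) { run(); return 0; }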

Compile with:
gcc mov.c -o mov -ggdb
This is the complete gdb session:
$ gdb mov                 //run with gdb
(gdb) break run           //set a breakpoint at the run() function
(gdb) display /i $pc      //add a display to see the inst mnemonic 
(gdb) run                 //run the program  
(gdb) nexti               // skip instructions until you see ret
(gdb) nexti
0x0804835e in run () at mov.c:7
0x804835e <run+26>:  ret  //return to the opcode address
(gdb) nexti
0x0804954c in opcode ()
1: x/i $pc
0x804954c <opcode>:  mov  $0x8888,%eax  //Finally our hand coded instruction
(gdb) nexti               //one more nexti to execute the instruction 
(gdb) info registers      //dump registers  
 eax 0x8888       34952   //and eax now holds 0x8888 !
 ecx 0xbf877640  -1081641408
 edx 0xbf877620  -1081641440  
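
The mnemonic gdb printed for our bytes can also be recovered by hand by reversing the encoding. The sketch below is only an illustration for this single instruction form (opcodes 0xB8 to 0xBF with a 32-bit immediate), not a real disassembler. The function name decode_mov_imm32 is hypothetical, and 32-bit operand size is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Decode one "mov $imm32, %reg" instruction into AT&T syntax.
 * Returns the instruction length (5), or 0 if the bytes are some other
 * instruction or the buffer is too short. Hypothetical helper.
 */
size_t decode_mov_imm32(const uint8_t *code, size_t len,
                        char *text, size_t text_len)
{
    static const char *const reg_names[8] = {
        "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
    };
    assert(code != NULL && text != NULL);

    /* High nibble must be 1011 and the w bit must be set. */
    if (len < 5 || (code[0] >> 4) != 0xB || ((code[0] >> 3) & 1) != 1)
        return 0;

    unsigned reg = code[0] & 7;             /* low 3 bits select the register */

    /* Reassemble the little-endian immediate. */
    uint32_t imm = (uint32_t)code[1]
                 | (uint32_t)code[2] << 8
                 | (uint32_t)code[3] << 16
                 | (uint32_t)code[4] << 24;

    snprintf(text, text_len, "mov $0x%x,%%%s", (unsigned)imm, reg_names[reg]);
    return 5;
}
```

Fed the bytes B8 88 88 00 00, the helper writes the text mov $0x8888,%eax, the same instruction we started from.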

That's it for today, I hope this small tutorial has inspired you to start experimenting yourself.


  1. Let me raise all the hats I've for you. Rabena yekremak and please keep up the impressive work. My name is Ahmed Abdalla and I am impressed :)

  2. Ahmed,

    I'm glad to see that someone finds my ranting useful :)

    Thanks for the nice comment.

  3. Nice! Your blog is a gem; glad I stumbled upon it. Just curious - is your background in Electrical/Electronics Engineering?

    1. Thank you! and no, computer science.

    2. Ah, I see. I asked because you have quite a few articles on electronics. :)