low tech: 2010

Sunday, November 21, 2010

Arduino Sonar

This is a fun project I made with an Arduino UNO. It's a simple Sonar using an ultrasonic sensor.

Arduino

Arduino is an open-source hardware/software platform based on Atmel ATmega microcontrollers. Being an open-source fanatic I couldn't resist buying it :) It's remarkably easy to learn: this is my first Arduino project and it took just a couple of hours! Arduino is programmed in a language based on Wiring. At the time of this writing the UNO, based on the ATmega328, is the latest Arduino board.

Arduino Sonar

The idea behind the project is quite simple: the servo rotates the ultrasonic sensor 180 degrees back and forth while it sends and receives ultrasonic pulses, and objects are displayed on the LCD by drawing lines whose length is proportional to the time it takes the pulse to travel back to the sensor. This type of sonar is called an active sonar because it emits the ultrasonic waves itself and listens for the echo, as opposed to a passive one that only listens to incoming waves from other sources.
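As a rough sketch of that mapping (not the project's actual code), the round-trip echo time can be turned into a distance and then into the end point of a line on the display. The speed of sound (about 343 m/s in room-temperature air), the `MAX_RANGE_CM`, `MAX_LINE_PX` and `origin` values and both function names are assumptions made for illustration:

```python
import math

SPEED_OF_SOUND_CM_PER_US = 0.0343  # ~343 m/s in air at about 20 C (assumed)
MAX_RANGE_CM = 300.0               # hypothetical maximum range to display
MAX_LINE_PX = 60                   # hypothetical longest line on the LCD


def echo_to_cm(echo_us):
    """Convert a round-trip echo time in microseconds to a distance in cm."""
    # The pulse travels to the object and back, hence the division by two.
    return echo_us * SPEED_OF_SOUND_CM_PER_US / 2.0


def line_end(angle_deg, echo_us, origin=(65, 120)):
    """Return the (x, y) end point of the line drawn for one servo angle."""
    distance = min(echo_to_cm(echo_us), MAX_RANGE_CM)
    length = distance / MAX_RANGE_CM * MAX_LINE_PX
    angle = math.radians(angle_deg)
    x = origin[0] + length * math.cos(angle)
    y = origin[1] - length * math.sin(angle)  # screen y grows downwards
    return round(x), round(y)


print(echo_to_cm(5831))    # an echo of about 5.8 ms is roughly 1 m away
print(line_end(90, 5831))  # servo pointing straight ahead
```

Sweeping `angle_deg` from 0 to 180 and drawing one line per step would give the fan-shaped picture described above.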

I used the PING Ultrasonic Sensor from Parallax; however, any distance sensor should work equally well, even an infrared one. As for the LCD, I used the Nokia 6100 knock-off color LCD from SparkFun. Here are some pictures showing the results of scanning the test subjects :)

The whole project can be found here. I ported the LCD driver, which was originally written for mbed.

hg clone https://arduino-sonar.googlecode.com/hg/ arduino-sonar

Tuesday, September 7, 2010

Introduction to ARM Cortex-M3 Part 1-Overview

Hello, this is an introduction to the ARM Cortex-M3 microprocessor. In the first part we'll talk about the core features of the Cortex-M3, the LPC1768 MCU and the prototyping board mbed. If you decide to get started with ARM, this series should have a fair amount of information to help you do so quickly... have fun :)

Cortex-M3 overview

The Cortex-M3 is a 32-bit microprocessor designed by ARM and based on the ARMv7 architecture, which comes in three profiles: A for high-end applications, R for real-time applications and the microcontroller-targeted M profile. The Cortex-M3 is a Harvard architecture, i.e. it has separate code and data buses, allowing parallel instruction and data fetches.

The Cortex-M3 is based on the M profile. It has a 3-stage pipeline, an advanced interrupt controller (NVIC) with low interrupt latency, support for an optional MPU (the LPC17xx has one), support for two operation modes and two access modes, and support for the Thumb2 instruction set. The 8-channel DMA controller mentioned later is a peripheral of the LPC17xx rather than part of the core.

Thumb2

Generally speaking, ARM processors support two instruction sets, the 32-bit ARM and the 16-bit Thumb instruction sets. When executing ARM code the processor is said to be in ARM state, and when executing Thumb code it's said to be in Thumb state.

The ARM state offers more performance for speed-critical tasks, and certain operations require it, e.g. interrupt handlers, while the Thumb state offers higher code density (more instructions in less memory).

ARM and Thumb code live in separate source files, and switching the processor between the two states frequently can complicate things and add overhead to both development and execution time.

The Cortex-M3 supports the Thumb2 set (a superset of Thumb), which allows mixing 16-bit and 32-bit instructions. It thus offers the best of both worlds, performance and high code density, in one instruction set without any need for state switching. In fact, the ARM state is not even supported by the Cortex-M3, i.e. it's always in Thumb state.

Operation Modes

The Cortex-M3 supports two operation modes, the thread mode for process execution and the handler mode for exception handler code, and two stack pointers, the process stack pointer (PSP) and the main stack pointer (MSP).

By default the thread and handler modes use the same stack pointer (the MSP), thus sharing the stack memory. However, by configuring the processor to use the PSP in thread mode, the stack memory for the two modes can be separated, consequently protecting the system stack memory from a faulty user process.

The two stack pointers are banked, that is, only one can be accessed at a time through SP, which refers to the currently used stack pointer. In an interrupt handler the MSP is always used, but access to the PSP may still be needed for several reasons. Mainly, when an OS, e.g. an RTOS, is running things it may need to change the PSP for context switching, or fetch the SVC number using the stacked program counter (PC) when an SVC call is made.

Other reasons for accessing the PSP in an interrupt handler may include locating a faulty instruction, e.g. an instruction that caused a bus fault can be fetched in the bus fault handler from the process stack using the stacked PC.

Access Modes

The Cortex-M3 supports two access modes, user and privileged access. In user mode, access to certain registers and instructions is restricted, and if an MPU is available, access to memory regions containing OS data or another process's data can also be restricted for a user process. This is mainly intended for use by a multitasking OS.

After executing the reset handler the processor is still running in privileged thread mode; a switch to user level should occur shortly after. When entering an exception handler the processor switches to the privileged level, and then back to the previous level upon exiting the handler. This could serve very well when implementing system calls.

A user level thread cannot switch back to a privileged level except through an exception handler (running in handler mode) that changes the access level to privileged on behalf of the thread before returning to thread mode.

NVIC

The Nested Vectored Interrupt Controller (NVIC) is an advanced interrupt controller that allows nesting interrupts with higher priorities, that is if an interrupt handler is currently executing when a higher priority interrupt occurs, the lower priority interrupt is preempted and the higher priority interrupt executes next.

Some priorities are fixed, like the reset exception which has the highest (lowest number) priority. Others are programmable and can be changed dynamically, i.e. at run time. Changing a handler at run time, however, requires the interrupt vector to be relocated from flash to RAM and then patched with the new handler.

The NVIC handles stacking (pushing registers onto the stack) and un-stacking (popping registers from the stack) at the hardware level. This not only relieves the programmer from doing so himself, it also allows the handler to be a normal C function and decreases the overall interrupt latency.

Vectored means that when an interrupt is asserted its number is known to the processor and used to index into the interrupt vector to obtain the handler address directly, as opposed to having a shared handler and enumerating devices to know which one interrupted the processor.
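Vectored dispatch boils down to an array lookup. The sketch below only models the address arithmetic: each table entry is one 32-bit word, so the entry for exception number n sits at the table base plus 4 * n. The function name and the `table_base` parameter are made up for illustration; the entry names are the first few of the Cortex-M3 table.

```python
WORD_SIZE = 4  # each vector table entry is one 32-bit word

# First entries of the Cortex-M3 vector table, in order.
VECTOR_NAMES = ["initial MSP", "Reset", "NMI", "HardFault", "MemManage", "BusFault"]


def vector_address(exception_number, table_base=0x0):
    """Address of the table entry holding the handler for this exception."""
    return table_base + WORD_SIZE * exception_number


for number, name in enumerate(VECTOR_NAMES):
    print("0x%02x  %s" % (vector_address(number), name))
```

With a relocated table the same lookup applies, only `table_base` changes.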

This is part of the interrupt vector from the startup code of the Cortex-M3, the first word is the initial MSP value, second is the address of the first interrupt handler and so on... The interrupt vector is located at address 0x0 in flash:

.long   __cs3_stack                 /* Top of Stack                 */
.long   __cs3_reset                 /* Reset Handler                */
.long   NMI_Handler                 /* NMI Handler                  */
.long   HardFault_Handler           /* Hard Fault Handler           */
.long   MemManage_Handler           /* MPU Fault Handler            */
.long   BusFault_Handler            /* Bus Fault Handler            */

.....

/* Dummy Exception Handlers */

    .weak   NMI_Handler
    .type   NMI_Handler, %function
NMI_Handler:
    B       .
    .size   NMI_Handler, . - NMI_Handler

    .weak   HardFault_Handler
    .type   HardFault_Handler, %function
HardFault_Handler:
    B       .
    .size   HardFault_Handler, . - HardFault_Handler

The __cs3_stack symbol is the address of the top of the stack region in RAM, and this address is loaded into the MSP on startup. Since the Cortex-M3 uses a descending stack, i.e. the stack grows downwards while the heap grows upwards, this address is the end of RAM.

In the linker script, which is basically the memory layout, __cs3_stack symbol is defined as start of ram + ram size:

PROVIDE(__cs3_stack = __cs3_region_start_ram + __cs3_region_size_ram);
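To make the arithmetic concrete, here is a small sketch of what that expression evaluates to and how a descending stack moves away from it. The region values are assumptions (the LPC1768's 32 kB main SRAM block at 0x10000000), not values read from the actual linker script, and the function names are hypothetical:

```python
RAM_START = 0x10000000  # assumed: base of the LPC1768 main SRAM block
RAM_SIZE = 32 * 1024    # assumed: 32 kB


def stack_top(ram_start, ram_size):
    """Initial MSP value: one past the last byte of RAM."""
    return ram_start + ram_size


def push(sp, words=1):
    """A descending stack moves SP down by 4 bytes per pushed word."""
    return sp - 4 * words


sp = stack_top(RAM_START, RAM_SIZE)
print(hex(sp))
print(hex(push(sp, 8)))  # SP after the 8 registers stacked on exception entry
```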

Near the end of the startup code, dummy interrupt handlers, each a branch to its own address (B .), are provided for all the interrupts. They are defined as weak so that when linking the binary they can be overridden by user-defined handlers (if any).

The NVIC uses two interesting techniques to further decrease the interrupt latency: tail-chaining and late arrivals.

Tail-Chaining

When two interrupts arrive at the same time, or an interrupt occurs while a handler of the same or higher priority is executing (i.e. it cannot pre-empt), the higher priority interrupt executes first while the other remains pending. As soon as the higher priority handler finishes, the pending interrupt is executed immediately, i.e. tail-chained to the first one, without un-stacking and then stacking the registers again. That is not necessary because the contents of the stack have not changed, and skipping it saves a significant amount of time on subsequent interrupt handlers.

Notice that due to the Harvard architecture, the stacking of the registers can take place simultaneously with the fetches of the interrupt vector and ISR code.
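A back-of-the-envelope model shows where the saving comes from. The cycle counts below are commonly quoted figures for the Cortex-M3 and should be treated as assumptions rather than guarantees (they depend on memory wait states); the function names are made up for this sketch:

```python
ENTRY_CYCLES = 12      # stacking and vector fetch (assumed figure)
EXIT_CYCLES = 12       # un-stacking (assumed figure)
TAIL_CHAIN_CYCLES = 6  # handler-to-handler switch, stack untouched (assumed figure)


def overhead_without_tail_chaining(n_handlers):
    """Every handler pays for a full entry and a full exit."""
    return n_handlers * (ENTRY_CYCLES + EXIT_CYCLES)


def overhead_with_tail_chaining(n_handlers):
    """One entry, one exit, and a short switch between consecutive handlers."""
    if n_handlers == 0:
        return 0
    return ENTRY_CYCLES + (n_handlers - 1) * TAIL_CHAIN_CYCLES + EXIT_CYCLES


for n in (1, 2, 3):
    print(n, overhead_without_tail_chaining(n), overhead_with_tail_chaining(n))
```

Under these assumptions two back-to-back handlers cost 30 cycles of overhead instead of 48.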

Late Arrivals

When a higher priority interrupt is asserted while a lower priority interrupt is being entered, i.e. during stacking and before the first handler instruction executes, the higher priority interrupt's vector is fetched instead and stacking is allowed to finish. As soon as it does, the processor executes the higher priority handler, and when that finishes, the lower priority interrupt is tail-chained to it and allowed to execute.

Similarly, if an interrupt occurs while un-stacking is taking place, the un-stacking is abandoned and the new interrupt is handled immediately, as if it had been tail-chained to the handler that just finished.

Bit-Banding

A bit-band region allows atomic bit manipulation through another memory region called the alias region. Each bit in the bit-band region is addressable through a 32-bit aligned address in the alias region, that is, each word in the bit-band region is mapped to 32 words in the alias region.

Bit-banding can shorten a read-modify-write operation. Say you wish to set a bit in some device register, e.g. to enable interrupts. You usually do a read-modify-write operation, that is, you read a word, mask it and then write it back. You may also need to make sure this operation won't be interrupted, i.e. that it's atomic, to ensure data consistency, so your final code might look like this:

#include <stdint.h>
#include "LPC17xx.h"  /* CMSIS device header (assumed name), provides __LDREXW/__STREXW */

#define DEVICE_BASE_ADDR    ((volatile uint32_t*)0x2007C000)
#define ENABLE_INT_MASK     (0x01)

void enable_int(void)
{
    uint32_t i;
    do {
        i = __LDREXW(DEVICE_BASE_ADDR);      /* exclusive read, tags the address */
        i |= ENABLE_INT_MASK;
    } while (__STREXW(i, DEVICE_BASE_ADDR)); /* non-zero: exclusive write failed, retry */
}

Note that ldrex and strex are the newer exclusive access instructions; starting from ARMv6 they replaced the SWP instruction for several reasons, mainly that they don't lock the bus. Anyway, using bit-banding it's just a simple matter of writing to the alias address of the bit:

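/* alias of bit 0 of the word at 0x2007C000:
   0x22000000 + (0x2007C000 - 0x20000000) * 32 + 0 * 4 */
#define DEVICE_BASE_ALIAS   ((volatile uint32_t*)0x22F80000)

void enable_int()
{
   *DEVICE_BASE_ALIAS = 1;
}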
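The alias address of any bit follows from the mapping described above: every byte of the bit-band region takes 32 bytes of alias space (8 bits, one word each). A minimal sketch of that calculation for the SRAM bit-band region, with a hypothetical helper name:

```python
BITBAND_BASE = 0x20000000  # start of the SRAM bit-band region
ALIAS_BASE = 0x22000000    # start of the matching alias region


def bitband_alias(byte_addr, bit):
    """Alias-region word address for one bit of a byte in the bit-band region."""
    byte_offset = byte_addr - BITBAND_BASE
    return ALIAS_BASE + byte_offset * 32 + bit * 4


# Bit 0 of the register at 0x2007C000 used in the example above.
print(hex(bitband_alias(0x2007C000, 0)))
```

Writing 1 or 0 to the resulting word sets or clears just that bit.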

Next we will talk about the LPC1768, a Cortex-M3-based MCU.

NXP LPC1768

The LPC1768, manufactured by NXP/Philips, is an MCU based on the Cortex-M3. It has just about every peripheral you could think of: UARTs, USB, SPI, I2C, ADC/DAC, PWM, CAN, Ethernet... well, you get the picture. It also has an MPU, as mentioned before, which can be used to define memory regions with access attributes like (rwx).

The LPC1768 has 512k of on-chip flash and 64k of SRAM. 32k of the SRAM is reserved for USB and Ethernet; however, it can still be used as general purpose RAM when those peripherals are not used.

The LPC1768 has an 8 channel DMA, each can be configured independently to handle a transfer from memory to memory, memory to peripheral, peripheral to memory and peripheral to peripheral. The DMA allows the CPU to be free for number crunching while it handles transfers by taking control of the bus, saving the cycles required if the CPU were to handle the transfer itself.

The DMA controller has a separate IRQ line that, when configured to do so, interrupts when a transfer is complete. It can handle either a single transfer or multiple transfers chained using a linked list. It also supports different source and destination transfer widths by packing and unpacking data.

There are many prototyping boards based on NXP's microcontrollers, one of those is the mbed board.

mbed

The mbed is a prototyping board for the NXP LPC1768 MCU. It is a complete development platform for rapid prototyping that provides high-level C++ drivers for the peripherals, protocol stacks, code samples and a cloud-based compiler.

The mbed makes programming the MCU really easy. It has a small 2 MB flash with a FAT filesystem; to program the LPC1768 you mount the flash, copy the binary image and then reset the mbed. The binary image is then copied to the MCU flash and started. That's basically all you need to do! You'll definitely appreciate this if you have tried JTAG with openocd before.

This provides an easy way into ARM hacking, although I must say that I don't prefer C++ and I definitely hate the online compiler altogether ! Therefore I've taken a different approach, leaving the mbed libraries and compiler and using NXP's driver library, CMSIS and a GNU based toolchain. I highly recommend you do so too, this way you'll learn more about your processor.

That's all for today, in part 2 we'll talk about more fun stuff like CodeSourcery, NXP driver library, CMSIS, SPI and other things... Also we will make a couple of simple applications to put everything together, see you soon :)

Tuesday, August 3, 2010

How To Read SMS Messages From Huawei E620 Modem

I always wanted to read SMS messages from my Huawei modem on Linux, but unfortunately it seems that they only support Microsatan Winblows...

Anyway, while I was trying to read /dev/ttyUSB2 (I was connecting a USB gadget with serial over USB), I got a bunch of gibberish messages. I found out, besides the fact that the USB gadget didn't work :D, that the Huawei supports serial over USB! It also seems that I have been in a cave, hence the bad reception, because people have been playing with this for decades :)

Cool I can finally read the messages without playing solitaire !

Huawei AT commands

While the modem is connected it keeps sending statistics which include the connection duration, upload/download rates and signal strength:

DSFLOWRPT:000054F0,00000622,00000D89,00000000006FD555,00000000017D4579,0000BB80,0000FA00
DSFLOWRPT:000054F2,00000000,00000000,00000000006FD555,00000000017D4579,0000BB80,0000FA00
RSSI:4
DSFLOWRPT:000054F4,0000001F,0000007B,00000000006FD594,00000000017D466F,0000BB80,0000FA00
RSSI:4
DSFLOWRPT:000054F6,00000000,00000000,00000000006FD594,00000000017D466F,0000BB80,0000FA00
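These reports are easy to pick apart, since every field is a hexadecimal number. The sketch below is only a guess at the layout: the field names in `FIELDS` are assumptions based on the description above (duration, then upload/download rates, then what look like running totals and link limits) and are not taken from vendor documentation.

```python
# Assumed meaning of the seven comma-separated fields, in order.
FIELDS = ("duration_s", "tx_rate", "rx_rate", "tx_total", "rx_total",
          "qos_tx_rate", "qos_rx_rate")


def parse_dsflowrpt(line):
    """Return a dict of integers for a DSFLOWRPT line, or None for other lines."""
    name, _, payload = line.strip().partition(":")
    if not name.endswith("DSFLOWRPT"):
        return None
    values = [int(value, 16) for value in payload.split(",")]
    return dict(zip(FIELDS, values))


report = parse_dsflowrpt(
    "DSFLOWRPT:000054F0,00000622,00000D89,"
    "00000000006FD555,00000000017D4579,0000BB80,0000FA00")
print(report)
print(parse_dsflowrpt("RSSI:4"))
```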

It also supports a subset of the AT commands. I googled for Huawei AT commands, and it seems that every model (or perhaps vendor) supports a different set of AT commands, but "they" don't publicly provide this information!

Anyway, most of them support dialling numbers, notifications, reading/sending SMS messages... Which means that you can easily turn your machine into an SMS gateway ;) !

I'm only interested in a subset of the commands:

AT+CMGF=1
AT+CSCS="UCS2"
AT+CMGL="ALL" 
AT+CMGS="phone number"<cr>message<ctrl+z>

The first command enables the text mode for the other commands, and the second one sets the encoding to UCS2. I set it to UCS2 to unify all the output from the Huawei so I can parse both English and Arabic messages, but there are other options, try AT+CSCS=?.

The third one retrieves the list of messages and the last one sends out an SMS, but I haven't tried this one because I didn't have any credit left :)
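With the encoding set to UCS2, text fields come back as strings of hex digits, four per character. As a minimal sketch (the helper name is made up, and it treats the data as big-endian UTF-16, which matches UCS2 for characters in the basic plane), such a field can be decoded like this:

```python
def decode_ucs2_hex(field):
    """Decode a quoted or unquoted string of 4-hex-digit UCS2 code units."""
    cleaned = field.strip().strip('"')
    return bytes.fromhex(cleaned).decode("utf-16-be")


print(decode_ucs2_hex('"00480069"'))
print(decode_ucs2_hex("0633064406270645"))
```

The same call handles English and Arabic text, which is the point of switching everything to UCS2.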

Python Script

After some attempts with minicom and ipython, I quickly put together a small Python script to read the SMS messages. It's crappy but it works:

#!/usr/bin/env python3
import serial  # pyserial

# passing port= to the constructor already opens the port
port = serial.Serial(port='/dev/ttyUSB2', baudrate=115200, timeout=5)
port.write(b'AT+CMGF=1\r\n')       # set text mode
port.write(b'AT+CSCS="UCS2"\r\n')  # set encoding to UCS2
port.write(b'AT+CMGL="ALL"\r\n')   # get all messages

def __decode(s):
    # every 4 hex digits are one UCS2 character
    ustr = ''
    s = s.strip().replace('"', '')
    for i in range(0, len(s), 4):
        ustr += chr(int(s[i:i+4], 16))
    return ustr

gotmsg = False

while True:
    line = port.readline().decode('ascii', 'replace')
    if line.startswith('+CMGL'):
        info = line.split(',')
        # the message body is on the line after the +CMGL header
        body = port.readline().decode('ascii', 'replace')
        print('message#%s from %s date %s time %s %s'
              % (info[0].split(':')[1], __decode(info[2]),
                 info[4], info[5], __decode(body)))
        gotmsg = True
    if gotmsg and line.startswith('OK'):
        break

Damn it ! more bills :(

message# 7 from 1011161051159710897116 date "10/07/23 time 16:59:55+12"
 عميلناالعزيز:نحيط سيادتكم علمابضرورة سدادفاتورتكم خلال48ساعة نظرا لانتهاء موعد 
message# 8 from 1011161051159710897116 date "10/07/23 time 16:59:55+12"
www.etisalat.com.eg استحقاقها وذلك لضمان استمرارالخدمة,كما يمكنكم السدادعن طريق موقع اتصالات
message# 9 from 1011161051159710897116 date "10/07/23 time 16:59:55+12"
 في حالةالسداديرجى تجاهل الرسالة

Wednesday, July 14, 2010

Writing a Debugger

You must have debugged your code using gdb before, or traced some process with strace. If so, you may have wondered how it works. Well, it's not magic :) and it's not really that difficult either!

Today I'm going to walk you through writing a small, yet completely functional, debugger called mdb, using a mixture of ptrace, libbfd and libopcodes, so enjoy... :)

Overview

The task of writing a debugger can be broken down into a set of smaller tasks:

Tracing a process.
Setting breakpoints.
Disassembling instructions.

The main function will open a binary, read the symbol table, run it in a traceable process and then wait for commands from the user to set break points, single step through the code or continue execution normally. We already have a lot to do so let's get started...

Tracing a process:

Tracing a process means examining and controlling the execution of that process. In order for a process to be traced, it must either explicitly express its wish to be traced using ptrace(PTRACE_TRACEME,...), or another process may attach to it and trace it using ptrace(PTRACE_ATTACH,...).

However, since we're writing a debugger, attaching to a running process is not really an option, since we would like to debug it from the beginning, and we may not always have access to the source code of the process to explicitly start tracing, so what should we do ?

We can call fork(), creating a new process, and then call ptrace(PTRACE_TRACEME,...) on the newly created process, doing so makes it traceable, if we then exec() the binary we wish to trace we end up having our code executing in a traceable process:

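#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ptrace.h>

pid_t pid;
switch (pid = fork()) {
    case -1: /*error*/
        perror("fork()");
        exit(-1);
    case 0:/*child process*/
        ptrace(PTRACE_TRACEME, 0, NULL, NULL); /*allow process to be traced*/
        execl(path, name, (char *)NULL);       /*child will be stopped here*/
        perror("execl()");
        exit(-1);
}
/*parent continues execution here*/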
/*parent continues execution here*/

After calling ptrace(PTRACE_TRACEME,...), any signal, except for SIGKILL, sent to the traced process will cause it to stop executing. Calling exec from the traced process causes a SIGTRAP to be sent to it, also causing it to stop.

The parent process can then be notified of the status of the child process using wait() and then resume the execution of the child process using PTRACE_SINGLESTEP or PTRACE_CONT.

Setting break points

Right, so we now have control over the execution of the process, though not a very fine one. Finer control of execution can be achieved using breakpoints.

We can use the address of a symbol we wish to break at for setting a breakpoint; however, for obvious reasons, we would like to set breakpoints using the symbol and not its memory address. In order to do so, we must first read the symbol table of the binary (hopefully the binary in question is not stripped) and save it in a hash table, so that when the user wishes to break on a symbol we look up that symbol in the hash and find its address.

libbfd, part of binutils, is used to read the symbol table from the binary. It also happens to provide hash tables, but uthash is more flexible:

/* load symbol table*/
long size;
long nsym;
asymbol **asymtab;

bfd_init();
bfd *abfd = bfd_openr(path, NULL);

bfd_check_format(abfd, bfd_object);
size = bfd_get_symtab_upper_bound(abfd);
asymtab = malloc(size);
nsym = bfd_canonicalize_symtab(abfd, asymtab); /*reads symtab*/

/*create symbol table hash*/
long i;
SymbolTable *symtab=NULL, *symbol;
for (i = 0; i < nsym; i++) {
    symbol = malloc(sizeof(*symbol));
    symbol->sym  = bfd_asymbol_name(asymtab[i]);
    symbol->addr = bfd_asymbol_value(asymtab[i]);
    HASH_ADD_KEYPTR(hh, symtab, symbol->sym, strlen(symbol->sym), symbol);
}
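The end result is just a name to address map. As a language-neutral illustration (this is not how mdb does it, which uses libbfd as shown above), the same table can be built from nm-style text output; the helper name is made up and the sample lines are illustrative, using the addresses that appear in the test session later on:

```python
def parse_nm(text):
    """Build a {symbol name: address} dict from nm-style 'addr type name' lines."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:  # undefined symbols have no address column, skip them
            addr, _kind, name = parts
            symbols[name] = int(addr, 16)
    return symbols


SAMPLE = """\
0000000000400524 T say_hello
0000000000400546 T main
                 U puts
"""

symtab = parse_nm(SAMPLE)
print(hex(symtab["main"]))
```

Looking up the symbol typed after `break` is then a single dictionary access.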

Once we have the address we need a way of making the process stop executing at that address, that is, we need the process to generate a software interrupt or a trap, namely INT 3, which is defined specifically for use by debuggers.

Injecting the opcode of INT 3, which is either 0xCC or 0xCD03 when using 0xCD<imm8>, at the address of the breakpoint should be enough to get the job done:

case BREAK: {/*set break point*/
    /*look up the symbol in the symbol table*/
    HASH_FIND_STR(symtab, arg, symbol);
    if (symbol) {
        /*insert new break point*/
        brp = malloc(sizeof(*brp));
        brp->sym  = symbol->sym;
        brp->addr = symbol->addr;
        /*save the original word at the breakpoint address*/
        brp->opc = ptrace(PTRACE_PEEKTEXT, pid, symbol->addr, NULL);        
        /*insert break point*/
        ptrace(PTRACE_POKETEXT, pid, symbol->addr, 0xcc);
        /*add break point to hash*/
        HASH_ADD_INT(brptab, addr, brp);
        printf("break %lx<%s>\n", brp->addr, brp->sym);
    } else {
        printf("symbol not found <%s>\n", arg);
    }                  
    free(arg);
    break;
}

However, one minor "detail" is left: injecting an opcode at an arbitrary address will most likely mess up the instructions that follow the breakpoint address, causing the processor to raise an exception and our poor process eventually being killed!

The solution is to first back up the word at the breakpoint address and then restore it when we reach this breakpoint, after the process has been stopped and before executing the next instruction. We also need to decrement eip (rip on x86-64, which is what the code below uses) so that it points at the beginning of the restored word:

if (WIFSTOPPED(status)) {
    /*read eip*/
    long rip =  ptrace(PTRACE_PEEKUSER, pid, 8 * RIP, NULL)-1;
    
    /*look up eip in the breakpoint hashtable*/
    HASH_FIND_INT(brptab, &rip, brp);
    if (brp) {
        HASH_DEL(brptab, brp);
        /*restore instruction(s)*/
        ptrace(PTRACE_POKETEXT, pid, brp->addr, brp->opc);
        /*decrement eip*/ 
        ptrace(PTRACE_POKEUSER, pid, 8 * RIP, rip);
        printf("process %d stopped at 0x%lx<%s>\n", pid, rip, brp->sym);
    } else {
        printf("process %d stopped at 0x%lx\n", pid, rip);
    }
}
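The backup and restore dance can be modelled on plain integers. Note that this sketch patches only the lowest byte of the saved word, which is a common variation and differs from the mdb code above, which writes 0xcc as a whole word and relies on restoring it before continuing. The function names and the sample word are made up for illustration:

```python
INT3 = 0xCC  # one-byte breakpoint opcode


def insert_int3(word):
    """Replace the lowest byte of a peeked word with INT 3, keep the rest."""
    return (word & ~0xFF) | INT3


def restore(patched, original):
    """Put the original lowest byte back."""
    return (patched & ~0xFF) | (original & 0xFF)


def rewind(rip_after_trap):
    """After the trap, rip points one byte past the INT 3."""
    return rip_after_trap - 1


# 'push %rbp; mov %rsp,%rbp' is 55 48 89 e5, read as a little-endian word.
original = 0xE5894855
patched = insert_int3(original)
print(hex(patched))
print(hex(restore(patched, original)))
```

On little-endian x86 the lowest byte of the word is the byte at the breakpoint address, which is why only that byte needs to change.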

Disassembling instructions

We would like our debugger to print something readable instead of machine code, to do so we need something that can disassemble machine code from memory, libopcodes, also part of binutils, allows you to disassemble and print machine code of multiple architectures:

/*disassembly routine*/ 
void disassemble(void *buf, unsigned long size)
{
    struct disassemble_info info;
    init_disassemble_info (&info, stdout, fprintf);
    info.mach = bfd_mach_x86_64;
    info.endian = BFD_ENDIAN_LITTLE;
    info.buffer = buf;
    info.buffer_length = size;
    print_insn_i386(0, &info);
    printf("\n");
}

Cool, now that we've got that out of the way, one small problem is left: since this is a CISC architecture, instructions are of variable length, anywhere from 1 byte to 15 bytes long!

Two solutions jump to my mind: we can either let libopcodes decode whole words, in which case it will output one or more instructions and some garbage, or we can use the program counter to find out the exact length of the instruction.

The program counter, or eip, is a register that holds the address of the next instruction to be executed, and when that instruction is executed it gets incremented by the size of the instruction in bytes. If we save the current eip until the next instruction and then subtract it from the new eip we get the size of the previous instruction in bytes, pretty neat, no?

Not so fast. This works for normal instructions; branching, however, is quite a different story. Branch instructions like jmp will either increment the eip too much, if jumping forward, or decrement it, if jumping backward. So we will ignore the eip difference if it's too big or if it's a negative number. Okay, now we're ready:

case NEXT: {/*next instruction*/
    /*read instruction pointer*/
    long rip =  ptrace(PTRACE_PEEKUSER, pid, 8 * RIP, NULL);
    
    if (old_rip) {
        /*calculate instruction size*/
        long oplen = rip - old_rip;
        if (oplen > 0 && oplen < 16) {
            disassemble(&opcode, oplen);
        }
    }

    /*read two words at eip*/
    opcode[0] = ptrace(PTRACE_PEEKDATA, pid, rip, NULL);
    opcode[1] = ptrace(PTRACE_PEEKDATA, pid, rip+sizeof(long), NULL);
    old_rip = rip;

    /*next instruction*/
    ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL);
    wait(&status);
    break;
}
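The length heuristic on its own fits in a few lines. This is a sketch of the idea rather than a copy of mdb's check (which uses `oplen < 16`); the function name is hypothetical and the addresses are chosen to resemble the test session:

```python
MAX_INSN_LEN = 15  # longest legal x86 instruction


def insn_length(old_rip, new_rip):
    """Length of the instruction just executed, or None if it looks like a branch."""
    delta = new_rip - old_rip
    if 0 < delta <= MAX_INSN_LEN:
        return delta
    return None


print(insn_length(0x400546, 0x400547))  # push %rbp is one byte
print(insn_length(0x400547, 0x40054A))  # mov %rsp,%rbp is three bytes
print(insn_length(0x400579, 0x400524))  # a call jumping backwards
```

A short forward jump that lands within 15 bytes is indistinguishable from a long instruction here, which is a limitation of the approach.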

Test run

Time for fun :) We will debug a small program with mdb. This is the source code for the program:

#include <stdio.h>
#include <string.h>
int say_hello(int x, int y, int z)
{
    printf("Hello\n");
    return 0;
}

int main(int argc, char **argv)
{
    int x = 0x55;
    int y = 0x56;
    int z = 0x57;
    say_hello(0x255, 0x256, 0x257);
    return 0;
}

And this is the complete session:

./mdb test
process 15384 stopped at 0x7f519defcaef
(mdb) break main
breakpoint at 400546<main>
process 15384 stopped at 0x7f519defcaef
(mdb) continue
process 15384 stopped at 0x400546<main>
(mdb) next
process 15384 stopped at 0x400546
(mdb) next
push   %rbp
process 15384 stopped at 0x400549
(mdb) next
mov    %rsp,%rbp
process 15384 stopped at 0x40054d
(mdb) next
sub    $0x20,%rsp
process 15384 stopped at 0x400550
(mdb) next
mov    %edi,-0x14(%rbp)
process 15384 stopped at 0x400554
(mdb) next
mov    %rsi,-0x20(%rbp)
process 15384 stopped at 0x40055b
(mdb) next
movl   $0x55,-0x4(%rbp)
process 15384 stopped at 0x400562
(mdb) next
movl   $0x56,-0x8(%rbp)
process 15384 stopped at 0x400569
(mdb) next
movl   $0x57,-0xc(%rbp)
process 15384 stopped at 0x40056e
(mdb) next
mov    $0x257,%edx
process 15384 stopped at 0x400573
(mdb) next
mov    $0x256,%esi
process 15384 stopped at 0x400578
(mdb) next
mov    $0x255,%edi
process 15384 stopped at 0x400523
(mdb) break say_hello
breakpoint at 400524<say_hello>
process 15384 stopped at 0x400523
(mdb) continue
process 15384 stopped at 0x400524<say_hello>
(mdb) registers
rax 0x7f519def8ec8
rbx 0x0
rcx 0x0
rdx 0x257
rsi 0x256
rdi 0x255
rbp 0x7fff2395c140
rsp 0x7fff2395c118
rip 0x400524
process 15384 stopped at 0x400523
(mdb) kill
process 15384 terminated

The source code of the debugger is around 250 lines. It's hosted in a Mercurial repository along with uthash, test.c and a Makefile.

hg clone https://iabdelkader@bitbucket.org/iabdelkader/mdb

A port for x86-32 bit is also available, written by hnd, you may contact him at drona89@live.com for any questions or comments:

http://code.google.com/p/xbugger/

Monday, July 5, 2010

RN-41 Bluetooth Module

Recently I've been playing with the RN-41 Bluetooth module from SparkFun. This module is really easy to use: it requires just a couple of connections to work and implements the SPP (Bluetooth serial port profile) with TTL levels, so it's perfect for a PIC project :) This is a little how-to on configuring and interfacing with the RN-41 module.

The circuit

This is a typical PIC microcontroller circuit with a regulated 3.3 V power supply. The PIC is running at 4 MHz, and the RN-41 module is connected to the PIC's serial port, rx->tx and tx->rx. I also wired a 16x2 LCD to print text sent to the RN-41.

Programming the pic

Alright, we have the hardware ready, now we need to program the PIC to read from the RN-41 and print to the LCD. I used the CCS compiler, here's the code:

#include <16f876.h>
#use delay(clock=4000000)     /*4Mhz*/
#use RS232(BAUD=9600, XMIT=PIN_C6, RCV=PIN_C7, STREAM=COM_A) 
#fuses XT, NOWDT, NOPROTECT, NOLVP 
#include "flcd.c"            /*flex lcd driver*/

char buf[16];
int8 have_string = 0;

/*serial port interrupt service routine*/
#int_rda
void sp_isr()
{ 
    fgets(buf, COM_A);  
    have_string = 1;
}

void main(void)
{
    enable_interrupts(global);    /*enable interrupts*/
    enable_interrupts(int_rda);   /*enable serial port interrupts*/
   
    lcd_init();
    delay_ms(100); /*must delay after initiating the lcd*/
   
    while (1) {             
       if (have_string) {
          have_string = 0;         
          printf(lcd_putc, "\f%s", buf);
       }
    }
}

The code is pretty easy, but anyway, we first enable global and serial port interrupts so we get an interrupt whenever we receive data, you can poll the serial port instead but you should use interrupts because they free up the processor to do other stuff.

Inside the isr, we read a string from the serial port and set the have_string flag which is continuously checked in the main loop, when we have a string we print it to the lcd. After programming the pic, last step is configuring the RN-41.

Configuring the Module

The RN-41 module can be configured either locally, over the serial port, or remotely, over the air. I think the latter approach is easier, so that's what I will be doing. I will use Python to connect to the RN-41 since it's a lot faster than doing so in C.

First we need a bdaddr and a channel to connect to the RN-41 module, so go ahead connect the power to the circuit, open a terminal and type:

$hcitool scan
Scanning ...
    00:06:66:04:11:94    FireFly-1194

$sdptool records 00:06:66:04:11:94
Service Name: SPP
Service RecHandle: 0x10000
Service Class ID List:
  "Serial Port" (0x1101)
Protocol Descriptor List:
  "L2CAP" (0x0100)
  "RFCOMM" (0x0003)
    Channel: 1
Language Base Attr List:
  code_ISO639: 0x656e
  encoding:    0x6a
  base_offset: 0x100
...

And we have the bdaddr and channel, time to configure the RN-41. The RN-41 has two modes of operation, a command mode and a data mode. To configure the RN-41 we need to enter the command mode, and we have to do so within the config time window, 60 seconds by default. Open the Python interpreter and:

import bluetooth
sock = bluetooth.BluetoothSocket(bluetooth.RFCOMM)
sock.connect(('00:06:66:04:11:94', 1))  # bdaddr, spp channel
sock.send(b'$$$')                       # command mode, within the 60 s config window
sock.send(b'ST,255\r\n')                # enables continuous configuration
sock.send(b'SU,9600\r\n')               # set baudrate to 9600bps
sock.send(b'---\r\n')                   # switch back to data mode
sock.send(b'Hello World !\r\n')         # send some text to test

And we're done, have fun :)

Notes:

The 10k pot controls the LCD contrast.
Continuous configuration allows you to enter the config mode at any time; if you don't use it, don't forget to enter the config mode within the config time window.
The flcd driver can be found somewhere on the CCS forums.
If you're going to use the same pins for the LCD this is my configuration:

#define LCD_DB7   PIN_B7
#define LCD_DB6   PIN_B6
#define LCD_DB5   PIN_B5
#define LCD_DB4   PIN_B4
#define LCD_E     PIN_B1
#define LCD_RS    PIN_B0

Sunday, March 14, 2010

X86 Machine Code

A while ago someone asked how to encode/execute x86 machine code. I replied with a small tutorial on the subject, which I thought I should expand and share here.

X86 instructions can range from 1 byte to 15 bytes long. An example of a 1-byte instruction is NOP (the no-operation instruction, 10010000b or 0x90). From a bird's eye view the x86 instruction format looks like this:

byte:  0,1,2,3   4,5      6         7                    8,9,10,11      12,13,14,15
func:  prefix    opcode   reg/mem   scaled indexed mem   displacement   imm data

*Although this is 16 bytes long, the actual instruction can not exceed 15 bytes because some bytes are mutually exclusive.

Now let's encode a simple x86 instruction, "mov $0x8888, %eax". Depending on the operands, memory addressing scheme etc. the MOV instruction can vary. Here we want to move an immediate operand (0x8888) to register EAX. Because EAX is a 32-bit register, the immediate is stored as 4 bytes in little-endian order, which gives the following format:

opcode    w (8 or 16/32 bit)    reg (eax)    byte0       byte1       byte2       byte3
1011      1                     000          10001000    10001000    00000000    00000000    or 0xB8 0x88 0x88 0x00 0x00
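The same encoding can be produced programmatically. This is a minimal sketch for the "move 32-bit immediate to register" form only; the function name is made up, and the register numbers are the standard x86 encodings:

```python
import struct

# Standard 3-bit register numbers for the 32-bit general purpose registers.
REGISTERS = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
             "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_mov_imm32(reg, imm):
    """Encode 'mov $imm, %reg' as opcode 1011 w=1 reg plus a 4-byte immediate."""
    opcode = 0xB8 | REGISTERS[reg]                    # 1011 1 rrr
    return bytes([opcode]) + struct.pack("<I", imm)   # little-endian imm32


print(encode_mov_imm32("eax", 0x8888).hex())
```

Changing the register only changes the low three bits of the first byte.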

Now comes the fun part, executing the instruction :) I will use a simple C program to call the code and watch the results with gdb. There are two ways to call this code; the first and easy way is to just call it, by casting the buffer to a function pointer and calling it.

The second way, which is harder but more educational, involves poking around the stack a little bit, I won't fully explain it, but basically, we will call a function, which will change its own return address on the stack to the address of the opcode so that the execution continues at the buffer (think buffer overflow):

char opcode[] = "\xB8\x88\x88\x00\x00";  /* mov $0x8888, %eax */
void run()
{
    long *ret;
    ret = (long *)&ret + 2;  /*return address on stack (32-bit, unoptimized build)*/
    *ret = (long)opcode;     /*now run() will return to opcode*/
}

int main()
{
    //((void(*)(void))opcode)();
    run();
    return 0;
}

Compile with:

gcc mov.c -o mov -ggdb

This is the complete gdb session:

$ gdb mov                 //run with gdb  
(gdb) break run           //set break point at run() function 
(gdb) display /i $pc      //add a display to see the inst mnemonic 
(gdb) run                 //run the program  
(gdb) nexti               // skip instructions until you see ret
(gdb) nexti
0x0804835e in run () at mov.c:7
0x804835e <run+26>:  ret  //return to the opcode address
(gdb) nexti
0x0804954c in opcode () 1: x/i $pc
0x804954c <opcode>:  mov  0x8888,%eax  //Finally our hand coded instruction
(gdb) nexti               //one more nexti to execute the instruction 
(gdb) info registers      //dump registers  
 eax 0x8888       34952   //and eax now holds 0x8888 !
 ecx 0xbf877640  -1081641408
 edx 0xbf877620  -1081641440

That's it for today, I hope this small tutorial has inspired you to start experimenting yourself.