Monday, October 24, 2011

Cortex-M3 Exception Vector Checksum

On NXP LPC Cortex-M3 parts, entry 7 of the exception vector table (the word at offset 0x1C) is reserved for the 2's complement of the checksum of the first seven entries. At run time the bootloader adds up all eight entries, and if the result equals zero it starts executing the user code.
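
The check can be sketched in a few lines of C. This is only an illustration of the arithmetic, not the bootloader's actual code; the function names are made up for the example.

```c
#include <stdint.h>

/* 2's complement of the sum of the first seven vector entries
 * (hypothetical helper, not part of any vendor library). */
uint32_t lpc_vector_checksum(const uint32_t vectors[8])
{
    uint32_t sum = 0;
    for (int i = 0; i < 7; i++) {
        sum += vectors[i];
    }
    return 0u - sum; /* unsigned arithmetic wraps modulo 2^32 */
}

/* The image is considered valid when all eight entries sum to zero. */
int lpc_vectors_valid(const uint32_t vectors[8])
{
    uint32_t sum = 0;
    for (int i = 0; i < 8; i++) {
        sum += vectors[i];
    }
    return sum == 0;
}
```

Storing the result of lpc_vector_checksum() in entry 7 makes lpc_vectors_valid() return 1 for the same table.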

The checksum is usually computed by the software that flashes the binary, such as FlashMagic or openocd. If you're using openocd like me, you may see this message every time you flash a binary:
Warn : Verification will fail since checksum in image (0x00000000) to be 
written to flash is different from calculated vector checksum (0xeffc59e6).
Warn : To remove this warning modify build tools on developer PC to inject 
correct LPC vector checksum.
The code will run normally, because openocd computes the checksum for you, but the warning is just too annoying. So I wrote the following small utility to compute the checksum and inject it into the binary. It's called from the Makefile before running openocd to flash the binary:
#include <stdio.h>
#include <stdint.h>

/* Note: assumes a little-endian host, like the target */
int main(int argc, char **argv)
{
    if (argc != 2) {
        printf("usage: cm3_checksum <bin>\n");
        return 1;
    }

    FILE *file;
    if ((file = fopen(argv[1], "rb+")) == NULL) {
        perror(argv[1]);
        return 1;
    }

    /* The checksum of the first seven exception vector entries */
    uint32_t i, n, checksum = 0;
    for (i = 0; i < 7; i++) {
        if (fread(&n, 4, 1, file) != 1) {
            fclose(file);
            return 1;
        }
        checksum += n;
    }

    /* The 2's complement of the checksum */
    checksum = -checksum;
    printf("checksum: 0x%X\n", (unsigned int)checksum);

    /* write back the checksum to entry 7 (offset 0x1c) */
    fseek(file, 0x1c, SEEK_SET);
    fwrite(&checksum, 4, 1, file);
    fclose(file);

    return 0;
}

Tuesday, July 12, 2011

Delay Slots

Delay slots are an artifact of some early pipelined architectures in which pipeline hazards were not handled explicitly. I was puzzled for a while by some unexpected assembly produced by gcc while working on my own implementation of the MIPS ISA. Further investigation yielded the following results about the branch delay and load delay slots, both of which occurred in early MIPS architectures.

Load Delay Slot
The load word instruction (lw) loads a word from memory into the specified register. Because of the pipelined nature of the architecture, the next instruction(s) execute concurrently with the current one. If the following instruction uses the lw destination register as one of its source registers, it cannot continue before the lw data is fetched from memory and written back to the destination register; otherwise it will read invalid data. For example, in the following code snippet the add instruction uses $s0 as one of its source registers. If it were to read $s0 before it's written by the lw instruction, it would read the old value of $s0.
        lw $s0, 4($0)
        add $s1, $s0, $0

This peculiarity is called a hazard, specifically a data hazard. Data hazards can be handled either by stalling the pipeline or with register forwarding. The pipeline can be stalled with a nop, which is sometimes called a bubble because it propagates through the whole pipeline, leaving each stage idle in turn. Register forwarding, on the other hand, forwards the result of an instruction from the current stage to the previous stage (i.e. to the next instruction), bypassing the register file. The following image shows register forwarding from the first instruction to the second and third; the instructions after that can read the register normally:

If the hazard is not handled by the data path, the assembler introduces a delay slot, which it fills either with a nop or, if it can find something useful, by re-ordering the instructions. This is what the disassembled code looks like when the assembler fills the delay slot with a nop:
00000000 <main>:
   0: 8c100004  lw s0,4(zero)
   4: 00000000  nop
   8: 02008820  add s1,s0,zero

If we change the add instruction so that it no longer depends on the lw instruction, and can therefore execute in parallel, the assembler removes the nop:
00000000 <main>:
   0: 8c100004  lw s0,4(zero)
   4: 02408820  add s1,s2,zero
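
The decision the assembler makes here can be approximated with a little bit-twiddling on the instruction words. The following is a simplified sketch (the function name is mine, and it only understands a small subset of the MIPS I encodings), not how gas actually implements it:

```c
#include <stdint.h>

#define MIPS_OP(i) ((uint32_t)(i) >> 26)          /* opcode field */
#define MIPS_RS(i) (((uint32_t)(i) >> 21) & 0x1f) /* first source register */
#define MIPS_RT(i) (((uint32_t)(i) >> 16) & 0x1f) /* second source or destination */

/* Returns 1 if `next` reads the register that the lw in `load` writes,
 * i.e. if a nop (or an unrelated instruction) is needed between them. */
int load_use_hazard(uint32_t load, uint32_t next)
{
    if (MIPS_OP(load) != 0x23) {
        return 0; /* not a lw */
    }
    uint32_t dest = MIPS_RT(load);
    uint32_t op = MIPS_OP(next);
    if (dest == 0 || op == 0x02 || op == 0x03) {
        return 0; /* $zero is never written; j/jal read no registers */
    }
    if (MIPS_RS(next) == dest) {
        return 1;
    }
    /* rt is a source only for R-type, beq, bne and sw in this subset */
    int rt_is_source = (op == 0x00 || op == 0x04 || op == 0x05 || op == 0x2b);
    return rt_is_source && MIPS_RT(next) == dest;
}
```

Fed the words from the two listings above, it reports a hazard for lw followed by add s1,s0,zero (0x02008820) and none for add s1,s2,zero (0x02408820).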

Branch Delay Slot
Similarly, due to the parallel execution of instructions, by the time the branch target is resolved the following instruction has already been fetched. In other words, the instruction following a branch always executes, whether the branch is taken or not.

This type of hazard is called a control hazard. Modern architectures use a branch predictor to avoid flushing the pipeline, so it gets flushed only on a branch misprediction. Early MIPS didn't handle this, obviously, so the assembler needs to introduce yet another delay slot and fill it with either a nop or, if possible, by re-ordering the instructions. In the following example, if the branch were left without a filled delay slot, the last addi instruction would fall into the slot and execute on every pass through the loop, even though it belongs after it:
loop:
        addi $s0, $0, 1
        add $s1, $s0, $0
        bne $s0, $s2, loop
        addi $s2, $0, 1

Note that the add instruction does not affect the branch condition, so its order is irrelevant and it can come before or after the branch. The following disassembled code shows how the assembler re-ordered the instructions and used the add instruction to fill the delay slot:
00000000 <loop>:
   0: 20100001  addi s0,zero,1
   4: 1612fffe  bne s0,s2,0 <loop>
   8: 02008820  add s1,s0,zero
   c: 20120001  addi s2,zero,1
If the branch did depend on the add instruction, its order would not be changed and the assembler would use a nop to fill the delay slot instead. If we modify the add instruction so that it writes to $s2, making the branch dependent on its result, the assembler uses a nop:
00000000 <loop>:
   0: 20100001  addi s0,zero,1
   4: 2012ffff  addi s2,zero,-1
   8: 1612fffd  bne s0,s2,0 <loop>
   c: 00000000  nop
  10: 20120001  addi s2,zero,1
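
The branch offsets in these listings are relative to the delay slot, not to the branch itself, which is easy to check by hand. A small sketch (the helper name is hypothetical) that decodes the target of a MIPS I-type branch:

```c
#include <stdint.h>

/* Target of a conditional branch located at address `pc`:
 * the 16-bit immediate is sign-extended, multiplied by 4 and
 * added to the address of the delay slot (pc + 4). */
uint32_t mips_branch_target(uint32_t pc, uint32_t insn)
{
    int32_t offset = (int32_t)(insn & 0xffff);
    if (offset & 0x8000) {
        offset -= 0x10000; /* sign-extend the immediate */
    }
    return pc + 4 + (uint32_t)(offset * 4);
}
```

For the bne at address 4 (0x1612fffe) and the one at address 8 (0x1612fffd) this gives 0 in both cases, matching the <loop> label in the disassembly.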

Saturday, April 30, 2011

Pimp My Hexbug!

What is a Hexbug, you ask? Well, a Hexbug is a line of micro robotic creatures! Sounds fancy, doesn't it? Actually it's quite boring, if you ask me. Mine, for example, just walks around until it hits something, then turns around, and that's basically it!
That's why I've decided to reuse the mechanical parts and boost the bug a bit. So I designed a small wireless board to control my Hexbug. This is the modified Hexbug, I call it nrfbug

Now let's get down to the glorious details...

Wireless Link
For the wireless link I used the nRF24L01+ chip from Nordic. This chip is by far the most amazing VLSI chip that I have ever seen, and it certainly deserves its own post. Briefly, though, the nRF24L01+ is a 2.4GHz wireless chip that implements a packet-based low-level datalink layer protocol (similar to Ethernet) with dynamic payload length, auto-retransmission, auto-ack, CRC, FIFOs, multiple transmitters and multiple receivers (broadcast). Newer chips even have an integrated USB controller, an improved 8051 core and an AES engine! It's just amazing!

I wrote a library for this chip, originally for the LPC1768. It's still a work in progress, but it does the job; the link is in the downloads section.

For the MCU I used an atmega328 running on the internal RC oscillator at 8MHz. The board has the SPI interface broken out to the header. It's not really compatible with any programmer that I know of, so I just use avrdude and an FTDI chip to bitbang the ihex file to flash.

On board is an H-bridge to control the motor direction. When the current flows in one direction the nrfbug moves both sets of legs; when it flows in the other direction it moves just one set, causing it to rotate. The bridge has fly-back diodes for protection and is controlled with two GPIO pins on the MCU.

While I was at it, I threw in an SMD ambient light sensor. The sensor is quite simple, it's basically just a light-sensitive transistor that is read by the ADC.

At the other end, I use an mbed to send commands to the bug. Connected to  the mbed is another Nordic chip and a joystick connected to the ADC to move the bug around.
The nrfbug board was designed using Eagle and fabricated at BatchPCB. The first board had a small problem: the chip antenna had ground pours beneath it, which according to the datasheet it shouldn't! That's the blessing of reading the datasheet after you finish your project :). It only affects the range though (and may cause more packets to drop). Anyway, I fixed it and I'm waiting for the new revision. The new Eagle files are available in the downloads.

Finally, the bug in action
nRF24L01p avr library
hg clone
nrfbug Eagle files
hg clone

MCP9800 Temperature Sensor

A while ago I picked up a few temperature sensors from DigiKey. They've been in my junk box for some time now, so I decided it's about time to do something with them!

The MCP9800 is a high accuracy digital temperature sensor from Microchip, the sensor has an  I2C interface, a configurable 9-bit to 12-bit temperature resolution, shutdown mode, one-shot mode (one conversion while in shutdown) and finally an interrupt pin.
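
Reading the temperature boils down to converting the two bytes of the ambient temperature register. A minimal sketch, assuming the register layout as I understand it from the datasheet (a 16-bit two's complement value in which the upper byte is the integer part and the top four bits of the lower byte are the fraction at 12-bit resolution); the function name is hypothetical and not taken from my library:

```c
#include <stdint.h>

/* Convert the two bytes read from the temperature register to degrees
 * Celsius. Assumed layout: msb = sign + integer part, lsb = fraction,
 * so the 16-bit value is the temperature in units of 1/256 degree. */
double mcp9800_to_celsius(uint8_t msb, uint8_t lsb)
{
    int32_t raw = ((int32_t)msb << 8) | lsb;
    if (raw & 0x8000) {
        raw -= 0x10000; /* sign-extend negative readings */
    }
    return raw / 256.0;
}
```

With lower resolutions the unused low bits read as zero under this assumption, so the same conversion applies.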

Typical Application
The MCP9800 requires a few external components, the standard I2C pull-ups and, depending on the polarity of the ALERT pin, a pull-up/down resistor.

The ALERT pin gets asserted when the temperature exceeds the upper temperature limit (TSET register) and again when it falls back below the lower limit (THYST register). The interrupt must be cleared by reading any register.

I used the MCP9800 in a wireless sensor network using the nRF24L01+ and atmega328. Each node sends a unique id followed by the temperature reading to a central receiver, which then sorts out the data and displays it.

MCP9800 avr library (the library has a nice native avr I2C example)
hg clone mcp9800

Wednesday, March 30, 2011

TCM8230MD Breakout

The TCM8230MD is a tiny camera from Toshiba theoretically capable of outputting 640x480@30FPS! This post is to document my experience with this devilish cam. 

This is my second breakout board for the camera. This one is designed to be connected as a module to another board and doesn't use a crystal oscillator for the clock; I'm using one of the PWM channels instead. However, the older breakout is still in the repo. Both boards were designed with Eagle and fabricated at BatchPCB.
The camera has an I2C interface for configuring its registers. A few basic registers must be set for the camera to start outputting frames. Mainly, 0x02 sets the FPS and 0x03 sets the image size/mode and enables the camera.

The next step is reading frame data. The camera has three interrupt lines: VD, HD and DCLK. When VD is asserted a new frame is ready, each HD edge indicates that a new scanline is available and, finally, while HD is high each DCLK edge indicates a valid byte on the parallel port.

I enable VD initially, and HD is enabled in the IRQ handler only once VD is triggered. On each HD interrupt the whole scanline is read in a loop, with some nops inserted to sync with DCLK. Finally, the whole frame is sent to the OLED screen using DMA.

Eagle files
hg clone


Tuesday, March 29, 2011

Binary Counters

Binary counters can be used for a variety of things from time keeping to generating/measuring frequencies. Today, we will talk about the concepts behind binary counters using an atmega328 for practice.

Let's start with a simple 4-bit counter based on JK flip-flops. A JK flip-flop has the following truth table:

J  K  Qt+1  State
0  0  Qt    No change
0  1  0     Clear
1  0  1     Set
1  1  Qt'   Complement

Where Qt is the current output, Qt+1 is the output after the next clock edge and Qt' is the complement of Qt.

The idea behind it is quite simple: the output of each flip-flop is fed to one input of an AND gate (the other input is the enable signal), so at the next rising edge of the clock JKn is complemented only if all the less significant JKs are set/high, and this is basically how you count in binary.

The following is a snapshot of the counter. I paused the simulation at the count of 3; at the next rising edge of the clock the first three flip-flops are complemented and the output of the counter becomes 100b.
The last AND gate is the carry bit, which can be used to extend the counter or even as an interrupt. Note that the J and K inputs of each flip-flop are tied together, which is practically equivalent to a T flip-flop, so it's also possible to use T flip-flops for the same counter.
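
The same circuit can be modelled in software. This is just a sketch of the logic described above, with function names made up for the example: each stage is a JK flip-flop with J and K tied together, and the AND chain decides which stages toggle.

```c
#include <stdint.h>
#include <stddef.h>

/* Next state of a JK flip-flop, straight from the truth table. */
int jk_next(int j, int k, int q)
{
    if (j && k) return !q; /* complement */
    if (j)      return 1;  /* set */
    if (k)      return 0;  /* clear */
    return q;              /* no change */
}

/* One clock edge of the 4-bit counter. `carry` (optional) receives the
 * output of the last AND gate. */
uint8_t counter4_step(uint8_t count, int enable, int *carry)
{
    uint8_t next = 0;
    int chain = enable; /* output of the AND chain so far */
    for (int bit = 0; bit < 4; bit++) {
        int q = (count >> bit) & 1;
        next |= (uint8_t)(jk_next(chain, chain, q) << bit);
        chain = chain && q; /* AND with this stage's current output */
    }
    if (carry != NULL) {
        *carry = chain;
    }
    return next;
}
```

Stepping from a count of 3 complements the first three stages and yields 4 (100b), as in the snapshot.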

The next counter is slightly more complex. It's a 4-bit counter with parallel load and synchronous clear:

It's basically the same counter with the addition of a couple of subcircuits. We can see how this counter works by using boolean algebra to evaluate J and K and comparing the results with the JK truth table. With I the data input, L load, C clear, E enable and ' denoting the complement:

J = I·C'·L + C'·L'·E
K = C + I'·C'·L + C'·L'·E

Enable  Load  Clear  Input       Output
0       0     0      J=0, K=0    No change
1       0     0      J=1, K=1    Complement
1       0     1      J=0, K=1    Clear
1       1     0      J=I, K=I'   Load
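
The table can be checked mechanically. The sketch below assumes the equations read J = I·C'·L + C'·L'·E and K = C + I'·C'·L + C'·L'·E (with ' meaning complement); the function name is made up for the example.

```c
/* Evaluate the J and K inputs of one counter stage.
 * i = data input, load/clear/enable = control lines (0 or 1). */
void jk_inputs(int i, int load, int clear, int enable, int *j, int *k)
{
    *j = (i && !clear && load) || (!clear && !load && enable);
    *k = clear || (!i && !clear && load) || (!clear && !load && enable);
}
```

Feeding in each row of the table reproduces the J and K columns; in the load row J follows I and K follows its complement, so the flip-flop takes the value of I.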

AVR Timer/Counter
The atmega328 has 2 8-bit timers and 1 16-bit timer.  We will use Timer/Counter2 to generate a 1Hz square wave. That is, the event of the line going from low to high and then low again occurs once per second, so we need to switch the line state every 500ms.

First, set the clock source (prescaler) in TCCR2B. Setting the prescaler bits to 0 disables the timer, 1 means no prescaling, and any other value divides the system clock down to a lower frequency. For example, assuming a clock frequency of 16MHz and a 1:1024 prescaler, the timer clock frequency is
f = 16MHz / 1024 = 15.6kHz
And the period is
T = 1/15.6kHz = 64us
Next we need to set the Output Compare Register (OCR2A). The value contained in this register is continuously compared with the counter value, and when the counter matches this value it triggers an interrupt. If we load OCR2A with 250 the counter will match this value roughly every 16ms
250 * 64us = 16.0ms

Finally, we need to set the compare match interrupt enable flag in TIMSK2. Given the clock frequency, 31 interrupts, roughly speaking, are needed for 500ms to pass. The following example demonstrates using TIMER2:
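#include <avr/io.h>
#include <avr/interrupt.h>

static volatile uint8_t ticks;

ISR(TIMER2_COMPA_vect)
{
    if (++ticks == 31) {
      ticks = 0;
      PORTB ^= (1<<PB0);   /*toggle the line (PB0 is an assumed pin)*/
    }
}

int main(void)
{
    /*
      * f = 16MHz / 1024 = 15.6kHz
      * p = 1/f
      * p = 1/15.6kHz = 64us
      * interrupt every = 250 * 64us = 16.0ms
      * round(500 / 16) = 31 interrupts
      */
    DDRB  |= (1<<PB0);     /*configure the assumed pin as output*/
    TCCR2B = 0;            /*choose timer clock source, 0 = timer disabled*/
    TCCR2A = (1<<WGM21);   /*CTC mode WGM22:0 = 2*/
    TCNT2  = 0;            /*clear timer*/
    OCR2A  = 250;          /*load output compare register with 250*/
    TIFR2  = (1<<OCF2A);   /*Clear compare match flag (write 1 to clear)*/
    TIMSK2 |= (1<<OCIE2A); /*enable compare match interrupt*/
    TCCR2B = ((1<<CS22)|(1<<CS21)|(1<<CS20));/*1:1024 prescaler*/
    sei();                 /*enable interrupts globally*/
    while (1);
}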
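
The numbers in the comment can be reproduced on the host with a couple of integer helpers. These are hypothetical names for illustration only and have nothing to do with avr-libc:

```c
#include <stdint.h>

/* Period of one timer tick in microseconds for a given prescaler. */
uint32_t timer_tick_us(uint32_t f_cpu_hz, uint32_t prescaler)
{
    return (uint32_t)(((uint64_t)prescaler * 1000000u) / f_cpu_hz);
}

/* Number of compare-match interrupts (rounded to nearest) needed to
 * cover target_ms, with one interrupt every compare_ticks timer ticks. */
uint32_t interrupts_needed(uint32_t f_cpu_hz, uint32_t prescaler,
                           uint32_t compare_ticks, uint32_t target_ms)
{
    uint64_t period_us = (uint64_t)compare_ticks * prescaler * 1000000u / f_cpu_hz;
    uint64_t target_us = (uint64_t)target_ms * 1000u;
    return (uint32_t)((target_us + period_us / 2) / period_us);
}
```

For 16MHz, a 1:1024 prescaler and 250 ticks per interrupt this gives a 64us tick and 31 interrupts per 500ms. Since 31 * 16ms is 496ms, the output is slightly faster than 1Hz.
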
Using a logic analyzer to sample the signal, we should see something like the following snapshot:

atmega328 datasheet 
Computer System Architecture-Morris Mano 

Saturday, January 1, 2011

Introduction to ARM Cortex-M3 Part 2-Programming

Welcome to the second part of the Introduction to ARM Cortex-M3. In part 1 we went through the core features of the Cortex-M3 and the LPC1768. In this part we will focus more on programming the LPC1768 by covering the following points:
  • Toolchain overview
  • Library Tweaks
  • Hardware Interfaces
  • Software Stacks
We have a lot to cover, so let's get started...

Toolchain overview
The toolchain of choice is the CodeSourcery toolchain. CodeSourcery is a GNU-based ARM toolchain developed in partnership with ARM. It's freely available both as source and pre-compiled, and uses the embedded C library newlib (by Red Hat) as the standard C library.

We're almost good to go, however, we still need drivers. Fortunately, NXP provides a nice driver library for the LPC1768. The library is based on the Cortex Microcontroller Software Interface Standard (CMSIS), developed by ARM as an abstraction for the core layers. It comes with the startup code, system initialization code, linker script and drivers for all the peripherals, plus many examples.

Library Tweaks
I tweaked the Makefile a little bit to build the drivers, startup and system code into a single library. I then installed this library and the headers in /usr/local/lpc17xx. I think this greatly simplifies my Makefiles, since I only need to link one library.

I also added some common initialization code, mostly to system_LPC17xx.c, to avoid duplicating it in every project. Finally, I tweaked the linker script a little bit. Let's have a detailed look at this code.

First order of business is to enable logging. printf, puts and similar routines found in libc eventually call the low-level function _write. Since it cannot really have any useful implementation in newlib, because it's platform dependent, you will need to provide your own to redirect output somewhere. For example, the following redefines _write to redirect output to UART0:
#define stduart ((LPC_UART_TypeDef *)LPC_UART0_BASE)
int _write(int fd, const void *buf, uint32_t nbyte)
{
    /*send the whole buffer, blocking until it has been written*/
    UART_Send(stduart, (uint8_t *)buf, nbyte, BLOCKING);
    return nbyte;
}
Almost all the code I see, regardless of the embedded platform, needs some sort of delay/sleep function to wait on some event or just waste time. There are two ways to implement delays. One is to use loops of instructions that have a known execution time, which is highly inaccurate and could be interrupted. The other is to use counters.

The SysTick timer counts down from the value loaded into one of its registers until it reaches 0 and then it asserts the SysTick IRQ, the value then is reloaded and the timer counts down again and so on.

The following example demonstrates using the SysTick timer for delays. The system tick count is kept in sys_ticks. When sleep is called, the value of sys_ticks is saved, and then we keep subtracting this saved value from the current tick count until the difference is equal to or greater than the required delay:
volatile uint32_t sys_ticks;
void SysTick_Handler(void)__attribute__((weak));

void SysTick_Handler(void)
{
    ++sys_ticks; /*one tick per SysTick interrupt*/
}

void sleep(uint32_t ms)
{
    uint32_t curr_ticks = sys_ticks;
    while ((sys_ticks - curr_ticks) < ms);
}

int main(void)
{
    /*Setup SysTick to interrupt every 1 ms*/
    SysTick_Config((SystemCoreClock / 1000) * 1 - 1);

    while (1) {
        sleep(500); /*for example, wait 500 ms*/
    }
}
A few things to note here. First, SysTick_Handler is declared weak: when another (strong) definition of a weak symbol exists, that definition is linked instead. So basically it can be overridden by the application; later you will see that I redefine it in the RTOS scheduler.

Second thing to note is that setting the timer to interrupt every 1 ms gives reasonable resolution but not necessarily the best throughput so you may wish to tweak that depending on your application, or disable it altogether,  but keep in mind that it affects your timer's resolution.
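
One detail worth spelling out is that the subtraction in sleep keeps working when the 32-bit tick counter wraps around, because unsigned arithmetic is modulo 2^32. A host-side sketch of the same comparison (the function name is made up for the example):

```c
#include <stdint.h>

/* Returns 1 once at least `ms` ticks have passed between `start` and
 * `now`, even if the counter wrapped around in between. */
int delay_elapsed(uint32_t now, uint32_t start, uint32_t ms)
{
    return (uint32_t)(now - start) >= ms;
}
```

For instance, with start = 0xFFFFFFFB and now = 5 the difference is 10 ticks, so a 10 ms delay has elapsed even though now is numerically smaller than start.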

More RAM
We mentioned in part 1 that the LPC17xx has a second 32K block of SRAM for the Ethernet and USB controllers. However, if you're not using either, you may wish to use this extra memory in your application. You can do so by accessing the memory directly at 0x2007C000 or, more neatly, by defining a new memory region and a new section in the linker script:
/*linker script*/
MEMORY
{
  rom (rx)   : ORIGIN = 0x00000000, LENGTH = 512K
  ram (rwx)  : ORIGIN = 0x10000000, LENGTH = 32K
  ram2 (rwx) : ORIGIN = 0x2007C000, LENGTH = 32K   /*define memory region*/
}
SECTIONS
{
  .ram2 : /*define section*/
  {
    *(.ram2)
  } >ram2
  /*.text and the other sections follow here*/
}
And then using the gcc section attribute to place stuff into this section:
/*place buffer in section .ram2*/
uint8_t buffer[BUFFER_SIZE] __attribute__ ((section (".ram2")))={0};
Hardware Interfaces
Next, we will look at initializing and using some of the common hardware interfaces available on the LPC1768, using the NXP driver library.

The serial interface is the most common interface out there. We've seen how to use _write to redirect the output to the UART; now we look at initializing it. The following is an excerpt from my SystemInit function:
#define stduart ((LPC_UART_TypeDef *)LPC_UART0_BASE)

void SystemInit()
{
  /*some init code here*/

  PINSEL_CFG_Type pin_cfg={ /*pinsel config*/
      .Funcnum      = 1,
      .Portnum      = 0,
      .Pinmode      = PINSEL_PINMODE_PULLUP,
      .OpenDrain    = PINSEL_PINMODE_NORMAL,
  };

  UART_CFG_Type uart_cfg={ /*UART config*/
      .Baud_rate    = 57600,
      .Databits     = UART_DATABIT_8,
      .Parity       = UART_PARITY_NONE,
      .Stopbits     = UART_STOPBIT_1,
  };

  /*setup tx*/
  pin_cfg.Pinnum  = 2;
  PINSEL_ConfigPin(&pin_cfg);

  /*setup rx*/
  pin_cfg.Pinnum  = 3;
  PINSEL_ConfigPin(&pin_cfg);

  /*Initialize uart*/
  UART_Init(stduart, &uart_cfg);
  UART_TxCmd(stduart, ENABLE);
}
More advanced things can be done with the UART of course. For example, there are 16-byte RX/TX hardware FIFOs that can be set to trigger an interrupt or a DMA transfer at a certain level. You can find more about this in the datasheet.

Note that when you mount the mbed you may need  to set the baud rate of /dev/ttyACMx using the following command:
stty -F /dev/ttyACM0 speed 57600
SPI is a full duplex serial interface that uses four wires for data transfer: Master In Slave Out (MISO), Master Out Slave In (MOSI), Serial Clock (SCLK) and Slave Select (SSEL).

Slave Select, or Chip Select (CS), is used to select the active slave when multiple slaves are present on the bus. A few things about SPI are worth mentioning. First, both end points have shift registers. When one is loaded with a byte and shifted out, the other gets shifted too, i.e. the two bytes are exchanged. So in order to read a byte you must write one too; when writing, however, you may ignore the received byte.
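
The exchange is easy to model in software. The following is only a sketch of the idea (MSB first, one bit per clock, function name made up), not a model of the LPC's SSP hardware:

```c
#include <stdint.h>

/* Simulate eight SPI clocks: on each clock the MSB of each shift
 * register is shifted out and lands in the LSB of the other one. */
void spi_exchange(uint8_t *master, uint8_t *slave)
{
    for (int clk = 0; clk < 8; clk++) {
        uint8_t mosi = (*master >> 7) & 1; /* bit leaving the master */
        uint8_t miso = (*slave >> 7) & 1;  /* bit leaving the slave */
        *master = (uint8_t)((*master << 1) | miso);
        *slave  = (uint8_t)((*slave << 1) | mosi);
    }
}
```

After eight clocks the two registers have swapped contents, which is why reading a byte always involves writing one.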

Second, the clock polarity (CPOL) and clock phase (CPHA). The clock polarity determines the idle state of the clock: if CPOL = 0 the clock is low when it's idle and high when it's active, and if CPOL = 1 the clock is high when it's idle and low when it's active. The clock phase determines the edge at which the data is sampled: for CPHA = 0 the data is sampled on the first edge, and for CPHA = 1 on the second edge.

The four combinations of CPOL and CPHA are called SPI modes, and you need to select a mode when configuring SPI, depending on your hardware. Note that if CPOL = 0 and CPHA = 0 the clock transitions from low to high when active, and the data is sampled on the rising edge of the clock. This is the same as when CPOL = 1 and CPHA = 1, because data will still be sampled on the rising edge of the clock.

A real life example is the OLED driver SSD1339, which samples data on the rising edge of the clock. That is, two modes work equally well: CPOL = 0/CPHA = 0 and CPOL = 1/CPHA = 1.
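
The relationship between the two bits and the sampling edge fits in two one-liners. The mode numbering below is the commonly used convention (mode = CPOL * 2 + CPHA), and the function names are made up for the example:

```c
/* Conventional SPI mode number (0..3) for a CPOL/CPHA pair. */
int spi_mode(int cpol, int cpha)
{
    return (cpol << 1) | cpha;
}

/* Data is sampled on the rising edge when CPOL and CPHA are equal
 * (modes 0 and 3) and on the falling edge otherwise (modes 1 and 2). */
int spi_samples_on_rising_edge(int cpol, int cpha)
{
    return cpol == cpha;
}
```

So a device that samples on the rising edge can be driven in mode 0 or mode 3.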

Finally, note that the maximum frequency you can set SPI to, according to the datasheet, is 1/8 of the peripheral clock (PCLK) selected in PCLKSEL0/1. Setting PCLK_SPI to 01b selects the CPU clock (CCLK) as the peripheral clock source, and so f = CCLK/8 = 100MHz/8, that is 12.5MHz.

Now this is an example of initializing and using SPI, note that the legacy SPI interface is replaced by SSP which supports SPI among other protocols:
void ssp_init()
{
    PINSEL_CFG_Type pin_cfg={
        .Portnum    = SSP_PORT,
        .Pinmode    = PINSEL_PINMODE_PULLUP,
        .OpenDrain  = PINSEL_PINMODE_NORMAL,
    };
    SSP_CFG_Type  ssp_cfg ={
        .CPHA           = SSP_CPHA_FIRST,
        .CPOL           = SSP_CPOL_HI,
        .ClockRate      = SSP_CLCK,
        .Databit        = SSP_DATABIT_8,
        .Mode           = SSP_MASTER_MODE,
        .FrameFormat    = SSP_FRAME_SPI
    };

    /*SSP PINSEL configuration*/
    pin_cfg.Funcnum = 2;
    pin_cfg.Pinnum = SSP_MISO;
    PINSEL_ConfigPin(&pin_cfg);

    pin_cfg.Pinnum = SSP_MOSI;
    PINSEL_ConfigPin(&pin_cfg);
    pin_cfg.Pinnum = SSP_SCLK;
    PINSEL_ConfigPin(&pin_cfg);

    pin_cfg.Pinnum = SSP_SSEL;
    PINSEL_ConfigPin(&pin_cfg);
    /*initialize SSP*/
    SSP_Init(SSP_BASE, &ssp_cfg);
    SSP_Cmd(SSP_BASE, ENABLE);
}

uint8_t ssp_read()
{
    while(SSP_GetStatus(SSP_BASE, SSP_STAT_BUSY));
    SSP_SendData(SSP_BASE, 0xFF);  /*clock out a dummy byte*/
    while(SSP_GetStatus(SSP_BASE, SSP_STAT_BUSY)); /*wait for the exchange to finish*/
    return SSP_ReceiveData(SSP_BASE);
}

void ssp_write(uint8_t c)
{
    while(SSP_GetStatus(SSP_BASE, SSP_STAT_BUSY));
    SSP_SendData(SSP_BASE, c);
}
I2C is a half duplex, low speed serial interface that uses two wires: Serial Data Line (SDA) and Serial Clock Line (SCL). Both lines are open-collector. An open-collector pin can only pull the signal line low (sink), thus it's active low and requires a pull-up resistor to keep the line high in the idle state.
I2C doesn't use a chip select; instead each slave on the bus has a unique 7-bit address, which is the only one it responds to. The following excerpt is from the bma180 accelerometer driver I wrote, demonstrating the use of I2C:
int8_t bma180_read_id(void);

void bma180_init()
{
    PINSEL_CFG_Type pin_cfg={
        .Funcnum    = 3,
        .Portnum    = BMA180_I2C_PORT, /*port0 I2C1*/
        .Pinmode    = PINSEL_PINMODE_PULLUP,
        .OpenDrain  = PINSEL_PINMODE_NORMAL
    };

    pin_cfg.Pinnum = BMA180_I2C_SDA;
    PINSEL_ConfigPin(&pin_cfg);
    pin_cfg.Pinnum = BMA180_I2C_SCL;
    PINSEL_ConfigPin(&pin_cfg);

    I2C_Init(BMA180_I2C_BASE, BMA180_I2C_CLCK);
    I2C_Cmd(BMA180_I2C_BASE, ENABLE);

    printf("bma180 chip id %d\n", bma180_read_id());
}

/*reads chip id*/
int8_t bma180_read_id()
{
    int8_t buf = 0x00; /*register address on the way out, register value on the way back*/

    I2C_M_SETUP_Type tfr_cfg = {
        .tx_data     = (uint8_t *)&buf,
        .tx_length   = sizeof(buf),
        .rx_data     = (uint8_t *)&buf,
        .rx_length   = sizeof(buf),
        .sl_addr7bit = BMA180_I2C_ADDR,
        .retransmissions_max = 3,
    };

    /*write register address/read register value*/
    I2C_MasterTransferData(BMA180_I2C_BASE, &tfr_cfg, I2C_TRANSFER_POLLING);

    return buf;
}
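
On the wire, the 7-bit slave address is sent as the first byte of every transfer, shifted left by one with the read/write flag in the least significant bit. The driver library takes care of this, but as a sketch (hypothetical helper name, and 0x40 in the note below is just an arbitrary example address):

```c
#include <stdint.h>

/* First byte of an I2C transfer: 7-bit address in bits 7..1,
 * R/W flag in bit 0 (1 = read, 0 = write). */
uint8_t i2c_address_byte(uint8_t addr7, int read)
{
    return (uint8_t)(((addr7 & 0x7f) << 1) | (read ? 1 : 0));
}
```

A slave at address 0x40 is therefore addressed with 0x80 for a write and 0x81 for a read.
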
The LPC1768 has a full-fledged 10/100 Ethernet controller with WOL and other capabilities. Due to its length I did not include the example here, however, it's available in the sources section.

Software Stacks
Nice, so we now have a working toolchain and drivers, and we know a bit about how to initialize and use some common hardware interfaces. Next we look at some essential software stacks.

FatFS is a fat filesystem implementation that is abstracted from the underlying hardware layer, that is, to use it you'll need to provide your own low-level disk initialization and I/O routines that will eventually be called by the FatFS library to read/write sectors from a disk, or from whatever medium you choose to store your data.

I personally use an SD card (SDC) for my projects. You can find an SDC driver online and port it, or you can use mine, which you can find below in the sources section. The driver is already integrated with FatFS and tested, so it should save you some time.

If you decide to write your own driver, or just want to know how SD cards work, I highly recommend the tutorial on the FatFS homepage along with the SD Simplified Physical Layer Spec.

Once you have managed the low-level I/O, you'll find that FatFS has a really familiar and easy to use interface. Here's a small example:
#include <stdio.h>
#include <string.h>
#include "ff.h"
int main()
{
    FIL     fp;
    FATFS   ffs;
    UINT    len;
    const char *path = "0:test.txt";
    const char *text = "SDC Test";

    f_mount(0, &ffs);   /*mount ffs work area*/
    f_open(&fp, path, FA_WRITE|FA_CREATE_ALWAYS);
    f_write(&fp, text, strlen(text), &len);
    f_close(&fp);       /*flush cached data and close the file*/
    return 0;
}
FreeRTOS is an open-source real time OS. A few things are worth mentioning when using FreeRTOS. First, the choice of heap allocator. FreeRTOS comes with 3 allocators: the first one doesn't free memory, the second one allows freeing memory but doesn't handle fragmentation, and the third and last one is just a wrapper around malloc/free. This is the one you should use unless you want FreeRTOS and libc both poking holes in the heap.

Second issue, is the SysTick_Handler, not really an issue since we declared it as weak we can override it here, simply change xPortSysTickHandler to SysTick_Handler.

Finally, we can't use our sleep function because a) sys_ticks is not updated anymore and b) the RTOS needs to know about tasks that yield the processor, i.e. go to sleep, so it can schedule other tasks. So I define the following macro in FreeRTOS.h (vTaskDelay counts in ticks, so this assumes a 1 ms tick):
#define sleep(ms) vTaskDelay(ms) 
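
Note that vTaskDelay takes its argument in ticks, so the macro is only exact when the tick rate (configTICK_RATE_HZ) is 1000. For other tick rates the conversion looks like this (a host-side sketch with a made-up function name):

```c
#include <stdint.h>

/* Convert a delay in milliseconds to RTOS ticks for a given tick rate. */
uint32_t ms_to_ticks(uint32_t ms, uint32_t tick_rate_hz)
{
    return (uint32_t)(((uint64_t)ms * tick_rate_hz) / 1000u);
}
```

With a 1000Hz tick 500 ms is 500 ticks, while with a 100Hz tick it is only 50.
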
This is a small FreeRTOS example that schedules two tasks:
#include <stdio.h>
#include "FreeRTOS.h"
#include "task.h"

void task_one(void *args)
{
  while (1) {
    printf("task one\n");   /*assumed task body: print and sleep*/
    sleep(1000);
  }
}

void task_two(void *args)
{
  while (1) {
    printf("task two\n");   /*assumed task body: print and sleep*/
    sleep(1000);
  }
}

void vApplicationStackOverflowHook(xTaskHandle *pxTask, signed char *pcTaskName)
{
    printf("task %s stack overflow\n", pcTaskName);
}

int main(void)
{
    xTaskCreate(task_one,   /*task function              */
                "task_one", /*task name                  */
                256,        /*task stack size in words 1k*/
                NULL,       /*task parameters            */
                1,          /*task priority              */
                NULL);      /*task handle                */
    xTaskCreate(task_two,   /*task function              */
                "task_two", /*task name                  */
                256,        /*task stack size in words 1k*/
                NULL,       /*task parameters            */
                2,          /*task priority              */
                NULL);      /*task handle                */
    vTaskStartScheduler();  /*start the scheduler, does not return on success*/
    return 0;
}
uIP is an open-source embedded TCP/IP stack. uIP implements ARP, SLIP, IP, TCP, UDP and ICMP and provides two APIs: a BSD socket-like API and an event-based API. In the sources section you will find a bare minimum example of an ARP-enabled echo server using the event-based API.

A very powerful MCU like the LPC1768 opens up a lot of possibilities, and I hope this introduction is enough to help you explore them. Any feedback is more than welcome, and thank you.

FatFs-SDC driver
hg clone
uIP echo server and client
hg clone
Sparkfun OLED driver
hg clone
The LPC1768 datasheet 
The Definitive Guide to the ARM Cortex-M3
ARM System Developer's Guide: Designing and Optimizing System Software
Designing Embedded Hardware