Saturday, January 1, 2011

Introduction to ARM Cortex-M3 Part 2-Programming

Welcome to the second part of the Introduction to ARM Cortex-M3. In part 1 we went through the core features of the Cortex-M3 and the LPC1768; in this part we will focus on programming the LPC1768, covering the following points:
  • Toolchain overview
  • Library Tweaks
  • Hardware Interfaces
  • Software Stacks
We have a lot to cover, so let's get started...

Toolchain overview
The toolchain of choice is CodeSourcery, a GNU-based ARM toolchain developed in partnership with ARM. It's freely available both as source and pre-compiled, and it uses newlib (by Red Hat), a C library for embedded systems, as its standard C library.

We're almost good to go; however, we still need drivers. Fortunately, NXP provides a nice driver library for the LPC1768. The library is based on the Cortex Microcontroller Software Interface Standard (CMSIS), developed by ARM as an abstraction of the core layers, and it comes with startup code, system initialization code, a linker script, drivers for all the peripherals and many examples.

Library Tweaks
I tweaked the Makefile a little to build the drivers, startup and system code into a single library, and then installed this library and its headers in /usr/local/lpc17xx. I think this greatly simplifies my Makefiles, since I only need to link one library.

I also added some common initialization code, mostly to system_LPC17xx.c, to avoid duplicating it in every project. Finally, I tweaked the linker script a little. Let's have a detailed look at this code.

First order of business is to enable logging. printf, puts and similar libc routines eventually call the low-level function _write. Since _write is platform dependent, newlib can't provide a useful implementation of it, so you need to provide your own to redirect output somewhere. For example, the following redefines _write to redirect output to UART0:
#define stduart ((LPC_UART_TypeDef *)LPC_UART0_BASE)

/*newlib calls this for every write; send the bytes out UART0*/
int _write(int fd, const void *buf, uint32_t nbyte)
{
    UART_Send(stduart, (uint8_t *)buf, nbyte, BLOCKING);
    return nbyte;
}
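newlib passes the file descriptor to _write, with 1 for stdout and 2 for stderr. A slightly more defensive variant, sketched below, sends only those two descriptors to the UART and reports EBADF for anything else. The function name retarget_write and the uart_sink hook are hypothetical stand-ins (on the target the body would live in _write and the hook would wrap the UART_Send call above), so the logic can be exercised off-target.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output hook; on the LPC1768 it would wrap
   UART_Send(stduart, (uint8_t *)buf, len, BLOCKING). */
void (*uart_sink)(const uint8_t *buf, size_t len) = NULL;

/* Body of a _write() retarget: stdout (1) and stderr (2) go to the
   UART, every other descriptor is rejected with EBADF. */
int retarget_write(int fd, const void *buf, size_t nbyte)
{
    if (fd != 1 && fd != 2) {
        errno = EBADF;
        return -1;
    }
    if (uart_sink != NULL) {
        uart_sink((const uint8_t *)buf, nbyte);
    }
    return (int)nbyte;
}
```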
Almost all the code I see, regardless of the embedded platform, needs some sort of delay/sleep function to wait on some event or just to waste time. There are two ways to implement delays. One is to run loops of instructions with a known execution time, which is inaccurate and can be stretched by interrupts; the other is to use hardware timers/counters.

The SysTick timer counts down from the value loaded into its reload register until it reaches 0, then it raises the SysTick interrupt, reloads the value and counts down again, and so on.
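As a worked example of the reload arithmetic: to tick every 1 ms at 100 MHz the counter must span 100,000 cycles, so the reload register holds 99,999 (the count includes 0). The reload register is only 24 bits wide, which caps the longest period. The helper below is hypothetical, not a CMSIS function; CMSIS's SysTick_Config takes the cycle count and subtracts 1 itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu /* 24-bit reload register */

/* Hypothetical helper: compute the SysTick reload value that fires
   tick_hz times per second from a core clock of core_hz.
   Returns false if the period does not fit in 24 bits. */
bool systick_reload(uint32_t core_hz, uint32_t tick_hz, uint32_t *reload)
{
    if (tick_hz == 0)
        return false;
    uint32_t cycles = core_hz / tick_hz;   /* cycles per tick */
    if (cycles == 0 || cycles - 1u > SYSTICK_MAX_RELOAD)
        return false;
    *reload = cycles - 1u;                 /* counter runs reload..0 */
    return true;
}
```

A 1 Hz tick at 100 MHz needs 100,000,000 cycles, which does not fit, so the helper rejects it.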

The following example demonstrates using the SysTick timer for delays. The system tick count is kept in sys_ticks. When sleep is called, the current value of sys_ticks is saved, and we then keep subtracting it from the current tick count until the difference is equal to or greater than the requested delay:
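volatile uint32_t sys_ticks;

/*weak, so that the application (e.g. an RTOS port) can override it*/
void SysTick_Handler(void) __attribute__((weak));

void SysTick_Handler(void)
{
    sys_ticks++;    /*one tick per SysTick interrupt*/
}

void sleep(uint32_t ms)
{
    uint32_t curr_ticks = sys_ticks;
    /*busy-wait until ms ticks have elapsed*/
    while ((sys_ticks - curr_ticks) < ms);
}

int main(void)
{
    /*Setup SysTick to interrupt every 1 ms (SysTick_Config subtracts 1 itself)*/
    SysTick_Config(SystemCoreClock / 1000);

    /*application code here*/
}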
A few things to note here. First, SysTick_Handler is declared weak: if some other object file provides a regular (strong) definition of the same symbol, the linker uses that one instead. In other words, the application can override it; later you will see that I redefine it for the RTOS scheduler.

Second, interrupting every 1 ms gives reasonable resolution, but every interrupt costs CPU cycles, so it is not necessarily the best choice for throughput. You may wish to tune it for your application, or disable it altogether, but keep in mind that the tick period is also the resolution of sleep().
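The subtraction in sleep() stays correct even when sys_ticks wraps around after 2^32 ticks (about 49.7 days at 1 ms per tick), because unsigned arithmetic in C is performed modulo 2^32. A small host-side check of that property, using hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Ticks elapsed since 'start'; correct across a 32-bit wraparound
   because unsigned subtraction is performed modulo 2^32. */
static inline uint32_t ticks_elapsed(uint32_t now, uint32_t start)
{
    return now - start;
}

/* The same test sleep() performs in its busy-wait loop. */
static inline bool delay_expired(uint32_t now, uint32_t start, uint32_t ms)
{
    return ticks_elapsed(now, start) >= ms;
}
```

For example, starting at 0xFFFFFFFE and reading 5 after the wrap gives 7 elapsed ticks, exactly as if no wrap had happened.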

More RAM
We mentioned in part 1 that the LPC17xx has a second 32 KB block of SRAM intended for the Ethernet and USB controllers. If you're not using either, you may wish to use this extra memory in your application. You can do so by accessing the memory directly at 0x2007C000 or, more neatly, by defining a new memory region and a new section in the linker script:
/*linker script*/
MEMORY
{
  rom (rx)   : ORIGIN = 0x00000000, LENGTH = 512K
  ram (rwx)  : ORIGIN = 0x10000000, LENGTH = 32K
  ram2 (rwx) : ORIGIN = 0x2007C000, LENGTH = 32K   /*define memory region*/
}
SECTIONS
{
  .ram2 : /*define section*/
  {
    *(.ram2)
  } >ram2
  /*other sections (.text, .data, .bss, ...)*/
}
Then use GCC's section attribute to place variables in this section:
/*place buffer in section .ram2*/
uint8_t buffer[BUFFER_SIZE] __attribute__ ((section (".ram2")))={0};
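One caveat: typical GCC startup code only copies .data from flash and zeroes .bss, so it does not touch a custom .ram2 section, and an initializer such as ={0} may not take effect on the target. A minimal sketch of clearing the region yourself early in startup, assuming you export two symbols from the linker script around *(.ram2) (the names __ram2_start__ and __ram2_end__ are hypothetical):

```c
#include <stdint.h>

/* Zero every 32-bit word in [start, end). On the target this would be
   called early, e.g. zero_words(&__ram2_start__, &__ram2_end__), using
   hypothetical symbols exported by the linker script. */
void zero_words(uint32_t *start, uint32_t *end)
{
    while (start < end) {
        *start++ = 0;
    }
}
```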
Hardware Interfaces
Next, we will look at initializing and using some of the common hardware interfaces available on the LPC1768, using the NXP driver library.

The serial interface is the most common interface out there. We've seen how to use _write to redirect output to the UART; now let's look at initializing it. The following is an excerpt from my SystemInit function:
#define stduart ((LPC_UART_TypeDef *)LPC_UART0_BASE)

void SystemInit()
{
  /*some init code here*/

  PINSEL_CFG_Type pin_cfg={ /*pinsel config*/
      .Funcnum      = 1,
      .Portnum      = 0,
      .Pinmode      = PINSEL_PINMODE_PULLUP,
      .OpenDrain    = PINSEL_PINMODE_NORMAL,
  };

  UART_CFG_Type uart_cfg={ /*UART config*/
      .Baud_rate    = 57600,
      .Databits     = UART_DATABIT_8,
      .Parity       = UART_PARITY_NONE,
      .Stopbits     = UART_STOPBIT_1,
  };

  /*setup tx: P0.2 = TXD0*/
  pin_cfg.Pinnum  = 2;
  PINSEL_ConfigPin(&pin_cfg);

  /*setup rx: P0.3 = RXD0*/
  pin_cfg.Pinnum  = 3;
  PINSEL_ConfigPin(&pin_cfg);

  /*Initialize uart*/
  UART_Init(stduart, &uart_cfg);
  UART_TxCmd(stduart, ENABLE);
}
More advanced things can be done with the UART, of course. For example, the 16-byte RX/TX hardware FIFOs can be configured to trigger an interrupt or a DMA transfer at a certain fill level; you can find more about this in the datasheet.
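For reference, the LPC17xx UART derives its baud rate from the peripheral clock as PCLK / (16 × DL × (1 + DivAddVal/MulVal)), where DL is the 16-bit divisor latch. The sketch below uses hypothetical helper names, ignores the fractional divider and picks the nearest integer divisor. With the reset default PCLK = CCLK/4 = 25 MHz, 57600 baud gives DL = 27, which is about 0.5% fast; the NXP driver also programs the fractional divider to reduce this error.

```c
#include <stdint.h>

/* Hypothetical helper: nearest integer divisor latch value (DLM:DLL)
   for 16x oversampling, ignoring the fractional divider. */
uint32_t uart_divisor(uint32_t pclk_hz, uint32_t baud)
{
    uint32_t denom = 16u * baud;
    return (pclk_hz + denom / 2u) / denom;  /* round to nearest */
}

/* Baud rate actually produced by a given divisor. */
uint32_t uart_actual_baud(uint32_t pclk_hz, uint32_t divisor)
{
    return pclk_hz / (16u * divisor);
}
```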

Note that when you connect the mbed, you may need to set the baud rate of /dev/ttyACMx using the following command:
stty -F /dev/ttyACM0 speed 57600
SPI is a full-duplex serial interface that uses four wires for data transfer: Master In Slave Out (MISO), Master Out Slave In (MOSI), Serial Clock (SCLK) and Slave Select (SSEL).

Slave Select, or Chip Select (CS), is used to select the active slave when multiple slaves are present on the bus. A few things about SPI are worth mentioning. First, both endpoints have shift registers that are clocked together: as the master shifts its byte out, the slave's byte is shifted in, i.e. the two bytes are exchanged. So to read a byte you must write one too (usually a dummy byte), and when you only want to write you may ignore the received byte.
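To make the exchange concrete, the host-side simulation below models the master and slave shift registers as two bytes shifted MSB first. On each of the eight clocks, each side sends its top bit to the other and shifts in the bit it receives, so after eight clocks the two bytes have swapped. It illustrates the principle only; spi_exchange is a hypothetical name, not driver code.

```c
#include <stdint.h>

/* Simulate one 8-bit SPI transfer between two shift registers,
   MSB first. Each clock, the master's MSB goes out on MOSI and the
   slave's MSB goes out on MISO; both registers shift left and take in
   the bit sent by the other side. */
void spi_exchange(uint8_t *master, uint8_t *slave)
{
    for (int clk = 0; clk < 8; clk++) {
        uint8_t mosi = (uint8_t)((*master >> 7) & 1u);
        uint8_t miso = (uint8_t)((*slave >> 7) & 1u);
        *master = (uint8_t)((*master << 1) | miso);
        *slave  = (uint8_t)((*slave << 1) | mosi);
    }
}
```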

Second, the clock polarity (CPOL) and clock phase (CPHA). The clock polarity determines the idle state of the clock: with CPOL = 0 the clock is low when idle and high when active; with CPOL = 1 it is high when idle and low when active. The clock phase determines the edge on which data is sampled: with CPHA = 0 data is sampled on the first clock edge, with CPHA = 1 on the second.

The four combinations of CPOL and CPHA are called SPI modes, and you need to select the mode your hardware expects when configuring SPI. Note that with CPOL = 0 and CPHA = 0 the first clock edge goes from low to high, so data is sampled on the rising edge of the clock. The same holds for CPOL = 1 and CPHA = 1: there the second edge goes from low to high, so data is again sampled on the rising edge.

A real-life example is the SSD1339 OLED driver, which samples data on the rising edge of the clock, so two modes work equally well: CPOL = 0/CPHA = 0 and CPOL = 1/CPHA = 1.
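The reasoning above can be captured in a few lines. This hypothetical helper derives the sampling edge from CPOL and CPHA, using the common convention mode = (CPOL << 1) | CPHA, and confirms that modes 0 and 3 both sample on the rising edge.

```c
#include <stdint.h>

typedef enum { EDGE_RISING, EDGE_FALLING } spi_edge;

/* SPI mode number as commonly defined: mode = (CPOL << 1) | CPHA. */
static inline unsigned spi_mode(unsigned cpol, unsigned cpha)
{
    return ((cpol & 1u) << 1) | (cpha & 1u);
}

/* Edge on which data is sampled. With CPOL = 0 the clock idles low,
   so its first edge is rising; with CPOL = 1 the first edge is falling.
   CPHA = 0 samples on the first edge, CPHA = 1 on the second. */
spi_edge spi_sample_edge(unsigned cpol, unsigned cpha)
{
    int first_edge_rising = (cpol & 1u) == 0;
    int sample_on_first   = (cpha & 1u) == 0;
    return (first_edge_rising == sample_on_first) ? EDGE_RISING : EDGE_FALLING;
}
```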

Finally, note that according to the datasheet the maximum frequency you can set SPI to is 1/8 of the peripheral clock (PCLK) selected in PCLKSEL0/1. Setting PCLK_SPI to 01b selects the CPU clock (CCLK) itself as the peripheral clock, so f = CCLK/8 = 100 MHz/8 = 12.5 MHz.
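The PCLKSEL encoding is easy to get wrong, so here is a small hypothetical helper that maps the 2-bit field to the resulting peripheral clock (00 = CCLK/4, 01 = CCLK, 10 = CCLK/2, 11 = CCLK/8; the CAN peripherals use /6 for 11, which this sketch ignores), followed by the PCLK/8 SPI limit described above.

```c
#include <stdint.h>

/* Peripheral clock for a 2-bit PCLKSEL field value (non-CAN peripherals). */
uint32_t pclk_from_sel(uint32_t cclk_hz, unsigned sel)
{
    static const uint8_t div[4] = { 4, 1, 2, 8 };
    return cclk_hz / div[sel & 3u];
}

/* Highest SPI bit rate under the PCLK/8 limit. */
uint32_t spi_max_rate(uint32_t pclk_hz)
{
    return pclk_hz / 8u;
}
```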

Here is an example of initializing and using SPI. Note that the legacy SPI interface is superseded by the SSP controller, which supports SPI among other protocols:
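void ssp_init()
{
    PINSEL_CFG_Type pin_cfg={
        .Portnum    = SSP_PORT,
        .Pinmode    = PINSEL_PINMODE_PULLUP,
        .OpenDrain  = PINSEL_PINMODE_NORMAL,
    };

    SSP_CFG_Type  ssp_cfg ={
        .CPHA           = SSP_CPHA_FIRST, /*sample on the first edge (CPHA = 0)*/
        .CPOL           = SSP_CPOL_HI,    /*NXP naming: clock active high, idle low (CPOL = 0)*/
        .ClockRate      = SSP_CLCK,
        .Databit        = SSP_DATABIT_8,
        .Mode           = SSP_MASTER_MODE,
        .FrameFormat    = SSP_FRAME_SPI
    };

    /*SSP PINSEL configuration*/
    pin_cfg.Funcnum = 2;
    pin_cfg.Pinnum = SSP_MISO;
    PINSEL_ConfigPin(&pin_cfg);

    pin_cfg.Pinnum = SSP_MOSI;
    PINSEL_ConfigPin(&pin_cfg);

    pin_cfg.Pinnum = SSP_SCLK;
    PINSEL_ConfigPin(&pin_cfg);

    pin_cfg.Pinnum = SSP_SSEL;
    PINSEL_ConfigPin(&pin_cfg);

    /*initialize and enable SSP*/
    SSP_Init(SSP_BASE, &ssp_cfg);
    SSP_Cmd(SSP_BASE, ENABLE);
}

uint8_t ssp_read()
{
    while(SSP_GetStatus(SSP_BASE, SSP_STAT_BUSY));
    SSP_SendData(SSP_BASE, 0xFF);   /*dummy byte clocks the reply in*/
    while(!SSP_GetStatus(SSP_BASE, SSP_STAT_RXFIFO_NOTEMPTY));
    return SSP_ReceiveData(SSP_BASE);
}

void ssp_write(uint8_t c)
{
    while(SSP_GetStatus(SSP_BASE, SSP_STAT_BUSY));
    SSP_SendData(SSP_BASE, c);
    /*discard the byte received during the write so the RX FIFO stays empty*/
    while(!SSP_GetStatus(SSP_BASE, SSP_STAT_RXFIFO_NOTEMPTY));
    SSP_ReceiveData(SSP_BASE);
}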
I2C is a half-duplex, low-speed serial interface that uses two wires: Serial Data Line (SDA) and Serial Clock Line (SCL). Both lines are open-collector: a device can only pull the line low (sink current), never drive it high, so each line needs a pull-up resistor to keep it high when idle.
I2C doesn't use a chip select; instead, each slave on the bus has a unique 7-bit address and responds only to that address. The following excerpt from the BMA180 accelerometer driver I wrote demonstrates the use of I2C:
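On the wire, the 7-bit address is sent as the first byte after a START condition, shifted left by one with the read/write bit in bit 0 (0 = write, 1 = read). The NXP driver builds this byte for you from sl_addr7bit; the sketch below only illustrates the encoding, and the helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address byte sent after a START condition: 7-bit address in bits 7..1,
   R/W flag in bit 0 (1 = read, 0 = write). */
static inline uint8_t i2c_addr_byte(uint8_t addr7, bool read)
{
    return (uint8_t)(((addr7 & 0x7Fu) << 1) | (read ? 1u : 0u));
}
```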
uint8_t bma180_read_id(void);

void bma180_init()
{
    PINSEL_CFG_Type pin_cfg={
        .Funcnum    = 3,
        .Portnum    = BMA180_I2C_PORT, /*port0 I2C1*/
        .Pinmode    = PINSEL_PINMODE_PULLUP,
        .OpenDrain  = PINSEL_PINMODE_NORMAL
    };

    pin_cfg.Pinnum = BMA180_I2C_SDA;
    PINSEL_ConfigPin(&pin_cfg);

    pin_cfg.Pinnum = BMA180_I2C_SCL;
    PINSEL_ConfigPin(&pin_cfg);

    I2C_Init(BMA180_I2C_BASE, BMA180_I2C_CLCK);
    I2C_Cmd(BMA180_I2C_BASE, ENABLE);

    printf("bma180 chip id %d\n", bma180_read_id());
}

/*reads chip id*/
uint8_t bma180_read_id(void)
{
    uint8_t buf = 0x00; /*register address to send, then the value read back*/

    I2C_M_SETUP_Type tfr_cfg = {
        .tx_data     = &buf,
        .tx_length   = sizeof(buf),
        .rx_data     = &buf,
        .rx_length   = sizeof(buf),
        .sl_addr7bit = BMA180_I2C_ADDR,
        .retransmissions_max = 3,
    };

    /*write register address/read register value*/
    I2C_MasterTransferData(BMA180_I2C_BASE, &tfr_cfg, I2C_TRANSFER_POLLING);

    return buf;
}
The LPC1768 has a full-fledged 10/100 Ethernet controller with Wake-on-LAN (WOL) and other capabilities. Due to its length I did not include the example here; however, it's available in the sources section.

Software Stacks
Nice, so we now have a working toolchain and drivers, and we know a bit about how to initialize and use some common hardware interfaces. Next we look at some essential software stacks.

FatFs is a FAT filesystem implementation that is abstracted from the underlying hardware layer. That is, to use it you'll need to provide your own low-level disk initialization and I/O routines, which the FatFs library calls to read and write sectors on a disk, or on whatever medium you choose to store your data.
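To illustrate what that low-level layer boils down to, here is a sector store backed by a RAM array. In a real port, this logic would sit behind FatFs's disk_read/disk_write callbacks in diskio.c and talk to the SD card instead. The ramdisk_* names, the 512-byte sector size and the tiny disk size are assumptions made for the sketch.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE  512u
#define SECTOR_COUNT 16u   /* tiny disk for illustration */

static uint8_t ramdisk[SECTOR_COUNT * SECTOR_SIZE];

/* Read 'count' sectors starting at 'sector' into buf; 0 on success. */
int ramdisk_read(uint8_t *buf, uint32_t sector, uint32_t count)
{
    if (sector >= SECTOR_COUNT || count > SECTOR_COUNT - sector)
        return -1;
    memcpy(buf, &ramdisk[sector * SECTOR_SIZE], count * SECTOR_SIZE);
    return 0;
}

/* Write 'count' sectors from buf starting at 'sector'; 0 on success. */
int ramdisk_write(const uint8_t *buf, uint32_t sector, uint32_t count)
{
    if (sector >= SECTOR_COUNT || count > SECTOR_COUNT - sector)
        return -1;
    memcpy(&ramdisk[sector * SECTOR_SIZE], buf, count * SECTOR_SIZE);
    return 0;
}
```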

I personally use an SD card (SDC) in my projects. You can find an SDC driver online and port it, or you can use mine, which is listed in the sources section below. The driver is already integrated with FatFs and tested, so it should save you some time.

If you decide to write your own driver, or just want to know how SD cards work, I highly recommend the tutorial on the FatFs homepage along with the SD Simplified Physical Layer Specification.
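As a taste of the spec: in SPI mode every SD command is a 6-byte frame consisting of a start byte 0x40 | index, a 32-bit big-endian argument, and a 7-bit CRC (polynomial x^7 + x^3 + 1) followed by a stop bit. The CRC is ignored in SPI mode by default, but CMD0 and CMD8 still need a valid one, which is why drivers often hard-code 0x95 and 0x87. A sketch of building the frame, with hypothetical function names:

```c
#include <stddef.h>
#include <stdint.h>

/* CRC7 (polynomial x^7 + x^3 + 1), processed MSB first. */
uint8_t sd_crc7(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            uint8_t in  = (uint8_t)((data[i] >> bit) & 1u);
            uint8_t top = (uint8_t)((crc >> 6) & 1u);
            crc = (uint8_t)((crc << 1) & 0x7Fu);
            if (in ^ top)
                crc ^= 0x09u;
        }
    }
    return crc;
}

/* Build a 6-byte SPI-mode command frame for command index 'cmd'. */
void sd_build_cmd(uint8_t cmd, uint32_t arg, uint8_t frame[6])
{
    frame[0] = (uint8_t)(0x40u | (cmd & 0x3Fu));   /* start + transmission bits */
    frame[1] = (uint8_t)(arg >> 24);
    frame[2] = (uint8_t)(arg >> 16);
    frame[3] = (uint8_t)(arg >> 8);
    frame[4] = (uint8_t)arg;
    frame[5] = (uint8_t)((sd_crc7(frame, 5) << 1) | 1u);  /* CRC + stop bit */
}
```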

Once you have the low-level I/O working, you'll find that FatFs has a familiar, easy-to-use interface. Here's a small example:
#include <stdio.h>
#include <string.h>
#include "ff.h"

int main()
{
    FIL     fp;
    FATFS   ffs;
    UINT    len;
    const char *path = "0:test.txt";
    const char *text = "SDC Test";

    f_mount(0, &ffs);   /*mount ffs work area*/
    f_open(&fp, path, FA_WRITE|FA_CREATE_ALWAYS);
    f_write(&fp, text, strlen(text), &len);
    f_close(&fp);       /*flush cached data and close the file*/
    return 0;
}
FreeRTOS is an open-source real-time OS. A few things are worth mentioning when using it. First, the choice of heap allocator: FreeRTOS comes with three. The first (heap_1.c) never frees memory; the second (heap_2.c) allows freeing memory but does not merge freed blocks, so the heap can fragment; the third and last (heap_3.c) is just a wrapper around malloc/free. The third is the one you should use, unless you want FreeRTOS and libc both poking holes in the heap.
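The first allocator is the simplest of the three: it carves blocks out of a fixed array and never gives them back. A simplified sketch of that idea (not the FreeRTOS source), assuming 8-byte alignment and using hypothetical names and a tiny heap size:

```c
#include <stddef.h>
#include <stdint.h>

#define HEAP_SIZE  64u
#define ALIGNMENT  8u

static uint8_t heap[HEAP_SIZE] __attribute__((aligned(ALIGNMENT)));
static size_t  next_free;   /* offset of the first unused byte */

/* Allocate 'size' bytes rounded up to ALIGNMENT; there is no free(). */
void *bump_alloc(size_t size)
{
    size_t rounded = (size + ALIGNMENT - 1u) & ~(size_t)(ALIGNMENT - 1u);
    if (rounded == 0 || rounded > HEAP_SIZE - next_free)
        return NULL;
    void *block = &heap[next_free];
    next_free += rounded;
    return block;
}
```

Because nothing is ever freed, there is nothing to fragment, which is why this scheme suits systems that create all their tasks and queues once at startup.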

The second issue is the SysTick_Handler. It's not really an issue: since we declared it weak, we can override it here by simply renaming the port's xPortSysTickHandler to SysTick_Handler.
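If the startup code's vector table uses the CMSIS handler names, the port's SVC and PendSV handlers need the same treatment as SysTick. Rather than editing the port source, one common approach is to map the names in FreeRTOSConfig.h; this fragment assumes the CMSIS names SVC_Handler, PendSV_Handler and SysTick_Handler are the ones in your vector table.

```c
/* FreeRTOSConfig.h: map the Cortex-M3 port handlers onto the CMSIS vector names */
#define vPortSVCHandler     SVC_Handler
#define xPortPendSVHandler  PendSV_Handler
#define xPortSysTickHandler SysTick_Handler
```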

Finally, we can't use our sleep function anymore because a) sys_ticks is no longer updated, and b) the RTOS needs to know about tasks that yield the processor, i.e. go to sleep, so it can schedule other tasks. So I define the following macro in FreeRTOS.h:
#define sleep(ms) vTaskDelay(ms) 
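vTaskDelay counts ticks, not milliseconds, so this macro is exact only when configTICK_RATE_HZ is 1000. A tick-rate-independent conversion should round up so that a delay is never shorter than requested; the helper name below is hypothetical.

```c
#include <stdint.h>

/* Convert milliseconds to RTOS ticks, rounding up. */
uint32_t ms_to_ticks(uint32_t ms, uint32_t tick_rate_hz)
{
    uint64_t ticks = ((uint64_t)ms * tick_rate_hz + 999u) / 1000u;
    return (uint32_t)ticks;
}
```

With it, the macro could become #define sleep(ms) vTaskDelay(ms_to_ticks((ms), configTICK_RATE_HZ)).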
Here is a small FreeRTOS example that schedules two tasks:
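#include <stdio.h>
#include "FreeRTOS.h"
#include "task.h"

void task_one(void *args)
{
  while (1) {
    /*task one work here*/
    sleep(1000);    /*block, letting other tasks run*/
  }
}

void task_two(void *args)
{
  while (1) {
    /*task two work here*/
    sleep(1000);    /*block, letting other tasks run*/
  }
}

/*called on stack overflow (needs configCHECK_FOR_STACK_OVERFLOW > 0)*/
void vApplicationStackOverflowHook(xTaskHandle *pxTask, signed char *pcTaskName)
{
    printf("task stack overflow: %s\n", (char *)pcTaskName);
    for (;;);
}

int main(void)
{
    xTaskCreate(task_one,   /*task function              */
                "task_one", /*task name                  */
                256,        /*task stack size in words 1k*/
                NULL,       /*task parameters            */
                1,          /*task priority              */
                NULL);      /*task handle                */
    xTaskCreate(task_two,   /*task function              */
                "task_two", /*task name                  */
                256,        /*task stack size in words 1k*/
                NULL,       /*task parameters            */
                2,          /*task priority              */
                NULL);      /*task handle                */

    vTaskStartScheduler();  /*does not return while the scheduler runs*/
    return 0;
}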
uIP is an open-source embedded TCP/IP stack. It implements ARP, SLIP, IP, TCP, UDP and ICMP, and provides two APIs: a BSD-socket-like API (protosockets) and an event-based API. In the sources section you will find a bare-minimum example of an ARP-enabled echo server using the event-based API.
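IP, ICMP, TCP and UDP all protect their headers and/or data with the same 16-bit one's-complement Internet checksum (RFC 1071), so every stack of this kind, uIP included, implements it. A portable sketch, independent of uIP's own implementation:

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over 'len' bytes, treating the data as
   big-endian 16-bit words (an odd trailing byte is padded with zero). */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    while (len > 1) {
        sum += (uint32_t)((data[0] << 8) | data[1]);
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)(data[0] << 8);
    while (sum >> 16)                      /* fold carries back in */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}
```

The test bytes are the worked example from RFC 1071, whose folded sum is 0xddf2 and whose checksum is therefore 0x220d.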

A powerful MCU like the LPC1768 opens up a lot of possibilities, and I hope this introduction is enough to help you explore them. Any feedback is more than welcome. Thank you.

Sources
FatFs-SDC driver
hg clone
uIP echo server and client
hg clone
Sparkfun OLED driver
hg clone
The LPC1768 datasheet 
The Definitive Guide to the ARM Cortex-M3
ARM System Developer's Guide: Designing and Optimizing System Software
Designing Embedded Hardware