Tuesday, September 7, 2010

Introduction to ARM Cortex-M3 Part 1-Overview

Hello, this is an introduction to the ARM Cortex-M3 microprocessor. In the first part we'll talk about the core features of the Cortex-M3, the LPC1768 MCU and the mbed prototyping board. If you decide to get started with ARM, this series should have a fair amount of information to help you do so quickly... have fun :)

Cortex-M3 overview
The Cortex-M3 is a 32-bit processor core designed by ARM and based on the ARMv7 architecture. ARMv7 comes in three profiles: A for high-end applications, R for real-time applications and the microcontroller-targeted M profile. The Cortex-M3 is a Harvard architecture, i.e. it has separate code and data buses, allowing parallel instruction and data fetches.

The Cortex-M3 is based on the M profile. It has a 3-stage pipeline, an advanced interrupt controller (NVIC) with low interrupt latency, support for an optional MPU (the LPC17xx has one), support for two operation modes and two access modes, and support for the Thumb-2 instruction set.

Generally speaking, ARM processors support two instruction sets, the 32-bit ARM and the 16-bit Thumb instruction sets. When executing ARM code the processor is said to be in ARM state, and when executing Thumb code it's said to be in Thumb state.

The ARM state offers more performance for speed-critical tasks, and certain operations require it, e.g. entering interrupt handlers, while the Thumb state offers higher code density (more instructions in less memory).

ARM and Thumb code usually live in separate source files, and switching the processor between the two states frequently can complicate things and add overhead to both development and execution time.

The Cortex-M3 supports the Thumb-2 set (a superset of Thumb), which allows mixing 16-bit and 32-bit instructions. It offers the best of both worlds, performance and high code density, in one instruction set without the need for state switching. In fact, the ARM state is not even supported by the Cortex-M3, i.e. it's always in Thumb state.
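To make the 16/32-bit mixing a bit more concrete, here is a small sketch of how a decoder can tell the two sizes apart by looking only at the first halfword. It relies on the Thumb-2 encoding rule that a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction. The function names are made up for this example and it runs on a PC, not on the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if `halfword` is the first half of a 32-bit Thumb-2 instruction.
 * A first halfword whose top five bits are 0b11101, 0b11110 or 0b11111
 * starts a 32-bit instruction; anything else is a 16-bit instruction. */
static bool thumb2_is_32bit(uint16_t halfword)
{
    unsigned top5 = (unsigned)halfword >> 11;
    return top5 == 0x1D || top5 == 0x1E || top5 == 0x1F;
}

/* Hypothetical helper: count the instructions in a stream of halfwords
 * that freely mixes 16-bit and 32-bit encodings. */
static size_t thumb2_count_instructions(const uint16_t *code, size_t n_halfwords)
{
    assert(code != NULL);
    size_t count = 0;
    size_t i = 0;
    while (i < n_halfwords) {
        i += thumb2_is_32bit(code[i]) ? 2 : 1;  /* skip one or two halfwords */
        count++;
    }
    return count;
}
```

For example, 0x4770 (BX LR) and 0xE7FE (B .) are 16-bit instructions, while 0xF000 is the first half of a 32-bit BL.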

Operation Modes
The Cortex-M3 supports two operation modes: thread mode for process execution and handler mode for exception handler code. There are two stack pointers, the main stack pointer (MSP) and the process stack pointer (PSP). Handler mode always uses the MSP, while thread mode can use either one.

Usually the thread and handler modes use the same stack pointer (the MSP), thus sharing the stack memory. However, by configuring thread mode to use the PSP, the stack memory for the two modes can be separated, consequently protecting the system stack memory from a faulty user process.

The two stack pointers are banked, that is, only one is visible as SP at a time, and SP accesses the currently used stack pointer. In an interrupt handler the MSP is always used, but access to the PSP may still be needed for several reasons, mainly when an OS, e.g. an RTOS, is running things: it may need to change the PSP for context switching, or fetch the SVC number using the stacked program counter (PC) when an SVC call is made.
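The stack pointer selection can be sketched as a tiny model that runs on a PC. It assumes the selection is controlled by bit 1 (SPSEL) of the CONTROL register; the struct and function names are invented for this example and this is not real startup code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CONTROL_SPSEL (1u << 1)  /* CONTROL bit 1: thread mode uses PSP when set */

/* Hypothetical model of the banked stack pointers. */
typedef struct {
    uint32_t msp;           /* main stack pointer    */
    uint32_t psp;           /* process stack pointer */
    uint32_t control;       /* CONTROL register      */
    bool     handler_mode;  /* true while in an exception handler */
} sp_model_t;

/* Returns the stack pointer that "SP" currently refers to. */
static uint32_t *active_sp(sp_model_t *cpu)
{
    assert(cpu != NULL);
    if (!cpu->handler_mode && (cpu->control & CONTROL_SPSEL))
        return &cpu->psp;   /* thread mode configured to use the PSP */
    return &cpu->msp;       /* handler mode, or thread mode by default */
}
```

Out of reset SPSEL is clear, so both modes share the MSP until the startup code or the OS sets the bit.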

Other reasons for accessing the PSP in an interrupt handler may include locating a faulty instruction, e.g. the address of an instruction that caused a bus fault can be read in the bus fault handler from the stacked PC on the process stack.
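Here is a rough sketch of the SVC case, written so it runs on a PC. It assumes the hardware-stacked frame layout (r0-r3, r12, lr, pc, xpsr) and that SVC #imm8 is encoded as the halfword 0xDF00 | imm8, so the number is the byte just below the stacked PC. The `mem` array standing in for the target's address space and the function name are made up; on the target you would read the byte at (stacked PC - 2) directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Registers pushed by hardware on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exception_frame_t;

/* Returns the SVC number of the call that produced `frame`.
 * `mem` is a stand-in for target memory (assumption for this sketch):
 * mem[a] holds the byte at address a, little-endian as on the Cortex-M3.
 * The stacked PC points at the instruction after the SVC, and the
 * immediate is the low byte of the SVC halfword, i.e. the byte at pc - 2. */
static uint8_t svc_number(const exception_frame_t *frame, const uint8_t *mem)
{
    assert(frame != NULL && mem != NULL && frame->pc >= 2);
    return mem[frame->pc - 2];
}
```

In a real SVC handler the frame pointer would come from the PSP or the MSP, depending on which stack the caller was using.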

Access Modes
The Cortex-M3 supports two access modes, user and privileged. In user mode, access to certain registers and instructions is restricted, and if an MPU is available, access to memory regions containing OS data or another process's data can also be restricted for a user process. This is mainly intended for use by a multitasking OS.

After reset the processor is running in privileged thread mode, and a switch to user level should occur shortly after. When entering an exception handler the processor switches to the privileged level, and then back to the previous level upon exiting the handler. This could serve very well when implementing system calls.
A user level thread cannot switch back to the privileged level except through an exception handler (running in handler mode) that changes the access level to privileged on behalf of the thread before returning to thread mode.
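The rules above can be captured in a few lines of C that run on a PC. This is only a model: it assumes the access level is controlled by bit 0 (nPRIV) of the CONTROL register and that a write to CONTROL from user level is simply ignored. The names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CONTROL_NPRIV (1u << 0)  /* CONTROL bit 0: thread mode is unprivileged when set */

/* Hypothetical model of the access level. */
typedef struct {
    uint32_t control;       /* CONTROL register */
    bool     handler_mode;  /* true while in an exception handler */
} priv_model_t;

/* Handler mode is always privileged; thread mode depends on nPRIV. */
static bool is_privileged(const priv_model_t *cpu)
{
    assert(cpu != NULL);
    return cpu->handler_mode || !(cpu->control & CONTROL_NPRIV);
}

/* Models a write to CONTROL. Returns true if the write took effect,
 * false if it was ignored because the caller is at user level. */
static bool write_control(priv_model_t *cpu, uint32_t value)
{
    if (!is_privileged(cpu))
        return false;
    cpu->control = value;
    return true;
}
```

A thread can drop itself to user level, but only a handler (e.g. one reached through SVC) can clear the bit again.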

The Nested Vectored Interrupt Controller (NVIC) is an advanced interrupt controller that allows interrupts to be nested according to their priorities, that is, if an interrupt handler is executing when a higher priority interrupt occurs, the lower priority handler is preempted and the higher priority handler executes next.
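The preemption rule itself is simple enough to write down as a one-line check. This sketch deliberately ignores priority grouping and the interrupt masking registers, and the function name is made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Decides whether a newly asserted interrupt preempts the running code.
 * Lower number means higher priority. If no handler is active, the new
 * interrupt is simply taken. An equal priority does not preempt. */
static bool preempts(int new_priority, bool handler_active, int active_priority)
{
    if (!handler_active)
        return true;
    return new_priority < active_priority;
}
```

So with a priority 2 handler running, a priority 1 interrupt preempts it, while a priority 2 or 3 interrupt stays pending.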

Some priorities are fixed, like the reset exception, which has the highest priority (the lowest number). Others are programmable and can be changed dynamically, i.e. at run time. Handlers can also be changed at run time; however, this requires the interrupt vector to be relocated from flash to RAM and then patched with the new handler address.

The NVIC handles stacking (pushing registers onto the stack) and un-stacking (popping registers from the stack) at the hardware level. This not only relieves the programmer from doing so, it also allows a handler to be a normal C function and decreases the overall interrupt latency.

Vectored means that when an interrupt is asserted its number is known to the processor and used to index into the interrupt vector to obtain the handler address directly, as opposed to having a shared handler and enumerating devices to know which one interrupted the processor.
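Here is what "indexing into the interrupt vector" amounts to, as a small model that runs on a PC. The table contents and handler names are made up for the example; on the real chip entry 0 holds the initial MSP rather than a handler, and the lookup is done by hardware.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*handler_t)(void);

static int last_handled = -1;  /* records which handler ran, for the demo */

static void default_handler(void)    { last_handled = 0; }
static void nmi_handler(void)        { last_handled = 2; }
static void hard_fault_handler(void) { last_handled = 3; }

/* Simplified vector table. Slots 0 (initial MSP on real hardware) and
 * 1 (reset) are filled with a default handler in this model. */
static const handler_t vector_table[] = {
    default_handler,     /* 0: initial MSP on the real chip */
    default_handler,     /* 1: reset      */
    nmi_handler,         /* 2: NMI        */
    hard_fault_handler,  /* 3: hard fault */
};
#define VECTOR_COUNT (sizeof vector_table / sizeof vector_table[0])

/* The exception number selects the handler directly, no polling needed. */
static void dispatch(unsigned exception_number)
{
    assert(exception_number < VECTOR_COUNT);
    vector_table[exception_number]();
}

/* Address of a table entry on the target: each entry is one 32-bit word. */
static uint32_t vector_address(uint32_t table_base, unsigned exception_number)
{
    return table_base + 4u * exception_number;
}
```

With the table at 0x0, the hard fault entry (exception 3) sits at address 0xC.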

This is part of the interrupt vector from the startup code of the Cortex-M3. The first word is the initial MSP value, the second is the address of the reset handler, and so on... The interrupt vector is located at address 0x0 in flash:
.long   __cs3_stack                 /* Top of Stack                 */
.long   __cs3_reset                 /* Reset Handler                */
.long   NMI_Handler                 /* NMI Handler                  */
.long   HardFault_Handler           /* Hard Fault Handler           */
.long   MemManage_Handler           /* MPU Fault Handler            */
.long   BusFault_Handler            /* Bus Fault Handler            */


/* Dummy Exception Handlers */

    .weak   NMI_Handler
    .type   NMI_Handler, %function
    B       .
    .size   NMI_Handler, . - NMI_Handler

    .weak   HardFault_Handler
    .type   HardFault_Handler, %function
    B       .
    .size   HardFault_Handler, . - HardFault_Handler
The __cs3_stack symbol is the initial stack address in RAM, which is loaded into the MSP on startup. Since the Cortex-M3 uses a descending stack, i.e. the stack grows downwards while the heap grows upwards, this address is the top (end) of the RAM region.

In the linker script, which is basically the memory layout, the __cs3_stack symbol is defined as the start of RAM plus the RAM size:
PROVIDE(__cs3_stack = __cs3_region_start_ram + __cs3_region_size_ram);
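To see why the initial value is start + size, here is a small model of a full descending stack that runs on a PC. The RAM base 0x10000000 and the 32 KB size are assumptions picked to resemble the LPC1768's main SRAM, and the array is only a stand-in for real memory.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_START 0x10000000u  /* assumed RAM base for this sketch */
#define RAM_SIZE  0x8000u      /* assumed 32 KB of RAM             */

static uint32_t ram[RAM_SIZE / 4];          /* stand-in for the RAM region  */
static uint32_t sp = RAM_START + RAM_SIZE;  /* same formula as __cs3_stack  */

/* Full descending stack: decrement SP first, then store. */
static void push(uint32_t value)
{
    sp -= 4;
    assert(sp >= RAM_START);                /* would be a stack overflow */
    ram[(sp - RAM_START) / 4] = value;
}

/* Load from SP, then increment. */
static uint32_t pop(void)
{
    assert(sp < RAM_START + RAM_SIZE);      /* stack must not be empty */
    uint32_t value = ram[(sp - RAM_START) / 4];
    sp += 4;
    return value;
}
```

The initial SP points just past the last word of RAM, and the first push lands in that last word, so nothing is wasted.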

Near the end of the startup code, dummy handlers that simply branch to their own address (B .) are provided for all the interrupts. They are defined as weak, so when linking the binary they can be overridden by user-defined handlers (if any).

The NVIC uses two interesting techniques to further decrease the interrupt latency: tail-chaining and late arrivals.

When two interrupts arrive at the same time, or an interrupt occurs while a handler of the same or higher priority is executing (i.e. it cannot preempt), the higher priority interrupt executes first while the other remains pending. As soon as the higher priority handler finishes, the pending interrupt is executed immediately, i.e. tail-chained to the first one, without un-stacking and then stacking the registers again. That is not necessary because the contents of the stack have not changed, thus saving a significant amount of time on subsequent interrupt handlers.

Notice that due to the Harvard architecture, the stacking of the registers can take place simultaneously with the fetches of the interrupt vector and ISR code.
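To get a feel for the saving, here is a back-of-the-envelope calculation in C. The cycle counts (12 to enter, 12 to exit, 6 for a tail-chain) are the figures commonly quoted for the Cortex-M3 with zero-wait-state memory; treat them as assumptions for this sketch, since real numbers depend on the memory system.

```c
#include <assert.h>

#define ENTRY_CYCLES      12u  /* assumed: stacking + vector fetch   */
#define EXIT_CYCLES       12u  /* assumed: un-stacking               */
#define TAIL_CHAIN_CYCLES  6u  /* assumed: handler-to-handler switch */

/* Overhead for n back-to-back handlers if every one did a full
 * stack / un-stack sequence. */
static unsigned overhead_without_tail_chaining(unsigned n)
{
    return n * (ENTRY_CYCLES + EXIT_CYCLES);
}

/* Overhead with tail-chaining: one entry, one exit, and a short
 * tail-chain between each pair of consecutive handlers. */
static unsigned overhead_with_tail_chaining(unsigned n)
{
    if (n == 0)
        return 0;
    return ENTRY_CYCLES + (n - 1) * TAIL_CHAIN_CYCLES + EXIT_CYCLES;
}
```

Under these assumptions, three back-to-back handlers cost 36 cycles of overhead instead of 72.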

Late Arrivals
A late arrival happens when a higher priority interrupt is asserted while the processor is entering a lower priority interrupt, i.e. after stacking has started but before the first handler instruction executes. In that case the higher priority interrupt's vector is fetched instead and stacking is allowed to finish. As soon as it does, the processor executes the higher priority handler immediately. When it finishes executing, the lower priority interrupt is tail-chained to the higher priority interrupt and allowed to execute.

Similarly, if an interrupt occurs while un-stacking is taking place, the un-stacking is abandoned and the new interrupt is tail-chained to the handler that just finished.

A Bit-Band region allows atomic bit manipulation through another memory region called the alias region. Each bit in the Bit-Band region is addressable through a 32-bit aligned address in the alias region, that is, each word in the Bit-Band region is mapped to 32 words in the alias region.
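The mapping is plain arithmetic, so it's easy to sketch in C and check on a PC. This assumes the SRAM Bit-Band region at 0x20000000 (1 MB) with its alias region at 0x22000000; the function name is made up.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BITBAND_BASE 0x20000000u  /* start of the SRAM Bit-Band region */
#define SRAM_BITBAND_SIZE 0x00100000u  /* the region is 1 MB                */
#define SRAM_ALIAS_BASE   0x22000000u  /* start of its alias region         */

/* Returns the alias address for bit `bit` of the word at `addr`.
 * Every byte of the Bit-Band region takes 32 bytes of alias space
 * (8 bits x 4 bytes each), and each bit gets its own 4-byte slot. */
static uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    assert(addr >= SRAM_BITBAND_BASE);
    assert(addr - SRAM_BITBAND_BASE < SRAM_BITBAND_SIZE);
    assert(bit < 32);
    uint32_t byte_offset = addr - SRAM_BITBAND_BASE;
    return SRAM_ALIAS_BASE + byte_offset * 32u + bit * 4u;
}
```

For example, bit 0 of the word at 0x2007C000 is aliased at 0x22F80000, and bit 3 of the word at 0x20000000 is aliased at 0x2200000C.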

Bit-Banding can shorten a read-modify-write operation. Say you wish to set a bit in some device register, e.g. to enable interrupts. You would usually do a read-modify-write operation, that is, read a word, mask it and then write it back. You may also need to make sure this operation won't be interrupted, i.e. that it's atomic, to ensure data consistency, so your final code might look like this:
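#include <stdint.h>
// __LDREXW and __STREXW are CMSIS intrinsics, provided by the device's CMSIS header

#define DEVICE_BASE_ADDR    ((volatile uint32_t*)0x2007C000)
#define ENABLE_INT_MASK     (0x01)

void enable_int(void)
{
    uint32_t i;
    do {
        // read the word and tag the address for exclusive access
        i = __LDREXW(DEVICE_BASE_ADDR);
        i |= ENABLE_INT_MASK;           // modify: set the enable bit
        // exclusive write, returns non-zero if the tag was lost, so retry
    } while (__STREXW(i, DEVICE_BASE_ADDR));
}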
Note that ldrex and strex are the newer exclusive access instructions. Starting with ARMv6 they replaced the SWP instruction for several reasons, mainly that they don't lock the bus. Anyway, using Bit-Banding it's just a simple matter of writing to the alias address of the bit:
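#define DEVICE_BASE_ALIAS   ((volatile uint32_t*)(0x22000000UL + (0x2007C000UL - 0x20000000UL) * 32))

void enable_int(void) { *DEVICE_BASE_ALIAS = 1; }   // sets bit 0 of the word at 0x2007C000
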
Next we will talk about the LPC1768, a Cortex-M3 based MCU.

NXP LPC1768
The LPC1768, manufactured by NXP (formerly Philips Semiconductors), is an MCU based on the Cortex-M3. It has just about every peripheral you could think of: UARTs, USB, SPI, I2C, ADC/DAC, PWM, CAN, Ethernet... well, you get the picture. It also has an MPU, as mentioned before, which can be used to define memory regions with access attributes like (rwx).

The LPC1768 has 512 KB of on-chip flash and 64 KB of SRAM. 32 KB of the SRAM is reserved for USB and Ethernet; however, it can still be used as general purpose RAM when either of those peripherals is not used.

The LPC1768 has an 8-channel DMA controller. Each channel can be configured independently to handle a transfer from memory to memory, memory to peripheral, peripheral to memory or peripheral to peripheral. The DMA allows the CPU to be free for number crunching while it handles transfers by taking control of the bus, saving the cycles required if the CPU were to handle the transfer itself.

The DMA controller has a separate IRQ line that, when configured to do so, interrupts when a transfer is complete. It can handle either a single transfer or multiple transfers chained using a linked list. It also supports different source and destination transfer widths by packing and unpacking data.
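The linked list idea can be sketched on a PC with a simplified descriptor. This is not the LPC1768 register layout: the real controller packs the transfer size, widths and flags into a control word, while this made-up struct just keeps a byte count, and memcpy stands in for the hardware doing the transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical linked list item: one entry per transfer. */
typedef struct dma_lli {
    const uint8_t        *src;     /* source address                   */
    uint8_t              *dst;     /* destination address              */
    const struct dma_lli *next;    /* next transfer, NULL ends chain   */
    size_t                length;  /* bytes to move (simplified field) */
} dma_lli_t;

/* Walks the chain the way the controller would, performing each
 * transfer in order. Returns the total number of bytes moved. */
static size_t dma_run_chain(const dma_lli_t *lli)
{
    size_t total = 0;
    while (lli != NULL) {
        assert(lli->src != NULL && lli->dst != NULL);
        memcpy(lli->dst, lli->src, lli->length);
        total += lli->length;
        lli = lli->next;
    }
    return total;
}
```

Chaining two items like this lets one setup gather two separate buffers into a single destination without the CPU stepping in between the transfers.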

There are many prototyping boards based on NXP's microcontrollers, one of those is the mbed board.

The mbed is a prototyping board for the NXP LPC1768 MCU. It is a complete development platform for rapid prototyping that provides high-level C++ drivers for the peripherals, protocol stacks, code samples and a cloud-based compiler.

The mbed makes programming the MCU really easy. It has a small 2 MB flash with a FAT file system. To program the LPC1768 you mount the flash, copy the binary image and then reset the mbed. The binary image is then copied to the MCU flash and started. That's basically all you need to do! You'll definitely appreciate this if you tried JTAG with openocd before.

This provides an easy way into ARM hacking, although I must say that I don't prefer C++ and I definitely hate the online compiler altogether! Therefore I've taken a different approach, leaving the mbed libraries and compiler and using NXP's driver library, CMSIS and a GNU based toolchain. I highly recommend you do so too, this way you'll learn more about your processor.

That's all for today. In part 2 we'll talk about more fun stuff like CodeSourcery, NXP driver library, CMSIS, SPI and other things... Also we will make a couple of simple applications to put everything together, see you soon :)


  1. This is cool stuff :)

    You should add "about me" to your blog.

  2. Great write up. Am leaning heavily towards getting that NXP mbed soon for my Sparkfun AVC robot.

  3. shimniok,

    I'm glad you like it. I highly recommend mbed, it's well worth it if you ask me.

    Please check the blog again in a few days for part 2 and thanks for the feedback.

  4. Great write up....Keep publishing articles like this...!!

  5. what language and version is the code example above?
    like eg;
    .long __cs3_stack /* Top of Stack */,
    is it the Kiel MDK?

  6. "The Cortex-M3 has an 8 channel DMA " is incorrect. This is just a peripheral on your NXP chip

    1. That's right, should say the "LPC1768" Thanks!