Saturday, February 8, 2014

OpenMV Update: MicroPython, More I/O, uSD and Lots of Other Things!

Time for another update. Sorry this took me so long; I've been very busy working on OpenMV, but the good news is that I have lots of new features implemented! There's a new (smaller :D) hardware revision with more I/O (USART, I2C and SPI) and a uSD socket, MicroPython support, and an IDE for the camera. And for those of you who have been wondering, I'm working with Michael Shimniok from Bot-Thoughts on a Kickstarter campaign for OpenMV, so soon, hopefully, you will be able to get one for a very reasonable price :) Stay tuned!

Okay, so on the software side, you've probably heard of the MicroPython project; if not, make sure to check it out. Basically, MicroPython is a very efficient, lightweight Python VM for microcontrollers. The plan was to script the camera with Lua/eLua, but MP already has some really neat features implemented, so, long story short, I've decided to script the camera with MP. After lots of work, I managed to get MP running on OpenMV and wrote some MP bindings to export the OpenMV subsystems to Python. Eventually the camera will be completely controlled with Python.

Here's how it works so far: on reset, OpenMV runs a default Python script with the old serial camera interface (receive commands from the serial port, process them and return the result). It also shows up as a small USB storage device where you can copy your own Python script(s); reset, and it runs those instead of the default script. In addition to that, you can also "talk" to the camera directly using a Python shell over the COM port while watching the framebuffer in realtime :)

I've also combined all those nice features into a single "IDE" for convenience, written with Python, PyGTK and PyUSB. The IDE has a Python shell, a framebuffer viewer, and it can run scripts or save them to flash:

Moving on to the hardware, the new revision is 1.0x1.30 inches. It has a tiny uSD socket (which will be available to Python user code), USART, SPI and I2C broken out on the main 2.54mm header, and a separate 2mm SWD debugging header. There's also a switch, which will be used for boot or reset.
Here are some pics of the 3rd (2nd?) revision:

Compared to the old one:

That's it for now, please let me know if you have any comments :) thanks!

Saturday, January 18, 2014

Overclocking the STM32F4

I've been doing some tests with the STM32F407 to see how fast it can go. STMicro has released an almost identical part that runs at 180MHz. Is it a marketing thing? Will they release a 200MHz version in a few months? Who cares. Anyway, I was able to run the STM32F407 at 240MHz without any "obvious" problems. In addition to overclocking, the code listed below lets you set some lower frequencies, which could be useful for frequency scaling.

On the STM32F4, the clocks are controlled via the RCC (Reset and Clock Control) block. It's easy enough to change the frequency; the tricky part, however, is understanding all the different dividers and getting them right. According to the datasheet, the following are the maximum clock frequencies for the 48MHz PLL output and the peripheral buses:
PLL48CK: 48MHz (feeds the USB OTG FS and RNG)
APB1 clock: 42MHz
APB2 clock: 84MHz
Based on that, for frequencies up to 168MHz I used multiples of 42MHz to get the maximum possible frequencies for the peripheral buses (APB1 and APB2). This doesn't always result in the maximum USB clock frequency (48MHz), which is required to get the full 12Mbps. For frequencies higher than 168MHz, I used 200MHz and 240MHz, mainly because they are convenient for my application; you might want to use different frequencies based on yours:
enum sysclk_freq {
    SYSCLK_42_MHZ = 0,
    SYSCLK_84_MHZ,
    SYSCLK_168_MHZ,
    SYSCLK_200_MHZ,
    SYSCLK_240_MHZ,
};

void rcc_set_frequency(enum sysclk_freq freq)
{
    int freqs[]   = {42, 84, 168, 200, 240};

    /* USB freqs: 42MHz, 42MHz, 48MHz, 40MHz, 48MHz */
    int pll_div[] = {2, 4, 7, 10, 10};

    /* PLL_VCO = (HSE_VALUE / PLL_M) * PLL_N */
    /* SYSCLK = PLL_VCO / PLL_P */
    /* USB OTG FS, SDIO and RNG Clock = PLL_VCO / PLL_Q */
    uint32_t PLL_P = 2;
    uint32_t PLL_N = freqs[freq] * 2;
    uint32_t PLL_M = (HSE_VALUE/1000000);
    uint32_t PLL_Q = pll_div[freq];

    /* Reset the RCC clock configuration to its defaults */
    RCC_DeInit();

    /* Enable HSE oscillator */
    RCC_HSEConfig(RCC_HSE_ON);

    if (RCC_WaitForHSEStartUp() == ERROR) {
        return;
    }

    /* Configure PLL clock M, N, P, and Q dividers */
    RCC_PLLConfig(RCC_PLLSource_HSE, PLL_M, PLL_N, PLL_P, PLL_Q);

    /* Enable PLL clock */
    RCC_PLLCmd(ENABLE);

    /* Wait until PLL clock is stable */
    while ((RCC->CR & RCC_CR_PLLRDY) == 0);

    /* Set PLL_CLK as system clock source SYSCLK */
    RCC_SYSCLKConfig(RCC_SYSCLKSource_PLLCLK);

    /* Set AHB clock divider */
    RCC_HCLKConfig(RCC_SYSCLK_Div1);

    /* Set APBx clock dividers */
    switch (freq) {
        /* Max freq APB1: 42MHz APB2: 84MHz */
        case SYSCLK_42_MHZ:
            RCC_PCLK1Config(RCC_HCLK_Div1); /* 42MHz */
            RCC_PCLK2Config(RCC_HCLK_Div1); /* 42MHz */
            break;
        case SYSCLK_84_MHZ:
            RCC_PCLK1Config(RCC_HCLK_Div2); /* 42MHz */
            RCC_PCLK2Config(RCC_HCLK_Div1); /* 84MHz */
            break;
        case SYSCLK_168_MHZ:
            RCC_PCLK1Config(RCC_HCLK_Div4); /* 42MHz */
            RCC_PCLK2Config(RCC_HCLK_Div2); /* 84MHz */
            break;
        case SYSCLK_200_MHZ:
            RCC_PCLK1Config(RCC_HCLK_Div4); /* 50MHz */
            RCC_PCLK2Config(RCC_HCLK_Div2); /* 100MHz */
            break;
        case SYSCLK_240_MHZ:
            RCC_PCLK1Config(RCC_HCLK_Div4); /* 60MHz */
            RCC_PCLK2Config(RCC_HCLK_Div2); /* 120MHz */
            break;
    }

    /* Update SystemCoreClock variable */
    SystemCoreClockUpdate();
}

Note: after calling this function, you will probably need to re-enable all the clocks ...
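
If you want to sanity-check a set of dividers before flashing anything, the PLL formulas from the comments above can be evaluated on a PC. This is just a rough sketch: `struct pll_out`, `pll_clocks` and `pll_for_target` are names made up for this example (they are not part of the ST libraries), and everything is in kHz to avoid fractions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side helper, not part of any ST library. */
struct pll_out {
    uint32_t vco_khz;    /* PLL_VCO = (HSE / PLL_M) * PLL_N     */
    uint32_t sysclk_khz; /* SYSCLK  = PLL_VCO / PLL_P           */
    uint32_t usb_khz;    /* USB/SDIO/RNG clock = PLL_VCO / PLL_Q */
};

/* Evaluate the three PLL formulas for a given crystal and dividers. */
static struct pll_out pll_clocks(uint32_t hse_khz, uint32_t m, uint32_t n,
                                 uint32_t p, uint32_t q)
{
    struct pll_out out;

    assert(m != 0 && p != 0 && q != 0);
    out.vco_khz    = (hse_khz / m) * n;
    out.sysclk_khz = out.vco_khz / p;
    out.usb_khz    = out.vco_khz / q;
    return out;
}

/* Same divider choice as rcc_set_frequency():
 * M = crystal frequency in MHz, N = 2 * target, P = 2. */
static struct pll_out pll_for_target(uint32_t hse_khz, uint32_t target_mhz,
                                     uint32_t q)
{
    return pll_clocks(hse_khz, hse_khz / 1000, target_mhz * 2, 2, q);
}
```

For example, assuming an 8MHz crystal, `pll_for_target(8000, 168, 7)` gives a 336MHz VCO, a 168MHz SYSCLK and exactly 48MHz for USB, while `pll_for_target(8000, 200, 10)` gives a 400MHz VCO and only 40MHz on the USB clock.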

Note on Overclocking:
There's no telling if overclocking will always work; it might fail at some temperature, on some critical path, or just randomly. Still, it is nice to know that you have the option to run (burn?) the micro at higher speeds if needed... One thing I can confirm, though, is that it doesn't overheat too much. Obviously, touching the micro with your fingertip is not an accurate way to determine that, so I took this a step further and ran a series of tests using the internal core temperature sensor.

I ran the micro at different frequencies while collecting samples from the internal temperature sensor, which is connected to one of the ADC's channels. For each frequency I collected a number of samples, then plotted the average against the frequency, and here's the result:
One important thing to note here: the internal temperature sensor readings vary from chip to chip by up to 45 degrees, which means those are NOT absolute temperature values and should only be used to detect temperature variations. Now, the conclusion: it looks like overclocking the STM32F4 from 168MHz to 240MHz increases the core temperature by ~4 degrees.
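
For reference, here is roughly how averaged raw readings could be turned into degrees. It is a sketch under a few assumptions: a 12-bit ADC result, the typical datasheet values for the STM32F40x sensor (0.76V at 25 degrees, 2.5mV per degree), and helper names (`adc_average`, `adc_to_celsius`) that are made up for this example. It contains no ADC setup code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Typical datasheet values; the real ones vary from chip to chip. */
#define TEMP_V25        0.76f    /* sensor voltage at 25 degrees C, in volts */
#define TEMP_AVG_SLOPE  0.0025f  /* volts per degree C */

/* Average a batch of raw 12-bit ADC samples. */
static float adc_average(const uint16_t *samples, size_t count)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < count; i++) {
        sum += samples[i];
    }
    return count ? (float)sum / (float)count : 0.0f;
}

/* Convert an (averaged) raw reading to degrees C.
 * vref is the ADC reference voltage in volts. */
static float adc_to_celsius(float raw, float vref)
{
    assert(vref > 0.0f);
    float vsense = raw * vref / 4095.0f;
    return (vsense - TEMP_V25) / TEMP_AVG_SLOPE + 25.0f;
}
```

With these numbers, a 4 degree rise moves the sensor voltage by only about 10mV, which is roughly 12 ADC counts at a 3.3V reference, so averaging a number of samples per frequency is a sensible way to see the trend.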

Thursday, December 5, 2013

OpenMV Update: 25FPS Face Detection, USB Support and More

So I've been working on OpenMV for the past week and this is what I have so far:

USB Support:
The camera now supports USB OTG full speed. I've also written a small userspace tool with libusb/SDL to interface with the camera and view the frame buffer. This makes it really easy to debug the image processing code, and it also lets you change the sensor's registers while watching the results in realtime.

I've mentioned building the STM32F4xx libraries in a previous post; you can check out the repo linked there if you want to build the libraries.

Face Detection:
Many were very interested in this feature. Well, I've managed to get the Viola-Jones face detector working on the camera, and it's working fine. For those of you familiar with the detector, the Haar cascade is exported as a C header, which is linked into the binary and loaded into the CCM (Core Coupled Memory), a 64KB memory block connected directly to the core. Only one integral image is pre-computed and allocated on the heap. The other one, the squared integral image, which is used for computing the standard deviation, can't fit into memory at QQVGA resolution, so instead the standard deviation is computed on the fly for every detection scale, using some SIMD instructions to speed it up a bit.

The memory can hold up to 23 stages. However, using only 12 stages and a relatively large scale step, the detector works great, with occasional false detections of course. More stages can be used if greater accuracy is required, but not without some performance penalty... As for the numbers, the camera can process 7-8FPS at QQVGA, and at QQCIF (88x72) I get 25FPS.
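
To make the integral image idea a bit more concrete, here is a minimal sketch in plain C. It is not the OpenMV code: the function names are made up for this example, there is no SIMD, and the sum of squares is simply computed from the pixels of the window instead of from a second (squared) integral image. It returns the variance; the standard deviation is its square root.

```c
#include <assert.h>
#include <stdint.h>

/* Build a summed-area table: ii[y*w + x] holds the sum of all pixels
 * at or above-left of (x, y), inclusive. */
static void integral_image(const uint8_t *img, uint32_t *ii, int w, int h)
{
    assert(w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        uint32_t row = 0; /* running sum of the current row */
        for (int x = 0; x < w; x++) {
            row += img[y * w + x];
            ii[y * w + x] = row + (y > 0 ? ii[(y - 1) * w + x] : 0);
        }
    }
}

/* Sum of the rw x rh window with top-left corner (x, y): four lookups. */
static uint32_t window_sum(const uint32_t *ii, int w,
                           int x, int y, int rw, int rh)
{
    int x1 = x + rw - 1, y1 = y + rh - 1;
    uint32_t a = (x > 0 && y > 0) ? ii[(y - 1) * w + (x - 1)] : 0;
    uint32_t b = (y > 0) ? ii[(y - 1) * w + x1] : 0;
    uint32_t c = (x > 0) ? ii[y1 * w + (x - 1)] : 0;
    return ii[y1 * w + x1] - b - c + a;
}

/* Variance of a window: E[p^2] - mean^2, with the squares summed
 * directly from the pixels (no squared integral image needed). */
static float window_variance(const uint8_t *img, const uint32_t *ii, int w,
                             int x, int y, int rw, int rh)
{
    uint32_t sq = 0;
    for (int j = y; j < y + rh; j++) {
        for (int i = x; i < x + rw; i++) {
            sq += (uint32_t)img[j * w + i] * img[j * w + i];
        }
    }
    float n = (float)(rw * rh);
    float mean = (float)window_sum(ii, w, x, y, rw, rh) / n;
    return (float)sq / n - mean * mean;
}
```

If 32-bit entries are used, one table at QQVGA (160x120) takes 160 * 120 * 4 = 76,800 bytes, which gives an idea of why a second one is hard to fit.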

Here's a video of the face detector in action running at 25FPS:

Here's another video of color tracking running at 30FPS:

Other Updates:
I've been doing some general fixes here and there, mainly to improve the image quality. In addition to that, I've compiled all the libraries and code with optimizations (-O2) and have seen great improvements in speed. There's also a new pixel format, grayscale, which is basically just the Y channel extracted from the YUV422 data, stored so that the extraction doesn't have to be done every time a grayscale image is required.
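
Extracting the Y channel is about as simple as it sounds. Here's a small sketch, assuming the buffer is packed in YUYV order (Y0 U0 Y1 V0 ...); the actual byte order depends on how the sensor is configured, and `yuv422_to_gray` is just a name picked for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy the luma (Y) bytes out of a packed YUV422 buffer.
 * Assumes YUYV order: 2 bytes per pixel with Y at even offsets.
 * For UYVY order, read from offset 2*i + 1 instead. */
static void yuv422_to_gray(const uint8_t *yuv, uint8_t *gray, size_t pixels)
{
    assert(yuv != NULL && gray != NULL);
    for (size_t i = 0; i < pixels; i++) {
        gray[i] = yuv[2 * i];
    }
}
```

The grayscale buffer is half the size of the YUV422 one, since each pixel drops from two bytes to one.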

The QCIF/QQCIF modes are working now (the sensor can output 60FPS when using QQCIF), and through some other register probing I've removed a few useless registers and discovered that the sensor has digital zoom, cool!

There's also simple motion detection code in progress. It's based on frame differencing, using the first frame as the background. More work will be done here as soon as I get around to it, and I will probably try template matching next.
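
The basic idea of frame differencing against a fixed background can be sketched in a few lines of C. This is only an illustration of the technique on grayscale frames, not the code running on the camera; the function names and both thresholds are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Count the pixels that differ from the background frame by more
 * than `threshold` gray levels. */
static size_t count_changed(const uint8_t *bg, const uint8_t *frame,
                            size_t pixels, uint8_t threshold)
{
    size_t changed = 0;

    assert(bg != NULL && frame != NULL);
    for (size_t i = 0; i < pixels; i++) {
        int diff = abs((int)frame[i] - (int)bg[i]);
        if (diff > threshold) {
            changed++;
        }
    }
    return changed;
}

/* Report motion when more than `min_changed` pixels have changed. */
static int motion_detected(const uint8_t *bg, const uint8_t *frame,
                           size_t pixels, uint8_t threshold,
                           size_t min_changed)
{
    return count_changed(bg, frame, pixels, threshold) > min_changed;
}
```

The per-pixel threshold filters out sensor noise, and the pixel count keeps a handful of noisy pixels from triggering a detection.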

I've also just finished a new hardware revision. It has a tiny uSD socket, which I imagine can be used for anything from storing Haar cascades, snapshots or video to buffering larger frames. The new revision is also a bit smaller.

Monday, December 2, 2013

Using The CCM Memory on the STM32

The STM32 series have non-contiguous memories divided into blocks. For example, the STM32F4 has 2 (contiguous) blocks of SRAM connected to the bus matrix with different interconnects, and a Core Coupled Memory (CCM) block which is connected directly to the core.

This tight coupling of the CCM to the core leads to zero wait states. In other words, the core has exclusive access to this memory block, so, for example, while other bus masters are using the main SRAM, the core can still access the CCM. Therefore, the CCM block is commonly used for the stack and other critical OS data; this partitioning allows the core to continue executing code while, for example, a DMA transfer takes place. However, the CCM could also be used as an extra memory block. Doing so is easy, and there are a few examples out there that show how; simply defining a section in the linker script will do:
.ccm : {
  . = ALIGN(4);
  _sccm = .;
  *(.ccm)
  . = ALIGN(4);
  _eccm = .;
} >CCMRAM   /* CCMRAM: assumed name of the CCM region in your MEMORY block */

And a section attribute is used to place a variable in that section:
const int8_t my_array[13] __attribute__ ((section (".ccm")))= {....};

However, what if you want to load initialized data into that section, some look-up tables for example? Using that section alone is not enough. See, the linker script makes the distinction between the Load Memory Address (LMA), where data is stored initially, and the Virtual Memory Address (VMA), where the data should be at runtime. If the LMA is not specified explicitly, it becomes the same as the VMA.

You can see here that GDB loads the .ccm data into the CCM block (LMA=VMA=0x10000000) directly, while all other sections are loaded into the flash region (0x8xxxxxxx):

Loading section .ccm, size 0x4ebc lma 0x10000000
Loading section .isr_vector, size 0x188 lma 0x8000000
Loading section .text, size 0x9744 lma 0x8000188
Loading section .ARM, size 0x8 lma 0x80098cc
Loading section .init_array, size 0x8 lma 0x80098d4
Loading section .fini_array, size 0x4 lma 0x80098dc
Loading section .data, size 0xa30 lma 0x80098e0
Loading section .jcr, size 0x4 lma 0x800a310

While this may look right, it's not: if GDB loads the .ccm section into RAM directly, the data will disappear after a power cycle! So instead, we want the LMA to be somewhere in the FLASH region (0x8xxxxxxx) and the VMA to be in the CCM (0x10000000):
_eidata = (_sidata + SIZEOF(.data) + SIZEOF(.jcr));
.ccm : AT ( _sidata + SIZEOF(.data) + SIZEOF(.jcr))
{
  . = ALIGN(4);
  _sccm = .;
  *(.ccm)
  . = ALIGN(4);
  _eccm = .;
} >CCMRAM   /* CCMRAM: assumed name of the CCM region in your MEMORY block */

Note that the .jcr section is included by some startup code for something related to Java; without adding SIZEOF(.jcr), the .ccm section will overlap that section. Also note the _eidata symbol, which will be referenced later in code. Now, when you load the ELF, GDB prints:

Loading section .isr_vector, size 0x188 lma 0x8000000
Loading section .text, size 0x9794 lma 0x8000188
Loading section .ARM, size 0x8 lma 0x800991c
Loading section .init_array, size 0x8 lma 0x8009924
Loading section .fini_array, size 0x4 lma 0x800992c
Loading section .data, size 0xa30 lma 0x8009930
Loading section .jcr, size 0x4 lma 0x800a360
Loading section .ccm, size 0x4ebc lma 0x800a364

Great, now the .ccm data is loaded into the FLASH region. We just need something to copy it from FLASH to the CCM at runtime. If you look at the startup code, there's an assembly function that copies initialized data from the flash to where it should be in SRAM (the VMA). You need to do the same for the .ccm data, by either modifying the startup code or, preferably, copying the data with a C function, so here it is:
void load_ccm_section () __attribute__ ((section (".init")));
void load_ccm_section (){
    extern char _eidata, _sccm, _eccm;

    char *src = &_eidata;
    char *dst = &_sccm;
    while (dst < &_eccm) {
        *dst++ = *src++;
    }
}
Note that the function is placed into the .init section so it executes before main. Now, at runtime, this function will copy the data from FLASH into the CCM using the symbols defined in the linker script.
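
The same copy can be written as a small generic routine, which also makes it easy to try out on a PC. This is only a sketch: `copy_section` is a made-up helper, and it assumes memcpy is usable at the point in startup where you call it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy one initialized section from its load
 * address (LMA, in flash) to its run address range
 * [vma_start, vma_end). */
static void copy_section(const char *lma, char *vma_start,
                         const char *vma_end)
{
    assert(lma != NULL && vma_start != NULL);
    assert(vma_end >= vma_start);
    memcpy(vma_start, lma, (size_t)(vma_end - vma_start));
}
```

On the target, the call would be `copy_section(&_eidata, &_sccm, &_eccm);` with the three symbols declared as `extern char`, as in the function above.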

Thursday, November 28, 2013

STM32F4xx Libraries

I wrote a tutorial before on how to set up a toolchain and build the STM32F4xx standard peripheral drivers into one convenient library. Since then, a few people have asked me about the library, so to make life easier I downloaded the latest StdPeriph/CMSIS, in addition to a few other libraries that I might need later, and shared everything in one repository, which currently has the following libraries:
Cortex-M  CMSIS      V3.20
STM32F4xx CMSIS      V1.3.0
STM32F4xx StdPeriph  V1.3.0
STM32_USB_Device     V1.1.0
STM32_USB_OTG        V2.1.0
In addition to those, the repository also includes a simple USB device library (stm32f4xx/USB_Generic), which abstracts all the horrible details of the USB libraries into a very simple generic USB device implementation with just two Bulk endpoints...

To use this library, you just pass a struct with two callback functions, and the library will call those functions whenever data is received or requested. It's as simple as that. Note that it's configured for OTG FS only, but it could still be useful if you just want to get USB working and don't have time to go through all the examples.

Finally, the repository also includes some examples: a Blinky, a USB_Generic example and some user-space code with libusb.

Building The Libraries:
To build the libraries and examples, just type make in the top directory. The top Makefile will pass along all the flags and variables. Here are some options you can pass on the command line:

make DEBUG=0
This will build everything with -O2 and no debugging symbols (not recommended)

make DEBUG=1 CFLAGS="-DOSC=xx"
This will build the library with debugging enabled, no optimization and using the given crystal frequency in MHz (for example -DOSC=16)
