Thursday, December 5, 2013

OpenMV Update: 25FPS Face Detection, USB Support and More

So I've been working on OpenMV for the past week and this is what I have so far:

USB Support:
The camera now supports USB OTG full speed, I've also written a small userspace tool with libusb/SDL to interface with the camera and view the frame buffer, this makes it really easy to debug the image processing code, and it also lets you change the sensor's registers while watching the results in realtime.

I've mentioned building the STM32F4xx libraries in a previous post, you can checkout the repo linked there if you want to build the libraries.

Face Detection:
Many were very interested in this feature, well I've managed to get the viola-jones face detector working on the camera, and it's working fine.. For those of you familiar with the detector, the haar cascade is exported as a C header which is linked to the binary and loaded into the CCM (Core Coupled Memory) a 64KB memory block connected directly to the core. Only one integral image is pre-computed and allocated on the heap, the other one, the squared integral image, which is used for computing the standard deviation, can't fit into memory for the QQVGA resoultion, and so, instead, the standard deviation is computed on the fly for every detection scale using some SIMD instructions to speed it up a bit.

The memory can hold up to 23 stages, however, using only 12 stages and with a relatively large scale step, the detector is working great, with occasional false detections of course, more stages can be used if greater accuracy is required, but not without some performance penalty...As for the numbers, the camera can process 7-8FPS QQVGA, and for QQCIF (88x72) I get 25FPS

Here's a video of the face detector in action running at 25FPS:

Here's another video of color tracking running at 30FPS:

Other Updates:
I've been doing some general fixes here and there, mainly to improve the image quality, in addition to that, I've compiled all the libraries and code with optimizations (-O2) and I've seen great improvements in speed, there's also a new pixel format, grayscale, which is basically just the Y channel extracted from the YUV422 to avoid doing that every time a grayscale image is required.

The QCIF/QQCIF are working now (the sensor can output 60FPS when using QQCIF ) and through some other register probing, I've removed a few useless registers and discovered that the sensor has digital zoom, cool!

There's also simple motion detection code in progress, it's based on frame differencing and using the first frame as the background, more work will be done here as soon as I get around to it. And I will probably try template matching next.

I've also just finished a new hardware revision, it has a tiny uSD socket, which I imagine can be used for anything from storing haar cascades, snapshots or video to buffering larger frames, the new revision is also a bit smaller. 
Read more ...

Monday, December 2, 2013

Using The CCM Memory on the STM32

The STM32 series have non-contiguous memories divided into blocks, for example the STM32F4, has 2 (contiguous) blocks of SRAM connected to the bus matrix with different interconnects, and a Core Coupled Memory (CCM) block which is connected directly to the core.

This tight coupling of the CCM memory to the core, leads to zero wait states, in other words, the core has exclusive access to this memory block, so for example, while other bus masters are using the main SRAM the core can access the CCM. Therefore, the CCM block is commonly used for the stack and other critical OS data, this partitioning, allows the core to continue executing code while for example, a DMA transfer takes place. However, the CCM could also be used as an extra memory block, doing so is easy, and there are a few examples out there that show how, simply defining a section in the linker script will do:
.ccm : {
  . = ALIGN(4);
  _sccm = .;
  . = ALIGN(4);      
  _eccm = .;

And a section attribute is used to allocate memory into that section :
const int8_t my_array[13] __attribute__ ((section (".ccm")))= {....};

However, what if you want to load initialized data into that section ? some look-up tables for example?  using that section is not enough, see, the linker script makes the distinction between the Load Memory Address  (LMA) where data is stored initially, and the Virtual Memory Address (VMA) where the data should be loaded at runtime, if the LMA is not specified explicitly, it becomes the same as VMA.

You can see here that GDB loads the .ccm data into the CCM block (LMA=VMA=0x10000000) directly, while all other sections are loaded into the flash region (0x8xxxxxx):

Loading section .ccm, size 0x4ebc lma 0x10000000
Loading section .isr_vector, size 0x188 lma 0x8000000
Loading section .text, size 0x9744 lma 0x8000188
Loading section .ARM, size 0x8 lma 0x80098cc
Loading section .init_array, size 0x8 lma 0x80098d4
Loading section .fini_array, size 0x4 lma 0x80098dc
Loading section .data, size 0xa30 lma 0x80098e0
Loading section .jcr, size 0x4 lma 0x800a310

While this may sound right, it's not, if GDB loads the .ccm section is loaded into SRAM directly, it will disappear after a power cycle! So instead, we want the LMA to be somewhere in the FLASH region (0x8xxxxxxx) and the VMA to be (0x10000000):
_eidata = (_sidata + SIZEOF(.data) + SIZEOF(.jcr));
.ccm : AT ( _sidata + SIZEOF(.data) + SIZEOF(.jcr))
  . = ALIGN(4);
  _sccm = .;
  . = ALIGN(4);      
  _eccm = .;

Note the .jcr is included in by some startup code for something related to Java, without adding the SIZEOF(.jcr) the .ccm will overlap that section, also note the _eidata symbol which will be referenced later in code. Now, when you try to load the elf, GDB prints:

Loading section .isr_vector, size 0x188 lma 0x8000000
Loading section .text, size 0x9794 lma 0x8000188
Loading section .ARM, size 0x8 lma 0x800991c
Loading section .init_array, size 0x8 lma 0x8009924
Loading section .fini_array, size 0x4 lma 0x800992c
Loading section .data, size 0xa30 lma 0x8009930
Loading section .jcr, size 0x4 lma 0x800a360
Loading section .ccm, size 0x4ebc lma 0x800a364

Great, now the .ccm data is loaded into the FLASH region, we just need something to load it from FLASH to CCM in runtime, if you look at the startup code, there's an assembly function that copies initialized data from the flash to where it should be loaded in SRAM (the VMA), you need to do the same for the .ccm data, by either modifying the startup code, or perferrably, copying the data with a C function, so here it is:
void load_ccm_section () __attribute__ ((section (".init")));
void load_ccm_section (){
    extern char _eidata, _sccm, _eccm;

    char *src = &_eidata;
    char *dst = &_sccm;
    while (dst < &_eccm) {
        *dst++ = *src++;
Note that the function is placed into the .init section so it executes before main. Now in runtime, this function will load the data from FLASH into SRAM using the pointer defined in the linker script.
Read more ...