Two brains are better than one…

Tim Fernandez-Hart

These past few weeks I have had the opportunity to use an Oak-D-lite camera with an Ultra96. This is quite a distinctive setup, with processing available on both the host (Ultra96) and the camera itself. A dream team?


The “Oak-D” camera line-up was originally released as part of a Kickstarter campaign by Luxonis. The Oak-D-lite is one of their smallest models and is designed with edge computing and robotics in mind. But don’t be fooled: it still packs quite a punch into its diminutive frame.

Each camera contains a greyscale stereo pair and a central colour camera. The stereo pair can be used for depth perception and runs at a slightly lower resolution in the lite version. The colour camera has a resolution of 4K at 60 Hz.

The real advantage of using these cameras, however, is that they contain a Myriad-X VPU (Vision Processing Unit). The Myriad-X is part of Intel’s Movidius neural compute product collection. Movidius compute is a semi-programmable architecture designed for high-throughput vision and machine learning applications. This means we can run neural networks and OpenCV functions directly on the camera itself. Dedicated hardware such as this also helps keep power requirements down. A fancy webcam this is not! Luxonis are also pushing the limits of affordability, with the Oak-D-lite currently coming in at $149 + postage.

Oak-D-lite camera specifications:

| Camera Specs | Colour camera | Stereo pair |
|---|---|---|
| Sensor | IMX214 | OV7251 |
| DFOV / HFOV / VFOV | 81° / 69° / 54° | 86° / 73° / 58° |
| Resolution | 13MP (4208×3120) | 480P (640×480) |
| Focus | AF: 8cm – ∞ OR FF: 50cm – ∞ | Fixed-Focus 6.5cm – ∞ |
| Max Framerate | 60 FPS | 200 FPS |
| F-number | 2.2 ± 5% | 2.2 |
| Lens size | 1/3.1 inch | 1/7.5 inch |
| Effective Focal Length | 3.37mm | 1.3mm |
| Distortion | < 1% | < 1.5% |
| Pixel size | 1.12µm x 1.12µm | 3µm x 3µm |


Given all that onboard computing power, Luxonis have worked hard to make it as accessible as possible. They offer both a lower-level API and a higher-level SDK to access the device, all in Python. An output is constructed by chaining nodes together to create an image processing pipeline.

Output can take the form of raw images, rectified images or colour images. Nodes can also feed neural network data out of the camera, such as object locations and bounding box sizes. Communication to and from the camera is handled by XLinkIn and XLinkOut nodes.
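To make the idea of chaining nodes more concrete, here is a minimal sketch of a disparity pipeline. It assumes the 2.x release of the depthai Python package (node and method names differ in other releases) and a camera attached over USB; the function names build_pipeline and read_disparity_frames are my own, not part of the Luxonis API.

```python
def build_pipeline():
    """Chain two mono cameras into a StereoDepth node and out via XLinkOut."""
    # Imported here so the file can be loaded on a machine without the SDK.
    import depthai as dai  # assumption: depthai 2.x

    pipeline = dai.Pipeline()

    # Create the nodes: left and right greyscale cameras, stereo matcher, output.
    mono_left = pipeline.create(dai.node.MonoCamera)
    mono_right = pipeline.create(dai.node.MonoCamera)
    stereo = pipeline.create(dai.node.StereoDepth)
    xout = pipeline.create(dai.node.XLinkOut)

    mono_left.setBoardSocket(dai.CameraBoardSocket.LEFT)
    mono_right.setBoardSocket(dai.CameraBoardSocket.RIGHT)
    for cam in (mono_left, mono_right):
        cam.setResolution(dai.MonoCameraProperties.SensorResolution.THE_480_P)

    # Link the nodes together: cameras -> stereo -> host.
    mono_left.out.link(stereo.left)
    mono_right.out.link(stereo.right)
    xout.setStreamName("disparity")
    stereo.disparity.link(xout.input)
    return pipeline


def read_disparity_frames(count=10):
    """Upload the pipeline to the camera and collect `count` disparity frames."""
    import depthai as dai

    frames = []
    with dai.Device(build_pipeline()) as device:
        queue = device.getOutputQueue(name="disparity", maxSize=4, blocking=False)
        while len(frames) < count:
            frames.append(queue.get().getFrame())  # NumPy array per frame
    return frames
```

Calling read_disparity_frames() on the host should return a list of disparity images computed on the camera, so the host only has to consume the results.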

The final piece of our setup is an Ultra96-v2, a small Linaro 96Boards-style board with a quad-core Arm Cortex-A53 and plenty of useful peripherals (Bluetooth, Wi-Fi, USB, DisplayPort etc.). The Ultra96 also has Field Programmable Gate Array (FPGA) logic on board. The FPGA fabric can be used to accelerate many kinds of processing, often at much lower power than running the same work on the CPU.

First steps in depth perception
We’ll be using the PYNQ environment to get up and running, as it generally offers an easy way to test new devices and algorithms on the Ultra96. PYNQ is built around Jupyter notebooks, which are served from the board and opened in a browser on your computer over the USB connection.

Binocular vision
It may not have escaped your attention that most sighted animals have two eyes, the minimum number required to judge depth. This is simple triangulation, using the fact that the distance between the cameras (or eyes) is known. A given feature in a pair of stereo images will appear further apart when viewed close-up than when viewed from a distance.
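As a rough illustration of this triangulation, the sketch below converts a disparity in pixels into a distance using the pinhole relation depth = focal length × baseline / disparity. The focal length in pixels is estimated from the stereo pair's 640-pixel width and 73° HFOV in the specifications table. The 7.5 cm baseline is an assumption that is not stated in this article, and lens distortion is ignored, so treat the numbers as ballpark figures only.

```python
import math


def focal_length_px(image_width_px, hfov_deg):
    """Estimate the focal length in pixels from image width and horizontal FOV."""
    return image_width_px / (2.0 * math.tan(math.radians(hfov_deg) / 2.0))


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Pinhole stereo: depth = focal * baseline / disparity (metres)."""
    if disparity_px <= 0:
        return math.inf  # zero disparity means the feature is effectively at infinity
    return focal_px * baseline_m / disparity_px


BASELINE_M = 0.075  # assumed distance between the two stereo cameras
focal = focal_length_px(640, 73.0)  # width and HFOV taken from the table

for disparity in (5, 20, 80):
    depth = depth_from_disparity(disparity, focal, BASELINE_M)
    print(f"disparity {disparity:>2} px -> roughly {depth:.2f} m")
```

Larger disparities map to shorter distances, which matches the intuition that nearby features appear further apart between the two images.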

For example, the two pictures above are composed by superimposing the left and right images of a stereo pair from the Oak-D-lite. To illustrate the concept of disparity, the same scene is viewed from a distance (left) and at close range (right). The disparity between the two images, measured at the bottom of the cup handle, is the length of the red line. Things closer to the camera have a larger disparity. Doing this for all pixels in an image yields a disparity map that can be used to estimate depth.
Producing such a map and colourizing it gives us some rudimentary depth perception.
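The idea of measuring disparity for every pixel and then colouring the result can be sketched on a single row of an already rectified image pair. This is a toy illustration using a sum-of-absolute-differences window search on made-up pixel values, not the matching algorithm the camera runs internally; the function names and the red-to-blue colour ramp are my own choices.

```python
def scanline_disparity(left_row, right_row, window=3, max_disparity=8):
    """For each pixel in the left row, find the shift into the right row
    that minimises the sum of absolute differences over a small window."""
    half = window // 2
    width = len(left_row)
    disparities = [0] * width
    for x in range(half, width - half):
        best_cost, best_d = None, 0
        for d in range(max_disparity + 1):
            if x - d - half < 0:
                break  # the window would fall off the left edge of the right row
            cost = sum(
                abs(left_row[x + k] - right_row[x - d + k])
                for k in range(-half, half + 1)
            )
            if best_cost is None or cost < best_cost:
                best_cost, best_d = cost, d
        disparities[x] = best_d
    return disparities


def colourize(disparity, max_disparity):
    """Map a disparity to an (R, G, B) tuple: near is red, far is blue."""
    t = min(max(disparity / max_disparity, 0.0), 1.0)
    return (int(255 * t), 0, int(255 * (1.0 - t)))


# Made-up rows: the same bright feature sits 4 pixels further right in the left image.
right_row = [0] * 6 + [10, 50, 90, 50, 10] + [0] * 9
left_row = [0] * 10 + [10, 50, 90, 50, 10] + [0] * 5

row_disparity = scanline_disparity(left_row, right_row)
row_colours = [colourize(d, 8) for d in row_disparity]
print(row_disparity)
```

Flat regions with no texture match equally well at every shift, so the sketch falls back to a disparity of zero there. Real stereo matchers need extra filtering to handle such areas.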

We have skimmed over some important details here, such as image rectification. This is all handled by the Oak-D-lite camera, so we don’t have to deal with it ourselves, but it is good practice to be aware of it.

Future work
In the near future I hope to run some simple neural networks on the Oak-D-lite and output both images and the corresponding data to the Ultra96. The most obvious application for this sort of synergistic system is simultaneous localisation and mapping (SLAM).

