YOLOv3 on the Xilinx Kria KV260 Vision AI Starter Kit

Real-Time Object Detection on the Xilinx Kria

YOLOv3 is the convolutional neural network we use in this project to perform real-time object detection. It takes a single image, divides it into a grid of regions, and predicts bounding boxes and class probabilities for each region. The bounding boxes are then weighted by their predicted probabilities. The output is a set of bounding boxes expressed as ratios of the image dimensions, which are multiplied by the image resolution to obtain pixel coordinates, together with the class ID of each detected object.
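To make the last step concrete, here is a minimal sketch of how box ratios can be scaled to pixel coordinates. The `NormalizedBox` and `PixelBox` structs and the `to_pixels` function are hypothetical names introduced for illustration, and the sketch assumes that each box is given by its top-left corner, width and height as fractions of the image size.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical detection record (assumption): the box is stored as
// fractions of the image size, measured from the top-left corner.
struct NormalizedBox {
    float x;       // left edge as a fraction of the image width
    float y;       // top edge as a fraction of the image height
    float width;   // box width as a fraction of the image width
    float height;  // box height as a fraction of the image height
    int class_id;  // index of the predicted class
    float score;   // confidence of the prediction
};

// The same box in pixel coordinates.
struct PixelBox {
    int xmin, ymin, xmax, ymax;
};

// Scale a normalized box by the image resolution and clamp it to the image.
PixelBox to_pixels(const NormalizedBox& b, int image_width, int image_height) {
    auto to_pixel = [](float fraction, int size) {
        const long v = std::lround(fraction * size);
        return static_cast<int>(std::min<long>(std::max<long>(v, 0), size - 1));
    };
    PixelBox p;
    p.xmin = to_pixel(b.x, image_width);
    p.ymin = to_pixel(b.y, image_height);
    p.xmax = to_pixel(b.x + b.width, image_width);
    p.ymax = to_pixel(b.y + b.height, image_height);
    return p;
}
```

For example, a box with `x = 0.25`, `y = 0.5`, `width = 0.5` and `height = 0.25` on a 640x480 image maps to the pixel rectangle from (160, 240) to (480, 360).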

For some of our previous work using YOLOv3 on the Zynq UltraScale+ MPSoC ZCU104, you can click here.

Ivica Matić

Kria KV260 Vision AI Starter Kit

The Kria KV260 is a development board made by Xilinx featuring a Zynq® UltraScale+™ MPSoC FPGA device on board. It comes in a SOM + carrier board form factor with relatively low cost and size, providing a user-friendly platform for accelerating vision applications using Xilinx hardware, products and toolchains.

The KV260 Vision AI Starter Kit is supported by the Vitis AI software, which provides an easy way for developers to adapt their AI models to run on an FPGA architecture.

In our work, we primarily use the YOLOv3 real-time object detection network, which we train to recognize the objects that each project requires.

The workflow for training a custom YOLOv3 neural network consists of the following steps:

Obtain the images for training the neural network
Annotate the objects in the images using our annotation tool
Move the images to the AWS instance
Move the annotation files to the AWS instance
Run the training command
Download the trained weights from the AWS instance and run inference for evaluation
Convert the Darknet YOLOv3 weights to TensorFlow files
Quantize and compile the neural network using the Vitis AI tools
Run inference on the target FPGA device
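
As an illustration of the annotation step, the sketch below formats one label line in the plain-text layout that Darknet reads for training: class index, then box center x, center y, width and height, all as fractions of the image size. That our annotation tool exports exactly this layout is an assumption, and `darknet_label` is a hypothetical helper name.

```cpp
#include <cstdio>
#include <string>

// Build one Darknet label line from a box given in pixel coordinates.
// Assumption: the annotation files use the Darknet text layout
// "<class> <x_center> <y_center> <width> <height>", normalized to [0, 1].
std::string darknet_label(int class_id,
                          int xmin, int ymin, int xmax, int ymax,
                          int image_width, int image_height) {
    const double x_center = (xmin + xmax) / 2.0 / image_width;
    const double y_center = (ymin + ymax) / 2.0 / image_height;
    const double width = static_cast<double>(xmax - xmin) / image_width;
    const double height = static_cast<double>(ymax - ymin) / image_height;

    char line[128];
    std::snprintf(line, sizeof(line), "%d %.6f %.6f %.6f %.6f",
                  class_id, x_center, y_center, width, height);
    return std::string(line);
}
```

Each annotated image would then have a text file with one such line per object.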

For this demo and for evaluation purposes, we trained our neural network on the COCO dataset.

You can download the pre-trained network from https://pjreddie.com/darknet/yolo/

GPUs work well with floating-point numbers, but FPGAs do not, so we need to convert our neural network model to use integer numbers. This task is accomplished using the Vitis AI (VAI) tool. Because VAI does not natively support Darknet YOLOv3, we first need to convert the model to TensorFlow using a conversion tool that can be found here.
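
The idea of replacing floating-point values with integers can be sketched as follows. This is a simplified illustration of symmetric 8-bit quantization with one shared scale factor, not the algorithm the Vitis AI quantizer actually uses. `QuantizedTensor`, `quantize_int8` and `dequantize` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Integer values plus the scale needed to recover approximate floats:
// real_value is approximately values[i] * scale.
struct QuantizedTensor {
    std::vector<std::int8_t> values;
    float scale;
};

// Map floats to signed 8-bit integers so that the largest magnitude
// in the input lands on 127 (simplified, illustrative scheme).
QuantizedTensor quantize_int8(const std::vector<float>& weights) {
    float max_abs = 0.0f;
    for (float w : weights) {
        max_abs = std::max(max_abs, std::fabs(w));
    }

    QuantizedTensor q;
    q.scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
    q.values.reserve(weights.size());
    for (float w : weights) {
        long r = std::lround(w / q.scale);
        r = std::min(127L, std::max(-127L, r));  // keep within int8 range
        q.values.push_back(static_cast<std::int8_t>(r));
    }
    return q;
}

// Recover the approximate floating-point value of element i.
float dequantize(const QuantizedTensor& q, std::size_t i) {
    return q.values[i] * q.scale;
}
```

Each recovered value differs from the original by at most about half a scale step, which is the precision given up in exchange for integer arithmetic.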

Afterwards, we quantize the model. For calibration, we provide the quantizer with an unlabelled subset of the original images that were used for training in the first stage, when we used our AWS Darknet container.

At the end of the quantization and compilation process, we have an .xmodel file ready for deployment on the KV260 board. We then copy that .xmodel file to the KV260 board over SSH and use the following code to run inference on both video and image sources.

Custom YOLO v3 Image C++ Snippet