HiPEAC Internship Report – Power profiling embedded FPGA systems

Below are details from a report by Baturay Onural and Ivica Matic. Baturay worked as an intern with Sundance as part of the HiPEAC internship program, which has been a great success.

We live in a connected world in which embedded devices are found everywhere we go, from portable devices such as digital watches and MP3 players to large, complex systems such as traffic light controllers, sensor systems, and autonomous vehicles. The number of Internet-connected embedded systems had reached more than 20 billion unique devices by the end of 2020, and the growth charts suggest that this number is not going to drop off soon.

Why FPGA?

With the prices of FPGAs dropping, engineers can use them to optimize a system further and shrink the product, all while keeping performance and reliability at the same or an even greater level. The big advantage that FPGAs bring into play is hardware-level speed and reliability, crucial factors for the correct functioning of real-time operating systems.

Any embedded system also has the advantage of being designed to do one specific task for a specific purpose, so engineers and designers can optimize it to reduce the cost and size of the product while increasing reliability and performance at the same time. The advantages of using FPGAs in place of standard microcontrollers are:

  • Both hardware and firmware re-programmability
  • Parallel processing
  • Lower power consumption
  • Higher speed as component complexity is much lower

Even with the lower power consumption of FPGA devices, power still needs to be considered: it is a major concern for battery-powered systems, and it also matters for heat dissipation inside the closed environments in which most embedded devices tend to operate.

What is the VCS-1?

The VCS-1 is a PC/104 Linux stack composed of two main components: the EMC2 board, which is a PCIe/104 OneBank carrier for a Trenz-compatible SoC module, and the FM191 expansion card, which fans out the I/Os from the SoC to the outside world. In this case, it features the ZU4EV SOM. The SoC provides standard connectivity (e.g. SPI, RS232, I2C, USB, GigE, PCIe), ARM-based processing used to run Linux, memory interfaces, and programmable logic used for hardware acceleration and GPIO. The SoC can be either a Xilinx Zynq-7000 (dual-core ARM Cortex-A9) or a Xilinx Zynq UltraScale+ MPSoC (quad-core ARM Cortex-A53).

What is the Lynsyn Lite?

The Lynsyn Lite is a power measurement utility board designed to measure the power usage of a system and correlate the power values with the source code of the program running on that system.

  • 10 kHz current sampling frequency
  • 3 independent current sensors
  • Non-intrusive PC sampling (JTAG) for correlating power with source code; supports ARMv7-A and ARMv8-A architectures

VCS-1 controlled Flipper robot

Motion Impossible’s Agito controlled by the VCS-1

Capturing 3D data with the VCS-1

Our use-case

Computer vision is a major computing workload that can be optimized everywhere, from big servers in warehouses to small embedded devices. With its increasing popularity, more and more vision devices are appearing at the embedded edge.

The primary goal of optimizing vision-based edge embedded devices is to minimize power usage while maintaining the required throughput. In our work, we demonstrate this by accelerating a simple matrix multiplication using Xilinx SDx. Matrix multiplication is a core function of linear-algebra applications, and vision-based applications depend heavily on it.
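
For reference, the computation being accelerated is the standard triple-loop matrix product. The sketch below illustrates only that arithmetic in plain Python; it is not the SDx source used in this project, and the function name `matmul` is hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (naive triple loop)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            # Multiply-accumulate over the shared dimension.
            for k in range(inner):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc
    return c


if __name__ == "__main__":
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Each output element is computed independently of the others, which is what makes this kernel a natural candidate for parallel hardware.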

We aim to perform power monitoring on an application running on the VCS-1. The intention is to show how easily this power monitoring can be done, so the main effort is directed at the power monitoring side of the project. Power monitoring is done with a Lynsyn Lite, which must be configured before sampling begins. The picture shows how the connections should be made. They can be arranged according to which voltage rail the user wants to investigate. In our application, we investigate the VIN rail, which feeds both the PS (processing system) and PL (programmable logic) sides.
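
Power on a rail is the product of the rail voltage and the current flowing through it. A minimal sketch of that arithmetic, assuming the voltage and current samples are available as two equal-length lists; the function name `rail_power` and the sample values are hypothetical, not readings from the VCS-1:

```python
def rail_power(voltages_v, currents_a):
    """Return per-sample power in watts for one rail: P = V * I."""
    if len(voltages_v) != len(currents_a):
        raise ValueError("voltage and current traces must have the same length")
    return [v * i for v, i in zip(voltages_v, currents_a)]


if __name__ == "__main__":
    # Hypothetical samples for illustration only (not measured on the VCS-1).
    volts = [12.0, 12.0, 11.9]
    amps = [0.20, 0.21, 0.22]
    print(rail_power(volts, amps))
```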

Experimental Results

The application runs on both the PL and the PS sides, as mentioned earlier. After the project is configured and built, the output files for the SD card can be used to boot the VCS-1. Once the device has booted, the user should log in with the credentials root/root. The first experiment a user can make is simply to run the application by calling the executable directly from the terminal. The ELF file executes the matrix multiplication in both the PS and the PL. After a successful run, the application reports the difference in performance between the HW and SW versions. As expected, the HW version should be faster than the SW version, since it is accelerated in the PL.
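
The HW/SW comparison comes down to timing both versions and taking the ratio. The sketch below shows one way to do that on a host machine; the helper names `time_call` and `speedup` are hypothetical, and on the real board the two callables would be the PS and PL implementations rather than the stand-in used here.

```python
import time


def time_call(fn, *args, repeats=5):
    """Return the best wall-clock time in seconds over several runs of fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(sw_seconds, hw_seconds):
    """Speedup of the accelerated version relative to the software version."""
    if hw_seconds <= 0:
        raise ValueError("hw_seconds must be positive")
    return sw_seconds / hw_seconds


if __name__ == "__main__":
    # Stand-in workload; replace with the real SW and HW entry points.
    elapsed = time_call(sum, range(100_000))
    print(f"stand-in workload took {elapsed:.6f} s")
```

Taking the best of several runs reduces the influence of one-off scheduling noise on the comparison.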

To measure the power consumption, the user should run the Lynsyn application in parallel with the accelerated application executing on the FPGA board. Lynsyn records the power usage if it is configured properly. To start power sampling, click the profile button in the Lynsyn application. Once the duration has been set, the Lynsyn device starts sampling the power usage.

Lynsyn output (Power):

Lynsyn output (Voltage):

As the power samples show, the application consumes around 2.5 W for the PS and PL sides combined. Compared with other edge devices, these power numbers are satisfactory. As vision applications become more power-hungry due to the exponential increase in the data to be processed, tracking power consumption is becoming more and more crucial for edge devices.
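
A sampled power trace can be reduced to an average power and a total energy figure. The sketch below assumes the samples have been exported as a plain list of power values in watts, taken at the 10 kHz rate listed for the Lynsyn Lite; the export step itself and the function names are assumptions, and the flat 2.5 W trace is a synthetic stand-in, not recorded data.

```python
SAMPLE_RATE_HZ = 10_000  # Lynsyn Lite current sampling frequency


def average_power(samples_w):
    """Mean of the power samples, in watts."""
    if not samples_w:
        raise ValueError("no samples")
    return sum(samples_w) / len(samples_w)


def energy_joules(samples_w, sample_rate_hz=SAMPLE_RATE_HZ):
    """Approximate energy as the sum of P_i * dt, with dt = 1 / sample rate."""
    return sum(samples_w) / sample_rate_hz


if __name__ == "__main__":
    # Synthetic one-second trace held at 2.5 W (illustrative only).
    trace = [2.5] * SAMPLE_RATE_HZ
    print(average_power(trace))  # 2.5
    print(energy_joules(trace))  # 2.5
```

Energy, rather than instantaneous power, is usually the quantity that determines battery life, so both figures are worth reporting.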

Power usage and performance

We have already discussed the advantages of using FPGAs in embedded devices in the previous sections, primarily low-power operation while maintaining adequate data throughput for the required use case.

From the figures we can see that performance per watt (green bar) is much better on the FPGA platform than on its GPU counterparts; FPGA devices are therefore more suitable for installation at the edge, where power saving is a crucial deployment factor.
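
Performance per watt is simply throughput divided by average power. A minimal sketch of the metric follows; the function name, the platform labels, and the input numbers are placeholders chosen for easy arithmetic and are not results from this study.

```python
def performance_per_watt(throughput, avg_power_w):
    """Throughput (e.g. operations per second) divided by average power in watts."""
    if avg_power_w <= 0:
        raise ValueError("average power must be positive")
    return throughput / avg_power_w


if __name__ == "__main__":
    # Placeholder inputs: (throughput, average power in W). NOT measured values.
    platforms = {"platform_a": (100.0, 2.5), "platform_b": (300.0, 30.0)}
    for name, (throughput, power_w) in platforms.items():
        print(name, performance_per_watt(throughput, power_w))
```

A platform with lower raw throughput can still come out ahead on this metric if it draws proportionally less power.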

Tracking power usage is perhaps more important than ever if we want to enable more power-efficient designs in the near future!

We hope that this study conveys its motivation well and is understandable to everybody. Do not hesitate to contact us if you have any questions or ideas.
