HiPEAC Internship Report – Power profiling embedded FPGA systems
Below are details from a report by Baturay Onural and Ivica Matic. Baturay worked as an intern with Sundance as part of the HiPEAC internship program, which has been a great success.
We live in a connected world in which embedded devices are everywhere, from portable devices such as digital watches and MP3 players to large, complex systems such as traffic light controllers, sensor systems, and autonomous vehicles. By the end of 2020, the number of Internet-connected embedded systems had reached more than 20 billion unique devices, and the growth charts suggest that this number is not going to drop off soon.
FPGAs bring several advantages to embedded devices:
- Both hardware and firmware re-programmability
- Parallel processing
- Lower power consumption
- Higher speed as component complexity is much lower
However, even though FPGA devices consume less power, power consumption still needs attention: it is a major consideration for battery-powered systems, and it also matters for heat dissipation inside the closed environments in which most embedded devices operate.
What is the VCS-1?
The VCS-1 is a PC/104 Linux stack composed of two main components: the EMC2 board, which is a PCIe/104 OneBank carrier for a Trenz-compatible SoC module, and the FM191 expansion card, which fans out the I/Os from the SoC to the outside world. In this case, the stack features the ZU4EV SOM. The SoC provides standard connectivity (e.g. SPI, RS232, I2C, USB, GigE, PCIe), ARM-based processing used to run a Linux OS, memory interfaces, and programmable logic used for hardware acceleration and GPIO. The SoC can be either a Xilinx Zynq 7 Series (dual-core ARM Cortex-A9) or a Xilinx Zynq UltraScale+ MPSoC (quad-core ARM Cortex-A53).
What is the Lynsyn Lite?
The Lynsyn Lite is a power measurement utility board designed to measure the power usage of a system and to correlate the power values with the source code of the program running on that system.
- 10kHz current sampling frequency
- 3 independent current sensors
- Non-intrusive PC sampling (JTAG) for correlating power with source code; supports ARMv7-A and ARMv8-A architectures
VCS-1 controlled Flipper robot
Motion Impossible’s Agito controlled by the VCS-1
Capturing 3D data with the VCS-1
Our use-case
Computer vision is a major computing task that can be optimized on everything from big servers in warehouses to small embedded devices. With its increasing popularity, more and more vision devices are appearing on the embedded edge side.
The primary goal of optimizing vision-based edge embedded devices is to minimize power usage while maintaining the required throughput. In our work, we illustrate this trade-off by accelerating a simple matrix multiplication using Xilinx SDx. Matrix multiplication is a core function of linear-algebra applications, and vision-based applications depend heavily on it.
We aim to perform power monitoring on an application running on the VCS-1. The intention is to show how easily this power monitoring can be done, so the main effort is directed at the power monitoring side of the project. Power monitoring is done with a Lynsyn Lite, which must be configured before power sampling can start. The picture shows how the connections should be made. They can be configured according to which voltage rail the user wants to investigate. In our application, we investigate the VIN rail, which feeds both the PS and PL sides.
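The report does not list the kernel itself, and the actual application is built with Xilinx SDx rather than Python. As a minimal sketch of the computation being accelerated, the following shows a naive software matrix multiplication together with a tolerance check of the kind that could be used to compare a software result against an accelerated one. The function names `matmul` and `matrices_close` are hypothetical and chosen only for illustration.

```python
def matmul(a, b):
    """Naive matrix product of two matrices given as lists of rows."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a_ik = a[i][k]  # reused across the whole inner loop
            for j in range(p):
                c[i][j] += a_ik * b[k][j]
    return c


def matrices_close(x, y, tol=1e-6):
    """True when x and y have the same shape and agree within tol."""
    if len(x) != len(y):
        return False
    for row_x, row_y in zip(x, y):
        if len(row_x) != len(row_y):
            return False
        if any(abs(u - v) > tol for u, v in zip(row_x, row_y)):
            return False
    return True
```

The three nested loops are what makes this kernel a natural candidate for acceleration: every output element is an independent sum of products, so the work can be spread across parallel hardware.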
Experimental Results
The application runs on both the PL and the PS sides, as mentioned earlier. After configuring and building the project, the output files for the SD card can be used to boot the VCS-1. After booting, the user should log in with the username root and the password root. The first experiment is simply to run the application by calling the executable directly from the terminal. The ELF file executes the matrix multiplication on both the PS and the PL. After a successful run, the application reports the performance difference between the HW and SW versions. As expected, the HW version should be faster than the SW version, since it is accelerated in the PL.
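The report does not show how the application measures the HW and SW versions. One common way to produce such a comparison is to time both versions and report their ratio. The sketch below assumes that approach; `time_call` and `speedup` are hypothetical helpers, not part of the application.

```python
import time


def time_call(fn, *args, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(sw_seconds, hw_seconds):
    """How many times faster the HW version is than the SW version."""
    if hw_seconds <= 0:
        raise ValueError("hw_seconds must be positive")
    return sw_seconds / hw_seconds
```

Taking the best of several runs reduces the influence of scheduling noise on a Linux system, which matters when the accelerated run is short.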
Lynsyn output (Power):
Lynsyn output (Voltage):
As shown in the power samples, the application consumes around 2.5 W on the VIN rail, which covers both the PS and PL sides. Compared with other edge devices, these power numbers are satisfactory. As vision applications become more power-hungry due to the exponential increase in computational data, tracking power consumption is becoming more and more crucial for edge devices.
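To relate sampled voltage and current to a power figure like the one above, the arithmetic can be sketched as follows. The 10 kHz rate comes from the Lynsyn Lite feature list; the helper names are hypothetical, and the sketch assumes the samples have already been exported as plain lists of volts and amperes, which is not necessarily how the Lynsyn tools present them.

```python
SAMPLE_RATE_HZ = 10_000  # current sampling frequency from the feature list


def instantaneous_power(voltages, currents):
    """Per-sample power in watts: P = V * I."""
    if len(voltages) != len(currents):
        raise ValueError("voltage and current traces differ in length")
    return [v * i for v, i in zip(voltages, currents)]


def average_power(power_samples):
    """Mean power in watts over the captured window."""
    if not power_samples:
        raise ValueError("no samples")
    return sum(power_samples) / len(power_samples)


def energy_joules(power_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Rectangle-rule integration: each sample covers 1/rate seconds."""
    return sum(power_samples) / sample_rate_hz
```

For example, a constant 2.5 W draw captured for one second (10,000 samples) integrates to 2.5 J. Energy per run, rather than average power alone, is what determines battery life for a fixed workload.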
Power usage and performance
We have already discussed the advantages of using FPGAs for embedded devices in the previous sections, primarily low-power operation while maintaining adequate data throughput for the required use-case.
From the figures we can see that the performance-per-watt (green bar) is much better on the FPGA platform than on its GPU counterparts. FPGA devices are therefore more suitable for installation on the edge, where power saving is a crucial deployment factor.
Tracking power usage is perhaps more important than ever for enabling more power-efficient designs in the near future!
We hope that this study conveys its motivation clearly and is understandable to everybody. Do not hesitate to contact us if you have any questions or ideas.
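Performance-per-watt is simply throughput divided by average power. The sketch below shows the metric and a comparison across platforms. The function names are hypothetical, and any numbers passed in are the caller's own measurements; none are taken from the figures in this report.

```python
def performance_per_watt(throughput, power_watts):
    """Throughput (any consistent unit, e.g. operations per second)
    divided by average power in watts."""
    if power_watts <= 0:
        raise ValueError("power must be positive")
    return throughput / power_watts


def most_efficient(platforms):
    """Return the platform name with the highest performance-per-watt.

    `platforms` maps a name to a (throughput, power_watts) pair.
    """
    return max(
        platforms,
        key=lambda name: performance_per_watt(*platforms[name]),
    )
```

A platform with lower raw throughput can still come out ahead on this metric if it draws proportionally less power, which is the argument made here for FPGAs on the edge.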