Angelos Kyriakos, Elissaios-Alexios Papatheofanous, Charalampos Bezaitis, Dionysios Reisis.
Abstract
Many image- and video-related applications involve complex processing that requires hardware accelerators to achieve real-time performance. Notable among these are Machine Learning (ML) tasks that use Convolutional Neural Networks (CNNs) to detect objects in image frames. To contribute to CNN accelerator solutions, this paper focuses on the design of Field-Programmable Gate Array (FPGA) accelerators for CNNs of limited feature space, with the aim of improving performance, power consumption and resource utilization. The proposed design approach targets designs that fit within the logic and memory resources of a single FPGA device, and it mainly benefits edge, mobile and on-board satellite (OBC) computing, especially their image-processing-related applications. This work applies the approach to develop an FPGA accelerator for vessel detection on a Xilinx Virtex 7 XC7VX485T FPGA device (Advanced Micro Devices, Inc, Santa Clara, CA, USA). The resulting architecture operates on RGB images of size 80×80 or on sliding windows; it is trained on the "Ships in Satellite Imagery" dataset and, by reaching a clock frequency of 270 MHz, completing inference in 0.687 ms and consuming 5 watts, it validates the approach.
Keywords: CNN; FPGA; accelerator; image processing; vessel detection
Year: 2022 PMID: 35448240 PMCID: PMC9032259 DOI: 10.3390/jimaging8040114
Source DB: PubMed Journal: J Imaging ISSN: 2313-433X
Figure 1. Input block architecture.
Figure 2. Convolution block architecture (∗ refers to fixed-point integer multiplication, + refers to fixed-point integer addition).
Figure 3. Pooling block architecture.
Figure 4. Output block architecture (∗ refers to fixed-point integer multiplication, + refers to fixed-point integer addition).
Figure 5. Model architecture.
Figure 6. FPGA architecture of the input layer, first convolution and pooling layers and the second input layer (+ refers to fixed-point integer addition).
Figure 7. FPGA architecture of the second convolution and pooling layers, fully connected layer and output layer (∗ refers to fixed-point integer multiplication, s refers to the “select” input pin of the multiplexer, + refers to fixed-point integer addition).
Resource utilization.
| Resource | Utilization | Utilization % |
|---|---|---|
| LUT | 50,743 | 16.71 |
| LUTRAM | 4228 | 3.23 |
| FF | 70,786 | 11.66 |
| BRAM | 96.5 | 9.37 |
| DSP | 843 | 30.11 |
Figure 8. Power utilization.
Performance comparison to CPU and GPU.
| Platform | Execution Time (ms) | FPGA Speed-Up |
|---|---|---|
| FPGA | 0.687 | - |
| CPU | 4.696 | 6.836 |
| GPU | 2.202 | 3.205 |
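
The speed-up column appears to be the ratio of each platform's execution time to the FPGA's execution time. The following minimal Python sketch recomputes it from the table values; the dictionary and function names are illustrative and not taken from the paper.

```python
# Execution times in milliseconds, copied from the table above.
exec_time_ms = {"FPGA": 0.687, "CPU": 4.696, "GPU": 2.202}


def fpga_speedup(platform: str) -> float:
    """Ratio of a platform's execution time to the FPGA's execution time."""
    return exec_time_ms[platform] / exec_time_ms["FPGA"]


for name in ("CPU", "GPU"):
    print(f"{name}: FPGA is {fpga_speedup(name):.3f}x faster")
```

Rounded to three decimals, the ratios agree with the 6.836 and 3.205 listed in the table.
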
Performance and power comparison to edge devices.
| Device | Execution Time (ms) | Speed-Up | Power (W) |
|---|---|---|---|
| Jetson Nano CPU | 440 | - | 10 |
| Jetson Nano GPU | 20.3 | 21.7 | 10 |
| Myriad2 1 SHAVE | 56.27 | 7.8 | 0.5 |
| Myriad2 12 SHAVE | 14.59 | 30.1 | 1 |
| FPGA Accelerator | 0.687 | 640.5 | 5 |
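
In this table the speed-up values appear to be relative to the Jetson Nano CPU row. The sketch below recomputes them and also estimates energy per inference as power multiplied by execution time. The energy figure is not reported in the source: it is a derived estimate that assumes the listed power is the average power drawn during inference. All names in the code are illustrative.

```python
# (execution time in ms, power in W), copied from the table above.
devices = {
    "Jetson Nano CPU": (440.0, 10.0),
    "Jetson Nano GPU": (20.3, 10.0),
    "Myriad2 1 SHAVE": (56.27, 0.5),
    "Myriad2 12 SHAVE": (14.59, 1.0),
    "FPGA Accelerator": (0.687, 5.0),
}
BASELINE = "Jetson Nano CPU"  # assumed baseline for the speed-up column


def speedup(name: str) -> float:
    """Baseline execution time divided by this device's execution time."""
    return devices[BASELINE][0] / devices[name][0]


def energy_mj(name: str) -> float:
    """Estimated energy per inference in millijoules (W x ms = mJ).

    Assumes the listed power is the average power during inference.
    """
    time_ms, power_w = devices[name]
    return time_ms * power_w


for name in devices:
    print(f"{name}: speed-up {speedup(name):.1f}, energy {energy_mj(name):.3f} mJ")
```

The recomputed ratios match the table to one decimal place except for the 12-SHAVE Myriad2 row, which evaluates to about 30.16 and is listed as 30.1.
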
Reporting the features of related results.
| Feature | [ | [ | [ | Proposed Accelerator |
|---|---|---|---|---|
| Precision | fl. point, 32 bits | fl. point, 32 bits | fixed-point, 16 bits | fixed-point, 17 bits |
| Frequency (MHz) | 100 | 100 | 156 | 270 |
| FPGA | Xilinx Virtex VC707 | Xilinx Zynq 7100 | Xilinx Zynq ZCU106 | Xilinx Virtex VC707 |
| CNN Size | 1.33 GFLOP | N/A | N/A | 18.122 MMAC |
| Performance (GOP/s) | 61.62 | 17.11 | N/A | 52.80 |
| Power (Watt) | 18.61 | 4.083 | 3.4 | 5.001 |
| Perf./Watt (GOP/s/Watt) | 3.31 | 4.19 | N/A | 10.56 |
| DSPs | 2240 | 1926 | 1175 | 843 |
| DSP Efficiency (GOP/s/DSP) | 0.027 | 0.008 | N/A | 0.062 |
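
The derived figures for the proposed accelerator can be approximately reproduced from the CNN size, inference time, power and DSP count reported above. The sketch below assumes that one multiply-accumulate (MAC) counts as two operations, a common convention that this excerpt does not state explicitly; the constant names are illustrative.

```python
# Values for the proposed accelerator, taken from the tables above.
MACS_PER_INFERENCE = 18.122e6   # CNN size: 18.122 MMAC
INFERENCE_TIME_S = 0.687e-3     # 0.687 ms per inference
POWER_W = 5.001
DSP_COUNT = 843

# Assumption: 1 MAC = 2 operations (one multiplication plus one addition).
OPS_PER_MAC = 2

throughput_gops = MACS_PER_INFERENCE * OPS_PER_MAC / INFERENCE_TIME_S / 1e9
perf_per_watt = throughput_gops / POWER_W    # GOP/s/Watt
perf_per_dsp = throughput_gops / DSP_COUNT   # GOP/s/DSP

print(f"Performance: {throughput_gops:.2f} GOP/s")
print(f"Perf./Watt:  {perf_per_watt:.2f} GOP/s/Watt")
print(f"DSP eff.:    {perf_per_dsp:.4f} GOP/s/DSP")
```

Under that assumption the throughput comes out at roughly 52.76 GOP/s, close to the 52.80 GOP/s in the table; the small gap is consistent with rounding of the reported inputs.
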