Yonghao Tan1, Mengying Sun1, Huanshihong Deng1, Haihan Wu1, Minghao Zhou1, Yifei Chen1, Zhuo Yu1, Qinghan Zeng2, Ping Li3, Lei Chen1, Fengwei An1.
Abstract
With the wide application of autonomous mobile robots (AMRs), visual-inertial odometry (VIO) systems, which achieve positioning by fusing a camera with an inertial measurement unit (IMU), have developed rapidly. They remain limited, however, by high algorithmic complexity, the long development cycle of dedicated accelerators, and the low power-supply capacity of AMRs. This work designs a reconfigurable accelerated core that supports different VIO algorithms and offers high area efficiency, energy efficiency, precision, and processing speed. Experimental results show that the accuracy loss of the proposed accelerator is negligible on the most authoritative dataset. The on-chip memory usage of 70 KB is at least 10× smaller than that of state-of-the-art works. In hardware-resource consumption and power dissipation, the FPGA implementation and the synthesis in 28 nm CMOS outperform previous works on the same platform.
Keywords: AMRs; SLAM; VIO; accelerator; reconfigurable
Year: 2022 PMID: 36236767 PMCID: PMC9570810 DOI: 10.3390/s22197669
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. Process diagram of SLAM.
Figure 2. The overall procedure of the visual–inertial odometry.
Figure 3. Diagram of the IMU model.
Figure 4. The hardware-friendly pipeline FAST algorithm.
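For orientation, the segment test at the heart of FAST can be sketched in a few lines of Python. This is the standard software formulation (16 pixels on a radius-3 circle, a corner if `n` contiguous pixels are all brighter or all darker than the centre by more than a threshold), not the pipelined hardware design of Figure 4. The function name `is_fast_corner` and the default values `threshold=20` and `n=9` are illustrative assumptions.

```python
# Offsets (dx, dy) of the 16 pixels on a radius-3 Bresenham circle, in circular order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img, x, y, threshold=20, n=9):
    """Standard FAST segment test at pixel (x, y) of a 2D list of intensities.

    Assumes (x, y) lies at least 3 pixels away from every image border.
    """
    center = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):  # +1: brighter arc, -1: darker arc
        flags = [sign * (p - center) > threshold for p in ring]
        run = 0
        # Append the first n-1 flags again so runs that wrap around are found.
        for flag in flags + flags[:n - 1]:
            run = run + 1 if flag else 0
            if run >= n:
                return True
    return False


if __name__ == "__main__":
    # A dark pixel in the middle of a bright 7x7 patch passes the test.
    patch = [[100] * 7 for _ in range(7)]
    patch[3][3] = 10
    print(is_fast_corner(patch, 3, 3))
```

A straight vertical edge through the centre lights up only 7 contiguous ring pixels, so it is rejected with `n=9`, which is the behaviour that distinguishes corners from edges.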
Operations supported by the accelerated core.
| Operation | Description | Time Consumption |
|---|---|---|
| Scalar add/sub | Scalar addition/subtraction | 5 clock cycles |
| Scalar mul | Scalar multiplication | 2 clock cycles |
| Scalar reci | Scalar reciprocal | 10 clock cycles |
| Scalar sqrt_slow | Scalar square root with high accuracy | 27 clock cycles |
| Scalar sqrt_fast | Scalar square root with low latency | 10 clock cycles |
| Sin_Cos | Sine and cosine function for input in radians | 52 clock cycles |
| M_inv | Matrix inversion | 31 clock cycles |
| Li2R | Transform Lie algebra to a rotation matrix | 73 clock cycles |
| Li2Q | Transform Lie algebra to quaternion | 65 clock cycles |
| R2Q | Transform rotation matrix to quaternion | 53 clock cycles |
| Q2R | Transform quaternion to a rotation matrix | 14 clock cycles |
| Q_q | Quaternion multiplication | 14 clock cycles |
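As a reference for what the rotation-related operations in the table compute, the following is a minimal floating-point sketch using textbook formulas. It is not the core's circuit or number format. The function names mirror the table entries, while the quaternion layout `(w, x, y, z)`, the Hamilton product convention, and the small-angle cutoff `1e-8` are assumptions made for illustration; the paper may use a different convention.

```python
import math


def skew(v):
    """Skew-symmetric matrix of a 3-vector."""
    x, y, z = v
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]


def mat_mul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def li2r(phi):
    """Li2R: rotation vector (Lie algebra) to rotation matrix, Rodrigues' formula."""
    theta = math.sqrt(sum(c * c for c in phi))
    k = skew(phi)
    k2 = mat_mul(k, k)
    if theta < 1e-8:  # small-angle limits of sin(t)/t and (1 - cos(t))/t^2
        a, b = 1.0, 0.5
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / (theta * theta)
    return [[(1.0 if i == j else 0.0) + a * k[i][j] + b * k2[i][j]
             for j in range(3)] for i in range(3)]


def li2q(phi):
    """Li2Q: rotation vector to unit quaternion (w, x, y, z)."""
    theta = math.sqrt(sum(c * c for c in phi))
    s = 0.5 if theta < 1e-8 else math.sin(theta / 2.0) / theta
    return (math.cos(theta / 2.0), s * phi[0], s * phi[1], s * phi[2])


def q_mul(a, b):
    """Q_q: Hamilton product of two quaternions (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def q2r(q):
    """Q2R: unit quaternion (w, x, y, z) to rotation matrix."""
    w, x, y, z = q
    return [[1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
            [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
            [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)]]


if __name__ == "__main__":
    # A 90-degree rotation about z maps the x axis onto the y axis.
    for row in li2r([0.0, 0.0, math.pi / 2]):
        print([round(v, 6) for v in row])
```

Under these conventions the two routes agree: `q2r(li2q(phi))` matches `li2r(phi)`, and `q2r(q_mul(qa, qb))` matches `mat_mul(q2r(qa), q2r(qb))` up to rounding.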
Figure 5. Overall hardware architecture of the proposed accelerated core. (a) Hardware architecture of the proposed VIO accelerated core; (b) control and dataflow of the three-layer circuits; (c) shared memory for the storage of vectors and matrices.
Figure 6. Detailed structure of the fixed-point vision pipeline.
Figure 7. Modules and processing flow of the programmable computation core.
Figure 8. Detailed structure of the feature processing engine.
Figure 9. Schematic diagram of the vectorized matrix multiplication strategy.
Evaluation of accuracy.
| Dataset | ROVIO (Software) | The Proposed Core |
|---|---|---|
| MH_1 | 0.19% | 0.19% |
| MH_2 | 0.23% | 0.23% |
| MH_3 | 0.47% | 0.49% |
| MH_4 | 0.55% | 0.52% |
| MH_5 | 0.78% | 0.79% |
| V1_1 | 0.28% | 0.26% |
| V1_2 | 0.35% | 0.34% |
| V1_3 | 0.27% | 0.25% |
| V2_1 | 0.26% | 0.26% |
| V2_2 | 0.37% | 0.40% |
| V2_3 | 0.61% | 0.61% |
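To make the size of the accuracy gap concrete, the per-sequence figures from the table can be compared directly. The sketch below only re-tabulates the values listed above; the dictionary name `RESULTS` and the helper `summarize` are hypothetical names introduced here.

```python
# (ROVIO software, proposed core) error in percent, copied from the table above.
RESULTS = {
    "MH_1": (0.19, 0.19), "MH_2": (0.23, 0.23), "MH_3": (0.47, 0.49),
    "MH_4": (0.55, 0.52), "MH_5": (0.78, 0.79), "V1_1": (0.28, 0.26),
    "V1_2": (0.35, 0.34), "V1_3": (0.27, 0.25), "V2_1": (0.26, 0.26),
    "V2_2": (0.37, 0.40), "V2_3": (0.61, 0.61),
}


def summarize(results):
    """Return mean error per implementation and the per-sequence differences."""
    n = len(results)
    mean_sw = sum(sw for sw, _ in results.values()) / n
    mean_core = sum(core for _, core in results.values()) / n
    # Positive difference: the core is less accurate than software on that sequence.
    diffs = {name: round(core - sw, 2) for name, (sw, core) in results.items()}
    max_abs_diff = max(abs(d) for d in diffs.values())
    return mean_sw, mean_core, diffs, max_abs_diff


if __name__ == "__main__":
    mean_sw, mean_core, diffs, max_abs_diff = summarize(RESULTS)
    print(f"mean ROVIO: {mean_sw:.3f}%  mean core: {mean_core:.3f}%")
    print(f"largest per-sequence gap: {max_abs_diff:.2f} percentage points")
```

On these numbers the two means differ by less than 0.01 percentage points and no single sequence differs by more than 0.03, with the core ahead on some sequences and behind on others.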
Figure 10. Demo evaluation platform. (a) VU440 FPGA board with MT9V034 image sensor and MPU9250 IMU. (b) New features are shown through the HDMI display. (c) Trajectory output in the x, y, and z axes (three dimensions).
Figure 11. Measured experience map over frames during the SLAM process.
Hardware implementation results. The outstanding results in the comparison are in bold.
| | MIT 2017 | ICFPT 2021 | JSSC 2019 | ISSCC 2019 | This Work |
|---|---|---|---|---|---|
| Type | VIO | SLAM | VIO | SLAM | VIO |
| Odometry | IMU | Visual | IMU | Visual | IMU |
| FPGA Platform | Kintex-7 | UltraScale+ XCZU7EV | N/A | N/A | UltraScale + |
| Technology | N/A | N/A | 65 nm | 28 nm | 28 nm * |
| Resolution | N/A | 640 × 480 | 752 × 480 | 640 × 480 | 640 × 480 |
| Speed | 20 fps | 15.5 fps | | 80 fps | 160 fps |
| Frequency | 100 MHz | 100 MHz | 62.5 MHz/ | 240 MHz | |
| SoC | No | Yes | No | No | No |
| On-chip Memory | 2048 KB | | 854 KB | 1126 KB | 70 KB |
| LUTs | 192,000 | 146,572 | N/A | N/A | |
| FFs | 144,000 | 74,166 | N/A | N/A | |
| DSPs | 771 | 173 | N/A | N/A | |
| Power | 1.46 W | Not Given | 24 mW | 243.6 mW | |
| Area | N/A | N/A | 20 mm² | 10.92 mm² | |
| Reconfigurable | No | No | No | No | Yes |
| Application | Nano and | Autonomous | AR, VR and UAVs | AMRs | |
*: Synthesis results in 28 nm CMOS technology.
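The abstract's claim that the 70 KB on-chip memory is at least 10× smaller than prior work can be checked against the memory figures that survive in the table. The snippet below is a small arithmetic sketch; the names `PRIOR_ON_CHIP_KB`, `THIS_WORK_KB`, and `reduction_factors` are hypothetical, and only the three prior-work values listed in the table are used.

```python
# On-chip memory of prior works as listed in the comparison table, in KB.
PRIOR_ON_CHIP_KB = [2048, 854, 1126]
# On-chip memory of the proposed core, in KB.
THIS_WORK_KB = 70


def reduction_factors(prior_kb, ours_kb):
    """How many times smaller our memory is than each prior design."""
    return [kb / ours_kb for kb in prior_kb]


if __name__ == "__main__":
    factors = reduction_factors(PRIOR_ON_CHIP_KB, THIS_WORK_KB)
    for kb, factor in zip(PRIOR_ON_CHIP_KB, factors):
        print(f"{kb} KB / {THIS_WORK_KB} KB = {factor:.1f}x")
    print(f"smallest reduction: {min(factors):.1f}x")
```

The smallest ratio among the listed designs is 854 KB / 70 KB, about 12.2×, which is consistent with the "at least 10×" figure.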
Figure 12. Power proportion of each module. (a) On-chip power; (b) dynamic utilization; (c) utilization details.