Javier Corral-García, José-Luis González-Sánchez, Miguel-Ángel Pérez-Toledano.
Abstract
The Internet of Things (IoT) faces challenges that require green solutions and energy-efficient paradigms. Architectures such as ARM have evolved significantly in recent years, with a particular focus on processor efficiency, which is essential for always-on devices. On the software side, however, few approaches analyse the advantages of writing efficient code when programming IoT devices. This proposal therefore aims to improve source code optimization in order to achieve better execution times. In addition, it analyses the importance of various techniques for writing efficient code for Raspberry Pi devices, with the objective of increasing execution speed. A complete set of tests has been developed exclusively to analyse and measure the improvements achieved by applying each of these techniques. This will raise awareness of the significant impact the recommended techniques can have.
Keywords: Raspberry Pi; code optimization; efficient code; performance optimization
Year: 2018 PMID: 30469380 PMCID: PMC6263706 DOI: 10.3390/s18114066
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Raspberry Pi (RPi) comparison chart.
| Model | RPi 2 B | RPi 3 B | RPi 3 B+ |
|---|---|---|---|
| SoC | Broadcom BCM2836 | Broadcom BCM2837 | Broadcom BCM2837B0 |
| CPU | 900 MHz Quad Core ARM Cortex-A7 | 1.2 GHz Quad Core ARM Cortex-A53 | 1.4 GHz Quad Core ARM Cortex-A53 |
| RAM | 1 GB | 1 GB | 1 GB |
| GPU | Broadcom VideoCore IV 1080p30 | Broadcom VideoCore IV 1080p60 | Broadcom VideoCore IV 1080p60 |
| USB ports | 4 | 4 | 4 |
| Ethernet | 100 Mbit/s base Ethernet | 100 Mbit/s base Ethernet | Gigabit Ethernet over USB 2.0 (maximum throughput 300 Mbps) |
| PoE | No | No | Yes (requires separate PoE HAT) |
| WiFi | No | On-board WiFi 802.11n | On-board WiFi 802.11ac dual band 2.4 GHz & 5 GHz |
| Bluetooth | No | On-board Bluetooth 2.0/4.1 | On-board Bluetooth 2.0/4.1/4.2 LS BLE |
| Video output | HDMI, 3.5 mm composite, DSI (for LCD) | HDMI, 3.5 mm composite, DSI (for LCD) | HDMI, 3.5 mm composite, DSI (for LCD) |
|  | I | I | I |
| Camera | 15-pin CSI | 15-pin CSI | 15-pin CSI |
| GPIO pins | 40 | 40 | 40 |
| Storage | MicroSD | MicroSD | MicroSD |
Runtime results and percentages of improvement in Raspberry Pi 2B.
| # | Technique | Standard, without optimization (ns) | Efficient, without optimization (ns) | Improvement, without optimization (%) | Standard, optimization level 3 (ns) | Efficient, optimization level 3 (ns) | Improvement, optimization level 3 (%) |
|---|---|---|---|---|---|---|---|
| 1 | Bit fields | 85.71 | 42.28 | 50.67 | 31.25 | 28.92 | 7.46 |
| 2 | Boolean return | 40.63 | 38.35 | 5.61 | 20.32 | 20.30 | 0.10 |
| 3 | Cascaded function calls | 828.28 | 418.60 | 49.46 | 514.83 | 98.46 | 80.88 |
| 4 | Row-major accessing | 94,096.73 | 94,096.72 | 0.00 | 12,850.10 | 7502.61 | 41.61 |
| 5 | Constructor initialization lists | 1158.58 | 1128.70 | 2.58 | 1148.76 | 1107.63 | 3.58 |
| 6 | Common subexpression elimination | 74.50 | 55.29 | 25.79 | 36.13 | 34.97 | 3.21 |
| 7 | Mapping structures | 522.61 | 812.45 | −55.46 | 505.80 | 442.10 | 12.59 |
| 8 | Dead code elimination | 32.47 | 24.61 | 24.21 | 16.82 | 16.78 | 0.24 |
| 9 | Exception handling | 9465.32 | 42.51 | 99.55 | 9406.65 | 10.06 | 99.89 |
| 10 | Global variables within loops | 558.62 | 425.40 | 23.85 | 87.27 | 87.29 | −0.02 |
| 11 | Function inlining | 55.66 | 35.60 | 36.04 | 20.15 | 20.13 | 0.10 |
| 12 | Global variables | 1538.84 | 1211.68 | 21.26 | 697.21 | 589.75 | 15.41 |
| 13 | Constants inside loops | 504.34 | 353.86 | 29.84 | 224.77 | 228.08 | −1.47 |
| 14 | Initialization versus assignment | 57.09 | 27.96 | 51.02 | 54.55 | 10.01 | 81.65 |
| 15 | Division by a power-of-two denominator | 35.83 | 35.79 | 0.11 | 20.16 | 20.13 | 0.15 |
| 16 | Multiplication by a power-of-two factor | 35.63 | 35.60 | 0.08 | 20.05 | 20.02 | 0.15 |
| 17 | Integer versus character | 60.45 | 54.82 | 9.31 | 20.14 | 20.13 | 0.05 |
| 18 | Loop count down | 1483.21 | 1824.74 | −23.03 | 360.35 | 360.29 | 0.02 |
| 19 | Loop unrolling | 770.95 | 501.29 | 34.98 | 115.28 | 78.31 | 32.07 |
| 20 | Passing structures by reference | 509.51 | 44.75 | 91.22 | 474.79 | 10.06 | 97.88 |
| 21 | Pointer aliasing | 121.96 | 114.11 | 6.44 | 71.63 | 68.24 | 4.73 |
| 22 | Chains of pointers | 84.58 | 61.19 | 27.65 | 24.64 | 24.61 | 0.12 |
| 23 | Pre-increment versus post-increment | 2706.01 | 2707.06 | −0.04 | 692.16 | 690.97 | 0.17 |
| 24 | Linear search | 2487.52 | 2060.83 | 17.15 | 700.99 | 378.30 | 46.03 |
| 25 | Invariant IF statements within loops | 2161.82 | 1496.00 | 30.80 | 156.92 | 156.88 | 0.03 |
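Passing structures by reference (technique 20) is among the largest gains in the table above: 91.22% without optimization and 97.88% at optimization level 3 on the RPi 2B. The benchmark source is not reproduced in this record, so the following is only a minimal sketch of the technique. The `Sample` type and both function names are hypothetical, and the 64-element payload is an assumption chosen to make the copy non-trivial.

```cpp
#include <array>
#include <cassert>

// Hypothetical payload, large enough that copying it is not free.
struct Sample {
    std::array<double, 64> values;
    int id;
};

// Standard version: the whole structure is copied on every call.
double sum_by_value(Sample s) {
    double total = 0.0;
    for (double v : s.values) total += v;
    return total;
}

// Efficient version: only a reference is passed; const keeps the
// caller's object read-only, just as the copy did.
double sum_by_reference(const Sample& s) {
    double total = 0.0;
    for (double v : s.values) total += v;
    return total;
}
```

Both functions compute the same result. The only difference is whether the roughly 520-byte object is copied onto the callee's stack.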
Runtime results and percentages of improvement in Raspberry Pi 3B.
| # | Technique | Standard, without optimization (ns) | Efficient, without optimization (ns) | Improvement, without optimization (%) | Standard, optimization level 3 (ns) | Efficient, optimization level 3 (ns) | Improvement, optimization level 3 (%) |
|---|---|---|---|---|---|---|---|
| 1 | Bit fields | 55.09 | 28.35 | 48.54 | 21.05 | 20.16 | 4.23 |
| 2 | Boolean return | 32.84 | 25.23 | 23.17 | 12.65 | 12.61 | 0.32 |
| 3 | Cascaded function calls | 532.65 | 295.25 | 44.57 | 309.11 | 70.57 | 77.17 |
| 4 | Row-major accessing | 61,209.77 | 61,139.89 | 0.11 | 6796.87 | 5282.00 | 22.29 |
| 5 | Constructor initialization lists | 766.16 | 746.05 | 2.62 | 757.83 | 731.68 | 3.45 |
| 6 | Common subexpression elimination | 50.05 | 37.52 | 25.03 | 26.20 | 25.22 | 3.74 |
| 7 | Mapping structures | 399.75 | 524.69 | −31.25 | 381.80 | 371.36 | 2.73 |
| 8 | Dead code elimination | 21.03 | 15.12 | 28.10 | 10.96 | 10.92 | 0.36 |
| 9 | Exception handling | 5465.30 | 26.68 | 99.51 | 5460.63 | 6.67 | 99.88 |
| 10 | Global variables within loops | 382.89 | 281.08 | 26.59 | 74.11 | 74.07 | 0.05 |
| 11 | Function inlining | 37.88 | 23.54 | 37.86 | 12.54 | 12.50 | 0.32 |
| 12 | Global variables | 910.18 | 749.08 | 17.70 | 521.35 | 480.29 | 7.88 |
| 13 | Constants inside loops | 340.73 | 236.00 | 30.74 | 144.84 | 137.48 | 5.08 |
| 14 | Initialization versus assignment | 40.35 | 18.48 | 54.20 | 37.03 | 6.72 | 81.85 |
| 15 | Division by a power-of-two denominator | 23.41 | 23.35 | 0.26 | 12.55 | 12.51 | 0.32 |
| 16 | Multiplication by a power-of-two factor | 23.47 | 23.35 | 0.51 | 12.64 | 12.61 | 0.24 |
| 17 | Integer versus character | 40.07 | 35.86 | 10.51 | 12.64 | 12.60 | 0.32 |
| 18 | Loop count down | 1027.85 | 1292.02 | −25.70 | 269.68 | 188.72 | 30.02 |
| 19 | Loop unrolling | 532.41 | 330.48 | 37.93 | 93.45 | 57.18 | 38.81 |
| 20 | Passing structures by reference | 348.64 | 30.02 | 91.39 | 332.00 | 6.72 | 97.98 |
| 21 | Pointer aliasing | 84.31 | 78.39 | 7.02 | 48.47 | 45.86 | 5.38 |
| 22 | Chains of pointers | 60.54 | 43.69 | 27.83 | 16.82 | 16.79 | 0.18 |
| 23 | Pre-increment versus post-increment | 1867.60 | 1868.37 | −0.04 | 519.81 | 519.78 | 0.01 |
| 24 | Linear search | 1690.00 | 1368.81 | 19.01 | 435.44 | 284.90 | 34.57 |
| 25 | Invariant IF statements within loops | 1538.42 | 1038.22 | 32.51 | 123.54 | 123.50 | 0.03 |
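On the RPi 3B, initialization versus assignment (technique 14) improved runtime by 54.20% without optimization and 81.85% at level 3. The record does not say which type the benchmark used. The sketch below assumes a class-type object (here `std::string`) and contrasts default construction followed by assignment with direct construction. The function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Standard version: default-construct, then assign (two operations).
std::string make_label_assigned(const char* text) {
    assert(text != nullptr);  // std::string must not be built from a null pointer
    std::string label;        // default constructor runs here...
    label = text;             // ...then operator= replaces the contents
    return label;
}

// Efficient version: construct the object once, with its final value.
std::string make_label_initialized(const char* text) {
    assert(text != nullptr);
    std::string label(text);  // single constructor call
    return label;
}
```

The same idea applies to class members, where the constructor initialization list (technique 5) plays the role of direct construction.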
Runtime results and percentages of improvement in Raspberry Pi 3B+.
| # | Technique | Standard, without optimization (ns) | Efficient, without optimization (ns) | Improvement, without optimization (%) | Standard, optimization level 3 (ns) | Efficient, optimization level 3 (ns) | Improvement, optimization level 3 (%) |
|---|---|---|---|---|---|---|---|
| 1 | Bit fields | 47.92 | 24.67 | 48.52 | 18.18 | 17.41 | 4.24 |
| 2 | Boolean return | 28.35 | 21.77 | 23.21 | 10.90 | 10.88 | 0.18 |
| 3 | Cascaded function calls | 460.35 | 256.04 | 44.38 | 267.48 | 61.58 | 76.98 |
| 4 | Row-major accessing | 53,434.46 | 53,584.19 | −0.28 | 6764.78 | 4622.89 | 31.66 |
| 5 | Constructor initialization lists | 673.66 | 667.99 | 0.84 | 662.28 | 653.39 | 1.34 |
| 6 | Common subexpression elimination | 43.01 | 32.16 | 25.23 | 22.46 | 21.52 | 4.19 |
| 7 | Mapping structures | 338.54 | 460.12 | −35.91 | 324.79 | 323.35 | 0.44 |
| 8 | Dead code elimination | 17.94 | 12.86 | 28.32 | 9.50 | 9.43 | 0.74 |
| 9 | Exception handling | 4988.09 | 23.59 | 99.53 | 5056.53 | 6.03 | 99.88 |
| 10 | Global variables within loops | 330.50 | 243.72 | 26.26 | 63.54 | 63.81 | −0.42 |
| 11 | Function inlining | 32.21 | 20.01 | 37.88 | 10.93 | 10.88 | 0.46 |
| 12 | Global variables | 793.95 | 679.45 | 14.42 | 464.86 | 436.42 | 6.12 |
| 13 | Constants inside loops | 289.83 | 202.27 | 30.21 | 126.07 | 119.64 | 5.10 |
| 14 | Initialization versus assignment | 34.35 | 15.72 | 54.24 | 31.52 | 5.71 | 81.88 |
| 15 | Division by a power-of-two denominator | 20.08 | 20.01 | 0.35 | 10.93 | 10.88 | 0.46 |
| 16 | Multiplication by a power-of-two factor | 20.07 | 20.01 | 0.30 | 10.80 | 10.72 | 0.74 |
| 17 | Integer versus character | 34.88 | 31.20 | 10.55 | 10.77 | 10.72 | 0.46 |
| 18 | Loop count down | 889.60 | 1103.27 | −24.02 | 240.11 | 180.32 | 24.90 |
| 19 | Loop unrolling | 456.63 | 286.43 | 37.27 | 79.51 | 48.60 | 38.88 |
| 20 | Passing structures by reference | 301.95 | 26.29 | 91.29 | 285.03 | 5.85 | 97.95 |
| 21 | Pointer aliasing | 72.21 | 67.18 | 6.97 | 41.57 | 39.31 | 5.44 |
| 22 | Chains of pointers | 51.51 | 37.16 | 27.86 | 14.34 | 14.29 | 0.35 |
| 23 | Pre-increment versus post-increment | 1611.73 | 1652.95 | −2.56 | 469.88 | 486.87 | −3.62 |
| 24 | Linear search | 1455.65 | 1177.63 | 19.10 | 386.03 | 270.78 | 29.86 |
| 25 | Invariant IF statements within loops | 1321.56 | 895.18 | 32.26 | 105.90 | 105.86 | 0.04 |
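Exception handling (technique 9) gives the largest improvement on every board: 99.53% without optimization and 99.88% at level 3 on the RPi 3B+. The record does not describe the efficient variant. This sketch assumes it avoids throwing on an expected, frequent failure path and reports the failure through the return value instead. All names are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

// Standard style (assumed): an expected failure is signalled by throwing.
int parse_digit_throwing(char c) {
    if (c < '0' || c > '9') throw std::invalid_argument("not a digit");
    return c - '0';
}

int count_digits_with_exceptions(const char* s) {
    assert(s != nullptr);
    int n = 0;
    for (; *s != '\0'; ++s) {
        try {
            parse_digit_throwing(*s);
            ++n;
        } catch (const std::invalid_argument&) {
            // not a digit: skip it
        }
    }
    return n;
}

// Efficient style (assumed): the failure is reported via the return value.
bool parse_digit(char c, int& out) {
    if (c < '0' || c > '9') return false;
    out = c - '0';
    return true;
}

int count_digits_with_status(const char* s) {
    assert(s != nullptr);
    int n = 0;
    int value = 0;
    for (; *s != '\0'; ++s)
        if (parse_digit(*s, value)) ++n;
    return n;
}
```

Throwing and catching involves stack unwinding and runtime type matching, which costs far more than a branch on a `bool`. This is why exceptions are usually reserved for genuinely exceptional conditions.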
Figure 1. Percentages of improvement in the runtime achieved by writing efficient code (without compiler optimization).
Figure 2. Percentages of improvement in the runtime achieved by writing efficient code (with compiler optimization level 3).
Figure 3. Cascaded function calls. Percentage of improvement in the runtime according to the number of calls to the function (number of iterations of the loop) in RPi 3B+.
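The record does not define "cascaded function calls" precisely. One common reading is a chain of accessor calls that is re-evaluated on every loop iteration, and the sketch below assumes that reading. The efficient version evaluates the chain once and keeps the result in a local. On the RPi 3B+ the technique gained 44.38% without optimization and 76.98% at level 3. The `Config`, `Device` and `System` types and the function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical object graph reached through a chain of accessors.
struct Config { int scale; };
struct Device {
    Config cfg;
    const Config& config() const { return cfg; }
};
struct System {
    Device dev;
    const Device& device() const { return dev; }
};

// Standard version: the call chain runs once per element.
long scaled_sum_cascaded(const System& sys, const std::vector<int>& v) {
    long total = 0;
    for (int x : v)
        total += static_cast<long>(x) * sys.device().config().scale;
    return total;
}

// Efficient version: the chain runs once, before the loop.
long scaled_sum_cached(const System& sys, const std::vector<int>& v) {
    const int scale = sys.device().config().scale;
    long total = 0;
    for (int x : v) total += static_cast<long>(x) * scale;
    return total;
}
```

Caching is only valid when nothing inside the loop can change the value at the end of the chain, which holds here because the loop only reads `sys`.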
Figure 4. Row-major accessing. Percentage of improvement in the runtime according to the matrix size in RPi 3B+.
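C and C++ store multi-dimensional arrays row by row. Traversing a matrix with the column index in the inner loop therefore touches adjacent memory, while swapping the loops strides across whole rows. On the RPi 3B+ the tables show essentially no difference without optimization (-0.28%) but a 31.66% gain at level 3. The matrix type used in the paper is not given. This sketch assumes a flat `std::vector` with hypothetical function names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Matrix stored contiguously, row by row: element (r, c) is at r * cols + c.

// Standard version: column-by-column traversal, stride of `cols` elements.
long sum_column_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}

// Efficient version: row-by-row traversal, consecutive addresses.
long sum_row_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}
```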
Figure 5. Loop count down. Percentages of improvement in the runtime according to the array size (number of elements) in RPi 3B+.
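Counting a loop down to zero is a classic micro-optimization. The usual rationale is that the loop can end by testing the decremented counter against zero rather than comparing it with a separate bound. The measurements suggest this pays off only with optimization enabled: on the RPi 3B+ it was 24.02% slower without optimization and 24.90% faster at level 3. Below is a minimal sketch with hypothetical names. The `i-- > 0` form is used so that an unsigned index never wraps inside the loop body.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Standard version: the index counts up and is compared against the size.
long sum_count_up(const std::vector<int>& v) {
    long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    return total;
}

// Count-down version: the loop ends when the index reaches zero.
long sum_count_down(const std::vector<int>& v) {
    long total = 0;
    for (std::size_t i = v.size(); i-- > 0;) total += v[i];
    return total;
}
```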
Figure 6. Loop unrolling in RPi 3B+. Percentages of improvement in the runtime according to the number of iterations. (a) Array size of 50 elements. (b) Array size of 100 elements. (c) Array size of 200 elements. (d) Array size of 300 elements.
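Loop unrolling processes several elements per iteration, so the loop-control overhead (increment, compare, branch) is paid less often. On the RPi 3B+ it gained 37.27% without optimization and 38.88% at level 3. The unrolling factor used in the paper is not stated. This sketch assumes a factor of 4 plus a tail loop for the leftover elements. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Standard version: one element per iteration.
long sum_rolled(const std::vector<int>& v) {
    long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    return total;
}

// Unrolled by 4: four elements per iteration, then up to three leftovers.
long sum_unrolled4(const std::vector<int>& v) {
    const std::size_t n = v.size();
    long total = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)  // written as i + 4 <= n to avoid unsigned underflow
        total += static_cast<long>(v[i]) + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; ++i)          // remainder when n is not a multiple of 4
        total += v[i];
    return total;
}
```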
Figure 7. Linear search with for loops. Percentages of improvement in the runtime according to the array size (number of elements) in RPi 3B+.
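The caption does not say how the efficient linear search differs from the standard one. A well-known option, assumed here, is the sentinel technique. Appending the key guarantees the loop will find it, so each iteration only compares values and the bounds check moves outside the loop. On the RPi 3B+ the efficient version gained 19.10% without optimization and 29.86% at level 3. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Standard version: every iteration checks both the bound and the key.
std::ptrdiff_t find_standard(const std::vector<int>& v, int key) {
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] == key) return static_cast<std::ptrdiff_t>(i);
    return -1;
}

// Sentinel version: the key is appended, so the loop needs no bound check.
std::ptrdiff_t find_sentinel(std::vector<int>& v, int key) {
    const std::size_t n = v.size();
    v.push_back(key);              // sentinel guarantees termination
    std::size_t i = 0;
    while (v[i] != key) ++i;
    v.pop_back();                  // restore the caller's vector
    return i < n ? static_cast<std::ptrdiff_t>(i) : -1;
}
```

The vector must be writable, and `push_back` may reallocate. A benchmark would typically reserve the spare slot once, outside the timed region.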