Luis Rodríguez-Flores, Miguel Morales-Sandoval, René Cumplido, Claudia Feregrino-Uribe, Ignacio Algredo-Badillo.
Abstract
Security is a crucial requirement in the envisioned applications of the Internet of Things (IoT), where most of the underlying computing platforms are embedded systems with reduced computing capabilities and energy constraints. In this paper we present the design and evaluation of a scalable low-area FPGA hardware architecture that serves as a building block to accelerate the costly operations of exponentiation and multiplication in GF(p), commonly required in security protocols relying on public key encryption, such as key agreement, authentication and digital signature. The proposed design can process operands of different sizes using the same datapath, and it exhibits a significant reduction in area without loss of efficiency when compared to representative state-of-the-art designs. For example, our design uses 96% less standard logic than a similar design optimized for performance, and 46% fewer resources than another design optimized for area. Even using fewer area resources, our design still performs better than its embedded software counterparts (190x and 697x).
Year: 2018 PMID: 29360824 PMCID: PMC5779673 DOI: 10.1371/journal.pone.0190939
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
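The exponentiation mentioned in the abstract is computed with the Montgomery Powering Ladder (MPL), as the figure and table captions further down indicate. As a rough software illustration only, the ladder can be sketched as follows. The function name `montgomery_ladder_pow` is hypothetical, and plain `%` reductions stand in for the hardware Montgomery multiplier; this is not the authors' implementation.

```python
def montgomery_ladder_pow(base: int, exponent: int, modulus: int) -> int:
    """Compute base**exponent mod modulus with the Montgomery powering ladder.

    Illustrative sketch: every exponent bit triggers exactly one
    multiplication and one squaring, whatever the bit value is.
    """
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    if exponent < 0:
        raise ValueError("exponent must be non-negative")

    r0 = 1 % modulus        # invariant: r1 == r0 * base (mod modulus)
    r1 = base % modulus
    # Scan the exponent from the most significant bit to the least.
    for i in reversed(range(exponent.bit_length())):
        if (exponent >> i) & 1:
            r0 = (r0 * r1) % modulus
            r1 = (r1 * r1) % modulus
        else:
            r1 = (r0 * r1) % modulus
            r0 = (r0 * r0) % modulus
    return r0
```

Because both branches perform one multiplication and one squaring, the operation count depends only on the exponent length, which is one reason the ladder suits a fixed hardware datapath.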
Notation.
| Symbol | Description |
|---|---|
| | Operand size in bits |
| | Total |
| | The modulus defining |
| | Elements in |
| | Radix |
| | Precomputed value, |
| | The |
| | Exponent |
| | The |
| | Exponent size in bits |
| | Value of |
Fig 1. A computation in a digit by digit approach.
Fig 2. New digit-digit Montgomery multiplier architecture, memory and result reside in memory blocks.
Fig 3. Digit-digit Montgomery Powering Ladder architecture.
Fig 4. Implementation results of the Montgomery multiplier (Fig 2) in the Virtex-7 FPGA.
Fig 5. Implementation results of the MPL architecture in the Virtex-7 FPGA.
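
To make the digit-digit idea behind the multiplier figures concrete, here is a minimal software sketch of Montgomery multiplication in which both operands are scanned in k-bit digits. It follows the well-known coarsely integrated operand scanning (CIOS) schedule, which is an assumption chosen for illustration and may differ from the authors' datapath. The names `to_digits` and `montgomery_multiply` are hypothetical, and `pow(p, -1, m)` requires Python 3.8 or later.

```python
def to_digits(value: int, k: int, n: int) -> list:
    """Split value into n little-endian digits of k bits each."""
    mask = (1 << k) - 1
    return [(value >> (k * i)) & mask for i in range(n)]


def montgomery_multiply(x: int, y: int, p: int, k: int, n: int) -> int:
    """Return x * y * R^-1 mod p with R = 2**(k*n), working on k-bit digits.

    Assumes p is odd, p < 2**(k*n), and 0 <= x, y < p.
    """
    mask = (1 << k) - 1
    xd, yd, pd = to_digits(x, k, n), to_digits(y, k, n), to_digits(p, k, n)
    p_prime = (-pow(p, -1, 1 << k)) & mask  # -p^-1 mod 2**k
    t = [0] * (n + 2)                       # accumulator digits

    for i in range(n):
        # Accumulate the partial product x_i * y, one digit of y at a time.
        carry = 0
        for j in range(n):
            s = t[j] + xd[i] * yd[j] + carry
            t[j], carry = s & mask, s >> k
        s = t[n] + carry
        t[n], t[n + 1] = s & mask, s >> k

        # Add q * p so the lowest digit becomes zero, then shift one digit.
        q = (t[0] * p_prime) & mask
        carry = (t[0] + q * pd[0]) >> k
        for j in range(1, n):
            s = t[j] + q * pd[j] + carry
            t[j - 1], carry = s & mask, s >> k
        s = t[n] + carry
        t[n - 1] = s & mask
        t[n] = t[n + 1] + (s >> k)

    result = sum(d << (k * i) for i, d in enumerate(t[: n + 1]))
    return result - p if result >= p else result
```

With `k = 4` and `n = 1`, `montgomery_multiply(5, 7, 13, 4, 1)` gives `3`, which matches `5 * 7 * 16^-1 mod 13`. Only k-bit digit products are needed inside the loops, so a single small multiplier can be reused for operands of any size.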
Results and comparison for a 1024-bit exponentiation.
| Work | Alg. | Op. size (bits) | FPGA | Area | BRAMs | DSPs | Freq (MHz) | avg Cyc (×10³) | avg T (ms) | Thrg (kbps) | Efficiency (Thrg/Area) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| our (k = 16) | MPL | 1024 | Z-7010 | | 3 | 6 | 106.38 | 4265 | 40.10 | 25.535 | 0.234 |
| our (k = 32) | MPL | 1024 | Z-7010 | | 5 | 22 | 68.49 | 1087 | 15.76 | 64.49 | 0.258 |
| [ | MPL | 1024 | Spartan3E | 3899 | 16 | 20 | 119.05 | 946 | 7.95 | 128.84 | 0.033 |
| our (k = 16) | MPL | 1024 | Spartan3E | | 6 | 6 | 77.16 | 4265 | 55.29 | 18.521 | |
| our (k = 32) | MPL | 1024 | Spartan3E | | 6 | 22 | 54.59 | 1087 | 19.93 | 51.387 | |
| [ | MSB | 1024 | Virtex-5 | 7303 | - | - | 384.62 | 529 | 1.38 | 744.60 | 0.102 |
| [ | LSB | 1024 | Virtex-5 | 6217 | - | - | 222.11 | 397 | 1.79 | 572.50 | 0.092 |
| [ | LSB | 1024 | Virtex-5 | 4060 | - | - | 384.62 | 793 | 2.03 | 503.60 | 0.124 |
| [ | MPL | 1024 | Virtex-5 | 3218 | - | - | 346.02 | 1097 | 3.18 | 322.01 | 0.100 |
| [ | LSB | 1024 | Virtex-5 | 6776 | - | - | 401 | - | 1.37 | 747.4 | 0.110 |
| [ | MSB | 1024 | Virtex-5 | 12716 | - | - | 401 | - | 0.92 | 1113 | 0.087 |
| our (k = 16) | MPL | 1024 | Virtex-5 | | 6 | 8 | 190.84 | 4265 | 22.35 | 45.809 | |
| our (k = 32) | MPL | 1024 | Virtex-5 | | 6 | 22 | 73.91 | 1087 | 14.71 | 69.605 | |
| [ | LSB | 512 | Virtex-7 | 343 | - | 14 | 458 | - | 1.23 | 416.26 | 1.214 |
| our (k = 16) | MPL | 512 | Virtex-7 | | 6 | 8 | 193.12 | 543 | 2.82 | 181.85 | |
| [ | LSB | 1024 | Virtex-7 | 1060 | - | 26 | 485 | - | 2.33 | 439.48 | 0.415 |
| our (k = 64) | MPL | 1024 | Virtex-7 | | 10 | 66 | 80.21 | 284 | 3.55 | 288.55 | |
| [ | LSB | 2048 | Virtex-7 | 3558 | - | 54 | 399 | - | 5.68 | 360.56 | 0.101 |
| our (k = 64) | MPL | 2048 | Virtex-7 | | 10 | 66 | 81.11 | 2174 | 26.82 | 76.37 | |
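
The timing columns of this table are related by simple arithmetic, which the sketch below reproduces for two rows. It assumes units inferred from the numbers rather than stated in the record: frequency in MHz, average cycles in thousands, time in ms, throughput in kbps, and efficiency as throughput divided by area. The function name `exponentiation_metrics` is hypothetical.

```python
def exponentiation_metrics(bits, freq_mhz, kilo_cycles, area=None):
    """Derive time (ms), throughput (kbps) and efficiency from one table row.

    Units are assumptions inferred from the table values.
    """
    # (kilo_cycles * 1e3 cycles) / (freq_mhz * 1e6 Hz), expressed in ms.
    time_ms = kilo_cycles / freq_mhz
    # Bits per millisecond is the same number as kilobits per second.
    throughput_kbps = bits / time_ms
    # Efficiency is only defined when an area figure is available.
    efficiency = throughput_kbps / area if area else None
    return time_ms, throughput_kbps, efficiency


# Z-7010 row with k = 16: 4265 thousand cycles at 106.38 MHz.
z7010_k16 = exponentiation_metrics(1024, 106.38, 4265)

# Spartan3E reference row, which also reports an area of 3899.
spartan_ref = exponentiation_metrics(1024, 119.05, 946, area=3899)
```

The first call yields roughly 40.09 ms and 25.5 kbps, and the second roughly 7.95 ms, 128.9 kbps and an efficiency near 0.033, all within rounding of the tabulated values.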
Supply power (W) of the MPL architecture.
| Size | k | Clocks | Logic | Signals | BRAMs | DSPs | IOs | Dynamic | Quiescent | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| 1024 | 8 | 0.005 | 0.003 | 0.008 | 0.021 | 0.006 | 0.007 | 0.049 | 0.178 | 0.227 |
| 1024 | 16 | 0.007 | 0.004 | 0.012 | 0.017 | 0.008 | 0.013 | 0.061 | 0.178 | 0.239 |
| 1024 | 64 | 0.006 | 0.015 | 0.032 | 0.036 | 0.023 | 0.021 | 0.132 | 0.178 | 0.311 |
| 2048 | 16 | 0.007 | 0.004 | 0.015 | 0.021 | 0.008 | 0.013 | 0.069 | 0.178 | 0.247 |
| 2048 | 64 | 0.006 | 0.014 | 0.029 | 0.036 | 0.023 | 0.021 | 0.128 | 0.178 | 0.307 |
GF(p) exponentiation in software vs. proposed MPL compact hardware architecture.
| Ref. | Imp. | Time |
|---|---|---|
| [ | MSP430 @ 8MHz | ≈3 s |
| [ | ATmega128 @ 8MHz | 10.99 s |
| [ | WSN Software | 22.03 s |
| our(k = 64) | Virtex-7 | 3.55 ms |
| our(k = 32) | Virtex-5 | 14.71 ms |
| our(k = 32) | Zynq-Z7010 | 15.76 ms |
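
The 190x and 697x figures quoted in the abstract appear to correspond to the first two software rows divided by the Zynq-Z7010 time; that pairing is an inference from the numbers, not something the record states. A short check, with the hypothetical helper `speedup`:

```python
def speedup(software_seconds: float, hardware_milliseconds: float) -> float:
    """Ratio of software time to hardware time, with both converted to seconds."""
    return software_seconds / (hardware_milliseconds / 1000.0)


# Times copied from the table above.
software_times_s = {
    "MSP430 @ 8MHz": 3.0,      # listed as approximately 3 s
    "ATmega128 @ 8MHz": 10.99,
    "WSN Software": 22.03,
}
zynq_k32_ms = 15.76

speedups = {
    name: round(speedup(seconds, zynq_k32_ms))
    for name, seconds in software_times_s.items()
}
```

Under this assumption the first two entries round to 190 and 697, in agreement with the abstract.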
Fig 6. Proposed hardware-software co-design for in-circuit verification of the MPL exponentiator hardware architecture in the Zynq Z-7010 MicroZed.
Area usage of the hardware-software co-design implementation in the MicroZed board.
| Resource | Used | Available | Utilization (%) |
|---|---|---|---|
| Slices | 459 | 4400 | 10.43 |
| DSP48E1 | 22 | 80 | 27.50 |
| RAMB36E1 | 4 | 60 | 6.67 |
| RAMB18E1 | 2 | 120 | 1.67 |
Supply power (W) for the SoC in the MicroZed board.
| Component | Power (W) |
|---|---|
| Clocks | 0.004 |
| Signals | 0.010 |
| Logic | 0.006 |
| BRAM | 0.013 |
| DSP | 0.013 |
| Zynq PS | 1.529 |
| Dynamic | 1.575 |
| Device Static | 0.134 |
| Total On-Chip Power | 1.709 |
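
As a quick consistency check on this breakdown, the component figures can be summed and compared with the reported dynamic and total power. The grouping of the five fabric-related entries into a "programmable logic" share is an assumption made here for illustration, and the variable names are hypothetical.

```python
# Component power in watts, copied from the table above.
components_w = {
    "Clocks": 0.004,
    "Signals": 0.010,
    "Logic": 0.006,
    "BRAM": 0.013,
    "DSP": 0.013,
    "Zynq PS": 1.529,
}
device_static_w = 0.134

# Dynamic power is the sum of all listed components.
dynamic_w = sum(components_w.values())
# Total on-chip power adds the static contribution.
total_w = dynamic_w + device_static_w
# Assumed split: everything except the processing system (PS).
fabric_w = dynamic_w - components_w["Zynq PS"]
ps_share = components_w["Zynq PS"] / total_w
```

The sums reproduce the reported 1.575 W dynamic and 1.709 W total. Under the assumed split, the fabric entries account for about 0.046 W, while the processing system makes up close to 90% of the on-chip power.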