M F Siddiqui, A W Reza, J Kanesan, H Ramiah.
Abstract
There is wide interest in finding low-power, area-efficient hardware designs for the discrete cosine transform (DCT) algorithm. This work proposes a novel pipelined DCT architecture based on Common Subexpression Elimination (CSE). The design aims to reduce the power and area cost metrics while maintaining high speed and accuracy in DCT applications. It combines Canonical Signed Digit (CSD) representation with CSE to implement multiplier-less fixed-constant multiplication of the DCT coefficients. In addition, the symmetry of the DCT coefficient matrix is exploited together with CSE to further reduce the number of arithmetic operations. The architecture needs only a single-port memory to feed its inputs, rather than a multiport memory, which reduces hardware cost and area. Experimental results and performance comparisons show that the proposed scheme uses minimal logic, requiring only 340 slices and 22 adders. Moreover, the design meets the real-time constraints of various video/image coders and their peak signal-to-noise ratio (PSNR) requirements. Furthermore, compared with recent well-known methods, the proposed technique maintains accuracy while improving power consumption, silicon area usage, and maximum operating frequency by 41%, 15%, and 15%, respectively.
Year: 2014 PMID: 25133249 PMCID: PMC4124737 DOI: 10.1155/2014/620868
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Number of arithmetic operations for 8-point DCT computation of some well-known algorithms.
| Algorithm | Additions | Multiplications |
|---|---|---|
| Loeffler et al. [ | 29 | 11 |
| Suehiro and Hatori [ | 29 | 12 |
| Lee [ | 29 | 12 |
| Wang [ | 29 | 13 |
| Chen et al. [ | 26 | 16 |
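A direct 8 × 8 matrix-vector product costs 64 multiplications and 56 additions. The counts above are much lower because fast algorithms exploit the symmetry of the DCT coefficient matrix, which the proposed design also uses alongside CSE. The sketch below shows one such step, the even/odd (butterfly) split. It is a generic illustration under the orthonormal DCT-II definition, not the exact factorization of any algorithm in the table or of the proposed architecture, and the function names are hypothetical.

The split rests on the identity C[k][7-n] = (-1)^k C[k][n]. Four additions and four subtractions on the inputs therefore reduce the product to two 4 × 4 products, which halves the multiplications to 32.

```python
import math

N = 8  # 8-point DCT


def coeff(k, n):
    """Orthonormal DCT-II matrix entry C[k][n]."""
    scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))


def dct8_direct(x):
    """Plain matrix-vector product: 64 multiplications, 56 additions."""
    return [sum(coeff(k, n) * x[n] for n in range(N)) for k in range(N)]


def dct8_even_odd(x):
    """Same result using the symmetry C[k][7-n] = (-1)**k * C[k][n]."""
    # Butterfly stage: 4 additions and 4 subtractions on the inputs.
    s = [x[n] + x[N - 1 - n] for n in range(N // 2)]
    d = [x[n] - x[N - 1 - n] for n in range(N // 2)]
    out = [0.0] * N
    for k in range(0, N, 2):  # even rows are symmetric: use the sums
        out[k] = sum(coeff(k, n) * s[n] for n in range(N // 2))
    for k in range(1, N, 2):  # odd rows are antisymmetric: use the differences
        out[k] = sum(coeff(k, n) * d[n] for n in range(N // 2))
    return out


x = [10, -3, 7, 0, 5, 2, -8, 4]
print([round(v, 4) for v in dct8_even_odd(x)])
```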
Figure 1: 2D-DCT using the separability property.
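Separability means an 8 × 8 2D-DCT can be computed as 16 8-point 1D-DCTs: first along the 8 rows, then along the 8 columns of the intermediate result. The minimal Python sketch below checks this row-column result against the direct double sum. It assumes the orthonormal DCT-II scaling. With c(k) = 1/2 for k > 0, the matrix entries include the cos(kπ/16)/2 values listed in the coefficient table. All function names are illustrative.

```python
import math

N = 8


def scale(k):
    """Orthonormal DCT-II scale factor c(k)."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct_1d(x):
    """8-point orthonormal DCT-II of a sequence."""
    return [scale(k) * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                           for n in range(N))
            for k in range(N)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct_2d_separable(block):
    """Row-column method: 1D-DCT on the 8 rows, then on the 8 columns."""
    rows_done = [dct_1d(row) for row in block]
    return transpose([dct_1d(col) for col in transpose(rows_done)])


def dct_2d_direct(block):
    """Direct double sum over the block, used only as a reference."""
    return [[scale(u) * scale(v) * sum(
                block[m][n]
                * math.cos((2 * m + 1) * u * math.pi / (2 * N))
                * math.cos((2 * n + 1) * v * math.pi / (2 * N))
                for m in range(N) for n in range(N))
             for v in range(N)]
            for u in range(N)]


block = [[(3 * m + 5 * n) % 17 for n in range(N)] for m in range(N)]
sep = dct_2d_separable(block)
ref = dct_2d_direct(block)
```

In hardware, this decomposition lets one 1D-DCT datapath be reused for both passes. The code only demonstrates that the two computations agree numerically.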
12-bit precision DCT coefficients in CSD form and the resulting reduction in nonzero bits.
| Coefficient | Decimal value | Binary representation | CSD representation | Reduction in nonzero bits |
|---|---|---|---|---|
| | 0.4904 | 0.011111011000 | | 4 |
| | 0.4619 | 0.011101100100 | | 2 |
| | 0.4157 | 0.011010100110 | | 0 |
| | 0.3536 | 0.010110101000 | | 0 |
| | 0.2778 | 0.010001110001 | | 1 |
| | 0.1913 | 0.001100001111 | | 2 |
| | 0.0975 | 0.000110001111 | | 2 |
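The decimal values in this table match cos(kπ/16)/2 for k = 1 to 7. Truncating each to 12 fractional bits reproduces the binary column. The sketch below assumes that quantization, which is inferred from the table rather than stated in it. It converts each value to canonical signed digit form with the standard non-adjacent-form recurrence, then recomputes the reduction column. The symbol N for a -1 digit and the helper names are illustrative choices, not the paper's notation.

```python
import math

FRAC_BITS = 12  # 12-bit fractional precision, as in the table


def to_csd(n):
    """CSD (non-adjacent form) digits of a non-negative integer, LSB first."""
    digits = []
    while n != 0:
        if n % 2 == 0:
            d = 0
        else:
            d = 2 - (n % 4)  # +1 if n % 4 == 1, -1 if n % 4 == 3
        digits.append(d)
        n = (n - d) // 2
    return digits


def csd_string(digits, width):
    """Render digits MSB first; 'N' stands for a -1 digit."""
    symbols = {1: "1", 0: "0", -1: "N"}
    padded = digits + [0] * (width - len(digits))
    return "".join(symbols[d] for d in reversed(padded))


rows = []
for k in range(1, 8):
    value = math.cos(k * math.pi / 16) / 2
    q = math.floor(value * (1 << FRAC_BITS))  # truncate to 12 fractional bits
    binary = format(q, f"0{FRAC_BITS}b")
    csd = to_csd(q)
    saved = binary.count("1") - sum(1 for d in csd if d != 0)
    rows.append((value, binary, csd_string(csd, FRAC_BITS), saved))

for value, binary, csd, saved in rows:
    print(f"{value:.4f}  0.{binary}  0.{csd}  {saved}")
```

Each nonzero digit costs one adder or subtractor in a shift-and-add multiplier. The reduction column therefore counts add/sub operations saved per coefficient before any subexpression sharing.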
Figure 2: Proposed 5-stage pipelined DCT architecture.
Selector S1 outputs for each value of the select signal Sel1.
| Sel1 | Output 1 | Output 2 | Output 3 | Output 4 |
|---|---|---|---|---|
| 00 | | | | |
| 01 | | | | |
| 10 | | | | |
| 11 | | | | |
Selector S2 outputs for each value of the select signal Sel2.
| Sel2 | Output 1 | Output 2 |
|---|---|---|
| 0 | | |
| 1 | | |
Platforms used in the experiments, with power analysis.
| Parameter | Xilinx FPGA | Xilinx FPGA | Xilinx FPGA | Altera FPGA |
|---|---|---|---|---|
| Family | Virtex-II Pro | Virtex-II Pro | Spartan-3 | Cyclone II |
| Device | XC2VP30 | XC2VP50 | XC3S200 | EP2C35 |
| Speed grade | −5 | −5 | −5 | 6 |
| Design voltage (V) | 1.4 | 1.4 | 1.2 | 1.2 |
| Max. clock frequency (MHz) | 205 | 205 | 163.84 | 191.79 |
| Static power (mW) | 768 | 768 | 44 | 83 |
| Dynamic power (mW) | 66 | 66 | 23 | 42 |
Resource usage and DCT computation cycles of the proposed architecture.
| Resource | Value |
|---|---|
| Total number of adders | 9 |
| Total number of subtractors | 6 |
| Total number of add/sub | 7 |
| Total number of fixed shifts | 13 |
| Total number of selectors | 2 |
| DSP slices | 0 |
| Memory modules | 0 |
| Total number of clock cycles for computing 1D-DCT | 4 + 8 |
| Total number of clock cycles for computing 8 × 8 2D-DCT | 12 + 64 |
Macrostatistics of 1D-DCT implementations.
| Method | [ | [ | [ | [ | [ | [ | [ | Proposed |
|---|---|---|---|---|---|---|---|---|
| Adders | 84 | 72 | 69 | 67 | 56 | 31 | 26 | 22 |
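Adder counts fall as more shift-and-add work is shared between coefficient multiplications. The sketch below is a minimal, generic illustration of CSE, not the specific sharing pattern of the proposed New-CSE scheme. It uses the 12-bit codes from the coefficient table's binary column: 2008/4096 for 0.4904 and 1892/4096 for 0.4619. Their CSD forms are 2048 - 32 - 8 and 2048 - 128 - 32 + 4, and both contain the term 2048 - 32. Computing 2048x - 32x once brings the two constant multiplications from 5 add/sub operations down to 4. The function names are hypothetical.

```python
def times_2008_and_1892(x):
    """Multiply integer sample x by 2008 and 1892 with shifts and add/sub only.

    2008 = 2048 - 32 - 8 and 1892 = 2048 - 128 - 32 + 4 (CSD forms), so the
    term 2048*x - 32*x is a common subexpression computed once.
    """
    shared = (x << 11) - (x << 5)         # 1 subtraction, reused twice
    p2008 = shared - (x << 3)             # +1 subtraction
    p1892 = shared - (x << 7) + (x << 2)  # +1 subtraction, +1 addition
    return p2008, p1892                   # 4 add/sub in total instead of 5


def to_q12(product):
    """Drop the 12 fractional bits (arithmetic shift, floors negative values)."""
    return product >> 12


p2008, p1892 = times_2008_and_1892(100)
print(p2008, p1892, to_q12(p2008), to_q12(p1892))
```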
Performance analysis of different 1D-DCT architectures on Xilinx FPGAs.
| FPGA chip | XC2VP30 | XC2VP30 | XC2VP50 | XC2VP50 | XC3S200 | XC3S200 |
|---|---|---|---|---|---|---|
| Architecture | [ | Proposed | [ | Proposed | [ | Proposed |
| Implementation | DA | CSD + New-CSE | CSD + CSE | CSD + New-CSE | DA | CSD + New-CSE |
| Precision (bits) | 9 | 12 | 11 | 12 | 9 | 12 |
| Number of slices | 936 | 347 | 454 | 347 | 793 | 340 |
| Operating clock frequency (MHz) | 99 | 205 | 119 | 120 | 61 | 163.84 |
| Dynamic power dissipation (mW) | 83.4 | 66 | 39 | 35 | 45 | 23 |
| Multiport input memory (number of read ports) | Yes (8) | No (1) | Yes (8) | No (1) | Yes (8) | No (1) |
Performance analysis of different 1D-DCT architectures on Altera FPGA.
| FPGA chip | Cyclone II (EP2C35F672C6) | Cyclone II (EP2C35F672C6) | Cyclone II (EP2C35F672C6) |
|---|---|---|---|
| Architecture | [ | [ | Proposed |
| Implementation | Modified Loeffler | Modified Loeffler | CSD + New-CSE |
| Precision (bits) | 12 | 12 | 12 |
| Logic elements | 1146 | 1109 | 713 |
| Operating clock frequency (MHz) | 128.25 | 139.55 | 191.79 |
| Dynamic power dissipation (mW) | 57 | 52 | 42 |
| Multiport input memory (number of read ports) | Yes (8) | Yes (8) | No (1) |
Figure 3: Dynamic power consumption estimation per sample with a 1.2 V design.
Proposed design applications to various image/video standards (8 × 8 block size).
| Applications | Data rate | Operating frequency (MHz) | Number of frames computed | Dynamic power consumption |
|---|---|---|---|---|
| JPEG | 640 × 480 | 0.38 | 1 | 0.01 |
| H.263-QCIF | 176 × 144 × 10 | 0.26 | 10 | 0.01 |
| H.263-CIF | 352 × 288 × 15 | 1.52 | 15 | 0.07 |
| MPEG-1 | 352 × 240 × 30 | 2.54 | 30 | 0.11 |
| MPEG-2 | 720 × 480 × 30 | 10.37 | 30 | 0.45 |
| MPEG-2 (PAL) | 720 × 576 × 25 | 10.37 | 25 | 0.45 |
| MPEG-2 (HD1) | 1440 × 1080 × 30 | 46.66 | 30 | 2.04 |
| MPEG-2 (HD2) | 1920 × 1080 × 30 | 62.21 | 30 | 2.73 |
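Most of the operating-frequency column is consistent with the design consuming one pixel sample per clock cycle. Under that assumption, which is inferred here rather than stated in the table, the required clock in MHz is width × height × frame rate / 10^6.

- **Exact match:** for H.263-CIF and the four MPEG-2 rows, the sketch reproduces the tabulated values to two decimals.
- **Off by 0.01:** for H.263-QCIF and MPEG-1 it gives 0.25 and 2.53.
- **Not reproduced:** for the single 640 × 480 JPEG frame it gives 0.31 rather than 0.38.

So the table may include some overhead or rounding that this simple model does not capture. The function name is hypothetical.

```python
def required_clock_mhz(width, height, fps):
    """Clock needed if the DCT core accepts one pixel sample per cycle (assumption)."""
    return width * height * fps / 1e6


formats = {
    "H.263-CIF": (352, 288, 15),
    "MPEG-2": (720, 480, 30),
    "MPEG-2 (PAL)": (720, 576, 25),
    "MPEG-2 (HD1)": (1440, 1080, 30),
    "MPEG-2 (HD2)": (1920, 1080, 30),
}

for name, (w, h, fps) in formats.items():
    print(f"{name:14s} {required_clock_mhz(w, h, fps):6.2f} MHz")
```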
Figure 4: Original standard test images and their reconstructed images. (a) Original image “Peppers”, (b) reconstructed image “Peppers” (PSNR = 52.94 dB), (c) original image “Lena”, (d) reconstructed image “Lena” (PSNR = 54.04 dB), (e) original image “Goldhill”, (f) reconstructed image “Goldhill” (PSNR = 54.64 dB), (g) original image “Mandrill”, (h) reconstructed image “Mandrill” (PSNR = 53.82 dB).
Figure 5: PSNR analysis on different standard test images.
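The figures do not restate the PSNR definition. The sketch below assumes the usual one for 8-bit images, PSNR = 10 log10(255² / MSE) dB, so the peak value of 255 is an assumption. It is a minimal pure-Python version operating on grayscale images stored as lists of rows; the names are illustrative. For reference, a uniform error of one gray level (MSE = 1) gives about 48.13 dB. Under this definition, the reported 52.94 to 54.64 dB therefore correspond to an MSE below 1.

```python
import math


def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equal-size grayscale images (lists of rows)."""
    if len(original) != len(reconstructed) or any(
        len(a) != len(b) for a, b in zip(original, reconstructed)
    ):
        raise ValueError("images must have the same dimensions")
    squared_error = 0
    count = 0
    for row_a, row_b in zip(original, reconstructed):
        for a, b in zip(row_a, row_b):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


img = [[(7 * m + 3 * n) % 256 for n in range(8)] for m in range(8)]
rec = [[p + 1 for p in row] for row in img]  # every pixel off by one level
print(f"{psnr(img, rec):.2f} dB")
```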