Riccardo Peloso, Maurizio Capra, Luigi Sole, Massimo Ruo Roch, Guido Masera, Maurizio Martina.
Abstract
In recent years, the need for new, efficient video compression methods has grown rapidly as frame resolutions have increased dramatically. In 2013, the Joint Collaborative Team on Video Coding (JCT-VC) produced the H.265/High Efficiency Video Coding (HEVC) standard, which represents the state of the art in video coding standards. Nevertheless, new algorithms and techniques to improve coding efficiency have since been proposed. One promising approach embeds directional capabilities into the transform stage. Recently, the Steerable Discrete Cosine Transform (SDCT) has been proposed to exploit a directional DCT using a basis with different orientation angles. The SDCT yields a sparser representation, which translates into improved coding efficiency. Preliminary results show that the SDCT can be embedded into the HEVC standard, providing better compression ratios. This paper presents a hardware architecture for the SDCT that operates at 188 MHz and reaches a throughput of 3.00 GSample/s. In particular, this architecture supports 8K Ultra High Definition (UHD) (7680 × 4320) at a frame rate of 60 Hz, one of the highest resolutions supported by HEVC.
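As a quick arithmetic check of the 8K claim, the Python sketch below computes the sample rate that 7680 × 4320 at 60 Hz requires. It assumes 4:2:0 chroma subsampling (one luma plane plus two quarter-size chroma planes, i.e. 1.5 samples per pixel), which the abstract does not state; the constant and function names are illustrative only.

```python
# Sample-rate check for 8K UHD at 60 Hz.
# Assumption (not stated in the abstract): 4:2:0 chroma subsampling,
# i.e. 1 luma + 2 quarter-size chroma planes = 1.5 samples per pixel.
WIDTH, HEIGHT, FRAME_RATE_HZ = 7680, 4320, 60
SAMPLES_PER_PIXEL_420 = 1.5
REPORTED_THROUGHPUT = 3.00e9  # samples per second, as quoted in the abstract


def required_sample_rate(width: int, height: int, fps: int, samples_per_pixel: float) -> float:
    """Samples per second the transform stage must sustain."""
    return width * height * fps * samples_per_pixel


required = required_sample_rate(WIDTH, HEIGHT, FRAME_RATE_HZ, SAMPLES_PER_PIXEL_420)
print(f"required: {required / 1e9:.3f} GSample/s")  # required: 2.986 GSample/s
print("fits:", required <= REPORTED_THROUGHPUT)     # fits: True
```

Under this assumption, the reported throughput leaves a margin of under 0.5% over the 8K, 60 Hz requirement.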
Keywords: VLSI; directional transform; discrete cosine transform; video coding
Year: 2020 PMID: 32143459 PMCID: PMC7085551 DOI: 10.3390/s20051405
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. HEVC basic structure.
Figure 2. Example of Discrete Cosine Transform (DCT) and Steerable Discrete Cosine Transform (SDCT) kernels.
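To make the difference between the two kernel sets concrete, the following Python sketch builds the orthonormal 2D DCT-II kernels of an N × N block and then steers them. It follows our reading of the SDCT construction: each pair of kernels (k, l) and (l, k) with k ≠ l is rotated together by an angle θ, while the diagonal kernels (k, k) are left unchanged. Using a single θ for every pair and the helper names (`dct_1d_basis`, `steered_pair`, `sdct_kernels`) are assumptions of this sketch, not details taken from the paper.

```python
import math


def dct_1d_basis(n: int) -> list[list[float]]:
    """Orthonormal DCT-II basis: row k is a_k * cos(pi * (2i + 1) * k / (2n))."""
    basis = []
    for k in range(n):
        a_k = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        basis.append([a_k * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return basis


def dct_2d_kernel(c: list[list[float]], k: int, l: int) -> list[list[float]]:
    """Separable 2D DCT kernel (k, l): outer product of 1D basis rows k and l."""
    n = len(c)
    return [[c[k][i] * c[l][j] for j in range(n)] for i in range(n)]


def steered_pair(c, k, l, theta):
    """Hypothetical helper: rotate the kernel pair (k, l), (l, k) by theta."""
    a, b = dct_2d_kernel(c, k, l), dct_2d_kernel(c, l, k)
    n = len(c)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    first = [[cos_t * a[i][j] + sin_t * b[i][j] for j in range(n)] for i in range(n)]
    second = [[-sin_t * a[i][j] + cos_t * b[i][j] for j in range(n)] for i in range(n)]
    return first, second


def sdct_kernels(n: int, theta: float) -> list[list[list[float]]]:
    """All n*n kernels: diagonal ones unchanged, off-diagonal pairs rotated by theta."""
    c = dct_1d_basis(n)
    kernels = []
    for k in range(n):
        kernels.append(dct_2d_kernel(c, k, k))
        for l in range(k + 1, n):
            kernels.extend(steered_pair(c, k, l, theta))
    return kernels


def inner(a, b) -> float:
    """Frobenius inner product of two n x n kernels."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


kernels = sdct_kernels(4, math.radians(30))
worst = max(
    abs(inner(p, q) - (1.0 if i == j else 0.0))
    for i, p in enumerate(kernels)
    for j, q in enumerate(kernels)
)
print(len(kernels), worst < 1e-12)  # 16 True
```

Because each rotation acts inside the span of two orthonormal kernels, the steered set stays orthonormal for any θ, and θ = 0 gives back the plain DCT kernels.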
Figure 3. Overall SDCT structure.
Figure 4. Steerable block structure.
Figure 5. Zig-zag scanning order.
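Figure 5 only names the scan. Assuming it is the conventional JPEG-style zig-zag (start at the top-left coefficient, then traverse the anti-diagonals in alternating directions), the visiting order for an N × N block can be generated as in this Python sketch; the function name `zigzag_order` is hypothetical.

```python
def zigzag_order(n: int) -> list[tuple[int, int]]:
    """(row, col) positions of an n x n block in JPEG-style zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        # Cells on this anti-diagonal, listed from the top row downwards.
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even anti-diagonals are walked bottom-left to top-right, odd ones the other way.
        if s % 2 == 0:
            cells.reverse()
        order.extend(cells)
    return order


# Raster indices (row * n + col) of the 4 x 4 scan.
print([r * 4 + c for r, c in zigzag_order(4)])
# [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]
```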
Figure 6. Lifting-based rotation.
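A lifting-based rotation replaces the four multiplications of a 2 × 2 rotation with three shear ("lifting") steps. The Python sketch below uses the textbook factorization with p = −tan(θ/2) and u = sin θ, and assumes |θ| < π so that the tangent is finite. It also shows the property that makes lifting attractive in hardware: if every step is rounded to an integer, the rotation remains exactly invertible. The fixed-point word lengths and the θ quantization used in the paper are not given here, and the function names are hypothetical.

```python
import math


def lifting_coeffs(theta: float) -> tuple[float, float]:
    """Three-shear factorization of a 2x2 rotation by theta:
    [[cos, -sin], [sin, cos]] = S_x(p) @ S_y(u) @ S_x(p),
    with S_x(p) = [[1, p], [0, 1]], S_y(u) = [[1, 0], [u, 1]],
    p = -tan(theta / 2) and u = sin(theta)."""
    return -math.tan(theta / 2), math.sin(theta)


def rotate_lifting(x: float, y: float, theta: float) -> tuple[float, float]:
    """Rotate (x, y) by theta using three lifting (shear) steps."""
    p, u = lifting_coeffs(theta)
    x = x + p * y  # step 1: shear along x
    y = y + u * x  # step 2: shear along y
    x = x + p * y  # step 3: shear along x
    return x, y


def rotate_lifting_int(x: int, y: int, theta: float) -> tuple[int, int]:
    """Integer variant: each lifting product is rounded to an integer."""
    p, u = lifting_coeffs(theta)
    x = x + round(p * y)
    y = y + round(u * x)
    x = x + round(p * y)
    return x, y


def inverse_rotate_lifting_int(x: int, y: int, theta: float) -> tuple[int, int]:
    """Undo rotate_lifting_int exactly by running the steps backwards."""
    p, u = lifting_coeffs(theta)
    x = x - round(p * y)
    y = y - round(u * x)
    x = x - round(p * y)
    return x, y


theta = math.radians(22.5)
xr, yr = rotate_lifting(3.0, -5.0, theta)
direct = (3.0 * math.cos(theta) + 5.0 * math.sin(theta), 3.0 * math.sin(theta) - 5.0 * math.cos(theta))
print(abs(xr - direct[0]) < 1e-12, abs(yr - direct[1]) < 1e-12)  # True True
print(inverse_rotate_lifting_int(*rotate_lifting_int(37, -120, theta), theta))  # (37, -120)
```

In the floating-point version the three shears reproduce the direct rotation; in the integer version the rounding introduces a small approximation error, but the inverse steps cancel it exactly.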
Steerable Control Unit FSM states.
| Operation | Label | State |
|---|---|---|
| write input buffer | A | START |
| read input & write output buffer | B | WAIT |
| write input & read output buffer | C, F, I, L | WB |
| read input & write output & read output buffer | D, G, H, M | RWB |
| read output buffer | | RB |
Example of FSM state evolution.
| Memory Operation | Number of Cycles | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Write | 16 | X | 16 | X | 8 | X | 4 | X | 4 | 12 | X | X |
| Read & Write | X | 16 | X | 16 | X | 8 | X | 4 | X | X | 16 | X |
| Read | X | X | 16 | X | 8 | 8 | 4 | 4 | 4 | X | X | 16 |
| State | A | B | C | B | C | D | C | D | C | A | B | E |
| 16 | 16 | 8 | 4 | 16 | ||||||||
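Reading the Write, Read & Write, and Read rows of the example as the "write input buffer", "read input & write output buffer", and "read output buffer" operations of the state table, each column of the example identifies exactly one FSM state. The Python sketch below (the names `STATE_BY_OPS` and `replay` are ours) encodes that mapping and replays the example. The result matches the State row with A = START, B = WAIT, C = WB, D = RWB, and E = RB.

```python
# FSM state keyed by which memory operations are active in a step:
# (write input, read input & write output, read output).
STATE_BY_OPS = {
    (True, False, False): "START",  # write input buffer
    (False, True, False): "WAIT",   # read input & write output buffer
    (True, False, True): "WB",      # write input & read output buffer
    (False, True, True): "RWB",     # read input & write output & read output buffer
    (False, False, True): "RB",     # read output buffer
}

# Cycle counts from the example table; None where the table shows "X".
WRITE = [16, None, 16, None, 8, None, 4, None, 4, 12, None, None]
READ_WRITE = [None, 16, None, 16, None, 8, None, 4, None, None, 16, None]
READ = [None, None, 16, None, 8, 8, 4, 4, 4, None, None, 16]


def replay(write, read_write, read) -> list[str]:
    """Map each column of the example to the FSM state it corresponds to."""
    states = []
    for w, rw, r in zip(write, read_write, read):
        key = (w is not None, rw is not None, r is not None)
        states.append(STATE_BY_OPS[key])
    return states


print(replay(WRITE, READ_WRITE, READ))
# ['START', 'WAIT', 'WB', 'WAIT', 'WB', 'RWB', 'WB', 'RWB', 'WB', 'START', 'WAIT', 'RB']
```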
Figure 7. Simplified FSM diagram.
Estimated power consumption at 188 MHz.
| Power | Internal | Switching | Total Dynamic | Leakage |
|---|---|---|---|---|
| basic DCT | ||||
| clock gated DCT | 21 mW | |||
| basic SDCT | ||||
| clock gated SDCT | ||||
| clock gated SDCT-16 | ||||
| clock gated SDCT-8 | | | | |
SDCT area occupation for different clock regimes.
| Cell | 1× Total Area | 2× Total Area | 4× Total Area | 8× Total Area |
|---|---|---|---|---|
| SDCT | ||||
| 2D-DCT | ||||
| IM | ||||
| OM | ||||
| FIFO | ||||
| ROM | 5895 | | | |
Overview of the obtained architectures.
| Architecture | DCT | SDCT | SDCT-16 | SDCT-8 |
|---|---|---|---|---|
| Technology (nm) | 65 | 65 | 65 | 65 |
| Frequency (MHz) | 188 | 188 | 188 | 188 |
| Power (mW) | | | | |
| Throughput | | | | |
| Area (mm²) | | | | |
BDBR [%] for implemented reduced SDCT sizes versus DCT-only.
| Sequence | SDCT [ | SDCT-16 | SDCT-8 |
|---|---|---|---|
| Kimono | −0.795 | −0.144 | −0.020 |
| ParkScene | −0.617 | −0.500 | −0.128 |
| Cactus | −0.485 | −0.392 | −0.209 |
| BQTerrace | −0.265 | −0.267 | −0.193 |
| BasketballDrive | −0.199 | −0.174 | −0.112 |
| Average | −0.472 | −0.295 | −0.132 |
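The last row of the table matches the plain arithmetic mean of the five per-sequence values, as this short Python check reproduces; negative BDBR values denote bit-rate savings with respect to DCT-only coding. The dictionary name `BDBR` is illustrative.

```python
# Per-sequence BDBR [%] from the table: (SDCT, SDCT-16, SDCT-8) versus DCT-only.
BDBR = {
    "Kimono": (-0.795, -0.144, -0.020),
    "ParkScene": (-0.617, -0.500, -0.128),
    "Cactus": (-0.485, -0.392, -0.209),
    "BQTerrace": (-0.265, -0.267, -0.193),
    "BasketballDrive": (-0.199, -0.174, -0.112),
}


def column_means(rows) -> list[float]:
    """Arithmetic mean of each column across all sequences."""
    rows = list(rows)
    return [sum(column) / len(rows) for column in zip(*rows)]


means = column_means(BDBR.values())
print([f"{m:.3f}" for m in means])  # ['-0.472', '-0.295', '-0.132']
```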
Figure 8. Histogram of the obtained BDBR savings with respect to the DCT.
Comparison of 2D-DCT and SDCT Architectures.
| Design | | Technology [nm] | Frequency [MHz] | Throughput [Gsps] | Power [mW] | EPS [pJ] |
|---|---|---|---|---|---|---|
| Zhao et al. [ | | 45 | 333 | 0.634 | - | - |
| Ahmed et al. [ | | 90 | 150 | 0.246 | - | - |
| Meher et al. [ | Folded | 90 | 187 | 2.992 | 40.04 | 13.38 |
| | Full-parallel | 90 | 187 | 5.984 | 67.57 | 11.29 |
| Masera et al. [ | Architecture 1 | 90 | 250 | 3.212 | 51.72 | 16.10 |
| SDCT | Folded | 65 | 188 | 2.992 | 148.67 | 49.69 |
| | Folded-16 | 65 | 188 | 1.496 | 56.85 | 38.00 |
| | Folded-8 | 65 | 188 | 0.748 | 14.17 | 18.94 |
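The EPS column is the energy per sample, i.e. power divided by throughput. With power in mW and throughput in Gsample/s, the ratio comes out directly in pJ per sample. This Python sketch recomputes the column from the reported power and throughput values; rows without a power figure are omitted, and the list and function names are illustrative.

```python
# (design, throughput [Gsample/s], power [mW]) for the rows that report power.
DESIGNS = [
    ("Meher et al., Folded", 2.992, 40.04),
    ("Meher et al., Full-parallel", 5.984, 67.57),
    ("Masera et al., Architecture 1", 3.212, 51.72),
    ("SDCT, Folded", 2.992, 148.67),
    ("SDCT, Folded-16", 1.496, 56.85),
    ("SDCT, Folded-8", 0.748, 14.17),
]


def energy_per_sample_pj(power_mw: float, throughput_gsps: float) -> float:
    """EPS = power / throughput.
    mW / (Gsample/s) = 1e-3 J/s / 1e9 sample/s = 1e-12 J/sample = 1 pJ/sample."""
    return power_mw / throughput_gsps


eps = {name: energy_per_sample_pj(mw, gsps) for name, gsps, mw in DESIGNS}
for name, value in eps.items():
    print(f"{name:32s} {value:6.2f} pJ")
```

The recomputed values agree with the table to two decimals.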