Abstract
Rate-distortion optimization (RDO) plays an essential role in enhancing coding efficiency. Rate-distortion optimized mode decision is widely used in scalable video coding (SVC): among all the possible coding modes, it aims to select the one with the best trade-off between bitrate and compression distortion. This trade-off is tuned through the choice of the Lagrange multiplier. Despite the prevalence of the conventional method for Lagrange multiplier selection in hybrid video coding, the underlying formulation is not applicable to 3-D wavelet-based SVC, where explicit values of the quantization step are not available, and it gives no consideration to the content features of the input signal. In this paper, an efficient content-adaptive Lagrange multiplier selection algorithm is proposed in the context of RDO for 3-D wavelet-based SVC targeting quality scalability. Our contributions are two-fold. First, we introduce a novel weighting method, which takes account of the mutual information, gradient per pixel, and texture homogeneity to measure the temporal subband characteristics after applying the motion-compensated temporal filtering (MCTF) technique. Second, based on the proposed subband weighting factor model, we derive the optimal Lagrange multiplier. Experimental results demonstrate that the proposed algorithm delivers better video quality with negligible additional computational complexity.
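The mode decision described in the abstract can be illustrated with a minimal sketch, assuming the usual Lagrangian cost J = D + λR. The class `ModeCandidate`, the function names, and all distortion and rate numbers are hypothetical and chosen only for illustration; the paper's content-adaptive derivation of λ is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class ModeCandidate:
    """One candidate coding mode with its measured cost terms (illustrative)."""
    name: str
    distortion: float  # e.g. sum of squared errors after reconstruction
    rate: float        # bits needed to code the block in this mode


def rd_cost(candidate, lam):
    """Lagrangian cost J = D + lambda * R."""
    return candidate.distortion + lam * candidate.rate


def select_mode(candidates, lam):
    """Return the candidate with the smallest Lagrangian cost."""
    return min(candidates, key=lambda c: rd_cost(c, lam))


# Hypothetical measurements for three modes of one block.
candidates = [
    ModeCandidate("intra", distortion=100.0, rate=200.0),
    ModeCandidate("uni-left", distortion=180.0, rate=90.0),
    ModeCandidate("bi-direction", distortion=150.0, rate=120.0),
]

# A small multiplier favours low distortion, a large one favours low rate.
low_lambda_choice = select_mode(candidates, lam=0.1)
high_lambda_choice = select_mode(candidates, lam=2.0)
```

With these made-up numbers, a small λ picks the low-distortion mode and a large λ picks the low-rate mode, which is the trade-off the multiplier is meant to tune.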
Keywords: 3-D wavelet-based SVC; Lagrange multiplier; mode decision; motion-compensated temporal filtering; rate-distortion optimization; scalable video coding
Year: 2018 PMID: 33265272 PMCID: PMC7512699 DOI: 10.3390/e20030181
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. Block diagram of the ENH-MC-EZBC codec system model.
Figure 2. Lifting-based MCTF framework with adaptive switching based on Haar and 5/3 filters.
Coefficients depending on the mode of the current frame.
| Frame Mode | | | | | | |
|---|---|---|---|---|---|---|
| bi-direction | −1/2 | 1 | −1/2 | 1/4 | 1 | 1/4 |
| uni-left | −1 | 1 | 0 | 1/2 | 1 | 0 |
| uni-right | 0 | 1 | −1 | 0 | 1 | 1/2 |
| intra | N/A | N/A | N/A | N/A | 1 | N/A |
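How such per-mode coefficients could enter a lifting step can be sketched as follows. The column symbols were lost in this extract, so the sketch assumes that the first three columns weight the left reference, the current frame, and the right reference in the prediction step, and that the last three play the same role in the update step. Motion compensation is omitted (zero motion is assumed), the intra mode is left out, and the names `COEFFS`, `predict`, and `update` are hypothetical.

```python
from fractions import Fraction as F

# (prediction weights, update weights) per frame mode, copied from the table.
# ASSUMPTION: each triple is ordered (left neighbour, current, right neighbour).
COEFFS = {
    "bi-direction": ((F(-1, 2), F(1), F(-1, 2)), (F(1, 4), F(1), F(1, 4))),
    "uni-left": ((F(-1), F(1), F(0)), (F(1, 2), F(1), F(0))),
    "uni-right": ((F(0), F(1), F(-1)), (F(0), F(1), F(1, 2))),
}


def predict(left, current, right, mode):
    """High-pass frame: weighted sum of the current frame and its references."""
    a, b, c = COEFFS[mode][0]
    return [a * l + b * x + c * r for l, x, r in zip(left, current, right)]


def update(high_left, current, high_right, mode):
    """Low-pass frame: current frame plus weighted neighbouring high-pass frames."""
    a, b, c = COEFFS[mode][1]
    return [a * hl + b * x + c * hr
            for hl, x, hr in zip(high_left, current, high_right)]


# Toy 1-D "frames" (flattened pixel rows) of a perfectly static scene.
frame = [10, 20, 30]
high = predict(frame, frame, frame, "bi-direction")   # residual vanishes
low = update(high, frame, high, "bi-direction")       # low band equals the frame
```

For a static scene the high-pass frame is zero in every mode listed, so the low-pass frame equals the input; real sequences leave a residual that depends on how well the references match.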
The values of each MCTF level for test video sequences.
| Sequences | 1st MCTF Level | 2nd MCTF Level | 3rd MCTF Level | 4th MCTF Level |
|---|---|---|---|---|
| Football | 0.9921 | 0.9894 | 0.9951 | 0.9975 |
| Foreman | 0.9860 | 0.9954 | 0.9967 | 0.9982 |
| Soccer | 0.9984 | 0.9962 | 0.9981 | 0.9993 |
| Crew | 0.9976 | 0.9983 | 0.9989 | 0.9991 |
| Ice | 0.9787 | 0.9899 | 0.9935 | 0.9980 |
| City | 0.9843 | 0.9894 | 0.9932 | 0.9982 |
| Johnny | 0.9885 | 0.9891 | 0.9964 | 0.9989 |
| KristenAndSara | 0.9932 | 0.9967 | 0.9970 | 0.9985 |
| Stockholm | 0.9924 | 0.9962 | 0.9984 | 0.9990 |
| Basketball | 0.9787 | 0.9812 | 0.9899 | 0.9988 |
| Cactus | 0.9949 | 0.9950 | 0.9966 | 0.9990 |
| Park_joy | 0.9815 | 0.9843 | 0.9957 | 0.9981 |
| Traffic | 0.9870 | 0.9932 | 0.9960 | 0.9991 |
| PeopleOnStreet | 0.9893 | 0.9919 | 0.9949 | 0.9993 |
Properties of standard video test sequences.
| Sequences | Resolution (width) | Frames 1 | Characteristics |
|---|---|---|---|
| Football | 352 | 260 | Fast camera and human subject motion, high spatial detail |
| Foreman | 352 | 300 | Fast camera and content motion with pan at the end |
| Soccer | 352 | 300 | Fast changes in motion, rapid camera panning |
| Crew | 704 | 300 | Moderate movement of multiple objects |
| Ice | 704 | 240 | Still background and moderate human subject motion |
| City | 704 | 300 | Fast camera motion, high detail of buildings |
| Johnny | 1280 | 100 | Still background and low local motion |
| KristenAndSara | 1280 | 100 | Still background and moderate local motion |
| Stockholm | 1280 | 100 | Moderate camera panning, high detail of buildings |
| Basketball | 1920 | 100 | Fast camera and human subject motion, high spatial detail |
| Cactus | 1920 | 100 | Circling motion and high spatial detail |
| Park_joy | 1920 | 100 | Camera and content motion, high detail of trees |
| Traffic | 2560 | 100 | Moderate translational motion and high spatial detail |
| PeopleOnStreet | 2560 | 100 | Still background and motion of many human subjects |
1 The number of frames in the test video sequence.
Figure 3. The spatial and temporal information indices of the test sequences (red star represents the coordinate value of SI and TI in the test sequence).
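A minimal sketch of how SI and TI values like those plotted in Figure 3 could be computed, assuming the indices follow the common ITU-T P.910 style definitions (maximum over time of the spatial standard deviation of the Sobel-filtered frame, and of the frame difference). This extract does not state which definition the authors used, so the function names, the border handling, and the use of the population standard deviation are assumptions. Frames are plain lists of rows of luma values.

```python
import math
from statistics import pstdev


def sobel_magnitude(frame):
    """Sobel gradient magnitude at every interior pixel (borders are skipped)."""
    h, w = len(frame), len(frame[0])
    out = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            out.append(math.hypot(gx, gy))
    return out


def spatial_information(frames):
    """SI: largest per-frame standard deviation of the Sobel magnitude."""
    return max(pstdev(sobel_magnitude(f)) for f in frames)


def temporal_information(frames):
    """TI: largest standard deviation of the difference of successive frames."""
    values = []
    for prev, cur in zip(frames, frames[1:]):
        diff = [c - p for row_p, row_c in zip(prev, cur)
                for p, c in zip(row_p, row_c)]
        values.append(pstdev(diff))
    return max(values)


# Toy 4x5 frames: a flat frame and one containing a vertical edge.
flat = [[0] * 5 for _ in range(4)]
edge = [[0, 0, 0, 9, 9] for _ in range(4)]
si = spatial_information([flat, edge])
ti = temporal_information([flat, edge])
```

A flat, static clip gives zero for both indices, while edges raise SI and frame-to-frame changes raise TI.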
Figure 4. R-D performance comparisons among five different codecs for sequences: (a) Soccer; (b) Crew; (c) Stockholm; (d) Basketball; (e) Park_joy, and (f) PeopleOnStreet.
Figure 5. The average PSNR gains obtained for various test sequences.
Figure 6. Average PSNR versus Lagrange multiplier at different target bitrates for test video sequences: (a) Soccer (640 kbps) and (b) Park_joy (10240 kbps).
Figure 7. The average standard deviations of PSNR for various test sequences.
Figure 8. Subjective visual quality comparisons of the 8th reconstructed frame of “City” sequence at 896 kbps.
Encoding speed comparison results.
| Resolution | Ours | ENH-MC-EZBC | RWTH-MC-EZBC | RPI-MC-EZBC | MC-EZBC |
|---|---|---|---|---|---|
| CIF | 5.31 | 5.82 | 4.87 | 1.52 | 1.21 |
| 4CIF | 5.16 | 5.55 | 4.52 | 1.29 | 1.05 |
| 720p | 3.05 | 3.57 | 2.71 | 0.87 | 0.64 |
| 1080p | 1.09 | 1.34 | 0.83 | 0.35 | 0.29 |
| 2K | 0.72 | 0.79 | 0.55 | 0.16 | 0.12 |
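The speed table can be read more easily as ratios between codecs. The sketch below copies the table values and divides one column by another. The unit of the speed figures is not stated in this extract, so the ratio is only meaningful under the assumption that larger values mean faster encoding; `SPEED` and `relative_speed` are hypothetical names.

```python
# Encoding speed values copied from the table above (unit not stated in the extract).
CODECS = ["Ours", "ENH-MC-EZBC", "RWTH-MC-EZBC", "RPI-MC-EZBC", "MC-EZBC"]
ROWS = {
    "CIF": [5.31, 5.82, 4.87, 1.52, 1.21],
    "4CIF": [5.16, 5.55, 4.52, 1.29, 1.05],
    "720p": [3.05, 3.57, 2.71, 0.87, 0.64],
    "1080p": [1.09, 1.34, 0.83, 0.35, 0.29],
    "2K": [0.72, 0.79, 0.55, 0.16, 0.12],
}
SPEED = {res: dict(zip(CODECS, values)) for res, values in ROWS.items()}


def relative_speed(resolution, codec, baseline):
    """Speed of `codec` divided by speed of `baseline` at one resolution."""
    row = SPEED[resolution]
    return row[codec] / row[baseline]


# Ratio of the proposed codec to ENH-MC-EZBC at every resolution.
ratios = {res: relative_speed(res, "Ours", "ENH-MC-EZBC") for res in SPEED}
```

A ratio below 1 means the proposed codec is slower than the baseline at that resolution, under the stated assumption about the unit.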