Abstract
New methods of securing the distribution of audio content have been widely deployed in the last twenty years. Their impact on perceived quality has, however, seldom been the subject of recent extensive research. We review the state of the art in digital speech watermarking and provide subjective testing of watermarked speech samples. The latest speech watermarking techniques are listed, with their specifics and potential for further development. Their current and possible applications are evaluated. Open-source software designed to embed watermarking patterns in audio files is used to produce a set of samples that satisfies the requirements of modern subjective speech-quality assessments. The analysis mainly considers the patchwork algorithm implemented in that application. Different watermark robustness levels are used, which allows the threshold of detection by human listeners to be determined. The subjective listening tests are conducted following ITU-T Recommendation P.800, which precisely defines the conditions and requirements for subjective testing. Further analysis tries to determine the effects of noise and various disturbances on the perceived quality of watermarked speech. A threshold of intelligibility is estimated to open the way to further work on speech compression techniques with watermarking. The impact of language or social background is evaluated through an additional experiment involving two groups of listeners. Results show significant robustness of the watermarking implementation, which retains both a reasonable net subjective audio quality and its security attributes despite mild levels of distortion and noise. Extended experiments with Chinese listeners make it possible to formulate a hypothesis on perception variations with geographical and social backgrounds.
Year: 2021 PMID: 34642471 PMCID: PMC8511066 DOI: 10.1038/s41598-021-99811-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Watermarking principle.
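The abstract names the patchwork algorithm as the main technique under test. The sketch below illustrates the general patchwork idea only, in a simplified time-domain form: a secret key selects two disjoint sets of samples, one set is raised and the other lowered by a small amount, and detection compares the means of the two sets. It is not the implementation used by the open-source software in the study; the function names, the `strength` parameter, and the choice to work directly on samples are all assumptions made for illustration.

```python
import random


def pick_patches(n_samples, n_pairs, key):
    """Derive two disjoint index sets (patches A and B) from a secret key."""
    rng = random.Random(key)
    idx = rng.sample(range(n_samples), 2 * n_pairs)
    return idx[:n_pairs], idx[n_pairs:]


def embed(signal, key, n_pairs, strength):
    """Return a copy of the signal with patch A raised and patch B lowered.

    `strength` is a hypothetical parameter standing in for the watermark
    robustness level: larger values are easier to detect and easier to hear.
    """
    patch_a, patch_b = pick_patches(len(signal), n_pairs, key)
    marked = list(signal)
    for i in patch_a:
        marked[i] += strength
    for i in patch_b:
        marked[i] -= strength
    return marked


def detection_statistic(signal, key, n_pairs):
    """Mean of patch A minus mean of patch B.

    Expected to be near 0 for unmarked audio and near 2 * strength
    for audio marked with the same key.
    """
    patch_a, patch_b = pick_patches(len(signal), n_pairs, key)
    mean_a = sum(signal[i] for i in patch_a) / n_pairs
    mean_b = sum(signal[i] for i in patch_b) / n_pairs
    return mean_a - mean_b


if __name__ == "__main__":
    noise = random.Random(0)
    host = [noise.gauss(0.0, 0.1) for _ in range(20000)]  # stand-in for speech
    marked = embed(host, key=1234, n_pairs=5000, strength=0.01)
    print("unmarked:", detection_statistic(host, 1234, 5000))
    print("marked:  ", detection_statistic(marked, 1234, 5000))
```

In this simplified model, raising `strength` widens the gap between the marked and unmarked statistics, which mirrors the trade-off between robustness and audibility that the listening tests probe.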
MOS score quality equivalence.
| MOS score | Corresponding quality |
|---|---|
| 5 | Excellent |
| 4 | Good |
| 3 | Fair |
| 2 | Poor |
| 1 | Bad |
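As a minimal sketch of how the scale above is used: a mean opinion score is the arithmetic mean of the listeners' ratings on this five-point scale. The helper names below are hypothetical, and mapping a fractional MOS to the nearest category label is a convenience assumed here, not something the record specifies.

```python
from statistics import mean

# Category labels from the MOS table above.
QUALITY_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mean_opinion_score(ratings):
    """Average a list of integer listener ratings on the 1 to 5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in QUALITY_LABELS for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return mean(ratings)


def nearest_label(mos):
    """Label of the category closest to a (possibly fractional) MOS.

    Assumption: exact ties resolve to the higher category.
    """
    return QUALITY_LABELS[min(QUALITY_LABELS, key=lambda score: abs(score - mos))]


if __name__ == "__main__":
    votes = [4, 5, 3, 4, 4]  # made-up ratings for one test condition
    mos = mean_opinion_score(votes)
    print(mos, nearest_label(mos))
```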
Figure 2. Results of reference studio condition with increasing watermark strength.
t-Test: results of reference studio condition with increasing watermark strength.
| Condition | Reference condition | t-value |
|---|---|---|
| C02 | C01 | 0.104 |
| C04 | C01 | 1.243 |
| C07 | C01 | 11.402* |
| C11 | C01 | 25.928* |
| C14 | C01 | 44.882* |
Statistically significant differences (critical value 1.662) are marked with a * character.
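The flagging rule in the table note can be sketched as follows. The t-values and the critical value 1.662 are taken from the table; the exact test variant used in the study is not stated in this record, so the pooled-variance two-sample statistic shown here is an assumption, as are the function names.

```python
from math import sqrt
from statistics import mean, variance

CRITICAL_VALUE = 1.662  # from the table note


def pooled_t(sample_a, sample_b):
    """Two-sample Student t statistic with pooled variance (assumed variant)."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))


def is_significant(t_value, critical=CRITICAL_VALUE):
    """True when the magnitude of t exceeds the critical value."""
    return abs(t_value) > critical


# t-values of each condition against reference C01, copied from the table.
T_VALUES = {"C02": 0.104, "C04": 1.243, "C07": 11.402, "C11": 25.928, "C14": 44.882}

flagged = [cond for cond, t in T_VALUES.items() if is_significant(t)]

if __name__ == "__main__":
    print(flagged)
```

Applied to the table's values, the rule flags the same conditions that carry an asterisk.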
Figure 3. Results of reference studio condition and noise conditions without watermark.
Figure 4. Results of reference studio condition with noise and increasing watermarking strength.
Figure 5. Results of reference noise condition and increasing watermarking strength.
t-Test table: results of reference noise condition and increasing watermarking strength.
| Condition | Reference condition | t-value |
|---|---|---|
| C06 | C05 | 0.000 |
| C08 | C09 | 0.488 |
| C10 | C05 | 0.738 |
| C12 | C09 | 0.293 |
| C15 | C03 | 16.117* |
| C16 | C03 | 0.815 |
Statistically significant differences (critical value 1.662) are marked with a * character.
Figure 6. Comparison of studio conditions between mixed and Chinese listeners.
Figure 7. Comparison of watermarked noise conditions between mixed and Chinese listeners.