Dongdong Li, Yingchun Yang, Weihui Dai.
Abstract
In information security, voice is one of the most important biometric traits. With the growth of voice communication over the Internet and telephone systems, huge volumes of voice data have become accessible. In speaker recognition, a voiceprint can serve as a unique password with which a user proves his or her identity. However, emotional speech can cause an unacceptably high error rate and degrade the performance of a speaker recognition system. This paper addresses the problem by introducing a cost-sensitive learning technique that reweights the probability of affective test utterances at the pitch envelope level, which effectively enhances robustness in emotion-dependent speaker recognition. Based on this technique, a new recognition system architecture and its components are proposed. An experiment conducted on the Mandarin Affective Speech Corpus shows an 8% improvement in identification rate over traditional speaker recognition.
Year: 2014 PMID: 24999492 PMCID: PMC4066940 DOI: 10.1155/2014/628516
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Figure 1. Example of segment boundary estimation for the phrase “shi de.” The vertical bars represent the segment boundaries derived from the critical points of the pitch contours.
Figure 2. The probability distribution of PEM for the male (a) and female (b) speakers under the five emotion states.
Figure 3. Probability density functions of the frame-level score rank for target and nontarget speakers over 68 subjects in MASC.
Figure 4. DET curves for the traditional speaker models trained with neutral speech only.
Comparison of system performance under different types of affective speech (%).
| Emotion | Baseline | CSSR |
|---|---|---|
| Anger | 21.80 | 33.74 |
| Elation | 22.70 | 36.23 |
| Neutral | 94.40 | 95.63 |
| Panic | 26.30 | 36.14 |
| Sadness | 51.13 | 54.67 |
| Total | | |
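
The per-emotion gains implied by the table can be worked out directly. The sketch below is only an illustration: the rates are copied from the table above, while the helper names and the equal weighting of the five emotions are assumptions, since the Total row is not recoverable here. Under that assumption, the unweighted mean gain is consistent with the roughly 8% improvement reported in the abstract.

```python
# Identification rates (%) copied from the table above: (Baseline, CSSR).
RATES = {
    "Anger": (21.80, 33.74),
    "Elation": (22.70, 36.23),
    "Neutral": (94.40, 95.63),
    "Panic": (26.30, 36.14),
    "Sadness": (51.13, 54.67),
}


def absolute_gains(table):
    """Return the CSSR-minus-baseline gain, in percentage points, per emotion."""
    return {emotion: round(cssr - base, 2) for emotion, (base, cssr) in table.items()}


def unweighted_mean_gain(table):
    """Average the per-emotion gains, weighting every emotion equally.

    Equal weighting is an assumption; the source does not state how the
    overall figure was aggregated.
    """
    gains = [cssr - base for base, cssr in table.values()]
    return sum(gains) / len(gains)


gains = absolute_gains(RATES)
mean_gain = unweighted_mean_gain(RATES)

for emotion, gain in gains.items():
    print(f"{emotion:8s} +{gain:.2f}")
print(f"Unweighted mean gain: {mean_gain:.2f} points")  # about 8.02
```

The largest gains fall on anger and elation, and the smallest on neutral speech, where the baseline is already above 94%.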
Figure 5. DET curves for the baseline, T-norm, ENORM, PFLSR, and CSSR based speaker verification systems.