Chang-Hong Lin, Wei-Kai Liao, Wen-Chi Hsieh, Wei-Jiun Liao, Jia-Ching Wang
Abstract
Research on emotional speech identification can be divided into two main parts: features and classifiers. This paper addresses how to extract an effective speech feature set for emotional speech identification. Our speech feature set uses not only statistical analysis of frame-based acoustic features but also approximated speech feature contours, which are obtained by extracting the extremely low-frequency components of the speech feature contours. Furthermore, principal component analysis (PCA) is applied to the approximated speech feature contours to derive an efficient representation of them. The proposed speech feature set is fed into support vector machines (SVMs) to perform multiclass emotion identification. In the experiments, the proposed system achieves an identification rate of 82.26%.
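The abstract does not spell out how the extremely low-frequency components are extracted. A minimal sketch, assuming that (a) a feature contour such as the log frame energy (LFE) is one value per fixed-length frame and (b) "extremely low frequency components" means keeping the DC term plus the lowest `m` harmonics of the contour's DFT. Both readings, the frame length, the hop size, and the names `log_frame_energy` and `approximate_contour` are hypothetical:

```python
import cmath
import math


def log_frame_energy(signal, frame_len=256, hop=128, eps=1e-10):
    """Hypothetical LFE contour: log of the energy in each frame (assumed framing)."""
    contour = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        contour.append(math.log(sum(x * x for x in frame) + eps))
    return contour


def approximate_contour(contour, m):
    """Low-pass a feature contour by keeping only the DC term and the m lowest
    DFT harmonics (an assumed reading of 'extremely low frequency components')."""
    n = len(contour)
    # Forward DFT of the real-valued contour.
    spectrum = [
        sum(contour[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Bins k and n - k carry the same harmonic for a real signal, so keeping
    # both preserves conjugate symmetry and gives a real reconstruction.
    kept = [spectrum[k] if min(k, n - k) <= m else 0j for k in range(n)]
    # Inverse DFT; drop the rounding-level imaginary part.
    return [
        (sum(kept[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]
```

For a contour made of a constant, a one-cycle cosine, and a faster five-cycle ripple, `approximate_contour(x, 1)` keeps only the constant and the one-cycle cosine. A truncated DCT would be an equally plausible reading of the paper; the DFT form is used here only because it maps directly onto "frequency components".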
Year: 2014 PMID: 24982991 PMCID: PMC4055048 DOI: 10.1155/2014/757121
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Figure 1. Block diagram of the proposed system.
Figure 2. Flowchart for obtaining the approximated speech feature contour.
Figure 3. Examples of approximated log frame energy (LFE) contours from an angry speech utterance.
Figure 4. Examples of approximated LFE contours from a bored speech utterance.
Figure 5. Examples of approximated LFE contours from a sad speech utterance.
Figure 6. PCA bases of approximated LFE contours with m = 1.
Figure 7. PCA bases of approximated LFE contours with m = 2.
Figure 8. PCA bases of approximated LFE contours with m = 3.
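Figures 6 to 8 show PCA bases learned from approximated LFE contours. The sketch below assumes every approximated contour has already been resampled to a common length (this record does not say how variable-length utterances are handled). It estimates the leading bases with power iteration and deflation in plain Python and projects a contour onto them. `pca_bases` and `project` are hypothetical names; a real system would more likely call a linear-algebra library's eigensolver:

```python
import math
import random


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return v if norm == 0 else [x / norm for x in v]


def _matvec(a, v):
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def pca_bases(contours, num_bases, iters=500, seed=0):
    """Return (mean, bases): the sample mean of equal-length contours and the
    num_bases leading eigenvectors of their sample covariance matrix."""
    n, d = len(contours), len(contours[0])
    mu = [sum(col) / n for col in zip(*contours)]
    centered = [[x - m for x, m in zip(row, mu)] for row in contours]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    rng = random.Random(seed)
    bases = []
    for _ in range(num_bases):
        # Power iteration converges to the dominant eigenvector of cov.
        v = _normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
        for _ in range(iters):
            v = _normalize(_matvec(cov, v))
        eigval = sum(a * b for a, b in zip(v, _matvec(cov, v)))
        bases.append(v)
        # Deflation: remove the found component so the next pass finds the next one.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mu, bases


def project(contour, mu, bases):
    """Coefficients of a mean-removed contour on each PCA basis."""
    centered = [x - m for x, m in zip(contour, mu)]
    return [sum(c * b for c, b in zip(centered, basis)) for basis in bases]
```

The projection coefficients are the kind of compact contour representation that the abstract describes; how many coefficients the authors kept for each m is not recoverable from this record.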
Performance evaluation using approximated LFE contours.
| Feature | Dimension | Identification rate (%) |
|---|---|---|
|  | 3 | 44.9 |
|  | 5 | 43.77 |
|  | 6 | 44.52 |
Adopted statistical analysis features after feature selection.
| Statistical analysis feature set | Dim. |
|---|---|
| Silence ratio | 1 |
| Voiced ratio | 1 |
| Mean and standard deviation of pitch | 2 |
| Mean and standard deviation of log frame energy | 2 |
| Mean and standard deviation of subband powers | 8 |
| Mean and standard deviation of spectral centroid | 2 |
| Mean and standard deviation of bandwidth | 2 |
| Mean and standard deviation of MFCCs | 26 |
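The dimensions in this table add up to 44. Apart from the two ratios, each row summarizes a frame-level track by its mean and standard deviation, so 8 subband dimensions imply 4 subbands and 26 MFCC dimensions imply 13 coefficients. A minimal sketch, assuming per-frame feature vectors and per-frame silence/voicing flags are already available and that the population standard deviation is used; `FEATURE_DIMS`, `frame_ratio`, and `mean_std` are hypothetical names:

```python
from statistics import mean, pstdev

# Dimensions copied from the table; each "mean and standard deviation" row
# counts both statistics.
FEATURE_DIMS = {
    "silence_ratio": 1,
    "voiced_ratio": 1,
    "pitch": 2,
    "log_frame_energy": 2,
    "subband_powers": 8,
    "spectral_centroid": 2,
    "bandwidth": 2,
    "mfcc": 26,
}


def frame_ratio(flags):
    """Fraction of frames whose flag is set (e.g. silent or voiced frames)."""
    return sum(1 for f in flags if f) / len(flags)


def mean_std(track):
    """Per-dimension means followed by per-dimension population standard
    deviations of a list of equal-length frame vectors."""
    columns = list(zip(*track))
    return [mean(c) for c in columns] + [pstdev(c) for c in columns]


total_dim = sum(FEATURE_DIMS.values())  # 44
```

For example, an MFCC track of 13 coefficients per frame yields 26 values from `mean_std`, matching the last row of the table.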
Performance evaluation using different feature sets (identification rates in %).
| Feature set | Ang. | Bor. | Dis. | Fear | Hap. | Neu. | Sad. | Total |
|---|---|---|---|---|---|---|---|---|
| Γ | 93.7 | 80 | 69.6 | 76.5 | 62.9 | 76.9 | 87.1 | 80.0 |
| Γ, | 93.7 | 80 | 69.6 | 76.5 | 65.7 | 79.5 | 87.1 | 80.8 |
| Γ, | 93.7 | 80 | 73.9 | 79.4 | 65.7 | 82.1 | 83.9 | 81.5 |
| Γ, | 93.7 | 80 | 78.3 | 79.4 | 68.6 | 82.1 | 83.9 | 82.3 |
Confusion matrix of the Γ, P₃ feature set. Rows denote actual emotions, and columns denote predicted emotions.
| Actual / Predicted | Ang. | Bor. | Dis. | Fear | Hap. | Neu. | Sad. |
|---|---|---|---|---|---|---|---|
| Ang. | 59 | 0 | 0 | 1 | 3 | 0 | 0 |
| Bor. | 0 | 32 | 0 | 0 | 0 | 7 | 1 |
| Dis. | 2 | 0 | 18 | 1 | 1 | 0 | 1 |
| Fear | 4 | 0 | 0 | 27 | 1 | 2 | 0 |
| Hap. | 8 | 0 | 0 | 3 | 24 | 0 | 0 |
| Neu. | 0 | 7 | 0 | 0 | 0 | 32 | 0 |
| Sad. | 0 | 2 | 0 | 0 | 1 | 2 | 26 |
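The per-class rates in the last row of the feature-set table and the 82.26% in the abstract can be recomputed from the counts in this confusion matrix. The sketch uses only those counts. It also shows that the total is the pooled rate (218 correct out of 265 utterances), not the unweighted mean of the seven per-class rates, which would be about 80.8%:

```python
LABELS = ["Ang.", "Bor.", "Dis.", "Fear", "Hap.", "Neu.", "Sad."]

# Rows are actual emotions, columns are predicted emotions (counts from the table).
CONFUSION = [
    [59, 0, 0, 1, 3, 0, 0],
    [0, 32, 0, 0, 0, 7, 1],
    [2, 0, 18, 1, 1, 0, 1],
    [4, 0, 0, 27, 1, 2, 0],
    [8, 0, 0, 3, 24, 0, 0],
    [0, 7, 0, 0, 0, 32, 0],
    [0, 2, 0, 0, 1, 2, 26],
]


def per_class_rates(cm):
    """Percentage of each actual class that was identified correctly."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(cm)]


def overall_rate(cm):
    """Pooled identification rate: all correct decisions over all utterances."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return 100.0 * correct / total


for label, rate in zip(LABELS, per_class_rates(CONFUSION)):
    print(f"{label:5s} {rate:5.1f}")
print(f"Total {overall_rate(CONFUSION):5.2f}")  # Total 82.26
```

Rounded to one decimal, the per-class rates are 93.7, 80.0, 78.3, 79.4, 68.6, 82.1, and 83.9, which match the last row of the feature-set table.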