Seyed Mostafa Mirhassani, Alireza Zourmand, Hua-Nong Ting.
Abstract
Automatic estimation of a speaker's age is a challenging research topic in speech analysis. This paper presents a novel approach to estimating a speaker's age. The method follows a "divide and conquer" strategy in which the speech data are divided into six groups according to vowel class. There are two reasons for this strategy. First, splitting the data reduces the complexity of the distribution that each classifier must learn, which improves its learning performance. Second, different vowel classes carry complementary information about age. Mel-frequency cepstral coefficients (MFCCs) are computed for each group, and single-hidden-layer feedforward neural networks based on the self-adaptive extreme learning machine are applied to these features to make a primary decision. Fuzzy data fusion then aggregates the classifiers' outputs into an overall decision. The results are compared with several state-of-the-art age estimation methods. Experiments on six age groups of children aged 7 to 12 years showed that fuzzy fusion of the classifiers' outputs considerably improved age estimation accuracy, reaching 53.33%. Moreover, the fuzzy fusion of decisions aggregated complementary information about a speaker's age from the various speech sources.
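The primary decision in each vowel group comes from an extreme learning machine (ELM): a single-hidden-layer network whose hidden weights are random and fixed, with only the output weights solved in closed form. The sketch below is a plain ELM, not the self-adaptive variant used in the paper (its optimization of the hidden layer is not reproduced). The class name `ELMClassifier`, the sigmoid activation, the uniform [-1, 1] initialization and the 13-dimensional toy features are all assumptions; the default of 60 hidden neurons mirrors the comparison table.

```python
import numpy as np


class ELMClassifier:
    """Hypothetical plain extreme learning machine: random, untrained hidden
    layer; output weights fitted in closed form by least squares."""

    def __init__(self, n_hidden=60, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the fixed random projection
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One-hot target matrix T (n_samples x n_classes)
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Hidden weights and biases are drawn once and never updated
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: minimum-norm least-squares solution beta = pinv(H) @ T
        self.beta = np.linalg.pinv(H) @ T
        return self

    def decision_function(self, X):
        # Raw per-class scores; a fusion stage would combine these across vowels
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]


# Toy usage: two synthetic clusters of 13-dimensional "MFCC-like" vectors
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (40, 13)), rng.normal(2.0, 0.3, (40, 13))])
y = np.array([7] * 40 + [8] * 40)
model = ELMClassifier(n_hidden=20).fit(X, y)
print("training accuracy:", (model.predict(X) == y).mean())
```

Because only `beta` is learned, training reduces to one pseudo-inverse, which is what makes training one ELM per vowel group cheap.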
Year: 2014; PMID: 25006595; PMCID: PMC4070543; DOI: 10.1155/2014/534064
Source DB: PubMed; Journal: ScientificWorldJournal; ISSN: 1537-744X
Figure 1: Block diagram of the proposed fuzzy data fusion method.
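The fusion stage in Figure 1 combines the per-vowel classifier outputs into one age decision. The abstract does not specify the fuzzy operator, so this is only a minimal sketch under stated assumptions: each classifier's raw scores are min-max normalized into membership degrees in [0, 1], the degrees are aggregated with an (optionally weighted) arithmetic mean, and the age with the highest fused membership wins. The functions `to_memberships` and `fuzzy_fuse`, the optional weights and the example scores are hypothetical.

```python
AGES = [7, 8, 9, 10, 11, 12]


def to_memberships(scores):
    """Min-max normalize one classifier's raw scores to membership degrees."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Uninformative output: every age class is equally plausible
        return [1.0 / len(scores)] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuzzy_fuse(outputs, weights=None):
    """Aggregate per-vowel outputs (vowel -> scores per age) into one decision.

    Uses a weighted arithmetic mean of membership degrees (an assumption,
    not necessarily the paper's operator), then picks the maximum.
    """
    vowels = list(outputs)
    if weights is None:
        weights = {v: 1.0 for v in vowels}
    total = sum(weights[v] for v in vowels)
    fused = [0.0] * len(AGES)
    for v in vowels:
        for k, m in enumerate(to_memberships(outputs[v])):
            fused[k] += weights[v] * m / total
    best = max(range(len(AGES)), key=lambda k: fused[k])
    return AGES[best], fused


# Hypothetical raw scores from three vowel-specific classifiers
outputs = {
    "/a/": [0.1, 0.9, 0.3, 0.2, 0.0, 0.1],  # favors age 8
    "/e/": [0.2, 0.4, 0.8, 0.1, 0.0, 0.3],  # favors age 9
    "/i/": [0.0, 0.7, 0.6, 0.2, 0.1, 0.0],  # favors age 8
}
age, fused = fuzzy_fuse(outputs)
print(age)  # → 8
weighted_age, _ = fuzzy_fuse(outputs, weights={"/a/": 1.0, "/e/": 3.0, "/i/": 1.0})
print(weighted_age)  # → 9
```

The second call shows why the aggregation step matters: trusting the /e/ classifier more flips the fused decision from 8 to 9.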
Summary of the speech database (number of samples per vowel for each speaker age).
| Speaker age (years) | /a/ | /e/ | /ə/ | /i/ | /o/ | /u/ |
|---|---|---|---|---|---|---|
| 7 | 60 | 60 | 60 | 60 | 60 | 60 |
| 8 | 60 | 60 | 60 | 60 | 60 | 60 |
| 9 | 60 | 60 | 60 | 60 | 60 | 60 |
| 10 | 60 | 60 | 60 | 60 | 60 | 60 |
| 11 | 60 | 60 | 60 | 60 | 60 | 60 |
| 12 | 60 | 60 | 60 | 60 | 60 | 60 |
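The "divide and conquer" step splits this database into one subset per vowel class. With 60 samples in every (age, vowel) cell, each vowel group holds 6 × 60 = 360 samples and the whole database 6 × 360 = 2160. The sketch below illustrates that partition on hypothetical `(age, vowel, index)` stand-ins for recordings; the helper `split_by_vowel` is hypothetical, and the third vowel symbol, lost in the extracted table header, is assumed here to be /ə/.

```python
from collections import Counter, defaultdict

AGES = list(range(7, 13))  # speaker ages 7..12 years
# Third vowel symbol is assumed ("/ə/"); it is garbled in the extracted table.
VOWELS = ["/a/", "/e/", "/ə/", "/i/", "/o/", "/u/"]
SAMPLES_PER_CELL = 60  # every (age, vowel) cell of the summary table holds 60

# Hypothetical stand-ins for recordings: (age, vowel, index) tuples
samples = [
    (age, vowel, i)
    for age in AGES
    for vowel in VOWELS
    for i in range(SAMPLES_PER_CELL)
]


def split_by_vowel(samples):
    """Divide the data set into one group per vowel class."""
    groups = defaultdict(list)
    for age, vowel, idx in samples:
        groups[vowel].append((age, idx))
    return dict(groups)


groups = split_by_vowel(samples)
for vowel in VOWELS:
    ages_in_group = Counter(age for age, _ in groups[vowel])
    print(vowel, len(groups[vowel]), dict(ages_in_group))
```

Each group keeps all six age classes balanced, so every vowel-specific classifier still faces the full six-way age problem, only on acoustically more homogeneous data.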
Comparison of classifiers for vowel-independent age estimation.
| Classification method | Accuracy (%) | Specifications |
|---|---|---|
| ANN (ELM) | 24.77 | 100 hidden neurons |
| SVM | 24.21 | Linear kernel |
| KNN | 23.47 | Euclidean distance, number of nearest neighbors = 20 |
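The KNN baseline in this table is specified as Euclidean distance with 20 nearest neighbors. A minimal pure-Python sketch of that decision rule follows; the function `knn_predict` and the 2-D toy points are hypothetical (the real inputs would presumably be MFCC vectors pooled across all vowels).

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=20):
    """Label x by majority vote of its k nearest training vectors (Euclidean)."""
    nearest = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D toy features for two age classes (30 points each)
cluster_7 = [(0.1 * i, 0.1 * j) for i in range(6) for j in range(5)]
cluster_12 = [(5 + 0.1 * i, 5 + 0.1 * j) for i in range(6) for j in range(5)]
train_X = cluster_7 + cluster_12
train_y = [7] * len(cluster_7) + [12] * len(cluster_12)

print(knn_predict(train_X, train_y, (0.2, 0.2)))  # → 7
print(knn_predict(train_X, train_y, (5.1, 5.3)))  # → 12
```

Sorting by a `key` function keeps labels out of the comparison, so ties in distance never depend on label ordering.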
Vowel-based age estimation accuracy (%) for each vowel group, based on different activation functions, and the accuracy obtained by fusing the vowel-based results with the proposed fuzzy information fusion method.
| Vowel group | /a/ | /e/ | /ə/ | /i/ | /o/ | /u/ | Fusion |
|---|---|---|---|---|---|---|---|
| Accuracy (%) | 25.83 | 23.33 | 29.17 | 25.83 | 19.17 | 30.83 | 53.33 |
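The size of the fusion gain can be read directly off this table: the best single vowel group (/u/, 30.83%) is 22.50 percentage points below the fused result, and the mean single-vowel accuracy is about 25.69%, against a six-class chance level of about 16.67%. The short computation below reproduces these figures; the vowel labels are copied from the table except the third, which is garbled in the extracted source and assumed to be /ə/.

```python
# Accuracies (%) from the vowel-based table; the "/ə/" label is assumed.
vowel_acc = {
    "/a/": 25.83, "/e/": 23.33, "/ə/": 29.17,
    "/i/": 25.83, "/o/": 19.17, "/u/": 30.83,
}
fusion_acc = 53.33
chance = 100.0 / 6  # uniform guessing over six age classes

best_vowel = max(vowel_acc, key=vowel_acc.get)
gain_over_best = round(fusion_acc - vowel_acc[best_vowel], 2)
mean_single = round(sum(vowel_acc.values()) / len(vowel_acc), 2)

print(f"best single vowel: {best_vowel} ({vowel_acc[best_vowel]}%)")
print(f"fusion gain over best vowel: {gain_over_best} points")
print(f"mean single-vowel accuracy: {mean_single}%, chance: {chance:.2f}%")
```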
Confusion matrix of the proposed age estimation method for 6 age classes (rows: actual age in years; columns: estimated age).
| Actual age | 7 | 8 | 9 | 10 | 11 | 12 | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| 7 | 17 | 1 | 2 | 0 | 0 | 0 | 85.0 |
| 8 | 3 | 15 | 1 | 1 | 0 | 0 | 75.0 |
| 9 | 5 | 1 | 11 | 2 | 0 | 1 | 55.0 |
| 10 | 6 | 5 | 1 | 6 | 1 | 1 | 30.0 |
| 11 | 4 | 4 | 1 | 1 | 9 | 1 | 45.0 |
| 12 | 7 | 3 | 1 | 2 | 1 | 6 | 30.0 |
| Overall | | | | | | | 53.33 |
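Each row of this matrix sums to 20 test decisions (120 in total), so the per-row accuracy is the diagonal entry divided by 20, and the overall accuracy is the trace divided by 120: 64 / 120 = 53.33%. The sketch below reproduces the accuracy column from the counts; `per_class_accuracy` and `overall_accuracy` are hypothetical helper names.

```python
AGES = [7, 8, 9, 10, 11, 12]
# Rows: actual age; columns: estimated age (counts from the 6-class table)
CM6 = [
    [17, 1, 2, 0, 0, 0],
    [3, 15, 1, 1, 0, 0],
    [5, 1, 11, 2, 0, 1],
    [6, 5, 1, 6, 1, 1],
    [4, 4, 1, 1, 9, 1],
    [7, 3, 1, 2, 1, 6],
]


def per_class_accuracy(cm):
    """Diagonal count divided by row total, in percent (per-class recall)."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(cm)]


def overall_accuracy(cm):
    """Trace divided by the total number of decisions, in percent."""
    correct = sum(row[i] for i, row in enumerate(cm))
    total = sum(sum(row) for row in cm)
    return 100.0 * correct / total


for age, acc in zip(AGES, per_class_accuracy(CM6)):
    print(age, acc)
print(f"overall: {overall_accuracy(CM6):.2f}%")
```

The matrix also shows where errors go: misclassified speakers of every age are most often labeled 7, which pulls the accuracy for ages 10 and 12 down to 30%.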
Confusion matrix of the proposed age estimation method for 3 age classes (rows: actual age group in years; columns: estimated age group).
| Actual age | 7, 8 | 9, 10 | 11, 12 | Accuracy (%) |
|---|---|---|---|---|
| 7, 8 | 108 | 12 | 0 | 90.0 |
| 9, 10 | 51 | 60 | 9 | 50.0 |
| 11, 12 | 54 | 15 | 51 | 42.5 |
| Overall | | | | 60.83 |
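The three-class figures are consistent with the six-class confusion matrix: merging adjacent ages (7-8, 9-10, 11-12) in both rows and columns of that matrix gives row accuracies of 90.0, 50.0 and 42.5% and an overall 60.83%, and every count in the three-class table is exactly three times the corresponding merged count. Whether the authors derived the three-class matrix this way is not stated; the helper `collapse_pairs` below is hypothetical and serves only as a consistency check.

```python
# 6-class counts (rows: actual age 7..12; columns: estimated age 7..12)
CM6 = [
    [17, 1, 2, 0, 0, 0],
    [3, 15, 1, 1, 0, 0],
    [5, 1, 11, 2, 0, 1],
    [6, 5, 1, 6, 1, 1],
    [4, 4, 1, 1, 9, 1],
    [7, 3, 1, 2, 1, 6],
]
# 3-class counts as reported (rows/columns: 7-8, 9-10, 11-12)
CM3_REPORTED = [[108, 12, 0], [51, 60, 9], [54, 15, 51]]


def collapse_pairs(cm):
    """Merge adjacent classes (0-1, 2-3, 4-5) in both rows and columns."""
    n = len(cm) // 2
    out = [[0] * n for _ in range(n)]
    for i, row in enumerate(cm):
        for j, count in enumerate(row):
            out[i // 2][j // 2] += count
    return out


cm3 = collapse_pairs(CM6)
row_acc = [100.0 * row[i] / sum(row) for i, row in enumerate(cm3)]
overall = 100.0 * sum(row[i] for i, row in enumerate(cm3)) / sum(map(sum, cm3))
print(cm3)
print(row_acc, f"{overall:.2f}%")
print([[3 * c for c in row] for row in cm3] == CM3_REPORTED)
```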
Comparison of the proposed method with baseline systems for age estimation.
| Classification method | Accuracy (%) | Specifications |
|---|---|---|
| Proposed method | 53.33 | 60 hidden neurons |
| SVM […] | 30.56 | Linear kernel, Gamma = 2 |
| […] | 37.5 | Supervector size = 300, linear kernel, Gamma = 2 |
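The margins in this comparison are worth stating explicitly: the proposed method is 22.77 percentage points (about 74.5% relative) above the SVM baseline and 15.83 points (about 42.2% relative) above the supervector-based system. The computation below derives these numbers from the table; the name of the second baseline is missing from the extracted table, so the label "supervector baseline" is hypothetical.

```python
proposed = 53.33
# Second baseline's name is lost in the extracted table; label is hypothetical.
baselines = {"SVM": 30.56, "supervector baseline": 37.5}

gains = {}
for name, acc in baselines.items():
    absolute = round(proposed - acc, 2)                   # percentage points
    relative = round(100.0 * (proposed - acc) / acc, 1)   # % of baseline accuracy
    gains[name] = (absolute, relative)
    print(f"{name}: +{absolute} points, {relative}% relative improvement")
```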