Hazrat Ali¹, Nasir Ahmad², Xianwei Zhou³, Khalid Iqbal³, Sahibzada Muhammad Ali⁴.
Abstract
This paper presents work on Automatic Speech Recognition (ASR) of the Urdu language, based on a comparative analysis of Discrete Wavelet Transform (DWT) based features and Mel Frequency Cepstral Coefficients (MFCC). These features were extracted for one hundred isolated Urdu words, each uttered by ten different speakers. The words were selected from among the most frequently used words in Urdu. A balanced corpus approach was used to cover a variety of ages and dialects. After feature extraction, classification was performed using Linear Discriminant Analysis (LDA). The confusion matrix obtained for the DWT features was then compared with the one obtained for MFCC-based speech recognition. The framework was trained and tested on speech data recorded under controlled conditions. The experimental results help determine the optimum features for the speech recognition task.
Keywords: Automatic speech recognition; Discrete wavelet transforms; Linear discriminant analysis; Mel-frequency cepstral coefficients; Urdu isolated words recognition
Year: 2014 PMID: 25674450 PMCID: PMC4320178 DOI: 10.1186/2193-1801-3-204
Source DB: PubMed Journal: Springerplus ISSN: 2193-1801
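In the common formulation, the MFCC features named in the abstract are obtained by passing short-time power spectra through a bank of triangular filters spaced evenly on the mel scale, taking the logarithm of each filter's energy, and applying a discrete cosine transform. The sketch below covers only the mel spacing and the final log-plus-DCT step. It assumes the widely used mel formula 2595·log10(1 + f/700), 26 filters over a 0 to 8 kHz band, and 13 retained coefficients; these are illustrative choices, not parameters reported in this excerpt, and framing, windowing and the FFT are omitted.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Hz -> mel, using the common 2595 * log10(1 + f / 700) formula (an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_centre_frequencies(n_filters: int, f_min: float, f_max: float) -> list[float]:
    """Centre frequencies (Hz) of n_filters triangular filters equally spaced in mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)  # n_filters centres plus the two band edges
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

def cepstral_coefficients(filter_energies: list[float], n_coeffs: int) -> list[float]:
    """Log of each filter energy followed by an (unnormalised) DCT-II."""
    n = len(filter_energies)
    logs = [math.log(max(e, 1e-12)) for e in filter_energies]  # floor avoids log(0)
    return [
        sum(logs[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
        for k in range(n_coeffs)
    ]

# Hypothetical settings: 26 filters over 0-8 kHz, 13 coefficients kept.
centres = mel_centre_frequencies(26, 0.0, 8000.0)
mfcc = cepstral_coefficients([1.0 + i for i in range(26)], 13)
```

Equal spacing in mel means the filter centres crowd together at low frequencies and spread out at high frequencies.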
Typical parameters for ASR complexity
| Parameter | Range |
|---|---|
| Speaking mode | Isolated words to continuous speech |
| Speaking style | Read speech to spontaneous speech |
| Enrollment | Speaker-dependent to speaker-independent |
| Vocabulary | Small (20 words) to large (20,000 words) |
| Language model | Finite-state to context-sensitive |
| Perplexity | Small (10) to large (100) |
| SNR | High (30 dB) to low (10 dB) |
| Transducer | Voice-cancelling microphone to telephone |
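The SNR row is given in decibels. As a quick check of what that range means, the standard definition SNR = 10·log10(P_signal / P_noise) puts the 30 dB and 10 dB endpoints at signal-to-noise power ratios of 1000:1 and 10:1. The function names below are hypothetical helpers for illustration.

```python
import math

def snr_db(signal: list[float], noise: list[float]) -> float:
    """SNR in dB: 10 * log10(mean signal power / mean noise power)."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)

def db_to_power_ratio(db: float) -> float:
    """Signal-to-noise power ratio corresponding to a level in dB."""
    return 10.0 ** (db / 10.0)

# The two ends of the SNR range in the table.
print(db_to_power_ratio(30.0))  # 1000.0
print(db_to_power_ratio(10.0))  # 10.0
```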
Vowels in English
| Vowel type | Vowel | Example |
|---|---|---|
| Front vowels | /iy/ | Beet |
| Front vowels | /ih/ | It |
| Front vowels | /ae/ | At |
| Mid position | /aa/ | Father |
| Mid position | /ax/ | All |
| Mid position | /ah/ | Up |
| Back vowels | /ux/ | Foot |
| Back vowels | /o/ | Obey |
Figure 1. Overall block diagram.
Figure 2. Decomposition of the signal by DWT.
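A DWT decomposition of the kind shown in Figure 2 repeatedly splits the signal into a low-pass approximation and a high-pass detail, downsampling each by two and recursing only on the approximation. The excerpt does not name the wavelet family, the number of levels, or how the coefficients become a feature vector, so the sketch below assumes the orthonormal Haar wavelet, three levels, and a hypothetical per-sub-band log-energy feature. The coefficient ordering mirrors the [cA_L, cD_L, ..., cD_1] convention used by PyWavelets' `wavedec`, but the code is plain Python and not that library.

```python
import math

def haar_step(x: list[float]) -> tuple[list[float], list[float]]:
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(x) % 2:
        x = x + [x[-1]]  # odd length: repeat the last sample (a simple padding choice)
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]  # low-pass + downsample
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]  # high-pass + downsample
    return approx, detail

def haar_wavedec(x: list[float], levels: int) -> list[list[float]]:
    """Multilevel decomposition, ordered [cA_L, cD_L, ..., cD_1]."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)  # only the approximation is split further
        details.append(d)
    return [approx] + details[::-1]

def subband_log_energies(coeffs: list[list[float]]) -> list[float]:
    """Hypothetical feature vector: log energy of each sub-band."""
    return [math.log(sum(c * c for c in band) + 1e-12) for band in coeffs]

frame = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0]
coeffs = haar_wavedec(frame, 3)  # sub-band lengths 1, 1, 2, 4
features = subband_log_energies(coeffs)
```

Because the Haar transform is orthonormal, the total energy of all sub-bands equals the energy of the input frame. This makes per-band energies a natural, if simplistic, summary.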
Representation of speaker attributes
| Speaker name | Age group | Gender | Native/non-native |
|---|---|---|---|
| AAMNG1 | G1 | Male | Non-native |
| ABMNG1 | G1 | Male | Non-native |
| ACMNG2 | G2 | Male | Non-native |
| AEFYG1 | G1 | Female | Native |
| AFFYG1 | G1 | Female | Native |
| AGMNG1 | G1 | Male | Non-native |
| AHMNG1 | G1 | Male | Non-native |
Figure 3. Confusion matrix graph for words 01 to 10 - DWT features.
Figure 4. Confusion matrix graph for words 01 to 10 - MFCC features.
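According to the abstract, the DWT and MFCC feature vectors are classified with Linear Discriminant Analysis, and Figures 3 and 4 show the resulting confusion matrices. The excerpt does not specify the LDA variant or the feature dimension, so the following is a minimal sketch of the textbook Gaussian form of LDA. It estimates one mean per word, shares a single pooled within-class covariance S across words, and picks the word maximising g_k(x) = xᵀS⁻¹μ_k − ½μ_kᵀS⁻¹μ_k + log π_k. The small ridge term, the helper names, and the toy data are assumptions for illustration.

```python
import math

def invert(m: list[list[float]]) -> list[list[float]]:
    """Gauss-Jordan inverse with partial pivoting (square, non-singular input)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

class LDAClassifier:
    """Gaussian classifier sharing one pooled covariance matrix across classes (LDA)."""

    def fit(self, X: list[list[float]], y: list[str], ridge: float = 1e-6) -> "LDAClassifier":
        dim = len(X[0])
        self.labels = sorted(set(y))
        self.means, self.priors = {}, {}
        scatter = [[0.0] * dim for _ in range(dim)]
        for lab in self.labels:
            rows = [x for x, t in zip(X, y) if t == lab]
            mu = [sum(col) / len(rows) for col in zip(*rows)]
            self.means[lab] = mu
            self.priors[lab] = len(rows) / len(X)
            for x in rows:  # accumulate within-class scatter around the class mean
                d = [a - b for a, b in zip(x, mu)]
                for i in range(dim):
                    for j in range(dim):
                        scatter[i][j] += d[i] * d[j]
        dof = max(len(X) - len(self.labels), 1)
        # Assumed ridge term keeps the pooled covariance invertible with few samples per word.
        cov = [[scatter[i][j] / dof + (ridge if i == j else 0.0) for j in range(dim)]
               for i in range(dim)]
        self.inv_cov = invert(cov)
        return self

    def _discriminant(self, x: list[float], lab: str) -> float:
        """g_k(x) = x^T S^-1 mu_k - 0.5 mu_k^T S^-1 mu_k + log(prior_k)."""
        mu = self.means[lab]
        w = [sum(r * m for r, m in zip(row, mu)) for row in self.inv_cov]  # S^-1 mu_k
        return (sum(a * b for a, b in zip(x, w))
                - 0.5 * sum(a * b for a, b in zip(mu, w))
                + math.log(self.priors[lab]))

    def predict(self, x: list[float]) -> str:
        return max(self.labels, key=lambda lab: self._discriminant(x, lab))

# Toy two-word example with 2-D features (hypothetical data).
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
y = ["word_a"] * 3 + ["word_b"] * 3
clf = LDAClassifier().fit(X, y)
```

Sharing one covariance matrix makes every decision boundary linear in x. It also means far fewer parameters have to be estimated than with a separate covariance per word, which matters when each word has only a handful of training utterances.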
Comparison of percentage error for DWT features and MFCCs - first ten words
| Word No. | Σ (DWT) | Percentage error (DWT) | Σ (MFCC) | Percentage error (MFCC) |
|---|---|---|---|---|
| 001 | 0 | 100 | 0.667 | 33.33 |
| 002 | 0 | 100 | 0.333 | 66.67 |
| 003 | 0.667 | 33.33 | 0.333 | 66.67 |
| 004 | 1.0 | 0 | 1.0 | 0 |
| 005 | 0.667 | 33.33 | 0.667 | 33.33 |
| 006 | 0 | 100 | 0.667 | 33.33 |
| 007 | 0.667 | 33.33 | 0.333 | 66.67 |
| 008 | 0 | 100 | 0.667 | 33.33 |
| 009 | 0.667 | 33.33 | 0.667 | 33.33 |
| 010 | 0.667 | 33.33 | 0.667 | 33.33 |
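In every row of the table above, the percentage error equals (1 − Σ) × 100. This suggests that Σ is the fraction of a word's test utterances that were recognised correctly, i.e. the diagonal entry of that word's confusion-matrix row divided by the row total. The values 0, 0.333, 0.667 and 1.0 are consistent with three test utterances per word, although the excerpt does not state the train/test split. The sketch below follows that interpretation, using hypothetical labels and predictions.

```python
def confusion_matrix(true_labels: list[str], predicted: list[str],
                     labels: list[str]) -> list[list[int]]:
    """Rows = spoken word, columns = recognised word."""
    index = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(true_labels, predicted):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_word_scores(matrix: list[list[int]]) -> list[tuple[float, float]]:
    """(fraction correct, percentage error) per word, rounded as in the table."""
    scores = []
    for i, row in enumerate(matrix):
        total = sum(row)
        correct = row[i] / total if total else 0.0  # diagonal share of the row
        scores.append((round(correct, 3), round((1.0 - correct) * 100.0, 2)))
    return scores

# Hypothetical test run: three utterances each of two words.
labels = ["001", "002"]
truth = ["001"] * 3 + ["002"] * 3
recognised = ["001", "001", "002", "002", "001", "001"]
print(per_word_scores(confusion_matrix(truth, recognised, labels)))
# [(0.667, 33.33), (0.333, 66.67)]
```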
Figure 5. Percentage error-wise distribution of words for DWT features based ASR.
Figure 6. Percentage error-wise distribution of words for MFCCs based ASR.
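Figures 5 and 6 group words by percentage error. The figures themselves are not reproduced here, but the same kind of distribution can be tallied for the ten words tabulated above. The sketch below summarises only those listed values and says nothing about the remaining ninety words.

```python
from collections import Counter

# Percentage-error columns copied from the table above (words 001 to 010).
DWT_ERROR = [100.0, 100.0, 33.33, 0.0, 33.33, 100.0, 33.33, 100.0, 33.33, 33.33]
MFCC_ERROR = [33.33, 66.67, 66.67, 0.0, 33.33, 33.33, 66.67, 33.33, 33.33, 33.33]

def error_distribution(errors: list[float]) -> dict[float, int]:
    """Number of words at each percentage-error level, in increasing order of error."""
    return dict(sorted(Counter(errors).items()))

print(error_distribution(DWT_ERROR))   # {0.0: 1, 33.33: 5, 100.0: 4}
print(error_distribution(MFCC_ERROR))  # {0.0: 1, 33.33: 6, 66.67: 3}
```

On these ten words, four are never recognised with DWT features (100% error), against none with MFCCs.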