Tamer A Mesallam1, Mohamed Farahat1, Khalid H Malki1, Mansour Alsulaiman2, Zulfiqar Ali2, Ahmed Al-Nasheri2, Ghulam Muhammad2.
Abstract
A voice disorder database is essential for research on automatic voice disorder detection and classification. Ethnicity affects the voice characteristics of a person, so it is necessary to develop a database by collecting voice samples from the targeted ethnic group. Understanding the characteristics of a local group improves the chances of arriving at a global solution for the accurate and reliable diagnosis of voice disorders. Motivated by this idea, an Arabic voice pathology database (AVPD) is designed and developed in this study by recording three vowels, running speech, and isolated words. For each recorded sample, the perceptual severity is also provided, which is a unique aspect of the AVPD. During the development of the AVPD, the shortcomings of different voice disorder databases were identified so that they could be avoided. In addition, the AVPD is evaluated by using six different types of speech features and four types of machine learning algorithms. The detection and classification results obtained with the sustained vowel and the running speech are also compared with the results for an English-language disorder database, the Massachusetts Eye and Ear Infirmary (MEEI) database.
Year: 2017 PMID: 29201333 PMCID: PMC5672151 DOI: 10.1155/2017/8783751
Source DB: PubMed Journal: J Healthc Eng ISSN: 2040-2295 Impact factor: 2.682
Arabic digits with international phonetic alphabets (IPAs) and English translation.
| Arabic digits | English translation | IPAs of Arabic digits |
|---|---|---|
| صفر | Zero | / |
| واحد | One | /w/, /a/, / |
| أثنين | Two | /a/, /th/, /n/, /a/, /y/, /n/ |
| ثلاثة | Three | /th/, /a/, /l/, / |
| أربعة | Four | /a/, /r/, /b/, /ʕ/, /a/ |
| خمسة | Five | /kh/, /a/, /m/, /s/, /a/ |
| ستة | Six | /s/, /i/, /t/, /t/, /a/ |
| سبعة | Seven | /s/, /a/, /b/, /ʕ/, /a/ |
| ثمانية | Eight | /th/, /a/, /m/, / |
| تسعة | Nine | /t/, /i/, /s/, /ʕ/, /a/ |
| عشرة | Ten | /ʕ/, /a/, / |
Common words with IPAs and English translation.
| Common words | English translation | IPAs of common words |
|---|---|---|
| ظرف | Envelope | /z/, /a/, /r/, /f/ |
| غزال | Deer | / |
| جمل | Camel | /j/, /a/, /m/, /a/, /l/ |
Text from Al-Fateha with English translation.
| English translation | Al-Fateha | Sentence number |
|---|---|---|
| Praise be to God, Lord of all the worlds | | 1 |
| The Compassionate, the Merciful | | 2 |
| Ruler on the Day of Reckoning | | 3 |
| You alone do we worship, and You alone do we ask for help | | 4 |
| Guide us on the straight path | | 5 |
| The path of those who have received your grace | | 6 |
| Not the path of those who have brought down wrath, nor of those who wander astray | | 7 |
Number of occurrences of each Arabic letter in the recorded text.
| Letters | Number of occurrences |
|---|---|
| ا | 30 |
| ب | 5 |
| ت | 5 |
| ث | 4 |
| ج | 1 |
| ح | 4 |
| خ | 1 |
| د | 5 |
| ذ | 1 |
| ر | 10 |
| ز | 1 |
| س | 6 |
| ش | 1 |
| ص | 3 |
| ض | 2 |
| ط | 2 |
| ظ | 1 |
| ع | 10 |
| غ | 3 |
| ف | 2 |
| ق | 1 |
| ك | 3 |
| ل | 21 |
| م | 15 |
| ن | 13 |
| ه | 4 |
| و | 5 |
| ي | 14 |
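A letter-frequency table like the one above can be produced with a simple character count. The following is a minimal sketch, not the authors' procedure: the `VARIANTS` mapping, which folds hamza forms, alif maqsura, and ta marbuta onto base letters, is an assumption made for illustration, and the sample string uses only the common words listed earlier.

```python
from collections import Counter

# The 28 base letters that appear in the table above.
ARABIC_LETTERS = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"

# Hypothetical folding of letter variants onto base letters (an assumption;
# the source does not say how such variants were counted).
VARIANTS = {"أ": "ا", "إ": "ا", "آ": "ا", "ى": "ي", "ة": "ه"}


def count_letters(text):
    """Count occurrences of each base Arabic letter in text."""
    counts = Counter()
    for ch in text:
        ch = VARIANTS.get(ch, ch)   # fold variants, leave other characters as-is
        if ch in ARABIC_LETTERS:    # skip spaces, digits, diacritics, punctuation
            counts[ch] += 1
    return counts


# Example with the three common words from the table (camel, deer, envelope).
sample = "جمل غزال ظرف"
counts = count_letters(sample)
for letter, n in counts.most_common():
    print(letter, n)
```

Running the same function over the full recorded text (digits, Al-Fateha, and common words) would give one count per letter; whether it matches the published numbers depends on how variants were treated.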
Figure 1: (a) Distribution of normal and voice disorder subjects in the AVPD. (b) Number of male and female samples for each disorder and normal subjects.
Figure 2: Age distribution of male and female subjects in the AVPD.
Description of errors encountered during the verification process.
| Errors in the segments | Abbreviation | Description | Examples |
|---|---|---|---|
| Incomplete | i | When some part of the extracted text is missing at the start or end | (a) “d” is missing in |
| More | m | When a segment contains some part of the next or previous segment | (a) Segment of |
| Different | d | When the text in a segment is other than the expected one | Segment contains |
Tasks for the AVPD.
| Number | Tasks | Description |
|---|---|---|
| Task 1 | Time labeling | Start and end times of the recorded vowels, digits, Al-Fateha, and common words |
| Task 2 | Extraction | By using start and end times, the recorded vowels, digits, Al-Fateha, and common words are extracted and stored in a new |
| Task 3 | Verification | Verification of the extracted vowels, digits, Al-Fateha, and common words |
| Task 4 | Repeat time labeling | Update start and end time of the erroneous segments |
| Task 5 | Repeat extraction | Extract the segments again using updated time |
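The labeling, extraction, and verification loop in the task table can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the label dictionary, and the toy sample rate are hypothetical, and the recording is a plain list rather than a real audio file. The error codes `i`, `m`, and `d` are the abbreviations from the error table.

```python
ERROR_CODES = {"i": "incomplete", "m": "more", "d": "different"}


def extract_segment(samples, sample_rate, start_s, end_s):
    """Cut the samples between start_s and end_s (in seconds)."""
    if not 0 <= start_s < end_s:
        raise ValueError("expected 0 <= start_s < end_s")
    first = int(round(start_s * sample_rate))
    last = int(round(end_s * sample_rate))
    return samples[first:last]


def segments_to_relabel(verification):
    """Return names of segments whose verification result is an error code."""
    return [name for name, code in verification.items() if code in ERROR_CODES]


# Toy recording: 8 seconds at 10 samples per second (hypothetical values).
sample_rate = 10
recording = list(range(80))

# Task 1: time labels (start, end) in seconds for each item.
labels = {"vowel_a": (0.5, 5.5), "digit_1": (6.0, 6.8)}

# Task 2: extraction using the labels.
segments = {
    name: extract_segment(recording, sample_rate, start, end)
    for name, (start, end) in labels.items()
}

# Task 3: verification outcome per segment (None means no error).
verification = {"vowel_a": None, "digit_1": "i"}

# Tasks 4 and 5: only the flagged segments get new labels and are re-extracted.
to_relabel = segments_to_relabel(verification)
print(to_relabel)
```

Keeping the time labels separate from the extracted segments means a flagged segment can be corrected by updating one label and re-running the extraction for that item only.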
Overall best accuracies (%) for sustained vowels and running speech by using the AVPD.
| Features | Experiments | SVM (vowel) | SVM (running speech) | GMM (vowel) | GMM (running speech) | VQ (vowel) | VQ (running speech) | HMM (vowel) | HMM (running speech) |
|---|---|---|---|---|---|---|---|---|---|
| MFCC | Detection | 76.5 | 77.4 | 74.4 | 77.1 | 70.3 | 71.1 | 71.6 | 78.1 |
| MFCC | Classification | 89.2 | 89.2 | 88.9 | 89.5 | 75.3 | 81.6 | 88.7 | 90.9 |
| LPCC | Detection | 60.1 | 76.5 | 54.5 | 76.7 | 70.3 | 75.9 | 73.5 | 71.5 |
| LPCC | Classification | 67.6 | 84.7 | 75.4 | 86.0 | 75.5 | 77.9 | 59.0 | 86.0 |
| RASTA-PLP | Detection | 77.0 | 76.7 | 72.8 | 74.5 | 67.1 | 75.0 | 66.3 | 79.0 |
| RASTA-PLP | Classification | 92.9 | 90.2 | 91.3 | 91.2 | 88.9 | 90.3 | 88.7 | 92.7 |
| LPC | Detection | 62.3 | 71.6 | 53.7 | 71.9 | 70.7 | 71.5 | 71.4 | 62.3 |
| LPC | Classification | 66.3 | 82.4 | 74.6 | 79.7 | 78.6 | 75.3 | 85.9 | 75.9 |
| PLP | Detection | 75.8 | 79.1 | 73.2 | 78.5 | 72.0 | 78.1 | 73.6 | 81.6 |
| PLP | Classification | 91.5 | 90.1 | 88.9 | 91.2 | 79.4 | 77.2 | 88.7 | 85.8 |
| MDVP | Detection | 79.5 | — | 69.8 | — | 64.8 | — | — | — |
| MDVP | Classification | 82.3 | — | — | — | — | — | — | — |
Best rates: detection, 79.5% for sustained vowels (MDVP with SVM) and 81.6% for running speech (PLP with HMM); classification, 92.9% for sustained vowels (RASTA-PLP with SVM) and 92.7% for running speech (RASTA-PLP with HMM).
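The best rate per speech type can be read off the table programmatically. The sketch below uses the AVPD detection rows only and assumes that, within each classifier pair, the first value is for sustained vowels and the second for running speech (the MDVP row, which has values only in the first position of each pair, is consistent with that reading). The names `DETECTION` and `best` are hypothetical.

```python
CLASSIFIERS = ["SVM", "GMM", "VQ", "HMM"]

# AVPD detection accuracies (%) copied from the table; None marks a "—" cell.
# Assumed order per classifier: sustained vowel, then running speech.
DETECTION = {
    "MFCC":      [76.5, 77.4, 74.4, 77.1, 70.3, 71.1, 71.6, 78.1],
    "LPCC":      [60.1, 76.5, 54.5, 76.7, 70.3, 75.9, 73.5, 71.5],
    "RASTA-PLP": [77.0, 76.7, 72.8, 74.5, 67.1, 75.0, 66.3, 79.0],
    "LPC":       [62.3, 71.6, 53.7, 71.9, 70.7, 71.5, 71.4, 62.3],
    "PLP":       [75.8, 79.1, 73.2, 78.5, 72.0, 78.1, 73.6, 81.6],
    "MDVP":      [79.5, None, 69.8, None, 64.8, None, None, None],
}


def best(rows, offset):
    """Return (accuracy, feature, classifier) with the highest accuracy.

    offset 0 selects the sustained-vowel columns, offset 1 the
    running-speech columns.
    """
    candidates = []
    for feature, values in rows.items():
        for i, classifier in enumerate(CLASSIFIERS):
            value = values[2 * i + offset]
            if value is not None:
                candidates.append((value, feature, classifier))
    return max(candidates)


best_vowel = best(DETECTION, 0)
best_speech = best(DETECTION, 1)
print(best_vowel)
print(best_speech)
```

The same function applies unchanged to the classification rows or to the MEEI table once their values are entered in the same layout.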
Overall best accuracies (%) for sustained vowels and running speech by using the MEEI database.
| Features | Experiments | SVM (vowel) | SVM (running speech) | GMM (vowel) | GMM (running speech) | VQ (vowel) | VQ (running speech) | HMM (vowel) | HMM (running speech) |
|---|---|---|---|---|---|---|---|---|---|
| MFCC | Detection | 93.6 | 97.4 | 91.6 | 97.3 | 90.3 | 96.0 | 88.9 | 98.3 |
| MFCC | Classification | 95.4 | 97.3 | 97.3 | 97.3 | 96.3 | 97.3 | 87.5 | 88.9 |
| LPCC | Detection | 91.0 | 97.9 | 90.7 | 96.4 | 83.2 | 97.8 | 87.6 | 98.2 |
| LPCC | Classification | 95.4 | 97.3 | 97.3 | 97.3 | 98.2 | 97.3 | 87.5 | 97.3 |
| RASTA-PLP | Detection | 93.6 | 98.0 | 91.6 | 98.1 | 84.1 | 96.4 | 88.9 | 98.1 |
| RASTA-PLP | Classification | 95.5 | 97.3 | 97.3 | 97.3 | 97.3 | 96.3 | 85.2 | 84.6 |
| LPC | Detection | 82.9 | 96.0 | 83.2 | 98.7 | 78.3 | 97.3 | 80.1 | 96.3 |
| LPC | Classification | 95.2 | 97.3 | 97.3 | 97.3 | 97.3 | 94.4 | 75.0 | 82.5 |
| PLP | Detection | 87.8 | 96.8 | 91.2 | 97.8 | 89.4 | 97.8 | 87.4 | 96.3 |
| PLP | Classification | 95.0 | 97.3 | 97.3 | 97.3 | 98.2 | 94.4 | 61.1 | 84.6 |
| MDVP | Detection | 89.5 | — | 88.3 | — | 68.3 | — | — | — |
| MDVP | Classification | 88.9 | — | — | — | — | — | — | — |
Best rates: detection, 93.6% for sustained vowels (MFCC and RASTA-PLP with SVM) and 98.7% for running speech (LPC with GMM); classification, 98.2% for sustained vowels (LPCC and PLP with VQ) and 97.3% for running speech (reached by several feature and classifier combinations).
Comparison of AVPD with two publicly available voice disorder databases.
| Sr. number | Characteristics | MEEI | AVPD | SVD |
|---|---|---|---|---|
| (1) | Language | English | Arabic | German |
| (2) | Recording location | Massachusetts Eye & Ear Infirmary (MEEI) voice and speech laboratory, USA | Communication and Swallowing Disordered Unit, King Abdulaziz University Hospital, Saudi Arabia | Saarland University, Germany |
| (3) | Sampling frequency | Samples are recorded at different sampling frequencies | All samples are recorded at same frequency | All samples are recorded at same frequency |
| (4) | Extension of recorded samples | Recorded samples are stored in .NSP format only | Recorded samples are stored in .wav and .nsp format | Recorded samples are stored in .wav and .nsp format |
| (5) | Recorded text | (i) Vowel /a/ | (i) Vowel /a/ | (i) Vowel /a/ |
| (6) | Recording of vowels | Only stable part of the phonation | Complete phonation including onset and offset parts | Only stable part of the phonation |
| (7) | Length of recorded samples | Normal | (i) Vowel: 5 sec | Vowels: 1~3 sec |
| (8) | Ratio of normal and pathological subjects | Normal: 7% | Normal: 51% | Normal: 33% |
| (9) | Perceptual severity | | ✓ | |
| (10) | Pathology types | Functional and organic | Organic | Functional and organic |
| (11) | Evaluation of normal subjects | | ✓ | No such information is available |
Figure 3: Comparison of detection and classification accuracy for the AVPD and MEEI databases.