Thomas Soroski, Thiago da Cunha Vasco, Sally Newton-Mason, Saffrin Granby, Caitlin Lewis, Anuj Harisinghani, Matteo Rizzo, Cristina Conati, Gabriel Murray, Giuseppe Carenini, Thalia S Field, Hyeju Jang.
Abstract
BACKGROUND: Speech data for medical research can be collected noninvasively and in large volumes. Speech analysis has shown promise in diagnosing neurodegenerative disease. To effectively leverage speech data, transcription is important, as valuable information is contained in lexical content. Manual transcription, while highly accurate, limits the potential scalability and cost savings associated with language-based screening.
Keywords: Alzheimer disease; machine learning; memory; mild cognitive impairment; natural language processing; neurodegenerative disease; speech; speech recognition software; transcription software
Year: 2022 PMID: 36129754 PMCID: PMC9536526 DOI: 10.2196/33460
Source DB: PubMed Journal: JMIR Aging ISSN: 2561-7605
Figure 1Diagram of our methods and process.
Figure 2Diagram illustrating how the 3 different transcript data sets were generated.
Features for machine learning classification models.
| Task | Feature groups and number of features (n) in each group |
| Picture description | Cookie Theft image information units (13), part-of-speech (15), context-free-grammar rules (44), syntactic complexity (24), vocabulary richness (4), psycholinguistic (5), repetitiveness (5) |
| Reading | Syllable count (1), pause count (1)a, total duration (1), total time spent speaking (1), proportion of time spent speaking (1), speech rate (1), average syllable duration (1), pauses per syllable (1)a, pause rate (1)a, pause duration (3)a |
| Experience description | Part-of-speech (15), context-free-grammar rules (44), syntactic complexity (24), vocabulary richness (4), psycholinguistic (5), repetitiveness (5) |
aThese features were computed using acoustic data and transcript data and are also affected by method of pause detection (ie, acoustic vs text data).
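As the footnote above notes, the reading-task pause features can be derived either from acoustic data or from transcript data. A minimal sketch of the acoustic route, assuming word-level timestamps are available (the `min_pause` threshold and feature names here are illustrative, not the paper's exact pipeline; a text-based variant would instead count explicit pause tokens in the transcript):

```python
# Illustrative sketch: reading-task pause features from word timestamps.
# Acoustic detection: an inter-word gap of at least `min_pause` seconds
# counts as a pause. Threshold and feature names are assumptions.

def pause_features(word_times, min_pause=0.15):
    """word_times: ordered list of (start, end) times in seconds, one per word."""
    # Silent gaps between the end of one word and the start of the next.
    gaps = [b[0] - a[1] for a, b in zip(word_times, word_times[1:])]
    pauses = [g for g in gaps if g >= min_pause]
    total = word_times[-1][1] - word_times[0][0]          # total duration
    speaking = sum(end - start for start, end in word_times)
    return {
        "pause_count": len(pauses),
        "total_duration": total,
        "time_speaking": speaking,
        "proportion_speaking": speaking / total,
        "pause_rate": len(pauses) / total,                # pauses per second
        "mean_pause_duration": sum(pauses) / len(pauses) if pauses else 0.0,
    }
```

Because the two detection methods can disagree (a short acoustic silence may not be transcribed as a pause), models trained on each variant can differ, which is what the reading-task table below examines.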
Figure 3Google speech-to-text confidence results. Error bars represent the standard deviation. * represents P<.001, calculated by t-test.
Figure 4Average error rates by task and participant type. Error bars represent the standard deviation. There were no significant differences in error rates between or within tasks. MER: match error rate; WER: word error rate.
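The two error metrics in Figure 4 can both be computed from a word-level minimum-edit-distance alignment: WER divides edit operations by the reference length, while MER divides by the total number of aligned tokens (hits plus errors), keeping it bounded by 1. A self-contained sketch (function names are my own):

```python
# Word-level alignment by dynamic programming, then WER and MER.
# WER = (S + D + I) / N_ref; MER = (S + D + I) / (H + S + D + I).

def align_counts(ref, hyp):
    """Return (hits, substitutions, deletions, insertions) from a
    minimum-edit-distance alignment of two word lists."""
    R, Hn = len(ref), len(hyp)
    # dp[i][j] = (edit_cost, hits, subs, dels, ins) for ref[:i] vs hyp[:j]
    dp = [[None] * (Hn + 1) for _ in range(R + 1)]
    dp[0][0] = (0, 0, 0, 0, 0)
    for i in range(1, R + 1):                      # all-deletions column
        c = dp[i - 1][0]
        dp[i][0] = (c[0] + 1, 0, 0, c[3] + 1, 0)
    for j in range(1, Hn + 1):                     # all-insertions row
        c = dp[0][j - 1]
        dp[0][j] = (c[0] + 1, 0, 0, 0, c[4] + 1)
    for i in range(1, R + 1):
        for j in range(1, Hn + 1):
            c = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:           # hit (no cost)
                cand = [(c[0], c[1] + 1, c[2], c[3], c[4])]
            else:                                  # substitution
                cand = [(c[0] + 1, c[1], c[2] + 1, c[3], c[4])]
            d = dp[i - 1][j]                       # deletion
            cand.append((d[0] + 1, d[1], d[2], d[3] + 1, d[4]))
            e = dp[i][j - 1]                       # insertion
            cand.append((e[0] + 1, e[1], e[2], e[3], e[4] + 1))
            dp[i][j] = min(cand)                   # lowest edit cost wins
    _, h, s, d, i = dp[R][Hn]
    return h, s, d, i

def wer(ref, hyp):
    h, s, d, i = align_counts(ref, hyp)
    return (s + d + i) / max(len(ref), 1)

def mer(ref, hyp):
    h, s, d, i = align_counts(ref, hyp)
    return (s + d + i) / max(h + s + d + i, 1)
```

For example, a hypothesis with one insertion against a two-word reference gives WER 1/2 but MER 1/3, which is why MER is sometimes preferred when insertions are common.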
Machine learning classification results of models trained on automatic transcripts compared to results of models trained on manually corrected transcripts.
| Task and model type | Automatic transcripts AUROCa | Manually corrected transcripts AUROC | Change in AUROCb |
| Picture description | | | |
| RFc | 0.617 | 0.687 | 0.070d |
| GNBe | 0.662 | 0.725 | 0.063d |
| LRf | 0.671 | 0.743 | 0.072d |
| BERTg | 0.618 | 0.686 | 0.068d |
| Experience description | | | |
| RF | 0.503 | 0.636 | 0.133d |
| GNB | 0.549 | 0.677 | 0.128d |
| LR | 0.543 | 0.674 | 0.131d |
| BERT | 0.630 | 0.650 | 0.020d |
aAUROC: area under the receiver operating characteristic curve.
bPositive change in AUROC indicates that the manually corrected transcript model outperformed the automatic transcript model.
cRF: random forest.
dIndicates P<.001.
eGNB: Gaussian naive Bayes.
fLR: logistic regression.
gBERT: Bidirectional Encoder Representations from Transformers.
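The AUROC values compared above have a useful probabilistic reading: AUROC is the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case (the Mann-Whitney statistic). A minimal sketch of that computation, with illustrative labels and scores (not data from the paper):

```python
# AUROC as the probability that a positive case outscores a negative
# case, with ties counted as half. Quadratic in sample size; fine for
# illustration, rank-based formulas are used in practice.

def auroc(labels, scores):
    """labels: 1 = positive class, 0 = negative; scores: model outputs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))
```

Under this reading, an AUROC of 0.5 is chance-level ranking and 1.0 is perfect separation, which puts the table's gains from manual correction (e.g. 0.503 to 0.636) in context.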
Machine learning classification results of models trained on reading task data with pause features computed using acoustic data or computed using text data.
| Reading task | (1) Automatic transcripts AUROCa,b | (2) Manually corrected transcripts AUROCb | (3) Manually corrected transcripts AUROCc | Change in AUROC (3)–(1) |
| RFd | 0.638 | 0.655 | 0.662 | 0.024 |
| GNBe | 0.677 | 0.677 | 0.693 | 0.016 |
| LRf | 0.589 | 0.587 | 0.568 | −0.021 |
aAUROC: area under the receiver operating characteristic curve.
bPauses detected from acoustic data.
cPauses detected from text data.
dRF: random forest.
eGNB: Gaussian naive Bayes.
fLR: logistic regression.
Machine learning classification results of models trained on manually corrected transcripts without pauses compared to results of models trained on manually corrected transcripts (with pauses).
| Task and model type | Transcripts without pauses AUROCa | Transcripts with pauses AUROC | Change in AUROCb |
| Picture description | | | |
| RFc | 0.666 | 0.687 | 0.021 |
| GNBd | 0.730 | 0.725 | −0.005 |
| LRe | 0.755 | 0.743 | −0.012 |
| BERTf | 0.686 | 0.691 | 0.005 |
| Experience description | | | |
| RF | 0.631 | 0.636 | 0.005 |
| GNB | 0.676 | 0.677 | 0.001 |
| LR | 0.692 | 0.674 | −0.018 |
| BERT | 0.622 | 0.649 | 0.027 |
aAUROC: area under the receiver operating characteristic curve.
bPositive change in AUROC indicates that the pause model outperformed the no-pause model.
cRF: random forest.
dGNB: Gaussian naive Bayes.
eLR: logistic regression.
fBERT: Bidirectional Encoder Representations from Transformers.