| Literature DB >> 36172131 |
James Anibal, Adam Landa, Hang Nguyen, Alec Peltekian, Andrew Shin, Anna Christou, Lindsey Hazen, Miranda Song, Jocelyne Rivera, Robert Morhard, Ulas Bagci, Ming Li, David Clifton, Bradford Wood.
Abstract
Social media data can boost artificial intelligence (AI) systems designed for clinical applications by expanding data sources that are otherwise limited in size. Currently, deep learning methods applied to large social media datasets are used for a variety of biomedical tasks, including forecasting the onset of mental illness and detecting outbreaks of new diseases. However, exploration of online data as a training component for diagnostic tools remains rare, despite the deluge of information available through various APIs. In this study, data from YouTube was used to train a model to detect the Omicron variant of SARS-CoV-2 from changes in the human voice. According to the ZOE Health Study, laryngitis and hoarse voice were among the most common symptoms of the Omicron variant, regardless of vaccination status.1 Omicron is characterized by pre-symptomatic transmission as well as mild or absent symptoms; therefore, impactful screening methodologies may benefit from speed, convenience, and non-invasive ergonomics. We mined YouTube to collect voice data from individuals with self-declared positive COVID-19 tests during time periods when the Omicron variant (or sub-variants, including BA.4/5) accounted for more than 95% of cases.2,3,4 Our dataset contained 183 distinct Omicron samples (28.39 hours), 192 healthy samples (33.90 hours), 138 samples from other upper respiratory infections (8.09 hours), and 133 samples from non-Omicron variants of COVID-19 (22.84 hours). We used a flexible data collection protocol and implemented a simple augmentation strategy that leveraged intra-sample variance arising from the diversity of unscripted speech (different words, phrases, and tones). This approach led to enhanced model generalization despite a relatively small number of samples. We trained a DenseNet model to detect Omicron in subjects with self-declared positive COVID-19 tests.
Our model achieved 86% sensitivity and 81% specificity when detecting healthy voices (asymptomatic negative vs. all positive). We also achieved 76% sensitivity and 70% specificity when separating symptomatic negative samples from all positive samples. These results showed that social media data may be used to counterbalance the limited amount of well-curated data commonly available for deep learning tasks in clinical medicine. Our work demonstrates the potential of digital, non-invasive diagnostic methods trained with public online data and explores novel design paradigms for diagnostic tools that rely on audio data.
Entities:
Year: 2022 PMID: 36172131 PMCID: PMC9516853 DOI: 10.1101/2022.09.13.22279673
Source DB: PubMed Journal: medRxiv
Figure 1.Workflow for COVID-19 detection pipeline. (1) Videos were mined from YouTube; (2) audio was extracted, and the human voice was separated from music and background noise; (3) audio recordings were split into segments and converted to spectrograms; (4) DenseNet model was trained on the spectrograms; (5) trained model was used to predict if samples in a testing dataset were positive or negative for COVID-19.
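Steps (3) of the workflow above (segmenting recordings and converting segments to spectrograms, which also serves as the paper's intra-sample augmentation) can be sketched as follows. This is a minimal illustration, not the authors' code: the segment length, FFT parameters, and the plain log-magnitude spectrogram are assumptions for demonstration purposes.

```python
import numpy as np

def segment_audio(wave, sr, seg_seconds=5.0):
    """Split one recording into fixed-length segments. Because the speech is
    unscripted, each segment contains different words, phrases, and tones,
    so one video yields several distinct training samples (augmentation)."""
    seg_len = int(sr * seg_seconds)
    n_segments = len(wave) // seg_len
    return [wave[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]

def spectrogram(segment, n_fft=512, hop=256):
    """Log-magnitude spectrogram via a windowed short-time FFT.
    Returns an array of shape (freq_bins, time_frames)."""
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(segment) - n_fft + 1, hop):
        frame = segment[start:start + n_fft] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.log1p(np.array(frames).T)

# Toy example: a 12 s, 1 kHz tone sampled at 16 kHz yields two full 5 s
# segments; each spectrogram would then be fed to the DenseNet classifier.
sr = 16000
t = np.arange(12 * sr) / sr
wave = np.sin(2 * np.pi * 1000 * t)
segments = segment_audio(wave, sr)
specs = [spectrogram(s) for s in segments]
```

In a real pipeline the spectrograms would typically be resized or mel-scaled before being passed to an image-style CNN such as DenseNet; the plain STFT here just keeps the sketch self-contained.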
Model performance on randomly selected test sets from the unscripted YouTube and scripted Coswara datasets:
| Dataset | Task | Sensitivity | Specificity |
|---|---|---|---|
| YouTube | Healthy Screening | 0.86 | 0.81 |
| YouTube | Symptomatic Testing | 0.76 | 0.70 |
| Coswara | Healthy Screening | 0.58 | 0.55 |
| Coswara | Symptomatic Testing | 0.52 | 0.43 |
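The sensitivity and specificity values in the table above follow from a standard confusion matrix. The counts below are illustrative only (chosen so the result matches the YouTube healthy-screening row), not the study's actual test-set sizes:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts reproducing the YouTube healthy-screening row (0.86 / 0.81).
sens, spec = sensitivity_specificity(tp=86, fn=14, tn=81, fp=19)
```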
Statistics for the COVID-19 sound/voice datasets used in this study:
| Dataset | COVID-19 Samples (All Variants) | Omicron Samples | Other URI (Symptomatic) Samples | COVID-19 Total Audio (hours) | Omicron Total Audio (hours) | URI Total Audio (hours) |
|---|---|---|---|---|---|---|
| YouTube Dataset | 316 | 183 | 138 | 51.23 | 28.39 | 8.09 |
| Coswara | 464 | 102 | | 1.92 | 0.95 | 0.47 |