Leihong Wu, Syed Ali, Heather Ali, Tyrone Brock, Joshua Xu, Weida Tong.
Abstract
COVID-19 can lead to multiple severe outcomes, including neurological and psychological impacts. However, it is challenging to manually scan hundreds of thousands of COVID-19 articles on a regular basis. To update our knowledge, provide sound science to the public, and communicate effectively, it is critical to have an efficient means of following the most current published data. In this study, we developed NeuroCORD, an artificial intelligence (AI) language model that searches abstracts to accurately retrieve articles on COVID-19-associated neurological disorders. We applied the NeuroCORD model to the largest benchmark dataset of COVID-19, CORD-19. The model developed on the training set yielded 94% prediction accuracy on the test set, a result that was subsequently verified by two experts in the field. In addition, when applied to 96,000 non-labeled articles published after 2020, the NeuroCORD model identified approximately 3% of them as relevant to the study of COVID-19-associated neurological disorders, whereas conventional keyword searching retrieved only 0.5%. In conclusion, NeuroCORD provides an opportunity to profile neurological disorders resulting from COVID-19 in a rapid and efficient fashion, and its general framework could be used to study other COVID-19-related emerging health issues.
Keywords: BERT model; COVID-19; information retrieval; language model; machine learning; neurological disorders; text mining
Year: 2022 PMID: 36011614 PMCID: PMC9408703 DOI: 10.3390/ijerph19169974
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 4.614
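The abstract contrasts NeuroCORD with conventional keyword searching. As a rough illustration of what such a keyword filter might look like, and why it can miss relevant articles, here is a minimal sketch. The term list `NEURO_TERMS`, the helper names, and the sample abstracts are all hypothetical; the search terms actually used in the study are not given in this record.

```python
import re

# Hypothetical term list for illustration only; the study's real
# keyword set is not stated in this record.
NEURO_TERMS = ["neurological", "stroke", "encephalitis", "anosmia", "seizure"]


def keyword_hit(abstract, terms=NEURO_TERMS):
    """Return True if any term occurs as a whole word (case-insensitive)."""
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"
    return re.search(pattern, abstract, flags=re.IGNORECASE) is not None


def retrieval_rate(abstracts, terms=NEURO_TERMS):
    """Fraction of abstracts flagged by the keyword filter."""
    if not abstracts:
        return 0.0
    return sum(keyword_hit(a, terms) for a in abstracts) / len(abstracts)


# Made-up abstracts, not taken from CORD-19.
abstracts = [
    "Ischemic stroke in patients hospitalized with COVID-19.",
    # Relevant to neurology, but contains none of the listed terms.
    "Loss of smell and taste persisted for months after infection.",
    "Vaccine distribution logistics in rural areas.",
    # The plural "Seizures" does not match "seizure" as a whole word.
    "Seizures reported in a pediatric case series.",
]

rate = retrieval_rate(abstracts)
print(f"Keyword filter flagged {rate:.0%} of the sample abstracts")
```

Only the first abstract is flagged here, even though the second and fourth are arguably relevant. This kind of vocabulary mismatch is one plausible reason an embedding-based model could retrieve more relevant articles than exact keyword matching.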
Figure 1. The overall study pipeline for developing the NeuroCORD model.
Result of grid optimization on embedding and modeling algorithms.
| Embedding Algorithm | Modeling Algorithm | 10-Fold CV (n = 2561) | Testing (n = 819) |
|---|---|---|---|
| allenai-specter | RF | 0.959 | 0.923 |
| allenai-specter | KNN | 0.906 | 0.904 |
| allenai-specter | MLP | 0.972 | 0.940 |
| roberta-large | RF | 0.936 | 0.855 |
| roberta-large | KNN | 0.839 | 0.832 |
| roberta-large | MLP | 0.952 | 0.926 |
| glove_6B_300d | RF | 0.930 | 0.889 |
| glove_6B_300d | KNN | 0.857 | 0.882 |
| glove_6B_300d | MLP | 0.943 | 0.940 |
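The grid in the table above pairs each embedding with each modeling algorithm and scores the pair by 10-fold cross-validation. The following is a minimal, standard-library-only sketch of that evaluation loop, not the authors' implementation. The synthetic vectors from `toy_embeddings` stand in for real abstract embeddings, and the nearest-centroid classifier is a swapped-in stand-in: only KNN corresponds to an algorithm named in the table, and RF and MLP are omitted for brevity. All function names are assumptions made for illustration.

```python
import random
from itertools import product
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k most cosine-similar training vectors."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: cosine(pair[0], x), reverse=True)
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


def centroid_predict(train_x, train_y, x):
    """Stand-in model: pick the class whose mean vector is most similar to x."""
    best_label, best_sim = None, -2.0
    for label in sorted(set(train_y)):
        members = [v for v, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        sim = cosine(centroid, x)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label


def cross_val_accuracy(predict, xs, ys, n_folds=10, seed=0):
    """Mean accuracy over n_folds shuffled, non-overlapping folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        tr_x = [xs[i] for i in idx if i not in held]
        tr_y = [ys[i] for i in idx if i not in held]
        correct = sum(predict(tr_x, tr_y, xs[i]) == ys[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / n_folds


def toy_embeddings(n, dim, noise, seed):
    """Synthetic two-class vectors standing in for real abstract embeddings."""
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n):
        label = i % 2
        center = [1.0 if j % 2 == label else 0.0 for j in range(dim)]
        xs.append([c + rng.gauss(0, noise) for c in center])
        ys.append(label)
    return xs, ys


def grid_search(embeddings, models, n_folds=10):
    """Score every (embedding, model) pair; returns {pair: cv_accuracy}."""
    return {
        (e_name, m_name): cross_val_accuracy(predict, xs, ys, n_folds)
        for (e_name, (xs, ys)), (m_name, predict)
        in product(embeddings.items(), models.items())
    }


embeddings = {
    "toy_clean": toy_embeddings(100, 8, noise=0.2, seed=1),
    "toy_noisy": toy_embeddings(100, 8, noise=1.5, seed=2),
}
models = {"KNN": knn_predict, "centroid": centroid_predict}

results = grid_search(embeddings, models)
best = max(results, key=results.get)
for (e_name, m_name), acc in sorted(results.items()):
    print(f"{e_name:10s} {m_name:9s} cv_accuracy={acc:.3f}")
print("best pair:", best)
```

In a real replication, the synthetic vectors would be replaced by embeddings of the labeled abstracts, and the best pair chosen by cross-validation would then be checked once on the held-out test set, mirroring the two score columns in the table.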
Results of baseline tests.
| Modeling Algorithm | Train (NeuroCORD) | Test (NeuroCORD) | Train (B1) | Test (B1) | Train (B2) | Test (B2) |
|---|---|---|---|---|---|---|
| KNN | 0.906 | 0.904 | 0.508 | 0.514 | 0.698 | 0.703 |
| RF | 0.959 | 0.923 | 0.500 | 0.479 | 0.746 | 0.709 |
| MLP | 0.972 | 0.940 | 0.499 | 0.471 | 0.775 | 0.722 |
Figure 2. t-SNE analysis of new publications (in 2021). Distribution of articles (a) in the training dataset, (b) in the new publications, and (c) in the combined dataset. (Predicted) positive articles are in blue/red/purple; negative articles are in gray. The red and green bounding boxes in (a,c) are the two clusters discussed in this paper.