Sandeep Reddy, Ravi Bhaskar, Sandosh Padmanabhan, Karin Verspoor, Chaitanya Mamillapalli, Rani Lahoti, Ville-Petteri Makinen, Smitan Pradhan, Puru Kushwah, Saumya Sinha.
Abstract
The emergence of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in late 2019 has led not only to the worldwide coronavirus disease 2019 (COVID-19) pandemic but also to a deluge of biomedical literature. Following the release of the COVID-19 Open Research Dataset (CORD-19), comprising over 200,000 scholarly articles, we, a multidisciplinary team of data scientists, clinicians, medical researchers and software engineers, developed an innovative natural language processing (NLP) platform that combines an advanced search engine with a biomedical named entity recognition extraction package. In particular, the platform was developed to extract information relating to clinical risk factors for COVID-19 and present the results in a cluster format to support knowledge discovery. Here we describe the principles behind the development, the model and the results we obtained.
Keywords: Biomedical NLP; COVID-19; Cluster Algorithms; Text Mining
Year: 2021 PMID: 34337589 PMCID: PMC8050406 DOI: 10.1016/j.cmpbup.2021.100010
Source DB: PubMed Journal: Comput Methods Programs Biomed Update ISSN: 2666-9900
Fig. 1. Increase in pre-print and peer-reviewed publications relating to COVID-19, February to May 2020 (Source: NIH, 2020).
Fig. 2. Our biomedical NLP sequence.
Fig. 3. Spectral and Agglomerative Cluster representation for search term ‘Lymphocytopenia’. Each dot represents a paper relating to the term.
Fig. 4. Spectral and Agglomerative Cluster representation and confusion matrix for search term ‘Anosmia’. Each dot represents a paper relating to the term.
Fig. 5. Dendrogram for agglomerative clustering of search terms ‘Lymphocytopenia’ and ‘Anosmia’ respectively.
Quality metrics of dimensionality reduction and cluster coherence.
| Quality Metric | Spectral Clustering: Lymphocytopenia | Spectral Clustering: Anosmia | Agglomerative Clustering: Lymphocytopenia | Agglomerative Clustering: Anosmia |
|---|---|---|---|---|
| Silhouette Index Score | 0.890 | 0.833 | 0.929 | 0.894 |
| Calinski-Harabasz Index Score | 196.617 | 256.558 | 258.605 | 849.014 |
| Davies-Bouldin Index Score | 0.808 | 0.898 | 0.514 | 0.539 |
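
To make the three quality metrics in the table concrete, the following is a minimal sketch of their standard textbook definitions, written against the Python standard library only. It is not the authors' code: the toy 2-D `points`, the `labels`, and all function names are illustrative assumptions, and the record does not state which implementation or embedding the reported values came from.

```python
from math import dist
from statistics import mean


def _groups(points, labels):
    """Group points by cluster label (hypothetical helper)."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return groups


def _centroid(pts):
    """Coordinate-wise mean of a list of points."""
    return tuple(mean(c) for c in zip(*pts))


def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points; needs >= 2 clusters."""
    groups = _groups(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:  # singleton clusters score 0 by convention
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(mean(dist(p, q) for q in other)
                for k, other in groups.items() if k != lab)
        scores.append((b - a) / max(a, b))
    return mean(scores)


def calinski_harabasz(points, labels):
    """Between-cluster over within-cluster dispersion, each scaled by its d.o.f."""
    groups = _groups(points, labels)
    n, k = len(points), len(groups)
    overall = _centroid(points)
    between = sum(len(g) * dist(_centroid(g), overall) ** 2
                  for g in groups.values())
    within = sum(dist(p, _centroid(g)) ** 2
                 for g in groups.values() for p in g)
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(points, labels):
    """Mean, over clusters, of the worst (scatter_i + scatter_j) / centroid distance."""
    groups = list(_groups(points, labels).values())
    cents = [_centroid(g) for g in groups]
    scatter = [mean(dist(p, c) for p in g) for g, c in zip(groups, cents)]
    worst = []
    for i in range(len(groups)):
        worst.append(max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                         for j in range(len(groups)) if j != i))
    return mean(worst)


# Toy data (an assumption for illustration): two tight, well-separated clusters.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

print("Silhouette:", silhouette(points, labels))
print("Calinski-Harabasz:", calinski_harabasz(points, labels))
print("Davies-Bouldin:", davies_bouldin(points, labels))
```

Higher Silhouette and Calinski-Harabasz scores and lower Davies-Bouldin scores indicate tighter, better-separated clusters. Read that way, the agglomerative columns in the table score better than the spectral columns on all three metrics for both search terms.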