David Oniani, Guoqian Jiang, Hongfang Liu, Feichen Shen.
Abstract
OBJECTIVE: As coronavirus disease 2019 (COVID-19) emerged rapidly and grew into an unprecedented pandemic, a knowledge repository for the disease became crucial. To address this need, a new machine-readable dataset, the COVID-19 Open Research Dataset (CORD-19), was released. Building on this dataset, our objective was to build computable co-occurrence network embeddings to assist association detection among COVID-19-related biomedical entities.
Keywords: COVID-19; association extraction; co-occurrence network embeddings; coronavirus infectious diseases
Year: 2020 PMID: 32458963 PMCID: PMC7314034 DOI: 10.1093/jamia/ocaa117
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1. Study workflow.
Evaluation results for the 4 edge embedding operations combined with 6 machine learning algorithms
| Operations | Algorithms | Average Precision (AP) | ROC score | Precision | Recall | F1 score |
|---|---|---|---|---|---|---|
| Hadamard | DT | 0.79 | 0.82 | 0.84 | 0.82 | 0.81 |
| | LR | 0.89 | 0.83 | 0.86 | 0.82 | 0.81 |
| | SVM | 0.80 | 0.81 | 0.85 | 0.81 | 0.81 |
| | RF | 0.92 | 0.89 | 0.87 | 0.86 | 0.86 |
| | NB | 0.82 | 0.84 | 0.86 | 0.84 | 0.84 |
| | MLP | 0.56 | 0.60 | 0.63 | 0.60 | 0.57 |
| Average | DT | 0.81 | 0.84 | 0.85 | 0.84 | 0.84 |
| | LR | 0.94 | 0.92 | 0.87 | 0.85 | 0.85 |
| | SVM | 0.83 | 0.86 | 0.87 | 0.86 | 0.85 |
| | RF | 0.97 | 0.96 | 0.91 | 0.91 | 0.90 |
| | NB | 0.88 | 0.91 | 0.91 | 0.91 | 0.91 |
| | MLP | 0.78 | 0.84 | 0.84 | 0.84 | 0.84 |
| L1 | DT | 0.75 | 0.80 | 0.80 | 0.80 | 0.80 |
| | LR | 0.95 | 0.94 | 0.89 | 0.89 | 0.89 |
| | SVM | 0.87 | 0.89 | 0.90 | 0.89 | 0.89 |
| | RF | 0.96 | 0.95 | 0.89 | 0.88 | 0.88 |
| | NB | 0.85 | 0.88 | 0.89 | 0.88 | 0.88 |
| | MLP | 0.87 | 0.89 | 0.89 | 0.89 | 0.89 |
| L2 | DT | 0.75 | 0.80 | 0.80 | 0.80 | 0.80 |
| | LR | 0.94 | 0.93 | 0.89 | 0.88 | 0.88 |
| | SVM | 0.87 | 0.88 | 0.90 | 0.88 | 0.88 |
| | RF | 0.96 | 0.95 | 0.89 | 0.88 | 0.88 |
| | NB | 0.85 | 0.87 | 0.88 | 0.87 | 0.87 |
| | MLP | 0.85 | 0.87 | 0.88 | 0.87 | 0.87 |
DT: decision tree; LR: logistic regression; MLP: multilayer perceptron; NB: naïve Bayes; RF: random forest; ROC: receiver-operating characteristic; SVM: support vector machine.
Highest value.
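The four edge embedding operations named in the table are not defined on this page. A minimal sketch, assuming they follow the binary operators commonly used with node2vec-style embeddings (element-wise product, mean, absolute difference, and squared difference of the two node vectors), might look like the following. The function name `edge_embedding` and the example vectors are hypothetical and are not taken from the study.

```python
import numpy as np


def edge_embedding(u, v, op):
    """Combine two node embeddings into one edge embedding.

    Assumption: the operator definitions follow the common node2vec-style
    conventions; the study's exact implementation is not shown in the source.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    if u.shape != v.shape:
        raise ValueError("node embeddings must have the same shape")

    if op == "hadamard":
        return u * v              # element-wise product
    if op == "average":
        return (u + v) / 2.0      # element-wise mean
    if op == "l1":
        return np.abs(u - v)      # element-wise absolute difference
    if op == "l2":
        return (u - v) ** 2       # element-wise squared difference
    raise ValueError(f"unknown operation: {op!r}")


# Hypothetical 2-dimensional node embeddings, for illustration only.
node_a = [1.0, 2.0]
node_b = [3.0, -2.0]
edge_features = {op: edge_embedding(node_a, node_b, op)
                 for op in ("hadamard", "average", "l1", "l2")}
```

Each resulting vector has the same dimensionality as the node embeddings, so it can be passed directly as a feature vector to a classifier such as those listed in the table.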
Figure 2. Receiver-operating characteristic scores for the average operation with 6 machine learning algorithms. DT: decision tree; LR: logistic regression; MLP: multilayer perceptron; NB: naïve Bayes; RF: random forest; SVM: support vector machine.
Figure 3. Clustering visualization for (A) diseases and (B) all the biomedical entities. COVID-19 (coronavirus disease 2019) is shown in red, SARS (severe acute respiratory syndrome) in black, coronavirus in green, pneumonia in blue, fever in cyan, fibrosis in yellow, diarrhea in magenta, bronchitis in olive drab, Ebola in pink, influenza in dark orchid, and Zika in khaki; all genes are shown in purple, all mutations in silver, and all chemicals in salmon.
Top 10 intracluster closest biomedical entities for 5 selected coronavirus infectious diseases
| Coronavirus infectious diseases | Top 10 closest entities | Cosine similarity score |
|---|---|---|
| COVID-19 (cluster #6) | VP35 (Gene) | 0.9777 |
| | HD11 (Gene) | 0.9774 |
| | Coronavirus infection process (Disease) | 0.9700 |
| | Fibroblast growth factor (FGF)-2 (Gene) | 0.9655 |
| | Acute respiratory infection illness (Disease) | 0.9596 |
| | PIGS (Gene) | 0.9576 |
| | TGF alpha (Gene) | 0.9571 |
| | SFPQ (Gene) | 0.9561 |
| | Tumor necrosis factor (TNF) (Gene) | 0.9549 |
| | Praziquantel (Chemical) | 0.9537 |
| Pulmonary coronavirus infection (cluster #1) | PTP (Gene) | 0.9754 |
| | SARS-CoV–infected human airway epithelia cell cultures (Disease) | 0.9699 |
| | “5'-tgg gat tca aca” (Chemical) | 0.9672 |
| | Trachea nasal respiratory epithelial cells and llamas (lama glama) (Disease) | 0.9658 |
| | Suppressor of cytokine signaling 3 (Gene) | 0.9620 |
| | KAT (Gene) | 0.9604 |
| | CD32 (Gene) | 0.9573 |
| | Maternal SARS infection (Disease) | 0.9553 |
| | Respiratory syndrome coronavirus (MERS-CoV) infections (Disease) | 0.9547 |
| | S27 (Gene) | 0.9546 |
| SARS-COV infection damages lung (cluster #2) | IL-1α (Gene) | 0.9560 |
| | Sucralfate prn (Chemical) | 0.9589 |
| | Acute respiratory syndrome-cov infection (Disease) | 0.9555 |
| | IL-5– and IL-13–producing ilc-iis (Gene) | 0.9487 |
| | HAP1 (Gene) | 0.9342 |
| | FSK (Chemical) | 0.9337 |
| | Low fever (Disease) | 0.9328 |
| | HIV and Ebola virus infection (Disease) | 0.9327 |
| | YKL-40 (Gene) | 0.9288 |
| | ETF (Gene) | 0.9280 |
| Coronavirus upper respiratory infection (cluster #23) | Viruses | 0.9890 |
| | Plasmin (Gene) | 0.9719 |
| | JAM-1 (Gene) | 0.9654 |
| | TNF receptor–associated factor 6 (Gene) | 0.9648 |
| | GPC3 (Gene) | 0.9613 |
| | Renin (Gene) | 0.9582 |
| | ZO-1 (Gene) | 0.9563 |
| | Cathepsin G (Gene) | 0.9556 |
| | rs5743313 (Mutation) | 0.9547 |
| | Alpha1 antitrypsin (Gene) | 0.9544 |
| Coronavirus-infected pneumonia (cluster #10) | Respiratory syncytial viral infection (Disease) | 0.9923 |
| | Pegylated interferon-alpha (Chemical) | 0.9891 |
| | IFITM6 (Gene) | 0.9872 |
| | Feline b (Chemical) | 0.9858 |
| | E119V (Mutation) | 0.9854 |
| | Epac2 (Gene) | 0.9850 |
| | GFTP2 (Gene) | 0.9849 |
| | Hepatitis coronavirus infection (Disease) | 0.9843 |
| | Ouabain (Chemical) | 0.9797 |
| | LY6G (Gene) | 0.9786 |
Cluster ID and the type of entities are marked in parentheses.
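The ranking in the table is by cosine similarity between entity embeddings within a cluster. As a rough sketch of how such a top-k list could be computed, assuming each entity is represented by a non-zero embedding vector and the candidate set has already been restricted to one cluster, consider the following. The function name `top_k_closest`, the entity names, and the toy vectors are hypothetical and do not come from the study.

```python
import numpy as np


def top_k_closest(query, embeddings, k=10):
    """Return the k entities most similar to `query` by cosine similarity.

    `embeddings` maps entity name -> vector and is assumed to contain only
    entities from the same cluster as `query`, all with non-zero norm.
    """
    q = np.asarray(embeddings[query], dtype=float)
    names = [name for name in embeddings if name != query]
    matrix = np.stack([np.asarray(embeddings[n], dtype=float) for n in names])

    # Cosine similarity: dot product divided by the product of the norms.
    sims = (matrix @ q) / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q))

    # Sort by descending similarity and keep the first k entries.
    order = np.argsort(-sims)[:k]
    return [(names[i], float(sims[i])) for i in order]


# Toy 2-dimensional embeddings for one hypothetical cluster.
cluster = {
    "entity_a": [1.0, 0.0],
    "entity_b": [1.0, 0.1],
    "entity_c": [0.0, 1.0],
    "entity_d": [-1.0, 0.0],
}
closest = top_k_closest("entity_a", cluster, k=2)
```

In this toy cluster, `entity_b` ranks first because its vector points in nearly the same direction as that of `entity_a`, while `entity_d`, which points the opposite way, falls outside the top 2.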