Wei Tse Li, Jiayan Ma, Neil Shende, Grant Castaneda, Jaideep Chakladar, Joseph C Tsai, Lauren Apostol, Christine O Honda, Jingyue Xu, Lindsay M Wong, Tianyi Zhang, Abby Lee, Aditi Gnanasekar, Thomas K Honda, Selena Z Kuo, Michael Andrew Yu, Eric Y Chang, Mahadevan Raj Rajasekaran, Weg M Ongkeko.
Abstract
BACKGROUND: The recent Coronavirus Disease 2019 (COVID-19) pandemic has placed severe stress on healthcare systems worldwide, which is amplified by the critical shortage of COVID-19 tests.
Keywords: COVID-19; Diagnostic model; Machine learning
Year: 2020 PMID: 32993652 PMCID: PMC7522928 DOI: 10.1186/s12911-020-01266-z
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Clinical Variables Summary of Meta-analysis

Continuous variables

| Clinical Variable | # of Data | Mean | Median | Variance |
|---|---|---|---|---|
| Age | 389 | 38.91306 | 39 | 21.85783 |
| NumberOfFamilyMembersInfected | 54 | 3.37037 | 2 | 2.6338 |
| neutrophil | 103 | 6.854078 | 3.31 | 12.62838 |
| SerumLevelsOfWhiteBloodCell | 130 | 7.031223 | 5.965 | 4.250785 |
| lymphocytes | 135 | 2.022841 | 0.98 | 4.207139 |
| Platelets | 50 | 220.32 | 185.5 | 146.3334 |
| CReactiveProteinLevels | 139 | 31.18187 | 15 | 40.4953 |
| Eosinophils | 8 | 0.06125 | 0.01 | 0.070078 |
| RedBloodCells | 4 | 4.225 | 4.205 | 0.189011 |
| Hemoglobin | 24 | 45.5 | 14.5 | 49.99953 |
| Procalcitonin | 33 | 2.586394 | 0.07 | 12.54482 |
| DurationOfIllness | 88 | 14.06818 | 12 | 8.970653 |
| DaysToDeath | 3 | 12.66667 | 12 | 6.548961 |
| DaysBeforeSymptomsAppear | 38 | 7.368421 | 6 | 5.142297 |
| NumberOfAffectedLobes | 24 | 1.75 | 2 | 1.163687 |
| TimeBetweenAdmissionAndDiagnosis | 47 | 5.893617 | 6 | 4.116568 |
| bodyTemperature | 67 | 37.6209 | 37.5 | 0.972999 |
| Hematocrit | 7 | 0.320286 | 0.355 | 0.078175 |
| ActivatedPartialThromboplastinTime | 9 | 33.18889 | 33.4 | 3.642784 |
| fibrinogen | 9 | 3.685556 | 3.91 | 0.752184 |
| urea | 19 | 3.123158 | 3 | 0.863884 |

Discrete variables

| Variable | Category | Number | Percentage (%) |
|---|---|---|---|
| Sex | M | 194 | 49.4898 |
| | F | 198 | 50.5102 |
| Community Transmission | Yes | 93 | 37.5 |
| | No | 46 | 18.54839 |
| | No/Wuhan | 109 | 43.95161 |
| Neutrophil | low | 15 | 11.81102 |
| | normal | 83 | 65.35433 |
| | high | 29 | 22.83465 |
| Serum Levels Of White Blood Cell | low | 55 | 32.35294 |
| | normal | 94 | 55.29412 |
| | high | 21 | 12.35294 |
| Lymphocytes | low | 86 | 48.86364 |
| | normal | 73 | 41.47727 |
| | high | 17 | 9.659091 |
| C Reactive Protein (CRP) Levels | normal | 60 | 37.97468 |
| | high | 98 | 62.02532 |
| CT Scan Results | pos | 124 | 89.20863 |
| | neg | 15 | 10.79137 |
| RT-PCR Results | pos | 100 | 96.15385 |
| | neg | 4 | 3.846154 |
| X-ray Result | pos | 35 | 74.46809 |
| | neg | 12 | 25.53191 |
| GGO | Yes | 92 | 96.84211 |
| | No | 3 | 3.157895 |
| Diarrhea | Yes | 30 | 45.45455 |
| | No | 36 | 54.54545 |
| Fever | Yes | 261 | 91.25874 |
| | No | 25 | 8.741259 |
| Coughing | Yes | 164 | 82.82828 |
| | No | 34 | 17.17172 |
| Shortness Of Breath | Yes | 45 | 60 |
| | No | 30 | 40 |
| Sore Throat | Yes | 37 | 60.65574 |
| | No | 24 | 39.34426 |
| Nausea/Vomiting | Yes | 18 | 52.94118 |
| | No | 16 | 47.05882 |
| Pregnant | Yes | 43 | 66.15385 |
| | No | 22 | 33.84615 |
| Fatigue | Yes | 8 | 61.53846 |
| | No | 5 | 38.46154 |
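
Each percentage in the discrete-variable rows is a category's share of the patients with a recorded value for that variable, which is why the denominators differ from one variable to the next. As a minimal sketch, the Community Transmission percentages can be reproduced from their counts (the helper name `category_percentages` is illustrative and does not come from the paper):

```python
def category_percentages(counts):
    """Return each category's share (in percent) of the non-missing total."""
    total = sum(counts.values())
    return {level: 100.0 * n / total for level, n in counts.items()}


# Counts copied from the Community Transmission rows of the table.
community_transmission = {"Yes": 93, "No": 46, "No/Wuhan": 109}

for level, pct in category_percentages(community_transmission).items():
    print(f"{level}: {pct:.5f}")
```

The same function applied to the Sex counts (194 and 198) gives the 49.4898 / 50.5102 split shown in the table.
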
Fig. 1 Select correlations with continuous clinical variables for COVID-19 patients. (a) Correlations between two continuous variables (Spearman, p < 0.05). (b) Correlations between one continuous and one categorical variable (Kruskal-Wallis test, p < 0.05)
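
To make the Spearman correlation named in this caption concrete, here is a minimal pure-Python sketch: it ranks both variables (ties get the average rank) and takes the Pearson correlation of the ranks. The input values are hypothetical and are not taken from the study data, the function names are illustrative, and the sketch computes only the coefficient, not the p-value used for the 0.05 cutoff.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values, not taken from the study data.
age = [23, 35, 41, 58, 64, 70]
crp = [4.0, 9.5, 8.0, 30.2, 41.0, 55.3]
print(round(spearman_rho(age, crp), 3))
```

Because only ranks enter the calculation, the coefficient is unaffected by skewed distributions such as the CRP and procalcitonin values summarized in the table, where mean and median differ widely.
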
Fig. 2 Correlations between gender and another categorical variable. (a) Correlation between lymphocyte level categories and gender. (b) Correlation between neutrophil level categories and gender. (c) Correlation between serum leukocyte level categories and gender. A contingency table and a bar plot of the number of patients in each level are displayed for each correlation
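
A contingency table of the kind described in this caption can be sketched as follows. The patient records are hypothetical, and the Pearson chi-square statistic is shown only as one common way to quantify association between two categorical variables; this excerpt does not state which test the authors applied to categorical pairs.

```python
from collections import Counter


def contingency_table(rows, cols):
    """Count co-occurrences of two categorical variables."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


def chi_square_statistic(table):
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical records, not taken from the study data.
sex = ["F", "F", "F", "F", "M", "M", "M", "M"]
lymphocytes = ["low", "normal", "normal", "normal", "low", "low", "low", "normal"]

row_levels, col_levels, table = contingency_table(sex, lymphocytes)
print(row_levels, col_levels, table)
print(chi_square_statistic(table))
```
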
Fig. 3 Summary of COVID-19 patient clustering using SOM. (a) Plot of topographic error of the 2D SOM grid vs. size of the grid. (b) 2D plot of SOM neurons after retaining only the most significant clinical variables for analysis. Each small grid represents a neuron, and the size of the square in each grid represents the number of patients associated with each neuron. The color code corresponds to the superclusters presented in panel (d). (c) Plot of the number of patients in each neuron. (d) 3D dendrogram summarizing the neurons into superclusters. (e) 2D dendrogram with the same information as the dendrogram in panel (d). In both dendrograms, the vertical axis represents the relative distance between clusters; the distance between any two clusters can be read from the branch point where they diverge. (f) Gradient map where light blue regions of the SOM depict higher similarity of neurons with each other. (g) Boxplots of immune-associated clinical variables that differentiate superclusters. (h) Boxplots in which superclusters 1 and 3 display similar trends. (i) Boxplots in which only one supercluster has a median at a different value from the other three. All variables have been previously normalized. For binary variables, only three positions on the vertical axis are possible: the bottom one is no, the middle one is yes, and the top one is missing. For the gender (sex) variable, the bottom position is female, the middle is male, and the top one is missing
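
Topographic error, plotted against grid size in panel (a), is commonly defined as the fraction of samples whose best and second-best matching neurons are not adjacent on the SOM grid. The sketch below follows that common definition under stated assumptions: a rectangular grid, neighbors differing by at most one step in row and column, and hypothetical neuron weights. It is not the authors' implementation, which this excerpt does not show.

```python
import math


def topographic_error(samples, weights):
    """Fraction of samples whose two closest neurons are not grid neighbors.

    weights maps a (row, col) grid position to that neuron's weight vector.
    """
    errors = 0
    for x in samples:
        # Sort neuron positions by Euclidean distance to the sample.
        ranked = sorted(weights, key=lambda pos: math.dist(x, weights[pos]))
        (r1, c1), (r2, c2) = ranked[0], ranked[1]
        # Assumption: neighbors differ by at most one step in row and column.
        if max(abs(r1 - r2), abs(c1 - c2)) > 1:
            errors += 1
    return errors / len(samples)


# Hypothetical 1x3 maps over a single normalized variable.
ordered_map = {(0, 0): [0.0], (0, 1): [1.0], (0, 2): [2.0]}
twisted_map = {(0, 0): [0.0], (0, 1): [2.0], (0, 2): [1.0]}
samples = [[0.1], [1.9]]

print(topographic_error(samples, ordered_map))
print(topographic_error(samples, twisted_map))
```

A map that preserves the ordering of the data scores lower than a twisted one, which is why the error is a natural criterion when comparing candidate grid sizes.
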
Fig. 4 Summary of XGBoost classification of COVID-19 and influenza patients. (a) ROC curve of prediction. (b) Precision-recall curve of prediction. (c) Confusion matrix of prediction. (d) Variables most important for classification, listed by decreasing order of importance. (e) 6-level sample model of SOM decision tree construction
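
The evaluation quantities behind panels (a) to (c) can be illustrated without any model. The sketch below computes confusion-matrix counts, precision, recall, and the area under the ROC curve from hypothetical labels and scores; the encoding (1 for COVID-19, 0 for influenza), the 0.5 threshold, and the function names are assumptions made for illustration. The AUC uses the pairwise definition: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = COVID-19, 0 = influenza) and model scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fp, fn, tn, precision, recall, roc_auc(y_true, scores))
```
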
Fig. 5 Classification of COVID-19 vs. influenza patients using RIDGE, random forest, and LASSO models. ROC curves and AUC for each model are presented
Fig. 6 Classification of COVID-19 vs. influenza patients in different demographic cohorts. RIDGE, LASSO, random forest (RF), and XGBoost classification models were applied to 5 different cohorts of patients
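
Evaluating a classifier separately within each cohort amounts to grouping predictions by cohort before scoring them. A minimal sketch, assuming accuracy as the metric and using placeholder cohort names (`cohort_A`, `cohort_B`) because the caption does not name the five cohorts; the records are hypothetical:

```python
from collections import defaultdict


def accuracy_by_cohort(records):
    """records: iterable of (cohort, true_label, predicted_label) tuples."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for cohort, truth, predicted in records:
        totals[cohort] += 1
        hits[cohort] += int(truth == predicted)
    return {cohort: hits[cohort] / totals[cohort] for cohort in totals}


# Hypothetical predictions; cohort names are placeholders.
records = [
    ("cohort_A", 1, 1),
    ("cohort_A", 0, 0),
    ("cohort_A", 1, 0),
    ("cohort_A", 0, 0),
    ("cohort_B", 1, 1),
    ("cohort_B", 0, 1),
]

print(accuracy_by_cohort(records))
```
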