Xing Guo¹, Yanrong Li², Hua Li³, Xueqin Li⁴, Xu Chang⁵, Xuemei Bai⁶, Zhanghong Song⁷, Junfeng Li¹, Kefeng Li⁸.
Abstract
COVID-19 shares many symptoms with seasonal influenza and community-acquired pneumonia (CAP). Because the response to COVID-19 differs dramatically from the response to influenza or CAP, this multicenter study aimed to develop and validate a multivariate model that accurately discriminates COVID-19 from influenza and CAP. Three independent cohorts from two hospitals were included: 50 patients in the discovery and internal validation sets and 55 in the external validation cohort. Twelve variables were collected, including symptoms, blood tests, first reverse transcription-polymerase chain reaction (RT-PCR) results, and chest CT images. An integrated multi-feature model (RT-PCR, CT features, and blood lymphocyte percentage) built with the random forest algorithm achieved a diagnostic accuracy of 92.0% (95% CI: 73.9 - 99.1) in the training set and 96.6% (95% CI: 79.6 - 99.9) in the internal validation cohort. The model also performed well in the external validation cohort, with an area under the receiver operating characteristic curve of 0.93 (95% CI: 0.79 - 1.00), an F1 score of 0.80, and a Matthews correlation coefficient (MCC) of 0.76. In conclusion, the multivariate machine learning model could be an efficient tool for COVID-19 screening in nonendemic regions with high rates of influenza and CAP in the post-COVID-19 era.
Keywords: COVID-19; diagnostic model; influenza; multi-feature; random forest
Year: 2020 PMID: 33085645 PMCID: PMC7655178 DOI: 10.18632/aging.104132
Source DB: PubMed Journal: Aging (Albany NY) ISSN: 1945-4589 Impact factor: 5.682
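The abstract reports each accuracy with a 95% CI but does not name the interval method. One common choice for a proportion such as accuracy is the exact Clopper-Pearson interval. The sketch below assumes that method and finds each bound by bisection on the binomial CDF, using only the standard library. The function names are hypothetical. For 24 correct predictions out of 25, it returns roughly 79.6% to 99.9%. That is the same interval the performance table gives for internal validation, but it is only a consistency check and not a statement of how the authors computed their intervals.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_root(f, lo: float, hi: float, iters: int = 100) -> float:
    """Root of a monotone f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
        else:
            hi = mid  # root lies in [lo, mid]
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) CI for a binomial proportion x / n."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, 0.0, 1.0)
    # Upper bound: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else bisect_root(
        lambda p: binom_cdf(x, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper


# Example: 24 of 25 patients classified correctly.
lo, hi = clopper_pearson(24, 25)
print(f"{24 / 25:.1%} (95% CI: {lo:.1%} - {hi:.1%})")  # 96.0% (95% CI: 79.6% - 99.9%)
```

Bisection on the binomial CDF avoids needing an inverse incomplete beta function. With 100 halvings of [0, 1], the result is limited only by floating-point precision.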
Patient characteristics in the discovery and internal validation cohorts.
| Characteristic | COVID-19 (n = 8) | Influenza (n = 8) | CAP (n = 34) | P value |
|---|---|---|---|---|
| Age (years) | 25.1 (24.2 - 62.5) | 29.5 (25.6 - 54.3) | 31 (23.1 - 56.4) | 0.93 |
| Male, n (%) | 4 (50%) | 4 (50%) | 22 (64.5%) | 0.36 |
| Fever, n (%) | 2 (25%) | 4 (50%) | 15 (44.1%) | 0.43 |
| Cough, n (%) | 2 (25%) | 1 (12.5%) | 12 (35.3%) | 0.37 |
| Sore throat, n (%) | 2 (25%) | 1 (12.5%) | 3 (8.8%) | 0.22 |
| Fatigue, n (%) | 2 (25%) | 2 (25%) | 4 (11.7%) | 0.27 |
| WBC (×10⁹/L) | 5.3 (3.6 - 6) | 4.9 (3.2 - 6.2) | 5.5 (4.1 - 6.5) | 0.65 |
| Lymphocyte count (×10⁹/L) | 0.93 (0.76 - 1.35) | 1.4 (0.9 - 1.9) | 1.5 (1.3 - 2.1) | 0.08 |
| Lymphocyte percentage (%) | 14.2 ± 6.3 | 36.4 ± 7.1 | 33.9 ± 10.1 | <0.001 |
Data are presented as median (interquartile range, IQR), n (%), or mean ± standard deviation (SD). The chi-square test was used for categorical variables and the Kruskal-Wallis test for continuous variables. WBC: white blood cell count.
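The footnote names the Kruskal-Wallis test for continuous variables. The following is a minimal standard-library sketch of that test, not the authors' code. It ranks all observations together, giving tied values their mean rank, then compares the rank sums of the groups and applies the usual tie correction. With three groups, as in these tables, H is referred to a chi-square distribution with 2 degrees of freedom. Its survival function is exactly exp(-H/2), so no statistics library is needed. The function names and input values are hypothetical.

```python
import math
from collections import Counter
from itertools import chain


def average_ranks(values: list[float]) -> list[float]:
    """Ranks 1..N in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j carry ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_3(*groups: list[float]) -> tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H and its asymptotic p-value for three groups."""
    if len(groups) != 3:
        raise ValueError("the closed-form p-value below assumes 3 groups (df = 2)")
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    # Sum of (rank sum)^2 / group size, walking the pooled list group by group.
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum**2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N) over runs of tied values.
    ties = sum(t**3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n**3 - n)
    # A chi-square distribution with 2 degrees of freedom has survival function exp(-x / 2).
    return h, math.exp(-h / 2)


# Hypothetical measurements for three diagnostic groups.
h, p = kruskal_wallis_3([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"H = {h:.2f}, p = {p:.4f}")  # H = 7.20, p = 0.0273
```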
Figure 1. Representative chest images of patients with COVID-19 (A), influenza (B), and community-acquired pneumonia (CAP) (C).
Patient characteristics in the external validation cohort enrolled from another hospital.
| Characteristic | COVID-19 (n = 11) | Influenza (n = 20) | CAP (n = 24) | P value |
|---|---|---|---|---|
| Age (years) | 31.0 (25.0 - 50.0) | 32.0 (26.5 - 52.3) | 39.0 (27 - 48) | 0.95 |
| Male, n (%) | 6 (54.5%) | 12 (60.0%) | 15 (62.5%) | 0.66 |
| Fever, n (%) | 9 (81.8%) | 13 (65.0%) | 20 (83.3%) | 0.65 |
| Cough, n (%) | 3 (27.3%) | 9 (45.0%) | 11 (45.8%) | 0.36 |
| Sore throat, n (%) | 1 (9.1%) | 3 (15.0%) | 2 (8.3%) | 0.81 |
| Fatigue, n (%) | 1 (9.1%) | 6 (30.0%) | 4 (16.7%) | 0.86 |
| WBC (×10⁹/L) | 6.5 (5.1 - 8.8) | 7.7 (5.5 - 9.8) | 6.8 (5.3 - 8.2) | 0.32 |
| Lymphocyte count (×10⁹/L) | 1.2 (1.0 - 1.9) | 1.3 (0.9 - 2.1) | 1.7 (1.3 - 2.3) | 0.03 |
| Lymphocyte percentage (%) | 13.1 (8.5 - 14.8) | 17.2 (11.7 - 25.5) | 22.1 (17.0 - 30.2) | 0.093 |
Data are presented as median (interquartile range, IQR) or n (%). The chi-square test was used for categorical variables and the Kruskal-Wallis test for continuous variables. WBC: white blood cell count.
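The chi-square test for categorical variables can also be sketched without a statistics library. Expected counts under independence come from the row and column totals. A table with two rows (characteristic present or absent) and three columns (one per patient group) has 2 degrees of freedom. The function name and the counts below are hypothetical and are not taken from the tables. When expected counts are small, which can happen with groups of only a few patients, the chi-square approximation is commonly regarded as unreliable, and an exact test is often preferred.

```python
import math


def chi_square_2x3(table: list[list[int]]) -> tuple[float, float]:
    """Pearson chi-square test of independence for a 2 x 3 table of counts.

    Rows: characteristic present / absent; columns: the three patient groups.
    """
    if len(table) != 2 or any(len(row) != 3 for row in table):
        raise ValueError("expected 2 rows x 3 columns")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if group and characteristic were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # df = (2 - 1) * (3 - 1) = 2, so the p-value is exp(-stat / 2).
    return stat, math.exp(-stat / 2)


# Hypothetical counts: rows = symptom present / absent, columns = three groups.
stat, p = chi_square_2x3([[10, 20, 30], [20, 20, 10]])
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```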
Figure 2. Features ranked by mean decrease in accuracy (MDA) in the random forest model classifying COVID-19 versus other infections (number of trees = 1000). LYP: lymphocyte percentage; WBC: white blood cell count.
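The ranking in Figure 2 rests on permutation importance. Shuffle one feature's values so that the feature no longer carries information about the outcome, then measure how much accuracy drops. R's randomForest package, for example, computes its mean decrease in accuracy per tree on out-of-bag samples. The sketch below is the simpler model-agnostic variant on a single evaluation set, so it illustrates the idea rather than reproducing the figure. The classifier `threshold_model` and the data are hypothetical stand-ins.

```python
import random
from typing import Callable, Sequence

Row = Sequence[float]
Model = Callable[[Row], int]


def accuracy(model: Model, X: list[Row], y: list[int]) -> float:
    return sum(model(x) == label for x, label in zip(X, y)) / len(y)


def permutation_importance(model: Model, X: list[Row], y: list[int],
                           n_repeats: int = 20, seed: int = 0) -> list[float]:
    """Mean decrease in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [list(x[:j]) + [v] + list(x[j + 1:]) for x, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def threshold_model(x: Row) -> int:
    """Hypothetical stand-in for a trained classifier: uses feature 0 only."""
    return int(x[0] > 0.5)


# Hypothetical data: the label depends on feature 0; feature 1 is unrelated.
X = [(i / 100, (i * 37 % 100) / 100) for i in range(100)]
y = [int(x0 > 0.5) for x0, _ in X]
print(permutation_importance(threshold_model, X, y))  # feature 1 scores exactly 0.0
```

A feature the model never reads cannot change its predictions when shuffled, so its importance is exactly zero. An informative feature loses roughly half its accuracy here.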
Figure 3. Development of the integrated model for differentiating COVID-19 from other respiratory diseases in the training set. (A) ROC curve for the first RT-PCR; (B) ROC curve for CT; (C) ROC curve for the integrated model, which combines the first RT-PCR result, CT, and blood LYP. (D) Cross-validation of the integrated model with a permutation test (1000 permutations). ROC: receiver operating characteristic; LYP: lymphocyte percentage.
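Panels A to C of Figure 3 are ROC curves. The sketch below traces such a curve under one assumption: each model outputs a score, for example a predicted probability of COVID-19. It sorts patients by score, lowers the threshold one distinct score at a time, and records the false-positive and true-positive rates. The AUC is then the area under those points, computed with the trapezoidal rule. The function names and scores are hypothetical, and the paper does not state which software drew its curves.

```python
def roc_points(scores: list[float], labels: list[int]) -> list[tuple[float, float]]:
    """(FPR, TPR) pairs from sweeping the decision threshold from high to low."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with equal scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: list[tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical predicted probabilities (1 = COVID-19, 0 = other infection).
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
pts = roc_points(scores, labels)
print(pts)  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc_trapezoid(pts))  # 0.75
```

A binary input such as a positive or negative RT-PCR result has only two distinct scores, so its curve has a single interior point.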
Performance of the developed models in differentiating COVID-19 from influenza and community-acquired pneumonia with similar symptoms.
| Model | Accuracy, % (95% CI) | F1 score | MCC | Sensitivity, % (95% CI) | Specificity, % (95% CI) | PPV, % (95% CI) | NPV, % (95% CI) |
|---|---|---|---|---|---|---|---|
| PCR | 92.0 (73.9 - 99.1) | 0.66 | 0.67 | 100 (75.8 - 100) | 91.3 (71.9 - 98.9) | 50.1 (21.1 - 78.9) | 100 |
| CT | 84.0 (63.9 - 95.5) | 0.67 | 0.64 | 100 (79.8 - 100) | 80.9 (58.1 - 94.6) | 50.1 (29.3 - 70.7) | 100 |
| Integrated model, training set | 92.0 (73.9 - 99.1) | 0.81 | 0.78 | 100 (89.8 - 100) | 90.5 (69.6 - 98.8) | 86.7 (74.8 - 92.2) | 100 |
| Integrated model, internal validation | 96.0 (79.6 - 99.9) | 0.86 | 0.85 | 92.1 (89.4 - 99.4) | 88.2 (83.9 - 100) | 92.3 (85.4 - 100) | 94.5 (79.4 - 99.1) |
| Integrated model, external validation | 92.7 (82.4 - 97.9) | 0.80 | 0.76 | 88.9 (51.8 - 99.7) | 93.5 (82.1 - 98.6) | 72.7 (46.6 - 89.1) | 97.7 (87.1 - 99.6) |
The models were trained with the random forest algorithm. 95% CI: 95% confidence interval; MCC: Matthews correlation coefficient; PPV (%): positive predictive value; NPV (%): negative predictive value.
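Every metric in the performance table is a function of the four confusion-matrix counts, taking COVID-19 as the positive class. The sketch below spells out the formulas. The class name `ConfusionCounts` is hypothetical. The counts in the example are also hypothetical: they were chosen because they reproduce the point estimates in the external-validation row, and the paper does not report its confusion matrix. F1 ignores true negatives, whereas MCC uses all four cells, which helps when one class is much smaller than the other.

```python
import math
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # COVID-19 patients predicted as COVID-19
    fn: int  # COVID-19 patients predicted as another infection
    tn: int  # other infections predicted as other infections
    fp: int  # other infections predicted as COVID-19

    def metrics(self) -> dict[str, float]:
        tp, fn, tn, fp = self.tp, self.fn, self.tn, self.fp
        mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        return {
            "accuracy": (tp + tn) / (tp + fn + tn + fp),
            "sensitivity": tp / (tp + fn),  # recall / true-positive rate
            "specificity": tn / (tn + fp),  # true-negative rate
            "ppv": tp / (tp + fp),          # precision
            "npv": tn / (tn + fn),
            "f1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of PPV and sensitivity
            # MCC is undefined when a margin is zero; 0.0 is a common convention.
            "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
        }


# Hypothetical counts that reproduce the external-validation point estimates.
for name, value in ConfusionCounts(tp=8, fn=1, tn=43, fp=3).metrics().items():
    print(f"{name}: {value:.3f}")
```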
Figure 4. Performance of the integrated model in the external validation cohort. (A) ROC curve; (B) cross-validation with a permutation test (1000 permutations). ROC: receiver operating characteristic.
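Figures 3D and 4B report a permutation test with 1000 permutations, but the paper does not detail the procedure. One common form refits the model on shuffled labels within cross-validation; scikit-learn's `permutation_test_score` is an example. The sketch below is a simpler, openly substituted version. It keeps a model's scores fixed, shuffles the outcome labels, and counts how often a random labelling yields an AUC at least as high as the observed one. The AUC uses the rank (Mann-Whitney) formulation. The function names and data are hypothetical.

```python
import random


def auc_rank(scores: list[float], labels: list[int]) -> float:
    """AUC as P(a positive scores higher than a negative), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def permutation_test_auc(scores: list[float], labels: list[int],
                         n_perm: int = 1000, seed: int = 0) -> tuple[float, float]:
    """Observed AUC and a one-sided p-value from randomly relabelled patients."""
    rng = random.Random(seed)
    observed = auc_rank(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keep scores fixed, break their link to the labels
        if auc_rank(scores, shuffled) >= observed:
            hits += 1
    # Counting the observed labelling as one permutation keeps p above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical scores: the ten positives all score above the ten negatives.
scores = [i / 20 for i in range(20)]
labels = [0] * 10 + [1] * 10
auc, p = permutation_test_auc(scores, labels)
print(f"AUC = {auc:.2f}, permutation p = {p:.4f}")
```

With 1000 permutations the smallest attainable p-value is 1/1001. A model whose scores are unrelated to the labels would give p-values spread across the whole interval.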