Wanshan Ning, Shijun Lei, Jingjing Yang, Yukun Cao, Peiran Jiang, Qianqian Yang, Jiao Zhang, Xiaobei Wang, Fenghua Chen, Zhi Geng, Liang Xiong, Hongmei Zhou, Yaping Guo, Yulan Zeng, Heshui Shi, Lin Wang, Yu Xue, Zheng Wang.
Abstract
Data from patients with coronavirus disease 2019 (COVID-19) are essential for guiding clinical decision making, for furthering the understanding of this viral disease, and for diagnostic modelling. Here, we describe an open resource containing data from 1,521 patients with pneumonia (including COVID-19 pneumonia) consisting of chest computed tomography (CT) images, 130 clinical features (from a range of biochemical and cellular analyses of blood and urine samples) and laboratory-confirmed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) clinical status. We show the utility of the database for the prediction of COVID-19 morbidity and mortality outcomes using a deep learning algorithm trained with data from 1,170 patients and 19,685 manually labelled CT slices. In an independent validation cohort of 351 patients, the algorithm discriminated between negative, mild and severe cases with areas under the receiver operating characteristic curve of 0.944, 0.860 and 0.884, respectively. The open database may have further uses in the diagnosis and management of patients with COVID-19.
Year: 2020 PMID: 33208927 PMCID: PMC7723858 DOI: 10.1038/s41551-020-00633-5
Source DB: PubMed Journal: Nat Biomed Eng ISSN: 2157-846X Impact factor: 25.671
The data characteristics of cohort 1 and cohort 2
| Type | Cohort 1: no. of patients | Cohort 1: no. of CT slices | Cohort 2: no. of patients | Cohort 2: no. of CT slices |
|---|---|---|---|---|
| Control | 222 | 54,853 | 106 | 27,386 |
| Mild | 23 | 3,246 | 1 | 458 |
| Regular | 415 | 92,485 | 181 | 45,027 |
| Severe | 146 | 38,583 | 56 | 17,549 |
| Critically ill | 65 | 7,901 | 7 | 1,010 |
| Suspected | 299 | 75,859 | 0 | 0 |
| Deceased | 53 | 4,935 | 4 | 0 |
| Cured | 450 | 105,636 | 212 | 59,362 |
| Unknown | 146 | 31,644 | 29 | 4,682 |
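The table is internally consistent with the abstract: the six diagnostic rows (control to suspected) sum to 1,170 patients in cohort 1 and 351 in cohort 2, i.e. 1,521 in total, and the deceased, cured and unknown rows account for exactly the confirmed COVID-19 cases (mild to critically ill), for both patients and CT slices. A short Python check makes this explicit; the counts are transcribed by hand from the table, and the helper names are illustrative only.

```python
# Counts transcribed from the cohort table:
# (cohort 1 patients, cohort 1 slices, cohort 2 patients, cohort 2 slices)
TABLE = {
    "Control": (222, 54_853, 106, 27_386),
    "Mild": (23, 3_246, 1, 458),
    "Regular": (415, 92_485, 181, 45_027),
    "Severe": (146, 38_583, 56, 17_549),
    "Critically ill": (65, 7_901, 7, 1_010),
    "Suspected": (299, 75_859, 0, 0),
    "Deceased": (53, 4_935, 4, 0),
    "Cured": (450, 105_636, 212, 59_362),
    "Unknown": (146, 31_644, 29, 4_682),
}

DIAGNOSIS = ["Control", "Mild", "Regular", "Severe", "Critically ill", "Suspected"]
CONFIRMED = ["Mild", "Regular", "Severe", "Critically ill"]
OUTCOME = ["Deceased", "Cured", "Unknown"]


def column_total(rows, col):
    """Sum one column of TABLE over the given row labels."""
    return sum(TABLE[row][col] for row in rows)


def check_table():
    """Return per-cohort patient totals, asserting that the outcome rows
    add up to the confirmed COVID-19 rows for both patients and slices."""
    totals = {}
    for name, pat_col, slice_col in (("cohort 1", 0, 1), ("cohort 2", 2, 3)):
        for col in (pat_col, slice_col):
            assert column_total(OUTCOME, col) == column_total(CONFIRMED, col)
        totals[name] = column_total(DIAGNOSIS, pat_col)
    return totals


totals = check_table()
print(totals)                 # {'cohort 1': 1170, 'cohort 2': 351}
print(sum(totals.values()))   # 1521
```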
Fig. 1: Statistics of the study data.
a, Numbers of control subjects, subjects with suspected COVID-19, and patients with confirmed mild, regular, severe and critically ill forms of COVID-19 in cohorts 1 and 2. b, Numbers of patients who were cured, who died and whose outcome is unknown in the two cohorts. c, Numbers of chest CT slices from patients with or without COVID-19 pneumonia and from cured and deceased cases in the two cohorts. d, Statistical comparisons of clinical features (CFs) between subjects with COVID-19 (type I and type II) and controls (P < 10⁻⁵), between type II and type I cases (P < 10⁻⁹), and between deceased and cured cases (P < 10⁻⁵). A two-sided unpaired t-test was used for normally distributed data; otherwise, a Mann–Whitney U test was used. ALG, albumin/globulin ratio; ALB, albumin; ALP, alkaline phosphatase; APTT, activated partial thromboplastin time; AST, aspartate aminotransferase; BUN, urea nitrogen; CA, calcium; CRP, C-reactive protein; DBIL, direct bilirubin; DD, D-dimer; EO, eosinophil count; EOP, eosinophil percentage; GGT, γ-glutamyltransferase; GLB, globulin; GLU, glucose; HSCRP, high-sensitivity C-reactive protein; IL-6, interleukin-6; INR, international normalized ratio; LDH, lactate dehydrogenase; LY, lymphocyte count; LYP, lymphocyte percentage; MOP, monocyte percentage; NE, neutrophil count; NEP, neutrophil percentage; PCT, procalcitonin; PT, prothrombin time; RDWCV, red cell volume distribution width; RDWSD, standard deviation of red cell volume distribution width; TBIL, total bilirubin; and WBC, white blood cell. The full list and details of the CFs are presented in Supplementary Data 1. Further details on the statistical analyses are presented in Supplementary Data 3.
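The caption's test-selection rule falls back to the Mann–Whitney U test when a clinical feature is not normally distributed; how normality was assessed is not stated. As a standard-library Python sketch of that non-parametric branch (not the authors' code, helper names hypothetical), the test pools both groups, assigns mid-ranks to ties, computes U and takes a two-sided P value from the tie-corrected normal approximation. Library implementations such as SciPy's `mannwhitneyu` may add a continuity correction or use an exact method, so their P values can differ slightly.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test with the tie-corrected normal
    approximation (no continuity correction). Returns (U_x, p_value)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction term: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence of a difference
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, 2 * (1 - NormalDist().cdf(abs(z)))


u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u, round(p, 3))  # U = 0.0, two-sided P of about 0.05
```

With groups this small the normal approximation is rough; it is used here only to keep the sketch dependency-free.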
Fig. 2: The hybrid learning architecture of HUST-19.
HUST-19 comprises a 13-layer convolutional neural network (CNN) for predicting individual CT slices, a second 13-layer CNN that converts slice-level predictions into patient-level predictions of clinical outcomes, a 7-layer deep neural network (DNN) that predicts the clinical outcomes of patients with COVID-19 from clinical features (CFs), and the PLR algorithm, which integrates the CT- and CF-based results to predict morbidity or mortality outcomes.
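If PLR is read as penalized logistic regression (an assumption; the abbreviation is not expanded here), the integration step amounts to fitting a regularized logistic model on the per-patient outputs of the CT-based and CF-based networks. The pure-Python sketch below covers a binary outcome such as deceased versus cured, uses an L2 penalty fitted by batch gradient descent, and runs on made-up scores; the function names, penalty type and hyperparameters are hypothetical, and this is not the authors' implementation. The three-class morbidity task (control, type I, type II) would need a multinomial or one-vs-rest extension.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_penalized_logistic(X, y, lam=0.1, lr=0.5, epochs=2000):
    """Minimize mean log-loss + lam/2 * ||w||^2 (intercept unpenalized)
    by batch gradient descent. X: list of feature rows, y: 0/1 labels."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(w, b, x):
    """Integrated probability for one patient's (CT score, CF score)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical per-patient inputs: (CT-model probability, CF-model probability)
scores = [(0.9, 0.8), (0.8, 0.6), (0.7, 0.9), (0.2, 0.3), (0.3, 0.1), (0.1, 0.4)]
labels = [1, 1, 1, 0, 0, 0]
w, b = fit_penalized_logistic(scores, labels)
print(round(predict_proba(w, b, (0.85, 0.7)), 3))  # above 0.5: both scores are high
```

The penalty keeps the two weights finite even though this toy data is perfectly separable, which an unpenalized logistic regression could not do.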
Details on the performance evaluation of HUST-19 for the prediction of individual CT slices, morbidity outcomes, and mortality outcomes
| Prediction | Type | AUC | Sn | Sp | Ac | PPV | NPV | MCC |
|---|---|---|---|---|---|---|---|---|
| Individual CT slices | NiCT | 0.994 | 98.40% | 99.64% | 99.42% | 99.12% | 99.55% | 0.9861 |
| Individual CT slices | pCT | 0.996 | 97.00% | 90.68% | 91.97% | 72.74% | 99.16% | 0.7940 |
| Individual CT slices | nCT | 0.991 | 85.47% | 99.12% | 92.38% | 99.00% | 87.25% | 0.8557 |
| Morbidity, CT based | Control | 0.919 | 51.99% | 98.01% | 84.66% | 91.46% | 83.32% | 0.6115 |
| Morbidity, CT based | Type I | 0.804 | 94.70% | 39.17% | 67.76% | 62.29% | 87.45% | 0.4105 |
| Morbidity, CT based | Type II | 0.838 | 19.98% | 98.33% | 83.05% | 74.39% | 83.53% | 0.3257 |
| Morbidity, CF based | Control | 0.882 | 49.95% | 96.75% | 84.55% | 84.43% | 84.57% | 0.5677 |
| Morbidity, CF based | Type I | 0.856 | 92.58% | 47.56% | 68.92% | 61.82% | 86.68% | 0.4385 |
| Morbidity, CF based | Type II | 0.879 | 44.96% | 98.04% | 84.27% | 88.94% | 83.56% | 0.5583 |
| Morbidity, HUST-19 | Control | 0.978 | 85.01% | 99.80% | 95.31% | 99.46% | 93.86% | 0.8897 |
| Morbidity, HUST-19 | Type I | 0.921 | 87.82% | 79.20% | 83.34% | 79.55% | 87.59% | 0.6708 |
| Morbidity, HUST-19 | Type II | 0.931 | 70.86% | 92.67% | 87.94% | 72.79% | 91.99% | 0.6415 |
| Mortality, CT based | – | 0.808 | 76.47% | 76.40% | 76.41% | 13.40% | 98.55% | 0.5000 |
| Mortality, CF based | – | 0.822 | 81.13% | 70.32% | 71.49% | 24.86% | 96.86% | 0.4994 |
| Mortality, HUST-19 | – | 0.856 | 88.24% | 78.26% | 78.73% | 16.67% | 99.26% | 0.5236 |
Sensitivity (Sn), specificity (Sp), accuracy (Ac), positive predictive value (PPV), negative predictive value (NPV) and Matthews correlation coefficient (MCC) were calculated from tenfold cross-validation.
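All six threshold-dependent metrics in these tables follow from a binary confusion matrix; for the three-class morbidity rows, each class is presumably treated as positive against the other two (one-vs-rest). A minimal Python sketch of the standard definitions follows, with a hypothetical function name and synthetic counts. How the per-fold results were aggregated (pooled counts versus averaged metrics) is not stated, so recomputing table values this way may not match them exactly.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for one class treated as positive.
    Returns fractions; multiply by 100 for the percentages in the tables."""
    sn = tp / (tp + fn)                    # sensitivity (recall)
    sp = tn / (tn + fp)                    # specificity
    ac = (tp + tn) / (tp + fp + tn + fn)   # accuracy
    ppv = tp / (tp + fp)                   # positive predictive value
    npv = tn / (tn + fn)                   # negative predictive value
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation
    return {"Sn": sn, "Sp": sp, "Ac": ac, "PPV": ppv, "NPV": npv, "MCC": mcc}


# Synthetic example: 45 positives (40 detected), 55 negatives (45 correct)
print({k: round(v, 3) for k, v in binary_metrics(tp=40, fp=10, tn=45, fn=5).items()})
```

Note that PPV and NPV, unlike Sn and Sp, depend on class prevalence, which is why the mortality rows combine reasonable sensitivity with low PPV.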
Details on the performance evaluation of HUST-19 for the prediction of morbidity outcomes using data from cohort 2
| Prediction | Type | AUC | Sn | Sp | Ac | PPV | NPV | MCC |
|---|---|---|---|---|---|---|---|---|
| CT based | Control | 0.895 | 53.57% | 94.47% | 83.70% | 77.59% | 85.06% | 0.5486 |
| CT based | Type I | 0.775 | 86.67% | 50.36% | 70.85% | 69.33% | 74.47% | 0.4027 |
| CT based | Type II | 0.832 | 32.73% | 95.08% | 84.33% | 58.06% | 87.15% | 0.3546 |
| CF based | Control | 0.888 | 54.72% | 99.59% | 86.04% | 98.31% | 83.56% | 0.6668 |
| CF based | Type I | 0.834 | 72.53% | 78.70% | 75.50% | 78.57% | 72.68% | 0.5124 |
| CF based | Type II | 0.845 | 71.63% | 78.82% | 77.49% | 42.45% | 92.65% | 0.4200 |
| HUST-19 | Control | 0.944 | 51.19% | 98.30% | 85.89% | 91.49% | 84.93% | 0.6150 |
| HUST-19 | Type I | 0.860 | 80.56% | 76.26% | 78.68% | 81.46% | 75.18% | 0.5673 |
| HUST-19 | Type II | 0.884 | 80.00% | 78.41% | 78.68% | 43.56% | 94.95% | 0.4743 |
The Sn, Sp, Ac, PPV, NPV and MCC values were calculated directly on cohort 2, the independent validation cohort.
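The three HUST-19 AUCs in this table (0.944 for control, 0.860 for type I and 0.884 for type II) are the values quoted in the abstract for negative, mild and severe cases. Assuming each is a one-vs-rest AUC, it equals the probability that a randomly chosen case of that class receives a higher score for the class than a randomly chosen case of any other class, with ties counted as one half. A small Python sketch of that pairwise definition, with a hypothetical function name and made-up scores:

```python
def one_vs_rest_auc(scores, labels, positive):
    """AUC of `positive` versus all other classes: the fraction of
    (positive, negative) pairs in which the positive case scores higher,
    counting ties as one half. O(n_pos * n_neg), fine for a sketch."""
    pos = [s for s, lab in zip(scores, labels) if lab == positive]
    neg = [s for s, lab in zip(scores, labels) if lab != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up predicted probabilities of "Control" for five cases
labels = ["Control", "Type I", "Type II", "Control", "Type I"]
scores = [0.9, 0.4, 0.2, 0.6, 0.7]
print(round(one_vs_rest_auc(scores, labels, "Control"), 3))  # 0.833
```

Here 5 of the 6 control-versus-other pairs are ranked correctly; the pair (0.6, 0.7) is the one miss.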
Fig. 3: Performance evaluation of HUST-19 based on tenfold cross-validation.
a, Slice-based prediction of NiCT, pCT and nCT images. b,c, Integration of CT and CF data for predicting morbidity outcomes in cohort 1 (207 controls, 384 type I patients and 149 type II patients with both CT and CF data) (b) and cohort 2 (106 controls, 180 type I patients and 56 type II patients with both CT and CF data) (c). d, Integration of CT and CF data for predicting mortality outcomes in the merged cohort (594 cured and 19 deceased cases with both CT and CF data). e–h, Confusion matrices for the predictions in a–d, respectively, derived from tenfold cross-validation under the sensitive threshold. Further details on the performance evaluation are provided in Supplementary Data 4.
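With only 19 deceased cases among the 613 in panel d, the way cases are divided into ten folds matters: a purely random split can leave some folds with no deceased cases at all. The caption does not say whether the folds were stratified; as an assumption, a stratified tenfold split can be sketched in pure Python by dealing each class round-robin across the folds. The helper names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Assign sample indices to k folds so that each class is spread as
    evenly as possible across folds. Returns a list of k index lists."""
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for lab in sorted(by_class):
        idxs = by_class[lab]
        rng.shuffle(idxs)  # random order within the class
        for j, idx in enumerate(idxs):
            folds[(offset + j) % k].append(idx)
        offset += len(idxs)  # continue the round-robin where this class ended
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


labels = ["cured"] * 594 + ["deceased"] * 19  # case counts from panel d
folds = stratified_kfold(labels)
print([sum(labels[i] == "deceased" for i in f) for f in folds])
# deceased per fold: [2, 2, 2, 1, 2, 2, 2, 2, 2, 2]
```

Continuing the round-robin across classes also keeps fold sizes within one case of each other (61 or 62 here).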
Fig. 4: Prediction of potential morbidity outcomes of 299 suspected cases without laboratory confirmation of SARS-CoV-2 status at the time of enrolment.
a, HUST-19 was used to predict whether each of the 299 suspected cases was COVID-19 negative, type I or type II (Supplementary Data 4). b, t-SNE analysis of the classification efficiency of HUST-19 for the predictions described in a. c, Schematics showing the clinical courses of two suspected cases of COVID-19, patient 324 and patient 610, who were predicted by HUST-19 to correspond to type I and type II cases, respectively. Jan, January; Feb, February; Mar, March.