Xiaofeng Zhu, Bin Song, Feng Shi, Yanbo Chen, Rongyao Hu, Jiangzhang Gan, Wenhai Zhang, Man Li, Liye Wang, Yaozong Gao, Fei Shan, Dinggang Shen.
Abstract
With the rapid worldwide spread of Coronavirus disease (COVID-19), early diagnosis of COVID-19 and prediction of the time at which patients may convert to the severe stage are of great importance for designing effective treatment plans and reducing clinicians' workloads. In this study, we propose a joint classification and regression method that determines whether a patient will later develop severe symptoms, formulated as a classification task, and, if so, predicts the conversion time, formulated as a regression task. To do this, the proposed method takes into account 1) a weight for each sample, to reduce the influence of outliers and address the imbalanced classification problem, and 2) a weight for each feature, via a sparsity regularization term, to remove redundant features of the high-dimensional data and learn the information shared across the two tasks, i.e., classification and regression. To our knowledge, this study is the first work to jointly predict disease progression and conversion time, which could help clinicians deal with potential severe cases in time or even save patients' lives. Experiments were conducted on a real data set of 408 chest computed tomography (CT) scans from two hospitals. Results show that our method achieves the best classification (e.g., 85.91% accuracy) and regression (e.g., a correlation coefficient of 0.462) performance among all compared methods. Moreover, our proposed method yields 76.97% accuracy for predicting the severe cases, a correlation coefficient of 0.524, and a difference of 0.55 days for the conversion time.
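The abstract combines three ingredients: per-sample weights, a sparsity penalty on a feature-by-task weight matrix so that both tasks rely on the same features, and two targets (a severity label and a conversion time). The sketch below illustrates only that general idea under stated assumptions; it is not the authors' Eq. (6) or Algorithm 1. It assumes a squared loss for both tasks, fixed inverse-frequency sample weights (the paper instead weights samples to suppress outliers as well), and an l2,1-norm penalty (sum of row-wise Euclidean norms) minimized by proximal gradient descent. All function names are hypothetical.

```python
import math

# Hypothetical helpers illustrating sample-weighted, jointly sparse two-task
# regression; an assumed simplification, not the method of the paper.


def class_balanced_weights(labels):
    """Inverse-frequency weights so that every class has the same total weight."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


def prox_l21(W, t):
    """Proximal operator of t * sum_j ||W[j]||_2: shrink each row (feature) as a group."""
    out = []
    for row in W:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        out.append([scale * v for v in row])
    return out


def fit_joint(X, Y, s, lam, step=0.01, iters=500):
    """Minimize 0.5 * sum_i s_i * ||x_i W - y_i||^2 + lam * ||W||_{2,1}.

    X: n x d features; Y: n x m targets (e.g. column 0 = +1/-1 severity label,
    column 1 = conversion time); s: per-sample weights. For convergence, step
    should not exceed 1 / (largest eigenvalue of X^T diag(s) X).
    """
    n, d, m = len(X), len(X[0]), len(Y[0])
    W = [[0.0] * m for _ in range(d)]
    for _ in range(iters):
        # Weighted residuals: R[i] = s_i * (x_i W - y_i)
        R = []
        for i in range(n):
            pred = [sum(X[i][j] * W[j][c] for j in range(d)) for c in range(m)]
            R.append([s[i] * (pred[c] - Y[i][c]) for c in range(m)])
        # Gradient of the smooth term: X^T R
        G = [[sum(X[i][j] * R[i][c] for i in range(n)) for c in range(m)]
             for j in range(d)]
        # Gradient step, then row-wise group soft-thresholding
        W = prox_l21([[W[j][c] - step * G[j][c] for c in range(m)]
                      for j in range(d)], step * lam)
    return W
```

Because the penalty acts on whole rows of W, a feature is kept or discarded for both tasks together, and rows can become exactly zero; this is what lets a single sparsity term perform feature selection shared across the classification and regression tasks.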
Keywords: CT Scan data; Coronavirus disease; Feature selection; Imbalance classification; Sample selection
Year: 2020 PMID: 33091741 PMCID: PMC7547024 DOI: 10.1016/j.media.2020.101824
Source DB: PubMed Journal: Med Image Anal ISSN: 1361-8415 Impact factor: 8.545
Demographic information of all subjects. The numbers in parentheses denote the number of subjects in each class. Notably, 52 cases converted to severe after an average of 5.64 days, and 34 cases were severe at admission.
| | Severe cases (86) | Non-severe cases (322) |
|---|---|---|
| Female/male | 35/51 | 160/162 |
| Age (years) | 55.43 ± 16.35 | 49.30 ± 15.70 |
Fig. 1. Visualization of the segments of a chest CT scan image: 5 lung lobes (left) and 18 pulmonary segments (right).
Algorithm 1. Pseudocode of our optimization method.
Summary of all methods. The first and second blocks list the classification methods and the regression methods, respectively, and all multi-task methods are listed in the third block. Abbreviations: FS, feature selection; SW, sample weight; IMB, imbalanced classification; CLASS, classification; REG, regression.
| Methods | FS | SW | IMB | CLASS | REG |
|---|---|---|---|---|---|
| SVM | √ | ||||
| L1SVM | √ | √ | |||
| Random forest | √ | √ | |||
| SFS | √ | √ | √ | √ | |
| Ridge regression | √ | ||||
| L1SVR | √ | √ | |||
| Lasso | √ | √ | |||
| Random forest | √ | √ | √ | ||
| MSFS | √ | √ | √ | ||
| Hyperface | √ | √ | |||
| COVID-CAPS | √ | √ | √ | ||
| CNNE | √ | √ | |||
| Proposed | √ | √ | √ | √ | √ |
Fig. 2. Classification results of all methods.
Fig. 3. ROC curves of all methods.
Classification results (%) of three methods. "Proposed w/o Regression" denotes the model of Eq. (3), i.e., the proposed method without the regression task.
| Metric | SFS | Proposed w/o Regression | Proposed |
|---|---|---|---|
| Accuracy | 78.18 ± 3.71 | 83.25 ± 2.44 | |
| Sensitivity | 50.65 ± 6.33 | 70.73 ± 3.36 | |
| Specificity | 86.31 ± 2.69 | 86.60 ± 3.45 | |
| AUC | 73.88 ± 6.66 | 81.74 ± 3.30 | |
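The table reports four classification metrics. As an illustration of how they relate (not the authors' evaluation code), the sketch below computes them in plain Python, assuming label 1 marks a severe case and 0 a non-severe one; the function names are hypothetical. It also shows why accuracy alone is not informative for this cohort: with 322 of 408 scans non-severe, always predicting "non-severe" already yields 322/408 ≈ 78.9% accuracy but 0% sensitivity.

```python
# Hypothetical helpers; label convention assumed: 1 = severe, 0 = non-severe.


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on severe cases) and specificity from hard labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true, scores):
    """Area under the ROC curve: probability that a random severe case scores
    higher than a random non-severe case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# A trivial classifier that always predicts "non-severe" on the 86/322 cohort
baseline = classification_metrics([1] * 86 + [0] * 322, [0] * 408)
```

This AUC uses the rank (Mann-Whitney) formulation, which equals the area under the empirical ROC curve; its quadratic cost is negligible for a cohort of a few hundred scans.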
Regression results of all methods. CC: correlation coefficient; RMSE: root mean square error.
| Methods | CC | RMSE |
|---|---|---|
| Ridge regression | 0.329 ± 0.158 | 20.02 ± 9.724 |
| L1SVR | 0.351 ± 0.085 | 10.49 ± 2.072 |
| Lasso | 0.354 ± 0.165 | 9.92 ± 9.571 |
| Random forest | 0.406 ± 0.188 | 13.22 ± 6.762 |
| MSFS | 0.408 ± 0.092 | 9.29 ± 1.104 |
| Hyperface | | 7.58 ± 0.719 |
| COVID-CAPS | 0.467 ± 0.023 | 7.89 ± 0.451 |
| CNNE | 0.459 ± 0.033 | 8.61 ± 0.084 |
| Proposed | 0.462 ± 0.056 | |
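CC and RMSE capture different aspects of the conversion-time predictions: CC measures how well the predictions track the observed values up to a linear rescaling, while RMSE measures the absolute size of the errors in the units of the target. A minimal plain-Python sketch (illustrative only; function names are hypothetical) makes the difference concrete: predictions that are uniformly one unit too high have CC = 1 but RMSE = 1.

```python
import math


def pearson_cc(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(y_true, y_pred))
    var_t = sum((a - mt) ** 2 for a in y_true)
    var_p = sum((b - mp) ** 2 for b in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the target."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


# Predictions that are uniformly one unit too high: perfect correlation, nonzero error
observed = [1.0, 2.0, 3.0, 4.0]
shifted = [2.0, 3.0, 4.0, 5.0]
```

Reporting both is therefore complementary: a method can rank patients' conversion times well (high CC) while still being systematically biased (high RMSE), or vice versa.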
Fig. 4. Scatter plots and the corresponding correlation coefficients (CCs) of all methods for predicting the severe cases.
Results of all evaluation metrics for different values of λ.
| λ | Accuracy | Sensitivity | Specificity | AUC | CC | RMSE |
|---|---|---|---|---|---|---|
| 0.001 | 81.59 ± 1.79 | 75.92 ± 2.09 | 83.11 ± 2.19 | 82.99 ± 3.43 | 0.423 ± 0.026 | 9.58 ± 0.719 |
| 0.01 | 83.57 ± 2.21 | 78.31 ± 2.31 | 84.98 ± 1.91 | 83.28 ± 4.14 | 0.451 ± 0.053 | 8.88 ± 1.217 |
| 0.1 | 82.15 ± 2.08 | 73.16 ± 2.83 | 84.56 ± 2.89 | 83.21 ± 3.55 | 0.449 ± 0.043 | 8.11 ± 1.191 |
| 1 | 84.57 ± 3.25 | 77.11 ± 3.06 | 86.75 ± 2.85 | 84.65 ± 3.74 | 0.428 ± 0.039 | 9.35 ± 1.084 |
| 10 | 85.69 ± 2.20 | 76.97 ± 3.36 | 88.02 ± 1.45 | 85.91 ± 2.27 | 0.462 ± 0.056 | 7.35 ± 1.087 |
| 100 | 84.21 ± 2.74 | 77.12 ± 3.50 | 86.11 ± 3.06 | 83.65 ± 4.52 | 0.448 ± 0.071 | 7.68 ± 1.464 |
| 1000 | 83.44 ± 3.74 | 78.53 ± 3.55 | 84.55 ± 2.80 | 83.78 ± 3.82 | 0.439 ± 0.081 | 8.49 ± 1.641 |
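The table is a one-dimensional sweep of the regularization parameter λ over a logarithmic grid. The sketch below only re-reads the mean values reported in the table (standard deviations omitted) and shows that λ = 10 is best on accuracy, AUC, CC and RMSE alike, while sensitivity peaks at λ = 1000. The dictionary layout and the function name `best_lambda` are illustrative assumptions, not part of the paper.

```python
# Mean values transcribed from the λ table above (standard deviations omitted).
LAMBDA_RESULTS = {
    0.001: {"accuracy": 81.59, "sensitivity": 75.92, "auc": 82.99, "cc": 0.423, "rmse": 9.58},
    0.01: {"accuracy": 83.57, "sensitivity": 78.31, "auc": 83.28, "cc": 0.451, "rmse": 8.88},
    0.1: {"accuracy": 82.15, "sensitivity": 73.16, "auc": 83.21, "cc": 0.449, "rmse": 8.11},
    1: {"accuracy": 84.57, "sensitivity": 77.11, "auc": 84.65, "cc": 0.428, "rmse": 9.35},
    10: {"accuracy": 85.69, "sensitivity": 76.97, "auc": 85.91, "cc": 0.462, "rmse": 7.35},
    100: {"accuracy": 84.21, "sensitivity": 77.12, "auc": 83.65, "cc": 0.448, "rmse": 7.68},
    1000: {"accuracy": 83.44, "sensitivity": 78.53, "auc": 83.78, "cc": 0.439, "rmse": 8.49},
}


def best_lambda(results, metric, higher_is_better=True):
    """Return the λ whose mean score on `metric` is best."""
    choose = max if higher_is_better else min
    return choose(results, key=lambda lam: results[lam][metric])
```

When λ is tuned this way on new data, it is normally chosen on validation folds held apart from the folds used to report final results, so that the reported numbers are not optimistically biased.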
Fig. 5. Objective function values of Eq. (6) over iterations.
Distribution of the top selected regions across HU ranges.
| HU range | Left lung (6) | Right lung (16) |
|---|---|---|
| | 0 | 2 |
| | 1 | 8 |
| | 2 | 5 |
| | 1 | 0 |
| [50, ∞] | 2 | 1 |
Fig. 6. Ratios of infected volumes in the HU ranges [-700, -500] and [-500, -200], where patient IDs 1–322 are non-severe cases and patient IDs 323–408 are severe cases.
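Fig. 6 and the preceding table summarize infection in terms of Hounsfield-unit (HU) intervals. As a rough illustration of how such a per-patient ratio can be computed from a CT volume, the sketch below assumes that each voxel of a segmented infection region comes with its HU value and that the ratio is normalized by the total lung voxel count. Both the normalization and the half-open interval convention are assumptions, as the captions do not state them, and the function names are hypothetical.

```python
# Hypothetical sketch: the paper does not specify its denominator or interval
# convention; here the ratio is infected voxels in a HU range divided by all
# lung voxels, and ranges are treated as half-open [lo, hi).

HU_RANGES = [(-700, -500), (-500, -200)]  # the two ranges named in Fig. 6


def infected_ratio(infected_hu, lung_voxels, lo, hi):
    """Fraction of lung voxels that are infected and have lo <= HU < hi."""
    inside = sum(1 for h in infected_hu if lo <= h < hi)
    return inside / lung_voxels


def ratio_profile(infected_hu, lung_voxels, ranges=HU_RANGES):
    """One ratio per HU range, e.g. as a small feature vector for a patient."""
    return [infected_ratio(infected_hu, lung_voxels, lo, hi) for lo, hi in ranges]


# Toy example: HU values of 6 infected voxels in a lung of 20 voxels
profile = ratio_profile([-650, -600, -550, -450, -300, -100], 20)
```

In the toy example, three infected voxels fall in [-700, -500) and two in [-500, -200), giving ratios of 0.15 and 0.10; the voxel at -100 HU lies outside both ranges.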