Zekun Jiang1,2,3, Jin Yin1, Peilun Han1,2,3, Nan Chen4, Qingbo Kang1,2,3, Yue Qiu1,2,3, Yiyue Li1,2,3, Qicheng Lao1,2,3,5, Miao Sun1,6, Dan Yang7, Shan Huang7, Jiajun Qiu1, Kang Li1,2,3,5,8.
Abstract
Background: This study set out to develop a computed tomography (CT)-based wavelet transforming radiomics approach for grading pulmonary lesions caused by COVID-19 and to validate it using real-world data.
Keywords: COVID-19; computed tomography (CT); machine learning; quantitative image analysis; radiomics
Year: 2022 PMID: 36185061 PMCID: PMC9511418 DOI: 10.21037/qims-22-252
Source DB: PubMed Journal: Quant Imaging Med Surg ISSN: 2223-4306
Figure 1. Overview of the study workflow showing the process from CT images to radiomics features to machine learning models. The red arrow indicates the location of the lesion. CT, computed tomography; H, high-pass decomposition filter; L, low-pass decomposition filter; FIR, finite impulse response; GLCM, gray-level co-occurrence matrix; GLRLM, gray-level run length matrix; GLSZM, gray-level size zone matrix; NGTDM, neighboring gray tone difference matrix; GLDM, gray-level dependence matrix; ROC, receiver operating characteristic; AUC, area under the receiver operating characteristic curve.
Figure 2. Flowchart showing the application of inclusion and exclusion criteria to the study population. RT-PCR, reverse transcription-polymerase chain reaction; CT, computed tomography; N, the number of patients; M, the number of lesions.
Clinical characteristics of the study population
| Characteristics | All lesions (n=187) | Mild lesions (n=108) | Moderate/severe lesions (n=79) | P value |
|---|---|---|---|---|
| Age (years), mean ± SD | 44.88±12.15 | 46.51±12.30 | 42.65±11.58 | 0.030 |
| Sex, n (%) | | | | 0.112 |
| Male | 112 (59.89) | 70 (64.81) | 42 (53.16) | |
| Female | 75 (40.11) | 38 (35.19) | 37 (46.84) | |
| Cohort, n (%) | | | | 0.039 |
| Training | 127 (67.91) | 80 (74.07) | 47 (59.50) | |
| Test | 60 (32.09) | 28 (25.93) | 32 (40.50) | |
SD, standard deviation.
Details of the wavelets used in this study
| Wavelets | Abbreviation | N (A) | Decomposition mode |
|---|---|---|---|
| Haar | haar | N/A | LLH, LHL, LHH, HLL, HLH, HHL, HHH, LLL |
| Daubechies N | dbN | 1, 10, 20 | |
| Symlets N | symN | 2, 10, 20 | |
| Coiflets N | coifN | 1, 2, 3, 4, 5 | |
| Biorthogonal Nr.Nd | bior Nr.Nd | 1.1, 2.2, 3.3, 4.4, 5.5 | |
| Reverse biorthogonal Nr.Nd | rbio Nr.Nd | 1.1, 2.2, 3.3, 4.4, 5.5 | |
| “Discrete” FIR approximation of Meyer | dmey | N/A | |
A, decomposition progression of wavelets. “N” in dbN, symN, and coifN refers to the number of vanishing moments. In biorNr.Nd and rbioNr.Nd, “Nr” is the order of the functions used for reconstruction and “Nd” is the order of the functions used for decomposition (36). FIR, finite impulse response; N/A, not applicable; L, low-pass decomposition filter; H, high-pass decomposition filter.
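The wavelet table can be read as a filter bank: each listed wavelet combined with each three-axis sub-band. The sketch below enumerates that bank. It assumes the "Decomposition mode" cell applies to every wavelet (it is only filled in the first row), and it uses PyWavelets-style names such as `db10` and `bior1.1` as an assumed naming convention; the source does not say which software was used.

```python
from itertools import product

# Wavelet families and orders as listed in the table above.
# ASSUMPTION: names are concatenated PyWavelets-style (e.g. "db10", "bior1.1").
WAVELET_ORDERS = {
    "haar": [""],
    "db": ["1", "10", "20"],
    "sym": ["2", "10", "20"],
    "coif": ["1", "2", "3", "4", "5"],
    "bior": ["1.1", "2.2", "3.3", "4.4", "5.5"],
    "rbio": ["1.1", "2.2", "3.3", "4.4", "5.5"],
    "dmey": [""],
}


def wavelet_names():
    """Return every wavelet name in the table, e.g. 'haar', 'db1', 'bior1.1'."""
    return [
        family + order
        for family, orders in WAVELET_ORDERS.items()
        for order in orders
    ]


def subbands(ndim=3):
    """Return all low/high-pass combinations along ndim axes ('LLL' ... 'HHH')."""
    return ["".join(filters) for filters in product("LH", repeat=ndim)]


def filter_bank():
    """Pair every wavelet with every sub-band.

    ASSUMPTION: all eight sub-bands are used for every wavelet.
    """
    return [(wavelet, band) for wavelet in wavelet_names() for band in subbands()]


if __name__ == "__main__":
    bank = filter_bank()
    print(len(wavelet_names()), "wavelets x", len(subbands()), "sub-bands =", len(bank))
    print(("bior1.1", "LLL") in bank)
```

Under that reading, the bior1.1 LLL combination named in the Figure 3 caption is one entry of this bank.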
Performance of different machine learning pipelines
| Machine learning pipeline | Training AUC | Cross-validation mean AUC | Test AUC |
|---|---|---|---|
| BorutaShap + RF* | 0.98* | 0.85* | 0.88* |
| BorutaShap + SVM | 0.99 | 0.82 | 0.84 |
| BorutaShap + LR | 0.96 | 0.82 | 0.83 |
| BorutaShap + MLP | 0.99 | 0.83 | 0.85 |
| Boruta + RF | 0.98 | 0.84 | 0.87 |
| Boruta + SVM | 0.97 | 0.84 | 0.86 |
| Boruta + LR | 0.95 | 0.83 | 0.84 |
| Boruta + MLP | 0.99 | 0.85 | 0.87 |
| LASSO + RF | 0.94 | 0.78 | 0.80 |
| LASSO + SVM | 0.94 | 0.78 | 0.80 |
| LASSO + LR | 0.98 | 0.84 | 0.86 |
| LASSO + MLP | 0.97 | 0.83 | 0.85 |
| RFE + RF | 0.97 | 0.84 | 0.86 |
| RFE + SVM | 0.96 | 0.85 | 0.87 |
| RFE + LR | 0.97 | 0.80 | 0.86 |
| RFE + MLP | 0.97 | 0.82 | 0.85 |
*, the best-performing pipeline. AUC, area under the receiver operating characteristic curve; RF, random forest; SVM, support vector machine; LR, logistic regression; MLP, multilayer perceptron; LASSO, least absolute shrinkage and selection operator; RFE, recursive feature elimination.
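For readers less familiar with the metric in this table, the AUC equals the probability that a randomly chosen positive case (here, a moderate/severe lesion) receives a higher score than a randomly chosen negative case (a mild lesion), with ties counted as one half. The following is a minimal sketch of that definition on made-up scores; the function name and the toy data are illustrative and are not taken from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: iterable of 0 (negative) or 1 (positive)
    scores: iterable of predicted probabilities or decision scores
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Count positive/negative pairs ranked correctly; ties count one half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical toy data, not study data.
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.10, 0.40, 0.35, 0.80]
    print(auc(toy_labels, toy_scores))
```

The pairwise form is quadratic in the number of cases, which is acceptable for cohorts of this size; a rank-based implementation would be preferable for large datasets.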
Radiomic features selected in original images
| Feature | Feature type | Feature name | Correlation (A) | P value |
|---|---|---|---|---|
| F1 | GLRLM | Long run high gray-level emphasis | 0.441 | <0.001 |
| F2 | GLCM | Autocorrelation | 0.408 | <0.001 |
| F3 | GLCM | Joint average | 0.414 | <0.001 |
| F4 | GLCM | Cluster shade | −0.476 | <0.001 |
| F5 | GLDM | Large dependence high gray-level emphasis | 0.395 | <0.001 |
| F6 | NGTDM | Busyness | −0.345 | <0.001 |
| F7 | GLDM | Gray-level non-uniformity | −0.227 | 0.010 |
| F8 | GLSZM | Gray-level non-uniformity | −0.207 | 0.020 |
| F9 | GLCM | Correlation | 0.168 | 0.042 |
A, Spearman correlation coefficient with two-sided test. GLRLM, gray-level run length matrix; GLCM, gray-level co-occurrence matrix; GLDM, gray-level dependence matrix; NGTDM, neighboring gray tone difference matrix; GLSZM, gray-level size zone matrix.
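The correlation column reports Spearman coefficients, which are Pearson correlations computed on ranks. A minimal sketch of that calculation follows, using average ranks for ties (relevant because a lesion grade takes only a few distinct values). The data are invented for illustration, and the two-sided P value is not computed here.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(rank_x)
    mean_x, mean_y = sum(rank_x) / n, sum(rank_y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    var_x = sum((a - mean_x) ** 2 for a in rank_x)
    var_y = sum((b - mean_y) ** 2 for b in rank_y)
    return covariance / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical feature values and binary grades (0 = mild, 1 = moderate/severe).
    feature = [1.2, 2.5, 3.1, 4.8, 5.0]
    grade = [0, 0, 1, 1, 1]
    print(round(spearman_rho(feature, grade), 3))
```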
Figure 3. Training and test ROC curves of the bior1.1 LLL wavelet transforming radiomic model. AUC, area under the receiver operating characteristic curve; bior, biorthogonal; L, low-pass decomposition filter; ROC, receiver operating characteristic.
Detailed performance evaluation of the radiomic model
| Index | Training: precision | Training: recall | Training: F1 | Training: support | Test: precision | Test: recall | Test: F1 | Test: support |
|---|---|---|---|---|---|---|---|---|
| Mild lesions | 0.96 | 0.94 | 0.95 | 80 | 0.73 | 0.96 | 0.83 | 28 |
| Moderate/severe lesions | 0.90 | 0.94 | 0.92 | 47 | 0.96 | 0.69 | 0.80 | 32 |
| Accuracy | N/A | N/A | 0.94 | 127 | N/A | N/A | 0.82 | 60 |
| Macro average | 0.93 | 0.94 | 0.93 | 127 | 0.84 | 0.83 | 0.82 | 60 |
| Weighted average | 0.94 | 0.94 | 0.94 | 127 | 0.85 | 0.82 | 0.81 | 60 |
N/A, not applicable.
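The test-cohort figures in this table are consistent with a single 2×2 confusion matrix, which makes the relationship between the per-class rows and the averaged rows easier to follow. The counts below are not reported in the source; they are inferred from the support and recall columns (27 of 28 mild lesions and 22 of 32 moderate/severe lesions classified correctly) and should be treated as an assumption. The sketch recomputes precision, recall, F1, accuracy, and the macro and weighted averages from those counts.

```python
CLASSES = ("mild", "moderate/severe")

# Rows are true classes, columns are predicted classes.
# ASSUMPTION: counts inferred from the reported support and recall,
# not taken directly from the source.
CONFUSION = {
    "mild": {"mild": 27, "moderate/severe": 1},
    "moderate/severe": {"mild": 10, "moderate/severe": 22},
}


def class_report(confusion):
    """Per-class precision, recall, F1, and support from a confusion matrix."""
    report = {}
    for cls in CLASSES:
        true_positive = confusion[cls][cls]
        support = sum(confusion[cls].values())  # cases truly in this class
        predicted = sum(confusion[true][cls] for true in CLASSES)  # cases assigned to it
        precision = true_positive / predicted
        recall = true_positive / support
        f1 = 2 * precision * recall / (precision + recall)
        report[cls] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support,
        }
    return report


def summarize(report, confusion):
    """Accuracy plus macro (unweighted) and support-weighted averages."""
    total = sum(row["support"] for row in report.values())
    correct = sum(confusion[cls][cls] for cls in CLASSES)
    summary = {"accuracy": correct / total}
    for key in ("precision", "recall", "f1"):
        summary["macro_" + key] = sum(row[key] for row in report.values()) / len(report)
        summary["weighted_" + key] = (
            sum(row[key] * row["support"] for row in report.values()) / total
        )
    return summary


report = class_report(CONFUSION)
summary = summarize(report, CONFUSION)

if __name__ == "__main__":
    for cls, row in report.items():
        print(cls, {name: round(value, 2) for name, value in row.items()})
    print({name: round(value, 2) for name, value in summary.items()})
```

The macro average treats both classes equally, while the weighted average scales each class by its support, which is why the two rows differ slightly when the classes are imbalanced.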
Figure 4. Calibration (A,B) and decision (C,D) curves for our radiomic model in the training (A,C) and test (B,D) cohorts. Apparent curves show the in-sample calibration of the model, ideal curves represent a perfectly calibrated model, and bias-corrected curves show the calibration after correction for overfitting by bootstrap resampling, where B is the number of bootstrap repetitions. In the decision curves (C,D), gray lines represent the assumption that all lesions are moderate or severe, and black lines represent the assumption that all lesions are mild.
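Decision curves plot net benefit against the threshold probability. The sketch below uses the standard decision curve analysis definition, net benefit = TP/n − (FP/n) × pt/(1 − pt); the source does not state how its curves were computed, so this is an assumed formulation. The "all moderate or severe" reference line corresponds to `treat_all`, and the "all mild" line has a net benefit of zero. The cohort size (32 moderate/severe lesions among 60 test lesions) comes from the tables above, while the model counts in the usage example are hypothetical.

```python
def net_benefit(true_positives, false_positives, n, threshold):
    """Net benefit at a threshold probability strictly between 0 and 1."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    odds = threshold / (1 - threshold)
    return true_positives / n - (false_positives / n) * odds


def treat_all(n_positive, n, threshold):
    """Reference strategy: call every lesion moderate/severe."""
    return net_benefit(n_positive, n - n_positive, n, threshold)


def treat_none():
    """Reference strategy: call every lesion mild (no positives, no benefit)."""
    return 0.0


if __name__ == "__main__":
    n_lesions, n_moderate_severe = 60, 32  # test cohort sizes from the tables
    threshold = 0.5
    # HYPOTHETICAL model counts at this threshold, for illustration only.
    model = net_benefit(true_positives=22, false_positives=1, n=n_lesions, threshold=threshold)
    print("model:", round(model, 3))
    print("treat all:", round(treat_all(n_moderate_severe, n_lesions, threshold), 3))
    print("treat none:", treat_none())
```

A model is useful at a given threshold when its net benefit exceeds both reference strategies.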
Figure 5. CT scans of typical COVID-19 lesions with radiomic feature maps showing the diagnostic performance of our radiomic model. (A) Results for a 49-year-old man with a moderate/severe lesion, for which the radiomic model gave a prediction probability of 0.98. (B) Results for a 64-year-old man with a mild lesion, for which the model gave a prediction probability of 0.08. The details of radiomic features F1–F9 are presented in the table of selected radiomic features. Hot color mapping can be used for better visualization. The insets on the CT scans to which arrows point are the regions of interest in the lesion in the original image; the different radiomics feature maps are shown to the right. CT, computed tomography.