Chong Hyun Suh1, Kyung Hwa Lee2,3, Young Jun Choi4, Sae Rom Chung1, Jung Hwan Baek1, Jeong Hyun Lee1, Jihye Yun1, Sungwon Ham3, Namkug Kim5,6.
Abstract
We investigated the ability of machine-learning classifiers trained on radiomics from pre-treatment multiparametric magnetic resonance imaging (MRI) to accurately predict human papillomavirus (HPV) status in patients with oropharyngeal squamous cell carcinoma (OPSCC). This retrospective study collected data from 60 patients (48 HPV-positive and 12 HPV-negative) with newly diagnosed, histopathologically proven OPSCC who underwent head and neck MRI consisting of axial T1WI, T2WI, CE-T1WI, and apparent diffusion coefficient (ADC) maps from diffusion-weighted imaging (DWI). The median age was 59 years (range, 35 to 85 years), and 83.3% of patients were male. The imaging data were randomised into a training set (32 HPV-positive and 8 HPV-negative OPSCC) and a test set (16 HPV-positive and 4 HPV-negative OPSCC) in each fold. A total of 1618 quantitative features were extracted from manually delineated regions of interest of the primary tumour and one definite lymph node in each sequence. After feature selection using the least absolute shrinkage and selection operator (LASSO), three machine-learning classifiers (logistic regression, random forest, and XG boost) were trained and compared across various combinations of the four sequences. The highest diagnostic accuracies were achieved when using all sequences, and the difference was significant only when the combination did not include the ADC map. Using all sequences, logistic regression and the random forest classifier yielded higher accuracy than the XG boost classifier, with mean area under the curve (AUC) values of 0.77, 0.76, and 0.71, respectively. A machine-learning classifier based on a non-invasive, quantitative radiomics signature could guide the classification of HPV status.
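The class-balanced split described in the abstract (32/8 for training, 16/4 for testing) can be illustrated with a minimal sketch. It covers only the stratified randomisation step, not feature extraction, LASSO selection, or classifier training. The function name `stratified_split`, the one-third test fraction, and the seed are assumptions made for illustration; the source does not state how the split was implemented.

```python
import random
from collections import Counter


def stratified_split(labels, test_fraction, seed):
    """Split sample indices into train/test sets, preserving the class ratio.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        # Shuffle the indices of one class, then cut off the test share.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# 48 HPV-positive (1) and 12 HPV-negative (0) cases, as in the abstract.
labels = [1] * 48 + [0] * 12
train_idx, test_idx = stratified_split(labels, test_fraction=1 / 3, seed=0)

train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print("train:", train_counts[1], "HPV+,", train_counts[0], "HPV-")
print("test: ", test_counts[1], "HPV+,", test_counts[0], "HPV-")
```

Calling the function with a different seed for each fold would give a new random partition with the same class counts.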
Year: 2020 PMID: 33067484 PMCID: PMC7568530 DOI: 10.1038/s41598-020-74479-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Baseline characteristics of the included patients.
| Characteristic | HPV+ oropharyngeal cancer (n = 48) | HPV− oropharyngeal cancer (n = 12) |
|---|---|---|
| Age (mean ± SD) | 60.6 ± 8.6 | 59.4 ± 15.7 |
| Male:female | 39:9 | 11:1 |
| Primary tumor site | | |
| Tonsil | 34 (71%) | 6 (50%) |
| Base of tongue | 8 (17%) | 3 (25%) |
| Posterior pharyngeal wall | 2 (4%) | 3 (25%) |
| Soft palate | 1 (2%) | 0 (0%) |
| No evidence of primary tumor | 3 (6%) | 0 (0%) |
| T stage<sup>a</sup> | | |
| 0 | 3 (6%) | 0 (0%) |
| 1 | 7 (15%) | 0 (0%) |
| 2 | 20 (42%) | 4 (33%) |
| 3 | 3 (6%) | 3 (25%) |
| 4 | 15 (31%) | 5<sup>b</sup> (42%) |
| N stage<sup>a</sup> | | |
| 0 | 7 (15%) | 2 (17%) |
| 1 | 31 (65%) | 0 (0%) |
| 2 | 10 (21%) | 9<sup>c</sup> (75%) |
| 3 | 0 (0%) | 1<sup>d</sup> (8%) |
| M stage<sup>a</sup> | | |
| 0 | 47 (98%) | 11 (92%) |
| 1 | 1 (2%) | 1 (8%) |
SD standard deviation.
<sup>a</sup>TNM staging was based on the AJCC 8th edition.
<sup>b</sup>Five patients were T4a.
<sup>c</sup>Five patients were N2b and four patients were N2c.
<sup>d</sup>One patient was N3b.
Figure 1. Flowchart of the radiomic machine-learning classifier.
Top 7 features from four MR sequences.
| Sequence | Wavelet | Class | Variable | Frequency | Sum_Coef<sup>a</sup> | Freq × Sum_Coef |
|---|---|---|---|---|---|---|
| ADC | LLL | GLCM_dist_2 | Entropy_std | 58/60 (0.96) | 2.547 | 2.462 |
| T1WI | Original | GLCM_dist_1 | Autocorrelation_std | 52/60 (0.86) | 1.494 | 1.295 |
| ADC | HLH | GLCM_dist_2 | Correlation_std | 45/60 (0.75) | 1.368 | 1.026 |
| ADC | LLH | GLCM_dist_1 | Homogeneity1_std | 47/60 (0.78) | 1.230 | 0.964 |
| ADC | Original | GLCM_dist_3 | Entropy_std | 44/60 (0.73) | 0.887 | 0.651 |
| ADC | HHH | GLCM_dist_3 | Correlation | 40/60 (0.66) | 0.950 | 0.633 |
| ADC | Original | GLCM_dist_1 | Difference variance | 55/60 (0.91) | 0.654 | 0.599 |
ADC apparent diffusion coefficient, T1WI T1-weighted imaging, GLCM gray-level co-occurrence matrix.
<sup>a</sup>Sum of LASSO coefficients (= weights).
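The last column of the table appears to be the selection frequency multiplied by the summed LASSO coefficient. This reading is inferred from the reported numbers rather than stated in the source. The short check below recomputes the product for the seven listed features and compares it with the reported value. The feature labels are shorthand introduced here, and the 0.001 tolerance is an assumption that allows for rounding of the published coefficients.

```python
# (label, times selected out of 60, summed LASSO coefficient, reported score)
# Values are copied from the table; the labels are shorthand for this sketch.
N_RUNS = 60
features = [
    ("ADC/LLL/Entropy_std", 58, 2.547, 2.462),
    ("T1/Original/Autocorrelation_std", 52, 1.494, 1.295),
    ("ADC/HLH/Correlation_std", 45, 1.368, 1.026),
    ("ADC/LLH/Homogeneity1_std", 47, 1.230, 0.964),
    ("ADC/Original/Entropy_std", 44, 0.887, 0.651),
    ("ADC/HHH/Correlation", 40, 0.950, 0.633),
    ("ADC/Original/Difference variance", 55, 0.654, 0.599),
]

# Recompute frequency x coefficient for each feature.
computed = [(name, count / N_RUNS * coef) for name, count, coef, _ in features]
reported = [score for *_, score in features]

for (name, value), ref in zip(computed, reported):
    print(f"{name:35s} computed={value:.3f} reported={ref:.3f}")

# Largest gap between recomputed and reported scores.
max_gap = max(abs(value - ref) for (_, value), ref in zip(computed, reported))
# Ranking by the recomputed score, highest first.
ranked = [name for name, _ in sorted(computed, key=lambda x: x[1], reverse=True)]
```

Under this reading the recomputed scores reproduce the table's ordering, which suggests the features are ranked by how often and how strongly LASSO selected them.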
Figure 2. Example of the original apparent diffusion coefficient (ADC) map and its 3D wavelet-transformed images for a human papillomavirus (HPV)-positive and an HPV-negative case. (a) Original ADC map. (b) 3D wavelet-transformed image of ‘LLL’. (c) 3D wavelet-transformed image of ‘HLH’.
Classification accuracies between various combinations of sequences.
| Sequence | No. of selected features | Logistic regression AUC | P value | Random forest AUC | P value | XG boost AUC | P value |
|---|---|---|---|---|---|---|---|
| ADC | 166 | 0.72 ± 0.11 | .016 | 0.76 ± 0.11 | .456 | 0.69 ± 0.11 | .240 |
| T1WI | 160 | 0.42 ± 0.15 | < .001 | 0.45 ± 0.13 | < .001 | 0.43 ± 0.17 | < .001 |
| T2WI | 156 | 0.47 ± 0.13 | < .001 | 0.52 ± 0.13 | < .001 | 0.50 ± 0.12 | < .001 |
| CE-T1WI | 165 | 0.55 ± 0.12 | < .001 | 0.54 ± 0.13 | < .001 | 0.59 ± 0.15 | < .001 |
| ADC + T1WI | 190 | 0.69 ± 0.12 | < .001 | 0.74 ± 0.11 | .165 | 0.71 ± 0.11 | .393 |
| ADC + T2WI | 196 | 0.72 ± 0.11 | .020 | 0.73 ± 0.11 | .141 | 0.69 ± 0.11 | .113 |
| ADC + CE-T1WI | 193 | 0.76 ± 0.11 | .357 | 0.76 ± 0.12 | .495 | 0.71 ± 0.14 | .481 |
| T1WI + T2WI | 185 | 0.48 ± 0.15 | < .001 | 0.46 ± 0.13 | < .001 | 0.44 ± 0.16 | < .001 |
| T1WI + CE-T1WI | 200 | 0.56 ± 0.13 | < .001 | 0.56 ± 0.14 | < .001 | 0.51 ± 0.14 | < .001 |
| T2WI + CE-T1WI | 191 | 0.52 ± 0.13 | < .001 | 0.54 ± 0.14 | < .001 | 0.51 ± 0.14 | < .001 |
| ADC + T1WI + T2WI | 210 | 0.69 ± 0.14 | .003 | 0.73 ± 0.11 | .167 | 0.69 ± 0.12 | .229 |
| ADC + T1WI + CE-T1WI | 211 | 0.76 ± 0.11 | .316 | 0.74 ± 0.11 | .186 | 0.71 ± 0.12 | .482 |
| ADC + T2WI + CE-T1WI | 212 | 0.75 ± 0.11 | .173 | 0.74 ± 0.11 | .181 | 0.70 ± 0.12 | .373 |
| T1WI + T2WI + CE-T1WI | 213 | 0.53 ± 0.15 | < .001 | 0.54 ± 0.15 | < .001 | 0.50 ± 0.14 | < .001 |
| All | 221 | 0.77 ± 0.12 | | 0.76 ± 0.12 | | 0.71 ± 0.12 | |
Average results ± standard deviations are reported.
AUC area under the curve, ADC apparent diffusion coefficient, T1WI T1-weighted imaging, T2WI fat-suppressed T2-weighted imaging, CE-T1WI fat-suppressed contrast-enhanced T1-weighted imaging.
Figure 3. Results of the receiver operating characteristic curve analysis of three classifiers.
Results of the ROC curve analysis of 3 models.
| Classifiers | AUC | Sensitivity | Specificity |
|---|---|---|---|
| Logistic regression | 0.77 (0.50, 0.96) | 0.71 (0.31, 0.97) | 0.72 (0.50, 1.00) |
| Random forest | 0.76 (0.47, 0.97) | 0.70 (0.33, 0.93) | 0.72 (0.50, 1.00) |
| XG boost | 0.71 (0.50, 0.93) | 0.62 (0.21, 0.90) | 0.65 (0.25, 0.10) |
Unless otherwise specified, data are averages, with 95% confidence intervals in parentheses.
ROC receiver operating characteristic, AUC area under the curve, CI confidence interval.
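For readers less familiar with the three metrics in the table, the sketch below shows one common way to compute them from classifier scores. The scores and the 0.65 threshold are hypothetical values chosen for illustration, not data from the study, and the source does not state which software or thresholding rule the authors used.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sensitivity_specificity(pos_scores, neg_scores, threshold):
    """Treat scores at or above the threshold as predicted HPV-positive."""
    true_pos = sum(s >= threshold for s in pos_scores)
    true_neg = sum(s < threshold for s in neg_scores)
    return true_pos / len(pos_scores), true_neg / len(neg_scores)


# Hypothetical predicted probabilities of HPV positivity (not study data).
pos = [0.9, 0.8, 0.7, 0.6]   # truly HPV-positive cases
neg = [0.65, 0.3]            # truly HPV-negative cases

auc = roc_auc(pos, neg)
sens, spec = sensitivity_specificity(pos, neg, threshold=0.65)
print(f"AUC={auc:.3f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

With only four HPV-negative cases in a test set, specificity in a single fold can take just five values (0, 0.25, 0.5, 0.75, or 1), which is one reason the reported intervals are wide.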