| Literature DB >> 29556480 |
Design and Selection of Machine Learning Methods Using Radiomics and Dosiomics for Normal Tissue Complication Probability Modeling of Xerostomia
Hubert S. Gabryś, Florian Buettner, Florian Sterzing, Henrik Hauswald, Mark Bangert.
Abstract
PURPOSE: The purpose of this study is to investigate whether machine learning with dosiomic, radiomic, and demographic features allows for a more precise xerostomia risk assessment than normal tissue complication probability (NTCP) models based on the mean radiation dose to the parotid glands.
Keywords: IMRT; NTCP; dosiomics; head and neck; machine learning; radiomics; radiotherapy; xerostomia
Year: 2018 PMID: 29556480 PMCID: PMC5844945 DOI: 10.3389/fonc.2018.00035
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
Patients and tumor characteristics.
| | All | Grade 0 (0–6 months) | Grade 1 (0–6 months) | Grade 2 (0–6 months) | Grade 0 (6–15 months) | Grade 1 (6–15 months) | Grade 2 (6–15 months) | Grade 0 (15–24 months) | Grade 1 (15–24 months) | Grade 2 (15–24 months) |
|---|---|---|---|---|---|---|---|---|---|---|
| Total patients | 153 | 17 | 87 | 30 | 19 | 99 | 13 | 15 | 53 | 9 |
| Age | ||||||||||
| Median | 61 | 60 | 60 | 62 | 60 | 61 | 61 | 61 | 61 | 61 |
| Q1–Q3 | 55–66 | 54–66 | 54–64 | 53–69 | 57–63 | 53–66 | 54–68 | 55–68 | 52–66 | 54–68 |
| Range | 29–82 | 44–78 | 29–82 | 43–80 | 49–75 | 29–82 | 43–74 | 47–80 | 39–78 | 41–80 |
| Sex | ||||||||||
| Female | 37 | 5 | 19 | 7 | 6 | 24 | 2 | 2 | 9 | 4 |
| Male | 116 | 12 | 68 | 23 | 13 | 75 | 11 | 13 | 44 | 5 |
| Tumor site | ||||||||||
| Hypopharynx/larynx | 37 | 7 | 20 | 7 | 7 | 20 | 2 | 3 | 15 | 0 |
| Nasopharynx | 12 | 0 | 8 | 2 | 2 | 8 | 1 | 0 | 5 | 0 |
| Oropharynx | 99 | 9 | 57 | 20 | 10 | 69 | 9 | 11 | 32 | 9 |
| Other | 5 | 1 | 2 | 1 | 0 | 2 | 1 | 1 | 1 | 0 |
| Radiation modality | ||||||||||
| IMRT | 37 | 2 | 25 | 5 | 1 | 29 | 2 | 2 | 18 | 1 |
| Tomotherapy | 116 | 15 | 62 | 25 | 18 | 70 | 11 | 13 | 35 | 8 |
| Ipsi parotid dose (Gy) | ||||||||||
| Median | 24.3 | 22.9 | 25.0 | 23.0 | 19.5 | 24.8 | 25.9 | 22.9 | 23.8 | 24.5 |
| Q1–Q3 | 20.6–27.6 | 18.5–24.6 | 21.4–29.0 | 21.4–25.4 | 16.8–24.3 | 21.8–28.7 | 21.8–27.2 | 18.5–31.5 | 20.8–26.4 | 21.6–26.2 |
| Range | 0.4–63.4 | 0.4–36.0 | 7.4–61.4 | 4.6–59.0 | 0.4–32.9 | 4.6–61.4 | 17.3–63.4 | 0.4–51.4 | 4.6–46.0 | 17.3–63.4 |
| Contra parotid dose (Gy) | ||||||||||
| Median | 19.9 | 19.4 | 20.3 | 19.6 | 15.6 | 20.5 | 20.4 | 12.7 | 19.7 | 20.1 |
| Q1–Q3 | 15.4–23.1 | 13.1–21.8 | 15.2–23.8 | 16.5–22.0 | 10.3–20.7 | 16.3–23.8 | 19.8–23.1 | 5.2–17.9 | 16.3–23.7 | 16.4–22.3 |
| Range | 0.3–30.9 | 0.3–24.9 | 4.1–28.6 | 4.2–26.2 | 0.3–27.9 | 4.1–30.9 | 15.1–26.2 | 0.3–27.9 | 4.1–27.2 | 15.1–26.0 |
The total number of patients differs among the groups because of varying follow-up availability.
Figure 1. Frequency of follow-up report collection.
Feature sets before and after the removal of highly correlated pairs (Kendall’s |τ| > 0.5).
| Feature group | Initial feature set | Final feature set |
|---|---|---|
| Demographics | Age, sex | Age, sex |
| Parotid shape | Volume, area, sphericity, eccentricity, compactness, λ | Volume, sphericity, eccentricity |
| Dose–volume histogram | Mean, spread, skewness, D2, D98, D10, D20, D30, D40, D50, D60, D70, D80, D90, V10, V15, V20, V25, V30, V35, V40, V45, entropy, uniformity | Mean, spread, skewness |
| Subvolume mean dose | ||
| Spatial dose gradient | Gradient_x, gradient_y, gradient_z | Gradient_x, gradient_y, gradient_z |
| Spatial dose spread | ||
| Spatial dose correlation | ||
| Spatial dose skewness | ||
| Spatial dose coskewness |
Feature definitions are provided in the Appendix.
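The correlation filter behind the table above can be reproduced in a few lines of pandas: compute pairwise Kendall correlations and drop one feature from every pair with |τ| > 0.5. A minimal sketch, assuming a hypothetical feature matrix (column names and the induced correlation are illustrative, not the study's data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 5)),
                  columns=["mean", "spread", "skewness", "D50", "V30"])
df["D50"] = 0.9 * df["mean"] + rng.normal(scale=0.1, size=100)  # induce one correlated pair

tau = df.corr(method="kendall").abs()                           # pairwise |tau|
upper = tau.where(np.triu(np.ones(tau.shape, dtype=bool), k=1)) # upper triangle only
to_drop = [c for c in upper.columns if (upper[c] > 0.5).any()]
reduced = df.drop(columns=to_drop)                              # drops "D50", keeps "mean"
```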
Figure 2. The workflow of a multivariate five-step model building comprising, in this order, feature-group selection, feature scaling, sampling, feature selection, and classification.
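A minimal sketch of such a five-step pipeline, assuming scikit-learn and imbalanced-learn; the particular combination here (standard scaling, SMOTE, univariate F-score selection, SVM) is one illustrative pick from the grids below, not the study's final model:

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # imblearn's Pipeline accepts samplers
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Feature-group selection amounts to picking a column subset up front;
# the remaining four steps chain in the pipeline (the sampler is applied
# to the training folds only, never at prediction time).
X, y = make_classification(n_samples=150, n_features=20,
                           weights=[0.8, 0.2], random_state=0)
pipe = Pipeline([
    ("scale", StandardScaler()),               # feature scaling
    ("sample", SMOTE(random_state=0)),         # sampling
    ("select", SelectKBest(f_classif, k=4)),   # feature selection (UFS-F)
    ("clf", SVC(probability=True)),            # classification
]).fit(X, y)
risk = pipe.predict_proba(X)[:, 1]             # predicted complication risk
```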
Predictive performance of the mean-dose models and of the morphological model proposed by Buettner et al. (4), that is, logistic regression with four morphological features.
| End point | Model | AUC |
|---|---|---|
| Early | Mean_i | 0.58 (0.56–0.60) |
| | Mean_c | 0.42 (0.41–0.44) |
| | Mean_b | 0.50 (0.48–0.53) |
| | Mean_i, mean_c | 0.49 (0.48–0.51) |
| | Morphological | 0.42 (0.40–0.44) |
| Late | Mean_i | 0.48 (0.44–0.51) |
| | Mean_c | 0.58 (0.55–0.61) |
| | Mean_b | 0.55 (0.52–0.58) |
| | Mean_i, mean_c | 0.54 (0.51–0.57) |
| | Morphological | 0.59 (0.56–0.62) |
| Long-term | Mean_i | 0.40 (0.37–0.44) |
| | Mean_c | 0.58 (0.55–0.61) |
| | Mean_b | 0.56 (0.52–0.60) |
| | Mean_i, mean_c | 0.47 (0.44–0.50) |
| | Morphological | 0.64 (0.60–0.67) |
| Longitudinal | Mean_i | 0.51 (0.45–0.56) |
| | Mean_c | 0.57 (0.51–0.62) |
| | Mean_b | 0.50 (0.44–0.55) |
| | Mean_i, mean_c | 0.52 (0.46–0.58) |
| | Morphological | 0.55 (0.49–0.60) |
i, ipsilateral gland; c, contralateral gland; b, both glands.
Figure 3. Predictive power of individual features in the time-specific models measured with the area under the receiver operating characteristic curve (AUC). The left-hand vertical axis lists the features; the right-hand vertical axis lists the feature groups. The AUCs were calculated from the corresponding Mann–Whitney U statistic. Bars marked with * are significant at the false discovery rate (FDR) ≤ 0.05.
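The U-to-AUC relation used in the caption is AUC = U / (n₁ · n₀), where n₁ and n₀ are the positive and negative class sizes. A small sketch with toy data, assuming scipy:

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 100)               # toy labels: 1 = xerostomia, 0 = none
x = y + rng.normal(scale=2.0, size=100)   # toy feature with weak separation

u, _ = mannwhitneyu(x[y == 1], x[y == 0], alternative="two-sided")
auc = u / ((y == 1).sum() * (y == 0).sum())   # AUC = U / (n1 * n0)
print(auc)  # matches sklearn.metrics.roc_auc_score(y, x) for tie-free data
```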
Figure 4. Distribution of the mean dose and the absolute right–left dose gradient in our patient cohort.
Figure 5. A comparison of classification, feature selection, and sampling algorithms in terms of their predictive performance in model tuning. All heat maps in a given column belong to a single end point, whereas all heat maps in a given row correspond to a single classifier. In each heat map, rows represent feature selection algorithms and columns correspond to sampling methods. The color maps are normalized per end point. The color bar ticks correspond to the worst, average, and best model performance.
Figure 6. Heat maps showing the proportion of times the algorithm on the vertical axis outperformed the algorithm on the horizontal axis in terms of the best AUC in model tuning. For example, support vector machines (SVM) performed better than extra-trees (ET) in 73% of the time-specific models.
Figure 7. A comparison of classification, feature selection, and sampling methods against one another with the Nemenyi test. Lower ranks correspond to better performance; rank 1 is the best. Algorithms whose ranks differ by less than the critical difference (CD) are not significantly different at the 0.05 significance level and are connected by black bars.
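For reference, the CD in this caption follows Demšar's formulation of the Nemenyi test, CD = q_α · sqrt(k(k+1)/(6N)). A short sketch with illustrative k and N (the q values are taken from Demšar, 2006):

```python
import math

# q_alpha for the two-tailed Nemenyi test at alpha = 0.05 (Demšar, 2006);
# k is the number of compared algorithms, n the number of rankings.
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850, 7: 2.949}

def critical_difference(k: int, n: int) -> float:
    """CD = q_alpha * sqrt(k * (k + 1) / (6 * n))."""
    return Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6 * n))

print(critical_difference(7, 40))  # e.g., 7 classifiers compared over 40 rankings
```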
Expected generalization performance of selected models evaluated by nested cross-validation.
| End point | Classifier | Feature selection | Sampling | AUC tuning | AUC testing |
|---|---|---|---|---|---|
| Early | LR-L1 | RFE-ET | NCL | 0.62 (0.60–0.64) | 0.56 (0.53–0.60) |
| | LR-L2 | RFE-LR | NCL | 0.62 (0.60–0.64) | 0.46 (0.42–0.49) |
| | LR-EN | MB-ET | NCL | 0.62 (0.60–0.64) | 0.54 (0.50–0.57) |
| | kNN | UFS-F | SMOTE + ENN | 0.68 (0.66–0.70) | 0.65 (0.62–0.68)a |
| | SVM | UFS-F | None | 0.70 (0.68–0.72) | 0.57 (0.53–0.61) |
| | ET | MB-LR | NCL | 0.63 (0.61–0.65) | 0.44 (0.41–0.47) |
| | GTB | UFS-F | None | 0.66 (0.64–0.68) | 0.55 (0.51–0.59) |
| Late | LR-L1 | RFE-LR | NCL | 0.78 (0.75–0.80) | 0.63 (0.56–0.69) |
| | LR-L2 | RFE-LR | NCL | 0.76 (0.73–0.78) | 0.60 (0.53–0.66) |
| | LR-EN | MB-LR | SMOTE + TL | 0.73 (0.70–0.76) | 0.56 (0.51–0.62) |
| | kNN | MB-LR | NCL | 0.78 (0.76–0.80) | 0.62 (0.57–0.67) |
| | SVM | UFS-F | TL | 0.80 (0.77–0.82) | 0.52 (0.46–0.58) |
| | ET | RFE-ET | NCL | 0.78 (0.75–0.80) | 0.55 (0.50–0.61) |
| | GTB | MB-LR | OSS | 0.77 (0.75–0.79) | 0.65 (0.59–0.70)a |
| Long-term | LR-L1 | MB-LR | ROS | 0.95 (0.94–0.96) | 0.86 (0.80–0.90) |
| | LR-L2 | MB-LR | None | 0.96 (0.95–0.97) | 0.86 (0.81–0.90) |
| | LR-EN | MB-LR | SMOTE + ENN | 0.92 (0.90–0.93) | 0.83 (0.76–0.88) |
| | kNN | UFS-MI | TL | 0.88 (0.86–0.90) | 0.74 (0.68–0.80) |
| | SVM | RFE-LR | ENN | 0.94 (0.92–0.96) | 0.79 (0.73–0.85) |
| | ET | MB-LR | ENN | 0.93 (0.92–0.94) | 0.88 (0.84–0.91)a |
| | GTB | UFS-F | ROS | 0.89 (0.86–0.91) | 0.77 (0.71–0.83) |
| Longitudinal | LR-L1 | UFS-MI | None | 0.63 (0.57–0.68) | 0.52 (0.41–0.61) |
| | LR-L2 | RFE-LR | NCL | 0.60 (0.55–0.66) | 0.39 (0.29–0.48) |
| | LR-EN | UFS-MI | TL | 0.62 (0.57–0.68) | 0.52 (0.42–0.60) |
| | kNN | UFS-MI | NCL | 0.65 (0.61–0.69) | 0.58 (0.49–0.66) |
| | SVM | UFS-MI | OSS | 0.66 (0.60–0.71) | 0.57 (0.46–0.66) |
| | ET | UFS-MI | TL | 0.66 (0.61–0.71) | 0.51 (0.40–0.60) |
| | GTB | RFE-LR | ROS | 0.68 (0.62–0.72) | 0.63 (0.52–0.71)a |
a, the highest testing AUC for a given end point.
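A minimal sketch of the nested cross-validation scheme behind this table, assuming scikit-learn: the inner loop tunes hyperparameters ("AUC tuning"), the outer loop estimates generalization ("AUC testing"). The L2 logistic regression and its C grid mirror the tuning ranges listed further below; the data are synthetic:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=150, n_features=20, random_state=0)

# Inner loop: hyperparameter tuning ("AUC tuning").
inner = GridSearchCV(
    LogisticRegression(penalty="l2", solver="liblinear"),
    param_grid={"C": 2.0 ** np.arange(-5, 11)},  # 2^-5 ... 2^10
    scoring="roc_auc",
    cv=5,
)
# Outer loop: generalization estimate on held-out folds ("AUC testing").
outer_auc = cross_val_score(inner, X, y, scoring="roc_auc", cv=5)
print(outer_auc.mean())
```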
Figure 8. Features underlying the multivariate models of long-term xerostomia. i, ipsilateral gland; c, contralateral gland.
| Abbreviation | Meaning |
|---|---|
| LR-L1 | Logistic regression with L1 penalty |
| LR-L2 | Logistic regression with L2 penalty |
| LR-EN | Logistic regression with elastic net penalty |
| kNN | k-Nearest neighbors |
| SVM | Support vector machine |
| ET | Extra-trees |
| GTB | Gradient tree boosting |
| UFS-F | Univariate feature selection by F-score |
| UFS-MI | Univariate feature selection by mutual information |
| RFE-LR | Recursive feature elimination by logistic regression |
| RFE-ET | Recursive feature elimination by extra-trees |
| MB-LR | Model-based feature selection by logistic regression |
| MB-ET | Model-based feature selection by extra-trees |
| ROS | Random oversampling |
| SMOTE | Synthetic minority oversampling |
| ADASYN | Adaptive synthetic sampling |
| OSS | One-sided selection |
| TL | Tomek links |
| ENN | Wilson's edited nearest neighbor rule |
| NCL | Neighborhood cleaning rule |
| SMOTE + ENN | SMOTE followed by ENN |
| SMOTE + TL | SMOTE followed by TL |
Hyperparameters used to tune the sampling algorithms.
| Algorithm | Hyperparameters | Values |
|---|---|---|
| ROS | – | – |
| SMOTE | k_neighbors | {3,4,5} |
| | m_neighbors | {7,8,9} |
| | kind | {“regular,” “borderline1,” “borderline2”} |
| ADASYN | n_neighbors | {3,5,8} |
| OSS | – | – |
| TL | – | – |
| ENN | n_neighbors | {2,3,5} |
| | kind_sel | {“all,” “mode”} |
| NCL | n_neighbors | {2,3,5} |
| SMOTE + TL | – | – |
| SMOTE + ENN | – | – |
Hyperparameters not listed in this table assumed the default values of the imbalanced-learn package.
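The samplers above map directly onto imbalanced-learn classes. A minimal usage sketch on toy data, assuming a recent imbalanced-learn with fit_resample; the single hyperparameter values are picked from the grids in the table:

```python
from collections import Counter
from imblearn.combine import SMOTEENN
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import NeighbourhoodCleaningRule
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, weights=[0.8, 0.2], random_state=0)

X_os, y_os = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)    # oversampling
X_us, y_us = NeighbourhoodCleaningRule(n_neighbors=3).fit_resample(X, y)  # NCL
X_cb, y_cb = SMOTEENN(random_state=0).fit_resample(X, y)                # SMOTE + ENN
print(Counter(y), Counter(y_os), Counter(y_us), Counter(y_cb))          # class balances
```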
Hyperparameters used to tune the feature selection algorithms.
| Algorithm | Hyperparameters | Values |
|---|---|---|
| UFS-F | Number of features to select. | {2,3,4,5,6} |
| UFS-MI | Number of features to select. | {2,3,4,5,6} |
| RFE-LR | Number of features to select. | {2,3,4,5,6} |
| | Number of features to remove at each iteration. | 1 |
| | Class weights; equal or inversely proportional to class frequencies. | {None, “balanced”} |
| | Inverse regularization strength C. | {2^−5, 2^−4.985, 2^−4.97, …, 2^10} |
| | Regularization penalty. | “l2” |
| RFE-ET | Number of features to select. | {2,3,4,5,6} |
| | Fraction of features to remove at each iteration. | 0.5 |
| | Class weights; equal or inversely proportional to class frequencies. | {None, “balanced,” “balanced_subsample”} |
| | Number of decision trees. | [90,140] |
| MB-LR | Number of features to select. | {2,3,4,5,6} |
| | Class weights; equal or inversely proportional to class frequencies. | {None, “balanced”} |
| | Inverse regularization strength C. | {2^−5, 2^−4.985, 2^−4.97, …, 2^10} |
| | Regularization penalty. | {“l1,” “l2”} |
| MB-ET | Number of features to select. | {2,3,4,5,6} |
| | Class weights; equal or inversely proportional to class frequencies. | {None, “balanced,” “balanced_subsample”} |
| | Number of decision trees. | [90,140] |
Hyperparameters not listed in this table assumed the default values of the scikit-learn package.
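A minimal sketch of the three feature-selection families from the table (univariate, recursive elimination, model-based) via scikit-learn, with single illustrative values drawn from the grids above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=150, n_features=20, random_state=0)

ufs_f = SelectKBest(f_classif, k=4).fit(X, y)                       # UFS-F
rfe_lr = RFE(LogisticRegression(C=1.0, solver="liblinear"),
             n_features_to_select=4, step=1).fit(X, y)              # RFE-LR
mb_et = SelectFromModel(
    ExtraTreesClassifier(n_estimators=100, random_state=0)).fit(X, y)  # MB-ET
print(ufs_f.get_support(), rfe_lr.get_support(), mb_et.get_support())  # selected columns
```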
Hyperparameters used to tune the classification algorithms.
| Algorithm | Hyperparameters | Values |
|---|---|---|
| LR-L1 | Class weights. | {None, “balanced”} |
| | Inverse regularization strength C. | {2^−5, 2^−4.985, 2^−4.97, …, 2^10} |
| LR-L2 | Class weights. | {None, “balanced”} |
| | Inverse regularization strength C. | {2^−5, 2^−4.985, 2^−4.97, …, 2^10} |
| LR-EN | Class weights. | {None, “balanced”} |
| | Regularization strength. | {2^−10, 2^−9.985, 2^−9.97, …, 2^5} |
| | Elastic net mixing ratio (L1 ratio). | [0,1] |
| kNN | Number of neighbors. | {1,2,3,…,9} |
| | Minkowski distance power. | {1,2,∞} |
| SVM | Class weights. | {None, “balanced”} |
| | Regularization strength C. | {2^−5, 2^−4.985, 2^−4.97, …, 2^10} |
| | Kernel coefficient γ. | {2^−15, 2^−14.982, 2^−14.964, …, 2^3} |
| ET | Number of decision trees. | [90, 230] |
| | Class weights. | {None, “balanced”} |
| | Split criterion. | {“gini,” “entropy”} |
| | Fraction of features considered per split. | {0.05, 0.10, 0.15,…,1} |
| | Minimum samples to split a node. | {2,3,4,…,20} |
| | Minimum samples per leaf. | {1,2,3,…,20} |
| GTB | Number of decision trees. | [200, 2000] |
| | Learning rate. | {2^−7, 2^−6.994, 2^−6.988, …, 2^−1} |
| | Maximum tree depth. | {1,2,3,…,6} |
| | | {0.05,0.1,0.3,0.5,0.7,0.9,1} |
| | | {1,3,5,7} |
| | | {0.6,0.65,0.70,…,1} |
| | | [0,1] |
| | | [0,1] |
Hyperparameters not listed in this table assumed the default values of the scikit-learn package.
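As one concrete example, the SVM rows above correspond to a log2-spaced grid search. A sketch assuming scikit-learn, with coarser steps (integer exponents) than the paper's grid for brevity:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=150, random_state=0)

grid = GridSearchCV(
    SVC(),
    param_grid={
        "class_weight": [None, "balanced"],
        "C": 2.0 ** np.arange(-5, 11),       # 2^-5 ... 2^10
        "gamma": 2.0 ** np.arange(-15, 4),   # 2^-15 ... 2^3
    },
    scoring="roc_auc",
    cv=5,
).fit(X, y)
print(grid.best_params_, grid.best_score_)
```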