Chen Chen1, Yuhui Qin1, Haotian Chen1, Junying Cheng2, Bo He1, Yixuan Wan1, Dongyong Zhu1, Fabao Gao3, Xiaoyue Zhou4.
Abstract
OBJECTIVE: We used radiomics feature-based machine learning classifiers of apparent diffusion coefficient (ADC) maps to differentiate small round cell malignant tumors (SRCMTs) from non-SRCMTs of the nasal and paranasal sinuses.
MATERIALS: A total of 267 features were extracted from each region of interest (ROI). The dataset was randomly divided into a training set (∼70%) and a test set (∼30%). We performed dimensionality reduction using the Pearson correlation coefficient, feature selection (analysis of variance [ANOVA], relief, and recursive feature elimination [RFE]), and classification using 10 machine learning classifiers. Results were evaluated with leave-one-out cross-validation.
Keywords: Apparent diffusion coefficient; Machine learning; Neoplasms; Radiomics
Year: 2022 PMID: 35029732 PMCID: PMC9123077 DOI: 10.1007/s00330-021-08465-w
Source DB: PubMed Journal: Eur Radiol ISSN: 0938-7994 Impact factor: 7.034
Fig. 1 (a) Axial ADC map of a 36-year-old male patient with SCC. (b) Corresponding ROI, (c) a subset of the 267 feature values, and (d) histogram map. (e) Axial ADC map of a 44-year-old male patient with ONB. (f) Corresponding ROI, (g) a subset of the 267 feature values, and (h) histogram map
Texture analysis methods and the corresponding texture features
| Method | Texture feature parameters |
|---|---|
| Histogram (9) | Mean, variance, skewness, kurtosis, and percentiles (1%, 10%, 50%, 90%, and 99%) |
| Gray-level co-occurrence matrix (GLCM) (220) | Angular second moment (AngScMom), contrast, inverse difference moment (IDM), entropy (Ent), correlation (Correlat), sum of squares (SumOfSqs), sum average (SumAverg), sum variance (SumVarnc), sum entropy (SumEntrp), difference variance (DifVarnc), difference entropy (DifEntrp) along the 0°, 45°, 90°, 135°, and |
| Gray‐level run‐length matrix (GLRLM) (20) | Run-length nonuniformity (RLNonUni), gray-level nonuniformity (GLevNonU), long run emphasis (LngREmph), short run emphasis (ShrtREmp), fraction of image in runs (Fraction) at four different angles (horizontal, vertical, diagonal 45°, and diagonal 135°) |
| Auto‐regressive model (ARM) (5) | Teta1, Teta2, Teta3, Teta4, Sigma |
| Wavelet transform (WAV) (8) | Energy computed from the low–low frequency band within the first image scale (WavEnLL_s-1), WavEnLH_s-1, WavEnHL_s-1, WavEnHH_s-1, WavEnLL_s-2, WavEnLH_s-2, WavEnHL_s-2, WavEnHH_s-2 |
| Absolute gradient statistics (AGS) (5) | Absolute gradient mean (GrMean), variance (GrVariance), skewness (GrSkewness), kurtosis (GrKurtosis), nonzeros (GrNonZeros) |
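To make the first two rows of the table concrete, the following is a minimal sketch of how histogram features and a few GLCM features can be computed from ROI intensities. It is not the software used in the study. The function names, the nearest-rank percentile definition, the use of population moments with excess kurtosis, and the single horizontal offset are all assumptions made for illustration, and the sketch assumes the ROI is not constant.

```python
import math
from collections import Counter


def histogram_features(values):
    """First-order statistics of ROI intensities (population moments; assumed convention)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var)
    # Standardized third and fourth central moments; kurtosis is excess (normal = 0).
    skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    kurt = sum((v - mean) ** 4 for v in values) / (n * std ** 4) - 3.0
    ordered = sorted(values)

    def percentile(p):
        # Nearest-rank percentile: smallest value with at least p% of samples at or below it.
        rank = max(1, math.ceil(p / 100 * n))
        return ordered[rank - 1]

    feats = {"mean": mean, "variance": var, "skewness": skew, "kurtosis": kurt}
    for p in (1, 10, 50, 90, 99):
        feats[f"perc{p}"] = percentile(p)
    return feats


def glcm_features(image, dx=1, dy=0):
    """Symmetric, normalized GLCM for one pixel offset, with three of the listed features."""
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1
                counts[(b, a)] += 1  # count both directions so the matrix is symmetric
    total = sum(counts.values())
    probs = {pair: k / total for pair, k in counts.items()}
    return {
        "AngScMom": sum(p * p for p in probs.values()),
        "Contrast": sum(p * (a - b) ** 2 for (a, b), p in probs.items()),
        "Entropy": -sum(p * math.log2(p) for p in probs.values()),
    }


# Toy 4x4 image with gray levels 0..3 (illustrative data, not from the study).
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
toy_hist = histogram_features([v for row in toy_image for v in row])
toy_glcm = glcm_features(toy_image, dx=1, dy=0)  # 0° direction, distance 1
```

Repeating `glcm_features` with other offsets, for example `dx=0, dy=1` for the vertical direction, gives the per-direction variants that make up the larger GLCM feature count.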
The parameters of the algorithms
| Algorithms | Parameters |
|---|---|
| SVM | C = 1.0, kernel = ‘rbf’, degree = 3, gamma = ‘scale’, coef0 = 0.0, shrinking = True, probability = False, tol = 0.001, cache_size = 200, class_weight = None, verbose = False, max_iter = -1, decision_function_shape = ‘ovr’, break_ties = False, random_state = None |
| AE | hidden_layer_sizes = (100,), activation = ‘relu’, *, solver = ‘adam’, alpha = 0.0001, batch_size = ‘auto’, learning_rate = ‘constant’, learning_rate_init = 0.001, power_t = 0.5, max_iter = 200, shuffle = True, random_state = None, tol = 0.0001, verbose = False, warm_start = False, momentum = 0.9, nesterovs_momentum = True, early_stopping = False, validation_fraction = 0.1, beta_1 = 0.9, beta_2 = 0.999, epsilon = 1e-08, n_iter_no_change = 10, max_fun = 15000 |
| LDA | solver = ‘svd’, shrinkage = None, priors = None, n_components = None, store_covariance = False, tol = 0.0001 |
| RF | n_estimators = 100, *, criterion = ‘gini’, max_depth = None, min_samples_split = 2, min_samples_leaf = 1, min_weight_fraction_leaf = 0.0, max_features = ‘auto’, max_leaf_nodes = None, min_impurity_decrease = 0.0, min_impurity_split = None, bootstrap = True, oob_score = False, n_jobs = None, random_state = None, verbose = 0, warm_start = False, class_weight = None, ccp_alpha = 0.0, max_samples = None |
| LR | penalty = ‘l2’, *, dual = False, tol = 0.0001, C = 1.0, fit_intercept = True, intercept_scaling = 1, class_weight = None, random_state = None, solver = ‘lbfgs’, max_iter = 100, multi_class = ‘auto’, verbose = 0, warm_start = False, n_jobs = None, l1_ratio = None |
| LRLasso | alpha = 1.0, *, fit_intercept = True, normalize = False, precompute = False, copy_X = True, max_iter = 1000, tol = 0.0001, warm_start = False, positive = False, random_state = None, selection = ‘cyclic’ |
| AB | base_estimator = None, *, n_estimators = 50, learning_rate = 1.0, algorithm = ‘SAMME.R’, random_state = None |
| DT | criterion = ‘gini’, splitter = ‘best’, max_depth = None, min_samples_split = 2, min_samples_leaf = 1, min_weight_fraction_leaf = 0.0, max_features = None, random_state = None, max_leaf_nodes = None, min_impurity_decrease = 0.0, min_impurity_split = None, class_weight = None, ccp_alpha = 0.0 |
| GP | kernel = None, *, optimizer = ‘fmin_l_bfgs_b’, n_restarts_optimizer = 0, max_iter_predict = 100, warm_start = False, copy_X_train = True, random_state = None, multi_class = ‘one_vs_rest’, n_jobs = None |
| NB | alpha = 1.0, binarize = 0.0, fit_prior = True, class_prior = None |
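The parameter names in the table above look like the default signatures of scikit-learn estimators, although the table itself does not name the library. Under that assumption, the sketch below instantiates one estimator per row. The mapping from abbreviation to class is inferred from the parameter names only: `MLPClassifier` for AE, `Lasso` for LRLasso, and `BernoulliNB` for NB. Parameters that newer scikit-learn releases have removed (`min_impurity_split`, `normalize`, `base_estimator`, `max_features = 'auto'`) are omitted so the code runs on current versions.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.naive_bayes import BernoulliNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumed abbreviation-to-class mapping; only a subset of the tabulated values is passed.
MODELS = {
    "SVM": SVC(C=1.0, kernel="rbf", gamma="scale", tol=0.001),
    "AE": MLPClassifier(hidden_layer_sizes=(100,), activation="relu", solver="adam",
                        alpha=0.0001, learning_rate_init=0.001, max_iter=200),
    "LDA": LinearDiscriminantAnalysis(solver="svd", tol=0.0001),
    "RF": RandomForestClassifier(n_estimators=100, criterion="gini", bootstrap=True),
    "LR": LogisticRegression(C=1.0, solver="lbfgs", max_iter=100, tol=0.0001),
    # Lasso is a regressor; its continuous output needs a cutoff to give class labels.
    "LRLasso": Lasso(alpha=1.0, max_iter=1000, tol=0.0001, selection="cyclic"),
    "AB": AdaBoostClassifier(n_estimators=50, learning_rate=1.0),
    "DT": DecisionTreeClassifier(criterion="gini", splitter="best"),
    "GP": GaussianProcessClassifier(n_restarts_optimizer=0, max_iter_predict=100),
    "NB": BernoulliNB(alpha=1.0, binarize=0.0, fit_prior=True),
}
```

Because `random_state = None` appears throughout the table, repeated runs of the stochastic estimators (AE, RF, AB, DT) are not expected to reproduce identical results unless a seed is fixed.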
Fig. 2 A schematic diagram of the whole radiomics and machine learning pipeline
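The preprocessing stages named in the abstract and in the model labels (Zscore, PCC, ANOVA) can be sketched as follows. This is an illustrative NumPy version, not the study's implementation. The function names, the 0.99 correlation threshold, and the synthetic data are assumptions.

```python
import numpy as np


def zscore(X):
    """Standardize each feature column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def pcc_filter(X, threshold=0.99):
    """Greedy redundancy filter: keep a feature only if |r| with every kept feature is below threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept


def anova_f(X, y):
    """One-way ANOVA F statistic per feature: between-class over within-class mean square."""
    classes = np.unique(y)
    n, k = X.shape[0], len(classes)
    grand = X.mean(axis=0)
    between = sum((y == c).sum() * (X[y == c].mean(axis=0) - grand) ** 2 for c in classes)
    within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0) for c in classes)
    return (between / (k - 1)) / (within / (n - k))


# Synthetic example: column 1 is a linear copy of column 0, column 2 is pure noise.
rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 20)
informative = y + rng.normal(scale=0.5, size=40)
noise = rng.normal(size=40)
X = np.column_stack([informative, 2.0 * informative + 1.0, noise])

Xz = zscore(X)
kept = pcc_filter(Xz)             # drops the redundant copy
scores = anova_f(Xz[:, kept], y)  # ranks the remaining features
best = kept[int(np.argmax(scores))]
```

In a train/test setting, the standardization statistics and the selected feature indices would be computed on the training set only and then applied unchanged to the test set.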
Fig. 3 Performance of the model generated using recursive feature elimination (RFE). (a) Receiver operating characteristic (ROC) curves of this model on the different datasets. (b) FeAture Explorer (FAE) software suggested a candidate eight-feature model according to the “one-standard error” rule. (c) The contribution of features in the final model
Fig. 4 Performance of the model generated using the analysis of variance (ANOVA). (a) Receiver operating characteristic (ROC) curves of this model on the different datasets. (b) FeAture Explorer (FAE) software suggested a candidate one-feature model according to the “one-standard error” rule. (c) The contribution of features in the final model
Fig. 5 Performance of the model generated using relief. (a) Receiver operating characteristic (ROC) curves of this model on the different datasets. (b) FeAture Explorer (FAE) software suggested a candidate one-feature model according to the “one-standard error” rule. (c) The contribution of features in the final model
Fig. 6 Areas under the curve (AUCs) for the different datasets. Feature selection using (a) recursive feature elimination (RFE), (b) analysis of variance (ANOVA), and (c) relief
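The captions of Figs. 3 to 5 refer to a “one-standard error” rule for choosing the number of features. In its common formulation, the rule picks the simplest candidate whose validation score lies within one standard error of the best score. The sketch below shows that formulation. How FAE implements the rule is not described here, and the function name and the candidate values are hypothetical.

```python
def one_standard_error_choice(candidates):
    """Pick the smallest feature count whose AUC is within one SE of the best AUC.

    candidates: list of (n_features, mean_auc, std_error) tuples.
    """
    _, best_auc, best_se = max(candidates, key=lambda c: c[1])
    threshold = best_auc - best_se
    eligible = [c for c in candidates if c[1] >= threshold]
    return min(eligible, key=lambda c: c[0])[0]


# Hypothetical validation results (not taken from the study).
candidates = [
    (1, 0.85, 0.04),
    (2, 0.88, 0.04),
    (3, 0.89, 0.03),
    (5, 0.90, 0.03),
    (8, 0.895, 0.03),
]
chosen = one_standard_error_choice(candidates)
```

Here the best AUC is 0.90 with a standard error of 0.03, so any model scoring at least about 0.87 qualifies, and the two-feature model is the smallest of those.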
The optimal area under the receiver operating characteristic curve (AUC), 95% confidence interval (CI), standard error (Std), accuracy (Acc), Youden index, sensitivity (Sen), specificity (Spe), positive predictive value (PPV), and negative predictive value (NPV) of all algorithm classifications with leave-one-out cross-validation
| Feature set | AUC | 95% CIs | Std | Acc | Youden Index | Sen | Spe | PPV | NPV |
|---|---|---|---|---|---|---|---|---|---|
| Zscore_PCC_ANOVA_1_AE | 0.895 | [0.8260–0.9533] | 0.033 | 0.840 | 0.513 | 0.833 | 0.848 | 0.877 | 0.796 |
| Zscore_PCC_ANOVA_3_LDA | 0.891 | [0.8212–0.9481] | 0.032 | 0.840 | 0.367 | 0.850 | 0.826 | 0.864 | 0.809 |
| Zscore_PCC_ANOVA_3_LRLasso | 0.887 | [0.8114–0.9512] | 0.035 | 0.849 | 0.359 | 0.883 | 0.804 | 0.855 | 0.841 |
| Zscore_PCC_ANOVA_8_SVM | 0.885 | [0.8100–0.9461] | 0.034 | 0.830 | 0.586 | 0.767 | 0.913 | 0.920 | 0.750 |
| Zscore_PCC_ANOVA_3_LR | 0.884 | [0.8103–0.9449] | 0.035 | 0.840 | 0.388 | 0.883 | 0.783 | 0.841 | 0.837 |
| Zscore_PCC_ANOVA_1_NB | 0.878 | [0.8025–0.9444] | 0.036 | 0.840 | 0.403 | 0.833 | 0.848 | 0.877 | 0.796 |
| Zscore_PCC_ANOVA_3_RF | 0.878 | [0.7977–0.9457] | 0.038 | 0.859 | 0.520 | 0.867 | 0.848 | 0.881 | 0.830 |
| Zscore_PCC_ANOVA_4_GP | 0.869 | [0.7940–0.9343] | 0.036 | 0.821 | 0.482 | 0.817 | 0.826 | 0.860 | 0.776 |
| Zscore_PCC_ANOVA_3_AB | 0.865 | [0.7867–0.9337] | 0.038 | 0.802 | 0.529 | 0.683 | 0.957 | 0.954 | 0.698 |
| Zscore_PCC_ANOVA_5_DT | 0.811 | [0.7298–0.8842] | 0.039 | 0.811 | 1.000 | 0.817 | 0.804 | 0.845 | 0.771 |
| Zscore_PCC_RFE_10_GP | 0.902 | [0.8379–0.9519] | 0.029 | 0.830 | 0.498 | 0.867 | 0.783 | 0.839 | 0.818 |
| Zscore_PCC_RFE_1_AE | 0.895 | [0.8260–0.9533] | 0.033 | 0.840 | 0.513 | 0.833 | 0.848 | 0.877 | 0.796 |
| Zscore_PCC_RFE_8_RF | 0.894 | [0.8251–0.9525] | 0.033 | 0.849 | 0.565 | 0.833 | 0.870 | 0.893 | 0.800 |
| Zscore_PCC_RFE_3_LDA | 0.891 | [0.8208–0.9478] | 0.032 | 0.840 | 0.367 | 0.850 | 0.826 | 0.864 | 0.809 |
| Zscore_PCC_RFE_3_LRLasso | 0.886 | [0.8107–0.9502] | 0.035 | 0.849 | 0.359 | 0.883 | 0.804 | 0.855 | 0.841 |
| Zscore_PCC_RFE_1_SVM | 0.883 | [0.8073–0.9460] | 0.035 | 0.821 | 0.672 | 0.733 | 0.935 | 0.936 | 0.729 |
| Zscore_PCC_RFE_3_LR | 0.883 | [0.8096–0.9449] | 0.035 | 0.840 | 0.388 | 0.883 | 0.783 | 0.841 | 0.837 |
| Zscore_PCC_RFE_1_NB | 0.878 | [0.8025–0.9444] | 0.036 | 0.840 | 0.403 | 0.833 | 0.848 | 0.877 | 0.796 |
| Zscore_PCC_RFE_3_AB | 0.865 | [0.7867–0.9337] | 0.038 | 0.802 | 0.529 | 0.683 | 0.957 | 0.954 | 0.698 |
| Zscore_PCC_RFE_9_DT | 0.808 | [0.7304–0.8795] | 0.039 | 0.811 | 1.000 | 0.833 | 0.783 | 0.833 | 0.783 |
| Zscore_PCC_Relief_5_LRLasso | 0.886 | [0.8108–0.9483] | 0.035 | 0.849 | 0.359 | 0.883 | 0.804 | 0.855 | 0.841 |
| Zscore_PCC_Relief_5_LDA | 0.884 | [0.8088–0.9435] | 0.033 | 0.840 | 0.332 | 0.883 | 0.783 | 0.841 | 0.837 |
| Zscore_PCC_Relief_5_SVM | 0.883 | [0.8077–0.9454] | 0.035 | 0.830 | 0.511 | 0.817 | 0.848 | 0.875 | 0.780 |
| Zscore_PCC_Relief_5_LR | 0.882 | [0.8055–0.9454] | 0.035 | 0.830 | 0.417 | 0.850 | 0.804 | 0.850 | 0.804 |
| Zscore_PCC_Relief_3_GP | 0.880 | [0.8054–0.9423] | 0.035 | 0.840 | 0.552 | 0.800 | 0.891 | 0.906 | 0.774 |
| Zscore_PCC_Relief_3_NB | 0.875 | [0.7950–0.9423] | 0.037 | 0.830 | 0.362 | 0.817 | 0.848 | 0.875 | 0.780 |
| Zscore_PCC_Relief_2_AE | 0.871 | [0.7907–0.9373] | 0.037 | 0.821 | 0.441 | 0.817 | 0.826 | 0.860 | 0.776 |
| Zscore_PCC_Relief_19_RF | 0.869 | [0.7947–0.9347] | 0.035 | 0.821 | 0.645 | 0.767 | 0.891 | 0.902 | 0.746 |
| Zscore_PCC_Relief_5_AB | 0.855 | [0.7652–0.9254] | 0.040 | 0.821 | 0.513 | 0.767 | 0.891 | 0.902 | 0.746 |
| Zscore_PCC_Relief_9_DT | 0.786 | [0.7069–0.8678] | 0.041 | 0.793 | 1.000 | 0.833 | 0.739 | 0.807 | 0.773 |
SVM, support vector machine; LDA, linear discriminant analysis; AE, auto-encoder; RF, random forests; LR, logistic regression; LRLasso, logistic regression via Lasso; AB, ada-boost; DT, decision tree; GP, Gaussian process; NB, naive Bayes
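The evaluation scheme behind the table can be sketched in two parts: a leave-one-out loop that predicts each case from a model fitted on all the others, and the diagnostic metrics computed from the pooled predictions. The code below is an illustration with a stand-in nearest-centroid classifier, and all function names are assumptions. The counts in the example (TP = 50, FN = 10, TN = 39, FP = 7) are hypothetical, chosen only because they round to the Acc, Sen, Spe, PPV, and NPV values in the first table row. The source does not report a confusion matrix. Youden's J is computed with the conventional definition, sensitivity + specificity − 1. The “Youden Index” column of the table may follow a different convention that the source does not specify.

```python
def leave_one_out_predictions(X, y, fit, predict):
    """Hold out each sample once, fit on the remainder, and predict the held-out sample."""
    preds = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(model, X[i]))
    return preds


# Stand-in classifier (hypothetical): nearest class mean on a single feature.
def fit_centroids(X, y):
    return {c: sum(x for x, t in zip(X, y) if t == c) / y.count(c) for c in set(y)}


def predict_centroid(centroids, x):
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def diagnostic_metrics(y_true, y_pred):
    """Binary metrics with 1 as the positive class; assumes both classes are predicted at least once."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    return {
        "Acc": (tp + tn) / len(pairs),
        "Sen": sen,
        "Spe": spe,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "YoudenJ": sen + spe - 1,  # conventional definition
    }


# Leave-one-out on a toy one-feature dataset (illustrative values).
X_toy = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]
y_toy = [0, 0, 0, 1, 1, 1]
loo_preds = leave_one_out_predictions(X_toy, y_toy, fit_centroids, predict_centroid)

# Hypothetical confusion-matrix counts: TP = 50, FN = 10, TN = 39, FP = 7.
y_true = [1] * 60 + [0] * 46
y_pred = [1] * 50 + [0] * 10 + [0] * 39 + [1] * 7
metrics = diagnostic_metrics(y_true, y_pred)
```

Pooling the held-out predictions before computing the metrics, as done here, is one common way to summarize leave-one-out results, since per-fold sensitivity or specificity is undefined when each fold holds a single case.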