Okan İnce1, Hülya Yıldız1, Tanju Kisbet1, Şükrü Mehmet Ertürk2, Hakan Önder1.
Abstract
Purpose: This study aims to evaluate the potential of machine learning algorithms built with radiomics features from computed tomography urography (CTU) images to classify RB1 gene mutation status in bladder cancer. Method: The study included CTU images of 18 patients with and 54 patients without RB1 mutation from a public database. Image and data preprocessing were performed after data augmentation. Feature selection consisted of a filter method and a wrapper method: Pearson's correlation analysis served as the filter, and a sequential feature selection algorithm served as the wrapper. Models were developed with the XGBoost, Random Forest (RF), and k-Nearest Neighbors (kNN) algorithms. Performance metrics of the models were calculated, and the models' performances were compared using Friedman's test.
Keywords: Bladder; CT; Computer applications-detection; Diagnosis
Year: 2022 PMID: 35520623 PMCID: PMC9061624 DOI: 10.1016/j.heliyon.2022.e09311
Source DB: PubMed Journal: Heliyon ISSN: 2405-8440
Figure 1. Radiomics workflow. GLDM: Gray-level dependence matrix, GLCM: Gray-level co-occurrence matrix, GLRLM: Gray-level run-length matrix, GLSZM: Gray-level size zone matrix, NGTDM: Neighboring gray-tone difference matrix.
According to the Radiomics Quality Score, the strengths of this study are as follows (21 points in total).
| Radiomics Quality Score |
|---|
| Well-documented image quality protocols. (Criteria-1, 1 point) |
| Implementing a two-step feature reduction to reduce the risk of overfitting. Using the 5-fold cross-validation technique in the wrapper method also helped reduce the risk of overfitting. (Criteria-5, 3 points) |
| Discussing the correlation of the radiomics models with a biological gene. (Criteria-7, 1 point) |
| Reporting discrimination statistics, also using the cross-validation/bootstrapping technique (used within some of the algorithms). (Criteria-9, 2 points) |
| Reporting calibration statistics, also using the cross-validation/bootstrapping technique (used within some of the algorithms). (Criteria-10, 2 points) |
| Validating the models' performances on an independent test set created by splitting the main data, which came from multiple centers. (Criteria-12, 5 points) |
| The gold standard of the study was the histopathological and genomic examinations included in the TCGA-BLCA database. The models' performances were evaluated against the results of the gold standard. (Criteria-13, 2 points) |
| Discussing the future clinical implementation of the models. (Criteria-14, 2 points) |
| Using open-source data. The medical images are freely accessible and can be downloaded from The Cancer Imaging Archive. The project IDs of the patients in each group, which can be found in the TCGA-BLCA project, are shareable upon request to the corresponding author, as are the segmentation labels and the radiomics features dataset. (Criteria-16, 3 points) |
Figure 2. 2D segmentation of the tumor from the anterior bladder wall in the preprocessed image.
Total features extracted from original and wavelet filtered images. (GLDM: Gray-level dependence matrix, GLCM: Gray-level co-occurrence matrix, GLRLM: Gray-level run length matrix, GLSZM: Gray-level size-zone matrix, NGTDM: Neighboring gray-tone difference matrix).
| Shape | First Order | GLDM | GLCM | GLRLM | GLSZM | NGTDM |
|---|---|---|---|---|---|---|
| Voxel Volume | Interquartile Range | Gray Level Variance | Joint Average | Short Run Low Gray Level Emphasis | Gray Level Variance | Coarseness |
| Maximum 3D Diameter | Skewness | High Gray Level Emphasis | Sum Average | Gray Level Variance | Zone Variance | Complexity |
| Mesh Volume | Uniformity | Dependence Entropy | Joint Entropy | Low Gray Level Run Emphasis | Gray Level Non Uniformity Normalized | Strength |
| Major Axis Length | Median | Dependence Non Uniformity | Cluster Shade | Gray Level Non Uniformity Normalized | Size Zone Non Uniformity Normalized | Contrast |
| Sphericity | Energy | Gray Level Non Uniformity | Maximum Probability | Run Variance | Size Zone Non Uniformity | Busyness |
| Least Axis Length | Robust Mean Absolute Deviation | Small Dependence Emphasis | Idmn | Gray Level Non Uniformity | Gray Level Non Uniformity | |
| Elongation | Mean Absolute Deviation | Small Dependence High Gray Level Emphasis | Joint Energy | Long Run Emphasis | Large Area Emphasis | |
| Surface Volume Ratio | Total Energy | Dependence Non Uniformity Normalized | Contrast | Short Run High Gray Level Emphasis | Small Area High Gray Level Emphasis | |
| Maximum 2D Diameter Slice | Maximum | Large Dependence Emphasis | Difference Entropy | Run Length Non Uniformity | Zone Percentage | |
| Flatness | Root Mean Squared | Large Dependence Low Gray Level Emphasis | Inverse Variance | Short Run Emphasis | Large Area Low Gray Level Emphasis | |
| Surface Area | 90 Percentile | Dependence Variance | Difference Variance | Long Run High Gray Level Emphasis | Large Area High Gray Level Emphasis | |
| Minor Axis Length | Minimum | Large Dependence High Gray Level Emphasis | Idn | Run Percentage | High Gray Level Zone Emphasis | |
| Maximum 2D Diameter Column | Entropy | Small Dependence Low Gray Level Emphasis | Idm | Long Run Low Gray Level Emphasis | Small Area Emphasis | |
| Maximum 2D Diameter Row | Range | Low Gray Level Emphasis | Correlation | Run Entropy | Low Gray Level Zone Emphasis | |
Demographic characteristics of included patients by their RB1 mutation status.
| | With RB1 mutation | Without RB1 mutation |
|---|---|---|
| Age (mean ± SD) | 69.1 ± 7.5 | 69.1 ± 11 |
| Sex (female/male) | 4/14 | 14/40 |
| Scanner (Vendor/Model) | 4/11 | 4/14 |
RB1: Retinoblastoma 1 gene.
Selected features after both filter and wrapper methods.
| Feature Label | Image Type | Feature Class | Feature Name |
|---|---|---|---|
| TexF1 | wavelet-HLL | GLDM | Dependence Entropy |
| TexF2 | wavelet-LHL | FIRST ORDER | Maximum |
| TexF3 | wavelet-LHH | GLRLM | Low Gray Level Run Emphasis |
| TexF4 | wavelet-LHH | GLSZM | Gray Level Variance |
| TexF5 | wavelet-LHH | GLSZM | High Gray Level Zone Emphasis |
| TexF6 | wavelet-LLH | FIRST ORDER | Energy |
| TexF7 | wavelet-LLH | GLSZM | Gray Level Variance |
| TexF8 | wavelet-LLH | NGTDM | Strength |
| TexF9 | wavelet-HHH | FIRST ORDER | Skewness |
| TexF10 | wavelet-HHH | GLSZM | Gray Level Non Uniformity |
| TexF11 | wavelet-LLL | GLCM | Correlation |
| TexF12 | original | GLCM | Correlation |
L: Low, H: High, GLDM: Gray-level dependence matrix, GLRLM: Gray-level run-length matrix, GLSZM: Gray-level size-zone matrix, NGTDM: Neighboring gray-tone difference matrix, GLCM: Gray-level co-occurrence matrix.
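The wrapper step that produced the features listed above can be illustrated with a minimal pure-Python sketch. It assumes greedy forward selection scored by the 5-fold cross-validated accuracy of a small kNN classifier; the study text here does not state the search direction or the estimator used inside the wrapper, so both are assumptions, and the function names (`knn_predict`, `cv_accuracy`, `forward_select`) and the toy data are hypothetical.

```python
import random


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def cv_accuracy(X, y, cols, folds=5, k=3):
    """Cross-validated accuracy using only the feature columns listed in `cols`."""
    correct = 0
    for fold in range(folds):
        # Assumption: simple interleaved folds (sample i goes to fold i % folds).
        train = [i for i in range(len(X)) if i % folds != fold]
        test = [i for i in range(len(X)) if i % folds == fold]
        train_X = [[X[i][c] for c in cols] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            row = [X[i][c] for c in cols]
            correct += knn_predict(train_X, train_y, row, k) == y[i]
    return correct / len(X)


def forward_select(X, y, n_select):
    """Greedily add the feature that most improves the cross-validated score."""
    selected, remaining = [], list(range(len(X[0])))
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda c: cv_accuracy(X, y, selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (not from the study): column 0 separates the classes, columns 1-2 are noise.
random.seed(0)
y = [i % 2 for i in range(40)]
X = [[10.0 * label + random.random(), random.random(), random.random()] for label in y]
print(forward_select(X, y, n_select=1))  # [0]
```

In practice a library routine such as scikit-learn's `SequentialFeatureSelector` would normally replace the hand-written loop; the sketch only makes the greedy search and the role of cross-validation explicit.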
Figure 3. Correlation matrix of the selected features, shown as a heatmap. No two features are strongly correlated with each other (r < 0.7).
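A correlation filter of the kind that leads to such a matrix can be sketched as follows. This is a minimal illustration, not the authors' code: the 0.7 threshold is taken from the caption, while the rule for which member of a correlated pair to drop (keep whichever appears first), the use of the absolute value of r, and the names `pearson`, `correlation_filter`, and `toy` are assumptions.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)  # undefined for constant features


def correlation_filter(features, threshold=0.7):
    """Keep a feature only if |r| < threshold against every feature already kept.

    `features` maps a feature name to its list of values across patients.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy values (not from the study).
toy = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly a multiple of f1, so it is dropped
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],  # r = -0.3 against f1, so it is kept
}
print(correlation_filter(toy))  # ['f1', 'f3']
```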
Performance metrics and confusion matrices of three algorithms.
| Algorithm | Accuracy | Sensitivity | Specificity | Precision | Recall | F1 | AUC | TP/FP | TN/FN |
|---|---|---|---|---|---|---|---|---|---|
| XGBoost | 84% | 80% | 88% | 86% | 80% | 0.83 | 0.84 | 12/2 | 15/3 |
| RF | 72% | 80% | 65% | 67% | 80% | 0.73 | 0.72 | 12/6 | 11/3 |
| kNN | 66% | 53% | 76% | 67% | 53% | 0.60 | 0.65 | 8/4 | 13/7 |
RF: Random forest, kNN: k-Nearest Neighbors, AUC: Area under the receiver operating characteristic curve, TP: True positive, FP: False positive, TN: True negative, FN: False negative.
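The relationship between the confusion-matrix counts and the reported metrics can be checked with a short sketch. The counts are taken from the TP/FP and TN/FN columns of the table above; the function name `metrics` and the dictionary layout are illustrative choices, and AUC is omitted because it cannot be derived from the counts alone.

```python
def metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# (TP, FP, TN, FN) per algorithm, as reported in the table.
counts = {
    "XGBoost": (12, 2, 15, 3),
    "RF": (12, 6, 11, 3),
    "kNN": (8, 4, 13, 7),
}

for name, c in counts.items():
    print(name, {k: round(v, 2) for k, v in metrics(*c).items()})
```

Recomputing from the counts reproduces the tabulated values to within rounding. The one small difference is the kNN F1 score, which comes out at about 0.59 from the raw counts against the reported 0.60, possibly because intermediate values were rounded.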
Figure 4. Calibration plot of the three algorithms. Each algorithm is better calibrated at higher predicted probabilities.
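For readers unfamiliar with calibration plots, the points on such a curve are typically obtained by binning the predicted probabilities and comparing the mean prediction in each bin with the observed fraction of positives. The sketch below shows that computation under stated assumptions: equal-width bins, a hypothetical function name `calibration_points`, and toy labels and probabilities that are not from the study. How the published plot was actually produced is not stated here.

```python
def calibration_points(y_true, y_prob, n_bins=2):
    """Return (mean predicted probability, observed positive fraction) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for label, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((label, p))
    points = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            frac_pos = sum(label for label, _ in members) / len(members)
            points.append((mean_pred, frac_pos))
    return points


# Toy example (not from the study): a well-calibrated model lies near the diagonal.
y_prob = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
for mean_pred, frac_pos in calibration_points(y_true, y_prob):
    print(round(mean_pred, 2), round(frac_pos, 2))
```

Points above the diagonal indicate under-prediction of risk in that bin, and points below it indicate over-prediction.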