Reza Reiazi, Colin Arrowsmith, Mattea Welch, Farnoosh Abbas-Aghababazadeh, Christopher Eeles, Tony Tadic, Andrew J Hope, Scott V Bratman, Benjamin Haibe-Kains.
Abstract
Studies have shown that radiomic features are sensitive to the variability of imaging parameters (e.g., scanner models), and one of the major challenges in these studies lies in improving the robustness of quantitative features against the variations in imaging datasets from multi-center studies. Here, we assess the impact of scanner choice on computed tomography (CT)-derived radiomic features to predict the association of oropharyngeal squamous cell carcinoma with human papillomavirus (HPV). This experiment was performed on CT image datasets acquired from two different scanner manufacturers. We demonstrate strong scanner dependency by developing a machine learning model to classify HPV status from radiological images. These experiments reveal the effect of scanner manufacturer on the robustness of radiomic features, and the extent of this dependency is reflected in the performance of HPV prediction models. The results of this study highlight the importance of implementing an appropriate approach to reducing the impact of imaging parameters on radiomic features and consequently on the machine learning models, without removing features which are deemed non-robust but may contain learning information.
Keywords: computed tomography; human papillomavirus; oropharyngeal cancer; radiomics; robustness
Year: 2021 PMID: 34066857 PMCID: PMC8125906 DOI: 10.3390/cancers13092269
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.639
Figure 1. Schematic diagram of the research methodology. The pipeline proceeds as follows: sampling the original patient cohort, splitting it into train and test sets, adjusting for class imbalance, selecting robust features (Wilcoxon rank-sum) and HPV-relevant features (mRMRe), and finally validating the model by estimating AUC values on the test set. The overall process is repeated 1000 times (also with random variables) to evaluate the statistical significance of the reported values.
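The final validation step in Figure 1 can be illustrated with a minimal sketch. It is not the authors' code: the function names `auc` and `random_label_aucs` are hypothetical, and the random baseline here only shuffles the labels against fixed model scores, which is a simplification of re-running the whole pipeline with random variables.

```python
import random


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def random_label_aucs(scores, labels, n_repeats=1000, seed=0):
    """AUC values obtained after shuffling the labels (a chance-level baseline)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    results = []
    for _ in range(n_repeats):
        rng.shuffle(shuffled)
        results.append(auc(scores, shuffled))
    return results
```

Comparing the observed AUC with the spread of the shuffled-label AUCs gives a rough sense of whether a reported value is distinguishable from chance.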
Figure 2. t-SNE clusters labeled by scanner manufacturer ((A) red: GE, blue: Toshiba) and by the samples’ HPV status ((B) orange: HPV negative, green: HPV positive). The corresponding silhouette analysis and average silhouette score are shown on the right. The impact of scanner manufacturer is clearly seen when samples are labeled by manufacturer type. However, radiomic features do not show intrinsic dependency on the sample’s HPV status.
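The average silhouette score mentioned in Figure 2 can be sketched as follows. This is an illustrative implementation under stated assumptions: Euclidean distance on already-embedded 2D points, and a hypothetical function name `silhouette_score`. The t-SNE embedding itself is not reproduced here.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all samples (Euclidean distance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("at least two distinct labels are required")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the sample's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(labels)
```

Scoring the same points once with manufacturer labels and once with HPV labels mirrors the comparison in panels (A) and (B): a score near 1 indicates well-separated groups, while a score near or below 0 indicates that the labels do not follow the cluster structure.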
Figure 3. Percentage of robust features according to the type of feature group (A,C) and imaging filters (B,D). (A,B) have been normalized to the total number of robust features and (C,D) have been normalized to the number of features in each feature group.
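The two normalizations in Figure 3 answer different questions, which a small sketch makes concrete. The function name `robust_percentages` and the feature names in the usage note are hypothetical and chosen only for illustration.

```python
from collections import Counter


def robust_percentages(feature_groups, robust_features):
    """Return two percentage tables keyed by group.

    feature_groups: dict mapping feature name -> group (or filter) name.
    robust_features: collection of feature names judged robust.
    """
    group_sizes = Counter(feature_groups.values())
    robust_counts = Counter(feature_groups[name] for name in robust_features)
    total_robust = len(robust_features)

    # Normalized to the total number of robust features (panels A, B).
    share_of_robust = {
        g: 100.0 * robust_counts[g] / total_robust if total_robust else 0.0
        for g in group_sizes
    }
    # Normalized to the number of features in each group (panels C, D).
    share_within_group = {
        g: 100.0 * robust_counts[g] / group_sizes[g] for g in group_sizes
    }
    return share_of_robust, share_within_group
```

For example, with two FO features (one robust) and four GLCM features (two robust), GLCM accounts for about 67% of all robust features under the first normalization, yet both groups are 50% robust under the second.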
Figure 4. Percentage of HPV-relevant features for different samples (GE, Toshiba and mix) according to the type of feature group and imaging filters prior to robustness evaluation (A,B) and after (C,D). (A,B): GLRLM: Gray Level Run Length Matrix; GLSZM: Gray Level Size Zone Matrix; FO: First Order Statistics; GLCM: Gray Level Co-Occurrence Matrix; GLDM: Gray Level Dependence Matrix; NGTDM: Neighboring Gray Tone Difference Matrix. (C,D): Orig: Original; Exp: Exponential; Gra: Gradient; LBP: Local Binary Pattern; Log: Logarithm; LoG: Laplacian of Gaussian; Sq: Square; SqR: Square Root; and WL: Wavelet.
Figure 5. Venn diagram of the common radiomic features selected out of samples from different CT scanner types from (A) all radiomic features and (B) only robust features.
Figure 6. The prediction accuracy (AUC) of HPV status obtained by the RF classifiers for 9 configurations of scanner manufacturers used for training and testing, after 100 runs. The Wilcoxon rank-sum test was applied to select features robust to the scanner models (adjusted p-value > 10⁻², Bonferroni correction). The mRMRe was used to select HPV-relevant features. The model was trained and tested on different sets based on their scanner manufacturer (T: Toshiba, G: GE, M: mix) with a different number of features (mRMRe and mRMR + Robust). The corresponding scatter plots (color circles below each violin plot) are from the same model but with random dependent variables.
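The robustness filter described in Figure 6 can be sketched as below. This is an assumption-laden illustration rather than the authors' implementation: the rank-sum p-value uses a normal approximation with midranks and no tie correction, and the names `rank_sum_p_value` and `select_robust` are hypothetical. A feature is kept when its Bonferroni-adjusted p-value exceeds the threshold, that is, when no significant difference between scanners is detected.

```python
import math


def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, midranks)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_x - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))


def select_robust(features_a, features_b, alpha=1e-2):
    """Keep features whose Bonferroni-adjusted p-value is above alpha.

    features_a / features_b: dict mapping feature name -> list of values
    measured on scanner A / scanner B.
    """
    n_tests = len(features_a)
    robust = []
    for name in features_a:
        p = rank_sum_p_value(features_a[name], features_b[name])
        p_adjusted = min(1.0, p * n_tests)  # Bonferroni correction
        if p_adjusted > alpha:
            robust.append(name)
    return robust
```

On toy data where one feature has interleaved values across scanners and another is shifted entirely, only the interleaved feature passes the filter.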