Simon J Doran1, Santosh Kumar2, Matthew Orton2, James d'Arcy2, Fenna Kwaks2, Elizabeth O'Flynn3, Zaki Ahmed4, Kate Downey2,4, Mitch Dowsett5,6, Nicholas Turner6,7, Christina Messiou2,4, Dow-Mu Koh2,4.
Abstract
BACKGROUND: Most MRI radiomics studies to date, even multi-centre ones, have used "pure" datasets deliberately accrued from single-vendor, single-field-strength scanners. This does not reflect aspirations for the ultimate generalisability of AI models. We therefore investigated the development of a radiomics signature from heterogeneous data acquired on six different imaging platforms, using breast cancer as an exemplar, in order to provide input into future discussions of the viability of radiomics in "real-world" scenarios, where image data are not controlled by specific trial protocols but instead reflect routine clinical practice.
Keywords: Feature reduction; Multi-vendor; Nodal status; Radiomics; Survival
Year: 2021 PMID: 34016188 PMCID: PMC8136229 DOI: 10.1186/s40644-021-00406-6
Source DB: PubMed Journal: Cancer Imaging ISSN: 1470-7330 Impact factor: 3.909
Fig. 1 (a) Flow diagram of the subject exclusion process; (b) Venn diagram illustrating the availability of data across image contrast types and explaining the patient numbers on the right-hand side of (a)
Sequence parameters for the image data analysed
| Parameter | Value |
|---|---|
| Manufacturer | Siemens, Philips |
| Model | Aera 1.5 T (65), Avanto 1.5 T (14), Intera 1.5 T (1), Achieva 1.5 T (32), Achieva 3 T (29), Skyra 3 T (15) |
| Field strength | 1.5 T (112), 3 T (44) |
| T2-weighted (T2W) imaging | |
| Sequence type | Multi-slice, turbo spin-echo, transverse |
| TR | 3400–8690 ms |
| TE | 70–120 ms |
| Matrix size x | 448–576 |
| Matrix size y | 448–576 |
| Matrix size z (number of slices) | 34–60 |
| Slice thickness | 3.0–4.0 mm |
| Dynamic contrast-enhanced (DCE) imaging | |
| Contrast agent | Dotarem® (gadoteric acid; gadoterate meglumine) |
| Typical administered dose | 0.2 ml/kg |
| Typical injection rate | 2–3 ml/s |
| Sequence type | 3-D, T1-w subtraction |
| TR | 3.9–5.5 ms |
| TE | 1.5–2.6 ms |
| Flip angle | 14–18° |
| Matrix size x | 290–512 |
| Matrix size y | 290–512 |
| Matrix size z | 125–224 |
| Slice thickness | 1.0–2.5 mm |
| Diffusion-weighted (DW) imaging | |
| Sequence type | Diffusion-weighted multi-slice EPI, transverse |
| TR | 2000–12,800 ms |
| TE | 56–86 ms |
| Matrix size x | 128–512 |
| Matrix size y | 72–404 |
| Matrix size z | 30–200 |
| Slice thickness | 3.0–5.0 mm |
Radiologist semantic features captured
| Image contrast | Feature (categories) |
|---|---|
| T2W | Tumour visible (y / n) |
| T2W | T2 signal (high / intermediate / low) |
| T2W | Shape (irregular / lobular / oval / round) |
| T2W | Margin (irregular / smooth / spiculated) |
| DCE | Tumour visible (y / n) |
| DCE | Mass enhancement (y / n) |
| DCE | MR assessment of focality (focal / multicentric / multifocal) |
| DCE | Lymph node appearance (normal / abnormal, primarily guided by size) |
| DCE | Shape (irregular / lobular / oval / round / non-mass-like enhancement) |
| DCE | Location quadrant (lower inner / lower outer / upper inner / upper outer) |
| DCE | Margin (irregular / smooth / spiculated / non-mass-like enhancement) |
| DW | Tumour visible (y / n) |
Calculated image region-of-interest features, as defined in the Supplementary Data of Aerts et al. [26]
| Shape and size features | |
|---|---|
| Compactness 1 | Spherical disproportion |
| Compactness 2 | Surface area |
| Maximum diameter | Surface-to-volume ratio |
| Sphericity | Volume |
| First-order statistics | |
| Energy | Minimum |
| Entropy | Range |
| Kurtosis | RMS |
| Maximum | Skewness |
| Mean | Standard deviation |
| Mean absolute deviation | Variance |
| Median | Uniformity |
| Textural features | |
| Autocorrelation | Inverse difference moment normalised |
| Cluster prominence | Inverse difference normalised |
| Cluster shade | Inverse variance |
| Cluster tendency | Long run emphasis |
| Contrast | Long run low grey level emphasis |
| Correlation | Low grey level run emphasis |
| Difference entropy | Maximum probability |
| Dissimilarity | Run length non-uniformity |
| Energy | Run percentage |
| Entropy | Short run emphasis |
| Grey level non-uniformity | Short run low grey level emphasis |
| High grey level run emphasis | Short run high grey level emphasis |
| Homogeneity 1 | Sum average |
| Homogeneity 2 | Sum entropy |
| Informational measure correlation 1 | Sum variance |
| Informational measure correlation 2 | Short run emphasis |
Patient demographics and clinical features available for evaluation
| Variable | Value |
|---|---|
| Age (years) | 54 ± 12.5 [24–88], IQR 16.3 |
| Receptor status ER (+/−) | negative (22), positive (133), NA (1) |
| Receptor status PR (+/−) | negative (44), positive (110), NA (2) |
| Receptor status HER2 (+/−) | negative (126), positive (16), NA (14) |
| Subtype | basal (12), HER2 (9), luminal (131), NA (4) |
| Molecular subtype | basal (12), HER2 (9), luminal A (113), luminal B (4), NA (18) |
| Grade | 1 (11), 2 (102), 3 (41), NA (2) |
| Lymphovascular space invasion (LVSI) (determined at surgery) | absent (107), present (42), NA (7) |
| Nodal status (determined at surgery) | negative (99), positive (39), micrometastases (11), NA (7) |
| Disease pathology (multiple conditions allowed per patient) | DCIS (47), IDC (95), ILC (62), LCIS (23) |
| Number of recorded lesions | 0* (7), 1 (133), 2 (14), 3 (2) |
| Laterality | left (87), right (63), bilateral (6) |
| Tumour size | (31 ± 24) [0–128] mm, IQR 26 mm |
| Focality (largest tumour) | unifocal (126), bifocal (12), multifocal (17), NA (1) |
| Type (largest tumour) | ductal (91), lobular (52), both (11), NA (2) |
| Surgery | central excision (1), mastectomy (58), therapeutic mammoplasty (1), wide local excision (WLE) (94), NA (2) |
| Neoadjuvant chemoendocrine therapy | no (112), yes (43), NA (1) |
| Date of diagnosis | March 2007 – July 2014 |
| Date of imaging | January 2012 – December 2013 |
| Date of surgery | January 2012 – September 2019 |
| Date of death or last follow up | August 2013 – August 2019 |
| Status at last follow-up | alive (140), dead (16) |

* For some patients, clinical measurements recorded after therapy noted no residual tumour.
Fig. 2 Pseudo-code describing model fitting, parameter tuning and performance estimation using a nested cross-validation process, as described in the text
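Fig. 2 gives the nested cross-validation procedure only as pseudo-code. The sketch below is a minimal, stdlib-only Python illustration of the general technique, not the authors' implementation: an inner cross-validation loop, run only on each outer training set, selects a hyperparameter, and the untouched outer fold then scores the refitted model. The one-dimensional k-nearest-neighbour model, the candidate grid `(1, 3, 5)`, the fold counts and the use of accuracy (rather than the AUC reported in the tables) are hypothetical choices made for brevity, and all function names are illustrative.

```python
import random

def k_folds(n, n_folds, seed):
    """Shuffle indices 0..n-1 and deal them into n_folds roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]

def knn_predict(train_x, train_y, x, k):
    """Hypothetical toy model: 1-D k-nearest-neighbour majority vote (use odd k)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train_x, train_y, test_x, test_y, k):
    preds = [knn_predict(train_x, train_y, x, k) for x in test_x]
    return sum(p == y for p, y in zip(preds, test_y)) / len(test_y)

def split(x, y, fold):
    """Return (train_x, train_y, test_x, test_y) for one fold of held-out indices."""
    held_out = set(fold)
    train = [i for i in range(len(x)) if i not in held_out]
    return ([x[i] for i in train], [y[i] for i in train],
            [x[i] for i in fold], [y[i] for i in fold])

def inner_cv_score(x, y, k, n_folds, seed):
    """Plain cross-validated accuracy for one hyperparameter value."""
    scores = [accuracy(*split(x, y, fold), k) for fold in k_folds(len(x), n_folds, seed)]
    return sum(scores) / len(scores)

def nested_cv(x, y, grid, outer_folds=5, inner_folds=3, seed=0):
    outer_scores, chosen = [], []
    for f, fold in enumerate(k_folds(len(x), outer_folds, seed)):
        tr_x, tr_y, te_x, te_y = split(x, y, fold)
        # Inner loop: tune k using the outer training data only.
        best_k = max(grid, key=lambda k: inner_cv_score(tr_x, tr_y, k, inner_folds, seed + f + 1))
        chosen.append(best_k)
        # Outer loop: refit with best_k and score on the held-out outer fold.
        outer_scores.append(accuracy(tr_x, tr_y, te_x, te_y, best_k))
    return sum(outer_scores) / len(outer_scores), chosen

# Usage with two well-separated synthetic classes (illustrative data only).
xs = [i / 10 for i in range(20)] + [10 + i / 10 for i in range(20)]
ys = [0] * 20 + [1] * 20
mean_acc, chosen_ks = nested_cv(xs, ys, grid=(1, 3, 5))
print(mean_acc, chosen_ks)
```

Because the outer fold never influences the choice of `k`, the mean outer score is less optimistically biased than scoring a tuned model on the same folds that were used to tune it.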
Fig. 3 An exemplar image set showing both the original ROIs and the repeat annotations for the ICC feature-stability sub-study for (a) T2-weighted, (b) early-phase dynamic subtraction and (c) diffusion-weighted images
Fig. 4 Radiomics features selected on the basis of the intraclass correlation coefficient (ICC), using a two-way “agreement” model with a threshold of 0.75, for the three different imaging contrasts
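The caption names a two-way "agreement" ICC with a 0.75 cut-off but not the exact variant. Assuming the single-measures, absolute-agreement form ICC(A,1) = (MSR − MSE) / (MSR + (k − 1)·MSE + k·(MSC − MSE)/n), where n is the number of subjects, k the number of repeat annotations, and MSR, MSC and MSE the subject, annotation and residual mean squares of a two-way ANOVA, a stdlib Python sketch follows. `icc_agreement` and `stable_features` are hypothetical helpers, and treating the 0.75 threshold as inclusive is an assumption.

```python
def icc_agreement(ratings):
    """Two-way, absolute-agreement, single-measures ICC (assumed variant).

    ratings[i][j] is the feature value for subject i from repeat annotation j.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between annotations
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols                     # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def stable_features(feature_ratings, threshold=0.75):
    """Keep features whose agreement ICC reaches the (assumed inclusive) threshold."""
    return [name for name, r in feature_ratings.items() if icc_agreement(r) >= threshold]

example = {
    "volume": [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],          # identical repeat ROIs
    "offset_feature": [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]],  # constant bias between ROIs
}
print({name: round(icc_agreement(r), 3) for name, r in example.items()})
print(stable_features(example))  # → ['volume']
```

The second feature is perfectly correlated between annotations but systematically offset by 1, so the agreement ICC falls to 2/3 and the feature is rejected; a consistency-type ICC would have scored it 1. This is the practical difference an "agreement" model makes.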
Results of classification modelling for the target variable lymph node status. Correlation-based feature selection (FS) refers to the method described in the Materials and Methods section, which incorporates both ICC and Spearman rank correlations, assessed in order of feature groups. Full feature selection starts with the features retained by the correlation-based approach and then applies R’s rfe (recursive feature elimination) algorithm under cross-validation. Results represent the mean AUC for 5 repetitions of 10-fold cross-validation, with standard deviations in the range 0.14–0.21 and standard errors of the mean in the range 0.02–0.03. However, the individual AUC values are not normally distributed, independent random variables, so these values should be regarded as indicative only, and we do not quote an estimated confidence interval.
| Model type | Variables included | AUC (correlation-based FS) | AUC (correlation-based FS + recursive elimination) |
|---|---|---|---|
| SVM | Clinical | 0.68 | 0.71 |
| Random forest | Clinical | 0.72 | |
| XGBoost | Clinical | 0.68 | 0.72 |
| Naïve Bayes | Clinical | 0.71 | 0.72 |
| SVM | Radiomics | 0.55 | 0.62 |
| Random forest | Radiomics | 0.57 | 0.64 |
| XGBoost | Radiomics | 0.48 | 0.60 |
| Naïve Bayes | Radiomics | 0.65 | |
| SVM | Clinical + Radiomics | 0.66 | 0.70 |
| Random forest | Clinical + Radiomics | 0.62 | 0.74 |
| XGBoost | Clinical + Radiomics | 0.56 | 0.71 |
| Naïve Bayes | Clinical + Radiomics | 0.67 | |
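The caption above describes a correlation-based feature selection that combines ICC and Spearman rank correlations, applied in order of feature groups, but does not give the exact rule. One common way to realise such a filter, shown here only as a hypothetical sketch, is greedy redundancy removal: walk the features in a fixed order, skip any that failed the ICC screen, and keep a feature only if its absolute Spearman correlation with every feature already kept is below a cut-off. The cut-off `rho_max = 0.9`, the greedy rule and all names are assumptions, not the paper's settings.

```python
import math

def average_ranks(values):
    """Ranks from 1, with tied values sharing the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # undefined for a constant feature; treated as uncorrelated (assumption)
    return cov / math.sqrt(va * vb)

def spearman(a, b):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(a), average_ranks(b))

def correlation_filter(features, ordered_names, stable, rho_max=0.9):
    """Greedy, order-dependent redundancy filter (hypothetical rule)."""
    kept = []
    for name in ordered_names:
        if name not in stable:
            continue  # failed the ICC repeatability screen
        if all(abs(spearman(features[name], features[k])) < rho_max for k in kept):
            kept.append(name)
    return kept

features = {
    "volume": [1, 2, 3, 4, 5],
    "surface_area": [2, 4, 6, 8, 11],  # monotone in volume, so Spearman rho = 1
    "entropy": [5, 1, 4, 2, 3],
}
print(correlation_filter(features, ["volume", "surface_area", "entropy"], stable=set(features)))
# → ['volume', 'entropy']
```

Because the walk is greedy, the order matters: of two redundant features, the one visited first is kept, which is one reason an explicit group ordering has to be specified.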
Fig. 5 Mean ROC curves for the nodal status classification problem using a Naïve Bayes classifier
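The caption does not say how the per-fold ROC curves were combined into a mean curve. A common choice, assumed here, is vertical averaging: compute each fold's empirical ROC curve, read off its true-positive rate on a shared false-positive-rate grid (interpolating linearly between ROC points), and average across folds. The stdlib Python sketch below uses hypothetical function names and toy scores.

```python
def roc_points(scores, labels):
    """Empirical ROC curve as (FPR, TPR) points, from the strictest threshold downwards."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    pts, tp, fp = [(0.0, 0.0)], 0, 0
    for i, (s, y) in enumerate(pairs):
        tp, fp = (tp + 1, fp) if y else (tp, fp + 1)
        # Emit a point only after the last case sharing this score, so ties form one step.
        if i + 1 == len(pairs) or pairs[i + 1][0] != s:
            pts.append((fp / neg, tp / pos))
    return pts

def tpr_at(points, fpr):
    """TPR at a given FPR: highest TPR at an exact match, else linear interpolation."""
    exact = [t for f, t in points if f == fpr]
    if exact:
        return max(exact)
    for (f0, t0), (f1, t1) in zip(points, points[1:]):
        if f0 < fpr < f1:
            return t0 + (t1 - t0) * (fpr - f0) / (f1 - f0)
    raise ValueError("fpr must lie in [0, 1]")

def mean_roc(folds, grid_size=11):
    """Vertical averaging of per-fold ROC curves on a shared FPR grid."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    curves = [roc_points(scores, labels) for scores, labels in folds]
    return [(f, sum(tpr_at(c, f) for c in curves) / len(curves)) for f in grid]

folds = [([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]),   # perfectly ranked fold
         ([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])]   # one positive ranked below a negative
for fpr, tpr in mean_roc(folds):
    print(f"{fpr:.1f} {tpr:.3f}")
```

Averaging along the TPR axis at fixed FPR keeps the mean curve monotone; other schemes (for example threshold averaging) are equally legitimate and can give slightly different curves.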
Fig. 6 Analysis of the composition of models produced using recursive feature elimination: variable importance averaged across model folds and repetitions for models involving predictors drawn from (a) clinical data, (b) radiomics data (calculated plus semantic features) and (c) clinical and radiomic data
Fig. 7 Principal component plot of the imaging feature data for all patients, with data points colour-coded by MR scanner type. Larger symbols represent group centroids. The partial separation of scanner types obtained with this unsupervised method demonstrates the substantial extent to which data source acts as a confounding factor in radiomics studies of real-world data
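A plot like Fig. 7 needs three steps: standardise each radiomic feature, project the patients onto the first two principal components, and average the projected points per scanner to obtain the centroids. The stdlib-only sketch below finds the components by power iteration with deflation, which is adequate for a small, well-conditioned example but is only an illustrative substitute for a proper eigen-solver; the data, scanner labels and function names are hypothetical.

```python
import math

def standardise(rows):
    """Z-score each feature column; a zero-variance column is centred but not scaled."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) or 1.0
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def covariance(z):
    """Sample covariance matrix of already-centred data."""
    n, p = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)] for a in range(p)]

def top_components(cov, n_comp=2, iters=500):
    """Leading eigenvectors of a symmetric matrix by power iteration with deflation."""
    p = len(cov)
    c = [row[:] for row in cov]
    comps = []
    for _ in range(n_comp):
        v = [1.0 + 0.1 * j for j in range(p)]  # arbitrary, non-symmetric start vector
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * c[i][j] * v[j] for i in range(p) for j in range(p))
        comps.append(v)
        # Deflation: remove the found component so the next pass finds the next one.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return comps

def pca_scores_and_centroids(rows, scanners, n_comp=2):
    z = standardise(rows)
    comps = top_components(covariance(z), n_comp)
    scores = [[sum(r[j] * pc[j] for j in range(len(r))) for pc in comps] for r in z]
    centroids = {}
    for s in sorted(set(scanners)):
        pts = [sc for sc, lab in zip(scores, scanners) if lab == s]
        centroids[s] = [sum(pt[k] for pt in pts) / len(pts) for k in range(n_comp)]
    return scores, centroids

# Hypothetical data: two features shifted by scanner, one unrelated feature.
rows = [[1.0, 1.1, 0.0], [1.2, 0.9, 1.0], [0.8, 1.0, -1.0],
        [5.0, 5.2, 0.5], [5.1, 4.9, -0.5], [4.9, 5.0, 0.0]]
scanners = ["Aera"] * 3 + ["Achieva 3 T"] * 3
scores, centroids = pca_scores_and_centroids(rows, scanners)
print(centroids)
```

In this toy example the scanner offset dominates the first component, so the two centroids separate along PC1 even though no scanner label was used to compute the components.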
Results of classification modelling for the scanner-related target variables. All classifications used only the calculated radiomic features, as passed to the models of Table 5; the models had no direct access to either the raw pixel matrices or the DICOM header information
| Classification target | AUC |
|---|---|
| Siemens Avanto 1.5 T vs. Rest | 0.91 |
| Siemens Aera 1.5 T vs. Rest | 0.96 |
| Philips Achieva 3 T vs. Rest | 0.97 |
| Philips Achieva or Intera 1.5 T vs. Rest | 0.99 |
| Siemens Skyra 3 T vs. Rest | 0.97 |
| 1.5 T vs. 3 T | 0.95 |
| Siemens vs. Philips | 1.00 |
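Each row of the table is a one-vs-rest problem scored by AUC. The AUC of any scoring classifier can be computed without drawing the ROC curve, as the Mann-Whitney probability that a randomly chosen positive case is scored above a randomly chosen negative one, with ties counting one half. The sketch below shows this for a hypothetical "Aera vs. Rest" task; the classifier scores are made up and the function names are illustrative.

```python
def auc_mann_whitney(scores, labels):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))

def one_vs_rest_labels(scanners, target):
    """Binary labels for a 'target vs. Rest' task, e.g. one scanner model against all others."""
    return [1 if s == target else 0 for s in scanners]

scanners = ["Aera", "Aera", "Skyra", "Avanto"]
scores = [0.9, 0.4, 0.5, 0.1]  # hypothetical classifier outputs for "is Aera"
print(auc_mann_whitney(scores, one_vs_rest_labels(scanners, "Aera")))  # → 0.75
```

Three of the four positive/negative pairs are ranked correctly, giving 0.75; an AUC of 1.00, as for Siemens vs. Philips, means every such pair is ranked correctly.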
Fig. 8 Mean variable importance of the top 10 variables in a fitted Naïve Bayes model classifying images by the manufacturer of the scanner on which they were acquired, illustrating the degree to which scanner type is a confounding factor influencing the radiomics features
Results of survival modelling
| Feature groups considered | Number of subjects | Number of deaths | Prediction error (%) |
|---|---|---|---|
| Clinical | 156 | 16 | 19.3 |
| T2W radiomics | 151 | 16 | 51.8 |
| DCE radiomics | 154 | 16 | 40.1 |
| DW radiomics | 135 | 15 | 36.9 |
| Clinical + T2W radiomics | 151 | 16 | 22.7 |
| Clinical + DCE radiomics | 154 | 16 | 24.0 |
| Clinical + DW radiomics | 135 | 15 | 25.5 |
| Clinical + all radiomics | 131 | 15 | 30.4 |
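The table does not define "prediction error". One widely used definition, reported for example by random survival forest software, is 1 − C, where C is Harrell's concordance index; under that reading, values near 50% indicate risk ranking no better than chance. The sketch below assumes this definition, skips pairs with tied survival times for simplicity, and uses hypothetical names and toy data.

```python
def harrell_c(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an event and a strictly shorter
    time than subject j; it is concordant when i also has the higher predicted risk.
    Tied risks count 1/2. Pairs with tied times are skipped (a simplification).
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

def prediction_error_percent(times, events, risks):
    """Prediction error as 100 * (1 - C) (assumed definition)."""
    return 100.0 * (1.0 - harrell_c(times, events, risks))

times = [2, 4, 6, 8]
events = [1, 1, 0, 1]          # 0 = censored
risks = [0.9, 0.2, 0.3, 0.1]   # hypothetical model risk scores
print(round(prediction_error_percent(times, events, risks), 1))  # → 20.0
```

Of the five comparable pairs, only the one ranking the subject who died at time 4 below the censored subject at time 6 is discordant, so C = 0.8 and the error is 20%.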
Fig. 9 Kaplan-Meier plots for the survival data showing censoring events and the separation of strata by (a) nodal disease status (NDS), (b) tumour grade and (c) all combinations of nodal status and grade. Quoted p-values are for the null hypothesis that the survival curves for the given strata are the same. Almost all death events come from the group with both nodal involvement and tumour grade 3