Heather M. Whitney, Hui Li, Yu Ji, Peifang Liu, Maryellen L. Giger.
Abstract
Radiomic features extracted from medical images may demonstrate a batch effect when cases come from different sources. We investigated classification performance using training and independent test sets drawn from two sources, with both pre-harmonization and post-harmonization features. In this retrospective study, a database of thirty-two radiomic features, extracted from DCE-MR images of breast lesions after fuzzy c-means segmentation, was collected. There were 944 unique lesions in Database A (208 benign lesions, 736 cancers) and 1986 unique lesions in Database B (481 benign lesions, 1505 cancers). The lesions from each database were divided by year of image acquisition into training and independent test sets, separately by database and in combination. ComBat batch harmonization was conducted on the combined training set to minimize the batch effect on eligible features by database. The empirical Bayes estimates from the feature harmonization were applied to the eligible features of the combined independent test set. The training sets (A, B, and combined) were then used to train linear discriminant analysis classifiers after stepwise feature selection. The classifiers were then run on the A, B, and combined independent test sets. Classification performance using pre-harmonization features was compared to that using post-harmonization features, including their corresponding feature selection, evaluated using the area under the receiver operating characteristic curve (AUC) as the figure of merit. Four out of five training and independent test scenarios demonstrated statistically equivalent classification performance when compared pre- and post-harmonization. These results demonstrate that translation of machine learning techniques with batch data harmonization can potentially yield generalizable models that maintain classification performance.
Keywords: breast cancer; computer-aided diagnosis; harmonization; machine learning; magnetic resonance imaging; radiomics
Year: 2021 PMID: 34638294 PMCID: PMC8508003 DOI: 10.3390/cancers13194809
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.639
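The pipeline the abstract describes — estimate batch (database) effects on the training set, then apply the stored estimates to the independent test set — can be sketched as a simplified location–scale adjustment. This is a minimal stand-in, not the authors' implementation: full ComBat additionally shrinks the per-batch estimates with empirical Bayes, and the function names below are hypothetical.

```python
import numpy as np

def fit_batch_harmonizer(X, batch):
    """Fit per-feature, per-batch location/scale estimates on a training set.

    X: (n_samples, n_features) array; batch: array of batch labels (e.g., "A", "B").
    Simplified sketch: full ComBat also applies empirical Bayes shrinkage.
    Assumes every feature varies within every batch (nonzero within-batch std).
    """
    grand_mean = X.mean(axis=0)
    pooled_std = X.std(axis=0, ddof=1)
    Z = (X - grand_mean) / pooled_std            # standardize each feature
    estimates = {"grand_mean": grand_mean, "pooled_std": pooled_std, "batches": {}}
    for b in np.unique(batch):
        rows = Z[batch == b]
        estimates["batches"][b] = (rows.mean(axis=0), rows.std(axis=0, ddof=1))
    return estimates

def apply_batch_harmonizer(X, batch, estimates):
    """Apply training-set estimates to new data (e.g., an independent test set)."""
    Z = (X - estimates["grand_mean"]) / estimates["pooled_std"]
    out = np.empty_like(Z)
    for b in np.unique(batch):
        mu, sd = estimates["batches"][b]
        rows = batch == b
        out[rows] = (Z[rows] - mu) / sd          # remove batch location and scale
    return out * estimates["pooled_std"] + estimates["grand_mean"]
```

Fitting on the combined training set and reusing the same `estimates` on the test set mirrors the train/apply separation in Figure 1b.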
Figure 1. Illustration of two stages in an AI/CADx medical imaging workflow where harmonization can be applied. (a) Previous work investigated the impacts of batch harmonization in a single stage of all features in the two databases with covariates in the context of cross-validation (Whitney et al., Journal of Medical Imaging 2020 [5]). (b) This present work investigates both batch harmonization of features and harmonization of feature selection in a training and independent test framework (i.e., harmonization is conducted on the features of the lesions in the training set and separately applied to the features of the lesions in the test set), with no covariates.
Table 1. Description of the dataset: number of lesions, age of subjects, and size of lesions, by database. The size of lesions is given as maximum linear size (extracted radiomic feature S4, see Table 2 below) (A: Database A; B: Database B; CI: confidence interval; min: minimum; max: maximum; tr: training; te: test).
| Database | Measure | Training: Benign | Training: Cancer | Test: Benign | Test: Cancer |
|---|---|---|---|---|---|
| A + B | Number (% of set) | 554 (24%) | 1726 (76%) | 183 (22%) | 660 (78%) |
| A + B | Age in years (median [95% CI]) | 44 [23, 70] | 49 [31, 77] | 44 [22, 67] | 50 [30, 73] |
| A + B | Size in mm (median [95% CI]) | 18.1 [5.6, 66.7] | 28.6 [10.6, 98.0] | 16.5 [5.7, 60.2] | 28.3 [11.0, 95.2] |
| A | Number (% of set) | 184 (22%) | 646 (78%) | 72 (23%) | 235 (77%) |
| A | Age in years (median [95% CI]) | 49 [25, 74] | 56 [34, 82] | 47 [27, 67] | 52 [30, 74] |
| A | Size in mm (median [95% CI]) | 12.9 [5.3, 55.8] | 29.5 [8.3, 105.5] | 12.7 [4.7, 55.2] | 35.0 [9.7, 115.2] |
| B | Number (% of set) | 370 (26%) | 1080 (74%) | 111 (21%) | 425 (79%) |
| B | Age in years (median [95% CI]) | 43 [21.5, 62.5] | 47 [30, 70] | 43 [21, 59.2] | 48 [30, 68] |
| B | Size in mm (median [95% CI]) | 20.9 [5.9, 70.1] | 28.2 [12.0, 90.5] | 17.7 [7.2, 63.2] | 27.3 [11.8, 85.5] |
In Database A, the ages of 36 subjects with benign lesions and 63 subjects with cancers in the training set, and 10 subjects with benign lesions and 26 subjects with cancers in the test set, were unknown.
Figure 2. Description of the dataset: fraction of types of benign lesions (one lesion per case) by database (A: Database A; B: Database B; tr: training; te: test).
Figure 3. Description of the dataset: fraction of types of cancers (one lesion per case) by database (A: Database A; B: Database B; tr: training; te: test; DCIS: ductal carcinoma in situ; IDC: invasive ductal carcinoma).
Figure 4. Framework for independent training and test between each set. Each arrow originates in a training set and terminates in an independent test set. Years of image acquisition are indicated in parentheses.
Table 2. Description of radiomic features.

| ID | Feature | Description |
|---|---|---|
| M1 | Margin sharpness | Mean of the image gradient at the lesion margin |
| M2 | Variance of margin sharpness | Variance of the image gradient at the lesion margin |
| M3 | Variance of radial gradient histogram | Degree to which the enhancement structure extends in a radial pattern originating from the center of the lesion |
| T1 | Contrast | Local image variations |
| T2 | Correlation | Image linearity |
| T3 | Difference entropy | Randomness of the difference of neighboring voxels' gray-levels |
| T4 | Difference variance | Variations of difference of gray-levels between voxel-pairs |
| T5 | Energy | Image homogeneity |
| T6 | Entropy | Randomness of the gray-levels |
| T7 | Inverse difference moment (homogeneity) | Image homogeneity |
| T8 | Information measure of correlation 1 | Nonlinear gray-level dependence |
| T9 | Information measure of correlation 2 | Nonlinear gray-level dependence |
| T10 | Maximum correlation coefficient | Nonlinear gray-level dependence |
| T11 | Sum average | Overall brightness |
| T12 | Sum entropy | Randomness of the sum of gray-levels of neighboring voxels |
| T13 | Sum variance | Spread in the sum of the gray-levels of voxel-pairs distribution |
| T14 | Sum of squares (variance) | Spread in the gray-level distribution |
| K1 | Maximum enhancement | Maximum contrast enhancement |
| K2 | Time to peak (s) | Time at which the maximum enhancement occurs |
| K3 | Uptake rate (1/s) | Uptake speed of the contrast enhancement |
| K6 | Enhancement at first postcontrast time point | Enhancement at first post-contrast time point |
| K7 | Signal enhancement ratio | Ratio of initial enhancement to overall enhancement |
| S1 | Volume (mm3) | Volume of lesion |
| S2 | Effective diameter (mm) | Greatest dimension of a sphere with the same volume as the lesion |
| S3 | Surface area (mm2) | Lesion surface area |
| S4 | Maximum linear size (mm) | Maximum distance between any 2 voxels in the lesion |
| G1 | Sphericity | Similarity of the lesion shape to a sphere |
| G2 | Irregularity | Deviation of the lesion surface from the surface of a sphere |
| G3 | Surface area/volume (1/mm) | Ratio of surface area to volume |
| K4 | Washout rate (1/s) | Washout speed of the contrast enhancement |
| K5 | Curve shape index | Difference between late and early enhancement |
| K8 | Volume of most enhancing voxels (mm3) | Volume of the most enhancing voxels |
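Several of the texture features above (T1 contrast, T5 energy, T6 entropy, T7 homogeneity) are classic Haralick statistics of a gray-level co-occurrence matrix (GLCM). As a hypothetical illustration only — the study computed texture on 3-D DCE-MR volumes, whereas this sketch uses a 2-D image and a single horizontal offset — the computation looks like this:

```python
import numpy as np

def glcm(image, levels):
    """Normalized, symmetric co-occurrence matrix for the horizontal (0, 1) offset.

    image: 2-D integer array with gray levels in [0, levels).
    """
    P = np.zeros((levels, levels))
    a = image[:, :-1].ravel()        # left voxel of each horizontal pair
    b = image[:, 1:].ravel()         # right voxel of each horizontal pair
    np.add.at(P, (a, b), 1)
    P += P.T                         # symmetrize, as is conventional
    return P / P.sum()

def haralick_features(P):
    """Contrast (T1), energy (T5), entropy (T6), homogeneity (T7) from a GLCM."""
    i, j = np.indices(P.shape)
    nz = P[P > 0]
    return {
        "contrast": np.sum(P * (i - j) ** 2),             # local gray-level variation
        "energy": np.sum(P ** 2),                         # uniformity of the GLCM
        "entropy": -np.sum(nz * np.log2(nz)),             # randomness of gray levels
        "homogeneity": np.sum(P / (1.0 + (i - j) ** 2)),  # inverse difference moment
    }
```

A perfectly uniform region gives contrast 0 and energy 1, while a checkerboard maximizes contrast for its gray-level range, which matches the "local image variations" reading of T1.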
Figure 5. Feature value harmonization visualized through t-SNE presentations of pre-harmonization (top row) and post-harmonization (bottom row) by lesion type (benign lesion or cancer) and feature set (training or test set). Harmonization was conducted on the combination of benign lesions and cancers in the training set without covariate of lesion type and applied to lesions in the test set without covariate of lesion type. Results shown here are separated out by lesion type (cancer or benign) to aid in visualization only (A: Database A; B: Database B).
Figure 6. Feature selection by feature sets used in the study. Top: pre-harmonization features; bottom: post-harmonization features. Abbreviations in the figure correspond to the features listed in Table 2. Squares outlined with a dashed line indicate those for which harmonization was conducted.
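The abstract describes stepwise feature selection followed by linear discriminant analysis. A greedy forward-selection sketch, using a two-class Fisher discriminant and training-set AUC as the selection criterion, is shown below; the authors' exact stepwise criterion is not given in this excerpt, so the stopping rule and helper names here are assumptions.

```python
import numpy as np

def auc(scores, labels):
    """AUC via the Mann-Whitney pairwise comparison; labels are 0/1."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def fisher_direction(X, y, ridge=1e-6):
    """Two-class Fisher/LDA projection direction, with a small ridge for stability."""
    X0, X1 = X[y == 0], X[y == 1]
    Sw = np.cov(X0, rowvar=False) * (len(X0) - 1) + np.cov(X1, rowvar=False) * (len(X1) - 1)
    Sw = np.atleast_2d(Sw) / (len(X) - 2) + ridge * np.eye(X.shape[1])
    return np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))

def stepwise_select(X, y, max_features=5):
    """Greedily add the feature that most improves training AUC; stop when flat."""
    selected, best_auc = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        trials = []
        for f in remaining:
            cols = selected + [f]
            w = fisher_direction(X[:, cols], y)
            trials.append((auc(X[:, cols] @ w, y), f))
        a, f = max(trials)
        if a <= best_auc + 1e-4:       # assumed stopping rule: no meaningful gain
            break
        selected.append(f)
        remaining.remove(f)
        best_auc = a
    return selected, best_auc
```

In the study, selection was run separately on pre- and post-harmonization training features, which is why the selected feature sets in Figure 6 can differ between the two rows.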
Figure 7. Classification performance results after batch feature harmonization and feature selection harmonization: Receiver operating characteristic (ROC) curves for training and testing between each database in the task of classification of lesions as malignant or benign, using a database-specific independent test set determined by date of image acquisition. Solid lines show the ROC curve when using pre-harmonization features, while dashed lines show the ROC curve when using post-harmonization features. AUC values are given in Table 3 (A: Database A; B: Database B).
Table 3. Difference in area under the receiver operating characteristic curve (ΔAUC), p-value for comparison, and, when ΔAUC fails to show statistically significant difference, equivalence margin for equivalence and non-inferiority, when using combinations of separate training and independent test sets. The difference in AUC is determined as ΔAUC = AUC(post-harmonization) − AUC(pre-harmonization). An asterisk (*) indicates statistically significant difference (including after adjusting the criteria using the Bonferroni correction for significance due to multiple comparisons, when appropriate) (A: Database A; B: Database B; tr: training set; te: test set).
| Training Set | Independent Test Set | p-Value | Equivalence Margin (ΔAUC) |
|---|---|---|---|
| Atr | Bte | <0.0001 * | n/a |
| Btr | Ate | 0.5 | 0.058 |
| (A + B)tr | Ate | 0.9 | 0.019 |
| (A + B)tr | Bte | 0.17 | 0.020 |
| (A + B)tr | (A + B)te | 0.4 | 0.014 |
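The equivalence logic behind Table 3 — declare post-harmonization performance equivalent when the interval for ΔAUC = AUC(post) − AUC(pre) lies inside a pre-set margin — can be sketched with a bootstrap confidence interval on paired classifier scores. The paper's actual statistical test is not reproduced here; this is a generic, hypothetical TOST-style check, and all names below are illustrative.

```python
import numpy as np

def auc(scores, labels):
    """AUC via the Mann-Whitney pairwise comparison; labels are 0/1."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def delta_auc_equivalent(scores_pre, scores_post, labels, margin,
                         n_boot=2000, alpha=0.05, seed=0):
    """Bootstrap CI for dAUC = AUC_post - AUC_pre on the same cases.

    Declares equivalence when the (1 - alpha) CI lies strictly within (-margin, margin).
    """
    rng = np.random.default_rng(seed)
    n = len(labels)
    deltas = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)          # resample cases, keeping scores paired
        if labels[idx].min() == labels[idx].max():
            deltas[b] = 0.0                  # degenerate one-class resample (rare)
            continue
        deltas[b] = auc(scores_post[idx], labels[idx]) - auc(scores_pre[idx], labels[idx])
    lo, hi = np.quantile(deltas, [alpha / 2, 1 - alpha / 2])
    return (-margin < lo) and (hi < margin), (lo, hi)
```

Resampling cases (rather than scores independently) preserves the pairing between the pre- and post-harmonization classifiers, which is what makes ΔAUC the right quantity to interval-estimate.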
Figure 8. Classification performance results after batch feature harmonization and feature selection harmonization: Receiver operating characteristic (ROC) curves for training and testing using a training set composed of lesions from both databases. Solid lines show the ROC curve when using pre-harmonization features, while dashed lines show the ROC curve when using post-harmonization features. AUC values are given in Table 3 (A: Database A; B: Database B).