Daniel F Schmidt, Enes Makalic, Benjamin Goudey, Gillian S Dite, Jennifer Stone, Tuong L Nguyen, James G Dowty, Laura Baglietto, Melissa C Southey, Gertraud Maskarinec, Graham G Giles, John L Hopper.
Abstract
BACKGROUND: We applied machine learning to find a novel breast cancer predictor based on information in a mammogram.
Year: 2018 PMID: 31360877 PMCID: PMC6649799 DOI: 10.1093/jncics/pky057
Source DB: PubMed Journal: JNCI Cancer Spectr ISSN: 2515-5091
Marginal odds ratio (OR) per unadjusted standard deviation, with 95% confidence interval (CI) and P value, for each feature from each study
| Feature | Caucasian cohort (Australia): OR (95% CI) | P | Caucasian case–control (Australia): OR (95% CI) | P | Japanese cohort (Hawaii): OR (95% CI) | P |
|---|---|---|---|---|---|---|
| Autocorrelation | 0.89 (0.82 to 0.96) | .004 | 1.14 (1.01 to 1.29) | .03 | 1.18 (0.98 to 1.43) | .07 |
| Cluster prominence | 1.11 (1.02 to 1.20) | .01 | 1.05 (0.93 to 1.19) | .4 | 0.92 (0.77 to 1.11) | .4 |
| Cluster shade | 1.13 (1.04 to 1.22) | .003 | 0.94 (0.83 to 1.07) | .3 | 0.87 (0.73 to 1.05) | .1 |
| Contrast | 0.57 (0.52 to 0.63) | 10⁻²⁸ | 0.60 (0.50 to 0.72) | 3 × 10⁻⁸ | 0.66 (0.53 to 0.81) | 6 × 10⁻⁵ |
| Correlation | 1.75 (1.59 to 1.94) | 10⁻²⁸ | 1.68 (1.40 to 2.01) | 2 × 10⁻⁸ | 1.56 (1.26 to 1.92) | 4 × 10⁻⁵ |
| Difference entropy | 0.58 (0.54 to 0.64) | 10⁻³¹ | 0.60 (0.52 to 0.70) | 10⁻¹¹ | 0.67 (0.55 to 0.81) | 4 × 10⁻⁵ |
| Difference variance | 0.57 (0.52 to 0.63) | 10⁻²⁸ | 0.60 (0.50 to 0.72) | 3 × 10⁻⁸ | 0.66 (0.53 to 0.81) | 6 × 10⁻⁵ |
| Dissimilarity | 0.58 (0.53 to 0.64) | 10⁻²⁹ | 0.61 (0.52 to 0.71) | 10⁻¹⁰ | 0.66 (0.54 to 0.81) | 4 × 10⁻⁵ |
| Energy | 1.18 (1.09 to 1.27) | 4 × 10⁻⁵ | 1.01 (0.90 to 1.15) | .8 | 0.90 (0.75 to 1.08) | .3 |
| Entropy | 0.69 (0.63 to 0.75) | 10⁻¹⁸ | 0.72 (0.63 to 0.82) | 9 × 10⁻⁷ | 0.82 (0.68 to 0.99) | .04 |
| Homogeneity | 1.71 (1.56 to 1.88) | 10⁻³⁰ | 1.64 (1.41 to 1.90) | 10⁻¹⁰ | 1.51 (1.24 to 1.84) | 4 × 10⁻⁵ |
| Information correlation 1 | 0.58 (0.53 to 0.63) | 10⁻³² | 0.58 (0.51 to 0.67) | 10⁻¹³ | 0.63 (0.52 to 0.77) | 4 × 10⁻⁶ |
| Information correlation 2 | 1.80 (1.64 to 1.98) | 10⁻³³ | 1.86 (1.58 to 2.19) | 10⁻¹³ | 1.80 (1.47 to 2.20) | 1 × 10⁻⁸ |
| Maximum probability | 1.09 (1.00 to 1.17) | .04 | 0.91 (0.81 to 1.03) | .2 | 0.84 (0.70 to 1.01) | .06 |
| Moment normalized inverse difference | 1.75 (1.58 to 1.93) | 10⁻²⁸ | 1.67 (1.40 to 1.99) | 1 × 10⁻⁸ | 1.53 (1.24 to 1.87) | 5 × 10⁻⁵ |
| Normalized inverse difference | 1.72 (1.56 to 1.89) | 10⁻²⁹ | 1.64 (1.41 to 1.92) | 10⁻¹⁰ | 1.51 (1.24 to 1.84) | 4 × 10⁻⁵ |
| Sum average | 0.89 (0.82 to 0.97) | .005 | 1.11 (0.99 to 1.26) | .08 | 1.18 (0.98 to 1.41) | .09 |
| Sum variance | 0.94 (0.87 to 1.01) | .1 | 1.28 (1.14 to 1.45) | 6 × 10⁻⁵ | 1.28 (1.06 to 1.54) | .009 |
| Sum entropy | 0.75 (0.69 to 0.81) | 10⁻¹² | 0.83 (0.73 to 0.93) | .002 | 0.93 (0.77 to 1.11) | .4 |
| Variance | 0.88 (0.81 to 0.95) | .002 | 1.12 (0.99 to 1.27) | .07 | 1.17 (0.97 to 1.41) | .1 |
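
To make the link between the odds ratios, confidence intervals, and P values in the table above concrete, the following minimal sketch recovers an approximate two-sided P value from a reported OR and its 95% CI. It assumes a Wald test on the log-odds scale, which the source does not explicitly confirm, and the function name `wald_p_from_ci` is hypothetical. Because the inputs are rounded to two decimals, the result is only approximate.

```python
import math

def wald_p_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate Wald z statistic and two-sided P value from an OR and its 95% CI.

    Assumption (not stated in the source): the CI is symmetric on the
    log-odds scale, i.e. log(OR) +/- z_crit * SE.
    """
    log_or = math.log(odds_ratio)
    # Standard error recovered from the width of the CI on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided P value for a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Autocorrelation in the Caucasian cohort (Australia): 0.89 (0.82 to 0.96).
z, p = wald_p_from_ci(0.89, 0.82, 0.96)
print(f"z = {z:.2f}, P = {p:.3f}")
```

For the autocorrelation row this gives a P value of roughly .004, in line with the tabulated value.
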
Odds ratio per standard deviation after adjusting for age and BMI (95% confidence intervals) for the Cirrus measures trained on one dataset and tested on the same (diagonal) or another (off-diagonal) dataset
| Training dataset | Tested on Caucasian cohort (Australia) | Tested on Caucasian case–control (Australia) | Tested on Japanese American cohort (Hawaii) |
|---|---|---|---|
| Caucasian cohort (Australia) | 1.83 (1.65 to 2.03) | 1.60 (1.41 to 1.82) | 1.78 (1.46 to 2.17) |
| Caucasian case–control (Australia) | 1.56 (1.43 to 1.72) | 1.72 (1.52 to 1.95) | 1.75 (1.44 to 2.12) |
| Japanese American cohort (Hawaii) | 1.58 (1.43 to 1.75) | 1.61 (1.40 to 1.86) | 1.92 (1.57 to 2.36) |
Marginal odds ratio (OR) per standard deviation after adjusting for age and body mass index, with 95% confidence interval (CI) and P value, for each feature from analysis of the combined dataset
| Feature | OR (95% CI) | P |
|---|---|---|
| Autocorrelation | 1.01 (0.94 to 1.08) | .9 |
| Cluster prominence | 1.06 (0.99 to 1.14) | .09 |
| Cluster shade | 1.03 (0.96 to 1.10) | .5 |
| Contrast | 0.55 (0.49 to 0.62) | 10⁻²² |
| Correlation | 1.82 (1.62 to 2.05) | 10⁻²³ |
| Difference entropy | 0.57 (0.52 to 0.63) | 10⁻³² |
| Difference variance | 0.55 (0.49 to 0.62) | 10⁻²² |
| Dissimilarity | 0.57 (0.51 to 0.63) | 10⁻²⁸ |
| Energy | 1.08 (1.01 to 1.16) | .03 |
| Entropy | 0.70 (0.65 to 0.76) | 10⁻¹⁷ |
| Homogeneity | 1.75 (1.59 to 1.93) | 10⁻²⁹ |
| Information correlation 1 | 0.55 (0.50 to 0.60) | 10⁻³⁵ |
| Information correlation 2 | 1.97 (1.77 to 2.20) | 10⁻³³ |
| Maximum probability | 0.99 (0.92 to 1.07) | .9 |
| Moment normalized inverse difference | 1.81 (1.61 to 2.03) | 10⁻²³ |
| Normalized inverse difference | 1.76 (1.59 to 1.94) | 10⁻²⁸ |
| Sum average | 1.00 (0.93 to 1.07) | 1.0 |
| Sum entropy | 0.79 (0.74 to 0.85) | 10⁻⁹ |
| Sum variance | 1.09 (1.01 to 1.17) | .02 |
| Variance | 0.99 (0.92 to 1.07) | .8 |
Figure 1. Correlations between the 20 textural features.
The skewness and excess kurtosis of each of the 20 features in the combined dataset, along with the posterior standard deviation (SD) and standardized weight used to create the final Cirrus risk measure
| Feature | Skewness | Excess kurtosis | Posterior SD | Standardized weight |
|---|---|---|---|---|
| Autocorrelation | 0.34 | 0.40 | 1.651 | 0.218 |
| Cluster prominence | −0.26 | 0.33 | 0.008957 | 1.630 |
| Cluster shade | −0.34 | 0.21 | 0.07228 | 0.267 |
| Contrast | 3.17 | 19.5 | 26.020 | 0.397 |
| Correlation | −3.18 | 19.6 | 235.51 | −0.321 |
| Difference entropy | 1.14 | 2.37 | 9.0306 | −2.156 |
| Difference variance | 3.17 | 19.5 | 25.889 | 0.399 |
| Dissimilarity | 1.79 | 5.77 | 42.727 | 0.429 |
| Energy | 0.03 | 0.30 | 35.733 | 0.675 |
| Entropy | 0.60 | 0.63 | 17.761 | −1.452 |
| Homogeneity | −1.51 | 3.88 | 91.983 | −0.983 |
| Information correlation 1 | 1.09 | 2.03 | 35.328 | −1.338 |
| Information correlation 2 | −2.17 | 8.14 | 136.29 | −1.193 |
| Maximum probability | −0.33 | 0.48 | 22.878 | −1.473 |
| Moment normalized inverse difference | −2.96 | 16.9 | 1976.2 | 1.096 |
| Normalized inverse difference | −1.65 | 4.81 | 392.33 | −0.600 |
| Sum average | 0.33 | 0.39 | 5.2236 | −0.228 |
| Sum entropy | 0.19 | 0.002 | 19.135 | 1.919 |
| Sum variance | 0.26 | 0.46 | 0.6445 | 0.664 |
| Variance | 0.34 | 0.38 | 1.5106 | −0.795 |
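
As a rough illustration of how standardized weights such as those in the table above could be combined into a single score, the sketch below forms a weighted sum of standardized feature values. This is an assumption: the source only says the weights were "used to create the final Cirrus risk measure" and does not give the exact formula or any preprocessing. The three weights are copied from the table; the standardized feature values in `z_values` are hypothetical and serve only to show the arithmetic.

```python
# Standardized weights for three of the 20 features, taken from the table.
weights = {
    "Contrast": 0.397,
    "Entropy": -1.452,
    "Homogeneity": -0.983,
}

# Hypothetical standardized (mean 0, SD 1) feature values for one mammogram.
z_values = {
    "Contrast": 0.5,
    "Entropy": -1.0,
    "Homogeneity": 0.2,
}

def linear_score(weights, z_values):
    """Weighted sum of standardized features (assumed linear form)."""
    return sum(weights[name] * z_values[name] for name in weights)

score = linear_score(weights, z_values)
print(f"partial score from three features: {score:.4f}")
```

A full score under this assumed form would sum over all 20 features; only three are used here to keep the example short.
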
Figure 2. Distribution of the Cirrus measure created on the combined studies, adjusted for age and BMI, for cases (gray line) and controls (dark line). Risk increases with increasing Cirrus. The difference in the mean between cases and controls is equal to the log of the OR per standard deviation, which in turn is linearly related to the area under the receiver operator curve (AUC) in the range of 0.5 to 0.7; see theory and references in the Supplementary Methods (available online). See also Figure 3, which shows the corresponding receiver operator curve.
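
The relationship between the OR per standard deviation and the AUC described in the Figure 2 caption can be sketched numerically. The example below assumes the standard equal-variance binormal model, in which cases and controls are both normally distributed with unit standard deviation and means that differ by d = log(OR per SD), giving AUC = Φ(d/√2). This is a textbook result and may differ from the derivation in the Supplementary Methods; the function name `auc_from_or_per_sd` is hypothetical.

```python
import math

def auc_from_or_per_sd(or_per_sd):
    """AUC under an assumed equal-variance binormal model.

    d = log(OR per SD) is the case-control mean difference in SD units,
    and AUC = Phi(d / sqrt(2)), where Phi is the standard normal CDF.
    """
    d = math.log(or_per_sd)
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), so Phi(d / sqrt(2)) = 0.5 * (1 + erf(d / 2)).
    return 0.5 * (1.0 + math.erf(d / 2.0))

# OR per SD values spanning the range discussed in the caption.
for or_per_sd in (1.0, 1.38, 1.90):
    print(f"OR per SD = {or_per_sd:.2f} -> AUC = {auc_from_or_per_sd(or_per_sd):.3f}")
```

Under these idealized assumptions an OR per SD of 1.90 maps to an AUC slightly below 0.68, close to but not identical to the 0.662 reported in Figure 3. An empirical AUC would not be expected to match an idealized model exactly.
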
Odds ratio (OR) adjusted for age and body mass index (BMI) for the mammography-based risk measures, age- and BMI-adjusted Cirrus, absolute and percentage mammographic density, and log BMI, fitted alone and in combination, from analysis of the combined dataset
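| Model | Feature | OR (95% CI) | P |
|---|---|---|---|
| Fitted alone | Cirrus | 1.90 (1.73 to 2.09) | 10⁻³⁸ |
| Fitted alone | Absolute mammographic density | 1.34 (1.25 to 1.43) | 10⁻¹⁷ |
| Fitted alone | Percentage mammographic density | 1.38 (1.29 to 1.48) | 10⁻²⁰ |
| Fitted alone | Log BMI | 1.07 (0.99 to 1.15) | .07 |
| Cirrus + absolute density + log BMI | Cirrus | 1.76 (1.59 to 1.95) | 10⁻²⁷ |
| Cirrus + absolute density + log BMI | Absolute mammographic density | 1.16 (1.08 to 1.24) | 4 × 10⁻⁵ |
| Cirrus + absolute density + log BMI | Log BMI | 1.04 (0.96 to 1.12) | .3 |
| Cirrus + percentage density + log BMI | Cirrus | 1.74 (1.57 to 1.93) | 10⁻²⁵ |
| Cirrus + percentage density + log BMI | Percentage mammographic density | 1.16 (1.07 to 1.25) | 2 × 10⁻⁴ |
| Cirrus + percentage density + log BMI | Log BMI | 1.04 (0.97 to 1.12) | .3 |
| Cirrus + both densities + log BMI | Cirrus | 1.74 (1.56 to 1.93) | 10⁻²⁴ |
| Cirrus + both densities + log BMI | Absolute mammographic density | 1.11 (1.00 to 1.23) | .04 |
| Cirrus + both densities + log BMI | Percentage mammographic density | 1.06 (0.95 to 1.19) | .3 |
| Cirrus + both densities + log BMI | Log BMI | 1.04 (0.97 to 1.12) | .3 |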
Figure 3. Receiver operator curves based on fitting: Cirrus created on the combined studies (continuous line); homogeneity alone (dashed line); and percentage mammographic density (dotted line). The corresponding areas under the receiver operator curves are 0.662, 0.642, and 0.620, respectively.