Lothar Häberle, Florian Wagner, Peter A Fasching, Sebastian M Jud, Katharina Heusinger, Christian R Loehberg, Alexander Hein, Christian M Bayer, Carolin C Hack, Michael P Lux, Katja Binder, Matthias Elter, Christian Münzenmayer, Rüdiger Schulz-Wendtland, Martina Meier-Meitinger, Boris R Adamietz, Michael Uder, Matthias W Beckmann, Thomas Wittenberg.
Abstract
INTRODUCTION: Although mammographic density is an established risk factor for breast cancer, its use is limited in clinical practice because of a lack of automated and standardized measurement methods. The aims of this study were to evaluate a variety of automated texture features in mammograms as risk factors for breast cancer and to compare them with the percentage mammographic density (PMD) by using a case-control study design.
Year: 2012 PMID: 22490545 PMCID: PMC3446394 DOI: 10.1186/bcr3163
Source DB: PubMed Journal: Breast Cancer Res ISSN: 1465-5411 Impact factor: 6.466
Characteristics of the study population relative to case and control status
| Characteristic | Cases | Controls |
|---|---|---|
| Age at mammogram (years) | 57.5 (± 10.8) | 57.3 (± 10.6) |
| BMI (kg/m²) | 26.1 (± 5.0) | 24.6 (± 3.8) |
| Age at last menstruation (years) | 48.7 (± 5.5) | 47.5 (± 6.6) |
| Age at menarche (years) | 13.5 (± 1.6) | 13.4 (± 1.4) |
| Age at FTP (years) | 25.2 (± 4.6) | 25.6 (± 4.4) |
| Menopausal status | ||
| Premenopausal | 221 (30.9%) | 105 (26.6%) |
| Postmenopausal | 495 (69.1%) | 290 (73.4%) |
| Parity | ||
| No birth | 125 (15.5%) | 54 (14.3%) |
| 1 to 2 births | 507 (62.7%) | 202 (53.4%) |
| ≥ 3 births | 177 (21.9%) | 122 (32.3%) |
| Family history of breast cancer | ||
| No | 700 (85.6%) | 237 (80.3%) |
| Yes | 118 (14.4%) | 58 (19.7%) |
| HRT ever | ||
| No | 564 (70.2%) | 156 (42.5%) |
| Yes | 239 (29.8%) | 211 (57.5%) |
For the cases, age is identical with age at diagnosis. BMI, body mass index; FTP, first term pregnancy; HRT, hormone replacement therapy.
The process of variable selection
| Feature group | Total | After preselection (a) | Features used for feature group scores (b) | Features used for final score (c) |
|---|---|---|---|---|
| Moment-based features | 76 | 71 | 8 | 1 |
| Form-based features | 86 | 74 | 16 | 4 |
| Statistical features | 130 | 86 | 46 | 29 |
| Structural features | 108 | 90 | 23 | 10 |
| Spectral features | 70 | 21 | 6 | 2 |
| Total | 470 | 342 | 99 | 46 |
Numbers of features are shown. a, Preselection due to high correlations between features. b, Feature group scores were constructed with these features. c, The final score was constructed with these features.
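The preselection step in footnote a removes features that are highly correlated with other features. The record does not state the exact rule or cutoff, so the following is only a minimal sketch under stated assumptions: a greedy filter on absolute Pearson correlation, with a hypothetical threshold of 0.95 and hypothetical helper names `pearson` and `preselect`.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def preselect(features, threshold=0.95):
    """Greedy filter: keep a feature only if its absolute correlation with
    every feature kept so far is below `threshold`.

    `features` maps a feature name to its list of values. The threshold is an
    assumption for illustration, not a value taken from the study.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data (invented): f2 is almost a multiple of f1, f3 is only weakly related.
toy = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.1, 4.0, 6.2, 8.1, 9.9],
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = preselect(toy)
```

With the toy data, `f2` is dropped as redundant with `f1`, while `f3` is retained. A greedy filter depends on feature order; other rules (for example, clustering correlated features) would be equally plausible readings of the footnote.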
Simple and multiple logistic regression models to measure the predictive power of percentage mammographic density (PMD) and selected features within and across the five texture feature groups, via feature group scores and the final feature score, respectively (a)
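| Texture features included | Training data set: AUC | Validation data set, unadjusted: AUC | Validation data set, unadjusted: OR (95% CI) | Validation data set, adjusted for age and BMI: AUC | Validation data set, adjusted for age and BMI: OR (95% CI) | Validation data set, adjusted for age, BMI, parity, family history, and age at FTP: AUC | Validation data set, adjusted for age, BMI, parity, family history, and age at FTP: OR (95% CI) |
|---|---|---|---|---|---|---|---|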
| None (b) | - | - | - | 0.60 | - | 0.65 | - |
| PMD | 0.53 | 0.51 | 1.05 (0.89-1.23) | 0.61 | 1.24 (1.00-1.55) | 0.66 | 1.19 (0.93-1.53) |
| Moment-based features | 0.66 | 0.58 | 1.46 (1.22-1.73) | 0.62 | 1.43 (1.19-1.72) | 0.67 | 1.41 (1.14-1.75) |
| Form-based features | 0.67 | 0.59 | 1.47 (1.23-1.74) | 0.64 | 1.44 (1.20-1.74) | 0.67 | 1.49 (1.21-1.84) |
| Statistical features | 0.82 | 0.72 | 2.40 (1.98-2.90) | 0.73 | 2.28 (1.87-2.78) | 0.74 | 2.36 (1.88-2.96) |
| Structural features | 0.77 | 0.65 | 1.64 (1.38-1.95) | 0.68 | 1.60 (1.34-1.92) | 0.71 | 1.70 (1.39-2.08) |
| Spectral features | 0.71 | 0.65 | 1.67 (1.40-1.99) | 0.67 | 1.57 (1.30-1.90) | 0.68 | 1.60 (1.29-1.98) |
| Selected features across all feature groups (final model) | 0.85 | 0.75 | 2.65 (2.18-3.21) | 0.75 | 2.55 (2.08-3.11) | 0.79 | 2.88 (2.28-3.65) |
| Selected features across all feature groups + PMD | 0.85 | 0.75 | 2.63 (2.17-3.18) | 0.75 | 2.52 (2.06-3.08) | 0.79 | 2.86 (2.26-3.62) |
The area under the curve (AUC) of the regression models and the odds ratio (OR) per standard-deviation (SD) change for the feature scores with 95% confidence intervals are shown. Features were selected as described in the Patients and Methods sections.
AUC, area under the curve; BMI, body mass index; CI, confidence interval; FTP, first term pregnancy; OR, odds ratio; PMD, percentage mammographic density; SD, standard deviation. a, In the training data, each logistic regression model used selected features as independent variables; in the validation data, the logistic regression models used the feature group scores and the final feature score, respectively, as the independent variable. Adjusted analyses included regular risk factors as additional independent variables. b, Prediction only with regular risk factors.
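The two quantities reported in the table above, the AUC and the OR per SD change, can be illustrated with a short sketch. This is not the authors' analysis code: the function names are hypothetical, the inputs are invented toy values, and the 95% CI uses the usual normal approximation (z = 1.96) on the log-odds scale, which is an assumption.

```python
from math import exp


def auc(scores_cases, scores_controls):
    """AUC as the probability that a randomly chosen case has a higher score
    than a randomly chosen control (ties count one half)."""
    wins = 0.0
    for c in scores_cases:
        for k in scores_controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_controls))


def odds_ratio_per_sd(beta, sd):
    """Odds ratio for a one-SD increase, given the logistic regression
    coefficient `beta` per raw unit and the SD of the predictor."""
    return exp(beta * sd)


def odds_ratio_ci(beta, se, sd, z=1.96):
    """Wald-type confidence interval for the OR per SD (normal approximation)."""
    return exp((beta - z * se) * sd), exp((beta + z * se) * sd)


# Toy scores (invented): cases tend to score higher than controls.
toy_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])

# Toy coefficient (invented): beta = 0.5 per unit, SD = 2 units.
toy_or = odds_ratio_per_sd(0.5, 2.0)
toy_low, toy_high = odds_ratio_ci(0.5, 0.1, 2.0)
```

In the toy example, eight of the nine case-control pairs are ordered correctly, so the AUC is 8/9. An AUC of 0.5 corresponds to no discrimination, and an OR per SD above 1 means that risk increases with the score.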
Figure 1. Histogram of the final feature score, based on the 46 finally selected features, applied to the validation data set.
Figure 2. Histogram of the percentage mammographic density (PMD) in the validation data set.
Figure 3. Finally selected features. The strength of risk prediction within the final logistic regression model is shown on the x-axis (absolute value of the log odds ratio per standard deviation), and the feature's Spearman correlation with percentage mammographic density (PMD) is shown on the y-axis. +, The texture feature and PMD have the same direction with regard to their association with risk (that is, either positive log OR and positive correlation with PMD or negative log OR and negative correlation with PMD). •, The texture feature and PMD have opposite directions with regard to their association with risk (that is, either positive log OR and negative correlation with PMD or negative log OR and positive correlation with PMD). Gray symbols, the feature is selected in fewer than 90% of the bootstrap samples. Black symbols, the feature is selected in more than 90% of the bootstrap samples. The dashed line circumscribes a cluster of second-order statistical features, and the continuous gray line circumscribes a cluster of first-order statistical features. "Static histogram" refers to features describing the relative frequency of gray-level values according to a given interval (bin). These features are thus first-order statistics describing the gray-level distribution. SDH refers to features calculated from sum and difference histograms, and GLCM refers to features calculated from a gray-level co-occurrence matrix. Both of these are second-order statistics, describing the gray-level distribution relative to spatial relations between adjacent pixels. SGF refers to the statistical geometric features, describing the structure of the microtexture. A more detailed description of all of the features is given in the Methods section.
Figure 4. Example of a feature with the same direction for the correlation of the feature with breast cancer risk and percentage mammographic density (PMD). Patients with mammograms like that on the left had low values for the feature "SDH (0.5 cm) difference of contrast" and had a low predicted risk of breast cancer. Patients with mammograms like that on the right had high feature values, a high risk of breast cancer, and a high mammographic density. The Spearman correlation with PMD for this feature was +0.54.
Figure 5. Example of a feature with no correlation with percentage mammographic density (PMD). Patients with mammograms like that on the left had low values for the feature "GLCM inverse difference moment" and had a low predicted risk of breast cancer. Patients with mammograms like that on the right had high feature values and a high risk of breast cancer. The Spearman correlation with PMD for this feature was -0.05.
Figure 6. Example of a feature with different directions for the correlation with breast cancer risk and PMD. Patients with mammograms like that on the left had low values for the feature "SDH (0.5 cm) difference of entropies" and had a low predicted risk of breast cancer and a high mammographic density. Patients with mammograms like that on the right had high feature values, a high risk of breast cancer, and a low mammographic density. The Spearman correlation with PMD for this feature was -0.72.
Figure 7. Examples of images with low score values calculated with the final prediction model and a low risk of breast cancer (left), and images with high score values and a high risk of breast cancer (right). Spearman's rho for the correlation between the final score and percentage mammographic density (PMD) was 0.02.
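The direction labels described in the Figure 3 legend (and illustrated in Figures 4 to 6) compare the sign of a feature's log OR with the sign of its Spearman correlation with PMD. The following is a minimal sketch of that comparison, assuming a rank-based Spearman coefficient with average ranks for ties; the function names and the toy values are hypothetical and are not taken from the study data.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def direction(log_or, rho):
    """'same' if log OR and correlation with PMD share a sign, 'opposite' if
    the signs differ, 'undetermined' if either value is exactly zero."""
    product = log_or * rho
    if product > 0:
        return "same"
    if product < 0:
        return "opposite"
    return "undetermined"


# Toy feature values and PMD values (invented for illustration).
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
pmd = [10.0, 20.0, 15.0, 40.0, 50.0]
rho = spearman(feature, pmd)
label = direction(0.4, rho)  # 0.4 is a hypothetical positive log OR
```

Here the toy feature rises with PMD (rho = 0.9) and has a positive log OR, so it would be marked as having the same direction. A feature with a positive log OR and a negative correlation with PMD, as in the Figure 6 example, would be labeled "opposite".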