Aimilia Gastounioti, Mikael Eriksson, Eric A Cohen, Walter Mankowski, Lauren Pantalone, Sarah Ehsan, Anne Marie McCarthy, Despina Kontos, Per Hall, Emily F Conant.
Abstract
Despite the demonstrated potential of artificial intelligence (AI) in breast cancer risk assessment for personalizing screening recommendations, further validation is required regarding AI model bias and generalizability. We performed external validation, on a U.S. screening cohort, of a mammography-derived AI breast cancer risk model originally developed for European screening cohorts. We retrospectively identified 176 breast cancers with exams 3 months to 2 years prior to cancer diagnosis and a random sample of 4963 controls from women with at least one year of negative follow-up. A risk score for each woman was calculated via the AI risk model. Age-adjusted areas under the ROC curves (AUCs) were estimated for the entire cohort and separately for White and Black women. The Gail 5-year risk model was also evaluated for comparison. The overall AUC was 0.68 (95% CI 0.64–0.72) for all women, 0.67 (0.61–0.72) for White women, and 0.70 (0.65–0.76) for Black women. The AI risk model significantly outperformed the Gail risk model for all women (p < 0.01) and for Black women (p < 0.01), but not for White women (p = 0.38). The performance of the mammography-derived AI risk model was comparable to previously reported European validation results, did not differ significantly between White and Black women, and was, overall, significantly higher than that of the Gail model.
Keywords: artificial intelligence; breast cancer risk; breast density; digital mammography; racial disparities; screening; supplemental screening
Year: 2022 PMID: 36230723 PMCID: PMC9564051 DOI: 10.3390/cancers14194803
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.575
Figure 1. Flowchart showing criteria for case–cohort sample selection. FFDM = full-field digital mammography.
Baseline characteristics of study dataset by case–control status.
| Characteristic | Controls, N = 4963 1 | Cases, N = 176 1 | p-value 2 |
|---|---|---|---|
| Age at screening | 56.49 (10.32) | 59.20 (11.06) | 0.002 |
| BMI at screening | 29.42 (7.47) | 29.37 (6.85) | 0.92 |
| Missing BMI | 165 | 10 | |
| Age > 50 (postmenopausal) | 3462/4963 (70%) | 132/176 (75%) | 0.14 |
| Race | | | 0.21 |
| White | 2069/4917 (42%) | 85/175 (49%) | |
| Black | 2521/4917 (51%) | 81/175 (46%) | |
| Other | 327/4917 (6.7%) | 9/175 (5.1%) | |
| Missing | 46 | 1 | |
| Age at first child | | | 0.70 |
| Nulliparous | 1102/4302 (26%) | 33/142 (23%) | |
| <20 | 830/4302 (19%) | 24/142 (17%) | |
| 20–24 | 859/4302 (20%) | 29/142 (20%) | |
| 25–29 | 811/4302 (19%) | 27/142 (19%) | |
| ≥30 | 700/4302 (16%) | 29/142 (20%) | |
| Missing | 661 | 34 | |
| Family history of breast cancer | | | <0.001 |
| No family history | 3985/4899 (81%) | 115/167 (69%) | |
| One 1st degree relative | 832/4899 (17%) | 39/167 (23%) | |
| ≥2 1st degree relatives | 82/4899 (1.7%) | 13/167 (7.8%) | |
| Missing | 64 | 9 | |
| Number of prior biopsies | | | <0.001 |
| 0 | 438/1227 (36%) | 4/46 (8.7%) | |
| 1 | 543/1227 (44%) | 24/46 (52%) | |
| 2 or more | 246/1227 (20%) | 18/46 (39%) | |
| Missing | 3736 | 130 | |
| Atypical hyperplasia | 31/350 (8.9%) | 3/17 (18%) | 0.20 |
| Missing | 4613 | 159 | |
| BI-RADS density | | | <0.001 |
| 1 | 623/4963 (13%) | 13/176 (7.4%) | |
| 2 | 2816/4963 (57%) | 84/176 (48%) | |
| 3 | 1424/4963 (29%) | 77/176 (44%) | |
| 4 | 100/4963 (2.0%) | 2/176 (1.1%) | |
1 Mean (SD); n/N (%). 2 For age and BMI, the Welch Two Sample t-test was used; for race, age at first child, family history of breast cancer, number of prior biopsies, and BI-RADS density, the Pearson’s Chi-squared test was used; for postmenopausal status and atypical hyperplasia, the Fisher’s exact test was used.
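The Pearson chi-squared comparison named in the footnote can be reproduced from the counts in the table. The following is a minimal sketch using only the BI-RADS density rows; the helper names `pearson_chi2` and `chi2_sf_df3` are illustrative and not from the paper, and the tail probability uses the closed form that is valid only for 3 degrees of freedom.

```python
import math

# Counts copied from the BI-RADS density rows of the table above.
controls = [623, 2816, 1424, 100]   # density categories 1 to 4
cases = [13, 84, 77, 2]


def pearson_chi2(rows):
    """Return (statistic, degrees of freedom) for an r x c contingency table."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with exactly 3 dof."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


stat, dof = pearson_chi2([controls, cases])
p_value = chi2_sf_df3(stat)
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p_value:.5f}")
```

On these counts the p-value falls below 0.001, in line with the value reported in the table. One expected cell count (cases with density 4) is below 5, so the chi-squared approximation is rougher for that cell.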
AI and Gail risk scores in study dataset: Distributions by case–control status.
| Characteristic | Controls, N = 4963 1 | Cases, N = 176 1 | p-value 2 |
|---|---|---|---|
| Breast percent density 3 | 25.88 (20.42) | 31.59 (22.14) | <0.001 |
| Calcs malignancy | 0.13 (0.16) | 0.22 (0.22) | <0.001 |
| Masses malignancy | 0.18 (0.19) | 0.24 (0.24) | 0.001 |
| Calcs asymmetry | 0.03 (0.05) | 0.07 (0.09) | <0.001 |
| Masses asymmetry | 0.05 (0.06) | 0.08 (0.08) | <0.001 |
| AI absolute 2-year risk (%) | 0.79 (0.49, 1.35) | 1.39 (0.79, 2.96) | <0.001 |
| Gail absolute 5-year risk (%) 4 | 1.38 (1.01, 1.76) | 1.57 (1.24, 2.21) | <0.001 |
1 Mean (SD); n/N (%); Median (Q1, Q3). 2 The Welch Two Sample t-test was used for breast percent density, calcs and masses malignancies, and calcs and masses asymmetries; the Wilcoxon rank sum test was used for the AI and Gail absolute risk scores. 3 For one breast in a control exam, the percent density was not obtained, and the unilateral density result was used for the risk analysis. 4 Gail risk was available on 166 cases and 4894 controls.
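As a rough check of the Welch two-sample t-test named in the footnote, the statistic can be recomputed from the summary values for breast percent density. This sketch assumes all 4963 controls and 176 cases contributed to the means and SDs, and it approximates the p-value with a normal tail rather than the exact t distribution (a reasonable shortcut only because the degrees of freedom are large). The function name `welch_t` is illustrative.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1   # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2   # squared standard error of group 2 mean
    t = (mean2 - mean1) / math.sqrt(v1 + v2)
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, dof


# Breast percent density: controls 25.88 (20.42), cases 31.59 (22.14).
t_stat, dof = welch_t(25.88, 20.42, 4963, 31.59, 22.14, 176)

# Two-sided p-value from the standard normal tail (approximation, not exact t).
p_approx = math.erfc(abs(t_stat) / math.sqrt(2))
print(f"t = {t_stat:.2f}, dof = {dof:.0f}, approximate p = {p_approx:.5f}")
```

The approximate p-value is below 0.001, consistent with the table entry for breast percent density.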
Discriminatory performance (AUC) in the full cohort and in subgroups of women by mammographic density and tumor characteristics, stratified by White and Black women.
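| Study Participant Characteristic Subgroup | All Women 1, Cases/Controls | All Women, AUC | All Women, 95% CI | White Women, Cases/Controls | White Women, AUC | White Women, 95% CI | Black Women, Cases/Controls | Black Women, AUC | Black Women, 95% CI | p-value 2 |
|---|---|---|---|---|---|---|---|---|---|---|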
| Full cohort | 176/4963 | 0.68 | 0.64–0.72 | 85/2069 | 0.67 | 0.61–0.72 | 81/2521 | 0.70 | 0.65–0.76 | 0.20 |
| BI-RADS density | ||||||||||
| 1 + 2 | 97/3439 | 0.67 | 0.62–0.72 | 43/1276 | 0.66 | 0.58–0.73 | 48/1975 | 0.69 | 0.62–0.76 | 0.17 |
| 3 + 4 | 79/1524 | 0.69 | 0.62–0.74 | 42/793 | 0.68 | 0.60–0.76 | 33/546 | 0.71 | 0.61–0.80 | 0.69 |
| p-value 3 | | 0.82 | | | 0.85 | | | 0.63 | | |
| Tumor invasiveness | ||||||||||
| Invasive | 128/4963 | 0.70 | 0.65–0.74 | 59/2069 | 0.68 | 0.60–0.75 | 62/2521 | 0.72 | 0.66–0.78 | 0.22 |
| In situ | 48/4963 | 0.63 | 0.55–0.70 | 26/2069 | 0.64 | 0.54–0.74 | 19/2521 | 0.65 | 0.52–0.77 | 0.74 |
| p-value 3 | | 0.18 | | | 0.64 | | | 0.38 | | |
| Tumor size (invasive tumors only) | ||||||||||
| ≤10 mm | 68/4963 | 0.66 | 0.60–0.72 | 37/2069 | 0.63 | 0.53–0.72 | 25/2521 | 0.71 | 0.62–0.80 | 0.08 |
| >10–20 mm | 38/4963 | 0.73 | 0.64–0.81 | 16/2069 | 0.73 | 0.60–0.84 | 21/2521 | 0.71 | 0.59–0.82 | 0.68 |
| >20 mm | 22/4963 | 0.76 | 0.67–0.84 | 6/2069 | 0.79 | 0.62–0.91 | 16/2521 | 0.74 | 0.63–0.84 | 0.95 |
| p-value 3 | | 0.26 | | | 0.11 | | | 0.71 | | |
| In situ grade | ||||||||||
| Low–intermediate | 35/4963 | 0.63 | 0.54–0.71 | 18/2069 | 0.65 | 0.53–0.77 | 15/2521 | 0.60 | 0.46–0.74 | 0.55 |
| High | 13/4963 | 0.64 | 0.48–0.78 | 8/2069 | 0.63 | 0.46–0.77 | 4/2521 | 0.83 | 0.66–0.95 | 0.12 |
| p-value 3 | | 0.63 | | | 0.37 | | | 0.14 | | |
1 All women in the cohort also includes non-White and non-Black women, and women with missing information on race. AUCs adjusted for age at baseline. Confidence intervals estimated using bootstrapping. Permutation test tested for difference between AUCs in White and Black women (p-value 2) and between AUCs in study participant characteristic subgroups (p-value 3).
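The footnote names two resampling procedures: bootstrapped confidence intervals and a permutation test for the difference between subgroup AUCs. A minimal sketch of both follows, under stated assumptions: the data are synthetic, the AUC is the plain Mann-Whitney estimate without the age adjustment used in the study, the interval is a percentile bootstrap, and the permutation shuffles subgroup membership separately among cases and among controls so that subgroup sizes stay fixed. The paper does not specify these details, and all function names are illustrative.

```python
import random


def auc(case_scores, control_scores):
    """Mann-Whitney AUC: P(case score > control score), ties count one half."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            wins += 1.0 if c > k else 0.5 if c == k else 0.0
    return wins / (len(case_scores) * len(control_scores))


def bootstrap_ci(case_scores, control_scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling cases and controls separately."""
    rng = random.Random(seed)
    stats = sorted(
        auc(rng.choices(case_scores, k=len(case_scores)),
            rng.choices(control_scores, k=len(control_scores)))
        for _ in range(n_boot)
    )
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


def permutation_p(cases_a, controls_a, cases_b, controls_b, n_perm=200, seed=0):
    """Two-sided permutation p-value for |AUC_a - AUC_b|."""
    rng = random.Random(seed)
    observed = abs(auc(cases_a, controls_a) - auc(cases_b, controls_b))
    cases, controls = cases_a + cases_b, controls_a + controls_b  # new lists
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(cases)      # reassign subgroup membership among cases
        rng.shuffle(controls)   # and, separately, among controls
        diff = abs(
            auc(cases[:len(cases_a)], controls[:len(controls_a)])
            - auc(cases[len(cases_a):], controls[len(controls_a):])
        )
        extreme += diff >= observed
    return (extreme + 1) / (n_perm + 1)


# Synthetic scores only; these are not study data.
data_rng = random.Random(42)


def simulate(n_cases, n_controls, shift):
    return ([data_rng.gauss(shift, 1.0) for _ in range(n_cases)],
            [data_rng.gauss(0.0, 1.0) for _ in range(n_controls)])


cases_a, controls_a = simulate(30, 150, 0.65)
cases_b, controls_b = simulate(30, 150, 0.75)
point = auc(cases_a, controls_a)
low, high = bootstrap_ci(cases_a, controls_a)
p = permutation_p(cases_a, controls_a, cases_b, controls_b)
print(f"AUC = {point:.2f} (95% CI {low:.2f}-{high:.2f}), permutation p = {p:.2f}")
```

With few cases per subgroup, as in several rows of the table, the bootstrap intervals are wide and the permutation test has little power, which is worth keeping in mind when reading the non-significant subgroup comparisons.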
Figure 2. Frequency distribution of AI absolute 2-year risk scores for developing breast cancer in cases (red) and controls (green). Distributions presented for the entire dataset and in racial subgroups. 1 Cut-offs for general, moderate, and high-risk groups are based on the NICE guidelines for 10-year risk in age group 40–50 (<3%, 3–8%, >8%) divided by 5. We added a fourth low-risk group with the absolute risk cut-off 0.15. 2 The relative risk was calculated as the ratio of average risks in each absolute risk category. High-risk women in the full cohort had a 37-fold higher risk compared with women at low risk. The corresponding numbers for White and Black women were 36-fold and 34-fold. NICE: National Institute for Health and Care Excellence guidelines.
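The cut-off arithmetic in the caption can be made concrete with a short sketch. It assumes all cut-offs are in percent, that the boundaries follow the "<3%, 3–8%, >8%" pattern (moderate inclusive at both ends), and that the 5-year low-risk cut-off of 0.375 from the Figure 3 caption applies to the Gail model. The names `categorize` and `relative_risk` are illustrative, and the example risks are made up.

```python
from statistics import fmean

NICE_10_YEAR_CUTS = (3.0, 8.0)        # percent, general/moderate and moderate/high
LOW_RISK_CUT = {2: 0.15, 5: 0.375}    # percent, added low-risk cut-off per horizon


def categorize(risk_pct, horizon_years=2):
    """Map an absolute risk (percent) to low / general / moderate / high."""
    divisor = 10 / horizon_years                       # 5 for 2-year, 2 for 5-year
    general_max = NICE_10_YEAR_CUTS[0] / divisor       # 0.6 (2-year) or 1.5 (5-year)
    moderate_max = NICE_10_YEAR_CUTS[1] / divisor      # 1.6 (2-year) or 4.0 (5-year)
    if risk_pct < LOW_RISK_CUT[horizon_years]:
        return "low"
    if risk_pct < general_max:
        return "general"
    if risk_pct <= moderate_max:
        return "moderate"
    return "high"


def relative_risk(risks_pct, horizon_years=2):
    """Ratio of the mean risk in the high group to the mean risk in the low group."""
    high = [r for r in risks_pct if categorize(r, horizon_years) == "high"]
    low = [r for r in risks_pct if categorize(r, horizon_years) == "low"]
    if not high or not low:
        raise ValueError("need at least one high-risk and one low-risk score")
    return fmean(high) / fmean(low)


# Illustrative 2-year risks (percent); not study data.
example_risks = [0.1, 0.1, 0.5, 0.9, 2.0, 4.0]
print([categorize(r) for r in example_risks])
print(f"high vs. low relative risk: {relative_risk(example_risks):.0f}-fold")
```

Under these assumptions the median AI 2-year risks reported for controls (0.79%) and cases (1.39%) both land in the moderate group, so the separation between cases and controls comes mainly from the tails of the distributions.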
Discriminatory performance (AUC) in women with available Gail risk factors, in the full cohort and in racial subgroups, for any breast cancer subtype and for invasive breast cancer.
| Risk Model | All Women (166/4894) 1, AUC | All Women, 95% CI | White Women (78/2037), AUC | White Women, 95% CI | Black Women (80/2487), AUC | Black Women, 95% CI | p-value 2 |
|---|---|---|---|---|---|---|---|
| Any breast cancer | | | | | | | |
| Gail 5-year risk | 0.55 | 0.50–0.60 | 0.61 | 0.54–0.68 | 0.48 | 0.41–0.54 | 0.12 |
| AI 2-year risk | 0.68 | 0.64–0.72 | 0.66 | 0.60–0.72 | 0.71 | 0.65–0.76 | 0.54 |
| p-value 3 | <0.01 | | 0.38 | | <0.01 | | |
| Invasive breast cancer | | | | | | | |
| Gail 5-year risk | 0.55 | 0.50–0.61 | 0.61 | 0.53–0.69 | 0.47 | 0.39–0.54 | 0.12 |
| AI 2-year risk | 0.70 | 0.65–0.75 | 0.67 | 0.59–0.74 | 0.73 | 0.66–0.79 | 0.56 |
| p-value 3 | <0.01 | | 0.39 | | <0.01 | | |
1 All women in the cohort also includes non-White and non-Black women, and women with missing information on race. AUCs adjusted for age at baseline. Confidence intervals estimated using bootstrapping. Permutation test tested for difference between AUCs in White and Black women for each model (p-value 2) and between models (p-value 3).
Figure 3. Frequency distribution of Gail 5-year (left column) and AI 2-year (right column) absolute risk scores for developing breast cancer in cases (red) and controls (green). Distributions presented for a subset of n = 166 breast cancer cases and n = 4894 controls with available Gail and AI risk scores. 1 Cut-offs for general, moderate, and high-risk groups are based on the NICE guidelines for 10-year risk in age group 40–50 (<3%, 3–8%, >8%) adapted to 5-year and 2-year risk, respectively, by dividing the 10-year risk by 2 and 5. We added a fourth low-risk group with the absolute risk cut-off 0.15 2-year risk (or 0.375 5-year risk). 2 The relative risk was calculated as the ratio of average risks in each absolute risk category. High-risk women identified using the Gail model had an 18-fold higher risk compared with women at low risk. The corresponding number for the AI risk model was 36-fold. NICE: National Institute for Health and Care Excellence guidelines.