Ziba Gandomkar¹, Sarah J Lewis², Tong Li², Ernest U Ekpo², Patrick C Brennan²
Abstract
OBJECTIVES: To propose a machine learning model that predicts readers' performance, as measured by the area under the receiver operating characteristic curve (AUC) and lesion sensitivity, from the readers' characteristics.
Keywords: Area under curve; Inter-observer variability; Machine learning; Mammography; ROC curve
Year: 2022 PMID: 35122217 PMCID: PMC9226081 DOI: 10.1007/s12282-022-01335-3
Source DB: PubMed Journal: Breast Cancer ISSN: 1340-6868 Impact factor: 3.307
Characteristics of the cases in each test set
| Feature | Set 1 | Set 2 | Set 3 | Set 4 | Set 5 | Set 6 | Set 7 | Set 8 | Set 9 |
|---|---|---|---|---|---|---|---|---|---|
| Breast density | |||||||||
| BI-RADS A | 5 | 4 | 6 | 1 | 9 | 5 | 6 | 8 | 6 |
| BI-RADS B | 31 | 20 | 20 | 21 | 35 | 33 | 25 | 30 | 21 |
| BI-RADS C | 23 | 34 | 28 | 37 | 15 | 19 | 23 | 22 | 30 |
| BI-RADS D | 1 | 2 | 6 | 1 | 1 | 3 | 6 | 0 | 3 |
| Cancer type | |||||||||
| Architectural Distortion | 2 | 1 | 1 | 4 | 2 | 3 | 0 | 4 | 0 |
| Calcification | 4 | 4 | 5 | 4 | 3 | 2 | 7 | 0 | 2 |
| Non-specific density | 4 | 4 | 3 | 4 | 3 | 1 | 2 | 4 | 5 |
| Discrete Mass | 3 | 1 | 6 | 1 | 3 | 1 | 3 | 0 | 4 |
| Spiculated Mass | 0 | 1 | 3 | 3 | 4 | 3 | 8 | 0 | 4 |
| Stellate | 7 | 9 | 2 | 4 | 5 | 10 | 0 | 12 | 5 |
| Mean size (mm) | 7.4 | 5.6 | 5.5 | 12.8 | 7.4 | 9.2 | 7.3 | 5.9 | 6.8 |
| Case difficulty ranking* | 9 | 8 | 3 | 7 | 5 | 4 | 1 | 2 | 6 |
*1 is the easiest and 9 is the most difficult test set
The mean value (standard deviation) of area under receiver operating characteristics curve (AUC), sensitivity, specificity, and lesion sensitivity of participants when grouped by various variables
| Variable | No. | AUC | Sensitivity | Specificity | Lesion sensitivity |
|---|---|---|---|---|---|
| Gender† | |||||
| Male | 144 | 0.81 (0.10) | 77.20 (17.31) | 75.48 (17.26) | 65.46 (19.80) |
| Female | 184 | 0.86 (0.07) | 82.21 (13.33) | 80.22 (14.31) | 74.28 (15.82) |
| Not responded | 577 | 0.83 (0.09) | 82.28 (14.45) | 73.33 (17.12) | 68.76 (18.48) |
| Position | |||||
| Breast physician | 39 | 0.84 (0.07) | 81.6 (14.31) | 73.31 (17.4) | 68.06 (16.47) |
| Radiologist | 866 | 0.83 (0.09) | 81.45 (14.86) | 75.15 (16.79) | 69.41 (18.47) |
| p value | | 0.880 | 0.963 | 0.544 | 0.467 |
| # Cases per week | |||||
| No cases | 10 | 0.70 (0.08) | 62.5 (25.08) | 68.5 (32.81) | 40.58 (13.17) |
| < 20 | 261 | 0.78 (0.1) | 75.53 (16.77) | 71.25 (18.51) | 59.40 (19.66) |
| 20–50 | 72 | 0.84 (0.08) | 84.38 (13.16) | 70.92 (18.19) | 68.92 (16.14) |
| 51–100 | 113 | 0.85 (0.07) | 81.55 (15.64) | 77.27 (14.42) | 67.56 (18.15) |
| 101–150 | 122 | 0.86 (0.07) | 84.93 (12.21) | 77.38 (14.91) | 75.69 (14.10) |
| 151–200 | 167 | 0.85 (0.07) | 84.61 (10.52) | 76.94 (14.95) | 76.18 (13.11) |
| > 200 | 157 | 0.87 (0.06) | 85.53 (11.6) | 78.05 (15.21) | 77.31 (15.14) |
| Not responded | 3 | 0.8 (0.11) | 56.67 (30.14) | 92 (12.17) | 55.67 (31.02) |
| | |||||
| # Hours per week | |||||
| None | 10 | 0.70 (0.08) | 62.5 (25.08) | 68.5 (32.81) | 40.58 (13.17) |
| < 4 | 307 | 0.80 (0.1) | 77.86 (15.92) | 72.75 (18.42) | 62.33 (18.98) |
| 5–10 | 299 | 0.83 (0.08) | 81.33 (14.9) | 75.04 (16.07) | 70.73 (17.51) |
| 10–15 | 100 | 0.86 (0.06) | 84.57 (11.12) | 77.48 (13.71) | 74.84 (14.74) |
| 16–20 | 82 | 0.86 (0.08) | 84.51 (16) | 76.76 (16.05) | 74.78 (18.51) |
| 21–30 | 46 | 0.89 (0.06) | 89.15 (8.32) | 79.28 (13.51) | 81.40 (9.99) |
| > 30 | 61 | 0.88 (0.06) | 85.09 (10.52) | 80.28 (13.43) | 73.41 (17.01) |
| | |||||
| Cases in usual practice | |||||
| Both | 113 | 0.80 (0.1) | 76.74 (15.41) | 73.92 (18.62) | 61.78 (18.93) |
| Hard copy | 129 | 0.82 (0.1) | 79.96 (15.14) | 74.55 (16.89) | 67.03 (19.72) |
| Soft copy | 662 | 0.84 (0.08) | 82.55 (14.51) | 75.40 (16.48) | 71.1 (17.67) |
| Not responded | 1 | 0.83 (0) | 85 (0) | 57 (0) | 71 (0) |
| | 0.555 | ||||
| Screening program | |||||
| BreastScreen Aotearoa | 181 | 0.87 (0.06) | 87.92 (9.57) | 76.09 (13.88) | 82.15 (10.53) |
| BreastScreen Australia | 451 | 0.84 (0.08) | 82.19 (13.9) | 76.98 (16.14) | 69.80 (17.14) |
| No | 273 | 0.79 (0.09) | 75.96 (17.07) | 71.25 (18.95) | 60.13 (19.2) |
| | |||||
| Fellowship | |||||
| Yes | 205 | 0.85 (0.08) | 82.22 (13.93) | 78.38 (15.7) | 73.34 (17.24) |
| No | 574 | 0.82 (0.09) | 80.72 (15.41) | 74.15 (17.11) | 69.69 (18.67) |
| Not Responded | 126 | 0.85 (0.07) | 83.57 (13.28) | 73.88 (16.62) | 61.34 (16.41) |
| | 0.312 | ||||
| Age | |||||
| Q1: [28–44) | 212 | 0.81 (0.1) | 77.81 (16.54) | 74.77 (18.14) | 65.65 (19.87) |
| Q2: [44–54) | 234 | 0.84 (0.08) | 83.34 (13.79) | 74.46 (17.52) | 71.02 (17.85) |
| Q3: [54–61) | 211 | 0.85 (0.07) | 83.65 (12.09) | 76.27 (14.29) | 72.49 (15.74) |
| Q4: 61 + | 248 | 0.84 (0.09) | 81.38 (15.73) | 74.88 (16.86) | 68.53 (19.15) |
| | 0.932 | ||||
| # Years 1 | |||||
| Q1: [0–3) | 208 | 0.79 (0.09) | 76.8 (16.26) | 71.59 (18.98) | 60.85 (18.47) |
| Q2: [3–10) | 237 | 0.83 (0.08) | 80.44 (15.76) | 76.07 (15.79) | 69.09 (18.87) |
| Q3: [10–18) | 210 | 0.86 (0.07) | 84.82 (12.18) | 77.2 (16.09) | 73.68 (16.4) |
| Q4:18 + | 250 | 0.85 (0.08) | 83.47 (13.61) | 75.23 (16.07) | 73.05 (17.01) |
| | |||||
| # Years 2 | |||||
| Q1: [0–1) | 224 | 0.79 (0.09) | 75.07 (17.62) | 72.98 (19.3) | 60.29 (19.11) |
| Q2: [1–10) | 223 | 0.85 (0.08) | 82.28 (14.16) | 77.43 (14.83) | 72.04 (17.85) |
| Q3: [10–19) | 237 | 0.85 (0.08) | 85.25 (12.16) | 74.21 (16.28) | 73.94 (16.09) |
| Q4:19 + | 221 | 0.85 (0.08) | 83.07 (13.14) | 75.16 (16.47) | 70.77 (17.95) |
| | |||||
The number (No.) of participants (out of 905) and the p values, which indicate whether the difference in each performance metric among categories is significant, are also shown. The reader's age, number of years reading mammograms (# Years 1), and number of years certified as a screening reader (# Years 2) were discretized into quartiles.
†Because the majority of participants did not answer the question about their gender, the p value for this feature is not reported.
*Although the p values are significant, adjustment for confounding factors is required before judging the effect of each variable on performance.
Significant p values are in bold
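The quartile rows in the table use half-open intervals such as [44–54), so a reader aged exactly 44 belongs to Q2, not Q1. A minimal Python sketch of that binning is given here, using the age cut points read off the table; the function name `quartile_label` and the sample ages are illustrative assumptions, not part of the study.

```python
from bisect import bisect_right

# Upper-exclusive cut points read off the Age rows of the table:
# Q1 [28-44), Q2 [44-54), Q3 [54-61), Q4 61+.
AGE_EDGES = [44, 54, 61]

def quartile_label(value, edges):
    """Return 'Q1'..'Q4' for a value, using half-open bins [lo, hi)."""
    # bisect_right counts how many edges are <= value, so a value equal to
    # an edge falls into the higher bin, matching the [lo, hi) notation.
    return f"Q{bisect_right(edges, value) + 1}"

# Illustrative ages only (not study data).
ages = [28, 43, 44, 53, 54, 60, 61, 70]
labels = [quartile_label(a, AGE_EDGES) for a in ages]
```

The same function would apply to # Years 1 and # Years 2 by passing their own edges, for example `[3, 10, 18]` for # Years 1.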
Adjusted odds ratios (ORs) for the reader characteristics whose adjusted OR was significantly greater than or less than 1
| Variables | AUC > Median | L. Sens. > Median |
|---|---|---|
| Treating #Hours, #Cases, and #Years as ordinal variables | ||
| Radiologist: breast physician | 2.31 (1.14–4.69) | – |
| # Hours (ordinal 1–7) | 1.28 (1.16–1.40) | 1.21 (1.07–1.35) |
| # Cases (ordinal 1–7) | 1.38 (1.23–1.55) | 1.35 (1.23–1.48) |
| # Years | 1.03 (1.01–1.05) | – |
| Treating #Hours, #Cases, and #Years as categorical variables | ||
| Radiologist: breast physician | 2.56 (1.22–5.36) | – |
| # Years Q3: Q1 | 2.34 (1.45–3.78) | 2.03 (1.24–3.31) |
| # Years Q4: Q1 | 1.72 (1.07–2.76) | – |
| # Cases 21–50: none | 13.66 (1.33–140.61) | 23.57 (2.38–233.46) |
| # Cases 51–100: none | 12.53 (1.23–127.74) | 26.96 (2.75–264.54) |
| # Cases 101–150: none | 21.8 (2.15–221.19) | 31.41 (3.24–304.6) |
| # Cases 151–200: none | – | 23.42 (2.43–225.88) |
| # Cases 201 + : none | 15.06 (1.48–152.78) | 33.12 (3.4–322.47) |
The ORs for having an area under receiver operating characteristics curve (AUC) and lesion sensitivity (L. Sens.) greater than the median values are shown. The analyses have been done twice (Ordinal and Categorical). Only significant values are shown
#Hours, #Cases, and #Years represent number of hours reading mammograms per week, number of mammographic cases per week, and number of years reading mammographic images
“–” represents non-significant adjusted ORs
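To make the ordinal rows easier to read, the Python sketch below relates an odds ratio and its 95% confidence interval to the underlying log-odds coefficient. It assumes the intervals are Wald intervals of the form exp(β ± 1.96·SE), which the source does not state; the helper names are illustrative.

```python
import math

def log_odds_from_or(or_point, ci_low, ci_high, z=1.96):
    """Recover the log-odds coefficient and its standard error from an OR and CI.

    Assumes a Wald interval, exp(beta +/- z * se); the source does not say
    how its intervals were computed.
    """
    beta = math.log(or_point)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se

def or_with_ci(beta, se, z=1.96):
    """Map a coefficient and standard error back to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# "# Hours (ordinal 1-7)" row for AUC > median: 1.28 (1.16-1.40)
beta, se = log_odds_from_or(1.28, 1.16, 1.40)
or_point, lo, hi = or_with_ci(beta, se)

# Under the ordinal coding, each one-level step multiplies the odds by the OR,
# so k steps multiply the odds by OR ** k.
odds_multiplier_three_steps = 1.28 ** 3
```

The reconstructed limits differ slightly from the reported ones because the published values are rounded to two decimals. The very wide intervals in the categorical rows (for example 2.38–233.46) are consistent with the small reference group of 10 readers who reported no cases per week.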
The performance of the proposed model’s prediction if used for categorizing the readers as high- and low-performers
| Model | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Predicting overall reader performance, as measured by AUC | |||||
| Baseline model | 0.72(0.68–0.75) | 0.68(0.64–0.71) | 0.65(0.62–0.69) | 0.75(0.71–0.79) | 0.72(0.68–0.76) |
| Proposed model | 0.81(0.79–0.84) | 0.78(0.74–0.80) | 0.73(0.70–0.77) | 0.89(0.86–0.92) | 0.85(0.82–0.88) |
| Predicting reader’s lesion sensitivity | |||||
| Baseline model | 0.71(0.68–0.75) | 0.69(0.66–0.72) | 0.68(0.63–0.71) | 0.78(0.74–0.82) | 0.74(0.71–0.78) |
| Proposed model | 0.81(0.78–0.84) | 0.79(0.76–0.81) | 0.79(0.75–0.82) | 0.91(0.89–0.94) | 0.88(0.86–0.91) |
The numbers in the columns represent five different ways of categorizing readers as low- and high-performers, as presented in the Statistical Analysis and Validation section. The performance is measured by area under receiver operating characteristics curve (AUC) and corresponding 95% confidence interval for the AUC values. The baseline model for the comparison only includes reading volume (as measured by number of cases per week) and set difficulty as its inputs
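The AUC values in this table measure how well a model's predicted score separates high- from low-performers. The Python sketch below shows one such evaluation with a median split and the rank-based (Mann-Whitney) form of the AUC. All numbers are hypothetical and the function name is an assumption; the study's own software and exact thresholding rules are not given in this record.

```python
from statistics import median

def auc_mann_whitney(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one reader in each group")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical numbers, for illustration only (not from the study).
observed_auc = [0.70, 0.78, 0.84, 0.85, 0.86, 0.88]   # each reader's measured AUC
predicted_auc = [0.74, 0.80, 0.85, 0.86, 0.84, 0.87]  # model output for the same readers

# Label a reader "high-performer" when the observed AUC exceeds the median.
cut = median(observed_auc)
is_high = [1 if a > cut else 0 for a in observed_auc]

discrimination = auc_mann_whitney(predicted_auc, is_high)
```

A baseline comparison in the spirit of the table would reuse `auc_mann_whitney` with scores from a model restricted to cases per week and set difficulty.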
Fig. 1 The receiver operating characteristic (ROC) curves and their confidence intervals (dashed and dotted lines) for categorising readers as high- and low-performers using the median AUC value (the second way of categorisation in the Statistical Analysis and Validation section). The grey curve is the ROC of the proposed ensemble of regression trees; the black curve shows how well the two groups can be separated when only the number of cases per week (a measure of reading volume) and case set difficulty are used
Fig. 2 The histogram of the absolute error for predicting the AUC of readers using the proposed machine learning model
Fig. 3 Simulated output of the proposed machine learning model. Panels a and b vary the number of cases per week and years of experience: (a) the most difficult case set; (b) the easiest case set. Panels c and d cover all possible pairs of hours per week and cases per week: (c) the easiest case set; (d) the most difficult case set
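The grid simulation described for panels c and d of Fig. 3 can be sketched as follows, assuming hours per week and cases per week are each coded as ordinal levels 1 to 7 (as in the odds-ratio table) and set difficulty is ranked 1 (easiest) to 9 (most difficult). The function `toy_predict` is a placeholder with made-up coefficients, not the study's ensemble of regression trees; any fitted model with the same call signature could be substituted.

```python
from itertools import product

LEVELS = range(1, 8)  # ordinal codes 1-7, as used for #Hours and #Cases

def simulate_grid(predict, set_difficulty):
    """Evaluate `predict` on every (hours, cases) pair for one test set."""
    return {
        (hours, cases): predict(hours, cases, set_difficulty)
        for hours, cases in product(LEVELS, LEVELS)
    }

def toy_predict(hours, cases, set_difficulty):
    """Placeholder, NOT the study's model: coefficients are arbitrary."""
    return 0.70 + 0.01 * hours + 0.01 * cases - 0.005 * set_difficulty

easiest = simulate_grid(toy_predict, set_difficulty=1)
hardest = simulate_grid(toy_predict, set_difficulty=9)
```

Each dictionary holds 49 predictions keyed by `(hours, cases)`, which is the shape needed to draw one heat-map panel per case set.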