Esther Puyol-Antón, Bram Ruijsink, Jorge Mariscal Harana, Stefan K Piechnik, Stefan Neubauer, Steffen E Petersen, Reza Razavi, Phil Chowienczyk, Andrew P King.
Abstract
Background: Artificial intelligence (AI) techniques have been proposed for automation of cine CMR segmentation for functional quantification. However, in other applications AI models have been shown to have potential for sex and/or racial bias. The objective of this paper is to perform the first analysis of sex/racial bias in AI-based cine CMR segmentation using a large-scale database.
Keywords: cardiac magnetic resonance; deep learning; fair AI; inequality fairness in deep learning-based CMR segmentation; segmentation
Year: 2022 PMID: 35463778 PMCID: PMC9021445 DOI: 10.3389/fcvm.2022.859310
Source DB: PubMed Journal: Front Cardiovasc Med ISSN: 2297-055X
Population characteristics for the train/validation and test sets.
| Variable | Train/validation | Test |
| Continuous variables | | |
| Patients, n | 4,410 | 1,250 |
| Age (years; mean, SD) | 62 (8) | 61 (8) |
| Height (cm; mean, SD) | 169 (9) | 169 (9) |
| Weight (kg; mean, SD) | 76 (15) | 75 (14) |
| BMI (kg/m2; mean, SD) | 27 (4) | 26 (4) |
| BSA (m2; mean, SD) | 1.86 (0.21) | 1.85 (0.20) |
| Systolic blood pressure (mmHg; mean, SD) | 136 (20) | 136 (18) |
| Diastolic blood pressure (mmHg; mean, SD) | 79 (11) | 80 (10) |
| Heart rate (bpm; mean, SD) | 63 (20) | 63 (10) |
| Categorical variables | | |
| Sex (males; n, %) | 2,299 (52) | 655 (52) |
| Racial group, White (n, %) | 3,570 (81) | 1,025 (81) |
| Racial group, Mixed (n, %) | 136 (3) | 34 (3) |
| Racial group, Asian (n, %) | 313 (7) | 83 (7) |
| Racial group, Black (n, %) | 190 (4) | 47 (4) |
| Racial group, Chinese (n, %) | 87 (2) | 27 (2) |
| Racial group, Other (n, %) | 144 (3) | 34 (3) |
All continuous values are reported as mean(SD), while categorical variables are reported as number (percentage). SD, standard deviation.
Dice similarity coefficient (DSC) values for the overall test set and by sex and race.
| Group | LVBP | LVMyo | RVBP | AVG |
| Total | 94.39 (2.61) | 88.68 (3.06) | 90.77 (3.86) | 91.28 (3.18) |
| Male | 94.35 (2.55) | 89.10 (2.84) | 90.61 (3.96) | 91.35 (3.12) |
| Female | 94.44 (2.67) | 88.59 (3.26) | 90.94 (3.94) | 91.32 (3.29) |
| White | 95.13 (1.98) | 89.81 (1.48) | 92.24 (2.11) | 92.39 (1.86) |
| Mixed | 89.79 (1.34) | 80.72 (2.38) | 82.95 (2.53) | 84.49 (2.08) |
| Asian | 92.15 (2.48) | 86.46 (2.18) | 86.27 (2.63) | 88.29 (2.43) |
| Black | 91.41 (1.53) | 85.78 (1.73) | 80.88 (2.10) | 86.02 (1.79) |
| Chinese | 88.98 (2.43) | 79.75 (2.21) | 82.58 (2.32) | 83.77 (2.32) |
| Others | 90.46 (2.53) | 82.64 (5.44) | 84.77 (3.46) | 85.96 (3.81) |
DSC is reported for the LV blood pool (LVBP), LV myocardium (LVMyo) and RV blood pool (RVBP), together with the average DSC across LVBP, LVMyo and RVBP (AVG column). Values are reported as mean (SD). The first row reports the DSC for the full database, the second and third rows report DSC by sex, and the remaining rows report DSC by racial group. Comparison of variables between groups (i.e., male vs. female, white vs. non-white, mixed vs. non-mixed, etc.) was carried out using an independent Student’s t-test. Pairwise post hoc testing was carried out using Bonferroni correction for multiple comparisons. Asterisks indicate statistically significant differences between each group and the rest of the test set after correction (28 tests), where *p < 0.01/28, **p < 0.001/28, ***p < 0.0001/28. Exact p-values are reported in
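As a rough illustration of the quantities named in this caption, the sketch below computes a Dice similarity coefficient for two binary masks, a pooled-variance Student's t statistic, and the Bonferroni-corrected threshold for 28 tests. The function names, the flat-list mask representation, and the convention of returning 1.0 for two empty masks are assumptions made for illustration; they are not taken from the paper.

```python
import math
from statistics import mean, variance


def dice(mask_a, mask_b):
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for flat binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def student_t(x, y):
    """Independent two-sample t statistic with pooled (equal) variance."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled * (1.0 / nx + 1.0 / ny))


def bonferroni_threshold(alpha, n_tests=28):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests
```

The t statistic would still need to be converted to a p-value (for example with a statistics package) before comparing it with `bonferroni_threshold(0.01)`.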
Manual clinical measurements (top table) and absolute (middle table) and relative (bottom table) differences in volumetric and functional measures between automated and manual segmentations, overall and by sex and race.
| (A) Manual | | | | | | | |
| Group | LVEDV | LVESV | LVEF | LVmass | RVEDV | RVESV | RVEF |
| Total | 79 (20) | 33 (12) | 60 (7) | 51 (14) | 86 (22) | 38 (13) | 57 (7) |
| Male | 82 (20) | 36 (12) | 59 (7) | 50 (12) | 95 (21) | 45 (13) | 54 (7) |
| Female | 72 (14) | 29 (8) | 61 (7) | 42 (9) | 77 (14) | 32 (8) | 58 (6) |
| White | 83 (20) | 35 (12) | 59 (6) | 51 (14) | 87 (22) | 39 (13) | 56 (6) |
| Mixed | 76 (20) | 27 (9) | 64 (8) | 47 (14) | 83 (20) | 35 (10) | 58 (8) |
| Asian | 70 (18) | 25 (10) | 65 (8) | 48 (12) | 76 (19) | 32 (11) | 58 (6) |
| Black | 87 (21) | 33 (11) | 63 (6) | 59 (13) | 94 (27) | 41 (14) | 56 (6) |
| Chinese | 66 (12) | 22 (7) | 66 (7) | 46 (11) | 75 (16) | 32 (8) | 58 (6) |
| Others | 77 (19) | 28 (9) | 64 (6) | 53 (15) | 86 (23) | 36 (13) | 59 (7) |
| (B) Absolute differences | | | | | | | |
| Group | LVEDV | LVESV | LVEF | LVmass | RVEDV | RVESV | RVEF |
| Total | 2.6 (1.7) | 2.1 (1.8) | 2.5 (2.4) | 3.8 (3.9) | 3.5 (2.6) | 3.0 (2.2) | 3.6 (3.0) |
| Male | 2.7 (1.7) | 2.1 (1.7) | 2.1 (1.9) | 4.1 (4.2) | 3.4 (2.6) | 3.0 (2.1) | 3.1 (2.7) |
| Female | 2.6 (1.7) | 2.1 (1.8) | 2.9 (2.8) | 3.5 (3.4) | 3.5 (2.6) | 4.6 (2.2) | 4.1 (3.3) |
| White | 2.3 (1.5) | 1.9 (1.5) | 2.1 (2.1) | 4.0 (3.3) | 3.2 (2.6) | 2.8 (2.2) | 3.4 (2.9) |
| Mixed | 3.9 (2.1) | 3.4 (1.7) | 4.1 (2.7) | 1.9 (1.7) | 4.6 (1.8) | 3.9 (1.8) | 4.9 (2.5) |
| Asian | 3.4 (1.9) | 2.8 (2.3) | 4.0 (2.9) | 2.0 (2.3) | 4.4 (2.4) | 3.4 (1.9) | 4.4 (3.3) |
| Black | 3.6 (1.8) | 2.9 (2.8) | 3.3 (3.0) | 2.0 (2.2) | 4.4 (1.6) | 3.5 (1.9) | 3.9 (2.6) |
| Chinese | 4.4 (2.2) | 3.4 (2.1) | 4.7 (2.8) | 4.1 (3.6) | 4.8 (2.4) | 4.0 (2.9) | 6.4 (5.4) |
| Others | 3.7 (1.9) | 3.1 (2.0) | 4.3 (3.2) | 2.3 (2.5) | 4.6 (3.4) | 3.6 (1.8) | 4.3 (2.8) |
| (C) Relative differences | | | | | | | |
| Group | LVEDV | LVESV | LVEF | LVmass | RVEDV | RVESV | RVEF |
| Total | 3.4 (2.5) | 7.1 (7.4) | 4.1 (3.9) | 8.7 (8.3) | 4.3 (3.4) | 8.8 (7.5) | 6.4 (5.2) |
| Male | 3.0 (2.3) | 6.2 (6.3) | 3.6 (3.1) | 7.8 (6.5) | 3.7 (3.0) | 7.3 (5.9) | 5.8 (5.0) |
| Female | 3.7 (2.7) | 7.9 (8.2) | 4.6 (4.4) | 9.6 (9.6) | 4.9 (3.7) | 10.2 (8.4) | 7.0 (5.4) |
| White | 3.0 (2.1) | 6.0 (6.1) | 3.7 (3.6) | 8.4 (8.7) | 4.0 (3.4) | 8.2 (7.3) | 6.0 (5.1) |
| Mixed | 5.7 (3.1) | 14.1 (8.2) | 6.5 (4.2) | 10.3 (6.1) | 6.2 (2.4) | 13.3 (6.8) | 9.2 (5.1) |
| Asian | 5.1 (3.2) | 11.8 (11.6) | 5.8 (4.2) | 10.5 (5.4) | 6.1 (3.4) | 11.5 (6.8) | 7.2 (4.9) |
| Black | 4.1 (2.3) | 7.7 (6.8) | 5.1 (4.8) | 7.3 (4.1) | 5.1 (2.2) | 9.3 (5.9) | 7.3 (4.7) |
| Chinese | 7.0 (4.3) | 16.5 (10.6) | 6.9 (3.7) | 13.6 (7.1) | 6.2 (3.2) | 13.8 (11.4) | 10.4 (9.4) |
| Others | 5.0 (2.9) | 12.6 (10.2) | 7.7 (5.5) | 8.9 (4.2) | 5.2 (3.9) | 11.9 (7.0) | 8.1 (4.9) |
Clinical measurements for the LV and RV end diastolic volume (EDV), end systolic volume (ESV), ejection fraction (EF), and left ventricular mass (LVmass). All cardiac volumes were indexed to body surface area using the Dubois and Dubois formula (
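The indexing and difference measures described in this caption can be sketched as follows. The Du Bois and Du Bois formula is the standard one (weight in kg, height in cm); the function names are hypothetical, and expressing the relative difference as a percentage of the manual value is an assumption, since the caption does not state the exact definition.

```python
def bsa_du_bois(weight_kg, height_cm):
    """Body surface area (m^2) from the Du Bois and Du Bois formula."""
    return 0.007184 * weight_kg ** 0.425 * height_cm ** 0.725


def index_to_bsa(value, weight_kg, height_cm):
    """Index a volume or mass to body surface area (e.g. mL -> mL/m^2)."""
    return value / bsa_du_bois(weight_kg, height_cm)


def absolute_difference(automated, manual):
    """Absolute difference between automated and manual measurements."""
    return abs(automated - manual)


def relative_difference(automated, manual):
    """Assumed definition: absolute difference as a percentage of the manual value."""
    return 100.0 * abs(automated - manual) / abs(manual)
```

For a subject of 76 kg and 169 cm, `bsa_du_bois` returns roughly 1.87 m², which is in line with the mean BSA reported in the population table.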
Associations between average DSC and racial group.
| (A) Univariate linear regression | | | |
| Variable | n | Beta (95% CI) | p-value |
| Mixed | 1,250 | 0.34 (0.30, 0.38) | 6.30E-16 |
| Asian | 1,250 | 0.33 (0.29, 0.37) | 1.57E-12 |
| Black | 1,250 | 0.36 (0.32, 0.40) | 1.30E-19 |
| Chinese | 1,250 | 0.32 (0.28, 0.36) | 1.08E-8 |
| Other | 1,250 | 0.30 (0.26, 0.34) | 4.43E-14 |
| (B) Multivariate linear regression | | | |
| Variable | n | Beta (95% CI) | p-value |
| Age | 1,250 | 0.03 (-0.02, 0.08) | 0.210 |
| Sex | 1,250 | 0.02 (-0.03, 0.08) | 0.364 |
| Weight | 1,250 | 0.10 (-0.36, 0.51) | 0.699 |
| Height | 1,250 | 0.00 (-0.28, 0.29) | 0.972 |
| BMI | 1,250 | -0.02 (-0.36, 0.36) | 0.944 |
| HR | 1,250 | 0.03 (-0.01, 0.07) | 0.114 |
| SBP | 1,250 | -0.01 (-0.07, 0.04) | 0.579 |
| DBP | 1,250 | -0.04 (-0.08, 0.01) | 0.114 |
| LVEDV | 1,250 | -0.02 (-0.21, 0.17) | 0.855 |
| LVESV | 1,250 | -0.07 (-0.20, 0.06) | 0.284 |
| RVEDV | 1,250 | 0.12 (-0.09, 0.31) | 0.235 |
| RVESV | 1,250 | -0.11 (-0.24, 0.04) | 0.127 |
| LVmass | 1,250 | -0.04 (-0.11, 0.02) | 0.174 |
| Diabetes | 1,250 | 0.10 (-0.07, 0.27) | 0.273 |
| Hypertension | 1,250 | 0.05 (0.00, 0.10) | 0.034 |
| Hypercholesterolemia | 1,250 | 0.00 (-0.04, 0.05) | 0.860 |
| Smoking | 1,250 | 0.00 (-0.05, 0.03) | 0.812 |
| Center | 1,250 | 0.15 (0.09, 0.21) | 9.99E-02 |
| Mixed | 1,250 | 0.38 (0.36, 0.41) | 9.99E-04 |
| Asian | 1,250 | 0.37 (0.34, 0.41) | 9.99E-04 |
| Black | 1,250 | 0.40 (0.38, 0.43) | 9.99E-04 |
| Chinese | 1,250 | 0.36 (0.34, 0.39) | 9.99E-04 |
| Other | 1,250 | 0.34 (0.30, 0.38) | 9.99E-04 |
Standardized regression beta-coefficients and CI are shown, representing the z-score change in variables with increasing DSC. The White racial group was selected as control. LV, left ventricle; EDV, end-diastolic volume; ESV, end-systolic volume; SBP, systolic blood pressure; DBP, diastolic blood pressure; CI, confidence interval. Model 1 is unadjusted; Model 2 is adjusted for sex, height, weight, blood pressure at scan-time, heart rate at scan-time, LVEDV, LVESV, RVEDV, RVESV, LVmass, diabetes, hypertension, hypercholesterolemia, smoking and center. *p < 0.01, **p < 0.001, ***p < 0.00001.
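To make the notion of a standardized beta coefficient concrete, the following minimal sketch z-scores both variables and fits a single-predictor least-squares slope. It only illustrates the unadjusted (univariate) case; the function names and the choice of which variable is the predictor are assumptions, and the adjusted model in the table would require a multiple regression that is not shown here.

```python
from statistics import mean, stdev


def zscore(values):
    """Standardize values to zero mean and unit (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def standardized_beta(x, y):
    """Slope of the least-squares line fitted to z-scored x and y.

    With a single predictor this equals the Pearson correlation of x and y.
    """
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)
```

For instance, `x` could be a 0/1 indicator for membership in one racial group (with White as the reference) and `y` a per-subject score, so that the coefficient is expressed in standard-deviation units.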
The comparison of adjusted mean between racial groups based on one-way ANOVA and ANCOVA.
| Racial group | n | Model 4, mean (95% CI) | Model 5, mean (95% CI) |
| White | 1,025 | 0.93 (0.93, 0.93) | 0.93 (0.93, 0.93) |
| Mixed | 34 | 0.84 (0.86, 0.82) | 0.83 (0.85, 0.80) |
| Asian | 83 | 0.89 (0.90, 0.88) | 0.88 (0.89, 0.88) |
| Black | 47 | 0.86 (0.87, 0.85) | 0.85 (0.86, 0.83) |
| Chinese | 27 | 0.84 (0.86, 0.81) | 0.82 (0.84, 0.78) |
| Other | 34 | 0.86 (0.88, 0.85) | 0.85 (0.87, 0.83) |
Model 4 is unadjusted; Model 5 is adjusted for sex, height, weight, blood pressure at scan-time, heart rate at scan-time, LVEDV, LVESV, RVEDV, RVESV, LVmass, diabetes, hypertension, hypercholesterolemia, smoking, and center. CI, confidence interval. For model 4 and model 5, pairwise post hoc testing was carried out using Scheffé’s method.
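The unadjusted comparison (one-way ANOVA followed by Scheffé's pairwise post hoc test) can be sketched in plain Python as below. This is an illustrative outline only: the function names are hypothetical, the covariate-adjusted ANCOVA model is not covered, and the resulting statistics must still be compared with a critical value of the F distribution with (k - 1, N - k) degrees of freedom, which is not computed here.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F statistic, within-group mean square) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, ms_within


def scheffe_statistic(groups, i, j):
    """Scheffé statistic for the pairwise contrast between groups i and j.

    The contrast is significant when this value exceeds the critical
    F value with (k - 1, N - k) degrees of freedom.
    """
    _, ms_within = one_way_anova(groups)
    k = len(groups)
    diff = mean(groups[i]) - mean(groups[j])
    se_squared = ms_within * (1.0 / len(groups[i]) + 1.0 / len(groups[j]))
    return diff ** 2 / (se_squared * (k - 1))
```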
Misclassification rate for HF diagnosis.
| | | HFrEF (LVEF < 40%) | | HFmrEF (LVEF 40–49%) | | HFpEF (LVEF ≥ 50%) | |
| Racial group | n overall | n GT | MCR (%) | n GT | MCR (%) | n GT | MCR (%) |
| White | 107 | 5 | 3.74 | 14 | 5.61 | 88 | 7.48 |
| Mixed | 11 | 3 | 45.45 | 0 | – | 8 | 36.36 |
| Black | 8 | 0 | – | 4 | 12.05 | 4 | 25.00 |
| Asian | 14 | 4 | 21.43 | 2 | 7.14 | 8 | 14.29 |
| Chinese | 4 | 0 | – | 2 | 25.00 | 2 | 50.00 |
| Other | 6 | 1 | 33.33 | 5 | 16.67 | 0 | – |
| Minority groups | 43 | 8 | 23.26 | 13 | 9.30 | 22 | 23.26 |
The table summarizes numbers of subjects in each racial group and HF diagnosis (i.e., HFrEF, HFmrEF and HFpEF), as well as the misclassification rate (MCR,%) for each racial group and diagnosis. The row Minority groups combines data from the Mixed, Black, Asian, Chinese and Other groups. The left column (n overall) shows the number of subjects for each racial group used to compute the MCRs. For each HF diagnosis, the first column shows the number of ground truth positive subjects in that group, and the second column shows the MCR. When computing the MCRs, the ground truth negative subjects were all subjects from the other HF diagnoses for that racial group. HFrEF, HF with reduced EF; HFmrEF, HF with mildly reduced EF; HFpEF, HF with preserved EF. Blank cells show regions with missing data.
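The one-vs-rest misclassification rate described in this caption can be sketched as follows. Dividing the number of errors by the total number of subjects in the group is an assumption, although it appears consistent with the reported values (for example, 3.74% for 107 White subjects corresponds to 4/107). The function names and the handling of LVEF values between 49% and 50% are also assumptions.

```python
def hf_category(lvef):
    """Map an LVEF value (%) to an HF diagnosis using the table's thresholds."""
    if lvef < 40:
        return "HFrEF"
    if lvef < 50:  # assumption: values in [49, 50) are treated as HFmrEF
        return "HFmrEF"
    return "HFpEF"


def misclassification_rate(lvef_manual, lvef_auto, diagnosis):
    """One-vs-rest misclassification rate (%) for one HF diagnosis.

    Manual LVEF defines the ground truth; subjects from the other HF
    diagnoses act as ground truth negatives.
    """
    if len(lvef_manual) != len(lvef_auto) or not lvef_manual:
        raise ValueError("need two non-empty LVEF lists of equal length")
    errors = sum(
        (hf_category(m) == diagnosis) != (hf_category(a) == diagnosis)
        for m, a in zip(lvef_manual, lvef_auto)
    )
    return 100.0 * errors / len(lvef_manual)
```

A subject whose manual LVEF is 35% but whose automated LVEF is 41% counts as an error for both HFrEF (missed positive) and HFmrEF (false positive).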