Majerle Reeves, Harish S Bhat, Sidra Goldman-Mellor.
Abstract
OBJECTIVE: Improve methodology for equitable suicide death prediction when using sensitive predictors, such as race/ethnicity, for machine learning and statistical methods.
Keywords: Data Science; Decision Trees; Machine Learning
Year: 2022 PMID: 35396246 PMCID: PMC8996002 DOI: 10.1136/bmjhci-2021-100456
Source DB: PubMed Journal: BMJ Health Care Inform ISSN: 2632-1009
Data broken down by race/ethnic feature, excluding the ‘other’ and ‘unknown’ race categories
| | White | Hispanic | Black | Asian | Native American |
| Patient records | 17 337 370 | 9 863 670 | 4 437 649 | 2 014 810 | 125 769 |
| Suicide death records | 27 974 | 4739 | 2099 | 1246 | 264 |
| Suicide death records per 100 000 patient records | 161 | 48 | 47 | 62 | 210 |
Note that suicide rates differ considerably by category.
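The rate row in the table can be reproduced directly from the two count rows. The following is a minimal sketch of that arithmetic; the counts are copied from the table, while the names `counts` and `rate_per_100k` are illustrative and not taken from the source.

```python
# Counts copied from the table: (patient records, suicide death records).
counts = {
    "White": (17_337_370, 27_974),
    "Hispanic": (9_863_670, 4_739),
    "Black": (4_437_649, 2_099),
    "Asian": (2_014_810, 1_246),
    "Native American": (125_769, 264),
}


def rate_per_100k(patients: int, deaths: int) -> float:
    """Suicide death records per 100 000 patient records."""
    return deaths / patients * 100_000


# Round to whole numbers, as the table does.
rates = {group: round(rate_per_100k(p, d)) for group, (p, d) in counts.items()}

for group, rate in rates.items():
    print(f"{group}: {rate}")
```

On these counts the highest rate (Native American) is more than four times the lowest (Black), which is the disparity the note refers to.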
Average test set AUC with SD (at training set specificity of approximately 0.76) of different combinations of resampling procedure plus statistical/machine learning method, by racial/ethnic group
| | Asian | Black | Hispanic | White | Size of range |
| Blind—Logistic Regression | 0.77 (0.02) | | 0.77 (0.02) | 0.73 (0.01) | |
| Blind—Naive Bayes | 0.74 (0.03) | | 0.75 (0.02) | 0.71 (0.01) | |
| Blind—XGBoost | | | | | 0.07 (0.05) |
| Blind—Random Forest | 0.77 (0.02) | 0.72 (0.05) | 0.76 (0.01) | 0.73 (0.01) | 0.07 (0.04) |
| Separate—Logistic Regression | 0.74 (0.02) | 0.66 (0.05) | | 0.73 (0.01) | 0.10 (0.05) |
| Separate—Naive Bayes | 0.73 (0.03) | 0.67 (0.05) | 0.74 (0.02) | 0.71 (0.01) | |
| Separate—XGBoost | | | | | |
| Separate—Random Forest | 0.74 (0.02) | 0.67 (0.07) | | 0.73 (0.01) | 0.10 (0.06) |
| Equity—Logistic Regression | 0.76 (0.02) | 0.71 (0.05) | 0.77 (0.02) | 0.72 (0.01) | 0.08 (0.04) |
| Equity—Naive Bayes | 0.74 (0.03) | 0.72 (0.05) | 0.76 (0.02) | 0.71 (0.01) | |
| Equity—XGBoost | | | | | 0.08 (0.05) |
| Equity—Random Forest | 0.76 (0.03) | 0.70 (0.06) | 0.76 (0.01) | 0.72 (0.01) | 0.09 (0.05) |
For each column and each resampling method, boldface indicates the top performing method(s).
AUC, area under the curve.
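The "Size of range" column is not defined in this excerpt. One reading, offered here as an assumption, is that the range (maximum minus minimum of the group-wise values) is computed within each test set and then averaged. That would explain why, for example, the Blind Random Forest row reports 0.07 even though its group means span only 0.72 to 0.77. The sketch below illustrates the distinction on toy data; the function names, the two groups "A" and "B", and all scores are hypothetical and not from the study.

```python
from statistics import mean


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def groupwise_auc(scores, labels, groups):
    """AUC computed separately on the records of each group."""
    out = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        out[g] = auc([scores[i] for i in idx], [labels[i] for i in idx])
    return out


def size_of_range(values):
    """Assumed definition: max minus min across groups."""
    values = list(values)
    return max(values) - min(values)


# Two toy test sets (hypothetical numbers).
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]
test_sets = [
    [0.9, 0.2, 0.6, 0.4, 0.8, 0.7, 0.3, 0.1],
    [0.5, 0.6, 0.7, 0.1, 0.9, 0.2, 0.8, 0.3],
]

per_set = [groupwise_auc(scores, labels, groups) for scores in test_sets]

# Average of the per-test-set ranges (the assumed reading of the column).
mean_of_ranges = mean(size_of_range(r.values()) for r in per_set)

# Range of the per-group averages (what one gets from the mean columns alone).
group_means = {g: mean(r[g] for r in per_set) for g in per_set[0]}
range_of_means = size_of_range(group_means.values())
```

In this toy case each test set has a gap of 0.25 between groups, yet the group averages coincide, so `mean_of_ranges` is 0.25 while `range_of_means` is 0. The average of ranges can never be smaller than the range of averages.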
Average test set sensitivity with SD (at training set specificity of approximately 0.76) of different combinations of resampling procedure plus statistical/machine learning method, by racial/ethnic group
| | Asian | Black | Hispanic | White | Size of range |
| Blind—Logistic Regression | 0.43 (0.05) | 0.30 (0.08) | 0.32 (0.04) | 0.76 (0.02) | 0.47 (0.05) |
| Blind—Naive Bayes | | | | 0.70 (0.02) | |
| Blind—XGBoost | 0.37 (0.03) | 0.27 (0.08) | 0.30 (0.03) | | 0.56 (0.05) |
| Blind—Random Forest | 0.31 (0.03) | 0.24 (0.07) | 0.25 (0.03) | 0.79 (0.04) | 0.58 (0.05) |
| Separate—Logistic Regression | 0.69 (0.03) | 0.56 (0.09) | 0.63 (0.04) | | 0.16 (0.07) |
| Separate—Naive Bayes | 0.60 (0.04) | | 0.57 (0.04) | 0.56 (0.03) | |
| Separate—XGBoost | 0.67 (0.04) | 0.53 (0.12) | 0.57 (0.07) | | 0.19 (0.09) |
| Separate—Random Forest | | | | 0.57 (0.03) | 0.20 (0.08) |
| Equity—Logistic Regression | 0.58 (0.03) | 0.52 (0.09) | 0.63 (0.04) | 0.61 (0.02) | 0.14 (0.05) |
| Equity—Naive Bayes | 0.56 (0.05) | 0.57 (0.09) | 0.63 (0.05) | 0.59 (0.03) | 0.12 (0.04) |
| Equity—XGBoost | 0.55 (0.04) | 0.50 (0.10) | 0.69 (0.03) | | 0.24 (0.06) |
| Equity—Random Forest | | | | | |
For each column and each resampling method, boldface indicates the top performing method(s).
Average test set specificity with SD (at training set specificity of approximately 0.76) of different combinations of resampling procedure plus statistical/machine learning method, by racial/ethnic group
| | Asian | Black | Hispanic | White | Size of range |
| Blind—Logistic Regression | 0.91 (0.01) | 0.93 (0.01) | 0.94 (0.01) | 0.57 (0.01) | 0.38 (0.01) |
| Blind—Naive Bayes | 0.91 (0.01) | 0.90 (0.01) | 0.91 (0.00) | | |
| Blind—XGBoost | 0.96 (0.00) | 0.95 (0.01) | 0.95 (0.00) | 0.54 (0.02) | 0.42 (0.02) |
| Blind—Random Forest | | | | 0.52 (0.04) | 0.45 (0.04) |
| Separate—Logistic Regression | 0.66 (0.01) | 0.66 (0.02) | 0.74 (0.01) | 0.75 (0.00) | |
| Separate—Naive Bayes | | 0.63 (0.08) | | 0.74 (0.01) | 0.15 (0.07) |
| Separate—XGBoost | 0.71 (0.02) | | 0.76 (0.05) | | |
| Separate—Random Forest | 0.61 (0.03) | 0.64 (0.03) | 0.70 (0.03) | 0.75 (0.02) | 0.15 (0.04) |
| Equity—Logistic Regression | 0.79 (0.01) | 0.78 (0.01) | | | |
| Equity—Naive Bayes | | 0.74 (0.01) | | | 0.10 (0.01) |
| Equity—XGBoost | | | 0.72 (0.01) | 0.65 (0.01) | 0.17 (0.01) |
| Equity—Random Forest | 0.70 (0.06) | 0.65 (0.03) | 0.66 (0.04) | 0.61 (0.06) | 0.10 (0.02) |
For each column and each resampling method, boldface indicates the top performing method(s).
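The sensitivity and specificity tables are reported "at training set specificity of approximately 0.76", which implies that a decision threshold is fixed on the training data and then applied to each group in the test set. This excerpt does not say how the threshold was chosen. The sketch below assumes one simple rule (the smallest training-negative score that reaches the target specificity) and uses hypothetical names and toy data throughout.

```python
def threshold_for_specificity(train_scores, train_labels, target=0.76):
    """Smallest training-negative score t such that the share of negatives
    with score <= t is at least `target`. Scores above t are flagged positive.
    This selection rule is an assumption, not the study's stated procedure."""
    neg = sorted(s for s, y in zip(train_scores, train_labels) if y == 0)
    for t in neg:
        if sum(s <= t for s in neg) / len(neg) >= target:
            return t
    return neg[-1]


def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) for the rule `score > threshold`."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)


def by_group(scores, labels, groups, threshold):
    """Apply one shared threshold and report metrics separately per group."""
    out = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        out[g] = sensitivity_specificity(
            [scores[i] for i in idx], [labels[i] for i in idx], threshold
        )
    return out


# Toy training data (hypothetical): five negatives, two positives.
train_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
train_labels = [0, 0, 0, 0, 0, 1, 1]
threshold = threshold_for_specificity(train_scores, train_labels, target=0.76)

# Toy test data (hypothetical) for two groups "A" and "B".
test_scores = [0.7, 0.3, 0.2, 0.5, 0.6, 0.45, 0.1, 0.35]
test_labels = [1, 1, 0, 0, 1, 1, 0, 0]
test_groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
metrics = by_group(test_scores, test_labels, test_groups, threshold)
```

A single threshold tuned on pooled training data does not guarantee the same specificity in every group. That is consistent with the Blind Logistic Regression row above, where specificity is 0.91 to 0.94 for the Asian, Black and Hispanic groups but 0.57 for the White group.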