Cheryl R Clark1,2,3, Mark J Ommerborn4, Kaitlyn Moran4, Katherine Brooks5, Jennifer Haas6, David W Bates5, Adam Wright5. 1. Center for Community Health and Health Equity, Brigham and Women's Hospital, 1620 Tremont Street, Boston, MA 02120, USA. crclark@partners.org. 2. Harvard Medical School, Boston, MA, USA. 3. Division of General Medicine and Primary Care, Brigham and Women's-Faulkner Hospitalist Program, Boston, MA, USA. 4. Center for Community Health and Health Equity, Brigham and Women's Hospital, 1620 Tremont Street, Boston, MA 02120, USA. 5. Division of General Medicine and Primary Care, Brigham and Women's-Faulkner Hospitalist Program, Boston, MA, USA. 6. Division of General Medicine and Primary Care, Massachusetts General Hospital, Boston, MA, USA.
Abstract
BACKGROUND: Self-rated health is a strong predictor of mortality and morbidity. Machine learning techniques may provide insights into which of the multifaceted contributors to self-rated health are key drivers in diverse groups.
OBJECTIVE: We used machine learning algorithms to predict self-rated health in diverse groups in the Behavioral Risk Factor Surveillance System (BRFSS), to understand how such algorithms might be used explicitly to examine drivers of self-rated health in diverse populations.
DESIGN: We applied three common machine learning algorithms to predict self-rated health in the 2017 BRFSS survey, stratified by age, race/ethnicity, and sex. We replicated our process in the 2016 BRFSS survey.
PARTICIPANTS: We analyzed data from 449,492 adult participants of the 2017 BRFSS survey.
MAIN MEASURES: We used area under the curve (AUC) statistics to assess model fit within each group. We used traditional logistic regression to predict self-rated health associated with features identified by machine learning models.
KEY RESULTS: Each algorithm, regularized logistic regression (AUC: 0.81), random forest (AUC: 0.80), and support vector machine (AUC: 0.81), provided good model fit in the BRFSS. Predictors of self-rated health were similar by sex and race/ethnicity but differed by age. Socioeconomic features were prominent predictors of self-rated health in mid-life age groups. Income [OR: 1.70 (95% CI: 1.62-1.80)], education [OR: 2.02 (95% CI: 1.89-2.16)], physical activity [OR: 1.52 (95% CI: 1.46-1.58)], depression [OR: 0.66 (95% CI: 0.63-0.68)], difficulty concentrating [OR: 0.62 (95% CI: 0.58-0.66)], and hypertension [OR: 0.59 (95% CI: 0.57-0.61)] all predicted the odds of excellent or very good self-rated health.
CONCLUSIONS: Our analysis of BRFSS data shows that social determinants of health are prominent predictors of self-rated health in mid-life. Our work may demonstrate promising practices for using machine learning to advance health equity.
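The AUC statistic named under MAIN MEASURES can be illustrated with a minimal sketch. This is not the authors' code: the function names `auc` and `auc_by_group`, the group labels, and the toy scores are all hypothetical, and the pairwise (Mann-Whitney) definition of AUC is used here only because it is easy to verify by hand. It shows how one overall AUC and one AUC per stratum might be computed from predicted scores.

```python
from itertools import product


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def auc_by_group(groups, labels, scores):
    """Compute AUC separately within each stratum (e.g. an age group)."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        result[g] = auc([labels[i] for i in idx], [scores[i] for i in idx])
    return result


# Hypothetical toy data, not BRFSS records.
# label 1 = excellent or very good self-rated health; score = model output.
groups = ["younger"] * 4 + ["older"] * 4
labels = [1, 1, 0, 0, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.7, 0.3, 0.1]

overall_auc = auc(labels, scores)
stratified_auc = auc_by_group(groups, labels, scores)
print(overall_auc, stratified_auc)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.80 to 0.81 are read. The pairwise loop is quadratic in sample size, so for a survey of BRFSS scale a rank-based or library implementation would be the practical choice.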
Keywords:
healthcare disparities; machine learning; self-rated health; social determinants of health; socioeconomic factors