Sanjay Basu (1, 2, 3), Rajiv Narayanaswamy (4)
1. Research and Analytics, Collective Health, San Francisco, CA.
2. Center for Primary Care, Harvard Medical School, Boston, MA.
3. School of Public Health, Imperial College, London, UK.
4. KPMG LLP, San Francisco, CA.
Abstract
BACKGROUND: Area-level social determinants of health (SDH) are understood to influence the likelihood of poor glycemic control among patients with type 2 diabetes mellitus (T2DM). OBJECTIVES: To develop a model predicting whether a person with T2DM has uncontrolled diabetes (hemoglobin A1c ≥9%), incorporating individual-level and area-level (census tract) covariates. RESEARCH DESIGN: Development and validation of machine learning models. SUBJECTS: A total of N=1,015,808 privately insured persons with T2DM in claims data. MEASURES: C-statistic, sensitivity, specificity, positive predictive value, negative predictive value, and accuracy. RESULTS: A standard logistic regression model selecting among the available individual-level covariates and census tract-level SDH covariates performed poorly, with a C-statistic of 0.685, sensitivity of 25.6%, specificity of 90.1%, positive predictive value of 56.9%, negative predictive value of 70.4%, and accuracy of 68.4% on a 25% held-out validation subset of the data. By contrast, machine learning models improved risk prediction; a random forest algorithm performed best, with a C-statistic of 0.928, sensitivity of 68.5%, specificity of 94.6%, positive predictive value of 69.8%, negative predictive value of 94.3%, and accuracy of 90.6%. SDH variables alone explained 16.9% of the variation in uncontrolled diabetes. CONCLUSIONS: A predictive model developed through a machine learning approach may help health care organizations identify which area-level SDH data to monitor for predicting diabetes control, for potential use in risk adjustment and targeting.
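To make the six MEASURES concrete, here is a minimal Python sketch (standard library only) of how each can be computed for a binary classifier of uncontrolled diabetes (1 = HbA1c ≥9%, 0 = controlled). The function names, the 0.5 classification threshold and the toy labels and scores are illustrative assumptions, not the authors' code or data; in the study these quantities would be computed on the held-out validation subset.

```python
"""Hypothetical sketch of the abstract's evaluation measures.

Labels: 1 = uncontrolled diabetes (HbA1c >= 9%), 0 = controlled.
All names and the toy data below are illustrative assumptions.
"""
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ConfusionCounts:
    tp: int  # predicted uncontrolled, actually uncontrolled
    fp: int  # predicted uncontrolled, actually controlled
    tn: int  # predicted controlled, actually controlled
    fn: int  # predicted controlled, actually uncontrolled


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionCounts:
    """Tally a 2x2 confusion matrix from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionCounts(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def threshold_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Threshold-dependent measures; assumes every denominator is nonzero."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
    }


def c_statistic(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("C-statistic needs at least one positive and one negative")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy validation data (made up): true labels and model risk scores.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff
    metrics = threshold_metrics(confusion(y_true, y_pred))
    print({k: round(v, 3) for k, v in metrics.items()})
    print("C-statistic:", round(c_statistic(y_true, scores), 3))
```

With these toy inputs the sketch gives sensitivity 2/3, specificity 0.8, PPV 2/3, NPV 0.8, accuracy 0.75 and a C-statistic of 0.8. The C-statistic uses the continuous scores, whereas the other five measures depend on the chosen cutoff, which is why a model can show a reasonable C-statistic alongside low sensitivity.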
Authors: Melanie T Chen; Danielle M Krzyszczyk; Alison G M Brown; Nancy Kressin; Norma Terrin; Amresh Hanchate; Jillian Suzukida; Sucharita Kher; Lori Lyn Price; Amy M LeClair; Elena Byhoff; Karen M Freund Journal: J Racial Ethn Health Disparities Date: 2021-05-19