Jiali Yan (1), Kristin A Linn (2), Brian W Powers (3,4,5,6), Jingsan Zhu (7), Sachin H Jain (5), Jennifer L Kowalski (8), Amol S Navathe (9,10).
1. Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA.
2. Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA.
3. Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA.
4. Department of Population Medicine, Harvard Medical School/Harvard Pilgrim Health Care Institute, Boston, MA, USA.
5. CareMore Health System, Cerritos, CA, USA.
6. Atrius Health, Boston, MA, USA.
7. Department of Medical Ethics and Health Policy, University of Pennsylvania Perelman School of Medicine, 1108 Blockley Hall, Philadelphia, PA, 19104, USA.
8. Anthem Public Policy Institute, Washington, DC, USA.
9. Department of Medical Ethics and Health Policy, University of Pennsylvania Perelman School of Medicine, 1108 Blockley Hall, Philadelphia, PA, 19104, USA. amol@wharton.upenn.edu.
10. Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA, USA. amol@wharton.upenn.edu.
Abstract
BACKGROUND: Efforts to improve the value of care for high-cost patients may benefit from care management strategies targeted at clinically distinct subgroups of patients.
OBJECTIVE: To evaluate the performance of three machine learning algorithms for identifying subgroups of high-cost patients.
DESIGN: We applied three clustering algorithms (connectivity-based clustering using agglomerative hierarchical clustering, centroid-based clustering with the k-medoids algorithm, and density-based clustering with the OPTICS algorithm) to a clinical and administrative dataset. We then examined the extent to which each algorithm identified subgroups of patients that were (1) clinically distinct and (2) associated with meaningful differences in relevant utilization metrics.
PARTICIPANTS: Patients enrolled in a national Medicare Advantage plan who were in the top decile of spending (n = 6154).
MAIN MEASURES: Post hoc discriminative models comparing the importance of variables for distinguishing observations in one cluster from the rest, and variance in utilization and spending measures across subgroups.
KEY RESULTS: Connectivity-based, centroid-based, and density-based clustering identified eight, five, and ten subgroups of high-cost patients, respectively. Post hoc discriminative models indicated that the subgroups from density-based clustering were the most clinically distinct. The variance of utilization and spending measures was greatest among the subgroups identified through density-based clustering.
CONCLUSIONS: Machine learning algorithms can be used to segment a high-cost patient population into subgroups of patients that are clinically distinct and associated with meaningful differences in utilization and spending measures. For these purposes, density-based clustering with the OPTICS algorithm outperformed the connectivity-based and centroid-based clustering algorithms.
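The design and main measures described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes NumPy and scikit-learn, uses synthetic data in place of the plan's clinical and administrative records, and measures similarity with Euclidean distance on standardized features (the study's actual distance measure and tuning are not given here). Because core scikit-learn has no k-medoids estimator, k-medoids is implemented with a simple alternating update; random-forest feature importances stand in for the unspecified post hoc discriminative model; and eta-squared (share of a metric's variance explained by subgroup membership) is one assumed way to summarize variance in a utilization measure. The helper names `k_medoids`, `cluster_all`, `one_vs_rest_importance`, and `variance_explained`, the hypothetical `spending` metric, and all parameter values are illustrative.

```python
"""Hypothetical sketch: three clustering families plus two post hoc checks.
Data, features, and parameters are illustrative assumptions only."""
import numpy as np
from sklearn.cluster import OPTICS, AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import pairwise_distances
from sklearn.preprocessing import StandardScaler


def k_medoids(X, k, n_iter=100, seed=0):
    """Alternating (Voronoi-iteration) k-medoids on Euclidean distances."""
    rng = np.random.default_rng(seed)
    D = pairwise_distances(X)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)  # assign to nearest medoid
        new = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # New medoid: member with the smallest total distance to its cluster.
                within = D[np.ix_(members, members)].sum(axis=1)
                new[j] = members[np.argmin(within)]
        if np.array_equal(new, medoids):
            break  # medoids stopped changing
        medoids = new
    return np.argmin(D[:, medoids], axis=1)


def cluster_all(X, k_hier=4, k_med=4, min_samples=20):
    """Connectivity-, centroid- (medoid-), and density-based partitions."""
    return {
        "hierarchical": AgglomerativeClustering(n_clusters=k_hier, linkage="ward").fit_predict(X),
        "k-medoids": k_medoids(X, k_med),
        "OPTICS": OPTICS(min_samples=min_samples).fit_predict(X),  # -1 marks noise
    }


def one_vs_rest_importance(X, labels, seed=0):
    """Fit one classifier per subgroup (subgroup vs. everyone else, noise
    points included in 'everyone else') and return feature importances."""
    out = {}
    for c in np.unique(labels):
        if c == -1:
            continue  # no model for the OPTICS noise label itself
        y = (labels == c).astype(int)
        clf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, y)
        out[int(c)] = clf.feature_importances_
    return out


def variance_explained(metric, labels):
    """Eta-squared of a metric across subgroups, excluding noise points."""
    keep = labels != -1
    if not keep.any():
        return float("nan")
    m, lab = metric[keep], labels[keep]
    grand = m.mean()
    between = sum(m[lab == g].size * (m[lab == g].mean() - grand) ** 2 for g in np.unique(lab))
    total = ((m - grand) ** 2).sum()
    return float(between / total)


if __name__ == "__main__":
    # Synthetic stand-in for standardized patient features.
    X_raw, latent = make_blobs(n_samples=600, centers=4, n_features=6, random_state=1)
    X = StandardScaler().fit_transform(X_raw)
    # Hypothetical annual spending that differs by latent group.
    rng = np.random.default_rng(1)
    spending = 10_000 + 5_000 * latent + rng.normal(0, 2_000, size=len(X))

    for name, labels in cluster_all(X, k_hier=4, k_med=4, min_samples=20).items():
        n_sub = len(set(labels) - {-1})
        top = {c: int(np.argmax(v)) for c, v in one_vs_rest_importance(X, labels).items()}
        print(f"{name}: {n_sub} subgroups, eta^2(spending)="
              f"{variance_explained(spending, labels):.2f}, top feature per subgroup={top}")
```

Eta-squared is bounded between 0 and 1, which makes partitions easy to compare side by side, but it tends to rise as the number of subgroups grows, so it is best read alongside the subgroup counts. Note also that hierarchical clustering and k-medoids require the number of clusters in advance, whereas OPTICS derives it from the data and can leave low-density points unassigned.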