Peter Wingrove1,2, Winston Liaw2,3, Jeremy Weiss4, Stephen Petterson2, John Maier5, Andrew Bazemore2. 1. University of Pittsburgh, School of Medicine, Pittsburgh, Pennsylvania pmw27@pitt.edu. 2. Robert Graham Center, Washington, DC. 3. University of Houston, College of Medicine, Department of Health Systems and Population Health Sciences, Houston, Texas. 4. Carnegie Mellon University, Pittsburgh, Pennsylvania. 5. University of Pittsburgh, Department of Biomedical Informatics, Pittsburgh, Pennsylvania.
Abstract
PURPOSE: To develop and test a machine-learning-based model to predict primary care and other specialties using Medicare claims data. METHODS: We used 2014-2016 prescription and procedure Medicare data to train 3 sets of random forest classifiers (prescription only, procedure only, and combined) to predict specialty. Self-reported specialties were condensed to 27 categories. Physicians were assigned to testing and training cohorts, and random forest models were trained and then applied to 2014-2016 data sets for the testing cohort to generate a series of specialty predictions. Comparing the predicted specialty to self-report, we assessed performance with F1 scores and area under the receiver operating characteristic curve (AUROC) values. RESULTS: A total of 564,986 physicians were included. The combined model had a greater aggregate (macro) F1 score (0.876) than the prescription-only (0.745; P <.01) or procedure-only (0.821; P <.01) model. Mean F1 scores across specialties in the combined model ranged from 0.533 to 0.987. The mean F1 score was 0.920 for primary care. The mean AUROC value for the combined model was 0.992, with values ranging from 0.982 to 0.999. The AUROC value for primary care was 0.982. CONCLUSIONS: This novel approach showed high performance and provides a near real-time assessment of current primary care practice. These findings have important implications for primary care workforce research in the absence of accurate data.
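The pipeline the abstract describes — training a random forest on per-physician claims-derived features, then scoring held-out predictions with macro F1 and AUROC — can be sketched as follows. This is an illustrative sketch only: the data are synthetic, and the feature counts, class counts, and hyperparameters are assumptions, not the study's actual configuration.

```python
# Sketch of the abstract's approach: random forest specialty prediction
# evaluated with macro F1 and one-vs-rest macro AUROC.
# All data below are synthetic stand-ins for prescription/procedure counts.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, roc_auc_score

rng = np.random.default_rng(0)
n_physicians, n_features, n_specialties = 2000, 50, 5

# Synthetic per-physician feature matrix (e.g., drug/procedure counts).
X = rng.poisson(3.0, size=(n_physicians, n_features)).astype(float)
y = rng.integers(0, n_specialties, size=n_physicians)
# Shift a few feature means per specialty so classes are learnable.
for k in range(n_specialties):
    X[y == k, k * 3 : k * 3 + 3] += 4.0

# Assign physicians to training and testing cohorts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
proba = clf.predict_proba(X_test)

# Compare predicted specialty to "self-report" (here, the synthetic labels).
macro_f1 = f1_score(y_test, pred, average="macro")
macro_auroc = roc_auc_score(y_test, proba, multi_class="ovr", average="macro")
print(f"macro F1: {macro_f1:.3f}, macro AUROC: {macro_auroc:.3f}")
```

In the study itself, three such models were trained (prescription-only, procedure-only, and combined feature sets) and their macro F1 scores compared; the sketch above corresponds to a single model.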