Jason K Wang¹, Jason Hom², Santhosh Balasubramanian², Alejandro Schuler³, Nigam H Shah³, Mary K Goldstein², Michael T M Baiocchi⁴, Jonathan H Chen⁵

Affiliations: 1. Mathematical and Computational Science Program, Stanford University, Stanford, CA, USA. 2. Department of Medicine, Stanford University, Stanford, CA, USA. 3. Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA. 4. Prevention Research Center, Stanford University, Stanford, CA, USA. 5. Department of Medicine and Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA. Electronic address: jonc101@stanford.edu.
Abstract
OBJECTIVE: Evaluate the quality of clinical order practice patterns machine-learned from clinician cohorts stratified by patient mortality outcomes.

MATERIALS AND METHODS: Inpatient electronic health records from 2010 to 2013 were extracted from a tertiary academic hospital. Clinicians (n = 1822) were stratified into low-mortality (21.8%, n = 397) and high-mortality (6.0%, n = 110) extremes using a two-sided P-value score quantifying the deviation of observed vs. expected 30-day patient mortality rates. Three patient cohorts were assembled: patients seen by low-mortality clinicians, by high-mortality clinicians, and by an unfiltered crowd of all clinicians (n = 1046, 1046, and 5230 after propensity score matching, respectively). Predicted order lists were automatically generated by recommender system algorithms trained on each patient cohort and evaluated against (i) real-world practice patterns reflected in patient cases with better-than-expected mortality outcomes and (ii) reference standards derived from clinical practice guidelines.

RESULTS: Across six common admission diagnoses, order lists learned from the crowd demonstrated the greatest alignment with guideline references (AUROC range = 0.86-0.91), performing on par with or better than those learned from low-mortality clinicians (0.79-0.84, P < 10⁻⁵) or manually authored hospital order sets (0.65-0.77, P < 10⁻³). The same trend held when model predictions were evaluated against better-than-expected patient cases, with the crowd model (mean AUROC = 0.91) outperforming the low-mortality model (0.87, P < 10⁻¹⁶) and the order set benchmarks (0.78, P < 10⁻³⁵).

DISCUSSION: Whether machine learning models are trained on all clinicians or on a subset of experts illustrates a bias-variance tradeoff in data usage. Defining robust quality metrics based on internal reference standards (e.g., practice patterns from better-than-expected patient cases) or external ones (e.g., clinical practice guidelines) is critical for assessing decision support content.

CONCLUSION: Learning relevant decision support content from all clinicians is as robust as, if not more robust than, learning from a select subgroup of clinicians favored by patient outcomes.
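The clinician stratification step is only summarized above; the sketch below illustrates one plausible formulation, assuming a two-sided exact binomial test of each clinician's observed 30-day death count against the mortality rate expected for their patient mix. The data layout and function names are hypothetical, and the paper's exact statistic and thresholds may differ.

```python
# Minimal sketch of clinician stratification by mortality outliers.
# Assumes each clinician record carries an observed 30-day death count
# and an expected mortality rate derived from per-patient risk estimates.
from scipy.stats import binomtest

def mortality_pvalue(observed_deaths: int, n_patients: int,
                     expected_rate: float) -> float:
    """Two-sided P-value for deviation of observed vs. expected mortality."""
    return binomtest(observed_deaths, n_patients, expected_rate,
                     alternative="two-sided").pvalue

def stratify(clinicians, alpha: float = 0.05):
    """Split clinicians into low-/high-mortality extremes by signed deviation."""
    low, high = [], []
    for c in clinicians:  # c: dict with 'deaths', 'n', 'expected_rate'
        p = mortality_pvalue(c["deaths"], c["n"], c["expected_rate"])
        observed_rate = c["deaths"] / c["n"]
        if p < alpha and observed_rate < c["expected_rate"]:
            low.append(c)   # significantly better than expected
        elif p < alpha and observed_rate > c["expected_rate"]:
            high.append(c)  # significantly worse than expected
    return low, high
```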
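Likewise, the AUROC evaluation of a predicted order list against a guideline-derived reference can be framed as a binary-relevance ranking problem: each candidate order is labeled relevant if it appears in the reference set, and the recommender's scores are assessed on how well they rank relevant orders above irrelevant ones. The order names, scores, and reference set below are invented for illustration.

```python
# Sketch of the evaluation step: score how well a ranked order list
# matches a reference standard (guideline-derived or case-derived).
from sklearn.metrics import roc_auc_score

def order_list_auroc(candidate_orders, model_scores, reference_orders):
    """AUROC of recommender scores against a reference order set."""
    labels = [1 if o in reference_orders else 0 for o in candidate_orders]
    return roc_auc_score(labels, model_scores)

# Hypothetical crowd-model scores for candidate orders on one diagnosis.
orders = ["aspirin", "troponin", "ct_head", "heparin"]
scores = [0.92, 0.88, 0.15, 0.74]
reference = {"aspirin", "troponin", "heparin"}
print(order_list_auroc(orders, scores, reference))  # 1.0 in this toy case
```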