Jenna Wong (1), Mara Murray Horwitz (1), Li Zhou (2, 3), Sengwee Toh (1)
(1) Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA.
(2) Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA.
(3) Harvard Medical School, Boston, MA.
Abstract
PURPOSE OF REVIEW: Electronic health records (EHRs) contain valuable data for identifying health outcomes, but these data also present numerous challenges when creating computable phenotyping algorithms. Machine learning methods could help with some of these challenges. In this review, we discuss four common scenarios that researchers may find helpful for thinking critically about when and for what tasks machine learning may be used to identify health outcomes from EHR data.
RECENT FINDINGS: We first consider the conditions in which machine learning may be especially useful with respect to two dimensions of a health outcome: (1) the characteristics of its diagnostic criteria, and (2) the format in which its diagnostic data are usually stored within EHR systems. In the first dimension, we propose that for health outcomes with diagnostic criteria involving many clinical factors, vague definitions, or subjective interpretations, machine learning may be useful for modeling the complex diagnostic decision-making process from a vector of clinical inputs to identify individuals with the health outcome. In the second dimension, we propose that for health outcomes where diagnostic information is largely stored in unstructured formats such as free text or images, machine learning may be useful for extracting and structuring this information as part of a natural language processing system or an image recognition task. We then consider these two dimensions jointly to define four common scenarios of health outcomes. For each scenario, we discuss the potential uses for machine learning, first assuming accurate and complete EHR data and then relaxing these assumptions to accommodate the limitations of real-world EHR systems. We illustrate these four scenarios using concrete examples and describe how recent studies have used machine learning to identify these health outcomes from EHR data.
SUMMARY: Machine learning has great potential to improve the accuracy and efficiency of health outcome identification from EHR systems, especially under certain conditions. To promote the use of machine learning in EHR-based phenotyping tasks, future work should prioritize efforts to increase the transportability of machine learning algorithms for use in multi-site settings.
Keywords:
cohort identification; electronic health records; health outcomes; machine learning; phenotyping