BACKGROUND: Genetic studies require precise phenotype definitions, but electronic medical record (EMR) phenotype data are recorded inconsistently and in a variety of formats.

OBJECTIVE: To present lessons learned about validation of EMR-based phenotypes from the Electronic Medical Records and Genomics (eMERGE) studies.

MATERIALS AND METHODS: The eMERGE network created and validated 13 EMR-derived phenotype algorithms. Network sites are Group Health, Marshfield Clinic, Mayo Clinic, Northwestern University, and Vanderbilt University.

RESULTS: By validating EMR-derived phenotypes we learned that: (1) multisite validation improves phenotype algorithm accuracy; (2) targets for validation should be carefully considered and defined; (3) specifying time frames for review of variables reduces validation time and improves accuracy; (4) using repeated measures requires defining the relevant time period and specifying the most meaningful value to be studied; (5) patient movement in and out of the health plan (transience) can result in incomplete or fragmented data; (6) the review scope should be defined carefully; (7) particular care is required in combining EMR and research data; (8) medication data can be assessed using claims, medications dispensed, or medications prescribed; (9) algorithm development and validation work best as an iterative process; and (10) validation by content experts or structured chart review can provide accurate results.

CONCLUSIONS: Despite the diverse structure of the five EMRs of the eMERGE sites, we developed, validated, and successfully deployed 13 electronic phenotype algorithms. Validation is a worthwhile process that not only measures phenotype performance but also strengthens phenotype algorithm definitions and enhances their inter-institutional sharing.
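Lesson (4) in the results, on repeated measures, can be illustrated with a minimal Python sketch. It is not the eMERGE implementation: the function name `select_value`, the rule names, and the sample measurements are all hypothetical, chosen only to show what "defining the relevant time period and specifying the most meaningful value" might look like in practice.

```python
from datetime import date
from statistics import median


def select_value(measurements, start, end, rule="median"):
    """Pick one representative value from repeated measurements.

    measurements: list of (date, value) pairs for one patient (hypothetical layout).
    start, end:   inclusive bounds of the review window.
    rule:         which value counts as "most meaningful" (an assumed set of options).
    Returns None when no measurement falls inside the window.
    """
    # Keep only measurements inside the window, ordered by date.
    in_window = sorted((d, v) for d, v in measurements if start <= d <= end)
    if not in_window:
        return None

    values = [v for _, v in in_window]
    if rule == "median":
        return median(values)
    if rule == "max":
        return max(values)
    if rule == "min":
        return min(values)
    if rule == "latest":
        return in_window[-1][1]
    raise ValueError(f"unknown rule: {rule}")


# Made-up repeated lab values for a single patient.
measurements = [
    (date(2008, 1, 10), 120.0),
    (date(2009, 3, 5), 135.0),
    (date(2009, 11, 20), 128.0),
    (date(2011, 6, 1), 150.0),
]

window_start, window_end = date(2009, 1, 1), date(2010, 12, 31)
print(select_value(measurements, window_start, window_end, rule="median"))
print(select_value(measurements, window_start, window_end, rule="latest"))
```

Making the window and the selection rule explicit parameters means two sites running the same phenotype definition would select the same value from the same record, which is the kind of ambiguity the lesson warns about.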
Keywords:
electronic health record; electronic medical record; genomics; phenotype; validation studies