A Muthalagu 1, J A Pacheco 1, S Aufox 1, P L Peissig 2, J T Fuehrer 2, G Tromp 3, A N Kho 4, L J Rasmussen-Torvik 5
1. Northwestern University, Center for Genetic Medicine, Chicago, Illinois, United States.
2. Marshfield Clinic Research Foundation, Marshfield, WI.
3. Weis Center for Research, Geisinger Health System, Danville, PA.
4. Department of Medicine, Northwestern University, Chicago, Illinois, United States.
5. Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States.
Abstract
BACKGROUND: Height is a critical variable for many biomedical analyses because it is a component of Body Mass Index (BMI). Transforming electronic health record (EHR) height measures into research-ready values is challenging, and little information is available on methods for "cleaning" these data.

OBJECTIVES: We sought to develop an algorithm to clean adult height data extracted from the EHR using only height values and associated ages.

RESULTS: The algorithm we developed is sensitive to normal decreases in adult height associated with aging, is implemented using an open-source software tool and is thus easily modifiable, and is freely available. We checked the performance of our algorithm using data from the Northwestern biobank and a replication sample from the Marshfield Clinic biobank obtained through our participation in the eMERGE consortium. The algorithm identified 1262 erroneous values from a total of 33937 records in the Northwestern sample. Replacing erroneous height values with those identified as correct by the algorithm resulted in meaningful changes in height and BMI records; the median change in recorded height after cleaning was 7.6 cm and the median change in BMI was 2.9 kg/m². Comparison of cleaned EHR height values to observer-measured values showed that 94.5% (95% CI 93.8% to 95.2%) of cleaned values were within 3.5 cm of observer-measured values.

CONCLUSIONS: Our freely available height algorithm cleans EHR height data with only height and age inputs. Use of this algorithm will benefit groups trying to perform research with height and BMI data extracted from the EHR.
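To illustrate why a height error matters for BMI, and what cleaning from height and age alone might look like, the following Python sketch may be useful. It is not the authors' algorithm, which the abstract does not describe in detail. It assumes a simple per-person rule: compare each height with the person's median after adding back an assumed age-related shrinkage. The function names and the parameters `tolerance_cm`, `shrink_cm_per_year`, and `shrink_start_age` are hypothetical, chosen only for illustration.

```python
from statistics import median


def bmi(weight_kg, height_cm):
    """Body Mass Index in kg/m^2 from weight in kg and height in cm."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m * height_m)


def flag_heights(records, tolerance_cm=3.5,
                 shrink_cm_per_year=0.1, shrink_start_age=50):
    """Flag implausible heights for ONE adult.

    records: list of (age_years, height_cm) tuples.
    All three keyword parameters are illustrative assumptions,
    not values taken from the published algorithm.
    Returns a list of booleans, True where a value looks erroneous.
    """
    # Add back the assumed age-related height loss so that normal
    # shrinkage in older adults is not mistaken for an error.
    adjusted = [
        height + shrink_cm_per_year * max(0, age - shrink_start_age)
        for age, height in records
    ]
    reference = median(adjusted)
    # A value is flagged when it is far from the person's median.
    return [abs(value - reference) > tolerance_cm for value in adjusted]


if __name__ == "__main__":
    # Invented example data for a single person.
    person = [(40, 170.0), (42, 170.5), (45, 177.8), (60, 169.2), (70, 168.0)]
    for (age, height), bad in zip(person, flag_heights(person)):
        status = "flagged" if bad else "ok"
        print(f"age {age}: {height} cm {status}, BMI at 80 kg = {bmi(80, height):.1f}")
```

Because BMI divides by height squared, a height error of a few centimetres shifts BMI by a noticeable amount at constant weight, which is consistent with the size of the median changes reported above.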
Keywords:
Height; body mass index; dimensional measurement accuracy; electronic health record; electronic medical record; phenotyping