David A Hanauer¹, Naren Ramakrishnan
¹ Department of Pediatrics, University of Michigan Medical School, Ann Arbor, MI 48109-5940, USA. hanauer@umich.edu
Abstract
OBJECTIVE: We describe an approach for modeling temporal relationships in a large-scale association analysis of electronic health record data. Adding temporal information can inform hypothesis generation and help explain the observed relationships. We applied this approach to a dataset containing 41.2 million time-stamped International Classification of Diseases, Ninth Revision (ICD-9) codes from 1.6 million patients.

METHODS: We performed two independent analyses: a pairwise association analysis using a χ² test and a temporal analysis using a binomial test. Data were visualized using network diagrams and reviewed for clinical significance.

RESULTS: We found nearly 400 000 highly associated pairs of ICD-9 codes, with varying numbers of strong temporal associations at separations ranging from ≥1 day to ≥10 years. Most of the findings were not considered clinically novel, although some, such as an association between Helicobacter pylori infection and diabetes, have recently been reported in the literature. The temporal analysis in our large cohort, however, revealed that diabetes usually preceded the diagnosis of H pylori infection, raising questions about possible cause and effect.

DISCUSSION: Such analyses have significant limitations, some due to known problems with ICD-9 codes and others to potentially incomplete data, even at the health-system level. Nevertheless, large-scale association analyses with temporal modeling can help provide a mechanism for novel discovery in support of hypothesis generation.

CONCLUSIONS: Temporal relationships can provide an additional layer of meaning in identifying and interpreting clinical associations.
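The two tests named in the METHODS can be illustrated with a small sketch. This is a hypothetical illustration under stated assumptions, not the authors' implementation. It assumes each patient contributes at most one date per ICD-9 code (for example, the first occurrence). For one code pair it builds a 2×2 table of patient counts and applies Pearson's χ² test (1 degree of freedom, no continuity correction). Then, among patients who have both codes, it counts how often code A precedes code B by at least a minimum gap versus the reverse, and tests that split against a 50/50 null with an exact one-sided binomial test. The function names, the `min_gap_days` parameter, the handling of near-simultaneous dates, and the toy counts are all assumptions made for illustration.

```python
import math

# Hypothetical sketch (not the authors' code): association and
# temporal-direction tests for one pair of ICD-9 codes, standard library only.


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table.

    a = patients with both codes, b = code A only,
    c = code B only,             d = neither code.
    Returns (statistic, p-value).
    """
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    if den == 0:  # an empty margin makes the test undefined
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / den
    # For 1 degree of freedom, P(X >= stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Exact one-sided binomial p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def temporal_direction(pairs, min_gap_days: int = 1):
    """Test whether code A tends to precede code B.

    pairs: one (day_of_A, day_of_B) tuple per patient who has both codes.
    Patients whose two dates are closer than min_gap_days are ignored
    (an assumption). Returns (a_first, b_first, one-sided p-value).
    """
    a_first = sum(1 for day_a, day_b in pairs if day_b - day_a >= min_gap_days)
    b_first = sum(1 for day_a, day_b in pairs if day_a - day_b >= min_gap_days)
    n = a_first + b_first
    p_value = binomial_upper_tail(a_first, n) if n else 1.0
    return a_first, b_first, p_value


if __name__ == "__main__":
    # Toy counts, invented for illustration only.
    stat, p = chi_square_2x2(30, 70, 20, 880)
    print(f"chi2={stat:.1f}, p={p:.3g}")
    toy_pairs = [(0, 30)] * 8 + [(40, 10)] * 2 + [(5, 5)]
    print(temporal_direction(toy_pairs, min_gap_days=1))
```

Under these assumptions, setting `min_gap_days` to 1 or to roughly 3650 would correspond to the ≥1 day and ≥10 years separations mentioned in the RESULTS. Keeping the association test and the direction test as separate functions mirrors the description of two independent analyses.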