David A Hanauer (1), Mohammed Saeed (2), Kai Zheng (3), Qiaozhu Mei (4), Kerby Shedden (5), Alan R Aronson (6), Naren Ramakrishnan (7)
1. Department of Pediatrics, University of Michigan Medical School, Ann Arbor, Michigan, USA.
2. Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, Michigan, USA.
3. Department of Health Management and Policy, University of Michigan School of Public Health, Ann Arbor, Michigan, USA; School of Information, University of Michigan, Ann Arbor, Michigan, USA.
4. School of Information, University of Michigan, Ann Arbor, Michigan, USA; Department of Electronic Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, USA.
5. Center for Statistical Consultation and Research, University of Michigan, Ann Arbor, Michigan, USA.
6. Lister Hill Center, National Library of Medicine, Bethesda, Maryland, USA.
7. Department of Computer Science, Discovery Analytics Center, Virginia Tech, Arlington, Virginia, USA.
Abstract
OBJECTIVE: We describe experiments designed to determine the feasibility of distinguishing known from novel associations in a clinical dataset comprising International Classification of Diseases, Ninth Revision (ICD-9) codes from 1.6 million patients. The clinical associations were compared with associations of ICD-9 codes derived from 20.5 million Medline citations processed using MetaMap. Associations that appear in the clinical dataset but not in the Medline citations are potentially novel.
METHODS: Pairwise associations of ICD-9 codes were identified independently in the clinical and Medline datasets, and the two sets were then compared to quantify their degree of overlap. We also manually reviewed a subset of the associations to validate how well MetaMap identified the diagnoses mentioned in the Medline citations that formed the basis of the Medline associations.
RESULTS: The overlap of ICD-9 code associations between the clinical and Medline datasets was low: only 6.6% of the 3.1 million associations found in the clinical dataset were also present in the Medline dataset. Further, a manual review of a subset of the associations that appeared in both datasets revealed that co-occurring diagnoses in Medline citations do not always represent clinically meaningful associations.
DISCUSSION: Identifying novel associations derived from large clinical datasets remains challenging. Medline as the sole source of existing knowledge may not be adequate to filter out widely known associations.
CONCLUSIONS: In this study, novel associations were not readily identified. Tools such as MetaMap need further improvements in accuracy and relevance to realize their expected utility.
Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
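The comparison described under METHODS can be illustrated with a minimal sketch. It assumes an "association" is simply an unordered pair of ICD-9 codes that co-occur in at least one record; the abstract does not state the criterion the study actually used, so this is a simplification rather than the authors' method. The function names and the toy records are hypothetical.

```python
from itertools import combinations


def pairwise_associations(records):
    """Collect every unordered pair of codes that co-occurs in at least one record.

    Assumption: plain co-occurrence stands in for whatever association
    criterion the study applied.
    """
    pairs = set()
    for codes in records:
        # Sort the de-duplicated codes so (a, b) and (b, a) map to one pair.
        for a, b in combinations(sorted(set(codes)), 2):
            pairs.add((a, b))
    return pairs


def overlap_fraction(clinical_pairs, medline_pairs):
    """Fraction of clinical associations that also appear in the Medline set."""
    if not clinical_pairs:
        return 0.0
    return len(clinical_pairs & medline_pairs) / len(clinical_pairs)


def potentially_novel(clinical_pairs, medline_pairs):
    """Associations seen in the clinical data but not in Medline."""
    return clinical_pairs - medline_pairs


# Toy data for illustration only: each inner list holds the ICD-9 codes
# attached to one patient (clinical) or one citation (Medline).
clinical_records = [
    ["250.00", "401.9"],
    ["401.9", "272.4", "250.00"],
    ["493.90", "401.9"],
]
medline_records = [
    ["250.00", "401.9"],
    ["272.4", "585.9"],
]

clinical_pairs = pairwise_associations(clinical_records)
medline_pairs = pairwise_associations(medline_records)
fraction = overlap_fraction(clinical_pairs, medline_pairs)
novel = potentially_novel(clinical_pairs, medline_pairs)
```

In this sketch the overlap is measured relative to the clinical associations, matching how the RESULTS report the share of clinical associations also present in Medline. As the manual review in the abstract suggests, a pair missing from the Medline set is only a candidate for novelty, not evidence of it.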
Keywords:
Data Mining; Electronic Health Records; International Classification of Diseases; Medline; Natural Language Processing; Unified Medical Language System