Applying MetaMap to Medline for identifying novel associations in a large clinical dataset: a feasibility analysis.

David A Hanauer, Mohammed Saeed, Kai Zheng, Qiaozhu Mei, Kerby Shedden, Alan R Aronson, Naren Ramakrishnan.

Abstract

OBJECTIVE: We describe experiments designed to determine the feasibility of distinguishing known from novel associations based on a clinical dataset comprised of International Classification of Disease, V.9 (ICD-9) codes from 1.6 million patients by comparing them to associations of ICD-9 codes derived from 20.5 million Medline citations processed using MetaMap. Associations appearing only in the clinical dataset, but not in Medline citations, are potentially novel.
METHODS: Pairwise associations of ICD-9 codes were independently identified in both the clinical and Medline datasets, which were then compared to quantify their degree of overlap. We also performed a manual review of a subset of the associations to validate how well MetaMap performed in identifying diagnoses mentioned in Medline citations that formed the basis of the Medline associations.
RESULTS: The overlap of associations based on ICD-9 codes in the clinical and Medline datasets was low: only 6.6% of the 3.1 million associations found in the clinical dataset were also present in the Medline dataset. Further, a manual review of a subset of the associations that appeared in both datasets revealed that co-occurring diagnoses from Medline citations do not always represent clinically meaningful associations.
DISCUSSION: Identifying novel associations derived from large clinical datasets remains challenging. Medline as a sole data source for existing knowledge may not be adequate to filter out widely known associations.
CONCLUSIONS: In this study, novel associations were not readily identified. Further improvements in accuracy and relevance for tools such as MetaMap are needed to realize their expected utility.

Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
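
The overlap comparison described in METHODS and RESULTS can be illustrated with a minimal sketch. The abstract does not state how an association was defined or thresholded, so the version below assumes the simplest possible criterion: two ICD-9 codes are "associated" if they co-occur in at least one record. The function names, the toy records, and the example codes are all hypothetical and are not taken from the study.

```python
from itertools import combinations


def pairwise_associations(records):
    """Return the set of unordered code pairs co-occurring in any record.

    Assumption: plain co-occurrence stands in for whatever statistical
    criterion the study actually used, which the abstract does not give.
    """
    pairs = set()
    for codes in records:
        # Deduplicate and sort so each pair has one canonical ordering.
        for a, b in combinations(sorted(set(codes)), 2):
            pairs.add((a, b))
    return pairs


def overlap_fraction(clinical_pairs, medline_pairs):
    """Fraction of clinical associations that also appear in Medline."""
    if not clinical_pairs:
        return 0.0
    return len(clinical_pairs & medline_pairs) / len(clinical_pairs)


# Hypothetical toy data: each inner list is one patient (clinical) or
# one citation (Medline), represented by the ICD-9 codes found in it.
clinical_records = [
    ["250.00", "401.9"],
    ["401.9", "272.4", "250.00"],
    ["493.90", "477.9"],
]
medline_records = [
    ["401.9", "250.00"],
    ["174.9", "266.2"],
]

clinical = pairwise_associations(clinical_records)
medline = pairwise_associations(medline_records)

# Associations seen only in the clinical data are the candidates the
# abstract calls "potentially novel".
potentially_novel = clinical - medline

print(overlap_fraction(clinical, medline))
print(sorted(potentially_novel))
```

In this toy setting, the overlap fraction plays the role of the 6.6% figure reported in RESULTS, and the set difference corresponds to associations appearing only in the clinical dataset. As the abstract itself cautions, absence from Medline-derived pairs does not establish novelty, and presence does not establish clinical meaning.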

Keywords:  Data Mining; Electronic Health Records; International Classification of Diseases; Medline; Natural Language Processing; Unified Medical Language System

Year:  2014        PMID: 24928177      PMCID: PMC4147617          DOI: 10.1136/amiajnl-2014-002767

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497

