
Comparison of automated and human assignment of MeSH terms on publicly-available molecular datasets.

David Ruau, Michael Mbagwu, Joel T Dudley, Vijay Krishnan, Atul J Butte.

Abstract

Publicly available molecular datasets can be used for independent verification or investigative repurposing, but such reuse depends on the presence, consistency and quality of descriptive annotations. Annotation and indexing of molecular datasets using well-defined controlled vocabularies or ontologies enables accurate and systematic data discovery, yet the majority of molecular datasets available through public data repositories lack such annotations. A number of automated annotation methods have been developed; however, few systematic evaluations of the quality of the annotations these methods produce have been performed using annotations from standing public data repositories. Here, we compared the manually assigned Medical Subject Heading (MeSH) annotations that data submitters associated with experiments in the PRoteomics IDEntification (PRIDE) proteomics data repository to automated MeSH annotations derived through the National Center for Biomedical Ontology Annotator and National Library of Medicine MetaMap programs. These programs were applied to the free-text annotations for experiments in PRIDE. As many submitted datasets were referenced in publications, we used the manually curated MeSH annotations of those linked publications in MEDLINE as the "gold standard". Annotator and MetaMap exhibited recall performance 3-fold greater than that of the manual annotations. We connected PRIDE experiments in a network topology according to shared MeSH annotations and found 373 distinct clusters, many of which were found to be biologically coherent by network analysis. The results of this study suggest that both Annotator and MetaMap are capable of annotating public molecular datasets with a quality comparable to, and often exceeding, that of the actual data submitters, highlighting a continuing need to improve and apply automated methods to molecular datasets in public data repositories to maximize their value and utility.
Copyright © 2011 Elsevier Inc. All rights reserved.
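
The two measurements the abstract describes, recall against a gold-standard term set and grouping experiments by shared MeSH terms, can be illustrated with a minimal Python sketch. Everything here is hypothetical: the function names, the toy term sets, and the experiment identifiers are invented for illustration and are not data from the study. The abstract also does not say how clusters were derived from the network, so treating clusters as connected components is an assumption.

```python
from collections import defaultdict


def recall(predicted, gold):
    """Fraction of gold-standard terms that also appear in the predicted set."""
    gold = set(gold)
    if not gold:
        return 0.0
    return len(set(predicted) & gold) / len(gold)


def cluster_by_shared_terms(annotations):
    """Group experiments into connected components.

    Two experiments are linked when they share at least one term
    (an assumed linkage rule; the abstract does not give the exact one).
    """
    parent = {exp: exp for exp in annotations}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # term -> first experiment annotated with it
    for exp, terms in annotations.items():
        for term in terms:
            if term in first_seen:
                parent[find(exp)] = find(first_seen[term])
            else:
                first_seen[term] = exp

    clusters = defaultdict(set)
    for exp in annotations:
        clusters[find(exp)].add(exp)
    return sorted(sorted(members) for members in clusters.values())


# Toy term sets (invented, not taken from PRIDE or MEDLINE).
gold = {"Humans", "Proteomics", "Liver", "Mass Spectrometry"}
manual = {"Humans"}
automated = {"Humans", "Proteomics", "Liver", "Mice"}

print(recall(manual, gold))     # 0.25
print(recall(automated, gold))  # 0.75

# Toy experiments keyed by hypothetical identifiers.
experiments = {
    "exp1": {"Humans", "Liver"},
    "exp2": {"Liver", "Proteomics"},
    "exp3": {"Mice", "Brain"},
    "exp4": {"Brain"},
    "exp5": {"Yeasts"},
}
print(cluster_by_shared_terms(experiments))
# [['exp1', 'exp2'], ['exp3', 'exp4'], ['exp5']]
```

Recall ignores extra terms that are not in the gold standard (such as "Mice" above), so a fuller comparison would normally report precision alongside it.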

Year: 2011; PMID: 21420508; PMCID: PMC3155012; DOI: 10.1016/j.jbi.2011.03.007

Source DB: PubMed; Journal: J Biomed Inform; ISSN: 1532-0464; Impact factor: 6.317


  6 in total

1.  Applying MetaMap to Medline for identifying novel associations in a large clinical dataset: a feasibility analysis.

Authors:  David A Hanauer; Mohammed Saeed; Kai Zheng; Qiaozhu Mei; Kerby Shedden; Alan R Aronson; Naren Ramakrishnan
Journal:  J Am Med Inform Assoc       Date:  2014-06-13       Impact factor: 4.497

2.  Integrative approach to pain genetics identifies pain sensitivity loci across diseases.

Authors:  David Ruau; Joel T Dudley; Rong Chen; Nicholas G Phillips; Gary E Swan; Laura C Lazzeroni; J David Clark; Atul J Butte; Martin S Angst
Journal:  PLoS Comput Biol       Date:  2012-06-07       Impact factor: 4.475

3.  Reengineering of MeSH thesauri for term selection to optimize literature retrieval and knowledge reconstruction in support of stem cell research.

Authors:  Yan Su; James Andrews; Hong Huang; Yue Wang; Liangliang Kong; Peter Cannon; Ping Xu
Journal:  BMC Med Inform Decis Mak       Date:  2016-05-23       Impact factor: 2.796

4.  Extending TCGA queries to automatically identify analogous genomic data from dbGaP.

Authors:  Erin K Wagner; Satyajeet Raje; Liz Amos; Jessica Kurata; Abhijit S Badve; Yingquan Li; Ben Busby
Journal:  F1000Res       Date:  2017-03-24

5.  Bibliometric Properties of Placebo Literature From the JIPS Database: A Descriptive Study.

Authors:  Katja Weimer; Cliff Buschhart; Ellen K Broelz; Paul Enck; Björn Horing
Journal:  Front Psychiatry       Date:  2022-03-25       Impact factor: 4.157

6.  Identifying medical terms in patient-authored text: a crowdsourcing-based approach.

Authors:  Diana Lynn MacLean; Jeffrey Heer
Journal:  J Am Med Inform Assoc       Date:  2013-05-05       Impact factor: 4.497

