Evaluation of techniques for increasing recall in a dictionary approach to gene and protein name identification.

Martijn J Schuemie, Barend Mons, Marc Weeber, Jan A Kors.

Abstract

Gene and protein name identification in text requires a dictionary approach to relate synonyms to the same gene or protein, and to link names to external databases. However, existing dictionaries are incomplete. We investigate two complementary methods for automatic generation of a comprehensive dictionary: combination of information from existing gene and protein databases, and rule-based generation of spelling variations. Both methods have been reported in the literature before, but have hitherto not been combined and evaluated systematically. We combined gene and protein names from several existing databases of four different organisms. The combined dictionaries showed a substantial increase in recall on three different test sets, as compared to any single database. Application of 23 spelling variation rules to the combined dictionaries further increased recall. However, many rules appeared to have no effect, and some appeared to have a detrimental effect on precision.
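
The two methods named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the gene identifiers (`G1`, `G2`), the toy databases, the two spelling variation rules, and the function names are hypothetical and are not taken from the paper, which applies 23 rules whose definitions are not given in the abstract. Recall is computed here simply as the fraction of annotated mentions whose text appears among the dictionary names for the annotated gene.

```python
import re
from collections import defaultdict


def combine_dictionaries(*sources):
    """Merge several {gene_id: [names]} mappings into one {gene_id: set(names)}."""
    combined = defaultdict(set)
    for source in sources:
        for gene_id, names in source.items():
            combined[gene_id].update(names)
    return dict(combined)


# Hypothetical rule 1: treat hyphens and spaces as interchangeable.
def swap_hyphen_and_space(name):
    variants = set()
    if "-" in name:
        variants.add(name.replace("-", " "))
    if " " in name:
        variants.add(name.replace(" ", "-"))
    return variants


# Hypothetical rule 2: separate a letter prefix from a trailing number ("IL2" -> "IL 2", "IL-2").
def split_letters_and_digits(name):
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", name)
    if match is None:
        return set()
    letters, digits = match.groups()
    return {f"{letters} {digits}", f"{letters}-{digits}"}


RULES = [swap_hyphen_and_space, split_letters_and_digits]


def expand(dictionary, rules=RULES):
    """Add rule-generated spelling variants to every entry of the dictionary."""
    expanded = {}
    for gene_id, names in dictionary.items():
        variants = set(names)
        for name in names:
            for rule in rules:
                variants |= rule(name)
        expanded[gene_id] = variants
    return expanded


def recall(dictionary, gold_mentions):
    """Fraction of (mention_text, gene_id) pairs that the dictionary covers (case-insensitive)."""
    hits = 0
    for text, gene_id in gold_mentions:
        known = {name.lower() for name in dictionary.get(gene_id, ())}
        if text.lower() in known:
            hits += 1
    return hits / len(gold_mentions)


# Toy data (assumed): two small "databases" and four annotated mentions.
db_a = {"G1": ["IL2", "interleukin 2"]}
db_b = {"G1": ["interleukin-2", "TCGF"], "G2": ["TP53"]}
gold = [("IL2", "G1"), ("TCGF", "G1"), ("IL-2", "G1"), ("TP53", "G2")]

combined = combine_dictionaries(db_a, db_b)
print(recall(db_a, gold))              # 0.25
print(recall(combined, gold))          # 0.75
print(recall(expand(combined), gold))  # 1.0
```

On this toy data, recall rises when the databases are combined and rises again when variants are generated, which mirrors the direction of the reported result but says nothing about its size. The sketch does not measure precision; rule-generated variants can also match unrelated text, which is one plausible way a rule could hurt precision.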

Year:  2006        PMID: 17079192     DOI: 10.1016/j.jbi.2006.09.002

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


  16 in total

Review 1.  Network integration and graph analysis in mammalian molecular systems biology.

Authors:  A Ma'ayan
Journal:  IET Syst Biol       Date:  2008-09       Impact factor: 1.615

2.  Word add-in for ontology recognition: semantic enrichment of scientific literature.

Authors:  J Lynn Fink; Pablo Fernicola; Rahul Chandran; Savas Parastatidis; Alex Wade; Oscar Naim; Gregory B Quinn; Philip E Bourne
Journal:  BMC Bioinformatics       Date:  2010-02-24       Impact factor: 3.169

3.  Concept-based query expansion for retrieving gene related publications from MEDLINE.

Authors:  Sérgio Matos; Joel P Arrais; João Maia-Rodrigues; José Luis Oliveira
Journal:  BMC Bioinformatics       Date:  2010-04-28       Impact factor: 3.169

Review 4.  What the papers say: text mining for genomics and systems biology.

Authors:  Nathan Harmston; Wendy Filsell; Michael P H Stumpf
Journal:  Hum Genomics       Date:  2010-10       Impact factor: 4.639

5.  Gimli: open source and high-performance biomedical name recognition.

Authors:  David Campos; Sérgio Matos; José Luís Oliveira
Journal:  BMC Bioinformatics       Date:  2013-02-15       Impact factor: 3.169

6.  Text-derived concept profiles support assessment of DNA microarray data for acute myeloid leukemia and for androgen receptor stimulation.

Authors:  Rob Jelier; Guido Jenster; Lambert C J Dorssers; Bas J Wouters; Peter J M Hendriksen; Barend Mons; Ruud Delwel; Jan A Kors
Journal:  BMC Bioinformatics       Date:  2007-01-18       Impact factor: 3.169

7.  Anni 2.0: a multipurpose text-mining tool for the life sciences.

Authors:  Rob Jelier; Martijn J Schuemie; Antoine Veldhoven; Lambert C J Dorssers; Guido Jenster; Jan A Kors
Journal:  Genome Biol       Date:  2008-06-12       Impact factor: 13.583

8.  Literature-aided meta-analysis of microarray data: a compendium study on muscle development and disease.

Authors:  Rob Jelier; Peter A C 't Hoen; Ellen Sterrenburg; Johan T den Dunnen; Gert-Jan B van Ommen; Jan A Kors; Barend Mons
Journal:  BMC Bioinformatics       Date:  2008-06-24       Impact factor: 3.169

9.  Novel protein-protein interactions inferred from literature context.

Authors:  Herman H H B M van Haagen; Peter A C 't Hoen; Alessandro Botelho Bovo; Antoine de Morrée; Erik M van Mulligen; Christine Chichester; Jan A Kors; Johan T den Dunnen; Gert-Jan B van Ommen; Silvère M van der Maarel; Vinícius Medina Kern; Barend Mons; Martijn J Schuemie
Journal:  PLoS One       Date:  2009-11-18       Impact factor: 3.240

10.  Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data.

Authors:  Kristina M Hettne; André Boorsma; Dorien A M van Dartel; Jelle J Goeman; Esther de Jong; Aldert H Piersma; Rob H Stierum; Jos C Kleinjans; Jan A Kors
Journal:  BMC Med Genomics       Date:  2013-01-29       Impact factor: 3.063
