NLProt: extracting protein names and sequences from papers.

Sven Mika, Burkhard Rost.

Abstract

Automatically extracting protein names from the literature and linking these names to the associated entries in sequence databases is becoming increasingly important for annotating biological databases. NLProt is a novel system that combines dictionary- and rule-based filtering with several support vector machines (SVMs) to tag protein names in PubMed abstracts. When considering partially tagged names as errors, NLProt still reached a precision of 75% at a recall of 76%. By many criteria our system outperformed other tagging methods significantly; in particular, it proved very reliable even for novel names. Names encountered particularly frequently in Drosophila, such as white, wing and bizarre, constitute an obvious limitation of NLProt. Our method is available both as an Internet server and as a program for download (http://cubic.bioc.columbia.edu/services/NLProt/). Input can be PubMed/MEDLINE identifiers, authors, titles and journals, as well as collections of abstracts, or entire papers.
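The abstract reports precision and recall under a strict criterion in which a partially tagged name counts as an error. A minimal sketch of that kind of scoring follows. The function name, the representation of names as (start, end) character offsets, and the example spans are all hypothetical and are not taken from NLProt itself.

```python
def strict_precision_recall(predicted, gold):
    """Precision and recall where only exact span matches count as correct.

    A predicted span that overlaps a gold span without matching both
    boundaries (a partially tagged name) is treated as an error.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical spans as (start, end) character offsets in one abstract.
gold_spans = [(0, 4), (10, 18), (25, 31), (40, 47)]
# (10, 15) only partially covers the gold name at (10, 18), so it is an error.
predicted_spans = [(0, 4), (10, 15), (25, 31), (50, 55)]

precision, recall = strict_precision_recall(predicted_spans, gold_spans)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under a more lenient criterion the partial match at (10, 15) could be credited, which would raise both figures; the strict variant shown here is the more conservative choice.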

Year:  2004        PMID: 15215466      PMCID: PMC441565          DOI: 10.1093/nar/gkh427

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


References: 17 in total (10 listed)

1.  Detecting Gene Symbols and Names in Biological Texts: A First Step toward Pertinent Information Extraction.

Authors: 
Journal:  Genome Inform Ser Workshop Genome Inform       Date:  1998

Review 2.  The bioinformatics of microarray gene expression profiling.

Authors:  John N Weinstein; Uwe Scherf; Jae K Lee; Satoshi Nishizuka; Fuad Gwadry; Ajay Kim Bussey; S Kim; Lawrence H Smith; Lorraine Tanabe; Samuel Richman; Jessie Alexander; Hosein Kouros-Mehr; Alika Maunakea; William C Reinhold
Journal:  Cytometry       Date:  2002-01-01

3.  A biological named entity recognizer.

Authors:  Meenakshi Narayanaswamy; K E Ravikumar; K Vijay-Shanker
Journal:  Pac Symp Biocomput       Date:  2003

4.  Protein names and how to find them.

Authors:  Kristofer Franzén; Gunnar Eriksson; Fredrik Olsson; Lars Asker; Per Lidén; Joakim Cöster
Journal:  Int J Med Inform       Date:  2002-12-04       Impact factor: 4.046

5.  GAPSCORE: finding gene and protein names one word at a time.

Authors:  Jeffrey T Chang; Hinrich Schütze; Russ B Altman
Journal:  Bioinformatics       Date:  2004-01-22       Impact factor: 6.937

6.  Tagging gene and protein names in biomedical text.

Authors:  Lorraine Tanabe; W John Wilbur
Journal:  Bioinformatics       Date:  2002-08       Impact factor: 6.937

7.  Database of homology-derived protein structures and the structural meaning of sequence alignment.

Authors:  C Sander; R Schneider
Journal:  Proteins       Date:  1991

8.  The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000.

Authors:  A Bairoch; R Apweiler
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

9.  Toward information extraction: identifying protein names from biological papers.

Authors:  K Fukuda; A Tamura; T Tsunoda; T Takagi
Journal:  Pac Symp Biocomput       Date:  1998

10.  Local alignment statistics.

Authors:  S F Altschul; W Gish
Journal:  Methods Enzymol       Date:  1996       Impact factor: 1.600

Cited by: 14 in total (10 listed)

Review 1.  Bioinformatics for personal genome interpretation.

Authors:  Emidio Capriotti; Nathan L Nehrt; Maricel G Kann; Yana Bromberg
Journal:  Brief Bioinform       Date:  2012-01-13       Impact factor: 11.622

2.  BioTagger-GM: a gene/protein name recognition system.

Authors:  Manabu Torii; Zhangzhi Hu; Cathy H Wu; Hongfang Liu
Journal:  J Am Med Inform Assoc       Date:  2008-12-11       Impact factor: 4.497

3.  LAITOR--Literature Assistant for Identification of Terms co-Occurrences and Relationships.

Authors:  Adriano Barbosa-Silva; Theodoros G Soldatos; Ivan L F Magalhães; Georgios A Pavlopoulos; Jean-Fred Fontaine; Miguel A Andrade-Navarro; Reinhard Schneider; J Miguel Ortega
Journal:  BMC Bioinformatics       Date:  2010-02-01       Impact factor: 3.169

Review 4.  What the papers say: text mining for genomics and systems biology.

Authors:  Nathan Harmston; Wendy Filsell; Michael P H Stumpf
Journal:  Hum Genomics       Date:  2010-10       Impact factor: 4.639

5.  Chapter 15: disease gene prioritization.

Authors:  Yana Bromberg
Journal:  PLoS Comput Biol       Date:  2013-04-25       Impact factor: 4.475

6.  PESCADOR, a web-based tool to assist text-mining of biointeractions extracted from PubMed queries.

Authors:  Adriano Barbosa-Silva; Jean-Fred Fontaine; Elisa R Donnard; Fernanda Stussi; J Miguel Ortega; Miguel A Andrade-Navarro
Journal:  BMC Bioinformatics       Date:  2011-11-09       Impact factor: 3.307

7.  A linear classifier based on entity recognition tools and a statistical approach to method extraction in the protein-protein interaction literature.

Authors:  Anália Lourenço; Michael Conover; Andrew Wong; Azadeh Nematzadeh; Fengxia Pan; Hagit Shatkay; Luis M Rocha
Journal:  BMC Bioinformatics       Date:  2011-10-03       Impact factor: 3.169

8.  Semantic annotation of biological concepts interplaying microbial cellular responses.

Authors:  Rafael Carreira; Sónia Carneiro; Rui Pereira; Miguel Rocha; Isabel Rocha; Eugénio C Ferreira; Anália Lourenço
Journal:  BMC Bioinformatics       Date:  2011-11-28       Impact factor: 3.169

9.  Context-specific protein network miner--an online system for exploring context-specific protein interaction networks from the literature.

Authors:  Rajesh Chowdhary; Sin Lam Tan; Jinfeng Zhang; Shreyas Karnik; Vladimir B Bajic; Jun S Liu
Journal:  PLoS One       Date:  2012-04-06       Impact factor: 3.240

10.  Using nanoinformatics methods for automatically identifying relevant nanotoxicology entities from the literature.

Authors:  Miguel García-Remesal; Alejandro García-Ruiz; David Pérez-Rey; Diana de la Iglesia; Víctor Maojo
Journal:  Biomed Res Int       Date:  2012-12-27       Impact factor: 3.411
