

NLProt: extracting protein names and sequences from papers.

Abstract

Automatically extracting protein names from the literature and linking these names to the associated entries in sequence databases is becoming increasingly important for annotating biological databases. NLProt is a novel system that combines dictionary- and rule-based filtering with several support vector machines (SVMs) to tag protein names in PubMed abstracts. When considering partially tagged names as errors, NLProt still reached a precision of 75% at a recall of 76%. By many criteria our system outperformed other tagging methods significantly; in particular, it proved very reliable even for novel names. Names encountered particularly frequently in Drosophila, such as white, wing and bizarre, constitute an obvious limitation of NLProt. Our method is available both as an Internet server and as a program for download (http://cubic.bioc.columbia.edu/services/NLProt/). Input can be PubMed/MEDLINE identifiers, authors, titles and journals, as well as collections of abstracts, or entire papers.
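The strict scoring mentioned in the abstract, where a partially tagged name counts as an error, can be illustrated with a minimal sketch. This is not NLProt's evaluation code: the function name `strict_precision_recall` and the character spans below are hypothetical, and the sketch assumes each protein name is represented as a (start, end) offset pair.

```python
def strict_precision_recall(gold, predicted):
    """Exact-span precision and recall; partial overlaps count as errors."""
    gold_set = set(gold)
    pred_set = set(predicted)
    # Only spans whose start and end both match the gold annotation are correct.
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical (start, end) character spans for one abstract.
gold = [(0, 5), (10, 18), (25, 30), (40, 47)]
predicted = [(0, 5), (10, 15), (25, 30), (50, 55)]

precision, recall = strict_precision_recall(gold, predicted)
print(precision, recall)
```

In this made-up example the prediction (10, 15) overlaps the gold span (10, 18) but does not match it exactly, so it lowers both precision and recall, which is the sense in which partially tagged names are treated as errors.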



Year: 2004 PMID: 15215466 PMCID: PMC441565 DOI: 10.1093/nar/gkh427

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

17 in total

1. Detecting Gene Symbols and Names in Biological Texts: A First Step toward Pertinent Information Extraction.

Authors:
Journal: Genome Inform Ser Workshop Genome Inform Date: 1998

Review 2. The bioinformatics of microarray gene expression profiling.

Authors: John N Weinstein; Uwe Scherf; Jae K Lee; Satoshi Nishizuka; Fuad Gwadry; Ajay Kim Bussey; S Kim; Lawrence H Smith; Lorraine Tanabe; Samuel Richman; Jessie Alexander; Hosein Kouros-Mehr; Alika Maunakea; William C Reinhold
Journal: Cytometry Date: 2002-01-01

3. A biological named entity recognizer.

Authors: Meenakshi Narayanaswamy; K E Ravikumar; K Vijay-Shanker
Journal: Pac Symp Biocomput Date: 2003

4. Protein names and how to find them.

Authors: Kristofer Franzén; Gunnar Eriksson; Fredrik Olsson; Lars Asker; Per Lidén; Joakim Cöster
Journal: Int J Med Inform Date: 2002-12-04 Impact factor: 4.046

5. GAPSCORE: finding gene and protein names one word at a time.

Authors: Jeffrey T Chang; Hinrich Schütze; Russ B Altman
Journal: Bioinformatics Date: 2004-01-22 Impact factor: 6.937

6. Tagging gene and protein names in biomedical text.

Authors: Lorraine Tanabe; W John Wilbur
Journal: Bioinformatics Date: 2002-08 Impact factor: 6.937

7. Database of homology-derived protein structures and the structural meaning of sequence alignment.

Authors: C Sander; R Schneider
Journal: Proteins Date: 1991

8. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000.

Authors: A Bairoch; R Apweiler
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

9. Toward information extraction: identifying protein names from biological papers.

Authors: K Fukuda; A Tamura; T Tsunoda; T Takagi
Journal: Pac Symp Biocomput Date: 1998

10. Local alignment statistics.

Authors: S F Altschul; W Gish
Journal: Methods Enzymol Date: 1996 Impact factor: 1.600

14 in total

Review 1. Bioinformatics for personal genome interpretation.

Authors: Emidio Capriotti; Nathan L Nehrt; Maricel G Kann; Yana Bromberg
Journal: Brief Bioinform Date: 2012-01-13 Impact factor: 11.622

2. BioTagger-GM: a gene/protein name recognition system.

Authors: Manabu Torii; Zhangzhi Hu; Cathy H Wu; Hongfang Liu
Journal: J Am Med Inform Assoc Date: 2008-12-11 Impact factor: 4.497

3. LAITOR--Literature Assistant for Identification of Terms co-Occurrences and Relationships.

Authors: Adriano Barbosa-Silva; Theodoros G Soldatos; Ivan L F Magalhães; Georgios A Pavlopoulos; Jean-Fred Fontaine; Miguel A Andrade-Navarro; Reinhard Schneider; J Miguel Ortega
Journal: BMC Bioinformatics Date: 2010-02-01 Impact factor: 3.169

Review 4. What the papers say: text mining for genomics and systems biology.

Authors: Nathan Harmston; Wendy Filsell; Michael P H Stumpf
Journal: Hum Genomics Date: 2010-10 Impact factor: 4.639

5. Chapter 15: disease gene prioritization.

Authors: Yana Bromberg
Journal: PLoS Comput Biol Date: 2013-04-25 Impact factor: 4.475

6. PESCADOR, a web-based tool to assist text-mining of biointeractions extracted from PubMed queries.

Authors: Adriano Barbosa-Silva; Jean-Fred Fontaine; Elisa R Donnard; Fernanda Stussi; J Miguel Ortega; Miguel A Andrade-Navarro
Journal: BMC Bioinformatics Date: 2011-11-09 Impact factor: 3.307

7. A linear classifier based on entity recognition tools and a statistical approach to method extraction in the protein-protein interaction literature.

Authors: Anália Lourenço; Michael Conover; Andrew Wong; Azadeh Nematzadeh; Fengxia Pan; Hagit Shatkay; Luis M Rocha
Journal: BMC Bioinformatics Date: 2011-10-03 Impact factor: 3.169

8. Semantic annotation of biological concepts interplaying microbial cellular responses.

Authors: Rafael Carreira; Sónia Carneiro; Rui Pereira; Miguel Rocha; Isabel Rocha; Eugénio C Ferreira; Anália Lourenço
Journal: BMC Bioinformatics Date: 2011-11-28 Impact factor: 3.169

9. Context-specific protein network miner--an online system for exploring context-specific protein interaction networks from the literature.

Authors: Rajesh Chowdhary; Sin Lam Tan; Jinfeng Zhang; Shreyas Karnik; Vladimir B Bajic; Jun S Liu
Journal: PLoS One Date: 2012-04-06 Impact factor: 3.240

10. Using nanoinformatics methods for automatically identifying relevant nanotoxicology entities from the literature.

Authors: Miguel García-Remesal; Alejandro García-Ruiz; David Pérez-Rey; Diana de la Iglesia; Víctor Maojo
Journal: Biomed Res Int Date: 2012-12-27 Impact factor: 3.411