Predicting the sub-cellular location of proteins from text using support vector machines.

B J Stapley, L A Kelley, M J E Sternberg.

Abstract

We present an automatic method for classifying the sub-cellular location of proteins from the text of relevant MEDLINE abstracts. For each protein, a vector of terms is generated from the MEDLINE abstracts in which the name of the protein or gene, or a synonym, occurs. A Support Vector Machine (SVM) automatically partitions the term space and thereby identifies the textual features that distinguish sub-cellular locations. The method is benchmarked on a set of S. cerevisiae proteins of known sub-cellular location. Neither prior knowledge of the problem domain nor natural language processing is used at any stage. The method outperforms support vector machines trained on amino acid composition and performs comparably to rule-based text classifiers. Combining text with amino acid composition improves recall for some sub-cellular locations. We discuss the generality of the method and its potential application to a variety of biological classification problems.
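The pipeline described in the abstract (pool the terms of every abstract that mentions a protein, then let an SVM separate the term space) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy abstracts, raw term counts as features, a linear kernel, training by plain subgradient descent on the hinge loss, and a two-class setup (nucleus as +1, mitochondrion as -1) standing in for what would presumably be one classifier per location.

```python
from collections import Counter

def term_vector(abstracts):
    """Pool raw term counts over every abstract that mentions one protein."""
    counts = Counter()
    for text in abstracts:
        counts.update(text.lower().split())  # whitespace split only, no NLP
    return counts

def score(w, b, x):
    """Decision value of term vector x under weights w and bias b."""
    return sum(w.get(term, 0.0) * n for term, n in x.items()) + b

def train_linear_svm(vectors, labels, epochs=20, eta=0.1, lam=0.01):
    """Minimise hinge loss plus an L2 penalty by subgradient descent.

    labels must be +1 or -1. Returns (weights dict, bias).
    eta and lam are illustrative values, not tuned.
    """
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            violated = y * score(w, b, x) < 1  # inside the margin or misclassified
            for term in w:                      # L2 shrinkage on every step
                w[term] *= 1.0 - eta * lam
            if violated:                        # hinge-loss subgradient step
                for term, n in x.items():
                    w[term] = w.get(term, 0.0) + eta * y * n
                b += eta * y
    return w, b

def predict(w, b, abstracts):
    """Return +1 or -1 for a protein described by a list of abstracts."""
    return 1 if score(w, b, term_vector(abstracts)) >= 0 else -1

# Hypothetical toy corpus: (abstracts mentioning a protein, location label).
training = [
    (["localized to the nucleus chromatin", "nuclear transcription factor"], +1),
    (["mitochondrial membrane respiration", "imported into mitochondria"], -1),
    (["binds chromatin in the nucleus"], +1),
    (["mitochondria respiration complex"], -1),
]
vectors = [term_vector(abstracts) for abstracts, _ in training]
labels = [label for _, label in training]
w, b = train_linear_svm(vectors, labels)

nuclear_call = predict(w, b, ["chromatin remodelling nucleus"])
mito_call = predict(w, b, ["mitochondria respiration chain"])
```

The learned weights in `w` show which terms pull a protein toward each class, which is the sense in which the classifier picks out the textual features that define a location. A real run would need far more abstracts per protein and, most likely, some form of term weighting and a held-out benchmark; none of that is attempted here.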

Year:  2002        PMID: 11928491     DOI: 10.1142/9789812799623_0035

Source DB:  PubMed          Journal:  Pac Symp Biocomput        ISSN: 2335-6928

