Predicting the sub-cellular location of proteins from text using support vector machines.

B J Stapley, L A Kelley, M J E Sternberg.

Abstract

We present an automatic method for classifying the sub-cellular location of proteins from the text of relevant MEDLINE abstracts. For each protein, a vector of terms is generated from the MEDLINE abstracts in which the name of the protein or gene, or one of its synonyms, occurs. A Support Vector Machine (SVM) is used to partition the term space automatically and thus to discriminate the textual features that define sub-cellular location. The method is benchmarked on a set of S. cerevisiae proteins of known sub-cellular location. Neither prior knowledge of the problem domain nor natural language processing is used at any stage. The method outperforms support vector machines trained on amino acid composition and performs comparably to rule-based text classifiers. Combining text with amino acid composition improves recall for some sub-cellular locations. We discuss the generality of the method and its potential application to a variety of biological classification problems.
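The pipeline described in the abstract (per-protein term vectors, then an SVM over the term space) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the corpus, the protein names (`alpha1`, `a1p`, `omega5`, and so on) and the two-class setup are invented for the example, whitespace tokenisation with raw counts is assumed, and a Pegasos-style stochastic subgradient solver without a bias term stands in for whichever SVM package and kernel the paper actually used.

```python
from collections import Counter

# Toy corpus with made-up protein names; real input would be MEDLINE abstracts.
ABSTRACTS = [
    "alpha1 histone packs chromatin inside nucleus",
    "a1p enters nucleus",
    "beta2 pump spans plasma membrane",
    "gamma3 binds telomere chromatin inside nucleus",
    "delta4 transporter spans plasma membrane",
    "omega5 accumulates inside nucleus",
]


def term_vector(abstracts, names):
    """Sum raw term counts over every abstract that mentions any of the names."""
    counts = Counter()
    for text in abstracts:
        tokens = text.lower().split()
        if any(name.lower() in tokens for name in names):
            counts.update(tokens)
    return counts


def score(weights, vector):
    """Linear decision value: dot product of sparse weights and a term vector."""
    return sum(weights.get(term, 0.0) * count for term, count in vector.items())


def train_linear_svm(vectors, labels, lam=0.01, epochs=20):
    """Pegasos-style stochastic subgradient descent on the hinge loss (no bias)."""
    weights, step = {}, 0
    for _ in range(epochs):
        for vector, label in zip(vectors, labels):
            step += 1
            eta = 1.0 / (lam * step)  # decaying learning rate
            violated = label * score(weights, vector) < 1.0
            for term in weights:
                weights[term] *= 1.0 - 1.0 / step  # shrink by (1 - eta * lam)
            if violated:  # margin violated: move weights towards this example
                for term, count in vector.items():
                    weights[term] = weights.get(term, 0.0) + eta * label * count
    return weights


# (names and synonyms, label): +1 = nucleus, -1 = plasma membrane (hypothetical).
TRAINING = [
    (["alpha1", "a1p"], +1),
    (["beta2"], -1),
    (["gamma3"], +1),
    (["delta4"], -1),
]
vectors = [term_vector(ABSTRACTS, names) for names, _ in TRAINING]
labels = [label for _, label in TRAINING]
weights = train_linear_svm(vectors, labels)

# Classify a protein that was not part of the training set.
held_out = term_vector(ABSTRACTS, ["omega5"])
prediction = "nucleus" if score(weights, held_out) > 0 else "plasma membrane"
print(prediction)
```

Covering more than two locations would need one such classifier per location (one-vs-rest) or a multi-class SVM. Adding amino acid composition would amount to appending further features to each vector before training.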

Substances: Fungal Proteins

Year: 2002; PMID: 11928491; DOI: 10.1142/9789812799623_0035

Source DB: PubMed; Journal: Pac Symp Biocomput; ISSN: 2335-6928

Cited

15 in total

1. Using text analysis to identify functionally coherent gene groups.

2. On the pH-optimum of activity and stability of proteins.

3. A stacked graphical model for associating sub-images with sub-captions.

4. Integrated analysis of yeast regulatory sequences for biologically linked clusters of genes.

5. The Text-mining based PubChem Bioassay neighboring analysis.

6. Characterization and sequence prediction of structural variations in α-helix.

7. Automatic extraction of protein point mutations using a graph bigram association.

8. Annotation of protein residues based on a literature analysis: cross-validation against UniProtKb.

9. Functional gene clustering via gene annotation sentences, MeSH and GO keywords from biomedical literature.

10. Improving classification in protein structure databases using text mining.