Semi-supervised protein classification using cluster kernels.

Jason Weston, Christina Leslie, Eugene Ie, Dengyong Zhou, Andre Elisseeff, William Stafford Noble.

Abstract

MOTIVATION: Building an accurate protein classification system depends critically upon choosing a good representation of the input sequences of amino acids. Recent work using string kernels for protein data has achieved state-of-the-art classification performance. However, such representations are based only on labeled data (examples with known 3D structures, organized into structural classes), whereas in practice, unlabeled data are far more plentiful.
RESULTS: In this work, we develop simple and scalable cluster kernel techniques for incorporating unlabeled data into the representation of protein sequences. We show that our methods greatly improve the classification performance of string kernels and outperform standard approaches for using unlabeled data, such as adding close homologs of the positive examples to the training data. We achieve equal or superior performance to previously presented cluster kernel methods while at the same time achieving far greater computational efficiency.
AVAILABILITY: Source code is available at www.kyb.tuebingen.mpg.de/bs/people/weston/semiprot. The Spider MATLAB package is available at www.kyb.tuebingen.mpg.de/bs/people/spider.
SUPPLEMENTARY INFORMATION: www.kyb.tuebingen.mpg.de/bs/people/weston/semiprot.
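
As a rough illustration of the idea in the abstract, the sketch below builds a simple string kernel (a normalized k-mer spectrum kernel) and then averages it over each sequence's neighborhood in a pool of unlabeled sequences, so that unlabeled data reshapes the similarity between two labeled sequences. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the choice of `k=3`, and in particular the `threshold` rule that defines a neighborhood are hypothetical and chosen only for illustration; the abstract does not specify how neighborhoods or clusters are formed.

```python
from collections import Counter
from math import sqrt


def spectrum_features(seq, k=3):
    """Count every length-k substring (k-mer) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Normalized k-spectrum kernel: cosine similarity of k-mer counts."""
    fa, fb = spectrum_features(a, k), spectrum_features(b, k)
    dot = sum(count * fb[kmer] for kmer, count in fa.items())
    norm = sqrt(sum(c * c for c in fa.values()) * sum(c * c for c in fb.values()))
    return dot / norm if norm else 0.0


def neighborhood(seq, pool, threshold=0.3, k=3):
    """Hypothetical neighborhood rule (an assumption for this sketch):
    seq itself plus every pool sequence whose base kernel value with
    seq is at least `threshold`."""
    return [seq] + [u for u in pool
                    if u != seq and spectrum_kernel(seq, u, k) >= threshold]


def neighborhood_kernel(a, b, pool, threshold=0.3, k=3):
    """Average the base kernel over all pairs drawn from the two neighborhoods."""
    na = neighborhood(a, pool, threshold, k)
    nb = neighborhood(b, pool, threshold, k)
    total = sum(spectrum_kernel(x, y, k) for x in na for y in nb)
    return total / (len(na) * len(nb))


if __name__ == "__main__":
    # Toy amino-acid strings, invented purely for demonstration.
    unlabeled = ["MKVLAAGIV", "MKVLAAGLV", "GGSPQQRTW"]
    print(spectrum_kernel("MKVLAAGIV", "MKVLSAGIV"))
    print(neighborhood_kernel("MKVLAAGIV", "MKVLSAGIV", unlabeled))
```

With an empty pool each neighborhood contains only the sequence itself, so the averaged kernel reduces to the base kernel; a larger pool lets two sequences that share few k-mers directly still score as similar when their neighbors overlap.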

Substances: Proteins

Year: 2005 PMID: 15905279 DOI: 10.1093/bioinformatics/bti497

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited (25 in total)

1. Maximum margin classifier working in a set of strings.

Authors: Hitoshi Koyano; Morihiro Hayashida; Tatsuya Akutsu
Journal: Proc Math Phys Eng Sci Date: 2016-03 Impact factor: 2.704

2. Identifying Cancer Biomarkers From Microarray Data Using Feature Selection and Semisupervised Learning.

Authors: Debasis Chakraborty; Ujjwal Maulik
Journal: IEEE J Transl Eng Health Med Date: 2014-12-02 Impact factor: 3.316

3. Semi-supervised learning improves gene expression-based prediction of cancer recurrence.

Authors: Mingguang Shi; Bing Zhang
Journal: Bioinformatics Date: 2011-09-04 Impact factor: 6.937

4. Exploiting physico-chemical properties in string kernels.

Authors: Nora C Toussaint; Christian Widmer; Oliver Kohlbacher; Gunnar Rätsch
Journal: BMC Bioinformatics Date: 2010-10-26 Impact factor: 3.169

5. Accelerating the Original Profile Kernel.

Authors: Tobias Hamp; Tatyana Goldberg; Burkhard Rost
Journal: PLoS One Date: 2013-06-18 Impact factor: 3.240

6. Objective sequence-based subfamily classifications of mouse homeodomains reflect their in vitro DNA-binding preferences.

Authors: Miguel A Santos; Andrei L Turinsky; Serene Ong; Jennifer Tsai; Michael F Berger; Gwenael Badis; Shaheynoor Talukder; Andrew R Gehrke; Martha L Bulyk; Timothy R Hughes; Shoshana J Wodak
Journal: Nucleic Acids Res Date: 2010-08-12 Impact factor: 16.971

7. Semi-supervised prediction of protein subcellular localization using abstraction augmented Markov models.

8. Diversity and dispersal of a ubiquitous protein family: acyl-CoA dehydrogenases.

9. Searching remote homology with spectral clustering with symmetry in neighborhood cluster kernels.

10. Protein localization prediction using random walks on graphs.