Classification and knowledge discovery in protein databases.

Predrag Radivojac, Nitesh V Chawla, A Keith Dunker, Zoran Obradovic.

Abstract

We consider the problem of classification in noisy, high-dimensional, and class-imbalanced protein datasets. In order to design a complete classification system, we use a three-stage machine learning framework consisting of a feature selection stage, a method addressing noise and class-imbalance, and a method for combining biologically related tasks through a prior-knowledge based clustering. In the first stage, we employ Fisher's permutation test as a feature selection filter. Comparisons with the alternative criteria show that it may be favorable for typical protein datasets. In the second stage, noise and class imbalance are addressed by using minority class over-sampling, majority class under-sampling, and ensemble learning. The performance of logistic regression models, decision trees, and neural networks is systematically evaluated. The experimental results show that in many cases ensembles of logistic regression classifiers may outperform more expressive models due to their robustness to noise and low sample density in a high-dimensional feature space. However, ensembles of neural networks may be the best solution for large datasets. In the third stage, we use prior knowledge to partition unlabeled data such that the class distributions among non-overlapping clusters significantly differ. In our experiments, training classifiers specialized to the class distributions of each cluster resulted in a further decrease in classification error.
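The first two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the test statistic or the sampling scheme, so the absolute difference of class means, the significance threshold `alpha`, and the equal-sized resampling with replacement are all assumptions, as are the function names `permutation_p_value`, `select_features`, and `balanced_samples`.

```python
import random
from statistics import mean


def permutation_p_value(values, labels, n_perm=2000, rng=None):
    """Estimate a two-sided permutation-test p-value for one feature.

    Assumption: the statistic is the absolute difference between the
    class means; the paper may use a different statistic.
    """
    rng = rng or random.Random(0)
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    observed = abs(mean(pos) - mean(neg))
    n_pos = len(pos)
    pooled = list(values)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the pooled values is equivalent to permuting the labels.
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_pos]) - mean(pooled[n_pos:]))
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


def select_features(X, labels, alpha=0.05, n_perm=2000, seed=0):
    """Return indices of features whose p-value is at most alpha (filter step)."""
    rng = random.Random(seed)
    keep = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        if permutation_p_value(column, labels, n_perm, rng) <= alpha:
            keep.append(j)
    return keep


def balanced_samples(labels, n_models=5, per_class=None, seed=0):
    """Draw one class-balanced index list per ensemble member.

    Assumption: each class is sampled with replacement to the same size
    (by default the minority-class size), which under-samples the majority
    class and resamples the minority class.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    k = per_class or min(len(pos), len(neg))
    return [rng.choices(pos, k=k) + rng.choices(neg, k=k)
            for _ in range(n_models)]
```

In such a setup, each index list from `balanced_samples` would train one base classifier (for example a logistic regression model) on the columns kept by `select_features`, and the ensemble prediction would average the members' outputs. Setting `per_class` above the minority-class size corresponds to over-sampling the minority class.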

Substances:
Proteins

Year: 2004 PMID: 15465476 DOI: 10.1016/j.jbi.2004.07.008

Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317

Cited by

8 in total

1. Imbalanced class learning in epigenetics.

Authors: M Muksitul Haque; Michael K Skinner; Lawrence B Holder
Journal: J Comput Biol Date: 2014-05-05 Impact factor: 1.479

2. Intrinsic disorder and functional proteomics. (Review)

Authors: Predrag Radivojac; Lilia M Iakoucheva; Christopher J Oldfield; Zoran Obradovic; Vladimir N Uversky; A Keith Dunker
Journal: Biophys J Date: 2006-12-08 Impact factor: 4.033

3. Analysis of structured and intrinsically disordered regions of transmembrane proteins.

Authors: Bin Xue; Liwei Li; Samy O Meroueh; Vladimir N Uversky; A Keith Dunker
Journal: Mol Biosyst Date: 2009-12

4. SMOTE for high-dimensional class-imbalanced data.

Authors: Rok Blagus; Lara Lusa
Journal: BMC Bioinformatics Date: 2013-03-22 Impact factor: 3.169

5. Predicting protein disorder by analyzing amino acid sequence.

Authors: Jack Y Yang; Mary Qu Yang
Journal: BMC Genomics Date: 2008-09-16 Impact factor: 3.969

6. Joint use of over- and under-sampling techniques and cross-validation for the development and assessment of prediction models.

Authors: Rok Blagus; Lara Lusa
Journal: BMC Bioinformatics Date: 2015-11-04 Impact factor: 3.169

7. Iterative nearest neighborhood oversampling in semisupervised learning from imbalanced data.

Authors: Fengqi Li; Chuang Yu; Nanhai Yang; Feng Xia; Guangming Li; Fatemeh Kaveh-Yazdy
Journal: ScientificWorldJournal Date: 2013-07-10

8. Global Phosphoproteomic Analysis Reveals the Involvement of Phosphorylation in Aflatoxins Biosynthesis in the Pathogenic Fungus Aspergillus flavus.

Authors: Silin Ren; Mingkun Yang; Yu Li; Feng Zhang; Zhuo Chen; Jia Zhang; Guang Yang; Yuewei Yue; Siting Li; Feng Ge; Shihua Wang
Journal: Sci Rep Date: 2016-09-26 Impact factor: 4.379
