Unsupervised learning assisted robust prediction of bioluminescent proteins.

Abstract

Bioluminescence plays an important role in nature; for example, bacteria use it for intracellular chemical signalling. It also serves as a reagent in analytical research methods ranging from cellular imaging to gene expression analysis. However, identifying and annotating bioluminescent proteins is difficult because they share little sequence similarity with one another. In this paper, we present a novel approach that balances a training dataset both within and between classes, and diversifies it, by combining the unsupervised K-Means algorithm with the Synthetic Minority Oversampling Technique (SMOTE), so that the true performance of the prediction model can be achieved. We further varied the ratio of positive to negative data in the training dataset to probe for the optimal class distribution, that is, the one that produces the best prediction accuracy. The appropriately balanced and diversified training set resulted in near-complete learning with greater generalization on the blind test datasets. These results strongly support the view that an optimal class distribution with a high degree of diversity is essential for near-perfect learning. Using random forest as the weak learner in boosting, trained on the optimally balanced and diversified training dataset, we achieved an overall accuracy of 95.3% in tenfold cross-validation, and an accuracy of 91.7%, sensitivity of 89.3% and specificity of 91.8% on a holdout test set. The general framework discussed in this work could plausibly be applied to other biological datasets to deal with class imbalance and incomplete learning.
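The abstract does not spell out how K-Means and SMOTE are combined, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It clusters the minority (positive) class with K-Means, then generates SMOTE-style synthetic points inside each cluster so that small clusters are topped up to an equal share (within-class balancing) while the class as a whole grows towards a chosen ratio of the majority size (between-class balancing). The function names, the equal-share rule, and the parameters ratio, k_clusters and k_neighbors are all assumptions made for illustration. It uses only the Python standard library (Python 3.8+ for math.dist).

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for each point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # requires k <= len(points)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def smote_within(cluster, n_new, k_neighbors, rng):
    """SMOTE-style interpolation restricted to the points of one cluster."""
    if len(cluster) < 2:
        # No neighbor to interpolate with; fall back to duplicating the point.
        return [list(rng.choice(cluster)) for _ in range(n_new)]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(cluster)
        neighbors = sorted((p for p in cluster if p is not base),
                           key=lambda p: math.dist(base, p))[:k_neighbors]
        mate = rng.choice(neighbors)
        gap = rng.random()  # position on the segment from base to mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


def balance_minority(minority, n_majority, ratio=1.0, k_clusters=2,
                     k_neighbors=5, seed=0):
    """Oversample `minority` towards ratio * n_majority (hypothetical rule:
    every cluster is topped up to the same share of that target)."""
    rng = random.Random(seed)
    labels = kmeans(minority, k_clusters, seed=seed)
    clusters = [[p for p, lab in zip(minority, labels) if lab == c]
                for c in range(k_clusters)]
    clusters = [c for c in clusters if c]  # drop empty clusters
    target_total = max(len(minority), round(ratio * n_majority))
    per_cluster = target_total // len(clusters)
    balanced = [list(p) for p in minority]  # original points are kept
    for cluster in clusters:
        n_new = max(0, per_cluster - len(cluster))
        balanced.extend(smote_within(cluster, n_new, k_neighbors, rng))
    return balanced


if __name__ == "__main__":
    # Toy 2-D "positive" class with two loose groups; 20 negatives assumed.
    positives = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2)]
    balanced = balance_minority(positives, n_majority=20, ratio=0.5)
    print(len(positives), "->", len(balanced))
```

Sweeping ratio over several values and retraining the classifier at each one would correspond to the search for an optimal class distribution that the abstract describes. In practice, library implementations of K-Means and SMOTE would replace these hand-written helpers.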

Keywords: Class imbalance; K-Means; Optimal class distribution; SMOTE; Training set diversity

Substances:
Luminescent Proteins

Year: 2015 PMID: 26599828 DOI: 10.1016/j.compbiomed.2015.10.013

Source DB: PubMed Journal: Comput Biol Med ISSN: 0010-4825 Impact factor: 4.589

Cited by

3 in total

1. Probing an optimal class distribution for enhancing prediction and feature characterization of plant virus-encoded RNA-silencing suppressors.

Authors: Abhigyan Nath; Karthikeyan Subbiah
Journal: 3 Biotech Date: 2016-03-21 Impact factor: 2.406

2. Prediction of bioluminescent proteins by using sequence-derived features and lineage-specific scheme.

Authors: Jian Zhang; Haiting Chai; Guifu Yang; Zhiqiang Ma
Journal: BMC Bioinformatics Date: 2017-06-05 Impact factor: 3.169

3. iBLP: An XGBoost-Based Predictor for Identifying Bioluminescent Proteins.

Authors: Dan Zhang; Hua-Dong Chen; Hasan Zulfiqar; Shi-Shi Yuan; Qin-Lai Huang; Zhao-Yue Zhang; Ke-Jun Deng
Journal: Comput Math Methods Med Date: 2021-01-07 Impact factor: 2.238