
Imbalanced Protein Data Classification Using Ensemble FTM-SVM.


Abstract

Classifying protein sequences into functional and structural families with machine learning methods is an active research topic in machine learning and bioinformatics. The underlying protein classification problem is a large multiclass problem, which can generally be reduced to a set of binary classification problems: the proteins in one class are treated as positive examples and those outside the class as negative examples. This reduction introduces a class imbalance problem, because the number of proteins in one class is usually much smaller than the number of proteins outside it. To handle this challenge, we propose a novel framework for protein classification. First, we use free scores (FS) to extract features from the proteins; then, inverse random under sampling (IRUS) is used to create a large number of distinct training sets; next, we use a new ensemble approach to combine these distinct training sets with a new fuzzy total margin support vector machine (FTM-SVM) that we have constructed. We call the resulting ensemble classifier the ensemble fuzzy total margin support vector machine (EnFTM-SVM). We then give a full description of our method, including the details of its derivation. Finally, experimental results on fourteen benchmark protein data sets indicate that the proposed method outperforms many state-of-the-art protein classification methods.
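The abstract does not spell out the sampling or combination steps, so the following is only a minimal sketch of the general idea, under stated assumptions. It shows the one-vs-rest reduction, an IRUS-style sampler that keeps every positive example and draws a small random subset of negatives for each training set, and a plain majority vote. The function names, the `neg_per_set` parameter, and the majority vote are illustrative assumptions; the FTM-SVM base classifier and the authors' actual ensemble rule are not reproduced here.

```python
import random
from typing import List, Sequence


def one_vs_rest_labels(families: Sequence[str], target: str) -> List[int]:
    """Label proteins in the target family +1 and all other proteins -1."""
    return [1 if family == target else -1 for family in families]


def irus_subsets(labels: Sequence[int], n_sets: int, neg_per_set: int,
                 seed: int = 0) -> List[List[int]]:
    """Build n_sets index lists: all positives plus a random draw of negatives.

    Assumption: neg_per_set is chosen by the caller (typically small, so the
    negatives no longer dominate each training set).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == -1]
    if neg_per_set > len(neg):
        raise ValueError("neg_per_set exceeds the number of negative examples")
    # Each subset samples negatives without replacement, independently.
    return [sorted(pos + rng.sample(neg, neg_per_set)) for _ in range(n_sets)]


def majority_vote(votes: Sequence[int]) -> int:
    """Combine +1/-1 votes from the base classifiers (hypothetical rule)."""
    return 1 if sum(votes) > 0 else -1


# Toy data: 4 proteins in a hypothetical "kinase" family, 20 outside it.
families = ["kinase"] * 4 + ["other"] * 20
labels = one_vs_rest_labels(families, "kinase")
subsets = irus_subsets(labels, n_sets=5, neg_per_set=2)
```

In a full pipeline, one base classifier would be trained on each index list in `subsets`, and their predictions for a new protein would be passed to the combination rule.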


Year: 2015; PMID: 25966480; DOI: 10.1109/TNB.2015.2431292

Source DB: PubMed; Journal: IEEE Trans Nanobioscience; ISSN: 1536-1241; Impact factor: 2.935


  2 in total

1.  SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity.

Authors:  Ying Hong Li; Jing Yu Xu; Lin Tao; Xiao Feng Li; Shuang Li; Xian Zeng; Shang Ying Chen; Peng Zhang; Chu Qin; Cheng Zhang; Zhe Chen; Feng Zhu; Yu Zong Chen
Journal:  PLoS One       Date:  2016-08-15       Impact factor: 3.240

2.  Network-based protein structural classification.

Authors:  Khalique Newaz; Mahboobeh Ghalehnovi; Arash Rahnama; Panos J Antsaklis; Tijana Milenković
Journal:  R Soc Open Sci       Date:  2020-06-03       Impact factor: 2.963

