Prediction of protein subcellular localization with oversampling approach and Chou's general PseAAC.

Shengli Zhang, Xin Duan.

Abstract

Predicting protein subcellular location with support vector machines (SVMs) has recently become a popular research area because of the dramatic growth of bioinformatics data. Although substantial progress has been made, few researchers have considered the problem of data imbalance before classification, which leads to low accuracy for some categories. In this work, we therefore combined an oversampling method with an SVM to predict protein subcellular localization on imbalanced data sets. To capture valuable information about a protein, a PseAAC (pseudo amino acid composition) feature vector was extracted from the PSSM (position-specific scoring matrix), and its features were then selected by principal component analysis (PCA). Next, the samples were oversampled to eliminate the imbalance in sample numbers across classes and fed into the SVM to predict the protein subcellular location. To evaluate the performance of the proposed method, jackknife tests were performed on three benchmark datasets (ZD98, CL317 and ZW225). Jackknife results for SVM experiments with and without oversampling show that the oversampling method successfully reduced the imbalance of the data sets, and the prediction accuracy of each class in each dataset was higher than 88.9%. Compared with other protein subcellular localization methods, the method in this work achieved the best performance. The overall accuracies on ZD98, CL317 and ZW225 were 93.2%, 96.00% and 92.15%, respectively, which are 2.4%, 8.0% and 8.2% higher than the best methods in the comparison. The excellent overall accuracy of the proposed method indicates that the feature representation and selection capture useful information from the protein sequence and that oversampling successfully addresses the imbalance of sample numbers in SVM classification.
Copyright © 2017 Elsevier Ltd. All rights reserved.
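
The oversampling and jackknife steps described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. Several points are assumptions: the random-duplication form of oversampling, the choice to oversample only the training portion of each jackknife (leave-one-out) fold, the function names, and the toy data. A nearest-centroid classifier stands in for the SVM so that the example runs with the Python standard library alone; PseAAC extraction from the PSSM and PCA are omitted.

```python
import random
from collections import Counter, defaultdict


def random_oversample(X, y, rng):
    """Duplicate randomly chosen samples of the smaller classes until
    every class has as many samples as the largest class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def fit_centroids(X, y):
    """Stand-in classifier (NOT an SVM): one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label of the closest class centroid (squared Euclidean)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def jackknife_per_class_accuracy(X, y, seed=0):
    """Leave-one-out test: hold out each sample once, balance the remaining
    samples by oversampling, train, and score the held-out sample."""
    rng = random.Random(seed)
    hits, totals = Counter(), Counter(y)
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        X_bal, y_bal = random_oversample(X_train, y_train, rng)
        model = fit_centroids(X_bal, y_bal)
        if predict(model, X[i]) == y[i]:
            hits[y[i]] += 1
    return {label: hits[label] / totals[label] for label in totals}


if __name__ == "__main__":
    # Hypothetical two-class toy data with a 4:2 class imbalance.
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [1.0, 1.0], [1.1, 1.0]]
    y = ["cyto", "cyto", "cyto", "cyto", "nucl", "nucl"]
    print(jackknife_per_class_accuracy(X, y))
```

Reporting accuracy per class rather than only overall matches the abstract's concern that imbalance lowers accuracy for some categories. In practice the `fit_centroids` and `predict` stand-ins would be replaced by an SVM trained on PCA-reduced PseAAC features.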

Keywords:  Jackknife tests; Oversampling; PCA; PSSM; PseAAC; SVM

Year:  2017        PMID: 29100918     DOI: 10.1016/j.jtbi.2017.10.030

Source DB:  PubMed          Journal:  J Theor Biol        ISSN: 0022-5193            Impact factor:   2.691


  15 in total

1.  Predicting membrane proteins and their types by extracting various sequence features into Chou's general PseAAC.

Authors:  Ahmad Hassan Butt; Nouman Rasool; Yaser Daanial Khan
Journal:  Mol Biol Rep       Date:  2018-09-20       Impact factor: 2.316

2.  Subcellular location prediction of apoptosis proteins using two novel feature extraction methods based on evolutionary information and LDA.

Authors:  Lei Du; Qingfang Meng; Yuehui Chen; Peng Wu
Journal:  BMC Bioinformatics       Date:  2020-05-24       Impact factor: 3.169

3.  Some illuminating remarks on molecular genetics and genomics as well as drug development. (Review)

Authors:  Kuo-Chen Chou
Journal:  Mol Genet Genomics       Date:  2020-01-01       Impact factor: 3.291

4.  Understanding molecular mechanisms of disease through spatial proteomics. (Review)

Authors:  Sandra Pankow; Salvador Martínez-Bartolomé; Casimir Bamberger; John R Yates
Journal:  Curr Opin Chem Biol       Date:  2018-10-09       Impact factor: 8.822

5.  Self-evoluting framework of deep convolutional neural network for multilocus protein subcellular localization.

Authors:  Hanhan Cong; Hong Liu; Yuehui Chen; Yi Cao
Journal:  Med Biol Eng Comput       Date:  2020-10-20       Impact factor: 2.602

6.  Machine and Deep Learning for Prediction of Subcellular Localization.

Authors:  Gaofeng Pan; Chao Sun; Zijun Liao; Jijun Tang
Journal:  Methods Mol Biol       Date:  2021

7.  Multiple Protein Subcellular Locations Prediction Based on Deep Convolutional Neural Networks with Self-Attention Mechanism.

Authors:  Hanhan Cong; Hong Liu; Yi Cao; Yuehui Chen; Cheng Liang
Journal:  Interdiscip Sci       Date:  2022-01-23       Impact factor: 2.233

8.  iRNA-3typeA: Identifying Three Types of Modification at RNA's Adenosine Sites.

Authors:  Wei Chen; Pengmian Feng; Hui Yang; Hui Ding; Hao Lin; Kuo-Chen Chou
Journal:  Mol Ther Nucleic Acids       Date:  2018-03-30       Impact factor: 8.886

9.  iRSpot-Pse6NC: Identifying recombination spots in Saccharomyces cerevisiae by incorporating hexamer composition into general PseKNC.

Authors:  Hui Yang; Wang-Ren Qiu; Guoqing Liu; Feng-Biao Guo; Wei Chen; Kuo-Chen Chou; Hao Lin
Journal:  Int J Biol Sci       Date:  2018-05-22       Impact factor: 6.580

10.  Predictions of Apoptosis Proteins by Integrating Different Features Based on Improving Pseudo-Position-Specific Scoring Matrix.

Authors:  Xiaoli Ruan; Dongming Zhou; Rencan Nie; Yanbu Guo
Journal:  Biomed Res Int       Date:  2020-01-14       Impact factor: 3.411
