Leyi Wei (1,2), Chen Zhou (1), Huangrong Chen (1), Jiangning Song (3,4), Ran Su (5,2)
1. School of Computer Science and Technology, Tianjin University, Tianjin, China.
2. State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin, China.
3. Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology.
4. Monash Centre for Data Science, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia.
5. School of Computer Software, Tianjin University, Tianjin, China.
Abstract
Motivation: Anti-cancer peptides (ACPs) have recently emerged as promising therapeutic agents for cancer treatment. Given the avalanche of protein sequence data in the post-genomic era, there is an urgent need for automated computational methods that can quickly and accurately identify novel ACPs among the vast number of candidate proteins and peptides.
Results: To address this need, we propose a novel predictor named Anti-Cancer peptide Predictor with Feature representation Learning (ACPred-FL) for accurate prediction of ACPs based on sequence information. More specifically, we develop an effective feature representation learning model that extracts and learns a set of informative features from a pool of support vector machine-based models trained using sequence-based feature descriptors. In this way, the class label information of the data samples is fully utilized. To improve the feature representation, we further employ a two-step feature selection technique, which yields the most informative five-dimensional feature vector as the final peptide representation. Experimental results show that these five features provide more discriminative power for identifying ACPs than currently available feature descriptors, highlighting the effectiveness of the proposed feature representation learning approach. The developed ACPred-FL method significantly outperforms state-of-the-art methods.
Availability and implementation: The ACPred-FL web server is available at http://server.malab.cn/ACPred-FL.
Supplementary information: Supplementary data are available at Bioinformatics online.
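The feature representation learning idea summarized in the abstract can be illustrated with a minimal sketch: one model is trained per sequence-based descriptor, and the outputs of those models form the new feature vector. Everything concrete below is an assumption made for illustration and is not taken from the paper: the two descriptors (`aac`, `group_composition`), the residue groups, the helper names, and the toy sequences are hypothetical. To stay dependency-free, a simple nearest-centroid scorer stands in for the support vector machines, and the two-step feature selection is omitted.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Illustrative residue groups for the second descriptor (an assumption,
# not the descriptors used by ACPred-FL).
GROUPS = {
    "positive": set("KRH"),
    "negative": set("DE"),
    "hydrophobic": set("AVLIMFWC"),
}


def aac(seq):
    """Amino acid composition: relative frequency of each of the 20 residues."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def group_composition(seq):
    """Fraction of residues that fall in each illustrative residue group."""
    return [sum(r in members for r in seq) / len(seq) for members in GROUPS.values()]


DESCRIPTORS = [aac, group_composition]


class CentroidScorer:
    """Stand-in for an SVM: scores a vector by relative distance to class centroids."""

    def fit(self, pos_vectors, neg_vectors):
        # Column-wise means give one centroid per class.
        self.pos = [sum(col) / len(col) for col in zip(*pos_vectors)]
        self.neg = [sum(col) / len(col) for col in zip(*neg_vectors)]
        return self

    def score(self, vector):
        # Value in [0, 1]; above 0.5 means closer to the positive centroid.
        d_pos = math.dist(vector, self.pos)
        d_neg = math.dist(vector, self.neg)
        total = d_pos + d_neg
        return 0.5 if total == 0 else d_neg / total


def train_pool(positives, negatives):
    """Train one model per descriptor, using the class labels of the training peptides."""
    return [
        CentroidScorer().fit([d(s) for s in positives], [d(s) for s in negatives])
        for d in DESCRIPTORS
    ]


def learned_representation(seq, models):
    """New feature vector: one model output per descriptor."""
    return [m.score(d(seq)) for d, m in zip(DESCRIPTORS, models)]


# Made-up toy sequences for demonstration only; they are not real ACP data.
TOY_POSITIVES = ["KKLLKKLAKK", "KWKLFKKIGK", "GLFKKLLKKA"]
TOY_NEGATIVES = ["DEDSGEDTSE", "SEEDGTDDSA", "EDGSSTEEDD"]

models = train_pool(TOY_POSITIVES, TOY_NEGATIVES)
print(learned_representation("KKLLKKLAKK", models))
```

In this sketch the learned representation has one dimension per descriptor, so the class labels enter the features through the trained models rather than through the raw descriptors. A fuller pipeline would presumably use many more descriptors, real SVM classifiers, and a selection step that reduces the model outputs to a small final vector.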