
R3P-Loc: a compact multi-label predictor using ridge regression and random projection for protein subcellular localization.

Shibiao Wan, Man-Wai Mak, Sun-Yuan Kung.

Abstract

Locating proteins within cellular contexts is of paramount significance in elucidating their biological functions. Computational methods based on knowledge databases (such as the gene ontology annotation (GOA) database) are known to be more efficient than sequence-based methods. However, knowledge-based methods typically face three problems: (1) knowledge databases are enormous and growing exponentially, (2) knowledge databases contain redundant information, and (3) the number of features extracted from knowledge databases is much larger than the number of data samples with ground-truth labels. These properties make the extracted features liable to contain redundant or irrelevant information, causing the prediction systems to suffer from overfitting. To address these problems, this paper proposes an efficient multi-label predictor, namely R3P-Loc, which uses two compact databases for feature extraction and applies random projection (RP) to reduce the feature dimensions of an ensemble ridge regression (RR) classifier. Two new compact databases are created from the Swiss-Prot and GOA databases. These databases possess almost the same amount of information as their full-size counterparts but are much smaller. Experimental results on two recent datasets (eukaryote and plant) suggest that R3P-Loc can reduce the feature dimensions sevenfold and significantly outperforms state-of-the-art predictors. This paper also demonstrates that the compact databases reduce memory consumption by a factor of 39 without degrading prediction accuracy. For readers' convenience, the R3P-Loc server is available online at http://bioinfo.eie.polyu.edu.hk/R3PLocServer/.
Copyright © 2014 Elsevier Ltd. All rights reserved.
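
The combination named in the abstract, random projection feeding an ensemble of ridge regression classifiers for multi-label output, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the Gaussian projection matrix, the closed-form ridge solver, the score averaging, the 0.5 threshold, and all function names (`fit_ensemble`, `predict_labels`, and so on) are assumptions chosen for illustration, and the GO-based feature extraction is not shown.

```python
import numpy as np

def random_projection_matrix(n_features, n_components, rng):
    # Assumed form: dense Gaussian projection, scaled by 1/sqrt(n_components).
    return rng.standard_normal((n_features, n_components)) / np.sqrt(n_components)

def fit_ridge(X, Y, alpha):
    # Closed-form ridge regression: W = (X^T X + alpha * I)^{-1} X^T Y.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

def fit_ensemble(X, Y, n_components, n_members, alpha=1.0, seed=0):
    # Each ensemble member gets its own random projection and ridge weights.
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        R = random_projection_matrix(X.shape[1], n_components, rng)
        W = fit_ridge(X @ R, Y, alpha)
        members.append((R, W))
    return members

def predict_scores(members, X):
    # Average the per-label scores over all ensemble members.
    return np.mean([X @ R @ W for R, W in members], axis=0)

def predict_labels(members, X, threshold=0.5):
    # Assumed decision rule: keep every label whose score passes the
    # threshold, and always keep the top-scoring label so that each
    # protein is assigned at least one location.
    scores = predict_scores(members, X)
    labels = scores >= threshold
    top = scores.argmax(axis=1)
    labels[np.arange(len(top)), top] = True
    return labels
```

Here `X` is a samples-by-features matrix and `Y` a samples-by-locations 0/1 indicator matrix; choosing `n_components` at roughly one seventh of the original feature count would mirror the sevenfold reduction reported above, although the paper's actual settings are not given in this abstract.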

Keywords:  Compact databases; Multi-label classification; Multi-location proteins

Year:  2014        PMID: 24997236     DOI: 10.1016/j.jtbi.2014.06.031

Source DB:  PubMed          Journal:  J Theor Biol        ISSN: 0022-5193            Impact factor:   2.691


  8 in total

1.  Sparse regressions for predicting and interpreting subcellular localization of multi-label proteins.

Authors:  Shibiao Wan; Man-Wai Mak; Sun-Yuan Kung
Journal:  BMC Bioinformatics       Date:  2016-02-24       Impact factor: 3.169

2.  Prediction of subcellular location of apoptosis proteins by incorporating PsePSSM and DCCA coefficient based on LFDA dimensionality reduction.

Authors:  Bin Yu; Shan Li; Wenying Qiu; Minghui Wang; Junwei Du; Yusen Zhang; Xing Chen
Journal:  BMC Genomics       Date:  2018-06-19       Impact factor: 3.969

3.  Comparison and development of machine learning tools in the prediction of chronic kidney disease progression.

Authors:  Jing Xiao; Ruifeng Ding; Xiulin Xu; Haochen Guan; Xinhui Feng; Tao Sun; Sibo Zhu; Zhibin Ye
Journal:  J Transl Med       Date:  2019-04-11       Impact factor: 5.531

4.  Benchmark data for identifying multi-functional types of membrane proteins.

Authors:  Shibiao Wan; Man-Wai Mak; Sun-Yuan Kung
Journal:  Data Brief       Date:  2016-05-21

5.  Using Baidu index to nowcast hand-foot-mouth disease in China: a meta learning approach.

Authors:  Yang Zhao; Qinneng Xu; Yupeng Chen; Kwok Leung Tsui
Journal:  BMC Infect Dis       Date:  2018-08-13       Impact factor: 3.090

6.  Computational Approaches to Prioritize Cancer Driver Missense Mutations. (Review)

Authors:  Feiyang Zhao; Lei Zheng; Alexander Goncearenco; Anna R Panchenko; Minghui Li
Journal:  Int J Mol Sci       Date:  2018-07-20       Impact factor: 5.923

7.  Protein sequence information extraction and subcellular localization prediction with gapped k-Mer method.

Authors:  Yu-Hua Yao; Ya-Ping Lv; Ling Li; Hui-Min Xu; Bin-Bin Ji; Jing Chen; Chun Li; Bo Liao; Xu-Ying Nan
Journal:  BMC Bioinformatics       Date:  2019-12-30       Impact factor: 3.169

8.  Identification of self-interacting proteins by integrating random projection classifier and finite impulse response filter.

Authors:  Zhan-Heng Chen; Zhu-Hong You; Li-Ping Li; Yan-Bin Wang; Yu Qiu; Peng-Wei Hu
Journal:  BMC Genomics       Date:  2019-12-27       Impact factor: 3.969
