
MSLoc-DT: a new method for predicting the protein subcellular location of multispecies based on decision templates.

Shao-Wu Zhang, Yan-Fang Liu, Yong Yu, Ting-He Zhang, Xiao-Nan Fan.

Abstract

Revealing the subcellular location of newly discovered protein sequences can provide insight into their function and guide research at the cellular level. The rapidly increasing number of sequences entering the genome databanks calls for automated analysis methods. Most existing methods for predicting protein subcellular location cover only one species or a very limited number of species. It is therefore necessary to develop reliable and effective computational approaches that improve prediction performance while covering more species. This study reports a novel predictor, MSLoc-DT, that predicts the protein subcellular locations of human, animal, plant, bacteria, virus, fungi, and archaea. It introduces a novel feature extraction approach termed Amino Acid Index Distribution (AAID) and then fuses gene ontology information, sequential evolutionary information, and sequence statistical information through four different modes of pseudo amino acid composition (PseAAC) with a decision template rule. In the jackknife test, MSLoc-DT achieves overall accuracies of 86.5, 98.3, 90.3, 98.5, 95.9, 98.1, and 99.3% for human, animal, plant, bacteria, virus, fungi, and archaea, respectively, on seven stringent benchmark datasets. Compared with other predictors (e.g., Gpos-PLoc, Gneg-PLoc, Virus-PLoc, Plant-PLoc, Plant-mPLoc, ProLoc-Go, Hum-PLoc, GOASVM) on the gram-positive, gram-negative, virus, plant, eukaryotic, and human datasets, MSLoc-DT is much more effective and robust. Although MSLoc-DT is designed to predict a single location per protein, the method can be extended to proteins with multiple locations by introducing multilabel machine learning approaches, such as the support vector machine and deep learning, as substitutes for the K-nearest neighbor (KNN) method.
As a user-friendly web server, MSLoc-DT is freely accessible at http://bioinfo.ibp.ac.cn/MSLOC_DT/index.html.
Crown Copyright © 2013. Published by Elsevier Inc. All rights reserved.
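
The decision template rule and the jackknife test named in the abstract can be illustrated with a minimal sketch. It is not the MSLoc-DT implementation: the AAID, gene ontology, and PseAAC features are not reproduced, and the function names, the toy data, the use of neighbor vote fractions as KNN soft outputs, and the squared Euclidean distance between profile and template are all assumptions made for illustration.

```python
import numpy as np

def knn_soft_output(train_X, train_y, x, k, n_classes):
    """Fraction of the k nearest training points that fall in each class."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    return np.bincount(train_y[nearest], minlength=n_classes) / k

def decision_profile(views, y, x_views, k, n_classes, skip=None):
    """One KNN soft-output row per feature view: shape (n_views, n_classes)."""
    mask = np.ones(len(y), dtype=bool)
    if skip is not None:
        mask[skip] = False  # keep a training sample from voting for itself
    rows = [knn_soft_output(X[mask], y[mask], x, k, n_classes)
            for X, x in zip(views, x_views)]
    return np.vstack(rows)

def fit_templates(views, y, k, n_classes):
    """Decision template of a class = mean decision profile of its samples."""
    profiles = np.stack([
        decision_profile(views, y, [X[i] for X in views], k, n_classes, skip=i)
        for i in range(len(y))
    ])
    # Assumes every class has at least one sample in y.
    return np.stack([profiles[y == c].mean(axis=0) for c in range(n_classes)])

def predict(templates, profile):
    """Pick the class whose template is closest to the profile."""
    sq_dist = ((templates - profile) ** 2).sum(axis=(1, 2))
    return int(np.argmin(sq_dist))

def jackknife_accuracy(views, y, k, n_classes):
    """Leave-one-out: hold out each sample, refit on the rest, predict it."""
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        sub_views = [X[keep] for X in views]
        sub_y = y[keep]
        templates = fit_templates(sub_views, sub_y, k, n_classes)
        profile = decision_profile(sub_views, sub_y,
                                   [X[i] for X in views], k, n_classes)
        hits += predict(templates, profile) == y[i]
    return hits / len(y)

# Invented toy data: two feature "views" of eight samples, two classes.
square = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
view_a = np.vstack([square, square + 10.0])
view_b = np.array([0, 1, 2, 3, 20, 21, 22, 23], dtype=float).reshape(-1, 1)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

templates = fit_templates([view_a, view_b], labels, k=3, n_classes=2)
acc = jackknife_accuracy([view_a, view_b], labels, k=3, n_classes=2)
print(acc)
```

Each feature view contributes one row to the decision profile, so adding a further information source only adds a row; the fusion step itself is unchanged. The nested leave-one-out makes this sketch quadratic in the number of samples, which is acceptable only for small illustrative data.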

Keywords:  Amino acid index distribution; Decision template; Gene ontology; Multispecies; Subcellular location

Year: 2013    PMID: 24361712    DOI: 10.1016/j.ab.2013.12.013

Source DB: PubMed    Journal: Anal Biochem    ISSN: 0003-2697    Impact factor: 3.365


