Hum-mPLoc 3.0: prediction enhancement of human protein subcellular localization through modeling the hidden correlations of gene ontology and functional domain features.

Hang Zhou, Yang Yang, Hong-Bin Shen.

Abstract

Motivation: Protein subcellular localization prediction has been an important research topic in computational biology over the last decade. Various automatic methods have been proposed to predict locations for large-scale protein datasets, and statistical machine learning algorithms are widely used to construct the models. A key step in these predictors is encoding the amino acid sequences into feature vectors. Many studies have shown that features extracted from biological domain knowledge, such as gene ontology and functional domains, can be very useful for improving prediction accuracy. However, domain knowledge usually results in redundant features and high-dimensional feature spaces, which may degrade the performance of machine learning models.
Results: In this paper, we propose a new amino acid sequence-based approach to human protein subcellular location prediction, Hum-mPLoc 3.0, which covers 12 human subcellular localizations. The sequences are represented by multi-view complementary features, i.e. context vocabulary annotation-based gene ontology (GO) terms, peptide-based functional domains, and residue-based statistical features. To systematically reflect the structural hierarchy of the domain knowledge bases, we propose a novel feature representation protocol denoted HCM (Hidden Correlation Modeling), which creates more compact and discriminative feature vectors by modeling the hidden correlations between annotation terms. Experimental results on four benchmark datasets show that HCM improves prediction accuracy by 5-11% and F1 by 8-19% compared with conventional GO-based methods. A large-scale application of Hum-mPLoc 3.0 to the whole human proteome reveals protein co-localization preferences in the cell.
Availability and Implementation: www.csbio.sjtu.edu.cn/bioinf/Hum-mPLoc3/.
Contacts: hbshen@sjtu.edu.cn.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
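
Illustrative note: the abstract describes encoding amino acid sequences into feature vectors as a key step, and lists residue-based statistical features as one of the views. The sketch below shows one common feature of that kind, amino acid composition (the relative frequency of each of the 20 standard residues). It is an assumption chosen for illustration only; the abstract does not say which residue-based statistics Hum-mPLoc 3.0 uses, and the function name `amino_acid_composition` is hypothetical.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (fixed order defines the vector layout).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-dimensional list of residue frequencies for a protein sequence.

    Hypothetical helper for illustration; not taken from Hum-mPLoc 3.0.
    Characters outside the 20 standard residues (e.g. 'X') are ignored.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # Empty or fully non-standard input: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    vector = amino_acid_composition("MKTAYIAKQR")
    print(len(vector), round(sum(vector), 6))
```

A fixed-length vector like this can be fed to any standard classifier regardless of sequence length. Annotation-based views such as GO terms would need a separate encoding, which is not sketched here.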

Year:  2017        PMID: 27993784     DOI: 10.1093/bioinformatics/btw723

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


