
DeepMSA: constructing deep multiple sequence alignment to improve contact prediction and fold-recognition for distant-homology proteins.

Chengxin Zhang, Wei Zheng, S M Mortuza, Yang Li, Yang Zhang.

Abstract

MOTIVATION: The success of genome sequencing techniques has resulted in a rapid explosion of protein sequences. Collections of multiple homologous sequences can provide critical information for modeling the structure and function of unknown proteins. There is, however, no standard and efficient pipeline available for sensitive multiple sequence alignment (MSA) collection. This is particularly challenging when large whole-genome and metagenome databases are involved.
RESULTS: We developed DeepMSA, a new open-source method for sensitive MSA construction, in which homologous sequences and alignments are created from multiple sources of whole-genome and metagenome databases through complementary hidden Markov model algorithms. The practical usefulness of the pipeline was examined in three large-scale benchmark experiments based on 614 non-redundant proteins. First, DeepMSA was used to generate MSAs for residue-level contact prediction by six coevolution and deep learning-based programs, which increased the accuracy of long-range contacts by up to 24.4% compared to the default programs. Next, multiple threading programs were run for homologous structure identification, where the average TM-score of the template alignments increased by over 7.5% with the use of the new DeepMSA profiles. Finally, DeepMSA was used for secondary structure prediction and resulted in statistically significant improvements in the Q3 accuracy. All these improvements were achieved without re-training the parameters and neural-network models, demonstrating the robustness and general usefulness of DeepMSA in protein structural bioinformatics applications, especially for targets without homologous templates in the PDB library.
AVAILABILITY AND IMPLEMENTATION: https://zhanglab.ccmb.med.umich.edu/DeepMSA/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
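
The abstract reports gains in two metrics without defining them: long-range contact accuracy and Q3 accuracy. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. The function names, the input layout, and the sequence-separation cutoff of 24 residues for "long-range" (the usual CASP convention) are assumptions made for illustration; the abstract does not state them.

```python
from typing import Iterable, Set, Tuple


def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose 3-state label (H, E or C) matches.

    Both arguments are per-residue secondary structure strings of equal length.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("inputs must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def long_range_precision(
    scored_pairs: Iterable[Tuple[int, int, float]],
    true_contacts: Set[Tuple[int, int]],
    top_n: int,
    min_separation: int = 24,  # assumed cutoff; not given in the abstract
) -> float:
    """Precision of the top_n highest-scoring long-range predicted contacts.

    scored_pairs holds (i, j, score) tuples; true_contacts holds (i, j)
    tuples with i < j taken from the native structure.
    """
    # Keep only pairs far enough apart in sequence to count as long-range.
    long_range = [
        (i, j, score)
        for i, j, score in scored_pairs
        if abs(i - j) >= min_separation
    ]
    # Rank by predicted score, highest first, and keep the top_n pairs.
    long_range.sort(key=lambda pair: pair[2], reverse=True)
    selected = long_range[:top_n]
    if not selected:
        return 0.0
    hits = sum(
        (min(i, j), max(i, j)) in true_contacts for i, j, _ in selected
    )
    return hits / len(selected)
```

In practice `top_n` is often tied to the sequence length L (for example L, L/2 or L/5), so that targets of different sizes can be compared.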

Year:  2020        PMID: 31738385      PMCID: PMC7141871          DOI: 10.1093/bioinformatics/btz863

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


