
The choice of sequence homologs included in multiple sequence alignments has a dramatic impact on evolutionary conservation analysis.

Nelson Gil, Andras Fiser.

Abstract

Motivation: The analysis of sequence conservation patterns has been widely utilized to identify functionally important (catalytic and ligand-binding) protein residues for over a half-century. Despite decades of development, on average state-of-the-art non-template-based functional residue prediction methods must predict ∼25% of a protein's total residues to correctly identify half of the protein's functional site residues. The overwhelming proportion of false positives results in reported 'F-Scores' of ∼0.3. We investigated the limits of current approaches, focusing on the so-far neglected impact of the specific choice of homologs included in multiple sequence alignments (MSAs).
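The relationship between the figures quoted above can be checked with a short worked example. This is only a sketch: the abstract does not state what fraction of a protein's residues are functional, so the 10% used below is an assumption chosen for illustration, as are the protein length and all variable names.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical protein, chosen only for illustration.
n_residues = 200
n_functional = 20        # ASSUMPTION: 10% of residues are functional (not from the abstract)
n_predicted = 50         # 25% of all residues are predicted, as quoted in the abstract
n_true_positives = 10    # half of the functional residues are recovered

precision = n_true_positives / n_predicted   # 10 / 50 = 0.2
recall = n_true_positives / n_functional     # 10 / 20 = 0.5
f1 = f_score(precision, recall)

print(round(f1, 3))  # prints 0.286
```

Under that assumption, four out of every five predicted residues are false positives, which gives an F-score close to the ∼0.3 reported in the abstract.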
Results: The limits of conservation-based functional residue prediction were explored by surveying the binding sites of 1023 proteins. A straightforward conservation analysis of MSAs composed of randomly selected homologs sampled from a PSI-BLAST search achieves average F-Scores of ∼0.3, a performance matching that reported by state-of-the-art methods, which often consider additional features for the prediction in a machine learning setting. Interestingly, we found that a simple combinatorial MSA sampling algorithm will in almost every case produce an MSA with an optimal set of homologs whose conservation analysis reaches average F-Scores of ∼0.6, doubling state-of-the-art performance. We also show that this is nearly at the theoretical limit of possible performance given the agreement between different binding site definitions. Additionally, we showcase the progress in this direction made by Selection of Alignment by Maximal Mutual Information (SAMMI), an information-theory-based approach to identifying biologically informative MSAs. This work highlights the importance and the unused potential of optimally composed MSAs for conservation analysis.
Supplementary information: Supplementary data are available at Bioinformatics online.
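The idea of sampling homolog subsets and scoring the conservation analysis of each can be sketched as follows. This is a toy illustration, not the authors' algorithm and not SAMMI. The conservation measure (Shannon entropy per column), the top-k prediction rule, the pre-aligned toy sequences, and every function name are assumptions made for the example.

```python
import math
import random
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return float("inf")  # an all-gap column carries no conservation signal
    total = len(residues)
    counts = Counter(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def predict_conserved(msa, k):
    """Return the indices of the k lowest-entropy (most conserved) columns."""
    length = len(msa[0])
    entropies = [column_entropy([seq[i] for seq in msa]) for i in range(length)]
    ranked = sorted(range(length), key=lambda i: (entropies[i], i))
    return set(ranked[:k])


def f_score(predicted, true_sites):
    """F1 score of a predicted residue set against the known functional sites."""
    tp = len(predicted & true_sites)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(true_sites)
    return 2 * precision * recall / (precision + recall)


def best_sampled_msa(query, homologs, true_sites, subset_size, k, n_samples, seed=0):
    """Sample random homolog subsets and keep the one with the highest F-score."""
    rng = random.Random(seed)
    best_score, best_subset = -1.0, None
    for _ in range(n_samples):
        subset = rng.sample(homologs, subset_size)
        score = f_score(predict_conserved([query] + subset, k), true_sites)
        if score > best_score:
            best_score, best_subset = score, subset
    return best_score, best_subset


# Toy, already-aligned sequences (hypothetical data, for illustration only).
query = "ACDEFG"
homologs = ["GCAEFG", "TCSEYG", "ACDKFG", "ACDEFG", "MCNEWA"]
true_sites = {1, 3}  # ASSUMPTION: positions 1 and 3 are the functional residues

score, subset = best_sampled_msa(query, homologs, true_sites,
                                 subset_size=3, k=2, n_samples=20)
print(score, subset)
```

Because each sampled MSA is scored against known functional sites, a search of this kind estimates how well conservation analysis could perform with a well-chosen set of homologs. It is not a predictor that could be applied to a protein with unknown sites.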

Year:  2019        PMID: 29947739      PMCID: PMC6298051          DOI: 10.1093/bioinformatics/bty523

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


