The choice of sequence homologs included in multiple sequence alignments has a dramatic impact on evolutionary conservation analysis.

Abstract

Motivation: The analysis of sequence conservation patterns has been widely used to identify functionally important (catalytic and ligand-binding) protein residues for over half a century. Despite decades of development, state-of-the-art non-template-based functional residue prediction methods must, on average, predict ∼25% of a protein's total residues to correctly identify half of its functional site residues. The overwhelming proportion of false positives results in reported 'F-Scores' of ∼0.3. We investigated the limits of current approaches, focusing on the so-far neglected impact of the specific choice of homologs included in multiple sequence alignments (MSAs).
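The 'F-Score' here is the usual harmonic mean of precision and recall over predicted versus annotated functional residues. The short Python sketch below illustrates why predicting a quarter of a protein to recover half of its site lands near ∼0.3. The function name `f_score` and the toy protein are hypothetical, and the assumed binding site covering 10% of residues is chosen only for illustration; it is not a figure from the paper.

```python
def f_score(predicted: set[int], actual: set[int]) -> float:
    """F1 score of predicted residue indices against annotated functional residues."""
    tp = len(predicted & actual)  # correctly predicted functional residues
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy protein: 200 residues, of which 20 (10%) form the binding site.
n_residues = 200
actual = set(range(20))
# Predict 25% of all residues (50 positions); only 10 of them are site residues (recall 0.5).
predicted = set(range(10)) | set(range(100, 140))
print(len(predicted), round(f_score(predicted, actual), 3))  # prints: 50 0.286
```

With these numbers precision is 0.2 and recall is 0.5, so F = 2(0.2)(0.5)/(0.2 + 0.5) ≈ 0.29. A smaller assumed site would push the score lower still.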
Results: The limits of conservation-based functional residue prediction were explored by surveying the binding sites of 1023 proteins. A straightforward conservation analysis of MSAs composed of randomly selected homologs sampled from a PSI-BLAST search achieves average F-Scores of ∼0.3, a performance matching that reported by state-of-the-art methods, which often consider additional features for the prediction in a machine learning setting. Interestingly, we found that a simple combinatorial MSA sampling algorithm will in almost every case produce an MSA with an optimal set of homologs whose conservation analysis reaches average F-Scores of ∼0.6, doubling state-of-the-art performance. We also show that this is nearly at the theoretical limit of possible performance given the agreement between different binding site definitions. Additionally, we showcase the progress in this direction made by Selection of Alignment by Maximal Mutual Information (SAMMI), an information-theory-based approach to identifying biologically informative MSAs. This work highlights the importance and the unused potential of optimally composed MSAs for conservation analysis.

Supplementary information: Supplementary data are available at Bioinformatics online.
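The Results contrast conservation analysis of MSAs built from randomly sampled homologs with the best MSA found by combinatorial sampling. A minimal Python sketch of that comparison follows, under several stated assumptions: per-column Shannon entropy stands in for the conservation score (the abstract does not name one); MSA columns are assumed to map one-to-one onto query residues (no gaps in the query); and `best_of_n_subsets` is a hypothetical random best-of-N search scored against the known binding site, i.e. an oracle upper bound. It is not the authors' sampling algorithm and not SAMMI. All names and the toy sequences are illustrative.

```python
import math
import random
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column; gaps ('-') are ignored."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return math.inf  # all-gap column: treat as unconserved
    total = len(residues)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(residues).values())


def predict_conserved(msa: list[str], fraction: float) -> set[int]:
    """Predict the given fraction of query positions with the lowest entropy."""
    n_cols = len(msa[0])  # assumes query-anchored columns of equal length
    scores = [column_entropy("".join(seq[i] for seq in msa)) for i in range(n_cols)]
    k = max(1, round(fraction * n_cols))
    ranked = sorted(range(n_cols), key=lambda i: scores[i])  # stable: ties keep index order
    return set(ranked[:k])


def f_score(predicted: set[int], actual: set[int]) -> float:
    """Harmonic mean of precision and recall."""
    tp = len(predicted & actual)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(actual)
    return 2 * precision * recall / (precision + recall)


def best_of_n_subsets(query, homologs, actual, subset_size, n_trials, fraction, seed=0):
    """Hypothetical oracle search: sample random homolog subsets, keep the best-scoring MSA."""
    rng = random.Random(seed)
    best_f, best_subset = -1.0, []
    for _ in range(n_trials):
        subset = rng.sample(homologs, subset_size)
        f = f_score(predict_conserved([query] + subset, fraction), actual)
        if f > best_f:
            best_f, best_subset = f, subset
    return best_f, best_subset


# Toy, query-anchored alignment; the hypothetical binding site is query positions 0 and 3.
query = "ACDEFG"
site = {0, 3}
homologs = ["AWYEHI", "APQERS",   # conserve the site columns, vary elsewhere
            "KCDMFG", "RCDNFG"]   # vary at the site, conserve everything else

random_subset = random.Random(1).sample(homologs, 2)
random_f = f_score(predict_conserved([query] + random_subset, 1 / 3), site)
best_f, best_subset = best_of_n_subsets(query, homologs, site, subset_size=2,
                                        n_trials=200, fraction=1 / 3)
print(f"random subset F = {random_f:.2f}; best of 200 F = {best_f:.2f} with {sorted(best_subset)}")
```

On this toy data only the pair of homologs that conserve the site columns yields F = 1.0; mixed pairs tie every column and score 0.5, and the two off-site homologs score 0. This mirrors, in miniature, how strongly the choice of homologs can swing a conservation-based prediction.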

Substances:
Proteins

Year: 2019 PMID: 29947739 PMCID: PMC6298051 DOI: 10.1093/bioinformatics/bty523

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
