Predicting Amino Acid Substitution Probabilities Using Single Nucleotide Polymorphisms.

Francesca Rizzato, Alex Rodriguez, Xevi Biarnés, Alessandro Laio.

Abstract

Fast genome sequencing offers invaluable opportunities for building updated and improved models of protein sequence evolution. Here we show that single nucleotide polymorphisms (SNPs) can be used to build a model capable of predicting the probability of substitution between amino acids in variants of the same protein in different species. The model is based on a substitution matrix inferred from the frequency of codon interchanges observed in a suitably selected subset of human SNPs. It predicts the substitution probabilities observed in alignments between Homo sapiens and related species at 85-100% sequence identity better than any other approach we are aware of, and it gradually loses its predictive power at lower sequence identity. Our results suggest that SNPs can be employed, together with multiple sequence alignment data, to model protein sequence evolution. The SNP-based substitution matrix developed in this work can be exploited to better align protein sequences of related organisms and to refine estimates of the evolutionary distance between protein variants from related species in phylogenetic trees; in the future, it might also become a useful tool for population analysis.
Copyright © 2017 by the Genetics Society of America.
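
To make the idea of a matrix "inferred from the frequency of codon interchanges" concrete, the following is a minimal Python sketch, not the authors' procedure (the abstract does not describe their SNP selection or estimation steps). It assumes a hypothetical input format, a list of (reference codon, alternate codon) pairs, translates each pair with the standard genetic code, counts amino acid interchanges symmetrically, and row-normalizes the counts into conditional frequencies. The function name `substitution_probabilities` is illustrative only.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def substitution_probabilities(snp_codon_pairs):
    """Estimate P(b | a) from (ref_codon, alt_codon) pairs.

    Assumption: each pair is one observed SNP inside a coding region.
    Pairs that differ at more than one position, or that involve a
    stop codon, are skipped.
    """
    counts = Counter()
    for ref, alt in snp_codon_pairs:
        # Keep only true single-nucleotide changes.
        if sum(x != y for x, y in zip(ref, alt)) != 1:
            continue
        aa_ref, aa_alt = CODON_TABLE[ref], CODON_TABLE[alt]
        if "*" in (aa_ref, aa_alt):
            continue
        # Count the interchange in both directions (symmetric counts).
        counts[(aa_ref, aa_alt)] += 1
        counts[(aa_alt, aa_ref)] += 1

    # Row totals, then row-normalize to conditional frequencies.
    totals = Counter()
    for (a, _), n in counts.items():
        totals[a] += n
    return {(a, b): n / totals[a] for (a, b), n in counts.items()}


if __name__ == "__main__":
    # Toy, made-up input purely for illustration.
    pairs = [("GAT", "GAA"), ("GAT", "GAC"), ("GAA", "GAG"), ("GAT", "TAA")]
    for (a, b), p in sorted(substitution_probabilities(pairs).items()):
        print(f"P({b} | {a}) = {p:.3f}")
```

A real analysis would work at the codon level, account for codon usage and allele frequencies, and filter SNPs as the abstract indicates; this sketch only shows the counting-and-normalizing step.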

Keywords:  SNP; protein sequence alignment; protein sequence evolution; substitution matrices; substitution rate variability

Year: 2017; PMID: 28754661; PMCID: PMC5629329; DOI: 10.1534/genetics.117.300078

Source DB: PubMed; Journal: Genetics; ISSN: 0016-6731; Impact factor: 4.562

