| Literature DB >> 32053390 |
Gary Larson, Jeffrey L Thorne, Scott Schmidler.
Abstract
Evolutionary models of proteins are widely used for statistical sequence alignment and inference of homology and phylogeny. However, the vast majority of these models rely on an unrealistic assumption of independent evolution between sites. Here we focus on the related problem of protein structure alignment, a classic tool of computational biology that is widely used to identify structural and functional similarity and to infer homology among proteins. A site-independent statistical model for protein structural evolution has previously been introduced and shown to significantly improve alignments and phylogenetic inferences compared with approaches that use only amino acid sequence information. We extend this model to account for correlated evolutionary drift among neighboring amino acid positions. The result is a spatiotemporal model of protein structure evolution, described by a multivariate diffusion process convolved with a spatial birth-death process. This extended site-dependent model (SDM) comes with little additional computational cost or analytical complexity compared with the site-independent model (SIM). We demonstrate that the SDM yields a significant reduction of bias in estimated evolutionary distances and helps further improve phylogenetic tree reconstruction. We also develop a simple model of site-dependent sequence evolution, which we use to demonstrate the bias that results from applying standard site-independent sequence evolution models.
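The abstract describes structural drift as a multivariate diffusion in which neighboring positions evolve in a correlated way. The sketch below illustrates only that idea, under stated assumptions: it assumes an Ornstein-Uhlenbeck diffusion on a single coordinate per site, and it assumes (hypothetically) that correlation between sites decays as rho**|i - j| with sequence separation. The function names and the parameters `theta`, `sigma`, `rho`, and `mu` are illustrative, not taken from the paper, and the birth-death (insertion/deletion) component is omitted entirely. It is not the authors' implementation.

```python
import math
import random


def correlated_normals(n, rho, rng):
    """Draw n standard normals whose correlation decays as rho**|i - j|.

    Hypothetical AR(1) structure: z_i = rho * z_{i-1} + sqrt(1 - rho^2) * e_i,
    which keeps every marginal variance at 1. rho = 0 gives independent sites.
    """
    z = [rng.gauss(0.0, 1.0)]
    scale = math.sqrt(1.0 - rho * rho)
    for _ in range(1, n):
        z.append(rho * z[-1] + scale * rng.gauss(0.0, 1.0))
    return z


def ou_transition(x0, t, theta=1.0, sigma=1.0, rho=0.0, mu=0.0, rng=None):
    """Sample per-site coordinates after time t under an assumed OU drift.

    Each site reverts toward mu at rate theta. The stationary covariance is
    sigma^2 * rho**|i - j|, so neighboring sites drift together when rho > 0.
    """
    if rng is None:
        rng = random.Random()
    a = math.exp(-theta * t)                    # share of the starting offset retained
    noise_sd = sigma * math.sqrt(1.0 - a * a)   # OU transition standard deviation
    z = correlated_normals(len(x0), rho, rng)
    return [mu + a * (x - mu) + noise_sd * zi for x, zi in zip(x0, z)]


def lag1_corr(v):
    """Empirical correlation between consecutive entries of v."""
    xs, ys = v[:-1], v[1:]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    rng = random.Random(0)
    start = [0.0] * 20000
    for rho in (0.0, 0.8):
        moved = ou_transition(start, t=0.5, rho=rho, rng=rng)
        print(rho, round(lag1_corr(moved), 2))
```

Setting `rho = 0` gives independent drift at every site, analogous to a site-independent model; with `rho > 0` the lag-1 correlation of the sampled coordinates should land near `rho`, which is the kind of neighbor dependence a site-independent fit would ignore.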
Keywords: diffusion process; dynamic programming; evolution; phylogeny; protein structure
Year: 2020 PMID: 32053390 PMCID: PMC7081252 DOI: 10.1089/cmb.2019.0500
Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479