Paul D Williams, David D Pollock, Benjamin P Blackburne, Richard A Goldstein.
Abstract
The phylogenetic inference of ancestral protein sequences is a powerful technique for the study of molecular evolution, but any conclusions drawn from such studies are only as good as the accuracy of the reconstruction method. Every inference method leads to errors in the ancestral protein sequence, resulting in potentially misleading estimates of the ancestral protein's properties. To assess the accuracy of ancestral protein reconstruction methods, we performed computational population evolution simulations featuring near-neutral evolution under purifying selection, speciation, and divergence using an off-lattice protein model where fitness depends on the ability to be stable in a specified target structure. We were thus able to compare the thermodynamic properties of the true ancestral sequences with the properties of "ancestral sequences" inferred by maximum parsimony, maximum likelihood, and Bayesian methods. Surprisingly, we found that methods such as maximum parsimony and maximum likelihood that reconstruct a "best guess" amino acid at each position overestimate thermostability, while a Bayesian method that sometimes chooses less-probable residues from the posterior probability distribution does not. Maximum likelihood and maximum parsimony apparently tend to eliminate variants at a position that are slightly detrimental to structural stability simply because such detrimental variants are less frequent. Other properties of ancestral proteins might be similarly overestimated. This suggests that ancestral reconstruction studies require greater care to come to credible conclusions regarding functional evolution. Inferred functional patterns that mimic reconstruction bias should be reevaluated.
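The contrast between a "best guess" reconstruction and sampling from the posterior can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the per-site posterior probabilities, the additive per-residue stability contributions, and the function names are hypothetical and are not the off-lattice model or the inference software used in the study. The sketch only shows why always choosing the most probable residue can shift the estimated stability when the more frequent variant also happens to be the more stabilizing one.

```python
import random

# Hypothetical per-site posteriors: residue -> (posterior probability, toy stability term).
# All numbers are illustrative assumptions, not values from the study.
SITE_POSTERIORS = [
    {"L": (0.7, -1.0), "A": (0.3, -0.4)},
    {"V": (0.6, -0.8), "G": (0.4, -0.1)},
    {"I": (0.8, -1.2), "T": (0.2, -0.5)},
]

def best_guess(sites):
    """Pick the most probable residue at each site (a point estimate)."""
    return [max(site, key=lambda r: site[r][0]) for site in sites]

def sample_posterior(sites, rng):
    """Draw one residue per site in proportion to its posterior probability."""
    seq = []
    for site in sites:
        residues = list(site)
        weights = [site[r][0] for r in residues]
        seq.append(rng.choices(residues, weights=weights, k=1)[0])
    return seq

def toy_stability(seq, sites):
    """Sum the toy per-residue terms (more negative means more stable)."""
    return sum(sites[i][r][1] for i, r in enumerate(seq))

rng = random.Random(0)  # fixed seed so the sketch is repeatable

ml_seq = best_guess(SITE_POSTERIORS)
ml_dg = toy_stability(ml_seq, SITE_POSTERIORS)

# Average the toy stability over 100 sampled reconstructions.
samples = [
    toy_stability(sample_posterior(SITE_POSTERIORS, rng), SITE_POSTERIORS)
    for _ in range(100)
]
mean_sampled_dg = sum(samples) / len(samples)

print("best-guess sequence:", "".join(ml_seq), "toy stability:", ml_dg)
print("mean toy stability over sampled sequences:", mean_sampled_dg)
```

In this toy setup the best-guess sequence is at least as stable as every sampled sequence, because the less frequent residue at each site was chosen to be the less stabilizing one. Real posteriors and real stability models need not behave this cleanly.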
Year: 2006 PMID: 16789817 PMCID: PMC1480538 DOI: 10.1371/journal.pcbi.0020069
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Stability (Represented as ΔG Folding) for the True Ancestor and Reconstructed Sequences
Stability values shown for three nodes, as labelled in Figure S3: (A) shallow node, blue, (B) intermediate node, red, and (C) deep node, green. Each point represents the reconstruction of a single ancestral node in one of the 84 analysed simulations. Reconstructions were performed with MP, ML, and BI approaches. ΔG Folding values represent the average of 100 reconstructions. Points on the diagonal represent reconstructions generating accurate ancestral protein stabilities.
Figure 2. Distribution of Errors in Reconstruction Stabilities (ΔΔG) When Reconstructions Are Made with MP (Green), ML (Red), and BI (Blue)
Figure 3. Cumulative Distribution of Absolute Errors in Reconstruction Stabilities (|ΔΔG|)
Colour code as in Figure 2. Errors with BI reconstructions are shown when averaging is performed over 100 sequence reconstructions (solid line), ten sequence reconstructions (dashed line), and a single reconstruction (dotted line).
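A cumulative distribution of absolute errors of this kind could be computed along the following lines. This is a minimal sketch under stated assumptions: the function names and the example stability values are hypothetical, and the absolute error is taken here as the absolute difference between the true stability and the mean stability of the first n sampled reconstructions.

```python
def averaged_abs_error(true_dg, sampled_dgs, n):
    """Absolute error when the estimate is the mean of the first n sampled stabilities."""
    mean_dg = sum(sampled_dgs[:n]) / n
    return abs(mean_dg - true_dg)

def empirical_cdf(values):
    """Pair each sorted value with the fraction of values less than or equal to it."""
    ordered = sorted(values)
    total = len(ordered)
    return [(v, (i + 1) / total) for i, v in enumerate(ordered)]

# Hypothetical data: (true stability, sampled reconstruction stabilities) per node.
nodes = [
    (-5.0, [-4.0, -6.0, -5.5, -4.5]),
    (-3.0, [-3.5, -3.5, -2.0, -3.0]),
    (-4.0, [-2.0, -4.0, -5.0, -5.0]),
]

for n in (1, 4):
    errors = [averaged_abs_error(true_dg, sampled, n) for true_dg, sampled in nodes]
    print(f"averaging over {n} reconstruction(s):", empirical_cdf(errors))
```

Plotting the second element of each pair against the first gives one curve per value of n, which is how the effect of averaging over more sampled reconstructions can be compared.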