Axel Künstner, Benoit Nabholz, Hans Ellegren.
Abstract
A major conclusion from comparative genomics is that many sequences that do not code for proteins are conserved beyond neutral expectations, indicating that they evolve under the influence of purifying selection and are likely to have functional roles. Due to the degeneracy of the genetic code, synonymous sites within protein-coding genes have previously been seen as "silent" with respect to function and thereby invisible to selection. However, there are indications that synonymous sites of vertebrate genomes are also subject to selection and this is not necessarily because of potential codon bias. We used divergence in ancestral repeats as a neutral reference to estimate the constraint on 4-fold degenerate sites of avian genes in a whole-genome approach. In the pairwise comparison of chicken and zebra finch, constraint was estimated at 24-32%. Based on three-species alignments of chicken, turkey, and zebra finch, lineage-specific estimates of constraint were 43%, 29%, and 24%, respectively. The finding of significant constraint at 4-fold degenerate sites from data on interspecific divergence was replicated in an analysis of intraspecific diversity in the chicken genome. These observations corroborate recent data from mammalian genomes and call for a reappraisal of the use of synonymous substitution rates as neutral standards in molecular evolutionary analysis, for example, in the use of the well-known dN/dS ratio and in inferences on positive selection. We show by simulations that the rate of false positives in the detection of positively selected genes and sites increases several-fold at the levels of constraint at 4-fold degenerate sites found in this study.
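Why constraint at synonymous sites matters for dN/dS can be illustrated with a deliberately simplified, gene-wide model (an assumption for illustration, not the authors' analysis): if constraint removes a fraction c of substitutions at synonymous sites while nonsynonymous divergence is unaffected, the observed dS shrinks to (1 - c) times the neutral rate, so the observed ratio is inflated by a factor 1/(1 - c). The helper name `inflated_omega` is hypothetical.

```python
def inflated_omega(omega_true: float, constraint: float) -> float:
    """Apparent dN/dS when synonymous sites are under constraint.

    Simplified model (hypothetical, for illustration only): synonymous
    divergence is reduced to (1 - constraint) times the neutral rate while
    dN is unchanged, so dN/dS is inflated by 1 / (1 - constraint).
    """
    if not 0.0 <= constraint < 1.0:
        raise ValueError("constraint must be in [0, 1)")
    return omega_true / (1.0 - constraint)


# A gene whose nonsynonymous sites evolve at the neutral rate (true ratio 1),
# evaluated at the lineage-specific constraint levels from the abstract.
for c in (0.24, 0.29, 0.43):
    print(f"constraint {c:.0%}: apparent dN/dS = {inflated_omega(1.0, c):.2f}")
# prints 1.32, 1.41 and 1.75 respectively
```

Under this model, a neutrally evolving gene would appear to have dN/dS above 1 at the constraint levels reported here, which is the direction of bias that can push tests of positive selection toward false positives.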
Year: 2011 PMID: 22042333 PMCID: PMC3242499 DOI: 10.1093/gbe/evr112
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Figure. Sequence divergence of ancestral repeats (ARs), introns, 4-fold degenerate sites, and 0-fold degenerate sites in the chicken–zebra finch comparison, estimated gene by gene. *** denotes significantly lower divergence than at ARs (P < 0.001).
Lineage-Specific Divergence (Mean ± Standard Deviation) of Different Sequence Categories Estimated from Concatenated Three-Species Alignments of Chicken, Turkey, and Zebra Finch
| Lineage | 4-Fold Sites | AR |
|---|---|---|
| Chicken | 0.028 (±0.001) | 0.049 (±0.001) |
| Turkey | 0.038 (±0.001) | 0.054 (±0.001) |
| Zebra finch | 0.302 (±0.005) | 0.399 (±0.004) |
Note.—The lineages are from an unrooted tree of the three species.
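Assuming constraint is defined as the proportional reduction in divergence relative to the neutral AR reference, 1 - d4/dAR (where d4 is divergence at 4-fold degenerate sites), the lineage-specific estimates in the abstract can be approximately recovered from this table. A minimal sketch; `constraint` is a hypothetical helper name:

```python
def constraint(d_sites: float, d_neutral: float) -> float:
    """Constraint as the proportional reduction in divergence relative to a
    neutral reference: 1 - d_sites / d_neutral."""
    if d_neutral <= 0:
        raise ValueError("neutral divergence must be positive")
    return 1.0 - d_sites / d_neutral


# Lineage-specific mean divergences from the table: (4-fold sites, ARs).
lineages = {
    "Chicken": (0.028, 0.049),
    "Turkey": (0.038, 0.054),
    "Zebra finch": (0.302, 0.399),
}

for name, (d4, d_ar) in lineages.items():
    print(f"{name}: constraint = {constraint(d4, d_ar):.1%}")
```

From the rounded table entries this gives about 42.9%, 29.6%, and 24.3% for chicken, turkey, and zebra finch, close to the 43%, 29%, and 24% in the abstract; the small turkey discrepancy is presumably due to rounding of the tabulated divergences.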
Figure. Gene-by-gene differences between divergence estimates at synonymous sites and at ARs. The dashed horizontal line marks where divergence at synonymous sites equals divergence at ARs (dAR). Values below the line are genes where divergence at ARs is estimated to be higher than divergence at synonymous sites, and values above the line the reverse. The red line is a lowess curve.
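The caption does not say whether a formal test was applied to these gene-by-gene differences. As a hypothetical illustration only, an exact two-sided sign test asks whether more genes fall below the dashed line than expected if synonymous-site divergence were equally likely to be higher or lower than AR divergence. The function `sign_test` and the input convention (difference = synonymous divergence minus AR divergence, so negative values lie below the line) are assumptions.

```python
from math import comb
from typing import Sequence, Tuple


def sign_test(differences: Sequence[float]) -> Tuple[int, int, float]:
    """Exact two-sided sign test on per-gene differences (d_syn - d_AR).

    Zero differences are dropped. Returns (n_below, n_used, p_value), where
    n_below counts genes with lower divergence at synonymous sites than at ARs.
    """
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    if n == 0:
        raise ValueError("no non-zero differences")
    below = sum(1 for d in nonzero if d < 0)
    k = min(below, n - below)
    # Under the null, the count below the line is Binomial(n, 0.5);
    # double the smaller tail probability and cap at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return below, n, min(1.0, 2 * tail)
```

A sign test ignores the size of each difference and uses only its direction, which makes it robust to the noisy per-gene divergence estimates that the lowess curve is smoothing over.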
Simulation Results for the Proportion of Significant Likelihood Ratio Tests (LRT) for Positively Selected Genes and for the Number of Positively Selected Sites with Constraint (Denoted by “Constr.”) and without Constraint (Denoted by “No con.”)
Proportion of significant LRTs

| Simulation | GC Content | Positive Selection Simulated | 25% Constraint: No Con. | 25% Constraint: Constr. | 35% Constraint: No Con. | 35% Constraint: Constr. | 45% Constraint: No Con. | 45% Constraint: Constr. |
|---|---|---|---|---|---|---|---|---|
| 1 | Low | No | 0.09 | 0.14 | 0.04 | 0.16 | 0.05 | 0.33 |
| 2 | Average | No | 0.04 | 0.14 | 0.05 | 0.17 | 0.06 | 0.3 |
| 3 | High | No | 0.08 | 0.15 | 0.09 | 0.22 | 0.03 | 0.32 |
| 4 | Low | Yes | 0.53 | 0.71 | 0.47 | 0.86 | 0.51 | 0.96 |
| 5 | Average | Yes | 0.49 | 0.73 | 0.49 | 0.76 | 0.45 | 0.94 |
| 6 | High | Yes | 0.53 | 0.74 | 0.54 | 0.9 | 0.52 | 0.96 |

Mean number of positively evolving sites

| Simulation | GC Content | Positive Selection Simulated | 25% Constraint: No Con. | 25% Constraint: Constr. | 35% Constraint: No Con. | 35% Constraint: Constr. | 45% Constraint: No Con. | 45% Constraint: Constr. |
|---|---|---|---|---|---|---|---|---|
| 1 | Low | No | 5.5 | 6 | 5 | 33 | 5 | 79.5 |
| 2 | Average | No | 6 | 6 | 5 | 25.5 | 4.5 | 74 |
| 3 | High | No | 6 | 6 | 6 | 13 | 6.5 | 77 |
| 4 | Low | Yes | 56.5 | 71 | 59.8 | 85.8 | 53.3 | 90.1 |
| 5 | Average | Yes | 74 | 77 | 74 | 81 | 74 | 86 |
| 6 | High | Yes | 39.5 | 70 | 42.5 | 88 | 50.5 | 96 |
Note.—Results were obtained by simulating 200 data sets of 1,000 codons each, with and without constraint acting on synonymous sites, and applying the branch-site likelihood ratio test of positive selection as implemented in PAML. LRT, likelihood ratio test.
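Each proportion of significant LRTs is the fraction of the 200 simulated data sets in which twice the log-likelihood difference between the alternative and null branch-site models exceeds a critical value. The note does not state which critical value was used; the sketch below assumes the chi-square critical value with 1 degree of freedom at the 5% level (3.84), although a 50:50 mixture of a point mass at zero and chi-square with 1 df (critical value 2.71) is also commonly used for this test. It then computes the fold increase in false positives for the simulations without positive selection (1 to 3) at 45% constraint. The names `proportion_significant` and `CHI2_1DF_ALPHA05` are hypothetical.

```python
from typing import Iterable, Tuple

# Assumed critical value: chi-square distribution, 1 df, alpha = 0.05.
CHI2_1DF_ALPHA05 = 3.841


def proportion_significant(
    lnl_pairs: Iterable[Tuple[float, float]], critical: float = CHI2_1DF_ALPHA05
) -> float:
    """Fraction of data sets whose LRT statistic 2 * (lnL_alt - lnL_null)
    exceeds `critical`. Each pair is (lnL_null, lnL_alt)."""
    stats = [2.0 * (lnl_alt - lnl_null) for lnl_null, lnl_alt in lnl_pairs]
    if not stats:
        raise ValueError("no likelihood pairs given")
    return sum(s > critical for s in stats) / len(stats)


# Proportions of significant LRTs without simulated positive selection
# (simulations 1-3) at 45% constraint, taken from the table above.
no_con = {"Low": 0.05, "Average": 0.06, "High": 0.03}
constr = {"Low": 0.33, "Average": 0.30, "High": 0.32}

for gc in no_con:
    print(f"{gc} GC: false positives up {constr[gc] / no_con[gc]:.1f}-fold")
```

With the tabulated values this gives roughly 6.6-, 5.0-, and 10.7-fold increases for low, average, and high GC content, in line with the abstract's statement that false positives increase several-fold.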