Erika M Kvikstad, Laurent Duret.
Abstract
Elucidating the mechanisms of mutation accumulation and fixation is critical for understanding the nature of genetic variation and its contribution to genome evolution. Of particular interest is the effect of insertions and deletions (indels) on the evolution of genome landscapes. Recent population-scale sequencing efforts provide unprecedented data for analyzing the relative impact of selection versus nonadaptive forces operating on indels. Here, we combined McDonald–Kreitman tests with the analysis of derived allele frequency spectra to investigate the dynamics of allele fixation of short (1–50 bp) indels in the human genome. Our analyses revealed apparently higher fixation probabilities for insertions than for deletions. However, this fixation bias is consistent with neither selection nor biased gene conversion, and it varies with the local mutation rate, being particularly pronounced at indel hotspots. Furthermore, we identified an unprecedented number of loci with evidence for multiple indel events in the primate phylogeny. Even in nonrepetitive sequence contexts (a priori not prone to indel mutations), such loci are 60-fold more frequent than expected under a model of uniform indel mutation rate. This provides evidence of as yet unidentified cryptic indel hotspots. We propose that indel homoplasy, at known and cryptic hotspots, produces systematic errors in the determination of ancestral alleles via parsimony, and we advise caution in interpreting classic selection tests given the strong heterogeneity in indel rates across the genome. These results have important implications for studies seeking to infer the evolutionary forces operating on indels observed in closely related species, because such mutations are traditionally presumed to be homoplasy-free.
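The way homoplasy can corrupt parsimony polarization is easiest to see in a toy case. The sketch below is illustrative only, not the authors' pipeline: it assumes each outgroup (orangutan and rhesus) is reduced to whether it carries the indel's segment, takes the state shared by both outgroups as ancestral, and leaves the site unpolarized when they disagree. The function name `polarize` is hypothetical.

```python
from typing import Optional


def polarize(orangutan_has_segment: bool, rhesus_has_segment: bool) -> Optional[str]:
    """Classify a human indel polymorphism by parsimony (hypothetical sketch).

    Assumes the ancestral state is the state shared by both outgroups.
    Returns "deletion" if the segment is ancestral (so the human allele
    lacking it is derived), "insertion" if both outgroups lack it, or None
    when the outgroups disagree and the site cannot be polarized.
    """
    if orangutan_has_segment != rhesus_has_segment:
        return None
    return "deletion" if orangutan_has_segment else "insertion"


if __name__ == "__main__":
    # True history: the segment is ancestral and was deleted in the human lineage.
    print(polarize(True, True))    # deletion: correct call
    # Homoplasy: the same segment was also deleted independently in orangutan.
    print(polarize(False, True))   # None: the site drops out of the analysis
    # Recurrent deletion on both outgroup branches.
    print(polarize(False, False))  # insertion: the event is mispolarized
```

Under this simplified model, recurrent events on outgroup branches, which are more likely at indel hotspots, turn correct calls into either lost sites or mispolarized ones.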
Keywords: homoplasy; indels; natural selection; sequence evolution
Year: 2013 PMID: 24113537 PMCID: PMC3879449 DOI: 10.1093/molbev/mst185
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Figure: Estimated derived allele frequency (eDAF) spectra for polymorphic deletions (solid bars) and insertions (hashed bars) segregating in the YRI population, by genome annotation. (A) All indels; (B) all indels compared with indels in ARs and in non-AR noncoding sequence. Error bars represent 1 SEM, using a binomial distribution to model the eDAF.
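The caption states that error bars are 1 SEM under a binomial model. A minimal sketch of one way to compute such bars, assuming each bar is the proportion p of the N indels in a class that fall in a given frequency bin, so that SEM = sqrt(p(1 − p)/N). The ten equal-width bins and the function name `daf_spectrum` are illustrative assumptions, not taken from the source.

```python
import math
from collections import Counter


def daf_spectrum(dafs, n_bins=10):
    """Return (bin_lower_edge, proportion, binomial_sem) for each frequency bin.

    dafs: derived allele frequencies in [0, 1] for one indel class
    (e.g. deletions). Assumes the binomial SEM of a proportion,
    sqrt(p * (1 - p) / N), with N the number of indels in the class.
    """
    n = len(dafs)
    # A frequency of exactly 1.0 is folded into the top bin.
    counts = Counter(min(int(f * n_bins), n_bins - 1) for f in dafs)
    spectrum = []
    for b in range(n_bins):
        p = counts[b] / n
        sem = math.sqrt(p * (1 - p) / n)
        spectrum.append((b / n_bins, p, sem))
    return spectrum


if __name__ == "__main__":
    # Hypothetical frequencies, for illustration only.
    for lower, p, sem in daf_spectrum([0.05, 0.05, 0.15, 0.95]):
        print(f"[{lower:.1f}, {lower + 0.1:.1f}): {p:.2f} +/- {sem:.2f}")
```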
Figure: Difference in mean eDAF between (A) AT-to-GC (weak to strong: WS) and GC-to-AT (strong to weak: SW) SNPs and (B) insertions and deletions, as a function of local crossover rate. Sequence variants segregating in the YRI (red), CEU (blue), and JPTCHB (green) populations are shown.
Figure: Heterogeneity in mean eDAF for polymorphic indels located in various DNA contexts. Mean eDAF is shown separately for polymorphic deletions (solid bars) and insertions (hashed bars). Contexts: all indels (All), indel hotspots (hotspot), and nonrepetitive sequence (NR; see Materials and Methods for details). Indels are segregating in the YRI population. Error bars represent 95% confidence intervals of the mean.
Figure: Theoretical estimate of the difference between insertion and deletion mean eDAF (A) and of the estimated deletion-to-insertion ratio (rDIe; B) as a function of the polarization error rate, for various true deletion-to-insertion ratios (rDIt).
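Curves of this kind follow from a simple error model. The sketch below assumes (our assumption, not necessarily the authors' model) that a fraction `eps` of all indels is mispolarized symmetrically, so a mispolarized deletion at derived frequency f is recorded as an insertion at frequency 1 − f and vice versa, and that both classes share the same true mean derived allele frequency `mu`. The function names are hypothetical.

```python
def estimated_ratio(r_true, eps):
    """rDIe when a fraction eps of indels is mispolarized (per 1 true insertion)."""
    called_del = r_true * (1 - eps) + eps   # correct deletions + flipped insertions
    called_ins = (1 - eps) + r_true * eps   # correct insertions + flipped deletions
    return called_del / called_ins


def mean_edaf_difference(r_true, eps, mu):
    """Insertion minus deletion mean eDAF under the same error model.

    mu is the true mean derived allele frequency, assumed equal for both
    classes; a mispolarized allele at frequency f is recorded at 1 - f.
    """
    del_mean = (r_true * (1 - eps) * mu + eps * (1 - mu)) / (r_true * (1 - eps) + eps)
    ins_mean = ((1 - eps) * mu + r_true * eps * (1 - mu)) / ((1 - eps) + r_true * eps)
    return ins_mean - del_mean


if __name__ == "__main__":
    # Hypothetical values: twice as many true deletions, 5% polarization error.
    print(round(estimated_ratio(2.0, 0.05), 3))             # 1.857
    print(round(mean_edaf_difference(2.0, 0.05, 0.2), 3))   # 0.042
```

Under these assumptions, when deletions outnumber insertions (rDIt > 1) and mu < 0.5, mispolarization pulls rDIe toward 1 and gives insertions a higher apparent mean eDAF than deletions, without any selection.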
Modified McDonald–Kreitman Tests of the Estimated Deletion to Insertion Ratio (rDIe) for Indel Polymorphism versus Divergence.
| | All, Nᵃ | All, rDIe | Hotspotᵇ, Nᵃ | Hotspotᵇ, rDIe | NRᶜ, Nᵃ | NRᶜ, rDIe |
|---|---|---|---|---|---|---|
| Polymorphism | 276,495 | 2.22 | 50,297 | 0.97 | 226,198 | 2.75 |
| Divergence | 336,884 | 1.83 (1.3 × 10³) | 74,497 | 0.38 (6.0 × 10³) | 262,387 | 3.0 (1.9 × 10²) |
Note: For each category, rDIe is given separately for polymorphism and divergence; the chi-square statistic of the polymorphism-versus-divergence test is shown in parentheses.
ᵃN, number of indels per category.
ᵇHotspot, indel hotspot loci, i.e. loci exhibiting indel diversity greater than or equal to SNP diversity (see Materials and Methods for details).
ᶜNR, nonrepetitive indels, defined by excluding hotspots.
*Significant χ² statistic (P < 10⁻¹⁶).
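The chi-square values in this table can be approximately reproduced by treating each category as a 2 × 2 table of deletions and insertions in polymorphism versus divergence. The sketch below back-calculates the counts from N and rDIe, so it inherits their rounding, and applies a Pearson chi-square test without continuity correction; that this is exactly the statistic the authors used is an assumption. The names `TABLE`, `split_counts`, and `mk_chi2` are hypothetical.

```python
import math

# (N, rDIe) for polymorphism and divergence, copied from the table.
TABLE = {
    "All": ((276_495, 2.22), (336_884, 1.83)),
    "Hotspot": ((50_297, 0.97), (74_497, 0.38)),
    "NR": ((226_198, 2.75), (262_387, 3.0)),
}


def split_counts(n, ratio):
    """Back-calculate (deletions, insertions) from a total and a D:I ratio."""
    deletions = n * ratio / (1 + ratio)
    return deletions, n - deletions


def mk_chi2(poly, div):
    """Pearson chi-square for [[Dp, Ip], [Dd, Id]] and its 1-df p-value."""
    a, b = split_counts(*poly)
    c, d = split_counts(*div)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    for name, (poly, div) in TABLE.items():
        chi2, p = mk_chi2(poly, div)
        print(f"{name}: chi2 = {chi2:.3g}, P = {p:.1e}")
```

With these inputs the statistics come out at roughly 1.25 × 10³ (All), 6.1 × 10³ (Hotspot), and 1.8 × 10² (NR); the remaining gap from the published values is consistent with rDIe being reported to only two or three significant figures.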
Expected Parsimony Error Rates due to Indel Rate Heterogeneity, Estimated from Models of Indel Sequence Evolution.
| Category | Measure | Data set | Mean | Range (lower) | Range (upper) |
|---|---|---|---|---|---|
| Deletions | FNRᵃ | Divergence | 0.170 | 0.064 | 0.469 |
| Deletions | FNRᵃ | Polymorphism | 0.149 | 0.061 | 0.327 |
| Deletions | FDRᵇ | Divergence | 3.28e−04 | 6.30e−05 | 1.18e−03 |
| Deletions | FDRᵇ | Polymorphism | 8.36e−05 | 2.39e−05 | 2.96e−04 |
| Insertions | FNRᵃ | Divergence | 0.023 | 0 | 0.273 |
| Insertions | FNRᵃ | Polymorphism | 0.020 | 0 | 0.219 |
| Insertions | FDRᵇ | Divergence | 3.42e−03 | 1.07e−03 | 6.49e−03 |
| Insertions | FDRᵇ | Polymorphism | 3.37e−04 | 2.26e−04 | 6.82e−04 |
| rDIeᶜ | rDIt − rDIe | Divergence | 1.217 | −2.220 | 5.439 |
| rDIeᶜ | rDIe (divergence) − rDIe (polymorphism) | Divergence − polymorphism | 0.067 | −4.44e−16 | 0.214 |

ᵃFNR, false negative rate.
ᵇFDR, false discovery rate.
ᶜrDIe, deletion-to-insertion ratio estimated by parsimony.
ᵈSimple, indels at sites with unambiguous polarization, using orangutan and rhesus sequences to determine the ancestral state.
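FNR and FDR here are standard classification error rates: FNR = FN/(TP + FN) and FDR = FP/(TP + FP) for a given event class. A minimal sketch, assuming simulated true event labels are compared with parsimony calls and that a site left unpolarized (None) counts as a missed call; the source does not spell out this bookkeeping, and the function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def error_rates(true_events: Sequence[str], called: Sequence[Optional[str]], event: str):
    """Return (FNR, FDR) of parsimony calls for one event class, e.g. "deletion"."""
    if len(true_events) != len(called):
        raise ValueError("label sequences must have equal length")
    tp = fn = fp = 0
    for truth, call in zip(true_events, called):
        if truth == event and call == event:
            tp += 1
        elif truth == event:
            fn += 1  # missed: called as the other class or left unpolarized
        elif call == event:
            fp += 1  # called as `event` although the true event differs
    fnr = fn / (tp + fn) if tp + fn else math.nan
    fdr = fp / (tp + fp) if tp + fp else math.nan
    return fnr, fdr


def deletion_insertion_ratio(labels: Sequence[Optional[str]]) -> float:
    """Deletion:insertion ratio, ignoring unpolarized (None) sites."""
    n_ins = sum(1 for x in labels if x == "insertion")
    n_del = sum(1 for x in labels if x == "deletion")
    return n_del / n_ins if n_ins else math.nan


if __name__ == "__main__":
    # Hypothetical simulated truth versus parsimony calls.
    truth = ["deletion", "deletion", "insertion", "deletion"]
    calls = ["deletion", None, "insertion", "insertion"]
    print(error_rates(truth, calls, "deletion"))   # FNR = 2/3, FDR = 0
    # rDIt - rDIe for this toy example: 3.0 - 0.5
    print(deletion_insertion_ratio(truth) - deletion_insertion_ratio(calls))
```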