Abstract
Insertions and deletions (INDELs) remain understudied, despite being the most common form of genetic variation after single nucleotide polymorphisms. This stems partly from the challenge of correctly identifying the ancestral state of an INDEL and thus identifying it as an insertion or a deletion. Erroneously assigned ancestral states can skew the site frequency spectrum, leading to artificial signals of selection. Consequently, the selective pressures acting on INDELs are, at present, poorly resolved. To tackle this issue, we have recently published a maximum likelihood approach to estimate the mutation rate and the distribution of fitness effects for INDELs. Our approach estimates and controls for the rate of ancestral state misidentification, overcoming issues plaguing previous INDEL studies. Here, we apply the method to INDEL polymorphism data from ten high-coverage (∼44×) European great tit (Parus major) genomes. We demonstrate that coding INDELs are under strong purifying selection, with only a small proportion making it into the population (∼4%). However, among fixed coding INDELs, 71% of insertions and 86% of deletions are fixed by positive selection. In noncoding regions, we estimate that ∼80% of insertions and ∼52% of deletions are effectively neutral, while the remainder show signatures of purifying selection. Additionally, we see evidence of linked selection reducing INDEL diversity below background levels, both in proximity to exons and in areas of low recombination.
Keywords: adaptive mutation; deletions; distribution of fitness effects; insertions; linked selection
Year: 2019 PMID: 30924871 PMCID: PMC6543879 DOI: 10.1093/gbe/evz068
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
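The abstract notes that a misassigned ancestral state can skew the site frequency spectrum (SFS). The sketch below illustrates one simple way to picture this and is not the authors' likelihood model. It assumes that a mispolarized insertion carried by k of n chromosomes is recorded as a deletion carried by n - k chromosomes, and vice versa. The function name, the error rates, and the example spectra are all hypothetical.

```python
def observe_with_polarization_error(ins_sfs, del_sfs, eps_ins, eps_del):
    """Return the (insertion, deletion) spectra seen after mispolarization.

    ins_sfs[i] and del_sfs[i] hold the number of variants whose derived
    allele is carried by i + 1 of the n sampled chromosomes, so each list
    has n - 1 entries. eps_ins and eps_del are the assumed fractions of
    insertions and deletions whose ancestral state is misidentified.
    """
    m = len(ins_sfs)
    if len(del_sfs) != m:
        raise ValueError("spectra must have the same length")
    # A derived count k = i + 1 mirrors to n - k, which sits at index m - 1 - i.
    obs_ins = [(1 - eps_ins) * ins_sfs[i] + eps_del * del_sfs[m - 1 - i]
               for i in range(m)]
    obs_del = [(1 - eps_del) * del_sfs[i] + eps_ins * ins_sfs[m - 1 - i]
               for i in range(m)]
    return obs_ins, obs_del


# Hypothetical spectra for n = 6 chromosomes (derived counts 1..5).
true_ins = [60.0, 30.0, 20.0, 15.0, 12.0]
true_del = [80.0, 40.0, 27.0, 20.0, 16.0]
obs_ins, obs_del = observe_with_polarization_error(true_ins, true_del, 0.05, 0.05)
print(obs_ins[-1], obs_del[-1])
```

Even a 5% error rate inflates the high-frequency classes in this toy example, because the abundant low-frequency variants of one type are mirrored into the sparse high-frequency classes of the other. An excess of high-frequency derived variants is the kind of pattern that can be mistaken for selection.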
Table 1. Nucleotide Diversity (π) for SNPs, INDELs (Unpolarized), Insertions (ins), and Deletions (del) in Different Genomic Contexts
| Context | SNPs | INDELs (unpolarized) | Insertions | Deletions |
|---|---|---|---|---|
| Genome wide | 0.00310 | 0.000356 | 0.000113 | 0.000142 |
| Ancestral repeats | 0.00432 | 0.000363 | 0.000117 | 0.000175 |
| Intergenic | 0.00333 | 0.000378 | 0.000121 | 0.000154 |
| Introns | 0.00306 | 0.000361 | 0.000116 | 0.000143 |
| CDS | 0.00145 | | | |
| In-frame | — | | | |
| Frameshift | — | | | |
| 4-Fold | 0.00369 | — | — | — |
| 0-Fold | 0.000586 | — | — | — |
| Nonsense | | — | — | — |
Note.—Estimates in parentheses are corrected for polarization error.
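For readers unfamiliar with the statistic, per-site nucleotide diversity (π) can be sketched as the average number of pairwise differences between sampled chromosomes, divided by the number of callable sites. The example below is a minimal illustration, not the authors' pipeline. It assumes 20 sampled chromosomes (ten diploid individuals), and the allele counts and site total are made up.

```python
def nucleotide_diversity(derived_counts, n_chromosomes, n_callable_sites):
    """Per-site pi from derived-allele counts at biallelic variant sites.

    derived_counts: one integer per variant site, the number of sampled
        chromosomes carrying the derived (or minor) allele.
    n_chromosomes: sample size in haploid copies.
    n_callable_sites: all sites surveyed, variant or not.
    """
    n_pairs = n_chromosomes * (n_chromosomes - 1) / 2
    # A site with k derived copies differs in k * (n - k) of the pairs.
    differing_pairs = sum(k * (n_chromosomes - k) for k in derived_counts)
    return differing_pairs / n_pairs / n_callable_sites


# Hypothetical data: three variant sites among 10,000 callable sites.
pi = nucleotide_diversity([1, 10, 5], n_chromosomes=20, n_callable_sites=10_000)
print(f"{pi:.3e}")
```

Because k * (n - k) is symmetric in k, this calculation gives the same value whether or not the variant has been polarized, which is why π can be reported for unpolarized INDELs.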
Fig. 1.—Tajima’s D (a) and divergence (b) estimates for SNPs, INDELs (unpolarized), insertions (INS), and deletions (DEL) in different genomic contexts. Divergence estimates for SNPs are presented as the true divergence divided by 10.
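As a reminder of what panel (a) measures, Tajima's D compares two estimators of diversity: the mean number of pairwise differences and the number of segregating sites scaled by a harmonic sum. The sketch below implements the standard formula from allele counts. It is an illustration with made-up inputs and makes no claim about how the authors computed their estimates.

```python
import math


def tajimas_d(derived_counts, n):
    """Tajima's D from per-site allele counts in a sample of n chromosomes."""
    # Keep only sites that are actually segregating in the sample.
    seg = [k for k in derived_counts if 0 < k < n]
    s = len(seg)
    if s == 0:
        return float("nan")
    # Mean number of pairwise differences, summed over sites.
    pi_total = sum(2 * k * (n - k) for k in seg) / (n * (n - 1))
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi_total - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical samples of 20 chromosomes.
d_rare = tajimas_d([1] * 10, 20)     # excess of rare variants
d_common = tajimas_d([10] * 10, 20)  # excess of intermediate-frequency variants
print(d_rare, d_common)
```

An excess of rare variants gives a negative D, and an excess of intermediate-frequency variants gives a positive D.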
Table 2. Maximum Likelihood Parameter Estimates for the Best-Fitting Models for INDELs in CDS Regions and Noncoding Regions
| Model and DFE | Variant Type | C | θ | γ | Scale | Shape | ϵ | α (%) |
|---|---|---|---|---|---|---|---|---|
| CDS: equal mutation rate; discrete; ancestral repeat reference | Insertions | 1 | | –1.14 | — | — | 0.0799 | |
| | Insertions | 2 | 0.000134 | –801 | — | — | 0.000307 | 71 |
| | Deletions | 1 | | –2.70 | — | — | 0.0368 | |
| | Deletions | 2 | 0.000206 | –649 | — | — | | 86 |
| CDS: equal mutation rate; discrete; noncoding reference | Insertions | 1 | | –0.264 | — | — | 0.0729 | |
| | Insertions | 2 | 0.000156 | –897 | — | — | 0.000526 | 63 |
| | Deletions | 1 | | –1.70 | — | — | 0.0366 | |
| | Deletions | 2 | 0.000205 | –629 | — | — | 0.00587 | 79 |
| Noncoding: free mutation rate; continuous | Insertions | — | 0.000170 | –53.6 | 1,553 | 0.0345 | 0.0110 | — |
| | Deletions | — | 0.000293 | –75.5 | 715 | 0.106 | 0.0166 | — |
Note.—C defines the number of site classes, θ the population-scaled mutation rate, γ the population-scaled selection coefficient, ϵ the polarization error, and α the proportion of INDEL substitutions driven by positive selection. Where γ values are presented for the continuous model, these are mean γ estimates, equal in magnitude to the product of the scale and shape parameters.
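The relationship between the mean γ and the scale and shape parameters can be checked directly against the two continuous-model rows of the table. The snippet below is only an arithmetic check using the rounded values reported there. The helper name is hypothetical, and small discrepancies are expected from rounding.

```python
def gamma_mean_magnitude(scale, shape):
    """Mean of a gamma distribution with the given scale and shape."""
    return scale * shape


# (scale, shape, reported mean gamma) from the noncoding rows of the table.
reported = {
    "insertions": (1553, 0.0345, -53.6),
    "deletions": (715, 0.106, -75.5),
}

for variant, (scale, shape, mean_gamma) in reported.items():
    product = gamma_mean_magnitude(scale, shape)
    # The table reports gamma as negative (deleterious), so compare magnitudes.
    print(f"{variant}: scale x shape = {product:.1f}, reported |gamma| = {abs(mean_gamma)}")
```

Both products agree with the reported mean γ values to within the precision of the rounded parameters.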
Fig. 2.—DFEs for noncoding insertions (INS NC), noncoding deletions (DEL NC), coding insertions (INS CDS), and coding deletions (DEL CDS), shown as the proportion of mutations falling into different selection coefficient (γ) bins.
Fig. 3.—Relationship between mutation rate estimates (θ) for insertions (turquoise) and deletions (purple) and distance from exons in 2-kb windows. Dashed lines represent the genome-wide average mutation rate for noncoding variants, as shown in table 2.
Fig. 4.—The relationship between local recombination rate (log transformed) and π (a) and Tajima’s D (b) for both insertions (turquoise) and deletions (purple).