B C Jackson, J L Campos, K Zeng.
Abstract
Using the data provided by the Drosophila Population Genomics Project, we investigate factors that affect the genetic differentiation between Rwandan and French populations of D. melanogaster. By examining within-population polymorphisms, we show that sites in long introns (especially those >2000 bp) have significantly lower π (nucleotide diversity) and more low-frequency variants (as measured by Tajima's D, minor allele frequencies, and prevalence of variants that are private to one of the two populations) than short introns, suggesting a positive relationship between intron length and selective constraint. A similar analysis of protein-coding polymorphisms shows that 0-fold (degenerate) sites in more conserved genes are under stronger purifying selection than those in less conserved genes. There is limited evidence that selection on codon bias has an effect on differentiation (as measured by FST) at 4-fold (degenerate) sites, and 4-fold sites and sites in 8-30 bp of short introns ⩽65 bp have comparable FST values. Consistent with the expected effect of purifying selection, sites in long introns and 0-fold sites in conserved genes are less differentiated than those in short introns and less conserved genes, respectively. Genes in non-crossover regions (for example, the fourth chromosome) have very high FST values at both 0-fold and 4-fold degenerate sites, which is probably because of the large reduction in within-population diversity caused by tight linkage between many selected sites. Our analyses also reveal subtle statistical properties of FST, which arise when information from multiple single nucleotide polymorphisms is combined and can lead to the masking of important signals of selection.
Year: 2014 PMID: 25227256 PMCID: PMC4270736 DOI: 10.1038/hdy.2014.80
Source DB: PubMed Journal: Heredity (Edinb) ISSN: 0018-067X Impact factor: 3.821
Table 1. Summary statistics for loci in crossover (C) regions
| Chromosome | Site type | Population | π | Tajima's D | MAF | FST (Eq. 5) | FST (Eq. 6) |
|---|---|---|---|---|---|---|---|
| A | 0-fold | RG | 0.0012 | −0.8397 | 0.1222 | 0.1516 | 0.1709 |
| A | 0-fold | FR | 0.0010 | −0.2586 | | | |
| A | 4-fold | RG | 0.0154 | −0.1069 | 0.1653 | 0.1684 | 0.1743 |
| A | 4-fold | FR | 0.0119 | 0.1116 | | | |
| A | SI | RG | 0.0145 | −0.1380 | 0.1630 | 0.1677 | 0.1766 |
| A | SI | FR | 0.0113 | 0.1413 | | | |
| X | 0-fold | RG | 0.0012 | −1.1907 | 0.1073 | 0.1653 | 0.2924 |
| X | 0-fold | FR | 0.0005 | −0.2293 | | | |
| X | 4-fold | RG | 0.0166 | −0.4679 | 0.1367 | 0.1903 | 0.2879 |
| X | 4-fold | FR | 0.0068 | 0.1412 | | | |
| X | SI | RG | 0.0160 | −0.4561 | 0.1379 | 0.2033 | 0.3173 |
| X | SI | FR | 0.0061 | 0.3414 | | | |
Abbreviations: MAF, minor allele frequency.
π and Tajima's D were calculated using data from within a subpopulation for the type of site under consideration.
MAF and the F-statistics were calculated using data from both subpopulations for the type of site under consideration, so one value is given per pair of populations. The F-statistics are defined by Equations (5) and (6).
Population of origin: RG, Rwandan; FR, French.
SI: sites from the 8–30 bp regions of short introns ⩽65 bp.
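The within-population statistics reported above, π and Tajima's D, can be illustrated with a minimal sketch. It assumes the textbook estimators (mean pairwise differences per site, and Tajima's 1989 normalisation of the difference between π and Watterson's estimator); the paper's Materials and Methods may differ in detail. The function name `pi_and_tajimas_d` and its inputs are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pi_and_tajimas_d(allele_counts, n, n_sites):
    """Return (pi per site, Tajima's D) for one population sample.

    allele_counts: copies of one allele at each segregating site (1..n-1)
    n:             number of sampled chromosomes (assumed equal at all sites)
    n_sites:       total number of sites surveyed, including invariant ones
    """
    if any(k <= 0 or k >= n for k in allele_counts):
        raise ValueError("every listed site must be segregating (0 < k < n)")

    s = len(allele_counts)  # number of segregating sites
    # Sum over sites of the expected pairwise difference at a biallelic site.
    pi_total = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in allele_counts)
    if s == 0:
        return 0.0, float("nan")

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # D contrasts pi with Watterson's estimator S / a1.
    d = (pi_total - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
    return pi_total / n_sites, d
```

Under these assumptions, an excess of low-frequency variants pulls D below zero, which is the pattern the negative Rwandan (RG) values in the table reflect. For the Rwandan sample one would pass `n=17`, the number of lines given in the Figure 1 caption.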
Figure 1. Polymorphism patterns within 17 Rwandan D. melanogaster lines for coding sequence (CDS) binned by K value (to D. yakuba), and for sites in the 8–30 bp regions of short introns ⩽65 bp (SI sites). (a) Nucleotide diversity (π) for autosomal CDS-C regions and (b) X-linked CDS-C regions; (c) Tajima's D for autosomal CDS-C regions and (d) X-linked CDS-C regions. The x axes show the maximum K value in each bin. Symbols: 0-fold degenerate sites, open circles; 4-fold degenerate sites, open triangles; SI sites, open red squares.
Figure 2. Differentiation patterns between 7 French and 17 Rwandan D. melanogaster lines for coding sequence (CDS) binned by K value (to D. yakuba), and for SI sites. (a) Unweighted mean FST (Equation (5)) for autosomal CDS-C regions and (b) X-linked CDS-C regions; (c) population-average MAF for autosomal CDS-C regions and (d) X-linked CDS-C regions; (e) the proportion of SNPs per bin in which one allele was private to one of the D. melanogaster populations for autosomal CDS-C regions and (f) X-linked CDS-C regions. Symbols: 0-fold degenerate sites, open circles; 4-fold degenerate sites, open triangles; SI sites, open red squares.
Figure 3. Divergence and polymorphism patterns for intronic sites binned by intron length. (a) Divergence (K) between D. melanogaster and D. simulans for autosomal introns and (b) X-linked introns; (c) nucleotide diversity (π) for autosomal introns and (d) X-linked introns; (e) Tajima's D for autosomal introns and (f) X-linked introns. The x axes display the maximum intron length in each bin. Note that the number of SNPs in each autosomal intron bin is roughly the same as that in the autosomal SI bin; the same applies to the X-linked data. Symbols: long intronic sites, open circles; sites at positions 8–30 bp of short introns ⩽65 bp (SI sites), open red squares.
Figure 4. Differentiation between 7 French and 17 Rwandan D. melanogaster lines for long intronic sites binned by intron length, and for SI sites. (a) Unweighted mean FST (Equation (5)) for autosomal introns and (b) X-linked introns. Symbols: long intronic sites, open circles; SI sites, open red squares.
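The abstract notes that FST behaves differently depending on how information from multiple SNPs is combined. Equations (5) and (6) are not reproduced here, so the following is only a hedged sketch of the general idea, using a simple heterozygosity-based per-SNP FST for two equally weighted populations with no sample-size correction. The function names and the example frequencies are hypothetical.

```python
def snp_components(p1, p2):
    """Within- and total-population heterozygosity at one biallelic SNP.

    p1, p2: frequency of the same allele in population 1 and population 2.
    """
    h_within = p1 * (1 - p1) + p2 * (1 - p2)  # mean of 2p(1-p) over both
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    return h_within, h_total


def fst_mean_of_ratios(freqs):
    """Unweighted mean: compute FST per SNP, then average across SNPs."""
    values = []
    for p1, p2 in freqs:
        h_within, h_total = snp_components(p1, p2)
        if h_total > 0:  # skip sites that are fixed in the pooled sample
            values.append(1 - h_within / h_total)
    return sum(values) / len(values)


def fst_ratio_of_sums(freqs):
    """Weighted alternative: sum the components first, then take one ratio."""
    components = [snp_components(p1, p2) for p1, p2 in freqs]
    total_within = sum(c[0] for c in components)
    total_total = sum(c[1] for c in components)
    return 1 - total_within / total_total


# One rare, nearly private variant and one common, strongly differentiated SNP.
example = [(0.0, 0.1), (0.2, 0.8)]
unweighted = fst_mean_of_ratios(example)
weighted = fst_ratio_of_sums(example)
```

In this toy example the rare variant has a low per-SNP FST and drags the unweighted mean down, whereas the ratio of sums is dominated by the common SNP. This is one way in which the choice of averaging scheme can change the apparent level of differentiation for a class of sites rich in low-frequency variants.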
Table 2. Summary statistics for loci in non-crossover (NC) regions
| Chromosome | Site type | Population | π | Tajima's D | MAF | FST (Eq. 5) | FST (Eq. 6) |
|---|---|---|---|---|---|---|---|
| A | 0-fold | RG | 0.00036 | −0.6737 | 0.1152 | 0.1817 | 0.2302 |
| A | 0-fold | FR | 0.00032 | −0.7098 | | | |
| A | 4-fold | RG | 0.00129 | −0.5274 | 0.1208 | 0.1906 | 0.2281 |
| A | 4-fold | FR | 0.00122 | −0.5417 | | | |
| X | 0-fold | RG | 0.00056 | −0.6392 | 0.1556 | 0.3012 | 0.5673 |
| X | 0-fold | FR | 0.00023 | −0.3126 | | | |
| X | 4-fold | RG | 0.00327 | −0.0084 | 0.1395 | 0.2323 | 0.3485 |
| X | 4-fold | FR | 0.00090 | 0.2069 | | | |
Abbreviations: FR, French; MAF, minor allele frequency; RG, Rwandan.
The statistics were obtained in the same way as in Table 1; see Materials and Methods for more details.
Figure 5. Differentiation between 7 French and 17 Rwandan D. melanogaster lines for 4-fold degenerate sites and SI sites in C regions as a function of local recombination rate, and for 4-fold degenerate sites in NC regions. (a) FST for autosomal CDS regions and (b) autosomal SI regions.