Evidence that natural selection on codon usage in Drosophila pseudoobscura varies across codons.

Richard M Kliman

Abstract

Like other species of Drosophila, Drosophila pseudoobscura has a distinct bias toward the usage of C- and G-ending codons. Previous studies have indicated that this bias is due, at least in part, to natural selection. Codon bias clearly differs among amino acids (and other codon classes) in Drosophila, which may reflect differences in the intensity of selection on codon usage. Ongoing natural selection on synonymous codon usage should be reflected in the shapes of the site frequency spectra of derived states at polymorphic positions. Specifically, regardless of other demographic effects on the spectrum, it should be shifted toward higher values for changes from less-preferred to more-preferred codons, and toward lower values for the converse. If the intensity of natural selection is increased, shifts in the site frequency spectra should be more pronounced. A total of 33,729 synonymous polymorphic sites on Chromosome 2 in D. pseudoobscura were analyzed. Shifts in the site frequency spectra are consistent with differential intensity of natural selection on codon usage, with stronger shifts associated with higher codon bias. The shifts, in general, are greater for polymorphic synonymous sites than for polymorphic intron sites, also consistent with natural selection. However, unlike observations in D. melanogaster, codon bias is not reduced in areas of low recombination in D. pseudoobscura; the site frequency spectrum signal for selection on codon usage remains strong in these regions. However, diversity is reduced, as expected. It is possible that estimates of low recombination reflect a recent change in recombination rate.

Keywords:  Drosophila pseudoobscura; codon bias; natural selection; recombination; site frequency spectrum

Year:  2014        PMID: 24531731      PMCID: PMC4059240          DOI: 10.1534/g3.114.010488

Source DB:  PubMed          Journal:  G3 (Bethesda)        ISSN: 2160-1836            Impact factor:   3.154


The relative usage of synonymous codons varies among genes within an organism. In some organisms (e.g., humans), this variation may largely reflect base composition variation across the genome (Bernardi ; Kliman and Bernal 2005). In many organisms, however, natural selection appears to directly influence codon usage, with positive correlations between the levels of codon bias and gene expression that are consistent with selection on translational efficiency and/or fidelity (Chavancy ; Gouy and Gautier 1982; Ikemura 1985; Akashi 1994; Moriyama and Powell 1997; Kliman and Henry 2005; Plotkin and Kudla 2011). This relationship was first reported for Drosophila by Shields , and numerous studies using diverse approaches have supported the hypothesis that natural selection influences codon usage in several Drosophila species (Kliman and Hey 1993; Akashi and Schaeffer 1997; Carlini and Stephan 2003; Haddrill ; De Proce ). Effective weak selection on codon usage requires a sufficient effective population size to overcome the effects of genetic drift [although see Hershberg and Petrov (2008), who point out that selection on codon usage may not always be weak]. However, evidence has been emerging that natural selection may even be influencing codon usage in humans (Lavner and Kotlar 2005), mammals more generally (Yang and Nielsen 2008), and other vertebrates (Doherty and McInerney 2013). Among the studies supporting the hypothesis that selection influences codon usage are (1) those that show shifts in the site frequency spectra (SFS) of derived states at synonymous polymorphic sites, such that the SFS is shifted toward higher values for changes to more preferred codons (Akashi and Schaeffer 1997; Kliman 1999; Llopart ); and (2) those that show significantly reduced codon bias in areas of the genome with very low recombination rates (Kliman and Hey 1993; Hey and Kliman 2002).
The latter is consistent with the expectation that natural selection will be less effective in the absence of recombination due to linkage disequilibrium among targets of selection; these are often in repulsion due to independent emergence of selectively favored or disfavored mutations on different copies of a chromosome in a population (Hill and Robertson 1966; Felsenstein 1974; McVean and Charlesworth 2000; Comeron ). Limited recombination among targets of selection is also predicted to lead to reduced diversity, whether by selective sweeps that wipe out standing variation (Maynard Smith and Haigh 1974; Gillespie 2000) or by background selection against continually arising deleterious mutations that prevent diversity from accumulating in the first place (Charlesworth ; McVean and Charlesworth 2000). This prediction was most notably confirmed by Begun and Aquadro (1992) in D. melanogaster and has been corroborated by subsequent studies in Drosophila (Comeron ) and other organisms [reviewed by Nachman (2002) and Stephan (2010)]. These earlier studies often relied on estimates of recombination rate derived by fitting recombination maps to physical maps, using a variety of line- or curve-fitting approaches. The advent of “next-generation” DNA-sequencing methods has allowed investigators to identify numerous single-nucleotide polymorphisms that can be used in testcrosses to directly estimate recombination rate at a fine scale. Cirulli directly estimated recombination rates across a section of the D. pseudoobscura X chromosome and found considerable heterogeneity in recombination rate. Kulathinal showed that estimates of recombination rate at finer scales (i.e., with more densely spaced markers) correlated better with diversity, a finding that suggests that fine-scale recombination rates are more reliable when they can be obtained. 
McGaugh extended this work to three complete chromosomes, not only confirming heterogeneity, but showing that estimates could be replicated using crosses of different strains. Importantly, McGaugh also sequenced 10 additional D. pseudoobscura genomes (along with those of other close relatives) and observed the predicted correlation between recombination rate and diversity. Stevison and Noor (2010) observed a similar relationship between fine-scale recombination rate and diversity in closely related D. persimilis. Although previous polymorphism-based studies of natural selection on codon usage in D. pseudoobscura have relied on hundreds of variable sites, the Chromosome 2 data from McGaugh provide tens of thousands of variable sites. These data, therefore, allow us to much more thoroughly investigate the effects of natural selection on codon usage in this species. In addition to providing increased statistical power to detect subtle effects, it becomes possible to subdivide data and retain statistical power to test for differential effects. In particular, we investigate whether variation among amino acids in the degree of codon bias can be explained by variation in selection intensity. Furthermore, because of its generally higher recombination rate, analysis with D. pseudoobscura provides a valuable contrast to D. melanogaster. As expected, we observe a fairly strong correlation between recombination rate and diversity. Although other composition-biasing processes may be affecting patterns of diversity, most notably G/C-biased gene conversion (Marais , 2003; Singh ), a comparison of SFSs of synonymous and intron sites shows that the SFS shifts are significantly stronger at synonymous sites. Therefore, although selectively neutral influences may be partially responsible for the observed SFS shifts, the data support an influence of natural selection on codon usage.
Furthermore, differences among subsets of codons in the SFS shifts are consistent with the differential influence of natural selection. We do not, however, find that selection on codon usage is consistently weaker in areas with lower estimates of recombination rate in D. pseudoobscura.

Materials and Methods

Data set

Chromosome 2 was recently sequenced in 10 strains of D. pseudoobscura, along with one strain of the outgroup D. lowei, using Illumina platforms (McGaugh ) (National Center for Biotechnology Information sequence read archive accession numbers SRA044960.1, SRA044955.2, and SRA044956.1). The reference strain of D. pseudoobscura (Richards ) v2.9 was also included in analyses. The D. pseudoobscura strains represent nearly isogenic lines generated by full-sibling matings over several generations (Machado ; McGaugh ). Population structure is very limited in D. pseudoobscura (Schaeffer and Miller 1992; Noor ), making it unlikely that the choice of strains, including the reference strain, would influence patterns of diversity. Of the 55 possible pairwise contrasts, 10 of the 11 that showed the lowest pairwise difference at synonymous sites included the reference strain. The reference strain also contributes the smallest number of derived singletons to the polymorphism data set, suggesting that, if anything, there are fewer sequencing errors in the reference strain than in the others.

Genes

Genes were excluded from analysis if any of the following criteria were met in the reference sequence: the annotated start codon was not AUG; the annotated stop codon was not UGA, UAG, or UAA; any amino acid codon was incompletely resolved; there was a premature stop codon; or the intron/exon boundaries were noncanonical. A total of 3548 genes met all inclusion criteria.

Recombination rates

Estimates were obtained from a pair of testcrosses (Flagstaff and Pikes Peak) described in McGaugh . The Flagstaff testcross involved two strains from Arizona bearing the “Arrowhead” arrangement on chromosome 3; the Pikes Peak testcross involved two lines from New Mexico bearing the “Pikes Peak” arrangement on chromosome 3. As these authors noted, recombination rates estimated from the independent testcrosses were very similar along Chromosome 2 (Figure 1). Estimates were obtained for 140 segments in the Flagstaff cross and for 158 segments in the Pikes Peak cross. Except near the ends of the chromosome, most positions are represented in both recombination maps. Unless otherwise stated, for all analyses, the average of the two recombination rates was used for these positions.
Figure 1

Recombination rate along Chromosome 2. Estimates are from independent testcrosses reported in McGaugh .


Site inclusion criteria

For each sequence, excluding the reference strain, a base was considered unresolved ('N') if either the phred consensus quality score was below 30 or the depth of coverage in the alignment was below 15×. To be included in the analysis, an intron site or complete codon (i.e., all three sites) had to be completely resolved in all 12 strains. A total of 35,376 codons meeting these criteria displayed synonymous polymorphism in D. pseudoobscura. Of these, 33,755 segregated two character states; 26 of these loci were in regions for which no recombination rate has been obtained (McGaugh ) and were excluded from analyses involving recombination. A total of 68,332 intron sites meeting the inclusion criteria were polymorphic in D. pseudoobscura; of these, 65,266 segregated two character states.
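The inclusion rule above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the study: the function names and the list-of-codons input are hypothetical, and only the two thresholds (quality 30, depth 15) come from the text. In the study the masking was applied to every sequence except the reference strain.

```python
MIN_QUALITY = 30  # phred consensus quality threshold stated in the text
MIN_DEPTH = 15    # depth-of-coverage threshold stated in the text


def mask_base(base, quality, depth):
    """Return the base call, or 'N' if it fails either threshold."""
    if quality < MIN_QUALITY or depth < MIN_DEPTH:
        return "N"
    return base


def codon_included(codons):
    """True if the codon is fully resolved (contains no 'N') in every strain.

    `codons` holds one three-letter string per strain (hypothetical layout).
    """
    return all("N" not in codon for codon in codons)
```

A base with quality 29 is masked even at high depth, and a single masked position in any strain removes the whole codon from the analysis.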

Statistical analyses

Statistical analyses were performed using R v3.0.1 for Mac OS X (the R Foundation for Statistical Computing), implemented in RStudio v0.97.551 (RStudio 2013).

Inference of preferred codons

Following Hey and Kliman (2002), preferred codons were inferred by factor analysis on the 59 codons that encode the 18 degenerate amino acids. Only genes that used all 18 amino acids were included in the factor analysis. The primary factor was polarized, such that values correlated positively with Chi/L (Shields ) and negatively with the effective number of codons (ENC; Wright 1990); both are measures of uneven codon usage, with Chi/L increasing and ENC decreasing as codon usage becomes less even. Codons that loaded positively on the primary factor were considered “preferred.” The degree of preference (or “preference score”) of each codon was defined as its loading score (Llopart ). For each synonymous polymorphic site, Δpref was defined as pref_derived − pref_ancestral, where pref_derived and pref_ancestral are the codon preference scores for the derived and ancestral states, respectively, as inferred by parsimony (Llopart ). For convenience, a site is defined as P→U if Δpref is negative, and as U→P if Δpref is positive. This designation is not clear-cut for amino acids with degeneracy above 2; for example, a mutation may substitute one preferred codon with another preferred codon. However, for our analyses, the polarity of the fitness effect is more important than the assignment of “preferred” or “unpreferred.”
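As a small worked illustration of the Δpref definition, the sketch below uses the four alanine preference scores reported in Table 1. The function names and the ASCII labels "U->P" and "P->U" are illustrative choices, not part of the original analysis.

```python
# Preference scores for the four alanine codons, copied from Table 1.
PREF = {"GCC": 0.571, "GCG": 0.076, "GCT": -0.401, "GCA": -0.491}


def delta_pref(ancestral, derived):
    """Delta-pref = pref(derived codon) - pref(ancestral codon)."""
    return PREF[derived] - PREF[ancestral]


def polarity(ancestral, derived):
    """Label a site by the sign of delta-pref."""
    change = delta_pref(ancestral, derived)
    if change > 0:
        return "U->P"
    if change < 0:
        return "P->U"
    return "none"
```

A GCT→GCC change has Δpref = 0.571 − (−0.401) = 0.972 and is labeled U→P. A GCG→GCC change is also labeled U→P (Δpref = 0.495) even though both codons have positive scores, which is the ambiguity the text notes for amino acids with degeneracy above 2.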

Analysis of diversity

SFS of derived preferred vs. derived unpreferred codons (or similar contrasts) were compared using parametric and nonparametric tests (Akashi and Schaeffer 1997), as well as a permutation test (Llopart ; described in context under Site frequency spectra relative to Δpref).

Results

All C-ending codons, and all but three G-ending codons, are preferred in D. pseudoobscura. All A- and T-ending codons are unpreferred (Table 1).
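Table 1

Codon preference scores

| Codon | Amino Acid | Pref Score | Codon | Amino Acid | Pref Score |
|---|---|---|---|---|---|
| GCC | ala | 0.571 | CTG | leu | 0.697 |
| GCG | ala | 0.076 | CTC | leu | 0.316 |
| GCT | ala | −0.401 | TTG | leu | −0.434 |
| GCA | ala | −0.491 | CTA | leu | −0.443 |
| CGC | arg | 0.501 | CTT | leu | −0.458 |
| CGG | arg | 0.163 | TTA | leu | −0.572 |
| CGT | arg | −0.090 | AAG | lys | 0.700 |
| AGG | arg | −0.188 | AAA | lys | −0.700 |
| CGA | arg | −0.279 | TTC | phe | 0.534 |
| AGA | arg | −0.471 | TTT | phe | −0.534 |
| AAC | asn | 0.461 | CCC | pro | 0.425 |
| AAT | asn | −0.461 | CCG | pro | 0.211 |
| GAC | asp | 0.392 | CCT | pro | −0.404 |
| GAT | asp | −0.392 | CCA | pro | −0.442 |
| TGC | cys | 0.304 | TCC | ser | 0.287 |
| TGT | cys | −0.304 | AGC | ser | 0.270 |
| CAG | gln | 0.656 | TCG | ser | 0.228 |
| CAA | gln | −0.656 | AGT | ser | −0.297 |
| GAG | glu | 0.724 | TCT | ser | −0.351 |
| GAA | glu | −0.724 | TCA | ser | −0.456 |
| GGC | gly | 0.430 | ACC | thr | 0.435 |
| GGG | gly | −0.083 | ACG | thr | 0.204 |
| GGT | gly | −0.222 | ACT | thr | −0.357 |
| GGA | gly | −0.291 | ACA | thr | −0.416 |
| CAC | his | 0.331 | TAC | tyr | 0.421 |
| CAT | his | −0.331 | TAT | tyr | −0.421 |
| ATC | ile | 0.584 | GTG | val | 0.453 |
| ATT | ile | −0.345 | GTC | val | 0.245 |
| ATA | ile | −0.405 | GTA | val | −0.489 |
|  |  |  | GTT | val | −0.492 |

Pref, preference.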

Estimates of diversity, divergence, and codon bias

Synonymous sites were counted taking into account the degeneracy of codons. For example, a fourfold degenerate codon would provide one synonymous site, whereas a twofold degenerate codon would provide 1/3 of a synonymous site. A total of 543,985 synonymous sites and 2,746,629 intron sites were completely resolved in all 12 strains. The Watterson (1975) estimator of synonymous θ in D. pseudoobscura was 0.0222 across all sites; average synonymous divergence from D. lowei was 0.0760. Diversity and divergence varied among amino acids (Table 2), with twofold degenerate amino acids having higher values of both. Watterson’s estimator of intron θ in D. pseudoobscura was 0.0085; divergence from D. lowei was 0.0297. The lower values for intron sites probably reflect a larger denominator, as each intron site was counted as one full site.
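The Watterson estimates quoted above can be checked from the site counts given in the text. The sketch below assumes the estimator was applied to all polymorphic sites (35,376 synonymous; 68,332 intron) with k = 11 D. pseudoobscura sequences; that assumption is not stated explicitly in the text, but it reproduces the reported values. The function names are illustrative.

```python
def harmonic_a(k):
    """a = 1 + 1/2 + ... + 1/(k - 1) for a sample of k sequences."""
    return sum(1.0 / i for i in range(1, k))


def watterson_theta(polymorphic_sites, total_sites, k):
    """Watterson's estimator of theta per site: S / (a * L)."""
    return polymorphic_sites / (harmonic_a(k) * total_sites)


# Counts taken from the text; k = 11 D. pseudoobscura sequences.
theta_syn = watterson_theta(35376, 543985.0, 11)
theta_intron = watterson_theta(68332, 2746629, 11)
print(round(theta_syn, 4), round(theta_intron, 4))
```

Because each intron site counts as a full site while most codons contribute less than one synonymous site, the intron denominator is relatively larger, as the text notes.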
Table 2

Estimates of synonymous divergence and diversity

| Amino Acid | N_syn (a) | S_syn (b) | S_syn(2) (c) | D_syn (d) | θ_W/bp (e) | D/bp (f) |
|---|---|---|---|---|---|---|
| All | 543,985.0 | 35,376 | 33,729 | 41,360 | 0.022203 | 0.076032 |
| ala | 50,380.0 | 2661 | 2513 | 3126 | 0.018033 | 0.062048 |
| arg | 48,644.0 | 2558 | 2365 | 3047 | 0.017954 | 0.062639 |
| asn | 13,894.3 | 1282 | 1281 | 1621 | 0.031502 | 0.116667 |
| asp | 15,130.0 | 1303 | 1302 | 1835 | 0.029403 | 0.121282 |
| cys | 5,124.6 | 500 | 500 | 576 | 0.033311 | 0.112398 |
| glu | 18,283.6 | 1733 | 1731 | 1936 | 0.032361 | 0.105887 |
| gln | 12,420.7 | 1034 | 1034 | 1151 | 0.028422 | 0.092668 |
| gly | 40,903.0 | 2600 | 2427 | 2932 | 0.021702 | 0.071682 |
| his | 7051.3 | 593 | 593 | 813 | 0.028713 | 0.115298 |
| ile | 31,135.4 | 1833 | 1782 | 2028 | 0.020100 | 0.065135 |
| leu | 91,350.6 | 5619 | 5139 | 6313 | 0.021001 | 0.069107 |
| lys | 17,437.3 | 1500 | 1500 | 1817 | 0.029370 | 0.104202 |
| phe | 11,413.0 | 1358 | 1356 | 1634 | 0.040624 | 0.143170 |
| pro | 35,924.0 | 2137 | 1975 | 2481 | 0.020310 | 0.069062 |
| ser | 44,903.3 | 2865 | 2735 | 3331 | 0.021784 | 0.074182 |
| thr | 43,553.0 | 2406 | 2257 | 2775 | 0.018861 | 0.063715 |
| tyr | 9123.6 | 851 | 849 | 1062 | 0.031845 | 0.116401 |
| val | 47,313.0 | 2543 | 2390 | 2882 | 0.018351 | 0.060913 |

(a) Number of synonymous sites in D. pseudoobscura.

(b) Number of synonymous polymorphic sites in D. pseudoobscura.

(c) Number of synonymous polymorphic sites segregating two codons in D. pseudoobscura, and for which a recombination rate estimate is available.

(d) Number of divergent synonymous sites between the reference strain of D. pseudoobscura v2.9 (Richards ) and D. lowei for codons fully resolved in all D. pseudoobscura strains and in D. lowei.

(e) Watterson (1975) estimator of synonymous theta in D. pseudoobscura.

(f) Synonymous divergence per base pair.

Association of diversity and codon bias with recombination rate

Using average recombination rate (Flagstaff and Pikes Peak), loci were placed into 25 recombination categories: 0.00−0.25 cM/Mb, >0.25−0.50 cM/Mb, ..., >5.75−6.00 cM/Mb, and >6.00 cM/Mb. (No sites are in regions with >0.25−0.50 cM/Mb.) As shown in Figure 2A, synonymous diversity increases with recombination rate (defined by the upper bound of the category) until about 2 cM/Mb, at which point diversity levels off [r = 0.800, 21 degrees of freedom (d.f.), 1-tailed P < 10^−5]. A very similar relationship is observed for intron diversity (r = 0.703, 21 d.f., 1-tailed P < 10^−4) (Figure 2B). However, there is no clear relationship between recombination rate and the frequency of optimal codons [Fop, a measure of preferred codon usage (Sharp and Devine 1989)] (r = 0.209, 21 d.f., 1-tailed P = 0.158) (Figure 2C).
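The binning scheme described above (24 categories of width 0.25 cM/Mb plus one open-ended category above 6.00 cM/Mb) can be sketched as follows. The function name and the use of infinity to label the open-ended category are illustrative assumptions; the category boundaries are those given in the text.

```python
import math

BIN_WIDTH = 0.25  # cM/Mb, category width from the text
MAX_BOUND = 6.0   # cM/Mb, upper bound of the last closed category


def recombination_category(rate):
    """Return the upper bound (cM/Mb) of the category containing `rate`.

    Categories are 0.00-0.25, >0.25-0.50, ..., >5.75-6.00, and >6.00
    (the last is returned as math.inf).
    """
    if rate < 0:
        raise ValueError("recombination rate cannot be negative")
    if rate > MAX_BOUND:
        return math.inf
    if rate == 0:
        return BIN_WIDTH
    return math.ceil(rate / BIN_WIDTH) * BIN_WIDTH
```

Labeling each category by its upper bound matches how the points are plotted in Figure 2.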
Figure 2

Diversity and codon bias relative to recombination rate. Points are plotted at the upper end of the recombination rate range (e.g., at 0.25 for 0−0.25 cM/Mb); the red point represents sites in regions with recombination rate above 6 cM/Mb. (A) Synonymous diversity measure is the Watterson (1975) estimator of θ. (B) Intron diversity. (C) Codon bias measure is Fop (Sharp and Devine 1989).

Synonymous diversity varied spatially along Chromosome 2. Using the 142 segments defined by the Flagstaff recombination map (the 140 regions with recombination rate estimates, along with the two external regions), there is obvious heterogeneity in levels of synonymous diversity (Figure 3A). However, there is also considerable variation in the number of sites analyzed, with as few as 14 sites to as many as 22,635. With a median of 2897 sites per segment, 18 of the 142 segments had fewer than 1000 sites, and 29 had more than 5000 sites. When only the latter are plotted to minimize sampling error (Figure 3B), diversity is clearly reduced at the two ends of the chromosome, and it is also somewhat suppressed in the center of the chromosome. As expected based on the analysis of recombination rate classes, there is a positive correlation between recombination rate in each segment and synonymous diversity (r = 0.533, 2-tailed P = 0.0026 for the regions with at least 5000 sites). The correlation, although slightly weaker (r = 0.350), remains highly significant (P < 10^−4) when all regions are included. For intron sites, the correlation between diversity and recombination rate is somewhat weaker, but still significant, for regions with at least 5000 sites (r = 0.268, n = 120, P = 0.0029). However, when regions with fewer sites are included, the correlation is lost (r = −0.029). In contrast to diversity, variation along Chromosome 2 in Fop appears to be negligible (Figure 3, C and D).
Results were qualitatively similar, including the significant correlation between recombination rate and either synonymous or intron diversity, using the Pikes Peak recombination map (data not shown).
Figure 3

Diversity and codon bias along Chromosome 2. (A) Diversity in all recombination map segments. Segments upstream (FLint_upout) and downstream (FLint_dnout) of the recombination map are also shown; for these, there is no corresponding recombination rate estimate. (B) Diversity in segments with at least 5000 synonymous sites. (C) Codon bias (Fop) in all segments. (D) Codon bias in segments with at least 5,000 synonymous sites. Recombination rates (cM/Mb from the Flagstaff testcross) are shown for reference.


Site frequency spectra relative to Δpref

Under a constant-Ne Wright-Fisher neutral model, the relative frequency of sites with d derived states is proportional to 1/d, such that the expected mean d is (k − 1)/a, where k is the sample size and a is the sum of 1, 1/2, ..., 1/(k − 1) (Fu 1995). For k = 11 D. pseudoobscura sequences, we expect 3.414 derived states/site. Overall, our data for synonymous polymorphic sites indicate a shift of the SFS toward lower values, with a mean of 2.59 derived states/site. For intron sites, the mean was 2.279 derived states/site. However, there are noticeably raised tails in the SFSs, likely due in part to ancestral state misassignment (ASM; discussed in Impact of ASM). If natural selection is acting on codon usage, then the SFSs for P→U and U→P changes should be shifted relative to each other; i.e., the SFS for U→P changes should be shifted toward higher values (Akashi and Schaeffer 1997). Consistent with natural selection, the synonymous SFS was shifted toward higher values for the 9918 U→P sites (mean = 3.395) than for the 23,811 P→U sites (mean = 2.258) (Figure 4). However, the SFS for the U→P sites is not right-shifted relative to the expectations of a constant-Ne Wright-Fisher neutral model; this may indicate a demographic effect on diversity, such as historically increasing Ne (Tajima 1989a). The difference between U→P and P→U sites is highly significant using either a 1-tailed Student’s t-test (t.test in R, t = 33.715, P < 10^−15) or a 1-tailed Mann-Whitney U-test [wilcox.test in R, P < 10^−15; following Akashi and Schaeffer (1997)].
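The neutral expectation quoted above follows directly from weighting each frequency class d by 1/d. The short sketch below computes the expected mean number of derived states per polymorphic site for k = 11. The function name and the optional bounds are illustrative; the bounds simply allow frequency classes to be dropped from the calculation.

```python
def expected_mean_derived(k, d_min=1, d_max=None):
    """Expected mean d under the neutral SFS, where P(d) is proportional to 1/d.

    By default all classes d = 1 .. k-1 are used, which gives (k - 1)/a.
    """
    if d_max is None:
        d_max = k - 1
    classes = range(d_min, d_max + 1)
    weights = [1.0 / d for d in classes]  # neutral relative frequency of each class
    return sum(d * w for d, w in zip(classes, weights)) / sum(weights)


all_sites = expected_mean_derived(11)            # all classes, d = 1..10
no_singletons = expected_mean_derived(11, 2, 9)  # drop d = 1 and d = 10
print(round(all_sites, 3), round(no_singletons, 3))
```

Restricting the classes to d = 2..9 gives the expectation for nonsingleton sites that is used when the analysis is repeated without singletons.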
Figure 4

Site frequency spectrum for synonymous polymorphic sites. Shown are sites that segregate two codons and fall within a region for which recombination rate was estimated. “P to U,” a change to a more unpreferred codon; “U to P,” a change to a more preferred codon.

Given that singletons (i.e., sites with 1 or 10 individuals carrying the inferred derived state) are the most likely polymorphic sites to represent sequencing errors, the analyses were repeated excluding 5045 U→P and 14,912 P→U singletons. The SFS shift remained highly significant (U→P mean = 4.456; P→U mean = 3.767; t = 16.807, P < 10^−15; U-test, P < 10^−15). Under a constant-Ne Wright-Fisher neutral model, the expected frequency of derived states per nonsingleton site would be 4.374. Thus, the SFS for U→P sites was shifted slightly toward higher values, whereas that for P→U sites was shifted toward lower values. Analyses were repeated for each amino acid. In every case, the SFS was shifted toward higher values for U→P sites than for P→U sites, although there was variation among amino acids in the extent of the shift (see Variation among amino acids in the SFS shift). The results were qualitatively unchanged when singletons were excluded (Table 3).
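Table 3

Shifts in site frequency spectra for each amino acid

| Amino Acid | All sites: N U→P (a) | All sites: N P→U | All sites: Mean U→P (b) | All sites: Mean P→U | All sites: P-value, t-test (c) | All sites: P-value, U-test (c) | Singletons excluded: N U→P | Singletons excluded: N P→U | Singletons excluded: Mean U→P | Singletons excluded: Mean P→U | Singletons excluded: P-value, t-test | Singletons excluded: P-value, U-test |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ala | 743 | 1770 | 3.355 | 2.267 | <10^−15 | <10^−15 | 346 | 682 | 4.497 | 3.748 | 3.43 × 10^−7 | 2.54 × 10^−8 |
| arg | 738 | 1627 | 2.976 | 2.410 | 5.78 × 10^−7 | 2.96 × 10^−8 | 358 | 651 | 4.168 | 3.902 | 0.0354 | 0.0177 |
| asn | 553 | 728 | 3.562 | 2.404 | 4.68 × 10^−13 | 4.34 × 10^−14 | 279 | 277 | 4.466 | 3.910 | 0.00246 | 0.00178 |
| asp | 618 | 684 | 3.565 | 2.367 | 1.39 × 10^−14 | 3.02 × 10^−14 | 305 | 259 | 4.662 | 3.811 | 8.26 × 10^−6 | 6.95 × 10^−7 |
| cys | 162 | 338 | 3.525 | 2.346 | 1.43 × 10^−5 | 4.278 × 10^−6 | 81 | 125 | 4.716 | 3.776 | 0.00319 | 8.19 × 10^−4 |
| gln | 232 | 802 | 3.608 | 2.170 | 1.72 × 10^−10 | 1.84 × 10^−10 | 112 | 304 | 4.875 | 3.612 | 1.66 × 10^−6 | 2.457 × 10^−7 |
| glu | 408 | 1323 | 3.733 | 2.225 | <10^−15 | <10^−15 | 198 | 467 | 4.768 | 3.777 | 1.20 × 10^−6 | 3.003 × 10^−6 |
| gly | 850 | 1577 | 3.029 | 2.373 | 2.47 × 10^−9 | 4.26 × 10^−12 | 409 | 597 | 4.117 | 3.858 | 0.0315 | 0.00371 |
| his | 255 | 338 | 3.333 | 1.976 | 6.14 × 10^−10 | 1.08 × 10^−10 | 124 | 108 | 4.274 | 3.472 | 0.00281 | 0.00347 |
| ile | 502 | 1280 | 3.171 | 2.221 | 2.46 × 10^−11 | 5.43 × 10^−12 | 243 | 476 | 4.412 | 3.754 | 1.40 × 10^−4 | 2.48 × 10^−4 |
| leu | 1151 | 3988 | 3.581 | 2.217 | <10^−15 | <10^−15 | 578 | 1452 | 4.490 | 3.766 | 3.58 × 10^−10 | 1.56 × 10^−10 |
| lys | 335 | 1165 | 3.878 | 2.039 | <10^−15 | <10^−15 | 170 | 411 | 4.659 | 3.640 | 2.13 × 10^−6 | 1.14 × 10^−5 |
| phe | 372 | 984 | 4.024 | 2.181 | <10^−15 | <10^−15 | 183 | 360 | 4.787 | 3.728 | 9.97 × 10^−7 | 1.42 × 10^−6 |
| pro | 622 | 1353 | 3.238 | 2.362 | 3.25 × 10^−11 | 4.14 × 10^−11 | 304 | 547 | 4.365 | 3.826 | 4.84 × 10^−4 | 4.47 × 10^−4 |
| ser | 799 | 1936 | 3.299 | 2.246 | <10^−15 | <10^−15 | 384 | 694 | 4.331 | 3.710 | 1.164 × 10^−5 | 2.66 × 10^−5 |
| thr | 691 | 1566 | 3.148 | 2.317 | 8.74 × 10^−12 | 3.81 × 10^−12 | 340 | 615 | 4.412 | 3.784 | 2.553 × 10^−5 | 1.78 × 10^−4 |
| tyr | 338 | 511 | 3.781 | 2.160 | <10^−15 | <10^−15 | 195 | 181 | 4.482 | 3.630 | 8.016 × 10^−5 | 4.88 × 10^−4 |
| val | 549 | 1841 | 3.353 | 2.223 | <10^−15 | <10^−15 | 264 | 693 | 4.598 | 3.755 | 6.314 × 10^−7 | 9.97 × 10^−8 |

(a) N, number of polymorphic sites.

(b) Mean frequency of derived states/site.

(c) P-values are for 1-tailed tests.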

Llopart proposed an alternative test, based on the prediction that natural selection should lead to a positive relationship between d and Δpref. Computationally, the sum of d × Δpref can serve as a proxy for a correlation or regression coefficient; therefore, 1-tailed statistical significance can be estimated from the proportion of random permutations of d vs. Δpref that lead to a higher sum of products. The test is significant for all amino acids (including or excluding singletons; see Table 4), as well as for all sites pooled.
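The permutation test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation: the function names, the default number of permutations, and the fixed random seed are all hypothetical choices. The statistic (sum of d × Δpref) and the 1-tailed P-value (proportion of permutations giving a strictly higher sum) follow the description in the text.

```python
import random


def sum_of_products(d_values, dpref_values):
    """Test statistic: sum over sites of d * delta-pref."""
    return sum(d * p for d, p in zip(d_values, dpref_values))


def permutation_p_value(d_values, dpref_values, n_perm=10000, seed=0):
    """1-tailed P: proportion of permutations with a higher sum of products."""
    rng = random.Random(seed)  # seeded only so that the sketch is reproducible
    observed = sum_of_products(d_values, dpref_values)
    shuffled = list(d_values)
    higher = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing between d and delta-pref
        if sum_of_products(shuffled, dpref_values) > observed:
            higher += 1
    return higher / n_perm
```

A returned value of 0 means that no permutation exceeded the observed statistic, which is how the zeros in Table 4 should be read (there, out of 100,000 permutations).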
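Table 4

Shifts in site frequency spectra for each amino acid, Monte Carlo analyses (a)

| Amino Acid | P-Value (b), All Sites, Parsimony | P-Value, All Sites, Bayesian | P-Value, No Singletons, Parsimony | P-Value, No Singletons, Bayesian |
|---|---|---|---|---|
| ala | 0 (c) | 0 | 0 | 0 |
| arg | 0 | 0 | 0.02667 | 0.01952 |
| asn | 0 | 0 | 0.00238 | 0.00542 |
| asp | 0 | 0 | 0.00001 | 0.00002 |
| cys | 0.00002 | 0.00008 | 0.00273 | 0.00527 |
| gln | 0 | 0 | 0 | 0 |
| glu | 0 | 0 | 0 | 0 |
| gly | 0 | 0 | 0.02055 | 0.01272 |
| his | 0 | 0 | 0.00337 | 0.00969 |
| ile | 0 | 0 | 0.00001 | 0.00002 |
| leu | 0 | 0 | 0 | 0 |
| lys | 0 | 0 | 0 | 0 |
| phe | 0 | 0 | 0 | 0 |
| pro | 0 | 0 | 0.00009 | 0.00008 |
| ser | 0 | 0 | 0 | 0 |
| thr | 0 | 0 | 0 | 0.00001 |
| tyr | 0 | 0 | 0.00007 | 0.00067 |
| val | 0 | 0 | 0 | 0 |

(a) Permutation test of Llopart .

(b) All P-values are for 1-tailed tests.

(c) A reported estimate of 0 indicates that none of 100,000 data permutations led to a higher value of the test statistic.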

Llopart also proposed a modification to the test to correct for ASM. Essentially, a simple Bayesian approach was suggested to calculate a posterior odds ratio of correct assignment by parsimony to ASM. The likelihood ratio was based on the neutral SFS in a constant-Ne population: likelihood ratio = (k − d)/d, where d is the frequency of derived states assuming parsimony, k is the number of individuals in the sample (here, 11), and k − d is the number of derived states when parsimony is incorrect (i.e., when there is ASM). The prior odds ratio is the probability of no mutation on the branch connecting the outgroup to the base of the ingroup coalescent relative to the probability of a single mutation on that branch; it is calculated from D and θ, estimates of divergence and diversity, respectively. Thus, following Llopart , the posterior odds ratio is calculated as the product of the prior odds ratio and the likelihood ratio. The permutation test can then be performed after randomly assigning an ancestral state for each site using the posterior odds ratio. As shown in Table 4, this usually slightly more conservative test (using distinct estimates of θ and D for each amino acid) remains significant for all analyses. Following up on the recombination analyses, we compared the slopes of the regression lines (d on Δpref) for sites in four recombination classes [following McGaugh ]: 0−0.5, >0.5−3.0, >3.0−6.0, and >6.0 cM/Mb. For the individual testcrosses (Flagstaff and Pikes Peak), we compared sites in 0-recombination regions to sites found elsewhere. The slope was not lower for the 0−0.5 cM/Mb class (b = 0.864, n = 1164) than for the three other classes (>0.5−3.0 cM/Mb: b = 0.648, n = 14,088; >3.0−6.0 cM/Mb: b = 0.672, n = 13,817; >6.0 cM/Mb: b = 0.832, n = 4660).
For the Flagstaff testcross, the slope was significantly lower for the 0-recombination regions (b = 0.164, n = 217) than for higher-recombination regions (b = 0.694, n = 33,226; P = 0.0002, 1-tailed Tukey-Kramer test). This result was not mirrored, however, for the Pikes Peak testcross (0-recombination regions: b = 0.903, n = 1467; higher recombination regions: b = 0.679, n = 32,262). In fact, mean d for U→P changes exceeded mean d for P→U changes in all five regions with recombination estimates of 0, the difference ranging from 0.804 (interval 152, n = 258) to 2.011 (interval 151, n = 133). One-tailed Mann-Whitney U-tests were significant after sequential Bonferroni correction (Rice 1989) for four of the five contrasts. The permutation test on each of the five regions produced similar results; all five tests were significant after Bonferroni correction assuming parsimony, and three were significant after Bonferroni correction when allowing for ASM. Therefore, if the slope of the regression line corresponds to effectiveness of selection on codon usage, there is only equivocal evidence for an effect of low recombination in D. pseudoobscura.
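The ancestral-state correction used for the Bayesian columns of Table 4 can be sketched as below. This is an illustration under stated assumptions rather than the published procedure: the function names are hypothetical, and the prior odds ratio is passed in as a plain number because its formula in terms of D and θ is not reproduced in this text. The likelihood ratio (k − d)/d comes from the text, and the posterior odds are formed as prior odds × likelihood ratio. When a site is reassigned, the derived count becomes k − d and the sign of Δpref flips, because the derived and ancestral codons swap.

```python
import random


def likelihood_ratio(d, k):
    """Neutral-SFS likelihood ratio of parsimony being correct vs. ASM."""
    return (k - d) / d


def prob_parsimony_correct(d, k, prior_odds):
    """Posterior probability that the parsimony assignment is correct."""
    odds = prior_odds * likelihood_ratio(d, k)  # posterior odds
    return odds / (1.0 + odds)


def resample_site(d, dpref, k, prior_odds, rng):
    """Randomly keep or flip a site's polarity according to the posterior."""
    if rng.random() < prob_parsimony_correct(d, k, prior_odds):
        return d, dpref    # keep the parsimony assignment
    return k - d, -dpref   # ASM: derived and ancestral states swap
```

With equal prior odds, a site with d = 10 of k = 11 has posterior probability 1/11 of being correctly polarized, so high-frequency derived classes are the ones most often flipped before the permutation test is run.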

Variation among amino acids in the SFS shift

As noted previously, for all amino acids, the average frequency of derived states at synonymous polymorphic sites was greater for U→P changes than for P→U changes. This result is consistent with natural selection on synonymous codon usage. However, it is also consistent with G/C-biased gene conversion (although recent work by Comeron suggests that this does not occur in D. melanogaster). Under biased gene conversion, individuals heterozygous for a preferred and an unpreferred codon will usually be segregating a pair of purines or a pair of pyrimidines at the synonymous site (usually the third position of a codon). If heteroduplex intermediates generated during crossing-over tend to resolve toward G or C, then this process could lead to shifts in the SFS even if crossing-over is not, itself, mutagenic. In the standard genetic code, there are 16 T/C-ending synonymous codon pairs and 13 A/G-ending synonymous codon pairs. Although the degree of bias varies, C- or G-ending codons are usually used disproportionately (Table 5). The C-ending codon is always used disproportionately, although barely so for Asp (50.5% C-ending). For the A/G pairs, the G-ending codon is used disproportionately in all cases except for Gly, where unpreferred GGG is used less often than unpreferred GGA.
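The counts of 16 T/C-ending and 13 A/G-ending synonymous codon pairs can be verified by enumerating the standard genetic code. In the sketch below, the variable and function names are illustrative; stop codons are excluded because they do not encode an amino acid.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_pairs(end1, end2):
    """Codon pairs that differ only at the third base and share an amino acid."""
    pairs = []
    for first in BASES:
        for second in BASES:
            c1, c2 = first + second + end1, first + second + end2
            if CODE[c1] == CODE[c2] and CODE[c1] != "*":
                pairs.append((c1, c2))
    return pairs


ct_pairs = synonymous_pairs("T", "C")
ag_pairs = synonymous_pairs("A", "G")
print(len(ct_pairs), len(ag_pairs))
```

The three A/G-ending pairs that drop out are ATA/ATG (Ile/Met), TGA/TGG (stop/Trp), and TAA/TAG (both stop), which leaves 13 and gives the total of 29 pairs analyzed.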
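Table 5

Analysis of variance (site type × direction)

| Base change | Effect | d.f. | SS | MS | F | P-value |
|---|---|---|---|---|---|---|
| C↔T | Site type (a) | 1 | 667 | 667 | 107.87 | <10⁻¹⁵ |
| | Direction (b) | 1 | 7556 | 7556 | 1,222.32 | <10⁻¹⁵ |
| | Interaction | 1 | 467 | 467 | 75.62 | <10⁻¹⁵ |
| | Residual | 30,844 | 190,669 | 6.2 | | |
| G↔A | Site type | 1 | 179 | 179 | 29.89 | <10⁻⁷ |
| | Direction | 1 | 6829 | 6829 | 1,137.35 | <10⁻¹⁵ |
| | Interaction | 1 | 282 | 282 | 46.95 | <10⁻¹¹ |
| | Residual | 25,420 | 152,623 | 6.0 | | |
| C↔A | Site type | 1 | 200 | 200 | 37.89 | <10⁻⁹ |
| | Direction | 1 | 1235 | 1235 | 234.06 | <10⁻¹⁵ |
| | Interaction | 1 | 97 | 97 | 18.31 | <10⁻⁴ |
| | Residual | 11,121 | 58,699 | 5.3 | | |
| G↔T | Site type | 1 | 181 | 181 | 34.05 | <10⁻⁸ |
| | Direction | 1 | 1134 | 1134 | 212.96 | <10⁻¹⁵ |
| | Interaction | 1 | 48 | 48 | 9.00 | 0.00271 |
| | Residual | 9,848 | 52,445 | 5.3 | | |
| C↔G | Site type | 1 | 292 | 292 | 46.46 | <10⁻¹¹ |
| | Direction | 1 | 0.00 | 0.00 | 0.00 | 0.993 |
| | Interaction | 1 | 5.91 | 5.91 | 0.94 | 0.332 |
| | Residual | 9,091 | 57,100 | 6.3 | | |
| A↔T | Site type | 1 | 330.8 | 330.8 | 62.78 | <10⁻¹⁴ |
| | Direction | 1 | 2.2 | 2.2 | 0.42 | 0.519 |
| | Interaction | 1 | 11.9 | 11.9 | 2.26 | 0.133 |
| | Residual | 10,981 | 57,867 | 5.3 | | |

d.f., degrees of freedom; SS, sum of squares; MS, mean squares.

(a) Site type can be intron or codon third position.

(b) Direction can be, for example, C→T or T→C.
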
For all 29 pairs, the average frequency of derived states per polymorphic site is higher for T→C or A→G sites than for corresponding C→T or G→A sites. If biased gene conversion is responsible for these SFS shifts, the magnitude of the shifts should be similar for all codon pairs (at least within the A/G or C/T class). There is, however, considerable variation among codon pairs in relative codon usage and the difference in derived states/site (Table 6). For C/T pairs, two-way analysis of variance [ANOVA; codon pair by direction (C→T vs. T→C)] indicated highly significant effects of codon pair (F(15, 12681) = 3.859, P < 10⁻⁶) and direction (F(1, 12681) = 730.5, P < 10⁻¹⁵), as well as a highly significant interaction effect (F(15, 12681) = 3.250, P < 10⁻⁴). Similar results were obtained for A/G pairs (codon pair: F(12, 9514) = 2.841, P < 10⁻⁴; direction: F(1, 9514) = 576.1, P < 10⁻¹⁵; interaction: F(12, 9514) = 5.607, P < 10⁻⁸). For the C/T pairs, the difference in mean d correlates moderately with the difference in the preference scores between the C- and T-ending codons, although not well with the degree of bias (Figure 5, A and B). For the A/G pairs, the difference correlates well with degree of bias, and somewhat with the difference in G- and A-ending codon preference scores (Figure 5, C and D).
Figure 5

Shifts in site frequency spectra among codon pairs. (A) Difference in average frequency of derived states/polymorphic site for C/T codon pairs relative to codon usage (i.e., proportion of C-ending codons). (B) Difference in average frequency of derived states/site for C/T codon pairs relative to Δpref for T→C changes. (C, D) Corresponding figures for G/A codon pairs. Letters in legend correspond to single-letter amino acid codes; blue, fourfold degenerate amino acids; light blue, codon pair from fourfold degenerate subclass of sixfold degenerate amino acids; red, codon pair from isoleucine or twofold degenerate subclass of sixfold degenerate amino acids; gold, twofold degenerate amino acids. Dashed lines correspond to linear regression through all points.

It is worth contrasting the SFS shifts for codons with those for introns, as shifts in SFSs may provide insight into composition-biasing processes, such as biased gene conversion. Using D. lowei to infer the ancestral states, the SFSs for all 12 possible changes were obtained. Of particular note are those for C→T, T→C, G→A, and A→G, because these mirror the changes discussed previously for codons. The differences in mean derived states/site, although highly significant, are not as pronounced for introns as they are for synonymous sites. For C→T and T→C, mean d was 1.977 and 2.802, respectively (a difference of 0.825). This contrasts with means of 2.231 and 3.582, respectively, for all C/T-segregating codons (a difference of 1.351). Two-way ANOVA on d (Table 5) indicated highly significant effects of site type (intron vs. codon) and direction, as well as a site type × direction interaction. Similarly, for G→A and A→G intron sites, mean d was 1.939 and 2.885, respectively (a difference of 0.946). This contrasts with means of 2.187 and 3.614 for all G/A-segregating codons (a difference of 1.427).
Again, two-way ANOVA on d indicated highly significant effects of site type and direction, as well as a strong site type × direction interaction. The site type × direction interaction effects indicate significantly larger shifts in codons relative to introns, as expected if the SFS shifts in codons are due to selection, and not only a composition-biasing influence shared by all sites, such as G/C-biased gene conversion. It is worth noting that restricting the analysis to the much smaller subset of sites in small introns does not qualitatively affect the results. Following Halligan and Keightley (2006), who proposed that short introns were less constrained than longer introns, sites were restricted to introns of 80 bp or shorter, excluding the first nine and last eight bases adjacent to splice junctions. There were 2414 C↔T sites and 1544 G↔A sites. The SFS shift for C↔T sites was slightly reduced (0.645), whereas the SFS shift for G↔A sites (0.935) was essentially unchanged. Mean d in introns was nearly identical for G→C and C→G (2.341 and 2.375, respectively) and for A→T and T→A (2.117 and 2.174, respectively); neither difference was statistically significant. Two-way ANOVA indicated only significant effects of site type (intron vs. codon); of note, there was no significant interaction effect (nor was there a significant direction effect). Intermediate results were obtained for G→T vs. T→G and C→A vs. A→C (both favoring changes toward G or C); all three effects in the ANOVA were significant. The general implication is that the forces shifting SFSs for synonymous codon pairs are stronger than those for introns, consistent with previous observations in Drosophila (De Proce ). Therefore, although there may be some effect of biased gene conversion on codon usage, this is insufficient to explain our observations.
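
The site type × direction interaction can be read as a difference of differences. The following minimal sketch only redoes the arithmetic on the mean d values quoted above; it is not the ANOVA itself, and the dictionary layout and the function name `shift` are illustrative assumptions.

```python
# Mean d (derived states per polymorphic site) as quoted in the text.
# Each tuple is (change toward T or A, change toward C or G).
mean_d = {
    ("intron", "C/T"): (1.977, 2.802),
    ("codon", "C/T"): (2.231, 3.582),
    ("intron", "G/A"): (1.939, 2.885),
    ("codon", "G/A"): (2.187, 3.614),
}

def shift(site_type, pair):
    """SFS shift: mean d for changes toward C/G minus mean d for changes toward T/A."""
    toward_at, toward_gc = mean_d[(site_type, pair)]
    return toward_gc - toward_at

# Excess shift in codons over introns (hypothetical name for the contrast).
excess = {}
for pair in ("C/T", "G/A"):
    excess[pair] = shift("codon", pair) - shift("intron", pair)
    print(pair, round(shift("intron", pair), 3), round(shift("codon", pair), 3),
          round(excess[pair], 3))
```

A positive excess for both pairs is what the significant interaction terms in Table 5 reflect: the shift toward G/C is present in introns but larger at synonymous sites.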

Discussion

Our analyses corroborate the likely influence of natural selection on codon usage in Drosophila, specifically D. pseudoobscura. We also observe, as expected, a positive correlation between diversity and recombination rate. As shown by Kulathinal , this association is stronger when recombination is estimated at a finer scale. However, in contrast to analyses on D. melanogaster, we observe no significant association of codon bias with recombination rate, despite considerable statistical power and reliable estimates of recombination rate at the finer scale advocated by Kulathinal . Evidence from polymorphism data for reduced natural selection on codon usage in areas of low recombination in our dataset is limited at best. It is possible that recombination rate is recently reduced in these areas, such that Ne-reducing effects on diversity of linkage (especially by positive selection on recent beneficial mutations) are apparent, but the Ne remains sufficient for effective selection. Changes in codon bias would not become apparent for some time, given that this requires accumulation of synonymous substitutions. Both findings indicate that recombination rate across chromosome 2 exceeds the threshold necessary for effective natural selection. Recombination rates in D. pseudoobscura are generally higher than those estimated for D. melanogaster (Comeron ). In D. melanogaster, 21% of intervals had recombination rate estimates below 0.5 cM/Mb, in contrast with 6% in D. pseudoobscura. Although 59% of intervals had recombination rates above 1.5 cM/Mb in D. melanogaster (56% for autosomal regions only), this value was exceeded in 82% of intervals in D. pseudoobscura. Furthermore, while the local Ne could effectively vary across the chromosome due to the Ne-reducing effect of selection on linked sites in areas of lower recombination, the generally higher Ne in D. pseudoobscura may mitigate the Hill-Robertson effect to some extent. 
That is, the product of Ne and s may be sufficient over most of the chromosome for selection to effectively fix the optimal codons.

Distortion of the SFS

The shape of the overall SFS for the 33,729 synonymous polymorphic sites clearly differs from that expected for neutral variants in a constant-N population. First, the proportion of derived singletons (55.1%) is much higher than the 34.1% expected in a population of constant size (Figure 6). Second, there is a raised “tail” in the SFS, with more sites having 10 derived states than 9 derived states.
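
The neutral expectation quoted above can be reproduced with a short calculation. This is a minimal sketch, assuming the standard constant-size neutral result that the expected number of polymorphic sites with i derived copies is proportional to 1/i, and using the sample size k = 11 given later in the text; the function name `neutral_sfs` is illustrative.

```python
from fractions import Fraction

def neutral_sfs(k):
    """Expected proportions of polymorphic sites with i = 1..k-1 derived copies
    under a constant-size neutral model (each class proportional to 1/i)."""
    weights = [Fraction(1, i) for i in range(1, k)]
    total = sum(weights)
    return [w / total for w in weights]

k = 11  # sample size used in the text
sfs = neutral_sfs(k)

# Expected proportion of derived singletons (compare with the 34.1% in the text).
print(f"singletons: {float(sfs[0]):.3f}")

# Expected ratio of sites with 9 derived states to sites with 10 derived states.
print(f"9 vs. 10 derived states: {float(sfs[8] / sfs[9]):.3f}")
```

Under this model sites with 9 derived states should outnumber sites with 10, so an observed excess of the latter is the raised tail described in the text.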
Figure 6

Site frequency spectra corrected for sequencing error and ancestral state misassignment (ASM). Expected proportions under a constant-Ne Wright-Fisher neutral model are shown in black; our data, assuming parsimony, are shown in blue. (A) Correction for ASM based on observed levels of diversity and divergence (following Llopart ). (B) Correction for ASM with a 0.1% sequencing error rate. (C) Correction for ASM with a 0.54% error rate.

Biological explanations for an excess of singletons include purifying background selection (Charlesworth ) and population expansion (Tajima 1989a), both of which distort the coalescent of a population with constant Ne to increase the relative lengths of branches upon which mutations would be observed as singletons (Tajima 1989a,b). Background selection or a demographic influence on the SFS should affect U→P and P→U sites similarly, but we observe a significantly stronger shift toward low-frequency derived states in P→U sites. Therefore, while both influences may be at work in D. pseudoobscura, they are not sufficient to explain our observations. In addition to demographic effects, an excess of singletons can arise, in principle, from sequencing error. With a low value of θ, most sites in a small sample (here, k = 11) would be invariant. A single error at an invariant site would produce an apparent singleton. Additional errors would add to the remainder of the SFS, but assuming that errors are independent, the impact would be seen mainly on singletons. Assuming a 0.1% error rate (equivalent to a phred consensus quality score of 30), following the binomial distribution, 98.91% of truly invariant sites would be observed as invariant. However, 1.09% of invariant sites would be apparent singletons, whereas only 5.5 × 10⁻³% would appear to have two derived states (assuming that all errors produce the same character state).
Likewise, 99.00% of true singletons would be observed as singletons, whereas 0.99% would present as having two derived states (i.e., be observed as “doubletons”) and 4.4 × 10⁻³% would present as having three derived states. This approach can be extended to all possible values for true and observed derived states (i.e., for a true invariant site appearing to have 1, 2, ... 10 derived states; for a true singleton appearing to have 2, 3, ... 10 derived states; and so on), ultimately leading to 42.3% of sites having one apparent derived state. This is well below the observed value of 55.1%, which would require an error rate of approximately 0.33%. Given the requirements of consensus quality scores of 30 or better, a minimum of 15× coverage in highly inbred strains, and full resolution of all three codon positions in every strain (including the outgroup), sequencing error alone does not explain the high proportion of singletons observed. It is further unlikely that the resequencing using the Illumina platform is leading to an accumulation of false As and Ts. Across a range of genomic G+C contents, Nakamura found that A/T→G/C errors were more likely than G/C→A/T errors. Although Illumina sequencing is more prone to base call errors than either ABI SOLiD or Roche 454 (Liu ), error bias is unlikely to explain our observed shifts in SFSs. Most A/T→G/C polymorphic sites reflect U→P changes in D. pseudoobscura, yet the proportion of singletons is much lower for U→P sites (43.1%) than it is for P→U sites (60.7%; Figure 4).
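
The binomial figures in the error argument above can be checked with a few lines. This is a minimal sketch, assuming independent errors at a per-base rate of 0.1% in a sample of k = 11, and assuming that for a true singleton only errors in the k − 1 sequences carrying the ancestral state are counted (this is the reading that reproduces the 99.00% figure); the helper `binom_pmf` is illustrative.

```python
from math import comb

def binom_pmf(j, n, p):
    """Probability of exactly j errors among n sequences with per-sequence error rate p."""
    return comb(n, j) * p**j * (1 - p) ** (n - j)

error_rate = 0.001  # 0.1%, i.e. a phred quality score of 30
k = 11              # sample size used in the text

# Truly invariant site: j errors produce j apparent derived states.
invariant = [binom_pmf(j, k, error_rate) for j in range(3)]

# True singleton: j errors among the k - 1 ancestral-state sequences
# produce 1 + j apparent derived states (assumption stated above).
singleton = [binom_pmf(j, k - 1, error_rate) for j in range(3)]

for j, prob in enumerate(invariant):
    print(f"invariant site seen with {j} derived states: {100 * prob:.4f}%")
for j, prob in enumerate(singleton):
    print(f"true singleton seen with {1 + j} derived states: {100 * prob:.4f}%")
```

Extending the same idea to every true and observed class, and weighting by the expected neutral SFS, is what leads to the 42.3% apparent singletons cited in the text; that full convolution is not reproduced here.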

SFS differences among codon pairs

The SFS data are consistent with an influence of natural selection on codon usage in D. pseudoobscura, corroborating previous studies (Akashi and Schaeffer 1997; Haddrill ). Additionally, the data indicate that intensity of selection varies among synonymous mutations. Vicario previously proposed this possibility in a comparison of codon usage in the genomes of 12 Drosophila species. Although there is a correspondence between codon bias and SFS shifts consistent with differential selection, the nature of that differential selection is uncertain. Selection does not appear to be strongest on the more common amino acids within a degeneracy class, as might be expected for selection on efficiency of translation. For example, phenylalanine and tyrosine are used at an intermediate level among the C/T-ending twofold degenerate amino acids, despite showing the strongest SFS shifts in this class. Alanine is the most commonly used fourfold degenerate amino acid, but its SFS shifts for C↔T and A↔G changes are intermediate within the degeneracy class. On the other hand, for sixfold degenerate amino acids, leucine and serine are used more often than arginine (and are among the most used amino acids overall) and show much stronger SFS shifts for changes in the fourfold degenerate subsets; in fact, their SFS shifts are among the strongest observed in this analysis. Vicario proposed ad hoc explanations for stronger selection on some amino acids (e.g., for accurate translation of disulfide bridge-forming cysteine or for accurate and efficient translation of heavily used hydrophobic leucine). Potential influences of isoaccepting tRNA pools on codon bias have been identified for Drosophila (Moriyama and Powell 1997; Powell and Moriyama 1997), but these authors note that pools likely change over time within an individual (White ) and that this plasticity may itself influence codon bias among amino acids. 
Furthermore, the complement of tRNA genes likely changes over evolutionary time in Drosophila, with evidence of numerous gains, losses, and reassignments (Rogers ). The observed slightly stronger shifts for A↔G vs. C↔T synonymous changes may reflect differences in composition-biasing influences, as reflected in corresponding SFS shifts for introns. It is, therefore, probably premature to speculate too extensively on the bases of differential selection among amino acids. That this main result is not an artifact of sequencing error is reinforced by differences among codon pairs in the proportion of derived singletons and the proportion of polymorphic sites. Variation in the proportion of derived singletons among all 16 ancestrally C-ending codons is not quite significant (G = 24.91, 15 d.f., P = 0.0511), and there is no significant variation for ancestral T-ending codons (G = 14.85, 15 d.f., P = 0.462). However, variation in the proportion of derived singletons is significant for both ancestral G-ending (G = 21.42, 12 d.f., P = 0.0445) and A-ending codons (G = 22.98, 12 d.f., P = 0.0279). For C/T codon pairs, 3.25% of ancestral C-ending codons are polymorphic, whereas 2.51% of ancestral T-ending codons are polymorphic; this difference is highly significant (G test of independence, G = 189.4, 1 d.f., P < 10⁻¹⁵). Although this finding could indicate a higher probability of C→T errors (which is not likely with Illumina platforms; see Distortion of the SFS), there is considerable variation among C/T codon pairs in the proportion of polymorphic ancestral C-ending codons (G test of independence: G = 199.3, 15 d.f., P < 10⁻¹⁵) and polymorphic T-ending codons (G = 197.4, 15 d.f., P < 10⁻¹⁵). Similar results were obtained for G/A codon pairs; 3.13% of ancestral G-ending codons were polymorphic and 2.22% of ancestral A-ending codons were polymorphic (G = 224.5, 1 d.f., P < 10⁻¹⁵).
However, the proportions varied among codon pairs (ancestral G-ending: G = 102.2, 12 d.f., P < 10⁻¹⁵; ancestral A-ending: G = 42.7, 12 d.f., P < 10⁻⁴) (see Table 6 for proportions). Examining codon pairs from fourfold degenerate sites only, G tests remained significant for all four ancestral bases (all P-values below 10⁻⁴). Likewise, for twofold degenerate sites only, G tests were highly significant for ancestral C-ending and T-ending codons (P < 10⁻¹⁵) and for ancestral G-ending codons (P = 0.00012); however, there was no significant heterogeneity among codon pairs for ancestral A-ending codons (P = 0.403). Thus, there is considerable variation in the proportion of polymorphic codons, even within similar degeneracy classes, a result that does not support a major role for sequencing error, but is consistent with differences among codon pairs in the influence of weak selection.
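
The pooled comparison of polymorphic C-ending and T-ending codons can be retraced from the S/N counts in Table 6. The sketch below is an illustration under stated assumptions, not the authors' code: S is read as the number of polymorphic sites and N as the total number of sites, entries printed with a decimal point (for example 21.090) are read as thousands, and the G statistic is the usual log-likelihood ratio for a 2 × 2 table without continuity correction. The function name `g_test_2x2` is hypothetical.

```python
from math import log

def g_test_2x2(a, b, c, d):
    """G statistic (log-likelihood ratio) for independence in the table [[a, b], [c, d]]."""
    observed = [[a, b], [c, d]]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    n = a + b + c + d
    g = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if observed[i][j] > 0:
                g += observed[i][j] * log(observed[i][j] / expected)
    return 2 * g

# (S, N) per C/T codon pair from Table 6: ancestral C-ending, then ancestral T-ending.
c_ending = [(684, 24976), (574, 19504), (428, 14444), (458, 15326),
            (355, 12959), (509, 14398), (525, 14629), (436, 14149),
            (753, 21090), (468, 16090), (728, 21727), (684, 21184),
            (338, 10916), (338, 11373), (984, 20878), (511, 15979)]
t_ending = [(193, 8275), (226, 7027), (97, 4144), (104, 6117),
            (98, 7011), (194, 7044), (116, 5869), (150, 5164),
            (241, 15310), (223, 7353), (553, 19956), (618, 24206),
            (162, 4458), (255, 9781), (372, 13361), (338, 11392)]

s_c, n_c = (sum(x) for x in zip(*c_ending))
s_t, n_t = (sum(x) for x in zip(*t_ending))

prop_c = s_c / n_c  # proportion of ancestral C-ending codons that are polymorphic
prop_t = s_t / n_t  # proportion of ancestral T-ending codons that are polymorphic
g = g_test_2x2(s_c, n_c - s_c, s_t, n_t - s_t)
print(f"C-ending: {prop_c:.4f}  T-ending: {prop_t:.4f}  G (1 d.f.): {g:.1f}")
```

Under these assumptions the pooled proportions and G should land close to the values reported in the text; small differences would point to a misread table entry rather than to a different method.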
Table 6

Summary data for C/T and G/A segregating and fixed different codon third positions

| Codon pair | Amino acid | C3 or G3 (a) | Δpref TA→CG (b) | S/N (CG) (c) | S/N (TA) (d) | d̄ CG→TA : d̄ TA→CG (e) |
|---|---|---|---|---|---|---|
| GC C/T | Ala | 0.757 | 0.972 | 684/24,976 | 193/8,275 | 2.251:3.539 |
| GG C/T | Gly | 0.747 | 0.652 | 574/19,504 | 226/7,027 | 2.336:3.319 |
| CC C/T | Pro | 0.774 | 0.829 | 428/14,444 | 97/4,144 | 2.185:3.557 |
| AC C/T | Thr | 0.724 | 0.792 | 458/15,326 | 104/6,117 | 2.216:3.808 |
| GT C/T | Val | 0.663 | 0.737 | 355/12,959 | 98/7,011 | 2.352:3.622 |
| CG C/T | Arg4 | 0.692 | 0.591 | 509/14,398 | 194/7,044 | 2.369:3.170 |
| CT C/T | Leu4 | 0.728 | 0.774 | 525/14,629 | 116/5,869 | 2.051:3.776 |
| TC C/T | Ser4 | 0.743 | 0.638 | 436/14,149 | 150/5,164 | 1.959:4.093 |
| AT C/T | Ile | 0.601 | 0.929 | 753/21,090 | 241/15,310 | 2.112:3.560 |
| AG C/T | Ser2 | 0.708 | 0.567 | 468/16,090 | 223/7,353 | 2.348:3.090 |
| AA C/T | Asn | 0.552 | 0.922 | 728/21,727 | 553/19,956 | 2.404:3.562 |
| GA C/T | Asp | 0.505 | 0.784 | 684/21,184 | 618/24,206 | 2.367:3.565 |
| TG C/T | Cys | 0.723 | 0.608 | 338/10,916 | 162/4,458 | 2.346:3.525 |
| CA C/T | His | 0.578 | 0.662 | 338/11,373 | 255/9,781 | 1.976:3.333 |
| TT C/T | Phe | 0.629 | 1.064 | 984/20,878 | 372/13,361 | 2.181:4.024 |
| TA C/T | Tyr | 0.621 | 0.842 | 511/15,979 | 338/11,392 | 2.160:3.781 |
| GC G/A | Ala | 0.522 | 0.567 | 278/8,438 | 161/8,691 | 2.392:3.503 |
| GG G/A | Gly | 0.361 | 0.208 | 229/4,821 | 201/9,551 | 2.655:2.960 |
| CC G/A | Pro | 0.557 | 0.653 | 308/8,910 | 181/8,456 | 2.360:3.508 |
| AC G/A | Thr | 0.571 | 0.620 | 408/12,025 | 181/10,085 | 2.368:3.320 |
| GT G/A | Val | 0.849 | 0.942 | 631/23,036 | 113/4,307 | 2.019:3.894 |
| CG G/A | Arg4 | 0.566 | 0.442 | 217/5,828 | 115/5,398 | 2.618:3.104 |
| CT G/A | Leu4 | 0.874 | 1.140 | 959/34,643 | 169/5,649 | 2.088:4.172 |
| TC G/A | Ser4 | 0.730 | 0.684 | 399/12,539 | 102/5,237 | 2.120:3.941 |
| AG G/A | Arg2 | 0.574 | 0.283 | 145/4,236 | 70/3,394 | 2.159:3.171 |
| TT G/A | Leu2 | 0.827 | 0.138 | 334/12,658 | 74/2,788 | 2.135:3.824 |
| GA G/A | Glu | 0.722 | 1.448 | 1323/38,468 | 408/16,383 | 2.225:3.733 |
| CA G/A | Gln | 0.753 | 1.312 | 802/28,876 | 232/10,386 | 2.170:3.608 |
| AA G/A | Lys | 0.723 | 1.400 | 1165/37,359 | 335/14,953 | 2.039:3.878 |

(a) Usage of the codon ending in C or G for a C/T or G/A codon pair, respectively.

(b) Δpref for a substitution of a T- or A-ending codon with the corresponding C- or G-ending codon.

(c) S, frequency of polymorphic sites with C or G as the ancestral state; N, frequency of sites with C or G as the ancestral state; frequencies are reported only for sites that are fully resolved at all three codon positions in all D. pseudoobscura and D. lowei sequences.

(d) N and S for sites with T or A as the ancestral state (see c).

(e) Mean frequency of derived states per site; CG→TA, ancestral state ends with either C or G; TA→CG, ancestral state ends with either T or A.

Impact of ASM

The raised tail of the SFS may be explained by ASM, especially if the SFS is already left-shifted. Under a model with constant Ne, we expect a ratio of 10:9 for sites with 9 vs. 10 derived states. However, if we allow ASM with a probability of 1/(LR+1), where LR is the likelihood ratio in Equation 1, we can solve for the value of divergence required for a given value of θ that would lead to a 1:1 ratio by (a) factoring in sites with 1 or 2 derived states that present as having 10 or 9 derived states and (b) factoring out sites with 10 or 9 derived states that would present as having 1 or 2 derived states. For our θ = 0.0222, a divergence of 0.0424 would be sufficient and would lead to 4.1% of sites presenting 9 and 4.1% of sites presenting 10 derived states under a constant-N model. While both values exceed our observations, the excess of singletons by necessity decreases the proportion of sites in the remainder of the SFS. We estimated per-site synonymous divergence at 0.0763, which leads to a slightly raised tail (a ratio of 1.12 for 10:9 derived states); we observed a ratio of 1.54. If we assume a constant-N model, but with ASM and a sequencing error rate of 0.1%, we can begin to approach the observed SFS (Figure 6). However, we still observe an excess of singletons; a much higher error rate would be needed to approach the observed SFS. In fact, for P→U sites, the error rate would have to be approximately 0.54% to produce the extreme left shift. It is not possible to reproduce the SFS for U→P sites, with a slight excess of singletons and a markedly raised tail, with sequencing error and ASM alone.

Analysis of the SFS of derived synonymous mutations in D. pseudoobscura indicates that the intensity of natural selection varies among classes of synonymous mutation. The SFSs are likely shaped by other influences as well, possibly including ASM and sequencing error, but variation among synonymous codon pairs in the extent of the SFS shifts supports differential intensity of selection.
  56 in total

1.  Genetic drift in an infinite population. The pseudohitchhiking model.

Authors:  J H Gillespie
Journal:  Genetics       Date:  2000-06       Impact factor: 4.562

2.  The effects of Hill-Robertson interference between weakly selected mutations on patterns of molecular evolution and variation.

Authors:  G A McVean; B Charlesworth
Journal:  Genetics       Date:  2000-06       Impact factor: 4.562

3.  The 'effective number of codons' used in a gene.

Authors:  F Wright
Journal:  Gene       Date:  1990-03-01       Impact factor: 3.688

4.  The effect of change in population size on DNA polymorphism.

Authors:  F Tajima
Journal:  Genetics       Date:  1989-11       Impact factor: 4.562

5.  Statistical method for testing the neutral mutation hypothesis by DNA polymorphism.

Authors:  F Tajima
Journal:  Genetics       Date:  1989-11       Impact factor: 4.562

6.  Microsatellite variation in populations of Drosophila pseudoobscura and Drosophila persimilis.

Authors:  M A Noor; M D Schug; C F Aquadro
Journal:  Genet Res       Date:  2000-02       Impact factor: 1.588

7.  Statistical properties of segregating sites.

Authors:  Y X Fu
Journal:  Theor Popul Biol       Date:  1995-10       Impact factor: 1.570

8.  Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster.

Authors:  D J Begun; C F Aquadro
Journal:  Nature       Date:  1992-04-09       Impact factor: 49.962

9.  Codon usage and gene expression level in Dictyostelium discoideum: highly expressed genes do 'prefer' optimal codons.

Authors:  P M Sharp; K M Devine
Journal:  Nucleic Acids Res       Date:  1989-07-11       Impact factor: 16.971

Review 10.  Codon usage and tRNA content in unicellular and multicellular organisms.

Authors:  T Ikemura
Journal:  Mol Biol Evol       Date:  1985-01       Impact factor: 16.240
