Literature DB >> 21680889

The role of the effective population size in compensatory evolution.

Abstract

The impact of the effective population size (Ne) on the efficacy of selection has been the focus of many theoretical and empirical studies over the recent years. Yet, the effect of Ne on evolution under epistatic fitness interactions is not well understood. In this study, we compare selective constraints at independently evolving (unpaired) and coevolving (paired) sites in orthologous transfer RNAs (tRNA molecules for vertebrate and drosophilid species pairs of different Ne. We show that patterns of nucleotide variation for the two classes of sites are explained well by Kimura's one- and two-locus models of sequence evolution under mutational pressure. We find that constraints in orthologous tRNAs increase with increasing Ne of the investigated species pair. Thereby, the effect of Ne on the efficacy of selection is stronger at unpaired sites than at paired sites. Furthermore, we identify a "core" set of tRNAs with high structural similarity to tRNAs from all major kingdoms of life and a "peripheral" set with lower similarity. We observe that tRNAs in the former set are subject to higher constraints and less prone to the effect of Ne, whereas constraints in tRNAs of the latter set show a large influence of Ne. Finally, we are able to demonstrate that constraints are relaxed in X-linked drosophilid tRNAs compared with autosomal tRNAs and suggest that Ne is responsible for this difference. The observed effects of Ne are consistent with the hypothesis that evolution of most tRNAs is governed by slightly to moderately deleterious mutations (i.e., |Nes|≤5).

Entities: Chemical Disease Gene Species

Mesh：

Substances：
RNA, Transfer

Year: 2011 PMID： 21680889 PMCID： PMC3140890 DOI： 10.1093/gbe/evr057

Source DB: PubMed Journal: Genome Biol Evol ISSN： 1759-6653 Impact factor: 3.416

Introduction

The effective population size (Ne) is a fundamental quantity in population genetics. It is essential in shaping neutral nucleotide variation in a population and crucial for determining the efficacy of selection (Kimura 1983; Charlesworth 2009). The rate of molecular evolution may decrease, remain unchanged, or increase with increasing Ne, depending on whether mutations are deleterious, (nearly) neutral, or beneficial in nature, respectively (Gillespie 1999). For independently evolving sites, the rate depends on the product of Ne and the selection coefficient s as well as the scaled mutation rate (θ = 4Neμ). Therefore, a mutation that is slightly deleterious in a species of large Ne might have a neutral effect in a species with small Ne (Chamary et al. 2006). This role of Ne in the evolution of independently evolving sites has been studied extensively from a theoretical point of view (Kimura and Ohta 1969; Ohta 1972; Kimura 1983) and has been empirically confirmed (Weinreich and Rand 2000; Woolfit and Bromham 2003, 2005; Eőry et al. 2010; Andolfatto et al. 2011). However, the relation between the speed of evolution due to Ne at independent nucleotide sites and positions that evolve under epistasis is much less clear. To study the evolution of sites that are involved in epistatic interactions, a model with at least two loci is needed. Kimura (1985) introduced a two-locus model of compensatory neutral mutations in molecular evolution (fig. 1). He assumed that mutations at a pair of loci may be individually deleterious but neutral in certain combinations. Given two loci with wild-type alleles A and B at the first and second locus, respectively, he studied the expected fixation time () for the double-mutant ab under the assumption that selection against individual mutants is strong (and thus the mutation process is nearly irreversible). Specifically, he assumed that the intermediate configurations of alleles (Ab, aB) suffer the same disadvantage s and that the wild-type AB and double-mutant ab have the same fitness (i.e., the process does not lead to adaptation but only compensation). Under such conditions, ab may rise to fixation without prior fixation of any of the deleterious intermediates (stochastic tunneling) (Iwasa et al. 2004). Subsequently, Kimura's model was extended by incorporating different reductions in fitness (s1, s2) for the intermediates Ab, aB (Stephan 1996) and also allowing for weak purifying selection such that back mutations may be possible (Innan and Stephan 2001). In this case, fixation of ab can be preceded by a fixation of any of the deleterious intermediates (Ohta 1973). Fixation times in the two-locus case were also investigated in diploid populations (Ichinose et al. 2008) and for double mutations that lead to adaptation (Lynch 2010; Weissman et al. 2010). All these models have in common that for most parameter combinations, was found to increase with increasing Nes. Furthermore, for weak selection, faster fixation at independently evolving sites is expected, whereas it was shown that in the case of strong selection against deleterious intermediates, evolution proceeds faster at coevolving sites (fig. 1) (Kimura 1985).

(a) Kimura's (1985) two-locus model of sequence evolution, which assumes unidirectional mutation. The model is described in the main text. (b) Expected ratio of waiting times until fixation of deleterious and selectively neutral mutations at independently evolving (solid lines) and coevolving sites (dashed lines). Black lines describe fixation times in Kimura's unidirectional models (eq. 13 from Kimura 1980 and eq. 16 from Kimura 1985). Gray lines were obtained by taking back mutations into account using equations (5a and 6) in Innan and Stephan (2001) for coevolving sites and simulations of the Wright–Fisher process for independent sites. Results are given for mutation rate μ = 2.5 × 10−8 and selection coefficient s = 10−4. The role of compensatory mutations has been investigated in the case of protein evolution (Brown et al. 2010), but also RNA molecules provide a great opportunity to directly compare evolution at independently evolving and coevolving sites as they are composed of unpaired nucleotides and nucleotides that form Watson–Crick (WC) base pairs. Previous studies have shown that compensatory mutations are the main driving force of evolution in paired regions of RNA molecules (Parsch et al. 1997; Chen et al. 1999; Chen and Stephan 2003; Meer et al. 2010) and that the rate of compensation depends on structural features of the molecule. Specifically, this rate can be related to the length of the pairing region (helix), the position of the pairing nucleotide within the helix, and the GC content of the helix (Parsch et al. 2000; Piskol and Stephan 2008). Furthermore, population genetic parameters such as the recombination rate between pairing sites were shown to influence the rate of coevolution in RNA molecules (Kirby et al. 1995). Here we investigate how another population genetic parameter, Ne, shapes RNA evolution. We are especially interested how it influences the rate of evolution at independently evolving and coevolving sites. Therefore, we focus on transfer RNAs (tRNAs)—a class of noncoding RNAs with well-studied structure and function. We present a rigorous analysis of selective constraints in tRNA molecules with particular focus on the difference between selective constraints for paired and unpaired nucleotides and interpret the results in the light of theoretical predictions for fixation times of deleterious mutations. We use Kimura's (1980, 1985) unidirectional models to describe sequence evolution at independently evolving and coevolving sites (fig. 1). In our analysis, the range of moderate and weak purifying selection (|Nes| ≤ 5) is of particular interest as evolution of paired sites in noncoding RNA molecules was shown to take place in this parameter range (Piskol and Stephan 2011).

Materials and Methods

Sequence Data

Sequence data were obtained from the University of California Santa Cruz (UCSC) Genome Browser FTP server (Kent et al. 2002) in form of axt pairwise alignments for the following vertebrate species pairs: human/macaque (hg19/rheMac2), macaque/marmoset (rheMac2/calJac3), dog/cat (canFam2/felCat3), and chicken/zebra finch (galGal3/taeGut1). The assemblies of these genomes are the same as used by Rfam (Gardner et al. 2009) for the annotation of noncoding RNA families. The pairwise genomic alignment of mouse/rat available at UCSC is based on different assemblies than the Celera assemblies (Mural et al. 2002) used by Rfam. Therefore, the Celera assemblies of the mouse and rat genomes were aligned following the same protocol that was used to produce the UCSC alignments. The vertebrate alignments served as a source for orthologous tRNAs and neutrally evolving sequences. Annotations of tRNAs were downloaded from Rfam (Release 10.0) for human, macaque, mouse, rat, dog, and chicken. The UCSC Drosophila multiple alignment, which consists of up to 12 species, was analyzed for the species pairs Drosophila melanogaster/D. simulans and D. melanogaster/D. yakuba. It was used to determine neutrally evolving regions only. The annotations of orthologous Drosophila tRNAs were taken from Rogers et al. (2010), and corresponding sequences were downloaded in batch from Flybase (Tweedie et al. 2009). Annotations of protein coding genes were acquired from the refGene tracks of the UCSC Genome Browser for all species except mouse and rat. The locations of ancestral repeats (ARs), that is, repetitive sequences common to both species in a pair were determined according to RepeatMasker annotations available in the “rmsk” tables of the UCSC Genome Browser (downloaded on 18 December 2010). Protein coding gene annotations in mouse and rat were obtained from GenBank, and repeats in the Celera mouse and rat assemblies were annotated using RepeatMasker 3.2.9 (Smit et al. 1996–2010) based on mouse/rat-specific repeat libraries RM-20090604 (Jurka et al. 2005).

Effective Population Sizes

Estimates of long-term effective population sizes were obtained from the literature (table 1) for chicken/zebra finch (Jennings and Edwards 2005), mouse/rat (Baines and Harr 2007), and Drosophila (Li and Stephan 2006). Ne for macaque/marmoset and dog/cat were taken from Piganeau and Eyre-Walker (2009), assuming that the ratio Ne−autosomes:Ne−mitochondria is 4:1. In most of these studies, long-term Ne for the pairs were calculated as averages of single-species Ne, which were obtained from polymorphism data. Because no estimate of Ne existed for the pair human/macaque, we averaged over Ne for the two species (Eyre-Walker et al. 2002; Evans et al. 2010). However, due to the heterogeneity of the data sources employed for the calculation of Ne, the absolute values were not directly used in our analysis. Estimates of Ne merely served to establish the following semiquantitative relationship between species pairs: Ne (human/macaque) < Ne (macaque/marmoset) < Ne (dog/cat) < Ne (chicken/zebra finch) < Ne (mouse/rat) < Ne (D. melanogaster/D. yakuba) ≈ Ne (D. melanogaster/D. simulans).

Table 1

Composition of tRNA Data Sets for Different Species Pairs

Species Pair	N_e	all tRNAs			Peripheral tRNAs			Core tRNAs
		# tRNAs	GC Content		# tRNAs	GC Content		# tRNAs	GC Content
			Paired	Unpaired		Paired	Unpaired		Paired	Unpaired
Human/macaque	8.9 × 10⁴	277 (2)	0.5138	0.3105	151 (2)	0.4976	0.3142	126 (0)	0.5316	0.3059
Macaque/marmoset	1.7 × 10⁵	268 (1)	0.5172	0.3165	144 (1)	0.4915	0.3173	124 (0)	0.5441	0.3156
Dog/cat	5.2 × 10⁵	259 (0)	0.5256	0.3080	134 (0)	0.5206	0.3124	125 (0)	0.5298	0.3033
Chicken/zebra finch	6.5 × 10⁵	114 (1)	0.7029	0.4123	63 (1)	0.7149	0.4169	51 (0)	0.6884	0.4062
Mouse/rat	≈10⁶	106 (0)	0.5552	0.3074	46 (0)	0.5850	0.3271	60 (0)	0.5356	0.2920
Drosophila melanogaster/D. yakuba	>10⁶	277 (21)	0.6963	0.3827	95 (5)	0.6788	0.4019	182 (16)	0.7061	0.3720
D. melanogaster/D. simulans	>10⁶	229 (13)	0.6956	0.3822	83 (2)	0.6770	0.4025	146 (11)	0.7071	0.3700

NOTE.—Numbers of X-linked tRNAs are shown in parentheses. The GC content is given for non–CpG-prone positions only.

Composition of tRNA Data Sets for Different Species Pairs NOTE.—Numbers of X-linked tRNAs are shown in parentheses. The GC content is given for non–CpG-prone positions only.

tRNA Alignments and Structures

Orthologous vertebrate tRNA sequences and structures for all species pairs were determined based on the pairwise species alignments and Rfam annotations. Thereby, if tRNA Rfam annotations existed for both species in a pair, overlapping orthologs were identified and the corresponding sequences extracted from the pairwise alignment. If Rfam annotations existed only for the reference species, then sequences of the other species (“query” species) that were aligned to the reference in the annotated regions were evaluated against the Rfam tRNA covariance model using cmsearch from the INFERNAL package (version 1.0.2) (Nawrocki et al. 2009) to obtain a bit score S and corresponding E value for each sequence. The score S indicates how well a given sequence matches the Rfam tRNA covariance model, which describes a consensus tRNA structure that is based on a seed alignment of tRNAs from 967 species. These cover all major kingdoms of life (bacteria, fungi, plants, and animals). The score represents the log2 odds ratio between the probability of the target sequence under the covariance model and its probability to be a random sequence. For instance, a bit score of 35 symbolizes a 235 higher probability of the target sequence to be a tRNA, compared with a random sequence. Higher S values indicate larger similarity of the target sequence to the covariance model and therefore higher structural similarity to a core set of tRNAs common to all species, whereas low E values describe a small probability that the sequence occurred only by chance in a database of random sequences. Therefore, selecting for higher bit scores allows us to choose tRNAs that share a high similarity with tRNAs from a wide range of other species. For all pairwise vertebrate and Drosophila alignments, only tRNA annotations with an INFERNAL bit score of S > 35 in both species and an E value < 0.01 were retained for further analysis. Furthermore, we discarded cases where the query sequence aligned to more than one location in the reference genome and only considered cases where both aligned tRNA annotations were located either on the X chromosome or on the autosomes in the two species. Subsequently, each pair of orthologous sequences was realigned using cmalign (Nawrocki et al. 2009). To rule out the influence of alignment and structure prediction on observed selective constraints and to avoid problems with the alignment of unpaired regions, we also created alternative alignments using mlocarna (Will et al. 2007) for a structure-based alignment and a combination of muscle (Edgar 2004) and RNAalifold (Hofacker et al. 2002) where alignment and structure are determined separately from each other. Both mlocarna and RNAalifold rely on thermodynamic predictions of the secondary structure. In some cases, thermodynamic prediction may fail to determine the correct topology of tRNA molecules. Therefore, we informed mlocarna and RNAalifold by providing the cmalign structures as constraints for either both sequences or the reference sequence, respectively. Orthologous drosophilid tRNAs from Rogers et al. (2010) were scored with cmsearch and subsequently aligned using the same three methods as for vertebrate tRNAs. Here, we present results based on the mlocarna alignments. Estimates of selective constraints obtained with muscle and cmalign are shown in supplementary tables S2 and S3 (Supplementary Material online), respectively. They only differ quantitatively, whereas qualitative predictions are the same for all three methods.

Neutrally Evolving Sequences

ARs, that is, repetitive sequences common to both species in a pair, served as indicators for neutral evolution in vertebrates (Eőry et al. 2010). Only ARs that reside in intergenic locations were considered. ARs were excluded if the pairwise alignment contained less than 50% of aligned nucleotides. Similar to previous studies (Eőry et al. 2010; Piskol and Stephan 2011), only long terminal repeats, DNA transposons, short interspersed elements, long interspersed elements, and other repeats were considered, whereas simple repeats, low complexity regions, and microsatellites were excluded from the analysis. Neutral evolution in drosophilids was based on positions 8–30 in short introns of protein coding genes (Parsch et al. 2010). Thereby only introns of single transcript genes were analyzed to ensure that the sequence is exclusively located in an intron and does not overlap with exons of other splice forms. Introns in genes with overlapping gene annotations on the same or opposite strand were discarded.

Selective Constraints

The strength of selection on a sequence of interest in a species was estimated by calculating the amount of selective constraint , where Nobs is the number of observed nucleotide substitutions between two closely related species and Nneut is the number of substitutions in a neutrally evolving region of the same length. We obtained Nobs in tRNAs for each species pair by concatenating all single tRNA orthologs. Therefore, positions of the alignment were classified into paired and unpaired states according to the consensus tRNA structure for the two sequences. Subsequently, Nobs was obtained separately for paired and unpaired alignment positions. The estimation of constraints may be confounded by several factors. Usually, the rate of substitutions in mammals is increased for dinucleotides in a CpG context through an elevation of the C → G transversion rates after the methylation of cytosine (Siepel and Haussler 2004). For that reason, all CpG-prone sites were excluded from the analysis by removal of all sites that are preceded by a C or followed by a G in the mammalian sequences (Gaffney and Keightley 2008). Furthermore, it was shown before that the GC content of the sequence and its deviation from the equilibrium GC content (GC*) will lead to increased rates of substitutions (Piganeau et al. 2002; Piskol and Stephan 2008). Therefore, differences in GC content between species pairs were accounted for by replacing Nneut with the expected number of substitutions (Nexp) that was calculated from ARs following the method of Halligan et al. (2004). Thereby, substitution rates that change the GC content were adjusted according to GC*, which was assumed to be 0.37 (Halligan et al. 2004; Khelifi et al. 2006; Duret and Arndt 2008). In all cases, 95% confidence intervals (CIs) for constraints were obtained by bootstrapping the tRNA alignments by column (while ensuring that the number of paired and unpaired columns in the bootstrapped alignment remained the same).

Results and Discussion

Expected Selective Pressures in RNA Molecules

We used the selective constraint (C) defined by Halligan et al. (2004) as a proxy for the level of selection on tRNA molecules. C describes the portion of deleterious mutations that are removed from the sequence due to purifying selection and is defined as (see Materials and Methods). , where and are the expected fixation times for deleterious and neutral mutations, respectively (Innan and Stephan 2001; Piskol and Stephan 2011). Therefore, the expected values for C can be described in terms of the theoretical predictions for the fixation times as Due to the dependence of the fixation times on θ and Nes, also C will be influenced by these parameters. The resulting relationship between selective constraints and Nes (fig. 2) for coevolving sites (Ccoev) and independently evolving sites (Cind) can be obtained by using the expected fixation times ( and ) and their neutral analogs in equation (1), respectively. We used Kimura's unidirectional models for and (Kimura 1980, 1985) because they are directly comparable in terms of model assumptions and parameters. However, the predictions made here are qualitatively the same as for models that take reversibility of the mutation process into account. Assuming that s is constant between species, the comparison of Ccoev and Cind allows for three main predictions in the case of weak purifying selection against new mutations:

Expected selective constraints at independently evolving sites (Cind) and coevolving sites (Ccoev) as a function of the scaled selection coefficient Nes. Dashed lines indicate the corresponding slopes. There exists a range of Nes in which Cind increases more rapidly than Ccoev. Therefore, the steeper slope for Cind results in a larger difference in constraints at independently evolving sites than at coevolving sites between species with different Ne. The trajectories for Cind and Ccoev were obtained from Kimura's unidirectional models for the expected fixation times of mutant alleles in a population (eq. 13 from Kimura 1980 and eq. 16 from Kimura 1985) for a mutation rate μ = 2:5 × 10−8 and selection coefficient s = 10−4. Coevolving sites are under stronger selective constraints than independently evolving sites (i.e., Ccoev > Cind), Constraints increase with increasing effective population size (i.e., Cind (Ne1) < Cind (Ne2) and Ccoev (Ne1) < Ccoev (Ne2) for Ne1 < Ne2), and There exists a range of Nes in which Ne has a stronger effect on the evolution at independently evolving than on coevolving sites (i.e., |Cind (Ne1) − Cind (Ne2)| > |Ccoev(Ne1) − Ccoev(Ne2)|). These general observations are independent of differences in scaled mutation rates (supplementary fig. S1, Supplementary Material online) and also imply that a change in Ne will result in small differences between C for large Nes but in large differences if Nes is small (fig. 2). We can use tRNA molecules to test these predictions by assuming that tRNA positions, which are not involved in secondary structure formation (here denoted as “unpaired” positions), evolve under the independent model, whereas changes at nucleotide positions that are involved in WC pair formation with other partners within the sequence (“paired” positions) will be subject to coevolutionary dynamics. It is important for the analysis that Ne differs between tRNA molecules. This can be achieved by comparing orthologous tRNAs between species pairs of different long-term Ne but can also be tested within species pairs through the comparison of constraints between X chromosomal and autosomal tRNAs that differ in Ne.

Data Set

To investigate the effect of Ne on selective constraints in tRNAs, we collected data sets of orthologous tRNAs in seven species pairs of different Ne (table 1). We were able to extract approximately the same numbers of orthologous tRNAs for all pairs (a list of genomic positions is available from the authors upon request). Only for murids and birds, a smaller number of tRNAs was available. Although it might be expected that the amount of identifiable orthologous tRNAs will decrease with increasing divergence between species, we did not observe such a correlation. However, for all species, only relatively small numbers of tRNAs (if any) were identified on the X chromosome compared with the autosomes. This is not due to a low rate of detection of orthologs on the X chromosome but rather due to a significant underrepresentation of tRNA annotations on the X chromosomes. For instance, the initial set of tRNA annotations in the human genome contained 13 annotations on the X chromosome but 543 on the autosomes. Considering the contribution of the X chromosome to the complete genetic material, 28 tRNAs would have been expected to be located on the X chromosome and 528 on the autosomes, which constitutes a significant deviation from the observed numbers (χ2 = 8.265, P = 0.004). The same is true for drosophilid tRNAs. In general, the GC content in the paired portion of tRNA molecules is larger than for unpaired nucleotides. In particular, paired regions in Drosophila and birds show elevated levels of GC nucleotides. For these species, no specific increase in mutations due to CpG dinucleotides was expected. Therefore, we did not apply the procedure of Gaffney and Keightley (2008), which usually removes a large portion of guanines and cytosines from the sequence and resulted in a lower number of G and C nucleotides in vertebrates.

Core and Peripheral Sets of tRNAs

The total sets of orthologous tRNAs consisted only of molecules that fit the Rfam tRNA covariance model with high probability (relative to a null model that assumes no structure), which was reflected in INFERNAL bit scores S > 35 (see Materials and Methods). We noticed that the distribution of S for most vertebrate pairs is bimodal with a valley at S ≈ 60 (supplementary fig. S2, Supplementary Material online). Therefore, the initial set was separated into two subsets according to this value. tRNAs with very high scores (S ≥ 60) were denoted as a “core” set because they share great structural similarity with tRNAs from other species in various kingdoms of life. The second (“peripheral”) set consisted of tRNAs with lower similarity to the consensus structure of a tRNA (35 < S < 60). This partitioning was performed because we suspected that tRNAs in the core set are under stronger selective constraints, whereas constraints in peripheral tRNAs are more relaxed. We assumed that under these circumstances, Ne will have stronger influence in the peripheral set and will result in more pronounced differences between C, as expected for slightly deleterious mutations. Here, our notion of a core set is based on the structural similarity of tRNAs and differs from the definition of Rogers et al. (2010) who defined a core set based on the conservation of tRNAs throughout the Drosophila genus. According to tRNAScan-SE (Lowe and Eddy 1997), our data set contains tRNAs that encode 20 amino acids (supplementary fig. S3, Supplementary Material online). Most tRNAs that encode a specific amino acid are present in both core and peripheral sets. We did not find a preference of tRNAs for either the core or the peripheral set depending on their potential to encode essential or nonessential amino acids (χ2 = 0.0781, P = 0.7799). Although several species pairs contain pseudogenized tRNAs in the peripheral sets, this amount is comparatively small (human/macaque = 6, macaque/marmoset = 2, dog/cat = 6, mouse/rat = 1) and will only have a minor impact on the effect of Ne in our analysis. Also the length of the variable region that allows to discriminate between Class I tRNAs (short variable region of 4–5 nt) and Class II tRNAs (long variable region of > 10 nt, which is supposed to form a short helix) (Rich and RajBhandary 1976) does not correlate with the presence of tRNAs in core and peripheral sets. In that respect, although tRNALeu and tRNASer are both representatives of Class II and contain a long variable region, the former is more abundant in the peripheral set, whereas the latter occurs more often in the core set. Vice versa, several Class I tRNAs are more abundant in the core set, whereas others are present in higher numbers in the peripheral set. This also suggests that the variable region only plays a minor role in our classification of tRNAs according to the Rfam bit score. Unusually high (or low) bit scores (and therefore classification of tRNAs in core or peripheral set) may have also been caused by a biased nucleotide composition. However, we did not observe any indication that high scores in our data were related to an exceptionally high or low GC content (supplementary fig. S4, Supplementary Material online).

The Influence of the Effective Population Size on Constraints in Nuclear-Encoded tRNAs

To test the predictions that are based on Kimura's models for sequence evolution at independently evolving and coevolving sites under continued mutation pressure (Kimura 1980, 1985), we calculated selective constraints at paired (Cpaired) and unpaired (Cunpaired) positions in orthologous tRNAs for all species pairs (fig. 3; supplementary table S1, Supplementary Material online). Thus, we related the rate of molecular evolution in tRNAs to evolutionary rates obtained from the corresponding neutral standard (supplementary table S5, Supplementary Material online). Depending on the species pair, the obtained values for Cpaired and Cunpaired fall into the ranges of (0.884, 0.996) and (0.698, 0.982), respectively, and thus surpass constraints at nonsynonymous sites in protein coding genes of hominids, murids, and drosophilids (Eőry et al. 2010; Parsch et al. 2010). For each species pair, we were able to observe significantly higher Cpaired than Cunpaired values (CIs do not overlap), as was expected from the comparison of independently evolving and coevolving sites under Kimura's models. The larger Cpaired can be explained by the requirement for paired nucleotides to maintain their conformation and thus to preserve the secondary structure of the molecule.

Constraint (C) for paired (light gray) and unpaired (dark gray) positions in orthologous tRNAs of different species pairs for (a) the whole data set, (b) peripheral set, and (c) core set.

Constraint (C) for paired (light gray) and unpaired (dark gray) positions in orthologous tRNAs of different species pairs for (a) the whole data set, (b) peripheral set, and (c) core set. Further examination of figure 3a, in which species pairs were arranged by increasing Ne from left to right, indicates that constraints also increase in the same order and verifies that Cpaired and Cunpaired increase with increasing Ne—the second prediction that followed from Kimura's models. For instance, the species pairs human/macaque, chicken/zebra finch, and D. melanogaster/D. yakuba, in that order, have significantly increasing Ne in the ranges of 104, 105, and 106, respectively. At the same time, the corresponding values of Cpaired increase from (0.839, 0.933) to (0.942, 0.966) and (0.994, 0.999) and thus significantly differ as well. The same relationship also exists at unpaired sites to an even larger extent. This observation immediately results in the third prediction of Kimura's models, which stated that constraints at independently evolving sites are affected by changes in Ne to a larger extent than at coevolving sites. As a result, larger differences can be observed in constraints at unpaired sites between species of different Ne than at paired sites. For example, the difference in Cunpaired between primates and murids ΔCunpaired (prim/mur) = 0.248, whereas at paired positions ΔCpaired (prim/mur) = 0.105 and thus much smaller. The same is true for most comparisons between other species pairs. However, as was expected, with increasing Ne (and thus also increasing C), the discrepancies between constraints in different species become smaller. Furthermore, the stronger effect of Ne on Cunpaired also manifests itself in comparisons within species pairs through a decrease in the difference |Cpaired − Cunpaired| with increasing Ne. In general, these particular patterns in selective constraints are caused by the interplay of Nes and the influence of mutation rates (supplementary fig. S1, Supplementary Material online). Thereby, the relationship between Cind is mostly independent of θ, whereas increased Ccoev is expected when θ is low. This is particularly apparent in the pair human/macaque, which has the lowest θ value (=0.001) in our analysis. When comparing constraints between species pairs with different divergence (k), it might have been expected that C increases with increasing k because only orthologous tRNAs with higher conservation can be identified for distant sequences. However, similar to the nonsignificant relation between k and the number of identified tRNAs (n), the relationships between k and C (Kendall's τ = −0.43, P = 0.24) as well as n and C (Kendall's τ = −0.39, P = 0.22) are not significant. Therefore, we can exclude an influence of divergence and number of identified tRNAs on estimates of constraints in our data. Even if we assume that divergence between chicken/zebra finch and D. melanogaster/D. yakuba is of a magnitude such that multiple hits cannot be safely ignored (which would result in an underestimation of C for these species), the general pattern persists. For instance, our hypothesis still holds if we replace the D. melanogaster/D. yakuba pair by D. melanogaster/D. simulans (which has much smaller divergence and thus lower probability for multiple hits).

Stronger Constraints in Core tRNA Genes

It is also of interest to determine whether the effect of Ne on C is influenced by the overall strength of purifying selection in tRNAs. Therefore, constraints in tRNAs were analyzed after splitting the data into core and peripheral sets. If selection in the former set is strong, then the effect of Ne on C in this set should be low and vice versa. If, on the other hand, Ne is not responsible for the pattern observed above, the core and peripheral sets should both show signs of approximately equally reduced constraints in species of small Ne. However, the latter assumption can be clearly rejected based on figure 3. Consistent with the assumption that selection pressure is higher in tRNAs belonging to the core set, we are able to observe higher constraints in tRNAs from the core set compared with the peripheral set for all species pairs. It is more important, however, that the increase in C with increasing Ne is strong in the peripheral set of tRNAs (fig. 3) and only very weak (but still present) in the core set of tRNAs as shown in figure 3. The latter effect is better visible after applying the transformation log(1 − C) (supplementary fig. S5c, Supplementary Material online). Therefore, we can assume that selective constraints in tRNAs are most likely influenced by Ne and that this effect is strong if selection is weak, whereas in the case of strong selection, our observations follow theoretical predictions, which show that the fixation time of compensatory double mutants is rather independent of Ne (eq. 8c in Stephan 1996). Even though a separation of the data according to a single score may be crude, our results show that it allows us to distinguish between two sets of tRNAs that seem to be under different selective constraints. Further evidence for this hypothesis comes from the observed GC contents at non–CpG-prone nucleotides of the two sets. Compared with the peripheral sets, the core sets show higher GC contents in the paired portion of tRNA molecules for most species pairs (table 1). A higher GC content was shown to be associated with an increased substitution rate (Eőry et al. 2010; Piskol and Stephan 2011). Therefore, if tRNAs in the core set were subject to the same constraints as tRNAs in the peripheral set, more substitutions would have been expected in tRNAs belonging to the core set. However, the exact opposite is observed, which justifies our separation of the data in two sets and confirms higher constraints in the core set. To understand how a difference in GC content between neutrally evolving regions and regions of interest may influence selective constraints, we resampled the neutrally evolving regions such that they match the GC content at paired and unpaired nucleotides in vertebrate tRNAs. Subsequently, we repeated the calculation of selective constraints for all tRNAs, as well as for the peripheral sets and core sets of tRNAs separately (supplementary table S4, Supplementary Material online). Although the effect of Ne on selective constraints is slightly weakened after resampling of neutral regions, the same overall pattern as before persists (compare supplementary tables S1 and S4, Supplementary Material online). Constraints at paired positions only change marginally after resampling of neutral sites. This is presumably due to the high overall selective pressure at paired sites. However, the resampling of neutral regions to match the GC content at unpaired sites showed a larger effect on constraint estimates. It led to a decrease in GC content and therefore an increase of substitution rates at neutral positions, which ultimately resulted in higher estimated selective constraints at unpaired sites. This effect was expected due to the overall negative correlation between substitution rates and GC content (Eőry et al. 2010) for GC contents between 30% and 50%. These findings suggest that our estimates of selective constraint at paired sites, which were obtained without resampling of neutral standards (fig. 3 and supplementary table S1, Supplementary Material online), are robust against differences in GC content. For unpaired sites, our estimates without resampling are conservative and an adjustment through resampling leads to higher estimated constraints.

Differences in Selective Constraints between Autosomes and X Chromosome

Apart from the differences in Ne between species and their effect on nucleotide variation, effects of Ne on C might also be expected within species. If the contribution of genetic material to the next generation is equal for males and females, the expected ratio of X to autosomal Ne () is 0.75, due to the presence of the X chromosome in a single copy in males. However, this assumption is not always met. It was reported previously (Hutter et al. 2007) that in a European population of D. melanogaster =0.49 and thus lower than expected, whereas in an African (ancestral) population of D. melanogaster =0.90. Other studies also suggest that in ancestral populations may be larger than expected (Andolfatto 2001; Connallon 2007; Singh et al. 2007). Therefore, the efficacy of selection may differ between X chromosome and autosomes and may lead to different selective constraints. To test whether differences in constraints are observed between the X chromosome and autosomes, we divided the 277 orthologous tRNAs for the D. melanogaster/D. yakuba pair according to their genomic location into 21 X-linked and 256 autosomal tRNAs and obtained selective constraints separately for these two sets. It was shown before (Betancourt et al. 2002) that evolutionary rates do not differ between chromosomes in D. melanogaster. Nonetheless, we avoided any confounding effects due to systematic differences in mutation rates between X chromosome and autosomes by using introns that were exclusively located on the X chromosome or autosomes as neutral standards for the evolution of X and autosomes, respectively (supplementary table S5, Supplementary Material online). Table 2A shows constraints in paired and unpaired regions for all X-linked and autosomal tRNAs. Again, paired positions are subject to significantly higher evolutionary constraints than positions that are not involved in the formation of WC base pairs for both autosomes and the X chromosome. More interestingly, lower constraints can be observed on the X chromosome (Cx) than on the autosomes (CA) (presumably due to the smaller Ne of the X chromosome). The difference in constraints between autosomes and X chromosome (|CA − Cx|) is particularly apparent in unpaired portions of tRNAs and is in accordance with theoretical predictions that Ne will have a large impact on evolution at independently evolving sites. Lower constraints on the X chromosome might have also been observed due to reduced evolutionary rates in the neutral standard on the X chromosome rather than increased rates of fixation in tRNAs. However, our neutral divergence estimate for the X chromosome is slightly larger than for autosomes (supplementary table S5, Supplementary Material online) and hence cannot be held accountable for lower constraints in X-linked tRNAs but suggests that a lower Cx at paired and unpaired sites is indeed due to a higher number of fixed differences in tRNAs on the X chromosome. Similar patterns of higher divergence on the X chromosome have also been observed at nonsynonymous sites in the D. melanogaster and D. yakuba lineages (Begun et al. 2007). Given the estimates of >0.75 for the ancestral population of D. melanogaster from previous studies and assuming that mutations in tRNAs will be mostly slightly deleterious, we would have expected that the rate of fixation on the X chromosome was reduced compared with the autosomes (Vicoso and Charlesworth 2009; Mank et al. 2010). However, the slightly lower constraints on the X chromosome suggest faster fixations of mildly deleterious mutations in X-linked tRNAs (compared with autosomal tRNAs) and point to a long-term , which is smaller than 0.75 for tRNAs in the D. melanogaster/D. yakuba pair (see fig. 3 in Vicoso and Charlesworth 2009).

Table 2

		C_paired	(95% CI)	C_unpaired	(95% CI)	\|C_paired − C_unpaired\|
A.	Autosomes	0.9977	(0.9961,0.9996)	0.9862	(0.9804,0.9932)	0.0115
	X chromosome	0.9833	(0.9707,1.000)	0.9357	(0.8900,0.9879)	0.0467
	C_A − C_x	0.0144	0.031*	0.0505	0.004**
B.	Autosomes	0.9937	(0.9894,0.9989)	0.9698	(0.9547,0.9860)	0.0239
	X chromosome	0.9472	(0.8944,1.000)	0.8369	(0.7073,0.9750)	0.1103
	C_A − C_x	0.0455	0.015*	0.1329	0.001**
C.	Autosomes	1.000	(1.000,1.000)	0.9961	(0.9922,1.000)	0.0059
	X chromosome	0.9945	(0.9891,1.000)	0.9744	(0.9488,1.000)	0.0201
	C_A − C_x	0.0055	0.085	0.0217	0.067

NOTE.—CA − Cx is the difference in constraints between tRNAs encoded on the autosomes and X chromosome for paired and unpaired sites. In this case, values in the 95% CI column give the P value for the difference. Significance levels: *P < 0.05, **P < 0.01.

Selective Constraints for Paired (Cpaired) and Unpaired (Cunpaired) Positions in Drosophilid tRNAs Located on the Autosomes and the X Chromosome for (A) the Whole Data Set, (B) Peripheral Set, and (C) Core Set NOTE.—CA − Cx is the difference in constraints between tRNAs encoded on the autosomes and X chromosome for paired and unpaired sites. In this case, values in the 95% CI column give the P value for the difference. Significance levels: *P < 0.05, **P < 0.01. In addition, we confirmed that the lower constraints in X-linked tRNAs are in fact significant and did not arise simply by chance due to the small sample size of tRNAs on the X chromosome. For this reason, we generated 1000 data sets by randomly splitting the 277 Drosophila tRNAs into sets of 21 and 256 instances (resembling the sizes of X and autosomal data). For all repetitions, we calculated constraints at paired and unpaired sites in the large and small sets, respectively, and thus obtained distributions for |CA − Cx| that would be expected at random (fig. 4). Indeed, the observed values of |CA − Cx| are significantly larger than in the randomly assembled sets. This is true for paired regions (|CA − Cx| = 0.0144, P = 0.031*) and to a larger extent in the unpaired portion of tRNAs (|CA − Cx| = 0.0505, P = 0.004**).

Histogram of differences in constraints at (a) paired and (b) unpaired positions between sets of 256 and 21 tRNAs that were created by randomly splitting 277 orthologous tRNAs of Drosophila melanogaster and D. yakuba 1000 times. The dashed lines represent the observed values of |CA − Cx| taken from table 2, A. When repeated separately for tRNAs grouped in core and peripheral sets, the same analysis also supports our previous conjecture that effects of Ne on the difference in constraints between X chromosome and autosomes are large if selection is weak (table 2, B) but much smaller when selection on the tRNA molecule is overall strong (table 2, C). This becomes apparent through significant values of |CA − Cx| in the peripheral set, whereas no significant differences are observed in the core set of tRNAs (table 2 and supplementary figs. S6 and S7, Supplementary Material online).

Conclusions

We showed that divergence patterns in nuclear-encoded tRNA molecules of vertebrate and drosophilid species follow general theoretical predictions for sequence evolution under mutational pressure. Larger selective constraints can be observed with increasing Ne. This effect is weaker at coevolving sites than at independently evolving sites. The influence of Ne on nucleotide variation is not exclusive to tRNAs but seems to be universal in RNA molecules as microRNAs exhibit a similar increase of selective constraints with increasing Ne (supplementary table S6, Supplementary Material online). Here, we did not take the effect of recombination on into account. It was shown previously that recombination may retard the rate of fixation of compensatory double mutants in RNA molecules even when the distance in sequence (d) between paired nucleotides is small (50 < d < 250) (Piskol and Stephan 2008). For mildly deleterious single mutants, recombination also has the potential to combine individual mutant alleles thus leading to complex adaptations (Lynch 2010; Weissman et al. 2010). However, usually fixation times of double mutants are only moderately affected by recombination. Although we cannot completely rule out that some of the substitutions investigated in our study are of an adaptive nature, we assumed that the vast majority of mutations in tRNAs are deleterious. Given that tRNA molecules have a well-defined function, mutations will most likely alter the structure and original conformation of the molecule in space thus potentially changing its functionality and leading to a decrease in fitness. Very important for our analysis was the assumption that WC base pairs, which form the secondary structure of the tRNA, are subject to coevolutionary dynamics, whereas other nucleotides in the tRNA, whether involved in non-WC pairs or completely unpaired, may evolve independently. This was shown to be the case in bacterial ribosomal RNAs (Dutheil et al. 2010) and is also directly applicable to tRNAs due to the universality of base pairs (Leontis and Westhof 2001). In a recent study (Piskol and Stephan 2011), we reported that selective constraints in computationally predicted noncoding RNAs that are encoded in the nuclear genomes of drosophilids and hominids differ in their magnitude between the two genera. We suggested that Ne is responsible for this difference and results in stronger selective constraints in drosophilids. In general, the definition of neutral evolution and the distinction between neutrality and purifying selection in terms of Ne is complicated and has been the topic of many controversies (Nei et al. 2010). Even though the definition of neutrality may have changed over the years (Ohta and Gillespie 1996; Nei 2005), our present results demonstrate that Ne indeed can be held accountable for differences in the efficacy of selection and does so by affecting coevolving and independently evolving sites to different degrees. We suggest that there exists a set of peripheral tRNAs for which mutations are slightly deleterious and scaled selective coefficients are only of a moderate size (|Nes| ≤ 5). For this regime, the pattern of increasing constraints is strongly influenced by the effective population size (fig. 3) and follows the theoretical predictions for fixation times of deleterious mutations in Kimura's one- and two-locus models (Kimura 1980, 1985). The remaining (core) tRNAs might be subject to stronger evolutionary restrictions, and thus, divergence patterns in these molecules are less susceptible to differences in Ne. Although the reason for the existence of different constraints in tRNAs is largely unknown, the expression of tRNAs and the usage of optimal codons may play a role in the maintenance of certain levels of selective pressures in these molecules (Moriyama and Powell 1997; Carlini and Stephan 2003; Hense et al. 2010). Our results have also direct consequences for the inference of phylogenetic relationships between taxa that differ in their long-term effective population size. If the estimation of branch lengths is performed using independently evolving sites that are subject to weak purifying selection (e.g., synonymous sites), then the length of branches leading to taxa with large Ne might be underestimated to a larger extent than for taxa with small Ne.

Supplementary Material

Supplementary tables S1–S6 and figures S1–S7 are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).

68 in total

1. Comparative sequence analysis and patterns of covariation in RNA secondary structures.

Authors: J Parsch; J M Braverman; W Stephan
Journal: Genetics Date: 2000-02 Impact factor: 4.562

2. Secondary structure prediction for aligned RNA sequences.

Authors: Ivo L Hofacker; Martin Fekete; Peter F Stadler
Journal: J Mol Biol Date: 2002-06-21 Impact factor: 5.469

3. Population size and molecular evolution on islands.

Authors: Megan Woolfit; Lindell Bromham
Journal: Proc Biol Sci Date: 2005-11-07 Impact factor: 5.349

4. Effective population size and the Faster-X effect: empirical results and their interpretation.

Authors: Judith E Mank; Beatriz Vicoso; Sofia Berlin; Brian Charlesworth
Journal: Evolution Date: 2009-09-30 Impact factor: 3.694

5. On the utility of short intron sequences as a reference for the detection of positive and negative selection in Drosophila.

Authors: John Parsch; Sergey Novozhilov; Sarah S Saminadin-Peter; Karen M Wong; Peter Andolfatto
Journal: Mol Biol Evol Date: 2010-02-11 Impact factor: 16.240