Literature DB >> 30590559

Pleiotropy Modulates the Efficacy of Selection in Drosophila melanogaster.

Christelle Fraïsse¹, Gemma Puixeu Sala¹, Beatriz Vicoso¹.

Abstract

Pleiotropy is the well-established idea that a single mutation affects multiple phenotypes. If a mutation has opposite effects on fitness when expressed in different contexts, then genetic conflict arises. Pleiotropic conflict is expected to reduce the efficacy of selection by limiting the fixation of beneficial mutations through adaptation, and the removal of deleterious mutations through purifying selection. Although this has been widely discussed, in particular in the context of a putative "gender load," it has yet to be systematically quantified. In this work, we empirically estimate to which extent different pleiotropic regimes impede the efficacy of selection in Drosophila melanogaster. We use whole-genome polymorphism data from a single African population and divergence data from D. simulans to estimate the fraction of adaptive fixations (α), the rate of adaptation (ωA), and the direction of selection (DoS). After controlling for confounding covariates, we find that the different pleiotropic regimes have a relatively small, but significant, effect on selection efficacy. Specifically, our results suggest that pleiotropic sexual antagonism may restrict the efficacy of selection, but that this conflict can be resolved by limiting the expression of genes to the sex where they are beneficial. Intermediate levels of pleiotropy across tissues and life stages can also lead to maladaptation in D. melanogaster, due to inefficient purifying selection combined with low frequency of mutations that confer a selective advantage. Thus, our study highlights the need to consider the efficacy of selection in the context of antagonistic pleiotropy, and of genetic conflict in general.

Entities: Chemical Disease Gene Species

Keywords: zzm321990 Drosophila melanogasterzzm321990 ; (mal)adaptation; evo-devo; gene expression; pleiotropy; selective constraint

Mesh：

Year: 2019 PMID： 30590559 PMCID： PMC6389323 DOI： 10.1093/molbev/msy246

Source DB: PubMed Journal: Mol Biol Evol ISSN： 0737-4038 Impact factor: 16.240

Introduction

Natural selection should ensure that only mutations that are beneficial become fixed in a population, whereas deleterious ones are removed. However, the efficacy of selection can be reduced by several factors, leading to the fixation of deleterious mutations and loss of beneficial ones. The best understood is a reduction in the effective population size, Ne, which exacerbates the effect of genetic drift (Kimura and Ohta 1971, Chapter I; Maynard 1976). Differences in effective population size can occur between species, populations, and even genomic regions. For instance, population bottlenecks, as well as high levels of inbreeding, can lead to strong reductions of the effective size of a population (Nei et al. 1975). Similarly, Hill–Robertson effects (Hill and Robertson 1966), the interference of selective effects between partially linked sites, can reduce Ne in genomic regions of low recombination and/or high density of functional sites (Elyashiv et al. 2016; Campos et al. 2017; Booker and Keightley 2018). Another facet of selection efficacy concerns the nature of new mutations themselves. For instance, selection on recessive mutations will be largely inefficient until these are at a frequency high enough that homozygous individuals are common, which may lead to a sieve against new recessive beneficial mutations (Haldane 1927; Marad et al. 2018). Such mutations should be more effectively selected on X chromosomes, which are haploid in males (Charlesworth et al. 1987; Vicoso and Charlesworth 2009). Second, although mutations are traditionally categorized as beneficial, neutral or deleterious, they can also have a context-dependent effect on fitness, if they bring an advantage only to a specific tissue, sex or life stage, but are otherwise deleterious. Because of these potentially antagonistic pressures, pleiotropic genes, which affect different phenotypes, likely evolve under unusual selective scenarios (Connallon and Hall 2018). Specifically, pleiotropic mutations have been predicted to be under stronger purifying selection, as selection will act on many phenotypes at once (Kimura and Ohta 1974; Van Dyken and Wade 2010). On the other hand, if a new mutation affects many phenotypes, adaptation may be strongly limited (Fisher 1930; Collet et al. 2018), and divergence at that locus may be primarily driven by drift. The best-studied case of antagonistic pleiotropy, sexual antagonism, occurs when the functional divergence of males and females leads to sex-specific optima for phenotypes expressed in both sexes (Darwin 1871). As males and females share essentially the same genome, mutations favorable in one sex may be deleterious to the other. Thus, it may be difficult for the sexes to reach their own fitness optima, leading to a so-called gender load in sexual populations (e.g., in beetles: Arnqvist and Tuda 2010). Evidence for sexual antagonism in animals and plants is accumulating (Bonduriansky and Chenoweth 2009), and experiments in Drosophila have shown that sexually antagonistic variation occurs genome-wide (Rice 1992; Innocenti and Morrow 2010). The resolution of sexual conflict is expected to involve mechanisms that decouple developmental pathways between the sexes, such as sex-biased expression (Ellegren and Parsch 2007; VanKuren and Long 2018). The pervasive expression divergence between the sexes found in many organisms has consequently been interpreted as potential evidence of past sexual conflict (Parsch and Ellegren 2013). Many studies in Drosophila (Zhang et al. 2004; Pröschel et al. 2006; Grath and Parsch 2012; Perry et al. 2014; Avila et al. 2015) and in other taxa (e.g., nematodes: Cutter and Ward 2005; mammals: Torgerson et al. 2002; Good and Nachman 2005; algae: Lipinska et al. 2015) have further found a faster rate of evolution of male-biased genes, possibly due to a relaxation of pleiotropic constraints (Meisel 2011; Parsch and Ellegren 2013). Developmental pleiotropy has also been well characterized, especially in Drosophila (Davis et al. 2005; Artieri et al. 2009) and Caenorhabditis elegans (Cutter and Ward 2005). It implies that genes expressed early during development will be more evolutionary constrained than genes expressed later, because they may be involved in a higher number of functional interactions. This model echoes another pleiotropic model used to explain the evolution of aging, first expressed by Williams (1957) and experimentally validated by Rose and Charlesworth (1980), which assumes the existence of pleiotropic alleles that increase fitness more strongly early in life than they decrease it later (see Lemaître et al. 2015 for a review). Altogether, these models highlight the potential pervasiveness of antagonistic pleiotropy in nature and its importance in evolution. Understanding how selective pressures shape genome evolution has been a main goal of evolutionary genomics, and many tests of the efficacy of selection in different species and genomic regions have been performed (see Booker et al. 2017 for a review). A commonly used measure is α (Smith and Eyre-Walker 2002), the proportion of divergent sites fixed by positive selection, which can be estimated from the number of synonymous and nonsynonymous sites that differ within and between species. If selection is efficient and adaptive evolution dominates, α should approximate one, whereas if selection is too inefficient for adaptive mutations to become fixed and deleterious mutations to be removed, α should be strongly reduced. In genes evolving primarily under strong purifying selection, α should also be close to zero. To understand whether differences in α are due to changes in purifying selection or in positive selection, the rate of adaptive substitutions relative to the rate of neutral evolution (ωA) can additionally be estimated (Bierne and Eyre-Walker 2004; Gossmann et al. 2010; Galtier 2016). Another estimate related to α is the direction of selection (DoS, Stoletzki and Eyre-Walker 2011), which is zero under neutrality, and negative and positive under purifying and positive selection, respectively. Finally, many studies simply use the ratio of nonsynonymous to synonymous divergence (Dn/Ds) or polymorphism (Pn/Ps) to assess selective pressures: Dn/Ds should increase under positive selection, and both Dn/Ds and Pn/Ps should decrease under purifying selection. These empirical studies have consistently detected a reduced efficacy of selection in regions of low recombination (Campos et al. 2014; Charlesworth and Campos 2014; Castellano et al. 2016) and in species with lower effective population size (Jensen and Bachtrog 2011; Bataillon et al. 2015; Galtier 2016; but see Bachtrog 2008 and Andolfatto et al. 2011 for counterexamples). Similarly, α values for X-linked genes are consistently larger than for autosomal genes (Meisel and Connallon 2013), even when differences in recombination are accounted for (Campos et al. 2018; Charlesworth et al. 2018). Although increased purifying selection against pleiotropic mutations (McGuigan et al. 2014) and slow rates of evolution of pleiotropic genes (Salathé et al. 2006) have been observed, the overall effect of pleiotropy on selection efficacy is not as well understood. Some studies found reduced rates of adaptation and/or efficacy of selection for more pleiotropic genes (Hahn and Kern 2005; Papakostas et al. 2014), whereas others found the opposite (Vedanayagam and Garrigan 2015; Huber et al. 2017; Josephs et al. 2017), or no effect at all (Jordan et al. 2003; Hahn et al. 2004). One challenge in making sense of these studies is that pleiotropy can be interpreted in many ways (Paaby and Rockman 2013), and different measures are used to estimate it. Common measures include the number of protein–protein interactions, a proxy for the number of molecular functions of a gene, as well as the breadth of expression, which reflects the potential of a gene to affect different phenotypes. Both of these suffer from the drawback of not directly assessing whether the gene has an effect on more than one fitness component (e.g., a gene may perform the same function in many tissues, or have a single molecular function in a very large gene network). Finally, a last challenge faced by all these studies is the fact that many parameters influencing the efficacy of selection are themselves correlated (e.g., expression and nonneutral divergence, recombination and expression, connectivity and expression, etc.; see supplementary fig. S1, Supplementary Material online), making it difficult to disentangle their individual effects. In particular, studies of pleiotropy rarely take into account recombination, and generally focus on one type of pleiotropic antagonism (e.g., between sexes, life stages, tissues, or gene networks). Similarly, although many studies have detected faster adaptive evolution of male-biased genes, it is difficult to disentangle the effect of reduced pleiotropy from increased rates of adaptation due to sexual selection, the favored hypothesis. Here, we take a systematic approach to examine the effect of pleiotropy on the efficacy of selection in Drosophila melanogaster. First, we consider the effect of well-known modulators of selection efficacy, such as X-linkage, recombination rate and expression level, on different measures of positive and purifying selection (α, DoS, ωA, Pn/Ps, Dn/Ds). We then combine these with various proxies for pleiotropy: gene connectivity, and breath of expression in different tissues, sexes and life stages. This allows us both to disentangle the effect of pleiotropy from that of its covariates, and to assess the relative effect of antagonism between sexes, life stages, tissues and gene networks on the direction, strength, and efficacy of selection.

Results

A total of 10,631 and 9,895 genes were analyzed at coding and noncoding (untranslated regions [UTRs]) sites, respectively. The proportion of substitutions fixed by positive selection (α) was calculated based on the number of synonymous and nonsynonymous/UTRs polymorphic sites within D. melanogaster and divergent sites with D. simulans (see Materials and Methods). In the first two sections, we present results related to coding regions, whereas in the last section we compare them with noncoding regions.

General Modulators of Selection Efficacy

We first quantified the effect of factors that potentially affect the efficacy of selection by means of Spearman’s rank correlations using binned data (fig. 1). We ranked genes by their value at each covariate and divided them into equally sized classes for numerical variables (see Materials and Methods); we then correlated the medians of the variable at each bin with the respective median αMK values. As previously found (Campos et al. 2018; Charlesworth et al. 2018), we detected a faster-X effect, that is, a higher fraction of adaptive substitutions on X-linked genes compared with autosomal genes (medians: αA = 0.600, αX = 0.817; Wilcoxon’s rank test: P-value = 2.94E-80, nA = 8,943, nX = 1,688; fig. 1) and a faster rate of adaptive evolution (medians: ωAA = 0.190, ωAX = 0.322; Wilcoxon’s rank test: P-value = 1.54E-49, nA = 8,943, nX = 1,688; supplementary fig. S2a, Supplementary Material online). This pattern held for all categories of sex-biased expression (male-biased: αA = 0.800 [nA = 799], αX = 0.831 [nX = 90], P-value = 3.43E-1; unbiased: αA = 0.576 [nA = 8,048], αX = 0.818 [nX = 1,555] P-value < 2.2E-16; female-biased: αA = 0.548 [nA = 96], αX = 0.746 [nX = 43], P-value = 0.224), which may indicate a primary role of the higher effective population size of the X compared with autosomes in D. melanogaster (see Campos et al. 2018 for a discussion).

. 1.

The proportion of adaptive substitutions (αMK) in coding (red) and noncoding (blue) regions is strongly influenced by general modulators. The relationship between each covariate (x axis) and αMK (y axis) is shown in successive panels: (a) chromosomal location, (b) recombination rate (4*Ne*r per bp), (c) length of the longest transcript (bp), and (d) global expression level (RPKM). Each point represents the median over genes grouped according to their values at each covariate. Error bars indicate 95% bootstrapped confidence intervals. For numerical covariates, the Spearman’s rank correlation coefficient (ρ and ρpartial) is given. Categorical covariates were compared with Wilcoxon’s rank tests. Significance levels are denoted by asterisks (***P < 0.001, **P < 0.01, *P < 0.5, ns: non significant). The lines are least-squares regressions (linear or quadratic depending on which model best fitted the data based on AIC comparison; supplementary table S1, Supplementary Material online, for coding regions and supplementary table S2, Supplementary Material online, for noncoding regions), but they should be considered as indicative because of the binning procedure. Variants below 5% frequency were excluded. We also found that the fraction of adaptive substitutions is strongly positively correlated with the recombination rate (ρ = 0.932, P-value = 8.02E-23, n = 50; fig. 1), consistent with previous studies (Campos et al. 2014; Castellano et al. 2016). Similarly, there is a strong negative correlation between the length of the longest transcript and α (ρ = −0.808, P-value = 1.32E-12, n = 50; fig. 1). We also confirmed earlier findings (Larracuente et al. 2008) that transcript length is negatively correlated with the evolutionary rate, Dn/Ds (ρ = −0.962, P-value = 8.38E-29, n = 50; supplementary fig. S2c, Supplementary Material online). These observations can be interpreted as a consequence of both positive and purifying selection at linked sites (Hill–Robertson interferences), between neighboring genes or between sites within a gene, which reduces the efficacy of selection in a linear manner (supplementary table S1, Supplementary Material online) in regions of low recombination and long genes. Interestingly, we noted a strong decrease in the ratio of nonsynonymous to synonymous polymorphism, Pn/Ps, with the recombination rate (ρ = −0.878, P-value = 5.45E-17, n = 50; supplementary fig. S2b, Supplementary Material online), indicating that purifying selection is more efficient when interferences are weaker (see Campos et al. 2014 for similar results in noncrossover vs. crossover regions of D. melanogaster). On the contrary, the evolutionary rate, Dn/Ds, increased with recombination (ρ = 0.608, P-value = 2.89E-6, n = 50; supplementary fig. S2b), but the pattern was noisier. Previous analyses of the correlation between Dn/Ds and recombination have yielded mixed results (Betancourt and Presgraves 2002; Presgraves 2005; Zhang and Parsch 2005; Haddrill et al. 2007; Campos et al. 2014; Bolívar et al. 2016). This may reflect the fact that the predicted effect of recombination rate on Dn/Ds depends on the predominant mode of selection (and therefore on the set of genes used for the analysis): Dn/Ds is expected to be positively correlated with recombination under positive selection and negatively under purifying selection. The detection of a positive correlation supports a significant contribution of positive selection to protein divergence in D. melanogaster (consistent with the positive correlation between ωA and recombination: ρ = 0.902, P-value = 3.64E-19, n = 50; supplementary fig. S2b). Genes with low expression are also known to be associated with increased rates of evolution, presumably because they are under relaxed purifying selection (e.g., Pál et al. 2006; Larracuente et al. 2008, see also our supplementary fig. S2d). However, the relationship between expression and the fraction of adaptive fixations is not as well understood (Carneiro et al. 2012). Figure 1 shows that the lower the expression level of a gene, the lower the fraction of adaptive fixations (ρ = 0.551, P-value = 3.41E-5, n = 50), whereas the rate of adaptive substitutions (ωA: ρ= −0.773, P-value = 4.67E-11, n = 50) and Pn/Ps (ρ = −0.954, P-value = 7.07E-27, n = 50) followed the opposite trend (supplementary fig. S2d). This suggests that genes with low expression are less conserved and evolve (adaptively) more rapidly than genes with higher expression. However, the relationship with α was nonlinear (Akaike information criterion: AIClinear = −131, AICquadratic = −172.3, supplementary table S1) and mostly driven by very highly expressed genes that are under strong purifying selection but experience a higher adaptive evolutionary rate than expected linearly (supplementary fig. S2d). Finally, α necessarily depends on the strength and frequency of positive selection itself. Genes under very strong constraint are expected to have low α, if their mutational space only includes very few beneficial mutations compared with neutral ones (supplementary fig. S2e). To account for this, we performed a supplementary analysis that incorporates Dn/Ds itself as a predictor (supplementary tables S3 and S4, Supplementary Material online). It should be noted that using Dn/Ds as a confounding variable is a conservative approach, as Dn/Ds and α are intrinsically correlated, leading to some loss of power to detect the effect of other predictors. However, it is an important control, especially when sex-biased genes, which are often under unusual selective regimes due to sexual selection, are considered. Notably, all these patterns were robust to the different procedures used to filter the data (supplementary fig. S3a–e, Supplementary Material online): excluding singletons instead of variants below 5% frequency to control for slightly deleterious polymorphisms; excluding transcript shorter than 900 bp to avoid small counts; and excluding X-linked genes, sex-biased genes, or immune genes as these factors have a strong effect on α (Obbard et al. 2009; Campos et al. 2018). Moreover, we estimated α with another method (Eyre-Walker and Keightley 2009) that explicitly models deleterious mutations based on the site frequency spectrum of all variants (supplementary fig. S2a–e, Supplementary Material online), and we again recovered all qualitative patterns.

The Proportion of Adaptive Substitutions under Different Pleiotropic Regimes

Our aim was to quantify the net effect of different facets of pleiotropy (namely, gene networks and breadth of expression across tissues, life stages, and sexes) on selection efficacy after controlling for the general modulators introduced above. This was done by applying Spearman’s partial correlations on unbinned data (table 1). Although partial correlations between α and the general modulators were weaker than correlations on binned data, they followed the same trends (fig. 1). In addition, we obtained partial correlations (supplementary table S3, Supplementary Material online) with the direction of selection (DoS) and the adaptive substitution rate (ωAMK), and further estimated the effect size of each predictor in linear models (supplementary table S3, Supplementary Material online). Importantly, the general modulators alone explained 10.5% of the variance in DoS (respectively 9.5% for ωAMK), whereas pleiotropy explained only 2.3% (respectively 2.5% for ωAMK) of the total variance. The effect of all pleiotropic metrics combined was thus limited, but significant based on a likelihood ratio test (Log Lmodulators = 2,748.8, Log Lall = 2,893.2; P-value = 9.62E-58; supplementary table S3, Supplementary Material online). In comparison, the Hill–Robertson factors alone (recombination rate and transcript length) contributed up to 9.5% to the variance in DoS (respectively 7.9% for ωAMK).

Table 1.

Spearman’s Rank Partial Correlations on αMK in (a) Coding and (b) Noncoding Regions.

	(a) Coding		(b) Noncoding
Covariate	Estimate	P-value	Estimate	P-value
Log(recombination rate)	1.28E-001	6.61E-040***	1.21E-001	1.86E-033***
Log(transcript length)	−1.32E-001	1.55E-042***	−2.89E-002	4.04E-003**
Log(expression level)	1.09E-001	2.76E-029***	−2.01E-003	8.42E-001ns
X chromosome	1.31E-001	1.34E-041***	7.78E-002	9.33E-015***
Gene–gene interactions	2.57E-002	8.12E-003**	−2.39E-003	8.12E-001ns
Protein–protein interactions	4.03E-002	3.21E-005***	9.12E-003	3.65E-001ns
microRNA–gene interactions	2.41E-003	8.04E-001ns	−3.66E-002	2.71E-004***
TF–gene interactions	−9.08E-003	3.50E-001ns	1.02E-002	3.10E-001ns
Sexually antagonistic genes	−1.26E-002	1.93E-001ns	−2.99E-002	2.91E-003**
Intersexual genetic correlation	−1.90E-002	5.01E-002*	6.69E-003	5.06E-001ns
Folded sex specificity	8.99E-002	1.72E-020***	2.78E-002	5.79E-003**
Tissue-by-stage specificity	−6.68E-003	4.91E-001ns	−5.99E-003	5.52E-001ns

Note.—Variants below 5% frequency were excluded.

Significance levels are denoted by asterisks: ***P < 0.001, **P < 0.01, *P < 0.05. ns: non significant.

Spearman’s Rank Partial Correlations on αMK in (a) Coding and (b) Noncoding Regions. Note.—Variants below 5% frequency were excluded. Significance levels are denoted by asterisks: ***P < 0.001, **P < 0.01, *P < 0.05. ns: non significant.

Gene Networks

Pleiotropy has been commonly measured by the number of interactions a gene is involved in, that is, gene connectivity. However, previous studies have found limited evidence of an effect of connectivity on evolutionary rate (e.g., Dn/Ds: Jordan et al. 2003; Hahn et al. 2004) or on the rate of adaptive substitutions (ωA: Josephs et al. 2017). Here, we investigated gene networks at four different levels (Murali et al. 2011): genetic interactions, that is, the modification of the phenotype of a mutant by an allele at a second gene; microRNA–gene interactions, predicted primarily from the complementarity of microRNAs and putative target genes; and experimentally derived transcription factor (TF)–gene interactions and protein–protein interactions. In all cases, there was a small number of highly connected genes (hubs), whereas the majority of the genes were involved in no or a few interactions. This pattern was extreme for genetic and protein–protein networks, where 88% and 71% of the genes had no interactors recorded, respectively, and only 10% and 6% displayed more than one interactor. Gene networks at the microRNA and TF levels were less skewed, with only 16% and 10% of the genes without interactors, respectively, and 10% and 26% showing more than 20 interactors. If antagonistic molecular pleiotropy were widespread, genes with more interactors should have lower α values. Contrary to this, the most consistent pattern was a positive correlation between the number of protein–protein interactions and α (ρ = 0.748, P-value = 7.35E-3, n = 12; fig. 2 ρpartial = 0.0403, P-value = 3.21E-5, n = 10,631; table 1), whereas patterns were not consistent across the different data sets for the other types of interactions (supplementary fig. S3f, Supplementary Material online). Accordingly, a stepwise selection approach showed that of the four measures of gene networks, only protein–protein interactions were relevant in the DoS (table 2) and ωA (table 3) linear models. In the case of microRNAs, an apparent negative correlation was observed between the number of interactions and α values (ρ = −0.673, P-value = 3.12E-4, n = 24; fig. 2). However, this correlation was not significant with the alternative method of Eyre-Walker and Keightley (2009) (supplementary fig. S2g, Supplementary Material online), and disappeared when other covariates were accounted for (ρpartial = 2.41E-3, P-value = 8.04E-1, n = 10,631; table 1). This seemed to be partly due to the positive correlation between the number of microRNA–gene interactions and transcript length (ρ = 0.368, P-value < 2.2E-16, n = 10,631, supplementary fig. S1a, Supplementary Material online), which may be explained by the fact that many microRNA–gene interactions were estimated in silico based on sequence complementarity (and longer sequences have a higher chance of being complementary). Finally, supplementary fig. S2f, Supplementary Material online, indicates that the positive relationship between connectivity at the protein level and the fraction of adaptive fixations was mainly driven by stronger purifying selection acting on highly connected genes (Pn/Ps: ρ = −0.863, P-value = 2.99E-4, n = 12), whereas positive selection had a negligible effect (ωA: ρ = −0.657, P-value = 2.40E-2, n = 12). Taken together, these results suggest that the common assumption that highly connected genes are more likely to be under antagonistic molecular pleiotropy (Promislow 2004) may be unfounded (He and Zhang 2006).

. 2.

The proportion of adaptive substitutions (αMK) in coding (red) and noncoding (blue) regions is moderately influenced by different facets of pleiotropy: (a) gene networks: protein–protein interactions and microRNA–gene interactions, (b) tissue-by-stage specificity, and (c) sex-related metrics: sex specificity, sexual antagonism, and intersexual genetic correlation. Other details match figure 1.

Table 2.

Best Linear Model on DoS Obtained After a Stepwise Model Selection in (a) Coding and (b) Noncoding Regions.

	(a) Coding		(b) Noncoding
Covariate	Estimate	P-value	Estimate	P-value
Intercept	1.41E-001	0.00E+000***	6.78E-002	2.53E-250***
Log(recombination rate)	3.71E-002	1.71E-079***	2.53E-002	1.92E-042***
Log(transcript length)	−3.38E-002	8.28E-069***	−1.64E-002	8.99E-017***
Log(expression level)	−3.54E-003	1.05E-001ns	5.20E-003	1.19E-002*
X chromosome	5.60E-002	4.80E-026***	4.90E-002	8.94E-023***
Gene–gene interactions
Protein–protein interactions	2.80E-003	1.34E-001ns	4.36E-003	1.38E-002*
microRNA–gene interactions			4.06E-003	2.99E-002*
TF–gene interactions
Sexually antagonistic genes			−1.40E-002	1.53E-002*
Intersexual genetic correlation
Folded sex specificity	2.87E-002	1.66E-051***	5.37E-003	3.15E-003**
Tissue-by-stage specificity	7.10E-003	2.69E-003**	−4.23E-003	6.03E-002ns

Note.—Shaded cells show the discarded covariates. Variants below 5% frequency were excluded.

Significance levels are denoted by asterisks: ***P < 0.001, **P < 0.01, *P < 0.05. non significant.

Table 3.

Best Linear Model on ωAMK Obtained After a Stepwise Model Selection in (a) Coding and (b) Noncoding Regions.

	(a) Coding		(b) Noncoding
Covariate	Estimate	P-value	Estimate	P-value
Intercept	3.60E-001	0.00E+000***	2.43E-001	9.66E-049***
Log(recombination rate)	1.12E-001	1.05E-043***	1.23E-001	3.94E-015***
Log(transcript length)	−1.42E-001	1.10E-070***	−1.21E-001	3.23E-013***
Log(expression level)	−6.94E-002	1.45E-014***	−2.45E-002	1.53E-001ns
X chromosome	1.65E-001	4.81E-014***	1.82E-001	1.51E-005***
Gene–gene interactions
Protein–protein interactions	1.39E-002	7.15E-002ns
MicroRNA–gene interactions			2.74E-002	8.12E-002ns
TF–gene interactions
Sexually antagonistic genes			−8.72E-002	7.25E-002ns
Intersexual genetic correlation			2.71E-002	6.29E-002ns
Folded sex specificity	1.11E-001	6.04E-046***
Tissue-by-stage specificity	4.73E-002	1.30E-006***	−3.72E-002	3.95E-002*

Note.—Shaded cells show the discarded covariates. Variants below 5% frequency were excluded.

Significance levels are denoted by asterisks: ***P < 0.001, **P < 0.01, *P < 0.05. ns: non significant.

Best Linear Model on DoS Obtained After a Stepwise Model Selection in (a) Coding and (b) Noncoding Regions. Note.—Shaded cells show the discarded covariates. Variants below 5% frequency were excluded. Significance levels are denoted by asterisks: ***P < 0.001, **P < 0.01, *P < 0.05. non significant. Best Linear Model on ωAMK Obtained After a Stepwise Model Selection in (a) Coding and (b) Noncoding Regions. Note.—Shaded cells show the discarded covariates. Variants below 5% frequency were excluded. Significance levels are denoted by asterisks: ***P < 0.001, **P < 0.01, *P < 0.05. ns: non significant. The proportion of adaptive substitutions (αMK) in coding (red) and noncoding (blue) regions is moderately influenced by different facets of pleiotropy: (a) gene networks: protein–protein interactions and microRNA–gene interactions, (b) tissue-by-stage specificity, and (c) sex-related metrics: sex specificity, sexual antagonism, and intersexual genetic correlation. Other details match figure 1.

Specificity of Expression across Tissues and Life Stages

Analyses of the effect of expression specificity on evolutionary rate have essentially been limited to adult tissues (Duret and Mouchiroud 2000; Grath and Parsch 2012), whereas other studies have found an impact of the timing of expression during development (early vs. late, e.g., Artieri et al. 2009) but did not survey its breadth across stages. Here, we explored the influence on α of expression specificity across: 1) four developmental stages (“stage specificity,” supplementary fig. S3j, Supplementary Material online); 2) several tissues within each stage (“larval tissue specificity,” “pupae tissue specificity,” and “somatic adult tissue specificity,” supplementary fig. S3k, Supplementary Material online); and 3) by combining these two dimensions (“tissue-by-stage specificity,” supplementary fig. S3l, Supplementary Material online). As expected, we found that genes specifically expressed in a particular tissue and/or stage have low expression overall (ρ ranging from −0.461 to −0.586, and P-value < 2.2E-16 in all cases, n = 10,631, supplementary fig. S1a, Supplementary Material online). Interestingly, we also found a significant negative correlation between the number of TF interactions and the extent of expression specificity (ρ ranging from −0.415 to −0.662, and P-value < 2.2E-16, n = 10,631 in all cases, supplementary fig. S1a, Supplementary Material online), suggesting that broadly expressed genes have a more complex regulatory architecture. As there was a positive correlation between the five “expression specificity” metrics (ρ ranging from 0.493 to 0.932, and P-value < 2.2E-16, n = 10,631 in all cases, supplementary fig. S1a, Supplementary Material online), and they were redundant in a principal component analysis (supplementary fig. S1b, Supplementary Material online), we focused only on the combined measure across tissues and stages (tissue-by-stage specificity) to avoid collinearity issues. The relationship between α and tissue-by-stage specificity (fig. 2) was significantly better explained by a quadratic model than a linear model (AIClinear = −107.6, AICquadratic = −149.1, supplementary table S1, Supplementary Material online), and this pattern was consistent across the different data sets (supplementary fig. S3l, Supplementary Material online) and methods (supplementary fig. S2k, Supplementary Material online). Accordingly, the Spearman’s correlation could not capture this nonmonotonic trend (ρ = −0.0596, P-value = 0.691, n = 47, fig. 2 ρpartial = −0.007, P-value = 0.491, n = 10,631, table 1). Supplementary fig. S2k, Supplementary Material online, reveals that the “U-shaped” pattern arises from the combination of a negative linear relationship between expression specificity and the strength of purifying selection (either using the fraction of strongly deleterious mutations [ρ = −0.890, P-value < 2.2E-16] or Pn/Ps [ρ = 0.948, P-value = 6.46E-24, n = 47] as found by Huber et al. 2017) and a positive nonlinear relationship with the adaptive evolutionary rate (ωA, ρ = 0.672, P-value = 2.32E-7, n = 47). This suggests that an increased fraction of adaptive substitutions can be either caused by strong purifying selection on highly pleiotropic genes (i.e., broadly expressed) or enhanced positive selection on weakly pleiotropic genes (i.e., specifically expressed).

Sex-Biased Genes and Sexual Antagonism

As others have recently found (Avila et al. 2015; Campos et al. 2018), male-biased genes are characterized by a higher proportion of adaptive substitutions than unbiased genes but not female-biased genes, leading to an asymmetric “U-shaped” pattern (fig. 2). This pattern, observed based on whole-body samples and robust to varying data sets (supplementary fig. S3g, Supplementary Material online) and methods (supplementary fig. S2h, Supplementary Material online), was mostly driven by reproductive tissues; heads did not show sufficient sexual dimorphism for such an analysis to be meaningful (supplementary fig. S4, Supplementary Material online). When merging male-biased and female-biased genes together, the folded sex specificity had a significant positive effect on α after controlling for other factors (ρpartial = 0.0899, P-value = 1.72E-20, n = 10,631, table 1). It was also retained in the best linear regression on DoS (table 2) and ωA (table 3), confirming its influence on selection. Several hypotheses have been put forward to explain this accelerated evolution of male-biased genes. First, the fact that adaptive evolution is stronger in male-biased genes than in unbiased and female-biased genes, and that male-biased genes are primarily expressed in the male reproductive organs (supplementary fig. S5, Supplementary Material online), may indicate that sexual selection originating from male–male competition drives this pattern. However, male sexual selection is unlikely to be the only process at play, because male-biased genes had a consistently higher fraction of adaptive fixations than unbiased and female-biased genes across three categories of evolutionary rate (“slow,” “intermediate,” and “fast” evolving genes; supplementary fig. S6, Supplementary Material online). Similarly, partial correlations show that sex bias remains a strong predictor of α even when Dn/Ds is corrected for (ρpartial = 0.0584, P-value = 1.72E-9, n = 10,631; supplementary table S3, Supplementary Material online). Second, sexual antagonistic pleiotropy could produce a similar “U-shaped” pattern as genes under sexual antagonism will exhibit unbiased expression, whereas intrasexual conflict is expected to be resolved by sex-specific expression. We took two approaches to detect an effect of sexual conflict on selection. First, we estimated α for a set of genes that had experimentally been detected as sexually antagonistic (Innocenti and Morrow 2010). The median folded sex specificity was 0.807 for sexually antagonistic genes, and 1.00 for nonsexually antagonistic ones (Wilcoxon’s rank test: P-value = 2.5E-11, nSAG = 1,009, nSAG = 9,622), as expected if dimorphism provides a resolution of conflict. However, when we compared α between the two sets of genes, we did not find any significant difference (medians: αSAG = 0.625, αnon-SAG = 0.649; Wilcoxon’s rank test: P-value = 0.354, nSAG = 1,009, nnon-SAG = 9,622; fig. 2), and this was consistent across data sets (supplementary fig. S3h, Supplementary Material online, except when excluding only singletons) and methods (supplementary figs. S2, Supplementary Material online). As genes were classified as sexually antagonistic based on interactions between expression and sex-specific fitness (Innocenti and Morrow 2010), selection at the coding sequence level may not be involved in the conflict. A second approach was to look at the intersexual correlation of expression of each gene, which is thought to underlie much of the sexual conflict (as it prevents the two sexes from reaching their optimum; Dean and Mank 2016). The intersexual genetic correlation in expression (which was negatively correlated to folded sex specificity, ρ = −0.185, P-value < 2.2E-16, n = 10,631, supplementary fig. S1a, Supplementary Material online; see Griffin et al. 2013 and Allen et al. 2018 for similar results) was negatively correlated with α (ρ = −0.474, P-value = 5.36E-003, n = 33, fig. 2 ρpartial = −0.019, P-value = 5.01E-2, n = 10,631, table 1), confirming that part of these highly correlated genes may be under sexual conflict. Although the trend was consistent across data sets (supplementary fig. S3i, Supplementary Material online), it was not significant with the Eyre-Walker and Keightley method (supplementary fig. S2j, Supplementary Material online). Third, male-biased genes are more tissue-specific than other genes (e.g., in Drosophila: Meisel 2011; Grath and Parsch 2012; in vertebrates: Mank et al. 2008), and their increased α may be mainly driven by tissue specificity. Consistent with this, we found a significant positive correlation between the folded sex specificity and the adult tissue specificity calculated including male and female reproductive tissues (ρ = 0.200, P-value < 2.2E-16), confirming previous findings in Drosophila (Assis et al. 2012; Allen et al. 2018). Disentangling the effect of tissue specificity is difficult because reproductive genes have by far the most specific patterns of expression of any tissue (64% of all fully adult tissue-specific genes were expressed in the male reproductive tissues). In order to get around this, we focused instead on all genes with high expression in the male reproductive tissues (testis and accessory glands), and divided those into three categories of adult tissue specificity: broad (τ < 0.4; 952 genes), intermediate (0.4 ≤ τ ≤ 0.8; 273 genes), and specific (τ > 0.8; 946 genes). Supplementary figure S8, Supplementary Material online, shows that the two extreme categories have similarly high α values (medians: αbroad = 0.833, αintermediate = 0.730, and αspecific = 0.804, respectively), suggesting that tissue specificity is not the primary cause of increased selection efficacy of male-biased genes (Meisel 2011; Grath and Parsch 2012). Finally, the fact that female-biased genes were not significantly different from unbiased genes in terms of α (fig. 2) may be due to their higher breadth of expression (e.g., only 13 genes were strictly expressed in female reproductive tissues). In particular, female-biased genes were far more expressed in the carcass and digestive systems of adults than male-biased genes (supplementary fig. S5, Supplementary Material online). Moreover, female-biased genes were highly expressed in the embryo (supplementary fig. S5, Supplementary Material online), suggesting that many of them function as maternal RNAs, which should further constrain their evolutionary dynamics due to ontogenic pleiotropy.

The Efficacy of Selection at Coding and Noncoding Sites

We compared the selective regime of coding sequences and the UTRs of transcripts, which are known to be involved in gene regulation (Barrett et al. 2012). Synonymous sites were used as the neutral control for both analyses, whereas in the analysis of noncoding regions Du and Pu refer to mutations in the UTRs. Overall, the median fraction of adaptive substitutions was higher in coding than noncoding regions (αcoding = 64.6% vs. αnoncoding = 36.7%; Wilcoxon’s rank test: P-value < 2.2E-16, ncoding = 10,631, nnoncoding = 9,895), and these values were close to those previously reported (Andolfatto 2005). Although the divergence ratio was slightly lower in coding regions (Dn/Dscoding = 40% vs. Du/Dsnoncoding = 49%; Wilcoxon’s rank test: P-value < 2.2E-16, ncoding = 10,631, nnoncoding = 9,895), there was a 2-fold decrease in the median polymorphism ratio in coding regions compared with noncoding regions (Pn/Pscoding = 16% vs. Pu/Psnoncoding = 32% when excluding variants below 5% frequency; Wilcoxon’s rank test: P-value < 2.2E-16, ncoding = 10,631, nnoncoding = 9,895), confirming previous findings that the latter are less constrained (Campos et al. 2018). At the genome level, α values in coding and noncoding regions were moderately but significantly correlated (ρ = 0.23, P-value < 2.2E-16, n = 9,895), suggesting that the selective pressures acting on genes and on their regulatory sequences are comparable. Consistent with this, most strong modulators of selection had similar effects on α for coding and noncoding sequences (we obtained comparable patterns with DoS and ωA, supplementary table S4, Supplementary Material online). Corroborating results by Campos et al. (2018), we found a strong faster-X effect in UTRs (medians: αA = 0.328, αX = 0.530; Wilcoxon’s rank test: P-value = 6.28E-41, nA = 8,299, nX = 1,596; fig. 1), consistent with faster gene expression divergence on the X chromosome compared with the autosomes (Kayserili et al. 2012; Meisel et al. 2012). Moreover, selection efficacy strongly increased with the local recombination rate (ρ = 0.916, P-value = 1.19E-20, n = 50, fig. 1 ρpartial = 0.121, P-value = 1.86E-33, n = 9,895, table 1). We also found a significant negative partial correlation between the length of the transcript and selection efficacy on its UTRs (ρpartial = −0.029, P-value = 4.04E-3, n = 9,895, table 1). However, there was no linear effect of expression level of the transcript on selection efficacy on its UTRs (ρ = −0.006, P-value = 0.968, n = 50, fig. 1 ρpartial = −0.002, P-value = 0.842, n = 9,895, table 1). This may reflect the fact that, contrary to what is observed for amino acid divergence, the rate of adaptive divergence of gene expression appears to be reduced for high expression genes (Nourmohammad et al. 2017). Taking into consideration general modulators alone, 4.9% of the variance in DoS (respectively 1.6% for ωA) was explained (supplementary table S4, Supplementary Material online), which is lower than in coding sequences (supplementary table S3, Supplementary Material online), as modulators were mostly designed based on coding information. Similarly, the effect of all pleiotropic metrics combined was lower to that found in coding regions (DoS: 0.2%; ωA: 0.1%) but was significant based on a likelihood ratio test (Log Lmodulators = 3,574.0, Log Lall = 3,589.7; P-value = 2.71E-4; supplementary table S4, Supplementary Material online). Patterns obtained for gene networks were qualitatively similar to those obtained using coding sequences. First, there was a positive correlation between α and the number of protein–protein interactions, but it was not significant for noncoding regions (ρ = 0.273, P-value = 0.391, n = 12, fig. 2 ρpartial = 0.009, P-value = 0.365, n = 9,895, table 1). We also found a negative correlation between α and the number of microRNA–gene interactions (ρ = −0.526, P-value = 8.32E-3, n = 24, fig. 2). Contrary to coding regions, this negative correlation remained significant after controlling for other factors (ρpartial = −0.037, P-value = 2.7E-4, n = 9,895, table 1), and after the stepwise model selection on DoS (table 2), which may reflect the fact that microRNAs bind primarily to 3′-UTRs. We also found the same “U-shaped” pattern (fig. 2) in the relation between the fraction of adaptive fixations and the tissue-by-stage specificity (AIClinear = −142.9, AICquadratic = −150.6, supplementary table S2, Supplementary Material online). Estimates of sexual antagonism pleiotropy were also correlated with α for noncoding regions. A positive correlation between sex specificity and α was comparable but weaker to that of coding regions (fig. 2 folded sex specificity: ρpartial = 0.0277, P-value = 5.80E-8, n = 9,895, table 1). The same was true for the intersexual genetic correlation in expression (ρ = −0.315, P-value = 7.4E-2, n = 33, fig. 2), which was nonsignificant after accounting for other factors (ρpartial = 0.006, P-value = 5.06E-1, n = 9,895, table 1). On the contrary, the pattern was stronger in noncoding regions when considering genes that were previously classified as sexually antagonistic based on expression data (Innocenti and Morrow 2010 and see Materials and Methods). Sexually antagonistic genes were under less efficient selection compared with other genes (αSAG = 0.327, αnon-SAG = 0.375; Wilcoxon’s rank test: P-value = 5.20E-3, nSAG = 954, nnon-SAG = 954; fig. 2), as expected if opposing selective forces are acting on male and female expression. Moreover, sexual antagonism was retained in the best model after stepwise selection on DoS (table 2) and on ωA (table 3).

Discussion

Pleiotropy is predicted to restrict the efficacy of selection, because trade-offs between mutations that benefit one biological process but harm others are more likely to occur in pleiotropic genes. There has been substantial debate in the literature about the effect of pleiotropy on evolutionary rates, and to a lesser extent, on selection efficacy. In this study, we shed light on the factors influencing selection efficacy in D. melanogaster and disentangle the relative contribution of different aspects of pleiotropy from that of these confounding factors. Correlates are rarely accounted for in similar studies and may have led to an overestimate of the effect of pleiotropy. We explain 4.9% of the variance in the direction of selection (DoS) (respectively 6.3% for ωA) when only considering pleiotropic metrics. When general modulators are taken into account (chromosomal location, Hill–Robertson effects, expression level), the eight pleiotropic metrics (molecular interactions, sex-related metrics, tissue-by-stage specificity) have a significant but relatively small effect on selection efficacy in coding regions: 2.3% of the total variance in DoS, whereas the two Hill–Robertson factors alone explain up to 9.5% and the complete model 12.8% (respectively 2.5%, 7.9%, and 12.0% for ωA). These estimates should be taken with some caution because of several caveats in the analysis. First, the method assumes a linear relationship between DoS and each predictor, which is clearly not the case with some pleiotropic metrics, such as tissue-by-stage specificity. Second, the different data sets employed come from different populations of D. melanogaster, and biological differences between them could have affected our results. African populations were used for the genomic data (Zambia) and for the recombination estimates (Rwanda), whereas American flies were used to identifying the sexual antagonistic genes (LH_m lab strain), for the intersexual genetic correlations (Raleigh), and for the expression data (modENCODE: Oregon-R and ISO1). Some studies have reported a fair amount of divergence among populations in the magnitude of sex-bias (e.g., in D. melanogaster: Huylmans and Parsch 2014; in D. serrata: Allen et al. 2017; but see Müller et al. 2012 for a counterexample). However, we expect that these differences would add noise in our data, therefore making our conclusions conservative. Finally, we used synonymous sites as our neutral control, although these have been shown to be under purifying selection in D. melanogaster, especially when highly expressed genes are considered (Lawrie et al. 2013). The extreme values of α and ωA that we observed for genes with the highest expression (supplementary fig. S2d, Supplementary Material online) may therefore partly be due to the strong purifying selection acting on synonymous sites, which can bias α upwards (Matsumoto et al. 2016). Despite these limitations, our results generally support the idea that the adaptive potential of a gene that has either multiple molecular functions or is expressed in multiple contexts (sexes, life stages, tissues) may be to some degree restricted. Different aspects of pleiotropy appear to restrict adaptation to different extents. First, the influence of gene connectivity on α was not robust to filtering choices or methodologies at any of the four levels investigated, and highly connected genes did not appear to be under less effective selection. Previous work has primarily focused on purifying selection, and conclusions have also been inconsistent (Jordan et al. 2003; Hahn et al. 2004; Hahn and Kern 2005; Larracuente et al. 2008). Similarly, functional analyses in which the number of biological processes of a gene was recorded found limited effects of molecular pleiotropy on the evolutionary rate in yeasts (Salathé et al. 2006), and on α or ωA in coexpression networks in plants (Josephs et al. 2017). Finally, an RNAi study performed in D. melanogaster (Vedanayagam and Garrigan 2015) found that more pleiotropic genes, that is, those having a measurable effect on multiple molecular phenotypes, are in fact under stronger positive selection. Collectively, these findings suggest that gene connectivity, despite having been widely used, is not an accurate indicator of antagonistic molecular pleiotropy (He and Zhang 2006). On the other hand, we detected an influence of sex specificity on selection efficacy. Our results suggested that this was not solely driven by the fast evolution of male-biased genes due to sexual selection, nor by their strong tissue specificity. We argue that sex specificity may be an efficient way to escape pleiotropic sexual antagonism and to evolve toward optimal sex-specific phenotypes. A caveat concerning this reasoning is our assumption that sex-biased genes typically have sex-biased fitness effects. Empirical tests are scarce but found that this is generally true in D. melanogaster (Connallon and Clark 2011). Finally, several studies have considered the effect of tissue specificity on the overall evolutionary rate (Dn/Ds) and found a positive correlation (Duret and Mouchiroud 2000; Larracuente et al. 2008), suggesting that adaptation is more prevalent in the absence of tissue pleiotropy. Likewise, breadth of expression across developmental stages may impede selection efficacy, especially in holometabolous insects, as it has been reported in D. melanogaster (Perry et al. 2014). A notable finding of our work is that by combining both the temporal (life stages) and spatial (tissues) dimensions, an intermediate level of tissue-by-stage specificity appears to lead to maladaptation. We suggest that this complex pattern arises because the accumulation of slightly deleterious nonsynonymous fixations is not compensated by adaptive fixations for intermediate level of ontogenic pleiotropy. Our estimate of α in coding regions was above 50%, consistent with what has been found in Drosophila (Andolfatto 2005; Eyre-Walker et al. 2006) and in other animal species (Galtier 2016). This value was ∼35% for noncoding regions (UTRs), which is similar to the 37.1% reported by Andolfatto (2005). These results confirm that these sequences are under effective natural selection (Andolfatto 2005; Elyashiv et al. 2016; Campos et al. 2017), supporting their role in the regulation of gene expression (Barrett et al. 2012). We found that pleiotropy has similar consequences on selection efficacy for mutations in coding regions and for those regulating gene expression, though effects were generally less pronounced in the latter. Notably, we found more support for an influence of sex-specific selection on regulatory evolution than previously reported. We observed an asymmetric U-shaped pattern between sex specificity and α in noncoding regions, contrary to Campos et al. (2018), who did not find faster adaptive evolution of male-biased genes in UTRs. The reason for this discrepancy is not entirely clear, but may have to do with how genes were classified in sex-bias categories, as Campos et al. (2018) employed a more sophisticated method. Genes that had been detected as having sexually antagonistic effects on fitness (Innocenti and Morrow 2010) also had reduced α values, further supporting the role of sexual conflict in limiting the efficacy of selection on regulatory regions. Although our results were robust to different .filtering procedures and methods to estimate α, the weak effects reported may be partly due to incomplete information on pleiotropy. There are many conflicting definitions of pleiotropy (Paaby and Rockman 2013), and many ways of estimating it. Some studies (Salathé et al. 2006; McGuigan et al. 2014) have used the number of annotated gene ontology terms, but concluded a weak effect on the rate of evolution or on the strength of stabilizing selection; this was the case even when pleiotropy within functional modules (modular pleiotropy) was considered (Collet et al. 2018). More direct and accurate measures include quantitative trait locus mapping (e.g., in mice, Wagner et al. 2008) and RNAi screening (e.g., in yeast, Ohya et al. 2005). These studies have shown that most of the genes influence only a few phenotypes (L-shaped distribution of pleiotropy, Wagner and Zhang 2011; but see Hill and Zhang [2012] for a counterpoint), but how applicable this is to other organisms is still unclear. Overall, our knowledge of the consequences of pleiotropy is still scarce, and thus future empirical studies should focus on what matters from an evolutionary perspective, that is, when a single gene influences more than one fitness component.

Materials and Methods

Genomic Data

Polymorphism within D. melanogaster

We used the Drosophila Population Genomics Project phase 3 data (Lack et al. 2015) consisting of a sample of 197 haploid genomes from a single locality, Siavonga in Zambia (Africa). This locality is considered to be part of the ancestral range for the species (Pool et al. 2012), and so the population shows high genetic diversity and has a simple demography. Whole-genome data for chromosome 2 (arms 2R and 2L), chromosome 3 (arms 3R and 3L) and chromosome X were retrieved from the Pool lab (http://johnpool.net/genomes.html, “DPGP3 SEQ” downloaded in September 2017). We sequentially masked regions with evidence of 1) heterozygosity, 2) identity by descent, and 3) admixture from non-African populations using the “MASKING PACKAGE” provided by the Pool lab. We then converted the 197 haploid sequences to standard vcf format using snp-sites.v2.3.3 (Page et al. 2016) and the D. melanogaster release 5.57 genome (http://popfly.uab.cat) as the reference. We filtered-out sites with more than 50% missing data with VCFtools v0.1.12 (Danecek et al. 2011). The “ingroup vcf” (available in supplementary file S1, Supplementary Material online) provides a list of polymorphic sites segregating in the Zambia population.

Divergence with D. simulans

We retrieved the D. simulans 2 genome aligned to the D. melanogaster 5.57 genome from the Pool lab (http://johnpool.net/genomes.html, “SIMULANS SEQ” downloaded in September 2017). We then converted the D. simulans sequence to standard vcf format using snp-sites.v2.3.3 and the D. melanogaster release 5.57 genome as reference. Sites segregating within the Zambia population were discarded, so the “outgroup vcf” (available in supplementary file S2, Supplementary Material online) provides a list of divergent sites between D. simulans and D. melanogaster. Alleles at each site were orientated based on the D. simulans sequence (sites with missing data in D. simulans were discarded).

Annotation

Only biallelic sites were considered for further analyses, and only the longest transcript of each gene was retained. We conducted variant annotation based on the vcfs with SnpEff v4.3r (Cingolani et al. 2012) using the D. melanogaster BDGP5.75 database. For each transcript, we extracted the number of synonymous (tagged “synonymous_variant”) and nonsynonymous (tagged “missense_variant”) polymorphic sites (Ps and Pn, respectively) from the annotated “ingroup vcf”. Similarly, we extracted the number of synonymous and nonsynonymous divergent sites (Ds and Dn, respectively) from the annotated “outgroup vcf”. To investigate the evolution of noncoding DNA, we also extracted the number of UTR variants (tagged “3_prime_UTR_variant” and “5_prime_UTR_variant”) as an alternative nonneutral class of sites in the “ingroup” and “outgroup” vcfs (Pu and Du, respectively). Annotations for each transcript are provided in supplementary file S3, Supplementary Material online, for coding data and supplementary file S4, Supplementary Material online, for noncoding data. In the Supplementary Material online, we performed an additional annotation based on vcfs with SNPGenie (Nelson et al. 2015) using the D. melanogaster 5.75 genome and annotation file (ftp.flybase.net/releases/FB2014_03/dmel_r5.57). SNPGenie was run twice: once for the forward strand with argument vcfformat = 1 and once for the complementary strand. For each transcript, we obtained the total number of potential synonymous and nonsynonymous sites (Ns and Nn, respectively), together with Ps, Pn, Ds, and Dn as described above. We also computed the folded synonymous and nonsynonymous site frequency spectra (SFSs and SFSn, respectively) by downsampling 186 haplotypes at each site to get data without missing genotypes. Annotations for each transcript are provided in supplementary file S5, Supplementary Material online.

Expression Data

We used the Model Organism ENCyclopedia Of Dna Elements (modENCODE Consortium 2010) that provides RNA-seq expression patterns in D. melanogaster across multiple developmental stages (embryos, larvae, pupae, male and female adults) and types of tissues (carcass, fat, salivary glands, digestive system, imaginal discs, central nervous system [CNS], heads, ovaries, testes, and accessory glands). The expression data (“gene_rpkm_report_fb_2017_04.tsv.gz”) were retrieved from ftp.flybase.net/releases/current/precomputed_files/genes in October 2017. Expression values were provided as the number of reads per kilobase per million reads (RPKM) summed over all transcripts for each gene. We quantile normalized the expression values between samples using the R package “preprocessCore” v1.36.0 (Bolstad 2017). Normalization was applied either 1) between male and female adults, 2) between developmental stages for whole body samples, or 3) between tissues for a given developmental stage. In total, we had 56 samples across all tissues and developmental stages, among which three were removed due to very low expression after normalization (“em0.2hr,” “L3_Wand_fat,” and “L3_Wand_saliv” stages). No minimal expression cutoff was applied. Gene expression values for each sample are provided in supplementary file S6, Supplementary Material online.

Efficacy of Selection

The Proportion and Rate of Adaptive Substitutions: McDonald–Kreitman Method

Based on the SnpEff annotation, we calculated the proportion of nonneutral substitutions fixed by adaptation (α). High values of α can be due to either increased efficacy of positive selection (higher rate of fixation of adaptive substitutions) and/or increased efficacy of purifying selection (lower rate of fixation of deleterious substitutions). We compared the number of sites that were polymorphic within D. melanogaster and divergent between D. melanogaster and D. simulans for a selected class of sites (nonsynonymous or UTR variants) to a neutral reference (synonymous variants) as follows: αMK= 1 − (Pn/Ps)/(Dn/Ds) (Smith and Eyre-Walker 2002, based on an extension of the McDonald–Kreitman test [McDonald and Kreitman 1991]). We assumed that synonymous variants are neutral. Although there is little evidence for current selection on codon usage bias in D. melanogaster (McVean and Vieira 2001), some purifying selection is likely acting on synonymous sites in high expression genes (Lawrie et al. 2013; see Discussion). We excluded rare variants from the analysis (below 5% frequency in our 197 haplotype sample), because the presence of slightly deleterious mutations segregating within species is expected to bias estimates of α downward (Fay et al. 2001, Charlesworth and Eyre-Walker 2008). Estimates were obtained for both coding (supplementary file S3, Supplementary Material online) and noncoding regions (supplementary file S4, Supplementary Material online). In the Supplementary Material online, we additionally obtained estimates of αMK in coding regions by: 1) removing transcripts shorter than 900 bp; 2) excluding X-linked genes; 3) excluding sex-biased genes (SBR > −4 or SBR > 4); 4) excluding immune genes (n = 483 genes with GO:0002376 “immune system process” on http://flybase.org; last accessed October 2018); or 5) removing only singletons. Finally, from αMK we deduced the rate of fixation of beneficial mutations relative to the rate for neutral mutations, as follows: ωAMK = αMK * Dn/Ds (Muyle et al. 2018). This complementary metric is suitable under a nearly neutral regime as αMK would be affected by the rate of both nonadaptive and adaptive substitutions.

The Proportion and Rate of Adaptive Substitutions: Eyre-Walker and Keightley Method

Based on the SNPGenie annotation and all variants, we alternatively calculated α with the approach of Eyre-Walker and Keightley (2009). The method is described in the appendix of their paper, and implemented in the software DoFEα (http://www.sussex.ac.uk/lifesci/eyre-walkerlab/resources; last accessed October 2017). This extension of the classic McDonald and Kreitman framework (1991) explicitly models the contribution of deleterious mutations to polymorphism and divergence, and corrects for distortions of the site frequency spectrum, either due to recent demographic changes or genetic draft (Messer and Petrov 2013), by introducing a series of nuisance parameters. The method uses the folded SFSs, folded SFSn, Ns, and Nn to infer the distribution of fitness effects (DFE) of new mutations by fitting a gamma distribution with two parameters (the shape and the mean strength of selection), based on the method of Eyre-Walker et al. (2006). From the DFE, the method calculates the proportion of mutations in four ranges of Ne*s, where Ne is the effective population size and s is the selection coefficient for deleterious mutations in heterozygotes: 1) Ne*s = 0–1 (nearly neutral); 2) Ne*s = 1–10; 3) Ne*s = 10–100; and 4) Ne*s > 100 (strongly deleterious). We also calculated the rate of adaptive substitution relative to neutral (ωAEWK) following the method of Gossmann et al. (2010) and implemented in the software DoFEα. The method gives 95% confidence intervals for αEWK and ωAEWK, and the standard error associated with the proportion of mutations in the four ranges of the DFE. Binning was necessary here, because estimations for single genes are imprecise. So we grouped genes into bins according to their value at each covariate (see “Covariates” section below), and DoFE was run for each bin independently. For numerical covariates, bins were based on quantiles that divide the data in equally sized groups.

The Direction of Selection, DoS

We calculated the direction of selection statistic (Stoletzki and Eyre-Walker 2011) as follows: DoS = Dn/(Dn + Ds) − Pn/(Pn+Ps). A deficit of nonsynonymous polymorphisms relative to nonsynonymous substitutions (DoS > 0) indicates positive selection, whereas an excess of nonsynonymous polymorphisms relative to nonsynonymous substitutions (DoS < 0) is indicative of purifying selection. The correlation between ωAMK and DoS was strong and highly significant (ρ = 0.964, P-value < 2.2E-16, n = 10,631, supplementary fig. S1a, Supplementary Material online), as was to a lesser extent the correlation between αMK and DoS (ρ = 0.798, P-value < 2.2E-16, n = 10,631, supplementary fig. S1a, Supplementary Material online) and αMK and ωAMK (ρ = 0.676, P-value < 2.2E-16, n = 10,631, supplementary fig. S1a, Supplementary Material online).

Covariates

We used the following statistics for each gene as covariates: 1) chromosomal location (autosome vs. X chromosome), 2) length of its longest transcript, 3) recombination rate, 4) global expression level (average expression of a gene over the whole body samples), 5) intersexual genetic correlation in expression, 6) sexual antagonism, 7) four types of connectivities, and 8) six types of expression specificities across sexes, life stages, and tissues. In the Supplementary Material online, we additionally considered the global selection strength (Dn/Ds in coding regions, and Du/Ds in noncoding regions). The final data sets including all covariates, αMK, ωAMK, and DoS estimates for each gene are provided in supplementary file S7, Supplementary Material online (coding sites excluding variants below 5%) and supplementary file S8, Supplementary Material online (noncoding sites excluding variants below 5%). Similarly, we provide in supplementary file S9, Supplementary Material online, the estimates for αEWK, ωAEWK, and the deleterious DFE obtained with the Eyre-Walker and Keightley method on binned data. In the following sections, details on the covariates are given.

Recombination Rate

We used a fine-scale recombination map of D. melanogaster statistically inferred based on population data from Gikongoro in Rwanda (Africa) and provided by Chan et al. (2012). The recombination rate for a gene was calculated as the mean rate (in units of 4*Ne*r per bp) across all bins (defined as a window between each pair of successive single nucleotide polymorphisms) that a gene contains.

Intersexual Genetic Correlation in Expression

We used expression data in males and females quantified by genome-tiling microarrays in inbred lines of D. melanogaster from Raleigh in North Carolina and provided by Huang et al. (2015; http://dgrp2.gnets.ncsu.edu/data.html; last accessed October 2017). The intersexual genetic correlation in expression was then estimated for each gene using a bivariate mixed effects model with the TYPE = UNR option in Proc MIXED from SAS v.9.4 (following Allen et al. 2018; https://doi.org/10.6084/m9.figshare.c.4181168.v1; last accessed October 2018).

Sexual Antagonism

A list of candidate sexually antagonistic genes in D. melanogaster (LH_m strain) was obtained from Innocenti and Morrow (2010, table S1 sheet 4, doi: 10.1371/journal.pbio.1000335.s004). These genes are characterized by a significant sex-by-fitness interaction in a linear mixed model that partitions the variance in gene expression in different components: sex effect, line fitness effect, batch effect, and sex-by-fitness interaction. As previously found (Griffin et al. 2013), sexually antagonistic genes were characterized by a higher intersexual genetic correlation (median: ICSAG = 0.783) than other genes (median: ICnon-SAG = 0.610) and this difference was significant (Wilcoxon’s rank test: P-value = 1.77E-07, nnon-SAG = 9,622, nSAG = 1,009). We performed a supplementary analysis (supplementary fig. S7, Supplementary Material online) by considering only sexually antagonistic genes that additionally have a significant and opposite fitness effect in the sex-specific data sets (table S1 sheet 1 and 2 of Innocenti and Morrow 2010). However, we could not detect any significant effect of these genes, which may be due to the reduced sample sizes (n = 68 in coding regions, and n = 66 in noncoding regions).

Gene Connectivity

Gene connectivity was assessed based on the Drosophila Interactions Database (Murali et al. 2011) available from http://www.droidb.org (last accessed in October 2017). We calculated for each gene the number of interactions in which it was involved at four different levels: 1) genetic interactions (“fly_genetic_interactions.txt”) which represent interactions between two alleles, 2) protein–protein interactions (“flybase_ppi.txt”) which include experimentally derived physical interactions, 3) microRNA–gene interactions (“rna_gene.txt”) which include predicted regulatory interactions, and 4) TF–gene interactions (“tf_gene.txt”) which contain experimentally validated interactions.

Sex Specificity

For sex specificity, we estimated the sex-bias ratio as SBR = log2(RPKM♀/RPKM♂) based on whole body samples of female adults (“AdF_Ecl_1days,” “AdF_Ecl_5days,” “AdF_Ecl_30days”) and male adults (“AdM_Ecl_1days,” “AdM_Ecl_5days,” “AdM_Ecl_30days”), and we calculated a folded sex specificity metric by taking the absolute value of the sex-bias ratio. Alternatively (supplementary fig. S4, Supplementary Material online), sex specificity was calculated based on reproductive tissue samples in females (“A_VirF_4d_ovary,” “A_MateF_4d_ovary”) and males (“A_MateM_4d_acc_gland,” “A_MateM_4d_testis”). Genes were also categorized in nine sex-bias bins depending on the value of the ratio: 1) male-specific if SBR < −8; 2) three male-biased classes if −8 ≤ SBR < −4, −4 ≤ SBR < −2, −2 ≤ SBR < −1; 3) three female-biased classes if 8 ≥ SBR > 4, 4 ≥ SBR > 2, 2 ≥ SBR > 1; 4) female-specific if SBR > 8; and 5) unbiased if −1 ≤ SBR ≤ 1. Expression values of the different samples within each sex were averaged.

Developmental Stage/Tissue Specificities

We quantify the breadth of expression of a gene across different developmental stages and/or tissues by computing the expression bias (Yanai et al. 2005): Here, n is the total number of stages/tissues, Sj is the expression level in stage/tissue j and Smax is the largest expression level over all stages/tissues. We used τ to classify genes in different bins based on quantiles that divide the data into equally sized groups from broadly expressed (τ = 0) to highly specific (τ = 1) genes. Stage specificity was calculated based on whole body samples of four developmental stages: 1) embryos (“em2.4hr,” “em4.6hr,” “em6.8hr,” “em8.10hr,” “em10.12hr,” “em12.14hr,” “em14.16hr,” “em16.18hr,” “em18.20hr,” “em20.22hr,” and “em22.24hr”); 2) larvae (“L1,” “L2,” “L3_12hr,” “L3_PS1.2,” “L3_PS3.6,” and “L3_PS7.9”); 3) pupae (“P5,” “P6,” “P8,” “P9.10,” and “P15”); and 4) adults (“AdF_Ecl_1days,” “AdF_Ecl_5days,” “AdF_Ecl_30days,” “AdM_Ecl_1days,” and “AdM_Ecl_5days”). Tissue specificity within larvae was calculated based on four tissues: 1) carcass (“L3_Wand_carcass”); 2) digestive system (“L3_Wand_dig_sys”); 3) CNS (“L3_CNS”); and 4) imaginal discs (“L3_Wand_imag_disc”). Tissue specificity within pupae was calculated based on two tissues: 1) CNS (“P8_CNS”) and 2) fat (“P8_fat”). Somatic tissue-specificity within adults was calculated based on three tissues: 1) carcass (“A_1d_carcass,” “A_4d_carcass,” and “A_20d_carcass”); 2) digestive system (“A_1d_dig_sys,” “A_4d_dig_sys,” and “A_20d_dig_sys”); and 3) heads (“A_VirF_1d_head,” “A_VirF_4d_head,” “A_VirF_20d_head,” “A_MateF_1d_head,” “A_MateF_4d_head,” “A_MateF_20d_head,” “A_MateM_1d_head,” “A_MateM_4d_head,” and “A_MateM_20d_head”). Finally, we computed a “tissue-by-stage” specificity measure based on the four larval tissues, the two pupae tissues, and the three adult tissues. Expression values of the different samples within each stage/tissue were averaged.

Statistical Analyses

All statistical analyses were performed using the R version 3.3.1 (R Core Team 2016).

Single Covariate-Related Statistics

Confidence intervals were obtained by bootstrapping across genes (1,000 replicates) using the R function “boot” (from the package “boot”), and we report the 95% confidence intervals as ±2 SD of the distribution of bootstrapped values. Differences in αMK values for the categorical covariates (chromosomal location, sexual antagonism, and the nine sex specificity bins) were assessed with Wilcoxon’s rank-sum tests (R function “wilcox.test” in package “stats”). For numerical covariates, we grouped genes into bins based on quantiles that divide the data into equally sized groups. We then calculated the Spearman’s rank correlation between the median of the covariate at each bin and the median αMK using the R function “cor.test” (from the package “stats”). We also fit least-squares linear regressions in the form αMK ∼ covariate, and least-squares quadratic regressions in the form αMK ∼ covariate + covariate2 on unbinned data, calling the R function “lm” (from the R package “stats”). Model fits were then compared based on AIC (supplementary tables S1 and S2, Supplementary Material online).

Multicollinearity among Covariates

We performed a quantitative assessment of the relationships between covariates on unbinned data. First, Spearman’s pairwise correlations were computed with the R function “cor.test” (from the package “stats”) between αMK, DoS, ωAMK, and all numerical covariates (supplementary fig. S1a, Supplementary Material online). Second, a principal component analysis on all covariates was carried out with the R function “PCA” (from the package “FactoMineR,” supplementary fig. S1b, Supplementary Material online). Numerical covariates were scaled to zero mean and unit variance. In both analyses, the five stage/tissue expression specificities proved to be nonindependent, and so we removed them all from subsequent investigations, except for the combined metric tissue-by-stage specificity. Computation of the variance inflation factors (R function “ols_vif_tol” in package “olsrr”) in the linear models on DoS confirmed that the remaining 12 covariates had limited collinearity (supplementary tables S3 and S4, Supplementary Material online).

Multiple Linear Regression

We constructed linear models on unbinned data with the R function “lm” (from the package “stats”) to analyze the relative contribution of the different covariates on the efficacy of selection. We analyzed separately DoS and ωAMK as the response variables (they were chosen instead of α because DoS and ωAMK follow the assumptions required by linear regressions, which α does not), and we compared two alternative sets of covariates. First, only the general modulators were included, as follows: DoS (respectively ωAMK) ∼ chromosomal location + log(transcript length) + log(recombination rate) + log(global expression level). Then, we included the eight noncollinear pleiotropic covariates described in the previous section: gene–gene interactions + protein–protein interactions + microRNA–gene interactions + TF–gene interactions + sexual antagonistic genes + intersexual genetic correlation + folded sex specificity + tissue-by-stage specificity. In both cases, residuals were independent, identically distributed and followed a Gaussian distribution of null mean (DoS: supplementary fig. S9a, Supplementary Material online; ωAMK: supplementary fig. S9b, Supplementary Material online). Estimate of the effect of each covariate, its standard error, and significance were computed using the type III ANOVA implemented in the R function “Anova” (from the package “car”). The two linear models (“general modulators only” vs. “all covariates”) were compared based on a likelihood ratio test using the R function “lrtest” (from package “lmtest”). Additionally, we carried out a stepwise model selection to eliminate step by step the covariates that contribute the least to the model, based on AIC comparison (table 2). We used the R function “stepAIC” (from the MASS package) including both direction of selection (forward and backward).

Partial Correlations and Principal Component Regression

As α was not suitable for a linear regression analysis, and many covariates were correlated with one another (supplementary fig. S1, Supplementary Material online), which can lead to imprecise estimates of individual effects in linear models, we alternatively calculated Spearman’s rank partial correlations on unbinned data between α (respectively DoS and ωAMK) and each covariate, conditional on all other covariates, using the R function “pcor.test” (from the package “ppcor,” supplementary tables S3 and S4, Supplementary Material online). To further overcome the issue of multicollinearity within a linear modeling framework, we performed a principal component regression on DoS (supplementary tables S5 and S6, Supplementary Material online). We first identified the independent source of variation in the data using a principal component analysis on all 12 covariates. Then, we performed a linear regression of DoS on the principal components to obtain an estimate of the effect of each principal component and its significance. Covariates basically show the same ordering regarding their contribution to the variance in DoS as in the linear models on the original covariates.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online. Click here for additional data file.

108 in total

1. SNPGenie: estimating evolutionary parameters to detect natural selection using pooled next-generation sequencing data.

Authors: Chase W Nelson; Louise H Moncla; Austin L Hughes
Journal: Bioinformatics Date: 2015-07-29 Impact factor: 6.937

2. Codon Usage Selection Can Bias Estimation of the Fraction of Adaptive Amino Acid Fixations.

Authors: Tomotaka Matsumoto; Anoop John; Pablo Baeza-Centurion; Boyang Li; Hiroshi Akashi
Journal: Mol Biol Evol Date: 2016-02-12 Impact factor: 16.240

3. The McDonald-Kreitman test and slightly deleterious mutations.

Authors: Jane Charlesworth; Adam Eyre-Walker
Journal: Mol Biol Evol Date: 2008-01-14 Impact factor: 16.240

Review 4. The evolutionary causes and consequences of sex-biased gene expression.

Authors: John Parsch; Hans Ellegren
Journal: Nat Rev Genet Date: 2013-02 Impact factor: 53.242

5. Genetic basis of transcriptome diversity in Drosophila melanogaster.

Authors: Wen Huang; Mary Anna Carbone; Michael M Magwire; Jason A Peiffer; Richard F Lyman; Eric A Stone; Robert R H Anholt; Trudy F C Mackay
Journal: Proc Natl Acad Sci U S A Date: 2015-10-19 Impact factor: 11.205

6. Mutational Pleiotropy and the Strength of Stabilizing Selection Within and Between Functional Modules of Gene Expression.

Authors: Julie M Collet; Katrina McGuigan; Scott L Allen; Stephen F Chenoweth; Mark W Blows
Journal: Genetics Date: 2018-02-01 Impact factor: 4.562

7. Widespread adaptive evolution of Drosophila genes with sex-biased expression.

Authors: Matthias Pröschel; Zhi Zhang; John Parsch
Journal: Genetics Date: 2006-09-01 Impact factor: 4.562

8. Adaptive protein evolution at the Adh locus in Drosophila.

Authors: J H McDonald; M Kreitman
Journal: Nature Date: 1991-06-20 Impact factor: 49.962

9. Sexual and temporal dynamics of molecular evolution in C. elegans development.

Authors: Asher D Cutter; Samuel Ward
Journal: Mol Biol Evol Date: 2004-09-15 Impact factor: 16.240

10. The effects of natural selection across molecular pathways in Drosophila melanogaster.

Authors: Jeffrey P Vedanayagam; Daniel Garrigan
Journal: BMC Evol Biol Date: 2015-09-21 Impact factor: 3.260

6 in total

1. Single-nucleus transcriptomes reveal evolutionary and functional properties of cell types in the Drosophila accessory gland.

Authors: Alex C Majane; Julie M Cridland; David J Begun
Journal: Genetics Date: 2022-02-04 Impact factor: 4.402

2. Sex differences in deleterious mutational effects in Drosophila melanogaster: combining quantitative and population genetic insights.

Authors: Filip Ruzicka; Tim Connallon; Max Reuter
Journal: Genetics Date: 2021-11-05 Impact factor: 4.402

3. Genomic analysis of the four ecologically distinct cactus host populations of Drosophila mojavensis.

Authors: Carson W Allan; Luciano M Matzkin
Journal: BMC Genomics Date: 2019-10-12 Impact factor: 3.969

4. Dissecting Genomic Determinants of Positive Selection with an Evolution-Guided Regression Model.

Authors: Yi-Fei Huang
Journal: Mol Biol Evol Date: 2022-01-07 Impact factor: 16.240

5. When the "satisficing" is the new "fittest": how a proscriptive definition of adaptation can change our view of cognition and culture.

Authors: Valentin Magnon; Bruno Corbara
Journal: Naturwissenschaften Date: 2022-08-12

6. Schistosome W-Linked Genes Inform Temporal Dynamics of Sex Chromosome Evolution and Suggest Candidate for Sex Determination.

Authors: Marwan Elkrewi; Mikhail A Moldovan; Marion A L Picard; Beatriz Vicoso
Journal: Mol Biol Evol Date: 2021-12-09 Impact factor: 16.240

6 in total