
Inferring regulatory mechanisms from patterns of evolutionary divergence.

Abstract

The number of sequenced species is increasing at a staggering rate, calling for new approaches for incorporating evolutionary information in the study of biological mechanisms. Evolutionary conservation is widely used for assigning a function to new proteins and for predicting functional coding or non-coding sequences. Here, we argue for a complementary approach that focuses on the divergence of regulatory programs. Regulatory mechanisms can be learned from patterns of evolutionary divergence in regulatory properties such as gene expression, transcription factor binding or nucleosome positioning. We review examples of this concept using yeast as a model system, and highlight a hybrid-based approach that is highly instrumental in this analysis.

Year: 2011 PMID: 21915117 PMCID: PMC3202799 DOI: 10.1038/msb.2011.60

Source DB: PubMed Journal: Mol Syst Biol ISSN: 1744-4292 Impact factor: 11.429

The basic approach: why comparing related species could help in identifying regulatory mechanisms

The ability to rapidly and cheaply sequence full genomes is revolutionizing biological research. However, the static genomic sequences conceal a highly dynamic program of DNA-associated processes such as transcription, binding of transcription factors or positioning of nucleosomes. This ‘regulatory program' is a defining property of an organism. Understanding this regulation and how it can be predicted from the genomic sequence are key challenges of genomic research. Comparative genome analysis of related species is highly instrumental in this regard, as it is used routinely for identifying conserved sequences and by that to predict cis-regulatory elements and other functional non-coding sequences (Kellis et al, 2003; Xie et al, 2005). Here, we will discuss the possibility of extending this comparative approach in two ways: first, by comparing also the regulatory programs of different species or strains (rather than just their genomic sequences) and second, by focusing on evolutionary divergence (rather than conservation). The question of how gene expression evolves between closely related species has been studied extensively (King and Wilson, 1975; Carroll, 2005). Widespread inter-species differences were identified in high-throughput comparisons (Rifkin et al, 2003; Gilad et al, 2006; Khaitovich et al, 2006; Tirosh et al, 2006; Yanai and Hunter, 2009), and expression profiles vary greatly even between strains (or individuals) of the same species (Brem et al, 2002; Oleksiak et al, 2002; Denver et al, 2005; Kliebenstein et al, 2006; Landry et al, 2006). 
Evolutionary differences were also identified in other genomic properties, including transcription factor (TF) binding sites (Borneman et al, 2007; Kasowski et al, 2010; Schmidt et al, 2010; Zheng et al, 2010), nucleosome positioning (Field et al, 2009; Tsankov et al, 2010; Tirosh et al, 2010b), histone modifications (Nagarajan et al, 2010) and protein phosphorylation (Beltrao et al, 2009), leading to the conclusion that regulatory programs change quite rapidly. Divergence of regulatory programs is encoded in the divergence of the genomic sequences. Inter-species comparison, therefore, provides a way to connect sequence and regulatory divergence. In this view, analyzing how the regulatory program diverged is analogous to studying how the regulatory program responds to multiple genetic perturbations. However, while genetic perturbations are normally studied one at a time (e.g., single-gene knockout), inter-species analysis uncovers the combined effects of numerous genetic perturbations. The disadvantage of this approach is that it becomes difficult to associate a regulatory change with a specific genetic (sequence) perturbation. The advantage is that by combining many genetic perturbations, one can uncover general trends connecting the genetic perturbations with the regulatory programs. In the sections below, we provide the following examples which employed this idea:

1. Patterns of expression divergence revealed a promoter architecture that governs gene expression variability.
2. Patterns of expression divergence in strains deleted of chromatin regulators revealed that chromatin regulators function as genetic capacitors of gene expression.
3. Patterns of divergence of antisense-containing genes suggested that antisense transcription induces a threshold-dependent transcriptional switch.
4. Patterns of divergence in nucleosome positioning describe both cis and trans influences on nucleosome positioning.
5. Patterns of divergence of mRNA degradation rate suggested a mechanistic coupling between transcription and mRNA degradation.
6. Patterns of allelic expression in a cross between diverged mouse strains indicated instances of genomic imprinting.

The first five examples used related budding yeast strains (#3), or species (#1, 2, 4, 5) which share approximately the same set of genes, are almost completely syntenic and have high sequence similarity (∼80–90% identical nucleotide sequences; Kellis et al, 2003). The last example used different mouse strains. Before describing these examples in detail, we discuss two general issues. First, we consider the relative contribution of natural selection (versus random drift) in shaping the observed patterns of divergence. Second, we discuss the general ‘hybrid approach' for dissecting regulatory changes into those that are generated in cis versus those generated in trans.

The contributions of positive selection versus neutral drift in shaping the regulatory divergence

As described above, we view the regulatory divergence between species as the result of a large collection of mutations. This enables us to deduce general trends connecting changes in genomic sequence to changes in the regulatory program. An underlying assumption here is that mutations accumulate largely at random. Yet, at least in some cases, recurrent patterns of regulatory divergence may reflect the signature of natural selection rather than the outcome of a random sampling of genetic mutations. For example, if certain aspects of regulation undergo frequent adaptive evolution (or conversely are kept constant by purifying selection) then these may appear as a recurrent pattern. The contribution of positive selection to evolution of gene regulation received much attention (Khaitovich et al, 2004; Yanai et al, 2004; Jordan et al, 2005). The emerging conclusion is that adaptive regulatory changes are an exception rather than the rule, and that most of the observed regulatory changes are neutral. First, regulatory changes among closely related species are much more widespread than expected from the apparent differences in physiology between the species (often encompassing half of all genes) and second, most regulatory changes are small in magnitude and therefore are unlikely to carry phenotypic consequences: typical yeast inter-species differences in gene expression are in the order of 1.5-fold (Tirosh et al, 2009b), while a 2-fold reduction (heterozygote deletion) in most yeast genes has no effect on growth rates (Giaever et al, 1999). These and other considerations (Khaitovich et al, 2006) suggest that most of the changes observed in the regulatory program are neutral and therefore that the recurrent patterns may serve as good proxies for identifying regulatory mechanisms. Nevertheless, a ‘selective' explanation should still be considered when inferring a regulatory mechanism, as described in each of the sections below.

The ‘hybrid approach' for dissecting regulatory changes

Another key challenge is to associate regulatory changes with the causal genetic mutations. One way of doing that is using the genetical-genomics approach (Rockman and Kruglyak, 2006). The idea is to perform linkage analysis, comparing the segregation of regulatory changes with that of sequence polymorphisms in a panel of dozens of segregants from a cross between two strains. This approach is highly informative, but is also labor intensive. Furthermore, it can only be used for analyzing differences between strains of the same species but not for analyzing inter-species differences, as, by definition, inter-specific F1 hybrids are sterile and do not produce F2 segregants. An alternative general approach which was used extensively in the studies we describe below is to ‘mix' the two genomes within one organism. This is done by generating a hybrid, which contains full copies of both genomes (Wittkopp et al, 2004; Figure 1). In the context of the hybrid, orthologous genes appear as different alleles of the same gene, and are subject to regulation by the same trans environment (e.g., same regulatory proteins). Thus, allele-specific differences in regulation must reflect mutations that are linked in cis to the gene itself and distinguish between its two orthologous alleles (i.e., mutations in the gene or its flanking regulatory sequences). The additional differences that are observed between orthologs in the two species, but not within the hybrid, correspond to the effects of trans mutations.
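The cis/trans logic described above can be sketched as simple arithmetic on log ratios. This is a minimal illustration, not the procedure of any cited study: the function name `cis_trans`, the additive log2 model and the toy expression values are all assumptions made for the example.

```python
import math

def cis_trans(parent_a, parent_b, hybrid_a, hybrid_b):
    """Split an inter-species log2 expression ratio into cis and trans parts.

    Assumes a simple additive model on the log2 scale (an illustrative choice).
    """
    total = math.log2(parent_a / parent_b)  # difference between the two species
    cis = math.log2(hybrid_a / hybrid_b)    # allele-specific difference within the hybrid
    trans = total - cis                     # part of the difference absent from the hybrid
    return {"total": total, "cis": cis, "trans": trans}

# Hypothetical expression levels (arbitrary units), not taken from any dataset.
pure_cis = cis_trans(parent_a=400, parent_b=100, hybrid_a=200, hybrid_b=50)
pure_trans = cis_trans(parent_a=400, parent_b=100, hybrid_a=100, hybrid_b=100)
print(pure_cis)
print(pure_trans)
```

In the first toy gene the 4-fold difference persists between the hybrid alleles and is attributed to cis; in the second the alleles are expressed equally in the hybrid, so the difference is attributed to trans.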

Figure 1

Hybrid analysis distinguishes between the effects of mutations in cis or trans. Comparison of gene expression between two species reveals differences due to the combined effects of cis and trans mutations (top). Analysis of a hybrid formed by mating the two species allows classification of the differences into those due to cis-acting and trans-acting mutations: two hybrid alleles (that correspond to orthologous genes of the parental species) reside in the same nucleus and are regulated by the same set of trans factors, thus avoiding any differential expression due to trans-acting mutations (bottom right). However, cis-acting mutations discriminate between the two alleles and thus maintain the inter-species differences (bottom left).

Genome-wide analysis of allele-specific expression was initially performed in humans (Yan et al, 2002; Verlaan et al, 2009), followed by extensive analysis of intra- and inter-specific F1 hybrids in flies (Wittkopp et al, 2004, 2008; Landry et al, 2005; McManus et al, 2010), yeasts (Sung et al, 2009; Tirosh et al, 2009b; Emerson et al, 2010a) and mice (Wang et al, 2010; Gregg et al, 2010a, 2010b). A related approach involves the substitution of individual chromosomes between flies (Lemos et al, 2008; Wang et al, 2008). Similarly, a mouse model of Down syndrome that contains an extra human chromosome 21 was used to study the role of cis- and trans-regulatory mutations in mammals (Wilson et al, 2008). Collectively, these studies demonstrate that inter-species expression divergence is dominated by cis mutations, in contrast to expression divergence between strains, which is generated primarily by trans mutations (Wittkopp et al, 2008). At individual genes, the distinction between cis and trans effects is not sufficient to identify the causal mutations, but provides a useful classification of regulatory changes that hints where to look for the specific mutations. Importantly, cis mutations affecting different genes reflect independent evolutionary events and this can increase the significance of observed recurrent patterns (Bullard et al, 2010). A more complete discussion on the properties and roles of cis- and trans-acting mutations can be found in recent reviews (Thompson and Regev, 2009; Emerson and Li, 2010b).

‘Expression variability' and how it is linked to promoter architecture

Genes diverge at different rates. This was first observed in analysis of protein sequences: some proteins diverge quickly, whereas others remain largely the same between organisms. There are multiple reasons for that (Pal et al, 2006). For example, essential proteins are relatively conserved, likely due to the need to maintain their function. Highly expressed proteins are especially well conserved, possibly because they are subject to more stringent constraints for proper folding (Drummond and Wilke, 2008). One of the first questions asked when comparing regulatory programs was whether these differences in evolutionary rates generalize also to regulatory properties. Not surprisingly, the rate at which gene expression diverges also differs between genes (Figure 2A). What was surprising, though, was the low correlation between this rate and the evolutionary rate of the associated proteins. Although a weak correlation was observed among mammals (Khaitovich et al, 2005), no correlation was identified among yeasts (Tirosh and Barkai, 2008a). Thus, the fact that proteins are conserved in sequence did not imply their conservation in expression. Similarly, genes that diverged rapidly in expression often encoded proteins with conserved sequence. The two modes of evolution, therefore, reflect distinct constraints. In yeast, only essentiality correlated with conservation both in expression and in sequence, while all other determinants were specific to one mode of evolution (Tirosh and Barkai, 2008a). Additional determinants that were either common or specific to the two modes of evolution were found in flies (Lemos et al, 2005) and in mammals (Liao and Zhang, 2006).

Figure 2

Expression divergence reflects an intrinsic tendency for expression variability that is encoded in promoter structure. (A) Comparison of genome-wide expression patterns among multiple species identifies genes whose expression patterns remain well conserved (low divergence) and genes whose expression diverged extensively (high divergence). (B) Expression divergence is correlated with responsiveness to environmental changes (data taken from Tirosh et al, 2006), and both measures are higher among OPN genes that contain a TATA-box (purple) than among DPN genes that lack a TATA-box (cyan) (Tirosh and Barkai, 2008b); empty black circles represent genes which do not fit with these two gene classes (e.g., intermediate nucleosome pattern between DPN and OPN). (C) Schemes depicting the typical promoter structure of OPN genes with a TATA-box (top, purple) and DPN genes without a TATA-box (bottom, cyan). OPN promoters lack an NFR, contain multiple TF binding sites (squares), a TATA-box and fuzzy nucleosomes (i.e., nucleosome positions vary across time and within a population; marked with a double-headed arrow). DPN promoters contain an NFR, fewer TF binding sites and no TATA-box, and well-positioned nucleosomes.

One of the notable findings was that divergence of gene expression strongly correlates with other measures of expression variability on completely different time scales (Tirosh et al, 2006; Figure 2B). Genes that diverged rapidly in expression tended also to vary more when environmental conditions were modified. Moreover, these genes were more ‘noisy', displaying a wider variance in expression between identical cells subject to the same conditions (Newman et al, 2006). These observations suggested that evolutionary divergence is one facet of a more general property of expression variability: some genes are capable of broad changes in expression, whereas in others this capacity is limited (Tirosh et al, 2009a). Importantly, a ‘selective' explanation to the increased expression divergence of these genes was ruled out as these results were reproduced in analysis of mutation accumulation strains: strains that evolved in the laboratory while maintaining very low effective population sizes, thereby allowing non-lethal mutations to accumulate randomly with minimal effects of natural selection (Denver et al, 2005; Rifkin et al, 2005; Landry et al, 2007). If expression variability is indeed a general property that differs between genes, it might be encoded in the promoter sequence. Indeed, bioinformatics analysis revealed a strong association between expression variability (on all time scales) and particular promoter structures (Figure 2B and C). First, promoters of variable genes have a TATA-box at much higher frequency than less-variable genes (Tirosh et al, 2006). Second, the organization of nucleosomes in variable genes typically lacks the Nucleosome Free Region (NFR) immediately upstream of the transcription start site, which is a characteristic of most genes. We, therefore, denoted this promoter structure as ‘OPN' for Occupied Proximal Nucleosome, in contrast to Depleted Proximal Nucleosome (DPN) genes (Tirosh and Barkai, 2008b). 
Notably, the association of this promoter organization with expression variability was not specific to yeast as similar observations were made in other organisms (Tirosh and Barkai, 2008b; Gilchrist et al, 2010). A key challenge will be to understand how TATA and OPN promoters support higher expression variability. A hint comes from the hybrid approach for distinguishing cis and trans effects. Although most of the expression divergence between S. cerevisiae and S. paradoxus reflects cis effects, the increased divergence associated with TATA and OPN was instead due to trans effects (Tirosh et al, 2009b). Coupled with the observation that these promoters are typically bound by (and affected by deletion of) a larger number of regulators compared with other promoters (Landry et al, 2007; Choi and Kim, 2008; Tirosh and Barkai, 2008b; Venters et al, 2011), these results suggest that there are simply more regulators (and thus possible trans-mutations) affecting these promoters. For example, in the OPN organization binding sites are more often covered by nucleosomes, an organization that may facilitate competition between the binding of TFs and nucleosomes and could thus increase the influence of various chromatin regulators. Differences in expression variability between genes may be related to the general interplay between ‘robustness' and ‘evolvability'. An organism needs to be robust, namely maintain a reliable function under different conditions or when subjected to mutations. At the same time, it needs to maintain the ability to evolve in order to adapt to new environments, but this requires sensitivity to genetic mutations. The ability to control the plasticity of gene expression through its promoter structure may help in this interplay. 
Accordingly, genes that are required for the robustness of the organism will be maintained as lowly variable and their expression program will evolve slowly, whereas those that facilitate adaptation to new environments will be maintained as highly variable. In support of that, we observed that in yeast, genes of high expression variability and those with TATA and OPN promoters preferentially encoded proteins that interact with the environment, such as transporters, and as such may mediate the response and adaptation to environmental changes (Tirosh et al, 2006, 2009a; Zhang et al, 2009b).

Chromatin regulators as ‘capacitors' of gene expression variations

Another idea that arose from thinking about the interplay between robustness and evolvability is that of ‘genetic capacitors'. Robustness of the wild-type organism may be facilitated by proteins that act as ‘genetic capacitors' to reduce the effect of mutations. These capacitors enable the accumulation of mutations, or polymorphisms, that have no phenotypic consequences in normal conditions. The thought is that these mutations can support evolvability if capacitors are repressed under harsh conditions, so that the phenotypic effect of the accumulated mutations suddenly emerges. This model was proposed many years ago (Waddington, 1942), but a striking example of a candidate ‘genetic capacitor', the heat-shock protein Hsp90, was identified only more recently (Rutherford and Lindquist, 1998; Queitsch et al, 2002). A central question is whether chaperones, such as Hsp90, are unique in their capacity to buffer mutations or whether other protein capacitors can be identified. One line of thought suggested that, in fact, any regulatory protein with large-scale effects may serve in this capacity (Bergman and Siegal, 2003; Hermisson and Wagner, 2004; Levy and Siegal, 2008). The reason is that in the complex regulatory networks of cells, many epistatic effects are expected, which means that mutations that have no effect in the wild-type organism might still have a phenotypic consequence in the background of an additional mutation. Large-scale regulators are particularly expected to be involved in such epistatic effects, and will therefore behave as effective capacitors: mutations that are neutral in the wild-type background will accumulate, but these mutations may have an effect when one of these regulators is deleted or its function is compromised. This idea was tested using an inter-species comparative approach (Tirosh et al, 2010a). The two yeast species, S. cerevisiae and S. paradoxus, differ in sequence and expression. 
The buffering hypothesis predicts, however, that many of the sequence differences are in fact buffered and therefore do not affect expression in wild-type cells. An expression effect will be observed when the capacitor protein is deleted, revealing the impact of hidden genetic changes. Comparing the expression profiles of S. cerevisiae and S. paradoxus, both for wild-type strains and for strains where specific chromatin regulators have been deleted, confirmed that this is the case (Figure 3). Deletions of each of the chromatin regulators that were examined had increased the amount of inter-species expression differences, consistent with the regulators acting as capacitors of gene expression variations. Furthermore, the hybrid analysis confirmed that these regulators buffer gene expression by acting in trans, as expected if they act primarily by influencing upstream regulatory signals.

Figure 3

Chromatin regulators buffer gene expression variability. Comparison of inter-species expression differences between wild-type and (chromatin regulators) deletion strains shows increased expression differences among the deletion strains, indicating that chromatin regulators normally buffer the effects of hidden inter-species genetic variability.

Note that in this example we support a ‘selective' explanation whereby natural selection generated a bias toward mutations that affect gene expression in a mutant background, but not in the wild-type. Nonetheless, these results demonstrate that chromatin regulators effectively buffer gene expression and therefore that compromising their activity exposes hidden genetic variability.

Function of antisense transcription

Antisense transcription occurs frequently throughout the genomes of various organisms (He et al, 2008; Guell et al, 2009; Xu et al, 2009; Yassour et al, 2011). Several studies have shown that antisense transcription can, in certain cases, repress transcription of the sense genes by several mechanisms (Hongay et al, 2006; Camblong et al, 2007; Berretta et al, 2008), yet the frequency and mode of such repression remain poorly understood. Notably, the repressive effect of antisense on the sense transcript cannot be determined directly from steady-state expression levels but requires analysis of sense and antisense expression levels upon perturbations. Steinmetz and colleagues compared sense and antisense expression among 2 diverged S. cerevisiae strains and 48 of their segregants, thus revealing the effects of numerous genetic changes (Xu et al, 2011). This analysis showed that genes associated with antisense transcription show an increased variability in expression among segregants, suggesting an effect similar to that of the TATA-box and occupied nucleosome patterns described above. Interestingly, increased variability of antisense-containing genes was due to similar induction but more efficient repression: antisense-containing genes were often completely ‘switched-off', while repression of other genes was more limited. In contrast, the induction (i.e. maximal levels) of antisense-containing genes did not differ from those of other genes, and thus the dynamic range and the variability of antisense-containing genes was typically larger than that of other genes. A ‘selective' explanation to this effect (i.e. that natural selection favors mutations that repress antisense-containing genes) seems unlikely, and instead these results suggest a model in which antisense transcription induces a threshold-dependent switch of sense transcription. 
According to this model, low sense transcription (e.g., in the absence of activation) is easily inhibited by the antisense transcription, but higher sense transcription (e.g., upon activation) ‘overcomes' this inhibition and eliminates the antisense effect. This model was further supported by direct experiments which demonstrated an inhibitory antisense effect that is abolished upon induction of the sense gene (Xu et al, 2011). In future work, it might also be useful to compare sense and antisense expression among different species and their hybrid.

Characterizing determinants of nucleosome positioning

Most studies on regulatory evolution focused on gene expression but recent work began to extend these studies to other properties such as TF binding, histone modifications and nucleosome occupancy. We will focus here on comparative studies of nucleosome occupancy, which provided insights into mechanisms that determine the positioning of nucleosomes along the genome. Nucleosomes, the basic building block of chromatin, decrease DNA accessibility and thus the ability of regulatory proteins to bind specific DNA regions and exert their regulatory function (Li et al, 2007). A key issue is what determines the positioning of nucleosomes along the DNA. In particular, there is a contemporary debate about the relative importance of the local DNA sequence (the ‘genomic code'; Segal et al, 2006), compared with the contribution of trans factors such as chromatin remodelers, modifiers and TFs (Kaplan et al, 2009; Zhang et al, 2009a). Comparative analysis provided insights into these questions (Tirosh et al, 2010b; Figure 4). Nucleosome patterns of S. cerevisiae differ in many sites from the orthologous S. paradoxus patterns. Notably, the hybrid analysis mapped ∼70% of the differences as cis (i.e., resulting from changes in the local DNA sequence) and ∼30% as trans (i.e., resulting from mutations that affect regulatory proteins). These results provide a general estimation for the relative roles of local DNA sequence versus regulatory proteins in controlling nucleosome positioning, although this analysis might be biased by natural selection. 
The ability to compare genomic sequences at positions of cis-dependent differences was further informative for evaluating the role of sequence patterns in controlling nucleosome positions: among the various patterns that were proposed to control nucleosome positioning, only the presence of AT-rich elements had a significant effect, suggesting that AT-rich elements are the dominant feature of local DNA sequence with respect to nucleosome positioning. A ‘selective' explanation is highly unlikely in this case, as variation in other sequence patterns was observed but was not associated with changes in nucleosome positioning. Recent analysis of sequence-derived models of nucleosome positioning has independently reached a similar conclusion (Tillo and Hughes, 2009).

Figure 4

Inter-species and hybrid analysis uncovers determinants of nucleosome positioning. Inter-species comparison (blue and red correspond to S. cerevisiae and S. paradoxus, respectively) and hybrid analysis (black curves) of nucleosome positioning characterizes changes due to cis and trans mutations (Tirosh et al, 2010b). Nucleosomes found in only one of the species and in the corresponding hybrid allele reflect the effect of local (cis) mutations (right). Nucleosomes found in only one of the species but in both hybrid alleles reflect the trans effect of distal mutations through the activity of a chromatin-related protein or RNA (left). Sequence analysis at positions of cis changes can suggest which sequence patterns influence nucleosome positioning. For example, the inset shows sequences of the two species within a region that is bound by a nucleosome only in S. paradoxus, demonstrating that inter-species substitutions changed the frequency of AT bases between 22/25 (S. cerevisiae) and 18/25 (S. paradoxus). This is consistent with a nucleosome-disfavoring effect of AT-rich sequences, as observed systematically for inter-species nucleosome differences, and is also observed in other studies (Tillo and Hughes, 2009).
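The classification rule in this caption can be written as a small decision function over four presence/absence calls. This is a minimal sketch under stated assumptions: the function name `classify_nucleosome` and the boolean encoding are hypothetical, and treating positions where both hybrid alleles lack the nucleosome as trans extends the caption's "both hybrid alleles" case by the general hybrid logic rather than by anything the caption states.

```python
def classify_nucleosome(in_sp_a, in_sp_b, in_hyb_a, in_hyb_b):
    """Classify one orthologous position from four presence/absence calls.

    in_sp_a / in_sp_b:   nucleosome present in each parental species
    in_hyb_a / in_hyb_b: nucleosome present on each allele within the hybrid
    """
    if in_sp_a == in_sp_b:
        return "conserved"      # no inter-species difference to explain
    if (in_hyb_a, in_hyb_b) == (in_sp_a, in_sp_b):
        return "cis"            # difference persists between the hybrid alleles
    if in_hyb_a == in_hyb_b:
        return "trans"          # alleles agree in the shared trans environment
    return "unclassified"       # alleles differ, but opposite to the parents

# Hypothetical calls for a nucleosome present only in species B.
print(classify_nucleosome(False, True, False, True))  # cis
print(classify_nucleosome(False, True, True, True))   # trans
```

Real occupancy data are quantitative, so in practice the four booleans would come from some thresholding or peak-calling step that is not modeled here.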

Interestingly, local divergence of AT-rich sequences not only affected the positioning of the closest nucleosomes but also influenced the positioning of multiple surrounding nucleosomes (Tirosh et al, 2010b). A large (∼50 bp) shift in the position of a single nucleosome was typically associated with gradually smaller shifts of several (∼3–6) adjacent nucleosomes in both directions. Furthermore, the large shifts were normally consistent with the predicted effect of local sequence changes, while the smaller shifts of adjacent nucleosomes were not, suggesting that sequence-dependent changes in the positioning of a single nucleosome (e.g., by nucleosome-disfavoring sequences) propagated to adjacent nucleosomes. Such propagation is consistent both with the statistical positioning model (Kornberg and Stryer, 1988; Mavrich et al, 2008) and with active nucleosome packing by chromatin remodeling (Zhang et al, 2011). Regardless of the specific mechanism involved, the widespread propagation of nucleosome shifts indicates that apart from direct regulation by cis (sequence) and trans (regulators) elements, nucleosome positioning depends on the state of the surrounding chromatin.

Coupling of transcription and mRNA degradation

Gene expression reflects the combined influence of a multitude of regulatory processes, and in recent years it has become clear that many of these apparently discrete processes are in fact coordinated (Maniatis and Reed, 2002; Proudfoot et al, 2002; Hagiwara and Nojima, 2007; Nagaike et al, 2011). Interestingly, in addition to coordination between consecutive steps, several observations have suggested that transcription in the nucleus may be coupled to the degradation of mRNAs in the cytoplasm. Such coupling was observed in the response of yeast to stresses (Shalem et al, 2008), and is supported by the finding that individual protein complexes regulate both transcription and mRNA degradation (Rpb4/7 and Ccr4-Not) (Collart, 2003; Goler-Baron et al, 2008). However, the scope of this coordination and its underlying mechanisms are poorly understood. Comparative analysis of mRNA degradation rates and mRNA levels between S. cerevisiae and S. paradoxus supported this notion (Dori-Bachash et al, 2011). This analysis demonstrated that most genes that diverged in mRNA degradation rates also diverged in transcription. The mode of this coordination was surprising: increased mRNA degradation was typically associated with increased (rather than decreased) mRNA levels, indicating that transcriptional changes generated opposite effects to those of mRNA degradation (Figure 5). The more intuitive mode of coordination, whereby transcription and mRNA degradation act together to either increase or decrease mRNA levels, was observed only for few genes.
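Why an increase in both degradation and mRNA level implies an opposing transcriptional change can be seen with a simple first-order kinetic model. The model and its symbols are assumptions introduced here for illustration: m is the mRNA level, β the transcription rate and α the degradation rate, with subscripts 1 and 2 denoting the two species.

```latex
\frac{dm}{dt} = \beta - \alpha m
\quad\Longrightarrow\quad
m^{*} = \frac{\beta}{\alpha},
\qquad
\log\frac{m^{*}_{1}}{m^{*}_{2}}
  = \log\frac{\beta_{1}}{\beta_{2}} - \log\frac{\alpha_{1}}{\alpha_{2}}
```

Under this model, if species 1 has both faster degradation (α₁ > α₂) and a higher steady-state level (m*₁ > m*₂), then β₁/β₂ must exceed α₁/α₂, that is, transcription must have increased by more than enough to offset the faster degradation.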

Figure 5

Inter-species comparison of mRNA degradation rates suggests a mechanistic coupling between transcription and mRNA degradation. Transcription of S. cerevisiae and S. paradoxus was chemically arrested and mRNA levels were measured at several time points after transcription arrest, enabling the estimation of mRNA degradation rates (Dori-Bachash et al, 2011). Inter-species differential degradation was identified for ∼10% of the genes, and these were further divided into three classes: (i) genes for which mRNA levels did not differ significantly between the two species (yellow); (ii) genes for which mRNA levels differed significantly between the two species in the direction that is consistent with the changes in mRNA degradation, that is, the species with higher degradation rate for a given gene had lower mRNA level for that gene (green); (iii) genes for which mRNA levels differed significantly between the two species but in the opposite direction to that which is expected by the changes in mRNA degradation, that is, the species with higher degradation rate for a given gene had higher mRNA level for that gene (purple). Notably, approximately half of the differentially degraded genes belonged to the purple class, indicating a widespread coupling between opposite effects in transcription and mRNA degradation. Similar analysis of mRNA degradation within the hybrid further indicated that such coupled effects (transcription and mRNA degradation) are almost exclusively due to the same type of mutation (cis or trans), suggesting that the same individual mutations have influenced both processes through a mechanistic coupling (Dori-Bachash et al, 2011).
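The two steps in this caption, estimating a degradation rate from a time course after transcription arrest and then sorting differentially degraded genes into three classes, can be sketched as follows. This is an illustration only: it assumes first-order (exponential) decay fitted by log-linear least squares, and it uses a hypothetical fold-change cutoff (`min_fold`) as a stand-in for the significance test of the original study, whose details are not given here. The function names and toy values are likewise assumptions.

```python
import math

def decay_rate(times, levels):
    """Least-squares slope of ln(level) versus time, returned as a positive rate.

    Assumes first-order decay after transcription arrest.
    """
    logs = [math.log(v) for v in levels]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return -sxy / sxx

def coupling_class(rate_a, rate_b, level_a, level_b, min_fold=1.5):
    """Assign a differentially degraded gene to one of the three classes."""
    ratio = level_a / level_b
    if max(ratio, 1 / ratio) < min_fold:
        return "unchanged level"          # class (i)
    a_degrades_faster = rate_a > rate_b
    a_is_lower = level_a < level_b
    # class (ii) if the faster-degrading species has the lower level, else class (iii)
    return "consistent" if a_degrades_faster == a_is_lower else "opposite"

# Hypothetical time course (minutes) generated with a known rate of 0.05 per minute.
times = [0, 10, 20, 30]
levels = [100 * math.exp(-0.05 * t) for t in times]
print(decay_rate(times, levels))
print(coupling_class(0.10, 0.05, 200, 100))
```

The last call describes a gene that degrades faster yet is more abundant in species A, which corresponds to the purple class of the figure.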

Coordinated evolutionary changes may reflect co-evolution, that is, independent mutation events whose coordination may be facilitated by natural selection. Alternatively, they could reflect a mechanistic coupling whereby individual mutations generate the two opposing effects on transcription and mRNA degradation. It is this latter possibility that is particularly intriguing, as it may indicate that this unappreciated mechanism of regulation is in fact quite common. To distinguish between these possibilities, we turned to the hybrid approach and examined whether coordinated changes in transcription and mRNA degradation are generated by the same or by different types of mutations (cis or trans) (Dori-Bachash et al, 2011). Strikingly, we found that cis changes in mRNA degradation are coordinated only with cis changes (but not with trans changes) in transcription, and similarly, that trans changes in mRNA degradation are coordinated only with trans changes in transcription (Figure 5). Since co-evolution through natural selection would occur between independent sets of mutations, this result rules out a ‘selective’ explanation and suggests that coordinated changes in transcription and mRNA degradation are generated by the same individual mutations. Analysis of the sets of genes with coupled changes in cis or trans further suggested possible coupling mechanisms (Dori-Bachash et al, 2011). First, trans-coupled genes were associated with the two complexes known to regulate both transcription and mRNA degradation (Rpb4/7 and Ccr4-Not). Second, two additional RNA-binding factors (Npl3 and Pab1) were enriched with trans-coupled genes, consistent with previous findings that these factors shuttle between the nucleus and the cytoplasm (Lei et al, 2001; Brune et al, 2005). Third, cis-coupled promoters were enriched with diverged TF binding sites, suggesting that modulation of transcription regulation in the nucleus (TF binding) may in some cases signal to the mRNA degradation apparatus in the cytoplasm, perhaps through shuttling of transcription-associated molecules. Accordingly, promoters might encode not only for transcription but also for degradation of the associated mRNA.
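The comparison of mutation types described above can be sketched in code. The sketch assumes the commonly used hybrid decomposition, in which the allelic ratio within the hybrid estimates the cis component and the remainder of the between-species difference is attributed to trans. The function names and the threshold are hypothetical, and a real analysis would use statistical tests rather than a fixed cutoff.

```python
def decompose(parental_log2_ratio, hybrid_log2_ratio):
    """Split a between-species difference into cis and trans components.

    parental_log2_ratio: log2(species 1 / species 2) measured in the parents.
    hybrid_log2_ratio:   log2(allele 1 / allele 2) measured within the hybrid.
    """
    cis = hybrid_log2_ratio                          # persists in a shared trans environment
    trans = parental_log2_ratio - hybrid_log2_ratio  # difference not explained by cis
    return cis, trans


def dominant_type(parental_log2_ratio, hybrid_log2_ratio, threshold=0.5):
    """Label a divergence as 'cis', 'trans', 'cis+trans' or 'none'.

    The threshold is an illustrative assumption, not a published value.
    """
    cis, trans = decompose(parental_log2_ratio, hybrid_log2_ratio)
    cis_on = abs(cis) >= threshold
    trans_on = abs(trans) >= threshold
    if cis_on and trans_on:
        return "cis+trans"
    if cis_on:
        return "cis"
    if trans_on:
        return "trans"
    return "none"
```

Applying `dominant_type` separately to the degradation-rate ratios and to the transcription ratios of each gene, and then tabulating how often the two labels agree, gives the kind of same-type versus different-type comparison the text describes.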

Characterizing genomic imprinting in the mouse brain

Although we have so far discussed only studies in yeast, the basic concept of analyzing regulatory divergence to elucidate mechanisms of regulation is a general one and can be applied in other species. An important issue that arises in mammalian hybrids is that allele-specific expression differences reflect not only the effects of cis-regulatory mutations but also the phenomenon of genomic imprinting, namely the preferential expression of the paternal or maternal alleles of certain genes through epigenetic mechanisms (Wood and Oakey, 2006). These effects can be distinguished by using reciprocal crosses, where the paternal strain of one cross serves as the maternal strain for the second cross and vice versa. An effect of cis-regulatory mutations should be consistent among the reciprocal crosses, with higher expression of the same allele regardless of its parental origin, while an effect of genomic imprinting should switch between the reciprocal crosses, consistently favoring either the paternal or the maternal allele. Allele-specific expression analysis of reciprocal crosses from divergent strains therefore simultaneously characterizes the effects of cis-regulatory mutations and those of genomic imprinting. Recent studies used this approach to gain insight into the scope and properties of genomic imprinting in the mouse brain (Babak et al, 2008; Gregg et al, 2010a, 2010b; Wang et al, 2010). Note that in this case divergence between mouse strains was not studied directly but was required in order to discriminate between alleles and identify instances of genomic imprinting. Imprinting was observed at >1300 loci, indicating a widespread influence on gene regulation (Gregg et al, 2010b). Most imprinted loci show moderate preferential expression of one allele, rather than strict monoallelic expression, and reside within clusters of imprinted genes. Interestingly, notable differences were observed between brain regions and developmental stages. 
For example, expression was biased toward the maternal allele in the developing brain, but toward the paternal allele in the adult brain. In addition to biasing gene expression according to the sex of the parent, genomic imprinting also depended on the sex of the offspring: imprinting in the hypothalamus was more common in females than in males, and the female-specific imprinting was preferentially biased toward the parental allele, perhaps indicating increased parental influence on certain brain functions in females (Gregg et al, 2010a).
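The reciprocal-cross logic described in this section can be illustrated with a short sketch. It is not the pipeline of the cited studies: the function name, the pseudocount, the fixed log2 cutoff and the cross orientation (cross 1 has a strain A mother and a strain B father, cross 2 the reverse) are assumptions chosen for illustration.

```python
import math


def classify_reciprocal(a_cross1, b_cross1, a_cross2, b_cross2, min_abs_log2=1.0):
    """Classify a gene from allele-specific read counts in reciprocal crosses.

    a_crossN / b_crossN are read counts for the strain A and strain B alleles.
    Assumed orientation: cross 1 = A mother x B father,
                         cross 2 = B mother x A father.
    A pseudocount of 1 avoids division by zero (an illustrative choice).
    """
    bias1 = math.log2((a_cross1 + 1) / (b_cross1 + 1))
    bias2 = math.log2((a_cross2 + 1) / (b_cross2 + 1))
    if abs(bias1) < min_abs_log2 or abs(bias2) < min_abs_log2:
        return "unclassified"
    if bias1 * bias2 > 0:
        # The same strain's allele is higher regardless of parent of origin.
        return "cis"
    if bias1 > 0:
        # A is higher when maternal (cross 1), B is higher when maternal (cross 2).
        return "imprinted_maternal"
    # B is higher when paternal (cross 1), A is higher when paternal (cross 2).
    return "imprinted_paternal"
```

Because the text notes that most imprinted loci show only moderate allelic bias, a fixed cutoff such as the one above would likely miss many of them; it is used here only to keep the decision rule visible.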

Concluding remarks

Most studies comparing gene expression in related species were motivated by two challenges: to uncover conserved patterns inferred to be functionally important and to discover adaptive evolutionary changes that drive new phenotypes. Here, we presented a complementary approach that uses evolutionary comparisons as a means of obtaining basic insights into gene regulation. This approach builds on the fact that such comparisons expose the regulatory effects of thousands of genetic changes in a single experiment. We described five studies that used this approach in yeast and a related approach that was employed in mice. Crucial to these studies was the analysis of allele-specific expression in hybrids to classify the regulatory changes into those due to mutations in cis or trans. We anticipate that future studies will continue to explore the evolution of diverse regulatory mechanisms and further expand to additional regulatory processes such as translational control and protein–protein interactions. Recent studies began to examine the evolutionary divergence of protein abundance, demonstrating significant differences from divergence of mRNA levels (Foss et al, 2007; Fu et al, 2007; Laurent et al, 2010). Integration of various data sets of regulatory divergence would be needed to bridge this gap and understand how different mechanisms of regulation contribute to divergence of the final outcome, namely protein levels and activity. Such integrative analysis would uncover interplay and coordination among regulatory processes.
