Jennifer M Andrie1, Jon Wakefield2, Joshua M Akey3. 1. Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA; 2. Department of Statistics, University of Washington, Seattle, Washington 98195, USA. 3. Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA; akeyj@uw.edu.
Abstract
Gene expression levels are determined by the balance between rates of mRNA transcription and decay, and genetic variation in either of these processes can result in heritable differences in transcript abundance. Although the genetics of gene expression has been a subject of intense interest, the contribution of heritable variation in mRNA decay rates to gene expression variation has received far less attention. To this end, we developed a novel statistical framework and measured allele-specific differences in mRNA decay rates in a diploid yeast hybrid created by mating two genetically diverse parental strains. We estimate that 31% of genes exhibit allelic differences in mRNA decay rates, of which 350 can be identified at a false discovery rate of 10%. Genes with significant allele-specific differences in mRNA decay rates have higher levels of polymorphism compared to other genes, with all gene regions contributing to allelic differences in mRNA decay rates. Strikingly, we find widespread evidence for compensatory evolution, such that variants influencing transcriptional initiation and decay have opposite effects, suggesting that steady-state gene expression levels are subject to pervasive stabilizing selection. Our results demonstrate that heritable differences in mRNA decay rates are widespread and are an important target for natural selection to maintain or fine-tune steady-state gene expression levels.
Gene expression levels are determined by the balance between rates of mRNA transcription and decay, and genetic variation in either of these processes can result in heritable differences in transcript abundance. Although the genetics of gene expression has been a subject of intense interest, the contribution of heritable variation in mRNA decay rates to gene expression variation has received far less attention. To this end, we developed a novel statistical framework and measured allele-specific differences in mRNA decay rates in a diploid yeast hybrid created by mating two genetically diverse parental strains. We estimate that 31% of genes exhibit allelic differences in mRNA decay rates, of which 350 can be identified at a false discovery rate of 10%. Genes with significant allele-specific differences in mRNA decay rates have higher levels of polymorphism compared to other genes, with all gene regions contributing to allelic differences in mRNA decay rates. Strikingly, we find widespread evidence for compensatory evolution, such that variants influencing transcriptional initiation and decay have opposite effects, suggesting that steady-state gene expression levels are subject to pervasive stabilizing selection. Our results demonstrate that heritable differences in mRNA decay rates are widespread and are an important target for natural selection to maintain or fine-tune steady-state gene expression levels.
Variation in gene expression levels constitutes a significant source of phenotypic
diversity among individuals within populations and contributes to the evolutionary
divergence between species (Skelly et al. 2009;
Jones et al. 2012). In humans, regulatory
variants affecting gene expression influence susceptibility to autoimmune, infectious,
neoplastic, neurodegenerative, and psychiatric diseases (Skelly et al. 2009). In Darwin’s finches, regulatory variation affecting
beak morphology likely played a role in their speciation (Abzhanov et al. 2004). Likewise, gene expression variation underlies the
skeletal morphology differences in stickleback fish that distinguish saltwater from
freshwater species (Jones et al. 2012).Heritable regulatory variation can broadly be classified as either acting in
cis or trans (Skelly
et al. 2009). While trans-regulatory effects on gene expression
are undoubtedly important, studies in several eukaryotic organisms, including yeast, fruit
flies, mice, rats, and humans, suggest that cis-regulatory effects
constitute a substantially higher proportion of the genetic variance in gene expression
within species than do trans effects (Schadt et al. 2003; Hughes et al. 2006;
Petretto et al. 2006; Emilsson et al. 2008; Pickrell et al.
2010; Skelly et al. 2011). In the budding
yeast Saccharomyces cerevisiae, for example, nearly 80% of the genes that
have transcribed polymorphisms between two diverse strains exhibit allele-specific
expression differences (Skelly et al. 2011). In
humans, it has been estimated that ∼90% of single nucleotide polymorphisms
influencing gene expression levels are due to cis-regulatory mechanisms
(Pickrell et al. 2010). Furthermore,
cis-regulatory differences accumulate at a faster rate than
trans-regulatory differences between closely related species (Wittkopp et al. 2008; Tirosh et al. 2009; Romero et al.
2012).The balance between mRNA synthesis and decay determines steady-state levels of transcript
abundance, and genetic variation affecting either of these processes can contribute to
heritable differences in transcript abundance. However, to date, most research has
concentrated on genetic variation affecting steady-state mRNA levels, failing to
distinguish regulatory variation affecting transcription from that affecting decay (Skelly et al. 2009). Studies that have explored
different classes of heritable variation underlying differences in steady-state gene
expression focus primarily on transcription initiation, cataloging variation both within
and between species in transcription factor binding sites, chromatin structure, and DNA
methylation sites (Gerstein et al. 2010; The modENCODE Consortium et al. 2010; The ENCODE Project Consortium 2012; Connelly et al. 2014). In contrast, regulatory variants
underlying differences in mRNA decay rate have received considerably less attention (Dori-Bachash et al. 2011; Pai et al. 2012).To better delimit the contribution of cis-regulatory variation to
heritable differences in mRNA decay rates, we developed a novel statistical framework and
measured allele-specific differences in decay in a diploid hybrid created from two
genetically diverse strains of the budding yeast, S. cerevisiae. We
demonstrate that allelic differences in mRNA decay rates are widespread, affecting the
expression levels of nearly 31% of measurable genes. Interestingly, we observe that a
significant proportion of changes in decay rate are coupled to opposing changes in
transcriptional initiation, suggesting pervasive compensatory evolution to stabilize or
fine-tune steady-state gene expression levels. Our results also provide insights into the
mechanisms through which cis-regulatory variation acts to influence mRNA
decay rates, highlighting an important role for variants that affect mRNA secondary
structure.
Results
Overview of experimental design
We measured rates of allele-specific mRNA decay (ASD) in a diploid yeast produced by
mating two genetically diverse haploid Saccharomyces cerevisiae
strains: the laboratory strain BY4716 (BY), which is isogenic to the reference
sequence strain S288C, and the wild Californian vineyard strain RM11-1a (RM) (Liti et al. 2009). A schematic of the
experimental design is shown in Figure 1.
Briefly, we introduced rpb1-1, a temperature-sensitive mutation in
an RNA polymerase II subunit encoded by the gene RPO21, to each of
the haploid yeast strains, mated the strains, and grew the resulting hybrid diploid
to mid-log phase at 24°C, before rapidly shifting the culture to 37°C to
inhibit transcription (Fig. 1A; Nonet et al. 1987). RNA-seq was performed on
culture samples taken at 0, 6, 12, 18, 24, and 42 min subsequent to the temperature
shift (Fig. 1A). To identify ASD, we used
transcribed polymorphisms to distinguish between parental transcripts and compared
the relative levels of transcript abundance over the time course (Fig. 1B). Note, this experimental design internally
controls for trans-acting regulatory variation as well as
environmental factors. Under the null hypothesis of no ASD, the proportion of reads
from the BY transcript observed over the time course remains unchanged (Fig. 1B). However, genes with ASD will exhibit an
increasing or decreasing proportion of BY reads as a function of time (Fig. 1B). In total, we measured ASD from three
independent biological replicates.
Figure 1.
Overview of experimental design. (A) We replaced the wild-type
allele of the RPO21 (also known as RPB1) gene
with the temperature-sensitive rpb1-1 allele in both the
BY4716 (BY) and the RM11-1a (RM) strains of S. cerevisiae
(Nonet et al. 1987). We mated these
two haploid strains to produce a diploid hybrid and grew the diploid to mid-log
phase at the permissive temperature of 24°C. We rapidly shifted the
temperature of the culture to 37°C, halting transcription. Immediately
following the temperature shift, and at 6, 12, 18, 24, and 42 min after the
shift, we isolated mRNA and performed RNA-seq. (B) By
quantifying the relative levels of the BY and RM alleles for each gene, we
estimated p, the proportion of
transcripts derived from BY, at each time point. Under the null hypothesis
(H0; dashed line) of no allele-specific differences in mRNA decay
rates, p remains constant. Under
the alternative hypothesis (HA; solid line) of allelic differences
in mRNA decay, we expect p to
change as a function of time. For the gene represented by the solid line in the
example pictured, p decreases
significantly over time, indicating that the BY allele is decaying more quickly
than the RM allele of this gene.
Overview of experimental design. (A) We replaced the wild-type
allele of the RPO21 (also known as RPB1) gene
with the temperature-sensitive rpb1-1 allele in both the
BY4716 (BY) and the RM11-1a (RM) strains of S. cerevisiae
(Nonet et al. 1987). We mated these
two haploid strains to produce a diploid hybrid and grew the diploid to mid-log
phase at the permissive temperature of 24°C. We rapidly shifted the
temperature of the culture to 37°C, halting transcription. Immediately
following the temperature shift, and at 6, 12, 18, 24, and 42 min after the
shift, we isolated mRNA and performed RNA-seq. (B) By
quantifying the relative levels of the BY and RM alleles for each gene, we
estimated p, the proportion of
transcripts derived from BY, at each time point. Under the null hypothesis
(H0; dashed line) of no allele-specific differences in mRNA decay
rates, p remains constant. Under
the alternative hypothesis (HA; solid line) of allelic differences
in mRNA decay, we expect p to
change as a function of time. For the gene represented by the solid line in the
example pictured, p decreases
significantly over time, indicating that the BY allele is decaying more quickly
than the RM allele of this gene.
Statistical modeling of allele-specific mRNA decay
We developed a novel statistical framework to identify ASD. In brief, we use a linear
logistic model to measure the change in the proportion,
p, of reads derived from the BY
transcript as a function of time. To assess statistical significance, we use a
Bayesian hierarchical Markov chain Monte Carlo model (see Methods). In this model,
the prior probability of the alternative hypothesis (i.e., that a gene exhibits ASD)
is determined from the totality of data. We also estimate the mean and variance of
the decay rate differences under the alternative hypothesis from the data (see
Methods). The primary motivation for developing this more sophisticated framework is
that it accounts for genes that exhibit small departures from nonconstancy due to
high read counts in a more principled manner than alternative approaches (Dori-Bachash et al. 2011). We evaluated the power
and operating characteristics of our statistical framework through extensive
simulations and found that it generally has higher power and more accurately
estimates
(the proportion of genes consistent with the null hypothesis of no allelic
differences in mRNA decay) compared to alternative approaches under a wide variety of
parameters (see Supplemental Methods and Results; Supplemental Fig. S1; Supplemental
Table S1).
Pervasive influence of cis-regulatory variation on mRNA decay
rates
Through a careful filtering pipeline, which included whole-genome sequencing of RM to
mitigate read mapping bias (Degner et al. 2009;
see Methods), we identified 27,569 transcribed single nucleotide variants (SNVs) in
4381 genes that could be used to assign whether individual RNA-seq reads derived from
the BY or the RM allele of each gene. Of the ∼222 million RNA-seq reads we
obtained across all replicates and all time points in our study, 13.57 million reads,
averaging 2.26 ± 0.65 million reads per time point, were informative, such that
they mapped to a variant site and could unambiguously be assigned as originating from
BY or RM.We applied the statistical inference framework described above to 3544 genes that
passed our filtering criteria (see Methods). From the Bayesian hierarchical MCMC
model, we estimated , the proportion of genes that exhibit ASD, to be 0.31. Thus,
∼31% of all measured genes are inferred to be inconsistent with the null
hypothesis and exhibit allelic differences in decay rates. Of these, we can identify
350 genes at a false discovery rate of 10% (Fig.
2A). Note, this corresponds to a false nondiscovery rate of 24%. The set of
genes called significant agrees well with a simpler approach of correcting for
multiple testing by the QVALUE software (Storey
2002; Storey and Tibshirani 2003;
Storey et al. 2004) and imposing a
threshold on the magnitude of effect sizes needed to be called as significant (see
Supplemental Methods and Results; Supplemental Fig. S2). We note that, in order to
inhibit transcription, we subjected the yeast to mild heat shock. The decay rates
observed in our data set are specific to the environmental condition that they were
measured in and, thus, may be different in other states, such as log phase growth
(Sun et al. 2012). Additionally, in theory,
some of the differences in decay rate that we measured could be due to
allele-specific transcriptional responses to heat shock, since there can be small
amounts of leaky transcription in the first 5–15 min following the temperature
shift (Nonet et al. 1987); however, such heat
shock-induced differences in allele-specific transcription are unlikely to comprise
most of the differences we observe.
Figure 2.
Characteristics of genes that exhibit allele-specific mRNA decay.
(A) Posterior probability that a gene exhibits
allele-specific mRNA decay rates, as calculated from our Bayesian hierarchical
Markov chain Monte Carlo model. The dashed line at posterior probability =
0.67 corresponds to the threshold we used to call genes as exhibiting
significant allele-specific mRNA decay rates. (B) Histogram of
the slope calculated from the linear logistic model for the 350 genes with
significant (FDR = 10%) allele-specific mRNA decay rates. The exponential
of the slope, which estimates
λ –
λ, is the change in the
odds of observing a BY mRNA allele given a 1-min increase in time.
(C) Decay rate time courses of all genes in which the RM
allele decays significantly faster than the BY allele (left)
and the BY allele decays significantly faster than the RM allele
(right). The gray lines represent the decay rate time
courses of each of the individual genes. The black lines represent the mean
decay rate time courses for all of the genes included in each plot.
(D) Correspondence of
λ –
λ to half-life
differences between the BY and RM alleles of a gene. The dashed black lines
represent the positive and negative of the median effect size, where effect
size is defined as |λ
– λ|, observed
among the genes we identified with significant allele-specific differences in
decay rates.
Characteristics of genes that exhibit allele-specific mRNA decay.
(A) Posterior probability that a gene exhibits
allele-specific mRNA decay rates, as calculated from our Bayesian hierarchical
Markov chain Monte Carlo model. The dashed line at posterior probability =
0.67 corresponds to the threshold we used to call genes as exhibiting
significant allele-specific mRNA decay rates. (B) Histogram of
the slope calculated from the linear logistic model for the 350 genes with
significant (FDR = 10%) allele-specific mRNA decay rates. The exponential
of the slope, which estimates
λ –
λ, is the change in the
odds of observing a BY mRNA allele given a 1-min increase in time.
(C) Decay rate time courses of all genes in which the RM
allele decays significantly faster than the BY allele (left)
and the BY allele decays significantly faster than the RM allele
(right). The gray lines represent the decay rate time
courses of each of the individual genes. The black lines represent the mean
decay rate time courses for all of the genes included in each plot.
(D) Correspondence of
λ –
λ to half-life
differences between the BY and RM alleles of a gene. The dashed black lines
represent the positive and negative of the median effect size, where effect
size is defined as |λ
– λ|, observed
among the genes we identified with significant allele-specific differences in
decay rates.The exponential of the slope of the linear logistic model fit to each gene is a
direct estimate of the difference in mRNA decay rate between the BY and RM alleles of
that gene (see Methods). Among the genes with significant ASD, the effect size of the
decay rate difference ranges from a 1.81 × 10−3 to a 5.62
× 10−2 change in the odds of observing an mRNA allele of the
BY strain given a 1-min increase in time, with a median difference of 1.01 ×
10−2 (Fig. 2B). This median
difference corresponds to an ∼83% increase over 1 hr in the odds of observing
an mRNA allele of the BY strain. The BY allele decays more quickly than the RM allele
in 161 genes, while the RM allele decays more quickly than the BY allele in 189
genes. Genes with allelic differences in mRNA decay rates spanned a broad range of
gene ontology terms, and we did not detect significant enrichment for particular
functions or biological processes after correcting for multiple comparisons.
Allelic differences in mRNA decay reveal widespread compensatory
evolution
To investigate the relationship between ASD and steady-state gene expression levels,
we first inferred allele-specific expression (ASE) at the 0-min time point in our
time course, which is a reasonable proxy for steady-state levels of transcript
abundance. Using the method developed by Connelly et
al. (2014), we find that 1137 genes exhibit ASE (posterior probability
> 0.95) (Fig. 3). Five hundred and
ninety-five of the 1137 genes (52.3%) that exhibit steady-state ASE have higher
levels of the RM transcript, and 542 genes (47.7%) have higher levels of the BY
transcript. The median log2-fold change for all genes with allele-specific
steady-state expression differences is 0.43.
Figure 3.
mRNA decay rates in yeast are often coupled to opposite changes in
transcription initiation. (Left) Venn diagram showing the
overlap of genes with significant allelic differences in steady-state gene
expression (ASE) and decay (ASD). (Right) Scatter plot showing
estimates of differences in decay rates between BY and RM
(x-axis) versus the proportion of transcripts from the BY
allele (p) inferred from the
0-min time point for genes with both ASE and ASD. The shaded gray rectangles
represent quadrants where magnitudes of ASD and ASE are discordant, suggesting
compensatory evolution.
mRNA decay rates in yeast are often coupled to opposite changes in
transcription initiation. (Left) Venn diagram showing the
overlap of genes with significant allelic differences in steady-state gene
expression (ASE) and decay (ASD). (Right) Scatter plot showing
estimates of differences in decay rates between BY and RM
(x-axis) versus the proportion of transcripts from the BY
allele (p) inferred from the
0-min time point for genes with both ASE and ASD. The shaded gray rectangles
represent quadrants where magnitudes of ASD and ASE are discordant, suggesting
compensatory evolution.Of the 350 genes with significant ASD, 182 (52.0%) also exhibit ASE (Fig. 3). Strikingly, of the 182 genes with both ASD
and ASE, 129 (70.9%) have increased decay rates in the allele with higher levels of
steady-state expression, suggesting that there are variants influencing rates of
transcriptional initiation with opposite effects to those influencing decay (Fig. 3). Similarly, the 168 genes that exhibit ASD
but not ASE (Fig. 3) are also likely enriched
for variants with opposing effects on transcriptional initiation and decay, since the
difference in decay rate does not produce a corresponding difference in steady state
levels. Thus, these data suggest that changes in mRNA decay rates in yeast are often
coupled with opposite changes in transcription, consistent with pervasive
compensatory evolution to stabilize or fine-tune steady-state gene expression
levels.
Patterns of genetic diversity across transcripts with allelic differences in mRNA
decay
Previous studies in yeast have found that genes with ASE exhibited higher levels of
genetic diversity compared to those without such differences (Ronald et al. 2005). To explore patterns of diversity in our data
set, we first compared overall levels of variation among four classes of genes: those
with only ASD, those with only ASE in steady-state levels, those with both ASD and
ASE, and those with no allele-specific differences. Limiting our analysis to the 2954
genes with reliable UTR annotations (Nagalakshmi et
al. 2008), we found that genes with any type of allele-specific difference
have 1.4-fold more variants than genes without ASD or ASE (4.62 and 3.32 variants/kb,
respectively; Mann-Whitney U test, P-value <
2.20 × 10−16) (Fig. 4).
Moreover, genes with ASD have 1.3-fold higher levels of variation compared to genes
with only ASE (5.48 variants/kb compared to 4.31 variants/kb, respectively;
Mann-Whitney U test, P-value = 2.26 ×
10−10) (Fig. 4). One
complication in interpreting these findings is that genes with larger numbers of
variants tend to have more informative reads, and, therefore, there is greater power
to detect allelic differences in expression and decay. Indeed, variant density is
significantly correlated with the number of informative reads
(r2 = 0.158, P-value < 2.20
× 10−16). To more formally explore whether differences in SNV
density are simply related to power, we performed logistic regression where the
predictor variables were coded as zero if a gene did not show ASD and one if it did.
We found that a model that included both the number of variants/kb and the number of
informative reads as covariates fits the data significantly better than a model using
just the number of informative reads alone (ANOVA P-value <
2.20 × 10−16), suggesting that the increased levels of
variation in genes with ASD and ASE are not simply a consequence of discovery
bias.
Figure 4.
Levels of genetic diversity in genes with and without allele-specific
differences in mRNA decay rates and steady-state expression levels. Bar plots
show the mean number of single nucleotide variants (SNVs) between BY and RM per
kilobase across the whole gene (left) or across each gene
region separately (right). Error bars correspond to the 95%
confidence interval of the mean. ASE and ASD denote allele-specific expression
and allele-specific decay, respectively.
Levels of genetic diversity in genes with and without allele-specific
differences in mRNA decay rates and steady-state expression levels. Bar plots
show the mean number of single nucleotide variants (SNVs) between BY and RM per
kilobase across the whole gene (left) or across each gene
region separately (right). Error bars correspond to the 95%
confidence interval of the mean. ASE and ASD denote allele-specific expression
and allele-specific decay, respectively.To identify regions that may be enriched for variants that influence decay rates, we
compared levels of genetic variation in the 5′ UTR, coding region, and
3′ UTR among the four classes of genes described above. Levels of variation
are significantly elevated across all genic regions for genes with ASD compared to
genes with no allelic differences (Fig. 4).
Genes with both ASD and ASE have 1.46-fold, 1.68-fold, and 1.74-fold higher levels of
variation than genes with no allelic differences in the 5′ UTR, coding region,
and 3′ UTR, respectively (Mann-Whitney U test,
P-value = 1.43 × 10−2, < 2.20
× 10−16, and 1.47 × 10−4). Genes with
ASD only have 1.88-fold, 1.64-fold, and 2.12-fold higher levels of variation than
genes with no allelic differences in the 5′ UTR, coding region, and 3′
UTR, respectively (Mann-Whitney U test, P-value
= 1.58 × 10−7, 1.12 × 10−15, and
6.77 × 10−4). Thus, allele-specific differences in mRNA decay
rate are likely driven by variants positioned throughout the transcript. Consistent
with this hypothesis, genes that only contain SNVs in either their coding region or
UTR are less likely to exhibit ASD compared to genes with variants in both their
coding region and UTR (Fisher’s exact test, P-value =
1.35 × 10−10 and 1.55 × 10−4,
respectively).
Genes with ASD are enriched for variants that influence mRNA structure
To test the hypothesis that variation in mRNA secondary structure contributes to
allelic differences in mRNA decay rates, we compared the minimum Gibb’s free
energy (ΔG) associated with the predicted secondary structures of the BY and RM
alleles for each mRNA transcript. Specifically, following standard practices (Tuller et al. 2010), we computed the ΔG of
the predicted secondary structures for each of the 27,569 variants that we identified
between BY and RM (Fig. 5A). We then calculated
the absolute value of the difference in free energy between alleles,
|ΔΔG| = |ΔGBY –
ΔGRM|, and for each gene, we recorded the maximum
|ΔΔG| of all its variants. Variants with larger values of
|ΔΔG| are predicted to have more severe structural consequences
(Fig. 5A). Genes that exhibit ASD are
enriched for variants with larger predicted effects on mRNA secondary structure as
compared to genes without any allelic differences in decay or steady-state expression
(1.32-fold increase in the maximum |ΔΔG| observed in genes with
ASD; Mann-Whitney U test, P-value = 5.48
× 10−15) (Fig. 5B).
Although in silico predictions of differences in mRNA secondary structure are not
perfect proxies for structures that occur in vivo, our observations are consistent
with the hypothesis that allelic variation in mRNA secondary structure contributes to
heritable variation in mRNA decay rates. Interestingly, the gene
HSP78, which encodes a mitochondrial matrix chaperone, exhibits
ASD and only contains a single variant (Supplemental Table S2), which has a large
predicted effect on mRNA secondary structure (Fig.
5C), suggesting that allelic differences in decay of this gene are likely
mediated by structural differences.
Figure 5.
Genes with significant allele-specific decay are enriched for predicted
differences in mRNA structure. (A) Calculation of
ΔΔG for SNVs across a transcript. We determined the mRNA sequence
for both the BY and RM alleles of each gene. For each SNV (denoted by *),
we calculated the predicted minimum Gibb’s free energy (ΔG)
associated with the mRNA secondary structures of each allele using 41-bp
regions of the mRNA transcript centered on each of the SNVs. The difference in
ΔG (ΔΔG) between the BY and RM alleles for each 41-bp window
was measured by subtracting the ΔG calculated for the RM allele from the
ΔG calculated for the BY allele. We then calculated the absolute value of
the difference in free energy between alleles, |ΔΔG|
= |ΔGBY – ΔGRM|, for
each variant. For each gene, we recorded the maximum
|ΔΔG| observed among all its variants.
(B) The maximum |ΔΔG| of genes with
and without allele-specific decay rate differences are significantly different
(Mann-Whitney U test, P-value = 4.47
× 10−7), suggesting that genes with allele-specific
decay rates are significantly enriched for variants with larger predicted
effects on mRNA secondary structure compared to genes without allelic
differences in decay. (C) Predicted mRNA secondary structures
surrounding an SNV in the gene HSP78, which encodes a
mitochondrial matrix chaperone. The location of the SNV in each transcript
allele is denoted by *; the BY and RM alleles at the variant site are
colored blue and red, respectively. Note, HSP78 only contains
one SNV.
Genes with significant allele-specific decay are enriched for predicted
differences in mRNA structure. (A) Calculation of
ΔΔG for SNVs across a transcript. We determined the mRNA sequence
for both the BY and RM alleles of each gene. For each SNV (denoted by *),
we calculated the predicted minimum Gibb’s free energy (ΔG)
associated with the mRNA secondary structures of each allele using 41-bp
regions of the mRNA transcript centered on each of the SNVs. The difference in
ΔG (ΔΔG) between the BY and RM alleles for each 41-bp window
was measured by subtracting the ΔG calculated for the RM allele from the
ΔG calculated for the BY allele. We then calculated the absolute value of
the difference in free energy between alleles, |ΔΔG|
= |ΔGBY – ΔGRM|, for
each variant. For each gene, we recorded the maximum
|ΔΔG| observed among all its variants.
(B) The maximum |ΔΔG| of genes with
and without allele-specific decay rate differences are significantly different
(Mann-Whitney U test, P-value = 4.47
× 10−7), suggesting that genes with allele-specific
decay rates are significantly enriched for variants with larger predicted
effects on mRNA secondary structure compared to genes without allelic
differences in decay. (C) Predicted mRNA secondary structures
surrounding an SNV in the gene HSP78, which encodes a
mitochondrial matrix chaperone. The location of the SNV in each transcript
allele is denoted by *; the BY and RM alleles at the variant site are
colored blue and red, respectively. Note, HSP78 only contains
one SNV.
No widespread evidence for coupling between decay rates and translational
efficiency
Previous reports measuring mRNA decay rate in one or a few genes have suggested that
the translation efficiency of an mRNA might be directly coupled to mRNA decay rate
(for review, see Garneau et al. 2007 and Schoenberg and Maquat 2012). To evaluate whether
such coupling is common, we compared the types of codon changes occurring between BY
and RM in genes with and without decay rate differences. If mRNA decay rate is
coupled to translation efficiency, we would expect that genes with ASD would have
higher proportions of codon changes that impact translation efficiency, as compared
to genes without such differences. To this end, we compared the proportion of
preferred to unpreferred synonymous codon changes in genes with and without ASD and
found that genes with ASD have a slightly lower proportion of codon preference
changes, although this is not statistically significant (68.1% versus 68.7%;
Mann-Whitney U test, P-value = 8.60 ×
10−2). Thus, we do not find widespread evidence that mRNA
translation efficiency is directly coupled to mRNA decay rate. A caveat of this
analysis is that more sensitive measures of translational efficiency may be needed to
detect coupling. Moreover, these findings do not preclude the possibility that
coupling exists for a subset of genes, which we may not have power to detect.
Discussion
We developed a novel statistical framework to measure allele-specific differences in
mRNA decay rate in a diploid yeast hybrid created by mating two genetically diverse
parental strains. A particular strength of our statistical approach is its ability to
deal with genes that exhibit small departures from nonconstancy due to high read counts
in a more principled manner than alternative approaches (Dori-Bachash et al. 2011). Using our statistical framework, we demonstrate
the pervasive influence of cis-regulatory variation on mRNA decay
rates, estimating that >30% of measurable genes exhibit ASD. Our results suggest
that variation in mRNA decay rate is widespread across the genome, as well as among
individuals within a species. Overall, our study provides further evidence of the
importance of post-transcriptional processes in determining heritable differences in
gene expression levels, which, in turn, impact phenotypic diversity among individuals
within populations. Additionally, the novel statistical framework we developed has broad
applications for future work in testing hypotheses of differential expression.A striking feature of the data is that differences in mRNA decay rates are often coupled
with opposite changes in transcription. It is difficult to precisely estimate the
proportion of genes with significant ASD that is in the opposite direction of
steady-state expression levels because of differences in the statistical power of
detecting ASE and ASD. However, a naïve estimate suggests that up to 85% of genes
with significant ASD are coupled with opposing effects on transcription (Fig. 3). These findings agree with previous studies,
which observed that roughly 80% of differences between yeast species and 50% of
differences among humans in mRNA decay rate are coupled to opposing differences in
transcription (Dori-Bachash et al. 2011; Pai et al. 2012). Interestingly, in the remaining
15% of genes with significant ASD, there is no significant correlation
(r2 = 0.255, P-value = 6.53
× 10−2) between the magnitude of the decay rate difference and
the magnitude of the gene expression difference between the alleles. Collectively, these
results suggest that steady-state gene expression levels are subject to strong
stabilizing selection, and that heritable differences in mRNA decay rates are an
important target for natural selection to maintain or fine-tune steady-state gene
expression levels.To explore which regions of the mRNA transcript are most important in determining mRNA
decay rate differences, we compared the levels of genetic variation in the 5′
UTR, coding region, and 3′ UTR in genes with and without ASD. We hypothesized
that the 3′ UTR would be the most important region governing ASD, and therefore,
that genes exhibiting ASD would be especially enriched for polymorphisms between BY and
RM in the 3′ UTR compared to genes without ASD or ASE. Instead, we observed that
all three regions exhibited significantly more variation in genes with ASD compared to
genes without ASD or ASE (Fig. 4). One explanation
for these results is that the 3′ UTR contains the lowest overall amount of
variation, suggesting that it is under significant functional constraint. If the
3′ UTR contains the highest density of cis-elements affecting
mRNA decay rate, then changes to this region perhaps have a larger effect on mRNA decay
rate, and therefore, are more likely to be removed by purifying selection. Conversely,
changes in the 5′ UTR or coding region may cause differences in mRNA decay rates
of a smaller effect size, and therefore be subject to less intense purifying selection.
Thus, all three gene regions may be important determinants to within-species differences
in mRNA decay rate. Our observation that genes with SNVs only in the coding region or,
alternatively, only in the UTRs are less likely to exhibit ASD, also suggests that
variation in all parts of the transcript can potentially impact decay rate.Identifying allelic differences in mRNA decay rate is only the first step in the
ultimate goal of identifying causal regulatory alleles and the mechanisms that they act
through. To this end, it is interesting to note that 13 genes with ASD possess a single
variant between the BY and RM alleles of the transcript (11 in coding regions and two in
UTRs) (see Supplemental Table S2). These variants are strong candidate causal alleles,
and as shown for HSP78 (Fig. 5C),
enable mechanistic hypotheses to be formulated and ultimately tested. Furthermore, it
will be important to consider additional processes that could influence allele-specific
decay. For example, nonsense-mediated decay (NMD) can be triggered by AUG codons in the
5′ UTR, and if a SNV introduced or disrupted a 5′ UTR AUG, it could
influence mRNA decay rates between the two alleles. In the 2954 genes that have reliable
UTR annotations, 34 contain SNVs in the 5′ UTR that introduce or disrupt an AUG
codon. Of these 34 genes, nine exhibit significant ASD, which is significantly more than
we would expect by chance (Fisher’s exact test, P-value =
5.16 × 10−3); however, only three of the nine genes show decay
rate differences in the direction expected if allelic differences in mRNA decay rate
were mediated by nonsense-mediated decay. Thus, this process likely makes a minor
contribution to patterns of ASD in our data. More generally, dissecting the mechanistic
basis of allelic variation in mRNA decay rates will facilitate the robust prediction of
causal regulatory alleles from sequence data.Another critical area of research will be exploring the interactions of genetic
variation with the environment. Our study was conducted in yeast undergoing exponential
growth in a rich medium that underwent mild heat shock at the time of transcriptional
shut-off; however, we would expect that patterns of ASD would differ markedly under
differing growth conditions, such as nutrient-limited media or the presence of high
concentrations of chemicals like ethanol or the various heavy metals. Additionally, the
effects of cis-regulatory variation on mRNA decay are also likely to
vary between different stages of the yeast life cycle, including during meiosis and
during vegetative growth. Another important limitation of our study is that it only
examines allele-specific differences in mRNA decay rate between two diverse yeast
strains. Nonetheless, our results highlight the important contribution that heritable
variation in mRNA decay rates make to buffer steady-state differences in gene expression
and suggest that additional post-transcriptional processes should be studied in greater
detail for a more comprehensive understanding of mechanisms contributing to
transcriptional diversity within and between species.
Methods
Yeast strains
For the purposes of this study, we replaced the wild-type copy of the
RPO21 (also known as RPB1) gene in the haploid
S. cerevisiae strains BY4716 (BY) and RM11-1a (RM) (for detailed
descriptions of these two strains, see Brem et al.
2002) with the rpb1-1 mutant allele (Nonet et al. 1987). We began by identifying the mutations that
make rpb1-1 differ from wild-type RPO21 via
standard Sanger sequencing of the RP021 locus in the strain Y262
(described in Herrick et al. 1990). We
identified two mutations: a C to T substitution 206 bp after the translation start
site, and a G to A substitution 4310 bp after the translation start site; both
mutations are nonsynonymous. To substitute RPO21 with
rpb1-1 in RM, we used a “pop-in, pop-out” strategy
(Rothstein 1995; Duff and Huxley 1996). Specifically, in the “pop-in”
step, we linearized a plasmid containing the URA3 selectable marker
and the rpb1-1 mutant allele with a restriction enzyme that cut in
the rpb1-1 sequence segment. We then transformed the linearized
plasmid into RM cells and selected for cells in which the plasmid had recombined into
the genome using uracil prototrophy. At the completion of the “pop-in”
step, RM carried a duplication of the target genomic sequence segment, in which one
duplicate contained the RPO21 wild-type allele and one duplicate
contained the rpb1-1 mutant allele; the plasmid sequences and
URA3 lay between the two duplicates. In the
“pop-out” step, we added uracil back to the medium so that the
URA3 gene was no longer required for viability, allowing
spontaneous recombination events to occur between the duplicated target sequences. To
select for recombination events, we used 5-fluororotic acid (5-FOA), which is
metabolized by URA3 into a toxic compound. A recombination event will result in
either retention of the mutant rpb1-1 allele or reversion to the
wild-type RPO21 allele. Using this strategy, we first replaced the C
located 206 bp after the translation start site in RPO21 with a T,
and then, subsequently, we replaced the G located 4310 bp after the translation start
site in RPO21 with an A. To confirm successful substitution of the
wild-type allele with the mutant allele at both sites, we used standard Sanger
sequencing.To replace RPO21 with rpb1-1 in BY, we employed a
backcrossing strategy. We could not use the “pop-in, pop-out” strategy
because BY already contained the URA3 selectable marker. More
specifically, we crossed BY to Y262, sporulated the hybrid diploid, and screened the
resultant offspring for inability to grow at 37°C (Note: rpb1-1
mutants do not replicate at this temperature). We then performed four more rounds of
backcrossing between the hybrid offspring and BY, such that the resulting yeast
strain carried the rpb1-1 allele on an ∼97% BY genetic
background. We confirmed that the final product of our backcross carried the two
single nucleotide variants that make rpb1-1 mutants different from
the wild-type RPO21 by standard Sanger sequencing. We mated the BY
and RM rpb1-1 temperature-sensitive mutant strains and selected for
the diploid hybrid by visually screening for BY and RM cells that had mated. We
confirmed that our candidate diploid hybrids identified in our screen were, in fact,
diploid using a standard Halo Mating Type Assay.
Measuring mRNA decay rates
mRNA decay rate time course sample collection
The BY × RM hybrid diploid we generated was grown at 24°C to mid-log
phase (OD600 0.8–1.0) in 60 mL of yeast extract peptone dextrose
(YEPD). We abruptly shifted the culture to 37°C via addition of 60 mL of
50°C YEPD. Immediately following, and at 6, 12, 18, 24, and 42 min after
addition of the 50°C medium, we collected 20-mL aliquots of the culture using
vacuum filtration. To maintain the increased temperature of the culture, we housed
it in a 37°C shaking incubator between collection time points. The collected
yeast cells from each time point were flash-frozen in liquid nitrogen and then
stored at −80°C for no more than 2 d before we extracted total RNA
from the cells using a standard phenol-chloroform preparation. In total, we
collected three replicates of our time course.
RNA sequencing
We used a TruSeq RNA Sample Prep v2 Kit (Illumina) to create a sequencing library
from the total RNA collected for each decay rate time course time point from each
replicate. Per the protocol for the kit, we isolated mRNA from the total RNA using
two rounds of poly(A) selection, then fragmented the isolated mRNA into ∼150
base pair (bp) fragments, and finally, used random hexamer primers to produce cDNA.
Poly(A) selection, by definition, retains only mRNA with intact poly(A) tails in the
resultant RNA pool. The most commonly used pathway of mRNA decay, referred to as
deadenylation-dependent decapping, begins with shortening of the poly(A) tail by
deadenylases, followed by removal of the 5′ cap structure by decapping
enzymes, and, finally, 5′ to 3′ exonucleolytic degradation of the
decapped intermediate (for review, see Wilusz et al.
2001 and Garneau et al. 2007). Thus,
our experimental design is unable to detect allelic differences that affect the
decapping rate or rate of exonucleolytic degradation. We chose to use poly(A)
selection to isolate mRNA despite its inability to detect differences in the later
stages of mRNA decay because previous studies of deadenylation-dependent decapping
have demonstrated that mRNA decay proceeds very rapidly following deadenylation, and
that deadenylation, as opposed to decapping or exonucleolytic degradation, is the
rate-limiting step in the mRNA decay process (for review, see Wilusz et al. 2001 and Garneau et
al. 2007). We created barcoded sequencing libraries from the cDNA from each
sample and, in an effort to minimize technical variation between the data acquired
from different decay rate time points, all samples from all replicates were sequenced
in the same lane on an Illumina HiSeq 2000 (50-bp paired-end reads).
Whole-genome sequencing of RM
For whole genome sequencing of the S. cerevisiae strain RM, we
inoculated the strain from −80°C freezer stock into 5 mL YEPD and grew
the culture at 30°C to saturation. We pelleted the cells from the culture by
centrifugation, decanted the supernatant, and froze the cells at −80°C.
We extracted DNA using a Genomic-tip 100/G Kit (Qiagen) and then concentrated the
sample using a standard ethanol precipitation. We prepared a DNA sequencing library
using a TruSeq DNA Sample Prep v2 Kit (Illumina). Per the protocol for the kit, we
used a Covaris sonicator to shear the DNA into ∼300- to 400-bp fragments, and,
after ligating adaptors onto the DNA fragments, we additionally size-selected for
300- to 400-bp fragments by running the ligation products out on an agarose gel and
gel-extracting the appropriate band. We performed whole-genome sequencing using an
Illumina MiSeq (151-bp paired-end reads).
Read mapping
We obtained complete genome sequences for BY from the Saccharomyces
Genome Database (version R64-1-1, released February 3, 2011; http://www.yeastgenome.org) (Engel et
al. 2013) and for RM from the Broad Institute (http://www.broadinstitute.org). We used BWA version 0.5.9 (Li and Durbin 2009) to map both the DNA and RNA
sequence reads to the BY genome and, separately, the RM genome. After mapping reads,
we sorted BAM files and marked duplicate reads using Picard version 1.43 (http://picard.sourceforge.net).
Identification of variant sites for assigning the allele of individual RNA-seq
reads
To obtain a set of variants for allele-specific read calling in the BY × RM
diploid, we used LASTZ (http://www.bx.psu.edu/ miller_lab)
to infer alignment scoring parameters appropriate for aligning the BY and RM genomes
and to generate pairwise alignments between all chromosomes of the two strains. We
then used threaded blockset aligner (TBA) (Blanchette
et al. 2004) to compute a whole-genome alignment that is not biased in
favor of a particular reference genome. We cataloged all SNVs, as well as all indels,
identified in the alignment. As we were only interested in transcribed differences
between the BY and RM genomes, we removed from our variant list all sites not within
annotated BY open reading frames (obtained from the Saccharomyces
Genome Database; http://www.yeastgenome.org) and
their corresponding untranslated regions (UTRs) (UTR lengths were determined from
Nagalakshmi et al. 2008). Manual review of
the remaining variant sites using the program Integrated Genome Viewer (http://www.broadinstitute.org/igv/) revealed that many of the indels
identified from the BY and RM alignment produced by TBA, as well as the SNVs closely
flanking these indels, were miscalled. Therefore, we removed all indels and all SNVs
within 10 bp of an indel from our variant list. Likewise, because we suspected that
most or all of the SNVs we identified in genes with unusually high numbers of
variants per kb were artifacts of alignment errors, we discarded all SNVs located in
genes that exhibited greater than fivefold the average variant density of all genes
across the genome. Due to the difficulty in distinguishing which gene an
RNA-sequencing read that aligns to a location in which two yeast genes overlap
derives from, we also threw out any variants that overlapped more than one annotated
yeast gene. Finally, we removed any variants to which reads obtained from whole
genome sequencing of RM were assigned more often to the BY allele of the variant than
to the RM allele by the method developed by Skelly et
al. (2011) (briefly described below) to assign whether individual RNA-seq
reads derived from the BY or the RM allele of each gene.
Assignment of the allele of individual RNA sequencing reads
We performed assignment of individual RNA-seq reads as originating from either the BY
or the RM allele of each gene as described in Skelly
et al. (2011), with the following two changes. First, any read with an
alignment to one genome that scores higher had to overlap one of the SNVs between BY
and RM that were identified as described above. Second, we did not perform a
correction for GC content.
Measuring allele-specific differences in mRNA decay rate
To determine whether a gene exhibited allele-specific mRNA decay rate differences, we
developed a novel linear logistic model that we applied in conjunction with a
quasi-likelihood ratio test to measure the change across our time course in the
calculated proportion of reads deriving from the BY allele as compared to the total
number of informative reads at each gene. Specifically, in our model, we let
N(t) be the number
of mRNA transcripts for strain j, j = 1, 2
(representing BY and RM) at time t. We then assumed that the rate of
decay is dN/dt =
−λt,
with λ > 0, so that
N(t) =
N
exp(−λt),
where N is the count at time 0 for
strain j. For each time point, t, the number of
RNA-seq reads that we can assign to a strain,
n(t), is a
fraction, f, of the total number of
mRNA transcripts for that strain, such that
n(t) =
f
N(t).We then assumed the modelUnder this model, the distribution of the counts for strain 1 (BY) given the total is
binomial [we could make a binomial approximation since n(t) ≪
N(t)] with denominator
n(t) +
n(t) and
probability (of strain 1):Taking the logit gives:Although different fractions of the total mRNA transcript pool are sampled at each
time point, these fractions cancel in the above calculation, so that we compute the
relative proportion of strain 1 (BY) alleles in the pool from which we have sampled
at each time point. The exp(α) is the odds that we observe an
mRNA allele of strain 1 at time t = 0. In our linear logistic
model, we estimate α; however, from the above derivation we
know that these odds are
N/N,
but this proportion is unobserved. The parameter exp(β) is the
change in the odds of observing an mRNA allele of the strain 1 type given a 1-min
increase in time. Thus, exp(60 × β) is the change in the
odds of observing an mRNA allele of strain 1 given a 1-hr increase in time. For
example, if exp(60 × β) = 2, then the odds of
observing an mRNA allele of strain 1, when compared to the odds of observing an mRNA
allele of strain 2 (RM), doubles over 1 hr. If decay rates are the same in both
strains, then λ =
λ, which is equivalent to
β = 0 in the logistic model. The parameter estimate
, along
with the associated standard error, are subsequently used within a hierarchical
model, as detailed shortly.The null can be rejected with small departures from nonconstancy due to high counts,
if a frequentist test (such as a quasi-likelihood test) is used. This is a recognized
problem with frequentist testing in which power is not accounted for in the setting
of significance thresholds. Hence, to determine if β was
significantly different from 0, and therefore, whether a gene exhibited allele
specific differences in mRNA decay rate, we used a Bayesian hierarchical model. In
our model, we let Y be the estimate of
the slope
for the ith gene, and be the variance of this estimate. We
then assumed Y |
∼
N(, ), i = 1, …, m,
where m is the number of genes. We specified a mixture model for the
collection [, …, ], withThe second mixture component contains the non-null genes. We integrated out over
to
obtain a three-stage model, and we use mixture component indicators
H = 0/1 to denote the
zero/normal membership model for transcript i. The model is:Stage 1:Stage 2:Stage 3:withso that we had improper priors for δ and
τ. This model is appealing
since we deal with overdispersion in the data using a reliable and distribution-free
frequentist method and then take the information on the parameter of interest only
(the differences), namely the estimate and its associated standard error, to model
within the hierarchy. By only concentrating on the key parameters, we avoid having to
make model assumptions concerning parameters of no interest.We implemented this model via a Markov chain Monte Carlo algorithm in which we
introduced indicator variable ω
to denote the mixture component of gene i. For our analysis, we only
evaluated the 3544 genes that had at least 10 informative reads at each of the six
time points, as well as less than a 50-fold difference in expression of the two
alleles at the 0-min time point. General background of this testing framework can be
found in Wakefield (2013).To formally determine whether gene i exhibited allele-specific mRNA
decay rate differences, we placed a threshold of 0.67 on the posterior probability
r =
Pr(H = 1| data) of
being non-null. At this threshold, the false discovery rate (FDR) is 0.099 and the
false nondiscovery rate (FNDR) is 0.244. The FDR and FNDR are model-based estimates
and are calculated as follows. For the list of R (say) genes i that
pass the threshold, we calculate the sum of 1 –
r (i.e., the posterior
probability of no difference in mRNA decay rate) and divide by the total number of
“noteworthy” genes, R, to give the FDR. For all the remaining (3544
– R) non-noteworthy genes, we sum the
r (i.e., the posterior
probability of a difference in mRNA decay rate) and then divide by (3544 – R)
to give the FNDR.
Gene Ontology analysis
To assess whether there was any significant enrichment for genes involved in a
particular molecular function, cellular component, or biological process in the set
of genes we identified with allelic differences in mRNA decay, or in the two smaller
subsets of genes in which one allele or the other decayed more quickly, we submitted
each set of genes to AmiGO’s GO Term Enrichment Tool (http://amigo1.geneontology.org/cgi-bin/amigo/term_enrichment). We
co-submitted all 3544 genes we analyzed for allele-specific mRNA decay rate
differences as the input background set and selected SGD as the database filter. We
chose 0.01 as our maximum P-value threshold and two as the minimum
number of gene products.
Measuring allele-specific differences in mRNA steady-state levels
Using the numbers of mRNA transcripts from BY and RM for each gene at time point
t = 0 min as a proxy for steady-state expression levels, we
determined whether a gene exhibited allele-specific steady-state expression
differences by performing the cis test exactly as described in Connelly et al. (2014), with the following
modification: The test was performed with three, rather than two, replicates. Our
primary motivation for choosing this method, as opposed to alternative approaches
(Skelly et al. 2011), is that its
statistical framework is most closely related to the framework we implemented for
identifying allele-specific mRNA decay rate differences.
Classification of genes by type of allele-specific differences
For comparison between genes with allele-specific mRNA decay rate differences,
allele-specific steady-state expression differences, and no allele-specific
differences, we divided the genes into four classes: those with allele-specific
differences in mRNA decay rate only, those with allele-specific differences in
steady-state levels only, those with allele-specific differences in both mRNA decay
rate and steady-state levels, and those with no allele-specific differences.
Specifically, we categorized genes with a posterior probability greater than 0.67 in
our Bayesian hierarchical Markov chain Monte Carlo model, but with a posterior
probability less than 0.95 in our test for allele-specific steady-state expression
differences as only having allele-specific differences in mRNA decay rate. We
considered genes with a posterior probability greater than 0.67 in our Bayesian
hierarchical Markov chain Monte Carlo model and a posterior probability greater than
0.95 in our test for allele-specific steady-state expression differences as
exhibiting both allele-specific differences in mRNA decay rate and allele-specific
differences in steady-state expression levels. Genes with a posterior probability
greater than 0.95 in our test for allele-specific steady-state expression
differences, but which did not have a posterior probability greater than 0.67 in our
Bayesian hierarchical Markov chain Monte Carlo model, were classified as only having
allele-specific differences in steady-state expression levels. For our final category
of genes with no allelic differences, we grouped together genes with a posterior
probability less than 0.30 in our Bayesian hierarchical Markov chain Monte Carlo
model and a posterior probability less than 0.95 in our test for allele-specific
steady-state expression differences. We choose 0.30 rather than 0.67 as the cut-off
for the posterior probability in our Bayesian hierarchical Markov chain Monte Carlo
model for this group of genes in an effort to minimize the number of false negatives
(for allele-specific differences in mRNA decay rate) in this group.
Secondary structure analysis
To evaluate the differences in mRNA secondary structure between the BY and RM alleles
of each gene, we began by determining the mRNA sequence for both the BY and RM
alleles of each gene using the set of 27,569 variants we identified between BY and
RM, the BY genome sequence (from the Saccharomyces Genome Database,
version R64-1-1, released February 3, 2011; http://www.yeastgenome.org)
(Engel et al. 2013), annotations of the BY
open reading frames (from the Saccharomyces Genome Database;
http://www.yeastgenome.org), and the predicted untranslated region
lengths (UTRs) for the BY open reading frames (Nagalakshmi et al. 2008). We then used the UNAFold software
package’s hybrid-ss-min tool to compute the predicted minimum Gibb’s
free energy (ΔG) associated with the mRNA secondary structures of each allele
of each transcript at 30°C (we chose to use 30°C because this is the
standard temperature at which yeast are grown in the laboratory) (Markham and Zuker 2005, 2008). More specifically, following standard practices (Tuller et al. 2010), we calculated the ΔG
of a 41-bp mRNA region surrounding each of the 27,569 SNVs in our data set (Markham and Zuker 2005, 2008). The variant of interest was placed at the center of each
41-bp window; however, if the variant was <20 bp from the end of the mRNA
transcript, the 41-bp window was shifted such that the beginning or the end coincided
with the beginning or the end of the mRNA transcript, as appropriate. The difference
in ΔG (ΔΔG) between the BY and RM alleles for each 41-bp window was
measured by simply subtracting the ΔG calculated for the RM allele from the
ΔG calculated for the BY allele. We then calculated the absolute value of the
difference in free energy between alleles, |ΔΔG| =
|ΔGBY – ΔGRM|, for each
variant. For each gene, we recorded the maximum |ΔΔG| we
observed among all its variants.
Data access
Sequencing data from this study have been submitted to the NCBI Gene Expression Omnibus
(GEO; http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE60617.
Authors: Mathieu Blanchette; W James Kent; Cathy Riemer; Laura Elnitski; Arian F A Smit; Krishna M Roskin; Robert Baertsch; Kate Rosenbloom; Hiram Clawson; Eric D Green; David Haussler; Webb Miller Journal: Genome Res Date: 2004-04 Impact factor: 9.043
Authors: Sushmita Roy; Jason Ernst; Peter V Kharchenko; Pouya Kheradpour; Nicolas Negre; Matthew L Eaton; Jane M Landolin; Christopher A Bristow; Lijia Ma; Michael F Lin; Stefan Washietl; Bradley I Arshinoff; Ferhat Ay; Patrick E Meyer; Nicolas Robine; Nicole L Washington; Luisa Di Stefano; Eugene Berezikov; Christopher D Brown; Rogerio Candeias; Joseph W Carlson; Adrian Carr; Irwin Jungreis; Daniel Marbach; Rachel Sealfon; Michael Y Tolstorukov; Sebastian Will; Artyom A Alekseyenko; Carlo Artieri; Benjamin W Booth; Angela N Brooks; Qi Dai; Carrie A Davis; Michael O Duff; Xin Feng; Andrey A Gorchakov; Tingting Gu; Jorja G Henikoff; Philipp Kapranov; Renhua Li; Heather K MacAlpine; John Malone; Aki Minoda; Jared Nordman; Katsutomo Okamura; Marc Perry; Sara K Powell; Nicole C Riddle; Akiko Sakai; Anastasia Samsonova; Jeremy E Sandler; Yuri B Schwartz; Noa Sher; Rebecca Spokony; David Sturgill; Marijke van Baren; Kenneth H Wan; Li Yang; Charles Yu; Elise Feingold; Peter Good; Mark Guyer; Rebecca Lowdon; Kami Ahmad; Justen Andrews; Bonnie Berger; Steven E Brenner; Michael R Brent; Lucy Cherbas; Sarah C R Elgin; Thomas R Gingeras; Robert Grossman; Roger A Hoskins; Thomas C Kaufman; William Kent; Mitzi I Kuroda; Terry Orr-Weaver; Norbert Perrimon; Vincenzo Pirrotta; James W Posakony; Bing Ren; Steven Russell; Peter Cherbas; Brenton R Graveley; Suzanna Lewis; Gos Micklem; Brian Oliver; Peter J Park; Susan E Celniker; Steven Henikoff; Gary H Karpen; Eric C Lai; David M MacAlpine; Lincoln D Stein; Kevin P White; Manolis Kellis Journal: Science Date: 2010-12-22 Impact factor: 47.728
Authors: Eric E Schadt; Stephanie A Monks; Thomas A Drake; Aldons J Lusis; Nam Che; Veronica Colinayo; Thomas G Ruff; Stephen B Milligan; John R Lamb; Guy Cavet; Peter S Linsley; Mao Mao; Roland B Stoughton; Stephen H Friend Journal: Nature Date: 2003-03-20 Impact factor: 49.962
Authors: Joseph K Pickrell; John C Marioni; Athma A Pai; Jacob F Degner; Barbara E Engelhardt; Everlyne Nkadori; Jean-Baptiste Veyrieras; Matthew Stephens; Yoav Gilad; Jonathan K Pritchard Journal: Nature Date: 2010-03-10 Impact factor: 49.962
Authors: Jacob F Degner; John C Marioni; Athma A Pai; Joseph K Pickrell; Everlyne Nkadori; Yoav Gilad; Jonathan K Pritchard Journal: Bioinformatics Date: 2009-10-06 Impact factor: 6.937