Literature DB >> 22495509

Computational inference of mRNA stability from histone modification and transcriptome profiles.

Chengyang Wang1, Rui Tian, Qian Zhao, Han Xu, Clifford A Meyer, Cheng Li, Yong Zhang, X Shirley Liu.   

Abstract

Histone modifications play important roles in regulating eukaryotic gene expression and have been used to model expression levels. Here, we present a regression model to systematically infer mRNA stability by comparing transcriptome profiles with ChIP-seq of H3K4me3, H3K27me3 and H3K36me3. The results from multiple human and mouse cell lines show that the inferred unstable mRNAs have significantly longer 3'Untranslated Regions (UTRs) and more microRNA binding sites within 3'UTR than the inferred stable mRNAs. Regression residuals derived from RNA-seq, but not from GRO-seq, are highly correlated with the half-lives measured by pulse-labeling experiments, supporting the rationale of our inference. Whereas, the functions enriched in the inferred stable and unstable mRNAs are consistent with those from pulse-labeling experiments, we found the unstable mRNAs have higher cell-type specificity under functional constraint. We conclude that the systematical use of histone modifications can differentiate non-expressed mRNAs from unstable mRNAs, and distinguish stable mRNAs from highly expressed ones. In summary, we represent the first computational model of mRNA stability inference that compares transcriptome and epigenome profiles, and provides an alternative strategy for directing experimental measurements.

Entities:  

Mesh:

Substances:

Year:  2012        PMID: 22495509      PMCID: PMC3413115          DOI: 10.1093/nar/gks304

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

In eukaryotic cells, DNA winds around histone octamers to form the basic chromatin structure. Combinatorial modifications of the histone N-terminals, such as acetylation, methylation and phosphorylation, are related to distinct chromatin states (1–4). Histone modifications are also implicated in a wide range of biological processes, especially with transcriptional gene regulation (2,5,6). Transcription starts with the pre-initiation complex formation, then proceeds to a dynamic cycle of Pol II initiation, elongation and termination (5,7,8). All these events are closely associated with different histone modifications. For example, H3K4me3 is associated in transcription initiation (9), H3K36me3 with transcription elongation (10,11) and H3K27me3 with RNA polymerase pausing and elongation repression (12–14). Although it is unclear whether histone modifications are the cause or consequence of transcription (15), they do play a key role in regulating gene expression (1,16–19). A number of computational methods have been proposed to model histone modification profiles. These methods have been used to identify differential histone modification sites (20), find key transcription factors (21,22), predict tissue-specific gene regulation (23) and infer relationships among different histone modifications (24). Most notably, quantitative models have been developed to show that histone modifications are predictive of gene expression levels (25–27). Karlic et al. (27) proposed a linear regression model that successfully demonstrated the predictive power of histone marks with respect to gene expression. Cheng et al. (25) constructed support vector regression models to integrate histone modifications and reported that histone modifications and transcription factors are statistically redundant for predicting gene expression levels (26). These studies focused on the predictive power of their models but largely ignored the deviations between the model predictions and the mRNA levels. Steady-state mRNA levels, as assessed by microarrays or RNA-seq, represent the dynamic balance between transcript production and degradation at specific cell conditions. We assume that, in a non-stimulus environment, statistical models of histone modification levels could represent a stable transcription rate, and that the mRNA degradation rate is independent of the transcription rate. Under this assumption, the difference between the mRNA levels predicted from histone modifications and the mRNA levels measured from microarrays or RNA-seq could be used to infer RNA stability. The mRNAs that are more abundant than predicted from histone marks are inferred to be stable mRNAs with low degradation rates whereas those that are less abundant than predicted are inferred to be unstable mRNAs with high degradation rates. In this study, we proposed, for the first time, that the computational models systematically comparing epigenome and transcriptome profiles could be used to infer mRNA stability. For the epigenome, we used ChIP-seq data of three widely profiled histone methylations, H3K4me3, H3K27me3 and H3K36me3, all of which have well-defined roles in transcription. We selected specific features to optimize the linear models of the histone marks to fit the transcriptome profiles in multiple human (ENCODE: K562, GM12878, HepG2, HSMM, Huvec, NHEK) and mouse (MEF and NPC) cell lines. Using residuals between the mRNA levels predicted from histone marks and the levels measured by microarrays or RNA-seq, we inferred the mRNA stability in each cell line. We investigated the sequence features, functional characteristics and cell-type specificity of the stable and unstable mRNAs. Our computational inference yielded very consistent results with the mRNA half-life measurements from pulse-labeling experiments, supporting this method as a cost-effective alternative strategy.

MATERIALS AND METHODS

Data sources

The RefSeq gene annotations for hg19 and mm9 were downloaded from the UCSC genome browser (http://genome.ucsc.edu). The gene expression and ChIP-seq data were downloaded from the NCBI Gene Expression Omnibus database. The expression data include microarray data for the ENCODE cell lines K562, GM12878, HepG2, Huvec, HSMM and NHEK [Accession designation: GSE26312 (28)], RNA-seq and GRO-seq data for MEFs [GSE27843 (29), GSE27037 (30)] and RNA-seq data for NPCs [GSE20851 (31)]. The binding data for the histone modifications include ChIP-seq data for the ENCODE cells [GSE26320 (32)], mouse embryonic fibroblasts (MEFs) [GSE12241 (12)] and Neural progenitor cell (NPCs) [GSE16256 (33)]. The averaged half-life data of 5029 transcripts in NIH3T3 cells are from the Supplementary data of Global quantification of mammalian gene expression control (34).

Data pre-processing

The microarray data were processed by the limma package in the R program (35). The fastq files for the RNA-seq data were mapped back to the corresponding genome using hg19 or mm9 with tophat v1.1.4 (36). The FPKMs (fragments per kilobase of exon per million mappable fragments) were calculated as the expression level by cufflinks (37). The fastq files for the ChIP-seq data were mapped to the human (hg19) or mouse (mm9) genome by bowtie (38). A software to generate genome-wide mapped reads intensity and call peaks (MACS) was used to generate wiggle files (39).

Defining the histone modification features for linear regression

We defined 15 regions that cover 5 kb upstream of the transcription start sites (TSSs) to 1 kb downstream of the transcription termination sites (TTSs) for each gene, including up to 1000, up to 2000, up to 5000 (1000/2000/5000-bp regions upstream of the TSSs), TSS 1000, 2000, 2500 (2000, 4000 or 5000 bp centered at the TSSs), exon, gene-body (downstream of the TSS to upstream of the TTS), TTS region 30%, TTS region 50%, TTS region 70% (30, 50 and 70% of the length of the gene body upstream of the TTS), TTS 30%, TTS 50%, TTS 70% (0.3, 0.5, 0.7 % of the exons closest to the TTS), and TTS1000 (2000-bp regions centered at the TTSs). For each transcript, the read coverage of each histone modification in the 15 regions (read count per bp) was calculated and normalized according to the sequencing library sizes. To reflect the relative contribution of each histone modification in our model, each feature was centered and scaled to have a mean of 0 and a standard deviation of 1 in multiple transcripts. We take log2 (FPKM+1) as the expression index measured by RNA-seq so that the microarray and RNA-seq data are comparable. In total, 20 421 transcripts from ENCODE cell lines and 26 400 transcripts from MEFs and NPCs were used to build the linear models.

Studentized residuals

In our regression model: where the errors e are assumed to independently follow a normal distribution N(0, σ2). The residuals ê, unlike the errors, are the deviations between the model predictions and the mRNA levels. Under the assumption of the distribution of errors, the residuals may have different variances. To reasonably compare the residuals among multiple mRNAs, the residuals should be normalized to studentized residuals.

Prediction of microRNA-binding sites

The predicted microRNA binding sites were downloaded from TargetScanHuman (http://www.targetscan.org/vert_61/) and TargetScanMouse (http://www.targetscan.org/mmu_61/) (40). The TargetScan database includes both conserved and non-conserved targeting sites within the 3′UTRs.

Gene ontology analysis

The DAVID functional annotation tool (http://david.abcc.ncifcrf.gov/) was used to analyze the gene function enrichment (41). The P-value cutoff was set as 0.01.

RESULTS

Histone modifications are predictive of mRNA levels in multiple cell lines

We first investigated the quantitative relationship between histone modification and expression profiles in eight different human and mouse cell lines with consistent data format. The expression data are measures of exon arrays for six human ENCODE cell lines and RNA-seq for two mouse cell lines. The histone mark profiles include ChIP-seq of H3K4me3, H3K36me3 and H3K27me3, which are the most frequently profiled in published studies. To assign histone modification signals to genes, we used ChIP-seq reads from 5 kb upstream of TSSs to 1 kb downstream of TTSs. To identify the distinct histone mark features that are predictive of mRNA levels, we defined 15 features across each gene for each histone mark, such as read coverage in promoters, exons or tails of gene bodies (details in section ‘Materials and Methods’). We fitted the mRNA level as a linear combination of individual histone mark features: where N represents the normalized read coverage and e indicates error, which is assumed to independently follow a normal distribution. We tested all of the possible triple combinations of the three histone mark features and used the Bayesian information criterion (BIC) to evaluate the performance of each model. In the K562 cell line, the model with H3K4me3 reads in TSS1000 (defined as the 2000 bps centered at the TSSs), H3K36me3 in gene bodies and H3K27me3 in exons yielded the lowest and optimal BIC score. Our computational model with only three histone mark features fits very well with the mRNA levels in the K562 cell line, as measured by Pearson correlation (Figure 1A) and regression P-value (<2.2 e-16). The regression coefficients in the model (Figure 1B) indicate that H3K4me3 and H3K36me3 contribute most to the prediction model, consistent with its role in transcriptional initiation and elongation. We found that including non-linear terms in the model did not significantly increase the R-square fitting or change any of the downstream conclusions, so we kept the simple linear model for efficiency. The same method was applied to the other cell lines, and each produced a high correlation between the expression levels predicted from the histone marks and the mRNA levels (Supplementary Figure S1). The relative contributions of the individual histone marks to the regression model are also similar in different cells (Supplementary Figure S2).
Figure 1.

Fitting level and regression coefficients. (A) The scatter plot of the measured mRNA levels against the fitted RNA levels of the optimal model with the least BIC score in K562 cells. Each dot represents a transcript involved in our model. (B) The regression coefficients for H3K36me3, H3K27me3 and H3K4me3. The corresponding histone modification features are gene-body, exon and TSS1000 (1000 bp centered on the TSS), respectively.

Fitting level and regression coefficients. (A) The scatter plot of the measured mRNA levels against the fitted RNA levels of the optimal model with the least BIC score in K562 cells. Each dot represents a transcript involved in our model. (B) The regression coefficients for H3K36me3, H3K27me3 and H3K4me3. The corresponding histone modification features are gene-body, exon and TSS1000 (1000 bp centered on the TSS), respectively.

Deviations between model predictions and mRNA levels can infer mRNA stability

To investigate the fitness of our model, we examined the studentized residuals to identify mRNAs whose expression levels significantly deviate from the model predictions (details in section ‘Materials and Methods’, Figure 2A). For example, DNAJC24 and RERE exhibited similar profiles for all the three histone marks, yet their expression levels measured by microarrays differed by two orders of magnitude (Figure 2B). Because our model predicts expression from histone marks, it resulted in a high positive residual for DNAJC24 and a high negative residual for RERE.
Figure 2.

Definition of inferred stable RNAs and inferred unstable RNAs. (A) The mRNAs that are more highly expressed relative to the prediction values (studentized residual >1) are defined as stable mRNAs and shown as red dots. Conversely, the blue dots represent the mRNAs with studentized residual <−1, referred to as unstable. (B) Two randomly selected transcripts, NM_001042682 and NM_181706, separately from unstable RNAs and stable RNAs. The distributions of H3K36me3, H3K27me3 and H3K4me3 over these two genes are visualized below. The y-axes represent the read counts. Note that these two genes are quite similar in overall histone modification patterns and are predicted to have similar mRNA levels in the regression model (NM_001042682: 6.20; NM_181706: 5.95). Nevertheless, their real mRNA levels, as measured by microarray, are significantly different (NM_001042682: 3.58; NM_181706: 8.68).

Definition of inferred stable RNAs and inferred unstable RNAs. (A) The mRNAs that are more highly expressed relative to the prediction values (studentized residual >1) are defined as stable mRNAs and shown as red dots. Conversely, the blue dots represent the mRNAs with studentized residual <−1, referred to as unstable. (B) Two randomly selected transcripts, NM_001042682 and NM_181706, separately from unstable RNAs and stable RNAs. The distributions of H3K36me3, H3K27me3 and H3K4me3 over these two genes are visualized below. The y-axes represent the read counts. Note that these two genes are quite similar in overall histone modification patterns and are predicted to have similar mRNA levels in the regression model (NM_001042682: 6.20; NM_181706: 5.95). Nevertheless, their real mRNA levels, as measured by microarray, are significantly different (NM_001042682: 3.58; NM_181706: 8.68). Previous studies have shown that the three histone modifications used in our model are tightly associated with the transcription cycle (9–14). Our computational model based on these histone marks is statistically correlated with mRNA levels. Therefore, we speculate that the model prediction somewhat represents a stable transcription level. In contrast, the mRNA level, as measured by microarrays or RNA-seq, reflects a balance between transcript production and degradation. Assuming that the degradation rate is independent of the transcription rate, we hypothesize that mRNAs with high negative and positive studentized residuals represent those that are degraded faster (defined as the unstable group, with residual <−1) and slower (defined as the stable group, with residual >1), respectively (Figure 2A).

Inferred unstable mRNAs harbor sequence signatures associated with more microRNA targeting

It is well known that microRNAs, a type of endogenous ∼22-nt RNAs, play an important role in posttranscriptional gene regulation (42). These RNAs direct mRNA degradation, primarily by matching ∼7 nt seeds to the 3′UTRs of target mRNAs (43). Although recent techniques, such as HITS-CLIP and PAR-CLIP, could accurately measure genome-wide microRNA target sites (44–48), such data are not available for the cell lines in our study (49). Because TargetScan is among the best and most widely used tools for the computational prediction of microRNA targets, we used its predictions to investigate the microRNA targeting of our inferred mRNAs (40). Studies have shown that proliferating and cancer cells express mRNA isoforms with shorter 3′UTRs (50,51) and that mRNAs with longer 3′UTRs are more likely to be targeted by microRNAs (52). Indeed, we found that the inferred unstable mRNAs have significantly longer 3′UTRs than the stable ones in all of the cell lines (Figure 3A–B and Supplementary Figure S3). In addition, the 3′UTRs of the unstable mRNAs harbor significantly larger numbers of all (conserved + non-conserved) microRNA-binding sites in multiple cell lines (Figure 3C–D and Supplementary Figure S4). Additionally, in the ENCODE cell lines, unstable mRNAs have larger numbers and a higher density of conserved microRNA binding sites within their 3′UTRs (density refers to the amount of binding sites divided by the 3′UTR length) than stable mRNAs (Supplementary Figures S5 and S6). These results suggest that the inferred unstable mRNAs are preferentially targeted by microRNAs.
Figure 3.

The sequence signatures and half-lives correlations of model inference. The boxplots in (A–D) represent the distribution of certain sequence signatures for specific groups (unstable or stable ones), and the P-values were calculated by the Wilcox-test. These boxplots indicate that the inferred unstable mRNAs have sequence signatures in favor of post-transcriptional degradation. (A–B) The comparison of 3′UTR length in K562 cells and in MEFs, separately. (C–D) The comparison of the number of all binding sites (conserved + non-conserved) within the 3′UTRs in K562 cells and in MEFs, separately. (E) The half-lives of mRNAs are positively correlated with studentized residuals in RNA-seq model, in which all of the transcripts are modeled on histone marks to fit the mRNA levels measured by RNA-seq. ρ refers to Pearson Correlation coefficients. (F) The half-lives are little correlated with the studentized residuals in a GRO-seq model, in which histone modifications are fitted to nascent mRNA levels.

The sequence signatures and half-lives correlations of model inference. The boxplots in (A–D) represent the distribution of certain sequence signatures for specific groups (unstable or stable ones), and the P-values were calculated by the Wilcox-test. These boxplots indicate that the inferred unstable mRNAs have sequence signatures in favor of post-transcriptional degradation. (A–B) The comparison of 3′UTR length in K562 cells and in MEFs, separately. (C–D) The comparison of the number of all binding sites (conserved + non-conserved) within the 3′UTRs in K562 cells and in MEFs, separately. (E) The half-lives of mRNAs are positively correlated with studentized residuals in RNA-seq model, in which all of the transcripts are modeled on histone marks to fit the mRNA levels measured by RNA-seq. ρ refers to Pearson Correlation coefficients. (F) The half-lives are little correlated with the studentized residuals in a GRO-seq model, in which histone modifications are fitted to nascent mRNA levels.

Half-lives are highly correlated with residuals in an RNA-seq model but independent of those in a GRO-seq model

Recently, many studies (53–56) have used pulse labeling to examine the half-lives and degradation rates of mRNAs. In these studies, a pulse of radioactive nucleotides is used to label newly synthesized RNA. By measuring radioactive mRNA and total mRNA levels over a time course, researchers can accurately compute mRNA synthesis and degradation rates. Schwanhäusser B et al. (53) reported the half-life data for 5028 mRNAs in the mouse embryonic fibroblast cell line NIH3T3. Because ChIP-seq data for H3K4me3, H3K27me3 and H3K36me3 in NIH3T3 are not publicly available, we compared their results with our computationally predicted degradation rates from another mouse embryonic fibroblast cell line, MEFs. In MEFs, the correlation between the mRNA levels measured by RNA-seq and our model predictions from the histone marks in MEFs is 0.7 for all of the transcripts. The reported half-lives (53) are significantly correlated with the residuals between the mRNA levels and the model predictions (ρ = 0.322, P-value < 2.2e-16, Figure 3E), supporting the rationale for our definitions of stable and unstable transcripts. When we incorporated the half-life information to re-derive a regression model for the 5028 mRNAs with half-life data, the model was able to better fit the mRNA levels (R = 0.54 for the model based only on histone modifications, r = 0.62 for the one based on both histone marks and half-lives). To further validate our predictions of stable as compared with unstable transcripts, we examined the GRO-seq data from MEFs. GRO-seq is a global run-on experiment to measure nuclear nascent RNAs that are associated with transcriptionally engaged polymerases (57). We refitted the regression model using H3K4me3, H3K27me3 and H3K36me3 to the GRO-seq data in MEFs and examined the residual between GRO-seq and the model prediction. Because GRO-seq effectively measures nascent mRNA production, the residues derived from GRO-seq should be independent of the mRNA degradation rates and half-lives. Indeed, we found little correlation between the GRO-seq model residuals and the half-lives reported by pulse-labeling experiments (ρ = 0.0058, Figure 3F). This observation further validates the idea that the residuals can be used to infer transcript stability, which is independent of the transcription rate.

Inferred unstable and stable mRNAs are enriched for distinct biological processes

To investigate whether genes with varying mRNA degradation rates are enriched for specific biological processes or functions, we conducted a Gene Ontology (GO) analysis for the inferred stable and unstable mRNAs. We found that the inferred unstable mRNAs are consistently enriched in transcription, regulation of RNA metabolism and chromatin modification in all the cell lines examined (Figure 4A, the complete list is shown on Supplementary Table S1). This finding is in agreement with previous studies that identified many transcription factors (e.g. Klf7, Dmtf1), especially those targeted by microRNAs (e.g. Foxo1, Hif1a, p53), to have short mRNA half-lives (54).
Figure 4.

Differentially Enriched GO Terms for unstable mRNAs (A) and stable mRNAs (B) in multiple ENCODE and mouse cell lines. The y-axis represents a negative logarithmic scale of the P-values, so the higher they are, the more significantly the corresponding GO term is enriched.

Differentially Enriched GO Terms for unstable mRNAs (A) and stable mRNAs (B) in multiple ENCODE and mouse cell lines. The y-axis represents a negative logarithmic scale of the P-values, so the higher they are, the more significantly the corresponding GO term is enriched. In contrast, the inferred stable mRNAs are also consistently enriched in constitutive cellular processes in all of the cell lines. These processes include the generation of precursor metabolites and energy, translation, oxidation-reduction, oxidative phosphorylation, cellular respiration and the electron transport chain (Figure 4B, the complete list is shown in Supplementary Table S2). Translation and protein synthesis have been implicated to account for >90% of cellular energy consumption (53), and electron transport chains are also known to maximize electron flux and minimize energy expenditure (58). These findings suggest that the stable mRNAs are all enriched in constitutive cellular processes that are associated with energy. Notably, our GO analysis results for both stable and unstable mRNAs are very consistent with those from pulse-labeling experiments (53), which further validates our computational inference.

Inferred unstable mRNAs have higher cell-type specificity under functional constraint

To investigate the consistency of the stable and unstable mRNAs in different cell conditions, we compared the mRNAs with the inferred stability from different ENCODE cell lines that fall in the same GO terms and calculated their overlaps. We found that the unstable mRNAs under each enriched GO term significantly overlap among different cell lines (pairwise overlaps >50%, hyper-geometric test, P-values <2.2e-16). In addition, the stable mRNAs also show significant pairwise overlap, and the overlap level is much higher than that in the unstable mRNAs (Figure 5). This difference might arise from the cell-type-specific expression of microRNAs and RNA-binding proteins involved in RNA degradation. These results suggest that, whereas the functional processes of stable and unstable mRNAs are consistent among different cell types, the unstable mRNAs have much higher cell-type specificity.
Figure 5.

Unstable mRNAs have higher cell-type specificity than stable ones under functional constraint. The mRNAs with inferred stability from different ENCODE cells that fall onto the same GO terms are pairwise compared, and the overlapped proportions were calculated in each term separately (each overlap corresponds to two proportions based on a pair of compared sets). The unstable gene sets are shown in blue and the stable gene sets are shown in red. The boxplot represents the distribution of the pairwise overlap proportions under the GO term.

Unstable mRNAs have higher cell-type specificity than stable ones under functional constraint. The mRNAs with inferred stability from different ENCODE cells that fall onto the same GO terms are pairwise compared, and the overlapped proportions were calculated in each term separately (each overlap corresponds to two proportions based on a pair of compared sets). The unstable gene sets are shown in blue and the stable gene sets are shown in red. The boxplot represents the distribution of the pairwise overlap proportions under the GO term.

Histone modifications are informative for inferring mRNA stability

Because our model residuals have a positive correlation with the measured mRNA levels, as shown in Figure 2A, one might question of whether the inferred mRNA stability arises solely from gene expression levels. To assess this possibility, we defined four zones in the residual plot of MEFs (Figure 6A). Zone A accounts for 44% of the total 26 400 expressed transcripts in MEFs and is significantly enriched in functions with unclear half-lives (Figure 6A, top left). In fact, most of the mRNAs (10 859/11 572) in Zone A are not expressed at all (the GRO-seq FPKMs are equal to 0).
Figure 6.

Histone modification levels are informative for inferring mRNA stability. (A) Zones A–D are defined in the middle plot according to the measured RNA level (FPKM cutoff of the mRNAs with low expression is 1, whereas that of the highly expressed mRNAs is 7) and the studentized residual (cutoff is −1 and 1). Zone A refers to mRNAs with lower abundances and absolute value of studentized residuals <1. Those with lower abundances and residuals <−1 constitute Zone B. Zone C is defined as highly expressed RNAs with residuals >1, whereas Zone D is composed of those highly expressed mRNAs with absolute values of studentized residuals <1. The four tables contain GO significantly enriched for Zones A–D, respectively. The red color denotes that mRNAs involved in the GO term tend to have longer half-lives, whereas the blue color indicates shorter half-lives. Green denotes no tendency on half-lives, and black means that the information about half-lives is unclear. All the half-life data are from Schwanhäusser B et al. (B) Comparing mRNA half-lives between Zone C and Zone D. The half-life data are available for 48% of the mRNAs in Zone C and 53% in Zone D, separately.

Histone modification levels are informative for inferring mRNA stability. (A) Zones A–D are defined in the middle plot according to the measured RNA level (FPKM cutoff of the mRNAs with low expression is 1, whereas that of the highly expressed mRNAs is 7) and the studentized residual (cutoff is −1 and 1). Zone A refers to mRNAs with lower abundances and absolute value of studentized residuals <1. Those with lower abundances and residuals <−1 constitute Zone B. Zone C is defined as highly expressed RNAs with residuals >1, whereas Zone D is composed of those highly expressed mRNAs with absolute values of studentized residuals <1. The four tables contain GO significantly enriched for Zones A–D, respectively. The red color denotes that mRNAs involved in the GO term tend to have longer half-lives, whereas the blue color indicates shorter half-lives. Green denotes no tendency on half-lives, and black means that the information about half-lives is unclear. All the half-life data are from Schwanhäusser B et al. (B) Comparing mRNA half-lives between Zone C and Zone D. The half-life data are available for 48% of the mRNAs in Zone C and 53% in Zone D, separately. Compared with Zone A, the mRNAs in Zone B are expressed at similarly low levels. Nevertheless, significantly larger numbers of mRNAs in Zone B are actually expressed (the GRO-seq FPKMs of 53.81% of the transcripts in Zone B are not equal to 0 versus 6.16% for Zone A, two proportion z-test, P-value <2.2e-16). In addition, the functional enrichments from Zone B are consistent with those from the unstable transcripts obtained from pulse-labeling experiments (Figure 6A, bottom left). In fact, a large proportion of genes are expressed at very low levels in most of the cells, e.g. the RNA-seq levels of 55% of the mRNAs are less than one FPKM in MEFs. Our analysis suggests that our computational model from histone marks could differentiate two distinct classes of genes with low expression levels: the majority is transcriptionally silenced (Zone A), and the remainders are efficiently transcribed but rapidly degraded (Zone B). Similarly, histone modification levels are informative to distinguish the more stable mRNAs from the highly expressed ones. The mRNAs in Zone C are enriched for constitutive gene functions, whereas the mRNA sets identified as unstable from pulse-labeling experiments are significantly enriched in Zone D (Figure 6A). Furthermore, the mRNAs in Zone C have significantly longer half-lives than those in Zone D (P-value = 1.6e-86, Figure 6B). In summary, the inclusion of histone modifications can differentiate between silenced mRNAs and unstable mRNAs and distinguish stable mRNAs from highly expressed ones.

DISCUSSION

In this article, we report the first computational approach to systematically infer global mRNA stability on the basis of a comparison of mRNA levels and histone modification profiles. Three lines of evidence support our inference on mRNA stability. First, the inferred unstable mRNAs harbor sequence signatures associated with more microRNA targeting. Second, the half-lives are positively correlated with the residuals when our computational model is fitted to RNA-seq but independent of the residuals when fitted to GRO-seq. Third, functional annotations of the enriched gene sets produced consistent results with previous reports based on pulse-labeling experiments. Our analysis conducted on multiple human and a mouse cell lines suggests that unstable mRNAs have higher cell-type specificity than stable ones. Finally, we found histone modification levels can distinguish unstable mRNAs from silenced ones and differentiate stable mRNAs from highly expressed ones. Histone modifications are implicated in transcriptional regulation in eukaryotic cells, and a number of reports have illustrated the predictive power of integrating multiple histone modifications on gene expression (25–27). Our study showed that carefully selected features combining only three histone modifications, H3K4me3, H3K27me3 and H3K36me3, could already explain 40–50% of the mRNA expression variance. We chose these three modifications because they are the most widely profiled and have been shown to be the most informative for gene expression prediction (26). As genome-wide histone modification profiles accumulate over time (59,60), we could improve our model by integrating more histone modifications. It is worth noting that in this study we evaluate significance mostly in the statistical sense, and biological significance needs to be confirmed by further experimental investigations. Our computational model relies on two important assumptions, both of which depend on steady-state conditions. The first assumption is that histone modification is reflective of the steady-state transcription rate. In MCF7 cells, E2 treatment has significant transcriptional effect in only 10 min (61). Thus, transient transcriptional changes in response to outside stimuli may be faster than changes of the three histone marks we selected. The second assumption is that mRNA transcription rates and degradation rates are independent. This assumption is supported by the observation that the transcription rate measured by GRO-seq has little correlation (0.0058) with the half-lives measured by pulse-labeling experiments. Upon cell differentiation or environmental stimulation, the transcription rate could be coupled with the degradation rate (56,62–64). Further studies are needed to refine the computational models for better prediction of degradation during non-steady-state conditions. Steady-state mRNA levels represent a balance between transcription and degradation. Although our analysis revealed that unstable mRNAs are significantly more likely to be targeted by microRNAs, incorporating microRNA binding sites or AU-rich elements within the 3′UTRs (65) only marginally improved the fitness of our model. For mRNAs with available half-life data, the most informative sequence feature (the number of all microRNA binding sites within the 3′UTR) merely explains 1% of the variance of the residuals whereas half-lives could explain 13% of the variance. This finding suggests that other factors, such as RNA-binding proteins or the secondary structure of the 3′UTR, may also be involved in regulating mRNA stability and decay (65–67). In summary, we propose the first computational method for inferring mRNA stability by comparing transcriptome and histone modification profiles. As histone mark ChIP-seq data continue to grow, our approach provides a cost-effective alternative to the direct measurement of RNA stability by pulse-labeling experiments (53–56) or transcriptional inhibition (62,68,69).

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online: Supplementary Tables 1–2 and Supplementary Figures 1–6.

FUNDING

National Basic Research [973] Program of China No. [2010CB944904], National Natural Science Foundation of China NO. [31028011] and NIH grant [HG4069]. Funding for open access charge: National Basic Research [973] Program of China No. [2010CB944904], National Natural Science Foundation of China No. [31028011] and NIH grant [HG4069]. Conflict of interest statement. None declared.
  66 in total

1.  Targeted recruitment of Set1 histone methylase by elongating Pol II provides a localized mark and memory of recent transcriptional activity.

Authors:  Huck Hui Ng; François Robert; Richard A Young; Kevin Struhl
Journal:  Mol Cell       Date:  2003-03       Impact factor: 17.970

2.  Genome-wide analysis of mRNA decay in resting and activated primary human T lymphocytes.

Authors:  Arvind Raghavan; Rachel L Ogilvie; Cavan Reilly; Michelle L Abelson; Shalini Raghavan; Jayprakash Vasdewani; Mitchell Krathwohl; Paul R Bohjanen
Journal:  Nucleic Acids Res       Date:  2002-12-15       Impact factor: 16.971

3.  An important role for RNase R in mRNA decay.

Authors:  Zhuan-Fen Cheng; Murray P Deutscher
Journal:  Mol Cell       Date:  2005-01-21       Impact factor: 17.970

4.  Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome.

Authors:  Nathaniel D Heintzman; Rhona K Stuart; Gary Hon; Yutao Fu; Christina W Ching; R David Hawkins; Leah O Barrera; Sara Van Calcar; Chunxu Qu; Keith A Ching; Wei Wang; Zhiping Weng; Roland D Green; Gregory E Crawford; Bing Ren
Journal:  Nat Genet       Date:  2007-02-04       Impact factor: 38.330

5.  Proliferating cells express mRNAs with shortened 3' untranslated regions and fewer microRNA target sites.

Authors:  Rickard Sandberg; Joel R Neilson; Arup Sarma; Phillip A Sharp; Christopher B Burge
Journal:  Science       Date:  2008-06-20       Impact factor: 47.728

6.  Combinatorial patterns of histone acetylations and methylations in the human genome.

Authors:  Zhibin Wang; Chongzhi Zang; Jeffrey A Rosenfeld; Dustin E Schones; Artem Barski; Suresh Cuddapah; Kairong Cui; Tae-Young Roh; Weiqun Peng; Michael Q Zhang; Keji Zhao
Journal:  Nat Genet       Date:  2008-06-15       Impact factor: 38.330

7.  Histone modification levels are predictive for gene expression.

Authors:  Rosa Karlić; Ho-Ryun Chung; Julia Lasserre; Kristian Vlahovicek; Martin Vingron
Journal:  Proc Natl Acad Sci U S A       Date:  2010-02-01       Impact factor: 11.205

Review 8.  MicroRNA control of signal transduction.

Authors:  Masafumi Inui; Graziano Martello; Stefano Piccolo
Journal:  Nat Rev Mol Cell Biol       Date:  2010-03-10       Impact factor: 94.444

9.  Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

Authors:  Ewan Birney; John A Stamatoyannopoulos; Anindya Dutta; Roderic Guigó; Thomas R Gingeras; Elliott H Margulies; Zhiping Weng; Michael Snyder; Emmanouil T Dermitzakis; Robert E Thurman; Michael S Kuehn; Christopher M Taylor; Shane Neph; Christoph M Koch; Saurabh Asthana; Ankit Malhotra; Ivan Adzhubei; Jason A Greenbaum; Robert M Andrews; Paul Flicek; Patrick J Boyle; Hua Cao; Nigel P Carter; Gayle K Clelland; Sean Davis; Nathan Day; Pawandeep Dhami; Shane C Dillon; Michael O Dorschner; Heike Fiegler; Paul G Giresi; Jeff Goldy; Michael Hawrylycz; Andrew Haydock; Richard Humbert; Keith D James; Brett E Johnson; Ericka M Johnson; Tristan T Frum; Elizabeth R Rosenzweig; Neerja Karnani; Kirsten Lee; Gregory C Lefebvre; Patrick A Navas; Fidencio Neri; Stephen C J Parker; Peter J Sabo; Richard Sandstrom; Anthony Shafer; David Vetrie; Molly Weaver; Sarah Wilcox; Man Yu; Francis S Collins; Job Dekker; Jason D Lieb; Thomas D Tullius; Gregory E Crawford; Shamil Sunyaev; William S Noble; Ian Dunham; France Denoeud; Alexandre Reymond; Philipp Kapranov; Joel Rozowsky; Deyou Zheng; Robert Castelo; Adam Frankish; Jennifer Harrow; Srinka Ghosh; Albin Sandelin; Ivo L Hofacker; Robert Baertsch; Damian Keefe; Sujit Dike; Jill Cheng; Heather A Hirsch; Edward A Sekinger; Julien Lagarde; Josep F Abril; Atif Shahab; Christoph Flamm; Claudia Fried; Jörg Hackermüller; Jana Hertel; Manja Lindemeyer; Kristin Missal; Andrea Tanzer; Stefan Washietl; Jan Korbel; Olof Emanuelsson; Jakob S Pedersen; Nancy Holroyd; Ruth Taylor; David Swarbreck; Nicholas Matthews; Mark C Dickson; Daryl J Thomas; Matthew T Weirauch; James Gilbert; Jorg Drenkow; Ian Bell; XiaoDong Zhao; K G Srinivasan; Wing-Kin Sung; Hong Sain Ooi; Kuo Ping Chiu; Sylvain Foissac; Tyler Alioto; Michael Brent; Lior Pachter; Michael L Tress; Alfonso Valencia; Siew Woh Choo; Chiou Yu Choo; Catherine Ucla; Caroline Manzano; Carine Wyss; Evelyn Cheung; Taane G Clark; James B Brown; Madhavan Ganesh; Sandeep Patel; Hari Tammana; Jacqueline Chrast; Charlotte N Henrichsen; Chikatoshi Kai; Jun Kawai; Ugrappa Nagalakshmi; Jiaqian Wu; Zheng Lian; Jin Lian; Peter Newburger; Xueqing Zhang; Peter Bickel; John S Mattick; Piero Carninci; Yoshihide Hayashizaki; Sherman Weissman; Tim Hubbard; Richard M Myers; Jane Rogers; Peter F Stadler; Todd M Lowe; Chia-Lin Wei; Yijun Ruan; Kevin Struhl; Mark Gerstein; Stylianos E Antonarakis; Yutao Fu; Eric D Green; Ulaş Karaöz; Adam Siepel; James Taylor; Laura A Liefer; Kris A Wetterstrand; Peter J Good; Elise A Feingold; Mark S Guyer; Gregory M Cooper; George Asimenos; Colin N Dewey; Minmei Hou; Sergey Nikolaev; Juan I Montoya-Burgos; Ari Löytynoja; Simon Whelan; Fabio Pardi; Tim Massingham; Haiyan Huang; Nancy R Zhang; Ian Holmes; James C Mullikin; Abel Ureta-Vidal; Benedict Paten; Michael Seringhaus; Deanna Church; Kate Rosenbloom; W James Kent; Eric A Stone; Serafim Batzoglou; Nick Goldman; Ross C Hardison; David Haussler; Webb Miller; Arend Sidow; Nathan D Trinklein; Zhengdong D Zhang; Leah Barrera; Rhona Stuart; David C King; Adam Ameur; Stefan Enroth; Mark C Bieda; Jonghwan Kim; Akshay A Bhinge; Nan Jiang; Jun Liu; Fei Yao; Vinsensius B Vega; Charlie W H Lee; Patrick Ng; Atif Shahab; Annie Yang; Zarmik Moqtaderi; Zhou Zhu; Xiaoqin Xu; Sharon Squazzo; Matthew J Oberley; David Inman; Michael A Singer; Todd A Richmond; Kyle J Munn; Alvaro Rada-Iglesias; Ola Wallerman; Jan Komorowski; Joanna C Fowler; Phillippe Couttet; Alexander W Bruce; Oliver M Dovey; Peter D Ellis; Cordelia F Langford; David A Nix; Ghia Euskirchen; Stephen Hartman; Alexander E Urban; Peter Kraus; Sara Van Calcar; Nate Heintzman; Tae Hoon Kim; Kun Wang; Chunxu Qu; Gary Hon; Rosa Luna; Christopher K Glass; M Geoff Rosenfeld; Shelley Force Aldred; Sara J Cooper; Anason Halees; Jane M Lin; Hennady P Shulha; Xiaoling Zhang; Mousheng Xu; Jaafar N S Haidar; Yong Yu; Yijun Ruan; Vishwanath R Iyer; Roland D Green; Claes Wadelius; Peggy J Farnham; Bing Ren; Rachel A Harte; Angie S Hinrichs; Heather Trumbower; Hiram Clawson; Jennifer Hillman-Jackson; Ann S Zweig; Kayla Smith; Archana Thakkapallayil; Galt Barber; Robert M Kuhn; Donna Karolchik; Lluis Armengol; Christine P Bird; Paul I W de Bakker; Andrew D Kern; Nuria Lopez-Bigas; Joel D Martin; Barbara E Stranger; Abigail Woodroffe; Eugene Davydov; Antigone Dimas; Eduardo Eyras; Ingileif B Hallgrímsdóttir; Julian Huppert; Michael C Zody; Gonçalo R Abecasis; Xavier Estivill; Gerard G Bouffard; Xiaobin Guan; Nancy F Hansen; Jacquelyn R Idol; Valerie V B Maduro; Baishali Maskeri; Jennifer C McDowell; Morgan Park; Pamela J Thomas; Alice C Young; Robert W Blakesley; Donna M Muzny; Erica Sodergren; David A Wheeler; Kim C Worley; Huaiyang Jiang; George M Weinstock; Richard A Gibbs; Tina Graves; Robert Fulton; Elaine R Mardis; Richard K Wilson; Michele Clamp; James Cuff; Sante Gnerre; David B Jaffe; Jean L Chang; Kerstin Lindblad-Toh; Eric S Lander; Maxim Koriabine; Mikhail Nefedov; Kazutoyo Osoegawa; Yuko Yoshinaga; Baoli Zhu; Pieter J de Jong
Journal:  Nature       Date:  2007-06-14       Impact factor: 49.962

10.  starBase: a database for exploring microRNA-mRNA interaction maps from Argonaute CLIP-Seq and Degradome-Seq data.

Authors:  Jian-Hua Yang; Jun-Hao Li; Peng Shao; Hui Zhou; Yue-Qin Chen; Liang-Hu Qu
Journal:  Nucleic Acids Res       Date:  2010-10-30       Impact factor: 16.971

View more
  11 in total

1.  Uniform, optimal signal processing of mapped deep-sequencing data.

Authors:  Vibhor Kumar; Masafumi Muratani; Nirmala Arul Rayan; Petra Kraus; Thomas Lufkin; Huck Hui Ng; Shyam Prabhakar
Journal:  Nat Biotechnol       Date:  2013-06-16       Impact factor: 54.908

2.  Dynamics of transcriptional and post-transcriptional regulation.

Authors:  Mattia Furlan; Stefano de Pretis; Mattia Pelizzola
Journal:  Brief Bioinform       Date:  2021-07-20       Impact factor: 11.622

3.  Coordinated Regulation of PPARγ Expression and Activity through Control of Chromatin Structure in Adipogenesis and Obesity.

Authors:  Jérôme Eeckhoute; Frédérik Oger; Bart Staels; Philippe Lefebvre
Journal:  PPAR Res       Date:  2012-09-06       Impact factor: 4.964

4.  Analysis of RNA decay factor mediated RNA stability contributions on RNA abundance.

Authors:  Sho Maekawa; Naoto Imamachi; Takuma Irie; Hidenori Tani; Kyoko Matsumoto; Rena Mizutani; Katsutoshi Imamura; Miho Kakeda; Tetsushi Yada; Sumio Sugano; Yutaka Suzuki; Nobuyoshi Akimitsu
Journal:  BMC Genomics       Date:  2015-03-06       Impact factor: 3.969

5.  Inferring chromatin-bound protein complexes from genome-wide binding assays.

Authors:  Eugenia G Giannopoulou; Olivier Elemento
Journal:  Genome Res       Date:  2013-04-03       Impact factor: 9.043

6.  Integrative analysis of single-cell expression data reveals distinct regulatory states in bidirectional promoters.

Authors:  Fatemeh Behjati Ardakani; Kathrin Kattler; Karl Nordström; Nina Gasparoni; Gilles Gasparoni; Sarah Fuchs; Anupam Sinha; Matthias Barann; Peter Ebert; Jonas Fischer; Barbara Hutter; Gideon Zipprich; Charles D Imbusch; Bärbel Felder; Jürgen Eils; Benedikt Brors; Thomas Lengauer; Thomas Manke; Philip Rosenstiel; Jörn Walter; Marcel H Schulz
Journal:  Epigenetics Chromatin       Date:  2018-11-10       Impact factor: 4.954

7.  Influence of untranslated regions on retroviral mRNA transfer and expression.

Authors:  Anne Prel; Luc Sensébé; Jean-Christophe Pagès
Journal:  BMC Biotechnol       Date:  2013-04-16       Impact factor: 2.563

8.  Predicting expression: the complementary power of histone modification and transcription factor binding data.

Authors:  David M Budden; Daniel G Hurley; Joseph Cursons; John F Markham; Melissa J Davis; Edmund J Crampin
Journal:  Epigenetics Chromatin       Date:  2014-11-24       Impact factor: 4.954

9.  Global intron retention mediated gene regulation during CD4+ T cell activation.

Authors:  Ting Ni; Wenjing Yang; Miao Han; Yubo Zhang; Ting Shen; Hongbo Nie; Zhihui Zhou; Yalei Dai; Yanqin Yang; Poching Liu; Kairong Cui; Zhouhao Zeng; Yi Tian; Bin Zhou; Gang Wei; Keji Zhao; Weiqun Peng; Jun Zhu
Journal:  Nucleic Acids Res       Date:  2016-07-01       Impact factor: 16.971

10.  The integrated landscape of causal genes and pathways in schizophrenia.

Authors:  Changguo Ma; Chunjie Gu; Yongxia Huo; Xiaoyan Li; Xiong-Jian Luo
Journal:  Transl Psychiatry       Date:  2018-03-15       Impact factor: 6.222

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.