Maïwen Caudron-Herger1, Peter R Cook2, Karsten Rippe1, Argyris Papantonis3. 1. Deutsches Krebsforschungszentrum (DKFZ) & BioQuant, D-69120 Heidelberg, Germany. 2. Sir William Dunn School of Pathology, University of Oxford, OX1 3RE Oxford, UK. 3. Sir William Dunn School of Pathology, University of Oxford, OX1 3RE Oxford, UK Center for Molecular Medicine, University of Cologne, D-50931 Cologne, Germany argyris.papantonis@uni-koeln.de.
Abstract
While mapping total and poly-adenylated human transcriptomes has now become routine, characterizing nascent transcripts remains challenging, largely because nascent RNAs have such short half-lives. Here, we describe a simple, fast and cost-effective method to isolate RNA associated with transcription factories, the sites responsible for the majority of nuclear transcription. Following stimulation of human endothelial cells with the pro-inflammatory cytokine TNFα, we isolate and analyse the RNA content of factories by sequencing. Comparison with total, poly(A)(+) and chromatin RNA fractions reveals that sequencing of purified factory RNA maps the complete nascent transcriptome; it is rich in intronic unprocessed transcript, as well as long intergenic non-coding (lincRNAs) and enhancer-associated RNAs (eRNAs), micro-RNA precursors and repeat-derived RNAs. Hence, we verify that transcription factories produce most nascent RNA and confer a regulatory role via their association with a set of specifically-retained non-coding transcripts.
In recent years, the human transcriptome has been repeatedly revisited and found to be more complex than expected, mainly due to the large fraction of non-coding RNA (ncRNA) (1). Besides the classical ribosomal, transfer and small nucleolar RNAs, many other non-protein-coding RNAs have been uncovered (2,3) including micro-RNAs (miRNAs)(4), long inter-genic non-coding RNAs (lincRNAs)(5), copies of repeats (6,7), as well as short transcripts in and around transcription start sites (TSSs)(8,9), active enhancers (eRNAs)(10–12) and introns (13). In order to identify changes of this complex mixture of RNAs in a cell population, it is highly informative to map the dynamics of the nascent transcriptome. Most nascent transcripts have short half-lives and are variably processed, and studies have shown that their profiles change rapidly. Various approaches have been described for assessing nascent transcriptomes, including ‘GRO-seq’ (8), ‘NET-seq’ (14), ‘chromatin RNA-seq’ (15), ‘poly(A)-depleted RNA-seq’ (16,17) and the metabolic tagging of newly-made RNA using 4-thiouridine (18,19). Each has particular shortcomings and, in general, all are laborious and/or require a high ‘sequencing depth’; most also focus on just parts of the transcriptome.
Here we introduce a simple, fast and cost-effective method to isolate and sequence nascent RNAs. It is based on the biochemical isolation of ‘transcription factories’, the megadalton-sized complexes that harbour >95% of all nuclear transcription (20–22).
Earlier controversy on the extent to which factories contribute to the act of transcription (23) has been eased by recent studies confirming that (i) RNA synthesis in eukaryotic nuclei occurs in discrete foci that contain several active transcription units (24–26), (ii) factories can be isolated biochemically and their protein constituents catalogued by mass-spectrometry (27) and (iii) co-regulated genes and their enhancers that lie distant in genomic space often come together in 3D nuclear space when transcribed (28–33).
Here, by analysing the RNAs differentially produced by (and associated with) factories upon stimulation of human umbilical vein endothelial cells (HUVECs) with the pro-inflammatory cytokine TNFα, we validate that they represent the transcriptional hotspots of the nucleoplasm. Moreover, by comparison to different sub-cellular RNA fractions, we identify particular long and short non-coding transcripts that could account for much of the processing and regulatory activities attributed to transcription factories.
MATERIALS AND METHODS
Cell culture
HUVECs from pooled donors (Lonza), grown to 90% confluence in Endothelial Basal Medium 2-MV with supplements (EBM; Lonza) and 5% fetal bovine serum (FBS), were ‘starved’ for 16 h in EBM + 0.5% FBS, treated with TNFα (10 ng/ml; Peprotech) and harvested 0 or 30 min post-stimulation.
Isolation of sub-cellular fractions and RNA extraction
The protocol was modified from Melnik et al. (27) to allow isolation of RNA from the various fractions along its course; the general workflow is outlined in Figure 1A. All buffers were prepared using water treated with diethyl-pyrocarbonate (DEPC) and using nuclease- and protease-free reagents; they were used ice-cold unless stated otherwise. All washes/spins were at 600 × g for 5 min at 4°C. Approximately 1.5–2 × 10^7 cells (stimulated with TNFα for 0 or 30 min) were harvested in complete physiological buffer (PB*) using a rubber scraper (StarLabs). [PB* is based upon 100 mM potassium acetate, 30 mM KCl, 10 mM Na2HPO4 and 1 mM MgCl2. Immediately before each experiment, the following components were added: 1 mM Na2ATP, 1 mM dithiothreitol, 25 units/ml RNase inhibitor (RiboLock; Fermentas), 10 mM β-glycerophosphate, 10 mM NaF, 0.2 mM Na3VO4 and a 1/1000 dilution of complete protease inhibitor cocktail (PIC; Roche). As the acidity of adenosine triphosphate (ATP) batches varies, 100 mM KH2PO4 is used to adjust the pH to 7.4.] Nuclei were isolated by washing 3× (10 min) in 5 ml PB* plus 0.4% NP-40 (Igepal; Sigma-Aldrich), pelleted by centrifugation, gently resuspended in PB* + 0.4% NP-40 (100 μl per 10^7 cells), and treated (30 min, 33°C) with DNase I (Worthington; 10 units/10^7 cells plus 0.5 mM CaCl2) or HaeIII (1000 units/10^7 cells) without shaking in round-bottomed 2-ml tubes (Greiner). Reactions were stopped by adding ethylenediaminetetraacetic acid to 2.5 mM and cooling on ice, and digested nuclei were collected by centrifugation. This nucleolytic treatment releases most chromatin into the supernatant, from which ‘chromatin-associated’ RNA was isolated by immediately mixing with 750 μl Trizol LS (Invitrogen) and subsequent purification following the manufacturer's instructions. The pellet was then resuspended in Native Lysis Buffer (NLB; 50 μl per 10^7 cells).
NLB contains 40 mM Tris-acetate pH 7.4, 2 M 6-aminocaproic acid, 7% sucrose, plus 1/1000 PIC and 50 units/ml RiboLock. After vortexing vigorously, the mixture was incubated (20 min; ice). Next, a mixture of caspases (2 units/μl of caspases 6, 8, 9 and 10 in PB* + NP-40; BioVision) was added to the resuspended pellet (2 μl of mix/10^7 nuclei) and the mixture shaken (900 rpm) and incubated (30 min, 33°C) in a thermomixer (Eppendorf). [None of the core subunits of RNA polymerase I, II or III contains sites for these caspases, except RPB9 (27).] The reaction was stopped by adding Caspase Inhibitor III (Calbiochem) to 0.2 mM, incubating briefly on ice and spinning samples; the resulting supernatant contains large fragments of factories. ‘Factory’ RNA was then purified from this fraction using 750 μl of Trizol LS (Invitrogen) following the manufacturer's instructions.
Figure 1.
Overview of the approach. (A) Strategy. HUVECs were stimulated ± TNFα for 0 or 30 min, and harvested; intact nuclei were isolated in a physiological buffer and digested with DNase I. After spinning, most chromatin (and ‘chromatin-associated’ RNA) is released into the supernatant. The resulting pellet was treated with a mixture of caspases to detach factories (red) from the underlying sub-structure (brown line). Following centrifugation, the supernatant yields ‘factory’ RNA, whilst the pellet contains factory remnants and residual chromatin. Total (ribodepleted), and poly(A)+-enriched RNA fractions were also collected. (B) Genome browser view of a typical, constitutively-expressed, gene (EDN1) illustrating read densities obtained by sequencing total, factory and poly(A)+ RNA. (C) Average coverage profiles from two biological replicates along 2722 and 2386 concatenated exons and introns, respectively, belonging to 309 TNFα-responsive genes. Plots also include 100 bp up/downstream of exons/introns.
For the analysis in Supplementary Figure S1, the workflow was slightly modified to include a nuclear ‘run-on’. Intact HUVECs were permeabilized in PB* with 0.025% saponin and RNA polymerases were allowed to extend their transcripts by <40 nt in the presence of a ‘run-on’ mix (complete PB* plus 100 μM of ATP, CTP and GTP, 0.1 μM UTP, 50 μCi/ml [32P]UTP and MgCl2 to a concentration equimolar to that of all the triphosphates) that allowed subsequent detection of labelled RNA by autoradiography. Then, following treatment with DNase I and caspases as above, released ‘factory’ complexes were resolved in two-dimensional acrylamide-agarose composite gels using the Mini-Protean 3 system (BioRad); distributions of nascent transcripts were visualized by autoradiography, and those of RNA polymerase II by western blotting using the iBlot transfer system (Invitrogen) and a mouse monoclonal antibody against its RPB1 subunit (7C2; ref. 34), all as described in ref. 27.
Finally, total RNA was isolated by directly lysing ∼10^6 HUVECs in 1 ml of Trizol LS (Invitrogen) as described by the manufacturer, and poly(A)+ fractions were selected using the TruSeq protocol (Illumina). In all cases, long (>200 nt) and short (<200 nt) RNAs were separated and DNase-treated using the miRNeasy mini kit (Qiagen) as per the manufacturer's instructions.
RNA sequencing (RNA-seq) and data analysis
Long (>200 nt) and short (<200 nt) RNAs, isolated from the different fractions of HUVECs stimulated with TNFα for 0 or 30 min, were depleted of rRNA species using the RiboMinus kit (Epicentre) according to the manufacturer's instructions and reverse-transcribed using random hexamers in a protocol that allows generation of strand-specific libraries (35); each of two biological replicates was sequenced to ≥40 million (50-bp) reads on the Illumina platform. Resulting ‘fastq’ files were processed using Bioconductor packages for R (http://www.bioconductor.org) to assess read quality and produce a read-coverage file after alignment of reads with Bowtie on the GRCh37/hg19 assembly of the human genome (reporting unique hits and allowing no mismatches). The Integrative Genomics Viewer (http://www.broadinstitute.org/igv) was then used to visualize coverage, while rRNAs (representing <10% of mapped reads) were excluded from downstream analysis. Transcripts were assembled using Cufflinks (http://cufflinks.cbcb.umd.edu) and annotated using the Genomatix software suite (Genomatix). After annotation, the transcripts were compared to the Genomatix ‘primary transcript’ database and regrouped into a single primary transcript (when both exonic and intronic transcripts corresponded to the same transcript model) or into spliced transcripts (when only exonic signal was seen).
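The regrouping rule at the end of this paragraph can be illustrated with a minimal Python sketch. This is not the Genomatix implementation: the `TranscriptModel` record, the `classify_model` function and the `min_reads` cut-off are hypothetical names and values introduced here purely for illustration, and the source does not state how signal was thresholded.

```python
from dataclasses import dataclass


@dataclass
class TranscriptModel:
    """Hypothetical record: read counts assigned to one annotated transcript model."""
    name: str
    exonic_reads: int
    intronic_reads: int


def classify_model(model: TranscriptModel, min_reads: int = 1) -> str:
    """Apply the regrouping rule described in the text.

    min_reads is an assumed cut-off for calling signal present;
    the source does not give one.
    """
    has_exonic = model.exonic_reads >= min_reads
    has_intronic = model.intronic_reads >= min_reads
    if has_exonic and has_intronic:
        # Exonic and intronic signal on the same model: one primary transcript.
        return "primary"
    if has_exonic:
        # Only exonic signal: treated as a spliced transcript.
        return "spliced"
    # Intron-only or empty models are not covered by the rule in the text.
    return "unclassified"


if __name__ == "__main__":
    for m in (TranscriptModel("geneA", 120, 340),
              TranscriptModel("geneB", 95, 0)):
        print(m.name, classify_model(m))
```

Models with intron-only signal are deliberately left unclassified in this sketch, because the text does not say how such cases were handled.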
Pre-miRNA RT-qPCR analysis
Short factory RNA (<200 nt), isolated 0 or 30 min post-stimulation with TNFα as described above, was treated with RQ1 DNase (1 unit of DNase/μg of RNA; 37°C for 45 min; Promega) and precursor miRNAs were amplified from nascent RNA using miScript assays (Qiagen) as per the manufacturer's instructions on a Rotor-Gene 3000 cycler (Corbett). The presence of single amplimers was confirmed by melting-curve analysis, and reactions in which the reverse-transcription step was omitted were performed to ensure amplimers did not result from residual genomic DNA. Pre-miRNA levels were normalized relative to those of RNU6 snRNA.
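As an illustration of normalization to a reference RNA, the sketch below uses the common ΔCt approach. The source states only that levels were normalized to RNU6, so the formula, the assumed amplification efficiency of 2 per cycle, and the function names are assumptions rather than the authors' documented procedure.

```python
def relative_level(ct_target: float, ct_reference: float,
                   efficiency: float = 2.0) -> float:
    """Level of a target relative to a reference RNA (e.g. RNU6).

    Uses efficiency ** -(Ct_target - Ct_reference); an efficiency of 2.0
    assumes perfect doubling per cycle (an assumption, not from the source).
    """
    return efficiency ** (ct_reference - ct_target)


def fold_change(ct_target_0: float, ct_ref_0: float,
                ct_target_30: float, ct_ref_30: float) -> float:
    """Ratio of normalized levels at 30 min versus 0 min post-stimulation."""
    return (relative_level(ct_target_30, ct_ref_30)
            / relative_level(ct_target_0, ct_ref_0))


if __name__ == "__main__":
    # Made-up Ct values, for illustration only.
    print(fold_change(ct_target_0=26.0, ct_ref_0=20.0,
                      ct_target_30=24.0, ct_ref_30=20.0))
```

A target whose Ct drops by two cycles while the reference stays constant is reported as a four-fold increase under these assumptions.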
RESULTS
Sequencing the RNA content of transcription factories
We previously devised a method to biochemically isolate discrete factory (>8 MDa) complexes harbouring the active forms of RNA polymerases I, II or III, and analysed their protein content by mass-spectrometry (27). Here, we have modified this approach and isolated a single fraction containing all three active polymerizing activities (Figure 1A). In brief, HUVECs were harvested 0 or 30 min after stimulation with TNFα in a ‘physiological buffer’ (PB*; ref. 25), and nuclei were isolated after mild treatment with NP-40, washed (to remove most cytoplasmic RNAs) and DNase I-treated to detach most chromatin but leave essentially all transcriptional activity bound to the nuclear sub-structure. The active polymerases were then released from the sub-structure by digestion with a mixture of caspases. After spinning, the resulting supernatant contained large fragments of factories that still harbour much of the original transcriptional activity (shown using a nuclear ‘run-on’ assay) and chromatin rich in active marks, like histone 3 trimethylation at lysines 4 and 36, and acetylation at lysine 27 (Supplementary Figure S1). RNA purified from this fraction (hereafter called ‘factory’ RNA) was compared with that prepared from whole cells both before (‘total’) and after poly(A)+-selection (‘poly(A)+’), as well as that from the detached chromatin (‘chromatin’; Figure 1A).
Total, poly(A)+, and factory RNAs were sequenced to yield ∼40 million reads per sample (replicates were highly reproducible; Supplementary Figure S2A). Following read mapping, factory RNA was found to be rich in intronic, unprocessed (and so nascent) transcripts (a typical example is shown in Figure 1B); this verifies that factories are the active sites of transcription. Reassuringly, read profiles closely matched those obtained via ChIP-seq (33) using an antibody targeting active RNA polymerase II (Supplementary Figure S2B).
While reads mapping to exons and 5′/3′-UTRs are the prevalent species in rRNA-depleted total and poly(A)+ RNA, factory profiles are dominated by intronic signal (Figure 1C) and by reads mapping to intergenic (non-coding) regions (Supplementary Figure S2C).
Factory RNA-seq allows sensitive detection of changes in gene expression profiles
The transcripts responsible for determining a particular tissue or developmental state are often identified by comparing poly(A)+ RNA profiles. However, TNFα up/downregulates many genes within minutes—well before any changes are seen in poly(A)+ RNA. Therefore, comparing nascent RNA profiles should significantly improve detection of such responsive genes and we exemplify this using two ‘early’ genes analysed previously—one short (i.e. 11-kbp TNFAIP2) and one long (i.e. 221-kbp SAMD4A; ref. 36). Poly(A)+ profiles along both genes before and after stimulation were essentially the same (Figure 2A); consequently, both would be classified as non-responsive. However, new reads in factory RNA are seen after 30 min throughout TNFAIP2 and in the first half of SAMD4A (Figure 2A), indicating that both are responsive (there are few new reads in the second half of SAMD4A as pioneering polymerases have not yet transcribed that far into the gene) (36). Use of whole-genome factory RNA data enables up/downregulated genes to be readily detected (Figure 2B) and with higher sensitivity than when using poly(A)+ and total RNA (Figure 2C; Supplementary Figure S2D, E and Table S1).
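The comparison of 0- and 30-min profiles can be sketched as a log2 fold-change calculation. The pseudocount, the two-fold threshold and the function names below are illustrative assumptions; the source does not describe the statistical criteria used to call genes significantly up- or downregulated.

```python
import math


def log2_fold_change(before: float, after: float,
                     pseudocount: float = 1.0) -> float:
    """log2 ratio of signal after versus before stimulation.

    The pseudocount (assumed value) avoids division by zero for genes
    with no reads at one of the time points.
    """
    return math.log2((after + pseudocount) / (before + pseudocount))


def classify_response(before: float, after: float,
                      threshold: float = 1.0) -> str:
    """Label a gene 'up', 'down' or 'non-responsive'.

    threshold=1.0 corresponds to a two-fold change; this is an assumed
    cut-off, not the criterion used in the study.
    """
    lfc = log2_fold_change(before, after)
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "non-responsive"


if __name__ == "__main__":
    # Made-up read counts at 0 and 30 min, for illustration only.
    print(classify_response(before=1, after=7))
    print(classify_response(before=10, after=10))
```

Applied separately to factory and poly(A)+ counts, the same function would label a gene as responsive in one fraction and non-responsive in the other whenever new nascent reads have not yet changed the steady-state mRNA pool.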
Figure 2.
Changing factory RNA levels accurately portray transcriptional changes. (A) Examples of RNA-seq coverage along typical TNFα-responsive genes. Genome-browser views illustrating read densities along 11-kbp TNFAIP2 and 221-kbp SAMD4A obtained using total, factory and poly(A)+ RNA at 0 (−TNFα) or 30 min (+TNFα) post-stimulation. (B) Genome-wide profiles of factory RNA. Up/downregulated and non-responsive genes were aligned at transcription start (TSS) and termination (TTS) sites, and the log2 fold changes in factory RNA-seq signal (30- compared to 0-min data ± SD) from two biological replicates were plotted; changes within 2 kbp up/downstream of the TSS/TTS are also shown. (C) Total and poly(A)+ RNA-seq detect ∼50% fewer genes as significantly upregulated by TNFα compared to those identified using factory RNA.
A genome-wide variant of ‘nuclear run-on’, GRO-seq, allows high-resolution analysis of the positioning of engaged RNA polymerases on genes (8). When applied to AC16 human cardiomyocytes stimulated with TNFα (37), it yields similar profiles to those obtained by factory RNA-seq in HUVECs, albeit with lower read coverage (Supplementary Figure S3A). Results from the two cell types correlate to an extent that allows comparison (Supplementary Figure S3B), as the inflammatory cascade is ubiquitous. Again, factory RNA-seq allows up/downregulated genes to be detected more sensitively (Supplementary Figure S3C–E and Table S1). As seen in various cell types, including AC16 (8,37), divergent transcripts are found at TSSs. These are less obvious in factory RNA profiles, largely because they are masked by the high intronic levels detected in gene bodies; even so, low-level anti-sense species are still detectable (Supplementary Figure S3F and G) and this matches the model of transcriptional initiation on the surface of factories (21).
Factory and chromatin RNAs represent different stages in the processing pathway
Factory RNA from HUVECs is also especially rich in intergenic transcripts (compared to total and poly(A)+ RNA; Figure 3A); therefore, we examined whether it contained many long inter-genic non-coding RNAs (lincRNAs; ref. 5). It did (Figure 3B illustrates two examples), with ∼2000 lincRNAs at each time point being enriched >2-fold relative to levels in total RNA (Figure 3C).
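A fold-enrichment filter of this kind can be sketched as follows. The use of RPKM as the expression measure for lincRNAs, the small `floor` value that guards against division by zero, and the function names are assumptions made for illustration; only the more-than-2-fold criterion relative to total RNA comes from the text.

```python
def rpkm(read_count: int, length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def fold_enrichment(factory_rpkm: float, total_rpkm: float,
                    floor: float = 0.01) -> float:
    """Factory-over-total ratio; floor is an assumed guard against zeros."""
    return max(factory_rpkm, floor) / max(total_rpkm, floor)


def is_enriched(factory_rpkm: float, total_rpkm: float,
                min_fold: float = 2.0) -> bool:
    """True when the locus is enriched more than min_fold in factory RNA."""
    return fold_enrichment(factory_rpkm, total_rpkm) > min_fold


if __name__ == "__main__":
    # Made-up counts for one hypothetical 2-kbp locus, 40 million mapped reads.
    factory = rpkm(read_count=500, length_bp=2000, total_mapped_reads=40_000_000)
    total = rpkm(read_count=100, length_bp=2000, total_mapped_reads=40_000_000)
    print(factory, total, is_enriched(factory, total))
```

Normalizing by both locus length and library size before taking the ratio keeps long loci and deeper libraries from appearing enriched for trivial reasons.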
Figure 3.
Factory RNA is rich in nascent and non-coding RNA. (A) Pie charts depict distributions of reads obtained with the different RNA fractions that map uniquely to genomic features. Reads mapping to exons are the prevalent species in poly(A)+ RNA, while those mapping to introns or across exon-intron boundaries (‘partial’) predominate in factory RNA. (B) Genome-browser views of regions encoding two long inter-genic non-coding RNAs (lincRNAs) enriched in factories, 470-kbp LINC00478 (which also hosts TNFα-responsive MIR99A and non-expressed MIR125B) and 265-kbp FTX. Tracks show read density (reads per million) of total, factory and poly(A)+ RNA, before (−) or after (+) TNFα stimulation. (C) Heatmaps covering all DNA segments encoding lincRNAs detected 0 or 30 min after stimulation (from TSS to TTS, plus 2 kbp up/downstream). Each row in a heatmap represents one locus amongst the 5930 and 5834 detected before (−) and after (+) TNFα stimulation; loci are ranked from most (top) to least (bottom) enriched (compared to total RNA; log2 scale).
It has been proposed that a significant fraction of nascent RNA associates with chromatin (15) and thus chromatin isolation coupled to RNA sequencing could provide a simple means for uncovering nascent RNAs (38). Therefore, we extracted RNA associated with chromatin as we isolated factories (Figure 1A) and sequenced it to obtain 63–65 million reads, roughly the coverage obtained with factory RNA. Chromatin-associated RNA was rich in nascent transcripts, although not as much as factory RNA (Supplementary Figure S4A and B); nevertheless, expression levels in factory and chromatin RNA-seq data correlated well (Supplementary Figure S4C). Notably, chromatin and factories were associated with a distinct subset of lincRNAs and repeat transcripts (Supplementary Figure S4D and E), and the coverage profiles of highly-active genes probably reflect transcripts in discrete processing states (also seen in ref. 15; Supplementary Figure S4F).
By analysing the top 5000 introns enriched in either fraction we can attribute this to the differential processing of long introns on chromatin; introns in each group also belong to genes involved in distinct functional pathways (Supplementary Figure S4G).
Short non-coding transcripts are abundant constituents at active sites of transcription
To investigate the contribution of short non-coding RNAs to factories, we selected those of <200 nt from total and factory RNA (again, 0 and 30 min after stimulation) and deep-sequenced them (to ∼50 million reads per sample). Factory RNA was rich in repeat-derived transcripts, especially Alus and L1s (Figure 4A; Supplementary Figure S5A); this confirms early results obtained by low-throughput sequencing of the residual RNA that remained tightly associated with the nuclear sub-structure of HeLa cells (39). As expected, factory RNA is depleted of tRNAs (Figure 4B), with the exception of tRNA-Asp-GAY, -Lys-AAG and -Glu-GAG (encoded on chromosomes 9, 7 and 13, respectively). It also contains few mature miRNAs (which are also mostly cytoplasmic; Figure 4B). However, there were again exceptions, including miR-1291, -145, -1248 and -452, which were enriched >6-fold (Supplementary Table S2); we note that some of these are known to be regulated by TNFα (40). In contrast (but again as predicted), miRNA precursors were produced in factories (Supplementary Figure S5B). Small nucleolar RNAs (snoRNAs) are also highly enriched in factories (Figure 4B). This is in agreement with their role in co-transcriptional splicing, including during inflammatory signalling (1,41), as well as with the idea of maintaining an ‘open’ chromatin state, perhaps necessary for efficient association with factories (42). Finally, many transcripts made by RNA polymerase III are also found in factory RNA (details in Supplementary Figure S5C).
Figure 4.
Factory RNA is rich in short non-coding transcripts. (A) Pie charts depict distributions of reads mapping to particular genomic features; reads mapping to the non-repetitive genome are the prevalent species in all fractions, but more of those mapping to repeats (colour-coded) are found enriched in factory RNA (data from unstimulated HUVECs). (B) Box plots present the fold enrichment of sno- and micro-RNAs (RPKM; factory compared to total RNA levels); tRNAs serve as a negative control. *: P-value < 0.05; unpaired Student's t-test. (C) eRNAs derived from active enhancers are robustly detected using factory RNA. Graphs show mean read densities of sense (red) and anti-sense (pink) factory eRNAs around 1221 active inter-genic enhancers (defined as genomic segments carrying H3K4me1 and H3K27ac marks, and binding NF-κB 30 min post-stimulation; see Supplementary Figure S6A). Coverage using total and chromatin RNA is also shown for both sense (blue/purple) and anti-sense (light blue/light purple) strands.
Enhancers are now defined as active transcription units that encode short divergent transcripts of ∼500 nt (eRNAs; ref. 12) and we expected to find such eRNAs in factories (21). Of >5500 putative enhancers active 30 min after stimulation (defined by the presence of H3K4me1, H3K27ac and NF-κB; ref. 33), we focused on the subset of 1221 that lay in intergenic regions (to eliminate possible ‘contamination’ from intronic transcripts, and simplify analysis; Supplementary Figure S6A and Table S3). Factory RNA contained many sense and anti-sense reads from these 1221 sites and these were more enriched than those in total or chromatin RNA; this also holds true when looking at intragenic enhancers or HUVEC-specific ‘super-enhancers’ (43) (Figure 4C; Supplementary Figure S6C–E). Thus, factories are also the sites where eRNAs are produced and factory RNA-seq efficiently detects them.
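Mean read-density profiles around a set of enhancers, such as those plotted in Figure 4C, can be computed along the following lines. This is a simplified sketch that assumes per-base coverage for a single strand is already available as a Python list; the function name, the window size and the toy data are hypothetical, and this is not the pipeline used in the study.

```python
from typing import List, Sequence


def mean_profile(coverage: Sequence[float], centers: Sequence[int],
                 flank: int) -> List[float]:
    """Average per-base coverage in windows of +/- flank around each center.

    Centers whose window would extend past either end of the coverage
    vector are skipped. Run once per strand for sense/anti-sense profiles.
    """
    width = 2 * flank + 1
    sums = [0.0] * width
    used = 0
    for center in centers:
        start, end = center - flank, center + flank + 1
        if start < 0 or end > len(coverage):
            continue  # incomplete window, skip this site
        for offset, value in enumerate(coverage[start:end]):
            sums[offset] += value
        used += 1
    if used == 0:
        return sums
    return [s / used for s in sums]


if __name__ == "__main__":
    # Toy coverage vector and two hypothetical enhancer midpoints.
    toy_coverage = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(mean_profile(toy_coverage, centers=[2, 7], flank=1))
```

Computing the profile separately from sense- and anti-sense-strand coverage gives the two curves needed to show divergent transcription at the same sites.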
DISCUSSION
The human transcriptome is a complex multi-layer catalogue of protein-coding and non-coding RNAs (1), but the relative abundance, biogenesis, sub-cellular localization and functional roles of the non-coding constituents remain controversial (44). Here we describe a fast and cost-effective method for purifying nascent transcripts from their sites of production, the transcription factories (Figure 1A; ref. 19). ‘Factory RNA-seq’ provides excellent read coverage of nascent RNA whether the transcription unit is protein-coding or non-coding, even at the relatively low sequencing depth used here, largely due to the biochemical enrichment achieved by factory purification. This readout is depleted of exonic signal and allows for a redefinition of the ‘active’ subset of genes, as we detect ∼350 genes characterized by obvious signal in poly(A)+ but barely any reads in factory RNA. We conclude that the corresponding set of genes most probably reflects those that rarely engage in productive transcription but yield long-lived mRNAs (Supplementary Figure S7). Overall, our approach complements others that look at specific fractions of the transcriptome, such as GRO-seq (37) and chromatin-associated RNA-seq (15,38). We anticipate that a combined use of such methods, plus novel in silico analysis tools that focus on intronic RNA levels (45), will facilitate increasingly accurate definition of transcriptomes in different cell types.
Transcription factories are multi-MDa, multi-protein complexes (27) and the idea of specific RNAs acting as structural scaffolds therein has long been entertained. Here, we have identified a number of non-coding RNAs that might play such a structural role. About 250 lncRNAs are uniquely enriched in factories compared to the other fractions (Supplementary Figure S4D), some of which are known both to be nuclear and to regulate transcription (46), including MALAT1, NEAT1 and GAS5 (47).
Factories are also rich in short ncRNAs, including snoRNAs (Figure 4), which is expected as splicing occurs co-transcriptionally (1,21). They also contain many RNAs copied from repeats (Figure 4A), including those from Alus (which are known to engage with the key pro-inflammatory driver, transcription factor NF-κB; ref. 48) and L1s (which are abundant, euchromatin-associated transcripts; ref. 49).
In conclusion, this work introduces a general method for analysing RNAs associated with active sites of transcription. Much of this RNA is intronic, but a significant fraction is extra-genic. Faithful cataloguing of both when these ncRNAs are produced upon gene induction and which ones are associated with transcription factories provides a novel approach for dissecting their role in regulating gene expression.
DATA AVAILABILITY
RNA-seq data generated here can be accessed via the EBI ArrayExpress archive under accession number E-MTAB-3020 (summarized in Supplementary Table S4).