Davide Carnevali, Anastasia Conti, Matteo Pellegrini, Giorgio Dieci.
Abstract
With more than 500,000 copies, mammalian-wide interspersed repeats (MIRs), a sub-group of SINEs, represent ∼2.5% of the human genome and one of the most numerous families of potential targets for the RNA polymerase (Pol) III transcription machinery. Since MIR elements ceased to amplify ∼130 myr ago, previous studies primarily focused on their genomic impact, while the issue of their expression has not been extensively addressed. We applied a dedicated bioinformatic pipeline to ENCODE RNA-Seq datasets of seven human cell lines and, for the first time, we were able to define the Pol III-driven MIR transcriptome at single-locus resolution. While the majority of Pol III-transcribed MIR elements are cell-specific, we discovered a small set of ubiquitously transcribed MIRs mapping within Pol II-transcribed genes in antisense orientation that could influence the expression of the overlapping gene. We also identified novel Pol III-transcribed ncRNAs, deriving from transcription of annotated MIR fragments flanked by unique MIR-unrelated sequences, and confirmed the role of Pol III-specific internal promoter elements in MIR transcription. Besides demonstrating widespread transcription at these retrotranspositionally inactive elements in human cells, the ability to profile MIR expression at single-locus resolution will facilitate their study in different cell types and states, including pathological alterations.
Keywords: ENCODE; RNA polymerase III; RNA-Seq; SINE; mammalian-wide interspersed repeats
Year: 2017 PMID: 28028040 PMCID: PMC5381342 DOI: 10.1093/dnares/dsw048
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Figure 1. Representation of the structure of a mammalian-wide interspersed repeat (MIR). A tRNA-related region contains A- and B-box promoter elements, which drive Pol III transcription by being recognized by TFIIIC. Core-SINE indicates a highly conserved central sequence, followed by a LINE-related region. Pol III is expected to terminate at the first encountered termination signal (Tn), which may be located at varying distances from the end of the MIR body.
Figure 2. Bioinformatic pipeline flowchart. Shown is a flow diagram of the improved bioinformatic pipeline for the identification of autonomously expressed SINE loci from RNA-seq data sets. See text and Supplementary Methods for details.
MIRs subjected to in vitro transcription analysis
| MIR | Expression in cell lines (a) | Predicted length of primary transcript(s), nt (b) |
|---|---|---|
| MIR_dup2285 (chr16:22309780-22309939) | GM12878, H1-hESC, HeLa-S3, HepG2, HUVEC, K562, NHEK | 124/131 (T4), 223/230 (T3GT2), 243/250 (T4), 354/361 (T3CT, downstream of annotated MIR but within the cloned sequence) |
| MIR_dup3493 (chr1:34943459-34943727) | GM12878, H1-hESC, K562, NHEK | 177 (TAT3), 213 (TAT3), 277 (T2AT2). Expected transcripts originating from terminators more downstream of the annotated MIR but within the cloned sequence: 305 (T2AT2), 365 (T2AT3), 393 (T4) |
| MIRb_dup5848 (chr2:71762977-71763215) | H1-hESC, HepG2, NHEK | 119 (TCT3), 256 (T3CT), 358 (T5) |
| MIRc_dup2189 (chr14:89445565-89445634) | H1-hESC, K562, NHEK | 137 (T3AT) and 140 (T4), 207 (T2GT3), 250 (T5) |
(a) The column lists, for each MIR element, the cell lines in which it was found to be expressed by ENCODE RNA-seq data analysis.
(b) The reported transcript lengths were calculated by assuming as TSS the A or G residue closest to the position 12 bp upstream of the A box. To estimate the 3′ end of the transcript, both canonical (Tn with n ≥ 4) and non-canonical T-rich Pol III terminators [17] were considered, both within and downstream of the MIR body sequence (the terminator is indicated in parentheses after the transcript length). For canonical terminators, the four Us corresponding to the first four Ts of the termination signal were considered as part of the transcripts; for non-canonical terminators, all the nucleotides of the terminator were considered as incorporated into the RNA. In the case of MIR_dup2285, for which two possible A boxes could drive transcription, the expected lengths of both putative alternative transcripts are indicated (Supplementary Fig. S1).
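The length calculation described in footnote b can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy sequence, the placeholder A box, the tie-breaking rule for the TSS, and the short list of non-canonical motifs are all assumptions made for illustration.

```python
import re

# Assumption: a small illustrative subset of non-canonical T-rich motifs,
# written out from the table's shorthand (T3GT2 = TTTGTT, T3CT = TTTCT, TAT3 = TATTT).
NON_CANONICAL = ("TTTGTT", "TTTCT", "TATTT")


def pick_tss(seq, a_box_start):
    """Index of the A or G closest to the position 12 bp upstream of the A box."""
    target = a_box_start - 12
    for offset in range(len(seq)):
        # Assumption: on a tie, the upstream candidate is preferred.
        for pos in (target - offset, target + offset):
            if 0 <= pos < len(seq) and seq[pos] in "AG":
                return pos
    raise ValueError("no A or G residue found")


def predicted_lengths(seq, tss, motifs=NON_CANONICAL):
    """Sorted (length, terminator) pairs for terminators downstream of the TSS."""
    downstream = seq[tss:]
    hits = []
    # Canonical terminators: runs of >= 4 Ts; only the first four Ts are transcribed.
    for m in re.finditer(r"T{4,}", downstream):
        hits.append((m.start() + 4, f"T{len(m.group())}"))
    # Non-canonical terminators: the whole motif is incorporated into the RNA.
    for motif in motifs:
        for m in re.finditer(f"(?={motif})", downstream):
            hits.append((m.start() + len(motif), motif))
    return sorted(hits)


# Toy sequence (hypothetical, not a real MIR locus); a_box is a placeholder.
a_box = "TGGCTCAGTGG"
seq = "GGCCA" + "C" * 11 + a_box + "CACACAC" + "TTTCT" + "GAGAG" + "TTTTTT"
tss = pick_tss(seq, seq.index(a_box))
print(tss, predicted_lengths(seq, tss))
```

In this sketch every terminator downstream of the TSS yields one candidate length, mirroring the way the table lists several possible transcripts per MIR.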
Statistics of expression-positive MIR elements in selected cell lines
| Cell line | Total MIRs (a) | Intergenic/antisense | Intergenic/antisense shared (b) | Antisense (c) |
|---|---|---|---|---|
| GM12878 | 145 | 39 | 12 | 16 |
| H1-hESC | 280 | 52 | 18 | 23 |
| HeLa-S3 | 188 | 32 | 6 | 15 |
| HepG2 | 348 | 46 | 16 | 23 |
| HUVEC | 435 | 59 | 7 | 18 |
| K562 | 161 | 42 | 13 | 18 |
| NHEK | 158 | 55 | 15 | 25 |
| ALL (d) | 1301 | 271 | 33 | 105 |
(a) For each cell line, the column reports the number of MIRs considered as autonomously expressed in both ENCODE RNA-seq replicates.
(b) For each cell line, the column reports the number of intergenic/antisense MIRs that are also expressed in one or more different cell lines.
(c) Reported in this column are the numbers of intergenic MIRs mapping with an antisense orientation to either protein-coding, non-protein-coding or lincRNA genes.
(d) The numbers in this row refer to individual MIRs expressed in one or more cell lines.
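The counting rules in these footnotes amount to simple set operations, sketched below in Python under stated assumptions. The cell-line names and locus IDs are made up for illustration and are not data from the study.

```python
# Hypothetical per-replicate calls: cell line -> (replicate 1 loci, replicate 2 loci).
calls = {
    "cellA": ({"mir1", "mir2", "mir3"}, {"mir1", "mir2", "mir4"}),
    "cellB": ({"mir2", "mir5"}, {"mir2", "mir5", "mir6"}),
    "cellC": ({"mir7"}, {"mir7", "mir1"}),
}

# Footnote a: a locus counts only if it is called in both replicates.
expressed = {cell: rep1 & rep2 for cell, (rep1, rep2) in calls.items()}

# Footnote b: loci of a cell line that are also expressed in at least one other cell line.
shared = {
    cell: {
        locus
        for locus in loci
        if any(locus in other for name, other in expressed.items() if name != cell)
    }
    for cell, loci in expressed.items()
}

# Footnote d: the ALL row counts each locus once, however many cell lines express it.
all_loci = set().union(*expressed.values())

print({cell: len(loci) for cell, loci in expressed.items()}, len(all_loci))
```

Because loci shared between cell lines are counted once in the union, the ALL row is smaller than the sum of the per-cell-line counts.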
Subfamily distribution of expression-positive MIRs
| MIR subfamily | Total genomic (a) | Expressed genomic (a) | Total intergenic/antisense (a) | Expressed intergenic/antisense (a) |
|---|---|---|---|---|
| MIR | 171148 (30%) | 436 (33%) | 115860 (30%) | 101 (37%) |
| MIRb | 219279 (38%) | 487 (37%) | 148567 (38%) | 104 (38%) |
| MIRc | 100252 (17%) | 202 (16%) | 67888 (17%) | 34 (13%) |
| MIR3 | 87763 (15%) | 176 (14%) | 59078 (15%) | 32 (12%) |
(a) Reported are the absolute copy numbers and (in parentheses) the percentages of MIRs of each sub-family relative to (from left to right): the total set of genomic MIRs (intergenic/antisense plus intronic sense MIRs); the set of MIRs found to be expression-positive in one or more cell lines (intergenic/antisense plus intronic sense ones); the genomic set of intergenic/antisense MIRs; the set of intergenic/antisense MIRs found to be expression-positive in one or more cell lines.
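The percentages in parentheses are each sub-family's share of its column total. A short Python sketch of that arithmetic follows, using the counts from two columns of the table; the function name `shares` is an illustrative choice, and rounding to the nearest integer is an assumption about how the published values were derived.

```python
def shares(counts):
    """Percentage of each sub-family relative to the column total, rounded to an integer."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


# Counts copied from the "Total genomic" and "Expressed intergenic/antisense" columns.
total_genomic = {"MIR": 171148, "MIRb": 219279, "MIRc": 100252, "MIR3": 87763}
expressed_intergenic = {"MIR": 101, "MIRb": 104, "MIRc": 34, "MIR3": 32}

print(shares(total_genomic))
print(shares(expressed_intergenic))
```

Comparing the two outputs side by side shows how closely the sub-family composition of expressed MIRs tracks the genomic composition.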
Figure 3. Base-resolution expression profiles for five representative MIRs. See text for descriptions. Bars labeled with the names of Pol III transcription components (RPC155, TFIIIC110, BDP1, BRF1) indicate regions of enrichment of the corresponding proteins according to publicly available ChIP-seq data; upper arrowed bars represent the annotated MIR elements, while the lower arrowed bars represent the corresponding expected full-length MIR elements, which do not correspond to annotated elements and are reported merely to locate the alignment position of the annotated MIRs within the corresponding consensus sequence. Arrows in panel C point to the positions of strong Pol III terminators. (A) MIR_dup717 chr14:34206132-34206363; (B) MIR_dup2691 chr11:35548054-35548257, which resides within an intron of the PAMR1 gene in sense orientation; (C) MIR_dup2285 chr16:22309780-22309939; (D) MIRb_dup1281 chr17:17863550-17863651; (E) MIRc_dup2189 chr14:89445565-89445634.
Figure 5. In vitro transcription for selected expression-positive MIRs. In vitro transcription reactions were performed in HeLa nuclear extract using 0.5 μg of the indicated MIR templates (lanes 4–9, 13–18). A previously characterized Alu producing a 355-nt RNA (lanes 2, 11) and a human tRNAVal gene producing a known transcript pattern due to heterogeneous transcription termination (lanes 3, 12) were used as positive controls for in vitro transcription and, at the same time, as a source of RNA size markers. Negative control reactions contained empty pGEM®-T Easy vector (lanes 1, 10). For each MIR, the wild-type, B box-mutated (Bmut) and 5′-flanking region-deleted (5′ del) versions were tested. Arrows on the gel image indicate the migration positions (with lengths) of bands corresponding to the expected transcripts in Table 3.
Figure 4. Simultaneous Pol III and Pol II accumulation signals at MIR_dup2285. Shown are the ChIP-seq signals of Pol II (POLR2A) (blue) and Pol III (RPC155) (red) in HeLa-S3 and K562 cell lines across the expression-positive MIR located within the first intron of the POLR3E gene in antisense orientation (cf. Fig. 3C).