Alessandro Bonetti, Charles Plessy, Michiel de Hoon, Yoshinari Ando, Chung-Chau Hon, Yuri Ishizu, Masayoshi Itoh, Sachi Kato, Dongyan Lin, Sho Maekawa, Mitsuyoshi Murata, Hiromi Nishiyori, Jay W Shin, Jens Stolte, Ana Maria Suzuki, Michihira Tagami, Hazuki Takahashi, Supat Thongjuea, Alistair R R Forrest, Yoshihide Hayashizaki, Juha Kere, Piero Carninci.
Abstract
In eukaryotes, capped RNAs include long transcripts such as messenger RNAs and long noncoding RNAs, as well as shorter transcripts such as spliceosomal RNAs, small nucleolar RNAs, and enhancer RNAs. Long capped transcripts can be profiled using cap analysis gene expression (CAGE) sequencing and other methods. Here, we describe a sequencing library preparation protocol for short capped RNAs, apply it to a differentiation time course of the human cell line THP-1, and systematically compare the landscape of short capped RNAs to that of long capped RNAs. Transcription initiation peaks associated with genes in the sense direction have a strong preference to produce either long or short capped RNAs, with one out of six peaks detected in the short capped RNA libraries only. Gene-associated short capped RNAs have highly specific 3' ends, typically overlapping splice sites. Enhancers also preferentially generate either short or long capped RNAs, with 10% of enhancers observed in the short capped RNA libraries only. Enhancers producing either short or long capped RNAs show enrichment for GWAS-associated disease SNPs. We conclude that deep sequencing of short capped RNAs reveals new families of noncoding RNAs and elucidates the diversity of transcripts generated at known and novel promoters and enhancers.
Year: 2022 PMID: 35961773 PMCID: PMC9528987 DOI: 10.1101/gr.276647.122
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.438
Overview of data sets included in this study
Figure 1. RNA composition per sequencing library. Pol II short RNAs include independently transcribed small nucleolar RNAs U3, U8, and U13; small Cajal body–specific RNAs 2 and 17; and spliceosomal RNAs except U6 and U6atac. Pol III short RNAs include transfer RNAs, spliceosomal RNAs U6 and U6atac, small ILF3/NF90-associated RNAs, the RNA component of the RNase P ribonucleoprotein, the RNA component of mitochondrial RNA processing endoribonuclease, the 7SK RNA component of the nuclear ribonucleoprotein, the 7SL RNA component of the signal recognition particle, Ro-associated RNAs, vault RNAs, and brain cytoplasmic RNA 1. Intronic RNAs include small nucleolar RNAs and small Cajal body–specific RNAs, except those transcribed by Pol II, and the MALAT1-associated small cytoplasmic RNA. Short RNA precursors include sequences that align within a 500-bp window upstream of and downstream from transfer RNAs, small nuclear RNAs, small nucleolar RNAs, and small Cajal body–specific RNAs but do not fully align to the mature RNA. The categories sense and antisense comprise transcripts associated with mRNAs and lncRNAs in the sense and antisense orientation, respectively.
Figure 2. Gene-associated short and long capped RNAs. (A) Position of the 5′ end of short and long capped RNAs, associated in the sense orientation with annotated genes, relative to the transcription start site of the gene. (B) Venn diagram of transcription initiation peaks associated with genes in the sense orientation. The outer circles represent peaks expressing short capped RNAs (red outer circle) and long capped RNAs (blue outer circle). The inner circles represent peaks with a significantly higher expression of short capped RNAs than long capped RNAs (red inner circle) or a significantly higher expression of long capped RNAs than short capped RNAs (blue inner circle).
Figure 3. Position of the 3′ end of short capped RNAs aligning to coding (A) or noncoding (B) genes, relative to their splice sites. (C) 5′ region of the diazepam binding inhibitor, acyl-CoA binding protein (DBI) gene. RefSeq transcripts associated with the DBI gene are shown in green. The position of the 5′ end of long capped RNAs is shown in blue as an expression histogram of the mean number of tags per million (tpm) observed in the CAGE libraries; the position of the 5′ end of short capped RNAs is shown in red as an expression histogram of the mean number of tpm observed in the single-end libraries. The 5′ and 3′ ends of short capped RNAs aligning to the mature mRNA, as observed in the paired-end libraries, are shown at the bottom, colored by total read frequency (only the top expressed RNAs are shown); they reveal that the short capped RNAs terminated at one specific position within an exon of DBI. (D) Size of short capped RNAs terminating at splice sites of coding genes. (E) Size of short capped RNAs terminating at splice sites of noncoding genes.
Figure 4. Enhancer expression of short and long capped RNAs. (A) Venn diagram of predicted enhancers. The outer circles represent predicted enhancers expressing short capped RNAs (red outer circle) and long capped RNAs (blue outer circle). The inner circles represent predicted enhancers significantly enriched for short capped RNA expression (red inner circle) or for long capped RNA expression (blue inner circle). (B) Reporter activity of predicted enhancers enriched for expression of short or long capped RNAs. (C) H3K4me1 and H3K4me3 epigenetic marker signal for enhancers and active promoters, respectively, at promoters enriched for short or long capped RNAs. (D) Fraction of SNP loci associated with GWAS traits overlapping predicted enhancers enriched for short or long capped RNA expression; error bars indicate the standard deviation of the fraction. The number of GWAS-associated SNP loci and the total number of SNP loci are shown as a ratio above the graph. The dashed line shows the genome-wide fraction of SNP loci associated with GWAS traits as the background level.
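The quantity plotted in Figure 4D, a fraction of SNP loci with an error bar, can be illustrated with a minimal sketch. It assumes the standard deviation of the fraction is the binomial one, sqrt(p(1 − p)/n); the caption does not state which estimator the authors used. The function name `fraction_with_sd` and the counts in the usage line are hypothetical and are not taken from the study.

```python
import math


def fraction_with_sd(n_gwas: int, n_total: int) -> tuple[float, float]:
    """Return the fraction n_gwas / n_total and its standard deviation.

    Assumption: each SNP locus is treated as an independent Bernoulli
    trial, so the standard deviation of the fraction is
    sqrt(p * (1 - p) / n_total). The source does not confirm this choice.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_gwas <= n_total:
        raise ValueError("n_gwas must lie between 0 and n_total")
    p = n_gwas / n_total
    sd = math.sqrt(p * (1.0 - p) / n_total)
    return p, sd


# Hypothetical counts, for illustration only.
fraction, sd = fraction_with_sd(n_gwas=25, n_total=100)
print(f"fraction = {fraction:.3f} +/- {sd:.3f}")
```

Under this assumption, an enrichment such as the one described in the caption would show up as a fraction whose error bar lies above the genome-wide background fraction marked by the dashed line.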