Cyrille Lepoivre, Mohamed Belhocine, Aurélie Bergon, Aurélien Griffon, Miriam Yammine, Laurent Vanhille, Joaquin Zacarias-Cabeza, Marc-Antoine Garibal, Frederic Koch, Muhammad Ahmad Maqbool, Romain Fenouil, Beatrice Loriod, Hélène Holota, Marta Gut, Ivo Gut, Jean Imbert, Jean-Christophe Andrau, Denis Puthier, Salvatore Spicuglia.
Abstract
BACKGROUND: Divergent transcription is a widespread phenomenon in mammals. For instance, short bidirectional transcripts are a hallmark of active promoters, while longer transcripts can be detected antisense from active genes in conditions where the RNA degradation machinery is inhibited. Moreover, many described long non-coding RNAs (lncRNAs) are transcribed antisense from coding gene promoters. However, the general significance of divergent lncRNA/mRNA gene pair transcription is still poorly understood. Here, we used strand-specific RNA-seq with high sequencing depth to thoroughly identify antisense transcripts from coding gene promoters in primary mouse tissues.
Year: 2013 PMID: 24365181 PMCID: PMC3882496 DOI: 10.1186/1471-2164-14-914
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Identification of genes associated with long upstream antisense transcripts in DN thymocytes. A) Heatmap showing the Total RNA-seq signal from ΔRag (DN) thymocytes (SOLiD platform) found in the 5 kb region surrounding the TSS of all non-overlapping Refseq genes. Signal was computed based on the number of reads per 100 bp binned region originating from either the antisense or sense strand with respect to gene orientation (left and right panels, respectively). The heatmap is ordered according to the antisense signal for the [-5 kb; 0] region. The threshold for significantly expressed antisense transcripts is shown by a dotted line (see Methods section). B) Examples of genes associated with LUATs in ΔRag thymocytes. The Total and PolyA RNA-seq signals for the plus and minus strands are shown. Arrows indicate transcript orientation. The scales and genomic coordinates are shown on the left and top of each panel, respectively. Note that the scales were independently fixed for the plus and minus strands in order to properly visualize sense and antisense transcripts. C) Average profiles of PolyA- RNA-seq signal in ΔRag thymocytes for LUAT-associated genes (red line) and a control set of similarly expressed genes (black line). Signals corresponding to the orientation of the coding genes are represented as positive values and antisense signals as negative values. D) Histogram of the positions of the 5′ ends of LUATs relative to the TSS of their associated coding genes. E) Distribution of expression of all coding genes (red), LUAT-associated genes (green) and LUATs (blue) in ΔRag thymocytes. F) Number of LUAT-associated genes in each expression quartile of all coding genes (Q1 = 3.05e-6 FPKM; Q2 = 0.013 FPKM; Q3 = 1.99 FPKM). The red line indicates the expected (random) distribution.
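The strand-aware binning described for panel A can be illustrated with a minimal sketch. It assumes reads are given as simple (position, strand) tuples and that the window spans 5 kb on each side of the TSS; the function name, the input format and the window convention are illustrative assumptions, not the authors' pipeline.

```python
def binned_strand_counts(reads, tss, gene_strand, flank=5000, bin_size=100):
    """Count reads per bin around a TSS, split into sense and antisense.

    reads: iterable of (position, strand) tuples, strand being '+' or '-'.
    Bins are ordered from upstream to downstream relative to the gene.
    All names and defaults here are hypothetical.
    """
    n_bins = 2 * flank // bin_size
    sense = [0] * n_bins
    antisense = [0] * n_bins
    for pos, strand in reads:
        # Orient the offset so that negative values lie upstream of the gene.
        offset = pos - tss if gene_strand == "+" else tss - pos
        if not -flank <= offset < flank:
            continue  # read falls outside the window
        idx = (offset + flank) // bin_size
        if strand == gene_strand:
            sense[idx] += 1
        else:
            antisense[idx] += 1
    return sense, antisense


reads = [(1000, "+"), (950, "-"), (900, "-"), (7000, "+")]
sense, antisense = binned_strand_counts(reads, tss=1000, gene_strand="+")
# Upstream antisense signal, i.e. the [-5 kb; 0] region used to order the heatmap
upstream_antisense = sum(antisense[: len(antisense) // 2])
```

Summing the antisense counts over the upstream half of the bins gives one value per gene, which is the kind of quantity a heatmap could be ordered by.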
Figure 2. Functional analysis of LUAT-associated genes. A-B) Functional enrichment analyses for LUAT-associated genes found in ΔRag thymocytes (A) and embryonic kidney (B). Significant GO terms for Molecular Function (MF), Biological Process (BP) and Cellular Component (CC) with a Benjamini-corrected p < 10⁻³ are shown. Note that, using this threshold, a set of similarly expressed control genes retrieved no significant enrichment for GO terms. C) Enrichment scores of functional groups found using the Functional Classification Tool from DAVID [77]. Results are shown for LUAT-associated genes found in the multi-tissue analysis, bidirectional protein-coding gene pairs (coding-coding) and genes with unidirectional promoters. The top ten groups are shown for each set of genes. The functional groups are named based on the term with the lowest p value found in each group.
Figure 3. Regulation of LUATs and associated genes during early T-cell differentiation. A) Percentage of thymocyte-specific genes in the LUAT-associated gene sets from either DP or DN (ΔRag) thymocytes or in control sets of similarly expressed genes (see Methods). The p values from a chi-squared test are shown. B) Scatterplot showing the log2 ratio of Total RNA-seq signals (FPKM) between DN (ΔRag) and DP thymocytes (both from the Illumina platform) for LUATs and associated genes. Shown in red are LUAT and associated gene pairs in which both are considered differentially expressed (2-fold change). C) Examples of co-regulated LUATs and associated genes between DN (ΔRag) and DP thymocytes. The Total RNA-seq signal for the plus and minus strands is shown. Legends are as in Figure 1B.
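The log2 ratio and 2-fold criterion of panel B can be sketched as follows. The pseudocount added to each FPKM value (to avoid division by zero) and both function names are assumptions made for illustration; the caption does not state how zero values were handled.

```python
import math


def log2_ratio(fpkm_dn, fpkm_dp, pseudocount=1.0):
    """log2 ratio of DN over DP signal; the pseudocount is an assumption."""
    return math.log2((fpkm_dn + pseudocount) / (fpkm_dp + pseudocount))


def both_differential(luat, gene, min_fold=2.0):
    """True if the LUAT and its associated gene both change by >= min_fold.

    luat and gene are (fpkm_dn, fpkm_dp) tuples (hypothetical input format).
    The direction of change is not tested here.
    """
    threshold = math.log2(min_fold)
    return (abs(log2_ratio(*luat)) >= threshold
            and abs(log2_ratio(*gene)) >= threshold)


# A pair where both members change at least 2-fold between DN and DP
flagged = both_differential(luat=(7.0, 1.0), gene=(3.0, 1.0))
```

Pairs returning True would correspond to the points highlighted in red in such a scatterplot, under the stated assumptions.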
Figure 4. Co-expression of LUATs and associated genes. A) Left panel: heatmap of expression profiles of LUATs in the 17 indicated tissues. Expression profiles were partitioned using the K-means algorithm (k = 14). Right panel: heatmap of expression profiles of LUAT-associated genes in the same tissues. Lines are ordered according to the corresponding LUATs. B) Distribution of Pearson correlation coefficients between expression values of the indicated gene pairs across the 17 tissues. Red: randomly paired genes (with unidirectional promoters); green: head-to-head protein-coding genes; blue: LUATs and their associated genes. C) Examples of LUATs and associated genes across the 17 tissues. The PolyA RNA-seq signals for the plus and minus strands are shown in blue and red, respectively. The scale is indicated in the top-left of each panel.
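The per-pair statistic in panel B is a Pearson correlation computed across tissues. A minimal sketch, assuming each transcript is represented by a list of expression values in the same tissue order (the function name and the example values are hypothetical):

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant profile
    return cov / math.sqrt(var_x * var_y)


# Toy profiles over three tissues (made-up numbers, for illustration only)
luat_profile = [1.0, 2.0, 3.0]
gene_profile = [2.0, 4.0, 6.0]
r = pearson(luat_profile, gene_profile)
```

Computing this coefficient for every LUAT/gene pair, and for randomly paired genes as a background, yields distributions that can be compared as in the panel.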
Figure 5. Characterization of sequence content and regulatory features of LUAT-associated promoters. Results in A-F and H-I are shown for the three sets of genes described in Figure 2C. A) Average GC content (left panel) and percentage of CpG islands (right panel) around the TSS (bidirectional promoters are centered on the TSS of the gene that was used to match expression with the LUAT-associated genes). B) Boxplot showing the distribution of sizes of the CpG islands overlapping the 2 kb region around the TSS (when several CpG islands were found, the sum was calculated). C) Boxplot showing the distribution of TATA box motif scores found in a 500 bp region around the TSS. D) Percentage of sequences with a conserved element at each position around the TSS. E) Average profiles of the indicated ChIP-seq data in ES cells around the TSS. F) Percentage of genes having a bivalent domain in their promoter, as defined in [37]. Statistical significance was computed using the hypergeometric test. G) Percentage of genes associated with lymphoid-specific transcription factors. The histogram shows the overlap between the indicated transcription factor peaks and regions around the TSS (+/-5 kb) for the genes selected in DP thymocytes. Statistical significance was computed using the hypergeometric test (**p value < 0.01; *p value < 0.05). H) Average GC skew profiles, computed as (#G-#C)/(#G+#C). I) Boxplot showing the distribution of first exon length. J) The normalized number of cleavage sites in antisense orientation identified in two control and two U1 inhibition experiments in ES cells [43] was computed for a 5 kb region upstream of the TSS of genes for which an associated LUAT was expressed in mouse ES cells (FPKM > 1). In panels B, C, I and J, p values of the Wilcoxon rank sum test are shown.
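The GC skew formula given for panel H, (#G-#C)/(#G+#C), can be written as a short sketch. Returning 0.0 for a sequence with no G or C, as well as the window and step sizes, are assumptions made here; the caption does not specify them.

```python
def gc_skew(seq):
    """GC skew of a DNA sequence: (#G - #C) / (#G + #C)."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0  # assumed convention when no G or C is present
    return (g - c) / (g + c)


def gc_skew_profile(seq, window=100, step=100):
    """GC skew in consecutive windows along a sequence (hypothetical sizes)."""
    return [gc_skew(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


skew = gc_skew("GGGC")  # (3 - 1) / (3 + 1) = 0.5
```

Averaging such window profiles over all promoters of a gene set, aligned on the TSS, would give a profile of the kind shown in the panel.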
Figure 6. Chromatin characteristics of LUAT-associated promoters in DP thymocytes. Average profiles of ChIP-seq signals of the indicated histone modifications (A) and general transcription factors (B). The 5 kb region around the TSS of the genes included in the indicated sets is shown: LUAT-associated genes, bidirectional coding genes (coding-coding) and unidirectional genes. The three gene sets were selected ensuring a similar distribution of gene expression levels in DP thymocytes (see Methods). C) Average profile of ChIP-seq signals for H3K79me2 in a rescaled region ranging from the TSS to the end of the first intron. D) Boxplots showing the distribution of reads per bp for H3K79me2 within the region between the TSS and the end of the first intron (p values of the Wilcoxon rank sum test are shown).
Figure 7. LUAT-associated promoters are prone to pervasive transcription. A) Average profiles of Total and PolyA RNA-seq signals in DP thymocytes for the three sets of similarly expressed genes. Signals coming from the plus and minus strands are indicated by solid and dashed lines, respectively. B) Splicing index calculated for the 5′ and middle exons for the three sets of similarly expressed genes in DP thymocytes. C) Boxplots showing the density of Total RNA-seq reads per bp in the same orientation as the matched coding genes and within the first intron of the three groups of genes in DP thymocytes. Statistical significance was assessed by the Mann–Whitney U test. D) Intron/exon ratio of individual genes for the three gene sets in DP thymocytes, assessed by reverse transcription quantitative PCR. Relative transcript levels at the first intron and the last exon of each gene were estimated based on a standard dilution of genomic DNA. Statistical significance was assessed using Wilcoxon rank sum tests. E) Schematic representation of RNA processing at the three different classes of gene loci. Exons are shown by rectangles (or striped rectangles in the case of LUATs). Solid and dotted lines represent immature (unspliced) and processed (spliced) transcripts, respectively. Our results suggest that LUAT-associated genes display an increased rate of immature transcripts.