Sarah Djebali, Carrie A Davis, Angelika Merkel, Alex Dobin, Timo Lassmann, Ali Mortazavi, Andrea Tanzer, Julien Lagarde, Wei Lin, Felix Schlesinger, Chenghai Xue, Georgi K Marinov, Jainab Khatun, Brian A Williams, Chris Zaleski, Joel Rozowsky, Maik Röder, Felix Kokocinski, Rehab F Abdelhamid, Tyler Alioto, Igor Antoshechkin, Michael T Baer, Nadav S Bar, Philippe Batut, Kimberly Bell, Ian Bell, Sudipto Chakrabortty, Xian Chen, Jacqueline Chrast, Joao Curado, Thomas Derrien, Jorg Drenkow, Erica Dumais, Jacqueline Dumais, Radha Duttagupta, Emilie Falconnet, Meagan Fastuca, Kata Fejes-Toth, Pedro Ferreira, Sylvain Foissac, Melissa J Fullwood, Hui Gao, David Gonzalez, Assaf Gordon, Harsha Gunawardena, Cedric Howald, Sonali Jha, Rory Johnson, Philipp Kapranov, Brandon King, Colin Kingswood, Oscar J Luo, Eddie Park, Kimberly Persaud, Jonathan B Preall, Paolo Ribeca, Brian Risk, Daniel Robyr, Michael Sammeth, Lorian Schaffer, Lei-Hoon See, Atif Shahab, Jorgen Skancke, Ana Maria Suzuki, Hazuki Takahashi, Hagen Tilgner, Diane Trout, Nathalie Walters, Huaien Wang, John Wrobel, Yanbao Yu, Xiaoan Ruan, Yoshihide Hayashizaki, Jennifer Harrow, Mark Gerstein, Tim Hubbard, Alexandre Reymond, Stylianos E Antonarakis, Gregory Hannon, Morgan C Giddings, Yijun Ruan, Barbara Wold, Piero Carninci, Roderic Guigó, Thomas R Gingeras.
Abstract
Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.
Year: 2012 PMID: 22955620 PMCID: PMC3684276 DOI: 10.1038/nature11233
Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962
Figure 1. A large majority of Gencode elements are detected by RNA-seq data
Shown are Gencode detected elements in the polyadenylated and non-polyadenylated fractions of cellular compartments (cumulative counts for both RNA fractions and compartments refer to elements present in any of the fractions or compartments). Each box plot is generated from values across all cell lines, thus capturing the dispersion across cell lines. The largest point shows the cumulative value over all cell lines.
Long polyadenylated and non-polyadenylated RNAs
| 1. Expression of Gencode (v7) annotated elements | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Gene type | Detected | Detected | Detected | Detected | Exon | Number of | Number of | Proportion | Number of | Proportion |
| Long non | 22,381 | 8,017 | 6,521 | 5,906 (9,277) | 87.5 | 5,906 | 1,386 | 23.5 | 631 | 10.7 |
| Protein | 288,322 | 194,752 | 59,822 | 18,939 | 98.1 | 18,939 | 1,082 | 5.7 | 10,571 | 55.8 |
| Other | 102,000 | 19,277 | 45,410 | 10,649 | 95.2 | 10,649 | 2,453 | 23.0 | 1,896 | 17.8 |
| Total | 412,703 | 222,046 | 111,753 | 35,494 | 96.7 | 35,394 | 4,921 | 13.9 | 13,098 | 37.0 |
includes pseudogenes, miRNAs, etc
all elements that passed npIDR (0.1)
cumulative detected nucleotide in detected exons / total nucleotides in detected exons
Figure 2. Co-transcriptional splicing
a. Short read mappings for exon-based splicing completion. Read mappings that allow assessment of splicing completion around exons: (a, b, c) reads providing evidence that splicing is complete for the region containing the exon, with either exon inclusion (a, b) or exclusion (c); (d, e) reads providing evidence that splicing of the region containing the exon is not yet complete. The complete Splicing Index (coSI) is the ratio of a+b+c over a+b+c+d+e and can thus be broadly assumed to correspond to the fraction of RNA molecules in which the region containing the exon has already been spliced (see Tilgner et al.[17]). A coSI value of 1 means that splicing is complete, while a value of 0 indicates that splicing has not yet been initiated.
b. Distribution of coSI scores computed on Gencode internal exons: (Top) Distribution in the total chromatin RNA fraction. (Bottom) Distribution in cytosolic polyadenylated RNA fraction.
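The coSI ratio described in panel a can be sketched in a few lines of Python. This is only an illustration of the stated ratio, not the authors' pipeline: the function name `cosi`, the argument names and the choice to return `None` when no informative reads are present are assumptions made here.

```python
def cosi(a, b, c, d, e):
    """Return (a + b + c) / (a + b + c + d + e) for one exon.

    a, b, c: counts of reads supporting completed splicing
             (exon inclusion for a and b, exon exclusion for c).
    d, e:    counts of reads supporting splicing not yet completed.
    Returns None when there are no informative reads (an assumption of
    this sketch; the source does not say how such exons are handled).
    """
    spliced = a + b + c
    total = spliced + d + e
    if total == 0:
        return None
    return spliced / total


# Hypothetical read counts for a single internal exon.
print(cosi(a=12, b=8, c=0, d=3, e=2))
```

A value near 1 for most exons in a fraction would correspond to the "splicing completed" end of the distribution shown in panel b.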
Figure 3. Abundance of gene types in cellular compartments
2D Kernel density plots of nuclear over cytosolic enrichment (Y axis) versus overall gene expression in the whole cell extract (X axis), for protein coding, long non-coding and novel genes over all cell lines. Only genes present in all 3 RNA extracts are displayed, as well as two representative genes (ACTG1 in red and H19 in blue), for which the expression in each individual cell line is shown. The actual values of the estimated Kernel density are indicated by contour lines and color shades.
Figure 4. Isoform expression within a gene
a. Number of expressed isoforms per gene per cell line. Genes tend to express many isoforms simultaneously.
b. Relative expression of the most abundant isoform per gene per cell line. There is generally one dominant isoform in a given condition.
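The two quantities plotted in Figure 4 can be illustrated with a small Python sketch. It assumes a hypothetical mapping from isoform identifier to expression level for one gene in one cell line, and it treats any isoform with expression above `min_expr` as expressed; that threshold, the function name and the identifiers are assumptions of this sketch and are not taken from the source.

```python
def isoform_summary(isoform_expr, min_expr=0.0):
    """Summarise isoform usage for one gene in one cell line.

    isoform_expr: dict mapping isoform id -> expression level.
    min_expr:     isoforms at or below this level count as not expressed
                  (hypothetical cutoff).
    Returns (number of expressed isoforms,
             share of the gene's expression from its top isoform),
    with None as the share when nothing is expressed.
    """
    expressed = {k: v for k, v in isoform_expr.items() if v > min_expr}
    total = sum(expressed.values())
    if total == 0:
        return 0, None
    return len(expressed), max(expressed.values()) / total


# Hypothetical gene with four annotated isoforms, one of them silent.
n_isoforms, major_share = isoform_summary(
    {"t1": 8.0, "t2": 1.5, "t3": 0.5, "t4": 0.0}
)
print(n_isoforms, major_share)
```

In this made-up example several isoforms are expressed at once while one accounts for most of the gene's expression, which is the pattern the two panels describe.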
Short RNAs
| a. Expression of Gencode (v7) annotated small RNA genes | |||||||
|---|---|---|---|---|---|---|---|
| Gene type | Gencode total | Detected genes (% | # Genes expressed | # Genes expressed | miRNA guide | miRNA | Internal fragments |
| miRNA | 1,756 | 497 (28) | 59 (12) | 147 (30) | 454 (454) | 175 (175) | 18 |
| snoRNA | 1,521 | 458 (30) | 73 (16) | 223 (49) | NA | NA | 60 |
| snRNA | 1,944 | 378 (19) | 123 (33) | 41 (11) | NA | NA | 36 |
| tRNA | 624 | 465 (75) | 29 (6) | 197 (42) | NA | NA | 52 |
| Other | 1,209 | 191 (16) | 69 (36) | 24 (13) | NA | NA | 32 |
| Total Gencode | 7,054 | 1,989 (28) | 353 (18) | 632 (32) | NA | NA | 40 |
includes all other Gencode small transcripts biotypes except pseudogenes
all elements that have passed npIDR (0.1)
number of detected miRNAs with an expressed annotated guide (with an annotated guide in mirbase)
number of detected miRNAs with an expressed annotated passenger (with an annotated passenger in mirbase)
short RNA-seq mappings whose 5′ ends fall between 5 bp after the start and 5 bp before the end of a detected gene
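One way to read the internal-fragment footnote is as a simple positional test on the 5′ end of each short RNA-seq mapping. The Python sketch below assumes 1-based inclusive gene coordinates on a single strand and treats the 5 bp limits as inclusive; both choices, and the function names, are assumptions of this sketch rather than details given in the source.

```python
def is_internal_fragment(five_prime_pos, gene_start, gene_end, margin=5):
    """True if a mapping's 5' end lies inside the gene, away from both ends.

    five_prime_pos: genomic position of the mapping's 5' end.
    gene_start, gene_end: inclusive coordinates of the detected gene,
        with gene_start <= gene_end (strand handling is omitted here).
    margin: minimum distance in bp from either gene end (5 in the footnote).
    """
    return gene_start + margin <= five_prime_pos <= gene_end - margin


def count_internal_fragments(five_prime_positions, gene_start, gene_end):
    """Count mappings of one gene that qualify as internal fragments."""
    return sum(
        is_internal_fragment(pos, gene_start, gene_end)
        for pos in five_prime_positions
    )


# Hypothetical gene spanning positions 100-200 with four mappings.
print(count_internal_fragments([100, 104, 105, 150], 100, 200))
```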
Figure 5. Transcription at enhancers
a. The pattern of RNA elements around enhancer predictions[21,37] containing DNase I hypersensitive (HS) sites. The lines represent the average frequency of RNA elements (top: polyadenylated long RNA contigs; middle: CAGE tag clusters; bottom: non-polyadenylated long RNA contigs) in a genomic window around the center of the enhancer prediction as determined by DNase I HS sites. Elements on the plus strand are shown in red, and on the minus strand in blue.
b. Enhancer transcripts differ from promoter transcripts.
The box plots compare features of transcripts at predicted enhancer loci with those at predicted novel intergenic promoters[21] and annotated promoters[8]. H3K4me3, PolyA+ and Nucleus denote the following three ratios: H3K4me3/(H3K4me3 + H3K4me1), polyadenylated/(polyadenylated + non-polyadenylated), Nuclear/(Nuclear + Cytosolic). Enhancers are marked by higher levels of H3K4me1 relative to H3K4me3 than novel or annotated promoters (left). Enhancer transcripts show higher levels of non-polyadenylated (middle) and nuclear (right) RNA relative to promoters.
c. Chromatin state at transcribed enhancers.
Enhancer predictions with evidence of transcription (in blue; CAGE tags present at predicted locus) show a different pattern of histone modifications and higher levels of RNA Polymerase II binding than non-transcribed predictions (red). They are enriched for H3K27 acetylation, H3K4 methylation, H3K79 di-methylation and depleted for H3K27 tri-methylation.
d. Enhancer activity and transcription are cell-type specific.
Loci predicted to be active transcribed enhancers in GM12878 cells show low signal for CAGE tags (top) and for H3K27 acetylation (bottom) in other cell lines.
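The three ratios named in panel b all have the form x/(x + y), so they can be illustrated with one helper. The Python sketch below assumes non-negative signal values per locus; the function and argument names are hypothetical, and returning `None` for a locus with no signal is a choice made here, not something stated in the source.

```python
def share(signal, other):
    """Return signal / (signal + other), or None when both are zero."""
    total = signal + other
    return signal / total if total > 0 else None


def locus_ratios(h3k4me3, h3k4me1, polya_plus, polya_minus,
                 nuclear, cytosolic):
    """Compute the three panel-b ratios for one locus.

    Keys follow the panel labels: the H3K4me3 share of H3K4 methylation
    signal, the polyadenylated share of long RNA signal, and the nuclear
    share of nuclear plus cytosolic RNA signal.
    """
    return {
        "H3K4me3": share(h3k4me3, h3k4me1),
        "PolyA+": share(polya_plus, polya_minus),
        "Nucleus": share(nuclear, cytosolic),
    }


# Hypothetical signal values for one predicted enhancer locus.
print(locus_ratios(h3k4me3=1.0, h3k4me1=3.0,
                   polya_plus=2.0, polya_minus=6.0,
                   nuclear=9.0, cytosolic=1.0))
```

Because each ratio is bounded between 0 and 1, enhancer and promoter loci can be compared on a common scale regardless of their absolute signal.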
Figure 6. Size distribution of intergenic regions
Novel genes increase the proportion of small intergenic regions; ig/as = intergenic / antisense.