Rituparno Sen, Gero Doose, Peter F Stadler.
Abstract
Long non-coding RNAs (lncRNAs) form a substantial component of the transcriptome and are involved in a wide variety of regulatory mechanisms. Compared to protein-coding genes, they are often expressed at low levels and are restricted to a narrow range of cell types or developmental stages. As a consequence, the diversity of their isoforms is still far from being recorded and catalogued in its entirety, and the debate is ongoing about what fraction of non-coding RNAs truly conveys biological function rather than being "junk". Here, using a collection of more than 100 transcriptomes from related B cell lymphomas, we show that lncRNA loci produce a very defined set of splice variants. Some of these variants are so rare that they become recognizable only in the superposition of dozens or hundreds of transcriptome datasets, and they not infrequently include introns or exons that are missing from available genome annotation data. Nevertheless, there is still a very limited number of processing products for any given locus. The combined depth of our sequencing data is large enough to effectively exhaust the isoform diversity: the overwhelming majority of splice junctions that are observed at all are represented by multiple junction-spanning reads. We conclude that the human transcriptome produces virtually no background of RNAs that are processed at effectively random positions, but is, under normal circumstances, confined to a well-defined set of splice variants.
Keywords: GENCODE; lncRNA; lncRNA isoforms; splice junctions
Year: 2017 PMID: 29657294 PMCID: PMC5831916 DOI: 10.3390/ncrna3030023
Source DB: PubMed Journal: Noncoding RNA ISSN: 2311-553X
Long non-coding RNA (lncRNA) genes catalogued by various annotation systems. The average number of exons and introns per transcript is given in the (avg.) column.
| Annotation | Genes | Transcripts | Exons | (Avg.) | Introns | (Avg.) |
|---|---|---|---|---|---|---|
| Ensembl 60 | 1443 | 1703 | 4921 | 2.89 | 3218 | 1.88 |
| Cabili 2011 | 8263 | 14,353 | 33,045 | 2.30 | 18,607 | 1.30 |
| NONCODE 2016 | 160,376 | 233,696 | 536,111 | 2.29 | 305,771 | 1.31 |
| GENCODE v7 | 9580 | 14,984 | 42,060 | 2.81 | 28,998 | 1.94 |
| GENCODE v24 | 15,941 | 28,031 | 68,457 | 2.44 | 45,016 | 1.61 |
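The (avg.) columns appear to be the exon or intron count divided by the number of transcripts. The following minimal sketch checks that reading against the GENCODE v24 row; the helper name `per_transcript` is hypothetical and the two-decimal rounding is an assumption inferred from the table.

```python
# Counts copied from the GENCODE v24 row of the table above.
transcripts = 28_031
exons = 68_457
introns = 45_016


def per_transcript(count: int, n_transcripts: int) -> float:
    """Average number of features per transcript, rounded to two decimals.

    Assumption: the table's (avg.) columns are count / transcripts.
    """
    return round(count / n_transcripts, 2)


avg_exons = per_transcript(exons, transcripts)
avg_introns = per_transcript(introns, transcripts)
print(avg_exons, avg_introns)
```

Both values agree with the 2.44 and 1.61 listed in the table, which supports the per-transcript interpretation for this row.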
Figure 1. Saturation curves for the number of introns as a function of the number of independent transcriptome samples. The lncRNA data refer to the 1441 annotated genes in the lymphoma dataset with at least one intron.
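A saturation curve of this kind can be sketched as the number of distinct introns accumulated as samples are added one by one. The code below is only an illustration on toy data: the intron representation, the function name `saturation_curve`, and the averaging over random sample orders are assumptions, since the caption does not state how the curves were computed.

```python
import random
from statistics import mean

# Hypothetical intron representation: (chromosome, start, end).
Intron = tuple[str, int, int]


def saturation_curve(
    samples: list[set[Intron]], n_orders: int = 100, seed: int = 0
) -> list[float]:
    """Mean number of distinct introns seen after 1, 2, ..., n samples.

    Assumption: the curve is averaged over random orderings of the samples
    so that it does not depend on one arbitrary sample order.
    """
    rng = random.Random(seed)
    totals: list[list[int]] = [[] for _ in samples]
    for _ in range(n_orders):
        order = samples[:]
        rng.shuffle(order)
        seen: set[Intron] = set()
        for i, sample in enumerate(order):
            seen |= sample  # accumulate introns observed so far
            totals[i].append(len(seen))
    return [mean(t) for t in totals]


# Toy data: three samples sharing one intron, with two sample-specific ones.
toy = [
    {("chr1", 100, 200), ("chr1", 300, 400)},
    {("chr1", 100, 200)},
    {("chr1", 100, 200), ("chr1", 500, 600)},
]
curve = saturation_curve(toy)
print(curve)
```

A curve that flattens as samples are added indicates that few new introns remain to be discovered, which is the saturation behaviour the figure is meant to show.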
Figure 2. Scatterplots of intron counts in different expression bins for long intergenic non-coding RNAs (lincRNAs) and coding genes. The diagonal, where the number of introns detected in the lymphoma data equals the number annotated in GENCODE v19, is marked by a line. Points above the line are genes for which we detect more introns than GENCODE v19 annotates. Only genes with at least one intron supported by at least 10 reads are considered here. The right-most column displays the fraction of genes that show more (red), the same (blue), or fewer (green) distinct splice junctions in the lymphoma data compared to GENCODE v19. For the coding genes, these fractions clearly depend on the expression level: for highly expressed mRNAs, we systematically predict more (rare) splice variants, whereas for mRNAs that are expressed at very low levels in the lymphoma data set, GENCODE v19 has more complex gene models. Overall, there are still more introns in our data set than annotated (Wilcoxon test). In contrast, we systematically see more introns in lincRNAs than annotated by GENCODE (Wilcoxon test), independent of the expression level. An alternative presentation of the right-hand panels, showing data binned in 5-percentiles, can be found in the Supplementary Material. RPKM: reads per kilobase of transcript per million mapped reads.
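The more/same/fewer fractions described in this caption can be illustrated with a short sketch on toy data. The gene names, read counts, and function names are hypothetical, and counting only introns that meet the 10-read threshold is an assumption; the caption states the threshold only as a filter on which genes are considered.

```python
from collections import Counter

MIN_READS = 10  # read-support threshold quoted in the caption


def supported_introns(read_counts: dict[str, int], min_reads: int = MIN_READS) -> set[str]:
    """Introns whose junction-spanning read count reaches the threshold."""
    return {intron for intron, n in read_counts.items() if n >= min_reads}


def classify(observed: int, annotated: int) -> str:
    """Compare the observed intron count with the annotated one."""
    if observed > annotated:
        return "more"
    if observed < annotated:
        return "fewer"
    return "same"


def category_fractions(genes: dict[str, tuple[dict[str, int], int]]) -> dict[str, float]:
    """Fraction of considered genes in each category.

    `genes` maps a gene id to (reads per observed intron, annotated intron count).
    Genes without any supported intron are skipped, following the caption.
    """
    counts: Counter[str] = Counter()
    for read_counts, annotated in genes.values():
        observed = supported_introns(read_counts)
        if not observed:
            continue
        counts[classify(len(observed), annotated)] += 1
    total = sum(counts.values())
    return {k: counts[k] / total for k in ("more", "same", "fewer")} if total else {}


# Toy input: hypothetical genes, not data from the study.
toy_genes = {
    "geneA": ({"i1": 25, "i2": 12, "i3": 3}, 1),  # 2 supported vs 1 annotated
    "geneB": ({"i1": 40}, 1),                     # 1 supported vs 1 annotated
    "geneC": ({"i1": 11}, 3),                     # 1 supported vs 3 annotated
    "geneD": ({"i1": 4}, 2),                      # no supported intron, skipped
}
fractions = category_fractions(toy_genes)
print(fractions)
```

Applying the same classification within each expression bin would give the per-bin fractions shown in the right-most column of the figure.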
Figure 3. Two examples with previously unannotated splice junctions and introns. (Top) In ENSG00000267939, we find six introns and two additional exons compared to a single intron described in GENCODE v19. (Bottom) For ENSG00000263470, we find eight introns plus a likely false positive, compared to two introns in GENCODE.
Overlap between lincRNAs expressed in the lymphoma dataset and different versions of the GENCODE annotation.
| GENCODE version | Genes | Transcripts | Exons | (Avg.) | Introns | (Avg.) |
|---|---|---|---|---|---|---|
| v7 | 3296 | 4563 | 12,584 | 2.76 | 8394 | 1.84 |
| v19 | 5257 | 7487 | 18,774 | 2.51 | 12,010 | 1.60 |
| v24 | 4961 | 7318 | 18,685 | 2.55 | 12,202 | 1.67 |