Eddie Luidy Imada, Diego Fernando Sanchez, Ben Langmead, Luigi Marchionni, Leonardo Collado-Torres, Christopher Wilks, Tejasvi Matam, Wikum Dinalankara, Aleksey Stupnikov, Francisco Lobo-Pereira, Chi-Wai Yip, Kayoko Yasuzawa, Naoto Kondo, Masayoshi Itoh, Harukazu Suzuki, Takeya Kasukawa, Chung-Chau Hon, Michiel J L de Hoon, Jay W Shin, Piero Carninci, Andrew E Jaffe, Jeffrey T Leek, Alexander Favorov, Gloria R Franco.
Abstract
Long noncoding RNAs (lncRNAs) have emerged as key coordinators of biological and cellular processes. Characterizing lncRNA expression across cells and tissues is essential for understanding their role in determining phenotypes, including human diseases. We present here FC-R2, a comprehensive expression atlas across a broadly defined human transcriptome, including over 109,000 coding and noncoding genes, as described in the FANTOM CAGE-Associated Transcriptome (FANTOM-CAT) study. This atlas greatly extends the gene annotation used in the original recount2 resource. We demonstrate the utility of the FC-R2 atlas by reproducing key findings from large published studies and by generating new results across normal and diseased human samples. In particular, we (a) identify tissue-specific transcription profiles for distinct classes of coding and noncoding genes, (b) perform differential expression analysis across thirteen cancer types, identifying novel noncoding genes potentially involved in tumor pathogenesis and progression, and (c) confirm the prognostic value of expression for several enhancer lncRNAs in cancer. Our resource is instrumental for the systematic molecular characterization of lncRNAs by the FANTOM6 Consortium. In conclusion, comprising over 70,000 samples, the FC-R2 atlas will empower other researchers to investigate the functions and biological roles of both known coding genes and novel lncRNAs.
Year: 2020 PMID: 32079618 PMCID: PMC7397872 DOI: 10.1101/gr.254656.119
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. Overview of the FANTOM-CAT/recount2 resource development. FC-R2 leverages two public resources: the FANTOM-CAT gene models and recount2. FC-R2 provides expression information for 109,873 genes, both coding (22,110) and noncoding (87,693). The noncoding group encompasses enhancer lncRNAs, promoter lncRNAs, and other lncRNAs.
Figure 2. Tissue-specific expression in GTEx. Log2 expression of three tissue-specific genes (KRT1, NEUROD1, and ESR1) in GTEx data, stratified by tissue type, using FC-R2- and GENCODE-based quantification. The expression profiles from the two quantifications are highly correlated, and each gene is expressed in the expected tissue types (e.g., KRT1 is most expressed in skin, NEUROD1 in brain, and ESR1 in estrogen-sensitive tissues such as uterus, Fallopian tubes, and breast). The correlation for each tissue marker is shown at the top. Center lines, box hinges, and whiskers represent the median, the 25th/75th percentiles, and 1.5 times the interquartile range, respectively. Additional tissue-specific markers are shown in Supplemental Figure S1.
Figure 3. Expression profiles across GTEx tissues. (A) Expression level and tissue specificity across four distinct RNA categories. The y-axis shows, for each gene, the log2 of its maximum expression across GTEx tissues, in transcripts per million (TPM). The x-axis shows expression specificity, measured as the entropy of each gene's median expression across GTEx tissue types. Individual genes are highlighted in the figure panels. (B) Percentage of genes expressed for each RNA category, stratified by GTEx tissue facets. Dots represent the mean across samples within a facet, and error bars represent 99.99% confidence intervals. Dashed lines represent the means across all samples.
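The caption defines tissue specificity as an entropy computed from each gene's median expression across tissues. The sketch below shows one way to carry out that calculation. It assumes Shannon entropy in bits over the normalized per-tissue medians, takes the y-axis value as the highest per-tissue median plus a pseudocount of 1, and uses the hypothetical helpers `tissue_entropy` and `log2_max_median_tpm`. These are illustrative assumptions, not the authors' exact formula. Lower entropy means expression is concentrated in fewer tissues.

```python
import math
from statistics import median

def tissue_entropy(tpm_by_tissue):
    """Shannon entropy (bits) of a gene's median TPM across tissues.

    tpm_by_tissue maps tissue name -> list of per-sample TPM values.
    0 bits = expressed in a single tissue; log2(n_tissues) = uniform expression.
    """
    medians = [median(values) for values in tpm_by_tissue.values()]
    total = sum(medians)
    if total == 0:
        return float("nan")  # not expressed in any tissue: specificity undefined
    # Normalize medians to a probability distribution over tissues;
    # zero terms are skipped (0 * log 0 is taken as 0).
    probs = [m / total for m in medians if m > 0]
    return -sum(p * math.log2(p) for p in probs)

def log2_max_median_tpm(tpm_by_tissue, pseudocount=1.0):
    """log2 of the highest per-tissue median TPM (the pseudocount is an assumption)."""
    return math.log2(max(median(v) for v in tpm_by_tissue.values()) + pseudocount)
```

Consider four tissues. A gene with identical medians in every tissue reaches the maximum entropy of log2(4) = 2 bits. A gene expressed in only one tissue scores 0 bits.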
Figure 4. Differential expression of selected transcripts from distinct RNA classes across tumor types. Box plots of selected genes differentially expressed between tumor and normal samples across all 13 tumor types analyzed. For each tissue of origin, the most up-regulated (left) and most down-regulated (right) gene in each RNA class is shown. Center lines, box hinges, and whiskers represent the median, the upper and lower quartiles, and 1.5 times the interquartile range, respectively. Color coding at the top of the figure indicates the RNA class (red for mRNA, purple for dp-lncRNA, cyan for ip-lncRNA, and green for e-lncRNA). These genes were selected after a global multiple testing correction across all 13 tumor types (see Supplemental Tables S1–S4).
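A global multiple testing correction across all 13 tumor types means the tests from every tumor type are pooled into one family before adjustment, instead of each tumor type being corrected on its own. The sketch below illustrates that pooling with the Benjamini-Hochberg procedure. The caption does not name the correction method, so BH is an assumption here, as are the function names `benjamini_hochberg` and `global_correction` and any tumor codes or p-values used with them.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def global_correction(pvals_by_tumor):
    """Pool (tumor, gene) p-values from all tumor types, correct once, split back."""
    keys = [(tumor, gene) for tumor, genes in pvals_by_tumor.items() for gene in genes]
    flat = [pvals_by_tumor[tumor][gene] for tumor, gene in keys]
    qvals = benjamini_hochberg(flat)
    result = {tumor: {} for tumor in pvals_by_tumor}
    for (tumor, gene), q in zip(keys, qvals):
        result[tumor][gene] = q
    return result
```

Take toy input `{"BRCA": {"g1": 0.01, "g2": 0.04}, "LUAD": {"g1": 0.03, "g3": 0.20}}`. Pooling gives BRCA g1 a q-value of 0.04. Correcting BRCA alone would give it 0.02. The pooled family contains more tests, so the global correction is the more conservative choice.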
Differentially expressed genes in cancer
Figure 5. Processing the FANTOM-CAT genomic ranges. This figure summarizes the disjoining and exon disambiguation steps performed before extracting expression information from recount2 using the FANTOM-CAT gene models. (A) A genomic segment containing three distinct hypothetical genes: gene A, with two isoforms, and genes B and C, with one isoform each. Each box can be interpreted as one nucleotide along the genome, and colors indicate the three genes. (B) Disjoint exon ranges from the example in panel A. Each feature is reduced to a set of nonoverlapping genomic ranges. Disjoint ranges that map back to two or more distinct genes are removed (crossed gray boxes). After the ambiguous ranges are removed, expression information for the remaining ranges is extracted from recount2 and summarized at the gene level.
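The disjoin-and-disambiguate step in panels A and B can be sketched in a few lines. This is a simplified illustration, not the FC-R2 pipeline. It assumes a single chromosome with strand ignored and 1-based inclusive coordinates. A per-base coverage dictionary stands in for the recount2 coverage data. The splitting is analogous to the `disjoin` operation in Bioconductor's GenomicRanges package, but `disjoin` and `gene_counts` here are hypothetical helpers.

```python
from collections import defaultdict

def disjoin(exons):
    """Split exon ranges into non-overlapping segments.

    exons: list of (gene, start, end), 1-based inclusive, one chromosome,
    strand ignored. Returns (start, end, genes) tuples, where genes is the
    set of genes whose exons cover the whole segment.
    """
    # Every range start and every position just past a range end is a breakpoint,
    # so each segment lies either fully inside or fully outside any exon.
    points = sorted({s for _, s, _ in exons} | {e + 1 for _, _, e in exons})
    segments = []
    for left, next_point in zip(points, points[1:]):
        right = next_point - 1
        genes = {g for g, s, e in exons if s <= left and right <= e}
        if genes:  # skip gaps not covered by any exon
            segments.append((left, right, genes))
    return segments

def gene_counts(exons, coverage):
    """Sum per-base coverage over unambiguous disjoint segments for each gene.

    coverage: dict position -> read coverage (a stand-in for recount2 data).
    Segments shared by two or more distinct genes are dropped as ambiguous;
    overlapping isoforms of the same gene stay unambiguous.
    """
    counts = defaultdict(int)
    for left, right, genes in disjoin(exons):
        if len(genes) != 1:
            continue  # ambiguous segment: maps back to several genes
        (gene,) = genes
        counts[gene] += sum(coverage.get(pos, 0) for pos in range(left, right + 1))
    return dict(counts)
```

As an example, take exons A:1-4, A:3-6 (two isoforms of gene A), B:5-8, and C:10-12. Segment 5-6 is shared by genes A and B, so it is dropped. With a uniform coverage of 1 per base, gene A keeps 4 bases, gene B keeps 2, and gene C keeps 3.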