Abstract
Antisense transcripts and other long non-coding RNAs are pervasive in mammalian cells, and some of these molecules have been proposed to regulate proximal protein-coding genes in cis. For example, non-coding transcription can contribute to inactivation of tumor suppressor genes in cancer, and antisense transcripts have been implicated in the epigenetic inactivation of imprinted genes. However, our knowledge is still limited and more such regulatory interactions likely await discovery. Here, we make use of available gene expression data from a large compendium of human tumors to generate hypotheses regarding non-coding-to-coding cis-regulatory relationships with emphasis on negative associations, as these are less likely to arise for reasons other than cis-regulation. We document a large number of possible regulatory interactions, including 193 coding/non-coding pairs that show expression patterns compatible with negative cis-regulation. Importantly, by this approach we capture several known cases, and many of the involved coding genes have known roles in cancer. Our study provides a large catalog of putative non-coding/coding cis-regulatory pairs that may serve as a basis for further experimental validation and characterization.
Keywords: RNAseq; cancer; correlation; lncRNA; transcriptome
Year: 2018 PMID: 29666194 PMCID: PMC5982829 DOI: 10.1534/g3.118.200296
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. General statistics of coding/coding and lncRNA/coding correlations at several genomic distance thresholds, along with specific examples. A) Classification of lncRNAs based on their orientation with respect to a flanking coding gene. LncRNAs marked in red may represent alternative polyadenylation events of coding genes and are therefore removed from downstream analyses. B-D) Enrichment of lncRNAs for significant correlations with proximal coding genes at different genomic distance thresholds: B) 100 Kb, C) 50 Kb, D) 25 Kb. Larger dots indicate cancer types in which the fraction of proximal lncRNA/coding pairs showing significant correlation is significantly higher than in control sets of coding/coding pairs (Fisher's exact test, P < 0.01). E) Examples of known lncRNA/coding regulatory interactions that are also recovered by the correlative analysis. Each box represents the estimated Spearman correlation coefficient between the coding gene and the lncRNA in a given cancer type. The colors correspond to the values depicted in the scale. Crosses indicate correlations at P > 1e-5.
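The enrichment comparison in panels B-D can be illustrated with a minimal sketch. The caption states only the test (Fisher's exact) and the threshold (P < 0.01); the one-sided alternative, the 2x2 table layout, the function name `fisher_exact_greater`, and all counts below are assumptions made for illustration, not values or code from the study.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins,
    i.e. the probability of seeing at least `a` counts in the top-left cell.
    """
    row1 = a + b          # e.g. all lncRNA/coding pairs
    col1 = a + c          # e.g. all significantly correlated pairs
    total = a + b + c + d
    upper = min(row1, col1)
    numerator = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(total, row1)


# Hypothetical counts for one cancer type (not taken from the paper):
# rows = pair type, columns = (significant, not significant).
lnc_sig, lnc_nonsig = 120, 880        # lncRNA/coding pairs
coding_sig, coding_nonsig = 60, 940   # control coding/coding pairs

p_value = fisher_exact_greater(lnc_sig, lnc_nonsig, coding_sig, coding_nonsig)
enriched = p_value < 0.01  # threshold quoted in the caption
```

In practice a library routine such as SciPy's `scipy.stats.fisher_exact` would normally be used; the pure-Python version is shown only to make the underlying hypergeometric sum explicit.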
Figure 2. Enrichment of gene classes within specific subsets of coding/lncRNA and coding/coding pairs showing positive or negative correlation in the majority of cancer types. A) Cancer Gene Census genes. B) Imprinted genes. C) Genes involved in chromatin assembly. D) Transcription factors. LCN is the lncRNA/coding negatively correlated set, LCP is the lncRNA/coding positively correlated set, CCN is the coding/coding negatively correlated set, and CCP is the coding/coding positively correlated set.
Coding genes from the Cancer Gene Census that show negative correlation of expression with a proximal lncRNA in the majority of cancer types, with the correlation being significant (P < 1e-5) in at least one cancer type. +ρ and -ρ indicate the number of cancer types that show significant positive or negative correlation, respectively, at the P < 1e-5 level. TauC and TauL give the Tau expression specificity score, and PhastconC and PhastconL give the phastCons 100-way sequence conservation score, for the coding gene and the lncRNA, respectively.
| Coding | Class | LncRNA | +ρ | -ρ | TauC | TauL | PhastconC | PhastconL |
|---|---|---|---|---|---|---|---|---|
| | TSG | | 1 | 2 | 0.68 | 0.70 | 0.49 | 0.02 |
| | Oncogene | | 0 | 1 | 0.86 | 0.97 | 0.51 | 0.05 |
| | TSG | | 0 | 3 | 0.42 | 0.65 | 0.65 | 0.07 |
| | — | | 1 | 5 | 0.45 | 0.76 | 0.61 | 0.33 |
| | Oncogene | | 0 | 3 | 0.52 | 0.82 | 0.68 | 0.04 |
| | — | | 0 | 1 | 0.91 | 0.94 | 0.43 | 0.03 |
| | — | | 0 | 2 | 0.53 | 0.86 | 0.49 | 0.23 |
| | — | | 0 | 2 | 0.84 | 0.66 | 0.78 | 0.07 |
| | — | | 2 | 5 | 0.95 | 0.80 | 0.15 | 0.09 |
| | — | | 0 | 4 | 0.37 | 0.84 | 0.40 | 0.06 |
| | — | | 0 | 1 | 0.47 | 0.99 | 0.60 | 0.13 |
| | — | | 0 | 2 | 0.41 | 0.90 | 0.89 | 0.55 |
| | — | | 0 | 1 | 0.63 | 0.96 | 0.42 | 0.11 |
| | — | | 0 | 2 | 0.47 | 0.93 | 0.93 | 0.09 |
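The TauC and TauL columns report a Tau expression specificity score. As a hedged illustration, the sketch below uses the commonly cited Tau index (Yanai et al.), which ranges from 0 for uniform expression to 1 for expression in a single condition. That the study uses exactly this definition, computed across cancer types on non-negative expression values, is an assumption; the function name and example profiles are hypothetical.

```python
def tau_specificity(expression):
    """Tau index for a profile of non-negative expression values.

    tau = sum(1 - x_i / max(x)) / (n - 1)
    0 means equal expression everywhere; 1 means a single condition only.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("Tau needs at least two conditions")
    peak = max(expression)
    if peak <= 0:
        raise ValueError("Tau is undefined when nothing is expressed")
    return sum(1 - x / peak for x in expression) / (n - 1)


# Hypothetical mean expression per cancer type (illustrative values only).
broad_profile = [8.0, 8.0, 8.0, 8.0]      # same level in every cancer type
specific_profile = [0.0, 0.0, 5.0, 0.0]   # expressed in one cancer type only

tau_broad = tau_specificity(broad_profile)
tau_specific = tau_specificity(specific_profile)
```

Under this definition, the generally higher TauL values in the table would correspond to lncRNAs whose expression is concentrated in fewer cancer types than that of their coding partners.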
Figure 3. Expression specificity, sequence conservation, and association of lncRNA expression with coding TSS methylation. A) The distribution of cancer type expression specificity distinguishes lncRNAs that are negatively correlated with their proximal coding genes. B) A small subset of lncRNAs that show negative correlation with their proximal coding genes have elevated levels of sequence conservation, indicative of potential functional importance. C) Comparison of MYLK-AS1 expression against MYLK gene expression and MYLK TSS methylation across multiple cancer types. Cancer types where the expression correlation and the expression/methylation correlation are both significant are marked in bold. D) The relationship between MYLK expression, MYLK-AS1 expression, and MYLK promoter methylation across 259 sarcoma samples. Log(cpm) represents gene expression, and log(β) represents coding TSS methylation beta values. LncRNA (+ρ with proximal coding) is the LCP dataset and LncRNA (-ρ with proximal coding) is the LCN dataset.
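The pairwise comparisons in panels C and D (lncRNA expression against coding gene expression, and against TSS methylation) can be sketched as rank correlations. The sketch assumes the Spearman coefficient named in Figure 1E is also the measure used here, which the caption does not state. The helper names and sample values are hypothetical, and no P-value is computed.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-sample values (not data from the study).
lncrna_log_cpm = [1.0, 2.0, 3.0, 4.0, 5.0]
coding_log_cpm = [9.5, 8.1, 6.0, 4.2, 2.0]   # falls as the lncRNA rises
rho = spearman_rho(lncrna_log_cpm, coding_log_cpm)
```

A negative `rho` for expression combined with a positive `rho` between lncRNA expression and TSS methylation would match the pattern the caption highlights, though correlation alone does not establish cis-regulation.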