Sarah C Pyfrom1, Hong Luo1, Jacqueline E Payton2.
Abstract
BACKGROUND: Long non-coding RNAs (lncRNAs) exhibit remarkable cell-type specificity and disease association. The functional versatility of lncRNAs includes epigenetic modification, nuclear domain organization, transcriptional control, regulation of RNA splicing and translation, and modulation of protein activity. However, most lncRNAs remain uncharacterized due to a shortage of predictive tools available to guide functional experiments.
Keywords: Epigenetics; Interactome; Long non-coding RNA; Lymphoma; RNA-binding protein; Transcriptional control; cis-regulation; lncRNA
Year: 2019 PMID: 30767760 PMCID: PMC6377765 DOI: 10.1186/s12864-019-5497-4
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Hundreds of lncRNAs are dysregulated in NHL compared to normal B cells. a Schematic depicts collection, flow cytometry purification, and ‘omics profiling of malignant and normal B lymphocytes from NHL patients and healthy volunteers [21]. b Diagram of NHL lncRNA discovery pipeline. RNA-seq data were analyzed using a de novo processing pipeline to enable identification of novel transcripts (Cufflinks). Novel RNA transcripts were merged with annotated transcripts (Cuffmerge). c Volcano plot highlights lncRNA transcripts with significantly different expression in NHL tumor samples compared to normal B cells (red). Relative expression of lncRNA transcripts is shown as log2 fold change in expression (FPKM) versus –log10 adjusted p value (FDR, Benjamini & Hochberg) for NHL:normal B cells. d Data as in c, with different types of lncRNAs highlighted in different colors (red: annotated lncRNAs, blue: intergenic lincRNAs, green: novel (not annotated) lncRNAs)
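The volcano-plot axes described in panel c can be illustrated with a minimal sketch. It assumes the standard Benjamini-Hochberg step-up adjustment and a pseudocount of 1 FPKM when forming the fold change; the function names and the pseudocount are illustrative assumptions, not taken from the paper's pipeline.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def volcano_point(fpkm_tumor, fpkm_normal, adj_p, pseudocount=1.0):
    """Return (log2 fold change, -log10 adjusted p) for one transcript.

    The pseudocount is an assumption that avoids division by zero.
    """
    log2_fc = math.log2((fpkm_tumor + pseudocount) / (fpkm_normal + pseudocount))
    return log2_fc, -math.log10(adj_p)
```

A transcript would then be highlighted when its adjusted p value falls below the chosen FDR threshold; the threshold itself is not stated in the caption.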
Fig. 2 An overview of the PLAIDOH pipeline and algorithm output. a Schematic of the single input file required by PLAIDOH to identify all possible lncRNA/Coding gene Pairs (LCPs) in the user’s dataset. b Overview of the datasets that are used by PLAIDOH to annotate lncRNAs and predict activity based on genomic and epigenomic context. c Abridged example of the primary PLAIDOH output table, showing the three scores PLAIDOH calculates for each LCP as well as the 30+ additional columns of valuable information about the lncRNA and coding gene in each LCP. d Examples of graphs output by PLAIDOH as part of its standard run settings. The three LCPs and lncRNA1 diagrammed in a are highlighted in red and green, respectively
Fig. 3 PLAIDOH reveals global patterns of LCP co-expression. LncRNA expression (log10 FPKM) (a & c) or LCP correlation (−log10 Spearman adjusted p-value) (b & d) are plotted relative to genomic distance from each lncRNA to a coding gene (a & b) or the nearest enhancer (c & d) within 400 kb regions flanking the lncRNA. LCPs with positive Spearman correlation coefficients (rho) are plotted in the upper half of each plot; those with negative Spearman correlation coefficients (rho) are plotted in the lower half. Black points highlight LCPs with adjusted Spearman p-values < 0.05 or FPKM > 1
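The Spearman coefficient (rho) used to score each LCP is the Pearson correlation of the rank-transformed expression values. The sketch below is a generic, standard-library illustration of that statistic with average ranks for ties; it is not PLAIDOH's own code, and the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one vector is constant")
    return cov / (var_x * var_y) ** 0.5
```

Here `x` and `y` would be the per-sample expression values of the lncRNA and the coding gene in one pair; the sign of rho decides whether the pair is drawn in the upper or lower half of the plot.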
Fig. 4 PLAIDOH ranks lncRNAs by number and fraction of correlated coding genes. a Contour plot shows the frequency of significant LCP numbers as a function of the number of all possible coding gene pairs for each lncRNA. Color indicates increasing log10 frequency of LCPs at each x,y data point (white-blue-green). Highlighted in red are two LCPs in which single lncRNAs are each highly correlated with large clusters of coding genes. b Genomic maps of the two LCPs shown in a. c Z-scores of LCP correlation coefficients plotted by distance between each lncRNA and coding gene pair; positively correlated LCPs are plotted in the left panel and negatively correlated LCPs are in the right panel. Highlighted in red are LCPs in which single lncRNAs are correlated with only one coding gene. d Genomic maps of the LCPs shown in c
Fig. 5 LncRNAs demonstrate common or cancer-type specific correlation profiles. a Venn diagram shows the number of significant LCPs shared or unique among five TCGA cancer types. Significant = Spearman correlation adj p < 0.05 for LCP expression. b Binary heatmap shows the pattern of correlation significance for LCPs across TCGA cancer types. Spearman adj p < 0.05 (purple); p > 0.01 (white). c Heatmap of LCP Spearman correlation p-values for expression of AC096992.2 and each of the genes within 400 kb. Spearman adj p < 0.01 (purple); p < 0.05 (blue); p ≥ 0.05 (white). d Bar graph shows expression of AC096992.2 in TCGA cancer types. e Box plot shows Spearman correlation coefficients (rho) for expression of AC096992.2 and all genes within 400 kb flanking. f Heatmap of LCP Spearman correlation p-values for expression of AC138207.5 and each of the genes within 400 kb flanking. Colors as in c. g Bar graph shows expression of AC138207.5 in TCGA cancer types. h Box plot shows Spearman correlation coefficients (rho) for expression of AC138207.5 and all genes within 400 kb flanking
Fig. 6 PLAIDOH ranks LCPs by likely transcriptional regulatory mechanism, inferred from Enhancer and LncRNA Cis-regulatory Scores. a-f Plots show LCPs from ENCODE cell lines (a-c) or TCGA DLBC samples (d-f). a & d Plots show LCPs ranked by increasing LncRNA Transcript Cis-regulatory Scores. Red points are known cis-acting lncRNAs; in green are novel LCPs with the highest scores and/or containing known lymphoma oncogenes. b & e As in a & d, but ranked by increasing Enhancer Scores. Highlighted in red are known enhancer-associated lncRNAs; in green are novel LCPs with the highest scores and/or containing known lymphoma oncogenes. c & f XY plots show Enhancer versus LncRNA Transcript Cis-regulatory Scores segregating LCPs. Dotted lines in a-f reflect score cut-offs based on the geometric inflection points calculated from the data in a, b, d & e. Red and green data points are from a & b (for c), or d & e (for f)
Fig. 7 PLAIDOH ranks lncRNAs using biological and experimental data from RNA binding protein interaction. a Interaction matrix of lncRNAs and RNA Binding Proteins. Binding events of concordantly localized lncRNAs and RBPs are colored by subcellular localization to the nucleus (blue), cytoplasm (red) or both nucleus and cytoplasm (purple). Discordantly localized interactions are colored grey. No evidence of binding is white. b Plot shows lncRNA expression versus RBP binding-site density per kilobase of RNA transcript for each lncRNA/RBP interaction shown in panel a. Data point size is scaled to RBP expression level and subcellular localization interactions are colored to match panel a. Labeled dots highlight previously published and validated binding of RBP/lncRNA pairs
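The binding-site density on the x-axis of panel b is a count of RBP binding sites normalized per kilobase of transcript. A minimal sketch follows, assuming half-open BED-style intervals and a simple overlap test against the transcript span; how PLAIDOH itself assigns eCLIP peaks to transcripts is not described here, so the helper names and the overlap rule are assumptions.

```python
def count_overlapping_sites(transcript, peaks):
    """Count peaks that overlap a transcript.

    transcript: (chrom, start, stop); peaks: iterable of (chrom, start, stop).
    Intervals are treated as half-open [start, stop), as in BED files.
    """
    chrom, start, stop = transcript
    return sum(1 for c, s, e in peaks if c == chrom and s < stop and e > start)


def binding_site_density(n_sites, transcript_length_bp):
    """Return binding sites per kilobase of transcript."""
    if transcript_length_bp <= 0:
        raise ValueError("transcript length must be positive")
    return n_sites / (transcript_length_bp / 1000.0)


# Hypothetical coordinates, for illustration only.
example_transcript = ("chr1", 1000, 4000)
example_peaks = [
    ("chr1", 900, 1100),   # overlaps the 5' end
    ("chr1", 2000, 2050),  # internal
    ("chr1", 4000, 4100),  # starts exactly at the stop, so no overlap
    ("chr2", 1500, 1600),  # different chromosome
]
example_sites = count_overlapping_sites(example_transcript, example_peaks)
example_density = binding_site_density(example_sites, 4000 - 1000)
```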
Fig. 8 Validation of PLAIDOH’s functional predictions for a lncRNA highly expressed in human NHL. a UCSC Genome browser view of H3K4me3 ChIP-seq (NHL) and RNA-seq (NHL, normal B cells) for the RP11-960L18.1 locus. b XY plot shows Enhancer versus LncRNA Transcript Cis-regulatory Scores in primary NHL samples; highlighted are RP11-960L18.1 and the two most proximal coding genes. c Expression of PLCG2 and RP11-960L18.1 measured by qRT-PCR in the HBL1 lymphoma B cell line treated with scramble or one of two RP11-960L18.1 shRNAs. d Western blot for PLCG2 or GAPDH in HBL1 cells treated with scramble or one of two RP11-960L18.1 shRNAs. Triangles indicate relative number of cells loaded on the gel. e Subcellular localization of RNA transcripts determined by cell fractionation of control (WT) HBL1 cells followed by qRT-PCR (CP: cytoplasm, NC: nuclear, NP: nucleoplasm, CA: chromatin-associated). f Plot shows lncRNA expression versus log10 RBP binding-site density per kilobase of RNA transcript for each lncRNA/RBP interaction; highlighted are RBPs that bind RP11-960L18.1. Data point size is scaled to RBP expression level and subcellular localization interactions are colored as in Fig. 7
Example PLAIDOH input table. Example header and first two lines of the modified bedfile required from the user as an input file
| #CHR | START | STOP | NAME | TYPE | SAMPLE1 | SAMPLEN |
|---|---|---|---|---|---|---|
| chr1 | 112 | 256 | DHX9 | protein_coding | 0.675 | 5.89 |
| chr1 | 778 | 4334 | AC00896.1 | lncRNA | 89 | 4 |
| chr1 | 334 | 566 | RP9911.3 | antisense_rna | 8.3 | 0.33 |
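As an illustration of how such an input table could be turned into candidate lncRNA/coding gene pairs, the sketch below reads a tab-delimited version of the example rows and pairs every non-coding entry with each protein-coding gene on the same chromosome within a 400 kb window (the flanking distance mentioned in the figure captions). The tab delimiter, the rule that any `TYPE` other than `protein_coding` counts as a lncRNA, and all function names are assumptions made for this example, not PLAIDOH's documented behavior.

```python
import csv
import io

WINDOW_BP = 400_000  # flanking distance, taken from the figure captions

EXAMPLE = (
    "#CHR\tSTART\tSTOP\tNAME\tTYPE\tSAMPLE1\tSAMPLEN\n"
    "chr1\t112\t256\tDHX9\tprotein_coding\t0.675\t5.89\n"
    "chr1\t778\t4334\tAC00896.1\tlncRNA\t89\t4\n"
    "chr1\t334\t566\tRP9911.3\tantisense_rna\t8.3\t0.33\n"
)


def read_records(handle):
    """Parse the modified BED table into sample names and row dicts."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[5:]
    records = []
    for row in reader:
        if not row:
            continue
        records.append({
            "chrom": row[0],
            "start": int(row[1]),
            "stop": int(row[2]),
            "name": row[3],
            "type": row[4],
            "expr": [float(v) for v in row[5:]],
        })
    return samples, records


def interval_gap(a, b):
    """Distance in bp between two intervals; 0 if they overlap."""
    return max(0, max(a["start"], b["start"]) - min(a["stop"], b["stop"]))


def candidate_pairs(records, window=WINDOW_BP):
    """Return (lncRNA, coding gene, gap) for same-chromosome pairs in range."""
    coding = [r for r in records if r["type"] == "protein_coding"]
    noncoding = [r for r in records if r["type"] != "protein_coding"]
    pairs = []
    for lnc in noncoding:
        for gene in coding:
            if lnc["chrom"] != gene["chrom"]:
                continue
            gap = interval_gap(lnc, gene)
            if gap <= window:
                pairs.append((lnc["name"], gene["name"], gap))
    return pairs


samples, records = read_records(io.StringIO(EXAMPLE))
pairs = candidate_pairs(records)
```

With the three example rows, both non-coding entries pair with `DHX9`; each pair's per-sample expression vectors could then be passed to a correlation test.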
Data sources for each PLAIDOH default file. Data names, sources and descriptions for all of the metrics utilized by PLAIDOH to annotate lncRNA and gene function
| Data source | Data Type | Description | URL |
|---|---|---|---|
| Enhancer Atlas | Enhancer boundaries | Chromosomal positions for enhancer boundaries from all available tissue samples were downloaded in May 2018. | |
| Super Enhancer Archive | Super-enhancer boundaries | Chromosomal positions for super-enhancer boundaries from all available tissue samples were downloaded in May 2018. | |
| ENCODE | Histone ChIP-seq (p-value of peaks) | H3K4ME3, H3K4ME1, and H3K27AC ChIP-seq experiment bed files for all available cell lines were downloaded from the ENCODE experiment database in May 2018. All bed files were modified to contain the cell line, histone modification and peak p-value as a column entry. | |
| ENCODE | Cell Fraction Expression (proportion of total RPKM) | FPKMs for all transcripts in nuclear and cytoplasmic RNA-seq for GM12878 cells were downloaded in March 2018. The fraction of total reads in the nuclear fraction for each transcript was calculated. | |
| ENCODE | RNA-Binding Protein (eCLIP) (interacting partners, number RBP bound, number RBP binding sites) | eCLIP experiment bed files for replicates 01 and 02 from K562 and HepG2 cells for all available RBPs were downloaded from the ENCODE experiment database in May 2018. All bed files were modified to contain the RBP gene name, replicate number and cell line as column entries. | |
| ENCODE | ChIA-PET (boundaries of interacting fragments, score) | POLR2A ChIA-PET interactions from K562 and MCF-7 cells were downloaded in April 2018. | |
| ENSEMBL BIOMART | Gene Ontology, Transcript strand, Transcript Name | A biomart query was performed in May 2018. “Gene description”, “Strand”, and “Gene name” were selected and downloaded for all hg19 transcripts. | |
| Ren Lab Hi-C Project | Topologically Associating Domain (TAD) Boundaries | Chromosomal positions were downloaded from two combined replicates of Human ES Cells. | |
| RBP Image Database | Sub-cellular localization of RNA Binding Proteins | Localization of all RBPs in HepG2 cells was downloaded and pruned to show only Nuclear and Cytoplasmic compartments. | |
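The cell-fraction metric in the table, the share of a transcript's total signal found in the nuclear fraction, can be sketched as below. This assumes the fraction is computed as nuclear FPKM divided by the sum of nuclear and cytoplasmic FPKM, and that transcripts with no signal in either fraction are left undefined; both choices, and the function name, are assumptions made for illustration.

```python
def nuclear_fraction(nuclear_fpkm, cytoplasmic_fpkm):
    """Return the nuclear share of total expression, or None if undetected."""
    total = nuclear_fpkm + cytoplasmic_fpkm
    if total == 0:
        # No signal in either fraction: the proportion is undefined.
        return None
    return nuclear_fpkm / total
```

For example, a transcript at 30 FPKM in the nuclear fraction and 10 FPKM in the cytoplasmic fraction would get a nuclear fraction of 0.75 under this definition.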