Mahdi Moradi Marjaneh, Jonathan Beesley, Tracy A O'Mara, Pamela Mukhopadhyay, Lambros T Koufariotis, Stephen Kazakoff, Nehal Hussein, Laura Fachal, Nenad Bartonicek, Kristine M Hillman, Susanne Kaufmann, Haran Sivakumaran, Chanel E Smart, Amy E McCart Reed, Kaltin Ferguson, Jodi M Saunus, Sunil R Lakhani, Daniel R Barnes, Antonis C Antoniou, Marcel E Dinger, Nicola Waddell, Douglas F Easton, Alison M Dunning, Georgia Chenevix-Trench, Stacey L Edwards, Juliet D French.
Abstract
BACKGROUND: Genetic variants identified through genome-wide association studies (GWAS) are predominantly non-coding and typically attributed to altered regulatory elements such as enhancers and promoters. However, the contribution of non-coding RNAs to complex traits is not clear.
Year: 2020 PMID: 31910864 PMCID: PMC6947989 DOI: 10.1186/s13059-019-1876-z
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Identification of mencRNAs from breast cancer GWAS risk regions. a Schematic of the RNA CaptureSeq experimental design. Oligonucleotide probes were tiled across intronic and intergenic regions within 1.5-Mb intervals surrounding breast cancer risk regions (capturing ~138 Mb or 4.3% of the human genome). The probes were hybridized to cDNAs from breast-derived cell lines and tissues, resulting in capture and enrichment of low-abundance transcripts in target regions that were then sequenced. The sequencing reads were de novo assembled, mapped, and quantified. b The number of transcripts captured from each RNA CaptureSeq library. The libraries included nine breast-derived cell lines, four breast tumor (BT) samples, and four normal breast (NB) samples. Four non-captured libraries were also sequenced. c Distribution of mencRNA transcript length. Pooled captured transcripts from all libraries were binned based on their transcript lengths. d Hierarchical clustering of RNA CaptureSeq libraries based on mencRNA expression profiles. ER-positive breast cancer cell lines and tumors are shown in red, ER-negative breast cancer cell lines are shown in blue, and normal breast cell lines and tissues are shown in black. NC non-captured, NB normal breast, BT breast tumor. The y-axis of the dendrogram represents a distance measure between the clusters. e Expression distribution of captured mencRNA transcripts versus protein-coding transcripts. Multi-exonic captured transcripts with maximum FPKM ≥ 0.5 were mapped in TCGA RNA-seq data, and their average expression across the TCGA tumors was compared to that of GENCODE protein-coding genes. The y-axis represents the frequency of transcripts with a given expression value, shown as log2 (average FPKM) on the x-axis. f Principal component analysis (PCA) of captured transcripts in TCGA normal breast and matched tumor samples. Scaled, centred, and normalized expression of the captured transcripts was analyzed for the first (x-axis; PC1) and second (y-axis; PC2) principal components. Each dot represents the expression profile of an individual sample. g PCA of the captured transcripts in different PAM50 breast cancer subtypes. h Comparison of tissue-specific expression of captured mencRNA versus protein-coding transcripts. Multi-exonic captured transcripts with maximum FPKM ≥ 0.5 and GENCODE protein-coding genes were mapped in TCGA RNA-seq data for primary tumors from seven different cancer types. For each gene, the tissue specificity index (Tau) was calculated, with values of 0 and 1 indicating broad and tissue-specific expression, respectively. The y-axis represents the frequency of transcripts with a given Tau value
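The tissue specificity index in panel h can be illustrated with a minimal sketch. The caption does not state which formula or expression transformation was used; the code below assumes the commonly used definition of Tau, in which each tissue's expression is divided by the gene's maximum across tissues and the values (1 - ratio) are summed and divided by (n - 1). The function name `tau_index` and the handling of unexpressed genes are illustrative assumptions, not taken from the source.

```python
def tau_index(expression):
    """Tissue specificity index (Tau) for one gene.

    `expression` holds one non-negative value per tissue (for example an
    average FPKM per cancer type). Assumed definition:
        tau = sum(1 - x_i / max(x)) / (n - 1)
    """
    values = [max(float(x), 0.0) for x in expression]  # clamp negatives to 0
    n = len(values)
    if n < 2:
        raise ValueError("Tau needs expression values from at least two tissues")
    peak = max(values)
    if peak == 0.0:
        # Assumption: a gene with no expression anywhere is reported as 0.
        return 0.0
    return sum(1.0 - v / peak for v in values) / (n - 1)


# Seven tissues, mirroring the seven cancer types mentioned in the caption.
broad = tau_index([5, 5, 5, 5, 5, 5, 5])      # equal expression everywhere
specific = tau_index([10, 0, 0, 0, 0, 0, 0])  # expressed in a single tissue
```

Under this definition, identical expression in every tissue gives 0 and expression confined to one tissue gives 1, which matches the interpretation stated in the caption.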
Fig. 2 Enrichment of CCVs in mencRNA exons. a The overlap of CCVs with non-coding (ncRNA) and protein-coding transcript features. The number of CCVs directly overlapping a feature is shown in blue, and gray bars show the expected values based on overlap with 105 randomly generated interval sets. Error bars show the 95% confidence intervals of the mean. The significance of the enrichment is expressed as p values, calculated by dividing the number of random samples showing equal or greater overlap than the observed by the total number of permutations (*p < 0.05). b WashU genome browser showing annotated GENCODE genes (blue) and mencRNAs (black) within the 2q14.2 risk region. The XLOC-130152 and XLOC-130206 mencRNAs are highlighted in red. Risk signals 1–4 are numbered, and the CCVs within each signal are shown as colored vertical lines. The dashed gray outlines highlight the CCVs and relevant mencRNAs. c Zoomed-in view of signals 1–3 CCVs, XLOC-130152, and XLOC-130206
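The p value described in panel a is an empirical permutation p value: the fraction of random interval sets whose overlap with the CCVs is at least as large as the observed overlap. The following is a minimal sketch of that calculation only; the helper names `count_overlaps` and `empirical_p_value`, the inclusive interval convention, and the toy numbers are illustrative assumptions, and how the random interval sets were generated is not specified in the caption.

```python
def count_overlaps(variant_positions, intervals):
    """Count variants that fall inside at least one (start, end) interval.

    Intervals are treated as inclusive on both ends (an assumption).
    """
    return sum(
        any(start <= pos <= end for start, end in intervals)
        for pos in variant_positions
    )


def empirical_p_value(observed, null_overlaps):
    """Fraction of random samples with overlap >= the observed overlap."""
    if not null_overlaps:
        raise ValueError("at least one random sample is required")
    hits = sum(1 for overlap in null_overlaps if overlap >= observed)
    return hits / len(null_overlaps)


# Toy example: observed overlap of 5, ten random interval sets.
null = [1, 2, 5, 6, 3, 0, 7, 2, 1, 4]
p = empirical_p_value(5, null)  # 3 of 10 random samples reach 5 or more
```

As written, this estimator returns 0 when no random sample reaches the observed overlap. Some analyses add one to both numerator and denominator to avoid reporting an exact zero; the caption describes the uncorrected ratio, so that is what the sketch computes.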
Fig. 3 mencRNAs with eVariants in an exon. a Regional XLOC-142280 eQTL association plot. Red dots indicate CCVs within the region. b WashU genome browser showing annotated GENCODE genes (blue) and mencRNAs (black) within the 2q31.1 risk region. The XLOC-142280 mencRNA is highlighted in red. CCVs are shown as red vertical lines. The dashed gray outline highlights the XLOC-142280 mencRNA. c Zoomed-in view of CCVs and XLOC-142280. d Box plot showing the expression of XLOC-142280 in ER-negative versus ER-positive breast tumor samples from TCGA RNA-seq data
Fig. 4 mencRNAs linked to distal CCVs at 16q12. a Regional XLOC-93918 eQTL association plot. Red dots indicate CCVs within the region. b WashU genome browser showing annotated GENCODE genes (blue) and mencRNAs (black) within the 16q12.2 risk region. The XLOC-93918 mencRNA is highlighted in red. CCVs are shown as red vertical lines. The ATAC-seq data are shown as a blue histogram, histone modification ChIP-seq data are shown as black histograms, and CHi-C chromatin interactions from the B80T5 breast cell line are shown as arcs. Red arcs depict chromatin looping between CCVs and the XLOC-93918 promoter region. c Zoomed-in view of CHi-C interaction and XLOC-93918
Fig. 5 mencRNAs linked to distal CCVs at 6q25. a WashU genome browser showing annotated GENCODE genes (blue) and mencRNAs (black) within the 6q25 risk region. mencRNAs whose promoters participate in chromatin interactions with CCVs are highlighted in blue, green, and red. CCVs are shown as colored vertical lines. The ATAC-seq data are shown as a blue histogram, histone modification ChIP-seq data are shown as black histograms, and CHi-C chromatin interactions from the MCF7 breast cancer cell line are shown as arcs. Colored arcs depict chromatin looping between CCVs and color-matched mencRNA promoter regions. b Correlation between expression of the three captured transcripts and ESR1 in the TCGA cohort. Each dot in the scatterplots represents one individual with breast cancer, with gene expression values plotted as log2 (FPKM) on the x- and y-axes. c Boxplot comparing absolute correlation coefficients between looped and non-looped pairs of mencRNAs and nearby protein-coding genes (within 1 Mb up- and downstream)
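The quantities in panels b and c can be sketched as a correlation between log2-transformed FPKM values, followed by taking the absolute value for the looped versus non-looped comparison. The caption does not say which correlation coefficient or pseudocount was used; the code below assumes a Pearson coefficient and a pseudocount of 1, and the names `log2_fpkm` and `pearson_r` as well as the example values are hypothetical.

```python
import math


def log2_fpkm(values, pseudocount=1.0):
    """log2-transform FPKM values; the pseudocount (assumed) avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical FPKM values for one mencRNA and one nearby gene in five tumors.
mencrna = log2_fpkm([0.0, 1.0, 3.0, 7.0, 15.0])
gene = log2_fpkm([31.0, 15.0, 7.0, 3.0, 1.0])
abs_r = abs(pearson_r(mencrna, gene))  # sign is dropped, as in panel c
```

Taking the absolute value means that strongly negative and strongly positive co-expression count equally when looped and non-looped pairs are compared.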
Fig. 6 mencRNAs targeted by multiple risk signals. a Regional XLOC-112072 eQTL association plot. Red dots indicate signal 3 CCVs. b WashU genome browser showing annotated GENCODE genes (blue) and mencRNAs (black) within the 18q11 risk region. The XLOC-112072 mencRNA is highlighted in red. Risk signals 1–3 are numbered, and the CCVs within each signal are shown as colored vertical lines. The ATAC-seq data are shown as blue histograms, histone modification ChIP-seq data are shown as black histograms, and CHi-C chromatin interactions from the T47D and B80T5 breast cell lines are shown as arcs. Red arcs depict chromatin looping between CCVs and the XLOC-112072 promoter region. c Zoomed-in view of CCVs, CHi-C interaction, and XLOC-112072