Paula Restrepo, Mercedeh Movassagh, Nawaf Alomran, Christian Miller, Muzi Li, Chris Trenkov, Yulian Manchev, Sonali Bahl, Stephanie Warnken, Liam Spurr, Tatiyana Apanasovich, Keith Crandall, Nathan Edwards, Anelia Horvath.
Abstract
Asymmetric allele content in the transcriptome can be indicative of functional and selective features of the underlying genetic variants. Yet, imbalanced alleles, especially from diploid genome regions, are poorly explored in cancer. Here we systematically quantify and integrate the variant allele fraction from corresponding RNA and DNA sequence data from patients with breast cancer acquired through The Cancer Genome Atlas (TCGA). We test for correlation between allele prevalence and functionality in known cancer-implicated genes from the Cancer Gene Census (CGC). We document significant allele-preferential expression of functional variants in CGC genes and across the entire dataset. Notably, we find frequent allele-specific overexpression of variants in tumor-suppressor genes. We also report a list of over-expressed variants from non-CGC genes. Overall, our analysis presents an integrated set of features of somatic allele expression and points to the vast information content of the asymmetric alleles in the cancer transcriptome.
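As an illustration of the quantity the abstract refers to, the sketch below computes a variant allele fraction (VAF) from read counts using the standard definition: variant-supporting reads divided by all reads covering the site. The RNA-to-DNA ratio is an assumption: this excerpt does not define VR:D, so the function `rna_to_dna_ratio` only shows one plausible reading of it. The function names and the read counts are hypothetical and are not taken from the paper.

```python
def vaf(variant_reads: int, reference_reads: int) -> float:
    """Variant allele fraction: variant reads over all reads at the site."""
    total = variant_reads + reference_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return variant_reads / total


def rna_to_dna_ratio(vaf_rna: float, vaf_dna: float) -> float:
    """Ratio of RNA VAF to DNA VAF (an assumed reading of VR:D)."""
    if vaf_dna <= 0:
        raise ValueError("DNA VAF must be positive")
    return vaf_rna / vaf_dna


# Hypothetical read counts for a single somatic variant (illustrative only).
vaf_dna = vaf(variant_reads=25, reference_reads=75)  # 25 of 100 DNA reads
vaf_rna = vaf(variant_reads=50, reference_reads=50)  # 50 of 100 RNA reads
ratio = rna_to_dna_ratio(vaf_rna, vaf_dna)

print(f"VAF (DNA) = {vaf_dna:.2f}, VAF (RNA) = {vaf_rna:.2f}, ratio = {ratio:.2f}")
```

Under this reading, a ratio above 1 would mean the variant allele is more prevalent in the transcriptome than in the genome, and a ratio below 1 the opposite. Any cutoffs used to call a variant over- or under-expressed are not given in this excerpt and are deliberately left out.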
Year: 2017 PMID: 28811643 PMCID: PMC5557904 DOI: 10.1038/s41598-017-08416-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Major steps of the analysis of allele distribution for somatic variants in our dataset. VR:D was analyzed for correlation with different functional mutation groups in oncogenes, tumor suppressors, and the rest of the genes. SOM-E and SOM-L variants were compared with the rest of the somatic mutations for predicted pathogenicity and location in functional motifs such as transcription and splice factor binding sites, and highly preserved sequences.
Figure 2. IGV visualization of somatic mutations that are over-expressed (SOM-E, middle) or under-expressed (SOM-L, right) compared to the expected allele distribution for a germline heterozygous variant (left); the heterozygosity is reflected through color-coding of the summary flag on the top of each panel. The gray lines represent reads, and the colored letters show differences from the reference.
Figure 3. (A–C) Distribution of VAFtRNA (blue) and VR:D (red) in the subgroups of missense (A), non-coding (B) and stop-codon (C) variants. The X axis shows the number of variants in each functional category. A positive correlation is seen in all three mutation groups. (D) Distribution of SOM-E and SOM-L expression status with regard to the predicted effect on protein function in the entire set, in CGC variants, and in non-CGC variants. (E) VR:D for non-coding, missense and stop-codon variants across the entire dataset. Clearly different VR:D distributions are seen among the functional subtypes, with the missense mutations showing higher VR:D, indicative of higher allele expression of potentially functional transcripts. (F) VR:D for pathogenic and neutral variants as predicted by FATHMM. The difference in the distribution is due to the larger proportion of pathogenic mutations with higher VR:D.
Figure 4. VR:D in CGC vs non-CGC genes (A), in missense variants (B), in stop-codon variants (C), and in non-coding variants (D).
SOM-E mutations in non-CGC genes: location within motifs recognized by transcription factors (TFBS, transcription factor binding sites) and splicing factors (SFBS, splicing factor binding sites).
| Gene | Chr:pos (hg38) | Function | TFBS | SFBS |
|---|---|---|---|---|
| TMEM51 | chr1:15215414C > A | missense | none | none |
| NBPF3 | chr1:21481730T > C | non-coding | none | none to SRp40 |
| EPHA10 | chr1:37720517C > T | missense | none | none |
| KIF26B | chr1:245609349C > G | missense | none to V$LRH1_Q5_01 | none |
| ILDR1 | chr3:122001432G > A | non-coding | V$PPARG_02 | none to Sam68, SLM-2 |
| MUC20 | chr3:195725818C > T | non-coding | V$CREB1_Q6 | hnRNP DL, SRp55 to none |
| ZNF518B | chr4:10445288C > G | missense | V$PBX1_02 | none |
| BBS7 | chr4:121828063C > G | non-coding | none | hnRNP, HuB, MBNL1 to TIA-1 |
| OTUD4 | chr4:145146395G > A | missense | none | SRp40 to hnRNPA1 |
| SH3RF1 | chr4:169136534G > A | non-coding | none | MBNL1 to SRp40 |
| SORBS2 | chr4:185589715C > T | missense | none | YB-1 to SAM68 |
| MYO10 | chr5:16877688C > G | missense | V$YY1_01 | none |
| MSH3 | chr5:80768937T > A | missense | V$STAT3_01 | none |
| PCDHB5 | chr5:141136316C > T | non-coding | none | none |
| GRPEL2 | chr5:149351223G > A | missense | V$YY1_02 | none |
| TCOF1 | chr5:150376236C > T | missense | none | SRp20/Nova-1/Nova-2 to none |
| MDN1 | chr6:89700782A > T | non-coding | V$SMAD4_Q6_01 | none |
| TNRC18 | chr7:5316065C > A | non-coding | none | none |
| WDR60 | chr7:158871385A > G | missense | none | none to SC35,SF2/ASF,hnRNPA1 |
| FZD3 | chr8:28527405G > A | non-coding | none | none |
| DAPK1 | chr9:87706999C > T | missense | V$NFAT_Q6 | none |
| COL27A1 | chr9:114309301C > G | missense | none to V$MYOGENIN_Q6_01 | none |
| PLCE1 | chr10:94270600A > C | missense | none to V$NFAT1_Q4 | SF2/ASF,hnRNPA1 to none |
| PDCD11 | chr10:103441838A > C | missense | none | YB-1 to SRp40 |
| MUC6 | chr11:1016406G > A | missense | none to V$NFAT1_Q4 | none |
| ACER3 | chr11:76861031G > T | missense | none | SRp30c to none |
| RAB38 | chr11:88175236A > T | missense | V$PPARG_02 | none |
| PHLDB1 | chr11:118627958C > T | missense | V$IK3_01 | none to HuB,TIA-1,SRp40 |
| WNK1 | chr12:753666C > G | missense | V$GFI1_01 | none |
| NFE2 | chr12:54292991G > A | missense | none to V$BEN_01 | none to YB-1,SRp40 |
| NUAK1 | chr12:106067839A > T | missense | V$OCT1_06 | none |
| RASAL1 | chr12:113114816C > G | missense | V$YY1_01 | none |
| SLITRK6 | chr13:85795773C > A | missense | V$SMAD4_Q6_01 | SF2/ASF,SRp38,YB-1 to Sam68 |
| ATP11A | chr13:112858175C > A | missense | V$PAX5_01 | none |
| NYNRIN | chr14:24411385C > G | non-coding | none to V$BEN_01 | MBNL1 |
| CLMN | chr14:95203587C > T | missense | none | none to hnRNPI |
| AHNAK2 | chr14:104948892T > C | missense | none | none |
| RAD51 | chr15:40706209C > A | non-coding | V$CEBPB_02 | none |
| CCNB2 | chr15:59125011G > A | non-coding | none | none |
| SULT1A2 | chr16:28592021A > G | non-coding | none | SRp30c to none |
| NFATC3 | chr16:68190983G > A | missense | none to V$GATA_Q6 | none to SLM-2, Sam68 |
| MED31 | chr17:6651601A > G | non-coding | none | SRp30c to none |
| CHRNB1 | chr17:7447082C > T | non-coding | none | none to ETR-3 |
| ACBD4 | chr17:45136583C > T | missense | none | SRp55 to SC35 |
| ABCA7 | chr19:1041510G > A | missense | none | none to YB-1, SRp20 |
| LMNB2 | chr19:2431813G > A | non-coding | none | SRp55 to SC35 |
| ZNF676 | chr19:22180184G > T | non-coding | none to V$NFAT1_Q4 | deleted MBNL1 |
| ZIM2 | chr19:56774836G > T | stop | none to V$DRI1_01 | none to Sam68, SLM-2 |
| MRPL30 | chr2:99181122C > A | non-coding | none to V$NFAT1_Q4 | SLM-2 to hnRNP, DAZAP1, HuD |
| PASK | chr2:241126376C > G | missense | none | ETR-3 to SF2/ASF |
| TOP3B | chr22:21964200A > T | non-coding | none | hnRNPH1,hnRNPH2 to none |
| GGA1 | chr22:37620258G > A | synonymous | none | ETR-3, SRp30c to hnRNPH1/2 |
| RIBC2 | chr22:45426055G > A | non-coding | none | hnRNP K to SF2/ASF |
| GRPR | chrX:16123978C > G | missense | none | none |
| TBC1D25 | chrX:48560553C > G | missense | none | none |
| IGBP1 | chrX:70133976C > T | missense | none | MBNL1 to SRp40, SRp55 |
| HTATSF1 | chrX:136510164G > C | missense | none | none to SRp20, YB-1 |