Lőrinc Pongor, Máté Kormos, Christos Hatzis, Lajos Pusztai, András Szabó, Balázs Győrffy.
Abstract
BACKGROUND: The use of somatic mutations for predicting clinical outcome is difficult because a mutation can indirectly influence the function of many genes, and also because clinical follow-up is sparse in the relatively young next-generation sequencing (NGS) databanks. Here we approach this problem by linking sequence databanks to well-annotated gene-chip datasets, using a multigene transcriptomic fingerprint as a link between gene mutations and gene expression in breast cancer patients.
Year: 2015 PMID: 26474971 PMCID: PMC4609150 DOI: 10.1186/s13073-015-0228-1
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Fig. 1 Complex effects of a single mutation. A cornerstone of our model is the leveraged effect of the gene regulation network on the final consequence of a mutation. Some target genes will be suppressed while others will be amplified, resulting in a markedly changed transcriptomic fingerprint for important genes
Fig. 2 Overview of the analysis setup. Mutation status and gene expression levels obtained from the TCGA repository are compared using ROC analysis to identify a gene expression signature for each mutation. Then, the ability of this signature to predict survival is assessed in the gene chip dataset. The entire analysis can be run for a single gene or for up to three genes together
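The ROC step described in this caption can be illustrated with a minimal sketch. It assumes one expression value per sample and a boolean mutation flag, and computes the AUC as the fraction of (mutated, wild-type) sample pairs in which the mutated sample has the higher expression. The function names and the mirrored cutoff for downregulated genes (AUC at or below 1 minus the cutoff) are assumptions made for illustration, not the authors' implementation.

```python
def roc_auc(expression, mutated):
    """AUC of one gene's expression for separating mutated from wild-type samples.

    Computed as the share of (mutated, wild-type) pairs in which the mutated
    sample has higher expression; ties count as half a pair.
    """
    pos = [x for x, m in zip(expression, mutated) if m]
    neg = [x for x, m in zip(expression, mutated) if not m]
    if not pos or not neg:
        raise ValueError("need at least one mutated and one wild-type sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def direction(auc, min_auc=0.65):
    """Label a gene by its AUC.

    Assumption: downregulated genes are taken as the mirror image of
    upregulated ones, i.e. AUC <= 1 - min_auc.
    """
    if auc >= min_auc:
        return "up"
    if auc <= 1.0 - min_auc:
        return "down"
    return "none"
```

For example, `direction(roc_auc([1, 2, 3, 4], [False, False, True, True]))` labels the gene as upregulated in mutated samples. The P value required alongside the AUC cutoff is not computed in this sketch.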
Fig. 3 A sample analysis result for the AKT1 gene. The plots show the effect of both upregulated genes (a) and downregulated genes (b). Notice the robust inverse correlation: higher expression of upregulated genes results in worse relapse-free survival, while lower expression of downregulated genes also leads to worse survival. High and low: compared to the median of the surrogate gene expression signature
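The high/low grouping mentioned in this caption can be sketched as a median split. The sketch assumes that a patient's signature value is the mean expression of the signature genes and that patients exactly at the median fall into the low group; both choices, and all names, are illustrative assumptions rather than details given in the source.

```python
from statistics import mean, median


def signature_score(sample_expression, signature_genes):
    """Mean expression of the signature genes in one sample (assumed summary)."""
    return mean(sample_expression[g] for g in signature_genes)


def median_split(scores):
    """Assign each patient to 'high' or 'low' relative to the cohort median.

    Assumption: patients exactly at the median are placed in the 'low' group.
    """
    cut = median(scores.values())
    return {patient: ("high" if value > cut else "low")
            for patient, value in scores.items()}
```

The two resulting groups are what a survival comparison such as a Cox regression would then be run on.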
A list of the top 20 most common cancer and non-cancer (based on Cosmic) mutations identified by analyzing 763 breast cancer samples
| Cosmic gene (a) | % of samples w/ mutations | Non-cosmic gene | % of samples w/ mutations |
|---|---|---|---|
| FRG1B | 58.1 | NBPF1 | 78.2 |
| TTC34 | 36.5 | BAGE2 | 70.6 |
| PIK3CA | 34.6 | KMT2C | 37.9 |
| CROCC | 32.9 | BAGE5 | 34.7 |
| RYR2 | 30.8 | TTN | 33.6 |
| TP53 | 28.1 | CROCCP2 | 33.6 |
| HYDIN | 28 | MUC16 | 28.4 |
| NBPF14 | 24.9 | ROCK1P1 | 23.7 |
| CSF2RA | 23.9 | TPTE2P6 | 18.5 |
| PTPRN2 | 23.9 | MST1L | 15.9 |
| EIF2B5 | 23.8 | HERC2P4 | 14.3 |
| PCDHGA1 | 23.4 | MST1P2 | 14.0 |
| SYNE1 | 23 | DDX12P | 13.6 |
| SPDYE3 | 22.9 | DPY19L2P1 | 12.8 |
| HMCN1 | 21.2 | ANKRD20A9P | 11.5 |
| DDX11 | 20.9 | MLLT10P1 | 11.1 |
| GON4L | 20.8 | TBC1D3P1-DHX40P1 | 11.1 |
| PKHD1L1 | 20.8 | MROH7-TTC4 | 11.0 |
| NEB | 20.3 | AQP7P1 | 10.8 |
| USH2A | 20.2 | SDHAP1 | 10.6 |
(a) Version: COSMIC v71
A significance threshold of 0.01 and a minimal AUC of 0.65 are required for each gene to be considered significant in the ROC analysis, and the threshold for significance in the survival analysis was set at P < 0.05 and average HR > 1.4. This analysis included all genes, including the established driver genes
Analysis results for the top genes including new and already established driver gene candidates. The table is split according to top 10 oncogene candidates (A) and top 10 tumor suppressor gene candidates (B)
| Gene | % of samples w/ mutations | Genes from ROC analysis (n) | Upregulated genes: P value | Upregulated genes: HR | Downregulated genes: P value | Downregulated genes: HR |
|---|---|---|---|---|---|---|
| (A) | ||||||
| AKT1 | 5.5 | 136 | <1E-16 | 1.8 | 1.60E-15 | 0.64 |
| ATG2B | 5.1 | 20 | 6.70E-09 | 1.4 | 6.90E-14 | 0.66 |
| COL6A2 | 5.3 | 151 | 3.60E-13 | 1.5 | 1.50E-09 | 0.72 |
| MTUS2 | 5.8 | 97 | 6.90E-06 | 1.3 | 1.10E-16 | 0.63 |
| OSBPL10 | 5.2 | 46 | 2.40E-08 | 1.4 | 1.80E-12 | 0.68 |
| POTEF | 5.7 | 16 | 1.60E-11 | 1.4 | 5.60E-16 | 0.64 |
| SCLT1 | 5.5 | 27 | 3.90E-08 | 1.4 | 5.10E-13 | 0.67 |
| TNC | 6 | 15 | 1.10E-07 | 1.3 | <1E-16 | 0.63 |
| TRANK1 | 5.3 | 10 | <1E-16 | 1.7 | 1.90E-05 | 0.79 |
| TRAPPC10 | 6.1 | 3 | <1E-16 | 1.7 | 2.60E-11 | 0.69 |
| (B) | ||||||
| ARFGEF1 | 6.4 | 63 | <1E-16 | 0.52 | <1E-16 | 1.7 |
| BRCA2 | 6 | 74 | <1E-16 | 0.52 | 2.20E-16 | 1.6 |
| GGA3 | 5.5 | 359 | <1E-16 | 0.49 | 1.40E-14 | 1.5 |
| MPP6 | 5.5 | 48 | <1E-16 | 0.53 | <1E-16 | 1.8 |
| PHEX | 6.6 | 20 | <1E-16 | 0.47 | 2.60E-15 | 1.5 |
| PXDNL | 6.1 | 255 | <1E-16 | 0.51 | <1E-16 | 1.6 |
| RGS22 | 6 | 350 | <1E-16 | 0.5 | <1E-16 | 1.7 |
| TP53 | 28.3 | 1566 | <1E-16 | 0.48 | 5.60E-16 | 1.6 |
| UBR5 | 9.9 | 8 | <1E-16 | 0.53 | <1E-16 | 1.7 |
| UNC5D | 6.6 | 51 | <1E-16 | 0.54 | <1E-16 | 1.8 |
Significant signatures identified had to have an AUC value over 0.65 and a P value below 0.01 in the ROC analysis and an average HR over 1.4 and a P value below 0.01 in the survival analysis
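The selection rule in this footnote can be written as a simple filter. The sketch below assumes that the AUC, the ROC P value, the average HR and the survival P value have already been computed for each signature. The record type, its field names and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SignatureResult:
    # Hypothetical record; field names are illustrative, not from the source.
    gene: str
    auc: float
    roc_p: float
    avg_hr: float
    survival_p: float


def is_significant(result, min_auc=0.65, max_roc_p=0.01,
                   min_hr=1.4, max_survival_p=0.01):
    """Apply the four cutoffs stated in the footnote above."""
    return (result.auc > min_auc
            and result.roc_p < max_roc_p
            and result.avg_hr > min_hr
            and result.survival_p < max_survival_p)


# Made-up example values, for illustration only.
candidates = [
    SignatureResult("GENE_A", auc=0.71, roc_p=0.001, avg_hr=1.6, survival_p=1e-6),
    SignatureResult("GENE_B", auc=0.62, roc_p=0.001, avg_hr=1.9, survival_p=1e-6),
]
kept = [c.gene for c in candidates if is_significant(c)]
```

Keeping the cutoffs as keyword arguments makes it easy to rerun the filter with a different survival threshold.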
HR and P value: results of the Cox regression for both up- and downregulated genes
Fig. 4 Distribution of significant hits in random analyses. To assess the expected false positive rate of the method, the entire pipeline was run 100 times, each time on 100 randomly selected genes. The mean number of significant genes was 9.24, and the number of significant genes was not more than 15 in any run
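As a rough reading of these numbers, a mean of 9.24 significant genes per 100 randomly selected genes corresponds to about 9.24 % of random genes passing the pipeline, with at most 15 % in any single run. The helper below is a minimal sketch of how such per-run counts could be summarized; the function name and the example counts are hypothetical and are not the data behind the figure.

```python
def summarize_random_runs(hits_per_run, genes_per_run=100):
    """Summarize the number of significant genes found in each random run."""
    if not hits_per_run:
        raise ValueError("need at least one run")
    mean_hits = sum(hits_per_run) / len(hits_per_run)
    return {
        "mean_hits": mean_hits,
        "max_hits": max(hits_per_run),
        # Share of randomly chosen genes that passed, averaged over runs.
        "mean_rate": mean_hits / genes_per_run,
    }


# Hypothetical counts from three runs, for illustration only.
summary = summarize_random_runs([8, 10, 15])
```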