Yahya Bokhari, Areej Alhareeri, Tomasz Arodz.
Abstract
BACKGROUND: Cancer is caused by genetic mutations, but not all somatic mutations in human DNA drive the emergence or growth of cancers. While many frequently mutated cancer driver genes have already been identified and are being used for diagnostic, prognostic, or therapeutic purposes, identifying driver genes that harbor mutations occurring with low frequency in human cancers is an ongoing endeavor. Typically, mutations that do not confer a growth advantage to tumors (passenger mutations) dominate the mutation landscape of the tumor cell genome, making identification of low-frequency driver mutations a challenge. The leading approach for discovering new putative driver genes involves analyzing patterns of mutations in large cohorts of patients and using statistical methods to discriminate driver from passenger mutations.
Keywords: Cancer pathways; Driver mutations; Protein-protein interaction networks; Somatic mutations
Year: 2020 PMID: 32293263 PMCID: PMC7092414 DOI: 10.1186/s12859-020-3449-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Summary of DriverDB tool datasets used in experimental validation of QuaDMutNetEx
| Dataset | Samples (n) | Genes (p) | Mutations |
|---|---|---|---|
| TN: triple negative breast cancer | 94 | 4594 | 6007 |
| GBM: glioblastoma multiforme | 120 | 3747 | 8141 |
| HGS: high-grade serous ovarian cancer | 316 | 13278 | 22897 |
| METABRIC: breast cancer | 696 | 13076 | 51255 |
Quantitative characteristics of QuaDMutNetEx results
| Dataset | Genes found | Estimated p-value |
|---|---|---|
| TN | 13 | <0.004 |
| GBM | 6 | <0.004 |
| HGS | 25 | 0.016 |
| METABRIC | 25 | <0.004 |
Solutions for all four datasets are statistically significant at p<0.05
Putative driver gene sets discovered by QuaDMutNetEx
| Gene | c | D/R | QuanDMutEx | COSMIC | DDBv2 | Gene | c | D/R | QuanDMutEx | COSMIC | DDBv2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TN: triple negative breast cancer | |||||||||||
| TP53 | 35 | R | ✓ | ✓ | ✓ | PARK2 | 6 | ∙ | ✓ | ∙ | ✓ |
| ATR | 4 | D | ∙ | ✓ | ✓ | SAGE1 | 3 | ∙ | ✓ | ∙ | ✓ |
| NR3C1 | 3 | ∙ | ✓ | ∙ | ✓ | CREBBP | 2 | D/R | ∙ | ✓ | ✓ |
| DAPK1 | 2 | ∙ | ∙ | ∙ | ✓ | NCOA1 | 2 | D | ∙ | ✓ | ✓ |
| SLC39A7 | 2 | ∙ | ∙ | ∙ | ✓ | IDH3B | 2 | ∙ | ✓ | ∙ | ✓ |
| HIST1H4A | 2 | ∙ | ∙ | ∙ | ✓ | HIF1A | 2 | D | ∙ | ✓ | ✓ |
| MLL | 2 | D | ∙ | ✓ | ✓ | ||||||
| GBM: glioblastoma multiforme | |||||||||||
| CDKN2A | 55 | R | ✓ | ✓ | ✓ | TP53 | 38 | R | ✓ | ✓ | ✓ |
| MDM2 | 13 | D | ∙ | ✓ | ✓ | MDM4 | 5 | D | ∙ | ✓ | ✓ |
| MAPK9 | 2 | ∙ | ∙ | ∙ | ✓ | RPL11 | 2 | ∙ | ∙ | ∙ | ✓ |
| HGS: high-grade serous ovarian cancer | |||||||||||
| TP53 | 249 | R | ✓ | ✓ | ✓ | SOS1 | 3 | ∙ | ∙ | ∙ | ✓ |
| CTNNB1 | 2 | D | ∙ | ✓ | ✓ | DAG1 | 2 | ∙ | ∙ | ∙ | ✓ |
| ERBB2 | 2 | D | ∙ | ✓ | ✓ | FANCA | 2 | R | ∙ | ✓ | ✓ |
| GRB2 | 2 | ∙ | ∙ | ∙ | ✓ | PIK3R1 | 2 | R | ∙ | ✓ | ✓ |
| TSHR | 2 | D | ∙ | ✓ | ✓ | DNAJA3 | 2 | ∙ | ∙ | ∙ | ✓ |
| HSP90AA1 | 2 | D | ∙ | ✓ | ✓ | HSPA5 | 2 | ∙ | ∙ | ∙ | ✓ |
| MST1R | 2 | ∙ | ∙ | ∙ | ✓ | PTK2 | 2 | ∙ | ✓ | ∙ | ✓ |
| STAT3 | 2 | D | ∙ | ✓ | ✓ | UBC | 2 | ∙ | ∙ | ∙ | ✓ |
| VAV3 | 2 | ∙ | ∙ | ∙ | ✓ | WRN | 2 | R | ✓ | ✓ | ✓ |
| ZAP70 | 2 | ∙ | ∙ | ∙ | ✓ | ERBB3 | 2 | D | ∙ | ✓ | ✓ |
| NTRK2 | 2 | ∙ | ∙ | ∙ | ✓ | SPRY2 | 2 | ∙ | ∙ | ∙ | ✓ |
| DHHC11 | 2 | ∙ | ∙ | ∙ | ✓ | JAK2 | 2 | D | ∙ | ✓ | ✓ |
| CDKN2A | 2 | R | ∙ | ✓ | ✓ | ||||||
| METABRIC: breast cancer | |||||||||||
| ERBB2 | 84 | D | ∙ | ✓ | ✓ | FGFR1 | 50 | D | ∙ | ✓ | ✓ |
| GAB2 | 35 | ∙ | ∙ | ∙ | ✓ | PSG11 | 28 | ∙ | ∙ | ∙ | ✓ |
| MACROD2 | 19 | ∙ | ✓ | ∙ | ✓ | PTEN | 16 | D | ✓ | ✓ | ✓ |
| FRS2 | 10 | ∙ | ∙ | ∙ | ✓ | IGF1R | 10 | ∙ | ∙ | ∙ | ✓ |
| CRK | 10 | ∙ | ∙ | ∙ | ✓ | JAK2 | 7 | D | ∙ | ✓ | ✓ |
| AC116165.7-2 | 6 | ∙ | ✓ | ∙ | ∙ | IRS4 | 6 | ∙ | ∙ | ✓ | ✓ |
| PTK2 | 5 | ∙ | ∙ | ∙ | ✓ | IL6ST | 4 | D | ∙ | ✓ | ✓ |
| EGFR | 4 | D | ∙ | ✓ | ✓ | GRB2 | 4 | ∙ | ∙ | ∙ | ✓ |
| PTPN1 | 4 | ∙ | ∙ | ∙ | ✓ | CREBBP | 3 | D/R | ∙ | ✓ | ✓ |
| DOK6 | 3 | ∙ | ∙ | ∙ | ✓ | JAK1 | 2 | D | ∙ | ✓ | ✓ |
| EGF | 2 | ∙ | ∙ | ∙ | ✓ | PIK3R1 | 2 | R | ∙ | ✓ | ✓ |
| SYK | 2 | D | ∙ | ✓ | ✓ | PTPN6 | 2 | ∙ | ∙ | ✓ | ✓ |
| VAV1 | 2 | ∙ | ∙ | ✓ | ✓ | ||||||
Column c gives the number of patients in the dataset that had a mutation in the gene. Column D/R indicates whether the gene is dominant (D) or recessive (R); ∙ means unknown. The QuanDMutEx column marks genes also discovered by the quadratic mutual-exclusivity approach that does not include the network connectivity term. The COSMIC column [26, 27] marks genes present in the COSMIC Cancer Gene Census. The DDBv2 column marks genes present in DriverDBv2 [28].
Fig. 1. Known interactions between driver genes discovered by QuaDMutNetEx on the four datasets: TN: triple-negative breast cancer, GBM: glioblastoma multiforme, HGS: high-grade serous ovarian cancer, and METABRIC: breast cancer
Putative driver gene sets and metrics in GBM subtypes discovered by QuaDMutNetEx
| Gene | c | D/R | COSMIC | DDBv2 | Gene | c | D/R | COSMIC | DDBv2 |
|---|---|---|---|---|---|---|---|---|---|
| samples | genes | mutations | Genes in solution | Coverage | Excess coverage | Connected components | |||
| n=69 | p=487 | 1192 | 6 | 0.6232 | 0.1163 | 5 | |||
| EGFR | 21 | D | ✓ | ✓ | PCDHAC2 | 15 | ∙ | ∙ | ✓ |
| DNAH9 | 4 | ∙ | ∙ | ✓ | GABRA6 | 4 | ∙ | ∙ | ✓ |
| PTPRG | 2 | ∙ | ∙ | ✓ | TEK | 2 | ∙ | ∙ | ✓ |
| samples | genes | mutations | Genes in solution | Coverage | Excess coverage | Connected components | |||
| n=75 | p=510 | 1310 | 12 | 0.7733 | 0.1552 | 4 | |||
| PTEN | 23 | D | ✓ | ✓ | EGFR | 17 | D | ✓ | ✓ |
| PIK3CA | 5 | D | ✓ | ✓ | CPNE8 | 3 | ∙ | ∙ | ✓ |
| KDM2B | 3 | ∙ | ✓ | ✓ | NRXN1 | 3 | ∙ | ∙ | ✓ |
| INPPL1 | 3 | ∙ | ∙ | ✓ | EZR | 2 | D | ✓ | ✓ |
| GRB10 | 2 | ∙ | ∙ | ✓ | IRS1 | 2 | D | ✓ | ✓ |
| IRS4 | 2 | ∙ | ∙ | ✓ | LZTR1 | 2 | D | ✓ | ✓ |
| samples | genes | mutations | Genes in solution | Coverage | Excess coverage | Connected components | |||
| n=44 | p=229 | 558 | 7 | 0.6364 | 0.0714 | 2 | |||
| TP53 | 15 | R | ✓ | ✓ | PCDHAC2 | 5 | ∙ | ∙ | ✓ |
| CHEK1 | 2 | ∙ | ∙ | ✓ | CREBBP | 2 | D/R | ✓ | ✓ |
| DAXX | 2 | R | ✓ | ✓ | MECOM | 2 | R | ✓ | ✓ |
| TBP | 2 | ∙ | ∙ | ✓ | |||||
| samples | genes | mutations | Genes in solution | Coverage | Excess coverage | Connected components | |||
| n=41 | p=199 | 482 | 8 | 0.6585 | 0.0370 | 2 | |||
| TP53 | 15 | R | ✓ | ✓ | ANK2 | 5 | ∙ | ∙ | ✓ |
| PDGFRA | 2 | D | ✓ | ✓ | FLT1 | 2 | ∙ | ∙ | ✓ |
| PTPN11 | 2 | D | ✓ | ✓ | CHD8 | 2 | ∙ | ∙ | ✓ |
| DYNC1I1 | 2 | ∙ | ∙ | ✓ | KDR | 2 | D | ✓ | ✓ |
For each GBM subtype, a metrics row gives the number of samples, genes, and mutations, together with the size, coverage, excess coverage, and number of connected components of the solution. The genes discovered by QuaDMutNetEx are listed below each metrics row. Column c gives the number of patients in the dataset that had a mutation in the gene. Column D/R indicates whether the gene is dominant (D) or recessive (R); ∙ means unknown. The COSMIC column [26, 27] marks genes present in the COSMIC Cancer Gene Census. The DDBv2 column marks genes present in DriverDBv2 [28].
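The coverage and excess coverage columns can be illustrated with a minimal sketch. The definitions used here are assumptions, since this record does not state them: coverage is taken as the fraction of samples with a mutation in at least one gene of the set, and excess coverage as the fraction of covered samples mutated in more than one gene of the set. The function name `coverage_metrics` and the toy gene and patient identifiers are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def coverage_metrics(
    mutations: Dict[str, Set[str]],
    gene_set: Iterable[str],
    n_samples: int,
) -> Tuple[float, float]:
    """Return (coverage, excess_coverage) for a candidate gene set.

    `mutations` maps each gene to the set of patient IDs mutated in it.
    Both definitions are assumptions made for this sketch.
    """
    # Count, per patient, how many genes of the set are mutated.
    hits: Dict[str, int] = {}
    for gene in gene_set:
        for patient in mutations.get(gene, set()):
            hits[patient] = hits.get(patient, 0) + 1

    covered = len(hits)
    multiply_hit = sum(1 for k in hits.values() if k > 1)

    coverage = covered / n_samples
    excess = multiply_hit / covered if covered else 0.0
    return coverage, excess


# Toy data: 8 patients in total, p3 is mutated in two genes of the set.
toy = {
    "GENE_A": {"p1", "p2", "p3"},
    "GENE_B": {"p3", "p4"},
    "GENE_C": {"p5"},
}
cov, exc = coverage_metrics(toy, ["GENE_A", "GENE_B", "GENE_C"], n_samples=8)
print(cov, exc)  # 0.625 0.2
```

Under these assumed definitions, a perfectly mutually exclusive gene set has an excess coverage of 0.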
Comparison between QuaDMutNetEx, HotNet2, DriverNet, and Dendrix
| Method | Genes in solution | Coverage | Excess coverage | Dendrix score | Connected components |
|---|---|---|---|---|---|
| TN: Triple negative breast cancer | |||||
| HotNet2 | 128 | 0.6809 | 0.7969 | -118 | 9 |
| DriverNet | 21 | 0.6383 | 0.4667 | 23 | 14 |
| Dendrix | 22 | 0.6170 | 0.1034 | 51 | 8 |
| QuaDMutNetEx | 13 | 0.6854 | 0.0983 | ||
| GBM: Glioblastoma multiforme | |||||
| HotNet2 | 37 | 0.7833 | 0.4149 | 10 | 11 |
| DriverNet | 17 | 0.9333 | 0.8661 | -140 | 9 |
| Dendrix | 22 | 0.7166 | 0.023256 | 4 | |
| QuaDMutNetEx | 6 | 0.8151 | 0.1855 | 79 | |
| HGS: high-grade serous ovarian cancer | |||||
| HotNet2 | 58 | 0.8449 | 0.4307 | 83 | 4 |
| DriverNet | 72 | 0.9335 | 0.6373 | -35 | 51 |
| Dendrix | 3 | 0.8037 | 0.0 | 3 | |
| QuaDMutNetEx | 25 | 0.6170 | 0.1086 | 236 | |
| METABRIC: breast cancer | |||||
| HotNet2 | 224 | 0.4424 | 0.7394 | -1694 | 18 |
| DriverNet | 90 | 0.4683 | 0.7785 | -1130 | 33 |
| Dendrix | 18 | 0.3836 | 0.1236 | 16 | |
| QuaDMutNetEx | 25 | 0.3982 | 0.1753 | 216 |
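The Dendrix score column can be read with the help of a short sketch. It assumes the score is the Dendrix weight, that is, the number of covered patients minus the coverage overlap, which equals twice the number of covered patients minus the sum of per-gene mutation counts. The function name `dendrix_score` and the toy data are hypothetical.

```python
from typing import Dict, Iterable, Set


def dendrix_score(mutations: Dict[str, Set[str]], gene_set: Iterable[str]) -> int:
    """Dendrix-style weight: 2 * |covered patients| - sum of per-gene counts.

    `mutations` maps each gene to the set of patient IDs mutated in it.
    """
    genes = list(gene_set)
    # Patients with a mutation in at least one gene of the set.
    covered = set().union(*(mutations.get(g, set()) for g in genes))
    # Total number of (gene, patient) mutation pairs within the set.
    total = sum(len(mutations.get(g, set())) for g in genes)
    return 2 * len(covered) - total


# Mutually exclusive genes: every mutation adds a new covered patient.
exclusive = {"GENE_A": {"p1", "p2"}, "GENE_B": {"p3"}}
# Heavily overlapping genes: the same three patients are hit three times.
overlapping = {
    "GENE_A": {"p1", "p2", "p3"},
    "GENE_B": {"p1", "p2", "p3"},
    "GENE_C": {"p1", "p2", "p3"},
}

print(dendrix_score(exclusive, ["GENE_A", "GENE_B"]))              # 3
print(dendrix_score(overlapping, ["GENE_A", "GENE_B", "GENE_C"]))  # -3
```

The second case shows how a large gene set with many co-occurring mutations can end up with a negative score, as seen for some of the larger solutions in the table.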
Fig. 2. Effects of parameters on QuaDMutNetEx. a, b: effect on connected components; c, d: effect on coverage; e, f: effect on excess coverage. Results shown are for the HGS dataset; results for the other datasets are similar
Fig. 3. Illustration of the role of the network term N(A,x). Based solely on mutual exclusivity, potential solutions 1 and 2 are equally good, since both show perfect mutual exclusivity. Inclusion of the network term N(A,x) makes potential solution 2 the preferred one, since it consists of more highly connected genes
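The idea in Fig. 3 can be sketched on a toy network. The exact form of N(A,x) is not given in this record, so the sketch uses two stand-in connectivity measures: the number of network edges among the selected genes (for a symmetric 0/1 adjacency matrix A and indicator vector x this equals x^T A x / 2) and the number of connected components of the selected genes. The function names, gene names, and network are all hypothetical.

```python
from typing import Dict, Iterable, Set


def induced_edges(adjacency: Dict[str, Set[str]], genes: Iterable[str]) -> int:
    """Count undirected edges with both endpoints among the selected genes."""
    chosen = set(genes)
    # `g < h` counts each undirected edge once (adjacency must be symmetric).
    return sum(
        1
        for g in chosen
        for h in adjacency.get(g, set())
        if h in chosen and g < h
    )


def connected_components(adjacency: Dict[str, Set[str]], genes: Iterable[str]) -> int:
    """Count connected components of the subgraph induced by the selected genes."""
    chosen = set(genes)
    seen: Set[str] = set()
    components = 0
    for start in chosen:
        if start in seen:
            continue
        components += 1
        # Depth-first search restricted to the selected genes.
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for neighbour in adjacency.get(node, set()):
                if neighbour in chosen and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components


# Toy symmetric interaction network (hypothetical genes).
network = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": set(),
    "F": set(),
}
solution_1 = ["D", "E", "F"]  # mutually exclusive, but unconnected
solution_2 = ["A", "B", "C"]  # mutually exclusive and densely connected

print(induced_edges(network, solution_1), connected_components(network, solution_1))  # 0 3
print(induced_edges(network, solution_2), connected_components(network, solution_2))  # 3 1
```

If both solutions score equally on mutual exclusivity, a connectivity term of this kind breaks the tie in favor of solution 2, which has more edges and fewer components.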