Junrong Song1, Wei Peng2, Feng Wang1, Jianxin Wang3.
Abstract
BACKGROUND: Cancer, a disease driven by genomic alterations, claims many lives each year. The biggest challenge in overcoming cancer is to distinguish the driver genes that promote cancer development from the large number of passenger mutations, which confer no selective growth advantage. To address this problem, some researchers have begun to identify driver genes by integrating networks with other biological information. However, further work is needed to improve prediction performance.
Keywords: Driver genes; Dysregulated expression; Human functional interaction network; Tissue-specific expression; Variation frequency
Year: 2019 PMID: 31888619 PMCID: PMC6936147 DOI: 10.1186/s12920-019-0619-z
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 The workflow of DyTidriver. We divided the process of cancer driver gene identification into four steps, marked ‘a’, ‘b’, ‘c’ and ‘d’. In step ‘a’, we filtered the mutated genes of each patient according to whether they influenced the expression of downstream genes: only mutated genes connected to at least one outlying gene were included in our study. The filtered mutated genes of all patients were then mapped to the human functional interaction network to construct the Mut-Mut matrix. Step ‘b’ generates the tissue-specific PCC matrix. For each cancer, we chose the one or two tissues with the highest association scores in the disease-tissue matrix as the cancer-related tissues, such as tissue 1 and tissue 2 for disease D1. For each tissue, we calculated the gene-gene Pearson correlation coefficients (PCC) across all patients and generated the gene-gene PCC matrix by keeping the values whose absolute PCC exceeds 0.3 and setting the rest to 0. If more than one tissue is related to a cancer, the final tissue-specific PCC matrix is constructed by averaging the gene-gene PCC matrices of the tissues. In step ‘c’, we constructed the ECC mutated matrix using the ECC equation. In the final step ‘d’, we assigned each mutated gene in the network a score by summing the ECC values of its connecting edges and multiplying the sum by the gene's variation frequency. The mutated genes were ranked by score in descending order, and those at the top of the list were considered potential driver genes
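Steps ‘b’ and ‘d’ of the workflow can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the array layout (genes in rows, patients in columns), the choice to keep signed rather than absolute correlation values, and the handling of constant-expression genes are all assumptions made for illustration. The ECC equation of step ‘c’ is not reproduced here, so the ECC matrix is taken as a precomputed input.

```python
import numpy as np


def thresholded_pcc(expr, cutoff=0.3):
    """Gene-gene Pearson matrix for one tissue (expr: genes x patients)."""
    pcc = np.corrcoef(expr)
    # Constant-expression genes give NaN; mapping them to 0 is an assumption.
    pcc = np.nan_to_num(pcc, nan=0.0)
    # Keep signed values whose magnitude exceeds the cutoff, zero the rest
    # (keeping the sign rather than the absolute value is an assumption).
    pcc[np.abs(pcc) <= cutoff] = 0.0
    return pcc


def tissue_specific_pcc(expr_by_tissue, cutoff=0.3):
    """Average the thresholded PCC matrices of the cancer-related tissues."""
    mats = [thresholded_pcc(expr, cutoff) for expr in expr_by_tissue]
    return np.mean(mats, axis=0)


def score_genes(ecc, variation_freq):
    """Step 'd': sum each gene's ECC edge values, then scale by its frequency.

    ecc is a symmetric gene x gene matrix with 0 where there is no edge;
    how it is computed (step 'c') is outside this sketch.
    """
    return np.asarray(ecc).sum(axis=1) * np.asarray(variation_freq)


def rank_genes(genes, scores):
    """Return gene names sorted by descending score."""
    order = np.argsort(-np.asarray(scores), kind="stable")
    return [genes[i] for i in order]


if __name__ == "__main__":
    # Toy inputs, invented purely for illustration.
    ecc = np.array([[0.0, 0.5, 0.2],
                    [0.5, 0.0, 0.0],
                    [0.2, 0.0, 0.0]])
    freq = np.array([0.1, 0.4, 0.5])
    print(rank_genes(["A", "B", "C"], score_genes(ecc, freq)))
```

Treating the ECC matrix as an input keeps the sketch independent of the exact ECC definition, which the caption names but does not state.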
Fig. 2 Comparison of the Precision, Recall, and F-score for the top-ranking genes of the six methods. The X-axis represents the number of top-ranking genes. The Y-axis represents the score of the given metric
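The metrics plotted in Fig. 2 can be computed for any ranked gene list as sketched below. The sketch assumes the standard definitions of precision, recall and F-score over the top k genes, which this record does not spell out. The function name `top_k_metrics` and the small benchmark set are hypothetical; the benchmark is only an illustrative subset of genes flagged as drivers in the lung cancer table, not the benchmark dataset itself.

```python
def top_k_metrics(ranked_genes, benchmark, k):
    """Precision, recall and F-score of the top-k genes against a benchmark set."""
    top = ranked_genes[:k]
    true_pos = sum(1 for gene in top if gene in benchmark)
    precision = true_pos / len(top) if top else 0.0
    recall = true_pos / len(benchmark) if benchmark else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    # First five DyTidriver ranks from the lung cancer table.
    ranked = ["TP53", "ZNF536", "EGFR", "TSHZ3", "PRUNE2"]
    # Hypothetical toy benchmark, for illustration only.
    toy_benchmark = {"TP53", "ZNF536", "EGFR", "PIK3CA"}
    print(top_k_metrics(ranked, toy_benchmark, k=5))
```

With this toy benchmark, three of the five top genes are hits, so precision is 3/5 and recall is 3/4. Sweeping k over a range of cut-offs would give one curve per metric, as in the figure.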
Co-citer analysis of top 30 lung cancer driver genes identified by our method
| Genes | Cancer | Lung | Driver | Is_driver | DyTidriver | Diffusion | DriverNet | DawnRank | Muf_max | Muf_sum |
|---|---|---|---|---|---|---|---|---|---|---|
| TP53 | 6772 | 999 | 110 | 1 | 1 | 20 | 1 | 1 | 5 | 6 |
| ZNF536 | 4 | 0 | 1 | 1 | 2 | 5015 | NA | 2689 | 849 | 79 |
| EGFR | 4748 | 2849 | 166 | 1 | 3 | 1 | 3 | 4 | 7 | 26 |
| TSHZ3 | 4 | 1 | 1 | 0 | 4 | 2748 | 1295 | 2463 | 1268 | 188 |
| PRUNE2 | 12 | 1 | 1 | 0 | 5 | 5211 | NA | 2623 | 2018 | 332 |
| RYR2 | 4 | 3 | 2 | 0 | 6 | 757 | 20 | 558 | 128 | 25 |
| SPTA1 | 3 | 2 | 1 | 0 | 7 | 221 | 6 | 15 | 12 | 36 |
| ATP10D | 1 | 0 | 0 | 0 | 8 | 1836 | NA | 2825 | 2667 | 873 |
| ANKIB1 | 2 | 1 | 0 | 0 | 9 | 1607 | NA | 2572 | 4107 | 2080 |
| ZNF521 | 2 | 0 | 1 | 1 | 10 | 5025 | NA | 3058 | 1906 | 302 |
| NES | 192 | 31 | 5 | 0 | 11 | 1483 | NA | 1461 | 3094 | 1138 |
| PIK3CA | 1199 | 183 | 54 | 1 | 12 | 2 | 5 | 112 | 430 | 81 |
| TLR4 | 417 | 591 | 9 | 1 | 13 | 71 | 45 | 3 | 672 | 138 |
| NF1 | 165 | 16 | 11 | 1 | 14 | 34 | 56 | 21 | 389 | 139 |
| FAT4 | 45 | 7 | 2 | 0 | 15 | 3106 | 839 | 1961 | 970 | 119 |
| ASH1L | 4 | 1 | 1 | 0 | 16 | 1506 | NA | 2289 | 2549 | 761 |
| PRKCB | 41 | 11 | 1 | 1 | 17 | 5 | 12 | NA | 442 | 92 |
| SLC12A1 | 2 | 2 | 1 | 0 | 18 | 1647 | NA | 3038 | 4006 | 1750 |
| CTNNB1 | 2517 | 340 | 44 | 1 | 19 | 6 | 21 | NA | 51 | 27 |
| PLCB1 | 9 | 7 | 1 | 0 | 20 | 25 | 22 | 27 | 745 | 91 |
| APOB | 27 | 4 | 2 | 0 | 21 | 117 | 7 | 8 | 664 | 42 |
| MET | 1045 | 348 | 40 | 0 | 22 | 21 | 37 | 7 | 427 | 186 |
| GRIN2B | 13 | 3 | 2 | 0 | 23 | 18 | 39 | 120 | 397 | 135 |
| UBC | 134 | 17 | 2 | 0 | 24 | 3 | 4 | NA | 137 | 1 |
| SASH1 | 13 | 3 | 1 | 0 | 25 | 1537 | NA | 1325 | 5100 | 3080 |
| HGF | 393 | 174 | 7 | 0 | 26 | 47 | 84 | 40 | 398 | 1192 |
| BRAF | 2175 | 270 | 126 | 1 | 27 | 70 | 75 | 155 | 392 | 150 |
| UBA6 | 1 | 1 | 1 | 0 | 28 | 5263 | NA | NA | 2957 | 980 |
| PTPRZ1 | 12 | 1 | 1 | 0 | 29 | 3366 | NA | 2402 | 894 | 289 |
| TAF1L | 2 | 1 | 1 | 0 | 30 | 557 | 57 | 547 | 10 | 130 |
The second to fourth columns show the number of times each of the top 30 identified genes co-appears with ‘cancer’, ‘lung’ and ‘driver’ (from left to right). Is_driver indicates whether the given gene is a driver gene in the benchmark dataset. The remaining columns give the ranking positions of the identified genes in DyTidriver, Diffusion, DriverNet, DawnRank, Muf_max and Muf_sum, respectively
Co-citer analysis of top 30 prostate cancer driver genes identified by our method
| Genes | Cancer | Prostate | Driver | Is_driver | DyTidriver | Diffusion | DriverNet | DawnRank | Muf_max | Muf_sum |
|---|---|---|---|---|---|---|---|---|---|---|
| TP53 | 6772 | 298 | 110 | 1 | 1 | 1 | 1 | 1 | 38 | 4 |
| CTNNB1 | 2517 | 170 | 44 | 1 | 2 | 2 | 2 | 21 | 40 | 9 |
| ASH1L | 4 | 0 | 1 | 0 | 3 | 1703 | NA | NA | 653 | 78 |
| SPOP | 43 | 24 | 4 | 1 | 4 | 1721 | 3 | 169 | 8 | 3 |
| ATM | 1377 | 61 | 5 | 0 | 5 | 13 | 11 | 12 | 36 | 14 |
| PTEN | 3047 | 642 | 64 | 1 | 6 | 700 | 94 | NA | 39 | 37 |
| TTN | 10 | 0 | 2 | 0 | 7 | 1724 | 22 | 14 | 2 | 2 |
| FOXA1 | 182 | 69 | 10 | 0 | 8 | 17 | 5 | 3 | 37 | 10 |
| KMT2D | 25 | 2 | 2 | 0 | 9 | 855 | 54 | NA | NA | NA |
| PIK3CA | 1199 | 34 | 54 | 1 | 10 | 7 | 10 | NA | 282 | 36 |
| DYNC1H1 | 9 | 1 | 2 | 0 | 11 | 66 | 19 | 51 | 219 | 72 |
| CDH12 | 4 | 0 | 0 | 0 | 12 | 1511 | NA | 755 | 349 | 296 |
| BRAF | 2175 | 33 | 126 | 1 | 13 | 326 | 63 | 36 | 348 | 34 |
| AKT1 | 2152 | 317 | 23 | 1 | 14 | 20 | 23 | NA | 52 | 33 |
| FAT3 | 1 | 1 | 1 | 0 | 15 | 19 | 26 | 75 | NA | NA |
| LRP4 | 7 | 0 | 2 | 0 | 16 | 1440 | NA | NA | 1426 | 541 |
| GRIN2B | 13 | 0 | 2 | 0 | 17 | 74 | 33 | NA | 220 | 90 |
| KMT2C | 23 | 2 | 4 | 0 | 18 | 613 | 27 | NA | NA | NA |
| NCOR1 | 109 | 27 | 3 | 1 | 19 | 59 | 77 | 58 | 41 | 60 |
| HSPA8 | 96 | 9 | 1 | 0 | 20 | 10 | 8 | NA | 438 | 67 |
| OBSCN | 7 | 0 | 0 | 0 | 21 | 1714 | 168 | 408 | 1 | 24 |
| GRIN2A | 5 | 0 | 1 | 0 | 22 | 285 | 92 | 85 | 374 | 73 |
| PCDHA12 | 1 | 0 | 0 | 0 | 23 | 1453 | 271 | 197 | 324 | 65 |
| MED12 | 19 | 4 | 4 | 0 | 24 | 376 | 162 | 157 | 317 | 84 |
| STAT3 | 1824 | 147 | 27 | 0 | 25 | 16 | 15 | 5 | 58 | 8 |
| PCDH18 | 2 | 1 | 1 | 0 | 26 | 1656 | 93 | 66 | 262 | 39 |
| CDH23 | 5 | 0 | 1 | 0 | 27 | 457 | 97 | NA | 295 | 63 |
| SPTA1 | 3 | 0 | 1 | 0 | 28 | 1719 | 16 | 9 | 221 | 15 |
| UFL1 | 7 | 0 | 1 | 0 | 29 | NA | NA | NA | 1238 | 1265 |
| SP1 | 393 | 38 | 3 | 1 | 30 | 8 | 9 | NA | 86 | 5 |
The second to fourth columns show the number of times each of the top 30 identified genes co-appears with ‘cancer’, ‘prostate’ and ‘driver’ (from left to right). Is_driver indicates whether the given gene is a driver gene in the benchmark dataset. The remaining columns give the ranking positions of the identified genes in DyTidriver, Diffusion, DriverNet, DawnRank, Muf_max and Muf_sum, respectively
Co-citer analysis of top 30 breast cancer driver genes identified by our method
| Genes | Cancer | Breast | Driver | Is_driver | DyTidriver | Diffusion | DriverNet | DawnRank | Muf_max | Muf_sum |
|---|---|---|---|---|---|---|---|---|---|---|
| TP53 | 6772 | 1356 | 110 | 1 | 1 | 233 | 1 | 2 | 7 | 2 |
| PIK3CA | 1199 | 334 | 54 | 1 | 2 | 156 | 2 | 1 | 2 | 3 |
| MAP3K1 | 135 | 62 | 2 | 1 | 3 | 128 | 18 | 4 | 899 | 28 |
| GATA3 | 154 | 122 | 8 | 1 | 4 | 85 | 13 | 6 | 888 | 17 |
| CDH1 | 1410 | 358 | 19 | 1 | 5 | 42 | 4 | 10 | 1 | 6 |
| ERBB2 | 5335 | 4332 | 78 | 1 | 6 | 72 | 64 | 90 | 8 | 73 |
| UBC | 134 | 30 | 2 | 0 | 7 | 240 | 3 | 122 | 22 | 1 |
| NCOR1 | 109 | 45 | 3 | 1 | 8 | 139 | 12 | 48 | 6 | 68 |
| ASH1L | 4 | 0 | 1 | 0 | 9 | 1097 | NA | 1986 | 1846 | 729 |
| PIK3R1 | 131 | 21 | 7 | 1 | 10 | 160 | 10 | 26 | 13 | 45 |
| EP300 | 269 | 86 | 4 | 1 | 11 | 68 | 5 | 178 | 367 | 4 |
| DYNC1H1 | 9 | 2 | 2 | 0 | 12 | 63 | 8 | 17 | 1017 | 107 |
| HUWE1 | 29 | 4 | 3 | 0 | 13 | 251 | 28 | 45 | 9 | 112 |
| PTEN | 3047 | 672 | 64 | 1 | 14 | 185 | 98 | 193 | 3 | 79 |
| MAP3K13 | 2 | 0 | 1 | 1 | 15 | 6189 | NA | 3303 | 2654 | 2045 |
| NF1 | 165 | 24 | 11 | 1 | 16 | 141 | 41 | 19 | 4 | 144 |
| TTN | 10 | 1 | 2 | 0 | 17 | 2581 | 6 | 5 | 717 | 5 |
| TPP2 | 4 | 0 | 2 | 0 | 18 | 1041 | NA | 2674 | 3172 | 2926 |
| UFL1 | 7 | 1 | 1 | 0 | 19 | 802 | NA | NA | 3493 | 3129 |
| BRCA1 | 4652 | 4017 | 22 | 1 | 20 | 25 | 11 | NA | 361 | 27 |
| BACH2 | 8 | 1 | 2 | 0 | 21 | 810 | 1182 | 2366 | 2298 | 1079 |
| JAK2 | 382 | 92 | 19 | 1 | 22 | 118 | 32 | NA | 73 | 119 |
| ERBB3 | 354 | 178 | 4 | 1 | 23 | 73 | 29 | 8 | 10 | 207 |
| ERBB4 | 350 | 220 | 4 | 1 | 24 | 74 | 56 | 276 | 18 | 410 |
| MAP2K4 | 70 | 10 | 2 | 1 | 25 | 127 | 34 | 23 | 898 | 86 |
| CTCF | 63 | 21 | 3 | 1 | 26 | 55 | 20 | 211 | 1027 | 29 |
| PRKCB | 41 | 9 | 1 | 1 | 27 | 174 | 59 | 31 | 80 | 151 |
| SASH1 | 13 | 8 | 1 | 0 | 28 | 1011 | NA | NA | 3706 | 4179 |
| TAF1 | 10 | 3 | 1 | 1 | 29 | 225 | 86 | 33 | 359 | 19 |
| SPTA1 | 3 | 0 | 1 | 0 | 30 | 212 | 17 | 25 | 1018 | 109 |
The second to fourth columns show the number of times each of the top 30 identified genes co-appears with ‘cancer’, ‘breast’ and ‘driver’ (from left to right). Is_driver indicates whether the given gene is a driver gene in the benchmark dataset. The remaining columns give the ranking positions of the identified genes in DyTidriver, Diffusion, DriverNet, DawnRank, Muf_max and Muf_sum, respectively