Olivier Collier, Véronique Stoven, Jean-Philippe Vert.
Abstract
Cancer driver genes, i.e., oncogenes and tumor suppressor genes, are involved in the acquisition of important functions in tumors, providing a selective growth advantage, allowing uncontrolled proliferation and avoiding apoptosis. It is therefore important to identify these driver genes, both for the fundamental understanding of cancer and to help find new therapeutic targets or biomarkers. Although the most frequently mutated driver genes have been identified, it is believed that many more remain to be discovered, particularly driver genes specific to some cancer types. In this paper, we propose a new computational method called LOTUS to predict new driver genes. LOTUS is a machine-learning-based approach that integrates various types of data in a versatile manner, including information about gene mutations and protein-protein interactions. In addition, LOTUS can predict cancer driver genes in a pan-cancer setting as well as for specific cancer types, using a multitask learning strategy to share information across cancer types. We empirically show that LOTUS outperforms five other state-of-the-art driver gene prediction methods, both in terms of intrinsic consistency and prediction accuracy, and we provide predictions of new cancer genes across many cancer types.
Year: 2019 PMID: 31568528 PMCID: PMC6786659 DOI: 10.1371/journal.pcbi.1007381
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Comparison of TUSON, MUFFINN and LOTUS in the pan-cancer cross-validation regime.
This table shows the mean cross-validated consistency error (CE) for oncogene (OG) and tumor suppressor gene (TSG) prediction by TUSON, MUFFINN and LOTUS, trained on the TUSON database.
| Driver type ∖ Method | TUSON | MUFFINN | LOTUS |
|---|---|---|---|
| OG | 3,286 | 1,924 | |
| TSG | 626 | 678 | |
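The CE values in these tables can be illustrated with a minimal sketch. It assumes that CE is a consistency error counting, for each held-out known driver gene, how many non-driver genes are ranked above it, averaged over the held-out drivers. This definition, the function name `consistency_error` and the placeholder gene names `GENE_A` to `GENE_C` are assumptions made for illustration, not taken from this record.

```python
def consistency_error(ranked_genes, held_out_drivers, non_drivers):
    """Mean number of non-driver genes ranked above each held-out driver.

    ranked_genes: gene names ordered from most to least likely driver.
    held_out_drivers: known drivers hidden during training (test positives).
    non_drivers: genes not known to be drivers (negatives).
    Assumed definition of CE; lower values mean a more consistent ranking.
    """
    held_out = set(held_out_drivers)
    negatives = set(non_drivers)
    negatives_seen = 0   # non-drivers encountered so far in the ranking
    total = 0            # sum of negatives ranked above each held-out driver
    found = 0            # held-out drivers present in the ranking
    for gene in ranked_genes:
        if gene in held_out:
            total += negatives_seen
            found += 1
        elif gene in negatives:
            negatives_seen += 1
        # any other gene (e.g. a training driver) is ignored
    if found == 0:
        raise ValueError("no held-out driver appears in the ranking")
    return total / found


# Toy ranking: TP53 is treated as a training driver and is therefore ignored.
ranking = ["TP53", "GENE_A", "PTEN", "GENE_B", "GENE_C", "RB1"]
ce = consistency_error(ranking, {"PTEN", "RB1"}, {"GENE_A", "GENE_B", "GENE_C"})
print(ce)
```

Under this assumed definition, a lower CE means that fewer non-driver genes are ranked ahead of the hidden drivers.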
Comparison of 20/20+, DiffMut and LOTUS in the pan-cancer cross-validation regime.
This table shows the mean cross-validated CE for OG and TSG prediction by 20/20+, DiffMut and LOTUS, trained on the 20/20+ database.
| Driver type ∖ Method | 20/20+ | DiffMut | LOTUS |
|---|---|---|---|
| OG | 1,831 | 4,254 | |
| TSG | 845 | 2,537 | |
Comparison of MutSigCV and LOTUS in the pan-cancer cross-validation regime.
This table shows the mean cross-validated CE for OG and TSG prediction by MutSigCV and LOTUS, trained on the MutSigCV database.
| Driver type ∖ Method | MutSigCV | LOTUS |
|---|---|---|
| OG | 6,294 | |
| TSG | 7,232 | |
Performance of LOTUS for OG prediction with different gene kernels.
| Train set ∖ Kernel | |||||
|---|---|---|---|---|---|
| TUSON datasets | 2,904 | 1,574 | 1,659 | 1,553 | |
| 20/20 datasets | 2,453 | 1,642 | 1,774 | 1,628 | |
| MutSig datasets | 2,292 | 1,450 | 1,306 | 1,929 | |
This table shows the CE of LOTUS for OG prediction in the pan-cancer setting, with different gene kernels (columns) and different gold standard sets of known OGs and mutations (rows).
Performance of LOTUS for TSG prediction with different gene kernels.
| Train set ∖ Kernel | |||||
|---|---|---|---|---|---|
| TUSON datasets | 393 | 1,413 | 1,965 | 1,669 | |
| 20/20 datasets | 971 | 2,460 | 2,994 | 2,392 | |
| MutSig datasets | 4,335 | 4,017 | 4,253 | 3,818 | |
This table shows the CE of LOTUS for TSG prediction in the pan-cancer setting, with different gene kernels (columns) and different gold standard sets of known TSGs and mutations (rows).
Fig 1. Correlation between gene degree and LOTUS rank.
This plot shows the Spearman’s rank correlation between gene degrees in the PPI network and LOTUS ranks (when trained with the 20/20 dataset), as a function of how many top-ranked TSGs (left) and OGs (right) are removed before computing the correlation.
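The quantity plotted in Fig 1 can be sketched as follows: drop the first few top-ranked genes, then compute Spearman's rank correlation between the PPI degree and the predicted rank of the remaining genes. The function names and the toy degree values are hypothetical; the sketch uses only the standard library and computes Spearman's coefficient as the Pearson correlation of tie-averaged ranks.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with ties receiving the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def correlation_after_removal(degrees_in_rank_order, n_removed):
    """degrees_in_rank_order[i] is the PPI degree of the gene at rank i + 1."""
    kept = degrees_in_rank_order[n_removed:]
    ranks = list(range(n_removed + 1, len(degrees_in_rank_order) + 1))
    return spearman(kept, ranks)


# Hypothetical degrees: three hub genes at the top, then low-degree genes.
degrees = [120, 95, 80, 12, 9, 15, 7, 11, 6, 8]
for n_removed in (0, 3):
    print(n_removed, round(correlation_after_removal(degrees, n_removed), 3))
```

A negative value means that high-degree genes tend to receive better (smaller) ranks; recomputing it after removing the top genes shows how much of that trend is carried by the top of the list.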
Performance of different methods trained on the TUSON database for TSG prediction in CGCv86.
| Method ∖ Number of predictions | 20 | 50 | 100 |
|---|---|---|---|
| MUFFINN | 3 | 6 | 13 |
| LOTUS | 5 | 10 | 19 |
| TUSON | | | |
For each method (row) trained on the TUSON database, the table shows the number of recently added TSGs in CGCv86 predicted among the top k predictions, for k = 20, 50 or 100 (columns).
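The counts reported in this kind of table can be reproduced with a short sketch: given a ranked list of predicted genes and the set of genes recently added to the reference database, count how many of the latter fall within the top k predictions. The function name `hits_in_top_k` and the gene identifiers `G1` to `G200` are hypothetical placeholders.

```python
def hits_in_top_k(ranked_genes, newly_added, ks=(20, 50, 100)):
    """Number of newly added genes found among the top k predictions, for each k."""
    newly_added = set(newly_added)
    return {k: sum(1 for gene in ranked_genes[:k] if gene in newly_added) for k in ks}


# Hypothetical ranking of 200 genes and 5 genes added in a newer database release.
ranking = [f"G{i}" for i in range(1, 201)]
new_genes = {"G3", "G18", "G47", "G99", "G150"}
hits = hits_in_top_k(ranking, new_genes)
print(hits)
```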
Performance of different methods trained on the 20/20+ database for OG prediction in CGCv86.
| Method ∖ Number of predictions | 20 | 50 | 100 |
|---|---|---|---|
| LOTUS | 3 | 5 | 9 |
| DiffMut | 2 | 3 | 4 |
| 20/20+ | | | |
For each method (row) trained on the 20/20+ database, the table shows the number of recently added OGs in CGCv86 predicted among the top k predictions, for k = 20, 50 or 100 (columns).
Fig 2. ROC curves of different cancer gene prediction methods trained on the TUSON database.
ROC curves of different methods for TSG (left) and OG (right) prediction when trained on the TUSON database, and evaluated on the discovery of cancer genes recently added in CGCv86.
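For readers unfamiliar with this evaluation, the sketch below shows how a ROC curve and its area can be computed from prediction scores and binary labels, where label 1 marks a gene recently added to the reference database. It is a generic standard-library illustration with made-up scores, not the evaluation code used for the figure.

```python
def roc_curve(scores, labels):
    """Return (false positive rate, true positive rate) points, best scores first.

    labels are 1 for positives (e.g. newly added cancer genes) and 0 otherwise.
    Genes with tied scores are added to the curve together.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up scores for six genes; two of them are positives.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 0, 0, 0]
curve = roc_curve(scores, labels)
area = auc(curve)
print(curve, area)
```

The area equals the fraction of (positive, negative) pairs in which the positive gene receives the higher score, which makes it a convenient summary when comparing ranking methods.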
Fig 3. ROC curves of different cancer gene prediction methods trained on the 20/20 database.
ROC curves of different methods for TSG (left) and OG (right) prediction when trained on the 20/20 database, and evaluated on the discovery of cancer genes recently added in CGCv86.
Total number of non-silent mutations in the top predicted genes by different methods trained on the TUSON database.
| Driver type ∖ Method | TUSON | LOTUS | MUFFINN |
|---|---|---|---|
| 20 first OG | 3 | 4 | |
| 50 first OG | 3 | 3 | |
| 100 first OG | 7 | ||
| 20 first TSG | 30 | 7 | |
| 50 first TSG | 14 | 5 | |
| 100 first TSG | 11 | 5 | |
This table shows the total number of non-silent mutations in the top-ranked genes predicted by different methods trained on the TUSON database (columns), for both OG and TSG (rows).
Total number of non-silent mutations in the top predicted genes by different methods trained on the 20/20+ database.
| Driver type ∖ Method | DiffMut | 20/20+ | LOTUS |
|---|---|---|---|
| 20 first OG | 25 | ||
| 50 first OG | 17 | ||
| 100 first OG | 11 | ||
| 20 first TSG | 25 | 11 | |
| 50 first TSG | 18 | 9 | |
| 100 first TSG | 14 | 6 | |
This table shows the total number of non-silent mutations in the top-ranked genes predicted by different methods trained on the 20/20+ database (columns), for both OG and TSG (rows).
Fig 4. Number of cancer genes per cancer type.
This plot shows the distribution of the number of TSGs (left) and OGs (right) per cancer type. For example, 3 cancer types have 0 to 5 TSGs, 6 other cancer types have 6 to 10 TSGs, etc.
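The binning described in this caption (0 to 5, 6 to 10, and so on) can be sketched as below. The cancer type names and counts are hypothetical and chosen only to show the bin boundaries; they are not the data behind the figure.

```python
from collections import Counter


def bin_label(count, width=5):
    """Label of the bin containing `count`: '0-5', '6-10', '11-15', ..."""
    if count <= width:
        return f"0-{width}"
    low = ((count - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


def genes_per_type_histogram(counts_per_type, width=5):
    """Number of cancer types falling in each bin of driver gene counts."""
    return Counter(bin_label(c, width) for c in counts_per_type.values())


# Hypothetical number of known TSGs for six cancer types.
tsg_counts = {"TypeA": 3, "TypeB": 0, "TypeC": 5, "TypeD": 7, "TypeE": 10, "TypeF": 12}
histogram = genes_per_type_histogram(tsg_counts)
print(dict(histogram))
```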
Cross-validated CE for DiffMut and LOTUS prediction of disease specific TSGs in the single- and multitask settings.
| Disease | Number of TSGs | DiffMut | Single-Task LOTUS | Aggregation LOTUS | Multitask LOTUS | Multitask LOTUS2 |
|---|---|---|---|---|---|---|
| ALL | 38 | 7,122 | 1,431 | 783 | 709 | |
| Astrocytoma | 17 | 7,605 | 2,612 | 49 | 38 | |
| BladUroCarc | 9 | 4,852 | 2,052 | 173 | 138 | |
| BreastAdeno | 34 | 3,250 | 1,837 | 801 | 778 | |
| CLL | 21 | 4,253 | 1,336 | 895 | 921 | |
| Colorectal | 53 | 7,600 | 3,640 | 911 | 870 | |
| EndomCarc | 18 | 5,831 | 1,222 | 89 | 82 | |
| GliobMulti | 24 | 4,776 | 2,771 | 166 | 191 | |
| HNSC | 24 | 5,819 | 3,051 | 681 | 595 | |
| Kidney Cancer | 19 | 6,133 | 2,766 | 2,512 | ||
| LAML | 56 | 5,947 | 1,936 | 1,483 | 1,451 | |
| LiverHepCarc | 10 | 3,768 | 602 | 221 | 172 | |
| Low-Grade Glioma | 17 | 6,047 | 2,712 | 49 | 38 | |
| LungAdeno | 30 | 6,773 | 4,712 | 339 | 341 | |
| LungSquaCarc | 16 | 6,829 | 3,868 | 53 | 57 | |
| LungSmallCarc | 28 | 8,883 | 5,746 | 58 | 68 | |
| Lymphoma B-Cell | 37 | 6,383 | 2,238 | 2,284 | 2,252 | |
| Medulloblastoma | 14 | 6,692 | 1,123 | 265 | 247 | |
| Melanoma | 31 | 6,719 | 2,467 | 459 | 365 | |
| Multiple Myeloma | 7 | 5,165 | 3,754 | 3,871 | 3,683 | |
| Ovarian | 22 | 6,481 | 2,632 | 724 | 627 | |
| PancAdeno | 13 | 3,140 | 1,777 | 140 | 123 | |
| ProstAdeno | 7 | 6,565 | 2,345 | 457 | 514 | |
| Rhabd | 6 | 4,871 | 1,957 | 181 | 111 | |
| Soft-Tissue Sarcoma | 23 | 8,572 | 4,447 | 2,008 | 1,992 | |
| StomAdeno | 17 | 6,530 | 2,878 | 331 | 322 | |
| ThyrCarc | 8 | 10,352 | 2,834 | 1,325 | 1,538 | |
For 27 cancer types (rows) with a given number of known TSGs (second column), this table shows the performance, in terms of cross-validated CE, of different methods for TSG prediction, including DiffMut (column 3) and different variants of single- and multitask LOTUS (columns 4 to 7). ALL stands for Acute Lymphocytic Leukemia, BladUroCarc for Bladder Urothelial Carcinoma, BreastAdeno for Breast Adenocarcinoma, CLL for Chronic Lymphocytic Leukemia, EndomCarc for Endometrial Carcinoma, GliobMulti for Glioblastoma Multiforme, HNSC for Head and Neck Squamous Cell Carcinoma, LAML for Acute Myeloid Leukemia, LiverHepCarc for Liver Hepatocellular Carcinoma, LungAdeno for Lung Adenocarcinoma, LungSquaCarc for Lung Squamous Cell Carcinoma, LungSmallCarc for Lung Small Cell Carcinoma, PancAdeno for Pancreatic Adenocarcinoma, ProstAdeno for Prostate Adenocarcinoma, Rhabd for Rhabdomyosarcoma, StomAdeno for Stomach Adenocarcinoma and ThyrCarc for Thyroid Carcinoma.
Cross-validated CE for DiffMut and LOTUS prediction of disease specific OGs in the single- and multitask settings.
| Disease | Number of OGs | DiffMut | Single-Task LOTUS | Aggregation LOTUS | Multitask LOTUS | Multitask LOTUS2 |
|---|---|---|---|---|---|---|
| ALL | 52 | 8,479 | 2,649 | 1,232 | 1,269 | |
| Astrocytoma | 13 | 7,847 | 2,894 | 75 | 63 | |
| BladUroCarc | 10 | 5,324 | 1,578 | 210 | 140 | |
| BreastAdeno | 19 | 2,672 | 1,371 | 852 | 806 | |
| CLL | 19 | 4,582 | 3,821 | 1,537 | 1,501 | |
| Colorectal | 23 | 4,043 | 3,376 | 818 | 784 | |
| EndomCarc | 8 | 5,112 | 1,671 | 122 | 128 | |
| GliobMulti | 22 | 4,915 | 2,539 | 143 | 128 | |
| HNSC | 23 | 4,539 | 2,917 | 1,500 | 1,504 | |
| Kidney Cancer | 11 | 5,774 | 1,903 | 600 | 763 | |
| LAML | 56 | 4,990 | 2,623 | 1,418 | 1,408 | |
| Low-Grade Glioma | 10 | 3,753 | 1,541 | 46 | 33 | |
| LungAdeno | 26 | 4,510 | 2,038 | 84 | 79 | |
| LungSmallCarc | 6 | 3,243 | 2,129 | 1,061 | 864 | |
| LungSquaCarc | 24 | 5,737 | 1,641 | 67 | 54 | |
| Lymphoma B-Cell | 34 | 4,765 | 2,424 | 1,712 | 1,714 | |
| Medulloblastoma | 5 | 7,165 | 58 | 93 | 34 | |
| Melanoma | 35 | 3,377 | 1,925 | 1,576 | 1,550 | |
| Multiple Myeloma | 9 | 3,466 | 2,870 | 1,877 | 2,026 | |
| Neuroblastoma | 5 | 5,298 | 3,830 | 2,078 | 2,101 | |
| Ovarian | 12 | 6,371 | 3,606 | 1,256 | 870 | |
| PancAdeno | 6 | 1,464 | 1,142 | 498 | 426 | |
| ProstAdeno | 13 | 6,523 | 2,451 | 1,599 | 1,475 | |
| Rhabd | 7 | 8,265 | 1,978 | 172 | 104 | |
| Soft-Tissue Sarcoma | 38 | 8,886 | 2,480 | 2,466 | 2,444 | |
| StomAdeno | 10 | 2,235 | 750 | 127 | 97 | |
| ThyrCarc | 8 | 8,407 | 2,656 | 547 | 612 | |
For 27 cancer types (rows) with a given number of known OGs (second column), this table shows the performance, in terms of cross-validated CE, of different methods for OG prediction, including DiffMut (column 3) and different variants of single- and multitask LOTUS (columns 4 to 7). The abbreviations of cancer types are explained in the legend of Table 12.
Performance of different methods trained on the TUSON database for OG prediction in CGCv86.
| Method ∖ Number of predictions | 20 | 50 | 100 |
|---|---|---|---|
| MUFFINN | 3 | 5 | 11 |
| LOTUS | 3 | 7 | |
| TUSON | 12 | | |
For each method (row) trained on the TUSON database, the table shows the number of recently added OGs in CGCv86 predicted among the top k predictions, for k = 20, 50 or 100 (columns).
Performance of different methods trained on the 20/20+ database for TSG prediction in CGCv86.
| Method ∖ Number of predictions | 20 | 50 | 100 |
|---|---|---|---|
| LOTUS | 4 | ||
| DiffMut | 1 | 4 | 10 |
| 20/20+ | 9 | | |
For each method (row) trained on the 20/20+ database, the table shows the number of recently added TSGs in CGCv86 predicted among the top k predictions, for k = 20, 50 or 100 (columns).