
Current cancer driver variant predictors learn to recognize driver genes instead of functional variants.

Daniele Raimondi, Antoine Passemiers, Piero Fariselli, Yves Moreau.

Abstract

BACKGROUND: Identifying variants that drive tumor progression (driver variants) and distinguishing these from variants that are a byproduct of the uncontrolled cell growth in cancer (passenger variants) is a crucial step both for understanding tumorigenesis and for precision oncology. Various bioinformatics methods have attempted to solve this complex task.
RESULTS: In this study, we investigate the assumptions on which these methods are based, showing that the different definitions of driver and passenger variants influence the difficulty of the prediction task. More importantly, we prove that the data sets have a construction bias that prevents machine learning (ML) methods from actually learning variant-level functional effects, despite their excellent performance. This effect arises because, in these data sets, the driver variants map to a few driver genes while the passenger variants are spread across thousands of genes, so simply learning to recognize driver genes yields almost perfect predictions.
CONCLUSIONS: To mitigate this issue, we propose a novel data set that minimizes this bias by ensuring that all genes covered by the data contain both driver and passenger variants. On this data set, the tested predictors experience a significant drop in performance, which should be interpreted not as poorer modeling but as a correction of unwarranted optimism. Finally, we propose a weighting procedure to completely eliminate the gene effects on such predictions, thus precisely evaluating the ability of predictors to model the functional effects of single variants, and we show that this task remains open.
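
The gene-level bias described above can be illustrated with a small sketch. The code below is not the authors' pipeline: the toy data, the function names (`gene_only_accuracy`, `keep_mixed_genes`, `gene_balanced_weights`), and in particular the weighting scheme are assumptions made for illustration, since the abstract does not specify the actual weighting procedure. It scores a "predictor" that sees only the gene name, then shows how restricting the data to genes with both classes and balancing class weight within each gene removes the advantage of such a predictor.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (gene, label) pairs; 1 = driver, 0 = passenger.
# Biased set: drivers sit in two genes, passengers are spread over many genes.
biased = (
    [("GENE_A", 1)] * 8
    + [("GENE_B", 1)] * 6
    + [(f"PASS_{i}", 0) for i in range(14)]
)
# Mixed set: every gene contains both drivers and passengers.
mixed = (
    [("GENE_A", 1)] * 6 + [("GENE_A", 0)] * 2
    + [("GENE_B", 1)] * 1 + [("GENE_B", 0)] * 5
)


def gene_only_accuracy(variants, weights=None):
    """Accuracy of a predictor that outputs each gene's majority label.

    It is fit and scored on the same data on purpose: the goal is to measure
    how much of the label is determined by gene identity alone.
    """
    if weights is None:
        weights = [1.0] * len(variants)
    counts = defaultdict(Counter)
    for gene, label in variants:
        counts[gene][label] += 1
    majority = {g: c.most_common(1)[0][0] for g, c in counts.items()}
    hit = sum(w for (g, y), w in zip(variants, weights) if majority[g] == y)
    return hit / sum(weights)


def keep_mixed_genes(variants):
    """Keep only variants from genes that hold both drivers and passengers."""
    labels = defaultdict(set)
    for gene, label in variants:
        labels[gene].add(label)
    return [(g, y) for g, y in variants if labels[g] == {0, 1}]


def gene_balanced_weights(variants):
    """Assumed scheme: within each gene, each class gets total weight 0.5."""
    counts = Counter(variants)  # number of variants per (gene, label) pair
    return [0.5 / counts[(g, y)] for g, y in variants]


if __name__ == "__main__":
    print("biased, gene-only:", gene_only_accuracy(biased))
    print("mixed, gene-only:", round(gene_only_accuracy(mixed), 3))
    w = gene_balanced_weights(mixed)
    print("mixed, gene-only, weighted:", round(gene_only_accuracy(mixed, w), 3))
```

Under this assumed weighting, any predictor whose output is constant within a gene scores exactly chance level, so whatever performance remains must come from variant-level information.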


Keywords:  Bias in machine learning; Cancer driver variant prediction; Clever Hans effect

Year:  2021        PMID: 33441128      PMCID: PMC7807764          DOI: 10.1186/s12915-020-00930-0

Source DB:  PubMed          Journal:  BMC Biol        ISSN: 1741-7007            Impact factor:   7.431


