Boyoung Yoo (1), Johannes Birgmeier (1), Jonathan A Bernstein (2), Gill Bejerano (3,4,5,6).
1. Department of Computer Science, Stanford School of Engineering, Stanford, CA, USA.
2. Department of Pediatrics, Stanford School of Medicine, Stanford, CA, USA.
3. Department of Computer Science, Stanford School of Engineering, Stanford, CA, USA.
4. Department of Pediatrics, Stanford School of Medicine, Stanford, CA, USA.
5. Department of Developmental Biology, Stanford School of Medicine, Stanford, CA, USA.
6. Department of Biomedical Data Science, Stanford School of Medicine, Stanford, CA, USA.
Gill Bejerano: bejerano@stanford.edu
Abstract
PURPOSE: Roughly 70% of suspected Mendelian disease patients remain undiagnosed after genome sequencing, partly because knowledge about pathogenic genes is incomplete and constantly growing. Generating a novel pathogenic gene hypothesis from patient data can be time-consuming, especially where cohort-based analysis is not available. METHODS: Each patient genome contains dozens to hundreds of candidate variants. Many sources of indirect evidence about each candidate may be considered. We introduce InpherNet, a network-based machine learning approach leveraging Monarch Initiative data to accelerate this process. RESULTS: InpherNet ranks candidate genes based on orthologs, paralogs, functional pathway members, and colocalized interaction partner gene neighbors. It can propose novel pathogenic genes and reveal known pathogenic genes whose diagnosed patient-based annotation is missing or partial. InpherNet is applied to patient cases where the causative gene is incorrectly ranked low by clinical gene-ranking methods that use only patient-derived evidence. InpherNet correctly ranks the causative gene top 1 or top 1-5 in roughly twice as many cases as seven comparable tools, including in cases where no clinical evidence for the diagnostic gene is in our knowledgebase. CONCLUSION: InpherNet improves the state of the art in considering candidate gene neighbors to accelerate monogenic diagnosis.