David Requena, Patrick Maffucci, Benedetta Bigio, Lei Shang, Avinash Abhyankar, Bertrand Boisson, Peter D Stenson, David N Cooper, Charlotte Cunningham-Rundles, Jean-Laurent Casanova, Laurent Abel, Yuval Itan.
Abstract
High-throughput genomic technologies yield about 20,000 variants in the protein-coding exome of each individual. A commonly used approach to select candidate disease-causing variants is to test whether the associated gene has been previously reported to be disease-causing. In the absence of known disease-causing genes, it can be challenging to associate candidate genes with specific genetic diseases. To facilitate the discovery of novel gene-disease associations, we determined the putative biologically closest known genes and their associated diseases for 13,005 human genes not currently reported to be disease-associated. We used these data to construct the closest disease-causing genes (CDG) server, which can be used to infer the closest genes with an associated disease for a user-defined list of genes or diseases. We demonstrate the utility of the CDG server in five immunodeficiency patient exomes across different diseases and modes of inheritance, where CDG dramatically reduced the number of candidate genes to be evaluated. This resource will be a considerable asset for ascertaining the potential relevance of genetic variants found in patient exomes to specific diseases of interest. The CDG database and online server are freely available to non-commercial users at: http://lab.rockefeller.edu/casanova/CDG.
Keywords: disease-causing gene; gene filtering; genomics; human gene connectome; next-generation sequencing
Year: 2018 PMID: 29997612 PMCID: PMC6030251 DOI: 10.3389/fimmu.2018.01340
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
Figure 1. Predicted degrees of separation (blue) between the 13,005 genes from the Human Gene Mutation Database (HGMD) not known to be disease-causing and their closest predicted HGMD disease-causing genes, and (orange) between all pairs of human genes.
Figure 2. Comparative performance of CDG, FunCoup, and HumanNet using (A) 339 new genes in the Human Gene Mutation Database (HGMD) and (B) 84 genes in ClinVar that are not in HGMD. The numbers below each method show the number of genes with at least one predicted gene (left) and how many were associated with the expected disease (right). Black numbers show the gene distribution across the three servers, and white numbers show how many were associated with the expected disease in each server.
Figure 3. Bootstrapping simulations comparing two sets of p-values: (1) expected, between the 13,005 genes not reported to cause disease and their predicted CDGs; and (2) observed, between new Human Gene Mutation Database genes (i.e., not used to generate the CDGs presented in this study) and their predicted CDGs. The test was performed by random sampling using a Gaussian distribution.
Figure 4. Schematic of the closest disease-causing genes (CDG) server pipeline, where CDGs can be estimated from queries of gene or disease lists provided by the user.
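As a rough illustration of the kind of lookup the CDG server performs for a gene query, the sketch below ranks known disease-causing genes by a pairwise distance to the query gene and returns the nearest ones with their diseases. It is not the authors' implementation: the gene names, distance values, disease labels, and the function name `closest_disease_genes` are all hypothetical, and the record does not specify how the server computes or stores its distances.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: a pairwise "biological distance" from each query gene
# to each known disease-causing gene. All names and values are invented.
DISTANCES: Dict[Tuple[str, str], float] = {
    ("GENE_A", "GENE_X"): 2.0,
    ("GENE_A", "GENE_Y"): 5.5,
    ("GENE_B", "GENE_X"): 7.1,
    ("GENE_B", "GENE_Y"): 1.3,
}

# Hypothetical map from known disease-causing genes to their diseases.
DISEASES: Dict[str, List[str]] = {
    "GENE_X": ["disease 1"],
    "GENE_Y": ["disease 2"],
}


def closest_disease_genes(
    query: str, top_n: int = 1
) -> List[Tuple[str, float, List[str]]]:
    """Return up to top_n disease-causing genes nearest to the query gene.

    Each result is (disease_gene, distance, diseases), sorted by
    increasing distance. An unknown query gene yields an empty list.
    """
    candidates = [
        (dist, disease_gene)
        for (gene, disease_gene), dist in DISTANCES.items()
        if gene == query and disease_gene in DISEASES
    ]
    candidates.sort()  # smallest distance first
    return [(g, d, DISEASES[g]) for d, g in candidates[:top_n]]


if __name__ == "__main__":
    for gene in ["GENE_A", "GENE_B"]:
        print(gene, closest_disease_genes(gene, top_n=2))
```

A disease-based query could reuse the same tables in the other direction, by first selecting the disease-causing genes annotated with the disease of interest and then listing the query genes closest to them.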