Adam W Hansen, Mullai Murugan, He Li, Michael M Khayat, Liwen Wang, Jill Rosenfeld, B Kim Andrews, Shalini N Jhangiani, Zeynep H Coban Akdemir, Fritz J Sedlazeck, Allison E Ashley-Koch, Pengfei Liu, Donna M Muzny, Erica E Davis, Nicholas Katsanis, Aniko Sabo, Jennifer E Posey, Yaping Yang, Michael F Wangler, Christine M Eng, V Reid Sutton, James R Lupski, Eric Boerwinkle, Richard A Gibbs.
Abstract
The advent of inexpensive, clinical exome sequencing (ES) has led to the accumulation of genetic data from thousands of samples from individuals affected with a wide range of diseases, but for whom the underlying genetic and molecular etiology of their clinical phenotype remains unknown. In many cases, detailed phenotypes are unavailable or poorly recorded and there is little family history to guide study. To accelerate discovery, we integrated ES data from 18,696 individuals referred for suspected Mendelian disease, together with relatives, in an Apache Hadoop data lake (Hadoop Architecture Lake of Exomes [HARLEE]) and implemented a genocentric analysis that rapidly identified 154 genes harboring variants suspected to cause Mendelian disorders. The approach did not rely on case-specific phenotypic classifications but was driven by optimization of gene- and variant-level filter parameters utilizing historical Mendelian disease-gene association discovery data. Variants in 19 of the 154 candidate genes were subsequently reported as causative of a Mendelian trait and additional data support the association of all other candidate genes with disease endpoints.
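The genotype-first idea in the abstract can be illustrated with a small sketch. This is not the HARLEE implementation, and it does not use Hadoop. The record fields, the thresholds (`MAX_POPULATION_AF`, `MIN_FAMILIES`), and the `DAMAGING` consequence set are all hypothetical stand-ins for the gene- and variant-level filter parameters that the authors optimized against historical discovery data. The sketch shows only the general pattern: filter variants without consulting any phenotype, then rank genes by how many unrelated families carry a qualifying variant.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    family_id: str        # relatives share one family_id (hypothetical field)
    gene: str
    population_af: float  # allele frequency in a reference population
    consequence: str      # e.g. "stop_gained", "missense", "synonymous"


# Hypothetical filter parameters, chosen only for illustration.
MAX_POPULATION_AF = 1e-4
DAMAGING = {"stop_gained", "frameshift", "splice_site", "missense"}
MIN_FAMILIES = 2


def candidate_genes(variants, max_af=MAX_POPULATION_AF,
                    damaging=DAMAGING, min_families=MIN_FAMILIES):
    """Return {gene: family_count} for genes that pass the filters.

    No phenotype information is used: a variant qualifies if it is rare
    and predicted damaging, and a gene qualifies if such variants occur
    in at least `min_families` distinct families.
    """
    families_by_gene = defaultdict(set)
    for v in variants:
        if v.population_af <= max_af and v.consequence in damaging:
            # A set counts each family once, however many relatives carry it.
            families_by_gene[v.gene].add(v.family_id)
    return {
        gene: len(families)
        for gene, families in families_by_gene.items()
        if len(families) >= min_families
    }


# Toy data with placeholder gene names.
variants = [
    Variant("F1", "GENE_A", 0.0, "stop_gained"),
    Variant("F1", "GENE_A", 0.0, "stop_gained"),   # relative in the same family
    Variant("F2", "GENE_A", 0.00002, "frameshift"),
    Variant("F3", "GENE_B", 0.01, "missense"),     # too common
    Variant("F4", "GENE_B", 0.0, "synonymous"),    # not in the damaging set
    Variant("F5", "GENE_C", 0.0, "missense"),      # only one family
]

print(candidate_genes(variants))  # {'GENE_A': 2}
```

Counting families rather than individuals keeps relatives who share a variant from inflating the evidence for a gene. In a real analysis the thresholds would be tuned rather than fixed, for example by checking how many previously established disease genes each parameter setting recovers.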
Keywords: HARLEE; Hadoop; Mendelian disease; big data; clan genomics; data lake; developmental disorder; genotype-first; ultra-rare; whole-exome sequencing
Year: 2019 PMID: 31668702 PMCID: PMC6849092 DOI: 10.1016/j.ajhg.2019.09.027
Source DB: PubMed Journal: Am J Hum Genet ISSN: 0002-9297 Impact factor: 11.025