Tychele N Turner1, Qian Yi2, Niklas Krumm2, John Huddleston2,3, Kendra Hoekzema2, Holly A F Stessman2, Anna-Lisa Doebley2,4, Raphael A Bernier5, Deborah A Nickerson2, Evan E Eichler2,3.
Abstract
Whole-exome and whole-genome sequencing have facilitated the large-scale discovery of de novo variants in human disease. To date, most de novo discovery through next-generation sequencing has focused on congenital heart disease and neurodevelopmental disorders (NDDs). Currently, de novo variants are among the most significant risk factors for NDDs, with a substantial overlap of genes involved in more than one NDD. To facilitate better usage of published data, provide standardization of annotation, and improve accessibility, we created denovo-db (http://denovo-db.gs.washington.edu), a database for human de novo variants. As of July 2016, denovo-db contained 40 different studies and 32,991 de novo variants from 23,098 trios. Database features include basic variant information (chromosome location, change, type); detailed annotation at the transcript and protein levels; severity scores; frequency; validation status; and, most importantly, the phenotype of the individual with the variant. Our browsable website includes a feature to download any query result, as well as a downloadable file of the full database with additional variant details. denovo-db provides the information researchers need to compare their data to other individuals with the same phenotype and also to controls, allowing for a better understanding of the biology of de novo variants and their contribution to disease.
Year: 2016 PMID: 27907889 PMCID: PMC5210614 DOI: 10.1093/nar/gkw865
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Statistics of denovo-db. (A) Shown are the number of mutations present in the database by the primary phenotype of the individual. (B) Number of mutations split out by the SeattleSeq function class of the variant. (C) CADD score distribution of all variants. (D) Percent of sites validated by primary phenotype.
Figure 2. Browser shots of denovo-db. (A) Result of a gene search for CHD8. (B) Result of a sample search for 11654.p1.
Figure 3. Likely gene-disrupting (LGD) events by cases (in red) and controls (in black). Shown are the counts of LGD events by cases (all phenotypes) and controls with the genes listed for each category. Note there are two bars for the genes with two counts in cases and zero in controls to allow for the full gene list to fit on the plot.
Figure 4. Missense CADD scores in denovo-db. Empirical cumulative distribution functions of missense CADD scores in the following phenotypes: controls, autism, congenital heart defect (CHD), intellectual disability (ID), and epilepsy individuals.
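The per-phenotype comparison in Figure 4 rests on empirical cumulative distribution functions (ECDFs). A minimal sketch of that calculation follows; it is not the authors' code. The function name `ecdf`, the dictionary layout, and the score values are all assumptions made for illustration, and the numbers are placeholders that do not come from denovo-db.

```python
from bisect import bisect_right


def ecdf(scores):
    """Return a function F where F(x) is the fraction of scores <= x."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one score")

    def F(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(ordered, x) / n

    return F


# Placeholder CADD-like scores for illustration only; NOT taken from denovo-db.
toy_scores = {
    "control": [5.0, 10.0, 15.0, 20.0],
    "autism": [10.0, 20.0, 25.0, 30.0],
}

# One ECDF per phenotype, mirroring the per-phenotype curves in the figure.
curves = {phenotype: ecdf(values) for phenotype, values in toy_scores.items()}

for phenotype, F in curves.items():
    # Fraction of missense variants scoring at or below 20 in each group
    print(phenotype, F(20.0))
```

A curve that sits lower at a given score means a larger share of that group's variants score above it, which is how a shift toward more severe missense variants in one phenotype would appear in such a plot.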