Sanna Gudmundsson, Moriel Singer-Berk, Nicholas A Watts, William Phu, Julia K Goodrich, Matthew Solomonson, Heidi L Rehm, Daniel G MacArthur, Anne O'Donnell-Luria.
Abstract
Reference population databases are an essential tool in variant and gene interpretation. Their use guides the identification of pathogenic variants amid the sea of benign variation present in every human genome and supports the discovery of new disease-gene relationships. The Genome Aggregation Database (gnomAD) is currently the largest and most widely used publicly available collection of population variation from harmonized sequencing data. The data are available through the online gnomAD browser (https://gnomad.broadinstitute.org/), which enables rapid and intuitive variant analysis. This review provides guidance on the content of the gnomAD browser and its use for variant and gene interpretation. We introduce key features, including allele frequency, per-base expression levels, constraint scores, and variant co-occurrence, alongside guidance on how to use them in analysis, with a focus on interpreting candidate variants and novel genes in rare disease.
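Allele frequency is the first metric most analyses start from. As a minimal sketch, assuming the standard definition AF = AC / AN (alternate allele count over the total number of genotyped alleles at the site), the calculation and the "very rare" cutoff used in Figure 2a (AF < 0.1%) can be written as follows. The `AlleleCounts` class and both function names are hypothetical illustrations, not part of any gnomAD API.

```python
from dataclasses import dataclass

# Cutoff for "very rare" variants used in Figure 2a (allele frequency < 0.1%).
VERY_RARE_AF = 0.001


@dataclass(frozen=True)
class AlleleCounts:
    """Hypothetical container for one variant's counts in one population."""
    ac: int  # allele count: number of alternate alleles observed
    an: int  # allele number: alleles genotyped at the site (typically 2 per diploid genotype)


def allele_frequency(counts: AlleleCounts) -> float:
    """Return AF = AC / AN, refusing sites where no alleles were genotyped."""
    if counts.an <= 0:
        raise ValueError("AN must be positive to compute an allele frequency")
    if not 0 <= counts.ac <= counts.an:
        raise ValueError("AC must lie between 0 and AN")
    return counts.ac / counts.an


def is_very_rare(counts: AlleleCounts, cutoff: float = VERY_RARE_AF) -> bool:
    """True if the variant's AF falls below the cutoff (default 0.1%)."""
    return allele_frequency(counts) < cutoff


site = AlleleCounts(ac=3, an=250_000)
print(allele_frequency(site), is_very_rare(site))  # prints: 1.2e-05 True
```

Raising an error when AN is zero, rather than returning 0, keeps an uncalled site from being mistaken for an observed-but-absent variant.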
Keywords: allele frequency; constraint; database; gnomAD; reference population; variant interpretation
Year: 2021; PMID: 34859531; PMCID: PMC9160216; DOI: 10.1002/humu.24309
Source DB: PubMed; Journal: Hum Mutat; ISSN: 1059-7794; Impact factor: 4.700
Figure 1. The gnomAD database aids variant interpretation worldwide. (a) Weekly page views of gnomAD (dark blue) and ExAC (light blue) from release in October 2014 to mid‐2021. (b) Number of unique gnomAD page views per country over the past 12 months (since 2020‐06‐14), colored by count: none (grey), 5–10,000 (pink), and more than 10,000 (purple). (c) Schematic of the distribution and overlap of more than 195,000 unique individuals in gnomAD v2 exomes (orange), v2 genomes (green), and v3 genomes (violet).
Figure 2. (a) Mean count of very rare coding variants (allele frequency < 0.1%) and (b) mean count of unique coding variants (across v2 and v3), grouped by population; black bars represent 95% confidence intervals. (c) Comparison of the genome‐wide distributions of loss‐of‐function (LoF) constraint scores in 19,197 genes, colored by LOEUF decile: LOEUF scores follow a continuous distribution, whereas pLI scores follow a dichotomous‐like distribution. Dotted lines mark the suggested thresholds for LoF‐constrained genes in gene interpretation (pLI ≥ 0.9 and LOEUF < 0.35). LOEUF, LoF observed/expected upper bound fraction; pLI, probability of being LoF intolerant.
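The suggested constraint thresholds and the LOEUF deciles in panel (c) can be applied mechanically once a gene's scores are known. The sketch below assumes the scores have already been looked up (for example, from the gene page's constraint table) and treats the cutoffs as the suggested guides the caption describes, not as hard rules. The function names are hypothetical. Because LOEUF is continuous, reporting a gene's decile relative to all genes is often more informative than a yes/no call.

```python
from bisect import bisect_left
from typing import Dict, Optional, Sequence

# Suggested (not absolute) thresholds for LoF-constrained genes from Figure 2c.
PLI_CUTOFF = 0.9     # constrained if pLI >= 0.9
LOEUF_CUTOFF = 0.35  # constrained if LOEUF < 0.35


def is_lof_constrained(loeuf: Optional[float] = None,
                       pli: Optional[float] = None) -> Dict[str, bool]:
    """Report which suggested cutoff a gene meets; None means the score is unavailable."""
    return {
        "loeuf": loeuf is not None and loeuf < LOEUF_CUTOFF,
        "pli": pli is not None and pli >= PLI_CUTOFF,
    }


def loeuf_decile(value: float, genome_wide: Sequence[float]) -> int:
    """Decile 0 (most constrained) to 9 (least), relative to all genes' LOEUF scores."""
    ranked = sorted(genome_wide)
    if not ranked:
        raise ValueError("need at least one genome-wide LOEUF score")
    rank = bisect_left(ranked, value)  # number of genes with a strictly lower LOEUF
    return min(9, 10 * rank // len(ranked))


print(is_lof_constrained(loeuf=0.2, pli=0.99))  # prints: {'loeuf': True, 'pli': True}
```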
Figure 3. The gnomAD gene page, using NSD1 as an example. The page displays gene‐level metrics and the variant distribution and allows customized filtering. Highlighted features: (1–2) navigating datasets; (3) exome and (4) genome gene coverage; (5) gene strand (NSD1 is on the forward strand); (6–8) transcript and expression information; (9–10) constraint table; (11) proportion expressed across transcripts (pext) score and (12) an example of a region with low pext; (13) filtering options for ClinVar variants and (14) expanded ClinVar variant view; (15–16) gnomAD variant tracks and (17) filtering of gnomAD variants by consequence; (18) variant search bar; (19) variant table; (20) filtering by sequencing method and variant type, with an option to include low‐quality filtered variants; (21) variant table customization; (22) variant table download.
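The pext score in items (11–12) summarizes how much of a gene's expression comes from transcripts that actually contain a given position. As a sketch, assuming pext for a base in one tissue is the summed expression of the transcripts covering that base divided by the gene's total expression, it reduces to a simple ratio. The transcript IDs and TPM values below are hypothetical, and how the browser aggregates scores across tissues is not modeled here. A low value, as in the region highlighted in item (12), means the base lies mainly in lowly expressed transcripts.

```python
from typing import Mapping, Set


def pext_score(transcript_tpm: Mapping[str, float], transcripts_with_base: Set[str]) -> float:
    """Share of a gene's expression (in one tissue) from transcripts that include the base.

    transcript_tpm: expression of each transcript of the gene, e.g. in TPM.
    transcripts_with_base: IDs of transcripts whose exons cover the position of interest.
    """
    total = sum(transcript_tpm.values())
    if total <= 0:
        raise ValueError("gene has no expression in this tissue")
    covered = sum(tpm for tx, tpm in transcript_tpm.items() if tx in transcripts_with_base)
    return covered / total


# Hypothetical transcripts T1-T3; the base is exonic only in T1 and T2.
tpm = {"T1": 10.0, "T2": 30.0, "T3": 60.0}
print(pext_score(tpm, {"T1", "T2"}))  # 40 / 100, prints: 0.4
```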
Figure 4. The gnomAD variant page, using the NSD1 missense variant 5‐176562246‐A‐G, p.Met48Val (NM_022455.5:c.142A>G) as an example. The page displays variant‐level information and site‐specific metrics. Highlighted features: (1) external resources and (2) variant feedback forms; (3) allele frequency summary table, including the filtering allele frequency; (4) population frequency table and (5) visualization of subcontinental populations; (6) navigating datasets; (7) liftover link to gnomAD v3 and (8) visualization of the v3 non‐v2 dataset; (9) age data; (10) genotype quality, depth, and heterozygote allele balance metrics and (11) site quality metrics; (12) read data and (13) the option to load read data for additional individuals.
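Item (3) refers to the filtering allele frequency (FAF). The following is a hedged sketch that assumes a one‑sided Poisson model. Under this model, FAF is the highest true AF at which observing at least the recorded AC among AN alleles would still have a probability of at most 5%. FAF is therefore a conservative lower bound on the true AF. It can be compared against the highest AF a disease-causing variant could plausibly reach. The function names are hypothetical, and gnomAD's production computation may differ in details, such as how results are combined across populations.

```python
import math


def prob_at_least(ac: int, lam: float) -> float:
    """P(X >= ac) for X ~ Poisson(lam), summing pmf terms in log space to avoid underflow."""
    if ac <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    log_lam = math.log(lam)
    cdf = sum(math.exp(-lam + i * log_lam - math.lgamma(i + 1)) for i in range(ac))
    return min(1.0, max(0.0, 1.0 - cdf))


def filtering_af(ac: int, an: int, confidence: float = 0.95, iters: int = 200) -> float:
    """Largest AF at which seeing >= ac alt alleles in an alleles has probability <= 1 - confidence."""
    if an <= 0:
        raise ValueError("AN must be positive")
    if ac <= 0:
        return 0.0
    alpha = 1.0 - confidence
    # P(X >= ac) increases with lam: it is 0 at lam = 0 and above 0.5 at lam = ac,
    # so for confidence > 0.5 the crossing point lies in [0, ac]; find it by bisection.
    lo, hi = 0.0, float(ac)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if prob_at_least(ac, mid) <= alpha:
            lo = mid
        else:
            hi = mid
    return lo / an  # convert the expected allele count back to a frequency


# A singleton (AC = 1) in 100,000 alleles: the observed AF is 1e-5, but the FAF is far lower.
print(filtering_af(1, 100_000))
```

For a singleton, the bound has a closed form, −ln(0.95) / AN, which offers a quick check on the bisection.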