Nicola Whiffin, Eric Minikel, Roddy Walsh, Anne H O'Donnell-Luria, Konrad Karczewski, Alexander Y Ing, Paul J R Barton, Birgit Funke, Stuart A Cook, Daniel MacArthur, James S Ware.
Abstract
Purpose: Whole-exome and whole-genome sequencing have transformed the discovery of genetic variants that cause human Mendelian disease, but discriminating pathogenic from benign variants remains a daunting challenge. Rarity is recognized as a necessary, although not sufficient, criterion for pathogenicity, but frequency cutoffs used in Mendelian analysis are often arbitrary and overly lenient. Recent very large reference datasets, such as the Exome Aggregation Consortium (ExAC), provide an unprecedented opportunity to obtain robust frequency estimates even for very rare variants.
Methods: We present a statistical framework for the frequency-based filtering of candidate disease-causing variants, accounting for disease prevalence, genetic and allelic heterogeneity, inheritance mode, penetrance, and sampling variance in reference datasets.
Results: Using the example of cardiomyopathy, we show that our approach reduces by two-thirds the number of candidate variants under consideration in the average exome, without removing true pathogenic variants (false-positive rate < 0.001).
Conclusion: We outline a statistically robust framework for assessing whether a variant is "too common" to be causative for a Mendelian disorder of interest. We present precomputed allele frequency cutoffs for all variants in the ExAC dataset.
Year: 2017 PMID: 28518168 PMCID: PMC5563454 DOI: 10.1038/gim.2017.26
Source DB: PubMed Journal: Genet Med ISSN: 1098-3600 Impact factor: 8.822
Details of the most prevalent pathogenic variants in case cohorts for five cardiac conditions
| Disease | Prevalence | Case allele count / cases | Case frequency (95% CI) | Penetrance | Estimated population AF (95% CI) | Maximum tolerated ExAC AC | Observed ExAC AC |
|---|---|---|---|---|---|---|---|
| HCM | 1/500 | 104/6,179 | 1.7% (1.4–2.0%) | 0.5 | 3.4 × 10⁻⁵ (2.7–4.0 × 10⁻⁵) | 9 | 3 |
| DCM | 1/250 | 18/1,254 | 1.4% (0.78–2.1%) | 0.5 | 5.6 × 10⁻⁵ (3.1–8.4 × 10⁻⁵) | 16 | 0 |
| ARVC | 1/1,000 | 24/361 | 6.7% (4.1–9.2%) | 0.5 | 6.7 × 10⁻⁵ (4.1–9.2 × 10⁻⁵) | 17 | 6 |
| LQTS | 1/2,000 | 30/2,500 | 1.2% (0.77–1.6%) | 0.5 | 6.0 × 10⁻⁶ (3.9–8.2 × 10⁻⁶) | 3 | 0 |
| Brugada | 1/1,000 | 14/2,111 | 0.66% (0.32–1.0%) | 0.5 | 6.6 × 10⁻⁶ (0.32–1.0 × 10⁻⁵) | 3 | 0 |
AC, allele count; ARVC, arrhythmogenic right ventricular cardiomyopathy; DCM, dilated cardiomyopathy; ExAC, Exome Aggregation Consortium database; HCM, hypertrophic cardiomyopathy; LQTS, long QT syndrome.
Shown along with the frequency in cases is the estimated population allele frequency (calculated as: case frequency × disease prevalence × 1/2 × 1/variant penetrance) and the observed frequency in the ExAC dataset.
As penetrance estimates for individual variants are not widely available, we have applied an estimate of 0.5 across these cardiac disorders (see Supplementary Information). Case cohorts and prevalence estimates (taken as the highest value reported) were obtained from HCM,[14, 15] DCM,[14, 15, 34] ARVC,[15, 35] LQTS,[36, 37] and Brugada.[38, 39]
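The population allele frequency formula quoted in the table note can be checked with a few lines of Python. This is a minimal sketch: the function name `estimated_population_af` and its parameter names are illustrative and do not come from the paper.

```python
def estimated_population_af(case_allele_count, n_cases, prevalence, penetrance):
    """Population AF implied by a variant's frequency in a case cohort.

    Follows the formula in the table note:
    case frequency x disease prevalence x 1/2 x 1/variant penetrance.
    """
    case_frequency = case_allele_count / n_cases
    return case_frequency * prevalence * 0.5 / penetrance


# HCM row: 104 observations in 6,179 cases, prevalence 1/500, penetrance 0.5
hcm_af = estimated_population_af(104, 6179, 1 / 500, 0.5)
print(f"{hcm_af:.1e}")  # 3.4e-05
```

With a penetrance of 0.5 the factors 1/2 and 1/penetrance cancel, so each estimate in the table is simply the case frequency multiplied by the prevalence.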
Figure 1. Plot of Exome Aggregation Consortium (ExAC) allele count (all populations) against case allele count for variants classified as variants of unknown significance (VUS), likely pathogenic, or pathogenic in 6,179 cases of hypertrophic cardiomyopathy. The dotted lines represent the maximum tolerated ExAC allele counts in hypertrophic cardiomyopathy for 50% (dark blue) and 100% (light blue) penetrance. Variants are color-coded according to reported pathogenicity. Where classifications from contributing laboratories were discordant, the more conservative classification is plotted. The inset panel shows the full dataset; the main panel expands the region of primary interest. True pathogenic variants appropriately fall below our derived allele count threshold.
Maximum credible population frequencies and maximum tolerated ExAC allele counts for variants causative of exemplar inherited cardiac conditions, assuming a penetrance of 0.5 throughout
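| Disease | Maximum allelic contribution | Prevalence | Penetrance | Maximum credible population AF | Maximum tolerated ExAC AC |
|---|---|---|---|---|---|
| Marfan | 0.015 | 1/3,000 | 0.5 | 5.0 × 10⁻⁶ | 2 |
| Noonan | 0.10 | 1/1,000 | 0.5 | 1.0 × 10⁻⁴ | 18 |
| CPVT | 0.10 | 1/10,000 | 0.5 | 1.0 × 10⁻⁵ | 3 |
| Classic Ehlers-Danlos | 0.40 | 1/20,000 | 0.5 | 2.0 × 10⁻⁵ | 5 |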
CPVT, catecholaminergic polymorphic ventricular tachycardia; ExAC, Exome Aggregation Consortium database.
Prevalence estimates (taken as the highest value reported) were obtained from Marfan,[40] Noonan,[18] CPVT,[19] and classical Ehlers-Danlos.[20]
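The values in this table can be reproduced with the sketch below, which rests on two assumptions not stated in this record. First, the leading numeric column is read as the largest proportion of cases attributable to a single variant, and is substituted for case frequency in the formula given earlier. Second, the reference sample is taken to be 121,412 chromosomes (2 × 60,706 ExAC individuals). The function names are illustrative, and the Poisson upper bound follows the one-tailed 95% description in the Figure 3 caption.

```python
import math

EXAC_CHROMOSOMES = 121_412  # assumption: 2 x 60,706 ExAC individuals


def max_credible_af(prevalence, allelic_contribution, penetrance):
    """Assumed form: prevalence x allelic contribution x 1/2 x 1/penetrance."""
    return prevalence * allelic_contribution * 0.5 / penetrance


def max_tolerated_ac(af, n_chromosomes=EXAC_CHROMOSOMES, confidence=0.95):
    """Smallest count k whose Poisson CDF at mean af * n_chromosomes
    reaches `confidence` (a one-tailed upper bound on the allele count)."""
    lam = af * n_chromosomes
    term = math.exp(-lam)  # P(X = 0); fine for the small means used here
    cumulative = term
    k = 0
    while cumulative < confidence:
        k += 1
        term *= lam / k  # P(X = k) from P(X = k - 1)
        cumulative += term
    return k


for name, contribution, prevalence in [
    ("Marfan", 0.015, 1 / 3000),
    ("Noonan", 0.10, 1 / 1000),
    ("CPVT", 0.10, 1 / 10000),
    ("Classic Ehlers-Danlos", 0.40, 1 / 20000),
]:
    af = max_credible_af(prevalence, contribution, 0.5)
    print(name, f"{af:.1e}", max_tolerated_ac(af))
```

A variant observed in the reference sample more often than the returned count would be treated as too common to cause the disease under these assumptions.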
Figure 2. A flow diagram of our approach, applied to a dominant condition, and using Exome Aggregation Consortium (ExAC) as our reference sample. First, a disease-level maximum credible population allele frequency (AF) is calculated, based on disease prevalence, heterogeneity, and penetrance. To evaluate a specific variant, we determine whether the observed variant allele count is compatible with disease by comparing this maximum credible population AF against the (precalculated) filtering AF for the variant. *While filtering AF has been precomputed for ExAC variants, the same framework can be readily applied using another reference sample.
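One way to read the per-variant "filtering AF" in this flow is as the inverse of the allele count threshold: the largest population AF at which the observed allele count would still sit above the one-tailed 95% Poisson bound. The sketch below implements that reading by bisection. It is an interpretation rather than the published ExAC computation, which may differ in detail; `filter_af`, `poisson_cdf`, and the default chromosome count of 121,412 in the usage lines are assumptions.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def filter_af(ac, an, confidence=0.95):
    """Largest AF for which an observed count `ac` among `an` chromosomes
    exceeds the one-tailed Poisson bound, i.e. P(X <= ac - 1) >= confidence."""
    if ac == 0:
        return 0.0  # an unobserved variant is never filtered
    lo, hi = 0.0, float(ac)  # the CDF condition holds at 0 and fails at ac
    for _ in range(200):  # bisection on the Poisson mean
        mid = (lo + hi) / 2
        if poisson_cdf(ac - 1, mid) >= confidence:
            lo = mid
        else:
            hi = mid
    return lo / an


# Hypothetical variants compared against a maximum credible AF of 4.0e-5
max_credible = 4.0e-5
for ac in (9, 10):
    too_common = filter_af(ac, 121_412) > max_credible
    print(ac, "too common" if too_common else "retained")
```

Because the filtering AF depends only on the variant's observed counts, it can be computed once per variant and then compared against any disease-specific maximum credible AF.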
Figure 3. The clinical utility of stringent allele frequency (AF) thresholds. (a) The number of predicted protein-altering variants (definition in “Materials and Methods”) per exome as a function of the AF filter applied. A one-tailed 95% confidence interval is used, meaning that variants were removed from consideration if their AC would fall within the top 5% of the Poisson probability distribution for the user’s maximum credible AF (x axis). (b) The odds ratio for HCM disease-association against AF. The disease odds ratio of a burden test for variants in HCM genes is shown, stratified by variant allele frequency. For each AF bin, the prevalence of variants in sarcomeric HCM-associated genes (MYH7, MYBPC3, TNNT2, TNNI3, MYL2, MYL3, TPM1, and ACTC1, analyzed collectively) in 322 HCM cases and 852 healthy controls was compared, and an odds ratio computed (see “Materials and Methods”). Data for each bin are plotted at the upper AF cutoff. Error bars represent 95% confidence intervals. The probability that a variant is pathogenic is much greater at very low AFs.