Jun Wang1,2, Hehe Liu1,2, Renae Elaine Bertrand1,3, Alejandro Sarrion-Perdigones3, Yezabel Gonzalez3, Koen J T Venken3,4, Rui Chen5,6. 1. Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA. 2. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA. 3. Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX, USA. 4. Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, TX, USA. 5. Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA. ruichen@bcm.edu. 6. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA. ruichen@bcm.edu.
Abstract
PURPOSE: To achieve the ultimate goal of personalized treatment of patients, accurate molecular diagnosis and precise interpretation of the impact of genetic variants on gene function are essential. With sequencing cost becoming increasingly affordable, accurately distinguishing benign from pathogenic variants becomes the major bottleneck. Although large normal population sequence databases have become a key resource in filtering benign variants, they are not effective at filtering extremely rare variants. METHODS: To address this challenge, we developed a novel statistical test by combining sequencing data from a patient cohort with a normal control population database. By comparing the expected and observed allele frequency in the patient cohort, variants that are likely benign can be identified. RESULTS: The performance of this new method is evaluated on both simulated and real data sets coupled with experimental validation. As a result, we demonstrate this new test is well powered to identify benign variants, and is particularly effective for variants with low frequency in the normal population. CONCLUSION: Overall, as a general test that can be applied to any type of variant in the context of all Mendelian diseases, our work provides a general framework for filtering benign variants with very low population allele frequency.
The advancement of high-throughput sequencing technology significantly facilitates the identification of genetic variations in individuals and populations. However, determining the pathogenicity of genetic variants found by sequencing remains a major challenge for precision medicine. Over the last two decades, many in silico variant functional prediction tools have been developed to distinguish pathogenic from likely benign genetic variants, but they are far from perfect. For example, methods based on evolutionary sequence conservation might be prone to both false positives and false negatives, as many benign variants also occur in evolutionarily conserved regions and vice versa. In parallel, many computational prediction methods that apply machine learning algorithms are limited by the training dataset and the type of information incorporated into the model. Furthermore, most of the computational prediction methods focus on missense variants, leaving other types of variants, such as INDEL and noncoding variants, largely unexplored. Recently, with genome sequences of large populations becoming available, a significant proportion of benign variants can be identified based on population allele frequency (AF) and disease prevalence, given that variants with high AF in the normal population tend to be benign[1,2]. Similarly, by comparing AF between a patient cohort and the normal control population, variants that are enriched in the patient cohort and therefore likely to be pathogenic can be identified[3]. However, these AF-based methods have limited power when the AF of the variant is low in the normal population.
In this study, we developed a novel statistical method to identify likely benign variants. Briefly, in contrast to previous methods that primarily use the AF in the normal population, our method calculates the expected frequency of an allele in the patient cohort by taking into account the disease prevalence and the AF in the normal population. By comparing the expected and the observed frequency of the variant in the patient cohort, the probability that the variant is pathogenic can be calculated. To test the new statistic, we applied it to both simulated and real datasets and evaluated its performance based on the literature and other variant prediction methods. A subset of the variants whose predictions contradicted previous reports were further examined with experimental functional assays. Based on our results, the model has significantly improved power to annotate rare variants, and it can be applied to all variant types, such as SNPs, INDELs and non-coding variants. Overall, we show that our new method is effective and robust as a general framework for evaluating the pathogenicity of variants in the context of Mendelian diseases.
Methods:
Ethics Statement
The studies have approval from the appropriate institutional review board (BCM IRB).
Derivation of the combined test
The combined test is composed of two binomial tests.
Let Q = the disease prevalence due to the variants in gene g in the population, and q = the AF of a single pathogenic variant a of gene g in the population. Assume that n patients are randomly sampled from the patients attributable to the variants in gene g, and let x be the observed count of allele a in these patients.
For a recessive disease gene g, the expected number of pathogenic alleles in n patients would be 2 × n. Under Hardy–Weinberg equilibrium, the combined frequency of all pathogenic alleles of gene g in the population is √Q, so a fraction q/√Q of the pathogenic alleles carried by patients is expected to be allele a. Among the 2 × n pathogenic alleles, the occurrence count of a pathogenic allele a should therefore follow a Binomial distribution with N = 2 × n trials and an occurrence frequency (success rate) of q/√Q.
The first test (test1) is a left-tailed Binomial.test (X = x, N = 2n, p = q/√Q):
H0 of test1: The allele is pathogenic, thus its observed occurrence in the patient cohort follows a Binomial (N = 2n, p = q/√Q) distribution.
H1 of test1: The observed occurrence of the allele in the patient cohort does not follow Binomial (N = 2n, p = q/√Q) and is significantly lower than 2n × q/√Q. Therefore, it is unlikely to be pathogenic (detailed derivation in Supplementary Methods).
Additionally, if an allele is likely benign, its AF in the patient cohort should be similar to its AF in the normal population, given that it has equal association with patients and the normal population.
Thus, the second test (test2) is a right-tailed Binomial.test (X = x, N = 2n, p = q):
H0 of test2: The allele is benign, thus its observed occurrence in the patient cohort follows a Binomial (N = 2n, p = q) distribution.
H1 of test2: The observed occurrence of the allele in the patient cohort does not follow Binomial (N = 2n, p = q) and is significantly higher than 2n × q. Therefore, it is unlikely to be benign.
The combined test result is based on the results of test1 and test2.
H0 of the combined test: The allele is pathogenic, thus test1 H0 is true (test1 p-value > 0.05) or test2 H0 is rejected (test2 p-value ≤ 0.05).
H1 of the combined test: The test1 H1 is true (test1 p-value ≤ 0.05) and test2 H0 is true (test2 p-value > 0.05), thus the allele is unlikely to be pathogenic (namely likely to be benign).
For a dominant disease gene g, the expected number of pathogenic alleles in n patients (with the rare dominant disease) ≈ n. For a rare dominant disease, Q ≈ 2 × the combined frequency of all pathogenic alleles of gene g, so a fraction 2q/Q of patients is expected to carry allele a. Among the n pathogenic alleles, the occurrence count of a pathogenic allele a should therefore follow a Binomial distribution with N = n trials and an occurrence frequency (success rate) of 2q/Q.
Test1 is a left-tailed Binomial.test (X = x, N = n, p = 2q/Q):
H0 of test1: The allele is likely pathogenic, thus its observed occurrence in the patient cohort follows a Binomial (N = n, p = 2q/Q) distribution.
H1 of test1: The observed occurrence of the allele in the patient cohort does not follow Binomial (N = n, p = 2q/Q) and is significantly lower than n × 2q/Q. Therefore, it is unlikely to be pathogenic (Supplementary Methods).
Additionally, if an allele is benign, its AF in the patient cohort should be similar to its AF in the normal population, given that it has equal association with patients and the normal population.
Test2 is a right-tailed Binomial.test (X = x, N = n, p = 2 × q):
H0 of test2: The allele is likely benign, thus its observed occurrence in the patient cohort follows a Binomial (N = n, p = 2 × q) distribution.
H1 of test2: The observed occurrence of the allele in the patient cohort does not follow Binomial (N = n, p = 2 × q) and is significantly higher than n × 2 × q. Therefore, it is unlikely to be benign.
The combined test result is based on the results of test1 and test2.
H0 of the combined test: The allele is likely pathogenic, thus test1 H0 is true (test1 p-value > 0.05) or test2 H0 is rejected (test2 p-value ≤ 0.05).
H1 of the combined test: The test1 H1 is true (test1 p-value ≤ 0.05) and test2 H0 is true (test2 p-value > 0.05), thus the allele is unlikely to be pathogenic (namely likely to be benign).
The treatment for X-linked genes and for population stratification is described in the supplementary materials.
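The decision rule of the combined test can be illustrated with a minimal Python sketch. It is not the authors' released code. It assumes that the test1 success rate is q/√Q for a recessive gene and 2q/Q for a dominant gene (these follow from Hardy–Weinberg equilibrium and the definitions of Q and q above, but are an assumption of this sketch), and the function names, the capping of the success rate at 1, and the example numbers are hypothetical.

```python
from math import exp, lgamma, log, log1p, sqrt

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space for stability."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log1p(-p))
    return exp(log_pmf)

def left_tail(x, n, p):
    """Left-tailed p-value P(X <= x)."""
    return min(1.0, sum(binom_pmf(k, n, p) for k in range(x + 1)))

def right_tail(x, n, p):
    """Right-tailed p-value P(X >= x)."""
    return max(0.0, 1.0 - sum(binom_pmf(k, n, p) for k in range(x)))

def combined_test(x, n, q, Q, model="AR", alpha=0.05):
    """x: observed allele count in n patients; q: population AF; Q: prevalence due to gene g."""
    if model == "AR":    # recessive: 2n pathogenic alleles expected in n patients
        trials, p_pathogenic, p_benign = 2 * n, q / sqrt(Q), q
    elif model == "AD":  # dominant: about n pathogenic alleles expected in n patients
        trials, p_pathogenic, p_benign = n, 2 * q / Q, 2 * q
    else:
        raise ValueError("model must be 'AR' or 'AD'")
    p_pathogenic = min(p_pathogenic, 1.0)          # assumption: cap the success rate at 1
    test1_p = left_tail(x, trials, p_pathogenic)   # H0: allele is pathogenic
    test2_p = right_tail(x, trials, p_benign)      # H0: allele is benign
    likely_benign = test1_p <= alpha and test2_p > alpha
    return {"test1_p": test1_p, "test2_p": test2_p, "likely_benign": likely_benign}

# Hypothetical recessive example: allele seen 6 times in 1000 patients,
# population AF 0.3%, gene-attributable prevalence 0.7 in 10000.
print(combined_test(x=6, n=1000, q=0.003, Q=0.7e-4, model="AR"))
```

When many variants are screened at once, the same function can be called with a Bonferroni-corrected `alpha` (for example 0.05 divided by the number of variants tested).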
Other Methods
See supplementary materials.
Code availability
Our code is available at: https://github.com/fe4960/Binomial_test
Results:
The simulation analysis of the test power and specificity
To assess the performance of the test, we simulated a variety of scenarios with different settings for the patient cohort size (n), the disease prevalence (Q), and the AF in the normal population (q).
The test power increases as the patient cohort size increases
The test power is positively correlated with the patient cohort size (Figure 1A). A larger patient cohort allows the test to achieve high power and specificity in determining the pathogenicity of rarer variants. For example, under the autosomal dominant (AD) model, for a disease with a prevalence of 1/1000, when the patient cohort size is increased from 120 to 200, the test power increases from 0 to about 100% in detecting rare benign variants with a population AF of 1×10⁻⁵ (Figure 1A, Supplementary table S1). A similar trend was observed under the autosomal recessive (AR) model (Figure 1B, Supplementary table S1).
Figure 1.
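The dependence of power on cohort size can be sketched with an exact calculation instead of a simulation. The Python sketch below is an illustration under stated assumptions, not the simulation procedure used in the study: it assumes the dominant-model success rates 2q/Q (test1) and 2q (test2), a significance level of 0.05, and a truly benign allele whose count in n patients is Binomial(n, 2q). The function names are hypothetical.

```python
from math import exp, lgamma, log, log1p

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space for stability."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log1p(-p))
    return exp(log_pmf)

def exact_power_ad(n, q, Q, alpha=0.05):
    """Probability that a truly benign allele is called likely benign (dominant model)."""
    p_pathogenic = min(2 * q / Q, 1.0)   # assumed test1 success rate
    p_benign = 2 * q                     # test2 success rate, also the true rate here
    power = 0.0
    cdf_pathogenic = 0.0                 # running P(X <= x) under test1 H0
    cdf_benign = 0.0                     # running P(X <= x - 1) under test2 H0
    for x in range(n + 1):
        cdf_pathogenic += binom_pmf(x, n, p_pathogenic)
        if cdf_pathogenic > alpha:
            break                        # test1 p-value only grows with x
        test2_p = 1.0 - cdf_benign       # right-tailed P(X >= x)
        weight = binom_pmf(x, n, p_benign)
        if test2_p > alpha:
            power += weight              # this count x leads to a likely-benign call
        cdf_benign += weight
    return power

for cohort_size in (120, 200):
    print(cohort_size, exact_power_ad(cohort_size, q=1e-5, Q=1e-3))
```

Under these assumptions the left-tailed test cannot reach significance with 120 patients even when the allele is never observed, whereas with 200 patients an absent allele is enough, which matches the direction of the example in the text.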
The simulation analysis of the test power.
As the patient cohort size increases, the test will have more power to distinguish the variants that are rare in the normal population under the AD model (A) and AR model (B). The test has more power to detect the benign variants for the disease with rarer disease prevalence than with relatively common prevalence, for variants with a population AF of 1×10⁻⁴ under the AD model (C), a population AF of 1×10⁻³ under the AR model (D), a population AF of 5×10⁻⁵ under the AD model (E), and a population AF of 5×10⁻⁴ under the AR model (F). The test has a high specificity and is robust. The test specificity is not significantly affected by the AF in the normal population under the AD model (G) and AR model (H), and is also not significantly affected by the disease prevalence, for variants with a population AF of 1×10⁻⁴ under the AD model (I), a population AF of 1×10⁻³ under the AR model (J), a population AF of 5×10⁻⁵ under the AD model (K), and a population AF of 5×10⁻⁴ under the AR model (L).
The power increases as the disease prevalence decreases
The test power is negatively correlated with the disease prevalence (Figure 1C–F, Supplementary table S1). Namely, the test has higher power for rarer diseases. For example, under the AD model, to detect benign variants with a population AF of 5×10⁻⁵, when the associated disease prevalence decreases from 1/200 to 1/1000, the patient cohort size that the test requires to achieve about 100% power decreases from 200 to 50. A similar trend was observed under the AR model (Supplementary table S1).
The test has a high specificity
We found that the test specificity is robust and remains close to 100% and is largely unaffected by the AF in the normal population and disease prevalence. As long as the observed frequency of the variant in the patient cohort is greater than or similar to the expected frequency based on the AF in the normal population and disease prevalence, the test will typically not consider the variant as a benign variant (Figure 1G–L, Supplementary table S1).
Sampling bias in patient cohort could affect the test power and specificity
When the test is applied to variants that are artificially enriched in the sampled patient cohort because of sampling bias, the test power to detect benign alleles decreases (Figure 2, Supplementary Table S2). For example, if the observed frequency of a variant is five-fold the true AF due to sampling bias, for an AR disease with a disease prevalence of 1/10000, the test power for detecting variants with a population AF of 5×10⁻⁴ decreases to 26% with a patient cohort size of 1000. Similarly, when the test is applied to variants that are artificially depleted in the sampled patient cohort due to sampling bias, the test specificity to detect pathogenic alleles decreases when the patient sample size is small but rapidly recovers as the sample size increases (Figure 2, Supplementary Table S2). For example, if the observed frequency of a variant is at 10% of the true AF, for an AD disease with a disease prevalence of 1/1000, the test specificity for detecting a variant with a population AF of 5×10⁻⁵ increases from 39% to 92% when the sample size increases from 50 to 250. A similar trend was observed under the AR model.
Figure 2.
The simulation analysis of sampling bias
Sampling bias can affect the test performance. The test power decreases for variants that are artificially enriched in the patient sampling, and is not significantly affected for variants that are artificially depleted in the sampling, for variants with a population AF of 1×10⁻⁴ under the AD model (A), a population AF of 1×10⁻³ under the AR model (B), a population AF of 5×10⁻⁵ under the AD model (C), and a population AF of 5×10⁻⁴ under the AR model (D). The test specificity is not significantly affected for variants that are artificially enriched in the patient sampling, and decreases for variants that are artificially depleted in the sampling, although it improves rapidly as the sample size increases, for variants with a population AF of 1×10⁻⁴ under the AD model (E), a population AF of 1×10⁻³ under the AR model (F), a population AF of 5×10⁻⁵ under the AD model (G), and a population AF of 5×10⁻⁴ under the AR model (H).
Misspecification of disease prevalence could affect test performance
As shown in Supplementary Figure S1, when the disease prevalence is under-estimated, the test power to detect benign alleles and the test specificity to detect pathogenic alleles are not significantly affected. On the other hand, when the disease prevalence is over-estimated, the power of the test to detect benign alleles decreases but rapidly improves as the sample size increases, while the specificity of the test to detect pathogenic alleles is not significantly affected (Supplementary materials, Supplementary Figure S1, Table S3).
The impact of allele penetrance on test performance
Based on simulation, if a disease is attributed to alleles with various penetrance values, the test power to detect benign alleles is not significantly affected, whereas the test specificity to detect pathogenic alleles decreases for alleles with low penetrance when the patient cohort is small but rapidly improves as the sample size increases (Supplementary materials, Supplementary Figure S2, Table S4). Additionally, we performed the simulation for the scenario of a disease attributed to multiple pathogenic alleles with heterogeneous penetrance. Under this scenario, the test power to detect benign alleles is not significantly affected, and the test specificity to detect pathogenic alleles with low penetrance decreases with small sample sizes but rapidly improves as the sample size increases (Supplementary materials, Supplementary Figure S3, Table S4). Overall, the test shows excellent power and specificity for alleles with penetrance ≥ 50%.
Estimation of the thresholds of population allele frequency for filtering variants
We performed simulations to determine the thresholds of population AF for filtering benign variants for diseases with a range of prevalences under the AR and AD models, respectively (Supplementary Materials). As shown in Supplementary Table S5, our test allows the filtering of alleles that are at least 10 times less frequent in the population than those filtered by thresholds based on disease prevalence alone.
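A rough feel for such thresholds can be obtained from a closed-form bound; this is a back-of-the-envelope sketch and not the simulation behind Supplementary Table S5. It assumes that the allele is never observed in the cohort (x = 0), that the test1 success rate is 2q/Q (AD) or q/√Q (AR), and that the prevalence-only cutoff is the AF at which that success rate reaches 1, namely Q/2 (AD) or √Q (AR). With x = 0, test1 is significant when (1 − p)^N ≤ α, that is, when p ≥ 1 − α^(1/N). The function name is hypothetical.

```python
from math import sqrt

def min_filterable_af(n, Q, model="AD", alpha=0.05):
    """Smallest population AF that test1 can flag when the allele is absent from n patients.

    Returns (q_min, fold), where fold is how many times lower q_min is than the
    assumed prevalence-only cutoff (Q/2 for AD, sqrt(Q) for AR).
    """
    if model == "AD":
        trials, prevalence_cutoff = n, Q / 2
    elif model == "AR":
        trials, prevalence_cutoff = 2 * n, sqrt(Q)
    else:
        raise ValueError("model must be 'AD' or 'AR'")
    # x = 0: left-tailed p-value is (1 - p)^trials, significant once p >= p_min
    p_min = 1.0 - alpha ** (1.0 / trials)
    return p_min * prevalence_cutoff, 1.0 / p_min

for cohort_size in (120, 200):
    q_min, fold = min_filterable_af(cohort_size, Q=1e-3, model="AD")
    print(cohort_size, q_min, fold)
```

Because test2 is never significant when x = 0, this bound depends only on test1; the fold improvement grows with the cohort size and does not depend on Q under these assumptions.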
Real case studies
To evaluate the performance of our model, we applied our test to variants in three genes, ABCA4, USH2A and LRP5, to represent three types of scenarios.
Screening of reported pathogenic variants in ABCA4 for Stargardt (STGD)
The disease prevalence of STGD is estimated as 1 in 10000 individuals[4]. It has been estimated that about 70% of STGD patients carry variants in ABCA4[5]. Therefore, this represents the scenario of a recessive disease with a relatively homogeneous genetic cause. We screened 945 reported pathogenic variants in the ABCA4 gene collected in HGMD. Among them, 11 variants are likely benign because their population AF is higher than 0.7%, the cutoff based on STGD disease prevalence, and were therefore excluded from further analysis. The remaining 934 variants were subjected to our test model. As a result, 26 variants with AF in the range of 0.03% to 0.46% were identified as likely benign (Binomial test1, Bonferroni correction p-value ≤ 0.05/934 and test2 Bonferroni correction p-value > 0.05/934) (Figure 3A).
Figure 3.
The distribution of population AF of the HGMD variants in the three genes. (A) The Non-Finnish European (NFE) AF of ABCA4 variants in HGMD, (B) The NFE AF of USH2A variants in HGMD, and (C) The East-Asian AF of LRP5 variants in HGMD. The orange indicates the HGMD variants identified as likely benign by our test. The blue indicates the other HGMD variants.
To examine the test result, we further reviewed the original literature and examined the gnomAD database for the 26 variants. Based on published literature, 8 of the 26 variants should be annotated as benign, as they are reported as non-pathogenic polymorphisms in one or more papers (e.g. NM_000350.2:c.455G>A[6,7], NM_000350.2:c.2701A>G[6]), behave similarly to wildtype based on functional assays (e.g. NM_000350.2:c.5693G>A[8]), or do not segregate with the disease (e.g. NM_000350.2:c.4685T>C[7]) in the original report. Consistently, homozygous individuals were observed for five of the eight variants in gnomAD. Additionally, 13 of the 26 variants should be annotated as variants of uncertain significance (VUS) based on data presented in the original papers, due to the lack of definite association of the variant with the disease[4,6,9-12]. Interestingly, 6 of these 13 variants have homozygous individuals observed in gnomAD (Supplementary table S6, S7). Finally, 5 of the 26 variants were noted as likely pathogenic in the original reports, either because the patients carry a second variant in ABCA4 or because of functional evidence[7,8,13-19]. However, four of the five variants have homozygotes identified in gnomAD, with the number ranging from 1 to 68 (Supplementary table S6, S7). Therefore, almost all of the 26 variants predicted to be benign by our test are indeed unlikely to be pathogenic.
Screening of reported pathogenic variants in USH2A for Retinitis pigmentosa (RP)
The disease prevalence of RP was estimated as 1 in 4000 (https://ghr.nlm.nih.gov/condition/retinitis-pigmentosa#statistics), and RP has been linked to more than 60 disease genes. About 10%–15% of RP cases were attributed to variants in USH2A[20,21]. Therefore, this represents the scenario where variants in a gene lead to a recessive disease with a heterogeneous genetic basis. We evaluated 748 USH2A variants collected in HGMD. Among them, 13 variants with AF greater than 0.52% in gnomAD were considered benign and excluded from further analysis (Supplementary table S8 and S9). Of the remaining 735 variants, 18 were identified as likely benign (Binomial test1, Bonferroni correction p-value ≤ 0.05/735 and test2 Bonferroni correction p-value > 0.05/735), with AF ranging from 0.04% to 0.5% (Figure 3B). Based on data in the original reports and the number of individuals carrying homozygous variants observed in gnomAD, at least 10 of the 18 variants are likely to be benign. Specifically, three variants were suggested to be benign by the original reference (e.g. NM_206933.2:c.15433G>A[22], NM_206933.2:c.6587G>C[23,24]). All of them have homozygotes in gnomAD, with the number ranging from 4 to 120. Eight variants were annotated as VUS based on the original papers (e.g. NM_206933.2:c.10510C>G[16]), and three of them have homozygotes in gnomAD, with the number ranging from one to nine (Supplementary table S8 and S9). Finally, of the seven variants that were suggested to be likely pathogenic by the original papers (e.g. NM_206933.2:c.11815G>A[25]), four have homozygotes in gnomAD, with the number ranging from 1 to 95 (Supplementary table S8 and S9).
Identification of benign variants in LRP5 gene for Familial Exudative Vitreoretinopathy (FEVR)
FEVR is a rare retinal vascular disorder that is primarily dominantly inherited[26]. Variants in LRP5 account for about 20%–25% of FEVR patients. Therefore, this represents the scenario where variants in a gene lead to a dominant disease with a heterogeneous genetic basis. We analyzed 150 putative pathogenic variants in LRP5. After excluding seven variants based on the disease prevalence of FEVR (AF greater than 0.05%), 16 likely benign variants were identified (Binomial test1, Bonferroni correction p-value ≤ 0.05/143 and test2 Bonferroni correction p-value > 0.05/143). The AF of the 16 variants is in the range of 0.0058%–0.029% (Figure 3C). Based on the data presented in the original reports, nine variants were found in patients with diseases unrelated to the eye, including six variants related to bone diseases and three variants related to colorectal cancer, indicating that these variants are not pathogenic for FEVR. Among the seven variants that have been linked to eye diseases, two were identified in patients with a second LRP5 variant in compound heterozygous form, implying that each variant alone might be insufficient to cause disease. For the remaining five variants, one was considered benign, one VUS, and three likely pathogenic based on the original literature (Supplementary table S10 and S11). To further evaluate these five variants, functional assays were conducted as described below.
Comparison to in silico variant prediction scores and ClinVar annotation
We have compared our results to three commonly used in silico variant prediction tools, including CADD, REVEL and phastcon_100way. The benign variants identified by our method have lower CADD, REVEL and phastcon_100way scores than the other reported variants in the same gene for all the three genes, supporting our test result (Figure 4A–C and Supplementary table S6, S8, S10, S12, Supplementary Results).
Figure 4.
The distribution of other variant prediction scores and ClinVar assignment for the HGMD variants in ABCA4, LRP5, and USH2A. (A) The distribution of CADD phred scores, (B) The distribution of REVEL scores, (C) The distribution of phastcon scores. The red indicates the likely benign variants identified by our test. The turquoise indicates the other HGMD variants. The blue dot indicates the mean of the distribution.
(D) The ClinVar assignment. The bar plot shows the distribution of the variants among the ClinVar assignment categories. The ClinVar categories labeled 1–13 are enumerated in the bottom right panel. The orange indicates the likely benign variants identified by our test. The blue indicates the other HGMD variants.
To further assess our test results, we cross-referenced them with ClinVar annotation. As shown in Figure 4D, the ClinVar assessment largely supports our test results. Specifically, of the 91 benign variants identified by our test, 74 have records in ClinVar, and the vast majority of these are classified as benign (32 variants, 43.2%), conflicting interpretations of pathogenicity (24 variants, 32.4%), or uncertain_significance (15 variants, 20.3%) (Figure 4D, and Supplementary table S6, S8, and S10).
Functional assay of predicted benign variants in LRP5
To further evaluate our results, we performed a functional assay on five LRP5 variants that were originally reported as pathogenic but are identified as likely benign by our test. As shown in Figure 5, three of the variants, LRP5.R1219H (NM_002335.3:c.3656G>A, two-tailed T test, p-value < 0.005), LRP5.R1342Q (NM_002335.3:c.4025G>A, T test, p-value > 0.05), and LRP5.H1383P (NM_002335.3:c.4148A>C, T test, p-value < 0.002), have similar or higher WNT signaling activity than the wild type control without or with WNT3A treatment, suggesting that these three variants are indeed likely benign. In contrast, LRP5.A422V (NM_002335.3:c.1265C>T) shows similar signaling activity to LRP5.WT without WNT3A treatment (T test, p-value = 0.2733), but its activity is reduced by about 50% with WNT3A treatment (T test, p-value = 4.95e-5) (Figure 5), suggesting that LRP5.A422V is likely to be a hypomorphic allele. One variant, LRP5.T1506M (NM_002335.3:c.4517C>T), shows lower signaling activity than LRP5.WT without (T test, p-value = 1.01e-6) or with WNT3A treatment (T test, p-value = 1.31e-7), suggesting it is likely to be a pathogenic allele (Figure 5, Supplementary Table S13, Supplementary Results).
Figure 5.
The luciferase reporter assay of LRP5 variants. We tested five variants that were identified as likely benign by our test along with a positive control and a negative control. The luciferase reporter assay of the positive control variant, p.Q368X, showed reduced WNT signaling activity, while that of the negative control variant, p.D666N, showed higher or similar WNT signaling activity compared with the wildtype allele. The variants p.R1342Q, p.H1383P, and p.R1219H showed higher or similar WNT activity compared with the wildtype allele, suggesting they are likely benign, consistent with the test result. However, p.A422V showed similar WNT signaling to the wildtype allele without WNT3A treatment but reduced WNT signaling with WNT3A treatment, suggesting it might be a hypomorphic allele. p.T1506M showed reduced WNT signaling without or with WNT3A treatment, suggesting it might be a pathogenic allele. The Y-axis shows the WNT signaling activity of the variants normalized by that of the wildtype allele without WNT3A treatment. NT: no treatment. WNT3A: with treatment.
Discussion
To identify the variants likely to be benign in the context of Mendelian diseases, we designed a novel statistical test by integrating disease prevalence, AF in the patient cohort, and AF in the normal population into a robust statistical framework. Evaluation of our model with both simulated and real data, followed by experimental validation, shows that it is well powered to detect benign variants and is especially effective for variants with low AF in the normal population.
Simulation analysis also suggests the test is more powerful when using a patient cohort with a larger sample size, for dominant diseases than for recessive diseases, and for rarer diseases than for relatively common diseases. Furthermore, simulation indicates the test has a high specificity, which is largely unaffected by the disease prevalence and AF in the normal population. Moreover, simulations show that the test performance is not significantly affected by the estimation of disease prevalence and allele penetrance, but could be affected by sampling bias. Additionally, when the prevalence of a disease approaches zero, several confounding factors likely co-occur: the variance of the disease prevalence estimate likely increases, the available sample size will be smaller, and within these constraints, the penetrance of an allele possibly varies with its frequency. Under those situations, the test power to detect extremely rare benign variants could be compromised, and the test should be used with caution.
For real datasets, we applied our test to screen the variants in HGMD using real patient cohort data and AF in gnomAD, and identified the variants that are likely benign for three types of Mendelian diseases. The identified variants have population AF in the range of 0.0058% to higher than 1%, consistent with the simulation analysis showing the high power of our test for variants with low AF in the population.
Moreover, our test results are supported by multiple independent lines of evidence: 1) The original literature suggests that a large proportion of the predicted benign variants can indeed be classified as benign or VUS. 2) For recessive diseases, a large proportion of the predicted benign variants are found in the homozygous state in multiple individuals in gnomAD, suggesting they are unlikely to cause disease in the bi-allelic state. 3) Other in silico variant prediction scores (i.e. CADD, REVEL, and phastcon score) showed that the putatively benign variants are less deleterious than the rest of the HGMD variants in the same genes. 4) The majority of the putative benign variants are classified as benign, conflicting interpretations of pathogenicity, or uncertain significance by ClinVar.
Predictions from our model are further validated by a functional assay. The luciferase functional assay was performed on a subset of HGMD variants predicted to be benign in LRP5 (with population AF in the range of 0.0058% to 0.029%). The assay shows that three of the five tested variants are likely benign. Another variant, LRP5.A422V, is likely a hypomorphic variant. This is also consistent with the original reference, in which LRP5.A422V was only seen in cis with LRP5.R348W in the patient family[27]; therefore LRP5.A422V alone might not be severe enough to cause the phenotype. Interestingly, LRP5.R348W is predicted to be likely pathogenic by our test. Additionally, one tested variant, LRP5.T1506M, is likely pathogenic. The contradictory result for LRP5.T1506M might be due to sampling bias of this allele in our patient cohort. Specifically, under-sampling of this variant in the patient cohort could lead to a false positive interpretation.
Indeed, our test suggested this allele is slightly enriched in the patient cohort compared to its AF in the normal population with a p-value of 0.01, though it did not pass the multiple testing correction (p-value ≤ 0.05/150) to be assigned as a pathogenic allele.
One of the main limitations is that the test depends on the availability of patient sequencing data and the patient cohort size. With the dramatic reduction of sequencing cost, it has become common practice to perform panel, exome or even genome sequencing as part of the diagnosis for patients with rare genetic diseases. Hence, we expect that this bottleneck will soon be overcome. Additionally, although our model is for filtering the variants of a single gene, if the disease is caused by multiple genes, one can simply adjust the Q value to the frequency of the disease due to mutations in the gene of interest and apply our test directly (e.g. the aforementioned cases of USH2A and LRP5). Moreover, it is straightforward to aggregate multiple genes from the same disease and run the test directly with the Q value modified accordingly. We envision that the performance of our test can be improved by increasing the patient cohort size and combining the test with other variant prediction scores, e.g. REVEL and CADD scores, which are based on other types of information besides allele frequency. In summary, as a general test that is mainly driven by the size of the dataset and can be applied to any type of variant in the context of all Mendelian diseases, we believe our work provides a general framework for filtering benign variants with very low population AF, and its performance will continue to improve as more patient sequences become available.