
Comprehensive annotation of BRCA1 and BRCA2 missense variants by functionally validated sequence-based computational prediction models.

Steven N Hart, Tanya Hoskin, Hermela Shimelis, Raymond M Moore, Bingjian Feng, Abigail Thomas, Noralane M Lindor, Eric C Polley, David E Goldgar, Edwin Iversen, Alvaro N A Monteiro, Vera J Suman, Fergus J Couch.

Abstract

PURPOSE: To improve methods for predicting the impact of missense variants of uncertain significance (VUS) in BRCA1 and BRCA2 on protein function.
METHODS: Functional data for 248 BRCA1 and 207 BRCA2 variants from assays with established high sensitivity and specificity for damaging variants were used to recalibrate 40 in silico algorithms predicting the impact of variants on protein activity. Additional random forest (RF) and naïve voting method (NVM) metapredictors for both BRCA1 and BRCA2 were developed to increase predictive accuracy.
RESULTS: Optimized thresholds for in silico prediction models significantly improved the accuracy of predicted functional effects for BRCA1 and BRCA2 variants. In addition, new BRCA1-RF and BRCA2-RF metapredictors showed area under the curve (AUC) values of 0.92 (95% confidence interval [CI]: 0.88-0.96) and 0.90 (95% CI: 0.84-0.95), respectively. Similarly, the BRCA1-NVM and BRCA2-NVM models had AUCs of 0.93 and 0.90. The RF and NVM models were used to predict the pathogenicity of all possible missense variants in BRCA1 and BRCA2.
CONCLUSION: The recalibrated algorithms and new metapredictors significantly improved upon current models for predicting the impact of variants in cancer risk-associated domains of BRCA1 and BRCA2. Prediction of the functional impact of all possible variants in BRCA1 and BRCA2 provides important information about the clinical relevance of variants in these genes.

Keywords:  BRCA1 and BRCA2; Functional evaluation; In silico prediction; Metapredictor; VUS

Year: 2018; PMID: 29884841; PMCID: PMC6287763; DOI: 10.1038/s41436-018-0018-4

Journal: Genet Med; ISSN: 1098-3600; Impact factor: 8.822


INTRODUCTION

Pathogenic variants in BRCA1 and BRCA2 account for 20–25% of hereditary breast and ovarian cancer[1], 5–10% of breast cancers[2], and up to 15% of ovarian cancers[3]. While most known pathogenic variants in these genes truncate the encoded proteins, missense variants can also predispose to cancer. More than 90% of the missense variants identified by clinical genetic testing and listed in public databases[4] are classified as variants of uncertain significance (VUS)[5]. Assigning missense variants a definitive pathogenic or neutral status can inform clinical management, prevention, and treatment, so accurate methods for establishing variant pathogenicity are needed. Family-based studies that yield likelihoods of pathogenicity, based on segregation of variants with cancer and on personal and family history of cancer, are established methods for determining the pathogenicity of variants in BRCA1 and BRCA2. However, few missense variants have been clinically annotated by this method owing to the limited availability of family-based data. Similarly, functional assays[6,7] with established sensitivity and specificity for known pathogenic and neutral BRCA1 or BRCA2 variants have been used alone or in combination with family-based segregation data to infer pathogenicity[8]. However, classifying all possible variants with functional assays is unlikely to be feasible. Alternatively, the clinical relevance of variants can be assessed using sequence-based in silico prediction models, which can be applied to all possible missense VUS in these genes. Given the large number of unique VUS identified in BRCA1 and BRCA2, in silico predictions will need to be incorporated into models that aim to predict the pathogenicity of VUS in these genes. Most commonly used prediction tools, such as SIFT[9], PolyPhen[10], GERP[11], Align-GVGD[12], and CADD[13], have been developed using large-scale databases such as the Human Gene Mutation Database (HGMD)[14] or ClinVar[4].
Although functional assays can outperform these computational predictions of damage[6,15], developing or calibrating in silico prediction models with well-characterized data from validated functional assays is expected to improve variant annotation. In this study, homology-directed repair (HDR)[6] functional data for 207 BRCA2 variants and transcriptional integrity[16,17] data for 248 BRCA1 variants were used to evaluate the performance of existing in silico algorithms. The sensitivity and specificity of the algorithms were optimized by defining more accurate thresholds and by developing new high-performance random forest (RF) and naïve voting method (NVM) metapredictors. We show that optimization for one gene leads to poor performance when applied to the other, highlighting the importance of gene-specific features for prediction accuracy.

MATERIALS AND METHODS

BRCA1 transcription integrity assay

Results from functional studies of variants in the BRCT domains of BRCA1 using a transcription integrity assay have been reported previously[16,17]. The sensitivity and specificity of this assay for missense variants in the BRCT domains of BRCA1 have both been estimated at 100% (sensitivity 95% CI: 75%–100%; specificity 95% CI: 83%–100%)[16]. Thresholds of 95% probability of pathogenicity and 95% probability of neutrality from the VarCall two-component mixture model for classification of BRCA1 missense variants[8] were used to define 61 pathogenic, 21 indeterminate (partial effect on function), and 166 neutral variants (248 in total). These classifications served as the reference standard for BRCA1 activity.
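The classification rule above can be illustrated as a small decision function. This is a minimal sketch assuming the VarCall posterior probabilities have already been computed (the mixture model itself is not reimplemented); the function name `label_from_posteriors` is hypothetical, and counting a posterior of exactly 0.95 as decisive is an assumption, because the text does not say how boundary values were handled.

```python
def label_from_posteriors(p_pathogenic: float, p_neutral: float, cutoff: float = 0.95) -> str:
    """Hypothetical helper: map precomputed VarCall-style posteriors to a functional class.

    Assumes posteriors come from an already-fitted mixture model; only the
    0.95 decision rule is illustrated here.
    """
    if not (0.0 <= p_pathogenic <= 1.0 and 0.0 <= p_neutral <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    if p_pathogenic >= cutoff:
        return "pathogenic"
    if p_neutral >= cutoff:
        return "neutral"
    # Neither posterior is decisive: treated as a partial effect on function.
    return "indeterminate"


if __name__ == "__main__":
    # Illustrative posteriors, not data from the study.
    examples = {"v1": (0.99, 0.01), "v2": (0.02, 0.98), "v3": (0.60, 0.40)}
    for name, (pp, pn) in examples.items():
        print(name, label_from_posteriors(pp, pn))
```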

BRCA2 HDR assay

A cell-based homology-directed DNA repair (HDR) assay was used to assess the influence of missense variants in the DNA binding domain of BRCA2 on protein activity[6]. In brief, BRCA2 activity in Brca2-deficient V-C8 cells expressing mutant forms of full-length BRCA2 was measured with a DR-GFP reporter plasmid after induction of a DNA double-strand break by the I-SceI endonuclease. The V-C8 hamster lung fibroblast cell line was a gift from Dr. Margaret Zdzienicka. Cells were verified by genotyping in the Mayo Clinic Medical Research Facility and routinely tested for mycoplasma contamination. The sensitivity and specificity of this assay for damaging missense variants in the DNA binding domain (DBD) of BRCA2 have previously been estimated at 100% (sensitivity 95% CI: 79%–100%; specificity 95% CI: 93%–100%) using 21 known neutral and 13 known pathogenic variants[6,18-20]. Results for 68 newly tested variants were combined with those for 139 previously characterized variants, for a total of 207.

Damaging missense prediction tools

dbNSFP version 3.0a[21] was downloaded and converted into a BioR catalogue[22] to annotate variants. Align-GVGD[12] was accessed online. CAROL and CONDEL scores were gathered from Variant Effect Predictor (VEP)[23].

Optimized thresholds

Analyses included damaging, indeterminate, and neutral variants. In the primary analysis, indeterminate variants were grouped with neutral variants (Scenario 1); an alternative approach, in which indeterminate variants were grouped with damaging variants (Scenario 2), is provided in the Supplemental Materials. Optimal thresholds for individual predictive algorithms that maximized sensitivity and specificity for damaging variants were derived separately from the results of the BRCA1 transcriptional integrity assay and the BRCA2 HDR assay (Figures S1 and S2). Matthews correlation coefficients (MCC) were calculated for each resulting binary classification relative to the functional assay standards[24]. Areas under the receiver operating characteristic (ROC) curve (AUCs) were estimated and reported with 95% confidence intervals using the DeLong method. ROC analyses were performed using the optimalCutpoints package[25] for R (v3.3.3; http://www.R-project.org).
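A minimal Python sketch of threshold optimization and AUC estimation for a single prediction tool follows. The text does not name the optimality criterion used in optimalCutpoints, so this sketch assumes Youden's J (sensitivity + specificity - 1) purely for illustration. It also assumes that higher scores indicate more damaging variants (tools where lower scores are more damaging, such as SIFT, would need their scores negated first) and that both classes are present. It gives only a point estimate of AUC and does not reproduce the DeLong confidence intervals. All function names are hypothetical.

```python
from typing import Sequence, Tuple


def sens_spec(scores: Sequence[float], labels: Sequence[int], t: float) -> Tuple[float, float]:
    """Sensitivity and specificity when 'score >= t' is called damaging (label 1).

    Assumes at least one damaging (1) and one non-damaging (0) label.
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float, float]:
    """Try every observed score as a cut point; keep the one maximizing Youden's J.

    On ties, the lowest cut point reaching the maximum is kept.
    Returns (threshold, sensitivity, specificity).
    """
    best = None
    for t in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, t)
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    _, t, sens, spec = best
    return t, sens, spec


def empirical_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney estimate of AUC: P(damaging score > non-damaging score), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `youden_threshold([0.1, 0.2, 0.3, 0.8, 0.9], [0, 0, 0, 1, 1])` returns `(0.8, 1.0, 1.0)`. Real tool scores would rarely separate the classes this cleanly.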

Naïve Voting Method (NVM) models.

For each gene, a training set (a random sample of ~50% of the variants) and a test set (the remaining ~50%) were constructed using the sample function in R. The training set was used to determine the optimal number of individual prediction algorithms in the NVM model, based on the maximal MCC. Starting with the individual prediction algorithm with the highest MCC, the algorithm with the highest individual MCC among those not yet chosen was added iteratively until the optimal number of prediction algorithms was included. If both a raw score (Score) and a rank score (RankScore) were available for an algorithm, only the RankScore was used. The NVM models and thresholds developed in the training sets were validated in the test sets. The MCC and other performance statistics were also recalculated across the entire data sets (training and test combined) for consistency with the reporting of the other models. Lollipop plots were generated with lollipops (v1.2, http://dx.doi.org/10.5281/zenodo.46184).
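The greedy NVM procedure can be sketched as follows, assuming each algorithm's calls have already been dichotomized at its optimized threshold. The text reports the selected vote thresholds (for example, ≥9 of 13 for BRCA1) but not how they were chosen, so searching every vote threshold jointly with the ensemble size by MCC is an assumption of this sketch, as are all function names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mcc_of(pred: Sequence[int], truth: Sequence[int]) -> float:
    """MCC of binary predictions (1 = damaging) against binary truth."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return mcc(tp, tn, fp, fn)


def fit_nvm(calls: Dict[str, List[int]], truth: List[int]) -> Tuple[List[str], int, float]:
    """Hypothetical greedy naive-voting fit on a training set.

    calls[name][i] is 1 if algorithm `name` calls variant i damaging at its
    optimized threshold. Algorithms are added in order of individual MCC; for
    each ensemble size n, every vote threshold v in 1..n is tried, and the
    (ensemble, v) pair with the highest MCC is returned with that MCC.
    """
    ranked = sorted(calls, key=lambda name: mcc_of(calls[name], truth), reverse=True)
    best: Tuple[List[str], int, float] = ([], 0, -math.inf)
    for n in range(1, len(ranked) + 1):
        subset = ranked[:n]
        votes = [sum(calls[name][i] for name in subset) for i in range(len(truth))]
        for v in range(1, n + 1):
            score = mcc_of([int(x >= v) for x in votes], truth)
            if score > best[2]:
                best = (subset, v, score)
    return best


def predict_nvm(row_calls: Dict[str, int], subset: List[str], v: int) -> int:
    """Call one variant damaging (1) if at least v of the selected algorithms agree."""
    return int(sum(row_calls[name] for name in subset) >= v)
```

Setting `v = n` reproduces a unanimous-agreement rule, so comparing the MCC at `v = n` with the best MCC found by the search indicates how much performance a unanimity requirement gives up on a given data set.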

Random Forest models.

Random forest (RF) modelling used scores from each of the optimized individual prediction algorithms to identify the subset of prediction algorithms that maximized the accuracy of predicting damaging and non-damaging (indeterminate and neutral) variants in BRCA1 and BRCA2. The randomForest R package[26] was used with n = 500 trees and the number of predictor variables sampled as candidates at each split set to the recommended default of sqrt(p), where p is the number of predictor variables included in the model. For individual prediction algorithms available as both a Score and a RankScore, only the RankScore was included in the random forest models. Variable importance was assessed using the mean decrease in accuracy resulting from exclusion of a given prediction model from the RF classifiers. Out-of-sample predictions on the probability scale were derived for each model and used to estimate AUC, sensitivity, specificity, and MCC at optimized cut points for prediction of functional status.
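In the randomForest package, mean decrease in accuracy is computed by permuting each predictor within each tree's out-of-bag samples, which approximates the effect of removing that prediction model. The sketch below illustrates the permutation idea in plain Python on a single evaluation set, with any fitted classifier exposed as a `predict` callable. It is a simplified stand-in rather than the randomForest implementation, and the names are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def accuracy(predict: Callable[[Sequence[float]], int], X: List[List[float]], y: Sequence[int]) -> float:
    """Fraction of rows whose predicted class matches the label."""
    return sum(int(predict(row) == label) for row, label in zip(X, y)) / len(y)


def permutation_importance(
    predict: Callable[[Sequence[float]], int],
    X: List[List[float]],
    y: Sequence[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when one feature column (one prediction tool) is shuffled.

    A larger drop means the classifier relies more on that tool's score.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only column j replaced by the shuffled values.
            X_perm = [row[:j] + [column[i]] + row[j + 1:] for i, row in enumerate(X)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A classifier that ignores a feature gets an importance of exactly zero for it, which makes this a convenient sanity check when ranking tools such as VEST3 RankScore or Align-GVGD.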

Comparison to ClinVar

BRCA1 and BRCA2 classifications from ClinVar that had been reviewed by an expert panel and had no conflicting interpretations were used. Variants classified in ClinVar as pathogenic or likely pathogenic were grouped into the pathogenic (damaging) category, and variants classified as benign or likely benign were grouped into the neutral category.

Code availability

All code and data required to replicate all analyses are available on GitHub (https://github.com/Steven-N-Hart/NVM).

RESULTS

Functional characterization of 68 novel BRCA2 missense variants

In this study, 68 variants in the BRCA2 DBD were evaluated using the HDR assay. Of these, 17 showed an HDR fold change <1.66, with probabilities of pathogenicity >0.99 (Table 1, Figure 1, Table S1), and 48 showed an HDR fold change >2.41, with probabilities of neutrality >0.99. The remaining three variants (p.I2672T, p.D2733V, and p.P3150L) displayed partial activity (HDR fold change between 1.66 and 2.41) and were annotated as indeterminate (Figure 1, Table S1). When combined with previously classified variants[6,27], 69 variants were predicted deleterious (damaging), 21 intermediate/partial (indeterminate), and 117 benign/neutral (neutral) (Table 1, Table S1).
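The three-way HDR classification used above reduces to two fold-change cut points. The following is a minimal sketch, assuming the model-based fold change is already available for each variant. The constant and function names are hypothetical, and assigning values exactly at 1.66 or 2.41 to the indeterminate class is an assumption, since the text gives only strict inequalities.

```python
# Hypothetical constants taken from the cut points quoted in the text.
DAMAGING_MAX_FC = 1.66  # below this: >99% probability of pathogenicity
NEUTRAL_MIN_FC = 2.41   # above this: >99% probability of neutrality


def classify_hdr(fold_change: float) -> str:
    """Assign an HDR functional class from the model-based fold change in GFP+ cells."""
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    if fold_change < DAMAGING_MAX_FC:
        return "damaging"
    if fold_change > NEUTRAL_MIN_FC:
        return "neutral"
    # Partial activity; the two boundary values are also placed here (assumption).
    return "indeterminate"


if __name__ == "__main__":
    # 0.68 and 1.63 are the fold changes of G2748D and N2622S in Table 1;
    # 2.0 and 3.0 are illustrative values only.
    for fc in (0.68, 1.63, 2.0, 3.0):
        print(fc, classify_hdr(fc))
```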
Table 1.

Predicted pathogenic missense variants defined by the BRCA2 HDR assay

Variant | cDNA | AGVGD class | IARC class | FC* | SE# | p(pathogenicity) | p(neutrality) | Origin
G2748D | c.8243G>A | C65 | Class 5 | 0.68 | ±0.07 | 1.00 | 2.96E-12 | Lindor et al., 2012
L2686P | c.8057T>C | C45 | | 0.72 | ±0.04 | 1.00 | 9.50E-12 | Guidugli et al., 2017
L2653P | c.7958T>C | C65 | Class 5 | 0.75 | ±0.08 | 1.00 | 3.40E-11 | Lindor et al., 2012
E2663K | c.7987G>A | C55 | | 0.83 | ±0.01 | 1.00 | 4.00E-10 | Current study
R3052W | c.9154C>T | C65 | Class 5 | 0.86 | ±0.09 | 1.00 | 9.33E-10 | Lindor et al., 2012
L2721H | c.8162T>A | C25 | | 0.88 | ±0.10 | 1.00 | 1.70E-09 | Guidugli et al., 2017
Y2624D | c.7870T>G | C65 | | 0.88 | ±0.06 | 1.00 | 1.75E-09 | Current study
Y2624H | c.7870T>C | C65 | | 0.89 | ±0.31 | 1.00 | 2.40E-09 | Current study
L3125R | c.9374T>G | C65 | | 0.91 | ±0.00 | 1.00 | 3.41E-09 | Current study
R2784W | c.8350C>T | C65 | | 0.91 | ±0.10 | 1.00 | 3.88E-09 | Guidugli et al., 2017
A2603P | c.7807G>C | C25 | | 0.92 | ±0.60 | 1.00 | 4.78E-09 | Guidugli et al., 2017
N3124I | c.9371A>T | C65 | Class 4 | 0.92 | ±0.10 | 1.00 | 4.65E-09 | Guidugli et al., 2013
L2647P | c.7940T>C | C65 | Class 4 | 0.93 | ±0.10 | 1.00 | 5.79E-09 | Lindor et al., 2012
S2670L | c.8009C>T | C15 | | 0.93 | ±0.07 | 1.00 | 6.20E-09 | Guidugli et al., 2017
G3076E | c.9227G>A | C65 | | 0.95 | ±0.10 | 1.00 | 1.02E-08 | Guidugli et al., 2017
Y2624N | c.7870T>A | C65 | | 0.95 | ±0.00 | 1.00 | 1.20E-08 | Current study
L3125H | c.9374T>A | C65 | | 0.96 | ±0.07 | 1.00 | 1.36E-08 | Guidugli et al., 2017
L2510P | c.7529T>C | C65 | | 0.98 | ±0.11 | 1.00 | 2.61E-08 | Guidugli et al., 2017
K2630Q | c.7888A>C | C45 | | 0.98 | ±0.08 | 1.00 | 2.49E-08 | Guidugli et al., 2017
R2824G | c.8470A>G | C65 | | 1.00 | ±0.20 | 1.00 | 3.72E-08 | Current study
H2623R | c.7868A>G | C25 | | 1.00 | ±0.04 | 1.00 | 3.59E-08 | Guidugli et al., 2017
D2723H | c.8167G>C | C65 | Class 5 | 1.00 | ±0.01 | 1.00 | 3.81E-08 | Lindor et al., 2012
N2781I | c.8342A>T | C65 | | 1.00 | ±0.10 | 1.00 | 4.08E-08 | Current study
I2627F | c.7879A>T | C15 | Class 5 | 1.01 | ±0.08 | 1.00 | 4.83E-08 | Lindor et al., 2012
A2730P | c.8188G>C | C0 | | 1.01 | ±0.01 | 1.00 | 5.46E-08 | Current study
L2688P | c.8063T>C | C65 | Class 4 | 1.02 | ±0.08 | 1.00 | 5.80E-08 | Guidugli et al., 2013
G3076R | c.9227G>T | C65 | | 1.03 | ±0.08 | 1.00 | 8.66E-08 | Guidugli et al., 2017
G3076V | c.9226G>C | C65 | | 1.03 | ±0.11 | 1.00 | 8.08E-08 | Guidugli et al., 2017
D2723V | c.8168A>T | C65 | | 1.04 | ±0.08 | 1.00 | 1.01E-07 | Guidugli et al., 2017
W2788R | c.8362T>C | C25 | | 1.05 | ±0.08 | 1.00 | 1.37E-07 | Guidugli et al., 2017
D2723A | c.8168A>C | C65 | | 1.06 | ±0.11 | 1.00 | 1.49E-07 | Guidugli et al., 2017
W2788S | c.8363G>C | C35 | | 1.06 | ±0.08 | 1.00 | 1.44E-07 | Guidugli et al., 2017
N3124K | c.9372C>A | C65 | | 1.07 | ±0.06 | 1.00 | 1.89E-07 | Current study
G2609V | c.7826G>T | C65 | | 1.07 | ±0.08 | 1.00 | 2.24E-07 | Guidugli et al., 2017
F2642S | c.7925T>C | C45 | | 1.07 | ±0.08 | 1.00 | 2.02E-07 | Guidugli et al., 2017
D3095E | c.9285C>G | C35 | Class 4 | 1.07 | ±0.12 | 1.00 | 1.89E-07 | Guidugli et al., 2013
T2722R | c.8165C>G | C65 | Class 5 | 1.08 | ±0.08 | 1.00 | 2.78E-07 | Lindor et al., 2012
G2508R | c.7522G>C | C65 | | 1.09 | ±0.15 | 1.00 | 2.92E-07 | Current study
S2691F | c.8072C>T | C0 | | 1.10 | ±0.06 | 1.00 | 4.09E-07 | Guidugli et al., 2017
E3002K | c.9004G>A | C55 | | 1.10 | ±0.08 | 1.00 | 4.35E-07 | Guidugli et al., 2017
D2723G | c.8168A>G | C65 | Class 5 | 1.11 | ±0.12 | 1.00 | 5.19E-07 | Lindor et al., 2012
W2626R | c.7876T>C | C65 | | 1.12 | ±0.01 | 1.00 | 6.04E-07 | Current study
G2596E | c.7787G>A | C65 | | 1.12 | ±0.09 | 1.00 | 5.77E-07 | Guidugli et al., 2017
Q2561P | c.7682A>C | C15 | | 1.13 | ±0.06 | 1.00 | 7.81E-07 | Guidugli et al., 2017
V2687F | c.8059G>T | C0 | | 1.14 | ±0.02 | 1.00 | 8.96E-07 | Current study
H2623Y | c.7867C>T | C65 | | 1.15 | ±0.01 | 1.00 | 1.32E-06 | Current study
W2626C | c.7878G>C | C65 | Class 5 | 1.16 | ±0.13 | 1.00 | 1.56E-06 | Lindor et al., 2012
G2793R | c.8377G>A | C65 | | 1.18 | ±0.09 | 1.00 | 2.23E-06 | Guidugli et al., 2017
A3028P | c.9082G>C | C0 | | 1.18 | ±0.04 | 1.00 | 2.48E-06 | Current study
L2792P | c.8375T>C | C65 | | 1.19 | ±0.09 | 1.00 | 3.03E-06 | Guidugli et al., 2017
G2793E | c.8378G>A | C65 | | 1.19 | ±0.07 | 1.00 | 2.71E-06 | Guidugli et al., 2017
A2786P | c.8356G>C | C0 | | 1.23 | ±0.09 | 1.00 | 5.97E-06 | Guidugli et al., 2017
R2784Q | c.8351G>A | C35 | | 1.27 | ±0.14 | 1.00 | 1.46E-05 | Guidugli et al., 2017
G2596R | c.7786G>C | C65 | | 1.28 | ±0.10 | 1.00 | 1.73E-05 | Guidugli et al., 2017
G2585R | c.7753G>A | C65 | | 1.30 | ±0.10 | 1.00 | 2.47E-05 | Guidugli et al., 2017
G3003E | c.9008G>A | C65 | | 1.33 | ±0.08 | 1.00 | 4.59E-05 | Guidugli et al., 2017
G2609D | c.7826G>A | C65 | Class 4 | 1.35 | ±0.07 | 1.00 | 6.19E-05 | Guidugli et al., 2013
W2725L | c.8174G>T | C55 | | 1.35 | ±0.10 | 1.00 | 6.73E-05 | Guidugli et al., 2017
K2498E | c.7492A>G | C55 | | 1.36 | ±0.10 | 1.00 | 7.98E-05 | Guidugli et al., 2017
Q2655R | c.7964A>G | C35 | | 1.38 | ±0.09 | 1.00 | 1.13E-04 | Guidugli et al., 2017
Y2726C | c.8177A>G | C65 | | 1.49 | ±0.16 | 1.00 | 7.86E-04 | Guidugli et al., 2017
Q2925K | c.8773C>A | C45 | | 1.51 | ±0.12 | 1.00 | 1.07E-03 | Guidugli et al., 2017
D3073G | c.9218A>G | C65 | | 1.52 | ±0.12 | 1.00 | 1.17E-03 | Guidugli et al., 2017
R2659G | c.7975A>G | C65 | | 1.57 | ±0.12 | 1.00 | 2.57E-03 | Guidugli et al., 2017
Y2624C | c.7871A>G | C65 | | 1.57 | ±0.05 | 1.00 | 2.71E-03 | Current study
R2842P | c.8525G>C | C65 | | 1.59 | ±0.05 | 1.00 | 3.40E-03 | Current study
D2611G | c.7832A>G | C65 | | 1.61 | ±0.12 | 0.99 | 5.09E-03 | Guidugli et al., 2017
Y2660D | c.7978T>G | C65 | | 1.61 | ±0.12 | 0.99 | 5.03E-03 | Guidugli et al., 2017
N2622S | c.7865A>G | C45 | | 1.63 | ±0.13 | 0.99 | 6.76E-03 | Current study

* FC: fold change in GFP-positive cells in the HDR assay.

# SE: standard error.

Figure 1.

HDR activity of 207 BRCA2 missense variants.

The model-based HDR fold change with standard error (SE) is displayed on a logarithmic scale. The SE is included as a measure of the reproducibility of the HDR assay for each variant. Solid lines represent 99% probability of pathogenicity and 99% probability of neutrality (fold increase in GFP (+) cells < 1.66 for damaging and fold increase in GFP (+) cells > 2.41 for neutral). Dotted lines separate variants classified as deleterious, indeterminate, and neutral.

Computational Predictions

The sensitivity and specificity of 40 computational prediction models at previously established cut points for damaging variants were determined using the functional assay data for BRCA1 and BRCA2 missense variants (Tables S2 and S3). These default thresholds yielded either high sensitivity with low specificity (e.g. BRCA2 SIFT Score: sensitivity 100%, specificity <20%) or low sensitivity with high specificity (e.g. BRCA1 PROVEAN Score: sensitivity <0.02%, specificity 100%), depending on the gene (Table S3). To optimize the predictive ability of each model, thresholds that maximized sensitivity and specificity for damaging variants were defined separately for BRCA1 and BRCA2. Because the goal was to predict damaging, clinically relevant variants, models were generated by combining indeterminate with neutral variants (Scenario 1). Performance characteristics and AUC values for the optimized individual prediction models for BRCA1 and BRCA2 are shown in Table 2 and Figure S2. The best performing individual models for BRCA1 incorporated conservation measures, including deep interspecies protein alignments, and physicochemical changes in amino acids (MetaSVM Score and RankScore[21]; PERCH and PERCH_noMAF[28]; Align-GVGD[12]; Polyphen2Hvar Score and RankScore[10]; and VEST3 Score and RankScore[29]). These models yielded AUCs ≥0.87, sensitivity and specificity >80%, and MCCs up to 0.68 (VEST3 Score) (Table 2, Table S4), a major improvement over results based on default thresholds (mean MCC = 0.29) (Table S3). The best performing models for BRCA2 were PERCH and PERCH_noMAF; MetaLR RankScore and Score; MetaSVM RankScore and Score; and VEST3 RankScore and Score. These yielded AUCs of 0.83–0.89, sensitivity and specificity >78% (85% for PERCH), and MCCs >0.53 (Table 2, Table S4), a substantial improvement over models using default parameters (MCC <0.42) (Table S3).
Table 2.

Performance of in silico prediction models with optimized thresholds for classification of BRCA1 and BRCA2 missense variants

Gene | Model | Optimal threshold | AUC (95% CI) | FN / FP / TP / TN | MCC

BRCA1 | NVM-Validation | ≥9 | 0.94 (0.897–0.983) | 5 / 8 / 25 / 79 | 0.719
BRCA1 | Vest3RankScore | ≥0.85546 | 0.9 (0.849–0.95) | 8 / 24 / 51 / 153 | 0.678
BRCA1 | Vest3Score | ≥0.868 | 0.9 (0.849–0.95) | 8 / 24 / 51 / 153 | 0.678
BRCA1 | RF | ≥0.298 | 0.92 (0.879–0.96) | 8 / 26 / 51 / 151 | 0.663
BRCA1 | AlignGVGDPrior | ≥0.29 | 0.88 (0.829–0.931) | 7 / 37 / 54 / 150 | 0.614
BRCA1 | PERCHnoMAF | ≥0.206316 | 0.87 (0.814–0.924) | 10 / 31 / 51 / 156 | 0.614
BRCA1 | PERCH | ≥0.239853 | 0.87 (0.819–0.922) | 10 / 32 / 51 / 155 | 0.607
BRCA1 | Polyphen2HvarRankScore | ≥0.91584 | 0.89 (0.845–0.93) | 11 / 30 / 48 / 147 | 0.593
BRCA1 | Polyphen2HvarScore | ≥0.999 | 0.89 (0.845–0.93) | 11 / 30 / 48 / 147 | 0.593
BRCA1 | MetaSVMRankScore | ≥0.9083 | 0.89 (0.844–0.928) | 11 / 34 / 48 / 143 | 0.565

BRCA2 | NVM-Validation | ≥4 | 0.89 (0.826–0.963) | 6 / 9 / 29 / 59 | 0.683
BRCA2 | PERCH | ≥0.295957 | 0.89 (0.847–0.939) | 11 / 21 / 60 / 115 | 0.672
BRCA2 | PERCHnoMAF | ≥0.272149 | 0.88 (0.832–0.929) | 12 / 23 / 59 / 113 | 0.642
BRCA2 | RFModel | ≥0.371 | 0.9 (0.843–0.947) | 12 / 24 / 59 / 111 | 0.633
BRCA2 | MetaSVMRankScore | ≥0.93181 | 0.87 (0.824–0.923) | 15 / 29 / 56 / 107 | 0.555
BRCA2 | MetaSVMScore | ≥0.7002 | 0.87 (0.824–0.923) | 15 / 29 / 56 / 107 | 0.555
BRCA2 | MetaLRRankScore | ≥0.92107 | 0.87 (0.823–0.922) | 16 / 30 / 55 / 106 | 0.535
BRCA2 | MetaLRScore | ≥0.7679 | 0.87 (0.823–0.922) | 16 / 30 / 55 / 106 | 0.535
BRCA2 | Vest3RankScore | ≥0.79963 | 0.83 (0.776–0.893) | 16 / 30 / 55 / 106 | 0.535
BRCA2 | Vest3Score | ≥0.811 | 0.83 (0.776–0.893) | 16 / 30 / 55 / 106 | 0.535

FN: false negative; FP: false positive; TP: true positive; TN: true negative

AUC: area under the receiver operating characteristic curve

MCC: Matthews correlation coefficient
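As a worked check of how the columns of Table 2 relate, the standard definitions applied to the BRCA1 Vest3RankScore row (FN = 8, FP = 24, TP = 51, TN = 153) reproduce the reported MCC:

```latex
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP+FN} = \frac{51}{51+8} \approx 0.864,\\
\text{Specificity} &= \frac{TN}{TN+FP} = \frac{153}{153+24} \approx 0.864,\\
\text{MCC} &= \frac{TP\cdot TN - FP\cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}
  = \frac{51\cdot 153 - 24\cdot 8}{\sqrt{75\cdot 59\cdot 177\cdot 161}}
  = \frac{7611}{\sqrt{126\,099\,225}} \approx 0.678.
\end{aligned}
```

Applying the same formulas to the BRCA1 RF row (TP = 51, FN = 8, TN = 151, FP = 26) gives a sensitivity of 51/59 ≈ 86% and a specificity of 151/177 ≈ 85%, matching the values reported for BRCA1-RF.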

To assess whether metapredictor models improved the prediction of damaging variants for each gene, two new models were developed for both BRCA1 and BRCA2: (1) random forest (RF) classifiers trained on the continuous scores of the individual prediction methods, with the functional classifications as the outcome (BRCA1-RF and BRCA2-RF); and (2) naïve voting methods (NVM) that combine the binary calls of the prediction models at their optimized thresholds (BRCA1-NVM and BRCA2-NVM). CAROL[30] and CONDEL[31] were not included in the development of the new BRCA2 models because their prediction scores were unavailable for 29 of 207 (14.0%) variants. Only 12 of 248 (4.8%) BRCA1 variants and 1 of 207 (0.5%) BRCA2 variants were excluded from model development owing to missing data or conflicts between protein and DNA sequences (Table S2).

RF-Models

Random Forest (RF) classifiers were used to evaluate the impact of excluding individual prediction methods on the accuracy of composite prediction models. VEST3 RankScore and Align-GVGD had the greatest impact on the accuracy of BRCA1-RF, whereas Mutation Assessor RankScore and PERCH had the greatest impact on BRCA2-RF. The BRCA1-RF model (threshold ≥0.298) (Table S4) showed the second highest AUC value of all models for BRCA1 (0.92, 95%CI:0.88–0.96), with 86% sensitivity and 85% specificity. The BRCA1-RF model predicted 8 of 59 (13.6%) functionally impaired BRCA1 variants as neutral (false negatives), 12 of 21 (57.1%) functionally indeterminate variants as damaging, and 14 of 156 (9.0%) functionally intact neutral variants as damaging (false positives) (Table S2). Similarly, the BRCA2-RF model (threshold ≥0.371) (Table S4) had the highest AUC for BRCA2 (0.90, 95%CI:0.84–0.95) (Table 2, Table S4) with 83% sensitivity and 82% specificity (Table S4, Figure 2).
Figure 2.

Matthews Correlation Coefficients (MCC) for 42 in silico predictors with optimized thresholds for damaging versus indeterminate/neutral variants in BRCA1 and BRCA2.

Higher values indicate increased classifier performance.

NVM-Models

NVM models based on the optimal number of individual prediction algorithms for BRCA1 and BRCA2 variants were also developed. The optimal NVM for BRCA1, following training and validation (BRCA1-NVM Combined) contained 13 prediction models (Table S5). BRCA1 variants are predicted damaging when ≥9 of the 13 models exceed their individual thresholds for damaging variants (Table S5). BRCA1-NVM yielded an AUC of 0.94 with sensitivity of 83% and specificity of 91%. The highest proportion of BRCA1 misclassifications involved variants with indeterminate function, with 9 of 21 (42.9%) annotated as damaging. In contrast, the optimal BRCA2-NVM (BRCA2-NVM Combined) model after training and validation incorporated six prediction models with a threshold of ≥4 models predicting damaging variants (Table S5). This model yielded sensitivity of 82% and specificity of 87% (Table 2, Table S4, Table S5), with 14 functionally damaging variants predicted as neutral, and 18 indeterminate/neutral variants predicted as damaging. As with BRCA1, the false positive results were disproportionately enriched for indeterminate function with 6 of 21 (28.6%) misclassified. Overall, the predictive abilities of the RF and NVM models showed substantial improvement over individual in silico prediction methods using default parameters, and modest improvements over the best performing individual in silico methods optimized at thresholds specific to BRCA1 and BRCA2.

Application of selected models to all possible missense variants in BRCA1 and BRCA2

The RF and NVM models were used to assess the damaging potential of all theoretically possible missense substitutions resulting from single-nucleotide changes in BRCA1 and BRCA2, contingent on the availability of prediction scores from all of the individual methods contributing to each model (Table S2). Because some of the contributing prediction algorithms are partly based on nucleotide substitution rates, the same amino acid substitution arising from different nucleotide changes may receive more than one RF or NVM score. Using BRCA1-NVM, 7.1% of BRCA1 variants were predicted to be damaging; similarly, 2.6% of BRCA2 variants were predicted to be damaging using BRCA2-NVM. However, NVM-predicted damaging variants were markedly enriched in known functional domains (Figure 3). In the BRCA1 RING domain, 30–40% of all possible missense changes were predicted to disrupt protein function. Similarly, 46% of all possible missense variants in the C-terminal BRCT domains and >20% in the larger C-terminal region (residues 1660–1810) were predicted to be damaging (Table S2). Interestingly, ~10% of all possible variants between amino acids 300 and 550, a region associated with TP53[32], RAD50[33], and c-MYC[32] interactions, were predicted to be damaging (Table S2). For BRCA2, only the region spanning residues 2574 to 2771, which contains the helical and OB1 domains of the DNA binding domain, was predicted to have >20% damaging variants, although 10% of variants in OB3 were also predicted to be damaging (Figure 3, Table S2). Few damaging missense variants were predicted in the OB2 domain. Similar results were obtained using the RF models (Table S2). No damaging variants were predicted in the N-terminus of BRCA2, which contains the PALB2 interaction domain[34], possibly because of the small size of the interaction site.
Figure 3.

Estimates of the proportion of damaging missense variants by position in each gene.

The x-axis (AAPOS) represents the amino acid position, and the y-axis the probability, from the NVM model, that a missense variant at that position is damaging. Lines were smoothed using a 50-amino-acid sliding window.
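The position-wise smoothing described in the caption can be sketched as follows. The caption states only that a 50-amino-acid sliding window was used. The centred window, the averaging of per-position fractions (rather than pooling variant counts), and the function names are assumptions of this sketch.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def fraction_damaging_by_position(calls: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """calls: (amino acid position, 1 if predicted damaging else 0), one per possible missense variant."""
    total: Dict[int, int] = defaultdict(int)
    damaging: Dict[int, int] = defaultdict(int)
    for pos, is_damaging in calls:
        total[pos] += 1
        damaging[pos] += is_damaging
    return {pos: damaging[pos] / total[pos] for pos in total}


def sliding_window_mean(frac: Dict[int, float], window: int = 50) -> Dict[int, float]:
    """Centred moving average over positions q with p - window//2 <= q < p + window//2."""
    half = max(1, window // 2)
    positions = sorted(frac)
    # Prefix sums make each window average O(log n) via binary search.
    prefix = [0.0]
    for p in positions:
        prefix.append(prefix[-1] + frac[p])
    smoothed = {}
    for p in positions:
        lo = bisect_left(positions, p - half)
        hi = bisect_left(positions, p + half)
        smoothed[p] = (prefix[hi] - prefix[lo]) / (hi - lo)
    return smoothed
```

Prefix sums keep the smoothing fast even for the 3418 residues of BRCA2. Positions with no scored variants are simply skipped rather than counted as zero.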

DISCUSSION

Specific measures of BRCA1 and BRCA2 functional activity have been established as reliable indicators of the functional impact and likelihood of pathogenicity of variants in certain domains of BRCA1 and BRCA2[6,16]. However, in the absence of functional studies of individual variants, in silico models that incorporate functional or structural data are often considered useful predictors of function. Here, existing models for predicting damaging missense variants were recalibrated using BRCA1 and BRCA2 functional data and combined into metapredictor classifiers (NVM and RF). These metapredictors leveraged the complementary strengths of the individual models and improved upon many of them for predicting the functional implications of missense variants in the cancer risk-associated domains of BRCA1 and BRCA2. We subsequently used these highly sensitive and specific models to annotate all possible missense variants in BRCA1 and BRCA2 as damaging or neutral. Importantly, because the BRCA1 transcriptional integrity assay and the BRCA2 HDR assay used to calibrate the prediction models have 100% sensitivity and specificity for clinically pathogenic variants in the BRCA1 BRCT domains and the BRCA2 DNA binding domain, respectively, the models may also predict the clinical pathogenicity of missense variants in these domains. Whether predicted functional effects in other parts of these proteins also reflect pathogenicity remains to be determined using additional pathogenic and neutral standards. Overall, these prediction models are likely to alter the interpretation of many VUS in BRCA1 and BRCA2, leading to improved clinical genetic testing and perhaps improved risk management of patients found to carry VUS.
The current American College of Medical Genetics guidelines for variant classification recommend that in silico evidence be counted as supporting evidence for (or against) pathogenicity if all of the in silico programs tested agree on the prediction, and that in silico evidence not be used for classification if the predictions disagree. However, the guidelines do not recommend specific in silico methods or indicate the number of methods that should be evaluated[35]. This approach differs from the NVM model in two key respects. First, the default thresholds of predictive models are not appropriate for BRCA1 and BRCA2 because they yield very low specificity (or, for some tools, very low sensitivity). The new thresholds derived here should provide more accurate predictions of functional impact and therefore of pathogenicity. Second, while using an ensemble of models is a rational strategy, requiring all models to agree is overly stringent and results in decreased performance (Figure S3 and Figure S4). Rather, the number of in silico models, the choice of specific models, and the thresholds for those models required for an accurate consensus with both high sensitivity and specificity can vary by gene.

Effect of grouping indeterminate variants as either damaging or neutral

Generally, the performance of the individual in silico prediction models, as well as of the RF and NVM models, was similar whether indeterminate variants were grouped with damaging or with neutral variants. However, the performance of some established prediction methods was highly sensitive to how indeterminate variants were classified. Interestingly, the prediction methods whose thresholds changed most depending on whether indeterminate variants were placed in the damaging or the neutral category also had higher AUCs (e.g. PERCH, NVM, RF, MetaSVM) than those with no change in threshold (e.g. PolyPhen2HDiv, PolyPhen2HVar, MutationTaster Score), suggesting that the former methods are better predictors of partial (indeterminate) effects on function. However, the clinical relevance of indeterminate variants in BRCA1 and BRCA2 is not well understood. A better understanding of the function, pathogenicity, and cancer risk of these variants, with associated refinement of the thresholds for damage and pathogenicity in each functional assay, may allow recalibration of the prediction models and improved prediction of clinically relevant BRCA1 and BRCA2 variants in the future.

Extending missense prediction outside established functional domains

When the NVM predictions were cross-referenced with well-annotated ClinVar classifications, the predictions clustered in well-known domains, with damaging missense variants mostly restricted to the BRCT and RING domains of BRCA1 and the DNA binding domain of BRCA2 [12] (Figure S5). The BRCA1-NVM prediction model clearly delineated both BRCA1 regions, with as many as 40% of missense variants in the RING domain and 50% in parts of the BRCT regions annotated as damaging. Interestingly, enrichment between amino acids 400 and 500 was also observed, although no damaging variants in this region have been defined by functional studies and no pathogenic variants have yet been observed in the clinically tested population. According to the BRCA1-NVM model, the total proportion of all theoretically possible damaging variants in BRCA1 is ~8%, almost all of which are located in the known RING and BRCT domains. For BRCA2, family-based studies in combination with the Align-GVGD prediction method were previously used to estimate that 33% of missense variants in the BRCA2 DNA binding domain are damaging[36]. Although based on small numbers of missense variants, this estimate is consistent with predictions from the NVM and RF models for BRCA2, although the frequency predicted by the BRCA2-NVM and BRCA2-RF models is as high as 50% in specific regions. A notable drop in the estimated pathogenic potential was observed in the BRCA2 OB2 DNA binding domain; the same pattern was seen among all pathogenic BRCA2 missense mutations listed in ClinVar. However, when the models were applied to genes other than BRCA1 and BRCA2 (Table S6), or when BRCA1-RF or BRCA1-NVM was applied to BRCA2 and vice versa, the performance of the NVM and RF models was much lower, with MCCs <0.40, as shown for BRCA2-NVM (Table S7). This sizable reduction in accuracy suggests that recalibrated models are specific to the gene for which they were developed and cannot be effectively extrapolated to other disease genes.
Another potential explanation for this phenomenon is that not all missense variants in other genes exert phenotypic effects through loss of activity. Because the BRCA1 and BRCA2 assays measure only loss of function, more comprehensive assays that evaluate splicing alterations, gain-of-function mutations, and epigenetic influences on gene function may be needed to extend the NVM and RF prediction models to other genes. Separately, disruption by missense variants of functions other than transcriptional activation or homology-directed repair could require recalibration of the NVM and RF prediction models. Other influences on model performance may include the AT versus GC content and codon usage of coding sequences, and the structural effects of the observed variants. Finally, the differences could reflect evolutionary constraint, since some models such as Align-GVGD and PolyPhen2 perform well for the highly conserved BRCA1 but much less well for the less constrained BRCA2.

While the clinical implications of truncating mutations in the breast cancer predisposition genes BRCA1 and BRCA2 are clear, interpretation of missense variants is more challenging. Here we present an approach for predicting the functional impact, and potentially the pathogenicity, of BRCA1 and BRCA2 missense variants based on functional evaluation of variants and in silico sequence-based analysis. The functional studies of BRCA2 variants, together with similar studies of BRCA1, now identify 130 variants in these genes that are damaging and likely pathogenic and may substantially increase the risk of breast, ovarian, and other cancers. In contrast, public databases currently identify fewer than 40 such variants. In the absence of functional results, other methods for variant assessment are needed.
Many in silico prediction methods exist for characterizing missense variants. However, it is not well defined how their results should be interpreted, or how accurately they predict whether BRCA1 and BRCA2 variants are damaging or neutral. Here we recalibrated established in silico prediction methods for missense variants using results from BRCA1 and BRCA2 functional assays. We also developed RF and NVM models that incorporate multiple in silico prediction methods. These classifiers outperformed the individual in silico models. Overall, this approach leverages measures of BRCA1 and BRCA2 functional activity to improve the classification of BRCA1 and BRCA2 VUS detected by clinical genetic testing and tumor sequencing.
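The recalibrate-then-vote idea can be illustrated with a short Python sketch. It makes three assumptions. First, each tool outputs a score where higher means more damaging. Second, each tool's cutoff is chosen by maximizing Youden's J (sensitivity + specificity - 1) against functional-assay labels. Third, the fraction of tools voting "damaging" serves as a continuous metapredictor score. The tool names, scores, Youden's J criterion, and 0.5 decision cutoff are hypothetical. They are not the thresholds or voting rule published for the BRCA1-NVM and BRCA2-NVM models.

```python
from typing import Dict, Sequence


def recalibrate_threshold(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Cutoff maximizing Youden's J (sensitivity + specificity - 1).

    Assumes higher scores mean "more damaging": a variant is called damaging
    when score >= cutoff. labels: True = damaging in the functional assay.
    Tools where lower means damaging would need their scores negated first.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both damaging and neutral reference variants")
    best_cut, best_j = scores[0], float("-inf")
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if not y and s < cut)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict ">" keeps the lowest cutoff among ties
            best_cut, best_j = cut, j
    return best_cut


def vote_fraction(scores: Dict[str, float], cutoffs: Dict[str, float]) -> float:
    """Fraction of tools whose score meets their own recalibrated cutoff."""
    votes = [scores[tool] >= cut for tool, cut in cutoffs.items()]
    return sum(votes) / len(votes)


# Hypothetical reference data for one tool (scores plus assay labels).
ref_scores = [0.1, 0.2, 0.4, 0.7, 0.8, 0.9]
ref_labels = [False, False, False, True, True, True]
print(recalibrate_threshold(ref_scores, ref_labels))  # 0.7

# Hypothetical per-tool cutoffs and scores for one variant.
cutoffs = {"tool_a": 0.7, "tool_b": 0.5, "tool_c": 15.0}
variant = {"tool_a": 0.75, "tool_b": 0.3, "tool_c": 20.0}
score = vote_fraction(variant, cutoffs)
print(score >= 0.5)  # True (2 of 3 tools vote damaging); 0.5 is an arbitrary cutoff
```

Because the vote fraction is a continuous score rather than a yes/no call, variants can be ranked by it to draw an ROC curve. A random forest would instead learn how to weight the same per-tool scores.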