Raghad Aljarf, Mengyuan Shen, Douglas E V Pires, David B Ascher.
Abstract
BRCA1 and BRCA2 are tumour suppressor genes that play a critical role in maintaining genomic stability through DNA repair. DNA repair defects caused by BRCA1 and BRCA2 missense variants increase the risk of developing breast and ovarian cancers. Accurate identification of these variants is therefore clinically relevant as a means to guide personalized patient management and early detection. Next-generation sequencing efforts have significantly increased data availability, but also the number of variants of uncertain significance (VUSs) that require interpretation. Experimental approaches used to measure the molecular consequences of these variants, however, are usually costly and time-consuming. Computational tools have therefore emerged as faster alternatives for assisting in the interpretation of the clinical significance of newly discovered variants. To better understand and predict variant pathogenicity in BRCA1 and BRCA2, various machine learning algorithms have been proposed, but they have shown limited performance. Here we present BRCA1 and BRCA2 gene-specific models and a generic model for quantifying the functional impacts of single-point missense variants in these genes. Across tenfold cross-validation, our final models achieved a Matthews Correlation Coefficient (MCC) of up to 0.98 and comparable performance of up to 0.89 across independent, non-redundant blind tests, outperforming alternative approaches. We believe our predictive tool will be a valuable resource for understanding and interpreting the functional consequences of missense variants in these genes, for guiding the interpretation of newly discovered variants, and for prioritizing mutations for experimental validation.
Year: 2022 PMID: 35729312 PMCID: PMC9213547 DOI: 10.1038/s41598-022-13508-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Distribution of BRCA1 and BRCA2 missense variants, shown as lollipop plots. Blue circles represent benign and likely benign variants; red circles represent pathogenic and likely pathogenic variants. The mapped BRCA1 and BRCA2 missense variants are ranked by their impact at the protein level.
Figure 2. Distribution of the top discriminative features between pathogenic and benign variants. The selected features (PolyphenScore, HENS920101, WEIL970101 and LUTR910107) capture sequence conservation and amino acid physicochemical properties, and differ significantly between the two classes (p < 0.001). Statistical significance was assessed using Welch's t-test.
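As an illustration of the significance test named in the Figure 2 caption, the sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom for two groups with unequal variances. The function name `welch_t` and the feature values are hypothetical and are not taken from the study; this is not the authors' code.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(a), len(b)
    # Squared standard error of each group mean (sample variance / n).
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical feature values for illustration only, not data from the study.
pathogenic = [0.91, 0.87, 0.99, 0.78, 0.95]
benign = [0.12, 0.30, 0.05, 0.41, 0.22]

t_stat, dof = welch_t(pathogenic, benign)
print(f"t = {t_stat:.2f}, df = {dof:.2f}")
```

The p-value follows from the t distribution with `dof` degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns the p-value directly.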
Comparative performance of BRCA1/2 models across cross-validation and non-redundant blind test sets.
| Model | MCC | Sensitivity | Specificity | F1 score | Precision | Accuracy |
|---|---|---|---|---|---|---|
| CV | 0.96 | 93% | 98% | 0.96 | 0.96 | 0.96 |
| test | 0.89 | 92% | 82% | 0.86 | 0.87 | 0.86 |
| CV | 0.96 | 94% | 98% | 0.97 | 0.97 | 0.97 |
| test | 0.89 | 100% | 91% | 0.94 | 0.95 | 0.94 |
| CV | 0.96 | 99% | 99% | 0.99 | 0.99 | 0.99 |
| test | 0.82 | 92% | 90% | 0.93 | 0.94 | 0.92 |
| CV | 0.98 | 96% | 98% | 0.98 | 0.98 | 0.98 |
| test | 0.89 | 60% | 100% | 0.92 | 0.93 | 0.92 |
| CV | 0.89 | 82% | 93% | 0.91 | 0.91 | 0.91 |
| test | 0.83 | 86% | 100% | 0.97 | 0.98 | 0.97 |
| CV | 0.95 | 99% | 99% | 0.99 | 0.99 | 0.99 |
| test | 1.00 | 100% | 100% | 1.00 | 1.00 | 1.00 |
| General model | | | | | | |
| CV | 0.91 | 96% | 98% | 0.96 | 0.96 | 0.96 |
| test | 0.76 | 93% | 100% | 0.93 | 0.93 | 0.92 |
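For readers unfamiliar with the metrics reported in this table, the following minimal sketch shows how MCC, sensitivity, specificity, precision, F1 score and accuracy are derived from a binary confusion matrix, with pathogenic taken as the positive class. The function name `classification_metrics` and the counts are hypothetical; they are not the study's data, and this is not the authors' evaluation code.

```python
import math


def _ratio(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = _ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    # Matthews Correlation Coefficient: ranges from -1 to 1, 0 is chance level.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _ratio(tp * tn - fp * fn, denom)
    return {
        "mcc": mcc,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "precision": precision,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only.
for name, value in classification_metrics(tp=45, fp=2, tn=48, fn=5).items():
    print(f"{name}: {value:.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often preferred when the pathogenic and benign classes are imbalanced.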
Figure 3. Receiver Operating Characteristic (ROC) curves for BRCA1 (top) and BRCA2 (bottom). Our predictive models accurately identified pathogenic variants with AUC > 0.92 on cross-validation and blind tests.
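The AUC quoted in the Figure 3 caption can be read as the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen benign one. The sketch below computes it directly from that pairwise definition. The function name `roc_auc`, the label encoding (1 for pathogenic, 0 for benign) and the scores are assumptions made for illustration, not part of the published tool.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities for illustration only.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.1]
print(roc_auc(labels, scores))
```

This pairwise form is quadratic in the number of variants, which is acceptable for small test sets; for larger ones a rank-based implementation such as scikit-learn's `roc_auc_score` is the usual choice.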
Figure 4. Distribution of probability scores predicted by our final model for functionally assessed VUSs in BRCA1 and BRCA2.
Comparative performance on cross-validation between BRCA1/2 classification models and other available approaches.
| Methods | ACC | MCC | Sens | Spec | ROC-AUC |
|---|---|---|---|---|---|
| | – | – | 95% | 100% | 0.99 |
| MLR | 0.82 | 0.65 | 69% | 93% | – |
| MLR-CAGI | 0.75 | 0.50 | 61% | 87% | – |
| SIFT | 0.68 | 0.50 | 92% | 55% | 0.73 |
| PolyPhen-2 | 0.64 | 0.40 | 55% | 69% | 0.62 |
| Align-GVGD | 0.48 | 0.18 | 48% | 26% | 0.58 |
| REVEL | 0.68 | 0.48 | 68% | 55% | 0.74 |
| CADD | 0.74 | 0.58 | 74% | 61% | 0.80 |
| | 0.96 | 0.96 | 93% | 98% | 0.98 |
| | 0.99 | 0.96 | 99% | 99% | 0.98 |
| SGE | – | – | 96.7% | 98.2% | 0.98 |
| BRCA-ML | – | 0.66 | 89.5% | 91.5% | – |
| | – | – | 97% | 83% | 0.96 |
| NN | 0.87 | 0.75 | 92% | 85% | – |
| NN-CAGI | 0.76 | 0.55 | 86% | 71% | – |
| SIFT | 0.68 | 0.50 | 92% | 55% | 0.74 |
| PolyPhen-2 | 0.64 | 0.40 | 55% | 69% | 0.62 |
| Align-GVGD | 0.40 | 0.18 | 38% | 23% | 0.54 |
| REVEL | 0.68 | 0.48 | 68% | 55% | 0.74 |
| CADD | 0.74 | 0.58 | 74% | 60% | 0.80 |
| Our model | 0.97 | 0.96 | 94% | 98% | 0.99 |
| MLR | 0.78 | 0.57 | 82% | 74% | – |
| MLR-CAGI | 0.86 | 0.71 | 86% | 85% | – |
| SIFT | 0.64 | 0.45 | 96% | 58% | 0.77 |
| PolyPhen-2 | 0.61 | 0.43 | 89% | 56% | 0.72 |
| Align-GVGD | 0.31 | 0.18 | 31% | 24% | 0.47 |
| REVEL | 0.85 | 0.48 | 85% | 89% | 0.77 |
| CADD | 0.76 | 0.58 | 76% | 71% | 0.86 |
| | 0.98 | 0.98 | 96% | 98% | 0.97 |
| | 0.99 | 0.95 | 99% | 99% | 0.99 |
| BRCA-ML | – | 0.73 | 97.7% | 85.1% | – |
| NN | 0.84 | 0.59 | 75% | 86% | – |
| NN-CAGI | 0.76 | 0.47 | 75% | 77% | – |
| SIFT | 0.72 | 0.45 | 93% | 68% | 0.80 |
| PolyPhen-2 | 0.68 | 0.40 | 88% | 64% | 0.76 |
| Align-GVGD | 0.30 | 0.18 | 30% | 22% | 0.49 |
| REVEL | 0.87 | 0.54 | 87% | 91% | 0.79 |
| CADD | 0.80 | 0.54 | 80% | 77% | 0.85 |
| Our model | 0.91 | 0.89 | 82% | 93% | 0.92 |
| BRCA-ML | – | – | 74% | 98% | – |
| SIFT | 0.70 | 0.48 | 92% | 63% | 0.78 |
| PolyPhen-2 | 0.66 | 0.40 | 66% | 66% | 0.66 |
| Align-GVGD | 0.38 | 0.06 | 38% | 24% | 0.53 |
| REVEL | 0.75 | 0.50 | 75% | 72% | 0.79 |
| CADD | 0.78 | 0.60 | 78% | 72% | 0.84 |
| Our model | 0.96 | 0.91 | 96% | 98% | 0.96 |
Comparative performance on blind test sets between BRCA1/2 classification models and other alternative predictors.
| Methods | ACC | MCC | Sens | Spec | ROC-AUC |
|---|---|---|---|---|---|
| SIFT | 0.67 | 0.62 | 76% | 64% | 0.81 |
| PolyPhen-2 | 0.68 | 0.41 | 68% | 59% | 0.71 |
| Align-GVGD | 0.48 | 0.18 | 48% | 26% | 0.64 |
| REVEL | 0.41 | 0.18 | 41% | 20% | 0.54 |
| CADD | 0.74 | 0.58 | 74% | 59% | 0.80 |
| Our model | 0.86 | 0.89 | 92% | 82% | 0.95 |
| SIFT | 0.67 | 0.62 | 76% | 64% | 0.81 |
| PolyPhen-2 | 0.68 | 0.41 | 68% | 59% | 0.71 |
| Align-GVGD | 0.40 | 0.18 | 38% | 23% | 0.64 |
| REVEL | 0.41 | 0.18 | 41% | 21% | 0.54 |
| CADD | 0.74 | 0.58 | 73% | 59% | 0.80 |
| Our model | 0.94 | 0.89 | 100% | 91% | 0.95 |
| SIFT | 0.62 | 0.42 | 62% | 52% | 0.76 |
| PolyPhen-2 | 0.58 | 0.40 | 58% | 48% | 0.74 |
| Align-GVGD | 0.39 | 0.34 | 38% | 24% | 0.62 |
| REVEL | 0.84 | 0.51 | 85% | 86% | 0.82 |
| CADD | 0.58 | 0.58 | 58% | 48% | 0.74 |
| Our model | 0.92 | 0.89 | 60% | 100% | 0.98 |
| SIFT | 0.74 | 0.51 | 74% | 70% | 0.85 |
| PolyPhen-2 | 0.69 | 0.46 | 69% | 64% | 0.82 |
| Align-GVGD | 0.46 | 0.28 | 46% | 36% | 0.68 |
| REVEL | 0.87 | 0.61 | 86% | 88% | 0.86 |
| CADD | 0.74 | 0.58 | 74% | 70% | 0.85 |
| Our model | 0.97 | 0.83 | 86% | 100% | 0.98 |
| SIFT | 0.69 | 0.52 | 69% | 59% | 0.79 |
| PolyPhen-2 | 0.65 | 0.47 | 65% | 54% | 0.77 |
| Align-GVGD | 0.44 | 0.19 | 44% | 30% | 0.59 |
| REVEL | 0.82 | 0.64 | 82% | 78% | 0.86 |
| CADD | 0.69 | 0.58 | 69% | 59% | 0.79 |
| Our model | 0.92 | 0.76 | 93% | 100% | 0.95 |