Mohannad N Khandakji (1,2), Borbala Mifsud (1,3).
Abstract
Background: Existing BRCA2-specific variant pathogenicity prediction algorithms predict the functional impact of only one subtype of variants. General variant effect predictors are applicable to all variant subtypes, but they are trained on putative benign and pathogenic variants and do not account for gene-specific information, such as hotspots of pathogenic variants. Local, gene-specific information has been shown to aid variant pathogenicity prediction; therefore, our aim was to develop a BRCA2-specific machine learning model to predict the pathogenicity of all types of BRCA2 variants.
Keywords: VUS; breast cancer; in-silico predictions; variant pathogenicity; variant prioritization
Year: 2022 PMID: 36246618 PMCID: PMC9561395 DOI: 10.3389/fgene.2022.982930
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
Receiver operating characteristic (ROC) curve analysis of the different in silico predictions. AUC: area under the curve; Obs: number of observations; Std.Err: standard error; CI: confidence interval.
| In silico prediction method | Obs | AUC | Std.Err | 95% CI low | 95% CI high |
|---|---|---|---|---|---|
| XGBoost | 820 | 1 | 0 | 0.99996 | 1 |
| Consequence | 4102 | 0.9978 | 0.0009 | 0.9961 | 0.99945 |
| IMPACT | 4102 | 0.9986 | 0.0005 | 0.99765 | 0.99957 |
| SIFT_Score | 142 | 0.1166 | 0.0237 | 0.07008 | 0.16318 |
| PolyPhen_Score | 142 | 0.8811 | 0.0493 | 0.78446 | 0.97781 |
| BayesDel_addAF_rankscore | 717 | 0.9896 | 0.005 | 0.97974 | 0.99946 |
| BayesDel_noAF_rankscore | 717 | 0.9657 | 0.0105 | 0.94517 | 0.98617 |
| CADD_raw_rankscore | 717 | 0.9916 | 0.0041 | 0.9836 | 0.99968 |
| ClinPred_rankscore | 142 | 0.9922 | 0.0054 | 0.98165 | 1 |
| DANN_rankscore | 717 | 0.6899 | 0.0341 | 0.6231 | 0.75676 |
| Eigen-PC-raw_coding_rankscore | 717 | 0.8116 | 0.0255 | 0.76169 | 0.86157 |
| Eigen-raw_coding_rankscore | 717 | 0.8641 | 0.0225 | 0.81998 | 0.90832 |
| FATHMM_converted_rankscore | 142 | 0.9238 | 0.0437 | 0.8381 | 1 |
| GERP++_RS_rankscore | 717 | 0.658 | 0.0271 | 0.60495 | 0.71103 |
| GM12878_fitCons_rankscore | 717 | 0.4681 | 0.0266 | 0.41598 | 0.52016 |
| GenoCanyon_rankscore | 717 | 0.5597 | 0.0264 | 0.50789 | 0.61149 |
| H1-hESC_fitCons_rankscore | 717 | 0.5535 | 0.0285 | 0.49759 | 0.6095 |
| HUVEC_fitCons_rankscore | 717 | 0.5132 | 0.0257 | 0.46293 | 0.56354 |
| LRT_converted_rankscore | 717 | 0.5833 | 0.0273 | 0.52976 | 0.63693 |
| M-CAP_rankscore | 120 | 0.9181 | 0.0341 | 0.85119 | 0.985 |
| MPC_rankscore | 141 | 0.9294 | 0.0295 | 0.87164 | 0.98714 |
| MVP_rankscore | 135 | 0.9563 | 0.0184 | 0.92017 | 0.99247 |
| MetaLR_rankscore | 142 | 0.9414 | 0.0373 | 0.86837 | 1 |
| MetaRNN_rankscore | 142 | 0.995 | 0.0038 | 0.98745 | 1 |
| MetaSVM_rankscore | 142 | 0.9096 | 0.0688 | 0.77467 | 1 |
| MutPred_rankscore | 46 | 0.9821 | 0.0147 | 0.95336 | 1 |
| MutationTaster_rankscore | 717 | 0.984 | 0.0077 | 0.96888 | 0.99921 |
| PROVEAN_converted_rankscore | 142 | 0.596 | 0.068 | 0.46263 | 0.72934 |
| PrimateAI_rankscore | 141 | 0.9153 | 0.0666 | 0.78482 | 1 |
| REVEL_rankscore | 142 | 0.9531 | 0.0217 | 0.91052 | 0.99573 |
| SiPhy_29way_logOdds_rankscore | 717 | 0.6476 | 0.0266 | 0.59544 | 0.69976 |
| VEST4_rankscore | 717 | 0.9948 | 0.0023 | 0.99039 | 0.99925 |
| bStatistic_converted_rankscore | 717 | 0.525 | 0.0271 | 0.47193 | 0.57798 |
| fathmm-MKL_coding_rankscore | 717 | 0.6825 | 0.0281 | 0.62749 | 0.73749 |
| fathmm-XF_coding_rankscore | 717 | 0.4945 | 0.031 | 0.43373 | 0.55523 |
| integrated_fitCons_rankscore | 717 | 0.5015 | 0.0262 | 0.45006 | 0.55286 |
| phastCons17way_primate_rankscore | 717 | 0.5545 | 0.0287 | 0.49815 | 0.61085 |
| phyloP17way_primate_rankscore | 717 | 0.4938 | 0.0341 | 0.42692 | 0.56064 |
| MaxEntScan_alt | 64 | 0.1129 | 0.0416 | 0.03141 | 0.19444 |
| MaxEntScan_diff | 64 | 0.8333 | 0.0496 | 0.73613 | 0.93054 |
| MaxEntScan_ref | 64 | 0.3891 | 0.0843 | 0.22388 | 0.55435 |
| SpliceAI_pred_DS_AG | 3655 | 0.5148 | 0.0048 | 0.50537 | 0.52418 |
| SpliceAI_pred_DS_AL | 3655 | 0.5149 | 0.0029 | 0.50917 | 0.52065 |
| SpliceAI_pred_DS_DG | 3655 | 0.5016 | 0.0032 | 0.49532 | 0.5078 |
| SpliceAI_pred_DS_DL | 3655 | 0.5202 | 0.0032 | 0.51396 | 0.52638 |
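The intervals in the table are consistent with an asymptotic normal approximation, AUC ± 1.96 × Std.Err, truncated at 1 (for example, ClinPred: 0.9922 − 1.96 × 0.0054 ≈ 0.9816, with the upper bound capped at 1). The minimal sketch below shows how an AUC, its standard error and such an interval can be computed. It assumes DeLong's standard-error estimator, which is one common choice; the source does not state which estimator was used. The function names and toy scores are hypothetical, and the code assumes higher scores mean "more pathogenic" and at least two variants per class.

```python
import math
from statistics import NormalDist


def auc_with_se(pos_scores, neg_scores):
    """AUC (Mann-Whitney form) and DeLong standard error (hypothetical helper).

    Assumes higher scores indicate pathogenicity and that both classes
    contain at least two variants.
    """
    m, n = len(pos_scores), len(neg_scores)

    def psi(x, y):
        # 1 if the pathogenic variant outranks the benign one, 0.5 for a tie
        if x > y:
            return 1.0
        if x == y:
            return 0.5
        return 0.0

    # DeLong placement values for each pathogenic and each benign variant
    v10 = [sum(psi(x, y) for y in neg_scores) / n for x in pos_scores]
    v01 = [sum(psi(x, y) for x in pos_scores) / m for y in neg_scores]
    auc = sum(v10) / m
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    return auc, math.sqrt(s10 / m + s01 / n)


def normal_ci(auc, se, level=0.95):
    """Asymptotic normal CI, truncated to the valid AUC range [0, 1]."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


pathogenic = [0.9, 0.4, 0.7]  # toy scores, not data from the study
benign = [0.5, 0.1, 0.4]
auc, se = auc_with_se(pathogenic, benign)
print(round(auc, 4), round(se, 4))
print(normal_ci(0.9922, 0.0054))  # ClinPred row: upper bound truncated to 1
```

Assuming pathogenic variants were the positive class, an AUC far below 0.5, as for SIFT_Score (0.1166), indicates that the score runs in the opposite direction: SIFT assigns lower scores to more damaging substitutions. Negating such a score yields an AUC of 1 − 0.1166 = 0.8834 with the same standard error.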
FIGURE 1 Comparison of the number of pathogenic and benign variants across the BRCA2 gene among the 4,102 reviewed variants. (A) Number of pathogenic and benign variants per BRCA2 exon. (B) Number of pathogenic and benign variants per BRCA2 intron.
FIGURE 2 The BRCA2 XGBoost models. (A) Model characteristics (AUC: area under the curve). (B) Feature importance of the XGBoost model. (C) Feature importance of the XGBoost model without the Consequence feature. The BRCA2 XGBoost model trained on the whole reviewed dataset (4,102 variants) was used to predict the pathogenicity of VUS, and its predictions were assessed against HDR functional assay scores. (D) Performance of the model on sets of pathogenic and benign variants defined by HDR score cutoffs of ≤ 1.66 and ≥ 2.44 (247 variants) and of ≤ 1.0 and ≥ 3.0 (160 variants). (E) Feature importance of the XGBoost model trained on the whole reviewed dataset.
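Panel D evaluates the model on variants whose truth labels come from HDR assay score cutoffs. The sketch below shows one way such a labelled set could be assembled. Following the order in the caption, it assumes that scores at or below the lower cutoff are treated as pathogenic, scores at or above the upper cutoff as benign, and intermediate scores are excluded. The function names and toy variants are hypothetical and are not taken from the study.

```python
from typing import Dict, Optional


def hdr_label(score: float, path_max: float = 1.66,
              benign_min: float = 2.44) -> Optional[str]:
    """Label a variant from its HDR assay score (hypothetical helper)."""
    if score <= path_max:
        return "pathogenic"  # low HDR activity: assumed loss of function
    if score >= benign_min:
        return "benign"      # HDR activity at or above the benign cutoff
    return None              # intermediate score: excluded from the set


def evaluation_set(hdr_scores: Dict[str, float], path_max: float = 1.66,
                   benign_min: float = 2.44) -> Dict[str, str]:
    """Keep only variants whose HDR score falls outside the intermediate band."""
    labelled = {}
    for variant, score in hdr_scores.items():
        label = hdr_label(score, path_max, benign_min)
        if label is not None:
            labelled[variant] = label
    return labelled


# Toy HDR scores keyed by made-up variant names (not study data)
scores = {"var_a": 0.8, "var_b": 1.5, "var_c": 2.0, "var_d": 2.7, "var_e": 3.4}
print(evaluation_set(scores))            # cutoffs 1.66 / 2.44
print(evaluation_set(scores, 1.0, 3.0))  # stricter cutoffs 1.0 / 3.0
```

Widening the excluded band trades set size for label confidence, which is consistent with the stricter cutoffs in panel D yielding fewer variants (160 versus 247).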
FIGURE 3 Comparison of predicted pathogenic and benign variants across the BRCA2 exons. (A) Number of predicted pathogenic and benign missense variants (7,131) per BRCA2 exon. (B) Percent distribution of predicted pathogenic and benign missense variants across the BRCA2 exons.