Alejandro Moles-Fernández, Laura Duran-Lozano, Gemma Montalban, Sandra Bonache, Irene López-Perolio, Mireia Menéndez, Marta Santamariña, Raquel Behar, Ana Blanco, Estela Carrasco, Adrià López-Fernández, Neda Stjepanovic, Judith Balmaña, Gabriel Capellá, Marta Pineda, Ana Vega, Conxi Lázaro, Miguel de la Hoya, Orland Diez, Sara Gutiérrez-Enríquez.
Abstract
In silico tools for splicing defect prediction play a key role in assessing the impact of variants of uncertain significance. Our aim was to evaluate the performance of a set of commonly used in silico splicing tools by comparing their predictions against in vitro RNA results. This was done for natural splice sites of clinically relevant genes in hereditary breast/ovarian cancer (HBOC) and Lynch syndrome. A two-stage study was used to evaluate the SSF-like, MaxEntScan (MES), NNSplice, HSF, SPANR, and dbscSNV tools. A discovery dataset of 99 variants with unequivocal results from in vitro RNA studies, located in the 10 exonic and 20 intronic nucleotides adjacent to exon-intron boundaries of BRCA1, BRCA2, MLH1, MSH2, MSH6, PMS2, ATM, BRIP1, CDH1, PALB2, PTEN, RAD51D, STK11, and TP53, was collected from four Spanish cancer genetics laboratories. The best stand-alone predictors or combinations were validated with a set of 346 variants in the same genes with clear splicing outcomes reported in the literature. Sensitivity, specificity, accuracy, negative predictive value (NPV), and Matthews correlation coefficient (MCC) were used to measure performance. The discovery stage showed that HSF and SSF-like were the most accurate for variants in the donor and acceptor regions, respectively. Further combination analysis revealed that HSF, HSF+SSF-like, or HSF+SSF-like+MES achieved high performance for predicting the disruption of donor sites, and SSF-like or a sequential combination of MES and SSF-like for predicting the disruption of acceptor sites. Confirmation of these results with the validation dataset indicated that the highest sensitivity, accuracy, and NPV (99.44%, 99.44%, and 96.88%, respectively) were attained with HSF+SSF-like or HSF+SSF-like+MES for donor sites and with SSF-like (92.63%, 92.65%, and 84.44%, respectively) for acceptor sites.
We provide recommendations for combining algorithms to conduct in silico splicing analysis that achieved high performance. The high NPV obtained makes it possible to distinguish the variants for which in vitro RNA analysis is mandatory from those with a negligible probability of being spliceogenic. Our study also shows that the performance of each predictor depends on whether the natural splice site is a donor or an acceptor.
Keywords: NGS of gene-panel; RNA alteration; VUS classification; hereditary cancer genes; in silico tools; splicing
Year: 2018 PMID: 30233647 PMCID: PMC6134256 DOI: 10.3389/fgene.2018.00366
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Publications evaluating in silico splicing site tools.
| Reference | Number of variants | Source of the variants and | Gene(s) | Region analyzed | Experimental design | Prediction tools evaluated | Accuracy of recommended tools | Consensus guideline |
|---|---|---|---|---|---|---|---|---|
| | 39 | ∗Experimental evidence | | ±60 nucleotides from an AG/GT site | One evaluation stage | NNSplice, PWM, MES, ASSA, ESEfinder, RESCUE-ESE | NA | Not specifically provided |
| | 18 | Experimental evidence | | Intronic: 5′ until +5. 3′ until -16 | One evaluation stage | MES, NNSplice, NetGene2 | NA | Not specifically provided |
| | 29 | Experimental evidence | | Intronic: 5′ until +60. 3′ until -20 | One evaluation stage | NNSplice, NetGene2, PWM, ASSA, MES, HSF | NA | Not specifically provided |
| | 623 | UMD locus-specific databases, HGMD, and datasets from previous studies | Multiple | Not specifically stated | One evaluation stage | GENSCAN, GeneSplicer, HSF, MES, NNSplice, SplicePort, SplicePredictor, SpliceView, SROOGLE | Invariable position: HSF, MES, SpliceView and SROOGLE 100%. Intronic SS +3, +5 and last exonic position: MES 100%. Other SS intronic position: MES and SplicePort 5′ 76/68% and 3′ 77.27/77.27% | Invariable position: HSF, MES, SpliceView and SROOGLE. Intronic SS +3, +5 and last exonic position: MES. Other SS intronic positions: MES and SplicePort |
| | 53 | Experimental evidence | | Not specifically stated | One evaluation stage | PWM, GeneSplicer, NNSplice, MES, HSF | NA | Not specifically provided |
| | 272 | Experimental evidence | | Not specifically stated | One evaluation stage | NNSplice, SSF, MES, ESEfinder, RESCUE-ESE, HSF | Accuracy as AUC: MES: 0.956, SSF-like: 0.914 | Sequential MES and SSF |
| | 24 | Experimental evidence | | Not specifically stated | One evaluation stage | PWM, MES, NNSplice, GeneSplicer, HSF, NetGene2, SpliceView, SplicePredictor, ASSA | NA | HSF and ASSA |
| | 2,959 | HGMD, SpliceDisease database and DBASS. Negative variants from 1000 Genomes Phase 1 | Multiple | 5′: from -3 to +8. 3′: from -12 to +2 | Evaluation of individual tools + new model construction + validation stage | SSF-like, MES, NNSplice, GeneSplicer, HSF, NetGene2, GENSCAN, SplicePredictor, ∗∗dbscSNV | SSF-like: 91.1% MES: 89.5%/dbscSNV: 93.3% | SSF-like, MES/dbscSNV |
| | 272 | HGMD (damaging variants) and negative variants from 1000 Genomes Phase 1 | Multiple | Intronic: 5′ from +3 to +7. 3′ from -3 to -9 | One evaluation stage | HSF, MES, NNSplice, ASSP | Accuracy as AUC: MES: 0.878 ASSP: 0.881 HSF: 0.834 | MES, ASSP, and HSF combination |
| | 395 | Experimental evidence | Multiple | 5′: from -3 to +8. 3′: from -12 to +2 | Training + evaluation stage | HSF, MES, SSF-like, NNSplice, GS, SPiCE (MES and SSF combination) | SPiCE 95.6% | SPiCE (ThSe threshold with MES and SSF combination) |
Performance of the individual in silico tools in the discovery dataset.
| Sensitivity | Specificity | Accuracy | MCC | Positive Predictive Value | Negative Predictive Value | False Negative Rate | False Positive Rate | False Discovery Rate | False Omission Rate | |
|---|---|---|---|---|---|---|---|---|---|---|
| 96.154 | 96.296 | 3.846 | 3.704 | |||||||
| 96.154 | 96.154 | 96.154 | 0.923 | 96.154 | 96.154 | 3.846 | 3.846 | 3.846 | 3.846 | |
| 84.615 | 92.308 | 0.856 | 86.667 | 15.385 | 13.333 | |||||
| 91.667 | 90.000 | 91.176 | 0.795 | 95.652 | 81.818 | 8.333 | 10.000 | 4.348 | 18.182 | |
| 92.308 | 80.769 | 86.538 | 0.735 | 82.759 | 91.304 | 7.692 | 19.231 | 17.241 | 8.696 | |
| 62.500 | 81.633 | 0.677 | 73.529 | 37.500 | 26.471 | |||||
| 85.714 | 14.286 | |||||||||
| 86.207 | 91.489 | 0.839 | 81.818 | 13.793 | 18.182 | |||||
| 93.750 | 77.778 | 88.000 | 0.736 | 87.500 | 6.250 | 22.222 | 12.500 | |||
| 83.333 | 82.759 | 82.979 | 0.649 | 75.000 | 88.889 | 16.667 | 17.241 | 25.000 | 11.111 | |
| 88.889 | 68.966 | 76.596 | 0.563 | 64.000 | 90.909 | 11.111 | 31.034 | 36.000 | 9.091 | |
| 41.176 | 88.460 | 69.760 | 0.343 | 70.000 | 69.697 | 58.824 | 11.538 | 30.000 | 30.303 | |
| 97.727 | 92.727 | 91.489 | 98.077 | 2.273 | 7.273 | 8.511 | 1.923 | |||
| 85.455 | 91.919 | 0.850 | 84.615 | 14.545 | 15.385 | |||||
| 93.182 | 89.091 | 90.909 | 0.818 | 87.234 | 94.231 | 6.818 | 10.909 | 12.766 | 5.769 | |
| 92.500 | 84.211 | 89.831 | 0.767 | 84.211 | 7.500 | 15.789 | 15.789 | |||
| 90.909 | 74.545 | 81.818 | 0.653 | 74.074 | 91.111 | 9.091 | 25.455 | 25.926 | 8.889 | |
| 53.659 | 76.087 | 0.533 | 88.000 | 71.642 | 46.341 | 12.000 | 28.358 |
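The columns of the table above all follow from a 2×2 confusion matrix of predicted versus experimentally observed splicing outcomes. The sketch below shows one way to compute them. The function name `confusion_metrics` and the counts in the usage example are illustrative assumptions, not taken from the study. The counts (25 true positives, 1 false positive, 25 true negatives, 1 false negative) were chosen only because they reproduce the row in which every percentage is 96.154 and the MCC is 0.923.

```python
import math


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the table's metrics from confusion-matrix counts.

    Percent metrics are returned on a 0-100 scale; MCC is a coefficient
    in [-1, 1]. Undefined ratios (zero denominator) are returned as NaN.
    """
    def pct(num: int, den: int) -> float:
        return 100.0 * num / den if den else float("nan")

    sensitivity = pct(tp, tp + fn)   # spliceogenic variants correctly flagged
    specificity = pct(tn, tn + fp)   # neutral variants correctly cleared
    ppv = pct(tp, tp + fp)           # positive predictive value
    npv = pct(tn, tn + fn)           # negative predictive value
    accuracy = pct(tp + tn, tp + fp + tn + fn)

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else float("nan")

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
        "ppv": ppv,
        "npv": npv,
        "fnr": 100.0 - sensitivity,  # false negative rate
        "fpr": 100.0 - specificity,  # false positive rate
        "fdr": 100.0 - ppv,          # false discovery rate
        "for": 100.0 - npv,          # false omission rate
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen to match one row of the table.
    for name, value in confusion_metrics(tp=25, fp=1, tn=25, fn=1).items():
        print(f"{name}: {value:.3f}")
```

The four error rates are simple complements of sensitivity, specificity, PPV, and NPV, which is why each row's last four columns add up to 100 with the corresponding earlier columns.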
Performance with the validation dataset of the best in silico tools previously selected from the results at discovery stage.
| | Sensitivity | Specificity | Accuracy | MCC | Positive Predictive Value | Negative Predictive Value | False Negative Rate | False Positive Rate | False Discovery Rate | False Omission Rate |
|---|---|---|---|---|---|---|---|---|---|---|
| All variants | 96.045 | 90.909 | 95.238 | 0.831 | 98.266 | 81.081 | 3.955 | 9.091 | 1.734 | 18.919 |
| Without invariable dinucleotides | 94.643 | 90.909 | 93.793 | 0.830 | 97.248 | 83.333 | 5.357 | 9.091 | 2.752 | 16.667 |
| All variants | ||||||||||
| Without invariable dinucleotides | ||||||||||
| All variants | ||||||||||
| Without invariable dinucleotides | ||||||||||
| All variants | 91.579 | 82.979 | 8.421 | 17.021 | ||||||
| Without invariable dinucleotides | 71.429 | 82.609 | 28.571 | 17.391 | ||||||
| All variants | 92.683 | 0.832 | 96.703 | 7.317 | 3.297 | |||||
| Without invariable dinucleotides | 92.500 | 0.695 | 87.500 | 7.500 | 12.500 |
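Because the abstract relies on NPV to decide which variants can skip in vitro RNA analysis, it helps to recall that NPV depends on the proportion of spliceogenic variants in the dataset as well as on sensitivity and specificity. The sketch below is a minimal illustration of that relationship. The function name `npv_from_rates` is hypothetical. The prevalence of 177/210 is an assumption inferred from the percentages in the first "All variants" row (sensitivity 96.045, specificity 90.909, NPV 81.081); it is not stated in the source.

```python
def npv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Return NPV as a fraction, given rates expressed as fractions in [0, 1].

    prevalence is the fraction of variants that are truly spliceogenic.
    """
    true_negatives = specificity * (1.0 - prevalence)
    false_negatives = (1.0 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)


if __name__ == "__main__":
    sens, spec = 0.96045, 0.90909  # first "All variants" row of the table
    # 177/210 is an inferred, assumed prevalence; the others are hypothetical.
    for prevalence in (0.20, 0.50, 177 / 210):
        npv = npv_from_rates(sens, spec, prevalence)
        print(f"prevalence={prevalence:.3f}  NPV={100 * npv:.2f}%")
```

With the same sensitivity and specificity, a dataset containing fewer spliceogenic variants yields a higher NPV, so the NPV values in these tables are best read in the context of each dataset's composition.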