Emidio Capriotti, Ludovica Montanucci, Giuseppe Profiti, Ivan Rossi, Diana Giannuzzi, Luca Aresu, Piero Fariselli.
Abstract
As the amount of genomic variation data increases, tools that can score the functional impact of single-nucleotide variants become more and more necessary. While several prediction servers are available for interpreting the effects of variants in the human genome, only a few have been developed for other species, and none were specifically designed for species of veterinary interest such as the dog. Here, we present Fido-SNP, the first predictor able to discriminate between Pathogenic and Benign single-nucleotide variants in the dog genome. Fido-SNP is a binary classifier based on the Gradient Boosting algorithm. Using sequence features, it classifies and scores the impact of variants in both coding and non-coding regions within seconds. When validated on a previously unseen set of annotated variants from the OMIA database, Fido-SNP reaches 88% overall accuracy, a Matthews correlation coefficient of 0.77 and an Area Under the ROC Curve of 0.91.
Year: 2019 PMID: 31114899 PMCID: PMC6602425 DOI: 10.1093/nar/gkz420
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. (A) Distribution of the variants in the hd-pathogenic data set and an equal number of potentially benign SNVs from dbSNP (build 146) along the dog chromosomes. (B) Schematic view of the Fido-SNP algorithm and its input features. (C) Distribution of the PhyloP11 score for the potentially pathogenic mutated loci in the hd-pathogenic data set and a random set of variants from dbSNP. (D) Receiver Operating Characteristic (ROC) curve (black) obtained on the validation data set (dog-omia) and Matthews correlation coefficient (MCC) at different classification thresholds (red).
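Panel D combines two views of the same scores: the ROC curve and the MCC as a function of the classification threshold. The following is a minimal sketch of how such quantities can be computed from per-variant scores, not the authors' implementation. The function names (`mcc_at`, `best_threshold`, `roc_auc`) and the score values are made up for illustration, and it assumes that a variant is called pathogenic when its score is at or above the threshold.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den if den else 0.0

def mcc_at(scores, labels, threshold):
    """MCC when variants scoring >= threshold are called pathogenic (label 1)."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)
    return mcc(tp, tn, fp, fn)

def best_threshold(scores, labels):
    """Try every observed score as a cut-off and keep the one with the highest MCC."""
    return max(sorted(set(scores)), key=lambda t: mcc_at(scores, labels, t))

def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up scores for five benign (0) and five pathogenic (1) variants.
scores = [0.02, 0.05, 0.08, 0.12, 0.30, 0.04, 0.15, 0.40, 0.70, 0.90]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

print("best threshold:", best_threshold(scores, labels))
print("AUC:", roc_auc(scores, labels))
```

The pairwise AUC is quadratic in the number of variants, which is acceptable for a toy example; a rank-based formulation would be preferable for large data sets.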
Average performance of Fido-SNP on the hd-pathogenic and dog-omia data sets. All data sets contain pathogenic variants and an equal number of potentially benign variants randomly selected from dbSNP.
| Data set | Threshold | Q2 | TNR | NPV | TPR | PPV | MCC | AUC |
|---|---|---|---|---|---|---|---|---|
| hd-pathogenic (a) | 0.09±0.02 | 0.87±0.04 | 0.91±0.04 | 0.86±0.01 | 0.85±0.01 | 0.91±0.04 | 0.77±0.04 | 0.91±0.01 |
| dog-omia (b) | 0.10±0.01 | 0.88±0.02 | 0.92±0.03 | 0.85±0.01 | 0.84±0.01 | 0.92±0.03 | 0.77±0.04 | 0.91±0.01 |
| dog-omia* | 0.11±0.03 | 0.87±0.04 | 0.92±0.05 | 0.84±0.05 | 0.82±0.07 | 0.92±0.05 | 0.75±0.08 | 0.91±0.04 |
(a) Optimized performance of Fido-SNP obtained by maximizing the MCC on the hd-pathogenic set.
(b) Performance on the validation set (dog-omia) considering a classification threshold of 0.1.
*Performance of Fido-SNP on the dog-omia data set using a 3-fold cross-validation procedure.
The performance measures are defined in Supplementary Materials. The values are computed using the canfam3 assembly.
Comparison between Fido-SNP and SIFT predictions on the dog-omia and Lym168 data sets
| Data set | Method | Pathogenic | Predicted SNVs |
|---|---|---|---|
| Lym168 | Fido-SNP | 119 (78.8%) | 168/168 (100.0%) |
| Lym168 | SIFT | 70 (51.1%) | 137/168 (70.8%) |
| dog-omia | Fido-SNP | 64 (85.3%) | 75/75 (100.0%) |
| dog-omia | SIFT | 43 (84.3%) | 51/75 (68.0%) |
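The percentages in the 75-variant rows are consistent with reading the "Pathogenic" percentage as a fraction of the variants each method could score, rather than of the whole set. A minimal sketch of that arithmetic follows; the helper name `summarize` is hypothetical and the counts are copied from those rows.

```python
def summarize(n_total, n_predicted, n_pathogenic):
    """Return (coverage %, % of predicted variants called pathogenic), rounded to 0.1."""
    coverage = 100.0 * n_predicted / n_total
    pathogenic_rate = 100.0 * n_pathogenic / n_predicted
    return round(coverage, 1), round(pathogenic_rate, 1)

# Counts from the 75-variant rows of the table.
fido = summarize(n_total=75, n_predicted=75, n_pathogenic=64)
sift = summarize(n_total=75, n_predicted=51, n_pathogenic=43)

print("Fido-SNP (coverage %, pathogenic %):", fido)
print("SIFT     (coverage %, pathogenic %):", sift)
```

Read this way, the two methods call a similar share of their scored variants pathogenic, and the main difference in these rows is coverage.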