Morten Bo Johansen, Jose M G Izarzugaza, Søren Brunak, Thomas Nordahl Petersen, Ramneek Gupta.
Abstract
We have developed NetDiseaseSNP, an artificial neural network predictor based on sequence conservation, which classifies non-synonymous SNPs (nsSNPs) as disease-causing or neutral. Our method uses the alignment generation algorithm of SIFT to identify related sequences. It combines 31 features, which assess sequence conservation and predicted surface accessibility, into a single score that can be used to rank nsSNPs by their potential to cause disease. NetDiseaseSNP successfully classifies disease-causing and neutral mutations. In addition, we show that NetDiseaseSNP satisfactorily discriminates between cancer driver and passenger mutations. Our method outperforms other state-of-the-art methods on several disease/neutral datasets as well as on cancer driver/passenger mutation datasets. It can thus be used to pinpoint and prioritize plausible disease candidates among nsSNPs for further investigation. NetDiseaseSNP is publicly available as an online tool as well as a web service: http://www.cbs.dtu.dk/services/NetDiseaseSNP.
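The abstract describes a neural network that maps 31 features to a single score, which is then used for ranking. Below is a minimal sketch of that idea. It assumes a single hidden layer of sigmoid units. The architecture, the random untrained weights, and the names `make_network`, `score` and `rank_variants` are hypothetical and are not the published NetDiseaseSNP model.

```python
import math
import random

N_FEATURES = 31  # number of input features stated in the abstract


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def make_network(n_hidden: int, seed: int = 0):
    """Hypothetical, untrained weights for a 31 -> n_hidden -> 1 feed-forward net."""
    rng = random.Random(seed)
    w_hidden = [[rng.gauss(0.0, 0.1) for _ in range(N_FEATURES)] for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
    b_out = 0.0
    return w_hidden, b_hidden, w_out, b_out


def score(net, features):
    """Forward pass: one sigmoid hidden layer and one sigmoid output in (0, 1)."""
    if len(features) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} features, got {len(features)}")
    w_hidden, b_hidden, w_out, b_out = net
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


def rank_variants(net, variants):
    """Return (variant_id, score) pairs, most likely disease-causing first."""
    scored = [(vid, score(net, feats)) for vid, feats in variants.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, `rank_variants(make_network(5), {"var1": [...], "var2": [...]})` returns the hypothetical variants ordered by score. A trained model would replace the random weights with fitted ones.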
Year: 2013 PMID: 23935863 PMCID: PMC3723835 DOI: 10.1371/journal.pone.0068370
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Softening of target values procedure.
Density plots showing how the distribution of output values changes during the 'softening of target values' procedure. The plots cover known disease and neutral SNPs in the running evaluation set. Step 0 is the distribution of output values before the procedure, and step 4 is the distribution at the end of the procedure for the final version of NetDiseaseSNP. For the disease SNPs (red), scores above the 0.5 threshold are true positives. For the neutral SNPs (blue), scores above 0.5 are false positives. After the procedure, the predicted scores are pushed further toward the extremes of the 0 to 1 range.
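The threshold rule in the caption can be sketched as follows, assuming the scores are plain floats in [0, 1]. The `fraction_extreme` helper and its 0.1 margin are hypothetical. It is one simple way to quantify how far scores have been pushed toward 0 and 1, and it is not a measure used in the source.

```python
from dataclasses import dataclass

THRESHOLD = 0.5  # decision threshold from the Figure 1 caption


@dataclass
class Confusion:
    tp: int = 0
    fn: int = 0
    fp: int = 0
    tn: int = 0


def confusion_at_threshold(disease_scores, neutral_scores, threshold=THRESHOLD):
    """Disease scores above the threshold are TP; neutral scores above it are FP.

    Scores exactly equal to the threshold are treated as negative predictions.
    """
    c = Confusion()
    for s in disease_scores:
        if s > threshold:
            c.tp += 1
        else:
            c.fn += 1
    for s in neutral_scores:
        if s > threshold:
            c.fp += 1
        else:
            c.tn += 1
    return c


def fraction_extreme(scores, margin=0.1):
    """Share of scores within `margin` of 0 or 1 (hypothetical spread measure)."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s <= margin or s >= 1.0 - margin) / len(scores)
```

Comparing `fraction_extreme` on scores from step 0 and step 4 would be one way to express, as a single number, the shift the density plots show.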
Composition of the training sets.
| Source | Neutral SNPs | Neutral proteins | Disease SNPs | Disease proteins |
|---|---|---|---|---|
| UniProt | 20202 | 7513 | 6904 | 1847 |
| HGMD | - | - | 31766 | 1593 |
| Human-rodent | 18468 | 2260 | - | - |
| All | 38670 | 7979 | 38670 | 3440 |
Contribution of each source database to the training datasets. The table shows the number of SNPs, as well as the number of affected proteins, in each prediction category (neutral and disease-associated).
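The 'All' row can be checked against the per-source rows with a few lines of arithmetic. The sketch assumes that 'All' counts distinct proteins, which the source does not state. Under that assumption the SNP counts add up exactly. The neutral protein count, however, implies that some proteins are shared between UniProt and the human-rodent set.

```python
# Counts copied from the composition table: source -> (SNPs, proteins).
NEUTRAL = {"UniProt": (20202, 7513), "Human-rodent": (18468, 2260)}
DISEASE = {"UniProt": (6904, 1847), "HGMD": (31766, 1593)}
ALL_ROW = {"neutral": (38670, 7979), "disease": (38670, 3440)}


def snp_total(sources):
    """Sum SNP counts over the contributing source databases."""
    return sum(snps for snps, _ in sources.values())


def protein_overlap(sources, distinct_proteins):
    """Summed per-source protein counts minus the distinct total.

    With exactly two sources this equals the number of proteins present in
    both, assuming the 'All' row counts distinct proteins.
    """
    return sum(proteins for _, proteins in sources.values()) - distinct_proteins


for label, sources in (("neutral", NEUTRAL), ("disease", DISEASE)):
    snps, proteins = ALL_ROW[label]
    print(label, snp_total(sources) == snps, protein_overlap(sources, proteins))
```

Run as-is, it prints `neutral True 1794` and `disease True 0`. Both SNP totals match the 'All' row. Under the stated assumption, 1794 neutral proteins occur in both neutral sources, and the two disease sources contribute disjoint protein sets.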
Benchmark of NetDiseaseSNP.
| Method | Variants (N) | Accuracy | Precision | Sensitivity | Specificity | F1 | MCC |
|---|---|---|---|---|---|---|---|
| NetDiseaseSNP | 77340 | 0.82 | 0.83 | 0.80 | 0.83 | 0.82 | 0.64 |
| SIFTnd | 75647 | 0.82 | 0.83 | 0.79 | 0.84 | 0.81 | 0.63 |
| SIFTsd | 24584 | 0.67 | 0.41 | 0.54 | 0.72 | 0.47 | 0.24 |
| SNAP | 25141 | 0.51 | 0.33 | 0.84 | 0.40 | 0.48 | 0.22 |
| PolyPhen-2 | 11012 | 0.61 | 0.09 | 0.81 | 0.60 | 0.17 | 0.18 |
| MutationAssessor | 40693 | 0.64 | 0.30 | 0.86 | 0.60 | 0.44 | 0.34 |
Performance of NetDiseaseSNP and other state-of-the-art predictors. The evaluation was performed on all variants in the evaluation set. This includes predictions based on BLOSUM62 matrices rather than SIFT PSSMs.
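The sketch below assumes the six score columns of the benchmark tables are accuracy, precision, sensitivity, specificity, F1 and MCC. That reading is consistent with the reported numbers but is not confirmed by the source. Under it, every column can be computed from the four confusion-matrix counts. The function names are illustrative.

```python
import math


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero.
    return num / den if den else 0.0


def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, precision, sensitivity, specificity, F1 and MCC from counts."""
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)  # also called recall
    specificity = _ratio(tn, tn + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + fn + fp + tn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        # F1 is the harmonic mean of precision and sensitivity.
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
        # Matthews correlation coefficient.
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),
    }
```

As a sanity check on the column reading, consider the MutationAssessor row. Its harmonic mean of precision and sensitivity, 2 × 0.30 × 0.86 / (0.30 + 0.86), is about 0.44, which matches the reported F1.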
Benchmark of NetDiseaseSNP: SIFT PSSMs.
| Method | Variants (N) | Accuracy | Precision | Sensitivity | Specificity | F1 | MCC |
|---|---|---|---|---|---|---|---|
| NetDiseaseSNP | 67119 | 0.83 | 0.84 | 0.82 | 0.85 | 0.83 | 0.67 |
| SIFTnd | 67119 | 0.82 | 0.83 | 0.79 | 0.84 | 0.81 | 0.63 |
| SIFTsd | 22020 | 0.68 | 0.41 | 0.54 | 0.73 | 0.46 | 0.25 |
| SNAP | 22417 | 0.52 | 0.32 | 0.83 | 0.41 | 0.46 | 0.22 |
| PolyPhen-2 | 10042 | 0.61 | 0.07 | 0.80 | 0.60 | 0.14 | 0.16 |
| MutationAssessor | 35657 | 0.64 | 0.29 | 0.86 | 0.60 | 0.43 | 0.33 |
Performance of NetDiseaseSNP and other state-of-the-art predictors. The evaluation was performed only on the variants for which a SIFT PSSM was available. This excludes predictions based on BLOSUM62 matrices.
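One plausible reading of the N column is that each predictor is evaluated only on the variants it returns a score for. This benchmark then additionally keeps only variants with a SIFT PSSM. Below is a minimal sketch of that filtering. The record fields (`has_pssm`, `scores`) and the example scores are hypothetical.

```python
from collections import Counter

# Hypothetical evaluation records: whether a SIFT PSSM was available, and each
# predictor's score (None when that predictor returned no prediction).
RECORDS = [
    {"has_pssm": True, "scores": {"NetDiseaseSNP": 0.91, "SNAP": 0.40}},
    {"has_pssm": False, "scores": {"NetDiseaseSNP": 0.30, "SNAP": None}},
    {"has_pssm": True, "scores": {"NetDiseaseSNP": 0.12, "SNAP": None}},
]


def coverage(records, pssm_only=False):
    """Count, per predictor, the variants it actually scored."""
    counts = Counter()
    for rec in records:
        if pssm_only and not rec["has_pssm"]:
            continue  # PSSM-only benchmark: drop variants without a SIFT PSSM
        for method, value in rec["scores"].items():
            if value is not None:
                counts[method] += 1
    return dict(counts)
```

With these records, `coverage(RECORDS)` counts 3 variants for NetDiseaseSNP and 1 for SNAP. With `pssm_only=True` the NetDiseaseSNP count drops to 2, mirroring how N shrinks between the two benchmark tables.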
Benchmark of NetDiseaseSNP: Cancer drivers and passengers.
| Method | Variants (N) | Accuracy | Precision | Sensitivity | Specificity | F1 | MCC |
|---|---|---|---|---|---|---|---|
| NetDiseaseSNP | 4401 | 0.85 | 0.62 | 0.81 | 0.86 | 0.70 | 0.61 |
| SIFTnd | 4036 | 0.84 | 0.63 | 0.79 | 0.85 | 0.70 | 0.60 |
| SIFTsd | 2778 | 0.78 | 0.37 | 0.64 | 0.81 | 0.47 | 0.36 |
| SNAP | 2835 | 0.57 | 0.24 | 0.85 | 0.51 | 0.37 | 0.26 |
| PolyPhen-2 | 1686 | 0.78 | 0.06 | 0.85 | 0.78 | 0.11 | 0.19 |
| MutationAssessor | 1587 | 0.66 | 0.19 | 0.86 | 0.64 | 0.31 | 0.29 |
Performance of NetDiseaseSNP and other state-of-the-art predictors on the cancer-specific dataset from CanPredict [15].
Figure 2. Predictions by NetDiseaseSNP on COSMIC.
Number of predicted passenger (neutral) and driver (disease) mutations for the different tissue types in the COSMIC cancer dataset. We recommend interpreting predicted disease mutations as 'drivers'. Read this way, the predictions further suggest that breast cancer has an almost equal number of driver and passenger mutations, whereas other cancer types are more enriched for 'drivers', at least in the COSMIC dataset.
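The per-tissue counts in Figure 2 amount to thresholding each mutation's score and tallying the results by tissue type. The sketch below assumes the same 0.5 threshold as in Figure 1 and a hypothetical list of (tissue, score) pairs. The example data are made up and are not COSMIC values.

```python
from collections import defaultdict

THRESHOLD = 0.5  # assumed to match the threshold used in Figure 1


def tally_by_tissue(predictions, threshold=THRESHOLD):
    """Count predicted drivers (score above threshold) and passengers per tissue."""
    counts = defaultdict(lambda: {"driver": 0, "passenger": 0})
    for tissue, score in predictions:
        label = "driver" if score > threshold else "passenger"
        counts[tissue][label] += 1
    return dict(counts)


# Hypothetical (tissue, score) pairs, not COSMIC data.
EXAMPLE = [("breast", 0.8), ("breast", 0.3), ("lung", 0.9), ("lung", 0.7), ("lung", 0.2)]
```

For `EXAMPLE`, breast gets one driver and one passenger, while lung gets two drivers and one passenger. That is the kind of per-tissue contrast the figure summarizes.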