Christopher M Yates, Ioannis Filippis, Lawrence A Kelley, Michael J E Sternberg.
Abstract
Whole-genome and exome sequencing studies reveal many genetic variants between individuals, some of which are linked to disease. Many of these variants lead to single amino acid variants (SAVs), and accurate prediction of their phenotypic impact is important. Incorporating sequence conservation and network-level features, we have developed a method, SuSPect (Disease-Susceptibility-based SAV Phenotype Prediction), for predicting how likely SAVs are to be associated with disease. SuSPect performs significantly better than other available batch methods on the VariBench benchmarking dataset, with a balanced accuracy of 82%. SuSPect is available at www.sbg.bio.ic.ac.uk/suspect. The Web site has been implemented in Perl and SQLite and is compatible with modern browsers. An SQLite database of possible missense variants in the human proteome is available to download at www.sbg.bio.ic.ac.uk/suspect/download.html.
Keywords: SAV; SuSPect; missense mutation; nsSNP; protein–protein interaction
Year: 2014 PMID: 24810707 PMCID: PMC4087249 DOI: 10.1016/j.jmb.2014.04.026
Source DB: PubMed Journal: J Mol Biol ISSN: 0022-2836 Impact factor: 5.469
Distribution of disease and polymorphism SAVs in PDB and Phyre2 structures
| Phenotype | Structure: PDB | Structure: Phyre2 | Structure: N/A | Total |
|---|---|---|---|---|
| Disease | 2914 | 13,560 | 4254 | 20,728 |
| Polymorphism | 1468 | 16,833 | 18,498 | 36,799 |
| Total | 4382 | 30,393 | 22,752 | 57,527 |
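The counts above can be checked for internal consistency and turned into a structural-coverage fraction per phenotype. The following is a minimal sketch, not part of the original analysis; the helper name `structural_coverage` is hypothetical, and "coverage" is assumed here to mean an SAV that maps to either a PDB or a Phyre2 structure.

```python
# Counts copied from the table above: (PDB, Phyre2, no structure).
counts = {
    "Disease": (2914, 13560, 4254),
    "Polymorphism": (1468, 16833, 18498),
}
reported_totals = {"Disease": 20728, "Polymorphism": 36799}


def structural_coverage(pdb, phyre2, no_structure):
    """Fraction of SAVs that map to an experimental (PDB) or modelled (Phyre2) structure."""
    total = pdb + phyre2 + no_structure
    return (pdb + phyre2) / total


for phenotype, row in counts.items():
    # The three structure columns should add up to the reported row total.
    assert sum(row) == reported_totals[phenotype]
    print(f"{phenotype}: {structural_coverage(*row):.1%} with a structure")
```

Under this definition, about 79% of disease SAVs and about 50% of polymorphism SAVs in the table fall in a PDB or Phyre2 structure.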
Features chosen in feature selection on the full training set
| Feature |
|---|
| (a) Degree centrality in a PPI network. |
| (b) Number of annotations at this position in UniProt FT feature table. |
| (c) Score for the wild-type amino acid in a PSSM. |
| (d) Score for the mutant amino acid in a PSSM. |
| (e) Difference between PSSM scores for the wild-type and mutant amino acids at the SAV position. |
| (f) Difference between Pfam HMM emission probabilities for the wild-type and mutant amino acids at the SAV position. |
| (g) Jensen-Shannon divergence, a measure of sequence conservation. |
| (h) Percentage sequence identity with the first sequence in the MSA to have the mutant amino acid at the SAV position. |
| (i) RSA predicted by NetSurfP. |
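To illustrate feature (g), the sketch below computes the Jensen-Shannon divergence between the amino acid distribution of one alignment column and a background distribution. It uses the textbook definition only. The uniform background, the absence of pseudocounts and sequence weighting, and all function names are assumptions made for this example; they are not taken from the SuSPect implementation.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(column):
    """Relative amino acid frequencies in one MSA column, ignoring gaps and unknowns."""
    residues = [aa for aa in column if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("column contains no standard amino acids")
    counts = Counter(residues)
    return {aa: counts[aa] / len(residues) for aa in AMINO_ACIDS}


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p = 0 contribute nothing."""
    return sum(p[aa] * math.log2(p[aa] / q[aa]) for aa in AMINO_ACIDS if p[aa] > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric divergence between p and q, bounded by 0 and 1 with log base 2."""
    mixture = {aa: 0.5 * (p[aa] + q[aa]) for aa in AMINO_ACIDS}
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)


# Assumption: uniform background. Real pipelines often use observed frequencies.
background = {aa: 1 / 20 for aa in AMINO_ACIDS}

conserved = jensen_shannon_divergence(column_distribution("LLLLLLLL"), background)
variable = jensen_shannon_divergence(column_distribution(AMINO_ACIDS), background)
print(f"conserved column: {conserved:.3f}, variable column: {variable:.3f}")
```

A fully conserved column diverges strongly from the background, while a column containing every amino acid once is indistinguishable from the uniform background and scores zero.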
Fig. 1. Performance of five versions of SuSPect compared to seven other methods.
Performance of five versions of SuSPect compared to 11 other SAV phenotype prediction methods, ordered by MCC
| Method | Precision | Recall | F-measure | Balanced accuracy | MCC | AUC |
|---|---|---|---|---|---|---|
| SuSPect-FS | 0.75 | 0.75 | 0.75 | |||
| SuSPect-No Structure | 0.73 | 0.67 | 0.70 | 0.79 | 0.59 | 0.89 |
| SuSPect-All | 0.72 | 0.67 | 0.69 | 0.78 | 0.58 | 0.88 |
| SNPs&GO | 0.70 | 0.56 | — | |||
| MutPred | 0.79 | 0.80 | 0.75 | 0.49 | 0.84 | |
| SuSPect-No Networks | 0.78 | 0.64 | 0.70 | 0.71 | 0.44 | 0.78 |
| PHD-SNP | 0.69 | 0.72 | 0.70 | 0.69 | 0.39 | — |
| MutationAssessor | 0.36 | 0.50 | 0.70 | 0.34 | 0.79 | |
| SuSPect-FS-No Networks | 0.63 | 0.45 | 0.53 | 0.67 | 0.38 | 0.74 |
| SNAP | 0.82 | 0.75 | 0.78 | 0.68 | 0.34 | — |
| FATHMM | 0.41 | 0.71 | 0.52 | 0.63 | 0.24 | 0.63 |
| SIFT | 0.14 | 0.58 | 0.23 | 0.62 | 0.22 | 0.65 |
| Condel | 0.43 | 0.52 | 0.47 | 0.61 | 0.21 | 0.63 |
| SNPanalyzer | 0.94 | 0.61 | 0.74 | 0.65 | 0.20 | — |
| PANTHER | 0.43 | 0.75 | 0.55 | 0.59 | 0.17 | 0.63 |
| PolyPhen-2 | 0.37 | 0.60 | 0.46 | 0.58 | 0.14 | 0.62 |
For four methods, predictions were binary; thus, AUC could not be calculated.
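The threshold-based metrics reported in the table can all be derived from a binary confusion matrix. The sketch below uses the standard definitions, assuming disease-associated SAVs are the positive class; the function name and the example counts are hypothetical and are not data from the paper.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts (disease = positive).

    Assumes at least one predicted positive, one actual positive and one
    actual negative, so that none of the ratios divides by zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity
    specificity = tn / (tn + fp)
    balanced_accuracy = (recall + specificity) / 2
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=75, fp=25, tn=175, fn=25)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Balanced accuracy and MCC are useful here because the benchmark classes are unequal in size; plain accuracy would reward a predictor that favours the majority class.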
Fig. 2. Distribution of SuSPect-FS scores for disease-associated (red) and neutral (blue) SAVs in the VariBench test set. The two sets of SAVs have significantly different distributions (Wilcoxon test, p < 2.2 × 10⁻¹⁶).
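A comparison of two score distributions like the one in Fig. 2 can be sketched with a two-sided Wilcoxon rank-sum (Mann-Whitney U) test. The version below is a minimal pure-Python sketch using the normal approximation with tie-corrected variance and no continuity correction; it is not necessarily the procedure or software the authors used, and the scores in the example are invented.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test; returns (U statistic for x, z, p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Values at positions i..j-1 are tied and share the average of ranks i+1..j.
        average_rank = (i + 1 + j) / 2
        tied = j - i
        tie_term += tied**3 - tied
        rank_sum_x += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence of a difference
        return u, 0.0, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u, z, p_value


# Invented scores for illustration only.
disease_scores = [0.90, 0.80, 0.85, 0.70, 0.95, 0.60, 0.75, 0.88]
neutral_scores = [0.10, 0.20, 0.15, 0.30, 0.05, 0.40, 0.25, 0.12]
u, z, p_value = rank_sum_test(disease_scores, neutral_scores)
print(f"U = {u}, z = {z:.2f}, p = {p_value:.2g}")
```

The normal approximation is adequate for samples as large as a benchmark test set; for very small samples an exact test would be preferable.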