Ngak-Leng Sim, Prateek Kumar, Jing Hu, Steven Henikoff, Georg Schneider, Pauline C Ng.
Abstract
The Sorting Intolerant from Tolerant (SIFT) algorithm predicts the effect of coding variants on protein function. It was first introduced in 2001, with a corresponding website that provides users with predictions on their variants. Since its release, SIFT has become one of the standard tools for characterizing missense variation. Since our last publication in 2009, we have updated SIFT's genome-wide prediction tool and added new features to the insertion/deletion (indel) tool. We also report accuracy metrics on independent data sets. The original developers previously hosted the SIFT web server at FHCRC and JCVI; the web server is currently located at BII. The URL is http://sift-dna.org (24 May 2012, date last accessed).
Year: 2012 PMID: 22689647 PMCID: PMC3394338 DOI: 10.1093/nar/gks539
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Number of HumDiv and HumVar data points used to assess SIFT’s performance
| Data set | Data points in original dataset | Data points used in evaluating SIFT* | Data points with SIFT predictions | Coverage (%)** |
|---|---|---|---|---|
| HumDiv neutral | 6027 | 5816 | 5582 | 96.0 |
| HumDiv deleterious | 3055 | 2893 | 2791 | 96.5 |
| HumVar neutral | 8638 | 7475 | 7178 | 96.0 |
| HumVar deleterious | 12 598 | 11 982 | 11 561 | 96.5 |
*Lookups to the SIFT database required Ensembl, RefSeq and UCSC Known protein identifiers and the chromosome associated with the given identifier. Not all data points could be mapped to these types of protein identifiers using UniProtKB’s ID mapping tool. Furthermore, we were not able to map some proteins to their chromosomes.
**Coverage (%) = 100 × (number of data points with SIFT predictions / number of data points used in evaluating SIFT).
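The coverage column can be checked directly against the counts in the table. A minimal Python sketch, assuming coverage is the number of data points with SIFT predictions divided by the number used in evaluating SIFT, expressed as a percentage; the names `DATA_POINTS` and `coverage_percent` are hypothetical, introduced only for illustration.

```python
# Counts copied from the HumDiv/HumVar table:
# (data points used in evaluating SIFT, data points with SIFT predictions).
DATA_POINTS = {
    "HumDiv neutral": (5816, 5582),
    "HumDiv deleterious": (2893, 2791),
    "HumVar neutral": (7475, 7178),
    "HumVar deleterious": (11982, 11561),
}


def coverage_percent(tested: int, predicted: int) -> float:
    """Coverage (%) = 100 * (number with SIFT predictions / number tested)."""
    if tested <= 0:
        raise ValueError("tested must be a positive count")
    return 100.0 * predicted / tested


for name, (tested, predicted) in DATA_POINTS.items():
    # Rounded to one decimal place, matching the table's Coverage column.
    print(f"{name}: {coverage_percent(tested, predicted):.1f}%")
```

Each recomputed value, rounded to one decimal place, agrees with the Coverage column (96.0, 96.5, 96.0, 96.5).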
Figure 1. Performance statistics of SIFT predictions on PolyPhen-2's (a) HumVar and (b) HumDiv data sets when using various protein databases. ROC curves on the (c) HumVar and (d) HumDiv data sets. Although UniRef100 shows slightly better performance than UniRef90, it has lower coverage.
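An ROC curve such as those in panels (c) and (d) plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) as the score threshold is varied. A minimal Python sketch of that computation, assuming lower SIFT scores indicate a more likely deleterious substitution and treating "deleterious" as the positive class; the functions `roc_points` and `auc` and the example scores are hypothetical and are not the data behind Figure 1.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    Assumption: a substitution is called deleterious when score <= threshold,
    i.e. lower scores mean 'more likely deleterious'.
    labels: 1 = deleterious, 0 = neutral.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both deleterious and neutral examples")
    points = [(0.0, 0.0)]  # threshold below every score: nothing called deleterious
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s <= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s <= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores and labels, for illustration only.
scores = [0.01, 0.02, 0.04, 0.30, 0.50, 0.90]
labels = [1, 1, 0, 1, 0, 0]
print(auc(roc_points(scores, labels)))
```

For these six made-up variants the area is 8/9, the fraction of (deleterious, neutral) pairs in which the deleterious variant receives the lower score.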
Comparison of SIFT's performance using our predictions based on UniRef90 with the performance reported by Hicks et al.
| Gene (number of substitutions) | SIFT sensitivity (%), as reported by Hicks et al. | SIFT sensitivity (%), generated using UniRef90 | SIFT specificity (%), as reported by Hicks et al. | SIFT specificity (%), generated using UniRef90 |
|---|---|---|---|---|
| MLH1 (60) | 72 | 92 | 52 | 57 |
| MSH2 (30) | 89 | 89 | 46 | 36 |
| TP53 (144) | 84 | 79 | 75 | 100 |
| BRCA1 (33) | 94 | 88 | 31 | 44 |
| Overall | 83 | 83 | 46 | 52 |
In the first column, numbers in parentheses refer to the number of amino acid substitutions. Hicks et al. did not report accuracy and precision statistics, so these are not compared.
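Sensitivity and specificity, along with the accuracy and precision statistics that Hicks et al. did not report, follow from the standard confusion-matrix definitions. A minimal Python sketch, assuming "deleterious" is the positive class; the `Confusion` class and the counts below are hypothetical and are not taken from Hicks et al. or from the table above.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts, assuming 'deleterious' is the positive class."""
    tp: int  # deleterious substitutions predicted deleterious
    fn: int  # deleterious substitutions predicted tolerated
    tn: int  # neutral substitutions predicted tolerated
    fp: int  # neutral substitutions predicted deleterious

    def sensitivity(self) -> float:
        # Fraction of truly deleterious substitutions that are caught.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of truly neutral substitutions correctly called tolerated.
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        # Fraction of all substitutions classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    def precision(self) -> float:
        # Fraction of 'deleterious' calls that are truly deleterious.
        return self.tp / (self.tp + self.fp)


# Hypothetical counts for illustration only.
example = Confusion(tp=45, fn=5, tn=30, fp=20)
print(f"sensitivity {100 * example.sensitivity():.0f}%")  # 45/50
print(f"specificity {100 * example.specificity():.0f}%")  # 30/50
```

Because sensitivity and specificity are each computed within one class, they do not depend on the ratio of deleterious to neutral substitutions in a gene's test set, whereas accuracy and precision do.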