Xiaoming Liu, Xueqiu Jian, Eric Boerwinkle.
Abstract
With the advance of sequencing technologies, whole exome sequencing has increasingly been used to identify mutations that cause human diseases, especially rare Mendelian diseases. Among the analysis steps, functional prediction (i.e., predicting whether a variant is deleterious) plays an important role in filtering or prioritizing nonsynonymous SNPs (NSs) for further analysis. Unfortunately, different prediction algorithms use different information, and each has its own strengths and weaknesses. It has been suggested that investigators should use predictions from multiple algorithms instead of relying on a single one. However, querying predictions for different algorithms from different databases/Web servers is both tedious and time consuming, especially when dealing with the large number of NSs identified by exome sequencing. To facilitate the process, we developed dbNSFP (database for nonsynonymous SNPs' functional predictions). It compiles prediction scores from four new and popular algorithms (SIFT, Polyphen2, LRT, and MutationTaster), along with a conservation score (PhyloP) and other related information, for every potential NS in the human genome (a total of 75,931,005). It is the first integrated database of functional predictions from multiple algorithms for the comprehensive collection of human NSs. dbNSFP is freely available for download at http://sites.google.com/site/jpopgen/dbNSFP.
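To make the idea of using several algorithms at once concrete, here is a minimal Python sketch of one simple way to combine categorical calls: count how many algorithms call an NS deleterious and keep the NSs that reach a threshold. The voting scheme, the `"D"`/`"N"` labels, the variant IDs, and the threshold `min_votes=2` are illustrative assumptions, not a procedure described by dbNSFP.

```python
from typing import Dict, List, Optional

# Hypothetical per-NS calls: "D" = deleterious, "N" = nondeleterious, None = missing.
Calls = Dict[str, Optional[str]]

def deleterious_votes(calls: Calls) -> int:
    """Number of algorithms that call this NS deleterious (missing calls count as no vote)."""
    return sum(1 for c in calls.values() if c == "D")

def prioritize(variants: Dict[str, Calls], min_votes: int = 2) -> List[str]:
    """Keep NSs called deleterious by at least min_votes algorithms,
    ordered by vote count, highest first (ties keep input order)."""
    scored = [(vid, deleterious_votes(c)) for vid, c in variants.items()]
    kept = [(vid, n) for vid, n in scored if n >= min_votes]
    kept.sort(key=lambda t: -t[1])
    return [vid for vid, _ in kept]

variants = {
    "var1": {"SIFT": "D", "Polyphen2": "D", "LRT": "N", "MutationTaster": "D"},
    "var2": {"SIFT": "N", "Polyphen2": None, "LRT": "N", "MutationTaster": "D"},
    "var3": {"SIFT": "D", "Polyphen2": "D", "LRT": "D", "MutationTaster": "D"},
}
print(prioritize(variants))  # ['var3', 'var1']
```

Majority-style voting is only one option; weighting algorithms or combining raw scores are alternatives a user might prefer.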
Year: 2011 PMID: 21520341 PMCID: PMC3145015 DOI: 10.1002/humu.21517
Source DB: PubMed Journal: Hum Mutat ISSN: 1059-7794 Impact factor: 4.878
Number of Entries in dbNSFP
| Chromosome | NS | hg19 | PhyloP | SIFT | Polyphen2 | LRT | MutationTaster |
|---|---|---|---|---|---|---|---|
| 1 | 8329939 | 8329939 | 8329796 | 7611044 | 7279405 | 7644552 | 7959066 |
| 2 | 5115071 | 5115063 | 5114264 | 4376414 | 4428042 | 4765076 | 4858487 |
| 3 | 4280786 | 4280782 | 4280786 | 3804983 | 3756644 | 4063236 | 4098445 |
| 4 | 2868872 | 2868872 | 2868872 | 2532905 | 2554593 | 2752943 | 2766053 |
| 5 | 3484907 | 3484907 | 3484907 | 3134219 | 3043592 | 3207583 | 3347045 |
| 6 | 4075171 | 4075036 | 4075171 | 3710396 | 3586226 | 3824055 | 3803359 |
| 7 | 3268295 | 3268295 | 3267997 | 2819288 | 2856409 | 2989039 | 3132425 |
| 8 | 2597009 | 2597009 | 2597009 | 2278629 | 2207857 | 2456441 | 2539459 |
| 9 | 3164462 | 3164462 | 3164462 | 2893822 | 2842191 | 2902636 | 2860881 |
| 10 | 3273027 | 3273027 | 3273027 | 2924577 | 2922389 | 3093339 | 2956250 |
| 11 | 4253046 | 4253046 | 4253046 | 3793927 | 3795848 | 4004874 | 4098876 |
| 12 | 3852960 | 3852960 | 3852960 | 3412527 | 3332085 | 3698888 | 3723502 |
| 13 | 1420096 | 1420096 | 1420096 | 1278533 | 1235036 | 1378650 | 1374240 |
| 14 | 2494133 | 2494133 | 2494133 | 2262099 | 2236525 | 2383704 | 2432185 |
| 15 | 2470042 | 2469767 | 2470042 | 2206987 | 2219513 | 2317477 | 2351630 |
| 16 | 2970643 | 2970643 | 2970623 | 2589515 | 2545952 | 2794456 | 2864645 |
| 17 | 4372234 | 4372095 | 4371639 | 3943103 | 3822645 | 4019068 | 4228681 |
| 18 | 1178377 | 1178377 | 1178377 | 1031129 | 1039874 | 1090517 | 1131942 |
| 19 | 4415600 | 4415600 | 4415600 | 3916565 | 3779650 | 3360865 | 3983007 |
| 20 | 2103098 | 2103098 | 2103098 | 1935033 | 1881731 | 1998568 | 1927606 |
| 21 | 989231 | 989231 | 989231 | 843131 | 842902 | 890113 | 925335 |
| 22 | 1553703 | 1553703 | 1553703 | 1388173 | 1338949 | 1444075 | 1461233 |
| X | 3176068 | 3176068 | 3175174 | 2941495 | 2719583 | 2834074 | 2998558 |
| Y | 224235 | 224235 | 223727 | 205265 | 195144 | 138312 | 146802 |
| Total | 75931005 | 75930444 | 75927740 | 67833759 | 66462785 | 70052541 | 71969712 |
| Total (unique) | 64646969 | 64646408 | 64643749 | 57471955 | 56620320 | 59419546 | 61277012 |
In the "Total (unique)" row, NSs with the same position and alternative allele were counted only once.
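The gap between the "Total" and "Total (unique)" rows comes from collapsing records that describe the same genomic change. The sketch below shows that counting rule on hypothetical records. It assumes duplicates arise when one change is listed more than once (for example, under overlapping genes), and it keys on chromosome as well as position, since a position is only unique within a chromosome.

```python
from typing import Iterable, Tuple

# Hypothetical NS record: (chromosome, position, ref allele, alt allele, gene).
Record = Tuple[str, int, str, str, str]

def count_ns(records: Iterable[Record]) -> Tuple[int, int]:
    """Return (total, unique); unique collapses records sharing
    chromosome, position, and alternative allele."""
    total = 0
    seen = set()
    for chrom, pos, _ref, alt, _gene in records:
        total += 1
        seen.add((chrom, pos, alt))
    return total, len(seen)

records = [
    ("1", 1000, "A", "G", "GENE_A"),
    ("1", 1000, "A", "G", "GENE_B"),  # same change listed under a second (overlapping) gene
    ("1", 1000, "A", "T", "GENE_A"),  # same position, different alt allele: distinct NS
    ("2", 500, "C", "T", "GENE_C"),
]
print(count_ns(records))  # (4, 3)
```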
Figure 1. Distributions of PhyloP, SIFT, Polyphen2, LRT, and MutationTaster scores.
Summary of Predictions
| Prediction | SIFT | Polyphen2 | LRT | MutationTaster |
|---|---|---|---|---|
| Unknown/missing | 8097246 | 9468220 | 10556339 | 3961293 |
| Nondeleterious | 31166227 | 35477555 | 25950325 | 29158468 |
| Deleterious | 36667532 | 30985230 | 39424341 | 42811244 |
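Because every NS receives exactly one of the three labels from each method, each column should sum to the 75,931,005 NSs in the "Total" row of the entries table. A short Python check of that arithmetic follows; it also reports the share of deleterious calls among NSs that received a call, which is a derived quantity of our own rather than one reported in the table.

```python
from typing import Dict

TOTAL_NS = 75931005  # "Total" row of the "Number of Entries in dbNSFP" table

# Counts copied from the "Summary of Predictions" table.
SUMMARY: Dict[str, Dict[str, int]] = {
    "SIFT": {"unknown": 8097246, "nondeleterious": 31166227, "deleterious": 36667532},
    "Polyphen2": {"unknown": 9468220, "nondeleterious": 35477555, "deleterious": 30985230},
    "LRT": {"unknown": 10556339, "nondeleterious": 25950325, "deleterious": 39424341},
    "MutationTaster": {"unknown": 3961293, "nondeleterious": 29158468, "deleterious": 42811244},
}

def deleterious_fraction(counts: Dict[str, int]) -> float:
    """Deleterious calls divided by all non-missing (D or N) calls."""
    called = counts["nondeleterious"] + counts["deleterious"]
    return counts["deleterious"] / called

for method, counts in SUMMARY.items():
    # Each NS falls into exactly one category, so every column sums to the total.
    assert sum(counts.values()) == TOTAL_NS, method
    print(f"{method}: {deleterious_fraction(counts):.1%} of called NSs predicted deleterious")
```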
Pairwise Prediction Agreement Percentages (Upper Right Triangle) and Spearman's Rank Correlation Coefficients (Lower Left Triangle)
| Method | PhyloP | SIFT | Polyphen2 | LRT | MutationTaster |
|---|---|---|---|---|---|
| PhyloP | – | – | – | – | – |
| SIFT | 0.189 | – | 68.82 | 62.14 | 61.69 |
| Polyphen2 | 0.306 | 0.517 | – | 66.82 | 64.91 |
| LRT | 0.475 | 0.267 | 0.457 | – | 77.19 |
| MutationTaster | 0.389 | 0.313 | 0.444 | 0.618 | – |
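The two statistics in this table can be sketched in plain Python. Two details are assumptions on our part: agreement is computed only over NSs for which both methods made a deleterious (`"D"`) or nondeleterious (`"N"`) call, and Spearman's coefficient is computed as the Pearson correlation of average ranks, the usual convention for ties. PhyloP provides a score but no categorical call, which is consistent with its row having no agreement percentages.

```python
from typing import List, Sequence

def agreement_pct(a: Sequence[str], b: Sequence[str]) -> float:
    """Percent of NSs with identical calls, over NSs where both methods called D or N."""
    pairs = [(x, y) for x, y in zip(a, b) if x in ("D", "N") and y in ("D", "N")]
    if not pairs:
        raise ValueError("no NSs called by both methods")
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks (assumes non-constant inputs)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

calls_a = ["D", "D", "N", "N", "U"]  # "U" = unknown, excluded from agreement
calls_b = ["D", "N", "N", "N", "D"]
print(agreement_pct(calls_a, calls_b))  # 75.0
```

If one method scores deleterious variants low and another scores them high, the sign of rho flips, so scores need a common orientation before comparison.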
Figure 2. Distributions of imputation errors when missing scores are imputed with BPCAfill (A) or with the average score of the same algorithm (B).
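The baseline in panel B replaces a missing score with the average of that algorithm's observed scores. One way to estimate the error of that baseline is to hide some known scores, impute them, and compare the result with the true values. The minimal sketch below does this; the masking scheme and the data layout (rows are NSs, columns are algorithms, `None` marks a missing score) are assumptions, and BPCAfill (panel A) is not reimplemented here.

```python
import random
from typing import List, Optional

Matrix = List[List[Optional[float]]]  # rows: NSs; columns: algorithms; None = missing

def column_means(m: Matrix) -> List[float]:
    """Mean of the observed (non-None) scores in each column."""
    means = []
    for j in range(len(m[0])):
        obs = [row[j] for row in m if row[j] is not None]
        if not obs:
            raise ValueError(f"column {j} has no observed scores")
        means.append(sum(obs) / len(obs))
    return means

def mean_imputation_errors(m: Matrix, n_mask: int, seed: int = 0) -> List[float]:
    """Hide n_mask observed scores, impute each with its column mean computed from
    the scores still visible, and return (imputed - true) for each hidden score."""
    rng = random.Random(seed)
    observed = [(i, j) for i, row in enumerate(m)
                for j, v in enumerate(row) if v is not None]
    masked = rng.sample(observed, n_mask)
    hidden = [row[:] for row in m]  # copy so the true values stay in m
    for i, j in masked:
        hidden[i][j] = None
    means = column_means(hidden)
    return [means[j] - m[i][j] for i, j in masked]

scores: Matrix = [
    [0.10, 0.80, None],
    [0.45, 0.30, 0.90],
    [0.05, None, 0.70],
    [0.60, 0.20, 0.40],
]
print(mean_imputation_errors(scores, n_mask=3))
```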
Format of Queries for Input File
| Query type | Format | Example |
|---|---|---|
| NS | chr pos ref alt | Y 140855 A C |
| NS | chr pos ref alt refAA altAA | Y 140855 A C M L |
| Genome position | chr pos | 10 94454459 |
| Gene | gene_name | PLCXD1 |
| Gene | gene_id | 55344 |
| Gene | CCDS_id | CCDS14103.1 |
Fields are separated by tabs or spaces. chr: chromosome; pos: position on the chromosome; ref: reference allele; alt: alternative allele; refAA: reference amino acid; altAA: alternative amino acid; gene_name: gene name; gene_id: Entrez gene ID; CCDS_id: CCDS ID.
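An input file can mix these query types, which can be told apart by field count and content. The following Python classifier is a hypothetical sketch, not dbNSFP's own search program. It assumes chromosomes are written as `1` to `22`, `X`, or `Y` without a `chr` prefix (as in the examples), that a purely numeric single token is an Entrez gene ID, and that a token of the form `CCDS<digits>.<version>` is a CCDS ID.

```python
import re
from typing import Dict

CCDS_RE = re.compile(r"^CCDS\d+(\.\d+)?$")
CHROMS = {str(c) for c in range(1, 23)} | {"X", "Y"}

def classify_query(line: str) -> Dict[str, object]:
    """Parse one query line into a dict describing its type and fields."""
    fields = line.split()  # splits on any run of tabs and/or spaces
    if len(fields) in (4, 6) and fields[0] in CHROMS:
        q: Dict[str, object] = {"type": "NS", "chr": fields[0], "pos": int(fields[1]),
                                "ref": fields[2], "alt": fields[3]}
        if len(fields) == 6:
            q.update(refAA=fields[4], altAA=fields[5])
        return q
    if len(fields) == 2 and fields[0] in CHROMS:
        return {"type": "genome_position", "chr": fields[0], "pos": int(fields[1])}
    if len(fields) == 1:
        token = fields[0]
        if CCDS_RE.match(token):
            return {"type": "gene", "CCDS_id": token}
        if token.isdigit():
            return {"type": "gene", "gene_id": token}
        return {"type": "gene", "gene_name": token}
    raise ValueError(f"unrecognized query: {line!r}")

for q in ["Y 140855 A C", "Y\t140855 A C M L", "10 94454459",
          "PLCXD1", "55344", "CCDS14103.1"]:
    print(classify_query(q))
```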