Jiaxin Wu1, Mengmeng Wu1, Lianshuo Li1, Zhuo Liu1, Wanwen Zeng1, Rui Jiang2.
Abstract
Recent advances in next-generation sequencing technology have enabled fast, low-cost detection of genetic variants across the entire human genome, making whole-genome sequencing an increasingly common approach in the study of disease-causing genetic variants. Nevertheless, a repository that collects predictions of the functionally damaging effects of human genetic variants is still lacking, although it is well recognized that such predictions play a central role in the analysis of whole-genome sequencing data. To fill this gap, we developed a database named dbWGFP (a database and web server of human whole-genome single nucleotide variants and their functional predictions) that contains functional predictions and annotations of nearly 8.58 billion possible human whole-genome single nucleotide variants. Specifically, this database integrates 48 functional predictions calculated by 17 popular computational methods and 44 valuable annotations obtained from various data sources. Standalone software, user-friendly query services and free downloads of this database are available at http://bioinfo.au.tsinghua.edu.cn/dbwgfp. dbWGFP provides a valuable resource for the analysis of whole-genome sequencing, exome sequencing and SNP array data, thereby complementing existing data sources and computational resources in deciphering the genetic bases of human inherited diseases.
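As a rough sanity check on the "nearly 8.58 billion" figure, the sketch below assumes that every reference position admits exactly three alternative alleles (the three nucleotides other than the reference base). That assumption is an interpretation, not something the abstract states, and the resulting position count is only what the rounded total implies.

```python
# Assumption (not stated in the abstract): each genomic position
# contributes exactly 3 possible single nucleotide variants.
ALT_ALLELES_PER_POSITION = 3

total_snvs = 8.58e9  # "nearly 8.58 billion" possible SNVs, from the abstract

# Number of reference positions implied by the rounded total.
implied_positions = total_snvs / ALT_ALLELES_PER_POSITION

print(f"{implied_positions / 1e9:.2f} billion positions")  # prints "2.86 billion positions"
```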
Year: 2016 PMID: 26989155 PMCID: PMC4795934 DOI: 10.1093/database/baw024
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Structure of the dbWGFP database.
Computational methods for predicting functionally damaging effects or conservation properties of single nucleotide variants
| Method | Version | Source | Website |
|---|---|---|---|
| Grantham | Sep-74 | CADD | — |
| SIFT | Aug-11 | dbNSFP | |
| PolyPhen-2 | v2.2.2 | dbNSFP | |
| LRT | Nov-09 | dbNSFP | |
| MutationTaster | Mar-13 | dbNSFP | |
| Mutation Assessor | Release 2 | dbNSFP | |
| FATHMM | v2.3 | dbNSFP | |
| RadialSVM | v2.4 | dbNSFP | |
| LR | v2.4 | dbNSFP | |
| CADD | v1.0 | CADD | |
| GWAVA | v1.0 | GWAVA | |
| MSRV | Aug-07 | MSRV | |
| SinBaD | Nov-12 | SinBaD | |
| phastCons | Nov-09 | UCSC | |
| PhyloP | Nov-09 | UCSC | |
| GERP++ | May-11 | GERP | |
| SiPhy | v0.5 | SiPhy | |
Coverage of functional prediction scores in percentage
| Chromosome | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | X | Y | All |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Grantham | 1.15 | 0.79 | 0.76 | 0.55 | 0.66 | 0.78 | 0.77 | 0.6 | 0.86 | 0.77 | 1.15 | 1.04 | 0.48 | 0.96 | 1.1 | 1.4 | 1.88 | 0.54 | 2.97 | 1.02 | 0.72 | 1.59 | 0.64 | 0.24 | 0.9 |
| SIFT | 1.16 | 0.8 | 0.75 | 0.55 | 0.66 | 0.78 | 0.76 | 0.61 | 0.87 | 0.78 | 1.13 | 1 | 0.46 | 0.91 | 1.11 | 1.37 | 1.89 | 0.54 | 3.05 | 1.04 | 0.71 | 1.57 | 0.62 | 0 | 0.89 |
| PolyPhen-2 | 1.09 | 0.74 | 0.73 | 0.51 | 0.63 | 0.74 | 0.71 | 0.57 | 0.82 | 0.73 | 1.08 | 0.98 | 0.46 | 0.86 | 1.01 | 1.28 | 1.76 | 0.51 | 2.84 | 0.98 | 0.68 | 1.47 | 0.59 | 0.02 | 0.84 |
| LRT | 1.03 | 0.67 | 0.7 | 0.51 | 0.59 | 0.72 | 0.66 | 0.55 | 0.78 | 0.72 | 1.02 | 0.94 | 0.46 | 0.83 | 0.97 | 1.2 | 1.66 | 0.47 | 2.09 | 0.97 | 0.63 | 1.39 | 0.56 | 0.01 | 0.79 |
| MutationTaster | 1.26 | 0.87 | 0.84 | 0.6 | 0.72 | 0.85 | 0.83 | 0.67 | 0.95 | 0.85 | 1.25 | 1.14 | 0.52 | 1.01 | 1.15 | 1.45 | 2.01 | 0.58 | 3.12 | 1.13 | 0.8 | 1.7 | 0.7 | 0.02 | 0.97 |
| Mutation Assessor | 1.06 | 0.73 | 0.7 | 0.51 | 0.62 | 0.72 | 0.69 | 0.54 | 0.8 | 0.7 | 1.05 | 0.95 | 0.45 | 0.84 | 1 | 1.22 | 1.72 | 0.51 | 2.74 | 0.95 | 0.65 | 1.41 | 0.57 | 0.03 | 0.82 |
| FATHMM | 1.03 | 0.71 | 0.68 | 0.49 | 0.6 | 0.7 | 0.67 | 0.52 | 0.77 | 0.68 | 1.02 | 0.93 | 0.43 | 0.8 | 0.98 | 1.18 | 1.64 | 0.49 | 2.64 | 0.92 | 0.64 | 1.34 | 0.57 | 0.03 | 0.79 |
| RadialSVM | 1.18 | 0.81 | 0.77 | 0.56 | 0.67 | 0.79 | 0.78 | 0.62 | 0.88 | 0.79 | 1.17 | 1.05 | 0.48 | 0.94 | 1.08 | 1.37 | 1.88 | 0.54 | 3.01 | 1.05 | 0.74 | 1.6 | 0.65 | 0.05 | 0.91 |
| LR | 1.18 | 0.81 | 0.77 | 0.56 | 0.67 | 0.79 | 0.78 | 0.62 | 0.88 | 0.79 | 1.17 | 1.05 | 0.48 | 0.94 | 1.08 | 1.37 | 1.88 | 0.54 | 3.01 | 1.05 | 0.74 | 1.6 | 0.65 | 0.05 | 0.91 |
| CADD | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| GWAVA | 0.52 | 0.53 | 0.54 | 0.55 | 0.53 | 0.55 | 0.55 | 0.58 | 0.54 | 0.55 | 0.56 | 0.54 | 0.54 | 0.54 | 0.53 | 0.62 | 0.53 | 0.55 | 0.6 | 0.59 | 0.59 | 0.58 | 0.36 | 0.11 | 0.53 |
| MSRV | 0.99 | 0.66 | 0.67 | 0.47 | 0.6 | 0.68 | 0.66 | 0.51 | 0.76 | 0.68 | 0.99 | 0.91 | 0.42 | 0.79 | 0.94 | 1.17 | 1.66 | 0.48 | 2.56 | 0.92 | 0.65 | 1.33 | 0.56 | 0.01 | 0.78 |
| SinBaD | 1.28 | 0.88 | 0.84 | 0.61 | 0.73 | 0.86 | 0.84 | 0.68 | 0.96 | 0.86 | 1.26 | 1.15 | 0.53 | 1.02 | 1.18 | 1.49 | 2.04 | 0.59 | 3.24 | 1.14 | 0.81 | 1.72 | 0.7 | 0.05 | 0.99 |
| phastCons | 99 | 99 | 99 | 99 | 98 | 99 | 98 | 98 | 98 | 99 | 98 | 98 | 99 | 99 | 99 | 98 | 99 | 99 | 97 | 99 | 99 | 98 | 95 | 85 | 98 |
| PhyloP | 99 | 99 | 99 | 99 | 98 | 99 | 98 | 98 | 98 | 99 | 98 | 98 | 99 | 99 | 99 | 98 | 99 | 99 | 97 | 99 | 99 | 98 | 95 | 85 | 98 |
| GERP++ | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| SiPhy | 98 | 98 | 98 | 98 | 98 | 98 | 97 | 98 | 98 | 98 | 97 | 98 | 99 | 98 | 98 | 97 | 98 | 98 | 95 | 98 | 96 | 97 | 92 | 0 | 97 |
Figure 2. Pairwise Spearman’s rank correlation coefficients between different functional prediction scores.
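For readers who want to reproduce this kind of comparison, the following is a minimal sketch of Spearman's rank correlation between two score columns. The function names and the example values are hypothetical. Dropping variants that lack either score (pairwise deletion) is an assumption made here because the methods differ widely in coverage; the source does not say how missing scores were handled.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks, using only
    variants for which both scores are present (None marks a missing score)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs, ys = zip(*pairs)
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical scores for five variants from two methods.
scores_a = [12.1, 3.4, 25.0, None, 8.8]
scores_b = [4.2, -1.0, 3.9, 2.0, None]
rho = spearman(scores_a, scores_b)
print(rho)
```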
Prediction power of the scores
| Type of SNVs | Coding SNVs | Splicing SNVs | Regulatory SNVs |
|---|---|---|---|
| #(disease SNVs) | 52007 | 8822 | 1811 |
| #(neutral SNVs) | 272534 | 2897 | 701984 |

| Method | Coding: P-value | Coding: AUC (%) | Coding: Coverage (%) | Splicing: P-value | Splicing: AUC (%) | Splicing: Coverage (%) | Regulatory: P-value | Regulatory: AUC (%) | Regulatory: Coverage (%) |
|---|---|---|---|---|---|---|---|---|---|
| mamPhCons | 0 | 67.22 | 99.99 | 5.99E-21 | 56.89 | 99.99 | 1.18E-57 | 56.89 | 99.78 |
| mamPhyloP | 0 | 64.59 | 100 | 0.0264 | 51.61 | 99.99 | 1.12E-36 | 58.6 | 99.8 |
| GERP++ | 0 | 66.43 | 100 | 3.27E-08 | 56.36 | 100 | 4.78E-19 | 58.73 | 100 |
| SiPhy | 0 | 56.23 | 99.94 | 1.8E-17 | 55.54 | 99.91 | 1.91E-45 | 60.13 | 99.23 |
| Grantham | 0 | 60.37 | 96 | – | – | – | – | – | – |
| CADD | 0 | 77.31 | 100 | 2.44E-21 | 55.64 | 100 | 2.97E-91 | 65 | 100 |
| GWAVA | 5.74E-66 | 54.05 | 88.93 | 0.0005 | 53.37 | 36.35 | 7.73E-277 | 81.61 | 99.67 |
| SIFT | 4.33E-308 | 64.38 | 93.55 | – | – | – | – | – | – |
| PolyPhen-2 | 0 | 77.04 | 93.36 | – | – | – | – | – | – |
| LRT | 0 | 70.94 | 86.09 | – | – | – | – | – | – |
| MutationTaster | 0 | 63.46 | 99.07 | 1.15E-08 | 53.07 | 72.33 | – | – | – |
| Mutation Assessor | 0 | 77.55 | 94.12 | – | – | – | – | – | – |
| FATHMM | 0 | 86.41 | 90.27 | – | – | – | – | – | – |
| RadialSVM | 0 | 87.79 | 96.28 | – | – | – | – | – | – |
| LR | 0 | 87.96 | 96.28 | – | – | – | – | – | – |
| MSRV | 0 | 80.56 | 89.81 | – | – | – | – | – | – |
| SinBaD | 0 | 74.16 | 99.38 | 1.06E-195 | 70.48 | 72.58 | – | – | – |
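The AUC and coverage columns can be illustrated with a small sketch. The source does not describe how these values were computed, so the code below uses the standard rank-based definition of AUC (the fraction of disease/neutral pairs in which the disease variant scores higher, with ties counted as one half) and defines coverage as the share of variants that received a score. It also assumes that higher scores mean "more damaging"; a method scored in the opposite direction would need its scores negated first. All names and example values are hypothetical.

```python
def coverage_percent(scores):
    """Percentage of variants that have a score (None marks a missing score)."""
    scored = [s for s in scores if s is not None]
    return 100.0 * len(scored) / len(scores)


def auc_percent(disease_scores, neutral_scores):
    """Rank-based AUC in percent, computed over scored variants only.

    Assumes higher scores indicate more damaging variants.
    Quadratic in the number of variants, so suitable only for small examples.
    """
    pos = [s for s in disease_scores if s is not None]
    neg = [s for s in neutral_scores if s is not None]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return 100.0 * wins / (len(pos) * len(neg))


# Hypothetical scores: four disease variants and five neutral variants.
disease = [0.9, 0.8, None, 0.4]
neutral = [0.1, 0.5, 0.3, None, 0.2]

auc = auc_percent(disease, neutral)
cov = coverage_percent(disease + neutral)
print(f"AUC = {auc:.2f}%, coverage = {cov:.2f}%")
```

Computing AUC only over scored variants means a method with low coverage can still report a high AUC, which is why the two columns are best read together.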
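Running time of the dbWGFP search program. Results were obtained using 8 threads on a server with dual Intel E5-2630V2 CPUs (2.6 GHz) and 64 GB of memory.
| Chromosome | #(SNVs) | Time | Speed (#SNVs / time) | Time | Speed (#SNVs / time) |
|---|---|---|---|---|---|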
| 1 | 291 183 | 86 | 3386 | 116 | 2510 |
| 2 | 306 260 | 89 | 3441 | 121 | 2531 |
| 3 | 265 905 | 76 | 3499 | 103 | 2582 |
| 4 | 281 093 | 72 | 3904 | 96 | 2928 |
| 5 | 240 036 | 86 | 2791 | 94 | 2554 |
| 6 | 254 105 | 67 | 3793 | 89 | 2855 |
| 7 | 214 802 | 58 | 3703 | 83 | 2588 |
| 8 | 201 101 | 58 | 3467 | 73 | 2755 |
| 9 | 159 777 | 48 | 3329 | 68 | 2350 |
| 10 | 192 012 | 54 | 3556 | 74 | 2595 |
| 11 | 194 987 | 54 | 3611 | 75 | 2600 |
| 12 | 176 087 | 53 | 3322 | 76 | 2317 |
| 13 | 147 631 | 42 | 3515 | 58 | 2545 |
| 14 | 124 626 | 37 | 3368 | 52 | 2397 |
| 15 | 110 700 | 36 | 3075 | 48 | 2306 |
| 16 | 114 626 | 35 | 3275 | 52 | 2204 |
| 17 | 102 123 | 36 | 2837 | 47 | 2173 |
| 18 | 111 964 | 36 | 3110 | 46 | 2434 |
| 19 | 84 735 | 26 | 3259 | 38 | 2230 |
| 20 | 77 334 | 28 | 2762 | 37 | 2090 |
| 21 | 55 667 | 17 | 3275 | 23 | 2420 |
| 22 | 48 737 | 16 | 3046 | 25 | 1949 |
| X | 88 735 | 46 | 1929 | 57 | 1557 |
| Combined | 3 844 227 | 769 | 4999 | 1054 | 3647 |
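In the running-time table, the fourth and sixth columns equal the second column divided by the third and fifth columns respectively, rounded to the nearest integer. Reading them as throughput (variants processed per unit of time) for two run configurations is an interpretation: this copy of the table does not preserve the time unit or what distinguishes the two runs. The sketch below checks the arithmetic on three rows copied from the table; the helper name `speed` is hypothetical.

```python
def speed(n_variants, elapsed):
    """Variants processed per unit of time, rounded to the nearest integer."""
    return round(n_variants / elapsed)


# (count, time A, reported speed A, time B, reported speed B), copied from the table.
rows = {
    "1": (291183, 86, 3386, 116, 2510),
    "X": (88735, 46, 1929, 57, 1557),
    "Combined": (3844227, 769, 4999, 1054, 3647),
}

# For each row, record whether both reported speeds match count / time.
consistent = {
    label: speed(n, t_a) == s_a and speed(n, t_b) == s_b
    for label, (n, t_a, s_a, t_b, s_b) in rows.items()
}
print(consistent)
```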
Figure 3. Illustration of the step-by-step mode of the query service.