Quan Li, Xiaoming Liu, Richard A Gibbs, Eric Boerwinkle, Constantin Polychronakos, Hui-Qi Qu.
Abstract
Rapid progress in genomic technologies has created new opportunities for the molecular diagnosis of maturity-onset diabetes of the young (MODY). However, whether a newly identified mutation causes MODY is often uncertain. A number of in silico methods have been developed to predict the functional effects of rare human mutations. The purpose of this study is to compare the performance of different bioinformatics methods in the functional prediction of nonsynonymous mutations in each MODY gene, and to provide reference matrices to assist the molecular diagnosis of MODY. Our study showed that the prediction scores assigned to the diabetes mutations by different methods were highly correlated, but that the methods complemented rather than replaced one another. The available in silico methods for the prediction of diabetes mutations varied in performance across genes. Applying the gene-specific thresholds defined by this study may increase the performance of in silico prediction of disease-causing mutations.
Year: 2014 PMID: 25136813 PMCID: PMC4138110 DOI: 10.1371/journal.pone.0104452
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
MODY or neonatal diabetes genes and mutations (n = 1091).
| Gene symbol | Diabetic mutations* (n) | Control mutations (n) | Chromosome position | Gene name |
| HNF4A | 86 | 57 | 20q12-q13.1 | hepatocyte nuclear factor 4, alpha |
| GCK | 479 | 22 | 7p15.3-p15.1 | glucokinase (hexokinase 4, maturity onset diabetes of the young 2) |
| HNF1A | 324 | 78 | 12q24.2 | hepatocyte nuclear factor-1 (HNF1) homeobox A |
| HNF1B | 36 | 52 | 17cen-q21.3 | hepatocyte nuclear factor-1 (HNF1) homeobox B |
| INS | 41 | 14 | 11p15.5 | insulin |
| ABCC8 | 64 | 185 | 11p15.1 | ATP-binding cassette, sub-family C (CFTR/MRP), member 8 |
| KCNJ11 | 61 | 65 | 11p15.1 | potassium inwardly-rectifying channel, subfamily J, member 11 |
* Number of diabetes-causing single nucleotide mutations recorded in the Human Gene Mutation Database (HGMD) 2013.4 release (http://www.hgmd.org/) [30].
Methods for function prediction for non-synonymous mutations*.
| Method | Deleterious Threshold | Algorithm |
| PhyloP | >1.6 | PhyloP calculates a basewise conservation score from a Multiz alignment. |
| GERP++ RS | >4.4 | GERP++ RS calculates site-specific “rejected substitutions” (RS) scores to discover evolutionarily constrained elements, based on maximum likelihood estimation of evolutionary rates. |
| SiPhy | >12.17 | SiPhy detects bases under selection from multiple alignment data using a hidden Markov model. |
| SIFT | >0.95 | SIFT prediction is based on the degree of conservation of amino acid residues in sequence alignments derived from closely related sequences, collected through PSI-BLAST. |
| PolyPhen-2 | >0.5 | PolyPhen-2 (Polymorphism Phenotyping v2) predicts the functional significance of an amino acid substitution with a Naïve Bayes classifier, using sequence-based and structure-based predictive features. HDIV, or HumDiv, identifies human damaging mutations by assuming that differences between human proteins and their closely related mammalian homologs are non-damaging. HVAR, or HumVar, identifies human disease-causing mutations by assuming that common human nsSNPs are non-damaging. |
| LRT | >0.999 | The likelihood ratio test (LRT) identifies conserved amino acid positions and deleterious mutations using a comparative genomics data set of multiple vertebrate species. |
| MutationTaster | >0.5 | MutationTaster evaluates the disease-causing potential of DNA sequence alterations with a Naïve Bayes classifier. It integrates information on evolutionary conservation, splice-site changes, loss of protein features and changes that might affect the amount of mRNA from different biomedical databases, and uses established analysis tools. |
| Mutation Assessor | >0.65 | Mutation Assessor predicts the functional impact of amino acid substitutions in proteins based on evolutionary conservation of the affected amino acid in protein homologs. |
| FATHMM | >0.453 | Functional Analysis through Hidden Markov Models (FATHMM) predicts the functional consequences of cancer-associated amino acid substitutions using a model weighted for inherited disease mutations. |
| RadialSVM score | >0.5 | RadialSVM score is an ensemble-based approach integrating multiple scoring systems (function prediction and conservation scores) by radial support vector machine (SVM). |
| LR score | >0.5 | LR score is an ensemble-based approach integrating multiple scoring systems (function prediction and conservation scores) by logistic regression (LR). |
* These methods have been compared extensively at the genome-wide level by Thusberg et al. [1], [2] and Dong et al. [3]. Deleterious thresholds for PhyloP, GERP++ RS and SiPhy are defined according to the study by Dong et al. [3]. Deleterious thresholds for SIFT, LRT, MutationTaster, Mutation Assessor, FATHMM and RadialSVM are based on the converted scores used in dbNSFP version 2.3 [45], [46] (the converted score is designated Sc and the original score So): SIFT: Sc = 1 − So; LRT: Sc = 1 − So × 0.5 if Ω < 1, or Sc = So × 0.5 if Ω ≥ 1; MutationTaster: Sc = So if the prediction is “A” or “D”, or Sc = 1 − So if the prediction is “N” or “P”; Mutation Assessor: Sc = (So − (−5.545))/(5.975 − (−5.545)); FATHMM: Sc = 1 − (So − (−16.13))/(10.64 − (−16.13)); RadialSVM: Sc = (1 + So/3.03993691875303) × 0.5 if predicted “D”, and Sc = (1 − So/−2.00575697514507) × 0.5 otherwise. More details of the conversion can be found at http://dbnsfp.houstonbioinformatics.org/dbNSFPzip/dbNSFP2.3.readme.txt.
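The score conversions in the footnote above, together with the deleterious thresholds in the table, can be sketched in a few lines of Python. This is a minimal illustration, not the dbNSFP implementation: the names `convert_score`, `is_deleterious` and `DELETERIOUS_THRESHOLD` are hypothetical, and the constants are copied from the footnote and the table as printed.

```python
# Deleterious thresholds as listed in the methods table (applied to converted scores).
DELETERIOUS_THRESHOLD = {
    "PhyloP": 1.6, "GERP++ RS": 4.4, "SiPhy": 12.17, "SIFT": 0.95,
    "PolyPhen-2": 0.5, "LRT": 0.999, "MutationTaster": 0.5,
    "Mutation Assessor": 0.65, "FATHMM": 0.453, "RadialSVM": 0.5, "LR": 0.5,
}


def convert_score(method, so, *, omega=None, prediction=None):
    """Map an original score (So) to a converted score (Sc) per the footnote.

    `omega` is only needed for LRT; `prediction` (a one-letter call such as
    "D") is only needed for MutationTaster and RadialSVM.
    """
    if method == "SIFT":
        return 1 - so
    if method == "LRT":
        if omega is None:
            raise ValueError("LRT conversion needs omega")
        return 1 - so * 0.5 if omega < 1 else so * 0.5
    if method == "MutationTaster":
        if prediction in ("A", "D"):
            return so
        if prediction in ("N", "P"):
            return 1 - so
        raise ValueError("MutationTaster conversion needs prediction A/D/N/P")
    if method == "Mutation Assessor":
        return (so - (-5.545)) / (5.975 - (-5.545))
    if method == "FATHMM":
        return 1 - (so - (-16.13)) / (10.64 - (-16.13))
    if method == "RadialSVM":
        if prediction == "D":
            return (1 + so / 3.03993691875303) * 0.5
        return (1 - so / -2.00575697514507) * 0.5
    # Methods without a conversion in the footnote are used as-is.
    return so


def is_deleterious(method, sc):
    """Call a mutation deleterious when its converted score exceeds the threshold."""
    return sc > DELETERIOUS_THRESHOLD[method]


if __name__ == "__main__":
    sc = convert_score("SIFT", 0.02)  # hypothetical original SIFT score
    print(sc, is_deleterious("SIFT", sc))
```

After conversion, a higher score means "more likely deleterious" for every method, which is what allows a single `>` comparison against the threshold column.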
Correlations of different methods for function prediction of non-synonymous mutations causing diabetes [Spearman's ρ (P value)].
| Methods | PhyloP | GERP++ RS | SiPhy | SIFT | PolyPhen-2 HDIV | PolyPhen-2 HVAR | LRT | MutationTaster | Mutation Assessor | FATHMM | RadialSVM score |
| GERP++ RS | 0.798 (P = 1.13E-218) | ||||||||||
| SiPhy | 0.857 (P = 2.42E-285) | 0.718 (P = 1.15E-156) | |||||||||
| SIFT | 0.018 (P = 5.92E-01) | 0.062 (P = 6.75E-02) | 0.087 (P = 1.09E-02) | ||||||||
| PolyPhen-2 HDIV | 0.228 (P = 1.16E-11) | 0.192 (P = 1.20E-08) | 0.278 (P = 9.60E-17) | 0.506 (P = 8.57E-57) | |||||||
| PolyPhen-2 HVAR | 0.205 (P = 1.27E-09) | 0.19 (P = 1.85E-08) | 0.233 (P = 3.89E-12) | 0.496 (P = 2.10E-54) | 0.88 (P = 3.88E-281) | ||||||
| LRT | 0.332 (P = 3.28E-23) | 0.412 (P = 5.32E-36) | 0.388 (P = 9.10E-32) | 0.252 (P = 1.23E-13) | 0.38 (P = 2.13E-30) | 0.398 (P = 1.58E-33) | |||||
| MutationTaster | 0.298 (P = 7.40E-21) | 0.33 (P = 1.82E-25) | 0.339 (P = 6.75E-27) | 0.288 (P = 7.50E-18) | 0.348 (P = 6.44E-26) | 0.354 (P = 6.86E-27) | 0.332 (P = 2.93E-23) | ||||
| Mutation Assessor | 0.089 (P = 8.79E-03) | 0.158 (P = 3.00E-06) | 0.158 (P = 3.03E-06) | 0.633 (P = 3.11E-96) | 0.516 (P = 3.30E-59) | 0.493 (P = 2.16E-53) | 0.409 (P = 2.73E-35) | 0.321 (P = 2.58E-22) | |||
| FATHMM | 0.071 (P = 3.75E-02) | 0.042 (P = 2.13E-01) | 0.087 (P = 1.07E-02) | 0.27 (P = 1.09E-15) | 0.297 (P = 5.62E-19) | 0.342 (P = 5.75E-25) | 0.126 (P = 2.28E-04) | 0.13 (P = 1.32E-04) | 0.297 (P = 8.20E-19) | ||
| RadialSVM score | 0.188 (P = 2.08E-08) | 0.233 (P = 2.51E-12) | 0.254 (P = 1.84E-14) | 0.492 (P = 2.33E-53) | 0.516 (P = 6.04E-60) | 0.473 (P = 2.46E-49) | 0.268 (P = 2.29E-15) | 0.275 (P = 8.71E-17) | 0.398 (P = 1.97E-34) | 0.409 (P = 4.23E-36) | |
| LR score | 0.21 (P = 3.08E-10) | 0.211 (P = 2.74E-10) | 0.272 (P = 1.96E-16) | 0.548 (P = 1.82E-68) | 0.603 (P = 9.85E-87) | 0.621 (P = 5.02E-93) | 0.344 (P = 5.86E-25) | 0.303 (P = 3.77E-20) | 0.646 (P = 6.66E-104) | 0.852 (P = 5.43E-244) | 0.634 (P = 3.56E-100) |
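For readers who want to reproduce a pairwise entry of this kind, the following is a minimal pure-Python sketch of Spearman's ρ: rank both score lists (ties receive average ranks) and take the Pearson correlation of the ranks. The helper names are hypothetical, and dropping mutations that lack a score from either method (pairwise deletion) is an assumption, since the source does not say how missing predictions were handled.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(scores_a, scores_b):
    """Spearman's rho over mutations scored by both methods (None = missing)."""
    pairs = [(a, b) for a, b in zip(scores_a, scores_b)
             if a is not None and b is not None]
    xs, ys = zip(*pairs)
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Hypothetical scores for five mutations from two methods.
    method_a = [2.1, 0.4, None, 1.7, 2.9]
    method_b = [5.2, 1.0, 4.4, 3.8, 6.0]
    print(spearman_rho(method_a, method_b))
```

Because ρ depends only on ranks, it is unaffected by the monotone score conversions applied to some methods, apart from a sign flip where the conversion reverses the direction of the score.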
Method comparisons for function prediction for non-synonymous mutations causing diabetes.
| Methods | Missing Rate* | False Negative Rate | False Positive Rate** | MCC |
| PhyloP | 0% | 18% | 53% | 0.300 |
| GERP++ RS | 0% | 21% | 52% | 0.281 |
| SiPhy | 0% | 16% | 51% | 0.342 |
| SIFT | 13% | 25% | 39% | 0.350 |
| PolyPhen-2 HDIV | 15% | 9% | 51% | 0.447 |
| PolyPhen-2 HVAR | 15% | 16% | 42% | 0.434 |
| LRT | 18% | 7% | 68% | 0.324 |
| MutationTaster | 3% | 3% | 77% | 0.333 |
| Mutation Assessor | 15% | 30% | 32% | 0.362 |
| FATHMM | 14% | 1% | 95% | 0.127 |
| RadialSVM score | 8% | 5% | 57% | 0.474 |
| LR score | 8% | 4% | 69% | 0.393 |
* The missing rate is the percentage of mutations to which a method could not be applied.
** The false positive rate was calculated from nonsynonymous single-nucleotide mutations in the diabetes genes acquired from the NHLBI GO Exome Sequencing Project (ESP) [33], the CHARGE Exome Sequencing Project [34], [36], and the 1000 Genomes Project [37], excluding mutations recorded in the HGMD database.
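The columns of this table follow from a two-by-two confusion matrix in which diabetic (HGMD) mutations are the positive class and control mutations the negative class. The sketch below uses the standard definitions of the false negative rate, false positive rate and Matthews correlation coefficient (MCC); the function names and the example counts are hypothetical, and the source does not state whether unscored mutations were excluded before these quantities were computed.

```python
import math


def error_rates(tp, tn, fp, fn):
    """Return (false negative rate, false positive rate)."""
    fnr = fn / (tp + fn)  # diabetic mutations predicted benign
    fpr = fp / (fp + tn)  # control mutations predicted deleterious
    return fnr, fpr


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts: 100 diabetic and 100 control mutations.
    counts = dict(tp=90, tn=40, fp=60, fn=10)
    print(error_rates(**counts), mcc(**counts))
```

MCC uses all four cells at once, so a method that labels nearly every mutation deleterious scores poorly even though its false negative rate is very low.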
Prediction score comparisons of diabetes mutations in different genes [Mean±Standard Deviation, N (Maximum/Percentile 75/Median/Percentile 25/Minimum)].
| Methods | GCK (MODY2) | INS (MODY10) | KCNJ11 (MODY13) | ABCC8 (MODY12) | HNF1A (MODY3) | HNF1B (MODY5) | HNF4A (MODY1) | ANOVA | Overall |
| PhyloP | 2.120±0.693, 437 (2.941/2.553/2.285/1.981/−0.445) | 1.206±0.852, 38 (2.069/1.918/1.585/0.590/−1.023) | 2.046±0.444, 61 (2.548/2.398/2.084/1.942/−0.009) | 2.219±0.556, 64 (2.941/2.607/2.331/1.990/−0.403) | 1.793±0.791, 283 (2.882/2.246/2.045/1.662/−3.003) | 2.377±0.652, 28 (2.890/2.882/2.684/2.162/0.768) | 1.894±0.841, 74 (2.814/2.481/2.282/1.226/−1.309) | 1.02E-17 | 1.983±0.752, 985 (2.941/2.449/2.167/1.807/−3.003) |
| GERP++ RS | 4.969±1.305, 437 (6.170/5.690/5.220/4.770/−3.060) | 2.353±1.724, 38 (4.020/3.580/2.800/1.853/−3.200) | 4.807±0.862, 61 (5.430/5.290/5.160/4.570/−0.548) | 5.135±0.803, 64 (6.170/5.490/5.310/4.993/0.768) | 4.185±2.020, 283 (6.060/4.910/4.650/4.340/−12.100) | 5.279±0.773, 28 (6.060/5.810/5.560/5.110/3.230) | 4.591±1.862, 74 (5.930/5.400/5.160/4.328/−7.160) | 1.55E-25 | 4.624±1.654, 985 (6.170/5.430/5.050/4.490/−12.100) |
| SiPhy | 16.241±3.196, 437 (20.490/18.618/16.477/14.725/2.501) | 9.288±4.002, 38 (14.890/11.956/9.950/7.312/1.486) | 15.732±2.899, 61 (19.243/18.636/15.481/13.607/6.945) | 16.327±2.609, 64 (20.567/18.577/16.174/14.642/10.313) | 13.956±3.649, 283 (19.609/16.972/14.016/12.608/0.949) | 16.300±3.457, 28 (19.609/18.716/17.832/14.263/8.432) | 15.254±4.259, 74 (20.336/18.810/15.601/12.647/2.815) | 2.26E-37 | 15.218±3.738, 985 (20.567/18.332/15.716/13.607/0.949) |
| SIFT | 0.940±0.151, 383 (1.000/1.000/1.000/0.970/0.000) | 0.956±0.140, 32 (1.000/1.000/1.000/1.000/0.360) | 0.906±0.172, 52 (1.000/1.000/0.990/0.868/0.210) | 0.888±0.204, 59 (1.000/1.000/0.990/0.875/0.000) | 0.918±0.184, 242 (1.000/1.000/1.000/0.933/0.000) | 0.922±0.205, 27 (1.000/1.000/0.990/0.965/0.000) | 0.921±0.168, 62 (1.000/1.000/1.000/0.953/0.250) | 0.247 | 0.927±0.169, 857 (1.000/1.000/1.000/0.950/0.000) |
| PolyPhen-2 HDIV | 0.917±0.229, 389 (1.000/1.000/1.000/0.988/0.000) | 0.906±0.290, 32 (1.000/1.000/1.000/0.998/0.009) | 0.964±0.151, 52 (1.000/1.000/1.000/0.996/0.004) | 0.821±0.330, 58 (1.000/1.000/0.989/0.858/0.001) | 0.870±0.269, 242 (1.000/1.000/0.999/0.920/0.000) | 0.909±0.252, 27 (1.000/1.000/1.000/0.999/0.000) | 0.883±0.276, 63 (1.000/1.000/1.000/0.995/0.019) | 0.0247 | 0.897±0.253, 863 (1.000/1.000/1.000/0.975/0.000) |
| PolyPhen-2 HVAR | 0.850±0.280, 389 (1.000/0.999/0.995/0.880/0.000) | 0.893±0.292, 32 (1.000/1.000/1.000/0.973/0.005) | 0.925±0.174, 52 (1.000/1.000/0.998/0.957/0.016) | 0.704±0.334, 58 (1.000/0.991/0.806/0.530/0.001) | 0.781±0.319, 242 (1.000/0.999/0.980/0.589/0.000) | 0.882±0.270, 27 (1.000/1.000/0.986/0.974/0.007) | 0.821±0.324, 63 (1.000/0.999/0.994/0.896/0.002) | 1.88E-04 | 0.826±0.297, 863 (1.000/0.999/0.992/0.793/0.000) |
| LRT | 0.999±0.005, 385 (1.000/1.000/1.000/1.000/0.936) | 0.971±0.137, 23 (1.000/1.000/1.000/1.000/0.341) | 0.994±0.046, 52 (1.000/1.000/1.000/1.000/0.668) | 1.000±0.003, 58 (1.000/1.000/1.000/1.000/0.975) | 0.994±0.029, 239 (1.000/1.000/1.000/1.000/0.704) | 1.000±0.000, 27 (1.000/1.000/1.000/1.000/0.998) | 0.997±0.012, 62 (1.000/1.000/1.000/1.000/0.948) | 5.86E-04 | 0.997±0.030, 846 (1.000/1.000/1.000/1.000/0.341) |
| MutationTaster | 0.988±0.101, 428 (1.000/1.000/1.000/1.000/0.001) | 0.752±0.437, 32 (1.000/1.000/1.000/0.764/0.000) | 0.983±0.128, 61 (1.000/1.000/1.000/1.000/0.000) | 0.984±0.125, 64 (1.000/1.000/1.000/1.000/0.000) | 0.968±0.165, 266 (1.000/1.000/1.000/1.000/0.000) | 0.967±0.173, 28 (1.000/1.000/1.000/1.000/0.087) | 0.991±0.069, 66 (1.000/1.000/1.000/1.000/0.439) | 3.53E-14 | 0.974±0.153, 945 (1.000/1.000/1.000/1.000/0.000) |
| Mutation Assessor | 0.720±0.086, 388 (0.837/0.796/0.734/0.674/0.442) | 0.752±0.077, 30 (0.816/0.805/0.778/0.728/0.553) | 0.660±0.085, 61 (0.802/0.715/0.666/0.605/0.464) | 0.671±0.083, 64 (0.886/0.703/0.680/0.636/0.356) | 0.638±0.054, 237 (0.697/0.679/0.655/0.619/0.434) | 0.665±0.047, 27 (0.706/0.697/0.677/0.658/0.547) | 0.707±0.106, 62 (0.887/0.788/0.732/0.661/0.434) | 6.27E-36 | 0.688±0.087, 869 (0.887/0.757/0.685/0.641/0.356) |
| FATHMM | 0.587±0.030, 389 (0.683/0.604/0.583/0.560/0.541) | 0.531±0.052, 32 (0.640/0.578/0.503/0.492/0.470) | 0.531±0.020, 52 (0.563/0.546/0.534/0.522/0.479) | 0.512±0.032, 58 (0.575/0.530/0.507/0.497/0.405) | 0.577±0.037, 241 (0.685/0.603/0.576/0.546/0.494) | 0.582±0.034, 27 (0.626/0.612/0.588/0.544/0.537) | 0.542±0.042, 63 (0.612/0.556/0.550/0.537/0.408) | 9.03E-73 | 0.570±0.042, 862 (0.685/0.598/0.571/0.547/0.405) |
| RadialSVM score | 0.663±0.034, 389 (0.685/0.680/0.673/0.661/0.393) | 0.626±0.062, 32 (0.684/0.663/0.641/0.617/0.374) | 0.598±0.120, 61 (0.682/0.671/0.649/0.588/0.234) | 0.584±0.110, 64 (0.682/0.656/0.628/0.574/0.246) | 0.644±0.066, 244 (0.725/0.678/0.671/0.639/0.275) | 0.667±0.017, 27 (0.683/0.679/0.673/0.663/0.611) | 0.628±0.104, 63 (0.682/0.679/0.672/0.658/0.274) | 5.80E-22 | 0.644±0.072, 880 (0.725/0.679/0.670/0.646/0.234) |
| LR score | 0.945±0.061, 389 (0.996/0.982/0.964/0.937/0.495) | 0.863±0.103, 32 (0.992/0.947/0.868/0.808/0.470) | 0.778±0.235, 61 (0.966/0.915/0.881/0.755/0.110) | 0.744±0.215, 64 (0.978/0.882/0.810/0.707/0.075) | 0.892±0.127, 244 (0.994/0.970/0.926/0.871/0.149) | 0.940±0.036, 27 (0.987/0.978/0.932/0.912/0.871) | 0.856±0.201, 63 (0.989/0.956/0.934/0.900/0.153) | 6.11E-38 | 0.895±0.143, 880 (0.996/0.971/0.944/0.879/0.075) |
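Because each cell reports a mean, a standard deviation and N, a one-way ANOVA F statistic across the seven genes can be recovered from the summary statistics alone. The sketch below does this for the first data row of the table (2.120±0.693, N = 437 for GCK, and so on); `anova_from_summary` and `FIRST_ROW` are hypothetical names, and whether the authors computed the ANOVA column this way, or with unequal-variance corrections, is an assumption. Turning F into a P value additionally needs the F distribution, which is left out here.

```python
def anova_from_summary(groups):
    """One-way ANOVA from per-group (mean, sd, n) tuples.

    Returns (grand_mean, f_statistic, df_between, df_within).
    """
    total_n = sum(n for _, _, n in groups)
    grand_mean = sum(mean * n for mean, _, n in groups) / total_n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for mean, _, n in groups)
    # Within-group sum of squares: rebuilt from each sample standard deviation.
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return grand_mean, f_stat, df_between, df_within


# (mean, sd, n) per gene, copied from the first data row of the table.
FIRST_ROW = {
    "GCK": (2.120, 0.693, 437),
    "INS": (1.206, 0.852, 38),
    "KCNJ11": (2.046, 0.444, 61),
    "ABCC8": (2.219, 0.556, 64),
    "HNF1A": (1.793, 0.791, 283),
    "HNF1B": (2.377, 0.652, 28),
    "HNF4A": (1.894, 0.841, 74),
}

if __name__ == "__main__":
    grand_mean, f_stat, df_b, df_w = anova_from_summary(list(FIRST_ROW.values()))
    print(f"grand mean={grand_mean:.3f}, F({df_b}, {df_w})={f_stat:.2f}")
```

A useful sanity check is that the N-weighted grand mean reproduces the mean in the Overall column of the same row.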
Comparisons of the performances of different methods by MCCs [Average difference (P value)].
| Methods | FATHMM | LRT | GERP++ RS | SIFT | PhyloP | SiPhy | MutationTaster | Mutation Assessor | LR score | PolyPhen-2 HDIV | PolyPhen-2 HVAR |
| LRT | 0.061 (P = 0.018) | ||||||||||
| GERP++ RS | 0.061 (P = 0.346) | 0.001 (P = 0.991) | |||||||||
| SIFT | 0.064 (P = 0.303) | 0.004 (P = 0.959) | 0.003 (P = 0.963) | ||||||||
| PhyloP | 0.076 (P = 0.293) | 0.015 (P = 0.837) | 0.015 (P = 0.452) | 0.012 (P = 0.843) | |||||||
| SiPhy | 0.113 (P = 0.059) | 0.052 (P = 0.362) | 0.052 (P = 0.104) | 0.049 (P = 0.473) | 0.037 (P = 0.382) | ||||||
| MutationTaster | 0.117 (P = 0.056) | 0.056 (P = 0.317) | 0.056 (P = 0.266) | 0.053 (P = 0.423) | 0.041 (P = 0.497) | 0.004 (P = 0.889) | |||||
| Mutation Assessor | 0.153 (P = 0.012) | 0.093 (P = 0.106) | 0.092 (P = 0.16) | 0.089 (P = 0.127) | 0.077 (P = 0.267) | 0.04 (P = 0.455) | 0.036 (P = 0.516) | ||||
| LR score | 0.188 (P = 0.000174) | 0.127 (P = 0.013) | 0.127 (P = 0.081) | 0.124 (P = 0.063) | 0.112 (P = 0.143) | 0.075 (P = 0.122) | 0.071 (P = 0.133) | 0.035 (P = 0.404) | |||
| PolyPhen-2 HDIV | 0.21 (P = 0.00144) | 0.149 (P = 0.021) | 0.148 (P = 0.00881) | 0.145 (P = 0.015) | 0.134 (P = 0.019) | 0.097 (P = 0.019) | 0.093 (P = 0.056) | 0.056 (P = 0.158) | 0.022 (P = 0.537) | ||
| PolyPhen-2 HVAR | 0.211 (P = 0.000444) | 0.15 (P = 0.0064) | 0.15 (P = 0.01) | 0.147 (P = 0.037) | 0.135 (P = 0.034) | 0.098 (P = 0.01) | 0.094 (P = 0.026) | 0.058 (P = 0.192) | 0.023 (P = 0.445) | 0.001 (P = 0.934) | |
| RadialSVM score | 0.231 (P = 0.0023) | 0.17 (P = 0.01) | 0.17 (P = 0.033) | 0.167 (P = 0.062) | 0.155 (P = 0.075) | 0.118 (P = 0.05) | 0.114 (P = 0.073) | 0.078 (P = 0.05) | 0.043 (P = 0.341) | 0.021 (P = 0.602) | 0.02 (P = 0.577) |