Jian Tian, Ningfeng Wu, Xuexia Guo, Jun Guo, Juhua Zhang, Yunliu Fan.
Abstract
BACKGROUND: Human genetic variations primarily result from single nucleotide polymorphisms (SNPs), which occur approximately once every 1000 bases in the overall human population. The non-synonymous SNPs (nsSNPs) that lead to amino acid changes in the protein product may account for nearly half of the known genetic variations linked to inherited human diseases. One of the key problems of medical genetics today is to identify the nsSNPs that underlie disease-related phenotypes in humans. The development of computational tools that can identify such nsSNPs would therefore enhance our understanding of genetic diseases and help predict them.
Year: 2007 PMID: 18005451 PMCID: PMC2216041 DOI: 10.1186/1471-2105-8-450
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Brief flow chart illustrating the prediction procedure of Parepro. First, the position-specific amino acid probabilities (PSAP) of the target sequence are calculated. Second, three attribute sets are constructed using the PSAP information in combination with the RD, MI, and IE properties of the amino acids. Finally, the complex vector of Parepro is integrated and used to predict the effect of an nsSNP.
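The caption does not say how the PSAP values are obtained. As a rough illustration only, the sketch below assumes they can be approximated by the relative frequency of each amino acid in one column of a multiple sequence alignment of homologs. The function name `column_probabilities` and the toy alignment are hypothetical and are not taken from Parepro.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_probabilities(alignment, position):
    """Relative frequency of each amino acid in one alignment column.

    Assumption: gaps and non-standard symbols are simply ignored.
    """
    counts = Counter(
        seq[position] for seq in alignment if seq[position] in AMINO_ACIDS
    )
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}

# Hypothetical toy alignment: four aligned sequences of length 4.
toy_alignment = ["MKVL", "MRVL", "MKIL", "M-VL"]
probs = column_probabilities(toy_alignment, 1)
print(probs["K"], probs["R"])
```

A real pipeline would more likely derive such probabilities from a profile built by a homology search, with pseudocounts and sequence weighting; none of that is modeled here.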
The prediction performance of the Parepro attribute sets when applied alone or in combination
| Attribute set | Sensitivity | Specificity | Q2 | MCC |
| RD | 0.78 | 0.68 | 0.75 | 0.46 |
| MI | 0.79 | 0.66 | 0.74 | 0.46 |
| IE | 0.75 | 0.56 | 0.67 | 0.32 |
| RD+MI | 0.81 | 0.67 | 0.75 | 0.49 |
| RD+IE | 0.80 | 0.68 | 0.75 | 0.47 |
| MI+IE | 0.80 | 0.66 | 0.75 | 0.47 |
| Parepro | 0.82 | 0.67 | 0.76 | 0.50 |
Q2: the overall accuracy
MCC: Matthews correlation coefficient
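For readers who want to recompute the four columns of the table, the sketch below uses the standard confusion-matrix definitions, assuming disease-associated nsSNPs are the positive class. The record does not state the exact formulas used by the authors, so the definitions and the counts in the example are assumptions for illustration.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, Q2 and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    q2 = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "Q2": q2,
        "MCC": mcc,
    }

# Hypothetical counts, not taken from the paper.
m = binary_metrics(tp=80, fn=20, tn=70, fp=30)
print(m)
```

Unlike Q2, MCC takes all four cells of the confusion matrix into account, which is why it is reported alongside the overall accuracy when the two classes are unbalanced.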
Range of the number of homologous sequences
| Subset name | Range of the number of homologous sequences* | Proteins within the range (%) | Mutations within the range (%) |
| F1 | [0,0] | 12.29 | 8.28 |
| F2 | [1,3] | 18.93 | 17.31 |
| F3 | [4,6] | 11.84 | 9.27 |
| F4 | [7,9] | 7.20 | 6.78 |
| F5 | [10,14] | 9.70 | 10.65 |
| F6 | [15,25] | 9.06 | 11.84 |
| F7 | [26,1000] | 30.97 | 35.86 |
*[a, b] denotes that the number of homologous sequences of the target protein lies between a and b, inclusive.
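The partition into subsets can be expressed as a simple lookup. The boundaries below are copied from the table; the function name `assign_subset` is hypothetical.

```python
# Inclusive ranges [a, b] for each subset, copied from the table.
SUBSETS = [
    ("F1", 0, 0),
    ("F2", 1, 3),
    ("F3", 4, 6),
    ("F4", 7, 9),
    ("F5", 10, 14),
    ("F6", 15, 25),
    ("F7", 26, 1000),
]

def assign_subset(n_homologs):
    """Return the subset name for a protein with n_homologs homologous sequences."""
    for name, low, high in SUBSETS:
        if low <= n_homologs <= high:
            return name
    raise ValueError(f"no subset covers {n_homologs} homologous sequences")

print(assign_subset(0), assign_subset(12), assign_subset(26))
```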
Figure 2. The overall accuracy (Q2) and Matthews correlation coefficient (MCC) of Parepro when testing the subsets from F1 to F7. The x-axis denotes the test subsets F1 to F7, and the y-axis denotes the overall accuracy (Q2) or the Matthews correlation coefficient (MCC).
Figure 3. Average prediction accuracy calculated cumulatively for predictions with RI above a given value. For example, about 66% of all nsSNPs have RI ≥ 6, and about 88% of these nsSNPs are correctly predicted. The result is based on the HumVar dataset.
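The cumulative calculation described in the caption can be sketched as follows: for a given threshold, keep only predictions whose RI is at least that value, then report the fraction kept and the accuracy within it. The record layout `(ri, is_correct)` and the sample values are hypothetical, and the record does not say how RI itself is derived.

```python
def cumulative_accuracy(records, threshold):
    """Return (coverage, accuracy) for predictions with RI >= threshold.

    records: list of (ri, is_correct) pairs.
    accuracy is None when no prediction reaches the threshold.
    """
    selected = [ok for ri, ok in records if ri >= threshold]
    coverage = len(selected) / len(records) if records else 0.0
    accuracy = sum(selected) / len(selected) if selected else None
    return coverage, accuracy

# Hypothetical predictions, not taken from the paper.
sample = [(9, True), (7, True), (6, False), (3, True), (1, False)]
coverage, accuracy = cumulative_accuracy(sample, threshold=6)
print(coverage, accuracy)
```

Sweeping the threshold over all RI values and plotting accuracy against it would give a curve of the kind the figure describes.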
Comparison of performance between Parepro and other methods using the HumVar dataset
| Prediction Method | Sensitivity | Specificity | Q2 | MCC | PM (%) |
| PolyPhen | 0.62 | 0.80 | 0.72 | 0.44 | 93 |
| SIFT | 0.76 | 0.56 | 0.67 | 0.33 | 94 |
| HybridMeth | 0.80 | 0.65 | 0.74 | 0.46 | 100 |
| Parepro | 0.82 | 0.67 | 0.76 | 0.50 | 100 |
The prediction results of PolyPhen, SIFT and HybridMeth were obtained from Capriotti et al. [10].
Q2: the overall accuracy
MCC: Matthews correlation coefficient
PM is the percentage of predicted mutations.
Comparison of performance parameters of Parepro with other methods using the NewHumVar dataset
| Prediction Method | Sensitivity | Specificity | Q2 | MCC | PM (%) |
| PolyPhen | 0.30 | 0.92 | 0.72 | 0.28 | 79 |
| SIFT | 0.32 | 0.87 | 0.69 | 0.22 | 88 |
| HybridMeth | 0.34 | 0.94 | 0.73 | 0.36 | 100 |
| Parepro | 0.40 | 0.94 | 0.78 | 0.42 | 100 |
The prediction results of PolyPhen, SIFT and HybridMeth were obtained from Capriotti et al. [10].
Q2: the overall accuracy
MCC: Matthews correlation coefficient
PM is the percentage of predicted mutations.