Tien-Dao Luu, Alin Rusu, Vincent Walter, Benjamin Linard, Laetitia Poidevin, Raymond Ripp, Luc Moulinier, Jean Muller, Wolfgang Raffelsberger, Nicolas Wicker, Odile Lecompte, Julie D Thompson, Olivier Poch, Hoan Nguyen.
Abstract
A major challenge in the post-genomic era is to better understand how human genetic alterations involved in disease affect gene products. The KD4v (Comprehensible Knowledge Discovery System for Missense Variant) server characterizes missense variants and predicts their phenotypic effects (deleterious/neutral). The server provides a set of rules learned by Inductive Logic Programming (ILP) on a set of missense variants described by conservation, physico-chemical, functional and 3D structure predicates. These rules are interpretable by non-experts and are used to predict the deleterious/neutral status of an unknown mutation. The web server is available at http://decrypthon.igbmc.fr/kd4v.
Year: 2012 PMID: 22641855 PMCID: PMC3394327 DOI: 10.1093/nar/gks474
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. ILP rules. The first column provides a link to the positive (deleterious mutations) and negative (neutral mutations) examples covered by a given rule, which can be displayed by clicking on the + icon. The second column provides the rule identifier (Id). The next two columns provide the ‘if’ and ‘then’ clauses of the induced rules. The two rightmost columns indicate the number of positive and negative examples covered by the rule in each row.
Figure 2. (a) Screenshot of the input form of the prediction service. (b) Screenshot of the output page providing the prediction results as well as the multi-level characterizations of the mutation. The rules are described if the variant is ‘deleterious’. The annotated information related to the mutated position can be visualized in the MSV3d interface on the right.
Comparison of prediction methods based on the PolyPhen-2 validation set [658 disease-causing (OMIM phenotype) mutations and 298 neutral polymorphisms]
| Method | TP | FP | FN | TN | Sensitivity | Specificity | Precision | Recall | Accuracy | F-measure |
|---|---|---|---|---|---|---|---|---|---|---|
| SIFT | 398 | 38 | 260 | 260 | 0.6049 | 0.8725 | 0.9128 | 0.6049 | 0.6883 | 0.7276 |
| PolyPhen-2 | 576 | 111 | 77 | 184 | 0.8821 | 0.6237 | 0.8384 | 0.8821 | 0.8017 | 0.8597 |
| KD4v | 487 | 94 | 171 | 204 | 0.7401 | 0.6846 | 0.8382 | 0.7401 | 0.7228 | 0.7861 |
Comparison of prediction methods based on the validation set that excludes proteins present in the training set [173 disease-causing (OMIM phenotype) mutations and 179 neutral polymorphisms]
| Method | TP | FP | FN | TN | Sensitivity | Specificity | Precision | Recall | Accuracy | F-measure |
|---|---|---|---|---|---|---|---|---|---|---|
| SIFT | 106 | 23 | 67 | 156 | 0.6127 | 0.8715 | 0.8217 | 0.6127 | 0.7443 | 0.702 |
| PolyPhen-2 | 139 | 70 | 34 | 109 | 0.8035 | 0.6089 | 0.6651 | 0.8035 | 0.7045 | 0.7278 |
| KD4v | 108 | 21 | 65 | 158 | 0.6243 | 0.8827 | 0.8372 | 0.6243 | 0.7557 | 0.7152 |
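
The derived columns in both tables can be recomputed from the four confusion-matrix counts. The sketch below is not taken from the KD4v server; it assumes the standard definitions of each metric (with deleterious as the positive class and recall identical to sensitivity), which reproduce the tabulated values to four decimal places. The names `ConfusionCounts` and `metrics` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier (hypothetical helper type)."""
    tp: int  # deleterious variants predicted deleterious
    fp: int  # neutral variants predicted deleterious
    fn: int  # deleterious variants predicted neutral
    tn: int  # neutral variants predicted neutral


def metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard metric definitions, assumed to match those used in the tables."""
    sensitivity = c.tp / (c.tp + c.fn)  # also reported as recall
    specificity = c.tn / (c.tn + c.fp)
    precision = c.tp / (c.tp + c.fp)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)
    # Harmonic mean of precision and recall
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": sensitivity,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # KD4v row of the validation set that excludes training-set proteins
    kd4v = ConfusionCounts(tp=108, fp=21, fn=65, tn=158)
    for name, value in metrics(kd4v).items():
        print(f"{name}: {value:.4f}")
```

Note that accuracy is computed over the variants each method actually classified. In the first table the PolyPhen-2 counts sum to 948 rather than 956, so its denominator differs from that of SIFT and KD4v.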