Tirso Pons, Miguel Vazquez, María Luisa Matey-Hernandez, Søren Brunak, Alfonso Valencia, Jose Mg Izarzugaza.
Abstract
BACKGROUND: The association between aberrant signal processing by protein kinases and human diseases such as cancer was established long ago. However, understanding the link between sequence variants in the protein kinase superfamily and the mechanistic complex traits at the molecular level remains challenging: cells tolerate most genomic alterations, and only a minor fraction disrupt molecular function sufficiently to drive disease.
Keywords: Functional impact; Pathogenicity prediction; Protein kinases; Variant prioritization; X-linked agammaglobulinemia
Year: 2016 PMID: 27357839 PMCID: PMC4928150 DOI: 10.1186/s12864-016-2723-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Performance and classification features. a Performance of the classifier with respect to the number of trees in the random forest; b idem, close-up on the region around the performance values; c Number of variants in each kinase group; d log odds-ratio of the number of variants in each kinase group; e Number of variants in each kinase domain; f log odds-ratio of the number of variants in each kinase domain; g Changes in Cbeta-branching caused by pathogenic and neutral variants; h Number of pathogenic and neutral variants affecting catalytic sites as defined by UniProt, FireDB and Phospho.ELM; i Distribution of SIFT scores; j Changes in volume caused by disease-associated and neutral variants; k Changes in hydrophobicity caused by disease-associated and neutral variants; l Accumulated Gene Ontology (GO) log odds-ratio. Where relevant, disease-associated variants are shown in dark red and their neutral counterparts in ochre.
Relevance of prediction features ranked according to the information gain with respect to the class
| Rank | Gain | Feature | Rank | Gain | Feature |
|---|---|---|---|---|---|
| 1 | 0.4914 | Gene Ontology | 14 | 4.79e-3 | Binding (UniProt) |
| 2 | 0.1787 | SIFT | 15 | 4.43e-3 | Np_bind (UniProt) |
| 3 | 0.1197 | Kinase group | 16 | 3.38e-3 | Repeat (UniProt) |
| 4 | 0.1121 | PFAM domain | 17 | 2.47e-3 | Phospho.ELM |
| 5 | 0.0438 | Wild type amino ac. | 18 | 2.37e-3 | Zn finger (UniProt) |
| 6 | 0.0373 | Hydrophobicity | 19 | 1.82e-3 | Modified res. (UniProt) |
| 7 | 0.0368 | Alternative amino ac. | 20 | 1.51e-3 | Metal binding (UniProt) |
| 8 | 0.0353 | Volume change | 21 | 9.4e-4 | Signal peptide (UniProt) |
| 9 | 0.0239 | FireDB residue | 22 | 7.71e-4 | Active site (UniProt) |
| 10 | 8.94e-3 | Any uniprot | 23 | 6.86e-4 | Carbohyd (UniProt) |
| 11 | 7.70e-3 | Formal charge | 24 | 5.02e-4 | Site (UniProt) |
| 12 | 6.80e-3 | Cbeta Branching | 25 | 5.33e-5 | Transmembrane (UniProt) |
| 13 | 6.02e-3 | Disulfid (UniProt) | | | |
Ranking calculated with the InfoGainAttributeEval function in Weka. Features that are specifically related to the protein kinase superfamily rank among the most informative ones
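The ranking criterion can be illustrated with a minimal sketch. Weka's InfoGainAttributeEval scores an attribute as H(Class) − H(Class | Attribute); the Python below computes the same quantity for nominal features only (it does not reproduce Weka's handling of numeric attributes). The toy feature names and values are hypothetical and are not taken from the KinMutRF dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(feature_values, labels):
    """H(class) - H(class | feature) for a nominal feature."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    by_value = {}
    for value, label in zip(feature_values, labels):
        by_value.setdefault(value, []).append(label)
    # Weighted average of the entropy within each group.
    conditional = sum(
        len(subset) / total * entropy(subset) for subset in by_value.values()
    )
    return entropy(labels) - conditional


# Hypothetical toy data: four variants with a class label and two features.
labels = ["disease", "disease", "neutral", "neutral"]
features = {
    "kinase_group": ["TK", "TK", "CMGC", "CMGC"],  # separates the classes
    "charge_change": ["yes", "no", "yes", "no"],   # carries no information
}

gains = {name: information_gain(vals, labels) for name, vals in features.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
```

Sorting features by this score, as in `ranking`, yields a table of the same shape as the one above.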
Benchmark of KinMutRF with respect to other methods
| Method | Accuracy | Precision | Recall | F-score | MCC |
|---|---|---|---|---|---|
| MutationTaster | 0.56 | 0.38 | | 0.55 | 0.36 |
| SIFT | 0.68 | 0.45 | 0.81 | 0.58 | 0.39 |
| Polyphen2:HDIV | 0.66 | 0.44 | 0.90 | 0.59 | 0.42 |
| LRT | 0.65 | 0.45 | 0.87 | 0.59 | 0.39 |
| MutationAssessor | 0.76 | 0.55 | 0.66 | 0.60 | 0.43 |
| CADD | 0.76 | 0.54 | 0.77 | 0.64 | 0.48 |
| Polyphen2:HVAR | 0.64 | 0.53 | 0.85 | 0.65 | 0.50 |
| FATHMM | 0.82 | 0.69 | 0.63 | 0.66 | 0.54 |
| VEST | 0.87 | 0.74 | 0.82 | | |
| KinMutRF | | | 0.75 | | 0.68 |
Prediction performance in a 10-fold cross-validation experiment on the 3689 kinase variants for which UniProt provides a characterization of pathogenicity. In bold, the best score for each performance measure
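The source does not spell out the formulas behind the five measures in the table. The sketch below uses their standard definitions from a binary confusion matrix, assuming that disease-associated is the positive class. The counts are hypothetical and do not come from the benchmark.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard performance measures from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "mcc": mcc,
    }


# Hypothetical counts for one cross-validation fold.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

In a 10-fold cross-validation the counts are typically either pooled across folds before computing the measures or the per-fold measures are averaged; the source does not state which convention was used.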
Summary of the KinMutRF and PON-BTK prediction results
| Method | Pathogenic: Prediction | Pathogenic: Overlap | Pathogenic: Diff. | Neutral: Prediction | Neutral: Overlap | Neutral: Diff. |
|---|---|---|---|---|---|---|
| KinMutRF | 1285 (85.9 %) | 967 | 318 | 210 (14.1 %) | 174 | 36 |
| PON-BTK | 1003 (67.1 %) | 967 | 36 | 492 (32.9 %) | 174 | 318 |
Prediction: indicates the total number of BTK variants predicted as pathogenic and neutral. Numbers in parenthesis represent the percentage from a maximum of 1495 possible nonsynonymous variants. Overlap: total number of BTK variants predicted as pathogenic and neutral by KinMutRF and PON-BTK. Diff.: total number of BTK variants with different predictions by KinMutRF and PON-BTK
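The relation between the three columns can be checked with simple arithmetic: for each method and class, Diff. equals Prediction minus Overlap. The sketch below applies this to the prediction and overlap counts reported in the table; the variable names are illustrative only.

```python
# Counts taken from the summary table of BTK predictions.
TOTAL_VARIANTS = 1495
predicted = {
    "pathogenic": {"KinMutRF": 1285, "PON-BTK": 1003},
    "neutral": {"KinMutRF": 210, "PON-BTK": 492},
}
overlap = {"pathogenic": 967, "neutral": 174}

# Diff. = variants a method assigns to a class that the other method does not.
diff = {
    label: {method: n - overlap[label] for method, n in counts.items()}
    for label, counts in predicted.items()
}

# Each method classifies every variant exactly once.
totals = {
    method: predicted["pathogenic"][method] + predicted["neutral"][method]
    for method in ("KinMutRF", "PON-BTK")
}
```

A variant that one method calls pathogenic and the other does not must be called neutral by the other, so the pathogenic Diff. of one method equals the neutral Diff. of the other.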
Fig. 2 Prediction of pathogenicity for variants uncharacterised in UniProt. a Distribution of predictions of pathogenicity in the different protein kinases; b Fraction of predictions as disease-associated and neutral; c Distribution of predictions of pathogenicity in the different groups in the taxonomy of protein kinases; d Distribution of predictions of pathogenicity with respect to PFAM domains; e Distribution of the PFAM domain log odds-ratios for neutral and disease-associated variants; f Distribution of the accumulated Gene Ontology log odds-ratios (sumGOlor) for neutral and disease-associated variants