Jose M G Izarzugaza, Martin Krallinger, Alfonso Valencia.
Abstract
Protein kinases play a crucial role in many physiological functions, and a number of mutations in this superfamily have been reported in the literature to disrupt protein structure and/or function. Computational and experimental research aims to uncover the mechanistic connection between mutations in protein kinases and disease, with the ultimate goal of predicting the consequences of mutations on protein function and the subsequent phenotypic alterations. In this article, we review the possibilities and limitations of current computational methods for predicting the pathogenicity of mutations in the protein kinase superfamily. In particular, we focus on the problem of benchmarking the predictions with independent gold standard datasets. We propose a pipeline for the curation of mutations automatically extracted from the literature. Since many of these mutations are not included in the databases commonly used to train computational predictors of the pathogenicity of protein kinase mutations, we propose using them to build a valuable gold standard dataset for benchmarking a number of these predictors. Finally, we discuss how text mining approaches constitute a powerful tool for interpreting the consequences of mutations in the context of disease genome analysis, with particular focus on cancer.
Keywords: disease; kinase; literature mining; mutation; pathogenicity prediction; protein kinase; text mining; variation
Year: 2012 PMID: 23055974 PMCID: PMC3449330 DOI: 10.3389/fphys.2012.00323
Source DB: PubMed Journal: Front Physiol ISSN: 1664-042X Impact factor: 4.566
Summary of methods to predict the pathogenicity of mutations.
| Method | Main features | Further information |
|---|---|---|
| SIFT (Ng and Henikoff, | Threshold-based, conservation | |
| PMUT (Ferrer-Costa et al., | Neural Network, sequence- and structure-based features | |
| SNPs3D (Yue et al., | Support Vector Machine, structure-based features | |
| PANTHER (Thomas et al., | Threshold-based, conservation (PSEC) | |
| Pfam LogRE (Clifford et al., | Threshold-based, probability of a PFAM domain to be pathogenic using a log-odds ratio | |
| LS-SNP (Karchin, | Support Vector Machine, sequence- and structure-based features | |
| CanPredict (Kaminker et al., | Combines SIFT, Pfam LogRE, and Gene Ontology terms in a single prediction | |
| SNAP (Bromberg and Rost, | Neural Network, sequence- and structure-based features | |
| Torkamani (Torkamani and Schork, | Support Vector Machine, sequence- and structure-based features, kinase-specific | |
| MutaGeneSys (Stoyanovich and Pe’er, | Whole-genome marker correlation dataset to identify association to causal SNPs in OMIM | |
| stSNP (Uzun et al., | Integrates non-synonymous SNPs from dbSNP, structural models from Modeler and KEGG pathways. Comparative native/mutant analysis | |
| F-SNP (Lee and Shatkay, | Metaserver, combines PolyPhen, SNPeffect2.0, SNPs3D, LS-SNP | |
| SNPs&GO (Calabrese et al., | Support Vector Machine, several sequence-derived features, and information from Gene Ontology terms | |
| PolyPhen-2 (Adzhubei et al., | Bayesian classifier, sequence- and structure-based features | |
| MuD (Wainreb et al., | Random forest, sequence- and structure-based features | |
| CHASM (Wong et al., | Random forest, sequence-based features | |
| Mutation Assessor (Reva et al., | Threshold-based, differential evolutionary conservation in subfamilies | |
| Condel (González-Pérez and López-Bigas, | Metaserver, combines the output of other predictors | |
| wKinMut (Izarzugaza et al., | Framework for the analysis of kinase mutations. Integrates annotations, predictions, and information from the literature | |
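
Most of the predictors summarized above ultimately yield a binary call (pathogenic or neutral) per mutation, which is what makes benchmarking against an independent gold standard possible. The following is a minimal sketch of such a comparison, not the evaluation protocol of any listed method. The function names, the protein identifiers (`KIN1`, `KIN2`, `KIN3`), and the mutations are hypothetical toy values chosen only to make the arithmetic easy to follow.

```python
from math import sqrt

def confusion_counts(predicted, gold):
    """Return (TP, FP, TN, FN) for binary calls, where True means pathogenic.

    Mutations in the gold standard that the predictor did not score are skipped.
    """
    tp = fp = tn = fn = 0
    for key, truth in gold.items():
        if key not in predicted:
            continue  # no call available for this mutation
        call = predicted[key]
        if call and truth:
            tp += 1
        elif call and not truth:
            fp += 1
        elif not call and not truth:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0

def matthews_cc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when any marginal is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy data (hypothetical, not taken from any database or from the article).
gold = {
    ("KIN1", "A10V"): True,
    ("KIN1", "G25D"): True,
    ("KIN2", "L40P"): False,
    ("KIN2", "R55K"): False,
    ("KIN3", "T70M"): True,
}
predicted = {
    ("KIN1", "A10V"): True,
    ("KIN1", "G25D"): False,
    ("KIN2", "L40P"): False,
    ("KIN2", "R55K"): True,
    ("KIN3", "T70M"): True,
}

counts = confusion_counts(predicted, gold)
print(counts, accuracy(*counts), matthews_cc(*counts))
```

Reporting the Matthews correlation coefficient alongside accuracy is a common choice when pathogenic and neutral mutations are unevenly represented, since accuracy alone can look favorable for a predictor that always returns the majority class.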
Summary of resources providing information about kinases and mutations.
| Resource | Description | Further information |
|---|---|---|
| UniProt (Consortium, | General information about proteins, including human protein kinases | |
| PDB (Berman et al., | Catalog of protein structures, protein kinases widely represented | |
| PDBsum (Laskowski et al., | Annotation on protein structures | |
| KinBase (Manning et al., | Hierarchical classification of protein kinases | |
| SwissVar (Yip et al., | Detailed information about mutations present in UniProt | |
| COSMIC (Bamford et al., | Catalog of somatic mutations in cancer | |
| Ensembl (Flicek et al., | Infrastructure for the integrated annotation on chordate and selected eukaryotic genomes | |
| dbSNP (Sherry et al., | Annotated catalog of SNPs | |
| HapMap (Consortium et al., | Catalog of common genetic variants in the human genome | |
| 1000 Genomes (Consortium et al., | Deep catalog of human variations derived from the next-generation sequencing of 1000 people | |
| TCGA (Network, | The Cancer Genome Atlas is a collection of genetic variations found in 20 different cancers | |
| ICGC (Consortium et al., | The International Cancer Genome Consortium project aims at a comprehensive description of genomic, transcriptomic, and epigenomic changes in 50 tumor types and sub-types | |
| OMIM (Amberger et al., | Catalog of Mendelian mutations known to cause disease | |
| SAAPdb (Hurst et al., | Calculation of the structural consequences of mutations | |
| SNPeffect 2.0 (Reumers et al., | A database mapping molecular phenotypic effects of human non-synonymous coding SNPs | |
| ModBase (Pieper et al., | Structural models of mutant proteins | |
| TopoSNP (Stitziel et al., | Topographic database of non-synonymous single nucleotide polymorphisms with and without known disease association | |
| MoKCa (Richardson et al., | Annotated catalog of cancer-associated mutations in protein kinases | |
| KinMutBase (Ortutay et al., | Registry of disease-causing mutations in protein kinase domains | |
Figure 1. SNP2L pipeline as an example of a typical automatic method to extract mutation mentions from the literature. The pipeline integrates article retrieval, detection of mutations and proteins in the corresponding article, correct mutation-protein association and, finally, validation of the results.
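
The stages named in the caption can be illustrated with a small sketch. This is not the SNP2L implementation: it skips article retrieval, assumes sentence-level co-mention as the association rule, and assumes a wild-type residue check against the protein sequence as the validation step. The protein names (`KINA`, `KINB`), their sequences, and the example text are hypothetical.

```python
import re
from dataclasses import dataclass

AMINO = "ACDEFGHIKLMNPQRSTVWY"
# One-letter point mutations such as "A4V": wild type, position, mutant residue.
MUTATION_RE = re.compile(rf"\b([{AMINO}])(\d+)([{AMINO}])\b")

@dataclass(frozen=True)
class Candidate:
    protein: str
    wild_type: str
    position: int
    mutant: str

def split_sentences(text):
    """Naive sentence splitter (assumption: sentences end in '.', '!' or '?')."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def extract_candidates(text, sequences):
    """Pair every mutation mention with every protein named in the same sentence."""
    candidates = []
    for sentence in split_sentences(text):
        proteins = [name for name in sequences
                    if re.search(rf"\b{re.escape(name)}\b", sentence)]
        for wt, pos, mut in MUTATION_RE.findall(sentence):
            for protein in proteins:
                candidates.append(Candidate(protein, wt, int(pos), mut))
    return candidates

def validate(candidate, sequences):
    """Keep a pair only if the sequence has the wild-type residue at that position."""
    seq = sequences[candidate.protein]
    idx = candidate.position - 1  # mutation positions are 1-based
    return 0 <= idx < len(seq) and seq[idx] == candidate.wild_type

# Hypothetical toy sequences and text, for illustration only.
sequences = {"KINA": "MKTAYIAKQR", "KINB": "MSELLGAPVW"}
text = ("The A4V substitution in KINA abolished activity. "
        "KINB carried G6D in two patients. "
        "A second report described T3P in KINA and KINB.")

candidates = extract_candidates(text, sequences)
validated = [c for c in candidates if validate(c, sequences)]
for c in validated:
    print(c.protein, f"{c.wild_type}{c.position}{c.mutant}")
```

In the last sentence both proteins are co-mentioned with `T3P`, so two candidate pairs are produced; only the pair whose sequence carries a threonine at position 3 survives the check, which shows how validation can resolve an ambiguous association.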
Summary of text mining implementations for mutation extraction.
| Method | Main features |
|---|---|
| MEMA (Rebholz-Schuhmann et al., | Regular expressions, gene and protein mentions, co-mention proximity, OMIM validation |
| MuteXt (Horn et al., | Regular expressions, GPCR and NR mentions detection, co-mention proximity, sequence check |
| Yip (Yip et al., | Regular expressions, protein mentions detection, SwissProt validation, sequence check |
| Mutation GraB (Lee et al., | Regular expressions, protein mentions detection, graph shortest distance, sequence check |
| Mutation Miner (Baker and Rene, | Regular expressions, protein mentions detection, sentence co-mention |
| MuGeX (Erdogmus and Sezerman, | Regular expressions, protein mentions, protein and DNA mutation disambiguation |
| VTag (McDonald et al., | Machine learning-based detection of acquired sequence variation mentions (mutations, translocations, and deletions) |
| OSIRIS (Furlong et al., | Detection of human gene variations corresponding to SNPs |
| MutationFinder (Caporaso et al., | Regular expressions and patterns, protein mutations mentions detection, complex language expressions |
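
Regular expressions recur as the detection technique in most of the tools listed above. As a rough sketch of the idea, the snippet below recognizes point mutations written in one-letter form (`A10V`) or three-letter form (`Ala10Val`, `Gly25 to Asp`) and normalizes them to one-letter notation. The two patterns are illustrative assumptions, not the pattern set of MutationFinder or any other listed tool, and they do not cover complex natural-language descriptions.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE = "".join(sorted(THREE_TO_ONE.values()))
THREE = "|".join(THREE_TO_ONE)

PATTERNS = [
    # One-letter form, e.g. "A10V".
    re.compile(rf"\b(?P<wt>[{ONE}])(?P<pos>[1-9]\d*)(?P<mut>[{ONE}])\b"),
    # Three-letter form, e.g. "Ala10Val", "Ala-10-Val", "Gly25 to Asp".
    re.compile(
        rf"\b(?P<wt>{THREE})-?(?P<pos>[1-9]\d*)"
        rf"(?:-|\s*(?:to|->)\s*)?(?P<mut>{THREE})\b"
    ),
]

def find_mutations(text):
    """Return normalized mentions (e.g. 'G25D') in order of first occurrence."""
    hits = []
    for pattern in PATTERNS:
        for m in pattern.finditer(text):
            wt = THREE_TO_ONE.get(m["wt"], m["wt"])
            mut = THREE_TO_ONE.get(m["mut"], m["mut"])
            hits.append((m.start(), f"{wt}{m['pos']}{mut}"))
    seen, ordered = set(), []
    for _, mention in sorted(hits):
        if mention not in seen:
            seen.add(mention)
            ordered.append(mention)
    return ordered

# Hypothetical example sentence.
example = "Both A10V and Gly25 to Asp were reported; Leu40Pro and A10V recur."
mentions = find_mutations(example)
print(mentions)
```

Patterns of this kind are prone to false positives (cell-line names or plasmid identifiers can look like `A10V`), which is why the tools in the table typically pair them with protein mention detection and a sequence check.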