MoKCa database--mutations of kinases in cancer.

Christopher J Richardson, Qiong Gao, Costas Mitsopoulous, Marketa Zvelebil, Laurence H Pearl, Frances M G Pearl.

Abstract

Members of the protein kinase family are amongst the most commonly mutated genes in human cancer, and both mutated and activated protein kinases have proved to be tractable targets for the development of new anticancer therapies. The MoKCa database (Mutations of Kinases in Cancer, http://strubiol.icr.ac.uk/extra/mokca) has been developed to structurally and functionally annotate, and where possible predict, the phenotypic consequences of mutations in protein kinases implicated in cancer. Somatic mutation data from tumours and tumour cell lines have been mapped onto the crystal structures of the affected protein domains. Positions of the mutated amino acids are highlighted on a sequence-based domain pictogram, on a 3D image of the protein structure, and in an integrated molecular graphics package for interactive viewing. The data associated with each mutation are presented in the Web interface, along with expert annotation of the detailed molecular functional implications of the mutation. Proteins are linked to functional annotation resources and are annotated with structural and functional features such as domains and phosphorylation sites. MoKCa aims to provide assessments available from multiple sources and algorithms for each potential cancer-associated mutation, and to present these together in a consistent and coherent fashion to facilitate authoritative annotation by cancer biologists and structural biologists directly involved in the generation and analysis of new mutational data.

Year:  2008        PMID: 18986996      PMCID: PMC2686448          DOI: 10.1093/nar/gkn832

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Cancers arise due to the accumulation of mutations in critical target genes that confer a selective advantage on the cell (and its progeny) that contain them. Knowledge of these mutations is key to understanding the biology of cancer initiation and progression, as well as to the development of more targeted therapeutic strategies. While there are rare examples of cancers driven by a single genetic alteration (1), in most solid tumours, tumourigenesis is a multistep process (2,3), reflecting the genetic alterations necessary to transform normal cells into malignant derivatives. The crucial events include acquisition of genomic instability, cell cycle deregulation, evasion of apoptosis, limitless replicative potential, angiogenesis and metastasis (4). Regardless of the multiple mutations ultimately required, a single mutation can initiate the process. Members of the protein kinase family are amongst the most commonly mutated genes in human cancer, and both mutated and activated protein kinases have proved to be tractable targets for the development of new anticancer therapies (5). There are 518 documented mammalian protein kinases (6) encoded in the human genome, which together represent the largest family of human enzymes, collectively termed the kinome. They play indispensable roles in numerous cellular, metabolic and signalling pathways, in all cell types. Around 40% of the kinome have multiple splice variants and 10% of the total encode catalytically deficient enzymes that have been termed pseudokinases. Although there are several kinase classification schemes (e.g. 6,7), the KinBase resource (http://www.kinase.com/kinbase) (6) reflects the currently accepted classification of eukaryotic protein kinases where the kinases are broadly split into two groups: conventional protein kinases (ePKs) and atypical protein kinases (aPKs). 
The ePKs are the largest group, and are subclassified into eight families using the sequence similarity of the kinase domain, the presence of accessory domains, and consideration of their modes of regulation. The aPKs are a smaller set of protein kinases that do not share clear sequence similarity with ePKs, but have been shown experimentally to have protein kinase activity. As the entries in KinBase are filtered by stringent criteria, including verification by cDNA cloning, the KinBase classification scheme is the one favoured by experimentalists working on kinases and signal transduction pathways. Protein kinases are frequently found to be mis-regulated in human cancer, and the Cancer Genome Project and similar initiatives have undertaken systematic re-sequencing screens of all annotated protein kinases in the human genome, in an attempt to identify commonly occurring mutations that may play significant roles in a range of different cancers (8–10). In all cases, the key to understanding the contribution of a particular disease-associated kinase mutation to the development and progression of cancer comes from an appreciation of the consequences of that mutation for the function of the affected protein, and of the impact on the pathways in which that protein is involved. It is this understanding that the MoKCa database described here aims to facilitate. Changes in the nucleotide sequence of a gene can have a variety of consequences for the encoded protein, including truncations and frameshifts that disrupt the protein structure and/or reduce transcript levels via nonsense-mediated RNA decay. The most common mutational event is a single base change, leading to a missense mutation of a single amino acid. Cancer-associated somatic mutations (CASMs) or missense variants are commonly identified in somatic tumour DNA, but only a fraction directly contribute to oncogenesis. 
Distinguishing those that contribute to cancer from those that do not is a difficult problem, potentially requiring detailed and protracted functional analysis. However, the ability to make this determination, both rapidly and inexpensively, will be essential to the realization of ‘personalized therapy’ targeted to the individual tumour. Several approaches have been taken to predict which genes contain mutations that contribute to oncogenesis (i.e. driver genes) rather than those genes that contain mutations that arise by chance but have no bearing on the disease (passenger mutations). Statistical models that compare the observed ratio of non-synonymous:synonymous mutations with that expected by chance have been used to identify and estimate the number of cancer drivers within a total set of identified genetic variation (11). These methods are excellent for predicting which genes contain drivers, but do not identify the individual driver mutations. Consequently, algorithms have been developed that attempt to assess the driver status of missense mutations (12–14), based on the notion that evolutionarily conserved sites in a protein tend to be involved in its function, and that mutations that change the properties of these sites alter that function. However, the mechanistic nature of a functional change (activation, inhibition or subversion), and its detailed biochemical effect, is virtually impossible to predict without a detailed analysis of the protein, informed by ‘expert’ insight into its individual biology. There are several protein family-specific databases that collate disease-associated mutations from the literature, such as SH2base (15), which contains data for germline mutations in proteins containing SH2 domains, and KinMutBase (16), which documents disease-related germline mutations in 33 kinases. 
These data derive either from repeated sequencing, in affected families, of particular kinases mutated in specific diseases, or from harvesting literature reports of mutations observed in individual studies. Furthermore, the COSMIC database (17) is undertaking to document all somatic cancer mutations reported in the literature. By bringing together automatic assessments available from multiple sources and algorithms for each potential cancer-associated mutation, and presenting these in a convenient and coherent fashion, MoKCa aims to facilitate authoritative annotation by cancer biologists and structural biologists directly involved in the generation and analysis of new mutational data. These ‘experts’ are then able to bring detailed insights into the biochemistry and biology of individual proteins and systems that are virtually impossible to encapsulate in an algorithm, but are key to determining if and how a particular mutation will alter the biological function of a protein. Thus the MoKCa database combines automated and ‘expert’ annotation of individual mutations, and is firmly directed towards the specific needs of the cancer research community.

BUILDING THE MoKCa DATABASE

Mutation data

The original mutational data were provided by the Sanger Cancer Genome Project (CGP) Team, and comprise the mutations found in the large-scale re-sequencing of the kinase complement from 210 human cancers. This included samples from breast, lung, colorectal, gastric, testis, ovarian, renal, melanoma, glioma and acute lymphoblastic leukaemia (ALL) cancers (9). Of the 210 tumours studied, 169 were primary tumours, 2 were early cultures and 39 were immortal cell lines. One thousand and seven somatic mutations are documented in the coding exons and splice junctions in the kinases of 137 of the tumours studied. Nine hundred and twenty-one were single base substitutions, 78 were small insertions or deletions and 8 were complex changes. Of the single base substitutions, 620 were missense changes, 54 caused nonsense changes, 28 were observed at highly conserved positions of splice junctions and 219 were silent (synonymous) mutations. Approximately one-third of these mutations had previously been reported in the literature. Added to this core dataset are additional somatic mutational data from the COSMIC database that have been curated from the literature (17). This includes 15 911 missense, 47 nonsense and 50 insertions or deletions. This results in a non-redundant mutation dataset of 1406 distinct mutations from over 20 different types of cancer: 269 silent, 912 missense, 83 nonsense, 27 splice site, 84 deletions and 8 multi substitutions.
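The counts quoted above for the CGP screen can be cross-checked with simple arithmetic. The short Python sketch below uses only the numbers given in the text; the variable names are illustrative and are not taken from the MoKCa code base.

```python
# Counts quoted in the text for the CGP kinase screen
# (dictionary keys and variable names are illustrative only).
by_class = {
    "single_base_substitution": 921,
    "small_insertion_or_deletion": 78,
    "complex_change": 8,
}
substitution_breakdown = {
    "missense": 620,
    "nonsense": 54,
    "splice_site": 28,
    "silent": 219,
}

# The three mutation classes should add up to the 1007 somatic mutations.
total_somatic = sum(by_class.values())

# The four substitution types should add up to the 921 single base substitutions.
total_substitutions = sum(substitution_breakdown.values())

print(total_somatic)        # 1007
print(total_substitutions)  # 921
```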

Driver/passenger assignment of a gene

The Sanger kinase study is by far the largest dataset relating to somatic kinase mutations, and has the advantage of generating a clear picture of the background mutational levels in each gene and in each tumour. This allows estimation of the likelihood that mutations in a particular gene are significant as drivers of a particular tumour. The deviation of the ratio of non-synonymous:synonymous mutations from that expected by chance was used to indicate the presence of selection on non-synonymous mutations in each gene. Each gene is assigned the selective pressure calculated by Greenman et al. (11). Of the 921 base substitutions in the primary screen, 763 are estimated to be passenger mutations, with an estimated 158 driver mutations predicted to be distributed within 119 genes. Each gene is also assigned a rank, which reflects the probability of the gene containing at least one driver mutation.
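The idea of comparing an observed non-synonymous:synonymous ratio with the ratio expected by chance can be illustrated with a deliberately simplified sketch. This is not the statistical model of Greenman et al. (11), which MoKCa takes as given; the function name `selection_ratio` and the `expected_ratio` parameter are hypothetical, and the expected ratio would in practice have to come from a background mutation model.

```python
def selection_ratio(nonsynonymous, synonymous, expected_ratio):
    """Return (observed N:S ratio) / (expected N:S ratio).

    Simplified illustration only: values above 1 suggest an excess of
    non-synonymous changes relative to chance, values near 1 suggest none.
    `expected_ratio` is an assumed input, not a value from the paper.
    """
    if synonymous <= 0:
        raise ValueError("need at least one synonymous mutation")
    if expected_ratio <= 0:
        raise ValueError("expected_ratio must be positive")
    observed_ratio = nonsynonymous / synonymous
    return observed_ratio / expected_ratio


# The driver/passenger estimate quoted in the text partitions the 921 substitutions.
passengers, drivers = 763, 158
assert passengers + drivers == 921
```

A gene whose observed ratio matches the assumed background would return 1.0 from this function; the real method additionally accounts for mutation spectra and sample-specific rates, which this sketch ignores.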

Data normalization and domain assignment

The translated CGP kinase nucleotide sequences were used as reference sequences, and were scanned against Swiss-Prot/TrEMBL (18) using BLASTP (19) to identify the corresponding Swiss-Prot protein entry, which was then used for numbering, annotation and linking to other major primary and secondary databases. Kinase sequences were mapped to the closest Swiss-Prot sequence and the alignments stored in the database; where more than one isoform was identified, the longest transcript was adopted as the matched sequence. This protocol was repeated for the reference sequences from the COSMIC database. Kinase classification schema were extracted from KinBase, and each kinase was assigned to its group, family and sub-family. Alternative gene identifiers were extracted from KinBase and Swiss-Prot and stored in a pseudonym table. Gene names were also mapped to HUGO identifiers (20). Domain boundaries were extracted from Pfam (21), and both Pfam-A and Pfam-B domains are displayed. PROSITE patterns (22) are used to identify kinase signature patterns, for example the serine/threonine protein kinase active-site signature and the protein kinase ATP-binding region signature. Known phosphorylation sites were downloaded from Phospho.ELM (23) and mapped on to the kinase sequences to help determine whether a mutation changes a residue that is normally post-translationally modified during its functional life.
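Checking whether a missense mutation falls on an annotated phosphorylation site reduces to comparing residue numbers once both are expressed in the same reference numbering. The following is a minimal sketch, assuming simple labels of the form `V600E` and a plain set of site positions; the function names and the example positions are hypothetical and do not reflect the Phospho.ELM file format or the MoKCa schema.

```python
import re


def parse_missense(label):
    """Split a label such as 'V600E' into (wild_type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a simple missense label: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def hits_phosphosite(label, phosphosites):
    """True if the mutated position is in the set of known site positions.

    Assumes the mutation and the sites share one reference numbering.
    """
    _, position, _ = parse_missense(label)
    return position in phosphosites


# Placeholder site positions for illustration only (not real annotations).
example_sites = {10, 25}
print(hits_phosphosite("S25A", example_sites))   # True
print(hits_phosphosite("V600E", example_sites))  # False
```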

Structural mapping of mutations

To map mutational data to protein structures, the sequence of each Pfam domain or non-domain region that contained a missense mutation was scanned against a database containing non-redundant protein sequences (UniRef90) (18) and the sequences from the PDB using PSI-BLAST (cut-off value of 10^-4) (19). The kinase reference sequences and PDB sequences were then aligned. To identify which mutations mapped onto residues with structural density in the PDB file, PDB sequence-to-structure alignments from the Structure Integration with Function, Taxonomy and Sequence (SIFTS) initiative (24) were utilized. Models of the structures with the substituted residues are currently being generated and will be available shortly.
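Transferring a mutated residue number from the kinase reference sequence to a structure sequence is, at its core, a walk along a pairwise alignment. The sketch below shows that step under simple assumptions: both sequences are given as equal-length aligned strings with `-` for gaps, numbering is 1-based, and `map_position` is a hypothetical helper name. It does not cover the further SIFTS step of checking that the residue is actually observed in the structure.

```python
def map_position(aligned_query, aligned_template, query_pos, template_start=1):
    """Map a 1-based query residue number onto the aligned template sequence.

    Returns the template residue number, or None if the query residue is
    aligned to a gap or lies outside the alignment.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    q_index = 0
    t_index = template_start - 1
    for q_char, t_char in zip(aligned_query, aligned_template):
        if q_char != "-":
            q_index += 1
        if t_char != "-":
            t_index += 1
        if q_char != "-" and q_index == query_pos:
            return t_index if t_char != "-" else None
    return None


# Toy alignment: the query has a gap opposite template residue 3,
# and the last query residue is aligned to a gap in the template.
print(map_position("AC-DE", "ACGD-", 3))  # 4
print(map_position("AC-DE", "ACGD-", 4))  # None
```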

Automated assessment of pathogenicity of mutations

Predictions of which missense mutations contribute to oncogenesis are provided by the CanPredict algorithm (12), which incorporates three independent scoring schemes based on SIFT (25), Pfam logR.E (26) and GOSS scores. The SIFT algorithm uses similarity between closely related proteins to identify potentially deleterious changes; SIFT scores <0.05 are predicted to be deleterious. The Pfam-based logR.E-value score predicts whether a change will alter protein function by determining the difference in fit of the wild-type and mutant versions of the protein to a particular Pfam model; a score >0.5 indicates a deleterious change. Lastly, the GOSS metric uses the Gene Ontology to measure the similarity of the function of a gene to that of other known cancer-causing genes.
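The two thresholds stated above can be applied as simple per-score flags. The sketch below does only that; it is not the CanPredict classifier, it does not use the GOSS score, and it makes no claim about how CanPredict combines the three scores. The function and key names are hypothetical.

```python
SIFT_CUTOFF = 0.05   # SIFT scores below this are predicted deleterious (from the text)
LOGRE_CUTOFF = 0.5   # logR.E-value scores above this indicate a deleterious change


def flag_scores(sift=None, logre=None):
    """Return per-score deleterious flags for whichever scores are available."""
    flags = {}
    if sift is not None:
        flags["sift_deleterious"] = sift < SIFT_CUTOFF
    if logre is not None:
        flags["logre_deleterious"] = logre > LOGRE_CUTOFF
    return flags


print(flag_scores(sift=0.01, logre=0.7))
# {'sift_deleterious': True, 'logre_deleterious': True}
```

Scores are treated as optional because, as noted for the mutation-level pages, a CanPredict breakdown is shown only where available.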

Protein–protein interaction data

Key to determining the impact of a mutation on the cell is the impact of the mutation on the protein's interactions and pathways. Towards this goal, protein–protein interaction data for each kinase are provided by the ROCK database (Zvelebil et al., unpublished), an Oracle-based data warehouse that integrates a large number of experimental and derived data in a modular design. ROCK will provide a single online interface for the management, navigation, cross-linking and cross-correlation of data relating to breast cancer research from a wide variety of laboratory and clinical studies. The core of the database contains genomic and protein interaction datasets, which are linked to experimental results. Interaction datasets are from MINT (27), MIPS (28), Reactome (29), BioGRID (30), HPRD (31), IntAct (32) and BIND (33), and are derived from Inparanoid (34) and Homologene (35), along with data curated from the literature (Supplementary Table 1).

Database design

MoKCa was implemented using a MySQL database running on a Linux server, with Perl scripts used for all data retrieval and output. Its modular design is compatible with future expansion and connectivity, and currently contains the following subsections: Gene [EntrezGene (35), RefSeq (35), Ensembl (36)], Pseudonyms, Protein (Swiss-Prot/UniProt), Gene Ontology (37), Structure (PDB) (38), Mutational data [CGP, COSMIC (17)], Domain Structure and annotation [CATH (39), Gene3D (40), Pfam (21), SMART (41)], Functional annotation [Phospho.ELM (23)], Cancer data [Sequence, Mutations, Cell line, Cancer sub-type, Selective Pressure (9)] and protein interactions for each kinase (ROCK).

THE MoKCa WEB-INTERFACE

At the highest level MoKCa provides the full list of 518 human protein kinases listed alphabetically by gene name to facilitate browsing, with each entry labelled with the number of mutations found, the cancer driver selection pressure and rank, and an iconic representation of the tumour type(s) in which mutations in that protein kinase have been found. A ‘lightbulb’ icon indicates those proteins for which expert annotations have been added. In addition to alphabetic sorting on the gene name, the list can be sorted by selective pressure or driver ranking (Figure 1). Sorting on selective pressure is particularly informative in presenting those kinases that were found to have the highest involvement in driving cancer in the study of Greenman et al. (11), and probably presents the least biased assessment of driver probability currently available. Alternatively, a filtered list can be generated by flexible text-matching against gene name, UniProt accession code, UniProt protein names and synonyms, or GenBank ID. Future developments will allow selection of sub-sets of genes based on attributes such as kinome ‘branch’ or association with specific tumour type(s). Website navigation is assisted by a ‘breadcrumb’ system that tracks the user's journey through the database and allows them to return to any stage.
Figure 1.

MoKCa kinase gene list. Genes encoding protein kinases are shown listed by ranking of their probability of containing one or more cancer-driving mutation (11). Gene names are additionally annotated with number of mutations found in the Cancer Genome Project analysis (9), the calculated selection pressure on that gene, and indicators showing the cancer types in which the gene was found mutated. The list can also be sorted alphabetically or by selective pressure. Gene names hyperlink to gene-level pages.


Gene level

Selection of a gene from the full or filtered list transfers the user to a gene-level page, which provides information and links relevant to the gene, its encoded protein and the spectrum of mutations present in the database for that gene. The encoded protein structure is represented as a domain pictogram, with the positions of silent, missense, nonsense, indel and frameshift mutations indicated. Phosphorylation sites and other functional features are also indicated (Figure 2a and b). A 3D-cartoon image of the crystal structure most closely homologous to the encoded protein and to which the greatest number of mutations can be mapped, is shown, with the mapped positions of mutated residues highlighted (see later).
Figure 2.

Mapping mutations to domains. The spectrum of mutations identified for each gene is shown on its gene-level page mapped on to a schematic representation of the domain structure of the encoded protein. (a) The domain structure of each encoded protein (defined by Pfam definitions) is shown as a pictogram, with the positions of mutations annotated with icons specific to the type of mutation—silent, missense, nonsense (stop), deletion (frameshift or in-frame) or insertion (frameshift or in-frame). Functionally important sequence features, such as documented phosphorylation sites from Phospho.ELM (23) are also shown. Annotation of features such as active sites and ATP-binding motifs (from Prosite) will be added in future versions. (b) Example domain pictogram for TGFBR2, showing missense mutations distributed in the extra-cellular and kinase domains, a frameshift (probably truncating) deletion mutation in the extra-cellular domain, and a truncating nonsense mutation in the C-terminal lobe of the kinase domain. Automated assessment by CanPredict identifies a lung-cancer associated H328Y mutation as a cancer driver. (c) Domain pictogram for BRAF, identified as a strongly selected mutated gene in melanoma and a range of other cancers, showing the cluster of activating missense mutations in the activation segment.

Hyperlinks are provided to entries for the gene in the Cancer Genome Project COSMIC database, SwissProt/UNIPROT and the iHop literature browser (42). Biological function can be accessed via a link to GO, and network interactions involving the encoded protein can be explored via a link to ROCK. The list of individual mutations identified for the gene, annotated by the amino acid change, the Pfam domain to which they map, the cancer driver prediction from CanPredict (‘tick’ for probably cancer; ‘cross’ for probably not) and the tumour type(s) in which they occur, provides hyperlinks to the mutation level. The Pfam domain names (for Pfam-A only) hyperlink to the functional definitions for that domain. A ‘lightbulb’ icon indicates those mutations for which expert annotations have been added.

Mutation level

The mutation-level pages provide information and links relevant to the particular mutation, including details of the tumour sample and type in which it was found, and the breakdown of the CanPredict assessment of the cancer probability of that mutation, where available. SMART and Pfam domain boundary mappings on to the amino acid sequence are also provided. A hyperlinked list is provided of all Protein Data Bank 3D structures that match the amino acid sequence of the affected domain with a significant PSI-BLAST E-value. The list is sorted by sequence identity, with each entry annotated with the BLAST E-value for the match, and the protein chain and residue number in the structure file that correspond to the mutated residue. A 3D-cartoon image is generated of the highest homology structure in which the equivalent to the affected residue was structurally defined, with that residue highlighted. Hyperlinks are provided to launch an interactive session with the Jmol viewing applet (http://www.jmol.org), in which the structure can be examined in 3D, and a script for the fully featured molecular graphics program PyMOL (http://www.pymol.org) can be downloaded. Registered ‘expert’ users are able to access text fields in which they can enter their expert assessment of the mechanistic effect of the particular mutation, and the evidence for it, based on the automated predictions and detailed examination of the structural mapping. The mechanism describes the change in biological activity that results from the mutation and includes whether the kinase activity increases or decreases, and whether this results in a tumour suppressor or oncogenic effect on the cell or its pathways. The evidence refers to the publications in which the proposed mechanism has been described; it can be direct, or inferred from a similar mutation in a homologous protein or in an analogous system. These opinions are visible to, but not editable by, other users on the mutation-level pages. Future developments will include hyperlinked bibliographic references in support of the expert opinion.

Protein functional interaction networks

Interaction maps are displayed using ROCKscape with nodes representing proteins and edges representing interaction or transcriptional regulatory relationships between components. Depending on the respective display mode selected, edges are colour-coded to identify the type of interaction, and nodes are coloured with selective pressure. Interaction maps can also be progressively expanded outwards to display the next interaction neighbourhood or subsequent levels beyond, thus, making it possible to ‘grow’ networks or extend signalling pathway modules in a desired direction.
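Progressively expanding an interaction map outwards corresponds to collecting every node within a given number of edges of a starting protein. The sketch below shows one way to do this with a breadth-first search; it is an illustration of the idea rather than ROCKscape's implementation. Edges are treated as undirected (an assumption), the three BRAF edges follow the example discussed in this paper, and `NODE_X` is a made-up placeholder for a second-shell partner.

```python
from collections import deque


def neighbourhood(edges, seed, depth):
    """Return the set of nodes within `depth` edges of `seed` (undirected)."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == depth:
            continue  # do not expand beyond the requested shell
        for partner in adjacency.get(node, ()):
            if partner not in seen:
                seen.add(partner)
                frontier.append((partner, dist + 1))
    return seen


edges = [
    ("BRAF", "MAP2K1"),
    ("BRAF", "MAP2K2"),
    ("BRAF", "RAF1"),
    ("MAP2K1", "NODE_X"),  # hypothetical second-shell interaction
]
print(sorted(neighbourhood(edges, "BRAF", 1)))
# ['BRAF', 'MAP2K1', 'MAP2K2', 'RAF1']
```

Calling the function again with `depth=2` adds the next shell, which mirrors the ‘grow the network’ behaviour described above.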

EXAMPLE OF USE

One of the first discoveries of the Cancer Genome Project was that mutations in BRAF were found in 70% of melanomas, and to a lesser extent in colorectal, lung and ovarian cancers. By browsing the MoKCa interface and sorting by ‘selective pressure’, it is clear that BRAF is near the top of the predicted driver list (Figure 1), and the tumour types in which the mutations arise can be easily identified. Following the link to the protein page shows that the mutations are all missense (Figures 2c and 3a), are localized to the kinase domain, and are mostly predicted to be pathogenic by CanPredict. The illustration of the protein structure shows that the mutations are all tightly clustered in the 3D structure, in and around the activation segment and phosphate-binding loop.
Figure 3.

Mutations, mechanisms and pathways. (a) Gene-level pages (illustrated for BRAF) provide links to data for the gene in a range of primary and other databases including COSMIC, UNIPROT and KinBase. Additional links are provided to GO functional annotations, Pfam definitions for domains of the encoded protein, and Prosite motifs matched in the encoded protein sequence. A hyperlink to the ROCK Online Cancer Knowledge base (Zvelebil et al., unpublished) provides a view of the protein–protein interactions [see (c)]. A hyperlinked list of mutations, annotated with tumour sample registry numbers and indicators for tumour type and CanPredict assessment of driver probability, points to individual mutation-level pages [see (b)]. (b) Mutation-level pages give specific domain assignments and CanPredict assessments for the individual mutation, as well as hyperlinks to a homology-ranked list of protein structures homologous to the affected domain. A 3D-graphic overview of the domain is provided with the mutated residue highlighted, and a Jmol session allowing interaction with the protein can be launched. ‘Expert’ users are able to enter a description of the mechanistic implications of the particular mutation, if any, and evidence for that opinion. Future developments will include bibliographic hyperlinks to relevant publications. (c) ROCK provides a graphical view (with optional Cytoscape viewer) of the documented protein–protein interactions for the affected gene (highlighted red). For other kinases appearing in the interaction network, the selective pressure for cancer-associated mutations is used to colour the node. In the BRAF example shown, the two ERK-activating kinases MEK1/2 (MAP2K1/2) downstream of BRAF display no selective pressure, whereas RAF1 (aka C-RAF), which co-operates in parallel with BRAF, displays a selective pressure for cancer driver mutations. The displayed network can be interactively expanded or contracted, and interrogated, as required. A full description of ROCK will be presented elsewhere.

Moving to the mutation pages, an annotated mechanism for each mutation is described (Figure 3b). For instance, Val 600, which is mutated to glutamate in several different melanomas as well as in ovarian and colorectal tumours, lies at the beginning of the activation segment. The V600E mutation destabilizes the auto-inhibited state of the kinase, leading to constitutive kinase activity in the absence of upstream activating signals. Via its activation of the downstream MAP kinase signalling pathway, this provides the tumours with self-sufficiency in growth signals, one of the key ‘hallmark’ traits of cancer (4). Following a hyperlink to the protein–protein interaction page provided by ROCK (Figure 3c), the functional interaction of BRAF with its downstream kinase phosphorylation targets MEK1/2 (MAP2K1/2) is evident. Unlike the upstream BRAF, neither of these was found to be under selective pressure for mutation in the tumours analysed. It can also be seen that BRAF interacts with RAF1, a known kinase proto-oncogene that also phosphorylates MEK1/2 and, like BRAF, is under mutational selective pressure.

DISCUSSION

MoKCa provides data and assessments from multiple sources and algorithms for each potential cancer-associated mutation, and the interactive display of the data facilitates authoritative annotation by cancer biologists and structural biologists. These ‘experts’ will bring detailed insights into the biochemistry and biology of individual proteins and systems that are virtually impossible to encapsulate in an algorithm, but are key to determining if and how a particular mutation will alter the biological function of a protein. Thus the MoKCa database combines automated and ‘expert’ annotation of individual mutations, and is firmly directed towards the specific needs of the cancer research community.

FUTURE DEVELOPMENTS

As the various cancer genome and re-sequencing projects continue to generate mutational data, these will be incorporated into the database. We also propose to add cancer-related germline mutational data from OMIM, and from family-specific mutational databases such as SH2base (15) and KinMutBase (16), which document inherited genetic disease mutations. The MoKCa database has been designed to combine functional and protein-interaction data with a user-friendly interface, in order to assess which kinases are most likely to be causal in cancer and which mutations are cancer drivers. It will be important to be able to mine this information to predict which mutated kinases will be effective drug targets. The database will also be extended to include pathway information, for both known and in-house pathways. For fully documented pathways we will try to ascertain, at the molecular level, what effect a mutation will have on pathway function and how it affects the ‘sign’ (activatory or inhibitory) of the pathway interactions made by the affected protein.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Funding for open access charge: The Institute of Cancer Research. Conflict of interest statement. None declared.