
COSMIC 2005.

S Forbes, J Clements, E Dawson, S Bamford, T Webb, A Dogan, A Flanagan, J Teague, R Wooster, P A Futreal, M R Stratton.

Abstract

The Catalogue Of Somatic Mutations In Cancer (COSMIC) database and web site were developed to preserve somatic mutation data and share it with the community. Over the past 25 years, approximately 350 cancer genes have been identified, of which 311 are somatically mutated. COSMIC has been expanded and now holds data previously reported in the scientific literature for 28 known cancer genes. In addition, it holds data from the systematic sequencing of 518 protein kinase genes. The total gene count in COSMIC stands at 538: 25 genes have a mutation frequency above 5% in one or more tumour types, no mutations were found in 333 genes, and 180 are rarely mutated, with frequencies <5% in every tumour set. The COSMIC web site has been expanded to give more views and summaries of the data and to provide faster query routes and downloads. In addition, a new section describes mutations found through a screen of known cancer genes in 728 cancer cell lines, including the NCI-60 set.


Year: 2006; PMID: 16421597; PMCID: PMC2361125; DOI: 10.1038/sj.bjc.6602928

Source DB: PubMed; Journal: Br J Cancer; ISSN: 0007-0920


All cancers arise through the acquisition of a number of DNA sequence mutations, some of which confer a growth advantage and drive the clonal expansion of the tumour cells (Vogelstein and Kinzler, 1998). At the DNA sequence level, these mutations include base substitutions, deletions, amplifications and rearrangements. Many somatic mutations are likely to be a consequence of defects in DNA repair and maintenance (Slupphaug; Barnes and Lindahl, 2005), of past exposure to mutagens (Luch, 2005), or of both. Are all somatic mutations critical for the development of the tumour in which they are found? Probably not, but the proportion of mutations that are causally implicated in cancer is unclear and certainly varies from tumour to tumour (Wang; Davies; Stephens; Bignell). Differentiating passenger events from disease-causing mutations is a challenge, particularly for genes that are infrequently mutated or that carry silent or noncoding mutations. This contrasts with genes that are mutated more frequently than would be expected by chance, or whose mutations cluster at key amino-acid residues or in functional protein domains. In these cases, the genetic evidence on its own strongly implies that the genes are involved in the development of cancer. What is clear is the utility of mutation data. However, the small intragenic mutation data that defines known cancer genes is buried in the scientific literature. There are extensive databases and web sites that actively curate the literature for germline mutations in cancer genes, for example HGVbase (Fredman) and the Human Gene Mutation Database (HGMD; Stenson). In addition, there are many databases that store and serve somatic mutation data for single genes (see http://www.hgvs.org for an extensive list). Some of these are actively maintained, such as those for TP53 (Olivier; Béroud and Soussi, 2003); however, most are not updated.
Furthermore, there is wide variation in the data that is stored, in the range of queries that can be made against the data and in the ability to display and download the results. Although all these resources have value, they are dispersed across the internet, which makes direct comparisons between cancer genes difficult. Since the early days of sequencing genes in tumours, there have been reports of infrequently mutated genes and, occasionally, of genes that appear to have no mutations. These reports are now joined by the results of the systematic sequencing of genes in tumours (Bardelli; Wang; Davies; Bignell; Stephens), which also report infrequently mutated genes and many more genes with no mutations. Is this data worth preserving? Definitely yes, both to disseminate the mutation data to a wide audience and to preserve the negative data. The Catalogue of Somatic Mutations in Cancer (COSMIC; http://www.sanger.ac.uk/cosmic) was launched in 2004 as a free resource to hold and display somatic mutation data for four genes: BRAF, HRAS, KRAS and NRAS (Bamford). COSMIC now holds data on 538 genes and 124 367 tumours, with 23 157 mutations. The web site has been expanded to provide summary pages for the genes, tissue types, references, samples and mutations. In addition, there are new sections detailing the results of our sequencing of known cancer genes in 728 publicly available cancer cell lines, including the NCI-60 panel. These sections also include loss of heterozygosity and copy number data for many of these lines.

DATA CURATION

The genes selected for curation are a subset of the Cancer Gene Census (http://www.sanger.ac.uk/CGP/Census; Futreal), together with other genes that have been screened for somatic mutations with either negative or inconclusive results. The data held in COSMIC is extracted from the literature as described previously (Bamford). Once a gene is included in COSMIC, additional data for it is curated on an ongoing basis as it is published. There is usually a delay between the publication of data and its appearance in COSMIC while the data is curated. To enhance the utility of COSMIC, we standardise the curated data. We extract the tissue and histology for each sample and map these definitions to the COSMIC classification tables (see http://www.sanger.ac.uk/genetics/CGP/cosmic/data/cosmic_classification_alias_list_01_11_05.xls). This yields a standard set of tumour descriptions that can be queried through the web site, and the original definition is always retained in the database. In a similar fashion, a single DNA sequence is held for each transcript, and the transcript sequence is translated to give the protein sequence used by COSMIC. These sequences are available for each gene, and all mutations are mapped to them. For example, all BRAF V599E mutations are remapped to amino acid 600 (V600E) in the COSMIC BRAF protein sequence (see Davison for a typical example).
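The two standardisation steps described above can be sketched as follows. This is a minimal illustration, not COSMIC's curation code. The alias entries, the `NS` placeholder and the per-gene codon offset table are all hypothetical. A fixed offset is only valid when the legacy and standard sequences differ upstream of every position being remapped; a real pipeline would align the sequences instead.

```python
import re
from dataclasses import dataclass

# Hypothetical alias entries: free-text descriptions from papers mapped to
# standard (tissue, histology) terms. The real mapping is COSMIC's alias list.
TISSUE_ALIASES = {
    "cutaneous melanoma": ("skin", "malignant_melanoma"),
    "malignant melanoma of skin": ("skin", "malignant_melanoma"),
    "colorectal adenocarcinoma": ("large_intestine", "carcinoma"),
}

# Hypothetical per-gene codon offsets from a legacy numbering to the standard
# protein sequence (e.g. BRAF codon 599 in older papers is codon 600).
CODON_OFFSETS = {"BRAF": 1}

MISSENSE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


@dataclass
class CuratedSample:
    original_description: str  # kept verbatim, as the text says the database does
    tissue: str
    histology: str


def classify_sample(description: str) -> CuratedSample:
    """Map a published tumour description onto standard terms ("NS" if unknown)."""
    tissue, histology = TISSUE_ALIASES.get(description.strip().lower(), ("NS", "NS"))
    return CuratedSample(description, tissue, histology)


def remap_missense(gene: str, change: str) -> str:
    """Renumber a simple missense change (e.g. V599E) onto the standard sequence."""
    match = MISSENSE.match(change)
    if match is None:
        raise ValueError(f"not a simple missense change: {change}")
    ref, position, alt = match.group(1), int(match.group(2)), match.group(3)
    return f"{ref}{position + CODON_OFFSETS.get(gene, 0)}{alt}"
```

With these assumptions, `remap_missense("BRAF", "V599E")` gives `"V600E"`. `classify_sample("Cutaneous melanoma")` keeps the published text alongside the standard terms, mirroring the point that the original definition is never discarded.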

Potential data biases

The data that COSMIC extracts from the literature is likely to carry a number of biases. There is potential for publication bias, in which positive data is more likely to appear in print than negative data. There are almost certainly biases in the samples that have been analysed, as many studies use tumours from Europe and the USA. Where particular patient groups appear interesting, there is often a surge of analysis that can distort the mutation landscape, for instance the reported population bias in EGFR mutations (Paez). Furthermore, it is common practice to screen only the mutation hotspots in known cancer genes, for example the selective analysis of codons 12 and 13 in the RAS genes. Where possible, all data is entered into COSMIC rather than selecting specific data sets. When viewing the data in COSMIC, there is always a link to the curated publications. This makes it possible to view the original data, samples and methods and so to understand any biases.
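One consequence of hotspot-only screening is that a study analysing just codons 12 and 13 of a RAS gene says nothing about mutations at other codons. Counting such samples as "unmutated" elsewhere would understate frequencies. The following minimal sketch assumes that each sample record lists the codon ranges actually screened; the record layout and sample names are hypothetical. A sample is counted in a codon's denominator only when that codon was covered.

```python
from dataclasses import dataclass, field


@dataclass
class ScreenedSample:
    name: str
    screened: list[tuple[int, int]]            # inclusive codon ranges examined
    mutated_codons: set[int] = field(default_factory=set)


def covers(sample: ScreenedSample, codon: int) -> bool:
    """True if the codon lies inside any screened range of this sample."""
    return any(lo <= codon <= hi for lo, hi in sample.screened)


def codon_frequency(samples: list[ScreenedSample], codon: int) -> tuple[int, int]:
    """Return (mutated, informative) sample counts for one codon.

    Only samples whose screened region includes the codon are informative,
    so a hotspot-only screen contributes nothing to codons outside the hotspot.
    """
    informative = [s for s in samples if covers(s, codon)]
    mutated = sum(1 for s in informative if codon in s.mutated_codons)
    return mutated, len(informative)
```

For example, suppose two samples were screened only at codons 12 to 13 and one across the whole coding sequence. A codon 61 mutation in the third sample is then 1 of 1 informative samples rather than 1 of 3.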

DATABASE

The COSMIC database is implemented in Oracle. The schema has expanded since the launch of COSMIC to encompass additional details and to improve tracking of the curation process (see Supplementary data). The main development has been the introduction of feature tables linked to the individual and tumour tables. The feature tables provide a generic way to store any information relating to an individual or a tumour. Features are grouped into feature types, for example ethnicity, and any ethnic group can be added as a value of this feature type. A more complex feature type is cigarette smoking history. The values stored so far include quantities expressed in pack-years as well as less specific descriptions such as smoker, nonsmoker, ex-smoker and never-smoker. This system allows COSMIC to capture the wide range of information reported in the literature. It also accepts different data content for different genes, for example drug response information for tumours with and without EGFR mutations. The other noteworthy addition to the COSMIC schema is a pair of tables that store external data sources for the samples held in COSMIC (see Other Data Types below).
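The feature-table idea is essentially a generic attribute-value design. The following minimal sketch uses Python's built-in `sqlite3` module as a stand-in for Oracle. The table names, column names and sample names are hypothetical. Only tumours, not individuals, are modelled to keep the sketch short.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tumour (tumour_id INTEGER PRIMARY KEY, sample_name TEXT NOT NULL);
CREATE TABLE feature_type (
    feature_type_id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE tumour_feature (
    tumour_id INTEGER NOT NULL REFERENCES tumour(tumour_id),
    feature_type_id INTEGER NOT NULL REFERENCES feature_type(feature_type_id),
    value TEXT NOT NULL  -- free text, so "30 pack-years" and "never-smoker" coexist
);
"""


def add_feature(conn: sqlite3.Connection, tumour_id: int, type_name: str, value: str) -> None:
    """Attach a feature value to a tumour, creating the feature type on first use."""
    conn.execute("INSERT OR IGNORE INTO feature_type (name) VALUES (?)", (type_name,))
    (type_id,) = conn.execute(
        "SELECT feature_type_id FROM feature_type WHERE name = ?", (type_name,)
    ).fetchone()
    conn.execute("INSERT INTO tumour_feature VALUES (?, ?, ?)", (tumour_id, type_id, value))


def features_of(conn: sqlite3.Connection, type_name: str) -> list[tuple[str, str]]:
    """All (sample name, value) pairs recorded for one feature type."""
    return conn.execute(
        """SELECT t.sample_name, f.value
           FROM tumour_feature f
           JOIN feature_type ft ON ft.feature_type_id = f.feature_type_id
           JOIN tumour t ON t.tumour_id = f.tumour_id
           WHERE ft.name = ?
           ORDER BY t.sample_name""",
        (type_name,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO tumour VALUES (?, ?)", [(1, "T1"), (2, "T2")])
add_feature(conn, 1, "smoking history", "30 pack-years")
add_feature(conn, 2, "smoking history", "never-smoker")
add_feature(conn, 2, "ethnicity", "Japanese")
```

With this design, a new kind of information (for example drug response) needs only a new feature-type row, not a schema change. The trade-off is that values are untyped text, so pack-year figures and descriptive labels have to be interpreted at query time.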

WEB SITE

The COSMIC web site has been further developed to provide faster access to the data, new views and summaries, and new links to aid navigation between its pages. There are two routes into the data: selecting a gene or selecting a tissue. Genes can be selected either alphabetically or by chromosomal position. There are two tissue selection paths. The Browse by Tissue route presents successive lists of tissues, subtissues, histologies and subhistologies, culminating in a tissue overview display (Figure 1). The Quick Tissue path proceeds straight from the tissue selection to the overview page.
Figure 1

Tissue overview. The mutation data for a selected tissue, in this case prostate, is presented in summary format. The top five genes are the genes with data in COSMIC that have the highest rank score, calculated as $\mathrm{RankScore} = p - 1.6449\sqrt{p(1-p)/n}$. Here, $n$ is the number of samples and $p$ is the number of mutations divided by the number of samples. The data is presented in both graphical and tabular formats, and further genes with and without mutations in the selected tissue are listed. Each gene name links to the details of its mutations.
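The caption's rank score can be computed directly. Read statistically, it is the lower limit of a one-sided 95% normal-approximation confidence interval for the mutation frequency, because 1.6449 is the 95th percentile of the standard normal distribution. As a result, a gene observed in few samples is penalised relative to one with the same frequency in many samples. The gene names and counts in the example are hypothetical.

```python
import math

Z_ONE_SIDED_95 = 1.6449  # constant taken from the Figure 1 caption


def rank_score(mutations: int, samples: int) -> float:
    """p - 1.6449 * sqrt(p * (1 - p) / n), with p = mutations / samples."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    p = mutations / samples
    return p - Z_ONE_SIDED_95 * math.sqrt(p * (1 - p) / samples)


def top_genes(counts: dict[str, tuple[int, int]], k: int = 5) -> list[str]:
    """Names of the k genes with the highest rank score; counts = (mutations, samples)."""
    return sorted(counts, key=lambda gene: rank_score(*counts[gene]), reverse=True)[:k]
```

For example, `GENE_A` with 1 mutation in 2 samples and `GENE_B` with 40 in 80 both have an observed frequency of 50%. Only `GENE_B` scores well (about 0.41, against a negative score for `GENE_A`), so `top_genes` ranks it first.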

The gene summary page provides an overview of the data for each gene (Figure 2). The positions of recorded mutations are shown on an overview of the protein sequence, with links to the gene histogram page. The gene summary also links to external data sources for the gene and to the curated references, and it gives overall sample and mutation counts. The gene histogram page, developed from the original web site, can now show the mutations on either the protein or the cDNA sequence. It still shows the mutation positions, frequency data by tumour type and details of the mutations (not shown). The gene histogram display now also maps the positions of insertions, deletions and complex mutations. The reference summary page lists, for each paper, the genes that were screened, the samples that had mutations (with details of each mutation), and the names of the samples that had no mutations in the screened genes (not shown). Details of individual samples and of individual mutations are presented on two separate summary pages (not shown).
Figure 2

Gene summary. The initial output for a gene is a graphical view of the mutations distributed along the linear amino acid sequence of the gene, shown here for RB1. Mutation positions are marked by tick marks. One track shows all mutations, and separate tracks show insertions, nonsense substitutions, missense substitutions, deletions and complex substitutions. The summary also gives the number of references curated, the number of samples for the gene and the number of samples with mutations. Multiple links from this view lead to web pages with further details of the mutations, the gene and the curated references.
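The following sketch shows how mutations could be binned into per-residue tracks like those in Figure 2. It assumes a simplified, hypothetical protein-change notation: `R661W` for missense, `R579*` for nonsense, and changes ending in `del` or containing `ins` for deletions and insertions. Anything that does not match is treated as complex. This is not a full HGVS parser and not COSMIC's implementation.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical simplified notation; patterns are tried in this order.
PATTERNS = [
    ("nonsense", re.compile(r"^[A-Z](\d+)\*$")),                        # e.g. R579*
    ("missense", re.compile(r"^[A-Z](\d+)[A-Z]$")),                     # e.g. R661W
    ("insertion", re.compile(r"^[A-Z](\d+)(?:_[A-Z]\d+)?ins[A-Z]+$")),  # e.g. A767_V769insASV
    ("deletion", re.compile(r"^[A-Z](\d+)(?:_[A-Z]\d+)?del$")),         # e.g. L858_E861del
]
LEADING_POSITION = re.compile(r"^[A-Z](\d+)")


def classify_change(change: str) -> Tuple[str, Optional[int]]:
    """Return (track name, first affected residue) for one protein-level change."""
    for track, pattern in PATTERNS:
        match = pattern.match(change)
        if match:
            return track, int(match.group(1))
    match = LEADING_POSITION.match(change)
    return "complex", int(match.group(1)) if match else None


def build_tracks(changes: Iterable[str]) -> Dict[str, Counter]:
    """Count mutations per residue for each track, plus an all-mutations track."""
    tracks: Dict[str, Counter] = defaultdict(Counter)
    for change in changes:
        track, position = classify_change(change)
        if position is None:
            continue  # cannot be placed on the protein sequence
        tracks[track][position] += 1
        tracks["total"][position] += 1
    return tracks
```

Each `Counter` maps residue number to mutation count, which is exactly the data needed to draw tick marks (positions) and a histogram (heights) along the protein.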

MUTATION CONTENT

The genes in COSMIC can be split into three categories. In all, 28 genes in COSMIC are classed as causal cancer genes in the Cancer Gene Census: the genetic and (where available) biological data indicate that mutations in these genes are almost certainly involved in the development of cancer (Table 1). Of these, 25 have a mutation frequency above 5% in one or more tumour types. The other three, ERBB2, FGFR2 and SUFU, have biologically plausible mutations but a low mutation frequency (across all available data, 1.2% for ERBB2, 2% for FGFR2 and 1.6% for SUFU). On the COSMIC web site, these genes are grouped together on the gene selection page. The data is current for all of these genes except TP53. The TP53 results are essentially a by-product of other work; they have been included in COSMIC but do not constitute a comprehensive survey of TP53 mutation data. Other resources, such as the IARC TP53 database (Olivier), provide a far more extensive set of TP53 data.
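The overall frequencies quoted above follow from the Table 1 counts. Each is the number of samples with mutations divided by all samples screened (samples with plus samples without mutations). A short check:

```python
# Counts from Table 1: (samples with mutations, samples without mutations)
TABLE1_COUNTS = {"ERBB2": (20, 1693), "FGFR2": (5, 237), "SUFU": (4, 240)}


def mutation_frequency(with_mutations: int, without_mutations: int) -> float:
    """Percentage of screened samples that carry a mutation in the gene."""
    return 100.0 * with_mutations / (with_mutations + without_mutations)


for gene, counts in TABLE1_COUNTS.items():
    print(f"{gene}: {mutation_frequency(*counts):.1f}%")
```

This prints 1.2% for ERBB2, 2.1% for FGFR2 and 1.6% for SUFU; the text rounds the FGFR2 figure to 2%.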
Table 1

Mutation statistics for the known cancer genes curated in COSMIC

Gene     References  Unique mutations  Samples with mutations  Samples without mutations
ABL1             18                52                     172                        552
BRAF            144                77                    2767                     11 509
CEBPA             1                10                      10                        127
CTNNB1          240               261                    1466                     10 643
EGFR             39               139                     685                       5398
ERBB2             8                12                      20                       1693
FGFR2             5                 6                       5                        237
FGFR3            29                21                     484                       1507
FLT3             50                46                    1493                       5859
GATA1             4                10                      15                         69
HRAS            251                28                     472                     11 462
JAK2              9                 1                     473                        568
KIT             113               247                     768                       2421
KRAS            749                60                    8402                     29 328
MET              29                29                      66                       1503
MSH6             11                18                      89                        588
NOTCH1            1                64                      72                         48
NRAS            313                33                    1110                     13 378
PDGFRA           17                35                     207                       1060
PIK3CA            9                62                     310                       1988
PTEN            180               678                    1243                       7830
PTPN11            9                43                     110                       2268
RB1              59               126                     168                       1330
RET              48                35                     218                       1097
SMARCB1          21                78                     193                       1348
SMO               7                17                      25                        234
SUFU              3                 4                       4                        240
TP53              3                 9                      10                         61
Totals         2370              2201                  21 057                    114 346

The data for TP53 is not a comprehensive review of the literature for this gene. Some of the samples screened for mutations in other genes were also screened for TP53 mutations, and this data has been captured.

The second set of genes in COSMIC has somatic mutations in cancers. However, the mutation frequency is low, generally <5% in all tumour types, and/or the mutations are not located at known functionally significant positions in the proteins. This set comprises 180 genes. Most of these genes have been screened in only a small number of samples, although a small subset, for example ACVR1B and CSF1R, has been screened in many cancers. The role of these mutated genes in the development of cancer is unclear, and the mutations could be termed ‘somatic variants of unknown significance’. In all likelihood, most are not causally implicated in oncogenesis; that is, they are passenger (also known as bystander) mutations. However, it is also plausible that a minority is involved in cancer development, although it is currently not possible to determine which. The final set of genes has been screened for mutations, but no mutations have been reported. This set is large (333 genes), with the data coming from the sequencing of all 518 protein kinase genes in 25 breast cancers (Stephens), 33 lung cancers (Davies) and 13 testicular germ cell tumours (Bignell). In general, negative data of this type is either absent from the literature or described so cursorily that it is difficult to enter into COSMIC. If mutations are found in these genes in the future, their status in COSMIC will be updated.
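The frequency criterion behind this three-way split, as summarised in the abstract, can be sketched as below. This is a simplification. It ignores whether mutations fall at functionally significant positions, and it ignores the Cancer Gene Census evidence used for the first category. The `min_samples` guard is a hypothetical addition, so that one mutation in a handful of samples does not on its own push a gene above 5%.

```python
from enum import Enum


class GeneCategory(Enum):
    FREQUENT = "above 5% in one or more tumour types"
    RARE = "mutated, but not above 5% in any tumour type"
    NONE_FOUND = "screened, no mutations reported"


def categorise(
    per_tumour_type: dict[str, tuple[int, int]],
    threshold: float = 0.05,
    min_samples: int = 1,
) -> GeneCategory:
    """per_tumour_type maps tumour type -> (samples with mutations, samples screened)."""
    if sum(mutated for mutated, _ in per_tumour_type.values()) == 0:
        return GeneCategory.NONE_FOUND
    for mutated, screened in per_tumour_type.values():
        # Only tumour sets large enough to be informative can raise a gene to FREQUENT.
        if screened > 0 and screened >= min_samples and mutated / screened > threshold:
            return GeneCategory.FREQUENT
    return GeneCategory.RARE
```

For instance, a kinase with no mutations in 33 lung cancers and 25 breast cancers falls into the third category. One mutation in 25 breast cancers (4%) places a gene in the second category.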

CANCER CELL LINES AND KNOWN CANCER GENES

Cancer cell lines have been used extensively in the biological characterisation of cancer and in the analysis of both novel and routinely used anticancer drugs. On the whole, this has taken place with little or no consideration of the DNA sequence of known cancer genes in these samples. To redress this imbalance, COSMIC now displays mutation data that we have generated for known cancer genes in the 59 lines of the NCI-60 cell line panel and a further 669 cancer cell lines (http://www.sanger.ac.uk/genetics/CGP/CellLines/). Some of these cell lines have been sequenced in the past. For example, TP53 has been sequenced in the NCI-60 panel (O’Connor), while other lines have been used as positive controls in mutation screening experiments. Rather than curate this piecemeal set of results, we have begun to systematically resequence known cancer genes in this group of cell lines.

OTHER DATA TYPES

Additional genetic data is available for the samples that we are analysing (http://www.sanger.ac.uk/genetics/CGP). In all, 829 cancer cell lines in COSMIC have loss of heterozygosity maps produced by genotyping 395 polymorphic CA repeats from across the genome. The samples analysed in the protein kinase mutation screen (http://www.sanger.ac.uk/genetics/CGP/Kinases), together with normal samples from the same individuals, have been genotyped with the Affymetrix 10k SNP array. These genotypes have also been used to calculate loss of heterozygosity maps, and the intensity data from the SNP arrays has been used to generate chromosome copy number maps. The SNP and CA repeat data is integrated with the mutation data to provide a wider genetic perspective on these samples.
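The text does not describe how loss of heterozygosity is called. For samples with a matched normal, as in the kinase screen, a common approach is to flag markers that are heterozygous in the normal but homozygous in the tumour. The sketch below does only that; cell lines without normals would need a different approach, such as looking for long runs of homozygosity. The log2 intensity ratio is a simplified, hypothetical stand-in for the copy number analysis, not the method used for COSMIC.

```python
import math
from typing import List, Optional, Tuple

Genotype = Tuple[str, str]  # the two alleles observed at one marker


def loh_call(normal: Optional[Genotype], tumour: Optional[Genotype]) -> str:
    """Classify one marker in a matched normal/tumour pair."""
    if normal is None or tumour is None:
        return "no call"        # genotyping failed in either sample
    if normal[0] == normal[1]:
        return "uninformative"  # homozygous in the normal, so LOH cannot be detected
    return "LOH" if tumour[0] == tumour[1] else "retained"


def loh_map(pairs: List[Tuple[Optional[Genotype], Optional[Genotype]]]) -> List[str]:
    """Apply loh_call along an ordered list of markers."""
    return [loh_call(normal, tumour) for normal, tumour in pairs]


def log2_ratio(tumour_intensity: float, reference_intensity: float) -> float:
    """Relative copy number at a probe: 0 when tumour and reference intensities are equal."""
    return math.log2(tumour_intensity / reference_intensity)
```

Runs of consecutive `"LOH"` calls along a chromosome, interrupted only by uninformative markers, are what a loss of heterozygosity map would display as a lost region.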

FUTURE DIRECTIONS

The publication of data from systematic mutation screens opens a new avenue for COSMIC. The volume of systematic data is likely to grow and to provide wider insight into the mutation burden in cancer. The screening of known cancer genes in cancer cell lines provides a resource for both the genetics community and those interested in the biology of these cell lines, and we intend to expand this data further. The value of small intragenic mutation data can be enhanced by integrating other data types; as a first step, we have integrated genotyping and copy number data. In the future, we hope to incorporate other somatic mutation data to further expand the content of COSMIC. Meanwhile, we plan to continue curating the cancer mutation literature to increase the number of known cancer genes in COSMIC.
REFERENCES

1.  Repair and genetic consequences of endogenous DNA base damage in mammalian cells.

Authors:  Deborah E Barnes; Tomas Lindahl
Journal: Annu Rev Genet; Date: 2004

2.  Somatic mutations of the protein kinase gene family in human lung cancer.

Authors:  Helen Davies; Chris Hunter; Raffaella Smith; Philip Stephens; Chris Greenman; Graham Bignell; Jon Teague; Adam Butler; Sarah Edkins; Claire Stevens; Adrian Parker; Sarah O'Meara; Tim Avis; Syd Barthorpe; Lisa Brackenbury; Gemma Buck; Jody Clements; Jennifer Cole; Ed Dicks; Ken Edwards; Simon Forbes; Matthew Gorton; Kristian Gray; Kelly Halliday; Rachel Harrison; Katy Hills; Jonathon Hinton; David Jones; Vivienne Kosmidou; Ross Laman; Richard Lugg; Andrew Menzies; Janet Perry; Robert Petty; Keiran Raine; Rebecca Shepherd; Alexandra Small; Helen Solomon; Yvonne Stephens; Calli Tofts; Jennifer Varian; Anthony Webb; Sofie West; Sara Widaa; Andrew Yates; Francis Brasseur; Colin S Cooper; Adrienne M Flanagan; Anthony Green; Maggie Knowles; Suet Y Leung; Leendert H J Looijenga; Bruce Malkowicz; Marco A Pierotti; Bin T Teh; Siu T Yuen; Sunil R Lakhani; Douglas F Easton; Barbara L Weber; Peter Goldstraw; Andrew G Nicholson; Richard Wooster; Michael R Stratton; P Andrew Futreal
Journal: Cancer Res; Date: 2005-09-01

3.  The IARC TP53 database: new online mutation analysis and recommendations to users.

Authors:  Magali Olivier; Ros Eeles; Monica Hollstein; Mohammed A Khan; Curtis C Harris; Pierre Hainaut
Journal: Hum Mutat; Date: 2002-06

4.  A screen of the complete protein kinase gene family identifies diverse patterns of somatic mutations in human breast cancer.

Authors:  Philip Stephens; Sarah Edkins; Helen Davies; Chris Greenman; Charles Cox; Chris Hunter; Graham Bignell; Jon Teague; Raffaella Smith; Claire Stevens; Sarah O'Meara; Adrian Parker; Patrick Tarpey; Tim Avis; Andy Barthorpe; Lisa Brackenbury; Gemma Buck; Adam Butler; Jody Clements; Jennifer Cole; Ed Dicks; Ken Edwards; Simon Forbes; Matthew Gorton; Kristian Gray; Kelly Halliday; Rachel Harrison; Katy Hills; Jonathon Hinton; David Jones; Vivienne Kosmidou; Ross Laman; Richard Lugg; Andrew Menzies; Janet Perry; Robert Petty; Keiran Raine; Rebecca Shepherd; Alexandra Small; Helen Solomon; Yvonne Stephens; Calli Tofts; Jennifer Varian; Anthony Webb; Sofie West; Sara Widaa; Andrew Yates; Francis Brasseur; Colin S Cooper; Adrienne M Flanagan; Anthony Green; Maggie Knowles; Suet Y Leung; Leendert H J Looijenga; Bruce Malkowicz; Marco A Pierotti; Bin Teh; Siu T Yuen; Andrew G Nicholson; Sunil Lakhani; Douglas F Easton; Barbara L Weber; Michael R Stratton; P Andrew Futreal; Richard Wooster
Journal: Nat Genet; Date: 2005-05-22

5.  Characterization of the p53 tumor suppressor pathway in cell lines of the National Cancer Institute anticancer drug screen and correlations with the growth-inhibitory potency of 123 anticancer agents.

Authors:  P M O'Connor; J Jackman; I Bae; T G Myers; S Fan; M Mutoh; D A Scudiero; A Monks; E A Sausville; J N Weinstein; S Friend; A J Fornace; K W Kohn
Journal: Cancer Res; Date: 1997-10-01

6.  The UMD-p53 database: new mutations and analysis tools.

Authors:  Christophe Béroud; Thierry Soussi
Journal: Hum Mutat; Date: 2003-03

7.  EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy.

Authors:  J Guillermo Paez; Pasi A Jänne; Jeffrey C Lee; Sean Tracy; Heidi Greulich; Stacey Gabriel; Paula Herman; Frederic J Kaye; Neal Lindeman; Titus J Boggon; Katsuhiko Naoki; Hidefumi Sasaki; Yoshitaka Fujii; Michael J Eck; William R Sellers; Bruce E Johnson; Matthew Meyerson
Journal: Science; Date: 2004-04-29

8.  The interacting pathways for prevention and repair of oxidative DNA damage.

Authors:  Geir Slupphaug; Bodil Kavli; Hans E Krokan
Journal: Mutat Res; Date: 2003-10-29

9.  A census of human cancer genes.

Authors:  P Andrew Futreal; Lachlan Coin; Mhairi Marshall; Thomas Down; Timothy Hubbard; Richard Wooster; Nazneen Rahman; Michael R Stratton
Journal: Nat Rev Cancer; Date: 2004-03

10.  The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website.

Authors:  S Bamford; E Dawson; S Forbes; J Clements; R Pettett; A Dogan; A Flanagan; J Teague; P A Futreal; M R Stratton; R Wooster
Journal: Br J Cancer; Date: 2004-07-19
