
The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website.

S Bamford, E Dawson, S Forbes, J Clements, R Pettett, A Dogan, A Flanagan, J Teague, P A Futreal, M R Stratton, R Wooster.

Abstract

The discovery of mutations in cancer genes has advanced our understanding of cancer. These results are dispersed across the scientific literature and with the availability of the human genome sequence will continue to accrue. The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website have been developed to store somatic mutation data in a single location and display the data and other information related to human cancer. To populate this resource, data has currently been extracted from reports in the scientific literature for somatic mutations in four genes, BRAF, HRAS, KRAS2 and NRAS. At present, the database holds information on 66 634 samples and reports a total of 10 647 mutations. Through the web pages, these data can be queried, displayed as figures or tables and exported in a number of formats. COSMIC is an ongoing project that will continue to curate somatic mutation data and release it through the website.

Year:  2004        PMID: 15188009      PMCID: PMC2409828          DOI: 10.1038/sj.bjc.6601894

Source DB:  PubMed          Journal:  Br J Cancer        ISSN: 0007-0920            Impact factor:   7.640


Approximately one in three individuals in Europe and North America develops one of the approximately 200 different classes of cancer, and cancer is the cause of death of one in five (Higginson, 1992). All cancers arise as a result of the acquisition of a series of fixed DNA sequence abnormalities, each of which ultimately confers growth advantage upon the clone of cells in which it has occurred (Vogelstein and Kinzler, 1998). These abnormalities include base substitutions, deletions, amplifications and rearrangements. The extent to which each of these mechanisms contributes to cancer varies markedly between different genes, and probably also between different cancer types.
Identification of the genes that are mutated in cancer is a central aim of cancer research. Over the past 25 years, approximately 300 genes have been shown to be somatically mutated in cancer (Futreal ). This work forms the foundation for understanding the biological abnormalities within neoplastic cells, provides information on the function of gene products and sheds light on more complex questions such as the relationships between genes and biochemical pathways. Current strategies for the development of new therapeutic and preventive agents in cancer are increasingly dependent upon modulation of these critical molecular targets.
The scientific literature is a rich source of mutation data that, in general, is published in a piecemeal fashion. More comprehensive data sources do exist, such as Online Mendelian Inheritance in Man (OMIM, Wheeler ), HGVbase (Fredman ) and the Human Gene Mutation Database (HGMD, Stenson ). These databases give overviews of the genetics and biology of many genes and associated diseases (OMIM), genome variants and associated genotype–phenotype relationships (HGVbase) or germline mutation data (HGMD). For somatic mutations in cancer, there are many locus-specific web resources, such as those for p53 (Olivier ; Béroud and Soussi, 2003), that cover a single gene in depth.
The value of these various databases should not be underestimated; however, none of them offers a comprehensive view of all previously reported somatic mutations in cancer. Looking to the future, the volume of somatic mutation data will continue to expand and the scientific community will be better served if this data is provided in a coherent fashion. A public, comprehensive, intuitive, accessible and integrated database is required to maximise the benefit from this rich data set. The Catalogue of Somatic Mutations in Cancer (COSMIC) (http://www.sanger.ac.uk/cosmic) is a database that holds somatic mutation data and associated information, and can be interrogated through a series of web pages to provide a graphical or tabular view of the data along with various export options. To date, the database has been populated with data from four genes: HRAS, KRAS2, NRAS and BRAF.

DATA CURATION

Gene selection

The genes that have been selected for curation are taken from the list of cancer genes assembled in the Cancer Gene Census (Futreal ). In the first instance, data was obtained for four genes that are known to be somatically mutated in cancer: HRAS (Reddy ), KRAS2 (McCoy ), NRAS (Hall ) and BRAF (Davies ).

Data extraction from the literature

PubMed (Wheeler ) is broadly searched for references containing relevant somatic mutation data in cancer (example search: (ras OR genes, ras) AND human AND mutation). In the first instance, the abstract is read to identify, and select for inclusion in the database, papers that are likely to include somatic mutation information relating to cancer or precancerous conditions. Primary research papers are read and information about the samples, mutations and experimental methods (see Table 1) is extracted and entered into the database. Reviews are also selected if thought to be specific to a gene of interest. To avoid duplication of data, reviews are used only to identify the relevant primary literature and not as a source of mutation data. Any references containing incomplete data (e.g. mutations reported but not fully described) or data of insufficient quality (e.g. errors identified in the data) are not fully curated but are added to a list of additional references containing somatic mutation information. Simple mutations are fed through Mutation Checker (Stajich ) before being imported to COSMIC, while more complex alterations are manually annotated.
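The example search above can be expressed as a reproducible query URL. The sketch below is illustrative only: the text does not say how the PubMed searches were run, so the use of the NCBI E-utilities `esearch` endpoint, the function name `build_pubmed_search_url` and the `retmax` value are all assumptions. No network request is made; the code only builds and decodes the URL.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# NCBI E-utilities search endpoint. Assumption: the paper does not state
# which interface was used for the PubMed searches.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(term: str, retmax: int = 100) -> str:
    """Return an esearch URL for a PubMed query (no request is sent)."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Example search string quoted in the text.
EXAMPLE_TERM = "(ras OR genes, ras) AND human AND mutation"

search_url = build_pubmed_search_url(EXAMPLE_TERM)
# Decode the query string again to confirm the term survives URL encoding.
decoded = parse_qs(urlparse(search_url).query)

print(search_url)
print(decoded["term"][0])
```

Keeping the search term as a single constant makes it easy to record exactly which query produced a given batch of candidate papers.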
Table 1. Data entered in COSMIC

Each line starts with a section heading for the data in COSMIC (bold in the original table), followed by the fields held in that section.

Reference: Title; Authors; Journal; Year; Volume; Page start and stop; PubMed ID; Experimental information; Gene
Mutation: Mutation ID; Mutation type; DNA location; DNA change; DNA evidence; Is somatic; RNA label; RNA change; RNA region; RNA location; RNA evidence; Amino-acid label; Amino-acid location; Amino-acid change; Amino-acid evidence; Gene; Sequence; Remark
Experimental information: Primary detection method; Secondary detection method; Confirmation method; Exons/codons screened; Whole gene screened; Remark
Sample: Gene; Experimental information; Sample ID; Mutation status; Normal tissue tested; Site primary; Site subtype 1; Site subtype 2; Histology; Histology subtype 1; Histology subtype 2; Stage; Grade; Source tissue; Loss of heterozygosity; Gender; Age; Other mutations; Ethnicity; Geographical location; Parent tested; Family ID; Remark; Reference; Environmental variables
Gene: Name; Symbol; Other names; Chromosome; Chromosome band; cDNA sequence accession; cDNA sequence version; Ensembl gene start and stop; Swissprot accession; OMIM accession

COSMIC DATABASE

The COSMIC database is implemented in an Oracle relational database and has five sections, each containing multiple tables.

Gene information

A static version of each gene is maintained in COSMIC. The genomic structure of each gene and chromosome location is derived from Ensembl (Birney ) and cDNA sequence and protein sequence from the RefSeq project (Wheeler ). Other information is held to provide links to web resources such as Ensembl (Birney ), Pfam (Bateman ), InterPro (Mulder ) and OMIM (Wheeler ).

Paper information

The details of the papers that have been curated are maintained in the paper section and include title, journal, author lists and links to PubMed. There are currently 1483 papers in COSMIC: 865 of these have been curated for mutations, while 618 either have no relevant data or have incomplete data that could not be extracted accurately. By gene, 30, 249, 718 and 303 papers report BRAF, HRAS, KRAS2 and NRAS mutations, respectively. Of the 865 papers reporting mutations, 615 report data on only one gene, while 72, 174 and four contain data on two, three or all four genes, respectively.

Mutation information

COSMIC can accommodate information on base substitutions, insertions and deletions, translocations and changes in copy number. For the four genes presently in COSMIC, there are 147 unique mutations (36 for BRAF, 27 for HRAS, 52 for KRAS2 and 32 for NRAS). In the tumours that have been analysed, there are a total of 10 647 mutations, 736 in BRAF, 477 in HRAS, 8302 in KRAS2 and 1132 in NRAS.
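The per-gene figures quoted above can be checked against the stated totals with a few lines of arithmetic. This is a minimal sketch using only the numbers given in the text; the variable names are illustrative.

```python
# Counts quoted in the text, keyed by gene symbol.
unique_mutations = {"BRAF": 36, "HRAS": 27, "KRAS2": 52, "NRAS": 32}
observed_mutations = {"BRAF": 736, "HRAS": 477, "KRAS2": 8302, "NRAS": 1132}

total_unique = sum(unique_mutations.values())
total_observed = sum(observed_mutations.values())

# The per-gene figures should add up to the totals stated in the text.
assert total_unique == 147
assert total_observed == 10647

# Share of all reported mutations contributed by each gene.
for gene, count in observed_mutations.items():
    share = 100 * count / total_observed
    print(f"{gene}: {count} mutations ({share:.1f}% of all reported mutations)")
```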

Tumour classification system

The tissue site and histology data is taken from the curated papers and entered into COSMIC (this forms the ‘paper definition’). Tumour classification is a continually evolving field and journals do not adhere to a standard nomenclature. Identical tissues and histologies can have different labels depending on the origin and age of the study. To overcome difficulties caused by these alternate nomenclatures, a standardised system of definitions has been developed (the ‘COSMIC definitions’) through consultation with experts in the field. This groups data from the same tissue types and histologies and can be used to translate the ‘paper definitions’ to ‘COSMIC definitions’. Every sample has up to eight definitions: primary tissue, tissue subtypes 1, 2 and 3, primary histology and histology subtypes 1, 2 and 3. If there is no data for any of these definitions, COSMIC records an entry of NS (not specified). A total of 513 tissue definitions have been noted in the papers in COSMIC and have been translated to 372 COSMIC tissue definitions. Likewise, a total of 1150 histology definitions were found in the papers in COSMIC and were translated to 425 COSMIC histology definitions. This unified classification system is presented through the web pages to provide a normalised browsing tool.
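The translation from ‘paper definitions’ to ‘COSMIC definitions’ can be pictured as a lookup table plus a default of NS for missing values. The sketch below is a simplified illustration, not the COSMIC implementation: the mapping entries, slot names and function names are hypothetical, and a single lookup table is used for all eight slots for brevity.

```python
from typing import Dict, Optional

NOT_SPECIFIED = "NS"

# The eight definitions held for every sample (slot names are illustrative).
DEFINITION_SLOTS = (
    "primary_tissue", "tissue_subtype_1", "tissue_subtype_2", "tissue_subtype_3",
    "primary_histology", "histology_subtype_1", "histology_subtype_2",
    "histology_subtype_3",
)

# Hypothetical translation entries, invented for illustration only.
PAPER_TO_COSMIC = {
    "colon": "large intestine",
    "colorectal": "large intestine",
    "large bowel": "large intestine",
}


def translate(paper_label: Optional[str]) -> str:
    """Map one paper definition to its COSMIC definition; missing data becomes NS."""
    if paper_label is None or not paper_label.strip():
        return NOT_SPECIFIED
    key = paper_label.strip().lower()
    if key not in PAPER_TO_COSMIC:
        # Surface unknown labels so that a curator can add a translation.
        raise LookupError(f"no COSMIC definition for paper definition {paper_label!r}")
    return PAPER_TO_COSMIC[key]


def cosmic_definitions(paper_definitions: Dict[str, str]) -> Dict[str, str]:
    """Return all eight COSMIC definitions for a sample, filling gaps with NS."""
    return {slot: translate(paper_definitions.get(slot)) for slot in DEFINITION_SLOTS}


sample = cosmic_definitions({"primary_tissue": "Colon"})
print(sample)
```

Raising an error for an unknown label, rather than guessing, keeps the many-to-one translation under curator control.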

Individual/tumour/sample data

The sample data is taken from the curated papers and linked to the appropriate gene, paper, classification and, when present, a mutation. This forms the core of the COSMIC database. An individual can have many tumours and each tumour can have many samples. However, in the COSMIC scheme, each sample is unique and could be considered as a single experiment. There are 66 634 sample records in COSMIC (5158, 11 876, 35 716 and 13 884 for BRAF, HRAS, KRAS2 and NRAS, respectively). These samples are derived from 57 444 tumours, of which 51 988 were analysed in one gene, 2353 in two genes, 2930 in three genes and 173 in all four genes.
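The relationships described in this section (individual to tumour to sample, with each sample linked to a gene, a paper and optionally a mutation) can be sketched as a small relational schema. This is an illustration under stated assumptions: COSMIC itself is implemented in Oracle, whereas the sketch uses Python's built-in `sqlite3` with an in-memory database, and every table name, column name and inserted row is hypothetical.

```python
import sqlite3

# Hypothetical, heavily simplified schema; the real database has many more tables.
SCHEMA = """
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY,
    symbol  TEXT NOT NULL UNIQUE
);
CREATE TABLE paper (
    paper_id INTEGER PRIMARY KEY,
    title    TEXT NOT NULL
);
CREATE TABLE mutation (
    mutation_id INTEGER PRIMARY KEY,
    gene_id     INTEGER NOT NULL REFERENCES gene(gene_id),
    aa_change   TEXT NOT NULL
);
CREATE TABLE individual (
    individual_id INTEGER PRIMARY KEY
);
CREATE TABLE tumour (
    tumour_id         INTEGER PRIMARY KEY,
    individual_id     INTEGER NOT NULL REFERENCES individual(individual_id),
    primary_tissue    TEXT NOT NULL DEFAULT 'NS',
    primary_histology TEXT NOT NULL DEFAULT 'NS'
);
CREATE TABLE sample (
    sample_id   INTEGER PRIMARY KEY,
    tumour_id   INTEGER NOT NULL REFERENCES tumour(tumour_id),
    gene_id     INTEGER NOT NULL REFERENCES gene(gene_id),
    paper_id    INTEGER NOT NULL REFERENCES paper(paper_id),
    mutation_id INTEGER REFERENCES mutation(mutation_id)  -- NULL: no mutation found
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# Placeholder rows: one tumour screened in two genes gives two sample records.
conn.executemany("INSERT INTO gene VALUES (?, ?)", [(1, "BRAF"), (2, "NRAS")])
conn.execute("INSERT INTO paper VALUES (1, 'hypothetical curated paper')")
conn.execute("INSERT INTO mutation VALUES (1, 1, 'hypothetical change')")
conn.execute("INSERT INTO individual VALUES (1)")
conn.execute(
    "INSERT INTO tumour (tumour_id, individual_id, primary_tissue) VALUES (1, 1, 'skin')"
)
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 1), (2, 1, 2, 1, None)],
)

# Per gene: number of samples screened and number carrying a mutation.
rows = conn.execute(
    """
    SELECT g.symbol, COUNT(*) AS samples, COUNT(s.mutation_id) AS mutated
    FROM sample AS s
    JOIN gene AS g ON g.gene_id = s.gene_id
    GROUP BY g.symbol
    ORDER BY g.symbol
    """
).fetchall()
print(rows)
```

Treating each sample as one experiment (one tumour screened in one gene) is what allows the sample count to exceed the tumour count, as in the figures above.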

COSMIC WEBSITE

A series of web pages provides query tools to interrogate COSMIC and produces graphical (Figure 1) and tabular (Table 2) displays of the data. Currently, the output is provided at the amino-acid level based on the protein structure of each gene.
Figure 1

The initial output from COSMIC is a graphical view of the mutations distributed along the linear amino-acid sequence of the gene. The scale bar incorporates a zoom function to generate a more detailed view of the protein to the point where individual amino acids are named (when there are fewer than 31 amino acids displayed). When a Pfam or Interpro domain is present, a link is provided to these resources (adjacent to the Domain label) while links to the papers that were curated are positioned beneath the mutations (in red) with an option of either viewing the papers that have data for a particular location in the protein or all of the papers for the selected gene.

Table 2. Mutation details from COSMIC: details for BRAF

Tissue                              Mutations (% of All Samples)  All Samples  Mutation Data
NS                                  0                             3            More Details
adrenal gland                       0                             2            More Details
autonomic ganglia                   0                             27           More Details
bile duct                           16 (23%)                      70           More Details
bladder                             0                             37           More Details
bone                                1 (3%)                        31           More Details
brain                               4 (7%)                        56           More Details
breast                              1 (1%)                        78           More Details
cervix                              0                             49           More Details
endometrium                         0                             5            More Details
eye                                 0                             31           More Details
haematopoietic and lymphoid tissue  4 (1%)                        322          More Details
head neck                           6 (4%)                        152          More Details
kidney                              0                             12           More Details
large intestine                     148 (13%)                     1135         More Details
larynx                              0                             25           More Details
liver                               1 (3%)                        32           More Details
lung                                15 (2%)                       829          More Details
mouth                               0                             13           More Details
ovary                               57 (20%)                      282          More Details
pancreas                            5 (4%)                        114          More Details
pharynx                             3 (6%)                        51           More Details
placenta                            0                             1            More Details
pleura                              0                             3            More Details
prostate                            0                             43           More Details
skin                                282 (61%)                     460          More Details
small intestine                     0                             1            More Details
soft tissue                         5 (2%)                        211          More Details
stomach                             7 (2%)                        407          More Details
testis                              0                             7            More Details
thyroid                             181 (27%)                     669          More Details

The mutations from COSMIC are presented by tissue and where selected by histology with a figure for the number of samples analysed for each tissue (All Samples) and the number of mutations reported (Mutated). The ‘More Details’ column gives further navigation options to view data for the selected tissue, view data for the same tissue in other genes or provide more details on the mutations for the selected tissue.
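The percentage shown beside each mutation count in Table 2 is the number of mutations divided by the number of samples analysed for that tissue. A minimal sketch, using a few BRAF rows from the table; rounding to the nearest whole percent is an assumption that happens to reproduce the published figures, and the function name is illustrative.

```python
# (mutations, all samples) for a few BRAF rows taken from Table 2.
braf_by_tissue = {
    "skin": (282, 460),
    "thyroid": (181, 669),
    "bile duct": (16, 70),
    "ovary": (57, 282),
    "large intestine": (148, 1135),
    "placenta": (0, 1),
}


def mutation_percentage(mutations: int, all_samples: int) -> int:
    """Percentage of analysed samples with a reported mutation, as a whole number."""
    if all_samples <= 0:
        raise ValueError("all_samples must be positive")
    return round(100 * mutations / all_samples)


for tissue, (mutations, all_samples) in braf_by_tissue.items():
    pct = mutation_percentage(mutations, all_samples)
    print(f"{tissue}: {mutations} ({pct}%) of {all_samples} samples")
```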


Browse by gene

Immediate access to the data is provided through the Browse by Gene link. This gives an instant overview of the mutation data for one or more genes and gives links to display data for individual tissues.

Browse by tissue

More complex queries can be constructed using the Browse by Tissue link. The user has the option to select one or more tissues, then one or more histologies, and finally one or more genes. If only one tissue or histology is selected, it is possible to select one or more tissue or histology subtypes before making a gene selection. All of the tissues present in the COSMIC classification scheme are available from the first page; however, subsequent pages show only the relevant options rather than the entire list. For example, having selected eye, the tissue subtype options are retina and uveal tract.
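The cascading selection can be sketched as a function that offers subtype options only when exactly one tissue has been chosen. Only the eye entry comes from the text; the function name, the data structure and the empty subtype lists for the other tissues are assumptions made for illustration.

```python
from typing import Dict, List, Sequence

# Tissue -> tissue subtypes. Only the "eye" entry is taken from the text;
# the empty lists for the other tissues are placeholders.
TISSUE_SUBTYPES: Dict[str, List[str]] = {
    "eye": ["retina", "uveal tract"],
    "skin": [],
    "thyroid": [],
}


def subtype_options(selected_tissues: Sequence[str]) -> List[str]:
    """Return the subtype choices to show on the next page.

    Subtypes are offered only when a single tissue is selected; with several
    tissues selected the user goes straight on to the histology choice.
    """
    if len(selected_tissues) != 1:
        return []
    return list(TISSUE_SUBTYPES.get(selected_tissues[0], []))


print(subtype_options(["eye"]))
print(subtype_options(["eye", "skin"]))
```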

Data display

After querying the database, the results are displayed as a figure (Figure 1) and as a series of tables (Table 2) for each gene that was selected. The figure shows the linear amino-acid sequence derived from the gene with the mutations positioned along its length. Further information and links are provided as appropriate to the protein sequence. The table gives a summary of the mutations stratified by tissue and histology. The depth of the stratification relates to the depth of the original query. If only tissue was selected, the data will be stratified by tissue; however, if tissue, subtissue, histology and subhistology are selected, the data will be broken down further. Links from this table reload the figure to display a subset of the data and provide more details of the specific mutations. Two other tables provide a summary of the statistics in COSMIC for the selected gene and a summary of the mutations shown in the figure.
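The way the depth of the query controls the depth of the summary table can be sketched as a grouping step. This is an illustration, not the website's code: the level names, the `stratify` function and the three sample records (including the histology labels) are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple

# Classification levels in the order in which a query can be refined.
LEVELS = ("tissue", "tissue_subtype", "histology", "histology_subtype")


def stratify(samples: Iterable[Mapping], depth: int) -> Dict[Tuple[str, ...], Tuple[int, int]]:
    """Group samples by the first `depth` levels.

    Returns {group: (mutated samples, all samples)}; missing levels count as NS.
    """
    if not 1 <= depth <= len(LEVELS):
        raise ValueError("depth must be between 1 and 4")
    keys = LEVELS[:depth]
    totals: Counter = Counter()
    mutated: Counter = Counter()
    for sample in samples:
        group = tuple(sample.get(key, "NS") for key in keys)
        totals[group] += 1
        if sample["mutated"]:
            mutated[group] += 1
    return {group: (mutated[group], totals[group]) for group in totals}


# Hypothetical sample records; the histology labels are placeholders.
samples = [
    {"tissue": "skin", "histology": "histology A", "mutated": True},
    {"tissue": "skin", "histology": "histology A", "mutated": False},
    {"tissue": "skin", "histology": "histology B", "mutated": True},
]

by_tissue = stratify(samples, depth=1)
by_histology = stratify(samples, depth=3)
print(by_tissue)
print(by_histology)
```

With depth 1 all three samples fall into one skin row; with depth 3 the same samples split into one row per histology.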

Exports and downloads

Having displayed the results from a query, the data can be formatted as simple text, Excel or HTML and downloaded from the COSMIC site. The cDNA and protein sequences are available through the Additional Info. link on the COSMIC home page, as is the Classification Scheme.
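As an illustration of the text and HTML export formats, the sketch below serialises a result table with Python's standard library. The function names and the tab-delimited layout are assumptions, since the exact file formats produced by the website are not described; Excel output is omitted because it would need a third-party library.

```python
import csv
import io
from html import escape
from typing import Iterable, Sequence

HEADER = ("Tissue", "Mutations", "All Samples")
# Two BRAF rows taken from Table 2.
ROWS = [("skin", 282, 460), ("thyroid", 181, 669)]


def to_text(header: Sequence, rows: Iterable[Sequence]) -> str:
    """Serialise a result table as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def to_html(header: Sequence, rows: Iterable[Sequence]) -> str:
    """Serialise a result table as a minimal HTML table, escaping every cell."""
    head = "".join(f"<th>{escape(str(cell))}</th>" for cell in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


text_export = to_text(HEADER, ROWS)
html_export = to_html(HEADER, ROWS)
print(text_export)
print(html_export)
```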

FUTURE DIRECTIONS

There is a continuing effort to enter additional somatic mutation data into COSMIC. In order to keep the data in COSMIC up to date, we regularly monitor the literature for new reports of mutations in the genes that exist in COSMIC. In addition, further cancer genes will be taken from the Cancer Gene Census (Futreal ) and curated. The COSMIC website will be developed further to make use of the underlying data. This will include a DNA view of the mutations and methods to display insertions and deletions. In addition, we will display other data that has already been captured, such as the patient sex and age for the samples and the experimental methods used to screen for the mutations. There are, however, limitations to this data, as we can only collect data that is described in the original work. Even with this caveat, the data provides a direct summary of the somatic mutation literature. Considering the data set as a whole, it will be possible to analyse, in greater detail, the wider aspects of the biology underlying the genetic changes that take place in cancer.

REFERENCES (14 in total; 10 listed)

1.  Ensembl 2004.

Authors:  E Birney; D Andrews; P Bevan; M Caccamo; G Cameron; Y Chen; L Clarke; G Coates; T Cox; J Cuff; V Curwen; T Cutts; T Down; R Durbin; E Eyras; X M Fernandez-Suarez; P Gane; B Gibbins; J Gilbert; M Hammond; H Hotz; V Iyer; A Kahari; K Jekosch; A Kasprzyk; D Keefe; S Keenan; H Lehvaslaiho; G McVicker; C Melsopp; P Meidl; E Mongin; R Pettett; S Potter; G Proctor; M Rae; S Searle; G Slater; D Smedley; J Smith; W Spooner; A Stabenau; J Stalker; R Storey; A Ureta-Vidal; C Woodwark; M Clamp; T Hubbard
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

2.  The InterPro Database, 2003 brings increased coverage and new features.

Authors:  Nicola J Mulder; Rolf Apweiler; Teresa K Attwood; Amos Bairoch; Daniel Barrell; Alex Bateman; David Binns; Margaret Biswas; Paul Bradley; Peer Bork; Phillip Bucher; Richard R Copley; Emmanuel Courcelle; Ujjwal Das; Richard Durbin; Laurent Falquet; Wolfgang Fleischmann; Sam Griffiths-Jones; Daniel Haft; Nicola Harte; Nicolas Hulo; Daniel Kahn; Alexander Kanapin; Maria Krestyaninova; Rodrigo Lopez; Ivica Letunic; David Lonsdale; Ville Silventoinen; Sandra E Orchard; Marco Pagni; David Peyruc; Chris P Ponting; Jeremy D Selengut; Florence Servant; Christian J A Sigrist; Robert Vaughan; Evgueni M Zdobnov
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

3.  The Bioperl toolkit: Perl modules for the life sciences.

Authors:  Jason E Stajich; David Block; Kris Boulez; Steven E Brenner; Stephen A Chervitz; Chris Dagdigian; Georg Fuellen; James G R Gilbert; Ian Korf; Hilmar Lapp; Heikki Lehväslaiho; Chad Matsalla; Chris J Mungall; Brian I Osborne; Matthew R Pocock; Peter Schattner; Martin Senger; Lincoln D Stein; Elia Stupka; Mark D Wilkinson; Ewan Birney
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

4.  HGVbase: a human sequence variation database emphasizing data quality and a broad spectrum of data sources.

Authors:  D Fredman; M Siegfried; Y P Yuan; P Bork; H Lehväslaiho; A J Brookes
Journal:  Nucleic Acids Res       Date:  2002-01-01       Impact factor: 16.971

5.  The IARC TP53 database: new online mutation analysis and recommendations to users.

Authors:  Magali Olivier; Ros Eeles; Monica Hollstein; Mohammed A Khan; Curtis C Harris; Pierre Hainaut
Journal:  Hum Mutat       Date:  2002-06       Impact factor: 4.878

6.  Identification of transforming gene in two human sarcoma cell lines as a new member of the ras gene family located on chromosome 1.

Authors:  A Hall; C J Marshall; N K Spurr; R A Weiss
Journal:  Nature       Date:  1983 Jun 2-8       Impact factor: 49.962

7.  The UMD-p53 database: new mutations and analysis tools.

Authors:  Christophe Béroud; Thierry Soussi
Journal:  Hum Mutat       Date:  2003-03       Impact factor: 4.878

8.  Mutations of the BRAF gene in human cancer.

Authors:  Helen Davies; Graham R Bignell; Charles Cox; Philip Stephens; Sarah Edkins; Sheila Clegg; Jon Teague; Hayley Woffendin; Mathew J Garnett; William Bottomley; Neil Davis; Ed Dicks; Rebecca Ewing; Yvonne Floyd; Kristian Gray; Sarah Hall; Rachel Hawes; Jaime Hughes; Vivian Kosmidou; Andrew Menzies; Catherine Mould; Adrian Parker; Claire Stevens; Stephen Watt; Steven Hooper; Rebecca Wilson; Hiran Jayatilake; Barry A Gusterson; Colin Cooper; Janet Shipley; Darren Hargrave; Katherine Pritchard-Jones; Norman Maitland; Georgia Chenevix-Trench; Gregory J Riggins; Darell D Bigner; Giuseppe Palmieri; Antonio Cossu; Adrienne Flanagan; Andrew Nicholson; Judy W C Ho; Suet Y Leung; Siu T Yuen; Barbara L Weber; Hilliard F Seigler; Timothy L Darrow; Hugh Paterson; Richard Marais; Christopher J Marshall; Richard Wooster; Michael R Stratton; P Andrew Futreal
Journal:  Nature       Date:  2002-06-09       Impact factor: 49.962

9.  Database resources of the National Center for Biotechnology Information: update.

Authors:  David L Wheeler; Deanna M Church; Ron Edgar; Scott Federhen; Wolfgang Helmberg; Thomas L Madden; Joan U Pontius; Gregory D Schuler; Lynn M Schriml; Edwin Sequeira; Tugba O Suzek; Tatiana A Tatusova; Lukas Wagner
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

10.  The Pfam protein families database.

Authors:  Alex Bateman; Lachlan Coin; Richard Durbin; Robert D Finn; Volker Hollich; Sam Griffiths-Jones; Ajay Khanna; Mhairi Marshall; Simon Moxon; Erik L L Sonnhammer; David J Studholme; Corin Yeats; Sean R Eddy
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971
