
COSMIC: exploring the world's knowledge of somatic mutations in human cancer.

Simon A Forbes¹, David Beare², Prasad Gunasekaran², Kenric Leung², Nidhi Bindal², Harry Boutselakis², Minjie Ding², Sally Bamford², Charlotte Cole², Sari Ward², Chai Yin Kok², Mingming Jia², Tisham De², Jon W Teague², Michael R Stratton², Ultan McDermott², Peter J Campbell².

Abstract

COSMIC, the Catalogue Of Somatic Mutations In Cancer (http://cancer.sanger.ac.uk), is the world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer. Our latest release (v70; Aug 2014) describes 2 002 811 coding point mutations in over one million tumor samples and across most human genes. To emphasize depth of knowledge on known cancer genes, mutation information is curated manually from the scientific literature, allowing very precise definitions of disease types and patient details. Combining almost 20 000 published studies gives substantial resolution of how mutations and phenotypes relate in human cancer, providing insights into the stratification of mutations and biomarkers across cancer patient populations. Conversely, our curation of cancer genomes (over 12 000) emphasizes breadth of knowledge, driving the discovery of unrecognized cancer-driving hotspots and molecular targets. Our high-resolution curation approach is globally unique, giving substantial insight into molecular biomarkers in human oncology. In addition, COSMIC details more than six million noncoding mutations, 10 534 gene fusions, 61 299 genome rearrangements, 695 504 abnormal copy number segments and 60 119 787 abnormal expression variants. All these types of somatic mutation are annotated to both the human genome and each affected coding gene, then correlated across disease and mutation types.



Year: 2014 PMID: 25355519 PMCID: PMC4383913 DOI: 10.1093/nar/gku1075

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

COSMIC is a database system designed to bring together the world's information on somatic mutations in human cancer into a single system and make it easily explorable. Gene-focused manual curation delivers deep mutation profiles on known cancer genes selected from the Cancer Gene Census (1) (http://cancer.sanger.ac.uk/cancergenome/projects/census/). These profiles, across more than 2500 human cancer diseases, allow deep stratification of which mutations cause which cancers. To complement this depth of knowledge, systematic curation of cancer genomes, from both publications and consortium data portals, provides great breadth across all somatic annotations of the human genome, giving substantial power to discover new cancer-causing events. COSMIC launched in 2004 (2) detailing four cancer genes; the enormous growth of cancer genetics and genomics over the last 10 years now allows it to represent full literature curations of 136 genes and 12 542 cancer genomes (total numbers of data are shown in Table 1). Originally designed to detail simple point mutations in coding genes, COSMIC now describes millions of coding mutations, noncoding mutations, genomic rearrangements, fusion genes, copy number abnormalities and gene expression variants across the human genome.

Table 1.

Total contents in version 70 of the COSMIC database, the August 2014 release

Data type	Total (v70)
Genes (transcripts)	28 735
Tumor samples	1 029 547
Coding mutations	2 002 811
Curated publications	19 703
Fusion mutations	10 435
Genomic rearrangements	61 299
Whole genomes	12 542
Copy number aberrations	695 504
Gene expression variants	60 119 787

DATABASE CONTENT

Curation of published cancer mutation data follows two complementary approaches. To obtain great depth of knowledge on key cancer genes, all appropriate literature is identified for each gene and then curated manually. This manual approach captures very high detail across mutation positions, disease descriptions and other patient and population data (such as age, ethnicity and therapeutic regime). Over 2500 cancer disease classifications from 47 primary tissue types are currently described in COSMIC, and manual curation is the only way to capture the level of detail required to define these populations. Manual curation also provides better quality control than systematic approaches. While gene, nucleotide and vocabulary details can be checked automatically, experienced curators are much better at identifying inconsistencies or errors in publications, allowing untrustworthy, incomplete or unspecific data sources to be rejected; over 30% of the 25 715 papers so far scrutinized by COSMIC have been rejected. New genes are released in COSMIC only once the curation of their literature is exhausted, so that their mutation patterns are as up to date as possible. After the initial release, information for these genes is updated as new papers are published. Complementary to the manual curation effort, a semi-automated approach has been developed for curating large cancer genome (and exome) data sets. Data sources are identified from the published literature and online data portals. Over 300 cancer genome publications have now been curated, and COSMIC includes substantial data sets from The Cancer Genome Atlas (3) (TCGA; http://cancergenome.nih.gov) and International Cancer Genome Consortium (4) (ICGC; https://dcc.icgc.org) projects. Approximately half of COSMIC's cancer genomes are curated from these consortium data portals and the other half from the published literature.
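One of the automated checks mentioned above, confirming that a reported nucleotide change is consistent with the gene's sequence, can be sketched as follows. This is a minimal illustration, not COSMIC's curation code: it assumes mutations are reported as simple coding substitutions such as c.4A>G (1-based coding DNA position) and that the coding sequence is available as a string. The function name, regular expression and toy sequence are hypothetical.

```python
import re

# Simple coding substitutions such as "c.4A>G" (1-based position in the coding sequence).
SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")

def check_substitution(cds: str, syntax: str) -> str:
    """Return 'ok' or a short reason why the reported substitution is inconsistent."""
    match = SUBSTITUTION.match(syntax)
    if match is None:
        return "unparseable syntax"
    position, ref, alt = int(match.group(1)), match.group(2), match.group(3)
    if ref == alt:
        return "reference equals alternate"
    if not 1 <= position <= len(cds):
        return "position outside coding sequence"
    if cds[position - 1] != ref:
        return f"reference mismatch: sequence has {cds[position - 1]}"
    return "ok"

toy_cds = "ATGACTGAATATAAA"  # hypothetical 15-nt coding sequence
for reported in ["c.4A>G", "c.4G>A", "c.99T>C", "p.V600E"]:
    print(reported, check_substitution(toy_cds, reported))
```

Checks like these catch transcription errors mechanically, which leaves curators free to judge the harder questions (sample provenance, disease description, data completeness) that the text says only manual review can answer.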
Sample details and disease descriptions are curated into COSMIC manually, while the mutations, usually supplied as genomic coordinates, are annotated automatically by a software pipeline using Ensembl genome annotations (5) (http://www.ensembl.org). This pipeline uses custom software similar to the Variant Effect Predictor (VEP) (6) to identify the positions of coding mutations and annotate their consequences. Somatic mutations in cancer are now described across almost all human genes. Although genome-wide resequencing is becoming a standard technology in cancer genetics, its methodologies are still imperfect, if rapidly improving. In these experiments, sequencing coverage is rarely complete, and GC-rich regions are particularly prone to dropout (7). It is therefore often difficult to identify every genomic variant, or to determine whether a sample is wild-type or simply not assessed at a given position. In the absence of sample-specific coverage information (or raw data to re-evaluate), COSMIC makes the standard assumption that every gene has been evaluated in every sample, and calculates mutation rates accordingly. All curated data in COSMIC are referenced to their source, allowing investigators to verify the findings independently and in greater detail. The exception is the ‘COSMIC cell line project’, which contains pre-publication exome resequencing results; to support research across these resources, its raw data are being prepared for public release. During its 10-year existence, COSMIC has focused mainly on aggregating point mutation data across genes and genomes. In addition, manual curation efforts now include the description of fusion genes. Often observed in cancer, these mutations result from genomic rearrangements that usually bring two coding domains together so that they form a single mutant transcript driving tumorigenesis.
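The core of the coordinate annotation step described above, turning a genomic position into a coding-sequence position and codon number, can be sketched as below. This is a simplified illustration of the kind of mapping a VEP-style pipeline performs, not COSMIC's software: it assumes a single transcript whose coding exons are given as 1-based inclusive genomic intervals, and it ignores UTRs, splice sites and indels. The exon coordinates are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates

def genomic_to_cds(pos: int, coding_exons: List[Interval], strand: int) -> Optional[int]:
    """Map a genomic position to a 1-based CDS position, or None if it is not coding."""
    # Walk exons in transcription order: ascending on the + strand, descending on -.
    ordered = sorted(coding_exons, reverse=(strand == -1))
    offset = 0  # coding bases contributed by the exons already passed
    for start, end in ordered:
        if start <= pos <= end:
            within = pos - start if strand == 1 else end - pos
            return offset + within + 1
        offset += end - start + 1
    return None

def codon_number(cds_pos: int) -> int:
    """1-based amino-acid (codon) number containing a CDS position."""
    return (cds_pos - 1) // 3 + 1

exons = [(100, 120), (200, 229)]  # hypothetical coding exons: 21 nt + 30 nt
for strand in (1, -1):
    for pos in (105, 200, 150):
        cds = genomic_to_cds(pos, exons, strand)
        aa = codon_number(cds) if cds is not None else None
        print(strand, pos, cds, aa)
```

The strand handling is the main design point: on the minus strand the transcript is read from the highest coordinate downwards, so both the exon order and the offset within an exon are reversed.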
Current curations focus on solid tumor fusions, with the intention of curating blood cancer fusions once the majority of solid tumor fusions are represented in COSMIC. All manual curation is driven by the Cancer Gene Census, a list of genes (currently 522) with substantial literature describing their impact on cancer development, which diseases they cause and indications of the mechanisms involved. As the genomic approach to cancer genetics matures, a number of complementary genome-wide annotations are adding substantial context to our understanding of mutation burden, and we are expanding COSMIC to accommodate them. Copy number alterations are well documented in cancer, with genomic amplifications and deletions regularly driving oncogenesis. The two cancer genome consortia are currently releasing substantial copy number (CN) information in consistently formatted data sets, and we have incorporated this into COSMIC. Gene expression variants are also regularly used to identify oncogenic drivers, with significantly increased or decreased expression across sample cohorts identifying a driver signature. Again, the cancer genome consortia have standardized their data output, allowing us to incorporate these data into each COSMIC release. Extensive annotations across all the described data types are available in the current release (v70; August 2014) and will be extended with additional information in future releases.
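Calling a sample's expression of a gene "significantly increased or decreased" relative to its cohort can be sketched with per-gene z-scores. The threshold of |z| ≥ 2 and the use of the cohort's own mean and sample standard deviation are assumptions for illustration; the text does not specify COSMIC's exact method, and the expression values below are invented.

```python
from statistics import mean, stdev
from typing import Dict

def classify_expression(levels: Dict[str, float], threshold: float = 2.0) -> Dict[str, str]:
    """Label each sample 'over', 'under' or 'normal' for one gene via cohort z-scores."""
    values = list(levels.values())
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation across the cohort
    calls = {}
    for sample, value in levels.items():
        z = (value - mu) / sigma if sigma > 0 else 0.0
        if z >= threshold:
            calls[sample] = "over"
        elif z <= -threshold:
            calls[sample] = "under"
        else:
            calls[sample] = "normal"
    return calls

# Invented log-scale expression values for one gene across 12 tumors.
levels = {f"T{i:02d}": 5.0 for i in range(1, 11)}
levels["T11"] = 9.0
levels["T12"] = 1.0
calls = classify_expression(levels)
print({s: c for s, c in calls.items() if c != "normal"})
```

With a cohort-relative score, the same absolute expression level can be normal in one cancer type and aberrant in another, which is why the calls depend on how the cohort is defined.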

DATA ACCESS

The data in COSMIC are available in a number of ways. Most accessibly, a custom website (http://cancer.sanger.ac.uk) displays the information in a range of graphical and tabulated views, making it easy to explore. The data are also available via BioMart (8) for programmatic access or for downloading user-specified subsets. After registration, the entire COSMIC database can also be downloaded in several forms, including CSV and VCF files and a full export of the Oracle database.

Website

The COSMIC website is available at http://cancer.sanger.ac.uk. Designed to make entry to COSMIC easy via a single search box, the homepage (Figure 1) also provides access to a number of related resources. Three parallel websites allow components of the COSMIC system to be explored. ‘COSMIC cell line project’ exclusively displays the results of genomic analysis across a large set of common cancer cell lines, currently numbering 1015 and expected to grow toward 1500. ‘COSMIC whole genomes’ displays only the genome-wide tumor analyses integrated into COSMIC, providing a view across the breadth of cancer genome data without the specific biases introduced by literature curation. ‘COSMIC’ displays all the data brought into the system over the project's life, including cell lines, whole genomes and all genome-wide and gene-specific literature curations. Additionally, ‘COSMIC genome browser’ provides a genomic view across all COSMIC data types, aligned with many annotations from Ensembl, including noncoding RNAs (not yet included in the COSMIC website). ‘Census’ lists all the genes in the Cancer Gene Census, with details of disease causation and mutation mechanisms. Finally, ‘Drug Sensitivity’ links to a parallel resource in the team describing the relationship, across more than 700 cell lines, between original disease, mutant genotype and response to a range of anticancer drugs (9).

Figure 1.

COSMIC website front page. Search options are presented in the left-hand panel and descriptions of the content in the right-hand panel. The lower panel details related websites and other components of COSMIC. The dark bar at the top provides primary navigation to Help, Downloads and other descriptive content, as well as a Contact link to the COSMIC helpdesk.

Primary access to COSMIC is via the Search box in the left-hand panel, which accepts multiple parameters including gene names, disease descriptions, mutation syntax and stable COSMIC IDs. ‘Search via Cancer Browser’ opens a new page for navigating the mutation spectra behind thousands of cancer disease classifications. Navigation through the COSMIC website largely follows the selection of a single gene or disease, most easily typed into the single Search box. The Search responds with a list of items in COSMIC matching the input, offering matching gene names, mutations, samples, disease descriptions and even paper titles. Selecting any of these entities opens a new page showing an overview of its contents. If a gene is selected, the Gene Overview page describes basic gene details together with results of drug sensitivity screens and a COSMIC genome browser showing all mutations around the gene's genomic footprint. Clicking on the ‘Histogram’ icon leads to a display of the mutation distribution across the gene, detailing the positions and counts of point mutations, deletions, insertions, copy number aberrations and expression abnormalities (Figure 2). This is the key graphic for exploring gene-specific mutation data, and a number of selections are available to filter the data, including tissue/disease type, tissue source, mutation type, somatic status and gene coordinate. The graphic can be zoomed by clicking and dragging the mouse cursor across the region of interest.
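At its simplest, the gene histogram is a count of mutated samples per amino-acid position after the selected filters have been applied. The sketch below assumes each mutation record carries a sample ID, a tissue, a mutation type and an amino-acid position; the record fields, filter names and toy KIT-like positions are hypothetical, and the result is a table of counts rather than a graphic.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

@dataclass(frozen=True)
class Mutation:
    sample_id: str
    tissue: str
    mutation_type: str  # e.g. "substitution", "deletion", "insertion"
    aa_position: int

def position_histogram(mutations: Iterable[Mutation],
                       tissue: Optional[str] = None,
                       mutation_type: Optional[str] = None,
                       region: Optional[Tuple[int, int]] = None) -> Counter:
    """Count distinct mutated samples at each amino-acid position after filtering."""
    seen = set()
    counts: Counter = Counter()
    for m in mutations:
        if tissue is not None and m.tissue != tissue:
            continue
        if mutation_type is not None and m.mutation_type != mutation_type:
            continue
        if region is not None and not region[0] <= m.aa_position <= region[1]:
            continue
        key = (m.sample_id, m.aa_position)
        if key not in seen:  # count each sample once per position
            seen.add(key)
            counts[m.aa_position] += 1
    return counts

records = [
    Mutation("T1", "soft_tissue", "deletion", 557),
    Mutation("T2", "soft_tissue", "deletion", 557),
    Mutation("T3", "soft_tissue", "substitution", 559),
    Mutation("T4", "haematopoietic", "substitution", 816),
    Mutation("T5", "haematopoietic", "substitution", 816),
    Mutation("T5", "haematopoietic", "substitution", 816),  # duplicate report of one sample
]
for tissue in ("soft_tissue", "haematopoietic"):
    print(tissue, sorted(position_histogram(records, tissue=tissue).items()))
```

Counting distinct samples rather than raw records keeps a sample that is reported twice (for example in two publications) from inflating a peak.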
Other tabs on this page give further information (such as fusion mutations involving the selected gene) or offer further ways to explore the gene's data. Often the most useful of these is ‘Tissue’, which details tissue-specific mutation frequencies with counts and mini-histograms. Clicking on a count or a histogram block shows a tabulation of all the data behind that choice; clicking on a tissue name recalculates the table to show a much more detailed breakdown of the diseases under that tissue, allowing deeper exploration of mutation trends in that population. The Histogram and Tissue tabs are linked, so that a selection in one tab affects the display in the other: selecting ‘Soft Tissue: GastroIntestinal Stromal Tumor’ (GIST) in the Tissue tab recalculates the histogram to show only the mutation profile of that disease. Comparing the mutation distribution in KIT for GIST and ‘Haem: Mast Cell Neoplasm’ shows how useful such graphical exploration can be for examining mutation burden across cancer populations to identify novel targets and biomarkers (Figure 3).
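The Tissue tab's frequencies are mutated/tested ratios, and clicking a tissue regroups the same samples by disease. The sketch below assumes each sample record states its tissue, its disease, whether it came from a genome-wide screen, which genes a targeted study screened, and which genes were found mutated. Following the default assumption described earlier, a genome-wide sample counts as tested for every gene. The field names, function names and samples are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

@dataclass(frozen=True)
class Sample:
    tissue: str
    disease: str
    genome_wide: bool
    tested: FrozenSet[str] = frozenset()   # genes screened by a targeted study
    mutated: FrozenSet[str] = frozenset()  # genes reported as mutated

def was_tested(sample: Sample, gene: str) -> bool:
    # Default assumption: a genome-wide sample evaluated every gene.
    return sample.genome_wide or gene in sample.tested or gene in sample.mutated

def mutation_rates(samples: Iterable[Sample], gene: str,
                   tissue: Optional[str] = None) -> Dict[str, Tuple[int, int, float]]:
    """Map each group to (n_mutated, n_tested, rate).

    Without a tissue, groups are tissues; with a tissue, groups are its diseases.
    """
    mutated: Dict[str, int] = defaultdict(int)
    tested: Dict[str, int] = defaultdict(int)
    for s in samples:
        if tissue is not None and s.tissue != tissue:
            continue
        group = s.disease if tissue is not None else s.tissue
        if was_tested(s, gene):
            tested[group] += 1
            if gene in s.mutated:
                mutated[group] += 1
    return {g: (mutated[g], n, mutated[g] / n) for g, n in tested.items()}

cohort = [
    Sample("soft_tissue", "GIST", False, frozenset({"KIT"}), frozenset({"KIT"})),
    Sample("soft_tissue", "GIST", False, frozenset({"KIT"})),
    Sample("soft_tissue", "leiomyosarcoma", True, mutated=frozenset({"TP53"})),
    Sample("haematopoietic", "mast_cell_neoplasm", False, frozenset({"KIT"}), frozenset({"KIT"})),
    Sample("haematopoietic", "AML", False, frozenset({"FLT3"})),  # KIT never screened
]
print(mutation_rates(cohort, "KIT"))
print(mutation_rates(cohort, "KIT", tissue="soft_tissue"))
```

The genome-wide leiomyosarcoma sample enlarges the soft-tissue denominator even though nobody reported on KIT in it, while the targeted AML sample is left out entirely; this is exactly the effect of the coverage assumption on the displayed rates.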

Figure 2.

Full mutation distribution across all tissues and cancer diseases for the KIT gene. The X-axis describes the full length of the gene's coding sequence and is zoomable (click and drag) to resolve amino acid or nucleotide sequences. In each section of data, the vertical height is kept static while the scale changes according to the amount of data displayed. From the top down, the following are displayed: single base substitutions, gene sequence, PFAM representation of peptide structure, copy number gain (pink)/loss (blue), gene over- (red)/under- (green) expression, multinucleotide substitutions (‘complex’), simple insertions (red triangles) and deletions (blue triangles).

Figure 3.

Example use of filters on the Gene Analysis page to explore disease-specific mutation burden, comparing the mutation profile of the KIT gene in two different tumor types, (A) Hematological and lymphoid: mast cell neoplasm and (B) Soft tissue: gastrointestinal stromal tumor. Substantial differences are very clearly shown in the mutation peaks between the two diseases in this gene, suggesting molecular biomarkers which may be exploited diagnostically, or in pharmaceutical target validation.

If COSMIC is being navigated by disease, either the Search box or the ‘Cancer Browser’ (http://cancer.sanger.ac.uk/cosmic/browse/tissue) provides access to over 2500 cancer disease classifications. After a disease is selected, pressing ‘Go’ reveals multiple visualizations of the mutation patterns behind it. Initially, a simple histogram displays the most frequently mutated genes found in the disease (usually the top 20; Figure 4), with links to the gene histogram page for exploring each gene in more detail. ‘Mutation Matrix’ displays the 200 most mutated samples and 20 most mutated genes for this disease, showing all mutation types in one image.
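A matrix of this kind combines two rankings: genes by the number of mutated samples, and samples by how many of those genes they carry. The sketch below assumes mutation calls are available as (sample, gene) pairs and ranks samples within the selected genes; that ranking rule, the alphabetical tie-breaking, the function name and the toy calls are assumptions, with the 200/20 limits from the text exposed as parameters.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def mutation_matrix(calls: Iterable[Tuple[str, str]], top_samples: int = 200,
                    top_genes: int = 20) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (samples, genes, matrix) with matrix[i][j] = 1 if sample i is mutated in gene j."""
    pairs = set(calls)  # one entry per mutated (sample, gene) pair
    gene_counts = Counter(gene for _, gene in pairs)
    # Rank genes by number of mutated samples, breaking ties alphabetically.
    genes = sorted(gene_counts, key=lambda g: (-gene_counts[g], g))[:top_genes]
    chosen = set(genes)
    # Rank samples by how many of the selected genes they carry.
    sample_counts = Counter(s for s, g in pairs if g in chosen)
    samples = sorted(sample_counts, key=lambda s: (-sample_counts[s], s))[:top_samples]
    matrix = [[1 if (s, g) in pairs else 0 for g in genes] for s in samples]
    return samples, genes, matrix

calls = [("S1", "BRAF"), ("S2", "BRAF"), ("S3", "BRAF"), ("S1", "NRAS"),
         ("S4", "NRAS"), ("S2", "TP53"), ("S1", "BRAF")]  # invented calls
samples, genes, matrix = mutation_matrix(calls, top_samples=3, top_genes=2)
print(genes)
for sample, row in zip(samples, matrix):
    print(sample, row)
```

Selecting genes first and then ranking samples against only those genes keeps the displayed rows informative: a hypermutated sample with no hits in the chosen genes would otherwise take up a row of zeros.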
Other graphics and tabulations provide additional ways to explore or detail the mutation information behind each disease.

Figure 4.

Histograms from the Cancer Browser, showing the most frequently mutated genes in (A) skin melanoma (n = 9136), (B) the rare blood cancers hairy cell leukemia (n = 514) and Langerhans cell histiocytosis (n = 188), and (C) uveal melanoma (n = 714). Red bars represent the number of samples tested for each gene in the selected disease (‘n’), while blue bars represent the number of samples mutated; mutation rates are calculated simply as n_mutated/n_tested in each case. In this example, BRAF is a well-known driver of skin melanoma, mutated in 44% of tumors tested (A). However, BRAF mutations are found at a much higher rate in very restricted populations with rare blood cancers (B). The low mutation frequency of BRAF in uveal (Eye) melanoma (6%) suggests very different genetic mechanisms behind this disease.

Throughout the COSMIC website, graphic presentations are accompanied by data tabulations to give complete access to the underlying data. This makes the graphics and interpretations on the website transparent, since each presentation can be evaluated independently. The display of data source references (largely PubMed IDs) additionally allows independent evaluation of each publication's curation. Each tabulation is presented in the same format, with sortable columns and exportable contents. Clicking on a column title sorts the entire table on that column in ascending order; clicking on the title again re-sorts it in descending order. This also works for the histogram blocks in the Tissue tab on the Gene Analysis page. Each table is also searchable, with a text box in the top right-hand corner in which to enter search terms, to which the table responds dynamically. Links are also available to download the entire contents of each table in Excel-compatible CSV format. While the COSMIC website focuses on a gene-centric presentation of cancer mutation data, cancer genomes reveal many more noncoding mutations and structural breakpoints than coding events.
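The table behaviour described above (first click sorts ascending, a second click on the same column sorts descending, free-text search, CSV export) can be modelled in a few lines. This is a hypothetical sketch of the interaction, not the website's code; rows are dictionaries, and the class name, column names and values are invented.

```python
import csv
import io
from typing import Any, Dict, List, Optional

class Tabulation:
    """Toy model of a sortable, searchable, exportable results table."""

    def __init__(self, rows: List[Dict[str, Any]]):
        self.rows = rows
        self.sort_column: Optional[str] = None
        self.descending = False

    def click_header(self, column: str) -> List[Dict[str, Any]]:
        # First click on a column sorts ascending; clicking it again flips the order.
        if column == self.sort_column:
            self.descending = not self.descending
        else:
            self.sort_column, self.descending = column, False
        return sorted(self.rows, key=lambda r: r[column], reverse=self.descending)

    def search(self, term: str) -> List[Dict[str, Any]]:
        # Case-insensitive match against every cell of each row.
        term = term.lower()
        return [r for r in self.rows if any(term in str(v).lower() for v in r.values())]

    def to_csv(self) -> str:
        # Export all rows with a header line, as an Excel-compatible CSV string.
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=list(self.rows[0]))
        writer.writeheader()
        writer.writerows(self.rows)
        return buffer.getvalue()

table = Tabulation([
    {"gene": "GENE_A", "mutated": 4, "tested": 9},
    {"gene": "GENE_B", "mutated": 2, "tested": 9},
    {"gene": "GENE_C", "mutated": 1, "tested": 5},
])
print([r["gene"] for r in table.click_header("mutated")])  # ascending
print([r["gene"] for r in table.click_header("mutated")])  # descending
print(table.search("_b"))
print(table.to_csv())
```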
To make this information explorable from a genomic perspective, a COSMIC genome browser is now available. Initially conceived to add genomic context to the Gene pages of the COSMIC website, it can also be used independently to explore all data types in COSMIC by typing a gene name or genomic coordinate in the search bar at the top of the page (http://cancer.sanger.ac.uk/jbrowse?data=data/json/cosmic&loc=3:10183532..10191649). Because the page can become very cluttered, not all information is shown by default; instead, the data are divided into ‘tracks’ that can be turned on and off independently using the toggle buttons listed on the left-hand side. Most significantly, this browser makes it possible to examine the coincidence of noncoding mutations or breakpoints with noncoding RNAs or other genomic annotations from the Ensembl database.
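The example browser URL encodes a location in its loc parameter as chromosome:start..end. A small helper for parsing and rebuilding such links might look like the sketch below. It assumes only the two parameters visible in that URL; the helper names are hypothetical, the accepted chromosome names are an assumption, and nothing here implies the historical URL is still served.

```python
import re
from typing import Tuple
from urllib.parse import urlencode

# Locus strings such as "3:10183532..10191649" (an optional "chr" prefix is tolerated).
LOCUS = re.compile(r"^(?:chr)?([0-9]+|X|Y|MT):(\d+)\.\.(\d+)$")

def parse_locus(loc: str) -> Tuple[str, int, int]:
    """Split 'chrom:start..end' into its parts, rejecting malformed or reversed ranges."""
    match = LOCUS.match(loc)
    if match is None:
        raise ValueError(f"not a chrom:start..end locus: {loc!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("start must not exceed end")
    return chrom, start, end

def browser_url(chrom: str, start: int, end: int,
                base: str = "http://cancer.sanger.ac.uk/jbrowse") -> str:
    """Rebuild a link in the format shown in the text."""
    query = urlencode({"data": "data/json/cosmic", "loc": f"{chrom}:{start}..{end}"}, safe="/:")
    return f"{base}?{query}"

chrom, start, end = parse_locus("3:10183532..10191649")
print(chrom, start, end, "span:", end - start + 1, "bp")
print(browser_url(chrom, start, end))
```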

Downloads

While the COSMIC website makes the entire database easy to navigate through graphical presentations, it cannot easily answer complex bioinformatic questions. For deeper mining of the information in COSMIC, the database can be downloaded in multiple formats (http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/download); registration is required. For fairly simple programmatic interpretation, all the information is available in preformatted CSV datasheets, some running to millions of rows. For point mutation data, VCF format is also available. For more complex data integrations, the entire database is available in its native Oracle format, as either an ‘exp’ dump or a ‘datapump’ export. In addition to the download files, a BioMart instance, ‘COSMICMart’ (http://cancer.sanger.ac.uk/biomart/martview), allows direct programmatic access alongside data filtering and mining tools.
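For files running to millions of rows, a streaming reader avoids loading everything into memory. The sketch below parses the eight fixed VCF columns and splits the semicolon-separated INFO field into key=value pairs (keys without a value are flags), as the VCF specification defines. The INFO keys (GENE, CNT, SNP) and the records in the example are assumptions for illustration; check the header lines of the actual download for the keys it defines.

```python
import io
from collections import Counter
from typing import Dict, Iterator, TextIO

def read_vcf(handle: TextIO) -> Iterator[Dict[str, object]]:
    """Yield one dict per data line; '#' header lines and blank lines are skipped."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, vid, ref, alt, _qual, _filt, info = line.rstrip("\n").split("\t")[:8]
        fields: Dict[str, object] = {}
        for item in info.split(";"):
            if not item or item == ".":
                continue
            key, sep, value = item.partition("=")
            fields[key] = value if sep else True  # flags carry no '=value'
        yield {"chrom": chrom, "pos": int(pos), "id": vid, "ref": ref, "alt": alt, "info": fields}

VCF_TEXT = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "7\t1000\tMUT1\tA\tT\t.\t.\tGENE=GENE_A;CNT=3\n"
    "12\t2000\tMUT2\tC\tG\t.\t.\tGENE=GENE_B;CNT=1;SNP\n"
    "7\t1500\tMUT3\tG\tA\t.\t.\tGENE=GENE_A;CNT=2\n"
)  # invented records
per_gene = Counter(rec["info"]["GENE"] for rec in read_vcf(io.StringIO(VCF_TEXT)))
print(per_gene)
```

For a real compressed download, the same generator can be fed a text handle such as `gzip.open(path, "rt")` from the standard library, where `path` is whatever file you downloaded.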

FUTURE WORK

As genomic resequencing of large tumor cohorts matures, huge amounts of data are being produced rapidly, presenting many new opportunities to discover oncogenic alleles across the human genome. Combining cancer genome annotations from several data types with extensive manual curation has driven a large growth of the COSMIC database over the last 4 years (Figure 5). The download size of the COSMIC database was 191 Mb in November 2010 (v50), but had grown to 11.74 Gb by the v70 release (Aug 2014), requiring substantially more download infrastructure and time. As the output of cancer genome publications and consortia expands, this growth trend is expected to continue.
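The two download sizes quoted above can be turned into an overall fold change and an equivalent constant annual growth factor. This is a back-of-the-envelope calculation under two stated assumptions: 1 Gb is taken as 1024 Mb, and November 2010 to August 2014 is counted as 45 months.

```python
def growth(start_mb: float, end_mb: float, months: float):
    """Return (overall fold change, equivalent constant annual growth factor)."""
    fold = end_mb / start_mb
    annual = fold ** (12 / months)  # compound growth spread evenly over the period
    return fold, annual

fold, annual = growth(191, 11.74 * 1024, 45)
print(f"{fold:.1f}-fold overall, about {annual:.2f}x per year")
```

Under these assumptions the database grew roughly 63-fold, which corresponds to approximately tripling in size every year; taking 1 Gb as 1000 Mb instead changes the result only slightly.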

Figure 5.

Growth of COSMIC database size (in Mb, of the Oracle ‘exp’ export file) between November 2010 and August 2014, emphasizing the rapid expansion as COSMIC reflects the data generated by cancer genome studies.

Data curation remains the key ingredient of COSMIC, especially manual interpretation of the scientific literature; no other database addresses the breadth of the cancer genetics literature to the same extent. With a strong commitment to manual curation, this work will be continued and expanded, focusing primarily on coverage of novel cancer genes. Genes already released in COSMIC will also be revisited to ensure that their mutation profiles reflect up-to-date knowledge. Cancer genome curation will also be maintained, blending semi-automated interpretation of published data sets with the output of major cancer genome consortia. While current data sources include TCGA and ICGC, similar new data sources are regularly sought. Consortium project goals suggest the future release of increasingly large data sets, and systems have been built to ensure that COSMIC can handle this output. Between these two extremes, broad targeted gene screens, testing several hundred genes across large tumor cohorts, are an emerging feature of cancer genetics; this information will be curated through automated processes, as the data sets are often too large for effective manual curation. In addition to showing point mutations in coding genes, the COSMIC genome browser allows the exploration of noncoding variants and their presence around noncoding genomic annotations, particularly noncoding RNAs. Automated curation systems will be adapted to define the effect of these mutations on ncRNAs, making these interpretations available in COSMIC. While additional data types such as copy number and gene expression are already addressed in COSMIC, we will continue to look for better ways of annotating these variants, since their interpretation is a fast-changing field.
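Relating noncoding variants to noncoding RNA annotations, as planned above, starts with an interval-overlap query. The sketch below assumes the ncRNA features on one chromosome are non-overlapping, 1-based inclusive intervals, so a binary search over their sorted start positions is enough; annotation sets with overlapping features would need an interval tree instead. Feature names and coordinates are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Feature = Tuple[int, int, str]  # (start, end, name), 1-based inclusive

def build_index(features: List[Feature]) -> Tuple[List[int], List[Feature]]:
    """Sort features by start and keep the starts for binary search."""
    ordered = sorted(features)
    return [f[0] for f in ordered], ordered

def feature_at(pos: int, index: Tuple[List[int], List[Feature]]) -> Optional[str]:
    """Name of the feature containing pos, or None if pos falls between features."""
    starts, ordered = index
    i = bisect_right(starts, pos) - 1  # last feature starting at or before pos
    if i >= 0 and ordered[i][1] >= pos:
        return ordered[i][2]
    return None

index = build_index([(500, 580, "ncRNA_B"), (100, 160, "ncRNA_A"), (900, 1000, "ncRNA_C")])
for variant_pos in (120, 300, 580, 901):
    print(variant_pos, feature_at(variant_pos, index))
```

Each lookup costs O(log n) after a one-off sort, which matters when millions of noncoding variants are tested against every annotation track.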
DNA methylation data are also being released by cancer genome consortia, and ways of interpreting them are being explored for inclusion in a future COSMIC release. Cancer genomes are remarkably noisy sources of data, often producing hundreds or thousands of point mutations, copy number aberrations and gene expression variants per tumor, most of which have no effect on the development of disease. We are adapting our curation processes to reduce this noise and highlight high-value information. While all published information is included in the COSMIC database, SNPs and germline mutations are tagged, allowing them to be removed from the COSMIC website. Samples with over 20 000 point mutations, none of which have been validated, are excluded from curation because their noise vastly outweighs their signal. Copy number annotations are split into numeric and descriptive data sets: the former give full details of absolute copy number at each locus, while the latter simply annotate regions of ‘gain’ and ‘loss’. The numeric data sets are considered high value and are used by default to drive the copy number data on the website; the descriptive data are excluded by default, but an opt-in visualization toggle is offered. These noise reduction and quality control systems will be adapted and enhanced to maximize the utility of the data in COSMIC. Identifying the genes and mutations driving cancer is a goal of most COSMIC users, and several analytic methods that might allow their systematic annotation are beginning to mature, for instance the OncoDrive suite (10), MutSigCV (11) and FATHMM (12). These algorithms are being evaluated to see how they might be used to create a secondary annotation layer across the COSMIC database, significantly improving the identification of functional driver mutations in oncogenesis.
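The quality-control rules in this paragraph translate naturally into simple filters. In the sketch below, the 20 000-mutation threshold, the hiding of SNP/germline-tagged calls and the default exclusion of descriptive copy number follow the text; the record classes, field names and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

HYPERMUTATION_LIMIT = 20_000  # from the text: over 20 000 point mutations, none validated

@dataclass
class SampleCalls:
    sample_id: str
    n_point_mutations: int
    n_validated: int

@dataclass
class Variant:
    sample_id: str
    is_snp: bool
    is_germline: bool

@dataclass
class CopyNumberSegment:
    sample_id: str
    kind: str  # "numeric" (absolute copy number) or "descriptive" (gain/loss only)

def curatable(sample: SampleCalls) -> bool:
    # Exclude samples with a very large number of calls, none of them validated.
    return not (sample.n_point_mutations > HYPERMUTATION_LIMIT and sample.n_validated == 0)

def website_variants(variants: List[Variant]) -> List[Variant]:
    # SNP- and germline-tagged calls stay in the database but are hidden on the website.
    return [v for v in variants if not (v.is_snp or v.is_germline)]

def website_copy_number(segments: List[CopyNumberSegment],
                        show_descriptive: bool = False) -> List[CopyNumberSegment]:
    # Numeric data are shown by default; descriptive data only when the user opts in.
    return [s for s in segments if s.kind == "numeric" or show_descriptive]

print(curatable(SampleCalls("A", 25_000, 0)), curatable(SampleCalls("B", 25_000, 5)))
```

Keeping the tagged records in the database and filtering only at display time means a user can still retrieve the full data from the downloads.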
COSMIC is the largest cancer genetics database in the world, and its continued growth and functional annotation will significantly enhance its support for cancer genetics research and druggable target identification as genetics increasingly underpins clinical treatment.

REFERENCES (12 in total)

1. International network of cancer genome projects.

Authors: Thomas J Hudson; Warwick Anderson; Axel Artez; Anna D Barker; Cindy Bell; Rosa R Bernabé; M K Bhan; Fabien Calvo; Iiro Eerola; Daniela S Gerhard; Alan Guttmacher; Mark Guyer; Fiona M Hemsley; Jennifer L Jennings; David Kerr; Peter Klatt; Patrik Kolar; Jun Kusada; David P Lane; Frank Laplace; Lu Youyong; Gerd Nettekoven; Brad Ozenberger; Jane Peterson; T S Rao; Jacques Remacle; Alan J Schafer; Tatsuhiro Shibata; Michael R Stratton; Joseph G Vockley; Koichi Watanabe; Huanming Yang; Matthew M F Yuen; Bartha M Knoppers; Martin Bobrow; Anne Cambon-Thomsen; Lynn G Dressler; Stephanie O M Dyke; Yann Joly; Kazuto Kato; Karen L Kennedy; Pilar Nicolás; Michael J Parker; Emmanuelle Rial-Sebbag; Carlos M Romeo-Casabona; Kenna M Shaw; Susan Wallace; Georgia L Wiesner; Nikolajs Zeps; Peter Lichter; Andrew V Biankin; Christian Chabannon; Lynda Chin; Bruno Clément; Enrique de Alava; Françoise Degos; Martin L Ferguson; Peter Geary; D Neil Hayes; Thomas J Hudson; Amber L Johns; Arek Kasprzyk; Hidewaki Nakagawa; Robert Penny; Miguel A Piris; Rajiv Sarin; Aldo Scarpa; Tatsuhiro Shibata; Marc van de Vijver; P Andrew Futreal; Hiroyuki Aburatani; Mónica Bayés; David D L Botwell; Peter J Campbell; Xavier Estivill; Daniela S Gerhard; Sean M Grimmond; Ivo Gut; Martin Hirst; Carlos López-Otín; Partha Majumder; Marco Marra; John D McPherson; Hidewaki Nakagawa; Zemin Ning; Xose S Puente; Yijun Ruan; Tatsuhiro Shibata; Michael R Stratton; Hendrik G Stunnenberg; Harold Swerdlow; Victor E Velculescu; Richard K Wilson; Hong H Xue; Liu Yang; Paul T Spellman; Gary D Bader; Paul C Boutros; Peter J Campbell; Paul Flicek; Gad Getz; Roderic Guigó; Guangwu Guo; David Haussler; Simon Heath; Tim J Hubbard; Tao Jiang; Steven M Jones; Qibin Li; Nuria López-Bigas; Ruibang Luo; Lakshmi Muthuswamy; B F Francis Ouellette; John V Pearson; Xose S Puente; Victor Quesada; Benjamin J Raphael; Chris Sander; Tatsuhiro Shibata; Terence P Speed; Lincoln D Stein; Joshua M Stuart; Jon W Teague; Yasushi Totoki; 
Tatsuhiko Tsunoda; Alfonso Valencia; David A Wheeler; Honglong Wu; Shancen Zhao; Guangyu Zhou; Lincoln D Stein; Roderic Guigó; Tim J Hubbard; Yann Joly; Steven M Jones; Arek Kasprzyk; Mark Lathrop; Nuria López-Bigas; B F Francis Ouellette; Paul T Spellman; Jon W Teague; Gilles Thomas; Alfonso Valencia; Teruhiko Yoshida; Karen L Kennedy; Myles Axton; Stephanie O M Dyke; P Andrew Futreal; Daniela S Gerhard; Chris Gunter; Mark Guyer; Thomas J Hudson; John D McPherson; Linda J Miller; Brad Ozenberger; Kenna M Shaw; Arek Kasprzyk; Lincoln D Stein; Junjun Zhang; Syed A Haider; Jianxin Wang; Christina K Yung; Anthony Cros; Anthony Cross; Yong Liang; Saravanamuttu Gnaneshan; Jonathan Guberman; Jack Hsu; Martin Bobrow; Don R C Chalmers; Karl W Hasel; Yann Joly; Terry S H Kaan; Karen L Kennedy; Bartha M Knoppers; William W Lowrance; Tohru Masui; Pilar Nicolás; Emmanuelle Rial-Sebbag; Laura Lyman Rodriguez; Catherine Vergely; Teruhiko Yoshida; Sean M Grimmond; Andrew V Biankin; David D L Bowtell; Nicole Cloonan; Anna deFazio; James R Eshleman; Dariush Etemadmoghadam; Brooke B Gardiner; Brooke A Gardiner; James G Kench; Aldo Scarpa; Robert L Sutherland; Margaret A Tempero; Nicola J Waddell; Peter J Wilson; John D McPherson; Steve Gallinger; Ming-Sound Tsao; Patricia A Shaw; Gloria M Petersen; Debabrata Mukhopadhyay; Lynda Chin; Ronald A DePinho; Sarah Thayer; Lakshmi Muthuswamy; Kamran Shazand; Timothy Beck; Michelle Sam; Lee Timms; Vanessa Ballin; Youyong Lu; Jiafu Ji; Xiuqing Zhang; Feng Chen; Xueda Hu; Guangyu Zhou; Qi Yang; Geng Tian; Lianhai Zhang; Xiaofang Xing; Xianghong Li; Zhenggang Zhu; Yingyan Yu; Jun Yu; Huanming Yang; Mark Lathrop; Jörg Tost; Paul Brennan; Ivana Holcatova; David Zaridze; Alvis Brazma; Lars Egevard; Egor Prokhortchouk; Rosamonde Elizabeth Banks; Mathias Uhlén; Anne Cambon-Thomsen; Juris Viksna; Fredrik Ponten; Konstantin Skryabin; Michael R Stratton; P Andrew Futreal; Ewan Birney; Ake Borg; Anne-Lise Børresen-Dale; Carlos Caldas; John A Foekens; 
Sancha Martin; Jorge S Reis-Filho; Andrea L Richardson; Christos Sotiriou; Hendrik G Stunnenberg; Giles Thoms; Marc van de Vijver; Laura van't Veer; Fabien Calvo; Daniel Birnbaum; Hélène Blanche; Pascal Boucher; Sandrine Boyault; Christian Chabannon; Ivo Gut; Jocelyne D Masson-Jacquemier; Mark Lathrop; Iris Pauporté; Xavier Pivot; Anne Vincent-Salomon; Eric Tabone; Charles Theillet; Gilles Thomas; Jörg Tost; Isabelle Treilleux; Fabien Calvo; Paulette Bioulac-Sage; Bruno Clément; Thomas Decaens; Françoise Degos; Dominique Franco; Ivo Gut; Marta Gut; Simon Heath; Mark Lathrop; Didier Samuel; Gilles Thomas; Jessica Zucman-Rossi; Peter Lichter; Roland Eils; Benedikt Brors; Jan O Korbel; Andrey Korshunov; Pablo Landgraf; Hans Lehrach; Stefan Pfister; Bernhard Radlwimmer; Guido Reifenberger; Michael D Taylor; Christof von Kalle; Partha P Majumder; Rajiv Sarin; T S Rao; M K Bhan; Aldo Scarpa; Paolo Pederzoli; Rita A Lawlor; Massimo Delledonne; Alberto Bardelli; Andrew V Biankin; Sean M Grimmond; Thomas Gress; David Klimstra; Giuseppe Zamboni; Tatsuhiro Shibata; Yusuke Nakamura; Hidewaki Nakagawa; Jun Kusada; Tatsuhiko Tsunoda; Satoru Miyano; Hiroyuki Aburatani; Kazuto Kato; Akihiro Fujimoto; Teruhiko Yoshida; Elias Campo; Carlos López-Otín; Xavier Estivill; Roderic Guigó; Silvia de Sanjosé; Miguel A Piris; Emili Montserrat; Marcos González-Díaz; Xose S Puente; Pedro Jares; Alfonso Valencia; Heinz Himmelbauer; Heinz Himmelbaue; Victor Quesada; Silvia Bea; Michael R Stratton; P Andrew Futreal; Peter J Campbell; Anne Vincent-Salomon; Andrea L Richardson; Jorge S Reis-Filho; Marc van de Vijver; Gilles Thomas; Jocelyne D Masson-Jacquemier; Samuel Aparicio; Ake Borg; Anne-Lise Børresen-Dale; Carlos Caldas; John A Foekens; Hendrik G Stunnenberg; Laura van't Veer; Douglas F Easton; Paul T Spellman; Sancha Martin; Anna D Barker; Lynda Chin; Francis S Collins; Carolyn C Compton; Martin L Ferguson; Daniela S Gerhard; Gad Getz; Chris Gunter; Alan Guttmacher; Mark Guyer; D Neil Hayes; 
Eric S Lander; Brad Ozenberger; Robert Penny; Jane Peterson; Chris Sander; Kenna M Shaw; Terence P Speed; Paul T Spellman; Joseph G Vockley; David A Wheeler; Richard K Wilson; Thomas J Hudson; Lynda Chin; Bartha M Knoppers; Eric S Lander; Peter Lichter; Lincoln D Stein; Michael R Stratton; Warwick Anderson; Anna D Barker; Cindy Bell; Martin Bobrow; Wylie Burke; Francis S Collins; Carolyn C Compton; Ronald A DePinho; Douglas F Easton; P Andrew Futreal; Daniela S Gerhard; Anthony R Green; Mark Guyer; Stanley R Hamilton; Tim J Hubbard; Olli P Kallioniemi; Karen L Kennedy; Timothy J Ley; Edison T Liu; Youyong Lu; Partha Majumder; Marco Marra; Brad Ozenberger; Jane Peterson; Alan J Schafer; Paul T Spellman; Hendrik G Stunnenberg; Brandon J Wainwright; Richard K Wilson; Huanming Yang
Journal: Nature Date: 2010-04-15 Impact factor: 49.962

2. A census of human cancer genes. (Review)

Authors: P Andrew Futreal; Lachlan Coin; Mhairi Marshall; Thomas Down; Timothy Hubbard; Richard Wooster; Nazneen Rahman; Michael R Stratton
Journal: Nat Rev Cancer Date: 2004-03 Impact factor: 60.716

3. Mapping the cancer genome. Pinpointing the genes involved in cancer will help chart a new course across the complex landscape of human malignancies.

Authors: Francis S Collins; Anna D Barker
Journal: Sci Am Date: 2007-03 Impact factor: 2.142

4. Discrepancies in cancer genomic sequencing highlight opportunities for driver mutation discovery.

Authors: Yaoyong Li; Eleanor W Trotter; Andrew M Hudson; Tim Yates; Shameem Fawdar; Phil Chapman; Paul Lorigan; Andrew Biankin; Crispin J Miller; John Brognard
Journal: Cancer Res Date: 2014-09-25 Impact factor: 12.701

5. Predicting the functional consequences of cancer-associated amino acid substitutions.

Authors: Hashem A Shihab; Julian Gough; David N Cooper; Ian N M Day; Tom R Gaunt
Journal: Bioinformatics Date: 2013-04-25 Impact factor: 6.937

6. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor.

Authors: William McLaren; Bethan Pritchard; Daniel Rios; Yuan Chen; Paul Flicek; Fiona Cunningham
Journal: Bioinformatics Date: 2010-06-18 Impact factor: 6.937

7. Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells.

Authors: Wanjuan Yang; Jorge Soares; Patricia Greninger; Elena J Edelman; Howard Lightfoot; Simon Forbes; Nidhi Bindal; Dave Beare; James A Smith; I Richard Thompson; Sridhar Ramaswamy; P Andrew Futreal; Daniel A Haber; Michael R Stratton; Cyril Benes; Ultan McDermott; Mathew J Garnett
Journal: Nucleic Acids Res Date: 2012-11-23 Impact factor: 16.971

8. Functional impact bias reveals cancer drivers.

Authors: Abel Gonzalez-Perez; Nuria Lopez-Bigas
Journal: Nucleic Acids Res Date: 2012-08-16 Impact factor: 16.971

9. The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website.

Authors: S Bamford; E Dawson; S Forbes; J Clements; R Pettett; A Dogan; A Flanagan; J Teague; P A Futreal; M R Stratton; R Wooster
Journal: Br J Cancer Date: 2004-07-19 Impact factor: 7.640

10. Mutational heterogeneity in cancer and the search for new cancer-associated genes.

Authors: Michael S Lawrence; Petar Stojanov; Paz Polak; Gregory V Kryukov; Kristian Cibulskis; Andrey Sivachenko; Scott L Carter; Chip Stewart; Craig H Mermel; Steven A Roberts; Adam Kiezun; Peter S Hammerman; Aaron McKenna; Yotam Drier; Lihua Zou; Alex H Ramos; Trevor J Pugh; Nicolas Stransky; Elena Helman; Jaegil Kim; Carrie Sougnez; Lauren Ambrogio; Elizabeth Nickerson; Erica Shefler; Maria L Cortés; Daniel Auclair; Gordon Saksena; Douglas Voet; Michael Noble; Daniel DiCara; Pei Lin; Lee Lichtenstein; David I Heiman; Timothy Fennell; Marcin Imielinski; Bryan Hernandez; Eran Hodis; Sylvan Baca; Austin M Dulak; Jens Lohr; Dan-Avi Landau; Catherine J Wu; Jorge Melendez-Zajgla; Alfredo Hidalgo-Miranda; Amnon Koren; Steven A McCarroll; Jaume Mora; Brian Crompton; Robert Onofrio; Melissa Parkin; Wendy Winckler; Kristin Ardlie; Stacey B Gabriel; Charles W M Roberts; Jaclyn A Biegel; Kimberly Stegmaier; Adam J Bass; Levi A Garraway; Matthew Meyerson; Todd R Golub; Dmitry A Gordenin; Shamil Sunyaev; Eric S Lander; Gad Getz
Journal: Nature Date: 2013-06-16 Impact factor: 49.962

1172 in total

1. Next-generation sequencing-based clinical sequencing: toward precision medicine in solid tumors. (Review)

Authors: Toshifumi Wakai; Pankaj Prasoon; Yuki Hirose; Yoshifumi Shimada; Hiroshi Ichikawa; Masayuki Nagahashi
Journal: Int J Clin Oncol Date: 2018-12-04 Impact factor: 3.402

2. Pathogenetic Analysis of Sinonasal Teratocarcinosarcomas Reveal Actionable β-catenin Overexpression and a β-catenin Mutation.

Authors: Andrew C Birkeland; Sarah J Burgin; Megan Yanik; Megan V Scott; Carol R Bradford; Jonathan B McHugh; Scott A McLean; Stephen E Sullivan; Jacques E Nor; Erin L McKean; J Chad Brenner
Journal: J Neurol Surg B Skull Base Date: 2017-03-27

3. Multi-omics analysis of primary glioblastoma cell lines shows recapitulation of pivotal molecular features of parental tumors.

Authors: Shai Rosenberg; Maïté Verreault; Charlotte Schmitt; Justine Guegan; Jeremy Guehennec; Camille Levasseur; Yannick Marie; Franck Bielle; Karima Mokhtari; Khê Hoang-Xuan; Keith Ligon; Marc Sanson; Jean-Yves Delattre; Ahmed Idbaih
Journal: Neuro Oncol Date: 2017-02-01 Impact factor: 12.300

4. Low-Coverage Exome Sequencing Screen in Formalin-Fixed Paraffin-Embedded Tumors Reveals Evidence of Exposure to Carcinogenic Aristolochic Acid.

Authors: Xavier Castells; Sandra Karanović; Maude Ardin; Karla Tomić; Evanguelos Xylinas; Geoffroy Durand; Stephanie Villar; Nathalie Forey; Florence Le Calvez-Kelm; Catherine Voegele; Krešimir Karlović; Maja Mišić; Damir Dittrich; Igor Dolgalev; James McKay; Shahrokh F Shariat; Viktoria S Sidorenko; Andrea Fernandes; Adriana Heguy; Kathleen G Dickman; Magali Olivier; Arthur P Grollman; Bojan Jelaković; Jiri Zavadil
Journal: Cancer Epidemiol Biomarkers Prev Date: 2015-09-17 Impact factor: 4.254

5. IW-Scoring: an Integrative Weighted Scoring framework for annotating and prioritizing genetic variations in the noncoding genome.

Authors: Jun Wang; Abu Z Dayem Ullah; Claude Chelala
Journal: Nucleic Acids Res Date: 2018-05-04 Impact factor: 16.971

6. Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics.

Authors: Eric W Deutsch; Zhi Sun; David S Campbell; Pierre-Alain Binz; Terry Farrah; David Shteynberg; Luis Mendoza; Gilbert S Omenn; Robert L Moritz
Journal: J Proteome Res Date: 2016-09-12 Impact factor: 4.466

7. RNA2DNAlign: nucleotide resolution allele asymmetries through quantitative assessment of RNA and DNA paired sequencing data.

Authors: Mercedeh Movassagh; Nawaf Alomran; Prakriti Mudvari; Merve Dede; Cem Dede; Kamran Kowsari; Paula Restrepo; Edmund Cauley; Sonali Bahl; Muzi Li; Wesley Waterhouse; Krasimira Tsaneva-Atanasova; Nathan Edwards; Anelia Horvath
Journal: Nucleic Acids Res Date: 2016-08-30 Impact factor: 16.971