The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of more than 25,000 cancer genomes at the genomic, epigenomic and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies.
The genomes of all cancers accumulate somatic mutations1. These include nucleotide substitutions, small insertions and deletions, chromosomal rearrangements and copy number changes that can affect protein-coding or regulatory components of genes. In addition, cancer genomes usually acquire somatic epigenetic “marks” compared to non-neoplastic tissues from the same organ, notably changes in the methylation status of cytosines at CpG dinucleotides.
A subset of the somatic mutations in cancer cells confers oncogenic properties such as growth advantage, tissue invasion and metastasis, angiogenesis, and evasion of apoptosis2. These are termed “driver” mutations. The identification of driver mutations will provide insights into cancer biology and highlight novel drug targets and diagnostic tests. Knowledge of cancer mutations has already led to the development of specific therapies, such as trastuzumab for HER2/neu positive breast cancers3 and imatinib, which targets BCR-ABL tyrosine kinase for the treatment of chronic myeloid leukemia4,5. The remaining somatic mutations in cancer genomes that do not contribute to cancer development are called “passengers”. These mutations provide insights into the DNA damage and repair processes that have been operative during cancer development, including exogenous environmental exposures6,7. In most cancer genomes, it is anticipated that passenger mutations, as well as germline variants not yet catalogued in polymorphism databases, will substantially outnumber drivers.
Large-scale analyses of genes in tumors have revealed that the mutation load in cancer is abundant and heterogeneous8-13. Preliminary surveys of cancer genomes have already demonstrated their relevance in identifying new cancer genes that constitute potential therapeutic targets for several types of cancer, including PIK3CA (ref. 14), BRAF (ref. 15), NF1 (ref. 10), KDR (ref. 10), PIK3R1 (ref. 9), and histone methyltransferases and demethylases16,17. These projects have also yielded correlations between cancer mutations and prognosis, such as IDH1 and IDH2 mutations in several types of gliomas13,18. Advances in massively parallel sequencing technology have enabled sequencing of entire cancer genomes19-22.
Following the launch of comprehensive cancer genome projects in the United Kingdom (Cancer Genome Project)23 and the United States (The Cancer Genome Atlas)24, cancer genome scientists and funding agencies met in Toronto (Canada) in October 2007 to discuss the opportunity to launch an international consortium. Key reasons for its formation were: (1) the scope is huge; (2) independent cancer genome initiatives could lead to duplication of effort or incomplete studies; (3) lack of standardization across studies could diminish the opportunities to merge and compare datasets; (4) the spectrum of many cancers is known to vary across the world; (5) an international consortium will accelerate the dissemination of datasets and analytical methods into the user community.
Working groups were created to develop strategies and policies that would form the basis for participation in the ICGC. The goals of the Consortium (Box 1) were released in April 2008 (http://www.icgc.org/files/ICGC_April_29_2008.pdf). Since then, working groups and initial member projects have further refined the policies and plans for international collaboration.
Bioethical Framework
ICGC members agreed to a core set of bioethical elements for consent as a precondition of membership (Box 2). The Ethics and Policy Committee has created patient consent templates for both prospective collection and retrospective use of samples and data for ICGC projects. Differences in project-specific requirements and national legal frameworks may require some local amendments, while still reflecting the core principles of ICGC.
The ICGC recognizes a delicate balance between protecting participants' personal data and sharing these data to accelerate cancer research. Data access policies have been drawn up that are respectful of the rights of the donors, while allowing ICGC data derived from samples to be shared ethically among a wide research community. Two levels of access have been implemented. “Open access” datasets, which contain only data that cannot be used to identify individuals, are publicly available. These include data such as gender, age range, histology, normalized gene expression values, epigenetic datasets, somatic mutations, summaries of germline data, and study protocols. “Controlled access” datasets contain germline genomic data and detailed clinical information that are associated with a unique individual whose personal identifiers have been removed. To access controlled datasets, researchers must seek authorization by contacting the Data Access Compliance Office (DACO) (http://www.icgc.org/daco). An independent International Data Access Committee (IDAC) oversees the work of the DACO and provides assistance with resolving issues that arise.
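The two access tiers described above can be pictured as a simple partition of each record's fields. The following is only an illustrative sketch: the field names and the `split_by_access_tier` helper are hypothetical and do not reflect the actual ICGC schema. The open-tier list mirrors the examples given in the text, and any field not explicitly listed is treated as controlled, which is an assumption chosen here as the cautious default.

```python
from typing import Any, Dict, Tuple

# Hypothetical field names; the open tier mirrors the examples named in the text.
OPEN_ACCESS_FIELDS = {
    "gender",
    "age_range",
    "histology",
    "normalized_expression",
    "epigenetic_data",
    "somatic_mutations",
    "germline_summary",
    "study_protocol",
}


def split_by_access_tier(record: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a de-identified record into (open, controlled) parts.

    Fields not listed as open default to the controlled tier (assumption).
    """
    open_part = {k: v for k, v in record.items() if k in OPEN_ACCESS_FIELDS}
    controlled_part = {k: v for k, v in record.items() if k not in OPEN_ACCESS_FIELDS}
    return open_part, controlled_part


# Toy record with made-up values.
record = {
    "gender": "female",
    "age_range": "50-59",
    "histology": "ductal carcinoma",
    "germline_genotypes": ["toy-genotype-1"],
    "detailed_clinical_history": "toy clinical notes",
}
open_part, controlled_part = split_by_access_tier(record)
```

In this sketch only `open_part` would be served without authorization; access to `controlled_part` would depend on approval through the DACO process described above.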
Pathology and Clinical Annotation
Large-scale genomic studies of human tumors rely on the availability of fresh frozen tumor tissue. To address the paucity of samples that meet ICGC standards, many projects have initiated prospective collections of high quality source material. Accordingly, the ICGC recommended procedures to promote consistency of sample processing throughout the Consortium and ensure a series of quality features such as high tissue integrity and tumor cell content. Each project will need to include diverse data types such as environmental exposures, clinical history of participants, tumor histopathology, and clinical outcomes.
Tumors display considerable clinical and biological heterogeneity, which has resulted in a variety of tumor classifications. Within the ICGC, special measures are taken to promote the consistency of diagnosis. These include the coordination of diagnostic criteria among groups investigating tumors that are related, and policies that all samples will be reviewed by at least two independent reference pathologists. Furthermore, images of the stained tumor sections (or blood smear or cytospins for hematological neoplasias), from which diagnoses were made, will be stored and made available to the community.
Although different tumor types may require specific procedures for tumor acquisition or compilation of clinical and environmental data, ICGC has set guidelines regarding the use of common definitions and data standards. This will allow ICGC data users to identify correlations between tumor-specific molecular changes and clinical and histopathological data, including prognosis, prediction of therapy response and tumor classification schemes for diagnosis.
Study Design and Statistical Issues
To identify cancer-related genes, one needs to detect genes that are mutated at a higher frequency than the background mutation rate. Given that several driver genes have been found to be mutated at low frequencies, ICGC will identify somatic mutations observed in at least 3% of tumors of a given subtype. ICGC determined that 500 samples would be needed per tumor type (although for rare tumor types, a smaller sample size may be justified). In practice, the degree of heterogeneity of a given tumor type is difficult to know in advance, such that some particularly heterogeneous tumor types may require larger sample collections.
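The reasoning behind a target such as 500 samples and a 3% frequency can be illustrated with a simple binomial model. This is a minimal sketch and not the Consortium's actual power calculation, which the text does not describe. The per-gene background probability (`BACKGROUND`) and the significance level (`ALPHA`, a Bonferroni-style correction over roughly 20,000 genes) are assumptions introduced purely for illustration.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k))


def min_significant_count(n: int, background: float, alpha: float) -> int:
    """Smallest k such that seeing >= k mutated tumors is unlikely under background alone."""
    k = 0
    while prob_at_least(k, n, background) >= alpha:
        k += 1
    return k


N_SAMPLES = 500          # samples per tumor type, from the text
DRIVER_FREQUENCY = 0.03  # 3% of tumors, from the text
BACKGROUND = 0.005       # ASSUMPTION: chance a tumor carries a passenger hit in the gene
ALPHA = 0.05 / 20000     # ASSUMPTION: Bonferroni correction over ~20,000 genes

expected_mutated = N_SAMPLES * DRIVER_FREQUENCY
k_min = min_significant_count(N_SAMPLES, BACKGROUND, ALPHA)
power = prob_at_least(k_min, N_SAMPLES, DRIVER_FREQUENCY)

print(f"expected mutated tumors: {expected_mutated:.1f}")
print(f"mutated tumors needed to exceed background: {k_min}")
print(f"power to detect a gene mutated in 3% of tumors: {power:.2f}")
```

Changing `N_SAMPLES` or `BACKGROUND` shows how the achievable power shifts, which is one way to see why rare or particularly heterogeneous tumor types may call for different sample sizes.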
Cancer Genome Analyses
High-quality catalogues of somatic mutations from whole cancer genomes will ultimately be the ICGC standard. Shotgun sequencing employing second generation technologies can detect all classes of somatic mutation implicated in cancer. Moreover, if the level of coverage is sufficient, comprehensive high quality catalogues of somatic mutations from individual cancer genomes can be acquired with >90% sensitivity and >95% specificity. In order to achieve this, it will be necessary to sequence both the genome of the cancer and of a normal tissue from the same individual to distinguish germline variants. Although a few genomes of this standard have already been generated, the cost and the continuing technology development will mean that interim analyses of particularly informative sectors of the genome will be carried out, for example of all coding exons and microRNAs.
For each individual cancer genome, the catalogue of somatic mutations will be supplemented by genome-wide information on the state of methylation of CpG dinucleotides. The optimal strategies and technologies to achieve this are not yet clear. Moreover, the genomes of individual cancers will be accompanied, where possible, by analyses of the transcriptome. Although conventional array-based approaches currently predominate, it is preferable that RNA sequencing becomes the standard as sequencing has a greater dynamic range25 and provides additional information including novel transcripts and sequence variants26.
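The role of the matched normal genome can be shown with a deliberately simplified sketch: variants seen in the tumor but absent from the same individual's normal tissue are somatic candidates, while shared variants are treated as germline. The `Variant` type, the helper name and the coordinates below are hypothetical toy values. Real somatic callers also model coverage, sequencing error and tumor purity, none of which is represented here.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def somatic_candidates(tumor: Set[Variant], normal: Set[Variant]) -> Set[Variant]:
    """Variants present in the tumor but not in the matched normal sample."""
    return tumor - normal


# Toy variant calls (made-up coordinates).
normal_calls = {
    Variant("chr1", 1000, "A", "G"),  # inherited variant, present in both samples
}
tumor_calls = {
    Variant("chr1", 1000, "A", "G"),  # germline, removed by the comparison
    Variant("chr1", 2000, "G", "A"),  # tumor-only, kept as a somatic candidate
}

somatic = somatic_candidates(tumor_calls, normal_calls)
germline = tumor_calls & normal_calls
```

Without the normal sample, the inherited variant at position 1000 would be indistinguishable from a somatic event, which is why both genomes are needed.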
ICGC Datasets
The distributed nature of the Consortium coupled with the large size of the datasets makes it cumbersome to store all data in a single centralized repository. For this reason, the ICGC has adopted a “franchise” database model for integrating the information and making it available to the public. Under this model, each member project releases tumor information by copying it into its local franchise database after it has been quality checked. Each franchise database shares a common schema to describe the specimens, the associated clinical information, and their genome characterization data. ICGC primary data files, including sequencing traces, are sent to the National Center for Biotechnology Information (NCBI) and/or the European Bioinformatics Institute (EBI) for archiving, while interpreted data sets, such as somatic mutation calls, are stored in franchise databases. The ICGC franchise databases and web portal use BioMart27, a data federation technology originally developed for use in Ensembl28, and since adopted for use by multiple model organism and genome databases. The management of the ICGC data flow is the responsibility of the ICGC Data Coordination Center (DCC) located at the Ontario Institute for Cancer Research.
The DCC also operates the ICGC data portal which allows researchers to access both Open and Controlled access portions of the ICGC data. The portal provides a variety of user interfaces that range from simple gene-oriented queries (“Show me all the non-silent coding mutations identified in PIK3R1 for all cancers.”) to queries that integrate genomic, clinical, and functional information (“Show me all members of the toll receptor pathway having deletions in stage III breast cancer.”). These queries will be distributed across the franchise databases in a manner that is invisible to the user.
The portal will also provide links to the primary files at NCBI and EBI, interfaces for generating tabular reports, data dumps in common bioinformatics formats, and other visualizations including genome browser tracks, pathway diagrams and survival curves. The portal is available via a link at http://www.icgc.org.
At the time of this publication, the following cancer and reference datasets will be available through the ICGC web portal:
- Initial data releases from ICGC members for breast cancer (UK), liver cancer (Japan), and pancreatic cancer (Australia and Canada);
- A whole genome dataset of a metastatic melanoma cell line (COLO829)6;
- Open datasets from the TCGA for glioblastoma multiforme (GBM) and serous cystadenocarcinoma of the ovary (see below);
- Whole exome somatic mutation data from 68 individuals with breast, colorectal, pancreatic cancer and GBM11-13;
- Links to the human reference genome (http://www.genomereference.org/) and gene annotations from the GENCODE Project (http://www.sanger.ac.uk/gencode/) which includes the CCDS gene set29;
- Links to dbSNP30 and the HapMap31 databases, providing access to common patterns of variation in reference population samples;
- Links to Reactome32, a curated database of biological pathways in human;
- A set of reference gene models, mirrored from ENSEMBL28.
The current version of the web portal provides an entry point to the open access data tier via interactive query as well as bulk download of data files. We expect that in mid-2010 both open access and controlled data will be available.
The ICGC recently established a bioinformatics analysis working group to compare pipelines, analytic methods, consistency within and among algorithms, and establish guidelines or best practices for the Consortium. 
Over time, significant resources will be deployed to develop strategies to analyze the large complex datasets generated by ICGC member projects, and provide value-added views of cancer genomic data by integrating them with other biological and epidemiological datasets.
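The franchise model described in this section, in which a portal query is fanned out to several databases that share one schema, can be sketched in a few lines. This is a conceptual illustration only: it does not use or imitate the BioMart API, and the class names, fields and records below are hypothetical toy values.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class MutationRecord:
    """One row of the common schema shared by every franchise (hypothetical fields)."""
    project: str
    gene: str
    consequence: str


class FranchiseDatabase:
    """A member project's local database holding quality-checked records."""

    def __init__(self, name: str, records: Iterable[MutationRecord]) -> None:
        self.name = name
        self._records = list(records)

    def query(self, predicate: Callable[[MutationRecord], bool]) -> List[MutationRecord]:
        return [r for r in self._records if predicate(r)]


def federated_query(
    franchises: Iterable[FranchiseDatabase],
    predicate: Callable[[MutationRecord], bool],
) -> List[MutationRecord]:
    """Send the same query to every franchise and merge the results."""
    results: List[MutationRecord] = []
    for db in franchises:
        results.extend(db.query(predicate))
    return results


# Toy data: two franchises with made-up records.
franchises = [
    FranchiseDatabase("project-A", [
        MutationRecord("project-A", "PIK3R1", "missense"),
        MutationRecord("project-A", "PIK3R1", "synonymous"),
    ]),
    FranchiseDatabase("project-B", [
        MutationRecord("project-B", "PIK3R1", "frameshift"),
        MutationRecord("project-B", "BRAF", "missense"),
    ]),
]

# "Show me all the non-silent coding mutations identified in PIK3R1 for all cancers."
hits = federated_query(
    franchises,
    lambda r: r.gene == "PIK3R1" and r.consequence != "synonymous",
)
```

Because every franchise exposes the same record shape, the caller never needs to know which project holds which rows, which is the sense in which the distribution is invisible to the user.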
Data Release and IP Policies
The data release policies of the ICGC are intended to maximize public benefit while, at the same time, protecting the interests and rights of sample donors and their relatives. Members of the ICGC are committed to the principles of rapid data release (with appropriate controlled access mechanisms), in concordance with the Toronto Statement33. ICGC members encourage the scientific community to use any data that targets specific genes and mutations, without any restrictions. In order to allow ICGC members the opportunity to be the first to publish global analyses from datasets they generate, the Consortium has also agreed that member projects may specify conditions that include a time limit during which other data users are asked to refrain from publishing global analyses (defined by several ICGC member projects as 100 tumors and matched controls), a provision referred to as a “publication moratorium”. In order to allow time for a dataset to be analyzed and submitted for publication, ICGC members will have at most one year after released datasets reach the specified threshold before third parties are permitted to submit manuscripts describing global analyses. Further details on data release guidelines for data producers, users and reviewers are available at http://www.icgc.org. Users of ICGC data are expected to respect these terms and to cite this manuscript and the source of pre-publication data, including the version of the dataset. In cases of uncertainty, scientists using ICGC data are encouraged to contact the member projects to discuss publication plans.
ICGC members believe that maximum public benefit will be achieved if the data remain publicly accessible without patent restrictions; hence, no claims to possible intellectual property (IP) derived from primary data (including somatic mutations) will be made. Users of ICGC data (including ICGC members) may elect to perform further research and to exercise their IP rights on these downstream discoveries. 
If this occurs, users are expected to implement licensing policies that do not obstruct further research.
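The timing of the publication moratorium described above amounts to simple date arithmetic. The sketch below assumes the longest period the text allows (one year from the date a released dataset reaches the project's threshold). The function names and the example date are hypothetical, and individual member projects may specify shorter limits or different conditions.

```python
from datetime import date

# From the text: several member projects define a global analysis
# as one covering 100 tumors and matched controls.
GLOBAL_ANALYSIS_THRESHOLD = 100


def latest_moratorium_end(threshold_reached: date) -> date:
    """Latest possible end of the moratorium: one year after the threshold date."""
    try:
        return threshold_reached.replace(year=threshold_reached.year + 1)
    except ValueError:
        # 29 February has no counterpart in the following year; use 28 February.
        return threshold_reached.replace(year=threshold_reached.year + 1, day=28)


def third_party_may_submit(today: date, threshold_reached: date) -> bool:
    """True once the (maximum) moratorium has elapsed."""
    return today >= latest_moratorium_end(threshold_reached)


# Hypothetical example: a dataset reaches the threshold on 1 June 2010.
reached = date(2010, 6, 1)
end = latest_moratorium_end(reached)
```

This covers only global analyses; the text places no such restriction on the use of data targeting specific genes and mutations.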
Initial ICGC Projects
Currently nine countries and two European consortia have initiated cancer genome projects under the umbrella of the ICGC. The initial projects, listed in an online table that accompanies this article, will analyze tumor types found around the globe and throughout the human body affecting a diversity of organs including blood, brain, breast, kidney, liver, pancreas, stomach, oral cavity, and ovary. Over time, the ICGC will investigate fifty or more types and subtypes of cancer in adults and children. In the case of tumors with multiple subtypes, analyses should be focused on subtypes that may be defined on pathological, molecular, etiological or geographical differences. It is expected that some cancer types will be studied in parallel in different parts of the world, as the mutation profiles may differ among populations. The consortium has enabled the coordination of initial projects analyzing similar cancers in different countries, and in some cases, the redirection of resources to launch new projects.
The Cancer Genome Atlas (TCGA)
TCGA is a comprehensive program in cancer genomics that is jointly supported and managed by the National Cancer Institute and the National Human Genome Research Institute of the U.S. National Institutes of Health. TCGA began in 2006 as a pilot focused on three projects, glioblastoma multiforme (GBM), serous cystadenocarcinoma of the ovary, and lung squamous carcinoma, and has recently expanded to produce comprehensive genomic data sets for at least 10 additional cancers in the next two years. Given TCGA's contributions in launching the ICGC and cooperation to ensure that its policies (posted at http://cancergenome.nih.gov) are coordinated with those of the ICGC, TCGA's participation in the ICGC is considered to be equivalent to that of a full member. TCGA, however, is not able to join the ICGC formally at this time, because of technical and legal issues in the U.S. related to the mechanisms of the distribution of controlled-access data, although such data are directly available to investigators at http://cancergenome.nih.gov/dataportal. The National Institutes of Health policies relating to distribution of controlled-access datasets are being reviewed with the intent of enabling researchers to integrate and analyze across databases, for example, using the franchise model adopted by the ICGC. Meanwhile, TCGA is ensuring that projects are coordinated and data sets are compatible with those of the Consortium.
ICGC in the Next Decade
A large proportion of common cancers affecting patients around the world have been or will soon be selected for comprehensive cancer genome studies. Further efforts will be needed to leverage support and expertise to tackle the remaining tumor types, including rare cancers. The challenges of the ICGC are daunting due to the scope of the initiative, the complexity that is inherent to the heterogeneity of cancer and the limitations of current technologies to provide accurate long-range assemblies of highly rearranged chromosomes found in tumor cells. These challenges underscore the importance of continued international coordination and further engagement of the scientific community in the next decade.
Moving Towards Clinical Applications
ICGC catalogues, which are expected to grow exponentially, will have immediate relevance in the cancer research community. Early insight into the biology of somatic mutations will come from functional studies in cell-based and animal models of tumors. Mutation screens in retrospective tumor banks linked to registries or clinical trials having significant clinical data will inform on the potential clinical utility of somatic mutations as biomarkers for prognosis or drug response. Germline variants identified by ICGC projects may allow the discovery of genes predisposing to familial malignancies, such as PALB2 and pancreatic cancer12,34. High throughput screens of RNAi or small molecule libraries, and the adaptation of existing model systems, will play a major role in refining potential therapeutic candidates for further study35.
Translating these discoveries into clinical practice will require more sophisticated clinical trials that take into account the increases in phenotypic subdivisions, additional coordination to identify subjects having tumors with similar profiles, and increased use of biomarkers, genomic analyses, informatics and other technologies in the clinical development of new therapeutics. Given the tremendous potential for relatively low-cost genomic sequencing to reveal clinically useful information, we anticipate that in the not so distant future, partial or full cancer genomes will routinely be sequenced as part of the clinical evaluation of cancer patients and as part of their on-going clinical management. The successful and appropriate translation of cancer genome research into clinical practice will raise important social and ethical questions. 
It will be essential to combine the expertise of oncologists, biostatisticians, pathologists, geneticists, policy-makers and members of the biopharmaceutical industry to meet this challenge by developing new policies and clinical paradigms that enable rapid translation of many new biomarkers and cancer targets into new clinical tests and therapeutic interventions that will benefit cancer patients.
Box 1: Goals of the Consortium
- Coordinate the generation of comprehensive catalogues of genomic abnormalities (somatic mutations) in tumors in 50 different cancer types and/or subtypes which are of clinical and societal importance across the globe.
- Ensure high quality by defining the catalogue for each tumor type or subtype to include the full range of somatic mutations such as single-nucleotide variants, insertions, deletions, copy number changes, translocations and other chromosomal rearrangements, and to have the following features:
  - Comprehensiveness, such that most cancer genes with somatic abnormalities occurring at a frequency of greater than 3% are discovered;
  - High resolution, ideally at a single nucleotide level;
  - High quality, using common standards for pathology and technology;
  - Data from matched non-tumor tissue, to distinguish somatic from inherited sequence variants and aberrations.
- Generate complementary catalogues of transcriptomic and epigenomic datasets from the same tumors.
- Make the data available to the entire research community as rapidly as possible, and with minimal restrictions, to accelerate research into the causes and control of cancer.
- Coordinate research efforts so that the interests and priorities of individual participants, self-organizing consortia, funding agencies and nations are addressed, including use of the burden of disease and the minimization of unnecessary redundancy in tumor analysis efforts.
- Support the dissemination of knowledge and standards related to new technologies, software, and methods to facilitate data integration and sharing with cancer researchers around the globe.
Box 2: Core Bioethical Elements for Consent
For prospective research, ICGC members should convey to potential participants that:
- The ICGC is a coordinated effort among related scientific research projects being carried on around the world.
- Participation in the ICGC and its component projects is voluntary.
- Samples and data collected will be used for cancer research, which may include whole genome sequencing.
- The patient's care will not be affected by their decision regarding participation.
- The samples collected will be in limited quantities; access to them will be tightly controlled and will depend on the policy and practices of the ICGC-member project. At least a small percentage of the samples may be shared with laboratories in other countries for the purposes of performing quality control studies.
- Data derived from the samples collected and data generated by the ICGC members will be made accessible to ICGC members and other international researchers through either an open or a controlled access database under terms and conditions that will maximize participant confidentiality.
- The researchers accessing data and samples will be required to affirm that they will not attempt to re-identify participants.
- There is a remote risk of being identified from data available on the databases.
- Once data are placed in open databases, those data cannot be withdrawn later.
- In controlled access databases the links to (local) data that can identify an individual will be destroyed upon withdrawal. Data previously distributed will continue to be used.
- ICGC members agree not to make claims to possible IP on primary data.
- No profit from eventual commercial products will be returned to subjects donating samples.
For retrospective research, the above guidelines remain the same, with the exception that where the individual is no longer a patient, there will not be a concern that their care could be affected by participation.
For research involving samples and data from deceased individuals:
- Where required by law or ethics, consent should always be obtained from the families of a deceased individual if their samples and data are to be used; if re-consent is not required, however, ethics review is sufficient.
- Ethics committee review should be sought for all research proposing the use of existing sample and data collections.
- Existing collections are a limited and valuable resource; access to them will be tightly controlled.
For research using anonymized samples, ethics review may be required in some jurisdictions.