Hongbing Shen
Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China; Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China.
For hundreds of years, cancer has remained a mystery, threatening the life expectancy and quality of life of all human beings. Early studies of cancer concluded that all cancers are influenced by genetic variation (germline and/or somatic), environmental agents, and/or health behaviors. Thus, a comprehensive description of the human genome is essential to understanding the biology of cancer.1 Over the past decade, our knowledge of genomics has progressed substantially because of the rapid development of technology, which has dramatically advanced the study of cancer.
Since the conclusion of the Human Genome Project (HGP) in 2003, the reference human genome sequence has provided the first comprehensive catalogue of the human genome and identified more than three million human genetic variants.2,3 Despite the remarkable achievements that followed the HGP, our knowledge of human genetic variation remains limited. Population-based genetic association studies provide a powerful approach to linking genetic variants with diseases. To better identify disease-related variants, scientists launched the International HapMap Project and the 1000 Genomes Project.4,5 These two projects have developed a detailed catalogue of genetic variants in the human genome, provided a blueprint for subsequent studies, and launched the era of the genome-wide association study (GWAS).
Association studies offer a classic strategy for studying germline variants.6 By integrating prior knowledge of candidate genes or loci, we can identify heritable variants in these regions that are associated with cancer (development or survival status).7–12 However, a critical problem remains: not all reported associations can be robustly replicated in large studies or combined meta-analyses, possibly because of false positives, false negatives, or population heterogeneity.
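To make the idea of a single-variant association test concrete, the following is a minimal sketch of an allelic case-control test: it compares minor and major allele counts between cases and controls using an odds ratio and a Pearson chi-square test with one degree of freedom. The function name `allelic_association` and the allele counts are hypothetical and chosen only for illustration; they do not come from any study cited here, and published analyses typically use logistic regression with covariate adjustment instead.

```python
import math

def allelic_association(case_minor, case_major, ctrl_minor, ctrl_major):
    """Return (odds_ratio, chi2, p_value) for a 2x2 table of allele counts.

    Hypothetical helper for illustration; assumes all four counts are > 0.
    """
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    # Odds of carrying the minor allele in cases relative to controls.
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square statistic for a 2x2 table (no continuity correction).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value

# Illustrative counts only: 1000 cases and 1000 controls (2000 alleles each).
odds_ratio, chi2, p_value = allelic_association(600, 1400, 500, 1500)
print(f"OR = {odds_ratio:.3f}, chi2 = {chi2:.2f}, p = {p_value:.2e}")
```

A nominally small p-value from one such test is exactly the kind of result that may fail to replicate, which is why independent large samples and meta-analyses are needed to confirm it.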
With the application of high-throughput genotyping technology, the GWAS has emerged as a more powerful and reliable tool for investigating the genetic architecture of complex diseases in large samples, without a prior hypothesis about a particular gene or locus.13 In the GWAS approach, several hundred thousand to millions of single nucleotide polymorphisms (SNPs) are assayed across the whole genome in thousands of individuals.14 To date, GWAS have led to the discovery of over 600 susceptibility loci for different kinds of cancer, including loci reported in Chinese populations.15–21
Nevertheless, two problems emerged and have haunted post-GWAS research for many years. The first problem is missing heritability: only a small fraction of disease heritability can be explained by known susceptibility loci. Using prostate cancer as an example, 40 identified susceptibility loci account for only approximately 25% of the familial risk of disease. To deal with this problem, some studies have turned to rare variants, structural variations, epistasis, and gene-environment interactions.22–26 Compared with the common variants identified by GWAS (in general, those with a minor allele frequency above 5%), rare variants are abundant in the human genome but are poorly detected by commonly used genotyping arrays, even after proper imputation. Using exome sequencing, Thompson et al. successfully identified rare deleterious mutations in DNA repair genes as potential breast cancer susceptibility alleles.22 Structural variants, including copy number variants (CNVs), inversions, translocations, microsatellites, repeat expansions, insertions of new sequence, complex rearrangements, and short insertions or deletions (indels), may also account for some of the unexplained heritability and are poorly captured by existing arrays.27 A run of homozygosity (ROH) is a continuous, uninterrupted stretch of genomic sequence without heterozygosity in the diploid state. ROHs can be detected using GWAS data but have been poorly investigated to date. Wang et al. explored the landscape and impact of ROHs on lung cancer.23 Using an existing GWAS dataset of 1473 lung cancer cases and 1962 controls, they identified a new region at 14q23.1 that was consistently associated with lung cancer risk in the Chinese population, suggesting that ROHs may also be responsible for part of the unexplained familial risk of disease. In addition, gene-gene or gene-environment interaction is thought to be one of the most important components of the “dark matter” of missing heritability. However, interactions remain difficult to detect given the current state of epidemiological study design, exposure assessment, and analytical methods.28 Some studies have explored this area, but interpreting such statistical interactions remains a great challenge.24–26
The second problem is the interpretation of GWAS results: 88% of the variants identified by GWAS fall outside coding regions and have been difficult to interpret.29 These problems have hampered our ability to pinpoint causal variants, identify the genes affected by causal variants, and disentangle the mechanisms by which genotype influences phenotype.
Fortunately, the emergence of several large-scale genomic data sets generated by projects such as the ENCyclopedia of DNA Elements (ENCODE) has revolutionized our ability to assign potential function to GWAS-identified variants. The ENCODE project is an international research consortium that aims to identify all functional elements in the human genome sequence.30 It revealed that 80.4% of the human genome displays some functionality in at least one cell type. By integrating functional elements generated by ENCODE, Schaub et al. provided putative functional annotations for up to 80% of all previously reported associations.31 Beyond decoding new regulatory elements in the human genome, integrating multi-omics data provides new perspectives on GWAS results. On the basis of this strategy, Yao et al. identified 23 promoters and 28 enhancers potentially associated with colon cancer by using genomic and epigenomic information.32 As the landscape of the human transcriptome becomes more accessible, researchers are attempting to establish connections between genetic variants and gene expression, namely expression quantitative trait loci (eQTL). Many studies have demonstrated that GWAS signals are enriched with eQTL variants in a tissue-specific manner, highlighting the capability of eQTL to help us understand the mechanisms underlying GWAS hits.31,33
In particular, the potential for variants identified in GWAS to predict the risk of complex diseases has long been anticipated, but the usefulness of bringing these fundamental genetic findings to the bedside remains debatable.34 Nevertheless, genetic prediction already offers a number of benefits over classical non-genetic models. For instance, genetic risk prediction is more stable over time than prediction based on traditional risk factors, as a person’s germline genetic sequence is essentially constant throughout life. Recently, Sun et al. reported that a genetic score calculated from variants discovered through association studies is an objective and better measure of inherited risk of prostate cancer than family history.35
Another aspect of cancer genomic studies has focused on somatic alterations (e.g. mutations, CNVs, chromosomal rearrangements). Unlike neutral germline variants, deleterious somatic mutations can act as a direct trigger of cancer, conferring oncogenic properties such as growth advantage, tissue invasion and metastasis, angiogenesis, and evasion of apoptosis.36 At the beginning of this century, studies on mutations were rare because of the complexity of the cancer genome and the limitations of technology. The emergence of massively parallel sequencing (MPS) revolutionized the entire enterprise. Since the first whole cancer genome was sequenced by MPS in 2008, more than 10 000 cancer samples had been subjected to genome or exome sequencing by late 2013 in The Cancer Genome Atlas (TCGA) project (launched in 2005), not to mention The International Cancer Genome Consortium (ICGC) project, launched in 2009.37–39 The explosion of genomic data quickly shed light on the mutational processes of cancer and revealed that cancer is much more complex than originally thought. Mutation rates are highly variable, ranging from as low as one base substitution per exome (0.1/Mb) in some pediatric cancers to thousands of mutations per exome (∼100/Mb) in certain mutagen-induced malignancies (such as lung cancer or melanoma). Mutation patterns vary both across and within individual tumor types, and some distinctive characteristics may reflect extrinsic factors such as ultraviolet light or tobacco smoke, or intrinsic factors such as DNA repair deficiencies.40 To date, the TCGA Research Network has published the genomic landscapes of more than 10 types of cancer in leading journals, identifying hundreds of potential “driver” alterations and classifying each cancer into more detailed subtypes by integrating multi-omics data.41–53 Such information could lead to more robust and personalized diagnostic and therapeutic strategies and provide a roadmap for developing new treatments.54
However, our cancer genome catalogue is far from complete. Although a handful of cancer genes are mutated at high frequency and can easily be detected, many more potential cancer-related genes are mutated at much lower frequencies. As reported in a recently published paper, of 40 loci mutated at significant rates, 53% of the apparent driver mutations or focal copy number alterations were concentrated in six genes (TP53, PIK3CA, ERBB2, FGFR1/ZNF703 and GATA3), and the remainder were dispersed across 34 genes.55 Only eight of the genes were reported to be mutated in at least 10% of breast cancers. Finding genes mutated at such low frequencies will be a great challenge with current sample sizes. Compared with point mutations in exons, our ability to discover and understand other types of driver alterations is still limited. Because mutations in known driver genes cannot fully explain the development of cancer in each individual, many more important cancer drivers, including copy number alterations, chromosomal rearrangements, and alterations in noncoding regions, may be hidden in areas we cannot yet reach. With the development of sequencing technology and its decreasing cost, we believe that these problems will be overcome, that systematic information will be gathered to inform a wider range of biological and clinical questions, and that personalized prevention, diagnosis, and treatment of cancer will eventually be realized.