
Genome-wide association analyses in East Asians identify new susceptibility loci for colorectal cancer.

Wei-Hua Jia, Ben Zhang, Keitaro Matsuo, Aesun Shin, Yong-Bing Xiang, Sun Ha Jee, Dong-Hyun Kim, Zefang Ren, Qiuyin Cai, Jirong Long, Jiajun Shi, Wanqing Wen, Gong Yang, Ryan J Delahanty, Bu-Tian Ji, Zhi-Zhong Pan, Fumihiko Matsuda, Yu-Tang Gao, Jae Hwan Oh, Yoon-Ok Ahn, Eun Jung Park, Hong-Lan Li, Ji Won Park, Jaeseong Jo, Jin-Young Jeong, Satoyo Hosono, Graham Casey, Ulrike Peters, Xiao-Ou Shu, Yi-Xin Zeng, Wei Zheng.

Abstract

To identify new genetic factors for colorectal cancer (CRC), we conducted a genome-wide association study in east Asians. By analyzing genome-wide data in 2,098 cases and 5,749 controls, we selected 64 promising SNPs for replication in an independent set of samples, including up to 5,358 cases and 5,922 controls. We identified four SNPs with association P values of 8.58 × 10⁻⁷ to 3.77 × 10⁻¹⁰ in the combined analysis of all east Asian samples. Three of the four were replicated in a study conducted in 26,060 individuals of European descent, with combined P values of 1.22 × 10⁻¹⁰ for rs647161 (5q31.1), 6.64 × 10⁻⁹ for rs2423279 (20p12.3) and 3.06 × 10⁻⁸ for rs10774214 (12p13.32 near the CCND2 gene), derived from meta-analysis of data from both east Asian and European-ancestry populations. This study identified three new CRC susceptibility loci and provides additional insight into the genetics and biology of CRC.


Year: 2012 PMID: 23263487 PMCID: PMC3679924 DOI: 10.1038/ng.2505

Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330

Colorectal cancer (CRC) is one of the most commonly diagnosed malignancies in East Asia and many other parts of the world [1]. Genetic factors play an important role in the etiology of both sporadic and familial CRC [2]. However, less than 6% of CRC cases can be explained by rare, high-penetrance variants in the CRC susceptibility genes identified to date, such as the APC, SMAD4, AXIN2, BMPR1A, POLD1, STK11, MUTYH, and DNA mismatch repair genes [2]. Over the past two decades, many candidate gene studies have evaluated common genetic risk factors for CRC; only a few of them have been replicated in subsequent studies [3]. Recent genome-wide association studies (GWAS) have identified approximately 15 common genetic susceptibility loci for CRC [4-12]. However, these newly identified genetic factors, along with known high-penetrance CRC susceptibility genes, explain less than 15% of the heritability for this common malignancy [10, 11]. Furthermore, with the exception of a small study conducted in Japan [12], all other GWAS were conducted among European-ancestry populations which differ from other ethnic groups in certain genetic architecture. Many of the variants discovered in European-ancestry populations show only a weak or no association with CRC in other ethnic groups [13]. Therefore, additional GWAS are needed, particularly in non-European-ancestry populations, to fully uncover the genetic basis for CRC susceptibility. In 2009, we initiated the Asia Colorectal Cancer Consortium (ACCC), a GWAS in East Asians, to search for novel genetic risk factors for CRC. The discovery stage (Stage 1) consisted of five GWAS conducted in China, Korea, and Japan, including 2,293 CRC patients and 5,780 controls (Supplementary Table 1). 
Cases and controls were genotyped using several SNP arrays, including Affymetrix Genome-Wide Human SNP Array 6.0 (906,602 SNPs), Affymetrix Genome-Wide Human SNP Array 5.0 (443,104 SNPs), Illumina Infinium HumanHap610 BeadChip (592,044 SNPs), Illumina Human610-Quad BeadChip (620,901 SNPs), and Illumina HumanOmniExpress BeadChip (729,462 SNPs) (Supplementary Table 1). After quality control (QC) exclusions as described previously [14-17], 2,098 cases and 5,749 controls remained for this study (Supplementary Tables 1 and 2). Also excluded from the analyses were SNPs with a call rate < 95%, genotype concordance rate < 95% among positive QC samples, minor allele frequency (MAF) < 5%, or P-value for Hardy-Weinberg equilibrium < 1.0 × 10−5 in controls for each study. Imputation was conducted for each study following the MACH algorithm [18] using phased HapMap 2 CHB and JPT samples as the reference. No apparent genetic admixture was identified except for one sample from KCPS-II (Supplementary Fig. 1). Associations between CRC risk and each of the genotyped and imputed SNPs were evaluated using logistic regression within each study after adjusting for age, sex, and the first ten principal components using mach2dat [18]. Meta-analyses were conducted under a fixed-effects model using the METAL program [19]. There was little evidence for inflation in the association test statistics for any of the five studies (genomic inflation factor (λ) range: 1.02 to 1.04) or for all studies combined (λ= 1.01) (Supplementary Table 1 and Supplementary Fig. 2). The observed number of SNPs with a small P-value was slightly larger than that expected by chance (Supplementary Fig. 2). Multiple genomic locations were revealed as potentially related to CRC risk (Supplementary Fig. 3). Nine SNPs identified from published GWAS conducted in European-ancestry populations showed an association with CRC risk at P< 0.05 in Stage 1 (data not shown). 
To improve the statistical power for evaluating these SNPs, we genotyped 6,476 additional samples to bring the total sample size to 5,252 cases and 9,071 controls. Except for the two SNPs (rs6691170 and rs16892766) that are monomorphic in East Asians, all 16 of the other SNPs identified from published GWAS conducted in European-ancestry populations showed an association with CRC risk in the same direction as reported previously (Supplementary Table 3). A significant association with CRC risk at P < 0.05 was found for 13 SNPs: rs6687758, rs10936599, rs10505477, rs6983267, rs7014346, rs10795668, rs3802842, rs4444235, rs4779584, rs9929218, rs4939827, rs10411210, and rs961253. Except for two SNPs (rs6983267 and rs4779584), no statistically significant heterogeneity at P < 0.05 was observed between East Asian and European-ancestry populations (Supplementary Table 3). To identify novel genetic factors for CRC, we selected 64 SNPs for replication in an independent set of 5,358 cases and 5,922 controls recruited in five studies conducted in China, Korea, and Japan (Supplementary Table 2). SNPs were prioritized for replication if they had 1) MAF > 5%; 2) no heterogeneity across studies (P_heterogeneity > 0.05 and I² < 25%); 3) no linkage disequilibrium (LD) (r² < 0.2) with any known CRC risk variant reported from previous GWAS; 4) high imputation quality in each of the five studies (RSQ > 0.5); and 5) P < 0.01 in the combined analysis of all five studies included in Stage 1. Of the 64 SNPs evaluated in Stage 2, seven SNPs showed an association with CRC risk at P < 0.05 with a direction of association consistent with that observed in Stage 1 (Table 1 and Supplementary Table 4). 
In the combined analysis of data from both Stages 1 and 2, P-values for the association with two SNPs (rs647161 at 5q31.1, OR = 1.17, P = 3.77 × 10⁻¹⁰ and rs10774214 at 12p13.32, OR = 1.17, P = 5.48 × 10⁻¹⁰) were lower than the conventional genome-wide significance level of 5.0 × 10⁻⁸, providing convincing evidence for an association of these SNPs with CRC risk (Table 1). An additional SNP, rs2423279, showed a significant association in Stage 2 after Bonferroni correction (corrected P < 7.8 × 10⁻⁴), but did not reach the conventional GWAS significance level for association with CRC risk in the combined analysis of all samples (OR = 1.14, P = 2.29 × 10⁻⁷). The association between CRC risk and each of these three SNPs was consistent across most studies (Fig. 1). Results for the other four SNPs (rs1665650, rs2850966, rs1580743, and rs4503064) replicated in Stage 2 at P < 0.05 are also presented in Supplementary Table 4, including one SNP (rs1665650) with a P-value of 8.58 × 10⁻⁷ in the combined analysis of all data from both stages (Table 1).
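The Bonferroni threshold quoted above is consistent with dividing a family-wise significance level of 0.05 by the 64 SNPs carried into Stage 2. The following sketch reproduces that arithmetic and checks the Stage 2 (replication) P-values reported in Table 1 against it. The value of 0.05 is an assumption; the text states only the corrected threshold.

```python
# Bonferroni threshold for the 64 SNPs tested in Stage 2.
# ALPHA = 0.05 is an assumption; the text reports only the corrected threshold.
ALPHA = 0.05
N_SNPS = 64

threshold = ALPHA / N_SNPS  # 0.00078125, which rounds to 7.8e-4

# Stage 2 (replication) P-values as reported in Table 1.
stage2_p = {
    "rs10774214": 5.80e-7,
    "rs647161": 1.15e-5,
    "rs2423279": 1.22e-4,
    "rs1665650": 0.0018,
}

# True when the replication P-value is below the corrected threshold.
passes = {snp: p < threshold for snp, p in stage2_p.items()}

for snp, ok in passes.items():
    print(f"{snp}: P = {stage2_p[snp]:.2e}, below threshold: {ok}")
```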

Table 1

Association of colorectal cancer risk with the top four risk variants identified in East Asian samples

SNP (alleles)ᵃ	Chr. (gene)ᵇ	Location (bp)ᶜ	Stage	Cases: sample size	Cases: MAF	Controls: sample size	Controls: MAF	Per-allele OR (95% CI)ᵈ	P_trend	P_heterogeneityᵉ	I²
rs10774214 (T/C)	12p13.32 (CCND2)	4,238,613	GWAS	2,098	0.373	5,749	0.348	1.20 (1.09–1.32)	2.03×10⁻⁴
			Replication	5,197	0.381	5,797	0.355	1.16 (1.09–1.23)	5.80×10⁻⁷
			Overall	7,295	0.379	11,546	0.352	1.17 (1.11–1.23)	5.48×10⁻¹⁰	0.615	0%
rs647161 (A/C)	5q31.1 (PITX1)	134,526,991	GWAS	2,098	0.353	5,749	0.308	1.22 (1.12–1.33)	3.29×10⁻⁶
			Replication	5,217	0.344	5,815	0.319	1.14 (1.07–1.21)	1.15×10⁻⁵
			Overall	7,315	0.347	11,564	0.313	1.17 (1.11–1.22)	3.77×10⁻¹⁰	0.444	0%
rs2423279 (C/T)	20p12.3 (HAO1)	7,760,350	GWAS	2,098	0.339	5,749	0.307	1.16 (1.07–1.26)	4.96×10⁻⁴
			Replication	5,227	0.315	5,811	0.297	1.13 (1.06–1.19)	1.22×10⁻⁴
			Overall	7,325	0.322	11,560	0.302	1.14 (1.08–1.19)	2.29×10⁻⁷	0.331	12%
rs1665650 (T/C)	10q26.12 (HSPA12A)	118,477,090	GWAS	2,098	0.346	5,749	0.310	1.20 (1.10–1.31)	3.88×10⁻⁵
			Replication	5,192	0.328	5,808	0.320	1.10 (1.04–1.17)	0.0018
			Overall	7,290	0.333	11,557	0.315	1.13 (1.08–1.19)	8.58×10⁻⁷	0.404	4%

Abbreviations: Chr., Chromosome; MAF, minor allele frequency; OR, odds ratio; CI, confidence interval.

ᵃ Minor/major allele for East Asians; OR was estimated for the minor allele.

ᵇ The closest gene.

ᶜ Location based on NCBI Human Genome Build 36.3.

ᵈ Adjusted for age, sex, the first ten principal components (Stage 1) and study site.

ᵉ P for heterogeneity across studies in GWAS and Replication was calculated using a Cochran’s Q test.

Figure 1

Forest plots for the three SNPs showing evidence of an association with CRC risk

Per-allele ORs are presented with the area of the box proportional to the inverse variance weight of the estimate. Horizontal lines represent 95% CIs.

We next evaluated the top four SNPs shown in Table 1 using data from GWAS in the Genetics and Epidemiology of Colorectal Cancer Consortium and the Colon Cancer Family Registry (GECCO and CCFR), which include 11,870 cases and 14,190 controls of European ancestry [4, 20, 21]. Three of the four SNPs were replicated in GECCO and CCFR, although the associations were weaker than those found in East Asians (Table 2). These results provide independent support for our findings in the East Asian population. Meta-analyses of data from both East Asian and European-ancestry populations provided strong evidence for associations of CRC risk with three SNPs, with P-values all below the genome-wide significance threshold of 5 × 10⁻⁸ (Table 2). The weaker associations observed in European-ancestry populations could be explained in part by differences in LD patterns at these loci between East Asians and Europeans (Supplementary Fig. 4). It is possible that causal variants in these regions are tagged by different SNPs in the two populations, or that there is allelic heterogeneity, in which different underlying causal variants exist in Asian- and European-ancestry populations. The difference in LD structure and possible allelic heterogeneity may explain, in part, why these loci were not discovered in previous studies conducted in European descendants. The fourth SNP, rs1665650, was not replicated in the GECCO and CCFR European-ancestry populations (OR = 0.96, P = 0.05).

Table 2

Association of colorectal cancer risk with the three newly-identified risk variants in European-ancestry populations and the meta-analyses of East Asians and Europeans

SNP	Allelesᵃ	MAFᵇ: cases	MAFᵇ: controls	Europeansᶜ: cases/controls	Europeansᶜ: OR (95% CI)	Europeansᶜ: P_meta	East Asians and Europeans combinedᶜ: cases/controls	Combinedᶜ: OR (95% CI)	Combinedᶜ: P_meta
rs10774214	T/C	0.385	0.379	11,870/14,190	1.04 (1.00–1.09)	0.040	19,165/25,736	1.09 (1.06–1.13)	3.06×10⁻⁸
rs647161	A/C	0.680	0.667	11,870/14,190	1.07 (1.02–1.11)	0.002	19,185/25,754	1.11 (1.08–1.15)	1.22×10⁻¹⁰
rs2423279	C/T	0.263	0.252	11,870/14,190	1.07 (1.03–1.12)	0.001	19,195/25,750	1.10 (1.06–1.14)	6.64×10⁻⁹

ᵃ Alleles (minor/major) as shown in Table 1 for East Asians.

ᵇ Minor allele frequency (MAF) in European-ancestry populations.

ᶜ Summary statistics were generated using inverse-variance weighted, fixed-effects meta-analysis.
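As an illustration of the inverse-variance weighted, fixed-effects pooling named in the table footnote, the sketch below combines the East Asian overall estimate for rs10774214 (Table 1) with the European estimate (Table 2). Standard errors are recovered from the published 95% CIs, which is an assumption of this example and not the authors' procedure; they worked from the underlying study-level data. Because the inputs are rounded to two decimals, the pooled OR and CI should land close to the tabulated values, while the P-value is sensitive to that rounding and is not expected to match the published one.

```python
import math

def log_or_and_se(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover log(OR) and its standard error from an OR and its 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return math.log(odds_ratio), se

def fixed_effects(estimates):
    """Inverse-variance weighted pooling of (log_or, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)

# rs10774214: East Asian overall (Table 1) and European (Table 2) estimates.
east_asian = log_or_and_se(1.17, 1.11, 1.23)
european = log_or_and_se(1.04, 1.00, 1.09)

beta, se = fixed_effects([east_asian, european])
pooled_or = math.exp(beta)
ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))

# Two-sided P from the normal approximation; rounding of the inputs
# means this will differ from the published P_meta.
p_two_sided = math.erfc(abs(beta / se) / math.sqrt(2))

print(f"pooled OR = {pooled_or:.2f} ({ci[0]:.2f}-{ci[1]:.2f}), P = {p_two_sided:.2e}")
```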

Stratification analyses showed that the associations of CRC risk with each of these three replicated SNPs were generally consistent in Chinese, Korean, and Japanese (Pheterogeneity > 0.05), although the association with rs2423279 was not statistically significant in the Japanese, perhaps due to a small sample size (Supplementary Table 5). Associations of these three SNPs with CRC risk were similar for men and women (Pheterogeneity > 0.05) (Supplementary Table 6). SNP rs10774214 is located just 15 kb upstream of CCND2, the gene encoding cyclin D2 (Figure 2), a member of the D-type cyclin family, which also includes cyclins D1 and D3. These cyclins play a critical role in cell cycle control (from G1 phase to S phase) through activation of cyclin-dependent kinases (CDK), primarily CDK4 and CDK6 [22]. CCND2 is closely related to CCND1, a well-established human oncogene [22, 23]. Although CCND2 has been less well studied than CCND1, several studies including The Cancer Genome Atlas (TCGA) have shown CCND2 to be overexpressed in a substantial proportion of human colorectal tumors [22-25]. Overexpression of this cyclin may be an independent predictor of survival in CRC patients [24]. Several other genes, including PARP11, FGF23, FGF6, C12orf5, and RAD51AP1, are also in close proximity to the SNP identified in our study, of which both C12orf5 (also known as TIGAR, TP53-induced glycolysis and apoptosis regulator) and RAD51AP1 were found to be overexpressed in CRC tissue included in TCGA [25]. SNP rs10774214 is in strong LD with several SNPs that are located in potential transcription sites as determined by the TRANSFAC database [26]. Additional research may be warranted regarding possible mechanisms by which this SNP is related to CRC risk.

Figure 2

Regional plots of association results and recombination rates for the three SNPs showing evidence of an association with CRC risk

Genotyped and imputed data from GWAS samples are plotted based on their chromosomal position in NCBI Human Genome Build 36.3. For each region, the SNP selected for Stage 2 replication is denoted with a diamond, and P-value from the combined analysis of Stages 1 and 2 data is provided. Data are shown for (a) rs10774214, (b) rs647161, and (c) rs2423279.

SNP rs647161 is located on chromosome 5q31.1, where a cluster of SNPs was associated with CRC risk (Figure 2). Of the genes in this region (including PITX1, CATSPER3, PCBD2, MIR4461, and H2AFY), PITX1 is the closest to rs647161 (approximately 129 kb upstream). The PITX1 (paired-like homeodomain 1) gene has been described as a tumor suppressor gene and may be involved in the tumorigenesis of multiple human cancers [27-31], including CRC [27, 32]. PITX1 has been reported to suppress tumorigenicity by down-regulating the RAS pathway, which is frequently altered in colorectal tumors [27]. Inhibition of PITX1 induces the RAS pathway and tumorigenicity, and restoring PITX1 in colon cancer cells inhibits tumorigenicity [27]. It also has been reported that PITX1 may activate P53 [33] and regulate telomerase activity [34]. Consistent with the role of a tumor suppressor, this gene has been found to be down-regulated in human cancer tissue samples and cell lines [27-30, 32]. CRC tissue expressing wild-type KRAS showed significantly lower expression levels of PITX1 than tissue with mutant KRAS [32]. Most recently, low PITX1 expression was found to be associated with poor survival in CRC patients [35]. In addition, rs6596201, which is in moderate LD with rs647161 (r² = 0.25), is an eQTL (P = 2.42 × 10⁻²⁸) for the PITX1 gene [36]. Several other genes at this locus, including C5orf24, H2AFY, and NEUROG1, were also found to be highly expressed in colorectal tumors included in the TCGA (P < 0.001) [25]. Additional studies are warranted to explore any possible role of these genes in the etiology of CRC.

SNP rs2423279 is located on chromosome 20p12.3, close to the HAO1 and PLCB1 genes (Figure 2). HAO1 encodes hydroxyacid oxidase, which has 2-hydroxyacid oxidase activity. PLCB1 encodes phospholipase C beta 1, which plays an important role in the intracellular transduction of many extracellular signals. Overexpression of the PLCB1 gene has been observed in CRC tissue [25]. Possible mechanisms by which these genes are involved in CRC carcinogenesis are unknown. SNP rs2423279 is 1,408,069 bp downstream of rs961253, a SNP previously identified in a European GWAS to be associated with CRC risk [10]. However, these two SNPs are not correlated in East Asians (r² = 0) or in Europeans (r² = 0). Adjustment for rs961253 did not change the results for rs2423279 (data not shown).

To our knowledge, this is the largest GWAS performed for CRC in East Asians, a population that differs from the European-ancestry population in CRC risk and certain aspects of genetic architecture. Our study, along with data from a large study conducted in a European-ancestry population, provides convincing evidence of association with CRC risk for three novel independent susceptibility loci at 5q31.1, 12p13.32, and 20p12.3. Results from this study provide new insights into the genetics and biology of CRC.

URLs

CGEMS, http://cgems.cancer.gov/; dbGaP, http://www.ncbi.nlm.nih.gov/gap; EIGENSTRAT, genepath.med.harvard.edu/~reich/EIGENSTRAT.htm; eqtl.uchicago.edu, http://eqtl.uchicago.edu/Home.html; GTEx eQTL Browser, http://www.ncbi.nlm.nih.gov/gtex/GTEX2/gtex.cgi; Haploview, http://www.broad.mit.edu/mpg/haploview/; HapMap project, http://hapmap.ncbi.nlm.nih.gov/; IntOGen, http://www.intogen.org/home; LocusZoom, http://csg.sph.umich.edu/locuszoom/; MACH 1.0, http://www.sph.umich.edu/csg/abecasis/MACH/; mach2dat, http://www.sph.umich.edu/csg/abecasis/MACH/; METAL, http://www.sph.umich.edu/csg/abecasis/Metal/; PLINK version 1.07, http://pngu.mgh.harvard.edu/~purcell/plink/; R version 2.13.0, http://www.r-project.org/; SAS version 9.2, http://www.sas.com/; SNAP, http://www.broadinstitute.org/mpg/snap/; TRANSFAC, http://www.gene-regulation.com/pub/databases.html; UCSC Genome Browser, http://genome.ucsc.edu/.

ONLINE METHODS

Study populations

After quality control (QC), 7,456 cases and 11,671 controls from ten studies were included in this consortium (Supplementary Table 2). Detailed descriptions of participating studies and demographic characteristics of study participants are provided in Supplementary Note. Briefly, the consortium included 10,730 Chinese participants, 5,544 Korean participants, and 2,853 Japanese participants. Chinese participants were from five studies: Shanghai Study 1 (Shanghai-1, n = 3,102), Shanghai Study 2 (Shanghai-2, n = 485), Guangzhou Study 1 (Guangzhou-1, n = 1,613), Guangzhou Study 2 (Guangzhou-2, n = 2,892), and Guangzhou Study 3 (Guangzhou-3, n = 2,638). Korean participants were from three studies: the Korean Cancer Prevention Study-II (KCPS-II, n = 1,301), the Seoul Study (n = 1,522), and the Korea-National Cancer Center (Korea-NCC) Study (n = 2,721). Japanese participants were from two studies: Aichi Study 1 (Aichi-1, n = 1,346) and Aichi Study 2 (Aichi-2, n = 1,507). We also evaluated associations for the top four SNPs using data from 11,870 CRC cases and 14,190 controls of European ancestry included in the Genetics and Epidemiology of Colorectal Cancer Consortium and the Colon Cancer Family Registry (GECCO and CCFR), which include 14 studies from the USA, Europe, Canada and Australia [4, 20, 21]. Approval was granted from the relevant institutional review boards at all study sites, and all included participants gave informed consent.

Genotyping and QC procedures

For detailed descriptions of genotyping and QC procedures, and design for plates and QC samples, see the Supplementary Note. Briefly, in Stage 1, 481 cases and 2,632 controls from Shanghai-1 were genotyped using the Affymetrix Genome-Wide Human SNP Array 6.0 as described previously [14]. The average concordance percentage of QC samples was 99.7% with a median value of 100% in Shanghai-1 [14, 37, 38]. Stage 1 genotyping for 296 cases and 257 controls in Shanghai-2 was performed using Illumina HumanOmniExpress BeadChips. The same method was used to genotype cases from the Guangzhou-1 (n= 694) and Aichi-1 (n = 497) studies in Stage 1. The positive QC samples in these studies had an average concordance percentage of 99.41% and a median value of 99.97%. Cases and controls in KCPS-II were genotyped using the Affymetrix Genome-Wide Human SNP Array 5.0 [16]. Controls for the Guangzhou-1 and Aichi-1 studies were genotyped previously using the Illumina Human610-Quad [15] and Illumina Infinium HumanHap610 BeadChip [17] platforms, respectively. Details of QC procedures for these samples have been described previously [15-17]. Excluded from the analysis were samples that were genetically identical or duplicated, had a genotype-determined sex inconsistent with self-reported data, had unclear population structure, had close relatives with a PI-HAT estimate greater than 0.25 or had a call rate < 95%. Within each study, SNPs were excluded if: 1) MAF < 5%, 2) call rate < 95%, 3) genotyping concordance percentage < 95% in QC samples, 4) P-value for Hardy-Weinberg equilibrium < 1.0 × 10−5 in controls, or 5) SNPs not in the 22 autosomes. The final numbers of cases, controls, and SNPs remaining for analysis in each participating study are presented in Supplementary Table 1. Genotyping for Stage 2 was completed using the iPLEX Sequenom MassARRAY platform as described previously [14, 39]. 
With the exception of some samples from Guangzhou study, which were genotyped at Fudan University (Shanghai, China), all other samples were genotyped at the Vanderbilt Molecular Epidemiology Laboratory. The average concordance percentage of the genotyping data for positive QC samples was > 99% with a median value of 100% for each of the five studies. SNPs were excluded from the analysis if: 1) call rate < 95%, 2) genotyping concordance percentage < 95% in QC samples, 3) unclear genotyping cluster, or 4) P-value for Hardy-Weinberg equilibrium < 7.8 × 10−4. The numbers of SNPs remaining for analysis in each participating study in Stage 2 are presented in the Supplementary Note. Genotyping for samples included in the GECCO and CCFR GWAS was conducted using Illumina BeadChip arrays, with the exception of the Ontario Familial Colorectal Cancer Registry study, for which Affymetrix arrays were used [4, 20, 21]. Details of the QC procedures for these samples are presented in the Supplementary Note.
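The Stage 1 SNP exclusion criteria listed above can be sketched as a simple filter. The text does not say which Hardy-Weinberg test was used, so the one-degree-of-freedom chi-square goodness-of-fit test below is an assumption (an exact test is a common alternative), and the function and parameter names are hypothetical.

```python
import math

def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    Takes the three genotype counts observed in controls (total must be > 0).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))

def snp_passes_stage1_qc(maf, call_rate, concordance, hwe_p, chromosome):
    """Apply the five Stage 1 SNP criteria; returns True if the SNP is kept."""
    return (
        maf >= 0.05                 # excluded if MAF < 5%
        and call_rate >= 0.95       # excluded if call rate < 95%
        and concordance >= 0.95     # excluded if QC concordance < 95%
        and hwe_p >= 1.0e-5         # excluded if HWE P < 1.0e-5 in controls
        and 1 <= chromosome <= 22   # autosomes only
    )

# Hypothetical control genotype counts for one SNP.
p_hwe = hwe_chi2_p(250, 500, 250)
print(snp_passes_stage1_qc(0.50, 0.99, 0.997, p_hwe, 5))
```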

SNP selection for replication

SNPs were selected for Stage 2 replication based on the following criteria: 1) data available in each of the five Stage 1 studies; 2) MAF > 5% in each Stage 1 study; 3) no heterogeneity across the five studies included in Stage 1 (Pheterogeneity> 0.05 and I2< 25%); 4) not in LD (r2< 0.2) with any known risk variants reported from previous GWAS; 5) not in LD (r2< 0.2) with each other; 6) high imputation quality in each of the five studies (RSQ > 0.5), and 7) P< 0.01 in combined analysis of all Stage 1 studies.
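The seven criteria can be read as a threshold filter followed by pruning of mutually correlated SNPs. The sketch below shows one way to express that. The record fields, the `pairwise_r2` lookup, and the greedy order (keeping the SNP with the smallest Stage 1 P-value first) are assumptions made for illustration; the text does not describe how criterion 5 was applied. The SNP identifiers in the demo are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage1Snp:
    rsid: str
    in_all_studies: bool   # criterion 1: data in each of the five studies
    min_maf: float         # criterion 2: lowest MAF across the five studies
    p_het: float           # criterion 3: P for heterogeneity
    i2_percent: float      # criterion 3: I-squared, in percent
    max_r2_known: float    # criterion 4: highest r2 with any known risk variant
    min_rsq: float         # criterion 6: lowest imputation RSQ across studies
    p_meta: float          # criterion 7: Stage 1 combined P-value

def meets_thresholds(s: Stage1Snp) -> bool:
    return (
        s.in_all_studies
        and s.min_maf > 0.05
        and s.p_het > 0.05
        and s.i2_percent < 25.0
        and s.max_r2_known < 0.2
        and s.min_rsq > 0.5
        and s.p_meta < 0.01
    )

def select_for_replication(
    snps: List[Stage1Snp], pairwise_r2: Callable[[str, str], float]
) -> List[Stage1Snp]:
    """Threshold filter, then greedy pruning so selected SNPs have r2 < 0.2."""
    candidates = sorted(filter(meets_thresholds, snps), key=lambda s: s.p_meta)
    selected: List[Stage1Snp] = []
    for s in candidates:
        # Criterion 5: keep only SNPs not in LD with those already selected.
        if all(pairwise_r2(s.rsid, t.rsid) < 0.2 for t in selected):
            selected.append(s)
    return selected

# Hypothetical demo data.
demo = [
    Stage1Snp("snpA", True, 0.30, 0.60, 0.0, 0.01, 0.95, 1e-5),
    Stage1Snp("snpB", True, 0.28, 0.55, 5.0, 0.02, 0.90, 1e-4),
    Stage1Snp("snpC", True, 0.12, 0.40, 10.0, 0.00, 0.80, 5e-3),
    Stage1Snp("snpD", True, 0.20, 0.70, 0.0, 0.00, 0.90, 0.03),
]

def demo_r2(a: str, b: str) -> float:
    return 0.9 if {a, b} == {"snpA", "snpB"} else 0.0

selected = select_for_replication(demo, demo_r2)
print([s.rsid for s in selected])
```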

Evaluation of population structure

We evaluated population structure in each of the five participating studies included in Stage 1 by using principal components analysis (PCA). Genotyping data for uncorrelated, genome-wide SNPs were pooled with data from HapMap to generate the first ten principal components using EIGENSTRAT software [40] (see URLs). The first two principal components for each sample were plotted using R (see URLs). We identified and excluded one participant of KCPS-II who was more than 6 σ away from the means of PC1 and PC2 (Supplementary Fig. 1). The remaining 7,847 samples showed clear East Asian origin, and these samples were included in the final genome-wide association analysis. Cases and controls in each of the five studies were in the same cluster as HapMap Asian samples. The estimated inflation factor λ ranged from 1.02 to 1.04 in these studies after adjusting for age, sex, and the first ten principal components with a λ of 1.01 for combined Stage 1 data (Supplementary Table 1 and Supplementary Fig. 2).
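For readers unfamiliar with the inflation factor λ, a common definition is the median of the observed one-degree-of-freedom chi-square statistics divided by the median expected under the null (about 0.455). The sketch below follows that definition using only P-values as input; it is an illustration under that assumption, not the authors' code, and the function name is hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_inflation(p_values):
    """Estimate lambda from two-sided association P-values in (0, 1]."""
    nd = NormalDist()
    # A two-sided P-value p corresponds to z = inv_cdf(p / 2); z squared is
    # the matching 1-df chi-square statistic.
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN

# Evenly spaced P-values mimic a null distribution, so lambda should be near 1.
null_like = [(i - 0.5) / 1000 for i in range(1, 1001)]
lam = genomic_inflation(null_like)
print(f"lambda = {lam:.3f}")
```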

Imputation

We used the program MACH 1.0 [18] (see URLs) to impute genotypes for autosomal SNPs that were present in HapMap Phase II release 22, separately for each of the five studies included in Stage 1. Genotype data from the 90 Asian subjects from HapMap were used as the reference. For Guangzhou-1 and Aichi-1, cases and controls were genotyped using different platforms. To improve imputation quality [41], we identified SNPs shared between cases and controls (250,612 SNPs in Guangzhou-1 and 232,426 SNPs in Aichi-1) and used them to impute genotyping data. A total of 1,636,380 genotyped SNPs or imputed SNPs with high imputation quality (RSQ > 0.50) in all five studies were tested for association with CRC. To directly evaluate the imputation quality for the top four SNPs identified in our study, we genotyped them in approximately 2,500 samples included in Stage 1. The agreement of genotype calls derived from direct genotyping and imputation was very high, with mean values of 98.05%, 95.61%, 99.84%, and 97.90% for rs647161, rs10774214, rs2423279, and rs1665650, respectively (Supplementary Table 7).
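One way to compute agreement between directly genotyped and imputed calls is sketched below. It assumes the imputed value for each sample is an allele dosage (the expected number of copies of the effect allele) and that the imputed call is the dosage rounded to the nearest whole genotype; the text does not state how imputed calls were derived, so both points, and the function names, are assumptions.

```python
def expected_dosage(p_het, p_hom_effect):
    """Expected copies of the effect allele from genotype probabilities."""
    return p_het + 2.0 * p_hom_effect

def concordance_percent(direct_calls, imputed_dosages):
    """Percent of samples whose rounded imputed dosage equals the direct call.

    direct_calls holds 0, 1, 2, or None for a failed genotype call.
    Raises ZeroDivisionError if no sample has a direct call.
    """
    pairs = [(g, d) for g, d in zip(direct_calls, imputed_dosages) if g is not None]
    agree = sum(1 for g, d in pairs if g == min(2, max(0, round(d))))
    return 100.0 * agree / len(pairs)

# Hypothetical data for five samples; the last direct call is missing.
direct = [0, 1, 2, 1, None]
dosage = [0.1, 0.9, 1.8, 0.2, 1.0]
agreement = concordance_percent(direct, dosage)
print(f"agreement = {agreement:.1f}%")
```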

Statistical analyses

Dosage data for genotyped and imputed SNPs for participants in each Stage 1 study were analyzed using the program mach2dat [18] (see URLs). We coded 0, 1, or 2 copies of the effect allele as dosage for genotyped SNPs, and for imputed SNPs, we used the expected number of copies of the effect allele as the dosage score. This approach has been shown to give unbiased estimates in meta-analyses [42]. Associations between SNPs and CRC risk were assessed using odds ratios (ORs) and 95% confidence intervals (CIs) derived from logistic regression models. ORs were estimated based on the log-additive model and adjusted for age, sex, and the first ten principal components. PLINK version 1.07 (see URLs) also was used to analyze genotype data [43] and yielded results virtually identical to those derived from dosage data using mach2dat [18]. Meta-analyses were performed using the inverse-variance method, assuming a fixed-effects model, and calculations were implemented in the METAL package [19] (see URLs). Similar to Stage 1, we used logistic regression models to derive ORs and 95% CIs for the 64 selected SNPs in Stage 2, assuming a log-additive model with adjustment for age and sex. We performed joint analyses to generate summary results for combined samples from all studies with additional adjustment for study site. We also conducted stratification analysis for the top four SNPs by population ethnicity (Chinese, Korean, and Japanese) and by sex. We used Cochran’s Q statistic to test for heterogeneity [44] and the I² statistic to quantify heterogeneity [45] across studies as described elsewhere in detail [46]. Analyses for Stage 2, as well as for combined Stages 1 and 2 data, were conducted using SAS, version 9.2 (see URLs), with the use of two-tailed tests. A P-value below 5 × 10⁻⁸ in the combined analysis was considered statistically significant. We used Haploview version 4.2 [47] (see URLs) to generate a genome-wide Manhattan plot for results from the Stage 1 meta-analysis. 
Forest plots and quantile-quantile (Q-Q) plots were drawn using R. We drew regional association plots using the website-based tool LocusZoom, version 1.1 [48] (see URLs). LD plots were generated using Haploview [47] and UCSC Genome Browser (see URLs).

48 in total

1. IntOGen: integration and data mining of multidimensional oncogenomic data.

Authors: Gunes Gundem; Christian Perez-Llamas; Alba Jene-Sanz; Anna Kedzierska; Abul Islam; Jordi Deu-Pons; Simon J Furney; Nuria Lopez-Bigas
Journal: Nat Methods Date: 2010-02 Impact factor: 28.547

Review 2. Genetic susceptibility to cancer: the role of polymorphisms in candidate genes.

Authors: Linda M Dong; John D Potter; Emily White; Cornelia M Ulrich; Lon R Cardon; Ulrike Peters
Journal: JAMA Date: 2008-05-28 Impact factor: 56.272

3. A genome-wide association study identifies colorectal cancer susceptibility loci on chromosomes 10p14 and 8q23.3.

Authors: Ian P M Tomlinson; Emily Webb; Luis Carvajal-Carmona; Peter Broderick; Kimberley Howarth; Alan M Pittman; Sarah Spain; Steven Lubbe; Axel Walther; Kate Sullivan; Emma Jaeger; Sarah Fielding; Andrew Rowan; Jayaram Vijayakrishnan; Enric Domingo; Ian Chandler; Zoe Kemp; Mobshra Qureshi; Susan M Farrington; Albert Tenesa; James G D Prendergast; Rebecca A Barnetson; Steven Penegar; Ella Barclay; Wendy Wood; Lynn Martin; Maggie Gorman; Huw Thomas; Julian Peto; D Timothy Bishop; Richard Gray; Eamonn R Maher; Anneke Lucassen; David Kerr; D Gareth R Evans; Clemens Schafmayer; Stephan Buch; Henry Völzke; Jochen Hampe; Stefan Schreiber; Ulrich John; Thibaud Koessler; Paul Pharoah; Tom van Wezel; Hans Morreau; Juul T Wijnen; John L Hopper; Melissa C Southey; Graham G Giles; Gianluca Severi; Sergi Castellví-Bel; Clara Ruiz-Ponte; Angel Carracedo; Antoni Castells; Asta Försti; Kari Hemminki; Pavel Vodicka; Alessio Naccarati; Lara Lipton; Judy W C Ho; K K Cheng; Pak C Sham; J Luk; Jose A G Agúndez; Jose M Ladero; Miguel de la Hoya; Trinidad Caldés; Iina Niittymäki; Sari Tuupanen; Auli Karhu; Lauri Aaltonen; Jean-Baptiste Cazier; Harry Campbell; Malcolm G Dunlop; Richard S Houlston
Journal: Nat Genet Date: 2008-03-30 Impact factor: 38.330

4. Genetic and clinical predictors for breast cancer risk assessment and stratification among Chinese women.

Authors: Wei Zheng; Wanqing Wen; Yu-Tang Gao; Yu Shyr; Ying Zheng; Jirong Long; Guoliang Li; Chun Li; Kai Gu; Qiuyin Cai; Xiao-Ou Shu; Wei Lu
Journal: J Natl Cancer Inst Date: 2010-05-18 Impact factor: 13.506

5. A genome-wide association study of nasopharyngeal carcinoma identifies three new susceptibility loci.

Authors: Jin-Xin Bei; Yi Li; Wei-Hua Jia; Bing-Jian Feng; Gangqiao Zhou; Li-Zhen Chen; Qi-Sheng Feng; Hui-Qi Low; Hongxing Zhang; Fuchu He; E Shyong Tai; Tiebang Kang; Edison T Liu; Jianjun Liu; Yi-Xin Zeng
Journal: Nat Genet Date: 2010-05-30 Impact factor: 38.330

6. Genome-wide association study identifies a new breast cancer susceptibility locus at 6q25.1.

Authors: Wei Zheng; Jirong Long; Yu-Tang Gao; Chun Li; Ying Zheng; Yong-Bin Xiang; Wanqing Wen; Shawn Levy; Sandra L Deming; Jonathan L Haines; Kai Gu; Alecia Malin Fair; Qiuyin Cai; Wei Lu; Xiao-Ou Shu
Journal: Nat Genet Date: 2009-02-15 Impact factor: 38.330

7. Genetics and beyond--the transcriptome of human monocytes and disease susceptibility.

Authors: Tanja Zeller; Philipp Wild; Silke Szymczak; Maxime Rotival; Arne Schillert; Raphaele Castagne; Seraya Maouche; Marine Germain; Karl Lackner; Heidi Rossmann; Medea Eleftheriadis; Christoph R Sinning; Renate B Schnabel; Edith Lubos; Detlev Mennerich; Werner Rust; Claire Perret; Carole Proust; Viviane Nicaud; Joseph Loscalzo; Norbert Hübner; David Tregouet; Thomas Münzel; Andreas Ziegler; Laurence Tiret; Stefan Blankenberg; François Cambien
Journal: PLoS One Date: 2010-05-18 Impact factor: 3.240

8. LocusZoom: regional visualization of genome-wide association scan results.

Authors: Randall J Pruim; Ryan P Welch; Serena Sanna; Tanya M Teslovich; Peter S Chines; Terry P Gliedt; Michael Boehnke; Gonçalo R Abecasis; Cristen J Willer
Journal: Bioinformatics Date: 2010-07-15 Impact factor: 6.937

9. Expression of cyclin D2 is an independent predictor of the development of hepatic metastasis in colorectal cancer.

Authors: R Sarkar; I A Hunter; R Rajaganeshan; S L Perry; P Guillou; D G Jayne
Journal: Colorectal Dis Date: 2009-03-11 Impact factor: 3.788

10. METAL: fast and efficient meta-analysis of genomewide association scans.

Authors: Cristen J Willer; Yun Li; Gonçalo R Abecasis
Journal: Bioinformatics Date: 2010-07-08 Impact factor: 6.937

107 in total

1. Common genetic variation and survival after colorectal cancer diagnosis: a genome-wide analysis.

Authors: Amanda I Phipps; Michael N Passarelli; Andrew T Chan; Tabitha A Harrison; Jihyoun Jeon; Carolyn M Hutter; Sonja I Berndt; Hermann Brenner; Bette J Caan; Peter T Campbell; Jenny Chang-Claude; Stephen J Chanock; Jeremy P Cheadle; Keith R Curtis; David Duggan; David Fisher; Charles S Fuchs; Manish Gala; Edward L Giovannucci; Richard B Hayes; Michael Hoffmeister; Li Hsu; Eric J Jacobs; Lina Jansen; Richard Kaplan; Elisabeth J Kap; Timothy S Maughan; John D Potter; Robert E Schoen; Daniela Seminara; Martha L Slattery; Hannah West; Emily White; Ulrike Peters; Polly A Newcomb
Journal: Carcinogenesis Date: 2015-11-19 Impact factor: 4.944

2. Genome-Wide Association Studies of Cancer in Diverse Populations. [Review]

Authors: Sungshim L Park; Iona Cheng; Christopher A Haiman
Journal: Cancer Epidemiol Biomarkers Prev Date: 2017-06-21 Impact factor: 4.254

3. Fine-mapping of genome-wide association study-identified risk loci for colorectal cancer in African Americans.

Authors: Hansong Wang; Christopher A Haiman; Terrilea Burnett; Barbara K Fortini; Laurence N Kolonel; Brian E Henderson; Lisa B Signorello; William J Blot; Temitope O Keku; Sonja I Berndt; Polly A Newcomb; Mala Pande; Christopher I Amos; Dee W West; Graham Casey; Robert S Sandler; Robert Haile; Daniel O Stram; Loïc Le Marchand
Journal: Hum Mol Genet Date: 2013-07-12 Impact factor: 6.150

4. Identification of candidate susceptibility genes for colorectal cancer through eQTL analysis.

Authors: Adria Closa; David Cordero; Rebeca Sanz-Pamplona; Xavier Solé; Marta Crous-Bou; Laia Paré-Brunet; Antoni Berenguer; Elisabet Guino; Adriana Lopez-Doriga; Jordi Guardiola; Sebastiano Biondo; Ramon Salazar; Victor Moreno
Journal: Carcinogenesis Date: 2014-04-23 Impact factor: 4.944

5. Genetic variations in colorectal cancer risk and clinical outcome. [Review]

Authors: Kejin Zhang; Jesse Civan; Sushmita Mukherjee; Fenil Patel; Hushan Yang
Journal: World J Gastroenterol Date: 2014-04-21 Impact factor: 5.742

6. Estimating the heritability of colorectal cancer.

Authors: Shuo Jiao; Ulrike Peters; Sonja Berndt; Hermann Brenner; Katja Butterbach; Bette J Caan; Christopher S Carlson; Andrew T Chan; Jenny Chang-Claude; Stephen Chanock; Keith R Curtis; David Duggan; Jian Gong; Tabitha A Harrison; Richard B Hayes; Brian E Henderson; Michael Hoffmeister; Laurence N Kolonel; Loic Le Marchand; John D Potter; Anja Rudolph; Robert E Schoen; Daniela Seminara; Martha L Slattery; Emily White; Li Hsu
Journal: Hum Mol Genet Date: 2014-02-21 Impact factor: 6.150

7. New genes emerging for colorectal cancer predisposition. [Review]

Authors: Clara Esteban-Jurado; Pilar Garre; Maria Vila; Juan José Lozano; Anna Pristoupilova; Sergi Beltrán; Anna Abulí; Jenifer Muñoz; Francesc Balaguer; Teresa Ocaña; Antoni Castells; Josep M Piqué; Angel Carracedo; Clara Ruiz-Ponte; Xavier Bessa; Montserrat Andreu; Luis Bujanda; Trinidad Caldés; Sergi Castellví-Bel
Journal: World J Gastroenterol Date: 2014-02-28 Impact factor: 5.742

8. Evaluation of genetic variants in association with colorectal cancer risk and survival in Asians.

Authors: Nan Wang; Yingchang Lu; Nikhil K Khankari; Jirong Long; Hong-Lan Li; Jing Gao; Yu-Tang Gao; Yong-Bing Xiang; Xiao-Ou Shu; Wei Zheng
Journal: Int J Cancer Date: 2017-06-21 Impact factor: 7.396

9. Collaborative cancer epidemiology in the 21st century: the model of cancer consortia. [Review]

Authors: Michael R Burgio; John P A Ioannidis; Brett M Kaminski; Eric Derycke; Scott Rogers; Muin J Khoury; Daniela Seminara
Journal: Cancer Epidemiol Biomarkers Prev Date: 2013-09-17 Impact factor: 4.254

10. Genome-wide association study of colorectal cancer identifies six new susceptibility loci.

Authors: Fredrick R Schumacher; Stephanie L Schmit; Shuo Jiao; Christopher K Edlund; Hansong Wang; Ben Zhang; Li Hsu; Shu-Chen Huang; Christopher P Fischer; John F Harju; Gregory E Idos; Flavio Lejbkowicz; Frank J Manion; Kevin McDonnell; Caroline E McNeil; Marilena Melas; Hedy S Rennert; Wei Shi; Duncan C Thomas; David J Van Den Berg; Carolyn M Hutter; Aaron K Aragaki; Katja Butterbach; Bette J Caan; Christopher S Carlson; Stephen J Chanock; Keith R Curtis; Charles S Fuchs; Manish Gala; Edward L Giovannucci; Stephanie M Gogarten; Richard B Hayes; Brian Henderson; David J Hunter; Rebecca D Jackson; Laurence N Kolonel; Charles Kooperberg; Sébastien Küry; Andrea LaCroix; Cathy C Laurie; Cecelia A Laurie; Mathieu Lemire; David Levine; Jing Ma; Karen W Makar; Conghui Qu; Darin Taverna; Cornelia M Ulrich; Kana Wu; Suminori Kono; Dee W West; Sonja I Berndt; Stéphane Bezieau; Hermann Brenner; Peter T Campbell; Andrew T Chan; Jenny Chang-Claude; Gerhard A Coetzee; David V Conti; David Duggan; Jane C Figueiredo; Barbara K Fortini; Steven J Gallinger; W James Gauderman; Graham Giles; Roger Green; Robert Haile; Tabitha A Harrison; Michael Hoffmeister; John L Hopper; Thomas J Hudson; Eric Jacobs; Motoki Iwasaki; Sun Ha Jee; Mark Jenkins; Wei-Hua Jia; Amit Joshi; Li Li; Noralene M Lindor; Keitaro Matsuo; Victor Moreno; Bhramar Mukherjee; Polly A Newcomb; John D Potter; Leon Raskin; Gad Rennert; Stephanie Rosse; Gianluca Severi; Robert E Schoen; Daniela Seminara; Xiao-Ou Shu; Martha L Slattery; Shoichiro Tsugane; Emily White; Yong-Bing Xiang; Brent W Zanke; Wei Zheng; Loic Le Marchand; Graham Casey; Stephen B Gruber; Ulrike Peters
Journal: Nat Commun Date: 2015-07-07 Impact factor: 14.919