Large-scale genetic study in East Asians identifies six new loci associated with colorectal cancer risk.

Ben Zhang¹, Wei-Hua Jia², Koichi Matsuda³, Sun-Seog Kweon⁴, Keitaro Matsuo⁵, Yong-Bing Xiang⁶, Aesun Shin⁷, Sun Ha Jee⁸, Dong-Hyun Kim⁹, Qiuyin Cai¹, Jirong Long¹, Jiajun Shi¹, Wanqing Wen¹, Gong Yang¹, Yanfeng Zhang¹, Chun Li¹⁰, Bingshan Li¹¹, Yan Guo¹², Zefang Ren¹³, Bu-Tian Ji¹⁴, Zhi-Zhong Pan², Atsushi Takahashi¹⁵, Min-Ho Shin¹⁶, Fumihiko Matsuda¹⁷, Yu-Tang Gao⁶, Jae Hwan Oh¹⁸, Soriul Kim⁸, Yoon-Ok Ahn¹⁹, Andrew T Chan²⁰, Jenny Chang-Claude²¹, Martha L Slattery²², Stephen B Gruber²³, Fredrick R Schumacher²³, Stephanie L Stenzel²³, Graham Casey²³, Hyeong-Rok Kim²⁴, Jin-Young Jeong⁹, Ji Won Park²⁵, Hong-Lan Li⁶, Satoyo Hosono⁵, Sang-Hee Cho²⁶, Michiaki Kubo¹⁵, Xiao-Ou Shu¹, Yi-Xin Zeng², Wei Zheng¹.

Abstract

Known genetic loci explain only a small proportion of the familial relative risk of colorectal cancer (CRC). We conducted a genome-wide association study of CRC in East Asians with 14,963 cases and 31,945 controls and identified 6 new loci associated with CRC risk (P = 3.42 × 10⁻⁸ to 9.22 × 10⁻²¹) at 10q22.3, 10q25.2, 11q12.2, 12p13.31, 17p13.3 and 19q13.2. Two of these loci map to genes (TCF7L2 and TGFB1) with established roles in colorectal tumorigenesis. Four other loci are located in or near genes involved in transcriptional regulation (ZMIZ1), genome maintenance (FEN1), fatty acid metabolism (FADS1 and FADS2), cancer cell motility and metastasis (CD9), and cell growth and differentiation (NXN). We also found suggestive evidence for three additional loci associated with CRC risk near genome-wide significance at 8q24.11, 10q21.1 and 10q24.2. Furthermore, we replicated 22 previously reported CRC-associated loci. Our study provides insights into the genetic basis of CRC and suggests the involvement of new biological pathways.

Year: 2014 PMID: 24836286 PMCID: PMC4068797 DOI: 10.1038/ng.2985

Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330

Colorectal cancer (CRC) is a leading cause of cancer morbidity and mortality worldwide [1]. It is well established that genetic factors play a significant role in the etiology of CRC [2, 3]. Deleterious germline mutations in known susceptibility genes, notably APC (adenomatous polyposis coli), MLH1, MSH2, MSH6 and PMS2, confer high risk of CRC in hereditary cancer syndromes [3-6]. Most sporadic CRC cases, however, do not carry these high-penetrance mutations [3, 4]. Since 2007, genome-wide association studies (GWAS) and subsequent fine-mapping analyses conducted in European descendants have identified 21 low-penetrance susceptibility loci associated with CRC risk [7-17]. Together, these common loci explain less than 10% of the familial relative risk of CRC in European populations [13, 14]. In a GWAS of 7,456 CRC cases and 11,671 controls conducted as part of the Asia Colorectal Cancer Consortium, we identified three new loci at 5q31.1 (near PITX1), 12p13.32 (near CCND2) and 20p12.3 (near HAO1) associated with CRC risk [18]. In addition, we discovered a new risk variant in the SMAD7 gene associated with CRC among East Asians [19]. Over the past two years, we have doubled the sample size in the Asia Colorectal Cancer Consortium and conducted a four-stage GWAS including 14,963 CRC cases and 31,945 controls to identify additional susceptibility loci for CRC.

RESULTS

We performed a fixed-effects meta-analysis to evaluate approximately 2.4 million genotyped or imputed SNPs in 22 autosomes from five GWAS (stage 1) conducted in China, Japan and South Korea, totaling 2,098 CRC cases and 6,172 cancer-free controls (Supplementary Tables 1 and 2). There was little evidence of population stratification in these studies (Supplementary Figs. 1 and 2), with a genomic inflation factor λ < 1.04 in each of the five studies and in the meta-analysis (λ₁₀₀₀ = 1.01). We selected 8,539 SNPs showing evidence of association with CRC risk (P < 0.05) according to pre-specified criteria (ONLINE METHODS). We also included the 31 risk variants identified by previous GWAS [7-20], resulting in a total of 8,569 SNPs. Of them, 7,113 SNPs were successfully designed using Illumina Infinium assays as part of a large genotyping effort for multiple projects. Using this customized array, we genotyped an independent set of 3,632 CRC cases and 6,404 controls recruited in three studies (stage 2) conducted in China. After quality control exclusions, 6,899 SNPs remained for the analysis in 3,519 cases and 6,275 controls. We evaluated associations between CRC risk and these SNPs in each study separately and then performed a fixed-effects meta-analysis to obtain the summary estimates. Again, we observed little evidence of population stratification either in the three studies individually (λ < 1.05) or combined (λ = 1.05, λ₁₀₀₀ = 1.01) (Supplementary Fig. 3). In a meta-analysis of data from stages 1 and 2, we identified 559 SNPs showing evidence of association at P < 0.005. We then evaluated these SNPs using data from a large Japanese CRC GWAS (stage 3) with 2,814 CRC cases and 11,358 controls [20]. Thirty SNPs in 25 new loci were associated with CRC risk at P < 0.0001 in the meta-analysis of data from stages 1 to 3 and at P < 0.01 in the meta-analysis of stages 2 and 3. Of them, 29 were successfully genotyped in an independent sample of 6,532 CRC cases and 8,140 controls from five additional studies (stage 4) conducted in China, South Korea and Japan.
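The λ₁₀₀₀ values quoted above rescale the genomic inflation factor to a study of 1,000 cases and 1,000 controls so that studies of different sizes can be compared. The sketch below uses the commonly applied rescaling formula; this section does not state which formula the authors used, so the formula and the function name `lambda_1000` are assumptions made for illustration.

```python
def lambda_1000(lam: float, n_cases: int, n_controls: int) -> float:
    """Rescale a genomic inflation factor to 1,000 cases and 1,000 controls.

    Assumed convention (not stated in the text):
    lambda_1000 = 1 + (lambda - 1) * (1/n_cases + 1/n_controls) / (1/1000 + 1/1000)
    """
    scale = (1 / n_cases + 1 / n_controls) / (1 / 1000 + 1 / 1000)
    return 1 + (lam - 1) * scale


# Stage 2 after quality control: lambda = 1.05 in 3,519 cases and 6,275 controls.
stage2 = lambda_1000(1.05, 3519, 6275)
print(round(stage2, 2))  # 1.01
```

Under this assumed convention, the stage 2 inputs reproduce the reported λ₁₀₀₀ of 1.01.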

Newly identified risk loci for CRC

In the meta-analysis of all data for the 29 SNPs from stages 1 to 4 with 14,963 CRC cases and 31,945 controls, signals from ten SNPs, representing six new loci, showed convincing evidence for an association with CRC risk at the genome-wide significance level (P < 5 × 10⁻⁸), including: rs704017 at 10q22.3; rs11196172 at 10q25.2; rs174537, rs4246215, rs174550 and rs1535 at 11q12.2; rs10849432 at 12p13.31; rs12603526 at 17p13.3; and rs1800469 and rs2241714 at 19q13.2 (Table 1, Supplementary Tables 3 and 4, and Supplementary Fig. 4). Associations of CRC risk with the top SNPs in each of the six loci were consistent across almost all studies with no evidence of heterogeneity (Fig. 1). With the exception of rs10849432, which is intergenic at 12p13.31, the remaining nine newly identified risk variants are located in the exonic, promoter, three prime untranslated region (3′-UTR) or intronic regions of known genes (Table 1). The linkage disequilibrium (LD) blocks (r² > 0.5) tagged by rs704017 (10q22.3), rs174537 (11q12.2) and rs1800469 (19q13.2) each span multiple genes (Supplementary Table 5). The LD blocks tagged by rs11196172 (10q25.2) and rs12603526 (17p13.3) each lie within a single gene. The LD block tagged by rs10849432 (12p13.31) does not contain any known genes. Stratification analyses of the newly identified risk variants by tumor anatomic site (colon, rectum), population (Chinese, Korean and Japanese) and sex (men, women) did not reveal any significant heterogeneity (Supplementary Tables 6 to 8). In addition to the six newly identified loci, three regions showed an association with CRC risk near genome-wide significance at 8q24.11 (rs6469656, P = 5.38 × 10⁻⁸), 10q21.1 (rs4948317, P = 7.14 × 10⁻⁸) and 10q24.2 (rs12412391, P = 7.41 × 10⁻⁷). Results for all 29 SNPs across stage 1 to stage 4 are presented in Supplementary Table 3.

Table 1

Summary results for risk variants in the six newly identified loci associated with CRC in East Asians

Locus	SNP	Geneᵃ	Annotation	Positionᵇ	Allelesᶜ	RAFᵈ	Stage 1 P	Stage 2 P	Stage 3 P	Stage 4 P	Stages 1 to 4 OR (95% CI)ᵉ	Stages 1 to 4 Pᵉ
10q22.3	rs704017	ZMIZ1-AS1	Intron 3	80489138	G/A	0.32	0.01	0.01	0.004	9.99 × 10⁻⁴	1.10 (1.06–1.13)	2.07 × 10⁻⁸
10q25.2	rs11196172	TCF7L2	Intron 4	114716833	A/G	0.68	0.03	1.82 × 10⁻⁵	0.03	5.18 × 10⁻⁷	1.14 (1.10–1.18)	1.04 × 10⁻¹²
11q12.2	rs174537	MYRF	Intron 24	61309256	G/T	0.59	0.02	1.33 × 10⁻⁵	1.61 × 10⁻⁴	1.60 × 10⁻¹³	1.16 (1.12–1.19)	9.22 × 10⁻²¹
	rs4246215	FEN1	3′-UTR	61320875	G/T	0.59	0.02	2.29 × 10⁻⁶	1.83 × 10⁻⁴	1.25 × 10⁻¹¹	1.15 (1.12–1.19)	7.65 × 10⁻²⁰
	rs174550	FADS1	Intron 7	61328054	T/C	0.59	0.01	5.71 × 10⁻⁶	1.83 × 10⁻⁴	2.70 × 10⁻¹¹	1.15 (1.12–1.19)	1.58 × 10⁻¹⁹
	rs1535	FADS2	Intron 1	61354548	A/G	0.59	0.02	7.55 × 10⁻⁶	1.24 × 10⁻⁴	1.20 × 10⁻¹¹	1.15 (1.12–1.19)	8.21 × 10⁻²⁰
12p13.31	rs10849432	CD9	Intergenic	6255988	T/C	0.82	0.002	0.007	0.06	6.95 × 10⁻⁶	1.14 (1.09–1.18)	5.81 × 10⁻¹⁰
17p13.3	rs12603526	NXN	Intron 1	747343	C/T	0.30	0.02	6.86 × 10⁻⁴	0.08	3.80 × 10⁻⁴	1.10 (1.06–1.14)	3.42 × 10⁻⁸
19q13.2	rs1800469	TGFB1	Promoter	46552136	G/A	0.48	0.002	0.002	6.74 × 10⁻⁴	0.03	1.09 (1.06–1.12)	1.17 × 10⁻⁸
	rs2241714	B9D2	Exon 1	46561232	C/T	0.48	0.003	0.002	0.001	0.02	1.09 (1.06–1.12)	1.36 × 10⁻⁸

Abbreviations: RAF, risk allele frequency; OR, odds ratio; CI, confidence interval.

ᵃ The closest gene(s).

ᵇ The chromosome position (bp) is based on the National Center for Biotechnology Information (NCBI) database, build 36.

ᶜ Risk/reference alleles are based on forward allele coding in NCBI, build 36. OR was estimated based on the risk allele (bold).

ᵈ RAF in controls from all stages combined.

ᵉ Summary OR (95% CI) and P value were obtained from a fixed-effects meta-analysis.
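The summary estimates in Table 1 come from a fixed-effects meta-analysis. As a minimal sketch of how such pooling is usually done, the code below applies inverse-variance weighting to per-study log odds ratios. The per-stage inputs are hypothetical placeholders, not values from this study (the table reports only per-stage P values), and the function name `fixed_effects_meta` is an illustrative choice rather than the authors' software.

```python
import math


def fixed_effects_meta(log_ors, std_errs):
    """Inverse-variance fixed-effects pooling of per-study log odds ratios."""
    weights = [1 / se ** 2 for se in std_errs]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled / pooled_se
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {
        "or": math.exp(pooled),
        "se": pooled_se,
        "ci": (math.exp(pooled - 1.96 * pooled_se),
               math.exp(pooled + 1.96 * pooled_se)),
        "p": p_value,
    }


# Hypothetical per-stage estimates for a single SNP (NOT taken from the paper).
stage_log_ors = [math.log(1.12), math.log(1.09), math.log(1.08), math.log(1.10)]
stage_std_errs = [0.045, 0.030, 0.028, 0.025]

result = fixed_effects_meta(stage_log_ors, stage_std_errs)
print(f"OR = {result['or']:.2f}, "
      f"95% CI = {result['ci'][0]:.2f} to {result['ci'][1]:.2f}, "
      f"P = {result['p']:.2e}")
```

Larger studies have smaller standard errors and therefore larger weights, which is why the box areas in the forest plots of Fig. 1 are proportional to the inverse-variance weight.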

Figure 1

Forest plots for risk variants in the six newly identified loci

The six plots represent (a) rs704017, (b) rs11196172, (c) rs174537, (d) rs10849432, (e) rs12603526 and (f) rs1800469. Per-allele ORs are presented, with the area of each box proportional to the inverse variance weight of the estimate. Horizontal lines represent 95% CIs. Diamonds represent summary ORs generated under a fixed-effects meta-analysis; width of the diamonds corresponds to the 95% CIs. Unbroken vertical lines represent the null value; broken vertical lines represent the summary ORs for all studies for each SNP.

We performed conditional analyses for SNPs within a 1-Mb region centered on the index SNPs in each of the six newly identified loci. No second signal was identified at P < 0.01 after adjusting for the respective index SNPs (data not shown). Four SNPs at 11q12.2 and two SNPs at 19q13.2 showed association with CRC risk at P < 5 × 10⁻⁸, and thus we performed haplotype analysis for these two loci using genotype data available for 10,051 CRC cases and 14,415 controls (stages 2 and 4). Two common haplotypes were found in the 11q12.2 locus, accounting for more than 99% of the haplotypes constructed using the four highly correlated SNPs. The haplotype with all four risk alleles (frequency = 0.574 in controls) was strongly associated with CRC risk (odds ratio (OR) = 1.40, 95% confidence interval (CI): 1.29–1.51; P = 3.69 × 10⁻¹⁶) (Supplementary Table 9). Similarly, we identified two common haplotypes in the 19q13.2 locus, accounting for more than 99% of the haplotypes constructed using the two highly correlated SNPs. The haplotype with the risk allele in both SNPs (frequency = 0.485 in controls) was also associated with increased risk of CRC (OR = 1.16, 95% CI: 1.08–1.26; P = 1.18 × 10⁻⁴) (Supplementary Table 10). Therefore, these analyses did not reveal an independent signal in any of the six newly identified loci. We examined potential SNP-SNP interactions between the six new risk variants (rs704017, rs11196172, rs174537, rs10849432, rs12603526 and rs1800469) identified in this study and also between these six SNPs and the risk variants in 25 previously reported loci (Supplementary Table 11). Multiplicative interactions were found with suggestive evidence (P < 0.05) for seven pairs of SNPs. None of these interactions, however, remained statistically significant after correcting for multiple comparisons of 180 tests (adjusted P = 0.000277).
We evaluated associations of the ten newly identified SNPs with CRC risk in European descendants using data from three consortia, the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) [17], the Colorectal Transdisciplinary (CORECT) Study and the Colon Cancer Family Registry (CCFR) [21], with a total sample size of 16,984 CRC cases and 18,262 controls (Supplementary Table 12). In a meta-analysis of data from these consortia, all ten SNPs showed associations with CRC risk in the same direction as observed in East Asians (Table 2). Five SNPs in two loci (10q22.3 and 11q12.2) were associated with CRC risk at P <0.008 (corrected for multiple comparisons of six loci). The strength of these associations in Europeans, however, was weaker than in East Asians. Tests for heterogeneity were statistically significant for risk variants in 11q12.2 and 19q13.2 (P <0.008). The frequency of the risk allele also differed considerably between Europeans and East Asians for SNPs in five loci (Supplementary Table 13). For example, rs12603526 is common in East Asians, whereas the minor allele frequency (MAF) is <0.02 in Europeans. These differences may partly reflect distinct patterns of LD between the index SNPs and causal SNPs in these two populations. As expected, LD patterns for most of the newly identified loci differed considerably between Europeans and East Asians (Supplementary Fig. 5). Large-scale fine-mapping of these loci will be helpful to identify causal variants.
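The multiple-comparison thresholds quoted in this section (0.000277 for 180 interaction tests and 0.008 for six loci) are consistent with a Bonferroni correction of a 0.05 significance level. The sketch below makes that arithmetic explicit; the 0.05 level is an assumption inferred from the reported thresholds, and the P values are the European-descendant results listed in Table 2.

```python
ALPHA = 0.05  # assumed family-wise significance level (not stated in the text)


def bonferroni_threshold(n_tests: int, alpha: float = ALPHA) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


interaction_threshold = bonferroni_threshold(180)  # SNP-SNP interaction tests
locus_threshold = bonferroni_threshold(6)          # six newly identified loci

# P values in European descendants, copied from Table 2.
european_p = {
    "rs704017": 4.71e-4, "rs11196172": 0.11, "rs174537": 7.39e-5,
    "rs4246215": 2.71e-4, "rs174550": 2.37e-4, "rs1535": 4.12e-5,
    "rs10849432": 0.50, "rs12603526": 0.10, "rs1800469": 0.09,
    "rs2241714": 0.18,
}
passing = sorted(snp for snp, p in european_p.items() if p < locus_threshold)

print(f"{interaction_threshold:.6f}")  # 0.000278
print(f"{locus_threshold:.4f}")        # 0.0083
print(len(passing))                    # 5
```

The reported 0.000277 corresponds to 0.05/180 truncated rather than rounded, and the five SNPs that pass the six-locus threshold are the ones at 10q22.3 and 11q12.2 named in the text.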

Table 2

Associations of risk variants in the six newly identified loci with CRC in European descendants

Locus	SNP (alleles)ᵃ	Geneᵇ	Positionᶜ	Cases/controls	RAFᵈ	OR (95% CI)ᵉ	Pᵉ	P for heterogeneityᶠ
10q22.3	rs704017 (G/A)	ZMIZ1-AS1	80489138	16,984/18,262	0.57	1.06 (1.03–1.10)	4.71 × 10⁻⁴	0.20
10q25.2	rs11196172 (A/G)	TCF7L2	114716833	7,563/6,328	0.15	1.06 (0.99–1.13)	0.11	0.07
11q12.2	rs174537 (G/T)	MYRF	61309256	16,984/18,262	0.67	1.07 (1.04–1.11)	7.39 × 10⁻⁵	0.001
	rs4246215 (G/T)	FEN1	61320875	16,984/18,262	0.65	1.07 (1.03–1.10)	2.71 × 10⁻⁴	8.31 × 10⁻⁴
	rs174550 (T/C)	FADS1	61328054	16,984/18,262	0.67	1.07 (1.03–1.10)	2.37 × 10⁻⁴	8.87 × 10⁻⁴
	rs1535 (A/G)	FADS2	61354548	16,984/18,262	0.67	1.07 (1.04–1.11)	4.12 × 10⁻⁵	0.002
12p13.31	rs10849432 (T/C)	CD9	6255988	7,563/6,328	0.90	1.03 (0.95–1.11)	0.50	0.03
17p13.3	rs12603526 (C/T)	NXN	747343	16,984/18,262	0.02	1.12 (0.98–1.27)	0.10	0.83
19q13.2	rs1800469 (G/A)	TGFB1	46552136	16,984/18,262	0.67	1.03 (1.00–1.07)	0.09	0.01
	rs2241714 (C/T)	B9D2	46561232	16,984/18,262	0.67	1.02 (0.99–1.06)	0.18	0.007

Abbreviations: RAF, risk allele frequency; OR, odds ratio; CI, confidence interval.

ᵃ Risk/reference alleles for Asians as shown in Table 1. OR was estimated for the risk allele.

ᵇ The closest gene(s).

ᶜ The chromosome position (bp) is based on NCBI Build 36.

ᵈ RAF in controls.

ᵉ Summary OR (95% CI) and P value were obtained from a fixed-effects meta-analysis.

ᶠ P for heterogeneity between Asian and European populations was calculated using a Cochran’s Q test.
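The heterogeneity P values in Table 2 compare the East Asian and European estimates with Cochran's Q test. With only two groups, Q reduces to the squared difference of the log odds ratios divided by the sum of their variances, with one degree of freedom. The sketch below recovers approximate standard errors from the published 95% CIs, which is an assumption about how to back out the inputs. Because the ORs and CIs are rounded to two decimals, the result only approximates the tabulated value.

```python
import math


def se_from_ci(lower: float, upper: float) -> float:
    """Approximate standard error of a log OR from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * 1.96)


def cochran_q_two_groups(or1, ci1, or2, ci2):
    """Cochran's Q and its P value (1 degree of freedom) for two estimates."""
    b1, b2 = math.log(or1), math.log(or2)
    v1, v2 = se_from_ci(*ci1) ** 2, se_from_ci(*ci2) ** 2
    q = (b1 - b2) ** 2 / (v1 + v2)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(q / 2))
    return q, p_value


# rs174537: East Asians (Table 1) versus European descendants (Table 2).
q, p = cochran_q_two_groups(1.16, (1.12, 1.19), 1.07, (1.04, 1.11))
print(f"Q = {q:.2f}, P = {p:.1e}")
```

For rs174537 this gives a P value well below 0.008, in line with the significant heterogeneity reported for 11q12.2, although it does not match the tabulated 0.001 exactly because of the rounded inputs.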

Putative functional variants and candidate genes

We evaluated and annotated putative functional variants and candidate genes in each of the six newly identified loci using data from the 1000 Genomes Project [22], HapMap 2 [23], Encyclopedia of DNA Elements (ENCODE) [24], expression quantitative trait locus (eQTL) databases [25-28], the Catalogue of Somatic Mutations in Cancer (COSMIC) [29], The Cancer Genome Atlas (TCGA) CRC project [30], Gene Expression Atlas [31], PubMed and OMIM (ONLINE METHODS). We summarize results below for each locus.

At the 10q25.2 locus, rs11196172 is located in intron 4 of the TCF7L2 gene. The SNP and other correlated SNPs (r² > 0.5) fall within a strong enhancer activity region and a DNase I hypersensitivity site annotated by ENCODE (Supplementary Table 14), suggesting a potentially functional role for these SNPs. We found that the risk allele of rs11196172 was significantly associated with increased expression of the TCF7L2 gene (P = 0.003) in colon tumor tissue using TCGA data (Fig. 2). The TCF7L2 gene encodes TCF7L2 (previously known as TCF4), which is a key transcription factor in the Wnt signaling pathway. Aberrant activation of Wnt signaling is found in more than 90% of CRC [30]. TCF7L2 is a known tumor suppressor for CRC. Loss of TCF7L2 function enhances CRC cell growth, whereas gain of function suppresses CRC cell growth [32, 33]. The TCF7L2 gene is one of the most frequently mutated genes in CRC, with estimated point mutation rates of approximately 8 to 12.5% [29, 30]. Although TCF7L2 is the only gene in this locus (Supplementary Fig. 4), we also found that the risk allele of rs11196172 was significantly associated with increased expression of the VTI1A gene (P = 5.1 × 10⁻⁴) in colon tumor tissue (Fig. 2). The VTI1A gene is located approximately 131 kb upstream of the TCF7L2 gene and mRNA levels of these two genes are highly correlated in colon tumor tissues (r = 0.71, P < 0.0001). Recently, a recurrent gene fusion of the first three exons of VTI1A to the fourth exon of TCF7L2 has been found in approximately 3% of colorectal tumors [34]. It is possible that the VTI1A gene may also be involved in the association between rs11196172 and CRC risk.

Figure 2

Association of selected risk variants identified in this study with gene expression in colon tumor tissue

The three plots are for (a) rs11196172 and TCF7L2, (b) rs11196172 and VTI1A, and (c) rs1535 and FADS2. Gene expression levels are represented by reads per kilobase of exon per million mapped reads (RPKM) values for the three genotypes of each SNP, shown in red, blue and green. The median RPKM values and the interquartile range for each SNP are presented in the overlaid box plot. In (a) and (b), RPKM values are shown on a normal scale, whereas RPKM values in (c) are shown on a log scale due to departure from a normal distribution.

At the 19q13.2 locus, we identified two perfectly correlated SNPs (rs1800469 and rs2241714, r² = 1) associated with CRC risk. Of them, rs1800469 has been previously investigated in relation to CRC risk in many small candidate-gene association studies with conflicting results [5]. We herein provide, for the first time, convincing evidence for this association through our GWAS. SNP rs1800469 maps to the promoter of the TGFB1 gene, while rs2241714 is a nonsynonymous SNP that results in an amino acid substitution on codon 11 of the B9D2 protein. The A allele of rs1800469 has been related to higher levels of transcription activity of the TGFB1 gene and higher circulating levels of the TGF-β1 protein than the G allele [35]. Both rs1800469 and rs2241714 are in perfect LD with another nonsynonymous SNP, rs1800470, which causes a proline to leucine substitution at codon 10 of the TGF-β1 protein. Although the two nonsynonymous SNPs are predicted to be tolerant [36] or benign [37], the Pro10 variant of rs1800470 has also been associated with an increase in gene expression of TGFB1, TGF-β1 protein secretion and circulating levels of TGF-β1 [38-40]. While rs2241714 is an eQTL for TGFB1, both rs1800469 and rs2241714 are also eQTLs for other genes in this locus (Supplementary Table 15). In addition to these three SNPs, many highly correlated SNPs located in the TGFB1 gene are suggested to have potentially regulatory functions (Supplementary Table 14). The TGF-β1 protein is a major member of the TGF-β signaling pathway. Somatic alterations of certain components (TGFBR2, SMAD4, SMAD2 and SMAD3) in this pathway are estimated to affect almost half of CRC [41]. High-penetrance germline mutations in the SMAD4 gene are known to cause juvenile polyposis, an autosomal dominant polyposis syndrome with a high risk of CRC [42]. Germline, allele-specific expression of the TGFBR1 gene has also been shown to contribute to increased risk of CRC [43]. To date, GWAS have identified at least six other independent SNPs that are located in or proximal to genes in the TGF-β signaling pathway (SMAD7, GREM1, BMP2, BMP4 and RHPN2) [9, 10, 13, 19]. Our finding of an association between a genetic variant in the TGFB1 gene and CRC risk adds further evidence for the critical role of this pathway in colorectal tumorigenesis.

At the 11q12.2 locus, the four perfectly correlated SNPs rs174537, rs4246215, rs174550 and rs1535 lie in intron 24 of MYRF, the 3′-UTR of FEN1, intron 7 of FADS1 and intron 1 of FADS2, respectively. Of them, rs4246215 is an eQTL for the FEN1 gene in normal colorectal tissue [44] and is predicted to affect miRNA binding site activity [45]. SNP rs174537 is an eQTL for the FADS1 and FADS2 genes in whole blood and other types of tissue (Supplementary Table 15). Using data from TCGA, we identified a strong correlation of rs1535 genotypes with FADS2 gene expression (P = 1.4 × 10⁻⁵) in colon tumor tissue (Fig. 2). These findings suggest that the potential function of these SNPs may be mediated through their effect on their host genes. We also found that the FEN1, FADS1 and FADS2 genes are all highly expressed in colon tumor tissue compared with normal colon tissue (Supplementary Table 16). The FEN1 gene encodes flap structure-specific endonuclease 1, a protein that is essential for DNA repair, replication and degradation and has a critical role in maintaining genome stability and protecting against carcinogenesis [46]. FEN1 mutations have been found in several human cancers [47]. Mouse models with haploinsufficiency of Fen1 showed rapid progression of CRC and reduced survival [48]. Two other genes in this locus, FADS1 and FADS2, respectively encode delta-5 and delta-6 desaturases, which are key enzymes in polyunsaturated fatty acid metabolism. Of them, delta-6 desaturase is responsible for the synthesis of arachidonic acid [49], the precursor of prostaglandin E2 (PGE2), which is a key molecule mediating the effect of cyclooxygenase-2 in colorectal carcinogenesis [50]. Notably, SNPs in perfect LD with the risk variants for CRC identified in this study are strongly associated with circulating arachidonic acid level [49]. We have shown previously that high levels of urinary PGE2 metabolite, a marker of endogenous PGE2 production, are strongly related to elevated risk of CRC [51]. Because the LD block of approximately 190 kb tagged by the four risk variants covers many putatively functional SNPs that are located in the FEN1, FADS1 and FADS2 genes (Supplementary Table 14 and Supplementary Fig. 6), it is difficult to pinpoint a single SNP or gene that may be responsible for the association with CRC risk in this locus. Nevertheless, our study provides evidence for a potentially significant role of the FEN1, FADS1 and FADS2 genes in the etiology of CRC.

At the 10q22.3 locus, rs704017 is located in intron 3 of the ZMIZ1-AS1 gene and resides in a strong enhancer region predicted using ENCODE data (Supplementary Fig. 6 and Supplementary Table 14). It also maps to a DNase I hypersensitivity site in the Caco-2 CRC cell line. In addition to the ZMIZ1-AS1 gene, the LD block tagged by rs704017 also includes the ZMIZ1 gene, which is down-regulated in the Caco-2 and HT-29 CRC cell lines [31]. In line with this, we found in TCGA data that ZMIZ1 gene expression is reduced in colon tumor tissue compared with normal colon tissue (P = 3.28 × 10⁻⁶). In addition, somatic mutations in the ZMIZ1 gene have been reported in more than 2% of colon tumors [29]. While ZMIZ1-AS1 is a miscRNA gene with unknown function, the ZMIZ1 gene encodes the protein ZMIZ1, which regulates the activity of several transcription factors, including AR, SMAD3, SMAD4 and p53. It has been shown that ZMIZ1 may play a broader role in epithelial cancers, including CRC [52]. SNP rs704010, located in intron 1 of the ZMIZ1 gene, has been associated with breast cancer [53]. However, this SNP, which is in weak LD (r² = 0.09) with the risk variant we identified for CRC, was not associated with CRC in this study (data not shown). Given the biologic function of the ZMIZ1 gene, it is possible that this gene is involved in the association observed in this locus.

At the 12p13.31 locus, rs10849432 maps to an LD block of approximately 52 kb with no known genes. ENCODE data suggest that rs4764551 and rs4764552, perfectly correlated with rs10849432, may be located in a strong enhancer region (Supplementary Table 14). Notably, rs4764551 also maps to a DNase I hypersensitivity site in the HCT-116 CRC cell line and a binding site of the CTCF protein in the Caco-2 CRC cell line. Using data from TCGA, we showed that the closest genes to rs10849432 (CD9, PLEKHG6 and TNFRSF1A) are all down-regulated in colon tumor tissue (Supplementary Table 16). The CD9 gene encodes the CD9 antigen, which participates in many cellular processes, including differentiation, adhesion and signal transduction. Notably, CD9 plays a critical role in the suppression of cancer cell motility and metastasis [54], and overexpression of the CD9 gene is associated with favorable prognosis of patients with CRC [55]. CD9 is also involved in suppressing Wnt signaling [56]. While the function of the PLEKHG6 gene is less clear, somatic mutations in this gene were found in approximately 2% of colon tumors [29]. The protein encoded by TNFRSF1A is a major receptor for tumor necrosis factor-alpha and is known to be involved in cytokine-induced senescence in cancer [57]. In addition to evidence for the three nearby genes, we found that rs4764552 is an eQTL for the LTBR gene (Supplementary Table 15). The LTβR protein plays an essential role in lymphoid organ formation and has also been linked to cancer [58], including CRC [59]. Based on these data, we believe that the CD9 gene is the most likely candidate to explain the association identified in this locus. However, the potential role of other genes cannot be ruled out.

At the 17p13.3 locus, rs12603526 lies in intron 1 of the NXN gene, a region covering several regulatory elements, including a DNase I hypersensitivity site, a strong enhancer region and a site with an effect on regulatory motifs as annotated by ENCODE (Supplementary Table 14). NXN gene expression was reduced in colon tumor tissue samples included in TCGA (P = 2.83 × 10⁻⁵). Nucleoredoxin, encoded by the NXN gene, has functions related to cell growth and differentiation [60]. Overexpression of the NXN gene has been found to suppress the Wnt signaling pathway, and dysfunction of nucleoredoxin may cause activation of the transcription factor T cell factor, accelerated cell proliferation and enhancement of oncogenicity [61]. Further research is needed to determine the causal variant and biologic mechanism for the association in this locus.

Previously reported CRC loci in East Asians

We evaluated association evidence for 31 SNPs in 25 established CRC susceptibility loci [7-20] by analyzing data from stages 1 to 3 and our previous GWAS [18, 19] with a total sample size of up to 11,934 CRC cases and 28,282 controls (Table 3 and Supplementary Table 17). We found further evidence to support the association for the four loci identified previously in our GWAS conducted among East Asians (P = 1.40 × 10⁻¹⁰ to 3.05 × 10⁻¹⁵). Of the 23 SNPs in the 18 susceptibility loci previously identified by GWAS of European descendants, 20 showed associations with CRC risk at P < 0.05 among East Asians in the same direction as reported in the original studies [7-17]. These included six SNPs in four loci (1q41, 8q24.21, 10p14 and 18q21.1) with an association at P < 5 × 10⁻⁸, six SNPs in six loci with an association at P < 0.002 (significance level adjusted for multiple comparisons of 24 independent loci), and eight SNPs in eight additional loci with an association at P < 0.05. Three SNPs in three loci were not associated with CRC risk (P > 0.05). Given that our study had a statistical power of >80% to identify an association with an OR of 1.05 at P = 0.05 for SNPs with a MAF of 0.20, it is unlikely that these three SNPs confer a substantial risk of CRC in East Asian populations. Generally, loci initially identified in Europeans had smaller ORs in East Asians, with evidence of heterogeneity noted for three SNPs (P < 0.002). SNPs rs6691170 and rs16892766, identified by previous GWAS of European descendants, are not polymorphic in East Asians, and SNP rs5934683 is located on chromosome X. We did not have data to evaluate the association of these three SNPs with CRC risk in this study.

Table 3

Association evidence in East Asians for risk variants in previously reported CRC susceptibility loci

Locus	SNP	Geneᵃ	Annotation	Positionᵇ	Allelesᶜ	N (this study)	RAFᵈ (this study)	OR (95% CI) (this study)	P (this study)	RAFᵉ (published GWAS)	OR (95% CI)ᵉ (published GWAS)	P for heterogeneityᶠ
Loci initially identified in East Asians
5q31.1	rs647161	PITX1	Intergenic	134526991	A/C	40,051	0.31	1.15 (1.11–1.19)	1.87 × 10⁻¹⁴	0.31	1.17 (1.11–1.22)	0.51
12p13.32	rs10774214	CCND2	Intergenic	4238613	T/C	33,436	0.37	1.14 (1.09–1.18)	1.40 × 10⁻¹⁰	0.35	1.17 (1.11–1.23)	0.39
20p12.3	rs2423279	HAO1	Intergenic	7760350	C/T	40,057	0.31	1.13 (1.09–1.17)	3.04 × 10⁻¹²	0.30	1.14 (1.08–1.19)	0.86
18q21.1	rs7229639	SMAD7	Intron 3	44704974	A/G	39,288	0.16	1.20 (1.16–1.25)	3.05 × 10⁻¹⁵	0.15	1.22 (1.15–1.29)	0.72
Loci initially identified in Europeans
1q41	rs6687758	DUSP10	Intergenic	220231571	G/A	37,803	0.24	1.12 (1.08–1.17)	8.99 × 10⁻⁹	0.20	1.09 (1.06–1.12)	0.23
2q32.3	rs11903757	NABP1	Intergenic	192295449	C/T	22,442	0.05	1.15 (1.03–1.28)	0.01	0.16	1.16 (1.10–1.22)	0.89
3q26.2	rs10936599	MYNN	Exon 2	170974795	C/T	37,790	0.39	1.05 (1.01–1.08)	0.01	0.75	1.08 (1.05–1.10)	0.22
6p21.31	rs1321311	CDKN1A	Intergenic	36730878	A/C	32,236	0.14	1.09 (1.03–1.15)	0.001	0.23	1.10 (1.07–1.13)	0.77
8q24.21	rs10505477	Unknown	Intergenic	128476625	A/G	32,235	0.38	1.15 (1.11–1.20)	3.43 × 10⁻¹³	0.51	1.17 (1.12–1.23)	0.64
8q24.21	rs6983267	Unknown	Intergenic	128482487	G/T	37,790	0.38	1.14 (1.10–1.18)	4.85 × 10⁻¹⁴	0.52	1.21 (1.15–1.27)	0.06
8q24.21	rs7014346	Unknown	Intergenic	128493974	A/G	32,236	0.27	1.13 (1.08–1.17)	1.96 × 10⁻⁸	0.37	1.19 (1.14–1.24)	0.06
10p14	rs10795668	Unknown	Intergenic	8741225	G/A	37,789	0.60	1.15 (1.11–1.19)	4.91 × 10⁻¹⁵	0.67	1.12 (1.09–1.16)	0.30
11q13.4	rs3824999	POLD3	Intron 9	74023198	G/T	32,236	0.40	1.06 (1.02–1.11)	0.002	0.50	1.08 (1.05–1.10)	0.54
11q23.1	rs3802842	Unknown	Intergenic	110676919	C/A	37,791	0.38	1.09 (1.05–1.12)	2.57 × 10⁻⁷	0.29	1.11 (1.08–1.15)	0.37
12q13.13	rs7136702	LARP4	Intergenic	49166483	T/C	37,774	0.51	1.02 (0.98–1.06)	0.31	0.35	1.06 (1.04–1.08)	0.05
12q13.13	rs11169552	ATF1	Intergenic	49441930	C/T	37,761	0.65	1.05 (1.01–1.09)	0.01	0.72	1.09 (1.06–1.12)	0.11
14q22.2	rs4444235	BMP4	Intergenic	53480669	C/T	37,785	0.53	1.04 (1.01–1.08)	0.02	0.46	1.11 (1.08–1.15)	0.007
14q22.2	rs1957636	BMP4	Intergenic	53629768	T/C	32,236	0.62	0.99 (0.95–1.04)	0.77	0.40	1.08 (1.06–1.11)	0.001
15q13.3	rs16969681	SCG5	Intergenic	30780403	T/C	32,236	0.44	1.07 (1.03–1.12)	0.002	0.09	1.18 (1.11–1.25)	0.01
15q13.3	rs4779584	SCG5	Intergenic	30782048	T/C	37,795	0.82	1.06 (1.01–1.11)	0.01	0.18	1.26 (1.19–1.34)	5.48 × 10⁻⁶
15q13.3	rs11632715	GREM1	Intergenic	30791539	A/G	22,442	0.81	0.95 (0.90–1.01)	0.11	0.47	1.12 (1.08–1.16)	4.05 × 10⁻⁶
16q22.1	rs9929218	CDH1	Intron 2	67378447	G/A	28,806	0.81	1.06 (1.00–1.11)	0.03	0.71	1.10 (1.07–1.13)	0.19
18q21.1	rs4939827	SMAD7	Intron 3	44707461	T/C	37,796	0.24	1.12 (1.08–1.16)	1.53 × 10⁻⁸	0.52	1.18 (1.12–1.23)	0.11
19q13.11	rs10411210	RHPN2	Intron 2	38224140	C/T	37,789	0.82	1.12 (1.07–1.17)	3.14 × 10⁻⁶	0.90	1.15 (1.10–1.20)	0.39
20p12.3	rs961253	BMP2	Intergenic	6352281	A/C	37,807	0.09	1.10 (1.04–1.17)	7.74 × 10⁻⁴	0.36	1.12 (1.08–1.16)	0.66
20p12.3	rs4813802	BMP2	Intergenic	6647595	G/T	32,236	0.21	1.12 (1.06–1.17)	9.87 × 10⁻⁶	0.36	1.09 (1.06–1.12)	0.37
20q13.33	rs4925386	LAMA5	Intron 10	60354439	C/T	37,780	0.77	1.05 (1.01–1.10)	0.01	0.68	1.08 (1.05–1.10)	0.38

Abbreviations: GWAS, genome-wide association study; RAF, risk allele frequency; OR, odds ratio; CI, confidence interval.

ᵃ The closest gene(s).

ᵇ The chromosome position (bp) is based on NCBI Build 36.

ᶜ Risk/reference alleles (in published GWAS) are based on forward allele coding in NCBI Build 36. OR was estimated for the risk allele (bold).

ᵈ RAF in controls.

ᵉ Results (RAF, ORs, and 95% CIs) from the original studies (ref. 7–19).

ᶠ P for heterogeneity between this study and published studies was calculated using a Cochran’s Q test.

Familial relative risk explained by established CRC loci

The six newly identified loci in this study explain approximately 2.1% of the familial relative risk of CRC in East Asians (Supplementary Table 18). These variants, together with the four SNPs identified in our previous GWAS, explain approximately 4.3% of the familial relative risk of CRC in East Asians. An additional 3.4% of the familial relative risk in East Asians can be explained by 18 independent SNPs initially identified in studies conducted among European descendants and confirmed in this study. Based on per-allele ORs derived from previously published GWAS [7–18] and this study, we estimate that the SNPs in the 31 loci identified to date explain approximately 9% of the familial relative risk of CRC in Europeans (Supplementary Table 19), slightly higher than the 7.7% explained in East Asians.
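The percentages above follow from the single-locus formula given in the Online Methods, λ = (pr² + q)/(pr + q)², with an overall familial relative risk of 2.2. The following is a minimal sketch of that arithmetic, not the authors' code. The function names are hypothetical, the per-allele OR is assumed to approximate the relative risk r, and the three input pairs are RAF and OR values read from table rows rs3802842, rs4939827 and rs10411210, used only as an illustration.

```python
import math


def locus_frr(p: float, r: float) -> float:
    """Familial relative risk to offspring from one locus (log-additive model).

    p: risk allele frequency; r: per-allele relative risk.
    """
    q = 1.0 - p
    return (p * r ** 2 + q) / (p * r + q) ** 2


def proportion_explained(loci, lambda_0: float = 2.2) -> float:
    """Share of the overall familial relative risk explained by the given loci.

    loci: iterable of (risk allele frequency, per-allele relative risk) pairs.
    If locus effects combine multiplicatively, per-locus log(lambda) values add.
    """
    total_log = sum(math.log(locus_frr(p, r)) for p, r in loci)
    return total_log / math.log(lambda_0)


# Illustrative input only: (RAF, OR) pairs for three table rows,
# not the full SNP set behind the published percentages.
example_loci = [(0.38, 1.09), (0.24, 1.12), (0.82, 1.12)]
print(f"{proportion_explained(example_loci):.2%}")
```

Because the logarithms add, the contribution of any group of loci can be computed separately and summed, which is how the 4.3% and 3.4% figures combine to 7.7%.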

DISCUSSION

In the largest GWAS conducted to date among East Asians, we identified six new genetic loci associated with CRC risk and provided suggestive evidence for three additional novel loci. In addition, we replicated 22 previously reported CRC susceptibility loci. Of the six newly identified loci, two map to genes (TCF7L2 and TGFB1) that have established roles in colorectal tumorigenesis. The other four loci are located in or proximal to genes that are functionally important in transcriptional regulation (ZMIZ1), genome maintenance (FEN1), fatty acid metabolism (FADS1 and FADS2), cancer cell motility and metastasis (CD9) and cell growth and differentiation (NXN). Risk variants at some loci fall within potentially functional regions, and two are associated with expression levels of the TCF7L2 and FADS2 genes. This study expands our current understanding of the genetic basis of CRC risk and provides evidence for novel genes and biological pathways that may be involved in colorectal tumorigenesis.

Based on a large twin study conducted in Sweden, Denmark and Finland [2], the heritability estimated for CRC, breast cancer and prostate cancer was 35%, 27% and 42%, respectively. To date, more than 70 low-penetrance susceptibility loci have been identified in GWAS for breast cancer [62] or prostate cancer [63], and these loci together explain approximately 14% and 30%, respectively, of the familial relative risk of breast cancer and prostate cancer among European descendants. For CRC, however, only 31 low-penetrance susceptibility loci have been identified, explaining approximately 9% of the familial relative risk of CRC among European descendants. Compared with GWAS of breast cancer and prostate cancer, studies conducted for CRC have been relatively small. In our study, we evaluated approximately 7,000 promising variants identified from GWAS in the replication stages, which represents one of the largest efforts made to date to follow up genetic variants identified by GWAS. Six novel loci were identified, representing the largest number of loci identified for CRC risk in a single study. Although multiple GWAS with sample sizes larger than this study have been conducted among European descendants [13, 14, 16], we were still able to identify risk variants with relatively large effect sizes. Our study further highlights the value of conducting GWAS in non-European populations to discover novel susceptibility loci for CRC.

In summary, we have identified six new loci associated with CRC risk in this large GWAS conducted among East Asians. These new loci contain genes with established connections to colorectal tumorigenesis through major biological pathways such as Wnt and TGF-β signaling, as well as genes with important biological functions that have not yet been well linked to CRC. Our study considerably expands our knowledge of the genetic landscape of CRC and provides clues for future studies to characterize the causal variants and functional mechanisms for these GWAS-identified loci.

ONLINE METHODS

Study participants

This genome-wide association study (GWAS) was conducted as part of the Asia Colorectal Cancer Consortium and included a total of 14,963 colorectal cancer (CRC) cases and 31,945 controls of East Asian ancestry from 14 studies conducted in China, South Korea and Japan (Supplementary Table 1). Specifically, stage 1 (GWAS discovery) consisted of five studies: Shanghai CRC Study 1 (Shanghai-1, n = 3,102), Shanghai CRC Study 2 (Shanghai-2, n = 908), Guangzhou CRC Study 1 (Guangzhou-1, n = 1,603), Aichi CRC Study 1 (Aichi-1, n = 1,346), and Korean Cancer Prevention Study-II CRC (KCPS-II, n = 1,301). For Shanghai-2 we added 423 controls from other studies [64, 65]; samples for the remaining four studies were the same as those reported in our previous study [18]. Stage 2 consisted of three studies: Shanghai CRC Study 3 (Shanghai-3, n = 6,577), Guangzhou CRC Study 2 (Guangzhou-2, n = 809), and Guangzhou CRC Study 3 (Guangzhou-3, n = 2,408). Stage 3 included one study: the BioBank Japan CRC Study (BBJ, n = 14,172). Stage 4 consisted of five studies: Guangzhou CRC Study 4 (Guangzhou-4, n = 1,791), Aichi CRC Study 2 (Aichi-2, n = 708), Korean-National Cancer Center CRC Study (Korea-NCC, n = 2,721), Seoul CRC Study (Korea-Seoul, n = 1,522), and Hwasun Cancer Epidemiology Study-Colon and Rectum Cancer (HCES-CRC, n = 7,930).

We estimated that our study had a statistical power of >80% to identify an association with an OR of 1.10 or above at P < 5 × 10⁻⁸ for SNPs with a MAF as low as 0.30.

We evaluated the generalizability of the newly identified associations with CRC risk in European descendants in three consortia including 23 studies (Supplementary Table 13) with a total sample size of 16,984 cases and 18,262 controls recruited in the United States, Europe, Canada and Australia: the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) [17], the Colorectal Transdisciplinary (CORECT) Study and the Colon Cancer Family Registry (CCFR) [21].

Summary descriptions of participating studies are presented in the Supplementary Note. Study protocols were approved by the relevant review boards in the respective institutions, and informed consent was obtained from all study participants.

Laboratory procedures

Genotyping of samples in stage 1 was conducted as described previously using the following platforms: Affymetrix Genome-Wide Human SNP Array 6.0, Illumina HumanOmniExpress BeadChip, Illumina Infinium HumanHap550 BeadChip, Illumina 660W-Quad BeadChip, Illumina Human610-Quad BeadChip, Illumina Infinium HumanHap610 BeadChip, and Affymetrix Genome-Wide Human SNP Array 5.0 [18, 64–69]. We used a uniform quality control protocol, as described in our recent paper [18], to filter samples and SNPs. Genotyping and quality control methods are also presented in the Supplementary Note. After quality control exclusions, we obtained 502,145 autosomal SNPs for samples in Shanghai-1, 245,961 SNPs in Shanghai-2, 250,612 SNPs in Guangzhou-1, 232,426 SNPs in Aichi-1, and 312,869 SNPs in KCPS-II (Supplementary Table 2).

Genotyping for 3,632 cases and 6,404 controls in stage 2 was completed using Illumina Infinium assays as part of the customer add-on content for multiple projects to the Illumina HumanExome BeadChip (see URLs). Details of array design, genotyping, genotype calling and quality control are provided in the Supplementary Note. Samples were excluded according to the following criteria: (i) genotype call rate <98%, (ii) genetically identical or duplicated samples, (iii) sex determined using genetic data inconsistent with epidemiological or clinical data, (iv) first- or second-degree relatives, (v) ethnic outliers, or (vi) heterozygosity outliers. Genetic markers were excluded using the following criteria: (i) MAF = 0, (ii) genotype call rate <98%, (iii) consistency rate <98% in positive quality control samples, (iv) P for Hardy-Weinberg equilibrium (HWE) < 10⁻⁵ in controls or (v) caution SNPs revealed by the Exome Chip Design group (see URLs). We obtained a final dataset including 6,899 SNPs genotyped on 3,519 cases and 6,275 controls for this project.

Cases and controls in stage 3 were genotyped using the Illumina HumanHap610-Quad BeadChip. Quality control filters were based on criteria described previously [20]. Methods of genotyping and quality control procedures are also presented in the Supplementary Note. After sample and SNP exclusions, we generated a dataset including 2,814 cases and 11,358 controls with 460,463 SNPs.

Stage 4 genotyping for 29 SNPs was conducted using the iPLEX Sequenom MassARRAY platform according to the manufacturer’s protocols at the Vanderbilt Molecular Epidemiology Laboratory (Nashville, Tennessee, United States). Details of genotyping and quality control are provided in the Supplementary Note. We filtered out SNPs with (i) genotype call rate <95%, (ii) genotyping consistency rate <95% in positive control samples, (iii) an unclear genotype call or (iv) P for HWE < 10⁻⁵ in controls. The average consistency rate of the SNPs passing quality control filters was 99.9%, with a median value of 100%, in each of the five participating studies included in this stage.

Samples in GECCO, CORECT and CCFR were genotyped with Illumina and Affymetrix arrays [17, 21]. Genotyping, quality control and imputation have been reported previously [17, 21] and are described in the Supplementary Note.
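The stage 2 marker exclusion criteria listed above can be expressed as a simple filter. The sketch below is an illustration under stated assumptions, not the pipeline used in the study: the `MarkerStats` record and function names are hypothetical, and the HWE P value is computed with a 1-degree-of-freedom chi-square goodness-of-fit test, whereas the source does not say which HWE test was used.

```python
import math
from dataclasses import dataclass


@dataclass
class MarkerStats:
    """Hypothetical per-SNP summary; field names are illustrative."""
    name: str
    maf: float            # minor allele frequency
    call_rate: float      # fraction of samples with a genotype call
    consistency: float    # concordance in positive quality control samples
    n_aa: int             # control genotype counts
    n_ab: int
    n_bb: int
    caution_flag: bool = False  # listed as a caution SNP by the array designers


def hwe_chi2_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) test of Hardy-Weinberg proportions (assumed test)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic: no departure can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_stage2_marker_qc(m: MarkerStats) -> bool:
    """Apply exclusion criteria (i) to (v) from the text; True means keep."""
    return (
        m.maf > 0.0
        and m.call_rate >= 0.98
        and m.consistency >= 0.98
        and hwe_chi2_p(m.n_aa, m.n_ab, m.n_bb) >= 1e-5
        and not m.caution_flag
    )
```

The stage 4 filters have the same shape with the call rate and consistency thresholds lowered to 95%.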

SNP selection

Selection of SNPs for stage 2 replication was primarily based on the following criteria: (i) P < 0.05 in meta-analysis, (ii) P for heterogeneity > 0.0001, (iii) imputation R² > 0.5 in each of the included studies, (iv) MAF > 0.05 in each of the included studies, (v) SNPs uncorrelated with established CRC SNPs (defined as r² < 0.2 in HapMap Asian), (vi) SNPs uncorrelated with other SNPs identified in this project (r² < 0.2) and (vii) data available in at least two studies (see Supplementary Note). We included multiple SNPs in some regions with a prior P value of < 0.002 or with genes of interest. Risk variants identified from previously published GWAS were also included in the assay [7–20]. In total, 8,569 unique SNPs were selected. Of them, 7,113 SNPs were successfully designed. For stage 3 replication, we selected 559 SNPs according to the following criteria: (i) P < 0.005 in meta-analysis of data from stages 1 and 2, (ii) association in the same direction in both stages and (iii) P for heterogeneity > 0.0001. For stage 4, we selected 30 SNPs on the basis of the following criteria: (i) P < 0.0001 in meta-analysis of stages 1, 2 and 3, (ii) P < 0.01 in meta-analysis of stages 2 and 3, (iii) association in the same direction in three stages and (iv) P for heterogeneity > 0.0001.
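The stage 3 and stage 4 selection rules above reduce to a few threshold checks per SNP. The sketch below restates them as predicates. The function and argument names are hypothetical, and effect direction is represented by the sign of a per-stage log odds ratio, which is an assumption about how direction would be encoded.

```python
def same_direction(betas) -> bool:
    """True if all per-stage effect estimates (log ORs) share one sign."""
    betas = list(betas)
    return all(b > 0 for b in betas) or all(b < 0 for b in betas)


def select_for_stage3(p_meta_12: float, beta_stage1: float,
                      beta_stage2: float, p_het: float) -> bool:
    """Stage 3 criteria: P < 0.005 (stages 1+2), same direction, P_het > 0.0001."""
    return (
        p_meta_12 < 0.005
        and same_direction([beta_stage1, beta_stage2])
        and p_het > 0.0001
    )


def select_for_stage4(p_meta_123: float, p_meta_23: float,
                      betas_by_stage, p_het: float) -> bool:
    """Stage 4 criteria: P < 0.0001 (stages 1-3), P < 0.01 (stages 2+3),
    same direction in three stages, P_het > 0.0001."""
    return (
        p_meta_123 < 0.0001
        and p_meta_23 < 0.01
        and same_direction(betas_by_stage)
        and p_het > 0.0001
    )
```

A SNP with a strong combined P value is still dropped if any stage points the other way, which is what keeps the staged design from carrying forward signals driven by a single study.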

Statistical and bioinformatic analysis

Details of imputation and population substructure evaluation are provided in the Supplementary Note. Briefly, stage 1 imputation was performed with the CHB (Han Chinese in Beijing, China) and JPT (Japanese in Tokyo, Japan) HapMap 2 panel as the reference using the program MACH v1.0 [70] (see URLs). Stage 3 imputation was conducted with phased data of JPT/CHS/CHD participants from the 1000 Genomes Project phase1 v3 as the reference using the programs MACH v1.0 [70] and minimac [71] (see URLs). Regional imputation of genotype data from The Cancer Genome Atlas (TCGA) [30] (see URLs) was performed with the GIANT ALL reference panel from the 1000 Genomes Project phase1 release v3 using MACH v1.0 [70] and minimac [71] (see URLs). To evaluate the imputation quality in our study, we directly genotyped the ten newly identified risk variants in approximately 2,800 samples included in stage 1. The concordance between imputed and genotyped data was very high, with mean values ranging from 96.00% to 99.96% for the ten SNPs (Supplementary Table 20). For rs10849432, the imputation quality for the Aichi-1 study was relatively low (R² = 0.57), and thus data from this study were not included in our final analysis.

We evaluated population structure in studies included in stages 1 and 2 using principal components analysis with EIGENSTRAT software [72] (see URLs). Based on adjusted regression models including the first ten principal components, the genomic inflation factor λ was <1.04 in each of the five studies included in stage 1 and 1.0368 in the meta-analysis of all five studies (Supplementary Fig. 2). The λ was <1.05 in each of the three studies included in stage 2 and 1.0525 in the meta-analysis of all three studies (Supplementary Fig. 3). A rescaled inflation statistic λ₁₀₀₀, representing an equivalent value for a study with 1,000 cases and 1,000 controls and calculated using the formula λ₁₀₀₀ = 1 + 500 × (λ − 1) × (1/N_cases + 1/N_controls) [73], was 1.01 in both stages 1 and 2. These findings showed little evidence of population stratification in our studies.

Associations between SNPs and CRC risk were evaluated on the basis of the log-additive model using mach2dat [70], PLINK version 1.0.7 [74], R version 3.0.0 and SAS version 9.3 (for all of these see URLs). Per-allele odds ratios (ORs) and 95% confidence intervals (CIs) were derived from logistic regression models, adjusting for age, sex and the first ten principal components when appropriate. Association analysis was conducted for each participating study separately, and a fixed-effects meta-analysis was conducted to obtain summary results for each of the four stages and all stages combined with the inverse-variance method using the program METAL [75]. SNPs showing an association at P < 5 × 10⁻⁸ in the combined analysis of all studies were considered genome-wide significant. We also performed stratified analyses for the top SNPs by tumor anatomic site (colon and rectum), population (Chinese, Korean and Japanese) and sex (men and women). We estimated heterogeneity across studies and subgroups with a Cochran’s Q test [76], with P for heterogeneity < 0.008 considered statistically significant to account for multiple comparisons of six independent loci. Independent signals in a locus were identified using stepwise logistic regression models conditioning on the top risk variant we identified in each of the new loci using R software (see URLs). We estimated haplotype frequencies using Haploview version 4.2 [77] (see URLs) and conducted haplotype association analysis for two loci (11q12.2 and 19q13.2) where two or more SNPs were identified using SAS Genetics v9.3 with logistic regression models. Pairwise SNP-SNP interactions between six top risk variants in the newly identified loci with P < 5 × 10⁻⁸ and also between these six SNPs and the risk variants in 25 previously reported loci were evaluated using the maximal likelihood ratio test with inclusion of interaction terms into logistic regression models. Interactions with P < 0.00028 were considered statistically significant with the adjustment of multiple comparisons of 180 tests.

The familial relative risk (λ) to offspring of an affected individual due to a single locus was estimated using a log-additive model: λ = (pr² + q)/(pr + q)², where p is the frequency of the risk allele, q = 1 − p is the frequency of the reference allele, and r is the per-allele relative risk [78]. The proportion of the familial relative risk explained by this locus, assuming a multiplicative interaction between markers in the locus and other loci, was calculated as ln(λ)/ln(λ₀), where λ₀ is the overall familial relative risk. λ₀ was set to 2.2 for CRC, as estimated from a meta-analysis [79]. Assuming that the risks associated with the individual loci combine multiplicatively, the familial relative risks also multiply. The combined contribution of the familial relative risks from multiple loci is then equal to ln(∏ᵢ λᵢ)/ln(λ₀), where λᵢ is the familial relative risk due to locus i.

We generated forest plots and Q-Q plots using R software (see URLs). Regional association plots for SNPs in newly identified loci were generated using the web-based tool LocusZoom version 1.1 [80] (see URLs). Linkage disequilibrium (LD) structure between SNPs was determined on the basis of data from the 1000 Genomes Project Pilot 1 or HapMap 2 as provided by the web-based tool SNAP [81] (see URLs) and plotted using Haploview, SNAP and the UCSC Genome Browser (see URLs). LD blocks were defined using HapMap recombination rates and hotspots [23]. All the genomic coordinates are based on the National Center for Biotechnology Information (NCBI) Build 36.

To identify putative functional variants for newly identified loci, we identified all SNPs in LD (i.e., r² > 0.5) with the risk variants using data from the 1000 Genomes Project [22] and HapMap 2 [23]. We mapped the genomic locations of these SNPs to nonsynonymous sites, splice sites, promoters, nearGene-3 regions, nearGene-5 regions, three prime untranslated regions (3′-UTR), five prime untranslated regions (5′-UTR), introns and intergenic regions. We evaluated the potential functional effect of nonsynonymous SNPs using the prediction algorithms SIFT [36] and PolyPhen-2 [37] (see URLs). We predicted the putative function of SNPs in promoters, nearGene-3 regions, nearGene-5 regions, 3′-UTR and 5′-UTR with the SNPinfo Web Server [45] (see URLs). We conducted analyses to evaluate the potential regulatory effect of SNPs in non-coding regions on transcription using the Encyclopedia of DNA Elements (ENCODE) tool HaploReg v2 [82] and the UCSC Genome Browser (see URLs) on the basis of their location within regions of promoter or enhancer activity, DNase I hypersensitivity, local histone modifications, proteins bound to these regulatory sites, cis expression quantitative trait loci (eQTL) and transcription factor binding motifs. We obtained additional functional evidence for these SNPs from the published literature.

We identified all genes that localize in a 1-Mb window centered on the top risk variants in our newly identified loci and including SNPs correlated (r² > 0.5) with the top risk variants. To determine whether these genes may explain the observed association in these loci, we first examined genome-wide cis eQTL data in multiple tissues from four major eQTL databases: the Blood eQTL browser [25], the eQTL Browser [26], the Genotype-Tissue Expression project (GTEx) [27] and the Multiple Tissue Human Expression Resource project (MuTHER) [28]. The significance threshold for these analyses was set to P < 0.008 to account for six tests. Somatic mutations of these genes were evaluated using data from the Catalogue of Somatic Mutations in Cancer (COSMIC) [29] (see URLs). Expression levels of these genes in CRC cell lines were assessed using data from Gene Expression Atlas [31] (see URLs). To correct for multiple comparisons of the 11 key genes, associations with P < 0.0045 were considered to be statistically significant. We searched the published literature for these genes in relation to CRC from PubMed and OMIM (see URLs).

Expression analysis

We downloaded RNA sequencing (level 1) and SNP array (level 2) data for 364 colon adenocarcinoma and 18 normal colon tissue samples from TCGA [30] (see URLs). To quantify expression levels of candidate genes in the newly identified loci, we normalized gene expression levels using the reads per kilobase of exon per million mapped reads (RPKM) value as previously described [83]. Expression differences between tumor and normal samples for each gene were evaluated on the basis of the RPKM values with the Wilcoxon rank sum test. Associations between gene RPKM value and SNP genotypes were analyzed using a linear regression model including age and sex as covariates. We converted the RPKM value of a gene to log scale for analysis if it was not normally distributed. We considered P <0.0045 to be statistically significant with adjustment for testing of the 11 key genes.
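As a brief illustration of the quantities used above, the sketch below computes an RPKM value from raw counts and the Bonferroni threshold for 11 genes. The function names are hypothetical, and the read counts in the example are invented for illustration. The normalization actually applied in the study follows ref. 83 and may differ in detail.

```python
def rpkm(read_count: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon per million mapped reads."""
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Invented example: 1,000 reads on 2,000 bp of exon in a 10-million-read library.
print(rpkm(1_000, 2_000, 10_000_000))
# Threshold for 11 genes at alpha = 0.05, close to the 0.0045 used in the text.
print(bonferroni_threshold(0.05, 11))
```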

83 in total

1. Functional FEN1 genetic variants contribute to risk of hepatocellular carcinoma, esophageal cancer, gastric cancer and colorectal cancer.

Authors: Li Liu; Changchun Zhou; Liqing Zhou; Li Peng; Dapeng Li; Xiaojiao Zhang; Mo Zhou; Pengqun Kuang; Qipeng Yuan; Xianrang Song; Ming Yang
Journal: Carcinogenesis Date: 2011-11-09 Impact factor: 4.944

2. SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap.

Authors: Andrew D Johnson; Robert E Handsaker; Sara L Pulit; Marcia M Nizzari; Christopher J O'Donnell; Paul I W de Bakker
Journal: Bioinformatics Date: 2008-10-30 Impact factor: 6.937

3. Mutations in the SMAD4/DPC4 gene in juvenile polyposis.

Authors: J R Howe; S Roth; J C Ringold; R W Summers; H J Järvinen; P Sistonen; I P Tomlinson; R S Houlston; S Bevan; F A Mitros; E M Stone; L A Aaltonen
Journal: Science Date: 1998-05-15 Impact factor: 47.728

4. Quantitative synthesis in systematic reviews.

Authors: J Lau; J P Ioannidis; C H Schmid
Journal: Ann Intern Med Date: 1997-11-01 Impact factor: 25.391

5. Motility related protein 1 (MRP1/CD9) expression in colon cancer.

Authors: M Mori; K Mimori; T Shiraishi; M Haraguchi; H Ueo; G F Barnard; T Akiyoshi
Journal: Clin Cancer Res Date: 1998-06 Impact factor: 12.531

6. A transforming growth factorbeta1 signal peptide variant increases secretion in vitro and is associated with increased incidence of invasive breast cancer.

Authors: Alison M Dunning; Peter D Ellis; Simon McBride; Heidi L Kirschenlohr; Catherine S Healey; Paul R Kemp; Robert N Luben; Jenny Chang-Claude; Arto Mannermaa; Vesa Kataja; Paul D P Pharoah; Douglas F Easton; Bruce A J Ponder; James C Metcalfe
Journal: Cancer Res Date: 2003-05-15 Impact factor: 12.701

7. Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer.

Authors: Richard S Houlston; Emily Webb; Peter Broderick; Alan M Pittman; Maria Chiara Di Bernardo; Steven Lubbe; Ian Chandler; Jayaram Vijayakrishnan; Kate Sullivan; Steven Penegar; Luis Carvajal-Carmona; Kimberley Howarth; Emma Jaeger; Sarah L Spain; Axel Walther; Ella Barclay; Lynn Martin; Maggie Gorman; Enric Domingo; Ana S Teixeira; David Kerr; Jean-Baptiste Cazier; Iina Niittymäki; Sari Tuupanen; Auli Karhu; Lauri A Aaltonen; Ian P M Tomlinson; Susan M Farrington; Albert Tenesa; James G D Prendergast; Rebecca A Barnetson; Roseanne Cetnarskyj; Mary E Porteous; Paul D P Pharoah; Thibaud Koessler; Jochen Hampe; Stephan Buch; Clemens Schafmayer; Jurgen Tepel; Stefan Schreiber; Henry Völzke; Jenny Chang-Claude; Michael Hoffmeister; Hermann Brenner; Brent W Zanke; Alexandre Montpetit; Thomas J Hudson; Steven Gallinger; Harry Campbell; Malcolm G Dunlop
Journal: Nat Genet Date: 2008-11-16 Impact factor: 38.330

8. The tetraspanin CD9 inhibits the proliferation and tumorigenicity of human colon carcinoma cells.

Authors: Susana Ovalle; María Dolores Gutiérrez-López; Nieves Olmo; Javier Turnay; María Antonia Lizarbe; Pedro Majano; Francisca Molina-Jiménez; Manuel López-Cabrera; María Yáñez-Mó; Francisco Sánchez-Madrid; Carlos Cabañas
Journal: Int J Cancer Date: 2007-11-15 Impact factor: 7.396

9. Common genetic variants at the CRAC1 (HMPS) locus on chromosome 15q13.3 influence colorectal cancer risk.

Authors: Emma Jaeger; Emily Webb; Kimberley Howarth; Luis Carvajal-Carmona; Andrew Rowan; Peter Broderick; Axel Walther; Sarah Spain; Alan Pittman; Zoe Kemp; Kate Sullivan; Karl Heinimann; Steven Lubbe; Enric Domingo; Ella Barclay; Lynn Martin; Maggie Gorman; Ian Chandler; Jayaram Vijayakrishnan; Wendy Wood; Elli Papaemmanuil; Steven Penegar; Mobshra Qureshi; Susan Farrington; Albert Tenesa; Jean-Baptiste Cazier; David Kerr; Richard Gray; Julian Peto; Malcolm Dunlop; Harry Campbell; Huw Thomas; Richard Houlston; Ian Tomlinson
Journal: Nat Genet Date: 2007-12-16 Impact factor: 38.330

Review 10. Functional regulation of FEN1 nuclease and its link to cancer.

Authors: Li Zheng; Jia Jia; L David Finger; Zhigang Guo; Cindy Zer; Binghui Shen
Journal: Nucleic Acids Res Date: 2010-10-06 Impact factor: 16.971
