Literature DB >> 22076443

Refinement of the associations between risk of colorectal cancer and polymorphisms on chromosomes 1q41 and 12q13.13.

Sarah L Spain1, Luis G Carvajal-Carmona, Kimberley M Howarth, Angela M Jones, Zhan Su, Jean-Baptiste Cazier, Jennet Williams, Lauri A Aaltonen, Paul Pharoah, David J Kerr, Jeremy Cheadle, Li Li, Graham Casey, Pavel Vodicka, Oliver Sieber, Lara Lipton, Peter Gibbs, Nicholas G Martin, Grant W Montgomery, Joanne Young, Paul N Baird, Hans Morreau, Tom van Wezel, Clara Ruiz-Ponte, Ceres Fernandez-Rozadilla, Angel Carracedo, Antoni Castells, Sergi Castellvi-Bel, Malcolm Dunlop, Richard S Houlston, Ian P M Tomlinson.   

Abstract

In genome-wide association studies (GWASs) of colorectal cancer, we have identified two genomic regions in which pairs of tagging-single nucleotide polymorphisms (tagSNPs) are associated with disease; these comprise chromosomes 1q41 (rs6691170, rs6687758) and 12q13.13 (rs7163702, rs11169552). We investigated these regions further, aiming to determine whether they contain more than one independent association signal and/or to identify the SNPs most strongly associated with disease. Genotyping of additional sample sets at the original tagSNPs showed that, for both regions, the two tagSNPs were unlikely to identify a single haplotype on which the functional variation lay. Conversely, one of the pair of SNPs did not fully capture the association signal in each region. We therefore undertook more detailed analyses, using imputation, logistic regression, genealogical analysis using the GENECLUSTER program and haplotype analysis. In the 1q41 region, the SNP rs11118883 emerged as a strong candidate based on all these analyses, sufficient to account for the signals at both rs6691170 and rs6687758. rs11118883 lies within a region with strong evidence of transcriptional regulatory activity and has been associated with expression of PDGFRB mRNA. For 12q13.13, a complex situation was found: SNP rs7972465 showed stronger association than either rs11169552 or rs7136702, and GENECLUSTER found no good evidence for a two-SNP model. However, logistic regression and haplotype analyses supported a two-SNP model, in which a signal at the SNP rs706793 was added to that at rs11169552. Post-GWAS fine-mapping studies are challenging, but the use of multiple tools can assist in identifying candidate functional variants in at least some cases.

Entities:  

Mesh:

Year:  2011        PMID: 22076443      PMCID: PMC3263985          DOI: 10.1093/hmg/ddr523

Source DB:  PubMed          Journal:  Hum Mol Genet        ISSN: 0964-6906            Impact factor:   6.150


INTRODUCTION

Using genome-wide association studies (GWASs), we have identified 14 regions that contain tagging single nucleotide polymorphisms (tagSNPs) associated with the risk of colorectal cancer (CRC) (1). Within three of these regions—chromosomes 14q22.2, 15q13.3 and 20p12.3—we have shown that there exist two SNPs that are independently associated with disease (2). In two further regions—chromosomes 1q41 and 12q13.13—there are two SNPs associated with CRC risk, but from the original GWA analysis, it was unclear as to whether these represented independent signals of association (1). At 1q41, these SNPs are rs6691170 (chr1: 220,112,069 bases) and rs6687758 (chr1: 220,231,571); they are in modest pairwise linkage disequilibrium (LD) (r2= 0.22; D′ = 0.71). At 12q13.13, the two SNPs are rs7136702 (chr12: 49,166,483) and rs11169552 (chr12: 49,441,930); these SNPs too are moderately correlated (r2= 0.11, D′ = 0.76). Our previous analyses had not resolved the issue of whether there could be more than one independent CRC SNP in either of these regions (1). One of the aims of GWASs is the discovery of functional/causal variants, the effects of which are manifest in the tagSNP associations. It is, however, very challenging to proceed from a tagSNP association to identifying functional variants, and relatively few such studies have been reported to date. One reason for this is that the correlation matrix between tagSNP(s) and functional variant(s) at any locus may be complex. If two association signals occur at tagSNPs at the same locus, the possible causes include the following: the associated tagSNPs are in LD; there are two independent functional sites, each in LD with one tagSNP; there are two functional sites, but there is true epistasis; there is a single functional site on a haplotype defined by the two tagSNPs; there are >2 independent functional sites in LD with one or more tagSNPs; there is a mixture of the above possibilities. It can be extremely hard to distinguish among these possibilities and our inability to de-convolute association signals may help explain why so much of the heritability of complex diseases is unexplained by GWASs to date (3). Despite these problems, functional variant discovery may be aided by a deeper examination of genetic variation in the LD blocks in which the tagSNPs reside. Such discovery is likely to benefit from efforts such as the 1000 Genomes Project, where a comprehensive discovery of novel variants has been carried out in several populations. In this study, we had three aims. First, we wished to investigate as fully as possible whether there was likely to be one or more than one functional variant underlying the association signals at 1q41 and 12q13.13 in CRCs. Secondly, we wanted to investigate other tagSNPs in these regions for evidence of further, independent association signals. Thirdly, we wished to use imputation and functional annotation to refine the most likely location of the ‘disease-causing’ variant in both the 1q41 and 12q13.13 regions.

RESULTS

The 1q41 region

We genotyped a total of 48 174 samples (22 832 cases and 25 892 controls) from 17 sample sets at rs6691170 and rs6687758. This analysis included five replication case/control cohorts that were not previously reported (1) for these SNPs: Kentucky; Prague; EPICOLON; Leiden; and Australia. After meta-analysis in STATA, both rs6691170 and rs6687758 were, as expected, significantly associated with CRC risk (Table 1), with no evidence of heterogeneity among studies. Incorporating both SNPs into an unconditional logistic regression model showed that neither of the pair of SNPs fully captured the association signal in the region [odds ratio (OR) = 1.06, P = 1.06 × 10−4 for rs6691170 and OR = 1.07, P = 2.48 × 10−4 for rs6687758]. We used PLINK to examine the possibility that the two tagSNPs indicated a single high-risk haplotype on which an unknown functional SNP was present (that is, all the functional risk alleles resided on a haplotype composed solely of one of the four possible pairs of tagSNP alleles). However, the association signal was not simply present on the high-risk haplotype TG (for rs6991170|rs6687758). Instead, the risks for the ‘compound’ (high-low or low-high) haplotypes—GG and TA—were greater than those for the low-low haplotype (GA), inconsistent with a functional SNP being in complete LD with a haplotype indicated by the pair of tagSNPs (Supplementary Material, Table S1). We also tested for evidence of epistasis between rs6691170 and rs6687758 using case–control logistic regression analysis, incorporating interaction between SNPs as a variable, but no evidence of deviation from log-additive SNP effects was found (P = 0.292).
Table 1.

Summary of genotyping and association results at the original four tagSNPs on 1q41 and 12q13.13 in the extended data sets

SummarySeriesCa11Ca12Ca22Co11Co12Co22Ca1Ca2Co1Co2MAF caMAF coORNtotNcaNco
rs6691170; chr1: 220,112,069; z = 6.87; P = 6.42 × 10−12; OR = 1.10; 95% CI = 1.07–1.12; Phet = 0.39; I2 = 5.9%; allele 1 = T; allele 2 = GUK1/CORGI130435355100429393695114562912150.380.341.1721215920922
Scotland1/COGS134463379130433435731122169313030.370.351.1261303976998
UK2/NSCCG398139510583551304115921913511201436220.380.361.122362228512818
Scotland2/SOCCS24894181723996785114372575144526690.360.351.031266920062057
VQ582778336883591234109613872209195234260.390.361.102342617982689
CFR149581447155436399879147574612340.370.380.98612341177990
UK3/NSCCG406144811373671251119822603722198536470.380.351.116364729912816
Scotland3/SOCCS103376326117447363582102868111730.360.370.9751173805927
UK4/CORGI2BCD7021221312947344535263873113630.360.351.02913634951047
Cambridge3241068805280101389017162678157327930.390.361.138279321972183
COIN/NBS30010547973261170100516542648182231800.380.361.090318021512501
Helsinki143435351105372340721113758210520.390.361.1461052929817
Prague1474243638231725271811504818210.380.371.066821934651
Kentucky1564663882447096307781242119719690.390.381.030196910101583
EPICOLON1936135201846325789991653100017880.380.361.081178813261394
Australia64223129582121683514813285480.420.371.219548416438
Leiden1414043109229130468610244758990.400.351.268899855687
rs6687758; chr1: 220,231,571; z = 5.64; P = 1.70 × 10−8; OR = 1.09; 95% CI = 1.06–1.13; Phet = 0.28; I2 = 14.8%; allele 1 = G; allele 2 = AUK1/CORGI3731256832299598386144836314950.210.201.0981495917929
Scotland1/COGS6330860634325642434152039316090.220.201.16916099771001
UK2/NSCCG121985174698898182212274477109445420.220.191.138454228522818
Scotland2/SOCCS776941235746391344848316478733270.210.191.133332720062057
VQ5886605110611383217427772817105843160.220.201.125431617972687
CFR5036475651327607464187642915410.200.220.88815411170985
UK3/NSCCG1229471920115850186111914787108045720.200.191.053457229892826
Scotland3/SOCCS4826351951315566359130141714470.220.220.9581447830932
UK4/CORGI2BCD241583064530966920677039916470.210.201.10416474881023
Cambridge897551366766641444933348781635520.210.191.165355222102184
COIN/NBS1027011330897701642905336194840540.210.191.151405421332501
Helsinki6738547649317437519133741511910.280.261.1141191928803
Prague4733555233230388429143929610060.230.231.0131006934651
Kentucky41312657575091017394162662325430.200.200.989254310101583
EPICOLON5742984046442906543210953422540.200.191.087225413261394
Australia25151264171522692016791866900.230.211.098690440438
Leiden4528452128212448374132626811080.220.191.1661108850688
rs7136702; chr12: 49,166,483; z = 6.69; P = 2.23 × 10−11; OR = 1.10; 95% CI = 1.07–1.12; Phet = 0.51; I2 = 0.0%; allele 1 = T; allele 2 = CUK1/CORGI131433357113430386695114765612020.380.351.1101202921929
Scotland1/COGS146443388126444431735121969613060.380.351.13113069771001
UK2/NSCCG380133111403291306118320913611196436720.370.351.083367228512818
Scotland2/SOCCS27697575527593584715272485148526290.380.361.088262920062057
VQ582378696942951290110213432257188034940.370.351.106349418002687
CFR155604427103444450914145865013440.390.331.29613441186997
UK3/NSCCG402138811903591283118021923768200136430.370.351.059364329802822
Scotland3/SOCCS11831027012240135654685064511130.390.371.1081113698879
UK4/CORGI2BCD8121519015146644437759576813540.390.361.11713544861061
Cambridge332955903261101590616192761153728270.370.351.079282721902182
COIN/NBS2878938443211121105914672581176332390.360.351.044323920242501
Helsinki10338943672334414595126147811620.320.291.1471162928820
Prague854194305729130358912794058970.320.311.020897934651
Kentucky1404783922157506187581262118019860.380.371.011198610101583
EPICOLON1986424861876235841038161499717910.390.361.155179113261394
Leiden1153883419226932161810704539110.370.331.162911844682
rs11169552; chr12: 49,441,930; z = 6.88; P = 5.99 × 10−12; OR = 0.90; 95% CI = 0.88–0.93; Phet = 0.50; I2 = 0.0%; allele 1 = T; allele 2 = CUK1/CORGI5632853767350512440140248413740.240.260.8911374921929
Scotland1/COGS6036954476406519489145755814440.250.280.86914449731001
UK2/NSCCG209106215801991124149414804222152241120.260.270.947411228512817
Scotland2/SOCCS1118081087152821108410302982112529890.260.270.918298920062057
VQ581096651026201104614428832717144839300.250.270.882393018002689
CFR7245066373408516594177655414400.250.280.86914401185997
UK3/NSCCG167117916252141142146315134429157040680.250.280.885406829712819
Scotland3/SOCCS141271768032149015547948113010.240.270.8751301317891
UK4/CORGI2BCD341752778039555424372955515030.250.270.90315034861029
Cambridge1558241241163853117211343306117931970.260.270.930319722202188
COIN/NBS1358181107189973133810883032135136490.260.270.969364920602500
Helsinki10340740115335630361312096629620.340.410.737962911812
Prague513755083827334047713913499530.260.270.936953934651
Kentucky5837757593665825493152785123150.240.270.878231510101583
EPICOLON5645381775471848565208762121670.210.220.945216713261394
Leiden5330449257251378410128836510070.240.270.8781007849686

Ca, cases; Co, controls; 11, rare homozygote; 12, heterozygote; 22, common homozygote; 1, minor allele; 2, major allele. Allele 1 is risk allele for rs6691170, rs6687758 and rs7136702; allele 2 is risk allele for rs11169552. MAF, minor allele frequency; OR, odds ratio.

Summary of genotyping and association results at the original four tagSNPs on 1q41 and 12q13.13 in the extended data sets Ca, cases; Co, controls; 11, rare homozygote; 12, heterozygote; 22, common homozygote; 1, minor allele; 2, major allele. Allele 1 is risk allele for rs6691170, rs6687758 and rs7136702; allele 2 is risk allele for rs11169552. MAF, minor allele frequency; OR, odds ratio. Having failed to find evidence for the simplest situations—namely that one of each tagSNP pair captured the great majority of the association signal or that the tagSNPs essentially acted as simple two-locus tags for the functional variants in each region—we attempted to deconvolute the 1q41 signal by association testing of imputed SNPs in the region. The three GWAS sample sets, UK1, Scotland 1 and VQ58, were imputed to the combined 1000 genomes and HapMap3 reference set. A total of 630 SNPs in the 220–221 Mb region on chromosome 1q41 was successfully imputed from 76 genotyped SNPs. The strongest association signal (Fig. 1, Supplementary Material, Table S2), as measured by association test P-value, was at rs11118883 (chr1: 220,127,645), an imputed SNP in moderate LD with rs6691170 (r2= 0.40, D′ = 0.74) and rs6687758 (r2= 0.31, D′ = 0.77).
Figure 1.

Individual SNP associations in the 1q41 region. Association testing was performed in SNPtest using typed and imputed genotypes from the three GWAS series (UK1, Scotland1 and VQ58) and displayed using SNAP (http://www.broadinstitute.org/mpg/snap/). The X-axis shows position on chromosome 1 and the Y-axis, −log10(P) from the per allele association test. The most strongly associated SNP, rs11118883, is shown as a large diamond, and the colours of other data points reflect the LD between that SNP and rs11118883. The smaller diamond points indicate genotyped SNPs and the triangles indicate imputed SNPs. The blue line represents recombination rates.

Individual SNP associations in the 1q41 region. Association testing was performed in SNPtest using typed and imputed genotypes from the three GWAS series (UK1, Scotland1 and VQ58) and displayed using SNAP (http://www.broadinstitute.org/mpg/snap/). The X-axis shows position on chromosome 1 and the Y-axis, −log10(P) from the per allele association test. The most strongly associated SNP, rs11118883, is shown as a large diamond, and the colours of other data points reflect the LD between that SNP and rs11118883. The smaller diamond points indicate genotyped SNPs and the triangles indicate imputed SNPs. The blue line represents recombination rates. We then used reverse stepwise logistic regression analysis to determine whether rs6691170 and rs6687758, or other combinations of SNPs, best accounted for the association between CRC and 1q41 variation. Using a final significance threshold of P = 0.01, we found that two imputed SNPs, rs11118883 and rs12726661, were most strongly associated with the CRC risk (Table 2, Supplementary Material, Table S2). By comparison, a joint analysis of rs6687758 and rs6691170 in the same three GWAS data sets gave much weaker evidence of association, as assessed using the Akaike Information Criterion (AIC). Indeed, a model incorporating rs11118883 alone—although not one with rs12726661 alone—provided a better fit than a model incorporating both rs6687758 and rs6691170; haplotype-based association analysis supported these findings (data not shown).
Table 2.

Two-SNP logistic regression analysis showing best signal in the 1q41 region in comparison with the originally reported SNPs

SNPsPositions (bases)LD (r2, D′)Risk allele (freqcases, freqcontrols)No. cases, no. controlsOR95% CIZP-valueAIC
rs6691170220,112,0690.15,T (0.38,0.36)3272,1.091.01–1.172.950.003210486
rs6687758220,231,5710.65G (0.22,0.20)45721.080.98–1.181.610.108
rs11118883220,127,6450.92,A (0.32,0.29)3206,2.321.49–3.623.712.07 × 10−410475
rs12726661220,134,4111.00A (0.68,0.71)44520.490.32–0.763.161.58 × 10−3

LD is shown for the pair of SNPs being tested. OR, odds ratio; AIC, Akaike information criterion (AIC = −2 * log-likelihood + 2 * (number of parameters)). Note the lower AIC, showing a better model fit, for the test of rs11118883 + rs12726661 compared with rs6691170 + rs6687758. Individual AICs for these four SNPs were, respectively, 10487, 10489, 10484 and 10487.

Two-SNP logistic regression analysis showing best signal in the 1q41 region in comparison with the originally reported SNPs LD is shown for the pair of SNPs being tested. OR, odds ratio; AIC, Akaike information criterion (AIC = −2 * log-likelihood + 2 * (number of parameters)). Note the lower AIC, showing a better model fit, for the test of rs11118883 + rs12726661 compared with rs6691170 + rs6687758. Individual AICs for these four SNPs were, respectively, 10487, 10489, 10484 and 10487. We were surprised to note that in a single-SNP analysis the direction of effect for rs12726661 was reversed—the minor allele was associated with disease risk—compared with that in the two-SNP analysis. We determined that rs11118883 and rs12726661 were in strong LD (r2= 0.98, D′ = 1.00) in our samples, consistent with data from the 1000 genomes project and HapMap3 that had been used for imputation. Examination of the genotype distribution in our data set showed that deviation from perfect LD between the SNPs resulted from two sets of individuals: (i) 50 homozygous for the major allele at rs12726661 and heterozygous at rs11118883; and (ii) 15 heterozygous at rs12726661 and homozygous for the minor allele at rs11118883. Specifically, 28/50 in category (i) were cases and 4/11 in category (ii) were cases. For these 65 individuals, the risk of CRC was significantly greater than that of individuals with the other genotypes at rs12726661 and rs11118883 (OR = 2.10, P = 0.003, χ21 test). A potential explanation for our apparently paradoxical findings is that there exists another allele, almost certainly relatively rare, that is associated with the minor allele of rs12726661 (but not with rs11118883), and that is protective against the CRC risk. We then analysed our UK1, Scotland 1 and VQ58 individuals using GENECLUSTER with the original GWAS SNP genotypes in the rs6691170/6687758 region as inputs. There was no evidence to favour an underlying two-locus model over a one-SNP model (Fig. 2). The predicted most strongly associated single SNP was rs11577023, a SNP that is in very strong LD with rs11118883 (r2= 0.93, D′ = 1.0) in our data.
Figure 2.

GENECLUSTER output for the 1q41 region. The upper left panel compares the Bayes factors (BFs) for models in which the association signals at rs6691170 and 6687758 are derived from either one functional SNP (red) or two functional SNPs (green). Recombination rates are also shown as a red line. The upper right panel shows the log10(BF) at the focal position—the site of the highest log10(BF), here chr1:220,129,000 bases—under one- and two-SNP models. The lower right panel shows reconstructed genealogies for UK1, Scotland1 and VQ58 combined, based on each individual's genotypes in the region from the Illumina Hap300/370/550 panels and HapMap2 data. The most likely positions of SNP origins under the one-SNP model (blue, rs11577023) and two-SNP model (green and red) are shown. These result in counts of cases and controls and relative risks as indicated in the upper right panel. The lower left panel shows haplotypes (rows) and SNPs (columns). Note that the region analysed extends for several Mb flanking rs6691170 and 6687758; although no signal reaches nominal significance at log10(BF) = 4, there is some evidence of a second independent region of 1q associated with CRC at ∼218.2 Mb, as we have reported previously. The importance of rs11577023 was supported by the Margarita analysis in which it was the second most strongly associated with disease (P = 3.59 × 10−4).

GENECLUSTER output for the 1q41 region. The upper left panel compares the Bayes factors (BFs) for models in which the association signals at rs6691170 and 6687758 are derived from either one functional SNP (red) or two functional SNPs (green). Recombination rates are also shown as a red line. The upper right panel shows the log10(BF) at the focal position—the site of the highest log10(BF), here chr1:220,129,000 bases—under one- and two-SNP models. The lower right panel shows reconstructed genealogies for UK1, Scotland1 and VQ58 combined, based on each individual's genotypes in the region from the Illumina Hap300/370/550 panels and HapMap2 data. The most likely positions of SNP origins under the one-SNP model (blue, rs11577023) and two-SNP model (green and red) are shown. These result in counts of cases and controls and relative risks as indicated in the upper right panel. The lower left panel shows haplotypes (rows) and SNPs (columns). Note that the region analysed extends for several Mb flanking rs6691170 and 6687758; although no signal reaches nominal significance at log10(BF) = 4, there is some evidence of a second independent region of 1q associated with CRC at ∼218.2 Mb, as we have reported previously. The importance of rs11577023 was supported by the Margarita analysis in which it was the second most strongly associated with disease (P = 3.59 × 10−4). We genotyped rs11118883 directly in a set of 84 UK control samples and found complete concordance with the imputed genotypes. rs11118883 (chr1:220,127,645) lies in a gene desert, within a region of LD that extends approximately from 220.0 to 220.3 Mb. The nearest gene, ∼150 kb towards the centromere, is the MAP kinase regulator dual-specificity phosphatase 10 (DUSP10). DUSP10 inactivates p38 and also the Jun N-terminal kinase that phosphorylates c-Jun which is believed to play a role in CRC pathogenesis. rs11118883 itself lies upstream of DUSP10 within a region with strong evidence of transcriptional regulatory activity (http://genome.ucsc.edu). Using 1000 genomes data, we found that rs11118883 is in strong LD (r2> 0.7) with at least six SNPs (rs12738322, rs12726661, rs4129271, rs11577023, rs10746414 and rs12137702). Of these, rs10746414 and rs12137702 are also close to regions with potential effects on transcription.

The 12q13.13 region

Analysis of the 12q13.13 region proceeded in parallel with that of the 1q41 region using essentially the same strategy. We initially confirmed the individual associations of SNPs rs7136702 and rs11169552 with the CRC risk in the extended data sets (Table 1). Unconditional logistic regression analysis did not exclude the possibility that the two SNPs had independent effects; for rs7136702 and rs11169552, the association statistics were P = 1.63 × 10−5 (OR = 1.07) and P = 1.70 × 10−7 (OR = 0.92), respectively, showing that one SNP did not simply capture all of the association signals. Further analysis showed that the association signal was not derived from a single high-risk haplotype tagged by rs7136702 and rs11169552 (Supplementary Material, Table S3) and there was no evidence of epistasis between the SNPs (P = 0.903). We imputed SNPs within the 48.5–50 Mb region of chromosome 12 using the combined 1000 genomes and HapMap3 reference panel in the 3 GWAS sample sets (UK1, Scotland 1 and VQ58) (Fig. 3, Supplementary Material, Table S4). A total of 2736 SNPs was successfully imputed from 158 genotyped SNPs. The most significant single-SNP association was at the imputed SNP rs7972465 [OR = 1.18, 95% confidence interval (CI) 1.11–1.27, P = 8.22 × 10−7), a signal slightly stronger than that of rs11169552 (OR = 0.85, 95% CI 0.79–0.91, P = 1.08 × 10−5) and notably stronger than that of rs7136702 (OR = 1.13, 95% CI 2.06–1.21, P = 3.85 × 10−4). Direct genotyping in 91 UK control individuals showed that imputation of rs7972465 was very good, although not perfect (r2= 0.93).
Figure 3.

Individual SNP associations in the 12q13.13 region. Legend is as for Figure 1.

Individual SNP associations in the 12q13.13 region. Legend is as for Figure 1. Reverse stepwise logistic regression analysis was then used to assess whether rs11169552 and rs7136702, or other combinations of SNPs in the region, best accounted for the association between CRC and 12q13.13 variation (Table 3). Many highly correlated SNPs exist within the region, making this analysis difficult. Nonetheless, while rs11169552 remained in the regression model after stepwise elimination of less strongly associated SNPs, a number of SNPs provided improved or similar associations compared with rs7136702 in a two-SNP model with rs11169552. One of these SNPs was rs7972465 (Table 3, Fig. 4), but another SNP, rs706793, a SNP in very low LD with rs11169552 (Table 3), provided a larger improvement in the AIC (see also Supplementary Material, Table S4).
Table 3.

Two-SNP logistic regression analysis showing best signals in the 12q13.13 region in comparison with the original reported SNPs

SNPsGenotyped or imputed?Positions (bases)LD (r2, D′)Risk allele (freqcases, freqcontrols)No, cases, no. controlsOR95% CIzP-valueAIC
rs11169552Genotyped49,441,9300.040C (0.76, 0.73)32760.920.81–0.943.377.52 × 10−410473
rs7136702Genotyped49,166,4830.57T (0.37, 0.35)45761.081.01–1.162.130.033
rs11169552Genotyped49,441,9300.19C (0.76,0.73)32060.890.82–0.972.690.00710469
rs3184122Imputed48,856,3940.92C (0.41,0.37)44801.121.04–1.212.960.003
rs11169552Genotyped49,441,9300.06C (0.76, 0.73)32680.880.81–0.953.406.74 × 10−410468
rs35031884Imputed49,063,8401.00A (0.32,0.29)45541.151.05–1.253.190.0014
rs11169552Genotyped49,441,9300.17C (0.76,0.73)32680.900.83–0.972.640.00810466
rs7972465Imputed48,832,3920.89G (0.21,0.18)45631.141.06–1.223.436.04 × 10−4
rs11169552Genotyped49,441,9300.002C (0.76,0.73)32660.840.78–0.914.565.12 × 10−610426
rs706793Genotyped48,754,0360.095C (0.60,0.57)45570.490.32–0.763.230.0012

SNP pairs are shown in order of descending Aikake Information Criterion (AIC). Note that in single-SNP analysis, rs706793 provided only slightly worse evidence of association (OR = 0.90, 95% CI 0.84–0.96, P = 0.002) than in combined analysis with rs11169552. Individual AICs for rs11169552, rs7136702, rs3184122, rs35031884, rs7972465 and rs706793 were, respectively, 10476, 10483, 10475, 10477, 10471 and 10447. Incorporation of rs7972465 into a regression model with rs11169552 and rs706793 did not improve the model's fit (AIC = 10426).

Figure 4.

LD and main haplotypes at SNPs with best evidence of association on 12q13.13. Note that in this Haploview output from HapMap3 data, the alleles at rs706793 are shown on the opposite strand (that is G/A rather than C/T as used in the rest of this manuscript).

Two-SNP logistic regression analysis showing best signals in the 12q13.13 region in comparison with the original reported SNPs SNP pairs are shown in order of descending Aikake Information Criterion (AIC). Note that in single-SNP analysis, rs706793 provided only slightly worse evidence of association (OR = 0.90, 95% CI 0.84–0.96, P = 0.002) than in combined analysis with rs11169552. Individual AICs for rs11169552, rs7136702, rs3184122, rs35031884, rs7972465 and rs706793 were, respectively, 10476, 10483, 10475, 10477, 10471 and 10447. Incorporation of rs7972465 into a regression model with rs11169552 and rs706793 did not improve the model's fit (AIC = 10426). LD and main haplotypes at SNPs with best evidence of association on 12q13.13. Note that in this Haploview output from HapMap3 data, the alleles at rs706793 are shown on the opposite strand (that is G/A rather than C/T as used in the rest of this manuscript). We then undertook GENECLUSTER analysis of the UK1, Scotland 1 and VQ58 sample sets in the 12q13.13 region. There was no good evidence to distinguish between underlying two-locus and one-locus models (Fig. 5), although the association signal showed two peaks at ∼48.85 Mb (close to rs706793) and at ∼49.45 Mb (very close to rs11169552) that could not readily be explained by long-range LD between these two regions (Supplementary Material, Fig. S1). The predicted most strongly associated SNP under the one-SNP model was rs3184122 (Supplementary Material, Table S4), a variant that is in moderate or strong LD (Fig. 5) with rs11169552 (r2= 0.19, D′ = 0.92), rs7136702 (r2= 0.49, D′ = 0.73) and rs706793 (r2= 0.47, D′ = 0.94), and strong LD with rs7972465 (r2= 0.87, D′ = 1.00).
Figure 5.

GENECLUSTER output for the 12q13.13 region. The legend is as for Figure 2, except that the focal position is Chr12:48,849,000 and the double peak of association at ∼48.85 and 49.45 Mb should be noted. The top SNP (blue dot) under the one-SNP model is the imputed SNP rs3184122. The top-genotyped SNP in the GENECLUSTER analysis was rs7138945, which was the SNP with the second-best association signal in Margarita (P = 1.14 × 10−5).

GENECLUSTER output for the 12q13.13 region. The legend is as for Figure 2, except that the focal position is Chr12:48,849,000 and the double peak of association at ∼48.85 and 49.45 Mb should be noted. The top SNP (blue dot) under the one-SNP model is the imputed SNP rs3184122. The top-genotyped SNP in the GENECLUSTER analysis was rs7138945, which was the SNP with the second-best association signal in Margarita (P = 1.14 × 10−5). Since the various analyses had not resolved the question of whether there exist one or two independent CRC-associated SNPs in the 12q13.13 region, we used PLINK to examine the associations with disease of the haplotypes (Fig. 4) for rs706793, rs7972465 and rs11169552. As expected, the haplotype CGC was most strongly associated with risk (Table 4, Supplementary Material, Table S5). The G (risk) allele at rs7972465 was essentially present only on this haplotype, but it appeared that haplotypes containing the T allele at rs7972465 were not all low risk and therefore that rs7972465 did not explain all the association signal. We therefore considered the association signals when we fixed the alleles at rs706793 and rs11169552 and allowed those at rs7972465 to vary, and vice versa. Initially, we undertook simple comparisons between haplotype frequencies in cases and controls, and found that the rs706793 and rs11169552 risk alleles, but not the rs7972465 risk allele, were found at significantly higher frequencies in cases than controls (Supplementary Material, Table S6). Since this analysis suggested that there might be independent effects of rs706793 and rs11169552—and that the signal at rs7972465 resulted from LD with these two SNPs—we proceeded to a further evaluation of this possibility using conditional haplotype analysis in PLINK. We again compared two scenarios, (i) in which the CGC and CTC haplotypes were equivalent (that is, varying rs7972465) and (ii) in which the CTC and TTT haplotypes were equivalent (that is, varying rs706793 and rs11169552). No effect was seen in the first case (likelihood ratio test, P = 0.35), whereas there was a significant difference in the second case (P = 0.023), again supporting effects of rs706793 and rs11169552 rather than rs7972465.
Table 4.

Haplotype analysis in the 12q13.13 region

HaplotypeFreq. in casesFreq. in controlsORP-value
TTT0.09220.10850.825.3 × 10−4
CTT0.14070.15670.874.5 × 10−3
CGC0.38290.34431.196.2 × 10−7
TTC0.31020.31940.960.212
CTC0.07390.07101.050.483

Haplotypes (cen-tel) at rs706793, rs7972465 and rs11169552 were analysed, notwithstanding the low LD between the first and last of these SNPs. Five haplotypes with frequencies of >0.01 were predicted. Ca, cases; Co, controls. OR, odds ratio relative to all other haplotypes. P-value is from analysis of effects of all haplotypes on disease risk in a logistic regression model. Odds ratios and P-values relative to reference haplotype TTT are given in Supplementary Material, Table S5.

Haplotype analysis in the 12q13.13 region Haplotypes (cen-tel) at rs706793, rs7972465 and rs11169552 were analysed, notwithstanding the low LD between the first and last of these SNPs. Five haplotypes with frequencies of >0.01 were predicted. Ca, cases; Co, controls. OR, odds ratio relative to all other haplotypes. P-value is from analysis of effects of all haplotypes on disease risk in a logistic regression model. Odds ratios and P-values relative to reference haplotype TTT are given in Supplementary Material, Table S5. Further genotyping in additional sample sets strengthened the rs706793 association with CRC, although it did not reach formal significance and there was some evidence of inter-study heterogeneity, the origins of which remain unclear (Supplementary Material, Table S7). Logistic regression analysis in the extended sample set continued to support a model incorporating rs706793 and rs11169552 (P = 8.38 × 10−4 and P = 7.82 × 10−6, respectively, AIC = 27932) over one with rs7163702 and rs11169552 (P = 3.05 × 10−5 and P = 9.05 × 10−3, AIC = 27999). rs706793 (chr12:48,754,036) and rs11169552 (chr12:49,441,930) are separated by a predicted recombination hotspot at ∼48.8 Mb in the HapMap data (Fig. 3) but not in our own data (Fig. 5), although LD in the region is complex (Supplementary Material, Fig. S1). The 12q13.13 region contains coding genes ACCN2, SMARCD1, GPD1, LASS5, LIMA1 and ATF1. ACCN2 probably encodes an ion channel protein, SMARCD1 is part of chromatin remodelling complex SNF/SWI, GPD1 is glycerol-3-phosphate dehydrogenase and LASS5 is probably a ceramide synthase. LIMA1 codes for EPLIN, a protein downregulated in some cancers. ATF1 is a transcription factor centrally involved in the stress response and in the pathogenesis of angiomatoid fibrous histiocytoma and clear cell sarcoma through translocation. Supplementary Material, Table S8 lists SNPs in strong LD (r2> 0.70) with rs706793, rs7972465 or rs11169552, and provides annotation for those with evidence of potential roles in gene or protein regulation or function.

DISCUSSION

We have undertaken additional genotyping and more detailed analysis in order to understand better the dual tagSNP association signals that we observed on chromosomes 1q41 (rs6991170, rs6687758) and 12q13.13 (rs11169552, rs7136702) in a GWAS of CRC (1). In both cases, genotyping of additional sample series confirmed the originally reported associations, without demonstrating good evidence for the three simplest scenarios: independent functional variants; capture of the association signal by one of the pair of SNPs; or two-SNP tagging of a single haplotype on which functional variation lay. We therefore proceeded to more detailed analyses in each region, after imputation of genotypes where appropriate in the data sets with best coverage of each region (UK1, Scotland1 and VQ58). It is conceivable that the analysis of these three data sets, which had already been used in SNP discovery, would introduce a small amount of bias into the fine mapping. However, we reasoned that the marginal differences in association that might occur would be more than outweighed by the power provided by the use of these data sets. For 1q41, the single-SNP association test, logistic regression analysis and GENECLUSTER all found that SNP rs11118883, or a SNP in strong LD, was most likely to be responsible for the signal of association. This SNP itself is a very good functional candidate, lying within or immediately adjacent to regions bearing histone methylation and acetylation marks, DNAse I hypersensitive sites and sites of transcription factor binding (http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=195445293&c=chr1&g=wgEncodeReg). The SCAN expression Quantitative Trait Locus (eQTL) database (4) reports rs11118883 being associated (P = 8 × 10−5) in Europeans with expression of platelet-derived growth factor β (PDGFRB, chr5q31–q32), although this association requires confirmation in appropriate cell types for the CRC risk and is not present in the Genevar eQTL database (http://www.sanger.ac.uk/resources/software/genevar) (5). The possibility that the minor allele of rs12726661 is associated with a second, presumably rare, variant that is protective against the CRC risk is intriguing. While speculative, such a scenario has precedents, such as the MDM2 promoter SNP rs117039649 (6). For 12q13.13, a complex situation was found. Single-SNP analysis found variants with much stronger association signals than either rs11169552 or rs7136702, notably at rs7972465 although small imputation inaccuracies may have inflated this signal. GENECLUSTER analysis found no greater evidence for a two-SNP than one-SNP model and detected the best signal for the former at a SNP, rs3184122, that is in strong LD with rs7972465. Logistic regression analysis, however, supported a two-SNP model, in which a signal at rs706793 was added to that at rs11169552. rs706793 and rs11169552 are in very weak LD, but rs706793 is in moderate LD with rs7163702 (r2= 0.20, D′ = 0.60). Haplotype analysis supported the logistic regression analysis, in that the genotype at rs7972465 did not affect the risk associated with the rs706793–rs11169552 haplotypes, whereas the reverse scenario (high- versus low-risk rs706793–rs11169552 haplotypes) did affect risk. As regards eQTLs for the 12q13.13 SNPs, Genevar shows rs706793 to be associated with LASS5 expression (at P< 10−4), although this association is not reported in SCAN. Clearly, all post-GWAS fine-mapping studies face intrinsic difficulties, such as the use of imputed genotypes, despite the use of stringent criteria for SNP inclusion, and a limited ability to differentiate among association signals of similar magnitudes. The analysis of the 12q13.13 region illustrates some of these problems well. Although a much more strongly CRC-associated SNP than the original tagSNPs was identified through imputation, the balance of evidence slightly favours this signal resulting from two independent association signals, as we have previously found for the GREM1 locus (2). In the 1q41 region, in contrast, rs11118883—a SNP in moderate LD with both the original tagSNPs—emerged as an excellent candidate for the functional variant.

MATERIALS AND METHODS

Sample sets

The Kentucky samples comprised 1020 incident colon cancer cases and 1598 population controls of white European origin recruited between July 2003 and December 2009. Eligible cases were identified through the population-based Surveillance, Epidemiology and End Results (SEER) Kentucky Cancer Registry covering all residents living in the State of Kentucky at the time of diagnosis. We used random digital dialling to recruit population controls who were 40 years of age or older and had no personal history of cancer other than skin cancer. We excluded those with known inflammatory bowel diseases, family history of familial adenomatous polyposis and hereditary non-polyposis CRC. The Prague cases (7) were patients with histologically confirmed CRC recruited between September 2004 and February 2009 from nine oncology departments in the Czech Republic: Prague (two), Benesov, Brno, Liberec, Ples, Pribram, Usti nad Labem and Zlin. During this period, a total of 1554 cases provided blood samples. This study includes 1001 subjects who could be interviewed, provided biological samples and were genotyped. Controls were 683 hospital-based volunteers with negative colonoscopy results for malignancy or idiopathic bowel diseases (CFCC, cancer-free colonoscopy inspected controls). CFCCs were selected from among individuals admitted to the same hospitals during the same period of the recruitment of the cases. The reasons for undergoing the colonoscopy were: (i) positive faecal occult blood test, (ii) haemorrhoids, (iii) abdominal pain of unknown origin, or (iv) macroscopic bleeding. Details of other sample sets have been reported previously (2) and are provided briefly below. UK1 (CORGI) comprised 922 cases with colorectal neoplasia (47% male) ascertained through the Colorectal Tumour Gene Identification (CORGI) consortium. All had at least one first-degree relative affected by CRC and one or more of the following phenotypes: CRC at age 75 or less; any colorectal adenoma (CRAd) at age 45 or less; ≥3 CRAds at age 75 or less; or a large (>1 cm diameter) or aggressive (villous and/or severely dysplastic) adenoma at age 75 or less. The 929 controls (45% males, 55% females) were spouses or partners unaffected by cancer and without a personal family history (to second degree relative level) of colorectal neoplasia. Known dominant polyposis syndromes, HNPCC/Lynch syndrome or bi-allelic MUTYH mutation carriers were excluded. Scotland1 (COGS) included 980 CRC cases (51% male; mean age at diagnosis 49.6 years, SD ± 6.1) and 1002 cancer-free population controls (51% male; mean age 51.0 years; SD ± 5.9). Cases were for early age at onset (age ≤ 55 years). Known dominant polyposis syndromes, HNPCC/Lynch syndrome or bi-allelic MUTYH mutation carriers were excluded. Control subjects were sampled from the Scottish population NHS registers, matched by age (±5 years), gender and area of residence within Scotland. VQ58 comprised 1832 CRC cases (1099 males, mean age of diagnosis 62.5 years; SD ± 10.9) from the VICTOR and QUASAR2 (www.octo-oxford.org.uk/alltrials/trials/q2.html) clinical trials of adjuvant therapy in stage II/III CRC. There were 2720 population control genotypes (1391 males) from the Wellcome Trust Case-Control Consortium 2 (WTCCC2) 1958 birth cohort (also known as the National Child Development Study), which included all births in England, Wales and Scotland during a single week in 1958. The Australian study comprised 591 patients treated for CRC at the Royal Melbourne, Western and St Francis Xavier Cabrini Hospitals in Melbourne from 1999 to 2009. The 2353 controls were derived from Queensland or Melbourne: for the former, the controls came from the Brisbane Twin Nevus Study; for the latter, individuals were participants in the Genes in Myopia study. There was no overlap between the CFR and Australian data sets. Owing to potential residual ethnic heterogeneity within the Melbourne population, for the Australian cohort only we performed an additional screen to minimize heterogeneity after performing principal components analysis (PCA) to remove individuals who clustered with non-CEU individuals (see below). We achieved this by performing PCA on the Australian cases and controls without reference samples of known ancestry. We then paired each case with a control in a 1:1 ratio based on a maximum separation of 0.050 using the first and second eigenvectors. All unpaired samples were excluded, leaving 441 cases and 441 controls in the study. Calculation of the genomic inflation factor, λGC, showed this to be 1.02 after this filtering. UK2 (NSCCG) consisted of 2854 CRC cases (58% male, mean age at diagnosis 59.3 years; SD± 8.7) ascertained through two ongoing initiatives at the Institute of Cancer Research/Royal Marsden Hospital NHS Trust (RMHNHST) from 1999 onwards—The National Study of Colorectal Cancer Genetics (NSCCG) and the Royal Marsden Hospital Trust/Institute of Cancer Research Family History and DNA Registry. The 2822 controls (41% males; mean age 59.8 years; SD ± 10.8) were the spouses or unrelated friends of patients with malignancies. None had a personal history of malignancy at the time of ascertainment. All cases and controls had self-reported European ancestry, and there were no obvious differences in the demography of cases and controls in terms of place of residence within the UK. Scotland2 (SOCCS) comprised 2024 CRC cases (61% male; mean age at diagnosis 65.8 years, SD ± 8.4) and 2092 population controls (60% males; mean age 67.9 years, SD ± 9.0) ascertained in Scotland. Cases were taken from an independent, prospective, incident CRC case series and aged <80 years at diagnosis. Control subjects were population controls matched by age (±5 years), gender and area of residence within Scotland. UK3 (NSCCG) comprised 7912 CRC cases (65% male; mean age at diagnosis 59 years, SD± 8.2) and 4398 controls (40% male; mean age 62 years, SD± 11.5) ascertained through NSCCG post-2005. Scotland3 (SOCCS) comprised 1145 CRC cases (50% male; mean age at diagnosis 53.2 years, SD ± 15.4) and 2203 cancer-free population controls (47% male; mean age 51.8 years, SD ± 11.5). Controls were recruited as part of the Generation Scotland study. UK4 (CORGI2BCD) consisted of 621 CRC or CRAd cases (46% male; mean age at diagnosis 58.3 years; SD ± 14.1) and 1121 cancer-free population or spouse controls (45% male; mean age 45.1 years, SD ± 15.9), sampled using the same criteria as UK1. Cambridge/SEARCH consisted of 2248 CRC cases (56% male; mean age at diagnosis 59.2 years, SD± 8.1) and 2209 controls (42% males; mean age 57.6 years, SD± 15.1). Samples were ascertained through the SEARCH (Studies of Epidemiology and Risk Factors in Cancer Heredity, http://www.cancerhelp.org.uk/trials/a-study-looking-at-genetic-causes-of-cancer) study based in Cambridge, UK. Recruitment started in 2000; initial patient contact was though the general practitioner. Control samples were collected post-2003. Eligible individuals were sex and frequency matched in 5-year age bands to cases. The COIN samples were 2151 cases derived from the COIN and COIN-B clinical trials of metastatic CRC. Median age was 63 years. COIN cases were compared against genotypes from 2501 population controls (1237 males), from the WTCCC2 National Blood Service (NBS) cohort (50% male; mean age at diagnosis 53.2 years, SD ± 15.4). The Helsinki (FCCPS) study (http://research.med.helsinki.fi/gsb/aaltonen/) comprised 988 cases from a population-based collection centred on south-eastern Finland and 864 population controls from the same collection. EPICOLON included 1410 CRC cases matched with the same number of controls collected in a prospective fashion from centres in Spain. Exclusion criteria were Mendelian CRC syndromes and a personal history of inflammatory bowel disease. The Leiden sample set included 858 unselected cases with CRC and 690 controls ascertained through genetic testing programmes for non-cancer-related conditions from the Leiden area. In all cases, CRC was defined according to the ninth revision of the International Classification of Diseases (ICD) by codes 153–154 and all cases had pathologically proven disease. Only individuals of white European origin were included in the study.

Sample preparation and genotyping

Collection of blood samples and clinico-pathological information from patients and controls was undertaken with informed consent and ethical review board approval in accordance with the tenets of the Declaration of Helsinki. DNA was extracted from samples using conventional methods and quantified using PicoGreen (Invitrogen). The VQ, UK1, Scotland1 and Australia GWA cohorts were genotyped using Illumina Hap300, Hap370 or Hap550 arrays. 1958BC and NBS genotyping was performed as part of the WTCCC2 study on Hap1.2M arrays. In UK2 and Scotland2, genotyping was conducted using custom Illumina Infinium arrays according to the manufacturer's protocols. Some COIN SNPs were typed on custom Illumina Goldengate arrays. To ensure quality of genotyping, a series of duplicate samples was genotyped, resulting in 99.9% concordant calls in all cases. Other genotyping was conducted using competitive allele-specific PCR KASPar chemistry (KBiosciences Ltd, Hertfordshire, UK), Taqman (Life Sciences, Carlsbad, CA, USA) or MassARRAY (Sequenom Inc., San Diego, CA, USA). All primers, probes and conditions used are available on request. Genotyping quality control was tested using duplicate DNA samples within studies and SNP assays, together with direct sequencing of subsets of samples to confirm genotyping accuracy. For all SNPs, >99% concordant results were obtained. We excluded SNPs from analysis if they failed one or more of the following thresholds: GenCall scores <0.25; overall call rates <95%; minor allele frequency (MAF)<0.01; departure from the Hardy–Weinberg equilibrium (HWE) in controls at P<10−4 or in cases at P< 10−6; outlying in terms of signal intensity or X:Y ratio; discordance between duplicate samples; and, for SNPs with evidence of association, poor clustering on inspection of X:Y plots. We excluded individuals from the GWA analyses if they had evidence of non-white European ancestry by PCA-based analysis in comparison with HapMap samples (http://hapmap.ncbi.nlm.nih.gov/) or by self-report. Deviation of the genotype frequencies in the controls from those expected under HWE was assessed by the χ2 test (1 df), or Fisher's exact test where an expected cell count was <5.

Association statistics and imputation

Associations between SNP genotype and disease status were primarily assessed in STATA v10 (http://www.stata.com/) and PLINK v1.07 (http://pngu.mgh.harvard.edu/~purcell/plink/) using allelic and Cochran–Armitage tests (both with 1df) respectively, or by Fisher's exact test where an expected cell count was <5. Genotypic (2 df), dominant (1 df) and recessive (1 df) tests were also performed. The risks associated with each SNP were estimated by allelic, heterozygous and homozygous ORs using unconditional logistic regression, and associated 95% CIs were calculated. Joint analysis of data generated from multiple phases was conducted using standard methods for combining raw data based on the Mantel–Haenszel method in STATA and in PLINK. Joint ORs and 95% CIs were calculated assuming fixed- and random-effects models. Tests of the significance of the pooled effect sizes were calculated using a standard normal distribution. Cochran's Q statistic to test for heterogeneity and the I2 statistic to quantify the proportion of the total variation due to heterogeneity were calculated. Large heterogeneity is typically defined as I2≥75%. Where significant heterogeneity was identified, results from the random-effects model were reported. Alongside, we also performed meta-analysis based on allele dosage (0, 1, 2) and incorporated age and sex as co-variates. Although age and sex are associated with the CRC risk, they were not associated with SNP genotype and did not materially affect the significance of any of the reported associations (data not shown). The combined effects of pairs or other multiples of loci identified as possibly associated with the CRC risk were investigated by unconditional or conditional logistic regression analysis in PLINK and STATA to test for independent effects of each SNP, stratifying by sample series. Logistic regression was undertaken both pairwise with the original tagSNP and then in a backwards analysis that initially included all SNPs with good evidence of association in each region. We used Haploview software v4.2 (http://www.broadinstitute.org/haploview) to infer the LD structure of the genome in the 1q41 and 12q13.13 regions, and used the expectation maximum algorithms in Haploview or PLINK to infer haplotypes. To predict genotypes at untyped SNPs in both regions, imputation of the UK1, Scotland 1 and VQ58 data sets was performed using the IMPUTE2 software and the combined CEU 1000 Genomes low-coverage pilot and complete HapMap3 haplotypes reference set, which was filtered to remove duplicate haplotypes (both from https://mathgen.stats.ox.ac.uk/impute/impute_v2.html) (8). Association statistics for imputed SNPs were calculated in SNPTEST v1.1.5 (www.stats.ox.ac.uk/~marchini/software/gwas/snptest.html) using the ‘–proper’ option, which is an additive model score test based on missing data likelihood, to allow for the uncertainty of imputed genotypes (9). Imputed markers with proper_info scores <0.5, imputed call rates per SNP <0.9 (using a maximum genotype probability threshold of 0.9 to call a genotype) and MAFs <0.01 were excluded from the analyses. Meta-analyses of the sample sets were carried out with Meta (10) (http://www.stats.ox.ac.uk/~jsliu/meta.html) and in STATA, using the genotype probabilities from IMPUTE2 where a SNP was not directly typed. The GENECLUSTER program (11) was used to analyse our UK1, Scotland1 and VQ58 samples specifically in order to test whether one- or two-SNP models better fitted the association signals in each SNP region. GENECLUSTER is a Bayesian method that uses HapMap haplotypes to estimate genealogy of samples and by jointly testing all SNPs on each branch of the genealogy in cases and controls, the program indicates the identities of the SNP(s) most likely to have the strongest association signal, thus potentially helping to identify functional variation. The default model parameters were used, specifically mutation model prior: (0.50, 0.50, 0.00), max number of trees to consider per location: 1 and beta risk prior parameters: (5.00, 5.00). Essentially for comparative purposes, we also ran the Margarita program (12), based on ancestral recombination graphs (ARGs), in UK1, Scotland1 and VQ58 for the genotyped SNPs in the 1q and 12q regions. This program aims to maximize available information as to the location and identity of a functional SNP by reconstructing the genealogical history of the sample population. For each ARG, a putative risk mutation is placed on the marginal tree and the frequency of each branch in cases and controls is assessed. For each region, 30ARGs were constructed and the significance of a SNP at each branchpoint assessed by 10 000 permutations. Unlike GENECLUSTER, Margarita does not specifically address the issue of whether there are two independent underlying SNP in each region, and comparison was therefore restricted to the single-SNP scenario. Genome co-ordinates were taken from the NCBI build 36/hg18 (dbSNP b126).

SUPPLEMENTARY MATERIAL

Supplementary Material is available at . Conflict of Interest statement. None declared.

FUNDING

Funding was primarily provided by Cancer Research UK. The EU FP7 CHIBCHA grant supported LGC-C through funding to IPMT, SC-B and ACar. Core infrastructure support to the Wellcome Trust Centre for Human Genetics, Oxford was provided by grant 090532/Z/09/Z. I.P.M.T. received support from the Oxford NIHR Comprehensive Biomedical Research Centre. The UK National Cancer Research Network supported the NSCCG. Additional funding to M.D. was provided by the Medical Research Council (G0000657-53203), CORE and Scottish Executive Chief Scientist's Office (K/OPR/2/2/D333, CZB/4/449). The EPICOLON work was supported by grants from the Fondo de Investigación Sanitaria/FEDER (08/0024, 08/1276, PS09/02368), Ministerio de Ciencia e Innovación (SAF2010-19273), Asociación Española contra el Cáncer (Fundación Científica y Junta de Barcelona) and Fundació Olga Torres (CRP). S.C.-B. and C.F.-R. are supported by contracts from the Fondo de Investigación Sanitaria (CP03-0070 and PS09/02368). CIBERehd and CIBERER are funded by the Instituto de Salud Carlos III. For the Melbourne cases, work was supported by the Hilton Ludwig Cancer Metastasis Initiative. The specimens and data from Australian colon cancer patients were provided by the Victorian Cancer Biobank and BioGrid Australia with appropriate ethics approval. The Victorian Cancer Biobank is supported by the Victorian Government. CERA receives operational infrastructure support from the Victorian Government. Funding to pay the Open Access publication charges for this article was provided by the Wellcome Trust.
  11 in total

1.  Mapping trait loci by use of inferred ancestral recombination graphs.

Authors:  Mark J Minichiello; Richard Durbin
Journal:  Am J Hum Genet       Date:  2006-09-27       Impact factor: 11.025

2.  A new multipoint method for genome-wide association studies by imputation of genotypes.

Authors:  Jonathan Marchini; Bryan Howie; Simon Myers; Gil McVean; Peter Donnelly
Journal:  Nat Genet       Date:  2007-06-17       Impact factor: 38.330

3.  SCAN: SNP and copy number annotation.

Authors:  Eric R Gamazon; Wei Zhang; Anuar Konkashbaev; Shiwei Duan; Emily O Kistner; Dan L Nicolae; M Eileen Dolan; Nancy J Cox
Journal:  Bioinformatics       Date:  2009-11-17       Impact factor: 6.937

4.  Searching for the missing heritability of complex diseases.

Authors:  Kari Hemminki; Asta Försti; Richard Houlston; Justo Lorenzo Bermejo
Journal:  Hum Mutat       Date:  2011-02       Impact factor: 4.878

5.  The MDM2 promoter SNP285C/309G haplotype diminishes Sp1 transcription factor binding and reduces risk for breast and ovarian cancer in Caucasians.

Authors:  Stian Knappskog; Merete Bjørnslett; Line M Myklebust; Petra E A Huijts; Maaike P Vreeswijk; Hege Edvardsen; Yongli Guo; Xuemei Zhang; Ming Yang; Sanna K Ylisaukko-Oja; Pia Alhopuro; Johanna Arola; Rob A E M Tollenaar; Christi J van Asperen; Caroline Seynaeve; Vidar Staalesen; Ranjan Chrisanthar; Erik Løkkevik; Helga B Salvesen; D Gareth Evans; William G Newman; Dongxin Lin; Lauri A Aaltonen; Anne-Lise Børresen-Dale; Grethe S Tell; Camilla Stoltenberg; Pål Romundstad; Kristian Hveem; Johan R Lillehaug; Lars Vatten; Peter Devilee; Anne Dørum; Per E Lønning
Journal:  Cancer Cell       Date:  2011-02-15       Impact factor: 31.743

6.  Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies.

Authors:  Tsun-Po Yang; Claude Beazley; Stephen B Montgomery; Antigone S Dimas; Maria Gutierrez-Arcelus; Barbara E Stranger; Panos Deloukas; Emmanouil T Dermitzakis
Journal:  Bioinformatics       Date:  2010-08-10       Impact factor: 6.937

7.  Multiple common susceptibility variants near BMP pathway loci GREM1, BMP4, and BMP2 explain part of the missing heritability of colorectal cancer.

Authors:  Ian P M Tomlinson; Luis G Carvajal-Carmona; Sara E Dobbins; Albert Tenesa; Angela M Jones; Kimberley Howarth; Claire Palles; Peter Broderick; Emma E M Jaeger; Susan Farrington; Annabelle Lewis; James G D Prendergast; Alan M Pittman; Evropi Theodoratou; Bianca Olver; Marion Walker; Steven Penegar; Ella Barclay; Nicola Whiffin; Lynn Martin; Stephane Ballereau; Amy Lloyd; Maggie Gorman; Steven Lubbe; Bryan Howie; Jonathan Marchini; Clara Ruiz-Ponte; Ceres Fernandez-Rozadilla; Antoni Castells; Angel Carracedo; Sergi Castellvi-Bel; David Duggan; David Conti; Jean-Baptiste Cazier; Harry Campbell; Oliver Sieber; Lara Lipton; Peter Gibbs; Nicholas G Martin; Grant W Montgomery; Joanne Young; Paul N Baird; Steven Gallinger; Polly Newcomb; John Hopper; Mark A Jenkins; Lauri A Aaltonen; David J Kerr; Jeremy Cheadle; Paul Pharoah; Graham Casey; Richard S Houlston; Malcolm G Dunlop
Journal:  PLoS Genet       Date:  2011-06-02       Impact factor: 5.917

8.  Meta-analysis of three genome-wide association studies identifies susceptibility loci for colorectal cancer at 1q41, 3q26.2, 12q13.13 and 20q13.33.

Authors:  Richard S Houlston; Jeremy Cheadle; Sara E Dobbins; Albert Tenesa; Angela M Jones; Kimberley Howarth; Sarah L Spain; Peter Broderick; Enric Domingo; Susan Farrington; James G D Prendergast; Alan M Pittman; Evi Theodoratou; Christopher G Smith; Bianca Olver; Axel Walther; Rebecca A Barnetson; Michael Churchman; Emma E M Jaeger; Steven Penegar; Ella Barclay; Lynn Martin; Maggie Gorman; Rachel Mager; Elaine Johnstone; Rachel Midgley; Iina Niittymäki; Sari Tuupanen; James Colley; Shelley Idziaszczyk; Huw J W Thomas; Anneke M Lucassen; D Gareth R Evans; Eamonn R Maher; Timothy Maughan; Antigone Dimas; Emmanouil Dermitzakis; Jean-Baptiste Cazier; Lauri A Aaltonen; Paul Pharoah; David J Kerr; Luis G Carvajal-Carmona; Harry Campbell; Malcolm G Dunlop; Ian P M Tomlinson
Journal:  Nat Genet       Date:  2010-10-24       Impact factor: 38.330

9.  Meta-analysis and imputation refines the association of 15q25 with smoking quantity.

Authors:  Jason Z Liu; Federica Tozzi; Dawn M Waterworth; Sreekumar G Pillai; Pierandrea Muglia; Lefkos Middleton; Wade Berrettini; Christopher W Knouff; Xin Yuan; Gérard Waeber; Peter Vollenweider; Martin Preisig; Nicholas J Wareham; Jing Hua Zhao; Ruth J F Loos; Inês Barroso; Kay-Tee Khaw; Scott Grundy; Philip Barter; Robert Mahley; Antero Kesaniemi; Ruth McPherson; John B Vincent; John Strauss; James L Kennedy; Anne Farmer; Peter McGuffin; Richard Day; Keith Matthews; Per Bakke; Amund Gulsvik; Susanne Lucae; Marcus Ising; Tanja Brueckl; Sonja Horstmann; H-Erich Wichmann; Rajesh Rawal; Norbert Dahmen; Claudia Lamina; Ozren Polasek; Lina Zgaga; Jennifer Huffman; Susan Campbell; Jaspal Kooner; John C Chambers; Mary Susan Burnett; Joseph M Devaney; Augusto D Pichard; Kenneth M Kent; Lowell Satler; Joseph M Lindsay; Ron Waksman; Stephen Epstein; James F Wilson; Sarah H Wild; Harry Campbell; Veronique Vitart; Muredach P Reilly; Mingyao Li; Liming Qu; Robert Wilensky; William Matthai; Hakon H Hakonarson; Daniel J Rader; Andre Franke; Michael Wittig; Arne Schäfer; Manuela Uda; Antonio Terracciano; Xiangjun Xiao; Fabio Busonero; Paul Scheet; David Schlessinger; David St Clair; Dan Rujescu; Gonçalo R Abecasis; Hans Jörgen Grabe; Alexander Teumer; Henry Völzke; Astrid Petersmann; Ulrich John; Igor Rudan; Caroline Hayward; Alan F Wright; Ivana Kolcic; Benjamin J Wright; John R Thompson; Anthony J Balmforth; Alistair S Hall; Nilesh J Samani; Carl A Anderson; Tariq Ahmad; Christopher G Mathew; Miles Parkes; Jack Satsangi; Mark Caulfield; Patricia B Munroe; Martin Farrall; Anna Dominiczak; Jane Worthington; Wendy Thomson; Steve Eyre; Anne Barton; Vincent Mooser; Clyde Francks; Jonathan Marchini
Journal:  Nat Genet       Date:  2010-04-25       Impact factor: 38.330

10.  A flexible and accurate genotype imputation method for the next generation of genome-wide association studies.

Authors:  Bryan N Howie; Peter Donnelly; Jonathan Marchini
Journal:  PLoS Genet       Date:  2009-06-19       Impact factor: 5.917

View more
  14 in total

1.  Novel childhood ALL susceptibility locus BMI1-PIP4K2A is specifically associated with the hyperdiploid subtype.

Authors:  Kyle M Walsh; Adam J de Smith; Anand P Chokkalingam; Catherine Metayer; Gary V Dahl; Ling-i Hsu; Lisa F Barcellos; Joseph L Wiemels; Patricia A Buffler
Journal:  Blood       Date:  2013-06-06       Impact factor: 22.113

2.  Identification of candidate susceptibility genes for colorectal cancer through eQTL analysis.

Authors:  Adria Closa; David Cordero; Rebeca Sanz-Pamplona; Xavier Solé; Marta Crous-Bou; Laia Paré-Brunet; Antoni Berenguer; Elisabet Guino; Adriana Lopez-Doriga; Jordi Guardiola; Sebastiano Biondo; Ramon Salazar; Victor Moreno
Journal:  Carcinogenesis       Date:  2014-04-23       Impact factor: 4.944

3.  DUSP10 regulates intestinal epithelial cell growth and colorectal tumorigenesis.

Authors:  C W Png; M Weerasooriya; J Guo; S J James; H M Poh; M Osato; R A Flavell; C Dong; H Yang; Y Zhang
Journal:  Oncogene       Date:  2015-03-16       Impact factor: 9.867

4.  Gene-environment interaction involving recently identified colorectal cancer susceptibility Loci.

Authors:  Elizabeth D Kantor; Carolyn M Hutter; Jessica Minnier; Sonja I Berndt; Hermann Brenner; Bette J Caan; Peter T Campbell; Christopher S Carlson; Graham Casey; Andrew T Chan; Jenny Chang-Claude; Stephen J Chanock; Michelle Cotterchio; Mengmeng Du; David Duggan; Charles S Fuchs; Edward L Giovannucci; Jian Gong; Tabitha A Harrison; Richard B Hayes; Brian E Henderson; Michael Hoffmeister; John L Hopper; Mark A Jenkins; Shuo Jiao; Laurence N Kolonel; Loic Le Marchand; Mathieu Lemire; Jing Ma; Polly A Newcomb; Heather M Ochs-Balcom; Bethann M Pflugeisen; John D Potter; Anja Rudolph; Robert E Schoen; Daniela Seminara; Martha L Slattery; Deanna L Stelling; Fridtjof Thomas; Mark Thornquist; Cornelia M Ulrich; Greg S Warnick; Brent W Zanke; Ulrike Peters; Li Hsu; Emily White
Journal:  Cancer Epidemiol Biomarkers Prev       Date:  2014-07-03       Impact factor: 4.254

Review 5.  Quantifying the utility of single nucleotide polymorphisms to guide colorectal cancer screening.

Authors:  Mark A Jenkins; Enes Makalic; James G Dowty; Daniel F Schmidt; Gillian S Dite; Robert J MacInnis; Driss Ait Ouakrim; Mark Clendenning; Louisa B Flander; Oliver K Stanesby; John L Hopper; Aung K Win; Daniel D Buchanan
Journal:  Future Oncol       Date:  2016-02-01       Impact factor: 3.404

Review 6.  Molecular pathological epidemiology of epigenetics: emerging integrative science to analyze environment, host, and disease.

Authors:  Shuji Ogino; Paul Lochhead; Andrew T Chan; Reiko Nishihara; Eunyoung Cho; Brian M Wolpin; Jeffrey A Meyerhardt; Alexander Meissner; Eva S Schernhammer; Charles S Fuchs; Edward Giovannucci
Journal:  Mod Pathol       Date:  2013-01-11       Impact factor: 7.842

7.  Differential expression of miR-1, a putative tumor suppressing microRNA, in cancer resistant and cancer susceptible mice.

Authors:  Jessica L Fleming; Dustin L Gable; Somayeh Samadzadeh-Tarighat; Luke Cheng; Lianbo Yu; Jessica L Gillespie; Amanda Ewart Toland
Journal:  PeerJ       Date:  2013-04-16       Impact factor: 2.984

8.  A colorectal cancer prediction model using traditional and genetic risk scores in Koreans.

Authors:  Keum Ji Jung; Daeyoun Won; Christina Jeon; Soriul Kim; Tae Il Kim; Sun Ha Jee; Terri H Beaty
Journal:  BMC Genet       Date:  2015-05-09       Impact factor: 2.797

9.  Genome-wide association study of colorectal cancer in Hispanics.

Authors:  Stephanie L Schmit; Fredrick R Schumacher; Christopher K Edlund; David V Conti; Ugonna Ihenacho; Peggy Wan; David Van Den Berg; Graham Casey; Barbara K Fortini; Heinz-Josef Lenz; Teresa Tusié-Luna; Carlos A Aguilar-Salinas; Hortensia Moreno-Macías; Alicia Huerta-Chagoya; María Luisa Ordóñez-Sánchez; Rosario Rodríguez-Guillén; Ivette Cruz-Bautista; Maribel Rodríguez-Torres; Linda Liliana Muñóz-Hernández; Olimpia Arellano-Campos; Donají Gómez; Ulices Alvirde; Clicerio González-Villalpando; María Elena González-Villalpando; Loic Le Marchand; Christopher A Haiman; Jane C Figueiredo
Journal:  Carcinogenesis       Date:  2016-04-18       Impact factor: 4.944

10.  Fine-Mapping of Common Genetic Variants Associated with Colorectal Tumor Risk Identified Potential Functional Variants.

Authors:  Mengmeng Du; Shuo Jiao; Stephanie A Bien; Manish Gala; Goncalo Abecasis; Stephane Bezieau; Hermann Brenner; Katja Butterbach; Bette J Caan; Christopher S Carlson; Graham Casey; Jenny Chang-Claude; David V Conti; Keith R Curtis; David Duggan; Steven Gallinger; Robert W Haile; Tabitha A Harrison; Richard B Hayes; Michael Hoffmeister; John L Hopper; Thomas J Hudson; Mark A Jenkins; Sébastien Küry; Loic Le Marchand; Suzanne M Leal; Polly A Newcomb; Deborah A Nickerson; John D Potter; Robert E Schoen; Fredrick R Schumacher; Daniela Seminara; Martha L Slattery; Li Hsu; Andrew T Chan; Emily White; Sonja I Berndt; Ulrike Peters
Journal:  PLoS One       Date:  2016-07-05       Impact factor: 3.240

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.