
Bayesian and frequentist analysis of an Austrian genome-wide association study of colorectal cancer and advanced adenomas.

Philipp Hofer, Michael Hagmann, Stefanie Brezina, Erich Dolejsi, Karl Mach, Gernot Leeb, Andreas Baierl, Stephan Buch, Hedwig Sutterlüty-Fall, Judith Karner-Hanusch, Michael M Bergmann, Thomas Bachleitner-Hofmann, Anton Stift, Armin Gerger, Katharina Rötzer, Josef Karner, Stefan Stättner, Melanie Waldenberger, Thomas Meitinger, Konstantin Strauch, Jakob Linseisen, Christian Gieger, Florian Frommlet, Andrea Gsur.

Abstract

Most genome-wide association studies (GWAS) have been analyzed using single marker tests in combination with stringent correction procedures for multiple testing. As a result, a substantial proportion of associated single nucleotide polymorphisms (SNPs) has remained undetected and may account for missing heritability in complex traits. Model selection procedures present a powerful alternative to identify associated SNPs in high-dimensional settings. In this GWAS including 1060 colorectal cancer cases, 689 cases of advanced colorectal adenomas and 4367 controls, we pursued a dual approach to investigate genome-wide associations with disease risk, applying both single marker analysis and model selection based on the modified Bayesian information criterion, mBIC2, implemented in the software package MOSGWA. For different case-control comparisons, we report models including between 1 and 14 candidate SNPs. A genome-wide significant association of rs17659990 (P=5.43×10⁻⁹, DOCK3, chromosome 3p21.2) with colorectal cancer risk was observed. Furthermore, 56 SNPs known to influence susceptibility to colorectal cancer and advanced adenoma were tested in a hypothesis-driven approach and several of them were found to be relevant in our Austrian cohort. After correction for multiple testing (α=8.9×10⁻⁴), the most significant associations were observed for SNPs rs10505477 (P=6.08×10⁻⁴) and rs6983267 (P=7.35×10⁻⁴) of CASC8, rs3802842 (P=8.98×10⁻⁵, COLCA1,2), and rs12953717 (P=4.64×10⁻⁴, SMAD7). All previously unreported SNPs demand replication in additional samples. Reanalysis of existing GWAS datasets using model selection as a tool to detect SNPs associated with a complex trait may present a promising resource to identify further genetic risk variants, not only for colorectal cancer.


Keywords:  GWAS; MOSGWA; advanced colorectal adenomas; colorectal cancer; model selection

Year:  2017        PMID: 29228715      PMCID: PMC5716755          DOI: 10.18632/oncotarget.21697

Source DB:  PubMed          Journal:  Oncotarget        ISSN: 1949-2553


INTRODUCTION

Numerous genome-wide association studies (GWAS) in diverse complex diseases have uncovered hundreds of genetic risk factors by determining hundreds of thousands of single nucleotide polymorphisms (SNPs) in cohorts of thousands of individuals in a hypothesis-free approach. Although these findings provide valuable insights into the genetic architecture of common diseases, they collectively account for a relatively small proportion of heritability [1]. Colorectal carcinogenesis is a complex multi-step process influenced by both genetic and environmental risk factors. Only 5-10% [2] of all colorectal cancer (CRC) cases can be ascribed to hereditary syndromes and explained by rare but highly penetrant germline mutations. Another 30% of CRCs can be attributed to non-syndromic familial cases with increased familial risk but without evidence of predisposing mutations. The remaining CRCs evolve sporadically and are influenced by numerous genetic variants with low penetrance but high prevalence in the population (>1%). This common disease-common variant hypothesis was formulated in the early days of GWAS, but was called into question when identified risk loci explained only a small fraction of genetic variance in complex traits. More refined concepts include the common disease-rare variant hypothesis [2], the infinitesimal and the broad sense heritability model (discussed in [3]). GWAS of CRC conducted in European and Asian populations have so far discovered more than 50 risk variants [4-29] mapping to 23 susceptibility loci. Although GWAS have successfully identified multiple associations of genetic variants with risk of CRC, collectively the CRC SNPs identified in European populations account for only 8% of familial CRC risk [30]. Additional rare risk variants still remain undetected and may in part account for the missing heritability of CRC. Typically, GWAS aim at the identification of a relatively small set of SNPs associated with the investigated phenotype. 
SNPs exceeding a genome-wide significance threshold (P < 5×10⁻⁸) are tested for replication in independent samples. Inevitably, these stringent penalties for multiple testing mean that a relatively large proportion of associated SNPs cannot be detected. Consequently, the majority of missing heritability may be due to SNPs with effects below the level of genome-wide significant associations [3]. The vast majority of GWAS have been analyzed via single marker analysis. One advantage of this approach is its low computational cost. However, this standard approach, which analyzes association with disease risk for each SNP individually, assumes complete independence of the analyzed SNPs [31]. In contrast, genetic risk can often be explained as the influence of multiple SNPs mapping to various chromosomal regions resulting in a phenotype [32]. Furthermore, single marker tests cannot take into consideration the distinct correlation structure among SNPs caused by linkage disequilibrium (LD) and interaction effects [31]. Usually, individual effect sizes of SNPs are small, but collectively their impact on the phenotype can be substantial [32]. There are further compelling reasons for considering all genotyped SNPs simultaneously in the analysis of GWAS. The predictive power of a single SNP is usually very low, but considering more disease-relevant SNPs can improve the accuracy of prediction [33]. In the context of complex diseases, multiple genes are involved in disease etiology; thus a joint analysis of multiple SNPs can be more informative and better reflect the relationship between genotype and phenotype than single SNP models [34]. A comprehensive overview of the advantages of model selection based approaches to the analysis of GWAS is provided in Frommlet et al. 2016 [35], particularly addressing selection procedures based on modifications of the Bayesian information criterion (BIC) [36]. 
In high-dimensional settings like GWAS, where only a small number of SNPs is expected to be associated with disease (sparsity), it has been shown repeatedly that BIC tends to select models that are too large. Various modifications of BIC have been proposed to solve this problem, among them mBIC2 [37, 38], which was designed to control the false discovery rate (FDR). Here, we pursued a dual analysis strategy, reporting results from both single marker tests and MOSGWA [39], an implementation of a model selection procedure based on mBIC2. Genome-wide SNP data of 1060 CRC cases, 689 patients with advanced colorectal adenomas and 4367 controls were analyzed, representing the first GWAS of CRC in an Austrian population.
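As a rough illustration of how such a criterion penalizes model size, the sketch below evaluates mBIC2 in the form commonly given in the model selection literature, mBIC2 = -2 log L + k log n + 2k log(p/4) - 2 log(k!), where k is the number of selected SNPs, n the sample size and p the number of candidate SNPs. The function name, the constant 4 and the log-likelihood values are assumptions made for illustration only; this is not the MOSGWA implementation.

```python
import math


def mbic2(log_likelihood: float, k: int, n: int, p: int, const: float = 4.0) -> float:
    """Assumed form: -2 log L + k log n + 2k log(p / const) - 2 log(k!)."""
    bic = -2.0 * log_likelihood + k * math.log(n)      # classical BIC
    sparsity_penalty = 2.0 * k * math.log(p / const)   # extra penalty for searching p SNPs
    relaxation = 2.0 * math.lgamma(k + 1)              # 2 log(k!), eases the penalty for larger k
    return bic + sparsity_penalty - relaxation


# n and p taken from the AB vs. CD comparison; the log-likelihoods are hypothetical.
n, p = 5908, 492_217
null_score = mbic2(-3000.0, k=0, n=n, p=p)
one_snp_score = mbic2(-2980.0, k=1, n=n, p=p)
print(one_snp_score < null_score)  # a SNP is added only if it lowers the criterion
```

Under these assumed numbers, adding a single SNP costs about 32 units of the criterion, so the log-likelihood has to improve by roughly 16 before the larger model is preferred.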

RESULTS

Downstream analysis was performed for 492,217 SNPs using the software package MOSGWA. Additionally, results from single marker analysis via PLINK are reported using the Cochran-Armitage trend test (CAT) as well as univariate logistic regression models including the first four principal components as covariates to account for population structure. Our study population consisted of four different case and control groups: CRC cases (A), advanced adenomas (B), colonoscopy-negative CORSA controls (C) and KORA controls (D) (Table 1). Further clinical characteristics of CRC cases and advanced adenomas are provided in Supplementary Table 1. Specifically, we report the following four case-control comparisons: A vs. C, A vs. CD, AB vs. CD and B vs. CD (Table 2).
Table 1

Study population

Group | Total pre-QC | Total post-QC (%) | Male (%) | Female (%) | Mean age ± SD [y]
CRC (A) | 1060 | 978 (100.0) | 584 (59.7) | 394 (40.3) | 63.5 ± 12.0
AA (B) | 689 | 636 (100.0) | 428 (67.3) | 208 (32.7) | 64.5 ± 10.3
Control CORSA (C) | 928 | 855 (100.0) | 496 (58.0) | 359 (42.0) | 65.1 ± 11.8
Control KORA (D) | 3439 | 3439 (100.0) | 1690 (49.1) | 1749 (50.9) | 53.8 ± 14.0
Total | 6116 | 5908 | 3198 | 2710 | 58.2 ± 13.9

CRC Colorectal cancer cases.

AA Advanced adenomas.

Table 2

Single marker tests and model selection

SNP | Chromosome | Gene | OR (Logistic) | P (Logistic) | Rank (Logistic) | OR (Model) | P (SM)

A vs. C (978 cases vs. 855 controls)
rs1912804 | 16q23.1 | WWOX | 1.69 | 3.39E-07 | 1 | 1.70 | 1.96E-07
rs9583269 | 13q33.3 | MYO16 | 0.69 | 6.19E-07 | 2 | 0.69 | 1.24E-06
rs10495672 | 2p24.2 | KCNS3 | 1.43 | 3.23E-06 | 7 | 1.46 | 1.95E-06

A vs. CD (978 cases vs. 4294 controls)
rs17659990 | 3p21.2 | DOCK3 | 1.93 | 1.35E-07 | 1 | 1.98 | 1.59E-08
rs694339 | 18q22.3 | CBLN2 | 1.97 | 1.41E-07 | 2 | 1.99 | 7.89E-07
rs12916300 | 15q13.1 | HERC2 | 1.35 | 3.75E-07 | 4 | 1.35 | 1.37E-04
rs16845107 | 3q13.2 | WDR52 | 0.52 | 5.94E-07 | 6 | 0.52 | 1.49E-06
rs11927424 | 3p11.1 | C3orf38 | 1.31 | 1.17E-06 | 11 | 1.32 | 1.07E-07
rs16869961 | 4p15.31 | KCNIP4 | 0.75 | 1.51E-06 | 13 | 0.74 | 1.25E-04
rs7774435 | 6p21.32 | HLA-DQA2 | - | - | - | 1.46 | 3.52E-04

AB vs. CD (1614 cases vs. 4294 controls)
rs17659990 | 3p21.2 | DOCK3 | 1.88 | 5.43E-09 | 1 | 1.96 | 2.94E-10
rs7742915 | 6p21.2 | BTBD9 | 1.31 | 8.52E-08 | 2 | 1.32 | 1.44E-06
rs16944613 | 15q26.1 | CRTC3 | 1.32 | 1.49E-07 | 4 | 1.31 | 3.71E-07
rs13129679 | 4p16.3 | RNF4 | 2.30 | 2.38E-07 | 5 | 2.42 | 8.82E-07
rs12953717 | 18q21.1 | SMAD7 | 1.26 | 3.00E-07 | 6 | 1.27 | 6.68E-08
rs742223 | 6p24.1 | TMEM170B | 0.60 | 5.47E-07 | 9 | 0.59 | 9.49E-08
rs2184857 | 1q43 | CHRM3 | 0.79 | 7.10E-07 | 11 | 0.78 | 3.35E-09
rs4954585 | 2q22.1 | CXCR4 | 1.26 | 9.84E-07 | 12 | 1.26 | 2.29E-08
rs7942260 | 11q21 | PIWIL4 | 0.69 | 7.28E-06 | 30 | 0.67 | 1.84E-06
rs7221059 | 17q25.2 | LINC00338 | 0.76 | 1.02E-05 | 45 | 0.74 | 2.92E-05
rs4361767 | 8p23.1 | LOC157273 | 0.77 | 1.31E-05 | 62 | 0.75 | 6.91E-05
rs340145 | 3q13.2 | TMPRSS7 | 0.82 | 3.91E-05 | 142 | 0.79 | 1.45E-04
rs7774435 | 6p21.32 | HLA-DQA2 | - | - | - | 1.65 | 5.82E-05
rs3130954 | 6p21.33 | HCG27 | - | - | - | 1.79 | 1.04E-04

B vs. CD (636 cases vs. 4294 controls)
rs7944251 | 11q14.3 | FAT3 | 0.66 | 3.97E-07 | 2 | 0.66 | 1.27E-08

A  CRC cases (CORSA).

B  Advanced adenomas (CORSA).

C  Controls (CORSA).

D  Controls (KORA).

OR (Model) Odds ratio based on the coefficients of the model selected by MOSGWA.

P (SM) Single marker test P-value (Cochran Armitage trend test).

OR (Logistic) Odds ratio based on univariate logistic model.

P (Logistic) P-value of univariate logistic model.

Rank (Logistic) Rank of the SNP in the top SNP list of P (Logistic) sorted by P-value.

-  HLA region excluded from logistic models.

Table 2 provides, for each of the four comparisons, basic information and odds ratios for the SNPs in the model selected by MOSGWA. Additionally, P-values from the CAT, odds ratios and P-values based on the univariate logistic model, and the corresponding rank of each SNP according to the logistic model are presented. A list of the 200 top-ranking SNPs for each contrast is provided as Supplementary Materials (Supplementary Table 2).
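For readers unfamiliar with the single marker test behind the P (SM) column, the following minimal sketch computes a Cochran-Armitage trend test from genotype counts. It assumes additive weights (0, 1, 2) and a chi-square reference distribution with one degree of freedom; the function name and the genotype counts are hypothetical, and the code is not PLINK's implementation.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Return (chi2, p_value) of the trend test for genotype counts (AA, AB, BB)."""
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    # Weighted difference between case and control genotype counts.
    t = sum(w * (r * n_controls - s * n_cases)
            for w, r, s in zip(weights, cases, controls))
    # Variance of t under the null hypothesis of no association.
    var = sum(w * w * c * (n - c) for w, c in zip(weights, totals))
    var -= 2 * sum(weights[i] * weights[j] * totals[i] * totals[j]
                   for i in range(len(totals))
                   for j in range(i + 1, len(totals)))
    var *= n_cases * n_controls / n
    if var <= 0:
        raise ValueError("trend test undefined (monomorphic marker or empty group)")
    chi2 = t * t / var
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return chi2, p_value


# Hypothetical genotype counts for 978 cases and 4294 controls.
chi2, p_value = cochran_armitage_trend((600, 330, 48), (2900, 1250, 144))
print(f"chi2 = {chi2:.2f}, P = {p_value:.2e}")
```

The statistic is equivalent to N times the squared correlation between genotype score and case status, which is why it tests for a linear trend in risk across the three genotypes rather than for arbitrary differences.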

A vs. C

For the comparison A vs. C, considering only Austrian cases and controls, MOSGWA selected a model of size three including SNPs rs1912804, rs9583269, and rs10495672. The best SNP rs1912804 has a marginal P-value of 3.39×10⁻⁷ that is not significant at the commonly adopted genome-wide significance level α=5.0×10⁻⁸. The three selected SNPs are among the top seven single marker SNPs (ranks 1, 2 and 7).

A vs. CD

Adding KORA controls increased power to detect associated SNPs. Accordingly, the comparison A vs. CD yielded a model containing seven SNPs including the top SNP rs17659990 (P=1.35×10⁻⁷; DOCK3).

AB vs. CD

For the joint analysis of CRC and advanced adenomas versus all controls, AB vs. CD, MOSGWA selected 14 SNPs, including rs17659990 (P=5.43×10⁻⁹, DOCK3) that reached the generally accepted level of genome-wide significance, followed by borderline significant rs7742915 (P=8.52×10⁻⁸, BTBD9), rs16944613 (P=1.49×10⁻⁷, CRTC3), rs13129679 (P=2.38×10⁻⁷, RNF4) and rs12953717 (P=3.00×10⁻⁷, SMAD7), a well-known CRC susceptibility variant.

B vs. CD

For the comparison of advanced adenomas against the combined control (B vs. CD), MOSGWA identified one SNP on 11q14 (rs7944251, P=3.97×10⁻⁷, FAT3). Using only CORSA controls (B vs. C), there was not sufficient power to detect any SNP and MOSGWA selected the null model.

Genotype distributions of 56 CRC or colorectal adenoma susceptibility SNPs previously identified by GWAS were analyzed in the present genome-wide data set. Uncorrected P-values for all calculated case-control comparisons are provided in Table 3. For CRC SNPs not covered by the Axiom array, the distance of the proxy SNP from the original CRC SNP is provided in base pairs. P-values below 0.05 are given in bold; P-values below the Bonferroni-corrected significance level α=8.9×10⁻⁴ are given in bold and are underlined. Several SNPs previously identified by CRC GWAS exhibit significantly different genotype distributions in cases and controls. The strongest associations were found for rs12953717 of SMAD7 on chromosome arm 18q21.1 (P(Avs.C)=4.64×10⁻⁴, P(Avs.CD)=2.83×10⁻⁵, P(ABvs.CD)=8.64×10⁻⁶). Significant associations were also observed for the SMAD7 SNP rs4939827 (P(Avs.CD)=4.03×10⁻⁴, P(ABvs.CD)=1.53×10⁻⁴) and the RHPN2 SNP rs10411210 on 19q13.11 (P(Avs.CD)=3.28×10⁻⁴). Several SNPs of the well-known CRC susceptibility loci on chromosome 8 showed differentially distributed genotypes, among them rs16892766 on 8q23.3 (P(Avs.CD)=5.48×10⁻⁴, EIF3H), rs10505477 on 8q24.21 (P(Avs.C)=6.08×10⁻⁴, CASC8), and rs6983267, also located on 8q24.21 (P(Avs.C)=7.35×10⁻⁴, MYC). Rs3802842 on chromosome 11q23.1 also showed significant associations with CRC risk across different comparisons (P(Avs.C)=8.98×10⁻⁵, P(Avs.CD)=8.62×10⁻⁵, P(ABvs.CD)=1.86×10⁻⁵, COLCA1,2).
Table 3

Associations of CRC susceptibility SNPs identified by preceding GWAS

SNPChr.GeneRef.DistanceP_AvCP_AvCDP_ABvCP_ABvCDP_BvCP_BvCD
rs109112511q25.3LAMC12334922.03E-015.59E-012.26E-016.08E-013.38E-018.03E-01
rs66877581q41DUSP101337692.40E-015.32E-014.24E-012.57E-018.29E-013.59E-01
rs66911701q41DUSP10132652.29E-011.19E-013.96E-013.40E-019.64E-019.23E-01
rs23738592p22.1SLC8A1207974.48E-019.68E-012.75E-018.66E-011.98E-015.54E-01
rs119037572q32.3NABP1/SDPR2349097.26E-012.92E-019.02E-012.66E-015.29E-015.34E-01
rs109365993q26.2TERC1382965.38E-018.38E-017.26E-019.83E-019.48E-015.98E-01
rs355092824q32.2FSTL52710309.40E-017.64E-019.20E-018.85E-019.35E-019.10E-01
rs2754545p15.31PAPD72004.71E-012.87E-016.38E-015.04E-019.34E-019.80E-01
rs28536685p15.33TERT2005.61E-018.88E-014.40E-018.84E-015.15E-018.38E-01
rs6471615q31.1PITX1/H2AFY2203.80E-036.79E-026.78E-038.45E-027.56E-023.31E-01
rs13213116p21.2SRSF3/CDKN1A1915412.62E-019.68E-011.88E-017.48E-013.09E-015.60E-01
rs15254617q35TPK12032174.43E-015.59E-012.97E-012.87E-012.33E-011.84E-01
rs168885228q23.3EIF3H2015808.15E-027.24E-022.27E-012.52E-019.43E-016.36E-01
rs16892766*8q23.3TRPS1/EIF3H/UTP231007.75E-035.48E-043.67E-023.23E-034.64E-015.74E-01
rs105054778q24.21CASC82506.08E-043.48E-035.44E-035.10E-023.38E-018.22E-01
rs108085558q24.21CASC8, MYC1103.20E-032.08E-021.22E-021.28E-012.54E-019.78E-01
rs6983267*8q24.21CASC8, MYC407.35E-043.03E-035.10E-034.36E-023.03E-019.34E-01
rs70143468q24.21CASC8902.31E-031.26E-024.42E-033.91E-029.48E-025.69E-01
rs78373288q24.21CASC8112147.89E-039.95E-024.86E-031.43E-014.49E-026.62E-01
rs7197259p24.1TPD52L3/UHRF2/GLDC6340733.07E-015.52E-012.16E-013.59E-012.57E-012.28E-01
rs1079566810p14KRT8P16/TCEB1P31003.26E-012.32E-011.56E-011.40E-011.65E-012.58E-01
rs70401710q23.2ZMIZ1-AS129104256.97E-022.32E-021.65E-012.08E-019.04E-017.31E-01
rs103520910q24.2ABCC2/MRP22604.37E-015.15E-017.78E-019.56E-016.37E-014.96E-01
rs1119617210q25.2TCF7L2292248.12E-016.13E-015.86E-017.73E-012.26E-017.71E-01
rs1224100810q25.2VTI1A285133.40E-015.47E-016.91E-018.73E-014.73E-011.92E-01
rs166565010q26.2HSPA12A2216478.07E-015.09E-015.57E-018.96E-013.95E-018.20E-01
rs153511q12.2FADS22972432.21E-012.31E-024.43E-013.77E-029.01E-014.78E-01
rs17455011q12.2FADS129962.15E-012.18E-023.55E-013.99E-029.99E-014.17E-01
rs424621511q12.2FEN11755312.13E-013.07E-023.40E-015.52E-029.63E-014.76E-01
rs382499911q13.4POLD31913831.26E-012.32E-022.23E-015.58E-026.20E-014.41E-01
rs3802842*11q23.1COLCA1,2708.98E-058.62E-051.11E-041.86E-054.85E-032.91E-03
rs1084943212p13.31CD92919526.48E-017.73E-019.37E-013.89E-016.47E-012.32E-01
rs1077421412p13.32CCND22218163.85E-024.02E-026.87E-023.68E-023.37E-014.48E-01
rs321781012p13.32CCND2238874.27E-011.78E-012.74E-012.81E-022.94E-016.89E-02
rs321790112p13.32CCND22303.20E-013.11E-012.41E-012.18E-011.65E-013.00E-01
rs1116955212q13.12ATF11305.08E-013.61E-012.72E-014.37E-022.05E-014.71E-02
rs713670212q13.12LARP4/DIP2B1317537.08E-015.18E-015.40E-012.69E-013.91E-012.14E-01
rs5933612q24.21TBX32318175.48E-016.78E-017.01E-017.19E-019.24E-017.89E-01
rs731543812q24.21TBX3204811.47E-012.54E-012.78E-011.82E-015.32E-014.60E-01
rs195763614q22.2BMP4/ATP5C1P1/CDKN3/MIR55801638695.53E-011.18E-019.52E-013.92E-014.20E-017.87E-01
rs4444235*14q22.2BMP4/ATP5C1P1/CDKN3/MIR5580706.22E-013.63E-019.03E-015.40E-014.58E-017.19E-01
rs1163271515q13.3SCG5, GREM1, FMN1169895.01E-013.09E-013.03E-011.58E-012.61E-013.17E-01
rs1696968115q13.3SCG5, GREM1, FMN11605.25E-012.82E-015.79E-012.46E-018.45E-013.70E-01
rs477958415q13.3SCG5, GREM1, FMN1707.37E-021.03E-028.98E-028.61E-032.63E-015.18E-02
rs992921816q22.1CDH1/ZFP90707.72E-014.05E-015.89E-018.57E-011.52E-011.06E-01
rs1260352617p13.3NXN2902.93E-011.85E-011.14E-014.43E-021.03E-016.63E-02
rs1295371718q21.1SMAD7504.64E-042.83E-054.55E-048.64E-063.21E-025.04E-03
rs446414818q21.1SMAD75826.75E-021.80E-013.80E-021.11E-011.08E-013.22E-01
rs4939827*18q21.1SMAD7708.37E-034.03E-049.92E-031.53E-041.31E-011.42E-02
rs722963918q21.1SMAD7251702.69E-019.11E-021.36E-011.90E-021.96E-014.29E-02
rs1041121019q13.11RHPN2703.94E-033.28E-042.07E-022.66E-034.64E-012.91E-01
rs224171419q13.2TGFB1, B9D221125066.30E-018.95E-016.57E-019.93E-018.08E-018.30E-01
rs242327920p12.3BMP2/HAO1/FERMT122108158.46E-014.13E-017.65E-018.11E-013.03E-015.80E-01
rs481380220p12.3BMP2/HAO1/FERMT11604.40E-023.49E-025.59E-021.88E-022.09E-011.48E-01
rs96125320p12.3BMP2/HAO1/FERMT1701.71E-011.12E-016.02E-014.29E-014.78E-013.46E-01
rs492538620q13.33LAMA513532639.93E-029.76E-021.25E-024.64E-037.60E-031.11E-03

P-values are uncorrected and P-values <0.05 (5.00E-02) are given in bold.

P-values <0.00089 (8.90E-04) are given in bold and are underlined.

Rs number followed by * indicates CRC SNP with experimentally confirmed functional relevance [52].

Several SNPs previously associated not only with risk of CRC but also with risk of colorectal adenoma exhibited borderline significant P-values in comparisons B vs. C and B vs. CD (rs7837328, P(Bvs.C)=4.49×10⁻²; rs3802842, P(Bvs.C)=4.85×10⁻³, P(Bvs.CD)=2.91×10⁻³; rs4939827, P(Bvs.CD)=1.42×10⁻²; rs4925386, P(Bvs.C)=7.60×10⁻³, P(Bvs.CD)=1.11×10⁻³).
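The Bonferroni-corrected level used for Table 3 is consistent with dividing a nominal α of 0.05 by the 56 tested SNPs. The short sketch below reproduces that threshold and applies it to a handful of P-values taken from the P_AvC column of Table 3; the nominal level of 0.05 and the variable names are assumptions for illustration.

```python
# Bonferroni correction: nominal alpha divided by the number of tested SNPs.
n_tests = 56
alpha = 0.05 / n_tests  # about 8.9e-4

# Uncorrected P-values for the comparison A vs. C, copied from Table 3.
p_values_a_vs_c = {
    "rs12953717": 4.64e-4,
    "rs10505477": 6.08e-4,
    "rs6983267": 7.35e-4,
    "rs3802842": 8.98e-5,
    "rs7837328": 7.89e-3,
}

significant = sorted(snp for snp, p in p_values_a_vs_c.items() if p < alpha)
print(f"alpha = {alpha:.2e}")
print(significant)
```

With these inputs, rs7837328 is nominally significant (P < 0.05) but does not pass the corrected threshold, whereas the other four SNPs do.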

DISCUSSION

Most published GWAS are based on single marker analysis in combination with correction for multiple testing, a strategy which has been shown to suffer both from unnecessarily low power and from a relatively high risk of false positive detections in the case of complex traits [38]. Reduced statistical power reflects one aspect of missing heritability in GWAS [1]. Simulation studies based on real SNP data provided evidence that model selection strategies may outperform multiple testing in detecting causal SNPs [39] while controlling the type I error rate, and should therefore be used to complement the standard analysis of GWAS. We performed, to the best of our knowledge, the first GWAS of CRC in an Austrian cohort, including 1060 CRC cases, 689 patients with advanced colorectal adenomas, 928 colonoscopy-negative controls, and additional genotype data of 3439 population-based KORA controls from southern Germany. Model selection analysis was based on MOSGWA [39], a bioinformatics tool for the analysis of GWAS using the FDR-controlling modification of BIC, mBIC2, which has been shown to have certain optimality properties with respect to the number of misclassifications. Due to its fixed selection criterion, MOSGWA requires no parameter tuning, unlike LASSO-based approaches [40]. In simulation studies [39], MOSGWA exceeded the performance of competing approaches, and when re-analyzing data of complex diseases from the Wellcome Trust Case-Control Consortium [41], several SNPs could be identified which were not detected by other algorithms but were later confirmed by independent studies [39]. In this study, MOSGWA selected models for different case-control comparisons, including between one and 14 SNPs. The theoretically well-founded advantage of the model selection approach is its larger power to detect candidate SNPs compared to single marker tests while at the same time strictly controlling the false discovery rate. 
Among all four studied contrasts, single marker tests yielded only one significant SNP (rs17659990, P=5.43×10⁻⁹, DOCK3) at the usually recommended genome-wide significance level, for the comparison AB vs. CD when considering the entire study population. Rs17659990 is an intronic variant of the dedicator of cytokinesis 3 (DOCK3) gene, a gene specifically expressed in the central nervous system that was associated with an attention deficit hyperactivity disorder-like phenotype [42]. DOCK3, also referred to as modifier of cell adhesion (MOCA), was also shown to be an inhibitor of Wnt/beta-catenin signaling [43], a pathway known to play an important role in colorectal carcinogenesis [44]. Moreover, multiple studies reported DOCK3 to be implicated in cancer cell invasion and migration (as recently reviewed [45]). The SNP rs17659990 was also included in the model A vs. CD (model size 7). For the comparison AB vs. CD, MOSGWA selected a model of 14 SNPs, including, apart from rs17659990, another borderline significant SNP (rs7742915, P=8.52×10⁻⁸, BTBD9). Rs7742915 maps to the BTB domain containing 9 (BTBD9) gene, which encodes a BTB/POZ domain-containing protein involved in protein-protein interactions. Genetic variation of BTBD9 was associated with susceptibility to Restless Legs Syndrome [46]. Aside from rs17659990 and rs7742915, a further 12 variants with marginal P-values (P>5.0×10⁻⁸) were selected for the AB vs. CD comparison, including rs16944613 (P=1.49×10⁻⁷, CRTC3), rs13129679 (P=2.38×10⁻⁷, RNF4), and rs12953717 (P=3.00×10⁻⁷, SMAD7). Rs12953717, located in intron 3 of the SMAD7 gene, has been previously linked to CRC risk by two GWAS [5, 9] and was subsequently confirmed as a CRC susceptibility variant [47, 48], as recently discussed by Stolfi et al. [49]. SMAD7 is a negative regulator of transforming growth factor-β signaling. Relying on single marker tests only, SMAD7 rs12953717 may not have been regarded as a candidate SNP in our study. 
Interestingly, rs1912804 of the WW domain-containing oxidoreductase (WWOX) gene emerged in this study of CRC (A vs. C). Defects in this tumor suppressor gene were associated with multiple cancers [50] and altered WWOX expression was observed in tissues of CRC [51]. Recently, WWOX was shown to be involved in double-strand break repair [50]. Although defects in mismatch repair (MMR) genes influence both hereditary and sporadic CRCs (recently reviewed [52]), no CRC risk SNPs annotating to MMR genes have been identified by GWAS thus far. In this study, we used model selection as a tool to detect SNPs associated with CRC, not aiming at the identification of a model which can be used later for prediction. Therefore, we do not provide model coefficients obtained by MOSGWA but only report the detected SNPs. This is crucial to understanding the principle and function of model selection as a tool for the analysis of GWAS. Considering the identification of disease-associated SNPs as a high-dimensional classification problem, SNPs can be classified as either associated or not associated with the trait. Theoretical results showed that performing model selection using the FDR-controlling mBIC2 selection criterion yields a classification procedure which asymptotically minimizes the misclassification rate. The expected proportion of false positive SNPs is controlled at a level which decreases with sample size and which will be below 5% for this study. Therefore, about one or two false positive detections can be expected among the reported 14 SNPs in model AB vs. CD. CRC SNPs identified by preceding GWAS were tested in a hypothesis-driven approach and a number of these SNPs exhibited relevant differences between cases and controls in our data set. Several risk variants were replicated in this study for the first time in the Austrian population. The strongest associations were observed for SNPs annotating to the following genes: SMAD7, RHPN2, EIF3H, CASC8, MYC, and COLCA1,2. 
Functional relevance was experimentally confirmed for only five common CRC risk loci [52]. Four of them (rs16892766, EIF3H; rs6983267, MYC; rs3802842, COLCA2 and rs4939827, SMAD7) also play a role in our study population. Sporadic CRCs usually arise from premalignant lesions (adenoma-carcinoma sequence), thus high-risk adenomas impact CRC risk [53, 54]. Removal of advanced adenomas during colonoscopy reduces mortality from CRC [55]. We included advanced colorectal adenomas in this study because these precursors are important targets for CRC prevention. Previously unreported rs7944251 of FAT tumor suppressor homolog 3 (FAT3) was associated with reduced risk of advanced adenoma (OR=0.66, P=3.97×10⁻⁷) and the SNP was also selected when comparing advanced adenomas with the combined control group (B vs. CD). All previously unreported candidate SNPs demand replication in independent CRC cohorts. A strength of this study is the dual approach to analyzing genotype distributions in a genome-wide SNP dataset including CRC cases, advanced adenomas and controls. CORSA controls (C) received a complete colonoscopy within B-PREDICT screening and were known to be free of colorectal polyps and CRC. Sometimes, these colonoscopy-negative controls are also referred to as “super-controls” [12]. A recent study indicated that exclusion of controls with a family history of CRC and of controls with a record of colorectal adenomas can increase power [56]. To our knowledge, this is the first GWAS of CRC investigating Austrian CRC cases and premalignant colorectal tumors. However, limitations of the study are the modest sample size, especially in the subgroup of advanced adenomas, as well as the limited availability of environmental data of CRC cases, which impedes stratification analysis for environmental risk factors. To increase statistical power, individual level genotype data of additional controls (KORA) were included in the study. 
Because CORSA recruitment is ongoing, further Austrian CRC cases will be genotyped and integrated into the analysis to investigate population-specific SNP signatures of CRC risk. Meta-analysis of GWAS presents a powerful strategy to enhance the power of identifying weak genetic associations with disease phenotype, but is often complicated by between-study heterogeneity. Precision gained by combination of datasets may be spurious due to different study designs, divergent LD structures, different patterns of correlated phenotypes or dissimilar gene-environment interactions across populations [57, 58]. The application of CRC SNP signatures to improve screening decisions is presently impeded by the fact that single risk variants account for only little heritability and thereby explain a small increment of risk. We hypothesize that potentially disease-relevant variants not reaching genome-wide significance may explain a substantial part of missing heritability and are worth exploration and follow-up. Epigenetic alterations also play an important role in colorectal carcinogenesis [59]. The combination of genetic and epigenetic biomarkers into a multi-marker panel that also considers environmental risk factors could be suited to complement present screening strategies and could, for instance, be applied after a positive fecal occult blood test but prior to an invasive colonoscopy. Genetic risk variants are ideal candidates for the development of minimally invasive and cost-effective biomarker tests enabling personal risk profiling. In the near future, management of CRC will increasingly focus on personalized screening and treatment strategies aiming at early detection and prevention of disease. A combination of single marker tests and model selection in high dimensions may facilitate the identification of marker candidates otherwise not detected due to stringent penalties for multiple testing.

MATERIALS AND METHODS

Study population

In this GWAS, 2677 individuals of our ongoing Colorectal Cancer Study of Austria (CORSA) [60, 61] were genotyped, including 1060 CRC cases, 689 patients with advanced adenomas and 928 colonoscopy-negative controls. CRC cases were patients with histologically confirmed, sporadic CRC. CRC cases with a clinical record of inflammatory bowel disease (IBD) were excluded from the study. Advanced adenomas included adenomatous villous, adenomatous tubulovillous and tubular polyps larger than 1 cm in diameter. All controls received a complete colonoscopy and exhibited no pathological findings. From June 2003 to November 2012, CORSA participants were recruited in four hospitals in the province of Burgenland (Oberpullendorf, Kittsee, Oberwart and Güssing), Austria, at the Medical University of Vienna (Department of Surgery), and at the Medical University of Graz (Department of Internal Medicine). To augment statistical power, individual level genotype data of 3439 additional control individuals from the German “Cooperative Health Research in the Region of Augsburg” (KORA) platform were included in this study [62]. Population-based controls from the studies S4 and F4 were integrated. To ensure exclusion of CRC cases from the KORA control set, all individuals with evidence of malignant diseases were removed from the dataset. In total, 6116 individuals (1749 colorectal tumors and 4367 controls) were included in this study.

Ethics statement

Written informed consent was obtained from all participants of CORSA. The study was approved by the ethical review committee of the Medical University of Vienna (MUW, EK Nr. 703/2010) and the “Ethikkommission Burgenland” (KRAGES, 33/2010). Conduct of the study followed the approved study protocol and all methods were performed in accordance with the relevant guidelines and regulations. Approval for the use of KORA data was obtained from the KORA-Study Group (K072/13).

Genotyping

Genomic DNA was purified from peripheral blood following the QIAamp DNA Blood Midi Spin Protocol (QIAGEN, Valencia, CA). Genotyping was performed using population-optimized Axiom Genome-Wide CEU 1 Arrays (Affymetrix, Santa Clara, CA) analyzing 587,532 SNPs. Array processing was performed at the Institute of Human Genetics, Helmholtz Center Munich. KORA samples were genotyped on the same array type.

Statistical analysis

Extensive quality control (QC) and genotype calling were performed with Affymetrix Genotyping Console Software 4.1.3.840 (www.affymetrix.com). A total of 2469 genotyped CORSA subjects passed QC filtering (Dish QC >0.82, call rate >97.5%). Inclusion criteria for SNPs eligible for downstream analysis were a minor allele frequency (MAF) >1%, a Hardy-Weinberg equilibrium (HWE) P-value >1.00×10⁻⁸, a SNP call rate >97.5%, and >95% calls per individual. A total of 271 SNPs were discarded because they showed a significant difference between the CORSA and KORA control groups (P-values smaller than 1.00×10⁻⁷ in a simple Fisher exact test comparing controls, as suggested in [63]). After filtering, 492,217 SNPs remained, for which imputation of missing genotypes was performed using Beagle software v.4.0 r1274 [64]. The primary aim of the study was to find SNPs which are associated with CRC or with advanced adenomas, respectively. To this end, we performed traditional single marker based analysis as well as a more involved model selection based approach. Single marker analysis was performed with PLINK 1.9 beta 3 (www.cog-genomics.org/plink2) [65]. We report P-values of the CAT as well as from a logistic regression model including the factors age and the leading four principal components from a principal component analysis (PCA), which was used to adjust for population structure [66]. A PCA plot of the first four principal components plotted against each other is provided in Supplementary Figure 1. Genotype cluster plots of all reported SNPs underwent visual inspection. For model selection analysis, the software package MOSGWA was applied (http://mosgwa.sourceforge.net) [39] using multi-marker logistic regression models, again including the factors age and the leading four principal components as covariates which were not under selection. In addition to the genome-wide analysis, we specifically inspected 56 SNPs which were previously reported in the GWAS literature to be involved in colorectal carcinogenesis. 
For SNPs not represented on the array, suitable proxies were identified and tested.
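The SNP-level inclusion criteria above (MAF >1%, HWE P-value >1.00×10⁻⁸, SNP call rate >97.5%) can be illustrated with a minimal Python sketch. It is not the pipeline used in the study, which relied on Affymetrix Genotyping Console and PLINK. The function names and the genotype-count input format are hypothetical, and the HWE P-value is computed here with a one-degree-of-freedom chi-square goodness-of-fit test, which is an assumption because the text does not state which HWE test was applied.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from counts of the three genotypes."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    return min(p, 1 - p)


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """HWE P-value from a 1-df chi-square goodness-of-fit test.

    Assumption: a chi-square test is used for illustration only;
    the study does not say which HWE test was applied.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0  # monomorphic SNP: no deviation can be measured
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))


def passes_snp_qc(n_aa, n_ab, n_bb, n_missing,
                  maf_min=0.01, hwe_p_min=1e-8, call_rate_min=0.975):
    """Apply the three SNP-level thresholds quoted in the text."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / (called + n_missing)
    return (call_rate > call_rate_min
            and minor_allele_frequency(n_aa, n_ab, n_bb) > maf_min
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) > hwe_p_min)


if __name__ == "__main__":
    # Hypothetical genotype counts (AA, AB, BB, missing) for four SNPs
    examples = {
        "common_in_hwe": (250, 500, 250, 0),
        "too_rare": (0, 10, 990, 0),
        "no_heterozygotes": (500, 0, 500, 0),
        "low_call_rate": (250, 500, 250, 50),
    }
    for name, counts in examples.items():
        print(name, passes_snp_qc(*counts))
```

The per-individual criterion (>95% calls per individual) and the Fisher exact comparison of the two control groups would be separate steps and are not covered by this sketch.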
References (10 of 62 shown)

1.  Principal components analysis corrects for stratification in genome-wide association studies.

Authors:  Alkes L Price; Nick J Patterson; Robert M Plenge; Michael E Weinblatt; Nancy A Shadick; David Reich
Journal:  Nat Genet       Date:  2006-07-23       Impact factor: 38.330

2.  Differential expressions of cancer-associated genes and their regulatory miRNAs in colorectal carcinoma.

Authors:  Murat Kara; Onder Yumrutas; Onder Ozcan; Ozgur Ilhan Celik; Esra Bozgeyik; Ibrahim Bozgeyik; Sener Tasdemir
Journal:  Gene       Date:  2015-04-27       Impact factor: 3.688

3.  Gene-environment interaction involving recently identified colorectal cancer susceptibility Loci.

Authors:  Elizabeth D Kantor; Carolyn M Hutter; Jessica Minnier; Sonja I Berndt; Hermann Brenner; Bette J Caan; Peter T Campbell; Christopher S Carlson; Graham Casey; Andrew T Chan; Jenny Chang-Claude; Stephen J Chanock; Michelle Cotterchio; Mengmeng Du; David Duggan; Charles S Fuchs; Edward L Giovannucci; Jian Gong; Tabitha A Harrison; Richard B Hayes; Brian E Henderson; Michael Hoffmeister; John L Hopper; Mark A Jenkins; Shuo Jiao; Laurence N Kolonel; Loic Le Marchand; Mathieu Lemire; Jing Ma; Polly A Newcomb; Heather M Ochs-Balcom; Bethann M Pflugeisen; John D Potter; Anja Rudolph; Robert E Schoen; Daniela Seminara; Martha L Slattery; Deanna L Stelling; Fridtjof Thomas; Mark Thornquist; Cornelia M Ulrich; Greg S Warnick; Brent W Zanke; Ulrike Peters; Li Hsu; Emily White
Journal:  Cancer Epidemiol Biomarkers Prev       Date:  2014-07-03       Impact factor: 4.254

4.  Genetics and Genetic Biomarkers in Sporadic Colorectal Cancer. (Review)

Authors:  John M Carethers; Barbara H Jung
Journal:  Gastroenterology       Date:  2015-07-26       Impact factor: 22.682

5.  Common genetic variants at the CRAC1 (HMPS) locus on chromosome 15q13.3 influence colorectal cancer risk.

Authors:  Emma Jaeger; Emily Webb; Kimberley Howarth; Luis Carvajal-Carmona; Andrew Rowan; Peter Broderick; Axel Walther; Sarah Spain; Alan Pittman; Zoe Kemp; Kate Sullivan; Karl Heinimann; Steven Lubbe; Enric Domingo; Ella Barclay; Lynn Martin; Maggie Gorman; Ian Chandler; Jayaram Vijayakrishnan; Wendy Wood; Elli Papaemmanuil; Steven Penegar; Mobshra Qureshi; Susan Farrington; Albert Tenesa; Jean-Baptiste Cazier; David Kerr; Richard Gray; Julian Peto; Malcolm Dunlop; Harry Campbell; Huw Thomas; Richard Houlston; Ian Tomlinson
Journal:  Nat Genet       Date:  2007-12-16       Impact factor: 38.330

6.  PUMA: a unified framework for penalized multiple regression analysis of GWAS data.

Authors:  Gabriel E Hoffman; Benjamin A Logsdon; Jason G Mezey
Journal:  PLoS Comput Biol       Date:  2013-06-27       Impact factor: 4.475

7.  Multiple common susceptibility variants near BMP pathway loci GREM1, BMP4, and BMP2 explain part of the missing heritability of colorectal cancer.

Authors:  Ian P M Tomlinson; Luis G Carvajal-Carmona; Sara E Dobbins; Albert Tenesa; Angela M Jones; Kimberley Howarth; Claire Palles; Peter Broderick; Emma E M Jaeger; Susan Farrington; Annabelle Lewis; James G D Prendergast; Alan M Pittman; Evropi Theodoratou; Bianca Olver; Marion Walker; Steven Penegar; Ella Barclay; Nicola Whiffin; Lynn Martin; Stephane Ballereau; Amy Lloyd; Maggie Gorman; Steven Lubbe; Bryan Howie; Jonathan Marchini; Clara Ruiz-Ponte; Ceres Fernandez-Rozadilla; Antoni Castells; Angel Carracedo; Sergi Castellvi-Bel; David Duggan; David Conti; Jean-Baptiste Cazier; Harry Campbell; Oliver Sieber; Lara Lipton; Peter Gibbs; Nicholas G Martin; Grant W Montgomery; Joanne Young; Paul N Baird; Steven Gallinger; Polly Newcomb; John Hopper; Mark A Jenkins; Lauri A Aaltonen; David J Kerr; Jeremy Cheadle; Paul Pharoah; Graham Casey; Richard S Houlston; Malcolm G Dunlop
Journal:  PLoS Genet       Date:  2011-06-02       Impact factor: 5.917

8.  A novel colorectal cancer risk locus at 4q32.2 identified from an international genome-wide association study.

Authors:  Stephanie L Schmit; Fredrick R Schumacher; Christopher K Edlund; David V Conti; Leon Raskin; Flavio Lejbkowicz; Mila Pinchev; Hedy S Rennert; Mark A Jenkins; John L Hopper; Daniel D Buchanan; Noralane M Lindor; Loic Le Marchand; Steven Gallinger; Robert W Haile; Polly A Newcomb; Shu-Chen Huang; Gad Rennert; Graham Casey; Stephen B Gruber
Journal:  Carcinogenesis       Date:  2014-07-14       Impact factor: 4.741

9.  The genetic heterogeneity of colorectal cancer predisposition - guidelines for gene discovery. (Review)

Authors:  M M Hahn; R M de Voer; N Hoogerbrugge; M J L Ligtenberg; R P Kuiper; A Geurts van Kessel
Journal:  Cell Oncol (Dordr)       Date:  2016-06-09       Impact factor: 6.730

10.  Statistical power of model selection strategies for genome-wide association studies.

Authors:  Zheyang Wu; Hongyu Zhao
Journal:  PLoS Genet       Date:  2009-07-31       Impact factor: 5.917
