Nicholas B Larson, Gregory D Jenkins, Melissa C Larson, Robert A Vierkant, Thomas A Sellers, Catherine M Phelan, Joellen M Schildkraut, Rebecca Sutphen, Paul P D Pharoah, Simon A Gayther, Nicolas Wentzensen, Ellen L Goode, Brooke L Fridley.
Abstract
Although single-locus approaches have been widely applied to identify disease-associated single-nucleotide polymorphisms (SNPs), complex diseases are thought to be the product of multiple interactions between loci. This has led to the recent development of statistical methods for detecting statistical interactions between two loci. Canonical correlation analysis (CCA) has previously been proposed to detect gene-gene coassociation. However, this approach is limited to detecting linear relations and can only be applied when the number of observations exceeds the number of SNPs in a gene. This limitation is particularly important for next-generation sequencing, which could yield a large number of novel variants on a limited number of subjects. To overcome these limitations, we propose an approach to detect gene-gene interactions on the basis of a kernelized version of CCA (KCCA). Our simulation studies showed that KCCA controls the Type-I error, and is more powerful than leading gene-based approaches under a disease model with negligible marginal effects. To demonstrate the utility of our approach, we also applied KCCA to assess interactions between 200 genes in the NF-κB pathway in relation to ovarian cancer risk in 3869 cases and 3276 controls. We identified 13 significant gene pairs relevant to ovarian cancer risk (local false discovery rate <0.05). Finally, we discuss the advantages of KCCA in gene-gene interaction analysis and its future role in genetic association studies.
Year: 2013 PMID: 23591404 PMCID: PMC3865403 DOI: 10.1038/ejhg.2013.69
Source DB: PubMed Journal: Eur J Hum Genet ISSN: 1018-4813 Impact factor: 4.246
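The two limitations of linear CCA named in the abstract (it only sees linear relations, and it needs more observations than SNPs) can be illustrated on toy data. The following is a minimal NumPy sketch and not the authors' KCCA implementation: the function names, the simulated data, and the explicit squared feature (standing in for what a kernel does implicitly) are all assumptions made for illustration.

```python
import numpy as np

def orthonormal_basis(M, tol=1e-10):
    """Orthonormal basis for the column space of the column-centered matrix M."""
    M = np.asarray(M, dtype=float)
    M = M - M.mean(axis=0)
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    # Keep only directions with a non-negligible singular value.
    return U[:, s > tol * s.max()]

def first_canonical_corr(X, Y):
    """Largest sample canonical correlation between the columns of X and Y."""
    Qx, Qy = orthonormal_basis(X), orthonormal_basis(Y)
    # Canonical correlations are the singular values of Qx^T Qy.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(s.max(), 1.0))

rng = np.random.default_rng(0)

# (1) More markers than subjects: two unrelated "genes" with random
#     genotypes coded 0/1/2 (hypothetical data, 20 subjects, 30 and 5 SNPs).
X = rng.integers(0, 3, size=(20, 30))
Y = rng.integers(0, 3, size=(20, 5))
r_small_n = first_canonical_corr(X, Y)

# (2) Purely nonlinear dependence: y depends on x only through x**2.
x = rng.uniform(-1, 1, size=(2000, 1))
y = x**2 + 0.01 * rng.standard_normal((2000, 1))
r_linear = first_canonical_corr(x, y)                       # linear CCA
r_feature = first_canonical_corr(np.hstack([x, x**2]), y)   # with a nonlinear feature

print(r_small_n, r_linear, r_feature)
```

In case (1) the sample canonical correlation is driven to 1 even though the two marker sets are unrelated, which is why the linear statistic carries no information once the SNPs outnumber the subjects. In case (2) the linear correlation stays near zero while adding a nonlinear feature recovers the dependence; a kernel achieves this kind of expansion implicitly.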
Figure 1. Line plot of the empirical Type-I error rates from the null simulations across trim levels ω=0.00, 0.05, 0.10, 0.15, 0.20, and 0.25, for sample sizes of 500, 1000, and 1500.
KCCA null simulation results for various trimming values ω, including the P-value of the Kolmogorov-Smirnov test for normality (KS), the empirical SD and mean of the simulated test statistic distribution, and the realized Type-I error rate when rejecting at an α-level of 0.05
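| ω | KS P (N=500) | SD (N=500) | Mean (N=500) | Type-I error (N=500) | KS P (N=1000) | SD (N=1000) | Mean (N=1000) | Type-I error (N=1000) | KS P (N=1500) | SD (N=1500) | Mean (N=1500) | Type-I error (N=1500) |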
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.00 | <0.001 | 0.633 | 0.002 | 0.001 | <0.001 | 0.643 | 0.001 | 0.003 | <0.001 | 0.757 | 0.002 | 0.006 |
| 0.05 | <0.001 | 0.798 | −0.001 | 0.012 | <0.001 | 0.801 | −0.001 | 0.016 | <0.001 | 0.938 | −0.001 | 0.021 |
| 0.10 | <0.001 | 0.904 | −0.003 | 0.033 | <0.001 | 0.902 | −0.002 | 0.027 | 0.010 | 1.038 | 0.002 | 0.036 |
| 0.15 | 0.281 | 1.007 | −0.003 | 0.540 | 0.810 | 1.008 | −0.004 | 0.046 | 0.538 | 1.144 | 0.004 | 0.054 |
| 0.20 | 0.002 | 1.107 | −0.001 | 0.081 | 0.002 | 1.114 | −0.005 | 0.072 | 0.001 | 1.250 | 0.005 | 0.078 |
| 0.25 | <0.001 | 1.201 | −0.001 | 0.107 | <0.001 | 1.220 | −0.005 | 0.095 | <0.001 | 1.363 | 0.004 | 0.110 |
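The link between the empirical SD and the realized Type-I error in the table can be checked with a short calculation. This is a sketch under stated assumptions: the test is taken to be two-sided against a standard normal reference, and the null statistic is treated as exactly normal with the tabulated mean and SD (the KS column shows this holds only approximately, and best near ω=0.15). The function name `expected_type1` is hypothetical.

```python
from statistics import NormalDist

def expected_type1(sd, mean=0.0, alpha=0.05):
    """Rejection rate of a two-sided N(0, 1) test when the statistic
    actually follows N(mean, sd**2) under the null (assumed normal)."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    null = NormalDist(mean, sd)
    return null.cdf(-crit) + (1 - null.cdf(crit))

# Empirical SDs taken from the first sample-size block of the table.
for sd in (0.633, 1.007, 1.201):
    print(sd, round(expected_type1(sd), 3))
```

An SD below 1 gives a conservative test and an SD above 1 an inflated one. For example, an SD of 1.201 yields a rate of roughly 0.10, in line with the 0.107 reported at ω=0.25.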
Figure 2. Barplots of the empirical power at α-level of 0.05 for the KCCA, CCA, PC-LR, and CLD gene–gene interaction methods, for sample sizes of 500, 1000, and 1500.
Simulation results for empirical power, using α=0.05 level significance testing, for the KCCA, CCA, PC-LR, and CLD testing procedures across varying numbers of markers per gene
| Method | | | | | |
|---|---|---|---|---|---|
| KCCA | 0.852 | 0.818 | 0.799 | 0.779 | 0.757 |
| CCA | 0.428 | 0.097 | 0.012 | 0.002 | 0.000 |
| PC-LR | 0.570 | 0.575 | 0.552 | 0.545 | 0.544 |
| CLD | 0.724 | 0.570 | 0.371 | 0.305 | 0.234 |
Sample sizes are fixed at 1000, with five causal markers per gene.
Detailed results for the significant (lFDR≤0.05) KCCA gene–gene coassociations with ovarian cancer risk from the FOCI analysis, ranked by estimated lFDR
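| Cases (Fisher z) | Controls (Fisher z) | KCCA test statistic | KCCA P-value | KCCA lFDR | CCA test statistic | CCA P-value | | |
|---|---|---|---|---|---|---|---|---|
| 0.0875 | 0.0074 | 14.2897 | <1.00E–16 | 3.38E–21 | −0.7248 | 0.46853 | ||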
| 1.7243 | 1.9167 | −12.8110 | <1.00E–16 | 1.58E–15 | −0.0938 | 0.92525 | ||
| 0.0976 | 0.0196 | 8.1708 | 2.22E–16 | 8.53E–06 | −0.2205 | 0.82548 | ||
| 0.1146 | 0.0462 | 8.2271 | 2.22E–16 | 7.11E–06 | −0.0947 | 0.92449 | ||
| 0.1049 | 0.0344 | 9.2402 | <1.00E–16 | 5.65E–08 | 0.2640 | 0.79171 | ||
| 0.0436 | 0.0989 | −6.8641 | 6.69E–12 | 0.008161 | −0.1386 | 0.88972 | ||
| 0.1073 | 0.0524 | 6.8256 | 8.75E–12 | 0.001646 | −0.3888 | 0.69736 | ||
| 0.1025 | 0.0317 | 10.6548 | <1.00E–16 | 2.59E–11 | −0.2520 | 0.80098 | ||
| 0.0763 | 0.0194 | 5.7655 | 8.14E–09 | 0.028977 | 0.7772 | 0.43702 | ||
| 0.1125 | 0.0465 | 5.5000 | 3.80E–08 | 0.049916 | −1.0316 | 0.30223 | ||
| 0.0601 | 0.0001 | 10.8523 | <1.00E–16 | 8.74E–12 | −0.1986 | 0.84251 | ||
| 0.1032 | 0.0377 | 6.1801 | 6.41E–10 | 0.010925 | 0.3529 | 0.72412 | ||
| 0.0231 | 0.1012 | −6.5902 | 4.39E–11 | 0.016345 | 0.0313 | 0.97502 | ||
Abbreviations: CCA, canonical correlation analysis; FOCI, Follow-up Ovarian Cancer Genetic Association and Interaction Studies; KCCA, kernelized version of CCA; lFDR, local false discovery rate.
Includes the Fisher-transformed maximal kernel canonical correlation values for cases and controls, the resulting test statistic, P-values, and lFDR estimates for KCCA, as well as results for the CCA analysis.
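The quantities in this table can be related with two small helpers. This is a hedged sketch: the Fisher transformation is the standard arctanh of a correlation, while treating the test statistics as standard normal with two-sided P-values is an assumption (it reproduces the tabulated P-values to rounding, but the record does not state how the KCCA statistic's variance was estimated). The function names are hypothetical.

```python
import math

def fisher_z(r):
    """Fisher transformation of a correlation r in (-1, 1)."""
    return math.atanh(r)

def two_sided_p(stat):
    """Two-sided P-value for a statistic treated as standard normal."""
    return math.erfc(abs(stat) / math.sqrt(2.0))

# Test statistics and reported P-values copied from the table above
# (three CCA rows and one KCCA row).
rows = [(-0.7248, 0.46853), (-0.0938, 0.92525), (-0.2205, 0.82548), (5.5000, 3.80e-08)]
for stat, reported in rows:
    print(stat, two_sided_p(stat), reported)
```

The transformation can be inverted with `math.tanh`; for instance, a Fisher-transformed value of 1.7243 corresponds to a kernel canonical correlation of about 0.94.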
Figure 3. Colorized image plot of Pearson's correlation values between SNPs for CASP8-MAP3K3 coassociation for cases (left) and controls (right). The axes depict the genomic position of the markers on the respective genes.
Figure 4. Barplots of the empirical power at α-level of 0.05 for the KCCA, CCA, PC-LR, and CLD gene–gene interaction methods, under sample sizes of 500, 1000, and 1500 with the inclusion of statistically significant marginal effects.