Laurent Briollais, Yuanyuan Wang, Isaac Rajendram, Venus Onay, Ellen Shi, Julia Knight, Hilmi Ozcelik.
Abstract
BACKGROUND: There is growing evidence that gene-gene interactions are ubiquitous in determining the susceptibility to common human diseases. The investigation of such gene-gene interactions presents new statistical challenges for studies with relatively small sample sizes, as the number of potential interactions in the genome can be large. Breast cancer provides a useful paradigm to study genetically complex diseases because commonly occurring single nucleotide polymorphisms (SNPs) may additively or synergistically disturb the system-wide communication of the cellular processes leading to cancer development.
Year: 2007 PMID: 17683639 PMCID: PMC1976420 DOI: 10.1186/1741-7015-5-22
Source DB: PubMed Journal: BMC Med ISSN: 1741-7015 Impact factor: 8.775
Figure 1. Application of CART to the XPD*IL10 interaction. CART sequentially partitions the data into two homogeneous subsets: first using XPD-[Lys751Gln], {AA} versus {AC, CC}; and then the {AA} subset is split according to IL10-[G(-1082)A], {AA, AG} versus {GG}. The splitting variables leading to such groups are inherent main or interaction effects. For example, a low-risk subgroup is defined by the two SNPs XPD-[Lys751Gln] and IL10-[G(-1082)A], and the tree suggests an interaction between these two SNPs. Multi-way interactions can be detected in a similar way. The terminal nodes can be classified as low- or high-risk subgroups (indicated by different color density) and their association with the outcome can be estimated (that is, the corresponding odds ratio is 0.63 with a P-value of 0.002). Therefore, investigating the tree terminal nodes provides a natural way to identify interactions and characterize high- or low-risk subgroups.
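The two splits described in the Figure 1 caption, and the odds ratio attached to a terminal node, can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis: the function names, the node labels, and the record layout `(xpd_genotype, il10_genotype, is_case)` are assumptions, and the odds ratio is the plain 2x2 cross-product ratio of one node against all other nodes, without any adjustment.

```python
from collections import Counter

def terminal_node(xpd, il10):
    """Return the leaf reached by the two splits in the Figure 1 caption."""
    if xpd != "AA":                 # first split: {AA} versus {AC, CC}
        return "XPD{AC,CC}"
    if il10 in ("AA", "AG"):        # second split, applied only when XPD = AA
        return "XPD{AA}&IL10{AA,AG}"
    return "XPD{AA}&IL10{GG}"

def node_odds_ratio(records, node):
    """Odds ratio of being a case inside `node` versus all other leaves.

    `records` is an iterable of (xpd, il10, is_case) tuples (assumed layout).
    Raises ZeroDivisionError if a required cell of the 2x2 table is empty.
    """
    counts = Counter()
    for xpd, il10, is_case in records:
        inside = terminal_node(xpd, il10) == node
        counts[(inside, is_case)] += 1
    a, b = counts[(True, True)], counts[(True, False)]     # cases, controls inside
    c, d = counts[(False, True)], counts[(False, False)]   # cases, controls outside
    return (a * d) / (b * c)
```

An odds ratio below 1 for a node marks it as a candidate low-risk subgroup, and a value above 1 as a candidate high-risk subgroup.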
Figure 2. Example of partitions of two-locus genotypes with the three methods. (a-d) The four partitions identified by MDR for the XPD-CYP17 two-locus genotypes. (e) The best partition found by MDR for the COMT-CCDN1 two-locus genotypes. (f) The best partition found by CART for the CYP17-BARD1 two-locus genotypes. Shaded cells are classified as high-risk and non-shaded cells as low-risk. This corresponds to a ratio of cases versus controls higher or lower than 1, respectively. The four partitions of the two-locus genotypes found by MDR showed two cells with different assignments. In (f), CART can partition the two-locus genotypes in more than two groups, but for the purpose of comparison with MDR, we used the same high-risk/low-risk grouping.
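The high-risk/low-risk rule in the Figure 2 caption (a case-to-control ratio above or below 1 in each two-locus genotype cell) can be written down directly. The sketch below is a minimal illustration under stated assumptions: the function names and the `(genotype_1, genotype_2, is_case)` record layout are hypothetical, cells with a ratio of exactly 1 are labelled low-risk by convention here, and the accuracy computed is on the same data used to label the cells, whereas MDR reports accuracy on held-out data.

```python
from collections import defaultdict

def mdr_partition(records, threshold=1.0):
    """Label each observed two-locus genotype cell as 'high' or 'low' risk."""
    counts = defaultdict(lambda: [0, 0])        # cell -> [cases, controls]
    for g1, g2, is_case in records:
        counts[(g1, g2)][0 if is_case else 1] += 1
    labels = {}
    for cell, (cases, controls) in counts.items():
        # Equivalent to cases / controls > threshold, but written without a
        # division so that cells with zero controls do not raise an error.
        labels[cell] = "high" if cases > threshold * controls else "low"
    return labels

def classification_accuracy(records, labels):
    """Share of subjects whose status matches the risk label of their cell."""
    correct = sum((labels[(g1, g2)] == "high") == is_case
                  for g1, g2, is_case in records)
    return correct / len(records)
```

Collapsing the nine two-locus cells into two groups in this way is what reduces the two SNPs to a single binary predictor.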
Two-way interactions detected by the three methods with a P-value less than 5%
| Rank | LRM | CART | MDR | |||
| Interaction | Interaction | Interaction | ||||
| 1 | 0.013 | 0.006* | 0.009* | |||
| 2 | 0.020 | 0.013 | 0.019 | |||
| 3 | 0.046 | 0.037 | ||||
| 4 | 0.037 | |||||
*Significant after Bonferroni correction for the number of tests performed in the second stage analysis.
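The Bonferroni criterion mentioned in the footnote compares each P-value with the significance level divided by the number of tests. A minimal sketch follows; the function name is hypothetical, and the number of second-stage tests is not given in this record, so `n_tests` must be supplied by the reader.

```python
def bonferroni_significant(p_values, n_tests, alpha=0.05):
    """Flag P-values that stay significant after Bonferroni correction.

    n_tests is the number of tests performed; it is an input here because
    the record does not state it.
    """
    cutoff = alpha / n_tests
    return [p <= cutoff for p in p_values]

# Placeholder values for illustration only (not taken from the study).
flags = bonferroni_significant([0.004, 0.03], n_tests=5)
```

With these placeholder inputs the cutoff is 0.05 / 5 = 0.01, so only the first P-value is flagged.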
Interactions and risk subgroups identified by CART with a P-value less than 5%
| Rank | Interaction | P-value |
| 1 | | 0.006 |
| 2 | | 0.013 |
| 3 | | 0.016 |
| 4 | | 0.019 |
| 5 | | 0.020 |
| 6 | | 0.043 |
| 7 | | 0.050 |
Interactions selected by MDR with permutation P-value less than 5%
| Rank | Interactions | Testing accuracy* | Permutation P-value |
| 1 | | 58.2% | <0.001 |
| 2 | | 60.2% | <0.001 |
| 3 | | 58.4% | 0.001 |
| 4 | | 57.5% | 0.002 |
| 5 | | 56.7% | 0.006 |
| 6 | | 57.5% | 0.006 |
| 7 | | 57.2% | 0.007 |
| 8 | | 58.2% | 0.007 |
| 9 | | 55.9% | 0.009 |
| 10 | | 56.3% | 0.016 |
| 11 | | 55.6% | 0.017 |
| 12 | | 55.9% | 0.017 |
| 13 | | 55.2% | 0.019 |
| 14 | | 55.7% | 0.020 |
| 15 | | 55.7% | 0.023 |
| 16 | | 54.5% | 0.037 |
| 17 | | 54.7% | 0.037 |
| 18 | | 55.9% | 0.046 |
*1 - prediction error.
Main characteristics of each approach used to model SNP-SNP interactions
| Approach | Type of two-locus model detected | Pattern of complex interactions | Potential advantages | Potential limitations | Possible improvements |
| LRM | Logical AND models – multiplicative models | Cannot be investigated | Easy to fit | Curse of dimensionality | Logic regression |
| CART | Conditional recessive or dominant models | Driven by SNP main effects and binary splits | Deals with sparse data | Influence of main effects | Random forest |
| MDR | All types | Diverse | Deals with sparse data | Over-fitting | Limit plausible genetic models |
*Multivariate adaptive regression splines
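For the LRM row, the interaction that a logistic regression with a product term tests can be made concrete with binary carrier coding. In a saturated logistic model with two binary predictors, the exponentiated interaction coefficient equals the odds in the double-carrier cell divided by what the two single-carrier effects would predict multiplicatively. The sketch below computes that ratio from counts; the function name, the carrier coding (0 = non-carrier, 1 = carrier), and the count layout are assumptions made for illustration.

```python
def interaction_odds_ratio(counts):
    """Departure from a multiplicative two-locus model.

    `counts[(a, b)]` is a (cases, controls) pair for carrier status a at the
    first SNP and b at the second (0 or 1; assumed coding). The result equals
    exp(interaction coefficient) of the saturated logistic model: 1 means the
    joint effect is exactly multiplicative, values above or below 1 indicate
    a stronger or weaker joint effect.
    """
    odds = {cell: cases / controls for cell, (cases, controls) in counts.items()}
    return (odds[(1, 1)] * odds[(0, 0)]) / (odds[(1, 0)] * odds[(0, 1)])
```

With three genotypes per SNP the same idea needs four interaction terms per pair, which is one way the curse of dimensionality listed in the table arises.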