Joeri J Meijsen1,2, Alexandros Rammos1,3, Archie Campbell1, Caroline Hayward4, David J Porteous1,2, Ian J Deary2,5, Riccardo E Marioni1,2, Kristin K Nicodemus1,2.
Abstract
Motivation: The genomic architecture of human complex diseases is thought to be attributable to single markers, polygenic components and epistatic components. No study has examined the ability of tree-based methods to detect epistasis in the presence of a polygenic signal. We sought to apply decision tree-based methods, C5.0 and logic regression, to detect epistasis under several simulated conditions, varying strength of interaction and linkage disequilibrium (LD) structure. We then applied the same methods to the phenotype of educational attainment in a large population cohort.
Year: 2019 PMID: 29931044 PMCID: PMC6330004 DOI: 10.1093/bioinformatics/bty462
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Visual representation of a C5.0 and logic tree. (A) C5.0 decision tree; (B) logic tree
Two-SNP interaction models, R² and P-values
| Model | β1, β2 | β3 | | | Mean P-value | Median P-value |
|---|---|---|---|---|---|---|
| Strong | 0.2 | 0.24 | 1.6 | 35.6 | 1.3 × 10⁻¹⁰ | 3.5 × 10⁻¹⁷ |
| Intermediate | 0.125 | 0.15 | 0.82 | 17.8 | 3.6 × 10⁻⁴ | 2.0 × 10⁻⁷ |
| Weak | 0.07 | 0.09 | 0.35 | 6.8 | 3.1 × 10⁻² | 1.8 × 10⁻³ |
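As an illustration of how a two-SNP interaction model of this kind could be simulated, the sketch below uses the coefficients from the Strong row (β1 = β2 = 0.2, β3 = 0.24). The record does not state the generating model, so the linear form y = β1·g1 + β2·g2 + β3·g1·g2 + noise, the minor allele frequency of 0.3, the unit-variance Gaussian noise, the sample size and the function names `simulate_two_snp` and `fit_interaction` are all assumptions made for illustration, not the authors' simulation design.

```python
import numpy as np

def simulate_two_snp(n, beta1, beta2, beta3, maf=0.3, seed=0):
    """Simulate two independent SNPs and a quantitative phenotype.

    Assumed model (not taken from the source):
        y = beta1*g1 + beta2*g2 + beta3*g1*g2 + N(0, 1) noise
    Genotypes are allele counts (0, 1, 2) drawn under Hardy-Weinberg
    equilibrium with a hypothetical minor allele frequency `maf`.
    """
    rng = np.random.default_rng(seed)
    g1 = rng.binomial(2, maf, size=n)
    g2 = rng.binomial(2, maf, size=n)
    y = beta1 * g1 + beta2 * g2 + beta3 * g1 * g2 + rng.normal(size=n)
    return g1, g2, y

def fit_interaction(g1, g2, y):
    """Ordinary least squares fit of intercept, both main effects and g1*g2."""
    X = np.column_stack([np.ones(len(y)), g1, g2, g1 * g2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # order: intercept, beta1, beta2, beta3

# "Strong" two-SNP model: beta1 = beta2 = 0.2, beta3 = 0.24
g1, g2, y = simulate_two_snp(n=20000, beta1=0.2, beta2=0.2, beta3=0.24)
coef = fit_interaction(g1, g2, y)
print("estimated interaction coefficient:", coef[3])
```

With a sample this large the fitted interaction coefficient should land close to the simulated β3; smaller samples or the Weak coefficients would give a much noisier estimate.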
Three-SNP interaction models, R² and P-values
| Model | β1, β2, β3 | β4, β5, β6 | β7 | | | | Mean P-value | Median P-value |
|---|---|---|---|---|---|---|---|---|
| Pure | 0 | 0 | 0.4 | 0.04 | 1.86 | 30.1 | 1.0 × 10⁻¹⁴ | 1.1 × 10⁻²² |
| Strong | 0.05 | 0.1 | 0.2 | 0.19 | 0.41 | 39.9 | 4.5 × 10⁻⁴ | 6.6 × 10⁻⁷ |
| Weak | 0.025 | 0.05 | 0.1 | 0.15 | 0.1 | 14.3 | 7.7 × 10⁻² | 1.3 × 10⁻² |
Fig. 2. Flow chart for logic regression analyses
Power of C5.0 and logic regression in pruned and unpruned data, with and without polygenic component
| Condition | C5.0, without polygenic | LR, without polygenic | C5.0, with polygenic | LR, with polygenic |
|---|---|---|---|---|
| 2-SNP, Pruned, Weak | 8.6% | 77.0% | 0% | 9.6% |
| 2-SNP, Pruned, Intermediate | 99.2% | 98.8% | 23% | 82.8% |
| 2-SNP, Pruned, Strong | 100% | 99.8% | 100% | 98.8% |
| 2-SNP, Unpruned, Weak | 19.8% | 0.3% | 0.6% | 0% |
| 2-SNP, Unpruned, Intermediate | 98.2% | 17.6% | 41.6% | 2.2% |
| 2-SNP, Unpruned, Strong | 100% | 52.8% | 100% | 23.4% |
| 3-SNP, Pruned, Weak | 90.4% | 89.0% | 3.6% | 53.6% |
| 3-SNP, Pruned, Strong | 100% | 99.6% | 100% | 95.2% |
| 3-SNP, Pruned, Pure | 100% | 99.6% | 100% | 98.4% |
| 3-SNP, Unpruned, Weak | 91.0% | 14.7% | 24.0% | 2.2% |
| 3-SNP, Unpruned, Strong | 100% | 35.4% | 100% | 21.0% |
| 3-SNP, Unpruned, Pure | 100% | 50.2% | 100% | 35.2% |
LR, Logic Regression.
Power of logic regression based on randomization tests
| Model | Power, pruned (%) | Power, unpruned (%) |
|---|---|---|
| Weak 2-SNP interaction | 97.4 | 64.6 |
| Intermediate 2-SNP interaction | 100 | 100 |
| Strong 2-SNP interaction | 100 | 100 |
| 30% Polygenic + Weak 2-SNP interaction | 89.6 | 54 |
| 30% Polygenic + Intermediate 2-SNP interaction | 100 | 89.6 |
| 30% Polygenic + Strong 2-SNP interaction | 100 | 100 |
| Weak 3-SNP interaction | 100 | 99.2 |
| Strong 3-SNP interaction | 100 | 100 |
| Pure 3-SNP interaction | 100 | 100 |
| 30% Polygenic + Weak 3-SNP interaction | 100 | 92.6 |
| Polygenic + Strong 3-SNP interaction | 100 | 100 |
| Polygenic + Pure 3-SNP interaction | 100 | 100 |
| Polygenic model | 99.6 | 94.4 |
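To make the idea of power based on randomization tests concrete, the sketch below estimates power as the fraction of simulated replicates whose permutation P-value falls at or below a threshold. It is a generic illustration only: the test statistic (absolute correlation between the phenotype and the product of two genotypes) stands in for a logic regression score, and the simulated model, sample size, number of permutations, threshold and the function names `permutation_p_value`, `abs_corr` and `estimate_power` are assumptions, not details taken from the study.

```python
import numpy as np

def permutation_p_value(stat_fn, x, y, n_perm, rng):
    """Randomization test: compare the observed statistic with its
    distribution when the phenotype y is shuffled relative to x."""
    observed = stat_fn(x, y)
    exceed = 0
    for _ in range(n_perm):
        if stat_fn(x, rng.permutation(y)) >= observed:
            exceed += 1
    # The +1 terms count the observed data as one of the arrangements.
    return (exceed + 1) / (n_perm + 1)

def abs_corr(x, y):
    """Stand-in test statistic (hypothetical, not a logic regression score)."""
    return abs(np.corrcoef(x, y)[0, 1])

def estimate_power(n_reps, n, beta3, alpha=0.05, n_perm=99, seed=0):
    """Fraction of simulated replicates with permutation P-value <= alpha."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_reps):
        g1 = rng.binomial(2, 0.3, size=n)  # assumed allele frequency of 0.3
        g2 = rng.binomial(2, 0.3, size=n)
        y = beta3 * g1 * g2 + rng.normal(size=n)  # assumed interaction-only model
        p = permutation_p_value(abs_corr, g1 * g2, y, n_perm, rng)
        if p <= alpha:
            hits += 1
    return hits / n_reps

power = estimate_power(n_reps=50, n=300, beta3=0.24)
print(f"estimated power: {100 * power:.1f}%")
```

The resolution of the permutation P-value is limited by the number of permutations (the smallest attainable value here is 1/100), and the power estimate itself carries sampling error that shrinks as the number of replicates grows.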