Xia Jiang, Jeremy Jao, Richard Neapolitan.
Abstract
BACKGROUND: The problems of correlation and classification are long-standing in the fields of statistics and machine learning, and techniques have been developed to address these problems. We are now in the era of high-dimensional data, which is data that can concern billions of variables. These data present new challenges. In particular, it is difficult to discover predictive variables when each variable has little marginal effect. An example concerns Genome-wide Association Studies (GWAS) datasets, which involve millions of single nucleotide polymorphisms (SNPs), where some of the SNPs interact epistatically to affect disease status. Towards determining these interacting SNPs, researchers developed techniques that addressed this specific problem. However, the problem is more general, and so these techniques are applicable to other problems concerning interactions. A difficulty with many of these techniques is that they do not distinguish whether a learned interaction is actually an interaction or whether it involves several variables with strong marginal effects.
Year: 2015 PMID: 26624895 PMCID: PMC4666609 DOI: 10.1371/journal.pone.0143247
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
An interaction with little marginal effect. Variables X and Y are both trinary predictors of a binary target Z.
The number next to each variable value shows the fraction of occurrence of that value in the population, and the entries in the table show the probability that Z equals z1 given each combination of the predictors.
| | | | |
|---|---|---|---|
| | 0.0 | 0.1 | 0.0 |
| | 0.11 | 0.0 | 0.11 |
| | 0.0 | 0.1 | 0.0 |
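The following minimal sketch illustrates what "little marginal effect" means for a table like the one above. The nine conditional probabilities are copied from the table. The value frequencies (0.25, 0.5, 0.25) are an assumption made only for illustration, since the frequencies are not legible here, and the two predictors are assumed to be independent of each other. The function names are hypothetical.

```python
# P(Z = z1 | row value, column value), copied from the table above.
p_z1 = [
    [0.0, 0.1, 0.0],
    [0.11, 0.0, 0.11],
    [0.0, 0.1, 0.0],
]

# ASSUMPTION: illustrative value frequencies, used for both predictors.
# The real frequencies are not recoverable from the table as shown.
freq = [0.25, 0.5, 0.25]


def marginal_by_row(table, col_freq):
    """P(Z = z1 | row value), averaging over the column predictor.

    Assumes the two predictors are independent of each other.
    """
    return [sum(p * f for p, f in zip(row, col_freq)) for row in table]


def marginal_by_col(table, row_freq):
    """P(Z = z1 | column value), averaging over the row predictor."""
    columns = list(zip(*table))
    return [sum(p * f for p, f in zip(col, row_freq)) for col in columns]


row_marginals = marginal_by_row(p_z1, freq)
col_marginals = marginal_by_col(p_z1, freq)

print("marginal by row value:   ", row_marginals)
print("marginal by column value:", col_marginals)
```

Under these assumed frequencies, every marginal probability lies between 0.05 and 0.055, although the joint table ranges from 0.0 to 0.11. Each predictor taken alone therefore says almost nothing about Z, while the pair taken together is informative.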
Fig 1. Bayesian network modeling relationships among respiratory diseases.
Fig 2. Causal BN DAG models.
Fig 3. If the DAG model in (b) has a lower score than the one in (a), we do not add node X3.
Fig 4. Comparison of MBS-IGain, REGAL, and MBS all using the MDL score (with T = 0.1 for MBS-IGain) according to performance Criterion 1.
Fig 5. Comparison of MBS-IGain, REGAL, and MBS all using the MDL score (with T = 0.1 for MBS-IGain) according to performance Criterion 2.
Fig 6. Comparison of MBS-IGain to 7 other methods according to performance Criterion 1.
SNP Harvester (SH), maximum entropy conditional probability modeling (MECPM), Bayesian epistasis association mapping (BEAM), logistic regression (LR), full interaction modeling (FIM), information gain (IG), multifactor dimensionality reduction (MDR).
Fig 7. ROC curves for MBS-IGain and 5 other methods.
SNP Harvester (SH), Bayesian epistasis association mapping (BEAM), full interaction modeling (FIM), information gain (IG), multifactor dimensionality reduction (MDR).
Fig 8. Histograms and percentile distributions when determining interactive models from the LOAD dataset.
Interactions learned from the LOAD dataset using the BDeu score with α = 4.
The third column shows the gene on which the SNP resides if it is located in a gene; otherwise it shows the chromosome. The fourth column shows the BDeu score of the interaction, and the fifth column shows the interaction strength of the interaction (see Eq 4).
| Rank | Interaction | Genes | BDeu | IS |
|---|---|---|---|---|
| 1 | rs10510511, rs197899, rs7115850, APOE | Chrome 3, Chrome 6, GAB2, APOE | -824.6 | 0.042 |
| 2 | rs11895074, rs7115850, APOE | SPAG16, GAB2, APOE | -827.1 | 0.035 |
| 3 | rs536128, rs7115850, APOE | CALN1, GAB2, APOE | -827.3 | 0.020 |
| 4 | rs7101429, rs10510511, rs197899, APOE | GAB2, Chrome 3, Chrome 6, APOE | -828.7 | 0.039 |
| 5 | rs11122116, rs16856748, rs734600, rs16992170, APOE | NPHP4, LRP2, EYA2, EYA2, APOE | -828.9 | 0.049 |
| 6 | APOE | APOE | -836.3 | 0 |
| 7 | rs41369150, rs2265264, rs11217838, rs4420638 | FNDC3B, Chrome 10, ARHGEF12, APOC1 | -861.8 | 0.049 |
| 8 | rs7355646, rs41369150, rs4420638 | Chrome 2, FNDC3B, APOC1 | -863.4 | 0.023 |
| 9 | rs7355646, rs41528844, rs4420638 | Chrome 2, ADAMTS16, APOC1 | -865.6 | 0.018 |
| 10 | rs2265264, rs4420638 | Chrome 10, APOC1 | -865.9 | 0.026 |
| 11 | rs41369150, rs4420638, rs6121360 | FNDC3B, APOC1, TM9SF4 | -866.1 | 0.019 |
| 12 | rs10922885, rs7355646, rs4420638 | Chrome 1, Chrome 2, APOC1 | -867.3 | 0.023 |
| 13 | rs41369150, rs4420638 | FNDC3B, APOC1 | -867.4 | 0.010 |
| 14 | rs7355646, rs4420638 | Chrome 2, APOC1 | -868.6 | 0.012 |
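As a rough illustration of how a BDeu score such as those in the table can be computed, the sketch below evaluates the standard BDeu log marginal likelihood for a single target node whose parents are the loci in a candidate interaction. It assumes natural logarithms and a count matrix with one row per parent configuration. The function name and the toy counts are hypothetical, and the paper's exact implementation may differ.

```python
import math


def bdeu_log_score(counts, alpha=4.0):
    """Standard BDeu log score of a target node given its parents.

    counts[j][k] is the number of samples with parent configuration j and
    target state k. Every possible parent configuration needs a row, even
    one with all-zero counts, so that q is counted correctly.
    alpha is the equivalent sample size (4 matches the table caption).
    """
    q = len(counts)        # number of parent configurations
    r = len(counts[0])     # number of target states
    a_j = alpha / q        # prior mass per parent configuration
    a_jk = alpha / (q * r) # prior mass per (configuration, state) cell
    score = 0.0
    for row in counts:
        n_j = sum(row)
        score += math.lgamma(a_j) - math.lgamma(a_j + n_j)
        for n_jk in row:
            score += math.lgamma(a_jk + n_jk) - math.lgamma(a_jk)
    return score


# Hypothetical toy data: one trinary SNP (3 rows) and a binary disease status.
toy_counts = [[40, 10], [25, 25], [10, 40]]
# The same data with the SNP ignored (a model with no parents).
no_parent_counts = [[75, 75]]

print(bdeu_log_score(toy_counts), bdeu_log_score(no_parent_counts))
```

Higher (less negative) scores indicate models that the data support more strongly, which is how the ranks in the table are ordered.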
Interactions learned from the LOAD dataset using the MDL score.
The third column shows the gene on which the SNP resides if it is located in a gene; otherwise it shows the chromosome. The fourth column shows the negative MDL score of the interaction, and the fifth column shows the interaction strength of the interaction (See Eq 4).
| Rank | Interaction | Genes | MDL | IS |
|---|---|---|---|---|
| 1 | rs7115850, APOE | GAB2, APOE | 160.5 | 0.013 |
| 2 | rs197899, APOE | Chrome 6, APOE | 158.3 | 0.009 |
| 3 | rs1785928, APOE | ELP2, APOE | 157.6 | 0.008 |
| 4 | rs10793294, APOE | GAB2, APOE | 156.0 | 0.012 |
| 5 | rs41491045, APOE | Chrome 2, APOE | 155.3 | 0.009 |
| 6 | rs2057537, APOE | TCP11, APOE | 155.2 | 0.007 |
| 7 | APOE | APOE | 154.7 | 0 |
| 8 | rs12421071, APOE | Chrome 11, APOE | 154.2 | 0.007 |
| 9 | rs891159, APOE | ATG10, APOE | 154.2 | 0.009 |
| 10 | rs12674799, APOE | Chrome 8, APOE | 153.4 | 0.010 |
| 11 | rs1957731, APOE | Chrome 4, APOE | 153.3 | 0.009 |
| 12 | rs1389421, APOE | Chrome 11, APOE | 153.0 | 0.007 |
| 13 | rs986647, APOE | Chrome 4, APOE | 153.0 | 0.009 |
| 14 | rs17095891, rs8108841, APOE | Chrome 10, Chrome 19, APOE | 152.8 | 0.016 |
| 15 | rs2717389, rs10130967, APOE | Chrome 3, Chrome 14, APOE | 140.7 | 0.022 |
| 16 | rs898717, rs16975605, APOE | FRMD4A, Chrome 15, APOE | 140.5 | 0.020 |
| 17 | rs11895074, rs7115850, APOE | SPAG16, GAB2, APOE | 138.2 | 0.035 |
| 18 | rs11676052, rs6719419, APOE | Chrome 2, Chrome 2, APOE | 138.1 | 0.020 |
| 19 | rs2265264, rs4420638 | Chrome 10, APOC1 | 104.9 | 0.026 |
| 20 | rs41369150, rs4420638 | FNDC3B, APOC1 | 99.7 | 0.010 |
| 21 | rs4420638 | APOC1 | 97.1 | 0 |
The loci involved in the 14 interactions learned using the BDeu score or the 21 interactions learned using the MDL score.
The second column shows their rank when we score all 312,260 1-SNP models using the MDL score. The third column shows the information gain provided by the SNP by itself. The SNP in the third row is not in a learned interaction. It is included to show the highest scoring SNP other than APOE or APOC1.
| Locus | Rank | Info Gain |
|---|---|---|
| APOE | 1 | 0.121 |
| rs4420638 (APOC1) | 2 | 0.080 |
| rs6784615 | 3 | 0.016 |
| rs1785928 | 111 | 0.010 |
| rs41528844 | 449 | 0.008 |
| rs7355646 | 968 | 0.007 |
| rs1389421 | 966 | 0.007 |
| rs8108841 | 994 | 0.007 |
| rs11895074 | 1085 | 0.007 |
| rs7115850 | 1384 | 0.006 |
| rs891159 | 1599 | 0.006 |
| rs41369150 | 1781 | 0.006 |
| rs898717 | 2731 | 0.006 |
| rs17095891 | 2817 | 0.006 |
| rs10510511 | 2901 | 0.006 |
| rs16975605 | 3404 | 0.005 |
| rs197899 | 3915 | 0.005 |
| rs6719419 | 4219 | 0.005 |
| rs536128 | 4841 | 0.005 |
| rs2057537 | 4813 | 0.005 |
| rs1957731 | 5781 | 0.005 |
| rs6121360 | 6887 | 0.005 |
| rs10130967 | 6901 | 0.005 |
| rs986647 | 7820 | 0.005 |
| rs10793294 | 7983 | 0.004 |
| rs10922885 | 8045 | 0.004 |
| rs2717389 | 8314 | 0.004 |
| rs41491045 | 8527 | 0.004 |
| rs12421071 | 8736 | 0.004 |
| rs12674799 | 8741 | 0.004 |
| rs11676052 | 8804 | 0.004 |
| rs7101429 | 11208 | 0.004 |
| rs11122116 | 13898 | 0.004 |
| rs16992170 | 18747 | 0.004 |
| rs11217838 | 20371 | 0.004 |
| rs2265264 | 99260 | 0.001 |
| rs16856748 | 108717 | 0.001 |
| rs734600 | 138078 | 0.001 |
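For readers unfamiliar with the "Info Gain" column, the sketch below computes information gain under its usual definition, IG(Z; X) = H(Z) - H(Z | X), from a table of counts for one locus against the target. Base-2 logarithms and the plug-in (empirical frequency) estimator are assumptions, and the function names are hypothetical. The paper's exact estimator may differ.

```python
import math


def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(counts):
    """IG(Z; X) = H(Z) - H(Z | X) from empirical counts.

    counts[x][z] is the number of samples with predictor value x and
    target value z.
    """
    total = sum(sum(row) for row in counts)
    n_targets = len(counts[0])
    target_totals = [sum(row[z] for row in counts) for z in range(n_targets)]
    h_z = entropy([t / total for t in target_totals])

    h_z_given_x = 0.0
    for row in counts:
        n_x = sum(row)
        if n_x == 0:
            continue  # an unobserved predictor value contributes nothing
        h_z_given_x += (n_x / total) * entropy([c / n_x for c in row])
    return h_z - h_z_given_x


# Hypothetical counts: rows are predictor values, columns are target values.
independent = [[10, 10], [20, 20]]  # predictor tells us nothing about Z
predictive = [[10, 0], [0, 10]]     # predictor determines Z exactly

print(information_gain(independent), information_gain(predictive))
```

A locus with an information gain near zero, like most rows in the table, carries almost no information about the target on its own, which is why such loci rank poorly among 1-SNP models even when they take part in a learned interaction.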
The genes involved in the 14 interactions learned using the BDeu score or the 21 interactions learned using the MDL score.
The second column shows whether previous research indicated that they were involved in an interaction concerning LOAD, and the third column shows whether previous research indicated that they had an effect on LOAD when interactions were not considered.
| Gene | Prev. Int. Effect | Previous No Int. Effect |
|---|---|---|
| APOE | Yes | Yes |
| APOC1 | Yes | Yes |
| GAB2 | Yes | No |
| SPAG16 | Yes | No |
| CALN1 | No | No |
| NPHP4 | No | No |
| LRP2 | No | Yes |
| EYA2 | No | No |
| FNDC3B | No | No |
| ARHGEF12 | No | No |
| ADAMTS16 | No | Yes |
| TM9SF4 | No | No |
| ELP2 | No | Yes |
| TCP11 | No | No |
| ATG10 | Yes | No |
| ZNF77 | No | No |
| FRMD4A | No | Yes |