Zexian Zeng, Xia Jiang, Richard Neapolitan.
Abstract
BACKGROUND: The problem of learning causal influences from data has recently attracted much attention. Standard statistical methods can have difficulty learning discrete causes that interact to affect a target, because the assumptions in these methods often do not model discrete causal relationships well. An important task then is to learn such interactions from data. Motivated by the problem of learning epistatic interactions from datasets developed in genome-wide association studies (GWAS), researchers conceived new methods for learning discrete interactions. However, many of these methods do not differentiate a model representing a true interaction from a model representing non-interacting causes with strong individual effects. The recent algorithm MBS-IGain addresses this difficulty by using Bayesian network learning and information gain to discover interactions from high-dimensional datasets. However, MBS-IGain requires marginal effects to detect interactions containing more than two causes. If the dataset is not high-dimensional, we can avoid this shortcoming by doing an exhaustive search.
Keywords: Bayesian network; Breast cancer survival; Cause; Epistasis; Information gain; Interaction; Low-dimensional
Year: 2016 PMID: 27230078 PMCID: PMC4880828 DOI: 10.1186/s12859-016-1084-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1. On the left is a Bayesian network representing a causal interaction with no marginal effects, and on the right is a Bayesian network representing a causal interaction described by the Noisy-Or model
Fig. 2. A Bayesian network representing the relationships among a small subset of variables related to respiratory illnesses
Fig. 3. Algorithm MBS-IGain
Fig. 4. Algorithm Exhaustive-IGain
Fig. 5. The model that X and Y are both parents of Z is on the left, and its three competing models are on the right
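The idea of an exhaustive search for interactions, as described in the abstract, can be illustrated with a small sketch. This is not the Exhaustive-IGain algorithm of Fig. 4, whose details are not reproduced here: it swaps in plain entropy-based information gain instead of Bayesian network scoring. The function names, the `interaction_strength` measure (joint gain minus the sum of single-variable gains), and the toy XOR data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, subset, target):
    """H(target) - H(target | subset), estimated from row frequencies."""
    groups = {}
    for row in rows:
        key = tuple(row[v] for v in subset)
        groups.setdefault(key, []).append(row[target])
    n = len(rows)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - conditional


def interaction_strength(rows, subset, target):
    """Hypothetical measure: joint gain minus the sum of individual gains.

    A large value means the subset predicts the target much better
    together than its members do separately.
    """
    joint = info_gain(rows, subset, target)
    singles = sum(info_gain(rows, (v,), target) for v in subset)
    return joint - singles


def exhaustive_search(rows, predictors, target, max_size=2):
    """Score every predictor subset of size 2..max_size, best first."""
    scored = []
    for k in range(2, max_size + 1):
        for subset in combinations(predictors, k):
            scored.append((interaction_strength(rows, subset, target), subset))
    return sorted(scored, reverse=True)


# Toy data (made up): Z = X XOR Y, so neither X nor Y has a marginal
# effect on Z, and W is irrelevant.
rows = [
    {"X": x, "Y": y, "W": w, "Z": x ^ y}
    for x in (0, 1) for y in (0, 1) for w in (0, 1)
]
ranked = exhaustive_search(rows, ["X", "Y", "W"], "Z", max_size=2)
print(ranked[0])
```

On the toy data the pair (X, Y) ranks first even though neither variable has a marginal effect, which is the case a search guided by marginal effects can miss. The cost is that the number of subsets grows combinatorially with the number of predictors, which is why this approach suits only low-dimensional datasets.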
The clinical variables in the METABRIC dataset
| Variable | Description | Values |
|---|---|---|
| age_at_diagnosis | age at diagnosis of the disease | 0–39, 39–54, 54–69, 69–84, 84–100 |
| menopausal_status | inferred menopausal status | pre, post |
| size | size of tumor in cm | 0–20, 20–50, 50–180 |
| lymph_nodes_positive | number of positive lymph nodes | 0, 1, 2–3, 4–5, 6–9, ≥ 10 |
| lymph_nodes_removed | number of lymph nodes removed | 0, 1–3, 4–9, 10–20, ≥ 21 |
| percent_nodes_positive | percent of removed nodes positive | 0–0.2, 0.2–0.4, 0.4–0.6, 0.6–0.8, 0.8–1 |
| grade | grade of disease | 1, 2, 3 |
| stage | composite of size and # positive nodes | 0,1,2,3,4 |
| histological | tumor histology | IDC, Other |
| ER_Expr | estrogen receptor expression | +, − |
| PR_Expr | progesterone receptor expression | +, − |
| HER2_status | HER2 expression | +, − |
| P53_mutation_status | whether P53 is mutated | +, − |
| chemo | whether patient had chemotherapy | yes, no |
| radiation | whether patient had radiation therapy | yes, no |
| hormone | whether patient had hormone therapy | yes, no |
Fig. 6. Comparison of Exhaustive-IGain and MBS-IGain, when analysing the simulated datasets based on interactions with marginal effects, using Performance Criterion 1
Fig. 7. Comparison of Exhaustive-IGain and MBS-IGain, when analysing the simulated datasets based on interactions with marginal effects, using Performance Criterion 2
Fig. 8. Comparison of Exhaustive-IGain and MBS-IGain, when analysing the simulated datasets based on pure epistatic interactions with no marginal effects, using Performance Criterion 2
The individual variable effects learned from the METABRIC dataset. The p-values were obtained using the chi-square test
| Variable | 5 year BC death: BNPP | 5 year BC death: p-value | 10 year BC death: BNPP | 10 year BC death: p-value | 15 year BC death: BNPP | 15 year BC death: p-value |
|---|---|---|---|---|---|---|
| P53_mutation_status | 1 | 0 | 0.97 | 0.001 | 0.936 | 0.0004 |
| HER2_Status | 1 | 0 | 1 | 0 | 0.853 | 0.0006 |
| chemo | 1 | 0 | 1 | 0 | 0.999 | 0 |
| PR_category | 1 | 0 | 1 | 0 | 0.971 | 0.002 |
| hormone | 0.880 | 0.112 | 0.410 | 0.120 | 0.999 | 0 |
| radiation | 0.240 | 0.320 | 0.170 | 1 | 0.280 | 0.576 |
| ER_category | 1 | 0 | 1 | 0 | 0.889 | 0.002 |
| overall_stage | 1 | 0 | 1 | 0 | 1 | 0 |
| menopausal_status | 0.940 | 0.019 | 0.190 | 0.76 | 0.421 | 0.554 |
| histological | 0.450 | 0.0250 | 0.940 | 0.002 | 0.913 | 0.055 |
| lymph_nodes_pos | 1 | 0 | 1 | 0 | 1 | 0 |
| percent_nodes_positive | 1 | 0 | 1 | 0 | 0.999 | 0 |
| overall_grade | 1 | 0 | 1 | 0 | 0.999 | 0.0001 |
| size | 1 | 0 | 1 | 0 | 0.954 | 0.014 |
| age_at_diagnosis | 1 | 0 | 1 | 0 | 0.950 | 0.0003 |
| axillary_nodes_removed | 0.160 | 0.113 | 0.950 | 0.003 | 0.147 | 0.567 |
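The caption above states that the p-values come from the chi-square test. As a minimal sketch of how such a test works on a contingency table of a discrete variable against an outcome, the code below computes the Pearson chi-square statistic in pure Python. The counts are made up for illustration and are not from METABRIC, the function names are hypothetical, no continuity correction is applied, and the p-value helper covers only the one-degree-of-freedom case (a 2×2 table).

```python
from math import erfc, sqrt


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom.

    `table` is a list of rows of observed counts. Every row and
    column total must be non-zero, otherwise an expected count of
    zero would cause a division by zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def p_value_one_df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    With one degree of freedom the statistic is the square of a
    standard normal variable, so P(X > stat) = erfc(sqrt(stat / 2)).
    """
    return erfc(sqrt(stat / 2))


# Made-up 2x2 counts: rows are the two values of a binary variable,
# columns are outcome yes / no.
toy_counts = [[30, 10], [10, 30]]
stat, df = chi_square_statistic(toy_counts)
print(stat, df, p_value_one_df(stat))
```

For variables with more than two values, such as stage or grade, the statistic and degrees of freedom are computed the same way, but the p-value needs the general chi-square distribution, which a statistics library would normally supply.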
The interactions learned from the METABRIC dataset
| Outcome | Interaction | BNPP | IP |
|---|---|---|---|
| 5 year BC death | histological, menopausal_status | 0.77 | 0.43 |
| 5 year BC death | histological, hormone | 0.93 | 0.47 |
| 10 year BC death | hormone, menopausal_status | 0.32 | 0.72 |
| 15 year BC death | histological, menopausal_status | 0.57 | 0.49 |
The average BNPPs and IPs of all 2-, 3-, 4-, and 5-predictor models obtained from the METABRIC dataset
| Model | Avg. BNPP | Avg. IP |
|---|---|---|
| 2-predictor models | 0.266 | 0.042 |
| 3-predictor models | 0.005 | −0.005 |
| 4-predictor models | 6.13 × 10⁻⁷ | 0.013 |
| 5-predictor models | 7.04 × 10⁻¹⁶ | 0.040 |