Ruowang Li, Scott M Dudek, Dokyoon Kim, Molly A Hall, Yuki Bradford, Peggy L Peissig, Murray H Brilliant, James G Linneman, Catherine A McCarty, Le Bao, Marylyn D Ritchie.
Abstract
BACKGROUND: Medicine is moving towards precision medicine, which aims to prevent and treat diseases by taking inter-individual variability into account. A large part of that variability lies in our genetic makeup. With the fast-paced improvement of high-throughput methods for genome sequencing, a tremendous amount of genetic data has already been generated. The next hurdle for precision medicine is to have sufficient computational tools for analyzing large datasets. Genome-Wide Association Studies (GWAS) have been the primary method for assessing the relationship between single nucleotide polymorphisms (SNPs) and disease traits. While GWAS is effective at finding individual SNPs with strong main effects, it does not capture potential interactions among multiple SNPs. For many traits, a large proportion of variation remains unexplained by main effects alone, leaving the door open for exploring the role of genetic interactions. However, identifying genetic interactions in large-scale genomics data poses a challenge even for modern computing.
Keywords: Bayesian Network; Discriminant analysis; Evolution algorithm; Genetic interactions; Type 2 diabetes
Year: 2016 PMID: 27168765 PMCID: PMC4862166 DOI: 10.1186/s13040-016-0094-4
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Fig. 1 Generation of BN using the grammar
Fig. 2 Schematic of data simulation. In main effect models, the case and control datasets have different allele frequencies at the simulated SNPs. In interaction effect models, the case and control datasets have different simulated interacting SNPs, without main effects
Data simulation details
| Model | Functional SNPs in case data | Functional SNPs in control data | Weight (W) | No. datasets for each W | Total SNPs | Sample size |
|---|---|---|---|---|---|---|
| Main effect | SNP A | SNP A | 0.1, 0.5, 0.9 | 10 | 100, 500 | 4000 |
| Main effect | SNP A, B, C, D | SNP A, B, C, D | 0.1, 0.5, 0.9 | 10 | 100, 500 | 4000 |
| Interaction effect | SNP A * SNP B | None | 0.1, 0.5, 0.9 | 10 | 100, 500 | 4000 |
| Interaction effect | SNP A * SNP B; SNP C * SNP D | SNP W * SNP X; SNP Y * SNP Z | 0.1, 0.5, 0.9 | 10 | 100, 500 | 4000 |
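The interaction-only design in the table can be illustrated with a minimal Python sketch. It is not the authors' simulation procedure: the function name `simulate_genotypes`, the uniform 0/1/2 genotype coding, and the reading of the weight W as the probability that one SNP copies its partner are all assumptions made for illustration. The point is only that two SNPs can be strongly associated with each other in cases while each SNP's marginal distribution is the same in cases and controls, so there is no main effect to detect.

```python
import random


def simulate_genotypes(n_samples, n_snps, weight, interacting=None, seed=0):
    """Return n_samples rows of genotypes coded 0/1/2 (hypothetical coding).

    If interacting=(i, j), SNP j copies SNP i with probability `weight`
    (an assumed interpretation of W); otherwise all SNPs are independent.
    Copying a uniform genotype leaves the marginal of SNP j uniform,
    so no main effect is introduced.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        row = [rng.choice((0, 1, 2)) for _ in range(n_snps)]
        if interacting is not None and rng.random() < weight:
            i, j = interacting
            row[j] = row[i]
        rows.append(row)
    return rows


def mean(xs):
    return sum(xs) / len(xs)


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Cases carry an A*B interaction; controls carry none.
    cases = simulate_genotypes(2000, 10, weight=0.9, interacting=(0, 1), seed=1)
    controls = simulate_genotypes(2000, 10, weight=0.9, interacting=None, seed=2)
    for name, data in (("cases", cases), ("controls", controls)):
        a = [row[0] for row in data]
        b = [row[1] for row in data]
        print(name, "mean B:", round(mean(b), 2), "corr(A, B):", round(pearson(a, b), 2))
```

A per-SNP test compares only the marginal distribution of each SNP between cases and controls, which is why a design like this is invisible to a main-effect analysis.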
Fig. 3 Simulation results for additive and interaction models using grammatical evolution Bayesian Network (GEBN), grammatical evolution neural network (GENN), logistic regression, and logistic regression with the exact simulated model (MAX). The colors represent different weight indices (red = 0.9, blue = 0.5, green = 0.1). These weight indices correspond to the strength of the simulated effects. a. Main effect model: SNP A (100) b. Main effect model: SNP A (500) c. Main effect model: SNP A, B, C, D (100) d. Main effect model: SNP A, B, C, D (500) e. Interaction model: SNP A <-> B (100) f. Interaction model: SNP A <-> B (500) g. Interaction model: SNP A <-> B, C <-> D, W <-> X, Y <-> Z (100) h. Interaction model: SNP A <-> B, C <-> D, W <-> X, Y <-> Z (500)
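Comparison of AUC for GEBN and logistic regression. Numbers in parentheses in the column headers give the total number of SNPs in the dataset
| Model | Functional SNPs in case data | Functional SNPs in control data | Weight (W) | MAX (100) | Regression (100) | Regression (500) | GENN (100) | GENN (500) | GEBN (100) | GEBN (500) |
|---|---|---|---|---|---|---|---|---|---|---|
| Main effect | SNP A | SNP A | 0.1 | 55 | 52 | 51 | 54 | 54 | 53 | 52 |
| Main effect | SNP A | SNP A | 0.5 | 71 | 71 | 64 | 70 | 71 | 71 | 67 |
| Main effect | SNP A | SNP A | 0.9 | 88 | 88 | 72 | 88 | 88 | 88 | 87 |
| Main effect | SNP A, B, C, D | SNP A, B, C, D | 0.1 | 61 | 57 | 53 | 54 | 54 | 58 | 54 |
| Main effect | SNP A, B, C, D | SNP A, B, C, D | 0.5 | 90 | 89 | 72 | 73 | 73 | 89 | 87 |
| Main effect | SNP A, B, C, D | SNP A, B, C, D | 0.9 | 99 | 96 | 87 | 98 | 98 | 99 | 99 |
| Interaction effect | SNP A * SNP B | None | 0.1 | 53 | 50 | 50 | 50 | 50 | 50 | 50 |
| Interaction effect | SNP A * SNP B | None | 0.5 | 67 | 50 | 49 | 56 | 53 | 65 | 60 |
| Interaction effect | SNP A * SNP B | None | 0.9 | 89 | 50 | 50 | 60 | 59 | 80 | 77 |
| Interaction effect | SNP A * SNP B; SNP C * SNP D | SNP W * SNP X; SNP Y * SNP Z | 0.1 | 56 | 50 | 50 | 50 | 50 | 52 | 51 |
| Interaction effect | SNP A * SNP B; SNP C * SNP D | SNP W * SNP X; SNP Y * SNP Z | 0.5 | 82 | 50 | 50 | 57 | 55 | 81 | 67 |
| Interaction effect | SNP A * SNP B; SNP C * SNP D | SNP W * SNP X; SNP Y * SNP Z | 0.9 | 97 | 49 | 50 | 62 | 60 | 97 | 89 |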
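To read the table, it helps to recall what AUC measures. The values appear to be reported on a 0 to 100 scale, so 50 would correspond to chance-level discrimination, which is where logistic regression sits on the interaction models. The sketch below is a generic pairwise definition of AUC, not the evaluation code used in the study; the function name `auc` and the example scores are hypothetical.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case receives a higher
    score than a randomly chosen control, counting ties as one half.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical classifier scores, for illustration only.
    print(auc([0.9, 0.8], [0.1, 0.2]))  # perfectly separated scores
    print(auc([0.1, 0.9], [0.5]))       # one win, one loss
```

The quadratic loop is chosen for readability; a rank-based formulation gives the same value with a sort instead of all pairwise comparisons.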
Fig. 4 Testing ROC curve for type 2 diabetes. Each color represents a single cross-validation
Fig. 5 Best Bayesian Network models for cases and controls. Left panel shows network structure before BIC pruning. Right panel shows network structure after BIC pruning, and the red edges indicate interactions only found in the case data or the control data, but not both cases and controls. a. Case data network, b. Control data network
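
The caption mentions BIC pruning without describing the procedure. As a rough illustration of the idea, the sketch below scores a discrete node given a candidate parent set with the standard BIC (log-likelihood minus a complexity penalty) and treats an edge as worth keeping only if it raises that score. The function name `node_bic`, the three-state genotype coding, and the keep-if-better rule are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def node_bic(data, child, parents, n_states=3):
    """BIC of one discrete node given its parents (higher is better).

    data     : list of rows, each a list of integer states
    child    : column index of the node being scored
    parents  : list of column indices of its candidate parents
    n_states : assumed number of states per variable (0/1/2 genotypes)
    """
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    # Maximum-likelihood log-likelihood from the observed counts.
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    # Free parameters: (states - 1) per parent configuration.
    n_params = (n_states - 1) * n_states ** len(parents)
    return loglik - 0.5 * math.log(n) * n_params


if __name__ == "__main__":
    # Toy data: column 1 always equals column 0; column 2 is unrelated.
    data = [[a, a, c] for a in range(3) for c in range(3)] * 30
    print("keep 0 -> 1:", node_bic(data, 1, [0]) > node_bic(data, 1, []))
    print("keep 0 -> 2:", node_bic(data, 2, [0]) > node_bic(data, 2, []))
```

In the toy data the edge into column 1 improves the score because it explains the child perfectly, while the edge into column 2 only adds parameters and is penalized.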