Alberto Malovini, Angelo Nuzzo, Fulvia Ferrazzi, Annibale A Puca, Riccardo Bellazzi.
Abstract
BACKGROUND: Bayesian networks are powerful tools for learning genetic models from association study data. They can capture the correlation between genetic markers and phenotypic traits and, at the same time, the relationships among the markers themselves. However, learning Bayesian networks is often non-trivial because the number of variables in the model is large relative to the number of instances in the dataset. It is therefore useful to work with an abstraction of the variable space that reduces its dimensionality without losing information. In this paper we present a new strategy to achieve this goal by mapping the SNPs related to the same gene to a single meta-variable. To assign states to the meta-variables, we employ an approach based on classification trees.
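The mapping from a gene's SNPs to one meta-variable can be illustrated with a minimal sketch. It assumes a gene C represented by two SNPs, C1 and C2 (the example named in the Figure 2 caption), genotypes coded as minor-allele counts, and a hand-written two-level tree. The split rules, state names, toy data, and the zero-cell correction are all hypothetical; this record does not describe how the trees were actually grown, so the sketch only shows how tree leaves could serve as meta-variable states, each characterized by an odds ratio.

```python
from collections import Counter

def meta_state(c1, c2):
    """Hypothetical tree over genotypes coded as minor-allele counts (0, 1, 2)."""
    if c1 == 0:            # root split on SNP C1
        return "C_state_1"
    if c2 == 0:            # second split on SNP C2
        return "C_state_2"
    return "C_state_3"

def odds_ratio(a, b, c, d):
    """OR of a 2x2 table: a/b = cases/controls in the leaf, c/d = outside it."""
    if 0 in (a, b, c, d):  # Haldane-Anscombe correction (an assumption here)
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)

# Toy data invented for illustration: (C1, C2, affected)
individuals = [
    (0, 0, 0), (0, 1, 0), (0, 0, 1), (0, 2, 0),
    (1, 0, 1), (2, 0, 0), (1, 0, 1),
    (1, 1, 1), (2, 2, 1), (1, 2, 0),
]

# Each individual's two SNP genotypes collapse into one meta-variable state.
states = [meta_state(c1, c2) for c1, c2, _ in individuals]

cases = Counter(s for s, (_, _, y) in zip(states, individuals) if y == 1)
controls = Counter(s for s, (_, _, y) in zip(states, individuals) if y == 0)
n_cases, n_controls = sum(cases.values()), sum(controls.values())

# Odds ratio of each leaf (state) against all remaining individuals.
leaf_or = {
    s: odds_ratio(cases[s], controls[s],
                  n_cases - cases[s], n_controls - controls[s])
    for s in sorted(set(states))
}

for state, value in leaf_or.items():
    print(f"{state}: OR = {value:.2f}")
```

The point of the sketch is the dimensionality reduction: two three-valued SNP variables (nine genotype combinations) are replaced by one variable with three states.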
Year: 2009 PMID: 19208195 PMCID: PMC2646249 DOI: 10.1186/1471-2105-10-S2-S7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Example of a Bayesian network representing the dependencies between a Phenotype and 4 SNPs. On the left, the directed acyclic graph of the BN; on the right, the conditional probability tables associated with each node.
Figure 2. Classification tree for meta-variable state assignment. Example of a classification tree used to infer the possible states of the meta-variable associated with gene C, represented by two SNPs, C1 and C2. OR = "Odds Ratio".
Figure 3. SNP-based BN learned using the whole dataset. Bayesian network learned on the whole dataset using single SNPs as variables.
Figure 4. Meta-variable BN learned using the whole dataset. Bayesian network learned on the whole dataset using meta-variables associated with each gene.
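As the Figure 1 caption notes, a BN consists of a directed acyclic graph plus one conditional probability table (CPT) per node. The sketch below shows how those two pieces define a joint distribution and how a phenotype probability can be read off by enumeration. The structure, the binary states, and every probability are hypothetical and smaller than the networks in the figures; nothing here is taken from the learned models.

```python
from itertools import product

# Hypothetical DAG: SNP_A -> SNP_B, SNP_A -> Phenotype, SNP_B -> Phenotype.
# Nodes are listed in topological order; all variables are binary (0/1).
parents = {
    "SNP_A": (),
    "SNP_B": ("SNP_A",),
    "Phenotype": ("SNP_A", "SNP_B"),
}

# CPTs: parent-state tuple -> P(node = 1 | parents). Values are invented.
cpt = {
    "SNP_A": {(): 0.30},
    "SNP_B": {(0,): 0.20, (1,): 0.60},
    "Phenotype": {(0, 0): 0.40, (0, 1): 0.50, (1, 0): 0.55, (1, 1): 0.75},
}

def joint(assign):
    """Chain-rule factorization: product of P(node | parents) over all nodes."""
    p = 1.0
    for node, pa in parents.items():
        p_one = cpt[node][tuple(assign[x] for x in pa)]
        p *= p_one if assign[node] == 1 else 1.0 - p_one
    return p

def posterior(query, evidence):
    """P(query | evidence) by summing the joint over unobserved nodes."""
    hidden = [n for n in parents if n != query and n not in evidence]
    scores = {}
    for q in (0, 1):
        total = 0.0
        for values in product((0, 1), repeat=len(hidden)):
            assign = {**evidence, query: q, **dict(zip(hidden, values))}
            total += joint(assign)
        scores[q] = total
    z = scores[0] + scores[1]
    return {q: s / z for q, s in scores.items()}

p_affected = posterior("Phenotype", {"SNP_B": 1})[1]
print(f"P(Phenotype = 1 | SNP_B = 1) = {p_affected:.4f}")
```

Exhaustive enumeration is only practical for a handful of nodes; it is used here because it makes the role of the CPTs explicit.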
Classification performance on the test sets.
| Test set | Single-SNP BN accuracy (%) | Single-SNP BN κ | Meta-variable BN accuracy (%) | Meta-variable BN κ | Haplotype BN accuracy (%) | Haplotype BN κ | Majority classifier accuracy (%) |
|---|---|---|---|---|---|---|---|
| Sampling test 1 | 55.71 | 0.09 | 64.28 | 0.26 | 57.14 | 0.12 | 51.43 |
| Sampling test 2 | 55 | 0.07 | 59.28 | 0.16 | 53.57 | 0.04 | 51.43 |
| Sampling test 3 | 63.57 | 0.25 | 67.86 | 0.34 | 55 | 0.07 | 51.43 |
| Sampling test 4 | 62.14 | 0.22 | 65.72 | 0.29 | 49.29 | -0.04 | 51.43 |
| Sampling test 5 | 58.57 | 0.15 | 64.28 | 0.26 | 57.85 | 0.13 | 51.43 |
| 95% Confidence Interval | 54.28–63.72 | | 60.36–68.20 | | 50.34–58.80 | | |
| Standard Deviation | 3.8 | | 3.16 | | 3.4 | | |
| Standard Error | 1.7 | | 1.41 | | 1.52 | | |
The table summarizes the results of a random-sampling hold-out scheme repeated five times. In each repetition, 75% of the dataset (216 affected and 203 unaffected individuals) was used as the training set and the remaining 25% (72 affected and 68 unaffected individuals) as the test set. The table reports the test-set classification accuracies (%) of the single-SNP BN, the meta-variable BN and the haplotype BN, the accuracy of the majority classifier, and the kappa (κ) statistics.
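The summary rows of the table can be recomputed from the five per-test accuracies. The sketch below assumes that the accuracy columns follow the order given in the note (single-SNP, meta-variable, haplotype), that the standard deviation is the sample SD, and that the 95% interval is a Student's t interval with 4 degrees of freedom. The source does not state how the intervals were computed, so the t-based formula and the critical value 2.776 are assumptions.

```python
from math import sqrt
from statistics import mean, stdev

# Test-set accuracies (%) from the five sampling tests in the table above.
accuracies = {
    "single-SNP BN": [55.71, 55.0, 63.57, 62.14, 58.57],
    "meta-variable BN": [64.28, 59.28, 67.86, 65.72, 64.28],
    "haplotype BN": [57.14, 53.57, 55.0, 49.29, 57.85],
}

# Two-sided 95% critical value of Student's t with 4 degrees of freedom.
# Assumption: the interval method is not stated in the source.
T_CRIT_DF4 = 2.776

def summarize(values, t_crit=T_CRIT_DF4):
    """Mean, sample SD, standard error, and t-based interval for one model."""
    m = mean(values)
    sd = stdev(values)              # sample SD (divides by n - 1)
    se = sd / sqrt(len(values))     # standard error of the mean
    return {"mean": m, "sd": sd, "se": se,
            "ci": (m - t_crit * se, m + t_crit * se)}

summary = {name: summarize(vals) for name, vals in accuracies.items()}

# Majority classifier: always predict the larger test-set class (72 of 140).
majority_accuracy = 100 * 72 / (72 + 68)

for name, s in summary.items():
    low, high = s["ci"]
    print(f"{name}: mean={s['mean']:.2f} SD={s['sd']:.2f} "
          f"SE={s['se']:.2f} 95% CI={low:.2f}-{high:.2f}")
print(f"majority classifier: {majority_accuracy:.2f}")
```

Under these assumptions the recomputed SD, SE, and interval bounds agree with the table's summary rows to within rounding, and the majority-classifier accuracy matches the constant 51.43% column.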
Figure 5. Haplotype-based BN learned using the whole dataset. Bayesian network learned on the whole dataset using haplotypes as variables.