Alberto Malovini, Nicola Barbarini, Riccardo Bellazzi, Francesca de Michelis.
Abstract
BACKGROUND: Genome-Wide Association Studies (GWAS) are powerful approaches for disentangling the genetic and molecular mechanisms underlying complex traits. The usual "one-SNP-at-a-time" testing strategy cannot capture the multi-factorial nature of these disorders. We propose a Hierarchical Naïve Bayes classification model that takes into account associations in SNP data characterized by Linkage Disequilibrium. Validation shows that our model reaches classification performance superior to that of the standard Naïve Bayes classifier on simulated and real datasets.
Year: 2012 PMID: 23095471 PMCID: PMC3439732 DOI: 10.1186/1471-2105-13-S14-S6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Graphical representation of a genome region.
Figure 2. The hierarchical structure of the data represented with plates notation.
Figure 3. The hierarchical structure of the data represented with the plates notation using SNPs data.
Characteristics of the simulated datasets.
| sim | GRR | B (LD thr. r2 ≥ 0.6) | SNPs/B (LD thr. r2 ≥ 0.6) | r2 (LD thr. r2 ≥ 0.6) | B (LD thr. r2 ≥ 0.8) | SNPs/B (LD thr. r2 ≥ 0.8) | r2 (LD thr. r2 ≥ 0.8) |
|---|---|---|---|---|---|---|---|
| 1 | 1.5/3.0 | 43 | 5.0 [6.50] | 0.94 [0.13] | 63 | 3 [5.50] | 0.97 [0.06] |
| 2 | 1.5/3.0 | 36 | 6.5 [11.50] | 0.95 [0.08] | 55 | 4 [6.00] | 0.98 [0.05] |
| 3 | 1.5/3.0 | 58 | 3.5 [5.00] | 0.97 [0.07] | 76 | 3 [3.00] | 0.98 [0.06] |
| 4 | 2.0/4.0 | 24 | 8.5 [29.50] | 0.97 [0.07] | 67 | 4 [4.00] | 0.98 [0.06] |
| 5 | 2.0/4.0 | 34 | 4.5 [14.00] | 0.95 [0.17] | 61 | 3 [6.00] | 0.98 [0.09] |
| 6 | 2.0/4.0 | 39 | 5.0 [6.50] | 0.97 [0.19] | 70 | 4 [3.75] | 0.99 [0.05] |
| 7 | 3.0/6.0 | 22 | 9.0 [28.25] | 0.96 [0.07] | 49 | 5 [6.00] | 0.98 [0.07] |
| 8 | 3.0/6.0 | 45 | 5.0 [10.00] | 0.98 [0.10] | 80 | 3 [3.00] | 0.98 [0.06] |
| 9 | 3.0/6.0 | 34 | 8.5 [14.50] | 0.93 [0.11] | 72 | 3 [4.00] | 0.96 [0.09] |
GRR, heterozygote/homozygote Genotype Relative Risk; B, number of blocks; SNPs/B, median [interquartile range (IQR)] number of SNPs within each block; r2, median [IQR] pairwise r2 within each block. The parameters are reported for blocks defined using LD thresholds of r2 ≥ 0.6 and r2 ≥ 0.8, respectively.
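The block summaries in the table above (median and IQR of pairwise r2 within a block) can be illustrated with a small sketch. It assumes genotypes coded as minor-allele counts (0, 1, 2) and approximates LD r2 by the squared Pearson correlation between genotype vectors, a common shortcut for unphased data. The function names and the toy block are hypothetical and are not taken from the paper, whose exact r2 estimator and quartile convention are not stated here.

```python
from itertools import combinations
from statistics import median, quantiles


def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as no LD
    return (sxy * sxy) / (sxx * syy)


def block_pairwise_r2(block):
    """All pairwise r2 values among the SNPs (rows) of one block."""
    return [pearson_r2(a, b) for a, b in combinations(block, 2)]


def median_iqr(values):
    """Return (median, IQR); quartiles use Python's default 'exclusive' method."""
    if len(values) < 2:
        return values[0], 0.0  # a single value has no spread
    q1, _, q3 = quantiles(values, n=4)
    return median(values), q3 - q1


# Hypothetical block of three SNPs genotyped in six individuals
toy_block = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 1, 2],
]
toy_r2 = block_pairwise_r2(toy_block)
toy_summary = median_iqr(toy_r2)
```

Other quartile conventions (for example, the default in R) can give slightly different IQR values on small blocks, so the numbers from this sketch need not match the table exactly.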
Results from the analysis of simulated datasets
| sim | GRR | LD thr. | Model | CA (10 Folds CV) | AUC (10 Folds CV) | CA (Independent Test) | AUC (Independent Test) |
|---|---|---|---|---|---|---|---|
| 1 | 1.5/3.0 | r2 ≥ 0.6 | HNB | 0.85 [0.81-0.87] | 0.92 [0.91-0.95] | 0.64 | 0.66 |
| 1 | 1.5/3.0 | r2 ≥ 0.6 | NB | 0.80 [0.78-0.82] | 0.90 [0.89-0.90] | 0.69 | 0.70 |
| 1 | 1.5/3.0 | r2 ≥ 0.8 | HNB | 0.85 [0.81-0.89] | 0.93 [0.90-0.95] | 0.63 | 0.68 |
| 1 | 1.5/3.0 | r2 ≥ 0.8 | NB | 0.80 [0.78-0.82] | 0.90 [0.89-0.90] | 0.69 | 0.70 |
| 2 | 1.5/3.0 | r2 ≥ 0.6 | HNB | 0.87 [0.83-0.93] | 0.94 [0.89-0.98] | 0.63 | 0.68 |
| 2 | 1.5/3.0 | r2 ≥ 0.6 | NB | 0.83 [0.80-0.83] | 0.87 [0.84-0.90] | 0.59 | 0.63 |
| 2 | 1.5/3.0 | r2 ≥ 0.8 | HNB | 0.85 [0.80-0.87] | 0.92 [0.88-0.94] | 0.65 | 0.70 |
| 2 | 1.5/3.0 | r2 ≥ 0.8 | NB | 0.83 [0.80-0.83] | 0.87 [0.84-0.90] | 0.59 | 0.63 |
| 3 | 1.5/3.0 | r2 ≥ 0.6 | HNB | 0.73 [0.70-0.77] | 0.82 [0.76-0.85] | 0.65 | 0.72 |
| 3 | 1.5/3.0 | r2 ≥ 0.6 | NB | 0.78 [0.69-0.80] | 0.86 [0.77-0.94] | 0.68 | 0.75 |
| 3 | 1.5/3.0 | r2 ≥ 0.8 | HNB | 0.77 [0.70-0.80] | 0.85 [0.80-0.88] | 0.71 | 0.75 |
| 3 | 1.5/3.0 | r2 ≥ 0.8 | NB | 0.78 [0.69-0.80] | 0.86 [0.77-0.94] | 0.68 | 0.75 |
| 4 | 2.0/4.0 | r2 ≥ 0.6 | HNB | 0.78 [0.72-0.84] | 0.85 [0.80-0.89] | 0.74 | 0.80 |
| 4 | 2.0/4.0 | r2 ≥ 0.6 | NB | 0.72 [0.64-0.81] | 0.76 [0.72-0.86] | 0.71 | 0.75 |
| 4 | 2.0/4.0 | r2 ≥ 0.8 | HNB | 0.72 [0.64-0.81] | 0.77 [0.71-0.88] | 0.70 | 0.76 |
| 4 | 2.0/4.0 | r2 ≥ 0.8 | NB | 0.72 [0.64-0.81] | 0.76 [0.72-0.86] | 0.71 | 0.75 |
| 5 | 2.0/4.0 | r2 ≥ 0.6 | HNB | 0.82 [0.77-0.83] | 0.89 [0.83-0.92] | 0.73 | 0.80 |
| 5 | 2.0/4.0 | r2 ≥ 0.6 | NB | 0.78 [0.73-0.80] | 0.84 [0.77-0.85] | 0.76 | 0.83 |
| 5 | 2.0/4.0 | r2 ≥ 0.8 | HNB | 0.82 [0.78-0.83] | 0.88 [0.84-0.90] | 0.76 | 0.86 |
| 5 | 2.0/4.0 | r2 ≥ 0.8 | NB | 0.78 [0.73-0.80] | 0.84 [0.77-0.85] | 0.76 | 0.83 |
| 6 | 2.0/4.0 | r2 ≥ 0.6 | HNB | 0.77 [0.73-0.80] | 0.85 [0.83-0.87] | 0.71 | 0.79 |
| 6 | 2.0/4.0 | r2 ≥ 0.6 | NB | 0.75 [0.68-0.77] | 0.80 [0.76-0.82] | 0.66 | 0.71 |
| 6 | 2.0/4.0 | r2 ≥ 0.8 | HNB | 0.73 [0.67-0.77] | 0.80 [0.79-0.82] | 0.65 | 0.72 |
| 6 | 2.0/4.0 | r2 ≥ 0.8 | NB | 0.75 [0.68-0.77] | 0.79 [0.76-0.82] | 0.66 | 0.71 |
| 7 | 3.0/6.0 | r2 ≥ 0.6 | HNB | 0.83 [0.81-0.83] | 0.91 [0.87-0.93] | 0.76 | 0.84 |
| 7 | 3.0/6.0 | r2 ≥ 0.6 | NB | 0.80 [0.77-0.83] | 0.85 [0.83-0.88] | 0.81 | 0.87 |
| 7 | 3.0/6.0 | r2 ≥ 0.8 | HNB | 0.83 [0.80-0.86] | 0.94 [0.93-0.94] | 0.82 | 0.91 |
| 7 | 3.0/6.0 | r2 ≥ 0.8 | NB | 0.80 [0.77-0.83] | 0.85 [0.83-0.88] | 0.81 | 0.87 |
| 8 | 3.0/6.0 | r2 ≥ 0.6 | HNB | 0.83 [0.78-0.87] | 0.91 [0.89-0.94] | 0.78 | 0.83 |
| 8 | 3.0/6.0 | r2 ≥ 0.6 | NB | 0.82 [0.80-0.86] | 0.87 [0.82-0.94] | 0.81 | 0.85 |
| 8 | 3.0/6.0 | r2 ≥ 0.8 | HNB | 0.82 [0.77-0.86] | 0.90 [0.85-0.94] | 0.78 | 0.86 |
| 8 | 3.0/6.0 | r2 ≥ 0.8 | NB | 0.82 [0.78-0.86] | 0.87 [0.82-0.94] | 0.81 | 0.85 |
| 9 | 3.0/6.0 | r2 ≥ 0.6 | HNB | 0.92 [0.87-0.93] | 0.96 [0.94-0.98] | 0.86 | 0.92 |
| 9 | 3.0/6.0 | r2 ≥ 0.6 | NB | 0.83 [0.83-0.87] | 0.92 [0.92-0.95] | 0.84 | 0.86 |
| 9 | 3.0/6.0 | r2 ≥ 0.8 | HNB | 0.87 [0.87-0.92] | 0.96 [0.93-0.97] | 0.89 | 0.92 |
| 9 | 3.0/6.0 | r2 ≥ 0.8 | NB | 0.83 [0.83-0.87] | 0.92 [0.92-0.95] | 0.84 | 0.86 |
CA, median Classification Accuracy with the 25th-75th percentiles of its distribution; AUC, median Area Under the Curve with the 25th-75th percentiles of its distribution. The percentile ranges are reported only for results from 10 Folds CV.
Majority Classifier CA and AUC for 10 Folds CV and Independent test sets: 0.50.
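As a rough illustration of how the reported quantities relate, the sketch below computes CA, AUC, and a median with 25th-75th percentiles across folds. It assumes binary labels (1 = case, 0 = control) and balanced classes, which a Majority Classifier CA of 0.50 suggests. The function names, toy labels, and fold values are hypothetical and are not the authors' code or data.

```python
from statistics import median, quantiles


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(fold_values):
    """Median and (25th, 75th) percentiles across cross-validation folds."""
    q1, _, q3 = quantiles(fold_values, n=4)
    return median(fold_values), (q1, q3)


# Balanced toy labels: a constant prediction yields CA = AUC = 0.50
y = [1, 1, 1, 0, 0, 0]
majority_ca = accuracy(y, [1] * len(y))
majority_auc = auc(y, [0.5] * len(y))

# Hypothetical per-fold accuracies, summarized as "median [25%-75%]"
fold_ca = [0.70, 0.75, 0.80, 0.85, 0.90]
fold_summary = summarize(fold_ca)
```

The pairwise AUC above is quadratic in the number of samples, which is fine for a sketch but slow on full GWAS cohorts; a rank-based computation would scale better.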
Results obtained on the T1D and T2D datasets
| Study | Model | CA (10 Folds CV) | AUC (10 Folds CV) | CA (Independent Test) | AUC (Independent Test) |
|---|---|---|---|---|---|
| T1D | HNB | 0.70 [0.67-0.73] | 0.80 [0.78-0.82] | 0.71 | 0.79 |
| T1D | NB | 0.70 [0.67-0.72] | 0.79 [0.76-0.81] | 0.68 | 0.78 |
| T2D | HNB | 0.83 [0.81-0.85] | 0.92 [0.89-0.93] | 0.57 | 0.57 |
| T2D | NB | 0.81 [0.80-0.84] | 0.90 [0.89-0.92] | 0.55 | 0.56 |
CA, median Classification Accuracy with the 25th-75th percentiles of its distribution; AUC, median Area Under the Curve with the 25th-75th percentiles of its distribution. The percentile ranges are reported only for results from 10 Folds CV. Results are reported for blocks defined using an LD threshold of r2 ≥ 0.8.
Majority Classifier CA and AUC for 10 Folds CV and Independent test sets: 0.50.