Alireza Nazarian, Heike Sichtig, Alberto Riva.
Abstract
Complex disorders are a class of diseases whose phenotypic variance is caused by the interplay of multiple genetic and environmental factors. Analyzing the complexity underlying the genetic architecture of such traits may help develop more efficient diagnostic tests and therapeutic protocols. Despite continuous advances in revealing the genetic basis of many complex diseases using genome-wide association studies (GWAS), a major proportion of their genetic variance remains unexplained, in part because GWAS are unable to reliably detect small individual risk contributions or to capture the underlying genetic heterogeneity. In this paper we describe a hypothesis-based method to analyze the association between multiple genetic factors and a complex phenotype. Starting from sets of markers selected based on preexisting biomedical knowledge, our method generates multi-marker models relevant to the biological process underlying a complex trait for which genotype data are available. We tested the applicability of our method using the WTCCC case-control dataset. Analyzing a number of biological pathways, the method was able to identify several immune-system-related multi-SNP models significantly associated with Rheumatoid Arthritis (RA) and Crohn's disease (CD). RA-associated multi-SNP models were also replicated in an independent case-control dataset. The method we present provides a framework for capturing joint contributions of genetic factors to complex traits. In contrast to hypothesis-free approaches, its results can be given a direct biological interpretation. The replicated multi-SNP models generated by our analysis may serve as a predictor to estimate the risk of RA development in individuals of Caucasian ancestry.
Year: 2012 PMID: 22970175 PMCID: PMC3435396 DOI: 10.1371/journal.pone.0044162
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. A flow chart illustrating the steps applied by the KBAS method.
The list of pathways included in this study and their characteristics.
| Pathway | KEGG ID | Number of Genes | Number of Transcripts | Number of Validated SNPs | Number of Validated SNPs in WTCCC Dataset |
|---|---|---|---|---|---|
| Test-set | | | | | |
| | map04612 | 69 | 111 | 7,950 | 148 |
| | map04662 | 75 | 141 | 30,791 | 1,054 |
| | map04062 | 190 | 314 | 71,371 | 2,576 |
| | map04610 | 69 | 137 | 15,415 | 673 |
| | map04060 | 265 | 453 | 47,551 | 1,811 |
| | map04664 | 79 | 131 | 34,966 | 1,247 |
| | map04666 | 95 | 182 | 48,187 | 1,842 |
| | map04672 | 46 | 61 | 7,538 | 132 |
| | map04670 | 116 | 215 | 53,296 | 1,930 |
| | map04650 | 132 | 236 | 37,095 | 1,262 |
| | map04145 | 151 | 261 | 32,421 | 1,029 |
| | map04140 | 34 | 45 | 5,342 | 160 |
| | map04660 | 108 | 205 | 38,035 | 1,334 |
| | map04620 | 102 | 172 | 19,755 | 585 |
| Control-set | | | | | |
| | map04260 | 73 | 145 | 33,043 | 1,430 |
| | map04540 | 90 | 148 | 57,854 | 2,257 |
| | map00010 | 64 | 116 | 10,570 | 392 |
| | map04910 | 137 | 245 | 44,633 | 1,289 |
| | map03420 | 44 | 63 | 10,096 | 319 |
| | map00190 | 116 | 175 | 15,100 | 478 |
| | map00230 | 161 | 296 | 80,663 | 2,845 |
| | map00240 | 99 | 160 | 28,338 | 724 |
| | map04614 | 17 | 29 | 3,720 | 125 |
| | map03040 | 126 | 180 | 14,922 | 474 |
The test-set contains immune-system-related pathways selected based on preexisting knowledge about the pathogenesis of the diseases under investigation. The control-set contains pathways that, based on preexisting knowledge, are unlikely to be relevant to the pathogenesis of these diseases.
The p-values associated with the pairwise comparisons of the RA group and the two control groups using the successful models derived from immune system related pathways.
| Pathway | 58C: Fitness | RA: Fitness | RA: Randomization-test | RA: Fitness | RA: Randomization-test |
|---|---|---|---|---|---|
| | 0.00589 | | | | |
| | 0.04550 | | | | |
| | >0.05 | | | | |
| | >0.05 | | | | |
| | 0.04981 | | | | |
| | 0.03246 | 6.97×10 | >0.05 | 2.47×10 | >0.05 |
| | 0.04356 | 1.91×10 | >0.05 | 1.43×10 | 0.01143 |
| | 0.00408 | | | | |
| | >0.05 | | | | |
| | 0.00437 | | | | |
| | >0.05 | | | | |
| | >0.05 | 0.00941 | >0.05 | 0.01714 | >0.05 |
| | 0.04116 | | | | |
| | >0.05 | | | | |
The fitness p-values measure the fitness of each successful model retrieved by the Genetic Algorithm engine. They are calculated by comparing the original case and control datasets using the corresponding successful models. Randomization-test p-values measure the significance of the fitness p-value of the corresponding successful model by comparing permuted case and control datasets. According to Bonferroni's correction, a fitness p-value <6.944×10⁻⁹ and a randomization-test p-value <0.00104 were considered significant. The p-values of the models showing strong or moderate association with rheumatoid arthritis are in bold.
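The idea behind a randomization test can be sketched with a generic label-permutation test. This is a minimal illustration and not the authors' procedure: the per-individual score lists, the difference-in-means statistic, and the function name `permutation_p_value` are all assumptions introduced here, and the published method may instead re-derive models on each permuted dataset.

```python
import random


def permutation_p_value(case_scores, control_scores, n_perm=2000, seed=0):
    """One-sided label-permutation p-value for the difference in mean score.

    The statistic (mean of cases minus mean of controls) is an assumption
    made for illustration; it is not taken from the paper.
    """
    rng = random.Random(seed)
    n_case = len(case_scores)
    pooled = list(case_scores) + list(control_scores)

    def mean_diff(values):
        cases, controls = values[:n_case], values[n_case:]
        return sum(cases) / len(cases) - sum(controls) / len(controls)

    observed = mean_diff(pooled)
    shuffled = pooled[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign case/control labels at random
        if mean_diff(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical scores, for demonstration only.
    cases = [3.0, 3.2, 2.9, 3.1, 3.3, 3.0, 2.8, 3.4]
    controls = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 0.8, 1.4]
    print(permutation_p_value(cases, controls, n_perm=999))
```

The resulting p-value would then be compared against the Bonferroni-corrected cutoff rather than against 0.05 directly.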
The p-values associated with the pairwise comparisons of the RA group and the two control groups using successful models derived from negative control pathways.
| Pathway | 58C: Fitness | RA: Fitness | RA: Randomization-test | RA: Fitness | RA: Randomization-test |
|---|---|---|---|---|---|
| | >0.05 | 0.01392 | >0.05 | 0.01158 | >0.05 |
| | >0.05 | 0.00051 | >0.05 | 0.00202 | >0.05 |
| | >0.05 | 0.0007 | >0.05 | 0.00288 | >0.05 |
| | >0.05 | 4.30×10 | >0.05 | 0.000533 | >0.05 |
| | >0.05 | 0.00144 | >0.05 | 0.00084 | >0.05 |
| | 0.02171 | 9.12×10 | 0.04476 | 1.26×10 | >0.05 |
| | >0.05 | >0.05 | >0.05 | >0.05 | >0.05 |
| | >0.05 | 0.00782 | >0.05 | 0.03497 | >0.05 |
| | >0.05 | 0.00273 | >0.05 | 0.02638 | >0.05 |
| | 0.02692 | 4.47×10 | >0.05 | 0.00024 | >0.05 |
The p-values associated with the comparison of the RA and CTR groups and NARAC-A and NARAC-C using the successful models derived from pathways under consideration.
| Pathway | RA vs. CTR: Fitness | RA vs. CTR: Randomization-test | NARAC-A vs. NARAC-C: Fitness | NARAC-A vs. NARAC-C: Randomization-test |
|---|---|---|---|---|
| Test-set | | | | |
| | | | | |
| | | | 4.54×10 | >0.05 |
| | | | | |
| | | | | |
| | | | 6.34×10 | >0.05 |
| | 2.73×10 | >0.05 | 7.52×10 | >0.05 |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | 0.00532 | >0.05 |
| | 0.00172 | >0.05 | 9.99×10 | >0.05 |
| | | | | |
| | | | 1.96×10 | >0.05 |
| Control-set | | | | |
| | 0.00405 | >0.05 | 4.44×10 | >0.05 |
| | 7.72×10 | 0.04995 | 0.01804 | >0.05 |
| | 0.00025 | >0.05 | 3.51×10 | >0.05 |
| | 2.37×10 | 0.00860 | 0.00186 | >0.05 |
| | 7.01×10 | >0.05 | 0.01216 | >0.05 |
| | 1.72×10 | 0.00460 | 2.77×10 | >0.05 |
| | 0.04224 | >0.05 | >0.05 | >0.05 |
| | 0.00371 | >0.05 | 0.00034 | >0.05 |
| | 0.00216 | >0.05 | 0.00248 | >0.05 |
| | 1.09×10 | 0.02772 | 5.78×10 | >0.05 |
According to Bonferroni's correction, a fitness p-value <6.944×10⁻⁹ and a randomization-test p-value <0.00208 were considered significant. The p-values of the models showing significant association with rheumatoid arthritis in the RA vs. CTR comparison are in bold. Of these 12 pathways, five were replicated in the NARAC dataset at the significance level of 0.00208 and three were replicated at the significance level of 0.05. The p-values of these replicated models are also in bold.
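The Bonferroni cutoffs quoted in the table notes can be checked with simple arithmetic. The sketch below assumes a family-wise level of 0.05 and picks divisors only because they reproduce the quoted numbers: 24 matches the number of KEGG pathways listed in this study, 48 is twice that, and 7.2 million is a purely hypothetical count that yields 6.944×10⁻⁹ when the fitness cutoff's exponent is read as −9. None of these divisors is stated in this excerpt.

```python
ALPHA = 0.05  # assumed family-wise error rate; not stated in this excerpt


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance cutoff under Bonferroni's correction."""
    return alpha / n_tests


# Hypothetical divisors, chosen because they reproduce the quoted cutoffs.
candidate_divisors = {
    "24 tests": 24,
    "48 tests": 48,
    "7.2 million tests": 7_200_000,
}

thresholds = {
    label: bonferroni_threshold(ALPHA, n) for label, n in candidate_divisors.items()
}

if __name__ == "__main__":
    for label, cutoff in thresholds.items():
        print(f"{label}: {cutoff:.4g}")
```

Under these assumptions the three results agree with 0.00208, 0.00104 and 6.944×10⁻⁹ to the precision given in the notes.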
Multivariate regression of disease-state on the score variables derived from the successful models showing strong or moderate association with rheumatoid arthritis (comparing RA vs. CTR).
| Test of Overall Model | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Test | Chi-square | df | | | | | | |
| | 529.5161 | 12 | <0.0001 | | | | | |
| | 497.9300 | 12 | <0.0001 | | | | | |
| | 451.4576 | 12 | <0.0001 | | | | | |
| | | | | | | | | |
| | -0.3790 | 0.0312 | 147.7942 | 1 | <0.0001 | - | - | - |
| | 0.5020 | 0.1304 | 14.8224 | 1 | 0.0001 | 1.652 | 1.279 | 2.133 |
| | 0.6809 | 0.1516 | 20.1816 | 1 | <0.0001 | 1.976 | 1.468 | 2.659 |
| | 0.8802 | 0.0990 | 79.0971 | 1 | <0.0001 | 2.411 | 1.986 | 2.928 |
| | 0.7601 | 0.0984 | 59.6341 | 1 | <0.0001 | 2.139 | 1.763 | 2.594 |
| | 1.0191 | 0.1973 | 26.6887 | 1 | <0.0001 | 2.771 | 1.882 | 4.078 |
| | 0.7184 | 0.1224 | 34.4668 | 1 | <0.0001 | 2.051 | 1.614 | 2.607 |
| | 0.6218 | 0.1108 | 31.4872 | 1 | <0.0001 | 1.862 | 1.499 | 2.314 |
| | 1.0506 | 0.1575 | 44.5086 | 1 | <0.0001 | 2.859 | 2.100 | 3.893 |
| | 0.5614 | 0.1840 | 9.3069 | 1 | 0.0023 | 1.753 | 1.222 | 2.514 |
| | 0.5519 | 0.1274 | 18.7555 | 1 | <0.0001 | 1.737 | 1.353 | 2.229 |
| | -1.3250 | 0.3206 | 17.0822 | 1 | <0.0001 | 0.266 | 0.142 | 0.498 |
| | -1.7018 | 0.5694 | 8.9334 | 1 | 0.0028 | 0.182 | 0.060 | 0.557 |
| | | | | | | | | |
| | 6.9178 | 8 | 0.5455 | | | | | |
Pathway 1: B-cell Receptor Signaling Pathway.
Pathway 2: Chemokine Signaling Pathway.
Pathway 3: Complement and Coagulation Cascades Pathway.
Pathway 4: Cytokine-Cytokine Receptor Interaction Pathway.
Pathway 5: Fc Gamma R-mediated Phagocytosis Pathway.
Pathway 6: Intestinal Immune Network for IgA Production Pathway.
Pathway 7: Leukocyte Trans-endothelial Migration Pathway.
Pathway 8: Natural Killer Cell Mediated Cytotoxicity Pathway.
Pathway 9: T-cell Receptor Signaling Pathway.
Pathway 10: Toll-like Receptor Signaling Pathway.
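The eight numeric columns of the regression table's coefficient rows lost their headers in this copy. Their values appear consistent with the usual layout of a logistic-regression summary: estimate, standard error, Wald chi-square, degrees of freedom, p-value, odds ratio, and lower and upper 95% confidence limits. That reading is an assumption, and the sketch below only checks it numerically for one row (0.5020, 0.1304); the function name `wald_summary` is introduced here for illustration.

```python
import math


def wald_summary(estimate, std_error, z=1.959964):
    """Derive Wald statistics from a logistic-regression coefficient.

    Assumes the coefficient is a log odds ratio and that the interval is a
    two-sided 95% Wald interval (z is the 97.5% normal quantile).
    """
    wald_chi2 = (estimate / std_error) ** 2
    # Upper tail of a chi-square with 1 df: P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(wald_chi2 / 2))
    odds_ratio = math.exp(estimate)
    lower = math.exp(estimate - z * std_error)
    upper = math.exp(estimate + z * std_error)
    return wald_chi2, p_value, odds_ratio, lower, upper


if __name__ == "__main__":
    chi2, p, oratio, lo, hi = wald_summary(0.5020, 0.1304)
    print(f"chi2={chi2:.4f} p={p:.4f} OR={oratio:.3f} CI=({lo:.3f}, {hi:.3f})")
```

Small differences from the tabulated chi-square are expected, because the table's estimate and standard error are themselves rounded to four decimals.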
Figure 2. Disease risk-Score class diagram for RA vs. CTR and NARAC-A vs. NARAC-C comparisons.
For each comparison, the overall score variable derived from the entire set of 44 SNPs present in the eight replicated RA-associated models was discretized into 12 bins, and for each bin the posterior probability of being affected by the disease was calculated using Bayes' formula.
Figure 3. Disease risk-Score class diagram for CD vs. CTR.
The overall score variable derived from the entire set of 57 SNPs present in the nine significant CD-associated models was discretized into 12 bins, and for each bin the posterior probability of being affected by the disease was calculated using Bayes' formula.
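The per-bin posterior described in the figure legends can be illustrated as follows. This is a sketch under stated assumptions and not the authors' code: equal-width bins, a user-supplied prior (for example a disease prevalence), and the function name `posterior_by_bin` are all choices made here, since the legends do not specify how the bins or the prior were defined.

```python
def posterior_by_bin(case_scores, control_scores, prior, n_bins=12):
    """Posterior P(disease | score bin) via Bayes' formula.

    Bins are equal-width over the pooled score range (an assumption).
    Returns one value per bin, or None for bins with no observations.
    """
    lo = min(min(case_scores), min(control_scores))
    hi = max(max(case_scores), max(control_scores))
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # all scores identical: everything falls in bin 0

    def bin_index(score):
        # The maximum score is folded into the last bin.
        return min(int((score - lo) / width), n_bins - 1)

    case_counts = [0] * n_bins
    control_counts = [0] * n_bins
    for s in case_scores:
        case_counts[bin_index(s)] += 1
    for s in control_scores:
        control_counts[bin_index(s)] += 1

    posteriors = []
    for k in range(n_bins):
        like_case = case_counts[k] / len(case_scores)        # P(bin | disease)
        like_ctrl = control_counts[k] / len(control_scores)  # P(bin | no disease)
        evidence = like_case * prior + like_ctrl * (1 - prior)
        posteriors.append(like_case * prior / evidence if evidence > 0 else None)
    return posteriors


if __name__ == "__main__":
    # Hypothetical scores and prior, for demonstration only.
    print(posterior_by_bin([0, 12], [0, 0, 0, 12], prior=0.2))
```

Because the likelihoods are estimated from case and control samples separately, the prior has to come from outside the case-control data; a case-control design fixes the sample ratio and therefore does not estimate prevalence by itself.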