Hirofumi Nakaoka, Tailin Cui, Atsushi Tajima, Akira Oka, Shigeki Mitsunaga, Koichi Kashiwase, Yasuhiko Homma, Shinji Sato, Yasuo Suzuki, Hidetoshi Inoko, Ituro Inoue.
Abstract
Genome-wide association studies (GWAS) have yielded novel genetic loci underlying common diseases. We propose a systems genetics approach that uses these discoveries to better understand the genetic architecture of rheumatoid arthritis (RA). Current evidence of genetic associations with RA was sought through PubMed and the NHGRI GWAS catalog. The associations of 15 single nucleotide polymorphisms (SNPs) and HLA-DRB1 alleles were examined in 1,287 cases and 1,500 controls of Japanese subjects. Among these, HLA-DRB1 alleles and eight SNPs showed significant associations, and all but one of the variants had the same direction of effect as identified in the previous studies, indicating that the genetic risk factors underlying RA are shared across populations. By receiver operating characteristic curve analysis, the area under the curve (AUC) for the genetic risk score based on the selected variants was 68.4%. For seropositive RA patients only, the AUC improved to 70.9%, indicating good but suboptimal predictive ability. A simulation study shows that more than 200 additional loci with effect sizes similar to recent GWAS findings, or 20 rare variants with intermediate effects, are needed to achieve AUC = 80.0%. We applied the random walk with restart (RWR) algorithm to prioritize genes for future mapping studies. The performance of the algorithm was confirmed by leave-one-out cross-validation. The RWR algorithm ranked ZAP70 first, a gene in which a mutation causes RA-like autoimmune arthritis in mice.
By applying hierarchical clustering to a subnetwork comprising RA-associated genes and the genes ranked highest by the RWR algorithm, we found three functional modules relevant to RA etiology: "leukocyte activation and differentiation", "pattern-recognition receptor signaling pathway", and "chemokines and their receptors". These results suggest that the systems genetics approach is useful for guiding future mapping strategies and for illuminating biological pathways.
Year: 2011 PMID: 21980439 PMCID: PMC3182219 DOI: 10.1371/journal.pone.0025389
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. The systems genetics approach proposed in this study.
A) Databases from which knowledge is extracted. Meta-analyses and GWAS findings are sought in PubMed and the NHGRI GWAS catalog, respectively. Human protein-protein interaction data are obtained from HPRD. B) Retrieved information is used to create two types of networks: a ‘gene-disease association network’ and a ‘protein-protein interaction network’. C) The data analysis phase. The gene-disease associations are confirmed using real case-control subjects. The predictive ability of the selected genetic variants is evaluated, and the result is used in the simulation study to infer the allelic architecture of as-yet-undiscovered genetic variants. The two types of networks are integrated to prioritize genes by a global measure of distance to known disease-associated genes within the protein-protein interaction network. A hierarchical clustering algorithm is applied to a subnetwork comprising the top-ranked genes, and the functional annotation of each cluster is used to infer the biological pathways underlying the disease of interest. D) The systems genetics approach yields two types of clues: future mapping strategies and biological pathways.
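The gene prioritization step in panel C can be illustrated with a minimal random walk with restart sketch. It is not the study's implementation: the toy graph, the placeholder node names (`SEED_A`, `G1`, and so on), and the restart probability of 0.3 are all assumptions made for illustration. The walker repeatedly steps to a random neighbour or jumps back to a known disease gene, and candidates are ranked by their steady-state visiting probability.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adjacency: dict mapping node -> set of neighbours (undirected, no isolated nodes).
    seeds: known disease genes; the walker restarts uniformly among them.
    restart: restart probability (0.3 is an illustrative assumption).
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart mass goes back to the seed genes.
        nxt = {n: restart * p0[n] for n in nodes}
        # Remaining mass is split evenly among each node's neighbours.
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy interaction network with two seed (known disease) genes.
toy = {
    "SEED_A": {"G1", "G2"},
    "SEED_B": {"G2"},
    "G1": {"SEED_A", "G3"},
    "G2": {"SEED_A", "SEED_B"},
    "G3": {"G1"},
}
seeds = {"SEED_A", "SEED_B"}
scores = random_walk_with_restart(toy, seeds)
ranking = sorted((n for n in scores if n not in seeds), key=scores.get, reverse=True)
```

In this toy graph the candidate that interacts with both seed genes ranks first, which is the intuition behind using a global network distance rather than direct neighbours only.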
Figure 2. Overview of association studies in RA for the 20 genetic variants examined in meta-analyses that met our inclusion criteria.
Colored bars display the number of individual studies according to the result of testing each variant for association with RA: red, studies showing significant evidence of increased risk; blue, studies showing significant evidence of disease protection; and green, studies showing a non-significant result. The significance level was set at P = 0.05.
Association analysis of RA with selected genetic variants.
| Gene | SNP | A1/A2 | Univariate OR (95% CI) | Univariate P | Multivariate OR (95% CI) | Multivariate P | Previous report OR (95% CI) |
| HLA-DRB1 | | +/− | 1.29 (1.03–1.61) | 0.025 | 1.95 (1.52–2.48) | 8.8×10⁻⁸ | 1.60 (1.39–1.84) |
| HLA-DRB1 | | +/− | 1.20 (1.04–1.39) | 0.012 | 1.74 (1.48–2.04) | 1.8×10⁻¹¹ | 1.67 (1.44–1.94) |
| HLA-DRB1 | | +/− | 2.88 (1.42–5.83) | 3.3×10⁻³ | 3.59 (1.72–7.52) | 7.0×10⁻⁴ | 2.35 (1.90–2.91) |
| HLA-DRB1 | | +/− | 1.89 (1.23–2.90) | 3.9×10⁻³ | 2.70 (1.69–4.30) | 3.0×10⁻⁵ | 3.30 (3.01–3.61) |
| HLA-DRB1 | | +/− | 1.49 (0.55–4.02) | 0.43 | 2.92 (0.98–8.67) | 0.054 | 1.85 (1.54–2.22) |
| HLA-DRB1 | | +/− | 2.31 (2.01–2.66) | 1.3×10⁻³¹ | 2.80 (2.40–3.27) | 9.4×10⁻³⁹ | 3.84 (3.30–4.46) |
| | rs3093024 | A/G | 1.25 (1.12–1.39) | 4.1×10⁻⁵ | 1.26 (1.13–1.42) | 6.3×10⁻⁵ | 1.19 (1.15–1.24) |
| | rs2240340 | T/C | 1.23 (1.11–1.37) | 1.5×10⁻⁴ | 1.24 (1.11–1.40) | 2.6×10⁻⁴ | 1.31 (1.22–1.41) |
| | rs2736340 | T/C | 1.24 (1.10–1.39) | 3.2×10⁻⁴ | 1.24 (1.09–1.41) | 7.8×10⁻⁴ | 1.19 (1.13–1.27) |
| | rs4810485 | T/G | 0.80 (0.72–0.89) | 4.7×10⁻⁴ | 0.82 (0.73–0.92) | 7.8×10⁻⁴ | 0.87 (0.83–0.90) |
| | rs26232 | T/C | 0.86 (0.77–0.98) | 0.018 | 0.86 (0.75–0.98) | 0.021 | 0.90 (0.87–0.94) |
| | rs2073838 | A/G | 1.14 (1.02–1.27) | 0.022 | 1.17 (1.04–1.32) | 0.012 | 1.11 (1.05–1.18) |
| | rs11676922 | T/A | 1.11 (1.00–1.24) | 0.043 | 1.11 (0.99–1.24) | 0.083 | 1.14 (1.10–1.18) |
| | rs7528684 | G/A | 1.11 (1.00–1.24) | 0.047 | 1.08 (0.96–1.21) | 0.20 | 1.16 (1.09–1.24) |
| | rs934734 | G/A | 1.14 (0.99–1.31) | 0.064 | 1.17 (1.00–1.36) | 0.043 | 1.13 (1.09–1.17) |
| | rs7574865 | T/G | 1.10 (0.98–1.23) | 0.093 | 1.09 (0.97–1.23) | 0.14 | 1.23 (1.19–1.27) |
| | rs3087243 | A/G | 0.92 (0.82–1.04) | 0.18 | 0.96 (0.84–1.10) | 0.56 | 0.89 (0.85–0.95) |
| | rs3761847 | A/G | 1.05 (0.95–1.17) | 0.35 | 1.03 (0.91–1.15) | 0.66 | 1.13 (1.09–1.17) |
| | rs706778 | T/C | 1.05 (0.95–1.17) | 0.36 | 1.05 (0.93–1.17) | 0.43 | 1.12 (1.09–1.16) |
| | rs10499194 | T/C | 1.18 (0.96–1.46) | 0.11 | 1.18 (0.94–1.48) | 0.15 | 0.82 (0.77–0.87) |
A1 and A2 represent the coded and non-coded alleles, respectively.
ORs and 95% CIs were estimated by logistic regression analyses using univariate analysis for each allele and then using multivariate analysis including all the alleles. The number of coded alleles (A1) was used as the predictor value in the logistic regression analyses.
ORs and 95% CIs were calculated by meta-analyses of published studies: HLA-DRB1 from [62]; CD40, SLC22A4, STAT4, CTLA4, TRAF1, TNFAIP3, and IRF5 from re-analysis of meta-analyses shown in Table S4; PADI4 and FCRL3 from re-analysis of ethnicity-specific meta-analyses shown in Table S6; and CCR6, BLK, C5orf30, AFF3, SPRED2, and IL2RA from original GWASs [69]–[71]. These ORs were used to create genetic risk scores.
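The footnote states that the previously reported ORs were used to create genetic risk scores, but the exact formula is not given here. One common construction, assumed in the sketch below, is a weighted sum of coded-allele counts with ln(OR) as the weight. The three ORs are copied from the "Previous report" column of the table; the genotype in `person` is hypothetical.

```python
import math

# Previous-report odds ratios copied from the table above (three SNPs only).
PREVIOUS_OR = {"rs3093024": 1.19, "rs2240340": 1.31, "rs4810485": 0.87}


def genetic_risk_score(allele_counts, odds_ratios=PREVIOUS_OR):
    """Sum of ln(OR) times the number of coded (A1) alleles carried: 0, 1 or 2.

    This log-OR weighting is an assumed, commonly used construction,
    not necessarily the one used in the study.
    """
    return sum(math.log(odds_ratios[snp]) * count
               for snp, count in allele_counts.items())


# Hypothetical genotype: coded-allele counts per SNP.
person = {"rs3093024": 2, "rs2240340": 1, "rs4810485": 0}
score = genetic_risk_score(person)
```

With this weighting, a coded allele whose OR is below 1 (such as rs4810485) lowers the score, so protective and risk alleles are handled by the same formula.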
The discriminative ability and the global model fit of three predictive models according to subphenotype of case patients.
| Case phenotype | Model | AUC, % (95% CI) | AIC |
| Overall | HLA | 65.9 (63.9–67.9) | 3477.7 |
| Overall | Non-HLA | 58.8 (56.6–60.9) | 3630.7 |
| Overall | Integrative | 68.4 (66.4–70.4) | 3421.9 |
| RF & anti-CCP positive | HLA | 68.3 (65.2–71.4) | 1603.7 |
| RF & anti-CCP positive | Non-HLA | 60.0 (56.7–63.3) | 1694.2 |
| RF & anti-CCP positive | Integrative | 70.9 (67.8–73.9) | 1578.0 |
AIC: Akaike's information criterion.
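The AUC values in the table can be read as the probability that a randomly chosen case has a higher risk score than a randomly chosen control, with ties counted as one half. The sketch below computes that quantity directly by comparing every case-control pair. The scores are made up for illustration and are not study data.

```python
def auc(case_scores, control_scores):
    """Fraction of case-control pairs in which the case scores higher (ties = 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Illustrative scores only.
example_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of cases from controls. Multiplying by 100 gives the percentage scale used in the table.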
Figure 3. Distribution of risk scores by phenotypic status for the integrative model, in which six HLA-DRB1 alleles and 14 SNPs were included.
The curves were generated with a Gaussian kernel density smoother.
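A Gaussian kernel density smoother places a small normal curve on each observed score and averages them. The sketch below shows the idea in plain Python. The bandwidth and the sample values are assumptions, since the caption does not state how the bandwidth was chosen.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density at x from the given samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Average of Gaussian bumps centred on each sample.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)

    return density


# Illustrative risk scores and bandwidth (both hypothetical).
density = gaussian_kde([0.0, 1.0, 2.5], bandwidth=0.5)
curve = [(x / 10.0, density(x / 10.0)) for x in range(-20, 46)]
```

Plotting one such curve for cases and one for controls gives the kind of overlapping distributions shown in the figure. A larger bandwidth yields smoother but flatter curves.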
Figure 4. Simulation study addressing how many additional loci must be mapped to establish excellent genetic risk prediction.
Five scenarios with different combinations of odds ratio (OR) and risk allele frequency (RAF) were examined.
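To give a feel for how OR and RAF jointly drive the attainable AUC, the sketch below uses a closed-form approximation instead of the authors' simulation procedure, which is not described in this caption. The assumptions are: loci are independent and in Hardy-Weinberg equilibrium, each contributes variance 2·RAF·(1−RAF)·(ln OR)² to a log-odds risk score, and the score is normally distributed among controls with total variance σ², which gives AUC = Φ(σ/√2). The OR and RAF values are placeholders, not the five scenarios of the figure.

```python
import math


def locus_variance(odds_ratio, raf):
    """Variance of the log-odds score contributed by one biallelic locus."""
    return 2.0 * raf * (1.0 - raf) * math.log(odds_ratio) ** 2


def approximate_auc(loci):
    """Approximate AUC for a list of (odds_ratio, raf) pairs.

    Uses AUC = Phi(sigma / sqrt(2)); since Phi(x) = 0.5 * (1 + erf(x / sqrt(2))),
    this simplifies to 0.5 * (1 + erf(sigma / 2)).
    """
    sigma = math.sqrt(sum(locus_variance(o, f) for o, f in loci))
    return 0.5 * (1.0 + math.erf(sigma / 2.0))


# Placeholder scenario: many common loci with a modest effect each.
few_loci = approximate_auc([(1.1, 0.3)] * 50)
many_loci = approximate_auc([(1.1, 0.3)] * 200)
```

The approximation makes the trade-off visible: a locus with a small OR adds little variance even when it is common, so many such loci are needed to move the AUC appreciably.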
Figure 5. ROC curve from leave-one-out cross-validation, used to evaluate the predictive ability of the RWR algorithm.
The gray diagonal line corresponds to an AUC of 0.5, i.e., no discrimination (random performance).
Figure 6. RA-associated network.
A) The entire RA-associated network, comprising known RA-associated genes and genes ranked in the top 100 by the RWR algorithm; edges are physical interactions between their products. Nodes are color coded by the hierarchical clusters detected by the EAGLE algorithm: CL1, red; CL2, cyan; and CL3, yellow. The overlapping region between CL1 and CL2 is rendered in green. Node size is based on the ranking in the RWR algorithm. Official gene symbols are shown for known RA-associated genes. B–D) Subnetworks corresponding to the hierarchical clusters CL1-3.
Top-ranked GO and KEGG annotations for three clusters in RA-associated network.
| Cluster | Annotation | Term | Count | % | FE | P-value |
| CL1 | GO:0045321 | Leukocyte activation | 20 | 40.0 | 23.3 | 1.4×10⁻²¹ |
| CL1 | GO:0002521 | Leukocyte differentiation | 15 | 30.0 | 32.3 | 8.0×10⁻¹⁸ |
| CL1 | hsa04660 | T cell receptor signaling pathway | 15 | 30.0 | 15.7 | 1.1×10⁻¹³ |
| CL1 | GO:0006468 | Protein amino acid phosphorylation | 19 | 38.0 | 8.0 | 2.8×10⁻¹² |
| CL2 | hsa04620 | Toll-like receptor signaling pathway | 20 | 40.0 | 18.2 | 2.0×10⁻²⁰ |
| CL2 | hsa04622 | RIG-I-like receptor signaling pathway | 14 | 28.0 | 21.7 | 7.5×10⁻¹⁵ |
| CL2 | GO:0007249 | I-kappaB kinase/NF-kappaB cascade | 12 | 24.0 | 31.6 | 4.3×10⁻¹⁴ |
| CL2 | hsa05200 | Pathways in cancer | 20 | 40.0 | 5.4 | 2.0×10⁻¹⁰ |
| CL2 | hsa04623 | Cytosolic DNA-sensing pathway | 10 | 20.0 | 22.0 | 2.0×10⁻¹⁰ |
| CL2 | hsa04621 | NOD-like receptor signaling pathway | 11 | 22.0 | 15.1 | 8.7×10⁻¹⁰ |
| CL3 | GO:0006935 | Chemotaxis | 9 | 52.9 | 47.6 | 1.9×10⁻¹² |
| CL3 | GO:0007626 | Locomotory behavior | 9 | 52.9 | 27.8 | 1.5×10⁻¹⁰ |
| CL3 | GO:0006955 | Immune response | 11 | 64.7 | 13.5 | 2.7×10⁻¹⁰ |
| CL3 | GO:0006952 | Defense response | 10 | 58.8 | 13.7 | 3.1×10⁻⁹ |
| CL3 | GO:0019957 | C-C chemokine binding | 4 | 23.5 | 231.8 | 4.4×10⁻⁷ |
| CL3 | GO:0016493 | C-C chemokine receptor activity | 4 | 23.5 | 231.8 | 4.4×10⁻⁷ |
Within each cluster, related terms are omitted to reduce redundancy: among terms with parent-child relationships, we selected the one with the most significant enrichment P-value.
Count: number of GO or KEGG category genes in each cluster.
%: percentage of GO or KEGG category genes in each cluster.
FE: fold enrichment of genes in each cluster compared to a background list.
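Fold enrichment compares the fraction of cluster genes carrying an annotation with the fraction in the background list. The sketch below computes it together with a hypergeometric upper-tail probability, a standard enrichment test that is assumed here and may differ from the statistic behind the table's P-values. The cluster figures (20 annotated genes out of 50) follow from the first row of the table. The background counts are hypothetical, because the background list is not given.

```python
from math import comb


def fold_enrichment(hits, cluster_size, background_hits, background_size):
    """Ratio of the annotated fraction in the cluster to that in the background."""
    return (hits / cluster_size) / (background_hits / background_size)


def hypergeom_upper_tail(hits, cluster_size, background_hits, background_size):
    """P(at least `hits` annotated genes) when drawing the cluster at random."""
    total = comb(background_size, cluster_size)
    upper = min(cluster_size, background_hits)
    favourable = sum(
        comb(background_hits, i)
        * comb(background_size - background_hits, cluster_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total


# 20 of 50 cluster genes annotated; background of 200 in 10,000 is hypothetical.
fe = fold_enrichment(20, 50, 200, 10_000)
p_value = hypergeom_upper_tail(20, 50, 200, 10_000)
```

Because the fold enrichment divides by the background fraction, rare annotations such as the C-C chemokine terms can reach very large FE values with only a handful of genes.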