Angela P Presson, Eric M Sobel, Jeanette C Papp, Charlyn J Suarez, Toni Whistler, Mangalathu S Rajeevan, Suzanne D Vernon, Steve Horvath.
Abstract
BACKGROUND: Systems biologic approaches such as Weighted Gene Co-expression Network Analysis (WGCNA) can effectively integrate gene expression and trait data to identify pathways and candidate biomarkers. Here we show that the additional inclusion of genetic marker data allows one to characterize network relationships as causal or reactive in a chronic fatigue syndrome (CFS) data set.
Year: 2008 PMID: 18986552 PMCID: PMC2625353 DOI: 10.1186/1752-0509-2-95
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. (a) Flow chart overview of methods and (b) subsets of patients analyzed at each step. We first constructed a co-expression network based on 127 CFS samples and then identified a CFS severity-related module using a subset of 87 patients with CFS severity scores. We then related the SNPs and connectivities to the module gene expressions in the male and homogenized female samples separately. We selected candidate genes based on 1) association with a SNP that in turn was associated with severity, 2) connectivity, and 3) association with severity in both sexes. We then repeated analysis steps 1–5 on a second data set.
Figure 2. Graphical representations of network properties. (a) Hierarchical clustering of the 2677 most varying and connected genes resulted in five modules. (b) A multi-dimensional scaling plot of these genes indicates that the blue module is the most distinct. (c) There is little relationship between male and female gene expression correlations with CFS severity, likely due to genetic heterogeneity in the female samples. (d) Homogenizing the female samples more than doubled the correlation between male (M) and homogenized female (FH) gene significance. (e) Connectivities of the module genes are similar between males (M) and females (F) and (f) between males and homogenized females (FH), with the blue module showing the highest preservation. The high preservation of intramodular connectivity forms the foundation of a connectivity and network-based screening strategy.
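The connectivity referred to in this caption can be illustrated with a minimal sketch. It assumes the usual unsigned WGCNA definition, in which the adjacency between two genes is the absolute Pearson correlation raised to a soft-thresholding power β, and a gene's connectivity is the sum of its adjacencies to all other genes. The value β = 6, the gene names, and the expression values below are illustrative assumptions and are not taken from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def connectivity(expr, beta=6):
    """expr maps gene name -> expression values across samples.

    Adjacency is |cor|^beta (beta is an assumed soft threshold);
    connectivity is each gene's summed adjacency to all other genes.
    """
    genes = list(expr)
    k = {g: 0.0 for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a = abs(pearson(expr[gi], expr[gj])) ** beta
            k[gi] += a
            k[gj] += a
    return k

# Toy, made-up expression profiles (hypothetical genes, five samples).
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [1.1, 2.1, 2.9, 4.2, 4.8],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
k = connectivity(toy)
```

Restricting `expr` to the genes of a single module would give an intramodular connectivity in the same way; raising correlations to a power emphasizes strong co-expression and suppresses weak correlations.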
Figure 3. Male and female gene significance bar plots for CFS severity. We found that the blue module gene significance was highest in (a) all samples and in (b) males. In females (c) the blue module significance was approximately equal to the average significance of the other modules. (d) Homogenizing the female samples increased and emphasized the blue module significance.
Average absolute value of severity associations for the SNPs within eight candidate genes.
| Locus | Average absolute correlation (SD) | No. of SNPs | Name | Correlation | p-value | |
| 2p24 | 0.14 (NA) | 1 | rs12473543 | 0.135 | 0.216 | |
| 5q34 | 0.07 (0.06) | 7 | rs258750 | 0.198 | 0.069 | |
| 7p15 | 0.15 (0.08) | 3 | hCV15960586 | 0.225 | 0.036 | |
| 11p15 | 0.07 (0.01) | 2 | rs4074905 | 0.080 | 0.466 | |
| 12q21 | 0.23 (0.04) | 7 | rs10784941 | 0.275 | 0.010 | |
| 17q11.1 | 0.18 (0.17) | 3 | rs4325622 | 0.347 | 0.001 | |
| 17q21 | 0.03 (0.02) | 6 | rs242940 | 0.069 | 0.531 | |
| 22q11.1 | 0.04 (0.02) | 7 | hCV11804654 | 0.077 | 0.479 | |
TPH2 with seven SNPs had the highest average association with CFS severity.
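The per-gene summary in this table, an average of absolute SNP-severity correlations, can be sketched as follows. This is only an illustration under stated assumptions: genotypes are coded additively as 0, 1, or 2 copies of an allele (the coding used in the study is not given here), and the genotype and severity values are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_abs_association(snp_genotypes, severity):
    """Average |r| between each SNP's genotype vector and severity."""
    rs = [abs(pearson(g, severity)) for g in snp_genotypes]
    return sum(rs) / len(rs)

# Hypothetical data: six patients, two SNPs in one gene (0/1/2 coding assumed).
severity = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
snps = [
    [0, 0, 1, 1, 2, 2],
    [2, 1, 2, 0, 1, 0],
]
avg = mean_abs_association(snps, severity)
```

Taking the absolute value before averaging keeps SNPs with opposite effect directions from cancelling each other out.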
Understanding the factors that affect gene significance.
| Samples with severity scores | 87 | r = 0.27 (p = 0.011) | r = 0.28 (p = 0.010) |
| Males & HomFemales | 76 | r = 0.50 (p = 8 × 10⁻⁶) | r = 0.33 (p = 0.003) |
| Males with severity scores | 23 | r = 0.34 (p = 0.113) | r = 0.45 (p = 0.030) |
| Females with severity scores | 64 | r = 0.26 (p = 0.041) | r = 0.17 (p = 0.170) |
| Homogenized Females | 53 | r = 0.54 (p = 2 × 10⁻⁵) | r = 0.25 (p = 0.076) |
| Second data set (DS)³ | 33 | r = 0.09 (p = 0.638) | r = 0.03 (p = 0.846) |
| Second DS Homogenized³ | 30 | r = 0.38 (p = 0.040) | r = 0.15 (p = 0.415) |
| Samples with severity scores | 87 | r = 0.21 (p = 0.055) | r = 0.18 (p = 0.088) |
| Males & HomFemales | 76 | r = 0.38 (p = 6 × 10⁻⁴) | r = 0.23 (p = 0.045) |
| Males with severity scores | 23 | r = 0.27 (p = 0.216) | r = 0.20 (p = 0.365) |
| Females with severity scores | 64 | r = 0.21 (p = 0.101) | r = 0.19 (p = 0.137) |
| Homogenized Females | 53 | r = 0.44 (p = 0.001) | r = 0.24 (p = 0.082) |
| Second DS³ | 33 | r = 0.28 (p = 0.116) | r = 0.18 (p = 0.318) |
| Second DS Homogenized³ | 30 | r = 0.40 (p = 0.030) | r = 0.12 (p = 0.515) |
The severity, MEblue, TPH2 SNP, and FOXN1 correlations (r) and p-values (p) for five different subsets of the primary data set and the homogenized samples in the secondary data set. Severity is most significantly related to MEblue, the TPH2 SNP, and the candidate FOXN1 gene in the combined male and homogenized female subset. Homogenizing the second data set (which discarded three samples) also improved the MEblue and FOXN1 associations with severity.
¹ MEblue refers to the blue module eigengene, or first principal component of the blue module.
² SNP refers to the TPH2 SNP rs10784941.
³ Here empiric replaces severity in cor(severity, FOXN1).
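The first footnote defines the module eigengene as the first principal component of a module's expression profiles. A minimal sketch of that computation is given below, assuming each gene is first standardized across samples and using plain power iteration in place of a full singular value decomposition. The function names and toy values are hypothetical, and the starting vector (the average standardized profile) is a choice made here so that the result is oriented along the module's average expression.

```python
import math

def standardize(row):
    """Center a gene's profile and scale it to unit sample variance."""
    n = len(row)
    m = sum(row) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in row) / (n - 1))
    return [(v - m) / sd for v in row]

def module_eigengene(expr_rows, iters=200):
    """expr_rows: per-gene expression lists (genes x samples).

    Returns the unit-length first principal component across samples,
    found by power iteration on X^T X. Assumes the average standardized
    profile is not all zeros, since it is used as the starting vector.
    """
    X = [standardize(r) for r in expr_rows]
    n = len(X[0])
    v = [sum(row[j] for row in X) / len(X) for j in range(n)]
    for _ in range(iters):
        # scores = X v (one value per gene), then w = X^T scores.
        scores = [sum(row[j] * v[j] for j in range(n)) for row in X]
        w = [sum(X[i][j] * scores[i] for i in range(len(X))) for j in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

# Toy module: three made-up, positively correlated genes over five samples.
module = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 5.0, 8.0, 10.0],
    [1.0, 3.0, 2.0, 5.0, 4.0],
]
me = module_eigengene(module)
```

The resulting vector has one value per sample, so it can be correlated with a trait such as severity, or with a SNP, in the same way as a single gene's expression profile.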
Pearson correlations (r) between the expression profiles of the 20 candidate genes from the IWGCNA and MEblue, CFS severity, and the TPH2 SNP.
| # | Gene | MEblue r: All | Rank* | r: All | p-value | r: M | r: F | ||||
| 1 | FOXN1 ( | 0.845 | 195 | 0.21 | 0.018 | 0.23 | 0.20 | ||||
| 2 | PRDX3 ( | 0.848 | 181 | 0.21 | 0.020 | 0.32 | 0.17 | ||||
| 3 | SUCLA2 ( | 0.844 | 197 | 0.20 | 0.021 | 0.36 | 0.17 | ||||
| 4 | DCTN2 ( | 0.909 | 30 | 0.23 | 0.009 | 0.41 | 0.18 | ||||
| 5 | PGK1 ( | 0.849 | 176 | 0.14 | 0.108 | 0.26 | 0.12 | ||||
| 6 | SNURF ( | 0.882 | 77 | 0.18 | 0.037 | 0.32 | 0.14 | ||||
| 7 | PRKCH ( | 0.888 | 64 | 0.15 | 0.089 | 0.23 | 0.13 | ||||
| 8 | RYK ( | 0.867 | 113 | 0.12 | 0.182 | 0.21 | 0.09 | ||||
| 9 | PPP1R14C ( | 0.866 | 114 | 0.21 | 0.016 | 0.26 | 0.19 | ||||
| 10 | VAMP5 ( | 0.863 | 124 | 0.24 | 0.007 | 0.35 | 0.20 | ||||
| 11 | PRO0641 ( | 0.853 | 159 | 0.21 | 0.016 | 0.25 | 0.19 | ||||
| 12 | TMEM50A ( | 0.911 | 23 | 0.17 | 0.050 | 0.22 | 0.16 | ||||
| 13 | CRNKL1 ( | 0.865 | 117 | 0.22 | 0.013 | 0.36 | 0.18 | ||||
| 14 | NPAL2 ( | 0.919 | 10 | 0.21 | 0.020 | 0.33 | 0.16 | ||||
| 15 | TFB2M ( | 0.899 | 49 | 0.21 | 0.016 | 0.22 | 0.19 | ||||
| 16 | PBLD ( | 0.906 | 38 | 0.17 | 0.049 | 0.23 | 0.15 | ||||
| 17 | LTV1 ( | 0.856 | 145 | 0.19 | 0.029 | 0.29 | 0.17 | ||||
| 18 | MED8 ( | 0.869 | 108 | 0.22 | 0.015 | 0.35 | 0.18 | ||||
| 19 | CD302 ( | 0.817 | 315 | 0.18 | 0.046 | 0.29 | 0.16 | ||||
| 20 | ( | 0.887 | 68 | 0.19 | 0.032 | 0.38 | 0.15 | ||||
The correlations were computed using all 127 samples studied (All), the 98 female samples (F), and the 29 male samples (M), except for the correlation with severity, for which only 87 samples had non-missing scores (64 female and 23 male). The CFS Severity column is bolded for clarity.
*The rank of each gene in terms of its MEblue correlation out of the 8966 genes used to start the analysis.
Figure 4. Secondary data set results. (a) Average linkage hierarchical clustering of the gene expressions from 33 secondary data set samples colored by the original network module definitions shows that the blue module is preserved. (b) Intramodular connectivity is preserved between the secondary and primary data set networks.
Figure 5. Ingenuity Pathway Analysis results. An IPA comparison analysis indicates that the 20 candidate gene pathway (light blue) is connected with several of the most highly significant blue module pathways (dark blue). Each pathway description was selected from the top three most significant IPA pathway annotations, and the other two are listed below the diagram. The ranks correspond to the p-values of the identified networks, where the network with the smallest p-value has rank = 1.
Candidate gene names and Entrez Gene descriptions from the standard analysis.
| 1 | DGCR8 ( | DiGeorge syndrome critical region gene 8. 22q11.2 |
| 2 | PPARD ( | Peroxisome proliferator-activated receptor delta. May be involved in the development of several chronic diseases, including diabetes, obesity, atherosclerosis, and cancer. 6p21.2-p21.1 |
| 3 | IHPK2 ( | Inositol hexaphosphate kinase 2. May affect the growth suppressive and apoptotic activities of interferon-beta in some ovarian cancers. 3p21.31 |
| 4 | CCDC92 ( | Coiled-coil domain containing 92. 12q24.31 |
| 5 | NR5A2 ( | Nuclear receptor subfamily 5, group A, member 2. May be involved in regulation of Hepatitis B virus. 1q32.1 |
| 7 | NXF1 ( | Nuclear RNA export factor 1. Exports viral mRNAs and herpes simplex virus type 1. 11q12-q13 |
| 8 | COL13A1 ( | Collagen, type XIII, alpha 1. May function in connective tissues. 10q22 |
| 9 | AXIN2 ( | Regulates stability of beta-catenin in the Wnt signaling pathway. Mutations associated with colorectal cancer. 17q23-q24 |
| 10 | SCAP ( | SREBF chaperone. Involved in regulating sterol biosynthesis. 3p21.31 |
| 11 | DFFA ( | DNA fragmentation factor, 45 kDa, alpha polypeptide. Triggers DNA fragmentation during apoptosis. 1p36.3-p36.2 |
| 12 | TCF4 ( | Transcription factor 7-like 2 (T-cell specific, HMG-box). Implicated in blood glucose homeostasis. 10q25.3 |
| 13 | WNT16 ( | Wingless-type MMTV integration site family, member 16. Implicated in oncogenesis and in several developmental processes. 7q31 |
| 14 | ZNF687 ( | Zinc finger protein 687. 1q21.2 |
| 15 | FGF1 ( | Fibroblast growth factor 1 (acidic). Embryonic development, cell growth, morphogenesis, tissue repair, tumor growth and invasion. 5q31 |
| 16 | ANKRD6 ( | Ankyrin repeat domain 6. 6q14.2-q16.1 |
| 17 | EPHX1 ( | Epoxide hydrolase 1, microsomal (xenobiotic). Activation and detoxification of exogenous chemicals such as polycyclic aromatic hydrocarbons. 1q42.1 |
| 18 | FAIM ( | Fas apoptotic inhibitory molecule. 3q22.3 |
| 20 | ADFP ( | Adipose differentiation-related protein. 9p22.1 |
| 21 | BAT5 ( | HLA-B associated transcript 5. Possibly involved in immunity. 6p21.3 |
| 22 | CEBPA ( | CCAAT/enhancer binding protein, alpha. Body weight homeostasis. 19q13.1 |
| 23 | HNRNPA1 ( | Heterogeneous nuclear ribonucleoprotein A1. May be part of the regulatory mechanisms of the life cycle of HTLV-1 human retrovirus in T cells. 12q13.1 |
| 25 | RNASEN ( | Ribonuclease type III, nuclear. Participates in diverse RNA maturation and decay pathways. 5p13.3 |
| 26 | EDAR ( | Ectodysplasin A receptor. Mutations in this gene result in hypohidrotic ectodermal dysplasia. 2q11-q13 |
| 27 | F3 ( | Coagulation factor III (thromboplastin, tissue factor). Enables cells to initiate the blood coagulation cascades. 1p22-p21 |
| 28 | HSPG2 ( | Heparan sulfate proteoglycan 2. 1p36.1-p34 |
*Members of the blue module.
Pearson correlations (r) between the expression profiles of the 29 candidate genes from the standard analysis and MEblue, severity, and the TPH2 SNP.
| # | Gene | MEblue r: All | Rank¹ | r: All | p-value | r: M | r: F | ||||
| 1 | DGCR8 ( | 0.81 | 329 | 0.13 | 0.14 | 0.12 | 0.12 | ||||
| 2 | PPARD ( | 0.54 | 2305 | 0.10 | 0.25 | 0.18 | 0.08 | ||||
| 3 | IHPK2 ( | 0.48 | 2842 | 0.15 | 0.09 | 0.02 | 0.16 | ||||
| 4 | CCDC92 ( | 0.43 | 3434 | 0.07 | 0.42 | -0.07 | 0.10 | ||||
| 5 | NR5A2 ( | 0.47 | 2945 | 0.10 | 0.27 | 0.18 | 0.07 | ||||
| 7 | NXF1 ( | 0.63 | 1517 | 0.15 | 0.10 | -0.03 | 0.15 | ||||
| 8 | COL13A1 ( | 0.51 | 2590 | 0.05 | 0.60 | -0.10 | 0.06 | ||||
| 9 | AXIN2 ( | 0.80 | 414 | 0.13 | 0.15 | 0.16 | 0.10 | ||||
| 10 | SCAP ( | 0.64 | 1458 | 0.17 | 0.06 | 0.38 | 0.12 | ||||
| 11 | DFFA ( | 0.71 | 948 | 0.11 | 0.21 | 0.07 | 0.10 | ||||
| 12 | TCF4 ( | 0.76 | 637 | 0.14 | 0.13 | 0.15 | 0.11 | ||||
| 13 | WNT16 ( | 0.71 | 910 | 0.08 | 0.36 | 0.13 | 0.05 | ||||
| 14 | ZNF687 ( | 0.83 | 243 | 0.10 | 0.28 | 0.29 | 0.04 | ||||
| 15 | FGF1 ( | 0.65 | 1378 | 0.02 | 0.83 | -0.03 | 0.00 | ||||
| 16 | ANKRD6 ( | 0.83 | 241 | 0.14 | 0.12 | 0.23 | 0.09 | ||||
| 17 | EPHX1 ( | 0.64 | 1442 | 0.11 | 0.22 | -0.01 | 0.12 | ||||
| 18 | FAIM ( | 0.86 | 132 | 0.07 | 0.43 | 0.11 | 0.04 | ||||
| 20 | ADFP ( | 0.69 | 1057 | 0.11 | 0.20 | 0.12 | 0.10 | ||||
| 21 | BAT5 ( | 0.73 | 802 | 0.10 | 0.24 | 0.08 | 0.09 | ||||
| 22 | CEBPA ( | 0.70 | 980 | 0.08 | 0.38 | -0.16 | 0.10 | ||||
| 23 | HNRNPA1 ( | 0.46 | 3048 | 0.11 | 0.23 | -0.12 | 0.15 | ||||
| 25 | RNASEN ( | 0.75 | 680 | 0.10 | 0.26 | 0.09 | 0.08 | ||||
| 26 | EDAR ( | 0.82 | 288 | 0.09 | 0.32 | -0.02 | 0.09 | ||||
| 27 | F3 ( | 0.67 | 1181 | 0.10 | 0.26 | 0.18 | 0.06 | ||||
| 28 | HSPG2 ( | 0.30 | 4880 | -0.03 | 0.71 | -0.25 | -0.02 | ||||
The correlations were computed using the 127 samples studied (All), the 98 female samples (F), and the 29 male samples (M), except for the correlation with severity, for which only 87 samples had non-missing scores (64 female and 23 male). Four of the candidate genes indicated in bold satisfied our IWGCNA screening criteria (excluding blue module membership). The CFS Severity column is bolded for clarity.
¹ The rank of each gene in terms of its MEblue correlation out of the 8966 genes used to start the analysis.
² The median was computed using the absolute value.
Figure 6. Boxplot comparisons of correlation distributions for the 20 candidate genes from the IWGCNA and the 29 candidate genes from the standard analysis. The correlations with severity are higher among the standard analysis candidate genes, but the MEblue and TPH2 SNP correlations are higher for the IWGCNA candidates.
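The MEblue part of this comparison can be checked roughly against the two correlation tables above by computing the median of absolute values, as the table footnote describes. The sketch below assumes that the first correlation column in each table is the MEblue correlation (which the rank footnotes suggest), and it uses only the 25 standard analysis rows that appear in the table as reproduced here, not all 29 genes.

```python
from statistics import median

# First correlation column of the IWGCNA candidate table (20 genes).
iwgcna_meblue = [
    0.845, 0.848, 0.844, 0.909, 0.849, 0.882, 0.888, 0.867, 0.866, 0.863,
    0.853, 0.911, 0.865, 0.919, 0.899, 0.906, 0.856, 0.869, 0.817, 0.887,
]

# First correlation column of the standard analysis table (25 listed rows).
standard_meblue = [
    0.81, 0.54, 0.48, 0.43, 0.47, 0.63, 0.51, 0.80, 0.64, 0.71,
    0.76, 0.71, 0.83, 0.65, 0.83, 0.64, 0.86, 0.69, 0.73, 0.70,
    0.46, 0.75, 0.82, 0.67, 0.30,
]

def median_abs(values):
    """Median of absolute values, so negative correlations count by size."""
    return median(abs(v) for v in values)

med_iwgcna = median_abs(iwgcna_meblue)
med_standard = median_abs(standard_meblue)
iwgcna_higher = med_iwgcna > med_standard
```

A median is less sensitive than a mean to a single outlying gene, which is one reason boxplots are a natural way to compare the two candidate lists.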