Paola Paci1, Giulia Fiscon2, Federica Conte2, Valerio Licursi3, Jarrett Morrow4, Craig Hersh4, Michael Cho4, Peter Castaldi4, Kimberly Glass4, Edwin K Silverman4, Lorenzo Farina5.
Abstract
Chronic obstructive pulmonary disease (COPD) is a complex and heterogeneous syndrome. Network-based analysis implemented by SWIM software can be exploited to identify key molecular switches, called "switch genes", for the disease. Genes contributing to common biological processes or defining given cell types are usually co-regulated and co-expressed, forming expression network modules. Consistently, we found that the COPD correlation network built by SWIM consists of three well-characterized modules: one populated by switch genes, all up-regulated in COPD cases and related to the regulation of immune response, inflammatory response, and hypoxia (like TIMP1, HIF1A, SYK, LY96, BLNK and PRDX4); one populated by well-recognized immune signature genes, all up-regulated in COPD cases; and one where the GWAS genes AGER and CAVIN1 are the most representative module genes, both down-regulated in COPD cases. Interestingly, 70% of AGER negative interactors are switch genes, including PRDX4, whose activation strongly correlates with the activation of the known COPD GWAS interactors SERPINE2, CD79A, and POU2AF1. These results suggest that SWIM analysis can identify key network modules related to complex diseases like COPD.
Year: 2020 PMID: 32099002 PMCID: PMC7042269 DOI: 10.1038/s41598-020-60228-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Differentially expressed genes in lung tissue samples. (a) Pie chart represents the percentages of DEGs that are up-/down-regulated in COPD cases in comparison to control subjects, based on 1% false discovery rate. (b) Heatmap represents DEGs clustered according to genes (rows) and samples (columns) by using one minus the Pearson correlation as distance. Colors represent different expression levels increasing from blue to yellow.
Figure 2. COPD correlation network and module eigengene. (a) COPD correlation network where nodes are DEGs and a link occurs between them if the absolute value of the Pearson correlation coefficient between their expression profiles exceeds the correlation threshold (|r| > 0.57). Groups of nodes sharing the same color represent gene modules obtained by k-means clustering. (b) [UPPER] Heatmap representing genes of module 3 (rows) across samples (columns). Colors represent different expression levels increasing from blue to yellow. Gene expression data are log2-transformed and z-score normalized. [BOTTOM] Bar plot of the expression levels of module 3 eigengene (y-axis) across samples (x-axis). Gene expression data are log2-transformed and z-score normalized. (c) The percent variability explained by each principal component (PC) computed for module 3, known as a Pareto chart, contains both bars and a line graph, where individual values are represented in descending order by bars, and the line represents the cumulative total value. The left y-axis represents the percentage of the data variance explained by each PC, the right y-axis represents the cumulative distribution, and the x-axis represents the PCs that are able to explain 100% of the cumulative distribution. PC1 represents the module eigengene and explains about 90% of the data variance.
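The linking rule described for Figure 2a can be illustrated with a minimal sketch. It is not SWIM's implementation: the function names `pearson` and `correlation_network` and the toy expression profiles are hypothetical, and only the |r| > 0.57 cutoff is taken from the caption.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def correlation_network(expression, threshold=0.57):
    """Return {(gene_a, gene_b): r} for every gene pair with |r| > threshold."""
    edges = {}
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) > threshold:
            edges[(gene_a, gene_b)] = r
    return edges


# Toy expression profiles (hypothetical values, not data from the study).
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises with geneA
    "geneC": [5.0, 4.1, 2.9, 2.0, 1.1],  # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 3.0],  # close to uncorrelated with the rest
}

network = correlation_network(toy_expression)
for (gene_a, gene_b), r in network.items():
    print(f"{gene_a} -- {gene_b}: r = {r:+.2f}")
```

Because the rule uses the absolute value of r, strongly anti-correlated pairs such as geneA and geneC are linked as well; the sign kept in the dictionary is what allows negative interactors to be separated from positive ones afterwards.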
Figure 3. Module characterization in COPD network. The three boxes represent the three modules obtained by k-means clustering from the COPD correlation network. In each module, genes of interest or immune cell populations are highlighted. From top to bottom: boxplots in controls (orange boxes) and COPD cases (green boxes) of the module 1 eigengene and of the GWAS genes with the highest module 1 membership; bar plots, for each immune cell population included in module 2, of the fold-change values of the marker genes in that immune cell population; boxplot in controls (orange boxes) and COPD cases (green boxes) of the module 3 eigengene and the top-enriched GO BP terms and KEGG pathways in this module.
Figure 4. Identification of COPD switch genes. (a-b) Heat cartography maps of the COPD network and of a randomized network obtained by randomly shuffling the edges while preserving the degree of each node. Dots correspond to network nodes colored according to their APCC value.
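One common way to shuffle edges while preserving every node's degree is the double-edge swap, sketched below. The caption does not say which randomization procedure was used, so this is an assumption; `degree_preserving_shuffle`, `degrees`, and the toy edge list are hypothetical names and data.

```python
import random


def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomize an undirected simple graph by double-edge swaps.

    Each swap replaces edges (a, b) and (c, d) with (a, d) and (c, b),
    which leaves the degree of a, b, c, and d unchanged.
    """
    rng = random.Random(seed)
    edge_list = [tuple(sorted(edge)) for edge in edges]
    edge_set = set(edge_list)
    done = 0
    attempts = 0
    max_attempts = 100 * n_swaps  # guard: some graphs admit few valid swaps
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        new_1 = tuple(sorted((a, d)))
        new_2 = tuple(sorted((c, b)))
        # Reject swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or new_1 in edge_set or new_2 in edge_set:
            continue
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new_1, new_2}
        edge_list[i], edge_list[j] = new_1, new_2
        done += 1
    return edge_list


def degrees(edge_list):
    """Count how many edges touch each node."""
    deg = {}
    for u, v in edge_list:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


# Toy network (hypothetical node names).
toy_edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"),
             ("g2", "g3"), ("g4", "g5"), ("g5", "g6")]
shuffled = degree_preserving_shuffle(toy_edges, n_swaps=10)
print(degrees(shuffled) == degrees(toy_edges))
```

Comparing a node-level statistic on the real network against the same statistic on such randomized copies shows whether the observed pattern depends on who is connected to whom, or only on how many connections each node has.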
Figure 5. Characterization of COPD switch genes. (a) The larger pie chart [right] represents the percentages of DEGs that are up-/down-regulated in COPD cases in comparison to control subjects. The smaller pie chart [left] represents the percentages of switch genes among the up-regulated genes in COPD cases. (b) The pie chart represents the percentages of switch genes in each cluster. (c) Tables listing the switch genes that are transcription factors [left] and GWAS genes [right]. Switch genes are colored according to their associated cluster. (d) Robustness of the COPD correlation network. The blue curve corresponds to the cumulative deletion of non-switch hubs (i.e., the first 62 hubs that are not switch genes, sorted by decreasing degree); the red curve corresponds to the cumulative deletion of the 62 switch genes, sorted by decreasing degree; the green curve corresponds to the cumulative deletion of randomly selected nodes. The x-axis represents the cumulative fraction of removed nodes with respect to the total number of 1655 network nodes (i.e., x-maximum is 62/1655 ≈ 0.04), while the y-axis represents the average shortest path.
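The robustness curves in panel (d) can be approximated with a short sketch: remove nodes one at a time in a chosen order and recompute the average shortest path after each removal. The helper names and the toy graph are hypothetical, and averaging only over pairs that remain connected is an assumption, since the caption does not say how disconnected pairs were handled.

```python
from collections import deque


def average_shortest_path(adjacency):
    """Mean BFS distance over ordered pairs of distinct, mutually reachable nodes."""
    total = 0
    pairs = 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


def remove_node(adjacency, node):
    """Return a copy of the graph without `node` and its incident edges."""
    return {u: nbrs - {node} for u, nbrs in adjacency.items() if u != node}


def robustness_curve(adjacency, removal_order):
    """List of (fraction of nodes removed, average shortest path) points."""
    total_nodes = len(adjacency)
    curve = [(0.0, average_shortest_path(adjacency))]
    current = adjacency
    for k, node in enumerate(removal_order, start=1):
        current = remove_node(current, node)
        curve.append((k / total_nodes, average_shortest_path(current)))
    return curve


# Toy graph (hypothetical): a six-node ring whose nodes all attach to one hub.
ring = [f"r{i}" for i in range(6)]
toy_graph = {node: set() for node in ring + ["hub"]}
for i, node in enumerate(ring):
    nxt = ring[(i + 1) % len(ring)]
    toy_graph[node] |= {nxt, "hub"}
    toy_graph[nxt].add(node)
    toy_graph["hub"].add(node)

# Delete the single highest-degree node, mirroring "sorted by decreasing degree".
order = sorted(toy_graph, key=lambda n: len(toy_graph[n]), reverse=True)[:1]
curve = robustness_curve(toy_graph, order)
print(curve)
```

In the toy graph, deleting the hub forces all traffic around the ring, so the average shortest path rises; a steeper rise for one class of nodes than for another is the kind of contrast the three curves are meant to show.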
Figure 6. Degree distribution for each class of hubs. The dashed red lines correspond to the lowest degree (i.e., 264) of the first 62 (i.e., number of switch genes) nodes sorted by decreasing degree. For each class of hubs, the number of nodes that are included in the first 62 sorted nodes is reported.
Figure 7. Probability distribution of the network proximity. The network proximity was computed between the lists of switch genes from the COPD training and test sets. The dashed red line corresponds to the observed network proximity measure (p=1.8) between the lists of switch genes in the two analyzed datasets. The red area represents the probability of observing a test statistic as small as, or smaller than, the observed one, corresponding to a p-value of 0.049.
Figure 8. Schematic SWIM disease modules. Schematic diagram of disease modules identified by SWIM in the full interactome between switch genes associated with the three diseases identified in the legend.
Common KEGG functional annotations. Table showing the KEGG pathways shared between the two lists of switch genes obtained from the training and test set (i.e., GSE47460 and GSE76925).
| KEGG pathways | Switch GSE47460 | Switch GSE76925 |
|---|---|---|
| Antigen processing and presentation | ||
| Th17 cell differentiation | ||
| Primary immunodeficiency | ||
| NF-kappa B signaling pathway | ||
| Toll-like receptor signaling pathway | ||
| NOD-like receptor signaling pathway | ||
| PI3K-Akt signaling pathway | ||
| Cellular senescence | ||
Common GO BP functional annotations. Table showing the GO Biological Processes shared between the two lists of switch genes obtained from the training and test set (i.e., GSE47460 and GSE76925).
| GO Biological Process | Switch GSE47460 | Switch GSE76925 |
|---|---|---|
| antigen processing and presentation of exogenous peptide antigen via MHC class I | ||
| antigen receptor-mediated signaling pathway | ||
| cellular response to cytokine stimulus | ||
| cellular response to interleukin-1 | ||
| cytokine-mediated signaling pathway | ||
| innate immune response activating cell surface receptor signaling pathway | ||
| positive regulation of T cell proliferation | ||
| regulation of cytokine-mediated signaling pathway | ||
| regulation of immune response | ||
| regulation of interleukin-2 production | ||
| negative regulation of interleukin-2 production | ||
| neutrophil activation involved in immune response | ||
| neutrophil degranulation | ||
| neutrophil mediated immunity | ||
| neutrophil migration | ||
| regulation of response to cytokine stimulus | ||
| positive regulation of I-kappaB kinase/NF-kappaB signaling | ||
| regulation of I-kappaB kinase/NF-kappaB signaling | ||
| toll-like receptor signaling pathway | ||
| MyD88-dependent toll-like receptor signaling pathway | ||
| cellular response to hypoxia | ||
| extracellular matrix disassembly | ||
| extracellular matrix organization | ||
| cellular response to DNA damage stimulus | ||
| regulation of cellular response to stress | ||
| negative regulation of cell adhesion mediated by integrin | ||
| negative regulation of angiogenesis | ||
| negative regulation of apoptotic process | ||
| regulation of cell differentiation | ||
| regulation of cell migration | ||
| negative regulation of cell migration | ||
| negative regulation of cell motility | ||
| negative regulation of cell proliferation | ||
| regulation of cell proliferation | ||
| regulation of intracellular signal transduction | ||
| regulation of signal transduction | ||
| regulation of signal transduction by p53 class mediator | ||
| regulation of stem cell differentiation | ||
| negative regulation of Wnt signaling pathway | ||
Figure 9. Switch gene interactions. [LEFT] Networks of switch genes negatively correlated with GWAS genes. Pink nodes correspond to GWAS genes, blue nodes correspond to switch genes, orange nodes correspond to switch genes that are also GWAS genes, and larger nodes correspond to the negative nearest neighbors of EMP2. The interactions of AGER with its nearest neighbors are highlighted in red. [RIGHT] Sketched network of correlations among switch genes and SERPINE2, CD79A, and POU2AF1.
Figure 10. Flowchart of gene expression analysis.
Linear regression models for association with the variable of interest. This table reports the linear regression model used to fit each dataset, where EXP refers to the gene expression data and COPD refers to the variable of interest (i.e., case/control condition). Smoker status in the GSE47460 dataset is one of: current, ever, or never.
| Dataset | Reference | GEO Accession | Linear Model |
|---|---|---|---|
| training set | Peng 2016 | GSE47460 | EXP ~ COPD + age + sex + smoker status + 2 surrogate_variables |
| test set | Morrow 2017 | GSE76925 | EXP ~ COPD + age + sex + race + pack years + 2 surrogate_variables |
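To make the model formulas in the table concrete, the sketch below fits a reduced model, EXP ~ COPD + age + sex, for a single gene by ordinary least squares. This is an illustration under stated assumptions and not the authors' pipeline: the table does not name the fitting software, the surrogate variables and the remaining covariates are omitted, categorical covariates are assumed to be coded 0/1, and all function names and values are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def fit_linear_model(response, covariates):
    """Ordinary least squares for response ~ intercept + covariates."""
    names = ["intercept"] + list(covariates)
    design = [[1.0] + [covariates[name][i] for name in covariates]
              for i in range(len(response))]
    p = len(names)
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)]
           for a in range(p)]
    xty = [sum(row[a] * y for row, y in zip(design, response))
           for a in range(p)]
    return dict(zip(names, solve_linear_system(xtx, xty)))


# Toy data for one gene (hypothetical values, not taken from either dataset).
covariates = {
    "COPD": [0, 0, 0, 1, 1, 1],       # 0 = control, 1 = case
    "age": [50, 60, 70, 55, 65, 75],
    "sex": [0, 1, 0, 1, 0, 1],        # assumed 0/1 coding
}
# Generated as 5 + 2*COPD + 0.1*age - 1*sex, so the fit should recover these.
expression = [10.0, 10.0, 12.0, 11.5, 13.5, 13.5]

coefficients = fit_linear_model(expression, covariates)
print({name: round(value, 3) for name, value in coefficients.items()})
```

The coefficient on COPD is the quantity of interest: it estimates the case/control difference in expression after adjusting for the other covariates in the model, which is why each dataset's formula lists its own set of adjustment terms.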