Fabienne K Roessler, Birke J Benedikter, Bernd Schmeck, Nadav Bar.
Abstract
Chronic obstructive pulmonary disease (COPD) kills over three million people worldwide every year. Despite its high global impact, knowledge of the underlying molecular mechanisms is still limited. In this study, we aimed to extend the available knowledge by identifying a small set of COPD-associated genes. We analysed different publicly available gene expression datasets containing whole lung tissue (WLT) and airway epithelium (AE) samples from over 400 human subjects for differentially expressed genes (DEGs). We reduced the resulting sets of 436 and 663 DEGs using a novel computational approach that utilises a random depth-first search to identify genes which improve the distinction between COPD patients and controls along the first principal component of the data. Our method identified small sets of 10 and 15 genes in the WLT and AE, respectively. These sets of genes significantly (p < 10⁻²⁰) distinguish COPD patients from controls with high fidelity. The final sets revealed novel genes like cysteine rich protein 1 (CRIP1) or secretoglobin family 3A member 2 (SCGB3A2) that may underlie fundamental molecular mechanisms of COPD in these tissues.
Year: 2021 PMID: 33986404 PMCID: PMC8119951 DOI: 10.1038/s41598-021-89762-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Overview of the included GEO gene expression data sets and their subject characteristics.
| WLT | AE | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bronchial AE | SAE | |||||||||||||
| GEO Accession | GSE76925 | GSE47460 | GSE37147 | GSE11906 | GSE64614 | |||||||||
| Microarray platform | GPL10558 | GPL6480 | GPL14550 | GPL6244 | GPL6244 | GPL570 | GPL570 | |||||||
| Smoking status | FS | FS | FS | FS | CS | CS | CS | |||||||
| Subject group | COPD | Control | COPD | Control | COPD | Control | COPD | Control | COPD | Control | COPD | Control | COPD | Control |
| Number of subjects | 111 | 40 | 66 | 9 | 125 | 54 | 57 | 82 | 30 | 69 | 20 | 44 | 36 | 73 |
| Mean age (± SD) | 63.6 (± 6.6) | 65.7 (± 9.0) | 62.7 (± 10.8) | 66.1 (± 10.2) | 66.1 (± 9.1) | 65.9 (± 10.4) | 66.1 (± 5.6) | 65.8 (± 5.0) | 63.2 (± 6.7) | 62.2 (± 6.0) | 52.1 (± 8.1) | 43.5 (± 5.9) | ||
| Male/female | 52/59 | 15/25 | 41/25 | 6/3 | 69/56 | 30/24 | 36/21 | 49/33 | 16/14 | 34/35 | 16/4 | 31/13 | ||
| Pack-years (± SD) | – | – | – | – | 52.9 (± 28.1) | 48.3 (± 22.9) | 47.5 (± 13.9) | 44.3 (± 11.2) | 37.6 (± 23.4) | 28.6 (± 16.4) | ||||
| FEV1% predicted (± SD) | 26.5 (± 9.4) | 98.7 (± 12.5) | 55.5 (± 26.8) | 107.7 (± 11.2) | 53.7 (± 21.9) | 96.6 (± 8.7) | 61.6 (± 12.1) | 93.1 (± 13.1) | 57.6 (± 16.4) | 92.2 (± 13.7) | ||||
| Name of comparison group | WLT 1ᶜ | WLT 2 | WLT 3 | AE 1 | AE 2 | AE 3ᵇ | AE 4ᵇ,ᶜ | |||||||
Values in italics were taken from the original publications and not calculated by us.
ᵃ The values displayed were taken from the original publication and include an additional COPD subject who was not included in our study.
ᵇ The comparison group is not age matched (p < 10⁻⁴).
ᶜ The comparison group is not matched for pack-years (p < 0.01).
Figure 1. Selection of COPD-associated DEGs for the WLT and the AE. (a) Bar diagram showing the number of significant (p < 0.05) DEGs found in each comparison group. (b) Venn diagrams showing the overlap of DEGs between different comparison groups. We only compared groups of the same lung sample type. White-outlined sections mark DEGs that fulfil the first selection criteria (see Results) for COPD-associated DEGs. (c) Heatmap showing fold changes of the 436 and 663 COPD-associated DEGs from the WLT and the AE, respectively. The DEGs were sorted by the mean fold change over all comparison groups.
Figure 2. Distinction between COPD and control subjects using PC1. (a) Beeswarm plots comparing rescaled PC1 scores of control and COPD subjects for both lung sample types (WLT and AE). The PC1 scores are computed from either the gene expression values of all tested genes (17,249) or the corresponding COPD-associated DEGs. Only the COPD-associated DEGs lead to a significant difference (p < 0.05) between COPD and control subjects. Red lines with black error bars show the mean ± SEM. (b) ROC curves comparing the performance of the computed PC1 scores (see (a)) in distinguishing COPD from control subjects.
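The PC1-based comparison in this caption can be illustrated with a minimal, standard-library sketch. It is not the authors' code: the toy data, the power-iteration PCA, the min-max rescaling and all function names (`pc1_scores`, `rescale`, `auc`, `make_toy_data`) are assumptions made for illustration only.

```python
import math
import random

def pc1_scores(X, iters=100):
    """Return one PC1 score per subject (rows = subjects, columns = genes)."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    Xc = [[row[j] - means[j] for j in range(m)] for row in X]  # centre each gene
    rng = random.Random(0)
    v = [rng.uniform(-1.0, 1.0) for _ in range(m)]  # arbitrary start vector
    for _ in range(iters):
        # Power iteration on Xc^T Xc (the unnormalised covariance matrix)
        s = [sum(r[j] * v[j] for j in range(m)) for r in Xc]
        w = [sum(Xc[i][j] * s[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(m)) for r in Xc]

def rescale(scores):
    """Min-max rescaling to [0, 1]; the paper's exact rescaling is an assumption here."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def auc(case_scores, control_scores):
    """Area under the ROC curve, computed from all case/control pairs."""
    wins = sum((a > b) + 0.5 * (a == b) for a in case_scores for b in control_scores)
    return wins / (len(case_scores) * len(control_scores))

def make_toy_data(seed=1, n_per_group=30):
    """Hypothetical data: 3 genes shifted in cases (label 1), 2 pure-noise genes."""
    rng = random.Random(seed)
    X, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_group):
            row = [rng.gauss(3.0 * label, 1.0) for _ in range(3)]
            row += [rng.gauss(0.0, 1.0) for _ in range(2)]
            X.append(row)
            labels.append(label)
    return X, labels

X, labels = make_toy_data()
scores = rescale(pc1_scores(X))
copd = [s for s, y in zip(scores, labels) if y == 1]
ctrl = [s for s, y in zip(scores, labels) if y == 0]
a = auc(copd, ctrl)
toy_auc = max(a, 1.0 - a)  # the sign of a principal component is arbitrary
print(round(toy_auc, 3))
```

Because the sign of PC1 is arbitrary, the sketch reports the larger of the AUC and its complement. In practice a library PCA and ROC routine would replace the hand-written helpers.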
Figure 3. Search for small sets of discriminatory DEGs. (a) Visualisation of one iteration of our random depth-first search (RDFS) approach. The search starts with the full set of COPD-associated DEGs (N genes) on the left and continues to the right (black arrows) by randomly removing single genes. After each removal, the PC1 scores for the two subject groups are computed from the expression values of the remaining subset of DEGs (e.g. N − 1), and a p value is calculated with a t-test. If the new p value is smaller than the previous one (e.g. p_{N−1} < p_N), the gene is removed entirely, and the search continues on that branch of the search tree by randomly removing another gene. If the p value is equal to or larger than the previous one (e.g. p_{N−1} ≥ p_N), the gene is returned to the set of DEGs and another random gene is removed and tested. The search ends when no removal of a gene leads to a decrease in p value; the remaining subset of DEGs (N − L + 1 genes, with L the depth of the search tree) is the smallest set of discriminatory DEGs for this iteration. (b) Smoothed histograms showing the mean frequency of rescaled PC1 scores for COPD and control subjects over all ten search runs. The top ones consider only training subjects, while the lower ones consider only test subjects. The dashed line represents the mean threshold with the highest F-scores in distinguishing COPD from control subjects. (c) ROC curves showing the performance of the different sets of discriminatory DEGs in distinguishing COPD from control subjects. The dashed line represents the performance of a random guess.
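One iteration of the search described in panel (a) might look roughly like the sketch below. It is a simplified illustration, not the published implementation. Several choices are assumptions: PC1 is obtained by power iteration, the t-test is taken to be a pooled-variance two-sample test, and the absolute t statistic stands in for the p value (with a fixed number of subjects, a larger |t| corresponds to a smaller p). The names `pc1_scores`, `abs_t` and `rdfs_once` and the toy data are hypothetical, and the training/test split over ten runs from panel (b) is not reproduced.

```python
import math
import random

def pc1_scores(X, genes, iters=50):
    """PC1 score per subject, using only the listed gene columns."""
    n, m = len(X), len(genes)
    means = [sum(row[g] for row in X) / n for g in genes]
    Xc = [[row[g] - mu for g, mu in zip(genes, means)] for row in X]
    rng = random.Random(0)
    v = [rng.uniform(-1.0, 1.0) for _ in range(m)]  # arbitrary start vector
    for _ in range(iters):  # power iteration on Xc^T Xc
        s = [sum(a * b for a, b in zip(r, v)) for r in Xc]
        w = [sum(Xc[i][j] * s[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return [sum(a * b for a, b in zip(r, v)) for r in Xc]

def abs_t(scores, labels):
    """|t| of a pooled-variance two-sample t-test (proxy for the p value)."""
    a = [s for s, y in zip(scores, labels) if y == 1]
    b = [s for s, y in zip(scores, labels) if y == 0]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled = ss / (len(a) + len(b) - 2)
    if pooled == 0.0:
        return math.inf
    return abs(ma - mb) / math.sqrt(pooled * (1 / len(a) + 1 / len(b)))

def rdfs_once(X, labels, rng):
    """One random depth-first pass: drop genes while the separation improves."""
    genes = list(range(len(X[0])))
    best = abs_t(pc1_scores(X, genes), labels)
    improved = True
    while improved and len(genes) > 1:
        improved = False
        for g in rng.sample(genes, len(genes)):  # random removal order
            trial = [h for h in genes if h != g]
            t = abs_t(pc1_scores(X, trial), labels)
            if t > best:  # larger |t| means smaller p: keep the removal
                genes, best, improved = trial, t, True
                break
            # otherwise the gene stays in the set and another one is tried
    return genes, best

# Hypothetical toy data: genes 0-2 differ between groups, genes 3-7 are noise.
data_rng = random.Random(2)
X, labels = [], []
for label in (0, 1):
    for _ in range(20):
        row = [data_rng.gauss(1.5 * label, 1.0) for _ in range(3)]
        row += [data_rng.gauss(0.0, 1.0) for _ in range(5)]
        X.append(row)
        labels.append(label)

start_t = abs_t(pc1_scores(X, list(range(8))), labels)
selected, final_t = rdfs_once(X, labels, random.Random(3))
print(selected, round(start_t, 2), round(final_t, 2))
```

Because removals are accepted only when the separation strictly improves, the final |t| can never be smaller than the starting value. Different random seeds can end in different subsets, which is presumably why the study repeats the search and keeps genes that persist across runs.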
10 persistent DEGs of the WLT. Genes are sorted by the mRNA regulation in COPD and gene symbol (alphabetic).
| Name | Gene symbol | mRNA regulation in COPD | Literature evidence supporting a role in COPD |
|---|---|---|---|
| Autophagy related 3 | ATG3 | Up | Positive regulator of autophagy, involved in TGF-β induced epithelial-to-mesenchymal transition in alveolar epithelial A549 cells |
| Chimerin 2 | CHN2 | Up | Activator of RAC1 |
| Dihydropyrimidinase | DPYS | Up | Upregulated in response to long-term (9 months) cigarette smoke exposure in mice |
| Growth arrest specific 2 | GAS2 | Up | Shown to modulate cell cycle and apoptosis |
| Gametogenetin | GGN | Up | – |
| Cysteine rich protein 1 | CRIP1 | Down | Indirect evidence: CRIP1 acts as carrier for transmucosal zinc absorption |
| Dipeptidyl peptidase like 6 | DPP6 | Down | Indirect evidence: DPP6 is a positive regulator of Kv4 potassium channels |
| FRAS1 related extracellular matrix 3 | FREM3 | Down | Associated with accelerated aging |
| Polypeptide N-acetylgalactosaminyltransferase 14 | GALNT14 | Down | – |
| Pyroglutamylated RFamide peptide receptor | QRFPR | Down | – |
15 persistent DEGs of the AE.
| Name | Gene symbol | mRNA regulation in COPD | Literature evidence supporting a role in COPD |
|---|---|---|---|
| Eva-1 homolog C | EVA1C | Up | – |
| Glutamate decarboxylase 1 | GAD1 | Up | Biosynthetic enzyme for neurotransmitter gamma-amino-butyric acid (GABA). Its mRNA expression is upregulated in AE of COPD patients. Also upregulated in healthy smokers and associated with increased epithelial MUC5AC |
| MOB kinase activator 3C | MOB3C | Up | – |
| Phospholipase C gamma 2 | PLCG2 | Up | Downstream mediator of RAC1 |
| Rac family small GTPase 1 | RAC1 | Up | RAC1 signalling is activated by cigarette smoke and mediates inflammation |
| Selenoprotein H | SELENOH | Up | – |
| URB1 ribosome biogenesis homolog | URB1 | Up | – |
| WD repeat domain 4 | WDR4 | Up | – |
| Chromosome 10 open reading frame 53 | C10orf53 | Down | – |
| Cysteine rich secretory protein 3 | CRISP3 | Down | – |
| Integral membrane protein 2A | ITM2A | Down | – |
| Peroxisomal biogenesis factor 5 like | PEX5L | Down | – |
| RIPOR family member 3 | RIPOR3 | Down | – |
| Secretoglobin family 3A member 2 | SCGB3A2 | Down | Supports lung development, anti-apoptotic |
| Surfactant protein B | SFTPB | Down | Anti-inflammatory |
Genes are sorted by the mRNA regulation in COPD and gene symbol (alphabetic).
Figure 4. Study of 10 and 15 persistent DEGs. (a) Beeswarm plots comparing rescaled PC1 scores computed for COPD and control subjects using only the expression values of the two persistent sets of COPD-associated DEGs. For subjects used to select the persistent DEGs, the scores of the COPD subjects are significantly (p < 10⁻²⁰) different compared to the control subjects. The PC1 scores computed for the validation subjects show their distribution in comparison to the subjects used for selection. Red lines with black error bars show the mean ± SEM (computed only for selection subjects). (b) ROC curves comparing the performance of the persistent DEGs in distinguishing COPD from control subjects using PC1 scores (see (a)). (c) Scatter plots showing the relation between the FEV1% predicted and the gene expression of DPYS in the WLT and URB1 in the AE of COPD and control subjects. The linear regression models the decrease in expression of DPYS and URB1 based on the increase in lung function, adjusted for comparison group, age and gender for DPYS or adjusted for smoking status (FS or CS), age, gender and pack-years for URB1.
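The covariate-adjusted regression mentioned in panel (c) can be sketched as an ordinary least-squares fit of expression on FEV1% predicted plus covariates. The example below is only an illustration under stated assumptions: the data are simulated, the covariates are limited to age and sex (the comparison-group, smoking-status and pack-years terms are omitted for brevity), and the helper names `solve` and `ols` are hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(rows, y):
    """Least-squares coefficients for y ~ intercept + covariates."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

# Simulated subjects (hypothetical values): FEV1% predicted, age, sex (0/1).
rng = random.Random(4)
rows, expression = [], []
for _ in range(50):
    fev1 = rng.uniform(20.0, 110.0)
    age = rng.uniform(40.0, 80.0)
    sex = float(rng.randint(0, 1))
    rows.append((fev1, age, sex))
    # Assumed ground truth: expression falls by 0.02 per FEV1 percentage point
    expression.append(8.0 - 0.02 * fev1 + 0.01 * age + 0.3 * sex + rng.gauss(0.0, 0.1))

intercept, fev1_slope, age_slope, sex_effect = ols(rows, expression)
print(round(fev1_slope, 4))
```

A negative `fev1_slope` corresponds to the pattern described in the caption, where expression decreases as lung function increases. For real data a statistics package would also provide standard errors and p values, which this sketch does not compute.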