Yuanlong Hu1, Xiaomeng Cheng1, Zhanjun Qiu2, Xianhai Chen2. 1. First Clinical Medical College, Shandong University of Traditional Chinese Medicine, Jinan, Shandong, China (mainland). 2. Department of Pulmonary Disease, The Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, Shandong, China (mainland).
Abstract
BACKGROUND Chronic obstructive pulmonary disease (COPD) is a disease with high heterogeneity, which is a major challenge in clinical individualized treatment. A mucus phenotype is one of the main characteristics of COPD. MATERIAL AND METHODS Gene expression profiles of lung tissue samples were from the Lung Genomics Research Consortium. MUC5B-associated gene signatures were obtained based on a nonlinear feature screening algorithm. These signatures were used to fit a latent profile analysis (LPA) model to identify COPD molecular subtypes and build a subtype classifier to verify the subtypes. Then, we explored the characteristics of cilium assembly and beating signatures, transcriptome features, immune infiltration among the 3 subtypes by xCell, single-sample gene set enrichment analysis, network perturbation amplitude, and weighted gene co-expression network analysis algorithms. An external dataset was used to verify the above COPD subtypes. RESULTS Three subtypes associated with mucus were identified by LPA and verified in an external dataset. Subtype 1 displayed higher T helper type 1 (Th1) and basophil infiltration, higher Th17/regulatory T cells (Tregs) ratio, a higher level of cilium assembly and beating, and lower mast cell and Treg infiltration. The subtypes 2 and 3 demonstrated higher macrophage M2 infiltration in lung tissue, while subtype 3 had higher neutrophil and eosinophil infiltration than subtype 2. CONCLUSIONS Overall, this work identified 3 mucus-associated molecular subtypes related to MUC5B expression, which deepens the understanding of airway mucus secretion in COPD and potentially provides valuable information for precision therapy.
BACKGROUND Chronic obstructive pulmonary disease (COPD) is a disease with high heterogeneity, which is a major challenge in clinical individualized treatment. A mucus phenotype is one of the main characteristics of COPD. MATERIAL AND METHODS Gene expression profiles of lung tissue samples were from the Lung Genomics Research Consortium. MUC5B-associated gene signatures were obtained based on a nonlinear feature screening algorithm. These signatures were used to fit a latent profile analysis (LPA) model to identify COPD molecular subtypes and build a subtype classifier to verify the subtypes. Then, we explored the characteristics of cilium assembly and beating signatures, transcriptome features, immune infiltration among the 3 subtypes by xCell, single-sample gene set enrichment analysis, network perturbation amplitude, and weighted gene co-expression network analysis algorithms. An external dataset was used to verify the above COPD subtypes. RESULTS Three subtypes associated with mucus were identified by LPA and verified in an external dataset. Subtype 1 displayed higher T helper type 1 (Th1) and basophil infiltration, higher Th17/regulatory T cells (Tregs) ratio, a higher level of cilium assembly and beating, and lower mast cell and Treg infiltration. The subtypes 2 and 3 demonstrated higher macrophage M2 infiltration in lung tissue, while subtype 3 had higher neutrophil and eosinophil infiltration than subtype 2. CONCLUSIONS Overall, this work identified 3 mucus-associated molecular subtypes related to MUC5B expression, which deepens the understanding of airway mucus secretion in COPD and potentially provides valuable information for precision therapy.
Mucus serves a vital role in mucociliary clearance (MCC) and host defense, which help to maintain the lung health [1]. However, a substantial proportion of patients with chronic obstructive pulmonary disease (COPD) experience chronic mucus hypersecretion. This hypersecretion is associated with severe airflow limitation [2], poor quality of life [2-4], a higher number of exacerbations and hospitalizations [2-6], and mortality [7-9].Normally, MCC of the lung consists of 3 elements: motile cilia, a periciliary liquid layer, and a mucus layer [10]. Previous studies [11-13] showed that MUC5AC and MUC5B are the major oligomeric respiratory mucins. However, the functions of MUC5AC and MUC5B are thought to be different, because of the differences in secretion sites, domain structure, and glycosylation [14].The traditional theory holds that the mucous phenotype of COPD is driven by MUC5AC, and the contribution of MUC5B is unnecessary. However, a series of studies in recent years have shown that MUC5B plays a dominant role in the mucous phenotype of COPD. First, a previous study found that MUC5B [15] is a major polymeric mucin from COPD sputum. Second, MUC5B is critical to respiratory innate immunity and MCC, while MUC5AC is secondary. Lowering MUC5B levels could result in airway obstruction by mucus, MCC damage, and increased infection risk [14,16]. Thus, understanding the utility and role of MUC5B in COPD is important [17].The heterogeneity of MUC5B expression can affect the heterogeneity of the mucus phenotype. A search of the literature revealed only a few studies that evaluated the heterogeneity of MUC5B expression and relationship between MUC5B with the lung immune microenvironment. Therefore, the aim of our study was to better understand the correlation of MUC5B expression with MCC and inflammation. In the present study, a latent profile analysis (LPA) method based on MUC5B-associated genes was carried out to identify the COPD subtypes, and the findings were validated in another COPD dataset. Then, we identified the characteristics of cilium assembly and beating signatures, transcriptome features, and immune infiltration in the COPD subtypes.
Material and Methods
Data Source
The gene expression profile GSE47460 was downloaded from Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) by GEOquery R package (version 2.56.0). The publicly available data of the whole-lung homogenate samples in GSE47460, which was from the Lung Genomics Research Consortium, were from individuals undergoing thoracic surgery. Individuals with cystic fibrosis or pulmonary hypertension were excluded. A total of 220 patients with COPD and 108 donor control subjects were selected from the GSE47460-GPL6480 (75 cases and 17 controls) and GSE47460-GPL14550 (145 cases and 91 controls) cohorts. Patient characteristics are detailed in Supplementary Table 1. The expression profile data of COPD patients were selected for subsequent clustering analysis.
Two-Step Feature Selection
We designed a 2-step feature selection pipeline to identify MUC5B-related gene signatures. In the first step, the maximal information coefficient (MIC) [18] was used to test the dependence between MUC5B with other genes and whether they have a linear or nonlinear relationship, calculated with the minerva R package (version 1.5.8). The MIC values between MUC5B and the expression of other genes in GSE47460-GPL6480 and GSE47460-GPL14550 cohorts were calculated, respectively. The robust rank aggregation (RRA) algorithm was used to evaluate the consistency of the rank of genes with MIC ≥0.3 in 2 cohorts by RobustRankAggreg R package (version 1.1). Genes with an RRA score <0.05 were considered to be the genes obtained by feature selection in the first step.In the second step, the Boruta algorithm with default parameters based on the random forest algorithm was used to identify the genes significantly associated with MUC5B from the one-step genes by Boruta R package (version 7.0.0). The “rejected” genes were excluded. Then, the overlapping genes in the 2 cohorts were confirmed as the final selected genes.
Latent Profile Analysis
To estimate the optimal number of subtypes, latent profile models were fit in tidyLPA (version 1.0.8) and mclust (version 5.4.6) R package based on the selected genes [19]. Models ranging from 2 to 5 subtypes and specified variable variances and covariances arguments were estimated to identify the optimal number of subtypes and parameter combinations. From these models, the best fit was evaluated by the analytic hierarchy process method [18] using Akaike information criterion, approximate weight of evidence, bayesian information criterion, classification likelihood criterion, and Kullback information criterion.
Classifier Construction
Following the previous research of Chen et al [20], we used a similar method to construct a subtype classifier. For each subtype, we computed the average expression value for each of the selected genes based on the gene expression data. We then computed the cosine similarity between each external expression profile and each subtype averaged profile. We assigned each external COPD case to a subtype, according to which subtype profile showed the highest correlation with the given external dataset profile.
Weighted Gene Co-Expression Network Analysis
Weighted gene co-expression network analysis (WGCNA) was used to correlate or associate highly co-expressed genes (modules) with 3 subtypes by WGCNA R package (version 1.69). Modules were selected as further research objects, which displayed high correlation according to module-trait relationships. The module with the highest positive or negative correlation for each COPD subtype was considered to be the key module.
Pathway Analysis
The clusterProfiler (version 3.16.0) and ReactomePA (version 1.32.0) were used to evaluate the biological and functional relevance within COPD subtypes gene-enriched modules. The overrepresentation of gene ontology (GO) categories, Kyoto Encyclopedia of Genes and Genomes (KEGG), and Reactome pathways was examined (Bonferroni-corrected P<0.05 was considered significant).
Network Analysis
We constructed the protein-protein interaction network (PPI) using STRING (https://string-db.org/). The Cytoscape (version 3.7.0) plugin cytoHubba (version 0.1) was used to calculated the degree, bottleneck, closeness, betweenness, MCC, EcCentricity, and radiality scores of each protein node in the PPI network. The 7 node lists sorted in reverse order were combined into a single ranking prioritized node list using the RRA method by the RobustRankAggreg (version 1.1). Genes with an RRA score < 0.05 were identified as hub nodes in the PPI network.
Immune Microenvironments in COPD-Affected Lung Tissue Analysis
The relative abundance across lung tissue of immune cells was scored using xCell R package (version 1.1.0) [21]. The xCell algorithm is a gene signature-based method, which is used to examine whole-lung tissue gene expression data to infer 10 immune cell types, including type 1 T helper (Th1) cells, type 2 T helper (Th2) cells, regulatory T cells (Tregs), basophils, eosinophils, neutrophils, mast cells, macrophages M1, and macrophages M2.To verify the results of xCell, the single-sample gene set enrichment analysis (ssGSEA) algorithm was used to re-evaluate the immune score by GSVA R package (version 1.36.0), including Th1 cells, Th2 cells, activated Th1 cells, activated Th2 cells, Th17 cells, Tregs, basophils, eosinophils, neutrophils, mast cells, macrophages M1, and macrophages M2. In addition, the ratios of Th1 score to Th2 score, activated Th1 to activated Th2, Th17 to Tregs, and macrophages M1 to macrophages M2 were calculated using the ssGSEA score. The gene set and the detailed calculation method used in the calculation are shown in Supplementary Table 2.Further, the network perturbation amplitude (NPA) methodology was used to obtain a quantitative assessment of how the cause-and-effect network models of neutrophil signaling interprets the transcriptomic difference between the various subtypes [22,23]. The NPA R package (https://github.com/philipmorrisintl/NPA) was used to perform NPA analysis, and the network models were provided with the NPAModels R package (https://github.com/philipmorrisintl/NPAModels) [24]. O and K statistics were used to test the specificity of the network models [25]. A network was confirmed to be significantly affected if the P value of the confidence interval, *O, and K* statistics were below 0.05.
Other Statistical Analysis
All analyses and data plotting were performed using R software (version 4.0.2) and Rstudio (version 1.3.1093) for Windows, and t test and 1-way analysis of variance (ANOVA) were used to test for differences among 2 and 3 groups using ggpubr R package (version 0.4.0). The Pearson product moment correlation coefficient was used to determine the correlation by Hmisc R package (version 4.4-1). A P-value of <0.05 indicated statistical significance.
Results
Three Molecular Subtypes Related to Mucus Secretion
Twelve gene signatures were identified using the 2-step feature selection pipeline (Figure 1A) and were selected as indicator variables for constructing latent profiles, including BPIFB1, SERPINB3, SERPINB4, FAM83F, GSTA2, GSTA5, PLEKHS1 (C10orf81), ABCA13, TSPAN19, KRT15, ATP12A, and ANKFN1. The model with equal variances and equal covariances is considered to be the best fit in the GSE47460-GPL14550 dataset (Supplementary Table 3). Three distinct subtypes were identified, namely subtype 1, subtype 2, and subtype 3. The principal components analysis showed a relatively stable partitioning of the samples in the 3 subtypes (Figure 1B). By differential expression analysis of 3 subtypes, 12 genes of subtype 1 were found to be significantly upregulated compared with subtypes 2 and 3 (Figure 1C). Similarly, the expression level of MUC5B showed the same trend. Further analysis of MUC5B expression showed that there was no significant difference between subtype 2 and normal samples, but subtypes 1 and 3 were significantly upregulated. The main source of MUC5B production was considered to be the cells of the submucosal glands and superficial epithelium of small airways [26]. Human SCGB3A2 is the biomarker of serous-like cells of the submucosal gland acinus [27], and subtypes 1 and 3 had a higher level of SCGB3A2 expression than subtype 2 (Figure 2A). Thus, subtypes 1 and 3 were mucus phenotypes.
Figure 1
The disease subtype of chronic obstructive pulmonary disease related to mucus hypersecretion. (A) The maximal information coefficient (MIC) values in 2 datasets. (B) Principal components analysis plot for 3 subtypes in the GSE47460-GPL14550 dataset. (C) Ridgeline plot showing differences of expression values of 12 gene signatures in 3 subtypes. (D) Violin and box plot gave a significant difference in MUC5B expression.
Figure 2
Verification of chronic obstructive pulmonary disease subtypes. (A) Difference in SCGB1A1 and SCGB3A2 between 3 subtypes in the GSE47460-GPL14550 dataset. (B) Confusion matrix showing the performance of classifier. (C) Three subtypes displaying high robustness verified on the GSE47460-GPL6480 dataset. ns, no significance.
We constructed the classifier and tested its performance using a confusion matrix (Figure 2B). Using this classifier, 3 subtypes were verified on another dataset (GSE47460-GPL6480), and the results showed high consistency (Figure 2C).We further analyzed the relationship between the 3 subtypes and clinical characteristics, which revealed that the differences in the clinical characteristics were not significant in the 2 datasets, including the age, sex, smoking status, emphysema, diffusing capacity for carbon monoxide, forced expiratory volume in the first second of expiration, and forced vital capacity (Table 1, Supplementary Table 4).
Table 1
Differences in clinical characteristics among subtypes in the GSE47460-GPL14550 dataset.
Variable
Subtype*
P-value**
1 (N=28)
2 (N=57)
3 (N=60)
Age, y
66 (61, 71)
68 (60, 73)
67 (61, 72)
0.7
Sex, n (%)
0.7
Female
12 (43)
24 (42)
30 (50)
Male
16 (57)
33 (58)
30 (50)
Smoking status, n (%)
0.7
Current
2 (7.1)
5 (8.8)
4 (6.7)
Ever
24 (86)
47 (82)
54 (90)
Never
2 (7.1)
5 (8.8)
2 (3.3)
Emphysema, %
7 (3, 26)
12 (2, 28)
5 (1, 16)
0.4
DLCO, %predicted
46 (35, 68)
48 (38, 67)
60 (48, 77)
0.068
FEV1 (pre-BD), %predicted
54 (26, 73)
58 (28, 66)
52 (44, 66)
0.9
FEV1 (post-BD), %predicted
58 (33, 70)
58 (32, 72)
58 (51, 70)
0.9
FVC (pre-BD), %predicted
73 (62, 90)
75 (64, 86)
78 (66, 88)
0.9
FVC (post-BD), %predicted
81 (74, 93)
80 (69, 91)
84 (71, 94)
0.9
BD – bronchodilator; DLCO – diffusing capacity for carbon monoxide; FEV1 – forced expiratory volume in the first second of expiration; FVC – forced vital capacity; IQR – interquartile range.
Statistics presented: median (IQR); n (%);
statistical tests performed: Wilcoxon rank-sum test, Fisher’s exact test, and chi-square test of independence.
Two Key Modules Related to 3 Molecular Subtypes Identified by WGCNA
WGCNA identified 34 modules (Figure 3A) in the GSE47460-GPL14550 dataset; of these, the eigengene was significantly associated with the 3 subtypes in 17 (Figure 3A). We found the blue module (MEblue) displaying highest correlation was significantly associated with COPD molecular subtypes (Figure 3A).
Figure 3
The Module-trait relationships and biological functions of key modules. (A) Module-trait relationships. (B) Biological functions of MEblue with gene ontology analysis.
To investigate the biological functions of MEblue, the pathway analysis of GO-biological process, KEGG, and Reactome were performed in the target genes corresponding to each module. The key biological functions of MEblue were associated with cilium assembly and beating (Figure 3B, Supplementary Table 5).We constructed the PPI network on the basis of the MUC5B-related genes in the MEblue and calculated the topological features of each node. Afterward, a subnetwork was extracted based on genes linked to MUC5B, in which node rankings in each topological feature were integrated using the RRA method (Figure 4A, 4B). We found that FOXJ1 was the most important hub gene (Figure 4C) and had a significant difference in expression between the 3 subtypes (Figure 4D). In addition, FOXJ1 is a marker of ciliated-to-goblet transdifferentiation [28,29] and cilia development [30], and RFX3 is the transcriptional co-activator to FOXJ1 [31].
Figure 4
MUC5B-associated subnetwork analysis in the MEblue. (A) Protein-protein interaction subnetwork of the MUC5B-associated genes in the MEblue. (B) Heatmap displaying the result of network topological analysis. (C) Network node importance sorted by robust rank aggregation (RRA) method. (D) Difference in FOXJ1 expression between the 3 subtypes.
Correlation analysis showed that MUC5B was positively correlated with FOXJ1 (Pearson’s r=0.63, P<0.001) and RFX3 (Pearson’s r=0.57, P<0.001), respectively. The shape of the loess fitting curves indicated that there is a threshold for FOXJ1/RFX3 expression (Figure 5A, 5B). The slope of the curve after the threshold is greater than the threshold of the curve before the threshold. Additionally, FOXJ1 was positively associated with RFX3 (Pearson’s r=0.72, P<0.001; Figure 5C).
Figure 5
Expression association between MUC5B and FOXJ1/RFX3. (A, B) Curves fitted using loess showing the MUC5B expression association with (A) FOXJ1 and (B) RFX3. (C) Curves fitted using loess showing the expression relationship between FOXJ1 and RFX3. (D, E) Curves fitted using linear regression displaying the MUC5B expression association with (D) FOXJ1 and (E) RFX3 in 3 subtypes. (F) Curves fitted using linear regression displaying the expression association between FOXJ1 and RFX3 in 3 subtypes.
Further subgroup analysis of the 3 subtypes showed that a significant positive relationship was observed between MUC5B and FOXJ1/RFX3 (Figure 5D, 5E) only in subtype 1 (FOXJ1, Pearson’s r=0.67, P<0.001; RFX3, Pearson’s r=0.68, P<0.001), and only subtype 1 and subtype 3 showed a significant positive correlation between FOXJ1 and RFX3 (subtype 1, Pearson’s r=0.86, P<0.001; subtype 3, Pearson’s r=0.54, P<0.001; Figure 5F).
Three Subtypes Differ in Immune Microenvironment Characteristics
We utilized the xCell algorithm to estimate the relative abundance of 10 cells from the expression profile of COPD-affected lung tissue. As shown in Figure 6A, there was a significant difference in the degree of immune cell infiltration between the 3 subtypes, which differed in immune microenvironment characteristics.
Figure 6
Immune microenvironments in chronic obstructive pulmonary disease-affected lung tissue. (A) Relative abundance of immune cell estimated by xCell algorithm. (B) Immune score estimated by ssGSEA algorithm. P-value: ns, no significance; * P<0.05; ** P<0.01; *** P<0.001; **** P<0.0001.
As shown in Figure 6A, compared with subtypes 2 and 3, subtype 1 had a lower fraction of Tregs (subtype 1 vs subtype 2, P=0.013; subtype 1 vs subtype 3, P=0.028) and a higher level of Th1 cells (subtype 1 vs subtype 2, P=0.013; subtype 1 vs subtype 3, P=0.043), but Th2 cell infiltration among the 3 subtypes showed no significant difference. The fractions of macrophages M2 and mast cells were significantly lower in subtype 1. Subtype 3 had a significantly higher level of neutrophils than subtypes 1 and 2 (subtype 1 vs subtype 3, P=0.014; subtype 2 vs subtype 3, P=0.046), which was consistent with the ssGSEA results. What is particularly noteworthy is that basophil infiltration was significantly increased in subtype 1 (subtype 1 vs subtype 2, P=2.7×10−5; subtype 1 vs subtype 3, P=1.6×10−3).To verify and complement the results of xCell evaluation, we used the ssGSEA algorithm to reanalyze the immune microenvironment of lung tissue in COPD. As shown in Figure 6B, the scores for Th1 and Th2 cells in subtype 1 were higher compared with subtypes 2 and 3, while no consistent result was found in the scores for activated Th1 cells and Th2 cells. However, the ratio of the scores for Th1 cells to Th2 cells was not significantly different between the 3 subtypes, and consistent results were found for the activated Th1/Th2 cells score. In contrast with the results calculated by xCell, the level of eosinophil infiltration in subtype 1 was significantly lower than that in subtypes 2 and 3 in the ssGSEA calculation. In addition, the ratio of Th17 cells to Tregs in subtype 1 was higher than the other 2 subtypes.The NPA analysis showed that only the NPA values of subtypes 2 and 3 were significantly different in neutrophil signaling (Figure 7A). The above results were supported by the results of the xCell and ssGSEA. The lung tissue in subtype 3 possessed a higher degree of neutrophil signaling activation, and the activation of top-10 leading nodes was upregulated, including ALOX5, LTB4R1, IRAK1, leukotriene B4, ICAM1, Myd88, IRAK4, ALOX12, neuroprotectin D1, and lipoxin A4 (Figure 7B). Although there was no statistically significant difference in other 2 comparison groups of NPA values (subtype 1 vs subtype 2, subtype 1 vs subtype 3), the above 10 leading nodes were downregulated in subtype 2 and upregulated in subtype 3, compared with subtype 1. Subtypes 1 and 3 had higher ssGSEA scores for the antimicrobial humoral immune response mediated by antimicrobial peptide than subtype 2 (Figure 7C, 7D).
Figure 7
Difference in neutrophil signaling among 3 subtypes. (A) Bar chart of network perturbation amplitude (NPA) score among 3 subtypes. The gray bars represent no statistical significance. (B) Heatmap showing the NPA coefficients with leading nodes. (C) Difference in ssGSEA score of the antimicrobial humoral immune response mediated by antimicrobial peptide between the 3 subtypes (GO: 0061844). (D) Heatmap displaying the gene expression for the antimicrobial humoral immune response mediated by antimicrobial peptide between the 3 subtypes (GO: 0061844).
In summary, lung tissue in subtype 1 displayed higher Th1 and basophil infiltration, higher Th17/Tregs, and lower mast cell and Treg infiltration. Subtypes 2 and 3 demonstrated higher macrophage M2 infiltration in lung tissue, while subtype 3 had higher neutrophil and eosinophil infiltration than subtype 2.
Discussion
Three subtypes were marked with LPA based on 12 gene signatures, which displayed the heterogeneity of MUC5B-related mucus secretion in COPD. The expression level of MUC5B varied from subtype to subtype. However, it was unexpected that the 3 COPD subtypes could not reflect the difference in airflow limitation. A previous study found that the MUC5B/MUC5AC ratio was the factor influencing the lung function [32], so we think the airflow limitation cannot be entirely attributable to the MUC5B expression.Additionally, a decreased MUC5B level could hamper MCC and airway defense injury [16]. Consistent with that idea, our WGCNA analysis showed that subtype 1 with the highest MUC5B expression had a significantly positive association with MCC. The ssGSEA analysis suggested that subtypes 1 and 3 had a stronger antimicrobial humoral immune response than subtype 2. BPIFB1, which is regarded as the regulator of MUC5B [33], was the most upregulated gene. BPIFB1 is a protective secreted protein of goblet cells [34] to create a chemical barrier against harmful pathogens and irritants.Very little is noted in the literature regarding an association between MUC5B expression and the immune microenvironment. Previous observational studies [16] found that the infiltration level of neutrophils and eosinophils was higher in lung tissue with low expression of MUC5B, which was consistent with our results. Of interest, our findings suggest that basophil infiltration may be a new mechanism for regulating the expression of MUC5B. In a review of the literature, we found that basophil infiltration plays a crucial role in emphysema formation [35], and the lower eosinophil/basophil ratio of peripheral blood has been linked to a higher rate of exacerbations in COPD [36]. The SPIROMICS study [37] recently found that patients with low eosinophil counts had a greater degree of obstruction. However, the differences between local and systemic inflammation need to be taken in account.An important caveat to this type of analysis is the limitations of how well inferred values from transcriptomic databases can reflect immune infiltration. Secondly, our subtypes need to be verified in a dataset based on a larger sample size. In addition, the infiltration of immune cells in the lung tissue of COPD is not uniform, and different sampling locations may affect the inferred results of immune infiltration. Despite such limitation, this transcriptome-based approach provides insights on the interaction between the immune microenvironment and mucus secretion in COPD. Future experiments are needed to verify the results and conclusion.
Conclusions
Overall, this work identified 3 MUC5B-associated molecular subtypes, which deepens the understanding of MUC5B-related airway mucus secretion in COPD and could provide valuable information for precision therapy.Clinical characteristics in the GSE47460-GPL14550 and GSE47460-GPL6480 dataset.Statistics presented: median (IQR); n (%);Statistical tests performed: Wilcoxon rank-sum test; Fisher’s exact test; chi-square test of independence.The source and detailed method of gene signature used to calculate ssgsea score.Evaluation of fitted multiple latent profile analysis models.Differences in clinical characteristics among subtypes in the GSE47460-GPL6480 dataset.Statistics presented: median (IQR); n (%);statistical tests performed: Wilcoxon rank-sum test; Fisher’s exact test; chi-square test of independence;emphysema(%) was defined as percent of lung attenuation voxels below −950 Hounsfield units (HU);BD refers to bronchodilator.
Supplementary Table 1
Clinical characteristics in the GSE47460-GPL14550 and GSE47460-GPL6480 dataset.
Variable
GSE47460-GPL14550
GSE47460-GPL6480
Control (N=91)*
COPD (N=145)*
p-value**
Control (N=17)*
COPD (N=75)*
p-value**
Age (years)
65 (58, 72)
67 (60, 73)
0.2
65 (56, 71)
61 (56, 70)
>0.9
Gender
0.15
0.7
Female
51 (56%)
66 (46%)
8 (47%)
29 (39%)
Male
40 (44%)
79 (54%)
9 (53%)
46 (61%)
Smoke Status
<0.001
<0.001
Current
1 (1.4%)
11 (7.6%)
0 (0%)
3 (4.2%)
Ever
49 (69%)
125 (86%)
9 (53%)
66 (92%)
Never
21 (30%)
9 (6.2%)
8 (47%)
3 (4.2%)
Emphysema (%)
0 (0, 1)
7 (1, 24)
<0.001
0 (0, 1)
9 (2, 36)
<0.001
DLCO (%predicted)
80 (72, 92)
53 (39, 72)
<0.001
88 (84, 102)
57 (34, 73)
<0.001
FEV1 (pre-bd, %predicted)
92 (85, 102)
54 (33, 67)
<0.001
99 (91, 104)
42 (24, 68)
<0.001
FEV1 (post-bd, %predicted)
95 (90, 104)
58 (35, 71)
<0.001
102 (98, 110)
59 (34, 76)
<0.001
FVC (pre-bd, %predicted)
92 (86, 102)
76 (64, 88)
<0.001
95 (88, 102)
71 (54, 82)
<0.001
FVC (post-bd, %predicted)
94 (88, 104)
82 (71, 92)
<0.001
95 (94, 104)
83 (66, 94)
0.007
Statistics presented: median (IQR); n (%);
Statistical tests performed: Wilcoxon rank-sum test; Fisher’s exact test; chi-square test of independence.
Supplementary Table 2
The source and detailed method of gene signature used to calculate ssgsea score.
Authors: Maria Montes de Oca; Ronald J Halbert; Maria Victorina Lopez; Rogelio Perez-Padilla; Carlos Tálamo; Dolores Moreno; Adrianna Muiño; José Roberto B Jardim; Gonzalo Valdivia; Julio Pertuzé; Ana Maria B Menezes Journal: Eur Respir J Date: 2012-01-26 Impact factor: 16.671
Authors: Susan D Reynolds; Paul R Reynolds; Gloria S Pryhuber; Jonathan D Finder; Barry R Stripp Journal: Am J Respir Crit Care Med Date: 2002-08-01 Impact factor: 21.405
Authors: Linbo Liu; Suresh Shastry; Suzanne Byan-Parker; Grace Houser; Kengyeh K Chu; Susan E Birket; Courtney M Fernandez; Joseph A Gardecki; William E Grizzle; Eric J Wilsterman; Eric J Sorscher; Steven M Rowe; Guillermo J Tearney Journal: Am J Respir Cell Mol Biol Date: 2014-10 Impact factor: 6.914
Authors: Michelle G Roy; Alessandra Livraghi-Butrico; Ashley A Fletcher; Melissa M McElwee; Scott E Evans; Ryan M Boerner; Samantha N Alexander; Lindsey K Bellinghausen; Alfred S Song; Youlia M Petrova; Michael J Tuvim; Roberto Adachi; Irlanda Romo; Andrea S Bordt; M Gabriela Bowden; Joseph H Sisson; Prescott G Woodruff; David J Thornton; Karine Rousseau; Maria M De la Garza; Seyed J Moghaddam; Harry Karmouty-Quintana; Michael R Blackburn; Scott M Drouin; C William Davis; Kristy A Terrell; Barbara R Grubb; Wanda K O'Neal; Sonia C Flores; Adela Cota-Gomez; Catherine A Lozupone; Jody M Donnelly; Alan M Watson; Corinne E Hennessy; Rebecca C Keith; Ivana V Yang; Lea Barthel; Peter M Henson; William J Janssen; David A Schwartz; Richard C Boucher; Burton F Dickey; Christopher M Evans Journal: Nature Date: 2013-12-08 Impact factor: 49.962
Authors: Colin D Bingle; Kirsty Wilson; Hayley Lunn; Frances A Barnes; Alec S High; William A Wallace; Doris Rassl; Michael A Campos; Manuel Ribeiro; Lynne Bingle Journal: Histochem Cell Biol Date: 2010-03-18 Impact factor: 4.304
Authors: Lauren J Donoghue; Alessandra Livraghi-Butrico; Kathryn M McFadden; Joseph M Thomas; Gang Chen; Barbara R Grubb; Wanda K O'Neal; Richard C Boucher; Samir N P Kelada Journal: Genetics Date: 2017-08-29 Impact factor: 4.562
Authors: Wanda K O'Neal; Richard C Boucher; Alessandra Livraghi-Butrico; Barbara R Grubb; Kristen J Wilkinson; Allison S Volmer; Kimberly A Burns; Christopher M Evans Journal: Mucosal Immunol Date: 2016-07-20 Impact factor: 7.313
Authors: Kenichi Okuda; Gang Chen; Durai B Subramani; Monroe Wolf; Rodney C Gilmore; Takafumi Kato; Giorgia Radicioni; Mehmet Kesimer; Michael Chua; Hong Dang; Alessandra Livraghi-Butrico; Camille Ehre; Claire M Doerschuk; Scott H Randell; Hirotoshi Matsui; Takahide Nagase; Wanda K O'Neal; Richard C Boucher Journal: Am J Respir Crit Care Med Date: 2019-03-15 Impact factor: 30.528