Lucas A Gillenwater, Shahab Helmi, Evan Stene, Katherine A Pratte, Yonghua Zhuang, Ronald P Schuyler, Leslie Lange, Peter J Castaldi, Craig P Hersh, Farnoush Banaei-Kashani, Russell P Bowler, Katerina J Kechris.
Abstract
Chronic Obstructive Pulmonary Disease (COPD) is the third leading cause of mortality in the United States, yet its clinical phenotypes are heterogeneous. This is the first large-scale attempt to use transcriptomics, proteomics, and metabolomics (multi-omics) to determine whether molecularly defined clusters with distinct clinical phenotypes may underlie that clinical heterogeneity. The study included 3,278 subjects from the COPDGene cohort with at least one of the following profiles: whole blood transcriptomes (2,650 subjects), plasma proteomes (1,013 subjects), and plasma metabolomes (1,136 subjects); 489 subjects had all three contemporaneous -omics profiles. Autoencoder embeddings were computed individually for each -omics dataset. Embeddings underwent subspace clustering using MineClus, either individually by -omics or combined, followed by recursive feature selection based on Support Vector Machines. Clusters were tested for associations with clinical variables. Optimal single -omics clustering typically resulted in two clusters. Although there was overlap in individual -omics cluster membership, each -omics cluster tended to be defined by unique molecular pathways. For example, prominent molecular features of the metabolome-based clustering included sphingomyelin, while key molecular features of the transcriptome-based clusters were related to immune and bacterial responses. When we integrated the -omics data at a later stage, we identified subtypes that varied by age, severity of disease, diffusing capacity of the lungs for carbon monoxide, and percentage of subjects with atrial fibrillation. In contrast, when we integrated the -omics data at an earlier stage by treating all data sets equally, there were no clinical differences between subtypes.
Similar to clinical clustering, which has revealed multiple heterogeneous clinical phenotypes, we show that transcriptomics, proteomics, and metabolomics tend to define clusters of COPD patients with different clinical characteristics. Thus, integrating these different -omics data sets affords additional insight into the molecular nature of COPD and its heterogeneity.
Year: 2021 PMID: 34432807 PMCID: PMC8386883 DOI: 10.1371/journal.pone.0255337
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Clinical characteristics and demographics for profiled subjects by each of the three -omics technologies.
| Variable | Transcriptomic | Proteomic | Metabolomic | p-value |
|---|---|---|---|---|
| No. of Participants | 2637 | 1013 | 1057 | |
| Age, mean (sd) | 65.5 (8.6) | 67.8 (8.6) | 67.6 (8.6) | < 0.0001 |
| % Female | 48.3 | 49.3 | 49.5 | 0.7668 |
| % AA | 25.2 | 7.9 | 8.6 | < 0.0001 |
| %NHW | 74.8 | 92.1 | 91.4 | |
| BMI | 29 (6.3) | 29.1 (6.4) | 28.9 (6.2) | 0.8600 |
| % Former Smoker | 64.4 | 75.3 | 74.7 | < 0.0001 |
| % Current Smoker | 35.6 | 24.7 | 25.3 | |
| Smoking Pack-Years, mean (sd) | 44.1 (23.9) | 45.1 (24.8) | 45.1 (24.7) | 0.3685 |
| % Controls | 56.1 | 52 | 52 | 0.0195 |
| % COPD | 43.9 | 48 | 48 | |
| % PRISm | 12.7 | 9.3 | 9.3 | 0.3346 |
| % GOLD 0 | 43.4 | 42.6 | 42.7 | |
| % GOLD 1 | 9.9 | 10.4 | 10.7 | |
| % GOLD 2 | 20.1 | 19.8 | 20 | |
| % GOLD 3 | 9.9 | 11.6 | 11.3 | |
| % GOLD 4 | 4.1 | 6.2 | 6 | |
| FEV1pp | 78.5 (24.3) | 77.2 (26.6) | 77.6 (26.4) | 0.3563 |
| FEV1/FVC | 0.6762 (0.1471) | 0.6572 (0.1549) | 0.6581 (0.1544) | 0.0002 |
| % Emphysema | 5.5 (9.3) | 7.1 (10.1) | 7 (10.1) | < 0.0001 |
| Exacerbation Frequency | 0.3 (0.8) | 0.3 (0.8) | 0.3 (0.7) | 0.9326 |
| % Chronic Bronchitis | 14.7 | 16.5 | 16.2 | 0.443 |
1. sd: standard deviation.
2. NHW: Non-Hispanic White; AA: African American.
3. BMI: body mass index (kg/m2).
4. COPD is defined by GOLD score > 0.
5. PRISm: Preserved Ratio Impaired Spirometry [11].
6. FEV1/FVC: post-bronchodilator forced expiratory volume at one second (FEV1) divided by forced vital capacity (FVC).
7. FEV1pp: FEV1 percent predicted.
8. Quantitative emphysema was measured as the percent of lung voxels at the -950 Hounsfield Unit threshold (% low attenuation areas: %LAA) on the full inspiratory CT scans. Visual emphysema was assessed as described by [12].
9. Exacerbations were defined as acute worsening of respiratory symptoms requiring treatment with oral corticosteroids and/or antibiotics, emergency room visit, or hospital admission [13].
10. Chronic bronchitis was defined as self-reported chronic cough and sputum for at least three months in each of the two years prior to baseline.
11. P-values are reported for tests of differences in each variable across the three groups of subjects. P-values are based on chi-square tests for categorical or binary variables (sex, smoking status, COPD status, COPD severity by GOLD status, and chronic bronchitis status) or ANOVA tests for continuous variables (age, BMI, smoking pack-years, FEV1pp, FEV1/FVC, percent emphysema, and exacerbation frequency).
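As a rough illustration of the chi-square comparison described in the last footnote, the sketch below tests % Female across the three -omics groups. It is a minimal standard-library example, not the authors' analysis code: the female counts are reconstructed by rounding the table's percentages (an assumption), and the function name `chi_square_2xk` is hypothetical.

```python
import math

def chi_square_2xk(successes, totals):
    """Pearson chi-square statistic for a 2 x k table of yes/no counts."""
    failures = [n - s for s, n in zip(successes, totals)]
    grand_total = sum(totals)
    stat = 0.0
    for row in (successes, failures):
        row_total = sum(row)
        for observed, n in zip(row, totals):
            # Expected count under independence of row and group.
            expected = row_total * n / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

# Group sizes and % Female are taken from the table above; the counts are
# reconstructed by rounding and are therefore approximate (an assumption).
totals = [2637, 1013, 1057]
pct_female = [48.3, 49.3, 49.5]
females = [round(n * p / 100) for n, p in zip(totals, pct_female)]

stat = chi_square_2xk(females, totals)
# A 2 x 3 table has 2 degrees of freedom, for which the chi-square
# survival function has the closed form exp(-x / 2).
p_value = math.exp(-stat / 2)
print(f"chi2 = {stat:.3f}, p = {p_value:.4f}")
```

With these reconstructed counts the p-value should land close to the 0.7668 reported for % Female. The closed-form survival function only holds for 2 degrees of freedom, so variables with more categories would need a general chi-square distribution routine.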
Final subtyping results based on AE and MineClus for each -omics data.
For each -omics type, the total number of samples and features, the two subtypes (size; silhouette), and the number of outliers are listed, along with the overall silhouette and connectedness.
| Dataset | Samples | Features | w | Outliers | Subtype 1 | Subtype 2 | Silhouette | Connectedness |
|---|---|---|---|---|---|---|---|---|
| Transcriptomics | 2637 | 1889 | 14.2 | 23 | 2342; 0.31 | 272; 0.35 | 0.31 | 0.96 |
| Proteomics | 1013 | 142 | 5.58 | 57 | 848; 0.17 | 108; 0.13 | 0.16 | 0.92 |
| Metabolomics | 1057 | 187 | 6.5 | 28 | 893; 0.20 | 136; 0.15 | 0.19 | 0.92 |
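The per-subtype and overall silhouette values in the table can be read with the standard silhouette definition in mind, s = (b - a) / max(a, b), where a is the mean distance to the sample's own cluster and b is the mean distance to the nearest other cluster. The sketch below is a minimal illustration on made-up 2-D points. It assumes Euclidean distance and at least two clusters; the source does not state the distance metric or how outliers were handled, so both are assumptions here.

```python
import math

def silhouette_values(points, labels):
    """Per-sample silhouette s = (b - a) / max(a, b), Euclidean distance.

    Assumes at least two distinct cluster labels.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    values = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: silhouette is conventionally 0
            values.append(0.0)
            continue
        # a: mean distance to the other members of the sample's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom else 0.0)
    return values

# Toy data (made up for illustration only): two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = [1, 1, 1, 2, 2, 2]

sil = silhouette_values(points, labels)
overall = sum(sil) / len(sil)
per_subtype = {
    lab: sum(s for s, l in zip(sil, labels) if l == lab) / labels.count(lab)
    for lab in set(labels)
}
print(per_subtype, overall)
```

Values near 1 indicate tight, well-separated clusters, so the silhouettes of roughly 0.13 to 0.35 reported above suggest subtypes that overlap considerably in the embedding space.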
Summary of top 10 single-omics clinical associations.
All clinical variables listed were significant at a false discovery rate of 10% over the variables tested and are ordered by significance. Only the top 10 associations are displayed. For more details see Data Dictionary in and complete results in .
| Transcriptomics | Proteomics | Metabolomics |
|---|---|---|
| 1-min post-walk SaO2 (%) | Kidney Disease | Age at current visit |
| Age at current visit | Distance walked (ft) | High Blood Pressure |
| Airway Wall Thickness, segmental (main 6) | High Blood Pressure | Distance walked (ft) |
| Red Blood Cell Count | Age at current visit | Coronary Artery Disease |
| Clinical Center | Duration of smoking (yrs) | Kidney Disease |
| Heart Rate 1-minute post-walk (beats/min) | Pack years, from Resp Questionnaire | SF-36 Physical Health Aggregate (PCS) Score (normalized) |
| Hematocrit (%) | Coronary Artery Disease | SF-36 Physical Function (PF) score |
| Hemoglobin (g/dL) | SF-36 Physical Health Aggregate (PCS) Score (normalized) | SF-36 Physical Function (PF) t-score (normalized) |
| Resting SaO2 (%) | SF-36 Physical Function (PF) score | Diabetes |
| In last 12 months, had wheezing or whistling in chest | SF-36 Physical Function (PF) t-score (normalized) | SF-36 Role Physical (RP) t-score (normalized) |
Features selected by SVMRFE for each omic type after 5-fold cross-validation.
Cumulative score is the classification metric (f1-score) of an SVM used to predict the cluster labels using only the features at and above that feature (e.g., score for Fibroblast growth factor 20 is the result of an SVM trained on feature set [Interleukin-23, Fibroblast growth factor 20]). These scores are not used to select the size of the feature sets because the size is selected before features are ranked (see Methods).
| Dataset | Name | Cumulative f1-score |
|---|---|---|
| Transcriptomics | SLCO4C1 | 91.81% |
| | TNFRSF10B | 92.23% |
| | SNX4 | 91.81% |
| | RLF | 91.85% |
| | SELENOW | 91.62% |
| | TPD52L2 | 92.77% |
| | PPP1R10 | 91.47% |
| | CD80 | 92.16% |
| | SNRPB2 | 92.27% |
| | RSL24D1 | 92.04% |
| | RPL26L1 | 95.10% |
| | RPS27L | 91.70% |
| | FOLR2 | 91.81% |
| Proteomics | Interleukin-23 | 89.75% |
| | Fibroblast growth factor 20 | 92.47% |
| | Stromelysin-1 | 92.36% |
| | Macrophage-capping protein | 92.05% |
| | C5a anaphylatoxin | 92.68% |
| | Coagulation Factor X | 92.47% |
| | Gelsolin | 92.78% |
| | Trefoil factor 3 | 92.57% |
| | Limbic system-associated membrane protein | 93.51% |
| | Mannose-binding protein C | 93.51% |
| | Adhesion G protein-coupled receptor E2 | 92.57% |
| | Neural cell adhesion molecule 1, 120 kDa isoform | 92.78% |
| | Apolipoprotein A-I | 92.47% |
| | Follicle stimulating hormone | 93.30% |
| | Glucose-6-phosphate isomerase | 92.68% |
| | A disintegrin and metalloproteinase with thrombospondin motifs 5 | 93.62% |
| | Interleukin-1 receptor-like 1 | 93.10% |
| | Nidogen-1 | 93.31% |
| | 72 kDa type IV collagenase | 93.10% |
| | Transforming growth factor-beta-induced protein ig-h3 | 93.10% |
| | C-X-C motif chemokine 10 | 92.99% |
| | Hemojuvelin | 92.99% |
| | Complement factor B | 92.99% |
| | Bone morphogenetic protein 1 | 93.20% |
| | UNANNOTATED (SOMAmer: 9191-8_3) | 91.74% |
| | C-reactive protein | 92.89% |
| | Insulin-like growth factor-binding protein 6 | 92.78% |
| | Apolipoprotein B | 92.57% |
| | C-X-C motif chemokine 16 | 92.15% |
| | UNANNOTATED (SOMAmer: 5451-1_3) | 92.15% |
| | Tumor necrosis factor receptor superfamily member 10D | 92.47% |
| | UNANNOTATED (SOMAmer: 5349-69_3) | 91.74% |
| | UNANNOTATED (SOMAmer: 8459-10_3) | 91.74% |
| | UNANNOTATED (SOMAmer: 8464-31_3) | 91.74% |
| | SPARC-related modular calcium-binding protein 1 | 91.32% |
| | Mast/stem cell growth factor receptor Kit | 92.05% |
| | Ephrin-B1 | 91.32% |
| | NT-3 growth factor receptor | 92.36% |
| | UNANNOTATED (SOMAmer: 5115-31_3) | 92.36% |
| | 60 kDa heat shock protein, mitochondrial | 92.68% |
| | UNANNOTATED (SOMAmer: 5509-7_3) | 91.94% |
| Metabolomics | dehydroisoandrosterone sulfate (DHEA-S) | 91.93% |
| | 3-(3-amino-3-carboxypropyl)uridine* | 91.64% |
| | X - 12117 | 91.84% |
| | stearoyl sphingomyelin (d18:1/18:0) | 91.84% |
| | hydroxy-CMPF* | 91.45% |
| | N6-carbamoylthreonyladenosine | 92.52% |
| | N-formylmethionine | 92.23% |
| | sphingomyelin (d18:1/20:1, d18:2/20:0)* | 91.55% |
| | sphingomyelin (d18:1/17:0, d17:1/18:0, d19:1/16:0) | 91.84% |
| | 1-palmitoyl-2-linoleoyl-GPC (16:0/18:2) | 92.81% |
| | 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF) | 92.32% |
| | pyroglutamine* | 91.84% |
| Integrated (Transcriptomics + Proteomics + Metabolomics) | UQCRB | 74.28% |
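The cumulative score in this table is computed over growing prefixes of the ranked feature list: the k-th score uses only the top k features. The sketch below illustrates that bookkeeping on made-up data. To stay dependency-free, a nearest-centroid classifier stands in for the SVM and the score is computed on the training data rather than with 5-fold cross-validation, so it is a simplified illustration, not the authors' procedure. All function names and the toy data are hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def nearest_centroid_predict(train_x, train_y, test_x):
    """Stand-in classifier (NOT an SVM): assign the closest class centroid."""
    centroids = {}
    for lab in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == lab]
        centroids[lab] = [sum(col) / len(rows) for col in zip(*rows)]

    def closest(x):
        return min(
            centroids,
            key=lambda lab: sum((a - c) ** 2 for a, c in zip(x, centroids[lab])),
        )

    return [closest(x) for x in test_x]

def cumulative_scores(x, y, ranked_features):
    """Score the classifier on the top-1, top-2, ... ranked features."""
    scores = []
    for k in range(1, len(ranked_features) + 1):
        cols = ranked_features[:k]
        subset = [[row[c] for c in cols] for row in x]
        predictions = nearest_centroid_predict(subset, y, subset)
        scores.append(f1_score(y, predictions))
    return scores

# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
x = [
    [0.1, 5.0, 1.0], [0.2, 1.0, 1.2], [0.0, 3.0, 0.9],
    [1.0, 4.0, 2.0], [0.9, 2.0, 2.1], [1.1, 0.5, 1.9],
]
y = [0, 0, 0, 1, 1, 1]
ranked = [0, 2, 1]  # hypothetical ranking, best feature first

scores = cumulative_scores(x, y, ranked)
print(scores)
```

Adding the noisy third feature lowers the score in this toy run, which mirrors why the cumulative f1-scores in the table are not monotonic in the number of features.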
Top enrichment pathways from multiple annotation sources.
For annotations with a defined hierarchy (gene ontology: GO), p-values were adjusted per level of the hierarchy (FDR < 0.10). Levels begin at 1, the lowest level of the hierarchy, and increase to the top level; the number of levels varies by annotation database.
| Dataset | Annotation | Level | Name | FDR (Level) |
|---|---|---|---|---|
| Transcriptomics | GO Biological Process | 4 | regulation of immune response (GO:0050776) | 3.97 × 10⁻² |
| | | 3 | membrane invagination (GO:0010324) | 4.54 × 10⁻² |
| | | 3 | response to bacterium (GO:0009617) | 6.29 × 10⁻² |
| | | 3 | positive regulation of immune response (GO:0050778) | 8.10 × 10⁻² |
| | | 3 | positive regulation of cysteine-type endopeptidase activity (GO:2001056) | 8.10 × 10⁻² |
| | | 5 | positive regulation of catalytic activity (GO:0043085) | 8.40 × 10⁻² |
| | GO Cellular Component | 5 | immunoglobulin complex (GO:0019814) | 5.01 × 10⁻⁶ |
| | | 4 | immunoglobulin complex, circulating (GO:0042571) | 1.13 × 10⁻² |
| | | 2 | ribosome (GO:0005840) | 2.13 × 10⁻² |
| | | 2 | ribosomal subunit (GO:0044391) | 2.13 × 10⁻² |
| | | 5 | extracellular space (GO:0005615) | 5.24 × 10⁻² |
| | | 1 | cytosolic ribosome (GO:0022626) | 8.20 × 10⁻² |
| Proteomics | No Significant Pathways | | | |
| Metabolomics | Sub Class* | N/A | Sphingomyelins | 3.19 × 10⁻² |
* Sub classes for metabolomic features were annotated by Metabolon, Inc.
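The per-level adjustment described in the table caption can be sketched as follows. The source states only that p-values were FDR-adjusted within each level of the hierarchy; the Benjamini-Hochberg procedure used here is an assumption, and the term names and raw p-values are made up for illustration.

```python
from collections import defaultdict

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def adjust_per_level(terms):
    """terms: list of (name, level, raw_p). Adjust separately per level."""
    by_level = defaultdict(list)
    for name, level, p in terms:
        by_level[level].append((name, p))
    results = {}
    for level, entries in by_level.items():
        adjusted = benjamini_hochberg([p for _, p in entries])
        for (name, _), q in zip(entries, adjusted):
            results[name] = (level, q)
    return results

# Hypothetical terms and raw p-values, for illustration only.
terms = [
    ("term_a", 3, 0.001),
    ("term_b", 3, 0.02),
    ("term_c", 3, 0.5),
    ("term_d", 4, 0.03),
]
results = adjust_per_level(terms)
significant = sorted(name for name, (_, q) in results.items() if q < 0.10)
print(results)
print(significant)
```

Adjusting within a level means a term competes only with other terms at the same depth, so a level with few tested terms incurs a smaller multiple-testing penalty than a crowded one.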
Post-clustering clinical associations of subjects that are in the small subtype for one of the -omics data sets, but not the others.
| Clinical variable | FDR* | Small Transcriptomic, Large Proteomic & Metabolomic | Small Proteomic, Large Transcriptomic & Metabolomic | Small Metabolomic, Large Transcriptomic & Proteomic |
|---|---|---|---|---|
| Distance walked | 4.91E-02 | 1397.3 (308.4) | 1020.0 (520.2) | 1184.4 (584.6) |
| DLco percent predicted | 5.44E-02 | 95.2 (25.3) | 75.5 (23.8) | 69.5 (27.1) |
| Atrial Fibrillation (a) | 5.44E-02 | 5.6% | 0.0% | 35.3% |
| Percent gas trapping (-856) | 6.80E-02 | 14.5 (15.0) | 26.2 (22.1) | 30.4 (19.9) |
| Age at enrollment | 6.80E-02 | 65.2 (8.3) | 70.7 (7.3) | 73.4 (10.3) |
| FEV1 percent predicted | 7.41E-02 | 83.2 (21.7) | 69.5 (28.4) | 61.6 (18.0) |
*FDR-adjusted p-values over the variables tested. For more detailed explanation of the clinical variables see Data Dictionary in . Values are mean (standard deviation), except for (a), which is reported as a percentage.