Christopher R John1, David Watson2,3, Michael R Barnes1,3, Costantino Pitzalis1, Myles J Lewis1.
Abstract
MOTIVATION: Clustering patient omic data is integral to developing precision medicine because it allows the identification of disease subtypes. A current major challenge is the integration of multi-omic data to identify a shared structure and reduce noise. Cluster analysis is also increasingly applied to single-omic data, for example, in single-cell RNA-seq analysis for clustering the transcriptomes of individual cells. This technology has clinical implications. Our motivation was therefore to develop a flexible and effective spectral clustering tool for both single- and multi-omic data.
Year: 2020 PMID: 31501851 PMCID: PMC7703791 DOI: 10.1093/bioinformatics/btz704
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Spectrum clusters five simulated Gaussian clusters and finds the correct K. (a) PCA showing the five simulated Gaussian clusters. (b) The eigenvalues of the data’s graph Laplacian. The greatest eigengap is between the fifth and sixth eigenvalues, correctly indicating K = 5
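The eigengap heuristic described in Fig. 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not Spectrum's implementation: the data are freshly simulated, a plain Gaussian kernel with a fixed bandwidth (sigma = 1) stands in for Spectrum's adaptive density-aware kernel, the eigenvalues are taken from the symmetrically normalised affinity matrix, and the function name `estimate_k` is hypothetical.

```python
import numpy as np


def estimate_k(affinity, max_k=10):
    """Pick K at the largest gap between consecutive leading eigenvalues."""
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetrically normalised affinity matrix: D^-1/2 A D^-1/2
    norm_aff = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # eigvalsh returns ascending eigenvalues; reverse and keep the leading ones
    eigvals = np.linalg.eigvalsh(norm_aff)[::-1][: max_k + 1]
    gaps = eigvals[:-1] - eigvals[1:]
    # gaps[i] separates eigenvalue i + 1 from eigenvalue i + 2 (1-based)
    return int(np.argmax(gaps)) + 1, eigvals


# Assumption: five well-separated 2-D Gaussian clusters, 40 points each
rng = np.random.default_rng(0)
centres = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, 20]], dtype=float)
points = np.vstack([c + rng.normal(scale=0.5, size=(40, 2)) for c in centres])

# Assumption: plain Gaussian kernel with sigma = 1 (not the density-aware kernel)
sq_dists = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
affinity = np.exp(-sq_dists / 2.0)

k_hat, eigvals = estimate_k(affinity)
print("estimated K:", k_hat)
print("leading eigenvalues:", np.round(eigvals, 3))
```

With clusters this well separated, the first five eigenvalues sit close to 1 and the sixth drops sharply, so the largest gap should fall between the fifth and sixth eigenvalues. Real omic data are noisier, which is where the choice of kernel matters.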
Fig. 2. Spectrum clusters RNA-seq data to find cancer subtypes with different survival times. (a) t-SNE plot illustrating the four clusters Spectrum identified in a brain cancer RNA-seq dataset (Ceccarelli ). (b) Survival curves for the discovered clusters, with a P value from a Cox proportional hazards regression model using a log-rank test to assess the significance of the survival time differences between clusters
Spectrum multi-omic clustering performance relative to other algorithms
| Dataset | N | Spectrum | PINSplus | iClusterPlus | SNF | CIMLR |
|---|---|---|---|---|---|---|
| Bladder | 338 | 0.0042 (3) | 0.31 (5) | 0.0022 (2) | 0.00022 (1) | 0.0047 (4) |
| Brain | 425 | 3.76E-16 (1) | 0.0053 (4) | 1.72E-07 (3) | 4.17E-11 (2) | 0.013 (5) |
| Breast | 634 | 1.47E-07 (1) | 2.85E-05 (4) | 1.78E-05 (3) | 0.94 (5) | 2.04E-07 (2) |
| Kidney | 240 | 0.91 (5) | 0.038 (2) | 0.24 (4) | 0.045 (3) | 0.0026 (1) |
| PCPG | 80 | 0.043 (1) | 0.18 (4) | 0.093 (3) | 0.09 (2) | 0.54 (5) |
| Skin | 338 | 0.0014 (1) | 0.96 (5) | 0.4 (3) | 0.51 (4) | 0.0029 (2) |
| Thyroid | 219 | 0.049 (1) | 0.09 (2) | 0.67 (5) | 0.18 (4) | 0.17 (3) |
| P integrated | | 1.07E-22 | 1.04E-05 | 1.91E-10 | 2.22E-11 | 5.18E-11 |
| Rank score | | 13 | 26 | 23 | 21 | 22 |
Note: P values are from a Cox proportional hazards regression model using a log-rank test to assess the significance of the survival time differences between clusters. The rank of each method within a dataset is given in brackets next to its P value. Of the final two rows, the first is the integrated P value using Fisher’s method and the second is the sum of the ranks (lower is better). PCPG stands for Pheochromocytoma and Paraganglioma. For all datasets, the three data types used were mRNA, miRNA and protein.
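The two summary rows can be recomputed from the per-dataset entries. The sketch below applies Fisher's method and the rank sum to the Spectrum column, using the rounded values printed in the table. The function name `fisher_combined_p` is hypothetical, and the closed-form chi-square tail is used only to keep the example free of external dependencies.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P values with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    upper tail has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    tail_sum = sum(half_stat ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_stat) * tail_sum


# Spectrum column of the table above (rounded values as printed)
spectrum_p = [0.0042, 3.76e-16, 1.47e-07, 0.91, 0.043, 0.0014, 0.049]
spectrum_ranks = [3, 1, 1, 5, 1, 1, 1]

combined = fisher_combined_p(spectrum_p)
rank_score = sum(spectrum_ranks)

print(f"P integrated: {combined:.2e}")
print(f"Rank score:   {rank_score}")
```

From the rounded inputs this gives an integrated P value of about 1.07E-22 and a rank score of 13, in line with the table. Fisher's method assumes the combined tests are independent, which is reasonable here because each P value comes from a different cancer cohort.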
Fig. 3. The adaptive density-aware kernel demonstrates an advantage in multi-omic analysis. Results for the density-aware kernel are shown on the left-hand side of each panel and results for the Zelnik-Manor kernel on the right-hand side. (a) Spectrum clustering assignments from the brain cancer dataset (Ceccarelli ). UMAP was run on the integrated similarity matrices for mRNA, miRNA and protein data to generate the plots. (b) Survival curves with P values from a Cox proportional hazards regression model using a log-rank test to assess significance between clusters
Comparison of the Spectrum density-aware kernel versus the Zelnik-Manor self-tuning kernel in a multi-omic cluster analysis
| Dataset | Data types | N | Spectrum density-aware | Spectrum Zelnik-Manor |
|---|---|---|---|---|
| Bladder | mRNA, miRNA, protein | 338 | 0.0042 | 0.0033 |
| Brain | mRNA, miRNA, protein | 425 | 3.76E-16 | 1.68E-11 |
| Breast | mRNA, miRNA, protein | 634 | 1.47E-07 | 3.56E-07 |
| Kidney | mRNA, miRNA, protein | 240 | 0.91 | 0.86 |
| PCPG | mRNA, miRNA, protein | 80 | 0.043 | 0.35 |
| Skin | mRNA, miRNA, protein | 338 | 0.0014 | 0.0058 |
| Thyroid | mRNA, miRNA, protein | 219 | 0.049 | 0.054 |
| P integrated | | | 1.07E-22 | 7.71E-17 |
Note: Values are P values from a Cox proportional hazards regression model using a log-rank test to assess the significance of the survival time differences between clusters. The final row is the integrated P value using Fisher’s method. PCPG stands for Pheochromocytoma and Paraganglioma.