Ali Alfatemi, Hong Peng, Wentao Rong, Bin Zhang, Hongmin Cai.
Abstract
BACKGROUND: Identifying patient subgroups by integrating multiple omics datasets is important both for understanding a disease and for providing precise, personalized treatment. Multi-omics datasets are produced daily, so fusing this heterogeneous big data into its intrinsic structures is an urgent problem. Novel mathematical methods are needed to process these data in a straightforward way.
Keywords: Cancer subtypes; Grassmann manifold; Multi-omics data; PCA; Patient subgroups; Survival rates
Year: 2022 PMID: 35870923 PMCID: PMC9308936 DOI: 10.1186/s12911-022-01938-y
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 3.298
Number of patients (samples) and features in each dataset
| Cancer type (samples) | Gene expression (features) | DNA methylation (features) | MicroRNA (features) |
|---|---|---|---|
| BIC (105 patients) | 17,814 | 23,094 | 354 |
| GBM (215 patients) | 12,042 | 1305 | 534 |
| KRCCC (122 patients) | 17,899 | 24,960 | 329 |
| LSCC (106 patients) | 12,042 | 23,074 | 352 |
| COAD (92 patients) | 17,814 | 23,088 | 312 |
Fig. 1 The framework of the proposed method. (1) DNA methylation, gene expression, and miRNA expression datasets for the same cohort of patients. (2) A PCA representation of each data type. (3) A patient-to-patient graph for each type of omics data. (4) A subspace representation of each graph. (5) Merging of the subspaces via analysis on the Grassmann manifold. (6) The final integrative groups of patients.
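The six steps in Fig. 1 can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian-kernel bandwidth (median distance to the 5th nearest neighbor), the normalized graph Laplacian, the merging objective (modeled on the GrassmannCluster-style trade-off between the per-layer Laplacians and the projection distance to each layer's subspace, with a hypothetical weight `alpha`), and the final k-means step are all choices made here, and every function name is hypothetical.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project patients (rows) onto their top principal components via SVD."""
    Xc = X - X.mean(axis=0)                       # center each feature
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def patient_graph(Z, knn=5):
    """Gaussian-kernel patient-to-patient affinity matrix.

    The bandwidth (median distance to the knn-th nearest neighbor) is a
    hypothetical choice for this sketch, not taken from the paper.
    """
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    sigma = np.median(np.sort(np.sqrt(sq), axis=1)[:, knn])
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                      # no self-loops
    return W


def normalized_laplacian(W):
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]


def spectral_subspace(L, k):
    """Orthonormal basis of the k eigenvectors with the smallest eigenvalues."""
    _, vecs = np.linalg.eigh(L)                   # eigenvalues in ascending order
    return vecs[:, :k]


def merge_on_grassmann(laplacians, subspaces, k, alpha=0.5):
    """Assumed objective: minimize tr(U^T (sum L_i - alpha * sum U_i U_i^T) U)
    over orthonormal U, i.e. stay smooth on every graph while staying close
    (in projection distance) to every per-layer subspace."""
    L_mod = sum(laplacians) - alpha * sum(U @ U.T for U in subspaces)
    return spectral_subspace(L_mod, k)


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(d2, axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def integrate_omics(omics, k, n_components=5, alpha=0.5):
    """omics: list of (patients x features) arrays for the same cohort (step 1)."""
    laplacians, subspaces = [], []
    for X in omics:
        Z = pca_scores(X, n_components)               # step 2: PCA per omics type
        L = normalized_laplacian(patient_graph(Z))    # step 3: patient graph
        laplacians.append(L)
        subspaces.append(spectral_subspace(L, k))     # step 4: subspace per graph
    U = merge_on_grassmann(laplacians, subspaces, k, alpha)  # step 5: merge
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize the embedding
    return kmeans(U, k)                               # step 6: patient groups
```

Row-normalizing the merged basis before k-means follows common practice in normalized spectral clustering; with `alpha = 0` the merge reduces to spectral clustering on the summed Laplacians.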
Survival analysis by log-rank test on the five tumor datasets
| Cancer type | GrassmannCluster | SNF | Our method |
|---|---|---|---|
| BIC (5 clusters) | | | |
| GBM (3 clusters) | | | |
| KRCCC (3 clusters) | | | |
| LSCC (4 clusters) | | | |
| COAD (3 clusters) | | | |
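The log-rank test behind this table compares the survival curves of the k subgroups found for each cancer type. A minimal NumPy sketch of the standard k-group log-rank statistic follows; under the null hypothesis of equal survival it is approximately chi-square distributed with k - 1 degrees of freedom. The function names are hypothetical, and the chi-square tail is computed with closed forms for integer degrees of freedom only to avoid a SciPy dependency.

```python
import math
import numpy as np


def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in sqrt(x)
    s = math.sqrt(x)
    term, series = s, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        series += term
    return (math.erfc(s / math.sqrt(2.0))
            + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * series)


def logrank_test(times, events, groups):
    """k-group log-rank test; events: 1 = death observed, 0 = censored.

    Returns (chi-square statistic, p value).
    """
    times, events, groups = map(np.asarray, (times, events, groups))
    labels = np.unique(groups)
    k = len(labels)
    o_minus_e = np.zeros(k)                 # observed minus expected events
    V = np.zeros((k, k))                    # covariance of o_minus_e
    for t in np.unique(times[events == 1]):  # distinct event times
        at_risk = times >= t
        n_j = np.array([np.sum(at_risk & (groups == g)) for g in labels])
        d_j = np.array([np.sum((times == t) & (events == 1) & (groups == g))
                        for g in labels])
        n, d = n_j.sum(), d_j.sum()
        o_minus_e += d_j - d * n_j / n
        if n > 1:
            frac = n_j / n
            V += d * (n - d) / (n - 1) * (np.diag(frac) - np.outer(frac, frac))
    # The k deviations sum to zero, so drop one group to get an invertible V.
    stat = float(o_minus_e[:-1] @ np.linalg.solve(V[:-1, :-1], o_minus_e[:-1]))
    return stat, chi2_sf(stat, k - 1)
```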
Clustering performance on synthetic multi-omics data. Higher values indicate better performance.
| Datasets | Methods | NMI | ACC | F-score | Precision | Recall | Purity |
|---|---|---|---|---|---|---|---|
| Synthetic omics data | GrassmannCluster | 0.9429 | 0.8800 | 0.9320 | 1 | 0.8743 | 1 |
| | SNF | 0.639 | 0.500 | 0.5665 | 0.3952 | 1 | 0.5000 |
| | Our method | 0.9468 | 0.9150 | 0.9393 | 1 | 0.8855 | 1 |
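The six columns in this table are external clustering indices that compare predicted subgroups against known labels. The sketch below shows one way to compute them; the NMI normalization (geometric mean of the two entropies), the pair-counting definitions of precision, recall, and F-score, and all function names are assumptions, since the exact definitions used for the table are not given here. ACC is computed by brute force over label matchings, which is only practical for a handful of clusters.

```python
import itertools
import math
from collections import Counter


def _entropy(counts, n):
    return -sum(c / n * math.log(c / n) for c in counts.values())


def nmi(truth, pred):
    """Mutual information normalized by sqrt(H(truth) * H(pred)); 0 if either is constant."""
    n = len(truth)
    ct, cp, joint = Counter(truth), Counter(pred), Counter(zip(truth, pred))
    mi = sum(c / n * math.log(c * n / (ct[a] * cp[b])) for (a, b), c in joint.items())
    h_t, h_p = _entropy(ct, n), _entropy(cp, n)
    return mi / math.sqrt(h_t * h_p) if h_t > 0 and h_p > 0 else 0.0


def accuracy(truth, pred):
    """Best accuracy over one-to-one matchings of predicted to true labels."""
    pairs = Counter(zip(pred, truth))
    ps, ts = sorted(set(pred)), sorted(set(truth))
    if len(ps) <= len(ts):
        best = max(sum(pairs[(p, t)] for p, t in zip(ps, perm))
                   for perm in itertools.permutations(ts, len(ps)))
    else:
        best = max(sum(pairs[(p, t)] for p, t in zip(perm, ts))
                   for perm in itertools.permutations(ps, len(ts)))
    return best / len(truth)


def purity(truth, pred):
    """Fraction of patients belonging to the majority true class of their cluster."""
    by_cluster = {}
    for t, p in zip(truth, pred):
        by_cluster.setdefault(p, Counter())[t] += 1
    return sum(c.most_common(1)[0][1] for c in by_cluster.values()) / len(truth)


def pair_scores(truth, pred):
    """Pair-counting precision, recall and F-score over all patient pairs."""
    tp = fp = fn = 0
    for i, j in itertools.combinations(range(len(truth)), 2):
        same_t, same_p = truth[i] == truth[j], pred[i] == pred[j]
        tp += same_t and same_p
        fp += same_p and not same_t
        fn += same_t and not same_p
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

Under these definitions, assigning every patient to a single cluster yields a pairwise recall of 1 with low precision and purity, so recall alone says little about clustering quality.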
Fig. 2 Performance comparison of our method, SNF, and GrassmannCluster in generating subgroups from synthetic omics data
Fig. 3 Heatmaps of the similarity scores from the clustering results for each cancer type
Survival summary comparing the five breast cancer (BIC) subgroups
| | Subgroup 1 | Subgroup 2 | Subgroup 3 | Subgroup 4 | Subgroup 5 |
|---|---|---|---|---|---|
| Number of patients | 29 | 19 | 19 | 21 | 17 |
| Events | 10 | 2 | 1 | 2 | 3 |
| Median (days) | 1563 | 3945 | 2965 | 4273 | 1699 |
| N.risk | 5 | 2 | 1 | 1 | 2 |
| Lower 95% CI | 0.2188 | 0.0673 | NA | NA | 0.0839 |
| Upper 95% CI | 0.872 | 1 | NA | NA | 1 |
| Survival | 0.437 | 0.333 | 0 | 0 | 0.375 |
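The survival, number-at-risk, and 95% CI rows in the table above are typical Kaplan-Meier summary quantities. The sketch below shows how such values can be computed for one subgroup; it assumes the Greenwood variance with a log-scale confidence interval (the default `conf.type = "log"` in R's `survfit`) and a simple median convention, and the function names are hypothetical. It is not a reconstruction of how this particular table was produced.

```python
import math
import numpy as np


def kaplan_meier(times, events, z=1.959963984540054):
    """Kaplan-Meier estimate with Greenwood variance and a log-scale CI.

    times: follow-up times; events: 1 = event observed, 0 = censored.
    Returns one row (time, n_risk, n_event, survival, lower, upper) per
    distinct event time; the limits are NaN once survival reaches 0.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    surv, var_log, rows = 1.0, 0.0, []
    for t in np.unique(times[events == 1]):
        n = int(np.sum(times >= t))                    # patients still at risk
        d = int(np.sum((times == t) & (events == 1)))  # events at time t
        surv *= (n - d) / n                            # product-limit step
        if surv > 0:
            var_log += d / (n * (n - d))               # Greenwood term for log S(t)
            half = z * math.sqrt(var_log)
            lower = surv * math.exp(-half)
            upper = min(1.0, surv * math.exp(half))    # capped at 1
        else:
            lower = upper = float("nan")               # shown as NA in tables
        rows.append((float(t), n, d, surv, lower, upper))
    return rows


def median_survival(rows):
    """First event time at which the estimated survival is <= 0.5, else None."""
    for t, _, _, s, _, _ in rows:
        if s <= 0.5:
            return t
    return None
```

Once the estimated survival reaches 0 the log-scale interval is undefined, which is consistent with the NA entries reported for subgroups whose survival estimate is 0.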
Fig. 4 The integrative clustering detects clinically important cancer subgroups, as shown by Kaplan-Meier survival plots for BIC, COAD, GBM, KRCCC, and LSCC. The p value of the log-rank test summarizes the statistical differences in survival between subgroups.