Dingming Wu¹, Dongfang Wang¹, Michael Q Zhang²,³, Jin Gu⁴.
Abstract
BACKGROUND: One major goal of large-scale cancer omics studies is to identify molecular subtypes for more accurate cancer diagnosis and treatment. To deal with high-dimensional cancer multi-omics data, a promising strategy is to find an effective low-dimensional subspace of the original data and then cluster cancer samples in the reduced subspace. However, because of data-type diversity and large data volume, few methods can integratively and efficiently find the principal low-dimensional manifold of high-dimensional cancer multi-omics data.
Year: 2015 PMID: 26626453 PMCID: PMC4667498 DOI: 10.1186/s12864-015-2223-8
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 LRAcluster overview. LRAcluster takes three types of data (Gaussian, Poisson and binary) as input. A probabilistic model with a large number of parameters is used to model the data. A low-rank approximation of the parameter matrix implies a latent low-dimensional subspace. Clustering in the reduced subspace yields the candidate molecular subtypes.
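Fig. 1 states only that one probabilistic model covers Gaussian, Poisson and binary data. A common way to do this, assumed here and not confirmed by the captions, is to give every data entry x a natural parameter θ taken from the parameter matrix Θ: θ is the mean for Gaussian data, the log-rate for Poisson counts and the logit for binary calls. The function names below (`loglik_gaussian`, `loglik_poisson`, `loglik_binary`, `total_loglik`) are hypothetical, and the unit Gaussian variance is a simplifying assumption.

```python
import math

# Hypothetical per-entry log-likelihoods in terms of a natural parameter theta
# (an exponential-family parameterization assumed for illustration).
# Constants that do not depend on theta, such as -0.5*log(2*pi) for Gaussian
# data and -log(x!) for Poisson data, are dropped.

def loglik_gaussian(x: float, theta: float) -> float:
    # Unit-variance Gaussian with mean theta.
    return -0.5 * (x - theta) ** 2

def loglik_poisson(x: int, theta: float) -> float:
    # Poisson with log-rate theta, i.e. rate = exp(theta).
    return x * theta - math.exp(theta)

def loglik_binary(x: int, theta: float) -> float:
    # Bernoulli with logit theta; log(1 + exp(theta)) is computed in a
    # numerically stable way so large |theta| does not overflow.
    softplus = max(theta, 0.0) + math.log1p(math.exp(-abs(theta)))
    return x * theta - softplus

LOGLIK = {"gaussian": loglik_gaussian, "poisson": loglik_poisson, "binary": loglik_binary}

def total_loglik(data, theta, kind):
    """Sum the log-likelihood over a matrix (list of rows) of one data type."""
    f = LOGLIK[kind]
    return sum(f(x, t) for xrow, trow in zip(data, theta) for x, t in zip(xrow, trow))

# Example: a 2 x 2 binary block with every parameter at 0 (p = 0.5 everywhere).
print(total_loglik([[0, 1], [1, 1]], [[0.0, 0.0], [0.0, 0.0]], "binary"))  # equals -4 * log(2)
```

Each term is concave in θ, so with the data fixed the summed log-likelihood is concave in Θ; it is the low-rank restriction on Θ that makes the fitting problem non-convex.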
Fig. 2 Performance of LRAcluster. a Classification accuracy and silhouette value against the dimension of the reduced subspace (with the cluster number set to three) on the three-cancer-type testing dataset. b Time consumption of LRAcluster and iCluster+. The number after each method's name is the dimension of the latent subspace; iCluster+ denotes the method run without tuning the penalty parameter, and iCluster.tune denotes the method with the penalty parameter tuned. c, d Changes in the explained variance and in the penalty parameter μ as the algorithm iterates.
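The Fig. 2 caption mentions a penalty parameter μ but not the objective it enters. One standard way to encourage a low-rank parameter matrix Θ with a tunable penalty, given here as an illustrative assumption rather than as LRAcluster's confirmed formulation, is a nuclear-norm penalty added to the negative log-likelihood -ℓ(Θ). Its proximal step soft-thresholds the singular values:

```latex
\min_{\Theta}\; -\ell(\Theta) + \mu \,\lVert \Theta \rVert_{*},
\qquad \lVert \Theta \rVert_{*} = \sum_{k} \sigma_k(\Theta)

Z = U \,\operatorname{diag}(\sigma_1, \dots, \sigma_n)\, V^{\top}
\;\Longrightarrow\;
\operatorname{prox}_{\mu \lVert\cdot\rVert_{*}}(Z)
  = U \,\operatorname{diag}\!\big(\max(\sigma_k - \mu,\, 0)\big)\, V^{\top}
```

Singular values smaller than μ are set to zero, so a larger μ leaves fewer nonzero components and therefore a lower-dimensional subspace. In proximal-gradient schemes, Z is typically a gradient step from the current Θ and μ is scaled by the step size.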
Fig. 3 Curves for choosing parameters. a Explained variance against dimension. b Silhouette value against cluster number. c Scatter plot of BRCA samples in the reduced two-dimensional subspace.
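Fig. 3a chooses the dimension from an explained-variance curve. Assuming a Gaussian-style definition (the share of the squared singular values of the fitted parameter matrix kept by the top r components), which is not necessarily the paper's exact definition, the curve can be computed as below. `pick_dimension` and its `min_gain` threshold are a hypothetical elbow rule, not the authors' criterion.

```python
def explained_variance_curve(singular_values):
    """Cumulative share of the total squared singular values kept by the top r components, for r = 1..n."""
    energies = sorted((s * s for s in singular_values), reverse=True)
    total = sum(energies)
    curve, running = [], 0.0
    for e in energies:
        running += e
        curve.append(running / total)
    return curve


def pick_dimension(curve, min_gain=0.05):
    """Hypothetical elbow rule: the smallest r such that adding dimension r + 1 gains less than min_gain."""
    for r in range(1, len(curve)):
        if curve[r] - curve[r - 1] < min_gain:
            return r
    return len(curve)


# Made-up singular values for illustration.
curve = explained_variance_curve([4.0, 2.0, 1.0, 1.0])
print([round(v, 3) for v in curve])  # [0.727, 0.909, 0.955, 1.0]
print(pick_dimension(curve))         # 2
```

A fixed gain threshold is only one way to locate the elbow; reading the curve by eye, as the captions suggest, works the same way in spirit.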
Unsupervised clustering results of the pan-cancer analysis (rows: clusters C1 to C10; columns: cancer types; entries: numbers of samples)
| Cluster | BRCA | COAD | GBM | HNSC | KIRC | LGG | LUAD | LUSC | PRAD | STAD | THCA | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C1 | 1 | 0 | 0 | 286 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 293 |
| C2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 411 | 412 |
| C3 | 0 | 0 | 41 | 0 | 0 | 451 | 0 | 0 | 0 | 0 | 0 | 492 |
| C4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 231 | 0 | 231 |
| C5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 293 | 0 | 0 | 293 |
| C6 | 0 | 190 | 0 | 1 | 0 | 0 | 2 | 0 | 1 | 0 | 0 | 194 |
| C7 | 3 | 17 | 0 | 0 | 1 | 0 | 406 | 7 | 0 | 0 | 3 | 437 |
| C8 | 0 | 0 | 0 | 0 | 240 | 0 | 0 | 0 | 0 | 0 | 0 | 240 |
| C9 | 448 | 0 | 1 | 2 | 1 | 0 | 4 | 1 | 0 | 0 | 0 | 457 |
| C10 | 8 | 1 | 0 | 195 | 0 | 0 | 6 | 60 | 0 | 0 | 0 | 270 |
| Total | 460 | 208 | 42 | 484 | 242 | 452 | 418 | 74 | 294 | 231 | 414 | 3319 |
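Fig. 2a reports a classification accuracy on a separate three-cancer-type dataset, and the captions do not define how it is computed. One common way to score a cluster-by-label contingency table such as the pan-cancer table is cluster purity: label each cluster with its majority cancer type and count the samples that match. The sketch applies it to the counts above; the helper names are hypothetical, and purity is not claimed to be the paper's accuracy metric.

```python
# Counts copied from the pan-cancer table: rows C1..C10, columns in header order.
CANCERS = ["BRCA", "COAD", "GBM", "HNSC", "KIRC", "LGG", "LUAD", "LUSC", "PRAD", "STAD", "THCA"]
COUNTS = [
    [1, 0, 0, 286, 0, 0, 0, 6, 0, 0, 0],      # C1
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 411],      # C2
    [0, 0, 41, 0, 0, 451, 0, 0, 0, 0, 0],     # C3
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 231, 0],      # C4
    [0, 0, 0, 0, 0, 0, 0, 0, 293, 0, 0],      # C5
    [0, 190, 0, 1, 0, 0, 2, 0, 1, 0, 0],      # C6
    [3, 17, 0, 0, 1, 0, 406, 7, 0, 0, 3],     # C7
    [0, 0, 0, 0, 240, 0, 0, 0, 0, 0, 0],      # C8
    [448, 0, 1, 2, 1, 0, 4, 1, 0, 0, 0],      # C9
    [8, 1, 0, 195, 0, 0, 6, 60, 0, 0, 0],     # C10
]

def purity(table):
    """Fraction of samples whose cancer type is the majority type of their cluster."""
    total = sum(sum(row) for row in table)
    return sum(max(row) for row in table) / total

def majority_labels(table, names):
    """Majority cancer type of each cluster (row)."""
    return [names[row.index(max(row))] for row in table]

print(round(purity(COUNTS), 4))          # 0.9494
print(majority_labels(COUNTS, CANCERS))
```

On these counts, purity is 3151/3319 ≈ 0.949. GBM and LUSC never form a cluster majority: 41 of the 42 GBM samples fall in C3 alongside LGG, and 60 of the 74 LUSC samples fall in C10, which is dominated by HNSC.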
Results of the single-cancer analysis
| Cancer | Dimension (a) | #Clusters (b) | Silhouette value |
|---|---|---|---|
| BRCA | 2 | 2 | 0.55 |
| COAD | 4 | 4 | 0.40 |
| GBM | 8 | 2 | 0.35 |
| HNSC | 7 | 3 | 0.26 |
| KIRC | 6 | 2 | 0.36 |
| LGG | 2 | 3 | 0.44 |
| LUAD | 5 | 2 | 0.34 |
| LUSC | 5 | 4 | 0.32 |
| PRAD | 2 | 4 | 0.41 |
| STAD | 4 | 3 | 0.37 |
| THCA | 2 | 2 | 0.61 |
(a) The dimension of the reduced subspace is determined from the explained-variance curve of each cancer type.
(b) The number of clusters is determined from the curve of within-cluster variances.
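Footnote b picks the number of clusters from a curve of within-cluster variances. For any partition of the samples in the reduced subspace (the captions do not say which clustering method produces it), the quantity on that curve can be sketched as the total within-cluster sum of squares, assuming squared Euclidean distance. `within_cluster_ss` is a hypothetical helper; one would evaluate it for k = 1, 2, 3, ... and look for where the drop levels off.

```python
def within_cluster_ss(points, labels):
    """Total within-cluster sum of squared Euclidean distances to each cluster's centroid."""
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = coordinate-wise mean of the cluster's members.
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total

# Toy 2-D coordinates in a reduced subspace (made-up values for illustration).
pts = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
print(within_cluster_ss(pts, [0, 0, 0, 0]))  # 104.0  (k = 1)
print(within_cluster_ss(pts, [0, 0, 1, 1]))  # 4.0    (k = 2)
```

The large drop from k = 1 to k = 2 in this toy case is the kind of elbow such a curve is read for.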
Fig. 4 Molecular subtypes identified by LRAcluster: (a) LGG, (b) PRAD and (c) THCA. The scatter plots show all samples in the corresponding reduced two-dimensional subspace. Colors denote the molecular subtypes identified by LRAcluster; c indicates the number of identified clusters and s the silhouette value.
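Fig. 3b, Fig. 4 and the single-cancer table all report silhouette values. The captions do not restate the definition, so the standard silhouette coefficient is assumed here: for each sample, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and s = (b - a) / max(a, b), averaged over samples. Values near 1 indicate well-separated clusters. The dependency-free sketch below uses Euclidean distance; `silhouette` is a hypothetical helper name.

```python
import math

def silhouette(points, labels):
    """Mean silhouette value with Euclidean distance.

    Points in singleton clusters get s = 0 (a common convention).
    Requires at least two distinct labels.
    """
    n = len(points)
    scores = []
    for i in range(n):
        own = [math.dist(points[i], points[j])
               for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        # Mean distance from point i to each other cluster; b is the smallest.
        others = {}
        for j in range(n):
            if labels[j] != labels[i]:
                others.setdefault(labels[j], []).append(math.dist(points[i], points[j]))
        b = min(sum(d) / len(d) for d in others.values())
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n

# Two well-separated 1-D groups (made-up values for illustration).
pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(round(silhouette(pts, [0, 0, 1, 1]), 4))  # 0.8997
```

Scanning the cluster number and keeping the k with the highest mean silhouette is one way to read a curve like Fig. 3b.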