Sayantan Mitra, Sriparna Saha.
Abstract
Recent high-throughput omics technologies have been used to assemble large biomedical omics datasets. Clustering of single-omics data has proven invaluable in biomedical research. For the task of patient sub-classification, all the available omics data should be used jointly rather than treated individually. Clustering of multi-omics datasets has the potential to reveal deep insights. Here, we propose a late-integration-based multiobjective multi-view clustering algorithm that uses a special perturbation operator. Initially, a large number of diverse clustering solutions (called base partitionings) are generated for each omic dataset using four clustering algorithms: k-means, complete linkage, spectral, and fast search clustering. These base partitionings of the multi-omic datasets are then combined using the perturbation operator, which applies an ensemble technique to generate new solutions from the base partitionings. The optimal combination of partitioning solutions across the views is determined by optimizing two objective functions: conn-XB, which measures the quality of the partitioning in each view, and the agreement index, which measures agreement between the views. The search capability of a multiobjective simulated annealing approach, AMOSA, is used for this purpose. Lastly, the non-dominated solutions of the different views are combined based on similarity to generate a single set of non-dominated solutions. The proposed algorithm is evaluated on 13 multi-view cancer datasets. An elaborate comparative study with several baseline methods and five state-of-the-art models is performed to show the effectiveness of the algorithm.
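The abstract's final step keeps only non-dominated solutions. A minimal sketch of such a filter follows, assuming each solution is scored by a tuple of objectives that are all to be maximized. The function names and the example scores are hypothetical, and this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one.

    Assumption: every objective is maximized.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(scores: List[Tuple[float, ...]]) -> List[int]:
    """Return the indices of solutions that no other solution dominates."""
    keep = []
    for i, s in enumerate(scores):
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i):
            keep.append(i)
    return keep


# Hypothetical (partition quality, view agreement) scores for four candidates.
scores = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9)]
front = non_dominated(scores)
```

In this toy input the third candidate is dropped because the second is better on both objectives; the other three trade one objective against the other and all remain on the front.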
Year: 2019 PMID: 31120942 PMCID: PMC6533037 DOI: 10.1371/journal.pone.0216904
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Overview of the proposed algorithm enAMOSA.
Fig 2. Membership matrices represented in a solution.
In the example, there are two views, two clusters, and a data set of ten points.
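The layout described in this caption can be pictured as one binary membership matrix per view. The sketch below assumes hard cluster labels and uses hypothetical label vectors for two views, two clusters, and ten points; the actual encoding used by enAMOSA may differ.

```python
from typing import List


def membership_matrix(labels: List[int], n_clusters: int) -> List[List[int]]:
    """Build a K x N binary matrix: entry [k][i] is 1 if point i belongs to cluster k."""
    return [[1 if lab == k else 0 for lab in labels] for k in range(n_clusters)]


# Hypothetical cluster labels for ten points in each of two views.
view_labels = [
    [0, 0, 0, 1, 1, 1, 0, 1, 0, 1],  # view 1
    [0, 0, 1, 1, 1, 1, 0, 1, 0, 0],  # view 2
]

# One membership matrix per view; together they form a single candidate solution.
solution = [membership_matrix(labels, n_clusters=2) for labels in view_labels]
```

Each column of a matrix sums to 1 because every point belongs to exactly one cluster within a view.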
Fig 3. enAMOSA perturbation operator.
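One common ensemble technique for combining base partitionings is a co-association matrix followed by a consensus step. The sketch below illustrates that generic idea only; it is an assumption made for illustration and not the enAMOSA perturbation operator itself. The names `co_association` and `consensus_labels` and the 0.5 threshold are hypothetical.

```python
from typing import List


def co_association(partitions: List[List[int]]) -> List[List[float]]:
    """Fraction of base partitionings in which each pair of points shares a cluster."""
    n = len(partitions[0])
    m = len(partitions)
    return [
        [sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
        for i in range(n)
    ]


def consensus_labels(coassoc: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Group points whose co-association exceeds the threshold (connected components)."""
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical base partitionings of six points.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
]
coassoc = co_association(base)
consensus = consensus_labels(coassoc)
```

Here two of the three base partitionings agree on the split between the first three and last three points, so the consensus follows the majority.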
Descriptions of datasets.
| Dataset | Views | Total Features | Selected Features | Samples |
|---|---|---|---|---|
| | RNASeq | 20510 | 4300 | 621 |
| | miRNASeq | 1046 | 220 | |
| | DNA Methylation | 4885 | 1125 | |
| | Gene Expression | 21439 | 4500 | 349 |
| | miRNA Expression | 734 | 164 | |
| | DNA Methylation | 4885 | 1125 | |
| | Gene Expression | 21439 | 4500 | 349 |
| | miRNA Expression | 734 | 164 | |
| | DNA Methylation | 4885 | 1125 | |
| | Gene Expression | 26446 | 5300 | 151 |
| | miRNA Expression | 368 | 82 | |
| | DNA Methylation | 3894 | 858 | |
| | Gene Expression | 12042 | 2500 | 274 |
| | miRNA Expression | 534 | 110 | |
| | DNA Methylation | 5000 | 1200 | |
| | Gene Expression | 12043 | 2500 | 398 |
| | miRNA Expression | 800 | 190 | |
| | DNA Methylation | 5000 | 1200 | |
| | Gene Expression | 20351 | 4883 | 220 |
| | miRNA Expression | 705 | 170 | |
| | DNA Methylation | 5000 | 1200 | |
| | Gene Expression | 20531 | 4792 | 367 |
| | miRNA Expression | 705 | 170 | |
| | DNA Methylation | 5000 | 1200 | |
| | Gene Expression | 20531 | 4880 | 341 |
| | miRNA Expression | 705 | 170 | |
| | DNA Methylation | 5000 | 1200 | |
| | Gene Expression | 20531 | 4884 | 448 |
| | miRNA Expression | 705 | 170 | |
| | DNA Methylation | 5000 | 1200 | |
| | Gene Expression | 20531 | 4617 | 257 |
| | miRNA Expression | 1046 | 241 | |
| | DNA Methylation | 5000 | 1150 | |
| | Gene Expression | 20531 | 4880 | 183 |
| | miRNA Expression | 705 | 170 | |
| | DNA Methylation | 5000 | 1200 | |
| | Gene Expression | 20531 | 4520 | 170 |
| | miRNA Expression | 705 | 168 | |
| | DNA Methylation | 5000 | 1198 | |
Parameter settings for the proposed algorithm enAMOSA.
| Parameter | Value |
|---|---|
| Max Temperature | 100 |
| Min Temperature | 0.0001 |
| # Iterations | 100 |
| Rate of cooling | 0.8 |
| Soft Limit | 40 |
| Hard Limit | 20 |
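These settings suggest a geometric cooling schedule. The sketch below assumes that the temperature is multiplied by the cooling rate after each batch of iterations, that the iteration count applies per temperature level, and that the loop stops once the temperature is no longer above the minimum. This loop structure follows standard simulated annealing and is an assumption, not something stated in the source.

```python
from typing import Iterator

T_MAX = 100.0    # Max Temperature from the table
T_MIN = 0.0001   # Min Temperature from the table
ITERS = 100      # iterations, assumed to be per temperature level
COOLING = 0.8    # rate of cooling from the table


def temperature_levels(t_max: float, t_min: float, cooling: float) -> Iterator[float]:
    """Yield the successive temperatures of a geometric cooling schedule."""
    t = t_max
    while t > t_min:
        yield t
        t *= cooling


levels = list(temperature_levels(T_MAX, T_MIN, COOLING))
total_perturbations = len(levels) * ITERS
```

Under these assumptions the schedule visits 62 temperature levels, which amounts to 6,200 perturbation steps in total.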
Comparison of Normalized Mutual Information (NMI) scores of different combinations of our proposed approach.
| BRC | BRC.1 | BRC.2 | MSKCC | GBM | OVG | COAD | LIHC | LUSC | SKCM | SARC | KIRC | AML | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4461 | 0.4714 | 0.3842 | 0.1495 | 0.4816 | 0.1194 | 0.1377 | 0.1209 | 0.3435 | 0.0814 | 0.09708 | 0.0910 | 0.4615 | |
| 0.4131 | 0.4219 | 0.4692 | 0.0979 | 0.4401 | 0.1107 | 0.1201 | 0.0984 | 0.3321 | 0.0618 | 0.0483 | 0.0291 | 0.3078 | |
| 0.4601 | 0.4679 | 0.4097 | 0.1401 | 0.4487 | 0.1303 | 0.1417 | 0.1334 | 0.3647 | 0.0796 | 0.1040 | 0.0951 | 0.5157 | |
| 0.4515 | 0.4653 | 0.3904 | 0.1425 | 0.4803 | 0.1147 | 0.1207 | 0.1134 | 0.3476 | 0.0784 | 0.0736 | 0.0891 | 0.4574 | |
| 0.4641 | 0.4730 | 0.4012 | 0.1498 | 0.4817 | 0.1203 | 0.1297 | 0.1219 | 0.3574 | 0.07807 | 0.1022 | 0.01074 | 0.5098 | |
| 0.4689 | 0.4717 | 0.4397 | 0.1521 | 0.4927 | 0.1104 | 0.1514 | 0.1298 | 0.3651 | 0.1126 | 0.1013 | 0.1095 | 0.5231 | |
| 0.4787 | 0.5105 | 0.4475 | 0.1735 | 0.5094 | 0.1394 | 0.1704 | 0.1473 | 0.3747 | 0.1319 | 0.0985 | 0.1025 | 0.5201 | |
| 0.4702 | 0.5447 | 0.4778 | 0.1916 | 0.5407 | 0.2236 | 0.2012 | 0.1603 | 0.4096 | 0.1594 | 0.1102 | 0.1154 | 0.5487 | |
| 0.4707 | 0.4804 | 0.4584 | 0.1605 | 0.5146 | 0.1264 | 0.1537 | 0.13199 | 0.3815 | 0.1409 | 0.1017 | 0.0920 | 0.4701 | |
| 0.4772 | 0.5546 | 0.4760 | 0.2066 | 0.5419 | 0.2176 | 0.1952 | 0.1693 | 0.4106 | 0.1609 | 0.1130 | 0.1161 | 0.5507 |
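For readers who want to reproduce scores like those in the table above, NMI can be computed from two label vectors as sketched below. The normalization by the arithmetic mean of the two entropies is an assumption; the source does not state which NMI variant was used, and the example labels are hypothetical.

```python
from collections import Counter
from math import log
from typing import Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (natural log) of a label vector."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """Mutual information between two labelings of the same points."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (nij / n) * log(n * nij / (count_a[i] * count_b[j]))
        for (i, j), nij in joint.items()
    )


def nmi(a: Sequence[int], b: Sequence[int]) -> float:
    """NMI normalized by the arithmetic mean of the entropies (assumed variant)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both labelings are a single cluster
    return 2.0 * mutual_information(a, b) / (h_a + h_b)


# Hypothetical labels: the same split with swapped cluster names, then an unrelated split.
same_split = nmi([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0])
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

NMI ignores cluster names, so the relabeled split scores 1, while the unrelated split scores 0.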
Comparison of Adjusted Rand Index (ARI) scores of different combinations of our proposed approach.
| BRC | BRC.1 | BRC.2 | MSKCC | GBM | OVG | COAD | LIHC | LUSC | SKCM | SARC | KIRC | AML | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4007 | 0.3717 | 0.3212 | 0.0998 | 0.3921 | 0.1091 | 0.02851 | 0.0163 | 0.3164 | 0.03205 | 0.0987 | 0.1048 | 0.4015 | |
| 0.3681 | 0.3402 | 0.2954 | 0.0481 | 0.3620 | 0.0493 | 0.0224 | 0.0098 | 0.1712 | 0.0216 | 0.0190 | 0.0401 | 0.3634 | |
| 0.4116 | 0.3797 | 0.3207 | 0.1034 | 0.3975 | 0.1017 | 0.03196 | 0.0185 | 0.2705 | 0.0330 | 0.1045 | 0.1314 | 0.4184 | |
| 0.3918 | 0.3697 | 0.3102 | 0.0981 | 0.3816 | 0.1034 | 0.02961 | 0.0185 | 0.2624 | 0.03105 | 0.0991 | 0.1034 | 0.4087 | |
| 0.4066 | 0.3757 | 0.3198 | 0.1018 | 0.3943 | 0.1009 | 0.03114 | 0.0179 | 0.2695 | 0.0321 | 0.1051 | 0.1326 | 0.4161 | |
| 0.4216 | 0.4375 | 0.3792 | 0.1085 | 0.4369 | 0.1078 | 0.10184 | 0.0704 | 0.2459 | 0.0253 | 0.1098 | 0.1311 | 0.4201 | |
| 0.4291 | 0.5120 | 0.3915 | 0.1294 | 0.4593 | 0.1011 | 0.1211 | 0.0935 | 0.2901 | 0.0843 | 0.1057 | 0.1319 | 0.4198 | |
| 0.4501 | 0.5284 | 0.4201 | 0.1513 | 0.4710 | 0.1012 | 0.1213 | 0.1016 | 0.3103 | 0.1023 | 0.1107 | 0.1402 | 0.4215 | |
| 0.4393 | 0.5101 | 0.40914 | 0.1302 | 0.4601 | 0.1008 | 0.1461 | 0.1015 | 0.3091 | 0.0879 | 0.1008 | 0.1084 | 0.4087 | |
| 0.4513 | 0.5304 | 0.4284 | 0.1601 | 0.4709 | 0.094 | 0.1437 | 0.1065 | 0.3002 | 0.0934 | 0.1103 | 0.1412 | 0.4208 |
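The ARI values in the table above compare a predicted partitioning against reference labels while correcting for chance. A short sketch of the standard pair-counting formula follows; the example labels are hypothetical, and the function assumes at least two points.

```python
from collections import Counter
from math import comb
from typing import Sequence


def adjusted_rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Adjusted Rand Index between two labelings of the same points."""
    n = len(a)
    # Pairs of points placed together in both labelings, in a only, and in b only.
    sum_joint = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (sum_joint - expected) / (max_index - expected)


# Hypothetical labels: a relabeled identical split and a split that disagrees on every pair.
ari_same = adjusted_rand_index([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0])
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Unlike NMI, ARI can be negative when two partitionings agree less often than chance would predict, as in the second example.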
Fig 4. Contribution by each view in clustering.
The p-values reported by one-way ANOVA tests comparing enAMOSA with other methods.
| BRC | BRC.1 | BRC.2 | MSKCC | GBM | OVG | COAD | LIHC | LUSC | SKCM | SARC | KIRC | AML | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9.35e-36 | 1.38e-35 | 2.84e-44 | 8.46e-51 | 1.01e-31 | 1.61e-17 | 7.77e-4 | 5.65e-5 | 6.20e-10 | 9.05e-8 | 1.28e-5 | 3.56e-4 | 8.07e-5 | |
| 3.94e-63 | 1.18e-54 | 7.69e-50 | 8.18e-65 | 3.28e-57 | 3.11e-22 | 5.32e-9 | 2.31e-7 | 3.67e-7 | 7.11e-10 | 2.01e-4 | 2.01e-7 | 3.17e-5 | |
| 3.55e-15 | 3.08e-18 | 4.00e-20 | 1.93e-14 | 1.98e-28 | 1.79e-24 | 1.23e-8 | 2.14e-5 | 1.05e-11 | 8.12e-6 | 5.18e-5 | 2.28e-7 | 1.21e-4 | |
| 1.28e-31 | 2.56e-30 | 7.07e-32 | 3.65e-41 | 6.22e-42 | 7.05e-25 | 3.24e-14 | 1.27e-9 | 5.42e-8 | 79.12e-7 | 6.65e-5 | 7.22e-4 | 6.05e-5 | |
| 7.24e-25 | 5.05e-28 | 2.49e-31 | 1.78e-27 | 3.14e-33 | 2.82e-26 | 2.17e-24 | 3.24e-7 | 2.78e-8 | 5.31e-7 | 0.44e-7 | 5.12e-7 | 4.64e-6 | |
| 8.94e-28 | 3.11e-28 | 4.77e-26 | 7.84e-28 | 4.11e-28 | 2.57e-26 | 1.84e-13 | 3.76e-11 | 1.11e-4 | 1.61e-7 | 3.74e-8 | 4.41e-4 | 4.37e-5 | |
| 1.44e-7 | 1.12e-7 | 8.64e-6 | 2.73e-7 | 8.93e-9 | 4.53e-5 | 2.44e-9 | 6.72e-6 | 6.50e-7 | 1.63e-4 | 3.08e-8 | 1.00e-4 | 1.93e-4 | |
| 1.26e-12 | 5.71e-4 | 8.63e-8 | 7.64e-9 | 2.77e-5 | 6.23e-7 | 1.19e-7 | 6.34e-12 | 2.63e-10 | 6.22e-11 | 8.94e-9 | 7.53e-11 | 1.43e-9 | |
| 0.0045 | 0.0118 | 0.0086 | 0 | 0.0357 | 0.0023 | 0.0076 | 0.0048 | 0.0034 | 0.0043 | 4.93e-9 | 4.51e-5 | 2.24e-10 | |
| 3.74e-12 | 1.41e-4 | 7.37e-7 | 1.24e-15 | 4.13e-6 | 8.73e-10 | 9.87e-11 | 1.45e-17 | 1.35e-5 | 7.86e-7 | 4.11e-8 | 2.57e-6 | 1.84e-3 |
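The p-values above come from one-way ANOVA, which is built on an F statistic comparing between-group and within-group variance. The sketch below computes that statistic only. The scores are hypothetical and are not values from the paper. Turning F into a p-value requires the F distribution, which, for example, `scipy.stats.f_oneway` handles.

```python
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F statistic, between-group df, within-group df)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores from repeated runs of two methods.
method_a = [0.50, 0.52, 0.54]
method_b = [0.40, 0.42, 0.44]
f_stat, df_b, df_w = one_way_anova_f([method_a, method_b])
```

A large F relative to the F distribution with the returned degrees of freedom corresponds to a small p-value, meaning the difference between the methods' mean scores is unlikely to be due to run-to-run variation alone.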
Fig 5. Heatmap showing the expression levels of the selected gene markers for each subclass in the OXF.BRC.1 dataset.
The 10 gene markers selected for each subclass of the OXF.BRC.1 dataset.
| Her2 Gene ID | Her2 Down/Up | Basal Gene ID | Basal Down/Up | LumA Gene ID | LumA Down/Up | LumB Gene ID | LumB Down/Up |
|---|---|---|---|---|---|---|---|
| 2064 | | 2296 | | 9 | | 51523 | |
| 2886 | | 8190 | | 25800 | | 23594 | |
| 2264 | | 7052 | | 4137 | | 4193 | |
| 644 | | 55165 | | 11065 | | 11004 | |
| 573 | | 54443 | | 9232 | | 5241 | |
| 991 | | 9833 | | 9156 | | 4288 | |
| 898 | | 26996 | | 1063 | | 599 | |
| 57180 | | 120224 | | 83540 | | 1956 | |
| 4609 | | 2099 | | 4605 | | 26227 | |
| 6422 | | 3169 | | 332 | | 1001 | |
Significant Gene Ontology (GO) terms shared by the gene markers.
| Classes | Gene Ontology (GO) term | Genome (%) | Cluster (%) |
|---|---|---|---|
| | regulation of catalytic activity: GO:0050790 | 47% | 50% |
| | regulation of cell proliferation: GO:0042127 | 31% | 40% |
| | negative regulation of programmed cell death: GO:0043069 | 50% | 50% |
| | negative regulation of apoptotic process: GO:0043066 | 2% | 20% |
| | positive regulation of cell proliferation: GO:0008284 | 3% | 40% |
| | negative regulation of cell death: GO:0060548 | 5% | 38% |
| | biological process: GO:0008150 | 50% | 50% |
| | biological regulation: GO:0065007 | 52% | 60% |
| | signal transduction: GO:0007165 | 6% | 10% |
| | nitrogen compound metabolic process: GO:0006807 | 27% | 30% |
| | multicellular organismal process: GO:0032501 | 3% | 30% |
| | cell cycle: GO:0007049 | 30% | 46% |
| | regulation of chromosome organization: GO:0033044 | 2% | 20% |
| | organelle fission: GO:0048285 | 3% | 10% |
| | regulation of chromosome segregation: GO:0051983 | 8% | 10% |
| | mitotic nuclear division: GO:0140014 | 3% | 20% |
| | regulation of biological process: GO:0050789 | 17% | 20% |
| | regulation of cellular process: GO:0050794 | 0.5% | 10% |
| | multicellular organism development: GO:0007275 | 1.3% | 10% |
| | regulation of macromolecule metabolic process: GO:0060255 | 16% | 20% |
| | organic substance biosynthetic process: GO:1901576 | 21% | 30% |
Comparison of Normalized Mutual Information (NMI) scores of our proposed approach (enAMOSA) with other baseline approaches and state-of-the-art methods.
| BRC | BRC.1 | BRC.2 | MSKCC | GBM | OVG | COAD | LIHC | LUSC | SKCM | SARC | KIRC | AML | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4021 | 0.4185 | 0.3618 | 0.0906 | 0.4383 | 0.0797 | 0.0971 | 0.0594 | 0.3021 | 0.0281 | 0.0463 | 0.0184 | 0.3017 | |
| 0.4067 | 0.4206 | 0.3531 | 0.1076 | 0.4354 | 0.0814 | 0.1014 | 0.0498 | 0.2841 | 0.0212 | 0.0315 | 0.0281 | 0.2962 | |
| 0.4215 | 0.4418 | 0.3615 | 0.1206 | 0.4519 | 0.1067 | 0.1098 | 0.0961 | 0.3196 | 0.0684 | 0.0748 | 0.0907 | 0.4517 | |
| 0.4375 | 0.4668 | 0.3798 | 0.1179 | 0.4708 | 0.1201 | 0.1147 | 0.0948 | 0.3089 | 0.0716 | 0.1005 | 0.1104 | 0.5141 | |
| 0.4581 | 0.5071 | 0.4104 | 0.1390 | 0.4615 | 0.1421 | 0.1227 | 0.1103 | 0.3284 | 0.1097 | 0.1091 | 0.1109 | 0.5207 | |
| 0.0146 | 0.0232 | 0.0118 | 0.1098 | 0.0532 | 0.0304 | 0.0328 | 0.0573 | 0.0672 | 0.0483 | 0.0475 | 0.0389 | 0.3629 | |
| 0.0118 | 0.0146 | 0.00392 | 0.0572 | 0.0153 | 0.0095 | 0.0459 | 0.0348 | 0.0237 | 0.0382 | 0.0262 | 0.0279 | 0.2219 | |
| 0.0358 | 0.0475 | 0.0153 | 0.0098 | 0.0026 | 0.0068 | 0.0332 | 0.0129 | 0.0082 | 0.0088 | 0.0233 | 0.0908 | 0.4349 | |
| 0.0121 | 0.09931 | 0.0153 | 0.0780 | 0.0306 | 0.0081 | 0.0106 | 0.0258 | 0.0112 | 0.0044 | 0.0177 | 0.0108 | 0.0894 | |
| 0.3912 | 0.4034 | 0.3403 | 0.1124 | 0.4213 | 0.0863 | 0.0793 | 0.0175 | 0.2594 | 0.0195 | 0.0321 | 0.0655 | 0.2871 |
Comparison of Adjusted Rand Index (ARI) scores of our proposed approach (enAMOSA) with other baseline methods and state-of-the-art methods.
| BRC | BRC.1 | BRC.2 | MSKCC | GBM | OVG | COAD | LIHC | LUSC | SKCM | SARC | KIRC | AML | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.3641 | 0.3471 | 0.2815 | 0.0448 | 0.3561 | 0.0430 | 0.02051 | 0.0083 | 0.1573 | 0.0161 | 0.0189 | 0.0384 | 0.3412 | |
| 0.3651 | 0.3384 | 0.2901 | 0.0457 | 0.3620 | 0.0473 | 0.02110 | 0.0082 | 0.1615 | 0.0186 | 0.0175 | 0.0115 | 0.3603 | |
| 0.3805 | 0.3547 | 0.2981 | 0.0935 | 0.3726 | 0.0984 | 0.02751 | 0.0096 | 0.2136 | 0.0228 | 0.0984 | 0.1045 | 0.3942 | |
| 0.3916 | 0.3675 | 0.3102 | 0.0958 | 0.3864 | 0.0953 | 0.02814 | 0.0094 | 0.2219 | 0.0231 | 0.1041 | 0.1308 | 0.4107 | |
| 0.3943 | 0.3861 | 0.3306 | 0.1006 | 0.3937 | 0.1002 | 0.0447 | 0.0108 | 0.2419 | 0.0259 | 0.1114 | 0.1137 | 0.4208 | |
| 0.0086 | 0.0112 | 0.0012 | 0.0105 | 0.0076 | 0.0051 | 0.0184 | 0.0054 | 0.0098 | 0.0055 | 0.0263 | 0.0392 | 0.2546 | |
| 0.0144 | 0.0112 | 0.00864 | 0.01142 | 0.0089 | 0.0045 | 0.0244 | 0.0067 | 0.0065 | 0.0016 | 0.0152 | 0.0136 | 0.1195 | |
| 0.0126 | 0.0057 | 0.0863 | 0.00081 | 0.0027 | 0.0062 | 0.0119 | 0.0063 | 0.0026 | 0.00062 | 0.0238 | 0.0157 | 0.3667 | |
| 0.0045 | 0.0118 | 0.0086 | 0.0171 | 0.0357 | 0.0023 | 0.0076 | 0.0048 | 0.0034 | 0.0043 | 0.0399 | 0.0288 | 0.0482 | |
| 0.2457 | 0.3441 | 0.2831 | 0.067 | 0.3441 | 0.0614 | 0.0194 | 0.0085 | 0.1351 | 0.0145 | 0.02366 | 0.0355 | 0.2021 |
F1-measure and accuracy values obtained by enAMOSA for all the datasets.
| Datasets | F1-measure | Accuracy |
|---|---|---|
| TCGA.BRC | 0.6592 | 0.6814 |
| OXF.BRC.1 | 0.7048 | 0.7184 |
| OXF.BRC.2 | 0.6067 | 0.6126 |
| MSKCC | 0.5146 | 0.5541 |
| TCGA.GBM | 0.6975 | 0.7068 |
| TCGA.OVG | 0.5012 | 0.5236 |
| TCGA.COAD | 0.4905 | 0.5131 |
| TCGA.LIHC | 0.4718 | 0.4946 |
| TCGA.LUSC | 0.6056 | 0.6479 |
| TCGA.SKCM | 0.4861 | 0.4931 |
| TCGA.SARC | 0.4315 | 0.4415 |
| TCGA.KIRC | 0.4174 | 0.4212 |
| TCGA.AML | 0.6904 | 0.6882 |
Fig 6. Gene expression profile plot for each subclass in the OXF.BRC.1 dataset.
Execution time of the algorithms in seconds.
| BRC | BRC.1 | BRC.2 | MSKCC | GBM | OVG | COAD | LIHC | LUSC | SKCM | SARC | KIRC | AML | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9531.21 | 4389.12 | 4380.98 | 727.08 | 2045.80 | 4419.02 | 2241.11 | 4373.24 | 4352.51 | 6533.02 | 3016.14 | 2045.03 | 978.42 | |
| 15034.51 | 3421.16 | 3415.77 | 972.30 | 2055.93 | 3417.27 | 2104.53 | 5918.45 | 5074.86 | 8550.49 | 3057.00 | 1569.91 | 1025.22 | |
| 1219.78 | 612.71 | 614.45 | 121.45 | 356.52 | 634.88 | 379.28 | 371.29 | 596.02 | 798.99 | 376.72 | 379.28 | 206.98 | |
| 65.71 | 19.61 | 20.71 | 12.04 | 13.72 | 20.07 | 15.89 | 27.21 | 24.31 | 36.23 | 17.30 | 15.89 | 14.90 | |
| 9041.95 | 4503.10 | 4523.06 | 1014.06 | 2923.39 | 4543.08 | 3635.03 | 5669.00 | 5309.83 | 6776.40 | 4197.39 | 3635.03 | 1847.40 | |
| 222.11 | 146.73 | 140.27 | 153.54 | 190.24 | 136.37 | 197.62 | 224.15 | 141.09 | 214.08 | 175.583 | 211.07 | 225.11 |