Ran Duan, Lin Gao, Han Xu, Kuo Song, Yuxuan Hu, Hongda Wang, Yongqiang Dong, Chenxing Zhang, Songwei Jia.
Abstract
Cancer subtypes can improve our understanding of cancer and suggest more precise treatment for patients. Multi-omics molecular data can characterize cancers at different levels. To date, many computational methods that integrate multi-omics data for cancer subtyping have been proposed. However, there are no consistent criteria for evaluating these integration methods because gold standards (e.g., the number of subtypes in a specific cancer) are lacking. Since comprehensive evaluation and comparison of different methods can guide users in selecting an optimal method for their own purpose, we develop a scalable platform, CEPICS, for comprehensively evaluating and comparing multi-omics data integration methods in cancer subtyping. Given a user-specified maximum number of subtypes, k-max, CEPICS provides (1) cancer subtyping results from up to five built-in state-of-the-art integration methods for each number of subtypes from two to k-max, (2) a report that evaluates each user-selected method and compares the methods using clustering performance metrics and clinical survival analysis, and (3) an overall analysis of the subtyping results from the different methods, representing a robust cancer subtype prediction for samples. Furthermore, users can upload subtyping results from their own methods to compare with the built-in methods. CEPICS is implemented as an R package and is freely available at https://github.com/GaoLabXDU/CEPICS.
Keywords: R package; cancer subtypes; cluster analysis; data integration; multi-omics
Year: 2019 PMID: 31649733 PMCID: PMC6792302 DOI: 10.3389/fgene.2019.00966
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. CEPICS framework. (A) Data processing step. Users can choose different strategies to impute missing values, select features, and normalize the data. (B) Integration and subtyping step. CEPICS utilizes five existing methods to obtain subtyping results. Users can select several or all of the methods. (C) Evaluation and comparison step. CEPICS evaluates results of each method using both clinical survival analysis and clustering performance metrics, and makes comparisons among the results vertically and horizontally.
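The data processing step in panel (A) can be illustrated with a minimal Python sketch. It is not the CEPICS implementation (CEPICS is an R package and offers several strategies). The sketch assumes one particular combination: per-feature mean imputation, selection of the most variable features, and z-score normalization. The function names `impute_row_mean`, `zscore`, and `preprocess`, and the layout (features as rows, samples as columns, `None` for a missing value), are hypothetical.

```python
from statistics import mean, pstdev, pvariance


def impute_row_mean(row):
    """Replace missing values (None) in one feature row by the mean of the observed values."""
    observed = [v for v in row if v is not None]
    fill = mean(observed) if observed else 0.0  # assumption: all-missing rows become 0
    return [fill if v is None else v for v in row]


def zscore(row):
    """Scale one feature row to zero mean and unit (population) standard deviation."""
    mu, sd = mean(row), pstdev(row)
    return [0.0 if sd == 0 else (v - mu) / sd for v in row]


def preprocess(matrix, n_features):
    """Impute, keep the n_features most variable rows, then normalize each kept row."""
    imputed = [impute_row_mean(row) for row in matrix]
    ranked = sorted(imputed, key=pvariance, reverse=True)
    return [zscore(row) for row in ranked[:n_features]]


if __name__ == "__main__":
    toy = [[1.0, None, 3.0], [5.0, 5.0, 5.0], [0.0, 10.0, 20.0]]
    for row in preprocess(toy, 2):
        print([round(v, 3) for v in row])
```

In this toy input the constant feature is dropped because its variance is zero, and the two remaining features are returned in order of decreasing variance.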
Figure 2. The main results of Scenario 1. (A) Data input. Four datasets (mRNA expression, miRNA expression, DNA methylation, and CNV) were uploaded to CEPICS. (B) The Cox p-value heatmap compared methods and numbers of clusters (subtypes) based on clinical survival analysis, using p-values of the Cox proportional hazards model. The shade of color was inversely proportional to the p-value. (C) Silhouette coefficient comparison. (D) Comparisons of NMI and ARI. The shade of color in the NMI and ARI heatmaps was proportional to the consistency between results by different methods. Silhouette coefficient, NMI, and ARI showed the comparison based on clustering performance. (E) The process of generating the overall sample similarity heatmap.
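The consistency measures in panel (D) can be computed from two label vectors alone. The following Python sketch shows the standard definitions of the adjusted Rand index (ARI) and normalized mutual information (NMI). It is an illustration, not the CEPICS code. The NMI normalization by the arithmetic mean of the two entropies is an assumption, since the record does not state which variant CEPICS uses, and the function names are hypothetical.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(a, b):
    """ARI between two labelings of the same samples (requires at least two samples)."""
    n = len(a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings are a single cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(a, b):
    """NMI normalized by the arithmetic mean of the two entropies (assumed variant)."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    mi = sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in Counter(zip(a, b)).items()
    )
    denom = (entropy(a) + entropy(b)) / 2
    return mi / denom if denom > 0 else 1.0


if __name__ == "__main__":
    method_1 = [0, 0, 1, 1]
    method_2 = [1, 1, 0, 0]  # same partition, different label names
    print(adjusted_rand_index(method_1, method_2), normalized_mutual_info(method_1, method_2))
```

Both measures depend only on how samples are grouped, not on the label names, so two methods that produce the same partition score 1 even if their subtype numbering differs.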
Figure 3. The main results of Scenario 2. (A) Data input. A pre-defined subtyping result was uploaded to CEPICS together with four datasets. CEPICS took the pre-defined subtyping result as the gold standard to evaluate each method, and then made comparisons. (B) Comparisons based on k = 4. CEPICS suggested the best method according to Cox p-value, NMI, ARI, and silhouette coefficient (SC) comparisons based on k = 4. The method with the highest score (SNF in this case) was considered the best method for the current k and was highlighted in blue in the table. (C) Comparisons from k = 2 to k-max. After comparing, CEPICS summarised the performance of the main metrics for each method for all choices of k in the second part of the report.
Figure 4. The main results of Scenario 3. (A) Data input. Subtyping results of our own method and an integrated data matrix were uploaded to CEPICS together with four datasets. CEPICS compared our results with those of the built-in methods. (B) Comparisons without the pre-defined result. (C) Comparisons with the pre-defined result. We uploaded the pre-defined result from Scenario 2 to make comparisons based on it. The performances of ‘Ours’ were highlighted by red boxes.
Figure 5. Turning points selection. The upper plot of each panel is an explained variance plot and the lower one is a corresponding adjusted turning score plot. (A) and (B) illustrate two situations where the turning points can be easily chosen manually. (C) and (D) illustrate two situations where the turning points are difficult to choose manually.