Li Wang, Robert P Sebra, John P Sfakianos, Kimaada Allette, Wenhui Wang, Seungyeul Yoo, Nina Bhardwaj, Eric E Schadt, Xin Yao, Matthew D Galsky, Jun Zhu.
Abstract
BACKGROUND: Patient stratification based on molecular subtypes is an important strategy for cancer precision medicine. Deriving clinically informative cancer molecular subtypes from transcriptomic data generated on whole tumor tissue samples is a non-trivial task, especially given the various non-cancer cellular elements intertwined with cancer cells in the tumor microenvironment.
Keywords: Bulk tumor profiling; Clustering; Deconvolution
Year: 2020 PMID: 32111252 PMCID: PMC7049190 DOI: 10.1186/s13073-020-0720-0
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Fig. 1 Flow chart of the DeClust algorithm (a) and the simulation results (b–d). The accuracy of the estimated cell compartment fraction in b was calculated via the correlation between simulated and estimated cell frequency profiles (average for the three components). The accuracy of the estimated expression profiles in d was calculated by correlation between simulated and estimated expression profiles. Specifically, the average correlations over the cancer, immune, and stromal profiles are plotted at different noise levels and sample sizes. The noise levels represent the standard deviation of the noise added to the simulated mixed expression data under the log-normal distribution (see the “Methods” section)
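The accuracy measure described in the Fig. 1 caption can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the toy data, the function names, the use of Pearson correlation, the multiplicative form of the log-normal noise, and the least-squares stand-in for the deconvolution step are all assumptions made here for illustration.

```python
import numpy as np

def mean_compartment_correlation(simulated, estimated):
    """Average correlation across compartments.

    Rows are compartments (e.g. cancer, immune, stromal); columns are samples.
    Pearson correlation is an assumption; the caption only says "correlation".
    """
    simulated = np.asarray(simulated, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    cors = [np.corrcoef(s, e)[0, 1] for s, e in zip(simulated, estimated)]
    return float(np.mean(cors))

def add_lognormal_noise(mixed, sd, rng):
    """Assumed noise model: Gaussian noise with standard deviation `sd` in log space."""
    return mixed * np.exp(rng.normal(0.0, sd, size=mixed.shape))

rng = np.random.default_rng(0)

# Hypothetical toy data: 3 compartments, 50 samples, 200 genes.
fractions = rng.dirichlet([4, 2, 2], size=50).T      # shape (3, 50)
profiles = rng.gamma(2.0, 50.0, size=(200, 3))       # shape (200, 3)
mixed = profiles @ fractions                         # shape (200, 50)
noisy = add_lognormal_noise(mixed, sd=0.5, rng=rng)

# Stand-in estimator (NOT DeClust): least squares with the true profiles known.
estimated_fractions, *_ = np.linalg.lstsq(profiles, noisy, rcond=None)

accuracy = mean_compartment_correlation(fractions, estimated_fractions)
print(f"mean correlation across compartments: {accuracy:.3f}")
```

Repeating the last steps over a grid of `sd` values and sample sizes would give curves of the kind the caption describes for panels b–d.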
Fig. 2 Comparison of cell fractions estimated by different methods. Tumor purity (cancer cell fraction) estimates from different deconvolution methods are compared with those from ABSOLUTE (treated as “ground truth”) in terms of Spearman’s correlation coefficients (a) or median absolute deviation (b). The p values shown above indicate the significance of the difference between DeClust and other methods according to the two-sided paired t test. c Scatter plot of tumor purity estimates by DeClust or ESTIMATE_tr against those by ABSOLUTE for each cancer type (ESTIMATE_tr is the ESTIMATE ssGSEA score transformed to fit ABSOLUTE estimates). d Correlation among the top 3 deconvolution methods (i.e., DeClust, ESTIMATE_ssGSEA, and CIBERSORT_abs) in estimations of immune cell fractions (red), and correlation among the top 3 deconvolution methods (i.e., DeClust, EPIC, and ESTIMATE_ssGSEA) in estimations of stromal cell fractions (gray)
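The two agreement measures named in the Fig. 2 caption can be illustrated with a short sketch. The numbers below are made up, the function names are hypothetical, and "median absolute deviation" is interpreted here as the median of the absolute differences from the ABSOLUTE estimate, which is an assumption about the caption's meaning.

```python
import numpy as np

def average_ranks(x):
    """Rank values from 1..n, assigning tied values their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_correlation(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def median_absolute_deviation(estimate, reference):
    """Assumed definition: median of |estimate - reference| across samples."""
    diff = np.asarray(estimate, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.median(np.abs(diff)))

# Hypothetical purity values for four samples of one cancer type.
absolute_purity = [0.20, 0.40, 0.60, 0.80]   # reference ("ground truth")
method_purity = [0.25, 0.35, 0.70, 0.75]     # estimates from some method

rho = spearman_correlation(method_purity, absolute_purity)
mad = median_absolute_deviation(method_purity, absolute_purity)
print(f"Spearman rho = {rho:.2f}, median absolute deviation = {mad:.2f}")
```

Computing one such pair of values per cancer type and per method would yield the paired observations that a paired test across cancer types then compares.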
Fig. 3 a Association between overall survival and stromal cell fraction estimated by different methods. Kaplan-Meier curves of patients with high/low stromal compartment fractions as defined by DeClust in the KIRC (b) and BLCA (c) TCGA datasets. Heat maps of the gene expression levels estimated by DeClust of cell type-specific markers (d) and EMT genes (e) in the stromal compartment across the 13 cancer types
Fig. 4 Overlap between the DeClust and TCGA subtypes. Numbers in the overlap table represent the number of samples shared by the two subtypes, and the color intensity represents the significance of the overlap
Fig. 5 Plots of the numbers of somatically mutated, amplified, or deleted genes that were enriched in subtypes defined by different methods (adjusted p value of chi-squared test < 0.05) grouped by methods (a) or by cancer types (b). The p values shown above indicate the significance of the difference between DeClust and other methods according to the two-sided paired Wilcoxon rank-sum test based on the log-transformed counts or the original counts (in parentheses). c The frequency of genetic alterations within each BLCA subtype defined by DeClust (left) or TCGA (right). Only genetic alterations that were enriched in either DeClust or TCGA subtypes in BLCA are included here (adjusted p value < 0.05). d Similarity between CRIS subtypes and subtypes derived from different methods for COADREAD (upper) and overlap between CRIS subtypes and DeClust subtypes (lower)
Fig. 6 Plots of associations between overall survival outcome and subtypes defined by different methods grouped by methods (a) or by cancer types (b). The p values shown above indicate the significance of the difference between DeClust and other methods according to the two-sided paired t test. c Heat map of the pathway activity in the immune, stromal, and cancer compartments across 13 TCGA datasets based on the expression profiles of these compartments as estimated by DeClust. The pathway activity score was calculated by single-sample gene set enrichment analysis (ssGSEA) [24, 47] and then row-wise scaled for display. We considered 50 cancer Hallmark pathways annotated in MSigDB by the Broad Institute [48]
Fig. 7 Kaplan-Meier curves of DeClust subtypes for the KIRP (a), KIRC (b), and LUAD (c) TCGA datasets. d Frequency of CDKN2A deletions in each subtype of KIRP, KIRC, and LUAD as defined by DeClust and TCGA
Fig. 8 Comparison of scRNAseq data and DeClust tumor-type-specific stromal profiles. Clustering results of scRNAseq data for ccRCC samples (a) and BLCA samples (d). Cells are colored according to cell clusters. Expression levels (heat map) of cell type-specific markers (top) and EMT genes (bottom) in each cell of the scRNAseq data for ccRCC (b) or BLCA (e). Marker gene expression levels are represented as the absolute expression value, log2(read count + 1), while expression levels for EMT genes are represented as the row-wise scaled relative expression value. For display purposes, cells in ccRCC2 were downsampled to the same size as ccRCC1. Cells in Epithelial-c0 and Epithelial-c1 were downsampled so that the combined cell number in the two clusters equals that in Epithelial-c2. Correlation between stromal profiles estimated by DeClust and stromal profiles calculated as the mean expression profile of each stromal cell cluster in scRNAseq data of ccRCC (c) or BLCA (f). Error bars indicate the 95% confidence interval
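The two display transforms mentioned in the Fig. 8 caption can be sketched briefly. The count matrix is made up and the function names are hypothetical. Row-wise scaling is implemented here as a per-gene z-score, which is an assumption, since the caption does not state the exact scaling used.

```python
import numpy as np

def log2_counts(counts):
    """Absolute expression value as in the caption: log2(read count + 1)."""
    return np.log2(np.asarray(counts, dtype=float) + 1.0)

def row_scale(matrix):
    """Assumed row-wise scaling: z-score each gene (row) across cells (columns)."""
    matrix = np.asarray(matrix, dtype=float)
    mean = matrix.mean(axis=1, keepdims=True)
    sd = matrix.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant rows become all zeros instead of NaN
    return (matrix - mean) / sd

# Hypothetical read counts: 2 genes (rows) x 3 cells (columns).
counts = np.array([[0, 1, 3],
                   [7, 15, 31]])

absolute_expr = log2_counts(counts)       # values used for marker genes
relative_expr = row_scale(absolute_expr)  # values used for EMT genes
print(absolute_expr)
print(relative_expr.round(3))
```

After scaling, both genes show the same relative pattern across cells even though their absolute levels differ, which is why a scaled view suits comparing expression patterns rather than magnitudes.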
Fig. 9 Comparison of BLCA scRNAseq data and BLCA subtypes. a The correlation between the mean expression profile of each epithelial cell cluster and subtype-specific cancer profiles estimated by DeClust or by TCGA. Error bars indicate the 95% confidence interval. b Bar plot of the fold change of the top 20 up/downregulated genes between two BLCA luminal subtypes defined by TCGA (left) and the heat map of their expression in our scRNAseq dataset (right). The fold change was calculated based on mixed expression values. The absolute expression values (log-transformed) are shown in the heat map. c Similar to b except that the top 20 genes and their fold changes were derived by comparing the two luminal subtypes defined by DeClust. The fold change was calculated by comparing the subtype-specific profiles estimated by DeClust