Haijing Jin1, Zhandong Liu2,3.
Abstract
BACKGROUND: Deconvolution analyses have been widely used to track compositional alterations of cell types in gene expression data. Although a large number of novel methods have been developed, due to a lack of understanding of the effects of modeling assumptions and tuning parameters, it is challenging for researchers to select an optimal deconvolution method suitable for the targeted biological conditions.
Year: 2021 PMID: 33845875 PMCID: PMC8042713 DOI: 10.1186/s13059-021-02290-6
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Overview of three in silico testing frameworks. a Three benchmarking frameworks were constructed to investigate the impact of seven factors that affect deconvolution analysis: noise level, noise structure, other noise sources, quantification unit, unknown content, component number, and weight matrix. b Eleven deconvolution methods are tested and have been categorized based on the required reference input: marker-based, reference-based, and reference-free. c Performance of the methods is assessed through Pearson’s correlation coefficient (R) and mean absolute deviance (mAD). Evaluation results are illustrated by heatmaps and scatter plots. When unknown content is involved, we derive evaluation metrics in both relative and absolute measurement scales.
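The two evaluation metrics named in panel c can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that mAD is the mean of the absolute differences between estimated and ground-truth weights, and the function names and example weights are hypothetical.

```python
import math


def pearson_r(estimated, truth):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(truth)
    mean_e = sum(estimated) / n
    mean_t = sum(truth) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(estimated, truth))
    ss_e = sum((e - mean_e) ** 2 for e in estimated)
    ss_t = sum((t - mean_t) ** 2 for t in truth)
    return cov / math.sqrt(ss_e * ss_t)


def mean_abs_deviance(estimated, truth):
    """Assumed definition of mAD: mean of |estimate - truth|."""
    return sum(abs(e - t) for e, t in zip(estimated, truth)) / len(truth)


# Hypothetical weights of one cell type across four mixtures.
truth = [0.10, 0.20, 0.30, 0.40]
estimated = [0.15, 0.15, 0.35, 0.35]

r = pearson_r(estimated, truth)
mad = mean_abs_deviance(estimated, truth)
print(f"R = {r:.3f}, mAD = {mad:.3f}")
```

The two metrics are complementary: R captures whether estimates follow the trend of the true weights, while mAD captures how far the estimates are from the true values.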
Cellular components and datasets involved in three testing frameworks and variance analysis
| Analysis | Cell types | Datasets |
|---|---|---|
| Variance analysis | CD8 T cells; whole blood; simulated mixtures (T, B, and mono) | GSE113590; GSE60424; GSE51984 |
| Sim1_simModel | Simulated mixtures (T, B, and mono) | GSE60424, GSE51984, GSE64655 |
| Sim1_libSize | Simulated mixtures (T, B, and mono) | |
| Sim2 | 6 gradients of cell types: Comp 5–10 | GSE60424, GSE51984, GSE64655, GSE115736 |
| Sim3 | 6 gradients of cell types: Comp 5–10 and one unknown component HCT116 | GSE60424, GSE51984, GSE64655, GSE115736, GSE118490 |
Six gradients of cell types:
Comp 5—T, B, monocytes, neutrophils, and NK cells
Comp 6—T, B, monocytes, neutrophils, NK cells, and eosinophils
Comp 7—T, B, monocytes, neutrophils, NK cells, eosinophils, and myeloid DC
Comp 8—T, B, monocytes, neutrophils, NK cells, eosinophils, myeloid DC, and CD34+ HSC
Comp 9—CD4 T, CD8 T, B, monocytes, neutrophils, NK cells, eosinophils, myeloid DC, and CD34+ HSC
Comp 10—CD4 T, CD8 T, naive B, memory B, monocytes, neutrophils, NK cells, eosinophils, myeloid DC, and CD34+ HSC
Fig. 2 Evaluation results of Sim1_simModel and noise structure comparisons between real and simulated data. a Heatmap of the summarized evaluation results based on the Pearson’s correlation coefficients and b rankings of the tested deconvolution methods in the Sim1_simModel. In each heatmap, row indexes refer to the tested methods and column indexes refer to the simulation models (negative binomial, log-normal, and normal). c, d Mean-variance plots of c real and d simulated data. e, f Sample-sample scatter plots of e real and f simulated data. r, Spearman’s correlation coefficient; d, Euclidean distance. g, h Density plots of CV (coefficient of variation) of g real and h simulated data. Real data are derived from GSE113590 and GSE60424 (Additional file 1: Figures S6 and S7 contain detailed variance analysis results for each dataset). All simulated data in Fig. 2 are based on simulations derived from GSE51984 with the P6 noise level. Results in a and b are in the tpm unit; results in c–h are in the count unit.
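The per-gene quantities behind the mean-variance and CV plots can be sketched as below. This is only an illustration under stated assumptions: the gene names and replicate counts are hypothetical, and the sample (n - 1) variance is assumed because the excerpt does not say which estimator was used.

```python
import statistics


def gene_variance_summary(replicate_values):
    """Return (mean, sample variance, CV) of one gene across replicates."""
    mean = statistics.mean(replicate_values)
    variance = statistics.variance(replicate_values)  # n - 1 denominator
    # CV is the standard deviation divided by the mean; undefined at mean 0.
    cv = statistics.stdev(replicate_values) / mean if mean != 0 else float("nan")
    return mean, variance, cv


# Hypothetical counts for two genes across three replicates.
expression = {
    "geneA": [10, 12, 14],
    "geneB": [100, 100, 100],
}

summary = {gene: gene_variance_summary(vals) for gene, vals in expression.items()}
for gene, (mean, variance, cv) in summary.items():
    print(f"{gene}: mean={mean:.1f} variance={variance:.1f} CV={cv:.3f}")
```

Plotting variance against mean for every gene gives a mean-variance plot, and the distribution of the CV values gives a density plot of the kind described in panels g and h.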
Fig. 3 Evaluation results of Sim1_libSize. a Heatmap of the summarized evaluation results based on the Pearson’s correlation coefficients and b rankings of the tested deconvolution methods. In each heatmap, row indexes refer to the tested methods, and column indexes refer to the quantification units (count, countNorm, cpm, and tpm).
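Two of the quantification units compared here, cpm and tpm, have widely used standard definitions, sketched below for a single sample. The counts and gene lengths are hypothetical, and countNorm is left out because its definition is not given in this excerpt.

```python
def cpm(counts):
    """Counts per million: scale raw counts by the library size."""
    library_size = sum(counts)
    return [c / library_size * 1e6 for c in counts]


def tpm(counts, lengths_kb):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_kb)]
    total_rate = sum(rates)
    return [rate / total_rate * 1e6 for rate in rates]


# Hypothetical raw counts and gene lengths (in kilobases) for three genes.
counts = [10, 20, 70]
lengths_kb = [1.0, 2.0, 7.0]

cpm_values = cpm(counts)
tpm_values = tpm(counts, lengths_kb)
print(cpm_values)
print(tpm_values)
```

In this example the three genes have the same read density per kilobase, so their tpm values are equal even though their cpm values differ. This shows why the choice of unit can change the input that a deconvolution method sees.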
Fig. 4 Evaluation results of Sim2. a, b Heatmaps of the summarized evaluation results based on the Pearson’s correlation coefficients with a “orthog” weight matrix and b real weight matrix. In each heatmap, row indexes refer to the tested methods, and column indexes refer to the cellular component numbers. c Scatter plots of estimated weights vs. ground truths of “real” mixtures with 10 cellular components. d, e Cell type-specific evaluation results of “real” mixtures consisting of 10 cellular components based on d Pearson’s correlation coefficient and e mean absolute deviance. In each heatmap, row indexes refer to the tested methods, column indexes refer to the cell types, and the last column “all” refers to the averaged evaluation results across all cell types.
Fig. 5 Evaluation results of Sim3. a, b Heatmaps of the summarized evaluation results based on the Pearson’s correlation coefficients on the a relative measurement scale and b absolute measurement scale. In each heatmap, row indexes refer to the tested methods, and column indexes refer to the types of tumor spike-ins (small, large, and mosaic). c, d Scatter plots of the estimated weights vs. ground truths of mixtures consisting of 5 cellular components and mosaic tumor spike-ins. c Estimated weights vs. relative ground truth. d Estimated weights vs. absolute ground truth.
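One plausible reading of the two measurement scales is sketched below. It rests on an assumption not confirmed by this excerpt: the absolute ground truth is taken to be each known cell type's fraction of the whole mixture (including the unknown tumor content), and the relative ground truth is those fractions renormalized to sum to 1 over the known cell types only. The function name and weights are hypothetical.

```python
def to_relative_scale(absolute_weights):
    """Renormalize known-component weights so that they sum to 1."""
    known_total = sum(absolute_weights.values())
    return {cell: w / known_total for cell, w in absolute_weights.items()}


# Hypothetical mixture: half of the content is an unknown tumor spike-in,
# so the known immune components sum to 0.5 on the absolute scale.
absolute_truth = {"T": 0.2, "B": 0.1, "mono": 0.2}
relative_truth = to_relative_scale(absolute_truth)

for cell in absolute_truth:
    print(f"{cell}: absolute={absolute_truth[cell]:.2f} "
          f"relative={relative_truth[cell]:.2f}")
```

Under this reading, a method whose output weights are forced to sum to 1 can match the relative ground truth well and still overestimate every component on the absolute scale.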