Gregor Sturm, Francesca Finotello, Florent Petitprez, Jitao David Zhang, Jan Baumbach, Wolf H Fridman, Markus List, Tatsiana Aneichyk.
Abstract
MOTIVATION: The composition and density of immune cells in the tumor microenvironment (TME) profoundly influence tumor progression and the success of anti-cancer therapies. Flow cytometry, immunohistochemistry staining and single-cell sequencing are often unavailable, so we rely on computational methods to estimate the immune-cell composition from bulk RNA-sequencing (RNA-seq) data. Various methods have been proposed recently, yet their capabilities and limitations have not been evaluated systematically. A general guideline leading the research community through cell type deconvolution is missing.
Year: 2019 PMID: 31510660 PMCID: PMC6612828 DOI: 10.1093/bioinformatics/btz363
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931
Table 1. Overview of cell type quantification methods providing gene signatures for immuno-oncology
| Tool | Abbrev. | Type | Score | Comparisons | Algorithm | Cell types | Reference |
|---|---|---|---|---|---|---|---|
| CIBERSORT | CBS | D | Immune cell fractions, relative to total immune cell content | Intra | | 22 immune cell types | |
| CIBERSORT abs. mode | CBA | D | Score of arbitrary units that reflects the absolute proportion of each cell type | Intra, inter | | 22 immune cell types | |
| EPIC | EPC | D | Cell fractions, relative to all cells in sample | Intra, inter | constrained least squares regression | 6 immune cell types, fibroblasts, endothelial cells | |
| MCP-counter | MCP | M | Arbitrary units, comparable between samples | Inter | mean of marker gene expression | 8 immune cell types, fibroblasts, endothelial cells | |
| quanTIseq | QTS | D | Cell fractions, relative to all cells in sample | Intra, inter | constrained least squares regression | 10 immune cell types | |
| TIMER | TMR | D | Arbitrary units, comparable between samples (not different cancer types) | Inter | linear least squares regression | 6 immune cell types | |
| xCell | XCL | M | Arbitrary units, comparable between samples | Inter | ssGSEA | 64 immune and non-immune cell types | |
Note: Methods can be conceptually divided into marker-gene-based approaches (M) and deconvolution-based approaches (D). The output scores of the methods have different properties and allow intra-sample comparisons between cell types, inter-sample comparisons of the same cell type, or both. All methods come with a set of cell type signatures, ranging from six immune cell types to 64 immune and non-immune cell types.
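The two families of methods in the table can be illustrated with a minimal Python sketch. It is not the implementation of any of the listed tools: the gene names, expression values and signature matrix are hypothetical, and the deconvolution step uses plain non-negative least squares solved by projected gradient descent, which omits the additional constraints (for example on the sum of the fractions) that individual tools may apply.

```python
from statistics import mean


def marker_score(expression, marker_genes):
    """Marker-gene-based approach (M): mean expression of the marker genes.

    The result is in arbitrary units, so it only supports comparing the
    same cell type between samples.
    """
    return mean(expression[gene] for gene in marker_genes)


def deconvolve_nnls(signature, bulk, n_iter=500):
    """Deconvolution approach (D): non-negative least squares.

    Finds fractions f >= 0 minimizing ||signature @ f - bulk||^2 with
    projected gradient descent. `signature` has one row per gene and one
    column per cell type.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # Conservative step size: the squared Frobenius norm bounds the largest
    # eigenvalue of signature^T signature, so this step cannot overshoot.
    step = 1.0 / (2.0 * sum(v * v for row in signature for v in row))
    fractions = [0.0] * n_types
    for _ in range(n_iter):
        residual = [
            sum(signature[g][k] * fractions[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        gradient = [
            2.0 * sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        fractions = [max(0.0, f - step * d) for f, d in zip(fractions, gradient)]
    return fractions


# Hypothetical marker genes and two hypothetical samples (M approach).
b_cell_markers = ["b_marker_1", "b_marker_2"]
sample_1 = {"b_marker_1": 4.0, "b_marker_2": 6.0, "other_gene": 1.0}
sample_2 = {"b_marker_1": 8.0, "b_marker_2": 12.0, "other_gene": 1.0}
score_1 = marker_score(sample_1, b_cell_markers)
score_2 = marker_score(sample_2, b_cell_markers)

# Hypothetical signature matrix (3 genes x 2 cell types) and a bulk sample
# mixed from known fractions (D approach).
signature = [[10.0, 0.0], [0.0, 8.0], [2.0, 2.0]]
true_fractions = [0.3, 0.7]
bulk = [sum(s * f for s, f in zip(row, true_fractions)) for row in signature]
estimated = deconvolve_nnls(signature, bulk)

print(score_1, score_2)
print([round(f, 4) for f in estimated])
```

The marker score only ranks samples for one cell type, whereas the deconvolution output assigns a value to every cell type within the same sample, which is what makes intra-sample comparisons possible.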
Fig. 1. (a) Correlation of predicted versus known cell type fractions on 100 simulated bulk RNA-seq samples generated from single-cell RNA-seq. Pearson’s r is indicated in each panel. Due to the lack of a corresponding signature, we estimated macrophages/monocytes with EPIC using the ‘macrophage’ signature and with MCP-counter using the ‘monocytic lineage’ signature as a surrogate. (b) Performance of the methods on three independent datasets that provide immune cell quantification by FACS. Different cell types are indicated in different colors. Pearson’s r has been computed as a single correlation on all cell types simultaneously. Note that only methods that allow both inter- and intra-sample comparisons (i.e. EPIC, quanTIseq, CIBERSORT absolute mode) can be expected to perform well here. (c–d) Performance on the three validation datasets per cell type. Schelker’s and Racle’s datasets have too few samples to be considered individually. The values indicate the Pearson correlation of the predictions with the cell type fractions determined using FACS. Blank squares indicate that the method does not provide a signature for the respective cell type. ‘n/a’ values indicate that no correlation could be computed because all predictions were zero. The asterisk (*) indicates that the ‘monocytic lineage’ signature was used as a surrogate to predict monocyte content. P-values: **** < 0.0001; *** < 0.001; ** < 0.01; * < 0.05; ns ≥ 0.05. P-values are not adjusted for multiple testing. Method abbreviations: see Table 1.
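The difference between the pooled correlation in panel (b) and the per-cell-type correlations in panels (c–d) can be sketched as follows. The fractions and scores are made-up numbers, chosen so that a hypothetical method tracks each cell type perfectly but on a different arbitrary scale per cell type. The function returns `None` when one input is constant, mirroring the ‘n/a’ case in which all predictions are zero.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; returns None if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return None  # e.g. all predictions are zero
    return cov / (sd_x * sd_y)


# Hypothetical known fractions (e.g. from FACS) for three samples.
known = {"B cell": [0.1, 0.2, 0.3], "T cell": [0.4, 0.5, 0.6]}
# Hypothetical scores in arbitrary units: proportional to the known fraction
# within each cell type, but with a different scale per cell type.
predicted = {"B cell": [1.0, 2.0, 3.0], "T cell": [0.4, 0.5, 0.6]}

# Per-cell-type correlation (inter-sample comparison).
per_type = {ct: pearson_r(known[ct], predicted[ct]) for ct in known}

# Single correlation over all cell types simultaneously.
pooled_known = [v for ct in known for v in known[ct]]
pooled_predicted = [v for ct in known for v in predicted[ct]]
pooled = pearson_r(pooled_known, pooled_predicted)

print(per_type)
print(pooled)
```

In this toy case each per-cell-type correlation is essentially perfect while the pooled correlation is negative, which is why only scores that are comparable both within and between samples can be expected to do well in the pooled evaluation.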
Fig. 2. Minimal detection fraction and background prediction level. For each panel, we created simulated bulk RNA-seq samples with an increasing amount of the cell type of interest and a background of 1000 cells randomly sampled from the other cell types. The dots show the mean predicted score across five independently simulated samples for each fraction of spike-in cells. The grey ribbon indicates the 95% confidence interval. The red line refers to the minimal detection fraction, i.e. the minimal fraction of an immune cell type needed for a method to reliably detect its abundance as significantly different from the background (P-value < 0.05, one-sided t-test). The blue line refers to the background prediction level, i.e. the average estimate of a method while the cell type of interest is absent. Method abbreviations: see Table 1.
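The two quantities defined in this caption can be sketched in Python as below. The scores are invented for illustration, and several details are assumptions rather than the authors' procedure: the test is a pooled-variance two-sample t-test, the critical value is hard-coded for two groups of five samples (8 degrees of freedom), and the minimal detection fraction is taken as the smallest tested fraction whose scores exceed the background significantly.

```python
from math import sqrt
from statistics import mean, variance

# One-sided critical value of Student's t for alpha = 0.05 and 8 degrees of
# freedom (two groups of five samples). Assumption: pooled-variance t-test.
T_CRIT_DF8 = 1.860


def pooled_t(sample, background):
    """Two-sample t statistic with pooled variance (sample vs. background)."""
    n1, n2 = len(sample), len(background)
    pooled_var = (
        (n1 - 1) * variance(sample) + (n2 - 1) * variance(background)
    ) / (n1 + n2 - 2)
    return (mean(sample) - mean(background)) / sqrt(pooled_var * (1 / n1 + 1 / n2))


def minimal_detection_fraction(scores_by_fraction, background, t_crit=T_CRIT_DF8):
    """Smallest spike-in fraction whose scores are significantly above background."""
    for fraction in sorted(scores_by_fraction):
        if pooled_t(scores_by_fraction[fraction], background) > t_crit:
            return fraction
    return None  # never distinguishable from the background


# Hypothetical scores for five simulated samples without the cell type ...
background_scores = [0.010, 0.012, 0.009, 0.011, 0.008]
# ... and for five simulated samples per spike-in fraction.
spike_in_scores = {
    0.01: [0.011, 0.009, 0.012, 0.010, 0.013],
    0.05: [0.030, 0.034, 0.029, 0.031, 0.036],
    0.10: [0.060, 0.064, 0.058, 0.066, 0.062],
}

background_level = mean(background_scores)
detection_fraction = minimal_detection_fraction(spike_in_scores, background_scores)

print(background_level, detection_fraction)
```

A method with a high background prediction level reports the cell type even when it is absent, while a high minimal detection fraction means that small amounts of the cell type go unnoticed.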
Fig. 3. Spillover analysis. All methods were applied to simulated bulk RNA-seq samples containing only cells of one of the nine immune and non-immune cell types. The outer circle indicates the different samples; the connections within refer to the methods’ predictions. The size of a border segment reflects the predicted score for that cell type. A connection leading to a border segment of the same color indicates a correctly predicted cell type fraction; a connection leading to a different color indicates spillover, i.e. a prediction of a different cell type than actually present. Note that not all methods provide signatures for all cell types; in that case, the connections indicate which cell types are wrongly predicted when a method is confronted with a cell type it has not been optimized for. CD4+ T cell samples are an aggregate of regulatory and non-regulatory CD4+ T cells. The numbers in the center indicate the overall noise ratio, i.e. the fraction of predictions that are attributed to a wrong cell type. Method abbreviations: see Table 1.
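The overall noise ratio described in this caption can be approximated with the short sketch below. The caption does not spell out how scores are scaled or aggregated, so this version simply assumes that the scores of one method are summed over all pure samples and that the noise ratio is the share of that total assigned to a cell type other than the one actually present. The predictions are hypothetical.

```python
def noise_ratio(predictions_by_sample):
    """Share of the total predicted score assigned to a wrong cell type.

    `predictions_by_sample` maps the true cell type of a pure sample to a
    dict of {predicted cell type: score}. Returns None if nothing is predicted.
    """
    total = 0.0
    off_target = 0.0
    for true_type, predictions in predictions_by_sample.items():
        for predicted_type, score in predictions.items():
            total += score
            if predicted_type != true_type:
                off_target += score  # spillover into another cell type
    return off_target / total if total > 0 else None


# Hypothetical predictions of one method on two pure samples.
example_predictions = {
    "B cell": {"B cell": 0.8, "T cell CD8+": 0.1, "NK cell": 0.1},
    "NK cell": {"NK cell": 0.6, "T cell CD8+": 0.4},
}

ratio = noise_ratio(example_predictions)
print(round(ratio, 3))
```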
Fig. 4. (a) Background prediction level of quanTIseq before and after removing nonspecific signature genes. This plot is based on the same five simulated samples used to determine the background prediction level in the Mac/Mono panel of Figure 2. (b) B cell score on ten simulated pDC samples before and after removing nonspecific signature genes. Method abbreviations: see Table 1.
Table 2. Guidelines for method selection
| Cell type | Recommended methods | Overall performance | Absolute score | No background predictions |
|---|---|---|---|---|
| B cell | EPIC | ++ | ++ | + |
| | MCP-counter | ++ | − | − |
| T cell CD4+ | EPIC | ++ | ++ | − |
| | xCell | ++ | − | ++ |
| T cell CD4+ non-regulatory | quanTIseq | + | ++ | + |
| | xCell | + | − | ++ |
| T cell regulatory | quanTIseq | ++ | ++ | − |
| | xCell | ++ | − | ++ |
| T cell CD8+ | quanTIseq | ++ | ++ | − |
| | EPIC | ++ | ++ | − |
| | MCP-counter | ++ | − | − |
| | xCell | + | − | ++ |
| Natural killer cell | EPIC | ++ | ++ | + |
| | MCP-counter | ++ | − | − |
| Macrophage / Monocyte | xCell | | − | ++ |
| | EPIC | + | ++ | + |
| | MCP-counter | ++ | − | − |
| Cancer-associated fibroblast | EPIC | ++ | ++ | + |
| | MCP-counter | ++ | − | − |
| Endothelial cell | EPIC | ++ | ++ | + |
| | xCell | ++ | − | ++ |
| Dendritic cell | None of the methods can be recommended to estimate overall DC content. MCP-counter and quanTIseq can be used to profile mDCs. | | | |
Note: For each cell type, we recommend which method to use, highlighting advantages and possible limitations. Overall performance: indicates how well predicted fractions correspond to known fractions in the benchmark. Absolute score: the method provides an absolute score that can be interpreted as a cell fraction. Methods that do not provide an absolute score only support inter-sample comparisons within the same experimental dataset, i.e. the score is only meaningful in relation to another sample. No background predictions: indicates whether a method avoids predicting a cell type when it is absent.