William Gasper¹, Francesca Rossi², Matteo Ligorio², Dario Ghersi¹.
Abstract
Single-cell RNA-sequencing is an invaluable research tool that allows gene expression to be investigated in heterogeneous cancer cell populations in ways that bulk RNA-seq cannot. However, normal (i.e., non-tumor) cells in cancer samples can confound the downstream analysis of single-cell RNA-seq data. Existing methods for distinguishing cancer cells from normal cells include copy number variation (CNV) inference, marker-gene expression analysis, and expression-based clustering. This work aims to extend these approaches for identifying cancer cells in single-cell RNA-seq samples by incorporating variant calling and the identification of putative driver alterations. We found that putative driver alterations can be detected in single-cell RNA-seq data obtained with full-length transcript technologies, and we observed that a subset of cells in tumor samples is enriched for putative driver alterations compared with normal cells. Furthermore, we show that the number of putative driver alterations and inferred CNV are not correlated in every sample. Taken together, our findings suggest that augmenting existing cancer-cell filtering methods with variant calling and analysis can increase the number of tumor cells that can be confidently included in downstream analyses of single-cell full-length transcript RNA-seq datasets.
Year: 2022 PMID: 36191033 PMCID: PMC9560611 DOI: 10.1371/journal.pcbi.1010576
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.779
Fig 1. Heatmap visualizing inferred CNV results obtained using CopyKAT [10].
Columns represent individual cells; rows represent genes arranged by chromosomal position. Alternating bars on the y-axis indicate chromosomes. Cells are clustered hierarchically within each patient using expression score matrices. Pink brackets indicate cells with low inferred CNV. Panel (A) shows results for the TNBC dataset and panel (B) for the CRC dataset.
Fig 2. Heatmap indicating alteration status for 736 cells for the 25 most frequent oncogenic, predicted oncogenic, and likely oncogenic alterations in the CRC dataset.
Alterations are annotated using OncoKB. Absence of an alteration is noted when a cell has a read depth of at least 5 for all bases corresponding to the residue. For residues without an oncogenic alteration and with read depths less than 5 for all corresponding bases, the presence or absence of an alteration is not characterized (labeled as “Insufficient coverage”). Common recurrent driver mutations are shown in bold.
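The labeling rule in the Fig 2 caption can be sketched as a small classifier. This is a minimal illustration, not the authors' code: the function and label names are hypothetical, a detected oncogenic alteration is assumed to be reported as present regardless of depth, and because the caption does not say how a residue is labeled when only some of its bases reach a depth of 5, the sketch reports that case separately rather than guessing.

```python
from enum import Enum
from typing import Sequence

# Read-depth cutoff taken from the Fig 2 caption.
MIN_DEPTH = 5


class ResidueStatus(Enum):
    # Label names are illustrative, not taken from the authors' code.
    ALTERED = "Oncogenic alteration"
    ABSENT = "No alteration"
    INSUFFICIENT = "Insufficient coverage"
    MIXED = "Mixed coverage (not specified in the caption)"


def residue_status(has_oncogenic_alteration: bool,
                   base_depths: Sequence[int]) -> ResidueStatus:
    """Label one cell at one residue, given the read depth at each of its bases."""
    if not base_depths:
        raise ValueError("a residue needs at least one base depth")
    if has_oncogenic_alteration:
        # Assumed: a detected oncogenic alteration is reported as present.
        return ResidueStatus.ALTERED
    if all(d >= MIN_DEPTH for d in base_depths):
        # Every base is covered well enough to call the alteration absent.
        return ResidueStatus.ABSENT
    if all(d < MIN_DEPTH for d in base_depths):
        return ResidueStatus.INSUFFICIENT
    # Hypothetical handling: some bases pass the cutoff and some do not.
    return ResidueStatus.MIXED


print(residue_status(False, [7, 9, 12]).value)  # No alteration
print(residue_status(False, [2, 0, 3]).value)   # Insufficient coverage
```

Keeping the mixed-coverage case as its own label makes it easy to count how often the caption's two rules leave a residue undecided.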
Fig 3. Relationship between putative driver alteration counts and inferred CNV for cells from normal tissues (left) and tumors (right).
(A) Cells from the cancer datasets (right), grouped by primary tumor site, with all normal cells pooled (left). (B) TNBC cells, grouped by patient, compared with normal cells grouped by tissue type. (C) CRC cells, grouped by patient, compared with normal cells grouped by tissue type. Higher mean absolute CNV values indicate predicted structural alterations resulting in copy number variation; lower values suggest limited CNV. Dashed rectangles in (A) mark regions of interest: groups of cells that might be identified as cancer cells by either CNV inference or putative driver alteration count. The lower bound of each dashed rectangle is the 99th percentile of the corresponding value across the 4,415 normal cells amenable to both CopyKAT and variant analysis. The dashed polygon in (B) marks cells of interest, with either high CNV scores or high putative driver counts, that might be selected for downstream analyses. In (A), the first, second, and third quartiles for tumor cells are indicated by dashed lines and bold labels along the respective axes.
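The per-cell CNV score and the 99th-percentile cutoffs that bound the dashed rectangles can be sketched as follows. Two assumptions are made here: each cell's inferred CNV is taken as a vector of per-gene estimates centred on zero (the general shape of a CopyKAT-style gene-by-cell matrix), and the percentile uses linear interpolation, which is NumPy's default; the caption does not state which interpolation was used. Function names and toy values are hypothetical.

```python
import statistics
from typing import Dict, Sequence, Tuple


def mean_abs_cnv(cnv_values: Sequence[float]) -> float:
    """Mean absolute inferred CNV for one cell (values assumed centred on 0)."""
    return statistics.fmean(abs(v) for v in cnv_values)


def p99(values: Sequence[float]) -> float:
    """99th percentile with linear interpolation (needs at least two values)."""
    # method="inclusive" interpolates the same way as numpy.percentile's default.
    return statistics.quantiles(values, n=100, method="inclusive")[98]


def flag_cells(tumor: Dict[str, Tuple[float, int]],
               normal: Dict[str, Tuple[float, int]]) -> Dict[str, Tuple[bool, bool]]:
    """Flag tumor cells above the normal-cell 99th percentile on each axis.

    Both dicts map cell_id -> (mean_abs_cnv, putative_driver_count).
    Returns cell_id -> (above CNV cutoff, above driver-count cutoff).
    """
    cnv_cut = p99([cnv for cnv, _ in normal.values()])
    driver_cut = p99([n for _, n in normal.values()])
    return {cell: (cnv > cnv_cut, n > driver_cut)
            for cell, (cnv, n) in tumor.items()}


# Toy normal reference: 100 cells with small CNV scores and 0-2 drivers.
normal = {f"n{i}": (i / 1000, i % 3) for i in range(100)}
tumor = {"t1": (0.2, 0), "t2": (0.01, 5), "t3": (0.05, 1)}
print(flag_cells(tumor, normal))
# {'t1': (True, False), 't2': (False, True), 't3': (False, False)}
```

A cell flagged on either axis falls inside one of the dashed regions of interest; cells `t1` and `t2` illustrate the two ways a cell can qualify.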
Correlation coefficients and statistical significance for the relationship between inferred CNV and putative driver alteration counts among the cells of each patient.
p-values less than 0.05 are shown in bold.
| Patient | Pearson’s r | p-value | Spearman’s ρ | p-value |
|---|---|---|---|---|
| PT039 | 0.361 | | 0.306 | |
| PT126 | 0.308 | | 0.293 | |
| PT081 | 0.128 | 0.109 | 0.122 | 0.127 |
| PT058 | -0.077 | 0.57 | 0.099 | 0.465 |
| PT084 | 0.014 | 0.839 | 0.039 | 0.584 |
| PT089 | 0.025 | 0.716 | 0.034 | 0.615 |
| col39 | 0.604 | | 0.592 | |
| col44 | 0.486 | | 0.437 | |
| col45 | 0.543 | | 0.489 | |
| col36 | 0.302 | 0. | 0.374 | |
| col48 | 0.28 | | 0.22 | |
| col38 | 0.153 | 0.225 | 0.161 | 0.201 |
| col40 | 0.249 | 0.092 | 0.151 | 0.31 |
| col47 | 0.059 | 0.738 | 0.162 | 0.354 |
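The per-patient coefficients in the table can be computed in outline with the sketch below, which takes Pearson's r on the raw values and Spearman's ρ as Pearson's r on average ranks (the usual treatment of ties, which matters here because driver counts are small integers). It is a standard-library illustration, not the authors' code, and the `(patient_id, mean_abs_cnv, driver_count)` input layout is an assumption. It computes coefficients only; p-values would come from a significance test such as those in `scipy.stats.pearsonr` and `scipy.stats.spearmanr`.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r; returns NaN when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as Pearson's r on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def per_patient(cells: Iterable[Tuple[str, float, int]]) -> Dict[str, Tuple[float, float]]:
    """cells: (patient_id, mean_abs_cnv, putative_driver_count) for each cell."""
    groups: Dict[str, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for patient, cnv, drivers in cells:
        groups[patient][0].append(cnv)
        groups[patient][1].append(float(drivers))
    return {p: (pearson(x, y), spearman(x, y)) for p, (x, y) in groups.items()}


print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```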
Gene set enrichment analysis (GSEA) results showing the top 10 enriched cancer hallmark gene sets, ranked by enrichment score, for groups of driver-enriched cells versus all others.
Top section shows enrichment when comparing cells with high putative driver counts to cells with low putative driver counts for the CRC dataset. Middle section shows enrichment when comparing ERBB2+/PIK3CA+ cells to cells lacking the characteristic ERBB2 or PIK3CA mutations for the CRC dataset. Bottom section shows enrichment when comparing cells with high putative driver counts to cells with low putative driver counts for the TNBC dataset.
| Comparison | Group | Gene set | Enrichment score | FDR q-value |
|---|---|---|---|---|
| CRC cells with high (> 5, | High drivers | Hallmark MYC Targets V1 | 0.62 | 0.0 |
| | | Hallmark MYC Targets V2 | 0.57 | 0.0 |
| | | Hallmark Oxidative Phosphorylation | 0.57 | 0.006 |
| | | Hallmark E2F Targets | 0.53 | 0.0 |
| | | Hallmark G2M Checkpoint | 0.49 | 0.0 |
| | | Hallmark Peroxisome | 0.46 | 0.0 |
| | | Hallmark PI3K AKT MTOR Signaling | 0.46 | 0.008 |
| | | Hallmark Fatty Acid Metabolism | 0.46 | 0.014 |
| | | Hallmark Protein Secretion | 0.45 | 0.126 |
| | | Hallmark MTORC1 Signaling | 0.44 | 0.015 |
| | Low drivers | Hallmark IL6 JAK STAT3 Signaling | -0.18 | 0.874 |
| | | Hallmark Inflammatory Response | -0.23 | 0.537 |
| CRC driver positive (ERBB2 L755S or PIK3CA H1047R, | Driver positive | Hallmark MYC Targets V1 | 0.51 | 0.0 |
| | | Hallmark Allograft Rejection | 0.5 | 0.0 |
| | | Hallmark TGF Beta Signaling | 0.49 | 0.0 |
| | | Hallmark Pancreas Beta Cells | 0.48 | 0.157 |
| | | Hallmark Interferon Alpha Response | 0.45 | 0.002 |
| | | Hallmark PI3K AKT MTOR Signaling | 0.45 | 0.002 |
| | | Hallmark Reactive Oxygen Species Pathway | 0.43 | 0.074 |
| | | Hallmark Interferon Gamma Response | 0.43 | 0.004 |
| | | Hallmark WNT Beta Catenin Signaling | 0.42 | 0.105 |
| | | Hallmark Apoptosis | 0.42 | 0.004 |
| | Driver negative | Hallmark KRAS Signaling Down | -0.18 | 0.862 |
| | | Hallmark Bile Acid Metabolism | -0.21 | 0.614 |
| | | Hallmark Coagulation | -0.23 | 0.588 |
| TNBC cells with high (> 5, | High drivers | Hallmark WNT Beta Catenin Signaling | 0.58 | 0.0 |
| | | Hallmark TGF Beta Signaling | 0.54 | 0.0 |
| | | Hallmark Hedgehog Signaling | 0.52 | 0.015 |
| | | Hallmark Notch Signaling | 0.51 | 0.005 |
| | | Hallmark IL6 JAK STAT3 Signaling | 0.5 | 0.0 |
| | | Hallmark Mitotic Spindle | 0.49 | 0.0 |
| | | Hallmark Hypoxia | 0.47 | 0.0 |
| | | Hallmark UV Response Up | 0.45 | 0.003 |
| | | Hallmark TNFA Signaling via NFKB | 0.44 | 0.019 |
| | | Hallmark Apical Junction | 0.42 | 0.003 |
| | Low drivers | Hallmark E2F Targets | -0.19 | 0.799 |
| | | Hallmark Interferon Alpha Response | -0.22 | 0.791 |
| | | Hallmark Oxidative Phosphorylation | -0.48 | 0.0 |
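The enrichment scores above come from gene set enrichment analysis. The sketch below computes the standard weighted running-sum enrichment score used by GSEA: walking down a ranked gene list, the sum steps up at genes in the set (weighted by the absolute ranking metric) and down at genes outside it, and the score is the signed maximum deviation from zero. It is an illustration under assumptions: the tool, ranking metric, and weighting behind this table are not stated here, the permutation-based normalization and FDR q-values are not reproduced, and names and toy inputs are hypothetical.

```python
from typing import Sequence, Set


def enrichment_score(ranked_genes: Sequence[str],
                     ranked_metric: Sequence[float],
                     gene_set: Set[str],
                     weight: float = 1.0) -> float:
    """Running-sum enrichment score for one gene set.

    ranked_genes is sorted from most up-regulated in the first group (e.g.
    high-driver cells) to most down-regulated; ranked_metric holds the ranking
    statistic for each gene in the same order.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(hits) - sum(hits)
    hit_norm = sum(abs(m) ** weight for m, h in zip(ranked_metric, hits) if h)
    if n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must overlap the list but not cover all of it")
    running = best = 0.0
    for m, h in zip(ranked_metric, hits):
        # Step up at set members, step down at non-members.
        running += abs(m) ** weight / hit_norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the signed maximum deviation from zero
    return best


genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
metric = [3.0, 2.5, 1.0, -0.5, -1.5, -2.0]
print(round(enrichment_score(genes, metric, {"g1", "g2"}), 3))  # 1.0
print(round(enrichment_score(genes, metric, {"g5", "g6"}), 3))  # -1.0
```

A set concentrated at the top of the list gives a positive score (enriched in the first group), and one concentrated at the bottom gives a negative score, matching the sign convention of the "High drivers" and "Low drivers" rows.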
Fig 4. Flowchart depicting a cancer-cell-filtering process.
The additional steps proposed in this work, variant calling and analysis, are enclosed by the green dashed rectangle. Solid rectangles indicate inputs and outputs; hexagons indicate processes.
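The variant-based branch in Fig 4 can be sketched as two steps: count putative driver alterations per cell from annotated variant calls, then add cells with enough drivers to the set produced by an existing filter. This is a hypothetical outline, not the authors' pipeline. The input tuple layout, the annotation label strings (chosen to mirror the oncogenic, likely oncogenic, and predicted oncogenic categories in Fig 2 and standing in for OncoKB output), the `min_drivers` threshold, and the choice to take the union of the two cell sets are all assumptions.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Assumed label strings standing in for OncoKB oncogenicity annotations.
DRIVER_LABELS = {"Oncogenic", "Likely Oncogenic", "Predicted Oncogenic"}


def putative_driver_counts(calls: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count distinct driver-labelled alterations per cell.

    calls: (cell_id, alteration, annotation_label) for each variant call.
    """
    seen = set()
    counts: Counter = Counter()
    for cell, alteration, label in calls:
        if label in DRIVER_LABELS and (cell, alteration) not in seen:
            seen.add((cell, alteration))
            counts[cell] += 1
    return counts


def augment_filter(existing_cancer_cells: Set[str],
                   calls: Iterable[Tuple[str, str, str]],
                   min_drivers: int) -> Set[str]:
    """Union of an existing filter's cancer cells and driver-enriched cells."""
    counts = putative_driver_counts(calls)
    driver_cells = {cell for cell, n in counts.items() if n >= min_drivers}
    return set(existing_cancer_cells) | driver_cells


# Toy calls; "GENE1 A1B" is a placeholder, not a real alteration.
calls = [
    ("c1", "KRAS G12D", "Oncogenic"),
    ("c1", "TP53 R175H", "Likely Oncogenic"),
    ("c1", "KRAS G12D", "Oncogenic"),       # duplicate call, counted once
    ("c2", "GENE1 A1B", "Likely Neutral"),  # not a putative driver
    ("c3", "PIK3CA H1047R", "Oncogenic"),
]
print(sorted(augment_filter({"c4"}, calls, min_drivers=2)))  # ['c1', 'c4']
```

Taking the union reflects the aim stated in the abstract of increasing the number of tumor cells retained; a stricter pipeline could instead require agreement between the two branches.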