Erik Christensen, Alaine Naidas, David Chen, Mia Husic, Parisa Shooshtari.
Abstract
MOTIVATION: The tumour microenvironment (TME) contains various cells including stromal fibroblasts, immune and malignant cells, and its composition can be elucidated using single-cell RNA sequencing (scRNA-seq). scRNA-seq datasets from several cancer types are available, yet we lack a comprehensive database to collect and present related TME data in an easily accessible format.
Year: 2022 PMID: 36084081 PMCID: PMC9462821 DOI: 10.1371/journal.pone.0272302
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
A list of search parameters that can be passed to queryTME in order to filter the available datasets.
| Search Parameter | Description |
|---|---|
| geo_accession | Search by GEO accession |
| score_type | Search by type of score available |
| has_signatures | Search by presence of cell type signature gene sets |
| has_truth | Search by presence of cell type annotations |
| tumour_type | Search by type of tumour |
| author | Search by first author |
| journal | Search by publication journal |
| year | Search by publication year |
| pmid | Search by PMID |
| sequence_tech | Search by sequencing technology |
| organism | Search by source organism |
| sparse | Return expression in sparse matrices |
| download_format | Specify a list of score formats to download. Additional formats will be stored in altExps |
Fig 1. A visualisation of the various tissue types included in TMExplorer.
TMExplorer includes 48 TME scRNA-seq datasets covering 26 different human cancer types from 13 different sites and 4 different mouse cancer types. TMExplorer is generalizable and extendable, and new datasets are added to the database as they become available. Fig 1 was created with BioRender.com.
List of tumor microenvironment scRNA-seq datasets included in TMExplorer.
| Dataset | Cancer type | Sequencing Technology | Number of tumors | Number of cells | Number of genes | Annotation available? | Gene signature available? |
|---|---|---|---|---|---|---|---|
| | Glioblastoma | SMART-seq | 5 human primary glioblastoma tumors | 1,456 | 5,796 | Yes | No |
| | Metastatic melanoma | SMART-seq2 | 19 human melanoma tumors | 4,645 | 23,686 | Yes | Yes |
| | Oligodendroglioma | SMART-seq2 | 6 human IDH-mutant oligodendroglioma tumors | 4,347 | 23,686 | Yes | Yes |
| | Astrocytoma | SMART-seq2 | 10 human IDH-mutant astrocytoma tumors | 6,341 | 23,686 | No | No |
| | Colorectal cancer | Fluidigm C1 | 11 human primary colorectal cancer tumors | 375 | 57,241 | Yes | Yes |
| | Breast cancer | Fluidigm C1 | 11 human primary breast cancer tumors | 563 | 57,915 | Yes | Yes |
| | Head and neck squamous cell carcinoma | SMART-seq2 | 18 human primary oral cavity tumors and 5 lymph node metastases | 5,902 | 21,884 | Yes | Yes |
| | Chronic myeloid leukemia | SMART-seq2 | 20 human bone marrow aspirates | 2,287 | 23,384 | No | No |
| | H3 K27M-mutant glioma | SMART-seq2 | 6 human primary H3K27M-glioma tumors | 4,058 | 23,686 | Yes | No |
| | Melanoma | SMART-seq2 | 33 human melanoma tumors | 7,186 | 23,686 | Yes | Yes |
| | Acute myeloid leukemia | Seq-Well | 40 human bone marrow aspirates | 23,383 | 27,899 | No | No |
| | Pancreatic cancer | Tang Protocol | 5 mice with pancreatic cancer, 1 mouse embryonic fibroblast cell line, 1 mouse pancreatic cancer cell line, 1 control mouse | 187 | 29,018 | No | No |
| | Prostate cancer | ABI SOLiD | 18 patients with metastatic prostate cancer, 4 patients with localized prostate cancer, 12 bulk primary prostate tumors, 4 prostate cancer cell lines | 169 | 21,696 | No | No |
| | Breast cancer | TruSeq | 2 ER+/HER2- breast cancer patients, 14 triple negative breast cancer patients | 74 | 23,368 | No | No |
| | Breast cancer | inDrop | 8 human breast carcinomas | 46,016 | 14,875 | No | No |
| | Non-small cell lung carcinoma | 10x Genomics | 5 human non metastatic lung squamous carcinoma tumors | 51,775 | 22,533 | Yes | Yes |
| | Melanoma | SMART-seq2 | Mouse tumors | 6,422 | 26,946 | No | No |
| | Pancreatic ductal adenocarcinoma | 10x Genomics | 24 human primary pancreatic ductal adenocarcinoma tumors, 11 control pancreases | 57,530 | 24,005 | Yes | Yes |
| | Glioblastoma | SMART-seq2 | 4 human glioblastoma tumors | 3,589 | 23,465 | No | No |
| | Mixed cancer: Melanoma, breast mammary carcinoma, Lewis lung carcinoma, colon carcinoma, fibrosarcoma | 10x Genomics | 1 mouse melanoma tumor, 1 mouse breast mammary carcinoma tumor, 1 mouse Lewis lung carcinoma tumor, 2 different mouse colon carcinoma tumors, 1 mouse fibrosarcoma tumor | 10,473 | 27,998 | No | No |
| | Glioblastoma | Fluidigm C1 | 1 human glioblastoma cancer cell line, 1 normal neural stem cell line | 134 | 21,209 | No | No |
| | Nasopharyngeal carcinoma | 10x Genomics | 15 human nasopharyngeal carcinoma tumors | 48,584 | 24,720 | Yes | No |
| | Pancreatic ductal adenocarcinoma | 10x Genomics | 16 human pancreatic ductal adenocarcinoma tumors | 14,926 | 22,217 | No | Yes |
| | Ependymoma | 10x Genomics | 26 human ependymoma tumors | 18,500 | 23,580 | Yes | No |
| | Gastric cancer | 10x Genomics | 13 human gastric tumors | 56,440 | 22,910 | No | No |
| | Breast cancer | 10x Genomics | 4 mouse breast cancer tumours | 13,745 | 31,053 | No | No |
| | Anaplastic thyroid cancer | 10x Genomics | 5 human anaplastic thyroid tumors | 19,568 | 33,540 | No | No |
| | Breast ductal carcinoma | 10x Genomics | 1 human breast ductal carcinoma tumor | 1,480 | 33,694 | No | No |
| | Triple negative breast cancer | 10x Genomics | 3 human triple negative breast cancer tumors | 2,663 | 33,964 | No | No |
| | Triple negative breast cancer | 10x Genomics | 2 human triple negative breast cancer tumors | 6,281 | 33,538 | No | No |
| | Breast invasive ductal carcinoma | 10x Genomics | 2 human breast invasive ductal carcinoma tumors | 6,209 | 33,540 | No | No |
| | Merkel cell carcinoma | 10x Genomics | 2 human primary Merkel cell carcinoma tumors | 25,066 | 11,072 | No | No |
| | Thymic cancer | 10x Genomics | 7 human primary thymic cancer tumors | 74,780 | 33,694 | No | Yes |
| | Merkel cell carcinoma | 10x Genomics | 2 primary Merkel cell carcinoma tumors from 1 human patient at 2 timepoints | 7,432 | 21,861 | No | No |
| | Lung adenocarcinoma | SMART-seq | 2 primary human lung adenocarcinoma tumors | 201 | 57,820 | No | No |
| | Ewing sarcoma | 10x Genomics | 3 Ewing sarcoma patient-derived xenograft samples | 97 | 56,764 | No | No |
| | Prostate cancer | Seq-Well S^3 | 6 prostate biopsies from 3 different patients, 4 radical prostatectomies with tumor-only samples from 4 patients, and 4 radical prostatectomies with matched normal samples from 4 patients | 53,765 | 19,665 | No | Yes |
| | Nasopharyngeal carcinoma | 10x Genomics | 10 human nasopharyngeal carcinoma tumor-blood paired samples | 176,447 | 20,930 | No | Yes |
| | Head and neck squamous cell carcinoma | 10x Genomics | 18 primary human head and neck squamous cell carcinoma tumors | 61,221 | 33,545 | No | Yes |
| | Ependymoma | SMART-seq2 | 20 fresh surgical tumor specimens from 18 ependymoma patients, eight patient-derived cell models, and two patient-derived xenograft models | 6,739 | 20,447 | Yes | Yes |
| | Colon cancer | SMART-seq2 | 18 primary human colorectal cancer tumors | 43,817 | 13,538 | No | Yes |
| | Pancreatic ductal adenocarcinoma | 10x Genomics | 16 primary human pancreatic ductal adenocarcinoma tumors | 55,652 | 32,738 | No | No |
| | Pancreatic ductal adenocarcinoma | 10x Genomics | 16 metastatic human pancreatic ductal adenocarcinoma tumors | 17,889 | 33,694 | No | Yes |
| | Pancreatic ductal adenocarcinoma | inDrop | 11 primary human pancreatic cancer tumors | 19,738 | 4,343 | No | No |
| | Non-small cell lung cancer | 10x Genomics | 42 primary human non-small cell lung cancer tumors | 89,887 | 29,527 | No | No |
| | Gastric cancer | 10x Genomics | 47 patient biopsies consisting of 24 gastric cancer lesions and 23 adjacent normals | 13,113 | 8,705 | No | No |
| | Gastric cancer | 10x Genomics | 48 primary human gastric cancer tumors | 158,641 | 26,571 | No | Yes |
| | Lung adenocarcinoma | 10x Genomics | 11 tumour, 11 distant normal lung, 10 normal lymph node, and 10 metastatic brain tissue samples from patients without prior treatment. 7 metastatic lymph node and 4 lung tumour tissue samples from advanced stage patients. | 208,506 | 29,634 | Yes | Yes |
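The two availability columns in the table correspond to the has_truth and has_signatures search parameters. As a rough illustration of what filtering on them means, the Python sketch below applies both criteria to a few rows copied from the table. The dictionary keys and the `filter_datasets` helper are hypothetical names chosen for this example, and combining criteria with a logical AND is an assumption; none of this is TMExplorer's own code.

```python
# Illustrative only: a handful of rows copied from the dataset table.
# The keys below are hypothetical names, not TMExplorer's own schema.
DATASETS = [
    {"cancer_type": "Glioblastoma", "cells": 1456,
     "has_truth": True, "has_signatures": False},
    {"cancer_type": "Metastatic melanoma", "cells": 4645,
     "has_truth": True, "has_signatures": True},
    {"cancer_type": "Astrocytoma", "cells": 6341,
     "has_truth": False, "has_signatures": False},
    {"cancer_type": "Non-small cell lung carcinoma", "cells": 51775,
     "has_truth": True, "has_signatures": True},
    {"cancer_type": "Thymic cancer", "cells": 74780,
     "has_truth": False, "has_signatures": True},
]


def filter_datasets(rows, **criteria):
    """Keep rows whose fields equal every given criterion (assumed logical AND)."""
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in criteria.items())
    ]


# Datasets usable for benchmarking cell type identification:
# they need both truth labels and signature gene sets.
benchmarkable = filter_datasets(DATASETS, has_truth=True, has_signatures=True)
for row in benchmarkable:
    print(row["cancer_type"], row["cells"])
```

Of the five sample rows, only the metastatic melanoma and non-small cell lung carcinoma datasets satisfy both criteria.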
Fig 2. The format of the SingleCellExperiment objects containing TME datasets.
The Assay is a matrix or dgCMatrix containing the gene expression table, named according to the type of score (e.g. an Assay containing raw counts would be named “Counts”); colData is a DataFrame that describes the cells in the dataset, with one row per column of the Assay; Metadata is a named list of additional metadata objects describing the dataset. A SingleCellExperiment object may contain one or more AltExps, which are nested SingleCellExperiment objects, each holding a different score type in its Assay.
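The layout described in this caption can be mirrored in a few lines of Python to make the shape constraint explicit. This is only a plain-Python stand-in, assuming lists in place of matrices and DataFrames; `DatasetSketch` and its field names are hypothetical and are not part of TMExplorer or the Bioconductor SingleCellExperiment class.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DatasetSketch:
    """Plain-Python stand-in for the layout in Fig 2 (not the Bioconductor class)."""
    assay_name: str                     # score type, e.g. "counts"
    assay: List[List[float]]            # genes x cells expression matrix
    col_data: List[Dict[str, Any]]      # one record per cell (assay column)
    metadata: Dict[str, Any] = field(default_factory=dict)
    alt_exps: Dict[str, "DatasetSketch"] = field(default_factory=dict)

    def __post_init__(self):
        n_cells = len(self.assay[0]) if self.assay else 0
        if any(len(row) != n_cells for row in self.assay):
            raise ValueError("assay rows must all have the same number of cells")
        if len(self.col_data) != n_cells:
            raise ValueError("col_data needs exactly one row per assay column")


# Toy values, invented for illustration: two genes measured in two cells.
cells = [{"cell": "cell_1"}, {"cell": "cell_2"}]
tpm = DatasetSketch("tpm", [[2.5, 0.0], [0.0, 3.5]], cells)
counts = DatasetSketch(
    "counts",
    [[5, 0], [0, 7]],
    cells,
    metadata={"organism": "Homo sapiens"},
    alt_exps={"tpm": tpm},              # a second score type, nested
)
print(counts.assay_name, list(counts.alt_exps))
```

The check in `__post_init__` enforces the one relationship the caption states explicitly: the per-cell table has as many rows as the expression matrix has columns.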
Fig 3. An overview of the main functions of TMExplorer.
A. queryTME allows users to search the database and return datasets either as a descriptive table or as a list of SingleCellExperiment objects for analysis. B. saveTME allows users to write datasets to disk. For each dataset written to disk, up to three files are created: a table storing the expression data as either a CSV or Matrix Market file, depending on whether a dense or sparse matrix is passed to the function; a table containing the cells and their truth labels, if available; and a table containing the cell type signature gene sets, if available.
Fig 4. An example workflow of using TMExplorer to obtain datasets for downstream analysis in Python and R.
Users start by calling queryTME to request all datasets that have both cell type labels and cell type signature gene sets; this returns the matching datasets as a list of SingleCellExperiment objects. For R-based algorithms, users can pass the SingleCellExperiment objects directly where that is supported, or pass the individual components required. For Python-based algorithms, saveTME can be used to save the files for each dataset to disk, which can then be opened in Python for analysis.
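The Python side of this workflow might look like the sketch below, which reads a dense expression table and a cell label table using only the standard library. The file names and the assumed layout (genes as rows, cells as columns, a header row in each file) are hypothetical and should be checked against the files that saveTME actually writes. A sparse Matrix Market file would need a different reader, for example `scipy.io.mmread`.

```python
import csv
import tempfile
from pathlib import Path


def load_expression_csv(path):
    """Read a dense expression table: header row of cell IDs, one row per gene."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        cells = next(reader)[1:]        # first header field labels the gene column
        genes, matrix = [], []
        for row in reader:
            if not row:                 # tolerate blank lines
                continue
            genes.append(row[0])
            matrix.append([float(value) for value in row[1:]])
    return genes, cells, matrix


def load_labels_csv(path):
    """Read a two-column table mapping each cell ID to its truth label."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader)                    # skip the header row
        return {row[0]: row[1] for row in reader if row}


# Demonstration on tiny files written to a temporary directory.
# File names and contents are invented for illustration.
with tempfile.TemporaryDirectory() as tmp:
    expr_path = Path(tmp) / "example_expression.csv"
    label_path = Path(tmp) / "example_labels.csv"
    expr_path.write_text("gene,cell_1,cell_2\nCD3E,5,0\nEPCAM,0,7\n")
    label_path.write_text("cell,label\ncell_1,T cell\ncell_2,Malignant\n")
    genes, cells, matrix = load_expression_csv(expr_path)
    labels = load_labels_csv(label_path)

print(genes, cells, matrix)
print(labels)
```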
Fig 5. A summary of TMExplorer contents.
Here, we summarise the number of human and mouse datasets in TMExplorer (A); the number of datasets generated by each sequencing technology (B); the number of datasets for which cell type labels and gene signatures are available (C); and the distributions of score types (D) and tumour types (E) across datasets. In addition, boxplots of the number of cells, genes, tumours and patients across datasets are provided (F).
Fig 6. A flowchart of data query and analysis using TMExplorer.
TMExplorer provides search and analysis capabilities: users can look up and retrieve their datasets of interest; view the expression matrix, cell type labels and metadata, including gene signatures (if available); and then either continue in R for data visualization and analysis, or save the datasets in CSV format for analysis in their programming language of choice (e.g. Python).
Fig 7. A case study on using TMExplorer to identify cell types.
This case study shows how TMExplorer can be used to obtain datasets for cell cluster labelling via Seurat and GSVA. queryTME can be used to return the datasets that have both the gene signatures and the cell type annotations required for testing the automated identification of cell types. The expression data can be passed to Seurat for cell clustering, and the gene signatures can be used by GSVA to identify the cell types in Seurat’s clusters. Finally, the cell type annotations can be used as the truth labels to measure the performance of the results obtained by Seurat clustering followed by GSVA.
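The final evaluation step can be illustrated with a short Python sketch. It assumes that each cluster has already been assigned one cell type, and it uses plain accuracy as the performance measure; the caption does not name a metric, so that choice is an assumption. The function names and toy values are hypothetical.

```python
def cluster_to_cell_labels(cell_clusters, cluster_types):
    """Give each cell the cell type that was assigned to its cluster."""
    return {cell: cluster_types[cluster] for cell, cluster in cell_clusters.items()}


def accuracy(truth, predicted):
    """Fraction of cells with a truth label whose predicted label matches it."""
    shared = [cell for cell in truth if cell in predicted]
    if not shared:
        raise ValueError("no cells in common between truth and predictions")
    correct = sum(truth[cell] == predicted[cell] for cell in shared)
    return correct / len(shared)


# Toy values, invented for illustration.
cell_clusters = {"c1": 0, "c2": 0, "c3": 1, "c4": 1}    # clustering output
cluster_types = {0: "T cell", 1: "Malignant"}           # per-cluster cell type
truth = {"c1": "T cell", "c2": "T cell", "c3": "Malignant", "c4": "Fibroblast"}

predicted = cluster_to_cell_labels(cell_clusters, cluster_types)
print(accuracy(truth, predicted))   # 3 of 4 cells match, so 0.75
```

Accuracy treats every cell equally, so rare cell types contribute little to the score; a per-class measure may suit datasets with unbalanced cell type composition better.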
Fig 8. A case study on using TMExplorer for inferring CNVs.
This case study shows how TMExplorer can be used to obtain multiple datasets for a specific tumour type for use with CNV-based separation methods, such as CONICSmat. queryTME returns the datasets of a specific tumour type, such as glioblastoma. These datasets can then be passed directly to large-scale CNV inference methods, such as CONICSmat.