Rui Hou, Elena Denisenko, Alistair R R Forrest.
Abstract
MOTIVATION: Single-cell RNA sequencing (scRNA-seq) measures gene expression at the resolution of individual cells. Massively multiplexed single-cell profiling has enabled large-scale transcriptional analyses of thousands of cells in complex tissues. In most cases, the true identity of individual cells is unknown and needs to be inferred from the transcriptomic data. Existing methods typically cluster (group) cells based on similarities of their gene expression profiles and assign the same identity to all cells within each cluster using the averaged expression levels. However, scRNA-seq experiments typically produce low-coverage sequencing data for each cell, which hinders the clustering process.
Year: 2019; PMID: 31028376; PMCID: PMC6853649; DOI: 10.1093/bioinformatics/btz292
Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937
Fig. 1. Annotation of single cells using scMatch. scRNA-seq expression profiles are compared against a reference database. Matching samples from the reference database are then ranked from highest to lowest similarity. In top-match mode, the cell is annotated with the label of the reference sample with the highest similarity. In ontology mode, the cell ontology term with the highest average similarity is used to annotate the query cell. Shapes represent the features of a unique gene expression profile.
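The two annotation modes described in the caption can be sketched as follows. This is a minimal illustration, not the scMatch implementation: the function names, the toy similarity values and the labels are hypothetical, and the sketch assumes that ontology mode averages over all reference samples sharing a term, which the caption does not specify.

```python
from collections import defaultdict


def annotate_top_match(similarities, sample_labels):
    """Return the label of the single most similar reference sample."""
    best = max(range(len(similarities)), key=lambda i: similarities[i])
    return sample_labels[best]


def annotate_ontology(similarities, ontology_terms):
    """Return the ontology term whose reference samples have the highest mean similarity."""
    grouped = defaultdict(list)
    for score, term in zip(similarities, ontology_terms):
        grouped[term].append(score)
    return max(grouped, key=lambda term: sum(grouped[term]) / len(grouped[term]))


# Hypothetical similarities between one query cell and four reference samples.
similarities = [0.91, 0.60, 0.85, 0.88]
sample_labels = ["T cell, donor 1", "T cell, donor 2", "B cell, donor 1", "B cell, donor 2"]
ontology_terms = ["T cell", "T cell", "B cell", "B cell"]

top_label = annotate_top_match(similarities, sample_labels)
ontology_label = annotate_ontology(similarities, ontology_terms)
```

In this toy case the two modes disagree: the single best sample is a T cell, but the B cell samples are more similar on average, which is the kind of situation where the choice of mode matters.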
Table 1. Annotation recalls of scMatch on four deeply profiled cell lines
| Cell line | Remove zeros: Spearman (%) | Remove zeros: Pearson (%) | Keep zeros: Spearman (%) | Keep zeros: Pearson (%) |
|---|---|---|---|---|
| A549 | 100.00 | 14.90 | 100.00 | 45.90 |
| K562 | 95.90 | 1.40 | 100.00 | 20.50 |
| GM12878 batch1 | 81.20 | 0.00 | 93.80 | 0.00 |
| GM12878 batch2 | 100.00 | 50.00 | 100.00 | 57.30 |
| H1 batch1 | 100.00 | 82.60 | 100.00 | 94.20 |
| H1 batch2 | 100.00 | 84.40 | 100.00 | 96.90 |
Note: Spearman’s and Pearson’s correlation coefficients were calculated between individual cells from the Li et al. dataset and all FANTOM5 samples, using either all genes (keep zeros) or detected genes only (remove zeros). An annotation was considered correct if the matching cell type in FANTOM5 had the highest correlation. Annotation recall is calculated as the number of cells that were correctly annotated by scMatch divided by the total number of cells of that type in the Li et al. dataset. The number of single cells for each cell line is: A549: 74 cells, K562: 73 cells, GM12878 batch1: 32 cells, GM12878 batch2: 96 cells, H1 batch1: 69 cells and H1 batch2: 96 cells. Batch1 and batch2 correspond to biological replicates.
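As a rough illustration of the comparison described in this note, the sketch below computes Pearson's and Spearman's correlation between one cell and one reference profile, with and without zero-count genes, and then computes recall as a percentage. All function names and values are hypothetical, and the sketch assumes that "detected genes" means genes with a non-zero count in the query cell; it is not the scMatch code.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    _, first_idx, counts = np.unique(values[order], return_index=True, return_counts=True)
    # A tie group starting at 0-based position f with c members has mean rank f + (c + 1) / 2.
    group_rank = first_idx + (counts + 1) / 2.0
    ranks = np.empty(values.size, dtype=float)
    ranks[order] = np.repeat(group_rank, counts)
    return ranks


def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])


def spearman(x, y):
    # Spearman's coefficient is Pearson's coefficient computed on the ranks.
    return pearson(average_ranks(x), average_ranks(y))


def similarity(cell, reference, method="spearman", keep_zeros=True):
    """Correlate a query cell with a reference profile over all or detected genes."""
    cell = np.asarray(cell, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if not keep_zeros:
        detected = cell > 0  # assumption: "detected" = non-zero in the query cell
        cell, reference = cell[detected], reference[detected]
    return spearman(cell, reference) if method == "spearman" else pearson(cell, reference)


def annotation_recall(predicted_labels, true_label):
    """Percentage of cells of one type that received the correct label."""
    correct = sum(label == true_label for label in predicted_labels)
    return 100.0 * correct / len(predicted_labels)


# Hypothetical expression values for five genes.
cell = [0, 0, 5, 1, 3]
reference = [1, 2, 50, 10, 30]
rho_keep = similarity(cell, reference, "spearman", keep_zeros=True)
rho_remove = similarity(cell, reference, "spearman", keep_zeros=False)
recall = annotation_recall(["A549", "A549", "K562", "A549"], "A549")
```

Removing zeros shrinks the gene set used for each cell, so at low coverage the correlation is estimated from very few genes.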
Table 2. Annotation recalls of scMatch on four down-sampled cell lines using Spearman’s correlation coefficient
| Cell line | Read depth (reads) | Remove zeros (%) | Keep zeros (%) |
|---|---|---|---|
| A549 | 150 000 | 99.20 | 100.00 |
| A549 | 100 000 | 98.00 | 100.00 |
| A549 | 50 000 | 94.60 | 100.00 |
| A549 | 10 000 | 87.30 | 99.70 |
| A549 | 1000 | 63.00 | 92.20 |
| K562 | 150 000 | 95.20 | 100.00 |
| K562 | 100 000 | 92.90 | 100.00 |
| K562 | 50 000 | 90.80 | 100.00 |
| K562 | 10 000 | 85.10 | 99.50 |
| K562 | 1000 | 14.50 | 94.80 |
| GM12878 batch1 | 150 000 | 80.30 | 90.00 |
| GM12878 batch1 | 100 000 | 74.70 | 89.70 |
| GM12878 batch1 | 50 000 | 66.90 | 90.00 |
| GM12878 batch1 | 10 000 | 41.20 | 89.10 |
| GM12878 batch1 | 1000 | 14.40 | 79.70 |
| GM12878 batch2 | 150 000 | 99.70 | 100.00 |
| GM12878 batch2 | 100 000 | 99.60 | 100.00 |
| GM12878 batch2 | 50 000 | 98.40 | 100.00 |
| GM12878 batch2 | 10 000 | 92.80 | 100.00 |
| GM12878 batch2 | 1000 | 46.10 | 96.60 |
| H1 batch1 | 150 000 | 98.30 | 100.00 |
| H1 batch1 | 100 000 | 98.00 | 100.00 |
| H1 batch1 | 50 000 | 97.40 | 100.00 |
| H1 batch1 | 10 000 | 96.20 | 100.00 |
| H1 batch1 | 1000 | 50.00 | 99.40 |
| H1 batch2 | 150 000 | 100.00 | 100.00 |
| H1 batch2 | 100 000 | 100.00 | 100.00 |
| H1 batch2 | 50 000 | 100.00 | 100.00 |
| H1 batch2 | 10 000 | 99.60 | 100.00 |
| H1 batch2 | 1000 | 59.30 | 99.10 |
Note: Recalls at depths ranging from 1000 to 150 000 reads, calculated using (i) detected genes only or (ii) all genes. Each value is the average recall over 10 random down-samplings. Batch1 and batch2 correspond to biological replicates. Annotation recall is calculated as in Table 1.
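One way to produce repeated random down-samplings like those summarized in this note is sketched below. The note does not state how reads were sub-sampled, so this sketch assumes reads are drawn uniformly without replacement from a per-gene count vector; the function name, counts and seed are hypothetical.

```python
import numpy as np


def downsample_counts(counts, target_reads, rng):
    """Draw target_reads reads without replacement from a per-gene count vector."""
    counts = np.asarray(counts, dtype=np.int64)
    if target_reads >= counts.sum():
        return counts.copy()  # nothing to remove at this depth
    # Expand the counts into one entry per read, labelled by gene index.
    reads = np.repeat(np.arange(counts.size), counts)
    kept = rng.choice(reads, size=target_reads, replace=False)
    return np.bincount(kept, minlength=counts.size)


rng = np.random.default_rng(0)
original = np.array([500, 300, 0, 150, 50])  # hypothetical read counts for five genes

# Ten independent down-samplings to 100 reads; a per-depth recall would be
# computed on each replicate and then averaged.
replicates = [downsample_counts(original, 100, rng) for _ in range(10)]
```

Expanding counts into individual reads is simple but memory-hungry for deeply sequenced cells; a multivariate hypergeometric draw would give the same distribution without the expansion.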
Table 3. Annotation of 93 655 PBMC cells profiled on the 10X platform
| Cell type | Specific genes: Pearson (%) | Specific genes: Spearman (%) | Highly expressed genes: Pearson (%) | Highly expressed genes: Spearman (%) | All genes: Pearson (%) | All genes: Spearman (%) |
|---|---|---|---|---|---|---|
| CD4+/CD25+ regulatory T cells | 37.7 | 8.4 | 55.6 | 53.2 | 59.7 | 19.3 |
| CD8+ cytotoxic T cells | 30.7 | 37.5 | 2.0 | 76.4 | 2.0 | 98.3 |
| CD19+ B cells | 92.0 | 41.5 | 38.1 | 99.9 | 37.4 | 100.0 |
| CD56+ NK cells | 0.6 | 4.9 | 22.9 | 0.0 | 20.6 | 0.0 |
| CD4+ helper T cells | 55.5 | 68.1 | 98.2 | 30.0 | 98.4 | 5.8 |
| CD14+ monocytes | 10.7 | 78.8 | 28.0 | 42.9 | 27.5 | 78.8 |
| CD34+ cells | 0.8 | 4.4 | 56.3 | 83.0 | 36.0 | 82.8 |
| CD4+/CD45RO+ memory T cells | 44.0 | 0.5 | 26.3 | 12.7 | 28.5 | 5.4 |
| CD4+/CD45RA+/CD25− naive T cells | 11.1 | 51.0 | 86.6 | 5.5 | 85.9 | 7.7 |
| CD8+/CD45RA+ naive cytotoxic T cells | 29.7 | 35.8 | 0.0 | 86.5 | 0.0 | 99.1 |
Note: Relative recalls for PBMCs annotated using scMatch with the FANTOM5 dataset as a reference, using (i) Pearson’s correlation coefficient and (ii) Spearman’s correlation coefficient. Recall is shown for three gene lists: (i) all reference set genes (22 049 genes), (ii) highly expressed genes (detected in the FANTOM5 atlas with maximum expression ≥500 tags per million, 4129 genes) and (iii) manually curated lineage-specific genes from the FANTOM5 atlas (272 genes). Annotation recall is calculated as the number of cells that were correctly annotated by scMatch divided by the total number of cells of that type in the Zheng dataset.
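The "highly expressed genes" criterion in this note (maximum expression of at least 500 tags per million across the reference) can be expressed as a simple filter. The sketch below assumes a genes-by-samples matrix of reference expression values; the function name, gene names and numbers are hypothetical.

```python
import numpy as np


def highly_expressed_genes(reference_tpm, gene_names, threshold=500.0):
    """Keep genes whose maximum expression across reference samples meets the threshold."""
    reference_tpm = np.asarray(reference_tpm, dtype=float)  # shape: genes x samples
    keep = reference_tpm.max(axis=1) >= threshold
    return [name for name, kept in zip(gene_names, keep) if kept]


# Hypothetical reference matrix: three genes measured in three samples.
gene_names = ["geneA", "geneB", "geneC"]
reference_tpm = [
    [12.0, 800.0, 3.0],    # exceeds 500 in one sample, so it is kept
    [40.0, 120.0, 499.9],  # never reaches 500, so it is dropped
    [500.0, 0.0, 0.0],     # exactly 500 satisfies the >= criterion
]
selected = highly_expressed_genes(reference_tpm, gene_names)
```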
Table 4. The recalls of annotating the PBMCs using scMatch and SingleR using ENCODE, BLUEPRINT and HPCA datasets
| Cell type | BLUEPRINT+ENCODE: SingleR (%) | BLUEPRINT+ENCODE: scMatch (%) | HPCA: SingleR (%) | HPCA: scMatch (%) | BLUEPRINT+ENCODE+HPCA: SingleR (%) | BLUEPRINT+ENCODE+HPCA: scMatch (%) | BLUEPRINT+ENCODE+HPCA+FANTOM5: scMatch (%) |
|---|---|---|---|---|---|---|---|
| CD4+/CD25+ Regulatory T cells | 42.8 | 28.1 | 0.0 | 0.0 | 43.1 | 37.9 | 39.0 |
| CD8+ Cytotoxic T cells | 82.7 | 27.5 | 18.0 | 8.9 | 34.0 | 43.0 | 44.8 |
| CD19+ B cells | 99.9 | 66.5 | 100.0 | 99.9 | 90.5 | 99.0 | 99.2 |
| CD56+ NK cells | 99.2 | 95.8 | 95.1 | 93.3 | 95.0 | 98.4 | 98.5 |
| CD4+ Helper T cells | 94.5 | 86.4 | 99.5 | 99.5 | 93.5 | 98.2 | 97.9 |
| CD14+ Monocytes | 98.1 | 95.5 | 83.2 | 95.6 | 96.4 | 97.6 | 97.7 |
| CD34+ cells | 84.6 | 88.2 | 80.4 | 85.3 | 88.5 | 85.7 | 85.3 |
| CD4+/CD45RO+ Memory T cells | 75.4 | 80.8 | 95.3 | 97.5 | 73.1 | 86.8 | 86.0 |
| CD4+/CD45RA+/CD25− Naive T cells | N/A | N/A | 85.3 | 85.3 | 0.0 | 0.0 | 0.0 |
| CD8+/CD45RA+ Naive Cytotoxic T cells | N/A | N/A | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
Note: Comparison of scMatch and SingleR using different reference datasets. Annotation recall is calculated as in Table 3. (i) We used SingleR in fine-tuning mode. (ii) There are no matched reference samples in the BLUEPRINT + ENCODE database for CD4+/CD45RA+/CD25− naive T cells and CD8+/CD45RA+ naive cytotoxic T cells, so these entries are marked N/A.