Congmin Xu1, Junkai Yang2, Astrid Kosters2, Benjamin R Babcock2, Peng Qiu1, Eliver E B Ghosn2,3.
Abstract
Single-cell transcriptomics enables the definition of diverse human immune cell types across multiple tissues and disease contexts. A deeper biological understanding requires comprehensive integration of multiple single-cell omics modalities (transcriptomic, proteomic, and cell-receptor repertoire). To improve the identification of diverse cell types and the accuracy of cell-type classification in multi-omics single-cell datasets, we developed SuPERR, a novel analysis workflow that increases the resolution and accuracy of clustering and allows for the discovery of previously hidden cell subsets. In addition, SuPERR accurately removes cell doublets and prevents widespread cell-type misclassification by incorporating information from cell-surface proteins and immunoglobulin transcript counts. This approach uniquely improves the identification of heterogeneous cell types and states in the human immune system, including rare subsets of antibody-secreting cells in the bone marrow.
Keywords: Biocomputational method; Omics; Systems biology
Year: 2022 PMID: 36185375 PMCID: PMC9523353 DOI: 10.1016/j.isci.2022.105123
Source DB: PubMed Journal: iScience ISSN: 2589-0042
Figure 1. SuPERR workflow
(A) Schematic overview of the experimental design. Peripheral blood and bone marrow aspirates were processed, surface-stained with barcoded antibodies, and then encapsulated with barcoded microspheres. We generated three libraries for each sample corresponding to gene expression (GEX), cell-surface protein/antibody-derived tags (ADT), and cell-receptor repertoire (VDJ). Libraries were sequenced to a target depth, and count matrices were assembled for each omics modality separately.
(B) SuPERR workflow is composed of two main steps. Major cell lineages are manually gated at the first step by integrating information from both the ADT and V(D)J data matrices. Then, the manually-gated cell lineages are further sub-clustered based on information from the GEX data. The V(D)J matrix can be used to further identify the diversity of heavy (VH) and light (VL) variable genes among the plasma cell clusters. PCs: plasma cells. See also Tables S1 and S2.
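The two-step logic described in (B) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cell records, the marker cutoff `ADT_POSITIVE`, the Ig cutoff `IG_HIGH`, and the simplified gate order are all hypothetical and chosen only for illustration. Step 1 assigns a major lineage from ADT and V(D)J features; step 2 (not implemented here) would sub-cluster each gated group on its GEX data.

```python
from collections import defaultdict

# Hypothetical per-cell records: normalized ADT signal plus the total
# Ig UMI count taken from the V(D)J matrix. Values are made up.
cells = [
    {"id": "c1", "adt": {"CD3": 0.1, "CD19": 2.5, "CD20": 2.1}, "ig_umi": 40},
    {"id": "c2", "adt": {"CD3": 3.0, "CD19": 0.2, "CD20": 0.1}, "ig_umi": 0},
    {"id": "c3", "adt": {"CD3": 0.2, "CD19": 1.8, "CD20": 0.1}, "ig_umi": 5000},
    {"id": "c4", "adt": {"CD3": 0.1, "CD19": 0.3, "CD20": 0.2}, "ig_umi": 2},
]

ADT_POSITIVE = 1.0  # assumed cutoff for calling a surface marker positive
IG_HIGH = 1000      # assumed cutoff for plasma-cell-level Ig transcripts


def gate(cell):
    """Step 1: assign a major lineage using ADT and V(D)J features only."""
    adt = cell["adt"]

    def positive(marker):
        return adt.get(marker, 0.0) >= ADT_POSITIVE

    if cell["ig_umi"] >= IG_HIGH:
        return "Plasma cells"
    if positive("CD19") and positive("CD20"):
        return "B cells"
    if positive("CD3"):
        return "T cells"
    return "Ungated"


def split_by_lineage(all_cells):
    """Group cell IDs by gated lineage; step 2 would cluster each group on GEX."""
    groups = defaultdict(list)
    for cell in all_cells:
        groups[gate(cell)].append(cell["id"])
    return dict(groups)


lineages = split_by_lineage(cells)
print(lineages)
```

In practice the gates are drawn manually on biaxial plots, so fixed cutoffs like the ones above are only a stand-in for that step.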
Figure 2. SuPERR workflow applied to peripheral blood mononuclear cells (PBMCs)
(A) “Gating strategy” approach to identify major cell lineages on biaxial plots based on surface markers (ADT) and V(D)J data. Total Ig transcript: sum of Ig UMIs in the VDJ matrix. Gates for major lineages are indicated as black outlines and black text. Gates for downstream cell-identity validation are indicated as golden outlines and golden text.
(B) Cross comparison between the manually-gated major lineages and the final SuPERR clusters.
(C) The average expression levels of surface markers (ADT) and VDJ features for the final SuPERR clusters. Only the ADT/VDJ features that were not used for sequential gating are included. All gates: all cell types defined by sequential gating. SuPERR clusters: clusters generated by clustering within each major cell type. PCs: plasma cells. See also Figures S1 and S3.
Figure 3. SuPERR workflow applied to bone marrow (BM) cells
(A) “Gating strategy” approach to identify major cell lineages on biaxial plots based on surface markers (ADT) and V(D)J data. Total Ig transcript: sum of Ig UMIs in the VDJ matrix. Gates for major lineages are indicated as black outlines and black text. Gates for downstream cell-identity validation are indicated as golden outlines and golden text.
(B) Cross comparison between the manually-gated major lineages and the final SuPERR clusters.
(C) The average expression levels of surface markers (ADT) and VDJ features for the final SuPERR clusters. Only the ADT/VDJ features that were not used for sequential gating are included. Non-productive: 1 if a cell was labeled as non-productive in the VDJ matrix and 0 if not. Productive VDJ: 1 if a cell was labeled as productive in the VDJ matrix and 0 if not. All gates: all cell types defined by sequential gating. SuPERR clusters: clusters generated by clustering within each major cell type. PCs: plasma cells. See also Figures S2 and S3.
Cell-type classification and phenotype of PBMC clusters generated by the SuPERR workflow
| Main lineage | SuPERR clusters | Cell classification | Phenotype | Refs. |
|---|---|---|---|---|
| Plasma cells | 1 | Plasma cells | CD19+/CD20-/highest levels of Ig-specific transcripts | ( |
| B cells | 2 | Naïve-I | IgD+/IgM+/CD27-/CD95- | ( |
| (CD19+, CD20+) | 3 | Naïve-II | IgD+/IgM+/CD27-/CD95-/HLA-DQA2+ | |
| | 4 | Switched Memory | IgD-/IgM-/CD27+/CD95+ | |
| | 5 | Unswitched Memory | IgD+/IgM+/CD27+ | |
| | 6 | IgMhi | IgMhi/CD27-/CD95+/CD24- | |
| NK/NKT/MAIT/γδT | 7 | NK-I CD16+ | KLRC3+/CD11bhi | ( |
| (CD56+) | 8 | NK-II CD16hi | CX3CR1+ | |
| | 9 | NK-III CD16+ | CX3CR1+ | |
| | 10 | NK CD56hi | CD16- | |
| | 11 | MAIT cells | TRAV1-2+ | |
| | 12 | NKT | CD4+/CCR7+ | |
| | 13 | iNKT | TRAV24+/CD8+ | |
| | 14 | NK-IV CD16+ | CD11bhi | |
| | 15 | γδ T cells | TRGV9+/TRDV2+ | |
| | 16 | NK-V CD16+ (dividing) | MKI67+ | |
| Monocytes | 17 | Classical-I | CD16-/CD68+/HLA-DRhi | ( |
| (CD14lo/+) | 18 | Classical-II | CD16-/CD68+/HLA-DRlo | |
| | 19 | Classical-III | CD16-/CD68+/HLA-DRlo/CD11bhi | |
| | 20 | Non-classical | CD16+/CD14-/lo/CD68+/HLA-DRhi | |
| | 21 | Intermediate/DCs | CD16int/CD14int/CD68+/HLA-DRhi/CD11c+ | |
| CD4 T cells | 22 | Naïve-I | CCR7+/SELL+/CD27+/CD95-/CD3hi | ( |
| (CD3+, CD4+) | 23 | Naïve-II | CCR7+/SELL+/CD27+/CD95- | |
| | 24 | Tcm→Tem | CCR7lo/SELL+/CD27+/CD95+ | |
| | 25 | CTL-I | KLRB1+ | |
| | 26 | CTL-II | KLRB1+/GZMA+/GZMK+ | |
| | 27 | Tem | CCR7lo/SELL-/CD27-/CD95+/LGALS1+/S100A4hi | |
| | 28 | Treg | FOXP3+/CTLA4+/CD95+/HLA-DRB1+ | |
| | 29 | Treg Naïve | FOXP3+/CD45RA+/CD95- | |
| | 30 | Temra | CD45RA+/NKG7+/GNLY+/GZMB+ | |
| CD8 T cells | 31 | Naïve-I | CCR7+/CD45RA+/CD27+/CD127+ | ( |
| (CD3+, CD8a+) | 32 | Tcm-I | CCR7+/SELL+/CD27+/CX3CR1- | |
| | 33 | Tem | CCR7-/SELL+/CX3CR1+ | |
| | 34 | TLE | CCR7-/SELL+/CX3CR1+/ZNF683+ | |
| | 35 | MAIT cells | TRAV1-2+ | |
| | 36 | Naïve-II | CCR7+/CD45RA+/CD27hi/CD127+ | |
| | 37 | Temra | CD45RA+/CCR7-/CD27-/CD127- | |
| | 38 | Tcm-II | CCR7+/SELL+/CD27+/CX3CR1-/TNFSF10+ | |
Gene expression (GEX) and cell-surface (ADT) markers used to classify the lymphoid and myeloid cell populations in human peripheral blood mononuclear cells (PBMC). NK: Natural Killer cells; NKT: NK T cells; iNKT: invariant NKT; γδ T: gamma-delta T cells; Tcm: central memory T cells; Tem: effector memory T cells; CTL: cytotoxic T lymphocytes; Treg: regulatory T cells; Temra: effector memory T cells expressing CD45RA; TLE: long-lived effector memory T cells; MAIT: mucosal-associated invariant T cells. See also Figures S5–S9.
Cell-type classification and phenotype of BM clusters generated by the SuPERR workflow
| Main lineage | SuPERR clusters | Cell classification | Phenotype | Refs. |
|---|---|---|---|---|
| Plasma cells | 1 | Ighi/PRDM1- | Ighi/PRDM1- | ( |
| (CD138-) | 2 | HLA-DRhi/SDC1lo/G2-M phase | HLA-DRhi/SDC1lo/G2-M phase | |
| Plasma cells | 3 | IRF4hi/PDL1+ | IRF4hi/PDL1+ | |
| (CD138+) | 4 | XBP1hi/SDC1hi/PDL1hi | XBP1hi/SDC1hi/PDL1hi | |
| B cells | 5 | T3/Naïve | CD21+/IL4R+ | ( |
| (CD19+, CD20+) | 6 | Small pre-B/pre-BII-I | RAG+/IGKC+ | |
| | 7 | Small pre-B/pre-BII-II | RAG+/IGLC3+ | |
| | 8 | Switched Memory | CD21+/CD27+/IGHA1+/IGHG1+ | |
| | 9 | T1/T2 | RAG-/CD10+/CD20-/lo | |
| | 10 | Large pre-B/pre-BII | CD34-/DNTT-/MKI67+ | |
| | 11 | Early pro-B/pre-pro-B | CD34+/DNTThi | |
| | 12 | Activated naïve | CD21+/NR4A1+/DUSP2+ | |
| | 13 | Late pro-B/pre-BI | CD34+/DNTTint | |
| HSPCs | 14 | Pre-reticulocytes | GYPAint/HBB+ | ( |
| (Lineage-, CD34+/-) | 15 | Pre-pDCs | CD123+/CD304+/CD303+/CSF2RA+ | |
| | 16 | GMP | CD34+/FLT3+/CD164+/CD45RAhi/Cell cycle | |
| | 17 | MEP | CD34+/GYPA-/ITGA2B+ | |
| | 18 | Reticulocytes | GYPA-/HBB+/HBM- | |
| | 19 | Pro-Neutrophil | CD34+/MPO+/ELANE+ | |
| | 20 | Reticulocytes GYPAhi | GYPAhi/HBB+ | |
| | 21 | HSC/MPP | CD34+/AVP+/CD38- | |
| | 22 | Erythrocytes | GYPA-/HBB+/HBM- | |
| | 23 | Pro-erythroblast | GYPAlo/HBB+/HBM+ | |
| | 24 | CMP | CD34+/FLT3+/CD164+ | |
| | 25 | Basophil/Mast cell progenitors | CD34+/CLC+/HDC+ | |
| Myeloid/Granuloid | 26 | Neutropoiesis | MPOhi/ELANEhi | ( |
| (CD11c+) | 27 | Monopoiesis | CD68hi/CD14hi | |
| | 28 | Monopoiesis | CD68lo/CD14int | |
| | 29 | Neutropoiesis | MPOint/ELANEint | |
| | 30 | Monopoiesis | CD68lo/CD14lo | |
| | 31 | Monopoiesis | CD68int/CD14int | |
| | 32 | Neutropoiesis | MPOlo/ELANElo/MKI67hi | |
| | 33 | Monopoiesis | CD68hi/CD14hi | |
| | 34 | Progenitors | FLT3+/CD74+ | |
Gene expression (GEX) and cell-surface (ADT) markers used to classify the plasma cells, B cells, myeloid and granuloid cells, and the hematopoietic stem and progenitor cells (HSPCs) in the human bone marrow (BM). Ig: immunoglobulin transcripts; G2-M phase: genes involved in the cell cycle; T1/2/3: Transitional B cells; pDCs: plasmacytoid dendritic cells; GMP: granulocyte-monocyte progenitor; MEP: megakaryocyte-erythroid progenitor; HSC: hematopoietic stem cell; MPP: multipotent progenitor; CMP: common myeloid progenitor. See also Figures S4, S10, and S11.
Figure 4. Cell-type-specific variations in gene expression
(A) “Gene count” represents the number of unique genes expressed by each cell type. Error bars in boxplots are the 95% confidence interval.
(B) “UMI count” represents the total mRNA abundance expressed by each cell type.
(C) “Percent of Ribosomal” represents the percentage of ribosomal gene UMI counts expressed by each cell type. The grey line shows the mean expression level for each feature in the total PBMC and BM samples.
(D) Left panel: the top 30 (red points) and the top 300 (grey points) highly variable genes (HVGs) from total PBMC. The points under the red dashed line fall below the top 300 HVGs of total PBMC. Right panel: the top 30 HVGs from PBMC-derived B cells (green points) displayed with the top 300 HVGs from total PBMC (grey points). Student’s t-test (unpaired, two-tailed) was used to compare the mean of each cell type with the mean of the total PBMC/BM: ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, ∗∗∗∗p < 0.0001. Multiple-group ANOVA test for (A), (B), and (C): p < 2.2e-16. PCs: plasma cells.
Figure 5. SuPERR workflow identifies four subsets of human plasma cells in the BM
(A) UMAP representation of the four bone marrow (BM) plasma cell clusters.
(B) Top panel: percentage of Ig-specific transcripts (UMI) expressed in each plasma cell subset. Bottom panel: expression levels (sum) of plasma cell genes (see STAR Methods) after removing Ig-specific UMIs and re-normalizing the data matrix. Error bars in boxplots are the 95% confidence interval.
(C) Expression levels of individual plasma cell genes, cell-cycle score after removing Ig-specific UMIs (See STAR Methods) and ADT. The grey line shows the mean expression level across all clusters.
(D) The antibody isotypes and subclasses expressed by each plasma cell subset.
(E) The connecting lines on the Circos plot describe clones shared between clusters (a clonal lineage was defined by identical V and J gene usage, identical CDR3 nucleotide length, and ≥85% homology within the CDR3 nucleotide sequence).
(F) Reactome Pathway Database analysis (see STAR Methods) shows unique biological processes that define each plasma cell subset.
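The clonal-lineage criteria stated in (E) can be expressed as a short Python sketch. This is an illustration under assumptions, not the authors' code: "homology" is interpreted here as position-wise nucleotide identity (which is well defined because the CDR3 lengths must match), and the record fields `v`, `j`, and `cdr3` as well as the example sequences are hypothetical.

```python
def cdr3_identity(a, b):
    """Fraction of matching positions between two equal-length nucleotide strings."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def same_clone(c1, c2, min_identity=0.85):
    """True if two cells meet all three criteria: same V and J gene,
    same CDR3 nucleotide length, and CDR3 identity >= min_identity."""
    return (
        c1["v"] == c2["v"]
        and c1["j"] == c2["j"]
        and len(c1["cdr3"]) == len(c2["cdr3"])
        and cdr3_identity(c1["cdr3"], c2["cdr3"]) >= min_identity
    )


# Made-up 20-nt CDR3 sequences for illustration only.
ref = {"v": "IGHV3-23", "j": "IGHJ4", "cdr3": "TGTGCGAGAGATCGTACGTA"}
near = {"v": "IGHV3-23", "j": "IGHJ4", "cdr3": "TGTGCGAGAGATCGTACGTT"}  # 1 mismatch (95%)
far = {"v": "IGHV3-23", "j": "IGHJ4", "cdr3": "ACACCGAGAGATCGTACGTA"}   # 4 mismatches (80%)
other_j = {"v": "IGHV3-23", "j": "IGHJ6", "cdr3": "TGTGCGAGAGATCGTACGTA"}

print(same_clone(ref, near), same_clone(ref, far), same_clone(ref, other_j))
```

Cells from different clusters that satisfy `same_clone` would be the ones joined by a line in a plot like panel (E).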
Figure 6. Cell-doublet identification by SuPERR using both surface markers and gene expression data matrices
(A) Distribution (gene count × total UMI) of cell doublets (left) and singlets (right) detected by the SuPERR approach. The red dashed lines show the threshold used by some conventional approaches to exclude cells whose gene count and total UMI exceed the mean + 4 SD. Only the cells above the dashed lines would have been excluded from downstream analysis by conventional approaches; for example, plasma cells in PBMCs (highlighted in the red circle) would have been incorrectly excluded.
(B) The number of unique genes (left panels) and the number of total UMIs (right panels) expressed by singlets and doublets in PBMC (top panels) and BM (bottom panels). The grey line shows the mean expression level across all clusters. Error bars in boxplots are the 95% confidence interval.
(C) Cell doublets identified by the SuPERR workflow and projected on a UMAP, showing the cell doublets are spread across multiple clusters.
(D) Venn diagram comparing the cell doublets identified by the SuPERR workflow and the scDblFinder pipeline.
(E) Proportion of heterotypic doublets identified and classified by SuPERR in PBMC.
(F) Expression level of gene signatures (see STAR Methods) of heterotypic doublets defined by SuPERR and scDblFinder to confirm their cell identities. Red points represent SuPERR-defined doublets. Green points are the cell doublets identified by both SuPERR and scDblFinder. Blue points represent scDblFinder-defined doublets, which were identified as singlets by SuPERR. The immune cell types were annotated by the SuPERR workflow. See also Figures S1, S2, and S12.
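The conventional mean + 4 SD cutoff mentioned in (A) can be sketched in a few lines of Python. The gene counts below are invented for illustration, and the helper name `high_outliers` is hypothetical; the sketch only shows how such a global threshold behaves, not how SuPERR detects doublets.

```python
from statistics import mean, stdev


def high_outliers(values, n_sd=4.0):
    """Return indices of values above mean + n_sd * sample standard deviation."""
    threshold = mean(values) + n_sd * stdev(values)
    return [i for i, v in enumerate(values) if v > threshold]


# Made-up gene counts: 29 typical cells, one cell with a moderately
# elevated count (index 29), and one cell with a very high count (index 30).
gene_counts = [1000] * 29 + [1800, 10000]

flagged = high_outliers(gene_counts)
print(flagged)
```

With these numbers only the cell at index 30 is flagged. A cell with a moderately elevated count, as a doublet might have, stays below the cutoff, while a genuinely high-expressing singlet could be removed, which is the limitation the caption points out. The same function could be applied to total UMI counts.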
Figure 7. SuPERR identifies significant cell-type misclassifications in other commonly-used approaches
(A) Red points represent the peripheral blood mononuclear cells (PBMC) that were misclassified by either the conventional approach using GEX data only (i.e., Seurat v3), or by more recent approaches using both GEX and ADT data, such as the WNN in Seurat v4 and the SNF in CiteFuse. The Cell Fidelity Statistic (CFS, see STAR Methods) reports the fraction of correctly classified cells; its complement (1 − CFS) is the fraction of misclassified cells (6.94% by Seurat v3, 5.16% by Seurat v4, 2.42% by CiteFuse).
(B) Red points represent the bone marrow (BM) cells that were misclassified by Seurat v3 (5.31%), WNN/Seurat v4 (5.12%), and SNF/CiteFuse (5.15%) as determined by CFS. CFS scores show a progressive improvement in cell-type classification from Seurat v3 (GEX only) to Seurat v4 and CiteFuse, revealing higher agreement between CiteFuse and gold-standard biaxial gating of cell lineages.
(C) The PBMC cluster 4 generated by the WNN method (Seurat v4) contains misclassified cells (i.e., a mixture of NK, NKT, and T cells) and was further explored using the cell-surface (ADT) markers CD56 and CD3 (left panel). The Differential Gene Expression (DGE) analysis for cluster 4 (pink circle) compared to “cleaned” NK cells (Venn diagram) shows the TRGV10 gene as a top hit. However, the TRGV10 gene is mostly expressed in CD3+ gamma-delta T cells and absent in NK cells (right panel). See also Figures S14–S17.
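As a rough illustration of the idea behind a fraction-correct score, the Python sketch below compares gate-derived lineage labels with cluster assignments. It is an assumption-laden stand-in and not the CFS as defined in the STAR Methods: here each cluster is assigned its majority gated lineage, and a cell counts as consistent when its own gated lineage matches that majority. The function name and the example labels are hypothetical.

```python
from collections import Counter, defaultdict


def fraction_consistent(gated_lineage, cluster_id):
    """Fraction of cells whose gated lineage equals the majority lineage
    of the cluster they were assigned to (illustrative score only)."""
    if not gated_lineage or len(gated_lineage) != len(cluster_id):
        raise ValueError("inputs must be non-empty and of equal length")
    by_cluster = defaultdict(list)
    for lineage, cluster in zip(gated_lineage, cluster_id):
        by_cluster[cluster].append(lineage)
    consistent = sum(
        Counter(labels).most_common(1)[0][1] for labels in by_cluster.values()
    )
    return consistent / len(gated_lineage)


# Made-up example: cluster 4 mixes NK and T cells, cluster 7 mixes T and B cells.
gated = ["NK", "NK", "NK", "T", "T", "T", "T", "B"]
clusters = [4, 4, 4, 4, 7, 7, 7, 7]

score = fraction_consistent(gated, clusters)
print(score, 1 - score)
```

In this toy case 6 of 8 cells agree with their cluster's majority lineage, so the score is 0.75 and the complementary mismatch fraction is 0.25.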
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| TotalSeq-C0148 anti-human CD197 (CCR7) (clone G043H7) | BioLegend | Cat# 353251; RRID: |
| TotalSeq-C0390 anti-human CD127 (IL-7Ralpha) (clone A019D5) | BioLegend | Cat# 351356; RRID: |
| TotalSeq-C0161 anti-human CD11b (clone ICRF44) | BioLegend | Cat# 301359; RRID: |
| TotalSeq-C0081 anti-human CD14 (clone M5E2) | BioLegend | Cat# 301859; RRID: |
| TotalSeq-C0083 anti-human CD16 (clone 3G8) | BioLegend | Cat# 302065; RRID: |
| TotalSeq-C0050 anti-human CD19 (clone HIB19) | BioLegend | Cat# 302265; RRID: |
| TotalSeq-C0100 anti-human CD20 (clone 2H7) | BioLegend | Cat# 302363; RRID: |
| TotalSeq-C0181 anti-human CD21 (clone Bu32) | BioLegend | Cat# 354923; RRID: |
| TotalSeq-C0085 anti-human CD25 (clone BC96) | BioLegend | Cat# 302649; RRID: |
| TotalSeq-C0154 anti-human CD27 (clone O323) | BioLegend | Cat# 302853; RRID: |
| TotalSeq-C0034 anti-human CD3 (clone UCHT1) | BioLegend | Cat# 300479; RRID: |
| TotalSeq-C0389 anti-human CD38 (clone HIT2) | BioLegend | Cat# 303543; RRID: |
| TotalSeq-C0072 anti-human CD4 (clone RPA-4) | BioLegend | Cat# 300567; RRID: |
| TotalSeq-C0048 anti-human CD45 (clone 2D1) | BioLegend | Cat# 368545; RRID: |
| TotalSeq-C0063 anti-human CD45RA (clone HI100) | BioLegend | Cat# 304163; RRID: |
| TotalSeq-C0084 anti-human CD56 (NCAM) (clone QA17A16) | BioLegend | Cat# 392425; RRID: |
| TotalSeq-C0146 anti-human CD69 (clone FN50) | BioLegend | Cat# 310951; RRID: |
| TotalSeq-C0005 anti-human CD80 (clone 2D10) | BioLegend | Cat# 305243; RRID: |
| TotalSeq-C0006 anti-human CD86 (clone IT2.2) | BioLegend | Cat# 305447; RRID: |
| TotalSeq-C0080 anti-human CD8a (clone RPA-T8) | BioLegend | Cat# 301071; RRID: |
| TotalSeq-C0156 anti-human CD95 (Fas) (clone DX2) | BioLegend | Cat# 305651; RRID: |
| TotalSeq-C0159 anti-human HLA-DR (clone L243) | BioLegend | Cat# 307663; RRID: |
| TotalSeq-C0007 anti-human CD274 (B7-H1, PD-L1) (clone 29E.2A3) | BioLegend | Cat# 329751; RRID: |
| Purified anti-human CD11c (clone S-HCL-3) | BioLegend | Cat# 371502; RRID: |
| Purified anti-human CD133 (Prominin-1) (clone 7) | BioLegend | Cat# 372802; RRID: |
| Purified anti-human CD138 (Syndecan-1) (clone MI15) | BioLegend | Cat# 356502; RRID: |
| Purified anti-human CD34 (clone 581) | BioLegend | Cat# 343602; RRID: |
| Purified anti-human CD41 (clone HIP8) | BioLegend | Cat# 303702; RRID: |
| Purified anti-human CD43 (clone CD43-10G7) | BioLegend | Cat# 343202; RRID: |
| Purified anti-mouse/human CD45R/B220 (clone RA3-6B2) | BioLegend | Cat# 103202; RRID: |
| Purified anti-human CD5 (clone UCHT2) | BioLegend | Cat# 300602; RRID: |
| Alexa Fluor 488 anti-human CD45 (clone HI30) | BioLegend | Cat# 368536; RRID: |
| PerCP/Cyanine5.5 anti-human CD38 (clone HIT2) | BioLegend | Cat# 303522; RRID: |
| Alexa Fluor 647 anti-human CD14 (clone HCD14) | BioLegend | Cat# 325612; RRID: |
| Alexa Fluor 647 anti-human CD16 (clone 3G8) | BioLegend | Cat# 302020; RRID: |
| Alexa Fluor 700 anti-human IgM (clone MHM-88) | BioLegend | Cat# 314538; RRID: |
| Brilliant Violet 421 anti-human CD27 (clone M-T271) | BioLegend | Cat# 356418; RRID: |
| Brilliant Violet 570 anti-human CD16 (clone 3G8) | BioLegend | Cat# 302036; RRID: |
| Brilliant Violet 570 anti-human CD3 (clone UCHT1) | BioLegend | Cat# 300436; RRID: |
| Brilliant Violet 650 anti-human CD20 (clone 2H7) | BioLegend | Cat# 302336; RRID: |
| Brilliant Violet 785 anti-human CD19 (clone HIB19) | BioLegend | Cat# 302240; RRID: |
| ZombieUV Viability Dye | BioLegend | 423107 |
| PE Mouse Anti-Human CD34 (clone 8G12) | BD Biosciences | Cat# 348057; RRID: |
| PE-Cy7 Mouse Anti-Human IgD (clone IA6-2) | BD Biosciences | Cat# 561314; RRID: |
| Human Adult Peripheral Blood | Emory University’s Children’s Clinical and Translational Discovery Core | N/A |
| Human Adult Bone Marrow | Emory University; AllCells | N/A |
| Chromium Single Cell 5′ Library & Gel Bead | 10X Genomics | Cat# 1000006 |
| Chromium Single Cell A Chip Kit | 10X Genomics | Cat# 120236 |
| Chromium Single Cell V(D)J Enrichment Kit, | 10X Genomics | Cat# 1000016 |
| Chromium Single Cell 5' Library Construction | 10X Genomics | Cat# 1000020 |
| Chromium Single Cell 5' Feature Barcode | 10X Genomics | Cat# 1000080 |
| EasySep Direct Human PBMC Isolation Kit | Stem Cell Technologies | Cat# 19654 |
| EasySep Human B-Cell Enrichment Kit II without CD43 Depletion | Stem Cell Technologies | Cat# 17963 |
| Custom oligonucleotide antibody conjugation | Expedeon | Custom |
| Raw and analyzed data | This paper | GEO: |
| /5AmMC12/CGGAGATGTGTATAAGA | IDT | Custom |
| /5AmMC12/CGGAGATGTGTATAAGAGACAGNN | IDT | Custom |
| /5AmMC12/CGGAGATGTGTATAA | IDT | Custom |
| /5AmMC12/CGGAGATGTGTATA | IDT | Custom |
| /5AmMC12/CGGAGATGTGTATAAG | IDT | Custom |
| /5AmMC12/CGGAGATGTGTATA | IDT | Custom |
| /5AmMC12/CGGAGATGTGTATA | IDT | Custom |
| 3/5AmMC12/CGGAGATGTGT | IDT | Custom |
| FlowJo v10.6 | BD Biosciences | |
| Cell Ranger v3.1.0 | 10X Genomics | |
| Seurat v4.0 | ||
| DSB | ||
| Reactome pathway database | ||
| scDblFinder | ||
| Cell Fidelity Statistic (CFS) scores | ||
| Custom scripts | This paper | Zenodo: |