
Chronic myelomonocytic leukaemia stem cell transcriptomes anticipate disease morphology and outcome.

Daniel H Wiseman1, Syed M Baker2, Arundhati V Dongre3, Kristian Gurashi3, Joanna A Storer3, Tim Cp Somervaille4, Kiran Batta5.   

Abstract

BACKGROUND: Chronic myelomonocytic leukaemia (CMML) is a clinically heterogeneous stem cell malignancy with overlapping features of myelodysplasia and myeloproliferation. Over 90% of patients carry mutations in epigenetic and/or splicing genes, typically detectable in the Lin-CD34+CD38- immunophenotypic stem cell compartment in which the leukaemia-initiating cells reside. Transcriptional dysregulation at the stem cell level is likely fundamental to disease onset and progression.
METHODS: We performed single-cell RNA sequencing on 6826 Lin-CD34+CD38- stem cells from CMML patients and healthy controls using the droplet-based, ultra-high-throughput 10x platform.
FINDINGS: We found substantial inter- and intra-patient heterogeneity, with CMML stem cells displaying distinctive transcriptional programs. Compared with normal controls, CMML stem cells exhibited transcriptomes characterized by increased expression of myeloid-lineage and cell cycle genes, and lower expression of genes selectively expressed by normal haematopoietic stem cells. Neutrophil-primed progenitor genes and a MYC transcription factor regulome were prominent in stem cells from CMML-1 patients, whereas CMML-2 stem cells exhibited strong expression of interferon-regulatory factor regulomes, including those associated with IRF1, IRF7 and IRF8. CMML-1 and CMML-2 stem cells (stages distinguished by proportion of downstream blasts and promonocytes) differed substantially in both transcriptome and pseudotime, indicating fundamentally different biology underpinning these disease states. Gene expression and pathway analyses highlighted potentially tractable therapeutic vulnerabilities for downstream investigation. Importantly, CMML patients harboured variably-sized subpopulations of transcriptionally normal stem cells, indicating a potential reservoir to restore functional haematopoiesis.
INTERPRETATION: Our findings provide novel insights into the CMML stem cell compartment, revealing an unexpected degree of heterogeneity and demonstrating that CMML stem cell transcriptomes anticipate disease morphology, and therefore outcome.
FUNDING: Project funding was supported by Oglesby Charitable Trust, Cancer Research UK, Blood Cancer UK, and UK Medical Research Council.
Copyright © 2020 The Authors. Published by Elsevier B.V. All rights reserved.

Keywords:  CMML; Leukaemia; Stem cells; sc-RNA Seq

Year:  2020        PMID: 32763828      PMCID: PMC7403890          DOI: 10.1016/j.ebiom.2020.102904

Source DB:  PubMed          Journal:  EBioMedicine        ISSN: 2352-3964            Impact factor:   8.143


Evidence before this study

Chronic myelomonocytic leukaemia (CMML) is a rare haematological malignancy with a dismal prognosis. Currently there are few effective treatment options for this disease, largely owing to inadequate mechanistic understanding of disease initiation and progression. The mutational landscape of CMML, together with insights from other myeloid malignancies, implicates transcriptional dysregulation in the disease-initiating haematopoietic stem cells in CMML leukaemogenesis. However, no studies to date have directly investigated the transcriptome of these cells in detail. Such understanding is critical for the rational design of novel targeted therapeutic strategies to address this major area of unmet clinical need.

Added value of this study

This is the first study to evaluate the CMML transcriptome at the single-cell level, and the first to directly evaluate transcriptional dysregulation specifically in the immunophenotypic stem cell compartment. It reveals marked heterogeneity, both within and between individual patients, suggesting that CMML is a common phenotypic endpoint of quite different transcriptionally dysregulated stem cell programs. It highlights candidate genes and pathways as potential novel therapeutic targets and, importantly, identifies a proportion of transcriptionally near-normal residual stem cells in most cases, raising the possibility of an important therapeutic window that spares normal haematopoietic output.

Implications of all the available evidence

Our study reveals new insights into CMML biology and provides a useful platform for future hypothesis-driven exploration of potentially tractable therapeutic vulnerabilities.

Introduction

Chronic myelomonocytic leukaemia (CMML) is a myelodysplastic/myeloproliferative overlap syndrome that typically affects older people and is characterized by bone marrow (BM) failure, leukocytosis and monocytosis [1,2]. CMML is clinically heterogeneous and is categorized both by the percentage of blasts and promonocytes (stages CMML-0/−1/−2) and, according to circulating white blood cell (WBC) count, into dysplastic (<13 × 10^9/L) and proliferative (>13 × 10^9/L) forms [2,3]. Advanced stage and proliferative disease are both associated with shorter survival and a higher risk of leukaemic transformation [4]. The genomic landscape is dominated by mutations in epigenetic modifier (e.g. TET2, ASXL1) and splicing (e.g. SRSF2) genes [5,6]. Despite its clinico-pathological heterogeneity, most patients have a dismal prognosis with few treatment options [7,8]. Haematopoietic stem cell (HSC) transplantation is potentially curative but is an option only for the minority of younger patients [9]. For the majority, hydroxycarbamide remains the standard of care, affording cytoreduction without substantially altering disease biology, clonal composition or natural history. Hypomethylating agents (HMAs) have an important role for a subset of non-proliferative patients, but their benefit is mostly limited to transient rebalancing of haematopoiesis, with inevitable progression and only a modest influence on survival [10]. Novel therapeutic targets and strategies are therefore required to improve patient outcome, and identifying them will require a deeper understanding of the biology driving the malignant evolution of CMML. While the transcriptomes of CMML monocytes and bulk BM cells have been evaluated previously, the malignant stem cells in CMML have not been studied directly at the genome-wide level [11], [12], [13], [14], [15]. The Lin−CD34+CD38− BM compartment represents the apex of the haematopoietic hierarchy, typically comprising <5% of total CD34+ cells and <0.25% of total BM mononuclear cells (MNCs) [16].
Endowed with self-renewal and multilineage differentiation potential, normal HSCs sustain and replenish haematopoiesis throughout life. In most myeloid malignancies, leukaemia-initiating cells (LICs) are substantially enriched within this compartment [17], [18], [19]. Whereas CD34+CD38+ and even CD34− LICs are recognized in some acute myeloid leukaemias (AML), disease-propagating cells in myelodysplastic syndromes (MDS) are thought to reside exclusively in the CD34+CD38− fraction [20]. This may also apply in CMML, where the CD34+CD38− compartment consistently harbours initiating and driver mutations [13] and can initiate disease in murine xenotransplantation experiments [21]. LICs differ from downstream blasts and other clonal progeny in their biology and response to therapy. Their persistence after chemotherapy is a major contributor to the high relapse rate in AML, and the primitive nature of MDS/CMML LICs may explain, in part, why these diseases cannot be cured without allogeneic stem cell transplantation. Single-cell RNA-sequencing (scRNA-seq) is a powerful tool for resolving transcriptional heterogeneity in cell populations [22], [23], [24]. We hypothesised that the remarkable clinical heterogeneity observed in CMML might reflect transcriptomic heterogeneity within the disease-initiating stem cell compartment. This approach could reveal genes and pathways selectively dysregulated in CMML stem cells, highlighting potentially tractable vulnerabilities for novel therapeutic strategies. It could also shed light on the origin and pathobiology of CMML progression, which is defined by the accumulation of blasts and promonocytes through clinical stages (CMML-0, −1, −2 and AML) tightly linked to overall prognosis [1]. We performed scRNA-seq on 6826 sorted Lin−CD34+CD38− immunophenotypic stem cells from seven treatment-naïve CMML patients and three healthy controls using the droplet-based, ultra-high-throughput 10x platform [24].
We identified distinctive CMML stem cell signatures but with a remarkable degree of inter- and intra-patient transcriptomic heterogeneity, distinct patterns of transcription factor network dysregulation, and substantial differences between CMML-1 and CMML-2 evident even at the primitive stem cell level.

Materials and methods

Samples: BM mononuclear cell (MNC) samples from CMML patients were obtained from the Manchester Cancer Research Centre Tissue Biobank (initiated with the approval of the South Manchester Research Ethics Committee). Samples were donated, with informed consent, by patients presenting to The Christie NHS Foundation Trust (Manchester, UK), and MNC preparations were cryopreserved as previously described [25]. All CMML samples were taken at the time of diagnosis and before any definitive therapy. Healthy control BM MNC samples were purchased from Stem Cell Technologies (Cambridge, UK; Cat #70001). Further details are provided in Table S1.

Data and code availability: The data generated in this study are available from ArrayExpress (E-MTAB-8884) https://www.ebi.ac.uk/arrayexpress/.

Fluorescence-activated cell sorting (FACS): BM MNC samples were thawed, washed and incubated for 30 min on ice with Hoechst 33258 (Cat #94403; Sigma Aldrich, Gillingham, UK) and the following antibodies (all at 1:100): anti-CD34-APC (clone 581; #555824; BD Biosciences, Oxford, UK); anti-CD38-PE (HIT2; #25-0389-42; eBioscience, Altrincham, UK); anti-Lin-FITC (pooled antibodies targeting CD2/3/4/7/8/10/11b/14/19/20/56/235a; #25-0389-71; BD Biosciences, Oxford, UK). Viable Lin−CD34+CD38− cells were sorted using a FACS Aria III (BD Biosciences). To maximise the purity of the targeted population, sorted samples were immediately subjected to a second sort using identical gating whenever cell numbers permitted. Up to 2000 Lin−CD34+CD38− cells were targeted per sample, based on the empirically determined 10x capture efficiency (40–50%).

Single-cell RNA-sequencing: Each sample was prepared for single-cell capture immediately after sorting to ensure cell viability. Single-cell capture was performed according to the manufacturer's protocol (Chromium Single Cell v2 chemistry; 10x Genomics; Leiden, The Netherlands), with 12 cycles of whole-transcriptome amplification.
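As a rough illustration of the cell-number targeting described under FACS above, the expected number of captured cells is the number of cells loaded multiplied by the capture efficiency. This sketch reads "targeted" as the number of sorted cells loaded and assumes recovery scales linearly with input; the helper `expected_recovery` is hypothetical and is not part of any 10x Genomics tool.

```python
def expected_recovery(cells_loaded: int, efficiency: float) -> float:
    """Hypothetical helper: expected captured cells = cells loaded x capture efficiency."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be a fraction between 0 and 1")
    return cells_loaded * efficiency


# Range implied by loading 2000 cells at the empirically determined 40-50% efficiency
low = expected_recovery(2000, 0.40)
high = expected_recovery(2000, 0.50)
print(f"Expected recovery: {low:.0f}-{high:.0f} cells per sample")  # 800-1000
```

Under these assumptions each sample would be expected to yield roughly 800 to 1000 captured cells.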
Barcoded cDNA libraries were pooled and sequenced on an Illumina NextSeq 500 (Illumina, San Diego, CA, USA) across two high-output runs.

Data processing and filtering: FASTQ files were processed with 10x Genomics' Cell Ranger pipeline (v2.2.0) using default parameters (E-MTAB-8884). Cell Ranger identifies valid barcodes as cells and counts the Unique Molecular Identifiers (UMIs) mapped to each cell, reporting a cell-by-gene count matrix in sparse format. Reads were mapped to the prebuilt hg38 reference genome. Cell Ranger's aggr command was used to aggregate samples, with default down-sampling normalization enabled. We evaluated three metrics to remove low-quality cells: the number of UMIs per cell barcode (i.e. library size); the number of genes detected per barcode; and the proportion of UMIs mapping to mitochondrial genes. For each metric, a threshold of three median absolute deviations (MADs) was applied to reject poor-quality cells. Violin plots of each metric were inspected to exclude outlier distributions (no potential doublets/multiplets were observed). Genes with average counts below 0.01 were excluded as minimally informative and unreliable for statistical inference [26], leaving 12,695 genes for downstream analysis. To account for variable library size between cells, raw counts were normalized using a deconvolution-based method and then log-transformed [27]. Scater (v1.14.6) and its dependencies were obtained from Bioconductor [28].

Cell cycle analysis: We used the CellCycleScoring function of the Seurat package (v3.1.4) to calculate a cell cycle phase score for each cell from canonical marker genes [29]. For this calculation, counts for all cells were log-normalized. Cell cycle scoring then assigns each cell a score for the S and G2/M phases, and the phase is determined by the higher positive score.
Any cell scoring positive for neither phase is assigned to G1/G0. The canonical marker genes used for scoring were those provided with the Seurat package [29]. No correction for cell cycle was applied, because cell cycle differences were potentially an important biological variable when comparing cells from different samples in this study.

Visualization and clustering: The variance of expression of each gene was decomposed into technical and biological components, and highly variable genes were identified as those whose biological component was significantly >0.5. This gave a list of genes for which the difference in average expression between any two cells would be at least 2-log fold. These genes were used for dimensionality reduction by principal component analysis (PCA). t-Distributed stochastic neighbour embedding (t-SNE) and uniform manifold approximation and projection (UMAP) plots were generated using principal components 1–14. No batch effects were observed for sample BC572 (sequenced on both runs), indicating that batch correction was not required. To cluster cells we used hierarchical iterative clustering from the scrattch.hicat package (https://github.com/AllenInstitute/scrattch.hicat) [30]. This starts with coarse-level clustering and iteratively splits clusters into increasingly fine subclusters using the PhenoGraph algorithm, which builds a graph of phenotypic similarity between cells from the Jaccard similarity of their nearest-neighbour sets [31].

Differential gene/pathway analysis: Marker genes for each cluster were identified by differential expression analysis of each cluster against all other cells, using edgeR [32]. Pairwise differential expression (DE) analysis was performed between patients or between clusters, with each cell treated as a sample in the edgeR convention. All comparisons used the DE analysis from the sSeq package [33].
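The phase-assignment rule described under Cell cycle analysis can be sketched as follows, assuming a cells × genes NumPy count matrix and known column indices for the S and G2/M marker genes. The normalization mirrors Seurat's LogNormalize (log1p of counts scaled to 10,000 per cell), but the scoring step is a simplified stand-in for the module score used by `CellCycleScoring`, so values will not match Seurat exactly; all function names are hypothetical.

```python
from typing import List, Sequence

import numpy as np


def log_normalize(counts: np.ndarray, scale: float = 1e4) -> np.ndarray:
    """LogNormalize-style transform: log1p(count / cell total * scale).

    `counts` is a cells x genes matrix of raw UMI counts; every cell is
    assumed to have a non-zero total.
    """
    totals = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / totals * scale)


def module_score(expr: np.ndarray, gene_idx: Sequence[int]) -> np.ndarray:
    """Simplified module score: mean of the marker genes minus the mean of all genes.

    Seurat's AddModuleScore subtracts expression-matched control genes
    instead; this stand-in only illustrates the idea.
    """
    return expr[:, list(gene_idx)].mean(axis=1) - expr.mean(axis=1)


def assign_phase(s_score: np.ndarray, g2m_score: np.ndarray) -> List[str]:
    """Assign S or G2M by the higher score; G1/G0 if both scores are negative."""
    phases = []
    for s, g in zip(s_score, g2m_score):
        if s < 0 and g < 0:
            phases.append("G1/G0")
        elif s > g:
            phases.append("S")
        elif g > s:
            phases.append("G2M")
        else:
            phases.append("Undecided")  # exact tie between non-negative scores
    return phases


# Toy usage: 3 cells x 4 genes; gene 0 stands in for an S marker, gene 1 for a G2/M marker
counts = np.array([[50, 1, 10, 10],
                   [1, 50, 10, 10],
                   [1, 1, 40, 40]])
expr = log_normalize(counts)
phases = assign_phase(module_score(expr, [0]), module_score(expr, [1]))
print(phases)  # ['S', 'G2M', 'G1/G0']
```

As in the study, no regression of cell cycle effects is performed here; the phase labels are simply reported per cell.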
Cluster 17 (derived from sample BC278) showed a prominent signature of highly expressed erythroid progenitor genes. Because low cell numbers had precluded double sorting of this sample, we could not exclude contamination by CD38+ or CD34− downstream cells (CD34 mRNA expression was relatively low in cells from this cluster), so this cluster was excluded from all subsequent DE analyses. Gene set enrichment analysis (GSEA) was performed using GSEA software (http://software.broadinstitute.org/gsea) with default parameters, 1000 permutations on gene sets, and gene sets downloaded from MSigDB or other relevant studies [23,34,35] (Table S3).

Pseudotime analysis: We ordered single cells along their developmental trajectory using the Monocle (v2.0) R package (http://cole-trapnell-lab.github.io/monocle-release/) and its default workflow [36]. Size factors and dispersions were first estimated, and genes with a global minimum expression detection threshold of 0.1 were selected for ordering using dpFeature. We then used t-SNE for dimensionality reduction, and pseudotime trajectories were generated using the plot_cell_trajectory function.

SCENIC analysis: We used SCENIC (https://github.com/aertslab/SCENIC) to construct gene regulatory networks and identify stable cell states [37]. We first applied a two-stage soft filter to remove genes with low expression (total UMI count below the equivalent of 3 UMIs in 1% of cells) or detected in only a few (<1%) cells. For downstream analysis we retained genes available in the RcisTarget database and ran GENIE3 [38] to identify potential targets of each transcription factor based on co-expression. We then identified potential direct targets by DNA-motif analysis using RcisTarget. Finally, we identified the cell state of individual cells by scoring regulon network activity in each cell using AUCell.
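To make the SCENIC filtering and scoring steps concrete, the sketch below implements (i) a gene filter of the kind described above, keeping genes whose total UMI count exceeds 3 × 1% of the number of cells and that are detected in more than 1% of cells, and (ii) an AUCell-style score: the area under the recovery curve of a gene set within the top-ranked 5% of genes in one cell, normalized by the maximum attainable area. The thresholds, the 5% cut-off and the function names are assumptions modelled on SCENIC's documented defaults, not the package's actual code.

```python
from typing import Sequence

import numpy as np


def scenic_gene_filter(counts: np.ndarray) -> np.ndarray:
    """Boolean mask of genes (columns of a cells x genes UMI matrix) to keep.

    A gene is kept if its total count exceeds 3 UMIs x 1% of cells AND it is
    detected (count > 0) in more than 1% of cells.
    """
    n_cells = counts.shape[0]
    total = counts.sum(axis=0)
    detected = (counts > 0).sum(axis=0)
    return (total > 3 * 0.01 * n_cells) & (detected > 0.01 * n_cells)


def aucell_score(expr_row: np.ndarray, gene_set_idx: Sequence[int],
                 top_frac: float = 0.05) -> float:
    """AUCell-style activity of one gene set in one cell (0 = absent, 1 = all at the top)."""
    n_genes = expr_row.size
    max_rank = max(1, int(np.ceil(top_frac * n_genes)))
    # Rank genes from highest to lowest expression in this cell
    order = np.argsort(-expr_row, kind="stable")
    in_set = np.zeros(n_genes, dtype=bool)
    in_set[list(gene_set_idx)] = True
    # Recovery curve: cumulative number of set genes found within the top ranks
    recovery = np.cumsum(in_set[order[:max_rank]])
    # Best possible curve: all set genes (up to max_rank) ranked first
    k = min(len(gene_set_idx), max_rank)
    best = np.cumsum(np.arange(max_rank) < k).sum()
    return float(recovery.sum() / best) if best else 0.0


# Toy usage: 100 genes, gene 0 most highly expressed in this cell
cell = np.arange(100, dtype=float)[::-1]
print(aucell_score(cell, [0, 1]), aucell_score(cell, [98, 99]))
```

In SCENIC the gene sets scored this way are the regulons retained after GENIE3 and RcisTarget, and cells are then grouped by their regulon activity profiles.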

Results

To compare the transcriptional landscape of immunophenotypic CMML stem cells with that of their normal counterparts, we performed scRNA-seq on double-sorted Lin−CD34+CD38− cells from seven untreated CMML patients at presentation (CMML-1, n = 4; CMML-2, n = 3; six males, one female; age range, 60–75) and three healthy volunteers (two males, one female; age range, 36–52) (Figs. 1A, S1A-B, Table S1). In total we sequenced 5870 patient and 1913 normal stem cells, of which 5031 and 1795, respectively, passed quality control and were retained for downstream analysis (Fig. S2A). For each sample we detected a median of 10,520 unique transcripts (range, 2411–14,493) mapping to a median of 2162 genes per cell (range, 917–2524) (Figs. S2B-C).
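The quality-control filters summarised under Data processing and filtering (three MADs per metric; exclusion of genes with average counts below 0.01) can be sketched with NumPy as below. The direction of each test (low library size and low gene count on the log scale, high mitochondrial fraction) and the 1.4826 scaling of the MAD follow common scater practice but are assumptions here, as are the function names.

```python
import numpy as np


def mad_outliers(values, n_mads: float = 3.0, side: str = "both") -> np.ndarray:
    """Flag entries lying more than `n_mads` scaled MADs from the median.

    The MAD is multiplied by 1.4826 so that it estimates the standard
    deviation for normally distributed data (the constant R's mad() uses).
    """
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = 1.4826 * np.median(np.abs(values - med))
    low = values < med - n_mads * mad
    high = values > med + n_mads * mad
    if side == "lower":
        return low
    if side == "upper":
        return high
    return low | high


def qc_keep(lib_size, n_genes, mito_frac, n_mads: float = 3.0) -> np.ndarray:
    """Keep cells that are not outliers on any of the three QC metrics."""
    low_lib = mad_outliers(np.log(lib_size), n_mads, side="lower")
    low_genes = mad_outliers(np.log(n_genes), n_mads, side="lower")
    high_mito = mad_outliers(mito_frac, n_mads, side="upper")
    return ~(low_lib | low_genes | high_mito)


def keep_genes(counts, min_mean: float = 0.01) -> np.ndarray:
    """Keep genes (columns) whose average count across cells is at least `min_mean`."""
    return np.asarray(counts).mean(axis=0) >= min_mean


# Toy usage: the last "cell" has a tiny library and a high mitochondrial fraction
keep = qc_keep(lib_size=[900, 1000, 1100, 950, 1050, 10],
               n_genes=[2000, 2100, 2200, 1950, 2050, 2000],
               mito_frac=[0.02, 0.03, 0.025, 0.02, 0.03, 0.5])
print(keep.tolist())  # [True, True, True, True, True, False]
```

Working on the log scale for library size and gene counts keeps the MAD threshold from being dominated by a few very large libraries.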
Fig. 1

CMML stem cells display distinct transcriptional signatures: (A) Schematic overview of the experimental workflow used in the study. (B) T-distributed stochastic neighbour embedding (t-SNE) plot based on highly variable genes for all cells passing filtering thresholds, coloured by sample type. In the inset, cells derived from CMML patients are coloured black and from healthy controls are coloured grey. BC: Biobank case number; HV: healthy volunteer; (C-D) KEGG (C) and GO biological processes (D) pathway analysis of up-regulated genes in stem cells from CMML samples as compared with control samples. The top pathways differentially active in CMML stem cells are shown; horizontal bars indicate the −log10(q-value) for that term. (E) Gene set enrichment analysis plots showing enrichment of the indicated gene signatures in CMML stem cells as compared with control samples. See also Figs. S1–S3.


CMML stem cells display a myeloid skewed, proliferative transcriptional signature

Two-dimensional visualization by t-SNE based on 526 highly variable genes (defined as those with a biological component of variance significantly greater than zero at a false discovery rate (FDR) of 5%) revealed a clear separation between CMML and control Lin−CD34+CD38− cells and, in addition, striking transcriptomic heterogeneity between different CMML patients (Fig. 1B). By contrast, the three control samples were much less spatially separated. We focused initially on differences common to CMML stem cells by performing differential gene expression analysis between control and CMML Lin−CD34+CD38− cells, with each cell considered as a separate sample. We found 943 genes up-regulated and 296 genes down-regulated in CMML compared with normal cells (FDR <5%, fold change >1.5; Table S2). KEGG and Gene Ontology (GO) analysis of up-regulated genes highlighted enrichment of pathways related to the cell cycle and of pathways characteristic of myeloid lineage cells, including “cytokine-cytokine receptor interaction”, “JAK-STAT signalling” and “chemokine signalling” (Fig. 1C and D). There were no comparable enrichments among down-regulated genes. Up-regulated chemokine genes included CXCL8, CXCL12, CCL3L3 and CXCL10 (Fig. S3A, Table S2). In keeping with the marked inter-patient heterogeneity, no gene was consistently up-regulated across every CMML patient sample. By contrast, many genes were consistently down-regulated, including the proto-oncogenes FOS and JUN, which encode components of the AP-1 transcription factor complex and are important regulators of myeloid differentiation (Fig. S3B). Silencing of FOS has been reported to promote abnormal myeloid differentiation [39], and both genes are among the most prominently up-regulated genes in (unsorted) BM upon clinical response to HMAs in CMML [10].
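The thresholds used for calling differentially expressed genes (FDR <5%, fold change >1.5) can be applied to per-gene p-values and log2 fold changes as sketched below. The p-values would come from the DE test itself (edgeR/sSeq in this study); this sketch assumes Benjamini-Hochberg adjustment for the FDR and a symmetric fold-change cut-off on the log2 scale, and its function names are hypothetical.

```python
import numpy as np


def bh_adjust(pvals) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(scaled, 1.0)
    return q


def call_de(log2_fc, pvals, fdr: float = 0.05, min_fc: float = 1.5):
    """Boolean masks of up- and down-regulated genes at q < fdr and |fold change| > min_fc."""
    q = bh_adjust(pvals)
    lfc = np.asarray(log2_fc, dtype=float)
    threshold = np.log2(min_fc)  # fold change 1.5 corresponds to |log2 FC| > ~0.585
    significant = q < fdr
    return significant & (lfc > threshold), significant & (lfc < -threshold)


# Toy usage: four genes with precomputed log2 fold changes and p-values
up, down = call_de(log2_fc=[1.0, -1.0, 0.2, 2.0], pvals=[0.001, 0.001, 0.001, 0.9])
print(int(up.sum()), "up;", int(down.sum()), "down")  # 1 up; 1 down
```

The third gene is significant but falls below the fold-change cut-off, and the fourth has a large fold change but is not significant, so neither is called.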
We next computed mean normalized expression values for each gene across the sorted single Lin−CD34+CD38− cells of each CMML and control specimen (i.e. seven versus three samples), and performed GSEA using a signal-to-noise ranking metric. We interrogated the ranked list with the Molecular Signatures Database Hallmark gene set collection, in which each gene set conveys a specific biological state or process and displays coherent expression [40]. The most significant enrichments in CMML versus normal Lin−CD34+CD38− cells were for gene sets characteristic of differentiated myeloid cells, such as “Inflammatory response” and “IL6-JAK-STAT signalling”, and for the cell cycle gene sets “E2F targets” and “G2M checkpoint” (Fig. 1E, Tables S3 and S4). No Hallmark gene sets showed enriched expression in control versus CMML Lin−CD34+CD38− cells. To investigate whether gene expression differences between CMML and control stem cells merely reflect the age difference between these groups, we performed GSEA with genes differentially expressed between aged and young HSCs [35]. No gene sets associated with normal HSC ageing were differentially enriched in CMML or normal stem cells (Fig. S3C). Further, of the 943 genes up-regulated in CMML stem cells, only 8 were associated with ageing (Fig. S3D).
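The signal-to-noise metric used to rank genes for GSEA compares group means relative to their spread, (μA − μB)/(σA + σB). The sketch below follows the GSEA user guide's description, including the rule that each standard deviation is floored at 0.2 × |mean| (0.2 when the mean is zero); the use of the sample standard deviation (ddof=1) and the helper names are assumptions.

```python
from typing import List, Sequence

import numpy as np


def signal_to_noise(group_a, group_b) -> np.ndarray:
    """GSEA-style signal-to-noise per gene: (mu_a - mu_b) / (sd_a + sd_b).

    group_a, group_b: samples x genes arrays (here, per-sample mean expression).
    """
    def mean_and_floored_sd(x: np.ndarray):
        mu = x.mean(axis=0)
        sd = x.std(axis=0, ddof=1)  # sample SD (assumption)
        # Floor each SD at 0.2 * |mean|, or 0.2 when the mean is exactly 0
        floor = np.where(mu == 0, 0.2, 0.2 * np.abs(mu))
        return mu, np.maximum(sd, floor)

    mu_a, sd_a = mean_and_floored_sd(np.asarray(group_a, dtype=float))
    mu_b, sd_b = mean_and_floored_sd(np.asarray(group_b, dtype=float))
    return (mu_a - mu_b) / (sd_a + sd_b)


def rank_genes(gene_names: Sequence[str], scores) -> List[str]:
    """Order genes from the highest (group A-enriched) to the lowest score."""
    order = np.argsort(-np.asarray(scores), kind="stable")
    return [gene_names[i] for i in order]


# Toy usage: two "CMML" samples (A) and two "control" samples (B), two genes
a = [[2.0, 1.0], [4.0, 1.0]]
b = [[1.0, 1.0], [1.0, 3.0]]
s2n = signal_to_noise(a, b)
print(rank_genes(["g0", "g1"], s2n))  # ['g0', 'g1']
```

The variance floor prevents genes with near-constant expression in one group from receiving extreme scores, which matters with as few as three control samples.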
Since stem cell division and differentiation are intimately linked, we performed additional GSEA using (i) the gene set up-regulated in human Lin−CD34+CD38− HSCs by comparison with their downstream Lin+CD34+CD38+ progenitor cells [34] and (ii) gene sets up-regulated in Lin−CD34+CD38+ oligopotent progenitors committed to one particular lineage versus all other cells with the same immunophenotype (including B-cell progenitors, megakaryocyte/erythrocyte (MegE) committed progenitors, neutrophil-primed progenitors, monocyte/dendritic cell progenitors and eosinophil/basophil/mast cell (EBM) progenitors) (Table S3). We observed highly significant down-regulation of the normal Lin−CD34+CD38− HSC gene set in CMML versus normal Lin−CD34+CD38− HSCs (Fig. 2A) and highly significant up-regulation of genes selectively expressed in oligopotent Lin−CD34+CD38+ neutrophil-primed or monocyte/dendritic cell-primed progenitors (Fig. 2B). By contrast, there was minimal or absent enrichment of B-cell, MegE or EBM progenitor gene sets (data not shown). These findings are further supported by the single-cell expression of transcription factor genes associated with HSC self-renewal, including MEIS1, HLF, KLF4 and EGR1 (all down-regulated in CMML versus normal stem cells), and of transcription factor and other genes associated with myeloid commitment and differentiation, such as RUNX1, CEBPD, LGALS1 and AZU1 (all up-regulated in CMML versus normal HSCs) (Fig. 2C). We also found up-regulation of several genes restricted to subsets of patients; for example, CDKN1A was up-regulated in a subset of patients' stem cells, as observed previously by targeted qRT-PCR analysis [11].
Fig. 2

CMML stem cells lose features of stem cell quiescence and exhibit myeloid bias: (A and B) Gene set enrichment analysis plots showing enrichment of the indicated gene signatures in CMML stem cells as compared with control samples. (C) Violin plots showing the expression of selected genes relating to HSC maintenance and myeloid commitment in CMML and control stem cells. Expression levels are shown as log10(normalized counts + 1).

Taken together, these data demonstrate that by comparison with normal Lin−CD34+CD38− HSCs, CMML stem cells exhibit a transcriptome characterized by increased expression of myeloid lineage and cell cycle genes, and lower expression of genes selectively expressed by normal HSCs.

Detection of a normal Lin−CD34+CD38− single-cell transcriptional signature in CMML patients

We next performed unsupervised iterative clustering and identified 17 transcriptionally separable subpopulations across all sequenced cells (Fig. 3A). For each subpopulation compared with all remaining populations, the 20 most up-regulated genes were identified (Table S5). Expression of these marker genes for each of the 17 clusters is shown in Fig. 3B, along with the top three significantly enriched KEGG terms for each cluster. As expected, and consistent with the analyses above, the terms for CMML stem cell clusters included those associated with myeloid lineage differentiation and the cell cycle (Fig. 3B). Normal Lin−CD34+CD38− HSCs clustered almost exclusively in Cluster 15 (Fig. 3C). Notably, most CMML samples contained a variable proportion of cells also assigned to Cluster 15 (median 3.1%; range, 0.1–52.3%; Fig. 3C), suggesting the residual presence of cells akin to normal HSCs within most CMML patients.
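The graph underlying PhenoGraph-style clustering weights each pair of neighbouring cells by the Jaccard similarity of their nearest-neighbour sets. The sketch below builds such a weighted graph from a small embedding (for example, principal components) with brute-force Euclidean neighbours; it omits the community-detection step (Louvain in PhenoGraph) and the iterative splitting performed by scrattch.hicat, and its function names are hypothetical.

```python
import numpy as np


def knn_indices(x: np.ndarray, k: int) -> np.ndarray:
    """Brute-force k-nearest neighbours (Euclidean), excluding each point itself."""
    d = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1, kind="stable")[:, :k]


def jaccard_knn_graph(x: np.ndarray, k: int) -> np.ndarray:
    """Symmetric weights W[i, j] = |N(i) & N(j)| / |N(i) | N(j)| for neighbour pairs.

    N(i) is the set of k nearest neighbours of cell i; pairs that are not
    nearest neighbours keep weight 0.
    """
    neighbours = [set(row) for row in knn_indices(x, k).tolist()]
    n = x.shape[0]
    w = np.zeros((n, n))
    for i in range(n):
        for j in neighbours[i]:
            shared = len(neighbours[i] & neighbours[j])
            union = len(neighbours[i] | neighbours[j])
            w[i, j] = w[j, i] = shared / union
    return w


# Toy usage: two well-separated groups of three "cells" in a 2-D embedding
pcs = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
w = jaccard_knn_graph(pcs, k=2)
print(np.round(w, 2))
```

Cells in the same group share neighbours and are linked, whereas cross-group weights stay at zero, so a community-detection algorithm run on `w` would recover the two groups.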
Fig. 3

A variable fraction of CMML stem cells share a transcriptional signature with normal stem cells: (A) t-SNE plot based on highly variable genes for all cells passing filtering thresholds, indicating 17 transcriptionally separable subpopulations (distinguished by colour) as determined by unsupervised iterative clustering. Normal stem cells (from all three healthy controls) segregate together in Cluster 15. (B) Heat map displaying scaled expression of the top 20 differentially expressed genes in each cluster, with respect to cells from all other clusters. Top 3 KEGG pathways (lowest q-value) active in each cluster are shown next to the corresponding cluster (where identified). (C) Heat map indicating the distribution of cells for each sample (columns) across the 17 defined clusters (rows); shading indicates scaled percentage of cells within each cluster for that patient/control sample. Lower panel indicates percentage of cells within Cluster 15 (on a logarithmic scale). See also Fig. S4.

Leukaemic stem cells could influence normal stem cell function via paracrine signalling or through alterations to the microenvironment. Samples BC786 and BC746 contained unusually high proportions (52.4% and 46.5%, respectively) of cells assigned to the “normal” Cluster 15 (Fig. 3C), alongside distinctive patient-specific leukaemic clusters. Interestingly, when Cluster 15 CMML stem cells were compared with Cluster 15 stem cells from healthy controls, significant and consistent gene expression differences were evident. We again observed relative down-regulation of normal HSC genes in Cluster 15 stem cells recovered from CMML patients BC786 and BC746 compared with normal individuals; in addition, Cluster 15 stem cells from patient BC746 showed a significant relative increase in genes selectively expressed in both neutrophil and monocyte/dendritic progenitor cells (Figs. S4A-C).
It remains unclear whether these transcriptional differences within Cluster 15 are driven by CMML-associated genetic lesions or alternatively reflect residual genetically normal HSCs in the BM of CMML patients whose transcriptome has been modified by an altered disease-associated BM microenvironment.

Stem cells from CMML-1 and CMML-2 patients harbour distinct transcriptional programs

An increased BM blast percentage is strongly associated with inferior outcome in CMML [1]. To determine whether this clinicopathological distinction is reflected by transcriptomic differences within the CMML stem cell compartment, we compared single stem cell transcriptomes from patients with WHO-defined CMML-0 or CMML-1 (<10% BM blasts and promonocytes; hereafter “CMML-1”) with those from patients with “CMML-2” (≥10% BM blasts and promonocytes). Again, each cell was considered as a separate sample. We observed substantial differences: 782 genes were up-regulated and 802 genes down-regulated in CMML-1 versus CMML-2 stem cell populations (FDR <5%, fold change >1.5; Table S6). KEGG and GO Biological Process analysis showed enriched expression in CMML-1 of gene sets annotated “DNA replication”, “Cell cycle” and “Translational elongation”, among others. By contrast, gene sets enriched in CMML-2 compared with CMML-1 stem cells included those related to inflammatory and cytokine responses, JAK-STAT signalling and apoptosis (Table S7). Given these highly distinct categories of KEGG and GO terms enriched in CMML-1 versus CMML-2 stem cells, we used the alternative visualization method UMAP, which is reported to represent cellular origins and developmental trajectories better than the more widely used t-SNE method [41]. This revealed a remarkable segregation of CMML-2 stem cell transcriptomes away from CMML-1 and control stem cell transcriptomes, except for a subfraction of stem cells from CMML-2 patient BC572 (Fig. 4A).
Fig. 4

CMML-1 and CMML-2 stem cells show distinct developmental trajectories: (A) Uniform Manifold Approximation and Projection (UMAP) plot based on highly variable genes for all cells. (B) Cellular trajectories displayed as linear stream plots for each sample, illustrating the spread and density of individual cells in pseudotime. (C and D) Cellular trajectories of all cells in the study mapped to a branching 3D model, as constructed by the Monocle package; colour coding indicates (C) scaled pseudotime; and (D) the patient/control sample from which each cell derived. Branching points 1 and 2 are indicated in (C) by bold black dots. CMML-1, CMML-2 and HV sample IDs are labelled in green, red and blue respectively. See also Fig. S5.

To further interrogate the relatedness of normal and CMML stem cell populations, we performed pseudotemporal ordering analysis, a method that constructs putative cellular trajectories by statistically ordering cells according to transcriptomic similarity [36,41]. In a two-dimensional linear representation, clear demarcation of CMML-1, CMML-2 and normal stem cells was observed (Fig. 4B). This separation was further emphasized by a branched representation in which pseudotime ordered stem cells into distinct branches without a priori assumptions about normal HSC differentiation (Fig. 4C and D). In this projection, control and CMML-1 stem cells are largely located above branching point 2, whereas those before branching point 2 are predominantly CMML-2 stem cells (Fig. S5A). Given the distinct transcriptomes of CMML-1 versus CMML-2 stem cells, we next evaluated transcription factor expression levels, since transcription factors are core regulators of cellular state and differentiation. Consistent with the more extensive monocytic differentiation observed in CMML-2 cases, we noted significantly higher expression of transcription factor genes associated with monocytic differentiation, such as IRF1, IRF7, IRF9, MAFF and CEBPB, in CMML-2 versus CMML-1 stem cells (Fig. 5A, Fig. S5B).
In contrast, the key oncogenic transcription factor MYC and its downstream targets were up-regulated in CMML-1 versus either CMML-2 or normal stem cells, in keeping with the observed enrichment of MYC-associated KEGG and GO terms (Fig. 5B). Furthermore, mature granulocytic genes including ELANE, MPO and LYZ were significantly up-regulated in CMML-1 stem cells (Fig. S5C). Transcription factors specifying granulomonocytic commitment and differentiation, such as SPI1 and CEBPA, were also consistently more highly expressed in CMML-1 than in CMML-2 stem cells, but the differences did not reach statistical significance, possibly owing to the small sample size (Fig. 5C). Taken together, our data reveal that stem cell transcriptomes from patients with CMML-2 are quite distinct from those of patients with CMML-1, apparently anticipating disease morphology and thus outcome.
Fig. 5

Stem cells from CMML-1 and CMML-2 patients harbour distinct transcriptional signatures: (A and C) Violin plots showing the relative expression level of selected genes involved in myeloid differentiation. BC: Biobank case number; HV: healthy volunteer; CMML-1, CMML-2 and HV sample IDs are labelled in green, red and blue respectively. (B) Gene set enrichment analysis plots of MYC target genes comparing CMML-1 stem cells with control (upper panel) or CMML-2 (lower panel) stem cells.

Another source of clinicopathological heterogeneity in CMML is the prognostically significant distinction between dysplastic and proliferative disease. Differential gene expression analysis identified 195 genes up-regulated and 596 down-regulated in dysplastic stem cells relative to proliferative stem cells (Table S8). Terms including “negative regulation of cell proliferation” and “negative regulation of developmental process” were enriched in dysplastic stem cells (Table S9), but no hallmark gene sets were enriched in stem cells from dysplastic CMML cases as compared with proliferative cases (data not shown). No genes were uniquely expressed in CMML stem cells from cases of proliferative versus dysplastic disease. Of note, gene expression profiles of whole unsorted BM mononuclear cells and of sorted CD34+ cells from CMML patients similarly did not segregate samples by proliferative versus dysplastic phenotype [15]. Together, these observations suggest that whereas the distinction between CMML-1 and CMML-2 stage disease is primed by upstream transcriptional programs within the primitive stem cell compartment, this does not appear to be the case for dysplastic versus proliferative phenotype; presumably, the degree of proliferative potential is instead programmed by genetic or epigenetic events downstream of the stem cell compartment.

Transcription factor networks involved in CMML stem cells

To evaluate transcription factor dysregulation in CMML stem cells more deeply, we performed a transcriptional regulatory network analysis. A regulon is a gene regulatory network comprising a transcription factor and all of its putative downstream targets. This approach is particularly suited to scRNA-seq, as it allows direct evaluation of the coordinated expression of a regulating transcription factor and its dependent targets within the same cell. Moreover, because each regulon is considered as a whole, the approach is robust against gene drop-outs, which are common in scRNA-seq datasets. For this we employed SCENIC (Single-Cell rEgulatory Network Inference and Clustering) [37], a computational algorithm that identifies regulon modules by integrating co-expression of a transcription factor with genes containing the corresponding transcription factor-binding motif in their cis-regulatory elements. Clustering based on a total of 296 regulons again showed that normal and CMML Lin−CD34+CD38− stem cells were widely separable in transcriptional space (Fig. 6A). Whilst some regulons were equally active across all cells (e.g. ELF1, HMGA1, ETS2), certain transcription factor networks displayed unique activity in subsets of samples (Fig. 6B). Consistent with our earlier analyses, regulons associated with the proto-oncogene JUN and with transcription factors linked to HSC self-renewal (e.g. KLF4, HLF) were relatively inactive in CMML (Fig. 6C). CMML-1 stem cells showed a relative increase in activity of regulons associated with the known myeloid transcription factors CEBPD, CEBPA and MYC. Networks related to monocytic differentiation were preferentially active in CMML-2 stem cells (Fig. 6C and Fig. S6), consistent with our earlier single-gene and GSEA observations. These included regulons centred on NFKB1, IRF8, RELA, RELB and IRF7, and other transcription factors strongly associated with monocytic lineage differentiation.
Thus, CMML stem cell transcriptomes and lineage potential are characterized by distinctive transcriptional circuits, depending on whether the cells derive from patients with CMML-1 (MYC) or CMML-2 (IRF transcription factors).
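The per-cell regulon activity underlying Fig. 6 can be illustrated with a short Python sketch. In SCENIC, regulon activity in each cell is scored with AUCell, which ranks that cell's genes by expression and measures how strongly the regulon's targets are recovered near the top of the ranking. The functions below are a simplified, hypothetical re-implementation of that idea, not the SCENIC/AUCell code: the normalisation, the `top_fraction` cut-off and the fixed binarisation threshold are assumptions rather than the settings used in this study, and all gene names, values and the target list are invented.

```python
from typing import Dict, List, Sequence


def regulon_auc(
    expression: Dict[str, float], regulon: Sequence[str], top_fraction: float = 0.05
) -> float:
    """Simplified, AUCell-inspired regulon activity score for one cell.

    Genes are ranked by expression (highest first); the score is the area under
    the recovery curve of regulon targets within the top `top_fraction` of the
    ranking, divided by (cut-off rank x number of targets) so it lies in [0, 1].
    """
    ranked = sorted(expression, key=expression.get, reverse=True)
    cutoff = max(1, int(len(ranked) * top_fraction))
    targets = set(regulon) & set(expression)  # ignore targets absent from the cell
    if not targets:
        return 0.0
    recovered = 0
    area = 0
    for gene in ranked[:cutoff]:
        if gene in targets:
            recovered += 1
        area += recovered  # height of the recovery curve at this rank
    return area / (cutoff * len(targets))


def binarize(scores: List[float], threshold: float) -> List[int]:
    """Active (1) / inactive (0) calls from a fixed, hypothetical threshold."""
    return [1 if score > threshold else 0 for score in scores]


# Invented expression values for two cells over the same eight genes.
cell_a = {"ISG15": 9.0, "IFIT1": 8.0, "IRF7": 7.0, "ACTB": 6.0,
          "MYC": 1.0, "NPM1": 0.5, "NCL": 0.2, "GAPDH": 0.1}
cell_b = {"MYC": 9.0, "NPM1": 8.0, "NCL": 7.0, "GAPDH": 6.0,
          "ISG15": 1.0, "IFIT1": 0.5, "IRF7": 0.2, "ACTB": 0.1}
irf7_regulon = ["IRF7", "ISG15", "IFIT1", "MX1"]  # hypothetical target set

scores = [regulon_auc(c, irf7_regulon, top_fraction=0.5) for c in (cell_a, cell_b)]
print(scores)                 # [0.75, 0.0]
print(binarize(scores, 0.2))  # [1, 0]
```

Because the score aggregates over all targets, one undetected target (MX1 is absent from both example cells) does not by itself silence the regulon, which is the robustness argument made above.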
Fig. 6

Distinct transcription factor networks operational in CMML-1 and CMML-2 stem cells: (A) t-SNE plot based on regulon activity in all CMML and control stem cells included in the study, as determined by SCENIC analysis. Segregation of cells derived from CMML-1, CMML-2 and normal controls is indicated by the dashed lines. BC: Biobank case number; HV: healthy volunteer; CMML-1, CMML-2 and HV sample IDs are labelled in green, red and blue respectively. (B & C) t-SNE plots based on activity of regulons centred on the indicated transcription factors, displayed as a binary measure (grey: inactive; blue: active). Regulons that are uniquely active in HV, CMML-1 and CMML-2 cells are highlighted in blue, green and red boxes, respectively (C). Regulons that are active either in all cells or in cells that do not fall under any specific category are highlighted in black boxes (B). See also Fig. S6.

Inter- and intra-patient heterogeneity in CMML HSPCs

Understanding clonal heterogeneity in the stem cell compartment within and between patients is critical for the optimal design of novel therapies, which, to be effective, must target all tumour sub-populations within a given patient. Cell-to-cell transcriptional correlation analysis found no consistent differences in the magnitude of overall transcriptomic heterogeneity between samples, or between CMML and controls (Fig. S7A). This suggests that, overall, CMML stem cells are no more heterogeneous than normal stem cells. However, unlike control stem cells, each patient's stem cells were distributed across 2–3 distinct clusters (Fig. 7A and S7B).
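A simple way to summarise the per-sample heterogeneity compared in Fig. S7A is the mean pairwise Pearson correlation between cells of the same sample, so that a lower average correlation means greater heterogeneity. The Python sketch below assumes each cell is an equal-length, already normalised expression vector over a shared gene set; the use of Pearson correlation and of a `1 - mean r` heterogeneity index are assumptions for illustration and may differ from the analysis actually performed, and all values are invented.

```python
import math
from itertools import combinations
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant expression vector")
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_correlation(cells: List[Sequence[float]]) -> float:
    """Average Pearson correlation over all unordered pairs of cells in one sample."""
    pairs = list(combinations(cells, 2))
    if not pairs:
        raise ValueError("need at least two cells")
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def heterogeneity_index(cells: List[Sequence[float]]) -> float:
    """1 - mean pairwise correlation: higher values indicate a more heterogeneous sample."""
    return 1.0 - mean_pairwise_correlation(cells)


# Invented vectors over the same four genes for two toy samples.
uniform_sample = [[1, 2, 3, 4], [2, 4, 6, 8], [1.5, 3, 4.5, 6]]
mixed_sample = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 2, 3, 4]]
print(round(heterogeneity_index(uniform_sample), 6))  # 0.0
print(round(heterogeneity_index(mixed_sample), 6))    # 1.333333
```

Comparing such an index across samples, rather than clustering, addresses the "how heterogeneous overall" question, whereas the cluster analysis in Fig. 7A addresses "how many distinct states".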
Fig. 7

Inter- and intra-patient heterogeneity in CMML stem cells: (A) t-SNE plots based on highly variable genes for each CMML patient separately, highlighting the distribution of their stem cells across the defined clusters; distinct clusters represented within a sample are assigned different colours for emphasis. t-SNE plots presenting CMML-1 and CMML-2 stem cells are highlighted in green and red boxes, respectively. (B) Bubble chart showing relative enrichment of the indicated pathways in each distinct cluster from the indicated CMML patient samples, all with respect to Cluster 15 from all merged control samples. Bubble colour represents scaled -log10(FDR); bubble size indicates the number of up-regulated genes in the indicated pathways. See also Fig. S7.

Next, we performed pairwise pathway analyses comparing the transcriptional profile of each CMML patient-specific cluster with that of Cluster 15, excluding cells from CMML patients that mapped to this cluster (Fig. 7B). Differentially expressed genes mapped to distinct KEGG pathways in a highly heterogeneous manner. Some pathways (e.g. “cytokine-cytokine receptor interaction”, “chemokine signalling”) were generally active across most disease clusters, whilst others were active only in certain clusters in some patients. Moreover, we observed striking heterogeneity within certain CMML patients. For example, patient BC776 displayed three major clusters: 3, 4 and 5. While Clusters 3 and 5 shared many features, “Cell cycle” and “DNA replication” gene sets were more active in Cluster 3 than in Cluster 5. Similarly, Cluster 4 showed minimal activation of JAK-STAT and MAPK signalling pathways in comparison with Clusters 3 and 5. These data further highlight transcriptional heterogeneity not just between, but within, CMML patient stem cell populations.
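The bubble chart in Fig. 7B combines, for each cluster and pathway, a count of up-regulated genes and an FDR value. One common way to obtain such numbers is a one-sided hypergeometric (over-representation) test, sketched below in Python. The section does not state which enrichment test was used, so the test, the gene universe and all gene names here are assumptions, and the toy example is invented.

```python
from math import comb
from typing import Iterable, Tuple


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(population, draws)


def pathway_overrepresentation(
    up_genes: Iterable[str], pathway_genes: Iterable[str], universe: Iterable[str]
) -> Tuple[int, float]:
    """Overlap size and one-sided p-value for a cluster's up-regulated genes in a pathway."""
    universe = set(universe)
    up = set(up_genes) & universe            # restrict both lists to the tested genes
    pathway = set(pathway_genes) & universe
    overlap = len(up & pathway)
    p = hypergeom_sf(overlap, len(universe), len(pathway), len(up))
    return overlap, p


# Toy universe of ten hypothetical genes; a three-gene pathway fully recovered.
universe = [f"G{i}" for i in range(10)]
pathway = ["G0", "G1", "G2"]
cluster_up = ["G0", "G1", "G2"]
overlap, p = pathway_overrepresentation(cluster_up, pathway, universe)
print(overlap, round(p, 6))  # 3 0.008333
```

In a full analysis the p-values across all pathways and clusters would then be FDR-adjusted (for example with Benjamini-Hochberg) to give the colour scale, while the overlap count gives the bubble size.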

Discussion

Novel inhibitors targeting pathogenic gain-of-function mutations (e.g. IDH1, IDH2, FLT3) are changing the treatment landscape in myeloid malignancies, but have a limited role in CMML, in which these mutations are comparatively rare [5]. The typical mutations driving CMML (TET2, SRSF2, ASXL1, RUNX1, N/KRAS) have proved difficult to target directly, and no tractable vulnerabilities specific to the CMML disease-initiating compartment have been identified to rationally guide targeted therapy design. Our data reveal that the defining myelomonocytic expansion of CMML is primed from the Lin−CD34+CD38− stem cell compartment, with fundamental myelomonocytic transcriptomic perturbation and dysregulation of key differentiation pathways. These results are consistent with the previously observed up-regulation of the myelomonocytic transcription factor SPI1 in sorted common myeloid progenitor cells from CMML patients compared with healthy controls [13]. We highlight candidate genes and pathways consistently and exclusively dysregulated in CMML stem cells, apparently across genotypes: for example, down-regulation of the AP-1 subunits FOS and JUN, implicating NF-κB pathway activation as in other cancers [42], and overexpression of LGALS1, a positive regulator of both RAS signalling and the anti-apoptotic proto-oncogenes BCL-2 and MCL-1 [43]. More broadly, we identify pathways aberrantly operational in stem cells from subsets of patients, including JAK-STAT signalling, inflammatory response, translation elongation and cell cycle progression, each of which is increasingly amenable to therapeutic modulation with novel agents. Aberrant expression of genes involved in inflammation and cell cycle was also observed in whole unsorted BM mononuclear cells and in monocytes from CMML patients, indicating that these altered regulatory networks are programmed within primitive stem cells and propagated throughout differentiation [11,13].
Activation of the JAK-STAT pathway, not necessarily accompanied by a demonstrable JAK2 mutation and particularly as a hypersensitive response to native GM-CSF, has been reported as a key feature of CMML, and drugs targeting this pathway have shown promise in both preclinical models and early-phase clinical trials [44], [45], [46], [47], [48]. Our data reveal a substantial degree of both inter- and intra-patient heterogeneity within the Lin−CD34+CD38− compartment of CMML. Strikingly, every patient displayed a private set of differentially expressed genes and aberrantly operational transcription factor networks. This extent of transcriptomic heterogeneity between patients mirrors the observed clinical heterogeneity, which far exceeds the mutational diversity of this disease, and indicates that effective targeted therapy approaches in CMML may need to be highly personalized [7]. We also observed wide segregation and distinct transcriptomic profiles between CMML-1 and CMML-2 stem cells, disease stages that are clinically distinguished solely by the percentage of blasts and promonocytes. CMML-1 and CMML-2 stem cells were segregated widely by UMAP. CMML-1 stem cells exhibited a program defined by high expression of MYC and of myeloid differentiation and cell cycle genes. CMML-2 stem cells were characterized by high activity of regulomes associated with IRF1, IRF7 and IRF8, factors that are highly expressed during monocytic lineage differentiation. These data highlight the distinctive priming of CMML-1 versus CMML-2 malignant stem cells and argue against a simplistic, linear succession model of CMML disease progression. This implies that CMML-1 and CMML-2 might represent quite different disease states rather than necessarily steps on a continuum, although our sample size was too small for definitive confirmation. Notably, many CMML-1 patients do not transform to AML despite sometimes exhibiting highly proliferative, aggressive disease.
CMML-2 patients, by contrast, frequently present with a disease closely resembling, or rapidly evolving into, AML. We identified variably sized populations of CMML stem cells transcriptionally resembling normal HSCs. It remains unclear whether these are part of the malignant clone, partially transformed pre-leukaemic stem cells or genuinely normal residual stem cells. That they might be uninvolved bystanders is supported by a previous report that founding mutations were detectable in most, but not all, HSC/MPP-derived colonies from a majority of CMML patients [13]. Nevertheless, the relationship between genotype and functional LIC status is uncertain, particularly since CMML founding mutations are also common markers of age-related clonal haematopoiesis of indeterminate potential [49]. Our scRNA-seq data, obtained at much higher throughput, are also consistent with the hypothesis that the BM of CMML patients contains residual genetically uninvolved stem cells, with transcriptomes subtly altered by a non-cell-autonomous effect. Such alterations could represent transcriptional perturbations induced by paracrine signalling from neighbouring malignant stem cells or from the leukaemic microenvironment, as has been observed in chronic myeloid leukaemia [50]. Irrespective of their origin, these cells raise the prospect of a therapeutic window: a reservoir of cells that might survive a targeted therapy exploiting more pervasive transcriptional vulnerabilities and then restore functional haematopoiesis. Whether the transcriptomic heterogeneity of CMML stem cells is mediated by genetic, epigenetic or microenvironmental influences remains uncertain. We found no convincing correlation between transcriptional clusters and patient mutation status as genotyped at bulk MNC level (data not shown), but these analyses were indirect and limited by small sample size.
Our limited sample size also precluded a detailed investigation of the transcriptional consequences of combined TET2/SRSF2 mutation in patient HSPCs. Studies leveraging emerging technologies to correlate transcriptome with genotype at the single-cell level, and in larger patient cohorts, are required to address comprehensively the link between transcriptional and mutational heterogeneity. Taken together, however, our data demonstrate CMML to be a heterogeneous stem cell malignancy, with multiple transcriptional routes converging on a common downstream clinicopathological phenotype.

Author contributions

KB, TS and DW designed the study and performed the experiments. KB, SM, TS and DW analysed the data and wrote the manuscript, with contribution from all other authors.

Declaration of Competing Interest

The authors declare no competing interests.
References

1. Independent filtering increases detection power for high-throughput experiments.

Authors: Richard Bourgon; Robert Gentleman; Wolfgang Huber
Journal: Proc Natl Acad Sci U S A; Date: 2010-05-11

2. Chronic Myelomonocytic Leukemia: a Genetic and Clinical Update.

Authors: Kristen B McCullough; Mrinal M Patnaik
Journal: Curr Hematol Malig Rep; Date: 2015-09

3. Chronic myelomonocytic leukemia: 2018 update on diagnosis, risk stratification and management.

Authors: Mrinal M Patnaik; Ayalew Tefferi
Journal: Am J Hematol; Date: 2018-06

4. Hematopoietic stem cell and progenitor cell mechanisms in myelodysplastic syndromes.

Authors: Wendy W Pang; John V Pluvinage; Elizabeth A Price; Kunju Sridhar; Daniel A Arber; Peter L Greenberg; Stanley L Schrier; Christopher Y Park; Irving L Weissman
Journal: Proc Natl Acad Sci U S A; Date: 2013-02-06

5. Clonal architecture of chronic myelomonocytic leukemias.

Authors: Raphaël Itzykson; Olivier Kosmider; Aline Renneville; Margot Morabito; Claude Preudhomme; Céline Berthon; Lionel Adès; Pierre Fenaux; Uwe Platzbecker; Olivier Gagey; Philippe Rameau; Guillaume Meurice; Cédric Oréar; François Delhommeau; Olivier A Bernard; Michaela Fontenay; William Vainchenker; Nathalie Droin; Eric Solary
Journal: Blood; Date: 2013-01-14

6. Microarray and serial analysis of gene expression analyses identify known and novel transcripts overexpressed in hematopoietic stem cells.

Authors: Robert W Georgantas; Vivek Tanadve; Matthew Malehorn; Shelly Heimfeld; Chen Chen; Laura Carr; Francisco Martinez-Murillo; Greg Riggins; Jeanne Kowalski; Curt I Civin
Journal: Cancer Res; Date: 2004-07-01

7. An evolutionary perspective on chronic myelomonocytic leukemia.

Authors: R Itzykson; E Solary
Journal: Leukemia; Date: 2013-04-05

8. Age-related mutations and chronic myelomonocytic leukemia.

Authors: C C Mason; J S Khorashad; S K Tantravahi; T W Kelley; M S Zabriskie; D Yan; A D Pomicter; K R Reynolds; A M Eiring; Z Kronenberg; R L Sherman; J W Tyner; B K Dalley; K-H Dao; M Yandell; B J Druker; J Gotlib; T O'Hare; M W Deininger
Journal: Leukemia; Date: 2015-12-09

9. Interleukin 10 inhibits growth and granulocyte/macrophage colony-stimulating factor production in chronic myelomonocytic leukemia cells.

Authors: K Geissler; L Ohler; M Födinger; I Virgolini; M Leimer; E Kabrna; M Kollars; S Skoupy; B Bohle; M Rogy; K Lechner
Journal: J Exp Med; Date: 1996-10-01

10. SCENIC: single-cell regulatory network inference and clustering.

Authors: Sara Aibar; Carmen Bravo González-Blas; Thomas Moerman; Vân Anh Huynh-Thu; Hana Imrichova; Gert Hulselmans; Florian Rambow; Jean-Christophe Marine; Pierre Geurts; Jan Aerts; Joost van den Oord; Zeynep Kalender Atak; Jasper Wouters; Stein Aerts
Journal: Nat Methods; Date: 2017-10-09