Lisa Bast1,2, Michèle C Buck3, Judith S Hecker3, Robert A J Oostendorp3, Katharina S Götze3,4, Carsten Marr1,2. 1. Helmholtz Zentrum München-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany. 2. Technical University of Munich, Department of Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching, Germany. 3. Technical University of Munich, School of Medicine, Klinikum rechts der Isar, Department of Internal Medicine III, Munich, Germany. 4. German Cancer Consortium (DKTK), Heidelberg, Partner Site Munich, Germany.
Abstract
Classically, hematopoietic stem cell (HSC) differentiation is assumed to occur via progenitor compartments of decreasing plasticity and increasing maturity in a specific, hierarchical manner. The classical hierarchy has been challenged in the past by alternative differentiation pathways. We abstracted experimental evidence into 10 differentiation hierarchies, each comprising 7 cell type compartments. By fitting ordinary differential equation models with realistic waiting time distributions to time-resolved data of differentiating HSCs from 10 healthy human donors, we identified plausible lineage hierarchies and rejected others. We found that, for most donors, the classical model of hematopoiesis is preferred. Surprisingly, multipotent lymphoid progenitor differentiation into granulocyte-monocyte progenitors is plausible in 90% of samples. An in silico analysis confirmed that, even for strong noise, the classical model can be identified robustly. Our computational approach infers differentiation hierarchies in a personalized fashion and can be used to gain insights into kinetic alterations of diseased hematopoiesis.
Classically, hematopoietic stem cell (HSC) differentiation is assumed to occur via progenitor compartments of decreasing plasticity and increasing maturity in a specific, hierarchical manner. The classical hierarchy has been challenged in the past by alternative differentiation pathways. We abstracted experimental evidence into 10 differentiation hierarchies, each comprising 7 cell type compartments. By fitting ordinary differential equation models with realistic waiting time distributions to time-resolved data of differentiating HSCs from 10 healthy human donors, we identified plausible lineage hierarchies and rejected others. We found that, for most donors, the classical model of hematopoiesis is preferred. Surprisingly, multipotent lymphoid progenitor differentiation into granulocyte-monocyte progenitors is plausible in 90% of samples. An in silico analysis confirmed that, even for strong noise, the classical model can be identified robustly. Our computational approach infers differentiation hierarchies in a personalized fashion and can be used to gain insights into kinetic alterations of diseased hematopoiesis.
Identification of the hematopoietic lineage hierarchy is crucial to understand how blood production is controlled and how it is altered in hematopoietic diseases. Current knowledge of humanhematopoiesis suggests that the formation of all blood cell types is maintained by a lineage hierarchy, with hematopoietic stem cells (HSCs) at the apex, followed by a set of progenitors that terminate in distinct mature blood cell types. In the classical models of hematopoiesis (Akashi et al., 2000; Doulatov et al., 2012; Manz et al., 2002), HSCs give rise to multipotent progenitors (MPPs), which can differentiate into common myeloid progenitors (CMPs) and multipotent lymphoid progenitors (MLPs). CMPs can further differentiate to megakaryocyte erythrocyte progenitors (MEPs) or granulocyte-monocyte progenitors (GMPs, Figure 1A). Finally, MEPs give rise to erythrocytes and megakaryocytes, which then form platelets, while GMPs differentiate to granulocytes and monocytes. In recent years, this classical model has often been debated as additional or alternative differentiation paths between these cell types have been suggested in humans and mice by a wide range of experimental techniques. Several review studies (Doulatov et al., 2012; Haas et al., 2018; Laurenti and Göttgens, 2018) have summarized these findings and pinpointed the differences in hematopoiesis between mice and humans, but so far, no comprehensive analysis quantitatively assessed the plausibility of competing differentiation hierarchies.
We constructed 10 different models from hematopoietic hierarchies and differentiation pathways described in the literature (see Methods for a detailed description). Each model consists of a stem cell compartment (HSC), five progenitor compartments (MPPs, MLPs, CMPs, MEPs, GMPs), a compartment comprising late progenitors and mature cells (M). Transitions between states are visualized with arrows, where black arrows indicate transitions that deviate from the classical model (A).
(A) The classical model based on (Akashi et al., 2000) (Akashi et al., 2000; Manz et al., 2002), Laurenti and Göttgens (2018), Haas et al., (2018), and (Doulatov et al., 2012) is the least complex lineage hierarchy with 21 rates.
(B) Same hierarchy as in A but with additional direct differentiation from MLPs to GMPs (22 rates).
(C) Same hierarchy as in B but with additional direct differentiation from HSCs to M compartment (23 rates).
(D) Same hierarchy as in A but with additional direct differentiation from HSCs to Ms (22 rates).
(E) Same hierarchy as in B but with additional direct differentiation from HSCs and MPPs to MEPs (24 rates).
(F) Same hierarchy as in A but with additional direct differentiation from HSCs to CMPs and from MPPs to GMPs (23 rates).
(G) Same hierarchy as in A but replacing the direct differentiation from CMPs to MEPs with the direct differentiation from HSCs to MEPs (21 rates).
(H) Same hierarchy as in A but with additional direct differentiation from HSCs to CMPs, and MPPs to MEPs and GMPs (22 rates).
(I) Same hierarchy as in G but with additional direct differentiation from MLPs to GMPs (22 rates).
(J) Most complex lineage hierarchy incorporating all direct differentiation paths present in any other model (27 rates).
Lineage hierarchies describing healthy hematopoiesisWe constructed 10 different models from hematopoietic hierarchies and differentiation pathways described in the literature (see Methods for a detailed description). Each model consists of a stem cell compartment (HSC), five progenitor compartments (MPPs, MLPs, CMPs, MEPs, GMPs), a compartment comprising late progenitors and mature cells (M). Transitions between states are visualized with arrows, where black arrows indicate transitions that deviate from the classical model (A).(A) The classical model based on (Akashi et al., 2000) (Akashi et al., 2000; Manz et al., 2002), Laurenti and Göttgens (2018), Haas et al., (2018), and (Doulatov et al., 2012) is the least complex lineage hierarchy with 21 rates.(B) Same hierarchy as in A but with additional direct differentiation from MLPs to GMPs (22 rates).(C) Same hierarchy as in B but with additional direct differentiation from HSCs to M compartment (23 rates).(D) Same hierarchy as in A but with additional direct differentiation from HSCs to Ms (22 rates).(E) Same hierarchy as in B but with additional direct differentiation from HSCs and MPPs to MEPs (24 rates).(F) Same hierarchy as in A but with additional direct differentiation from HSCs to CMPs and from MPPs to GMPs (23 rates).(G) Same hierarchy as in A but replacing the direct differentiation from CMPs to MEPs with the direct differentiation from HSCs to MEPs (21 rates).(H) Same hierarchy as in A but with additional direct differentiation from HSCs to CMPs, and MPPs to MEPs and GMPs (22 rates).(I) Same hierarchy as in G but with additional direct differentiation from MLPs to GMPs (22 rates).(J) Most complex lineage hierarchy incorporating all direct differentiation paths present in any other model (27 rates).To understand how differentiation paths shape humanhematopoiesis, we here assembled and derived 10 competing lineage hierarchies and computationally tested their plausibility. To take advantage of the abundant mouse data and to carefully check all possible transitions rather than to miss any, we decided to include evidence from the murine system. For each lineage hierarchy, we mechanistically modeled cell state kinetics with ordinary differential equations (ODEs) that describe proliferation, differentiation, and cell death for each compartment. In contrast to previous computational approaches, we explicitly accounted for the cell division history and introduced realistic, non-exponential waiting times via intermediate states. To investigate the cell intrinsic differentiation and proliferation potential of human HSCs experimentally, HSCs, 5 different progenitor cell types, and mature cells were defined by cell surface marker expression (Doulatov et al., 2010; Majeti et al., 2007; Manz et al., 2002) and measured by flow cytometry at several time points. The 10 lineage hierarchy models were fitted to the resulting in vitro fluorescence activated cell sorting (FACS) data of differentiating HSCs from 10 healthy donors. Subsequently, we ranked the models' performances according to their Bayesian information criterion (BIC) values to investigate which models can be rejected and which model performs best. An in-depth in silico analysis moreover allowed us to identify how accurate our computational approach proves to be, under the assumption of realistic test parameters and for varying noise levels in the data.
Results
Derivation of a set of 10 comparable lineage hierarchies
Our goal was to identify plausible hematopoietic hierarchies by computational modeling. To that end, we compiled hierarchies from previously published experimental studies and review articles of mouse and humanhematopoiesis. However, to allow for a systematic evaluation of different proposed hematopoietic hierarchies, we first had to abstract the evidence in these studies to a limited number of cell types and transitions.The findings from Akashi et al. (2000), Manz et al. (2002), and several review articles (Doulatov et al., 2012; Haas et al., 2018; Laurenti and Göttgens, 2018) provide the basis for the classical hierarchy of humanhematopoiesis. Here, HSCs give rise to MPPs, which give rise to CMPs and MLPs. CMPs can further differentiate to MEPs and GMPs, which contribute to mature cells (from now on referred to as the M compartment). This classical hierarchy (Figure 1A) was modified by taking into account reported alternative differentiation paths. As it has been observed in several studies (Doulatov et al, 2010, 2012; Giebel et al., 2006; Goardon et al., 2011; Reynaud et al., 2003) that MLPs can also differentiate to GMPs, we incorporated the respective transition into 4 hierarchy models (Figures 1B, 1C, 1E, and 1I). Moreover, the existence of a megakaryocyte-primed HSC subset was proposed (Månsson et al., 2007; Sanjuan-Pla et al., 2013), guiding us to include a direct transition from HSCs to the M compartment (Figures 1C and 1D). We furthermore included a direct differentiation transition from MPPs to MEPs, which was observed by Pronk et al. (2007) and Notta et al. (2016) (Figures 1E and 1H). Conflicting observations however have been made regarding the direct transition from HSCs to MEPs (see Adolfsson et al., 2005; Takano et al., 2004 vs. Forsberg et al., 2006), a differentiation transition which we incorporated in three more hierarchies (Figures 1E, 1G, and 1I). Suggested were also the direct differentiation transitions from HSCs to CMPs and from MPPs to GMPs (Adolfsson et al., 2005), which we incorporated in two hierarchies (Figures 1F and 1H). Finally, we considered a hierarchy that comprises all suggested differentiation transitions (Figure 1J). For a detailed description of experimental evidence for differentiation transitions and the derivation of hierarchies, see Transparent methods.As cells were cultured in media containing specific growth factors at high concentrations (see Transparent methods), we could assume that cells were at any point saturated and not subject to niche effects. We thus did not include any feedback terms in our mathematical models.
Multi-compartment models describe possible cell-intrinsic kinetics
To computationally test the plausibility of the 10 lineage hierarchies (Figure 1) with experimental data, we developed a framework to derive, solve, fit, and rank competing lineage hierarchy models.The experimental data we used for this analysis were generated by in vitro experiments in which HSCs from 10 healthy human bone marrow (BM) samples were stringently sorted as Lineage- CD45dim CD34+ CD38- CD45RA- CD90+ cells, additionally stained with CellTrace division marker, cultured for 7 days in a defined medium optimized for differentiation, and analyzed by multiparameter immunophenotyping on subsequent days (see Transparent methods, Figure S1A and Table S1). This approach allowed us to measure the abundances of 7 cell types for in total 10 samples in a time-resolved manner and track the number of cell divisions between experiment start and the respective observed time point (see Figures S1B and S1C and Transparent methods).We next derived an ODE compartmental model which considers the 7 cell types as system states and proliferation, differentiation, and cell death events as transitions between these states (Figure 2A shows transitions for the most complex model J) for every candidate hierarchy (Figures 1A–1J). Transitions are defined as reactions that occur with cell-type-specific rates: α for differentiation, β for proliferation, and γ for cell death (Figures 2A and Table 1). While the same proliferation and cell death reactions are considered for each lineage hierarchy model, the set of differentiation reactions is specific for each hierarchy, leading to models with varying complexity (see Transparent methods, Figures 2A and Table 2). For example, model A contains 8 differentiation reactions, 7 proliferation reactions, and 6 cell death reactions (which totals 21 rates), whereas model J contains 14 differentiation reactions and thus a total of 27 rates.
Figure 2
Multi-compartment models with 3 intermediate states show optimal trade-off between parameter inference precision and computational cost
(A) Multi-compartment model accounting for the number of cell divisions and realistic rate distributions via intermediate states (nIS), exemplarily shown for the classical model A. Parameters describe cell-type-specific proliferation, differentiation, and death rates.
(B) Waiting time distributions for 1-5 intermediate states assuming a mean reaction (proliferation, differentiation, or death) rate of 1/20 [h−1].
(C) The mean log likelihood for fitting models A-J (10 lines) to 4 individual samples increases with any additionally introduced intermediate state nIS, especially for introducing a second and third intermediate state.
(D) The mean percentage of practically identifiable parameters sorted by models A-J (10 lines) based on 4 individual samples increases considerably for small nIS but remains roughly constant for nIS ≥ 3.
(E) The mean computation time for fitting models A-J (10 lines) to the 4 individual samples increases exponentially with the number of intermediate states irrespective of the model hierarchy. For every sample and lineage hierarchy, optimization of 1000 multistarts was run in parallel on 24 workers.
Table 1
All possible state transitions as model reactions in hierarchies A–J
Proliferation reactions
Differentiation reactions
Cell death reactions
R1:HSC→βHSC2HSC
R8:HSC→αHSC→MPPMPP
R15:MPP→αMPP→GMPGMP
R22:HSC→γHSC∅
R2:MPP→βMPP2MPP
R9:HSC→αHSC→CMPCMP
R16:CMP→αCMP→MEPMEP
R23:MPP→γMPP∅
R3:CMP→βCMP2CMP
R10:HSC→αHSC→MEPMEP
R17:CMP→αCMP→GMPGMP
R24:CMP→γCMP∅
R4:MLP→βMLP2MLP
R11:HSC→αHSC→MM
R18:MLP→αMLP→…∅
R25:MEP→γMEP∅
R5:MEP→βMEP2MEP
R12:MPP→αMPP→CMPCMP
R19:MLP→αMLP→GMPGMP
R26:GMP→γGMP∅
R6:GMP→βGMP2GMP
R13:MPP→αMPP→MLPMLP
R20:MEP→αMEP→MM
R27:M→γM∅
R7:M→βM2M
R14:MPP→αMPP→MEPMEP
R21:GMP→αGMP→MM
HSC, MPP, CMP, MLP, MEP, GMP, and M indicate cell types (see Figure 1), Ø denotes the empty set, and the reaction rates (see Figure 2).
Table 2
Model complexity indicated by the number of reaction rates for lineage hierarchies A–J
Model
# Reaction rates
Reactions
A
21
R1−R8,R12−R13,R16−R18,R20−R27
B
22
R1−R8,R12−R13,R16−R27
C
23
R1−R8,R11−R13,R16−R27
D
22
R1−R8,R11−R13,R16−R18,R20−R27
E
24
R1−R8,R10,R12−R14,R16−R27
F
23
R1−R9,R13,R16−R18,R20−R27
G
21
R1−R8,R10,R12−R13,R17,R18,R20−R27
H
22
R1−R9,R13−R14,R16−R18,R20−R27
I
22
R1−R8,R10,R12−R13,R17−R27
J
27
R1−R27
The corresponding reactions are listed in Table 1.
Multi-compartment models with 3 intermediate states show optimal trade-off between parameter inference precision and computational cost(A) Multi-compartment model accounting for the number of cell divisions and realistic rate distributions via intermediate states (nIS), exemplarily shown for the classical model A. Parameters describe cell-type-specific proliferation, differentiation, and death rates.(B) Waiting time distributions for 1-5 intermediate states assuming a mean reaction (proliferation, differentiation, or death) rate of 1/20 [h−1].(C) The mean log likelihood for fitting models A-J (10 lines) to 4 individual samples increases with any additionally introduced intermediate state nIS, especially for introducing a second and third intermediate state.(D) The mean percentage of practically identifiable parameters sorted by models A-J (10 lines) based on 4 individual samples increases considerably for small nIS but remains roughly constant for nIS ≥ 3.(E) The mean computation time for fitting models A-J (10 lines) to the 4 individual samples increases exponentially with the number of intermediate states irrespective of the model hierarchy. For every sample and lineage hierarchy, optimization of 1000 multistarts was run in parallel on 24 workers.All possible state transitions as model reactions in hierarchies A–JHSC, MPP, CMP, MLP, MEP, GMP, and M indicate cell types (see Figure 1), Ø denotes the empty set, and the reaction rates (see Figure 2).Model complexity indicated by the number of reaction rates for lineage hierarchies A–JThe corresponding reactions are listed in Table 1.To incorporate the measured cell-type-specific number of divisions since the start of the experiment, we extended this ODE model to include 7 division compartments per cell type. This leads to an increase in state space (from 7 to 49 states), while the parameter space (i.e. the number of unknown rates α, β, γ) stays constant. Every time a cell division occurs, the next division compartment of the same cell type is reached (Figure 2A). When cells change cell type along a differentiation reaction, they stay in the respective division compartment (see Figure 2A inset). Upon arrival at the seventh division compartment (the last one we could reliably measure, see Transparent methods), cells accumulate and can only escape via a differentiation or cell death reaction.To ensure that differentiation, proliferation, and cell death processes are realistically described by our model, we finally introduced intermediate states for every reaction. Thus, our waiting times are Erlang distributed (Matis and Wehrly, 1990), instead of the conventionally used, unrealistic, exponentially distributed waiting times (Figure 2B). This model extension leads to complex ODE systems with at least 343 and up to 994 equations (see Transparent methods) while again keeping the number of rates constant. The number of intermediate states nIS is a hyperparameter of our model.Before fitting to experimental data, we analyzed structural identifiability of the 10 competing models with the MATLAB toolbox STRIKE-GOLDD (Villaverde et al., 2019; Villaverde and Banga, 2017). Under the assumption of ideal, noise-free experimental data, all proliferation, differentiation, and cell death rates can be inferred from the measurements. This holds independently of the chosen model hierarchy and independently of considering proliferation count compartments if three intermediate states are assumed. Interestingly, for models C, D, E, and J, some rates are not structurally identifiable if proliferation count compartments but no intermediate states are considered and most parameters especially proliferation and death rates are unidentifiable if no proliferation counts and no intermediate states are considered (Table 3).
Table 3
Structurally unidentifiable parameters for models A–J with and without 3 intermediate states and 7 proliferation count compartments
Model
Unidentifiable parameters
nIS=1,ndiv=7
nIS=3,ndiv=7
nIS=1,ndiv=1
nIS=3,ndiv=1
A
x0(θ)
x0(θ)
x0(θ),β.,αMLP→…,γ.
x0(θ)
B
x0(θ)
x0(θ)
x0(θ),β.,αMLP→…,γ.
x0(θ)
C
x0θ,αGMP→M,γGMP
x0(θ)
x0(θ),β.,αMLP→…,αGMP→M,γ.
x0(θ)
D
x0θ,αGMP→M,γGMP
x0(θ)
x0(θ),β.αMLP→…,αGMP→M,γ.
x0(θ)
E
x0θ,αCMP→MEP,γCMP
x0(θ)
x0(θ),β.,αMLP→…,αCMP→MEP,γ.
x0(θ)
F
x0(θ)
x0(θ)
x0(θ),β.,αMLP→…,γ.
x0(θ)
G
x0(θ)
x0(θ)
x0(θ),β.,αMLP→…,γ.
x0(θ)
H
x0(θ)
x0(θ)
x0(θ),β.,αMLP→…,γ.
x0(θ)
I
x0(θ)
x0(θ)
x0(θ),β.,αMLP→…,γ.
x0(θ)
J
αMLP→GMP,αMLP,x0θ
x0(θ)
x0(θ),β.,αMLP→…,αMLP→MEP,αGMP→M,γ.
x0(θ)
x0(θ) is the set of parameters describing the initial conditions of the respective ordinary differential equation (ODE) system (see Transparent methods).
Structurally unidentifiable parameters for models A–J with and without 3 intermediate states and 7 proliferation count compartmentsx0(θ) is the set of parameters describing the initial conditions of the respective ordinary differential equation (ODE) system (see Transparent methods).
Parameter inference identifies optimal number of intermediate states
We implemented all 10 models for an arbitrary number of intermediate states nIS in MATLAB and used the AMICI toolbox (Fröhlich et al., 2017) to determine the solution of the ODE system. Solving the ODE system is required to simulate observations from the models for certain parameter combinations while maximizing the likelihood function. As we observed and modeled cell counts, we assumed log normally distributed noise in the likelihood function. We applied a hierarchical optimization approach (Loos et al., 2018) with which the division compartment and cell type compartment-specific noise parameters are analytically calculated during optimization. Using the profile likelihood method (Kreutz et al., 2013), we calculated the 95% confidence interval for each parameter, each model, and each donor sample. Next, we optimized the number of intermediate states nIS that are required to accurately and efficiently estimate and practically identify model parameters. Specifically, we fitted all 10 models with 1–5 intermediate states (in total 50 models) to 4 randomly chosen samples. We observed that for all models A-J, the log likelihood increases with the number of intermediate states nIS (Figure 2C). However, the mean percentage increase in log likelihood per additional intermediate state is below 3% for nIS > 3. As the percentage of practically identifiable parameters plateaus at nIS = 3 (see Transparent methods), we fixed this hyperparameter and performed the remaining analysis with nIS = 3 intermediate states.We fitted every model A-J with 3 intermediate states to all 10 samples (see Figure 3A for the fit of model A to one exemplary donor sample) using maximum likelihood estimation (see Transparent methods) and multistart optimization with the MATLAB toolbox PESTO (Stapor et al., 2018). As indicated by narrow confidence intervals for almost all samples (Figure 3B), we observed that most rates could be inferred from the data with high certainty (Table S3 lists all log10 transformed parameter values with their confidence intervals for model A). We thus concluded that overall, practical identifiability of parameters is given. However, identifiability deteriorates for the downstream progenitor compartments (GMP and MEP), especially for differentiation and cell death rates, for which we observe larger confidence intervals compared to upstream compartments (Figure 3B) and practical identifiability is not always guaranteed. Practical non-identifiability of a parameter is indicated by confidence intervals that include the upper or lower border or both borders of the parameter's search space (e.g. γMEP for sample 1 in Figure 3B).
Figure 3
Quantitative model comparison reveals model A as the most plausible lineage hierarchy (see also Figure S2)
(A) Model A (solid line) was fitted to cell abundances in 49 compartments observed for sample 1 (dots). Model uncertainty is shown as ± 2σ error bands. Sum over all divisions for each cell type compartment (last column) was not used for the fit but shows agreement of model fit and data.
(B) Optimal parameter values (dots) with 95% confidence intervals (boxes) result from fitting model A to all 10 samples.
(C) Bayesian information criterion (BIC) values per model A-J (rows) for every individual sample (columns) result from fitting all considered lineage hierarchies (Figures 1A–1J) to samples of 10 healthy individuals. Color code corresponds to three categories, which were defined based on BIC scoring. Models were categorized into best (for the lowest score), plausible (a difference to the lowest BIC score ≤10), and implausible (a difference to the lowest score >10) models for each donor sample.
(D) Relative frequency of a model to belong to one of the three categories best, plausible, and implausible according to the BIC scores obtained from the 10 samples.
Quantitative model comparison reveals model A as the most plausible lineage hierarchy (see also Figure S2)(A) Model A (solid line) was fitted to cell abundances in 49 compartments observed for sample 1 (dots). Model uncertainty is shown as ± 2σ error bands. Sum over all divisions for each cell type compartment (last column) was not used for the fit but shows agreement of model fit and data.(B) Optimal parameter values (dots) with 95% confidence intervals (boxes) result from fitting model A to all 10 samples.(C) Bayesian information criterion (BIC) values per model A-J (rows) for every individual sample (columns) result from fitting all considered lineage hierarchies (Figures 1A–1J) to samples of 10 healthy individuals. Color code corresponds to three categories, which were defined based on BIC scoring. Models were categorized into best (for the lowest score), plausible (a difference to the lowest BIC score ≤10), and implausible (a difference to the lowest score >10) models for each donor sample.(D) Relative frequency of a model to belong to one of the three categories best, plausible, and implausible according to the BIC scores obtained from the 10 samples.
Model comparison reveals plausible lineage hierarchies
To determine which lineage hierarchies best explained our experimental data, we performed quantitative model selection. This was done by assigning a score to each of the 10 models and every individual sample. We calculated the BIC (Schwartz, 1965), which takes goodness of fit, number of parameters, and the number of data points into account (see Transparent methods). After calculating model- and sample-specific BIC scores, we ranked the models accordingly and assessed how often a respective model was the best performing one (lowest score among the considered models), among the plausible models (BIC difference in score to best model <10), or how often it was rejected (BIC difference to best model ≥10, see Figures 3C and 3D).Based on our analysis, there is no evidence in our data for models E, H, and J, which were rejected for all donor samples according to BIC. Models C, F, and I also performed poorly, as they were rejected for more than 80% of samples. The only model which was not rejected by a single sample is the classical lineage hierarchy A. It was selected as the best performing model in 90% of the samples based on BIC. Model B was considered as plausible for almost 90% of the samples according to BIC (Figure 3D). There is also some evidence for models D and G, which are plausible for 30% of the samples.To investigate if the outstanding performance of model A mainly stems from its low complexity, we additionally calculated the Akaike information criterion (AIC, Akaike, 1992). As this selection criterion penalizes the number of parameters in the model differently than BIC, it can potentially select and reject other models. Indeed, the AIC is less conservative than the BIC for the model complexities of interest and the range of the number of data points we used for the fit (Figure S2A). For AIC, we again found support for models A and B, whereas models E, F, H, I, and J performed as poorly as in the BIC ranking (Figures 3D and S2B). According to AIC, there is more evidence for model C (rejected in only 60% of samples as compared to 80% with BIC) and model D (rejected with BIC in only 40% of samples as compared to 70%) but not for model G (rejected in 70% of the samples for both criteria). With a rejection percentage of at least 70% of samples according to both criteria (AIC and BIC), models E-J can be overall rejected.
Robust identification of lineage hierarchies
To investigate if it is in principle possible to select the true underlying model and not necessarily the least complex one, we performed an in silico model selection analysis. For this purpose, we chose a realistic model-specific parameter set, which was equal to the mean rate estimates from the experimental data (shown for model A in Figure 3B). We subsequently simulated artificial data from every model A-J, perturbed the generated data with noise of varying strengths (see Figure S3A for an exemplary simulation), and fitted the perturbed artificial samples with every model A-J (see Figure S3B for corresponding true and inferred parameter values). For weak noise, the measurement points deviated only lightly from the simulated values (noise parameter σ = 0.4, Figure 4A), whereas for strong noise, the perturbed measurements scattered strongly (noise parameter σ = 1.2, Figure 4C). Interestingly, for every in silico sample generated with weak noise, the true model (that is, the model from which the data were generated) performed best and almost all (8 or 9 out of 9 other models) were rejected (Figure 4B). For strong noise, we found that the true model was only accurately identified for lineage hierarchies A and D (Figure 4D). Model D is however also at least plausible if the data were simulated from any other model (Figure 4D). For model I and the two most complex models E and J, other models were favored and the true model was rejected (Figure 4D). This analysis shows how crucial underlying noise is for the robust identification of the true model. However even for strong noise, model A was never the best performing model when the in silico data was generated from another lineage hierarchy and not even considered as plausible, suggesting its low complexity is not the predominant feature of its outstanding performance (Figures 4B and 4D). For models E and J on the contrary, the model complexity might have been a barrier in correctly identifying them in the presence of strong noise: Model E was identified only once as a plausible model, model J never (Figure 4D). While simulating from model I, model C was selected as the best model, despite its higher complexity (23 vs. 22 parameters), while model I itself was rejected. Interestingly, the noise parameters per cell type and division compartment that were estimated from the experimental data (Figure 3) all lie in the interval [0.6,1.1] and are thus stronger than the weak noise (σ = 0.4) but also weaker than the strong noise (σ = 1.2) which were assumed in the in silico analysis.
Figure 4
Lineage hierarchy identification is robust for varying noise levels (see also Figure S3)
(A) Weak noise simulation and fit of model A, exemplarily shown for HSCs that have divided once. Simulated data (dashed line) were perturbed with weak noise (σ = 0.4, dots). The true kinetics (dashed line) deviates only slightly from the fitted model (solid line) and is contained in the ± 2σ error band depicting model uncertainty.
(B) For weak noise, the best fitting model always coincides with the simulated ground truth. Data was simulated with a realistic test parameter set for each model A-J and perturbed with weak noise (σ = 0.4, as shown in A). BIC values result from fitting simulated data with models A-J. The lowest BIC value (dark gray boxes) appears always for the true model and only up to two other models are plausible (light gray boxes).
(C) Strong noise simulation and fit of model A, exemplarily shown for HSCs that have divided once. Simulated data (dashed line) was perturbed with strong noise (σ = 1.2, dots). The true kinetics (dashed line) deviates at later time points from the fitted model (solid line) but is contained in the ± 2σ error band depicting model uncertainty.
(D) For strong noise (σ = 1.2, as shown in C), only models A and D are correctly identified as the best performing model (dark gray boxes on diagonal). For data simulated from models B, C, F, G, and H the true lineage hierarchy is not the best performing model but considered as plausible (light gray boxes on diagonal). For data simulated from models E and J, the true model was assessed as implausible (white boxes on diagonal) based on the BIC value.
Lineage hierarchy identification is robust for varying noise levels (see also Figure S3)(A) Weak noise simulation and fit of model A, exemplarily shown for HSCs that have divided once. Simulated data (dashed line) were perturbed with weak noise (σ = 0.4, dots). The true kinetics (dashed line) deviates only slightly from the fitted model (solid line) and is contained in the ± 2σ error band depicting model uncertainty.(B) For weak noise, the best fitting model always coincides with the simulated ground truth. Data was simulated with a realistic test parameter set for each model A-J and perturbed with weak noise (σ = 0.4, as shown in A). BIC values result from fitting simulated data with models A-J. The lowest BIC value (dark gray boxes) appears always for the true model and only up to two other models are plausible (light gray boxes).(C) Strong noise simulation and fit of model A, exemplarily shown for HSCs that have divided once. Simulated data (dashed line) was perturbed with strong noise (σ = 1.2, dots). The true kinetics (dashed line) deviates at later time points from the fitted model (solid line) but is contained in the ± 2σ error band depicting model uncertainty.(D) For strong noise (σ = 1.2, as shown in C), only models A and D are correctly identified as the best performing model (dark gray boxes on diagonal). For data simulated from models B, C, F, G, and H the true lineage hierarchy is not the best performing model but considered as plausible (light gray boxes on diagonal). For data simulated from models E and J, the true model was assessed as implausible (white boxes on diagonal) based on the BIC value.Based on this analysis, we can conclude that our parameter inference approach allows for the robust identification of lineage hierarchies in the presence of noise, if it is not overprominent. Most importantly, it is unlikely that model A was only selected due to its low complexity.
Discussion
Although many aspects of the hematopoietic system are well characterized after decades of research, substantial gaps in our understanding of how the different blood cell types are produced remain, in particular when it comes to humanhematopoiesis, where evidence and access is substantially scarcer than in mice. We approached this gap by quantitatively comparing 10 lineage hierarchies, integrating the evidence from a broad range of experimental studies. We found that our data supported the classical model of hematopoiesis, that an additional differentiation transition from MLPs to GMPs is plausible, but no evidence to support the validity of the remaining 8 lineage hierarchies.Our approach is based on 7 distinctly defined human hematopoietic cell types. This is complementary to a recently reported combined index-FACS and single-cell transcriptomic analysis (Velten et al., 2017). By projecting transcriptomic similarity of single hematopoietic cells on a unit circle, Velten et al. (2017) could observe a “Continuum of LOw primed UnDifferentiated hematopoietic stem- and progenitor cells (CLOUD-HSPCs)”. Interestingly, our finding that the additional transition from MLPs to GMPs (Figure 1B) is plausible in 90% of the samples (Figure 3D) is in line with their finding that a small subpopulation of MLPs in the two samples they analyzed differentiate toward neutrophils (Fig. 5b and Supplemental Fig. 5b in Velten et al. (2017)). It also fits with the recently reported joint progenitor population of B cells and plasmacytoid dendritic cells in mice (Herman et al., 2018). While our approach cannot resolve continuous transitions by design, it facilitates the direct inference of differentiation, proliferation, and cell death rates. Thereby, we circumvent that inferred differentiation rates are often biased by cell cycle effects, which is currently a challenge in differentiation trajectory inference from scRNAseq data (Watcham et al., 2019; Weinreb et al., 2018).Hematopoiesis as a paradigmatic stem cell system has been described in the past with a few data-driven models. In Busch et al. (2015), mouse in vivo lineage tracing data were used to parametrize a computational ODE-based compartment model of HSCs, MPPs, CLPs, and CMPs. Parameter inference revealed cell type-specific differentiation and net proliferation rates based on the upper part of the classical hierarchy since only late progenitors (GMPs and MEPs) but not mature cells were measured in the experiment. A model by Klose et al. (2019) included internal feedback within a heterogeneous HSC population and is suited to consistently describe both hematopoietic homeostasis and regeneration upon injury. It was validated using the experimental mouse data from Busch et al. (2015) but only considered one progenitor compartment without distinguishing between the various progenitor cell types. They model heterogeneity in the HSC compartment by considering two HSC subtypes: repopulating HSCs which are rarely activated during homeostasis, and maintaining HSCs which ensure the continuous supply of progenitor cells. Assuming steady-state hematopoiesis their analysis showed that repopulating HSCs differentiate and proliferate at a much lower rate than maintaining HSCs and progenitors, which differentiate almost as fast as they proliferate. In contrast to these approaches, we modeled cellular rates with realistic, Erlang distributed waiting time, leading to an improved fitting of the model to the data. Similar to our in vitro experiments, time-resolved cell counts from mouse in vitro lineage negative, Sca1 positive, c-kit negative (LSK) cell cultures (comprising long-term (LT) HSCs, short-term (ST) HSCs, and MPPs) were used previously (Adimy and Crauste, 2009; Busch et al., 2015; Klose et al., 2019; Mahadik et al., 2019) for model fitting. In particular the influence of different culture conditions on the balance between HSC self-renewal vs. differentiation and proliferation was studied by considering 3 (LSK, CMP, terminal) and 5 (LT HSC, ST HSC, MPP, CMP, terminal) compartment ODE models. Interestingly, feedback and some recently suggested differentiation transitions (LSK to terminal, and ST HSC and MPP to terminal, respectively) were included in the model. The authors conclude that downstream differentiation is driven by rapid differentiation of ST repopulating HSC progenitor populations and not CMPs and thereby favor the additionally included differentiation transitions. In contrast to our approach, the plausibility of the suggested 3- or 5-compartment models was neither assessed nor quantitatively compared and cell divisions were neither tracked nor modeled.We modeled hematopoiesis deterministically with ODEs since hundreds to thousands of cells were observed in our in vitro culture and stochastic effects average out. Per design of our cell culture experiment, extrinsic niche effects were not considered and cells were not competing for space or growth factors. Consequently, we modeled unlimited cell population growth without any feedback. To apply our approach to in vivo data or to cell cultures with only few cells and limited space or growth factors, the model would need to be adapted to e.g. include feedback terms and a carrying capacity (Klose et al., 2019; MacLean et al., 2014), use a stochastic formalism (Marr et al., 2012; Newton et al., 1995; Strasser et al., 2018; Xu et al., 2018), or incorporate space or cell age via partial differential equations (Craig et al., 2016; Østby et al., 2003; Roeder et al., 2009).Detailed insights into differentiation pathways and rates are critical for understanding how hematopoiesis yields sufficient blood cells throughout the lifetime of an individual. Improving our understanding of cell-type-specific differentiation, proliferation, and cell death may help to predict how hematologic diseases develop and how they respond to therapy. Our computational approach provides access to these parameters and may facilitate comparisons of benign and leukemic kinetics to healthy hematopoiesis to identify cell types and rates affected in leukemogenesis. It might thus also help to uncover targetable cell intrinsic disease mechanisms in the future.
Limitations of the study
In theory, differentiation transitions may exist which we have not tested, or transitions may exist that our present approach could not identify. The latter could be due to relatively high noise in the count data in combination with a very low differentiation rate of the respective transition. Also, it is worth mentioning that we focussed only on cell intrinsic kinetics—constraints of the BM environment that affect the differentiation hierarchy are not considered in our in vitro setting. Similarly, our model would need to be changed if one aims to describe cell cultures in which cells are competing for space or growth factors.The experimental setup and used technology limits the number of cell types which could be determined by flow cytometry. By using additional markers and expanding the model, additional hypotheses regarding the differentiation paths, i.e., involving specific mature cell types could be tested.
Resource availability
Lead contact
Requests for resources and reagents should be directed to the lead contact Carsten Marr (carsten.marr@helmholtz-muenchen.de).
Material availability
This study did not generate new unique reagents.
Data and code availability
Cell counts for the measured division and cell type compartments and easily executable, documented code in MATLAB and Python are publicly accessible at https://github.com/marrlab/HematopoiesisModelComparison.
Methods
All methods can be found in the accompanying Transparent methods supplemental file.
Authors: Cornelis J H Pronk; Derrick J Rossi; Robert Månsson; Joanne L Attema; Gudmundur Logi Norddahl; Charles Kwok Fai Chan; Mikael Sigvardsson; Irving L Weissman; David Bryder Journal: Cell Stem Cell Date: 2007-10-11 Impact factor: 24.633
Authors: Caleb Weinreb; Samuel Wolock; Betsabeh K Tusi; Merav Socolovsky; Allon M Klein Journal: Proc Natl Acad Sci U S A Date: 2018-02-20 Impact factor: 11.205