With a rising incidence of COVID-19-associated morbidity and mortality worldwide, it is critical to elucidate the innate and adaptive immune responses that drive disease severity. We performed longitudinal immune profiling of peripheral blood mononuclear cells from 45 patients and healthy donors. We observed a dynamic immune landscape of innate and adaptive immune cells in disease progression and absolute changes of lymphocyte and myeloid cells in severe versus mild cases or healthy controls. Intubation and death were coupled with selected natural killer cell KIR receptor usage and IgM+ B cells and associated with profound CD4 and CD8 T cell exhaustion. Pseudo-temporal reconstruction of the hierarchy of disease progression revealed dynamic time changes in the global population recapitulating individual patients and the development of an eight-marker classifier of disease severity. Estimating the effect of clinical progression on the immune response and early assessment of disease progression risks may allow implementation of tailored therapies.
With a rising incidence of COVID-19-associated morbidity and mortality worldwide, it is critical to elucidate the innate and adaptive immune responses that drive disease severity. We performed longitudinal immune profiling of peripheral blood mononuclear cells from 45 patients and healthy donors. We observed a dynamic immune landscape of innate and adaptive immune cells in disease progression and absolute changes of lymphocyte and myeloid cells in severe versus mild cases or healthy controls. Intubation and death were coupled with selected natural killer cell KIR receptor usage and IgM+ B cells and associated with profound CD4 and CD8 T cell exhaustion. Pseudo-temporal reconstruction of the hierarchy of disease progression revealed dynamic time changes in the global population recapitulating individual patients and the development of an eight-marker classifier of disease severity. Estimating the effect of clinical progression on the immune response and early assessment of disease progression risks may allow implementation of tailored therapies.
Coronavirus disease-2019 (COVID-19), caused by severe acute respiratory
syndrome coronavirus 2 (SARS-CoV-2), is a global pandemic that (as of August 2020)
has infected over 25 million people worldwide, caused over 840,000 deaths, and
strains health systems on an unprecedented scale. COVID-19 has heterogeneous
clinical manifestation, ranging from mild symptoms such as cough and low-grade fever
to severe conditions including respiratory failure and death[1,2].
While most patients with mild disease develop an appropriate immune response that
culminates with viral clearance[3],
severe disease manifestations have been linked to lymphopenia and immune
hyperresponsiveness leading to cytokine release syndrome (CRS)[1,2,4,5]. The most effective therapeutic approaches developed so far for
severe cases involve either general immunosuppression with glucocorticoids[6] or selective neutralization of
interleukin 6 (IL-6) with tocilizumab[7], a monoclonal antibody used to manage CRS in indications such as
rheumatoid arthritis[8]. The efficacy
of these therapies strongly support a key role for immune dysregulation in the
pathogenesis of COVID-19. However, neither treatment has achieved high clinical
remission rates in patients with severe COVID-19[9,10], suggesting that
other immunological or immune-independent attributes may contribute to severity,
treatment failure, and ultimately patient death. Thus, in-depth characterization of
immune responses to SARS-CoV-2 infection is urgently needed.Recent characterization efforts have uncovered broad dysregulation of the
innate immune system[11] coupled
with altered inflammatory responses[12] and impaired adaptive immunity[13]. Specifically, the adaptive immune
compartment of COVID-19patients exhibits marked lymphopenia[4,14,15], polarization of T cells toward a
memory phenotype[14] and functional
exhaustion[16-18], demonstrating that SARS-CoV-2
infection induces both cellular[19,20] and humoral responses[21]. However, the molecular and
cellular mechanisms through which SARS-CoV-2 infection induces these broad
immunological derangements in only some patients remains to be elucidated. Further,
little is known about the role of the innate immune responses that constitute the
first defense against SARS-CoV-2 infection. Moreover, the degree of interaction
between various immune compartments and demographic factors and medical
comorbidities is unclear. The most prominent risk factors for severe disease and
death by COVID-19 include age, cardiovascular or oncological comorbidities, and
immunosuppression[22-24]. Additionally, men appear to be at
significantly higher risk for severe COVID-19 than women[25]. While mortality rates are estimated at
4–6% in the general population, high-risk populations experience mortality
rates >60%[26].Clarifying the early immunological alterations associated with mild versus
severe COVID-19 may not only offer therapeutically actionable targets, but also
enable the identification of cases at highest risk for clinical deterioration and
death. The development of an effective clinical decision-making tool rooted in
immunological monitoring has the potential to optimize patient care and resource
utilization.By profiling mild and severe COVID-19patients and healthy donors with flow
cytometry, we demonstrate that SARS-CoV-2 is associated with broad dysregulation of
the circulating immune system, characterized by the relative loss of lymphoid cells
coupled to expansion of myeloid cells. Severe cases demonstrated enrichment of
natural killer (NK) cells expressing the immunosuppressive receptor killer cell
immunoglobulin-like receptor, two Ig domains and short cytoplasmic tail 4 (KIR2DS4;
CD158i), as well as alterations in the B cell compartment marked by reduced CD19,
CD20, and IgM+ cells. These immune profiles enable reconstruction of a hierarchy of
disease progression with pseudo-temporal modeling, which allows estimation of
dynamic longitudinal changes within individual patients. Our approach also estimates
the effect of clinical factors on immune dysregulation and thus establishes an
immune-monitoring tool for disease progression.
Results
SARS-CoV-2 infection causes major changes in the circulating immune
system
We conducted an observational study of 45 individuals with COVID-19 that
were treated at New York Presbyterian Hospital and Lower Manhattan Hospitals,
Weill Cornell Medicine (IRB 20–03021645) as in- or outpatients between
April and July, 2020. The disease was categorized as “mild” if the
patient was not admitted or required <6 L non-invasive supplemental
oxygen to maintain SpO2 >92% (n = 21). Patients with
“severe” disease required hospitalization and received >6 L
supplemental oxygen or mechanical ventilation (n = 15). Blood samples were
collected at enrollment and, when permissibsible, approximately every 7 days
thereafter. Samples were also collected from non-hospitalized individuduals who
had recovered from mild, laboratory-confirmed SARS-CoV-2 infection
(“convalescent” group, n = 9). and from healthy COVID-19 negative
donors (n = 12) (Figure 1a).The median age
of COVID-19patients was 65 years, which was significantly higher than healthy
donors (30 years) (Supplementantary Tables 1–2; Supplementary Figure 1).
Figure 1:
Immuno-profiling of COVID-19 patients reveals a disarrayed immune
system.
a) Composition of the study cohort. b)
Description of immune panels and their target epitopes. c)
Composition of major immune compartments as a percentage of all live
CD45+ cells. d) Abundance of major lymphphoid
compartments as a percentage of all lymphocytes. For (c) and (d), the upper
panels divide patients by general disease status and three lower panels further
divide the study subjects by clinical intervention or outcome. Significance was
assessed using Mann-Whitney U tests and corrected for multiple testing with the
Benjamini-Hochberg false discovery rate (FDR). **, FDR-adjusted p-value
<0.01; *, FDR-adjusted p-value of 0.01–0.05.
We performed high-dimensional immune cell profiling of circulating blood
by flow cytometry based seven independent fluorochrome-conjugated antibody
panels, each targeting a specific surface protrotein marker of T, B, NK, and
myeloid-derived suppressor cells (MDSCs) (Figure
1b, Supplementary
Figure 2 and Supplementary Tables 3–7). Longitudinal sampling was
performed in eight patients, one in the “mild” and seven in the
“severe” group, and included at least three samples per patient,
making a complete dataset including 102 samples from 57 individuals.Consistent with previous reports[12,15,27,14], we observed global loss of lymphocytes among
CD45+ cells and enrichment of the myeloid cell compartment in the
peripheral blood of COVID-19patients compared with healthy donors (Figure 1c, top). This was exacerbated in
patients with severe disease, particularly among hospitalized patients who
required mechanical ventilation or died, compared with individuals with mild
disease (Figure 1c, bottom). This
lymphocyte depletion was primarily observed in the T and NK cell compartments
(Figure 1d). There was no difference in
the abundance of B cells between mild and severe groups. These results highlight
a major shift in peripheral immune cell absolute abundance from the lymphoid to
myeloid lineage (Supplementary
Tables 6–7).
SARS-CoV-2 infection causes imbalances in the naive and memory T cell
compartments and induces exhaustion
We next profiled CD4+ and CD8+ T-cells in COVID-19patients and healthy
donors (Supplementary Figure
2). The CD4/CD8 ratio correlated positively with disease severity
(Figure 2a)[14,19]. There was also an expansion of memory T cells
(CD45RO+) with reciprocal contraction of the naive compartment
(CD45RA+) in severe cases relative to mild disease or healthy donors (Figure 2b). We next quantified the abundance
of populations expressing C-C motif chemokine receptor 7 (CCR7; CD197), selectin
L (SELL; CD62L), and FAS cell surface death receptor (CD95). Within
CD45RA+ cells, effector CCR7− (TEFF)
populations were increased in COVID-19patients and those with severe disease,
especially in the CD8+ compartment (Figure 2c). Conversely, there was significant depletion of
CD8+CD45RO+CD95− T cells in
patients, which was exacerbated with severe disease.
Figure 2:
T cells from COVID-19 patients have high levels of CD25, FAS, and exhaustion
markers.
a) The ratio of CD4 to CD8 cells is dependent on disease
state and clinical intervention. b) The abundance of CD45RA/RO
cells in either CD4+ or CD8+ compartments is dependent on
disease state or clinical intervention. c) Abundance of immune
populations changes significantly between disease states. d)
Uniform Manifold Approximation and Projection (UMAP) projection of all cells
colored by either surface receptor expression, cluster assignment, or disease
severity. e) Immune phenotype of each cluster (top) and its
composition in disease severity (bottom). f) Expression levels of
CD25 and FAS receptors in the UMAP projection. g) FAS expression
across all clusters depending on disease severity (left) and the proportion of
cells not expressing FAS for each sample (right). h) Scatter plot
of CD25 and FAS expression for each cell according to disease severity.
i) Abundance of CD4+ CXCR5+ PD-1+ TFH by disease severity.
j) Immune populations with significantly different amounts of
cells expressing immune checkpoint receptors by disease severity.Significance
was assessed by Mann-Whitney U tests and corrected for multiple testing with the
Benjamini-Hochberg false discovery rate (FDR). **, FDR-adjusted p-value
<0.01; *, FDR-adjusted p-value 0.01–0.05.
To characterize these populations more objectively and independently of
manual gating, we analyzed the expression of eight surface proteins at the
single-cell level by jointly embedding CD3+ cells from all samples
with the Uniform Manifold Approximation and Project (UMAP) method and clustering
them (Figure 2d). Not all clusters
contained cells from all severity groups proportionally. Specifically, clusters
12, 18, and 21, which are characterized by reduced FAS expression, were enriched
for cells from healthy donors (Figure 2e).
Moreover, there was increased expression of CD95 in samples from COVID-19patients that correlated with disease severity (Figure 2f–g), and
FAS– cells were particularly depleted in all patients
(Figure 2g). Indeed,
CD95+CD25+ T cells were increased in severe cases,
while no difference was observed between convalescent patients and healthy
donors (Figure 2h).Next, we assessed the frequency of CD4 regulatory T cells
(TREG, characterized by
CD127dimCD25bright). As markers of follicular helper T
cells (T ), we also measured CD4+, CXCR5+ PD1+,
ICOS+ TFH, which are critical to B cells in the
initiation and maintenance of humoral immune responses[28]. We found a significant but modest
increase in TFH in mild and severe COVID-19 cases, with their
presence in convalescent patients similar to in healthy donors (Supplementary Figure 4a/b). However, upon
considering a broader spectrum of TFH cells regardless of ICOS
expression (Supplementary
Figure c), CD4+, CXCR5+, PD-1+
TFH were most abundant in COVID-19patients with mild disease
(Figure 2i). In TREG, severe
COVID-19patients showed significant increase compared with healthy donors
(Supplementary Figure
4c), while previous reports showed an increase in patients with mild
course[5,29].To investigate T cell functional phenotypes, we assessed the expression
of co-inhibitory T cell receptors. We observed sustained increase of Programmed
Cell Death 1 (PD-1) in COVID-19patients compared with healthy donors in both
CD4 and CD8 compartments. At the same time, V-set immunoregulatory receptor
(VISTA) and lymphocyte-activating gene 3 (LAG3) were upregulated in mild cases
(Figure 2j). Exhausted T cell
phenotypes, with high expression of VISTA and LAG-3, can be encountered in
chronic viral diseases[30],
including chronic SARS-CoV-2 infection[17]. This phenotype suggests that these inhibitory
receptors may operate at least partially via non-overlapping immunosuppressive
signals that negatively regulate T cell responses during chronic viral
infection[16].These results highlight a shift toward an activated T cell memory
phenotype in COVID-19patients, with a potential role for CD95-mediated cell
death. By and large, convalescent patients and healthy donors displayed similar
immunotypes in comparison with COVID-19patients. However, we did identify
populations such as CD45RA+, CCR7+,
CD62L−, FAS− CD8+ TEFF cells,
which remained significantly different to healthy donors up to ~2 months
into recovery (Figure 2c). These cells may
represent “T stem memory (TSM) cells” with poor
expansion potential[31] and/or
aberrant terminally differentiated effector memory (TEM)
cells[32]
SARS-CoV-2 induces expansion of polymorphonuclear MDSCs and biases NK KIR
usage
Having observed myeloid expansion in COVID-19patients (Figure 1c), we next investigated the abundance of the
MDSC subset. These elements are activated by interleukin 6 (IL-6)[33] and have immunomodulatory
functions in cancer[34,35] and viral infections[36]. Our flow cytometry panel
considered CD3−, CD56−,
CD19−, HLA-DR−/dim, CD33+,
CD11b+ cells and focused on distinguishing
CD14−, CD15+ granulocytic cells (G-MDSCS);
CD14+, CD15−/dim monocytic-like cells (M-MDSC);
and CD14−, CD15−/dim immature cells (I-MDSC)
from each other. G-MDSCs were rarely detected in healthy donors but were
prevalent in mild and severe COVID-19patients (Figure 3a/b). Convalescent
patients showed numbers of G- and M-MDSCs closer to healthy donors, with a
non-significant increase in I-MDSCs compared with healthy donors. Conversely,
I-MDSC cells, while relatively rare as a fraction of all immune cells, were
further reduced with disease (Figure
3a/b). Since neutrophils are
phenotypically similar to MDSCs, we compared their abundance with MDSCs. While
there was a positive correlation between G-MDSCs and a high neutrophil count,
neither could account entirely for the other (Supplementary Figure 3).
Figure 3:
Emergence of granulocytic MDSCs and preferential expression of specific NK
cell receptors in the innate immune system of COVID-19 patients.
a) Abundance of MDSCs as a percentage of all immune cells
according to disease severity. b) Uniform Manifold Approximation
and Projection (UMAP) projection of all cells from all patients colored by the
expression levels of surface receptors, derived clusters, or disease severity
among all patients. c) Immune profile of each cluster from (b)
based on the expression of surface markers (top) and composition in disease
severity (bottom). d) Expression levels of CD15 dependent on
disease severity (left) and quantification of cells expressing it (right)
according to CD16, CD3, and CD33 expression. e) Abundance of cells
expressing various KIR receptors as a percentage of NK cells according to
disease severity. f) UMAP projection of all cells from all patients
colored by the expression of surface receptors, derived clusters, or disease
severity. g) Immune profile of each cluster from (f) based on the
expression of surface markers (top) and composition in disease severity
(bottom). h) Expression levels of all four measured KIR receptors
in each disease state. Significance was assessed using Mann-Whitney U tests and
corrected for multiple testing with the Benjamini-Hochberg false discovery rate
(FDR). **, FDR-adjusted p-value <0.01; *, FDR-adjusted p-value
0.01–0.05.
Next, we created a joint embedding of 2.4 million CD16+ cells
from all samples using the UMAP method, deriving clusters based on similar cells
(Figure 3c). Clusters containing
CD15+ cells were disproportionately enriched in samples from
COVID-19patients, while clusters with CD3+, IL4R (CD124) were mostly
composed of cells from healthy donors (Figure
3d). In addition, CD15 expression was most prominent in COVID-19patients, particularly in severe cases, but when selecting for
CD3− or CD3− CD33+ cells, convalescent
patients possessed a number of CD15+ cells more similar to patients
with active disease than healthy donors.Next, we focused on innate lymphoid cells and determined the expression
of KIR receptors in CD56+, CD16bright NK cells. While we
observed no significant differences in the relative abundance of KIR receptors
among COVID-19patients with mild disease and healthy controls (Figure 3f), a significantly higher proportion of cells
expressed CD158i (NKG2A) in severe patients compared with mild or convalescent
individuals. Moreover, we observed fewer CD158e (KIR3DL1) cells in patients with
mild disease compared with severe patients and a lower proportion of cells not
expressing any of the measured receptors (KIR–) in patients
with severe disease. To further explore NK cell subsets independent of
conventional gating, we harnessed single-cell analysis and integrated
>500,000 cells in a UMAP representation, identifying cell clusters based
on surface marker expression (Figure 3g).
Clusters significantly enriched in CD158i expressing cells were paucicellular in
healthy donors compared with COVID-19patients (Figure 3g, bottom), and the relative frequency of CD158i-expressing
cells was lower in healthy donors, regardless of the expression of other KIR
receptors (Figure 3h). Since the expression
of KIR variants is stochastic, the apparent selection of KIR-expressing cells in
severe COVID-19patients could indicate that a viral antigen presented by major
histocompatibility complex (MHC) class I molecules with higher affinity for
CD158i could select for NK cells expressing this receptor.
B cells of COVID-19 patients show distinct patterns of immunoglobulin
expression associated with disease severity
Since B cells play a critical role in adaptive immunity, we investigated
the expression levels of surface CD19, CD20, IgM, and IgG in circulating cells.
Despite the backdrop of a relative decrease in B cell numbers as disease
progresses, we observed only a mild, non-significant increase in plasmacytoid
cells in patients with severe COVID-19 compared with healthy donors (Figure 4a). However, the number of
IgM+ CD19+ CD20+ B cells was decreased in
patients with severe disease compared with mild, while IgG+,
CD19+, CD20+ cells remained comparable across all
patients (Figure 4b). Next, we visualized
single cells from all patients in a common UMAP plot and assigned clusters based
on surface marker expression (Figure 4c).
This approach identified two distinct groups based on the expression of surface
IgM, with the total number of IgM+ cells within clusters increased in
severe COVID-19patients (Figure 4d).
Conversely, healthy donors displayed B cells with high expression of surface
CD19+ and CD20+ antigens (Figure 4d). Closer inspection of CD19 and CD20
expression identified two distinct populations that differ in CD20 levels (Figure 4e). This approach also revealed that
the relative abundance of circulating CD19 and CD20bright B cells was
lower in COVID-19patients compared with healthy individuals regardless of
disease severity.
Figure 4:
B cells of COVID-19 patients are marked by a shift toward a plasmocytic IgM
phenotype.
a-b) The abundance of total B cells, plasma, and IgG+ and
IgG+ cells between disease states. c) Uniform Manifold
Approximation and Projection (UMAP) projection of all cells colored by surface
receptor expression, cluster assignment, or disease severity. d)
Immunophenotype of each cluster (top) and its composition by disease severity
(bottom). e) Identification and quantification of five populations
of B cells dependent on CD20 and CD19 expression. f) Comparison of
the abundance of the populations identified in e) between disease states.
Significance was assessed using Mann-Whitney U tests and corrected for multiple
testing with Benjamini-Hochberg FDR. **, FDR-adjusted p-value <0.01; *,
FDR-adjusted p-value 0.01–0.05.
To shed light on the functional relevance of these different B cell
subsets, we quantified the expression of IgG and IgM in each population
identified based on CD19 and CD20 co-staining (Figure 4f). Circulating CD19low B cells (populations A
and B) were enriched for IgG+ cells in patients with mild and severe
COVID-19 and IgM+ cells in severe COVID-19patients, while
convalescent patients resembled healthy donors. No such difference was observed
with CD19+ and CD20bright B cells (population C) and
CD19+ CD20+ and CD19+
CD20− B cells (populations D and E). Overall, despite
dwindling numbers of B cells overall, specific subsets of B cells, especially
those with lower CD19 expression, have distinct immunoglobulin expression
patterns in COVID-19patients, with severe patients more frequently bearing
IgM+ B cells. We speculate that these findings may be related to
the plasmacytoid differentiation and immunoglobulin switching programs, which
may be dysfunctional due to SARS-CoV-2 infection.
Pseudo-temporal modeling unveils a highly dynamic immune cell landscape of
COVID-19 over time
Having characterized the main circulating compartments of the immune
system, we next sought to leverage the high-dimensionality of the dataset and
relate individual immune fingerprints of different patients using hierarchical
clustering (Figure 5a). Not only did
individuals with similar clinilinical conditions tend to cluster together, but
clinical factors such as hospitalization or intubation also appeared to be
linked to immunotypes. We hypothesized that this underlying data structure would
be useful for reconstructing the clinical course of COVID-19. Thus, we employed
pseudotime inference to reconstruct an underlying latent space from a healthy
state to a severe disease state (Figure
5b/c)
Figure 5:
Pseudo-temporal reconstitution of disease progression reveals a hierarchy of
immune changes in COVID-19 disease.
a) Hierarchical clustering of the abundance of immune
populations for all samples reveals an organized structure by disease states and
clinical factors. b) Projection of immune profiles into a
two-dimensional latent space that reconstructs the hierarchy of disease
progression. The x-axis represents disease progression in the pseudo-temporal
space. c) Distribution of samples grouped by disease state along
the pseudo-temporal axis derived in (b). d) Immune populations
associated with the pseudo-temporal axis represented by either the absolute
change in percentage in their extremes (x-axis) or strength of linear
association (y-axis). e) Heatmap of immunotypes and immune
populations sorted by their order or relative abundance in the pseudo-temporal
axis, respectively. f) Clusters of immune populations based on
their abundance along the pseudo-temporal axis. g) Examples of
immune populations from each cluster in (f).
Further analysis of the inferred space enabled identification of
circulating immune cell populations associated with disease progression. In
particular, we identified a space driven by a decrease of lymphocytes, gain of
myeloid cells (G-MDSCs in particular), and a terminally activated/exhausted T
cell phenotype (Figure 5d). Besides
discovering immune signatures associated with each degree of severity, this
analysis allows the relative positioning of each time point in relation to the
continuos changes characterized by pseudotime (Figure 5e). Variable changes associated with the pseudo-temporal
axis could be classified in three clusters (Figure
5f). The first was composed of 68 populations, with an increase
toward higher disease severity with representatives such as the fractction of
myeloid cells, PD-1+ CD4+ T-cells, and
CD62L− cells among CD45RA+, CD8+ T
cells (Figure 5g left). The second
corresponded to a virtually stable cluster with 52 populations such as
IgM+ B cells, with only mild fluctuation in the intermediate
stage (Figure 5g, center). The third
included a cluster with a steady decrease by disease severity, encompassing the
overall lymphoid population as well as B cells and CD45RA+
CD4+ T cells (Figure 5g,
right). This effectively establishes a temporal hierarchy of changes as disease
progresses in which populations such as CD62L+, CCR7+ ,
CD45RA+,CD8+ T cells have a steady decline and others
such as B cells have a stronger decline toward the severe end of the
pseudo-temporal timeline. Additionally, the dynamic character of changes raises
the possibility of using flow cytometry to improve COVID-19 patient
stratification based on real-time immunological monitoring. Although our
observations do not indicate causality, immunologi variations in the
pseudo-temporal dimension may offer testable hypotheses on COVID-19 progression
mechanisms.
Integration of clinical and demographic factors affecting COVID-19 immunity
and stratification of patients by disease severity
Since various clinical and demographic factors influence disease
incidence and mortality[1,2,37], we investigated the interaction between SARS-CoV-2
infection, the circulating immune system, and various demographic and clinical
factors. Thus, we fit regularized linear models to the proportional flow
cytometry data with co-variates such as sex, race, age, disease severity,
presence of comorbidities, hospitalization, intubation, and death (Supplementary Figure 5a).
We also estimated the interaction of sex with clinical variables such as disease
severity, hospitalization, intubation, and death. The resulting network of
significant effects identified several clinical factors associated with specific
immune cell populations, highlighting how age, sex, and disease severity jointly
influence the circulating immune systems in patients with COVID-19 (Figure 6a).
Figure 6:
Factors conditioning the immune response during COVID-19 and predicting
disease severity
a) Directed graph of clinical factors (green) and immune
populations (pink). Edges represent the association between factors and immune
populations and are colored by the direction and strength of association (blue,
negative; red, positive). b) Abundance of select immune populations
with significantly different responses between sexes dependent on outcome.
c) Estimated coefficients of change for severe vs. mild disease
(left) o tocilizumab treatment (right) for immune populations that change
discordantly. d–e) Abundance of select immune populations
with significantly different responses between sexes dependent on tocilizumab
treatment (d) or intubation (e). f) Graphical depiction of the
machine learning framework for predicting disease severity using the earliest
available samples per patient and cross-validation. g–h)
Performance of classifiers trained with real or randomly shuffled labels and
either all immune populations (g) or with selection for the top most predictive
eight populations (h). i) Predicted severity scores over time since
symptoms started for immune profiles from patients with at least three
longitudinal sampling points. j) Relative expression of CD25,
CD45RA and CD45RO over time in four patients from (g). **, FDR-adjusted p-value
<0.01; *, FDR-adjusted p-value 0.01–0.05.
As a baseline, we could recover known effects independent of disease,
such as a higher CD4:CD8 ratio in females than males and an overall decrease of
the lymphoid population with age (Supplementary Figure 5b). Lastly,
we found associations between sex and clinical variables such as a significantly
higher fraction of CD62L+, CCR7+, CD45RO+,
CD4+ T cells in males that died compared with females (Figure 6b, left) and much lower total
lymphocyte levels in females that died compared with males (Figure 6b, right). Regarding the effect of tocilizumab
on the immune system, we compared post-treatment samples from eight treated
severe patients to seven severe untreated patients. While we observed the
largest effect in certain subsets of CD4+ T cells, there was also an
increased relative abundance of B cells and a decrease in T cells expressing the
co-inhibitory receptor hepatitis A virus cellular receptor 2 (HAVCR2; TIM3)
(Supplementary Figure
5c). Moreover, the signature associated with severe versus mild
patients was broadly counteracted by tocilizumab (Figure 6c). Associations between sex and clinical variables were
found, such as a lower fraction of CD62L+, CCR7+, CD45RA+, CD8+ T cells in
females treated with tocilizumab compared with males, contrary to the opposing
trend in untreated individuals (Figure 6d),
or the lower frequency of CD158a NK cells in female intubated patients (Figure 6e).Since there is a need to stratify patients to provide better, more
effective, and less costly care, particularly in the earlier stages of disease,
we hypothesized that the high dimensionality of the immunotypes would make it
possible to train a classifier to predict disease severity early on. A random
forest classifier was trained to distinguish patients with mild from severe
disease using only the earliest available sample of each patient in a
cross-validated manner (Figure 6f). We
observed good performance of the classifier (median area under receiving
operator curve [ROC AUC], 0.81) compared with one with randomized severity
labels (median ROC AUC, 0.49) (Figure 6g),
providing good balance between true positive and false positive rates. Since our
dataset is composed of immune populations from seven flow cytometry panels, we
tested whether a smaller number of variables could discern patients with mild
and severe disease courses. With only eight variables, the classifier could
distinguish patients with different disease severities, albeit with lower
performance (ROC AUC, 0.73 vs. 0.49 with randomized labels) (Figure 6h). Furthermore, we hypothesized that our
classifier could be used for real-time immuno-monitoring of COVID-19patients.
Thus, we applied it to subsequent samples of patients with more than three
samples collected over the disease course, while withholding those samples from
the training set (Figure 6i). Patient 26,
who had an overall mild disease course, had all samples classified as mild;
severe patients often showed dynamic severity probabilities over time, with at
least one timepoint classified as severe disease. To exemplify how this
prediction relates back to flow cytometry data, we illustrate the aggregated
expression of the activation marker CD25 and CD45RA/RO in single T cells over
time (Figure 6j). Patients with lower
predicted severity toward the end of their course (e.g., patient 23) tended to
have less CD25 expression and increased CD45RA expression, while the opposite
was also true (e.g., patient 16). Patients with predictions that were either
more stable or dynamic over time (patients 26 and 24, respectively) showed
dynamics of expression in accordance to their overall predicted pattern over
time. This proof-of-principle work demonstrates our ability to leverage
high-content immune profiling to predict overall disease course and provides the
basis for real-time immune-monitoring of COVID-19patients.
Discussion
Here, we describe the circulating immune landscape of COVID-19patients
compared with healthy individuals. Consistent with previous reports[11,14,15,29], we demonstrate that disease progression is
dominated by the progressive loss of circulating lymphocytes and gain of myeloid
cells. We also detected selective expansion of NK populations and MDSCs, suggesting
that the innate compartment may contribute to the immunological disarray of COVID-19patients. We then harnessed this multi-dimensional dataset to generate a
machine-learning classifier that could predict disease severity using a defined
flow-cytometric signature. Our work provides a proof-of-concept that an
immune-monitoring algorithm could provide a rapid and personalized approach to
manage COVID-19.While previous studies have focused on lymphocyte populations[12,14,15,38], to our knowledge the role of innate immune
cells is less understood[39]. Our
study highlights the expansion of MDSCs, especially G-MDSCs, in severe COVID-19patients. Unlike their natural counterparts, these elements have suppressive
function[40] that impairs
immune responses in cancer[34] and
derails effective responses against bacterial and viral infections by the adaptive
immune system[41,42]. Given the overall depletion of the immune
system’s lymphoid branch during COVID-19, an interesting hypothesis is that
G-MDSCs and other myeloid cells represent uncontrolled negative feedback. These
elements ultimately contribute to the establishment of pan-immunosuppression,
leading to dysregulated responses from the adaptive immune system. It will be
essential to establish whether they are actively recruited to infected lungs and
whether they are causally involved in disease pathogenesis or represent a systemic
compensatory response to inflammation. Since MDSCs are virtually absent in healthy
individuals, questions arise regarding the mechanisms of their genesis and tissue
recruitment and how they interact with lung tissue. At the same time, our novel
observation that NK cells expressing the CD158i variant are over-represented in
patients with severe disease raises the question of whether this variant and other
KIRs implicate NK cells in disease progression (Figure
3).Within the adaptive immune system, the mechanisms leading to severe immune
depletion, a landmark seen with disease progression and markedly apparent in autopsy
samples, are unknown[43]. To this
end, we observed increased CD25+ T cells in COVID-19patients, indicating
a higher state of activation (Figure 2), but
also an increase in CD95+ with disease progression. This phenotype was
significantly marked in severe patients, consistent with a recent study[38]. FAS has a crucial role in
mediating cell death via FAS ligand engagement, as in activation-induced cell death
(AICD), or by shifting cells to a more apoptotic-prone phenotype. While FAS is a
natural regulatory checkpoint of T cells, it plays a role in autoimmunity[44] and cancer[45], and AICD is involved in loss of
CD4+ and CD8+ cells in HIV patients[46]. However, severely exhausted T cells can
undergo apoptosis, and virus-specific T cell decline can favor viral
escape[47-49]. Indeed, similar to previous
reports[17,18,50], T
cells displayed an overall exhausted phenotype, with overexpression of VISTA, TIM3,
LAG3, TIGIT, and PD-1 co-inhibitory receptors in COVID-19 patient T cell
populations. This likely results in inability of the adaptive immune system to keep
viral proliferation in check. In the B cell compartment, we observed lower
expression of CD19 in COVID-19patients and higher expression of membrane-bound IgM
and IgG in both mild and severe patients. These data suggest that under viral
exposure, B cells undergo plasmacytoid maturation and immunoglobulin switching.
Remarkably, several patients displayed higher IgM than IgG
CD19+CD20+/– cells, suggesting abnormal and delayed
maturation of plasma cells (Figure 4). While
the implications remain speculative, they do warrant further investigation given the
central role of B cells in the development of immunity by COVID-19patients.Taking advantage of our dataset’s high-dimensional characteristics
and pseudo-temporal modeling, we constructed a COVID-19 disease course landscape.
This strategy reveals a continuum of disease progression between healthy state, mild
disease, and severe disease. Remarkably, convalescent patients displayed immune
phenotypes similar to healthy donors, suggesting a possible return to a largely
healthy state, as previously suggested based on the exhaustion phenotype in adaptive
responses[18]. Conversely,
we could speculate that the immune landscape of mild/convalescent patients never
achieved the level of disarray observed in severe patients. While there were marked
differences between patients with prevalent mild or severe disease, their
recognition remains a unique challenge. One interesting open question is whether the
changes associated with mild vs. severe disease protect against disease progression
or, conversely, which immune populations related to severe illness play a role in
the progression to severe disease. While proof-of-principle, our classifier of
severe disease shows robustness and overall value in predicting disease progression
based on immune profiling and near-real-time disease monitoring. Thus, it may be
valuable to inform clinical action like that proposed in chronic diseases with other
high-dimensional assays[50-52]. Moreover, we demonstrated that a
classifier with a limited number of markers retains good performance. If confirmed
in large cohorts, it could provide a useful approach to stratify patients and
predict clinical evolution using a rapid and economical assay.Although our study confirmed some findings and provides new data on the
innate immune landscape of COVID-19patients, we recognize limitations that could be
overcome with larger sample sizes and matched control populations. Indeed, large
population studies may discover new immune populations and find a relationship with
disease progression or clinical factors. In this current pandemic, profiling a
larger sample of the population and investigating multiple time points may help
identify viral adaptation to the host (particularly when coupled to analysis of
viral sequence) in patients with different outcomes. These programs may be achieved
if an effective institutional organization, multicentric networking, and substantial
financial support are available.The targeted nature of flow cytometry interrogates limited sets of immune
populations and implies that only certain molecules can be effectively profiled. In
our study we used mainly proportional data when comparing the abundance of immune
populations between patient groups. While this may not necessarily imply absolute
changes in cell numbers, we observed good overall agreement between changes in
proportions and absolute counts when comparing severe and mild disease status (Supplementary Figure 6 and
Supplementary Table 7).
This highlights the importance of studies employing orthogonal modalities such as
cytokine profiling[51], single-cell
RNA sequencing[52-56], and their integration[57,58].
Nevertheless, even without orthogonal studies, our machine learning approach for
predicting disease severity demonstrates predictive potential, although it should be
tested in a validation cohort before use in a clinical setting.Lastly, we wish to note the work of others and their complementary findings.
For example, Laing et al.[32] used
peripheral blood flow cytometry and circulating cytokine measurements to demonstrate
apparent immune dysregulation in COVID-19patients. They highlighted additional
interesting, complementary features, including increased IL-6, IL-10, and IP-10 and
depletions of basophils, plasmacytoid dendritic cells, TH1 cells, and
TH17 cells. Incorporating their most differentiating markers with
ours could yield a more complete yet targeted panel of markers with more predictive
power to determine which patients will rapidly progress to a severe disease state.
This incorporation of additional differentiating markers should be pursued in future
studies.Collectively, our study highlights a profound imbalance in the COVID-19
immune landscape, characterized by G-MDSC expansion and T cell exhaustion that may
open avenues for clinical translation. Further, our approach provides a powerful
tool to predict clinical outcomes and tailor more effective and proactive therapies
to COVID-19patients.
Methods
Study design, sample acquisition, and clinical data
The study was approved by the Institutional Review Board of Weill
Cornell Medicine. Participants were recruited from patients hospitalized at New
York Presbyterian Hospital from April to July 2020. Some participants in a
COVID-19 convalescent plasma donor screening program with prior confirmed
diagnosis (by RT-PCR or serology) were given the option to contribute a sample
for this research. ARDS was categorized in accordance with the Berlin definition
reflecting each subject’s worst oxygenation level and with physicians
adjudicating chest radiographs[59]. Informed consent was obtained from all participants.
Flow cytometry
For flow cytometric analysis of circulating leukocytes, peripheral blood
was collected in Na-heparin. Except for the MDSC panel, in which PBMCs were
prepared by density gradient centrifugation, erythrocytes were lysed with BD
Pharm Lyse. Peripheral blood was washed in Dulbecco’s PBS (DPBS), lysed
in 1× BD Pharm Lyse, and washed again in DPBS. PBMC cell suspensions were
prepared with Ficoll-Paque following the manufacturer’s protocol. Cells
were stored briefly in storage medium (10% heat-inactivated fetal bovine
serum/1% L-glutamine/1% pen-strep) before staining with antibody cocktails.For each panel, one million cells were stained with specific cocktails
of fluorochrome-conjugated antibodies (Supplementary Table 3-4). Cells were washed
with DPBS then stained with dead cell dye (BD Fixable Viability Stain 700)
before washing with wash buffer (0.5% BSA/DPBS/NaN3). Cells were then
treated with 50 μL of Fc-blocking solution (2% normal rabbit serum/10% BD
Fc Block/DPBS)before application of a 100-μL antibody cocktail diluted in
wash buffer. Samples were stained within 6 hours of sample collection and
analyzed on a BD Biosciences FACSCanto flow cytometer within 2 hours of
staining. The stopping gate was set to acquire 500,000 viable, nucleated single
cells.
Supervised quantification of immune cell populations (gating)
Immune populations were quantified by manual analysis with BD FACSDiva.
Absolute counts of populations were exported to CSV and relative population
sizes were calculated in Microsoft Excel. Gating for each panel started with a
time gate, followed by a singlet gate (FSC-A vs. FSC-H). Next, viable cells
(dead cell dye vs. FSC-A) and nucleated cells (FSC-A vs. SSC-A) were gated.
Populations of MDSCs were gated sequentially from leukocytes (CD45 vs. SSC-A),
then CD3/CD56/CD19 (Lin)- and HLA-DR −/dim cells, followed by
CD33+ and CD11b+ cells. From there, granulocytic cells
were defined as CD14− and CD15+, monocytic cells as
CD14+ and CD15 −/dim and immature forms as
CD14– CD15 −/dim.TREG were defined by sequential gating of lymphocytes (CD45
vs. SSC), T cells (CD45 vs. CD3), T helper (CD8 vs. CD4), and finally
TREG were defined as CD127 dim and CD25+. The
TFH panel was gated the same as in the T cell regulatory panel
down to the CD4+ helper gate. Under this gate, CD185+ cells were
quantified (CD185 vs. CD8) and the ICOS bright, PD-1 bright (CD278 vs. CD279)
cells were gated. The ICOS bright, PD-1 bright TFH gate was placed
under the CD185+ gate to identify the population with all the phenotypic markers
of TFH lineage in this panel.The analysis of the T cell memory and checkpoint panels started with
identifying T cells (CD3 vs. SSC), then the CD4 helper and CD8 cytotoxic
subsets. To analyze the T cell checkpoint panel, individual exhaustion markers
were gated on histogram plots. The T cell memory panel was further subdivided
into CD45RA+/CD45RO− and
CD45RA−/CD45RO+ subsets. Under these gates, two
quadrant gates were placed on CD62L vs. CCR7 and CD62L vs. FAS.Gating for the B-cell panel began with CD45 vs. CD19 then FSC-A and
SSC-A to identify cells of the B lineage. The CD20+ and
CD20− subsets were gated (CD19 vs. CD20) and IgG and IgM
were quantified within the CD20+ subset (IgM vs. IgG).NK cells were identified by sequential gates on CD56 vs. CD3, FSC-A vs.
SSC-A, and CD56 vs. CD16. CD56+, CD16 bright, mature NK cells were
then interrogated for their reactivity with individual anti-KIR (CD158) and
anti-NKG2A (CD159a) antibodies with gates on histogram plots. KIR-negative NK
were identified by sequential gating on CD158a vs. CD158b double-negative, then
CD158i and CD158e double-negative subsets.
Statistical testing
Non-parametric Mann-Whitney U tests were employed to assess the
significance of pairwise changes in the proportions of immune populations
between severity groups using the Pingouin package, version 0.3.7. Multiple test
correction was performed with the Benjamini-Hochberg FDR method.
Single-cell analysis of immune cell populations
To select cells from the events, single cells were gated using
forward-side scatter height and area, CD45-positivity, viability dye-negativity,
and the major marker of each panel (e.g.. CD3 for T cell memory panel).
Compensation was applied using FlowKit[60] version 0.5.0, and an inverse hyperbolic transformation
(AsinhTransform) was applied with parameters t = 10000, m = 4.5, a = 0. To
construct a shared latent representation for all cells, dimensionality reduction
was performed with Principal Component Analysis (PCA), a neighbor graph was
computed using 15 neighbors per cell, UMAP[61] with default parameters, and Leiden clustering, all
using the Scanpy package[62]
version 1.5.1. For each discovered single-cell cluster, a proportion of cells
was calculated in relation to a specific clinical factor after normalization by
the frequency of the same factor in the cohort.
Pseudotime inference and time series modeling of immune cell dynamics during
disease progression
To learn a latent manifold of the data, the non-linear method Laplacian
Eingenmaps[63] was used
as implemented in the “SpectralEmbedding” method
of the scikit-learn framework[64] (version 0.23.0) with default parameters. A z-scored matrix
of proportional data was input for all immune cell populations (variables) and
patient samples (observations). To rank the features by their association with
the learned space, Pearson’s correlation was calculated between the first
component and each variable, in addition to the fold and absolute change in the
variable between the top and bottom 10% of the samples in each extreme of the
embedding. The same procedure applied to a Uniform Manifold Approximation and
Projection (UMAP) latent representation of the same data yielded similar
results, with the exception that the spread of samples according to disease
progression was parallel to multiple learned axes rather than single.To rank variables by the amount of change in both real time since the
reported start of symptoms for a single patient or over the learned latent space
across all patients, GPy package[65] was used to fit Gaussian Process regression models on the
learned pseudotime axis (independent variable) and the abundance of each immune
cell population (dependent variables). A variable radial basis function (RBF)
kernel and a constant kernel (both with an added noise kernel) were fitted and
the log-likelihood and standard deviation of the posterior probability of the
two were compared as described previously[66]. To cluster the abundance of immune populations based
on their dynamics over the pseudotime axis, the same kernels were used to fit a
Mixture of Hierarchical Gaussian Processes (MOHGP) as implemented in the GPClust
package[67,68] using 8 as an initial guess of number of
clusters.
Linear modelling of immune cell type abundances
Due to the proportional nature of the dataset, generalized linear models
were fit using a Gamma-distributed noise model with a log-link function. Ridge
regularization was employed to ensure robust coefficients given the low
abundance of some populations, and the model was fit with ordinary least squares
optimization using the statsmodels package[69] version 0.11.1. Categorical variables
were one-hot encoded and numeric ones such as age or days since symptoms started
were kept as years or days, respectively; the date of acquisition was
transformed into days and scaled to the unit interval. Since values for clinical
categorical variables and comorbidities were only available to COVID-19patients, various models were employed that aimed to explore different aspects
of immune system change during COVID-19:Comparison of healthy donors to COVID-19patients: sex + race +
age + batch + COVID19Effect of clinical/demographic factors on COVID-19patients: sex
+ race + batch + COVID19 + severity group + hospitalization + intubation
+ death + diabetes + obesity + hypertension + age in years + days since
symptoms startEffect of tocilizumab treatment on severe patients only: sex +
age + batch + tocilizumabTo generate a graph of interactions between factors and immune
populations, significant coefficients (FDR-adjusted p-value <0.05) were
used as undirected edges between factors and immune populations. For edges
between factors, the Pearson correlation between factors across immune
populations was used. Exclusively for visualization, coefficients for the
continuous variables “age” and “time since symptoms
started” were multiplied by half of the median of the values of that
variable (33.0 and 10.8, respectively) to make the range of coefficients
comparable with the categorical variables. Visualizations were produced using
Gephi version 0.9.2 with the Force Atlas2 layout with parameters “LinLog
mode”, “scaling factor” 8.0, and “gravity”
11.0.
Prediction of disease severity from immunotypes
A Random Forest Classifier was trained as implemented in
scikit-learn framework[64] (version 0.23.0) to distinguish between cases with
“mild” and “severe” disease using 10-fold cross
validation. The cross-validation loop was repeated 100 times and models were fit
with real or randomized labels. Test set performance was assessed with the
ROC-AUC. To investigate the performance of the classifier, feature importance
was averaged across cross validation folds and iterations and the log fold
importance of the real models over the randomized labels was calculated. A sign
was added to the feature importance depending on the sign of the Pearson
correlation of each variable with each class. Only the earliest temporal sample
of each patient was used to ensure lack of data leakage (avoid training/testing
on samples from the same patient without stratified cross validation) and to
maximize the utility of the model. The same cross-validation scheme was used to
develop a classifier using a subset of features but including feature selection
using mutual information inside the cross-validation loop. To predict severity
longitudinally for single patients, a model was trained on the initial samples
from all other patients and tested on the samples of the patient in
question.
Data availability
Quantification of immune cell populations is available as a Supplementary Table file. Hierarchical
data format files with single cell data (h5ad) are available as indicated in the
repository with source code for the study (https://github.com/ElementoLab/covid-flowcyto) and raw FCS files
will be made available upon post-peer review publication.
Code availability
The source code for the analysis is available on GitHub at the following
URL: https://github.com/ElementoLab/covid-flowcyto
Authors: Divij Mathew; Josephine R Giles; Amy E Baxter; Derek A Oldridge; Allison R Greenplate; Jennifer E Wu; Cécile Alanio; Leticia Kuri-Cervantes; M Betina Pampena; Kurt D'Andrea; Sasikanth Manne; Zeyu Chen; Yinghui Jane Huang; John P Reilly; Ariel R Weisman; Caroline A G Ittner; Oliva Kuthuru; Jeanette Dougherty; Kito Nzingha; Nicholas Han; Justin Kim; Ajinkya Pattekar; Eileen C Goodwin; Elizabeth M Anderson; Madison E Weirick; Sigrid Gouma; Claudia P Arevalo; Marcus J Bolton; Fang Chen; Simon F Lacey; Holly Ramage; Sara Cherry; Scott E Hensley; Sokratis A Apostolidis; Alexander C Huang; Laura A Vella; Michael R Betts; Nuala J Meyer; E John Wherry Journal: Science Date: 2020-07-15 Impact factor: 47.728