
Clinical and biological clusters of sepsis patients using hierarchical clustering.

Grégory Papin^1,2, Sébastien Bailly^3,4, Claire Dupuis^1,2, Stéphane Ruckly⁵, Marc Gainnier⁶, Laurent Argaud⁷, Elie Azoulay⁸, Christophe Adrie⁹, Bertrand Souweine¹⁰, Dany Goldgran-Toledano¹¹, Guillaume Marcotte¹², Antoine Gros¹³, Jean Reignier¹⁴, Bruno Mourvillier¹⁵, Jean-Marie Forel¹⁶, Romain Sonneville², Anne-Sylvie Dumenil¹⁷, Michael Darmon⁸, Maité Garrouste-Orgeas¹⁸, Carole Schwebel¹⁹, Jean-François Timsit^1,2.

Abstract

BACKGROUND: Heterogeneity in sepsis expression is multidimensional, including highly disparate data such as the underlying disorders, infection source, causative micro-organisms and organ failures. The aim of the study was to identify clusters of patients based on clinical and biological characteristics available at patients' admission.
METHODS: All patients included in a national prospective multicenter ICU cohort OUTCOMEREA and admitted for sepsis or septic shock (Sepsis 3.0 definition) were retrospectively analyzed. A hierarchical clustering was performed in a training set of patients to build clusters based on a comprehensive set of clinical and biological characteristics available at ICU admission. Clusters were described, and the 28-day, 90-day, and one-year mortality were compared with log-rank tests. Risks of mortality were also compared after adjustment for SOFA score and year of ICU admission.
RESULTS: Of the 6,046 patients with sepsis in the cohort, 4,050 (67%) were randomly allocated to the training set. Six distinct clusters were identified: young patients without any comorbidities, admitted in ICU for community-acquired pneumonia (n = 1,603 (40%)); young patients without any comorbidities, admitted in ICU for meningitis or encephalitis (n = 149 (4%)); elderly patients with COPD, admitted in ICU for bronchial infection with few organ failures (n = 243 (6%)); elderly patients, with several comorbidities and organ failures (n = 1,094 (27%)); patients admitted after surgery, with a nosocomial infection (n = 623 (15%)); young patients with immunosuppressive conditions (e.g., AIDS, chronic steroid therapy or hematological malignancy) (n = 338 (8%)). Clusters differed significantly in early or late mortality (p < .001), even after adjustment for severity of organ dysfunction (SOFA) and year of ICU admission.
CONCLUSIONS: Clinical and biological features commonly available at ICU admission of patients with sepsis or septic shock made it possible to define six clusters of patients with very distinct outcomes. Considering these clusters may improve care management and the homogeneity of patients in future studies.


Year: 2021 PMID: 34347776 PMCID: PMC8336799 DOI: 10.1371/journal.pone.0252793

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

In European intensive care units (ICU), the frequency of admission for sepsis still ranges from 10 to 64% [1]. Although the prognosis has improved thanks to better management of vital organ support, the in-ICU mortality of patients with sepsis still ranges between 20 and 30% [2]. Despite many studies, the lack of effective specific therapies remains the main issue in the management of septic patients. The heterogeneity of patients included in studies focusing on sepsis could partly explain these failures [3-5]. Sepsis is not a specific illness, but rather a syndrome encompassing a still-uncertain pathobiology. Currently, the classification of sepsis is based only on the etiology, the distinction between sepsis and septic shock, and mortality risk levels after stratification on severity scores [6]. These scores do not adequately capture the heterogeneity of sepsis [7]. As explained by J. Castela Forte, to personalize and improve treatments of sepsis, patients must be clustered into common phenotypes based on clinically objective parameters reflecting disease mechanisms. After an external validation step, these clusters could be grouped by underlying causal mechanisms and would improve patient characterization as well as the design and powering of randomized controlled trials. Finally, these clusters may allow differential response patterns to be identified by considering baseline characteristics of sepsis patients [8]. Cluster analysis refers to statistical methods of data partitioning whereby objects or individuals are grouped into homogeneous groups on the basis of similarity, independent of any outcome variables. Several methods may be used, each with advantages and pitfalls [9]. The multiple correspondence analysis (MCA) approach, combined with hierarchical clustering (HC), has already been used in several diseases such as chronic obstructive pulmonary disease, asthma, obstructive sleep apnea, lung cancer and vasculitis [10-14].
This clustering method has two advantages: it limits the noise in the data set without inducing bias, by reducing the variables to coordinates obtained for each patient that summarize the main part of the information, and it does not require the number of clusters to be defined a priori. In the ICU, acute respiratory distress syndrome (ARDS) has been evaluated through a cluster analysis [15]. Two clusters were described, with consistent outcomes and responses to treatment; these two ARDS clusters were a tangible discovery made thanks to a retrospective analysis of negative clinical trials. In sepsis, several studies used cluster analysis, mainly with a genotypic or transcriptomic approach [16,17]. Recently, Seymour et al conducted a clustering analysis on several large cohorts of patients meeting the Sepsis 3.0 definition [18]. Four interesting patient clusters were identified that correlated with host-response patterns and clinical outcomes. However, understanding the phenotype of critically ill patients should include more data characterizing patient comorbidity and infection [19]. The comparison of cluster analyses on different cohorts is therefore essential. The primary objective of our study was to identify clusters among patients with sepsis by considering data available at admission, including underlying disorders, source of infection, micro-organism, biological host response and organ failures. The secondary objectives were to assess their heterogeneity in outcomes and to validate them in an independent dataset.

Materials and methods

Study design and data source

We conducted a retrospective analysis of a prospective observational multicenter database (OutcomeRea™). The database, fed by 20 French ICUs, collects prospective data on daily disease severity, iatrogenic events, and nosocomial infections. Each year, each ICU includes a random sample of at least 50 patients who have ICU stays longer than 24 h. Each ICU could choose to obtain the random sample by taking either consecutive admissions to selected ICU beds throughout the year or consecutive admissions to all ICU beds for 1 month. This study was approved by our institutional review board (CECIC Clermont-Ferrand, IRB n°5891; Ref: 2007–16), which waived the need for signed informed consent of the participants, in accordance with French legislation on non-interventional studies. However, the patients and their next of kin were asked whether they were willing to participate in the database and to allow the use of their personal anonymized data; none declined participation.

Participants

All patients admitted to the ICU for sepsis or septic shock were included. ICU-acquired infections were excluded. The presence or absence of infection at admission was prospectively recorded by clinical physicians according to the standard definitions developed by the Centers for Disease Control and Prevention and recently updated [21]. According to the new Sepsis 3.0 definition [5], sepsis was defined as a life-threatening organ dysfunction, identified by an increase of 2 points or more in the SOFA score, associated with an infection. Patients were included in the OUTCOMEREA™ database prior to this new definition. Accordingly, the increase in the SOFA score was retrospectively calculated. The maximum SOFA score measured on the first day of ICU stay was used. Prior to ICU admission, the baseline SOFA score was considered to be zero. The baseline SOFA score of patients on chronic renal replacement therapy was assumed to be 4 points. Septic shock was defined by a need to administer vasopressor agents to maintain a mean arterial pressure of 65 mmHg or greater, and a serum lactate level greater than 2 mmol/L (>18 mg/dL) in the absence of hypovolemia.
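The retrospective case definition described above can be sketched in a few lines. This is only an illustration of the stated rules, not the code used for the study; the `Admission` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Admission:
    """Hypothetical admission-day record; field names are assumptions."""
    infection: bool          # infection present at ICU admission
    sofa_day1_max: int       # maximum SOFA score on the first ICU day
    chronic_rrt: bool        # chronic renal replacement therapy before admission
    vasopressor: bool        # vasopressor needed to keep MAP >= 65 mmHg
    lactate_mmol_l: float    # serum lactate in mmol/L
    hypovolemia: bool        # True if hypovolemia is still present


def baseline_sofa(adm: Admission) -> int:
    # Baseline assumed to be 0, or 4 points for chronic renal replacement therapy.
    return 4 if adm.chronic_rrt else 0


def is_sepsis(adm: Admission) -> bool:
    # Infection plus a SOFA increase of at least 2 points over baseline.
    return adm.infection and (adm.sofa_day1_max - baseline_sofa(adm)) >= 2


def is_septic_shock(adm: Admission) -> bool:
    # Sepsis plus vasopressor need and lactate > 2 mmol/L without hypovolemia.
    return (is_sepsis(adm) and adm.vasopressor
            and adm.lactate_mmol_l > 2.0 and not adm.hypovolemia)
```

Under these rules, a patient on chronic renal replacement therapy with a day-1 SOFA score of 5 would not qualify, because the increase over the assumed baseline is only 1 point.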

Data collection

Data were collected daily in the participating ICUs by senior physicians and/or specifically trained study monitors. For each patient, the investigators entered the data into a computer case-report form using data-capture software (RHEA; OutcomeRea™, France) and imported all records into the OutcomeRea™ database. All codes and definitions were established prior to study initiation. For most of the study variables, the data-capture software immediately ran an automatic check for internal consistency, generating queries that were sent to the ICUs for resolution before incorporation of the new data into the database. In each participating ICU, data quality is checked every other year by a senior physician from another participating ICU, who reviews a 2% random sample of the study data. A 1-day data-capture training course, held once a year, is open to all OutcomeRea™ investigators and study monitors. All qualitative variables used in the analyses had a Cohen’s kappa coefficient > 0.8, and all variables had inter-rater coefficients in the 0.67 to 1 range, indicating good to excellent reproducibility.
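For readers unfamiliar with the agreement statistic quoted above, Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance. A minimal sketch follows; the function name and the toy ratings are illustrative assumptions, not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same items."""
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy example (hypothetical): 10 records coded 1/0 by two reviewers.
original = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
audit = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
kappa = cohen_kappa(original, audit)
```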

Statistical methods

For variables with less than 20% of missing values, we performed a multiple imputation of missing data using Markov Chain Monte Carlo. Sixty-three clinical and biological variables were available at admission. A description of the 63 variables included in the cluster analysis is available in S1 Table. The original data set was randomly split into a training set (2/3 of the patients) and a validation set (1/3 of the patients). The statistical analyses comprised 4 steps: 1) Dimension reduction and cluster analysis; 2) Cluster description; 3) Outcomes; 4) Binary tree and cluster validation (Fig 1). Due to the absence of recommendations, we empirically chose the combination of MCA and HC.

Fig 1

Schematic of study.

Definition of abbreviations: MCA = multiple correspondence analysis; HC = hierarchical clustering. Sensitivity analyses: cluster analysis in the training set after excluding COPD exacerbation, and cluster analysis in the training set after excluding the data before 2008.


Cluster analysis

An MCA was performed to reduce the dimension of the 63 variables of the dataset into a “Euclidean patient-coordinates” dataset. Because MCA is based on qualitative variables, quantitative variables were categorized. The first 52 dimensions out of a total of 79, which explained at least 90% of the total variability, were considered in the HC [20]. The HC was performed on this patient-coordinates dataset using Ward’s minimum-variance method. Initially, each patient formed his or her own cluster, and clusters were thereafter merged into larger clusters so as to minimize within-cluster heterogeneity and maximize between-cluster heterogeneity. The final number of clusters was defined on the basis of the semi-partial R-squared, the R-squared, the pseudo F statistic and the pseudo t² statistic. There is no consensus in the literature on the final choice of the number of clusters, regardless of the clustering method. A k-means algorithm was considered to identify potential outliers in the dataset [21].
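The two mechanical steps of this paragraph, retaining enough MCA dimensions to reach 90% of the variability and then merging patients with Ward's criterion, can be sketched as follows. This is a naive pure-Python illustration on toy coordinates, not the SAS/R procedure used in the study; the function names, eigenvalues and points are assumptions made for the example.

```python
def n_dims_for_inertia(eigenvalues, threshold=0.90):
    """Smallest number of leading dimensions whose cumulative share
    of total inertia reaches the threshold."""
    total = sum(eigenvalues)
    running = 0.0
    for k, ev in enumerate(eigenvalues, start=1):
        running += ev
        if running / total >= threshold:
            return k
    return len(eigenvalues)


def ward_clustering(points, n_clusters):
    """Naive agglomerative clustering with Ward's minimum-variance criterion."""
    # Each cluster is (member indices, centroid); every patient starts alone.
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in within-cluster sum of squares if a and b merge.
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((ma + mb, centroid))
    return [sorted(members) for members, _ in clusters]


# Toy patient coordinates (hypothetical), cut at three clusters.
toy_points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
toy_clusters = ward_clustering(toy_points, 3)
toy_dims = n_dims_for_inertia([0.5, 0.3, 0.15, 0.05])
```

The pairwise search makes this sketch far too slow for several thousand patients; it is meant only to show which quantity Ward's method minimizes at each merge.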

Cluster description

Variables were described separately for each cluster, using the median and interquartile range (IQR) for quantitative variables, and frequency and percentage for qualitative variables. The association of each variable with membership in a given cluster was assessed using odds ratios, determined by univariable logistic regression.
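For a binary characteristic, the odds ratio from a univariable logistic regression equals the cross-product ratio of the corresponding 2×2 table, so it can be checked by hand. The sketch below does this for COPD in cluster 3 using the counts reported in Table 1; pooling the five other clusters as the comparison group is an assumption of this example, and the function name is hypothetical.

```python
def odds_ratio(exposed_in, n_in, exposed_out, n_out):
    """Cross-product odds ratio of a binary characteristic,
    in-cluster versus the comparison group."""
    a, b = exposed_in, n_in - exposed_in      # in cluster: with / without
    c, d = exposed_out, n_out - exposed_out   # comparison: with / without
    return (a * d) / (b * c)


# COPD counts from Table 1: cluster 3 versus the other five clusters pooled.
copd_other = 371 + 9 + 217 + 68 + 25
n_other = 1603 + 149 + 1094 + 623 + 338
or_copd_cluster3 = odds_ratio(202, 243, copd_other, n_other)
```

With these counts the odds of COPD are roughly 22 times higher in cluster 3 than in the pooled remainder of the training set.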

Outcomes

The associations between clusters and mortality were assessed by using the status of the patients at day-28, day-90 and one year after admission. The log-rank test was used to compare clusters for mortality. Risks of mortality were described after adjustment for SOFA score at admission using a Cox model. Analyses were also performed in the sub-group of patients with septic shock. The length of ICU and hospital stay, the number of ventilator-free days at day-28, the duration of renal replacement therapy, the number of catecholamine-free days at day-28, and the number of organ system failure-free days at day-28 were evaluated.
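The mortality estimates compared by the log-rank test are Kaplan-Meier estimates. As a reminder of how such an estimate is built, here is a minimal product-limit sketch on toy follow-up data; the function name and the data are hypothetical, and the log-rank test and Cox model themselves are not reproduced here.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times: follow-up in days; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Toy cohort (hypothetical): six patients followed up to day 28.
toy_curve = kaplan_meier([5, 10, 10, 20, 28, 28], [1, 1, 0, 1, 0, 0])
toy_day28_mortality = 1 - toy_curve[-1][1]
```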

Binary tree

To build a simple tool able to assign a new patient to a cluster using only data that are commonly available at ICU admission, a binary tree was built using classification and regression trees (CART) [22,23]. The accuracy of the binary tree was evaluated using sensitivity, specificity and the area under the receiver operating characteristic curve (AUC) in the training set.
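One plausible way to compute these accuracy measures for a multi-class tree is one cluster at a time, treating that cluster as "positive" and all others as "negative". The sketch below assumes this one-vs-rest reading and a hard class assignment, for which the area under the single-point ROC curve is the mean of sensitivity and specificity; the source does not state how the AUC was actually derived, and the labels are toy values.

```python
def one_vs_rest_accuracy(true_labels, predicted_labels, cluster):
    """Sensitivity, specificity and single-point AUC for one cluster."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == cluster:
            if pred == cluster:
                tp += 1
            else:
                fn += 1
        else:
            if pred == cluster:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # For a hard classifier, the ROC curve has one operating point.
    auc = (sensitivity + specificity) / 2
    return sensitivity, specificity, auc


# Toy example (hypothetical): clusters from HC versus clusters from the tree.
hc_clusters = [1, 1, 1, 2, 2, 3, 3, 3]
tree_clusters = [1, 1, 2, 2, 2, 3, 1, 3]
sens1, spec1, auc1 = one_vs_rest_accuracy(hc_clusters, tree_clusters, 1)
```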

Cluster validation

Cluster analysis was also applied in the validation set. Results obtained were compared to clusters obtained by applying the binary tree using AUC. New cluster analysis and cluster description were performed after exclusion of the oldest data (patients admitted before 2008). Data analyses were performed using R (The R foundation, Vienna, Austria) and SAS version 9.4 (SAS Institute Inc., Cary, NC) [24].

Results

Subject demographics

The new definition of sepsis applied to the 18,840 patients of the OUTCOMEREA® database yielded a total of 6,046 patients admitted for a first episode of sepsis between 1997 and 2015, of whom 58% (n = 3,479) had septic shock (S1 Fig). Details of missing data are available (S1 Table). The initial sample was split into two sets: a training set (n = 4,050 (67%)) and a validation set (n = 1,996 (33%)) (Fig 1 and S1 Table). In the training set, the median age of patients was 65 years (IQR: [53-76]) and 62% (n = 3,763) were males. Patients were mainly admitted for medical reasons (80%). Median SAPSII and SOFA scores at admission were 46 [34-60] and 6 [4-9], respectively. At admission, 58% of patients (n = 2,344) required mechanical ventilation, 60% (n = 2,430) required vasopressor therapy and 10% (n = 415) required renal replacement therapy. The estimated mortality was 26% [95%CI:24–27] at day-28, 36% [95%CI:34–38] at day-90 and 40% [95%CI:38–42] at one year. The baseline characteristics of all variables of the training set are shown in Tables 1–4.

Table 1

Host characteristics by cluster (performed in training set).

Variable	Cluster 1 n = 1,603	Cluster 2 n = 149	Cluster 3 n = 243	Cluster 4 n = 1,094	Cluster 5 n = 623	Cluster 6 n = 338
Age (years)	59 [47–72]	55 [36–66]	70 [61–77]	72 [62–79]	68 [56–77]	56 [42–64]
Sex (Male)	1062 (66%)	81 (54%)	144 (59%)	624 (57%)	366 (59%)	214 (63%)
Weight (kg)	69 [57–80]	68.6 [58–80]	68.6 [58–83]	70.7 [60–83.5]	72 [61–84]	70 [60–80]
Malnutrition	96 (6%)	5 (3%)	10 (4%)	66 (6%)	33 (5%)	20 (6%)
Alcohol abuse	302 (19%)	13 (9%)	39 (16%)	171 (16%)	79 (13%)	21 (6%)
Not complicated diabetes	131 (8%)	15 (10%)	32 (13%)	222 (20%)	69 (11%)	22 (7%)
Complicated diabetes	20 (1%)	4 (3%)	10 (4%)	147 (13%)	29 (5%)	3 (1%)
Chronic heart failure	135 (8%)	10 (7%)	47 (19%)	403 (37%)	123 (20%)	31 (9%)
Chronic kidney disease	38 (2%)	6 (4%)	7 (3%)	201 (18%)	37 (6%)	26 (8%)
Liver cirrhosis	75 (5%)	9 (6%)	4 (2%)	178 (16%)	65 (10%)	21 (6%)
COPD	371 (23%)	9 (6%)	202 (83%)	217 (20%)	68 (11%)	25 (7%)
Hematological malignancy	41 (3%)	7 (5%)	3 (1%)	29 (3%)	14 (2%)	293 (87%)
HIV/AIDS or Transplant	97 (6%)	20 (13%)	0 (0%)	43 (4%)	14 (2%)	57 (17%)
Solid tumor	138 (9%)	7 (5%)	12 (5%)	181 (17%)	129 (21%)	53 (16%)
Chronic steroid therapy	92 (6%)	11 (7%)	18 (7%)	63 (6%)	25 (4%)	41 (12%)
Charlson score	2 [1–3]	1 [1–3]	3 [2–4]	4 [3–6]	3 [3–5]	3 [2–4]
ICU Admission
Medical admission	1494 (93%)	144 (97%)	237 (98%)	957 (87%)	76 (12%)	331 (98%)
Unscheduled surgery	46 (3%)	3 (2%)	2 (1%)	89 (8%)	489 (78%)	7 (2%)
Scheduled surgery	63 (4%)	2 (1%)	4 (2%)	48 (4%)	58 (9%)	0 (0%)

Table 4

Organ failure at admission by cluster (performed in training set).

Variable	Cluster 1 n = 1,603	Cluster 2 n = 149	Cluster 3 n = 243	Cluster 4 n = 1,094	Cluster 5 n = 623	Cluster 6 n = 338
Vasopressor at admission	714 (45%)	58 (39%)	68 (28%)	799 (73%)	470 (75%)	204 (60%)
Glasgow Coma Score	15 [8–15]	9 [6–13]	15 [13–15]	13 [7–15]	15 [13–15]	15 [13–15]
Creatinine level (μmol/l)	85 [64–120]	86 [72–121]	80 [62–114]	165 [110–278]	111.5 [76.5–185]	111.5 [76.5–185]
Platelets count (x10³/mm³)	223 [155–300]	209 [125–282]	234 [182–308]	182 [110–275]	214.5 [132–297]	40 [24–80]
PaO2/FiO2 ratio (mmHg)	217 [133–378]	312 [214–493]	240 [170–322]	224 [136–358]	267 [180–382]	315 [168–497]
Bilirubin level (μmol/l)	11 [7–19]	14 [8–24]	9 [6–13]	16 [9–30]	17 [10–30]	20 [11–42]
Blood lactate level (mmol/l)	1.7 [1.2–2.7]	1.8 [1.3–3.2]	1.5 [1.1–2.2]	2.6 [1.6–4.9]	2.2 [1.4–3.7]	2.2 [1.5–4.4]

Definition of abbreviations: COPD = chronic obstructive pulmonary disease; HIV = human immunodeficiency virus; AIDS = acquired immune deficiency syndrome; IQR = interquartile range; NF GNB = non-fermentative Gram-negative bacilli; Cluster 1 = young patients without any comorbidities, admitted in ICU for community-acquired pneumonia; Cluster 2 = young patients without any comorbidities, admitted in ICU for meningitis or encephalitis; Cluster 3 = elderly patients with COPD, admitted in ICU for bronchial infection with few organ failures; Cluster 4 = elderly patients, with several comorbidities and organ failures; Cluster 5 = patients admitted after surgery with a nosocomial infection; Cluster 6 = young patients with immunosuppressive disease or therapy, such as AIDS, chronic steroid therapy or hematological malignancy. Values are numbers (%) or median [IQR].

Cluster analysis and cluster description

Six clusters were identified (S2 and S3 Figs). The contributions of variables in the construction of the first four dimensions of the MCA are depicted on S4 Fig. Representation of patients in the first four dimensions of MCA is depicted on S5 Fig. Clusters were described according to their major associations (S6 Fig). Tables 1–4 describe the distribution of demographic characteristics, various comorbidities, sources of infection, micro-organisms, clinical and biological data, and the organ failures at admission. The main characteristics of each cluster are described below.

Cluster 1 (1,603 patients, 40%): Young patients without any comorbidities, admitted in ICU for community-acquired pneumonia

Most patients were included in this cluster. Among them, 1,391 (87%) had pneumonia, which was community-acquired for 1,125 (70%) of them. The most frequently involved micro-organism was Streptococcus pneumoniae (259 (16%) patients); however, no responsible pathogens were identified in 710 (44%) patients.

Cluster 2 (149 patients, 4%): Young patients without any comorbidities, admitted in ICU for meningitis or encephalitis

This cluster, very close to cluster 1 in the hierarchical classification, was the smallest. It gathered the youngest patients (median age 55.2 [36.6–66.8] years). Twenty patients (13%) had HIV, AIDS, or an organ transplant. Their Glasgow Coma Score (median 9 [6-13]) was the lowest of all clusters.

Cluster 3 (243 patients, 6%): Elderly patients with COPD, admitted in ICU for bronchial infection and with few organ failures

With a median SOFA score of 4 [2-6] at admission, this cluster had the least severely ill patients. Only 68 patients (28%) were in septic shock. COPD was the main comorbidity (202 patients, 83%). The pathogen of the bronchial infection was most often not identified (198 patients, 81%).

Cluster 4 (1,094 patients, 27%): Elderly patients with several comorbidities and organ failures

This cluster comprised the oldest patients (median age, 72.3 [62.3–79.7] years). Comorbidities were mainly chronic heart failure (n = 403, 37%), diabetes mellitus (n = 369, 34%) and COPD (n = 217, 20%). The patients in this cluster had the highest severity at admission: 799 (73%) had septic shock, and the median SOFA score at admission was 8 [5-11].

Cluster 5 (623 patients, 15%): Patients admitted after surgery (scheduled or unscheduled), and with a nosocomial infection

This cluster included 489 patients (78%) admitted for unscheduled surgery and 58 patients (9%) admitted for scheduled surgery. The median age of patients was 68.8 [56.8–77.6] years, and comorbidities included solid tumor (129 (21%)) and chronic heart failure (123 (20%)). Most patients were in septic shock (470 (75%)), and the infection was nosocomial for 316 (51%) of them.

Cluster 6 (338 patients, 8%): Young patients with immunosuppressive conditions such as AIDS, chronic steroid therapy or hematological malignancy

In this cluster, 293 patients (87%) had a hematological malignancy and 57 (17%) were diagnosed with HIV or AIDS or had an organ transplant. Thrombocytopenia (median platelet count: 40,000 [24,000-80,000]/mm³) and leucopenia (median leukocyte count: 800 [200-4,300]/mm³) were the main biological disorders.

Outcome data

The survival curves shown in Fig 2 were significantly different between clusters (log-rank test: p <0.01). The risks of early, intermediate and late mortality, with and without adjustment, are shown in Fig 3. Results remained similar in the sub-group limited to patients with septic shock (S7 Fig). The differences in length of ICU and hospital stay, number of ventilator-free days at day-28, duration of renal replacement therapy, number of catecholamine-free days at day-28, and number of organ system failure-free days at day-28 are described in S2 Table.

Fig 2

Mortality estimated by Kaplan-Meier according to the cluster assignment with log rank tests (performed in training set).

Definition of abbreviations: Each curve was compared one by one using a log rank test; Results of these tests are presented in a double entry matrix; each cluster can be identified by its color; Analysis was performed including all patients in sepsis or septic shock. Cluster 1 = young patients without any comorbidities, admitted in ICU for community-acquired pneumonia; Cluster 2 = young patients without any comorbidities, admitted in ICU for meningitis or encephalitis; Cluster 3 = elderly patients with COPD, admitted in ICU for bronchial infection with few organ failures; Cluster 4 = elderly patients with several comorbidities and organ failures; Cluster 5 = patients admitted after surgery with a nosocomial infection; Cluster 6 = young patients with immunosuppressive disease or therapy, such as AIDS, chronic steroid therapy or hematological malignancy.

Fig 3

Description of the clusters and their risks of early, intermediate and late mortality, with and without adjustment (performed in training set).

Definition of abbreviations: HR: Hazard ratio; The analysis was performed after exclusion of patients without septic shock. A Cox model was used to determine the hazard ratio; Data are reported as HR ± 95% confidence intervals, presented from lowest to highest; Cluster 3 was used as reference; Adjusted risks of mortality were adjusted for SOFA score at admission and year of ICU admission. Cluster 1 = young patients without any comorbidities, admitted in ICU for community-acquired pneumonia; Cluster 2 = young patients without any comorbidities, admitted in ICU for meningitis or encephalitis; Cluster 3 = elderly patients with COPD, admitted in ICU for bronchial infection with few organ failures; Cluster 4 = elderly patients with several comorbidities and organ failures; Cluster 5 = patients admitted after surgery with a nosocomial infection; Cluster 6 = young patients with immunosuppressive disease or therapy such as AIDS, chronic steroid therapy or hematological malignancy.


Binary tree

Six discriminatory variables available on admission were identified by CART methods to assign new patients into a cluster (S8 Fig). The distribution of the patients amongst the clusters was similar. The accuracy of the binary tree in the training set is shown on S3 Table.

Cluster validation

Six clusters were identified in the validation set (S9 Fig). The contributions of variables in the construction of the first four dimensions of the MCA are depicted in S10 Fig. S4 Table describes the distribution of demographic characteristics, various comorbidities, sources of infection, micro-organisms, clinical and biological data, and the organ failures at admission. The accuracy of the binary tree in the validation set is shown in S5 Table. The results of the cluster analysis after exclusion of the oldest data (admission before 2008) are provided in S6 Table.

Discussion

Our work is an original clinical study, based on a large sample, that aimed to reduce the heterogeneity of the septic population. Using the new Sepsis 3.0 definition [4], we performed an HC based on clinical and biological data commonly available at ICU admission. We were able to discriminate 6 rather homogeneous clusters of patients with sepsis and septic shock. Three clusters were characterized by underlying disorders, while two clusters were characterized by the source of infection. Baseline risks of death at day-28, day-90, and one year were significantly different across clusters, independently of organ failures at admission. After having developed a binary classification tree, we obtained similar results in the validation subset. The ability to assign a patient with sepsis to a homogeneous cluster should make it possible to personalize the ICU care strategy and to test potentially appropriate new therapies through more efficiently targeted clinical trials. The simplest way to do so is to use a classification tree. The low influence of organ dysfunction in characterizing clusters was the main result that was comparable with the study of Seymour et al [18]. Two clusters share identical characteristics: one that included young patients with low comorbidity, and one that included elderly patients with high Charlson scores. Some similarities can be found between their δ cluster and our cluster 5: patients had more liver dysfunction, and more often septic shock and intra-abdominal sepsis, which was the most frequent site of infection in this cluster. Interestingly, we identified a well-defined cluster of immunocompromised patients, mainly comprising hematologic patients. These patients were excluded from the SENECA cohorts, the GenIMS cohort, and the ProCESS, PROWESS and ACCESS trials. We identified a small cluster that included patients with COPD exacerbation.
These patients met the Sepsis 3.0 definition, yet the mechanisms of organ failure were probably very different from those of the other clusters. The inclusion of these patients in sepsis studies should be discussed. Some limitations of this study must be acknowledged. First, these phenotypes will reduce heterogeneity prior to randomization; however, because they will not necessarily create better trials, further studies are necessary to better explain this clustering [31]. Second, the training set and the validation set come from the same database. A validation with an external database would have been more robust. Third, patient admissions in our study ranged from 1997 to 2015, with heterogeneous periods of inclusion and numbers of patients included across centers (S7 Table). The improvement of the prognosis related to improvements in sepsis management is no longer debated. This difference in risks of mortality was taken into account by an adjustment for the year of ICU admission, and a sensitivity analysis was performed without the oldest data. Finally, unsupervised analysis is not unbiased, because only the data available within the database can be explored, and only the data that are known or thought to be important are entered [25]. Some additional data would have been useful, in particular data about clinical management or specific therapy [26]. Also, there is a lack of information about cytokines, ethnicity, genetic polymorphisms, precise dating of the infection onset, and biomarkers such as C-reactive protein. Several studies have focused on the identification of sepsis molecular phenotypes based on gene expression data [16,27,28]. Between two and four clusters were identified. Genotype or endotype approaches have the potential to substantially improve our understanding of the key biological pathways involved in human diseases and to suggest new targets for treatment or prevention.
However, studies using these approaches often failed to replicate positive findings, especially when investigating associations with sepsis outcomes [29,30]. The possible explanations include low statistical power, heterogeneous patient populations, and imprecise definition of phenotypes [31]. As suggested in a review by Clark et al, sepsis defines a syndrome rather than a specific disease. This may lead to a marked heterogeneity in patient populations, which may explain some of the variations in the findings. The authors also suggested using more precise phenotypes, for example meningococcal sepsis or fecal peritonitis. According to Rautanen et al, promising results might be generated by focusing on more homogeneous subgroups such as sepsis patients with pneumonia [32]. The understanding of the immune system and its interaction with pathogens must take into account the high dimensionality of the data. Future studies are needed to combine clinical data, host genomics, transcriptomic responses and cytokines, using data science approaches that account for longitudinal data. When the hypothesis is made that the treatment effect is similar in all patients and evolves linearly with the severity of illness, the expected effect of a new treatment in a randomized trial, measured by the reduction of the risk of death, depends on the baseline risk of death and its distribution in the sample. This was described and illustrated by Kent [33]. In their simulation study of therapeutic trials on ARDS and sepsis, Iwashyna et al showed that the variation in the baseline risk of death was the main determinant of the heterogeneity of the treatment response [34]. Thus, the impact of the same treatment applied to populations with different baseline risks of death spanned from an increase in the risk of death in a low-risk population to a decrease in the risk of death in patients whose baseline risk of death was high.
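The dependence of the treatment effect on baseline risk can be illustrated with simple arithmetic. The sketch below assumes a constant relative risk reduction combined with a fixed absolute treatment-related harm; this model, the function name and every number in it are hypothetical choices made for illustration and are not taken from the cited studies.

```python
def net_absolute_benefit(baseline_risk, relative_risk_reduction, absolute_harm):
    """Net change in absolute risk of death (positive = fewer deaths).

    Assumes (hypothetically) a constant relative risk reduction and a
    fixed absolute treatment-related harm, independent of baseline risk.
    """
    return baseline_risk * relative_risk_reduction - absolute_harm


# Hypothetical treatment: 20% relative risk reduction, 3% absolute harm.
low_risk = net_absolute_benefit(0.10, 0.20, 0.03)    # net harm
high_risk = net_absolute_benefit(0.40, 0.20, 0.03)   # net benefit
```

Under these assumptions the same treatment increases the risk of death by about 1 percentage point in a population with a 10% baseline risk and reduces it by about 5 percentage points in a population with a 40% baseline risk.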
However, patients with sepsis are intrinsically heterogeneous, not only in their baseline risk of death [35,36], but also in their risk of adverse outcome [37]. Identifying treatable traits and setting an accurate diagnosis are the major challenges in sepsis [37]. Improving prognostic estimation, and performing comparisons and benchmarking of processes and outcomes between different ICUs are also necessary. Although the current definition of sepsis groups was designed to reduce heterogeneity of the patients [4], an important variability can still be observed for many patients’ characteristics such as the sources of infection, causative pathogen(s), age, lifestyle, comorbidities, or genetic profile.

Conclusion

Because prognostication and the identification of target patients remain difficult in sepsis, new approaches are necessary. In patients who met the new Sepsis 3.0 definition, six clusters, clearly different in their clinical and biological presentation, were identified using hierarchical clustering. These clusters also differed in mortality and severity of illness. Considering these clusters may reduce the uncontrolled differences in patients’ prognosis and improve the power of studies. Future work combining big data analysis, clinical and genomic data and several biomarkers may help to better define homogeneous subsets of sepsis patients.

Flow chart.


Dendrogram of ascending hierarchical clustering analysis used in the training set.

Dendrogram obtained after application of hierarchical clustering analysis by accounting for the 51 dimensions of the multiple correspondence analysis. The vertical axis of the dendrogram represents the distance between clusters. The horizontal axis represents the patients and clusters. Each junction between two clusters is represented on the graph by the split of a vertical line into two vertical lines. The vertical position of the split, shown by the short horizontal bar, gives the distance between the two clusters. The red line shows the cut level that determines the number of clusters. The indices used to determine this cut level (the semi-partial R-squared, the R-squared, the pseudo F statistic and the pseudo t² statistic) are presented in S3 Fig.
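The way a dendrogram is built and then cut can be sketched with a naive agglomerative procedure. The sketch below assumes Ward's criterion for the distance between clusters, which the caption does not state, and uses five hypothetical two-dimensional points in place of the patients' MCA coordinates; the function name `ward_clustering` is illustrative.

```python
def ward_clustering(points, n_clusters):
    """Naive agglomerative clustering with Ward's criterion (an assumption).

    Returns (labels, merge_costs). Each merge cost is the height at which
    the corresponding junction of a dendrogram would be drawn.
    """
    clusters = {i: [i] for i in range(len(points))}
    merge_costs = []
    dim = len(points[0])

    def centroid(members):
        return [sum(points[m][d] for m in members) / len(members) for d in range(dim)]

    def ward_cost(a, b):
        # Increase in within-cluster sum of squares if clusters a and b are merged.
        ca, cb = centroid(a), centroid(b)
        squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * squared_distance

    # Stopping at n_clusters is equivalent to cutting the dendrogram at that level.
    while len(clusters) > n_clusters:
        keys = sorted(clusters)
        cost, i, j = min(
            (ward_cost(clusters[i], clusters[j]), i, j)
            for idx, i in enumerate(keys)
            for j in keys[idx + 1:]
        )
        clusters[i] = clusters[i] + clusters.pop(j)
        merge_costs.append(cost)

    labels = [None] * len(points)
    for label, key in enumerate(sorted(clusters)):
        for member in clusters[key]:
            labels[member] = label
    return labels, merge_costs


# Hypothetical coordinates standing in for patients' MCA dimensions.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
labels, costs = ward_clustering(points, n_clusters=3)
print(labels)
```

A production analysis would use an optimized library routine rather than this quadratic search; the sketch only shows how merge heights and a cut level relate to cluster labels.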

Representation of indices used to determine the dendrogram cut level.

A: The semi-partial R-squared gives the decrease in the proportion of variance accounted for that results from joining two clusters. B: The R-squared is the proportion of variance accounted for by the clusters. C: The pseudo F statistic measures the separation among all clusters at the current level. D: The pseudo t² statistic measures the separation between the two clusters most recently joined. There is no consensus in the literature on the final choice of the number of clusters, regardless of the clustering method. According to the different criteria (semi-partial R-squared = 4 to 6 clusters, R-squared = above 10 clusters, pseudo F statistic = 4 to 7 clusters, pseudo t² statistic = 2, 4 or 6 clusters), six clusters is the best trade-off between goodness-of-fit criteria and clinical interpretation.
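Two of these indices can be computed directly from a partition. The sketch below uses the standard definitions, with R-squared as the between-cluster share of the total sum of squares and the pseudo F statistic as the ratio of between-cluster to within-cluster mean squares. The data and the function name `r_squared_and_pseudo_f` are hypothetical, and the code is not the authors' implementation.

```python
def r_squared_and_pseudo_f(points, labels):
    """Return (R-squared, pseudo F) for a partition with at least two clusters."""
    n = len(points)
    dim = len(points[0])
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    k = len(groups)

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sum_of_squares(rows, center):
        return sum((r[d] - center[d]) ** 2 for r in rows for d in range(dim))

    total_ss = sum_of_squares(points, mean(points))
    within_ss = sum(sum_of_squares(rows, mean(rows)) for rows in groups.values())
    between_ss = total_ss - within_ss

    r_squared = between_ss / total_ss
    pseudo_f = (between_ss / (k - 1)) / (within_ss / (n - k))
    return r_squared, pseudo_f


# Hypothetical one-dimensional data split into two clusters.
points = [(0.0,), (2.0,), (10.0,), (12.0,)]
labels = [0, 0, 1, 1]
r2, pseudo_f = r_squared_and_pseudo_f(points, labels)
print(round(r2, 4), pseudo_f)
```

Evaluating such indices for each candidate number of clusters, and looking for plateaus or local peaks, is one common way to choose a cut level.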

Contribution of each variable to the first four dimensions of the MCA.

Definition of abbreviations: MCA = multiple correspondence analysis; COPD = chronic obstructive pulmonary disease; ICU = intensive care unit; SOFA = Sequential Organ Failure Assessment score. The contribution is expressed as a percentage. For each dimension, only the 20 variables (of the 63 used) that contribute most are shown, in descending order of contribution.

Point cloud of patients representing the 6 clusters in the first four dimensions of the MCA.

Definition of abbreviations: MCA = multiple correspondence analysis; HC = hierarchical clustering. Each patient is represented by his or her individual coordinates in these dimensions. Each patient’s cluster assignment was obtained by HC. A: Axes correspond to the first and second dimensions of the MCA. B: Axes correspond to the third and fourth dimensions of the MCA.

Probability for variables for belonging to one cluster assessed using univariable logistic regression.

A: Figure of cluster 1; B: Figure of cluster 2; C: Figure of cluster 3; D: Figure of cluster 4; E: Figure of cluster 5; F: Figure of cluster 6; Definition of abbreviations: COPD = chronic obstructive pulmonary disease; HIV = human immunodeficiency virus; AIDS = acquired immune deficiency syndrome; Leucopenia was defined by a leukocyte count below 500/mm3; Hyperlactatemia was defined by a blood lactate concentration greater than 2 mmol/l; Median of SOFA score by organ was used to define organ dysfunction in binary variable. Associations with leucopenia and bronchial could not be calculated due to insufficient numbers in this cluster. (DOCX) Click here for additional data file.

Risks of mortality at 28 days, 90 days and one-year according to the cluster assignment in patients with septic shock in the training set.

Definition of abbreviations: OR = odds ratio. The analysis was performed after exclusion of patients without septic shock. Logistic regression was used to determine the odds ratios. Data are reported as odds ratios with 95% confidence intervals, presented from lowest to highest. Cluster 3 was used as the reference class. Adjusted risks of mortality were adjusted for SOFA score at admission and year of ICU admission. Cluster 1 = young patients without any comorbidities, admitted in ICU for community-acquired pneumonia; Cluster 2 = young patients without any comorbidities, admitted in ICU for meningitis or encephalitis; Cluster 3 = elderly patients with COPD, admitted in ICU for bronchial infection with few organ failures; Cluster 4 = elderly patients with several comorbidities and organ failures; Cluster 5 = patients admitted after surgery with a nosocomial infection; Cluster 6 = young patients with immunosuppressive disease or therapy, such as AIDS, chronic steroid therapy or hematological malignancy.
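For a single binary predictor, the unadjusted odds ratio from logistic regression equals the cross-product ratio of the corresponding 2×2 table, so an odds ratio against a reference cluster can be sketched without fitting a model. The counts below are hypothetical, the confidence interval uses the usual Wald approximation on the log scale, and the adjustment for SOFA score and year of admission described in the caption is not reproduced.

```python
import math


def odds_ratio_ci(deaths, survivors, ref_deaths, ref_survivors, z=1.96):
    """Unadjusted odds ratio of death versus a reference cluster, with Wald CI."""
    odds_ratio = (deaths * ref_survivors) / (survivors * ref_deaths)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / deaths + 1 / survivors + 1 / ref_deaths + 1 / ref_survivors)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: one cluster versus the reference cluster.
odds_ratio, lower, upper = odds_ratio_ci(deaths=30, survivors=70, ref_deaths=10, ref_survivors=90)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```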

Binary tree using variables available at the ICU admission.

Definition of abbreviations: ICU = intensive care unit. The binary tree was built in the training set using Breiman's methods [22] with the rpart package version 4.1–10 [23], R version 3.1.0. The structure is similar to a real tree, from the bottom up: there is a root, where the first split happens. After each split, two new nodes are created. Each node contains only a subset of the patients, with a minimum size of 20. The partitions of the data that are not split any further are called terminal nodes or leaves. The second stage of the procedure consists of pruning the tree using cross-validation. Pruning shortens the tree, which makes it more compact and avoids overfitting to the training data: each split is examined to check whether it brings a reliable improvement. The six variables used by the binary tree are lung infection, surgical admission, hematological malignancy, bronchial infection, chronic heart failure and meningeal infection. The accuracy of the binary tree evaluated in the training dataset is available in S3 Table. Cluster 1 = young patients without any comorbidities, admitted in ICU for community-acquired pneumonia; Cluster 2 = young patients without any comorbidities, admitted in ICU for meningitis or encephalitis; Cluster 3 = elderly patients with COPD, admitted in ICU for bronchial infection with few organ failures; Cluster 4 = elderly patients with several comorbidities and organ failures; Cluster 5 = patients admitted after surgery with a nosocomial infection; Cluster 6 = young patients with immunosuppressive disease or therapy, such as AIDS, chronic steroid therapy or hematological malignancy.
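How a single split of such a tree is chosen can be sketched as follows. The example assumes the Gini impurity as splitting criterion, a common choice in CART-style trees, and uses six hypothetical patients with two binary admission variables; the function names and the `min_node_size` parameter are illustrative and do not reproduce the rpart settings used by the authors.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_binary_split(rows, labels, min_node_size=1):
    """Return (feature, impurity_decrease) for the best yes/no split, or None."""
    n = len(rows)
    parent_impurity = gini(labels)
    best = None
    for feature in rows[0]:
        yes = [lab for row, lab in zip(rows, labels) if row[feature]]
        no = [lab for row, lab in zip(rows, labels) if not row[feature]]
        # Skip splits that would create a node smaller than the minimum size.
        if len(yes) < min_node_size or len(no) < min_node_size:
            continue
        weighted = (len(yes) * gini(yes) + len(no) * gini(no)) / n
        decrease = parent_impurity - weighted
        if best is None or decrease > best[1]:
            best = (feature, decrease)
    return best


# Hypothetical patients; labels are cluster numbers.
rows = [
    {"lung_infection": True, "surgical_admission": False},
    {"lung_infection": True, "surgical_admission": False},
    {"lung_infection": True, "surgical_admission": True},
    {"lung_infection": False, "surgical_admission": True},
    {"lung_infection": False, "surgical_admission": True},
    {"lung_infection": False, "surgical_admission": False},
]
labels = [1, 1, 1, 5, 5, 4]
best = best_binary_split(rows, labels)
print(best)
```

A full tree repeats this search recursively in each child node and then prunes the splits that do not improve cross-validated error.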

Representation of indices used to determine the dendrogram cut level (performed in validation set).

A: The semi-partial R-squared gives the decrease in the proportion of variance accounted for that results from joining two clusters. B: The R-squared is the proportion of variance accounted for by the clusters. C: The pseudo F statistic measures the separation among all clusters at the current level. D: The pseudo t² statistic measures the separation between the two clusters most recently joined. There is no consensus in the literature on the final choice of the number of clusters, regardless of the clustering method. According to the different criteria (semi-partial R-squared = 3 to 6 clusters, R-squared = above 10 clusters, pseudo F statistic = 2 to 7 clusters, pseudo t² statistic = 3 or 6 clusters), six clusters is the best trade-off between goodness-of-fit criteria and clinical interpretation.

Contribution of each variable to the first four dimensions of the MCA (performed in validation set).

The 63 variables included in the cluster analysis, distribution and missing values.

Definition of abbreviations: COPD = chronic obstructive pulmonary disease; HIV = human immunodeficiency virus; AIDS = acquired immune deficiency syndrome; NF GNB: Non-fermentative Gram negative bacilli; MDRO: Multi-drug resistance organisms (refer to vancomycin-resistant enterococci, methicillin-resistant Staphylococcus aureus, extended-spectrum β-lactamase-producing Enterobacteriaceae, AmpC-producing Enterobacteriaceae, Pseudomonas aeruginosa resistant to more than two antimicrobial families, Stenotrophomonas maltophilia); Values in Numbers (%) or median [IQR]. (DOCX) Click here for additional data file.

Outcomes of patients (performed in training set).

Definition of abbreviations: ICU = intensive care unit; IQR = interquartile range. Free days were censored at 28 days. A ventilator-free day refers to a day without invasive or non-invasive mechanical ventilation or death; a catecholamine-free day refers to a day without vasoactive or inotropic agents or death; an organ system failure-free day refers to a day without a SOFA score above zero or death. Cluster 1 = young patients without any comorbidities, admitted in ICU for community-acquired pneumonia; Cluster 2 = young patients without any comorbidities, admitted in ICU for meningitis or encephalitis; Cluster 3 = elderly patients with COPD admitted in ICU for bronchial infection with few organ failures; Cluster 4 = elderly patients, with several comorbidities and organ failures; Cluster 5 = patients admitted after surgery with a nosocomial infection; Cluster 6 = young patients with immunosuppressive disease or therapy, such as AIDS, chronic steroid therapy or hematological malignancy. Values are numbers (%) or median [IQR]. P-values were obtained by analysis of variance or chi-squared test.

Accuracy of the binary tree (performed in training set).

Definition of abbreviations: Truly assigned = Number of patients correctly assigned as belonging to the cluster according to the total number of patients in the cluster; Falsely assigned = Number of patients incorrectly labeled as belonging to the cluster according to the total number of patients out of the cluster; Se = Sensitivity; Sp = Specificity; AUC = Area under the Receiver Operating Characteristic curve; IC95% = 95% confidence intervals; Cluster 1 = young patients, without any comorbidities, admitted in ICU for community-acquired pneumonia; Cluster 2 = young patients, without any comorbidities, admitted in ICU for meningitis or encephalitis; Cluster 3 = elderly patients with COPD, admitted in ICU for bronchial infection with few organ failures; Cluster 4 = elderly patients, with several comorbidities and organ failures; Cluster 5 = patients admitted after surgery with a nosocomial infection; Cluster 6 = Young patients, with immunosuppressive disease or therapy, such as AIDS, chronic steroid therapy or hematological malignancy. (DOCX) Click here for additional data file.
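The quantities defined above can be computed per cluster by treating each cluster in turn as the positive class. The sketch below uses hypothetical cluster labels and an illustrative function name; "truly assigned" then coincides with sensitivity and "falsely assigned" with one minus specificity.

```python
def one_vs_rest_accuracy(true_clusters, predicted_clusters, cluster):
    """Sensitivity and specificity of the assignment to one cluster versus the rest."""
    pairs = list(zip(true_clusters, predicted_clusters))
    tp = sum(1 for t, p in pairs if t == cluster and p == cluster)
    fn = sum(1 for t, p in pairs if t == cluster and p != cluster)
    fp = sum(1 for t, p in pairs if t != cluster and p == cluster)
    tn = sum(1 for t, p in pairs if t != cluster and p != cluster)
    return {
        "truly_assigned": tp / (tp + fn),    # same as sensitivity
        "falsely_assigned": fp / (fp + tn),  # same as 1 - specificity
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical labels: clusters from the cluster analysis versus the binary tree.
true_clusters = [1, 1, 1, 2, 2, 3, 3, 3]
tree_clusters = [1, 1, 2, 2, 2, 3, 1, 3]
result = one_vs_rest_accuracy(true_clusters, tree_clusters, cluster=1)
print(result)
```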

Characteristics of patients set after assignment in cluster by the cluster analysis (performed in validation set).

Accuracy of the binary tree (performed in validation set).

Characteristics of patients after excluding the oldest data (i.e., admission before 2008) (performed in training set).

Number of patients included and median of admission in each center.


Dataset.

8 Nov 2020

Submitted filename: Reviewers.docx

19 Jan 2021

PONE-D-20-35126
Clinical and biological clusters of sepsis patients using hierarchical clustering.
PLOS ONE

Dear Dr. Papin,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Feb 18th. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results.
Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols We look forward to receiving your revised manuscript. Kind regards, Martina Crivellari Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. Thank you for including your ethics statement: "According to the French law, the study was approved by institutional review board and hospital ethics committee, which waived the need for informed consent of patients included in the database." a. Please amend your current ethics statement to include the full name of the ethics committee/institutional review board(s) that approved your specific study. b. Once you have amended this/these statement(s) in the Methods section of the manuscript, please add the same text to the “Ethics Statement” field of the submission form (via “Edit Submission”). For additional information about PLOS ONE ethical requirements for human subjects research, please refer to http://journals.plos.org/plosone/s/submission-guidelines#loc-human-subjects-research. 3. 
In the ethics statement in the manuscript and in the online submission form, please provide additional information about the patient records/samples used in your retrospective study, including: a) whether all data were fully anonymized before you accessed them; b) the date range (month and year) during which patients' medical records/samples were accessed; c) the date range (month and year) during which patients whose medical records/samples were selected for this study sought treatment. 4. One of the noted authors is a group or consortium [OUTCOMEREA network]. In addition to naming the author group, please list the individual authors and affiliations within this group in the acknowledgments section of your manuscript. Please also indicate clearly a lead author for this group along with a contact email address. Additional Editor Comments: In general, the paper is of interest, is presented in standard English, but there are many concerns that have to be addressed. The paper is complex, there are data not always clear. The statistical analysis has not been performed appropriately and rigorously. The authors conclude that "[c]onsidering these clusters may reduce the uncontrolled differences in patients’ prognosis and improve the power of studies." It is not clear how the proposed clustering would help in clinical practice, authors should please clarify this point in the manuscript. There are no conflicts between the reviews. In my personal opinion even if this is an interesting paper, there are major questions to answer, and I was in doubt with the really possibility to make it achievable. I'm asking you a complete major revision with a specific attention to the statistical analysis. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? 
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Partly Reviewer #2: Partly ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: No Reviewer #2: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: No ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. 
(Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: In "Clinical and biological clusters of sepsis patients using hierarchical clustering." Papin et al. report on an analysis based on data from 6,046 sepsis/septic shock patients from the French national prospective multicenter ICU cohort OUTCOMEREA. Essentially, the authors performed a hierarchical clustering (HC) on a random sample of 4,050 patients and derived six clusters which differed in early or late mortality (even after adjustment on severity of organ dysfunctions (SOFA) and year of ICU admission). The authors conclude that "[c]onsidering these clusters may reduce the uncontrolled differences in patients’ prognosis and improve the power of studies." Addressing the heterogeneity of sepsis patients to contribute e.g. to more targeted randomized controlled trials, is very important. The authors use retrospective data from a registry covering the period between 1997-2015. However, even after extensive reviewing, several major questions remain that have to be addressed. First of all the paper lacks clarity with regard to the steps that have been conducted. In this regard figure S1 and S2 should be put into the main text. Afterwards, it has to be clear in all figures and tables if the training or the validation data set is displayed. How many variables (overall) entered the HC – this number is important and is missing? If the HC is similarly conducted in the validation data set (independent of the training set) what cluster structure is derived and how similar is it to the six clusters? The focus should generally be on the validation data set! As this is the more critical step and of course generalizability to other ICU data sets is the ultimate step. Finally, and related to this point, the authors should extend the comparison to Seymour et al. 
(2019) and more systematically describe if and how their French results differ from those of the US electronic medical record analysis. Other major points: Page 8 (line 159-): What statistics are used for decisions and it is not clear why and how k-means clustering is included. Table 1: Can be put to the Supplement – why did you categorize quantitative variables (which is not recommended!) Page 10 (line 176-): These are time to event analyses (that can address censoring); thus, I hope that the mortality estimates are derived from the Kaplan Meier estimator (on page 11; line 206-); similarly I assume that Cox regression models were used for the analyses with adjustment? Page 10 (line 187): Accuracy in the training set is of very little use – report this information in the validation set, please. Page 10 (line 188-): Especially the red new text is not clear; in particular what is meant by “new”? Page 11 (line 210-): What is meant by “The clusters supported 12% of the total variability.”? Page 20 (line 333-): Is also not clear (I refer to the other points above). Page 21 (line 358-): This recommendation should be attenuated and written more carefully. Page 21 (line 364-): Please shorten the limitations part and structure the arguments better Page 22 (line 391-): The argument regarding “selection bias” is not clear. Page 23 (line 403-): “Second, …” where is first? Minor points: Page 5: The authors should mention the period of data collection already; moreover did the 20 ICUs contribute homogeneously along these 18 years? 
Please provide information on the ICUs their focus and temporal contribution Page 7 (line 141-): Please provide a little more information on the reported coefficients Page 7 (line 146-): Please provide more information on the missing value handling including the imputation and most importantly the impact that the imputation has on the conclusions Page 23 (line 414): better use prognosis instead of “prognostication” throughout Reviewer #2: Dear editor, thank you very much for inviting me to review this paper. The paper is a complex manuscript about the necessity to divide sepsis patients in cluster based on disease characteristics. Authors individuated six different clusters of patients analyzing data from a multicenter prospective database. Even though I lack the expertise to judge the details of the methodology, the chosen approach to explain data is not clear and the deep use of figures and tables makes all the results incomprehensible. I would suggest explaining the results, figures, and tables in text to allow a better understanding. Indeed, it is unclear what kind of information this study would add to daily clinical practice. It is clearly evident how sepsis and septic shock are conditions deeply related to clinical characteristics of the host, microbes involved and concomitant diseases, but it is not clear how the proposed clustering would help in clinical practice, I would suggest to better clarify this point in paper. Language is correct and all the bias and limits of the study are well indicated in manuscript. ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. 
Reviewer #1: Yes: André Scherag Reviewer #2: Yes: Gaetano Lombardi [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. 30 Apr 2021 Editor Comments: In general, the paper is of interest, is presented in standard English, but there are many concerns that have to be addressed. The paper is complex, there are data not always clear. The statistical analysis has not been performed appropriately and rigorously. The authors conclude that "[c]onsidering these clusters may reduce the uncontrolled differences in patients’ prognosis and improve the power of studies." It is not clear how the proposed clustering would help in clinical practice, authors should please clarify this point in the manuscript. There are no conflicts between the reviews. In my personal opinion even if this is an interesting paper, there are major questions to answer, and I was in doubt with the really possibility to make it achievable. I'm asking you a complete major revision with a specific attention to the statistical analysis. Dear editor We tried to solve the methodological points and to make the reading easier. 
Indeed the new reviewers asked for more details and it is of course difficult to be easily accessible to common readers and to provide details to specialists in statistics. We tried to provide details in ESM and to keep the manuscript understandable by common medical readers.

Best regards
Prof JF Timsit MD PhD

Reviewer #1: In "Clinical and biological clusters of sepsis patients using hierarchical clustering." Papin et al. report on an analysis based on data from 6,046 sepsis/septic shock patients from the French national prospective multicenter ICU cohort OUTCOMEREA. Essentially, the authors performed a hierarchical clustering (HC) on a random sample of 4,050 patients and derived six clusters which differed in early or late mortality (even after adjustment on severity of organ dysfunctions (SOFA) and year of ICU admission). The authors conclude that "[c]onsidering these clusters may reduce the uncontrolled differences in patients’ prognosis and improve the power of studies." Addressing the heterogeneity of sepsis patients to contribute e.g. to more targeted randomized controlled trials, is very important. The authors use retrospective data from a registry covering the period between 1997-2015. However, even after extensive reviewing, several major questions remain that have to be addressed.

#R1.1: First of all the paper lacks clarity with regard to the steps that have been conducted. In this regard figure S1 and S2 should be put into the main text.

R.R1.1: We agree. To make the study design and the methods easier to understand, we added figure S1 (Schematic of study) to the main text. However, as Figure S2 gives no crucial information, we think that it is better to leave it in the supplementary material.

#R1.2: Afterwards, it has to be clear in all figures and tables if the training or the validation data set is displayed.
R.R1.2: We agree, to improve understanding, all titles of figures or tables were completed by mention: “(performed in training set)” or “(performed in validation set)”. #R1.3: How many variables (overall) entered the HC – this number is important and is missing? R.R1.3: All 63 variables presented in Table 1 were included in the cluster analysis. To account for further comment, the Table 1 is now moved in appendix as S1 Table, as suggested in #R.R1.7 and missing values previously presented separately in the appendix were added to the S1 Table. Moreover, to account for this comment understanding, the title of this table was modified as follow: ”S1 Table: Description of the 63 variables included in the cluster analysis”. Following sentences were added line 148: “Description of 63 variables included in the cluster analysis was available S1 Table” and the line 177: ”A MCA was performed to reduce the dimension of the 63 variables of the dataset (S1 Table) in “Euclidian patient-coordinates” dataset.”. #R1.4: If the HC is similarly conducted in the validation data set (independent of the training set) what cluster structure is derived and how similar is it to the six clusters? R.R1.4: We agree, more information on the clusters building in the independent validation set is needed. Regarding the similarities and differences between clusters identified on the training set and on the validation set, we have already provided distribution of all characteristics of cluster in validation set in S4 Table. As suggested in #R1.9 , the main analysis for assess similarity and reproducibility is given in the new S5 Table. Following information was added in the manuscript:” Six clusters were identified in the validation set according to the indices used to select the number of clusters (S9 Fig.). The contributions of variables in the construction of the first four dimensions of the MCA are depicted on S10 Fig. 
S4 Table described the distribution of demographic characteristics, various comorbidities, sources of infection, micro-organisms, clinical and biological data, and the organ failures at admission.”. Following information were added in the appendix: ”S9 Fig : Representation of indices used to determine the dendrogram cut level (performed in validation set).”. #R1.5: The focus should generally be on the validation data set! As this is the more critical step and of course generalizability to other ICU data sets is the ultimate step. Finally, and related to this point, the authors should extend the comparison to Seymour et al. (2019) and more systematically describe if and how their French results differ from those of the US electronic medical record analysis. R.R1.5: We agree that external validation cohort is critical; it will be an upcoming study for our team. However, some external validation should be performed by using dataset with common collected variables, that is not the case. This information was available in the discussion section as follow: Second, the training set and the validation set come from the same database. A validation with an external database with common variables is needed.” Other major points: #R1.6: Page 8 (line 159-): What statistics are used for decisions and it is not clear why and how k-means clustering is included. R.R1.6: At the best of our knowledge, there was no unique method or indicator neither consensus in the literature to define the optimal number of clusters. Here, we chose a trade-off between the four criteria used (Semi-partial R-squared, R-squared, Pseudo F statistic and pseudo t² statistics presented in S4 Fig.) and the clinical relevance of clusters (number of individuals by cluster and identity of the clusters). Following sentence was added in the method section: “There is no consensus in the literature on the final choice of the number of clusters regardless of the clustering method”. 
K-means algorithm was considered to identify potential outliers in the data. This information was added in the methods section.

#R1.7: Table 1: Can be put to the Supplement – why did you categorize quantitative variables (which is not recommended!)

R.R1.7: We agree, the Table 1 was moved in appendix as S1 Table. Moreover, concerning the categorization of quantitative variables, as we used MCA, we need to have qualitative variables only, as mix type of variables cannot be considered here. Following sentence was added in the method section: “because MCA is based on qualitative variables, quantitative variables were categorized”.

#R1.8: Page 10 (line 176-): These are time to event analyses (that can address censoring); thus, I hope that the mortality estimates are derived from the Kaplan Meier estimator (on page 11; line 206-); similarly I assume that Cox regression models were used for the analyses with adjustment?

R.R1.8: As specified (on page 11, line 206), it was crude mortality rates. As suggested, we replaced it with the estimated mortality by Kaplan Meier. Risks of mortality were estimated using Cox model. This information was added in the method section and in footnote as follow: “A cox model was used to determine the hazard ratio. Data are reported as HR ±95% confidence intervals, presented from the lowest to highest”

#R1.9: Page 10 (line 187): Accuracy in the training set is of very little use – report this information in the validation set, please.

R.R1.9: We agree, accuracy in the validation set is now provided in S5 Table. Accuracy in the training set remains available in S4 Table.

#R1.10: Page 10 (line 188-): Especially the red new text is not clear; in particular what is meant by “new”?

R.R1.10: The term “new” has been deleted and methods section was modified as follow: “Cluster analysis was also applied in the validation set.”

#R1.11: Page 11 (line 210-): What is meant by “The clusters supported 12% of the total variability.”?
R.R1.11: The term “variability” refers to the observed differences between all individuals of the dataset. Due to the poor information provided, this sentence was deleted.

#R1.12: Page 20 (line 333-): Is also not clear (I refer to the other points above).

R.R1.12: As specified in #R.R1.4, sentence was modified as follow: “Six clusters were identified in the validation set according to the indices used to select the number of clusters (S9 Fig.). The contributions of variables in the construction of the first four dimensions of the MCA are depicted on S10 Fig. S4 Table described the distribution of demographic characteristics, various comorbidities, sources of infection, micro-organisms, clinical and biological data, and the organ failures at admission.”.

#R1.13: Page 21 (line 358-): This recommendation should be attenuated and written more carefully.

R.R1.13: The paragraph was modified as follow: “We identified a small cluster that included patients with COPD exacerbation. These patients met to the definition of sepsis 3.0, yet mechanisms of organ failure were probably very different from other cluster. The inclusion of these patients in sepsis studies should be discussed.”

#R1.14: Page 21 (line 364-): Please shorten the limitations part and structure the arguments better

R.R1.14: We agree, limitations section was shortened and better stratified (#R.R1.16).

#R1.15: Page 22 (line 391-): The argument regarding “selection bias” is not clear.

R.R1.15: As specified in #R.R1.14, the sentence was deleted.

#R1.16: Page 23 (line 403-): “Second, …” where is first?

R.R1.16: We agree, as specified in #R.R1.14.

Minor points:

#R1.17: Page 5: The authors should mention the period of data collection already; moreover did the 20 ICUs contribute homogeneously along these 18 years? Please provide information on the ICUs their focus and temporal contribution

R.R1.17: Period of data collection was already mentioned line 205 and discussed line 358.
Moreover, the 20 ICUs contributed heterogeneously. The S7 table was added in the appendix and the following sentence was modified in the discussion section as follow: “Third, the period of patient’s admission on our study ranged from 1997 to 2015 with heterogeneous periods of inclusion and number of patients included between each center (S7 Table)”.

#R1.18: Page 7 (line 141-): Please provide a little more information on the reported coefficients

R.R1.18: The term « κ -coefficients » was modified by Cohen's kappa coefficient.

#R1.19: Page 7 (line 146-): Please provide more information on the missing value handling including the imputation and most importantly the impact that the imputation has on the conclusions

R.R1.16: As specified in #R.R1.3, missing values was presented to the S1 Table.

#R1.20: Page 23 (line 414): better use prognosis instead of “prognostication” throughout

R.R1.20: The term “prognostication” was replaced by “prognostic estimation”.

Reviewer #2: Dear editor, thank you very much for inviting me to review this paper. The paper is a complex manuscript about the necessity to divide sepsis patients in cluster based on disease characteristics. Authors individuated six different clusters of patients analyzing data from a multicenter prospective database.

#R2.1: Even though I lack the expertise to judge the details of the methodology, the chosen approach to explain data is not clear and the deep use of figures and tables makes all the results incomprehensible. I would suggest explaining the results, figures, and tables in text to allow a better understanding.

R.R2.1: The unsupervised clustering we used is only devoted to reduce heterogeneity between individuals by emphasizing resemblance at the ICU admission. The originality of this technique is that it does not consider any specific outcome.
It is like sorting people in a crowd based on whether they are tall or not, because this specification is the best way to dichotomize this crowd into two more homogeneous samples. Interestingly, in our study, these specific clusters, defined based on patient’s characteristics at admission, were associated with different susceptibility to therapy but also different health care consumption, duration of stay and mortality.

#R2.2: Indeed, it is unclear what kind of information this study would add to daily clinical practice. It is clearly evident how sepsis and septic shock are conditions deeply related to clinical characteristics of the host, microbes involved and concomitant diseases, but it is not clear how the proposed clustering would help in clinical practice, I would suggest to better clarify this point in paper.

R.R2.2: The six identified clusters reduced the heterogeneity of the syndrome and allowed a better discrimination of patients with sepsis at ICU admission based on characteristics available at admission. These clusters are characterized by distinct pathophysiological signatures, and different mortality risks and resources consumption. Interestingly, as already highlighted in the manuscript, the severity of illness and of organ dysfunctions are not the main sources of heterogeneity in septic population. The following sentence was added in the discussion section line 358:” The ability to affect a patient with sepsis to a homogeneous cluster should enable to achieve a personalized ICU medical care strategy, and to test potentially appropriate new therapies by performing more efficiently targeted clinical trials. The simplest way to do so is to use a classification tree.”

#R2.3: Language is correct and all the bias and limits of the study are well indicated in manuscript.

Submitted filename: Response to reviewers.docx
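The distinction raised in #R1.8 between crude mortality rates and Kaplan Meier estimates can be made concrete with a minimal sketch of the product-limit estimator. The follow-up times and event flags below are hypothetical, not study data, and the helper names `kaplan_meier` and `mortality_at` are illustrative only.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time of each patient (days)
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # deaths and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def mortality_at(curve, day):
    """Estimated cumulative mortality at `day` (1 minus the survival estimate)."""
    survival = 1.0
    for t, s in curve:
        if t <= day:
            survival = s
    return 1 - survival


# Hypothetical follow-up of six patients; one is censored at day 10.
times = [5, 10, 10, 20, 28, 28]
events = [1, 0, 1, 1, 0, 0]

curve = kaplan_meier(times, events)
crude_mortality = sum(events) / len(events)
km_mortality_day28 = mortality_at(curve, 28)
```

With these toy data the crude rate is 3/6, while the Kaplan Meier estimate at day 28 is higher because the patient censored at day 10 stops contributing to the risk set after that day. This is the kind of difference the reviewer's comment refers to.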
14 May 2021

PONE-D-20-35126R1

Clinical and biological clusters of sepsis patients using hierarchical clustering.

PLOS ONE

Dear Dr. Papin,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

ACADEMIC EDITOR: Dear authors, thanks for your answers. The aim of the study is clearly to identify clusters of patients based on characteristic available at patients’ admission. My concerns is about clinical involvements. If your aim is to propose a different clinical strategy, you should better clarify this point. It is clear that different clusters have different clinical outcomes, health care consumption, duration of stay and mortality; anyway, it is not clear why an ICU physician should use this classification to plan clinical strategy. Thanks for the sentence you added to line 358, but I would specify that further studies are necessary to better explain this clustering.

Please submit your revised manuscript by Jun 3rd. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Martina Crivellari
Academic Editor
PLOS ONE

Journal Requirements: Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site.
Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

20 May 2021

ACADEMIC EDITOR: Dear authors, Thanks for your answers.

#AE1: The aim of the study is clearly to identify clusters of patients based on characteristic available at patients’ admission. My concerns is about clinical involvements. If your aim is to propose a different clinical strategy, you should better clarify this point.

R.AE1: We agree, the aim of the study is clearly to identify clusters of patients based on characteristic available at patients’ admission. This sentence was added in the abstract (line 39) as follows: “The aim of the study is to identify clusters of patients based on clinical and biological characteristic available at patients’ admission.” and in the methods section (line 99) as follows: “The primary objective of our study was to identify clusters among patients with sepsis by considering data available at admission including: underlying disorders, source of infection, micro-organism, biological host response and organ failures.”.

#AE2: It is clear that different clusters have different clinical outcomes, health care consumption, duration of stay and mortality; anyway, it is not clear why an ICU physician should use this classification to plan clinical strategy.
R.AE2: To clarify the interest of the clustering of sepsis patient’s the sentence of the introduction section (line 77) was reformulated as follows: “As explained by J. Castela Forte, to personalize and improve treatments of sepsis, patients must be clustered into common phenotypes based on clinically objective parameters reflecting disease mechanisms. After external validation step, these clusters could be grouped by underlying causal mechanisms and will improve patient characterization; optimized design and powering of randomized control trials. Finally, these clusters can allow identifying differential response patterns by considering baseline characteristics of sepsis patients.”

#AE3: Thanks for the sentence you added to line 358, but I would specify that further studies are necessary to better explain this clustering.

R.AE3: We agree, sentence was modified as follow: ”First, these phenotypes will reduce heterogeneity prior to randomization; however, because they will not necessarily create better trials, further studies are necessary to better explain this clustering”.

Submitted filename: Response to Editor.docx

24 May 2021

Clinical and biological clusters of sepsis patients using hierarchical clustering.

PONE-D-20-35126R2

Dear Dr. Papin,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance.
To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Martina Crivellari
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

23 Jul 2021

PONE-D-20-35126R2

Clinical and biological clusters of sepsis patients using hierarchical clustering.

Dear Dr. Papin:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Martina Crivellari
Academic Editor
PLOS ONE

Table 2

Source of infection and micro-organism by cluster (performed in training set).

Variable	Cluster 1 n = 1,603	Cluster 2 n = 149	Cluster 3 n = 243	Cluster 4 n = 1,094	Cluster 5 n = 623	Cluster 6 n = 338
Source of infection
Pulmonary	1391 (87%)	26 (17%)	1 (0%)	431 (39%)	53 (9%)	171 (51%)
Bronchial	0 (0%)	0 (0%)	237 (98%)	2 (0%)	1 (0%)	0 (0%)
Urinary tract	43 (3%)	1 (1%)	0 (0%)	351 (32%)	16 (3%)	16 (5%)
Surgical abdomen	3 (0%)	0 (0%)	1 (0%)	14 (1%)	419 (67%)	10 (3%)
Medical abdomen	32 (2%)	0 (0%)	0 (0%)	62 (6%)	16 (3%)	38 (11%)
Soft tissues	15 (1%)	1 (1%)	2 (1%)	75 (7%)	66 (11%)	13 (4%)
Meningeal encephalitis	0 (0%)	149 (100%)	0 (0%)	0 (0%)	0 (0%)	1 (0%)
Miscellaneous sites	44 (3%)	8 (5%)	2 (1%)	88 (8%)	49 (8%)	26 (8%)
Unknown site	68 (4%)	1 (1%)	2 (1%)	123 (11%)	13 (2%)	68 (20%)
Infection micro-organisms
Escherichia coli	58 (4%)	2 (1%)	3 (1%)	352 (32%)	122 (20%)	60 (18%)
Other Enterobacteriaceae	99 (6%)	1 (1%)	3 (1%)	208 (19%)	78 (13%)	24 (7%)
Pseudomonas spp. and other NF GNB	86 (5%)	1 (1%)	8 (3%)	97 (9%)	46 (7%)	52 (15%)
Streptococcus pneumoniae	259 (16%)	36 (24%)	3 (1%)	39 (4%)	3 (0%)	11 (3%)
Enterococcus and Streptococcus	70 (4%)	5 (3%)	9 (4%)	131 (12%)	149 (24%)	25 (7%)
Staphylococcus aureus	153 (10%)	9 (6%)	1 (0%)	146 (13%)	45 (7%)	17 (5%)
Fungus	18 (1%)	1 (1%)	0 (0%)	27 (2%)	35 (6%)	19 (6%)
Virus	54 (3%)	18 (12%)	2 (1%)	21 (2%)	0 (0%)	19 (6%)
Other pathogens	341 (21%)	51 (34%)	21 (9%)	140 (13%)	116 (19%)	69 (20%)
Unknown pathogen	710 (44%)	49 (33%)	198 (81%)	221 (20%)	122 (20%)	60 (18%)
Bacteremia	142 (9%)	29 (19%)	3 (1%)	320 (29%)	97 (16%)	81 (24%)
Nosocomial	478 (30%)	32 (21%)	43 (18%)	369 (34%)	316 (51%)	147 (43%)
MDRO	110 (7%)	8 (5%)	12 (5%)	139 (13%)	70 (11%)	22 (7%)

Definition of abbreviations: NF GNB: non-fermentative Gram-negative bacilli; MDRO: multidrug-resistant organism. Cluster 1 = young patients without any comorbidities, admitted in ICU for community-acquired pneumonia; Cluster 2 = young patients without any comorbidities, admitted in ICU for meningitis or encephalitis; Cluster 3 = elderly patients with COPD, admitted in ICU for bronchial infection with few organ failures; Cluster 4 = elderly patients, with several comorbidities and organ failures; Cluster 5 = patients admitted after surgery with a nosocomial infection; Cluster 6 = young patients with immunosuppressive disease or therapy, such as AIDS, chronic steroid therapy or hematological malignancy. Values are numbers (%) or median [IQR].
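As a reading aid for Table 2, the percentages appear to be column percentages, that is, each count divided by the size of its cluster and rounded to the nearest integer. The sketch below checks this reading against the "Pulmonary" row; the variable and function names are illustrative, and the rounding rule is an assumption inferred from the printed values.

```python
# Cluster sizes from the header of Table 2 (training set).
cluster_sizes = {1: 1603, 2: 149, 3: 243, 4: 1094, 5: 623, 6: 338}

# Counts from the "Pulmonary" row of Table 2.
pulmonary_counts = {1: 1391, 2: 26, 3: 1, 4: 431, 5: 53, 6: 171}


def column_percent(count, cluster_size):
    """Share of a cluster with the characteristic, rounded to a whole percent."""
    return round(100 * count / cluster_size)


pulmonary_percents = {
    cluster: column_percent(pulmonary_counts[cluster], size)
    for cluster, size in cluster_sizes.items()
}
training_set_size = sum(cluster_sizes.values())
```

The recomputed values match the printed row (87%, 17%, 0%, 39%, 9%, 51%), and the six cluster sizes add up to the 4,050 patients of the training set reported in the abstract.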

Table 3

Clinical and biological data at admission by cluster (performed in training set).

Variable	Cluster 1 n = 1,603	Cluster 2 n = 149	Cluster 3 n = 243	Cluster 4 n = 1,094	Cluster 5 n = 623	Cluster 6 n = 338
Myocardial dysfunction	160 (10%)	16 (11%)	33 (14%)	402 (37%)	131 (21%)	43 (13%)
Cardiac arrest before admission	57 (4%)	5 (3%)	7 (3%)	120 (11%)	17 (3%)	9 (3%)
Hyperglycemia (>11 mmol/l)	164 (10%)	22 (15%)	39 (16%)	299 (27%)	82 (13%)	73 (22%)
Hypoglycemia (<3 mmol/l)	13 (1%)	5 (3%)	5 (2%)	85 (8%)	8 (1%)	8 (2%)
Body temperature (°C)	38.3 [37.7–39]	38.9 [37.8–39.5]	37.7 [37.2–38.2]	38 [37.3–38.6]	38.0 [37.4–38.7]	38.8 [38–39.6]
New atrial fibrillation	212 (13%)	20 (13%)	21 (9%)	173 (16%)	123 (20%)	96 (28%)
Recurrent atrial fibrillation	42 (3%)	4 (3%)	11 (5%)	126 (12%)	40 (6%)	10 (3%)
Heart rate (beats/min)	115 [100–130]	115 [97–130]	110 [100–124]	114 [98–135]	118 [100–138]	129 [112–147]
Respiratory rate (breaths/min)	26 [20–33]	23 [20–30]	27 [20–33]	24 [20–31]	20 [16–25]	29 [24–35]
Sodium blood level (mmol/l)	137 [133–141]	136 [132–140]	138 [135–141]	137 [133–142]	137 [133–141]	136 [132–140]
Potassium blood level (mmol/l)	3.9 [3.5–4.4]	3.6 [3.3–4.2]	4.2 [3.8–4.8]	4.2 [3.5–5]	4.2 [3.7–4.8]	3.7 [3.2–4.2]
Bicarbonate blood level (mmol/l)	23 [19–26]	22 [18–25]	27 [22–33]	18.6 [14–23]	19 [15–23]	20 [16–24]
Hematocrit (%)	35 [30–40]	37 [31–41]	40 [34–45]	32 [28–36]	31 [27–36]	25 [22–28]
Prothrombin time (%)	74 [62–85]	74 [63–85]	78 [59–92]	56 [40–71]	59 [47–70]	59 [48–73]
Leukocytes (x10³/mm³)	12.6 [8.4–17.8]	12.5 [7.9–19.9]	12.5 [9.2–17.2]	14.8 [9.6–21.1]	12.8 [7.9–18.9]	0.8 [0.2–4.3]
Fluid replacement >50 ml/kg	242 (15%)	28 (19%)	12 (5%)	253 (23%)	157 (25%)	61 (18%)
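The response to reviewers states that quantitative variables were categorized because MCA handles qualitative variables only. The sketch below illustrates that step for blood glucose, using the two thresholds printed in Table 3 (<3 mmol/l and >11 mmol/l). The label for the middle class, the function names and the sample values are hypothetical, and the exact coding used by the authors is not described in this section.

```python
GLUCOSE_LEVELS = ["hypoglycemia", "neither", "hyperglycemia"]


def categorize_glucose(mmol_per_l):
    """Map a glucose value to a qualitative class using the Table 3 thresholds."""
    if mmol_per_l < 3:
        return "hypoglycemia"
    if mmol_per_l > 11:
        return "hyperglycemia"
    return "neither"  # hypothetical label for values between the two thresholds


def indicator_matrix(categories, levels):
    """Complete disjunctive coding: one 0/1 column per level, one row per patient."""
    return [[1 if category == level else 0 for level in levels] for category in categories]


# Hypothetical glucose values (mmol/l) for three patients.
glucose_values = [2.4, 5.6, 14.0]
glucose_classes = [categorize_glucose(v) for v in glucose_values]
glucose_indicators = indicator_matrix(glucose_classes, GLUCOSE_LEVELS)
```

An indicator table of this kind, built across all categorized variables, is the usual input of a multiple correspondence analysis. Each patient has exactly one 1 per variable.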
