Stanislas Werfel, Georg Lorenz, Bernhard Haller, Roman Günthner, Julia Matschkal, Matthias C Braunisch, Carolin Schaller, Peter Gundel, Stephan Kemmner, Salim S Hayek, Christian Nusshag, Jochen Reiser, Philipp Moog, Uwe Heemann, Christoph Schmaderer.
Abstract
Cohort studies often provide a large array of data on study participants. Statistical learning techniques offer an efficient way to analyze such large datasets and to uncover previously unknown, clinically relevant predictors of morbidity or mortality. We applied a combination of elastic net penalized Cox regression and stability selection with the aim of identifying novel predictors of mortality in a cohort of prevalent hemodialysis patients. Our analysis included 475 patients from the "rISk strAtification in end-stage Renal disease" (ISAR) study, whom we split into derivation and confirmation cohorts. A wide array of examinations was available for study participants, resulting in over a hundred potential predictors. The selection approach retrieved many of the well-established predictors in the derivation cohort. Additionally, the serum levels of IL-12p70 and AST were selected as mortality predictors and confirmed in the withheld subgroup. High IL-12p70 levels were specifically prognostic of infection-related mortality. In summary, we demonstrate how statistical learning can be applied to a cohort study to derive novel hypotheses in a data-driven way. Our results suggest a novel role of IL-12p70 in infection-related mortality, while AST is a promising additional biomarker in patients undergoing hemodialysis.
Year: 2021 PMID: 33927289 PMCID: PMC8085040 DOI: 10.1038/s41598-021-88655-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Patient flow diagram for the ISAR dialysis trial. Patients without an event were mostly (≈ 80%) censored at the time point of last follow-up (4 years after study initiation). Deaths due to other causes also included deaths of unknown cause. Further details of the derivation and confirmation datasets are presented in Table 1. CV, cardiovascular.
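The division into derivation and confirmation datasets can be illustrated with a minimal sketch. It assumes a simple seeded random 2:1 split; the actual splitting procedure is not given in this excerpt, and the function name `split_cohort` and the integer patient IDs are hypothetical.

```python
import random

def split_cohort(patient_ids, derivation_fraction=2 / 3, seed=0):
    """Randomly split patient IDs into derivation and confirmation sets.

    Assumption: plain random split without stratification or matching.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_derivation = round(len(ids) * derivation_fraction)
    return ids[:n_derivation], ids[n_derivation:]

# 475 hypothetical patient IDs, as many as in the analyzed cohort
derivation, confirmation = split_cohort(range(475))
print(len(derivation), len(confirmation))
```

With 475 patients and a two-thirds fraction, the split sizes agree with the 317 and 158 patients reported for the two datasets.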
Table 1. Matching characteristics of the derivation and confirmation datasets.
| | Total | Derivation | Confirmation | p-value |
|---|---|---|---|---|
| n (%total) | 475 (100%) | 317 (67%) | 158 (33%) | |
| Age (year) | 68 [55–77] | 67 [55–77] | 69 [54–77] | 0.72 |
| Sex (%male) | 331 (70%) | 219 (69%) | 112 (71%) | 0.77 |
| Comorbidity index | 3 [1–6] | 3 [2–6] | 4 [1–7] | 0.96 |
| Serum IL-6 (pg/ml) | 9.4 [5.6–16.1] | 9.2 [5.6–15.6] | 10.2 [5.7–16.6] | 0.58 |
| All-cause mortality | 169 (36%) | 112 (35%) | 57 (36%) | 0.95 |
| Cardiovascular mortality | 64 (13%) | 44 (14%) | 20 (13%) | 0.82 |
| Mortality D/T infection | 43 (9%) | 27 (9%) | 16 (10%) | 0.68 |
| Mortality D/T other causes | 62 (13%) | 41 (13%) | 21 (13%) | 1.00 |
| 1 year mortality | 35 (7%) | 24 (8%) | 11 (7%) | 0.96 |
Nominal variables are reported as counts and percentages, ordinal/continuous variables as median and interquartile range. P-values were calculated using a Chi-squared and a Mann–Whitney test respectively. Dialysis-patient adapted comorbidity index was calculated as described by Liu et al.[13].
D/T, due to.
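As a worked example of the Chi-squared comparison for nominal variables, the sketch below recomputes the p-value for sex from the counts in the table. It assumes a 2×2 test with Yates continuity correction; the source only states that a Chi-squared test was used, so the correction is an assumption, and `chi2_2x2_yates` is a hypothetical helper.

```python
import math

def chi2_2x2_yates(a, b, c, d):
    """Chi-squared test with continuity correction for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    chi2 = sum(max(abs(o - e) - 0.5, 0.0) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-squared distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Sex by dataset: 219 of 317 male (derivation), 112 of 158 male (confirmation)
chi2, p = chi2_2x2_yates(219, 317 - 219, 112, 158 - 112)
print(round(chi2, 3), round(p, 2))
```

Under this assumption the result is close to the tabulated p-value of 0.77 for sex.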
Figure 2. Stability paths of the elastic net regression for all-cause (a), cardiovascular (b), and infection-associated mortality (c). Each curve represents either a nominal predictor or a group (top/bottom quintile) of an ordinal/continuous predictor. The colors are coded as indicated. Vertical dashed lines represent the penalty parameter (ln-lambda) chosen by the stability selection; horizontal dashed lines represent the predefined selection threshold of 60%. Variables above this threshold at the selected penalty were considered stable. "High" and "low" indicate that the value falls within the top or bottom quintile of the total study population (see "Methods").
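The idea behind the stability paths can be sketched as follows: the penalized model is refitted on many random subsamples, and a variable counts as stable if it receives a non-zero coefficient in at least 60% of the fits. This is a minimal sketch under loud assumptions: an elastic-net linear regression fitted by coordinate descent stands in for the penalized Cox model, the penalty `lam` and mixing weight `alpha` are fixed by hand rather than chosen by the selection procedure, and the data are synthetic.

```python
import random

def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net(X, y, lam, alpha, n_iter=50):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*|b|^2)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta starting at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            rho = sum(v * r for v, r in zip(col, residual)) / n + z * beta[j]
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                residual = [r - v * delta for r, v in zip(residual, col)]
                beta[j] = new
    return beta

def selection_frequencies(X, y, lam, alpha, n_subsamples=50, seed=0):
    """Fraction of half-sized subsamples in which each predictor is non-zero."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        beta = elastic_net([X[i] for i in idx], [y[i] for i in idx], lam, alpha)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_subsamples for c in counts]

# Synthetic data (hypothetical): predictors 0 and 1 carry signal, 2-5 are noise
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(120)]
y = [3 * row[0] - 2 * row[1] + rng.gauss(0, 0.5) for row in X]

freq = selection_frequencies(X, y, lam=0.6, alpha=0.9)
stable = [j for j, f in enumerate(freq) if f >= 0.6]  # 60% threshold as in the figure
print(freq, stable)
```

Repeating the procedure over a grid of penalties and plotting the selection frequency of each variable against ln-lambda would give curves of the kind shown in the figure.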
Table 2. Summary of the stably selected variables and their confirmation.
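| | Derivation: stable selection, AC | Derivation: stable selection, CV | Derivation: stable selection, INF | Derivation: sign of EN coefficient, AC | Derivation: sign of EN coefficient, CV | Derivation: sign of EN coefficient, INF | Confirmation: all-cause HR | Confirmation: all-cause CI (95%) | Confirmation: all-cause p-value | Confirmation: cardiovascular HR | Confirmation: cardiovascular CI (95%) | Confirmation: cardiovascular p-value | Confirmation: infection-associated HR | Confirmation: infection-associated CI (95%) | Confirmation: infection-associated p-value | n |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|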
| Age (per 10 years) | x | x | x | + | + | + | 1.64 | (1.31–2.06) | < 0.001* | 1.86 | (1.23–2.81) | 0.003* | 1.75 | (1.12–2.73) | 0.014* | 158 |
| Atherosclerosis (non CHD) | x | x | + | + | 2.14 | (1.26–3.61) | 0.005* | 2.70 | (1.01–7.21) | 0.048* | 47 | |||||
| Atrial fibrillation | x | + | 2.67 | (1.58–4.52) | < 0.001* | 42 | ||||||||||
| CVD | x | + | 2.35 | (0.97–5.71) | 0.059# | 62 | ||||||||||
| Dialysis catheter | x | x | x | + | + | + | 3.72 | (1.59–8.72) | 0.002* | 3.47 | (0.80–15.12) | 0.098# | 4.09 | (0.92–18.12) | 0.064# | 8 |
| DM | x | x | + | + | 2.00 | (1.19–3.37) | 0.009* | 1.38 | (0.57–3.35) | 0.470 | 61 | |||||
| H.o. amputation due to PAD | x | x | x | + | + | + | 1.57 | (0.67–3.67) | 0.296 | 2.43 | (0.71–8.35) | 0.158 | 0.90 | (0.12–6.81) | 0.918 | 10 |
| H.o. MI | x | + | 2.49 | (0.97–6.40) | 0.057# | 37 | ||||||||||
| H.o. neoplasia | x | + | 2.22 | (0.90–5.46) | 0.082# | 42 | ||||||||||
| HF | x | + | 2.57 | (1.49–4.41) | 0.001* | 35 | ||||||||||
| HRD (non AF) | x | + | 2.07 | (0.59–7.31) | 0.256 | 19 | ||||||||||
| Oral anticoagulation | x | + | 3.10 | (1.71–5.64) | < 0.001* | 22 | ||||||||||
| Other ac (dialysis) | x | + | 0.00 | (0.00–Inf) | 0.998 | 7 | ||||||||||
| PAD | x | x | + | + | 2.13 | (1.23–3.69) | 0.007* | 2.56 | (0.93–7.05) | 0.069# | 31 | |||||
| Pulmonary hypertension | x | + | 3.58 | (1.19–10.81) | 0.023* | 13 | ||||||||||
| DBP (24 h) low | x | + | 3.41 | (1.18–9.80) | 0.023* | 20 | ||||||||||
| MAP (24 h) low | x | + | 2.22 | (0.71–6.87) | 0.169 | 23 | ||||||||||
| Immunosuppression | x | + | 0.47 | (0.06–3.48) | 0.457 | 17 | ||||||||||
| Kt/V high | x | − | 2.85 | (0.98–8.26) | 0.054 | 25 | ||||||||||
| HB low | x | + | 3.26 | (1.18–9.03) | 0.023* | 29 | ||||||||||
| Serum AST low | x | − | 0.31 | (0.10–0.98) | 0.046* | 23 | ||||||||||
| Serum cholesterol high | x | + | 0.22 | (0.03–1.67) | 0.144 | 24 | ||||||||||
| Serum creatinine high | x | − | 0.44 | (0.18–1.11) | 0.083# | 30 | ||||||||||
| Serum creatinine low | x | x | + | + | 1.34 | (0.77–2.33) | 0.295 | 1.77 | (0.72–4.35) | 0.210 | 44 | |||||
| Serum hsCRP high | x | + | 1.82 | (1.00–3.28) | 0.048* | 31 | ||||||||||
| Serum IFN-gamma high | x | + | 0.95 | (0.27–3.33) | 0.934 | 29 | ||||||||||
| Serum IL-12p70 high | x | + | 3.12 | (1.16–8.41) | 0.024* | 33 | ||||||||||
| Serum IL-13 low | x | + | 0.26 | (0.03–1.95) | 0.190 | 32 | ||||||||||
| Serum IL-6 high | x | x | x | + | + | + | 2.96 | (1.73–5.07) | < 0.001* | 1.98 | (0.75–5.20) | 0.165 | 4.49 | (1.67–12.05) | 0.003* | 36 |
| Serum IP-10 high | x | + | 1.19 | (0.45–3.10) | 0.726 | 38 | ||||||||||
| Serum IP-10 low | x | − | 0.57 | (0.13–2.53) | 0.462 | 30 | ||||||||||
| Serum iron low | x | + | 1.94 | (0.62–6.05) | 0.251 | 26 | ||||||||||
| Serum LDL high | x | + | 0.49 | (0.11–2.13) | 0.343 | 24 | ||||||||||
| Serum phosphate low | x | + | 1.28 | (0.49–3.33) | 0.615 | 36 | ||||||||||
| Serum triglycerides low | x | x | + | + | 2.30 | (1.16–4.56) | 0.018* | 1.20 | (0.28–5.19) | 0.809 | 16 | |||||
| Serum YKL-40 high | x | + | 2.51 | (1.44–4.35) | 0.001* | 32 | ||||||||||
For ordinal variables, "high" and "low" indicate that the value falls within the top or bottom quintile of the total study population (see "Methods").
ac, anticoagulation; AC, all-cause mortality; AF, atrial fibrillation; AST, aspartate transaminase; CHD, coronary heart disease; CI (95%), 95% confidence interval of the hazard ratio; CV, cardiovascular mortality; CVD, cardiovascular disease; D/T, due to; DBP (24 h), average (24 h) diastolic blood pressure; DM, diabetes mellitus; EN, elastic net; ESRD, end-stage renal disease; H.o., history of; HB, hemoglobin; HF, heart failure; HR, hazard ratio; HRD, heart rhythm disorder; INF, infection-associated mortality; MAP (24 h), average (24 h) mean arterial pressure; MI, myocardial infarction; n, number of non-zero observations of the respective predictor in the confirmation group; PAD, peripheral artery disease.
*, p < 0.05; #, two-sided p < 0.1 and same effect direction as in the elastic-net regression derivation dataset for respective all-cause or cause-specific mortality. Effect direction in the derivation dataset is indicated as sign of elastic net coefficient: +, regularized HR > 1; −, regularized HR < 1. Statistics were calculated in the confirmation dataset using univariate Cox regression. Tests were performed in the confirmation dataset only for predictors that passed stability selection for the respective outcome in the derivation dataset.
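The relation between a hazard ratio, its 95% confidence interval and the p-value can be checked with a short calculation. The sketch assumes that the reported intervals are Wald intervals, symmetric on the log scale, and that the p-values come from the corresponding two-sided Wald test; neither is stated in the table, and `wald_from_hr_ci` is a hypothetical helper. Because the inputs are rounded, the result is approximate.

```python
import math

def wald_from_hr_ci(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover the standard error of ln(HR) and a two-sided Wald p-value."""
    log_hr = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_hr / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p_value

# Serum IL-12p70 high: HR 3.12 (1.16-8.41), reported p-value 0.024
se, z, p = wald_from_hr_ci(3.12, 1.16, 8.41)
print(round(se, 3), round(z, 2), round(p, 3))
```

For this row the recomputed p-value agrees with the tabulated 0.024 to three decimals.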
Figure 3. Cumulative incidence plots for all-cause, cardiovascular (CV), infection-associated and other-cause mortality. Patients were stratified by AST levels (a, patients within top, middle and lowest quintile thresholds) and IL-12p70 levels (b, below detection limit, detected but not in top quintile, top quintile threshold). The analysis was performed on the total cohort with non-missing values for the respective variables. P-values were calculated using the log-rank test for trend.
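Cumulative incidence under competing causes of death can be estimated as sketched below. This is a minimal sketch of the Aalen-Johansen estimator, which is assumed here because the caption does not name the estimator; the follow-up times and cause codes are hypothetical.

```python
def cumulative_incidence(times, causes, cause_of_interest):
    """Aalen-Johansen estimate of the cumulative incidence for one cause.

    causes[i] == 0 marks a censored observation; other integers label
    causes of death. Returns a list of (time, cumulative incidence) steps.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    overall_survival = 1.0  # probability of being alive just before the current time
    cif = 0.0
    steps = []
    i = 0
    while i < len(order):
        t = times[order[i]]
        tied = []
        while i < len(order) and times[order[i]] == t:
            tied.append(order[i])
            i += 1
        deaths_any = sum(1 for k in tied if causes[k] != 0)
        deaths_cause = sum(1 for k in tied if causes[k] == cause_of_interest)
        if deaths_any:
            cif += overall_survival * deaths_cause / at_risk
            overall_survival *= 1 - deaths_any / at_risk
            steps.append((t, cif))
        at_risk -= len(tied)  # deaths and censored patients leave the risk set
    return steps

# Hypothetical follow-up in years; 1 = cardiovascular, 2 = infection, 0 = censored
times = [1.0, 2.0, 3.0, 4.0]
causes = [1, 2, 0, 1]
cv_steps = cumulative_incidence(times, causes, cause_of_interest=1)
inf_steps = cumulative_incidence(times, causes, cause_of_interest=2)
print(cv_steps[-1], inf_steps[-1])
```

Unlike one minus a Kaplan-Meier estimate that treats other causes as censoring, the cause-specific curves from this estimator add up to the all-cause cumulative incidence.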
Figure 5. Time-dependent area under the curve (AUC) calculated using bootstrapping. (a) Univariate analyses comparing individual variables and transformations. AST was ln-transformed. IL-12p70 values were transformed using the inverse hyperbolic sine (asinh) because a large number of patients had measurements below the detection limit. Dashed lines represent established predictors for comparison (as indicated in the color legend). (b) The multivariable model was fitted using known predictors as described in "Methods" ("All-cause model", black curve). Solid lines represent the addition of AST (ln-transformed, fitted by a spline) and/or IL-12p70 (high group) to the all-cause model. Dashed lines represent the all-cause model after removal of the indicated known predictors. Dots in (b) represent AUC values for individual imputed datasets for each model (see "Methods"). Lines in (a) and (b) represent smoothed conditional means. Bootstrapping was performed on the total cohort with non-missing values for (a) and on the total cohort (n = 475) with missing values imputed as described in the "Methods" section for (b). Tr., transformation; spl., spline fit; Catheter, use of catheter for dialysis.
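The choice of asinh over ln for IL-12p70 can be illustrated numerically. The sketch assumes that measurements below the detection limit are coded as 0, which is not stated in the caption; the example concentrations are hypothetical.

```python
import math

def asinh_transform(value):
    """Inverse hyperbolic sine: ln(x + sqrt(x^2 + 1)), defined at 0 unlike ln."""
    return math.asinh(value)

# Hypothetical IL-12p70 concentrations; 0.0 stands for "below detection limit"
concentrations = [0.0, 0.5, 5.0, 100.0]
transformed = [asinh_transform(v) for v in concentrations]

# For large values asinh(x) approaches ln(2x), so it behaves like a log transform
difference_at_100 = asinh_transform(100.0) - math.log(2 * 100.0)
print(transformed, difference_at_100)
```

The transform maps 0 to 0 and compresses large values almost exactly like a logarithm, so undetectable measurements can stay in the analysis without an arbitrary offset.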
Figure 4. AST spline functions. Spline functions of ln-transformed serum AST values were fitted in a Cox regression for all-cause, cardiovascular (CV), infection-associated (INF), and other mortality causes. Horizontal boxplots represent the distribution of ln-transformed AST values in ISAR patients. Red lines represent the estimated (cause-specific) HR (patients with median AST as reference) and the dashed lines represent the 95% confidence interval of the HR at a given AST level. The models were univariate (a) or adjusted for other relevant predictors in a multivariable model (b) and were fitted to the total cohort with non-missing values for the respective variables. Multivariable model predictors are described in the "Methods" section. The vertical dashed line represents the cutoff value for the lowest AST quintile.
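The quintile cutoff marked in the figure, and the "high" and "low" groups used throughout, can be derived as in the following sketch. It assumes cut points from Python's `statistics.quantiles` with its default interpolation and inclusive comparisons at the cut points; the exact quantile definition and tie handling used in the study are not given here, and the AST values are hypothetical.

```python
import math
import statistics

def quintile_flags(values):
    """Flag values in the bottom and top quintile of the given sample."""
    cuts = statistics.quantiles(values, n=5)  # 20th, 40th, 60th, 80th percentiles
    low_cut, high_cut = cuts[0], cuts[-1]
    low = [v <= low_cut for v in values]
    high = [v >= high_cut for v in values]
    return low_cut, high_cut, low, high

# Hypothetical serum AST values, ln-transformed as in the figure
ast_values = [8, 10, 12, 14, 16, 18, 20, 25, 30, 45]
ln_ast = [math.log(v) for v in ast_values]

low_cut, high_cut, low, high = quintile_flags(ln_ast)
print(math.exp(low_cut), math.exp(high_cut), sum(low), sum(high))
```

The resulting indicator variables are the kind of "AST low" and "AST high" predictors that enter the selection step, with the middle three quintiles as the reference group.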