Harald Binder1, Thorsten Kurz2, Sven Teschner3, Clemens Kreutz4, Marcel Geyer3, Johannes Donauer5, Annette Kraemer-Guth6, Jens Timmer4,7, Martin Schumacher8, Gerd Walz3,7.
Abstract
BACKGROUND: Identification of prognostic gene expression markers from clinical cohorts might help to better understand disease etiology. A set of potentially important markers can be automatically selected when linking gene expression covariates to a clinical endpoint by multivariable regression models and regularized parameter estimation. However, this is hampered by instability due to selection from many measurements. Stability can be assessed by resampling techniques, which might guide modeling decisions, such as choice of the model class or the specific endpoint definition.
Keywords: Clinical endpoint; Outlier; Prognostic signature; Stability
Year: 2016 PMID: 27439789 PMCID: PMC4955222 DOI: 10.1186/s12920-016-0210-9
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 Strategy building blocks for investigating and improving signature stability. All steps are based on resampling data sets, which serve as a foundation. Steps that involve all model building steps, e.g. cross-validation for model complexity selection, are shown in white; steps based on fixed complexity levels are shown in gray. Of the latter, steps for identifying outliers are shown in light gray, and steps for investigating different modeling strategies in darker gray
Clinical data of 321 ESRD patients on chronic intermittent hemodialysis
| Patients | |
|---|---|
| Age (yrs) | 66.3 (± 17.3) |
| Sex (Male (%) / Female (%)) | 197 (61) / 124 (39) |
| BMI (kg/m²) | 25.2 (± 5.1) |
| Dialysis | |
| Time on dialysis at inclusion (months) | 49 (± 46) |
| Dialysis treatment duration per week (hrs) | 12.7 (± 2.5) |
| Kt/V (mean of last three measurements before inclusion) | 1.27 (± 0.34) |
| Renal disease | |
| Diabetic nephropathy (%) | 85 (26) |
| Glomerulonephritis (%) | 75 (23) |
| Hypertensive/Vascular (%) | 48 (15) |
| ADPKD (%) | 26 (8) |
| Other (%) | 46 (14) |
| Unknown (%) | 41 (13) |
| Clinical chemistry | |
| Total cholesterol (mg/dL) | 177 (± 43) |
| Triglycerides (mg/dL) | 197 (± 145) |
| Hemoglobin (g/dL) | 11.7 (± 1.5) |
| CRP (mg/L) | 30.5 (± 71.5) |
| Urea before dialysis (mg/dL) | 121 (± 43) |
| Phosphate (mean of three predialysis measurements, mg/dL) | 5.6 (± 1.7) |
| Total calcium, corrected for albumin (mmol/L) | 2.3 (± 0.3) |
| Calcium × Phosphate product | 50 (± 14) |
| Parathormone, in patients not parathyroidectomized (pg/mL) | 158 (± 201) |
| Ferritin (ng/mL) | 540 (± 332) |
| Albumin (g/dL) | 4.0 (± 0.6) |
BMI: Body mass index; ACE-I: Angiotensin-converting enzyme inhibitor; ARB: Angiotensin receptor blocker; MI: Myocardial infarction
Inclusion frequencies for microarray features selected in at least 20 % of subsampling data sets by any of the approaches (sh: Fine-Gray regression; csh: cause-specific hazard model)
| Feature | Original, all obs., multi, sh | Original, all obs., multi, csh | Original, all obs., uni, sh | Original, all obs., uni, csh | Original, w/o outlier, multi, sh | Original, w/o outlier, multi, csh | Updated, all obs., multi, sh | Updated, all obs., multi, csh | Updated, all obs., uni, sh | Updated, all obs., uni, csh | Updated, w/o outlier, multi, sh | Updated, w/o outlier, multi, csh | Min | Max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 0.30 | 0.38 | 0.31 | 0.50 | 0.33 | 0.40 | 0.31 | 0.37 | 0.41 | 0.50 | 0.33 | 0.38 | 0.30 | 0.50 |
| | 0.22 | 0.22 | 0.32 | 0.29 | 0.17 | 0.15 | 0.13 | 0.15 | 0.14 | 0.23 | 0.08 | 0.10 | 0.10 | 0.32 |
| R36623 | 0.11 | 0.10 | 0 | 0.12 | 0.10 | 0.09 | 0.25 | 0.28 | 0.01 | 0.29 | 0.25 | 0.28 | 0 | 0.29 |
| | 0.23 | 0.20 | 0.10 | 0.17 | 0.27 | 0.25 | 0.07 | 0.10 | 0.11 | 0.15 | 0.10 | 0.13 | 0.07 | 0.27 |
| BX104205 | 0.19 | 0.18 | 0.03 | 0.04 | 0 | 0 | 0.36 | 0.33 | 0.19 | 0.19 | 0 | 0 | 0 | 0.36 |
| BX100481 | 0.14 | 0.10 | 0.05 | 0.01 | 0.09 | 0.06 | 0.26 | 0.24 | 0.12 | 0.04 | 0.19 | 0.17 | 0.01 | 0.26 |
| BM918155 | 0 | 0 | 0 | 0 | 0 | 0 | 0.27 | 0.28 | 0.13 | 0.01 | 0.31 | 0.32 | 0 | 0.32 |
| R00274 | 0.03 | 0.03 | 0.33 | 0.33 | 0.04 | 0.05 | 0.01 | 0.01 | 0.16 | 0.18 | 0.01 | 0.02 | 0.01 | 0.33 |
| | 0.19 | 0.23 | 0.03 | 0.12 | 0.21 | 0.27 | 0.02 | 0.02 | 0 | 0.01 | 0.03 | 0.04 | 0 | 0.23 |
| | 0.07 | 0.05 | 0.11 | 0.18 | 0.06 | 0.04 | 0.11 | 0.10 | 0.11 | 0.22 | 0.04 | 0.04 | 0.04 | 0.22 |
| AA027034 | 0.04 | 0.03 | 0.17 | 0.05 | 0.04 | 0.03 | 0.04 | 0.04 | 0.25 | 0.07 | 0.04 | 0.04 | 0.03 | 0.25 |
| AF086244 | 0 | 0 | 0 | 0 | 0 | 0 | 0.14 | 0.13 | 0.05 | 0.08 | 0.20 | 0.20 | 0 | 0.20 |
| AA001661 | 0.04 | 0.03 | 0.16 | 0.22 | 0.04 | 0.03 | 0.01 | 0.01 | 0.06 | 0.17 | 0.01 | 0.01 | 0.01 | 0.22 |
| | 0.21 | 0.11 | 0 | 0.03 | 0.17 | 0.09 | 0.01 | 0.01 | 0 | 0 | 0.01 | 0.01 | 0 | 0.21 |
Minimum and maximum inclusion frequencies are given in the two rightmost columns. Names of microarray features contained in the original signature are indicated by boldface
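Inclusion frequencies of this kind can be obtained by repeating feature selection on many subsamples and counting how often each feature is chosen. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: `select` is a hypothetical callback standing in for whatever regularized model fit is used, and the subsample fraction of 0.632 and the number of subsamples are illustrative defaults that this section does not specify.

```python
import random


def inclusion_frequencies(n_obs, select, n_resamples=100, fraction=0.632, seed=0):
    """Return the fraction of subsamples in which each feature is selected.

    `select` is a hypothetical callback (an assumption of this sketch): it
    receives a sorted list of observation indices and returns the names of
    the features selected when the model is fitted on that subsample.
    """
    rng = random.Random(seed)
    size = max(1, round(fraction * n_obs))
    counts = {}
    for _ in range(n_resamples):
        # Subsample without replacement.
        idx = sorted(rng.sample(range(n_obs), size))
        for feature in set(select(idx)):
            counts[feature] = counts.get(feature, 0) + 1
    return {feature: c / n_resamples for feature, c in counts.items()}


def summarize(freq_by_approach, threshold=0.2):
    """Keep features reaching `threshold` in any approach; report (min, max).

    A feature that an approach never selected counts as frequency 0 there.
    """
    features = set()
    for freqs in freq_by_approach.values():
        features.update(freqs)
    table = {}
    for feature in features:
        values = [freqs.get(feature, 0.0) for freqs in freq_by_approach.values()]
        if max(values) >= threshold:
            table[feature] = (min(values), max(values))
    return table
```

Calling `inclusion_frequencies` once per modeling approach and passing the results to `summarize` as a dictionary keyed by approach yields one (min, max) pair per retained feature, mirroring the two rightmost columns of the table.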
Fig. 2 Prediction error curves. .632+ prediction error curve estimates for the microarray signature for the original (left panel) and the updated endpoint information (right panel), with an Aalen-Johansen estimator (which does not use any patient information) and a purely clinical model as benchmarks
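For orientation, the generic .632+ combination of Efron and Tibshirani weights the apparent (training) error against the out-of-bag bootstrap error, with a weight that depends on the relative overfitting rate. The sketch below applies this combination pointwise to error curves. How the curves in the figure were computed in detail is not stated in this section, so the function names and the three input curves are assumptions of this sketch.

```python
def err_632_plus(apparent, bootstrap_oob, no_information):
    """Combine apparent and out-of-bag error with the .632+ weighting."""
    # Cap the out-of-bag error at the no-information error.
    boot = min(bootstrap_oob, no_information)
    # Relative overfitting rate R in [0, 1]; set to 0 when there is no overfitting.
    if boot > apparent and no_information > apparent:
        r = (boot - apparent) / (no_information - apparent)
    else:
        r = 0.0
    # The weight ranges from 0.632 (R = 0) to 1 (R = 1).
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * apparent + w * boot


def curve_632_plus(apparent_curve, oob_curve, noinf_curve):
    """Apply the .632+ combination at each time point of three aligned curves."""
    return [
        err_632_plus(a, b, g)
        for a, b, g in zip(apparent_curve, oob_curve, noinf_curve)
    ]
```

With no overfitting the estimate reduces to the plain .632 weighting, and with complete overfitting it falls back on the (capped) out-of-bag error alone.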
Fig. 3 Joint selection. Odds ratios of joint selection for microarray features with inclusion frequency greater than or equal to 0.1. Blue indicates odds ratios <1, i.e. alternative selection; red indicates odds ratios >1, i.e. joint selection. More intense color indicates more extreme effects
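An odds ratio of joint selection for a pair of features can be read off the 2×2 table of their selection indicators across resampling data sets. The following is a minimal sketch of that calculation. The 0.5 continuity correction (Haldane-Anscombe) is an assumption added here to avoid division by zero, and this section does not say whether the authors used one.

```python
def joint_selection_odds_ratio(sel_a, sel_b, correction=0.5):
    """Odds ratio for co-selection of two features across resamples.

    sel_a, sel_b: equal-length sequences of booleans, where element i states
    whether the feature was selected in resampling data set i.
    """
    if len(sel_a) != len(sel_b):
        raise ValueError("selection indicators must have equal length")
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(sel_a, sel_b):
        if a and b:
            n11 += 1  # both selected
        elif a:
            n10 += 1  # only the first feature selected
        elif b:
            n01 += 1  # only the second feature selected
        else:
            n00 += 1  # neither selected
    c = correction  # assumed continuity correction, added to every cell
    return ((n11 + c) * (n00 + c)) / ((n10 + c) * (n01 + c))
```

Values above 1 correspond to features that tend to enter the signature together, values below 1 to features that tend to replace each other.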
Fig. 4 Outlier detection. Scatter plots of standardized log2 expression values of some microarray features with strong conditional signature non-inclusion. The three observations considered as outliers for later analysis are indicated by triangles. Censored observations are indicated by hollow symbols, deaths after a cardiovascular event by filled grey symbols, and deaths without prior cardiovascular events by filled black symbols
Binomial regression model for signature inclusion of microarray feature BX281671 (ITGA2B) in different modeling approaches (sh: Fine-Gray regression; csh: cause-specific hazard model)
| Term | ITGA2B: estimate | ITGA2B: p-value | All features: sig − | All features: sig + |
|---|---|---|---|---|
| Intercept | −3.82 | <0.001 | - | - |
| sh vs. csh | −0.16 | 0.013 | 24 | 16 |
| Outlier excluded | 0.58 | <0.001 | 29 | 19 |
| Original endpoint | 2.63 | <0.001 | | |
| Univariate model | −0.98 | <0.001 | | |
| sh × outlier excluded | −0.01 | 0.798 | 0 | 0 |
| sh × original endpoint | −0.12 | 0.056 | 19 | 17 |
| sh × univariate | −1.27 | <0.001 | 31 | 25 |
| Outlier excluded × original endpoint | −0.43 | <0.001 | 19 | 11 |
| Original endpoint × univariate | 0.15 | 0.055 | 21 | 14 |
The numbers of significant effects (5 % level after Bonferroni correction) for all 58 microarray features with inclusion frequency greater than or equal to 0.1 in any of the approaches are given in the two rightmost columns, separately for negative (sig −) and positive (sig +) signs
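To make the estimates easier to interpret, they can be converted into predicted inclusion probabilities for a given combination of modeling choices. The sketch below does this under two assumptions that this section does not confirm: a logit link, and 0/1 dummy coding with the cause-specific hazard model, all observations, the updated endpoint and the multivariable model as reference levels. The dictionary keys are names chosen for this illustration.

```python
import math

# Estimates for BX281671 (ITGA2B), copied from the table of the binomial
# regression model. Key names are chosen for this sketch only.
COEF = {
    "intercept": -3.82,
    "sh": -0.16,
    "outlier_excluded": 0.58,
    "original_endpoint": 2.63,
    "univariate": -0.98,
    "sh:outlier_excluded": -0.01,
    "sh:original_endpoint": -0.12,
    "sh:univariate": -1.27,
    "outlier_excluded:original_endpoint": -0.43,
    "original_endpoint:univariate": 0.15,
}


def inclusion_probability(sh=False, outlier_excluded=False,
                          original_endpoint=False, univariate=False):
    """Predicted inclusion probability, assuming a logit link and dummy coding."""
    x = {
        "sh": sh,
        "outlier_excluded": outlier_excluded,
        "original_endpoint": original_endpoint,
        "univariate": univariate,
    }
    eta = COEF["intercept"]
    for term, beta in COEF.items():
        if term == "intercept":
            continue
        # A main effect or interaction contributes only if all its factors are active.
        if all(x[name] for name in term.split(":")):
            eta += beta
    # Inverse logit.
    return 1.0 / (1.0 + math.exp(-eta))
```

Under these assumptions, the reference configuration gives a probability of about 0.02, and switching only to the original endpoint raises it to about 0.23, which shows how strongly the endpoint definition drives inclusion of this feature.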
Fig. 5 Data and analysis flow. Use of original data and resampling data sets for different analyses, specifically for obtaining signatures, estimating prediction performance, identifying outliers and judging stability of selection, illustrated using ITGA2B as an example