Stefanie Hieke1,2, Axel Benner3, Richard F Schlenk4, Martin Schumacher5, Lars Bullinger4, Harald Binder6.
Abstract
BACKGROUND: High-throughput technology allows for genome-wide measurements at different molecular levels for the same patient, e.g. single nucleotide polymorphisms (SNPs) and gene expression. Correspondingly, it might be beneficial to also integrate complementary information from different molecular levels when building multivariable risk prediction models for a clinical endpoint, such as treatment response or survival. Unfortunately, such a high-dimensional modeling task will often be complicated by a limited overlap of molecular measurements at different levels between patients, i.e. measurements from all molecular levels are available only for a smaller proportion of patients.
Keywords: Acute myeloid leukemia; Boosting; Multiple genome-wide data sets; Multivariable model; Risk prediction; Time-to-event endpoint
Year: 2016 PMID: 27578050 PMCID: PMC5004308 DOI: 10.1186/s12859-016-1183-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Flowchart describing the sequential complementary strategy based on a stepwise procedure (notation focuses on the first AML application, which combines GEP and SNP data). Established clinical predictors (clin) that need to be adjusted for are treated as mandatory in the stepwise procedure based on componentwise likelihood-based boosting. Important clinical predictors are available for all AML cases. The SNP signature, comprising the known SNP signature for the overlap samples from model (1) and the predicted SNP signature for the non-overlap samples obtained by linear regression with a continuous response (3), is incorporated as a fixed offset in model (2) for the microarray-based GEP data.
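The strategy in Fig. 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names are hypothetical. The non-overlap SNP signature is assumed to be predicted by ordinary least squares from the GEP measurements, since the caption does not say which covariates enter regression (3). Tied event times are assumed absent. The mandatory clinical covariates receive an unpenalized one-step update in every boosting iteration, while only the single best-scoring gene is updated with a penalty in each step.

```python
import numpy as np


def risk_set_sums(time, v):
    """For each i, sum v[k] over the risk set {k: time[k] >= time[i]} (assumes no ties)."""
    order = np.argsort(-time, kind="stable")  # latest times first
    cums = np.cumsum(v[order], axis=0)        # running sums over later-or-equal times
    out = np.empty_like(cums)
    out[order] = cums                         # back to the original sample order
    return out


def cox_score_info(time, status, X, eta):
    """Per-column Cox partial-likelihood score and Fisher information at eta."""
    w = np.exp(eta)
    s0 = risk_set_sums(time, w)[:, None]
    m1 = risk_set_sums(time, X * w[:, None]) / s0        # weighted risk-set means
    m2 = risk_set_sums(time, X ** 2 * w[:, None]) / s0
    d = status[:, None]
    return (d * (X - m1)).sum(axis=0), (d * (m2 - m1 ** 2)).sum(axis=0)


def snp_offset(Z_snp_ov, beta_snp, X_ov, X_non):
    """Known SNP signature (overlap) stacked on a predicted one (non-overlap)."""
    sig_known = Z_snp_ov @ beta_snp                  # linear predictor of the SNP model
    A = np.column_stack([np.ones(len(X_ov)), X_ov])  # ASSUMPTION: GEP data as regressors
    coef, *_ = np.linalg.lstsq(A, sig_known, rcond=None)
    sig_pred = np.column_stack([np.ones(len(X_non)), X_non]) @ coef
    return np.concatenate([sig_known, sig_pred])     # overlap samples come first


def boost_with_offset(time, status, X_gep, X_clin, offset, steps=100, penalty=100.0):
    """Componentwise likelihood-based Cox boosting with a fixed offset (sketch)."""
    beta = np.zeros(X_gep.shape[1])
    beta_clin = np.zeros(X_clin.shape[1])
    for _ in range(steps):
        # mandatory clinical covariates: unpenalized Newton step in every iteration
        for j in range(X_clin.shape[1]):
            eta = offset + X_clin @ beta_clin + X_gep @ beta
            u, info = cox_score_info(time, status, X_clin[:, [j]], eta)
            beta_clin[j] += u[0] / info[0]
        # genes: update only the one with the largest penalized score statistic
        eta = offset + X_clin @ beta_clin + X_gep @ beta
        u, info = cox_score_info(time, status, X_gep, eta)
        j = np.argmax(u ** 2 / (info + penalty))
        beta[j] += u[j] / (info[j] + penalty)
    return beta, beta_clin


# Simulated example (all data synthetic): the first 80 of 200 patients form the overlap.
rng = np.random.default_rng(1)
n, n_ov, p, q = 200, 80, 20, 10
X_gep = rng.standard_normal((n, p))
X_clin = rng.standard_normal((n, 1))
Z_snp = rng.standard_normal((n, q))               # only rows < n_ov are "observed"
beta_snp = np.r_[0.5, np.zeros(q - 1)]            # hypothetical SNP-model coefficients
eta_true = X_gep[:, 0] + 0.5 * X_clin[:, 0] + Z_snp @ beta_snp
T = rng.exponential(1.0 / np.exp(eta_true))       # event times, baseline hazard 1
C = rng.exponential(2.0, size=n)                  # censoring times
time, status = np.minimum(T, C), (T <= C).astype(float)

offset = snp_offset(Z_snp[:n_ov], beta_snp, X_gep[:n_ov], X_gep[n_ov:])
beta, beta_clin = boost_with_offset(time, status, X_gep, X_clin, offset, steps=50)
```

Because the offset enters the linear predictor with a fixed coefficient of one, the GEP model does not re-estimate the SNP effect; the genes only need to explain what the SNP signature and the clinical covariates leave unexplained.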
Fig. 2 Resampling inclusion frequencies for genes from the GEP data (first AML application) selected by both the sequential complementary strategy and the reference approach (black), by the reference approach only (red), and by the sequential complementary strategy only (green). Squares show the inclusion frequencies of these genes under the reference approach, and dots show those under the sequential complementary strategy. Both the reference approach and the sequential complementary strategy are estimated by componentwise boosting.
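Resampling inclusion frequencies such as those in Fig. 2 are commonly computed as the share of resampled data sets in which a gene receives a nonzero boosting coefficient. The sketch below assumes the coefficient vectors from the resampled fits are already available; the resampling scheme and number of resamples are not given here, the function names are hypothetical, and the coefficients in the usage lines are made up.

```python
import numpy as np


def inclusion_frequencies(coefs):
    """Share of resampled fits in which each gene has a nonzero coefficient.

    coefs: array of shape (n_resamples, n_genes), one row per resampled fit.
    """
    return (np.asarray(coefs) != 0).mean(axis=0)


def color_groups(selected_ref, selected_seq):
    """Assign the Fig. 2 color code from full-data selections (boolean masks)."""
    groups = {}
    for g, (r, s) in enumerate(zip(selected_ref, selected_seq)):
        if r and s:
            groups[g] = "black"  # selected by both approaches
        elif r:
            groups[g] = "red"    # reference approach only
        elif s:
            groups[g] = "green"  # sequential complementary strategy only
    return groups


# Made-up coefficients from four resampled fits of three genes.
coefs_ref = np.array([[0.3, 0.0, 0.1],
                      [0.2, 0.0, 0.0],
                      [0.4, 0.05, 0.0],
                      [0.1, 0.0, 0.0]])
freq_ref = inclusion_frequencies(coefs_ref)  # -> [1.0, 0.25, 0.25]
groups = color_groups(np.array([True, True, False]), np.array([True, False, True]))
```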
Fig. 3 Prediction error curves for the first AML application example. Bootstrap .632+ prediction error curve estimates for the sequential complementary strategy, i.e. boosting that includes SNP information for adjustment (dashed red curve), and for the reference approach, i.e. boosting without SNP information (solid blue curve). The Kaplan-Meier benchmark is indicated by the dashed-dotted gray curve and the Cox model by the dotted black curve.
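The .632+ estimator of Efron and Tibshirani combines, at each time point, the apparent error, the out-of-bag bootstrap error and a no-information error. A minimal sketch of that combination step, assuming the three error curves (e.g. Brier scores) have already been computed on a common time grid; computing them, including any censoring weights, is outside this sketch, and `pe_632plus` is a hypothetical name:

```python
import numpy as np


def pe_632plus(apparent, oob, no_info):
    """Combine error curves on a common time grid into the .632+ estimate.

    apparent: error of the model fitted and evaluated on the full data
    oob:      bootstrap cross-validation (out-of-bag) error
    no_info:  no-information error
    """
    apparent, oob, no_info = (np.asarray(a, dtype=float) for a in (apparent, oob, no_info))
    oob_c = np.minimum(oob, no_info)  # cap the out-of-bag error at the no-information error
    # relative overfitting rate R, set to 0 where no overfitting is detectable
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.where((oob_c > apparent) & (no_info > apparent),
                     (oob_c - apparent) / (no_info - apparent), 0.0)
    r = np.clip(r, 0.0, 1.0)
    w = 0.632 / (1.0 - 0.368 * r)     # weight moves from 0.632 towards 1 as R grows
    return (1.0 - w) * apparent + w * oob_c


# Made-up error values at three time points.
apparent = [0.05, 0.10, 0.12]
oob = [0.08, 0.18, 0.30]
no_info = [0.10, 0.20, 0.25]
est = pe_632plus(apparent, oob, no_info)
```

With no detectable overfitting the estimate reduces to the ordinary .632 weighting, and with maximal overfitting it equals the (capped) out-of-bag error.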
Fig. 4 Variability of the .632+ prediction error estimates in the first AML application for overlap sizes of 26, 15 and 10 biological samples. Integrated prediction error curve estimates are shown for the Cox model, the reference approach and the sequential complementary strategy. The performance of the Kaplan-Meier benchmark is indicated by a horizontal line.
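An integrated prediction error reduces each curve to one number per resampling run. A minimal sketch, assuming the curve is a right-continuous step function evaluated on an ascending time grid and integrated up to a chosen horizon `t_max`; whether the values in Fig. 4 are normalized by the length of the interval is not stated here, `integrated_pe` is a hypothetical name, and the numbers are made up:

```python
import numpy as np


def integrated_pe(times, pe, t_max):
    """Area under a right-continuous step-function error curve on [times[0], t_max]."""
    times = np.asarray(times, dtype=float)
    pe = np.asarray(pe, dtype=float)
    keep = times < t_max                       # grid points inside the horizon
    knots = np.append(times[keep], t_max)      # interval end points
    return float(np.sum(pe[keep] * np.diff(knots)))


# Made-up curves on a common grid; a lower area means better prediction.
grid = [0.0, 1.0, 2.0, 3.0]
ipe_model = integrated_pe(grid, [0.0, 0.10, 0.15, 0.20], t_max=2.5)  # -> 0.175
ipe_km = integrated_pe(grid, [0.0, 0.20, 0.25, 0.25], t_max=2.5)     # -> 0.325
```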
Fig. 5 Variability of the .632+ prediction error estimates in the second AML application (verification example) for overlap sizes of 166, 100, 50 and 30 biological samples. Integrated prediction error curve estimates are shown for the Cox model, the reference approach and the sequential complementary strategy. The performance of the Kaplan-Meier benchmark is indicated by a horizontal line.
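Smaller overlap sizes can be emulated by keeping the SNP data of only a random subset of the overlap patients and treating the rest as non-overlap samples. This is an assumed procedure for illustration; the caption does not describe how the reduced overlaps were drawn, and `reduce_overlap` is a hypothetical helper:

```python
import numpy as np


def reduce_overlap(overlap_ids, target_size, rng):
    """Randomly keep `target_size` overlap patients; the rest lose their SNP data."""
    overlap_ids = np.asarray(overlap_ids)
    keep = rng.choice(overlap_ids, size=target_size, replace=False)
    dropped = np.setdiff1d(overlap_ids, keep)  # these become non-overlap samples
    return np.sort(keep), dropped


rng = np.random.default_rng(0)
ids = np.arange(166)  # patient indices of the full overlap
results = {size: reduce_overlap(ids, size, rng) for size in (100, 50, 30)}
```

Repeating the draw for several random seeds would show how much of the variability in the integrated error stems from which patients end up in the overlap.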