Riccardo De Bin, Anne-Laure Boulesteix, Axel Benner, Natalia Becker, Willi Sauerbrei.
Abstract
Data integration, i.e. the use of different sources of information for data analysis, is becoming one of the most important topics in modern statistics. Especially in, but not limited to, biomedical applications, a relevant issue is the combination of low-dimensional (e.g. clinical data) and high-dimensional (e.g. molecular data such as gene expressions) data sources in a prediction model. Not only the different characteristics of the data, but also the complex correlation structure within and between the two data sources, pose challenging issues. In this paper, we investigate these issues via simulations, providing some useful insight into strategies to combine low- and high-dimensional data in a regression prediction model. In particular, we focus on the effect of the correlation structure on the results, while accounting for the influence of our specific choices in the design of the simulation study.
Keywords: data integration; prediction models; regularized regression
Year: 2020 PMID: 31750518 DOI: 10.1093/bib/bbz136
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622