Philip S Boonstra, Bhramar Mukherjee, Jeremy MG Taylor.
Abstract
Motivated by the increasing use of and rapid changes in array technologies, we consider the prediction problem of fitting a linear regression relating a continuous outcome Y to a large number of covariates X, e.g., measurements from current, state-of-the-art technology. For most of the samples, only the outcome Y and surrogate covariates, W, are available. These surrogates may be data from prior studies using older technologies. Owing to the dimension of the problem and the large fraction of missing information, a critical issue is appropriate shrinkage of model parameters for an optimal bias-variance tradeoff. We discuss a variety of fully Bayesian and Empirical Bayes algorithms which account for uncertainty in the missing data and adaptively shrink parameter estimates for superior prediction. These methods are evaluated via a comprehensive simulation study. In addition, we apply our methods to a lung cancer dataset, predicting survival time (Y) using qRT-PCR (X) and microarray (W) measurements.
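The data layout described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' fully Bayesian or Empirical Bayes algorithms: it assumes an additive measurement-error relationship between W and X, uses plain ridge regression as a stand-in for shrinkage, and fits only the complete cases, so it ignores the uncertainty in the missing X that the paper's methods account for. The function names `simulate` and `ridge_fit`, the sample sizes, and the noise levels are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n_complete=50, n_surrogate_only=400, p=20, noise_w=0.5, seed=0):
    """Simulate (Y, X, W) where X is observed for only a small subsample.

    Assumption for illustration: W = X + noise (additive measurement error).
    """
    rng = np.random.default_rng(seed)
    n = n_complete + n_surrogate_only
    X = rng.normal(size=(n, p))                    # covariates of interest
    beta_true = rng.normal(scale=0.3, size=p)      # hypothetical true effects
    y = X @ beta_true + rng.normal(size=n)         # continuous outcome Y
    W = X + noise_w * rng.normal(size=(n, p))      # surrogate covariates W
    observed = np.zeros(n, dtype=bool)
    observed[:n_complete] = True                   # rows where X is available
    X_obs = np.where(observed[:, None], X, np.nan) # X is missing elsewhere
    return y, X_obs, W, observed


def ridge_fit(X, y, lam):
    """Ridge regression with an unpenalized intercept.

    Solves (Xc'Xc + lam * I) beta = Xc'(y - mean(y)) on centered covariates.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y - y_mean))
    intercept = y_mean - x_mean @ beta
    return intercept, beta


if __name__ == "__main__":
    y, X_obs, W, observed = simulate()
    # Larger penalties shrink the coefficient vector toward zero.
    for lam in (0.0, 5.0, 50.0):
        _, beta = ridge_fit(X_obs[observed], y[observed], lam)
        print(lam, np.linalg.norm(beta))
```

In this sketch the rows with only (Y, W) are left unused. Making use of them, and choosing the amount of shrinkage adaptively, is the problem the abstract addresses.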
Keywords: High-dimensional data; Markov chain Monte Carlo; measurement error; missing data; shrinkage
Year: 2013 PMID: 24436727 PMCID: PMC3891514 DOI: 10.1214/13-AOAS668
Source DB: PubMed Journal: Ann Appl Stat ISSN: 1932-6157 Impact factor: 2.083