Incorporating auxiliary information for improved prediction in high-dimensional datasets: an ensemble of shrinkage approaches.

Philip S Boonstra, Jeremy M G Taylor, Bhramar Mukherjee.

Abstract

With advancement in genomic technologies, it is common that two high-dimensional datasets are available, both measuring the same underlying biological phenomenon with different techniques. We consider predicting a continuous outcome Y using X, a set of p markers which is the best available measure of the underlying biological process. This same biological process may also be measured by W, coming from a prior technology but correlated with X. On a moderately sized sample, we have (Y,X,W), and on a larger sample we have (Y,W). We utilize the data on W to boost the prediction of Y by X. When p is large and the subsample containing X is small, this is a p>n situation. When p is small, this is akin to the classical measurement error problem; however, ours is not the typical goal of calibrating W for use in future studies. We propose to shrink the regression coefficients β of Y on X toward different targets that use information derived from W in the larger dataset. We compare these proposals with the classical ridge regression of Y on X, which does not use W. We also unify all of these methods as targeted ridge estimators. Finally, we propose a hybrid estimator which is a linear combination of multiple estimators of β. With an optimal choice of weights, the hybrid estimator balances efficiency and robustness in a data-adaptive way to theoretically yield a smaller prediction error than any of its constituents. The methods, including a fully Bayesian alternative, are evaluated via simulation studies. We also apply them to a gene-expression dataset. mRNA expression measured via quantitative real-time polymerase chain reaction is used to predict survival time in lung cancer patients, with auxiliary information from microarray technology available on a larger sample.
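The targeted ridge idea in the abstract can be illustrated with a minimal sketch. It assumes the simplest form of the estimator, minimizing ||Y - Xβ||² + λ||β - β₀||², whose closed form solves (XᵀX + λI)β = XᵀY + λβ₀; the penalty and targets actually used in the paper may differ. The function names (`solve`, `targeted_ridge`, `hybrid`), the toy data, and the hand-picked weights are all hypothetical; the paper's data-adaptive choice of weights is not reproduced here. Pure Python is used only to keep the sketch self-contained; in practice a linear algebra library would be the natural choice.

```python
# Hypothetical sketch of a targeted ridge estimator and a hybrid
# (linear combination) estimator. Not the authors' implementation.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def targeted_ridge(X, y, lam, target):
    """Minimize ||y - X b||^2 + lam * ||b - target||^2 (lam > 0).

    Closed form: (X'X + lam * I) b = X'y + lam * target.
    A zero target gives ordinary ridge regression.
    """
    n, p = len(X), len(X[0])
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    rhs = [sum(X[i][j] * y[i] for i in range(n)) + lam * target[j]
           for j in range(p)]
    return solve(A, rhs)


def hybrid(estimates, weights):
    """Weighted linear combination of several coefficient vectors."""
    p = len(estimates[0])
    return [sum(w * est[j] for w, est in zip(weights, estimates))
            for j in range(p)]


# Toy p > n example: 2 observations, 3 markers (made-up numbers).
X = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
y = [1.0, 2.0]
target_from_w = [0.5, 1.0, 0.0]  # stands in for a target derived from W

b_ridge = targeted_ridge(X, y, 1.0, [0.0, 0.0, 0.0])  # shrink toward zero
b_target = targeted_ridge(X, y, 1.0, target_from_w)   # shrink toward target
b_hybrid = hybrid([b_ridge, b_target], [0.5, 0.5])    # weights fixed by hand
```

As λ grows, the targeted estimate moves toward the target; as λ shrinks, the data dominate. Because XᵀX + λI is invertible for any λ > 0, the estimate exists even when p exceeds n.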

Year:  2012        PMID: 23087411      PMCID: PMC3590922          DOI: 10.1093/biostatistics/kxs036

Source DB:  PubMed          Journal:  Biostatistics        ISSN: 1465-4644            Impact factor:   5.899


  5 in total

1.  Simultaneous estimation of parameters in different linear models and applications to biometric problems.

Authors:  C R Rao
Journal:  Biometrics       Date:  1975-06       Impact factor: 2.571

2.  A theoretical and experimental analysis of linear combiners for multiple classifier systems.

Authors:  Giorgio Fumera; Fabio Roli
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2005-06       Impact factor: 6.226

3.  A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics.

Authors:  Juliane Schäfer; Korbinian Strimmer
Journal:  Stat Appl Genet Mol Biol       Date:  2005-11-14

4.  Development and validation of a quantitative real-time polymerase chain reaction classifier for lung cancer prognosis.

Authors:  Guoan Chen; Sinae Kim; Jeremy M G Taylor; Zhuwen Wang; Oliver Lee; Nithya Ramnath; Rishindra M Reddy; Jules Lin; Andrew C Chang; Mark B Orringer; David G Beer
Journal:  J Thorac Oncol       Date:  2011-09       Impact factor: 15.609

5.  Shrinkage Estimators for Robust and Efficient Inference in Haplotype-Based Case-Control Studies.

Authors:  Yi-Hau Chen; Nilanjan Chatterjee; Raymond J Carroll
Journal:  J Am Stat Assoc       Date:  2009-03-01       Impact factor: 5.033

  8 in total

1.  A Small-Sample Choice of the Tuning Parameter in Ridge Regression.

Authors:  Philip S Boonstra; Bhramar Mukherjee; Jeremy M G Taylor
Journal:  Stat Sin       Date:  2015-07-01       Impact factor: 1.261

2.  Incorporating auxiliary information for improved prediction using combination of kernel machines.

Authors:  Xiang Zhan; Debashis Ghosh
Journal:  Stat Methodol       Date:  2015-01-01

3.  Flexible co-data learning for high-dimensional prediction.

Authors:  Mirrelijn M van Nee; Lodewyk F A Wessels; Mark A van de Wiel
Journal:  Stat Med       Date:  2021-08-26       Impact factor: 2.497

4.  BAYESIAN SHRINKAGE METHODS FOR PARTIALLY OBSERVED DATA WITH MANY PREDICTORS.

Authors:  Philip S Boonstra; Bhramar Mukherjee; Jeremy Mg Taylor
Journal:  Ann Appl Stat       Date:  2013-12-01       Impact factor: 2.083

5.  Combining Multiple Observational Data Sources to Estimate Causal Effects.

Authors:  Shu Yang; Peng Ding
Journal:  J Am Stat Assoc       Date:  2019-06-11       Impact factor: 5.033

6.  Evaluating biomarkers for treatment selection from reproducibility studies.

Authors:  Xiao Song; Kevin K Dobbin
Journal:  Biostatistics       Date:  2022-01-13       Impact factor: 5.899

7.  Combining parametric, semi-parametric, and non-parametric survival models with stacked survival models.

Authors:  Andrew Wey; John Connett; Kyle Rudser
Journal:  Biostatistics       Date:  2015-02-05       Impact factor: 5.279

8.  Environmental risk score as a new tool to examine multi-pollutants in epidemiologic research: an example from the NHANES study using serum lipid levels.

Authors:  Sung Kyun Park; Yebin Tao; John D Meeker; Siobán D Harlow; Bhramar Mukherjee
Journal:  PLoS One       Date:  2014-06-05       Impact factor: 3.240

