Incorporating auxiliary information for improved prediction in high-dimensional datasets: an ensemble of shrinkage approaches.

Philip S Boonstra, Jeremy M G Taylor, Bhramar Mukherjee.

Abstract

With advancement in genomic technologies, it is common that two high-dimensional datasets are available, both measuring the same underlying biological phenomenon with different techniques. We consider predicting a continuous outcome Y using X, a set of p markers which is the best available measure of the underlying biological process. This same biological process may also be measured by W, coming from a prior technology but correlated with X. On a moderately sized sample, we have (Y,X,W), and on a larger sample we have (Y,W). We utilize the data on W to boost the prediction of Y by X. When p is large and the subsample containing X is small, this is a p>n situation. When p is small, this is akin to the classical measurement error problem; however, ours is not the typical goal of calibrating W for use in future studies. We propose to shrink the regression coefficients β of Y on X toward different targets that use information derived from W in the larger dataset. We compare these proposals with the classical ridge regression of Y on X, which does not use W. We also unify all of these methods as targeted ridge estimators. Finally, we propose a hybrid estimator which is a linear combination of multiple estimators of β. With an optimal choice of weights, the hybrid estimator balances efficiency and robustness in a data-adaptive way to theoretically yield a smaller prediction error than any of its constituents. The methods, including a fully Bayesian alternative, are evaluated via simulation studies. We also apply them to a gene-expression dataset. mRNA expression measured via quantitative real-time polymerase chain reaction is used to predict survival time in lung cancer patients, with auxiliary information from microarray technology available on a larger sample.
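The abstract does not state the estimators' formulas, so what follows is only a minimal sketch of the general idea. It assumes the textbook targeted ridge criterion ||y − Xβ||² + λ||β − β₀||², whose minimizer is (XᵀX + λI)⁻¹(Xᵀy + λβ₀); classical ridge is the special case β₀ = 0. The function names `targeted_ridge` and `hybrid`, the simulated data, the noisy stand-in for a W-derived target, and the fixed weights are all illustrative assumptions, not the authors' implementation. In particular, the paper chooses the hybrid weights in a data-adaptive way, which is not reproduced here.

```python
import numpy as np


def targeted_ridge(X, y, lam, target=None):
    """Minimize ||y - X b||^2 + lam * ||b - target||^2 over b.

    With target=None the target is the zero vector, which gives
    classical ridge regression.
    """
    n, p = X.shape
    if target is None:
        target = np.zeros(p)
    # Normal equations of the penalized criterion:
    # (X'X + lam I) b = X'y + lam * target
    A = X.T @ X + lam * np.eye(p)
    rhs = X.T @ y + lam * np.asarray(target, dtype=float)
    return np.linalg.solve(A, rhs)


def hybrid(estimates, weights):
    """Linear combination of several coefficient estimates.

    The weights are required to sum to one (an assumption of this
    sketch); how to choose them well is not shown here.
    """
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    stacked = np.vstack(estimates)  # shape (K, p)
    return np.tensordot(weights, stacked, axes=1)  # shape (p,)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 20, 50  # p > n, as in the small subsample with X
    beta = rng.normal(size=p)
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(size=n)

    # Hypothetical stand-in for a target built from W in the larger
    # sample: here simply the true coefficients plus noise.
    target_from_w = beta + 0.5 * rng.normal(size=p)

    b_ridge = targeted_ridge(X, y, lam=5.0)  # shrinks toward 0
    b_target = targeted_ridge(X, y, lam=5.0, target=target_from_w)
    b_hybrid = hybrid([b_ridge, b_target], [0.3, 0.7])  # fixed weights

    for name, b in [("ridge", b_ridge), ("targeted", b_target),
                    ("hybrid", b_hybrid)]:
        print(name, float(np.sum((b - beta) ** 2)))
```

Solving the normal equations with `np.linalg.solve` avoids forming an explicit inverse, and any λ > 0 keeps the system well posed even when p > n. The printed values are squared estimation errors on simulated data and say nothing about the paper's results.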

Year: 2012 PMID: 23087411 PMCID: PMC3590922 DOI: 10.1093/biostatistics/kxs036

Source DB: PubMed Journal: Biostatistics ISSN: 1465-4644 Impact factor: 5.899



