Synthetic data method to incorporate external information into a current study.

Tian Gu, Jeremy M G Taylor, Wenting Cheng, Bhramar Mukherjee.

Abstract

We consider the situation where there is a known regression model that can be used to predict an outcome, Y, from a set of predictor variables X. A new variable B is expected to enhance the prediction of Y. A dataset of size n containing Y, X and B is available, and the challenge is to build an improved model for Y|X,B that uses both the available individual-level data and some summary information obtained from the known model for Y|X. We propose a synthetic data approach, which consists of creating m additional synthetic data observations and then analyzing the combined dataset of size n+m to estimate the parameters of the Y|X,B model. This combined dataset of size n+m has missing values of B for m of the observations, and is analyzed using methods that can handle missing data (e.g. multiple imputation). We present simulation studies and illustrate the method using data from the Prostate Cancer Prevention Trial. Though the synthetic data method is applicable to a general regression context, to provide some justification, we show in two special cases that the asymptotic variances of the parameter estimates in the Y|X,B model are identical to those from an alternative constrained maximum likelihood estimation approach. This correspondence in special cases and the method's broad applicability make it appealing for use across diverse scenarios.
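The steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify these details, so the following are all assumptions made for illustration: a linear model with a single predictor X, a known Y|X model summarized by an intercept, a slope and a residual standard deviation, synthetic X values resampled from the internal data, and a simple regression-based stochastic imputation of B given (X, Y) that ignores uncertainty in the imputation parameters. The function names `fit_ols` and `synthetic_data_fit` are hypothetical.

```python
import numpy as np


def fit_ols(design, response):
    """Return least-squares coefficients and the residual standard deviation."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    resid = response - design @ coef
    dof = max(len(response) - design.shape[1], 1)
    return coef, np.sqrt(resid @ resid / dof)


def synthetic_data_fit(y, x, b, known_coef, known_sd, m, n_imputations=20, seed=0):
    """Estimate (intercept, X, B) coefficients of Y|X,B from n real + m synthetic rows."""
    rng = np.random.default_rng(seed)
    n = len(y)

    # Step 1: create m synthetic rows. ASSUMPTION: X is resampled from the
    # internal data, and Y is drawn from the known linear Y|X model.
    x_syn = rng.choice(x, size=m, replace=True)
    y_syn = known_coef[0] + known_coef[1] * x_syn + rng.normal(0.0, known_sd, m)

    # Step 2: B is missing for the synthetic rows. ASSUMPTION: impute it from a
    # linear model for B given (X, Y), fitted on the n complete rows.
    imp_coef, imp_sd = fit_ols(np.column_stack([np.ones(n), x, y]), b)
    syn_design = np.column_stack([np.ones(m), x_syn, y_syn])

    x_all = np.concatenate([x, x_syn])
    y_all = np.concatenate([y, y_syn])

    # Step 3: multiple imputation. Fit Y ~ X + B on each completed dataset of
    # size n + m and average the coefficient estimates across imputations.
    estimates = []
    for _ in range(n_imputations):
        b_syn = syn_design @ imp_coef + rng.normal(0.0, imp_sd, m)
        b_all = np.concatenate([b, b_syn])
        design = np.column_stack([np.ones(n + m), x_all, b_all])
        coef, _ = fit_ols(design, y_all)
        estimates.append(coef)
    return np.mean(estimates, axis=0)


# Usage on simulated data (all numbers below are made up for illustration).
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
b = 0.5 * x + rng.normal(size=n)
y = 1.0 + 2.0 * x + 1.5 * b + rng.normal(size=n)

# Reduced model implied by the simulation: slope 2 + 1.5 * 0.5 = 2.75 and
# residual variance 1.5**2 * 1 + 1 = 3.25.
est = synthetic_data_fit(y, x, b, known_coef=(1.0, 2.75), known_sd=np.sqrt(3.25), m=400)
print(est)
```

A fuller analysis would also draw the imputation parameters in each round and combine variances with Rubin's rules; the sketch averages point estimates only.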

Keywords:  Synthetic data; constrained maximum likelihood; data integration; prediction models

Year:  2019        PMID: 32773922      PMCID: PMC7410329          DOI: 10.1002/cjs.11513

Source DB:  PubMed          Journal:  Can J Stat        ISSN: 0319-5724            Impact factor:   0.875


  10 in total

1.  Informing a Risk Prediction Model for Binary Outcomes with External Coefficient Information.

Authors:  Wenting Cheng; Jeremy M G Taylor; Tian Gu; Scott A Tomlins; Bhramar Mukherjee
Journal:  J R Stat Soc Ser C Appl Stat       Date:  2018-08-13       Impact factor: 1.864

2.  Sequential BART for imputation of missing covariates.

Authors:  Dandan Xu; Michael J Daniels; Almut G Winterstein
Journal:  Biostatistics       Date:  2016-03-15       Impact factor: 5.899

3.  Improving estimation and prediction in linear regression incorporating external information from an established reduced model.

Authors:  Wenting Cheng; Jeremy M G Taylor; Pantel S Vokonas; Sung Kyun Park; Bhramar Mukherjee
Journal:  Stat Med       Date:  2018-01-24       Impact factor: 2.373

4.  Empirical Bayes Estimation and Prediction Using Summary-Level Information From External Big Data Sources Adjusting for Violations of Transportability.

Authors:  Jason P Estes; Bhramar Mukherjee; Jeremy M G Taylor
Journal:  Stat Biosci       Date:  2018-05-14

5.  A simple-to-use method incorporating genomic markers into prostate cancer risk prediction tools facilitated future validation.

Authors:  Sonja Grill; Mahdi Fallah; Robin J Leach; Ian M Thompson; Kari Hemminki; Donna P Ankerst
Journal:  J Clin Epidemiol       Date:  2015-01-14       Impact factor: 6.437

6.  Urine TMPRSS2:ERG Plus PCA3 for Individualized Prostate Cancer Risk Assessment.

Authors:  Scott A Tomlins; John R Day; Robert J Lonigro; Daniel H Hovelson; Javed Siddiqui; L Priya Kunju; Rodney L Dunn; Sarah Meyer; Petrea Hodge; Jack Groskopf; John T Wei; Arul M Chinnaiyan
Journal:  Eur Urol       Date:  2015-05-16       Impact factor: 20.096

7.  Projecting individualized probabilities of developing breast cancer for white females who are being examined annually.

Authors:  M H Gail; L A Brinton; D P Byar; D K Corle; S B Green; C Schairer; J J Mulvihill
Journal:  J Natl Cancer Inst       Date:  1989-12-20       Impact factor: 13.506

8.  Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources.

Authors:  Nilanjan Chatterjee; Yi-Hau Chen; Paige Maas; Raymond J Carroll
Journal:  J Am Stat Assoc       Date:  2016-05-05       Impact factor: 5.033

9.  Comparison of approaches for incorporating new information into existing risk prediction models.

Authors:  Sonja Grill; Donna P Ankerst; Mitchell H Gail; Nilanjan Chatterjee; Ruth M Pfeiffer
Journal:  Stat Med       Date:  2016-12-11       Impact factor: 2.373

10.  Colorectal cancer risk prediction tool for white men and women without known susceptibility.

Authors:  Andrew N Freedman; Martha L Slattery; Rachel Ballard-Barbash; Gordon Willis; Bette J Cann; David Pee; Mitchell H Gail; Ruth M Pfeiffer
Journal:  J Clin Oncol       Date:  2008-12-29       Impact factor: 44.544
