
Efficient Semiparametric Inference Under Two-Phase Sampling, With Applications to Genetic Association Studies.

Ran Tao, Donglin Zeng, Dan-Yu Lin.

Abstract

In modern epidemiological and clinical studies, the covariates of interest may involve genome sequencing, biomarker assay, or medical imaging and thus are prohibitively expensive to measure on a large number of subjects. A cost-effective solution is the two-phase design, under which the outcome and inexpensive covariates are observed for all subjects during the first phase, and that information is used to select subjects for measurement of expensive covariates during the second phase. For example, subjects with extreme values of quantitative traits were selected for whole-exome sequencing in the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP). Herein, we consider general two-phase designs, where the outcome can be continuous or discrete, and inexpensive covariates can be continuous and correlated with expensive covariates. We propose a semiparametric approach to regression analysis by approximating the conditional density functions of expensive covariates given inexpensive covariates with B-spline sieves. We devise a computationally efficient and numerically stable EM algorithm to maximize the sieve likelihood. In addition, we establish the consistency, asymptotic normality, and asymptotic efficiency of the estimators. Furthermore, we demonstrate the superiority of the proposed methods over existing ones through extensive simulation studies. Finally, we present applications to the aforementioned NHLBI ESP.
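The B-spline sieve idea described above can be illustrated with a minimal sketch. It rests on assumptions that are not taken from this record. The expensive covariate X is treated as discrete with support points x_1, ..., x_K. Its conditional distribution given the inexpensive covariate Z is approximated as P(X = x_k | Z = z) ≈ Σ_j B_j(z) p_kj, using cubic B-spline basis functions B_j and nonnegative coefficients p_kj that sum to 1 over k for each j. The outcome follows a normal linear model with made-up parameter values. All function names (`bspline_basis`, `conditional_pmf`, `posterior_weights`, `normal_linear_density`) are hypothetical. `posterior_weights` shows the kind of E-step weight an EM algorithm could assign to each candidate x_k for a subject measured only in phase one. It is not the authors' implementation.

```python
import math

# Hypothetical sketch of a B-spline sieve for P(X = x_k | Z = z); the
# functional form and all names below are assumptions, not the paper's code.


def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function of `degree` at x via the
    Cox-de Boor recursion. `knots` must be non-decreasing."""
    n_basis = len(knots) - degree - 1
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
         for i in range(len(knots) - 1)]
    # Close the last non-empty interval so x == right boundary is covered.
    if x == knots[-1]:
        for i in range(len(knots) - 2, -1, -1):
            if knots[i] < knots[i + 1]:
                b[i] = 1.0
                break
    # Raise the degree one step at a time; 0/0 terms are treated as 0.
    for d in range(1, degree + 1):
        nxt = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * b[i] if left_den > 0 else 0.0
            right = ((knots[i + d + 1] - x) / right_den * b[i + 1]
                     if right_den > 0 else 0.0)
            nxt.append(left + right)
        b = nxt
    return b[:n_basis]


def conditional_pmf(z, knots, degree, p):
    """Assumed sieve form P(X = x_k | Z = z) ~ sum_j B_j(z) * p[k][j].
    If every column of p is a probability vector, the result sums to 1."""
    basis = bspline_basis(z, knots, degree)
    return [sum(bj * pkj for bj, pkj in zip(basis, row)) for row in p]


def posterior_weights(y, z, support, knots, degree, p, outcome_density):
    """E-step-style weights for a subject whose X was not measured:
    w_k is proportional to f(y | x_k, z) * P(X = x_k | Z = z)."""
    prior = conditional_pmf(z, knots, degree, p)
    unnorm = [outcome_density(y, xk, z) * pk for xk, pk in zip(support, prior)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def normal_linear_density(y, x, z, beta=(0.0, 0.5, 0.2), sigma=1.0):
    """Assumed outcome model Y ~ N(beta0 + beta1*x + beta2*z, sigma^2);
    the parameter values are illustrative only."""
    mean = beta[0] + beta[1] * x + beta[2] * z
    return (math.exp(-0.5 * ((y - mean) / sigma) ** 2)
            / (sigma * math.sqrt(2.0 * math.pi)))


if __name__ == "__main__":
    # Clamped cubic knots on [0, 3] give 6 basis functions.
    knots = [0, 0, 0, 0, 1, 2, 3, 3, 3, 3]
    # Two support points for X; each column of p sums to 1 over k.
    p = [[0.2, 0.4, 0.5, 0.6, 0.7, 0.9],
         [0.8, 0.6, 0.5, 0.4, 0.3, 0.1]]
    print(conditional_pmf(1.5, knots, 3, p))
    print(posterior_weights(1.0, 1.5, [0, 1], knots, 3, p, normal_linear_density))
```

With clamped knots, the basis functions are nonnegative and sum to 1 on [0, 3]. The sieve is therefore a mixture of the column distributions of p and always a valid probability mass function. In a full EM fit, p and the regression parameters would be re-estimated from these weights. That M-step is omitted here.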


Keywords:  Biased sampling; EM algorithm; Genome sequencing; Response-selective sampling; Semiparametric efficiency; Sieve approximation

Year:  2017        PMID: 29479125      PMCID: PMC5823539          DOI: 10.1080/01621459.2017.1295864

Source DB:  PubMed          Journal:  J Am Stat Assoc        ISSN: 0162-1459            Impact factor:   5.033


  15 in total

1.  Two-Phase, Generalized Case-Control Designs for the Study of Quantitative Longitudinal Outcomes.

Authors:  Jonathan S Schildcrout; Sebastien Haneuse; Ran Tao; Leila R Zelnick; Enrique F Schisterman; Shawn P Garbett; Nathaniel D Mercaldo; Paul J Rathouz; Patrick J Heagerty
Journal:  Am J Epidemiol       Date:  2020-02-28       Impact factor: 4.897

2.  Novel two-phase sampling designs for studying binary outcomes.

Authors:  Le Wang; Matthew L Williams; Yong Chen; Jinbo Chen
Journal:  Biometrics       Date:  2019-11-14       Impact factor: 2.571

3.  Estimating Additive Interaction Effect in Stratified Two-Phase Case-Control Design.

Authors:  Ai Ni; Jaya M Satagopan
Journal:  Hum Hered       Date:  2019-10-21       Impact factor: 0.444

4.  Optimal sampling for design-based estimators of regression models.

Authors:  Tong Chen; Thomas Lumley
Journal:  Stat Med       Date:  2022-01-06       Impact factor: 2.373

5.  Errors in multiple variables in human immunodeficiency virus (HIV) cohort and electronic health record data: statistical challenges and opportunities.

Authors:  Bryan E Shepherd; Pamela A Shaw
Journal:  Stat Commun Infect Dis       Date:  2020-10-07

6.  Generalized case-control sampling under generalized linear models.

Authors:  Jacob M Maronge; Ran Tao; Jonathan S Schildcrout; Paul J Rathouz
Journal:  Biometrics       Date:  2021-09-29       Impact factor: 1.701

7.  Two-wave two-phase outcome-dependent sampling designs, with applications to longitudinal binary data.

Authors:  Ran Tao; Nathaniel D Mercaldo; Sebastien Haneuse; Jacob M Maronge; Paul J Rathouz; Patrick J Heagerty; Jonathan S Schildcrout
Journal:  Stat Med       Date:  2021-01-13       Impact factor: 2.373

8.  Efficient semiparametric inference for two-phase studies with outcome and covariate measurement errors.

Authors:  Ran Tao; Sarah C Lotspeich; Gustavo Amorim; Pamela A Shaw; Bryan E Shepherd
Journal:  Stat Med       Date:  2020-11-03       Impact factor: 2.373

9.  Optimal multiwave sampling for regression modeling in two-phase designs.

Authors:  Tong Chen; Thomas Lumley
Journal:  Stat Med       Date:  2020-10-05       Impact factor: 2.373

10.  Efficient odds ratio estimation under two-phase sampling using error-prone data from a multi-national HIV research cohort.

Authors:  Sarah C Lotspeich; Bryan E Shepherd; Gustavo G C Amorim; Pamela A Shaw; Ran Tao
Journal:  Biometrics       Date:  2021-07-02       Impact factor: 2.571

