
Two-Phase Sampling Designs for Data Validation in Settings with Covariate Measurement Error and Continuous Outcome.

Gustavo Amorim¹, Ran Tao¹˒², Sarah Lotspeich¹, Pamela A Shaw³, Thomas Lumley⁴, Bryan E Shepherd¹.

Abstract

Measurement errors are present in many data collection procedures and can harm analyses by biasing estimates. To correct for measurement error, researchers often validate a subsample of records and then incorporate the information learned from this validation sample into estimation. In practice, the validation sample is often selected using simple random sampling (SRS). However, SRS leads to inefficient estimates because it ignores information on the error-prone variables, which can be highly correlated with the unknown truth. Applying and extending ideas from the two-phase sampling literature, we propose optimal and nearly optimal designs for selecting the validation sample in the classical measurement-error framework. We target designs to improve the efficiency of model-based and design-based estimators, and show how the resulting designs compare with each other. Our results suggest that sampling schemes that extract more information from the error-prone data are substantially more efficient than SRS, for both design- and model-based estimators. The optimal design, however, depends on the analysis method, and the designs that are optimal for different methods can differ substantially. These findings are supported by theory and simulations. We illustrate the various designs using data from an HIV cohort study.
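The abstract contrasts SRS with validation designs that exploit the error-prone phase-one data. The Python below is a minimal sketch of that general idea, not the authors' designs or estimators. It simulates classical measurement error, X* = X + U. Under that model the naive slope of Y on X* is attenuated by σ²_X / (σ²_X + σ²_U), which is 0.8 for the values assumed here. The sketch then stratifies on X*, splits the validation sample across strata by standard Neyman allocation (n_h ∝ N_h S_h), and compares the Monte Carlo variance of an inverse-probability-weighted slope using validated X against SRS. Several choices are illustrative assumptions: the function names, the cut points at ±1, the error SD of 0.5, and the use of the within-stratum SD of a naive residual-based proxy as S_h.

```python
import math
import random
import statistics


def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Split a phase-two sample of size n so that n_h is proportional to N_h * S_h.

    Strata whose share would exceed their size N_h are validated in full and the
    remaining budget is re-split among the others; largest-remainder rounding
    makes the allocations sum exactly to n.
    """
    if n > sum(stratum_sizes):
        raise ValueError("validation sample larger than the phase-one sample")
    if any(s <= 0 for s in stratum_sizes) or any(s <= 0 for s in stratum_sds):
        raise ValueError("stratum sizes and SDs must be positive")
    alloc = [0] * len(stratum_sizes)
    active = set(range(len(stratum_sizes)))
    budget = n
    while True:
        total = sum(stratum_sizes[h] * stratum_sds[h] for h in active)
        shares = {h: budget * stratum_sizes[h] * stratum_sds[h] / total for h in active}
        capped = [h for h in active if shares[h] >= stratum_sizes[h]]
        if not capped:
            break
        for h in capped:  # over-allocated strata are sampled in full
            alloc[h] = stratum_sizes[h]
            budget -= stratum_sizes[h]
            active.remove(h)
    floors = {h: math.floor(shares[h]) for h in active}
    leftover = budget - sum(floors.values())
    by_remainder = sorted(active, key=lambda h: shares[h] - floors[h], reverse=True)
    for h in active:
        alloc[h] = floors[h]
    for h in by_remainder[:leftover]:  # hand out the remaining units by largest remainder
        alloc[h] += 1
    return alloc


def stratified_sample(strata, alloc, rng):
    """Sample alloc[h] records without replacement from each stratum h.

    Returns the selected indices and their design weights N_h / n_h.
    """
    chosen, weights = [], []
    for h, n_h in enumerate(alloc):
        if n_h == 0:
            continue
        members = [i for i, s in enumerate(strata) if s == h]
        chosen.extend(rng.sample(members, n_h))
        weights.extend([len(members) / n_h] * n_h)
    return chosen, weights


def weighted_slope(x, y, w):
    """Weighted least-squares slope of y on x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


if __name__ == "__main__":
    rng = random.Random(2021)
    N, n, reps = 2000, 200, 300
    # Hypothetical phase one: Y and the error-prone X* = X + U are observed on everyone.
    x = [rng.gauss(0, 1) for _ in range(N)]
    x_star = [xi + rng.gauss(0, 0.5) for xi in x]
    y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]
    # Illustrative strata on X*: lower tail, middle, upper tail.
    strata = [0 if v < -1 else (2 if v > 1 else 1) for v in x_star]
    # Assumed S_h: SD of (X* - mean) * naive residual, computed from phase-one data only.
    mx, my = statistics.fmean(x_star), statistics.fmean(y)
    b = weighted_slope(x_star, y, [1.0] * N)
    a = my - b * mx
    proxy = [(v - mx) * (yi - a - b * v) for v, yi in zip(x_star, y)]
    sizes = [strata.count(h) for h in range(3)]
    sds = [statistics.pstdev([p for p, s in zip(proxy, strata) if s == h]) for h in range(3)]
    alloc = neyman_allocation(sizes, sds, n)

    srs_est, ney_est = [], []
    for _ in range(reps):  # repeat only the phase-two draw; phase one stays fixed
        idx = rng.sample(range(N), n)
        srs_est.append(weighted_slope([x[i] for i in idx], [y[i] for i in idx], [1.0] * n))
        idx, w = stratified_sample(strata, alloc, rng)
        ney_est.append(weighted_slope([x[i] for i in idx], [y[i] for i in idx], w))
    print("Neyman allocation:", alloc)
    print("SRS    variance of slope:", statistics.variance(srs_est))
    print("Neyman variance of slope:", statistics.variance(ney_est))
```

Only the phase-two draw is repeated, so the printed variances reflect the design choice for one fixed phase-one sample. How much the stratified design gains over SRS will depend on the assumed error variance, strata, and choice of S_h.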


Keywords:  Design-based estimator; Linear regression; Measurement error; Model-based estimator; Two-phase design

Year:  2021        PMID: 34975235      PMCID: PMC8715909          DOI: 10.1111/rssa.12689

Source DB:  PubMed          Journal:  J R Stat Soc Ser A Stat Soc        ISSN: 0964-1998            Impact factor:   2.175
