Maarten van Smeden1, Daniel L Oberski2, Johannes B Reitsma3, Jeroen K Vermunt2, Karel G M Moons3, Joris A H de Groot3. 1. Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Universiteitsweg 100, 3584 CG Utrecht, The Netherlands. Electronic address: M.vanSmeden@umcutrecht.nl. 2. Department of Methodology and Statistics, Tilburg University, PO Box 90153, 5000 LE Tilburg, The Netherlands. 3. Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Universiteitsweg 100, 3584 CG Utrecht, The Netherlands.
Abstract
OBJECTIVES: The objective of this study was to evaluate the performance of goodness-of-fit testing to detect relevant violations of the assumptions underlying the criticized "standard" two-class latent class model. Often used to obtain sensitivity and specificity estimates for diagnostic tests in the absence of a gold reference standard, this model relies on assuming that diagnostic test errors are independent. When this assumption is violated, accuracy estimates may be biased: goodness-of-fit testing is often used to evaluate the assumption and prevent bias. STUDY DESIGN AND SETTING: We investigate the performance of goodness-of-fit testing by Monte Carlo simulation. The simulation scenarios are based on three empirical examples. RESULTS: Goodness-of-fit tests lack power to detect relevant misfit of the standard two-class latent class model at sample sizes that are typically found in empirical diagnostic studies. The goodness-of-fit tests that are based on asymptotic theory are not robust to the sparseness of data. A parametric bootstrap procedure improves the evaluation of goodness of fit in the case of sparse data. CONCLUSION: Our simulation study suggests that relevant violation of the local independence assumption underlying the standard two-class latent class model may remain undetected in empirical diagnostic studies, potentially leading to biased estimates of sensitivity and specificity.
OBJECTIVES: The objective of this study was to evaluate the performance of goodness-of-fit testing to detect relevant violations of the assumptions underlying the criticized "standard" two-class latent class model. Often used to obtain sensitivity and specificity estimates for diagnostic tests in the absence of a gold reference standard, this model relies on assuming that diagnostic test errors are independent. When this assumption is violated, accuracy estimates may be biased: goodness-of-fit testing is often used to evaluate the assumption and prevent bias. STUDY DESIGN AND SETTING: We investigate the performance of goodness-of-fit testing by Monte Carlo simulation. The simulation scenarios are based on three empirical examples. RESULTS: Goodness-of-fit tests lack power to detect relevant misfit of the standard two-class latent class model at sample sizes that are typically found in empirical diagnostic studies. The goodness-of-fit tests that are based on asymptotic theory are not robust to the sparseness of data. A parametric bootstrap procedure improves the evaluation of goodness of fit in the case of sparse data. CONCLUSION: Our simulation study suggests that relevant violation of the local independence assumption underlying the standard two-class latent class model may remain undetected in empirical diagnostic studies, potentially leading to biased estimates of sensitivity and specificity.