Erin Jacobsen (1), Xinhui Ran (2), Anran Liu (2), Chung-Chou H Chang (2,3), Mary Ganguli (1,4,5)
1. Department of Psychiatry, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA.
2. Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA.
3. Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA.
4. Department of Epidemiology, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA.
5. Department of Neurology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA.
Abstract
BACKGROUND: Longitudinal studies predictably experience non-random attrition over time. Among older adults, risk factors for attrition may be similar to risk factors for outcomes such as cognitive decline and dementia, potentially biasing study results.
OBJECTIVE: To characterize participants lost to follow-up, which can inform study design and the interpretation of results.
METHODS: In a longitudinal aging population study with 10 years of annual follow-up, we compared the participants who attrited (77%) with those who remained in the study. We used multivariable logistic regression models to identify predictors of attrition. We then implemented four machine learning approaches to predict attrition status from one wave to the next and compared the results of all five approaches.
RESULTS: Multivariable logistic regression identified those more likely to drop out as older, male, and not living with another study participant; as having lower cognitive test scores, higher clinical dementia ratings, lower functional ability, fewer subjective memory complaints, and worse self-rated health; as reporting no physical activity, hobbies, or social activities; and as leaving the house less often. Judged by area under the receiver operating characteristic curve, the four machine learning approaches discriminated about as well as the multivariable logistic regression model.
CONCLUSIONS: Attrition was most likely to occur in participants who were older, male, inactive, socially isolated, and cognitively impaired. Ignoring attrition would bias study results, especially when the missing data might be related to the outcome (e.g., cognitive impairment or dementia). We discuss possible solutions, including oversampling and other statistical modeling approaches.
Keywords:
artificial neural network (ANN); epidemiology; gradient boosting machine (GBM); least absolute shrinkage and selection operator-type regression (LASSO); loss to follow-up; random forest (RF)
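The abstract compares the five approaches by area under the receiver operating characteristic curve (AUC). As a minimal sketch of what that metric measures, the snippet below computes AUC as the probability that a randomly chosen participant who dropped out receives a higher predicted risk than a randomly chosen participant who stayed. The function name `roc_auc`, the two "models", and all values are hypothetical illustrations; none of them come from the study.

```python
from typing import Sequence


def roc_auc(dropped: Sequence[int], scores: Sequence[float]) -> float:
    """Return the AUC for binary labels (1 = dropped out, 0 = retained).

    Computed as the share of (dropout, retained) pairs in which the dropout
    has the higher predicted score; tied pairs count as half.
    """
    pos = [s for y, s in zip(dropped, scores) if y == 1]
    neg = [s for y, s in zip(dropped, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one dropout and one retained participant")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up toy data (NOT study data): attrition status at the next wave and
# predicted dropout probabilities from two hypothetical models.
dropped = [1, 0, 1, 1, 0, 0, 1, 0]
model_a_scores = [0.81, 0.30, 0.62, 0.45, 0.50, 0.12, 0.90, 0.40]
model_b_scores = [0.75, 0.35, 0.70, 0.38, 0.42, 0.20, 0.88, 0.30]

auc_a = roc_auc(dropped, model_a_scores)
auc_b = roc_auc(dropped, model_b_scores)
print(f"hypothetical model A AUC: {auc_a:.4f}")
print(f"hypothetical model B AUC: {auc_b:.4f}")
```

Two models can assign different probabilities to individual participants and still reach the same AUC, because the metric depends only on how dropouts are ranked relative to retained participants. The pairwise loop is quadratic in sample size, so it suits small illustrations rather than large cohorts.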