Two-stage method to remove population- and individual-level outliers from longitudinal data in a primary care database.

C Welch, I Petersen, K Walters, R W Morris, I Nazareth, E Kalaitzaki, I R White, L Marston, J Carpenter.

Abstract

PURPOSE: In the UK, primary care databases include repeated measurements of health indicators at the individual level. As these databases encompass a large population, some individuals have extreme values, but some values may also be recorded incorrectly. The challenge for researchers is to distinguish between records that are due to incorrect recording and those which represent true but extreme values. This study evaluated different methods to identify outliers.
METHODS: Ten percent of practices were selected at random to evaluate the recording of 513,367 height measurements. Population-level outliers were identified using boundaries derived from Health Survey for England data. Individual-level outliers were identified by fitting a random-effects model with subject-specific slopes to the height measurements, adjusted for age and sex. Any height measurement with a patient-level standardised residual more extreme than ±10 was identified as an outlier and excluded. The model was then refitted twice, with outliers removed at each stage. This method was compared with existing methods of removing outliers.
RESULTS: Most outliers were identified at the population level using the Health Survey for England boundaries (1550 of 1643). Once these were removed from the database, fitting the random-effects model to the remaining data identified only 75 further outliers. This method was more efficient at identifying true outliers than existing methods.
CONCLUSIONS: We propose a new, two-stage approach to identifying outliers in longitudinal data and show that it can successfully identify outliers at both the population and individual levels.
Copyright © 2011 John Wiley & Sons, Ltd.
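
The two-stage procedure described in METHODS can be illustrated with a minimal Python sketch. It is not the authors' implementation. Stage 1 uses placeholder height bounds (the abstract does not give the Health Survey for England boundaries). Stage 2 swaps the random-effects model for a much simpler stand-in: each measurement's deviation from the patient's own median, standardised by a pooled robust scale, with no adjustment for age or sex. All function names, the record layout `(patient_id, height_cm)`, and the numeric bounds are assumptions made for illustration; only the ±10 cutoff and the "fit, then refit twice" pattern come from the abstract.

```python
from collections import defaultdict
from statistics import median

# Placeholder bounds in cm (assumption; the paper derives its bounds from
# Health Survey for England data, which are not given in the abstract).
HEIGHT_MIN_CM = 120.0
HEIGHT_MAX_CM = 220.0
RESIDUAL_CUTOFF = 10.0  # cutoff on the standardised residual, per the abstract


def remove_population_outliers(records, lo=HEIGHT_MIN_CM, hi=HEIGHT_MAX_CM):
    """Stage 1: split (patient_id, height_cm) records by the bounds [lo, hi]."""
    kept, dropped = [], []
    for rec in records:
        (kept if lo <= rec[1] <= hi else dropped).append(rec)
    return kept, dropped


def remove_individual_outliers(records, cutoff=RESIDUAL_CUTOFF, fits=3):
    """Stage 2 stand-in (NOT a random-effects model).

    Residual = height minus the patient's own median height. Residuals are
    standardised by a pooled robust scale (1.4826 * median absolute residual).
    Patients with fewer than 3 measurements are not assessed. The procedure
    runs up to `fits` times (one initial fit plus two refits).
    """
    kept, dropped = list(records), []
    for _ in range(fits):
        by_patient = defaultdict(list)
        for pid, height in kept:
            by_patient[pid].append(height)
        centre = {p: median(h) for p, h in by_patient.items() if len(h) >= 3}
        residuals = [(p, h, h - centre[p]) for p, h in kept if p in centre]
        if not residuals:
            break
        scale = 1.4826 * median(abs(r) for _, _, r in residuals)
        if scale == 0:
            break  # residuals cannot be standardised
        flagged = {(p, h) for p, h, r in residuals if abs(r) / scale > cutoff}
        if not flagged:
            break  # nothing left to remove, so refitting changes nothing
        dropped.extend(rec for rec in kept if rec in flagged)
        kept = [rec for rec in kept if rec not in flagged]
    return kept, dropped


# Made-up example data: one implausible value (17.9 cm) and one value that is
# plausible for the population but inconsistent for patient "A" (150.0 cm).
records = [
    ("A", 170.0), ("A", 170.5), ("A", 171.0), ("A", 150.0),
    ("B", 160.0), ("B", 160.5), ("B", 161.0),
    ("C", 180.0), ("C", 181.0), ("C", 179.5), ("C", 17.9),
]
stage1_kept, population_outliers = remove_population_outliers(records)
clean, individual_outliers = remove_individual_outliers(stage1_kept)
```

Running the population-level filter first matters in this sketch for the same reason the abstract suggests: gross recording errors would otherwise inflate the residual scale and mask the subtler within-patient inconsistencies. A faithful version would replace the median-based stand-in with a mixed model that has patient-specific intercepts and slopes.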

Year:  2011        PMID: 22052713     DOI: 10.1002/pds.2270

Source DB:  PubMed          Journal:  Pharmacoepidemiol Drug Saf        ISSN: 1053-8569            Impact factor:   2.890


  10 in total

1. 

Authors:  Eric I Benchimol; Liam Smeeth; Astrid Guttmann; Katie Harron; David Moher; Irene Petersen; Henrik T Sørensen; Jean-Marie Januel; Erik von Elm; Sinéad M Langan
Journal:  CMAJ       Date:  2019-02-25       Impact factor: 8.262

2.  Comparison of cohort characteristics in Central Africa International Epidemiology Databases to Evaluate AIDS and Demographic Health Surveys: Rwanda and Burundi.

Authors:  Anna Mageras; Ellen Brazier; Théodore Niyongabo; Gad Murenzi; Jean D'Amour Sinayobye; Adebola A Adedimeji; Christella Twizere; Elizabeth A Kelvin; Kathryn Anastos; Denis Nash; Heidi E Jones
Journal:  Int J STD AIDS       Date:  2021-02-03       Impact factor: 1.359

3.  Detection of Outliers Due to Participants' Non-Adherence to Protocol in a Longitudinal Study of Cognitive Decline.

Authors:  Aline Dugravot; Severine Sabia; Martin J Shipley; Catherine Welch; Mika Kivimaki; Archana Singh-Manoux
Journal:  PLoS One       Date:  2015-07-10       Impact factor: 3.240

4.  Longitudinal multiple imputation approaches for body mass index or other variables with very low individual-level variability: the mibmi command in Stata.

Authors:  Evangelos Kontopantelis; Rosa Parisi; David A Springate; David Reeves
Journal:  BMC Res Notes       Date:  2017-01-13

5.  [The REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) statement].

Authors:  Eric I Benchimol; Liam Smeeth; Astrid Guttmann; Katie Harron; Lars G Hemkens; David Moher; Irene Petersen; Henrik T Sørensen; Erik von Elm; Sinéad M Langan
Journal:  Z Evid Fortbild Qual Gesundhwes       Date:  2016-09-28

6.  The value of aspartate aminotransferase and alanine aminotransferase in cardiovascular disease risk assessment.

Authors:  Stephen F Weng; Joe Kai; Indra Neil Guha; Nadeem Qureshi
Journal:  Open Heart       Date:  2015-08-21

7.  The REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) statement.

Authors:  Eric I Benchimol; Liam Smeeth; Astrid Guttmann; Katie Harron; David Moher; Irene Petersen; Henrik T Sørensen; Erik von Elm; Sinéad M Langan
Journal:  PLoS Med       Date:  2015-10-06       Impact factor: 11.069

8.  Artificial Intelligence-Based Multimodal Risk Assessment Model for Surgical Site Infection (AMRAMS): Development and Validation Study.

Authors:  Weijia Chen; Zhijun Lu; Lijue You; Lingling Zhou; Jie Xu; Ken Chen
Journal:  JMIR Med Inform       Date:  2020-06-15

9.  Is it time to stop sweeping data cleaning under the carpet? A novel algorithm for outlier management in growth data.

Authors:  Charlotte S C Woolley; Ian G Handel; B Mark Bronsvoort; Jeffrey J Schoenebeck; Dylan N Clements
Journal:  PLoS One       Date:  2020-01-24       Impact factor: 3.240

10.  Methods to estimate baseline creatinine and define acute kidney injury in lean Ugandan children with severe malaria: a prospective cohort study.

Authors:  Anthony Batte; Michelle C Starr; Andrew L Schwaderer; Robert O Opoka; Ruth Namazzi; Erika S Phelps Nishiguchi; John M Ssenkusu; Chandy C John; Andrea L Conroy
Journal:  BMC Nephrol       Date:  2020-09-29       Impact factor: 2.388
