Automatic identification of variables in epidemiological datasets using logic regression.

Matthias W Lorenz, Negin Ashtiani Abdi, Frank Scheckenbach, Anja Pflug, Alpaslan Bülbül, Alberico L Catapano, Stefan Agewall, Marat Ezhov, Michiel L Bots, Stefan Kiechl, Andreas Orth.

Abstract

BACKGROUND: For an individual participant data (IPD) meta-analysis, multiple datasets must be transformed into a consistent format, e.g. using uniform variable names. When large numbers of datasets have to be processed, this can be a time-consuming and error-prone task. Automated or semi-automated identification of variables can help to reduce the workload and improve data quality. For semi-automation, high sensitivity in recognizing matching variables is particularly important: it makes it possible to build software that presents, for each target variable, a shortlist of candidate source variables from which a user picks the matching one, with only a low risk that the correct source variable is missing from the list.
METHODS: For each variable in a set of target variables, a number of simple rules were created manually. Using logic regression, an optimal Boolean combination of these rules was searched for each target variable in a random subset of a large database of epidemiological and clinical cohort data (construction subset). These optimal rule combinations were then validated in a second subset of the database (validation subset).
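The idea of searching for a Boolean combination of simple rules can be illustrated with a minimal sketch. It is not the authors' implementation: logic regression normally searches over logic trees with a stochastic procedure such as simulated annealing, whereas this sketch exhaustively scores single rules and pairwise AND/OR combinations. The three rules, the variable metadata fields (`name`, `dtype`, `min`, `max`) and the toy construction subset are all hypothetical.

```python
from itertools import combinations

# Hypothetical simple rules for the target variable "age".
# Each rule maps the metadata of one source variable to True/False.
RULES = {
    "name_has_age": lambda v: "age" in v["name"].lower(),
    "is_numeric": lambda v: v["dtype"] == "numeric",
    "plausible_range": lambda v: v["min"] is not None
    and 0 <= v["min"] and v["max"] <= 120,
}


def candidates(rule_names):
    """Yield (label, predicate) for single rules and pairwise AND/OR combinations."""
    for r in rule_names:
        yield r, lambda v, r=r: RULES[r](v)
    for a, b in combinations(rule_names, 2):
        yield f"({a} AND {b})", lambda v, a=a, b=b: RULES[a](v) and RULES[b](v)
        yield f"({a} OR {b})", lambda v, a=a, b=b: RULES[a](v) or RULES[b](v)


def misclassification(predicate, data):
    """Count source variables whose predicted match differs from the true label."""
    return sum(bool(predicate(v)) != is_match for v, is_match in data)


def best_combination(data):
    """Return the candidate expression with the fewest misclassifications."""
    return min(candidates(sorted(RULES)),
               key=lambda c: misclassification(c[1], data))


# Toy construction subset (invented): (source variable metadata, is it really "age"?)
construction = [
    ({"name": "AGE", "dtype": "numeric", "min": 35, "max": 89}, True),
    ({"name": "age_at_baseline", "dtype": "numeric", "min": 18, "max": 95}, True),
    ({"name": "dosage_mg", "dtype": "numeric", "min": 0, "max": 500}, False),
    ({"name": "sbp", "dtype": "numeric", "min": 80, "max": 220}, False),
    ({"name": "storage_days", "dtype": "numeric", "min": 0, "max": 3650}, False),
    ({"name": "bmi", "dtype": "numeric", "min": 15, "max": 60}, False),
    ({"name": "sex", "dtype": "text", "min": None, "max": None}, False),
]

label, predicate = best_combination(construction)
print(label, misclassification(predicate, construction))
```

On this toy data no single rule separates the matching variables from the rest, while one pairwise combination does; the selected expression would then be re-scored on a separate validation subset.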
RESULTS: In the construction sample, the 41 target variables were allocated with an average positive predictive value (PPV) of 34% and an average negative predictive value (NPV) of 95%. In the validation sample, PPV was 33% and NPV was 94%. PPV was 50% or less for 63% of all variables in the construction sample and for 71% of all variables in the validation sample.
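For readers less familiar with these measures, the following sketch shows how PPV and NPV are computed for one target variable from predicted and true matches. The function name `ppv_npv` and the example data are illustrative assumptions and do not come from the study.

```python
def ppv_npv(predicted, actual):
    """Return (PPV, NPV) from two equally long sequences of booleans.

    PPV = TP / (TP + FP): share of proposed matches that are correct.
    NPV = TN / (TN + FN): share of rejected variables that are truly non-matches.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    fp = sum(p and not a for p, a in pairs)
    tn = sum(not p and not a for p, a in pairs)
    fn = sum(not p and a for p, a in pairs)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


# Invented example: 8 source variables checked against one target variable.
predicted = [True, True, True, False, False, False, False, False]
actual = [True, False, False, False, False, False, False, True]

ppv, npv = ppv_npv(predicted, actual)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

A low PPV means that many proposed matches are wrong, which a user can still correct in a semi-automated workflow; a false negative, by contrast, is a correct source variable that is never offered to the user.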
CONCLUSIONS: We demonstrated that applying logic regression to a complex data management task in large epidemiological IPD meta-analyses is feasible. However, the performance of the algorithm is poor, so backup strategies may be required.

Keywords:  Data management; Epidemiology; Logic regression; Meta-analysis

Year:  2017        PMID: 28407816      PMCID: PMC5390441          DOI: 10.1186/s12911-017-0429-1

Source DB:  PubMed          Journal:  BMC Med Inform Decis Mak        ISSN: 1472-6947            Impact factor:   2.796


