Automatic identification of variables in epidemiological datasets using logic regression.

Matthias W Lorenz, Negin Ashtiani Abdi, Frank Scheckenbach, Anja Pflug, Alpaslan Bülbül, Alberico L Catapano, Stefan Agewall, Marat Ezhov, Michiel L Bots, Stefan Kiechl, Andreas Orth.

Abstract

BACKGROUND: For an individual participant data (IPD) meta-analysis, multiple datasets must be transformed into a consistent format, e.g. by using uniform variable names. When large numbers of datasets have to be processed, this can be a time-consuming and error-prone task. Automated or semi-automated identification of variables can help to reduce the workload and improve data quality. For semi-automation, high sensitivity in recognizing matching variables is particularly important: it makes it possible to build software that, for each target variable, presents a shortlist of candidate source variables from which a user picks the matching one, with only a low risk that the correct source variable is missing from the list.
METHODS: For each variable in a set of target variables, a number of simple rules were created manually. For every target variable, logic regression was used to search for an optimal Boolean combination of these rules, using a random subset of a large database of epidemiological and clinical cohort data (construction subset). These optimal rule combinations were then validated in a second subset of the same database (validation subset).
RESULTS: In the construction sample, 41 target variables were allocated with an average positive predictive value (PPV) of 34% and an average negative predictive value (NPV) of 95%. In the validation sample, PPV was 33% and NPV was 94%. PPV was 50% or less for 63% of all variables in the construction sample and for 71% of all variables in the validation sample.
CONCLUSIONS: We demonstrated that applying logic regression to a complex data management task in large epidemiological IPD meta-analyses is feasible. However, the performance of the algorithm is poor, so backup strategies may be required.
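
The approach described in METHODS and RESULTS can be illustrated with a minimal Python sketch. It is not the authors' implementation: the rules, variable names, labels and toy datasets below are all hypothetical, and an exhaustive search over one- and two-rule Boolean expressions stands in for the search performed by logic regression. The sketch selects the expression with the fewest misclassifications on a construction set and then reports PPV and NPV on a separate validation set.

```python
from itertools import combinations

# Hypothetical hand-written rules for the target variable "age".
RULES = {
    "name_starts_age": lambda name, label: name.lower().startswith("age"),
    "label_has_years": lambda name, label: "year" in label.lower() or "jahr" in label.lower(),
    "label_has_group": lambda name, label: "group" in label.lower(),
    "label_has_follow": lambda name, label: "follow" in label.lower(),
}

# Toy data (assumed): (source variable name, label, is it truly "age"?)
CONSTRUCTION = [
    ("age", "age at baseline (years)", True),
    ("alter", "Alter in Jahren", True),
    ("age_grp", "age group", False),
    ("sex", "sex of participant", False),
    ("sbp", "systolic blood pressure", False),
    ("agedx", "age at diagnosis", False),
    ("fu_time", "follow-up time (years)", False),
]
VALIDATION = [
    ("age_yrs", "Age in years", True),
    ("dur", "duration of diabetes (years)", False),
    ("bmi", "body mass index", False),
    ("sex", "sex", False),
]


def apply_rules(name, label):
    """Evaluate every simple rule on one source variable."""
    return {rule: check(name, label) for rule, check in RULES.items()}


def candidate_expressions():
    """Yield (description, predicate) for single literals and AND/OR pairs."""
    literals = []
    for rule in RULES:
        literals.append((rule, lambda row, r=rule: row[r]))
        literals.append((f"NOT {rule}", lambda row, r=rule: not row[r]))
    yield from literals
    for (n1, f1), (n2, f2) in combinations(literals, 2):
        yield f"({n1} AND {n2})", lambda row, a=f1, b=f2: a(row) and b(row)
        yield f"({n1} OR {n2})", lambda row, a=f1, b=f2: a(row) or b(row)


def confusion(expr, dataset):
    """Return (tp, fp, tn, fn) for a Boolean expression on a dataset."""
    tp = fp = tn = fn = 0
    for name, label, truth in dataset:
        predicted = expr(apply_rules(name, label))
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif not predicted and not truth:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def ppv_npv(tp, fp, tn, fn):
    """PPV = TP / (TP + FP); NPV = TN / (TN + FN); None if undefined."""
    ppv = tp / (tp + fp) if tp + fp else None
    npv = tn / (tn + fn) if tn + fn else None
    return ppv, npv


def misclassified(candidate):
    """Number of wrong allocations on the construction set."""
    _, fp, _, fn = confusion(candidate[1], CONSTRUCTION)
    return fp + fn


# Pick the best expression on the construction set, then validate it.
best_name, best_expr = min(candidate_expressions(), key=misclassified)
construction_errors = misclassified((best_name, best_expr))
val_ppv, val_npv = ppv_npv(*confusion(best_expr, VALIDATION))
print(best_name, construction_errors, val_ppv, val_npv)
```

In this toy setting the selected expression fits the construction set perfectly but produces a false positive in the validation set, which lowers PPV while NPV stays high. This mirrors, in miniature only, the pattern reported in RESULTS; the numbers themselves carry no meaning beyond the illustration.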

Keywords: Data management; Epidemiology; Logic regression; Meta-analysis

Year: 2017 PMID: 28407816 PMCID: PMC5390441 DOI: 10.1186/s12911-017-0429-1

Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
