
Reflection on modern methods: good practices for applied statistical learning in epidemiology.

Yanelli Nunez, Elizabeth A Gibson, Eva M Tanner, Chris Gennings, Brent A Coull, Jeff Goldsmith, Marianthi-Anna Kioumourtzoglou

Abstract

Statistical learning includes methods that extract knowledge from complex data. Statistical learning methods beyond generalized linear models, such as shrinkage methods or kernel smoothing methods, are increasingly implemented in public health research and epidemiology because they can perform better with complex or high-dimensional data, settings in which traditional statistical methods fail. These novel methods, however, often include random sampling, which may induce variability in results. Best practices in data science can help to ensure robustness. As a case study, we included four statistical learning models that have been applied previously to analyze the relationship between environmental mixtures and health outcomes. We ran each model across 100 initializing values for random number generation, or 'seeds', and assessed variability in the resulting estimation and inference. All methods exhibited some seed-dependent variability in results. The degree of variability differed across methods and exposures of interest. Any statistical learning method reliant on a random seed will exhibit some degree of seed sensitivity. We recommend that researchers implementing these methods repeat their analysis with various seeds as a sensitivity analysis, to enhance the interpretability and robustness of results.
© The Author(s) 2021; all rights reserved. Published by Oxford University Press on behalf of the International Epidemiological Association.
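
The seed sensitivity analysis described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or any of their four models: `bootstrap_slope` is a hypothetical toy estimator (a bootstrap-averaged regression slope) standing in for any method that relies on random sampling, `seed_sensitivity` is an illustrative helper name, and the data are simulated. Only the Python standard library is assumed.

```python
import random
import statistics


def bootstrap_slope(x, y, seed, n_boot=100):
    """Toy seed-dependent estimator (hypothetical stand-in for a real model):
    the mean least-squares slope over bootstrap resamples of (x, y)."""
    rng = random.Random(seed)  # local generator: same seed, same result
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        mx, my = statistics.fmean(xs), statistics.fmean(ys)
        sxx = sum((a - mx) ** 2 for a in xs)
        if sxx == 0:  # degenerate resample (all x identical): skip it
            continue
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        slopes.append(sxy / sxx)
    return statistics.fmean(slopes)


def seed_sensitivity(fit, seeds):
    """Run fit(seed) once per seed and summarize the spread of the estimates."""
    estimates = [fit(s) for s in seeds]
    return {
        "n_seeds": len(estimates),
        "min": min(estimates),
        "median": statistics.median(estimates),
        "max": max(estimates),
        "sd": statistics.stdev(estimates),
    }


# Simulated data (an assumption for illustration): true slope 0.5 plus noise.
data_rng = random.Random(0)
x = [data_rng.gauss(0, 1) for _ in range(100)]
y = [0.5 * xi + data_rng.gauss(0, 1) for xi in x]

# Repeat the analysis across 100 seeds, as in the case study design.
summary = seed_sensitivity(lambda s: bootstrap_slope(x, y, seed=s), range(1, 101))
print(summary)
```

Reporting the range and standard deviation of the estimate across seeds, rather than the result from a single seed, is one simple way to show readers how much of a finding depends on the random number stream.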


Keywords:  Bayesian statistics; Statistical learning; environmental mixtures; machine learning; penalized regression; random seed


Year: 2021; PMID: 34000733; PMCID: PMC8128480; DOI: 10.1093/ije/dyaa259

Source DB: PubMed; Journal: Int J Epidemiol; ISSN: 0300-5771; Impact factor: 9.685



