
Scalable collaborative targeted learning for high-dimensional data.

Cheng Ju, Susan Gruber, Samuel D Lendle, Antoine Chambaz, Jessica M Franklin, Richard Wyss, Sebastian Schneeweiss, Mark J van der Laan.

Abstract

Robust inference of a low-dimensional parameter in a large semi-parametric model relies on external estimators of infinite-dimensional features of the distribution of the data. Typically, only one of the latter is optimized for the sake of constructing a well-behaved estimator of the low-dimensional parameter of interest. Optimizing more than one of them for the sake of achieving a better bias-variance trade-off in the estimation of the parameter of interest is the core idea driving the general template of the collaborative targeted minimum loss-based estimation procedure. The original instantiation of the collaborative targeted minimum loss-based estimation template can be presented as a greedy forward stepwise collaborative targeted minimum loss-based estimation algorithm. It does not scale well when the number p of covariates increases drastically. This motivates the introduction of a novel instantiation of the collaborative targeted minimum loss-based estimation template where the covariates are pre-ordered. Its time complexity is O(p) as opposed to the original O(p^2), a remarkable gain. We propose two pre-ordering strategies and suggest a rule of thumb to develop other meaningful strategies. Because it is usually unclear a priori which pre-ordering strategy to choose, we also introduce another instantiation, called the SL-C-TMLE algorithm, that enables the data-driven choice of the better pre-ordering strategy given the problem at hand. Its time complexity is O(p) as well. The computational burden and relative performance of these algorithms were compared in simulation studies involving fully synthetic data or partially synthetic data based on a real-world large electronic health database, and in analyses of three real, large electronic health databases. In all analyses involving electronic health databases, the greedy collaborative targeted minimum loss-based estimation algorithm is unacceptably slow. Simulation studies seem to indicate that our scalable collaborative targeted minimum loss-based estimation and SL-C-TMLE algorithms work well. All C-TMLEs are publicly available in a Julia software package.
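
The gap between the greedy and the pre-ordered search described in the abstract can be illustrated by counting candidate fits. The following is a minimal sketch, not the authors' implementation and not C-TMLE itself: it only mimics the two search patterns. The names `greedy_forward_order`, `preordered_path`, and `toy_loss` are hypothetical, and `toy_loss` is an assumed stand-in for whatever loss a real procedure would evaluate after refitting its estimators on a covariate subset.

```python
from typing import Callable, List, Sequence, Tuple

Loss = Callable[[Sequence[int]], float]


def greedy_forward_order(p: int, loss: Loss) -> Tuple[List[int], int]:
    """Greedy forward stepwise search over p covariates.

    At every step each remaining covariate is tried, so the number of
    loss evaluations is p + (p - 1) + ... + 1 = p * (p + 1) / 2.
    """
    selected: List[int] = []
    remaining = list(range(p))
    evaluations = 0
    while remaining:
        best_j, best_loss = remaining[0], float("inf")
        for j in remaining:
            evaluations += 1
            candidate_loss = loss(selected + [j])
            if candidate_loss < best_loss:
                best_j, best_loss = j, candidate_loss
        selected.append(best_j)
        remaining.remove(best_j)
    return selected, evaluations


def preordered_path(order: Sequence[int], loss: Loss) -> Tuple[List[float], int]:
    """Pre-ordered search: covariates are added in a fixed order.

    Only one candidate is evaluated per step, so p evaluations in total.
    """
    selected: List[int] = []
    losses: List[float] = []
    evaluations = 0
    for j in order:
        selected.append(j)
        evaluations += 1
        losses.append(loss(selected))
    return losses, evaluations


def toy_loss(subset: Sequence[int]) -> float:
    # Assumption for illustration only: covariate j has "importance" j + 1,
    # and the loss decreases as more important covariates are included.
    return -float(sum(j + 1 for j in subset))


p = 6
greedy_order, greedy_evals = greedy_forward_order(p, toy_loss)
path_losses, preordered_evals = preordered_path(greedy_order, toy_loss)
print(greedy_order, greedy_evals)  # [5, 4, 3, 2, 1, 0] 21
print(preordered_evals)            # 6
```

In this toy setting the greedy search needs p(p+1)/2 evaluations while the pre-ordered path needs p, which is the O(p^2) versus O(p) contrast stated in the abstract. The pre-ordered variant is only as good as the ordering it is given, which is why the choice of pre-ordering strategy matters.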

Keywords:  Observational study; electronic healthcare database; high-dimensional data; propensity score; targeted minimum loss-based estimation; variable selection

Year: 2017    PMID: 28936917    PMCID: PMC6086775    DOI: 10.1177/0962280217729845

Source DB: PubMed    Journal: Stat Methods Med Res    ISSN: 0962-2802    Impact factor: 3.021


