Interaction analysis under misspecification of main effects: Some common mistakes and simple solutions.

Min Zhang, Youfei Yu, Shikun Wang, Maxwell Salvatore, Lars G Fritsche, Zihuai He, Bhramar Mukherjee.

Abstract

The practice of modeling interaction with two linear main effects and a product term is ubiquitous in the statistical and epidemiological literature. Most data modelers are aware that misspecification of the main effects can cause severe type I error inflation in tests for interaction, leading to spurious detection of interactions. However, modeling practice has not changed. In this article, we focus on the specific situation where the main effects in the model are misspecified as linear terms and characterize the impact of this misspecification on common tests for statistical interaction. We then propose some simple alternatives that fix the potential type I error inflation in tests for interaction caused by main effect misspecification. We show that when the sandwich variance estimator is used for a linear regression model with a quantitative outcome and two independent factors, both the Wald and score tests asymptotically maintain the correct type I error rate. However, if the independence assumption does not hold or the outcome is binary, using the sandwich estimator does not fix the problem. We further demonstrate that flexibly modeling the main effects under a generalized additive model can largely reduce or often remove bias in the estimates and maintain the correct type I error rate for both quantitative and binary outcomes, regardless of the independence assumption. We show that, under the independence assumption and for a continuous outcome, overfitting and flexibly modeling the main effects does not lead to asymptotic power loss relative to a correctly specified main effect model. Our simulation study further demonstrates empirically that using flexible models for the main effects does not, in general, result in a significant loss of power for testing interaction. Our results provide an improved understanding of the strengths and limitations of tests for interaction in the presence of main effect misspecification. Using data from a large biobank study, the Michigan Genomics Initiative, we present two examples of interaction analysis in support of our results.
© 2020 John Wiley & Sons, Ltd.
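
The sandwich-variance result described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the article: the data-generating model (quadratic main effects, no true interaction, independent standard normal factors), the sample sizes, the HC0 form of the sandwich estimator, and the function names interaction_wald_tests and rejection_rates are all assumptions chosen for illustration. It fits a linear model with two linear main effects and a product term, then compares the Wald test for the product term under the model-based variance and under the sandwich variance.

```python
import numpy as np


def interaction_wald_tests(y, g, e):
    """Fit y ~ 1 + g + e + g*e by OLS and return Wald z-statistics for the
    product term under (model-based, sandwich HC0) variance estimates."""
    X = np.column_stack([np.ones_like(g), g, e, g * e])
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Model-based variance: assumes a correctly specified mean and constant variance.
    sigma2 = resid @ resid / (n - p)
    cov_model = sigma2 * XtX_inv

    # Sandwich (HC0) variance: bread * meat * bread, meat = sum_i r_i^2 x_i x_i'.
    meat = (X * resid[:, None] ** 2).T @ X
    cov_sandwich = XtX_inv @ meat @ XtX_inv

    z_model = beta[3] / np.sqrt(cov_model[3, 3])
    z_sandwich = beta[3] / np.sqrt(cov_sandwich[3, 3])
    return z_model, z_sandwich


def rejection_rates(n_rep=500, n=500, seed=0):
    """Empirical type I error at the nominal 5% level when the true main
    effects are quadratic (assumed here) but are fitted as linear terms."""
    rng = np.random.default_rng(seed)
    crit = 1.959963984540054  # two-sided 5% critical value of N(0, 1)
    rejections = np.zeros(2)
    for _ in range(n_rep):
        g = rng.normal(size=n)  # first factor
        e = rng.normal(size=n)  # second factor, independent of g
        y = g**2 + e**2 + rng.normal(size=n)  # no interaction in the truth
        z_model, z_sandwich = interaction_wald_tests(y, g, e)
        rejections += [abs(z_model) > crit, abs(z_sandwich) > crit]
    return rejections / n_rep


if __name__ == "__main__":
    model_rate, sandwich_rate = rejection_rates()
    print(f"model-based Wald rejection rate: {model_rate:.3f}")
    print(f"sandwich Wald rejection rate:    {sandwich_rate:.3f}")
```

Under these assumed settings the model-based test should reject far more often than the nominal 5%, while the sandwich-based test should stay much closer to it. The sketch covers only the continuous-outcome, independent-factor case; per the abstract, the sandwich estimator does not fix the problem when the factors are dependent or the outcome is binary.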

Keywords:  gene-environment interaction; generalized additive model (GAM); independence; joint tests; power; robust tests; sandwich variance estimator; type I error

Year:  2020        PMID: 32101638     DOI: 10.1002/sim.8505

Source DB:  PubMed          Journal:  Stat Med        ISSN: 0277-6715            Impact factor:   2.373


  2 in total

1.  A hierarchical integrative group least absolute shrinkage and selection operator for analyzing environmental mixtures.

Authors:  Jonathan Boss; Alexander Rix; Yin-Hsiu Chen; Naveen N Narisetty; Zhenke Wu; Kelly K Ferguson; Thomas F McElrath; John D Meeker; Bhramar Mukherjee
Journal:  Environmetrics       Date:  2021-07-30       Impact factor: 1.527

2.  GEM: scalable and flexible gene-environment interaction analysis in millions of samples.

Authors:  Kenneth E Westerman; Duy T Pham; Liang Hong; Ye Chen; Magdalena Sevilla-González; Yun Ju Sung; Yan V Sun; Alanna C Morrison; Han Chen; Alisa K Manning
Journal:  Bioinformatics       Date:  2021-10-25       Impact factor: 6.931

