Amand F Schmidt (1), Chris Finan (2). (1) Faculty of Population Health, Institute of Cardiovascular Science, University College London, London WC1E 6BT, United Kingdom; Groningen Research Institute of Pharmacy, University of Groningen, Groningen, The Netherlands; Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Utrecht, The Netherlands. Electronic address: amand.schmidt@ucl.ac.uk. (2) Faculty of Population Health, Institute of Cardiovascular Science, University College London, London WC1E 6BT, United Kingdom.
Abstract
OBJECTIVES: Researchers often perform arbitrary outcome transformations to fulfill the normality assumption of a linear regression model. This commentary explains and illustrates that in large data settings such transformations are often unnecessary and, worse, may bias model estimates. STUDY DESIGN AND SETTING: Linear regression assumptions are illustrated using simulated data and an empirical example on the relation between time since type 2 diabetes diagnosis and glycated hemoglobin levels. Simulation results were evaluated on coverage, i.e., how often the 95% confidence interval included the true slope coefficient. RESULTS: Although outcome transformations bias point estimates, violations of the normality assumption in linear regression analyses do not. The normality assumption is necessary to estimate standard errors without bias, and hence confidence intervals and P-values. However, in large samples (e.g., where the number of observations per variable is >10), violations of this normality assumption often do not noticeably affect results. In contrast, assumptions about the parametric model, the absence of extreme observations, homoscedasticity, and independence of the errors remain influential even in large samples. CONCLUSION: Given that modern healthcare research typically includes thousands of subjects, focusing on the normality assumption is often unnecessary, does not guarantee valid results, and, worse, may bias estimates because of the practice of outcome transformations.
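The coverage criterion described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' simulation: the sample size, number of repetitions, true slope, and the choice of centered exponential (skewed, non-normal) errors are all assumptions made for illustration, and `simulate_coverage` is a hypothetical helper name. It fits an ordinary least squares line to each simulated dataset and records how often the usual 95% confidence interval contains the true slope.

```python
import numpy as np


def simulate_coverage(n=500, reps=2000, true_slope=0.5, seed=0):
    """Return (coverage, mean slope estimate) for OLS with skewed errors.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    covered = 0
    slopes = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n)
        # Skewed, non-normal errors with mean zero (exponential minus its mean).
        errors = rng.exponential(scale=1.0, size=n) - 1.0
        y = 1.0 + true_slope * x + errors

        # Ordinary least squares with an intercept and one slope.
        X = np.column_stack([np.ones(n), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)

        # Model-based standard error of the slope.
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - 2)
        se_slope = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

        # Normal-approximation 95% confidence interval.
        lower = beta[1] - 1.96 * se_slope
        upper = beta[1] + 1.96 * se_slope
        covered += int(lower <= true_slope <= upper)
        slopes[r] = beta[1]
    return covered / reps, slopes.mean()


if __name__ == "__main__":
    coverage, mean_slope = simulate_coverage()
    print(f"coverage of the 95% CI: {coverage:.3f}")
    print(f"mean slope estimate:    {mean_slope:.3f}")
```

Under these assumptions the errors are clearly non-normal, yet with a few hundred observations one would expect the coverage to sit close to the nominal 95% and the average slope estimate to sit close to the true value, which is the pattern the abstract describes for large samples.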
Authors: Casey R Guillot; Sabrina M Blackledge; Megan E Douglas; Renee M Cloutier; Madalyn M Liautaud; Raina D Pang; Matthew G Kirkpatrick; Adam M Leventhal Journal: Behav Med Date: 2019-04-30 Impact factor: 3.104
Authors: Megan E Narad; Eloise E Kaizar; Nanhua Zhang; H Gerry Taylor; Keith Owen Yeates; Brad G Kurowski; Shari L Wade Journal: J Dev Behav Pediatr Date: 2022-02-15 Impact factor: 2.988
Authors: Pariya L Fazeli; Steven Paul Woods; Crystal Chapman Lambert; Drenna Waldrop-Valverde; David E Vance Journal: Arch Clin Neuropsychol Date: 2020-07-24 Impact factor: 2.813
Authors: Tonny Ssekamatte; John Bosco Isunju; Paul Alex Kimoga Zirimala; Samuel Etajak; Saul Kamukama; Mathias Seviiri; Mary Nakafeero; Aisha Nalugya; Solomon Tsebeni Wafula; Edwinah Atusingwize; Justine N Bukenya; Richard K Mugambe Journal: Health Psychol Behav Med Date: 2021-04-07