Oliver Pain, Frank Dudbridge, Angelica Ronald.
Abstract
Many statistical tests rely on the assumption that the residuals of a model are normally distributed. Rank-based inverse normal transformation (INT) of the dependent variable is one of the most popular approaches to satisfying the normality assumption. When covariates are included in the analysis, a common approach is to first adjust for the covariates and then normalize the residuals. This study investigated the effect of regressing covariates against the dependent variable and then applying rank-based INT to the residuals. The correlation between the dependent variable and covariates at each stage of processing was assessed. An alternative approach was tested in which rank-based INT was applied to the dependent variable before regressing out covariates. Analyses based on both simulated and real data examples demonstrated that applying rank-based INT to the dependent variable residuals after regressing out covariates re-introduces a linear correlation between the dependent variable and covariates, increasing type-I errors and reducing power. On the other hand, when rank-based INT was applied prior to controlling for covariate effects, residuals were normally distributed and linearly uncorrelated with covariates. This latter approach is therefore recommended in situations where normality of the dependent variable is required.
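The two orders of operations compared in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the Blom offset (c = 3/8) is one common choice for rank-based INT, the simulated questionnaire-type score (a latent variable thresholded into a skewed 0 to 5 score) is hypothetical, and all function names are illustrative.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def rank_int(values, rng, c=3.0 / 8.0):
    """Rank-based INT; Blom offset c = 3/8 is an assumed, common choice.

    Ties are split at random via a random secondary sort key.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: (values[i], rng.random()))
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = STD_NORMAL.inv_cdf((rank - c) / (n - 2.0 * c + 1.0))
    return out


def ols_residuals(y, x):
    """Residuals of a simple linear regression of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [b - intercept - slope * a for a, b in zip(x, y)]


def pearson(u, v):
    """Pearson (linear) correlation between two equal-length sequences."""
    mu, mv = fmean(u), fmean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def simulate(n, rng, effect=0.3, thresholds=(0.5, 1.0, 1.5, 2.0, 2.5)):
    """Hypothetical skewed questionnaire-type score (0 to 5) weakly related to x."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = []
    for xi in x:
        latent = effect * xi + rng.gauss(0.0, 1.0)
        y.append(float(sum(latent > t for t in thresholds)))
    return x, y


rng = random.Random(2018)
x, y = simulate(2000, rng)

# Approach 1: regress out the covariate, then apply rank-based INT to the residuals.
resid_then_int = rank_int(ols_residuals(y, x), rng)
# Approach 2: apply rank-based INT first, then regress out the covariate.
int_then_resid = ols_residuals(rank_int(y, rng), x)

r_raw = pearson(y, x)
r_approach1 = pearson(resid_then_int, x)
r_approach2 = pearson(int_then_resid, x)
print(f"raw: {r_raw:.3f}  resid->INT: {r_approach1:.3f}  INT->resid: {r_approach2:.3f}")
```

In this simulated setting, the first approach should give a correlation with the covariate that is opposite in sign to the raw correlation, while the second leaves the residuals uncorrelated with the covariate up to floating-point error, because least-squares residuals are orthogonal to any regressor fitted with an intercept.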
Year: 2018 PMID: 29706643 PMCID: PMC6057994 DOI: 10.1038/s41431-018-0159-6
Source DB: PubMed Journal: Eur J Hum Genet ISSN: 1018-4813 Impact factor: 4.246
Range, skew, and correlation with covariates for dependent variables derived from the Twins Early Development Study (TEDS) sample
| Dependent variable | Range | Skew | Pearson correlation with age | Pearson correlation with sex |
|---|---|---|---|---|
| Paranoia | 5 | 1.357 | 0.055 | 0.018 |
| Paranoia | 10 | 1.195 | 0.043 | −0.026 |
| Paranoia | 20 | 1.095 | 0.03 | −0.022 |
| Paranoia | 40 | 1.296 | 0.022 | −0.059 |
| Anhedonia | 5 | 1.868 | −0.006 | 0.177 |
| Anhedonia | 10 | 0.858 | −0.025 | 0.127 |
| Anhedonia | 20 | 0.651 | −0.02 | 0.135 |
| Anhedonia | 40 | 0.537 | −0.013 | 0.205 |
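The summary columns of the table could be computed for a scored variable along the following lines. This is a sketch under assumptions: the source does not state which skewness estimator was used, so the moment-based (Fisher-Pearson) coefficient g1 below is an assumed choice, sex is assumed to be coded 0/1 (making its Pearson correlation a point-biserial correlation), and `summarize` is a hypothetical helper.

```python
from statistics import fmean


def skewness(values):
    """Moment-based (Fisher-Pearson) sample skewness g1 = m3 / m2**1.5."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m3 = fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


def pearson(u, v):
    """Pearson (linear) correlation between two equal-length sequences."""
    mu, mv = fmean(u), fmean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def summarize(score, age, sex):
    """Range, skew, and Pearson correlations with age and sex for one variable."""
    return {
        "range": max(score) - min(score),
        "skew": skewness(score),
        "r_age": pearson(score, age),
        "r_sex": pearson(score, sex),  # sex assumed coded 0/1
    }


# Toy example: a positively skewed score with most responses tied at 0.
s = summarize([0, 0, 0, 1], [10.0, 11.0, 12.0, 13.0], [0, 1, 0, 1])
print(s)
```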
Fig. 1 The relationship between the number of available responses (x-axis) and the correlation between normalized residuals and the covariate (y-axis) for different values of skew in the raw phenotypic data. In this figure, the correlation between the untransformed phenotypic data and the covariate is 0.06
Fig. 2 The effect of applying a rank-based INT to the residuals of questionnaire-type data, i.e., after regressing out covariates. All correlations referred to in this figure are Pearson (linear) correlations. a Untransformed questionnaire-type variable and its relationship with a continuous covariate. The questionnaire-type variable has a range of 5. A weak linear relationship exists between the questionnaire-type variable and the covariate. b Questionnaire-type variable residuals after regressing out the relationship with the covariate. No linear relationship exists between the questionnaire-type residuals and the covariate. Regressing out covariate effects has separated many tied observations, creating a covariate-based ranking within the questionnaire-type variable residuals. c After rank-based INT of the questionnaire-type variable residuals, the transformed residuals show a strong linear correlation with the covariate. This correlation is stronger than, and opposite in direction to, the original correlation between the untransformed questionnaire-type variable and the covariate
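The tie-separation step in panel b can be reproduced on a handful of observations. In this minimal sketch (hypothetical toy data and a hypothetical helper `ols_fit`), observations tied at a score of 0 become distinct after the covariate is regressed out, and their residuals are ordered exactly by the covariate, in reverse because the fitted slope is positive. Since a rank-based INT depends only on ranks, the transformed residuals inherit this covariate-driven ordering.

```python
from statistics import fmean


def ols_fit(y, x):
    """Slope and residuals of a simple linear regression of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, [b - intercept - slope * a for a, b in zip(x, y)]


# Hypothetical toy data: a score with many ties at 0 and a continuous covariate.
y = [0, 0, 0, 0, 1, 1, 2]
x = [-1.0, 0.5, -0.2, 1.3, 0.8, -0.4, 1.5]

slope, resid = ols_fit(y, x)
tied = [i for i, v in enumerate(y) if v == 0]

# Within the tied group, residual = constant - slope * x, so ordering by
# residual is the same as ordering by the covariate in reverse.
by_residual = sorted(tied, key=lambda i: resid[i])
by_covariate_desc = sorted(tied, key=lambda i: x[i], reverse=True)
print(slope > 0, by_residual == by_covariate_desc)
```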
Fig. 3 The effect of applying a rank-based INT to questionnaire-type data before regressing out covariates. All correlations referred to in this figure are Pearson (linear) correlations. a Untransformed questionnaire-type variable and its relationship with a continuous covariate. The questionnaire-type variable has a range of 5. A weak linear relationship exists between the questionnaire-type variable and the covariate. b Questionnaire-type variable after rank-based INT, with tied observations split at random. The relationship between the questionnaire-type variable and the covariate remains intact. c Covariate effects have been regressed out of the normalized questionnaire-type variable. There is no linear relationship between the residuals and the covariate, and the skew is close to zero