Changyong Feng, Hongyue Wang, Naiji Lu, Tian Chen, Hua He, Ying Lu, Xin M Tu.
Abstract
SUMMARY: The log transformation is widely used in biomedical and psychosocial research to deal with skewed data. This paper highlights serious problems in this classic approach to skewed data. Despite the common belief that the log transformation can decrease the variability of data and make data conform more closely to the normal distribution, this is usually not the case. Moreover, the results of standard statistical tests performed on log-transformed data are often not relevant for the original, non-transformed data. We demonstrate these problems by presenting examples that use simulated data. We conclude that, if used at all, data transformations must be applied very cautiously. We recommend that in most circumstances researchers abandon these traditional methods of dealing with skewed data and, instead, use newer analytic methods that do not depend on the distribution of the data, such as generalized estimating equations (GEE).
Keywords: hypothesis testing; log-normal distribution; normal distribution; outliers; skewness
Year: 2014 PMID: 25092958 PMCID: PMC4120293 DOI: 10.3969/j.issn.1002-0829.2014.02.009
Source DB: PubMed Journal: Shanghai Arch Psychiatry ISSN: 1002-0829
Simulation results for simple linear regression without outliers (n=100; 100,000 simulations)
| β0 | Original data: estimated intercept | Original data: SE | Log-transformed data: estimated intercept | Log-transformed data: SE |
|---|---|---|---|---|
| 0.50 | 0.5000 | 0.0288 | -0.9999 | 0.0998 |
| 0.51 | 0.5100 | 0.0289 | -0.9440 | 0.0887 |
| 0.55 | 0.5499 | 0.0289 | -0.7993 | 0.0718 |
| 0.60 | 0.6001 | 0.0290 | -0.6647 | 0.0608 |
| 0.70 | 0.7002 | 0.0289 | -0.4591 | 0.0480 |
| 0.80 | 0.8000 | 0.0288 | -0.2977 | 0.0401 |
| 0.90 | 0.8999 | 0.0288 | -0.1626 | 0.0347 |
| 1.00 | 1.0001 | 0.0288 | -0.0451 | 0.0307 |
| 1.50 | 1.5000 | 0.0289 | 0.3863 | 0.0198 |
| 5.50 | 5.5000 | 0.0289 | 1.7034 | 0.0053 |
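The simulation design behind this table is not described in this excerpt. As a rough illustration of how such a comparison can be set up, the sketch below assumes an intercept-only model y = β0 + e with errors drawn uniformly from (-0.5, 0.5). That error distribution, the function name `simulate_intercepts`, and the reduced number of simulations are all assumptions made for illustration, not the authors' documented method. Under this assumed design with β0 = 0.50, y is uniform on (0, 1), whose logarithm has mean -1, so the intercept on the log scale does not estimate log(β0).

```python
import numpy as np

def simulate_intercepts(beta0, n=100, n_sims=2000, seed=0):
    """Fit an intercept-only model to y and to log(y) over many simulated samples.

    Assumption (not stated in the source): y = beta0 + e, e ~ Uniform(-0.5, 0.5).
    """
    rng = np.random.default_rng(seed)
    # One row per simulated sample; y stays positive (almost surely) for beta0 >= 0.5.
    y = beta0 + rng.uniform(-0.5, 0.5, size=(n_sims, n))
    # In an intercept-only model the least-squares intercept is the sample mean.
    est_orig = y.mean(axis=1)
    est_log = np.log(y).mean(axis=1)
    return {
        "orig_mean": est_orig.mean(),       # average estimated intercept, original scale
        "orig_se": est_orig.std(ddof=1),    # spread of the estimates across simulations
        "log_mean": est_log.mean(),         # average estimated intercept, log scale
        "log_se": est_log.std(ddof=1),
    }

result = simulate_intercepts(beta0=0.5)
print(result)
```

Calling `simulate_intercepts` for several values of `beta0` gives a quick way to see how the log-scale intercept and its spread change with the location of the data, while the original-scale spread stays roughly constant.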
Simulation results for simple linear regression with outliers (n=104; 100,000 simulations)
| β0 | Original data: estimated intercept | Original data: SE | Log-transformed data: estimated intercept | Log-transformed data: SE |
|---|---|---|---|---|
| 0.50 | 0.7501 | 0.0277 | -0.8886 | 0.0960 |
| 0.51 | 0.7599 | 0.0277 | -0.8350 | 0.0849 |
| 0.55 | 0.7999 | 0.0277 | -0.6956 | 0.0689 |
| 0.60 | 0.8500 | 0.0278 | -0.5660 | 0.0585 |
| 0.70 | 0.9500 | 0.0287 | -0.3678 | 0.0461 |
| 0.80 | 1.0499 | 0.0277 | -0.2119 | 0.0386 |
| 0.90 | 1.1500 | 0.0278 | -0.0811 | 0.0335 |
| 1.00 | 1.2501 | 0.0277 | 0.0323 | 0.0296 |
| 1.50 | 1.7499 | 0.0278 | 0.4497 | 0.0190 |
| 5.50 | 5.7501 | 0.0278 | 1.7328 | 0.0051 |
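This excerpt does not say which outliers were added to the 100 regular observations. The sketch below is therefore only a hypothetical setup: it assumes an intercept-only model with uniform errors on (-0.5, 0.5) and appends four fixed points at β0 plus the offsets 3.5, 5.5, 7.5 and 9.5. These offsets are an illustrative choice; they sum to 26, so they shift the sample mean by 26/104 = 0.25, the same size as the gap between the original-data intercepts in the tables with and without outliers. The function name `simulate_with_outliers` is likewise an assumption.

```python
import numpy as np

def simulate_with_outliers(beta0, offsets=(3.5, 5.5, 7.5, 9.5),
                           n=100, n_sims=2000, seed=0):
    """Average intercept estimates on y and log(y) when fixed outliers are appended.

    Assumptions (not stated in the source): uniform errors on (-0.5, 0.5)
    and hypothetical outliers located at beta0 + offsets.
    """
    rng = np.random.default_rng(seed)
    clean = beta0 + rng.uniform(-0.5, 0.5, size=(n_sims, n))
    # The same outliers are appended to every simulated sample.
    outliers = np.tile(beta0 + np.asarray(offsets), (n_sims, 1))
    y = np.hstack([clean, outliers])
    # Intercept-only least squares reduces to the sample mean on each scale.
    return y.mean(axis=1).mean(), np.log(y).mean(axis=1).mean()

orig_intercept, log_intercept = simulate_with_outliers(beta0=0.5)
print(orig_intercept, log_intercept)
```

On the original scale the outliers move the estimate by a fixed amount regardless of β0, whereas on the log scale their influence depends on where the bulk of the data sits, which is one reason conclusions drawn on the two scales can diverge.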