Anli Gao, Jennifer Fischer-Jenssen, Charles Wroblewski, Perry Martos.
Abstract
BACKGROUND: Bacterial enumeration data are typically log transformed to obtain a more nearly normal distribution and to stabilize the variance. Unfortunately, statistical results from log transformed data are often misinterpreted as if they were results in the arithmetic domain.
OBJECTIVE: To explore the interpretation of the slope and intercept from an unweighted linear regression and compare them with those from the regression of log transformed data.
METHOD: Mathematical derivation, illustrated with a real dataset.
RESULTS: For $y = Ax + B + \varepsilon$, where $y$ is the recovery (CFU/g), $x$ is the target concentration (CFU/g), and the error $\varepsilon$ is homogeneous across $x$, the slope $A$ estimates the percent recovery $R$ when $B = 0$. In the regression of log transformed data, $\log y = \alpha \log x + \beta + \varepsilon_z$ (equivalent to $y = A x^{\alpha} \cdot \omega$), it is the intercept $\beta = \log(y/x) = \log A$ that estimates the percent recovery in logarithm when the slope $\alpha = 1$, which means that $R$ does not vary over $x$. The error term $\omega$ is multiplicative to $x$, while $\varepsilon_z = \log \omega$ is additive to $\log x$. Whether to transform the data is not an arbitrary choice but a decision based on the distribution of the data. No significant difference was found among the five models (the linear regression of log transformed data, three generalized linear models, and a nonlinear model) in their predicted percent recovery when applied to our data. An acceptable regression model should yield residuals that are as close to normally distributed as possible.
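The relationship between the two regressions can be illustrated with a small simulation. The sketch below is not the authors' analysis or data: the true recovery (0.8), the log-scale error spread (0.1), the concentration levels, and the helper names `fit_line` and `simulate_recovery` are all assumptions chosen for illustration. It generates recoveries with a multiplicative error, fits the log-log regression, and back-transforms the intercept.

```python
import numpy as np


def fit_line(x, y):
    """Unweighted least-squares fit of y = slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept


def simulate_recovery(true_recovery=0.8, sigma_log=0.1, seed=0):
    """Simulate y = R * x * omega with log10(omega) ~ Normal(0, sigma_log).

    All parameter values are illustrative assumptions, not study data.
    """
    rng = np.random.default_rng(seed)
    # Six target concentrations (10 to 10^6 CFU/g), five replicates each.
    x = np.repeat(10.0 ** np.arange(1, 7), 5)
    omega = 10.0 ** rng.normal(0.0, sigma_log, size=x.size)
    y = true_recovery * x * omega
    return x, y


x, y = simulate_recovery()

# Log domain: log10(y) = alpha * log10(x) + beta + eps_z
alpha, beta = fit_line(np.log10(x), np.log10(y))
recovery_from_log_fit = 10.0 ** beta  # meaningful as recovery only if alpha is near 1

# Arithmetic domain: y = A * x + B + eps
a_slope, b_intercept = fit_line(x, y)

print(f"log-log fit:    alpha = {alpha:.3f}, beta = {beta:.3f}, "
      f"10**beta = {recovery_from_log_fit:.3f}")
print(f"arithmetic fit: A = {a_slope:.3f}, B = {b_intercept:.1f}")
```

In this setup the back-transformed intercept $10^{\beta}$ estimates the geometric mean of $y/x$, whereas the arithmetic slope $A$ is dominated by the highest concentrations because the unweighted fit assumes a constant error variance that multiplicative data do not have. This is one reason the two sets of coefficients should not be read interchangeably.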
CONCLUSIONS: Statistical procedures that use log transformed data should be studied and documented separately, not reported and interpreted together with results obtained in the arithmetic domain.
HIGHLIGHTS: Interpretations of statistical results developed in the arithmetic domain do not carry over to results from log transformed data.
© AOAC INTERNATIONAL 2020. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Year: 2020
PMID: 33241340
DOI: 10.1093/jaoacint/qsaa005
Source DB: PubMed
Journal: J AOAC Int
ISSN: 1060-3271
Impact factor: 1.913