A S Rigby. Sheffield Children's Hospital, University of Sheffield, Western Bank, UK. a.s.rigby@sheffield.ac.uk
Abstract
PURPOSE: The statistical terms 'correlation' and 'regression' are frequently mistaken for each other in the scientific literature. Why this is so is unclear. This paper discusses their differences and similarities, arguing that in most circumstances regression is the more appropriate technique to use, since regression incorporates a notion of dependency of one variable on another. METHOD: Pearson's correlation coefficient (r) is introduced as a method for estimating the degree of linear association between two normally distributed variables. The problem of least-squares regression (when y depends on x) is introduced by considering the best-fitting straight line through the points on a scatter plot. RESULTS: Correlation, regression analysis and residual estimation are discussed by taking examples from the author's own teaching experiences. CONCLUSIONS: Correlation and regression share some similarities. However, regression is the better technique to use because with it comes a notion of dependency of one variable upon another. Regression model checking includes residual examination. The importance of plotting and examining residuals cannot be overemphasized. Residual examination should become as much a part of a regression analysis as the estimation of the regression coefficients themselves.
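The three quantities named in the abstract (Pearson's r, the least-squares line, and the residuals) can be illustrated with a minimal Python sketch. This is not taken from the paper: the function names and the five data points are hypothetical, chosen only to show how the pieces relate, and the code uses the standard textbook formulas for a single predictor x and a response y.

```python
from math import sqrt

def _sums(x, y):
    """Return the means of x and y and the centred sums Sxx, Syy, Sxy."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy

def pearson_r(x, y):
    """Pearson's correlation coefficient: symmetric in x and y."""
    _, _, sxx, syy, sxy = _sums(x, y)
    return sxy / sqrt(sxx * syy)

def least_squares_line(x, y):
    """Intercept and slope of the least-squares regression of y on x."""
    mx, my, sxx, _, sxy = _sums(x, y)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

def residuals(x, y, intercept, slope):
    """Observed y minus fitted y for each point."""
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

# Hypothetical illustrative data, not from the paper.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
intercept, slope = least_squares_line(x, y)
res = residuals(x, y, intercept, slope)

print("r =", r)
print("intercept =", intercept, "slope =", slope)
print("residuals =", res)
```

Swapping x and y leaves r unchanged but changes the fitted line, which is one way to see the dependency of y on x that regression assumes and correlation does not. In practice the residuals would be plotted against x or against the fitted values to check the straight-line model, rather than only printed.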