Rakesh Aggarwal, Priya Ranganathan.
Abstract
Correlation is a statistical technique that shows whether, and how strongly, two continuous variables are related. In this article, which is the eighth part in a series on 'Common pitfalls in Statistical Analysis', we look at the interpretation of the correlation coefficient and examine various situations in which the use of correlation may be inappropriate.
Keywords: Biostatistics; correlation; “data interpretation, statistical”
Year: 2016 PMID: 27843795 PMCID: PMC5079093 DOI: 10.4103/2229-3485.192046
Source DB: PubMed Journal: Perspect Clin Res ISSN: 2229-3485
Figure 1: Scatter plots of the relationship between the values of two quantitative variables and the corresponding correlation coefficient (r) values. “r” can vary between −1.0 and +1.0. If, as the values of one variable (say, on the X-axis) increase, those of the other variable (on the Y-axis) also increase, “r” is positive (a-c); if the latter decrease, “r” is negative (d-f). When the values of the two variables have no clear relation, “r” is close to zero (g). The absolute value of “r” is higher when the individual data points lie closer to a line showing the linear trend (a > b > c; d > e > f)
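The behaviour described in this caption can be checked numerically. The following is a minimal sketch, assuming Pearson's product-moment coefficient is the "r" meant here; the helper `pearson_r` and all data values are made up for illustration and are not taken from the article's figures.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data (an assumption, not the article's data)
x = list(range(1, 11))
offsets = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]  # deterministic scatter

perfect_up = [2 * v + 1 for v in x]                    # points exactly on a rising line
perfect_down = [-3 * v + 5 for v in x]                 # points exactly on a falling line
tight_up = [v + 0.5 * o for v, o in zip(x, offsets)]   # small scatter around the line
loose_up = [v + 4.0 * o for v, o in zip(x, offsets)]   # large scatter around the line

r_perfect_up = pearson_r(x, perfect_up)
r_perfect_down = pearson_r(x, perfect_down)
r_tight = pearson_r(x, tight_up)
r_loose = pearson_r(x, loose_up)

print(r_perfect_up, r_perfect_down, r_tight, r_loose)
```

In this sketch the sign of r follows the direction of the trend, and r moves closer to zero as the scatter around the line grows, which mirrors the ordering a > b > c in the caption.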
Figure 2: Situations in which linear correlation should not be used: (a) the two variables have a nonlinear relationship (analysis of the data points in this figure shows r = 0, thus failing to detect the relationship), (b) the data have one or a few outliers (one outlier at the right upper end resulted in a false relationship with r = 0.71; exclusion of this point reduces r to near zero), (c) the data have two subgroups, within each of which there is no correlation, and (d) the variability in values on the Y-axis changes with the values on the X-axis. Each situation is described further in the text
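Situations (a) to (c) in this caption can be reproduced with small synthetic datasets. The sketch below assumes Pearson's r and uses made-up points, so the values it produces differ from the r = 0.71 quoted for the article's own figure; `pearson_r` is a hypothetical helper defined here for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# (a) Nonlinear relationship: y is fully determined by x, yet r is zero
x_curve = list(range(-5, 6))
y_curve = [v ** 2 for v in x_curve]
r_nonlinear = pearson_r(x_curve, y_curve)

# (b) Outlier: four unrelated points, then the same points plus one extreme value
x_cluster = [1, 1, 2, 2]
y_cluster = [1, 2, 1, 2]
r_without_outlier = pearson_r(x_cluster, y_cluster)
r_with_outlier = pearson_r(x_cluster + [10], y_cluster + [10])

# (c) Two subgroups: no correlation within either group, strong correlation when pooled
x_group2 = [8, 8, 9, 9]
y_group2 = [8, 9, 8, 9]
r_group1 = pearson_r(x_cluster, y_cluster)
r_group2 = pearson_r(x_group2, y_group2)
r_pooled = pearson_r(x_cluster + x_group2, y_cluster + y_group2)

print(r_nonlinear, r_without_outlier, r_with_outlier, r_group1, r_group2, r_pooled)
```

In each case the coefficient alone misleads: it misses a perfect curved relationship in (a) and reports a strong association in (b) and (c) that comes from a single point or from the gap between groups. Plotting the data before computing r is the simplest safeguard.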