
User's guide to correlation coefficients.

Haldun Akoglu

Abstract

When writing a manuscript, we often use words such as perfect, strong, good or weak to name the strength of the relationship between variables. However, it is unclear where a good relationship turns into a strong one. The same strength of r is named differently by several researchers. Therefore, there is an absolute necessity to explicitly report the strength and direction of r while reporting correlation coefficients in manuscripts. This article aims to familiarize medical readers with several different correlation coefficients reported in medical manuscripts, clarify confounding aspects and summarize the naming practices for the strength of correlation coefficients.


Keywords:  Correlation coefficient; Cramer's; Interpretation; Lin's; Pearson's; Spearman's

Year:  2018        PMID: 30191186      PMCID: PMC6107969          DOI: 10.1016/j.tjem.2018.08.001

Source DB:  PubMed          Journal:  Turk J Emerg Med        ISSN: 2452-2473


Introduction

Medical research is naturally based on finding the relationship between the known and the unknown. Clinicians gather information via history, physical examination, laboratory tests and imaging; they then use this information to infer the clinical diagnosis, outcomes and treatment choices. Therefore, an endless struggle to link what is already known to what needs to be known goes on. For example, we try to infer the mortality risk of a patient with myocardial infarction from the troponin level or from cardiac scores such as the TIMI score, so that we can select the appropriate treatment among options with various risks. The most basic form of mathematically connecting the dots between the known and the unknown forms the foundation of correlational analysis. The Merriam-Webster dictionary defines correlation as a relation existing between phenomena or things or between mathematical or statistical variables which tend to vary, be associated, or occur together in a way not expected by chance alone. A classic example is the apparent, strong correlation between systolic (SBP) and diastolic blood pressure (DBP). The correlation between two variables (e.g., systolic and diastolic pressures) is called a bivariate correlation and can be shown on a scatterplot if both are continuous (scale) variables (Fig. 1). The figure shows that SBP and DBP increase and decrease together; therefore, they are highly correlated. If we want to remove the effect of a third variable from the correlation between two variables, we have to calculate a partial correlation. This is a form of correlation that quantifies the relationship between two variables while controlling for the effect of one or more additional variables (e.g., age, sex, treatment received). In Fig. 1, male and female subjects are colored separately to examine whether sex affects the correlation between SBP and DBP.
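As a minimal sketch of the partial correlation described above, the Python below computes Pearson's r from its definition and then the first-order partial correlation of x and y controlling for a single variable z, using the standard formula r_xy·z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)). The function names and the small data set are hypothetical illustrations, not the study data shown in Fig. 1.

```python
import math


def pearson_r(x, y):
    """Pearson's r: sum of cross-products of deviations over the root of the sums of squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for one variable z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical data: x and y are both driven by z plus noise that is unrelated to z.
z = [1, 2, 3, 4, 5, 6, 7, 8]
noise_x = [1, -1, -1, 1, 1, -1, -1, 1]
noise_y = [1, -1, 1, -1, -1, 1, -1, 1]
x = [zi + e for zi, e in zip(z, noise_x)]
y = [zi + e for zi, e in zip(z, noise_y)]

print(round(pearson_r(x, y), 2))  # 0.84: a strong raw correlation
print(partial_r(x, y, z))         # essentially 0 (up to floating-point error) once z is controlled for
```

In this toy data set the whole correlation between x and y comes from their shared dependence on z, so it disappears once z is held constant.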
Fig. 1

Scatterplot of systolic and diastolic blood pressures of a study group according to sex.

The most important fact is that correlation does not imply causation. As ice-cream sales increase, the rate of deaths from drowning and the frequency of forest fires increase as well. These events tend to occur in the same period, but they do not cause one another. The relationship (or correlation) between two variables is denoted by the letter r and quantified with a number that varies between −1 and +1. Zero means there is no correlation (for Pearson's r, no linear relationship), whereas +1 or −1 means a complete, or perfect, correlation. The sign of r shows the direction of the correlation: a negative r means that the variables are inversely related. The strength of the correlation increases both from 0 to +1 and from 0 to −1.

When writing a manuscript, we often use words such as perfect, strong, good or weak to name the strength of the relationship between variables. However, it is unclear where a good relationship turns into a strong one. The same strength of r is named differently by several researchers. Therefore, there is an absolute necessity to explicitly report the strength and direction of r while reporting correlation coefficients in manuscripts. This article aims to familiarize medical readers with several different correlation coefficients reported in medical manuscripts, clarify confounding aspects and summarize the naming practices for the strength of correlation coefficients.
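A minimal Python sketch of the sign and range of r, assuming the usual Pearson definition; the helper name and the toy data are hypothetical. It also shows that r = 0 only rules out a linear relationship: a perfectly U-shaped pattern still gives r = 0.

```python
import math


def pearson_r(x, y):
    """Pearson's r; always lies between -1 and +1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
rising = [2 * v + 1 for v in x]       # increases with x
falling = [10 - v for v in x]         # inversely related to x
u_shaped = [(v - 3) ** 2 for v in x]  # fully determined by x, but not linearly

print(round(pearson_r(x, rising), 3))    # 1.0  (perfect positive)
print(round(pearson_r(x, falling), 3))   # -1.0 (perfect negative)
print(round(pearson_r(x, u_shaped), 3))  # 0.0  (no linear correlation)
```

The U-shaped case is a reminder to look at the scatterplot before interpreting a coefficient near zero.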

How to name the strength of the relationship for different coefficients?

Bivariate correlation coefficients: Pearson's r, Spearman's rho (rs) and Kendall's Tau (τ)

These coefficients use paired data from two variables to test whether the two are related. Pearson's r assesses a linear relationship, whereas the rank-based coefficients assess a monotonic one (the variables move in the same or opposite direction, but not necessarily along a straight line). Therefore, the first step is to check the relationship for linearity on a scatterplot. Pearson's r, the most commonly reported correlation coefficient, is a parametric measure whose significance test requires normally distributed continuous variables. For non-normal distributions (e.g., data with extreme values or outliers), correlation coefficients should be calculated from the ranks of the data rather than from their actual values. The coefficients designed for this purpose are Spearman's rho (denoted rs) and Kendall's tau. In fact, normality is essential for calculating the significance and confidence intervals, not the correlation coefficient itself. Kendall's tau is an alternative rank-based coefficient that is computed from the numbers of concordant and discordant pairs of observations. It is often preferred when the same rank is repeated many times (ties) in a small dataset. Some authors suggest that Kendall's tau may give more accurate generalizations to the population than Spearman's rho. After calculating these coefficients, an interesting question arises: how should we name the strength? Researchers tend to report that there is a strong relationship between whatever they have tested. However, the significance is often reported in place of the strength of the relationship, which is incorrect. A statistically significant correlation is not necessarily a strong one. The p-value is the probability of observing a correlation at least as strong as the one found if there were truly no correlation in the population. In the dataset shown in Fig. 1, the correlation coefficient between systolic and diastolic blood pressure was 0.64, with a p-value of less than 0.0001. This r of 0.64 is a moderate-to-strong correlation with very high statistical significance (p < 0.0001). In the same dataset, the correlation coefficient between diastolic blood pressure and age was just 0.31, with the same p-value.
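To see how a weak correlation such as the one between diastolic blood pressure and age can still reach p < 0.0001, here is a minimal Python sketch based on Fisher's z-transformation. It is an approximation that assumes roughly bivariate-normal data (statistical software usually reports the exact t-test with n − 2 degrees of freedom instead), and the sample sizes are hypothetical because the size of the Fig. 1 dataset is not stated here.

```python
import math
from statistics import NormalDist


def fisher_ci_and_p(r, n, level=0.95):
    """Approximate confidence interval and two-sided p-value (H0: rho = 0) for Pearson's r."""
    if not -1 < r < 1 or n <= 3:
        raise ValueError("requires -1 < r < 1 and n > 3")
    z = math.atanh(r)                               # Fisher's z-transform of r
    se = 1 / math.sqrt(n - 3)                       # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for a 95% interval
    lower, upper = math.tanh(z - crit * se), math.tanh(z + crit * se)
    p = 2 * (1 - NormalDist().cdf(abs(z) / se))     # two-sided normal-approximation p-value
    return lower, upper, p


# The same weak r = 0.31 with two hypothetical sample sizes.
for n in (20, 500):
    lower, upper, p = fisher_ci_and_p(0.31, n)
    print(f"n={n}: 95% CI {lower:.2f} to {upper:.2f}, p={p:.2g}")
```

With n = 500 the p-value falls far below 0.0001, yet the interval (roughly 0.23 to 0.39) excludes anything stronger than a modest correlation; with n = 20 the same r is not significant at all. The p-value tracks sample size, while r tracks strength.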
Even though this correlation has the same, very high statistical significance, it is a weak one. The very small p-value only tells us that a correlation of this size would be very unlikely if the true correlation were zero; it does not make an r of 0.31 any stronger. Great care should therefore be taken to avoid misunderstandings when reporting correlation coefficients and naming their strength. Table 1 combines the three most commonly used interpretations of r values. The authors of these definitions come from different research areas and specialties.
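Because Table 1 is applied to Spearman's rho as well as Pearson's r, it helps to see how the rank-based coefficients behave. The Python sketch below is a minimal illustration, not a validated implementation: Spearman's rho is computed as Pearson's r of the ranks (tied values get their average rank), and Kendall's tau is the tau-b variant, which adjusts for ties, computed from concordant and discordant pairs. Function names and data are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs, adjusted for ties in x and in y."""
    n0 = len(x) * (len(x) - 1) / 2
    s = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            s += 1  # concordant pair
        elif dx * dy < 0:
            s -= 1  # discordant pair
    return s / math.sqrt((n0 - ties_x) * (n0 - ties_y))


# Hypothetical data: perfectly monotonic, but the last value is an extreme outlier.
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 100]
print(round(pearson_r(x, y), 2))                # 0.72: pulled down by the outlier
print(spearman_rho(x, y), kendall_tau_b(x, y))  # 1.0 1.0: the ranks are perfectly ordered
```

The outlier changes Pearson's r but not the ranks, which is why rank-based coefficients are recommended for data with extreme values.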
Table 1

Interpretation of the Pearson's and Spearman's correlation coefficients.

Correlation coefficient | Dancey & Reidy (Psychology) | Quinnipiac University (Politics) | Chan YH (Medicine)
+1 / −1 | Perfect | Perfect | Perfect
+0.9 / −0.9 | Strong | Very Strong | Very Strong
+0.8 / −0.8 | Strong | Very Strong | Very Strong
+0.7 / −0.7 | Strong | Very Strong | Moderate
+0.6 / −0.6 | Moderate | Strong | Moderate
+0.5 / −0.5 | Moderate | Strong | Fair
+0.4 / −0.4 | Moderate | Strong | Fair
+0.3 / −0.3 | Weak | Moderate | Fair
+0.2 / −0.2 | Weak | Weak | Poor
+0.1 / −0.1 | Weak | Negligible | Poor
0 | Zero | None | None

Naming schemes: 1) left, Dancey & Reidy; 2) middle, the Political Science Department at Quinnipiac University; 3) right, Chan et al.
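Because the three schemes in Table 1 disagree, one practical safeguard is to report r with its direction and to state which scheme a label comes from. A minimal Python sketch follows; the function name is hypothetical, and because the table only lists values in steps of 0.1, truncating |r| to the tenth below it before looking up the row is an assumption.

```python
# Labels from Table 1, indexed by |r| truncated to one decimal place (0.0, 0.1, ..., 1.0).
TABLE_1 = {
    "Dancey & Reidy": ["Zero", "Weak", "Weak", "Weak", "Moderate", "Moderate",
                       "Moderate", "Strong", "Strong", "Strong", "Perfect"],
    "Quinnipiac University": ["None", "Negligible", "Weak", "Moderate", "Strong", "Strong",
                              "Strong", "Very Strong", "Very Strong", "Very Strong", "Perfect"],
    "Chan YH": ["None", "Poor", "Poor", "Fair", "Fair", "Fair",
                "Moderate", "Moderate", "Very Strong", "Very Strong", "Perfect"],
}


def name_strength(r):
    """Return the direction of r and its strength label under each scheme in Table 1."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    # Assumption: truncate |r| to the tenth below (0.64 -> row 0.6); the tiny epsilon
    # guards against floating-point products falling just below an integer.
    index = min(10, int(abs(r) * 10 + 1e-9))
    return direction, {scheme: labels[index] for scheme, labels in TABLE_1.items()}


print(name_strength(0.64))
print(name_strength(0.31))
```

For the SBP and DBP example (r = 0.64) the schemes return Moderate, Strong and Moderate; for DBP and age (r = 0.31) they return Weak, Moderate and Fair. The same number gets different names, so the value of r should always be reported alongside any label.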


Phi Coefficient and Cramer's V Correlation

Phi is a measure of the strength of association between two categorical variables in a 2 × 2 contingency table. It is calculated by dividing the chi-square statistic by the sample size and then taking the square root of the result. It varies between 0 and 1 and takes no negative values (Table 2).
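The calculation described above can be sketched in Python as follows, assuming a 2 × 2 table of counts [[a, b], [c, d]] and the uncorrected Pearson chi-square (no Yates continuity correction, which would give a smaller value). The function name and counts are hypothetical.

```python
import math


def phi_coefficient(table):
    """Phi = sqrt(chi-square / n) for a 2 x 2 table of counts [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals, col_totals = (a + b, c + d), (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("phi is undefined when a row or column total is zero")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / n)


# Hypothetical counts: exposure (rows) by outcome (columns).
print(phi_coefficient([[30, 10], [10, 30]]))  # 0.5 (chi-square = 20, n = 80)
```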
Table 2

Interpretation of Phi and Cramer's V.

Phi and Cramer's V | Interpretation
>0.25 | Very strong
>0.15 | Strong
>0.10 | Moderate
>0.05 | Weak
>0 | No or very weak

Cramer's V is an alternative to phi for contingency tables larger than 2 × 2. Like phi, it varies between 0 and 1 and takes no negative values. As with Pearson's r, a value close to 0 means no association. However, unlike Pearson's r, a Cramer's V above 0.25 is already considered a very strong relationship (Table 2).
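A minimal Python sketch of Cramer's V for an r × c table of counts, using the standard definition V = √(χ² / (n(k − 1))), where k is the smaller of the number of rows and columns (for a 2 × 2 table this reduces to phi), together with a lookup of the Table 2 wording. The function names and counts are hypothetical; treating the Table 2 cut-offs as strict "greater than" bounds follows the table's own notation.

```python
import math


def cramers_v(table):
    """Cramer's V = sqrt(chi2 / (n * (k - 1))) for an r x c table of counts."""
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]
    k = min(n_rows, n_cols)
    if k < 2 or 0 in row_totals or 0 in col_totals:
        raise ValueError("need at least a 2 x 2 table with no empty row or column")
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (table[i][j] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


def interpret_table_2(value):
    """Map phi or Cramer's V to the wording of Table 2."""
    for cutoff, label in ((0.25, "Very strong"), (0.15, "Strong"), (0.10, "Moderate"), (0.05, "Weak")):
        if value > cutoff:
            return label
    return "No or very weak"


# Hypothetical 2 x 3 table: two groups by three response categories.
v = cramers_v([[20, 10, 10], [10, 10, 20]])
print(round(v, 3), interpret_table_2(v))  # 0.289 Very strong
```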

Concordance Correlation Coefficient (CCC)

Lin's concordance correlation coefficient (ρc) measures how well pairs of observations agree, for example measurements from a new method compared with a gold standard or with another set of measurements. Lin's CCC (ρc) combines precision (ρ, how closely the points follow a straight line) and accuracy (Cβ, how far that line departs from the line of perfect agreement). Like Pearson's r, it ranges from −1 to +1. Altman suggested interpreting it similarly to other correlation coefficients such as Pearson's, with values <0.2 as poor and >0.8 as excellent. In contrast, McBride proposed a stricter set of thresholds (Table 3).
Table 3

Interpretation of Lin's CCC according to McBride et al.

Value of Lin's CCC | Interpretation
>0.99 | Almost perfect
0.95 to 0.99 | Substantial
0.90 to 0.95 | Moderate
<0.90 | Poor
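The split of Lin's CCC into precision and accuracy can be checked numerically. The Python sketch below assumes Lin's moment-based formula, ρc = 2s_xy / (s_x² + s_y² + (x̄ − ȳ)²), with variances and covariance divided by n, and computes accuracy as Cβ = 2 / (v + 1/v + u²), where v = s_x/s_y and u = (x̄ − ȳ)/√(s_x·s_y), so that ρc = ρ × Cβ. The function names and measurements are hypothetical, and how values exactly at 0.95 or 0.90 are classified is an assumption, since Table 3 does not specify it.

```python
import math


def lins_ccc(x, y):
    """Return (CCC, precision rho, accuracy C_b) for paired measurements x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    ccc = 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)
    rho = cov_xy / math.sqrt(var_x * var_y)          # precision: Pearson's r
    v = math.sqrt(var_x / var_y)                     # scale shift (ratio of SDs)
    u = (mean_x - mean_y) / (var_x * var_y) ** 0.25  # location shift relative to sqrt(s_x * s_y)
    c_b = 2 / (v + 1 / v + u ** 2)                   # accuracy (bias correction factor)
    return ccc, rho, c_b


def interpret_mcbride(ccc):
    """Table 3 wording; values exactly at 0.95 or 0.90 are assigned upward (an assumption)."""
    if ccc > 0.99:
        return "Almost perfect"
    if ccc >= 0.95:
        return "Substantial"
    if ccc >= 0.90:
        return "Moderate"
    return "Poor"


# Hypothetical example: a new device reads exactly 5 units higher than the reference.
reference = [10, 20, 30, 40, 50]
new_device = [15, 25, 35, 45, 55]
ccc, rho, c_b = lins_ccc(reference, new_device)
print(round(rho, 3), round(c_b, 3), round(ccc, 3), interpret_mcbride(ccc))
```

Here Pearson's r is 1.0 (perfect precision), yet the constant bias pulls the CCC down to about 0.94, which McBride's scheme labels Moderate. A high Pearson's r alone therefore does not show that two methods agree.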

Conclusion

Interpretation of correlation coefficients differs considerably among scientific research areas. There are no absolute rules for the interpretation of their strength. Therefore, authors should avoid overinterpreting the strength of associations when they are writing their manuscripts.

Funding

None declared.

Conflicts of interest

HA reports no conflict of interest.

Author contributions

HA performed the literature search, designed the manuscript, drafted it and approved the final version. HA takes responsibility for the paper.
References

1. Liao JJ, Lewis JW. A note on concordance correlation coefficient. PDA J Pharm Sci Technol. 2000 Jan-Feb.

2. Chan YH. Biostatistics 104: correlational analysis. Singapore Med J. 2003 Dec.
