Bland-Altman analysis: A paradigm to understand correlation and agreement.

Nurettin Özgür Doğan

Abstract

The rapid increase in the number of new laboratory methods has led to the necessity of reliable verification methods. Validation of a new measurement method for application to medical practice requires comparison with gold standard techniques. The Bland-Altman analysis is a frequently applied technique in studies that investigate the agreement between two methods of the same medical measurement. In this review, potential areas of usage of the Bland-Altman analysis are elaborated from a clinical viewpoint, and possible pitfalls in study designs are discussed from a statistical perspective.

Keywords:  Biostatistics; Bland-Altman analysis; Correlation analysis; Limits of agreement

Year:  2018        PMID: 30533555      PMCID: PMC6261099          DOI: 10.1016/j.tjem.2018.09.001

Source DB:  PubMed          Journal:  Turk J Emerg Med        ISSN: 2452-2473


Introduction

The Bland-Altman analysis was proposed by Martin Bland and Douglas Altman over thirty years ago in an article published in the Lancet. Their main argument concerned the incorrect use of correlation coefficients when comparing a new measurement technique with an established gold standard. The article, which dealt with the differences between measurements obtained by two different measurement systems, is regarded as the sixth most-cited paper in the statistics literature. In the following years, their method has come to be regarded as the most appropriate way of determining the limits of agreement (LOA) between measurements. Medical laboratories and clinicians often need to assess the agreement between two measurement methods. Validation of a clinical measurement method is a demanding and lengthy process, which requires acceptable LOA between the two techniques. When the measurements being compared are continuous variables (e.g. leucocyte count, antibody titer, body temperature), the Bland-Altman analysis is an appropriate way to perform this comparison and provides quantified measures for deciding whether the new method is acceptable. This review focuses on the current approach to the Bland-Altman method and its applications in clinical practice.

Concept of correlation analysis

For many years, correlation analysis has been used to assess the relationship between one variable and another. Correlation analysis is classified as part of a larger class of statistical techniques known as regression. Regression analysis uses the principles of correlation, but it does more than just describe the strength of a relationship between two variables. The main result of correlation analysis is the correlation coefficient (r), which ranges from −1.0 to +1.0. The closer the coefficient is to either end of this range, the stronger the linear relationship. A correlation coefficient measures the strength of the relationship between variables; it does not measure their agreement. A fictitious data set is provided in Table 1. In this dataset, potassium measurements from venous blood gas analysis and from a biochemistry panel are presented for each patient. A glance at these values suggests that they are very close to each other. A Spearman correlation analysis gives a correlation coefficient (Spearman's rho) of 0.885 (p < 0.001), which indicates a very strong relationship between the variables.
Table 1

Dataset for potassium levels in venous blood gases and blood electrolyte work-up.

Patient Nr. | Potassium level (mEq/L), obtained from venous blood gas analysis | Potassium level (mEq/L), obtained from blood electrolyte levels | Mean potassium level (mEq/L) | Difference between potassium levels (mEq/L)
1 | 4.5 | 4.7 | 4.6 | 0.2
2 | 3.8 | 4.2 | 4.0 | 0.4
3 | 5.1 | 5.1 | 5.1 | 0.0
4 | 4.9 | 5.3 | 5.1 | 0.4
5 | 3.9 | 4.0 | 3.95 | 0.1
6 | 4.0 | 3.8 | 3.9 | −0.2
7 | 4.1 | 4.0 | 4.05 | −0.1
8 | 4.3 | 4.0 | 4.15 | −0.3
9 | 5.3 | 5.3 | 5.3 | 0.0
10 | 5.2 | 5.1 | 5.15 | −0.1
11 | 3.9 | 4.0 | 3.95 | 0.1
12 | 4.1 | 4.4 | 4.25 | 0.3
13 | 4.0 | 4.2 | 4.1 | 0.2
14 | 5.3 | 5.1 | 5.2 | −0.2
15 | 5.5 | 5.3 | 5.4 | −0.2
16 | 4.4 | 4.2 | 4.3 | −0.2
17 | 4.9 | 5.0 | 4.95 | 0.1
18 | 3.7 | 3.9 | 3.8 | 0.2
19 | 3.9 | 3.7 | 3.8 | −0.2
20 | 4.8 | 4.7 | 4.75 | −0.1
21 | 5.5 | 5.2 | 5.35 | −0.3
22 | 3.7 | 3.8 | 3.75 | 0.1
23 | 3.7 | 3.9 | 3.80 | 0.2
24 | 4.8 | 4.2 | 4.5 | −0.6
25 | 5.1 | 5.6 | 5.35 | 0.5

Does this mean that we can use one variable in place of the other? Can we replace a laboratory method with the new one on the basis of this strong relationship? This argument is not always correct. Correlation analysis can link variables that merely happen to vary together, without any real association between them. In this setting, Spearman's rho indicates only the strength of the relationship, and the small p-value provides only strong evidence against the null hypothesis. Consequently, the null hypothesis is rejected and there is probably a relationship. However, the results of the correlation analysis do not answer the following questions: [a] Is this co-occurrence an incidental finding, or does it reflect a meaningful clinical association? [b] What is the probability of error in each measurement of potassium? A high correlation does not imply that there is good agreement between the two methods. Moreover, data which seem to be in poor agreement can produce quite high correlations.
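The point that correlation does not measure agreement can be illustrated with a minimal sketch on the Table 1 values. The code below computes Spearman's rho (reported in the text as 0.885) with a hand-written rank correlation, then repeats the calculation for a hypothetical second method that reads 1.0 mEq/L too high on every sample. The shifted method, the helper names (`average_ranks`, `pearson`, `spearman`) and the choice to take differences as panel minus blood gas (as tabulated in Table 1) are assumptions made for illustration only.

```python
from statistics import mean

# Table 1 values (mEq/L): venous blood gas vs. blood electrolyte panel.
vbg = [4.5, 3.8, 5.1, 4.9, 3.9, 4.0, 4.1, 4.3, 5.3, 5.2, 3.9, 4.1, 4.0,
       5.3, 5.5, 4.4, 4.9, 3.7, 3.9, 4.8, 5.5, 3.7, 3.7, 4.8, 5.1]
panel = [4.7, 4.2, 5.1, 5.3, 4.0, 3.8, 4.0, 4.0, 5.3, 5.1, 4.0, 4.4, 4.2,
         5.1, 5.3, 4.2, 5.0, 3.9, 3.7, 4.7, 5.2, 3.8, 3.9, 4.2, 5.6]


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical method that is biased upward by 1.0 mEq/L on every sample.
shifted_panel = [value + 1.0 for value in panel]

rho_original = spearman(vbg, panel)
rho_shifted = spearman(vbg, shifted_panel)
bias_original = mean(p - v for p, v in zip(panel, vbg))
bias_shifted = mean(p - v for p, v in zip(shifted_panel, vbg))

print(f"rho (Table 1):      {rho_original:.3f}, mean difference {bias_original:.3f}")
print(f"rho (shifted by 1): {rho_shifted:.3f}, mean difference {bias_shifted:.3f}")
```

Because ranks are unaffected by a constant offset, rho is identical in both cases, while the mean difference grows by a full 1.0 mEq/L. The correlation coefficient alone therefore cannot reveal a systematic disagreement of this kind.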

Analysis of the differences between variables

Bland and Altman quantified the difference between measurements using a graphical method. They drew a scatterplot in which the X-axis represented the average [(K1 + K2)/2] and the Y-axis represented the difference (K1 – K2) of two measurements. After the graph is drawn, the mean bias (the mean of K1 – K2) and the limits of agreement should be quantified. Using statistical software, a one-sample t-test can be performed to obtain the mean bias and its standard deviation (SD); these two values are all that is needed to represent the mean bias and the limits of agreement. The limits are then drawn at about ±2 SD around the mean bias (more precisely, mean ± 1.96 SD), a range that contains about 95% of the differences when they are normally distributed. Ideal agreement means zero difference between measurements; in that setting, the average difference and its limits also lie near zero. For our dataset, the mean difference (mean bias) was 0.012 with an SD of 0.260. A scatterplot should be drawn to show the dispersion of the data, with the average on the X-axis and the difference on the Y-axis. The LOA can be drawn manually if the statistical software does not display them automatically. In our data set, the upper limit is calculated as mean + 1.96 × SD (0.012 + 1.96 × 0.260 = 0.522) and the lower limit as mean − 1.96 × SD (0.012 − 1.96 × 0.260 = −0.498). An appropriate statement for the manuscript would be the following: The Bland-Altman plot showed the mean bias ± SD between the first and second potassium levels as 0.012 ± 0.260 mEq/L, and the limits of agreement were −0.498 and 0.522 (Fig. 1).
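The calculation described above can be reproduced with a short sketch that uses only the Python standard library. The variable names are illustrative, and the difference is taken as electrolyte panel minus venous blood gas because that is the sign convention of the difference column in Table 1; this is an assumption about how the tabulated values map onto K1 and K2.

```python
from statistics import mean, stdev

# Table 1 values (mEq/L): venous blood gas vs. blood electrolyte panel.
vbg = [4.5, 3.8, 5.1, 4.9, 3.9, 4.0, 4.1, 4.3, 5.3, 5.2, 3.9, 4.1, 4.0,
       5.3, 5.5, 4.4, 4.9, 3.7, 3.9, 4.8, 5.5, 3.7, 3.7, 4.8, 5.1]
panel = [4.7, 4.2, 5.1, 5.3, 4.0, 3.8, 4.0, 4.0, 5.3, 5.1, 4.0, 4.4, 4.2,
         5.1, 5.3, 4.2, 5.0, 3.9, 3.7, 4.7, 5.2, 3.8, 3.9, 4.2, 5.6]

# Y-axis of the plot: difference per patient (panel minus blood gas,
# matching the sign of the difference column in Table 1).
differences = [p - v for v, p in zip(vbg, panel)]
# X-axis of the plot: average of the two measurements per patient.
averages = [(p + v) / 2 for v, p in zip(vbg, panel)]

mean_bias = mean(differences)
sd_diff = stdev(differences)  # sample SD (n - 1 in the denominator)

# Limits of agreement: mean bias +/- 1.96 SD of the differences.
lower_loa = mean_bias - 1.96 * sd_diff
upper_loa = mean_bias + 1.96 * sd_diff

print(f"mean bias = {mean_bias:.3f} mEq/L, SD = {sd_diff:.3f} mEq/L")
print(f"limits of agreement: {lower_loa:.3f} to {upper_loa:.3f} mEq/L")
```

Rounded to three decimals, this gives the values quoted in the text: a mean bias of 0.012, an SD of 0.260, and limits of agreement of −0.498 and 0.522 mEq/L. The `averages` and `differences` lists are the coordinates one would pass to any plotting tool to draw the scatterplot.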
Fig. 1

Agreement between two potassium measurements (Bland-Altman plot).

The scatterplot can be evaluated according to the dispersion of the points. With good agreement, the scatter is small and the points lie relatively close to the line that represents the mean bias. As quantifiable measures, the mean bias and the limits of agreement give information about the utility of the new measurement method. For our data set, the two methods can be used interchangeably, as the limits of agreement span only about one mEq/L of potassium in total (roughly ±0.5 mEq/L around the mean bias).
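When a plot is not at hand, the same reading of the scatter can be approximated numerically by listing which differences fall outside the limits of agreement. The sketch below does this for the difference column of Table 1; the variable names are illustrative, and this count is a simple descriptive check rather than a formal test.

```python
from statistics import mean, stdev

# Difference column of Table 1 (mEq/L), patients 1 to 25 in order.
differences = [0.2, 0.4, 0.0, 0.4, 0.1, -0.2, -0.1, -0.3, 0.0, -0.1,
               0.1, 0.3, 0.2, -0.2, -0.2, -0.2, 0.1, 0.2, -0.2, -0.1,
               -0.3, 0.1, 0.2, -0.6, 0.5]

mean_bias = mean(differences)
sd_diff = stdev(differences)
lower_loa = mean_bias - 1.96 * sd_diff
upper_loa = mean_bias + 1.96 * sd_diff

# Patient numbers (1-based) whose difference lies outside the limits.
outside = [patient for patient, d in enumerate(differences, start=1)
           if d < lower_loa or d > upper_loa]
share_within = 1 - len(outside) / len(differences)

print(f"patients outside the limits: {outside}")
print(f"share of differences within the limits: {share_within:.0%}")
```

For Table 1 only patient 24 (difference −0.6 mEq/L) falls outside the limits, so 24 of 25 differences (96%) lie within them, close to the roughly 95% expected for normally distributed differences.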

Clinical implication and potential areas of usage

Only a clinician who uses the test results in a clinical setting can decide whether the mean bias and LOA are acceptable. For instance, a mean bias of 0.2 mEq/L is obviously acceptable for potassium levels. However, 3 mEq/L is too broad and can lead to lethal complications if the actual potassium value is higher in the biochemistry panel. The Bland-Altman analysis has been used in many method comparisons in the literature. It may be used to compare two new measurement methods, or one measurement method against a reference standard. The measured variables should be continuous (not categorical), such as hemoglobin level (g/dl), anti-HCV antibody titer or the size of a tumor (cm). The Bland-Altman method is a popular approach, and published reports include, but are not limited to, comparisons of two hemodynamic measurements, end-tidal carbon dioxide measurement methods, different electrolyte level measurement methods, self-assessed general well-being scores, and the performance of different computed tomography technologies in evaluating pulmonary nodules.
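Once a clinician has fixed a maximum difference that would be tolerable in practice, checking the limits of agreement against it is a one-line comparison. The sketch below is only an illustration: the function name `loa_within_tolerance` is hypothetical, and the tolerance values passed to it are placeholders, not clinical recommendations.

```python
def loa_within_tolerance(lower_loa, upper_loa, max_allowed_difference):
    """Return True when both limits of agreement lie inside
    +/- max_allowed_difference (a tolerance chosen by the clinician)."""
    return (-max_allowed_difference <= lower_loa
            and upper_loa <= max_allowed_difference)


# Limits of agreement reported in the text for the potassium data (mEq/L).
lower_loa, upper_loa = -0.498, 0.522

# Placeholder tolerances, for illustration only (NOT clinical guidance).
for tolerance in (0.3, 0.6):
    verdict = loa_within_tolerance(lower_loa, upper_loa, tolerance)
    print(f"tolerance ±{tolerance} mEq/L -> acceptable: {verdict}")
```

The statistical output is the same in both cases; only the clinically chosen tolerance changes the verdict, which is why the decision rests with the clinician rather than with the analysis itself.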

Pitfalls in Bland-Altman analysis

One of the critical problems in the Bland-Altman analysis is the need to meet the assumption of normal distribution. The continuous measurement variables themselves need not be normally distributed, but their differences should be. If the assumption of normal distribution is not met, the data may be logarithmically transformed. Normality may be tested using classical methods such as the Shapiro-Wilk test or the Kolmogorov-Smirnov test; visual evaluation of the histogram alone may not be adequate. Another problem arises from the sample size. Studies comparing methods of measurement should be adequately sized to conclude that the effects are universally valid. If the sample size is not adequate, it is possible to find a low mean bias and narrow limits of agreement when comparing two methods. Such methods cannot be recommended for general use without verification by the results of other studies. To calculate the sample size, a maximum allowed difference derived from other studies should be provided. Some authors argue that regression analysis can also be performed to compare two methods of measurement. The Bland-Altman analysis may be affected by proportional bias, which is present when the difference between the values from the two methods increases or decreases in proportion to the average values. Although this remains an unsettled area, Ludbrook indicated that the two methods could be used for different purposes: regression analysis can be used if the investigator's concern is to calibrate one measurement against another or to detect bias between two methods of measurement, whereas the Bland-Altman method may be used if the goal is to determine whether one method may be safely substituted for another, particularly in clinical practice. Another problem in the Bland-Altman analysis is repeated-measures designs. The standard Bland-Altman analysis is not an appropriate method for comparing repeated measurements; however, it can be performed by adding a random effects model to the analysis. In addition, some statistical software packages allow the Bland-Altman method to be applied to repeated-measures designs. Finally, meta-analysis of studies conducted with the Bland-Altman analysis is still under debate, although a framework for the meta-analysis of Bland-Altman studies based on a limits of agreement approach has recently been published.
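Proportional bias, as defined above, can be screened for by regressing the differences on the averages: a slope clearly different from zero suggests that the disagreement changes with the magnitude of the measurement. The following is a minimal sketch on the Table 1 columns using an ordinary least-squares fit written out by hand; the helper name `least_squares` is illustrative, and a formal assessment would also need the standard error or confidence interval of the slope, which is not computed here.

```python
from statistics import mean

# Mean and difference columns of Table 1 (mEq/L), patients 1 to 25.
averages = [4.6, 4.0, 5.1, 5.1, 3.95, 3.9, 4.05, 4.15, 5.3, 5.15,
            3.95, 4.25, 4.1, 5.2, 5.4, 4.3, 4.95, 3.8, 3.8, 4.75,
            5.35, 3.75, 3.80, 4.5, 5.35]
differences = [0.2, 0.4, 0.0, 0.4, 0.1, -0.2, -0.1, -0.3, 0.0, -0.1,
               0.1, 0.3, 0.2, -0.2, -0.2, -0.2, 0.1, 0.2, -0.2, -0.1,
               -0.3, 0.1, 0.2, -0.6, 0.5]


def least_squares(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Slope of difference on average: the change in disagreement (mEq/L)
# per 1 mEq/L increase in the average potassium level.
slope, intercept = least_squares(averages, differences)
print(f"difference = {intercept:.3f} + {slope:.3f} * average")
```

A slope near zero is consistent with a bias that is roughly constant across the measurement range; a marked positive or negative slope would indicate that a single pair of limits of agreement does not describe the whole range well.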

Conclusion

Correlation analysis may lead to incorrect or debatable results in the comparison of two measurement methods. The Bland-Altman analysis is a simple and accurate way to quantify agreement between two variables and may help clinicians to compare a new measurement method against another one or against a reference standard.

Conflict of interest

The author declares no conflicts of interest.

Source of funding

None declared.

Author contributions

NOD designed and wrote the manuscript and takes responsibility for the paper as a whole.