
Assessing the reliability of the modified Gartland classification system for extension-type supracondylar humerus fractures.

T L Teo¹, E K Schaeffer¹, E Habib¹, A Cherukupalli¹, A P Cooper¹, A Aroojis², W N Sankar³, V V Upasani⁴, S Carsen⁵, K Mulpuri¹, C Reilly¹.

Abstract

PURPOSE: The Gartland extension-type supracondylar humerus (SCH) fracture is the most common paediatric elbow fracture. Treatment options range from nonoperative treatment (taping or casting) to operative treatments (closed reduction and percutaneous pinning or open reduction). Classification variability between surgeons is a potential contributing factor to existing controversy over treatment options for type II SCH fractures. This study investigated levels of agreement in extension-type SCH fracture classification using the modified Gartland classification system.
METHODS: A retrospective review was conducted on 60 patients aged between two and 12 years who had sustained an extension-type SCH fracture and received operative or nonoperative treatment at a tertiary children's hospital. Baseline radiographs were provided, and surgeons were asked to classify the fractures as type I, IIA, IIB or III according to the modified Gartland classification. Respondents were then asked to complete a second round of classifications using reshuffled radiographs. Weighted kappa values were calculated to assess interobserver and intraobserver levels of agreement.
RESULTS: In all, 21 paediatric orthopaedic surgeons responded to the survey and 15 completed a second round of ratings. Interobserver agreement for classification based on the modified Gartland criteria between surgeons was substantial, with a weighted kappa of 0.679 (95% confidence interval (CI) 0.468 to 0.889). Intraobserver agreement was also substantial, with a weighted kappa of 0.796 (95% CI 0.628 to 0.964).
CONCLUSION: Radiographic classification of extension-type SCH fractures demonstrated substantial agreement both between and within surgeon raters. Therefore, classification variability may not be a major contributing factor to the treatment controversy for type II SCH fractures and treatment variability may be due to differences in surgeon preferences. LEVEL OF EVIDENCE: III.


Keywords: Gartland classification; elbow fracture; paediatrics; reliability; supracondylar humerus fracture

Year: 2019 PMID: 31908673 PMCID: PMC6924127 DOI: 10.1302/1863-2548.13.190005

Source DB: PubMed Journal: J Child Orthop ISSN: 1863-2521 Impact factor: 1.548

Introduction

The Gartland extension-type supracondylar humerus fracture is the most common paediatric elbow fracture.[1] Depending on fracture classification, treatment options range from nonoperative, such as closed reduction and/or tape, splint or cast immobilization, to operative, such as closed reduction and percutaneous pinning or open reduction and pinning.[2] There is generally little controversy over whether nonoperative or operative management is recommended for Gartland type I and type III fractures. However, controversy still exists around the world over type II fracture treatment: many surgeons prefer operative treatment, while some choose to treat type II fractures nonoperatively. Despite a trend towards operative treatment for type II fractures, the current evidence does not completely support the superiority of operative over nonoperative methods.[3] A potential contributing factor to this controversy is variability in the classification of fractures by different orthopaedic surgeons. The literature generally supports the use of the modified Gartland classification system; however, the classification of certain subtypes of supracondylar fractures, and therefore the treatment regimen that follows, can vary considerably between surgeons.[4] Differences in classification can mean the difference between operative and nonoperative treatment for a patient. For example, a patient with a borderline type I/IIA fracture seen by a surgeon who treats type IIA fractures operatively may be treated with a splint if the fracture is classified as type I, or with surgery if it is classified as type IIA. Similarly, a patient with a borderline type IIA/IIB fracture seen by a surgeon who treats type IIA fractures nonoperatively may be treated with casting if the fracture is classified as type IIA, or with surgery if it is classified as type IIB.
It is necessary to determine whether differences in treatment preferences for type II fractures between surgeons are due to a true difference in practice patterns or due to surgeons classifying the same fractures differently. If a true difference in practice patterns exists, then there is a need for further research to compare outcomes between nonoperative and operative management in order to standardize patient care. If, however, the differences in treatment preferences are due to different patterns of classification between surgeons, then it may call into question the utility of the modified Gartland classification system in determining management. In this case, further exploration into the factors influencing individual surgeon decision-making may allow for the development of a more reliable diagnostic classification. To date, there has been no classification reliability study performed between surgeons across countries. The purpose of this study was to investigate the levels of agreement between surgeons from Canada, the United States (USA), Australia, the United Kingdom (UK), and India in the classification of extension-type supracondylar humerus fractures using the modified Gartland classification system.

Materials and methods

After receiving institutional Research Ethics Board approval at the University of British Columbia, a retrospective radiographic and chart review was conducted on patients aged two to 12 years who had sustained an extension-type supracondylar humerus fracture between January 2005 and December 2016, received either operative or nonoperative treatment at a tertiary paediatric hospital, and had adequate pre-reduction radiographs available. Radiograph adequacy was defined by a true anteroposterior (AP) view on AP radiographs with orthogonal visualization of the distal humerus and clear delineation of the hourglass sign and capitellum on lateral radiographs. A total of 60 patients were selected for inclusion after review. These patients had received a Gartland classification diagnosis by one of seven surgeons from a single institution: ten were diagnosed with type I fractures, 25 with type II fractures and 25 with type III fractures. Many of the cases were chosen because they straddled the borderline between two categories, to reflect the difficulty in decision-making often seen in clinical practice. Each patient had an adequate set of baseline AP and lateral plain elbow radiographs as determined by the senior author (CR); these were de-identified and compiled into surveys administered through Research Electronic Data Capture (REDCap) software (Vanderbilt University, Nashville, Tennessee). Invitations to participate in the survey were sent out to fellowship-trained paediatric orthopaedic surgeons practising in tertiary care hospitals around the world. This study was conducted in two rounds. For the first round, surgeons were provided with a brief pictorial and table summary of the Wilkins-modified[5] Gartland classification system, along with the compiled radiographs (Fig. 1). Each surgeon was blinded to patient treatment and original diagnosis and asked to classify the fractures as type I, IIA, IIB or III according to the system provided. 
In order to assess intraobserver agreement, the same radiographs were reshuffled and surgeons were asked to reclassify each set of radiographs using the same classification system following a two-week interval.

Fig. 1

Summary of Wilkins-modified Gartland classification system[5] provided to survey respondents. Figure 1 is being reprinted with permission and licensing from Wolters Kluwer Health, Inc. Journal content for figure re-use: Authors: Timothy Alton, Shawn Werner, and Albert Gee; Article Title: Classifications In Brief: The Gartland Classification of Supracondylar Humerus Fractures; Journal: Clinical Orthopaedics and Related Research; Volume 473; Issue 2; Pages 738-741; URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4294919/

Computer-generated weighted kappa statistics using pairwise comparisons were calculated along with 95% confidence intervals (CI) to assess interobserver and intraobserver levels of agreement. The weighted kappa coefficients used (Table 1) took into consideration the varying levels of clinical importance for disagreement between each classification (e.g. a disagreement between type I and IIA would be less significant than one between type I and III) as determined by two orthopaedic surgeons (CR, KM). The kappa values were interpreted using the Landis and Koch guidelines[6] outlined as follows: values < 0.00 indicate poor agreement; 0.00 to 0.20 slight agreement; 0.21 to 0.40 fair agreement; 0.41 to 0.60 moderate agreement; 0.61 to 0.80 substantial agreement; and 0.81 to 1.00 excellent or almost perfect agreement.
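The interpretation bands listed above can be expressed as a small lookup. The following is an illustrative sketch only, not the software used in the study; the function name `interpret_kappa` is hypothetical, and assigning values that fall between the published bands (e.g. 0.205) to the higher band is an assumption made here for completeness.

```python
def interpret_kappa(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch category quoted in the text."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0.0:
        return "poor"
    # Upper bound of each band as listed in the text. Values between
    # published bands (e.g. 0.205) fall into the higher band: an assumption.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "excellent or almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("unreachable for kappa <= 1")


# Values reported in this article.
for value in (0.679, 0.796, 0.554, 0.898):
    print(value, interpret_kappa(value))
```

Applied to the values reported in this article, the overall interobserver (0.679) and intraobserver (0.796) kappas both fall in the substantial band.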

Table 1

Relative weights assigned to kappas for disagreement between observers

Disagreement between classifications	Weight
Type I vs IIA	1
Type I vs IIB	4
Type I vs III	6
Type IIA vs IIB	3
Type IIA vs III	4
Type IIB vs III	1
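
To illustrate how disagreement weights such as those in Table 1 enter a weighted kappa, the sketch below computes Cohen's weighted kappa for one pair of raters and then averages it over all rater pairs. This is a minimal sketch under stated assumptions: the article does not specify the exact formula or software used, averaging pairwise kappas is only one common way to combine them, all function names are hypothetical, and the example ratings are invented for illustration rather than taken from the study data.

```python
from itertools import combinations

CATEGORIES = ["I", "IIA", "IIB", "III"]

# Disagreement weights from Table 1. Identical classifications cost 0.
PAIR_WEIGHTS = {
    ("I", "IIA"): 1, ("I", "IIB"): 4, ("I", "III"): 6,
    ("IIA", "IIB"): 3, ("IIA", "III"): 4, ("IIB", "III"): 1,
}


def weight(a: str, b: str) -> int:
    """Symmetric disagreement weight between two classifications."""
    if a == b:
        return 0
    return PAIR_WEIGHTS[(a, b)] if (a, b) in PAIR_WEIGHTS else PAIR_WEIGHTS[(b, a)]


def weighted_kappa(r1: list, r2: list) -> float:
    """Cohen's weighted kappa: 1 - observed / expected weighted disagreement."""
    n = len(r1)
    if n == 0 or n != len(r2):
        raise ValueError("both raters must classify the same non-empty case list")
    # Observed weighted disagreement, averaged over cases.
    observed = sum(weight(a, b) for a, b in zip(r1, r2)) / n
    # Expected weighted disagreement if the raters were independent,
    # based on each rater's marginal category frequencies.
    p1 = {c: r1.count(c) / n for c in CATEGORIES}
    p2 = {c: r2.count(c) / n for c in CATEGORIES}
    expected = sum(p1[a] * p2[b] * weight(a, b)
                   for a in CATEGORIES for b in CATEGORIES)
    if expected == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1 - observed / expected


def mean_pairwise_kappa(ratings: list) -> float:
    """Average weighted kappa over every pair of raters (an assumed pooling rule)."""
    pairs = list(combinations(ratings, 2))
    return sum(weighted_kappa(a, b) for a, b in pairs) / len(pairs)


# Invented ratings of four cases by three hypothetical raters.
rater_a = ["I", "IIA", "IIB", "III"]
rater_b = ["IIA", "IIA", "IIB", "III"]   # one mild disagreement (I vs IIA)
rater_c = ["III", "IIA", "IIB", "III"]   # one severe disagreement (I vs III)
print(weighted_kappa(rater_a, rater_b))
print(weighted_kappa(rater_a, rater_c))
print(mean_pairwise_kappa([rater_a, rater_b, rater_c]))
```

With these weights, a single type I versus III disagreement lowers kappa far more than a single type I versus IIA disagreement, which is the behaviour the weighting scheme is intended to produce.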


Results

Participant demographics

A total of 31 surgeons, comprising members of the Canadian Pediatric Orthopaedic Group and surgeons known to one of the study authors (KM), were invited to participate. In all, 21 surgeons (14 from Canada, three from the USA, two from Australia, one from the UK, one from India) representing 17 tertiary care hospitals responded to the first round of surveys. Of these respondents, four were from the same institution. The mean length of practice across all respondents was ten years (range 1 to 26). The majority of the respondents treat approximately 41 to 60 supracondylar humerus fractures a year, and none treat fewer than 20 a year. After a two-week period, 15 of the original survey respondents completed the second round of surveys.

Interobserver level of agreement

The weighted interobserver agreement for classification based on the modified Gartland criteria across all respondents was substantial (Table 2). Levels of agreement were similarly substantial for the Canadian and non-Canadian groups, with weighted kappa values of 0.687 (95% CI 0.501 to 0.873) and 0.663 (95% CI 0.436 to 0.891), respectively. Across the four respondents from the same institution, the level of agreement was substantial with a weighted kappa of 0.746 (95% CI 0.654 to 0.839). The most common source of disagreement was between the type IIA and IIB classifications.

Table 2

Weighted interobserver and intraobserver kappas (κ)

Patient cases, n	Combined weighted interobserver κ (95% CI) (n = 21)	Combined weighted intraobserver κ (95% CI) (n=15)
60	0.679 (0.468 to 0.889)	0.796 (0.628 to 0.964)

CI, confidence interval
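
As a quick arithmetic check on Table 2, the sketch below computes the midpoint of each reported interval and the standard error it would imply. This assumes the intervals are symmetric normal-approximation intervals (estimate ± 1.96 × SE); the article does not state how the intervals were derived, so the implied standard errors are illustrative only, and the function names are hypothetical.

```python
def interval_midpoint(lower: float, upper: float) -> float:
    """Centre of a confidence interval."""
    return (lower + upper) / 2


def implied_standard_error(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error implied by a symmetric interval of the form estimate +/- z * SE."""
    return (upper - lower) / (2 * z)


# 95% confidence limits as reported in Table 2.
TABLE_2 = {
    "interobserver": (0.468, 0.889),
    "intraobserver": (0.628, 0.964),
}

for label, (lower, upper) in TABLE_2.items():
    print(label,
          "midpoint:", interval_midpoint(lower, upper),
          "implied SE:", implied_standard_error(lower, upper))
```

The midpoints (0.6785 and 0.796) agree with the reported kappas of 0.679 and 0.796 to within rounding, which is consistent with symmetric intervals around the point estimates.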


Intraobserver level of agreement

The weighted intraobserver agreement for classification based on the modified Gartland criteria for all respondents was substantial (Table 2). Individual intraobserver agreements for each of the 15 respondents who completed both surveys ranged from 0.554 (moderate) to 0.898 (excellent or almost perfect).

Discussion

The results of this study demonstrate a substantial level of interobserver and intraobserver agreement in the radiographic classification of extension-type supracondylar humerus fractures at baseline. The levels of agreement are substantial enough to suggest that classification variability is not a major contributing factor to variability in treatment between surgeons for type II supracondylar fractures. However, as levels of agreement were not perfect, we acknowledge that there is still a proportion of cases for which treatment decisions may vary based on the way individual surgeons classify them.
The incidence of paediatric supracondylar fractures has increased over the last decade, potentially due to children’s greater participation in high-energy recreational activities in recent years.[3] In an effort to standardize the management of these fractures, the Gartland classification system and treatment algorithm was created in 1959 and continues to form the basis of the American Academy of Orthopaedic Surgeons’ treatment recommendations for paediatric supracondylar fractures.[1] Despite this, controversy still exists over the necessity of operative treatment in the management of Gartland type IIA fractures. While this may reflect true differences in surgeon preferences, it may also be the result of classification differences, where a fracture may be classified as type I and thus treated nonoperatively by one surgeon, and classified as IIA or IIB by another and treated operatively. Consequently, we need to ascertain that fractures are being classified consistently in order to compare treatment outcomes for nonoperative and operative methods within a single fracture type. This would not only allow for confidence when interpreting the results of a potential multi-centre prospective study, but also allow for meaningful comparisons across multiple studies, with the knowledge that surgeons are discussing the same kinds of fractures.
Previous studies have investigated the reliability of the Gartland classification system (Table 3). Barton et al[4] performed a study with five physicians of varying levels of training and areas of expertise – a junior orthopaedic resident, a senior orthopaedic resident, a paediatric orthopaedic fellow, an attending-level paediatric orthopaedic surgeon and an attending-level paediatric orthopaedic radiologist. The study found substantial interobserver agreement and excellent intraobserver agreement. A subsequent study involving four orthopaedic surgeons based in the UK found moderate interobserver agreement overall, poor interobserver agreement for type I fractures, fair to moderate interobserver agreement for type II fractures, substantial to excellent interobserver agreement for type III fractures and substantial to excellent intraobserver agreement overall.[7] A later study in 2010 involving four fellowship-trained paediatric orthopaedic surgeons based in New York confirmed moderate interobserver agreement overall, moderate interobserver agreement for type I fractures, moderate interobserver agreement for type II fractures, substantial interobserver agreement for type III fractures and substantial intraobserver agreement overall.[2] Both of the latter studies suggested that injury mechanism, soft-tissue status and neurovascular compromise are considered by surgeons in addition to the Gartland classification in making treatment decisions. A more recent study from Brazil between three paediatric orthopaedic surgeons found substantial interobserver agreement overall, excellent interobserver agreement for type I fractures, moderate interobserver agreement for type II fractures, substantial interobserver agreement for type III fractures and excellent intraobserver agreement overall.[8] Finally, a study by Leung et al[9] between five USA-based surgeons found moderate interobserver agreement overall and substantial intraobserver agreement overall. 
The authors of that study questioned the utility of the Wilkins-modified classification system in guiding clinical decision-making, particularly for type II fractures, where the reliability of classification between surgeons is low. They also found a much greater level of agreement when raters were asked instead to classify fractures as requiring operative or nonoperative treatment.

Table 3

Comparison with other studies evaluating the reliability of the Gartland classification system

Study	Number of patient cases	Number of raters	Raters’ country of practice	Classification system	Overall interobserver κ	Interpretation	Overall intraobserver κ	Interpretation
Barton et al (2001)[4]	50	5	United States	Modified Gartland	0.74	Substantial	0.81 to 0.84	Excellent or almost perfect
Heal et al (2007)[7]	50	4	United Kingdom	Modified Gartland	0.54	Moderate	0.68 to 0.83	Substantial to excellent or almost perfect
Mallo et al (2010)[2]	75	4	United States	Standard Gartland	0.521	Moderate	0.723	Substantial
Rocha et al (2015)[8]	50	3	Brazil	Standard Gartland	0.756	Substantial	0.719 to 0.859	Substantial to excellent or almost perfect
Leung et al (2018)[9]	200	5	United States	Modified Gartland	0.475	Moderate	0.777	Substantial
Present study	60	21	Canada, United States, United Kingdom, Australia, India	Modified Gartland	0.679	Substantial	0.796	Substantial

κ, kappa

Two observations may be drawn from these studies. Firstly, while the levels of interobserver agreement for type I and type III fractures and general intraobserver agreement tend to be high, interobserver agreement for type II fractures tends to be only fair to moderate. We chose not to perform subgroup analyses for individual classification types because doing so would have required using the original diagnosis as the gold standard for categorization, which is subjective and depends on individual surgeon variation. This was also evident in a study by Hyman et al[10], where classification of staging in patients with Legg-Calvé-Perthes disease was limited by the lack of a benchmark for comparison. They also considered the use of a senior author as a standard of comparison; however, such an approach was limited by the inability to determine the accuracy of the author’s rating. Nevertheless, we attempted to account for this by using weighted kappa coefficients in our calculations to reflect the varying degrees of clinical importance of discrepancies between fracture classifications. We also note that our observed frequency of disagreement was indeed highest between the type IIA and IIB classifications.
Secondly, none of these studies examined agreement between surgeons practising across different countries, and thus their results may not accurately reflect differences between surgeons around the world. The present study attempted to improve upon the generalizability of the findings by examining agreement between a considerably larger group of surgeon respondents representing multiple tertiary paediatric centres around the world.
There are limitations to this study. The surgeons invited to respond to this survey were all known to one of the study’s authors (KM) and many of them had completed their one-year paediatric orthopaedic fellowship at the same institution. This could have increased the level of agreement above the true value, since the surgeons’ diagnostic patterns would have been influenced by similar training. Nevertheless, these surgeons also performed their residencies at different institutions and have practised for a number of years at their current institutions, where their practice patterns would inevitably have been influenced differently. Another limitation is that a large proportion of the survey respondents were Canadian, which challenges the generalizability of the study results to countries outside of Canada. Most of the survey respondents practise in Western countries, with only one practising in an Asian country, so the study was not as internationally representative as intended. However, this is the first study to assess classification reliability using respondents from more than one institution or country. To assess the effect of having a large proportion of Canadian respondents, we examined the levels of agreement when Canadians and non-Canadians were grouped separately and found that, within the limitations of our sample size, levels of agreement were similar between the two groups, although a larger number of respondents would be needed to test this fully. Furthermore, while the level of agreement between the four surgeons from the same institution appears to be slightly better than the overall level of agreement, both kappa values still fall within the substantial agreement category, suggesting that neither shared institution nor shared country of practice has a large influence on the reliability of classification between surgeons.
Another limitation to the study is that the weighted kappa coefficients used to account for the varying levels of clinical importance of disagreement between classifications were determined by only two orthopaedic surgeons. The weighted coefficients were used in an attempt to account numerically for the difference in clinical significance of disagreement between different Gartland classifications, where a disagreement between type IIA and IIB fractures is clinically more significant than a disagreement between type I and IIA fractures. This was important since we were calculating a quantitative kappa value for agreement using categories between which the clinical significance of disagreement is not equal. However, we acknowledge that having more than two surgeons determine the weighting would have increased the generalizability of our results. A further limitation was that we did not have representation from physicians or surgeons practising at community-based hospitals, since all of the respondents were paediatric orthopaedic surgeons practising at tertiary referral hospitals. This might limit the generalizability of the findings to non-Western countries and non-tertiary hospital-based practices, since training, methods of diagnosis and treatment may differ in those settings. Finally, no residents or fellows were included in this survey, so the results may be limited to fellowship-trained attending-level paediatric orthopaedic surgeons.
The current orthopaedic literature is often limited by a lack of high-quality scientific data, particularly in the treatment of paediatric supracondylar humerus fractures, where levels of evidence are limited by methodological shortcomings inherent in retrospective study designs.[11] There is a clear need for larger, prospective controlled trials to compare outcomes from different treatment modalities more effectively, and these trials would ideally include multiple centres so as to increase the generalizability of findings. However, in order to perform such collaborations, we need to be certain that surgeons from different centres are classifying supracondylar humerus fractures in the same way. Unfortunately, while an ideal trial would have a central adjudication system to standardize classification, this is often limited by the availability of resources.
In conclusion, this study suggests that levels of agreement between orthopaedic surgeons are substantial enough to allow for the meaningful comparison of extension-type paediatric supracondylar humerus fractures across the practices of attending-level orthopaedic surgeons practising in Western countries. Classification variability does not seem to be a major contributing factor to the treatment controversy for type II supracondylar humerus fractures. Further research is needed to compare patient outcomes between nonoperative and operative treatment for these fractures, so as to establish consensus and a standardized treatment protocol for optimal patient care across centres.

10 in total

1. Interobserver and intraobserver reliability of the modified Waldenström classification system for staging of Legg-Calvé-Perthes disease.

Authors: Joshua E Hyman; Evan P Trupia; Margaret L Wright; Hiroko Matsumoto; Chan-Hee Jo; Kishore Mulpuri; Benjamin Joseph; Harry K W Kim
Journal: J Bone Joint Surg Am Date: 2015-04-15 Impact factor: 5.284

2. Classifications in brief: the Gartland classification of supracondylar humerus fractures.

Authors: Timothy B Alton; Shawn E Werner; Albert O Gee
Journal: Clin Orthop Relat Res Date: 2014-11-01 Impact factor: 4.176

3. The measurement of observer agreement for categorical data.

Authors: J R Landis; G G Koch
Journal: Biometrics Date: 1977-03 Impact factor: 2.571

4. The treatment of pediatric supracondylar humerus fractures.

Authors: Andrew Howard; Kishore Mulpuri; Mark F Abel; Stuart Braun; Matthew Bueche; Howard Epps; Harish Hosalkar; Charles T Mehlman; Susan Scherl; Michael Goldberg; Charles M Turkelson; Janet L Wies; Kevin Boyer
Journal: J Am Acad Orthop Surg Date: 2012-05 Impact factor: 3.020

5. Gartland Type II Supracondylar Humerus Fractures, Their Operative Treatment and Lateral Pinning Are Increasing: A Population-Based Epidemiologic Study of Extension-Type Supracondylar Humerus Fractures in Children.

Authors: Juha-Jaakko Sinikumpu; Tytti Pokka; Minna Sirviö; Willy Serlo
Journal: Eur J Pediatr Surg Date: 2016-12-09 Impact factor: 2.191

6. Reliability of a modified Gartland classification of supracondylar humerus fractures.

Authors: K L Barton; C K Kaminsky; D W Green; C J Shean; S M Kautz; D L Skaggs
Journal: J Pediatr Orthop Date: 2001 Jan-Feb Impact factor: 2.324

7. Use of the Gartland classification system for treatment of pediatric supracondylar humerus fractures.

Authors: Gregory Mallo; Scott J C Stanat; John Gaffney
Journal: Orthopedics Date: 2010-01 Impact factor: 1.390

8. Does the Modified Gartland Classification Clarify Decision Making?

Authors: Sophia Leung; Ebrahim Paryavi; Martin J Herman; Paul D Sponseller; Joshua M Abzug
Journal: J Pediatr Orthop Date: 2018-01 Impact factor: 2.324

9. Reproducibility of the Gartland classification for supracondylar humeral fractures in children.

Authors: J Heal; M Bould; J Livingstone; N Blewitt; A W Blom
Journal: J Orthop Surg (Hong Kong) Date: 2007-04 Impact factor: 1.118

10. Reproducibility of the AO/ASIF and Gartland classifications for supracondylar fractures of the humerus in children.

Authors: Igor Tadeu Silveira Rocha; André de Siqueira Faria; Carlos Fontoura Filho; Murilo Antônio Rocha
Journal: Rev Bras Ortop Date: 2015-05-28

