The intra- and interobserver reliability of five commonly used intertrochanteric femur fracture classification systems.

Cem Yıldırım, Osman Görkem Muratoğlu, Kaya Turan, Tugrul Ergün, Abdulhamit Mısır, Mahmud Aydın.

Abstract

OBJECTIVES: This study aims to evaluate the effect of surgical experience on the reliability of the Boyd-Griffin, Evans/Jensen, Evans, Orthopaedic Trauma Association (OTA; main groups and subgroups), and Tronzo classification systems.
PATIENTS AND METHODS: Between January 2013 and December 2014, radiological images of a total of 60 patients (13 males, 47 females; mean age: 78.9±21.9 years; range, 61 to 96 years) with the diagnosis of intertrochanteric femur fracture were analyzed. Radiographs were evaluated and classified by five residents and five orthopedics and traumatology surgeons according to the Evans, Boyd-Griffin, Evans/Jensen, OTA, and Tronzo classification systems. Intra- and interobserver reliability were calculated using kappa statistics.
RESULTS: Among the residents, intraobserver agreement was lowest for the OTA classification with subgroups (κ=0.516) and highest for the OTA main groups (κ=0.744). Among the surgeons, intraobserver agreement was lowest for the Evans classification (κ=0.456) and highest for the OTA main groups (κ=0.741). The best interobserver agreement was observed for the OTA main groups (κ=0.699).
CONCLUSION: The OTA main group classification showed the best agreement among residents, among surgeons, and between residents and surgeons.

Year: 2022 PMID: 35361094 PMCID: PMC9057552 DOI: 10.52312/jdrs.2022.498

Source DB: PubMed Journal: Jt Dis Relat Surg ISSN: 2687-4792

Introduction

Almost half of all hip fractures are extracapsular and are subclassified as intertrochanteric and subtrochanteric.[1] Fracture stability and fracture classification systems are used to recommend treatment in intertrochanteric fractures, including the proper implant or surgical technique. The ideal classification system allows interaction between physicians, guides planning, predicts the treatment outcome, and is applicable to clinical practice and research. Evaluation of a fracture by the same physician and by different physicians should yield the same result each time (intraobserver and interobserver reliability). Several classification systems are used for extracapsular hip fractures.[2-5] The most widely used is the Evans classification system modified by Jensen and Michaelsen.[4] More recently, the Arbeitsgemeinschaft für Osteosynthesefragen (AO) classification system has been introduced. Despite the widespread use of these systems and the thousands of publications regarding hip fracture, few studies have evaluated the reliability of classification systems, and even fewer have investigated their reliability among experienced physicians.[6,7]

Evans[2] described an anatomical classification based on the number of fragments and whether the lesser trochanter is split off as a separate fragment. The Jensen modification of Evans’ classification consists of five subtypes based on displacement, the number of fracture fragments, and posteromedial and medial support.[5] The Orthopaedic Trauma Association (OTA) classification for trochanteric femur fractures is built on three groups of fracture types, which are then divided, according to increasing fracture severity, into the subgroups A, B, or C.[8] Tronzo[9] subdivided these fractures into five types according to stability, posteromedial comminution, and fracture line extension. Boyd and Griffin[2] described another classification according to the degree of fracture line extension, comminution, subtrochanteric involvement, and extension to the shaft.

In the literature, there has been no comprehensive study evaluating intra- and interobserver reliability and the effect of surgeon experience for the five most used intertrochanteric femur fracture classification systems. In this study, therefore, we hypothesized that the interobserver reliability between senior residents and surgeons for intertrochanteric femur fracture classification systems would be moderate and that the intraobserver reliability of the AO/OTA main groups would be better than that of the other classification systems. We aimed to compare the inter- and intraobserver reliability of five different intertrochanteric fracture classification systems (Evans, Boyd-Griffin, Evans/Jensen, AO, Tronzo) between two groups of physicians with different levels of experience.

Patients and Methods

This retrospective study was conducted at Haseki Training and Research Hospital, Department of Orthopedics and Traumatology, between January 2013 and December 2014. Preoperative anteroposterior and lateral radiographs of surgically treated intertrochanteric femur fractures were screened. Anteroposterior and lateral radiographs of patients with femoral fractures, obtained randomly and retrospectively from hospital data, were selected for the study. The radiographs were chosen by an investigator who was blinded to the study protocol. Radiological images of a total of 60 patients (13 males, 47 females; mean age: 78.9±21.9 years; range, 61 to 96 years) were included. Patients’ data were not shared with the participants. Eligible radiographs were of adequate radiological quality to allow the investigator to classify the fracture and demonstrated an extracapsular hip fracture. Only anteroposterior and lateral views of the fractured hip were available (Figure 1).

The selected radiographs were evaluated by five senior residents and five orthopedic surgeons, each with more than five years of experience in orthopedic trauma. Each observer was given brief information on the original illustrations of the Evans, Boyd-Griffin, Evans/Jensen, AO, and Tronzo classification systems. With no clinical information, the observers blindly evaluated the radiographs according to each separately defined classification system. In similar studies, a three-month interval was considered appropriate to avoid recall bias.[10,11] Therefore, three months after the first evaluation, the observers re-evaluated the same radiographs, shown in a different order than on the first occasion. During this three-month period, the 60 designated radiographs were withheld from the observers.

A written informed consent was obtained from each patient. The study protocol was approved by the Istinye University Ethics Committee (date/no: 24.11.2021-2/2021.K-88). The study was conducted in accordance with the principles of the Declaration of Helsinki.

Statistical analysis

For intraobserver reliability, Cohen’s kappa value (κ) was obtained using the IBM SPSS for Windows version 21.0 software (IBM Corp., Armonk, NY, USA). Landis and Koch[12] defined values exceeding 0.80 as almost perfect agreement; values between 0.61 and 0.80 as substantial; values between 0.41 and 0.60 as moderate; values between 0.21 and 0.40 as fair; and values between zero and 0.20 as slight. Descriptive data were expressed in mean ± standard deviation (SD), median (min-max) or number and frequency. A p value of <0.05 was considered statistically significant.
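
As an illustration of the Cohen's kappa statistic named in the statistical analysis, the following is a minimal Python sketch rather than the SPSS procedure used in the study. The function name `cohen_kappa` and the example ratings are hypothetical; the ratings are invented AO/OTA main group labels for ten radiographs read twice by one observer.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Cohen's kappa for two equally long lists of category labels."""
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(first)
    # Observed agreement: share of radiographs given the same class both times.
    p_o = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement from the marginal class frequencies of each reading.
    c1, c2 = Counter(first), Counter(second)
    p_e = sum(c1[k] * c2[k] for k in c1) / (n * n)
    if p_e == 1.0:
        return 1.0  # both readings used one identical class throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical data (not from the study): one observer, two sessions.
session_1 = ["A1", "A1", "A2", "A2", "A3", "A1", "A2", "A3", "A2", "A1"]
session_2 = ["A1", "A2", "A2", "A2", "A3", "A1", "A2", "A3", "A1", "A1"]
kappa = cohen_kappa(session_1, session_2)
print(f"kappa = {kappa:.4f}")
```

In this invented example the observer agrees with the earlier reading on 8 of 10 radiographs, and chance agreement from the marginals is 0.36, so kappa is (0.80 - 0.36) / (1 - 0.36).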

Results

Interobserver and intraobserver agreement did not differ significantly between experienced surgeons and senior residents in any classification system (p>0.05).

Interobserver agreement

Interobserver agreement between experienced orthopedic surgeons and senior residents was moderate for the Boyd-Griffin classification (κ=0.572; 95% confidence interval [CI]: 0.532-0.616), the Evans/Jensen classification (κ=0.498; 95% CI: 0.450-0.553), and the Evans classification (κ=0.438; 95% CI: 0.400-0.481). For AO/OTA, agreement was moderate for the subgroups (κ=0.444; 95% CI: 0.418-0.470) but substantial for the main groups (κ=0.699; 95% CI: 0.649-0.750). The Tronzo classification also showed moderate agreement between surgeons and late-term residents (κ=0.554; 95% CI: 0.506-0.614).

Intraobserver agreement

In the repeated evaluation three months after the first assessment, experienced surgeons obtained substantial intraobserver agreement for the Boyd and Griffin classification (κ=0.658; 95% CI: 0.550-0.770), similar to senior residents (κ=0.660; 95% CI: 0.550-0.770). For the Evans/Jensen classification system, orthopedic surgeons achieved moderate agreement (κ=0.484; 95% CI: 0.434-0.542), while senior residents achieved substantial agreement (κ=0.625; 95% CI: 0.600-0.655). For the Evans classification, the resident group and the surgeon group both showed moderate agreement (κ=0.557; 95% CI: 0.519-0.595 and κ=0.456; 95% CI: 0.409-0.503, respectively). For the AO/OTA classification, the intraobserver agreement of the senior residents and the experienced surgeons was moderate for the subgroups (κ=0.516; 95% CI: 0.498-0.540 and κ=0.488; 95% CI: 0.418-0.558, respectively) and substantial for the main groups (κ=0.744; 95% CI: 0.708-0.785 and κ=0.741; 95% CI: 0.696-0.797, respectively). For the Tronzo classification, senior residents and experienced surgeons also showed moderate agreement (κ=0.528; 95% CI: 0.501-0.562 and κ=0.529; 95% CI: 0.489-0.569, respectively) (Table I).
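
The text does not state which kappa variant was used to pool the interobserver ratings of the ten observers. One common option for more than two raters is Fleiss' kappa; the sketch below assumes that choice purely for illustration and is not the authors' analysis. The function name `fleiss_kappa` and the counts are hypothetical: each row is one radiograph, and each column holds how many of ten observers chose that category.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters placing subject i in category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    total = n_subjects * n_raters
    n_categories = len(counts[0])
    # Share of all ratings that fall into each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Per-subject agreement: fraction of rater pairs that chose the same category.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        return 1.0  # all ratings fell into a single category
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical counts (not from the study): 4 radiographs, 10 observers,
# 3 categories. Observers split evenly on the last radiograph.
example_counts = [
    [10, 0, 0],
    [0, 10, 0],
    [0, 0, 10],
    [5, 5, 0],
]
kappa_example = fleiss_kappa(example_counts)
print(f"Fleiss' kappa = {kappa_example:.3f}")
```

The single radiograph with a 5/5 split is enough to pull the pooled value below 0.80 in this invented example, which shows how a few ambiguous fracture patterns can dominate a multi-rater statistic.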

Discussion

An ideal classification system creates a platform for universal communication among surgeons regarding common scenarios and methods of treatment. Classification systems should be easy to understand and should have good interobserver and intraobserver agreement. This is the first study to evaluate five established classification systems with the same surgeons and residents together.

The Evans/Jensen and AO/OTA classifications are the most widely used intertrochanteric fracture classifications. Although the AO/OTA classification provided higher agreement than the Evans/Jensen classification, Fung et al.[6] reported that its agreement was insufficient. Pervez et al.[10] showed that the AO/OTA main groups had better agreement than both the Evans/Jensen classification and the AO/OTA classification with subgroups. Schipper et al.[13] found that the AO/OTA main groups had better agreement than the classification with subgroups. Zarie et al.[7] reported that the agreement of the AO/OTA classification system was weak. In line with the literature, our study showed that, although the best interobserver and intraobserver agreement was found in the AO/OTA main groups, this agreement was only substantial, even among experienced surgeons (intraobserver κ=0.744 for residents and 0.741 for surgeons; interobserver κ=0.699).

In many studies investigating the reliability of fracture classification systems, the images were also evaluated by residents, and experience was shown to increase the reliability of the classifications used.[13,14] The low agreement among residents in those studies may be explained by their lack of surgical experience. Gehrchen et al.[15] had two senior residents and two junior residents classify 52 radiographs of intertrochanteric femur fractures according to the Evans/Jensen classification and did not detect a significant difference in agreement with increasing experience. Behrendt et al.[9] compared the Tronzo and AO/OTA classifications and reported that, similar to our study, experience had no effect on the results, but that the AO/OTA main groups showed better agreement than Tronzo. To avoid the limitations described in previous studies, we included professionals with a wider difference in experience level and five assessors in each category, more than in similar studies. In our study, there was no statistically significant difference in agreement between surgeons and residents. We believe the reason is that, despite the difference in experience between the two groups, the senior residents in our study had sufficient experience with intertrochanteric femur fractures.

Jin et al.[16] showed that the AO/OTA main groups were more concordant than other classifications, but that concordance was much lower when the subgroups were included. Our study likewise showed that the agreement of the AO/OTA classification decreased when it was evaluated together with the subgroups. Klaber et al.[17] evaluated the new AO/OTA classification defined in 2018 and showed that it had better interobserver and intraobserver agreement than the classical system.

In complex patterns of intertrochanteric fractures, a better radiological evaluation may support treatment planning and more reliable fracture classification. Computed tomography (CT) and plain radiography have been compared in recent studies for different types of fractures with complicated fracture patterns, such as tibial plateau or calcaneal fractures, and CT has proven superior.[18-20] Cavaignac et al.[21] evaluated the effect of CT on the AO/OTA and Evans/Jensen classification systems and showed that it provided a clearer understanding of the fracture, but did not increase interobserver agreement on the classification systems. Another study evaluating the effects of three-dimensional CT examinations on fracture classification systems showed that three-dimensional CT succeeded in determining stability and, thus, implant options, but its results regarding agreement were similar to those of Cavaignac et al.[21,22] Our study did not include CT due to the additional cost and radiation exposure for patients.

An ideal fracture classification system should provide information on fracture stability and have a high degree of reproducibility. The common underlying question of the classification systems designed for intertrochanteric femur fractures is whether the fracture is stable. A study by van Embden et al.,[11] which compared the AO/OTA and Jensen classification systems for intertrochanteric femur fractures, showed low agreement among participants on the assessment of a trochanteric fracture as either stable or unstable. Four-part fractures, reverse oblique fractures, and fractures with medial cortical discontinuity are usually considered unstable. However, there is not enough evidence in the literature on this subject.[23,24] The literature has not reached a consensus on fracture stability: some studies have suggested that medial structural continuity is vital,[2] whereas Palm et al.[25] and Gotfried[26] reported that an intact lateral wall played a key role in the stabilization and fixation of intertrochanteric fractures. In the Tronzo classification in particular, fracture stability may be difficult to interpret: although discontinuity of the posteromedial wall indicates instability, the lesser trochanter may also be fractured in Tronzo type II, which is considered stable.[27] Confusion about the concept of a stable fracture probably explains the lower agreement for the Tronzo, OTA subgroup, and Evans/Jensen classifications in our study.

Some of the intertrochanteric classification systems are similar and intertwined. AO/OTA group A1 fractures are displaced or nondisplaced two-part trochanteric fractures, equivalent to Jensen types I and II and Tronzo types I and II. AO/OTA group A2 intertrochanteric fractures are comminuted and unstable and are equivalent to Jensen types III, IV, and V and Tronzo types III and IV. AO/OTA group A3 intertrochanteric fractures are at the level of the lesser trochanter and may be reverse, transverse, or oblique. In reverse oblique fractures, the fracture line extends from medial to lateral, from proximal to distal. This fracture group is classified as type V in the Tronzo classification, type III in the Boyd and Griffin classification, and type II in the Evans classification, and is included in other groups in Jensen's modification. Clinical studies have shown an increased risk of fixation failure for reverse oblique fractures and have recommended intramedullary fixation.[28]

The reason why current classification systems fail to capture the complexity of intertrochanteric fractures may be that they focus on well-known fracture features, such as four-part fractures, reverse oblique fractures, and disruption of medial cortical continuity, and do not consider less well-recognized features such as an intact lateral wall. Revising the current classification systems and using CT, which can provide three-dimensional imaging, instead of direct radiography could improve the assessment of fracture stability, the choice of treatment, and agreement in fracture classification.

In our study, all the physicians who evaluated the classification systems were working in the same clinic. Physicians working in the same clinic are likely to have the same approach to intertrochanteric fracture classification, as in all fracture types.[29] Additionally, our study does not focus on implant selection based on determining whether an intertrochanteric fracture is stable or unstable.

In conclusion, none of the commonly used classification systems for trochanteric fractures accurately describes intertrochanteric fractures. To improve current fracture management, existing classification systems should be revised in light of a better understanding of fracture characteristics and the biomechanical properties of fractures, as well as a definition of successful fracture reduction. The use of CT, which shows the fracture in more detail, would facilitate the understanding of the fracture. Despite these limitations, we believe that the current AO/OTA classification for intertrochanteric fractures, using the three main types, allows for a common language among treating physicians.

Table I

Intraobserver kappa values of resident group and surgeon group, and interobserver kappa value of resident-surgeon evaluations

Classification system	Resident group intraobserver κ (95% CI)	Surgeon group intraobserver κ (95% CI)	Resident-surgeon group interobserver κ (95% CI)
Boyd-Griffin classification	0.660 (0.550-0.770)	0.658 (0.550-0.770)	0.572 (0.532-0.616)
Evans-Jensen classification	0.625 (0.600-0.655)	0.484 (0.434-0.542)	0.498 (0.450-0.553)
Evans classification	0.557 (0.519-0.595)	0.456 (0.409-0.503)	0.438 (0.400-0.481)
AO/OTA-main group	0.744 (0.708-0.785)	0.741 (0.696-0.797)	0.699 (0.649-0.750)
AO/OTA-subgroup	0.516 (0.498-0.540)	0.488 (0.418-0.558)	0.444 (0.418-0.470)
Tronzo classification	0.528 (0.501-0.562)	0.529 (0.489-0.569)	0.554 (0.506-0.614)
CI: Confidence interval; AO/OTA: Arbeitsgemeinschaft für Osteosynthesefragen/Orthopaedic Trauma Association.
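
To make the interpretation of the table explicit, the following short Python sketch assigns each reported kappa value to its Landis and Koch band. The names `landis_koch` and `TABLE_I` are hypothetical; the kappa values are copied from the table, and the band labels follow the Landis and Koch scale cited in the statistical analysis (with "slight" for the lowest band and "poor" for values below zero).

```python
def landis_koch(kappa):
    """Return the Landis and Koch agreement band for a kappa value."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Kappa values from the table: (resident intraobserver,
# surgeon intraobserver, resident-surgeon interobserver).
TABLE_I = {
    "Boyd-Griffin": (0.660, 0.658, 0.572),
    "Evans-Jensen": (0.625, 0.484, 0.498),
    "Evans": (0.557, 0.456, 0.438),
    "AO/OTA main group": (0.744, 0.741, 0.699),
    "AO/OTA subgroup": (0.516, 0.488, 0.444),
    "Tronzo": (0.528, 0.529, 0.554),
}

labels = {
    name: tuple(landis_koch(k) for k in row) for name, row in TABLE_I.items()
}

for name, row in labels.items():
    print(f"{name}: {', '.join(row)}")
```

Only the AO/OTA main groups reach the substantial band in all three columns, which matches the conclusion drawn in the text.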

28 in total

1. Reliability of classification systems for intertrochanteric fractures of the proximal femur in experienced orthopaedic surgeons.

Authors: Wen-Jie Jin; Li-Yang Dai; Yi-Min Cui; Qing Zhou; Lei-Sheng Jiang; Hua Lu
Journal: Injury Date: 2005-04-07 Impact factor: 2.586

2. The treatment of trochanteric fractures of the femur.

Authors: E M Evans
Journal: J Bone Joint Surg Br Date: 1949-05

3. Unstable intertrochanteric fractures of the hip.

Authors: J H Dimon; J C Hughston
Journal: J Bone Joint Surg Am Date: 1967-04 Impact factor: 5.284

4. CT scan does not improve the reproducibility of trochanteric fracture classification: a prospective observational study of 53 cases.

Authors: E Cavaignac; M Lecoq; A Ponsot; A Moine; N Bonnevialle; P Mansat; N Sans; P Bonnevialle
Journal: Orthop Traumatol Surg Res Date: 2012-12-25 Impact factor: 2.256

5. Interobserver reliability of a CT-based fracture classification system.

6. Consistency of AO fracture classification for the distal radius.

7. Classification of trochanteric fracture of the proximal femur: a study of the reliability of current systems.

8. Letournel classification for acetabular fractures. Assessment of interobserver and intraobserver reliability.

9. What are the expectations of an editor from a scientific article?

10. Evaluation of the Inter and Intra-Observer Reliability of the AO Classification of Intertrochanteric Fractures and the Device Choice (DHS, PFNA, and DCS) of Fixations.

Authors: Mohamed Zarie; Mohamed Farah Mohamoud; Amir Reza Farhoud; Nima Bagheri; Furqan Mohammed Yaseen Khan; Mahdi Heshmatifar; Hadi Klantar
Journal: Ethiop J Health Sci Date: 2020-09