
Evaluation of the Intraobserver and Interobserver Agreements of the New AO/OTA Classification for Fractures of the Trochanteric Region and the Femoral Neck.

Thiago Sampaio Busato¹, Daniel Baldasso¹, Gladyston Roberto Matioski Filho¹, Lucas Dias Godoi¹, Marcelo Gavazzoni Morozowski¹, Juan Rodolfo Vilela Capriotti¹.

Abstract

Objective In the present study, we investigated the intra- and interobserver agreement of the new Arbeitsgemeinschaft für Osteosynthesefragen/Orthopaedic Trauma Association (AO/OTA) classification for fractures of the proximal femur. Methods One hundred hip radiographs were selected from patients with fractures of the trochanteric region or femoral neck. Four orthopedists who were fellowship-trained hip surgeons and four orthopedic residents classified the fractures according to the new AO/OTA system on two separate occasions. The kappa (k) coefficient was used to evaluate intra- and interobserver agreement at each level of the classification: type, group, subgroup, and qualifier. Results The hip surgeons obtained almost perfect intraobserver agreement for type, substantial agreement for group, and only moderate agreement for subgroup and qualifier. The residents performed worse, with substantial agreement for type, moderate for group, and reasonable for subgroup and qualifier. Interobserver agreement among the specialists also decreased progressively, from almost perfect for type to moderate for group, and was lower still for subgroup and qualifier. The residents had substantial interobserver agreement for type, moderate for group, and reasonable for the remaining levels. Conclusion The new AO/OTA classification for fractures of the trochanteric region and femoral neck showed intra- and interobserver agreement considered appropriate for type and group, with a drop at the subsequent levels (subgroup and qualifier). Nevertheless, compared with the old AO/OTA classification, agreement improved for subgroup. Sociedade Brasileira de Ortopedia e Traumatologia.
This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/).


Keywords: classification; femur neck; hip fractures

Year: 2021 PMID: 35652025 PMCID: PMC9142227 DOI: 10.1055/s-0041-1729939

Source DB: PubMed Journal: Rev Bras Ortop (Sao Paulo) ISSN: 0102-3616

Introduction

Fractures of the proximal femur cause substantial short- and medium-term morbidity and mortality in the elderly, 1 2 3 with one third of patients dying within 1 year and half becoming dependent for locomotion. 4 5 These fractures can also affect young patients who sustain high-energy trauma. 4 Their management requires a multidisciplinary team. 6 Treatment is predominantly surgical and, to define the best treatment, it is necessary, among other data, to classify the fracture. An ideal classification system should facilitate communication between physicians, standardize terminology for research, and guide treatment decisions. 6 Many classification systems for fractures of the proximal femur have been described, the best known being those of Garden, 7 Evans, 8 Boyd and Griffin, 9 Tronzo 10 and the Arbeitsgemeinschaft für Osteosynthesefragen/Orthopaedic Trauma Association (AO/OTA). 11 Compared with other commonly used classifications, the AO/OTA classification has favorable evidence of reliability in the trochanteric region 12 13 14 and in the femoral neck. 15 However, its additional subdivisions tend to decrease interobserver reliability 13 16 and require considerable practice. 16 17 A classification is validated by demonstrating several criteria: good clinical correlation, adequate agreement and accuracy, and construct validity (relevance). 18 Concerns about the terminology and complex structure of the previous AO/OTA classification 19 motivated its revision. However, we found no studies in the literature on the reliability of this new version.
Accordingly, the present study aimed to evaluate the degree of intra- and interobserver agreement at each sequential subdivision of the new AO/OTA classification for fractures of the proximal femur, in observers considered experienced (adult hip surgeons) and inexperienced (orthopedic residents).

Materials and Methods

The present retrospective study included radiographic records of patients who sustained fractures of the proximal femur between 2015 and 2019 and were treated at a referral center for orthopedic trauma. A total of 100 consecutive cases were selected. The sample size was set arbitrarily, based on previous studies 4 5 6 12 13 18 that used smaller samples (between 40 and 70 cases) to validate classifications. The project was approved in advance by the Institutional Research Ethics Committee (CAAE: 30754120.7.0000.5226). The inclusion criterion was a fracture of the proximal femur (bone 3, anatomical region 1), either of the trochanteric region (A) or of the femoral neck (B), in a skeletally mature individual. Fractures of the femoral head (which are better evaluated by computed tomography) and pathological fractures were not included. Each participant received digital radiographs in anteroposterior and lateral views for analysis; the images contained no patient or treatment information. Four orthopedists specialized in adult hip surgery and four second-year orthopedic residents classified all cases (sequentially, without interruption, and without a time limit) on 2 occasions, 4 weeks apart. At the start of the evaluations, a detailed description of the new classification and its illustrative images were made available to the evaluators so that they could learn the system. Each evaluation was performed individually; evaluators were not allowed to keep their answers or to discuss the results with one another. Interobserver reliability was determined from the first-round responses, and intraobserver reliability from a repeat evaluation 4 weeks after the first. This interval was chosen to reduce the risk of memory bias. Data were collected and stored in spreadsheets for statistical analysis.
The Cohen kappa coefficient was used to evaluate intraobserver agreement, and the Fleiss kappa coefficient to evaluate interobserver agreement. The analyses were performed with SPSS Statistics for Windows, Version 20.0 (IBM Corp., Armonk, NY, USA) and the Online kappa Calculator (www.statisticssolutions.com). Agreement was assessed at 4 levels: fracture type (A, trochanteric, or B, neck), group (1, 2, or 3), subgroup (1, 2, or 3, except for B3, which has no subgroups), and qualifier where the classification provides one, that is, only in A1.1 (n or o) and in B2.1, B2.2, or B2.3 (p, q, or r). Table 1 and Table 2 illustrate the differences between the old and new AO/OTA classifications.
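The level-by-level intraobserver comparison can be sketched in Python as follows. This is a minimal illustration, not the SPSS procedure used in the study. It assumes that each reading is stored as a code string such as `31A1.1n` or `31B3`, derives the type, group, subgroup and qualifier labels by truncating that string (using the full code as the qualifier-level label is an assumption), and computes the unweighted Cohen kappa, k = (p_o - p_e)/(1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from the marginal frequencies of each round. The function names and example readings are hypothetical.

```python
from collections import Counter

LEVELS = ("type", "group", "subgroup", "qualifier")


def branch_label(code: str, level: str) -> str:
    """Truncate a hypothetical code string such as '31A1.1n' to one classification level."""
    body = code[2:] if code.startswith("31") else code  # drop the bone/region prefix
    if level == "type":
        return body[:1]  # 'A' (trochanteric) or 'B' (neck)
    if level == "group":
        return body[:2]  # e.g. 'A2' or 'B3'
    if level == "subgroup":
        return body[:4]  # e.g. 'A2.2'; 'B3' has no subgroup and stays 'B3'
    if level == "qualifier":
        return body  # assumption: full code, qualifier included where present
    raise ValueError(f"unknown level: {level}")


def cohen_kappa(first: list[str], second: list[str]) -> float:
    """Unweighted Cohen kappa between two paired lists of categorical ratings."""
    if not first or len(first) != len(second):
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(first)
    p_o = sum(a == b for a, b in zip(first, second)) / n  # observed agreement
    c1, c2 = Counter(first), Counter(second)
    p_e = sum(c1[k] * c2[k] for k in c1) / (n * n)  # chance agreement from marginals
    if p_e == 1.0:
        return 1.0  # degenerate case: a single identical category used throughout
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical first- and second-round readings by one observer (not study data).
    round1 = ["31A1.2", "31A2.2", "31B1.3", "31B2.1q", "31A3.1"]
    round2 = ["31A1.3", "31A2.2", "31B1.3", "31B2.1r", "31A3.1"]
    for level in LEVELS:
        a = [branch_label(c, level) for c in round1]
        b = [branch_label(c, level) for c in round2]
        print(level, round(cohen_kappa(a, b), 3))
```

With these made-up readings, type and group agree perfectly (k = 1.0). The single subgroup disagreement lowers k to about 0.76, and the additional qualifier disagreement lowers it to about 0.55, which mirrors the level-by-level decline the study describes.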

Table 1

Illustrated comparison between the AO/OTA systems (Group A)

	Old AO/OTA	New AO/OTA
Group A1	Simple transtrochanteric	Simple transtrochanteric
A1.1	Simple, undisplaced	Isolated trochanteric fracture (qualifiers: n, greater trochanter; o, lesser trochanter)
A1.2	Simple, displaced	Two-part fracture
A1.3	Simple, extending distal to the calcar	Fracture with intact lateral wall (> 20.5 mm)
Group A2	Comminuted transtrochanteric	Multifragmentary transtrochanteric, incompetent lateral wall (< 20.5 mm)
A2.1	Comminuted, undisplaced	(subgroup eliminated)
A2.2	Comminuted, displaced	Fracture with 1 intermediate fragment
A2.3	Multifragmentary (> 3 fragments)	Fracture with 2 or more intermediate fragments
Group A3	Reverse transtrochanteric	Reverse transtrochanteric
A3.1	Reverse obliquity	Simple, reverse obliquity
A3.2	Transverse reverse obliquity	Simple, transverse
A3.3	Reverse obliquity with fracture of the lesser trochanter	Reverse obliquity, wedge or multifragmentary

Table 2

Illustrated comparison between the AO/OTA systems (Group B)

	Old AO/OTA	New AO/OTA
Group B1	Subcapital fracture with minimal displacement	Subcapital fracture
B1.1	Valgus impaction > 15 degrees	Valgus impaction
B1.2	Valgus impaction < 15 degrees	Undisplaced
B1.3	Not impacted	Displaced
Group B2	Transcervical fracture	Transcervical fracture
B2.1	Basicervical	Simple (Pauwels qualifiers: p, < 30°; q, 30–70°; r, > 70°)
B2.2	Adducted mid-cervical	Multifragmentary (Pauwels qualifiers: p, < 30°; q, 30–70°; r, > 70°)
B2.3	Mid-cervical with shear	With shear (Pauwels qualifiers: p, < 30°; q, 30–70°; r, > 70°)
Group B3	Displaced subcapital fracture, not impacted	Basicervical fracture
B3.1	Moderate varus displacement and lateral rotation	(no subgroups)
B3.2	Moderate vertical displacement and lateral rotation	(no subgroups)
B3.3	Significant displacement	(no subgroups)
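In the new B2 subgroups, the Pauwels qualifier is a simple threshold rule on the Pauwels angle. The hypothetical helper below encodes the cut-offs listed in Table 2. Treating exactly 30° and exactly 70° as q is an assumption, because the table gives q as 30–70°. The function also assumes that the angle has already been measured and lies between 0° and 90°.

```python
def pauwels_qualifier(angle_deg: float) -> str:
    """Return the qualifier letter for a Pauwels angle in degrees (Table 2 cut-offs)."""
    if not 0 <= angle_deg <= 90:  # assumed valid range for the measured angle
        raise ValueError("expected an angle between 0 and 90 degrees")
    if angle_deg < 30:
        return "p"
    if angle_deg <= 70:
        return "q"  # assumption: both 30 and 70 degrees fall in the 30-70 band
    return "r"


def qualified_b2_code(subgroup: str, angle_deg: float) -> str:
    """Append the Pauwels qualifier to a B2 subgroup code, e.g. 'B2.1' -> 'B2.1q'."""
    if subgroup not in {"B2.1", "B2.2", "B2.3"}:
        raise ValueError("the Pauwels qualifier applies only to B2 subgroups")
    return subgroup + pauwels_qualifier(angle_deg)
```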

Although agreement can be interpreted in different ways, 19 the classic proposal of Landis and Koch was adopted, 20 with values from 0.00 to 0.20 considered mild (slight) agreement; 0.21 to 0.40, reasonable (fair) agreement; 0.41 to 0.60, moderate agreement; 0.61 to 0.80, substantial agreement; and 0.81 to 1.00, almost perfect (or excellent) agreement. In the study sample, the mean age was 77.71 years (range, 57–98 years; standard deviation, 10.12). Women predominated, accounting for 63% of the cases, and the right side was affected slightly more often (51% of the cases).
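Applied mechanically, the Landis and Koch interpretation is a lookup on the upper limit of each band. The sketch below is an illustration, not part of the authors' analysis. It assumes that a value falling between two printed limits (for example 0.205) belongs to the higher band, and it adds the "poor" label that Landis and Koch used for negative kappa values, which the text above does not list.

```python
# Upper limits of the Landis and Koch bands, labelled with the terms used in this study.
# (Landis and Koch's own labels for the first two bands are "slight" and "fair".)
BANDS = (
    (0.20, "mild"),
    (0.40, "reasonable"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
)


def agreement_label(kappa: float) -> str:
    """Classify a kappa value into the agreement bands adopted in this study."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0.0:
        return "poor"  # Landis and Koch's label for agreement below chance
    for upper, label in BANDS:
        if kappa <= upper:
            return label
    raise ValueError("kappa must be a real number")  # reached only for NaN
```

For example, the experts' interobserver subgroup kappa of 0.41 falls in the moderate band, and their qualifier kappa of 0.40 falls in the reasonable band, matching the labels given in the Results.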

Results

Intraobserver agreement

In the repeat evaluation after 4 weeks, intraobserver agreement for type was almost perfect for the specialists (mean k 0.92) and substantial for the residents (mean k 0.77). For group, the specialists showed substantial agreement (mean k 0.68) and the residents moderate agreement (mean k 0.44). For subgroup, agreement was moderate for the specialists (mean k 0.52) and reasonable for the residents (mean k 0.28). Finally, for qualifiers, agreement was also moderate for the specialists (mean k 0.50) and reasonable for the residents (mean k 0.27). Overall, the specialists performed better than the residents. The coefficients also decrease progressively at each successive level of the classification (Figure 1). Table 3 details the intraobserver findings.

Fig. 1

Comparison between specialists and residents of the mean kappa coefficient (intraobserver).

Table 3

Intraobserver agreement (Cohen kappa coefficient)

Observer	Type	Group	Subgroup	Qualifier
Expert 1	0.972	0.705	0.607	0.607
Expert 2	0.972	0.589	0.376	0.338
Expert 3	0.894	0.747	0.599	0.600
Expert 4	0.851	0.713	0.500	0.459
Resident 1	0.828	0.468	0.298	0.300
Resident 2	0.851	0.443	0.259	0.260
Resident 3	0.806	0.640	0.421	0.391
Resident 4	0.608	0.230	0.153	0.144
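As a check, the mean coefficients quoted in the text can be recomputed from the per-observer values in Table 3. The short Python sketch below averages each column of the table; the dictionary layout is only a transcription convenience. The resulting means agree with those reported in the text to within 0.01.

```python
from statistics import mean

LEVELS = ("type", "group", "subgroup", "qualifier")

# Per-observer Cohen kappa values transcribed from Table 3.
TABLE_3 = {
    "experts": [
        (0.972, 0.705, 0.607, 0.607),
        (0.972, 0.589, 0.376, 0.338),
        (0.894, 0.747, 0.599, 0.600),
        (0.851, 0.713, 0.500, 0.459),
    ],
    "residents": [
        (0.828, 0.468, 0.298, 0.300),
        (0.851, 0.443, 0.259, 0.260),
        (0.806, 0.640, 0.421, 0.391),
        (0.608, 0.230, 0.153, 0.144),
    ],
}


def mean_by_level(rows):
    """Average each column (classification level) across the observers of one cohort."""
    return {level: mean(column) for level, column in zip(LEVELS, zip(*rows))}


if __name__ == "__main__":
    for cohort, rows in TABLE_3.items():
        means = mean_by_level(rows)
        print(cohort, {level: round(value, 3) for level, value in means.items()})
```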


Interobserver agreement

Taking the first round as the reference for interobserver evaluation (as in most studies), agreement on type was 93.67% for the specialists (k 0.87, almost perfect) and 90.17% for the residents (k 0.80, substantial). For group, agreement was 60.83% for the specialists (k 0.53, moderate) and 55.5% for the residents (k 0.47, moderate). At the subgroup level, agreement dropped to 44.5% among the specialists (k 0.41, moderate) and 42.7% among the residents (k 0.39, reasonable). Finally, for qualifiers, agreement was 42.67% for the specialists (k 0.40, reasonable) and 41.0% for the residents (k 0.39, reasonable). Table 4 details the interobserver results. For both specialists and residents, the coefficients decreased as the classification branched out; however, the decrease from subgroup to qualifier was small. In the first round, the residents' coefficients were always below those of the specialists (Figure 2), but in the second round the residents agreed with each other more than the specialists did for group, subgroup, and qualifier (Figure 3).

Table 4

Interobserver agreement (Fleiss kappa coefficient)

Level / observers	Round 1 % agreement	Round 1 kappa	Round 2 % agreement	Round 2 kappa
Type
Experts	93.67	0.87	97.00	0.94
Residents	90.17	0.80	94.50	0.89
Group
Experts	60.83	0.53	58.83	0.51
Residents	55.50	0.47	69.50	0.63
Subgroup
Experts	44.50	0.41	39.67	0.35
Residents	42.67	0.39	57.67	0.55
Qualifier
Experts	42.67	0.40	37.33	0.35
Residents	41.00	0.39	57.17	0.55
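For the interobserver analysis, the Fleiss kappa extends chance-corrected agreement to more than two raters. The Python sketch below is a standard textbook implementation, not the calculator used in the study. It assumes that `ratings[i]` holds the labels that the observers of one cohort gave to case i. It also returns the mean pairwise agreement P̄; treating this as the quantity in the "% agreement" column of Table 4 is an assumption, because the text does not define that column.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[str]]) -> tuple[float, float]:
    """Return (mean observed agreement, Fleiss kappa) for cases rated by the same number of raters."""
    if not ratings:
        raise ValueError("no cases")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("every case needs the same number (>= 2) of ratings")
    n_cases = len(ratings)
    totals = Counter()  # how often each category was used over all cases
    p_bar = 0.0
    for case in ratings:
        counts = Counter(case)
        totals.update(counts)
        # Proportion of agreeing rater pairs for this case.
        p_bar += (sum(c * c for c in counts.values()) - n_raters) / (n_raters * (n_raters - 1))
    p_bar /= n_cases
    # Chance agreement from the overall category proportions.
    p_e = sum((t / (n_cases * n_raters)) ** 2 for t in totals.values())
    if p_e == 1.0:
        return p_bar, 1.0  # degenerate case: every rating is the same category
    return p_bar, (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical group-level readings of three cases by four observers (not study data).
    cases = [
        ["A1", "A1", "A1", "A1"],
        ["A2", "A2", "A2", "A3"],
        ["B1", "B1", "B2", "B2"],
    ]
    agreement, kappa = fleiss_kappa(cases)
    print(f"{agreement:.2%} agreement, kappa {kappa:.2f}")
```

Because P̄ is averaged over rater pairs, a case on which three of four observers agree contributes 0.5 rather than 0.75. This is one reason why percentage agreement and kappa should be read together.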

Fig. 2

Interobserver agreement in the first evaluation.

Fig. 3

Interobserver agreement in the second evaluation.


Discussion

In the previous AO/OTA classification, some fracture patterns occurred so rarely that they did not need a code of their own. Terminology was a source of confusion because of the wide variety of terms for similar fractures, and group A2 fractures were difficult to define. In the new classification, the definitions and codes were updated and simplified: the neck fractures were reorganized, and the Pauwels qualifier was added to better define instability, especially in high-energy fractures. 19 A fracture classification system should show adequate agreement for the same observer on different occasions (intraobserver) and between different observers on the same occasion (interobserver). 6 17 18 The kappa (k) coefficient is one of the most widely used methods for evaluating the reliability of a classification system; it corrects the observed agreement for agreement expected by chance. 3 In the current study, the specialists' intraobserver agreement was almost perfect for type, substantial for group, and moderate for subgroup and qualifier, while the residents performed worse at every level. Comparing interobserver agreement across the two rounds was an interesting aspect of this research: the residents' interobserver agreement increased between rounds, which may indicate that they were learning the new classification. Studies of the previous AO/OTA system obtained results similar to ours, with small variations. Pervez et al. 3 obtained a mean k among their observers of 0.62 for group, higher than in our study (k 0.53, moderate), and of 0.33 for subgroups, lower than we found (k 0.41, moderate). Urrutia et al. 6 obtained moderate agreement among their 9 evaluators for groups, as in our results, and only reasonable agreement for subgroups, lower than in our study (k 0.41, moderate). Mattos et al. 4 also obtained similar results with the AO/OTA and Tronzo classifications.
Schwartsmann et al., 14 in a study that also involved orthopedic surgeons and residents, obtained moderate agreement (0.60) for group, similar to the present study, and reasonable agreement (0.34) for subgroups, lower than in the present study (k 0.41). Another study, 15 with 100 femoral neck fractures, found only reasonable interobserver agreement for the Garden 7 classification, which increased to moderate when the criterion for fracture displacement was simplified. This indicates that, for certain fractures, even a fairly simple classification may achieve only moderate agreement. In summary, compared with the old classification, the present study obtained agreement similar to that in the literature for type and group and better agreement for subgroup (qualifiers did not exist in the old classification). This suggests that the new system succeeded in improving agreement for the subgroups, which were the most extensively modified. Regarding examiner experience with the AO/OTA classification, Crijns et al. 16 found no difference among 65 surgeons divided into more and less experienced groups according to time in practice (more or less than 17 years), proportion of work time dedicated to trauma (more or less than 80%), and fractures treated per year (more or less than 50). Along similar lines, in our study the residents matched the specialists in the second evaluation, indicating a fast learning curve for this system. Fung et al. 21 also noted that more experienced residents, in the final part of their training, classified fractures better than less experienced ones, indicating that they had learned the old classification. Regarding the new AO/OTA system specifically, it is worth noting the elimination of subgroup A2.1, which may help distinguish stable patterns (group A1) from unstable ones (groups A2 and A3).
Studies of the previous system 11 that tried to determine to what extent a trochanteric fracture was stable had somewhat conflicting results. Radaideh et al., 22 in a study on the use of cephalomedullary nails, defined groups A2 and A3 as unstable, as did Zhang et al. 23 However, Knobe et al. 24 note that although groups A2 and A3 are generally considered unstable in the literature, in a direct survey a fracture of the lesser trochanter was the main criterion of instability for 82% of surgeons, among other factors considered (fracture of the greater trochanter, lateral wall fracture, and reverse obliquity). Another study 25 considered subgroups A2.1, A2.2, and A3.3 unstable. The current classification uses the integrity of the lateral wall (thickness greater than 20.5 mm) to separate groups A1 and A2. The rationale for this division was first described by Gottfried 26 and Palm et al., 27 who identified the lateral wall as an important structure for implant support. Later, Hsu et al. 28 evaluated the lateral wall thickness required for safe fixation with a sliding hip screw, which motivated the current change in the classification. Other studies 29 reviewed the subject and described strategies for lateral wall reconstruction even with intramedullary fixation. Based on this literature review and on the difficulty of classifying fractures into subgroups, we find it useful to regard group A2 and higher as unstable fractures that require an accurate reduction technique and intramedullary fixation. Our study highlights the difficulties of classification systems for fractures of the proximal femur. Despite these difficulties, this system showed advantages over its predecessor 11 by simplifying the identification of unstable trochanteric fractures at the group level (A2 and A3), which may facilitate the choice of implant and reduction technique.
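The group-level rule discussed above can be written as a small decision function. This is a deliberately simplified, hypothetical sketch for illustration only. It assumes that the lateral wall thickness has already been measured in millimetres and that reverse obliquity has already been identified. It ignores the fragment-count criteria that define the subgroups, and it assigns a thickness of exactly 20.5 mm to A2, because Table 1 gives only > 20.5 mm for an intact wall and < 20.5 mm for an incompetent one.

```python
from dataclasses import dataclass

LATERAL_WALL_THRESHOLD_MM = 20.5  # A1/A2 cut-off in the new classification


@dataclass
class TrochantericFindings:
    """Hypothetical radiographic findings for a trochanteric (type A) fracture."""
    lateral_wall_mm: float
    reverse_obliquity: bool


def trochanteric_group(f: TrochantericFindings) -> str:
    """Assign group A1, A2 or A3 (simplified: fragment counts are ignored)."""
    if f.reverse_obliquity:
        return "A3"
    if f.lateral_wall_mm > LATERAL_WALL_THRESHOLD_MM:
        return "A1"  # competent lateral wall
    return "A2"  # incompetent lateral wall; exactly 20.5 mm treated as A2 (assumption)


def treated_as_unstable(group: str) -> bool:
    """Groups A2 and A3 regarded as unstable, as proposed in this discussion."""
    return group in {"A2", "A3"}
```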
For neck fractures (type B), the new subdivision is simpler than the previous complex subgroups (Table 2) and also incorporates the Pauwels qualifier. Additionally, the literature indicates higher reliability for the AO/OTA classification than for other widespread systems (Garden, Evans, Boyd, Tronzo). In the present study, we sought a consistent methodology for evaluating a fracture classification. Its strengths are the size and representativeness of the sample (larger than in previous studies, with every pattern of the new classification identified by at least one evaluator) and an adequate number of observers for a more reliable kappa coefficient. 20 The methodology for assessing the reliability of orthopedic classifications was examined by Audigé et al., 18 and the present study meets all of the quality criteria they described. In addition, we found no study evaluating this new system in the PubMed, MEDLINE, or SciELO databases, so the present study provides new and relevant data on a classification that is very popular among orthopedic surgeons. Another notable point was the improvement in interobserver agreement among the residents between evaluations, indicating that they were learning the system. A weakness of this research was that the hip specialists were compared with only four second-year residents; including first- and third-year residents might have demonstrated the learning process more clearly. Another possible limitation is the relatively high mean age of the sample, which reflects the typical population profile of the hospital where the study was performed. Although this carries a potential risk of under-representing some specific high-energy fracture patterns, which are more common in younger patients, all the patterns of the new AO/OTA classification were identified at some point during the study.
Deliberately adding younger patients to the sample would have introduced selection bias. Moreover, studies of this kind have limitations inherent to their design, such as potential memory bias, which we consider low given the large number of cases, the complexity of the classification, and the time elapsed between evaluations. The new AO/OTA system has moderate interobserver and substantial intraobserver reliability for experienced evaluators, and residents were able to reach the same levels of agreement after a short learning period. Further studies are needed to assess its usefulness for treatment indication (especially the type of fixation) and for prognosis.

Conclusion

The new AO/OTA classification for fractures of the trochanteric region and femoral neck showed appropriate intra- and interobserver agreement for type and group, with a decline at the subsequent levels, that is, subgroup and qualifier. Nevertheless, compared with the old AO/OTA classification, agreement improved for subgroups.

Introdução

As fraturas da extremidade proximal do fêmur causam grande morbidade e mortalidade a curto e médio prazo nos idosos, 1 2 3 sendo que um terço dos pacientes evolui para óbito em um ano, e metade se torna dependente para locomoção. 4 5 Estas fraturas também podem acometer pacientes jovens vitimados por trauma de alta energia. 4 O tratamento destas lesões requer a atuação de uma equipe multidisciplinar. 6 Estas lesões têm tratamento eminentemente cirúrgico e, para que se defina o melhor tratamento, entre outros dados, é necessário que se classifique a fratura. Um sistema ideal de classificação deve permitir a comunicação entre médicos, padronizar a terminologia para pesquisa e guiar a decisão do tratamento. 6 Muitas tentativas de se criar um sistema de classificação de fraturas do fêmur proximal foram descritas, sendo as classificações de Garden, 7 Evans, 8 Boyd e Griffin, 9 Tronzo 10 e do Arbeitsgemeinschaft für Osteosynthesefragen/Orthopaedic Trauma Association (AO/OTA), 11 as mais conhecidas. A classificação AO/OTA tem evidência favorável de confiabilidade na região trocantérica 12 13 14 e no colo do fêmur 15 quando comparada as outras classificações comumente utilizadas. Porém, suas ramificações adicionais tendem a diminuir a confiança interobservadores 13 16 e requerem grande prática. 16 17 A validação de uma classificação ocorre com a demonstração de alguns critérios: boa correlação clínica, adequação em termos de concordância e acurácia e também validação construtiva (relevância). 18 Preocupações com a terminologia e linha de fluxo complexa da classificação AO/OTA prévia 19 motivaram a modernização da classificação. Entretanto, não localizamos estudos na literatura sobre a confiabilidade desta nova versão. 
Nesse sentido, este estudo teve como objetivos avaliar o grau de concordância intra e interobservadores em cada subdivisão sequencial da nova classificação AO/OTA para fraturas da extremidade proximal do fêmur, em observadores considerados experientes (cirurgiões de quadril) e observadores inexperientes (residentes de ortopedia).

Materiais e Métodos

Este estudo retrospectivo incluiu registros radiográficos de pacientes que sofreram fraturas da extremidade proximal do fêmur entre os anos de 2015 e 2019, tratados em um centro de referência em trauma ortopédico. Foram selecionados 100 casos consecutivos para esta avaliação. O tamanho da amostra foi arbitrado com base em estudos prévios 4 5 6 12 13 18 que utilizaram amostras menores para validação de classificações (entre 40–70 casos). O projeto foi previamente submetido e aprovado pela Comissão de Ética em Pesquisa da Instituição (CAAE: 30754120.7.0000.5226). Os critérios de inclusão foram: presença de fratura da extremidade proximal do fêmur (osso 3, região anatômica 1), da região trocantérica (grupo A) ou do colo do fêmur (grupo B), em indivíduos esqueleticamente maduros. Fraturas da cabeça do fêmur (que são melhor avaliadas por tomografia axial computadorizada) e fraturas patológicas não foram incluídas neste estudo. Cada participante recebeu as imagens digitalizadas de radiografia nas incidências em anteroposterior e perfil para análise. Não havia nenhuma informação do paciente ou do tratamento nas imagens utilizadas para classificação. Quatro ortopedistas especializados em cirurgia do quadril e 4 médicos residentes do 2° ano de ortopedia e traumatologia classificaram todos os casos (de maneira sequencial e ininterrupta, sem limitação de tempo) em 2 momentos distintos, com intervalo de 4 semanas. No início das avaliações, a descrição detalhada da nova classificação e suas imagens ilustrativas foram disponibilizadas aos avaliadores para aprendizagem do sistema. Cada avaliação foi realizada individualmente e não foi permitida a guarda das respostas ou a discussão dos resultados entre os mesmos. A confiabilidade interobservador foi determinada através da primeira resposta entre os avaliadores e a intra observador através de nova avaliação quatro semanas após a inicial. Este intervalo foi utilizado para se reduzir o risco de viés de memória. 
Os dados foram coletados e armazenados em planilhas para análise estatística. Para avaliar a concordância intraobservador foi calculado o coeficiente kappa de Cohen e para avaliar a concordância interobservadores foi calculado o coeficiente kappa de Fleiss. Para as análises foram usados o software SPSS Statistics for Windows, Version 20.0 (IBM Corp., Armonk, NY, EUA) e a Online kappa Calculator www.statisticssolutions.com . A avaliação de concordância incluiu 4 etapas: tipo de fratura (A- trocantérica ou B- colo), grupo (1, 2 ou 3), subgrupos (1, 2 ou 3–exceto no tipo B3, que não possui subgrupos) e também seus qualificadores quando disponíveis na classificação, ou seja, apenas nas A.1.1 (N ou O) ou B2 1, 2 ou 3 (P, Q ou R). A Tabela 1 e a Tabela 2 ilustram as diferenças entre as classificações AO/OTA antiga e a nova.

Tabela 1

Comparação ilustrada entre os sistemas AO/OTA (Grupo A)

	AO/OTA Antiga	AO/OTA Nova
Grupo A1	Transtrocantérica simples	Transtrocantérica simples
A1.1	Simples sem desvio	Fratura isolada de trocânter*Qualificadores:n: trocânter maioro: trocânter menor
A1.2	Simples desviada	Fratura em duas partes
A1.3	Simples com traço distal ao calcar	Fratura com parede lateral intacta (>20,5 mm)
Grupo A2	Transtrocantérica cominuta	Transtrocantérica multifragmentária, parede lateral incompetente (<20,5 mm)
A2.1	Cominuta sem desvio
A2.2	Cominuta desviada	Fratura com 1 fragmento intermediário
A2.3	Multifragmentada (> 3 fragmentos)	Fratura com 2 ou mais fragmentos intermediários
Grupo A3	Transtrocantérica Reversa	Transtrocantérica Reversa
A3.1	Traço reverso oblíquo	Traço reverso simples e oblíquo
A3.2	Traço reverso transverso	Traço reverso simples e transverso
A3.3	Traço reverso com fratura do pequeno trocanter	Traço reverso com cunha oumultifragmentária

Tabela 2

Comparação ilustrada entre os sistemas AO/OTA (Grupo B)

	AO/OTA Antiga	AO/OTA Nova
Grupo B1	Fratura subcapital com desvio mínimo	Fratura subcapital
B1.1	Impactada em valgo > 15 graus	Impactada em valgo
B1.2	Impactada em valgo < 15 graus	Sem desvio
B1.3	Não impactada	Desviada
Grupo B2	Fratura transcervical	Fratura transcervical
B2.1	Basocervical	Simples Qualificadores: p <30° q = 30-70° r >70°
B2.2	Médio cervical em adução	Multifragmentada Qualificadores: p <30° q = 30-70° r >70°
B2.3	Médio cervical com cisalhamento	Com cisalhamento Qualificadores: p <30° q = 30-70° r >70°
Grupo B3	Fratura subcapital desviada, não impactada	Fratura Basocervical
B3.1	Desvio moderado em varo e rotação lateral
B3.2	Desvio moderado vertical e rotação lateral
B3.3	Desvio significativo

Embora o grau de concordância tenha formas distintas de interpretação, 19 adotamos a proposta clássica de Landis e Koch, 20 com valores entre 0.00 e 0.20 considerados como concordância leve; 0.21 e 0.40 concordância razoável; 0.41 e 0.60 concordância moderada; 0.61 e 0.80 concordância substancial, e 0.81 e 1.00 concordância quase perfeita (ou excelente). Na amostra populacional estudada, a média de idade foi de 77,71 anos (variando de 57 a 98 anos, desvio padrão de 10,12). O sexo feminino foi predominante, com 63% dos casos, e o lado direito teve um caso a mais de fratura (51%).

Concordância intraobservador

Na avaliação repetida com intervalo de 4 semanas, a concordância intraobservador de tipo foi quase perfeita para os especialistas com k médio de 0.92, enquanto que a dos residentes foi substancial ( k médio 0.77). Já na classificação de grupo , os especialistas apresentam uma concordância substancial ( k médio 0.68) e os residentes moderada ( k médio 0.44). Para subgrupo , a concordância dos especialistas foi moderada ( k médio 0.52), e a dos residentes foi razoável ( k médio 0.28). Por fim, em relação aos qualificadores , a concordância dos especialistas também foi moderada ( k médio 0.50) e a dos residentes foi razoável ( k médio 0.27). De modo geral, os especialistas apresentam desempenho melhor que os residentes. Observa-se ainda ( Figura 1 ) que os coeficientes vão decrescendo à medida que se seguem as ramificações da classificação. A Tabela 3 descreve detalhadamente os achados intraobservadores.

Fig. 1

Comparativo entre especialistas e residentes do coeficiente médio kappa (intraobservador).

Tabela 3

Coeficiente kappa de concordância intraobservador

	kappa (Cohen)
Especialista	Tipo	Grupo	Subgrupo	Qualificador
1	0,972	0,705	0,607	0,607
2	0,972	0,589	0,376	0,338
3	0,894	0,747	0,599	0,600
4	0,851	0,713	0,500	0,459
Residente
1	0,828	0,468	0,298	0,300
2	0,851	0,443	0,259	0,260
3	0,806	0,640	0,421	0,391
4	0,608	0,230	0,153	0,144

Comparativo entre especialistas e residentes do coeficiente médio kappa (intraobservador).

Concordância interobservador

Considerando a primeira rodada como padrão (usado na maioria dos estudos) para a avaliação interobservador, temos uma concordância no tipo de 93,67% para os especialistas ( k 0.87, quase perfeita) e 90,17% para os residentes ( k 0.80, substancial). No grupo , a concordância foi de 60,83% para os especialistas ( k 0.53, moderada) 55,5% para os residentes ( k 0.47, moderada). Avançando para subgrupo a concordância caiu para 44,5% entre os especialistas ( k 0.41, moderada) e 42,7% para os residentes ( k 0.39, razoável). Finalmente, nos qualificadores , a concordância foi 42,67% para os especialistas ( k 0.40, razoável) e 41,0% para os residentes ( k 0.39, razoável). A Tabela 4 detalha os resultados interobservadores. Tanto os especialistas quanto os residentes tiveram decréscimo dos coeficientes à medida que a classificação se ramifica. No entanto, na transição subgrupo-qualificador , o decréscimo da concordância foi pouco significativo. Na primeira rodada, os residentes alcançaram coeficientes sempre abaixo dos coeficientes dos especialistas ( Figura 2 ), mas na segunda rodada, os residentes apresentam uma concordância maior entre si do que os especialistas ( Figura 3 ).

Table 4

Interobserver agreement (specialists and residents): Fleiss' kappa coefficients

Level	Observers	Round 1 % agreement	Round 1 kappa	Round 2 % agreement	Round 2 kappa
Type	Specialists	93.67	0.87	97.00	0.94
Type	Residents	90.17	0.80	94.50	0.89
Group	Specialists	60.83	0.53	58.83	0.51
Group	Residents	55.50	0.47	69.50	0.63
Subgroup	Specialists	44.50	0.41	39.67	0.35
Subgroup	Residents	42.67	0.39	57.67	0.55
Qualifier	Specialists	42.67	0.40	37.33	0.35
Qualifier	Residents	41.00	0.39	57.17	0.55
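Table 4 reports Fleiss' kappa, which extends chance-corrected agreement to more than two raters. The sketch below follows the standard Fleiss formulas. The input layout (one row per radiograph, one column per category, each cell counting raters) and the toy data are assumptions for illustration, not the study's data. The "% agreement" column in Table 4 is consistent with the observed agreement P_bar computed here: for four raters and 100 cases, P_bar must be a multiple of 1/6 of a percent, and every tabulated value is. The section does not confirm this, however.

```python
from typing import Sequence, Tuple


def observed_and_chance_agreement(
    counts: Sequence[Sequence[int]],
) -> Tuple[float, float]:
    """Return (P_bar, P_e) for Fleiss' kappa.

    counts[i][j] is the number of raters who placed case i in category j;
    every case must be rated by the same number of raters (at least two).
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each case needs the same number (>= 2) of raters")

    # P_i: proportion of agreeing rater pairs for case i; P_bar is their mean.
    pair_agreement = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(pair_agreement) / n_cases

    # P_e: chance agreement from the pooled share of each category.
    totals = [sum(column) for column in zip(*counts)]
    p_e = sum((t / (n_cases * n_raters)) ** 2 for t in totals)
    return p_bar, p_e


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa: (P_bar - P_e) / (1 - P_e)."""
    p_bar, p_e = observed_and_chance_agreement(counts)
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating is one category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy data: 4 raters assign 5 radiographs to type 31A or 31B.
toy = [[4, 0], [4, 0], [3, 1], [0, 4], [1, 3]]
p_bar, p_e = observed_and_chance_agreement(toy)
print(f"pair agreement {p_bar:.0%}, chance {p_e:.2f}, kappa {fleiss_kappa(toy):.2f}")
# pair agreement 80%, chance 0.52, kappa 0.58
```

The toy case mirrors the pattern in Table 4: 80% of rater pairs agree, but with only two roughly balanced categories about half of that is expected by chance, so kappa drops to about 0.58.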

Fig. 2

Interobserver agreement in the first round.

Fig. 3

Interobserver agreement in the second round.


Discussion

In the previous AO/OTA classification, some fracture patterns occurred so rarely that they did not need a code of their own. Terminology was a source of confusion because many different terms were used for similar fractures, and group A2 fractures were difficult to define. In the new classification, definitions and codes were updated and simplified, femoral neck fractures were reorganized, and the Pauwels qualifier was added to better define instability, especially in high-energy fractures. 19 A fracture classification system must show adequate agreement both for the same observer on different occasions (intraobserver) and between different observers on the same occasion (interobserver). 6 17 18 The kappa coefficient (k) is one of the most widely used measures of the reliability of a classification system, because its calculation corrects for agreement expected by chance. 3 In this study, intraobserver agreement among specialists was almost perfect for type, substantial for group, and moderate for subgroup and qualifier, whereas residents performed worse at every level. Comparing interobserver agreement across the two rounds was an interesting aspect of this study: residents' interobserver agreement increased between rounds, which may indicate that the new classification is easy to learn.
Studies of the previous AO/OTA system obtained results similar to ours, with small variations. Pervez et al. 3 obtained a mean k among their observers of 0.62 for group, higher than in our study (k 0.53, moderate agreement), and 0.33 for subgroups, lower than in our study (k 0.41, moderate agreement). Urrutia et al. 6 found moderate agreement among their 9 raters for groups, as we did, but only fair agreement for subgroups, lower than in our study (k 0.41, moderate agreement). Mattos et al. 4 also obtained similar results with the AO/OTA and Tronzo classifications. Schwartsmann et al., 14 in a study that also included orthopedists and residents, found moderate agreement (0.60) for group, similar to the present study, and fair agreement (0.34) for subgroups, lower than in the present study (k 0.41). Another study, 15 of 100 femoral neck fractures, rated the interobserver agreement of the Garden classification 7 as only fair; agreement rose to moderate when the criterion was simplified to displaced versus nondisplaced fractures. This indicates that, for some fractures, even a very simple classification may achieve only moderate agreement. In summary, compared with the previous classification, the present study found agreement similar to that reported in the literature for type and group and better agreement for subgroup; qualifiers were not available in the previous classification. This suggests that the new system succeeded in improving agreement for subgroups, which were the most extensively modified level.
Regarding examiner experience with the AO/OTA classification, Crijns et al. 16 found no difference among 65 surgeons divided into more and less experienced groups according to years in practice (more or less than 17 years), share of work devoted to trauma (more or less than 80%), and fractures treated per year (more or less than 50). By analogy, in our study the residents matched or exceeded the specialists' interobserver agreement in the second round, suggesting a short learning curve for this system. Fung et al. 21 also noted that more experienced residents, in the final part of their training, classified fractures better than less experienced ones, reflecting learning of the previous classification.
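For intraobserver agreement, Table 3 reports Cohen's kappa, which compares one observer's two classifications of the same radiographs. Below is a minimal Python sketch of the standard formula k = (P_o - P_e)/(1 - P_e), where P_e sums, over labels, the product of each label's marginal share in the two rounds. The labels are hypothetical AO/OTA group codes, not study data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(first: Sequence[Hashable], second: Sequence[Hashable]) -> float:
    """Cohen's kappa between two classifications of the same cases.

    For intraobserver agreement, `first` and `second` hold one observer's
    labels for the same radiographs on two occasions, in the same order.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(first)
    # Observed agreement: share of cases labelled identically both times.
    p_o = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: sum over labels of the product of marginal shares.
    c1, c2 = Counter(first), Counter(second)
    p_e = sum(c1[label] * c2[label] for label in c1.keys() | c2.keys()) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical group codes for six radiographs, classified twice.
round_1 = ["31A1", "31A1", "31A2", "31B1", "31B2", "31A3"]
round_2 = ["31A1", "31A2", "31A2", "31B1", "31B2", "31A3"]
print(round(cohen_kappa(round_1, round_2), 2))  # 0.79 (exactly 23/29)
```

Five of the six cases agree (83%), yet k is about 0.79 because some of that agreement is expected by chance. The unweighted `sklearn.metrics.cohen_kappa_score` in scikit-learn should give the same value.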
Regarding the new AO/OTA system specifically, it is worth noting that subgroup A2.1 was eliminated, which may help distinguish stable patterns (group A1) from unstable ones (groups A2 and A3). Studies of the previous system 11 that tried to determine when a trochanteric fracture is stable produced somewhat conflicting results. Radaideh et al., 22 in a study of cephalomedullary nails, defined groups A2 and A3 as unstable, as did Zhang et al. 23 Knobe et al. 24 noted that groups A2 and A3 are generally considered unstable in the literature. In a direct survey, however, lesser trochanter fracture was the main instability criterion for 82% of surgeons, among other factors considered (greater trochanter fracture, lateral wall fracture, and reverse obliquity). Another study 25 considered subgroups A2.1, A2.2, and A3.3 unstable. The current classification uses the integrity of the lateral wall (thickness greater than 20.5 mm) as the criterion separating groups A1 and A2. The rationale for this division was first described by Gottfried 26 and Palm et al., 27 who identified the lateral wall as an important structure for implant support. Later, Hsu et al. 28 determined the lateral wall thickness required for safe fixation with a sliding hip screw, which motivated the current change in the classification. Other studies 29 reviewed the subject and proposed strategies for reconstructing the lateral wall even with intramedullary fixation. Based on this literature review and on the difficulty of classifying fractures into subgroups, we consider the group division from A2 onward (inclusive) a useful criterion for regarding a fracture as unstable and requiring careful reduction and intramedullary fixation. Our study highlights the difficulties of classification systems for proximal femoral fractures.
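The grouping rule discussed above can be written as two small functions. This hypothetical sketch encodes only what the text states: a lateral wall thicker than 20.5 mm separates A1 from A2, and groups A2 and A3 are treated as unstable. The function names are illustrative, and the sketch does not implement the rest of the AO/OTA classification. A3 fractures, for instance, are not assigned by wall thickness and are not modeled.

```python
# Hypothetical helpers encoding the rules stated in the discussion; not a
# full implementation of the AO/OTA classification.
LATERAL_WALL_CUTOFF_MM = 20.5  # A1/A2 cutoff quoted in the text


def pertrochanteric_group(lateral_wall_mm: float) -> str:
    """Intact lateral wall (thicker than 20.5 mm) -> A1, otherwise A2.

    Only for fractures already known to fall in group A1 or A2; A3 is not
    modeled here.
    """
    if lateral_wall_mm < 0:
        raise ValueError("lateral wall thickness cannot be negative")
    return "A1" if lateral_wall_mm > LATERAL_WALL_CUTOFF_MM else "A2"


def treated_as_unstable(group: str) -> bool:
    """Criterion proposed in the discussion: A2 and A3 count as unstable."""
    return group in {"A2", "A3"}


for wall_mm in (25.0, 20.5, 15.0):
    group = pertrochanteric_group(wall_mm)
    print(wall_mm, group, treated_as_unstable(group))
# 25.0 A1 False
# 20.5 A2 True
# 15.0 A2 True
```

Because the text defines an intact wall as thicker than 20.5 mm, a wall of exactly 20.5 mm falls into A2 here.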
Despite these difficulties, this system showed advantages over its predecessor 11 by drawing the line for unstable trochanteric fractures at the group level (A2 and A3), which may facilitate the choice of implant and reduction technique. For femoral neck fractures (type B), the new subdivision is simpler than the previous complex subgroups (Table 1) and also includes the Pauwels qualifier. Additionally, when our results are compared with the literature, the new AO/OTA classification shows higher reliability than other widely used systems (Garden, Evans, Boyd, Tronzo). In the present study, we sought a consistent methodology for evaluating a fracture classification. Its strengths include the size and representativeness of the sample (larger than in previous studies, with every pattern of the new classification identified by at least one rater) and an adequate number of observers for a more reliable kappa coefficient. 20 The methodology of reliability studies of orthopedic classifications was examined by Audigé et al., 18 and the present study meets all of the quality criteria they describe. Moreover, we found no study evaluating this new system in the PubMed, Medline, or SciELO databases, so these data are new and relevant for a classification that is very popular among orthopedic surgeons. Another interesting point was the improvement in interobserver agreement among residents between the two rounds, indicating that they learned the system. A weakness of this study was that the hip specialists were compared with only four second-year residents; also including first- and third-year residents might have demonstrated the learning process more clearly.
Another possible limitation is the relatively high mean age of the sample, which reflects the typical profile of the population at the hospital where the study was conducted. This carries a potential risk of underrepresenting some specific high-energy fracture patterns, which are more common in younger patients; nevertheless, every pattern of the new AO/OTA classification was identified at some point during the study. Deliberately recruiting younger patients into the sample would have introduced selection bias. In addition, studies of this kind have limitations inherent to their design, such as potential recall bias, which we consider low given the large number of cases, the complexity of the classification, and the time elapsed between assessments. The new AO/OTA system shows moderate interobserver and substantial intraobserver reliability for experienced raters. Residents can reach the same levels of agreement after a short learning period. Further studies are needed to assess its value for guiding treatment (especially the choice of fixation) and for prognosis.

Conclusion

The new AO/OTA classification for fractures of the trochanteric region and femoral neck showed adequate intra- and interobserver agreement for type and group, with a decline in the subsequent branches, namely subgroup and qualifier. Nevertheless, compared with the previous AO/OTA classification, agreement for subgroups improved.

References (24 in total)

1. Reliability of classification systems for intertrochanteric fractures of the proximal femur in experienced orthopaedic surgeons.

Authors: Wen-Jie Jin; Li-Yang Dai; Yi-Min Cui; Qing Zhou; Lei-Sheng Jiang; Hua Lu
Journal: Injury. Date: 2005-04-07

2. The treatment of trochanteric fractures of the femur.

Authors: E M Evans
Journal: J Bone Joint Surg Br. Date: 1949-05

3. The reliability of a simplified Garden classification for intracapsular hip fractures.

Authors: D Van Embden; S J Rhemrev; F Genelin; S A G Meylaerts; G R Roukema
Journal: Orthop Traumatol Surg Res. Date: 2012-05-03

4. What makes an intertrochanteric fracture unstable in 2015? Does the lateral wall play a role in the decision matrix?

Authors: Akhil A Tawari; Harish Kempegowda; Michael Suk; Daniel S Horwitz
Journal: J Orthop Trauma. Date: 2015-04

5. Lateral femoral wall thickness. A reliable predictor of post-operative lateral wall fracture in intertrochanteric fractures.

Authors: C-E Hsu; C-M Shih; C-C Wang; K-C Huang
Journal: Bone Joint J. Date: 2013-08

6. Reliability of the classification of proximal femur fractures: Does clinical experience matter?

Authors: Tom J Crijns; Stein J Janssen; Jacob T Davis; David Ring; Hugo B Sanchez
Journal: Injury. Date: 2018-03-15

7. How reliable are reliability studies of fracture classifications? A systematic review of their methodologies.

Authors: Laurent Audigé; Mohit Bhandari; James Kellam
Journal: Acta Orthop Scand. Date: 2004-04

8. The orthogeriatrics model of care: systematic review of predictors of institutionalization and mortality in post-hip fracture patients and evidence for interventions.

Authors: Marta Martinez-Reig; Laura Ahmad; Gustavo Duque
Journal: J Am Med Dir Assoc. Date: 2012-09-01

9. Evaluation of the reproducibility of the Tronzo classification for intertrochanteric fractures of the femur.

Authors: Fernando Abdala Silva Oliveira; Ricardo Basile; Bruno Cézar Brabo Pereira; Rafael Levi Louchard Silva da Cunha
Journal: Rev Bras Ortop. Date: 2014-11-07

10. Functional and Radiological Results of Proximal Femoral Nail Antirotation (PFNA) Osteosynthesis in the Treatment of Unstable Pertrochanteric Fractures.

Authors: Ahmad M Radaideh; Hashem A Qudah; Ziad A Audat; Rami A Jahmani; Ibraheem R Yousef; Abed Allah A Saleh
Journal: J Clin Med. Date: 2018-04-12