
The reliability of the Neer classification for proximal humerus fractures: a survey of orthopedic shoulder surgeons.

Mikaël Chelli¹,², Gregory Gasbarro³, Vincent Lavoué¹, Marc-Olivier Gauci⁴, Jean-Luc Raynier¹, Christophe Trojani¹, Pascal Boileau¹.

Abstract

Background: The Neer classification is among the most widely used systems to describe proximal humerus fractures (PHF) despite its poor interobserver agreement. The purpose of this study was to verify whether blinded shoulder surgeons and trainees agree with the authors of articles published in the highest impact-factor orthopedic journals.
Methods: All articles regarding PHF published between 2017 and 2019 in the top 10 orthopedic journals as rated by impact factor were searched. Articles were included if the authors used the Neer classification to describe at least 1 PHF in the figures. Figures were extracted without the legend, and X-rays ± computed tomography scan images were included when available. An international survey was conducted among 138 shoulder surgeons who were asked to record the Neer classification for each de-identified radiograph in the publications. The type of fracture mentioned in the legend of the published figure was considered as the gold standard.
Results: Survey participants agreed with the published article authors in 55% of cases overall (range 6%-96%, n = 35). The most common response disagreed with the article authors in 13 cases (37%), underestimating the number of parts in 11 of 13 cases. The interobserver agreement between the 138 responders was fair (k = 0.296). There was an association between the percentage of concordant answers and greater experience (number of years of shoulder surgery practice) of the responders (P = .0023). The number of parts, the number or type of available imaging modalities, and the geographic origin of participants did not influence the agreement between responders and authors.
Discussion: In more than one-third of cases, specialized shoulder surgeons disagree with article authors when interpreting the Neer classification of de-identified images of PHF in published manuscripts. Morphologic classification of PHF as the sole basis for treatment algorithms and surgical success should be scrutinized.


Keywords: Interobserver agreement; Neer classification; Proximal humerus fracture; Reliability; Survey; Traumatology

Year: 2022 PMID: 35572425 PMCID: PMC9091924 DOI: 10.1016/j.jseint.2022.02.006

Source DB: PubMed Journal: JSES Int ISSN: 2666-6383

The Neer classification is among the most widely used systems to describe proximal humerus fractures (PHF) due to its numerous advantages. Based on observations made by Codman, this simple classification system provides a conceptual framework to explain the pathoanatomy of PHF. As stated by Martin and Marsh, “fracture description is important and cannot be replaced by classification”. However, the Neer classification is useful in grouping similar fracture patterns given the high variability of each unique fracture line. This can guide treatment in the acute setting, and help to predict functional outcomes or fracture sequelae. These strengths, among others, have led to the widespread use of this classification among surgeons in their clinical practice and for research purposes.

The Neer classification has been shown to be poorly reproducible, however, regardless of imaging modality. Interobserver agreement rarely exceeds 0.50 on X-rays and 0.60 on computed tomography (CT) scans. This may lead to discord among the readers and authors of a large volume of PHF literature based on Neer's system. As such, the applicability of published treatment decisions or outcomes and the ability to compare and pool study results for a given Neer fracture type may be limited.

The purpose of the present study was to evaluate the Neer classification concordance between the original authors and physician participants of the 2020 Nice Shoulder Course asked to interpret de-identified X-rays and CT scans from peer-reviewed articles. We hypothesized that course participants (i.e., specialized shoulder surgeons) would agree on the Neer classification of the PHF presented in the published literature.

Methods

Study design

An international online survey was made available to all surgeons and trainees attending the Nice Shoulder Course from July 9 to July 11, 2020. Participants were asked to describe themselves as a resident, fellow in shoulder surgery, shoulder surgeon with less than 10 years of experience, or shoulder surgeon with at least 10 years of experience.

Article selection

Inclusion criteria: (1) published and peer-reviewed article between January 1, 2017 and December 31, 2019, (2) indexed in PubMed, (3) top 10 orthopedic journals according to impact factor, (4) PHF as a primary focus of study, and (5) at least 1 Neer classified image in a skeletally mature patient. Exclusion criteria: (1) studies that also addressed humeral shaft fractures and distal humerus fractures, (2) biomechanical or finite element studies, (3) reviews of the literature, (4) surgical treatment articles that lacked preoperative imaging, and (5) annotations over the imaging that prevented blinding.

Figure extraction

X-rays and/or CT scans from the included articles were retrieved, and the associated legends or other identifying information were removed. When several imaging modalities were available for the same fracture, they were retrieved together. When several fractures were available in the same article, they were all extracted independently. The figures were presented in their original format without modifying orientation or contrast.

Survey

An online survey was designed. For each de-identified image, participants were asked to classify the fracture into one of the 4 groups of the simplified Neer classification: 1-part, 2-part, 3-part, or 4-part. There was no distinct class for fracture-dislocation. There was no pretraining for participants, and answering all of the questions was mandatory to complete the survey. The answers were not timed, but the survey was only accessible for 3 days. The online survey was hosted on the BeeMed platform (Geneva, Switzerland).

Statistical analyses

An agreement was defined as a survey responder classifying a fracture the same way as the authors of the article. Each participant was then given a score corresponding to the number of fractures for which the participant's classification agreed with that of the authors. This score was normalized to 100 and reported as a percentage. Scores were compared between groups of responders with Welch's t-test for unpaired samples or the Kruskal–Wallis test according to the number of groups. We defined the majority vote for a given fracture as the answer chosen by the largest number of responders. Interobserver agreements were calculated with Fleiss' kappa coefficients and rated according to the criteria of Landis and Koch. Statistical significance was set at P < .05. Statistical analyses were performed with EasyMedStat (version 2.5.0; www.easymedstat.com, Levallois-Perret, France).
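The scoring rules described above can be illustrated with a minimal Python sketch. It is not the software used in the study (the analyses were run in EasyMedStat); the function names and the toy data are hypothetical, and ties in the majority vote are resolved by the first answer encountered, which is an assumption because the text does not say how ties were handled.

```python
from collections import Counter


def participant_scores(responses, author_labels):
    """Return, for each participant, the percentage of fractures
    classified the same way as the article authors."""
    n_fractures = len(author_labels)
    return [
        100.0 * sum(r == a for r, a in zip(row, author_labels)) / n_fractures
        for row in responses
    ]


def majority_votes(responses):
    """Return the most frequent answer for each fracture.
    Ties go to the answer seen first (an assumption of this sketch)."""
    n_fractures = len(responses[0])
    votes = []
    for j in range(n_fractures):
        counts = Counter(row[j] for row in responses)
        votes.append(counts.most_common(1)[0][0])
    return votes


# Hypothetical toy data: 3 participants x 4 fractures, values = number of parts.
author_labels = [2, 3, 4, 4]
responses = [
    [2, 3, 4, 3],
    [2, 3, 3, 3],
    [2, 4, 4, 3],
]

scores = participant_scores(responses, author_labels)  # [75.0, 50.0, 50.0]
votes = majority_votes(responses)                      # [2, 3, 4, 3]
```

In this toy example the majority vote disagrees with the authors on the last fracture, which the authors label 4-part and the responders label 3-part, mirroring the kind of underestimation reported in the Results.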

Results

Fractures

Thirty-five distinct fracture images were extracted from 20 articles (Table I). The fractures were classified by the authors as 2-part in 6 cases, 3-part in 7 cases and 4-part in 22 cases. Five fractures were associated with a glenohumeral dislocation. The available imaging modalities were: X-rays only (n = 24), X-ray + 2D CT scan (n = 3) and X-ray + 3D CT scan (n = 8). There was only 1 available image for 18 fractures (AP view X-ray only) and at least 2 images (several X-ray views or X-ray + CT scan) for the other 17 fractures.

Table I

Characteristics of included articles.

First author	Year	Journal	Number of fractures
Grubhofer¹⁴	2017	JSES	1
Kancherla²²	2017	J Am Acad Orthop Surg	2
Padegimas³⁶	2017	JSES	2
Park³⁷	2017	JSES	1
Singh⁴³	2017	JSES	3
Trikha⁴⁸	2017	JSES	2
Boileau⁴	2018	JSES	3
Boileau³	2018	JSES	1
Chen⁹	2018	JSES	2
Chen⁸	2018	JSES	2
Chung¹⁰	2018	Acta Orthopaedica	2
Kim²³	2018	JSES	2
Singh⁴⁴	2018	JSES	1
Cai⁷	2019	JSES	1
Hudgens¹⁸	2019	JSES	2
Jorge-Mora²¹	2019	JSES	1
Klug²⁴	2019	JSES	2
Large²⁷	2019	J Am Acad Orthop Surg	1
Sears³⁹	2019	J Am Acad Orthop Surg	3
Siebenbürger⁴¹	2019	JSES	1

JSES, Journal of Shoulder and Elbow Surgery; J Am Acad Orthop Surg, Journal of the American Academy of Orthopaedic Surgeons.


Participants

All 503 course participants were invited to take the survey. One hundred and thirty-eight participants (27%) responded to the survey, including 27 residents, 15 fellows, 46 surgeons with less than 10 years of experience, and 50 surgeons (36%) with at least 10 years of experience. Participants originated from 41 countries, the most frequent being France (n = 28), Spain (n = 10), Brazil (n = 10), Chile (n = 8), Portugal (n = 8) and the United Kingdom (n = 7).

Agreement with the authors

Participants agreed with the authors in 55% of cases overall (range 6%-96% for the 35 fractures). For 4 fractures, all classified as 4-part by the authors, the agreement rate of responders was below 25% (Fig. 1). For 15 fractures, the agreement rate was below 50% (Fig. 2), and for 3 fractures, the agreement rate was over 90% (Fig. 3). There was an association between the percentage of concordant answers and surgical experience of the responder: 46%, 55%, 56%, and 58% respectively for residents, fellows, surgeons with less than 10 years of experience and surgeons with at least 10 years of experience (P = .0023). This difference was only found when comparing residents to more experienced participants and was not found between fellows, surgeons with less than 10 years of experience and surgeons with at least 10 years of experience (P = .551). Responders agreed with the journal authors for 62% of fractures classified as 2-part by authors, 58% for 3-part fractures, and 52% for 4-part fractures (P = .634). The agreement rate of responders was 74% for the 5 fracture-dislocations and 52% for the 30 non-dislocated fractures (P = .066). No difference according to the journal or geographic origin of the participant was discovered. The number of available images or inclusion of a CT scan did not increase the rate of agreement between responders and authors (Table II).

Figure 1

Fractures with the lowest agreement between responders and article authors.

Figure 2

Agreement rate between responders and article authors for the 35 included fractures.

Figure 3

Fractures with the highest agreement between responders and article authors.

Table II

Rate of agreement between participants and authors.

	Agreement (%)	P value
Overall (n = 138)	54.7
Experience		.0023∗
Residents (n = 27)	45.9
Fellows (n = 15)	54.5
Surgeons < 10 y. of experience (n = 46)	56.0
Surgeons ≥ 10 y. of experience (n = 50)	58.2
Fracture classification according to the authors		.634
2-part (n = 6)	62.0
3-part (n = 7)	57.5
4-part (n = 22)	51.8
Fracture-dislocation		.066
No (n = 30)	51.5
Yes (n = 5)	73.6
4-part fractures		.112
Nondislocated 4-part fracture (n = 18)	46.8
Dislocated 4-part fractures (n = 4)	74.1
Journal		.820
J. Shoulder and Elbow Surgery (n = 27)	53.5
J. Am. Acad. Orthop. Surgery (n = 6)	60.4
Acta Orthopaedica (n = 2)	52.5
Origin of participants		.689
Asia (n = 9)	56.8
Europe (n = 73)	53.4
Middle East (n = 9)	51.1
North Africa (n = 2)	45.8
North America (n = 9)	51.4
Oceania (n = 3)	61.0
South America (n = 30)	58.7
Imaging modality		.845
X-rays only (n = 24)	54.4
X-rays + 2D CT-scan (n = 3)	47.1
X-rays + 3D CT-scan (n = 8)	58.4
Number of available images (X-ray and/or CT-scan)		.385
1 imaging (n = 18)	51.9
2 imaging or more (n = 17)	57.8

J. Shoulder and Elbow Surgery, Journal of Shoulder and Elbow Surgery; J. Am. Acad. Orthop. Surgery, Journal of the American Academy of Orthopaedic Surgeons; CT, computed tomography.

∗P < .05.


Voting trends

The most common survey response (majority vote) for each fracture agreed with the article authors between 38% and 96% of the time (mean 66%). The most common response disagreed with the article authors in 13 cases (37%), including two 2-part fractures (33% of 2-part fractures) and eleven 4-part fractures (50% of 4-part fractures). In 11 cases, the participants underestimated the number of parts as compared to the authors: ten 4-part fractures were classified as 3-part by the most common response and one 4-part fracture was classified as 2-part. In 2 cases, the most common response classified a 2-part fracture (according to the authors) as a 3-part fracture (Fig. 4).

Figure 4

2-part fractures (according to the authors) classified as 3-part fractures by the majority of responders.

Interobserver agreement

The interobserver agreement between the 138 responders was fair (k = 0.296 [0.294-0.299], P < .0001). Residents had lower agreement than other participants, but there was no significant difference between other levels of experience (Table III).
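For readers unfamiliar with the statistic, the following is a minimal sketch of Fleiss' kappa and of the Landis and Koch verbal scale, written in plain Python. It is an illustration under stated assumptions, not the EasyMedStat implementation used in the study: the function names are hypothetical, every fracture is assumed to be rated by the same number of responders, and no confidence interval is computed.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned subject i (a fracture) to category j (a Neer class).
    Every subject must have the same total number of raters."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject must be rated by the same number of raters")
    n_categories = len(counts[0])
    total = n_subjects * n_raters

    # Proportion of all assignments that fall in each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Observed agreement for each subject, then averaged over subjects.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Verbal rating of a kappa value according to Landis and Koch."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical toy table: 2 fractures, 3 raters, 2 categories.
toy_kappa = fleiss_kappa([[3, 0], [0, 3]])  # all raters agree on both fractures
rating = landis_koch(0.296)                 # the overall value reported above
```

With this scale, the reported overall value of 0.296 falls in the 0.21-0.40 band, which corresponds to the "fair" rating used in the text.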

Table III

Interobserver agreement according to the level of experience.

Level of experience	κ [95% confidence interval]	P value∗
Residents (n = 27)	0.228 [0.217-0.240]	<.0001†
Fellow (n = 15)	0.313 [0.288-0.338]	<.0001†
Surgeons < 10 years of experience (n = 46)	0.326 [0.319-0.333]	<.0001†
Surgeons ≥ 10 years of experience (n = 50)	0.314 [0.307-0.320]	<.0001†

∗The P value indicates the probability that the κ value differs from 0.

†P < .05.


Discussion

The key finding of the present study is that specialized shoulder surgeons disagree with article authors on the classification of PHF in the peer-reviewed literature on more than 1 out of 3 fractures (37%). When considering the findings from published articles for PHF, readers should be attentive to the methods used for the classification of these fractures and analyze each available image. As such, morphologic criteria as the sole basis for treatment algorithms and surgical success should be scrutinized.

The objective of this study was to highlight the agreement or disagreement between readers and authors of published articles. It was not aimed at assessing interobserver reliability, which has been featured in prior work. For instance, the frequently cited PROFHER randomized clinical trial showed only fair interobserver agreement of the Neer classification (k = 0.29). Practice experience in shoulder surgery does, however, play a role in interobserver agreement. The current study corroborates the findings of previous articles where more clinical experience was related to higher interobserver agreement. This may be due to increased familiarity with the Neer classification, time spent reading and analyzing PHF imaging, and the eventual convergence of opinion between mid- and late-career readers and authors.

The number and type of available imaging did not play a significant role in the present study. This was likely the result of study design as many X-rays lacked orthogonal views, and only representative, static CT images were available for review. Imaging modality has been shown to variably influence interobserver agreement, however (Table IV). Iordens et al. found better interobserver agreement with CT scans as compared to X-rays, but no difference between 2D and 3D CT scans. Foroohar et al. found similar results but only in the subgroups of upper-limb surgeons, and both studies only observed a fair to moderate agreement (k ≤ 0.60) with 3D CT scans. Brunner et al. obtained an excellent interobserver agreement (k = 0.80) using stereo-visualization of 3D CT scans with special polarized spectacles. In contrast, several studies did not find any advantage of CT scans over X-rays.

Table IV

Review of literature: interobserver agreements according to imaging modality.

First author	Year	X-rays (κ)	2D CT scan (κ)	3D CT scan (κ)
Kristiansen²⁵	1988	0.07-0.48	-	-
Siebenrock⁴²	1993	0.40	-	-
Bernstein²	1996	0.52	0.50	-
Sjödén⁴⁵	1997	-	0.42	-
Shrader⁴⁰	2005	0.47	0.34	-
Mora Guix³³	2006	0.35	0.44	-
Brunner⁶	2009	0.48	0.58	0.80∗
Foroohar¹³	2011
Overall		0.14	0.06	0.09
Upper-limb specialists		0.03	0.23	0.32
Berkes¹	2014	0.42	0.67	0.63
Matsushigue³⁰	2014	0.37	-	0.57
Handoll¹⁵	2016	0.48	-	-
Iordens¹⁹	2016	0.29	0.51	0.51
Sumrein⁴⁶	2018	0.73	0.72	-
Torrens⁴⁷	2018	0.50	0.53	0.46

CT, computed tomography.

∗The 3D CT scan reconstructions were visualized with special spectacles for ‘real’ 3D projection.

The Neer classification does have numerous advantages. Further work is needed to better define the morphologic criteria of PHF to increase the reproducibility and generalizability of published results. First, authors should describe precisely how they measured fracture displacement: the imaging modality, the method for multiplanar or 3D reconstruction, the use of a ruler or goniometer, and the cut-off used to define a ‘part’ (45° and/or 5 mm or 10 mm). Second, computerized tools should be developed to assist clinicians in these measurements. Plain radiographs or multiplanar reconstructions often fail to accurately measure fracture displacement, which occurs in 3 dimensions.

This study has several limitations. The online survey was neither monitored nor timed. Published articles were considered as the gold standard for study design but this may not be appropriate. Furthermore, the authors of these articles presumably had access to orthogonal X-ray views and serial, multiplanar CT images, which were unavailable to survey participants and could explain discordances in the results. However, out of the 16 clinical studies included in this article, only 7 (44%) reported using systematic CT scans to classify fractures, and 9 either did not report CT scan acquisitions or limited them to complex cases. The 4 remaining studies were reviews. Last, intraobserver agreement, which has been previously described by several authors, was not studied in our article.

The strengths of this study are the large number of participants, rarely reached in prior interobserver studies, and their quality, as most had specialty training and experience in shoulder surgery.

Conclusion

In more than one-third of cases, specialized shoulder surgeons disagree with article authors regarding the Neer classification based on images of PHF provided in peer-reviewed manuscripts. Morphologic criteria of PHF as the sole basis for treatment algorithms and surgical success should be scrutinized.

Disclaimers

Funding: No funding was disclosed by the authors. Conflicts of interest: The authors, their immediate families, and any research foundation with which they are affiliated have not received any financial payments or other benefits from any commercial entity related to the subject of this article.

46 in total

1. The impact of stereo-visualisation of three-dimensional CT datasets on the inter- and intraobserver reliability of the AO/OTA and Neer classifications in the assessment of fractures of the proximal humerus.

Authors: A Brunner; P Honigmann; T Treumann; R Babst
Journal: J Bone Joint Surg Br Date: 2009-06

2. Reverse shoulder arthroplasty for acute fractures in the elderly: is it worth reattaching the tuberosities?

Authors: Pascal Boileau; Tjarco D Alta; Lauryl Decroocq; François Sirveaux; Philippe Clavert; Luc Favard; Mikaël Chelli
Journal: J Shoulder Elbow Surg Date: 2018-12-18 Impact factor: 3.019

3. Defining optimal calcar screw positioning in proximal humerus fracture fixation.

Authors: Eric M Padegimas; Benjamin Zmistowski; Cassandra Lawrence; Aaron Palmquist; Thema A Nicholson; Surena Namdari
Journal: J Shoulder Elbow Surg Date: 2017-07-05 Impact factor: 3.019

4. An important lesson in assessing neurovascular involvement in proximal humeral fractures: the presence of neuropathic pain in a dysvascular limb.

Authors: Ashok K Singh; William Rudge; Tom Quick
Journal: J Shoulder Elbow Surg Date: 2018-01 Impact factor: 3.019

5. Displaced humeral surgical neck fractures: classification and results of third-generation percutaneous intramedullary nailing.

Authors: Pascal Boileau; Thomas d'Ollonne; Charles Bessière; Adam Wilson; Philippe Clavert; Armodios M Hatzidakis; Mikael Chelli
Journal: J Shoulder Elbow Surg Date: 2018-11-12 Impact factor: 3.019

Review 6. Current classification of fractures. Rationale and utility.

Authors: J S Martin; J L Marsh
Journal: Radiol Clin North Am Date: 1997-05 Impact factor: 2.303

7. Reverse shoulder arthroplasty for the treatment of acute complex proximal humeral fractures: Influence of greater tuberosity healing on the functional outcomes.

Authors: Carlos Torrens; Eduard Alentorn-Geli; Felipe Mingo; Carlo Gamba; Fernando Santana
Journal: J Orthop Surg (Hong Kong) Date: 2018 Jan-Apr Impact factor: 1.118

8. Functional outcomes after nonoperative management of fractures of the proximal humerus.

Authors: Beate Hanson; Philipp Neidenbach; Piet de Boer; Dirk Stengel
Journal: J Shoulder Elbow Surg Date: 2009 Jul-Aug Impact factor: 3.019

9. Do computed tomography and its 3D reconstruction increase the reproducibility of classifications of fractures of the proximal extremity of the humerus?

Authors: Thaís Matsushigue; Valmir Pagliaro Franco; Rafael Pierami; Marcel Jun Sugawara Tamaoki; Nicola Archetti Netto; Marcelo Hide Matsumoto
Journal: Rev Bras Ortop Date: 2014-03-27

10. Defining the fracture population in a pragmatic multicentre randomised controlled trial: PROFHER and the Neer classification of proximal humeral fractures.

Authors: H H G Handoll; S D Brealey; L Jefferson; A Keding; A J Brooksbank; A J Johnstone; J J Candal-Couto; A Rangan
Journal: Bone Joint Res Date: 2016-10 Impact factor: 5.853