Eitan M Ingall1, Philip Kaiser2, Soheil Ashkani-Esfahani3, John Zhao1, John Y Kwon4. 1. Harvard Combined Orthopaedic Residency Program, Massachusetts General Hospital, Boston, MA, USA. 2. Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, MA, USA. 3. Foot & Ankle Research and Innovation Laboratory (FARIL), Massachusetts General Hospital, Boston, MA, USA. 4. Department of Orthopaedic Surgery, Division of Foot and Ankle Surgery, Massachusetts General Hospital, Boston, MA, USA.
Abstract
Background: The lateral fibular stress test (LFST), also known as the hook or Cotton test, is commonly performed to assess syndesmotic instability intraoperatively. Several studies have used 100 N as the force applied when performing the LFST to detect syndesmotic instability, though no evidence-based requisite force has been described for the test. We hypothesized that surgeons do not apply force uniformly or consistently when performing the LFST and that substantial variation exists. Such variation could lead to inconsistent diagnosis of syndesmotic instability.
Methods: A biomechanical ankle model consisting of an industrial force gauge attached through a SawBones model was fashioned. Orthopaedic attending surgeons and trainees were asked to perform a series of LFSTs and to simulate the force they typically apply intraoperatively. Basic demographic data were collected on each participant.
Results: Thirty-three surgeons participated in the study, including 18 trainees. The median (IQR) force applied during the LFST was 96.42 N (71.42-126.33), 87.49 N (69.19-117.40), and 99.99 N (79.91-137.49) for the pooled group, attendings, and trainees, respectively. More than half (54.5%) of all trials were less than 100 N (57.8% for attendings, 51.8% for trainees). Intraobserver correlation was excellent within the overall cohort (0.92, P < .001), trainees (0.90, P < .001), and attendings (0.94, P < .001). Interobserver reliability was fair among the overall cohort (κ = 0.28, P = .49) and poor among the attendings (κ = 0.11, P = .69) and the trainees (κ = 0.05, P = .82).
Conclusion: Our study demonstrates that the amount of force applied by surgeons when performing the LFST is highly variable. Variable force application when performing the LFST may lead to inconsistent detection of syndesmotic instability, which may portend a poorer outcome.
Clinical Relevance: In this study, we demonstrate the wide variability in the amount of force used during a lateral fibular stress test. High variability of force application when performing the LFST may lead to inconsistent diagnosis of syndesmotic instability, which may portend a poorer outcome. Our findings suggest the need for further investigation into the technical aspects of syndesmotic testing that will permit more reproducible and valid interrogation of the syndesmosis.
Syndesmotic injuries are common in both Weber B and Weber C ankle fractures.
Recognition and anatomic stabilization of syndesmotic disruption correlate with
improved clinical outcomes.[1,4,7,12] The lateral fibular stress
test (LFST), also referred to as the hook or Cotton test,
is commonly performed to diagnose syndesmotic instability intraoperatively. A
bone hook or clamp is placed on the fibula near the level of the superior border of
the syndesmosis with the foot in a neutral position. A lateral distraction force is
applied while evaluating for fluoroscopic widening of the medial clear space (MCS),
tibiofibular overlap (TFOL), and tibiofibular clear space (TFCS). In a cadaveric
model, Stoffel et al demonstrated that 100 N of force applied in an LFST to an
uninjured ankle specimen was the point beyond which no further increases in MCS,
TFOL, or TFCS were appreciated and the clamp was noted to crush into the bone.
Additional studies have
similarly used 100 N as the applied force for detecting syndesmotic
instability.[6,8]
Despite this, there does not appear to be any conclusive evidence that 100 N is the
requisite minimum amount of force.
Additionally, it remains unknown how much force is actually applied by surgeons when
performing the LFST in the operating room. Underdetection of syndesmotic instability
may occur if surgeons are not applying the requisite force in a consistent manner.
Likewise, pulling too hard may cause the clamp to subside in osteoporotic bone or
even iatrogenically widen the syndesmosis secondary to supraphysiologic loading.
Therefore, the purpose of this study was to evaluate the amount of force orthopaedic
surgeons apply during an LFST in a simulated ankle fracture model. We hypothesized
that (1) there is substantial variation among surgeons in the amount of force
applied, (2) surgeons do not consistently apply 100 N of force, and (3) the amount
of force pulled is independent of level of training or subspecialty.
Methods
This study was conducted after approval by our institutional review board. A
biomechanical SawBones lower leg model including simulated soft tissue envelope
(SawBones Inc, Vashon Island, Washington) was mounted to a board and a 1-cm-diameter
hole was drilled across the tibia and fibula in the area of the syndesmosis. An
industrial force gauge (Nidec-Shimpo, Kyoto, Japan) was then mounted. A metal
extension piece was passed from the force gauge through the hole and positioned so
that the tip was exposed on the lateral side of the model (Figure 1). A commonly utilized reduction
clamp was affixed to the force gauge. The construct allowed applied forces to be
directly transmitted to the force gauge, increasing measurement accuracy and
obviating any potential issues such as fracture of the SawBones model, clamp
pull-out, or variability of clamp placement had the reduction clamp been applied
directly to the fibula. The model was validated against a second strain
gauge, with force readings reproducibly within 0.5 lb of force. This
calibration was repeated after every 10 participants to ensure continued validity
and accuracy of the model.
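The validation tolerance is reported in pounds of force, whereas the LFST results are reported in newtons. As a minimal sketch of the conversion (the function name `lbf_to_newtons` is ours for illustration and is not part of the study; the factor is the standard definition of the pound-force), the 0.5 lb agreement corresponds to roughly 2.2 N:

```python
# Standard conversion: 1 pound-force = 4.4482216152605 newtons (exact by definition).
NEWTONS_PER_LBF = 4.4482216152605


def lbf_to_newtons(force_lbf: float) -> float:
    """Convert a force in pounds-force to newtons."""
    return force_lbf * NEWTONS_PER_LBF


# 0.5 lb is the agreement reported between the two gauges.
tolerance_n = lbf_to_newtons(0.5)
print(f"0.5 lbf = {tolerance_n:.2f} N")  # prints "0.5 lbf = 2.22 N"
```

Relative to the 100 N benchmark discussed in this study, that tolerance is on the order of 2%.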
Figure 1. Biomechanical model. An industrial force gauge was fashioned to a SawBones model for simulation.
Attending orthopaedic surgeons and orthopaedic trainees (residents and fellows) from
two institutions who had previously performed an intraoperative LFST were eligible
for inclusion in the study. Surgeons who did not know how to perform an LFST and
those who had not previously treated an ankle fracture were excluded. Participants
were grouped as either attending surgeons or trainees for analysis. Basic
demographic data were collected on each participant including years in practice for
attending surgeons. Participant gender, handedness, fellowship subspecialty
training, and the number of ankle fractures treated operatively within the previous
year (prior to testing) were also recorded. Participants were asked to perform a
series of LFSTs and to simulate the force they typically applied intraoperatively.
The display of the gauge was covered such that participants were unable to see the
amount of force they were exerting. After a demonstration of the system and 1
practice attempt (with the force gauge covered), 3 trials were recorded for each
participant with a 1-minute break between tests.
Statistical Analysis
Data are presented as median and interquartile range (IQR). The Shapiro-Wilk test
demonstrated non-normally distributed data (P < .05). Basic
demographic data between groups, including gender, handedness, subspecialty,
number of ankle fractures fixed per year, and routine use of the LFST, were
compared with chi-squared tests. The amount of force applied in consecutive
pulls was compared within groups using the Friedman test and between groups
using the Wilcoxon signed-rank test. A P value <.05 was
considered statistically significant. Intraclass correlation coefficient (ICC)
estimation and Fleiss multirater Kappa test were performed for the entire cohort
and among each group to assess the intra- and interrater reliability,
respectively. Kappa index was interpreted as poor if less than 0.20, fair if
0.20 to 0.40, moderate if 0.40 to 0.60, good if 0.60 to 0.80, and very good if
0.80 to 1.00. ICC below 0.50 was considered poor; 0.50 to 0.75, moderate; 0.75
to 0.90, good; and above 0.90, excellent. Target enrollment was 20 subjects as
we estimated the study would have 90% power (alpha 0.05) to detect a 15-N
difference in pull strength between attendings and residents with a sample size
of 10 subjects per group.
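The interpretation bands described above can be written as a small lookup. This is a sketch only: the function names are hypothetical, and because the stated kappa ranges share endpoints (for example 0.40), it assumes each band includes its lower bound and excludes its upper bound. It also assumes an ICC of exactly 0.90 counts as excellent, which matches how the trainee ICC is described in the Results.

```python
def interpret_kappa(kappa: float) -> str:
    """Label a kappa value using the bands stated in the Statistical Analysis.

    Assumption: lower bound inclusive, upper bound exclusive.
    """
    if kappa < 0.20:
        return "poor"
    if kappa < 0.40:
        return "fair"
    if kappa < 0.60:
        return "moderate"
    if kappa < 0.80:
        return "good"
    return "very good"


def interpret_icc(icc: float) -> str:
    """Label an ICC value using the stated bands.

    Assumption: exactly 0.90 is treated as "excellent".
    """
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.90:
        return "good"
    return "excellent"


# Kappa and ICC values reported in the Results of this study.
reported = [("overall", 0.28, 0.92),
            ("attendings", 0.11, 0.94),
            ("trainees", 0.05, 0.90)]
for group, kappa, icc in reported:
    print(group, interpret_kappa(kappa), interpret_icc(icc))
```

Applied to the reported values, the lookup reproduces the labels used in the Results: fair interobserver reliability for the overall cohort, poor for each subgroup, and excellent intraobserver reliability throughout.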
Results
Basic demographic data of the participants in each group are depicted in Table 1. Thirty-three
surgeons participated in the study, of whom 4 were female (12.1%). Among the
attendings, mean years in practice was 9 (IQR 3-17). Trainees were mostly
postgraduate year 4 and 5 (62.5%) residents. Eighty percent of the attendings stated
they fixed >20 ankle fractures in the previous year (vs 61% of trainees). A majority (73%)
of attendings routinely used the LFST to evaluate the syndesmosis
intraoperatively.
Table 1. Individual Characteristics of the Participants.
Abbreviations: IQR, interquartile range; LFST, lateral fibular stress test; NA, not applicable; PGY, postgraduate year. χ2 test where applicable.
The median (IQR) force applied during the LFST was 96.42 N (71.42-126.33), 87.49 N
(69.19-117.40), and 99.99 N (79.91-137.49) for the pooled group, attendings, and
trainees, respectively. There was no significant difference between the attendings
and trainees with respect to the first (P = .42), second
(P = .49), or third (P = .49) trials. There
was no difference in the amount of force between those with foot and ankle
subspecialty training vs other subspecialties in any of the 3 trials
(P = .74, .78, .69, respectively). More than half (54.5%) of
all LFSTs were less than 100 N (57.8% for attendings, 51.8% for trainees), with the
distribution depicted in Figure 2.
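As a rough consistency check on the percentages above, the trial counts they imply can be back-calculated. This sketch assumes 15 attendings (33 participants minus 18 trainees), 3 recorded trials per participant, and that the 57.8% figure refers to attending trials. The counts themselves are not reported in the study; they are only what the percentages imply, and the helper name `implied_count` is hypothetical.

```python
TRIALS_PER_PARTICIPANT = 3
n_total, n_trainees = 33, 18
n_attendings = n_total - n_trainees  # 15, inferred rather than stated


def implied_count(n_participants: int, pct_below: float) -> int:
    """Nearest whole number of trials matching a reported percentage."""
    n_trials = n_participants * TRIALS_PER_PARTICIPANT
    return round(n_trials * pct_below / 100)


# Reported shares of trials below 100 N.
pooled = implied_count(n_total, 54.5)
attendings = implied_count(n_attendings, 57.8)
trainees = implied_count(n_trainees, 51.8)
print(pooled, attendings, trainees)
```

Under these assumptions the two group counts sum to the pooled count (26 + 28 = 54 of 99 trials), so the reported percentages are mutually consistent.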
Figure 2. Distribution of force applied during LFST by trainees, attendings, and the
total cohort, respectively, displayed as (A) percentage of total pulls and
(B) number of participants (based on the average pull per participant).
LFST, lateral fibular stress test.
Average force range: each participant performed 3 LFSTs in this
investigation; the x axis is grouped according to each
participant's respective average of the 3 pulls.
The ICC was excellent within the overall cohort (0.92, P < .001),
trainees (0.90, P < .001), and attendings (0.94,
P < .001). Interobserver reliability was fair
among the overall cohort (κ = 0.28, P = .49) and poor among the
attendings (κ = 0.11, P = .69) and the trainees (κ = 0.05,
P = .82).
Discussion
Significant challenges in diagnosing syndesmotic instability exist even when
performing intraoperative stress tests. The accuracy of these tests relies heavily
on technical expertise, applying a reproducible amount of adequate force, and the
ability to discern small but meaningful fluoroscopic changes in the MCS, TFCS, and
TFOL. The findings of this investigation demonstrate that the force applied by
orthopaedic surgeons during simulated LFST testing in a biomechanical ankle model
varies widely. Although there is excellent intraobserver reproducibility
of force applied during an LFST, our simulation suggests fair to poor interobserver
reliability. Furthermore, the amount of force applied does not appear to be related
to level of training, subspecialty training, or other demographic factors. Finally,
more than 50% of all trials were below 100 N.
Despite considerable recent research evaluating various parameters to identify
syndesmotic instability, little research has been performed examining the technique
surgeons use to stress the syndesmosis in vivo. Stoffel et al
demonstrated in an uninjured cadaveric ankle that after 100 N of applied
force, no further widening of MCS, TFCS, or TFOL was noted. This amount of
quantified force has been used by several investigators[1,8,11] to detect syndesmotic
instability in cadaveric models, even though it was not intended or investigated as
a “requisite force” for the LFST by Stoffel. In fact, it seems logical that accurate
diagnosis in vivo is likely based on multiple patient and injury characteristics.
Anatomic factors such as bone strength, soft tissue tension, and/or intact parts of
the ligamentous complex may all factor into how much the fibula translates with an
applied force. To our knowledge, no study has investigated the “correct” amount of
force to use for an LFST. Intraoperatively, a force gauge is not typically used
during the LFST, so surgeons remain unaware of the actual force they are using to
stress the syndesmosis. Although it may seem intuitive that attending orthopaedic
surgeons who commonly treat ankle fractures have adequate clinical experience to
apply the diagnostic requisite force, our results show that wide variability exists
even among experienced surgeons.
Diagnosis of syndesmotic injury with radiographic stress testing continues to be a
challenge in the care of ankle fractures. In a 2-surgeon comparison of the external
rotation (ER) stress test vs the Cotton test on 140 unstable ankle fractures
undergoing surgery, Pakarinen et al
found that the LFST had a sensitivity of just 0.25. Although intra- and
interobserver reliability in their study were high, these results should be
interpreted in the context of a 2-surgeon comparison. Our simulation challenges
their finding of high interobserver reliability of the LFST with testing of more
than 30 subjects. Jiang et al
performed a cadaveric study, which demonstrated that the Cotton test
increased the TFCS most reliably after syndesmotic injury. Although their
biomechanical model used 100 N of force for the LFST, our study highlights that in
clinical practice, collectively, surgeons do not routinely apply this amount of
force and that significant variability exists between surgeons. The findings of our
study therefore question several foundational assumptions underpinning such
studies.
Several studies have directly compared the LFST with the ER stress test.[9,13] Common to all of these
studies, however, is the lack of a standard method of LFST when being compared to
the ER stress test. In previous cadaveric studies, 100 N of force is most often
applied. However, in the clinical studies, the amount of force pulled is not
delineated. In fact, the authors were unable to find a single study that quantified
the in vivo force used by surgeons during the LFST. This is in contrast to the ER
stress test, where a standardized torque of 7.5 Nm can be applied with an F-Tool.
Given the amount of variability identified between subjects in our study, we
are concerned that clinical variability in the force applied during the
Cotton test could alter measurement of syndesmotic widening and may have
implications for treatment.
Standardization of the LFST goes beyond simply the amount of force pulled. Direction
of distraction has also been proposed as a factor for consideration.
Furthermore, where the surgeon places his or her hand on the leg to apply
countertraction and where the clamp is placed on the fibula may all impact
assessment of radiographic parameters when stressing the mortise. Further study of
these variables is warranted to standardize testing and
optimize the accuracy of diagnosing syndesmotic injury.
There are several limitations to our study. Foremost, this is a biomechanical study
using an ankle fracture model and does not fully replicate in vivo situations.
However, using a SawBones model reduced variability compared to cadaveric or in vivo
study as we were able to solely examine the force surgeons apply during a simulated
LFST without other potential confounding factors. In vivo or cadaveric tissue,
however, would inherently better mimic the biofeedback experienced by surgeons in
the operating theatre and may have led to differential force generation.
Additionally, this study was performed under ideal conditions to quantify force
generation without common clinical concerns such as iatrogenic osseous fibular
injury when applying a reduction clamp or disrupting a concomitantly repaired
fibular fracture. Based on our own experiences, surgeons likely apply variable
amounts of force depending on the specific clinical scenario and type of ankle
fracture. Additionally, although our surgeon cohort likely represents typical
abilities generalizable to most practicing orthopaedic surgeons, this study was
performed at two institutions in a common geographic area. Although not a weakness
specific to this investigation, it should be noted that many syndesmotic injuries in
clinical practice are readily evident on initial radiographs. LFST or other
intraoperative stress testing is not always required to diagnose syndesmotic
instability. Finally, we want to emphasize that 100 N for performing an LFST may in
fact not be the requisite amount of force for all patients. Although this amount of
force has been used several times throughout the literature, its questionable use as
a methodologic predicate is derived from one study that was not aiming to determine
the “correct” amount of force to use in an LFST. The appropriate amount of force
required in an LFST to detect syndesmotic injury should be an area of further
investigation. Furthermore, our findings regarding the wide variability of force
pulled and low interobserver reliability of the test highlight the need for
improvement and standardization of the LFST.
In conclusion, our study demonstrates that the amount of lateral force applied by
surgeons in a biomechanical ankle model when performing the LFST is variable.
Variability in force application when performing the LFST may impact consistent
detection of syndesmotic instability, which may portend a poorer outcome. The
intraoperative use of force gauges and/or specific practice outside of the operating
theatre (to become familiarized with the proprioceptive feel of generating the
requisite force) may permit surgeons to consistently apply the test in a manner that
is clinically reproducible. Finally, these results suggest that further
investigation into the technical reproducibility and accuracy of intraoperative
syndesmotic testing, specifically the LFST, on both cadaveric specimens and in vivo
is warranted.
Pakarinen H, Flinkkilä T, Ohtonen P, Hyvönen P, Lakovaara M, Leppilahti J, Ristiniemi J. J Bone Joint Surg Am. 2011-11-16.
Massri-Pugin J, Lubberts B, Vopat BG, Wolf JC, DiGiovanni CW, Guss D. Foot Ankle Int. 2018-01-10.
Jenkinson RJ, Sanders DW, Macleod MD, Domonkos A, Lydestadt J. J Orthop Trauma. 2005-10.
Jiang KN, Schulz BM, Tsui YL, Gardner TR, Greisberg JK. J Orthop Trauma. 2014-06.