
Interrater and Intrarater Reliability of the Beighton Score: A Systematic Review.

Lauren N Bockhorn, Angelina M Vera, David Dong, Domenica A Delgado, Kevin E Varner, Joshua D Harris.

Abstract

BACKGROUND: The Beighton score is commonly used to assess the degree of hypermobility in patients with hypermobility spectrum disorder. Since proper diagnosis and treatment in this challenging patient population require valid, reliable, and responsive clinical assessments such as the Beighton score, studies must properly evaluate efficacy and effectiveness.
PURPOSE: To succinctly present a systematic review to determine the inter- and intrarater reliability of the Beighton score and the methodological quality of all analyzed studies for use in clinical applications.
STUDY DESIGN: Systematic review; Level of evidence, 3.
METHODS: A systematic review of the MEDLINE, Embase, CINAHL, and SPORTDiscus databases was performed. Studies that measured inter- or intrarater reliability of the Beighton score in humans with and without hypermobility were included. Non-English, animal, cadaveric, level 5 evidence, and studies utilizing the Beighton score self-assessment version were excluded. Data were extracted to compare scoring methods, population characteristics, and measurements of inter- and intrarater reliability. Risk of bias was assessed with the COSMIN (Consensus-Based Standards for the Selection of Health Measurement Instruments) 2017 checklist.
RESULTS: Twenty-four studies were analyzed (1333 patients; mean ± SD age, 28.19 ± 17.34 years [range, 4-71 years]; 640 females, 594 males, 273 unknown sex). Of the 24 studies, 18 reported raters were health care professionals or health care professional students. For interrater reliability, 5 of 8 (62.5%) intraclass correlation coefficients and 12 of 19 (63.2%) kappa values were substantial to almost perfect. Intrarater reliability was reported as excellent in all studies utilizing intraclass correlation coefficients, and 3 of the 7 articles using kappa values reported almost perfect values. Utilizing the COSMIN criteria, we determined that 1 study met "very good" criteria, 7 met "adequate," 15 met "doubtful," and 1 met "inadequate" for overall risk of bias in the reliability domain.
CONCLUSION: The Beighton score is a highly reliable clinical tool that shows substantial to excellent inter- and intrarater reliability when used by raters of variable backgrounds and experience levels. While individual components of risk of bias among studies demonstrated large discrepancy, most of the items were adequate to very good.
© The Author(s) 2021.

Keywords:  Beighton score; hypermobility; interrater; intrarater; systematic review

Year:  2021        PMID: 33786328      PMCID: PMC7960900          DOI: 10.1177/2325967120968099

Source DB:  PubMed          Journal:  Orthop J Sports Med        ISSN: 2325-9671


The Beighton score is the cornerstone for diagnosing hypermobility syndromes, including hypermobility spectrum disorder and hypermobile Ehlers-Danlos syndrome.[13,59] The original criteria do not provide a detailed description,[6] which leaves them open to interpretation and uncertain in application. No threshold score is determined by the original description,[6] nor is there consensus throughout the literature on what defines hypermobility.[24,34] However, variations are seen in hypermobility depending on age, sex, and race; thus, some experts believe that thresholds should be individualized to subpopulations.[51,52] Given the imprecision of the Beighton score, studies utilizing it may be inconsistent in starting positions, performance, and benchmarks.[34] Questions left unanswered by the Beighton score include whether the tests should be performed actively by the respondent or passively by the clinician and whether a warm-up period is required.[35] The risk of these inherent shortcomings is that a lack of specificity could affect the score’s generalizable applicability and reliability.

In addition, the Beighton score does not account for symptoms. “Laxity” is defined as excessive motion in a specific joint in an asymptomatic individual. “Excessive,” relative to a joint, is defined as abnormally increased or supraphysiologic motion, also known as “hypermobility.” “Instability” is defined as excessive motion in a specific joint in a symptomatic individual. The key distinction between laxity and instability is the absence (former) or presence (latter) of symptoms.

Historically, studies have consistently reported excellent reliability of the Beighton score. However, recent systematic reviews have reported these studies to show conflicting evidence, and they have cited concerns with the methodology based on requirements of the COSMIN (Consensus-Based Standards for the Selection of Health Measurement Instruments) criteria that are clinically inapplicable to this score.[17,36] The training and experience of raters[26,42] and the time between examinations[33] have the potential to affect the measures of Beighton score reliability according to the current COSMIN criteria. Reliable, accurate, and precise measures of hypermobility are necessary for clinicians and surgeons providing operative and nonoperative musculoskeletal care. Specifically, they can guide treatment choices in patellofemoral,[10] shoulder,[53] and hip instability[46] as well as anterior cruciate ligament (ACL) reconstruction.[41,55]

Owing to the significant heterogeneity in evidence regarding the Beighton score, the purpose of this investigation was to succinctly present a systematic review to determine the inter- and intrarater reliability of the Beighton score and the methodological quality of all analyzed studies in the context of clinical applicability. We hypothesized that this systematic review would demonstrate excellent inter- and intrarater reliability and substantial methodological quality that is satisfactory for surgeons’ clinical use.

Methods

The review protocol was registered via the National Institute for Health Research’s PROSPERO International Prospective Register of Systematic Reviews (CRD42018081703).[28] The systematic review was conducted according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines.[43] Utilizing PICO (population, intervention, comparison, outcome) to fit a measurement tool, we examined research addressing humans of any age, degree of hypermobility, the Beighton score, and inter- and intrarater reliability. Therefore, it was determined that studies evaluating the clinical Beighton score between and among raters as a primary or secondary outcome would be included and all others would be considered the wrong outcome. Studies that utilized the Beighton self-assessment exclusively, in which patients independently measured and reported their own score, were excluded. Reviews, abstracts, theses, unpublished studies, articles not available in English, and studies with animal or cadaveric subjects were also excluded. A systematic computerized search (Appendix 1) was conducted by 1 author (L.N.B.) on January 30, 2018, in 4 databases (MEDLINE, Embase, CINAHL, and SPORTDiscus) with no limitations on dates of inclusion. To reduce the search bias, the search strategy was conducted using Medical Subject Headings. A search in ClinicalTrials.gov was also conducted to identify any possible ongoing studies. The search terms included, but were not limited to the following: Beighton, joint laxity, hypermobility, reproducibility of results, observer variation, reliability, interrater, or intrarater (Appendix 1). Identified records were imported to the systematic review software Rayyan (Qatar Computing Research Institute),[48] and duplicates were removed. Articles were screened in a 2-step process, first by title and abstract according to exclusion criteria. 
Second, articles included by abstract were imported into Rayyan; full texts were made available; and 2 authors (L.N.B. and A.M.V.) independently screened by reading the article abstract and the article full text for inclusion according to both eligibility criteria. Disagreements concerning final inclusion were settled by consensus between these authors during a deliberation session. The data extraction sheet was developed according to the Cochrane Consumers and Communication Review Group’s data extraction template,[30] was pilot tested on 3 randomly selected included studies, and then refined accordingly. One review author (L.N.B.) extracted the data from included studies, which the second author (A.M.V.) verified. Disagreements were resolved by discussion between them; if no agreement could be reached, it was planned that a third author (J.D.H.) would decide. No authors were contacted for additional information, and all missing data were labeled “not specified.” The included articles were independently assessed by 2 authors (L.N.B. and A.M.V.) for risk of bias using the COSMIN checklist.[44] The complete COSMIN checklist includes 12 boxes, covering internal consistency, reliability, measurement error, validity, and responsiveness. This review exclusively evaluated reliability (COSMIN box 6), which was determined to be crucial to the context in which inter- and intraobserver values were interpreted. The overall methodological quality of a study is determined by the lowest rating among the items in the reliability box (ie, “the worst score counts” principle), including “very good,” “adequate,” “doubtful,” and “inadequate.” Individual scores on the COSMIN “reliability” subitems were assessed and are included in Appendix 2 for completeness. COSMIN question 6.8, “other methodological flaws,” was not assessed because of the subjectivity of the question. 
To minimize selection bias, studies were not excluded on the basis of methodological quality, as they were evaluated only in the reliability domain and the lowest score determined the overall quality in reliability.
Box 6

Reliability

| Item | Very Good | Adequate | Doubtful | Inadequate | Not applicable |
|---|---|---|---|---|---|
| Design requirements | | | | | |
| 1. Were patients stable in the interim period on the construct to be measured? | Evidence provided that patients were stable | Assumable that patients were stable | Unclear if patients were stable | Patients were NOT stable | |
| 2. Was the time interval appropriate? | Time interval appropriate | | Doubtful whether time interval was appropriate or time interval was not stated | Time interval NOT appropriate | |
| 3. Were the test conditions similar for the measurements (eg, type of administration, environment, instructions)? | Test conditions were similar (evidence provided) | Assumable that test conditions were similar | Unclear if test conditions were similar | Test conditions were NOT similar | |
| Statistical methods | | | | | |
| 4. For continuous scores: Was an intraclass correlation coefficient (ICC) calculated? | ICC calculated and model or formula of the ICC is described | ICC calculated but model or formula of the ICC not described or not optimal. Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred | Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred | No ICC or Pearson or Spearman correlations calculated | Not applicable |
| 5. For dichotomous/nominal/ordinal scores: Was kappa calculated? | Kappa calculated | | | No kappa calculated | Not applicable |
| 6. For ordinal scores: Was a weighted kappa calculated? | Weighted kappa calculated | | Unweighted kappa calculated or not described | | Not applicable |
| 7. For ordinal scores: Was the weighting scheme described (eg, linear, quadratic)? | Weighting scheme described | Weighting scheme NOT described | | | Not applicable |
| Other | | | | | |
| 8. Were there any other important flaws in the design or statistical methods of the study? | No other important methodological flaws | | Other minor methodological flaws | Other important methodological flaws | |

From Mokkink LB, de Vet HCW, Prinsen CAC, et al. COSMIN risk of bias checklist for systematic reviews of patient-reported outcome measures. Qual Life Res. 2018;27(5):1171-1179.[44] Material distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).
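The “worst score counts” principle described in the Methods can be sketched in a few lines. This is only an illustration: the function name `overall_rating`, the dictionary layout, and the example ratings are assumptions made here, not part of the COSMIN materials or of the review.

```python
# The four COSMIN rating levels, ordered from worst to best.
RATING_ORDER = ["inadequate", "doubtful", "adequate", "very good"]


def overall_rating(item_ratings):
    """Return the lowest rating among the applicable items.

    item_ratings maps an item label (e.g. "6.1") to a rating string,
    or to None when the item is not applicable.
    """
    applicable = [r for r in item_ratings.values() if r is not None]
    if not applicable:
        raise ValueError("no applicable items to rate")
    # The item with the smallest position in RATING_ORDER decides the result.
    return min(applicable, key=RATING_ORDER.index)


# Hypothetical study: good design, but only an unweighted kappa was reported.
example = {
    "6.1": "very good",
    "6.2": "very good",
    "6.3": "adequate",
    "6.4": None,          # not applicable (no continuous score)
    "6.5": "very good",
    "6.6": "doubtful",    # unweighted kappa
    "6.7": "adequate",
}
print(overall_rating(example))
```

Under this rule a single “doubtful” item is enough to pull an otherwise well-rated study down to “doubtful” overall.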

We defined “reliability” as reproducibility of test values in repeated trials on the same individuals,[32] quantified by inter- and intrarater reliability. Consistency of outcomes recorded from 1 participant examined by the same observer multiple times was defined as intrarater reliability, while reproducibility of the score among observers was defined as interrater reliability.[4] Since the level of measurement of the Beighton score is not defined, researchers use different statistics to quantify these 2 values. Nominal and ordinal data were analyzed with the Cohen or weighted kappa (κ) coefficient,[50] which varies from –1 to 1. COSMIN criteria favor weighted kappa values, which penalize disagreements in terms of their seriousness, over unweighted kappa values, which treat disparities equally.[14,56] Less rigorous expressions of inter- and intrarater reliability include percentage agreement and the Spearman rho. While percentage agreement is a direct measurement of the similarity between chosen values, it does not take into account the chance that scores were guessed[42] or the difference between more disparate scores. The Spearman rho expresses correlation between values on a scale of –1 to 1, with no known standards for reliability. This correlation reveals only how much values vary in relationship to each other, not the degree of agreement between them, allowing it to discount systematic differences.[9] These values were not considered adequate to express reliability according to COSMIN standards. No transformation of reported values was required, except for simplifications detailed in the legend of Tables 1 and 2. No quantitative assessment of risk of bias across studies could be computed with the measures of reliability, and no additional quantitative analysis was performed.
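To make the difference between percentage agreement, unweighted kappa, and weighted kappa concrete, the following is a minimal sketch using the standard Cohen formulation (1 minus observed over chance-expected disagreement). The function names and the small data sets are invented for illustration and do not come from any included study.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of participants given the identical score by both raters."""
    return sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a, rater_b, categories, weights=None):
    """Cohen's kappa for two raters.

    weights=None gives the unweighted kappa; "linear" or "quadratic"
    gives a weighted kappa for ordered categories.
    """
    if weights not in (None, "linear", "quadratic"):
        raise ValueError("weights must be None, 'linear', or 'quadratic'")
    n = len(rater_a)
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {c: i for i, c in enumerate(categories)}

    def disagreement(i, j):
        # Unweighted: every disagreement counts fully.
        if weights is None:
            return 0.0 if i == j else 1.0
        # Weighted: larger gaps between ordered categories count more.
        distance = abs(i - j) / (k - 1)
        return distance if weights == "linear" else distance ** 2

    pairs = Counter((index[x], index[y]) for x, y in zip(rater_a, rater_b))
    totals_a = Counter(index[x] for x in rater_a)
    totals_b = Counter(index[y] for y in rater_b)

    observed = sum(disagreement(i, j) * c for (i, j), c in pairs.items()) / n
    expected = sum(
        disagreement(i, j) * totals_a[i] * totals_b[j]
        for i in range(k)
        for j in range(k)
    ) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed / expected


# Invented example 1: hypermobile (1) or not (0) at a fixed cutoff, 10 participants.
cutoff_a = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
cutoff_b = [1, 1, 0, 0, 0, 0, 0, 1, 1, 0]
print(percent_agreement(cutoff_a, cutoff_b))      # raw agreement
print(cohen_kappa(cutoff_a, cutoff_b, [0, 1]))    # chance-corrected agreement

# Invented example 2: composite scores (0-9) that differ by at most 1 point.
composite_a = [0, 2, 4, 9]
composite_b = [0, 3, 4, 8]
scale = list(range(10))
print(cohen_kappa(composite_a, composite_b, scale))                    # unweighted
print(cohen_kappa(composite_a, composite_b, scale, weights="linear"))  # weighted
```

In the second example the raters never differ by more than 1 point, so the weighted kappa is considerably higher than the unweighted value, which treats a 1-point and a 9-point disagreement alike.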
Table 1

Extracted Data

| Population Description | Test Conditions | Whether Test Conditions Were Similar for the Measurements |
|---|---|---|
| Number of participants | Beighton score modifications | Participant sequence generation |
| Age | Examination setting | Whether sequence of participants was concealed |
| Sex | Number of raters | Blinding of raters |
| Diagnostic criteria | Rater professions | Key conclusions of study authors |
| Inclusion criteria | Experience | Statistical tests |
| Exclusion criteria | Training | COSMIN criteria |
| | Time between measurements | Whether patients were stable in the interim |

COSMIN, Consensus-Based Standards for the Selection of Health Measurement Instruments.

Table 2

Strength of Agreement for the Kappa Coefficient and Intraclass Correlations[14,39,40,56]

| Kappa Coefficient | Agreement | Intraclass Correlation | Reliability |
|---|---|---|---|
| ≤0 | Poor | <0.5 | Poor |
| 0.01-0.20 | Slight | >0.5-0.75 | Moderate |
| 0.21-0.40 | Fair | >0.75-0.9 | Good |
| 0.41-0.60 | Moderate | >0.90 | Excellent |
| 0.61-0.80 | Substantial | | |
| 0.81-1.00 | Almost perfect | | |
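The interpretation bands in Table 2 can be expressed as two small lookup functions. This is a sketch only: the function names are invented here, the ICC band below 0.5 is read as “poor,” and values falling in the small gaps between printed bands (for example, a kappa between 0.20 and 0.21, or an ICC of exactly 0.5) are assigned to the adjacent band by assumption.

```python
def kappa_agreement(kappa):
    """Map a kappa coefficient to the agreement label used in Table 2."""
    if kappa <= 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


def icc_reliability(icc):
    """Map an intraclass correlation to the reliability label used in Table 2."""
    if icc < 0.5:
        return "poor"
    if icc <= 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


for value in (0.00, 0.25, 0.65, 0.82):
    print(f"kappa = {value:.2f}: {kappa_agreement(value)}")
for value in (0.72, 0.82, 0.91):
    print(f"ICC = {value:.2f}: {icc_reliability(value)}")
```

Reported ranges that straddle two bands (for example, a kappa of 0.71-0.82) have no single label under this scheme, which is why such results are described separately as ranging between categories.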

Results

The database search strategy yielded 1250 records. Three articles not identified by these searches were discovered by literature citation and added to the screen. After the screening process delineated in Figure 1, a total of 24 records were determined to meet inclusion criteria.[‡]
Figure 1.

Flow diagram summarizing the literature search, screening, and review using the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines.

Table 3 includes characteristics of all included studies and their corresponding COSMIN criteria. All 24 studies selected for review were published in English and were observational studies with level 4 evidence. Of the 14 articles that explicitly stated time intervals between measurements, the longest was 12 to 16 weeks,[5] with 12 of 14 reporting ≤2 weeks. A total of 1333 participants were examined for reliability of the Beighton score across included studies, with a reported mean ± SD age of 28.19 ± 17.34 years (range, 4-71 years). Of the 24 studies, 8 had populations <18 years old, and 14 included a higher proportion of women than men (640 female, 594 male, 273 unknown). Six studies included athletes in their participant population, and 8 comprised patients with pathological conditions. Seven studies used goniometers in their protocol.
Table 3

Population Characteristics, Time Interval, Study Design, and Associated COSMIN Scores

| Study (Year) | Population: Sample, Age (y), Female Sex (%), DOP^b,c | 6.1^d | Time Interval: Interrater | Time Interval: Intrarater | 6.2^d | Test Condition | No. of Raters | Rater Profession | Combined Rater Experience^c | Rater Training | 6.3^d |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Aartun (2014)[1] | 111, 12-14, 46.8, middle school students | VG | <4 d | 1-4 h | VG | 5 item | 2 | Chiropractors | 18 y | Standardization session | VG |
| Aslan (2006)[3] | 72, 20.36 ± 1.24 (18-25), 40.20, undergraduate PT students | VG | <24 h | 12.84 ± 7.41 d | VG | 5 item + goniometer | 2 | PTs | 21 y | 2 h practice together | VG |
| Baumhauer (1995)[5] | 21, 18-23, 57, intercollegiate athletes | VG | 12-16 wk | NA | VG | 5 item | 2 | NS | NS | NS | A |
| Boyle (2003)[8] | 42, 25.4 ± 4.2 (15-45), 100, noninjured HS athletes and PT students | VG | 15-60 min | 6 ± 4 d | VG | 5 item + goniometer | 2 | PTs | 17 y | CME, trained with index | VG |
| Bulbena (1992)[11] | 173, 43.98,^e NS, JHS with >5 Beighton system | VG | Consecutive | NA | D | 5 item | 2 | Rheumatologists | Experienced | NS | A |
| Cooper (2018)[15] | 50, 49 (22-60), 56, community members | VG | NS | 1 wk | VG | 5 item + goniometer | 1 | NS | NS | NS | A |
| Erdogan (2012)[18] | 15, 31.8 (16-50), 59.15, treated for ingrown nails | VG | NS | NS | D | 5 item + goniometer | 2 | Rheumatologists | NS | NS | A |
| Erkula (2005)[20] | 50, 10.4 ± 1.2 (8-15),^f 46.97, asymptomatic students | VG | 2 wk | NA | VG | 5 item | 2 | Orthopaedic surgeons | NS | NS | A |
| Evans (2012)[21] | 30, 10.6 ± 2.3 (7-15), 65, asymptomatic podiatry clinic patients | VG | >2 h | >2 h | VG | 5 item | 2 | Podiatrists | 21 y | NS | A |
| Fritz (2005)[22] | 38, 39.2 ± 11,^f 57,^f history of lower back pain | VG | 5 min | NA | VG | 5 item | 2 | PTs | NS | NS | A |
| Glasoe (2002)[23] | 30, 14-24, 100, athletes | VG | NS | NA | VG | 5 item | 2 | NS | >6 y | NS | A |
| Hansen (2002)[27] | 100, 9-13, NS, asymptomatic competitive athletes | VG | NS | NA | D | 4/5, no fifth finger | 4 | 2 rheumatologists, 1 untrained physician | NS | Guided by illustrations | A |
| Hicks (2003)[29] | 63, 36 (20-66), 60.30, patients with lower back pain | VG | >15 min | NA | VG | 5 item | 4 | 3 PT, 1 PT and chiropractor | 20 y | Group review, 1 h practice | VG |
| Hirsch (2007)[31] | 50, 38.3 ± 11.3 (20-60), 56, asymptomatic | VG | NS | 24.6 d | VG | 5 item + goniometer | 2 | Dentists | NS | Instructions, directed by orthopaedic surgeon | VG |
| Junge (2013)[34] | 103, 7-8 and 10-12, 44,^e healthy school children | VG | <30 min | NA | VG | | 4 | PT students | NS | Trained | VG |
| Juul-Kristensen (2007)[35] | 40, 42.27 (18-71),^e 68.33,^e BJHS, EDS, back/shoulder pain | VG | NS | NA | D | 5 item | 2 | NS | NS | Trained per protocol | VG |
| Karim (2011)[37] | 30, 24 (18-32), 100, contemporary professional dancers | VG | NS | NA | VG | 5 item | 4 | 1 PT, 3 PT students | 30 y | PT trained students | VG |
| Naal (2014)[46] | 55, 28.5 ± 4.1, 32.70, symptomatic FAI cases | VG | NS | NA | D | 5 item | 2 | Clinicians | NS | NS | A |
| Pitetti (2015)[49] | 25, 13.3 ± 2.9, 44, intellectually disabled | VG | 3-4 wk | NA | VG | 5 item + goniometer | 2 | DPT students | None | Peer supportive learning | VG |
| Smith (2012)[57] | 5, 27, 100, patellar instability patients | VG | <1 d | 30 min | VG | 5 item | 5 | Orthopaedic surgeons | 125 y | Familiarized | VG |
| Tarara (2014)[58] | 19, 20.3 ± 1.2 (male), 19.8 ± 1.0 (female), 57.89, club athletes | VG | <2.5 h | 4-7 d | VG | 5 item | 3 | 1 clinician and 2 novice students | 22 y | Prior reading, 1 h training and questions | VG |
| Vaishya (2013)[62] | 300, 24.6 ± 0.9, 36.67, postoperative ACL reconstruction and controls | VG | NS | NA | D | 5 item | 2 | NS | NS | NS | A |
| Vallis (2015)[63] | 36, 22.7 (18-32), 75, asymptomatic PT and OT students | VG | <1 d, 1 wk | NA | VG | 5 item + goniometer | 2 | Researchers | NS | Teaching session | VG |
| van der Giessen (2001)[65] | 48, 4-12, 48.9,^f primary schoolchildren | VG | NS | NA | D | 5 item | 2 | PT students | 1 mo | Professional PT trained students | VG |

ACL, anterior cruciate ligament; BJHS, benign joint hypermobility syndrome; CME, continuing medical education; DOP, description of participants; DPT, doctorate of physical therapy; EDS, Ehlers-Danlos syndrome; FAI, femoroacetabular impingement; HS, high school; JHS, joint hypermobility syndrome; NA, not available/applicable; NS, not specified; OT, occupational therapy; PT, physical therapy.

^b Age reported as mean ± SD or range.

^c Calculated.

^d COSMIN criterion (Consensus-Based Standards for the Selection of Health Measurement Instruments; see Appendix 2 for details). Scoring: VG = very good, A = adequate, D = doubtful, I = inadequate.

^e Weighted average of groups or 2-phase studies.

^f Demographics of larger sample, of which reliability population is a subgroup.

Raters in at least 18 of the 24 studies were health care professionals (HCPs) or HCP students. Eight studies had physical therapists or physical therapy students as raters; 2 studies, orthopaedic surgeons; 3 studies, rheumatologists; and in the 5 other studies, other HCP disciplines that were not specified in the article. One study referred to its raters as “researchers.” None of the studies included HCPs with equal years of experience. Half of the studies did not report the HCPs’ years of experience at all. For the studies that did report years of experience, the numbers for each HCP were summed to reach combined total years for Table 3.

Table 4 includes measures of reliability in each study and the corresponding COSMIN criteria. Because the study designs, participants, interventions, and reported outcome measures varied markedly, results were synthesized in a qualitative manner, and pooled means could not be determined. Because 3 studies included reliability statistics for >1 cutoff score (ie, ≥4/9 and composite), the included 24 articles reported interrater reliability values for 27 cutoff scores.
For interrater reliability, 5 of the 27 scoring cutoffs were ≥4 of 9; 3 were ≥5 of 9; 13 were composite (total of 9 points); 4 used each item in the Beighton score; and 1 used a modified composite scale. Intrarater reliability was expressed for 10 total cutoff values: 3 were ≥4 of 9; 1 was ≥5 of 9; 5 included composite values; and 1 calculated a score for each item. Of the 8 studies that utilized intraclass correlation (ICC) to express interrater reliability, 1 found an excellent value; 4, good; 1, moderate to good; and 2, moderate. Of the 19 kappa values or ranges for interrater reliability, 3 were almost perfect; 6 were substantial; 2 were moderate; 1 was poor; and the others ranged between scales. Of the 7 ranges, 3 crossed between substantial and almost perfect, while the other 4 varied among lower ratings. Three studies used percentage agreement values, and 3 studies used the Spearman rho to demonstrate interrater reliability. For interrater reliability, 5 of 8 (62.5%) ICCs and 12 of 19 (63.2%) kappa values were better than moderate. Of the 13 intrarater values provided, 3 were ICC; 7 were kappa; 2 were percentage agreement; and 1 was a Spearman rho. All 3 ICC values for intrarater reliability were excellent. For the 7 kappa values and ranges, 2 were almost perfect; 2, substantial; 1, fair; and 2 had scores varying from substantial to almost perfect.
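The summary proportions for interrater reliability follow directly from the counts listed above. The short tally below is only a check of that arithmetic; the variable names and dictionary keys are illustrative. Note that the ICC value reported as “moderate to good” and the 4 kappa ranges spanning lower categories are not counted as better than moderate.

```python
# Counts of interrater results by category, as listed in the text.
interrater_icc = {"excellent": 1, "good": 4, "moderate to good": 1, "moderate": 2}
interrater_kappa = {
    "almost perfect": 3,
    "substantial": 6,
    "moderate": 2,
    "poor": 1,
    "range: substantial to almost perfect": 3,
    "range: lower ratings": 4,
}

icc_total = sum(interrater_icc.values())
icc_better = interrater_icc["excellent"] + interrater_icc["good"]

kappa_total = sum(interrater_kappa.values())
kappa_better = (
    interrater_kappa["almost perfect"]
    + interrater_kappa["substantial"]
    + interrater_kappa["range: substantial to almost perfect"]
)

print(f"ICC: {icc_better} of {icc_total} = {100 * icc_better / icc_total:.1f}%")
print(f"Kappa: {kappa_better} of {kappa_total} = {100 * kappa_better / kappa_total:.1f}%")
```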
Table 4

Inter- and Intrarater Reliability and Associated COSMIN Scores

| Study (Year) | Cutoff Score | Interrater Reliability, Mean (95% CI) | Intrarater Reliability, Mean (95% CI) | COSMIN 6.4 | COSMIN 6.5 | COSMIN 6.6 | COSMIN 6.7 |
|---|---|---|---|---|---|---|---|
| Aartun (2014)[1] | ≥4/9 | κ = 0.65 (0.33 to 0.97) | κ = 0.66-1 (0.03 to 1) | NA | VG | D | A |
| | ≥5/9 | κ = 0.56 (0.11 to 1.00) | κ = 1 | | | | |
| Aslan (2006)[3] | Composite | ICC = 0.82; Agreement = 42% | ICC = 0.92; Agreement = 43% | A | NA | NA | NA |
| Baumhauer (1995)[5] | Composite | ρ = 1 | | NA | I | D | A |
| Boyle (2003)[8] | Composite | ρ = 0.87; Agreement = 51% | ρ = 0.86; Agreement = 69% | D | NA | NA | NA |
| Bulbena (1992)[11] | Each item | κ = 0.79-0.93 | | D | VG | D | NA |
| Cooper (2018)[15] | ≥4/9 | κ = 0.96^b (0.87 to 1.00) | κ = 1 | NA | VG | D | A |
| Erdogan (2012)[18] | Each item | κ = 0.71-1.0 | κ = 0.81-1.0 | NA | VG | D | A |
| Erkula (2005)[20] | | ρ = 0.86; ρ = 0.62 | | D | NA | NA | NA |
| Evans (2012)[21] | Composite | ICC = 0.73 | ICC = 0.96-0.98 | VG | NA | NA | NA |
| Fritz (2005)[22] | Composite | ICC = 0.72 (0.50 to 0.85) | | VG | NA | NA | NA |
| Glasoe (2002)[23] | Composite | κ = 0.7 | | NA | VG | D | A |
| Hansen (2002)[27] | ≥4/9 | κ = 0.44-0.82 | | D | VG | D | A |
| Hicks (2003)[29] | Composite | ICC = 0.79 (0.68 to 0.87) | | VG | NA | NA | NA |
| Hirsch (2007)[31] | ≥4/9 | ICC >0.84 | ICC >0.89 | A | NA | NA | NA |
| Junge (2013)[34] | Each item^c | κ = 0.49-0.94, 0.30-0.84 | | NA | VG | D | A |
| | ≥5/9^c | κ = 0.64, 0.59^d | | | | | |
| Juul-Kristensen (2007)[35] | Composite | ICC = 0.91 | | VG | VG | D | A |
| | ≥5/9 | κ = 0.66 (0.30 to 1.02); 0.74 (0.46 to 1.02)^d | | | | | |
| Karim (2011)[37] | NS | κ = 0.6; Agreement = 54%-100% | | NA | VG | D | NA |
| Naal (2014)[46] | Composite | κ = 0.82^b (0.72 to 0.91) | | NA | VG | VG | VG |
| Pitetti (2015)[49] | Composite | ICC = 0.88 | | A | VG | D | A |
| | Each item | κ = 0.45-0.80 | | | | | |
| Smith (2012)[57] | Composite | κ = 0.00 (−0.16 to 0.17) | κ = 0.25 (0.03 to 0.51) | NA | VG | VG | A |
| Tarara (2014)[58] | Modified composite^e | κ = 0.64-0.69^f; κ = 0.72^g (0.62 to 0.82) | Expert: κ = 0.69 (0.46 to 0.92); Novice: κ = 0.72-0.73 ([0.53-0.90] to [0.58-0.89]) | NA | VG | VG | A |
| Vaishya (2013)[62] | ≥4/9 | κ = 0.7 | | NA | VG | D | A |
| Vallis (2015)[63] | Composite | ICC = 0.72-0.80 ([0.51-0.84] to [0.64-0.89]); κ = 0.71-0.82 ([0.67-0.90] to [0.50-0.84]) | | A | VG | VG | A |
| van der Giessen (2001)[65] | Composite | κ = 0.81 | | NA | VG | D | A |

A, adequate; COSMIN, Consensus-Based Standards for the Selection of Health Measurement Instruments; D, doubtful; I, inadequate; ICC, intraclass correlation; NA, not available/applicable; VG, very good.

^b Observer-participant reliability.

^c Percentage agreement omitted.

^d For 2 distinct methods of performing Beighton score.

^e Modified composite scale: 0 = pain with test, 1 = 8-9 points, 2 = 6-7 points, 3 = 4-5 points, 4 = 2-3 points, 5 = 0-1 points.

^f Expert-novice rater reliability.

^g Novice-novice rater reliability.

Out of the 168 COSMIN questions in the reliability domain across all studies, 79 (47%) were “very good”; 29 (17%), “adequate”; 24 (14%), “doubtful”; 1 (<1%), “inadequate”; and 35 (21%) did not apply. Utilizing the COSMIN “worst score counts” principle, we determined that 1 (4%) study met “very good” criteria[29]; 7 (29%) met “adequate”[3,21,22,31,57,58,63]; 15 (63%) met “doubtful”[§]; and 1 (4%) met “inadequate”[5] for overall risk of bias in the reliability domain. Eight (33.33%) studies utilized ICC, and 16 (66.67%) reported 19 kappa statistics to express interrater reliability, of which 4 (25%) used weighted kappa values. Of the 12 articles that included unweighted kappa values, 6 received an overall score of “doubtful,” which was attributed only to question 6.6, regarding use of weighted kappa,[44] when they otherwise would have received “adequate” or “very good” overall. Of the 24 included studies, 7 did not report an explicit time interval between reliability measurements. However, 6 of the 7 had another doubtful measure, which means that question 6.2, regarding the appropriateness of the time interval,[44] did not greatly affect the overall score for most studies.
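The total of 168 COSMIN questions corresponds to 24 studies rated on 7 items each (6.1 through 6.7, since item 6.8 was not assessed). The following sketch simply rechecks that the reported counts add up and reproduces the percentages; the variable names and the `share` helper are illustrative assumptions.

```python
N_STUDIES = 24
N_ITEMS = 7  # items 6.1-6.7; item 6.8 was not assessed in this review

# Item-level ratings across all studies, as reported in the text.
item_ratings = {
    "very good": 79,
    "adequate": 29,
    "doubtful": 24,
    "inadequate": 1,
    "not applicable": 35,
}

# Study-level ratings after applying the lowest-rating rule.
overall_ratings = {"very good": 1, "adequate": 7, "doubtful": 15, "inadequate": 1}


def share(count, total):
    """Percentage of total represented by count."""
    return 100 * count / total


total_items = N_STUDIES * N_ITEMS
print(f"Item ratings: {sum(item_ratings.values())} of {total_items} expected")
for label, count in item_ratings.items():
    print(f"  {label}: {count} ({share(count, total_items):.1f}%)")

print(f"Study ratings: {sum(overall_ratings.values())} of {N_STUDIES} expected")
for label, count in overall_ratings.items():
    print(f"  {label}: {count} ({share(count, N_STUDIES):.1f}%)")
```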

Discussion

This systematic review has demonstrated high inter- and intrarater reliability for the Beighton score in individuals with and without hypermobility in a variety of clinical conditions. As demonstrated by the data derived from Table 3, varying time conditions, population characteristics, measurement tools, measurer education and training, and the Beighton score cutoff did not greatly influence the reliability of this test. Most studies demonstrated substantial to almost perfect interrater reliability values. Intrarater reliability was excellent or almost perfect in more than half of analyzed investigations. The quality of analyzed evidence was adequate, in contrast to findings in previous systematic reviews.[35] The increased mobility seen in patients with an elevated Beighton score is of importance for the clinician. Generalized joint hypermobility is a risk factor for many musculoskeletal conditions, such as multidirectional shoulder instability,[54] hip instability,[12] femoroacetabular impingement,[46,64] hip dysplasia,[2,7] ACL injury,[60,62] flatfoot,[45] ankle sprains,[16] and many others. Clinicians should have a high index of suspicion for these conditions in this population. Knowledge of hypermobility influences patient selection for surgical versus nonsurgical treatments, the actual surgical technique employed, and the expected prognosis and outcome with respect to risks for recurrence of symptoms (which may vary along a spectrum of instability).[19] This is important in the clinical setting for practitioners to avoid unnecessary imaging or interventions or the misdiagnosis of chronic pain.[66] Patients with hypermobility may warrant more aggressive rehabilitation or injury prevention protocols. 
Owing to the higher incidence of joint instability in patients with hypermobility, it has been suggested that these patients undergo prolonged strengthening, proprioception, and generalized conditioning programs when considering initial nonoperative treatment.[66] Additionally, considerations in operative intervention may change with the knowledge of a patient’s hypermobility status. For instance, a surgeon might consider an open inferior capsular shift versus arthroscopic capsular plication for the hypermobile shoulder, or a surgeon may consider using a patellar tendon autograft over hamstring tendon autograft in ACL reconstruction[38] to ensure greater stability postoperatively. Arthroscopic hip preservation surgeons may employ greater degrees of capsular plication and/or inferior capsular shift in patients undergoing FAI syndrome and labral injury surgical treatment.[61] Even trauma and arthroplasty surgeons should consider a patient’s hypermobility status. Patients with hypermobility have been found to have lower bone density[25,47] than controls, which leaves them at greater risk for fixation and implant failure and fracture. Postoperative protocols may need to be adjusted for this population to address the increased laxity. Thus, use of a reliable system, such as the Beighton score, for identifying these patients is essential to providing the most comprehensive musculoskeletal care. Limitations of the present study include the quality of studies available in the literature, the failure of studies to include time intervals between intrarater measures, reporting bias, and lack of rater standardization or comparison. Studies that did not include time intervals between intrarater measures resulted in a summary COSMIN score of “doubtful.” Laxity may change in an individual over a period of decades[3,11,16,66]; however, it does not change over short periods. 
Thus, the omission of time intervals should not negatively affect a clinician’s evaluation of the evidence supporting inter- and intrarater reliability of the Beighton score. Additionally, score reporting is subject to publication bias and selective reporting because reliability may be reported by composite score, individual measurement score, or cutoff score. This may influence authors to choose the reporting measure with the most desirable outcomes. Studies that measure interrater reliability risk underestimating it when raters are not properly standardized. Using raters with unequal experience may result in artificially low interrater statistics. All studies in the present review used raters with different levels of experience; thus, it is likely that under standardized conditions the interrater reliability may be higher. No one study utilized raters of different professions; therefore, the discrepancy in Beighton score reliability among health care disciplines cannot be evaluated by this study.

Conclusion

The Beighton score shows acceptable reliability when used by raters of any background or experience level. Studies demonstrate wide variability in participant population, study design, time interval, and rater experience yet consistently report substantial to excellent inter- and intrarater reliability. While individual components of risk of bias also varied widely among studies, most of the items were rated adequate to very good.
  60 in total

1.  Association of hypermobility and ingrown nails.

Authors:  Fatma Gulru Erdogan; Abdurrahman Tufan; Munevver Guven; Berna Goker; Aysel Gurler
Journal:  Clin Rheumatol       Date:  2012-06-02       Impact factor: 2.980

2.  Reliability of the Beighton Hypermobility Index to determinate the general joint laxity performed by dentists.

Authors:  Christian Hirsch; Monique Hirsch; Mike T John; Jens Johannes Bock
Journal:  J Orofac Orthop       Date:  2007-09       Impact factor: 1.938

3.  Reconstruction of the coracoclavicular and acromioclavicular ligaments with semitendinosus tendon graft: a pilot study.

Authors:  Maristella F Saccomanno; Mario Fodale; Luigi Capasso; Gianpiero Cazzato; Giuseppe Milano
Journal:  Joints       Date:  2014-05-08

Review 4.  Orthopaedic management of the Ehlers-Danlos syndromes.

Authors:  William B Ericson; Roger Wolman
Journal:  Am J Med Genet C Semin Med Genet       Date:  2017-02-13       Impact factor: 3.908

5.  Volumetric definition of shoulder range of motion and its correlation with clinical signs of shoulder hyperlaxity. A motion capture study.

Authors:  Mickaël Ropars; Armel Cretual; Hervé Thomazeau; Rajiv Kaila; Isabelle Bonan
Journal:  J Shoulder Elbow Surg       Date:  2014-09-03       Impact factor: 3.019

6.  CURRENT CONCEPTS IN THE TREATMENT OF GROSS PATELLOFEMORAL INSTABILITY.

Authors:  Grant Buchanan; LeeAnne Torres; Brian Czarkowski; Charles E Giangarra
Journal:  Int J Sports Phys Ther       Date:  2016-12

7.  Hypermobility syndrome increases the risk for low bone mass.

Authors:  Selmin Gulbahar; Ebru Sahin; Meltem Baydar; Ciğdem Bircan; Ramazan Kizil; Metin Manisali; Elif Akalin; Ozlen Peker
Journal:  Clin Rheumatol       Date:  2005-11-26       Impact factor: 2.980

8.  Test-retest reliability of ankle injury risk factors.

Authors:  J F Baumhauer; D M Alosa; A F Renström; S Trevino; B Beynnon
Journal:  Am J Sports Med       Date:  1995 Sep-Oct       Impact factor: 6.202

9.  Inter-tester reproducibility and inter-method agreement of two variations of the Beighton test for determining Generalised Joint Hypermobility in primary school children.

Authors:  Tina Junge; Eva Jespersen; Niels Wedderkopp; Birgit Juul-Kristensen
Journal:  BMC Pediatr       Date:  2013-12-21       Impact factor: 2.125

10.  Interrater reliability: the kappa statistic.

Authors:  Mary L McHugh
Journal:  Biochem Med (Zagreb)       Date:  2012       Impact factor: 2.313

  2 in total

1.  Capsule Closure of Periportal Capsulotomy for Hip Arthroscopy.

Authors:  Rami George Alrabaa; Abhishek Kannan; Alan L Zhang
Journal:  Arthrosc Tech       Date:  2022-06-21

2.  Assessment of systemic joint laxity in the clinical context: Relevance and replicability of the Beighton score in chronic fatigue.

Authors:  Gabriella Bernhoff; Helena Huhmar; Lina Bunketorp Käll
Journal:  J Back Musculoskelet Rehabil       Date:  2022       Impact factor: 1.456

