Preston Riley Graben, Mark C Schall, Sean Gallagher, Richard Sesek, Yadrianna Acosta-Sojo.
Abstract
(1) Background: The objectives of this systematic review were to (i) summarize the results of studies evaluating the reliability of observational ergonomics exposure assessment tools addressing exposure to physical risk factors associated with upper extremity musculoskeletal disorders (MSDs), and (ii) identify best practices for assessing the reliability of new observational exposure assessment tools.
Entities:
Keywords: ergonomics; fatigue failure; musculoskeletal disorders; occupational safety and health; physical health; prevention and protection; risk assessment; risk perception and management
Mesh:
Year: 2022 PMID: 36078310 PMCID: PMC9518117 DOI: 10.3390/ijerph191710595
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 4.614
Figure 1. Search string used in the review.
Figure 2. The article selection process.
Title, industry of interest, and stated objectives of the articles included in the review a.
| Author (Year) | Title | Industry | Stated Objective |
|---|---|---|---|
| Neumann et al. (1998) [ | A participative field study of the inter-rater reliability of a risk factor assessment checklist used by manufacturing plant personnel | Foam manufacturing. | The purpose of this study was to evaluate the inter-rater reliability of the Manufacturing Operation Risk Factor Checklist (MORF) in a realistic field implementation. |
| Dockrell et al. (2012) [ | An investigation of the reliability of Rapid Upper Limb Assessment (RULA) as a method of assessment of children’s computing posture | Elementary school. | The objectives were to (1) establish the inter-rater reliability of RULA in children, (2) establish the intra-rater reliability of RULA in children, and (3) investigate the association, if any, between a child’s age and the reliability of RULA. |
| Rhen and Forsman (2020) [ | Inter- and intra-rater reliability of the OCRA checklist method in video-recorded manual work tasks | Grocery and cashier work, meat deboning and netting, engine assembly, lavatory and stair cleaning, post-sorting, and hairdressing. | The objectives were to, with respect to risk factors and calculated risk levels, study the consistency of (1) assessments performed by different ergonomists (inter-rater reliability) and (2) repeated assessments performed by each of the ergonomists (intra-rater reliability) of the Occupational Repetitive Actions (OCRA) checklist. |
| Paulsen et al. (2014) [ | Inter-rater reliability of cyclic and non-cyclic task assessment using the hand activity level in appliance manufacturing | House appliance manufacturing. | The purpose of this study was to compare the inter-rater reliability of the HAL assessments used to estimate worker exposure to repetitive hand exertions during cyclic and non-cyclic task performance in the appliance manufacturing industry. |
| Stevens et al. (2004) [ | Inter-rater reliability of the Strain Index | Videos were selected from an archive to provide a full spectrum of rating categories for the task variables of the Strain Index. | The purpose of this study was to evaluate the inter-rater reliability of the Strain Index. |
| Dartt et al. (2009) [ | Reliability of assessing upper limb postures among workers performing manufacturing tasks | Appliance manufacturing. | The purpose of this study was to determine the inter- and intra-rater reliability of assessing neck, shoulder, and wrist postures by using the Multimedia Video Task Analysis (MVTA). |
| Valentim et al. (2018) [ | Reliability, Construct Validity, and Interpretability of the Brazilian version of the Rapid Upper Limb Assessment (RULA) and Strain Index (SI) | Textile industry, electronics industry, assembling line, tinsmith and sawmills, self-employed workers (hairdresser, dentist, beautician, woodworker, butcher, bricklayer, etc.). | The study aimed to cross-culturally adapt and test the measurement properties of the RULA and the Strain Index. |
| Stephens et al. (2006) [ | Test–retest repeatability of the Strain Index | Manufacturing, meat/poultry, manual material handling. | The purpose of this study was to investigate the test–retest repeatability of the Strain Index. |
| Paulsen et al. (2015) [ | The inter-rater reliability of Strain Index and OCRA Checklist task assessments in cheese processing | Cheese manufacturing. | The purpose of this study was to characterize the inter-rater reliability of two physical exposure assessment methods of the upper extremity: the Strain Index and the OCRA checklist. |
| Hollak et al. (2014) [ | Towards a comprehensive Functional Capacity Evaluation for hand function | More than 180 different occupations. | The purpose of this study was to develop a more efficient (shortened) protocol for hand function capacity evaluation and to test the agreement of the protocol compared to the original protocol. |
| Coenen et al. (2014) [ | Validity and inter-observer reliability of subjective hand-arm vibration assessments | Laboratory. | Measuring hand-arm vibration objectively is often difficult and expensive, while the often-used information provided by manufacturers lacks detail. Therefore, this study aimed to test a subjective hand-arm vibration assessment method for validity and inter-observer reliability. |
a The table may include direct quotes to maintain consistency with the original articles. Please seek the original articles for further information.
Participant and Rater information for the articles included in the review a.
| Author (Year) | Number of Participants | Participant Demographics | Rater Training | Rater Characteristics |
|---|---|---|---|---|
| Neumann et al. (1998) [ | 8 | N.A. | 7–10 h of training on the use of the checklist. | Plant ergonomic committee members. |
| Dockrell et al. (2012) [ | 24 | Children | Raters were given 45 min training sessions including a lecture and demonstration using PowerPoint on Rapid Upper Limb Assessment (RULA). It was followed by a practical session where they used the tool and compared and discussed their ratings. | Undergraduate physiotherapy students and experienced physiotherapists. Mean age of students = 22.2 years (range = 21–24); mean age of therapists = 37.3 years (range = 31–45). |
| Rhen and Forsman (2020) [ | One voluntary worker for each job filmed | N.A. | Raters were given a 25 min lecture and an Internet-based education on Occupational Repetitive Actions (OCRA), which included background, application, and a demonstration. | Licensed female physiotherapists with more than 4 years of experience as professional ergonomists. |
| Paulsen et al. (2014) [ | 385 workers | Mean age = 42.3 years (SD = 10.6). Average experience = 14.7 years (SD = 11.4); 91.5% were white, 51.3% were males. | Each faculty member at the University was thoroughly trained in the use of the Hand Activity Level (HAL). They thereafter trained their graduate students. | Two university faculty members with extensive experience and nine graduate students trained by the faculty. Mean age = 29.8 years (SD = 8.6) and roughly 54.5% were female. |
| Stevens et al. (2004) [ | The research team used video files and did not include any participant information. | N.A. | All raters participated in a 1-day training course given in their respective geographic locations. Lasting approximately 8 h, it included a description of the principles and procedures of the Strain Index, applied examples using video of real-world examples, along with feedback and discussion regarding the choice of the appropriate ratings. | Nine raters were practicing ergonomists and six raters were students studying for advanced ergonomic degrees. |
| Dartt et al. (2009) [ | 20 | Mean age = 47.8 years (range = 34–62). Average experience = 19.7 years (range = 6–36); 50% were male. | Six months of software familiarity and two weeks of formal training sessions including (1) observing a professional, (2) completing the same review as the professional, (3) reviewing tasks and having the professional check their work afterwards, and (4) completing analyses on their own. | Graduate students working in the Ergonomics Laboratory at Colorado State University. |
| Valentim et al. (2018) [ | 116 (one worker assumed for each job) | N.A. | Each rater received additional training which consisted of explanations of the methods and theoretical/practical application of the assessment tools. | Each rater was experienced with three to five years of biomechanical exposure assessments. |
| Stephens et al. (2006) [ | Assumed one worker for each job on the video file | N.A. | Each rater, regardless of experience, was given an 8 h tutorial on using the Strain Index which included background on Strain Index principles, Strain Index applications, video file examples of jobs, demonstrations on how to apply ratings to video files, and an open discussion of example results. | Six graduate students (three master’s and three PhD students) and nine ergonomic practitioners. No Certified Professional Ergonomists (CPEs). |
| Paulsen et al. (2015) [ | Assumed one worker for each job on the video file | N.A. | Training sessions included instruction on the procedures of each method, practice applying the methods to video segments of manufacturing tasks, and feedback from an experienced rater. Training sessions continued until trainees achieved competency. Competency for each method was reached when trainees consistently (80% of time) assigned exposure ratings that were similar (within 20%) to the most experienced rater. | Members from occupational health research groups including three university faculty and four graduate students. Two were CPEs. |
| Hollak et al. (2014) [ | 643 healthy working participants | 402 men and 241 women. Mean age = 41.6 years (SD = 10.4). | Two-day Functional Capacity Evaluation training given by a licensed WorkWell trainer specifically for the purpose of this study. | Physical therapy students. |
| Coenen et al. (2014) [ | 2 | Two males aged 37 and 56 with substantial knowledge and experience with power tools. | Each rater had substantial knowledge in human kinematics and ergonomic risk assessments but not regarding vibration; therefore, all received verbal and written instructions on the hand-arm vibration assessment. | Students and employees of Vrije Universiteit Amsterdam, Faculty of Human Movement Sciences and TNO Healthy Living. Mean age = 30.2 (SD = 12.1). |
a The table may include direct quotes to maintain consistency with the original articles. Please seek the original articles for further information.
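The rater-training competency criterion described for Paulsen et al. (2015) above (trainees had to assign exposure ratings within 20% of the most experienced rater at least 80% of the time) lends itself to a simple numerical check. The sketch below is one assumed reading of that rule for illustration only, not the authors' procedure; all scores are fabricated.

```python
# Illustrative sketch (assumed interpretation, not the authors' code) of the
# competency rule reported for Paulsen et al. (2015): a trainee is considered
# competent once >= 80% of their ratings fall within 20% of the expert's ratings.
import numpy as np

def trainee_is_competent(trainee, expert, tolerance=0.20, required_share=0.80) -> bool:
    trainee = np.asarray(trainee, dtype=float)
    expert = np.asarray(expert, dtype=float)
    # A rating "agrees" if it lies within +/- 20% of the expert's rating for that task.
    within = np.abs(trainee - expert) <= tolerance * np.abs(expert)
    return within.mean() >= required_share

# Fabricated exposure scores for ten training tasks.
expert_scores = [3.0, 6.8, 4.5, 9.0, 2.3, 7.5, 5.0, 8.1, 3.6, 6.0]
trainee_scores = [3.2, 6.5, 5.6, 8.8, 2.4, 7.9, 4.8, 8.0, 3.5, 6.3]
print(trainee_is_competent(trainee_scores, expert_scores))  # True: 9 of 10 within tolerance
```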
Observation information, reliability assessment(s), and interpretation for the articles included in the review a.
| Author (Year) | Number of Raters | Obs | What was Observed | Reliability Assessment | Interpretation |
|---|---|---|---|---|---|
| Neumann et al. (1998) [ | 7 | 56 | Eight jobs. | Intraclass correlation coefficients (ICCs), an index comparable to the kappa coefficient, were calculated from a 2 × 2 analysis of variance (ANOVA) | Poor-fair at ICC < 0.4, fair-good at 0.4 < ICC < 0.75, and excellent at ICC ≥ 0.75 |
| Dockrell et al. (2012) [ | 6 | 144 | Twenty-four school children, based on Shoukri et al.’s (2004) recommendation of 18–29 participants. | ICC (2,1), ICC (3,1) | ICC < 0.50 = Poor, 0.50 < ICC < 0.75 = Moderate, ICC > 0.75 = Good |
| Rhen and Forsman (2020) [ | 11 | 220 | Ten video recordings were analyzed twice by each rater. | Cohen’s linearly weighted kappa, ICC (2,1), Kendall’s coefficient of concordance (KCC), percentage agreement | kappa < 0.00 = Poor, 0.00–0.20 = Slight, 0.21–0.40 = Fair, 0.41–0.60 = Moderate, 0.61–0.80 = Substantial, 0.81–1.00 = Almost Perfect; percent agreement > 80% = acceptable; ICC < 0.50 = poor, 0.50–0.75 = moderate, 0.75–0.90 = good, > 0.90 = excellent reliability |
| Paulsen et al. (2014) [ | 11 working in pairs. Each person in each pair rated tasks individually, but each task was rated by one pair. | 1716 | 385 workers doing 858 tasks | For each rater pair, reliability was measured between the scores using Pearson Product Moment Correlation Coefficient (Streiner and Norman, 2006) [ | Weighted Mean Correlation Coefficients—negligible: 0.00–0.25; fair to moderate: 0.25–0.50; moderate to good: 0.50–0.75; good to excellent: 0.75–1.0 |
| Stevens et al. (2004) [ | Fifteen raters in five teams. | 1095 | 61 videos for specific task variables of the Strain Index and 12 videos for complete analysis (73 total). | ICC (2,1) using single measure and absolute agreement was used to analyze the data, task variable ratings, and Strain Index score. The Kuder and Richardson’s Equation 20 (KR-20) and percent agreement were used to analyze the dichotomized hazard score. | Poor-fair at ICC < 0.4, fair-good at 0.4 < ICC < 0.75, and excellent at ICC ≥ 0.75. The authors did not indicate what other interpretations they used for the other reliability coefficients. |
| Dartt et al. (2009) [ | 2 | 80 | 20 jobs were analyzed twice by both raters | Generalizability theory paired with Pearson Product Moment Correlation Coefficients | Coefficients > 0.75 = good to excellent, 0.50 < Coefficients < 0.75 = fair to good, Coefficients < 0.50 = poor |
| Valentim et al. (2018) [ | 2 | 464 | 116 recorded tasks were analyzed twice by each rater | Kappa, ICC (2,1), percentage agreement, standard error of measurement, Cronbach’s alpha coefficient, Spearman’s rho. | Kappa (< 0.00 = Poor, 0.00–0.20 = Slight, 0.21–0.40 = Fair, 0.41–0.60 = Moderate, 0.61–0.80 = Substantial, 0.81–1.00 = Almost Perfect); ICC (poor < 0.40, moderate 0.40–0.75, strong 0.75–0.90, excellent > 0.90); agreement (very good < 5%, good 5–10%, doubtful 10–20%, negative > 20%); Cronbach’s alpha (positive = 0.70–0.95, low < 0.70, redundant > 0.95); Spearman’s rho (weak = 0–0.30, moderate = 0.30–0.70, strong = 0.70–1.0) |
| Stephens et al. (2006) [ | 15 individual raters in 5 teams of 3 | 1854 | 73 job files (61 task variable and 12 Strain Index score files) | ICC (2,1) was used for most of the data while the tetrachoric correlation coefficient was used for the dichotomous hazard classification value | The authors of this study do not reference a single interpretation scale for either the ICC or the Tetrachoric Correlation Value. |
| Paulsen et al. (2015) [ | 3 university faculty and 4 graduate students for a total of 7 raters | 448 | 21 cyclic upper extremity tasks were analyzed; 11 were asymmetric and treated separately, which increased the total to 32 tasks | ICC (2,1) | ICC < 0.40 = poor reliability; 0.40 < ICC < 0.75 = moderate to good reliability; and ICC > 0.75 = excellent reliability |
| Hollak et al. (2014) [ | 1 of 15 physical therapy students | 643 | 643 | One-way random ICC(1,1) and Limits of Agreement (LoA) | 0.91 < ICC < 1.0 (Excellent Agreement), 0.75 < ICC < 0.90 (High Agreement), 0.50 < ICC < 0.75 (Moderate Agreement) The LoA were assumed to be acceptable for clinical interpretation at 16%. |
| Coenen et al. (2014) [ | 16 in teams of 4 | 64 | 16 tasks | ICC, weighted Cohen’s kappa (κ), and percentage agreement | For both ICCs and kappas: > 0.60 = good agreement, 0.40–0.60 = moderate agreement, < 0.40 = limited agreement. To test for a learning effect, the percentage agreement between the subjective assessments and objective measurements in the first two tasks was compared to that in the last two tasks. |
a The table may include direct quotes to maintain consistency with the original articles. Please seek the original articles for further information.
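Most of the statistics in the table above are computed from a simple tasks-by-raters matrix of scores. The sketch below illustrates two of the most frequently reported ones, ICC (2,1) (two-way random effects, absolute agreement, single measure) and Cohen’s linearly weighted kappa; it is a generic illustration with fabricated ratings, not code from any of the reviewed studies.

```python
# Generic sketch of ICC(2,1) and linearly weighted kappa; all data are made up.
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    `ratings` is an (n_tasks, n_raters) matrix of scores."""
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_error = ((ratings - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)                 # between-task variance
    ms_cols = ss_cols / (k - 1)                 # between-rater variance
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual variance
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

def linear_weighted_kappa(r1, r2, n_categories: int) -> float:
    """Cohen's kappa with linear disagreement weights for two raters on ordinal categories 0..n-1."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    obs = np.zeros((n_categories, n_categories))
    for a, b in zip(r1, r2):
        obs[a, b] += 1
    obs /= obs.sum()
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))   # chance-expected proportions
    i, j = np.indices((n_categories, n_categories))
    w = np.abs(i - j) / (n_categories - 1)              # linear disagreement weights
    return 1 - (w * obs).sum() / (w * exp).sum()

# Example: 10 tasks scored by 3 raters on a 0-4 ordinal risk scale (fabricated data).
rng = np.random.default_rng(0)
scores = rng.integers(0, 5, size=(10, 3))
print(f"ICC(2,1) = {icc_2_1(scores.astype(float)):.2f}")
print(f"Weighted kappa (raters 1 vs 2) = {linear_weighted_kappa(scores[:, 0], scores[:, 1], 5):.2f}")
```

The resulting coefficients would then be read against whichever interpretation scale the corresponding study adopted (e.g., poor/moderate/good ICC bands).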
Reliability results for the articles included in the review a.
| Author (Year) | Reliability Results |
|---|---|
| Neumann et al. (1998) [ | Reliability, as assessed using the intra-class correlation coefficient (ICC), was found to be poor for the upper limb, moderate for the torso and lower limb, and good for the assessment of manual material handling. |
| Dockrell et al. (2012) [ | Rapid Upper Limb Assessment (RULA) demonstrated higher intra-rater reliability than inter-rater reliability, although both were moderate to good. RULA was more reliable when used for assessing older children (8–12 years) than with younger children (4–7 years). RULA may prove useful as part of an ergonomic assessment, but its level of reliability warrants caution for its sole use when assessing children, and in particular, younger children. Action Limit: Mean = 0.60, Standard Deviation (SD) = 0.20, Range = 0.59; Grand Score: Mean = 0.68, SD = 0.15, Range = 0.37; Arm Score: Mean = 0.62, SD = 0.25, Range = 0.58; Trunk and Leg Score: Mean = 0.75, SD = 0.13, Range = 0.32. |
| Rhen and Forsman (2020) [ | For the five risk levels, the inter-rater overall percentage agreement was 39% and Cohen’s linearly weighted kappa was 0.43. For the six risk factors, the linearly weighted kappa values were between 0.25 (Posture) and 0.40 (Duration and Force). As expected, a higher (however just slightly higher) reliability was found within raters than between raters, with an overall percentage agreement of 45% and a linearly weighted kappa of 0.52. The linearly weighted kappa values of the risk factors ranged from 0.41 (Recovery) to 0.61 (Duration). |
| Paulsen et al. (2014) [ | Results indicated that the Hand Activity Level (HAL) is a reliable exposure assessment method for cyclic ( |
| Stevens et al. (2004) [ | For task variables and estimated data, ICC (2,1) varied between 0.66–0.84 for individuals and 0.48–0.93 for teams. The Strain Index score had an ICC (2,1) of 0.43 and 0.64 for individuals and teams, respectively. For the most important variable, hazard classification, the Kuder and Richardson’s Equation 20 (KR-20) was 0.91 for the individuals and 0.89 for the teams. |
| Dartt et al. (2009) [ | The results demonstrated good to excellent inter-rater reliability for neck and shoulder postures and fair to excellent inter-rater reliability for wrist postures. Intra-rater posture assessment demonstrated good to excellent reliability for both raters in all postures of the neck, shoulder, and wrist. This study demonstrated that posture assessment of manufacturing workers using Multimedia Video Task Analysis (MVTA) is a reliable method. |
| Valentim et al. (2018) [ | The intra-raters’ reliability for the RULA ranged from poor to almost perfect (kappa: 0.00–0.93), and for the Strain Index from poor to excellent (ICC (2,1): 0.05–0.99). The inter-raters’ reliability was very poor for the RULA (kappa: −0.12 to 0.13) and ranged from very poor to moderate for the Strain Index (ICC (2,1): 0.00–0.53). The agreement was good for the RULA (75–100% intra-raters and 42.24–100% inter-raters) and for the Strain Index (EPM: −1.03% to 1.97% intra-raters and −0.17% to 1.51% inter-raters). The internal consistency was appropriate for the RULA (α = 0.88) and low for the Strain Index (α = 0.65). Moderate construct validity was observed between the RULA and the Strain Index for wrist/hand-wrist posture (rho: 0.61) and strength/intensity of exertion (rho: 0.39). |
| Stephens et al. (2006) [ | Intraclass correlation (ICC) coefficients for task variable ratings and accompanying data ranged from 0.66 to 0.95 for both individuals and teams. The Strain Index Score ICC (2,1) for individuals and teams were 0.56 and 0.82, respectively. Intra-rater reliability for the hazard classification (tetrachoric correlation) was 0.81 for individuals and 0.88 for teams. The results indicate that the Strain Index has good test–retest reliability. |
| Paulsen et al. (2015) [ | Inter-rater reliability was characterized using a single-measure, agreement-based ICC. Inter-rater reliability of Strain Index assessments was moderate to good (ICC = 0.59, 95% Confidence Interval (CI): 0.45–0.73), a similar finding to prior studies. Inter-rater reliability of Occupational Repetitive Actions (OCRA) checklist assessments was excellent (ICC = 0.80, 95% CI: 0.70–0.89). Task complexity had a small, but non-significant, effect on the inter-rater reliability of Strain Index and OCRA checklist scores. Both the Strain Index and OCRA checklist assessments possess adequate inter-rater reliability for the purposes of occupational health research and practice. |
| Hollak et al. (2014) [ | The ICCs were excellent (ICC > 0.91) in all proposed protocols except for the one-trial Purdue Pegboard test, with ICCs of 0.80–0.82. In all tests, the ICCs were higher for the two-trial protocol than for the one-trial protocol. For all tests, the Limits of Agreement (LoA) were about twice as large for the one-trial protocol compared to the two-trial protocol. All two-trial protocols had a variability of the LoA lower than 16% when compared to the criterion values. |
| Coenen et al. (2014) [ | Inter-observer reliability can be expressed by an ICC of 0.708 (0.511–0.873). The concurrent validity of the subjective hand-arm vibration assessment in comparison to the objective measurement can be expressed by a weighted kappa of 0.535 (0.285–0.785). As a comparison, the ICC depicting the validity of the vibration values provided by the manufacturers as compared to the objectively measured vibrations was calculated as 0.505 (0.364–0.706). Exact agreement of the subjective assessment compared to the objective measurement occurred in 52% of the assessed tasks. The additional analysis to investigate a possible learning effect showed 44% agreement of the subjective and objective assessments during the first two tasks of each observer, while there was 59% agreement during the last two tasks. |
a The table may include direct quotes to maintain consistency with the original articles. Please seek the original articles for further information.
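Hollak et al. (2014) report 95% Limits of Agreement (LoA) alongside ICCs for their test-retest comparison. As a generic illustration (not taken from that study), the Bland-Altman 95% LoA between two repeated trials can be computed as in the sketch below; the grip-strength values and units are fabricated.

```python
# Illustrative sketch of Bland-Altman 95% Limits of Agreement; data are made up.
import numpy as np

def limits_of_agreement(trial1, trial2):
    """95% Limits of Agreement between two repeated measurements of the same subjects."""
    d = np.asarray(trial1, dtype=float) - np.asarray(trial2, dtype=float)
    bias = d.mean()                          # mean difference between trials
    half_width = 1.96 * d.std(ddof=1)        # 1.96 x SD of the differences
    return bias - half_width, bias + half_width

# Fabricated grip-strength scores (kg) from two trials of the same test.
trial1 = [38.0, 41.5, 36.2, 44.0, 39.8]
trial2 = [37.1, 42.0, 35.8, 45.2, 40.3]
lo, hi = limits_of_agreement(trial1, trial2)
print(f"95% LoA: {lo:.2f} to {hi:.2f} kg")
```

A narrower LoA interval indicates better test-retest agreement, which is how a criterion such as the 16% variability threshold cited above would be applied.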