
The quality of evidence of psychometric properties of three-dimensional spinal posture-measuring instruments.

Yolandi Brink, Quinette Louw, Karen Grimmer-Somers.

Abstract

BACKGROUND: Psychometric properties include validity, reliability and sensitivity to change. Establishing the psychometric properties of an instrument which measures three-dimensional human posture is essential prior to applying it in clinical practice or research.
METHODS: This paper reports the findings of a systematic literature review which aimed to 1) identify non-invasive three-dimensional (3D) human posture-measuring instruments; and 2) assess the quality of reporting of the methodological procedures undertaken to establish their psychometric properties, using a purpose-built critical appraisal tool.
RESULTS: Seventeen instruments were identified, of which nine were supported by research into psychometric properties. Eleven and six papers, respectively, reported on validity and reliability testing. Rater qualification and reference standards were generally poorly addressed, and the quality of reporting of rater blinding and statistical analysis was variable.
CONCLUSIONS: There is a lack of current research to establish the psychometric properties of non-invasive 3D human posture-measuring instruments.



Year: 2011 PMID: 21569486 PMCID: PMC3107179 DOI: 10.1186/1471-2474-12-93

Source DB: PubMed Journal: BMC Musculoskelet Disord ISSN: 1471-2474 Impact factor: 2.362

Background

Postural assessment is a standard and essential component of examining individuals with neuromusculoskeletal disorders [1,2]. Prolonged static postures are widely recognised as a risk factor for neuromusculoskeletal pain among children, adolescents and adults [3-9]. No uniform definition of "ideal" posture exists, and researchers and clinicians therefore continue to seek the best way of assessing and describing posture. Ideal spinal posture has been proposed as neutral spinal alignment; however, the relationship between spinal segments in a normal population remains unknown [10,11]. The spine is a complex three-dimensional (3D) anatomical structure whose segmental position in space should be described in all three planes (sagittal, frontal and transverse) [12-14]. Precise positional data can be derived from a number of biomechanical measurement tools, of which non-invasive 3D instruments are preferred. It is essential that a spinal posture-measuring instrument be shown to be reliable and valid. Without this assurance, it cannot facilitate diagnosis, chart variability in 'usual' posture or assist objective monitoring of patient progress with treatment [1]. Researchers and clinicians should therefore be familiar with the psychometric properties of spinal posture-measuring instruments, and choose those with the best evidence of performance [15]. Two core elements of psychometric properties are reliability and validity [16]. The two are interlinked: reliability is a prerequisite for validity. A measurement tool cannot be recommended with confidence if there is a lack of evidence about its reliability and validity [17]. Reliability refers to the ability to estimate the inherent variability of posture, as well as the error that can be attributed to the rater and the measurement instrument [17]. Error can relate to the consistency with which measurements are taken by the same or different raters, or over multiple occasions of testing [16].
Reliability is variously classified as test-retest, inter-rater and intra-rater reliability. Test-retest reliability describes the stability of the measurement instrument in obtaining the same results with repeated measurements using the identical test on two or more separate occasions, with all testing conditions kept as constant as possible [17]. Intra-rater reliability is the stability of data recorded by one observer across two or more test occasions. Inter-rater reliability is the extent to which two or more observers obtain similar scores when rating the same individuals [16,17]. Validity is the extent to which an instrument measures what it is intended to measure [18]. Criterion-related validity is the ability of one test (the index test) to predict results obtained on an external criterion (the gold standard or reference standard) which is assumed to be valid. When both tests are performed on the same subjects, the scores from the index test are correlated with those achieved by the criterion measure. There are two types of criterion-related validity. Concurrent validity is evaluated when the index test and the criterion measure are taken at the same time, so that both reflect the same incident of behaviour. Predictive validity is tested when the index test is performed and the criterion is measured prospectively, to ascertain the relationship between the index test and the criterion scores and so determine whether the index test is a valid predictor of the outcome [17]. Construct validity is the ability of an instrument to measure an abstract concept, which cannot be observed directly and which has been constructed to represent an abstract trait [17]. There are three types of construct validity. Convergent validity indicates that two measures which are believed to reflect the same construct will have similar results or will correlate highly [17].
Divergent validity, by contrast, indicates that two measures which are believed to measure different constructs will correlate poorly [19]. Convergent and divergent validity assess the sensitivity and specificity of a measurement, respectively [19]. Discriminative validity is the extent to which measures from a measurement instrument distinguish between individuals or populations that would be expected to differ [19]. Establishing the psychometric properties of spinal posture-measuring instruments is not a trivial task, given the complex nature of human posture. Convincing evidence of the reliability and validity of any posture-measuring instrument can therefore only be established by assessing the methodological quality of the underpinning developmental studies. It is thus essential to establish and assess specific features of psychometric study design, for instance the controls put in place for systematic bias, non-systematic bias and inferential error. An important requirement for psychometric testing of posture measurement is that the instrument be tested under a given set of conditions, on a specific population, within the context of the instrument's intended use. Posture-measuring instruments should therefore be tested on humans at some stage of the instrument's development, and not only on inanimate objects [17]. The purpose of the systematic review reported in this paper was 1) to identify the non-invasive 3D tools which measure static human sitting or standing spinal posture and 2) to review the quality of the evidence of reliability and validity of the identified 3D posture-measuring instruments.

Methods

Search Strategies

Two inter-related search strategies (A and B) were implemented to ensure that all eligible papers were included. Strategy A sought any primary research studies which reported the use of 3D non-invasive instruments measuring static sitting or standing spinal posture. Strategy B sought primary research into the psychometric testing of these instruments. One reviewer searched six electronic databases available at the Stellenbosch University Library: BioMed Central, CINAHL, PEDRO, PROQUEST, PUBMED and SCIENCE DIRECT. The search was restricted to full-text papers published in English between 1980 and June 2010. MeSH terms were used in PUBMED. See Additional file 1 for a detailed description of the database searches. In addition, secondary searching was performed on the reference lists of the included papers. Experts in this field of research were contacted, as were authors whose papers did not provide references to studies testing the psychometric properties of the instrument they used.

Keywords and synonyms

The following keywords were used: three-dimensional, measurement tool, assessment tool, instrument, measurement, assessment, spinal posture, posture, validity, reliability, accuracy and reproducibility.

Inclusion and exclusion criteria for selection of papers

Papers were included if they reported testing an instrument's psychometric properties, specifically reliability and/or validity, using humans, or testing the instrument's validity using objects. A core inclusion criterion was that static standing or sitting spinal posture had to be evaluated with an instrument that could quantitatively calculate 3D spinal posture without using a baseline reference value such as zero. This was because using a reference value requires the subject first to assume a neutral or resting posture, at which point the instrument is zeroed, before static spinal posture can be measured. For the purpose of the review, static posture had to be assessed instantaneously, without any guidance from the researcher. Papers were excluded if (1) they reported neither reliability nor validity testing; (2) they did not report on static spinal posture (e.g. they reported on the 3D motion of the spine, scapulo-humeral girdle or pelvis); (3) they reported on the validity testing of an instrument using motion (motion was not incorporated in this review, and we argue that validity should be evaluated within the context of the instrument's intended use); (4) the instrument only measured cadaver or in vitro spinal posture; (5) the instrument was invasive, e.g. biplanar radiography and stereoradiography; or (6) only an algorithm or a mathematical formula was reported.

Study selection

One reviewer screened all titles and abstracts and excluded ineligible papers, after which two independent reviewers selected the eligible papers after reading the full text of the remaining papers. Figure 1 describes the procedures of study selection for each of the two search strategies.

Figure 1

A Flowchart to demonstrate the procedures for study selection.

Methodological Quality Appraisal

The eligible full-text papers were then subjected to methodological critical appraisal. In the absence of any other relevant critical appraisal tool (CAT), the CAT applied in this review was purpose-built. It was adapted from the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) [20] and the Quality Appraisal of Reliability Studies (QAREL) [21]. The purpose-built CAT has 13 items; however, its results are not designed to be reported as a composite quality score (see Additional file 2). Instead, the CAT was designed to assess the impact of each individual item on the quality of the methodological procedures implemented in each paper. Prior to critical appraisal of the included articles, three papers were randomly selected and assessed independently by three reviewers using the purpose-built CAT. Disagreements were discussed to ensure that the CAT items were interpreted consistently.

Results

Results from the search strategies

One hundred and thirty potentially relevant papers were considered, of which 30 were deemed eligible. Nine additional papers were identified by searching the reference lists of these papers. Two further papers were included after experts and authors had been contacted. Figure 2 provides a CONSORT diagram of the selection of papers.

Figure 2

CONSORT diagram of the selection of papers.

Volume of literature

Eighteen instruments were identified from the two literature searches: 15 from Search A, one from Search B and two from author contacts. The instruments are listed in the first column of Table 1; the papers addressing aim one appear in the second column and those addressing aim two in the third column. Papers reporting these instruments are identified by bold script if from strategy A, italics if from strategy B, normal script if from the author search, and an asterisk (*) if from secondary searching. The Automatic Scoliosis Analyser System (Auscan) (Italy), the Elite system (Italy), the Optotrak 3020 (Canada), the Peak Motus (USA), the PosturePrint (Canada), the Qualysis Proreflex Motion Capture Unit system (Sweden), the Vicon 370 (England) and an Optoelectronic camera system (Canada) are optoelectronic analysis systems. The Fonar upright positional MRI (USA) uses magnetic resonance imaging. The INSPECK (Canada) is an optical 3D digitizer. The Lumbar Motion Monitor (LMM) (USA) is an electrogoniometer. The Metrecom (USA), the Articulated Arm for Computerized Surface Measurement (BACES) (Italy) and the Microscribe 3DX Digitizer (USA) are computerized electromechanical 3D digitizers. Rasterstereography is a photogrammetric method based on triangulation. The 3 Space Isotrak or Fastrak (USA) and the Electromagnetic tracking system (USA) are electromagnetic devices. The Zebris (Germany) is an ultrasound analysis system.

Table 1

Recent three-dimensional instruments used to measure static spinal posture

Instrument	Addresses Aim 1: Used to measure posture	Addresses Aim 2: Reports on psychometric properties	N
BACES	D'Osualdo et al. 2002 [41]

AUSCAN	Negrini et al. 2007 [42]

Electromagnetic tracking system	Claus et al. 2009 [43]

Elite optoelectronic system	Lissoni et al. 2001 [44]; Naslund et al. 2005 [45]

Inspek		Pazos et al. 2005* [35]; Pazos et al. 2007 [27]	2

Lumbar Motion Monitor (LMM)	Jang et al. 2007 [46]

FONAR Upright positional MRI	Morl et al. 2006 [47]; Cargill et al. 2007 [48]; Lafon et al. 2010 [49]

Metrecom	Franklin et al. 1995* [50]; Black et al. 1996 [51]; Gram et al. 1999 [52]	Smidt et al. 1992* [22]; Norton et al. 1993* [38]	2

Microscribe 3DX Digitizer		Warren et al. 2005 [28]	1

Optoelectronic camera system	Duong et al. 2009 [53]

Optotrak 3020	Rempel et al. 2007 [54]

Peak Motus	Straker et al. 2009 [55]

Postureprint		Normand et al. 2002 [37]; Harrison et al. 2007 [33]; Janik et al. 2007 [34]; Normand et al. 2007 [26]	4

Qualysis Proreflex Motion Capture Unit system	Grip et al. 2007 [56]; Neiva et al. 2009 [57]

Rasterstereography		Stokes et al. 1988* [32]; Hackenberg et al. 2003a [30]; Hackenberg et al. 2003b [31]; Drerup et al. 1994* [23] and 1996* [24]	5

3 Space Isotrak/Fastrak	O'Sullivan et al. 2006* [58]; Caneiro et al. 2010 [59]; Astfalck et al. 2010 [60]	Pearcy et al. 1989* [36]	1

Vicon three-dimensional kinematic system	Levine et al. 1996 [61]; Szeto et al. 2005 [9]; Skalli et al. 2006 [62]	Whittle et al. 1997 [29]	1

Zebris CMS70P; Zebris CMS20	Theisen et al. 2010 [63]	Geldhof et al. 2007 [25]	1

N: number of papers addressing aim 2; Bold script: Papers from search A; Italic script: Papers from search B;*: Papers from secondary search; Normal script: Papers from author search

Seventeen papers reported on the reliability and/or validity of the included instruments and were therefore assessed under Aim two (see the third column of Table 1). One paper, by Smidt et al. [22], reported on both reliability and validity and, given the nature of this review, was reviewed as if it were two separate papers. Drerup et al. [23] tested a new algorithm for processing data presented in a previous paper [24]. These two papers were reviewed as if they were one, because the previous paper described the study procedure in more detail, whereas the more recent paper discussed the latest improvement to the data processing procedure.

Aim of the reliability studies

The aim of six studies was to test the reliability of a 3D instrument in assessing the spinal posture of humans [22,25-29].

Aim of the validity studies

The aim of eleven studies was to test the validity of a 3D posture instrument. Four studies [23,30-32] used human subjects to measure 3D spinal posture and to compare the results with those obtained from a reference standard. The other seven studies either used mannequins [33-35], wooden wedges [36], a steel frame [22], parallelograms [37] or other objects with known parameters [38] to test the validity of an instrument that could be used to assess 3D spinal posture of humans in future.
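Table 3 reports that the rasterstereography validity studies (Drerup et al. and Hackenberg et al.) summarised agreement with stereoradiography as root mean square (RMS) deviations of the surface curves from the radiographic curves. The Python sketch below shows only that final summary step, assuming both curves have already been registered to a common coordinate frame and sampled at the same vertebral levels. The function name and all values are hypothetical and are not taken from any included study.

```python
import math


def rms_deviation(index_curve, reference_curve):
    """Root mean square deviation between two curves sampled at the same levels.

    Assumes (hypothetically) that both curves are already registered to a
    common coordinate frame, e.g. lateral deviation in mm at each vertebral level.
    """
    if not index_curve or len(index_curve) != len(reference_curve):
        raise ValueError("curves must be non-empty and sampled at the same levels")
    squared = [(a - b) ** 2 for a, b in zip(index_curve, reference_curve)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical lateral deviations (mm) at five levels; not data from any study.
surface_curve = [2.0, 5.0, 9.0, 6.0, 1.0]        # index test (e.g. rasterstereography)
radiographic_curve = [1.0, 4.0, 11.0, 7.0, 1.0]  # reference standard

print(round(rms_deviation(surface_curve, radiographic_curve), 2))  # 1.18
```

Because differences are squared before averaging, the single 2 mm discrepancy at the third level contributes more to the RMS than the three 1 mm discrepancies combined, which is worth bearing in mind when comparing RMS values across studies.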

Study design for reliability and validity studies

The type of reliability and validity tested, as well as the time interval for the reliability studies and the reference standard for the validity studies, are reported in Table 2.

Table 2

The type and time interval for reliability studies and the type and reference standard for validity studies

Author	Type of reliability	Time interval	Type of validity	Reference standard
Stokes et al (1988)	N/A	N/A	Criterion-related validity	Stereoradiography

Pearcy et al (1989)	N/A	N/A	Concurrent validity	Precision optical inclinometer

Smidt et al (1992)	N/A	N/A	Concurrent validity	Not specified

	Intra- and interrater reliability	On the same day	N/A	N/A

Norton et al (1993)	N/A	N/A	Concurrent validity	Tape measure or ruler

Drerup et al (1996)	N/A	N/A	Criterion-related validity	Stereoradiography

Normand et al (2002)	N/A	N/A	Concurrent validity	Not specified

Hackenberg et al (2003a and 2003b)	N/A	N/A	Criterion-related validity	Stereoradiography

Pazos et al (2005)	N/A	N/A	Concurrent validity	Coordinate measuring machine

Harrison et al (2007) and Janik et al (2007)	N/A	N/A	Concurrent validity	Not specified

Whittle et al (1997)	Intrarater reliability	On the same day	N/A	N/A

Warren et al (2005)	Intrarater reliability	One minute	N/A	N/A

Geldhof et al (2007)	Intrarater reliability	One week	N/A	N/A

Pazos et al (2007)	Test-retest reliability	30 seconds	N/A	N/A

Normand et al (2007)	Intra- and interrater reliability	One day	N/A	N/A

N/A: Not Applicable


Statistical analysis

Table 3 summarizes the statistical procedures implemented in the reliability and validity studies. Comparing this table with the types of reliability and validity testing reported in Table 2 highlights the variability in the choice and application of statistical tests used to assess the same constructs.
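This variability matters because different coefficients answer different questions. The Python sketch below contrasts a Pearson correlation with ICC(2,1), the two-way random-effects, absolute-agreement, single-rater form described by Shrout and Fleiss, on hypothetical data in which a second rater reads every subject 20 degrees higher than the first. The choice of ICC form is an assumption for illustration (Table 3 does not specify which ICC forms the included papers used), and the function names are hypothetical.

```python
def pearson_r(x, y):
    """Pearson product moment correlation between two paired lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has one row per subject and one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_subjects - ss_raters
    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


# Hypothetical angles (degrees): rater B is systematically 20 degrees higher.
rater_a = [10.0, 20.0, 30.0, 40.0, 50.0]
rater_b = [a + 20.0 for a in rater_a]
ratings = [list(pair) for pair in zip(rater_a, rater_b)]

print(round(pearson_r(rater_a, rater_b), 3))  # 1.0
print(round(icc_2_1(ratings), 3))             # 0.556
```

Pearson's r is 1.0 because the two raters order and space the subjects identically, whereas ICC(2,1) penalises the 20-degree systematic difference between them. A consistency-type ICC (ICC(3,1)) would equal 1.0 on these data as well, which is one reason the form of ICC used should always be reported.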

Table 3

Statistical procedures of the reliability and validity studies

Author	Statistical analysis
Stokes et al (1988)	• linear regression analysis and Pearson correlation coefficient

Pearcy et al (1989)	• means; estimate of error, regression analysis and ICC

Smidt et al (1992)	• Dunnett's comparison test

Norton et al (1993)	• Pearson product moment correlation coefficient and repeated measures t test

Drerup et al (1996) and Hackenberg et al (2003a and b)	• Root mean square (RMS) deviations of the surface curves from the radiographic curves

Whittle et al (1997)	• ICC and Pearson correlation coefficient

Normand et al (2002)	• means, SD, SEM, 95% Confidence Intervals (CI) and mean differences

Pazos et al (2005)	• multiway ANOVA

Warren et al (2005)	• Pearson correlation coefficient and ICC

Harrison et al (2007) and Janik et al (2007)	• error analyses of mean differences and SD

Geldhof et al (2007)	• ICC for test-retest reliability

Pazos et al (2007)	• bivariate ANOVA; typical error of measurement (TEM); 95% CI of the TEM; smallest detectable difference (SDD) and multivariate ANOVA

Normand et al (2007)	• mean absolute values of differences within examiner and between examiner measurements; ANOVA; Shapiro-Wilk test and SEM for conservative and liberal ICC methods
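Pazos et al. (2007) reported a typical error of measurement (TEM) and a smallest detectable difference (SDD). The Python sketch below uses one widely used convention for test-retest data, TEM = SD of the test-retest differences / √2 and SDD = 1.96 × √2 × TEM. Whether Pazos et al. used exactly these formulas is an assumption, and the data and function names are hypothetical.

```python
import math
import statistics


def typical_error(test, retest):
    """Typical error of measurement: SD of test-retest differences divided by sqrt(2)."""
    diffs = [b - a for a, b in zip(test, retest)]
    return statistics.stdev(diffs) / math.sqrt(2)


def smallest_detectable_difference(tem, z=1.96):
    """Smallest individual change exceeding measurement error at about 95% confidence."""
    return z * math.sqrt(2) * tem


# Hypothetical trunk angles (degrees) for five subjects measured twice.
test = [10.0, 12.0, 15.0, 11.0, 14.0]
retest = [11.0, 11.0, 16.0, 12.0, 13.0]

tem = typical_error(test, retest)
sdd = smallest_detectable_difference(tem)
print(f"TEM = {tem:.2f} deg, SDD = {sdd:.2f} deg")  # TEM = 0.77 deg, SDD = 2.15 deg
```

Under this convention, a change in an individual's posture smaller than the SDD (about 2.15 degrees here) cannot be distinguished from measurement error, which is the practical question when monitoring patient progress with treatment.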

Table 4 reports the findings of the critical appraisal of the papers related to reliability and validity testing.

Table 4

Summary of the methodological quality appraisal results of the studies (n = 17)

Authors	Item 1	Item 2	Item 3	Item 4	Item 5	Item 6	Item 7	Item 8	Item 9	Item 10	Item 11	Item 12	Item 13
Stokes et al (1988)	√	x	√	n/a	n/a	n/a	√	n/a	√	√	√	√	√

Pearcy et al (1989)	n/a	x	√	n/a	n/a	n/a	n/a	n/a	√	√	√	n/a	√

Smidt et al (1992) (validity)	n/a	x	x	n/a	n/a	n/a	n/a	n/a	x	√	x	n/a	√

Smidt et al (1992) (reliability)	√	√	n/a	√	√	x	n/a	√	n/a	√	n/a	x	√

Norton et al (1993)	n/a	x	x	n/a	n/a	n/a	n/a	n/a	√	√	√	n/a	x

Drerup et al (1994; 1996)	x	x	√	n/a	n/a	n/a	√	n/a	√	√	√	√	√

Whittle et al (1997)	√	x	n/a	n/a	x	x	n/a	√	n/a	√	n/a	√	√

Normand et al (2002)	n/a	x	x	n/a	n/a	n/a	n/a	n/a	x	√	x	n/a	√

Hackenberg et al (2003a)	√	x	√	n/a	n/a	n/a	√	n/a	√	x	√	x	√

Hackenberg et al (2003b)	√	x	√	n/a	n/a	n/a	√	n/a	√	x	√	x	√

Warren et al (2005)	√	x	n/a	n/a	x	x	n/a	√	n/a	√	n/a	x	√

Pazos et al. (2005)	n/a	x	√	n/a	n/a	n/a	n/a	n/a	√	√	√	n/a	√

Harrison et al (2007)	n/a	x	x	n/a	n/a	n/a	n/a	n/a	x	√	x	n/a	√

Janik et al (2007)	n/a	x	x	n/a	n/a	n/a	n/a	n/a	x	√	x	n/a	√

Geldhof et al (2007)	√	x	n/a	n/a	√	x	n/a	√	n/a	√	n/a	√	√

Pazos et al (2007)	√	x	n/a	n/a	n/a	n/a	n/a	√	n/a	√	n/a	x	√

Normand et al (2007)	√	√	n/a	√	√	√	n/a	√	n/a	√	n/a	√	√

Item 1: If human subjects were used, did the authors give a detailed description of the sample of subjects used to perform the (index) test? Nine papers [22,25-32] scored "yes" because a detailed description of the sample characteristics was given. Drerup et al. [23] scored "no", as the authors did not mention how their subjects were recruited and merely stated that only scoliosis patients were included. Seven papers [22,33-38] scored "not applicable" because these studies used inanimate objects.

Item 2: Did the authors clarify the qualification, or competence, of the rater(s) who performed the (index) test? Eleven validity studies [22,23,30-38] and four reliability studies [25,27-29] scored "no": the qualifications of the instrument operators were not reported, and there was no description of their past experience in operating these instruments. The reliability studies of Smidt et al. [22] and Normand et al. [26] scored "yes", as they stated that the operators were "familiar and competent" in the use of the instrument.

Item 3: Was the reference standard explained? Drerup et al. [23], Hackenberg et al. [30,31] and Stokes et al. [32] scored "yes", as they provided references for the methods used to digitize the radiographs. Pazos et al. [35] and Pearcy et al. [36] scored "yes" because the authors named the instruments used as the reference standard and stated their accuracy. Norton et al. [38] scored "no" because a ruler or tape measure was inappropriately used as a reference standard for calculating the 3D coordinates of a point in space. Harrison et al. [33], Janik et al. [34], Normand et al. [37] and Smidt et al. [22] scored "no" because, although the authors used objects with known 3D parameters as reference standards, the methods used to measure these 3D locations, angles or distances were not explained.

Item 4: If interrater reliability was tested, were raters blinded to the findings of other raters? Normand et al. [26] and Smidt et al. [22] scored "yes" because subjects were evaluated separately by the different raters. Geldhof et al. [25], Warren et al. [28] and Whittle and Levine [29] tested only intrarater reliability and scored "not applicable". Pazos et al. [27] scored "not applicable" because no rater reliability was evaluated; instead, the test-retest reliability of the instrument when using different postures was evaluated.

Item 5: If intrarater reliability was tested, were raters blinded to their own prior findings of the test under evaluation? Geldhof et al. [25], Normand et al. [26] and Smidt et al. [22] scored "yes" because the raters were sufficiently blinded to their own prior measurements: repeated digitizing of the anatomical landmarks took place one week apart, all photographs were numbered and were not identifiable by subject name, occasion or characteristics, or no skin markings were made on subjects. Warren et al. [28] and Whittle and Levine [29] scored "no" because passive markers and skin markings, respectively, were placed on the subject only once and were not removed between repeated measurements. Pazos et al. [27] scored "not applicable" because they did not test rater reliability.

Item 6: Was the order of examination varied? Normand et al. [26] scored "yes" because subjects were evaluated in random order. Warren et al. [28] and Whittle and Levine [29] scored "no" because repeated measurements were performed consecutively without changing the order of subjects during testing. Geldhof et al. [25] scored "no", as the order of testing was kept the same for the repeated measurements one week apart. Smidt et al. [22] scored "no", as insufficient information was provided. Pazos et al. [27] scored "not applicable" because no rater reliability was tested.

Item 7: If human subjects were used, was the time period between the reference standard and the index test short enough to be reasonably sure that the target condition did not change between the two tests? Drerup et al. [23], Hackenberg et al. [30,31] and Stokes et al. [32] scored "yes" because the radiographs and the rasterstereographs were taken on the same day. The other seven validity papers [22,33-38] scored "not applicable" because they used inanimate objects, which do not deform with the passage of time.

Item 8: Was the stability (or theoretical stability) of the variable being measured taken into account when determining the suitability of the time interval between repeated measures? Six papers scored "yes" because repeated measurements of posture were taken on the same day [22,27-29], one day apart [26] or one week apart [25].

Item 9: Was the reference standard independent of the index test? Seven papers [23,30-32,35,36,38] scored "yes" because the index test and the reference standard were independent instruments. Harrison et al. [33], Janik et al. [34], Normand et al. [37] and Smidt et al. [22] scored "no" due to insufficient information.

Item 10: Was the execution of the (index) test described in sufficient detail to permit replication of the test? Nine validity papers [22,23,32-38] and six reliability papers [22,25-29] scored "yes" because they clearly described how the instruments were applied to the subjects or to the inanimate objects. Hackenberg et al. [30,31] scored "no", as the authors did not explain how the rasterstereographs were performed on the subjects, nor did they provide any citations for the methodology.

Item 11: Was the execution of the reference standard described in sufficient detail to permit its replication? Seven papers scored "yes" because they clearly described how the reference standard was used on the subjects [23,32] or on the inanimate objects [35,36,38], or cited the methodology [30,31]. Harrison et al. [33], Janik et al. [34], Smidt et al. [22] and Normand et al. [37] scored "no" for the reasons given under Item 3.

Item 12: Were withdrawals from the study explained? Drerup et al. [23], Geldhof et al. [25], Normand et al. [26], Stokes et al. [32] and Whittle and Levine [29] scored "yes" because the number of subjects who participated was reflected in the results sections of the studies. Hackenberg et al. [30,31] scored "no", as the authors did not explain why 48 instead of 52, and 24 instead of 25, subjects respectively participated in the preoperative evaluations. Pazos et al. [27], Warren et al. [28] and Smidt et al. [22] scored "no" due to insufficient information. Seven papers [22,33-38] scored "not applicable" because these studies used inanimate objects.

Item 13: Were the statistical methods appropriate for the purpose of the study? All papers except that of Norton et al. [38], which scored "no", implemented appropriate statistical analysis. Although the other sixteen papers reported appropriate statistical analysis, only six papers [23,26,28,30,31] provided a justification for their chosen statistical measures.
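Because the CAT is not designed to produce a composite quality score, Table 4 is best summarised item by item. The Python sketch below transcribes Table 4 (Y for √, N for x, - for n/a) and counts the scores for each item separately; the encoding and the function name are ours, and the counts should agree with the per-item figures given in the text (for example, only two papers scored "yes" on Item 2).

```python
from collections import Counter

# Table 4 transcribed: Y = yes (√), N = no (x), - = not applicable.
# Each string lists the scores for Items 1 to 13 in order.
TABLE_4 = {
    "Stokes 1988": "YNY---Y-YYYYY",
    "Pearcy 1989": "-NY-----YYY-Y",
    "Smidt 1992 (validity)": "-NN-----NYN-Y",
    "Smidt 1992 (reliability)": "YY-YYN-Y-Y-NY",
    "Norton 1993": "-NN-----YYY-N",
    "Drerup 1994/1996": "NNY---Y-YYYYY",
    "Whittle 1997": "YN--NN-Y-Y-YY",
    "Normand 2002": "-NN-----NYN-Y",
    "Hackenberg 2003a": "YNY---Y-YNYNY",
    "Hackenberg 2003b": "YNY---Y-YNYNY",
    "Warren 2005": "YN--NN-Y-Y-NY",
    "Pazos 2005": "-NY-----YYY-Y",
    "Harrison 2007": "-NN-----NYN-Y",
    "Janik 2007": "-NN-----NYN-Y",
    "Geldhof 2007": "YN--YN-Y-Y-YY",
    "Pazos 2007": "YN-----Y-Y-NY",
    "Normand 2007": "YY-YYY-Y-Y-YY",
}


def tally_by_item(table, n_items=13):
    """Count yes/no/not-applicable scores separately for each CAT item."""
    return [Counter(scores[i] for scores in table.values()) for i in range(n_items)]


for item, counts in enumerate(tally_by_item(TABLE_4), start=1):
    print(f"Item {item:2d}: yes={counts['Y']:2d}  no={counts['N']:2d}  n/a={counts['-']:2d}")
```

Keeping the counts per item, rather than summing across items, preserves the information the CAT was designed to convey: for instance, Item 13 is almost universally met while Item 2 is almost universally missed.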

Discussion

This review attempted to evaluate the quality of reporting of the psychometric properties of eighteen 3D human posture-measuring instruments. It identified a lack of well-documented studies testing the psychometric properties of these instruments, as papers describing the psychometric testing of only eight instruments were found (see the third column of Table 1). The review suggests that the PosturePrint and rasterstereography have undergone relatively more psychometric testing than the other tools included in this review. However, judged against the methodological criteria applied in this review, the testing procedures for all instruments had methodological flaws.

Rater qualification

Both reliability and validity studies should describe the qualifications of the rater(s), because the raters' professional background, expertise and prior training in operating these instruments will affect the assessment of psychometric properties. Appropriate training of raters is important to minimise measurement error and to facilitate interpretation of findings. These factors should therefore be considered when interpreting study findings and when judging their applicability and generalisability to other clinical and research settings [39].

Reference standard

Four studies that used inanimate objects did not identify the instruments used to obtain the known values of the objects, which provided the reference standard data. To test validity, the psychometric properties of the reference standard must be known, to confirm that the reference standard is suitable [39]. The most suitable non-invasive 3D reference standard for postural measurements has not been unanimously determined in this field of research. The validity studies that used human subjects all used stereoradiography as the reference standard, as radiography remains the most accurate method of assessing posture. This remains the case even though repeated X-ray exposure carries a possible health risk to healthy spines and organs [40]. Norton et al. [38] used a ruler or tape measure as a reference standard. The x, y, z coordinates obtained from the index test had to be mathematically transformed into distances between pairs of points before they could be compared with the reference data obtained from the ruler or tape measure. It would have been better had these authors used a reference standard of known accuracy that measures 3D coordinates directly. The ruler or tape measure was also a poor reference standard for measuring the distance between pairs of points on the human skeleton.
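The transformation Norton et al. needed can be illustrated with a short Python sketch: 3D coordinates from a digitizer are reduced to Euclidean distances between pairs of points, which are the only quantities a ruler or tape measure can check. The point names, coordinates, ruler readings and function name below are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance between every pair of named 3D points (x, y, z)."""
    return {
        (a, b): math.dist(points[a], points[b])
        for a, b in combinations(sorted(points), 2)
    }


# Hypothetical digitized coordinates (mm) of three points on a test object.
digitized = {"P1": (0.0, 0.0, 0.0), "P2": (30.0, 40.0, 0.0), "P3": (30.0, 40.0, 120.0)}
# Hypothetical ruler readings (mm) for the same pairs of points.
ruler = {("P1", "P2"): 51.0, ("P1", "P3"): 129.0, ("P2", "P3"): 120.0}

distances = pairwise_distances(digitized)
for pair, d in distances.items():
    print(pair, f"digitizer={d:.1f}", f"ruler={ruler[pair]:.1f}", f"diff={d - ruler[pair]:+.1f}")
```

Each distance collapses three coordinate differences into one number, so a pair of points can agree with the ruler even when individual x, y or z coordinates are in error; this is one way to see why a reference standard that measures 3D coordinates directly would have been preferable.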

Blinding for intra- or interrater reliability

The repeated measurements by Geldhof et al. [25] were performed one week apart; however, the order of the subjects was fixed. This increases the possibility that the raters recalled the outcomes of the previous measurements, and potentially increases bias. Warren et al. [28] and Whittle and Levine [29] tested intrarater reliability; however, the anatomical landmarks were marked only once before the repeated measurements were taken, and the markers were not removed and replaced between measurements. The raters in these studies were therefore not blinded to their previous measurements of the same subjects, which potentially introduced bias and compromised the quality of the studies and their findings.

Statistical analysis

Given the complexity of posture measurement and interpretation, no statistical strategy for psychometric property testing is without its disadvantages. It therefore seems sensible to report the findings of two or more different statistical approaches in order to validate findings [21]. This did not occur in any of the included papers. For example, Pearcy et al. [36] used linear regression analysis to demonstrate that the amount of error increases as the magnitude of the measured variable increases; however, there is no indication of a cut-off value (e.g. a 95% CI or SD) up to which the 3 Space Isotrak can be expected to measure an angle accurately. As a variety of statistical measures were reported in the included papers, another way to improve reporting quality would be for authors to justify why they chose a particular statistical test, relative to the purpose of testing. This would give the reader better insight into the results, and could guide future authors in the choice and interpretation of more appropriate statistical analyses. For example, Norton et al. [38] used multiple analyses to determine whether there was agreement between measures.
However Pearson product moment correlation only reports on the correlation between two different measurements and cannot quantify the amount of aggreement or indicate whether there is systematic error. Repeated t-tests are also inappropriate to test systematic differences, as this testing will inflate the type I error and compromise interpretation of significance.
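The weakness of relying on Pearson's r, and the cost of repeated t-tests, can be shown with a small sketch. It uses invented angle data (not from any reviewed study) in which a hypothetical index test reads about 5 degrees higher than the reference. The Bland-Altman bias and 95% limits of agreement are swapped in here as one common agreement method, named for illustration only and not attributed to the included papers. The familywise error function assumes the k tests are independent.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product moment correlation: strength of linear association only."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def limits_of_agreement(reference: Sequence[float], index: Sequence[float]) -> Tuple[float, float, float]:
    """Bland-Altman bias (mean index-minus-reference difference) and 95% limits of agreement."""
    diffs = [b - a for a, b in zip(reference, index)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def familywise_error(alpha: float, k: int) -> float:
    """Chance of at least one false positive across k independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** k


# Hypothetical angles (degrees): the index test reads roughly 5 degrees high throughout.
reference = [10.0, 20.0, 30.0, 40.0, 50.0]
index = [15.0, 24.0, 36.0, 44.0, 56.0]

r = pearson_r(reference, index)
bias, lower, upper = limits_of_agreement(reference, index)
print(f"Pearson r = {r:.3f}")
print(f"Bias = {bias:.2f} deg, 95% limits of agreement = [{lower:.2f}, {upper:.2f}] deg")
print(f"Familywise type I error, 10 t-tests at alpha = 0.05: {familywise_error(0.05, 10):.3f}")
```

With these invented data r is close to 1 even though every reading is 4 to 6 degrees too high; the bias and limits of agreement make that systematic error visible. Likewise, ten independent tests at alpha = 0.05 carry roughly a 40% chance of at least one spurious "significant" difference.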

Limitations

One limitation of this review is that we could not retrieve potentially eligible papers whose authors did not respond to email inquiries. There may be other relevant instruments that have been adequately evaluated for reliability and validity, but these papers were not available despite the use of multiple search methods (database, internet and author searches).

Conclusions

This review described 18 non-invasive ways of measuring static human 3D sitting or standing spinal posture, and the methodological procedures used to test the reliability and validity of a subset of these instruments. The review concludes that further research is required to improve the quality of the reliability and validity evidence for these posture-measuring instruments. Psychometric property testing should be improved by addressing rater qualification, defining reference standards more clearly, applying appropriate methodological procedures to enhance rater blinding, and improving the quality of reported statistical analysis. Greater methodological rigor in reliability and validity testing would, in turn, enhance users' confidence in the psychometric evidence for instruments that measure static human 3D sitting or standing spinal posture in clinical and research settings.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

YB and QL contributed to the conception and design of the study; YB acquired and analyzed the data; and all authors (YB, QL and KGS) contributed to the interpretation of data, the drafting of the manuscript and the critical appraisal of its content. All authors read and approved the final manuscript.

Pre-publication history

The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2474/12/93/prepub
