Sam Robertson1,2, Peter Kremer3, Brad Aisbett4, Jacqueline Tran3, Ester Cerin4,5.
Abstract
BACKGROUND: Performance tests are used for multiple purposes in exercise and sport science. Ensuring that a test displays an appropriate level of measurement properties for use within a population is important to ensure confidence in test findings. The aim of this study was to obtain subject matter expert consensus on the measurement and feasibility properties that should be considered for performance tests used in the exercise and sport sciences and how these should be defined. This information was used to develop a checklist for broader dissemination.
Keywords: Assessment; Delphi; Performance tests; Reliability; Responsiveness; Sports testing; Validity
Year: 2017 PMID: 28054257 PMCID: PMC5215201 DOI: 10.1186/s40798-016-0071-y
Source DB: PubMed Journal: Sports Med Open ISSN: 2198-9761
Fig. 1 Taxonomy including the initial measurement properties and feasibility as sent to participants as part of the first Delphi round
Results relating to round 1 of the Delphi study, including specific percentage of consensus reached for each of the four questions
| Group | Item | Q1: Consider the item? (%) | Q2: Definition and terminology (%) | Q3: Importance to quality (mean) | Q3: % responses level 4 or 5 | Q4: Attitude to item (mean) | Q4: % responses level 4 or 5 |
|---|---|---|---|---|---|---|---|
| Reproducibility/reliability | Stability | 71.4 | 68.2 | 3.62 | 65.4a | 3.62 | 69.2 |
| | Re-test reliability | 92.9 | 96.0 | 4.5 | 85.7 | 4.43 | 85.7 |
| | Intra-rater | 100.0 | 92.9 | 4.5 | 92.9 | 4.46 | 89.3 |
| | Inter-rater | 100.0 | 92.9 | 4.46 | 89.3 | 4.5 | 89.3 |
| | Internal consistency | 67.9 | 100.0 | 3.39 | 50.0a | 3.29 | 46.4a |
| Validity | Content validity | 100.0 | 89.3 | 4.68 | 96.4 | 4.64 | 96.4 |
| | Discriminant validity | 100.0 | 92.9 | 4.21 | 82.1 | 4.14 | 75.0 |
| | Convergent validity | 78.6 | 91.3 | 3.14 | 28.6a | 3.11 | 28.6a |
| | Concurrent validity | 82.1 | 88.9 | 3.25 | 32.1a | 3.25 | 35.7a |
| | Predictive validity | 85.7 | 91.7 | 3.79 | 64.3a | 3.71 | 60.7a |
| Responsiveness | Responsiveness | 100.0 | 89.3 | 4.5 | 85.7 | 4.37 | 81.5 |
| | Sensitivity | 92.9 | 85.7 | 4.25 | 85.7 | 4.14 | 78.6 |
| | Min. important diff. | 92.9 | 88.9 | 4.04 | 71.4 | 3.96 | 67.9 |
| | Floor and ceiling | 89.3 | 96.2 | 3.54 | 53.6a | 3.39 | 46.4a |
| Feasibility | Interpretability | 100.0 | 89.3 | 4.21 | 82.1 | 4.18 | 82.1 |
| | Familiarity required | 78.6 | 95.7 | 3.79 | 71.4 | 3.75 | 71.4 |
| | Scoring complexity | 92.6 | 96.2 | 3.75 | 57.1a | 3.86 | 60.7a |
| | Completion complexity | 85.7 | 96.3 | 3.54 | 57.1a | 3.64 | 60.7a |
| | Cost | 89.3 | 88.9 | 3.61 | 64.3a | 3.75 | 67.9 |
| | Duration | 92.9 | 100.0 | 3.75 | 67.9 | 3.93 | 75.0 |
Q1 refers to question one and so forth
aConsensus not reached on the question for the corresponding item
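The Q3 and Q4 columns report the mean rating and the percentage of panellists responding at level 4 or 5 on a 5-point scale. A minimal Python sketch of how such summaries could be computed; the consensus cut-off is an assumption for illustration (the exact threshold used by the study is not stated in this excerpt):

```python
def summarise_ratings(ratings, threshold=2 / 3):
    """Summarise a list of 5-point Likert ratings as reported in the table:
    mean rating, % of responses at level 4 or 5, and whether that percentage
    meets the (assumed) consensus threshold."""
    mean = sum(ratings) / len(ratings)
    pct_4_or_5 = 100 * sum(1 for r in ratings if r >= 4) / len(ratings)
    return round(mean, 2), round(pct_4_or_5, 1), pct_4_or_5 >= 100 * threshold

# Example: 28 hypothetical panellists rating an item's importance
ratings = [5] * 12 + [4] * 12 + [3] * 3 + [2]
mean, pct, consensus = summarise_ratings(ratings)
# mean = 4.25, pct = 85.7, consensus = True
```

An item flagged "a" in the table would correspond to `consensus` being `False` under whichever threshold the panel actually applied.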
Final list of items ranked by level; corresponding definitions are also included
| Level | Item | Definition |
|---|---|---|
| Level 1 | Re-test reliability | The consistency of performer(s) results over repeated rounds of testing conducted over a period of typically days or weeks. This represents the change in a participant’s results between repeated tests due to both systematic and random error, rather than true changes in performance [ |
| | Intra-rater | The agreement (consistency) among two or more trials administered or scored by the same rater [ |
| | Inter-rater | The level of agreement (consistency) between assessments of the same performance when undertaken by two or more raters [ |
| | Content validity | How well a specific test measures that which it intends to measure [4, 27] |
| | Discriminant validity | The extent to which results from a test relate to results on another test which measures a different construct (i.e., the ability to discriminate between dissimilar constructs) [ |
| | Responsiveness/sensitivity to change | The ability of a test to detect worthwhile and ‘real’ improvements over time (e.g., between an initial bout of testing and subsequent rounds) [ |
| | MID/SWC | The smallest change or difference in a test result that is considered practically meaningful or important [ |
| | Interpretability | The degree to which practical meaning can be assigned to a test result or change in result [ |
| | Familiarity required | The need to undertake a test familiarisation session with all participants prior to main testing in order to reduce or eliminate learning or reactivity effects [ |
| | Duration | Expected and/or actual duration of the testing protocol [ |
| Level 2 | Stability | The consistency of performer(s) results over repeated rounds of testing conducted over a period of months or years [ |
| | Internal consistency | The degree of inter-relatedness among test components that intend to measure the same construct/characteristic [ |
| | Convergent validity | The extent to which results from tests that theoretically should be related to each other are, in fact, related to each other [ |
| | Concurrent validity | The extent to which the test relates to an alternate, previously validated measure of the same construct administered at the same time [ |
| | Predictive validity | The extent to which the test relates to a previously validated measure of a theoretically similar construct, administered at a future point in time [ |
| | Floor and ceiling effects | The ability of a test to distinguish between individuals at the lower and upper extremities of performance (i.e., ability to distinguish between high results (ceiling effect) and low results (floor effect)) [ |
| | Scoring complexity | The ease with which a test can be conducted and scored in a practical setting by the test administrator [ |
| | Completion complexity | The ease with which a test can be completed by a participant [ |
| | Cost | The total amount of resources required for test administration including equipment, time, and administrator expertise/experience [ |
Reference support for each definition has also been provided
MID minimum important difference, SWC smallest worthwhile change
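Two of the quantities defined above, re-test reliability and the MID/SWC, are commonly operationalised with simple formulas. A minimal sketch assuming two widely used sport-science conventions: the "typical error" (SD of test-retest difference scores divided by √2) as a re-test reliability index, and the SWC as 0.2 × the between-subject SD. Neither formula is prescribed by this study; both are illustrative assumptions:

```python
import statistics

def typical_error(test1, test2):
    """Re-test reliability index (assumed convention): SD of the
    test-retest difference scores divided by sqrt(2)."""
    diffs = [b - a for a, b in zip(test1, test2)]
    return statistics.stdev(diffs) / 2 ** 0.5

def smallest_worthwhile_change(baseline_scores, multiplier=0.2):
    """SWC (assumed convention): a fraction of the between-subject SD,
    following the Cohen's d small-effect convention."""
    return multiplier * statistics.stdev(baseline_scores)

# Hypothetical scores for four athletes tested twice
test1 = [10, 12, 14, 16]
test2 = [12, 13, 15, 16]
te = typical_error(test1, test2)
swc = smallest_worthwhile_change(test1)
```

Under these conventions, an observed change larger than both the typical error and the SWC can be regarded as both real and practically worthwhile, which ties the re-test reliability and MID/SWC rows of the checklist together.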
Fig. 2 Final taxonomy displaying the 19 level 1 and 2 items important for consideration in evaluating an exercise and sport science performance test
User checklist based on the final results of the Delphi study
All items achieving consensus in the questionnaire are included under the respective ‘level 1’ or ‘level 2’ categories. The user can list previous findings relating to the measurement properties and feasibility of a test and/or record their own results.