Paul Kelly, Claire Fitzsimons, Graham Baker.
Abstract
BACKGROUND: The measurement of physical activity (PA) and sedentary behaviour (SB) is fundamental to health-related research, policy, and practice, but there are well-known challenges to these measurements. Within the academic literature, the terms "validity" and "reliability" are frequently used when discussing PA and SB measurement to reassure the reader that they can trust the evidence.
DISCUSSION: In this paper we argue that a lack of consensus about the best way to define, assess, or utilize the concepts of validity and reliability has led to inconsistencies and confusion within the PA and SB evidence base. Where possible, we propose theoretical examples and solutions. Moreover, we present an overarching framework (The Edinburgh Framework) which we believe will provide a process or pathway to help researchers and practitioners consider validity and reliability in a standardized way.
Year: 2016 PMID: 26931142 PMCID: PMC4772314 DOI: 10.1186/s12966-016-0351-4
Source DB: PubMed Journal: Int J Behav Nutr Phys Act ISSN: 1479-5868 Impact factor: 6.457
Description of terms for validity and reliability in PA and SB measurement
| Term | Description |
|---|---|
| Face validity | The extent to which a measure looks like it will, or appears to, provide the desired information. Likewise for the proposed data processing and generation of outcome variables. Assessed by expert consensus and theoretical consideration. |
| Content validity | The extent to which a measure covers all aspects of the intended behavioural or physiological domains or dimensions (see Fig. |
| | Likewise for the proposed data processing and generation of outcome variables. |
| Convergent validity | The extent of the agreement with another (non-criterion) measure that should assess the same PA or SB parameter based on face and content validity. Assessed quantitatively. Useful when the criterion is very resource intensive. This approach also allows assessment of whether the measures can be used interchangeably, or whether the data from the two measures can be pooled or otherwise compared. |
| Criterion validity | The extent of the agreement between a measure and another already held as being a criterion or gold standard. Assessed quantitatively. Called absolute validity when compared to a measure known to provide perfectly true values. |
| Concurrent validity | Assessment of convergent or criterion validity when the measures are taken at the same time. |
| Predictive validity | Assessment of convergent or criterion validity when the measures are taken at different times. |
| Internal validity | The extent to which conclusions drawn from the experimental data are free from confounding issues which cause bias such as reactivity and missing data; similar to methodological quality. Assessed by examination of relevant issues. |
| External validity | The extent to which conclusions drawn from the data are generalizable to the wider populations. Assessed by examination of age, sex, ethnic origin, socio-economic status, etc., of study sample. |
| This could be assessed by a theoretical justification or empirical demonstration such as field testing and small scale “proof of concept” studies. These should assess participant feedback (e.g. satisfaction and burden) as well as data issues (e.g. can meaningful information be produced in reasonable time frames?) | |
| Test-retest reliability | The extent to which test scores are consistent from one test administration to the next, keeping as many conditions (e.g. researcher, timing, preparation, etc.) as possible unchanged. Assessed quantitatively. This estimate incorporates any factors that cannot be controlled, e.g. intra-rater reliability, behaviour change, etc. |
| Inter/intra-rater reliability | The extent to which test scores are consistent when measurements are taken by different people using the same methods (inter-rater) or at different times by the same person (intra-rater). Assessed quantitatively. |
| Inter/intra-instrument reliability | The extent to which test scores are consistent when measurements of the same thing are taken by different versions of the same instrument (inter-instrument) or repeatedly by the same version of an instrument (intra-instrument). Assessed quantitatively. |
| Behavioural reliability | The extent to which stability in behaviour has been considered when assessing other aspects of reliability. |
Note: We are not attempting to deliberately re-define any term here; if we use one here that you think we have described incorrectly, we suggest this is more evidence for non-standard use of terms and further justification for the need for this framework. Multiple sources used.
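The table states only that convergent and criterion validity are "assessed quantitatively" and does not name a statistic. As one possible illustration, the sketch below computes the mean difference (bias) and 95% limits of agreement between a measure and a comparison measure, in the style of a Bland-Altman analysis. The choice of statistic, the function name `bland_altman`, and the data values are all assumptions made for this example, not something taken from the paper.

```python
from statistics import mean, stdev


def bland_altman(measure, comparison):
    """Return (bias, lower_limit, upper_limit) for paired observations.

    bias is the mean of (measure - comparison); the limits are
    bias +/- 1.96 standard deviations of the paired differences.
    """
    if len(measure) != len(comparison):
        raise ValueError("observations must be paired")
    if len(measure) < 2:
        raise ValueError("at least two pairs are needed")
    diffs = [m - c for m, c in zip(measure, comparison)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical data: minutes per day of moderate-to-vigorous PA for
# five participants, from a questionnaire and from an accelerometer.
questionnaire = [30, 45, 60, 20, 50]
accelerometer = [25, 40, 50, 22, 41]

bias, lower, upper = bland_altman(questionnaire, accelerometer)
print(f"bias = {bias:.1f} min/day, limits of agreement = [{lower:.1f}, {upper:.1f}]")
```

Whether limits this wide are acceptable is a judgement about the intended use of the measure rather than a property of the statistic itself.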
Fig. 1 Domains, dimensions, and correlates and determinants of PA and SB. We use this figure to discuss the different ways these behaviours can be described or characterized. It is not meant to be exhaustive, and some may take issue with how we have used ‘determinants’. When considering sedentary behaviour, posture may require its own box. Source: PAHRC teaching materials (MSc Physical Activity for Health)
Fig. 2 A false hierarchy for PA and SB measurement when considering anything other than PAEE. Source: PAHRC teaching materials (MSc Physical Activity for Health)
Methodological framework for establishing feasibility, validity and reliability
| Aspect | Step |
|---|---|
| Proof of concept–feasibility | 1. Field testing and pilot testing of measure in controlled and free-living settings |
| Content and Face validity | 2. Examination of relevant literature |
| Convergent validity | 6. Assessment of the agreement between your measure and an existing (non-criterion) measure |
| Criterion validity | 7. Assessment of the agreement between your measure and a criterion measure |
| Internal validity | 8. Examination of bias such as reactivity and missing data |
| External validity | 9. Examination of sample bias (age, sex, ethnic origin, socio-economic status) |
| Inter-rater reliability | 10. Assessment of stability of tests administered by different researchers |
| Inter-instrument reliability | 11. Assessment of stability of tests administered using multiple versions of the same instrument |
| Test-retest reliability | 12. Assessment of stability of consecutive tests |
| Behavioural reliability | 13. Assessment of stability accounting for behavioural changes |
| Context validity | 14. Based on all assessments, will measure give useful information in the proposed context? |
| Purpose validity | 15. Based on all assessments and considering study design, are the validity and reliability results suitable for the proposed use and likely to allow the research question to be answered? |
Fig. 3 The Edinburgh Framework v1.0 for validity and reliability in PA and SB measurement
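Step 10 of the framework asks for an assessment of the stability of tests administered by different researchers, without prescribing a statistic. A minimal sketch of one option for categorical data (for example, direct observation coded as sitting, standing, or walking) is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The statistic, the function name `cohens_kappa`, and the observation codes are assumptions chosen for illustration and do not come from the framework.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be paired and non-empty")
    n = len(rater_a)
    # Proportion of observations on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal totals.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codes from two observers rating the same eight epochs.
observer_1 = ["sit", "sit", "stand", "walk", "walk", "sit", "walk", "stand"]
observer_2 = ["sit", "sit", "stand", "walk", "sit", "sit", "walk", "walk"]

kappa = cohens_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.2f}")
```

For continuous outcomes such as step counts, an agreement statistic designed for continuous data would be the more natural choice; kappa is shown here only because it is compact.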