
Psychometric properties of performance-based measures of physical function administered via telehealth among people with chronic conditions: A systematic review.

Caoimhe Barry Walsh1, Roisin Cahalan1,2, Rana S Hinman3, Kieran O'Sullivan1,4,5.

Abstract

BACKGROUND: Telehealth could enhance rehabilitation for people with chronic health conditions. This review examined the psychometric properties of performance-based measures of physical function administered via telehealth among people with chronic health conditions using the Consensus-Based Standards for the Selection of Health Measurement Instruments (COSMIN) approach.
METHODS: This systematic review was registered with PROSPERO (Registration number: CRD42021262547). Four electronic databases were searched up to June 2022. Study quality was evaluated by two independent reviewers using the COSMIN risk of bias checklist. Measurement properties were rated by two independent reviewers in accordance with COSMIN guidance. Results were summarised according to the COSMIN approach, and the modified GRADE approach was used to grade the quality of the summarised evidence.
RESULTS: Five articles met the eligibility criteria. These included patients with Parkinson's Disease (n = 2), stroke (n = 1), cystic fibrosis (n = 1) and chronic heart failure (n = 1). Fifteen performance-based measures of physical function administered via videoconferencing were investigated, spanning measures of functional balance (n = 7), other measures of general functional capacity (n = 4), exercise capacity (n = 2), and functional strength (n = 2). Studies were conducted in Australia (n = 4) and the United States (n = 1). Reliability was reported for twelve measures, with all twelve demonstrating sufficient inter-rater and intra-rater reliability. Criterion validity for all fifteen measures was reported, with eight demonstrating sufficient validity and the remaining seven demonstrating indeterminate validity. No studies reported data on measurement error or responsiveness.
CONCLUSIONS: Several performance-based measures of physical function across the domains of exercise capacity, strength, balance and general functional capacity may have sufficient reliability and criterion validity when administered via telehealth. However, the evidence is of low to very low quality, reflecting the small number of studies conducted and the small sample sizes included in the studies. Future research is needed to explore the measurement error, responsiveness, interpretability and feasibility of these measures administered via telehealth.


Year:  2022        PMID: 36083879      PMCID: PMC9462578          DOI: 10.1371/journal.pone.0274349

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.752


Introduction

Chronic health conditions can lead to significant disability, mortality and reduced quality of life [1]. In 2019, on average, more than one-third of adults aged 16 and over in 26 OECD (Organisation for Economic Co-operation and Development) countries reported living with a chronic health condition [2]. The ageing of the Western world and the increasing prevalence of chronic conditions present a significant socioeconomic burden and will continue to challenge health care services [3, 4]. Rehabilitation has been identified as an integral aspect of chronic condition management, enabling people living with chronic health conditions to manage their condition independently and to improve their physical function and quality of life [5-8]. Although in-person rehabilitation is considered the default service delivery method, healthcare services lack the capacity to meet the increasing demand for these programmes. Uptake among patients has also traditionally been poor due to barriers such as travel and time limitations [9, 10]. Offering rehabilitation via digital platforms (telerehabilitation) may increase service accessibility and overcome barriers to traditional face-to-face programmes. Furthermore, telerehabilitation is as clinically effective as face-to-face rehabilitation for several chronic populations [11-13]. The COVID-19 pandemic presented challenges for rehabilitation service providers, resulting in a dramatic increase in the use of telehealth. This accelerated shift towards an alternative method of service delivery allowed health care services to maintain accessibility and ensure continuity of patient care. Despite the evidence supporting its efficacy, both patients and healthcare providers have shown resistance to the adoption of telehealth [14].
One of the challenges that has limited the adoption of telehealth is the perceived difficulty of assessing patients remotely, in particular the administration of performance-based measures via telehealth platforms and uncertainty regarding the accuracy and reliability of these measures [15-17]. The use of standardised performance-based measures in clinical assessment is an important element of evidence-based rehabilitation and clinical practice [18, 19], informing diagnosis, clinical decision making, intervention planning and goal setting [15, 20]. Regular measurement of performance-based physical function during rehabilitation programmes therefore facilitates objective monitoring and evaluation of the effectiveness of the intervention. The reliability and validity of measures administered via telehealth have been explored in recent systematic, scoping and rapid reviews in musculoskeletal [15, 21] and chronic cardiac and respiratory [22, 23] populations. Zischke et al. [24] also conducted a review examining various clinical assessments conducted via telehealth. Overall, these reviews supported the feasibility of assessment via telehealth and highlighted the reliability and validity of several performance-based measures across domains such as range of motion, strength, endurance, aerobic capacity, balance, gait and functional assessment. However, the existing evidence on performance-based measures is limited and tends to focus on specific patient cohorts rather than considering all domains across all populations with chronic conditions. Furthermore, some of the existing reviews included patient-reported outcomes such as pain intensity, or pain response during special orthopaedic tests.
While evidence demonstrates that electronic patient-reported measures are equivalent to paper-based self-reported measures when administered in various chronic populations [25-28], there is limited evidence exploring the psychometric properties and equivalence of performance-based measures administered via telehealth when compared to face-to-face administration. Therefore, a comprehensive overview of a wide range of performance-based measures relevant to a variety of chronic neurological, respiratory and musculoskeletal conditions administered via telehealth is required. To our knowledge, this is the first review using the Consensus-Based Standards for the Selection of Health Measurement Instruments (COSMIN) approach to evaluate the reliability and validity of performance-based measures of physical function across a broad range of chronic health conditions.

Methods

This systematic review protocol was registered with PROSPERO (Registration number: CRD42021262547). The review was conducted in accordance with COSMIN methodology, a robust and transparent approach that aims to improve the selection of measurement instruments.

Search strategy

A comprehensive search strategy was developed, reviewed and refined by the authors, with the assistance of a health librarian, in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [29] (S1 Fig). An electronic database search of PubMed, EMBASE, CINAHL and PsycINFO via EBSCOhost was conducted on the 28th of June 2022. Key search terms were developed using four individual search filters:

Population: chronic conditions OR chronic disease OR chronic health OR chronic illness OR long term illness OR long term disability OR long term condition
Construct: physical function OR physical performance OR functional capacity OR physical capacity
Measurement Instrument: assessment OR evaluation OR outcome OR measure OR test
Context: telehealth OR telerehabilitation OR telemedicine OR e-health

These individual filters were combined with the COSMIN search filter for measurement properties [30] to create the search strategy outlined in S2 Fig. Hand-searching of the reference lists of the included articles was also performed to identify additional relevant articles.

Eligibility criteria

Studies were included in the review if they met the following criteria:

Population: adults (≥18 years) diagnosed with any chronic health condition, defined by the ICD-10-CM [31] as a condition that lasts more than 12 months, results in the need for ongoing medical intervention and limits self-care, independent living and social interaction. Studies including a mixed sample of acute and chronic populations were included if at least 80% of the sample had a chronic diagnosis.
Construct: the evaluated measure was a performance-based measure of physical function, as defined by the World Health Organisation (WHO) [32] International Classification of Functioning, Disability and Health (ICF) framework as activities which relate to the ability to move around and perform daily activities (e.g. strength, balance).
Measurement instrument: an established performance-based measure of physical function, commonly used in clinical practice, which was evaluated synchronously by a tester as the activity was being performed by the individual. This usually involved evaluation by timing, counting or distance methods [33].
Setting: the evaluated measure was administered by a tester located remotely from the patient using any telehealth platform, as defined by WHO as "the delivery of health care services, where patients and providers are separated by distance. Telehealth uses information and communication technologies (ICT) for the exchange of information for the diagnosis and treatment of diseases and injuries."
Measurement properties: as specified in our pre-registered protocol, studies must have reported one or more of the psychometric measurement properties from the COSMIN taxonomy [30]. For studies examining the validity of the measurement instrument administered via telehealth, the comparator was a face-to-face administration of the same measurement instrument.
Because the comparator was always a face-to-face administration of the same measure, the measurement properties of interest when extracting data from the selected studies were reliability, measurement error and criterion validity. The remaining measurement properties outlined in the COSMIN taxonomy, including other forms of validity and interpretability, were therefore not considered outcomes of interest in this review. Studies were excluded if (1) the evaluated measure was a self-reported measure of physical function, a laboratory value (e.g., VO2 max, spirometry) indirectly used to assess physical function, or a self-administered measure that did not involve administration and evaluation by an independent tester; or (2) the study population consisted of post-operative patients, since post-operative pain and disability levels differ in magnitude and stability from chronic conditions. A sample of 30% of abstracts from the database search was initially screened by two independent reviewers (CBW & RC) to determine potential eligibility. As good agreement (>80%) was achieved, the remaining abstracts were screened by one reviewer (CBW). Thereafter, a sample of 30% of the full texts of potentially eligible studies was reviewed by two independent reviewers (CBW & RC) to determine eligibility. Any disagreements were resolved through discussion with a third reviewer (KOS). As above, good agreement was achieved, and the remaining studies were reviewed by one reviewer (CBW).
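The two-stage screening rule described above (dual review of a 30% sample, with single-reviewer screening continuing only if agreement exceeds 80%) can be sketched as follows. This is an illustrative sketch only; the function name and example data are ours, not from the review.

```python
def percent_agreement(reviewer_a, reviewer_b):
    """Percentage of records on which two screeners made the same include/exclude call."""
    if len(reviewer_a) != len(reviewer_b):
        raise ValueError("Both reviewers must screen the same records")
    matches = sum(a == b for a, b in zip(reviewer_a, reviewer_b))
    return 100 * matches / len(reviewer_a)

# Hypothetical example: include/exclude decisions for 10 sampled abstracts.
# The two reviewers disagree on one record, giving 90% agreement, so the
# remaining abstracts would proceed to single-reviewer screening.
a = [True, True, False, False, True, False, False, True, False, False]
b = [True, True, False, False, True, False, True, True, False, False]
agreement = percent_agreement(a, b)
single_reviewer_ok = agreement > 80
```

Disagreements on the sampled records would still be resolved by discussion with a third reviewer, as the text describes.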

Data extraction

Data were extracted by two independent authors (CBW & KOS) using a table created by the authors following COSMIN guidance [34]. Firstly, the characteristics of the included studies and the performance-based measures evaluated within the studies were extracted. Thereafter, data relating to the evaluation of the methodological quality of the included studies and the evaluation of the measurement properties (i.e. strength of correlations/associations) were also extracted. The included performance-based measures were categorised according to the domain of physical function that they measured. These domains included exercise capacity, functional strength, functional balance and general functional capacity.

Methodological quality of included studies

The methodological quality of each included study was evaluated by two independent authors (CBW & KOS) using the COSMIN risk of bias checklist, with scores determined by consensus [34]. This tool contains separate standards for each measurement property (i.e., reliability, measurement error and criterion validity) that can be used to determine the trustworthiness of the result. Each standard was rated and the 'worst-score-counts' method was applied to determine the overall quality of each measurement property reported in the included studies [34].

Evaluation of the measurement properties reported in the included studies

The COSMIN methodology was used to evaluate the measurement property results reported in each of the included studies [34]. These results were evaluated according to the criteria for good measurement properties (strength of correlations/associations with the reference standard, the face-to-face administration of the measure) to give a rating of sufficient (+), indeterminate (?) or insufficient (-) for each measurement property, as described by Prinsen et al. [34] (see S1 Table). For the reliability domain, inter-rater and intra-rater reliability were evaluated by comparing the scores for the measure administered via telehealth between different raters, and when administered by the same rater at two different time points. As recommended, a threshold of 0.70 on the intraclass correlation coefficient (ICC) or weighted kappa was used to evaluate the reliability of the measure administered via telehealth [34]: reliability was rated as sufficient if the ICC or weighted kappa was ≥ 0.70, insufficient if it was < 0.70, and indeterminate if it was not reported. For the criterion validity domain, the measure administered via telehealth was compared to the same measure administered face-to-face, using the same threshold of 0.70 for the correlation with the reference standard [34], which for the purpose of this review was the face-to-face administration: validity was rated as sufficient if the correlation was ≥ 0.70, insufficient if it was < 0.70, and indeterminate if correlations were not reported.
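The rating rules above reduce to a single decision rule per measurement property. A minimal sketch (not an official COSMIN tool; the function name is illustrative):

```python
def cosmin_rating(coefficient):
    """Rate reliability or criterion validity from an ICC or weighted kappa,
    per the thresholds described above: >= 0.70 is sufficient (+), < 0.70 is
    insufficient (-), and a missing statistic is indeterminate (?)."""
    if coefficient is None:          # ICC / weighted kappa not reported
        return "?"
    return "+" if coefficient >= 0.70 else "-"

# e.g. a telehealth measure correlating 0.90 with its face-to-face
# administration would be rated sufficient.
rating = cosmin_rating(0.90)  # -> "+"
```

This is why studies reporting only mean differences rather than correlations could receive at best an indeterminate rating, as noted in the Results.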

Data synthesis and analysis

To synthesise the results, the evidence was summarised per measurement property (e.g., reliability, validity) per outcome measure to reach an overall conclusion regarding the reliability and validity of the measures. If multiple studies examined the same measure, the results of the studies were synthesised to achieve an overall result. In the case of inconsistency between studies (e.g., both sufficient and insufficient results were found), explanations for the inconsistency were explored. When inconsistent results could be attributed to varying study quality as previously described, the results of lower quality studies were omitted, only the higher quality results were used to determine the overall rating, and the quality of the summarised evidence was downgraded for inconsistency. If no logical explanation for the inconsistency was found, the results were considered inconsistent. The modified GRADE approach was applied by two independent reviewers, with disagreements resolved by consensus, to summarise how confident we can be that the summarised evidence is trustworthy [35]. The summarised evidence was graded as high, moderate, low or very low based on the following four criteria: 1. Risk of bias (quality of the studies); 2. Inconsistency (of the results of the studies); 3. Imprecision (total sample size of the studies); and 4. Indirectness (evidence from populations other than the population of interest). Detailed instructions on the use of the modified GRADE approach to grade the quality of the summarised evidence can be found in S2 Table. The starting point assumed that the summarised result was of high quality; this was downgraded by one, two or three levels depending on the risk of bias, and further downgraded for inconsistency, imprecision and indirectness as appropriate.
When inconsistency existed between the results of the included studies examining the same measurement instrument, the results were summarised as sufficient or insufficient and the quality of the evidence was downgraded for inconsistency by one or two levels depending on the severity of the inconsistency. As the severity of inconsistency between results is context dependent, the level of severity was discussed and decided by the review team in each situation. For imprecision, the evidence was downgraded one level if the total sample size was 50–100 individuals, and two levels if it was less than 50 individuals. As this review only included studies in which >80% of the population had a chronic diagnosis, the risk of bias associated with indirectness did not exist and the evidence was therefore not downgraded for indirectness.
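The grading procedure above can be sketched as a simple downgrade count. This is a simplified illustration of the modified GRADE logic, not an official implementation: the number of levels deducted for risk of bias and inconsistency is a judgement made per the guidance, so they are supplied here as inputs, while the imprecision rule follows the sample-size thresholds just stated.

```python
GRADES = ["high", "moderate", "low", "very low"]

def imprecision_levels(total_n):
    """Downgrade two levels for a total sample below 50, one level for 50-100."""
    if total_n < 50:
        return 2
    if total_n <= 100:
        return 1
    return 0

def grade_evidence(risk_of_bias_levels, inconsistency_levels, total_n):
    """Start at 'high' and downgrade; indirectness is omitted because it was
    never applied in this review (every sample was >80% chronic)."""
    downgrades = (risk_of_bias_levels + inconsistency_levels
                  + imprecision_levels(total_n))
    return GRADES[min(downgrades, len(GRADES) - 1)]

# e.g. one inadequate-quality study (one level) with n = 17 -> "very low",
# while one very-good-quality study with n = 17 -> "low".
reliability_grade = grade_evidence(1, 0, 17)
validity_grade = grade_evidence(0, 0, 17)
```

These two examples reproduce the pattern seen in the summary of findings, where small samples alone pull otherwise well-rated evidence down to low or very low.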

Results

The initial search yielded 9,906 articles, of which 7,377 remained after duplicates were removed. Five articles met the inclusion criteria and were deemed eligible for inclusion in the review. Fig 1 outlines the search results and screening process using a PRISMA flow diagram [29].
Fig 1

Process of identification, screening and exclusion of studies according to the PRISMA statement [29].

Study characteristics

A summary of the descriptive characteristics of each included study and the included performance-based measures is presented in Table 1. Four of the included studies were conducted in Australia [36-39], with the remaining study conducted in the United States [40]. A total of 77 individuals were included in the review with sample sizes ranging from 10–26 participants. Two studies included patients with Parkinson’s Disease [38, 39] and the remaining three studies included patient cohorts with stroke, cystic fibrosis and heart failure respectively [36, 37, 40].
Table 1

Characteristics of included studies.

Cox et al. 2013 [37]
Country: Melbourne, Australia
Performance measure: 3-minute step test
Physical function domain: Exercise capacity
Telehealth environment: Administered using a synchronous videoconferencing platform by a clinician in a separate room to the participant within the same building
Equipment required: 15 cm high step, metronome, pulse oximeter
Participant population: N = 10 adults with cystic fibrosis recruited prospectively on admission to hospital; N = 5 males, N = 5 females; mean FEV1 = 55.4% of predicted (range = 38–90% of predicted)
Mean age ± SD: 32 ± 7 years
Rater population: Not reported
Measurement properties assessed: Criterion validity

Hoffmann et al. 2008 [38]
Country: Queensland, Australia
Performance measures: FIM (motor components), UPDRS (selected items), Nine Hole Peg Test, grip strength, pinch strength
Physical function domains: Functional strength, functional capacity
Telehealth environment: Administered using a synchronous videoconferencing platform by a clinician in a separate room to the participant
Equipment required: Hand-held dynamometer, pinch gauge
Participant population: N = 12 community-dwelling participants with Parkinson's Disease with adequate cognitive status to participate in assessment tasks; N = 6 males, N = 6 females; N = 6 tested via telehealth, N = 6 tested face-to-face
Mean age ± SD: 66.1 ± 8.5 years
Rater population: N = 3 assessors
Measurement properties assessed: Inter-rater reliability, intra-rater reliability, criterion validity

Hwang et al. 2017 [36]
Country: Brisbane, Australia
Performance measures: TUGT, 6MWT, grip strength
Physical function domains: Functional balance and mobility, exercise capacity, functional strength
Telehealth environment: Administered using synchronous videoconferencing by a clinician in a separate room within the same hospital building
Equipment required: TUGT: stopwatch, 45 cm high chair with arm rests, 3 m walk track, regular footwear ± mobility aid. 6MWT: 30 m track, stopwatch, automatic sphygmomanometer, finger pulse oximeter, lap counter. Grip strength: hand-held dynamometer
Participant population: N = 17 patients with chronic heart failure; 88% males, 12% females
Mean age ± SD: 69 ± 12 years
Rater population: N = 4 hospital physiotherapists with an average of 11.5 years of work experience in physiotherapy
Measurement properties assessed: Inter-rater reliability, intra-rater reliability, criterion validity

Palsbo et al. 2007 [40]
Country: United States of America
Performance measures: European Stroke Scale, Functional Reach Test
Physical function domains: Functional capacity, functional balance
Telehealth environment: Administered using a synchronous videoconferencing platform by a clinician in a separate room to the participant
Equipment required: European Stroke Scale: examination table. Functional Reach Test: large yard stick
Participant population: N = 26 patients with a history of stroke, including both inpatients and outpatients; N = 18 males, N = 18 females; time since stroke range 2 months–15 years, mean = 2.7 years
Median age: 64 years
Rater population: N = 4 physiotherapists from rehabilitation hospitals, all with at least 2 years of experience using telehealth to support onsite physiotherapists for a variety of patient assessments
Measurement properties assessed: Criterion validity

Russell et al. 2013 [39]
Country: Queensland, Australia
Performance measures: TUGT, step test, steps in 360 degree turn, timed stance test, Berg Balance Scale, lateral reach test, functional reach test
Physical function domain: Functional balance and mobility
Telehealth environment: Administered using a synchronous videoconferencing platform by a clinician in a separate room to the participant
Equipment required: TUGT: stopwatch. Lateral and functional reach tests: calibrated assessment tool
Participant population: N = 12 people with Parkinson's Disease with adequate cognitive status to participate in assessment tasks; N = 6 males, N = 6 females; mean age at diagnosis = 53.5 years (SD = 9.0, range = 38–69 years); average years since diagnosis = 6.8 (SD = 4.4, range 2–15 years)
Mean age ± SD (range): 66.1 ± 8.5 years (45–76 years)
Rater population: N = 1 final-year physiotherapy student and N = 2 occupational therapy students
Measurement properties assessed: Inter-rater reliability, intra-rater reliability, criterion validity

FIM = Functional Independence Measure, UPDRS = Unified Parkinson's Disease Rating Scale, TUGT = Timed Up and Go Test, 6MWT = 6 Minute Walk Test, FEV1 = Forced Expiratory Volume in one second, N = sample size, SD = standard deviation, cm = centimetres, m = metres

Measurement properties of 15 performance-based physical function measures were investigated in the included studies. Inter-rater and intra-rater reliability were reported for 12 of the measures, while the criterion validity of all 15 measures was reported. No studies reported data on measurement error or responsiveness. Of the 15 performance-based measures, seven assessed balance (Timed Up and Go Test, functional and lateral reach tests, Berg Balance Scale, step test, steps in 360 degree turn, and timed stance test) [36, 39, 40], two assessed exercise capacity (3 minute step test and 6 minute walk test) [36, 37] and two assessed functional strength (grip and pinch strength) [36, 38]. The remaining four measures assessed diverse aspects of functional capacity: the Functional Independence Measure (FIM) [38], of which only the motor components were assessed (bathing, dressing, toileting, walking, stairs, eating, grooming, bladder management, toilet transfers, bowel management, bed/chair transfers, tub/shower transfers); the Unified Parkinson's Disease Rating Scale (UPDRS) [38], of which relevant items were assessed (posture, gait, sensory complaints, falling, freezing when walking, tremor, tremor at rest, salivation, facial expression, bradykinesia, speech, action or postural hand tremor, handwriting); the Nine Hole Peg Test [38]; and the European Stroke Scale [40]. The COSMIN risk of bias scores for the measurement properties of the measures in each included study are displayed in Table 2. Of the studies that reported the reliability of the included measures, ten measures demonstrated adequate quality while three demonstrated inadequate quality.
Of the studies that reported the criterion validity of the measures, eight measures demonstrated very good quality and ten demonstrated inadequate quality. Many of the studies reporting the criterion validity of the included measures received an inadequate quality rating as per COSMIN guidance because the correlation with the reference standard was not calculated [41] (e.g. mean differences between measures administered via telehealth and face-to-face administration were reported instead of correlations).
Table 2

Measurement properties of performance-based measures.

Exercise Capacity

6MWT [36]
Reliability: inter-rater ICC2,1 >0.99, intra-rater ICC1,1 >0.99; time interval: same day; COSMIN risk of bias: inadequate (both); overall rating: + / +
Criterion validity: ICC1,1 (95% CI) 0.90 (0.74–0.96), MD (95% CI) 4 metres (-25 to 17); COSMIN risk of bias: very good; overall rating: +

3 min Step Test [37]
Reliability: not reported
Criterion validity: MD lowest SpO2 0.2% (LoA -3.4 to 3.6%), MD rate of perceived exertion 0.5 points (LoA -1.1 to 2.1 points), MD heart rate -0.6 beats/min (LoA -11.3 to 10.1 beats/min); COSMIN risk of bias: inadequate; overall rating: ?

Strength Tests

Grip Strength [38]
Reliability: not reported
Criterion validity: authors report "no differences" observed; COSMIN risk of bias: inadequate; overall rating: ?

Grip Strength [36]
Reliability: inter-rater ICC2,1 >0.99, intra-rater ICC1,1 >0.99; time interval: same day; COSMIN risk of bias: inadequate (both); overall rating: + / +
Criterion validity: right hand ICC1,1 (95% CI) 0.94 (0.84–0.98), left hand ICC1,1 (95% CI) 0.96 (0.89–0.98); COSMIN risk of bias: very good; overall rating: + / +

Pinch Strength [38]
Reliability: not reported
Criterion validity: authors report "no differences" observed; COSMIN risk of bias: inadequate; overall rating: ?

Balance Tests

Berg Balance Scale [39]
Reliability: inter-rater ICC2,1 ≥0.96, intra-rater ICC2,1 ≥0.98; time interval: 2 months; COSMIN risk of bias: adequate (both); overall rating: + / +
Criterion validity: Kappa 0.94, %EA 16.7, %A ±1 75.0; COSMIN risk of bias: very good; overall rating: +

TUGT [36]
Reliability: inter-rater ICC2,1 0.95 (0.86–0.98), intra-rater ICC1,1 0.96 (0.90–0.99); time interval: same day; COSMIN risk of bias: inadequate (both); overall rating: + / +
Criterion validity: ICC1,1 (95% CI) 0.85 (0.64–0.94), MD (95% CI) 0.24 (-0.56 to 1.03) seconds; COSMIN risk of bias: very good; overall rating: +

TUGT [39]
Reliability: inter-rater ICC2,1 ≥0.96, intra-rater ICC2,1 ≥0.98; time interval: 2 months; COSMIN risk of bias: adequate (both); overall rating: + / +
Criterion validity: LoA -1.25 to 1.24, clinically acceptable limit 5.00, MD -0.01, SD 0.63, MAD 0.47; COSMIN risk of bias: inadequate; overall rating: ?

Step Test [39]
Reliability: inter-rater ICC2,1 ≥0.96, intra-rater ICC2,1 ≥0.98; time interval: 2 months; COSMIN risk of bias: adequate (both); overall rating: + / +
Criterion validity: right foot Kappa 0.97, %EA 75.0, %A ±1 83.3; left foot Kappa 0.95, %EA 66.7, %A ±1 83.3; COSMIN risk of bias: very good (both); overall rating: + / +

Functional Reach Test [39]
Reliability: inter-rater ICC2,1 ≥0.96, intra-rater ICC2,1 ≥0.98; time interval: 2 months; COSMIN risk of bias: adequate (both); overall rating: + / +
Criterion validity: LoA -2.71 to 0.69, clinically acceptable limit 4.74, MD -1.01, SD 0.87, MAD 1.01; COSMIN risk of bias: inadequate; overall rating: ?

Functional Reach Test [40]
Reliability: not reported
Criterion validity: no significant difference between results (Z = -0.239, p > 0.05); 92% of participants scored within 95% agreement limits; COSMIN risk of bias: inadequate; overall rating: ?

Steps in 360 degree turn [39]
Reliability: inter-rater ICC2,1 ≥0.96, intra-rater ICC2,1 ≥0.98; time interval: 2 months; COSMIN risk of bias: adequate (both); overall rating: + / +
Criterion validity: right foot Kappa 0.98, %EA 75.0, %A ±1 100.0; left foot Kappa 0.97, %EA 66.7, %A ±1 91.7; COSMIN risk of bias: very good (both); overall rating: + / +

Lateral Reach Test [39]
Reliability: inter-rater ICC2,1 ≥0.96, intra-rater ICC2,1 ≥0.98; time interval: 2 months; COSMIN risk of bias: adequate (both); overall rating: + / +
Criterion validity: MD -0.79, SD 0.66, LoA -2.09 to 0.51, clinically acceptable limit 4.74, MAD 0.82; COSMIN risk of bias: inadequate; overall rating: ?

Timed Stance Test [39]
Reliability: inter-rater ICC2,1 ≥0.96, intra-rater ICC2,1 ≥0.98; time interval: 2 months; COSMIN risk of bias: adequate (both); overall rating: + / +
Criterion validity: LoA -4.17 to 5.06, clinically acceptable limit 8.00, MD 0.44, SD 2.35, MAD 1.58; COSMIN risk of bias: inadequate; overall rating: ?

Functional Capacity Tests

FIM (motor components) [38]
Reliability: inter-rater ICC2,1 0.95, intra-rater ICC2,1 0.94; time intervals: 1 week and 2 months; COSMIN risk of bias: adequate (both); overall rating: + / +
Criterion validity: Kappa 0.93, %EA 91.6%, %A ±1 98.7%; COSMIN risk of bias: very good; overall rating: +

UPDRS (selected items) [38]
Reliability: inter-rater ICC2,1 0.80, intra-rater ICC2,1 0.84; time intervals: 1 week and 2 months; COSMIN risk of bias: adequate (both); overall rating: + / +
Criterion validity: Kappa 0.81, %EA 73.4%, %A ±1 95.2%; COSMIN risk of bias: very good; overall rating: +

European Stroke Scale [40]
Reliability: not reported
Criterion validity: no significant difference between results (Z = -0.239, p > 0.05); COSMIN risk of bias: inadequate; overall rating: ?

Nine Hole Peg Test [38]
Reliability: inter-rater ICC2,1 0.99, intra-rater ICC2,1 0.99; time intervals: 1 week and 2 months; COSMIN risk of bias: adequate (both); overall rating: + / +
Criterion validity: right hand MD 0.25 seconds (SD 0.90), LoA -2.02 to 1.52, MAD 0.68 seconds; left hand MD 0.14 seconds (SD 0.61), LoA -1.34 to 1.05, MAD 0.45 seconds; COSMIN risk of bias: inadequate (both); overall rating: ? / ?

FIM = Functional Independence Measure, UPDRS = Unified Parkinson's Disease Rating Scale, TUGT = Timed Up and Go Test, 6MWT = 6 Minute Walk Test, %EA = Percent exact agreement, %A ±1 = Percent agreement within one point on ordinal scale, SD = Standard deviation, MAD = Mean absolute difference, ICC = intraclass correlation coefficient, MD = Mean difference, LoA = Limits of agreement, + = sufficient rating, ? = indeterminate rating


Overall rating and quality of evidence

A summary of the overall rating and quality of evidence per measurement property of the included measures is presented in Table 3. These scores were derived from the information displayed in Table 2, which included the rating and the COSMIN risk of bias score. Twelve measures received 'sufficient' overall ratings for reliability, with a 'very low' quality of evidence score. For criterion validity, eight measures received 'sufficient' overall ratings and seven received 'indeterminate' ratings, all scored as 'low' or 'very low' quality of evidence. For example, the Six Minute Walk Test (6MWT) demonstrated sufficient reliability (ICC > 0.70) with a 'very low' quality of evidence score, due to the inadequate COSMIN risk of bias rating of the included study and the small sample size (n < 50). The 6MWT also demonstrated 'sufficient' validity (correlation with face-to-face administration > 0.70) with a 'low' quality of evidence score, due to the 'very good' COSMIN risk of bias rating of the included study and the small sample size (n < 50). The 3-minute step test demonstrated 'indeterminate' validity because no correlation with the reference standard (face-to-face administration) was reported, meaning insufficient information was available to assign a rating according to the COSMIN guidance. The summary scores for the validity of the Timed Up and Go Test (TUGT) and grip strength reflect adjustments made to allow for inconsistencies in the results reported in the included studies. For example, the validity of the TUGT was reported in two studies and received an overall 'sufficient' validity rating (correlation > 0.70) [36], with 'very low' quality of evidence due to the inadequate COSMIN risk of bias score of the included studies, the small sample size (<50), and the inconsistency between the validity findings of the included studies [36, 39].
Table 3

Summary of findings.

Reliability

Measure | Summary Result | Overall Rating | Quality of Evidence
6MWT | ICC>0.99; sample size: 17 | Sufficient | Very Low (one inadequate study, sample <50–100)
Step Test | ICC≥0.96; sample size: 12 | Sufficient | Very Low (one adequate study, sample <50–100)
Grip Strength | ICC>0.99; sample size: 17 | Sufficient | Very Low (one inadequate study, sample <50–100)
Berg Balance Scale | ICC≥0.96; sample size: 12 | Sufficient | Very Low (one adequate study, sample <50–100)
TUGT | ICC>0.95; total sample size: 29 | Sufficient | Very Low (multiple studies of at least inadequate quality, sample <50–100, consistent results)
Functional Reach Test | ICC≥0.96; sample size: 12 | Sufficient | Very Low (one adequate study, sample <50–100)
Steps in 360 degree turn | ICC≥0.96; sample size: 12 | Sufficient | Very Low (one adequate study, sample <50–100)
Lateral Reach Test | ICC≥0.96; sample size: 12 | Sufficient | Very Low (one adequate study, sample <50–100)
Timed Stance Test | ICC≥0.96; sample size: 12 | Sufficient | Very Low (one adequate study, sample <50–100)
FIM (motor components) | ICC range 0.94–0.95; sample size: 12 | Sufficient | Very Low (one adequate study, sample <50–100)
UPDRS (selected items) | ICC range 0.80–0.84; sample size: 12 | Sufficient | Very Low (one adequate study, sample <50–100)
Nine Hole Peg Test | ICC 0.99; sample size: 12 | Sufficient | Very Low (one adequate study, sample <50–100)

Criterion Validity

Measure | Summary Result | Overall Rating | Quality of Evidence
6MWT | ICC 0.90, mean difference of 4; sample size: 17 | Sufficient | Low (one very good study, sample <50–100)
Step Test | Kappa range 0.95–0.97, %EA ≥66.7, %A±1 83.3; sample size: 12 | Sufficient | Low (one very good study, sample <50–100)
3-minute Step Test | MD SpO2 0.2%, MD rating of perceived exertion 0.5 points, MD heart rate -0.6 beats/min; sample size: 10 | Indeterminate | Very Low (one inadequate study, sample <50–100)
Grip Strength | Right hand: ICC1,1 (95%CI) 0.94 (0.84–0.98); left hand: ICC1,1 (95%CI) 0.96 (0.89–0.98); authors report “no differences” observed; total sample size: 29 | Sufficient | Very Low (multiple studies of at least inadequate quality, sample <50–100, inconsistent results)
Pinch Strength | Authors report “no differences” observed; sample size: 12 | Indeterminate | Very Low (one inadequate study, sample <50–100)
Berg Balance Scale | Kappa 0.94, %EA 16.7, %A±1 75.0; sample size: 12 | Sufficient | Low (one very good study, sample <50–100)
TUGT | ICC 0.85, MD 0.24 seconds, LoA -1.25 to 1.24, CAL 5.00, MD -0.01, SD 0.63; total sample size: 29 | Sufficient | Very Low (multiple studies of at least inadequate quality, sample <50–100, inconsistent results)
Functional Reach Test | LoA -2.71 to 0.69, CAL 4.74, MD -1.01, SD 0.87, MAD 1.01; no significant difference between results (Z = -0.239, p>0.05), 92% of participants scored within 95% agreement limits; total sample size: 29 | Indeterminate | Very Low (multiple studies of at least inadequate quality, sample <50–100, consistent results)
Steps in 360 degree turn | Kappa range 0.97–0.98, %EA ≥66.7, %A±1 ≥91.7; sample size: 12 | Sufficient | Low (one very good study, sample <50–100)
Lateral Reach Test | MD -0.79, SD 0.66, LoA -2.09 to 0.51, CAL 4.74, MAD 0.82; sample size: 12 | Indeterminate | Very Low (one inadequate study, sample <50–100)
Timed Stance Test | LoA -4.17 to 5.06, CAL 8.00, MD 0.44, SD 2.35, MAD 1.58; sample size: 12 | Indeterminate | Very Low (one inadequate study, sample <50–100)
FIM (motor components) | Kappa 0.93, %EA 91.6, %A±1 98.7; sample size: 12 | Sufficient | Low (one very good study, sample <50–100)
UPDRS (selected items) | Kappa 0.81, %EA 73.4, %A±1 95.2; sample size: 12 | Sufficient | Low (one very good study, sample <50–100)
European Stroke Scale | No significant difference between results (Z = -0.239, p>0.05); sample size: 26 | Indeterminate | Very Low (one inadequate study, sample <50–100)
Nine Hole Peg Test | MD range 0.14–0.25 seconds, SD range 0.61–0.90, MAD range 0.45–0.68 seconds; sample size: 12 | Indeterminate | Very Low (one inadequate study, sample <50–100)

MD = Mean difference, MAD = Mean absolute difference, ICC = Intraclass correlation coefficient, CAL = Clinically acceptable limits, LoA = Limits of agreement, %EA = Percent exact agreement, %A±1 = Percent agreement within one point on ordinal scale, SD = Standard deviation, FIM = Functional Independence Measure, UPDRS = Unified Parkinson’s Disease Rating Scale, TUGT = Timed Up and Go Test, 6MWT = 6 Minute Walk Test


Exercise capacity measures

Measures of exercise capacity included in this review were the 6MWT and the three-minute step test. The 6MWT demonstrated sufficient reliability and criterion validity when administered via telehealth. The evidence for telehealth administration of the three-minute step test remains undetermined: no information was available examining its reliability, and the evidence for its criterion validity was indeterminate due to non-optimal analysis. Therefore, a recommendation for the use of this instrument via telehealth cannot yet be made. However, the mean differences between the telehealth and face-to-face assessments observed by Cox et al. [37] were very small, suggesting no meaningful difference between the two administration methods. These three-minute step test results are therefore encouraging.

Functional strength measures

The grip strength test demonstrated sufficient reliability and criterion validity. Evidence for the pinch strength measure administered via telehealth is yet to be determined due to the lack of information available.

Functional balance measures

Seven measures of functional balance were included in the review. The measures with the most robust results, demonstrating sufficient reliability and criterion validity, were the Berg Balance Scale, TUGT, Step Test and Steps in 360 degree turn test. The remaining measures (Functional Reach Test, Lateral Reach Test and Timed Stance Test) all demonstrated sufficient reliability; however, their criterion validity when administered via telehealth compared with face-to-face administration could not be determined due to non-optimal analysis. The mean difference of -1.01 between telehealth and face-to-face administration of the Functional Reach Test observed by Russell et al. [39] lies within the limits of agreement of -2.71 to 0.69 and within the clinically acceptable limit of 4.74 cm [42], supporting telehealth administration of this test [43]. Similarly, the mean difference of -0.79 observed between telehealth and face-to-face administration of the Lateral Reach Test [39] is within the reported limits of agreement (-2.09 to 0.51) and the clinically acceptable limit (4.74 cm) [42], which supports administering this measure via telehealth. Finally, the mean difference of 0.44 [39] between telehealth and face-to-face administration of the Timed Stance Test is also within the limits of agreement (-4.17 to 5.06) and less than the clinically acceptable limit (8.00 seconds). Therefore, it can reasonably be assumed that telehealth administration of the Timed Stance Test is valid compared with face-to-face administration.
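The agreement checks described above can be reproduced from the reported summary statistics. The sketch below recomputes the 95% Bland-Altman limits of agreement as MD ± 1.96 × SD of the paired differences and compares them against a clinically acceptable limit; `within_cal` is a hypothetical helper for illustration, not a function used in the review, and small rounding differences from the published limits are expected:

```python
# Sketch: recompute Bland-Altman 95% limits of agreement (LoA) from the
# reported mean difference (MD) and standard deviation (SD) of the paired
# telehealth vs face-to-face differences, then check both limits against a
# clinically acceptable limit (CAL). Illustrative only.

def limits_of_agreement(mean_diff: float, sd_diff: float) -> tuple[float, float]:
    """95% limits of agreement: MD +/- 1.96 * SD of the differences."""
    half_width = 1.96 * sd_diff
    return mean_diff - half_width, mean_diff + half_width

def within_cal(mean_diff: float, sd_diff: float, cal: float) -> bool:
    """True if both limits of agreement fall inside +/- CAL."""
    lower, upper = limits_of_agreement(mean_diff, sd_diff)
    return abs(lower) <= cal and abs(upper) <= cal

# Functional Reach Test figures reported by Russell et al. [39]:
lower, upper = limits_of_agreement(-1.01, 0.87)
print(round(lower, 2), round(upper, 2))  # -2.72 0.7 (published: -2.71 to 0.69)
print(within_cal(-1.01, 0.87, 4.74))     # True: within the 4.74 cm CAL
```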

Functional capacity measures

Other measures included in the review assessed various aspects of general functional capacity: the European Stroke Scale, Unified Parkinson’s Disease Rating Scale, Functional Independence Measure and Nine Hole Peg Test. The most robust results were reported for the Functional Independence Measure and the Unified Parkinson’s Disease Rating Scale, which both demonstrated sufficient reliability and criterion validity. However, as the Unified Parkinson’s Disease Rating Scale is a population-specific measure, the Functional Independence Measure may be more appropriate across various chronic populations. While the Nine Hole Peg Test demonstrated sufficient reliability, its criterion validity was indeterminate due to non-optimal analysis. However, the mean differences of 0.25 seconds (right hand) and 0.14 seconds (left hand) observed between telehealth and face-to-face administration of the measure were both within the limits of agreement of -2.02 to 1.52 (right hand) and -1.34 to 1.05 (left hand), which is encouraging. Evidence for the reliability and criterion validity of the European Stroke Scale administered via telehealth remains undetermined due to the lack of available information and non-optimal analysis.

Discussion

This systematic review identified five studies examining the psychometric properties of fifteen performance-based measures of physical function administered via telehealth among people with various chronic conditions. Overall, there is low to very low quality evidence demonstrating sufficient reliability and criterion validity for a range of measures across the domains of exercise capacity, strength, balance and functional capacity when administered via telehealth and compared with face-to-face administration. The overall quality of evidence was low to very low, reflecting the small number of studies, the small sample sizes of the included studies, and non-optimal analyses (i.e. failure to correlate scores with the reference standard of face-to-face administration) as per the COSMIN risk of bias tool. The findings of sufficient reliability and criterion validity when administered via telehealth mirror those reported when many of these measures are administered among chronic populations in a face-to-face environment, including the 6MWT [44, 45], the grip strength test [46, 47], and the Berg Balance Scale, TUGT and Step Test [48-52]. As per COSMIN guidance, to demonstrate ‘sufficient’ validity a measure administered via telehealth must correlate >0.70 with the same measure administered in a face-to-face setting. While several of the included measures appeared to be valid when compared to face-to-face administration, the correlations were not calculated, so there was insufficient information to classify their criterion validity as ‘sufficient’ under the COSMIN standards; the quality of the included studies was also downgraded for this reason. These findings should therefore be interpreted with caution. Although the included studies did not report on all measurement properties for each measure, sufficient evidence was reported for the reliability and criterion validity of some measures across several domains.
No evidence regarding measurement error or responsiveness was reported in the included studies.

Strengths and limitations

Strengths of this review include the prospective protocol registration, adherence to PRISMA guidance, and the use of two reviewers for screening, shortlisting and data extraction. A particular strength is the use of the COSMIN approach, which had not been applied in previous psychometric evaluations of performance-based measures administered via telehealth. Some limitations must also be acknowledged. As previously stated, two independent reviewers screened a sample of 30% of abstracts and relevant full texts to determine eligibility. As good agreement was achieved on this 30% sample, the remaining screening was not performed in duplicate. Although the quality of the summarised evidence was rated using the modified GRADE approach by two independent reviewers, neither reviewer was formally trained in this method. Due to the heterogeneous nature of the included measures and study populations, a meta-analysis could not be performed and the results could not be quantitatively summarised. As most results could not be combined, the best evidence synthesis was largely based on single studies. Further evidence might have been identified from studies of post-operative populations, such as individuals after total knee arthroplasty [53]. However, these were excluded because pain and disability levels immediately post-operatively, and the relatively rapid changes in these levels, differ considerably from other chronic conditions in which physical function may be more stable over time. Limited information was reported on the characteristics of the included samples in relation to aspects such as socioeconomic status, cognitive status and technological literacy. In addition, the included studies were all carried out in countries with ‘very high’ Human Development Index scores [54]. These factors could limit the external validity of the findings. Although Cox et al. [37] reported on the usability of the three-minute step test administered via telehealth, and Hwang et al. [36] reported some information on the number and nature of technical issues encountered during telehealth administration of the 6MWT, TUGT, and grip and pinch strength tests, the included studies otherwise reported limited information on the interpretability and feasibility of the included measures when administered via telehealth. As noted in the eligibility criteria, this review examined the validity of measures administered via telehealth compared with face-to-face administration of the same measure. Other types of validity, such as content and construct validity, were therefore not reported in this review. Given the eligibility criteria and aims of this review, the outcomes of interest were reliability, measurement error and criterion validity. For this reason, we followed COSMIN recommendations for evaluating reliability, measurement error and criterion validity, and chose a priori not to evaluate other types of validity, internal structure, interpretability and feasibility.

Clinical implications

Encouragingly, several performance-based measures of physical function across different domains (e.g. exercise capacity, strength and balance) may have satisfactory reliability and criterion validity when used in a telehealth environment. Furthermore, the psychometric properties of these measures appear similar to those reported for the same measures when used in a face-to-face context. This should reassure clinicians that using performance-based measures of physical function via telehealth is possible. However, this evidence is of low to very low quality, and there is a significant lack of information regarding the measurement error and responsiveness of these measures. Furthermore, information regarding the interpretability and feasibility of the included measures was very limited [36, 37].

Future research

This systematic review highlights the need for further large, high-quality research, in line with COSMIN guidance, exploring the psychometric properties of performance-based measures of physical function administered via telehealth among people with various chronic conditions. In particular, more studies examining the measurement error, responsiveness, interpretability and feasibility of these instruments are required. Although some of the measures included in this review demonstrated sufficient reliability and validity, none were evaluated with respect to all measurement properties, so strong recommendations cannot yet be made. Additionally, there is a lack of studies exploring the administration of performance-based measures of physical function via telehealth among chronic musculoskeletal populations, highlighting the need for future studies in this population.

Conclusion

This review identified a wide range of performance-based measures assessing various domains of physical function that have been administered via telehealth among chronic populations. All of these measures appear to be reliable when used in a telehealth environment. Their validity is less certain, and there is no information regarding their measurement error or responsiveness. Further high-quality research is required to examine the psychometric properties of a core set of measures administered via telehealth among people with chronic health conditions, particularly regarding measurement error, responsiveness, feasibility and interpretability.

Supporting information

Preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines (PDF).

Search strategy (PDF).

Criteria for good measurement properties (PDF).

Instructions on the use of the modified GRADE approach (PDF).

(PDF) Click here for additional data file. 15 Jun 2022
PONE-D-22-14057
Psychometric properties of performance-based measures of physical function administered via telehealth among people with chronic conditions: A systematic review
PLOS ONE Dear Dr. Barry Walsh, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Additional Editor Comments: Please find the comments of the reviewers below. All of the reviewers indicated essential major and minor revisions. King Regards Please submit your revised manuscript by Jul 30 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Fatih Özden, PhD Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf [Note: HTML markup is below. Please do not edit.] 
Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Partly Reviewer #2: Yes Reviewer #3: Partly Reviewer #4: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: I Don't Know Reviewer #3: Yes Reviewer #4: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes Reviewer #4: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes Reviewer #4: Yes ********** 5. 
Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: Thank you for the opportunity to review this manuscript. The study represents an important piece of work evaluating measurement properties for measurement tools for physical function using the COSMIN checklist. The review appears to be robust and well conducted, and provides a clear summary of the reliability measurements undertaken in the literature. I found the description of the methods and findings around validity less clear and would suggest some significant modifications before this study is suitable for publication. 1) Use of the COSMIN Checklist: the references to the COSMIN checklist link to recommendations for patient reported outcome measures. The use of COSMIN methodology seems reasonable for these performance based (not self reported) tools, but needs to be justified in the methods and discussed in the study limitations 2) The COSMIN user manual that is referenced recommends a 10 step process for systematic review of measurement properties. For a comprehensive review such as this one I would expect statements describing the methodology and findings for: *Content validity including tool development *Internal structure *Remaining measurement properties (reliability, measurement error, criterion validity, hypothesis testing + responsiveness) *Interpretability and feasibility While some of these domains may not be directly applicable for the tools you have identified, and others may have no available data, I would suggest this should be acknowledged in your methods and then considered in your discussion as deviations from the COSMIN method. 
3) Study construct / measurement instrument selection: you describe the construct of interest as "performance-based physical function" and your measurement instruments of interest as "performance measures". Do you mean that the type of physical function you are interested in relates to performance, or that the measurement tools require an evaluation of performance (i.e. not self reported). I would suggest that you use performance-based for one of these uses but not both. 4) Evaluation of measurement properties. It would be helpful to have a description of the validity/reliability domains and the thresholds applied to decide if measurement properties of satisfactory/indeterminant and inadequate to guide a reader who is unfamiliar with COSMIN through your methods 5) 'Gold standard': The authors have used a gold standard of fact to face measurement for criterion validity. I would argue that this is more likely to be a reliability measurement, or even a hypothesis test of construct validity as it is possible that an invalid test for physical function may yield similar values if administered face to face or via telemedicine. While this is an important measurement property I do not feel that it is sufficient robust to use as gold standard. 6) Study characteristics: I would be interested to know where the patients were located for these validation studies. Were these patients at home (akin to telemedicine during the pandemic) or patients brought to a clinic or laboratory to participate in the study. This would have significant feasibility implications for patient-at-home telemedicine rehabilitation as discussed in the background. 7) Methodological quality of studies: I found this section unclear as study quality (risk of bias) seemed to blend with the adequacy of measurement properties reported. In addition, I am confused as to how studies could be selected for reporting criterion validity but then downgraded for not reporting correlation. 
How did these studies compare their telemedicine to face to face measurements? 8) Summary of findings: You report unidimensionality in this table but this has not been described in the background or methods. How dis you assess unidimensionality and is this relevant for single-item scores? 9) Discussion lines 323-377 are a well written description of the results and I feel would be better situated in your results section. 10) Clinical implications: Given the very low quality of evidence available I do not think you can conclude that any measures have adequate reliability and validity in this study. I think the implication should be toned down to several performance measures may have satisfactory reliability and validity. 11) Strengths and limitations: you discuss limitations in the studies included but not the potential limitations and mitigations from your study design and execution. These include: Use of the COSMIN checklist for non-PROM tools Not undertaking study screening and full text extraction in duplicate Not seeking indirect evidence of measurement properties from other populations Reviewer #2: Dear authors, I would like to congratulate you for your article that will contribute to the literature. A few things I can say about the study: 1- Correct the " COSMIN (Consensus-Based Standards for the Selection of Health Measurement Instrument)" phrase in lines 28 and 29 to " Consensus-Based Standards for the Selection of Health Measurement Instrument (COSMIN)". Correct other spellings like this throughout the manuscript. 2- Indicate which authors are the reviewers mentioned on lines 163-169. 3- Write the reference representations of the studies mentioned in Table 1 and Table 2. 4- Specify the definitions of abbreviations such as n, SD in Table 1. 
Reviewer #3: The manuscript that aimed to the investigate psychometric properties of performance-based measures of physical function administered via telehealth among people with chronic health conditions using the COSMIN (Consensus-Based Standards for the Selection of Health Measurement Instrument) approach. Overall, the study is well written. Authors should address the following concerns: 1- Search strategy should be updated. 2- Abstract: authors should report specific chronic conditions, telehealth environment and tools tested in the results and make conclusion on the quality of the current evidence supporting any investigate tool, if so. Besides, it is important to raise the potential issue related to the external validity (only high Human Development Index/HDI settings?). 3- Methods, selection of studies: Only 30% of abstracts and potential full texts were independently assessed by two independent reviewers in the selection of studies. Add it as a limitation in the discussion section. 4- Methods, page 10. Please add reference for the following statement: “As per COSMIN guidance, in order to demonstrate ‘sufficient’ validity the measure must 198 demonstrate >0.70 correlation with the ‘gold standard’.”. 5- Methods, synthesis: please clarify if it is appropriate to include studies not comparing with face to face. 6- Methods: criteria do downgrade que quality of the evidence in each domain of the GRADE approach should be specified. Was it conducted by two independent and trained reviewers? How did they resolve potential disagreements, if so? 7- Results: further description on characteristics of the samples is needed because it may be a potential external validity to be discussed. Participants: from the community? rural?, Socioeconomic status?, Cognitively tested?, and able to use mobile, computer, both? 8- Results, Table 1: Hwang et al 2017 conducted synchronous videoconferencing as all the other included studies? Please clarify in the Table. 
9- Discussion: authors should revise discussion section to accommodate comments. 10- References: replace reference 29 to “LB Mokkink, M Boers, CPM van der Vleuten, LM Bouter, J Alonso, DL Patrick, HCW de Vet, CB Terwee. COSMIN Risk of Bias tool to assess the quality of studies on reliability or measurement error of outcome measurement instruments: a Delphi study. BMC Medical Research Methodology. 2020;20(293).”. Is is the specific reference for performance-based outcome measures (PerFOMs). 11- References: reference for risk of bias should be “Lidwine B. Mokkink, Maarten Boers, CPM van der Vleuten, LM Bouter, Jordi Alonso, Donald L Patrick, HCW de Vet, CB Terwee. COSMIN Risk of Bias tool to assess the quality of studies on reliability or measurement error of outcome measurement instruments: a Delphi study. BMC Medical Research Methodology. 2020;20(293).” Besides, please revise if it was used correctly by two independent reviewers with disagreements resolved by consensus or a third reviewer. Reviewer #4: PONE-D-22-14057 Title: Psychometric properties of performance-based measures of physical function administered via telehealth among people with chronic conditions: A systematic review Thank you for the opportunity to review the above-mentioned manuscript. This article aims to examine the psychometric properties of performance-based measures of physical function administered via telehealth among people with chronic health conditions. I agree with the authors as they mention in the introduction, the necessity of such reviews for measures of chronic conditions. I would like to congratulate the authors for such impressively high-quality work. I absolutely enjoyed reading this work. I have only a few minor comments. Abstract- Results: please re-order according to frequency What about other types of validity such as content, face, and construct validity? Can you add a bit about these too? Also, you discuss reliability, which type of reliability was that? 
Introduction- This section was extremely well-written. Line 59- “The ageing nature of our population..” I agree this is true in many Western countries but not necessarily everywhere, therefore, I suggest changing our. Methods- Authors did a meticulous work in this section as well. Line 142- for the constructs, please add a few examples Line 154- "However, since the comparator was always face-to-face administration of the same measure, when extracting data from the selected studies the only relevant properties were reliability, measurement error and criterion validity." This is great explanation but can go into your discussion or limitation. I think here under measurement properties you must say that you planned to include any measurement property reported in the papers. Line 160- I think you can connect using just 'or' instead of and/or. This is discouraged in academic writing. Results- Line 252- minor grammatical error, use were instead of was. Line 254- I suggest adding something like ‘…Or other types of validity’ Line 255- Same comment as in the abstract, please re-order based on the frequency of the tests, or alphabetically. Discussion- Line 326- Consider re-writing: This evidence reflects the evidence supporting…Line 379- remove 'of' in following of PRISMA Line 400- I would be cautious to say that '....have satisfactory reliability and validity when used in a telehealth environment.' especially with validity since you only found data for criterion validity and other types were not addressed. ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. 
Reviewer #1: No
Reviewer #2: No
Reviewer #3: No
Reviewer #4: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

28 Jul 2022

We would like to thank you for giving us the opportunity to submit a revised version of the manuscript and express our thanks to the Reviewers for their constructive feedback and helpful suggestions. We believe that these revisions in response to the comments made by the Reviewers have resulted in an improved manuscript. We have uploaded our specific responses to each of the comments made by the reviewers in the file titled 'Response to Reviewers'.

Submitted filename: Response to Reviewers.docx

26 Aug 2022

Psychometric properties of performance-based measures of physical function administered via telehealth among people with chronic conditions: A systematic review
PONE-D-22-14057R1

Dear Dr. Barry Walsh,

We're pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.
Within one week, you'll receive an e-mail detailing the required amendments. When these have been addressed, you'll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Fatih Özden, PhD
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the "Comments to the Author" section, enter your conflict of interest statement in the "Confidential to Editor" section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed
Reviewer #3: All comments have been addressed
Reviewer #4: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above.
You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you for asking me to review this manuscript again. I have read the new submission and found all comments to be addressed.

Reviewer #2: (No Response)

Reviewer #3: The authors have clarified all comments and the manuscript has good quality to be published. I believe the readers will be interested.

Reviewer #4: I have no further comments. The requested revisions in my previous round were fully satisfied. The manuscript is technically sound and is in good shape for publication.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No
Reviewer #3: No
Reviewer #4: No

**********

30 Aug 2022

PONE-D-22-14057R1
Psychometric properties of performance-based measures of physical function administered via telehealth among people with chronic conditions: A systematic review

Dear Dr. Barry Walsh:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication.
For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Fatih Özden
Academic Editor
PLOS ONE
References: 46 in total

1.  Assessing functional exercise capacity using telehealth: Is it valid and reliable in patients with chronic heart failure?

Authors:  Rita Hwang; Allison Mandrusiak; Norman R Morris; Robyn Peters; Dariusz Korczyk; Trevor Russell
Journal:  J Telemed Telecare       Date:  2016-07-09       Impact factor: 6.184

2.  Measurement properties of the Timed Up & Go test in patients with COPD.

Authors:  Rafael Mesquita; Sarah Wilke; Dionne E Smid; Daisy Ja Janssen; Frits Me Franssen; Vanessa S Probst; Emiel Fm Wouters; Jean Wm Muris; Fabio Pitta; Martijn A Spruit
Journal:  Chron Respir Dis       Date:  2016-07-08       Impact factor: 2.444

3.  Adherence to Pulmonary Rehabilitation in COPD: A QUALITATIVE EXPLORATION OF PATIENT PERSPECTIVES ON BARRIERS AND FACILITATORS.

Authors:  Gabriela R Oates; Soumya J Niranjan; Corilyn Ott; Isabel C Scarinci; Christopher Schumann; Trisha Parekh; Mark T Dransfield
Journal:  J Cardiopulm Rehabil Prev       Date:  2019-09       Impact factor: 2.081

4.  Measurement Properties of the Hand Grip Strength Assessment: A Systematic Review With Meta-analysis.

Authors:  Pavlos Bobos; Goris Nazari; Ze Lu; Joy C MacDermid
Journal:  Arch Phys Med Rehabil       Date:  2019-11-13       Impact factor: 3.966

5.  The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study.

Authors:  Lidwine B Mokkink; Caroline B Terwee; Donald L Patrick; Jordi Alonso; Paul W Stratford; Dirk L Knol; Lex M Bouter; Henrica C W de Vet
Journal:  Qual Life Res       Date:  2010-02-19       Impact factor: 4.147

6.  Usefulness of the Berg Balance Scale in stroke rehabilitation: a systematic review.

Authors:  Lisa Blum; Nicol Korner-Bitensky
Journal:  Phys Ther       Date:  2008-02-21

7.  The utility of physiotherapy assessments delivered by telehealth: A systematic review.

Authors:  Cherie Zischke; Vinicius Simas; Wayne Hing; Nikki Milne; Alicia Spittle; Rodney Pope
Journal:  J Glob Health       Date:  2021-12-18       Impact factor: 4.413

8.  Telehealth and patient satisfaction: a systematic review and narrative analysis.

Authors:  Clemens Scott Kruse; Nicole Krowski; Blanca Rodriguez; Lan Tran; Jackeline Vela; Matthew Brooks
Journal:  BMJ Open       Date:  2017-08-03       Impact factor: 2.692

9.  'It's not hands-on therapy, so it's very limited': Telehealth use and views among allied health clinicians during the coronavirus pandemic.

Authors:  P Malliaras; M Merolli; C M Williams; J P Caneiro; T Haines; C Barton
Journal:  Musculoskelet Sci Pract       Date:  2021-02-05       Impact factor: 2.520

10.  Physiotherapists and patients report positive experiences overall with telehealth during the COVID-19 pandemic: a mixed-methods study.

Authors:  Kim L Bennell; Belinda J Lawford; Ben Metcalf; David Mackenzie; Trevor Russell; Maayken van den Berg; Karen Finnin; Shelley Crowther; Jenny Aiken; Jenine Fleming; Rana S Hinman
Journal:  J Physiother       Date:  2021-06-09       Impact factor: 7.000
