Katharine Gries1, Pamela Berry1, Magdalena Harrington2, Mabel Crescioni3, Mira Patel3, Katja Rudell4, Shima Safikhani5, Sheryl Pease6, Margaret Vernon5.
Abstract
BACKGROUND: In the development of patient-reported outcome (PRO) instruments, little documentation is provided on the justification of response scale selection. The selection of response scales is often based on the developers' preferences or therapeutic area conventions. The purpose of this literature review was to assemble evidence on the selection of response scale types in PRO instruments. The literature search was conducted in the EMBASE, MEDLINE, and PsycINFO databases. A secondary search was conducted on supplementary sources, including reference lists of key articles, websites for major PRO-related working groups and consortia, and conference abstracts. Evidence on the selection of verbal rating scale (VRS), numeric rating scale (NRS), and visual analogue scale (VAS) was collated based on pre-determined categories pertinent to the development of PRO instruments: reliability, validity, and responsiveness of PRO instruments, select therapeutic areas, and optimal number of response scale options.
Keywords: Literature review; Patient-reported outcome; Response option; Response scales
Year: 2018 PMID: 30238086 PMCID: PMC6127075 DOI: 10.1186/s41687-018-0056-3
Source DB: PubMed Journal: J Patient Rep Outcomes ISSN: 2509-8020
Literature review search terms
| No. | Type | Search Terms |
|---|---|---|
| Search #1 | ||
| #1 | Consensus/guideline/ review terms | ‘consensus’/exp. OR consensus:ab,ti OR ‘review’/exp. OR review:ab,ti OR ‘practice guideline’/exp. OR guideline*:ab,ti OR ‘expert opinion’:ab,ti NOT ‘institutional review board’ |
| #2 | Response scale terms | ‘response scale’:ab,ti OR ‘response scales’:ab,ti OR likert:ab,ti OR ‘likert scale’/exp. OR ‘visual analog scale’:ab,ti OR ‘visual analog scales’:ab,ti OR ‘visual analogue scale’:ab,ti OR ‘visual analog scale’/exp. OR ‘numerical rating scale’:ab,ti OR ‘numerical rating scales’:ab,ti OR ‘verbal rating scale’:ab,ti OR ‘verbal rating scales’:ab,ti OR ‘competence scale’:ab,ti OR ‘competence scales’:ab,ti OR ‘frequency scale’:ab,ti OR ‘frequency scales’:ab,ti OR ‘extent scale’:ab,ti OR ‘extent scales’:ab,ti OR ‘comparison scale’:ab,ti OR ‘comparison scales’:ab,ti OR ‘performance scale’:ab,ti OR ‘performance scales’:ab,ti OR ‘developmental scale’:ab,ti OR ‘developmental scales’:ab,ti OR ‘qualitative scale’:ab,ti OR ‘qualitative scales’:ab,ti OR ‘agreement scale’:ab,ti OR ‘agreement scales’:ab,ti OR ‘categorical scale’:ab,ti OR ‘categorical scales’:ab,ti |
| #3 | Selecting terms | select*:ab,ti OR choose:ab,ti OR criteria:ab,ti OR compare:ab,ti OR comparison:ab,ti |
| #4 | Human studies terms | ‘animal’/exp. NOT ‘human’/exp. |
| #5 | Clinical trial terms | ‘randomized controlled trial’/exp. OR ‘controlled clinical trial’/exp. OR ‘clinical trial’/exp. OR ‘phase 1 clinical trial’/exp. OR ‘phase 2 clinical trial’/exp. OR ‘phase 3 clinical trial’/exp. OR ‘phase 4 clinical trial’/exp. OR ‘multicenter study’/exp. OR random*:ab,ti OR placebo:ab,ti OR trial:ab,ti OR groups:ti OR (singl*:ab,ti OR doubl*:ab,ti OR trebl*:ab,ti OR tripl*:ab,ti AND (mask*:ab,ti OR blind*:ab,ti OR dumm*:ab,ti)) OR ‘double blind procedure’/exp. OR ‘single blind procedure’/exp. OR ‘random allocation’:ab,ti OR ‘open label’:ab,ti OR ‘open labeled’:ab,ti OR ‘open labelled’:ab,ti OR ‘placebo’/exp. OR ‘randomization’/exp. OR ‘crossover procedure’/exp. |
| #6 | Final encompassing terms | #1 AND #2 AND #3 NOT #4 NOT #5 AND ([article]/ |
| Search #2 | ||
| #7 | Comparison of scales terms | TI ((scale OR measure) N5 (compare* OR merit* OR evaluat* OR consider*)) OR AB ((scale OR measure) N5 (compare* OR merit* OR evaluat* OR consider*)) |
| #8 | Merits of scales terms | TI (scor* OR psychometric* OR responsive* OR “cross culture” OR “cross cultural” OR collect* OR “anchor placement” OR “data collection method” OR “internal consistency” OR “test retest” OR construct OR interrater OR standardization OR reliability OR validity OR sensitivity OR specificity OR “item response” OR “intraclass correlation”) OR AB (scor* OR psychometric* OR responsive* OR “cross culture” OR “cross cultural” OR collect* OR “anchor placement” OR “data collection method” OR “internal consistency” OR “test retest” OR construct OR interrater OR standardization OR reliability OR validity OR sensitivity OR specificity OR “item response” OR “intraclass correlation”) OR SU (scor* OR psychometric* OR responsive* OR “cross culture” OR “cross cultural” OR collect* OR “anchor placement” OR “data collection method” OR “internal consistency” OR “test retest” OR construct OR interrater OR standardization OR reliability OR validity OR sensitivity OR specificity OR “item response” OR “intraclass correlation”) |
| #9 | Review/consensus terms | TI (“expert opinion” OR “consensus development”) OR AB (“expert opinion” OR “consensus development”) OR DE “Literature Review” |
| #10 | Final encompassing terms | #2 AND #7 AND #8 NOT #9 NOT #4 NOT #5 AND ([article]/lim OR [article in press]/lim OR [review]/lim) AND [english]/lim AND [2004–2014]/py |
| Search #3 | ||
| #11 | PRO terms | ‘patient satisfaction’/exp. OR (patient* NEAR/2 satisfaction):ab,ti OR (patient* NEAR/2 reported):ab,ti OR ‘self report’/exp. OR (self NEAR/1 report*):ab,ti OR ‘patient preference’/exp. OR (patient* NEAR/2 preference*):ab,ti OR (patient* NEAR/1 assess*):ab,ti OR ‘self evaluation’:ab,ti OR ‘self evaluations’:ab,ti OR (patient* NEAR/2 rating):ab,ti OR (patient* NEAR/2 rated):ab,ti OR ‘self-completed’:ab,ti OR ‘self-administered’:ab,ti OR (self NEAR/1 assessment*):ab,ti OR ‘self-rated’:ab,ti OR ‘patient based outcome’:ab,ti OR ‘self evaluation’/exp. OR experience*:ab,ti |
| #12 | Format terms | format:ab,ti OR structur*:ab,ti OR ((multiple OR multi OR single OR number) NEAR/4 item*):ab,ti OR (anchor* NEAR/4 (wording OR item*)):ab,ti |
| #13 | Final encompassing terms | #2 AND #11 AND #12 NOT #4 AND ([article]/lim OR [article in press]/lim OR [review]/lim) AND [english]/lim AND [2004–2014]/py |
| Search #4 | ||
| #14 | Scoring/ psychometric properties | ‘instrumentation’/exp. OR ‘validation study’/exp. OR ‘reproducibility’/exp. OR reproducib*:ab,ti OR ‘psychometrics’ OR psychometr*:ab,ti OR clinimetr*:ab,ti OR clinometr*:ab,ti OR ‘observer variation’/exp. OR observer AND variation:ab,ti OR ‘discriminant analysis’/exp. OR reliab*:ab,ti OR valid*:ab,ti OR coefficient:ab,ti OR ‘internal consistency’:ab,ti OR (cronbach*:ab,ti AND (alpha:ab,ti OR alphas:ab,ti)) OR ‘item correlation’:ab,ti OR ‘item correlations’:ab,ti OR ‘item selection’:ab,ti OR ‘item selections’:ab,ti OR ‘item reduction’:ab,ti OR ‘item reductions’:ab,ti OR agreement OR precision OR imprecision OR ‘precise values’ OR test–retest:ab,ti OR (test:ab,ti AND retest:ab,ti) OR (reliab*:ab,ti AND (test:ab,ti OR retest:ab,ti)) OR stability:ab,ti OR interrater:ab,ti OR ‘inter rater’:ab,ti OR intrarater:ab,ti OR ‘intra rater’:ab,ti OR intertester:ab,ti OR ‘inter tester’:ab,ti OR intratester:ab,ti OR ‘intra tester’:ab,ti OR interobserver:ab,ti OR ‘inter observer’:ab,ti OR intraobserver:ab,ti OR ‘intra observer’:ab,ti OR intertechnician:ab,ti OR intratechnician:ab,ti OR ‘intra technician’:ab,ti OR interexaminer:ab,ti OR ‘inter examiner’:ab,ti OR intraexaminer:ab,ti OR ‘intra examiner’:ab,ti OR interassay:ab,ti OR ‘inter assay’:ab,ti OR intraassay:ab,ti OR ‘intra assay’:ab,ti OR interindividual:ab,ti OR ‘inter individual’:ab,ti OR intraindividual:ab,ti OR ‘intra individual’:ab,ti OR interparticipant:ab,ti OR ‘inter participant’:ab,ti OR intraparticipant:ab,ti OR ‘intra participant’:ab,ti OR kappa:ab,ti OR kappa’s:ab,ti OR kappas:ab,ti OR ‘coefficient of variation’:ab,ti OR repeatab* OR (replicab* OR repeated AND (measure OR measures OR findings OR result OR results OR test OR tests)) OR generaliza*:ab,ti OR generalisa*:ab,ti OR concordance:ab,ti OR (intraclass:ab,ti AND correlation*:ab,ti) OR discriminative:ab,ti OR ‘known group’:ab,ti OR ‘factor analysis’:ab,ti OR ‘factor analyses’:ab,ti OR ‘factor structure’:ab,ti OR 
‘factor structures’:ab,ti OR dimensionality:ab,ti OR subscale*:ab,ti OR ‘multitrait scaling analysis’:ab,ti OR ‘multitrait scaling analyses’:ab,ti OR ‘item discriminant’:ab,ti OR ‘interscale correlation’:ab,ti OR ‘interscale correlations’:ab,ti OR (error:ab,ti OR errors:ab,ti AND (measure*:ab,ti OR correlat*:ab,ti OR evaluat*:ab,ti OR accuracy:ab,ti OR accurate:ab,ti OR precision:ab,ti OR mean:ab,ti)) OR ‘individual variability’:ab,ti OR ‘interval variability’:ab,ti OR ‘rate variability’:ab,ti OR ‘variability analysis’:ab,ti OR (uncertainty:ab,ti AND (measurement:ab,ti OR measuring:ab,ti)) OR ‘standard error of measurement’:ab,ti OR sensitiv*:ab,ti OR responsive*:ab,ti OR (limit:ab,ti AND detection:ab,ti) OR ‘minimal detectable concentration’:ab,ti OR interpretab*:ab,ti OR (small*:ab,ti AND (real:ab,ti OR detectable:ab,ti) AND (change:ab,ti OR difference:ab,ti)) OR ‘meaningful change’:ab,ti OR ‘minimal important change’:ab,ti OR ‘minimal important difference’:ab,ti OR ‘minimally important change’:ab,ti OR ‘minimally important difference’:ab,ti OR ‘minimal detectable change’:ab,ti OR ‘minimal detectable difference’:ab,ti OR ‘minimally detectable change’:ab,ti OR ‘minimally detectable difference’:ab,ti OR ‘minimal real change’:ab,ti OR ‘minimal real difference’:ab,ti OR ‘minimally real change’:ab,ti OR ‘minimally real difference’:ab,ti OR ‘ceiling effect’:ab,ti OR ‘floor effect’:ab,ti OR ‘item response model’:ab,ti OR irt:ab,ti OR rasch:ab,ti OR ‘differential item functioning’:ab,ti OR dif:ab,ti OR ‘computer adaptive testing’:ab,ti OR ‘item bank’:ab,ti OR ‘cross-cultural equivalence’:ab,ti |
| #15 | Final encompassing terms | #2 AND #11 AND #3 AND #14 NOT #4 AND ([article]/lim OR [article in press]/lim OR [review]/lim) AND [english]/lim AND [2004–2014]/py |
| Search #5 | ||
| #16 | RA (fatigue) terms | ‘rheumatoid arthritis’/exp./mj AND ‘fatigue’/exp. OR (‘rheumatoid arthritis’:ab,ti AND fatigue:ab,ti) |
| #17 | Asthma terms | ‘asthma’/exp./mj OR asthma:ab,ti |
| #18 | Cognition terms | ‘cognition’/exp./mj OR cognition:ab,ti |
| #19 | Depression terms | ‘depression’/exp./mj OR depression:ab,ti |
| #20 | SCLC terms | ‘lung small cell cancer’/exp./mj OR ‘small cell lung cancer’:ab,ti |
| #21 | Pain terms | ‘pain’/exp./mj OR pain:ab,ti |
| #22 | Sub-final terms | #16 OR #17 OR #18 OR #19 OR #20 OR #21 |
| #23 | Final encompassing terms | #2 AND #11 AND #3 AND #22 NOT #4 AND ([article]/lim OR [article in press]/lim OR [review]/lim) AND [english]/lim AND [2004–2014]/py |
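The "final encompassing terms" rows above combine earlier numbered result sets with AND and NOT. As a rough illustration only, the sketch below models each numbered search as a set of record IDs and evaluates search #23 (`#2 AND #11 AND #3 AND #22 NOT #4`). The record IDs and the `combine` helper are hypothetical, and the publication-type, language, and year limits are not modeled.

```python
def combine(required, excluded):
    """Return records found in every `required` set and in none of the `excluded` sets."""
    result = set.intersection(*required)
    for records in excluded:
        result = result - records
    return result


# Hypothetical record IDs returned by each numbered search (not real data).
searches = {
    2: {101, 102, 103, 104, 105},   # response scale terms
    11: {101, 102, 103, 105},       # PRO terms
    3: {101, 102, 105, 106},        # selecting terms
    4: {105},                       # animal-only records to exclude
    22: {101, 102, 105, 107},       # therapeutic area terms
}

# Search #23: #2 AND #11 AND #3 AND #22 NOT #4
final_23 = combine([searches[k] for k in (2, 11, 3, 22)], [searches[4]])
print(sorted(final_23))
```

Because every AND step can only narrow the result and the NOT step removes records, the order of the AND terms does not affect the outcome in this simplified model.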
Fig. 1 Outline of search procedures and data extraction. PRO: patient-reported outcome
Key studies that support response scale selection for PRO instruments based on responsiveness
| Reference | Response Scale Type | Methods to Determine Responsiveness (a) | Summary of Results (b) |
|---|---|---|---|
| Grotle et al. 2004 [ | 11-point NRS; VAS | SRM | In acute pain, for improved patients NRS SRM = 2.0 and VAS SRM = 1.6. For unchanged patients NRS SRM = 1.0 and VAS SRM = −0.5. |
| Skovlund et al. 2005 [ | VAS: 100 mm line anchored at no pain/discomfort and pain/discomfort; 4-point VRS: none, mild, moderate, severe | Sensitivity of scales with multiple simulations | Cross-sectional analyses with multiple simulations to understand the sensitivity of scales. |
| Chanques et al. 2010 [ | 11-point NRS; 5-point VRS (no pain, mild pain, moderate pain, severe pain, extreme pain); VAS: 10-cm line anchored at no pain and extreme pain | ES; type of ES (Cohen’s d or SRM) not provided in the reference | Patients identified NRS was the easiest, most accurate and preferred scale in comparison with 5-point VRS and VAS. NRS demonstrated the best sensitivity (96.6%) and negative predictive value (89.6%) whereas VRS demonstrated the best specificity (70.7%) and positive predictive value (86.3%). VAS demonstrated the lowest performance, except for the negative predictive value, which was comparable to VRS. |
| Dogan et al. 2012 [ | Faces scale: 7-point horizontal scale that depicts feelings due to pain. The first face represents no pain and the last face represents the worst possible pain. VAS: 10-cm horizontal line anchored at no pain and severe pain. | Calculated ES (SRM) | Faces scale ES = 1.78 |
| Chien et al. 2013 [ | 11-point NRS (several different BPI scales) | SRM | Results for all participants: |
| Gonzalez-Fernandez et al. 2014 [ | VAS (100 mm line) | Between group difference | The mean (SD) VAS score was 6.13 (2.27) and the mean (SD) NRS score (after scaling to a 0–10 scale) was 4.35 (2.52), with medians of 7 and 4, respectively. |
PRO patient-reported outcome, NRS numeric rating scale, VAS visual analogue scale, SRM standardized response mean, VRS verbal rating scale, ES effect size, BPI Brief Pain Inventory, ODI Oswestry Disability Index, gLMS general Labeled Magnitude Scale, SD standard deviation
(a) SRM is calculated by dividing the mean change score by the standard deviation of the change scores. Effect size of 0.2 = small, 0.5 = moderate, and > 0.8 = large clinical change
(b) All references provided direct evidence: primary research that compares different response scales within study
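The SRM definition in the footnote above can be sketched in a few lines. This is a minimal illustration, not the method of any cited study: the scores are made up, the function names are hypothetical, and treating 0.2, 0.5, and 0.8 as lower cut points for small, moderate, and large is an assumption about how the footnote's thresholds are applied.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """SRM: mean of the paired change scores divided by their sample SD."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def interpret_effect_size(value):
    """Label the magnitude using 0.2 / 0.5 / 0.8 as assumed cut points."""
    magnitude = abs(value)  # the sign only reflects the direction of change
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.2:
        return "small"
    return "trivial"


# Hypothetical 0-10 NRS pain scores for five patients (not data from any study).
baseline = [7, 8, 6, 9, 7]
follow_up = [4, 5, 5, 6, 3]

srm = standardized_response_mean(baseline, follow_up)
print(round(srm, 2), interpret_effect_size(srm))
```

On a pain scale where lower scores mean improvement, the SRM is negative for improved patients, so the interpretation uses the absolute value.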
Key studies that support response scale selection used in PRO instruments based on select therapeutic areas
| References | Study Type, Evidence Type (a), Grade (b) | Response Scale Type | Objective | Summary of Results |
|---|---|---|---|---|
| Asthma | ||||
| Sherbourne et al. 2012 [ | Cross-sectional observational study, Indirect, C | 5-point VRS | Develop asthma-specific quality of life items | A 5-point VRS for asthma quality of life assessment in adults was understood based on qualitative research with patients (cognitive interviews). |
| Liu et al. 2007 [ | Cross-sectional observational study, Indirect, C | 4-point VRS | Develop and validate the Childhood Asthma Control Test (C-ACT) | Children between the ages of 4 and 11 could understand and complete a 4-point VRS assisted by facial graphics. |
| Cognition | ||||
| Hagell and Knutsson 2013 [ | Prospective, observational study, Direct, A | 5-point VRS and VAS | Compare test-retest properties of 2 general health single item response formats among people with neurological disorders | Test-retest reliability assessments were similar for both formats; however, patients preferred the VRS over the VAS format. |
| Depression | ||||
| Preston et al. 2011 [ | Cross-sectional observational study, Direct, A | 4-point VRS and 5-point VRS | Evaluate the precision of the 5-point VRS response scale utilized in the emotional distress PROMIS item bank | The 5-point response options are not always equally spaced (i.e., do not meet the assumptions of an equal interval scale) and 4-point response categories were as precise as five. |
| Lasch et al. 2012 [ | Cross-sectional observational study, Indirect, C | 11-point NRS | Develop a content valid PRO measure for Major Depressive Disorder (MDD) | Cognitive interview demonstrated that an 11-point NRS was well understood and appropriate for evaluating concepts. |
| Rheumatoid Arthritis (Fatigue) | ||||
| Hewlett et al. 2007 [ | Review, Indirect, B | VAS and NRS | Systematic literature review to identify fatigue in rheumatoid arthritis scales; assess scale measurement properties | A VAS scale was the most frequently utilized scale to evaluate fatigue in rheumatoid arthritis and shows evidence of validity but there was no standardized VAS scale to evaluate fatigue in rheumatoid arthritis as scales were study specific. NRS used to evaluate fatigue in rheumatoid arthritis showed some evidence of construct validity but data on criterion validity, reliability, or sensitivity were not found. |
| Nicklin et al. 2010 [ | Cross-sectional observational study, Direct, A | VAS and NRS | Develop and validate a patient reported outcome measure of fatigue in RA, the Bristol RA Fatigue- Multidimensional Questionnaire (BRAF-MDQ) and the Bristol RA Fatigue (BRAF) short scales (VAS/NRS) | The final wording for fatigue severity, effect, and coping VAS/NRS scales was based on focus group recommendations and required measurement properties. The VAS/NRS were understood by all patients in the way they were intended by the authors. Vertical orientation of the scales enhanced comprehension (rather than horizontal). |
| Khanna et al. 2008 [ | Prospective, observational study, Indirect, C | VAS | Evaluate score interpretation (MID) for a fatigue VAS | Mean MID estimates ranged from −0.82 to −1.12 for improvement and 1.13 to 1.26 for worsening (range of 0–10) for a fatigue VAS. These results were similar to those seen in RA clinical trials. |
| Oncology | ||||
| Koshy et al. 2004 [ | Cross-sectional, observational study, Direct, A | VAS, VRS, Graphical rating scales | Determine patient preferences for pain assessment scale type | Most patients (56%) preferred the pain VAS, 30% preferred the graphical (coin) rating scale, 13% preferred the VRS, and no patients preferred the graphical (color) scale. Findings of statistically significant positive correlations between the VAS and VRS suggest both represent similar pain intensity, and both could be used as reliable pain assessment tools. A single item VAS was recommended for evaluating pain in oncology patients because it is reliable and well understood, and preferred by most patients in this study. |
| Anderson et al. 2007 [ | Review, Indirect, B | VAS, VRS, and NRS | Review of pain assessment scales for use in an oncology population | Pain intensity ratings using the VAS, NRS, and VRS are highly inter-correlated. The NRS is easily understood by most patients, recommended in many pain treatment guidelines, and may be more reliable than the VAS in clinical trials, particularly with low literacy patients. |
| Rohan 2012 [ | Review, Indirect, B | VRS and 11-point NRS | Review of distress screening measures used in oncology | A review of the multi-item Hospital Anxiety and Depression Scale (HADS) and the Brief Symptom Inventory- 18 (BSI-18) scale, and a single item Distress Thermometer (11-point NRS) concluded the Distress Thermometer was as discriminative as the multi-item HADS and BSI-18. |
| Sigurdardottir et al. 2014 [ | Delphi-process, Indirect, D | NRS | Delphi process to obtain consensus on a basic set of core variables to describe or classify a palliative care cancer population | The 11-point NRS scale was recommended to evaluate important aspects of palliative care in cancer (e.g., appetite, depression, anxiety) and PRO instrument selection should always be undertaken with consideration of specific objectives, samples, treatments, and available resources. |
| King et al. 2014 [ | Prospective observational study, Direct, A | 11-point NRS and VAS | Determine optimal instrument to measure subjective symptom benefit in clinical trials of palliative | For an ovarian symptom PRO measure, the 11-point NRS was preferable over the VAS and VRS due to improved responsiveness, ease of use, and compliance. |
| Jacobs et al. 2013 [ | Prospective observational study, Indirect, C | Faces scale | Psychometric evaluation of a pediatric mucositis scale in cancer patients | For a pediatric mucositis scale in cancer patients ages 8 to 18, a Faces scale was found to be reliable, valid, and responsive. |
| Ng et al. 2012 [ | Cross-sectional, observational study, Direct, A | VAS, NRS, and Faces scales | Investigate correlations between, and patient preference for, pain assessment scales for use in an oncology population | The VAS, NRS, and Faces scale showed a high degree of association with intensity of pain making these scales appropriate for pain assessment in cancer. The Faces scale was preferred over the VAS and NRS and was superior to the NRS or VAS with cognitively impaired patients |
| Chordas et al. 2013 [ | Prospective observational study, Direct, A | 11-point NRS, VAS, VRS | Determine if a single item pain measure can accurately identify clinically significant pain in a pediatric brain cancer population | In a pediatric population of brain cancer patients, a multi-item measure with VRS was more precise than a single item disease thermometer (variation of 11-point NRS). |
| Banthia et al. 2006 [ | Prospective observational study, Direct, A | VAS and VRS | Comparison of daily versus weekly, unidimensional versus multidimensional measures of fatigue in a breast cancer population | A single item cancer fatigue VAS administered daily and weekly showed some discordance between the daily and weekly measurements, indicating they are not capturing the same information. The single item fatigue VAS showed greatest overlap with the general fatigue subscale of the multidimensional fatigue measure, suggesting the VAS item is a unidimensional measure of one aspect of fatigue. The decision to use a multidimensional or unidimensional measure of fatigue will depend upon the research question. |
| Grassi et al. 2013 [ | Cross-sectional, observational study, Indirect, C | NRS with Graphical component and multi-item measures | Validation and acceptance of the Distress Thermometer in an Italian cancer population | A distress thermometer (NRS with graphical component) was as specific and sensitive as multi-item measures and was slightly preferred by patients. |
VRS verbal rating scale, VAS visual analogue scale, NRS numeric rating scale, RA rheumatoid arthritis, PRO patient-reported outcome
(a) Direct evidence: Primary research that compares different response scales within study. Indirect evidence: Review or expert opinion based on empirical evidence or primary research that evaluates a single response scale type within the study
(b) Grade Key: A) Primary research: compares different response scales within study; B) Review or expert opinion: based on an empirical evidence base; C) Primary research: evaluates a single response scale type within the study; and D) Review or expert opinion, based on expert consensus, convention, or historical evidence
Key studies that support response scale selection used in PRO instruments based on optimal response set number
| Reference | Response Scale Type | Study Type, Evidence Type (a), Grade (b) | Study Population | Summary of Results | Conclusion |
|---|---|---|---|---|---|
| Cleopas et al. 2006 [ | Binary | Prospective study, Direct, A | 1996 adult patients discharged from the hospital in Switzerland | Superior reliability, assessed by Cronbach’s alpha and test-retest, and convergent and discriminant validity for the 5-point version compared to the binary or 3-point version in the Nottingham Health Profile (NHP). | 5-point VRS improved patient acceptability, reduced ceiling effects, and improved measurement properties |
| DeWalt et al. 2007 [ | 4-point VRS | Instrument development and/or validation study, Direct, A | Analysis of PROMIS items; pain, fatigue, emotional distress, physical function, and social function | Optimal response set number was somewhat dependent on the item and construct; 4 to 6 response options were typically optimal because this number both reduced cognitive burden for respondents and allowed each option to provide unique information. Investigators found that with response sets of greater than six choices, two or more options were typically collapsed to improve step-disorder and model fit. | Based on IRT analyses, 4-point to 6-point scales are recommended, depending on the item construct |
| Janssen et al. 2008 [ | 3-level | Instrument development and/or validation study, Direct, A | 81 adult respondents in a panel session | 5-level version had higher acceptability and comprehension and demonstrated superior reliability, validity, and discriminatory power. | 5-level reduced ceiling effect, increased benefit in the detection of mild problems and in measuring general population health |
| Chomeya 2010 [ | 5-point Likert | Instrument development and/or validation study, Direct, A | 180 undergraduate students from Mahasarakham University | The 6-point Likert scale had slightly better discrimination and reliability, assessed by Cronbach’s alpha, compared to a 5-point scale. | Both the 5-point and 6-point scales gave discrimination at an acceptable level per the standard of psychology tests |
| Rhodes et al. 2010 [ | 5-point Likert | Instrument development and/or validation study, Direct, A | 412 volunteer students in introductory psychology or physical education courses. | The 7-point scale (strongly disagree, moderately disagree, slightly disagree, undecided, slightly agree, moderately agree, strongly agree) had slightly higher reliability, assessed by Cronbach’s alpha, overall but predictive validity was largely comparable to the 5-point scale (strongly disagree, moderately disagree, undecided, agree, strongly agree). The 7-point scale demonstrated larger variability compared to the 5-point scale. | Either the 5-point or the 7-point scale is appropriate for use in scales for physical activity research |
| Bakshi et al. 2012 [ | 3-point Likert | Instrument development and/or validation study, Direct, A | Inpatients aged 50 years and above in Singapore ( | The 3-point versions (disagree, neutral, and agree) were comparable to the 5-point versions (strongly disagree, disagree, neutral, agree, and strongly agree); the scores performed similarly. The 3-point versions were not less reliable, assessed by Cronbach’s alpha, or discriminative. | The 3-point scale is acceptable if a simple scale is required |
| Leung and Xu 2013 [ | 5-point VRS | Review, Indirect, B | 7147 students (age 12 to 22 years) in Macau. 795 students in China. 844 secondary students in Macau. | Single item measures with an 11-point scale from 0 to 10 are closer to normality and interval scales, and have construct validity with major social constructs. | The 11-point scale was more normally distributed than the shorter scale options and had good validity. |
| Dumas et al. 2013 [ | 3-point VRS | Review, Indirect, B | Published literature for the Scale to Assess Unawareness of Mental Disorder (SUMD). | The 5-point scale was more informative and discriminative than a 3-point scale. | Authors state that further research is required to determine if a 3-point or 5-point scale should be used with the SUMD. |
| Janssen et al. 2013 [ | 3-level | Instrument development and/or validation study, Direct, A | 3919 adults with chronic conditions (cardiovascular disease, respiratory disease, depression, diabetes, liver disease, personality disorders, arthritis, and stroke) | For the 5-level system, the ceiling was reduced from 20.2% (3L) to 16.0% (5L). Absolute discriminatory power (Shannon index) improved considerably with 5L (mean 1.87 for 5L versus 1.24 for 3L), and relative discriminatory power (Shannon Evenness index) improved slightly (mean 0.81 for 5L versus 0.78 for 3L). Convergent validity with WHO-5 was demonstrated and improved slightly with 5L. Known-groups validity was confirmed for both 5L and 3L. | 5-level version had higher acceptability and comprehension and demonstrated superior reliability, validity, and discriminatory power. |
PRO patient-reported outcome, VRS verbal rating scale, NRS numeric rating scale
(a) Direct evidence: Primary research that compares different response scales within study. Indirect evidence: Review or expert opinion based on empirical evidence or primary research that evaluates a single response scale type within the study
(b) Grade Key: A) Primary research: compares different response scales within study; B) Review or expert opinion: based on an empirical evidence base; C) Primary research: evaluates a single response scale type within the study; and D) Review or expert opinion, based on expert consensus, convention, or historical evidence
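The Shannon index and Shannon Evenness index reported for Janssen et al. 2013 in the table above can be illustrated with a short sketch. It assumes the usual definitions, H' = −Σ p_i log2(p_i) over the observed response levels and J' = H' / log2(L) for a scale with L levels. The base-2 logarithm is an assumption, although it agrees with the reported means (1.24 / log2(3) ≈ 0.78 and 1.87 / log2(5) ≈ 0.81). The function names and response data are hypothetical.

```python
import math
from collections import Counter


def shannon_index(responses):
    """Absolute discriminatory power: H' = -sum(p * log2(p)) over observed levels."""
    total = len(responses)
    counts = Counter(responses)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def shannon_evenness(responses, n_levels):
    """Relative discriminatory power: H' divided by its maximum, log2(n_levels)."""
    return shannon_index(responses) / math.log2(n_levels)


# Hypothetical responses to one item (not data from any cited study).
three_level = [1] * 6 + [2] * 3 + [3] * 1
five_level = [1] * 4 + [2] * 3 + [3] * 1 + [4] * 1 + [5] * 1

for label, data, levels in (("3L", three_level, 3), ("5L", five_level, 5)):
    h = shannon_index(data)
    j = shannon_evenness(data, levels)
    print(f"{label}: H' = {h:.2f}, J' = {j:.2f}")
```

H' rises when responses spread over more categories, while J' corrects for the number of available levels, which is why the table reports a large gain in the absolute index but only a slight gain in evenness.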