Shortening patient-reported outcome measures through optimal test assembly: application to the Social Appearance Anxiety Scale in the Scleroderma Patient-centered Intervention Network Cohort.

Daphna Harel, Sarah D Mills, Linda Kwakkenbos, Marie-Eve Carrier, Karen Nielsen, Alexandra Portales, Susan J Bartlett, Vanessa L Malcarne, Brett D Thombs.

Abstract

OBJECTIVES: The Social Appearance Anxiety Scale (SAAS) is a 16-item measure that assesses social anxiety in situations where appearance is evaluated. The objective was to use optimal test assembly (OTA) methods to develop and validate a short-form SAAS based on objective and reproducible criteria.
DESIGN: This study was a cross-sectional analysis of baseline data from adults enrolled in the Scleroderma Patient-centered Intervention Network (SPIN) Cohort.
SETTING: Adults in the SPIN Cohort in the present study were enrolled at 28 centres in Canada, the USA and the UK.
PARTICIPANTS: The SAAS was administered to 926 adults with scleroderma.
PRIMARY AND SECONDARY MEASURES: The SAAS, Brief Fear of Negative Evaluation II (BFNE II), Brief Satisfaction with Appearance Scale (Brief-SWAP), Patient Health Questionnaire-8 (PHQ-8) and Social Interaction Anxiety Scale-6 (SIAS-6) were collected, as well as demographic characteristics.
RESULTS: OTA methods identified a maximally informative shortened version for each possible form length between 1 and 15 items. The final shortened version was selected based on prespecified criteria for reliability, concurrent validity and statistically equivalent convergent validity with the BFNE II scale. A five-item short version was selected (SAAS-5). The SAAS-5 had a Cronbach's α of 0.95 and had high concurrent validity with the full-length form (r=0.97). The correlation of the SAAS-5 with the BFNE II was 0.66, which was statistically equivalent to that of the full-length form. Furthermore, the correlation of the SAAS-5 with the two subscales of the Brief-SWAP, and the SIAS-6, were statistically equivalent to that of the full-length form.
CONCLUSIONS: OTA was an efficient method for shortening the full-length SAAS to create the SAAS-5. © Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.

Keywords: generalized partial credit model; optimal test assembly; patient reported outcome measure; short form; systemic sclerosis

Year: 2019 PMID: 30798308 PMCID: PMC6398718 DOI: 10.1136/bmjopen-2018-024010

Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692

This study used optimal test assembly methods and equivalence testing to shorten the Social Appearance Anxiety Scale (SAAS) in patients with scleroderma. This method is data driven and reproducible, unlike many alternative methods for shortening questionnaires. The generalisability of findings is limited to adults with scleroderma and should be confirmed for other patient populations as well as the general population.

Introduction

Patient-reported outcome measures (PROs) that assess patient health, well-being and psychological status based on patient perspectives are increasingly a central component of clinical trials and cohort-based observational studies in health research.1 This can lead, however, to participants being asked to respond to many scales that each contain multiple items, which may be a burden for participants, increase research costs and contribute to poor quality data due to survey fatigue. To ameliorate this problem, researchers sometimes attempt to create shortened versions of PROs with scores that can perform as well or nearly as well as original full-length versions.2–5 In rare diseases, including systemic sclerosis (SSc), psychological impact can be substantial, and psychological measures are increasingly included in large, multisite studies. SSc is a rare autoimmune disorder characterised by thickening and fibrosis of the skin and internal organs.6 7 Changes in appearance are a hallmark of the disease and can include hypopigmentation and hyperpigmentation, digital ulcers, hand contractures, telangiectasias and altered facial features. Changes in appearance often occur in socially relevant areas (ie, hands and face) of the body and can have significant impacts on psychosocial functioning, in particular in social contexts.8 Adults with SSc report high rates of anxiety, with 64% reporting at least one anxiety disorder in their lifetime, and social anxiety being among the most common.9 Despite reports of appearance-related social discomfort, research in appearance-related social anxiety among adults with SSc is limited. The Social Appearance Anxiety Scale (SAAS)10 is a 16-item self-report measure that assesses fear of situations in which one’s appearance will be evaluated. 
The SAAS was recently validated in a large sample (n=938) of adults with SSc attending clinics in Canada, the USA and the UK.11 Consistent with previous studies,10 12–14 a unidimensional factor structure fit well among the total sample of adults with SSc. Internal consistency reliability as measured by Cronbach’s α was excellent in the total sample (α=0.96) and for limited (α=0.96) and diffuse (α=0.97) subtypes. Evidence of convergent validity was provided via moderate to large correlations between the 16-item SAAS and measures of social discomfort, fear of negative evaluation, social anxiety, symptoms of depression and dissatisfaction with appearance. In other studies, the SAAS has also demonstrated strong measurement properties in samples of university students,10 12 women with eating disorders13 and gay and bisexual men of colour.14 No studies, however, have examined whether all 16 items of the SAAS are necessary to achieve these measurement properties or whether it is possible to shorten this scale. Apparent redundancy between some of the 16 items suggests that there may be an opportunity for shortening (eg, item 7, ‘I am afraid people find me unattractive’, and item 16, ‘I am concerned that people think I am not good looking’). Historically, researchers have created shortened versions of PROs through either an expert-based, qualitative assessment of item content or by fitting a factor analysis model and removing items with minimal factor loadings or low item–total correlations.3 More modern techniques, such as item response theory,15 have been used to identify items that are problematic. However, these methods are often applied in a way that leaves the final selection of items in the shortened version to the researcher’s discretion, rather than systematically establishing prespecified cut-offs or using reproducible criteria.
Optimal test assembly (OTA) is a branch-and-bound, mixed integer programming procedure that relies on estimates obtained from an item response theory model to select an optimal subset of items that best satisfy objective, reproducible and prespecified constraints.16 OTA has been commonly used to create versions of high-stakes educational tests,17 but recently, a study demonstrated its use for the development of shortened versions of PROs in health research by shortening an 18-item hand function scale to six items while maintaining equivalent measurement properties to those of the full-length form.18 Furthermore, OTA was recently used to shorten the Patient Health Questionnaire-9 to a four-item shortened form.19 This procedure was also shown to be replicable and reproducible and to produce shortened forms of minimal length as compared with leading alternative methods.20 The objective of the present study was to apply OTA to develop a shortened version of the SAAS. We: (1) used OTA methods to generate maximally precise candidate short versions of the SAAS of each possible length; (2) selected the shortest possible version that performed similarly to the full-form SAAS in terms of prespecified reliability and validity criteria; and (3) assessed the convergent validity of the final selected shortened form as compared with that of the full-length form.

Material and methods

Participants and procedures

This study was a cross-sectional analysis of baseline data from adults enrolled in the Scleroderma Patient-centered Intervention Network (SPIN) Cohort21 who completed online study questionnaires from May 2014 to August 2016. Adults in the SPIN Cohort in the present study were enrolled at 28 centres in Canada, the USA and the UK. To be eligible for the SPIN Cohort, adults must be classified as having SSc according to 2013 American College of Rheumatology (ACR)/European League Against Rheumatism (EULAR) classification criteria22 and confirmed by a SPIN physician, be at least 18 years of age, have the ability to provide informed consent and be fluent in English, French or Spanish. Eligible adults are invited by the attending physician or a supervised nurse coordinator to participate in the cohort, and written informed consent is obtained. SPIN Cohort adults complete outcome measures via the internet on enrolment and subsequently every 3 months. Adults who completed all items of the SAAS and the Brief Fear of Negative Evaluation II (BFNE II) at baseline in English were included in the present study.

Measures

Demographic and medical variables

Age, gender, marital status, number of years since first non-Raynaud’s symptom, disease subtype (limited or diffuse) and modified Rodnan skin score23 were collected. Limited disease was defined as skin sclerosis confined to the limbs distal to the elbows and knees with or without face involvement. Diffuse disease was defined as skin sclerosis involving the limbs proximal to the elbows and knees with or without chest or trunk involvement.24 Demographic variables were self-reported, and SPIN physicians or nurse coordinators collected medical variables.

Social Appearance Anxiety Scale

The SAAS, a 16-item measure, was developed to assess the respondent’s anxiety surrounding situations in which one’s appearance may be evaluated. Response options for each item range from 1 (not at all) to 5 (extremely). The total score is calculated by summing across all items, after reverse coding the first item. Scores range from 16 to 80, with higher scores indicating greater fear. A study of adults with SSc found strong evidence for a one-dimension factor structure both in the total sample and when examined separately among adults with limited and diffuse SSc, internal consistency reliability and convergent validity.11
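The scoring rule described above can be sketched in a few lines. This is an illustrative sketch only, not code from the study: the function name `saas_total` is hypothetical, and reverse coding is assumed to map a response x on the 1 to 5 scale to 6 − x.

```python
def saas_total(responses, reverse_items=(0,)):
    """Summed SAAS score from 16 responses coded 1..5.

    reverse_items holds zero-based positions of reverse-coded items;
    position 0 is the first item, as described in the text.
    Reverse coding as 6 - x is an assumption of this sketch.
    """
    if len(responses) != 16:
        raise ValueError("expected 16 item responses")
    total = 0
    for position, score in enumerate(responses):
        if not 1 <= score <= 5:
            raise ValueError("responses must be between 1 and 5")
        total += (6 - score) if position in reverse_items else score
    return total


# Lowest possible fear: fully comfortable on item 1, 'not at all' elsewhere.
lowest = saas_total([5] + [1] * 15)
# Highest possible fear: not comfortable on item 1, 'extremely' elsewhere.
highest = saas_total([1] + [5] * 15)
```

Under these assumptions the two extreme response patterns reproduce the stated score range of 16 to 80.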

Brief Fear of Negative Evaluation II

The BFNE II is a 12-item measure that assesses the degree to which individuals worry about how they are perceived and evaluated by others.25 Response options for each item range from 1 (not at all characteristic of me) to 5 (extremely characteristic of me). Scores range from 12 to 60 with higher scores indicating greater fear of negative evaluation. A study of adults with SSc found strong evidence for a one-dimension factor structure, internal consistency reliability and convergent validity.26

Brief Satisfaction with Appearance Scale (Brief-SWAP)

The Brief-SWAP consists of two three-item subscales that measure dissatisfaction with appearance and social discomfort.27 Response options for each item range from 0 (strongly disagree) to 6 (strongly agree). Scores on each subscale range from 0 to 18, with higher scores indicating greater body image dissatisfaction. A study of adults with SSc found high internal consistency and strong convergent validity with the SAAS.11

Patient Health Questionnaire-8 (PHQ-8)

The PHQ-8 consists of eight items that measure depressive symptomology.28 Response options on each item range from 0 (not at all) to 3 (nearly every day), with a total score that ranges from 0 to 24. Higher scores indicate higher levels of depressive symptoms. A study of adults with SSc found high internal consistency and moderate convergent validity with the SAAS.11

Social Interaction Anxiety Scale-6 (SIAS-6)

The SIAS-6 assesses anxiety resulting from social interactions.29 Response options on six items range from 0 (not at all characteristic or true of me) to 4 (extremely characteristic or true of me), with total scores ranging from 0 to 24. A study of adults with SSc found excellent internal consistency and strong convergent validity with the SAAS.11

Statistical analysis

Item response theory model and OTA

Unidimensionality of the SAAS in this sample was confirmed previously using the same dataset as in the present study.11 Thus, a generalised partial credit item response theory model (GPCM) was fit to all 16 items of the SAAS.30 The GPCM estimates two types of parameters for each item: threshold parameters, which measure the level of anxiety at which people are more likely to endorse a higher category than the one below it, and discrimination parameters, which measure the strength of the association between that item and the underlying construct (in this case, social appearance anxiety). From these item-level parameters, item information functions are estimated for each of the 16 items, and summed pointwise to obtain the test information function (TIF). The TIF measures the total amount of Fisher’s information in the 16 items and is inversely related to the SE of measurement of the underlying construct. Thus, versions of a PRO with higher levels of test information result in greater precision in the measurements of the underlying construct.15 A set of 15 candidate shortened versions, one of each possible length between 1 item and 15 items, was generated through the OTA procedure. OTA uses a branch-and-bound approach through mixed integer linear programming to systematically explore the space of all possible shortened versions of a fixed length to optimise an objective function. In this case, the objective function was defined to be the height of the TIF, thus minimising the SE of measurement of the underlying construct. Therefore, for each possible length, the OTA procedure creates an optimal candidate shortened version of the PRO, defined by selecting the items that maximise the TIF across the latent spectrum of the underlying construct, as compared with all other possible shortened versions of the same length. 
Based on previously established guidelines, the OTA procedure was anchored at five points across the spectrum of the underlying construct (−3, –1, 0, 1, 3), jointly maximising the objective function at these points.16 Each of the 15 candidate short versions and the full-length form were scored using two procedures to obtain estimates of each participant’s level of anxiety surrounding situations in which one’s appearance will be evaluated. First, the summed scores across all items included in the form were calculated by adding item scores for each item included in the form. Second, factor scores, which estimate a level of a latent construct, were estimated from the GPCM for each participant for each form through an application of Bayes’ theorem. Although summed scores are typically relied on for clinical use, the factor scores were considered to provide a better estimate of the underlying construct. This is because of limitations of summed scores under the GPCM. Summed scores may result in an incorrect ordering of patients along the spectrum of the underlying construct. That is, patients with lower levels of fear may have higher summed scores than patients with higher levels of fear.31 32
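The item selection step can be illustrated with a small sketch. It is not the authors' implementation: it replaces branch-and-bound mixed integer programming with exhaustive enumeration, which is feasible for a 16-item pool, and it assumes a maximin objective with equal weights at the anchor points (maximise the lowest test information across anchors, breaking ties by total information). The information values below are hypothetical.

```python
from itertools import combinations


def best_short_form(info, length):
    """Pick the subset of `length` items with the best information profile.

    info[i][j] is the information of item i at anchor point j.
    Objective (an assumption of this sketch): maximise the minimum test
    information over the anchors, then the total as a tie-breaker.
    """
    n_anchors = len(info[0])
    best_subset, best_score = None, None
    for subset in combinations(range(len(info)), length):
        tif = [sum(info[i][j] for i in subset) for j in range(n_anchors)]
        score = (min(tif), sum(tif))
        if best_score is None or score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical information at five anchors (-3, -1, 0, 1, 3) for four items.
info = [
    [0.1, 0.5, 1.0, 0.5, 0.1],  # item 0: informative in the middle
    [0.6, 0.4, 0.2, 0.1, 0.0],  # item 1: informative at the low end
    [0.0, 0.1, 0.2, 0.4, 0.6],  # item 2: informative at the high end
    [0.1, 0.4, 0.9, 0.4, 0.1],  # item 3: nearly redundant with item 0
]
pair, pair_score = best_short_form(info, 2)
```

In this toy example, items 0 and 3 carry the most information individually, but they cover the same region of the construct, so the selected pair is items 1 and 2. This is the sense in which the procedure considers more than each item's discrimination alone.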

Selection of the final form

OTA generates optimal candidate short versions of the SAAS but does not provide criteria by which the final form should be selected. When items are eliminated from the full-length form, the amount of test information inherently decreases, and there is no obvious threshold at which a shortened version would be said to contain adequate information. Therefore, the selection of the final form was based on five criteria: reliability, concurrent validity based on summed scores, concurrent validity based on factor scores, convergent validity based on summed scores and convergent validity based on factor scores. Applying these five criteria concurrently ensured that the final selected shortened version maintained desirable measurement properties across these categories. First, the reliability of each candidate shortened version and the full-length form was assessed using Cronbach’s α coefficient. The shortened version was required to maintain at least 95% of the value of Cronbach’s α for the full-length form. Second, concurrent validity for both summed and factor scores was assessed by calculating Pearson’s correlation coefficient between the scores on each candidate shortened version and the scores on the full-length form. For both the summed and factor scores, these correlations were required to be at least 0.95, ensuring that the shortened version demonstrated high concurrent validity. Lastly, convergent validity was assessed through the correlation between each patient’s score on the SAAS and their score on the BFNE II. The candidate shortened versions were required to demonstrate statistical equivalence within a tolerance of 0.05 with the convergent validity of the full-length SAAS through an application of equivalence testing.
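The reliability and concurrent validity statistics named above follow standard formulas, sketched here in Python for illustration. The function names and the toy data are hypothetical; the study's own computations were carried out in R.

```python
import math
from statistics import mean, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two score vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy data: four respondents, three items (hypothetical values).
toy = [[1, 2, 1], [2, 2, 3], [4, 3, 4], [5, 5, 4]]
alpha_full = cronbach_alpha(toy)
# Concurrent validity of a two-item short form against the full summed score.
short_scores = [row[0] + row[2] for row in toy]
full_scores = [sum(row) for row in toy]
r_concurrent = pearson_r(short_scores, full_scores)
```

With these quantities in hand, the first two criteria reduce to simple comparisons: α of the short form against 0.95 times α of the full form, and the concurrent validity correlation against 0.95.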
Equivalence testing, more commonly used in clinical trials, tests whether the difference between two correlations is within a prespecified range, in this case set at 0.05.33 34 Contrary to traditional hypothesis testing, equivalence testing tests a null hypothesis that the difference between the two correlations is greater than the prespecified range, against an alternative hypothesis of equivalence within the prespecified range. To assess statistical significance, we applied the Benjamini-Hochberg correction procedure for each of the 30 hypothesis tests used (15 candidate shortened versions × two scoring procedures).35
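The multiplicity correction can be sketched as follows. This is a generic implementation of Benjamini-Hochberg adjusted p values, shown for illustration only; the equivalence test p values themselves are taken as given inputs, since the exact test statistic used in the study is not reproduced here. The example p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p values from four equivalence tests.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
# A form is declared equivalent when its adjusted p value is below the
# chosen level (0.05 is assumed here).
equivalent = [p < 0.05 for p in adjusted]
```

In the study, the same adjustment would be applied once across all 30 tests rather than separately to each.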

Post hoc convergent validity of the shortened form

Convergent validity of the selected shortened form was compared with that of the full-length form. Correlations were calculated between the summed scores of the selected shortened form and those of four other measures: the two subscales of the Brief-SWAP, the PHQ-8 and the SIAS-6. Statistical equivalence was assessed within a tolerance of 0.05 with the convergent validity of the full-length SAAS using Benjamini-Hochberg adjusted p values. All analyses were conducted in RStudio V.1.0.136.36 The GPCM was fit using the ltm package.37 The OTA analysis was conducted using the lpSolveAPI package.38

Patient involvement

SPIN was conceived by a collaboration of investigators and patients. SPIN’s Patient Advisory Board advises the SPIN Steering Committee on priorities for investigation. Patients were included in the SPIN Publication Committee, which reviewed the proposal for the present study and its methods. Two patients were coauthors of the present report.

Results

There were 926 people who completed both the SAAS and BFNE II. The mean age was 55.6 years, 88% were women and 43% had diffuse SSc. The mean±SD score on the SAAS was 28.3±13.2 (median 24). SAAS scores in adults with diffuse SSc were significantly higher than in adults with limited SSc (p<0.001). See table 1 for descriptive statistics.

Table 1

Patient demographic and disease characteristics (n=926)

Sociodemographic and medical variables	Values
Age*, years, mean±SD (range)	55.6±11.8 (18.6–84.9)
Women, n (%)	813 (88)
Married/cohabitating, n (%)	681 (74)
Time since the onset of the first non-Raynaud’s symptoms, years†, mean±SD (range)	11.8±8.9 (0.1–46.2)
Patients with diffuse SSc, n (%)	399 (43)
MRSS‡, mean±SD (range)	8.0±8.8 (0–48)
SAAS score, mean (median)±SD (range)	28.3 (24)±13.2 (16–80)
Mean score (median)±SD (range) in diffuse SSc subset	30.7 (26)±15.5 (16–80)
Mean score (median)±SD (range) in limited SSc subset	26.5 (22)±11.9 (16–79)
BFNE II, mean (median)±SD (range)	24.7 (21)±12.1 (12–60)
Brief-SWAP Dissatisfaction with Appearance*, mean (median)±SD (range)	9.25 (9)±5.24 (0–18)
Brief-SWAP Social Discomfort§, mean (median)±SD (range)	5.33 (4)±5.26 (0–18)
PHQ-8§, mean (median)±SD (range)	6.14 (5)±5.35 (0–24)
SIAS-6¶, mean (median)±SD (range)	2.43 (1)±3.81 (0–24)

Due to missing values: *N=922; †N=861; ‡N=722; §N=921; ¶N=920.

BFNE II, Brief Fear of Negative Evaluation II; Brief-SWAP, Brief Satisfaction with Appearance Scale; MRSS, modified Rodnan skin score; SAAS, Social Appearance Anxiety Scale; SIAS-6, Social Interaction Anxiety Scale-6; SSc, systemic sclerosis.

Item response theory model and OTA

The GPCM was fit to the 16 items of the SAAS. Table 2 shows the item content, along with the discrimination parameters estimated from the GPCM. The three items with the highest amount of discriminative ability and, therefore, the most influential on the TIF, were items 6, 7 and 13. The items with the least amount of discriminative ability and, therefore, the least influential on the TIF, were items 1, 2 and 15. Figure 1 shows the individual item information functions generated from the estimates from the GPCM and the TIF.

Table 2

SAAS items and discrimination parameters from the GPCM

Item	Description	Discrimination parameter
1	I feel comfortable with the way I appear to others.	0.58
2	I feel nervous when having my picture taken.	0.90
3	I get tense when it is obvious people are looking at me.	1.57
4	I am concerned people won’t like me because of the way I look.	2.12
5	I worry that others talk about flaws in my appearance when I am not around.	2.30
6	I am concerned that people will find me unappealing because of my appearance.	3.28
7	I am afraid people find me unattractive.	3.34
8	I worry that my appearance will make life more difficult for me.	2.03
9	I am concerned that I have missed out on opportunities because of my appearance.	1.58
10	I get nervous when talking to people because of the way I look.	3.14
11	I feel anxious when other people say something about my appearance.	2.31
12	I am frequently afraid that I won’t meet others’ standards of how I should look.	2.94
13	I worry people will judge the way I look negatively.	3.24
14	I am uncomfortable when I think others are noticing flaws in my appearance.	3.00
15	I worry that a romantic partner will/would leave me because of my appearance.	1.04
16	I am concerned that people think I am not good looking.	2.59

The SAAS is available in the public domain.

SAAS, Social Appearance Anxiety Scale.

Figure 1

Item and test information curves of the SAAS. The left hand plot shows the 16 individual item information curves. The right hand plot compares the test information functions of the full SAAS (solid line) and SAAS-5 (dashed line). SAAS, Social Appearance Anxiety Scale.

The OTA procedure generated 15 candidate short versions that each maximised the total amount of test information among all shortened versions of that length. Online supplementary appendix table 1 shows the items that were selected by the OTA procedure for each of the 15 candidate short versions. Items 6, 7 and 14 were included in all short forms of length at least three items. Although item 13 had a higher discrimination parameter estimate than item 14, it was not included in shortened forms of lengths shorter than 4. This is because the OTA procedure accounts for a more complete assessment of an item than just its discrimination parameter.20 That is, if two items have the same level of discrimination and provide information at the same point on the latent spectrum, then the OTA procedure may not select both items into the shortened form. Items 1, 2 and 15 were the first three items dropped from the candidate shortened versions. These items all had low information across the spectrum of social appearance anxiety.

Selection of the final shortened version

Table 3 presents Cronbach’s α values and concurrent validity correlations for the 15 candidate short forms and the full-length form. Even for shortened versions with very few items, the values of Cronbach’s α and the validity correlations remained high. Table 4 presents results of the equivalency tests for the convergent validity correlation with the BFNE II. The two-item shortened version, and all versions with at least four items, demonstrated statistically significant equivalency with the full SAAS in their correlations with the BFNE II for both summed and factor scores. All shortened versions with at least five items satisfied our prespecified criteria in terms of reliability, concurrent validity and convergent validity. Therefore, the five-item shortened version (SAAS-5, see online supplementary appendix table 2) was the shortest candidate version to fulfil our requirements. Versions shorter than the SAAS-5 failed to meet the criteria on concurrent validity for the factor scores from the GPCM.

Table 3

Properties of optimal shortened versions

Short form length	Cronbach’s α	Correlation of summed scores with full form score (95% CI)	Correlation of factor scores with full form score (95% CI)
1	NA	0.892 (0.878 to 0.904)	NA
2	0.859	0.947 (0.940 to 0.953)	0.928 (0.918 to 0.936)
3	0.921	0.957 (0.951 to 0.962)	0.928 (0.918 to 0.936)
4	0.937	0.967 (0.963 to 0.971)	0.946 (0.939 to 0.952)
5	0.947	0.969 (0.965 to 0.973)	0.952 (0.945 to 0.957)
6	0.953	0.975 (0.972 to 0.978)	0.958 (0.953 to 0.963)
7	0.959	0.978 (0.975 to 0.981)	0.964 (0.959 to 0.968)
8	0.962	0.981 (0.979 to 0.984)	0.972 (0.968 to 0.975)
9	0.965	0.983 (0.980 to 0.985)	0.975 (0.971 to 0.978)
10	0.967	0.984 (0.982 to 0.986)	0.977 (0.973 to 0.979)
11	0.969	0.987 (0.985 to 0.988)	0.980 (0.978 to 0.983)
12	0.969	0.990 (0.989 to 0.991)	0.988 (0.986 to 0.989)
13	0.970	0.992 (0.991 to 0.993)	0.989 (0.988 to 0.992)
14	0.967	0.996 (0.995 to 0.996)	0.995 (0.994 to 0.995)
15	0.967	0.997 (0.997 to 0.998)	0.996 (0.995 to 0.996)
16	0.964	1.000 (1.000 to 1.000)	1.000 (1.000 to 1.000)

Bold values represent those of the final selected short form.

NA, not applicable.

Table 4

Equivalency analysis results

Short form length	Correlation with the BFNE II, summed scores (95% CI)	Correlation with the BFNE II, factor scores (95% CI)	Equivalency corrected p value, summed scores	Equivalency corrected p value, factor scores
1	0.568 (0.523 to 0.611)	NA	1.000	NA
2	0.644 (0.605 to 0.680)	0.653 (0.615 to 0.689)	<0.001	0.003
3	0.635 (0.595 to 0.672)	0.632 (0.591 to 0.669)	<0.001	0.392
4	0.651 (0.612 to 0.687)	0.657 (0.591 to 0.669)	<0.001	<0.001
5	0.656 (0.618 to 0.691)	0.675 (0.638 to 0.709)	<0.001	<0.001
6	0.659 (0.621 to 0.694)	0.677 (0.640 to 0.710)	<0.001	<0.001
7	0.660 (0.622 to 0.695)	0.680 (0.644 to 0.713)	<0.001	<0.001
8	0.661 (0.623 to 0.696)	0.680 (0.643 to 0.713)	<0.001	<0.001
9	0.663 (0.625 to 0.697)	0.682 (0.646 to 0.715)	<0.001	<0.001
10	0.664 (0.627 to 0.699)	0.684 (0.648 to 0.717)	<0.001	<0.001
11	0.663 (0.626 to 0.699)	0.682 (0.646 to 0.717)	<0.001	<0.001
12	0.670 (0.633 to 0.704)	0.688 (0.653 to 0.721)	<0.001	<0.001
13	0.664 (0.626 to 0.698)	0.685 (0.649 to 0.714)	<0.001	<0.001
14	0.663 (0.626 to 0.698)	0.680 (0.644 to 0.714)	<0.001	<0.001
15	0.658 (0.620 to 0.693)	0.678 (0.641 to 0.711)	<0.001	<0.001
16	0.664 (0.626 to 0.698)	0.679 (0.642 to 0.712)	NA	NA

Bold values represent those of the final selected short form.

BFNE II, Brief Fear of Negative Evaluation II; NA, not applicable.
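As a worked check, the prespecified criteria can be applied directly to the values reported in tables 3 and 4. The sketch below is illustrative only: it transcribes the rows for lengths 2 to 6, encodes a reported p value of <0.001 as 0.001 (an upper bound), and assumes a significance level of 0.05 for the corrected equivalence p values. The one-item form is omitted because Cronbach's α is not defined for it.

```python
FULL_ALPHA = 0.964  # Cronbach's alpha of the 16-item form (table 3)

# length: (alpha, r summed vs full, r factor vs full,
#          corrected p summed, corrected p factor)
candidates = {
    2: (0.859, 0.947, 0.928, 0.001, 0.003),
    3: (0.921, 0.957, 0.928, 0.001, 0.392),
    4: (0.937, 0.967, 0.946, 0.001, 0.001),
    5: (0.947, 0.969, 0.952, 0.001, 0.001),
    6: (0.953, 0.975, 0.958, 0.001, 0.001),
}


def meets_criteria(alpha, r_sum, r_factor, p_sum, p_factor, level=0.05):
    """Apply the five selection criteria to one candidate short form."""
    return (
        alpha >= 0.95 * FULL_ALPHA  # reliability: 95% of full-form alpha
        and r_sum >= 0.95           # concurrent validity, summed scores
        and r_factor >= 0.95        # concurrent validity, factor scores
        and p_sum < level           # equivalence with BFNE II, summed scores
        and p_factor < level        # equivalence with BFNE II, factor scores
    )


passing = sorted(n for n, row in candidates.items() if meets_criteria(*row))
shortest = passing[0]
```

Under these assumptions the two-item form fails on reliability and both concurrent validity correlations, the three-item and four-item forms fail on the factor score correlation with the full form, and the five-item form is the shortest that passes, in line with the selection reported in the text.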

The SAAS-5 includes item 6 (‘I am concerned that people will find me unappealing because of my appearance’), item 7 (‘I am afraid people find me unattractive’), item 12 (‘I am frequently afraid that I won’t meet others’ standards of how I should look’), item 13 (‘I worry people will judge the way I look negatively’) and item 14 (‘I am uncomfortable when I think others are noticing flaws in my appearance’). The SAAS-5 had a Cronbach’s α of 0.95 as compared with the Cronbach’s α of the full-length form of 0.96. Thus, the SAAS-5 maintained high reliability. The correlation of the summed scores from the SAAS-5 with those from the full 16-item SAAS scores was r=0.97 (95% CI 0.97 to 0.97). The correlation of the factor scores between the full-length and shortened versions was r=0.95 (95% CI 0.95 to 0.96). The summed scores on the SAAS-5 maintained moderate-to-high positive correlation with the BFNE II (r=0.66, 95% CI 0.62 to 0.69) compared with 0.66 (95% CI 0.63 to 0.70) for the 16-item SAAS. Similarly, the factor scores on the SAAS-5 maintained moderate-to-high positive correlations with the BFNE II (r=0.68, 95% CI 0.64 to 0.71) compared with 0.68 (95% CI 0.64 to 0.71) for the full SAAS. The mean score on the SAAS-5 in this sample was 8.29 with an SD of 4.55 and possible range of 5 to 25.
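The reported confidence intervals for the correlations can be approximated with the Fisher z transformation. The text does not state which interval method was used, so this is an assumption of the sketch; it is, however, consistent with the interval reported in table 4 for the SAAS-5 summed scores and the BFNE II (r=0.656, n=926).

```python
import math


def pearson_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via the Fisher z transform."""
    z = math.atanh(r)               # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


# SAAS-5 summed scores with the BFNE II, values taken from table 4.
lower, upper = pearson_ci(0.656, 926)
```

Under this assumption the interval is roughly 0.618 to 0.691, which agrees with the tabulated values to three decimal places.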

Post hoc convergent validity of the SAAS-5

The convergent validity of the SAAS-5 was statistically equivalent, within a tolerance of 0.05, to that of the full-length SAAS for the two subscales of the Brief-SWAP, and the SIAS-6, as shown in table 5. The convergent validity correlation was not statistically equivalent for the PHQ-8. However, even for this measure, convergent validity was moderate for both the SAAS-5 and full-length version.

Table 5

Convergent validity correlations

Measure	Full SAAS, r (95% CI)	SAAS-5, r (95% CI)	Equivalency corrected p values
Brief-SWAP Dissatisfaction with Appearance	0.411 (0.356 to 0.463)	0.375 (0.318 to 0.429)	0.048
Brief-SWAP Social Discomfort	0.729 (0.697 to 0.758)	0.694 (0.659 to 0.726)	0.007
PHQ-8	0.528 (0.480 to 0.573)	0.472 (0.421 to 0.521)	0.797
SIAS-6	0.547 (0.500 to 0.591)	0.518 (0.469 to 0.563)	0.005

Brief-SWAP, Brief Satisfaction with Appearance Scale; PHQ-8, Patient Health Questionnaire-8; SAAS, Social Appearance Anxiety Scale; SIAS-6, Social Interaction Anxiety Scale-6.

Discussion

This study investigated how OTA methods can be used to develop shortened versions of PRO measures, using a measure of social appearance anxiety—the SAAS. The 16-item SAAS was shortened to a five-item version through a reproducible process based on prespecified and objective criteria. The SAAS-5 maintained high reliability (α=0.95), high concurrent validity with the full-length form, with an r=0.97 (95% CI 0.97 to 0.97) for summed scores, and an r=0.95 (95% CI 0.95 to 0.96) for factor scores. The SAAS-5 maintained statistically equivalent convergent validity correlations with the BFNE II for both summed and factor scores. Furthermore, the SAAS-5 maintained statistically equivalent convergent validity correlations for the two subscales of the Brief-SWAP and SIAS-6 to that of the full-length form. Although the SAAS-5 did not maintain a statistically equivalent convergent validity correlation with the PHQ-8, this does not suggest poor convergent validity. This may have occurred because items that captured symptoms most relevant to depression in the SAAS are no longer included in the SAAS-5. Scores on the short-form remained moderately correlated in the expected direction with the PHQ-8. In addition to its measurement properties, face validity, or the degree to which a test appears to measure what it reports to measure,39 is strong for the SAAS-5. The items of the SAAS-5 assess concern about being unappealing and unattractive, not meeting others’ appearance standards, worry about appearance-related judgement and discomfort when others notice appearance-related flaws. These items all appear to measure aspects of social appearance anxiety, or a fear of situations in which one’s appearance will be evaluated. Thus, findings from the present study suggest that the SAAS-5 is a brief, valid and reliable measure of social appearance anxiety among adults with SSc. 
The SAAS-5 may be preferred over the 16-item SAAS as it reduces participant burden, which is particularly important among adults with SSc who may have difficulty completing self-report questionnaires due to restricted physical functioning. By reducing the number of items in measures like the SAAS, researchers may be able to increase the number of constructs that are measured in hard-to-access populations, such as people living with rare diseases, including SSc.
There are several limitations that must be considered in this study. First, the SPIN Cohort is a convenience sample of patients who were receiving treatment at SPIN recruiting centres and who completed study questionnaires online. In addition, skin scores in this sample were relatively low, so the findings may not generalise to patients with more severe disease. This study used cross-sectional data, and therefore, the sensitivity to change or intervention status, discriminant validity and test–retest reliability of the SAAS-5 were not investigated. The purpose of the present study was to illustrate the use of OTA in creating a shortened version of a full-length form and to propose a new shortened version of the SAAS. Future studies should investigate these properties in order to assess the discriminant, predictive and evaluative characteristics of the SAAS-5. Furthermore, the assessment of clinically meaningful longitudinal changes, for example, those due to treatment in a clinical trial of patients with SSc, needs further study. Until such evidence is available, the utility of the SAAS-5 as an evaluative measure in patients with SSc may be limited. The method used in this study does not incorporate an assessment of content validity or an expert review of the items selected into the shortened form. Had an expert panel or focus group of patients been convened, they may have selected a different subset of items into the shortened form.
An expert panel may have been able to use their knowledge to select items that were appropriate for the detection of clinically meaningful changes of worsening or improvement. However, such a procedure would not rely directly on patient data, may not be replicable and may result in reduced measure validity based on imperfect clinical intuition.20 More resource-intensive methods for developing short forms, such as focus groups and content experts, used along with replicable statistical criteria, would be ideal. However, the resources necessary to complete these procedures may represent a substantial barrier to the development of shortened forms. The OTA method offers a more feasible, replicable alternative that maintains performance standards based on objective criteria.
The OTA procedure is sensitive to the investigator-defined choice of decision criteria in the selection of the final shortened version. These decision criteria, when applied in future studies, must be carefully considered by researchers. Furthermore, the OTA method treats the 16 items of the SAAS as if they represented a full item bank of possible items. It is possible that, had other items been considered, a different set of items would have been selected into the final shortened version. The OTA procedure is data driven, and results of this study should be replicated in this patient population. An analysis based on one sample of SSc patients may not be sufficient for the derivation of a disease-specific measure. The results of this study are only as applicable for patients with SSc as the original full-length SAAS. It should be noted that the original SAAS was developed based on three different samples of volunteers from introductory psychology courses at large public universities. Therefore, even the original SAAS instrument might not provide sufficient coverage in terms of content validity for patients with SSc.
Lastly, future work should assess whether the SAAS-5 is the optimal shortened form in other patient populations, as well as the general population, as results of this study are limited in their generalisability beyond patients with SSc.
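The sensitivity of the final form to investigator-defined decision criteria, noted above, can be illustrated with a small sketch. It assumes a rule of the form "choose the shortest candidate that meets every prespecified criterion"; the `CandidateForm` class, the `select_shortest` function, the thresholds and all candidate values are hypothetical and purely illustrative, and they are not the criteria or results of this study.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CandidateForm:
    """Summary statistics for one candidate short form (toy values only)."""
    n_items: int
    alpha: float                 # reliability of the candidate form
    r_with_full: float           # concurrent validity with the full form
    convergent_equivalent: bool  # passed an equivalence test vs the full form


def select_shortest(
    candidates: Sequence[CandidateForm],
    min_alpha: float,
    min_r_with_full: float,
) -> Optional[CandidateForm]:
    """Return the shortest candidate meeting all criteria, or None."""
    eligible = [
        c for c in candidates
        if c.alpha >= min_alpha
        and c.r_with_full >= min_r_with_full
        and c.convergent_equivalent
    ]
    return min(eligible, key=lambda c: c.n_items) if eligible else None


# Toy candidates, one per form length (made-up numbers, not study results).
toy_candidates = [
    CandidateForm(3, 0.88, 0.93, False),
    CandidateForm(4, 0.91, 0.95, True),
    CandidateForm(5, 0.93, 0.96, True),
    CandidateForm(6, 0.94, 0.97, True),
]

lenient = select_shortest(toy_candidates, min_alpha=0.90, min_r_with_full=0.95)
strict = select_shortest(toy_candidates, min_alpha=0.93, min_r_with_full=0.96)
print(lenient.n_items, strict.n_items)
```

With these toy values, tightening the thresholds moves the selected form from four to five items, which is the kind of dependence on decision criteria that researchers applying OTA would need to consider and report.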

Conclusion

In summary, this study showed how OTA methods might be used to shorten PRO measures. This method was used to shorten the 16-item SAAS to a five-item version while maintaining comparable reliability and validity among a sample of adults with SSc. This analysis should be replicated in this patient population, as well as other patient populations, to increase the generalisability of these findings. Moreover, expert opinions or focus groups should be solicited to assess whether the items selected into the shortened form match clinical intuition.
