Deriving an overall appearance domain score by applying bifactor IRT analysis to the BODY-Q appearance scales.

Daan Geerards1,2,3, Lisa van den Berg1,2,3, Andrea L Pusic1,2, Maarten M Hoogbergen3, Anne F Klassen4, René R W J van der Hulst5, Chris J Sidey-Gibbons6,7,8.   

Abstract

PURPOSE: With the BODY-Q, one can assess outcomes, such as satisfaction with appearance, in weight loss and body contouring patients using multiple scales. All scales can be used independently in any given combination or order. Currently, the BODY-Q cannot provide overall appearance scores across scales that measure a similar super-ordinate construct (i.e., overall appearance), which could improve the scales' usefulness as a benchmarking tool and improve the comprehensibility of patient feedback. We explored the possibility of establishing overall appearance scores, by applying a bifactor model to the BODY-Q appearance scales.
METHODS: In a bifactor model, questionnaire items load onto both a specific factor and a general factor, such as satisfaction with appearance. The international BODY-Q validation patient sample (n = 734) was used to fit a bifactor model to the appearance domain. Factor loadings, fit indices, and the correlation between bifactor appearance domain scores and satisfaction with body scale scores were assessed.
RESULTS: All items loaded on the general factor of their corresponding domain. In the appearance domain, all items demonstrated adequate item fit to the model. The bifactor model showed satisfactory fit (RMSEA 0.045, CFI 0.969, and TLI 0.964). The correlation between the appearance domain summary scores and satisfaction with body scale scores was 0.77.
DISCUSSION: We successfully applied a bifactor model to BODY-Q data with good item and model fit indices. With this method, we were able to produce reliable overall appearance scores that may improve the interpretability of the BODY-Q while increasing its flexibility.

Keywords:  Appearance; BODY-Q; Bifactor; Body contouring; Item response theory; Massive weight loss; Obesity; Patient-reported outcome measures; Psychometrics

Year:  2019        PMID: 31758485      PMCID: PMC7142051          DOI: 10.1007/s11136-019-02366-8

Source DB:  PubMed          Journal:  Qual Life Res        ISSN: 0962-9343            Impact factor:   4.147


Background

The BODY-Q is a patient-reported outcome measure (PROM) designed to assess outcomes in people who undergo weight loss and/or body contouring. It can be used over an entire trajectory, from obesity through weight loss to subsequent body contouring surgery. The original BODY-Q framework consisted of 18 independently functioning scales (i.e., subdomains) in three top-level domains (referred to as general factors in the bifactor literature): appearance (9 scales), health-related quality of life (HR-QoL) (5 scales), and experience of care (4 scales) [1]. Additional scales (i.e., appearance of chest, nipples and stretch marks, appearance-related distress, and expectations) have since been developed and published [2-4]. The scales contain 4 to 10 items, all scored on a Likert scale from 1 (e.g., ‘Definitely disagree’ or ‘Very dissatisfied’) to 4 (e.g., ‘Definitely agree’ or ‘Very satisfied’). Raw scores are converted into scores ranging from 0 (worst) to 100 (best) [1]. The BODY-Q is currently administered in both paper-based and Web-based form in multiple countries. Recently, a computerized adaptive testing (CAT) version of the BODY-Q was developed, which can reduce the number of items a patient needs to complete to obtain a reliable score on each BODY-Q scale [5]. Systematic review evidence suggests that the BODY-Q is a valid and reliable tool for measuring outcomes following weight loss and body contouring surgery [6]. One of its features is the set of appearance scales that measure satisfaction with the body overall and with specific areas (upper arms, abdomen, back, buttocks, inner thighs, and hips and outer thighs). These scales were designed specifically for obese and massive weight loss patients. However, there are situations in which overall appearance scores could offer several benefits.
Firstly, an item about satisfaction with the abdomen may contain information not only about how a patient feels about his/her abdomen but also about his/her overall appearance. This latent information is not used by the current unidimensional measurement model (i.e., the partial credit Rasch model). Secondly, individual scale scores may become easier to interpret if separate appearance scale scores can be related to an overall appearance score. Thirdly, providing feedback to patients and physicians is desirable in outcome assessment and is made less complicated by providing a few summary scores instead of up to 7 separate scale scores. Lastly, benchmarking results for health care insurers, clinics, clinicians, or even individual patients might become more straightforward with overall domain scores instead of up to 7 different scale scores. Earlier studies have used bifactor models in outcome assessment, especially in mental health and quality of life research [7-14]. To our knowledge, only Kleif et al. have applied a bifactor model to a surgical population [15]. An analysis using the bifactor model could establish an overall domain score and thereby deliver the advantages listed above. This study explores the feasibility of producing summary scores for the BODY-Q appearance domain from regular scale administration by applying a bifactor model to the BODY-Q.

Methods

Patient sample

The data sample for the bifactor analysis consisted of 734 patients (403 weight loss patients and 331 body contouring patients) from different practices in the United States (185 patients), Canada (412 patients), and the United Kingdom (137 patients). Patient demographics and characteristics have been reported elsewhere [1].

Bifactor model

Bifactor analysis was first described by Holzinger and Swineford in 1937 and was extended to a confirmatory multidimensional item response theory (IRT) model by Gibbons and colleagues [13, 16, 17]. A bifactor model is a hierarchical model with a two-level structure: every item is assumed to load both on a primary or general factor (e.g., satisfaction with appearance) and on a secondary, lower-order dimension (e.g., satisfaction with abdomen) [18]. Items within a scale (e.g., satisfaction with abdomen) may correlate more strongly with each other than with items from other scales (e.g., satisfaction with abdomen vs. satisfaction with hips and outer thighs). When this is the case, there are as many dimensions as there are scales (i.e., subdomains), which violates the unidimensionality assumption of standard IRT. This violation can be dealt with by using a bifactor IRT model [19]. By the same reasoning, a bifactor model might be applicable to the BODY-Q appearance domain.
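To make the two-level structure concrete, the minimal sketch below computes the response-category probabilities of one four-category item under a bifactor graded response model, written in a slope-intercept form that, to our reading, matches mirt's graded item type: the linear predictor sums a general-factor term and a single specific-factor term, and cumulative logistic curves are differenced into category probabilities. The parameters are those reported for Body item 1 in Table 3; the function names are illustrative only and not part of any package.

```python
import math

def logistic(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def bifactor_grm_probs(theta_g, theta_s, a_g, a_s, intercepts):
    """Category probabilities of one graded-response item in a bifactor model.

    theta_g    -- score on the general factor (overall satisfaction with appearance)
    theta_s    -- score on the item's own specific factor (e.g., satisfaction with body)
    a_g, a_s   -- slopes on the general and the specific factor
    intercepts -- decreasing intercepts d1 > d2 > ..., one fewer than the categories
    """
    # Each item depends on the general factor plus exactly one specific factor.
    eta = a_g * theta_g + a_s * theta_s
    # Cumulative probabilities P(X >= k) for k = 1..K, padded with P(X >= K + 1) = 0.
    cum = [1.0] + [logistic(eta + d) for d in intercepts] + [0.0]
    # Differences of adjacent cumulative probabilities give P(X = k).
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Body item 1 ("looks when dressed"): a1, specific slope, and d1..d3 from Table 3.
A_G, A_S, D = 4.004, 2.168, [5.457, 2.004, -3.605]

for theta_g in (-1.0, 0.0, 1.0):
    probs = bifactor_grm_probs(theta_g, 0.0, A_G, A_S, D)
    print(f"theta_g={theta_g:+.1f}:", [round(p, 3) for p in probs])
```

Raising the general-factor score shifts probability toward the more satisfied categories; the specific factor has the same kind of effect, but only on items of its own scale.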

Domains and scales

For the appearance domain, the skin and scar scales were excluded from the analysis because they apply only to some patients at some time points: the skin scale to patients with excess skin after massive weight loss, and the scar scale to patients after body contouring surgery [1]. All seven remaining scales were included in the analysis: satisfaction with body, abdomen, upper arms, back, buttocks, hips and outer thighs, and inner thighs.

Analysis

Analysis was performed in R (version 3.4.3). The mirt package was used to estimate the bifactor models, including the multidimensional IRT parameters [20, 21]. Item fit values were derived with the ‘itemfit’ function, with the item type set to the graded response model. Factor loadings per item were obtained with the ‘bfactor’ function, in which each scale was specified as a separate specific factor. Item parameters were extracted with the ‘coef’ function of the mirt package. Patients undergoing surgery for cosmetic reasons completed only the scales related to their procedures (e.g., the arms scale for brachioplasty patients and/or patients with excess skin on the upper arms), whereas weight loss patients completed all appearance scales. Furthermore, respondents were not obliged to complete every item within a scale. Because mirt required complete response data to derive the model fit statistics, the missing data (23%) had to be imputed. Plausible values for the missing responses were therefore imputed with a 2PL graded response model for each subscale separately, prior to the analysis [21].
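The imputation step can be sketched as follows: a missing response is replaced by a random draw from the category probabilities that a unidimensional graded response model implies for that respondent. This is a hedged illustration, not necessarily the routine the authors used; it assumes that each respondent's subscale score has already been estimated from his or her observed responses, and the item parameters in ITEMS, like the function names, are hypothetical.

```python
import math
import random

def grm_probs(theta, a, intercepts):
    """Category probabilities of a unidimensional graded response item (categories 1..K)."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-(a * theta + d))) for d in intercepts] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

def impute_plausible(responses, thetas, items, rng):
    """Replace None entries with draws from the model-implied category probabilities.

    responses -- one list per respondent, categories coded 1..K, None = missing
    thetas    -- per-respondent subscale scores (assumed to be estimated beforehand)
    items     -- one (slope, intercepts) tuple per item
    """
    completed = []
    for person, theta in zip(responses, thetas):
        row = []
        for value, (a, d) in zip(person, items):
            if value is None:
                probs = grm_probs(theta, a, d)
                categories = range(1, len(probs) + 1)
                # Draw one category, weighted by its model-implied probability.
                value = rng.choices(categories, weights=probs, k=1)[0]
            row.append(value)  # observed responses are passed through unchanged
        completed.append(row)
    return completed

# Hypothetical parameters for a three-item subscale (not taken from the BODY-Q).
ITEMS = [(2.0, [3.0, 0.5, -2.5]), (1.5, [2.5, 0.0, -3.0]), (2.5, [3.5, 1.0, -2.0])]
data = [[3, None, 4], [None, 2, None], [1, 2, 2]]
thetas = [0.8, -0.3, -1.5]

filled = impute_plausible(data, thetas, ITEMS, random.Random(2019))
print(filled)
```

Drawing from the model, rather than plugging in the single most likely category, keeps the response variability that the fit statistics depend on; a fixed seed makes the draws reproducible.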

Outcomes

Outcomes assessed were the factor loadings (FL) of the scales within the appearance domain, Chi square statistics, root mean square error of approximation (RMSEA) [22], Tucker-Lewis Index (TLI) [23], and comparative fit index (CFI) [24]. A factor loading can be interpreted as a standardized regression coefficient: it indicates how strongly an observed variable (i.e., an item) relates to one or more underlying latent factors (i.e., scale or domain scores), and a value of 0.4 or higher is considered a strong relation [25]. The item-level Chi square statistic indicates whether observed item scores correspond to the scores expected under the model. A non-significant Chi square value (p > 0.01) indicates that the item fits; however, Chi square statistics are sensitive to sample size and tend to flag trivial misfit in large samples, such as ours [26]. Other fit indices, such as RMSEA, TLI, and CFI, take sample size into account [27]. Based on research using structural equation modeling (SEM), TLI and CFI values above 0.90 indicate adequate fit; for RMSEA, a value below 0.05 represents good fit and a value above 0.10 represents poor fit [22, 27, 28].

We evaluated the usefulness of the overall appearance score with the explained common variance (ECV) statistic, which indicates the extent to which the general factor explains the common variance in item scores [14]. The statistic ranges from 0 to 1, where 1 is perfectly unidimensional. Although few studies have evaluated the validity of different thresholds for the ECV, a value of .90 or greater can be considered essentially unidimensional, and a value below .70 sufficiently multidimensional to warrant fitting a multidimensional IRT model [29].

We assessed the correlation between the appearance domain scores from the bifactor model, computed with the satisfaction with body scale excluded, and the original satisfaction with body scale scores (Table 1). We also determined the correlations among all 7 subscales.
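For reference, the SEM-derived fit indices named above can be computed from a model Chi square and a baseline (independence) model Chi square. The minimal sketch below uses the standard point-estimate formulas and applies the cut-offs cited in the text. The Chi square values in the usage lines are hypothetical, because they are not reported here; only N = 734 comes from the study. Software packages may differ in details, such as using N rather than N − 1 in the RMSEA.

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (point estimate)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index of the model (m) relative to a baseline model (b)."""
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_m - df_m, chi2_b - df_b, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den

def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis index (non-normed fit index); it can fall outside [0, 1]."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)

def judge(rmsea_value, cfi_value, tli_value):
    """Cut-offs cited in the text: CFI/TLI > 0.90; RMSEA < 0.05 good, > 0.10 poor."""
    if rmsea_value < 0.05:
        r = "good"
    elif rmsea_value > 0.10:
        r = "poor"
    else:
        r = "intermediate"
    return {"rmsea": r, "cfi_adequate": cfi_value > 0.90, "tli_adequate": tli_value > 0.90}

N = 734                                # study sample size
chi2_model, df_model = 1200.0, 600     # hypothetical model Chi square
chi2_base, df_base = 20000.0, 660      # hypothetical baseline Chi square
r = rmsea(chi2_model, df_model, N)
c = cfi(chi2_model, df_model, chi2_base, df_base)
t = tli(chi2_model, df_model, chi2_base, df_base)
print(round(r, 3), round(c, 3), round(t, 3), judge(r, c, t))
```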
Table 1

Satisfaction with Body Scale. Item descriptions are not intended for replication. Please visit the Q Portfolio website for full item wording

Item content* | Very dissatisfied | Somewhat dissatisfied | Somewhat satisfied | Very satisfied
1. …looks when dressed | 1 | 2 | 3 | 4
2. …how clothes fit | 1 | 2 | 3 | 4
3. …size | 1 | 2 | 3 | 4
4. …shape | 1 | 2 | 3 | 4
5. …looks in photos | 1 | 2 | 3 | 4
6. …looks from behind | 1 | 2 | 3 | 4
7. …looks from the side | 1 | 2 | 3 | 4
8. …looks in summer clothes | 1 | 2 | 3 | 4
9. …looks in a swimsuit | 1 | 2 | 3 | 4
10. …look in a mirror unclothed | 1 | 2 | 3 | 4

Results

All factor loadings for the corresponding items are shown in Table 2. All items (n = 42) had substantial loadings on both the primary and overall appearance factors (FL > 0.40, FL > 0.69, respectively), indicating that all BODY-Q items are valuable components of the primary or overall appearance factor (i.e., that these items were adequately related to overall appearance satisfaction).
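Standardized loadings such as those in Table 2 can be derived from the multidimensional IRT slopes in Table 3 with the usual logistic-to-normal-ogive conversion, which uses a scaling constant of 1.702. As far as we know, this is the conversion mirt applies when it summarizes a fitted model, but treat that as an assumption. Applied to Body item 1 (slopes 4.004 and 2.168), the minimal sketch below gives loadings of about 0.824 and 0.446, in line with the first row of Table 2. The function name is illustrative.

```python
import math

D = 1.702  # scaling constant linking the logistic and normal-ogive metrics

def slopes_to_loadings(slopes):
    """Convert one item's logistic slopes (general first, then specific) to standardized loadings."""
    scaled = [a / D for a in slopes]
    denom = math.sqrt(1.0 + sum(s * s for s in scaled))
    return [s / denom for s in scaled]

# Body item 1: general slope 4.004 and specific slope 2.168 (Table 3).
general, specific = slopes_to_loadings([4.004, 2.168])
print(round(general, 3), round(specific, 3))

# The squared loadings sum to the item's communality; the remainder is unique variance.
print(round(1.0 - (general ** 2 + specific ** 2), 3))
```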
Table 2

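Appearance items and factor loadings (χ² = Chi square, df = degrees of freedom). The specific factor loading is the loading on the item's own scale factor; item fit statistics refer to the primary factor.

Scale | Item | Primary factor loading | Specific factor loading | χ² | df | p
Body | 1. Looks when dressed | 0.824 | 0.446 | 89.086 | 86 | 0.388
Body | 2. How clothes fit | 0.764 | 0.524 | 86.473 | 95 | 0.722
Body | 3. Size | 0.850 | 0.415 | 86.129 | 84 | 0.415
Body | 4. Shape | 0.781 | 0.467 | 95.226 | 89 | 0.306
Body | 5. Looks in photos | 0.804 | 0.479 | 117.523 | 94 | 0.051
Body | 6. Looks from behind | 0.883 | 0.303 | 60.422 | 77 | 0.918
Body | 7. Looks from the side | 0.843 | 0.288 | 70.043 | 73 | 0.576
Body | 8. Looks in summer clothes | 0.904 | 0.195 | 66.743 | 66 | 0.451
Body | 9. Looks in a swimsuit | 0.926 | 0.105 | 58.939 | 62 | 0.587
Body | 10. Looks in mirror unclothed | 0.930 | 0.083 | 54.827 | 48 | 0.232
Abdomen | 1. How clothes fit | 0.857 | 0.459 | 72.373 | 77 | 0.628
Abdomen | 2. Size | 0.862 | 0.453 | 67.326 | 71 | 0.602
Abdomen | 3. Looks from the side | 0.878 | 0.373 | 37.724 | 54 | 0.955
Abdomen | 4. Shape | 0.847 | 0.454 | 79.861 | 87 | 0.694
Abdomen | 5. Looks in a swimsuit | 0.862 | 0.457 | 63.472 | 69 | 0.665
Abdomen | 6. How toned | 0.896 | 0.354 | 65.640 | 64 | 0.420
Abdomen | 7. Looks when naked | 0.897 | 0.353 | 57.039 | 48 | 0.174
Upper arms | 1. Size | 0.760 | 0.465 | 84.389 | 90 | 0.647
Upper arms | 2. How smooth | 0.766 | 0.503 | 88.027 | 92 | 0.598
Upper arms | 3. Shape | 0.655 | 0.560 | 130.316 | 100 | 0.022
Upper arms | 4. How skin looks | 0.682 | 0.521 | 95.507 | 102 | 0.662
Upper arms | 5. How toned | 0.743 | 0.521 | 104.574 | 89 | 0.124
Upper arms | 6. Look when lifted up | 0.762 | 0.508 | 104.286 | 89 | 0.128
Upper arms | 7. Look when not covered | 0.706 | 0.537 | 106.976 | 100 | 0.298
Back | 1. How smooth | 0.846 | 0.399 | 71.771 | 66 | 0.293
Back | 2. Looks from different angles | 0.829 | 0.447 | 101.008 | 76 | 0.029
Back | 3. How toned | 0.848 | 0.448 | 56.724 | 66 | 0.785
Back | 4. Looks when naked | 0.845 | 0.418 | 106.297 | 81 | 0.031
Buttocks | 1. Size | 0.847 | 0.378 | 80.269 | 81 | 0.502
Buttocks | 2. Look from the side | 0.847 | 0.415 | 75.202 | 81 | 0.661
Buttocks | 3. Shape | 0.816 | 0.427 | 96.342 | 89 | 0.279
Buttocks | 4. How smooth | 0.834 | 0.390 | 88.265 | 87 | 0.442
Buttocks | 5. How skin looks | 0.847 | 0.379 | 72.304 | 81 | 0.744
Hips and outer thighs | 1. Size | 0.887 | 0.382 | 67.642 | 72 | 0.624
Hips and outer thighs | 2. Shape | 0.880 | 0.391 | 57.154 | 71 | 0.883
Hips and outer thighs | 3. How skin looks | 0.873 | 0.384 | 66.725 | 72 | 0.654
Hips and outer thighs | 4. How smooth | 0.868 | 0.362 | 60.471 | 74 | 0.871
Hips and outer thighs | 5. Look from behind | 0.878 | 0.361 | 74.897 | 70 | 0.323
Inner thighs | 1. How smooth | 0.769 | 0.517 | 83.268 | 86 | 0.563
Inner thighs | 2. How skin looks | 0.765 | 0.535 | 86.415 | 84 | 0.407
Inner thighs | 3. How toned | 0.801 | 0.453 | 78.241 | 75 | 0.376
Inner thighs | 4. Look when naked | 0.786 | 0.478 | 86.944 | 76 | 0.184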
The item with the highest loading on the primary factor was “How your body looks in the mirror unclothed?” (FL = 0.930), and the item with the lowest was “How satisfied are you with the shape of your upper arms?” (FL = 0.655). Without modification, all 42 items in the appearance domain demonstrated adequate fit to the model based on a p > 0.01 criterion. Model fit was good, with an RMSEA of 0.045 (90% CI 0.043–0.048). In addition, CFI and TLI were above the recommended values for adequate fit (CFI = 0.969, TLI = 0.964). The ECV value for the combined appearance scale was .85, suggesting that the bifactor model was appropriate in this case. Multidimensional IRT parameters are displayed in Table 3.
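The ECV is the summed squared general-factor loadings divided by the summed squared general and specific loadings. The minimal sketch below applies that formula to the ten Body-scale rows of Table 2 purely as an illustration; the reported value of .85 was computed for the whole appearance domain (one entry per item across all seven scales), so this subset is not expected to reproduce it. The function names are illustrative, and the interpretation bands follow the thresholds cited in the Methods.

```python
def ecv(general, specific):
    """Explained common variance from standardized loadings.

    general  -- loading of each item on the general factor
    specific -- loading of each item on its own specific factor
    """
    g = sum(x * x for x in general)
    s = sum(x * x for x in specific)
    return g / (g + s)

def interpret_ecv(value):
    """Bands cited in the text: >= .90 essentially unidimensional, < .70 multidimensional."""
    if value >= 0.90:
        return "essentially unidimensional"
    if value < 0.70:
        return "sufficiently multidimensional"
    return "intermediate"

# Body-scale items 1-10: primary (general) and specific loadings from Table 2.
body_general = [0.824, 0.764, 0.850, 0.781, 0.804, 0.883, 0.843, 0.904, 0.926, 0.930]
body_specific = [0.446, 0.524, 0.415, 0.467, 0.479, 0.303, 0.288, 0.195, 0.105, 0.083]

body_ecv = ecv(body_general, body_specific)
print(round(body_ecv, 3), interpret_ecv(body_ecv))
```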
Table 3

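Appearance item parameters. a1 = slope on the general factor; a (specific) = slope on the item's own scale-specific factor (one of a2–a8; the other specific slopes are fixed at zero); d1–d3 = item intercepts.

Scale | Item | a1 (general) | a (specific) | d1 | d2 | d3
Body | 1. Looks when dressed | 4.004 | 2.168 | 5.457 | 2.004 | −3.605
Body | 2. How clothes fit | 3.493 | 2.350 | 3.596 | 0.675 | −4.415
Body | 3. Size | 4.374 | 2.156 | 4.001 | 0.383 | −5.162
Body | 4. Shape | 3.191 | 1.907 | 4.010 | 0.853 | −3.369
Body | 5. Looks in photos | 3.870 | 2.321 | 2.857 | −0.136 | −5.053
Body | 6. Looks from behind | 4.206 | 1.436 | 2.411 | −0.981 | −4.970
Body | 7. Looks from the side | 3.085 | 1.048 | 2.415 | −0.380 | −4.265
Body | 8. Looks in summer clothes | 4.052 | 0.900 | 1.911 | −1.141 | −5.846
Body | 9. Looks in a swimsuit | 4.268 | 0.500 | 0.286 | −2.494 | −7.076
Body | 10. Looks in mirror unclothed | 4.448 | 0.365 | 0.005 | −3.069 | −7.561
Abdomen | 1. How clothes fit | 2.679 | 1.595 | 2.617 | −0.284 | −3.542
Abdomen | 2. Size | 3.148 | 2.077 | 2.701 | −0.606 | −4.755
Abdomen | 3. Looks from the side | 2.263 | 2.007 | 1.647 | −1.334 | −4.552
Abdomen | 4. Shape | 2.306 | 1.647 | 2.203 | −0.411 | −3.878
Abdomen | 5. Looks in a swimsuit | 2.925 | 1.978 | 1.317 | −1.913 | −5.168
Abdomen | 6. How toned | 3.203 | 2.135 | 0.768 | −2.333 | −5.914
Abdomen | 7. Looks when naked | 3.056 | 2.179 | 0.597 | −2.287 | −5.664
Upper arms | 1. Size | 6.248 | 3.344 | 2.224 | −2.085 | −7.163
Upper arms | 2. How smooth | 6.154 | 3.323 | 1.776 | −2.615 | −7.387
Upper arms | 3. Shape | 5.000 | 2.186 | −0.017 | −3.295 | −7.214
Upper arms | 4. How skin looks | 5.056 | 2.696 | 3.022 | −0.802 | −5.704
Upper arms | 5. How toned | 6.749 | 3.488 | 2.031 | −2.477 | −8.073
Upper arms | 6. Looks when lifted up | 5.775 | 2.345 | 0.408 | −3.042 | −7.745
Upper arms | 7. Looks when not covered | 5.667 | 2.283 | −0.627 | −4.376 | −7.940
Back | 1. How smooth | 4.382 | 2.073 | 4.261 | −0.547 | −4.802
Back | 2. Looks from different angles | 4.368 | 2.342 | 4.822 | 0.319 | −4.833
Back | 3. How toned | 5.761 | 2.865 | 6.161 | 0.076 | −6.387
Back | 4. Looks when naked | 4.990 | 2.369 | 3.661 | −0.250 | −5.691
Buttocks | 1. Size | 3.982 | 1.811 | 3.619 | 0.021 | −4.487
Buttocks | 2. Look from the side | 3.797 | 1.889 | 2.979 | −0.434 | −5.253
Buttocks | 3. Shape | 3.240 | 1.651 | 2.280 | −0.877 | −4.932
Buttocks | 4. How smooth | 3.337 | 1.592 | 2.365 | −0.828 | −4.977
Buttocks | 5. How skin looks | 4.052 | 1.783 | 3.019 | 0.079 | −5.172
Hips and outer thighs | 1. Size | 3.478 | 2.370 | 0.553 | −2.782 | −6.195
Hips and outer thighs | 2. Shape | 3.941 | 2.853 | 0.693 | −3.363 | −7.421
Hips and outer thighs | 3. How skin looks | 3.493 | 1.952 | 0.105 | −3.230 | −6.488
Hips and outer thighs | 4. How smooth | 3.530 | 2.097 | −0.099 | −3.251 | −7.118
Hips and outer thighs | 5. Looks from behind | 5.675 | 2.448 | 4.428 | 0.026 | −6.562
Inner thighs | 1. How smooth | 6.107 | 2.696 | 4.652 | −0.114 | −6.909
Inner thighs | 2. How skin looks | 4.709 | 2.087 | 3.385 | −0.488 | −5.660
Inner thighs | 3. How toned | 4.865 | 2.119 | 3.425 | −0.724 | −6.199
Inner thighs | 4. Looks when naked | 5.662 | 2.315 | 3.587 | −0.760 | −7.312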
The correlation between appearance domain scores and satisfaction with body scale scores was 0.77. Correlations among all subscales were high, ranging from 0.60 to 0.83, as shown in Table 4.
Table 4

Subscale correlations (Pearson correlation coefficient)

Scale | Body | Abdomen | Upper arms | Back | Buttocks | Hips and outer thighs | Inner thighs
Body | – | 0.83 | 0.65 | 0.78 | 0.76 | 0.79 | 0.64
Abdomen | 0.83 | – | 0.64 | 0.74 | 0.72 | 0.74 | 0.60
Upper arms | 0.65 | 0.64 | – | 0.68 | 0.67 | 0.68 | 0.67
Back | 0.78 | 0.74 | 0.68 | – | 0.74 | 0.77 | 0.63
Buttocks | 0.76 | 0.72 | 0.67 | 0.74 | – | 0.81 | 0.68
Hips and outer thighs | 0.79 | 0.74 | 0.68 | 0.77 | 0.81 | – | 0.72
Inner thighs | 0.64 | 0.60 | 0.67 | 0.63 | 0.68 | 0.72 | –

Discussion

In this study, a bifactor model was applied to the BODY-Q. The model proved satisfactory for the BODY-Q appearance domain, with good item and model fit. Furthermore, we demonstrated that overall appearance scores can be produced from regular item responses by using a bifactor model. Correlations were high among all subscales, which further supports the use of a bifactor model.

This study has several strengths. Firstly, the BODY-Q sample was large and international, which benefited the analysis. The sample also contained both weight loss and body contouring patients, which makes the findings applicable to both patient groups. Secondly, the bifactor model makes use of latent and otherwise unused information in existing items. Thirdly, with this method, an additional score is derived from regular item administration while the original BODY-Q scale scoring is not altered in any way.

Although we analyzed data from multiple countries, and the BODY-Q scales have previously been shown to be invariant across cultures in unidimensional Rasch analyses [1, 30], we did not perform a multigroup bifactor analysis and therefore cannot comment on the cross-cultural invariance of the overall appearance factor. Further research is recommended to confirm both the cross-cultural suitability of the overall appearance factor and the general stability of the item calibration in a larger sample of patients.

A straightforward example of the use of a bifactor model in health assessment is depression. Depression could be described as a single construct, but it actually consists of different components, such as agitation, suicidal thoughts, sleep disturbances, and anxiety. With this in mind, depression can also be seen as a hierarchical construct, in which each separate component measures not only its own construct but also a general factor (i.e., severity of depression).
Another example is intelligence, which consists of different components, such as logic, reasoning, planning, and problem-solving [14, 18, 19]. The new scores could be useful for different purposes, such as benchmarking or enhanced interpretation of PROM scores. The granular insight given by individual scales is useful for assessing prospective trials of specific single-site procedures, but the scores on an individual scale might not fully reflect the impact of extreme weight loss on patients. We envision that the overall appearance score may more accurately reflect the incremental improvement in satisfaction with global appearance that occurs with single-site surgeries. This overall appearance measure may therefore also be useful for comparing different single-site operations in terms of their overall impact on bodily satisfaction. The bifactor model could also be useful when providing feedback, as it is easier to discuss a few summary scores than more than a dozen different scores. Fourthly, as in the original BODY-Q, any combination of the scales can still be used according to the preference of the physician or researcher. Furthermore, multiple fit indices were analyzed, with most fit index values being adequate or good. Lastly, a high correlation was found between the bifactor overall appearance score and the regular satisfaction with body scale scores. This high correlation supports the satisfaction with body scale as a satisfactory measure of overall body satisfaction, and also shows that the overall appearance domain score could be used as a surrogate for the satisfaction with body scale.

Our study does have some notable limitations. Firstly, it can be difficult to accurately assess model fit and interpretability for the bifactor model, which is known to be at risk of overfitting.
However, recent research using traditional information-theoretic criteria, such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC), has shown that overfitting is not always the case [31-33]. Unfortunately, we were unable to calculate these statistics for our model. Additional uncertainty arises from having to rely on fit statistics that were developed for SEM analysis and that, despite their popular use, have not to our knowledge been confirmed as suitable for IRT analyses. Secondly, we had to rely on imputation to derive model fit statistics, because of missing data within the sample and nuances of the statistical packages we used. Given these limitations, future research could evaluate longitudinal BODY-Q data to confirm the stability of the item calibrations, both for the original Rasch-derived measures and for the bifactor IRT model presented here.

Recently, a BODY-Q CAT was developed, which achieved a substantial item reduction of 37% for this comprehensive PROM [5]. Combining a bifactor model with a multidimensional CAT might yield an even more efficient and reliable BODY-Q CAT than this recently developed unidimensional CAT [13, 14]. Supported by the findings of the current study, further research is planned to investigate the performance and utility of a multidimensional CAT for the BODY-Q. Those interested in scoring with the bifactor model can use the parameters presented in Table 3. Scoring is possible using the R programming environment and the mirt package. Our team is developing easy-to-use tools for online scoring, which may be obtained by contacting the corresponding author. The bifactor model proved to be a valuable tool for deriving overall appearance scores.
Making use of a bifactor model for the BODY-Q adds value to the information gained from the PROM without increasing patient burden and without influencing regular BODY-Q items, responses, item parameters, or scoring. This method has the potential to further expand the utility of PROMs in clinical outcome assessment while mitigating the burden of response for patients.
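The scoring route described in the Discussion uses the Table 3 parameters in R with mirt. For readers without R, the sketch below shows the underlying idea in Python: an expected a posteriori (EAP) estimate of the general (overall appearance) factor, obtained by brute-force grid quadrature over the general factor and one specific factor, using only Body items 1 to 3 for brevity. It is not mirt's scoring routine; the standard normal priors, the grid settings, the item subset, and the function names are all assumptions for illustration. A full implementation would include every administered item, with one specific factor per scale; because each item loads on only one specific factor, the bifactor structure lets the integral be evaluated as a series of two-dimensional problems rather than one eight-dimensional one.

```python
import math

# Body items 1-3 from Table 3: (general slope a1, specific slope, intercepts d1..d3).
BODY_ITEMS = [
    (4.004, 2.168, [5.457, 2.004, -3.605]),
    (3.493, 2.350, [3.596, 0.675, -4.415]),
    (4.374, 2.156, [4.001, 0.383, -5.162]),
]

def category_prob(eta, intercepts, category):
    """P(X = category) for a graded item with categories coded 1..K."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-(eta + d))) for d in intercepts] + [0.0]
    return cum[category - 1] - cum[category]

def eap_general(responses, items, n_points=61, bound=4.0):
    """EAP estimate of the general factor by brute-force 2D grid quadrature.

    Assumes independent standard normal priors on the general factor and on
    the single specific factor shared by all items passed in (one scale only).
    """
    step = 2 * bound / (n_points - 1)
    grid = [-bound + i * step for i in range(n_points)]
    prior = [math.exp(-0.5 * t * t) for t in grid]  # unnormalized; constants cancel
    num = den = 0.0
    for tg, wg in zip(grid, prior):
        for ts, ws in zip(grid, prior):
            like = 1.0
            for x, (a_g, a_s, d) in zip(responses, items):
                like *= category_prob(a_g * tg + a_s * ts, d, x)
            w = wg * ws * like  # posterior weight of this grid point
            num += tg * w
            den += w
    return num / den

for pattern in ([1, 1, 1], [3, 3, 3], [4, 4, 4]):
    print(pattern, round(eap_general(pattern, BODY_ITEMS), 2))
```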
References (17 in total)

1.  A Comparison of Bifactor and Second-Order Models of Quality of Life.

Authors:  Fang Fang Chen; Stephen G West; Karen H Sousa
Journal:  Multivariate Behav Res       Date:  2006-06-01       Impact factor: 5.923

2.  On the Complexity of Item Response Theory Models.

Authors:  Wes Bonifay; Li Cai
Journal:  Multivariate Behav Res       Date:  2017-04-20       Impact factor: 5.923

3.  Recommendations on the most suitable quality-of-life measurement instruments for bariatric and body contouring surgery: a systematic review.

Authors:  C E E de Vries; M C Kalff; C A C Prinsen; K D Coulman; C den Haan; R Welbourn; J M Blazeby; J M Morton; B A van Wagensveld
Journal:  Obes Rev       Date:  2018-06-08       Impact factor: 9.213

4.  Best Design for Multidimensional Computerized Adaptive Testing With the Bifactor Model.

Authors:  Dong Gi Seo; David J Weiss
Journal:  Educ Psychol Meas       Date:  2015-03-25       Impact factor: 2.821

5.  Bifactor and Hierarchical Models: Specification, Inference, and Interpretation.

Authors:  Kristian E Markon
Journal:  Annu Rev Clin Psychol       Date:  2019-01-16       Impact factor: 18.561

6.  An overview of confirmatory factor analysis and item response analysis applied to instruments to evaluate primary healthcare.

Authors:  Darcy A Santor; Jeannie L Haggerty; Jean-Frédéric Lévesque; Frederick Burge; Marie-Dominique Beaulieu; David Gass; Raynald Pineault
Journal:  Healthc Policy       Date:  2011-12

7.  Comparative fit indexes in structural models.

Authors:  P M Bentler
Journal:  Psychol Bull       Date:  1990-03       Impact factor: 17.737

8.  Systematic review of the QoR-15 score, a patient-reported outcome measure measuring quality of recovery after surgery and anaesthesia.

Authors:  J Kleif; J Waage; K B Christensen; I Gögenur
Journal:  Br J Anaesth       Date:  2017-11-22       Impact factor: 9.166

9.  Self-Report Scales to Measure Expectations and Appearance-Related Psychosocial Distress in Patients Seeking Cosmetic Treatments.

Authors:  Anne F Klassen; Stefan J Cano; Amy Alderman; Charles East; Lydia Badia; Stephen B Baker; Sam Robson; Andrea L Pusic
Journal:  Aesthet Surg J       Date:  2016-05-24       Impact factor: 4.283

10.  The BODY-Q Stretch Marks Scale: A Development and Validation Study.

Authors:  Lotte Poulsen; Andrea Pusic; Sam Robson; Jens Ahm Sorensen; Michael Rose; Claus Bogh Juhl; Rene Klinkby Stoving; Alin Andries; Anne F Klassen
Journal:  Aesthet Surg J       Date:  2018-08-16       Impact factor: 4.283
