A Multidimensional Computerized Adaptive Short-Form Quality of Life Questionnaire Developed and Validated for Multiple Sclerosis: The MusiQoL-MCAT.

Pierre Michel, Karine Baumstarck, Badih Ghattas, Jean Pelletier, Anderson Loundou, Mohamed Boucekine, Pascal Auquier, Laurent Boyer.

Abstract

The aim was to develop a multidimensional computerized adaptive short-form questionnaire, the MusiQoL-MCAT, from a fixed-length QoL questionnaire for multiple sclerosis. A total of 1992 patients were enrolled in this international cross-sectional study. The development of the MusiQoL-MCAT was based on the assessment of between-items MIRT model fit followed by real-data simulations. The MCAT algorithm was based on Bayesian maximum a posteriori estimation of latent traits and Kullback-Leibler information item selection. We examined several simulations based on a fixed number of items. Accuracy was assessed using correlations (r) between initial IRT scores and MCAT scores. Precision was assessed using the standard error of measurement (SEM) and the root mean square error (RMSE). The multidimensional graded response model was used to estimate item parameters and IRT scores. Among the MCAT simulations, the 16-item version of the MusiQoL-MCAT was selected because the accuracy and precision became stable with 16 items with satisfactory levels (r ≥ 0.9, SEM ≤ 0.55, and RMSE ≤ 0.3). External validity of the MusiQoL-MCAT was satisfactory. The MusiQoL-MCAT presents satisfactory properties and can individually tailor QoL assessment to each patient, making it less burdensome to patients and better adapted for use in clinical practice.

Year: 2016 PMID: 27057832 PMCID: PMC4998748 DOI: 10.1097/MD.0000000000003068

Source DB: PubMed Journal: Medicine (Baltimore) ISSN: 0025-7974 Impact factor: 1.889

INTRODUCTION

Health-related quality of life (QoL) measurements are increasingly considered important for evaluating disease progression, treatment options, and the management of care provided to patients with chronic diseases.[1,2] Self-reported questionnaires are traditionally used to measure QoL, but they are often considered too lengthy by patients and professionals.[3] The time and resources necessary for the completion of questionnaires are constraints on professionals whose main role is providing patient care.[4] Additionally, questionnaires should be as brief as possible because of the fatigue and concentration difficulties experienced in some clinical populations (e.g., patients with multiple sclerosis [MS], schizophrenia). Providing shorter questionnaires in QoL measures may be useful for clinical practice.[5] Short-form instruments are usually fixed-length (i.e., the same items are proposed to all patients) and adapted from a long-form instrument by reducing the number of questions based on classical and item response theories (IRTs). However, these fixed-length short-form instruments have drawbacks (e.g., the reduction of questions brings a risk of losing important information that can result in a decline of measurement precision).[6,7] Additionally, because some items are not tailored to patients, the precision of the QoL measure is not maximized, and patients may lose interest in the QoL measure and stop completing the questionnaire.
Interestingly, methods based on IRT models, currently used in the development of unidimensional item banks and computerized adaptive testing (CAT), can be adapted to overcome the problems faced by the development of fixed-length short-form questionnaires.[8,9] Indeed, CAT allows for the administration of only the items that will offer the most relevance for a given individual, reducing the length of the questionnaire and the completion time in addition to maintaining the test's precision.[10-12] Additionally, multidimensional CAT (MCAT) based on multidimensional IRT (MIRT) has been recently applied to measure health problems in various chronic diseases (e.g., symptomatology, fatigue, physical, and emotional functioning).[13-18] Because of the multidimensional nature of QoL, this method seems relevant in developing a valid and reliable adaptive short-form QoL questionnaire.[14] Currently, MCATs applied to shorten fixed-length available QoL questionnaires are scarce.[14,19] The aim of this study was to develop a multidimensional computerized adaptive short-form questionnaire (MCAT) from a fixed-length available QoL questionnaire for patients with a chronic disease marked by the difficulties of fatigue and concentration, MS. Our study focused on the multiple sclerosis international quality of life questionnaire (MusiQoL), which is a widely used QoL questionnaire in MS.[20] Compared to other MS questionnaires, this instrument has 3 important characteristics: specifically reflecting the perspective of patients with MS on the impact of the disease on their daily life; anchored in an explicit conceptual approach;[21] and developed and available in multiple languages and psychometrically validated to appropriate standards.

METHODS

Questionnaire

The MusiQoL questionnaire is a MS-specific, self-administered, and multidimensional QoL instrument.[20] It comprises 31 items describing 9 dimensions. Each dimension is named according to its constitutive items as follows: activities of daily living (ADL, 8 items), psychological well-being (PWB, 4 items), symptoms (SYMP, 4 items), relationships with friends (RFR, 3 items), relationships with family (RFA, 3 items), relationships with healthcare system (RHCS, i.e., satisfaction with healthcare; 3 items), sentimental and sexual life (SSL, 2 items), coping (COP, 2 items), and rejection (REJ, 2 items). Each item is scored on a 6-point Likert scale, in which a score of 1 represents never/not at all, 2 represents rarely/a little, 3 represents sometimes/somewhat, 4 represents often/a lot, 5 represents always/very much, and 6 represents not applicable. For each individual, the score on each dimension is obtained by computing the mean of the item scores for that dimension. All dimension scores are linearly transformed to a 0 to 100 scale. A global index score is computed as the mean of the dimension scores. Higher scores indicate a higher level of QoL.
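The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the official scoring algorithm: the item ranges follow the Figure 2 legend, while the handling of "not applicable" (6) as missing, the min-max rescaling `(mean - 1) / 4 * 100`, and the assumption that all items are already oriented so that higher means better QoL are choices made for this sketch. The names `DIMENSIONS`, `dimension_score`, and `musiqol_scores` are hypothetical.

```python
from statistics import mean

# Item numbers per dimension, following the Figure 2 legend (8, 4, 4, 3, 3, 3, 2, 2, 2 items).
DIMENSIONS = {
    "ADL": range(1, 9), "PWB": range(9, 13), "SYMP": range(13, 17),
    "RFR": range(17, 20), "RFA": range(20, 23), "RHCS": range(23, 26),
    "SSL": range(26, 28), "COP": range(28, 30), "REJ": range(30, 32),
}


def dimension_score(responses):
    """Mean of the valid 1-5 responses, rescaled linearly to 0-100.

    Assumption: a response of 6 ("not applicable") is treated as missing.
    Returns None when no valid response is available.
    """
    valid = [r for r in responses if 1 <= r <= 5]
    if not valid:
        return None
    return (mean(valid) - 1) / 4 * 100


def musiqol_scores(answers):
    """answers maps item number (1-31) to a response (1-6).

    Returns the 9 dimension scores plus a global index computed as the
    mean of the available dimension scores.
    """
    scores = {
        name: dimension_score([answers[i] for i in items if i in answers])
        for name, items in DIMENSIONS.items()
    }
    available = [s for s in scores.values() if s is not None]
    scores["INDEX"] = mean(available) if available else None
    return scores
```

For example, a patient answering "sometimes/somewhat" (3) to every item would obtain 50 on each dimension and on the global index under these assumptions.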

Study Design and Setting

Data from an international, multicenter, and cross-sectional MusiQoL validation study were used.[20] Patients were recruited between January 2004 and February 2005 at neurological departments in 15 countries: Argentina, Canada, France, Germany, Greece, Israel, Italy, Lebanon, Norway, Russia, South Africa, Spain, Turkey, UK, and USA. This study was performed in accordance with the Declaration of Helsinki and all applicable regulatory authority requirements and national laws (Institutional Review Boards or Independent Ethics Committees in accordance with the local requirements of each of the 15 countries). Written informed consent from patients was obtained before any study procedures were performed.

Population

The inclusion criteria included a diagnosis of MS according to McDonald,[22] being treated as an in- or outpatient at a hospital, over 18 years of age, informed consent to participate, and a native speaker of the local language. The main exclusion criteria included a neurologic diagnosis other than MS, dementia, ongoing severe relapse, an inability to complete the questionnaire unassisted, and withdrawal of consent.

Data Collection

In addition to the MusiQoL questionnaire, the following data were collected: Socio-demographic information: age (years); gender (male, female); educational level (less than 12 years, greater than 12 years); marital status (single, not single); and employment status (active, unemployed). Clinical data: disease duration (years); MS subtype (relapsing–remitting [RR], primary progressive [PP], secondary progressive [SP], and clinically isolated syndrome [CIS]);[23] and MS disability using the expanded disability status scale (EDSS)[24] (an ordinal clinical rating scale ranging from 0 [normal neurologic examination] to 10 [death due to MS]). QoL was assessed using the SF-36,[25] a generic questionnaire describing 8 subscales: physical function, social function, role-physical (RP), role-emotional (RE), mental health (MH), vitality, bodily pain, and general health. Two composite scores (physical and mental composite scores [PCS-SF-36] and [MCS-SF-36]) were calculated. The SF-36 yields scores on a 0 to 100 scale, in which 0 represents the lowest and 100 the highest QoL scores.

MCAT Procedure and Analyses

This procedure was divided into 3 phases: MIRT analysis; MCAT simulations with analyses of accuracy and precision; and clinical validity of the MusiQoL-MCAT.

Multidimensional Item Response Theory Analysis

Percentages of missing values were computed for each item. In accordance with the steps taken previously to validate the MusiQoL,[20] a between-items MIRT model was calibrated. We tested 2 flexible IRT models that allow for the consideration of items with various numbers of categories and various difficulty thresholds: the multidimensional graded response model (MRGM)[26] and the multidimensional generalized partial credit model.[27] The MRGM was retained because it yielded a better fit than the multidimensional generalized partial credit model according to Akaike's information criterion and the Bayesian information criterion. We also tested 2 IRT models with missing data and imputed data. For the model with imputed data, we used multiple data imputation because we considered the data missing not at random, following previous works on QoL.[28-30] The model with missing data was retained because it yielded a better fit in terms of Akaike's information criterion (145,922 vs 153,334) and the Bayesian information criterion (146,974 vs 154,359). Item parameters were thus estimated using the MRGM with unconditional maximum likelihood (ML) estimation, as implemented in the R package mirt.[31] We used the Metropolis–Hastings Robbins–Monro[32] method as an estimation algorithm because it provides better precision than a classical expectation-maximization algorithm approach[33] in the presence of more than 3 factors. The MRGM consists of 2 multidimensional sequential 2-parameter logistic models and is defined as follows:

$$P(x_{ij} = k \mid \theta_i) = P^*(x_{ij} \ge k \mid \theta_i) - P^*(x_{ij} \ge k+1 \mid \theta_i), \qquad P^*(x_{ij} \ge k \mid \theta_i) = \frac{\exp\left(\sum_{d} \alpha_{jd}\,\theta_{id} - \beta_{jk}\right)}{1 + \exp\left(\sum_{d} \alpha_{jd}\,\theta_{id} - \beta_{jk}\right)},$$

where i is the ith individual, j the jth item, $x_{ij}$ the ordinal response taking the value k, $\alpha_{jd}$ the item discrimination parameter according to dimension d, $\theta_{id}$ the individual parameter according to dimension d, and $\beta_{jk}$ the kth item difficulty threshold parameter.
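To make the graded response model concrete, the sketch below computes category probabilities for a single item. It assumes the usual cumulative-logit form, in which the probability of responding in category k or higher is a logistic function of `sum_d(alpha_d * theta_d) - beta_k`, and each category probability is the difference of two adjacent cumulative probabilities. This parameterization and the function name `mgrm_category_probs` are assumptions of the illustration; the mirt package uses its own intercept-based parameterization internally.

```python
import math


def mgrm_category_probs(theta, alpha, betas):
    """Category probabilities for one item under a graded response model.

    theta : list of D latent trait values for one individual
    alpha : list of D discrimination parameters for the item
            (in a between-items model only one entry is non-zero)
    betas : list of K-1 increasing difficulty thresholds
    Returns a list of K probabilities, one per response category.
    """
    z = sum(a * t for a, t in zip(alpha, theta))
    # Cumulative probabilities P*(x >= k); the boundaries 1 and 0 close the range.
    cumulative = [1.0] + [1.0 / (1.0 + math.exp(-(z - b))) for b in betas] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(betas) + 1)]


# Usage: a 5-category item loading on the first of 3 dimensions only.
probs = mgrm_category_probs([0.5, -1.0, 0.0], [1.2, 0.0, 0.0], [-1.5, -0.5, 0.5, 1.5])
```

Because the item loads on a single dimension, the values of the other two traits do not affect the result, which is what "between-items" multidimensionality means in practice.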
Bayesian maximum a posteriori (MAP) estimates[8] of person-specific parameters (i.e., latent trait estimates) were computed using the MRGM parameters and the 31 item responses, providing IRT dimension scores for each patient. In IRT, item information is a function of the item parameters (i.e., the discrimination and difficulty threshold parameters). An item with more information is more discriminating and yields a lower error of measurement. The test information is the sum of all item information. The contribution of each item to the total test information (also called the amount of test information) was calculated. The unidimensionality of each dimension was assessed using a Rasch analysis. The goodness-of-fit statistics (inlier-sensitive fit, ranging between 0.7 and 1.3) ensured that all items of the scale measured the same concept.[34] Differential item functioning (DIF) analyses were performed to compare the item differences among countries to determine whether all items behaved the same way.[35] DIF indicates whether an item performs and measures differently for 1 subgroup of a population compared with another.
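As an illustration of MAP estimation, the following sketch finds the trait value that maximizes the log posterior for a single dimension by a simple grid search. It is deliberately simplified: it handles one dimension at a time with a normal prior, whereas the study estimated 9 correlated dimensions jointly. The grid range, the prior, and the names `grm_probs` and `map_estimate` are assumptions of the sketch.

```python
import math


def grm_probs(theta, alpha, betas):
    """Unidimensional graded response category probabilities (K = len(betas) + 1)."""
    cumulative = [1.0] + [1.0 / (1.0 + math.exp(-(alpha * theta - b))) for b in betas] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(betas) + 1)]


def map_estimate(responses, items, prior_mean=0.0, prior_sd=1.0):
    """Grid-search MAP estimate of a single latent trait.

    responses : list of 0-based category indices, one per administered item
    items     : list of (alpha, betas) tuples in the same order
    """
    grid = [g / 100 for g in range(-400, 401)]  # theta from -4.00 to 4.00

    def log_posterior(theta):
        # Log of the normal prior (up to a constant) plus the log-likelihood.
        value = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
        for (alpha, betas), x in zip(items, responses):
            value += math.log(grm_probs(theta, alpha, betas)[x])
        return value

    return max(grid, key=log_posterior)


# Usage: two 3-category items, both answered in the highest category.
items = [(1.5, [-1.0, 1.0]), (1.0, [-0.5, 0.5])]
theta_hat = map_estimate([2, 2], items)
```

With no responses the estimate equals the prior mean, which mirrors the way the MCAT starts from the population mean before any item is administered.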

MCAT Simulations With Analyses of Accuracy and Precision

We performed a post-hoc or real-data simulation approach (i.e., complete response patterns to the 31 items of the MusiQoL were used to simulate the conditions of an MCAT assessment). The algorithm of the MCAT was based on Mulder and van der Linden's work for Kullback–Leibler Information Item Selection.[36] Initially, the person-specific parameter estimate was set to the IRT dimension population mean scores. As the starting item, we used the item with the highest amount of test information. Item selection depended on responses to earlier items in the questionnaire taken from the empirical data. At each step of item selection, the Bayesian MAP procedure estimated the latent trait level that maximized the posterior distribution based on the current likelihood of the data and the assumed prior distribution. As a stopping criterion, we examined 4 initial simulations based on a fixed number of items (5, 10, 15, and 20). For each simulation, MCAT dimension scores were calculated, and accuracy and precision were then assessed. Accuracy was assessed using the level of correlation between the MCAT and the IRT dimension scores based on the full set of items (levels of correlation >0.9 were expected for each dimension). Precision was assessed using 2 indicators: the standard error of measurement (SEM) and the root mean square error (RMSE). The SEMs of the MCAT dimension scores are considered indicators of reliability. According to Harvill's work,[37] there is a direct relationship between the reliability of a dimension and the SEM; lower reliability estimates provide higher SEM estimates. An acceptable range was defined as <0.55 to ensure a satisfactory reliability level (reliability >0.70). The RMSE shows how precise the MCAT dimension scores are relative to the IRT scores from the full item set.
The RMSE is calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{\theta}_i - \theta_i\right)^2},$$

where $\theta_i$ is the IRT score from the full item set of the ith individual, $\hat{\theta}_i$ is the MCAT score, and n is the number of individuals; smaller values of RMSE represent better measurement precision. RMSE values lower than or equal to 0.3 indicate excellent measurement precision.[38] According to the accuracy/precision of the first 4 simulations, other simulations were tested to determine the best MCAT version. The final version of the MusiQoL-MCAT was selected considering the lowest number of items matching with the most satisfactory level of accuracy and precision. The item exposure (i.e., the number of times each item was exposed during the CAT procedure) was described for this version.
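The accuracy and precision indicators described above can be sketched as follows. The correlation and RMSE compare MCAT scores with full-length IRT scores for the same individuals. The link between SEM and reliability uses the classical relation SEM = SD × sqrt(1 − reliability); assuming latent traits scaled to unit standard deviation (an assumption of this sketch), an SEM of 0.55 corresponds to a reliability of about 0.70, which matches the thresholds quoted in the text. Function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(full_scores, mcat_scores):
    """Root mean square error of MCAT scores relative to full-length IRT scores."""
    n = len(full_scores)
    return math.sqrt(sum((m - f) ** 2 for f, m in zip(full_scores, mcat_scores)) / n)


def reliability_from_sem(sem, trait_sd=1.0):
    """Reliability implied by an SEM, from SEM = SD * sqrt(1 - reliability)."""
    return 1.0 - (sem / trait_sd) ** 2


# Usage with made-up scores for four individuals on one dimension.
full = [-1.2, -0.3, 0.4, 1.1]
mcat = [-1.0, -0.4, 0.5, 0.9]
accuracy = pearson_r(full, mcat)
precision = rmse(full, mcat)
implied_reliability = reliability_from_sem(0.55)  # about 0.70 on a unit-variance scale
```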

Validity of the MusiQoL-MCAT

To assess the validity of the selected MusiQoL-MCAT, we explored both convergent and discriminant validity. To explore the convergent validity, Pearson correlation coefficients were used to investigate the relationships between the dimensions of the MusiQoL-MCAT and the dimensions of the generic QoL questionnaire (i.e., SF-36). In accordance with the assumptions from the initial validation of the MusiQoL,[20] we hypothesized that the MusiQoL-MCAT scores would be more correlated with scores of dimensions exploring similar aspects from the SF-36 than with those exploring dissimilar aspects. The discriminant validity was determined by exploring the relationships between the MusiQoL-MCAT scores and socio-demographic (i.e., age, gender, educational level, marital status, and employment status) and clinical (i.e., EDSS score and MS subtypes) features using t-tests, ANOVAs, and Pearson correlations. To control the familywise error rates caused by the large number of correlations, we performed multivariate permutation tests.[39,40] Several hypotheses were formulated in accordance with previous studies: the MusiQoL-MCAT should differ according to sociodemographic characteristics (i.e., with younger age, higher educational level, and being in a couple associated with higher QoL); should be negatively correlated with the severity of the disease (i.e., EDSS); and should be lower in patients with the SP form of MS. All the statistical analyses were performed using R version 2.15.2.
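One common way to control the familywise error rate across many correlations is a single-step max-statistic permutation test, sketched below. The text does not specify which multivariate permutation procedure was used, so this is an illustrative stand-in rather than the authors' method. The covariate (e.g., age) is shuffled repeatedly, the largest absolute correlation across all dimensions is recorded for each shuffle, and each observed correlation is compared with that reference distribution. All names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_stat_adjusted_p(covariate, score_columns, n_perm=2000, seed=0):
    """Familywise-adjusted p-values for the correlations between one covariate
    and several score columns (single-step max-statistic permutation test)."""
    rng = random.Random(seed)
    observed = [abs(pearson_r(covariate, col)) for col in score_columns]
    exceed = [0] * len(score_columns)
    shuffled = list(covariate)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between covariate and scores
        max_r = max(abs(pearson_r(shuffled, col)) for col in score_columns)
        for j, obs in enumerate(observed):
            if max_r >= obs:
                exceed[j] += 1
    # The +1 terms count the observed data as one permutation.
    return [(e + 1) / (n_perm + 1) for e in exceed]
```

Comparing every observed correlation with the distribution of the maximum, rather than with its own permutation distribution, is what keeps the probability of any false positive at the nominal level.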

RESULTS

The international field study sample included 1992 patients with MS. Patients were recruited from the 15 following countries: Argentina (n = 27), Canada (n = 77), France (n = 179), Germany (n = 209), Greece (n = 92), Israel (n = 66), Italy (n = 379), Lebanon (n = 20), Norway (n = 104), Russia (n = 201), South Africa (n = 53), Spain (n = 224), Turkey (n = 228), UK (n = 36), and USA (n = 97). The mean age was 42.2 (standard deviation, SD = 11.9) years; 1382 patients (70.5%) were female, and 578 patients (29.5%) were male; 592 (35.2%) had a high educational level; and 372 (21.7%) were single. Patients had an RR MS subtype in 70.4% of cases, SP in 21.0%, PP in 7.1%, and CIS in 1.5%. The median EDSS score was 3.0 (interquartile range = 3.5).

Multidimensional Item Response Theory Analysis

Percentages of missing data, estimated item parameters, information, and inlier-sensitive fit are presented in Table 1, and the IRT score distribution for each dimension is presented in Figure 1. Item 17 from the RFR dimension (“have you felt understood by your friends?”) provided the greatest amount of information, and item 16 from the SYMP dimension (“have you experienced unpleasant feelings: i.e., hot, cold?”) provided the least amount of information. No substantial DIF between countries was evidenced for any dimension, supporting the use of this MCAT in international studies.

TABLE 1

Estimated Item Parameters and Information

FIGURE 1

IRT score distribution for each MusiQoL dimension. ADL = activities of daily living, COP = coping, IRT = item response theory, MusiQoL = multiple sclerosis international quality of life questionnaire, PWB = psychological well-being, REJ = rejection, RFA = relationships with family, RFR = relationships with friends, RHCS = relationships with healthcare system, SSL = sentimental and sexual life, SYMP = symptoms.

Analyses of Accuracy and Precision

Real-data simulations were performed on 922 patients with complete response patterns to the 31 items of the MusiQoL. Accuracy and precision indicators of each simulation are presented in Table 2.

TABLE 2

MCAT Simulations: Accuracy and Precision Parameters for Each Dimension

The number of dimensions with satisfactory accuracy (i.e., correlation >0.9) increased with the number of items in the simulation (from 3 of the 9 dimensions for the 5-item simulation to 8 of the 9 dimensions for the 20-item simulation). The relationships with healthcare system dimension remained unsatisfactory regardless of the number of items in the simulation. As with accuracy, the 2 indicators of precision improved as the number of items increased. The number of dimensions with satisfactory SEM varied from 3 of the 9 dimensions for the 5-item simulation to 8 of the 9 dimensions for the 20-item simulation, and the number with satisfactory RMSE varied from 2 of the 9 dimensions for the 5-item simulation to 8 of the 9 dimensions for the 20-item simulation. The same dimension (i.e., relationships with the healthcare system) remained unsatisfactory regardless of the number of items in the simulation. As accuracy and precision of the 15- and 20-item simulations were the most satisfactory, 4 supplementary simulations were tested from 16 to 19 items. The 16-item version of the MusiQoL-MCAT was defined as the most satisfactory MCAT simulation because the level of accuracy and precision did not substantially change after 16 items. Item exposure (i.e., the utilization frequency of an item) of the 16-item version of the MusiQoL-MCAT procedure is presented in Figure 2. Three items from the SYMP and RHCS dimensions were never administered (items 15, 16, and 25), whereas 8 were administered more than 9 times out of 10 (items 1 and 2 from ADL dimension, item 10 from PWB dimension, item 14 from SYMP dimension, items 17 and 19 from RFR dimension, item 27 from SSL dimension, and item 28 from COP dimension).

FIGURE 2

Item exposure for the selected computerized adaptive testing (CAT) procedure. Items: 1–8: activity of daily living; 9–12: psychological well-being; 13–16: symptoms; 17–19: relationships with friends; 20–22: relationships with family; 23–25: relationships with health care system; 26, 27: sentimental and sexual life; 28, 29: coping; 30–31: rejection.

Validity

Convergent and discriminant validity results were assessed for the 16-item version of the MusiQoL-MCAT and are shown in Table 3. Our findings were consistent with our assumptions. Age was negatively correlated with ADL, SYMP, SSL, and REJ dimension scores. RFR dimension scores were significantly higher in women. Individuals with higher educational levels had significantly better scores, except for the SYMP, RFA, and SSL dimensions. Among single individuals, significantly lower scores were observed on the RFA, RHCS, and SSL dimensions. Unemployed people had significantly lower scores on 5 dimensions (ADL, PWB, SYMP, COP, and REJ) compared to active individuals. Disease duration was negatively correlated with ADL and REJ scores. Significant differences were observed for ADL, RHCS, and REJ dimension scores between the 4 MS subtypes, with the highest scores found in individuals with CIS and the lowest scores found in those with SP. Bonferroni pairwise post-hoc tests for the MS subtypes are presented in Appendix 1. The EDSS score was negatively correlated with all the dimension scores of the MusiQoL-MCAT, except for the RFR and RFA dimensions. Finally, significant positive correlations were found between the MusiQoL-MCAT dimension scores and the SF-36 dimension scores. As expected, the ADL dimension of the MusiQoL-MCAT was highly correlated with the physical-like dimensions (physical function and role-physical) and the physical composite score of the SF-36 (correlation coefficients from 0.60 to 0.78). The “mental/psychological-like” dimensions of the MusiQoL-MCAT (PWB, COP, and REJ) were highly correlated with the “mental/psychological-like” dimensions (RE and MH) and the mental composite score of the SF-36 (correlation coefficients from 0.40 to 0.65). The “social-like” dimensions of the MusiQoL-MCAT (RFR, RFA, and SSL) were moderately correlated with the social functioning domain of the SF-36 (coefficients lower than 0.40).

TABLE 3

Convergent and Divergent Validity of the 16-Item MCAT Procedure

DISCUSSION

To our knowledge, this study is one of the first investigations to propose a multidimensional computerized adaptive short-form questionnaire from a fixed-length available QoL questionnaire. First, we demonstrated that the MusiQoL-MCAT had satisfactory precision and accuracy properties. All the MusiQoL-MCAT dimensions had levels of correlation higher than 0.9 with the IRT dimension scores based on the full set of items, SEM lower than 0.55 and RMSE lower than 0.3, except for 1 dimension (i.e., RHCS). However, the RHCS dimension has previously shown unsatisfactory performance, especially in the initial validation procedure.[20] Despite this drawback, the experts and developers of the MusiQoL decided to maintain this dimension due to its specific content concerning the healthcare environment. Additionally, the external validity of the MusiQoL-MCAT was consistent with the external validity of the fixed-length MusiQoL.[20] The MusiQoL-MCAT scores were moderately correlated with the EDSS. These results confirmed that clinical assessments may not adequately reflect patients’ perceptions and the impact of their symptoms and that the MusiQoL-MCAT adds important complementary information to traditional clinical measures. The lowest MusiQoL-MCAT scores were reported by patients with the SP form of MS, confirming that it is the most clinically aggressive and severe form of the disease.
In this work, few significant differences were reported according to gender, which is consistent with other studies.[41] Higher educational level or being in a couple was associated with higher QoL levels, as previously reported in similar cross-sectional studies.[42] Older age was significantly associated with worse scores in the physical dimensions such as ADL and SYMP, consistent with previous findings.[43] As expected, the MusiQoL-MCAT scores were correlated with the scores of similar dimensions from the SF-36: the ADL dimension of the MusiQoL-MCAT with the physical-like dimensions of the SF-36, and the “mental/psychological-like” dimensions of the MusiQoL-MCAT with the “mental/psychological-like” dimensions of the SF-36. From a methodological perspective, 4 key issues need to be discussed: the IRT model used; the calculation of the trait estimate after an individual gives the response; the item selection; and the stopping rule. Concerning the first point, 2 types of MIRT models could have been considered: between-items and within-items models.[44] In our study, we used a between-items model (i.e., each item loading on 1 dimension only) in accordance with the steps taken previously to validate the MusiQoL.[20] A within-item multidimensional model (i.e., 1 item loading on several dimensions) could have also been considered, but the goal of this study was not to reexamine the dimensionality of the MusiQoL. Future work should explore this option and determine whether a within-item multidimensional model better fits the data, and whether it can improve the precision and accuracy properties of the MusiQoL-MCAT, especially for the relationships with healthcare system dimension. Second, 2 main algorithms are available for ability estimation: ML estimation and Bayesian estimation including MAP and expected a posteriori (EAP).
In our study, we used the Bayesian MAP method to estimate the latent trait level for the initial estimation of IRT scores, for updating the scores during the CAT procedure and for the final estimation of CAT scores. Although this option might be debatable, Yao[45] has shown that MAP yielded better precision than ML and performs similarly or better than EAP. Moreover, according to Chalmers’ findings,[31] using EAP scores for models with more than 3 factors is generally not recommended, as it results in slower estimation and less precision. Therefore, MAP scores should be used instead of EAP scores for higher dimensional models,[31] such as the MusiQoL structure. Third, the choice of the first item and following items is of great importance and depends on the approach taken previously (i.e., ML or Bayesian approach). In the Bayesian approach, it is recommended to select items with the highest information.[46] For example, Petersen et al[14] compared 2 CAT procedures, the first using the most informative item as the starting item and the second using a less informative item, and reported that administering the least or a moderately informative item first results in a greater test length and a less precise measurement. Additionally, the information item selection can also be discussed. The Kullback–Leibler information item selection seemed to be the best way to select the items in our CAT procedures. Indeed, in a recent study, Yao[47] compared the Kullback–Leibler method with 4 other methods. In many ways, the Kullback–Leibler method outperformed the other methods, producing the smallest test length, which was an important argument for clinical use of the MusiQoL.
Moreover, the Kullback–Leibler information item selection is preferable to the Fisher selection, especially if the number of items used is small, as in our study.[48,49] Fourth, we chose as a stopping criterion a fixed-length rule that was compatible with clinical practice rather than a variable-length rule, which would make the questionnaire too long because of the unsatisfactory properties of the relationships with healthcare system dimension. The MCAT simulation results indicated that 3 items were never administered (items 15, 16, and 25 from the SYMP and RHCS dimensions). These 3 items were the least discriminating items and provided the least amount of test information. This finding may not be surprising because the RHCS and SYMP dimensions appear to be more influenced by a medical perspective and are further from the patient's point of view than other MusiQoL dimensions. However, other items from these 2 dimensions (i.e., items 13, 14, 23, and 24) were administered, confirming the satisfactory distribution of item exposure rates for each MusiQoL dimension. For this reason, we did not apply strategies for controlling item exposure in the MCAT.[45,50] Last, this study also provides a broader reflection on the development strategy of new QoL measures. Fixed-length self-reported questionnaires are classically used to measure QoL in MS and other chronic diseases. CAT has proven to be efficient compared to these classical questionnaire measurements, including increased precision and avoidance of noninformative questions. As a consequence, important groundwork has been the development of unidimensional item banks containing a large number of items covering the entire range of a latent trait (e.g., fatigue, pain).[51,52] The construction of a QoL item bank is an important step to proposing QoL CAT.
However, a QoL item bank requires substantial resources and time because several issues remain unresolved: Is it possible to associate several QoL questionnaires based on various theoretical and conceptual backgrounds in the same bank? Can we associate generic and specific questionnaires? Should we associate questionnaires developed from the perspective of the patient and the experts? Additionally, the multidimensional nature of QoL involves the development of all of the unidimensional attributes of QoL that should be calibrated; then, the development of a multidimensional measure would be possible. All of these issues need to be resolved and therefore delay the development of a large QoL item bank and, thus, a multidimensional QoL CAT based on such a bank. Pending the completion of this major work, and although the number of items is relatively small in QoL questionnaires compared with item banks, the development of MCAT from available QoL questionnaires can be an attractive option based on financial and time resources.

Strengths and Limitations

A limitation in our study is that we used the entire sample only for the MIRT model calibration. MCAT simulations were performed using only the complete response patterns. To overcome this issue, it should be possible to use a well-known data imputation method, such as the multiple imputations approach, and use the imputed dataset for both MIRT model calibration and MCAT simulations. Using multiple imputations on our dataset for MIRT calibration resulted in a deterioration of the model fit. This approach encouraged us to use the raw dataset in this study, given that the sample was large enough to obtain robust results. Even with the large overall sample size in this study, the representativeness of our sample should be discussed. Compared with the most important longitudinal studies that parallel the present study, our patients were younger or older (others had mean ages of 42,[53] 44,[54] and 34 years),[55] had less severe baseline disability statuses (mean EDSS scores of 4.1[53] and 5.1[54] were seen in other studies), and had a sex-ratio of 3:1 (4:1,[53] 2:1,[54] and 2.5:1[55] were found in other studies). Future research with different sample characteristics could improve the generalizability and applicability of the MusiQoL-MCAT. The responsiveness or sensitivity to change was not tested in our study. This property, defined as the ability to detect a meaningful change, is a core psychometric property of measurement instruments.[56] This property is of major interest for the follow-up of patients with MS in clinical practice and for psychosocial research.[57,58] This property should thus be confirmed on the MusiQoL-MCAT in future longitudinal studies. 
Despite these limitations, our work has several strengths that should be recognized (e.g., a large sample size and psychometric properties assessed in accordance with international guidelines for developing questionnaires).[14,59] Moreover, it should be noted that these requirements are not systematically met for more “objective” outcome measures used by clinicians and decision makers.[60] Overly stringent requirements may cause more harm than good, especially by preventing the use and diffusion of current QoL measures. In this sense, this new multidimensional computerized adaptive short-form questionnaire has satisfactory properties and can be considered an interesting option for promoting both the use and usefulness of measuring QoL in MS clinical practice.

CONCLUSION

The MusiQoL-MCAT presents satisfactory properties and can individually tailor QoL assessment to each patient, making QoL assessment less burdensome to patients with multiple sclerosis and better adapted for use in clinical practice. As the construction of QoL item banks requires substantial resources and time, developing an MCAT from available QoL questionnaires using relevant methodology can be an attractive option in terms of financial and time resources.

TABLE 3 (Continued). Convergent and Divergent Validity of the 16-Item MCAT Procedure
