
Content Validity through Expert Judgment for the Depression Clinical Evaluation Test.

María Guillot-Valdés¹, Alejandro Guillén-Riquelme¹, Gualberto Buela-Casal¹.

Abstract

Background/Objective: The evaluation of depression requires valid and reliable measuring instruments that cover the wide spectrum of symptoms this disorder displays, so that an accurate and differential diagnosis can be made. The objective of this work is to construct the Depression Clinical Evaluation Test (DCET), which considers affective, somatic, cognitive, behavioural and interpersonal symptoms, and to analyze its content validity through expert judgment. Method: Based on different diagnostic manuals and classifications, a specification table for a depression test was established. Sixteen experts in Psychological Assessment, Psychometry and/or Psychopathology participated in its evaluation. A total of 300 items were created. The experts assessed the items according to the criteria of Content, Relevance, Clarity, Comprehension, Sensitivity, and Offensiveness. In addition, 50 adults evaluated the comprehension of the items.
Results: The degree of understanding of all the items was high, and the expert judgment favoured the suppression of 104 items, yielding a shorter measuring instrument of 196 items that is easier to apply. Conclusions: The content validity of the test is adequate and fits the agreed definition of depression.


Keywords: Assessment; Content validity; Depression; Expert judgment; Instrumental study

Year: 2022 PMID: 35572073 PMCID: PMC9055062 DOI: 10.1016/j.ijchp.2022.100292

Source DB: PubMed Journal: Int J Clin Health Psychol ISSN: 1697-2600

Depression is one of the most common psychological disorders. According to World Health Organization (WHO) data, its prevalence is around 5.2% in the general population, close to the roughly 7.2% observed in other studies (Lim et al., 2018). The study of depression has aroused interest over the years, and recently there has been a proliferation of work on its prevalence (Bueno-Notivol et al., 2021) and on the analysis of depression symptoms in connection with the COVID-19 health crisis (Cecchini et al., 2021). The evaluation of depression is complex even though a variety of instruments are available for its assessment and diagnosis (Guillot-Valdés et al., 2019, 2020), and this holds even in primary care, where short evaluations are used (Rezaeizadeha et al., 2021). One difficulty of depression assessment is that the disorder has a wide and varied symptomatology. This range includes cognitive, behavioural and psychosomatic symptoms in addition to the main emotional symptoms of the disorder. There are no scales that evaluate all of them with different items for each type of symptom. One of the most classic and widely used questionnaires is the Beck Depression Inventory (BDI-II; Beck et al., 1988). One of its advantages is that it covers a wide spectrum of depression with very few items; however, it covers each facet with only one or two items. This makes it difficult to identify reliably the most affected areas in a specific case. Thus, evaluations are commonly complemented with various specific questionnaires in order to build a reliable clinical profile of each affected area. This approach yields results that are not easy to integrate and that evaluate independent aspects of depression. Last but not least, there is controversy about whether depression has a dimensional or a categorical character, the latter being the one that prevails in current classification systems of mental disorders.
This approach influences the construction of instruments for the evaluation and diagnosis of the disorder (Chiesa et al., 2017). However, there are also contributions that emphasize the existence of an orthogonal structure between the two, which would imply that obtaining high scores in positive affect would not lead to low scores in negative affect (Watson et al., 2011). Currently, very few questionnaires focus on the dimensional approach to depression; the Basic Depression Questionnaire and the State/Trait Depression Inventory are examples on which some recent studies have been developed (Guillot-Valdés et al., 2019, 2020), although they do not cover the entire symptom picture of depressive disorder. Constructing a test requires careful planning, a clear and concrete vision of what it is intended to measure, and well-written items that include a representative sample of the possible behaviours to be assessed (Muñiz et al., 2013; Muñiz & Fonseca-Pedrero, 2019). In this case, the task is to operationalize a construct through concrete and tangible elements (items) (Carretero-Dios & Pérez, 2007; Muñiz & Fonseca-Pedrero, 2019). This requires a detailed process and a number of experts to help review the decisions taken. One of the most widely used methods to establish the content validity of a questionnaire is expert judgment. Experts can either suggest which items the instrument should contain to define the construct to be measured or, as in this case, evaluate items already created according to a series of quantitative criteria (giving scores) or qualitative criteria, suggesting changes to the wording where they consider it necessary (Garrote & del Carmen Rojas, 2015). This procedure is widely used by researchers to analyze the content validity of newly created instruments (Leyton-Román et al., 2021) or of adaptations of existing instruments (Cervilla et al., 2021).
The first aim of this study is to establish a test specification table. For this, we expect to establish an integral model that evaluates the main components of depression, thus covering all of the related symptoms. We will then develop an item bank that covers this test specification table, including a proportional number of items for each factor and subfactor. The second aim is to estimate the content validity of this item pool, based on expert judgments of the Depression Clinical Evaluation Test (DCET). In addition, we intend to analyze the degree of understanding of the item bank to verify that the items are intelligible to the adult population.

Method

Participants

The sample, selected by convenience, consisted of 16 experts. All of them held PhD degrees in Psychology, had years of expertise and voluntarily agreed to participate in the study. They were specialized in psychological evaluation, psychometry and/or clinical psychopathology and had extensive experience in the subject through their academic training and professional work. Thus, they were able to provide adequate information, evidence, judgments and evaluations (Escobar-Pérez & Cuervo-Martínez, 2008). The selection of judges followed the criteria proposed by different authors (Skjong & Wentworth, 2001; Urrutia et al., 2014; Varela-Ruiz et al., 2012). Besides expertise, the impartiality, motivation to participate, adaptability and availability of the judges were taken into account. They were contacted by email, which explained the purpose of the project and requested their collaboration. In parallel, and following the model of other authors (Fernández-Gómez et al., 2020; García-Cortés & Hernández, 2021; Luque-Vara et al., 2020), a pre-test of item comprehension was carried out with a total of 50 volunteer collaborators (M = 38, SD = 19.07, 56% women), to whom, as in the case of the experts, part of the questionnaire (50 items each) was sent by email. Informed consent was obtained from each of them. The aspect to be evaluated was the degree of understanding of each item, reflected in the question ‘If the item was understood well’, with responses ranging from bad (0) to perfect (10). The participants were also asked whether there were any words that they did not understand and, finally, whether they would express the item in another way and how. Subsequently, the mean of these scores was calculated to determine the degree of comprehension.

Instrument

For the creation of the Depression Clinical Evaluation Test (DCET), the ‘Standards for educational and psychological testing’ (American Educational Research Association et al., 2014) and the guidelines of the International Test Commission (2016) were followed. In addition, several general articles on the creation and adaptation of tests were followed (Almanasreh et al., 2019). In the first phase, based on the documentary review carried out, a definition of depression was established: it was understood as a series of mood disorders characterized by a common core symptomatology that could vary in intensity, frequency or the specific presence of symptoms. Derived from this definition and all the material consulted, the factors that composed it were established, forming a logical grouping of the characteristics set out in the manuals. Consequently, the symptoms were grouped into the following factors: affective, physiological/somatic, cognitive, behavioural and interpersonal. The number of symptoms considered in each one ranged from 3 to 8. The symptoms’ weights were established according to whether they appeared in the DSM-5 and/or in the ICD-10 and ICD-11, giving double weight to those that appeared in all classifications (Table 1). These weights were percentages ranging between 8% and 33%.

Table 1

Symptom summary for major depressive disorder / episode in DSM-5, ICD-10 and 11.

Symptom	DSM-5 (Major depressive disorder)	ICD-10 (Major depressive episode)	ICD-11 (Depressive episode)
Depressed mood	X	X	X
Loss of pleasure or interest in almost all activities	X	X	X
Significant weight gain	X		X
Significant weight loss	X	X	X
Significant increase in appetite	X		X
Significant loss of appetite	X	X	X
Insomnia	X	X	X
Hypersomnia	X	X	X
Psychomotor agitation	X		X
Psychomotor slowing down	X		X
Fatigue or loss of energy	X	X	X
Feeling worthless or excessive guilt	X	X	X
Decreased ability to concentrate	X	X	X
Decreased ability to think / make decisions	X
Recurring thoughts of death	X		X
Suicidal plans or ideation	X	X	X
Suicide attempt	X	X	X
Reduced activity level		X
Decreased attention		X	X
Loss of self-confidence		X
Feeling of inferiority		X
Grim perspective of the future		X	X
Self-harm		X
Loss of reactivity to pleasant events and stimuli		X	X
Loss of libido		X	X
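
The weighting rule described for Table 1 can be sketched in code. This is a minimal illustration of one possible reading of that rule, not the authors' published procedure: the function name `subfactor_weights`, the point values (2 for a symptom listed in all three classifications, 1 otherwise) and the normalization to percentages within a factor are all assumptions made for the example.

```python
# Hypothetical helper: the rule below is one reading of the text,
# not the authors' documented algorithm.
CLASSIFICATIONS = frozenset({"DSM-5", "ICD-10", "ICD-11"})


def subfactor_weights(symptoms):
    """Return percentage weights for the symptoms of one factor.

    `symptoms` maps a symptom name to the set of classifications listing it.
    A symptom found in all three classifications counts double (assumption).
    """
    raw = {
        name: 2 if CLASSIFICATIONS <= set(sources) else 1
        for name, sources in symptoms.items()
    }
    total = sum(raw.values())
    return {name: round(100 * points / total, 1) for name, points in raw.items()}


# Affective symptoms as marked in Table 1: each appears in all three systems.
affective = {
    "Depressed mood": {"DSM-5", "ICD-10", "ICD-11"},
    "Loss of pleasure or interest": {"DSM-5", "ICD-10", "ICD-11"},
    "Feeling worthless or excessive guilt": {"DSM-5", "ICD-10", "ICD-11"},
}
affective_weights = subfactor_weights(affective)
```

Under these assumptions the three affective symptoms receive equal weights of about a third each, which is consistent with the 33% values reported for the affective factor before the expert review.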

Once the weights of the factors and sub-factors were established, this phase was confirmed by the experts. In addition, the response scale best suited to the proposed objective was selected and presented to the experts within a ‘table of test specifications’. Two response scales were considered: one exclusively temporal, in which the ratings marked the duration of the symptoms, and the other indicating the frequency of appearance of the symptoms at three temporal moments (last month, last year and always). All the experts agreed that the second option was the best alternative. From there, only one change was proposed, in the affective factor. Originally it was composed of depressed mood, anhedonia, and undervaluation and guilt, each with a weight of 33%; after this initial trial, depressed mood changed to 50%, and anhedonia and undervaluation and guilt each became 25%. After that, a bank of 300 items was prepared, avoiding double negatives, double verbs, complex phrases and complex vocabulary. These items were subjected to qualitative evaluation by six experts, who were asked to indicate the adequacy of the definitions given of depression and of each of the facets, as well as of the components that formed them. They were also asked to evaluate the adequacy of the percentage of importance given to each facet in a component (established according to appearance in the DSM, the ICD or both).

Procedure

In the second phase, the second expert judgment was carried out by 13 judges (three of whom had also participated in the previous phase). First, instructions were provided on the importance of this procedure and the tasks to be performed. After the initial instructions, the general information of the test was presented so that the experts had everything necessary to understand the complete final test and could offer suggestions on the general idea of the questionnaire and its objective. Subsequently, the components and the facets of each of them were presented. Along with the definitions, the weight of the factor within the component was indicated (see Appendix A). Then, the experts were asked to use the response scale. To avoid the fatigue effect, the questionnaire was divided into six equal parts (50 items each). Each part had the same number of items for each factor and subfactor, presented in shuffled order, to avoid both fatigue and acquiescent responding when evaluating all the items of the same factor. Some experts were sent all parts of the questionnaire (300 items) and others only one (50 items) or two of them (100 items). In all cases, the criteria to be evaluated were the following:
Content (the item belongs to the indicated factor and subfactor): No (0); Yes, just the factor (1); Yes (2).
Relevance (the item is relevant to the construct): Not relevant (0); Needs some revision (1); Relevant, but minor revision (2); Relevant (3).
Clarity (the item is clear or needs some revision): Confusing (0); Needs some revision (1); Slight revision (2); Clear (3).
Comprehension (the item can be interpreted in different ways): No (0); In two ways (1); In several ways (2).
Sensitivity (the item will allow differentiating between depression patients and subjects without the disorder): No (0); In some cases (1); In most cases (2); Yes (3).
Offensiveness (the item may offend the evaluated persons): No (0); In some cases (1); In most cases (2); Yes (3).
The qualitative observations of the experts were considered for each of the items that formed the original instrument. In total, five judgments were obtained for each of the parts into which the instrument was divided (50 items each). Information was obtained from each of the experts individually (following the individual aggregate method) and confidentially, without the experts having contacted each other (Almenara & Cejudo, 2013). The data were collected in a Microsoft Excel 2010 sheet and then processed in the SPSS 25 statistical programme. This work was approved by the Ethics Committee of the University of Granada (Spain).
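
The division of the item bank into six balanced parts can be illustrated with a short sketch. It assumes a simple stratified deal: items are grouped by subfactor, shuffled, and distributed in turn across the parts. The function name `split_into_parts`, the toy bank of 25 subfactors with 12 items each, and the fixed random seed are hypothetical; the article does not report the actual number of items written per subfactor or how the parts were assembled.

```python
import random
from collections import defaultdict


def split_into_parts(items, n_parts=6, seed=0):
    """Split (item_id, subfactor) pairs into n_parts balanced, shuffled parts."""
    rng = random.Random(seed)
    by_subfactor = defaultdict(list)
    for item_id, subfactor in items:
        by_subfactor[subfactor].append(item_id)

    parts = [[] for _ in range(n_parts)]
    for subfactor in sorted(by_subfactor):
        ids = by_subfactor[subfactor]
        rng.shuffle(ids)
        # Deal items like cards; shares are exactly equal only when the
        # subfactor size is a multiple of n_parts.
        for position, item_id in enumerate(ids):
            parts[position % n_parts].append(item_id)

    for part in parts:
        rng.shuffle(part)  # mix subfactors within each part
    return parts


# Toy bank (invented): 300 items, 25 subfactors with 12 items each.
bank = [(i, f"subfactor_{i % 25}") for i in range(300)]
parts = split_into_parts(bank)
```

With this toy bank each of the six parts holds 50 items, two per subfactor, in mixed order.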

Results

All of the items that met the established requirements were considered adequate. Items that were only partially adequate and required changes, and inadequate items judged incongruous or problematic under the established criteria, were eliminated. First, the adequacy of the item content to the measured construct (in this case, depression) was analyzed. All items with scores below 1.6 were eliminated (this scale ranges from 0 to 2). Following this criterion, 39 items were eliminated (13% of the total). Then, items with clarity below 2.2 (scale from 0 to 3) were eliminated, which removed 17 items (6% of the total). The next criterion was relevance, for which a mean of 2.4 (scale from 0 to 3) across the five experts was taken as the cut-off point. Applying this criterion eliminated a further 42 items (14% of the total). Finally, some items had acceptable scores but fell short of the maximum in several areas and also exhibited slight comprehension problems. Here, 8 items were removed (4% of the total). This process involved the suppression of 104 items. Some of the items (10) were corrected in their wording. All this made it possible to obtain a clearer and slightly shorter measuring instrument, with 196 items, which helped to reduce the application time and improve the objectivity of the response options. Table 2 shows the number of items that finally remained in each Factor and Subfactor.
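
The sequential screening by cut-off can be sketched as follows. The three thresholds (1.6 for content, 2.2 for clarity, 2.4 for relevance) and their order come from the text; everything else is an assumption for illustration, including the function name `screen_item`, the use of the mean across judges for every criterion, and the invented ratings. The final qualitative step that removed the last group of items is not modelled.

```python
from statistics import mean

# Cut-offs reported in the text, applied in the order described.
CUTOFFS = [("content", 1.6), ("clarity", 2.2), ("relevance", 2.4)]


def screen_item(ratings):
    """Return None if the item is kept, else the first criterion it fails.

    `ratings` maps a criterion name to the list of scores from the judges.
    """
    for criterion, cutoff in CUTOFFS:
        if mean(ratings[criterion]) < cutoff:
            return criterion
    return None


# Invented ratings from five judges, for illustration only.
example_items = {
    "item_a": {"content": [2, 2, 2, 1, 2], "clarity": [3, 3, 2, 3, 3], "relevance": [3, 3, 3, 2, 3]},
    "item_b": {"content": [2, 1, 1, 2, 1], "clarity": [3, 3, 3, 3, 3], "relevance": [3, 3, 3, 3, 3]},
    "item_c": {"content": [2, 2, 2, 2, 2], "clarity": [3, 3, 3, 3, 3], "relevance": [2, 2, 3, 2, 2]},
}
decisions = {name: screen_item(r) for name, r in example_items.items()}
```

In this invented example, `item_a` is kept, `item_b` fails on content (mean 1.4) and `item_c` fails on relevance (mean 2.2).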

Table 2

Number of items corresponding to each Factor and Subfactor of the DCET.

Subfactor	Factor (primary)	Items n.
Depressed mood	Affective	25
Anhedonia	Affective	13
Undervaluation and guilt	Affective	5
Thoughts of undervaluation	Affective	7
Vacuum sensation	Somatic	1
Sleep disturbances	Somatic	8
Appetite / weight alteration	Somatic	4
Fatigue	Somatic	8
Motor agitation	Somatic	3
Language slowing	Somatic	2
Pain	Somatic	3
Decreased libido	Somatic	3
Disinterest in activities	Cognitive	8
Decreased concentration	Cognitive	8
Decreased attention	Cognitive	12
Thoughts of death	Cognitive	10
Expressions of discomfort	Behavioural	4
Abandonment of pleasant activities	Behavioural	7
Variation in diet	Behavioural	3
Worse task performance	Behavioural	8
Self-harm / suicide	Behavioural	9
Addictive substance abuse	Behavioural	3
Social deterioration	Interpersonal	7
Family deterioration	Interpersonal	6
Work / school impairment	Interpersonal	7
Partner impairment	Interpersonal	6
Deterioration of other areas	Interpersonal	6
Clinical discomfort	Interpersonal	10
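
As a quick arithmetic check, the item counts in Table 2 can be tallied per factor. The counts below are copied from the table; the variable names are only illustrative.

```python
# Item counts per subfactor, copied from Table 2 and grouped by factor.
items_per_factor = {
    "Affective": [25, 13, 5, 7],
    "Somatic": [1, 8, 4, 8, 3, 2, 3, 3],
    "Cognitive": [8, 8, 12, 10],
    "Behavioural": [4, 7, 3, 8, 9, 3],
    "Interpersonal": [7, 6, 7, 6, 6, 10],
}

factor_totals = {factor: sum(counts) for factor, counts in items_per_factor.items()}
grand_total = sum(factor_totals.values())
```

The factor totals are 50, 32, 38, 34 and 42 items respectively, and they add up to the 196 items reported for the final instrument.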

In addition to the expert judgment, the 300 items were subjected to a comprehension evaluation in an adult sample. The responses of the 50 people surveyed were taken into account (scoring their understanding on a scale of 0 to 10), with an average comprehension of 9.82 out of 10. No item had a comprehension score lower than 9, which indicated that all the items were easily understandable and, therefore, it was not necessary to delete or modify any item after the analysis.

Discussion

The objective of this work was to propose a comprehensive model of depression in order to develop a test for its evaluation. Secondly, it was intended to estimate the content validity, based on expert judgments, of the DCET, which included five dimensions of the disorder for adults. Finally, the authors wanted to evaluate the comprehension of the developed items. After the different analyses, a test specification table was developed which adequately described the clinical criteria. From it, a sensitive and valid bank of items was created, after purification. In addition, the items were understandable. One of the strengths of this instrument is that it has been created with the intention of exhaustively evaluating the main, core and representative components of depression that are not present in cases of pure anxiety. This represents an advance over current questionnaires (e.g., BDI, Beck et al., 1988; CBD, Peñate, 2001; IDER, Spielberger et al., 2008). Likewise, it should be noted that the initial item bank was exhaustive enough to cover the entire symptomatic picture of depressive disorder, grouped by the following factors: affective, somatic, cognitive, behavioural and interpersonal. Also, various subfactors were considered within each one of them. This corresponds with current psychometric specifications (Muñiz et al., 2013). The items were submitted to a quality evaluation by experts, who rated them on various categories (relevance, representativeness, etc.); this procedure is an essential criterion for determining the quality of measurement of an instrument (Muñiz & Fonseca-Pedrero, 2019). Almenara and Cejudo (2013) pointed out, among the most outstanding benefits of this methodology, the level of depth it offers, how easy it is to use, and the fact that the technical and human requirements for its use are not too demanding.
The present study selected 16 experts to respond to the proposed objectives, a number within the range recommended by various authors (Urrutia et al., 2014; Varela-Ruiz et al., 2012). Experts in the field of clinical psychology were selected, and all of them were required to have experience in research and treatment of emotional and depressive disorders and in psychometrics. In view of the results obtained, the instrument has adequate content validity to evaluate depression and its symptoms. Furthermore, the sub-factors that compose it also fit the theoretical definition of depression proposed. This will be essential for evaluating depression comprehensively and will help clinicians to identify the main affected areas for treatment (Mavranezouli et al., 2020; Pybis et al., 2017). It is also essential to have evaluation instruments with a dimensional rather than categorical approach. Currently, the ICD-11 (World Health Organization, 2019) recommends the use of these types of approaches as they can more appropriately address various disorders (e.g., personality disorders; Chiesa et al., 2017; Fowler et al., 2015; Waugh et al., 2017). This work is not without its limitations. One of the most notable was the large number of items that the instrument initially covered, which meant dividing the questionnaire when presenting it to the experts. When evidence of construct validity is gathered in future work, exploratory analyses should also maintain an adequate number of items in each subfactor, taking special care with factors that have few items. The choice of the number of experts was also somewhat difficult, because authors differ on this point. Some consider the ideal range to be between 7 and 30 (Urrutia et al., 2014). Most authors recommend consulting more than 10 experts (García-Martín et al., 2016; Juárez-Hernández & Tobón, 2018).
Thus, for the present study, a total of 16 experts were chosen (6 for the first phase and 10 additional ones for the second) for their availability as well as their level of experience in the matter. Future research will focus on applying pertinent statistical analyses (EFA, CFA) that allow selecting the items that will finally constitute each of the factors and sub-factors with adequate statistical significance. In any case, the authors of this work have managed to develop a pilot instrument to assess depression in a multidimensional way.

Funding

This study has been funded by Bursary FPU17/05262 for University Professor Training as part of the first author's thesis (Psychological Doctoral Programme B13 56 1; RD 99/2011).

19 in total

1. Cost-effectiveness of psychological treatments for post-traumatic stress disorder in adults.

Authors: Ifigeneia Mavranezouli; Odette Megnin-Viggars; Nick Grey; Gita Bhutani; Jonathan Leach; Caitlin Daly; Sofia Dias; Nicky J Welton; Cornelius Katona; Sharif El-Leithy; Neil Greenberg; Sarah Stockton; Stephen Pilling
Journal: PLoS One Date: 2020-04-30 Impact factor: 3.240

2. Ten steps for test development

Authors: José Muñiz; Eduardo Fonseca-Pedrero
Journal: Psicothema Date: 2019-02

3. Categorical and dimensional approaches in the evaluation of the relationship between attachment and personality disorders: an empirical study.

Authors: Marco Chiesa; Antonella Cirasola; Riccardo Williams; Valentina Nassisi; Peter Fonagy
Journal: Attach Hum Dev Date: 2016-11-29

4. An inventory for measuring clinical anxiety: psychometric properties.

Authors: A T Beck; N Epstein; G Brown; R A Steer
Journal: J Consult Clin Psychol Date: 1988-12

5. Evaluation of methods used for estimating content validity.

Authors: Enas Almanasreh; Rebekah Moles; Timothy F Chen
Journal: Res Social Adm Pharm Date: 2018-03-27

6. Psychological Assessment with the DSM-5 Alternative Model for Personality Disorders: Tradition and Innovation.

Authors: Mark H Waugh; Christopher J Hopwood; Robert F Krueger; Leslie C Morey; Aaron L Pincus; Aidan G C Wright
Journal: Prof Psychol Res Pr Date: 2017-04

7. The comparative effectiveness and efficiency of cognitive behaviour therapy and generic counselling in the treatment of depression: evidence from the 2nd UK National Audit of psychological therapies.

Authors: Jo Pybis; David Saxon; Andy Hill; Michael Barkham
Journal: BMC Psychiatry Date: 2017-06-09 Impact factor: 3.630

8. Content Validation of an Instrument for the Assessment of School Teachers' Levels of Knowledge of Diabetes through Expert Judgment.

Authors: Trinidad Luque-Vara; Marta Linares-Manrique; Elisabet Fernández-Gómez; Adelina Martín-Salvador; María Angustias Sánchez-Ojeda; Carmen Enrique-Mirón
Journal: Int J Environ Res Public Health Date: 2020-11-19 Impact factor: 3.390