
Validity and Utility of the Patient Health Questionnaire (PHQ)-2 and PHQ-9 for Screening and Diagnosis of Depression in Rural Chiapas, Mexico: A Cross-Sectional Study.

Jafet Arrieta^1,2,3, Mercedes Aguerrebere^3, Giuseppe Raviola^2,3, Hugo Flores^2,3,4, Patrick Elliott^2,3,4, Azucena Espinosa^3, Andrea Reyes^3, Eduardo Ortiz-Panozo^5, Elena G Rodriguez-Gutierrez^6, Joia Mukherjee^2,3,4, Daniel Palazuelos^2,3,4, Molly F Franke^2.

Abstract

BACKGROUND: Depressive disorders are frequently underdiagnosed in resource-limited settings because of lack of access to mental health care or the inability of healthcare providers to recognize them. The Patient Health Questionnaire (PHQ)-2 and the PHQ-9 have been widely used for screening and diagnosis of depression in primary care settings; however, the validity of their use in rural, Spanish-speaking populations is unknown.
METHOD: We used a cross-sectional design to assess the psychometric properties of the PHQ-9 for depression diagnosis and estimated the sensitivity and specificity of the PHQ-2 for depression screening. Data were collected from 223 adults in a rural community of Chiapas, Mexico, using the PHQ-2, the PHQ-9, and the World Health Organization Quality of Life BREF Scale (WHOQOL-BREF).
RESULTS: Confirmatory factor analysis suggested that the 1-factor structure fit reasonably well. The internal consistency of the PHQ-9 was good (Cronbach's alpha ≥ 0.8) overall and for subgroups defined by gender, literacy, and age. The PHQ-9 demonstrated good predictive validity: Participants with a PHQ-9 diagnosis of depression had lower quality of life scores on the overall WHOQOL-BREF Scale and each of its domains. Using the PHQ-9 results as a gold standard, the optimal PHQ-2 cutoff score for screening of depression was 3 (sensitivity 80.00%, specificity 86.88%, area under receiver operating characteristic curve = 0.89; 95% confidence interval [0.84, 0.94]).
CONCLUSION: The PHQ-2 and PHQ-9 demonstrated good psychometric properties, suggesting their potential benefit as tools for depression screening and diagnosis in rural, Spanish-speaking populations.


Keywords: mental health care; primary care; validation


Year: 2017 PMID: 28195649 PMCID: PMC5573982 DOI: 10.1002/jclp.22390

Source DB: PubMed Journal: J Clin Psychol ISSN: 0021-9762

Background

Mental disorders represent 14% of the global burden of disease (Alwan, 2011) and, along with substance use disorders, were the leading cause of disability‐adjusted life years (DALYs) and years lived with disability (YLD) in 2010. With a prevalence of 4% globally (Ferrari et al., 2013), depressive disorders accounted for 41% of DALYs caused by mental and substance use disorders (Whiteford et al., 2013). In Mexico, depression prevalence estimates range from 7% to 9% (Medina‐Mora, Borges, Benjet, Lara, & Berglund, 2007; Spitzer, Williams, & Kroenke, 2014). Depression is associated with severe physical and social impairment, lower quality of life (QOL; Andriopoulos, Lotti‐Lykousa, Pappa, Papadopoulos, & Niakas, 2013; Papakostas et al., 2004; Pyne et al., 1997), and higher health care utilization in places where health services are available (Katon et al., 1990; Wu, Erickson, Piette, & Balkrishnan, 2012). Because patients living with depression often do not seek help for psychological problems but seek care for somatic symptoms instead, their depression often goes unrecognized (Katon & Ciechanowski, 2002; Roness, Mykletun, & Dahl, 2005), leading to treatment delays. Among patients presenting with depression and anxiety in the outpatient medical setting, it is estimated that more than 50% present with physical rather than psychological complaints (Kroenke, 2003). In the World Health Organization's (WHO) Psychological Problems in General Health Care study, physicians correctly recognized depression in only 42% of patients attending a primary care consultation for depressive symptoms (Simon, Goldberg, Tiemens, & Ustun, 1999). Lack of access to mental health care may further exacerbate treatment delays, particularly in resource‐limited settings (Benjet, Borges, Medina‐Mora, Fleiz‐Bautista, & Zambrano‐Ruiz, 2004; Familiar et al., 2013; Jesse, Dolbier, & Blanchard, 2008; Roness et al., 2005; WHO, 2011).
In rural or underserved settings, a shortage of mental health professionals, medications, and infrastructure (Kitchen Andren et al., 2013) and a lack of available services in the health sector may lead people to seek care with traditional healers rather than with trained clinicians (Nigenda, Mora‐Flores, Aldama‐Lopez, & Orozco‐Nunez, 2001; Pedersen & Baruffati, 1985; Roness et al., 2005; WHO, 2011). Furthermore, even when mental health services are available, stigma, exclusion, and discrimination may prevent care and treatment from reaching people with mental disorders (WHO, 2011). In Mexico, the delay to seek treatment has been reported at 10.6 years for early onset depression and 1.8 years for late onset depression, on average (Benjet et al., 2004). Given that patients with depression often delay seeking care and, when they do, seek care for somatic rather than psychological symptoms, it is a challenge for primary health care providers to identify depressive disorders early. Several screening questionnaires have been developed as tools to guide early detection of depression and clinical decision making. The Patient Health Questionnaire (PHQ)‐2 (Kroenke, Spitzer, & Williams, 2003) and the PHQ‐9 (Spitzer et al., 1999) were specifically designed as depression screening and diagnostic instruments for use in primary care settings, to facilitate the delivery of evidence‐based mental health care interventions in places where there is a lack of specialized mental health providers (Spitzer, Kroenke, & Williams, 1999). Spanish versions of the PHQ‐9 have been reported to be reliable and valid measures of depression in urban clinical settings in Spain (Diez‐Quevedo, Rangil, Sanchez‐Planell, Kroenke, & Spitzer, 2001), Honduras (Wulsin, Somoza, & Heck, 2002), Chile (Baader et al., 2012), and Mexico (Familiar et al., 2015).
However, the PHQ‐9 has not yet been validated in rural and highly marginalized Spanish‐speaking populations, where literacy rates are lower than in urban settings (Stromquist, 2001) and the subjective experience of illness is shaped differently by the cultural background of individuals and the sociological characteristics of their context (Castro & Eroza, 1998). The aim of this study was to assess the psychometric properties of the PHQ‐9 for diagnosis of depression in a rural, Spanish‐speaking population and, if the PHQ‐9 proved to be valid, to estimate the sensitivity and specificity of the PHQ‐2 for screening of depression.

Method

Setting and Study Population

We conducted a cross‐sectional study between July and December 2014 in a rural and highly marginalized community of the Sierra Madre Mountains (the Sierra) of Chiapas, Mexico, in collaboration with Compañeros En Salud (CES), Partners In Health's sister organization in Mexico. Chiapas is the poorest state in Mexico and has a population of approximately 5 million (Instituto Nacional de Estadística y Geografía [INEGI], 2010). Approximately 51% of the population lives in rural areas, 30% is indigenous, and 75% lives below the poverty line (INEGI, 2010). In the Sierra, living conditions are difficult, with approximately 30% of homes lacking water, 46% lacking sewage, and 15% lacking electricity (Instituto para el Federalismo y el Desarrollo Municipal, 2010). Chiapas has the lowest level of effective health coverage in the country, which is affected by high levels of marginalization, geographic barriers to access care, poor communication infrastructure, and a shortage of human resources for health (Lozano et al., 2013). The prevalence of depression has been estimated at 6.3% for women and 2.6% for men in Chiapas (Belló, Puentes‐Rosas, Medina‐Mora, & Lozano, 2005), where there are only 26 psychiatrists, for a rate of 0.54 per 100,000 population, compared to a rate of 2.7 per 100,000 population in Mexico overall (Heinze, del Carmen Chapa, Santiesteban, & Vargas, 2012). In 2012, CES launched a community‐based mental health program (Belkin et al., 2011; Raviola, Eustache, Oswald, & Belkin, 2012) that aimed to improve access to quality mental health care in the Sierra. This program includes active case‐finding activities for depression, which comprise the use of the PHQ‐2 for screening followed by the PHQ‐9 for diagnosis of depression. Patients with a PHQ‐9 score greater than 9 are linked to the clinic for further diagnosis and treatment.

Ethics, Consent, and Permissions

The protocol received ethical approval from the Institutional Review Board (IRB) of the Harvard Medical School Office of Human Research Administration and the Tecnologico de Monterrey School of Medicine IRB. All participants provided verbal informed consent.

Measures

Sociodemographic characteristics

We assessed ability to read and write and financial situation, a proxy for socioeconomic status (SES), via self‐report.

PHQ‐9 (Spitzer et al., 1999)

The PHQ was developed to make a criteria‐based diagnosis of major depressive disorder. The PHQ‐9 comprises nine items that evaluate the presence of the nine Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (American Psychiatric Association, 2000a) criteria for major depressive disorder in the previous 2 weeks. Each item of the PHQ‐9 is rated on a 4‐point scale, ranging from 0 (not at all) to 3 (nearly every day), for a total score ranging from 0 to 27. Higher scores indicated an increased severity of symptoms and an increased likelihood of major depressive disorder (Kroenke, Spitzer, & Williams, 2001). Questionnaires with up to two missing values are scored, replacing any missing values with the average score of the completed items (Kroenke, Spitzer, Williams, & Löwe, 2010). Cutoffs of 5, 10, 15, and 20 represent mild, moderate, moderately severe, and severe levels of depressive symptoms, respectively (Kroenke et al., 2010). A cutoff of 10 or greater has been described as diagnostic in systematic reviews and meta‐analysis of the PHQ‐9 (Kroenke et al., 2010; Manea, Gilbody, & McMillan, 2015) and was used as the cutoff for a diagnosis of depression in the present study. A final item assesses the perception of social, functional, and occupational impairment caused by the symptoms examined by the PHQ‐9. Participants rate how difficult their depressive symptoms made it to do their work, take care of things at home, or get along with other people, with four possible responses: not difficult at all, somewhat difficult, very difficult, and extremely difficult. We used the official Spanish translation of the PHQ‐9 (Spitzer et al., 2014).
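The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the study's own code: the function names `score_phq9` and `severity` are hypothetical, the label used for totals below 5 is an assumption, and because the text does not say whether imputed totals were rounded, no rounding is applied here.

```python
from typing import Optional, Sequence

# Severity bands as described in the text (cutoffs of 5, 10, 15, and 20).
SEVERITY_BANDS = [(20, "severe"), (15, "moderately severe"),
                  (10, "moderate"), (5, "mild")]


def score_phq9(items: Sequence[Optional[int]]) -> Optional[float]:
    """Return the PHQ-9 total, or None when more than two items are missing.

    Missing items (None) are replaced by the mean of the completed items.
    """
    if len(items) != 9:
        raise ValueError("PHQ-9 requires exactly nine item responses")
    answered = [x for x in items if x is not None]
    if any(x not in (0, 1, 2, 3) for x in answered):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    n_missing = 9 - len(answered)
    if n_missing > 2:
        return None  # questionnaire is not scored
    mean_completed = sum(answered) / len(answered)
    return sum(answered) + n_missing * mean_completed


def severity(total: float) -> str:
    """Map a PHQ-9 total to the severity label of the highest cutoff reached."""
    for cutoff, label in SEVERITY_BANDS:
        if total >= cutoff:
            return label
    return "minimal or none"  # assumed label for totals below 5
```

For instance, a questionnaire with seven answered items summing to 9 and two missing items would receive a total of 9 + 2 × (9/7), roughly 11.6, which falls in the moderate band.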

PHQ‐2 (Kroenke, 2003)

The PHQ‐2 comprises the first two items of the PHQ‐9 and evaluates the frequency of depressed mood and little interest or pleasure in doing things over the past 2 weeks. Items are rated on a 4‐point scale, ranging from 0 (not at all) to 3 (nearly every day), for a total score ranging from zero to six (Kroenke et al., 2010). A cutoff of 3 or greater has been found to have the greatest sensitivity and specificity for screening of depression (Kroenke et al., 2003).
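The two-step use of the instruments in the CES program (PHQ‐2 for screening, then PHQ‐9, with scores greater than 9 linked to the clinic) can be written as a small decision function. The sketch below is hypothetical: the function names and the returned labels are illustrative assumptions, and it does not model the separate referral of participants reporting self‐harm ideation.

```python
from typing import Optional


def phq2_total(item1: int, item2: int) -> int:
    """Sum the first two PHQ-9 items (each scored 0-3) into a PHQ-2 total."""
    for value in (item1, item2):
        if value not in (0, 1, 2, 3):
            raise ValueError("each item must be scored 0, 1, 2, or 3")
    return item1 + item2


def triage(phq2: int, phq9: Optional[float] = None,
           phq2_cutoff: int = 3, phq9_threshold: float = 9) -> str:
    """Return the next step for one person (labels are illustrative)."""
    if not 0 <= phq2 <= 6:
        raise ValueError("PHQ-2 total must be between 0 and 6")
    if phq2 < phq2_cutoff:
        return "screen negative"
    if phq9 is None:
        return "administer PHQ-9"
    if phq9 > phq9_threshold:  # "greater than 9", as in the program description
        return "refer to clinic"
    return "no referral"
```

Keeping the cutoffs as parameters makes it easy to explore how a different PHQ‐2 threshold would change the number of people sent on to the second step.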

The WHO Quality of Life BREF (WHOQOL‐BREF) assessment instrument

We used the WHOQOL‐BREF to assess QOL (Harper, 1996). This scale is an abbreviated 26‐item version of the WHOQOL‐100 and comprises items that were extracted from the WHOQOL‐100 field trial data (Harper, 1996). The WHOQOL‐BREF has one item from each of the 24 facets of QOL included in the WHOQOL‐100, plus two items from the general facet on overall QOL and general health. The facets are subsumed in four domains: physical health, psychological health, social relationships, and environment. The scores for each of the domains are transformed on a scale from 0 to 20, to enable comparisons to be made between domains composed of unequal numbers of items. A higher score for the WHOQOL‐BREF scale reflects a higher level of functioning and QOL (Harper, 1996). The WHOQOL‐BREF instrument has proved to be reliable and valid for assessing QOL among diverse Spanish‐speaking populations, and its psychometric properties have been shown to be similar to those of the English version (Colbourn, Masache, & Skordis‐Worrall, 2012; Espinoza, Osorio, Torrejon, Lucas‐Carrasco, & Bunout, 2011; Lucas‐Carrasco, Laidlaw, & Power, 2011; Lucas‐Carrasco, 2012).
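A minimal sketch of the domain transformation follows. It assumes the commonly documented WHOQOL‐BREF convention, which the text does not spell out: each item is rated 1 to 5, and a domain score is the mean of the domain's items multiplied by 4 (which gives an attainable range of 4 to 20). Reverse scoring of negatively worded items and the instrument's missing‐data rules are deliberately omitted, and `domain_score` is a hypothetical name.

```python
from typing import Sequence


def domain_score(item_responses: Sequence[int]) -> float:
    """Mean of a domain's item ratings (1-5) multiplied by 4 (assumed convention)."""
    if not item_responses:
        raise ValueError("at least one item response is required")
    if any(r not in (1, 2, 3, 4, 5) for r in item_responses):
        raise ValueError("each item must be rated 1, 2, 3, 4, or 5")
    return 4 * sum(item_responses) / len(item_responses)
```

Under this assumption, seven items summing to 25 give 4 × 25/7, about 14.29, so domains with different numbers of items end up on a common scale.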

Data Collection

Seven medical students visited 152 households as part of a programmatic census and active case‐finding activity, and they invited all eligible adult residents to participate in the study (N = 250). Eligible participants were those who were 18 years or older, resided within the study catchment area, and were native‐Spanish speakers. The students informed the potential participants about the study and asked for verbal informed consent. After obtaining informed consent, the medical students conducted a face‐to‐face interview using the PHQ‐2, the PHQ‐9, and the WHOQOL‐BREF assessment questionnaires. A total of 223 (89%) people completed the interview. Those items that were not answered or understood by the participants were computed as missing data. Participants with a PHQ‐9 score greater than 9 and those who reported self‐harm ideation or attempts were referred to the clinic for further evaluation by the local physician. Additionally, in a random subset (10%) of participants with a PHQ‐9 score greater than 9, a local psychiatrist, blind to the PHQ‐9 results, conducted a second depression evaluation using the Hamilton Depression Rating Scale (Hamilton, 1960) and clinical criteria.

Data Analysis

We excluded eight participants with more than two missing values in the PHQ‐9. We conducted confirmatory factor analyses using weighted least squares and compared goodness of fit statistics between a one‐factor solution and two two‐factor solutions proposed for Mexican women and a United States‐based population, including Latina women, respectively (Familiar et al., 2015; Granillo, 2012). Model goodness of fit was assessed using the comparative fit index (Bentler, 1990) and the Bentler‐Bonett non‐normed index (Bentler & Bonett, 1980). Values > 0.90 for these indices indicate acceptable fit. We also examined the root mean square error of approximation (RMSEA), for which a value < 0.08 indicates an excellent model fit. Finally, we examined factor loadings under each solution and their statistical significance. Once we identified the optimal model, we assessed internal consistency of the PHQ‐9 in the study population and across subgroups defined by gender, literacy, and age, using the Cronbach's alpha coefficient. We assessed predictive validity in two ways. First, we compared WHOQOL‐BREF domain scores between participants with PHQ‐9 scores greater than 9 and those with scores less than or equal to 9, and tested for differences using the Wilcoxon rank‐sum test. Based on previous research studies (Andriopoulos et al., 2013; Papakostas et al., 2004; Pyne et al., 1997), we anticipated that the median WHOQOL‐BREF scores for the overall scale and each of the domains would be lower for participants with a PHQ‐9 score greater than 9. Second, we conducted univariable logistic regression analysis to evaluate the association between sociodemographic characteristics and a PHQ‐9 diagnosis of depression.
Based on prior studies, we hypothesized that the presence of depression symptoms would be more common among women than men, and among those with lower SES compared to those with higher SES (Andrade et al., 2003; Andriopoulos et al., 2013; Belló et al., 2005; Bromet et al., 2011; Popoola & Adewuya, 2012; Rancans, Vrublevska, Snikere, Koroleva, & Trapencieris, 2014; Slone et al., 2006). After establishing the validity of the PHQ‐9, we assessed the sensitivity, specificity, and positive and negative likelihood ratios of the PHQ‐2 as a screening instrument, using the PHQ‐9 as a gold standard. We used receiver operating characteristic (ROC) curve analysis to find the optimal PHQ‐2 cutoff for depression screening. Statistical significance for all tests was determined at a p‐value < 0.05. We computed descriptive and analytic statistics of the quantitative data obtained in this study using STATA (version 13.1).

Results

Sociodemographic Characteristics

Table 1 shows the sociodemographic characteristics of the study population. Of the 215 participants included for analysis, 152 (71%) were women. The mean age of the participants was 38 (standard deviation [SD] 16) years, and 21% of the participants were unable to read or write. Agriculture was the main economic activity for 90% of the participants, and 48% of the participants reported not having enough money for basic expenses. Ninety‐one percent had access to at least one social program, including Seguro Popular, a health insurance program, and Oportunidades, a conditional cash transfer program.

Table 1

Sociodemographic Characteristics of the Study Population (N = 215)^a

Variable	N (%) or mean (SD)
Woman	152 (71)
Age	38 (16)
Literate	169 (79)
Level of education
Secondary or more	69 (32)
Primary or less	122 (57)
None	24 (11)
Has a partner	171 (80)
Has children	185 (86)
Access to services (electricity, water and sanitation)	181 (84)
Water source
Tap (inside house)	7 (3)
Tap (in yard)	180 (84)
Communal tap	7 (3)
River/stream/pond	21 (10)
Toilet type
Flush toilet	6 (3)
Bucket toilet	202 (94)
Pit latrine	1 (0)
None	6 (3)
Energy source
Electricity	209 (97)
Wood	5 (2)
None	1 (0)
Economic activity
Agriculture	194 (90)
Small business	2 (1)
Both	10 (5)
Other	9 (4)
Religion (N=213)
Catholic	198 (93)
Presbyterian	6 (3)
Other	5 (2)
None	4 (2)
Access to social programs	196 (91)
Seguro Popular (health insurance program)	55 (26)
Oportunidades (conditional cash transfer program)	10 (5)
More than one social program	131 (61)
Financial situation
Enough money for food/clothes and other things	16 (7)
Enough money for food/clothes but not other things	95 (44)
Not enough money for basic expenses	103 (48)
Refused to answer	1 (0)

Note. SD = standard deviation.

^a Unless otherwise specified.


PHQ‐9 Scores, Scale Comprehension, and Follow‐Up

For the 215 participants, the overall mean PHQ‐9 score was 6.29 (SD 5.47). A total of 65 (30.2%) participants had a PHQ‐2 score equal to or greater than 3. A total of 55 (25.6%) participants had a PHQ‐9 score greater than 9: 35 (63.6%) participants had a PHQ‐9 score between 10 and 14, 13 (23.6%) between 15 and 19, and seven (12.7%) of 20 or greater, corresponding to moderate, moderately severe, and severe depression, respectively. Of the 55 participants with a PHQ‐9 score greater than 9, eight (14.5%) had a previous diagnosis of depression and were already receiving treatment at the local clinic. The remaining participants were referred to their local clinic for follow‐up, 35 (63.6%) of whom attended the appointment; 28 of the 35 (80.0%) received a confirmatory diagnosis of depression by the local physician. When the local psychiatrist conducted an additional evaluation in a random subset of seven (13%) participants with a PHQ‐9 score greater than 9, she confirmed the depression diagnosis in all of them. Table 2 shows the distribution of responses to each of the PHQ‐9 items and the proportion of participants who did not understand each of the items. Feeling tired or having little energy and feeling down, depressed, or hopeless were the most common symptoms, reported by 68% and 56% of participants, respectively. Furthermore, 26% of the study population reported thoughts that they would be better off dead or of hurting themselves in some way in the previous 2 weeks, and 9% reported having these thoughts on more than half the days. Six percent of participants reported not understanding item 1 (“Having little interest or pleasure in doing things”) and/or item 8 (“Moving or speaking so slowly that other people could have noticed, or feeling so fidgety or restless that you have been moving around a lot more than usual”).
Inability to read or write was positively and statistically significantly associated with not understanding item 1 (p‐value = 0.03) and borderline positively associated with not understanding item number 8 (p‐value = 0.05).

Table 2

PHQ‐9 Questions and Answer Frequency (N = 215)

	Not at all	Several days	More than half the days	Nearly every day	Did not understand
Questions	%	%	%	%	%
1. Little interest or pleasure in doing things	51.6	23.7	6.0	12.6	6.0
2. Feeling down, depressed, or hopeless	44.2	32.1	5.6	17.7	0.5
3. Trouble falling or staying asleep, or sleeping too much	60.0	28.8	1.4	9.3	0.5
4. Feeling tired or having little energy	31.2	38.1	9.3	20.5	0.9
5. Poor appetite or overeating	57.7	26.5	2.8	13.0	0.0
6. Feeling bad about yourself, or that you are a failure or have let yourself or your family down	55.3	28.8	5.1	8.8	1.9
7. Trouble concentrating on things, such as watching television	61.4	22.3	4.7	8.4	3.3
8. Moving or speaking so slowly that other people could have noticed or being so fidgety or restless that you have been moving around a lot more than usual	63.7	20.0	2.8	7.0	6.5
9. Thoughts that you would be better off dead or of hurting yourself in some way	74.0	16.3	2.8	6.5	0.5


Confirmatory Factor Analyses

Results of confirmatory factor analyses are shown in Table 3. For the one‐factor solution, the comparative fit index indicated acceptable goodness of fit, with a value of 0.91; however, the value for the Bentler‐Bonett non‐normed index was just below the threshold for acceptability (0.88). All factor loadings were statistically significant at a level of < 0.001, and all but one factor loading (item 1, “Having little interest or pleasure in doing things”) indicated moderate (> 0.40) association with the factor. With regard to the two‐factor solutions, results for each were nearly identical to the one‐factor solution, and the factors within the two‐factor solutions were highly correlated (0.92) for both models. Given that neither of the two‐factor solutions appeared to improve fit relative to the single‐factor solution, all subsequent analyses were conducted based on the single‐factor solution.

Table 3

Confirmatory Factor Analysis of the PHQ‐9 Among Adults in Rural Mexico: Standardized Factor Loadings and Goodness of Fit Statistics

	Model I	Model II		Model III: Two‐factor solution identified by Familiar et al.
Item	Depression	Affect	Somatic	F1	F2
1. Little interest or pleasure in doing things	36	39		39
2. Feeling down, depressed, or hopeless	82	83		83
3. Trouble falling or staying asleep, or sleeping too much	61		62	59
4. Feeling tired or having little energy	61		63	62
5. Poor appetite or overeating	73		73	73
6. Feeling bad about yourself, or that you are a failure or have let yourself or your family down	54	55			58
7. Trouble concentrating on things, such as watching television	74	73			77
8. Moving or speaking so slowly that other people could have noticed or being so fidgety or restless that you have been moving around a lot more than usual	44		47		48
9. Thoughts that you would be better off dead or of hurting yourself in some way	78	79			82
Correlation between factors	–	0.92		0.92
NNFI	0.88	0.88		0.88
CFI	0.91	0.91		0.92
RMSEA	0.09	0.09		0.09

Note. NNFI = Bentler‐Bonett non‐normed fit index; CFI = comparative fit index; RMSEA = root mean square error of approximation.
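As a bookkeeping aid, the fit statistics in Table 3 can be compared against the thresholds stated in the Data Analysis section (CFI and NNFI > 0.90; RMSEA < 0.08). The snippet below is only a sketch of that comparison using the tabulated values; the names `FIT`, `meets_threshold`, and `summary` are hypothetical, and it makes no claim about how the authors weighed the indices.

```python
# Goodness-of-fit statistics copied from Table 3.
FIT = {
    "Model I": {"NNFI": 0.88, "CFI": 0.91, "RMSEA": 0.09},
    "Model II": {"NNFI": 0.88, "CFI": 0.91, "RMSEA": 0.09},
    "Model III": {"NNFI": 0.88, "CFI": 0.92, "RMSEA": 0.09},
}


def meets_threshold(index: str, value: float) -> bool:
    """Apply the thresholds stated in the Data Analysis section."""
    if index in ("CFI", "NNFI"):
        return value > 0.90
    if index == "RMSEA":
        return value < 0.08
    raise ValueError(f"unknown fit index: {index}")


# For each model, record which indices satisfy their threshold.
summary = {
    model: {index: meets_threshold(index, value) for index, value in stats.items()}
    for model, stats in FIT.items()
}
```

Laying the three models side by side this way makes it apparent that they are indistinguishable on these thresholds, which is consistent with retaining the simpler one‐factor solution.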

The two items with the highest factor loadings in the confirmatory factor analysis were item 2 (“Feeling down, depressed, or hopeless”) and item 9 (“Having thoughts that you would be better off dead or of hurting yourself in some way”). The two items with the lowest factor loadings were item 1 (“Having little interest or pleasure in doing things”) and item 8 (“Moving or speaking so slowly that other people could have noticed, or feeling so fidgety or restless that you have been moving around a lot more than usual”).

Reliability

The internal consistency of the PHQ‐9 was good for both the overall PHQ‐9 scores and each of the subgroups evaluated. The Cronbach's alpha coefficient was 0.81 for the overall PHQ‐9; 0.85 for men and 0.80 for women; 0.81 for literate participants and 0.83 for illiterate participants; 0.80 for participants younger than 60 years of age and 0.90 for participants 60 years of age and older.
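For readers unfamiliar with the coefficient, Cronbach's alpha for a k‐item scale is k/(k − 1) × (1 − sum of item variances / variance of the total score). The sketch below implements that textbook formula with the standard library only; `cronbach_alpha` is a hypothetical name, the input rows in the usage note are made‐up responses rather than study data, and respondents with missing items are assumed to have been handled beforehand.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; each row holds one respondent's item scores."""
    if len(rows) < 2:
        raise ValueError("at least two respondents are required")
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("every respondent needs the same k >= 2 items")
    # Sample variance of each item across respondents.
    item_variances = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(r) for r in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

With the made‐up rows `[[0, 1, 1], [1, 1, 2], [2, 3, 3], [3, 3, 2]]`, the item variances sum to 11/3 and the total‐score variance is 9, so alpha is 1.5 × (1 − 11/27) = 8/9, about 0.89.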

Predictive Validity

Table 4 shows the association between the PHQ‐9 scores and the WHOQOL‐BREF scores. The median WHOQOL‐BREF scores for the overall scale and each of the domains were statistically significantly lower for the participants with a PHQ‐9 score greater than 9, except for the social relationships domain, which was borderline significant (p = 0.05). PHQ‐2 and PHQ‐9 scores were positively correlated, with a Pearson correlation coefficient of 0.75.

Table 4

Association Between WHOQOL‐BREF Domains and PHQ‐9 Scores

	PHQ‐9≤9	PHQ‐9>9
	N, median (25th, 75th)	N, median (25th, 75th)	p‐value^a
WHOQOL‐Physical	152, 16.00 (14.29, 17.14)	51, 12.57 (10.86, 14.87)	<0.0001
WHOQOL‐Psychological	151, 16.00 (14.00, 17.33)	51, 12.80 (11.20, 15.33)	<0.0001
WHOQOL‐Social Relationships	156, 16.00 (14.67, 17.33)	54, 16.00 (13.33, 16.00)	0.05
WHOQOL‐Environment	147, 14.00 (12.50, 15.50)	51, 13.00 (11.50, 14.29)	0.0007
Total WHOQOL Score	141, 61.24 (57.62, 64.67)	48, 53.77 (46.99, 59.29)	<0.0001

Note. WHOQOL = World Health Organization Quality of Life BREF Scale; PHQ = Patient Health Questionnaire.

^a Wilcoxon rank sum p‐value.
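The comparison in Table 4 relies on the Wilcoxon rank‐sum test. A minimal standard‐library sketch of that test is given below, using midranks for ties and a normal approximation with tie correction but no continuity correction. These are assumptions: the text does not describe the exact variant applied by the study's software, `rank_sum_test` is a hypothetical name, and individual‐level study data are not available, so any input is illustrative.

```python
import math
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (W, p): rank sum of x and a two-sided normal-approximation p-value."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    w = 0.0          # sum of the (mid)ranks belonging to group x
    tie_term = 0.0   # sum of t^3 - t over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        w += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:
        return w, 1.0  # all observations tied: no evidence of a difference
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return w, p
```

For very small groups an exact test would usually be preferred over the normal approximation used here.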

The univariable regression analysis (Table 5) showed that having a partner was significantly associated with a 64% reduction in the odds of having a PHQ‐9 score greater than 9 (p‐value = 0.003). No other statistically significant associations between sociodemographic characteristics and a PHQ‐9 score greater than 9 were found. However, although not statistically significant, not having enough money for basic expenses was associated with an increase in the odds of having a PHQ‐9 score greater than 9, while having children as well as access to services and social programs were associated with a reduction in the odds of having a PHQ‐9 score greater than 9.

Table 5

Association Between Sociodemographic Characteristics and PHQ‐9 Score > 9

Population (N = 215)	PHQ‐9 Score^a
Variable	OR	95% CI	p‐value
Age (per increase in 10 years)	1.01	[0.83, 1.23]	0.92
Female gender	1.14	[0.58, 2.26]	0.70
Has a partner	0.36	[0.17, 0.70]	0.003
Has children	0.54	[0.24, 1.22]	0.14
Literate	1.54	[0.69, 3.43]	0.29
Has access to electricity, water and sanitation	0.95	[0.41, 2.18]	0.90
Not enough money for food/clothes	1.74	[0.94, 3.24]	0.08
Social programs	0.72	[0.26, 2.00]	0.53

Note. PHQ = Patient Health Questionnaire; OR = odds ratio; CI = confidence interval.

^a Univariable regression analysis.
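For a single binary predictor, the odds ratio from a univariable logistic regression equals the cross‐product ratio of the corresponding 2 × 2 table, and the usual Wald interval can be computed from the four cell counts. The sketch below illustrates this, together with the conversion from an odds ratio to a percent change in odds (an odds ratio of 0.36 corresponds to a 64% reduction). The function names are hypothetical, and because Table 5 does not report cell counts, any counts passed in are illustrative rather than study data.

```python
import math
from typing import Tuple


def odds_ratio_wald(a: int, b: int, c: int, d: int,
                    z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Wald confidence interval from a 2 x 2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def percent_change_in_odds(odds_ratio: float) -> float:
    """Express an odds ratio as a percent change in odds (negative = reduction)."""
    return (odds_ratio - 1) * 100
```

The default `z = 1.96` gives an approximate 95% interval; the interval is symmetric on the log‐odds scale, which is why the bounds in Table 5 are not equidistant from the odds ratio.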


Sensitivity and Specificity of the PHQ‐2

Table 6 shows the sensitivity, specificity, and positive and negative likelihood ratios of the PHQ‐2 for depression screening. Using a cutoff of 3 or greater, the PHQ‐2 had a sensitivity of 80.0%; specificity of 86.9%; and positive and negative likelihood ratios of 6.10 and 0.23, respectively. In ROC analysis (Figure 1), the PHQ‐2 performed well in detecting participants with a PHQ‐9 diagnosis of depression (area under the ROC curve = 0.89; 95% confidence interval [0.84, 0.94]). Given that with a score of 3 there was a balance between sensitivity and specificity, a PHQ‐2 score of 3 was identified as the optimal cutoff for screening of depression. Using the sensitivity results of the PHQ‐2, at a cutoff of 3, the estimated prevalence of depression was 20% for this study population.

Table 6

Sensitivity and Specificity of the PHQ‐2 for Screening of Depression

PHQ‐2 score	Sensitivity (%)	Specificity (%)	LR+	LR−
≥1	98.2 (54/55)	45.0 (72/160)	1.79	0.04
≥2	89.1 (49/55)	71.3 (114/160)	3.10	0.15
≥3	80.0 (44/55)	86.9 (139/160)	6.10	0.23
≥4	45.5 (25/55)	95.6 (153/160)	10.39	0.57
≥5	27.3 (15/55)	98.1 (157/160)	14.55	0.74
6	20.0 (11/55)	98.8 (158/160)	16.00	0.81

Note. PHQ = Patient Health Questionnaire; LR+ = positive likelihood ratio; LR− = negative likelihood ratio.
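The entries in Table 6 follow directly from the counts shown in parentheses. As a worked check, the sketch below recomputes sensitivity, specificity, and both likelihood ratios from those counts, using LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The names `TABLE6` and `diagnostics` are hypothetical.

```python
import math
from typing import Tuple

N_CASES = 55      # participants with a PHQ-9 score > 9
N_NONCASES = 160  # participants with a PHQ-9 score <= 9

# PHQ-2 cutoff -> (true positives, true negatives), copied from Table 6.
TABLE6 = {1: (54, 72), 2: (49, 114), 3: (44, 139),
          4: (25, 153), 5: (15, 157), 6: (11, 158)}


def diagnostics(tp: int, tn: int) -> Tuple[float, float, float, float]:
    """Return (sensitivity, specificity, LR+, LR-) for one cutoff."""
    sensitivity = tp / N_CASES
    specificity = tn / N_NONCASES
    # LR+ is undefined when specificity is 1; report infinity in that case.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    lr_neg = (1 - sensitivity) / specificity
    return sensitivity, specificity, lr_pos, lr_neg


results = {cutoff: diagnostics(tp, tn) for cutoff, (tp, tn) in TABLE6.items()}
```

At a cutoff of 3, for example, 44/55 gives a sensitivity of 80.0% and 139/160 a specificity of 86.9%, so LR+ = 0.800 / 0.131, about 6.10, and LR− = 0.200 / 0.869, about 0.23.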

Figure 1

The receiver operating characteristic curve of the PHQ‐2 for screening of depression.
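The ROC curve can be approximated from the operating points in Table 6. The sketch below joins those points, plus the two trivial end points, and integrates with the trapezoidal rule; it also picks the cutoff that maximizes Youden's J (sensitivity + specificity − 1). Youden's J is used here as one common way to formalize a "balance between sensitivity and specificity" and is an assumption, not necessarily the criterion the authors applied; all names are hypothetical.

```python
# PHQ-2 cutoff -> (true positives among 55 cases, true negatives among 160 non-cases),
# copied from Table 6.
TABLE6 = {1: (54, 72), 2: (49, 114), 3: (44, 139),
          4: (25, 153), 5: (15, 157), 6: (11, 158)}
N_CASES, N_NONCASES = 55, 160


def roc_points(table=TABLE6):
    """Return (1 - specificity, sensitivity) pairs sorted along the x-axis."""
    # End points: nobody screens positive, and everybody screens positive.
    points = [(0.0, 0.0), (1.0, 1.0)]
    for tp, tn in table.values():
        points.append((1 - tn / N_NONCASES, tp / N_CASES))
    return sorted(points)


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the sorted points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def youden_cutoff(table=TABLE6):
    """Cutoff with the largest sensitivity + specificity - 1."""
    return max(table, key=lambda c: table[c][0] / N_CASES + table[c][1] / N_NONCASES - 1)


auc = trapezoid_auc(roc_points())
best_cutoff = youden_cutoff()
```

With the Table 6 counts, the trapezoidal area comes to about 0.89 and the Youden criterion selects a cutoff of 3, in line with the values reported in the text.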


Discussion

This study is the first report of the validity and reliability of the PHQ‐2 and PHQ‐9 as screening and diagnostic instruments for depression in rural Spanish‐speaking populations in Mexico and elsewhere. Familiar and colleagues (2015) previously validated the PHQ‐9 among educated women teachers in Mexico; we expand on their work to fill a knowledge gap regarding the generalizability of their findings and offer strong evidence for the validity and reliability of the PHQ‐9 among men and less educated adults in rural Mexico. Specifically, our data support the notion of a one‐factor structure for the PHQ‐9 in this population. The internal consistency was adequate for the overall PHQ‐9 and by subgroups. Predictive validity was supported by the statistically significant inverse association between PHQ‐9 scores and the overall WHOQOL‐BREF scores, and between the PHQ‐9 scores and the scores of each of the physical, psychological, and environment domains. These associations have been previously observed in primary care settings (Andriopoulos et al., 2013; Brenes, 2007; Papakostas et al., 2004; Pyne et al., 1997). Predictive validity was also suggested by a positive association between a PHQ‐9 score greater than 9 and a low SES (not having enough money for basic expenses), although this relationship was not statistically significant (p = 0.08). In this study, the mean PHQ‐9 score was 6.29 and the prevalence of a PHQ‐9 diagnosis of depression among the participants was 25.6%. The Mexican teachers’ cohort study, which used the PHQ‐9 to assess female teachers for depression, reported a mean PHQ‐9 score of 4.5 and a prevalence of a PHQ‐9 diagnosis of 12.6% (Familiar et al., 2015). This difference in depression prevalence may be explained by several factors. First, clinic physicians confirmed the PHQ‐9 depression diagnosis in only 80% of people with a score greater than 9 who attended a follow‐up visit.
This underscores the critical role of clinical evaluation in conjunction with deployment of the PHQ‐9. Second, participants in the teachers’ cohort study differed from participants in the present study in that they were highly educated (73% with a university degree and 16% with a postgraduate degree) and tended to be well off (45% reported a medium SES and 20% high SES), suggesting the possibility of better access to mental health care. Of participants in the present study, 21% were unable to read and write and 48% reported a low SES. The lack of understanding of some of the items asked in the PHQ‐9 was significantly associated with inability to read and write. Also, the data collectors reported that some participants did not understand some of the words (e.g., pleasure) used in the questionnaire. Interestingly, the lowest factor loadings were observed for the two scale items (1 and 8) with the highest percentage of individuals who did not understand the items. The influence of low literacy on the accuracy of depression screening instruments has been documented elsewhere (Dickens Akena et al., 2012) and should be taken into account through careful piloting and adaptation when using them in settings like ours where illiteracy rates are high. Previous studies have highlighted the use of the PHQ‐9 in primary care to identify individuals at increased risk of suicide attempt or death (Simon et al., 2013; Uebelacker, German, Gaudiano, & Miller, 2011). In our study, thoughts of being better off dead or hurting oneself (item 9) were present in 26% of the participants, and those thoughts were present more than half the days in 9% of the participants. This item had the second highest factor loading in our confirmatory factor analysis.
Although the PHQ‐8, which excludes the self‐harm item, has been recommended for community surveys and has been successfully used in primary care settings, we chose not to exclude this item given the high prevalence of thoughts of self‐harm and the item's high factor loading in the confirmatory factor analysis. Further studies should be conducted to assess the presence, frequency, and intensity of these thoughts, as well as their causes and potential implications, in order to develop strategies aimed at preventing suicide in these settings.

Brief tools are needed to effectively screen and diagnose patients for depression in primary care settings where demand for services is high and health professionals, including highly trained mental health professionals, are scarce (Kroenke, 1997; Williams, 1998). In this study, the PHQ‐2 and PHQ‐9 questionnaires were implemented in the context of an active case‐finding activity for depression. Both instruments have shown good case‐finding properties compared to patient‐rated major depression in general medical practice (Olssøn, Mykletun, & Dahl, 2005). Once the psychometric properties of the PHQ‐9 were assessed and its validity for this population was established, we compared the PHQ‐2 with the PHQ‐9 to determine its validity for screening of depression. Our study provides strong evidence to support the use of the PHQ‐2 to identify individuals who may be at high risk for depression. The PHQ‐2 showed high sensitivity and specificity as a screening instrument for a diagnosis of major depression when compared with the PHQ‐9. A cutoff of 3 or greater was optimal for the study population, jointly maximizing sensitivity (80.0%) and specificity (86.9%).
However, selecting an optimal PHQ‐2 cutoff should be done cautiously: favoring high sensitivity over specificity would increase false positive screens, which in turn could increase the burden on clinicians and/or the cost to the patient, while favoring high specificity over sensitivity would contribute to missed depression diagnoses. The sensitivity and specificity of the PHQ‐2 in this study population were similar to those reported in previous studies of primary care populations (Arroll et al., 2010; Kroenke et al., 2003; Richardson et al., 2010; Zuithoff et al., 2010). The positive likelihood ratio increased as the cutoff increased and was 6.10 for a cutoff of 3 or greater, which means that a PHQ‐2 score of 3 or greater is 6.10 times more likely in an individual with a PHQ‐9 score greater than 9 than in an individual with a PHQ‐9 score of 9 or less. The positive and negative likelihood ratios combine sensitivity and specificity and are useful for clinicians, especially in settings like ours where the true prevalence of depression is unknown (Arroll et al., 2010).

The PHQ‐2 showed good psychometric properties for community‐based screening of depression, is brief, and assesses core symptoms that are easy for care providers to remember; therefore, we recommend the use of the PHQ‐2 as a screening tool in rural Spanish‐speaking settings to improve early detection of depression. Given its good reliability and predictive validity, we recommend using a contextually adapted version of the PHQ‐9 for further assessment of depression in patients with a PHQ‐2 score of 3 or greater, followed by a further diagnostic work‐up for patients with a PHQ‐9 score greater than 9.
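
The likelihood ratios discussed above follow directly from sensitivity and specificity. The sketch below is illustrative only and is not the analysis code used in this study. It recomputes the positive and negative likelihood ratios for a PHQ‐2 cutoff of 3 or greater from the reported sensitivity (80.00%) and specificity (86.88%), and then converts an assumed pre‐test probability into a post‐test probability. The function names, and the choice of 25.6% (the prevalence of a PHQ‐9 diagnosis in this sample) as the pre‐test probability, are assumptions made for the example.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomized screening test.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1)")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio via odds."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre_test_probability must be strictly between 0 and 1")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Values reported for a PHQ-2 cutoff of 3 or greater (PHQ-9 > 9 as reference).
SENSITIVITY = 0.80
SPECIFICITY = 0.8688

lr_pos, lr_neg = likelihood_ratios(SENSITIVITY, SPECIFICITY)

# Assumption for illustration: use the sample prevalence of a PHQ-9
# diagnosis (25.6%) as the pre-test probability.
ASSUMED_PRE_TEST = 0.256
p_after_positive = post_test_probability(ASSUMED_PRE_TEST, lr_pos)
p_after_negative = post_test_probability(ASSUMED_PRE_TEST, lr_neg)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"Post-test probability after a positive screen: {p_after_positive:.3f}")
print(f"Post-test probability after a negative screen: {p_after_negative:.3f}")
```

With these inputs the positive likelihood ratio is approximately 6.10, consistent with the value reported above. Passing a different assumed pre‐test probability to `post_test_probability` shows how the same screening result would be read in a setting with a different baseline risk.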

Limitations

Our study has several limitations. First, we conducted the study in only one rural community; however, we believe that our results are likely generalizable to other rural, low literacy Spanish‐speaking communities in Mexico. These findings are also likely generalizable to other populations with similar sociodemographic characteristics (i.e., rural, low literacy) in Latin American countries; however, it is unknown how these findings would apply to non‐Spanish‐speaking populations, including those who might speak an indigenous dialect as their first or primary language. Second, we used the PHQ‐9 as a gold standard to assess the sensitivity and specificity of the PHQ‐2 as a screening instrument for depression. Therefore, if the sensitivity or the specificity of the PHQ‐9 is suboptimal, then our estimates of the sensitivity and specificity of the PHQ‐2 may be biased, and a study using a diagnostic tool other than the PHQ‐9 as the reference would be recommended. Given that some of the participants did not understand some of the items in the PHQ‐9, we would recommend adapting it to the local context and the literacy level of the population.

Conclusion

Given the large health and social burden of depression and considering the need for brief, structured, reliable, and valid tools to aid primary care providers in assessing patients for depression, both the PHQ‐2 and PHQ‐9 are useful and valuable instruments for screening and diagnosis of depression in rural, Spanish‐speaking populations. However, to reduce the global burden of mental health disorders and improve the QOL of patients, the use of these tools should be coupled with expanded access to appropriate mental health care, including both pharmacological and nonpharmacological interventions that have proven useful in these settings.

51 in total

1. Evaluation of the PHQ-2 as a brief screen for detecting major depression among adolescents.

Authors: Laura P Richardson; Carol Rockhill; Joan E Russo; David C Grossman; Julie Richards; Carolyn McCarty; Elizabeth McCauley; Wayne Katon
Journal: Pediatrics Date: 2010-04-05 Impact factor: 7.124

2. The PHQ-9: validity of a brief depression severity measure.

Authors: K Kroenke; R L Spitzer; J B Williams
Journal: J Gen Intern Med Date: 2001-09 Impact factor: 5.128

3. Validation and utility of a self-report version of PRIME-MD: the PHQ primary care study. Primary Care Evaluation of Mental Disorders. Patient Health Questionnaire.

Authors: R L Spitzer; K Kroenke; J B Williams
Journal: JAMA Date: 1999-11-10 Impact factor: 56.272

4. Prevalence and correlates of depressive disorders in outpatients with breast cancer in Lagos, Nigeria.

Authors: Abiodun O Popoola; Abiodun O Adewuya
Journal: Psychooncology Date: 2011-04-03 Impact factor: 3.894

5. Help-seeking behaviour in patients with anxiety disorder and depression.

Authors: A Roness; A Mykletun; A A Dahl
Journal: Acta Psychiatr Scand Date: 2005-01 Impact factor: 6.392

6. Barriers to seeking help and treatment suggestions for prenatal depressive symptoms: focus groups with rural low-income women.

Authors: D Elizabeth Jesse; Christyn L Dolbier; Amy Blanchard
Journal: Issues Ment Health Nurs Date: 2008 Impact factor: 1.835

7. Anxiety, depression, and quality of life in primary care patients.

Authors: Gretchen A Brenes
Journal: Prim Care Companion J Clin Psychiatry Date: 2007

8. Community perceptions of mental distress in a post-conflict setting: a qualitative study in Burundi.

Authors: Itziar Familiar; Sonali Sharma; Herman Ndayisaba; Norbert Munyentwari; Seleus Sibomana; Judith K Bass
Journal: Glob Public Health Date: 2013-08-13

9. Mental health response in Haiti in the aftermath of the 2010 earthquake: a case study for building long-term solutions.

Authors: Giuseppe Raviola; Eddy Eustache; Catherine Oswald; Gary S Belkin
Journal: Harv Rev Psychiatry Date: 2012 Jan-Feb Impact factor: 3.732

10. The Patient Health Questionnaire-9 for detection of major depressive disorder in primary care: consequences of current thresholds in a crosssectional study.

Authors: Nicolaas P A Zuithoff; Yvonne Vergouwe; Michael King; Irwin Nazareth; Manja J van Wezep; Karel G M Moons; Mirjam I Geerlings
Journal: BMC Fam Pract Date: 2010-12-13 Impact factor: 2.497
