
Case finding and screening clinical utility of the Patient Health Questionnaire (PHQ-9 and PHQ-2) for depression in primary care: a diagnostic meta-analysis of 40 studies.

Alex J Mitchell¹, Motahare Yadegarfar², John Gill², Brendon Stubbs³.

Abstract

BACKGROUND: The Patient Health Questionnaire (PHQ) is the most commonly used measure to screen for depression in primary care, but there is still a lack of clarity about its accuracy and optimal scoring method. AIMS: To determine via meta-analysis the diagnostic accuracy of the PHQ-9-linear, PHQ-9-algorithm and PHQ-2 questions to detect major depressive disorder (MDD) among adults.
METHOD: We systematically searched major electronic databases from inception until June 2015. Articles were included that reported the accuracy of PHQ-9 or PHQ-2 questions for diagnosing MDD in primary care, with MDD defined according to standard classification systems. We carried out meta-analysis, meta-regression, moderator and sensitivity analyses.
RESULTS: Overall, 26 publications reporting on 40 individual studies were included representing 26 902 people (median 502, s.d.=693.7) including 14 760 unique adults of whom 14.3% had MDD. The methodological quality of the included articles was acceptable. The meta-analytic area under the receiver operating characteristic curve of the PHQ-9-linear and the PHQ-2 was significantly higher than the PHQ-9-algorithm, a difference that was maintained in head-to-head meta-analysis of studies. Our best estimates of sensitivity and specificity were 81.3% (95% CI 71.6-89.3) and 85.3% (95% CI 81.0-89.1), 56.8% (95% CI 41.2-71.8) and 93.3% (95% CI 87.5-97.3) and 89.3% (95% CI 81.5-95.1) and 75.9% (95% CI 70.1-81.3) for the PHQ-9-linear, PHQ-9-algorithm and PHQ-2 respectively. For case finding (ruling in a diagnosis), none of the methods were suitable but for screening (ruling out non-cases), all methods were encouraging with good clinical utility, although the cut-off threshold must be carefully chosen.
CONCLUSIONS: The PHQ can be used as an initial, first-step assessment in primary care, and the PHQ-2 is adequate for this purpose with good acceptability. However, neither the PHQ-2 nor the PHQ-9 can be used to confirm a clinical diagnosis (case finding). DECLARATION OF INTEREST: None. COPYRIGHT AND USAGE: © The Royal College of Psychiatrists 2016. This is an open access article distributed under the terms of the Creative Commons Non-Commercial, No Derivatives (CC BY-NC-ND) licence.


Year: 2016 PMID: 27703765 PMCID: PMC4995584 DOI: 10.1192/bjpo.bp.115.001685

Source DB: PubMed Journal: BJPsych Open ISSN: 2056-4724

Major depressive disorder (MDD) is a serious, disabling condition that is often comorbid with other medical presentations.[1-4] Most care for depression is delivered by general practitioners (GPs), and many GPs individually have considerable experience in managing depression.[5] Approximately 7% of all consultations in primary care are for depression.[6] Yet clinicians find it challenging to diagnose depression precisely and often overestimate or underestimate their patients' levels of distress, sometimes resulting in false-positive or false-negative diagnoses.[7] Indeed, GPs typically detect about half of true cases of depression on a one-off visit,[1] and once diagnosed, not all patients with depression receive adequate, timely care.[8] Although under-detection can lead to inadequate treatment,[9] over-detection (misidentification) can lead to inappropriate treatment.[9,10] For example, in the Baltimore Epidemiologic Catchment Area Study, 38% of antidepressant users had never met the criteria for MDD, obsessive–compulsive disorder, panic disorder, social phobia or generalised anxiety disorder in their lifetime.[10] Mitchell et al[1] suggested that this could become a particular problem in routine care, where prevalence rates are modest and false positives can therefore outnumber false negatives. Given that many clinicians have highlighted the difficulties in the timely diagnosis of depression[11] and that depression care is often inadequate,[12-14] some have suggested that using screening tools in routine care could be beneficial by enhancing diagnosis-as-usual.
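The point about modest prevalence is simple arithmetic. At prevalence p, a test with sensitivity Se and specificity Sp produces false positives in proportion to (1 − Sp)(1 − p) and false negatives in proportion to (1 − Se)p, so false positives outnumber false negatives whenever p < (1 − Sp)/((1 − Sp) + (1 − Se)). The minimal Python sketch below makes this concrete; the accuracy values (Se 0.50, loosely echoing the "about half" detection rate above, and Sp 0.80) are illustrative assumptions, not estimates from this review, and the function names are hypothetical.

```python
def expected_errors(n: int, prevalence: float, sensitivity: float, specificity: float):
    """Expected (false positives, false negatives) among n patients."""
    cases = n * prevalence
    non_cases = n - cases
    false_negatives = cases * (1 - sensitivity)
    false_positives = non_cases * (1 - specificity)
    return false_positives, false_negatives


def fp_fn_crossover(sensitivity: float, specificity: float) -> float:
    """Prevalence below which false positives outnumber false negatives."""
    fp_rate, fn_rate = 1 - specificity, 1 - sensitivity
    return fp_rate / (fp_rate + fn_rate)


# Illustrative (assumed) accuracy of unaided recognition: Se 0.50, Sp 0.80.
fp, fn = expected_errors(1000, prevalence=0.10, sensitivity=0.50, specificity=0.80)
print(f"FP={fp:.0f}, FN={fn:.0f}")  # FP=180, FN=50
print(f"crossover prevalence={fp_fn_crossover(0.50, 0.80):.3f}")  # 0.286
```

Under these assumed values, false positives dominate at any prevalence below about 29%, which covers the 7–15% range typical of primary care samples.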
Screening is most usefully defined as the systematic application of a test to rule out those without a condition, and case finding as the systematic application of a test to confirm those with a condition.[15] Both have been proposed as solutions and were adopted into the UK primary care Quality and Outcomes Framework (QoF).[16] Short screening questionnaires (<5 min) and ultra-short questionnaires (<2 min) may improve the recognition of depression if such tests are accurate, acceptable and implemented.[17,18] Of all the possible tools for depression, the depression module of the Patient Health Questionnaire (PHQ-9) is currently the most popular, and it has three main formats. The PHQ-9-linear is scored by simple addition; at a threshold of 10 or higher, it had a sensitivity of 88% and a specificity of 88% for detecting MDD in the initial validation study.[19] The PHQ-9-algorithm is scored using the DSM-IV algorithm for MDD: it requires at least five symptoms rated at least 2 ('more than half the days'; >0 for the suicidal ideation item), at least one of which is loss of interest or pleasure or depressed mood, all present for 2 weeks or more and associated with distress or dysfunction. Because this method follows the DSM-IV rules more precisely, it is anticipated to be the most accurate. The PHQ-2 is the 2-item version, using only the first two questions (loss of interest and low mood over the past 2 weeks), scored linearly with a threshold of 2 or higher.[20] An adaptation of the PHQ-2, whose main modification is that questioning covers the past month rather than 2 weeks, is known as the Whooley questions after the original author.
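The three scoring rules just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the standard PHQ-9 item order (item 1 little interest or pleasure, item 2 feeling down or depressed, item 9 thoughts of self-harm), each item scored 0–3, and hypothetical function names. The algorithm sketch checks only the item-count rules; the 2-week duration and the distress or dysfunction criterion cannot be derived from the item scores alone and are not modelled.

```python
from typing import Sequence

# Assumed standard PHQ-9 item order (0-based indices).
ANHEDONIA, DEPRESSED_MOOD, SELF_HARM = 0, 1, 8


def _validate(items: Sequence[int], n_items: int) -> None:
    """Reject vectors of the wrong length or with scores outside 0-3."""
    if len(items) != n_items or any(not 0 <= s <= 3 for s in items):
        raise ValueError(f"expected {n_items} item scores in the range 0-3")


def phq9_linear(items: Sequence[int], cutoff: int = 10) -> bool:
    """PHQ-9-linear: positive if the summed score (0-27) reaches the cut-off."""
    _validate(items, 9)
    return sum(items) >= cutoff


def phq9_algorithm(items: Sequence[int]) -> bool:
    """PHQ-9-algorithm (item rules only): >=5 symptoms endorsed, including
    anhedonia or depressed mood. A symptom counts if scored >=2 ('more than
    half the days'); the self-harm item counts if scored >0."""
    _validate(items, 9)
    endorsed = [
        score > 0 if i == SELF_HARM else score >= 2
        for i, score in enumerate(items)
    ]
    has_core = endorsed[ANHEDONIA] or endorsed[DEPRESSED_MOOD]
    return sum(endorsed) >= 5 and has_core


def phq2_linear(items: Sequence[int], cutoff: int = 2) -> bool:
    """PHQ-2: sum of the first two PHQ-9 items (0-6) against a cut-off."""
    _validate(items, 2)
    return sum(items) >= cutoff


responses = [1, 1, 1, 1, 1, 1, 1, 1, 1]
print(phq9_linear(responses), phq9_algorithm(responses), phq2_linear(responses[:2]))
# False False True
```

With every item scored 1, the formats disagree: the PHQ-2 at ≥2 is positive while both PHQ-9 methods are negative, one reason the three formats can differ in accuracy.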
Yet it is important to acknowledge that the value of screening and severity assessment has been disputed both in the literature and in clinical practice. Some authors have stated that routine use of depression tools should identify patients with either previously unrecognised MDD or untreated MDD (in effect, a demonstration of added value),[21] but policy recommendations and guidelines have been inconsistent. In 2009, the United States Preventive Services Task Force (USPSTF) recommended routine depression screening in primary care settings with follow-up.[22] This recommendation has recently been revised and extended.[23] In the UK, the national guidelines have reversed their advice,[24] and the most recent draft guidance states that there is little convincing evidence that depression screening will reduce the number of patients with depression or improve depression symptoms.[25] GPs in the UK have been less enthusiastic than patients about routine use of depression scales,[26] leading to the removal of depression screening incentives from the UK QoF.
In 2013, the Canadian Task Force on Preventive Health Care (CTFPHC) reconsidered its earlier guideline and also recommended against screening adults for depression in primary care settings.[27] Thus, although some still advocate screening for depression, others do not, and the argument has become polarised.[23] Few are arguing that screening might work in some circumstances or that further evidence from high-quality studies is required, leading observers to suggest that this is a form of confirmation bias, with each side defending an entrenched position.[28] Four previous meta-analyses have been conducted on the accuracy of the PHQ-9, but none specifically in primary care.[29-32] One previous meta-analysis has been conducted on the PHQ-2/Whooley questions, but it is considerably out of date.[17] Thus, our primary objective was to conduct a meta-analysis to determine the diagnostic accuracy of the PHQ-9-linear, PHQ-9-algorithm and PHQ-2 questions to detect MDD among adults.

Method

This systematic review was conducted following a predetermined but unpublished protocol.

Inclusion and exclusion criteria

We included studies that reported the accuracy of PHQ-9/PHQ-2 questions for diagnosing MDD in primary care. The setting had to be predominantly, though not necessarily exclusively, primary care (>50% primary care patients); we identified one study, reported in two publications, with mixed recruitment.[33,34] Studies focusing on a single medical condition in primary care were excluded.[35] Studies had to provide sufficient data for us to calculate contingency tables, or the data had to be supplied by the authors. We only included studies that defined MDD according to a standard classification system, such as the ICD or the DSM, using a standardised diagnostic interview schedule (Mini International Neuropsychiatric Interview (MINI), Structured Clinical Interview for DSM Disorders (SCID), Composite International Diagnostic Interview (CIDI), Diagnostic Interview Schedule (DIS) or Revised Clinical Interview Schedule (CIS-R)).

Information sources and searches

Two independent reviewers searched Embase, Web of Science, PsycINFO, CINAHL Plus and PubMed from 1998 until June 2015. We used the key words ‘PHQ’, ‘patient health questionnaire’, ‘screening’, ‘depression’, ‘MDD’, ‘primary care’ and ‘general practice’.

Data abstraction

We collected information about study characteristics and quality using a standardised data collection form. We included the following characteristics: setting, country, age of sample, gender of sample, year of study, sample size, masking of the assessor of the reference test, data integrity, cut-off score and translation of non-English versions of PHQ-9. When an article appeared to meet the criteria but did not contain sufficient data, we contacted the authors up to two times a month.

Study selection

After the removal of duplicates, two independent reviewers screened the titles and abstracts of all potentially eligible articles. Both reviewers applied the eligibility criteria, and a list of full-text articles was developed through consensus. The two reviewers then considered the full texts of these articles, and the final list of included articles was reached through consensus. A third reviewer was available for mediation throughout this process.

Methodological quality assessment

We used the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool to assess risk of bias in primary studies, and these factors were included as study-level variables in the analyses.[36] The QUADAS-2 guidance stipulates that the tool should be adapted for each specific review; we employed the adaptation used in a recent generic PHQ meta-analysis.[30] QUADAS-2 assesses risk of bias across four core domains: patient selection, the index test, the reference standard, and the flow and timing of assessments. Two reviewers independently assessed risk of bias, with any discrepancies resolved by consensus. The two reviewers also independently identified potential outliers whose study design might be qualitatively different.
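The double-rating step can be pictured as a small data structure: each reviewer records a judgement per QUADAS-2 domain, discrepancies are flagged for consensus, and a study counts as low risk overall only if every domain is rated low. This is a hypothetical sketch; the class, field and study names are illustrative, and the 'low'/'high'/'unclear' labels follow the usual QUADAS-2 judgement categories, which the adaptation used in this review may modify.

```python
from dataclasses import dataclass
from typing import Dict, List

DOMAINS = ("patient_selection", "index_test", "reference_standard", "flow_and_timing")
RATINGS = {"low", "high", "unclear"}


@dataclass
class Quadas2Rating:
    """One reviewer's risk-of-bias judgements for one study (hypothetical)."""
    study: str
    reviewer: str
    risk_of_bias: Dict[str, str]  # domain -> "low" | "high" | "unclear"

    def __post_init__(self) -> None:
        if set(self.risk_of_bias) != set(DOMAINS):
            raise ValueError("a rating is required for all four domains")
        if not set(self.risk_of_bias.values()) <= RATINGS:
            raise ValueError(f"ratings must be one of {sorted(RATINGS)}")

    def low_risk_all_domains(self) -> bool:
        return all(self.risk_of_bias[d] == "low" for d in DOMAINS)


def discrepancies(a: Quadas2Rating, b: Quadas2Rating) -> List[str]:
    """Domains on which two independent reviewers disagree (to resolve by consensus)."""
    if a.study != b.study:
        raise ValueError("ratings refer to different studies")
    return [d for d in DOMAINS if a.risk_of_bias[d] != b.risk_of_bias[d]]


r1 = Quadas2Rating("Study A", "reviewer 1", dict.fromkeys(DOMAINS, "low"))
r2 = Quadas2Rating("Study A", "reviewer 2",
                   {**dict.fromkeys(DOMAINS, "low"), "index_test": "unclear"})
print(discrepancies(r1, r2))      # ['index_test']
print(r1.low_risk_all_domains())  # True
```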

Meta-analysis and proposed subgroup analysis

A pooled meta-analysis of suitable studies was conducted to identify overall test accuracy, sensitivity, specificity, combined Youden score, positive and negative predictive values (PPV/NPV), positive and negative likelihood ratios (LR+/LR−) and positive and negative clinical utility indices (CUI+/CUI−). Further details are available at www.clinicalutility.co.uk. The CUI is a proxy for the applied value of a test, with a qualitative as well as a quantitative interpretation.[37-39] Clinical utility may be more important to clinicians than validity.[40] It estimates the clinical value of a diagnostic test, taking into account both the accuracy of the test and the occurrence of the condition. The positive utility index (for rule-in or case-finding accuracy) is the product of sensitivity and PPV, and the negative utility index (for rule-out or screening accuracy) is the product of specificity and NPV. The CUI is interpreted as follows: 0.93–1.00 near perfect, 0.81–0.92 excellent, 0.64–0.80 good, 0.49–0.63 fair, 0.36–0.48 poor and <0.36 very poor. Sensitivity and specificity are generally regarded as intrinsic characteristics of a test, independent of prevalence, and are a useful initial metric, but they do not reflect clinical practice or tell clinicians how to interpret a positive or negative test.[41] Summary measures of diagnostic accuracy typically use receiver operating characteristic (ROC) curve analysis, in which the sensitivity and specificity at every possible cut-off score are calculated and plotted.[42] For an individual study, an optimal cut-off score is chosen that balances sensitivity and specificity. The area under the ROC curve is a proportion with a confidence interval and can be combined across all qualifying studies. From the supplied data, we constructed 2 × 2 tables for each cut-off score and computed any missing values.
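All of the accuracy and utility measures defined above can be derived from a single 2 × 2 table. The Python sketch below is illustrative only: the counts are hypothetical, the band boundaries are those listed above, and values that fall in the small gaps between listed bands (for example 0.925) are assigned to the lower band, which is our assumption.

```python
from dataclasses import dataclass

# Lower bounds of the qualitative CUI bands listed above; values in the small
# gaps between listed bands fall into the lower band (an assumption).
CUI_BANDS = [(0.93, "near perfect"), (0.81, "excellent"), (0.64, "good"),
             (0.49, "fair"), (0.36, "poor")]


def cui_label(value: float) -> str:
    """Map a clinical utility index to its qualitative band."""
    for lower, label in CUI_BANDS:
        if value >= lower:
            return label
    return "very poor"


@dataclass
class TwoByTwo:
    tp: int  # index test positive, MDD on the reference interview
    fp: int  # index test positive, no MDD
    fn: int  # index test negative, MDD
    tn: int  # index test negative, no MDD

    def metrics(self) -> dict:
        se = self.tp / (self.tp + self.fn)
        sp = self.tn / (self.tn + self.fp)
        ppv = self.tp / (self.tp + self.fp)
        npv = self.tn / (self.tn + self.fn)
        return {
            "sensitivity": se,
            "specificity": sp,
            "PPV": ppv,
            "NPV": npv,
            "LR+": se / (1 - sp) if sp < 1 else float("inf"),
            "LR-": (1 - se) / sp,
            "Youden": se + sp - 1,
            "CUI+": se * ppv,  # rule-in (case finding)
            "CUI-": sp * npv,  # rule-out (screening)
        }


# Hypothetical counts: 1000 patients, 100 with MDD (10% prevalence).
m = TwoByTwo(tp=80, fp=120, fn=20, tn=780).metrics()
print(f"CUI+ = {m['CUI+']:.3f} ({cui_label(m['CUI+'])})")  # CUI+ = 0.320 (very poor)
print(f"CUI- = {m['CUI-']:.3f} ({cui_label(m['CUI-'])})")  # CUI- = 0.845 (excellent)
```

At this hypothetical prevalence, the same test rates very poor for ruling in but excellent for ruling out, because the many non-cases dilute PPV while keeping NPV high.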
For completeness, we also performed a bivariate meta-analysis to obtain pooled estimates of sensitivity and specificity and their associated 95% confidence intervals (CIs). We constructed summary ROC curves using the bivariate model to produce a 95% confidence ellipse within the ROC space, in which each data point represents a separate study. We also constructed a Bayesian plot of conditional probabilities, which shows PPVs and NPVs across every possible prevalence. We assessed between-study heterogeneity using the I² statistic,[43] which describes the percentage of total variation across studies that is due to heterogeneity rather than chance. By convention, we considered I² values of 25%, 50% and 75% to indicate low, moderate and high heterogeneity, respectively. Where between-study heterogeneity was significant, we explored its causes. Publication bias was assessed using the Harbord or Egger methods.[44] For a secondary moderator analysis, we performed sub-analyses in clinically relevant subgroups, such as studies with a head-to-head comparison of tools in the same sample. We also attempted a logistic meta-regression analysis of diagnostic accuracy, using the 50th percentile of the Youden index (sensitivity + specificity − 1) and covariates in the meta-regression model.[45] We investigated heterogeneity arising from sample characteristics or study design by exploring the effects of potential predictive variables.
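To make the heterogeneity step concrete, the sketch below pools per-study proportions (for example, sensitivities) using a DerSimonian–Laird random-effects model on the logit scale and reports I² = (Q − df)/Q, computed from Cochran's Q. This generic technique is substituted here for illustration. It is not necessarily the model or software used in this review, and the bivariate model described above, which pools sensitivity and specificity jointly, is more involved. Function names and data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def _logit_with_variance(events: float, total: float) -> Tuple[float, float]:
    """Logit of a proportion and its approximate variance (0.5 continuity correction at 0 or n)."""
    if events == 0 or events == total:
        events, total = events + 0.5, total + 1.0
    logit = math.log(events / (total - events))
    variance = 1.0 / events + 1.0 / (total - events)
    return logit, variance


def pool_logit_proportions(events: Sequence[int], totals: Sequence[int], z: float = 1.96):
    """DerSimonian-Laird random-effects pooling; returns (pooled, (lo, hi), I2 in %)."""
    pairs = [_logit_with_variance(x, n) for x, n in zip(events, totals)]
    ys = [y for y, _ in pairs]
    vs = [v for _, v in pairs]
    w = [1.0 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    df = len(ys) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))

    def expit(t: float) -> float:
        return 1.0 / (1.0 + math.exp(-t))

    return expit(y_re), (expit(y_re - z * se_re), expit(y_re + z * se_re)), i2


# Hypothetical sensitivity data: cases detected / total cases in three studies.
pooled, (lo, hi), i2 = pool_logit_proportions([80, 45, 30], [100, 60, 40])
print(f"pooled = {pooled:.3f} (95% CI {lo:.3f}-{hi:.3f}), I2 = {i2:.1f}%")
```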

Results

Search results

The initial search yielded 777 hits. After removal of duplicates, 621 abstracts and titles were screened (Fig. 1). At the full-text review stage, 58 articles were considered and 32 were subsequently excluded, leaving 26 publications and 40 different analyses that were included in the review.[19,32,33,46-68] Details regarding the search results, including reasons for exclusion of articles are summarised in Fig. 1.

Fig. 1

PRISMA flow diagram of search strategy.

Study and participant characteristics

Details of the included studies are summarised in Table 1. Briefly, 11 studies examined the PHQ-2, 9 examined the PHQ-9-algorithm and 20 examined the PHQ-9-linear. Several studies compared diagnostic methods within the same population, allowing a head-to-head comparison. Of particular interest, Thompson & Higgins,[45] Manea et al[32] and Löwe et al[33] compared all three diagnostic methods. Chen et al, Kroenke et al, Liu et al, de Lima Osório et al (2009), Phelan et al and Richardson et al compared the PHQ-2 with the PHQ-9-linear.[19,50,52,57,60,61] Lamers et al, Lotrakul et al, Wittkampf et al and Zuithoff et al compared the PHQ-9-algorithm with the PHQ-9-linear.[56,58,66,68] In these head-to-head studies, the cut-off thresholds were consistent, namely PHQ-2 (linear) ≥2 and PHQ-9 (linear) ≥10.

Table 1

Summary of included studies

Author	PHQ method	Sample mean age and % male/female	Sample size	Prevalence of depression, %	Reference standard
Lowe et al[34]	PHQ-2 (linear) ≥2	42 years, 32.5% male	520	13.7	SCID, DSM-IV
Löwe et al[33]	PHQ-9-linear and algorithm	41.7 years, 32.9% male	501	13.2	SCID, DSM-IV
Arroll et al[46]	PHQ-2 (linear) ≥2; PHQ-9 (linear) ≥10; PHQ-9 (algorithm)	49 years, 39% male	2642	6.2	CIDI, DSM-IV
Ayalon et al[47]	PHQ-9 (algorithm)	75 years, 59.5% male	153	3.9	SCID, DSM-IV
Azah et al[48]	PHQ-9 (linear) ≥10	38.7 years, 38.3% male	180	46.1	CIDI, ICD-10
Cannon et al[49]	PHQ-9 (algorithm)	57.2 years, 54% male	526	26.6	SCID DSM-IV MDD lifetime
Chen et al[50]	PHQ-9 (linear) ≥10	Age not reported, 47% male	262	16.7	SCID, DSM-IV
Chen et al[51]	PHQ-2 (linear) ≥2; PHQ-9 (linear) ≥10	68.5 years, 56.3% female	77	54.5	SCID DSM-IV
De Lima Osório et al[52]	PHQ-9 (linear) ≥10; PHQ-2 (linear) ≥2	48% under 30, 52% between 31 and 50 years, 100% female	177	33.9	SCID DSM-IV
Gelaye et al[53]	PHQ-9 (linear) ≥10	35.1 years, 61.3% female	363	12.7	SCAN DSM-IV
Gilbody et al[54]	PHQ-9 (linear) ≥10	42.5 years, 77.1% female	96	37.5	SCID DSM-IV
Henkel et al[55]	PHQ-2 (linear) ≥2	53.9 years, 75% female	382	10.0	SCID DSM
Kroenke et al[19]	PHQ-9 (linear) ≥10	46 years, 66% female	580	7.1	DSMIIR
Lamers et al[56]	PHQ-9 (algorithm); PHQ-9 (linear) ≥8	71.4 years, 51.8% male	713	17.3	MINI DSM-IV
Liu et al[57]	PHQ-2 (linear) ≥2; PHQ-9 (linear) ≥10	18 years or older, 39% male	1532	3.3	SCAN DSM-IV
Lotrakul et al[58]	PHQ-9 (linear) ≥10; PHQ-9 (algorithm)	45 years, 73.7% female	279	6.8	MINI DSM-IV
Patel et al[59]	PHQ-9 (linear) ≥10	37.5 years, 56.4% female	598	5.5	ICD-10
Phelan et al[60]	PHQ-2 (linear) ≥2; PHQ-9 (linear) ≥10	78 years, 62% female	69	11.6	SCID DSM
Richardson et al[61]	PHQ-2 (linear) ≥2; PHQ-9 (linear) ≥10	15.3 years, 60% female	442	4.3	DIS for MDD in children (DISC)
Sherina et al[62]	PHQ-9 (linear) ≥10	30.9 years, 100% female	146	12.1	CIDI, ICD-10
Spitzer et al[63]	PHQ-9 (algorithm)	46 years, 66% female	585	10.0	DSM
Sung et al[64]	PHQ-9 >6	36.1 years, 65.3% female	400	3.0	MINI DSM
Wittkampf et al[66]	PHQ-9 (linear) ≥10; PHQ-9 (algorithm)	49.8 years, 66.7% female	664	12.3	SCID-I
Yeung et al[67]	PHQ-9 (linear) ≥15	Not reported	184	22.8	SCID DSM
Zuithoff et al[68]	PHQ-2 (linear) ≥2; PHQ-9 (linear) ≥10; PHQ-9 (algorithm)	51 years, 37% female	1352	13.0	CIDI DSM-IV

CIDI, Composite International Diagnostic Interview; DIS, Diagnostic Interview Schedule; DISC, Diagnostic Interview Schedule for Children; DSM, Diagnostic and Statistical Manual of Mental Disorders; ICD, International Classification of Diseases; MDD, major depressive disorder; MINI, Mini International Neuropsychiatric Interview; PHQ, Patient Health Questionnaire; SCAN, Schedules for Clinical Assessment in Neuropsychiatry; SCID, Structured Clinical Interview for DSM Disorders.

The total sample size was 26 902 (median 502, s.d.=693.7), with a mean patient age of 49.38 years, and 61% were female. There were 23 706 individuals without depression according to the criterion reference and 3009 with depression, meaning that the prevalence of depression in primary care was 11.3% (95% CI 10.92–11.68%) from simple pooling of data. However, as several publications used multiple tests, after limiting the analysis to unique adults, there were 14 760 people, of whom 2117 had depression (14.3%; 95% CI 11.3–17.7).

Methodological quality

Supplementary Table DS1 summarises the QUADAS-2 scores for all of the included studies. Only four studies were judged to be at low risk of bias across all four domains.[33,45,55,59] Three studies either had a high risk of bias or were considered possible outliers: Richardson et al[61] studied adolescents seen in primary care; Whooley et al[65] used the Whooley questions and was eventually excluded; and Cannon et al[49] used lifetime rather than current depression (although this did not significantly influence the recorded prevalence levels). We used this information in a moderator analysis.

Diagnostic accuracy of the PHQ

Sensitivity and specificity meta-analysis

Main analysis. The diagnostic validity meta-analysis gave overall sensitivity estimates of 82.2% (95% CI 74.3–88.9), 58.4% (95% CI 44.5–71.7) and 89.9% (95% CI 83.4–94.9) for the PHQ-9-linear, PHQ-9-algorithm and PHQ-2 respectively. In all cases, there was significant heterogeneity but no significant publication bias (see Table 2 which contains the heterogeneity and publication bias data for all of the pooled analysis). The pooled specificity was 84.7% (95% CI 80.4–88.5), 92.1% (95% CI 85.9–96.6) and 72.6% (95% CI 66.0–78.7) for the PHQ-9-linear, PHQ-9-algorithm and PHQ-2 respectively. In the sensitivity analysis (in which we removed the three outliers) and in the bivariate analysis, the results were broadly unchanged (Table 3 and Fig. 2) but they did generate our best estimate of sensitivity of 81.3% (95% CI 71.6–89.3) and specificity of 85.3% (95% CI 81.0–89.1) for the PHQ-9-linear; a best estimate of sensitivity of 89.3% (95% CI 81.5–95.1) and specificity of 75.9% (95% CI 70.1–81.3) for the PHQ-2; a best estimate of sensitivity of 56.8% (95% CI 41.2–71.8) and specificity of 93.3% (95% CI 87.5–97.3) for the PHQ-9-algorithm.

Table 2

Inconsistency and bias analysis in Patient Health Questionnaire (PHQ) data in primary care

Test	Sensitivity: inconsistency and bias	Specificity: inconsistency and bias	ROC: inconsistency and bias
Main results
PHQ-9-linear n=20	I² (inconsistency) = 90.4 (87 to 92.5); Harbord: bias = 3.08 (92.5% CI −0.604 to 6.77) P=0.1314	I² (inconsistency) = 96.6 (96 to 97.1); Harbord: bias = −5.742 (92.5% CI −10.38 to −1.097) P=0.0312	I² (inconsistency) = 77.2 (56.8 to 85.6); Egger: bias = −0.805 (−2.50 to 0.89) P=0.3154
PHQ-9-algorithm n=9	I² (inconsistency) = 94.8 (92.7 to 96.1); Harbord: bias = 0.363 (92.5% CI −7.707 to 8.433) P=0.9277	I² (inconsistency) = 98.3 (97.9 to 98.5); Harbord: bias = −13.19 (92.5% CI −31.689 to 5.302) P=0.1797	I² (inconsistency) = 92.1 (87.8 to 94.4); Egger: bias = 1.33 (−7.685 to 10.345) P=0.7375
PHQ-2 n=11	I² (inconsistency) = 85.3 (74.9 to 90.2); Harbord: bias = 0.98 (92.5% CI −2.846 to 4.815) P=0.6175	I² (inconsistency) = 97 (96.2 to 97.5); Harbord: bias = −3.89 (92.5% CI −11.382 to 3.593) P=0.3225	I² (inconsistency) = 73.2 (14 to 86.4); Egger: bias = −0.64 (−6.848 to 5.554) P=0.7865

Head-to-head results
PHQ-9-linear n=8	I² (inconsistency) = 88.5 (79.4 to 92.5); Harbord: bias = 1.22693 (92.5% CI −4.335 to 6.789) P=0.6519	I² (inconsistency) = 96.7 (95.6 to 97.4); Harbord: bias = −4.17 (92.5% CI −12.913 to 4.558) P=0.3433	I² (inconsistency) = 86.2 (57.6 to 92.8); Egger: bias = −1.03 (−11.014 to 8.950) P=0.6999
PHQ-2-linear n=8	I² (inconsistency) = 83.2 (65.5 to 89.8); Harbord: bias = −0.18 (92.5% CI −4.559 to 4.198) P=0.9321	I² (inconsistency) = 97.1 (96.2 to 97.7); Harbord: bias = −4.01 (92.5% CI −12.868 to 4.830) P=0.3664	I² (inconsistency) = 17.7 (0 to 73.2); Egger: bias = −1.93 (−4.395 to 0.526) P=0.0774
PHQ-9-algorithm n=6	I² (inconsistency) = 94.2 (90.5 to 96); Harbord: bias = 6.96 (92.5% CI −4.920 to 18.852) P=0.2336	I² (inconsistency) = 91.2 (83.7 to 94.3); Harbord: bias = −5.02 (92.5% CI −15.119 to 5.069) P=0.2996	I² (inconsistency) = 92.4 (83.6 to 95.5); Egger: bias = 3.27 (−25.876 to 32.416) P=0.6769

Moderator analysis
PHQ-2 n=9; higher quality, same cut-off, adults	I² (inconsistency) = 87.1 (77.1 to 91.5); Harbord: bias = 1.22 (92.5% CI −3.194 to 5.641) P=0.5809	I² (inconsistency) = 95.8 (94.2 to 96.7); Harbord: bias = −1.86 (92.5% CI −9.244 to 5.506) P=0.6128	I² (inconsistency) = 76.3 (0 to 89.4); Egger: bias = −0.44 (−13.793 to 12.893) P=0.8979
PHQ-9-linear n=16; higher quality, same cut-off, adults	I² (inconsistency) = 92.1 (89.3 to 93.9); Harbord: bias = 3.46 (92.5% CI −1.240 to 8.162) P=0.1786	I² (inconsistency) = 95.9 (94.9 to 96.6); Harbord: bias = −5.86 (92.5% CI −10.423 to 1.313) P=0.0266	I² (inconsistency) = 80.8 (61.1 to 88.3); Egger: bias = −3.20 (−6.581 to 0.173) P=0.0598
PHQ-9-algorithm n=8; higher quality, same cut-off, adults	I² (inconsistency) = 95.1 (93 to 96.4); Harbord: bias = 0.35 (92.5% CI −8.228 to 8.942) P=0.9316	I² (inconsistency) = 98.1 (97.6 to 98.4); Harbord: bias = −12.03 (92.5% CI −33.868 to 9.808) P=0.2808	I² (inconsistency) = 93 (89.1 to 95); Egger: bias = 1.41 (−8.589 to 11.426) P=0.7406

ROC, receiver operating characteristic.

Values in bold are significant.

Table 3

Summary of Patient Health Questionnaire (PHQ) analysis in primary care

Test	Sensitivity, % (95% CI)	Specificity, % (95% CI)	PPV	NPV	ROC	CUI+	CUI-	LR+	LR-
Main results
PHQ-9-linear n=20	82.2 (74.3–88.9)	84.7 (80.4–88.5)	38.0% (36.1–39.9%)	97.7% (97.3–98.0%)	0.910 (0.892–0.930)	0.312 (0.311–0.313) ‘very poor’	0.827 (0.826–0.828) ‘excellent’	5.37 (5.09–5.67)	0.21 (0.19–0.24)
PHQ-9-algorithm n=9	58.4 (44.5–71.7)	92.1 (85.9–96.6)	47.4% (44.7–50.1%)	94.8% (94.3–95.3%)	0.733 (0.676–0.795)	0.277 (0.276–0.278) ‘very poor’	0.873 (0.873–0.873) ‘excellent’	7.39 (6.77–8.07)	0.45 (0.42–0.49)
PHQ-2 n=11	89.9 (83.4–94.9)	72.6 (66.0–78.7)	23.1% (21.5–24.7%)	97.5% (97.0–97.9%)	0.860 (0.819–0.903)	0.188 (0.187–0.189) ‘very poor’	0.708 (0.707–0.709) ‘good’	2.97 (2.82–3.12)	0.26 (0.22–0.30)

Head-to-head results
PHQ-9-linear n=8	87.0 (75.8–95.1)	87.2 (81.1–92.2)	38.2% (35.1–41.2%)	98.6% (98.3–98.9%)	0.920 (0.915–0.924)	0.322 (0.321–0.323) ‘very poor’	0.877 (0.987–0.878) ‘excellent’	7.66 (7.04–8.33)	0.18 (0.14–0.22)
PHQ-2-linear n=8	91.5 (83.6–96.9)	72.2 (64.0–79.8)	22.7% (20.8–24.6%)	99.0% (98.7–99.3%)	0.900 (0.865–0.934)	0.205 (0.205–0.206) ‘very poor’	0.742 (0.741–0.743) ‘good’	3.61 (3.42–3.81)	0.13 (0.10–0.17)
PHQ-9-algo n=6	53.0 (36.4–69.3)	95.7 (93.5–97.5)	58.4% (54.1–62.8%)	94.1% (93.5–94.7%)	0.715 (0.628–0.815)	0.273 (0.272–0.275) ‘very poor’	0.905 (0.904–0.906) ‘excellent’	12.35 (10.56–14.45)	0.55 (0.51–0.50)

Moderator analysis
PHQ-9-linear n=16 Higher quality; same cut-off; adults	81.3 (71.6–89.3) 82.33(72.0–89.4)[a]	85.3 (81.0–89.1) 86.4 (81.2–90.4)[a]	44.2% (41.9–46.6%)	38.9 (36.8–41.0)	97.5 (97.2–97.9)	0.316 (0.315–0.317) ‘very poor’	0.832 (0.832–0.832) ‘excellent’	5.53 (5.21–5.87)	0.22 (0.19–0.25)
PHQ-9-algorithm n=8 Higher quality; same cut-off; adults	56.8 (41.2–71.8) 54.0 (40.0–67.5)[a]	93.3 (87.5–97.3) 95.9 (94.0–97.3)[a]	60.3% (57.0–63.6%)	48.3% (45.4–51.3)	95.1% (94.7–95.6)	0.275 (0.274–0.276) ‘very poor’	0.887 (0.887–0.887) ‘excellent’	8.46 (97.67 –9.33)	0.46 (0.43–0.50)
PHQ-2 n=9 Higher quality; same cut-off; adults	89.3 (81.5–95.1) 91.4 (81.9–96.2)[a]	75.9 (70.1–81.3) 76.3 (69.6–82.0)[a]	27.7% (25.8–29.6%)	26.5% (24.6–28.3%)	98.6% (98.3–99.0%)	0.236 (0.235–0.237) ‘very poor’	0.749 (0.749–0.749) ‘good’	3.71 (3.52–3.90)	0.14 (0.11–0.18)

PPV, positive predictive value; NPV, negative predictive value; ROC, receiver operating characteristic; CUI+, positive clinical utility index; CUI−, negative clinical utility index; LR+, positive likelihood ratio; LR−, negative likelihood ratio.

[a] Alternative estimate based on a bivariate calculation in Stata.

Fig. 2

Bayesian plot of conditional probabilities PHQ-9-linear v. PHQ-9-algorithm v. PHQ-2 (restricted to head-to-head studies).

Subanalysis (head to head): PHQ-9-linear v. PHQ-2. In a subanalysis restricted to head-to-head studies in the same population, the sensitivity of the PHQ-9-linear was 87.0% (95% CI 75.81–95.07) v. 91.4% (95% CI 83.60–96.92) for the PHQ-2. The specificity of the PHQ-9-linear was 87.17% (95% CI 81.10–92.20) v. 72.23% (95% CI 63.96–79.81) for the PHQ-2. In the sensitivity analysis, the results were unchanged (Table 3).

Subanalysis (head to head): PHQ-9-linear v. PHQ-9-algorithm. In a subanalysis restricted to head-to-head studies in the same population, the sensitivity of the PHQ-9-linear was 81.1% (95% CI 63.34–93.86) v. 53.1% (95% CI 36.44–69.31) for the PHQ-9-algorithm. The specificity of the PHQ-9-linear was 86.34% (95% CI 80.36–91.38) v. 95.71% (95% CI 93.54–97.45) for the PHQ-9-algorithm, suggesting significantly lower specificity for the PHQ-9-linear. However, caution is necessary because these results are based on a predefined cut-point of ≥10. Results were broadly unchanged in the sensitivity analyses (Table 3).

Cut-off analysis: effect of cut-off thresholds. In an analysis restricted to specific cut-offs, we examined the effect of choosing different fixed thresholds on the PHQ-2 and PHQ-9 when scored linearly. Results are shown in Table 4. Inevitably, as the cut-point increased, sensitivity fell and specificity rose. For the PHQ-9, judged by combined sensitivity and specificity (Youden index), the optimal cut-off would be ≥10, followed by ≥11.

Moderator analysis: effect of influencing variables. In a moderator analysis, we found no association between diagnostic accuracy and country, mean age, gender, year of publication or sample size.

ROC curve meta-analysis

Main analysis: PHQ-9-linear, PHQ-9-algorithm and PHQ-2. The pooled ROC diagnostic validity meta-analysis gave an overall area estimate of 0.910 (95% CI 0.892–0.930) for the PHQ-9-linear, 0.733 (95% CI 0.676–0.795) for the PHQ-9-algorithm and 0.860 (95% CI 0.819–0.903) for the PHQ-2. In all cases there was significant heterogeneity but no significant publication bias (see Table 3 for a summary of results). Results were broadly unchanged in the moderator analysis, with areas under the ROC curve of 0.910 (95% CI 0.882–0.939) for the PHQ-9-linear, 0.732 (0.667–0.803) for the PHQ-9-algorithm and 0.877 (0.824–0.934) for the PHQ-2.

Subanalysis (head to head): PHQ-9-linear v. PHQ-2. In the head-to-head studies, the area under the ROC curve was 0.898 (95% CI 0.864–0.933) for the PHQ-2 and 0.922 (95% CI 0.882–0.964) for the PHQ-9-linear. Once again, results were unchanged in the sensitivity analysis.

Subanalysis (head to head): PHQ-9-linear v. PHQ-9-algorithm. When restricted to four head-to-head studies, the area under the ROC curve was 0.9201 (95% CI 0.9153–0.9248) for the PHQ-9-linear and 0.7149 (95% CI 0.6275–0.8145) for the PHQ-9-algorithm.

Subanalysis (head to head): PHQ-9-algorithm v. PHQ-2. There were insufficient data for this comparison.

Test performance: case finding v. screening

Examining PPV, the diagnostic validity meta-analysis suggested a superior PPV for the PHQ-9-algorithm (47.4%, 95% CI 44.7–50.1) compared with the PHQ-2 (23.1%, 95% CI 21.5–24.7); however, caution is required because prevalence is not controlled for (i.e. it is not matched across the two analyses); correction for prevalence is shown in the Bayesian curve of conditional probabilities. Examining NPV, meta-analysis suggested a superior NPV for the PHQ-2 (97.5%, 95% CI 97.0–97.9) compared with the PHQ-9-algorithm (94.8%, 95% CI 94.3–95.3); again, caution is required because prevalence is not controlled for in this analysis. Results using likelihood ratios are shown in Table 3, but the clinical utility is more informative.

For case finding (CUI+), all methods were disappointing: PHQ-9-linear 0.312 (95% CI 0.311–0.313), PHQ-9-algorithm 0.277 (95% CI 0.276–0.278) and PHQ-2 0.188 (95% CI 0.187–0.189), all suggesting very poor performance at the prevalence rates typical of primary care. Results were not substantially different in a moderator analysis of high-quality studies with a fixed cut-off or in the head-to-head analysis. For application as a screening test (CUI−), all methods were satisfactory: PHQ-9-linear 0.827 (95% CI 0.826–0.828), PHQ-9-algorithm 0.873 (95% CI 0.873–0.873) and PHQ-2 0.708 (95% CI 0.707–0.709), all suggesting good to excellent performance at the prevalence rates typical of primary care. Again, results were not substantially different in the moderator or head-to-head analyses. All analyses suggested that the optimal rule-out screening test would be the PHQ-9-algorithm, closely followed by the PHQ-9-linear. Using a Bayesian curve of conditional probabilities, the performance of each test (judged by PPV and NPV) can be shown at every possible prevalence applicable to different settings (Figs. 3 and 4).
From the Bayesian curve, the most encouraging test would be the PHQ-2 used as an initial screener followed by either the PHQ-9-linear or another suitable case-finding tool.
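The Bayesian curve of conditional probabilities rests on Bayes' theorem: PPV and NPV follow from sensitivity, specificity and prevalence, and the clinical utility indices are CUI+ = sensitivity × PPV and CUI− = specificity × NPV. The minimal sketch below recomputes these at any prevalence from the pooled sensitivity and specificity quoted in the abstract. The verbal grading bands in `grade` are our assumption, chosen to agree with the labels in Table 4, and all function names are hypothetical. Because the PPV, NPV and CUI values reported above were meta-analysed directly rather than derived this way, the sketch will not reproduce them exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accuracy:
    sensitivity: float  # P(test positive | MDD)
    specificity: float  # P(test negative | no MDD)


def predictive_values(acc: Accuracy, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) at the given prevalence using Bayes' theorem."""
    tp = acc.sensitivity * prevalence              # true positives
    fp = (1 - acc.specificity) * (1 - prevalence)  # false positives
    tn = acc.specificity * (1 - prevalence)        # true negatives
    fn = (1 - acc.sensitivity) * prevalence        # false negatives
    return tp / (tp + fp), tn / (tn + fn)


def clinical_utility(acc: Accuracy, prevalence: float) -> tuple[float, float]:
    """Return (CUI+, CUI-): sensitivity x PPV and specificity x NPV."""
    ppv, npv = predictive_values(acc, prevalence)
    return acc.sensitivity * ppv, acc.specificity * npv


def grade(cui: float) -> str:
    """Verbal grade for a CUI value (assumed bands, consistent with Table 4)."""
    for cutoff, label in ((0.81, "excellent"), (0.64, "good"),
                          (0.49, "fair"), (0.36, "poor")):
        if cui >= cutoff:
            return label
    return "very poor"


# Pooled sensitivity and specificity from the abstract.
POOLED = {
    "PHQ-9-linear": Accuracy(0.813, 0.853),
    "PHQ-9-algorithm": Accuracy(0.568, 0.933),
    "PHQ-2": Accuracy(0.893, 0.759),
}

if __name__ == "__main__":
    # A few points on the Bayesian curve, including the 14.3% pooled prevalence.
    for prevalence in (0.05, 0.143, 0.30):
        for name, acc in POOLED.items():
            ppv, npv = predictive_values(acc, prevalence)
            cui_pos, cui_neg = clinical_utility(acc, prevalence)
            print(f"prev={prevalence:.3f} {name:<16} PPV={ppv:.3f} NPV={npv:.3f} "
                  f"CUI+={cui_pos:.3f} ({grade(cui_pos)}) "
                  f"CUI-={cui_neg:.3f} ({grade(cui_neg)})")
```

Sweeping `prevalence` over a fine grid and plotting PPV and NPV gives the shape of a Bayesian curve: PPV rises and NPV falls as prevalence increases.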

Fig. 3

Bivariate plot of summary accuracy of PHQ-9-linear v. PHQ-9-algorithm v. PHQ-2 (restricted to high-quality studies).

Fig. 4

Bayesian plot of conditional probabilities: PHQ-9-linear v. PHQ-9-algorithm v. PHQ-2.

Cut-off analysis: effect of cut-off thresholds

On the PHQ-9, looking at combined PPV and NPV (predictive summary index), the optimal cut-point would be ≥14. For the PHQ-2, looking at combined sensitivity and specificity (Youden index), the optimal cut-off would be ≥3, closely followed by ≥2 (note that ≥2 is the conventional threshold). However, looking at combined PPV and NPV (predictive summary index), the optimal cut-point would be ≥6, followed by ≥5. Comparing the PHQ-9 and the PHQ-2 across all possible cut-offs shows that neither is satisfactory as a case-finding tool in primary care at any cut-off, but the optimal single method is the PHQ-2 at a threshold of ≥6. Of those without MDD, 98.6% have a score of 5 or lower on the PHQ-2, and of those with a score of 5 or lower, 93.5% are true negatives (true non-cases) (Tables 2–4).
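The indices used in this cut-off analysis are simple functions of the Table 4 values: Youden index = sensitivity + specificity − 1, predictive summary index (PSI) = PPV + NPV − 1, and CUI− = specificity × NPV. The sketch below applies them to the Table 4 point estimates and picks the cut-off that maximises each. It ignores the confidence intervals and the differing numbers of studies behind each row, so it only illustrates the selection rule; the dictionary and function names are hypothetical.

```python
# Point estimates from Table 4: cut-off k (score >= k) -> (sens, spec, PPV, NPV), in %.
PHQ2 = {
    1: (96.05, 52.18, 14.7, 99.4), 2: (92.20, 70.98, 22.8, 99.0),
    3: (76.22, 88.66, 38.6, 97.6), 4: (61.46, 94.14, 53.4, 95.7),
    5: (47.33, 97.60, 72.8, 93.2), 6: (51.80, 98.63, 83.3, 93.5),
}
PHQ9 = {
    6: (89.81, 62.79, 28.9, 97.3), 7: (84.69, 69.17, 31.6, 96.4),
    8: (80.25, 76.54, 29.4, 96.9), 9: (81.31, 79.82, 32.3, 97.3),
    10: (81.3, 85.3, 44.2, 97.0), 11: (75.40, 87.86, 44.6, 96.5),
    12: (68.37, 90.88, 49.1, 95.7), 13: (69.92, 92.93, 60.2, 95.3),
    14: (56.04, 96.57, 73.4, 92.9),
}


def youden(sens, spec, ppv, npv):
    """Youden index: sensitivity + specificity - 1."""
    return (sens + spec) / 100 - 1


def psi(sens, spec, ppv, npv):
    """Predictive summary index: PPV + NPV - 1."""
    return (ppv + npv) / 100 - 1


def cui_negative(sens, spec, ppv, npv):
    """Negative clinical utility index: specificity x NPV."""
    return (spec / 100) * (npv / 100)


def best_cutoff(rows, index):
    """Return the cut-off whose row maximises the given index."""
    return max(rows, key=lambda k: index(*rows[k]))


if __name__ == "__main__":
    for name, rows in (("PHQ-2", PHQ2), ("PHQ-9", PHQ9)):
        for index in (youden, psi, cui_negative):
            print(f"{name} {index.__name__}: optimal cut-off >= {best_cutoff(rows, index)}")
```

On these point estimates the Youden index favours ≥3 on the PHQ-2 and ≥10 on the PHQ-9, whereas PSI and CUI− both favour ≥6 and ≥14. This is the tension between sensitivity/specificity and predictive values discussed in the text.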

Table 4

PHQ cut-off threshold analysis in primary care

Test	Sensitivity, % (95% CI)	Specificity, % (95% CI)	PPV, % (95% CI)	NPV, % (95% CI)	CUI+ (95% CI)	CUI− (95% CI)	LR+ (95% CI)	LR− (95% CI)
PHQ-2
Cut PHQ-2 ≥1 n=20	96.05 (92.29–98.60)	52.18 (43.42–60.8)	14.7 (13.6–15.9)	99.4 (99.1–99.6)	0.141 (0.140–0.142)‘very poor’	0.519 (0.519–0.519)‘fair’	2.01 (1.95–2.07)	0.07 (0.05–0.11)
Cut PHQ-2 ≥2 n=9	92.20 (85.21–97.10)	70.98 (64.63–76.94)	22.8 (21.2–24.5)	99.0 (98.7–99.3)	0.211 (0.210–0.212)‘very poor’	0.703 (0.703–0.703)‘good’	3.18 (3.04–3.32)	0.11 (0.08–0.14)
Cut PHQ-2 ≥3 n=11	76.22 (61.1–88.53)	88.66 (85.01–91.86)	38.6 (35.9–41.3)	97.6 (97.2–97.9)	0.294 (0.293–0.295)‘very poor’	0.865 (0.865–0.865)‘excellent’	6.74 (6.23–7.30)	0.27 (0.23–0.31)
Cut PHQ-2 ≥4 n=11	61.46 (44.00–77.52)	94.14 (91.73–96.15)	53.4 (49.7–57.2)	95.7 (95.2–96.3)	0.329 (0.328–0.330)‘very poor’	0.901 (0.901–0.901)‘excellent’	10.45 (9.22–11.85)	0.41 (0.37–0.45)
Cut PHQ-2 ≥5 n=11	47.33 (26.78–68.37)	97.60 (95.36–99.12)	72.8 (67.2–78.4)	93.2 (92.2–94.1)	0.344 (0.241–0.346)‘very poor’	0.909 (0.908–0.910)‘excellent’	19.77 (15.23–25.68)	0.54 (0.49–0.60)
Cut PHQ-2 ≥6 n=11	51.80 (23.07–79.88)	98.63 (96.91–99.66)	83.3 (78.5–88.2)	93.5 (92.6–94.4)	0.421 (0.419–0.424)[b]‘poor’	0.922 (0.921–0.923)[a]‘excellent’		0.50 (0.45–0.56)

PHQ-9
Cut PHQ-9 ≥6 n=20	89.81 (81.91–95.63)	62.79 (51.02–73.84)	28.9 (26.7–31.1)	97.3 (96.6–98.0)	0.259 (0.258–0.250)‘very poor’	0.611 (0.611–0.611)‘fair’	2.41 (2.28–2.55)	0.16 (0.13–0.21)
Cut PHQ-9 ≥7 n=20	84.69 (74.32–92.75)	69.17 (57.72–79.53)	31.6 (29.2–34.1)	96.4 (95.6–97.2)	0.268 (0.267–0.269)‘very poor’	0.667 (0.666–0.668)‘good’	2.75 (2.57–2.93)	0.22 (0.18–0.27)
Cut PHQ-9 ≥8 n=20	80.25 (71.00–88.09)	76.54 (69.59–82.84)	29.4 (27.4–31.5)	96.9 (96.4–97.5)	0.236 (0.235–0.237)‘very poor’	0.742 (0.71–0.743)‘good’	3.41 (3.21–3.63)	0.26 (0.22–0.30)
Cut PHQ-9 ≥9 n=20	81.31 (69.69–90.64)	79.82 (72.51–86.26)	32.3 (30.0–34.7)	97.3 (96.8–97.8)	0.263 (0.262–0.264)‘very poor’	0.777 (0.776–0.778)‘good’	4.03 (3.77–4.31)	0.23 (0.20–0.28)
Cut PHQ-9 ≥10 n=20	81.3 (71.6–89.3)	85.3 (81.0–89.1)	44.2 (41.9–46.6)	97.0 (96.7–97.4)	0.333 (0.337–0.339) ‘very poor’	0.863 (0.863–0.863) ‘excellent’	6.90 (6.44–7.40)	0.27 (0.24–0.30)
Cut PHQ-9 ≥11 n=20	75.40 (60.77–87.52)	87.86 (82.77–92.17)	44.6 (41.8–47.4)	96.5 (96.0–97.0)	0.336 (0.335–0.337)‘very poor’	0.848 (0.848–0.848)‘excellent’	6.23 (5.74–6.76)	0.28 (0.25–0.32)
Cut PHQ-9 ≥12 n=20	68.37 (54.71–80.58)	90.88 (87.54–93.73)	49.1 (46.1–52.0)	95.7 (95.2–96.2)	0.336 (0.335–0.336)‘very poor’	0.870 (0.870–0.870)‘excellent’	7.51 (6.85–8.23)	0.35 (0.31–0.39)
Cut PHQ-9 ≥13 n=20	69.92 (58.39–80.30)	92.93 (89.33–95.83)	60.2 (55.5–64.9)	95.3 (94.4–96.1)	0.421 (0.419–0.423)[b]‘poor’		9.84 (8.38–11.55)	0.32 (0.28–0.38)
Cut PHQ-9 ≥14 n=20	56.04 (42.88–68.77)	96.57 (94.48–98.18)	73.4 (67.1–79.8)	92.9 (91.6–94.2)	0.411 (0.408–0.415)‘poor’	0.898 (0.897–0.898)[a]‘excellent’	16.5 (12.26–22.21)	0.46 (0.40–0.53)

[a] Optimal cut-off for ruling out those without depression (screening).

[b] Optimal cut-off for ruling in those with depression (case-finding).

PPV, positive predictive value; NPV, negative predictive value; ROC, receiver operating characteristic; CUI+, positive clinical utility index; CUI−, negative clinical utility index; LR+, positive likelihood ratio; LR−, negative likelihood ratio.

Discussion

A previous meta-analysis of 41 studies involving 50 371 individuals in primary care found a pooled prevalence of 18.4% (95% CI 13.5–23.9) in adults aged 18–65 years using semi-structured interviews.[1] In this study, we found a slightly lower prevalence of depression in primary care, 14.3% (95% CI 11.3–17.7%), across 14 760 adults. The PHQ-9-linear had better sensitivity but worse specificity than the PHQ-9-algorithm; however, this finding could result from choosing a PHQ-9-linear cut-off threshold that is too low. The PHQ-2, in turn, had greater sensitivity (89.3% v. 81.3%) but lower specificity (75.9% v. 85.3%) than the PHQ-9-linear method. The ROC meta-analysis suggested that the areas under the ROC of the PHQ-9-linear and of the PHQ-2 were both significantly higher than that of the PHQ-9-algorithm, which was surprising given that the PHQ-9-algorithm adheres more tightly to the DSM criterion standard. The difference was maintained when the comparison of the PHQ-9-linear with the PHQ-9-algorithm was restricted to four head-to-head studies. In head-to-head studies, the tools are tested against one another in the same sample, ruling out differences due to prevalence or local conditions. Using the same methods, there was no clear difference between the PHQ-2 and the PHQ-9-linear, which again is surprising given the brevity of the PHQ-2. However, these results do not establish a specific role for any method in either screening or case finding. For case finding, consistent with previous literature, all methods were disappointing, with CUI+ values graded as ‘very poor’. Looking at PPV alone for all methods using the Bayesian curve, results were similarly poor, confirming the overall poor case-finding performance of these tools at the prevalence rates typically seen in primary care. In short, a positive test is infrequent in a typical primary care sample and/or a positive test (when it does occur) is not especially discriminating.
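Table 4 also reports likelihood ratios. Combining them with the pooled prevalence through the odds form of Bayes' theorem (post-test odds = pre-test odds × LR) shows concretely how much a single result shifts the probability of MDD. The sketch below uses the 14.3% pooled prevalence and the Table 4 LRs at the conventional thresholds (PHQ-9 ≥10, PHQ-2 ≥2). It is only an illustration: it treats pooled point estimates as exact, and the function and constant names are hypothetical.

```python
def post_test_probability(pretest: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds x LR."""
    pretest_odds = pretest / (1 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


PREVALENCE = 0.143  # pooled MDD prevalence in this meta-analysis

# (LR+, LR-) point estimates from Table 4 at the conventional thresholds.
CONVENTIONAL = {"PHQ-9 >=10": (6.90, 0.27), "PHQ-2 >=2": (3.18, 0.11)}

if __name__ == "__main__":
    for name, (lr_pos, lr_neg) in CONVENTIONAL.items():
        after_pos = post_test_probability(PREVALENCE, lr_pos)
        after_neg = post_test_probability(PREVALENCE, lr_neg)
        print(f"{name}: positive -> {after_pos:.1%}, negative -> {after_neg:.1%}")
```

With these inputs, a positive result raises the probability of MDD only to about 54% (PHQ-9 ≥10) or 35% (PHQ-2 ≥2). A negative result lowers it to about 4% or 2%, which is the rule-out pattern described in this section.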
For application as a screening test, all methods were encouraging, with the following results on the CUI−: PHQ-9-linear 0.827 (0.826–0.828), PHQ-9-algorithm 0.873 (0.873–0.873) and PHQ-2 0.708 (0.707–0.709), all suggesting good to excellent performance at typical prevalence rates. NPVs were all high. Examining this effect in more detail using a Bayesian curve of conditional probabilities (Figs 3 and 4) demonstrated that, although none of the methods performed particularly well at case finding at any prevalence rate when used alone, they performed reasonably well as a first step. The most practical use of these tools would be the PHQ-2 as an initial screener followed by either the PHQ-9-linear or another suitable case-finding tool. We also analysed the effect of varying the cut-point. Considering sensitivity and specificity alone, the cut-point analysis suggested that the current thresholds of ≥10 on the PHQ-9 and ≥2 on the PHQ-2 are very close to optimal. However, as discussed above, there is more to the application of tests in clinical practice than combined sensitivity and specificity; clinical utility is better represented by PPV and NPV. Using combined PPV and NPV suggests that a substantially higher cut-point on both the PHQ-9 and the PHQ-2 may be appropriate. Furthermore, if one discounts their role in case finding and concentrates simply on rule-out ability (CUI−), then the optimal cut-points would be ≥14 on the PHQ-9 and ≥6 on the PHQ-2. Although these high thresholds are surprising, it is evident that, of those without MDD, 98.6% have a score of 5 or lower on the PHQ-2, and of those with a score of 5 or lower, 93.5% are true negatives (true non-cases). Similarly, 96.5% of non-cases scored <14 on the PHQ-9, and of those scoring <14, 92.9% are true negatives. We suggest that further work is required to examine the optimal cut-off thresholds if a two-step procedure were to be used.

Limitations

We acknowledge that there were relatively few studies with all the required subgroups and that not all studies reported ROC data (although we were able to calculate this in many cases). To date, studies have not attempted to clarify whether the sample comprises previously untreated or previously undiagnosed patients. We did not attempt to look at severity assessment or sensitivity to change. It must also be acknowledged that the results presented represent the outcome of a single application of the PHQ. Multiple (serial) applications may be conducted in clinical practice and could change the results. For completeness, if the PHQ-2 is applied first (step 1), followed by the PHQ-9-linear in those who score positive in step 1, then the combined sensitivity would be 72.4% and specificity 96.4% (overall accuracy 93.0%). If the PHQ-2 were applied first followed by the PHQ-9-algorithm, the combined sensitivity would be 50.7% and specificity 98.4% (overall accuracy 91.6%).
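The two-step figures in this paragraph can be approximated with the standard formulas for serial testing, in which the second test is applied only to first-step positives and a case must be positive on both: combined sensitivity = sens1 × sens2 and combined specificity = 1 − (1 − spec1)(1 − spec2). These formulas assume the two tests err independently given true disease status. That is unlikely to hold exactly, because the PHQ-2 items are part of the PHQ-9, and we have not confirmed that this is how the reported figures were derived. Using the pooled estimates from the abstract and a prevalence of 14.3%, the sketch reproduces the PHQ-9-algorithm figures and comes within about 0.2 percentage points of the PHQ-9-linear ones. Function and constant names are hypothetical.

```python
def serial_positive_on_both(sens1, spec1, sens2, spec2):
    """Combined (sensitivity, specificity) when test 2 is given only to test-1
    positives and a positive requires both; assumes conditional independence."""
    sensitivity = sens1 * sens2
    specificity = 1 - (1 - spec1) * (1 - spec2)
    return sensitivity, specificity


def overall_accuracy(sensitivity, specificity, prevalence):
    """Proportion correctly classified: true positives plus true negatives."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


PREVALENCE = 0.143
PHQ2_POOLED = (0.893, 0.759)  # (sensitivity, specificity) from the abstract
SECOND_STEP = {"PHQ-9-linear": (0.813, 0.853), "PHQ-9-algorithm": (0.568, 0.933)}

if __name__ == "__main__":
    for name, (sens2, spec2) in SECOND_STEP.items():
        sens, spec = serial_positive_on_both(*PHQ2_POOLED, sens2, spec2)
        acc = overall_accuracy(sens, spec, PREVALENCE)
        print(f"PHQ-2 then {name}: sensitivity {sens:.1%}, "
              f"specificity {spec:.1%}, accuracy {acc:.1%}")
```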

Clinical implications and further research

The PHQ has potential to be used to rule out those without depression with few false negatives, but an adjustment of the cut-off points (≥14 on the PHQ-9 and ≥6 on the PHQ-2) should be considered. Alternatively, its routine use could be improved by a two-step procedure using the PHQ-2 and then the PHQ-9. This would also reduce the burden on clinicians, as the PHQ-9 would only be applied following a positive initial PHQ-2 screen. Depression tools applied for the purpose of screening and/or case finding will only be of use if combined with adequate follow-up and adequate treatment. Screening without removal of barriers to high-quality care is potentially frustrating and arguably counterproductive. Several reviews found modest evidence to support QoF-based PHQ scoring, in part because primary care clinicians may lack the skills or resources to follow up a positive screen appropriately.[26,69] Further work on cut-off thresholds and repeat assessment may further improve results, but care must be taken not to increase the burden on clinicians if they are required to implement screening tools. This meta-analysis confirms that neither the PHQ-9 nor the PHQ-2 can confirm a diagnosis of MDD when used alone as a one-off measure, independent of the scoring method. However, both the PHQ-9 and the PHQ-2 can be used as a first screening step and perform quite well in this regard.
