Alex J Mitchell, Motahare Yadegarfar, John Gill, Brendon Stubbs.
Abstract
BACKGROUND: The Patient Health Questionnaire (PHQ) is the most commonly used measure to screen for depression in primary care, but there is still a lack of clarity about its accuracy and optimal scoring method. AIMS: To determine via meta-analysis the diagnostic accuracy of the PHQ-9-linear, PHQ-9-algorithm and PHQ-2 questions to detect major depressive disorder (MDD) among adults.
Year: 2016 PMID: 27703765 PMCID: PMC4995584 DOI: 10.1192/bjpo.bp.115.001685
Source DB: PubMed Journal: BJPsych Open ISSN: 2056-4724
Fig. 1 PRISMA flow diagram of search strategy.
Summary of included studies
| Author | PHQ method | Sample mean age and % male/female | Sample size | Prevalence of depression, % | Reference standard |
|---|---|---|---|---|---|
| Löwe | PHQ-2 (linear) ≥2 | 42 years, 32.5% male | 520 | 13.7 | SCID, DSM-IV |
| Löwe | PHQ-9-linear and algorithm | 41.7 years, 32.9% male | 501 | 13.2 | SCID, DSM-IV |
| Arroll | PHQ-2 (linear) ≥2; PHQ-9 (linear) ≥10; PHQ-9 (algorithm) | 49 years, 39% male | 2642 | 6.2 | CIDI, DSM-IV |
| Ayalon | PHQ-9 (algorithm) | 75 years, 59.5% male | 153 | 3.9 | SCID, DSM-IV |
| Azah | PHQ-9 (linear) ≥10 | 38.7 years, 38.3% male | 180 | 46.1 | CIDI, ICD-10 |
| Cannon | PHQ-9 (algorithm) | 57.2 years, 54% male | 526 | 26.6 | SCID DSM-IV MDD lifetime |
| Chen | PHQ-9 (linear) ≥10 | Age not reported, 47% male | 262 | 16.7 | SCID, DSM-IV |
| Chen | PHQ-2 (linear) ≥2; PHQ-9 (linear) ≥10 | 68.5 years, 56.3% female | 77 | 54.5 | SCID DSM-IV |
| De Lima Osório | PHQ-9 (linear) ≥10; PHQ-2 (linear) ≥2 | 48% under 30, 52% between 31 and 50 years, 100% female | 177 | 33.9 | SCID DSM-IV |
| Gelaye | PHQ-9 (linear) ≥10 | 35.1 years, 61.3% female | 363 | 12.7 | SCAN DSM-IV |
| Gilbody | PHQ-9 (linear) ≥10 | 42.5 years, 77.1% female | 96 | 37.5 | SCID DSM-IV |
| Henkel | PHQ-2 (linear) ≥2 | 53.9 years, 75% female | 382 | 10.0 | SCID DSM |
| Kroenke | PHQ-9 (linear) ≥10 | 46 years, 66% female | 580 | 7.1 | DSM-III-R |
| Lamers | PHQ-9 (algorithm); PHQ-9 (linear) ≥8 | 71.4 years, 51.8% male | 713 | 17.3 | MINI DSM-IV |
| Liu | PHQ-2 (linear) ≥2; PHQ-9 (linear) ≥10 | 18 years or older, 39% male | 1532 | 3.3 | SCAN DSM-IV |
| Lotrakul | PHQ-9 (linear) ≥10; PHQ-9 (algorithm) | 45 years, 73.7% female | 279 | 6.8 | MINI DSM-IV |
| Patel | PHQ-9 (linear) ≥10 | 37.5 years, 56.4% female | 598 | 5.5 | ICD-10 |
| Phelan | PHQ-2 (linear) ≥2; PHQ-9 (linear) ≥10 | 78 years, 62% female | 69 | 11.6 | SCID DSM |
| Richardson | PHQ-2 (linear) ≥2; PHQ-9 (linear) ≥10 | 15.3 years, 60% female | 442 | 4.3 | DIS for MDD in children (DISC) |
| Sherina | PHQ-9 (linear) ≥10 | 30.9 years, 100% female | 146 | 12.1 | CIDI, ICD-10 |
| Spitzer | PHQ-9 (algorithm) | 46 years, 66% female | 585 | 10.0 | DSM |
| Sung | PHQ-9 >6 | 36.1 years, 65.3% female | 400 | 3.0 | MINI DSM |
| Wittkampf | PHQ-9 (linear) ≥10; PHQ-9 (algorithm) | 49.8 years, 66.7% female | 664 | 12.3 | SCID-I |
| Yeung | PHQ-9 (linear) ≥15 | Not reported | 184 | 22.8 | SCID DSM |
| Zuithoff | PHQ-2 (linear) ≥2; PHQ-9 (linear) ≥10; PHQ-9 (algorithm) | 51 years, 37% female | 1352 | 13.0 | CIDI DSM-IV |
CIDI, Composite International Diagnostic Interview; DIS, Diagnostic Interview Schedule; DISC, Diagnostic Interview Schedule for Children; DSM, Diagnostic and Statistical Manual of Mental Disorders; ICD, International Classification of Diseases; MDD, major depressive disorder; MINI, Mini International Neuropsychiatric Interview; PHQ, Patient Health Questionnaire; SCAN, Schedules for Clinical Assessment in Neuropsychiatry; SCID, Structured Clinical Interview for DSM Disorders.
Inconsistency and bias analysis in Patient Health Questionnaire (PHQ) data in primary care
| Test | Sensitivity bias, % (95% CI) | Specificity bias, % (95% CI) | ROC bias, % (95% CI) |
|---|---|---|---|
| | | | |
| PHQ-9-linear | | | |
| PHQ-9-algorithm | | | |
| PHQ-2 | | | |
| | | | |
| PHQ-9-linear | | | |
| PHQ-2-linear | | | |
| PHQ-9-algorithm | | | |
| | | | |
| PHQ-2 | | | |
| PHQ-9-linear | | | |
| PHQ-9-algorithm | | | |
ROC, receiver operating characteristic.
Values in bold are significant.
Summary of Patient Health Questionnaire (PHQ) analysis in primary care
| Test | Sensitivity, % (95% CI) | Specificity, % (95% CI) | PPV | NPV | ROC | CUI+ | CUI− | LR+ | LR− |
|---|---|---|---|---|---|---|---|---|---|
| Main results | |||||||||
| PHQ-9-linear | 82.2 (74.3–88.9) | 84.7 (80–89) | 38.0% (36.1–39.9%) | 97.7% (97.3–98.0%) | 0.910 (0.892–0.930) | 0.312 (0.311–0.313) ‘very poor’ | 0.827 (0.826–0.828) ‘excellent’ | 5.37 (5.09–5.67) | 0.21 (0.19–0.24) |
| PHQ-9-algorithm | 58.4 (44.5–71.7) | 92.1 (85.9–96.6) | 47.4% (44.7–50.1%) | 94.8% (94.3–95.3%) | 0.733 (0.676–0.795) | 0.277 (0.276–0.278) ‘very poor’ | 0.873 (0.873–0.873) ‘excellent’ | 7.39 (6.77–8.07) | 0.45 (0.42–0.49) |
| PHQ-2 | 89.9 (83.4–94.9) | 72.6 (66.0–78.7) | 23.1% (21.5–24.7%) | 97.5% (97.0–97.9%) | 0.860 (0.819–0.903) | 0.188 (0.187–0.189) ‘very poor’ | 0.708 (0.707–0.709) ‘good’ | 2.97 (2.82–3.12) | 0.26 (0.22–0.30) |
| Head-to-head results | |||||||||
| PHQ-9-linear | 87.0 (75.8–95.1) | 87.2 (81.1–92.2) | 38.2% (35.1–41.2%) | 98.6% (98.3–98.9%) | 0.920 (0.915–0.924) | 0.322 (0.321–0.323) ‘very poor’ | 0.877 (0.987–0.878) ‘excellent’ | 7.66 (7.04–8.33) | 0.18 (0.14–0.22) |
| PHQ-2-linear | 91.5 (83.6–96.9) | 72.2 (64.0–79.8) | 22.7% (20.8–24.6%) | 99.0% (98.7–99.3%) | 0.900 (0.865–0.934) | 0.205 (0.205–0.206) ‘very poor’ | 0.742 (0.741–0.743) ‘good’ | 3.61 (3.42–3.81) | 0.13 (0.10–0.17) |
| PHQ-9-algorithm | 53.0 (36.4–69.3) | 95.7 (93.5–97.5) | 58.4% (54.1–62.8%) | 94.1% (93.5–94.7%) | 0.715 (0.628–0.815) | 0.273 (0.272–0.275) ‘very poor’ | 0.905 (0.904–0.906) ‘excellent’ | 12.35 (10.56–14.45) | 0.55 (0.51–0.50) |
| Moderator analysis | |||||||||
| PHQ-9-linear | 81.3 (71.6–89.3); 82.33 (72.0–89.4)* | 85.3 (81.0–89.1); 86.4 (81.2–90.4)* | 44.2% (41.9–46.6%) | 38.9 (36.8–41.0) | 97.5 (97.2–97.9) | 0.316 (0.315–0.317) ‘very poor’ | 0.832 (0.832–0.832) ‘excellent’ | 5.53 (5.21–5.87) | 0.22 (0.19–0.25) |
| PHQ-9-algorithm | 56.8 (41.2–71.8); 54.0 (40.0–67.5)* | 93.3 (87.5–97.3); 95.9 (94.0–97.3)* | 60.3% (57.0–63.6%) | 48.3% (45.4–51.3) | 95.1% (94.7–95.6) | 0.275 (0.274–0.276) ‘very poor’ | 0.887 (0.887–0.887) ‘excellent’ | 8.46 (7.67–9.33) | 0.46 (0.43–0.50) |
| PHQ-2 | 89.3 (81.5–95.1); 91.4 (81.9–96.2)* | 75.9 (70.1–81.3); 76.3 (69.6–82.0)* | 27.7% (25.8–29.6%) | 26.5% (24.6–28.3%) | 98.6% (98.3–99.0%) | 0.236 (0.235–0.237) ‘very poor’ | 0.749 (0.749–0.749) ‘good’ | 3.71 (3.52–3.90) | 0.14 (0.11–0.18) |
PPV, positive predictive value; NPV, negative predictive value; ROC, receiver operating characteristic; CUI+, positive clinical utility index; CUI−, negative clinical utility index; LR+, positive likelihood ratio; LR−, negative likelihood ratio.
*Alternative calculation based on bivariate calculation in Stata.
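The indices abbreviated in the footnote above are linked by standard formulas, which can be sketched as follows. This is a minimal illustration, not the authors' pooling method: the function name `accuracy_indices`, the class `AccuracyIndices` and the example inputs are hypothetical, and the clinical utility indices are taken as CUI+ = sensitivity × PPV and CUI− = specificity × NPV. The pooled values in the table were derived from study-level data, so they will not necessarily be reproduced by plugging summary estimates into these formulas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccuracyIndices:
    ppv: float      # positive predictive value
    npv: float      # negative predictive value
    lr_pos: float   # positive likelihood ratio (LR+)
    lr_neg: float   # negative likelihood ratio (LR-)
    cui_pos: float  # positive clinical utility index (CUI+)
    cui_neg: float  # negative clinical utility index (CUI-)


def accuracy_indices(sensitivity: float, specificity: float,
                     prevalence: float) -> AccuracyIndices:
    """Derive predictive values, likelihood ratios and CUIs.

    All three inputs are proportions strictly between 0 and 1.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1, got {value}")

    # Expected cell proportions of the 2x2 table at the given prevalence.
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    tn = specificity * (1.0 - prevalence)
    fp = (1.0 - specificity) * (1.0 - prevalence)

    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return AccuracyIndices(
        ppv=ppv,
        npv=npv,
        lr_pos=sensitivity / (1.0 - specificity),
        lr_neg=(1.0 - sensitivity) / specificity,
        cui_pos=sensitivity * ppv,
        cui_neg=specificity * npv,
    )


# Hypothetical inputs for illustration only (not taken from the table).
example = accuracy_indices(sensitivity=0.80, specificity=0.90, prevalence=0.10)
print(example)
```

As a rough consistency check on the CUI definition, multiplying the reported PHQ-9-linear main-result sensitivity by its PPV (0.822 × 0.380) gives about 0.312, which agrees with the CUI+ listed in that row.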
Fig. 2 Bayesian plot of conditional probabilities PHQ-9-linear v. PHQ-9-algorithm v. PHQ-2 (restricted to head-to-head studies).
Fig. 3 Bivariate plot of summary accuracy of PHQ-9-linear v. PHQ-9-algorithm v. PHQ-2 (restricted to high-quality studies).
Fig. 4 Bayesian plot of conditional probabilities PHQ-9-linear v. PHQ-9-algorithm v. PHQ-2.
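A Bayesian plot of conditional probabilities presumably shows how the probability of depression after a positive or negative test varies with the pre-test probability (prevalence). The sketch below shows one way such curves could be computed from likelihood ratios using the odds form of Bayes' theorem. The function name `post_test_probability` and the prevalence grid are hypothetical; the likelihood ratios are the PHQ-9-linear main results reported in the summary table (LR+ 5.37, LR− 0.21).

```python
def post_test_probability(pre_test_probability: float,
                          likelihood_ratio: float) -> float:
    """Post-test probability via the odds form of Bayes' theorem."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre_test_probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Likelihood ratios for PHQ-9-linear, main results of the summary table.
LR_POS, LR_NEG = 5.37, 0.21

# Hypothetical grid of pre-test probabilities (not from the paper).
PREVALENCES = [0.05, 0.10, 0.20, 0.30, 0.50]

# Each entry: (prevalence, probability after a positive test,
#              probability after a negative test).
curve = [
    (p, post_test_probability(p, LR_POS), post_test_probability(p, LR_NEG))
    for p in PREVALENCES
]

for prevalence, after_positive, after_negative in curve:
    print(f"{prevalence:.2f}  {after_positive:.3f}  {after_negative:.3f}")
```

Plotting the second and third values against the first gives one curve per test result; the further a curve sits from the diagonal, the more the result shifts the probability of MDD.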
PHQ cut-off threshold analysis in primary care
| Test | Sensitivity, % (95% CI) | Specificity, % (95% CI) | PPV | NPV | CUI+ | CUI− | LR+ | LR− |
|---|---|---|---|---|---|---|---|---|
| PHQ-2 | | | | | | | | |
| Cut PHQ-2 ≥1 | 96.05 (92.29–98.60) | 52.18 (43.42–60.8) | 14.7 (13.6–15.9) | 99.4 (99.1–99.6) | 0.141 (0.140–0.142) | 0.519 (0.519–0.519) | 2.01 (1.95–2.07) | 0.07 (0.05–0.11) |
| Cut PHQ-2 ≥2 | 92.20 (85.21–97.10) | 70.98 (64.63–76.94) | 22.8 (21.2–24.5) | 99.0 (98.7–99.3) | 0.211 (0.210–0.212) | 0.703 (0.703–0.703) | 3.18 (3.04–3.32) | 0.11 (0.08–0.14) |
| Cut PHQ-2 ≥3 | 76.22 (61.1–88.53) | 88.66 (85.01–91.86) | 38.6 (35.9–41.3) | 97.6 (97.2–97.9) | 0.294 (0.293–0.295) | 0.865 (0.865–0.865) | 6.74 (6.23–7.30) | 0.27 (0.23–0.31) |
| Cut PHQ-2 ≥4 | 61.46 (44.00–77.52) | 94.14 (91.73–96.15) | 53.4 (49.7–57.2) | 95.7 (95.2–96.3) | 0.329 (0.328–0.330) | 0.901 (0.901–0.901) | 10.45 (9.22–11.85) | 0.41 (0.37–0.45) |
| Cut PHQ-2 ≥5 | 47.33 (26.78–68.37) | 97.60 (95.36–99.12) | 72.8 (67.2–78.4) | 93.2 (92.2–94.1) | 0.344 (0.241–0.346) | 0.909 (0.908–0.910) | 19.77 (15.23–25.68) | 0.54 (0.49–0.60) |
| Cut PHQ-2 ≥6 | 51.80 (23.07–79.88) | 98.63 (96.91–99.66) | 83.3 (78.5–88.2) | 93.5 (92.6–94.4) | 0.421 (0.419–0.424)† | 0.922 (0.921–0.923)* | | 0.50 (0.45–0.56) |
| PHQ-9 | | | | | | | | |
| Cut PHQ-9 ≥6 | 89.81 (81.91–95.63) | 62.79 (51.02–73.84) | 28.9 (26.7–31.1) | 97.3 (96.6–98.0) | 0.259 (0.258–0.250) | 0.611 (0.611–0.611) | 2.41 (2.28–2.55) | 0.16 (0.13–0.21) |
| Cut PHQ-9 ≥7 | 84.69 (74.32–92.75) | 69.17 (57.72–79.53) | 31.6 (29.2–34.1) | 96.4 (95.6–97.2) | 0.268 (0.267–0.269) | 0.667 (0.666–0.668) | 2.75 (2.57–2.93) | 0.22 (0.18–0.27) |
| Cut PHQ-9 ≥8 | 80.25 (71.00–88.09) | 76.54 (69.59–82.84) | 29.4 (27.4–31.5) | 96.9 (96.4–97.5) | 0.236 (0.235–0.237) | 0.742 (0.741–0.743) | 3.41 (3.21–3.63) | 0.26 (0.22–0.30) |
| Cut PHQ-9 ≥9 | 81.31 (69.69–90.64) | 79.82 (72.51–86.26) | 32.3 (30.0–34.7) | 97.3 (96.8–97.8) | 0.263 (0.262–0.264) | 0.777 (0.776–0.778) | 4.03 (3.77–4.31) | 0.23 (0.20–0.28) |
| Cut PHQ-9 ≥10 | 81.3 (71.6–89.3) | 85.3 (81.0–89.1) | 44.2 (41.9–46.6) | 97.0 (96.7–97.4) | 0.333 (0.337–0.339) | 0.863 (0.863–0.863) | 6.90 (6.44–7.40) | 0.27 (0.24–0.30) |
| Cut PHQ-9 ≥11 | 75.40 (60.77–87.52) | 87.86 (82.77–92.17) | 44.6 (41.8–47.4) | 96.5 (96.0–97.0) | 0.336 (0.335–0.337) | 0.848 (0.848–0.848) | 6.23 (5.74–6.76) | 0.28 (0.25–0.32) |
| Cut PHQ-9 ≥12 | 68.37 (54.71–80.58) | 90.88 (87.54–93.73) | 49.1 (46.1–52.0) | 95.7 (95.2–96.2) | 0.336 (0.335–0.336) | 0.870 (0.870–0.870) | 7.51 (6.85–8.23) | 0.35 (0.31–0.39) |
| Cut PHQ-9 ≥13 | 69.92 (58.39–80.30) | 92.93 (89.33–95.83) | 60.2 (55.5–64.9) | 95.3 (94.4–96.1) | 0.421 (0.419–0.423)† | | 9.84 (8.38–11.55) | 0.32 (0.28–0.38) |
| Cut PHQ-9 ≥14 | 56.04 (42.88–68.77) | 96.57 (94.48–98.18) | 73.4 (67.1–79.8) | 92.9 (91.6–94.2) | 0.411 (0.408–0.415) | 0.898 (0.897–0.898)* | 16.5 (12.26–22.21) | 0.46 (0.40–0.53) |
PPV, positive predictive value; NPV, negative predictive value; ROC, receiver operating characteristic; CUI+, positive clinical utility index; CUI−, negative clinical utility index; LR+, positive likelihood ratio; LR−, negative likelihood ratio.
*Optimal cut-off for ruling out those without depression (screening).
†Optimal cut-off for ruling in those with depression (case-finding).
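One plausible reading of the two footnotes is that the optimal screening cut-off is the one with the highest CUI− and the optimal case-finding cut-off is the one with the highest CUI+. The sketch below applies that rule to the PHQ-9 rows of the cut-off table. The selection rule and the helper `best_cutoff` are assumptions for illustration, not a procedure stated in the source; values are transcribed as printed, and `None` marks the CUI− value that is missing for the ≥13 row.

```python
from typing import List, Optional, Tuple

# (cut-off, CUI+, CUI-) transcribed from the PHQ-9 rows of the cut-off table.
# None marks a value that is missing from the source.
Row = Tuple[int, Optional[float], Optional[float]]

PHQ9_CUI: List[Row] = [
    (6, 0.259, 0.611),
    (7, 0.268, 0.667),
    (8, 0.236, 0.742),
    (9, 0.263, 0.777),
    (10, 0.333, 0.863),
    (11, 0.336, 0.848),
    (12, 0.336, 0.870),
    (13, 0.421, None),
    (14, 0.411, 0.898),
]

CUI_POS, CUI_NEG = 1, 2  # column positions within each row


def best_cutoff(rows: List[Row], column: int) -> int:
    """Return the cut-off whose value in `column` is largest.

    Rows with a missing value are skipped; ties go to the higher cut-off.
    """
    candidates = [(row[column], row[0]) for row in rows if row[column] is not None]
    if not candidates:
        raise ValueError("no usable values in the requested column")
    _, cutoff = max(candidates)
    return cutoff


case_finding_cutoff = best_cutoff(PHQ9_CUI, CUI_POS)  # highest CUI+
screening_cutoff = best_cutoff(PHQ9_CUI, CUI_NEG)     # highest CUI-
print(case_finding_cutoff, screening_cutoff)
```

Under this assumed rule the selected PHQ-9 thresholds are ≥13 for case-finding and ≥14 for screening, the same rows that carry footnote markers in the table. The screening result depends on the ≥13 row having no recoverable CUI−.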