Accuracy of the Edinburgh Postnatal Depression Scale (EPDS) for screening to detect major depression among pregnant and postpartum women: systematic review and meta-analysis of individual participant data.

Brooke Levis^1,2,3, Zelalem Negeri^1,2, Ying Sun^1, Andrea Benedetti^2,4,5, Brett D Thombs^6,2,5,7,8,9,10.

Abstract

OBJECTIVE: To evaluate the Edinburgh Postnatal Depression Scale (EPDS) for screening to detect major depression in pregnant and postpartum women.
DESIGN: Individual participant data meta-analysis.
DATA SOURCES: Medline, Medline In-Process and Other Non-Indexed Citations, PsycINFO, and Web of Science (from inception to 3 October 2018).
ELIGIBILITY CRITERIA FOR SELECTING STUDIES: Eligible datasets included EPDS scores and major depression classification based on validated diagnostic interviews. Bivariate random effects meta-analysis was used to estimate EPDS sensitivity and specificity compared with semi-structured, fully structured (Mini International Neuropsychiatric Interview (MINI) excluded), and MINI diagnostic interviews separately using individual participant data. One stage meta-regression was used to examine accuracy by reference standard categories and participant characteristics.
RESULTS: Individual participant data were obtained from 58 of 83 eligible studies (70%; 15 557 of 22 788 eligible participants (68%), 2069 with major depression). Combined sensitivity and specificity was maximised at a cut-off value of 11 or higher across reference standards. Among studies with a semi-structured interview (36 studies, 9066 participants, 1330 with major depression), sensitivity and specificity were 0.85 (95% confidence interval 0.79 to 0.90) and 0.84 (0.79 to 0.88) for a cut-off value of 10 or higher, 0.81 (0.75 to 0.87) and 0.88 (0.85 to 0.91) for a cut-off value of 11 or higher, and 0.66 (0.58 to 0.74) and 0.95 (0.92 to 0.96) for a cut-off value of 13 or higher, respectively. Accuracy was similar across reference standards and subgroups, including for pregnant and postpartum women.
CONCLUSIONS: An EPDS cut-off value of 11 or higher maximised combined sensitivity and specificity; a cut-off value of 13 or higher was less sensitive but more specific. To identify pregnant and postpartum women with higher symptom levels, a cut-off of 13 or higher could be used. Lower cut-off values could be used if the intention is to avoid false negatives and identify most patients who meet diagnostic criteria.
REGISTRATION: PROSPERO (CRD42015024785).
© Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.

Year: 2020 PMID: 33177069 PMCID: PMC7656313 DOI: 10.1136/bmj.m4022

Source DB: PubMed Journal: BMJ ISSN: 0959-8138

Introduction

Depression is common in pregnant and postpartum women and is associated with adverse outcomes for the mother, developing child, mother-infant relationship, and intimate partner relationship.1 2 Depression screening could potentially improve detection and management of perinatal depression. Depression screening involves the use of self-report depression symptom questionnaires to identify women above a preidentified cut-off value for further evaluation to determine whether depression is present.3 4 In the United Kingdom, the National Institute for Health and Care Excellence guidelines5 suggest that healthcare providers consider asking pregnant or postpartum women the two Whooley questions,6 and administering the Edinburgh Postnatal Depression Scale (EPDS) or Patient Health Questionnaire-9 screening questionnaires as part of a full assessment if depression is suspected. The guidelines do not recommend administering a screening tool to all women. The UK National Screening Committee7 and Canadian Task Force on Preventive Health Care8 recommend against screening owing to concerns about false positives, possible harms, and the lack of evidence from well conducted trials that screening improves mental health outcomes. 
However, the United States Preventive Services Task Force (USPSTF)9 and Australian national guidelines10 recommend depression screening in pregnant and postpartum women, although the USPSTF notes that “screening should be implemented with adequate systems in place to ensure accurate diagnosis, effective treatment, and appropriate follow-up.” Depression screening is sometimes promoted in low and middle income countries, but it is not known whether it would improve mental health in those settings.2 The 10 item EPDS is the most commonly used depression screening tool in perinatal care; cut-off values of 10 or higher and 13 or higher are most often used to identify women who might have depression.11 12 13 14 15 The USPSTF recommends screening pregnant and postpartum women with the EPDS, but does not specify a cut-off value.9 The systematic review conducted to support the USPSTF guideline reported the range of accuracy estimates for EPDS cut-off values of 10 or higher (14 studies) and 13 or higher (17 studies) in 23 primary studies, but did not include a meta-analysis.14 15 An existing meta-analysis that has examined EPDS screening accuracy searched databases through February 2007 and found that combined sensitivity and specificity to detect major depression in postpartum women was greater for a cut-off value of 12 or higher (sensitivity 0.86, specificity 0.87, 15 studies) than for a cut-off value of 10 or higher (sensitivity 0.92, specificity 0.77, 14 studies) or 13 or higher (sensitivity 0.79, specificity 0.89, 18 studies) among a total of 18 studies.13 The results were not pooled for pregnant women because there were too few studies, and no subgroup analyses were conducted among postpartum women because primary studies did not report the necessary data. 
Estimates were not done separately for different types of reference standards, although important differences exist in design and structure, and in the likelihood of major depression classification between different diagnostic interviews.16 17 18 Therefore, the optimal cut-off value for screening remains unknown, and whether different cut-off values are needed for women with different characteristics needs to be determined. Whereas conventional meta-analyses synthesise aggregate results from study reports, individual participant data meta-analysis (IPDMA) involves the synthesis of participant level data from primary studies.19 The advantages of an IPDMA of the EPDS are the ability to include data from studies that collected EPDS and reference standard outcomes but did not publish accuracy results; the results for all cut-off values from all included studies can be taken into account rather than just published cut-off results; subgroup analyses can be conducted, which were not done in primary studies; and accuracy results can be reported separately for different reference standards. Our objectives were to use IPDMA to evaluate EPDS screening accuracy among studies that used different types of reference standards separately, with semi-structured interviews prioritised; and to investigate whether EPDS screening accuracy differs based on pregnant versus postpartum status, age, and country human development index.

Methods

This IPDMA was registered in PROSPERO (CRD42015024785), a protocol was published,20 and the results were reported following PRISMA-DTA (preferred reporting items for a systematic review and meta-analysis of diagnostic test accuracy studies)21 and PRISMA-IPD (preferred reporting items for systematic review and meta-analyses of individual participant data)22 guidelines. We followed similar methods to those used in our previously published Patient Health Questionnaire-9 diagnostic accuracy IPDMA.23 Individual prediction models described in the protocol will be developed in future database versions. Deviations from the protocol include searching from database inception rather than from year 2000, including only one assessment time point for each woman given the small number of studies with multiple time points, and reporting results for cut-off values of 7-15 rather than 9-15.

Study eligibility

Datasets from studies that met the following criteria were deemed eligible: they administered the EPDS; diagnostic classification for current major depressive disorder or major depressive episode used Diagnostic and Statistical Manual of Mental Disorders (DSM)24 25 26 or international classification of diseases (ICD)27 criteria based on a validated semi-structured or fully structured interview; the EPDS and diagnostic interview were conducted no more than two weeks apart; participants were adult women aged at least 18 years who completed assessments during pregnancy or within 12 months of giving birth; and participants were not recruited because they were receiving psychiatric assessment or care, or because they were identified as having possible depression, since screening seeks to identify women with otherwise unrecognised major depression.28 Studies in which some participants did not meet eligibility criteria were included in the IPDMA if primary data allowed for the selection of eligible participants.

Database searches and study selection

A medical librarian designed a peer reviewed29 search strategy (eMethods1 in supplementary material) and searched Medline, Medline In-Process and Other Non-Indexed Citations, and PsycINFO through OvidSP, and Web of Science through ISI Web of Knowledge from inception to 3 October 2018. Additionally, investigators examined citations from relevant reviews and requested information about unpublished studies from authors who contributed studies. Citations identified by the search were uploaded into RefWorks (RefWorks-COS, Bethesda, MD, USA). Duplicates were removed and unique citations were uploaded into DistillerSR (Evidence Partners, Ottawa, Canada). Two reviewers independently reviewed titles and abstracts. For publications deemed potentially eligible by either reviewer, a full text review was done by two reviewers, also independently. Any conflicts were resolved by consensus and a third reviewer was consulted if necessary.

Data contribution, extraction, and synthesis

We invited investigators with eligible datasets to contribute deidentified versions of their datasets. We attempted to contact corresponding authors of eligible primary studies by email up to three times, as necessary. When authors did not respond to our emails, we tried to contact them by phone and emailed their coauthors. Two investigators independently extracted information on the diagnostic interview administered and the country of study from the published reports. We used the United Nations' human development index, based on year of study publication, which reflects life expectancy, education, and income,30 to categorise countries as very high, high, or low-medium development. Participant level data in the synthesised dataset included age, pregnant or postpartum status, EPDS scores, and major depression classification status. We used major depressive disorder or major depressive episode based on the DSM or ICD criteria to classify major depression; if both were reported, we used major depressive episode because screening attempts to detect episodes of depression. Additional assessment would be needed to determine if episodes are related to major depressive disorder or another psychiatric disorder (bipolar disorder, persistent depressive disorder). We also prioritised DSM over ICD. We used statistical weights to reflect sampling procedures if provided in the datasets; for instance, when primary studies administered a diagnostic interview to all participants with positive screening results but only a random sample of those with negative results. Some studies used sampling procedures that merited weights but did not use weights. For those studies, we used inverse selection probabilities to generate appropriate weights. We verified that participant characteristics and accuracy results from individual datasets matched those that had been published. 
If any discrepancies were found, we worked with the primary study investigators to understand and resolve differences. All study level and individual level participant data were transformed into a standardised format and combined in a single synthesised dataset. For nine studies that collected data at multiple time points (four with two time points, four with three time points, and one with four time points), we selected the time point with the most participants. If the number of participants was maximised at multiple time points, we selected the one with the most women who had major depression.
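The inverse selection probability weighting described above can be illustrated with a minimal sketch. It assumes a hypothetical study in which every woman with a positive screen was interviewed and only a random 25% of those with a negative screen; the function names, the toy data, and the 25% sampling fraction are illustrative assumptions and are not taken from any included dataset.

```python
from typing import List


def inverse_probability_weights(screen_positive: List[bool],
                                p_interview_pos: float,
                                p_interview_neg: float) -> List[float]:
    """Weight each interviewed participant by 1 / P(selected for interview)."""
    if not (0 < p_interview_pos <= 1 and 0 < p_interview_neg <= 1):
        raise ValueError("selection probabilities must lie in (0, 1]")
    return [1.0 / (p_interview_pos if pos else p_interview_neg)
            for pos in screen_positive]


def weighted_specificity(screen_positive: List[bool],
                         has_md: List[bool],
                         weights: List[float]) -> float:
    """Weighted share of participants without major depression who screened negative."""
    true_neg = sum(w for s, d, w in zip(screen_positive, has_md, weights)
                   if not d and not s)
    non_cases = sum(w for d, w in zip(has_md, weights) if not d)
    return true_neg / non_cases


# Hypothetical interviewed sample (assumption): two screen positives, two screen negatives.
screen_positive = [True, True, False, False]
has_md = [True, False, False, False]

# All positives interviewed (probability 1), negatives sampled with probability 0.25.
weights = inverse_probability_weights(screen_positive, 1.0, 0.25)
spec_weighted = weighted_specificity(screen_positive, has_md, weights)
spec_unweighted = weighted_specificity(screen_positive, has_md, [1.0] * 4)
```

In this toy sample each interviewed screen negative stands in for four women, so the weighted specificity (8/9) is higher than the unweighted value (2/3), which would otherwise be biased downwards by the oversampling of screen positives.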

Risk of bias assessment

We used the Quality Assessment of Diagnostic Accuracy Studies-2 tool (QUADAS-2; eMethods2)31 to assess risk of bias of included studies. Two investigators independently performed this assessment and any differences were resolved through consensus or by involving a third investigator if necessary. Values used in the risk of bias assessment were coded at both study and participant levels because some values might have differed among participants from the same study (eg, time interval between index test and reference standard).

Statistical analyses

We estimated sensitivity and specificity for three reference standard categories separately across cut-off values of 7-15 for all women. Reference standard categories included semi-structured interviews (Structured Clinical Interview for DSM Disorders (SCID),32 Clinical Interview Schedule,33 Diagnostic Interview for Genetic Studies34), fully structured interviews excluding the Mini International Neuropsychiatric Interview (MINI)35 36 (Composite International Diagnostic Interview (CIDI),37 Clinical Interview Schedule-Revised38), and the MINI. We analysed studies that used different types of reference standards separately because we previously found that, controlling for depressive symptom levels, the MINI might classify depression more often than other diagnostic interviews, and the CIDI might classify more participants with low level symptoms as having depression but fewer with high level symptoms.16 17 18 These findings are consistent with the design of the different types of diagnostic interviews. Semi-structured interviews are designed to be administered by an experienced diagnostician who can incorporate probes and queries, and use clinical judgment. Fully structured interviews are entirely scripted so that they can be administered by a trained lay interviewer and reduce required resources. By design, fully structured interviews are intended to increase standardisation, but this could be at the cost of reduced validity.39 40 41 42 The MINI is a brief version of a fully structured interview that was designed for rapid administration and tends to be overinclusive.36 We fit bivariate random effects models using Gauss-Hermite quadrature for each reference standard category, for cut-off values of 7-15 separately.43 This is a two stage meta-analytic approach that models sensitivity and specificity simultaneously and accounts for the correlation between them, and for precision of estimates within studies (that is, the clustering). 
This model provided estimates of pooled sensitivity and specificity for each analysis. We found four studies for the fully structured subgroup, one of which included only one participant with major depression. For this subgroup, we modified the bivariate model by setting the correlation between random effects to zero, and excluded the major depression case from the study that had only one major depression case. Therefore, we used three studies to evaluate sensitivity and four studies to measure specificity. We constructed empirical receiver operating characteristic curves based on pooled sensitivity and specificity estimates, and calculated area under the curves for each reference standard category. Additionally, we conducted one stage meta-regressions with interactions between reference standard category (reference category: semi-structured) and accuracy coefficients (logit(sensitivity) and logit(1−specificity)). We generated nomograms to present positive and negative predictive values for the optimal cut-off value (maximising Youden’s J=sensitivity+specificity−1) and the commonly used cut-off values of 10 or higher and 13 or higher for assumed major depression prevalence of 5-25%. We evaluated heterogeneity for each reference standard category by generating sensitivity and specificity forest plots for each study for the optimal cut-off value and for cut-off values of 10 or higher and 13 or higher. 
We quantified heterogeneity by reporting estimated variances of the random effects for sensitivity and specificity (τ²) and by estimating R, which is the ratio of the estimated standard deviation of the pooled sensitivity (or specificity) from the random effects model to that from the corresponding fixed effects model.44 Within semi-structured and MINI reference standard categories separately, we fit one stage meta-regressions where we interacted all participant characteristics (age (measured continuously), pregnant v postpartum status (reference category=pregnant), and country human development index (reference category=very high)) with logit(sensitivity) and logit(1−specificity). Too few studies existed that used other fully structured interviews to enable us to perform meta-regressions. We conducted post hoc analyses in which we fit additional one stage meta-regressions for year of study publication. No participants had missing data for any covariates in the meta-regressions. We assessed characteristics one at a time because models attempting to fit all participant characteristics simultaneously did not converge. When characteristics were significantly associated with sensitivity or specificity for all or most cut-off values in the meta-regressions, we fit bivariate random effects models for cut-off values of 7-15 for each subgroup. Age was fit continuously in the meta-regression but was dichotomised (<25 v ≥25 years45) to estimate accuracy by subgroups. For analyses in the age less than 25 subgroup, we excluded four semi-structured studies and four MINI studies that did not have any participants with major depression because the bivariate random effects model could not be applied. Therefore, 21 participants (1%) younger than 25 were excluded from semi-structured studies, and 77 (9%) from MINI studies. In sensitivity analyses, we conducted additional meta-regressions based on QUADAS-2 scores in semi-structured and MINI reference standard categories separately. 
We interacted QUADAS-2 domain scores with logit(sensitivity) and logit(1−specificity) for all domain scores with at least 100 participants with major depression and 100 without major depression among those categorised as having low risk of bias and among those with high or unclear risk of bias. We again assessed items one at a time. We performed additional sensitivity analyses for EPDS cut-off values of 10-13 by combining IPDMA accuracy results with results from studies that did not contribute individual participant data but published eligible accuracy results. All analyses were run in R (R version 3.4.1, R Studio version 1.0.143) by using the glmer function within the lme4 package.
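Because individual participant data give access to every EPDS score, sensitivity and specificity can be computed at each cut-off value of 7-15 within a study, whether or not that cut-off was published. The sketch below shows only this per-study step on hypothetical toy data; it is not the pooled bivariate random effects model, which the analyses fit with glmer in R. The function name and the data are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def accuracy_by_cutoff(scores: List[int],
                       has_md: List[bool],
                       cutoffs: Iterable[int] = range(7, 16)
                       ) -> Dict[int, Tuple[float, float]]:
    """Return {cut-off: (sensitivity, specificity)} for a single study.

    A screen is positive when the EPDS score is at or above the cut-off.
    """
    cases = [s for s, d in zip(scores, has_md) if d]
    non_cases = [s for s, d in zip(scores, has_md) if not d]
    if not cases or not non_cases:
        # Sensitivity (or specificity) is undefined without cases (or non-cases).
        raise ValueError("need participants with and without major depression")
    result = {}
    for c in cutoffs:
        sensitivity = sum(s >= c for s in cases) / len(cases)
        specificity = sum(s < c for s in non_cases) / len(non_cases)
        result[c] = (sensitivity, specificity)
    return result


# Hypothetical toy study (assumption): EPDS scores and interview classification.
toy_scores = [3, 8, 9, 12, 14, 16, 5, 11, 13, 20]
toy_has_md = [False, False, False, False, True, True, False, True, False, True]
toy_accuracy = accuracy_by_cutoff(toy_scores, toy_has_md)
```

The explicit error for a study with no cases mirrors the reason some studies had to be left out of the younger than 25 subgroup analyses: sensitivity cannot be estimated when no participant has major depression.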

Patient and public involvement

No patients were involved in setting the research question or the outcome measures, nor were they involved in developing plans for design or implementation of the study. No patients were asked to advise on interpretation or writing up of results. There are no plans to disseminate the results of the research to study participants or the relevant patient community. However, an online knowledge translation tool, intended for clinicians (the end users of the EPDS screening tool), is available at depressionscreening100.com/epds. The tool allows clinicians to estimate the expected number of positive screens and true and false screening outcomes based on study results.

Results

Search results and dataset inclusion

We identified 4434 unique titles and abstracts from the database search. Of these, 4056 were excluded after title and abstract review and 257 after full text review (eTable1), resulting in 121 eligible articles from 81 unique participant samples. Of these, 56 (69%) contributed datasets (fig 1). Authors of included studies contributed data from two other studies that the search did not retrieve for a total of 58 datasets (15 557 participants, 2069 with major depression). eTable2 shows characteristics of primary studies that contributed data and eligible studies that did not provide datasets. Of 22 788 participants in 83 eligible published studies, 15 557 (68%) were included. Eligible studies that did and did not contribute data were generally similar in terms of sample size, proportion of participants with major depression (excluding non-contributing studies where number with major depression was not reported), and country human development index. The proportion of studies with pregnant women only was also similar for contributing and non-contributing studies. Among both contributing and non-contributing studies, most studies used semi-structured interviews as the reference standard, followed by the MINI, and other fully structured interviews.

Fig 1

Flow diagram of study selection process. EPDS=Edinburgh Postnatal Depression Scale

Of 58 included studies, 25 included pregnant women, 30 postpartum women, and three both pregnant and postpartum women. Thirty six studies used semi-structured reference standards, including 34 that used the SCID; four used fully structured reference standards (MINI excluded), including three that used the CIDI; and 18 used the MINI (table 1).

Table 1

Participant data by diagnostic interview

Diagnostic interview	No of studies	No of participants	No of participants with major depression (%)
Semi-structured
SCID	34	8811	1292 (15)
CIS	1	190	34 (18)
DIGS	1	65	4 (6)
Fully structured
CIDI	3	2963	196 (7)
CIS-R	1	226	32 (14)
MINI	18	3302	511 (15)
Total	58	15 557	2069 (13)

CIDI=Composite International Diagnostic Interview; CIS=Clinical Interview Schedule; CIS-R=Clinical Interview Schedule-Revised; DIGS=Diagnostic Interview for Genetic Studies; DSM=Diagnostic and Statistical Manual of Mental Disorders; MINI=Mini International Neuropsychiatric Interview; SCID=Structured Clinical Interview for DSM Disorders.


EPDS sensitivity and specificity by reference standard category

Table 2 shows sensitivity and specificity estimates for cut-off values of 7-15 by reference standard category. Combined sensitivity and specificity was maximised at a cut-off value of 11 or higher for semi-structured interviews (Youden’s J=0.70), fully structured interviews (Youden’s J=0.73), and the MINI (Youden’s J=0.66). For semi-structured interviews, sensitivity and specificity were 0.85 (95% confidence interval 0.79 to 0.90) and 0.84 (0.79 to 0.88) for a cut-off value of 10 or higher, 0.81 (0.75 to 0.87) and 0.88 (0.85 to 0.91) for a cut-off value of 11 or higher, and 0.66 (0.58 to 0.74) and 0.95 (0.92 to 0.96) for a cut-off value of 13 or higher. eFigure1 shows receiver operating characteristic curves and area under the curve values. No significant differences in accuracy by reference standard category were found that held across all cut-off values (eTable3). Results did not change substantively in sensitivity analyses that included published results from eight of the 26 studies that did not contribute individual participant data but published eligible accuracy results (eTable4). The other 18 eligible datasets that did not contribute individual participant data did not publish eligible diagnostic accuracy results (eTable2b).

Table 2

Comparison of sensitivity and specificity estimates for each reference standard category

Cut-off value	Semi-structured reference standard*		Fully structured reference standard†		MINI reference standard‡
Cut-off value	Sensitivity (95% CI)	Specificity (95% CI)	Sensitivity (95% CI)	Specificity (95% CI)	Sensitivity (95% CI)	Specificity (95% CI)
7	0.95 (0.91 to 0.97)	0.65 (0.58 to 0.71)	0.95 (0.71 to 0.99)	0.57 (0.36 to 0.76)	0.95 (0.89 to 0.98)	0.60 (0.52 to 0.67)
8	0.92 (0.87 to 0.95)	0.72 (0.66 to 0.78)	0.95 (0.70 to 0.99)	0.62 (0.41 to 0.80)	0.91 (0.85 to 0.95)	0.67 (0.60 to 0.74)
9	0.89 (0.83 to 0.93)	0.78 (0.73 to 0.83)	0.95 (0.64 to 1.00)	0.71 (0.50 to 0.85)	0.88 (0.80 to 0.93)	0.74 (0.66 to 0.80)
10	0.85 (0.79 to 0.90)	0.84 (0.79 to 0.88)	0.93 (0.64 to 0.99)	0.78 (0.57 to 0.90)	0.84 (0.74 to 0.91)	0.79 (0.73 to 0.84)
11	0.81 (0.75 to 0.87)	0.88 (0.85 to 0.91)	0.90 (0.58 to 0.98)	0.83 (0.62 to 0.94)	0.82 (0.71 to 0.89)	0.84 (0.79 to 0.89)
12	0.75 (0.67 to 0.81)	0.92 (0.89 to 0.94)	0.81 (0.56 to 0.94)	0.86 (0.70 to 0.94)	0.74 (0.60 to 0.85)	0.89 (0.83 to 0.92)
13	0.66 (0.58 to 0.74)	0.95 (0.92 to 0.96)	0.79 (0.50 to 0.94)	0.90 (0.75 to 0.96)	0.69 (0.54 to 0.81)	0.91 (0.87 to 0.94)
14	0.58 (0.50 to 0.66)	0.96 (0.95 to 0.98)	0.77 (0.43 to 0.94)	0.93 (0.82 to 0.98)	0.60 (0.45 to 0.73)	0.94 (0.91 to 0.96)
15	0.51 (0.44 to 0.58)	0.97 (0.96 to 0.98)	0.66 (0.37 to 0.87)	0.95 (0.86 to 0.99)	0.52 (0.39 to 0.64)	0.95 (0.92 to 0.97)

MINI=Mini International Neuropsychiatric Interview.

* Number of studies=36; number of participants=9066; number of participants with major depression=1330.

† Number of studies=3 for sensitivity and 4 for specificity; number of participants=3188; number of participants with major depression=227. We modified the bivariate model by setting the correlation between random effects to zero and excluded the participant with major depression from the study that had only one participant with major depression.

‡ Number of studies=18; number of participants=3302; number of participants with major depression=511.
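As a worked illustration of how the optimal cut-off value is identified, the sketch below applies Youden's J (sensitivity+specificity−1) to the rounded MINI estimates from table 2. The variable and function names are illustrative assumptions, and because the inputs are rounded to two decimals the result is only an approximation of what the unrounded model estimates would give.

```python
from typing import Dict, Tuple

# Rounded (sensitivity, specificity) pairs for the MINI reference standard, table 2.
mini_estimates: Dict[int, Tuple[float, float]] = {
    7: (0.95, 0.60), 8: (0.91, 0.67), 9: (0.88, 0.74),
    10: (0.84, 0.79), 11: (0.82, 0.84), 12: (0.74, 0.89),
    13: (0.69, 0.91), 14: (0.60, 0.94), 15: (0.52, 0.95),
}


def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


def best_cutoff(estimates: Dict[int, Tuple[float, float]]) -> int:
    """Cut-off value with the largest Youden's J."""
    return max(estimates, key=lambda c: youden_j(*estimates[c]))


mini_best = best_cutoff(mini_estimates)
mini_best_j = youden_j(*mini_estimates[mini_best])
```

For the MINI column this selects a cut-off value of 11 with J of about 0.66, in line with the text. With the rounded semi-structured estimates, cut-off values of 10 and 11 both give J of about 0.69, so the reported value of 0.70 and the choice of 11 presumably rest on unrounded estimates.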

Nomograms of positive and negative predictive values by reference standard category are shown in figure 2 (cut-off values of ≥11 and ≥13) and eFigure2 (cut-off value of ≥10). For major depression prevalence values of 5-25% and a cut-off value of 11 or higher compared with semi-structured interviews, positive predictive values ranged from 26% to 69% and negative predictive values ranged from 93% to 99%. Ranges were similar for other reference standard types.
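The predictive value ranges quoted above follow from Bayes' rule applied to the pooled sensitivity and specificity. The sketch below reproduces them from the rounded semi-structured estimates for a cut-off value of 11 or higher (sensitivity 0.81, specificity 0.88); the function name is an illustrative assumption, and this is not the code used to draw the nomograms.

```python
from typing import Tuple


def predictive_values(sensitivity: float,
                      specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (positive predictive value, negative predictive value)."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Cut-off value of 11 or higher, semi-structured reference standard (table 2).
ppv_low, npv_low = predictive_values(0.81, 0.88, 0.05)    # prevalence 5%
ppv_high, npv_high = predictive_values(0.81, 0.88, 0.25)  # prevalence 25%
```

At 5% prevalence this gives a positive predictive value of about 26% and a negative predictive value of about 99%; at 25% prevalence the values are about 69% and 93%, matching the reported ranges.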

Fig 2

Nomograms of positive and negative predictive values by reference standard category (semi-structured diagnostic interviews, fully structured diagnostic interviews, and the MINI) for major depression prevalence values of 5-25%. Upper left panel: EPDS cut-off value of 11 or higher positive predictive value; upper right panel: EPDS cut-off value of 11 or higher negative predictive value; lower left panel: EPDS cut-off value of 13 or higher positive predictive value; lower right panel: EPDS cut-off value of 13 or higher negative predictive value. EPDS=Edinburgh Postnatal Depression Scale; MINI=Mini International Neuropsychiatric Interview

EPDS accuracy among subgroups

Older age (measured continuously) was associated with higher specificity for both the semi-structured and MINI reference standard categories for cut-off values of 9-15 (eTable3). However, based on bivariate random effects models among participants younger than 25 and among those aged 25 or older, specificity estimates were similar across age groups (median difference across cut-off values of 7-15: 0.02 for semi-structured studies, 0.03 for MINI studies; eTable6). No other study or participant characteristics were consistently associated with differences in sensitivity or specificity estimates across both reference standard categories.

Risk of bias sensitivity analyses

eTable7 shows QUADAS-2 ratings for included studies. No QUADAS-2 domain items were consistently associated with differences in sensitivity or specificity estimates across semi-structured and MINI reference standard categories (eTable3).

Discussion

Principal findings

Our main finding was that combined sensitivity and specificity was maximised at a cut-off value of 11 or higher across reference standards. For semi-structured interviews, which are designed to closely replicate clinical diagnoses by mental health professionals, sensitivity and specificity were 81% and 88% for a cut-off value of 11 or higher. At cut-off values of 10 or higher and 13 or higher, which are commonly used for depression screening,13 sensitivity and specificity were 85% and 84%, and 66% and 95%, respectively. Accuracy was similar across reference standards, similar among pregnant and postpartum women, and similar based on other study and participant characteristics.

Comparison with other studies

The cut-off value of 11 or higher that maximised combined sensitivity and specificity in the present study is lower than both the most commonly used cut-off value of 13 or higher13 and the cut-off value of 12 or higher that maximised combined sensitivity and specificity in a previous EPDS accuracy meta-analysis.13 Based on studies with a semi-structured reference standard, across cut-off values of 10-13, sensitivity estimates in the present IPDMA were 6-13% lower than those in the previous meta-analysis, whereas specificity estimates were 4-7% higher. Differences in results between the current IPDMA and the previous meta-analysis might have occurred because the current IPDMA consisted of 58 primary studies, including 36 with a semi-structured reference standard, versus 21 primary studies with various types of reference standards in the previous meta-analysis. Additionally, the current IPDMA incorporated data from all cut-off values for all included studies, whereas the previous meta-analysis was limited to published results and used different sets of studies to evaluate accuracy at different cut-off values.

Implications

Depression screening recommendations differ among prominent international guideline making bodies, and well conducted trials are needed to determine if screening would improve mental health or other important outcomes, such as child development and family outcomes. The present study found that an EPDS cut-off value of 11 or higher maximised combined sensitivity and specificity. Other cut-off values could be used in practice or clinical trials if either sensitivity or specificity is to be prioritised. For instance, if the intention is to only capture participants with high depressive symptom levels, a higher cut-off value might be desired. Conversely, if the intention is to avoid false negatives and capture all participants who might meet diagnostic criteria based on further evaluation, a lower cut-off value might be preferred. Clinicians considering screening for depression with the EPDS can refer to our online knowledge translation tool (depressionscreening100.com/epds), which estimates expected numbers of positive screens and true and false screening outcomes based on results from our IPDMA.
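The trade-off between cut-off values can be made concrete by converting sensitivity and specificity into expected screening outcomes. The sketch below does this per 1000 women screened, using the rounded semi-structured estimates from table 2 and an assumed prevalence of 10%; the prevalence, the function name, and the per 1000 framing are illustrative assumptions, and the code is not the implementation behind the online tool.

```python
from typing import Dict

# Rounded (sensitivity, specificity), semi-structured reference standard, table 2.
semi_structured = {10: (0.85, 0.84), 11: (0.81, 0.88), 13: (0.66, 0.95)}


def expected_outcomes(sensitivity: float,
                      specificity: float,
                      prevalence: float,
                      n_screened: int = 1000) -> Dict[str, float]:
    """Expected counts of screening outcomes among n_screened women."""
    cases = prevalence * n_screened
    non_cases = n_screened - cases
    true_pos = sensitivity * cases
    true_neg = specificity * non_cases
    return {
        "positive_screens": true_pos + (non_cases - true_neg),
        "true_positives": true_pos,
        "false_positives": non_cases - true_neg,
        "false_negatives": cases - true_pos,
        "true_negatives": true_neg,
    }


assumed_prevalence = 0.10  # assumption for illustration only
outcomes = {cutoff: expected_outcomes(sens, spec, assumed_prevalence)
            for cutoff, (sens, spec) in semi_structured.items()}
```

Under these assumptions, moving from a cut-off value of 11 or higher to 13 or higher reduces expected false positives from about 108 to about 45 per 1000 women screened, while expected missed cases rise from about 19 to about 34.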

Strengths and limitations

This study used IPDMA to assess EPDS screening accuracy. Strengths include analysis of data from more than twice the number of studies compared with the previous conventional meta-analysis,^13 and including results for all cut-off values from all studies. Additionally, our analysis examined the possible influence of study and participant characteristics on accuracy, and assessed accuracy separately across reference standards. A previous meta-analysis of the EPDS, which was a conventional meta-analysis and only included published results, was not able to incorporate results for all key cut-off values from all included studies because they were not consistently published. Furthermore, the previous meta-analysis was not able to conduct subgroup analyses by participant characteristics or reference standards.^13 Among other important findings, the present study showed that the same cut-off value can be used in pregnant and postpartum women.

Limitations also need to be considered. Firstly, we did not obtain data from 25 of 83 published eligible datasets, although the results did not change when we incorporated published results from studies that did not contribute data but published eligible accuracy results. Secondly, moderate heterogeneity was found across studies. Thirdly, we could not conduct subgroup analyses based on cultural aspects, such as country or language, or in specific pregnancy trimesters or postpartum periods because the data were insufficient. We found no significant or substantive differences based on country human development index, but few studies from low and middle income countries were included. Fourthly, while we categorised studies based on the interview administered, interviews might not always be used as intended; one third of studies were coded as unclear for interviewer qualification in our risk of bias assessment.

Conclusions

In summary, we found that combined sensitivity and specificity for the EPDS is maximised at a cut-off value of 11 or higher. Additionally, accuracy did not differ significantly based on reference standards or participant characteristics, including whether the EPDS is administered during pregnancy or in the postpartum period. Clinicians considering screening for depression with the EPDS can refer to our online tool (depressionscreening100.com/epds) to identify alternative cut-off values that maximise other parameters. Well conducted trials are needed to determine if screening with the EPDS could improve mental health outcomes and minimise harms and resource use.

The Edinburgh Postnatal Depression Scale (EPDS) is the most commonly used depression screening tool in perinatal care, with cut-off values of 10 or higher and 13 or higher typically used to identify women who might be depressed.

A previous meta-analysis of the screening accuracy of the EPDS, which was conducted more than 13 years ago, found that a cut-off value of 12 or higher maximised combined sensitivity and specificity in postpartum women (21 studies).

The previous meta-analysis did not pool results for pregnant women because too few studies were found, and no subgroup analyses were conducted among postpartum women because primary studies did not report the necessary data.

An EPDS cut-off value of 11 or higher maximised combined sensitivity and specificity (81% and 88%, respectively).

For commonly used cut-off values of 10 or higher and 13 or higher, sensitivity and specificity were 85% and 84%, and 66% and 95%, respectively; results did not differ across subgroups, including pregnant versus postpartum status.

An online knowledge translation tool is available to estimate the expected number of positive screens and true and false screening outcomes based on study results (depressionscreening100.com/epds).
