The limited screening accuracy of the Patient Health Questionnaire-2 in detecting depression among perinatal women in Italy.

Antonella Gigantesco¹, Gabriella Palumbo¹, Loredana Cena², Laura Camoni¹, Alice Trainini², Alberto Stefana², Fiorino Mirabella¹.

Abstract

BACKGROUND: The PHQ-2 was recently recommended by the International Consortium for Health Outcomes Measurement as a form of initial perinatal screening, followed by the EPDS only for women with positive PHQ-2 score. However, the accuracy of the PHQ-2 in perinatal clinical practice has been barely researched, to date. In the present study, we aim to assess the accuracy of the PHQ-2 against the EPDS in a large sample of perinatal women.
METHODS: A total of 1155 consecutive women attending eleven primary or secondary health care centres throughout Italy completed the EPDS and the PHQ-2 during pregnancy (27–40 weeks) or postpartum (1–13 weeks). Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio, negative likelihood ratio, post-test probabilities and area under the curve (AUC) of the PHQ-2, using a cut-off of ≥ 3, were calculated.
MAIN FINDINGS: During pregnancy, the PHQ-2 revealed low sensitivity (39.5%) and PPV (39.4%) but high specificity and NPV (both 97.5%). In postpartum, it revealed very low sensitivity (32.7%) and moderately high NPV (80.9%), but high specificity (99.3%) and PPV (94.4%). Given the low sensitivity despite the high specificity, the PHQ-2 demonstrated poor accuracy (AUC from 0.66 to 0.68).
CONCLUSION: Initial screening by means of PHQ-2 failed to identify an acceptable number of perinatal women at-risk of depression in Italian clinical practice. The PHQ-2 performance suggested that it has insufficient sensitivity and discriminatory power, and may be inadequate as a screening tool for maternal depression.

Year: 2021 PMID: 34843588 PMCID: PMC8629231 DOI: 10.1371/journal.pone.0260596

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

In Italy, the proportion of women who experience depression during pregnancy and postpartum ranges from 6% [1] to 12% [2, 3] and from 7 to 20% [1, 4], respectively. Perinatal depression has been associated with poorer outcomes such as a reduction in a woman’s ability to perform daily activities and parenting [5], which increases the risk of significant adverse consequences over the years for the psychological well-being and health of the family [6]. It should be highlighted that maternal depression has a negative impact on the child’s development [7]. Given that effective psychosocial interventions (e.g., the WHO Thinking Healthy Programme) [8] and psychological therapies (e.g., cognitive behavioural therapy and interpersonal therapy) are available [9], the importance of early detection of perinatal depression cannot be overstated. Primary and non-psychiatric specialty care visits represent important opportunities to detect or launch treatment of perinatal depression because, typically, perinatal depressed women are more likely to seek care in obstetrics/gynaecology medical settings than in specialised mental health settings. In light of the points mentioned above, several authors have called for identification procedures for perinatal depression to be introduced in clinical practice such as in obstetrics/gynaecology, paediatrics, or internal/family medicine settings [10, 11]. This demands guidelines regarding the implementation of a standardised approach to detect depression with satisfactory accuracy. It has been observed that the accuracy of depression recognition in non-psychiatric health care settings is unsatisfactory [12] and, to date, there is considerable variability regarding how the ascertainment of the risk of depression should be conducted among perinatal women in non-psychiatric clinical practice [13]. International recommendations have adopted various positions concerning the implementation of depression screening in clinical practice. 
Although the Australian Clinical Practice Guidelines for perinatal depression [14] declare that universal screening is good practice, and other countries such as the United States [15] and Canada [16] also recommend universal screening using the Edinburgh Postnatal Depression Scale (EPDS) [17], neither the Scottish [18] nor the British [19] guidelines recommend its use. However, they do recommend selective sequential testing through a 2-step approach in which a 2-question ultra-brief instrument (i.e., Whooley questions) [20] is administered first, followed, only for some women, by the Patient Health Questionnaire-9 (PHQ-9) [21] or EPDS [17]. It should be noted, however, that the NICE recommendation to use the Whooley questions [19] was made in the absence of any validation studies in a large perinatal population. In fact, there is limited evidence for the use of the Whooley questions as a screening tool for maternal depression, as they have only been validated in small perinatal populations, with sensitivities ranging widely between 46 and 100% [22]. Consistent with the recommendation of a universal screening approach, the International Consortium for Health Outcomes Measurement (ICHOM) recently identified the 2-item PHQ-2 as an initial step to recognize at-risk women, to be administered at first contact during pregnancy and again during the early postpartum period, followed by the longer EPDS as confirmatory screening only for women who respond positively to either question [23]. To maximise the likelihood that all women at risk of depressive symptoms are administered the EPDS, it is essential that the PHQ-2 is highly sensitive and, more generally, highly accurate. However, to date, perinatal evidence of PHQ-2 accuracy is limited, and only a few studies have validated the PHQ-2 in perinatal settings [22, 24]. 
Of these, only two have compared the PHQ-2 to a structured diagnostic interview, one showing low sensitivity and moderate specificity [25] and the other moderately high sensitivity and specificity [26]. Apart from these studies, some other studies have been conducted using the EPDS as a reference standard, with mixed results, since sensitivity ranged from 19% to 100% and specificity ranged from 75% to 93% [27-30]. Given the paucity and the inconsistency of reports, additional research in larger and different maternity populations has been recommended in order to validate the PHQ-2 as part of a maternal health care policy to detect perinatal depression [22, 24, 29]. In the present study, we aimed to test the accuracy of the PHQ-2, using the EPDS as a reference standard, in a large sample of pregnant and postpartum women attending several primary or obstetric-gynaecology secondary care centres throughout Italy. To our knowledge, no study has yet compared the use of the PHQ-2 to the EPDS in Italian perinatal clinical practice. The main goal was to determine whether or not the PHQ-2 was highly sensitive and specific, and therefore able to rule out and rule in depression in maternity clinical practice.

Methods

Outline of the study

The study is part of a larger body of work conducted by the Observatory of Perinatal Clinical Psychology (University of Brescia, Italy) and the Italian National Institute of Health. This work included a study which merges a cross-sectional study and a pre–post intervention cohort study [31] with two main objectives: (1) to evaluate the prevalence of both maternal antepartum and postpartum depression and anxiety in a sample of women in Italy (cross-sectional study component) and (2) to evaluate the effectiveness of psychological intervention [32] for both antenatal and postnatal depression (pre-post intervention cohort study component). The present study came from the data collected in the cross-sectional study.

Study protocol

Participants were recruited from eleven publicly-funded primary or obstetric-gynaecology or paediatrics secondary care centres of the Observatory of Perinatal Clinical Psychology (University of Brescia) throughout Italy (Bergamo, Brescia, Mantua and Milan, Lombardy Region; Bologna, Emilia Romagna Region; Florence, Tuscany Region; Novara and Collegno, Piedmont Region; Rome, Lazio Region; Enna, Sicily Region). Evaluation tools were administered once during the pre- or postpartum period, depending on the characteristics of each healthcare centre. The study was approved by the ethics committee of the Healthcare Centre of Bologna (registration number 0077805, dated 6/27/2017) [29].

Procedure and participants

Participants were recruited during a routine perinatal health check-up or paediatric vaccination appointment at one of the eleven publicly-funded healthcare facilities between 2017 and 2018. Consecutive women were invited to participate in the study. Specifically, participation in the study was offered by the obstetricians, gynaecologists or paediatricians of those facilities. All the women approached were provided with a pamphlet developed as part of the study, in which the purpose, aims and methodology of the study were explained. Women who wished to participate provided their personal information (name or phone number to be contacted later) in order to meet up with trained psychologists. These psychologists had attended training on screening, assessment, and treatment for maternal perinatal mental health problems, developed by the National Institute of Health [4]. Women who definitively agreed to participate signed an informed consent form. Then, they underwent a semi-structured interview (not a diagnostic interview) led by the trained psychologists to elicit information on current and past maternal experience with psychiatric conditions and use of psychotropic drugs. Psychiatric conditions included symptoms of anxiety, depression, psychotic symptoms (i.e., delusions and/or hallucinations), non-suicidal self-harm tendencies, suicidal ideation or substance abuse. The inclusion criteria to be enrolled were being able to speak and read Italian well and being a woman aged ≥18 years with a biological baby aged ≤52 weeks. The exclusion criteria were having issues with drug or substance misuse and/or having on-going psychotic symptoms. The enrolled women were then administered the scheduled self-report assessment tools for data collection (see Tools). All the women completed the interviews and self-report instruments at the facilities, the majority of them on the same day on which they were invited to join. 
A few women provided their phone numbers so that a subsequent appointment at the facilities could be arranged to complete the instruments. The psychologists who administered the self-report tools made sure that they were returned fully completed.

Tools

Psychosocial assessment form

Socio-demographic data were collected at baseline by means of the Psychosocial Assessment Form [4] which addresses socio-demographic characteristics and other information. Socio-demographic characteristics include: age (years), marital status (married or cohabitating; single, separated, divorced or widowed), educational level (primary or illiterate; secondary high school; University), working status (student, homemaker, or unemployed; temporary employee; permanent employee), economic status (several problems; a few problems without specific difficulties; average to high status) and children living at the time of the current pregnancy/birth (yes-no).

Edinburgh Postnatal Depression Scale (EPDS)

The EPDS [17, 33] is the most widely used screening tool for depressive symptoms during pregnancy and postpartum [34, 35]. It is a self-rating measure containing 10 items concerning the symptoms of depression such as anhedonia, feelings of guilt, lethargy, sleep disturbance and suicidal ideation occurring in the past 7 days. Each symptom is scored on a four-point Likert scale. The total score ranges from 0 to 30, with higher scores indicating more severe depressive symptoms.

Patient Health Questionnaire (PHQ-2)

The PHQ-2 [36] consists of the first two items of the PHQ-9 (investigating depressed mood and anhedonia). The questions ask: “Over the last 2 weeks, how often have you been bothered by little interest or pleasure in doing things?” and “Over the last 2 weeks, how often have you been bothered by feeling down, depressed, or hopeless?”. For each item, the response options are “not at all” (0), “several days” (1), “more than half the days” (2), and “nearly every day” (3). The total score ranges from 0 to 6, with higher scores indicating greater depressive symptoms.

Statistical analysis

All analyses were conducted using the Statistical Package for the Social Sciences (SPSS) for Windows, version 26.0. The EPDS was considered the reference standard. The PHQ-2 was analysed using a total score cut-off point of 3 or more, as is usually recommended to define the result as positive [37, 38]. The EPDS total score was transformed into a binary variable to indicate positive screening for depression using the cut-off score of ≥ 13 during pregnancy and ≥ 10 in the postpartum, as recommended by the literature [39]. Socio-demographic characteristics of women with positive screening for depression were summarised using descriptive statistics. The chi-square test (or Fisher's exact test) was used to test for differences between women with positive screening for depression and women without for each socio-demographic characteristic. The proportion of women at risk of depression according to the EPDS and the proportion of positive PHQ-2 responses are presented as frequencies and percentages with 95% confidence intervals.
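The dichotomisation rules described above can be sketched in a few lines of Python. This is only an illustration: the study itself used SPSS, and the function and constant names below are hypothetical. The cut-offs and score ranges are the ones stated in the text.

```python
# Cut-offs as stated in the Statistical analysis section.
PHQ2_CUTOFF = 3
EPDS_CUTOFF = {"pregnancy": 13, "postpartum": 10}


def phq2_total(item1: int, item2: int) -> int:
    """Sum the two PHQ-2 items, each scored 0-3 (total 0-6)."""
    for item in (item1, item2):
        if item not in (0, 1, 2, 3):
            raise ValueError("PHQ-2 items are scored 0, 1, 2 or 3")
    return item1 + item2


def phq2_positive(item1: int, item2: int) -> bool:
    """Positive PHQ-2 screen: total score of 3 or more."""
    return phq2_total(item1, item2) >= PHQ2_CUTOFF


def epds_positive(total: int, period: str) -> bool:
    """Positive EPDS screen: >= 13 in pregnancy, >= 10 postpartum."""
    if not 0 <= total <= 30:
        raise ValueError("EPDS total ranges from 0 to 30")
    return total >= EPDS_CUTOFF[period]
```

With these rules, an EPDS total of 11 would count as negative during pregnancy but positive postpartum, which is one reason the two periods are analysed separately.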

Internal consistency

The internal consistency of the EPDS and the PHQ-2 was assessed. Because the PHQ-2 includes only two items, the mean inter-item correlation (MIC) was adopted.
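As a minimal sketch of what a mean inter-item correlation involves: it averages the correlations over all pairs of items, so for a two-item scale it reduces to the single correlation between the two items. The code below assumes Pearson correlation, which the text does not specify, and uses made-up responses. All names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_inter_item_correlation(items):
    """items: list of equal-length response lists, one list per item."""
    pairs = list(combinations(items, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Made-up responses (0-3) from five hypothetical respondents, NOT study data.
item1 = [0, 1, 2, 3, 1]
item2 = [0, 1, 1, 3, 2]
mic = mean_inter_item_correlation([item1, item2])
print(round(mic, 3))
```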

Accuracy (or criterion validity)

The screening accuracy of the PHQ-2, defined as sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (LR+), negative likelihood ratio (LR-), and area under the curve (AUC) against the EPDS cut-points using ROC (Receiver Operating Characteristic) analysis was assessed [40]. Positive and negative post-test probabilities were also calculated. The standard error for the area was set as non-parametric with a 95% confidence interval. Area under the curve (AUC) was interpreted as follows: AUC = 0.60–0.70 = poor, 0.70–0.80 = fair, 0.80–0.90 = good, 0.90–1.0 = excellent.
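For reference, the standard definitions of these indices can be written in terms of the cells of the 2×2 table of PHQ-2 result against EPDS result. The symbols $TP$, $FP$, $FN$ and $TN$ (true positives, false positives, false negatives, true negatives) are introduced here for illustration and do not appear in the original text.

```latex
\begin{align*}
\text{Sensitivity} &= \frac{TP}{TP + FN}, &
\text{Specificity} &= \frac{TN}{TN + FP},\\[4pt]
\text{PPV} &= \frac{TP}{TP + FP}, &
\text{NPV} &= \frac{TN}{TN + FN},\\[4pt]
\text{LR}^{+} &= \frac{\text{Sensitivity}}{1 - \text{Specificity}}, &
\text{LR}^{-} &= \frac{1 - \text{Sensitivity}}{\text{Specificity}}.
\end{align*}
```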

Results

A total of 1155 women, 71% of those who were asked to participate in the study, filled out both the EPDS and the PHQ-2. Both pregnant and postpartum women were primarily in their thirties (Table 1). Overall, the majority of them were married or lived with their partner and were well educated. Furthermore, the majority were employed in paid work and only a few had serious economic difficulties.

Table 1

Socio-demographic, pregnancy and delivery characteristics of participants.

	Antenatal (from 27 to 40 weeks)		Postnatal (from 1 to 13 weeks)		P
	n (%)	EPDS ≥ 13, n (%)	n (%)	EPDS ≥ 10, n (%)
Total	954	38 (4.0)	201 (17.4)	52 (25.9)
Age
18–29	209 (21.9)	8 (21.1)	30 (14.9)	6 (11.5)
30–35	453 (47.5)	19 (50.0)	79 (39.3)	16 (30.8)
> 35	292 (30.6)	11 (28.9)	92 (45.8)	30 (57.7)
Marital status
Married or cohabitating	878 (92.6)	35 (92.1)	183 (91.5)	41 (78.8)
Single, separated, divorced or widowed	70 (7.4)	3 (7.9)	17 (8.5)	11 (21.2) ^§§
Educational level
Primary or illiterate	100 (10.5)	9 (23.7)^§	26 (13.0)	4 (7.7)
Secondary	341 (36.0)	11 (28.9)	83 (41.5)	21 (40.4)
University	507 (53.5)	18 (47.4)	91 (45.5)	27 (51.9)
Economic status *
Several problems	58 (6.2)	5 (13.2)	16 (8.1)	9 (17.3)^§
A few problems without specific difficulties	433 (45.9)	18 (47.4)	99 (50.0)	24 (46.2)
Average to high status	452 (47.9)	15 (39.4)	83 (41.9)	19 (36.5)
Working status
Student, homemaker, or unemployed	151 (16.0)	7 (18.4)	39 (19.6)	8 (15.7)
Temporary employee	89 (9.4)	0 (0.0)	20 (10.1)	6 (11.8)
Permanent employee	702 (74.6)	31 (81.6)	140 (70.4)	37 (72.5)
Children living at the time of this pregnancy/birth
No	798 (83.6)	28 (73.7)	137 (68.2)	40 (76.9)
Yes	157 (16.4)	10 (26.3)	64 (31.8)	12 (23.1)

*Regarding economic status: Several problems = having debts, difficulty or inability to pay daily expenses and rent; A few problems without specific difficulties = relatively modest standard of living but without particular difficulties; Average high status = home owned, possibility of taking holidays or travelling for pleasure.

§ p<0.05.

§§ p<0.01.

Number (valid percentage). P = statistical significance for the comparison between negative EPDS screening (score < 13 during pregnancy and < 10 in postpartum) and positive EPDS screening results (score ≥ 13 during pregnancy and ≥ 10 in postpartum).

Significant differences between women with positive EPDS and women without were found regarding educational level, economic problems and marital status. Specifically, among pregnant women, positive EPDS was associated with a low educational level (primary or illiterate) and, among postnatal women, positive EPDS was associated with having several economic problems and not being married or cohabitating with a partner (Table 1).

Internal consistency

The internal consistency of the EPDS was: α = 0.80 during pregnancy and α = 0.87 following delivery. The PHQ-2 showed acceptable mean inter-item correlations (MIC) both during pregnancy (MIC = 0.38) and postpartum (MIC = 0.45).

Proportion of minor depression based on EPDS and PHQ-2 positive screen tests

The percentage of antenatal and postnatal women who screened positive for depression as determined by the EPDS was 4.0% and 25.9%, respectively (Table 2). The proportion of positive cases determined through the PHQ-2 was 4.0% during pregnancy and 8.9% postpartum. The proportion of women with a positive EPDS was higher postpartum than during pregnancy. The proportion of positive PHQ-2 cases was also higher in the first three months following birth.

Table 2

Proportion of women with probable minor depression ascertained with EPDS, and case identification using PHQ-2.

	Pregnancy (from 27 to 40-weeks)	Postpartum (from 1 to 13-weeks)
	N = 954	N = 201
	n	n
	% (95% CI)	% (95% CI)
Screened positive for depression*	38	52
	4.0 (2.7–5.1)	25.9 (19.9–32.1)
PHQ-2°	38	18
	4.0 (2.8–5.2)	8.9 (5.0–12.8)

* Screened positive for depression: EPDS cut-off score ≥ 13 during pregnancy and ≥ 10 postpartum.

° PHQ-2 total score cut-off point ≥ 3.

Antepartum and postpartum minor depression: PHQ-2 accuracy

The performance of the PHQ-2 with EPDS as a standard criterion during pregnancy and postpartum is presented in Table 3.

Table 3

Performance of the PHQ-2 against the EPDS.

	Antenatal	Postnatal
	(from 27 to 40-weeks)	(from 1 to 13-weeks)
Sensitivity (95% CI)	39.5 (24.0–56.6)	32.7 (20.3–47.1)
Specificity (95% CI)	97.5 (96.3–98.4)	99.3 (96.3–99.9)
PPV (95% CI)	39.4 (27.0–53.4)	94.4 (69.9–99.2)
NPV (95% CI)	97.5 (96.8–98.0)	80.9 (77.8–83.6)
LR+ (95% CI)	15.7 (8.9–27.6)	48.7 (6.6–357.0)
LR- (95% CI)	0.62 (0.48–0.80)	0.68 (0.56–0.82)
Positive post-test probability	39.4 (27.0–53.4)	94.4 (69.9–99.2)
Negative post-test probability	2.5 (2.0–3.2)	19.2 (16.4–22.2)

CI = Confidence Interval.

During pregnancy, out of 38 women who scored positive on the EPDS, 15 tested positive and 23 negative using the PHQ-2. Among 38 women who screened positive using the PHQ-2, 15 were found to have probable minor antepartum depression as determined by the EPDS. Of 916 women who scored negative with the EPDS, 893 tested negative with the PHQ-2, and among 916 women who screened negative with the PHQ-2, 893 were found not to have probable minor antepartum depression. Following delivery, out of 52 women who scored positive on the EPDS, 17 tested positive and 35 negative on the PHQ-2. Among 18 women who screened positive with the PHQ-2, 17 were found to have probable minor postpartum depression. Out of 149 women who scored negative with the EPDS, 148 tested negative with the PHQ-2, and among 183 women who screened negative with the PHQ-2, 148 were found not to have probable minor postpartum depression.

During pregnancy, the LR+ ranged from moderate (8.9) to high (27.6) and the LR- ranged from weak (0.5) to very weak (0.8) [41]. Following delivery, the LR+ ranged from moderate (6.6) to very high (357.0), with a wide and imprecise confidence interval, and the LR- was weak (from 0.56 to 0.82) [41]. The post-test probability that a woman who tested negative with the PHQ-2 had minor depression was 2.5% during pregnancy and 19.2% postpartum. The post-test probability that a woman who tested positive with the PHQ-2 had minor depression was 39.4% during pregnancy and 94.4% postpartum.

At the recommended cut-off of ≥ 3, screening accuracy of the PHQ-2 was poor (AUC = 0.66–0.68) (Table 4).
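The point estimates in Table 3 can be checked against the counts reported in the text. The sketch below is illustrative only (the study used SPSS, and the function name is hypothetical). It takes the four cells of each 2×2 table as stated above: 15 true positives, 23 false positives, 23 false negatives and 893 true negatives during pregnancy, and 17, 1, 35 and 148 postpartum.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Point estimates of screening accuracy from a 2x2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sens / (1 - spec),   # undefined if specificity is exactly 1
        "lr_neg": (1 - sens) / spec,
    }


# Counts as reported in the text (PHQ-2 >= 3 against the EPDS cut-offs).
antenatal = screening_metrics(tp=15, fp=23, fn=23, tn=893)
postnatal = screening_metrics(tp=17, fp=1, fn=35, tn=148)

for label, metrics in (("antenatal", antenatal), ("postnatal", postnatal)):
    print(label, {name: round(value, 3) for name, value in metrics.items()})
```

The resulting values agree with the point estimates in Table 3 to within about 0.1 percentage point. The confidence intervals are not reproduced here because the text does not state which interval method was used.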

Table 4

ROC analysis of the PHQ-2 (cut-off point ≥ 3) with EPDS as criterion standard.

Test variable	AUC	SE	Significance (p)	95% CI lower bound	95% CI upper bound
Screened positive for depression (EPDS as criterion standard)
Pregnancy (from 27 to 40 weeks)	0.68	0.05	<0.001	0.58	0.79
Postpartum (from 1 to 13 weeks)	0.66	0.049	0.001	0.56	0.76
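When a test is evaluated at a single cut-off, its ROC curve has one operating point, and the trapezoidal area under the curve reduces to the mean of sensitivity and specificity. The sketch below assumes that the ROC analysis in Table 4 was run on the dichotomised PHQ-2 (score ≥ 3 versus < 3), as the table caption suggests. The function name is hypothetical.

```python
def auc_single_cutoff(sensitivity: float, specificity: float) -> float:
    """Trapezoidal AUC of the ROC through (0, 0), (1 - spec, sens), (1, 1)."""
    fpr = 1.0 - specificity
    left = 0.5 * fpr * sensitivity                  # triangle over [0, fpr]
    right = 0.5 * (1.0 - fpr) * (sensitivity + 1.0)  # trapezoid over [fpr, 1]
    return left + right  # algebraically equal to (sens + spec) / 2


# Sensitivity and specificity from the counts reported in the Results.
antenatal_auc = auc_single_cutoff(15 / 38, 893 / 916)
postnatal_auc = auc_single_cutoff(17 / 52, 148 / 149)
print(round(antenatal_auc, 2), round(postnatal_auc, 2))
```

Under this assumption the computed areas round to the AUC values reported in Table 4, which shows how a very high specificity cannot compensate for a sensitivity below 40%.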

Discussion

To the best of our knowledge, this is the first study which compares the accuracy of the PHQ-2 to the EPDS in a large sample of perinatal women in Italy in order to consider its possible application as a first step screening in perinatal clinical practice. For the PHQ-2, we used the optimal cut-off point (≥ 3) taking into consideration combined sensitivity and specificity (Youden index), as recommended by recent literature concerning the accuracy of the PHQ-2 in clinical practice [42]. Both the PHQ-2 and the EPDS demonstrated acceptable internal consistency. In the present study, one of the main findings was that even though the benefit of including a triage test in perinatal clinical practice consists of narrowing the number of women who need more extensive evaluation, the PHQ-2 demonstrated that it excessively narrows the percentage of women that would need a subsequent longer assessment to a very low level of 4% of the original group of prenatal women and 8.9% of the original group of postnatal women. In fact, the PHQ-2 demonstrated that it was poorly sensitive for identifying expectant women and new mothers at risk for depression (39.5% and 32.7%, respectively) indicating that many cases of probable depression remained undetected (60.5% and 67.3% false-negative). This sensitivity was substantially lower than that reported in the original PHQ-2 validation study (39.5% and 32.7% compared to 83%) [37] but not very distant from that reported in a recent meta-analysis which assessed the PHQ-2 pooled performance against a gold-standard diagnostic interview [24]. This meta-analysis showed that the sensitivity of the PHQ-2 in identifying major depression in primary care was lower than that reported in the original study at ≥ 3 cut-off point (pooled sensitivity 64%; with the lower boundary of the 95% CI = 46%). 
Consistently, a more recent systematic review has confirmed that in settings such as primary care or some inpatient and outpatient specialty care, the PHQ-2 was up to 62% sensitive for a cut-off score of 3 or greater, in studies that used fully structured interviews as reference standards [43]. In contrast, we found the PHQ-2 to be highly specific (97.5%-99.3%), suggesting a low risk of false positives and response burden. As a consequence of low sensitivity, despite the high specificity, the PHQ-2 ultimately demonstrated poor accuracy, based on the ROC analyses, suggesting that it did not possess any substantial discriminatory ability either during pregnancy or postpartum (AUCs = 0.68 and 0.66, both corresponding to poor accuracy). Looking at the PPVs, our findings also seemed to indicate a poor performance of the PHQ-2 in pregnancy, given the very low probability that a woman with a positive result indeed had depression (39.4%). In contrast, the combination of a low prevalence, low sensitivity and high specificity resulted in high NPVs (97.5% in pregnancy and 80.9% postpartum), suggesting that the probability that a woman with a negative PHQ-2 result indeed did not have depression was high, especially in pregnancy. However, these predictive values remained uninformative, at least in prenatal screening, because they are affected by a very low prior prevalence (from 2.7% to 5.1%), which involves the base rate fallacy [44]. This makes the PHQ-2 worthless under conditions of low prevalence, which means that, if it is to be used, it should be applied only in pregnant populations with a high prevalence of depression, such as psychiatric populations or other populations at risk of mental disorders. As evidence of this, both PPV (94.4%) and NPV (80.9%) proved to be more informative and acceptable among our sample of postnatal women, who had a higher prevalence of depression. 
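The dependence of predictive values on prevalence can be illustrated with a short calculation. The sketch below holds the antenatal sensitivity and specificity from Table 3 fixed and varies only the prevalence. That is a simplifying assumption made for illustration, since sensitivity and specificity differed somewhat between the two samples. The function name is hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a given prevalence via Bayes' theorem."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Antenatal sensitivity/specificity (Table 3) at the two observed prevalences.
for prevalence in (0.040, 0.259):
    ppv, npv = predictive_values(0.395, 0.975, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With identical test characteristics, the PPV rises from roughly 40% at a prevalence of 4.0% to roughly 85% at 25.9%, while the NPV falls. The predictive values therefore say as much about the population screened as about the instrument.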
As pointed out by several authors [41, 45], when sensitivity is low despite high specificity, the use of the SpPIn mnemonic (high Specificity, Positive, rules In) is ineffective. In the present study, the SpPIn rule is also not applicable because of low prevalence, especially in our antenatal sample. In the same way, the SnNOut mnemonic (high Sensitivity, Negative, rules Out) is not applicable because of low prevalence and low sensitivity. As a consequence, LRs and post-test probabilities are probably the best way to evaluate the strength of the PHQ-2 in our clinical centres, where women will generally have depression at an earlier and milder stage (i.e., have a low or moderately high prevalence of depression). The LR- of 0.62 in pregnancy and the LR- of 0.68 postpartum were weakly indicative of an absence of depression, suggesting limited accuracy in ruling out the risk of depression following a negative result [41]. After testing negative on the PHQ-2, the post-test probability of having antenatal depression slightly decreased from 4% (pre-test probability) to 2.5%, with a confidence interval ranging from 2.0 to 3.2%. This means that a negative PHQ-2 was still compatible with a 3.2% post-test probability of depression, which clearly was low in absolute terms but unacceptably high in this situation in which the sensitivity was so low. Moreover, despite the weakly indicative LR-, the post-test probability was low because it started from a very low pre-test probability. As evidence of this, among postpartum women, with a higher pre-test probability and similar LR- of 0.68, the post-test probability resulted in a higher point estimate of 19.2% (CI: 16.4–22.2), which may be considered even more unacceptably high in this situation in which the sensitivity was even lower (32.7%). In this situation, the clinician may only feel moderately confident that depression could be ruled out. To be sure, the clinician needs to advise women to proceed with other investigations. 
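The step from pre-test to post-test probability follows the usual odds form of Bayes' theorem: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back. The sketch below applies it to the negative likelihood ratios discussed above, taking the EPDS-positive proportions in Table 2 as pre-test probabilities. The function name is hypothetical.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Post-test probability from a pre-test probability and a likelihood ratio."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Pre-test probabilities from Table 2, negative likelihood ratios from Table 3.
antenatal_negative = post_test_probability(0.040, 0.62)
postnatal_negative = post_test_probability(0.259, 0.68)
print(f"{antenatal_negative:.1%}", f"{postnatal_negative:.1%}")
```

The two results correspond to the negative post-test probabilities of 2.5% and 19.2% reported in Table 3. The same function applies to a positive result by passing the LR+ instead.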
The LR+ in the prenatal sample was 15.7 (CI: 8.9–27.6), suggesting that the PHQ-2 ranged from moderately (estimated shift in probability of at least 30%, considering the lower boundary of the CI) to strongly (estimated shift in probability of at least 45%, considering the upper boundary of the CI) indicative of the presence of depression. The post-test probability of having prenatal depression increased from 4.0% (pre-test probability) to 39.4% (CI: 27.0–53.4). Despite this considerable increase, the post-test probability revealed the presence of depression among only about one in 2–3 women with a positive PHQ-2, given the low pre-test probability and sensitivity. This may lead the clinician to rule in prenatal depression and proceed with a further screening assessment, but with moderate confidence. The LR+ in the postnatal sample was 48.7 with a very wide confidence interval (95% CI: 6.6–357.0). The probability of having postnatal depression increased from 25.9% to 94.4% (95% CI: 69.9–99.2), revealing that about one in 1–1.4 women with a positive test had depression. Although the LR+ had a very wide confidence interval, whose lower bound (6.6) was only moderately indicative of the presence of depression, this may lead the clinician to rule in postnatal depression and proceed with the second step of a two-stage screening strategy, but with moderate confidence. The few available comparable studies are mostly consistent with our findings. In particular, our findings are very consistent with those of a recent Australian study which used the same EPDS cut-offs to test the accuracy of the PHQ-2 for probable minor depression [29]. In that study, during pregnancy (36 weeks), sensitivity, specificity, PPV, NPV, LR+ and LR- were 0.50, 0.95, 0.30, 0.98, 9.4 and 0.53, respectively, and at 6 weeks postpartum were 0.39, 0.97, 0.68, 0.91, 14.59 and 0.62, respectively. 
In the same study, using ROC analysis, the PHQ-2 cut-off ≥ 3 was found to be poor to fair for minor depression (AUC = 0.68 in pregnancy; AUC = 0.72 postpartum), similar to our AUC findings of 0.66–0.68. Our findings were also similar to those of a study by Smith and colleagues [25], which reported low PHQ-2 sensitivity (62%). Highly consistent with the present study, another postnatal study [27] showed a sensitivity of 43.5% and a specificity of 97.2%, using the same PHQ-2 cut-off point of 3 or more to identify a positive screen with the EPDS (cut-off point of 10 or more). A limited number of other studies have reported higher sensitivity. Gjerdingen et al. reported that the PHQ-2 had 75% sensitivity, 88% specificity, 24% PPV and 99% NPV, in a sample of postpartum women (0–1 month postpartum) [26]. However, Gjerdingen et al. used a different cut-off of 2 or higher, which maximised sensitivity. Bennett et al. [28] found that the PHQ-2 had a sensitivity of 93%, 82%, and 80%, and specificity of 75%, 80%, and 86%, at 15 and 30 weeks gestational age and 6–16 weeks postpartum, respectively. However, they found low PPVs (44, 24, and 30% respectively), similar to our antenatal PPV. Furthermore, Bennett et al. adopted a different PHQ-2 response format (dichotomous yes/no, where responding yes to either of the items was considered a positive result), which maximised sensitivity and specificity [29], and a sample with low education levels. It should be noted that PHQ-2 sensitivity was found to be higher for women who continued beyond high school education than for those with a high school education or less [27]. Finally, Chae et al. [30] found that the sensitivity of the PHQ-2 was 100% and the specificity was 79.3% among postpartum women attending a multi-ethnic family medicine residency centre. However, in this case too, the authors adopted a dichotomous PHQ-2 response format which maximised sensitivity and specificity. 
Overall, these studies [26–28, 30] suggest that lowering the cut-off point to 2 or higher will increase sensitivity. This, however, would come at the cost of lowered specificity given its inverse relationship with sensitivity. Moreover, at this lower cut-off the specificity is likely to be further reduced when the prevalence of depression is low [24, 29]. This suggests that the PHQ-2 at a cut-off point of 2 or more may have limited usefulness in the identification of women with depression in primary or secondary care services, because in such contexts the prevalence of depression is likely to be low. The extent to which lowering the cut-off point would be a valid option depends on the prevalence of depression and the cost and availability of subsequent strategies, potentially burdensome in busy maternity settings, to further assess those who score positively on the initial screening.

Strengths and limitations

The strengths of the present study include its large sample and the involvement of several maternity clinical centres throughout Italy. The limitations include a low prevalence of antenatal depression and the use of the EPDS as a reference standard instead of a structured diagnostic interview. However, our prevalence is only slightly lower than, and comparable to, those reported by other Italian studies [3, 4], and the aim of the present study was not to evaluate the diagnostic accuracy of the PHQ-2 but rather its screening accuracy compared with a longer, more accredited screening instrument. In other words, the aim was to investigate whether the PHQ-2 is accurate enough to be used as a substitute for a longer screening tool (as would occur in clinical practice under the ICHOM-recommended procedure), not for a diagnostic instrument that formally assesses major depression.

Conclusion

This study found limited evidence to support the use of the PHQ-2 as part of a perinatal multistage case-finding strategy at the recommended cut-off point of ≥ 3 [23, 37, 42]. Other authors have shown that similar ultra-rapid screening instruments, such as the Whooley questions [20], had very low sensitivity when routinely administered in early pregnancy [46]. Maternity and primary care services require simple, quick screening tools to identify whom to refer. The present study calls into question the appropriateness of the PHQ-2, in line with the literature questioning the validity of the Whooley questions. These findings may affect service provision and decisions about which screening instruments to use. They suggest that more comprehensive instruments may be needed as a first line. In our opinion, directly administering a longer screening instrument such as the EPDS may be the best option. Should this be difficult in a busy maternity setting, a tablet or paper version could represent a useful alternative [46].

Data set on the accuracy of the Patient Health Questionnaire-2 in detecting depression among perinatal women in Italy.

(XLSX)

14 Jul 2021

PONE-D-21-08926
The screening accuracy of the Patient Health Questionnaire-2 in detecting depression among perinatal women in Italy
PLOS ONE

Dear Dr. Gigantesco,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Aug 27 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future.
For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Angela Lupattelli, PhD Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. 
If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. 3. PLOS ONE does not conduct masked peer review. Please add references into manuscripts. 4. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability. Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized. Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access. We will update your Data Availability statement to reflect the information you provide in your cover letter. 
27 Aug 2021

We address the reviewers/Editor concerns in the Response to Reviewers file, and corresponding changes have been made to the manuscript.

Submitted filename: Response to Reviewers.docx

13 Sep 2021

PONE-D-21-08926R1
The screening accuracy of the Patient Health Questionnaire-2 in detecting depression among perinatal women in Italy
PLOS ONE

Dear Dr. Gigantesco,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Oct 28 2021 11:59PM.

We look forward to receiving your revised manuscript.

Kind regards, Angela Lupattelli, PhD Academic Editor PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.
Reviewer #1: (No Response)
Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Partly
Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes
Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: No ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: Thank you for inviting me to review this manuscript. It reads well, and the analyses presented are accurate. However, there is a major limitation of this study; the assessment of criterion validity of PHQ-2 must be done against a gold standard i.e. diagnostic interviews by mental health specialists. The EPDS and PHQ-2 could be compared to establish convergent validity but not criterion validity. This is because EPDS itself is a screening instrument and that too not with a perfect sensitivity and specificity. 
The authors mention that psychodiagnostic interviews have been conducted by a clinical psychologist, as part of the study. Perhaps they could revise the analyses using the interview data rather than the EPDS one. I am sorry I could not be much encouraging of the choice of EPDS as a gold standard comparator and added further to the investigators' work. However, with updated analyses, the quality of the manuscript will be significantly improved. I look forward to reading your revised work! Thank you for your excellent work. Best wishes, Reviewer #2: This study looks at the validity of the PHQ-2 as a screening tool for identifying perinatal depression. It uses the EDPS as a reference standard, rather than a more robust clinical interview. This approach facilitated the involvement of a large number of women across a wide geographical area. The study finds that the PHQ-2 may be inadequate in terms of sensitivity and overall accuracy in detecting maternal depression, as defined by a positive EPDS score. This study could contribute to the existing limited evidence regarding the diagnostic utility of the PHQ-2 in perinatal depression. This study’s findings may inform policy decisions for screening for perinatal mental illness among the maternity population. Maternal depression often goes undetected and it is important to have a way of identifying at risk women in the general maternity population. There are many relatively new specialist perinatal mental health services being developed internationally, and referrals rely on perinatal illness being effectively picked up in Primary Care and maternity settings. This paper is a useful contribution to inform the referral process for these services. The paper is difficult to read, with the author at times using confusing, ambiguous and superfluous phrases. This often obscures the meaning. Shorter, more precise sentences throughout would make it easier to read. 
Overall, I think this is an interesting paper that may warrant publication with changes to the way it is written to convey its message and context more clearly. Specific comments according to subheading: Title (page 1) 1. Fine – although it could be more bold for impact, for instance the title could question the validity of using the PHQ-2 as a screening tool in perinatal depression Abstract (page 2) 2. Last line confusing. The PHQ-2 performance suggested that it has insufficient sensitivity and discriminatory power, and may be inadequate as a screening tool for maternal depression. Introduction (page 3) 3. Paragraph 1 - Remove ‘indeed, among other things’. Perinatal depression is associated with a range of adverse outcomes (neonatal outcomes, substance and alcohol abuse, poor attendance at antenatal care, bonding, family, long-term emotional and cognitive outcomes for the child). 4. Paragraph 2 - The clinical management of perinatal depression is evidence-based and includes pharmacological and psychological approaches. The treatments mentioned here in brackets are not internationally recognised - are they names of group/psychological therapies for women? Given that this paper has international interest I would make sure treatment interventions are described. 5. Paragraph 4 - Consider referencing the literature about the Whooley questions which questions the utility and accuracy of ultra-rapid screening tools (mentioned briefly in the conclusion). The NICE guidelines in 2007 were made in the absence of validation studies in the perinatal population. There is limited evidence for the use of the Whooley questions as a screening tool for maternal depression, only having been validated in small perinatal populations with a wide range of sensitivities found. Methods (page 4) 6. More detail to allow replicability. Results (page 5) 7. First paragraph - Sentence does not need to be in brackets, can be part of the text. Were all the questionnaires completed in full? 
What happened to partially completed questionnaires? 8. First paragraph - Could calculate the statistical significant of associations, using (?) non-parametric testing. Then statements could be made regarding positive associations eg – such as ‘amongst pregnant women, screening positive for depression on the EPDS was associated with secondary or university level education’. This is interesting because women from lower educational/socioeconomic groups are less likely to access perinatal mental health care, perhaps women from those groups less likely to disclose MH difficulties in screening questionnaires. Perhaps outside the scope of this paper. 9. Second and third paragraph – confusingly worded. Discussion (page 6) 10. Develop explanation for the recommended scoring cut off for PHQ2. Gjerdingen et al used a cut off of 2 and yielded higher sensitivity. Bennet et al used dichotomous scoring system (closer to the Whooley Questions) and yielded higher sensitivity. 11. Original PHQ2 validation study – Perhaps some exploration of the difference in sensitivities, why did they get a sensitivity of 82%, study design? Sample size? 12. In last paragraph, use of the word ‘sensitively’ referring to questioning style not helpful in a discussion about statistical sensitivity. 13. Maternity and primary care services require simple, quick screening tools to know who to refer. This paper calls in to question the appropriateness of the PHQ2, which aligns with the literature raising questions about the validity of the Whooley questions. The findings in this paper may impact on service provision and making decisions about which screening tools to use. It suggests that more comprehensive tools may be needed as a first line with potential cost/implementation implications. Conclusion (page 8) 14. Shorter, more clearly written conclusion required. Introducing new topics that could have been brought in earlier in the paper. Tables (page 14) 15. 
Table 2 - ‘At least probable minor depression’ - this is a confusing name for the category. Consider ‘screened positive for depression’ or ‘EDPS positive for depression’. Something clearer to be used consistently throughout the paper. 16. Table 2 – Prevalence is usually expressed as a proportion

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: Yes: Dr. Ahmed Waqas
Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]
28 Oct 2021 Manuscript: The screening accuracy of the Patient Health Questionnaire-2 in detecting depression among perinatal women in Italy Response to reviewers (responses are shown highlighted) We address the reviewers concerns in this letter, and corresponding changes have been made to the manuscript. Reviewer #1 Thank you for inviting me to review this manuscript. It reads well, and the analyses presented are accurate. However, there is a major limitation of this study; the assessment of criterion validity of PHQ-2 must be done against a gold standard i.e. diagnostic interviews by mental health specialists. The authors mention that psychodiagnostic interviews have been conducted by a clinical psychologist, as part of the study. Perhaps they could revise the analyses using the interview data rather than the EPDS one. I am sorry I could not be much encouraging of the choice of EPDS as a gold standard comparator and added further to the investigators' work. However, with updated analyses, the quality of the manuscript will be significantly improved. We thank the reviewer for his/her valuable comments and we agree that the assessment regarding the criterion validity of PHQ-2 should be carried out against a gold standard, i.e., a diagnostic instrument. However, our aim was not to assess the diagnostic accuracy of the PHQ-2 but rather its screening accuracy in an attempt to corroborate the notion that it could be used instead of longer screening instruments, such as the EPDS. In other words, the aim of the current analysis was to reduce the burden of screening for depression which may derive from an inaccurate pre-screen step, and not to diagnose major depressive disorders, which should be performed by mental health specialists in a subsequent supplementary phase. 
Moreover, our study extends some of the work of other validation studies of the PHQ-2 that used the EPDS as we did (Bennett et al., 2008; Chae et al., 2012; Cutler et al., 2007; Slavin et al., 2020) as a reference standard. Our study may help establish the reliability of those studies that did not find relatively high sensitivities and specificities of the PHQ-2 compared with the EPDS (Cutler et al., 2007; Slavin et al., 2020). In the future, we would be happy to assess the criterion validity of the PHQ-2 against a diagnostic instrument. In the present study, even if we wanted to, we wouldn’t be able to do it because the interviews were not diagnostic interviews. We apologise because we inadequately described the interview as diagnostic. We have now realised that the term psycho-diagnostic was inappropriate and we have removed it. The interviews were semi-structured to elicit some information regarding current and past maternal experience with severe psychiatric conditions and use of psychotropic drugs. In particular, psychiatric conditions included positive psychotic (i.e., delusions and/or hallucinations), non-suicidal self-harm tendencies, suicidal ideation or substance abuse symptoms. Women that reported current or past delusions and/or hallucinations, self-harm tendency or suicidal ideation or substance abuse symptoms were excluded from the study and invited to be further assessed by a psychiatric consultation service. This was done because the interviews were primarily finalized to ascertain the eligible criteria for the participation of women to a subsequent study on the effectiveness of a psychological intervention. The EPDS and PHQ-2 could be compared to establish convergent validity but not criterion validity. This is because EPDS itself is a screening instrument and that too not with a perfect sensitivity and specificity. We agree. Alternatively, the classic approach was actually to establish convergent validity. 
However, we were not interested in detecting a linear correlation (Pearson or Spearman correlation) between the continuous scores of the instruments, as this correlation would have likely been weak to moderate and poorly informative. On the contrary, we were more interested in the percentages of false-negatives and false-positives. Using dichotomous variables (positive vs. negative results) instead of continuous variables, as an alternative, we could have calculated the percentages of exact agreements between the two instruments using contingency tables and the chi-squared distribution. However, this analysis would have produced similar results to an accuracy analysis. Reviewer #2 The paper is difficult to read, with the author at times using confusing, ambiguous and superfluous phrases. This often obscures the meaning. Shorter, more precise sentences throughout would make it easier to read. We tried our best in the present version of the manuscript. In this regard, we would like to specify that the article had been sent to a paid language editor at the time of its first submission. Specific comments according to subheading: Title (page 1) 1. Fine – although it could be more bold for impact, for instance the title could question the validity of using the PHQ-2 as a screening tool in perinatal depression. We have now slightly modified the title highlighting the limited validity of using PHQ-2 in perinatal depression. Abstract (page 2) 2. Last line confusing. The PHQ-2 performance suggested that it has insufficient sensitivity and discriminatory power, and may be inadequate as a screening tool for maternal depression. Thank you very much for the suggestion. We have now replaced the previous phrase with the phrase you suggested. Introduction (page 3) 3. Paragraph 1 - Remove ‘indeed, among other things’. 
Perinatal depression is associated with a range of adverse outcomes (neonatal outcomes, substance and alcohol abuse, poor attendance at antenatal care, bonding, family, long-term emotional and cognitive outcomes for the child). We have now removed it. 4. Paragraph 2 - The clinical management of perinatal depression is evidence-based and includes pharmacological and psychological approaches. The treatments mentioned here in brackets are not internationally recognised - are they names of group/psychological therapies for women? Given that this paper has international interest I would make sure treatment interventions are described. We have now referred to internationally recognized psychological/psychosocial interventions and we have omitted the interventions that we previously reported. 5. Paragraph 4 - Consider referencing the literature about the Whooley questions which questions the utility and accuracy of ultra-rapid screening tools (mentioned briefly in the conclusion). The NICE guidelines in 2007 were made in the absence of validation studies in the perinatal population. There is limited evidence for the use of the Whooley questions as a screening tool for maternal depression, only having been validated in small perinatal populations with a wide range of sensitivities found. We have added some additional information regarding the validity of the Whooley questions. Methods (page 4) 6. More detail to allow replicability. We have added more details in the Procedure and participants section. Results (page 5) 7. First paragraph - Sentence does not need to be in brackets, can be part of the text. It is now part of the text. Were all the questionnaires completed in full? What happened to partially completed questionnaires? We have added that information to the Procedure and participants section. 8. First paragraph - Could calculate the statistical significant of associations, using (?) non-parametric testing. 
The chi-square test (or Fisher exact test) was used to test for differences between women with positive EPDS and women without for each socio-demographic characteristic. We have now added this information in the statistical analysis section. Then statements could be made regarding positive associations e.g. – such as ‘amongst pregnant women, screening positive for depression on the EPDS was associated with secondary or university level education’. This has now been done. This is interesting because women from lower educational/socioeconomic groups are less likely to access perinatal mental health care, perhaps women from those groups less likely to disclose MH difficulties in screening questionnaires. Perhaps outside the scope of this paper. Thank you for your discussion. We focused on this issue in another paper that we recently submitted to another journal. We agree that lower educational/socioeconomic groups are less likely to access perinatal mental health care. In fact, the sample in this study had a higher level of education and better financial situation compared to the general population of Italian women. For example, in the general population, about 22% of those aged 25-64 and 33% of those aged 30-34 years have obtained a University degree. 9. Second and third paragraph – confusingly worded. We have rephrased those paragraphs. Discussion (page 6) 10. Develop explanation for the recommended scoring cut off for PHQ2. Gjerdingen et al used a cut off of 2 and yielded higher sensitivity. Bennet et al used dichotomous scoring system (closer to the Whooley Questions) and yielded higher sensitivity. We have expanded our discussion in the paragraphs in which we had discussed the findings of Gjerdingen and Bennet at the end of the Discussion section. 11. Original PHQ2 validation study – Perhaps some exploration of the difference in sensitivities, why did they get a sensitivity of 82%, study design? Sample size? 
We are unable to explain why the original study registered a sensitivity of 82%. However, we suppose that in that validation study, the physicians reviewed the PHQ-2 and asked some additional questions in order to clarify responses on the questionnaires of patients who they felt might have “false negative” PHQ-2 results, as they did previously in the PRIME-MD study (Spitzer RL, Williams JBW, Kroenke K et al. Validation and utility of a self-report version of PRIME-MD. JAMA 1999; 282 (18): 1737-1744), in which the Authors used the longer PHQ-9. This may have inflated the sensitivity. In general, the vast majority of available studies on the accuracy of the PHQ-2 reported lower sensitivity than that in the original validation study. Accordingly, a recent systematic review has shown that in settings such as primary care or some inpatient and outpatient specialty care, the PHQ-2 was up to 62% sensitive for a cut-off score of 3 or greater in studies that used fully structured interviews as reference standards (Levis et al., 2020). 12. In last paragraph, use of the word ‘sensitively’ referring to questioning style not helpful in a discussion about statistical sensitivity. We have now omitted that word and the phrase. 13. Maternity and primary care services require simple, quick screening tools to know who to refer. This paper calls in to question the appropriateness of the PHQ2, which aligns with the literature raising questions about the validity of the Whooley questions. The findings in this paper may impact on service provision and making decisions about which screening tools to use. It suggests that more comprehensive tools may be needed as a first line with potential cost/implementation implications. We agree with this valid point and have now referred to it in the text. Conclusion (page 8) 14. Shorter, more clearly written conclusion required. Introducing new topics that could have been brought in earlier in the paper. 
This has now been done, also using your valuable comments (see point 13).

Tables (page 14)

15. Table 2 - ‘At least probable minor depression’ - this is a confusing name for the category. Consider ‘screened positive for depression’ or ‘EPDS positive for depression’. Something clearer to be used consistently throughout the paper.

Thank you. This has now been done.

16. Table 2 – Prevalence is usually expressed as a proportion.

Thank you. This has now been done.

Submitted filename: Responses to reviewers.docx

15 Nov 2021

The limited screening accuracy of the Patient Health Questionnaire-2 in detecting depression among perinatal women in Italy

PONE-D-21-08926R2

Dear Dr. Gigantesco,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication.
For more information, please contact onepress@plos.org.

Kind regards,

Angela Lupattelli, PhD
Academic Editor
PLOS ONE

17 Nov 2021

PONE-D-21-08926R2

The limited screening accuracy of the Patient Health Questionnaire-2 in detecting depression among perinatal women in Italy

Dear Dr. Gigantesco:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff
on behalf of
Dr. Angela Lupattelli
Academic Editor
PLOS ONE

38 in total

1. Variability in use of cut-off scores and formats on the Edinburgh Postnatal Depression Scale: implications for clinical and research practice.

Authors: S Matthey; C Henshaw; S Elliott; B Barnett
Journal: Arch Womens Ment Health Date: 2006-10-02 Impact factor: 3.633

2. Can we effectively use the two-item PHQ-2 to screen for postpartum depression?

Authors: Sung Y Chae; Mark H Chae; Alina Tyndall; Maria Rochelle Ramirez; Robin O Winter
Journal: Fam Med Date: 2012 Nov-Dec Impact factor: 1.756

3. Identification of postpartum depression.

Authors: Dorothy K Y Sit; Katherine L Wisner
Journal: Clin Obstet Gynecol Date: 2009-09 Impact factor: 2.190

Review 4. Postpartum Depression Screening Tools: A Review.

Authors: Nneamaka Ukatu; Camille A Clare; Mary Brulja
Journal: Psychosomatics Date: 2017-11-23 Impact factor: 2.386

5. Detection of postnatal depression. Development of the 10-item Edinburgh Postnatal Depression Scale.

Authors: J L Cox; J M Holden; R Sagovsky
Journal: Br J Psychiatry Date: 1987-06 Impact factor: 9.319

6. Efficiency of a two-item pre-screen to reduce the burden of depression screening in pregnancy and postpartum: an IMPLICIT network study.

Authors: Ian M Bennett; Andrew Coco; James C Coyne; Alex J Mitchell; James Nicholson; Ellen Johnson; Michael Horst; Stephen Ratcliffe
Journal: J Am Board Fam Med Date: 2008 Jul-Aug Impact factor: 2.657

Review 7. Approaches to health-care provider education and professional development in perinatal depression: a systematic review.

Authors: Laura E Legere; Katherine Wallace; Angela Bowen; Karen McQueen; Phyllis Montgomery; Marilyn Evans
Journal: BMC Pregnancy Childbirth Date: 2017-07-24 Impact factor: 3.007

8. Questioning the "SPIN and SNOUT" rule in clinical testing.

Authors: Jean-Pierre Baeyens; Ben Serrien; Maggie Goossens; Ron Clijsen
Journal: Arch Physiother Date: 2019-03-07

9. Cognitive behaviour therapy-based intervention by community health workers for mothers with depression and their infants in rural Pakistan: a cluster-randomised controlled trial.

Authors: Atif Rahman; Abid Malik; Siham Sikander; Christopher Roberts; Francis Creed
Journal: Lancet Date: 2008-09-13 Impact factor: 79.321

10. Perspectives on Early Screening and Prompt Intervention to Identify and Treat Maternal Perinatal Mental Health. Protocol for a Prospective Multicenter Study in Italy.

Authors: Loredana Cena; Gabriella Palumbo; Fiorino Mirabella; Antonella Gigantesco; Alberto Stefana; Alice Trainini; Nella Tralli; Antonio Imbasciati
Journal: Front Psychol Date: 2020-03-11