
Instruments to Identify Symptoms of Paternal Depression During Pregnancy and the First Postpartum Year: A Systematic Scoping Review.

Rigmor C Berg^1,2, Beate Larsen Solberg³, Kari Glavin³, Nina Olsvold³.

Abstract

Men often experience depressive symptoms during the transition to parenthood, but there is a lack of synthesized knowledge of instruments used to identify such symptoms. The aim of this scoping review was to identify instruments used to measure depressive symptoms among fathers in pregnancy and the postpartum period, and to describe the instruments' characteristics and measurement properties. We identified studies published since 1990 through searches in databases such as MEDLINE, EMBASE, and PsycINFO and in gray literature. Pairs of reviewers selected relevant studies based on predetermined inclusion criteria. For each included study, we collected information relevant to the review question, guided by the COnsensus based Standards for the selection of health status Measurement INstruments (COSMIN). We included 13 instruments, described in 59 studies with about 29,000 participants across 25 countries. There were 12 validation studies. None of the instruments were uniquely developed for assessing paternal depressive symptoms related to fatherhood. The three most extensively examined instruments were the Edinburgh Postnatal Depression Scale (EPDS), Center for Epidemiologic Studies Depression Scale, and Beck Depression Inventory. For seven of the 13 instruments, there was no information reported about the instruments' properties beyond internal consistency, but for the other six instruments the 12 validation studies reported on both reliability and validity. No studies reported on measurement error or responsiveness. EPDS was both the most extensively assessed instrument and reported to be the most reliable and valid. Further research on instruments for identifying men with depression in pregnancy and the postpartum period is warranted.


Keywords: depressive symptoms; fathers; instruments; postpartum; prenatal


Year: 2022 PMID: 36124356 PMCID: PMC9490477 DOI: 10.1177/15579883221114984

Source DB: PubMed Journal: Am J Mens Health ISSN: 1557-9883

Introduction

Becoming a parent is a joyous life transition for both men and women, involving major changes (Condon et al., 2004). The pregnancy and the postpartum period, including the first year after childbirth, while being a positive experience for most parents, can be mentally challenging (Darwin et al., 2017; Philpott et al., 2020). Studies report that depressive symptoms related to the pregnancy and the postpartum period affect fathers as well as mothers. Many fathers describe experiencing psychological difficulties and negative feelings such as stress, anxiety, confusion, uncertainty, helplessness, worries, fear, and frustration (Darwin et al., 2017; Philpott et al., 2020; Shorey & Chan, 2020). Such feelings are often linked to reduced mental health and depression. Within traditional health care, there has been both limited awareness and a lack of attention toward the mental health challenges some fathers experience in relation to pregnancy and the postpartum period (Kim & Swain, 2007). Consequently, fathers’ depressive symptoms can be difficult to detect (Darwin et al., 2017; Shorey & Chan, 2020). Depressive symptoms in fathers during pregnancy and the postnatal period are often referred to as paternal postpartum depression (PPD), but there is no universal definition of PPD (Cameron et al., 2016; Paulson & Bazemore, 2010). Experts state that there is a lack of diagnostic tools developed exclusively for screening symptoms of PPD (Cameron et al., 2016; Musser et al., 2012; Paulson & Bazemore, 2010). Tools used in pregnancy and postpartum care are often developed for mothers or the general population outside maternity care, and often do not account for gender differences in symptoms (Cameron et al., 2016; Madsen & Juhl, 2007). 
Given the variable methods of measuring and reporting and the lack of standardized guidelines, prevalence estimates of paternal depressive symptoms vary widely in the literature (Cameron et al., 2016; Musser et al., 2012; Paulson & Bazemore, 2010). Meta-analyses indicate a depression prevalence of 8.4% to 10.8% among fathers during pregnancy and the first year after childbirth (Cameron et al., 2016; Paulson & Bazemore, 2010). The highest rates of depression were identified 3 to 6 months after birth. Compared with the base rate of depression in the general adult male population, 4.8%, PPD represents a significant health concern (Cameron et al., 2016; Paulson & Bazemore, 2010). PPD appears to differ from maternal postpartum depression in many ways. Fathers seem to have a mix of both traditional and more insidious and less obvious symptoms, such as somatic symptoms; withdrawal from social situations, work and/or family; indecisiveness and avoidance; as well as irritability, anger, and affective rigidity (Kim & Swain, 2007; Musser et al., 2012; Psouni et al., 2017). Alcohol use, drug use, and partner violence can be expressions of male depression (Kim & Swain, 2007; Musser et al., 2012). PPD not only affects the father’s health but also leads to an increased risk of disharmony in partner relationships (Goodman, 2004; Ramchandani et al., 2011). Studies have also documented an increased risk of negative effects on infant bonding and child development (Kerstis et al., 2016; Ramchandani et al., 2011). There is a moderate positive correlation between maternal and paternal depression (Goodman, 2004; Paulson & Bazemore, 2010; Ramchandani et al., 2011).
These findings indicate that prevention of and intervention for postpartum depression should have a family-focused perspective, and they underscore the importance of good mapping and identification of depressive symptoms among fathers as well (Kerstis et al., 2016; Paulson & Bazemore, 2010; Philpott et al., 2020; Ramchandani et al., 2011). In international research and in antenatal and perinatal health care practice, a number of different instruments are used to measure and identify symptoms of paternal depression (Cameron et al., 2016; Musser et al., 2012). However, there is a lack of synthesized knowledge of depression measurement instruments for fathers in pregnancy and the postpartum period. Given the need for guidelines and synthesized knowledge of available diagnostic tools to help health care workers and researchers make informed choices in the selection of depression measurement instruments, an important first step and contribution to the field is to synthesize information on instruments for PPD. To further our understanding of the conceptual diversity and cultural applicability of existing instruments, the aim of this scoping review was to synthesize knowledge of the number, characteristics, and measurement properties of instruments for paternal depressive symptoms in pregnancy and the postpartum period.

Method

Study Design

We conducted a scoping review, which provides a descriptive account of available research on a particular topic. This type of systematic review tends to have a broader approach than a traditional systematic review of, for example, effect or experience and is used to present a broad overview of the evidence pertaining to a topic, irrespective of study quality and without in-depth synthesis of the results (Arksey & O’Malley, 2005; Tricco et al., 2018). Scoping reviews are performed in the same systematic way as traditional systematic reviews, with verifiable and transparent methods (Munn et al., 2018; Peters et al., 2015). Given that the purpose of scoping reviews is to describe the current research on a topic, they are useful for summarizing the existing research, identifying research gaps, and making recommendations for further research (Arksey & O’Malley, 2005; Munn et al., 2018; Peters et al., 2015). We conducted the scoping review in accordance with the five-stage methodological framework proposed by Arksey and O’Malley (2005) and further enhanced by Levac et al. (2010): (a) identifying the research question, (b) identifying relevant studies, (c) selecting relevant studies, (d) charting the data, and (e) collating, summarizing, and reporting results. Furthermore, we developed a protocol, which is available upon request, and we report in accordance with the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) Checklist (Tricco et al., 2018).

Research Steps

Identifying the Research Question

Prior to determining the research question, we extensively scoped and read existing literature. Our question was, “Which instruments are used to identify symptoms of paternal depression during pregnancy and the postpartum period, and what are their characteristics and measurement properties?”

Identifying Relevant Studies

An information search specialist conducted the systematic search in the following scientific databases: MEDLINE, EMBASE, PsycINFO, CINAHL, and Health and Psychosocial Instruments (HaPI). The database searches were limited to the period from January 1990 to June 18, 2020. The search strategy was developed in collaboration with the review team and piloted, with combinations of Boolean phrases and truncation strategies to expand and finally narrow the search for relevant publications. The final search strategy was peer reviewed by another librarian and combined the following keywords and their synonyms, using neither language nor methodology search filters: (Scale* OR instrument* OR Psychometrics OR Psychometry OR Questionnaire OR Measurement* OR Psychological test) AND (Depression OR Depression symptoms OR Depressive symptoms OR Depressive disorder) AND (Father* OR Paternal OR Dad* OR Men) AND (Prenatal OR Prepartum OR Peripartum OR Perinatal OR Puerperium OR Postnatal OR Postpartum OR Antenatal OR Antepartum OR Preg* OR Pregnancy OR Childbirth OR Birth OR Parturition). In addition, from June 18 to 22, 2020, we searched for gray literature in web-based search engines (OpenGrey, ResearchGate, Google Scholar), using key terms from the main searches. We also searched the reference lists of relevant reviews and included studies for further relevant studies, as well as the home pages of relevant organizations and the EU Clinical Trials Register.
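The structure of the search string above (synonyms joined with OR inside each concept block, and the four blocks joined with AND) can be sketched in a few lines of code. This is an illustration only: the helper `build_query` and the block labels are hypothetical names, and the sketch does not reproduce the field tags or syntax of any particular database interface.

```python
# Illustrative only: assembles the four concept blocks reported in the
# search strategy into one Boolean string. `build_query` and the block
# labels are hypothetical, not part of any database API.

CONCEPT_BLOCKS = {
    "instrument": ["Scale*", "instrument*", "Psychometrics", "Psychometry",
                   "Questionnaire", "Measurement*", "Psychological test"],
    "depression": ["Depression", "Depression symptoms",
                   "Depressive symptoms", "Depressive disorder"],
    "fathers": ["Father*", "Paternal", "Dad*", "Men"],
    "perinatal": ["Prenatal", "Prepartum", "Peripartum", "Perinatal",
                  "Puerperium", "Postnatal", "Postpartum", "Antenatal",
                  "Antepartum", "Preg*", "Pregnancy", "Childbirth",
                  "Birth", "Parturition"],
}


def build_query(blocks):
    """OR the terms within each block, then AND the blocks together."""
    grouped = ["(" + " OR ".join(terms) + ")" for terms in blocks.values()]
    return " AND ".join(grouped)


query = build_query(CONCEPT_BLOCKS)
print(query)
```

Keeping each concept in its own list makes it easy to see that dropping a block broadens the search, whereas adding a synonym to a block can only add records.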

Selecting Relevant Studies

We stored retrieved references in an EndNote X9 database and deleted duplicates. Next, we imported all unique records into the screening tool Rayyan QCRI (Ouzzani et al., 2016) for an independent selection process by pairs of reviewers. Pairs first screened all titles and abstracts in accordance with the inclusion and exclusion criteria. Next, they screened the full texts of all records they agreed were relevant according to the inclusion criteria. For each of the two screening levels, we used predesigned inclusion forms. We resolved differences in opinion in the screening process through reexamination of the publication and subsequent discussion. If there was more than one publication based on the same study population, the most informative publication for our purposes was included. With respect to the inclusion and exclusion criteria, given the aim of the review, our main inclusion criterion was that the study reported one or more psychometric properties of an instrument to measure PPD. Specifically, eligible studies had as their population expectant fathers or fathers in the postpartum period, including the first 12 months after childbirth. We enforced no restrictions regarding the father’s age, residence, ethnicity, and so on, and studies describing depression in both parents were included provided the fathers’ results were reported separately. The instruments could be of any type and used in any health care setting, but had to be developed with the aim of measuring symptoms of paternal depression during pregnancy or the postpartum period, or be evaluated for their ability to measure symptoms of paternal depression during this period although they were not originally developed for this purpose. The studies needed to report data about one or more measurement properties. We based our understanding of measurement property on the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN; Mokkink et al., 2010).
We included all study designs and publication types published since 1990 and written in English, Norwegian, Swedish, or Danish. We excluded studies not available in full text, such as conference presentations.

Charting the Data

To enable consistency, we collected standard information on each study by applying a common predesigned data charting form to all the research reports. One reviewer extracted data, another reviewer checked the completeness and accuracy of the extraction, and the two reviewers resolved differences through reexamination of the publication and subsequent discussion. For each study, we extracted data on publication details, study methods, sample characteristics, type of instrument, time of measurement, and setting. Importantly, we extracted data on the following measurement properties, based on the COSMIN definitions of measurement properties (Mokkink et al., 2010; Terwee et al., 2007): reliability (internal consistency, reliability, measurement error), validity (content, construct and criterion validity), and responsiveness (Table 1).

Table 1.

Definitions of Measurement Properties Based on COSMIN.

Domain	Measurement properties	Definition
Reliability = The degree to which the measurement is free from measurement error.	Internal consistency	The extent to which items in the instrument are correlated, thus measuring the same concept.
	Reliability	The extent to which participants can be distinguished from each other, despite measurement errors.
	Measurement error	The systematic and random error of a patient’s score that is not attributed to true changes in the construct to be measured.
Validity = The degree to which an instrument measures the construct it purports to measure.	Content validity	The degree to which the instrument is an adequate reflection of the construct to be measured, with the following aspects: measurement aim of the questionnaire, concepts that the questionnaire is intended to measure, item selection/reduction, and interpretability of the items.
	Construct validity	The degree to which the scores of the instrument relate to other measures, in a manner that is consistent with hypotheses based on the assumption that the instrument validly measures the construct to be measured.
	Criterion validity	The degree to which the scores of the instrument are an adequate reflection of a “gold standard.”
Responsiveness		The ability of the instrument to detect clinically important changes over time.

Note. COSMIN = COnsensus based Standards for the selection of health status Measurement INstruments.

Mokkink et al. (2010) and Terwee et al. (2007).


Collating, Summarizing, and Reporting Findings

After compiling the data in a single spreadsheet (Excel), we sorted the extracted data to get an overview of the material. We grouped the data into clusters according to measurement instruments and measurement properties, following a data-driven approach (Arksey & O’Malley, 2005; Peters et al., 2015), and carried out descriptive analyses by using frequencies and cross-tabulations.
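As a rough sketch of what descriptive analysis with frequencies and cross-tabulations involves, the snippet below counts a handful of charting rows. The rows are hypothetical toy data, not the review's extracted data, and the variable names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical charting rows (instrument, study design); NOT the review's data.
charted = [
    ("EPDS", "Validation"),
    ("EPDS", "Longitudinal"),
    ("CES-D", "Cross-sectional"),
    ("EPDS", "Validation"),
    ("BDI", "Longitudinal"),
]

# Frequencies: number of charted rows per instrument.
by_instrument = Counter(instrument for instrument, _ in charted)

# Cross-tabulation: counts for each (instrument, design) pair.
crosstab = Counter(charted)

for instrument, n in by_instrument.most_common():
    share = 100 * n / len(charted)
    print(f"{instrument}: n = {n} ({share:.1f}%)")
```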

Results

Search Results

The search yielded 3,333 records. Following the removal of duplicates, we screened 1,868 titles and abstracts, and 391 full texts. We included 59 studies describing 13 instruments. The results of the data selection process are presented in Figure 1.

Figure 1.

PRISMA Flow Diagram of the Selection Process.

Characteristics of the Included Studies

Characteristics of the 59 included studies are presented in Table 2, with the instruments they describe in Table 3, which provides references to all included studies. All 59 studies were published between 1991 and 2020, in English, in the form of journal articles. The studies were from 25 different countries, with the most studies from the United States (n = 14, 24%) and Italy (n = 7, 12%). In addition to the countries listed in Table 2, there was one study each from Brazil, Chile, Denmark, Greece, Hong Kong, Iran, Israel, Netherlands, New Zealand, Norway, Saudi Arabia, Switzerland, and Vietnam.

Table 2.

Summary Characteristics of the Included Studies (N = 59).

Study characteristics	N (%)
Year of publication
1991–1999	3 (5.1)
2000–2005	1 (1.7)
2006–2010	9 (15.3)
2011–2015	16 (27.1)
2016–2020	30 (50.8)
Country
Australia	4 (6.8)
Canada	2 (3.4)
China	3 (5.1)
England	2 (3.4)
Finland	2 (3.4)
Italy	7 (11.8)
Japan	3 (5.1)
Portugal	3 (5.1)
Sweden	2 (3.4)
Taiwan	2 (3.4)
Turkey	2 (3.4)
USA	14 (23.7)
Other (one study from each country)^a	13 (22.0)
Number of study participants
<50	3 (5.1)
50–99	12 (20.3)
100–499	31 (52.5)
500–1000	8 (13.6)
>1000	5 (8.5)
Recruitment location
Hospital clinics	32 (54.2)
Primary/community care	16 (27.1)
Other (registers, social networks/media, advertisement)	11 (18.7)
Study design
Cross-sectional	21 (35.6)
Longitudinal	26 (44.1)
Validation	12 (20.3)
Time of measurement of depression^b
Pregnancy	30 (50.8)
0–6 months postpartum	46 (78.0)
6–12 months postpartum	9 (12.2)

^a Shown in the text. ^b More than one answer possible.

Table 3.

Characteristics of the Instruments (N = 13).

Instrument	Form/version	Developed for/aim/content	Number of items	Response options (score)	Timeframe	Described in
Beck Depression Inventory (BDI)	BDI	Measures severity and depth of depression symptoms in general population, persons ≥ 13 years	21	0–3 (0–63)	Past week	Lai et al. (2010) Magalhaes et al. (2008) Pérez et al. (2018)
	BDI short form (BDI-13)		13	0–4 (0–52)	Past week	Vanska et al. (2017)
Brief Symptom Inventory (BSI)	Depression scale (DEP), which is one of nine dimensions of BSI	BSI = Measure of nine primary dimensions of psychiatric symptoms (53 items). DEP = six questions about thoughts of ending life, feeling lonely, worthlessness, no interest in things, hopelessness about the future	6	0–4	Past week	Van den Berg et al. (2009)
Center for Epidemiologic Studies Depression Scale (CES-D)	CES-D	Measures cognitive, somatic, and psychological depressive symptoms in community populations	20	0–3	Past week	Biehle and Mickelson (2011) Ferketich and Mercer (1995) Formica et al. (2018) Hunt et al. (2015) Reut and Kanat-Maymon (2018) Richman et al. (1991) Williams et al. (2012)
	CES-D		20	0–3	Past month	Caldwell et al. (2018)
	CES-D short version: five items about somatic manifestations of depression removed		15	0–3	Past week	Divney et al. (2012)
	CES-D short form		12	0–3 (0–36)	Not stated	Paulson et al. (2009)
Chinese Health Questionnaire (CHQ)	CHQ	Screening instrument to identify nonpsychotic psychiatric disorders in community populations	12	Not stated	Not stated	Hung et al. (1996)
Depression Anxiety and Stress Scale (DASS)	DASS short form (DASS-21)	DASS (42 items) measures emotional state of depression, anxiety, and stress in general population (not categorical measure of clinical diagnosis)	21	0–3	Past week	Wee et al. (2015)
Edinburgh Postnatal Depression Scale (EPDS)	EPDS	Developed to detect and measure the risk of postpartum depression in community samples/primary care. Eight statements about depressive symptoms and two about anxiety symptoms. Originally developed to detect postpartum depression in women	10	0–3 (0–30)	Past week	Bamishigbin et al. (2017) Boeding et al. (2019) Carlberg et al. (2018) Cameron et al. (2020) Chen et al. (2019) Cockshaw et al. (2014) Cyr-Alves et al. (2018) Da Costa et al. (2019) Edmondson et al. (2010) Epifanio et al. (2015) Fredriksen et al. (2019) Gao et al. (2009) Goodman (2008) Gürber et al. (2017) Howarth and Swain (2020) Jia et al. (2020) Koc and Ergol (2018) Korja et al. (2018) Lai et al. (2010) Loscalzo et al. (2015) Maleki et al. (2018) Mao et al. (2011) Massoudi et al. (2013) Matthey et al. (2001) Mihelic et al. (2018) Molgora et al. (2017) Nishimura and Ohashi (2010) Nishimura et al. (2015) Pérez et al. (2018) Pinto et al. (2016) Prino et al. (2016) Rolle et al. (2017) Roubinov et al. (2014) Serhan et al. (2013) Shaheen et al. (2019) Tran et al. (2012) Vismara et al. (2016) Warriner et al. (2018)
Gotland Male Depression Scale (GMDS)	GMDS	Developed to detect and measure major depression in males by focusing on “male depressive symptoms.” Items are related to typical depression, including items on irritability, aggression, acting-out behavior, and alcohol abuse	13	0–3 (0–39)	Past month	Carlberg et al. (2018) Madsen and Juhl (2007)
Hospital Anxiety and Depression Scale (HADS)	HADS	Developed to detect and measure states of depression and anxiety in hospital settings	14	0–3 (0–42)	Past week	Brandão et al. (2019)
Kessler Psychological Distress Scale (K10)	K10	Developed to detect and measure anxiety and depressive symptoms in general population surveys	10	0–4 (0–40)	Past month	Konishi et al. (2016)
	K10 short version (K6)		6	0–4 (0–24)	Past month	Konishi et al. (2016)
Paternal Adjustment and Paternal Attitudes Questionnaire (PAPA)	PAPA-AN (Antenatal version)	PAPA = measures paternal adjustment and attitudes during the transition to parenthood (30 items with three subscales to assess sexual and marital relationship, attitudes towards pregnancy and the baby; all scales associated with psychopathological symptoms (e.g., depression, anxiety) during this period). PAPA-AN same items as PAPA but adjusted for the pregnancy period	30	1–4 (0–120)	Not stated	Pinto et al. (2017)
	PAPA-PN (Postnatal version)	PAPA-PN same items as PAPA but adjusted for the postpartum period	30	1–4 (0–120)	Not stated	Pinto et al. (2017)
Patient Health Questionnaire (PHQ)	PHQ-9 (Depression module of PHQ)	PHQ = Developed as a screening and diagnostic tool for mental health disorders of depression, anxiety, alcohol, eating and somatic disorders. PHQ-9 = Module of PHQ to assess symptoms of depression	9	0–3 (0–27)	Past 2 weeks	Lai et al. (2010) Konishi et al. (2016)
Postpartum Depression Screening Scale (PDSS)	PDSS short version	PDSS = Developed to identify women with high risk for postpartum depression (35 items). PDSS short version = 11 items	11	1–5 (11–55)	Past week	Don and Mickelson (2012)
Zung’s Self-rating Depression Scale (SDS)	SDS	Developed to measure depression-related symptoms in clinic settings	20	1–4 (20–80)	Not stated	Alexopoulou et al. (2018) Hung et al. (1996)

The sample sizes varied from 36 to 5,098, with a total of 29,053 participants across the 59 studies. Most studies recruited fathers in a hospital setting (n = 32, 54%) or in primary or community health care (n = 16, 27%). Given their aim to examine an instrument’s measurement properties, we labeled 12 studies (20%) validation studies, meaning studies that examine the extent to which an assessment measures what it is supposed to measure (Fox et al., 2020). Twenty-nine studies measured PPD more than once, with most studies measuring depression during pregnancy (n = 30, 51%) and/or 0 to 6 months postpartum (n = 46, 78%). With respect to participant characteristics, we note that all but two (4%) of the studies included fathers older than 18 years and 20 studies (34%) concerned first-time fathers. Two (4%) studies focused on low-income parents, two (4%) focused on fathers of preterm babies, and a few of the samples had couples receiving infertility treatment, twin babies, or babies born by cesarean section.

Characteristics of the Instruments

We identified 13 instruments with data on their ability to measure symptoms of PPD. The instruments and their characteristics are described in Table 3. Note that five studies reported on two instruments (Carlberg et al., 2018; Hung et al., 1996; Konishi et al., 2016; Pérez et al., 2018; Pinto et al., 2017) and one study on three (Lai et al., 2010). The three most frequently examined instruments were the Edinburgh Postnatal Depression Scale (EPDS), Center for Epidemiologic Studies Depression Scale (CES-D), and the Beck Depression Inventory (BDI), assessed in 38, 10, and four studies, respectively. The other 10 instruments were assessed in only one or two studies each: Brief Symptom Inventory (BSI), Chinese Health Questionnaire (CHQ), Depression Anxiety and Stress Scale (DASS), Gotland Male Depression Scale (GMDS), Hospital Anxiety and Depression Scale (HADS), Kessler Psychological Distress Scale (K10), Paternal Adjustment and Paternal Attitudes Questionnaire (PAPA), Patient Health Questionnaire (PHQ), Postpartum Depression Screening Scale (PDSS), and Zung’s Self-rating Depression Scale (SDS). All 13 instruments were self-report questionnaires, completed by the fathers at home or in a clinical setting. The questionnaires were mostly distributed by mail or online, and some were answered in an interview setting with a health care worker. None of the instruments were specifically developed to measure symptoms of PPD. Rather, they were originally developed by other researchers to measure depressive symptoms in general samples (BDI, BSI, CES-D, CHQ, DASS, K10), depression in the context of hospital/clinical settings (HADS, PHQ, SDS), depression among women in the postpartum period (EPDS, PDSS), paternal adjustment and attitudes related to the transition to parenthood focusing on depression and anxiety during the pregnancy and the postpartum period (PAPA), and male depressive symptoms but not originally related to the pregnancy/postpartum context (GMDS).
The original versions of the instruments were used in all the studies describing CHQ, EPDS, GMDS, HADS, PAPA, and SDS, whereas nine studies used a reduced or short version of BDI, BSI, CES-D, DASS, K10, PDSS, and PHQ. Although several of the instruments focus mainly on the psychological/emotional dimension of depression (CES-D short version, DASS short form, EPDS, HADS, K10, PDSS), the majority focus on both somatic/physical and psychological/emotional depressive symptoms (BDI, BSI, CES-D, CHQ, GMDS, PAPA, PHQ, SDS). Across the 13 instruments, the number of items of the instruments varied from six to 30 and the response format was Likert-type scale, with either four or five points. The recall for self-report of experienced depressive symptoms was the past week (BDI, BSI, CES-D, DASS, EPDS, HADS, PDSS), past 2 weeks (PHQ), and past month (GMDS, K10). There was no timeframe reported for the instruments CHQ, PAPA, and SDS. Note that one of the 10 studies reporting on CES-D used the last month, rather than the last week, as timeframe. All instruments interpret high/low scores as high/low depression. With respect to the instruments’ cutoff scores for detecting depressive symptoms, no studies reported which cutoff score they used for BSI, DASS, and PDSS. For CES-D, all included studies reported the same cutoff score, ≥16. Only one study each reported on the cutoff score used for depressive symptoms for CHQ, HADS, and SDS, which was ≥3, ≥8, and >40, respectively. Cutoff scores for BDI, EPDS, GMDS, K10 & K6, PAPA, and PHQ are reported below.

The Instruments’ Measurement Properties

Overall, across the 59 studies there was sparse information about the instruments’ measurement properties. The exception was internal consistency (reliability), which almost all studies reported, as measured with the Cronbach’s alpha coefficient (Table 4). In the studies using the following 10 scales (BDI, BSI, CES-D, DASS, GMDS, K10, PAPA, PDSS, PHQ, and SDS), Cronbach’s alpha ranged from .74 to .91 in pregnancy, 0 to 6 months postpartum, and/or 6 to 12 months postpartum. Cronbach’s alpha for EPDS was .73 to .88 (pregnancy), .60 to .88 (0–6 months postpartum), and .73 to .81 (6–12 months postpartum). Two scales had a comparatively lower Cronbach’s alpha: CHQ (α = .67 in pregnancy and 0–6 months postpartum) and HADS (α = .64 in pregnancy).
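For readers less familiar with the coefficient, Cronbach's alpha for a k-item scale is k/(k − 1) × (1 − sum of item variances / variance of total scores). The following is a minimal sketch of that standard formula, not the procedure of any included study; the function name `cronbach_alpha` and the toy responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (equal lengths)."""
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # transpose to per-item columns
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Toy responses (4 respondents x 3 items, scored 0-3); hypothetical data.
toy = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 2],
    [3, 3, 3],
]
alpha = cronbach_alpha(toy)
print(round(alpha, 2))
```

Alpha approaches 1 when respondents who score high on one item also score high on the others, which is why it is read as the extent to which the items measure the same concept.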

Table 4.

Internal Consistency (Reliability) of the Instruments (N = 13).

Instrument	Time of measurement	Cronbach’s α coefficient
BDI	Pregnancy	.81
	0–6 months postpartum	.80–.91
	6–12 months postpartum	.84
BSI	Pregnancy	.75
CES-D	Pregnancy	.74–.89
	0–6 months postpartum	.83–.89
	6–12 months postpartum	.83–.89
CHQ	Pregnancy	.67
CHQ	0–6 months postpartum	.67
DASS	Pregnancy	.86–.91
EPDS	Pregnancy	.73–.88
	0–6 months postpartum	.60–.88
	6–12 months postpartum	.73–.81
GMDS	0–6 months postpartum	.88
HADS	Pregnancy	.64 (depression scale)
K10	Pregnancy	.79–.88
PAPA	Pregnancy	.90–.91
PAPA	0–6 months postpartum	.90–.91
PDSS	0–6 months postpartum	.83
PDSS	6–12 months postpartum	.86
PHQ	0–6 months postpartum	.88
SDS	Pregnancy	.83
SDS	0–6 months postpartum	.83–.90

BDI = Beck Depression Inventory; BSI = Brief Symptom Inventory; CES-D = Center for Epidemiologic Studies Depression Scale; CHQ = Chinese Health Questionnaire; DASS = Depression Anxiety and Stress Scale; EPDS = Edinburgh Postnatal Depression Scale; GMDS = Gotland Male Depression Scale; HADS = Hospital Anxiety and Depression Scale; K10 = Kessler Psychological Distress Scale; PAPA = Paternal Adjustment and Paternal Attitudes Questionnaire; PHQ = Patient Health Questionnaire; PDSS = Postpartum Depression Screening Scale; SDS = Zung’s Self-rating Depression Scale.

For seven of the 13 instruments (BSI, CES-D, CHQ, DASS, HADS, PDSS, and SDS), there was no information reported about the instruments’ properties beyond internal consistency. For the other six instruments (BDI, EPDS, GMDS, K10 & K6, PHQ, and PAPA), 12 studies reported on the instruments’ internal consistency and on other aspects of reliability and/or their validity (Tables 4 and 5). Below, we provide data about each of these six instruments’ measurement properties. The studies reported reliability with correlation coefficients (e.g., Cohen’s κ, Pearson’s correlation coefficient). Given that none of the instruments were specifically developed to measure symptoms of PPD, content validity was not applicable. In our included studies, construct validity was measured by examining the correlation with similar measures. For criterion validity, a structured clinical interview was used as the gold standard. No studies reported on measurement error or responsiveness (these properties are therefore not presented in Table 5).

Table 5.

Measurement Properties Based on COSMIN of Seven Instruments.

Study	Instrument	Internal consistency	Reliability	Construct validity	Criterion validity
Edmondson et al. (2010)	EPDS				√
Loscalzo et al. (2015)	EPDS	√		√
Massoudi et al. (2013)	EPDS	√			√
Matthey et al. (2001)	EPDS	√	√	√	√
Nishimura and Ohashi (2010)	EPDS			√
Shaheen et al. (2019)	EPDS				√
Tran et al. (2012)	EPDS	√			√
Lai et al. (2010)	EPDS	√	√	√	√
	BDI	√	√	√	√
	PHQ-9	√	√	√	√
Konishi et al. (2016)	K10	√		√
	K6	√		√
	PHQ-9	√		√
Carlberg et al. (2018)	GMDS	√		√
Madsen and Juhl (2007)	GMDS		√	√
Pinto et al. (2017)	PAPA-AN	√	√	√
	PAPA-PN	√	√	√

BDI = Beck Depression Inventory; EPDS = Edinburgh Postnatal Depression Scale; GMDS = Gotland Male Depression Scale; K10 = Kessler Psychological Distress Scale; K6 = K10 short version; PHQ = Patient Health Questionnaire; PAPA-AN = Paternal Adjustment and Paternal Attitudes Questionnaire–Antenatal version; PAPA-PN = Paternal Adjustment and Paternal Attitudes Questionnaire–Postnatal version.

BDI

Four studies, conducted in Brazil, Chile, Hong Kong, and Finland, gave information about BDI (Beck & Gable, 2000). They reported that internal consistency was .81 (pregnancy), .80 to .91 (0–6 months postpartum), and .84 (6–12 months postpartum). Three different cutoffs were proposed (≥6, >9, and ≥14). One study also offered information about reliability, construct validity, and criterion validity (Lai et al., 2010). Split-half reliability as measured by the Spearman–Brown coefficient was .85. Using the structured clinical interview for the Diagnostic and Statistical Manual of Mental Disorders (4th ed.; DSM-IV; American Psychiatric Association, 1994) as the reference, Lai et al. (2010) found that BDI had excellent performance at an optimum cutoff of ≥6 for detecting PPD among Chinese men, although it was less accurate than EPDS. BDI correlated moderately with EPDS and PHQ (r = .72 and r = .78, respectively, p < .01).
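Split-half reliability with the Spearman–Brown correction can be sketched as follows. This is an illustration only: the odd-even split of items is an assumption (the included studies do not state how the halves were formed), and the helper names and the response matrix `toy` are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(responses):
    # Assumed odd-even split: items at positions 0, 2, ... versus 1, 3, ...
    half_a = [sum(row[0::2]) for row in responses]
    half_b = [sum(row[1::2]) for row in responses]
    return spearman_brown(pearson_r(half_a, half_b))


# Hypothetical answers from five respondents to a four-item scale.
toy = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 2, 3, 3],
    [1, 0, 1, 1],
]
print(f"split-half reliability = {split_half_reliability(toy):.2f}")
```

The correction is needed because each half contains only half of the items, and shorter tests are less reliable than the full instrument.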

EPDS

Of the 13 instruments, EPDS was examined most extensively, by the highest number of studies and across the most domains. Thirty-eight studies, from 18 different countries (Australia, Canada, China, England, Finland, Iran, Italy, Japan, New Zealand, Norway, Portugal, Sweden, Switzerland, Taiwan, Turkey, Saudi Arabia, the United States, Vietnam), gave information about EPDS. Thirty-five studies reported that EPDS’s Cronbach’s alpha was .73 to .88 (pregnancy), .60 to .88 (0–6 months postpartum), and .73 to .81 (6–12 months postpartum). In addition to internal consistency, eight studies (from Australia, England, Hong Kong, Italy, Japan, Saudi Arabia, Sweden, and Vietnam) also reported on the reliability and/or validity of EPDS. Two studies determined split-half reliability, as measured by the Spearman–Brown coefficient, to be .78 and .84, while item-total correlations were .24 to .65 and .46 to .70 (Lai et al., 2010; Matthey et al., 2001). Four studies reported that EPDS measured a mood construct in men similar to CES-D (r = .67, r = .53, r = .62), BDI (r = .86), and PHQ (r = .78; Lai et al., 2010; Loscalzo et al., 2015; Matthey et al., 2001; Nishimura & Ohashi, 2010). However, the optimal cutoff scores for PPD differed, ranging from ≥5 to ≥13, with ≥10 as the most frequent cutoff (Edmondson et al., 2010; Lai et al., 2010; Loscalzo et al., 2015; Massoudi et al., 2013; Matthey et al., 2001; Nishimura & Ohashi, 2010; Shaheen et al., 2019; Tran et al., 2012). The six studies assessing criterion validity used diagnostic clinical interviews conducted by a psychologist or the Diagnostic Interview Schedule, based on DSM-IV, the Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5; American Psychiatric Association, 2013), or the Primary Care Evaluation of Mental Disorders (Edmondson et al., 2010; Lai et al., 2010; Massoudi et al., 2013; Matthey et al., 2001; Shaheen et al., 2019; Tran et al., 2012). Overall, they concluded that EPDS was a valid instrument for identifying depression in postnatal fathers.
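To illustrate how an optimal cutoff can be derived against a clinical interview as the gold standard, the sketch below computes sensitivity and specificity for each candidate cutoff and picks the one that maximizes Youden’s J. Youden’s J is an assumed selection rule here; the included studies may have used other criteria. The scores and diagnoses are hypothetical.

```python
def sensitivity_specificity(scores, diagnosed, cutoff):
    """Screen-positive means score >= cutoff; `diagnosed` is the interview result."""
    pairs = list(zip(scores, diagnosed))
    tp = sum(1 for s, d in pairs if d and s >= cutoff)
    fn = sum(1 for s, d in pairs if d and s < cutoff)
    tn = sum(1 for s, d in pairs if not d and s < cutoff)
    fp = sum(1 for s, d in pairs if not d and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, diagnosed, candidates):
    """Return the candidate with the highest Youden's J (first one wins on ties)."""
    def youden_j(cutoff):
        sens, spec = sensitivity_specificity(scores, diagnosed, cutoff)
        return sens + spec - 1
    return max(candidates, key=youden_j)


# Hypothetical screening scores and interview diagnoses for ten fathers.
scores = [3, 4, 6, 7, 8, 9, 10, 11, 12, 14]
diagnosed = [False, False, False, False, False, True, False, True, True, True]

cutoff = best_cutoff(scores, diagnosed, range(5, 14))
sens, spec = sensitivity_specificity(scores, diagnosed, cutoff)
print(f"cutoff >= {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Because the result depends on the score distribution in the sample at hand, the same instrument can yield different optimal cutoffs in different populations, which is consistent with the range of ≥5 to ≥13 reported above.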

GMDS

Two studies, conducted in Denmark and Sweden, gave information about GMDS (Zierau et al., 2002) and compared it with EPDS. Both reported a cutoff for PPD of ≥13. The Swedish study (Carlberg et al., 2018) reported that internal consistency was .83 (0–6 months postpartum) and that GMDS correlated moderately with EPDS (r = .76, p < .001). The Danish study (Madsen & Juhl, 2007) reported that reliability was fair to moderate (Cohen’s κ = 0.49) and that the responses on the two scales were related (p < .0001).
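Cohen’s κ expresses agreement between two classifications after removing the agreement expected by chance. The sketch below assumes the two classifications are screen-positive versus screen-negative on two scales; this is an assumption about how κ might be applied, and the data are hypothetical.

```python
def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two equally long lists of labels."""
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    # Proportion of cases where both classifications agree.
    observed = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b) / n
    # Agreement expected if the two classifications were independent.
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    # Undefined (division by zero) when both lists contain a single shared label.
    return (observed - expected) / (1 - expected)


# Hypothetical screen results (1 = above cutoff) on two scales for four fathers.
scale_one = [1, 1, 0, 0]
scale_two = [1, 0, 0, 0]
print(f"kappa = {cohen_kappa(scale_one, scale_two):.2f}")
```

In this toy case the scales agree on three of four fathers (75%), but half of that agreement is expected by chance, so κ is lower than the raw agreement.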

K10, K6

One study gave information about K6 and K10 among Japanese fathers (Konishi et al., 2016). It reported that internal consistency was .79 for K6 and .88 for K10 in pregnancy. The optimal cutoff for detecting PPD was ≥5 for K6 and ≥10 for K10. To assess validity, the authors assessed correlation with the scales completed by the men’s female partners (FP). The correlations were weak between K6 and K6-FP (r = .399, p < .01) and between K10 and K10-FP (r = .425, p < .01).

PAPA

Only the study by Pinto et al. (2017), which was conducted in Portugal, gave information about PAPA. It reported that internal consistency was .91 for the antenatal version and .90 for the postnatal version. All PAPA-AN and PAPA-PN items presented an item-total correlation higher than .30, and mean-item correlations higher than .15 were established for all the subscales. Both versions of the instrument (AN and PN) identified significant associations with EPDS, with r = −.484 (p = .01) and r = −.405 (p = .01), respectively. The optimal cutoffs were reported to be ≥95 for PAPA-AN and ≥92 for PAPA-PN.
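Item-total correlations of the kind reported above can be sketched as follows. The example uses the corrected form, in which each item is correlated with the sum of the remaining items; whether the included studies used the corrected or uncorrected form is not stated, so this is an assumption. The response matrix is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def corrected_item_total(responses):
    """Correlate each item with the total of all other items."""
    n_items = len(responses[0])
    correlations = []
    for i in range(n_items):
        item = [row[i] for row in responses]
        rest = [sum(row) - row[i] for row in responses]
        correlations.append(pearson_r(item, rest))
    return correlations


# Hypothetical answers from five respondents to a four-item scale.
toy = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 2, 3, 3],
    [1, 0, 1, 1],
]
for index, r in enumerate(corrected_item_total(toy), start=1):
    flag = "ok" if r > 0.30 else "below .30"
    print(f"item {index}: r = {r:.2f} ({flag})")
```

Leaving the item out of its own total avoids inflating the correlation, which matters most for short scales.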

PHQ-9

Two studies, one from Hong Kong and one from Japan, gave information about PHQ (Kroenke et al., 2001). Lai et al. (2010) reported that internal consistency was .88 (0–6 months postpartum), split-half reliability as measured by the Spearman–Brown coefficient was .82, and all items had good item-total correlations (r = .53–.74). With an optimum cutoff of ≥4, PHQ had excellent performance in detecting PPD among Chinese fathers, but was less accurate than EPDS. The study from Japan (Konishi et al., 2016) reported that internal consistency when measured in pregnancy was .79 and the optimal cutoff was ≥10. To assess validity, the researchers assessed correlation with the scale completed by the men’s pregnant female partners, finding a weak correlation (r = .418, p < .01).

Discussion

We identified 13 instruments, described in 59 studies with about 29,000 participants across 25 countries, used to measure paternal depressive symptoms in pregnancy and the postpartum period. There has been a surge in studies focusing on and measuring PPD in the last decade, indicating an increase in research interest in men’s depression related to fatherhood. The United States and Europe dominate the origin of the studies. These are countries with a strong family public policy, and where fathers are increasingly involved in the care and upbringing of their young children (Perez et al., 2017). Although we identified 13 instruments reported in 59 studies, validation studies of instruments used for detecting PPD are rare. Likely, more than 13 instruments are in use, but in three decades, only six instruments have been subject to validation, and of these only EPDS has been analyzed by more than two studies. In addition, of the instruments identified, only two are specifically related to pregnancy and the postpartum period (EPDS and PDSS), only two were originally developed for assessing male symptomatology (PAPA and GMDS), and none were uniquely developed for assessing PPD. Most of the instruments were created for the general population regardless of gender. But the general instruments include items detecting symptoms that can be a natural part of fatherhood in this period, such as lack of sleep and increased fatigue (Philpott et al., 2020). The two instruments focusing especially on the postpartum period, EPDS and PDSS, were originally designed for assessing female symptomatology. Only one study, a U.S. study aiming to examine the link between maternal and paternal PPD, has used PDSS and given information about any measurement properties for fathers. It reported that internal consistency is good, ≥.83, in the postpartum period (Don & Mickelson, 2012).
EPDS (Cox et al., 1987), by contrast, is not only the most extensively assessed instrument for measuring PPD worldwide, but also the instrument subject to the most validation in men. EPDS is already one of the most extensively used instruments to evaluate postnatal depression in women (Levis et al., 2020), and with eight validation studies on men, it may already be the first choice for identifying probable depression in postnatal fathers. The eight validation studies, conducted in eight different countries across three continents, reported that reliability as measured by the Cronbach’s alpha and the Spearman–Brown coefficients is good. In addition, the results suggested that the EPDS is related to GMDS, PAPA, and PHQ, but significantly more accurate than these in detecting postnatal depression in men. For EPDS, internal consistency was >.70 in 34 of the 38 studies reporting on it. Across all instruments, the internal consistency coefficient ranged from .60 to .91. According to Terwee et al. (2007), Cronbach’s alpha between .70 and .95 is described as a measure of good internal consistency. This definition excludes CHQ and HADS as reliable for detecting PPD. The number of items of the instruments varied from 6 to 30, with BSI, PHQ, and EPDS as the three least resource-intensive instruments. Several of the studies reported different cutoff scores to detect depression, and they were not always the scores recommended by the original developer of the instruments. For example, for SDS the included study used >40, while the literature recommends >50 (Dunstan & Scott, 2019). Given that the cutoff point will affect the prevalence of depressive symptoms reported in each study (Matthey et al., 2006; Perez et al., 2017), this is an important issue. For BSI, DASS, and PDSS, the included studies provided no score information at all. In studies across Europe and Asia, EPDS had optimum cutoff scores from ≥5 to ≥13.
The original cutoff score suggested for women is ≥13 (Cox et al., 1987). This variation in cutoff is likely due in part to cultural factors, with varying degrees of expressiveness about emotions across cultures. Although the validation studies of EPDS suggest ≥10 may be the most appropriate value in most settings, proper validation of EPDS and other screening instruments when they are imported from a foreign culture is important (Matthey et al., 2006). As regards implications, our review found that a dozen instruments exist for detecting paternal depressive symptoms and some of them appear reliable and valid for men, even if they were originally developed for women or general community samples. Our results are useful for both clinical practice and research. Based on its characteristics and measurement properties, EPDS presently seems the best option for screening fathers for possible depression. Upon detection of probable depression, health care workers can subsequently plan appropriate interventions and refer men to psychological support (Matthey et al., 2001). We agree with Shaheen et al. (2019) that EPDS seems a reasonable tool for identifying probable postnatal depression in fathers. Apart from appearing reliable and valid (more so than the other instruments identified), it is specifically developed to measure postnatal depression, takes only a short time to administer, is easy to understand, and is open access (can be used free of charge). But as with all instruments, recalibration for detection of optimum cutoff should be performed based on the specific study setting, population, and culture. There seems to exist a sufficient number of validation studies to undertake a systematic review of EPDS’s measurement properties. Even so, research is necessary to examine whether there is a need for developing a specific instrument for screening for PPD.
EPDS does not include items described as more gender-specific symptoms for men, such as anger and affective rigidity (Kim & Swain, 2007; Musser et al., 2012; Psouni et al., 2017). Although we found that it appears reliable and valid, the lack of gender-specific items may lead to underdetection of symptoms in fathers (Matthey et al., 2001; Philpott et al., 2020), and it is unclear whether EPDS and the other instruments uniquely identify depressive symptoms or a broader state of mind characterized by distress and anxiety (Loscalzo et al., 2015). Further research evaluating instruments for detection of depressive symptoms in fathers is needed, with assessment of all measurement properties, from more cultural settings, and including younger fathers and low-income families. Our study has the merit of being performed in a systematic way, with verifiable and transparent methods in accordance with standardized methodological frameworks both for conducting (Arksey & O’Malley, 2005) and for collating, summarizing, and reporting results (Mokkink et al., 2010; Terwee et al., 2007). In addition, we searched five databases and gray literature, and the scoping review was conducted by experienced researchers with knowledge of review methodology and paternal depression measurement. Our findings should be interpreted in light of a few limitations. We excluded studies published in languages other than English and the Scandinavian languages, and we did not account for study quality, as the aim was to give an overview of the topic according to scoping review methodology.

Conclusion

Depression is a significant challenge for fathers in pregnancy and the postpartum period. To detect symptoms and support fathers experiencing depression, health professionals need valid and reliable instruments. Our study fills a gap in the literature regarding measurement of depression among fathers during the transition to parenthood. We found that although there are 13 instruments with studies on one or more of their psychometric properties, research into instruments for men’s depression related to fatherhood and their measurement properties is scarce. Most studies only provide information about the instruments’ internal consistency; of 59 included studies, only 12 report on instruments’ validity. Our results provide some preliminary direction for depression assessments among men during the transition to parenthood. EPDS is the most extensively evaluated instrument and appears to be a reliable and valid self-report measure for identifying probable postnatal depression in fathers. However, further studies on the measurement properties of EPDS and other instruments are warranted to broaden our knowledge base about reliable and valid instruments for identifying PPD.

73 in total

1. The Gotland Male Depression Scale: a validity study in patients with alcohol use disorder.

Authors: Finn Zierau; Anne Bille; Wolfgang Rutz; Per Bech
Journal: Nord J Psychiatry Date: 2002 Impact factor: 2.202

2. High risk of depression, anxiety, and poor quality of life among experienced fathers, but not mothers: A prospective longitudinal study.

Authors: Yi-Han Chen; Jian-Pei Huang; Heng-Kien Au; Yi-Hua Chen
Journal: J Affect Disord Date: 2018-08-11 Impact factor: 4.839

3. A comparison of postnatal depression and related factors between Chinese new mothers and fathers.

Authors: Qing Mao; Li-xia Zhu; Xiao-yin Su
Journal: J Clin Nurs Date: 2011-03 Impact factor: 3.036

4. Detecting postnatal depression in Chinese men: a comparison of three instruments.

Authors: Beatrice P Y Lai; Alan K L Tang; Dominic T S Lee; Alexander S K Yip; Tony K H Chung
Journal: Psychiatry Res Date: 2010-05-21 Impact factor: 3.222

5. Paternal Adjustment and Paternal Attitudes Questionnaire: Antenatal and Postnatal Portuguese Versions.

Authors: Tiago Miguel Pinto; Catarina Samorinha; Iva Tendais; Rui Nunes-Costa; Bárbara Figueiredo
Journal: Assessment Date: 2015-12-11

6. Gender roles, social support, and postpartum depressive symptomatology. The benefits of caring.

Authors: J A Richman; V D Raskin; C Gaines
Journal: J Nerv Ment Dis Date: 1991-03 Impact factor: 2.254

7. Common misconceptions about validation studies.

Authors: Matthew P Fox; Timothy L Lash; Lisa M Bodnar
Journal: Int J Epidemiol Date: 2020-08-01 Impact factor: 7.196

8. Paternal postnatal depression in Japan: an investigation of correlated factors including relationship with a partner.

Authors: Akiko Nishimura; Yuichi Fujita; Mayumi Katsuta; Aya Ishihara; Kazutomo Ohashi
Journal: BMC Pregnancy Childbirth Date: 2015-05-31 Impact factor: 3.007

9. Paternal Perinatal Depression Assessed by the Edinburgh Postnatal Depression Scale and the Gotland Male Depression Scale: Prevalence and Possible Risk Factors.

Authors: Magdalena Carlberg; Maigun Edhborg; Lene Lindberg
Journal: Am J Mens Health Date: 2018-01-19

10. The courses of maternal and paternal depressive and anxiety symptoms during the prenatal period in the FinnBrain Birth Cohort study.

Authors: Riikka Korja; Saara Nolvi; Eeva-Leena Kataja; Noora Scheinin; Niina Junttila; Henna Lahtinen; Suoma Saarni; Linnea Karlsson; Hasse Karlsson
Journal: PLoS One Date: 2018-12-17 Impact factor: 3.240