
Exercise for patients with major depression: a systematic review with meta-analysis and trial sequential analysis.

Jesper Krogh¹, Carsten Hjorthøj¹, Helene Speyer¹, Christian Gluud², Merete Nordentoft¹.

Abstract

OBJECTIVES: To assess the benefits and harms of exercise in patients with depression.
DESIGN: Systematic review.
DATA SOURCES: Bibliographical databases were searched until 20 June 2017.
ELIGIBILITY CRITERIA AND OUTCOMES: Eligible trials were randomised clinical trials assessing the effect of exercise in participants diagnosed with depression. Primary outcomes were depression severity, lack of remission and serious adverse events (eg, suicide) assessed at the end of the intervention. Secondary outcomes were quality of life and adverse events such as injuries, as well as assessment of depression severity and lack of remission during follow-up after the intervention.
RESULTS: Thirty-five trials enrolling 2498 participants were included. The effect of exercise versus control on depression severity was -0.66 standardised mean difference (SMD) (95% CI -0.86 to -0.46; p<0.001; grading of recommendations assessment, development and evaluation (GRADE): very low quality). When this analysis was restricted to the four trials that seemed less affected by bias, the effect was reduced to -0.11 SMD (-0.41 to 0.18; p=0.45; GRADE: low quality). Exercise decreased the relative risk of no remission to 0.78 (0.68 to 0.90; p<0.001; GRADE: very low quality). When this analysis was restricted to the two trials that seemed less affected by bias, the effect was reduced to 0.95 (0.74 to 1.23; p=0.78). Trial sequential analysis excluded random error when all trials were analysed, but not when focusing on trials less affected by bias. Subgroup analyses found that trial size and intervention duration were inversely associated with effect size for both depression severity and lack of remission. There was no significant effect of exercise on secondary outcomes.
CONCLUSIONS: Trials with less risk of bias suggested no antidepressant effects of exercise, and there were no significant effects of exercise on quality of life or on depression severity or lack of remission during follow-up. Data for serious adverse events and adverse events were scarce, not allowing conclusions for these outcomes.
SYSTEMATIC REVIEW REGISTRATION: The protocol was published in the journal Systematic Reviews: 2015; 4:40. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.


Keywords: Evidence Based Medicine; Exercise; Meta-analysis; Randomised Clinical Trials; Systematic Review


Year: 2017 PMID: 28928174 PMCID: PMC5623558 DOI: 10.1136/bmjopen-2016-014820

Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692

The protocol for this review has previously been published. By using meta-regression analysis, trial sequential analysis and the grading of recommendations assessment, development and evaluation system, this review bases its conclusions on a firm and transparent platform. Based on an extensive literature search, this review included 35 trials that allocated almost 2500 participants diagnosed with depression to exercise or control interventions and could be analysed. The effect estimates are largely based on trials at high risk of bias. Effect estimates from the included trials showed considerable heterogeneity.

Introduction

Depression is a common disorder affecting up to 17% of the population during their lifetime.1 2 Based on data from WHO, depression ranks as the second largest healthcare problem globally in terms of years lived with disability.3 Depending on its severity, depression is often treated with psychotherapy, antidepressants or a combination of both. However, the clinical benefits of antidepressants4–6 and psychotherapy7–9 have been challenged. Both treatments are costly in terms of time and money and may also have adverse effects. Compliance with antidepressant treatment is poor; the dropout rate in clinical trials is reported to be between 12% and 40% within the initial 6–8 weeks of treatment.4 10 The weak evidence for the beneficial effects of current interventions, along with low compliance and harms, has prompted interest in alternative interventions. Exercise has attracted considerable attention, and various forms of exercise of varying intensity have been assessed in a number of randomised clinical trials as a treatment for patients with depression. In 2011, we published a meta-analysis of randomised clinical trials examining the effect of exercise on depressive symptoms in patients with clinical depression.11 The results suggested that referring patients with clinical depression to an exercise programme was associated with a small-to-moderate effect on depressive symptoms. However, when the analysis was restricted to the three trials at low risk of bias, the effect estimate was non-significant.
Since 2011, other reviews have been published on the effect of exercise on depressive symptoms in general,12 in older people13 and in patients with chronic illnesses.14 However, none of these reviews addressed the specific population of adults diagnosed with major depression according to valid diagnostic criteria, such as the International Classification of Diseases15 or the Diagnostic and Statistical Manual of Mental Disorders.16 These reviews contained a number of trials that enrolled volunteers defined as depressed on the basis of psychometric testing (eg, the Beck Depression Inventory17) rather than on a clinical diagnosis of major depression. Furthermore, several randomised clinical trials investigating the effect of exercise in clinically depressed individuals have been published since our 2011 review.11 The objective of the present systematic review is to investigate the beneficial and harmful effects of exercise, compared with control interventions with or without co-interventions, on severity of depression, lack of remission, quality of life and suicide in adults with a clinical diagnosis of major depression. The current systematic review differs from our previous review11 in several respects. We only considered trials including participants diagnosed with depression according to a validated diagnostic system. We also included trials of participants with somatic comorbidity, for example, cancer or diabetes. In addition, we address the harmful effects of exercise interventions, assess intervention effects according to the grading of recommendations assessment, development and evaluation (GRADE) framework, and extended the bibliographical searches to include a Chinese and a South American database until 2016.

Methods/design

The protocol for this review has previously been published.18

Search strategy

The following bibliographical databases were searched: CENTRAL, MEDLINE, EMBASE, Science Citation Index (Web of Science), LILACS and Wanfang, using medical subject headings (MeSH or similar) when possible, or text word terms: depression, depressive disorder and exercise, aerobic, non-aerobic, physical activity, physical fitness, walking, jogging, running, bicycling, swimming, strength or resistance (see online supplementary material S1 for an example of a bibliographical search). The main search was conducted in August 2015, and the latest search was conducted on 20 June 2017.

Trial selection

One investigator (JK) examined titles and abstracts to remove obviously irrelevant reports. Two investigators (JK and HS) examined full-text reports and abstracts to determine whether they met the inclusion criteria. A trial was considered eligible if it was a randomised clinical trial including participants aged >17 years who were diagnosed as having major depression according to a valid and recognised diagnostic system (ie, Research Diagnostic Criteria,19 International Classification of Diseases (ICD)15 or Diagnostic and Statistical Manual of Mental Disorders (DSM)16). Both abstracts and full-text reports were included. Trials were excluded if they measured depression immediately after a single bout of exercise, compared one form of exercise with another, or compared different exercise intensities without including a control group. The trials had to allocate participants either to an exercise intervention versus a control group (ie, a control group receiving no intervention, treatment as usual, or an attention control using light exercise) or to exercise as an add-on treatment (ie, exercise plus usual treatment in the experimental group vs usual treatment alone in the control group). An exercise intervention was defined as a systematic physical intervention intended to increase muscle strength and/or cardiovascular fitness, for example, running, swimming or weight lifting. For attention controls, the trial report had to state specifically that the intervention was intended as a control intervention.
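The inclusion and exclusion rules above can be read as a single yes/no predicate applied to each full-text report. The sketch below is a simplified illustration only: screening in the review was done manually by two investigators, and the `TrialReport` record and its field names are hypothetical. CCMD is listed among accepted systems because it was accepted post hoc, as described under deviations from the protocol.

```python
from dataclasses import dataclass

# Diagnostic systems accepted for major depression; CCMD was accepted post hoc.
ACCEPTED_SYSTEMS = {"RDC", "ICD", "DSM", "CCMD"}


@dataclass
class TrialReport:
    """Hypothetical summary of one screened trial report."""
    randomised: bool
    diagnostic_system: str    # e.g. "DSM", or "BDI cut-off" for psychometric screening only
    min_age_years: int        # youngest age permitted by the trial
    has_control_group: bool   # no intervention, usual care, attention control or add-on design
    single_bout_only: bool    # depression measured only right after one exercise session


def is_eligible(report: TrialReport) -> bool:
    """Apply the simplified inclusion and exclusion criteria to one report."""
    return (
        report.randomised
        and report.diagnostic_system in ACCEPTED_SYSTEMS
        and report.min_age_years > 17
        # Exercise-vs-exercise or intensity-only comparisons lack a control group.
        and report.has_control_group
        and not report.single_bout_only
    )


print(is_eligible(TrialReport(True, "DSM", 18, True, False)))  # True
```

A trial recruiting volunteers on a questionnaire cut-off alone (for example, a Beck Depression Inventory score) fails the diagnostic-system check, which mirrors the distinction the introduction draws between this review and earlier ones.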

Outcomes

The primary outcomes were: 1) depressive symptoms measured on a continuous scale at the end of the intervention; 2) lack of remission, that is, a binary outcome giving the proportion of participants in each intervention group who did not achieve remission at the end of the intervention according to the trial authors’ own definition; and 3) serious adverse events, defined according to the International Council for Harmonisation Good Clinical Practice guideline (ICH-GCP) as any untoward medical occurrence that was life-threatening or resulted in death or persistent or significant disability (ICH-GCP 1997).20 Serious adverse events therefore include suicide attempts as well as suicides. The secondary outcomes were quality of life, non-serious adverse events (eg, muscle injuries), and depressive symptoms and lack of remission assessed during follow-up after the intervention.

Data extraction

Two authors (JK, HS) independently extracted data using a prepiloted structured form. Any discrepancies in data extraction or in the inclusion or exclusion of trials were resolved by referring to the original papers, with CG or MN acting as adjudicator in cases of disagreement. In addition to outcomes, extracted data included country of origin, number of randomised participants, number of participants included in the efficacy analysis, mean age of participants, diagnostic system, baseline depression severity, type of intervention, and frequency and duration of the intervention. For continuous outcomes, the order of preference was postintervention scores with corresponding SD, then mean change from baseline with SD, then mean difference between groups postintervention; numerical data reported in the text or tables were preferred to data read from figures. JK and CH independently assessed the bias domains. The authors JK, CG and MN have previously published trial reports assessing the effect of exercise in participants with depression,21 22 and to reduce the risk of academic bias, two additional authors (CH, HS) were included in the current systematic review.
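The preference order for continuous outcomes can be sketched as a ranking over the candidate data a trial reports. This is a minimal illustration under stated assumptions: the `kind` and `source` labels are hypothetical names, and ranking by outcome format first and by data source (reported numbers versus values read from figures) second is an assumed tie-break, since the text does not say how the two rules interact.

```python
from typing import Optional

# Preferred continuous outcome formats, most preferred first.
KIND_ORDER = ("post_score_with_sd", "change_from_baseline_with_sd", "between_group_difference")
# Numbers reported in text or tables are preferred to values read off figures.
SOURCE_ORDER = ("reported", "figure")


def choose_outcome(candidates: list) -> Optional[dict]:
    """Pick the most preferred usable outcome from a trial's candidate data."""
    usable = [
        c for c in candidates
        if c["kind"] in KIND_ORDER and c["source"] in SOURCE_ORDER
    ]
    if not usable:
        return None
    # Rank by outcome format first, then by data source (assumed tie-break order).
    return min(
        usable,
        key=lambda c: (KIND_ORDER.index(c["kind"]), SOURCE_ORDER.index(c["source"])),
    )


trial = [
    {"kind": "change_from_baseline_with_sd", "source": "reported"},
    {"kind": "post_score_with_sd", "source": "figure"},
]
print(choose_outcome(trial)["kind"])  # post_score_with_sd
```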

Risk of bias assessment

Risk of bias was assessed according to the definitions in the Cochrane Handbook for Systematic Reviews of Interventions23 in the following domains: allocation sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessors, incomplete outcome data, selective outcome reporting, for-profit bias and other bias. Trials assessed as having ‘low risk of bias’ in all of these domains were considered ‘trials at low risk of bias’. Trials assessed as having ‘uncertain risk of bias’ or ‘high risk of bias’ in one or more of these domains were considered ‘trials at high risk of bias’. In line with our previous systematic review11 and the latest Cochrane review on exercise for depression,24 trials at low risk of bias in the allocation concealment, blinded outcome assessment and incomplete outcome data domains were characterised as ‘trials potentially having less risk of bias than other trials at high risk of bias’. Trials of behavioural interventions can rarely mask the allocation, so participants and healthcare providers are generally not blinded. We therefore also report the number of trials at low risk of bias in the remaining domains.
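The overall labels above follow mechanically from the per-domain judgements. The sketch below encodes that rule; the domain keys and the short return labels are hypothetical names chosen for illustration, and the per-domain judgements themselves remain the reviewers’ qualitative assessments.

```python
from typing import Mapping

DOMAINS = (
    "sequence_generation",
    "allocation_concealment",
    "blinding_participants_personnel",
    "blinding_outcome_assessors",
    "incomplete_outcome_data",
    "selective_outcome_reporting",
    "for_profit_bias",
    "other_bias",
)
# Domains that define 'potentially less risk of bias' in this review.
KEY_DOMAINS = ("allocation_concealment", "blinding_outcome_assessors", "incomplete_outcome_data")


def classify(judgements: Mapping[str, str]) -> str:
    """Map per-domain judgements ('low', 'unclear', 'high') to an overall label."""
    if all(judgements[d] == "low" for d in DOMAINS):
        return "low"
    # Unclear counts the same as high when deciding the overall label.
    if all(judgements[d] == "low" for d in KEY_DOMAINS):
        return "high, potentially less"
    return "high"


# Klein et al (1985), first row of table 2.
klein = dict(zip(DOMAINS, ["unclear", "unclear", "high", "high", "high", "low", "low", "low"]))
print(classify(klein))  # high
```

Because no exercise trial can blind participants, the first branch is effectively unreachable for this literature, which is why the review relies on the three key domains to separate trials.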

Data synthesis and analysis

To include all trials in the meta-analysis, we estimated the standardised mean difference (SMD) for each trial. The SMD is the mean difference in depression score between the exercise and control groups divided by the pooled SD at follow-up, which gives a unit-free effect size. By convention, SMDs of 0.2, 0.5 and 0.8 are considered small, medium and large intervention effects, respectively.23 For dichotomous outcomes, we calculated the risk ratio (RR) with 95% CI. Some trials were expected to have several intervention groups; in these cases, data from the experimental groups were pooled and compared with data from the control group. Where the random-effects and fixed-effect model analyses disagreed, both results are reported; otherwise, only results from the random-effects analysis are reported. The degree of heterogeneity was quantified using the I² statistic,25 which can be interpreted as the percentage of the variation observed across trials that is attributable to between-trial differences rather than to sampling error (chance). Heterogeneity was explored by subgroup analyses (see below). For the primary outcomes, trial sequential analysis was performed.26 27 To calculate the required information size and assess whether the cumulative Z-curve crossed the relevant trial sequential monitoring boundaries, the required information size for the primary continuous outcome was based on a type I error of 5%, a beta of 10% (power of 90%), the SE of the meta-analysis and a minimal relevant difference of three points on the 17-item Hamilton Depression Scale (HAM-D17).18 Post hoc, we calculated the required information size including all trials.
This was done by converting effect estimates from trials reporting other outcome scales into the HAM-D17 scale as described by Thorlund et al.28 For lack of remission, the required information size was based on a type I error of 5%, a beta of 10%, the proportion of participants in the control group with the outcome, and relative risk reductions of 15% and 30%. Bayes factors were calculated for all primary outcomes.29 A low p value suggests that we can reject the null hypothesis, but even a low p value from a meta-analysis can be misleading if the data are also improbable under the anticipated intervention effect. In other words, we need to consider how probable the observed difference between the compared interventions is if the a priori anticipated ‘true’ difference exists. For this purpose, it is helpful to calculate the Bayes factor, which is the ratio of the probability of the meta-analysis result under the null hypothesis to its probability under the anticipated, or ‘true’, effect.29 As suggested by Jakobsen et al,29 a Bayes factor <0.1 together with a low p value suggests, if bias can be ruled out, that the observed result is compatible with the a priori expected effect. If the Bayes factor is >0.1, the result is not compatible with the a priori expected effect, and the true effect may be smaller. To assess the potential impact of missing data (incomplete outcome data bias), we performed sensitivity analyses using the following strategy: in a ‘best-worst’ case scenario, we assumed that all participants lost to follow-up in the intervention group had a beneficial outcome (the group mean minus 1 SD) and that all those with missing outcomes in the control group had a harmful outcome (the group mean plus 1 SD, and plus 2 SD).
In addition, the reverse ‘worst-best’ case scenario analysis was performed.29 Missing data for the ‘lack of remission’ outcome were imputed in sensitivity analyses according to the following scenarios:30 1) poor outcome analysis: all drop-outs/participants lost from both the experimental and the control arms were assumed to have experienced the outcome; 2) good outcome analysis: none of the drop-outs/participants lost from the experimental and the control arms were assumed to have experienced the outcome; 3) extreme case analysis favouring the experimental intervention (‘best-worst’ case scenario): none of the drop-outs/participants lost from the experimental arm, but all of those lost from the control arm, were assumed to have experienced the outcome; and 4) extreme case analysis favouring the control (‘worst-best’ case scenario): all of the drop-outs/participants lost from the experimental arm, but none of those lost from the control arm, were assumed to have experienced the outcome. In all four scenarios, all randomised participants were included in the denominator.
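The quantities described in this section can be sketched in a few functions. This is a minimal illustration, not the software used for the review, and it rests on several assumptions not stated in the text: the SMD is Cohen’s d with a common large-sample variance approximation (many packages apply Hedges’ small-sample correction instead); random-effects pooling uses the DerSimonian-Laird estimator of between-trial variance; the Bayes factor is the closed-form ratio of two normal likelihoods for an estimate with known SE; and the required information size uses the conventional two-arm sample-size formula without the heterogeneity (diversity) adjustment that trial sequential analysis software usually applies. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def smd(mean_ex, sd_ex, n_ex, mean_ctl, sd_ctl, n_ctl):
    """Cohen's d (exercise minus control) and its approximate sampling variance."""
    pooled_sd = math.sqrt(((n_ex - 1) * sd_ex**2 + (n_ctl - 1) * sd_ctl**2) / (n_ex + n_ctl - 2))
    d = (mean_ex - mean_ctl) / pooled_sd
    return d, (n_ex + n_ctl) / (n_ex * n_ctl) + d**2 / (2 * (n_ex + n_ctl))


def random_effects(effects, variances):
    """DerSimonian-Laird pooling: returns (estimate, SE, I² in %)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-trial variance
    w_re = [1 / (v + tau2) for v in variances]
    est = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return est, math.sqrt(1 / sum(w_re)), i2


def bayes_factor(estimate, se, anticipated):
    """Normal likelihood of the estimate under the null divided by that under the anticipated effect."""
    return math.exp((anticipated**2 - 2 * anticipated * estimate) / (2 * se**2))


def required_information_size(p_control, rrr, alpha=0.05, beta=0.10):
    """Total participants for a binary outcome in two equal arms, without heterogeneity adjustment."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(1 - beta)
    p_exp = p_control * (1 - rrr)
    p_mean = (p_control + p_exp) / 2
    return 4 * z**2 * p_mean * (1 - p_mean) / (p_control - p_exp) ** 2


def imputed_risk_ratios(ev_exp, analysed_exp, rand_exp, ev_ctl, analysed_ctl, rand_ctl):
    """Risk ratio of no remission under the four missing-data scenarios.

    Denominators are all randomised participants; ev_* count analysed participants with the outcome.
    """
    lost_exp, lost_ctl = rand_exp - analysed_exp, rand_ctl - analysed_ctl
    scenarios = {  # lost participants assumed to have the outcome: (exercise, control)
        "poor outcome": (lost_exp, lost_ctl),
        "good outcome": (0, 0),
        "best-worst": (0, lost_ctl),  # favours exercise
        "worst-best": (lost_exp, 0),  # favours control
    }
    return {
        name: ((ev_exp + a) / rand_exp) / ((ev_ctl + b) / rand_ctl)
        for name, (a, b) in scenarios.items()
    }


# Hypothetical inputs, for illustration only
print(round(required_information_size(0.5, 0.15)))  # 1857
print(imputed_risk_ratios(20, 40, 50, 30, 45, 50))
```

With lower depression scores indicating fewer symptoms, negative SMDs favour exercise. A Bayes factor far below 0.1 arises only when the pooled estimate lies near the anticipated effect relative to its SE; an estimate close to zero gives a Bayes factor above 1, favouring the null.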

Subgroup analyses

In subgroup analyses, we compared intervention effects and heterogeneity across levels of several trial characteristics. Trials potentially having less risk of bias (ie, trials with adequate allocation concealment, blinded outcome assessment and intention-to-treat analysis) were compared with trials at high risk of bias. The effect of age was assessed by comparing trials including older participants (mean age >59 years) with trials including younger participants (mean age <60 years). The effect of type of exercise was assessed by comparing trials using group exercise with trials using individual exercise. The effect of intervention duration was assessed by comparing trials with short and long interventions, split at the median duration. The effect of type of control group was assessed by comparing trials using attention control with trials using waitlist controls, and by comparing trials using exercise as an add-on to medication with trials not using any medication. In addition, a within-study comparison of low-dose versus high-dose exercise was performed in trials using different exercise intensities. The effect of comorbid somatic disease was assessed by comparing effect estimates from trials including participants with depression alone with those from trials including participants with depression and a somatic disease. Publication bias was assessed by visual inspection of a funnel plot and by Egger’s test, and if publication bias was plausible, Duval and Tweedie’s trim-and-fill procedure was conducted.31 We assessed and graded the evidence according to GRADE for risk of bias, imprecision, indirectness, heterogeneity and publication bias.32 Based on this assessment, the quality of the evidence was graded as ‘high quality’ (we are very confident that the true effect lies close to that of the estimate of the effect); ‘moderate quality’ (we are moderately confident in the effect estimate; the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different); ‘low quality’ (our confidence in the effect estimate is limited; the true effect may be substantially different from the estimate of the effect); or ‘very low quality’ (we have very little confidence in the effect estimate; the true effect is likely to be substantially different from the estimate of the effect).33
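The GRADE levels above can be illustrated with the usual downgrading logic: evidence from randomised trials starts at ‘high’ and loses one level for each serious concern, or two for a very serious concern, with ‘very low’ as the floor. The sketch below is only an illustration of that bookkeeping; actual GRADE ratings are structured judgements rather than a mechanical sum, the function name is hypothetical, and the domain called ‘heterogeneity’ here is termed ‘inconsistency’ in GRADE guidance.

```python
LEVELS = ("very low", "low", "moderate", "high")
GRADE_DOMAINS = ("risk of bias", "imprecision", "indirectness", "heterogeneity", "publication bias")


def grade_quality(downgrades: dict) -> str:
    """Start randomised-trial evidence at 'high' and subtract one level per serious
    concern (two per very serious concern); the floor is 'very low'."""
    unknown = set(downgrades) - set(GRADE_DOMAINS)
    if unknown:
        raise ValueError(f"unknown GRADE domains: {sorted(unknown)}")
    level = len(LEVELS) - 1 - sum(downgrades.values())
    return LEVELS[max(level, 0)]


# Hypothetical judgements, not those made in this review
print(grade_quality({"risk of bias": 1, "imprecision": 1}))  # low
```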

Deviations from our protocol

Post hoc, we included trials using the Chinese Classification of Mental Disorders (CCMD) as well as a few trials including participants classified as having ‘minor depression’. The CCMD system closely adheres to the ICD and DSM systems and has been found highly compatible with them in field studies,34 so these trials were included. A few trials included some participants classified as having ‘minor depression’ according to the trial’s chosen diagnostic system (eg, DSM), and it is questionable whether these participants have major depression. We therefore decided to include these trials but to conduct a subgroup analysis exclusively including participants with major depression. To further explore heterogeneity, we added post hoc subgroup analyses comparing intervention effects in inpatients and outpatients, as well as an analysis according to trial size. Trials were divided into small and large trials at the median of the total number of participants included in the efficacy analysis. The effect of exercise capacity was assessed post hoc by comparing trials with a high increase in maximal oxygen uptake (VO2max) with trials with a lower increase. Exercise capacity was assessed as the increase in VO2max in the intervention groups, and trials were stratified into high or low increase in exercise capacity at the median. We did not conduct trial sequential analysis based on a 30% relative risk reduction in lack of remission, as this effect was considered implausible.
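The median splits used here (trial size, VO2max increase) and in the subgroup analyses (intervention duration) can be sketched as follows, together with one common way of comparing two subgroup estimates. Both the tie rule (trials exactly at the median go to the lower group) and the z-test for a subgroup difference are assumptions; the text does not say how ties were handled or which test was used. The five trial sizes are taken from table 1 purely for illustration and do not reproduce the review’s actual median.

```python
from statistics import NormalDist, median


def median_split(trials, key):
    """Split trials at the median of `key`; ties with the median go to the lower group."""
    cut = median(t[key] for t in trials)
    lower = [t for t in trials if t[key] <= cut]
    upper = [t for t in trials if t[key] > cut]
    return cut, lower, upper


def subgroup_difference(est_a, se_a, est_b, se_b):
    """Two-sided z-test for a difference between two independent subgroup estimates."""
    z = (est_a - est_b) / (se_a**2 + se_b**2) ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Efficacy-analysis sizes of five trials from table 1 (illustration only)
trials = [{"id": "Klein", "n": 22}, {"id": "Martinsen", "n": 43}, {"id": "Epstein", "n": 17},
          {"id": "Doyne", "n": 25}, {"id": "Veale", "n": 65}]
cut, small, large = median_split(trials, "n")
print(cut, [t["id"] for t in small], [t["id"] for t in large])
# 25 ['Klein', 'Epstein', 'Doyne'] ['Martinsen', 'Veale']
```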

Participant involvement

Depressed participants were not involved in this study.

Results

Bibliographical search and trial characteristics

The main bibliographical search was conducted on 26 August 2015 and the final updates were conducted on 20 June 2017. As illustrated in online supplementary figure S1, we identified 45 publications reporting the effect of exercise on depressive symptoms in 35 randomised clinical trials.21 22 35–78 Seventeen trials were conducted in Europe,21 22 40 49 52 53 55 61 65–68 74 75 77 79 80 eight in the USA,38 39 43 45 60 64 76 81 six in Asia,47 69–73 two in Australia54 58 and two in South America.56 63 A total of 2630 participants were randomised and 2498 were included in the efficacy analysis of benefit. Ten trials included inpatients47 49 56 67 69–73 79 and five trials included participants with a mean age >60 years.52 54 58 60 61 No trials exclusively included participants with comorbid somatic disease. Four trials reported the continuous outcome as mean change from baseline in each group with a corresponding SD,39 53 65 68 and one trial presented data as mean difference between groups postintervention.40 The remaining trials reported postscores in each group with corresponding SD (see table 1 for trial characteristics).

Table 1

Characteristics of trials assessing exercise for patients diagnosed with depression

First author, country of origin	Participants	Severity of depression at baseline	N at baseline (included in efficacy analysis)	Type of intervention	Frequency	Duration
Klein et al 86 USA	Outpatients Mean age: 30 years (SD 7) 72% female	SCL-D: 2.4 (SD 1)	50 (22)	Aerobic exercise: supervised individual running Control group: supervised meditation in groups	Two sessions per week Control group: one session per week	12 weeks
Martinsen et al 79 Norway	Inpatients Mean age: 40 years (range 17–60) Distribution of sex not reported	BDI: 28.0 (SD 9)	49 (43)	Aerobic exercise: supervised group exercise. Control group: occupational therapy	Three sessions per week Control group: three sessions per week	9 weeks
Epstein64 USA	Outpatients Mean age: 39 years (range 24–60) Distribution of sex not reported	BDI: 23.4 (SD 7)	21 (17)	Aerobic exercise: supervised group exercise. Control group: waitlist control	Three sessions per week	8 weeks
Doyne et al 43 USA	Outpatients Mean age: 29 years (SD 4) 100% female	HAM-D₁₇: 13.0 (SD 7)	52 (25)	Aerobic exercise OR weightlifting: supervised individual exercise. Control group: waiting list	Four sessions per week	8 weeks
Veale et al 62 UK	Outpatients Mean age: 35 years (range 19–58) 64% female	BDI: 24.5 (SD 6)	83 (65)	Aerobic exercise: Supervised group exercise. Control group: standard treatment from psychiatric services	Three sessions per week	12 weeks
Singh et al 60 USA	Outpatients recruited from a register of volunteers Mean age: 71 years (SD 1)	BDI: 19.9 (SD 2.3)	32 (32)	Progressive resistance training: supervised group exercise Control group: attended seminars on health	Three sessions per week Control group: two sessions per week	10 weeks
Blumenthal et al 82 USA	Outpatients Mean age: 57 years (SD 7) 71.8% female	HAM-D₁₇: Not reported	103 (103)	Aerobic exercise: supervised exercise plus antidepressant medication (sertraline). Control group: antidepressant medication (sertraline)	Three sessions per week	16 weeks
Mather et al 52 UK	Outpatients Treatment resistant Mean age: 65 years (range 53–91) 69% female	HAM-D₁₇: 17.1 (SD 6)	86 (85)	Mixed aerobic and non-aerobic exercise: supervised group exercise. Control group: attended health seminars	Two sessions per week Control group: two seminars per week	10 weeks
Dunn et al 45 USA	Outpatients Mean age: 36 years (SD 6) 75% female	HAM-D₁₇: 19.4 (SD 2)	80 (80)	Aerobic exercise: individually supervised exercise with (1) low energy expenditure (EE) OR (2) high EE OR (3) low EE OR (4) high EE Control group: flexibility exercise	Group (1) and (2): three sessions per week Group (3) and (4): five sessions per week Control group: three sessions per week	12 weeks
Singh et al 58 Australia	Outpatients Mean age: 69 years (SD 6) 55% female	HAM-D₁₇: 18.9 (SD 4.2)	60 (54)	Progressive resistance training (PRT): (1) Low-intensity PRT OR (2) high-intensity PRT. Control group: standard GP care	Group (1) and (2): three sessions per week	8 weeks
Pilu et al 55 Italy	Outpatients Treatment resistant Age between 40 and 60 years 100% female	HAM-D₁₇: 19.7 (SD 6)	30 (30)	Resistance exercise: supervised group sessions Control group: standard treatment	Two sessions per week	32 weeks
Viera et al 63 Brazil	Outpatients Mean age: 43.66 years (SD NR) 100% female	HAM-D₂₁: 31.9 (SD 3)	18 (18)	Aerobic exercise: supervised water aerobics Control group: standard GP care	Two sessions per week	12 weeks
Blumenthal and Babyak39 USA	Outpatients Mean age: 52 years (SD 8) 75.8% female	HAM-D₁₇: 16.7 (SD 4)	153 (153)	Aerobic exercise: (1) supervised group exercise OR (2) home-based exercise Control group: placebo medication	(1) and (2): Three sessions per week	16 weeks
Krogh et al 21 Denmark	Outpatients Mean age: 39 years (SD 9) 74% female	HAM-D₁₇: 17.8 (SD 4)	165 (165)	Exercise: (1) aerobic supervised group exercise OR (2) supervised group resistance training Control group: relaxation and stretching exercise	(1) and (2): Two sessions per week Control group: two sessions per week	16 weeks
Mota-Pereira et al 53 Portugal	Outpatients Treatment resistant Mean age: 47.5 years (SD 3) 65.5% female	HAM-D₁₇: 17.1 (SD 3)	33 (29)	Aerobic exercise: home-based exercise+supervised Control group: attention control	Four home-based sessions/week One supervised session/week Control group: one supervised session/week	12 weeks
Krogh et al 22 Denmark	Outpatients Mean age: 42 years (SD 11) 67% female	HAM-D₁₇: 18.9 (SD 4)	115 (115)	Aerobic exercise: supervised group exercise Control group: supervised stretching exercise in groups	Three sessions per week Control group: Three sessions per week	12 weeks
Chalder et al 40 UK	Outpatients Mean age: 40 years (SD 13) 66% female	BDI: 32.1 (SD 9)	361 (361)	Exercise: participants received individually tailored support and encouragement to engage in physical activity. Control group: standard GP care	Individual	16 weeks
Fang et al 73 China	Inpatients Mean age: 44 years (SD 14) 66.9% female	HAM-D₂₄: 29.2 (SD 5)	90 (90)	Aerobic exercise: groups 1 and 2 had supervised group exercise, high intensity Control group: 15 min stretching	Groups 1 and 2 had 3 and 5 sessions per week, respectively Control group: three sessions per week	6 weeks
Huipeng and Xiaohui70 China	Inpatients Mean age: 30 years (SD 5) 100% female	HAM-D₁₇: 28 (SD 5)	68 (68)	Aerobic exercise: jogging Control group: standard treatment	Five sessions per week	6 weeks
Ho et al 47 2014 Hong Kong	Inpatients Mean age: 46 years (SD 12) 67.3% female	MADRS: 19 (SD 10)	52 (52)	Aerobic exercise: supervised exercise Control group: 10 min stretching	Five sessions per week	3 weeks
Danielsson et al 65 Sweden	Outpatients Mean age: 45 years (SD 13) 76% female	MADRS: 24.0 (SD 5)	42 (42)	Mixed aerobic and non-aerobic exercise: first 2 weeks individual supervised exercise then supervised group exercise Control group: one session with advice on physical activity	Two sessions per week	10 weeks
Pfaff et al 54 Australia	Outpatients Mean age: 61 years (SD 8) 63% female	MADRS: 21.3 (SD NR)	200 (200)	Resistance exercise: supervised home-based exercise Control group: standard GP care	Three sessions per week	12 weeks
Guifeng 72 China	Inpatients Mean age: 33 years (SD 14) 70% female	HAM-D₂₄: 25.9 (SD 4)	70 (70)	Aerobic exercise: supervised group exercise Control group: standard treatment	Five sessions per week	8 weeks
Junchin et al 71 China	Inpatients Mean age: 28 years (SD 7) 61% female	HAM-D₂₄: 25.8 (SD 3)	70 (70)	Aerobic exercise: supervised aerobic exercise of the patients’ own choice Control group: standard treatment	Five sessions per week	8 weeks
Schuch et al 84 Brazil	Inpatients Mean age: 40 years (SD 11) 74% female	HAM-D₁₇: 26.7 (SD 2)	50 (50)	Aerobic exercise: supervised individual exercise. Control group: standard treatment	Three sessions per week	2 weeks
Kerling et al 49 Germany	Inpatients Mean age: 43 years (SD 10)	MADRS: 24.0 (SD 9)	42 (42)	Aerobic exercise: supervised exercise Control group: standard treatment	Three sessions per week	6 weeks
Belvederi et al 83 Italy	Outpatients Mean age: 75 years (SD 6) 71% female	HAM-D₁₇: 20.1 (SD 3)	121 (121)	Aerobic exercise: (1) sertraline+supervised non-progressive exercise OR (2) sertraline+supervised progressive aerobic exercise Control group: sertraline	Three sessions per week	24 weeks
Carneiro et al 66 Portugal	Outpatients Mean age: 50.16 years (SD 12) 100% female	BDI: 48.8 (SD 10)	26 (19)	Aerobic exercise: supervised exercise Control group: standard treatment	Three sessions per week	16 weeks
Doose et al 68 Germany	Outpatients Mean age: 47.9 years (SD 11) 63% female	HAM-D₁₇: 14.2 (SD 3)	46 (46)	Aerobic exercise: supervised aerobic exercise Control group: standard treatment	Three sessions per week	8 weeks
Pentecost et al 77 UK	Outpatients Mean age: 44.4 years (SD 14) 48% female	PHQ-9: 16.5 (SD 4)	60 (44)	Exercise: behavioural activation plus physical activity promotion Control group: behavioural activation	Individual	12 weeks
Salehi et al 69 Iran	Inpatients Mean age: 30.0 years (SD 6) 35% female	HAM-D₂₁: 43.4 (SD 8)	40 (40)	Aerobic exercise+ECT: supervised aerobic exercise Control group: ECT	Three sessions per week Control group: three ECTs per week	4 weeks
Legrand et al 67 France	Inpatients Mean age: 46.9 years (SD 13) 67% female	BDI: 36.0 (SD 6)	24 (24)	Aerobic exercise: supervised aerobic exercise Control group: standard treatment	10 sessions in 10 consecutive days	10 days
Euteneuer et al 75 Germany	Outpatients Mean age: 37.1 years (SD 12) 52% female	BDI: 27.2 (SD 9)	71 (68)	Exercise: CBT+PA promotion Control group: CBT+low energy activities	Individual	16 weeks
Olson et al 74 Ireland	Outpatients Mean age: 21.1 years (SD 2) 80% female	BDI: 24.2 (SD 12)	50 (30)	Aerobic exercise: supervised aerobic exercise Control group: stretching exercise	Three sessions per week Control group: three sessions per week	8 weeks
Patten et al 76 USA	Outpatients Mean age: 37.5 years (SD 11) 100% female	PHQ-9: 11.7 (SD 5)	30 (26)	Aerobic exercise: supervised aerobic exercise Control group: health education	Three sessions per week	12 weeks

BDI, Beck Depression Inventory; CBT, cognitive behavioural therapy; ECT, electroconvulsive therapy; EE, energy expenditure; GP, general practitioner; HAM-D17, Hamilton Depression Scale, 17 items (HAM-D21 and HAM-D24, 21-item and 24-item versions); MADRS, Montgomery-Asberg Depression Rating Scale; NR, not reported; PA, physical activity; PHQ-9, Patient Health Questionnaire, 9 items; PRT, progressive resistance training; SCL-D, Symptom Check List, depression subscale.


Bias risk assessment

Sequence generation was adequate in 15/35 trials (43%), allocation concealment in 13/35 (37%) and blinding of participants and trial personnel in 0/35 (0%); blinded outcome assessment was performed in 16/35 (46%); low risk of bias in the ‘incomplete outcome data’ domain was found in 12/35 (34%); the selective outcome reporting domain was adequate in 31/35 (89%) and the for-profit bias domain in 19/35 (54%); and 25/35 (71%) were free of other bias. Accordingly, all trials were at high risk of bias. Given the nature of the intervention, no trial had blinded participants or trial personnel; however, two trials had low risk of bias in all other bias domains.22 54 Five trials (14%) were sponsored by for-profit organisations: three trials were supported by pharmaceutical companies,53 79 82 one trial by a company producing fitness machines45 and one trial by an insurance company.21 According to our a priori defined criteria, 4/35 (11%) trials potentially had less risk of bias than the other trials at high risk of bias21 22 54 56 (see table 2 for details on the assessment of risk of bias).

Table 2

Risk of bias in trials assessing exercise for patients diagnosed with depression

Author, year of publication	Sequence generation	Allocation concealment	Blinding of participants and trial personnel	Blinding of outcome assessors	Incomplete outcome data	Selective outcome reporting	For-profit bias	Other bias	Comment on ‘other bias’
Klein et al 86 1985	Unclear	Unclear	High	High	High	Low	Low	Low
Martinsen et al 79 1985	Unclear	Unclear	High	High	High	Low	High	Low
Epstein 64 1986	Unclear	Unclear	High	High	High	Low	Unclear	High	Baseline difference
Doyne et al 43 1987	Unclear	Unclear	High	Low	High	Low	Unclear	High	Baseline difference
Veale et al 62 1992	Unclear	Unclear	High	High	High	Low	Low	High	Baseline difference
Singh et al 60 1997	Low	Unclear	High	Low	Low	Low	Low	High	Baseline difference
Blumenthal et al 38 1999	Unclear	Unclear	High	Low	High	Low	High	Low
Mather et al 52 2002	Low	Low	High	Low	High	Low	Low	Low
Dunn et al 45 2005	Low	Low	High	Low	High	High	High	Low
Singh et al 58 2005	Low	Low	High	Low	High	Low	Unclear	Low
Pilu et al 55 2007	Unclear	Unclear	High	Unclear	Low	Low	Unclear	Low
Viera et al 63 2007	Unclear	Unclear	High	Unclear	Low	Low	Unclear	Low
Blumenthal et al 39 2007	Low	Low	High	Low	High	High	Low	Low
Krogh et al 21 2009	Low	Low	High	Low	Low¹	High	High	High	Baseline difference
Mota-Pereira et al 53 2011	Unclear	Unclear	High	Low	High	Low	High	High	Baseline difference
Krogh et al 22 2012	Low	Low	High	Low	Low	Low	Low	Low
Chalder et al 41 2012	Low	Low	High	High	Low	Low	Low	Low
Fang et al 73 2013	Unclear	Unclear	High	Unclear	Unclear	High	Unclear	Low
Huipeng and Xiaohui 70 2013	Unclear	Unclear	High	Unclear	Low	Low	Unclear	Low
Ho et al 47 2014	Low	Unclear	High	Low	High	Low	Low	Low
Danielsson et al 65 2014	Unclear	Low	High	Low	High	Low	Low	Low
Pfaff et al 54 2014	Low	Low	High	Low	Low¹	Low	Low	High	Baseline difference
Guifeng et al 72 2015	Unclear	Unclear	High	Unclear	Low	Low	Unclear	Low
Jinchun et al 71 2015	Unclear	Unclear	High	Unclear	Low	Low	Unclear	Low
Schuch et al 84 2015	Unclear	Low	High	Low	Low	Low	Low	Low
Kerling et al 49 2015	Unclear	Unclear	High	Unclear	Low	Low	Low	Low
Belvederi et al 83 2015	Low	Low	High	Low	High	Low	Low	High	Post hoc sample size
Carneiro et al 66 2015	Unclear	Low	High	High	Unclear	Low	Low	Low
Doose et al 68 2015	Unclear	Unclear	High	High	High	Low	Low	High	No sample size calc.
Pentecost et al 77 2015	Low	Low	High	High	High	Low	Low	Low
Salehi et al 69 2016	High	High	High	Low	Unclear	Low	Low	High	Baseline
Legrand and Neff 67	Low	High	High	High	High	Low	Unclear	Low
Euteneuer et al 75 2017	Low	Unclear	High	High	High	Low	Low	Low
Olson et al 74 2017	Low	Unclear	High	High	High	Low	Low	Low
Patten et al 76 2017	Unclear	Unclear	High	High	High	Low	Low	Low


Primary outcomes

The effect of exercise on depression severity

All included trials reported a continuous outcome for depression severity, encompassing 2498 of 2630 randomised participants (95%). The effect of exercise versus control was an SMD of −0.66 (95% CI −0.86 to −0.46; p<0.001) (figure 1). This corresponds to an effect on the HAM-D17 scale of −4.1 (95% CI −5.3 to −2.9) points.
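Converting an SMD back to HAM-D17 points amounts to multiplying the SMD and its confidence limits by a reference SD. The text does not state which SD was used; the sketch below assumes a reference SD of about 6.2 points, a hypothetical value back-calculated from the reported pair (−0.66 SMD corresponding to −4.1 points) rather than taken from the source.

```python
# Back-transform a standardised mean difference (SMD) to HAM-D17 points.
# ASSUMPTION: the reference SD below is hypothetical, back-calculated from the
# reported conversion (-0.66 SMD -> -4.1 points); the text does not state it.
HAMD17_REFERENCE_SD = 6.2


def smd_to_points(smd: float, sd: float = HAMD17_REFERENCE_SD) -> float:
    """Rescale an SMD to raw scale points by multiplying by a reference SD."""
    return smd * sd


estimate, lower, upper = -0.66, -0.86, -0.46  # pooled SMD and 95% CI
points = [round(smd_to_points(x), 1) for x in (estimate, lower, upper)]
print(points)  # [-4.1, -5.3, -2.9]
```

With the same assumed SD, the trim and fill estimate of −0.27 SMD reported for publication bias converts to roughly −1.7 points.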

Figure 1

Effect of exercise on depression severity in patients diagnosed with depression.

Missing data

Missing outcome analyses for depression as a continuous outcome did not markedly change the effect estimate. The scenario least favourable to the exercise intervention was the worst/best outcome analysis using +2 SD, giving an effect estimate of −0.57 SMD (95% CI −0.78 to −0.36; p<0.001) (see online supplementary table S1).

Heterogeneity and subgroup analysis

The I2 was 81%, suggesting substantial heterogeneity. Subgroup analysis revealed that the effect estimate for trials potentially at less risk of bias was −0.11 SMD (95% CI −0.41 to 0.18; p=0.45; I2=62%), compared with −0.75 SMD (−0.98 to −0.52; p<0.001; I2=81%) for trials at high risk of bias (test of subgroup difference, p<0.001). In addition, trials including 50 participants or fewer had a pooled estimate of −1.11 (−1.52 to −0.72; p<0.001; I2=78%), compared with −0.37 (−0.57 to −0.18; p<0.001; I2=75%) for larger trials (test of subgroup difference, p=0.001). Trials with a short intervention (<10 weeks) had an SMD of −0.92 (−1.09 to −0.74; p<0.001; I2=14%), compared with −0.49 (−0.75 to −0.23; p<0.001; I2=83%) for trials with longer interventions (test of subgroup difference, p=0.007). Effect estimates from trials including participants with minor depression did not differ from those of trials exclusively including participants with major depression (test of subgroup difference, p=0.53). Four trials allocated 206 participants to different exercise intensities/doses.45 58 73 83 Comparing postintervention depression scores for participants allocated to high-intensity/high-dose versus low-intensity/low-dose exercise showed a difference of −0.40 SMD (95% CI −0.67 to −0.12; p=0.005; I2=0%) in favour of high-intensity/high-dose exercise. As shown in table 3, no other trial characteristic significantly explained any of the observed heterogeneity (see online supplementary table S2 for the trial characteristics used to explore heterogeneity).
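Each test of subgroup difference compares two independent pooled estimates. A minimal sketch, assuming each subgroup estimate is approximately normal and that its SE can be recovered from the reported 95% CI as (upper − lower)/3.92. For two subgroups this z test is equivalent to a one-degree-of-freedom chi-squared test, but it is not necessarily the exact routine used by the review's software. Applied to the reported CIs, it gives p values consistent with those stated for trial size (p=0.001) and intervention duration (p=0.007).

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def se_from_ci(lower: float, upper: float) -> float:
    """Approximate SE of an estimate from its symmetric 95% CI."""
    return (upper - lower) / (2 * Z_975)


def subgroup_difference_p(est1, ci1, est2, ci2) -> float:
    """Two-sided p value for the difference between two independent
    subgroup estimates (z test; same as a 1-df chi-squared test)."""
    se_diff = math.hypot(se_from_ci(*ci1), se_from_ci(*ci2))
    z = (est1 - est2) / se_diff
    # P(|Z| > |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))


p_size = subgroup_difference_p(-1.11, (-1.52, -0.72), -0.37, (-0.57, -0.18))
p_duration = subgroup_difference_p(-0.92, (-1.09, -0.74), -0.49, (-0.75, -0.23))
print(round(p_size, 3), round(p_duration, 3))  # 0.001 0.007
```

Working from rounded CI limits means the recovered p values can only be expected to match the reported ones to the precision shown.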

Table 3

Heterogeneity of effect estimates for trials assessing the effect of exercise for patients diagnosed with depression explored by comparing subgroups

Subgroups	Number of trials (participants)	Random-effects meta-analysis SMD (95% CI, p, I²)	Test of subgroup difference, p value
Risk of bias
Less than high risk of bias¹	4 (530)	−0.11 (−0.41 to 0.18; p=0.45; I²=62%)	<0.001
High risk of bias	31 (1968)	−0.75 (−0.98 to −0.52; p<0.001; I²=81%)
Age
Old (>59 years)	5 (492)	−0.77 (−1.34 to −0.19; p=0.009; I²=87%)	0.78
Young (<59 years)	30 (2006)	−0.68 (−0.90 to −0.45; p<0.001; I²=83%)
Exercise context
Group exercise	26 (1785)	−0.75 (−1.01 to −0.50; p<0.001; I²=83%)	0.30
Individual exercise	9 (713)	−0.52 (−0.88 to −0.16; p=0.005; I²=73%)
Duration
<10 weeks	15 (721)	−0.92 (−1.09 to −0.74; p<0.001; I²=14%)	0.007
10 weeks or more	20 (1777)	−0.49 (−0.75 to −0.23; p<0.001; I²=83%)
Attention control
Attention control	10 (733)	−0.56 (−0.98 to −0.15; p=0.008; I²=85%)	0.91
Waitlist	2 (47)	−0.67 (−2.48 to 1.13; p=0.47; I²=88%)
Pharmacotherapy
Add-on	11 (734)	−0.92 (−1.38 to −0.46; p<0.001; I²=86%)	0.82
No medication	6 (318)	−0.82 (−1.58 to −0.06; p=0.03; I²=88%)
Somatic comorbidity
Somatic comorbidity	0	N/A
No comorbidity	35 (2331)	N/A
Minor depression
Including minor depression	6 (350)	−0.90 (−1.65 to −0.15; p=0.02; I²=86%)	0.53
No minor depression	25 (2148)	−0.65 (−0.87 to −0.43; p<0.001; I²=81%)
Patient setting
Inpatients	10 (549)	−0.88 (−1.07 to −0.70; p<0.001; I²=6%)	0.07
Outpatients	21 (1782)	−0.60 (−0.85 to −0.35; p<0.001; I²=83%)
Trial size
Trials n≤50	18 (578)	−1.11 (−1.52 to −0.72; p<0.001; I²=78%)	0.001
Trials n>50	17 (1920)	−0.37 (−0.57 to −0.18; p<0.001; I²=75%)
Increase in exercise capacity
VO₂max>2.8 mL/kg/min	5 (340)	−0.48 (−1.08 to 0.13; p=0.12; I²=86%)	0.65
VO₂max≤2.8 mL/kg/min	6 (661)	−0.32 (−0.61 to 0.02; p=0.03; I²=68%)

VO2max, maximal oxygen uptake.


Trial sequential analysis and diversity adjusted required information size

The diversity-adjusted required information size for HAM-D17 as a continuous outcome was 2610 participants, calculated from an anticipated intervention effect equal to a minimal relevant difference of 3.0 HAM-D17 points, an SD of 6.78 points, a risk of type I error of 0.05, a power of 90% and the observed diversity of 92%. Only 14 trials reported results on the HAM-D17,21 22 38 39 43 44 52 53 55 56 58 68 70 83 with 1124 participants accrued. As shown in online supplementary figure S2, the cumulative Z-curve just crossed the trial sequential monitoring boundary for benefit. With these settings, the pooled estimate is therefore less likely to be a random finding due to lack of power or multiple testing if bias could be ignored. Post hoc, we calculated the adjusted required information size for HAM-D17 including all trials, as shown in online supplementary figure S3. As in the original analysis, the Z-curve crossed the trial sequential monitoring boundary for benefit, supporting that the pooled estimate is less likely to represent a type I error if bias could be ignored.
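The information size calculation can be illustrated with the conventional sample size formula for comparing two means, inflated by 1/(1 − D²) for the observed diversity D². This textbook approximation is an assumption on our part: with the stated inputs it gives roughly 2680 participants rather than the 2610 reported, so the trial sequential analysis software may compute the figure somewhat differently.

```python
import math
from statistics import NormalDist


def required_information_size(delta: float, sd: float,
                              alpha: float = 0.05, power: float = 0.90) -> float:
    """Total participants (two arms, 1:1) needed to detect a mean difference delta."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)           # type II error = 1 - power
    return 4 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2


def diversity_adjusted(ris: float, diversity: float) -> float:
    """Inflate an information size by 1 / (1 - D^2); D^2 given as a fraction."""
    if not 0 <= diversity < 1:
        raise ValueError("diversity must lie in [0, 1)")
    return ris / (1 - diversity)


ris = required_information_size(delta=3.0, sd=6.78)
daris = diversity_adjusted(ris, diversity=0.92)
print(math.ceil(ris), math.ceil(daris))  # 215 2684
```

The example shows why diversity dominates the result here: a D² of 92% multiplies the unadjusted requirement by 12.5.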

Bayes factor

Fourteen trials reported effect estimates using the HAM-D17.21 22 38 39 43 45 52 53 55 63 68 70 83 84 Based on these trials, the Bayes factor was calculated (δ=−3.37; SEδ=0.96; µa=−3.0) and found to be 0.002, which is below the Bayes factor threshold for significance of 0.1, supporting the intervention effect if bias could be ignored.
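The Bayes factor compares the likelihood of the observed effect δ (with standard error SEδ) under the null hypothesis of no effect with its likelihood under the anticipated effect µa. Assuming a normal likelihood, that ratio simplifies to exp[(µa² − 2µaδ)/(2SEδ²)]. The sketch below implements this expression; treating it as the formula behind the reported value is an assumption, although it reproduces 0.002 from the stated inputs.

```python
import math


def bayes_factor(delta: float, se: float, mu_a: float) -> float:
    """Likelihood ratio of H0 (true effect 0) to HA (true effect mu_a).

    Assumes the observed effect delta is normally distributed with standard
    error se. Values well below 1 favour the anticipated effect mu_a.
    """
    # N(delta; 0, se) / N(delta; mu_a, se) reduces to this single exponent.
    return math.exp((mu_a ** 2 - 2 * mu_a * delta) / (2 * se ** 2))


bf = bayes_factor(delta=-3.37, se=0.96, mu_a=-3.0)
print(round(bf, 3))  # 0.002
```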

Publication bias

Inspection of the funnel plot (not shown in the main text; see online supplementary figure S4) suggested that small trials with small or no effect of exercise were missing. Egger’s test supported the suspicion of publication bias, p<0.00001. Using Duval and Tweedie’s trim and fill procedure, the estimate was reduced to −0.27 SMD (95% CI −0.50 to −0.05). This corresponds to an effect on the HAM-D17 scale of −1.7 (95% CI −3.1 to −0.31) points.
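Egger’s test regresses each trial’s standard normal deviate (effect divided by its SE) on its precision (1/SE); an intercept that differs from zero indicates funnel-plot asymmetry. The sketch below fits that regression by ordinary least squares using only the standard library and returns the intercept with its t statistic. The p value would then come from a t distribution with n − 2 degrees of freedom (for example `scipy.stats.t.sf`, if SciPy is available). Trial-level inputs are not given in this section, so no data are shown.

```python
import math
from typing import Sequence, Tuple


def egger_regression(effects: Sequence[float],
                     ses: Sequence[float]) -> Tuple[float, float, float, int]:
    """Egger's regression test for funnel-plot asymmetry (sketch).

    Regresses effect / SE on 1 / SE by ordinary least squares and returns
    (intercept, SE of intercept, t statistic, degrees of freedom).
    """
    n = len(effects)
    if n != len(ses) or n < 3:
        raise ValueError("need at least three trials, each with an SE")
    y = [e / s for e, s in zip(effects, ses)]  # standard normal deviates
    x = [1.0 / s for s in ses]                 # precisions
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("SEs must differ between trials")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    df = n - 2
    sigma2 = sum(r * r for r in residuals) / df  # residual variance
    se_intercept = math.sqrt(sigma2 * (1.0 / n + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else math.nan
    return intercept, se_intercept, t_stat, df
```

With effects coded so that negative SMDs favour exercise, small trials reporting larger benefits pull the intercept below zero.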

The effect of exercise on depression—lack of remission

Nineteen trials, randomising 1825 participants and including 1639 participants (90%) in the final analysis, reported remission as an outcome.21 22 38–40 43 45 47 49 53 54 56 60 61 65 68–70 72 Remission postintervention was defined in various ways: a postintervention HAM-D17 score <8 points,44 53 56 69 70 not fulfilling the DSM criteria for depression and a HAM-D17 score <8 points,21 22 39 not fulfilling the DSM criteria for depression,38 54 60 a BDI score <9 points,43 a BDI score <10 points,40 a HAM-D17 score <10 points,83 a Montgomery-Asberg Depression Rating Scale (MADRS) score <10 points,47 a MADRS score <10 points and a 50% reduction in symptom score,65 a 75% reduction in HAM-D24,72 a HAM-D17 score <11.28 points and a reduction in HAM-D17 score >7.74 points,68 and, in one study, the MADRS without a specified cut-off for remission.49 The RR for lack of remission was 0.78 (95% CI 0.68 to 0.90; p=0.0008) in favour of the intervention using a random-effects analysis. The I2 was 69%, suggesting substantial heterogeneity. The forest plot for the intervention effect on lack of remission is illustrated in online supplementary figure S5. The scenario least favourable to the intervention was the ‘poor’ outcome analysis, with an effect estimate of RR 0.88 (95% CI 0.83 to 0.94; p=0.0002; I2=69%). As shown in online supplementary table S1, the remaining scenarios did not substantially differ from the main analysis. For this outcome, only two trials22 84 were considered to be potentially at less risk of bias than the other trials at high risk of bias. The RR of these two trials was 0.95 (95% CI 0.74 to 1.23; p=0.78) compared with 0.77 (95% CI 0.64 to 0.92; p=0.003) for trials at high risk of bias (test of subgroup difference, p=0.19).
Trials including 52 participants or fewer in their final analysis had an RR of 0.62 (95% CI 0.50 to 0.76; p<0.001; I2=45%) compared with 0.95 (95% CI 0.80 to 1.12; p=0.52; I2=68%) for larger trials (test of subgroup difference, p=0.002). Also, trials with a duration of <10 weeks had an RR of 0.63 (95% CI 0.51 to 0.77; p<0.001; I2=40%) compared with 0.93 (95% CI 0.78 to 1.10; p=0.39; I2=69%) for trials of longer duration (test of subgroup difference, p=0.004). As shown in online supplementary table S3, no other trial characteristic significantly explained any of the observed heterogeneity (see online supplementary table S2 for the trial characteristics used to explore heterogeneity). The diversity-adjusted required information size for lack of remission was calculated based on the observed diversity of 74%, a proportion of participants with lack of remission in the control group of 66%, an anticipated intervention effect of a 15% relative risk reduction, a risk of type I error of 0.05 and a power of 90%. As shown in online supplementary figure S6, the cumulative Z-curve just crossed the trial sequential monitoring boundary for benefit. With these settings, the pooled estimate is therefore less likely to be a random finding due to lack of power or multiple testing if bias could be ignored. The Bayes factor was calculated based on the observed relative risk of remission, the associated SE and an anticipated intervention effect of a 15% relative increase in the number of participants with remission (δ=−0.248; SEδ=0.08; µδ=−0.163). The Bayes factor was 0.02, which is below the Bayes factor threshold for significance of 0.1. Inspection of the funnel plot (not shown) suggested that small trials with small or no effect of exercise were missing. Egger’s test supported the suspicion of publication bias, p=0.002.
Imputing theoretically missing studies with Duval and Tweedie’s trim and fill procedure reduced the estimated intervention effect to an RR of 0.93 (95% CI 0.79 to 1.11).
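For ratio measures such as the RR, the same kind of subgroup comparison is usually made on the log scale, where the CI is symmetric. A minimal sketch, assuming normality of log(RR) and recovering each SE from the reported 95% CI. Applied to the reported subgroup RRs, it gives p values consistent with the stated 0.19 (risk of bias) and 0.002 (trial size), and about 0.004 for intervention duration; it is not necessarily the authors' exact procedure.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def log_se_from_ci(lower: float, upper: float) -> float:
    """SE of log(RR) recovered from a 95% CI reported on the ratio scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_975)


def ratio_subgroup_p(rr1, ci1, rr2, ci2) -> float:
    """Two-sided p value for the difference between two independent RRs."""
    se_diff = math.hypot(log_se_from_ci(*ci1), log_se_from_ci(*ci2))
    z = (math.log(rr1) - math.log(rr2)) / se_diff
    return math.erfc(abs(z) / math.sqrt(2))


p_bias = ratio_subgroup_p(0.95, (0.74, 1.23), 0.77, (0.64, 0.92))
p_size = ratio_subgroup_p(0.62, (0.50, 0.76), 0.95, (0.80, 1.12))
p_duration = ratio_subgroup_p(0.63, (0.51, 0.77), 0.93, (0.78, 1.10))
print(round(p_bias, 2), round(p_size, 3), round(p_duration, 3))
```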

The effect of exercise on serious adverse events

Serious adverse events (ie, death or suicide attempts) were reported in only three trials.21 22 58 In these trials, one suicide attempt22 and one death by suicide21 were recorded in the intervention groups. The RR for death or suicide attempt in the two trials was 2.21 (95% CI 0.24 to 20.21; p=0.48; I2=0%), as illustrated in online supplementary figure S7. Missing outcome analyses for ‘serious adverse events’ varied according to the missing data scenario: poor outcome analysis, RR 0.92 (95% CI 0.37 to 2.30; p=0.86; I2=60.0%); good outcome analysis, RR 2.19 (95% CI 0.23 to 20.76; p=0.50; I2=0.0%); best/worst outcome analysis, RR 0.08 (95% CI 0.02 to 0.34; p=0.001; I2=5.4%); and worst/best outcome analysis, RR 19.17 (95% CI 2.64 to 139.2; p=0.004; I2=0.0%).

Trial sequential analysis and Bayes analysis

We decided not to conduct trial sequential analysis or Bayes analysis because the data were too sparse. Only 3/35 trials reported on this outcome, and no formal assessment of publication bias was made. However, the lack of reporting in the vast majority of trials suggests a risk of publication bias.

Secondary outcomes

The effect of exercise on quality of life

Nine trials randomising 827 participants reported on quality of life.21 22 38 40 56 60 71 76 85 Participants allocated to exercise did not have significantly better quality of life (SMD 0.40; 95% CI −0.03 to 0.83; p=0.07). The I2 was 88%, indicating substantial heterogeneity (see online supplementary figure S8).

Non-serious adverse events

Non-serious adverse events were reported in only 10 trials.21 22 39 56 58 60 65 67 68 75 Five trials reported on musculoskeletal adverse events without conducting formal tests,58 60 65 67 68 and four trials reported the number of participants with higher depression scores postintervention than at baseline.21 22 65 68 The RR for increased severity of depression postintervention in participants allocated to exercise was 0.83 (95% CI 0.40 to 1.70; p=0.60; I2=0.0%).

The effect of exercise on depression beyond the duration of the intervention

Assessment of depression beyond the intervention was conducted in seven trials,21 38 40 52 60 63 86 with a median interval between the end of the intervention and the assessment of depression of 6 months (range 5–23.5 months). The SMD between the intervention group and the control group using a random-effects analysis was −0.10 (95% CI −0.28 to 0.09; p=0.31; I2=19.5%), suggesting low heterogeneity (see online supplementary figure S9). Remission beyond the intervention was assessed in five trials,21 38–40 54 and the relative risk of lack of remission was 0.95 (95% CI 0.82 to 1.11; p=0.53; I2=0.0%) (see online supplementary figure S10).

GRADE assessments

The GRADE assessments are presented in table 4, and quality of evidence for both primary and secondary outcomes was very low or low.

Table 4

Summary of findings

Exercise compared with control or treatment as usual for depression
Patient or population: depression
Setting: inpatients or outpatients
Intervention: exercise
Comparison: control or treatment as usual
Outcomes	Anticipated absolute effects* (95% CI): risk with control or treatment as usual	Anticipated absolute effects* (95% CI): risk with exercise	Relative effect (95% CI)	No. of participants (studies)	Quality of the evidence (GRADE)	Comments
Severity of depression	-	0.66 SMD lower (0.46 lower to 0.86 lower)	–	2498 (35 RCTs)	⨁◯◯◯ Very low†	Lower depression scores indicate improvement. SMD of 0.3 is considered clinically relevant.
Lack of remission	Study population: 646 per 1000	504 per 1000 (426 to 594)	RR 0.78 (0.68 to 0.90)	1639 (19 RCTs)	⨁◯◯◯ Very low‡	Remission is, with minor variations, defined as not fulfilling the criteria for depression.
Serious adverse events	Study population: 0 per 1000	0 per 1000 (0 to 0)	RR 2.21 (0.24 to 20.21)	335 (3 RCTs)	⨁⨁◯◯ Low§
Quality of life	–	0.40 SMD higher (0.03 lower to 0.83 higher)	–	827 (9 RCTs)	⨁◯◯◯ Very low¶	Quality of life was assessed using a number of different methods. Higher scores indicate improved quality of life. Nine of 35 trials reported on this outcome.
Depression severity after the intervention	–	0.06 SMD lower (0.25 lower to 0.14 higher)	–	713 (7 RCTs)	⨁⨁◯◯ Low**	Lower depression scores indicate improvement. SMD of 0.3 is considered clinically relevant.
Lack of remission after the intervention	Study population: 469 per 1000	446 per 1000 (385 to 521)	RR 0.95 (0.82 to 1.11)	777 (5 RCTs)	⨁⨁◯◯ Low††
Depression severity, restricted to trials with less than high risk of bias	–	0.11 SMD lower (0.41 lower to 0.18 higher)	–	530 (4 RCTs)	⨁⨁◯◯ Low‡‡	Lower depression scores indicate improvement. SMD of 0.3 is considered clinically relevant.

GRADE Working Group grades on evidence: high quality: we are very confident that the true effect lies close to that of the estimate of the effect; moderate quality: we are moderately confident in the effect estimate. The true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different; low quality: our confidence in the effect estimate is limited. The true effect may be substantially different from the estimate of the effect; very low: we have very little confidence in the effect estimate. The true effect is likely to be substantially different from the estimate of the effect.

†Downgraded by 3: risk of bias, inconsistency and publication bias.

‡Downgraded by 3: risk of bias, inconsistency and publication bias.

§Downgraded by 2: imprecision and publication bias.

¶Downgraded by 3: risk of bias, inconsistency and imprecision.

**Downgraded by 2: risk of bias and imprecision.

††Downgraded by 2: risk of bias and imprecision.

‡‡Downgraded by 2: inconsistency and imprecision.

*The risk in the intervention group (and its 95% CI) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).

RCT, randomised clinical trial; RR, risk ratio; SMD, standardised mean difference.
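The footnoted rule for the absolute effects (assumed control risk multiplied by the relative effect and by each of its CI limits) is simple arithmetic. A short sketch using the row for lack of remission after the intervention (469 per 1000 with control, RR 0.95, 95% CI 0.82 to 1.11); the function name is illustrative only.

```python
def risk_with_intervention(control_per_1000: float, rr: float) -> int:
    """Assumed control-group risk (per 1000) multiplied by a risk ratio."""
    return round(control_per_1000 * rr)


control = 469  # lack of remission after the intervention, control group
estimate, lower, upper = (risk_with_intervention(control, rr)
                          for rr in (0.95, 0.82, 1.11))
print(estimate, lower, upper)  # 446 385 521
```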


Additional analysis

Four studies reported change in scores from baseline with corresponding SDs, and one study reported the mean difference between groups postintervention. Comparing the effect sizes of these five studies with those of the remaining studies did not appear to explain any of the heterogeneity (p=0.23).

Discussion

Thirty-five clinical trials including 2498 participants diagnosed with depression according to validated diagnostic instruments were included in the present systematic review. Pooled estimates suggested a moderate antidepressant effect, assessed both as a continuous outcome and as lack of remission. Due to risk of bias, inconsistency of effect estimates and publication bias, however, we have very little confidence in these effect estimates. Subgroup analyses exploring reasons for the heterogeneity found no effect of exercise on depression in trials potentially at less risk of bias than the other trials at high risk of bias. Furthermore, duration of intervention and trial size were inversely associated with effect estimates. Exercise did not improve quality of life, depression severity or remission after the intervention. Serious adverse events and adverse events were reported inconsistently and by only a few trials, which does not permit firm conclusions regarding these outcomes.

Strengths and limitations

The strengths of this systematic review are that it is based on a published protocol, a comprehensive search strategy and the inclusion of patient-centred outcomes such as quality of life, as well as adverse events. Also, to avoid spurious findings from repeated testing, trial sequential analysis and Bayes analysis were undertaken, and these analyses did not suggest that the pooled estimates for depression severity or lack of remission could be explained by random error. Neither trial sequential analysis nor Bayes factor analysis is, however, able to remove spurious effects induced by bias, fraud or other causes.26 29 87–89 Had we restricted the trial sequential analysis to trials at potentially lower risk of bias, the number of trials and participants would have been limited, and the evidence would have been far from crossing any boundaries for benefit, harm or futility. The estimates for serious adverse events and adverse events had wide CIs due to lack of data, and firm conclusions for these outcomes are presently not possible. The proportion of trials with adequate allocation concealment was 37% in the current systematic review compared with only 15.1% in trials assessing non-drug interventions for depression.90 Blinded outcome assessment was performed in 46% of the included trials compared with 44% in non-drug antidepressant trials in general.90 The incomplete outcome data domain was adequate in 34% of our included trials compared with 32.9% of non-drug antidepressant trials in general.90 Compared with non-drug trials assessing interventions for participants with depression, the included exercise trials thus have more bias domains at low risk of bias. However, all our included trials were at high risk of bias. Two trials had low risk of bias in all bias domains except blinding of participants and trial personnel, and four trials fulfilled our criteria for trials at potentially less risk of bias than the rest of the trials at high risk of bias.
Despite a search strategy including bibliographical databases and trials from China and South America, the vast majority of included trials were conducted in North America and western Europe, which is comparable to the geographical distribution of non-drug trials in general;90 this limits the applicability to other geographic regions. All outcomes for the primary analysis reflect depression severity; however, the different psychometric scales may capture different aspects of depression not reflected in the pooled estimate. An in-depth discussion of the included assessment scales is beyond the scope of this review, but in the current systematic review we found no significant differences between effect estimates from trials using the HAM-D17 and those from trials using other assessment scales (data not shown).

The effect of exercise on depression

Our present results are similar to those of the latest Cochrane review by Cooney et al,24 who found a moderate effect of exercise on depressive symptoms (−0.62 SMD) when including all trials and no effect when restricting the analysis to trials with less risk of bias (−0.18 SMD). The Cochrane review did find evidence of a small antidepressant effect beyond the intervention, which we could not confirm in our present systematic review. Bridle et al 13 included nine trials allocating older (>60 years) participants with depression to exercise interventions versus control interventions. Restricting the analysis to four trials at lower risk of bias, they found small-to-moderate effect estimates (SMD −0.34) in favour of exercise. The studies by Cooney et al 24 and Bridle et al 13 both included trials allocating participants with depressive symptoms not necessarily diagnosed using a validated diagnostic system, potentially explaining the differences in effect sizes. However, in our present systematic review the estimate for the four trials at potentially less risk of bias than the remaining trials was −0.11 SMD, and in the study by Cooney et al the effect estimate for eight trials with lower risk of bias was −0.18 SMD,24 compared with −0.34 in the study by Bridle et al.13 Meta-analyses of randomised clinical trials assessing the effects of exercise for depression consistently find positive effects; however, when the analysis is restricted to trials with less risk of bias, the pooled effect sizes become very small or negligible. Meta-analyses examining the effect of exercise beyond the intervention also find no or small effects of exercise. When interpreting effect estimates in this research field, it is important to recognise that effect estimates from trials with non-blinded outcome assessment are at high risk of bias, as reported by Savović et al.91 Only 16 of the 35 trials in the current systematic review used blinded outcome assessment.
In contrast to the current systematic review, a recent meta-analysis by Schuch et al 12 concluded that ‘exercise has a large and significant antidepressant effect in people with depression … Our data strongly support the claim that exercise is an evidence-based treatment for depression’. This statement was based on a meta-analysis of 25 randomised clinical trials allocating participants with depression or depressive symptoms to exercise or control conditions and excluding trials using any form of active control group. Surprisingly, the authors found that adjusting for publication bias using the trim and fill procedure31 increased the estimate from an SMD of 0.98 to 1.11. The SMDs of the included studies ranged from −0.23 to 4.56, representing considerable heterogeneity.12 The authors classified four trials as having lower risk of bias, using the same criteria as in our systematic review, and 21 trials as having high risk of bias. This illustrates some of the challenges in meta-analyses of exercise and depression: large heterogeneity driven by small studies that inflate the effect in random-effects analysis,92 and the misconception that we can restrict our analysis to statistics without considering the evident effect of bias.23 91 Compared with our previous review,10 we now included 35 trials with 2498 participants versus 13 trials with 687 participants previously. It may seem paradoxical that this large increase in data has not provided a similar increase in the certainty of conclusions, as reflected by the heterogeneity of trial results and by the conclusions of our systematic reviews. The increase in available data is, however, primarily provided by small trials at high risk of bias, which introduce exaggerated effect estimates.
In the current systematic review, we included four trials with 530 participants at lower risk of bias compared with three trials with 239 participants in our previous review, reflecting that only a small part of the additional data comes from trials at lower risk of bias. A continuous increase in data at high risk of bias will not provide patients, clinicians or policymakers with adequate information and represents unethical enrolment of trial participants and a waste of resources.93–99 We therefore recommend that future systematic reviews and meta-analyses define a priori a primary outcome analysis restricted to larger trials at lower risk of bias, and that any recommendations regarding exercise interventions for participants with depression be assessed with the GRADE framework. The I2 values of 81% and 69% for the primary outcomes indicate substantial heterogeneity of intervention effects, that is, variation in effect estimates beyond chance. Part of this heterogeneity was explained by bias and by trial size: trials at high risk of bias and small trials had very large effect estimates compared with trials potentially at less risk of bias and larger trials. The funnel plots and Egger’s test indicate publication bias; however, the association between trial size and effect estimates could suggest that the asymmetry in the funnel plots is due to small-study bias rather than publication bias.100 It could be argued that both the delivery of exercise and the actual increase in fitness are fundamental to the assessment of the antidepressant effects of exercise, and in line with our previous review, we found duration of intervention inversely associated with effect size.11 Comparing different exercise intensities, we found a small effect of high-intensity exercise compared with lower-intensity exercise. However, when delivered exercise was expressed as the increase in maximal oxygen uptake, we could not reproduce this finding.
Future trials need to pay more attention to the dose of the intervention and to participants' compliance with it.101 We suggest using maximal oxygen uptake or one-repetition maximum as the gold standard for assessing the exercise actually received. To reduce the influence of non-specific effects, several trials compared exercise with control interventions rather than with a waitlist control, for example, the DEpression og MOtion (DEMO) trials and the trials by Mather et al.21 22 52 It could also be speculated that the effect of exercise would be harder to detect if participants received concurrent medical treatment. The choice of control group is important in non-drug trials: with a waitlist control group, the results may partly reflect non-specific effects, whereas with an active control group (eg, relaxation exercise), the trial may in effect compare two active treatments. However, the current systematic review could not confirm that the type of control condition explained heterogeneity: we found no evidence that trials using an attention control group or exercise as add-on to pharmacotherapy had significantly different effect estimates compared with other trials.

Our systematic review did not find indications of a positive effect on quality of life in participants with depression allocated to exercise interventions, which is consistent with the review by Cooney et al.24 Only 3 of 35 trials reported on serious adverse events, and we found no significant effects of exercise on risk of death or suicide attempt. No increase in depression severity or other adverse events could be detected in participants allocated to exercise. However, data on adverse events were reported only sporadically, in a minority of trials, so it is currently not possible to draw conclusions about the risk of serious adverse events or adverse events from exercise interventions in participants with depression.
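Statements such as "no evidence that trials using an attention control group or exercise as add-on to pharmacotherapy had significantly different effect estimates" rest on comparing pooled estimates between subgroups of trials. This section does not say which test was used; a common approach, assumed here, is a test for subgroup differences, which for two independent subgroups reduces to a z-test on the difference between their pooled estimates. The Python sketch below shows that calculation, including recovering a standard error from a reported 95% confidence interval. The helper names and all input values are hypothetical, not data from this review.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96, two-sided 95% critical value


def se_from_ci(lower, upper):
    """Approximate SE of an estimate from its symmetric 95% confidence interval."""
    return (upper - lower) / (2 * Z_975)


def subgroup_difference(est_a, se_a, est_b, se_b):
    """Test whether two independent subgroup estimates differ.

    Returns (difference, z, two-sided p). For two subgroups this is
    equivalent to the Q_between chi-square test with df = 1, since Q = z^2.
    """
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return diff, z, p


# Hypothetical pooled SMDs for two subgroups of trials (illustrative only).
subgroup_a = (-0.80, 0.12)                       # estimate, SE
subgroup_b = (-0.10, se_from_ci(-0.40, 0.20))    # SE recovered from a 95% CI

diff, z, p = subgroup_difference(*subgroup_a, *subgroup_b)
print(f"difference = {diff:.2f} SMD, z = {z:.2f}, p = {p:.4f}")
```

With only two subgroups, Q_between equals z², so the chi-square (df = 1) and normal p-values coincide; comparisons across more than two subgroups would need the general Q_between statistic with G − 1 degrees of freedom.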

Conclusions

We have little confidence in the pooled effect estimates, especially because trials at less than high risk of bias produced significantly lower effect estimates. This suggests that exercise interventions produce only small or negligible antidepressant effects, depending on how much of the effect is caused by bias and how much by the intervention. There was no effect of exercise on depression during follow-up after the intervention had ended. We found no effect on quality of life. There is currently no evidence in favour of exercise for patients with depression with a view to ameliorating depressive symptoms. Our systematic review did not evaluate possible beneficial effects of exercise on, for example, metabolism or cardiovascular fitness,22 102 and it is possible that exercise has beneficial effects on these factors in patients diagnosed with depression.

Future perspectives

Despite the large number of published trials, further trials with more robust methodology still seem to be required for progress in this field. Additional trials from outside North America and Europe may also be required for the results to be valid for patients in Asia, Africa and South America. To build on the current findings, we recommend that future trials include blinded outcome assessors and outcomes assessing quality of life, metabolic effects and long-term effects beyond the intervention. It is also important that future trials systematically collect and report data on death, suicide events, musculoskeletal injuries and other potential adverse effects in both the intervention and control groups. Moreover, future trials ought to be designed according to the Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT) guidelines and reported according to the Consolidated Standards of Reporting Trials (CONSORT) guidelines,103 104 and should transparently share deidentified individual participant data to enable individual participant data meta-analyses.105
