
From Randomized Controlled Trials of Antidepressant Drugs to the Meta-Analytic Synthesis of Evidence: Methodological Aspects Lead to Discrepant Findings.

Konstantinos N Fountoulakis, Roger S McIntyre, André F Carvalho¹.

Abstract

During the last decade, several meta-analytic studies employing different methodological approaches have reached inconsistent conclusions regarding antidepressant efficacy. Herein, we aim to comment on methodological aspects that may have contributed to disparate findings. We initially discuss methodological inconsistencies and limitations related to the conduct of individual antidepressant randomized controlled trials (RCTs), including differences in allocated samples, limitations of psychometric scales, possible explanations for the heightened placebo response rates in antidepressant RCTs across the past two decades, as well as the reporting of conflicts of interest. In the second part of this article, we briefly describe the various meta-analytic techniques (e.g., simple random effects meta-analysis and network meta-analysis) and the application of these methods to synthesize evidence related to antidepressant efficacy. Recently published antidepressant meta-analyses often provide discrepant results, and similar results often lead to different interpretations. Finally, we propose strategies to improve methodology considering real-world clinical scenarios.

Substances:
Antidepressive Agents

Year: 2015 PMID: 26467410 PMCID: PMC4761632 DOI: 10.2174/1570159x13666150630174343

Source DB: PubMed Journal: Curr Neuropharmacol ISSN: 1570-159X Impact factor: 7.363

INTRODUCTION

Meta-analysis is regarded as the ‘gold standard’ approach for the evaluation and ranking of evidence in healthcare [1, 2]. However, important concerns related to the conduct of meta-analyses have emerged in the literature [3, 4]. Although meta-analysis is a relevant method to synthesize and rank research data [5], a number of methodological issues and the presence of inherent biases (e.g., heterogeneity and discrepant methodologies across included trials) often lead to erroneous meta-analytic results and interpretations [4, 6, 7]. Currently, there is a significant number of meta-analyses concerning antidepressants; overall, they have reported the presence of significant publication bias [8] and a relatively small effect size in comparison to placebo [9-6], while there is controversy on the role of initial severity [12, 13, 15]. Nevertheless, a new network meta-analysis affirms the efficacy of antidepressants for mild depression [17]. A number of meta-analyses also support the efficacy of manual-based psychotherapies (e.g., cognitive-behavioral therapy) [18-33]. Notwithstanding replicated positive meta-analytic data for antidepressants, important methodological issues have been raised concerning the validity of the results, with some analyses concluding non-efficacy [12, 34]. The publication of meta-analyses concluding non-efficacy of antidepressants has fuelled skepticism among stakeholders involved in the major depressive disorder (MDD) ecosystem [35]. The overarching aims of the present narrative review are three-fold: (1) to discuss methodological limitations concerning antidepressant RCTs; (2) to briefly overview the strengths and shortcomings of the main meta-analytic techniques; and (3) to describe antidepressant meta-analyses published in the past decade along with a critical methodological appraisal. 
Lastly, we propose strategies for improving the conduct and interpretation of antidepressant RCTs on a clinically informative basis, providing guiding principles or a systematic approach to meta-analysis to enhance consistency and rigor.

LIMITATIONS OF ANTIDEPRESSANT RCTS

Participant Characteristics

The recruitment and screening of participants for antidepressant RCTs are often problematic, mainly because financial benefits for researchers combine with pressure to meet deadlines. As a result, it has been reported that initial ratings are often inflated (i.e., patients recruited later for antidepressant RCTs may have their symptom severity artificially inflated) [36, 37]. Furthermore, patients recruited in RCTs generally do not correspond to the average real-life patient (i.e., the representativeness of included participants is often limited) [38-44]. There is also large variability between sites and countries: North America and Western Europe experience great difficulties in recruiting patients, whereas Eastern European countries and China recruit participants much more easily and quickly [4]. The presence of medical comorbidities as well as the history of response to medications in the past do not seem to be consistently reported in most RCTs [45, 46]. Failed trials have negative consequences, both in terms of some kind of tolerance towards antidepressants [47] and in terms of psychological consequences (e.g., demoralization) [45]. Thus, it seems necessary to radically change our approach and possibly the complete RCT paradigm (vide infra) [48].

TRIAL DESIGN CHARACTERISTICS AND RATING SCALES

Larger RCTs are considered to be more reliable and methodologically superior in comparison to smaller studies [49]. Most meta-analyses conclude that there are important basic flaws within RCTs; therefore, it seems more important to focus on improving the basic design and structure of RCTs and to utilize the knowledge on the issue that has accumulated in recent years [50, 51], rather than to improve meta-analytical methods so as to be able to analyze flawed data in a quasi-omnipotent way. A radically different approach suggests that treatments can be better evaluated by a series of smaller but very well designed trials of high quality, especially concerning study sample characteristics [4, 37, 52]. Notwithstanding the belief that increments in sample size would improve the signal-to-noise ratio in psychiatric RCTs, in certain circumstances the quality of data declines with larger sample sizes [37]. For example, in large-scale multicenter RCTs there is often financial compensation for both researchers and patients. The combination of financial interests with ethical and other administrative restrictions, significant pressure to complete recruitment within strict deadlines, and competition among study centers to recruit larger samples might result in the violation of inclusion criteria (e.g., inflation of severity scores) for participants recruited later in the trial [37]. Therefore, noise is no longer random but is systematically related to sample size, and eventually this may lead to a deterioration of the signal-to-noise ratio. The psychometric scales constitute an additional methodological problem. In addition to the fact that most of them do not correspond to the modern concept of depression, their scores are not rated as continuous variables, but rather as ordinal categorical ones, with unequal distances between score levels. 
In essence, depression rating scales derive scores from the accumulated number of qualitatively different questions/items. Thus, similar scores might correspond to radically different clinical profiles. The Hamilton depression rating scale (HDRS), which is the most commonly employed rating instrument across antidepressant RCTs, has a number of serious drawbacks. We know today that these drawbacks limit its utility, and in extreme cases they might even make it inappropriate for use in RCTs. The HDRS includes items reflecting core symptoms of depression; however, most of the items reflect either non-specific symptoms like anxiety or sleep disturbances, or medication side effects (e.g., gastrointestinal) [53-55]. Additionally, a cut-off point > 7 for the diagnosis of depression is generally suggested [56], but a score of 15 or 20 is often required for inclusion in an RCT [57, 58]. Moreover, a number of agents, including but not limited to benzodiazepines, second generation antipsychotics or antihistamines, could have a significant effect on HDRS scores, which could be erroneously attributed to ‘true’ antidepressant effects (Table ). Considering that in many antidepressant RCTs benzodiazepines or similar agents are allowed either in the placebo arm or in both arms, the final score might reflect an add-on effect of benzodiazepines rather than the actual drug vs. placebo effect. For example, in an RCT of bipolar depression olanzapine promoted a significant reduction in Montgomery–Åsberg Depression Rating Scale (MADRS) scores because it improved sleep, agitation and appetite, in spite of the fact that it did not improve core depressive symptoms [59]. Moreover, common side effects of antidepressants (e.g., headaches or gastrointestinal symptoms) could artificially inflate HDRS scores, thereby ‘masking’ antidepressant effects on HDRS items related to core depression dimensions. 
This ‘masking’ effect may be more substantial in the case of mild depression, where a ‘floor’ effect causes the numerical improvement of core items to be small. A report on the analysis of the change in core HDRS items seems to support this hypothesis. This analysis suggested that when only the core HDRS items were considered, the standardized mean difference (SMD) was markedly higher, reaching or exceeding the arbitrary NICE (National Institute for Health and Care Excellence) criterion for efficacy (i.e., an SMD above 0.5) [54, 55, 60-62]. Finally, the measurement of other domains may provide useful information on the clinical usefulness of antidepressants. For example, cognitive function has been related to psychosocial functioning, notably work performance [63]; it may therefore be considered an alternative outcome for antidepressant RCTs [64].

THE PLACEBO RESPONSE

In antidepressant trials for adults with MDD, the mean response rate is 31% in the placebo group vs. 50% in the medication group, and the placebo response rate has increased at a rate of 7% per decade over the last 30 years [65]. Thus, high placebo response rates have been regarded as a culprit for the fact that less than half of the antidepressant efficacy trials submitted to the US Food and Drug Administration for regulatory approval found the active drug superior to placebo [66]. In recent years, significant efforts have been directed towards a better comprehension of variables related to the high placebo response rates in antidepressant trials (reviewed in [67]). First, it is important to differentiate a “placebo response” from a “placebo effect”. A “placebo response” usually refers to the percentage of participants randomized to placebo who achieve at least a 50% reduction in baseline depressive symptoms, while a “placebo effect” refers to the therapeutic effect of receiving a substance or undergoing a procedure that is not caused by inherent powers of the specific substance or procedure [67, 68]. Different factors may influence the magnitude of the placebo response in antidepressant trials, which can be grouped as therapeutic factors, measurement factors, natural history of the illness and participant characteristics (Fig. ).
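The response criterion defined above (at least a 50% reduction from the baseline score) can be illustrated with a minimal sketch. The function name `response_rate`, its parameters and the scores below are hypothetical and chosen only for illustration; they are not data from any trial cited in this article.

```python
def response_rate(baseline_scores, endpoint_scores, threshold=0.5):
    """Fraction of participants whose score fell by at least `threshold`
    (0.5 = 50%) relative to baseline. Purely illustrative helper."""
    if len(baseline_scores) != len(endpoint_scores):
        raise ValueError("score lists must have the same length")
    responders = 0
    for baseline, endpoint in zip(baseline_scores, endpoint_scores):
        # Relative reduction from baseline; baseline is assumed to be > 0.
        reduction = (baseline - endpoint) / baseline
        if reduction >= threshold:
            responders += 1
    return responders / len(baseline_scores)


# Hypothetical HDRS scores for four participants in one arm.
placebo_baseline = [20, 24, 18, 22]
placebo_endpoint = [8, 20, 9, 15]
placebo_response = response_rate(placebo_baseline, placebo_endpoint)
```

Because the criterion dichotomizes a continuous change score, two participants with nearly identical improvement can fall on opposite sides of the 50% threshold.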

Treatment Effects

Two theories for understanding the mechanisms of placebo effects have been proposed, namely expectancy theory and classical conditioning [69]. The expectancy theory hypothesizes that placebo treatment promotes a conscious expectation by the patient that drives symptomatic improvement. On the other hand, classical conditioning theorists attribute placebo responses to unconscious learning processes in which the individual patient associates the improvement in symptoms (unconditioned response) with neutral stimuli including pills, treatment setting, etc. (conditioned stimulus). These stimuli by themselves are capable of inducing a therapeutic effect (conditioned response). It is likely that both mechanisms contribute to the observed placebo effects in antidepressant drug trials. Importantly, placebo treatment may influence neurobiological mechanisms involved in depression pathophysiology (e.g., dopaminergic neurotransmission) (see reference [68] for a review). Khan et al. were the first to report that the higher the number of treatment arms in an antidepressant RCT, the lower the “success” of the trial [70]. A greater number of active medication arms may increase the probability of receiving the ‘active treatment’, which might enhance patient expectations and in this way generate higher placebo response rates. Consistent with this hypothesis, in MDD trials, the mean response rates in head-to-head comparator trials are significantly higher in the medication group in comparison to placebo [71]. Papakostas and Fava confirmed in a meta-analysis that in a clinical trial the probability of receiving placebo was negatively associated with both antidepressant and placebo responses. Interestingly, for each 10% decrease in this probability, the antidepressant response increased by 1.8%, while the placebo response increased by 2.6% [72]. 
Several lines of evidence also indicate that the amount of therapeutic contact that participants receive throughout a trial may influence placebo response rates [67, 73, 74]. For example, Posternak and Zimmerman calculated changes in HDRS scores in 41 6-week RCTs of MDD as a function of the number of study visits [73]. A cumulative effect of increasing study visits on the placebo response rates was consistently demonstrated: between weeks 2 and 6, the mean improvement on HDRS scores was 4.24 points in those patients who had weekly visits vs. 3.33 points in those patients with one visit less vs. 2.49 points in those with two visits less. An analysis of antidepressant clinical trials in children and adolescents provided interesting results [74]. In contrast to the large differences in placebo response between the various study types in adults with MDD, there were no significant differences in placebo response rates between comparator and placebo-controlled studies in children and adolescents. The amount of therapeutic contact participants received appeared to influence treatment response rather than increased expectancy: a greater number of study visits was correlated with higher placebo response rates among adolescents. In summary, participant expectations and the amount of therapeutic contact they receive throughout a trial seem to play a role in placebo response rates. The magnitude of each of these effects appears to be influenced by features related to trial design as well as patient characteristics. The high placebo response rates may significantly decrease the likelihood of detecting medication-placebo differences. One approach to dealing with expectancy-related placebo effects has been the conduct of a single-blind placebo lead-in phase in which patients with a high placebo response rate are prematurely excluded from the trial. Notwithstanding this, previous reports indicate that this approach may not be effective in reducing placebo response [75, 76]. 
It is important to note that one study argued that double-blind lead-in periods may be more effective [77].

Measurement Factors

In most antidepressant clinical trials, investigators rate participants’ depressive symptoms based on changes in depressive severity that are either self-reported by participants or elicited by trained interviewers. Measurements of depressive symptoms are subject to random error, as is any other measure. However, unlike objective measures (e.g., cholesterol levels), the measurement of depressive symptoms may be associated with additional sources of bias.

Regression to the Mean

Regression to the mean occurs when repeated measurements subject to random error are obtained from the same individual over time. For example, imagine that the criteria for inclusion in an antidepressant trial require an HDRS score > 16. Some included participants may have ‘true’ means < 16, and the statistical tendency of the scores of these patients to decrease on repeated measures will give the appearance that depressive symptoms improved, when in reality no true therapeutic effect occurred.
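The following simulation is a minimal sketch of this phenomenon under stated assumptions: each simulated participant has a stable ‘true’ score, each observed score adds independent Gaussian measurement error, and no treatment is given. The function name, the distribution parameters and the cutoff are hypothetical illustrations, not estimates from any trial.

```python
import random
import statistics


def simulate_regression_to_mean(n=20000, true_mean=16.0, true_sd=4.0,
                                error_sd=3.0, cutoff=16.0, seed=0):
    """Return (mean baseline, mean follow-up) among participants whose
    observed baseline exceeded `cutoff`. No treatment effect is simulated."""
    rng = random.Random(seed)
    baseline_scores = []
    followup_scores = []
    for _ in range(n):
        true_score = rng.gauss(true_mean, true_sd)
        # Observed baseline = true score + random measurement error.
        baseline = true_score + rng.gauss(0.0, error_sd)
        if baseline > cutoff:  # inclusion criterion applied to the observed score
            # Second measurement of the same unchanged true score.
            followup = true_score + rng.gauss(0.0, error_sd)
            baseline_scores.append(baseline)
            followup_scores.append(followup)
    return statistics.mean(baseline_scores), statistics.mean(followup_scores)


mean_baseline, mean_followup = simulate_regression_to_mean()
apparent_improvement = mean_baseline - mean_followup
```

In this sketch the included group shows a lower mean score at follow-up although nothing has changed clinically, because selection on an error-prone baseline preferentially admits participants whose error happened to be positive.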

Sources of Bias

Rater bias occurs when the measurement of depressive symptoms is influenced by underlying beliefs about the drugs under study. Furthermore, the recruitment of participants for multicenter trials is a competitive process. Thus, the financial and professional returns related to the enrollment of a participant (instead of screening out a patient) may lead to an inflation of baseline severity scores [37, 67, 78]. Conversely, a response bias refers to the systematic tendency participants may have to respond to questionnaire items in accordance with the expectations of researchers (i.e., “on demand”) [67]. “Hawthorne effects” are a phenomenon whereby participants in a given experiment modify the behavior under study precisely because they know that the specific behavior is being measured. Therefore, response bias may be more problematic in antidepressant trials due to the inherently subjective nature of rating symptoms based on patients’ reports [67]. Mancini and colleagues performed a patient-level analysis of duloxetine (≥ 60 mg/day) RCTs obtained from Lilly [37]. Lower effect sizes were found for participants in the lowest baseline HDRS depression severity category and for patients in the last category of the recruitment period, whereas a higher effect size was obtained for subjects recruited in centers whose size was equal to or lower than 2.5 times the average site size for the trial. The methodological shortcomings posed by regression to the mean and rater bias (i.e., baseline score inflation and low inter- and intra-rater reliability) have been addressed in different ways. For example, one strategy involves setting a minimum baseline score for enrollment in a trial, but then including in the final analysis only participants with a priori defined higher score thresholds. Another strategy has been the use of centralized (and highly trained) raters, but this is often not possible at individual study sites. 
However, a recent report demonstrated no significant benefits of enhancing interviews with the Structured Interview Guide for the Montgomery–Åsberg Depression Rating Scale (SIGMA), audiotaping of patients’ interviews and “central” appraisal with the Rater Applied Performance Scale (RAPS) [79].

Natural History of the Illness

The impact of the natural course of depression on trial outcomes is better appreciated in psychotherapy trials, which commonly enroll a waiting list control group. A meta-analysis found that patients allocated to a waiting list control group experience an average improvement of 4 points on the HDRS over a mean follow-up duration of 4 weeks [80]. It seems reasonable to assume that natural history features play a progressively more important role in the outcomes of depression trials over time as the population enrolled in trials changes. For example, in the 1960s and 1970s, most trials enrolled inpatients with more severe depression, whereas more recent trials usually enroll participants with less severe depression. Arguably, individuals with less severe depression may present higher fluctuation in their symptoms (vide infra). Although the recruitment of participants with longer illness duration may mitigate the influence of natural history factors, this issue seems to be less dependent on investigator behavior than are measurement factors (Table ).

Characteristics of Enrolled Subjects

Several characteristics of enrolled subjects may influence the placebo response, namely prior exposure to antidepressant treatments, severity (vide infra), duration of illness, personality characteristics, degree of refractoriness, depression subtype (e.g., atypical versus melancholic), and comorbid psychiatric and medical conditions.

The Nocebo Effect

Nocebo refers to adverse events (AEs) related to the negative expectation that a treatment may harm instead of ameliorate the underlying medical condition. Nocebo effects may be evaluated in RCTs. A recent meta-analysis demonstrated that 44.7% of participants allocated to placebo experienced at least one AE, while one out of 20 placebo-treated patients was reported to have discontinued treatment due to AEs [81]. Furthermore, there were quantitative and qualitative associations between active and placebo AEs [81]. Thus, some strategies may prevent nocebo effects in antidepressant RCTs. For example, informed consents for the active treatments under investigation may be modified; the nocebo effect should be clearly discussed with the participant; and the proper blinding of raters who measure AEs in antidepressant RCTs may be an important step.

The Additive Model

The additivity thesis of pharmacological efficacy is crucial since it suggests that the specific or ‘true’ size of the pharmacological treatment effect is limited to the difference between the drug and placebo responses [82]. Although this is a convenient and practical model that does not explicitly imply a similar neurobiological mode of therapeutic action, it is important to note that, ultimately, this theory does indirectly imply such a similarity. This method is purely quantitative and thus demands similar ‘quality’. This method does not take into account that participants allocated to the placebo arm often receive additional treatments which may influence several HDRS items. Furthermore, this model has never been confirmed by neurobiological research. On the contrary, antidepressant and placebo responses could be distinct phenomena even if some degree of overlap exists. Four types of response patterns may exist: (i) placebo-only responders; (ii) treatment-only responders; (iii) placebo and treatment responders; and (iv) never responders. Kirsch [83] proposed a modified version of the balanced placebo design to answer this question. According to this proposal, half of the study participants would be given medication and half would be given placebos. However, informed consents are obtained for participants receiving either drugs or placebo, and participants are informed (or misinformed) after this consent has been given. All subjects are debriefed by the end of the investigation. Since this design would induce deception in a distressed population, serious ethical concerns have been raised [82].

META-ANALYSIS METHODS

Broadly, two meta-analytic methods have been employed to evaluate the efficacy of treatments in psychiatry. Standard pairwise meta-analysis allows the direct comparison of two treatments. For example, some antidepressant meta-analyses determined the relative efficacy and/or safety of specific antidepressants over placebo [84, 85], whereas other meta-analyses compared one antidepressant with another agent [86] or even with an antidepressant class [87] through the inclusion of head-to-head randomized trials. The overall estimation of effect sizes is influenced by methodological quality, publication bias, as well as the heterogeneity across studies. A fundamental assumption of all meta-analyses is that either the true treatment effect is constant across trials (fixed effects model) or that the trial-specific treatment differences follow a common distribution (random effects model). More recently, a meta-analytic method referred to as network meta-analysis (also referred to as mixed treatments comparisons meta-analysis and multiple treatments meta-analysis) has gained increasing popularity in psychiatry [88-90]. Network meta-analysis (NMA) allows the comparison of different treatments within a Bayesian framework through the incorporation of indirect evidence. Head-to-head (i.e., comparator) trials are relatively uncommon in medicine, including psychiatry [91]. Although NMA has a strong potential to rank evidence in psychiatry and, therefore, to influence public policies, several assumptions and limitations need to be addressed. While comparing a treatment A versus a treatment B, NMA incorporates both direct (A versus B comparisons) and indirect comparisons (for example, the combination of trials A versus C and B versus C) to estimate the AB difference in efficacy. For example, in the hypothetical Fig. , treatments A and C have not been compared directly; however, there is indirect evidence contrasting the effect size from the direct AB evidence with the effect size of the direct BC evidence. Importantly, indirect comparisons are built on an assumption of transitivity, which is of legitimate importance for an NMA [92]. The transitivity assumption requires that studies making distinct direct comparisons must be similar in all aspects other than the treatments. When both direct and indirect evidence are available in a network, we state that there is mixed evidence. For example, in Fig. , there is indirect evidence only concerning the comparison BD, while there is mixed evidence for comparisons AB, AD, AC, DC, and BC. Multiple treatments meta-analysis relies on the circumstances of each set of trials (e.g., inclusion criteria, randomization, baseline depression severity, etc.); thus, clinical judgment is important [93]. Several pairwise meta-analyses are not sufficiently powered [94], and similar concerns may extend to NMA [95]. Underpowered RCTs tend to be more prone to bias (e.g., spurious and exaggerated effect estimates and selective reporting of results). The combination of biased data may give rise to unreliable estimates in a network. Network meta-analysis constitutes a unique methodological approach to investigate whether heterogeneity exists in the pairwise comparisons it encompasses. Statistical heterogeneity occurs when estimates of treatment effects (e.g., odds ratios or relative risks) obtained from different trials vary more than would be expected by chance. Clinical heterogeneity occurs whenever there are differences between individual studies in terms of the characteristics of included participants. Furthermore, NMA allows the determination of whether coherence or consistency is present between the results of different clinical trials that constitute indirect comparisons and the available evidence from direct contrasts between treatments [96]. 
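The logic of an indirect comparison through a common comparator can be sketched with the simple adjusted indirect comparison (often attributed to Bucher and colleagues). This is a frequentist simplification used here only for illustration; it is not the Bayesian NMA machinery described above, and all function names and numerical values are hypothetical. Effects are assumed to be on an additive scale (e.g., mean differences or log odds ratios), and the pooling of direct and indirect estimates assumes transitivity and consistency.

```python
import math


def indirect_comparison(d_ac, se_ac, d_bc, se_bc):
    """Indirect A-vs-B estimate from A-vs-C and B-vs-C trials sharing
    comparator C. Variances add because the two sets of trials are
    independent."""
    d_ab = d_ac - d_bc
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    return d_ab, se_ab


def combine_direct_indirect(d_direct, se_direct, d_indirect, se_indirect):
    """Inverse-variance weighted ('mixed') estimate, valid only if direct
    and indirect evidence are consistent."""
    w_direct = 1.0 / se_direct ** 2
    w_indirect = 1.0 / se_indirect ** 2
    pooled = (w_direct * d_direct + w_indirect * d_indirect) / (w_direct + w_indirect)
    pooled_se = math.sqrt(1.0 / (w_direct + w_indirect))
    return pooled, pooled_se


# Hypothetical effects versus the common comparator C.
d_ab_indirect, se_ab_indirect = indirect_comparison(0.5, 0.3, 0.2, 0.4)
# Hypothetical direct A-vs-B evidence combined with the indirect estimate.
d_ab_mixed, se_ab_mixed = combine_direct_indirect(0.4, 0.5, d_ab_indirect, se_ab_indirect)
```

Note that the indirect estimate is always less precise than either of the comparisons it is built from, which is one reason why sparse networks yield wide uncertainty intervals.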
Box depicts the advantages and limitations of NMA when compared to standard pairwise meta-analysis. It is of high importance to decide on which method to use and which is the most appropriate way to express changes and effect sizes. Most analyses to date use the Raw Mean Difference (RMD) as the measure of effect size, except for a few reports which employed both RMD and SMD (standardized mean differences) [12, 17, 97]. This choice is very important because it leads to different results and subsequently to different interpretations. Adopting the RMD does not take into account the variability within studies, whereas SMD to a certain extent controls for floor and ceiling effects.
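The difference between the two effect size measures can be illustrated with a minimal sketch. The SMD is computed here as Cohen's d with a pooled standard deviation, which is an assumption, since the analyses discussed above may have used other variants (e.g., Hedges' g). The function names and the summary statistics are hypothetical.

```python
import math


def raw_mean_difference(mean_drug, mean_placebo):
    """Raw mean difference (RMD) in scale points."""
    return mean_drug - mean_placebo


def standardized_mean_difference(mean_drug, sd_drug, n_drug,
                                 mean_placebo, sd_placebo, n_placebo):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    pooled_variance = (((n_drug - 1) * sd_drug ** 2
                        + (n_placebo - 1) * sd_placebo ** 2)
                       / (n_drug + n_placebo - 2))
    return (mean_drug - mean_placebo) / math.sqrt(pooled_variance)


# Two hypothetical trials with the same 2-point difference in mean HDRS
# improvement (drug 10 points, placebo 8 points) but different variability.
rmd = raw_mean_difference(10.0, 8.0)
smd_high_variability = standardized_mean_difference(10.0, 8.0, 100, 8.0, 8.0, 100)
smd_low_variability = standardized_mean_difference(10.0, 4.0, 100, 8.0, 4.0, 100)
```

In this illustration both trials share an RMD of 2 points, yet the SMD is 0.25 in the high-variability trial and 0.5 in the low-variability trial, so the same raw difference can fall on either side of a threshold such as 0.5.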

THE ISSUE OF BASELINE SEVERITY

A basic problem is that the concept of ‘severity’ has not been adequately studied and is poorly defined. It should be noted that some items, including ‘depressed mood’, manifest a ceiling effect as severity increases, while others, including ‘suicidality’, manifest a floor effect with lower severity [53, 98-107]. Severity of the acute episode does not necessarily reflect overall severity of the illness. The latter should rely upon the long-term course of the illness, burden and outcome. Unfortunately, the HDRS and the MADRS both describe a concept of depression which does not correspond to modern ideas and classification criteria [53, 104, 108]. The real correlation of HDRS scores and depression severity is a matter of debate. It is believed among clinicians that patients with higher disease severity at baseline respond better to treatment. This relation of baseline disease severity with treatment has a generic name in the statistical literature: ‘the relation between change and initial value’ [109]. In psychology, it is also well known as the ‘law of initial value’ [110]. In this frame, the concept of ‘mathematical coupling’ [111] suggests that there is a strong structural (mathematical) correlation (~0.71) between the baseline values and change after treatment. This correlation is present even when ‘change’ is calculated on the basis of two columns of random numbers [112]. Mathematical coupling leads to an artificially inflated association between initial value and change scores [113]. Therefore, in every medical field and for every intervention, it is expected that initial severity is related to treatment outcome. This is the result of a mathematical structural characteristic, which is intrinsic to the methodology. Bayesian methods, which are able to partially control for this artifact, are not routinely applied in meta-analytic research [114-116]. The problem is that even Bayesian methods are not completely free from this phenomenon. 
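The value of approximately 0.71 follows from elementary algebra: if baseline x and follow-up y are independent with equal variance σ², then Cov(x, x − y) = σ² and Var(x − y) = 2σ², so the correlation between x and the change x − y is 1/√2 ≈ 0.71. The simulation below is a minimal sketch of this point using two columns of uniform random numbers; the function names and sample size are illustrative assumptions, and change is defined as baseline minus follow-up (with the opposite sign convention the correlation is about −0.71).

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def coupling_correlation(n=10000, seed=0):
    """Correlation between 'baseline' and 'change' when baseline and
    follow-up are two unrelated columns of random numbers."""
    rng = random.Random(seed)
    baseline = [rng.random() for _ in range(n)]
    followup = [rng.random() for _ in range(n)]  # independent of baseline
    change = [b - f for b, f in zip(baseline, followup)]
    return pearson_r(baseline, change)


r_coupling = coupling_correlation()
```

The resulting coefficient lies close to 1/√2 although the two columns are unrelated by construction, which is why a raw association between baseline severity and change cannot, by itself, be read as evidence that severity moderates treatment response.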
The issue of initial severity is very important because eventually this is the reason why many treatment guidelines are reluctant to recommend pharmacotherapy for milder forms of major depression. During the last decade, many authors argued that antidepressants act only in severe depression [12]. They have also argued that alternative treatment approaches are more suitable for mild cases. As a consequence, patients suffering from mild depression are deprived of the right to receive treatment with antidepressants. However, this is an incorrect assumption based on inappropriate methods of analysis. A recent meta-analysis of data at the patient level suggested that initial severity plays no role [117]. Furthermore, a careful multiple treatments meta-analysis of the Kirsch [12] data set rejected initial severity as a factor that should dictate the treatment options [17]. However, another individual-level meta-analysis suggests that initial severity plays a major role in antidepressant response rates, with patients with mild depression having unclear therapeutic benefits following antidepressant treatment [13]. As suggested by a recent meta-analysis [118], the therapeutic efficacy of antidepressants for mild depression remains to be established. Clearly, future research should focus on resolving the issue of baseline severity.

RANKING ANTIDEPRESSANTS

Cipriani and colleagues [88] published an influential network meta-analysis of head-to-head randomized trials of second generation antidepressants. These authors found escitalopram to have the best balance between efficacy and safety. However, the authors suggested that sertraline should be regarded as the first-line choice based on the fact that sertraline would have lower costs. This conclusion seems peculiar as the authors did not perform a formal cost-effectiveness analysis. This meta-analysis has been extensively criticized elsewhere [119-124]. In brief, we believe that the authors overstated their findings and did not acknowledge several methodological pitfalls of their meta-analysis. For example, the exclusion of placebo-controlled comparisons represents a significant source of bias (vide supra). Furthermore, there is significant selective reporting of antidepressant trial results (i.e., publication bias) [8]. A significant proportion of negative trials submitted to the US Food and Drug Administration (FDA) are either not published or published in a way that conveys a positive outcome [8]. The primary outcome for this meta-analysis (i.e., treatment response) is binary in nature and may artificially inflate differences between treatments [125]. Thus, methodological heterogeneity between included studies, lack of full representativeness of the studied dataset, problematic analyses, conflicts of interest, and shortcomings in data analysis preclude firm conclusions about different efficacies between newer generation antidepressants. Interestingly, a similar network meta-analysis did not identify meaningful differences in efficacy between second-generation antidepressants [126]. These authors updated their meta-analysis and continued to find no evidence for recommending a particular second generation antidepressant on the basis of differences in efficacy [127]. 
More recently, Naudet and colleagues performed a network meta-analysis comparing the placebos used in fluoxetine versus placebo, venlafaxine versus placebo, and fluoxetine/venlafaxine versus placebo trials [123]. Although the authors did not find significant differences in response/remission rates between the three placebos (i.e., fluoxetine-placebo, venlafaxine-placebo, and venlafaxine/fluoxetine placebo), they argued that, owing to publication bias, a firm conclusion that ‘sucrose equals sucrose’ could not be established. In their epistemologically sound analysis, they suggested that the field should focus on improving trial methodology instead of attempting to prematurely rank available antidepressants by efficacy.
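
The concern raised above about binary response outcomes can be illustrated with a minimal sketch. It is not taken from any of the cited meta-analyses: it assumes that improvement on a rating scale is normally distributed with the same standard deviation in both arms, and the means, standard deviation, and cut-offs are hypothetical values chosen only for illustration. Under these assumptions, the same mean drug-placebo difference translates into different responder-rate differences depending on where the responder cut-off is placed.

```python
from statistics import NormalDist


def response_rate(mean_improvement, sd, cutoff):
    """Proportion of patients whose improvement reaches the responder cut-off.

    Assumes normally distributed improvement scores (illustrative assumption).
    """
    return 1.0 - NormalDist(mean_improvement, sd).cdf(cutoff)


def rate_difference(mean_drug, mean_placebo, sd, cutoff):
    """Drug-minus-placebo difference in responder proportions."""
    return (response_rate(mean_drug, sd, cutoff)
            - response_rate(mean_placebo, sd, cutoff))


# Hypothetical values: a 2-point mean difference with a common SD of 8 points.
MEAN_DRUG, MEAN_PLACEBO, SD = 12.0, 10.0, 8.0

for cutoff in (8.0, 11.0, 14.0, 20.0):
    diff = rate_difference(MEAN_DRUG, MEAN_PLACEBO, SD, cutoff)
    print(f"cut-off {cutoff:>4}: responder-rate difference = {diff:.3f}")
```

In this toy setting the responder-rate difference is largest when the cut-off lies between the two arm means and shrinks as the cut-off moves into the tail, although the underlying mean difference never changes. This is one possible reason why analyses based on a dichotomized outcome may convey a different impression from analyses of the continuous score.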

CONCLUSIONS

Meta-analysis complements primary research by distilling the raw data and by providing more specific answers. It is, however, dangerous to over-analyze data or to use problematic methods of analysis, and the risk of over-interpretation is high. The large number of meta-analyses performed so far has made antidepressants perhaps the class of drugs best studied meta-analytically in the whole of medicine. The non-harmonization of meta-analytic techniques and methodological inconsistencies in included trials (i.e., clinical heterogeneity) have unintentionally fostered inconsistent results that have belied our wish to arrive at true evaluations of drug efficacy compared to placebo and to each other. These inconsistent results have led to a negative conceptualization of antidepressant treatment by the lay press, and also by prominent medical scientists (reviewed in reference [35]). This is essentially a new type of stigma for depressed patients [35]. It is clear that meta-analysis has the potential to be at the highest level of evidence concerning the evaluation of interventions in health care. On the other hand, methodological inconsistencies across trials and in the inclusion criteria for different meta-analyses pose a significant concern. For example, two network meta-analyses that ranked efficacies of second-generation antidepressants failed to demonstrate differential efficacies between drugs [126, 127]. Conversely, the meta-analysis performed by Cipriani and colleagues [88], which had studied the same antidepressants, reported mirtazapine and venlafaxine as the most efficacious antidepressants, and duloxetine, fluvoxamine, paroxetine, and reboxetine as the least efficacious. Considering that most available evidence regarding antidepressant efficacy is derived from placebo-controlled trials, it is possible that the exclusion of placebo comparisons from the latter NMA [88] might have altered the results.
To conclude, it is important to establish transparent consensus-based standards for the design and conduct of better-designed and more homogeneous antidepressant RCTs. This initiative has the potential to allow the establishment of more clinically informed and sound evidence of ‘true’ antidepressant effects, which could be more suitable for the synthesis of evidence. Furthermore, the inclusion criteria and conduct of NMAs of antidepressant efficacy are open to debate. At the current state of knowledge in the field, it seems premature to rank different antidepressants in terms of efficacy and safety.

SOURCES OF FUNDING

No funding was available for the current study from any source.

Table 1.

Hamilton depression rating scale (HDRS) items and their possible relationship to side effects and response to various agents.

		RESPONDS TO
HDRS Item	Side Effect	BZD	AHis	OLZ	AP	MIRT
Loss of libido	+	-	-	-	-	-
Gastroenterological	+	+	-	-	+/-	-
Weight loss	+/-	-	+	+	+	+
Insomnia	+	+	+	+	+	+
General somatic symptoms	+	+
Agitation	(+)	+	+	+	+	+
Anxiety	(+)	+	+	+	+	+

Abbreviations: BZD: benzodiazepines; AHis: anti-histamine; OLZ: olanzapine; AP: antipsychotics; MIRT: mirtazapine.

Table 2.

Variables influencing placebo response rates in antidepressant clinical trials.

Factor	Influences Placebo Response	Related to Depression Neurobiology	Amenable to Modification
Treatment factors
Expectancy-related placebo effects	+	+	+
Therapeutic setting	+	+	+
Measurement effects
Rater bias	+	-	+
Response bias	+	-	+
Natural history factors	+	+/-	+
Participant characteristics	+	+	+

Box 1.

Advantages and limitations (i.e., risks) of network meta-analysis.

Advantages
Compared to conventional pairwise meta-analysis, network meta-analysis (NMA) allows the incorporation of both direct and indirect sources of evidence; NMA allows a probability-based rank order of different treatments in terms of safety and efficacy; NMA may inform future research directions by graphically illustrating existing direct and indirect comparisons across treatments; NMA can accommodate complex research questions by simultaneously incorporating several outcomes or by adding expert opinions in the form of probability-based prior distributions; Indirect comparisons may in certain circumstances eliminate trial-specific biases that are sometimes not properly identified in comparator (i.e., head-to-head) trials.
Limitations
Statistical heterogeneity; Clinical heterogeneity; Between-studies methodological inconsistencies in the context of an NMA may affect several pooled effect estimates; Incoherence (i.e., significant differences between the evidence provided by direct versus indirect comparisons) may limit a consistent ranking of different treatments.
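
To make the notions of indirect evidence and incoherence in Box 1 more concrete, the following minimal sketch shows an adjusted indirect comparison of two drugs through a common placebo comparator, followed by a simple z-test comparing the indirect estimate with a direct head-to-head estimate. It is an illustration only and not the method of any meta-analysis cited here: the function names are hypothetical, the effect sizes and standard errors are invented, and the calculation assumes approximately normal, independent estimates on a common scale (e.g., standardized mean differences).

```python
from math import sqrt
from statistics import NormalDist


def indirect_estimate(d_a_placebo, se_a_placebo, d_b_placebo, se_b_placebo):
    """Indirect effect of drug A versus drug B via the common placebo arm.

    The effect is the difference of the two placebo-controlled effects;
    the variances add because the two sets of trials are assumed independent.
    """
    effect = d_a_placebo - d_b_placebo
    se = sqrt(se_a_placebo ** 2 + se_b_placebo ** 2)
    return effect, se


def incoherence_test(d_direct, se_direct, d_indirect, se_indirect):
    """z-test for the difference between direct and indirect estimates."""
    diff = d_direct - d_indirect
    se_diff = sqrt(se_direct ** 2 + se_indirect ** 2)
    z = diff / se_diff
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return diff, z, p_value


# Hypothetical inputs (invented for illustration only).
d_ind, se_ind = indirect_estimate(0.30, 0.05, 0.25, 0.05)
diff, z, p = incoherence_test(0.20, 0.08, d_ind, se_ind)

print(f"indirect A vs B: {d_ind:.3f} (SE {se_ind:.3f})")
print(f"direct minus indirect: {diff:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Note that the indirect estimate carries a larger standard error than either placebo-controlled estimate, which is one reason why rankings that rely heavily on indirect evidence can be imprecise. A large discrepancy between the direct and indirect estimates would point to incoherence in the sense described in the box.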
