
Quantitative Evidence Synthesis Methods for the Assessment of the Effectiveness of Treatment Sequences for Clinical and Economic Decision Making: A Review and Taxonomy of Simplifying Assumptions.

Ruth A Lewis¹, Dyfrig Hughes², Alex J Sutton³, Clare Wilkinson⁴.

Abstract

Sequential use of alternative treatments for chronic conditions represents a complex intervention pathway; previous treatment and patient characteristics affect both the choice and effectiveness of subsequent treatments. This paper critically explores the methods for quantitative evidence synthesis of the effectiveness of sequential treatment options within a health technology assessment (HTA) or similar process. It covers methods for developing summary estimates of clinical effectiveness or the clinical inputs for the cost-effectiveness assessment and can encompass any disease condition. A comprehensive review of current approaches is presented, which considers meta-analytic methods for assessing the clinical effectiveness of treatment sequences and decision-analytic modelling approaches used to evaluate the effectiveness of treatment sequences. Estimating the effectiveness of a sequence of treatments is not straightforward or trivial and is severely hampered by the limitations of the evidence base. Randomised controlled trials (RCTs) of sequences were often absent or very limited. In the absence of sufficient RCTs of whole sequences, there is no single best way to evaluate treatment sequences; however, some approaches could be re-used or adapted, sharing ideas across different disease conditions. Each has advantages and disadvantages, and is influenced by the evidence available, extent of treatment sequences (number of treatment lines or permutations), and complexity of the decision problem. Due to the scarcity of data, modelling studies applied simplifying assumptions to data on discrete treatments. A taxonomy for all possible assumptions was developed, providing a unique resource to aid the critique of existing decision-analytic models.


Year: 2020 PMID: 33242191 PMCID: PMC7790782 DOI: 10.1007/s40273-020-00980-w

Source DB: PubMed Journal: Pharmacoeconomics ISSN: 1170-7690 Impact factor: 4.981

Key Points for Decision Makers

Introduction

The availability of multiple interventions for the same condition or indication is increasingly common [1]. To optimise treatment outcomes and value for money, a sequence of treatments is likely to be used in such contexts. Policy and clinical decisions based on the optimum sequence, rather than on the effectiveness or cost-effectiveness of discrete treatments, are becoming increasingly important [2-5]. This is especially true for chronic diseases, such as depression, diabetes, and cancer [5-7], and for some infectious diseases where treatment resistance can limit effectiveness, for example human immunodeficiency virus (HIV) [8]. However, synthesising and interpreting the evidence base to inform such decisions is not straightforward. Treatment sequencing represents a complex intervention pathway in which treatment history and patient characteristics may influence both the choice and the effectiveness of subsequent treatments. Treatment history represents multiple factors, including the number and type of previous treatments [9, 10], carry-over effects of prior treatments [11-13], the type, level, and duration of response to previous treatment [14-16], time on treatment [17], intolerance or toxicity [16, 18], development of disease resistance [19, 20], and the burden of preceding treatments, which can affect subsequent adherence [7, 21]. Time and disease trajectory are also important factors that can influence the effectiveness of subsequent treatment, and their impact can be both dependent on and independent of previous treatments [9, 10, 22, 23]. Subsequent treatment choices include dose escalation, add-on therapy, a completely new treatment, or re-use of a previously effective treatment. In some instances, for example relapsing-remitting multiple sclerosis, previous treatments can restrict the choice of allowable follow-on drugs [24]. Randomised controlled trials (RCTs) provide the most robust estimates of treatment effects to inform policy and clinical decision making.
However, RCTs of treatment sequences are few in number and do not cover the breadth of decision making needed. As the number of available treatments increases, the number of unique sequences will increase geometrically [4, 25], making it impractical and prohibitively costly to evaluate all conceivable sequences in RCTs. The time-varying, adaptive nature of many sequences also means that novel approaches, such as sequential multiple assignment randomised trials (SMARTs), are required for developing dynamic treatment regimens [26-28]. RCTs of discrete treatments, used at single points in the treatment pathway, provide robust estimates of effectiveness for their specific context, but may not provide representative estimates for these treatments when used in different contexts, such as the later stages of sequences. Participants who enrol in clinical trials and are adherent to discrete treatments may also be quite different from participants in trials of treatment sequences, where alternative subsequent treatment options are available [7, 29–31]. In sequential treatment studies, participants’ decision to end first-line treatment may be influenced by the knowledge that a second-line treatment is readily available [21]. Longitudinal observational studies are an alternative data source that can potentially provide context-specific estimates of treatment effects in different sequences, but they are subject to selection bias and confounding. Evidence synthesis methods that produce the least biased estimates of treatment-sequencing effects are required to inform reliable clinical and policy decision making. Because of the limitations of the primary data sources outlined above, this is likely to require advanced meta-analytic techniques [32-36] or mathematical modelling [37]. There is no current guidance on best practice in this context.
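To give a sense of scale, the short Python sketch below counts the distinct ordered sequences that can be built from a pool of treatments. It is an illustration only and rests on simplifying assumptions: order matters, no treatment is re-used within a sequence, and every ordering is clinically allowable. In practice, re-use of a previously effective treatment would add sequences, and restrictions on allowable follow-on drugs would remove some. The helper name `count_sequences` is hypothetical.

```python
from math import perm  # Python 3.8+


def count_sequences(n_treatments: int, max_lines: int) -> int:
    """Count distinct ordered sequences of 1..max_lines treatment lines.

    Hypothetical helper for illustration. Assumes order matters, no treatment is
    re-used within a sequence, and every ordering is clinically allowable.
    """
    # perm(n, k) = n! / (n - k)! ordered selections of k distinct treatments;
    # it returns 0 when k > n, so lines beyond the pool size add nothing.
    return sum(perm(n_treatments, k) for k in range(1, max_lines + 1))


if __name__ == "__main__":
    for n in range(2, 7):
        # Columns: pool size, three-line sequences only, sequences of any length up to n.
        print(n, perm(n, 3), count_sequences(n, n))
```

With six candidate treatments, for instance, the sketch gives 120 distinct three-line sequences and 1956 sequences of any length up to six lines, far more than could be compared head to head in RCTs.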
The Decision Support Unit (DSU), which is commissioned by the National Institute for Health and Care Excellence (NICE) to provide a research resource to support the institute’s Technology Appraisal Programme, has developed a briefing document on reviewing sequential treatments and downstream costs [38]. This was part of a series of briefing papers and reports developed to inform the 2013 update of the NICE methods guide. The updated methods guide highlighted that some technology appraisals may need to consider the comparison of treatment sequences. However, neither the updated methods guide nor the DSU’s briefing document provided guidance on evaluating the clinical effectiveness of treatment sequences or on modelling them. We did not find any other health technology assessment (HTA) guidance that provided information on evaluating treatment sequences. Our paper provides a first step in addressing this gap. As a step towards informing best practice, a comprehensive review of reported quantitative evidence synthesis methods was conducted to establish which methods are available and to outline the assumptions they make and any shortcomings. It is also hoped that this review will draw attention to this increasingly important area and encourage future methods development. The review of methods was conducted with the aim of providing guidance for undertaking HTA or similar processes, including comparative effectiveness research and evidence-based guideline development. We did not aim to assess the effectiveness or cost-effectiveness of treatment sequences here; rather, we assessed the methods used to develop summary treatment effect estimates of whole sequences, or of discrete treatments conditional on their position in the treatment pathway.
The review considered methods applied within both clinical and economic evaluation; however, our focus is on the estimation of clinical effectiveness and does not consider the impact of treatment sequencing on the estimation of costs or utility values.

Methods

Literature Search

The intention was to identify the breadth of methods developed for evaluating treatment sequences, not to identify every study that used each method. A conventional systematic search of reference databases was considered insufficient for the current review for three reasons: the breadth of the review; the recognised challenges of identifying and selecting methodological research using reference databases [39-41]; and the likelihood that most of the relevant literature would report applicable methods or methodological developments as part of a wider applied study, rather than being primarily methodological [41]. A number of approaches and sources were therefore used to identify relevant methodological studies. The following bibliographic databases were searched from inception to August 2013: MEDLINE, Embase, and the Cochrane Library. The search strategy is provided in Online Resource 1 (see the electronic supplementary material). This was supplemented by hand-searching the following: internet search engines; the websites of specific organisations, including NICE; electronic journals; the agendas of online conference proceedings; the references of existing reviews (listed in Online Resource 1) and relevant papers; known author searches; and forward citation tracking. The reference database searches were not updated, but iterative and purposeful hand searches, including the PubMed related citations function, were continued throughout the review process. An in-depth review was conducted of relevant studies identified during the initial searches. Potential new studies were then cross-referenced with a list of included studies and recorded methods. More recent studies were only included if they contributed new methods or knowledge. The searches were deemed to be complete when further efforts to identify information did not add to the analysis [42] (the most recent study having been published in 2016).
This is analogous to reaching the point of ‘saturation’ in qualitative research [42, 43]. The searches have since been supplemented by a recent purposeful and targeted search, which involved scanning the studies included in a recent systematic review of economic evaluations in rheumatoid arthritis by Ghabri et al. [44].

Assessing Study Relevance

The review included any disease condition and sequence of any type of treatment. It did not consider decision problems relating to prevention, screening or prognosis, diagnosis, or treatment monitoring. It focused on treatment switching based on a clinical assessment. Studies evaluating the effectiveness of planned sequential administration of combined therapy were excluded, as this represented a different type of decision framework. The review included studies that applied or developed quantitative evidence synthesis methodology as part of secondary research. Studies that used qualitative or narrative evidence synthesis, and primary research evaluating treatment sequences, were excluded. Any type of meta-analytic technique was considered, including, but not limited to, pairwise meta-analysis, meta-regression, network meta-analysis (NMA), and any meta-analysis based on individual patient data (IPD). Decision-analytic modelling techniques developed to evaluate treatment sequencing, whether conducted as part of an economic evaluation or not, were included. Modelling studies that aimed to evaluate the effectiveness of discrete treatments and incorporated the impact of downstream treatments were only included if they specifically modelled sequencing effects. Studies published only in abstract form were excluded, as were economic evaluations based on a single RCT.

Results

Overview of Included Studies

Database searches, after de-duplication, identified 752 references, of which 94 were deemed potentially relevant after screening titles and abstracts. Twenty-six of these could not be assessed further, as they were unavailable (n = 2), could not be translated (n = 2), or were only published as conference abstracts (n = 22). A further 28 of those retrieved in full were excluded as they were not relevant (Fig. 1). After collating studies reported in more than one publication, the remaining 40 references yielded 36 studies of interest. These were included in the review, along with a further 53 studies identified via internet and hand searches. Recent supplementary targeted searches identified two further studies [45, 46] that contributed a new modelling technique. There were 91 studies in all.

Fig. 1

Flow diagram showing the number of references identified, publications retrieved, and studies included in the methodology review

Forty-nine (54%) studies investigated the use of disease-modifying antirheumatic drugs (DMARDs), including biological agents (or biologics), for the treatment of inflammatory arthritis, including rheumatoid arthritis, psoriatic arthritis, and ankylosing spondylitis. Fourteen (15%) related to oncology. The remainder assessed treatments for epilepsy (n = 4; 4%), psoriasis (n = 4), depression (n = 3; 3%), glaucoma (n = 2; 2%), schizophrenia (n = 2), type 2 diabetes mellitus (n = 2), HIV (n = 2), neuropathic pain (n = 1), postherpetic neuralgia (n = 1), sciatica (n = 1), fibromyalgia (n = 1), chronic hepatitis B infection (n = 1), Crohn’s disease (n = 1), onychomycosis (n = 1), and spasticity (n = 1). The majority involved sequences of drug treatments, but some also considered other interventions, for example, surgery for sciatica. Only two studies were primarily methodological [14, 47]. Meta-analysis and decision-analytic modelling were reviewed as two distinct categories of quantitative evidence synthesis methods.

Meta-Analytic Methods

Twenty-three studies were included in the evaluation of meta-analytic approaches [9–11, 16, 23, 47–64]. However, some of these studies were considered relevant only in fairly broad terms, for example because they illustrated how the limited evidence base precluded the evaluation of treatment sequencing, or because they used stratified analysis by line of therapy, which could potentially provide a building block for future methods development. These approaches were initially not considered pertinent to the review, but because of the dearth of relevant methods identified, a post hoc decision was made to include them as examples of simplifying methods. This provided a more comprehensive list of the approaches used pragmatically for evaluating treatment sequencing in general, rather than being limited to novel methods for developing sequence-specific summary effect estimates. An overview of the studies, including their aims, approaches used, and data sources, is presented in Table 1.

Table 1

Overview of the meta-analytic approaches used by included studies (studies are ordered according to the methodological approach used)

Study first author, year	Condition	Aim of study	Analysis aimed to evaluate treatment sequences?	Sequencing studies*	Stratified MA (individual treatment lines)	Subgroup analyses	Meta-regression	NMA of both 1st- and 2nd-line treatments	Modifying factor	Ranking absolute effects	Available evidence base**
Heng, 2014 [52]	Metastatic renal cell carcinoma	To systematically review published real-world evidence comparing sequential treatments	Yes	X							Observational studies (of sequences)
Stenner, 2012 [11]	Metastatic renal cell carcinoma	To evaluate the optimal sequence for the tyrosine kinase inhibitors sorafenib and sunitinib	Yes	X							Observational studies (of sequences)
Hind, 2008 [53]	Advanced colorectal cancer	Economic evaluation considered planned sequences. Aimed to evaluate the cost-effectiveness of irinotecan, oxaliplatin, and raltitrexed as 1st-line treatments. (Data on clinical effectiveness limited to planned sequencing trials.) Clinical evaluation considered treatments used as 1st and 2nd line	Yes	X	X						RCTs of prospective sequences (n = 2)
NICE CG131 [56]	Advanced colorectal cancer	Economic evaluation aimed to assess the effectiveness and cost-effectiveness of chemotherapy sequences	Yes	X	X						1st-line: RCTs; 2nd-line: RCTs of prospective sequences (n = 3, 2 were quasi-sequences)***
Ruhé, 2006 [59]	Major depressive disorder	To systematically review the evidence for switching pharmacotherapy after a first SSRI for major depressive disorder	Yes	X	X						RCTs (one of whole sequences: SMART design) and observational studies; only RCTs pooled due to heterogeneity
Cooper, 2011 [49]	Depression	To systematically review studies of the management of treatment-refractory depression in older people, covering pharmacological, physical, and psychological interventions (Search strategy included treatment sequencing)	Yes	X	X						RCTs and uncontrolled open-label trials (one non-RCT sequences study); pooled breaking randomisation
Lloyd, 2010 [9]	Rheumatoid arthritis	To evaluate the effectiveness of TNF inhibitors when used sequentially	Yes	X		X	X				Observational studies (uncontrolled and comparative studies of 1st- vs 2nd-line studies)
Rendas-Baum, 2011 [58]	Rheumatoid arthritis	To evaluate the relationship between the clinical response to biologics and the number of previous treatments with TNF inhibitors	Yes			X					RCTs and observational studies; pooled breaking randomisation
Suarez-Almazor, 2007 [64]	Rheumatoid arthritis	To review the evidence on the TNF inhibitors, INF and ETA, regarding the timing of therapeutic introduction, dose escalation, and switching	Yes		X	X					1st-line: RCTs; 2nd-line: observational studies only (not pooled)
Schoels, 2012 [62]	Rheumatoid arthritis	To compare efficacy and safety of biologics after inadequate response to TNF inhibitors (Provided an example of the limitations of examining differences in treatment effects according to the number of previous biologics used.)	No			X					RCTs
Singh, 2009 [63]	Rheumatoid arthritis	To compare efficacy and safety of biologics Included planned subgroup analyses for TNF failure vs none; and biologic failure vs conventional DMARD failure vs none	No			X					RCTs
Salliot, 2011 [60]	Rheumatoid arthritis	To compare efficacy of biologics in two clinical situations: (1) active disease despite MTX; (2) after inadequate response to TNF inhibitor (Provides an example of evaluating effect of previous treatment in subgroup analysis)	No		X	X					RCTs
Nixon, 2007 [57]	Rheumatoid arthritis	To compare the efficacy of four biologics, three of which were TNF inhibitors (Provides an example of using MR to account for disease duration in NMA; first study to combine MR and NMA)	No				X				RCTs
Schmitz, 2012 [61]	Rheumatoid arthritis	To compare efficacy of TNF inhibitors in patients with inadequate response to MTX (Provides an example of the challenges of including previous treatments as a covariate in MR due to the poor reporting of primary studies; represents NMA combined with multivariate MR)	No				X				RCTs
Christensen, 2015 [10]	Rheumatoid arthritis	To determine if variations in trial eligibility criteria and patient baseline characteristics could be considered effect modifiers of the treatment response when testing targeted therapies (biological and targeted synthetic DMARDs) (Factors considered included previous DMARD and disease duration)	No				X				RCT
Kanters, 2014 [54]	Rheumatoid arthritis	To explore which clinical factors and patient characteristics are associated with the magnitude of comparative efficacy of biologics vs MTX in patients with inadequate response to MTX (Demonstrates the challenges of including previous treatments as a covariate in MR due to the poor reporting of primary studies)	No				X				RCTs
Anderson, 2000 [23]	Rheumatoid arthritis	To identify factors predicting response to 2nd-line treatment, with conventional DMARDs or devices (Factors considered included previous DMARD and disease duration)	No				X				RCTs (individual patient-level data)
Mandema, 2011 [55]	Rheumatoid arthritis	To compare the dose–response relationship for the efficacy of biologics. Two of the objectives included the following: Are TNF inhibitors different in patients with an inadequate response to MTX compared to those who are MTX-naïve? Are TNF inhibitors more efficacious than MTX in MTX-naive patients?	No		X		X				RCTs
Grothey, 2004 [51]	Advanced colorectal cancer	To evaluate the importance of the availability of all three active cytotoxic agents, FU-LV, irinotecan, and oxaliplatin, on overall survival. Standard 1st-line therapies were FU-LV plus irinotecan or oxaliplatin	Partial				X				RCTs (one of sequences)****
Abrams, 2016 (IMI GetReal Project case study: https://www.imi-getreal.eu/) [47]	Rheumatoid arthritis	To explore how real-world data, from patient registries, can be used to help demonstrate the relative effectiveness of new medicines. Addresses two key issues: How to connect disconnected networks of evidence to conduct NMA; How to optimise an evidence base using 1st-line evidence to inform 2nd-line effectiveness estimates	Partial					X			RCTs and patient registries (individual patient-level data); registry data used to develop comparative studies of 1st vs 2nd lines
Rodgers, 2011 [16]	Psoriatic arthritis	To determine the clinical effectiveness, safety, and cost-effectiveness of TNF inhibitors in the treatment of psoriatic arthritis (Sensitivity analysis for economic model incorporated sequential TNF inhibitors)	Yes						X		Observational studies
Connock, 2006 [48]	Epilepsy	To examine the clinical effectiveness and cost-effectiveness of newer antiepileptic drugs for epilepsy in children. (Economic model included treatment sequencing)	Yes						X		RCTs
Finnerup, 2005 [50]	Neuropathic pain	To develop up-to-date calculation of NNT and NNH as the basis of a proposal for an evidence-based treatment algorithm (sequence of treatments)	Partial							X	RCTs

CG clinical guidelines, DMARD disease-modifying antirheumatic drug, FU-LV fluorouracil-leucovorin, IMI Innovative Medicines Initiative, MA meta-analysis, MTX methotrexate, NICE National Institute for Health and Care Excellence, NMA network meta-analysis, NNH numbers needed to harm, NNT numbers needed to treat, RCT randomised controlled trial, SMART sequential multiple assignment randomised trial, TNF tumour necrosis factor

*Whole sequences or comparing treatment lines

**Unless otherwise stated, RCTs relate to the evaluation of discrete treatments; ‘placebo RCTs’ included a placebo control, whilst ‘RCTs’ included either an active or placebo control

***Quasi-sequencing trials: RCTs of 1st-line treatment with subsequent treatment predefined in protocol, or high proportion of patients went on to receive the same 2nd-line treatment

****Included published RCTs that reported the number of patients receiving 2nd-line therapies made by the authors of the trials

The evidence to inform treatment sequencing was broadly considered in two ways: a one-step-at-a-time evaluation based on a series of discrete treatments and a comparison of whole sequences. No novel meta-analytic methods (beyond the use of conventional pairwise meta-analysis [32]) were identified for evaluating treatment sequences, and none directly aimed at developing a summary estimate of effect conditional on positioning in the sequence. Most approaches were developed for addressing excessive heterogeneity or specific gaps in the RCT evidence when evaluating discrete treatments at single points in the pathway.
For example, in rheumatoid arthritis, RCTs of initial biological therapy investigated the use of these drugs both in early disease, where patients have not previously received any DMARD therapy, and as add-on therapy for established disease in patients with an inadequate response to previous conventional DMARDs, representing a heterogeneous patient population. The first-generation biologics include tumour necrosis factor-α (TNF) inhibitors. Most RCTs of second-line biologics investigated other types of biologics in participants with an inadequate response to previous TNF inhibitors; few RCTs evaluated the sequential use of first-generation TNF inhibitors, whilst registry data show that these are often used in practice as second- or subsequent-line therapy [47]. The current meta-analytic approaches, which can potentially be used in a clinical evaluation of a health technology, are summarised below.

Meta-Analysis of Studies Evaluating Whole Sequences

This approach is hampered by the limited number of available RCTs of treatment sequences, which also makes it difficult to establish a closed network for implementing NMA [56]. Observational studies can be used as alternative data sources, but are subject to confounding and bias. The types of observational study used included comparisons of participants who had received a predefined sequence of two drugs [11], evaluations of second-line treatment where generic first-line treatment is used [52], and comparisons of the outcomes of first- and second-line treatments [9, 47]. Comparing treatments used earlier versus later in the treatment pathway ignores the likely effect of disease trajectory, issues relating to treatment choice, changes in pathophysiology over time, and other confounding factors. The types of bias and limitations of non-randomised studies that are specific to the evaluation of treatment sequences, and that were identified as part of the review, are listed in Box 1.
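Where several studies compare the same pair of whole sequences (for example, drug A followed by drug B versus B followed by A), a conventional pairwise random-effects meta-analysis can be used to pool them. The Python sketch below uses the DerSimonian-Laird estimator of between-study variance; the choice of estimator, the function name, and the example values are illustrative assumptions, not methods reported by the included studies. Pooling does not remove confounding: if the inputs are observational, the pooled estimate inherits their biases.

```python
import math
from typing import List, Tuple


def dersimonian_laird(effects: List[float], variances: List[float]) -> Tuple[float, float, float]:
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2 estimator.

    effects:   study effects on an additive scale, e.g. log hazard ratios of
               sequence A->B versus B->A (hypothetical example)
    variances: within-study variances of those effects
    Returns (pooled effect, its standard error, between-study variance tau^2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0  # truncated at zero
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2


if __name__ == "__main__":
    # Hypothetical log hazard ratios from three studies comparing two sequences.
    log_hr, se, tau2 = dersimonian_laird([-0.20, -0.05, -0.30], [0.02, 0.03, 0.05])
    lo, hi = log_hr - 1.96 * se, log_hr + 1.96 * se
    print(f"HR {math.exp(log_hr):.2f} (95% CI {math.exp(lo):.2f} to {math.exp(hi):.2f}), tau^2 = {tau2:.3f}")
```

Other between-study variance estimators (for example, restricted maximum likelihood) could be substituted; with the handful of sequence studies typically available, tau^2 will be estimated imprecisely whichever is used.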

Box 1

Potential bias or limitation in non-randomised, real-world observational studies that are specific to the evaluation of treatment sequences

i. Selection (allocation) bias results in systematic differences in prognostic factors between individuals in treatment and control groups, e.g. a cohort of patients receiving their first tumour necrosis factor (TNF)-inhibitor compared with a cohort receiving a second or subsequent TNF-inhibitor. Patients in the second group are likely to have worse prognoses and show limited responses to all treatments [9, 175]. Adjustment for both baseline and post-baseline prognostic factors is necessary, to ensure the comparability of treatment groups [166].

ii. Channelling bias favours patients with more severe disease. New treatments create expectations of improved effectiveness and tolerability; early, post-marketing users are likely to be those who experienced little or no benefit from existing drugs and may therefore respond to the new drug in a way that is not representative of the eventual user population [176].

iii. Regression to the mean occurs because patients tend to be treated with a second or subsequent treatment at the height of their disease activity, when there is a greater than 50–50 likelihood that the condition will start improving after the intervention purely by chance [22].

iv. Confounding by disease duration occurs in conditions such as sciatica and rheumatoid arthritis, where the longer the disease duration, the less likely patients are to respond to any treatment [14]. Treatment history can both be correlated with disease duration and act as an independent effect modifier [10].

v. Enrichment of successive treatment use with refractory patients occurs because a small proportion of patients have refractory disease that will not respond to any treatment [14]. Populations receiving second-line or subsequent treatments are therefore more likely to be enriched with such patients. This is related to class effect bias (vii). Patients who fail initial treatment because of tolerability or safety issues are likely to have the same problem with any alternative drug from the same class, so patients who switched because of an adverse event are at increased risk of developing a further adverse event [16].

vi. Immortal time bias occurs in studies that limit inclusion to patients who are receiving a specific line of treatment (e.g. third-line chemotherapy) [52] or have completed a predefined sequence, and that overlook patients who are continuing the initial treatment or were lost to follow-up after first-line treatment because of lack of efficacy, clinical deterioration, death, or drug acceptability issues [52]. It is particularly relevant to treatments for advanced cancer, where a large proportion of patients may not complete the sequence or receive multiple treatment lines [52].

vii. Class effect bias is the possibility that a comparison between drug classes is confounded by differences in the type of patients treated with each class [52].

viii. Aggregate data collection is a limitation of real-world observational studies that do not report individual treatment or drug-level data. Any subsequent evidence synthesis has to be based on pooled data across treatments at class level, even when there is evidence that individual drug effects can vary within a class [11]. Class level treatment effects are often reported even when access to individual patient data is available [82].

ix. Missing or inaccurate data may be obtained from real-world practice. Patient registers and administrative databases are rarely set up for evaluating treatment-sequencing, and may not involve a high level of rigour in recording events [52].

x. Variability in how the same outcome measure is conceived across different studies is a particular issue in oncology when progression-free survival (PFS) is used to evaluate the impact of a sequence of treatments (e.g. by summing the PFS period for each treatment line). PFS is a composite endpoint, which may or may not incorporate a treatment-free period before the next treatment, with a differential impact on the results; this needs to be accounted for in any pooled analysis [11]. Importantly, using the PFS associated with each successive treatment line to inform treatment sequencing assumes that all treatment effect from each line stops on progression [11]. Alternative endpoints that have been proposed for evaluating a fixed sequence of treatments [183] include duration of disease control (DDC) and time to failure of strategy (TFS).

NB The types of bias listed here may not be mutually exclusive, and the descriptors may not be used consistently in the published literature; for example, the phenomenon described as ‘regression to the mean’ can also be representative of both a class effect and a channelling bias, favouring patients with more severe disease [180].

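Item iii of Box 1 (regression to the mean) can be made concrete with a small simulation. The Python sketch below assumes a hypothetical cohort in which each patient has a stable true disease-activity level, each visit adds independent random variation, treatment has no effect whatsoever, and patients are switched only when an observed score exceeds a threshold. All distributions, the threshold, and the function name are illustrative assumptions.

```python
import random
import statistics
from typing import Tuple


def simulate_regression_to_mean(n_patients: int = 100_000, threshold: float = 6.5,
                                seed: int = 1) -> Tuple[float, float, int]:
    """Untreated cohort: 'switch' only when an observed activity score exceeds threshold.

    Hypothetical assumptions for illustration: each patient's true disease activity is
    stable, drawn from N(5, 1); each visit adds independent noise from N(0, 1);
    no treatment has any effect.
    Returns (mean score at switch, mean score at next visit, number switched).
    """
    rng = random.Random(seed)
    at_switch, at_follow_up = [], []
    for _ in range(n_patients):
        true_activity = rng.gauss(5.0, 1.0)
        visit_1 = true_activity + rng.gauss(0.0, 1.0)
        if visit_1 > threshold:  # switched at a peak in observed activity
            visit_2 = true_activity + rng.gauss(0.0, 1.0)  # re-measured, still untreated
            at_switch.append(visit_1)
            at_follow_up.append(visit_2)
    return statistics.mean(at_switch), statistics.mean(at_follow_up), len(at_switch)


if __name__ == "__main__":
    before, after, n = simulate_regression_to_mean()
    print(f"{n} patients switched: mean score {before:.2f} at switch, {after:.2f} at next visit")
```

Under these assumptions the mean score falls between the switching visit and the next visit even though no treatment was given. This is the spurious ‘improvement’ that an uncontrolled before-and-after comparison of a second-line treatment would attribute to that treatment.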

Subgroup Analyses to Explore the Impact of Treatment History when Evaluating Treatment Sequences in a Piecemeal Fashion

The subgroups can be defined in two ways: by splitting all studies into two or more groups, also referred to as stratified analysis (e.g. early- vs late-stage disease, or failed previous TNF inhibitor ‘yes’ vs ‘no’) [63, 64], or by taking partial data from included studies (e.g. participants switching to a second TNF inhibitor due to intolerance, lack of efficacy, or loss of efficacy) [58]. A summary of the methods used is provided in Online Resource 2 (see the electronic supplementary material). Stratified analysis is also applied when conducting separate meta-analyses for each line of therapy (e.g. first- or second-line biologics) [62, 64] or for different patient populations (e.g. participants with no previous history of biologic therapy or participants with an inadequate response to previous TNF inhibitors) [10, 60]. The main limitation of subgroup analyses is that they only allow two subgroups to be compared at a time, with or without one covariate. All other covariates are pooled, so each analysis is confounded by other variables [65].
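A stratified analysis by line of therapy can be sketched as follows: each stratum (for example, first-line versus second-line biologic trials) is pooled separately with inverse-variance fixed-effect weights, and the two pooled estimates are compared with a z-test. This is a minimal Python sketch under those assumptions; the function names and example data are hypothetical, and a random-effects version would add a between-study variance within each stratum. Any difference between strata remains potentially confounded by other study-level characteristics.

```python
import math
from typing import List, Tuple

Stratum = Tuple[List[float], List[float]]  # (effects, variances), e.g. log odds ratios


def pool_fixed(effects: List[float], variances: List[float]) -> Tuple[float, float]:
    """Inverse-variance fixed-effect pooled estimate and its variance."""
    w = [1.0 / v for v in variances]
    estimate = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    return estimate, 1.0 / sum(w)


def compare_two_strata(a: Stratum, b: Stratum) -> Tuple[float, float, float, float]:
    """Pool each stratum separately, then test whether the pooled effects differ.

    Returns (pooled effect in a, pooled effect in b, z statistic, two-sided p).
    For two strata, z**2 equals the between-subgroup heterogeneity statistic Q_between.
    """
    est_a, var_a = pool_fixed(*a)
    est_b, var_b = pool_fixed(*b)
    z = (est_a - est_b) / math.sqrt(var_a + var_b)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value from the normal distribution
    return est_a, est_b, z, p


if __name__ == "__main__":
    # Hypothetical log odds ratios of response versus control, by line of therapy.
    first_line = ([0.9, 1.1, 1.0], [0.04, 0.05, 0.06])
    second_line = ([0.5, 0.7], [0.08, 0.10])
    print(compare_two_strata(first_line, second_line))
```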

Meta-Regression to Adjust for the Previous Treatment

This approach was not generally used for the sole purpose of evaluating treatment sequences, but was used to account for heterogeneity within the meta-analysis or NMA. A summary of the methods used is provided in Online Resource 2. The covariate representing previous treatment was often dropped from the final analysis because of non-significant findings [54, 61], possibly due to lack of power, as previous treatment was generally poorly reported in primary studies [10, 54]. However, lack of variability between studies can also contribute to non-significant findings, especially when the meta-analysis is used to compare treatments applied at a single point in the pathway, or where the ordering of treatments is much the same in a given disease. To avoid problems with insufficient power, only a limited number of covariates were incorporated in the meta-regression; these frequently included disease duration. For example, one study that combined NMA and meta-regression to account for the significant heterogeneity between studies of biologics for rheumatoid arthritis included only two study-level covariates in the meta-regression: disease duration and a measure of baseline disability [57]. The analysis lumped together RCTs of DMARD-naive participants and RCTs of participants with an inadequate response to these drugs. Disease duration could potentially be considered a proxy for previous treatment use, as the likelihood of failing prior treatments increases with increasing duration. However, there is also justification for including treatment history as a covariate, especially when pooling (lumping together) studies across different treatment lines [10, 23]. Including both covariates could help to disentangle whether long-standing disease per se is associated with a poor response to treatment, or whether failure on previous treatments predicts response to subsequent treatment [22]. The use of IPD is likely to enhance the application of this approach [10], but studies that used such data were still hampered by poor reporting of previous treatment [23]. A further limitation of conducting an NMA of all discrete treatments, irrespective of where they are used in the pathway, is that previous treatment(s) can both modify the treatment effect (acting as an effect modifier and resulting in heterogeneity) and be associated with the type of treatment comparison (acting as a confounding factor and leading to inconsistency in the network). For example, in an NMA of sciatica treatments, non-invasive treatments were more likely to be used as initial treatments, whereas invasive treatments were used after the failure of other treatments in patients with a more long-standing and less responsive condition [66].
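A meta-regression with disease duration and treatment history as study-level covariates can be sketched as weighted least squares with inverse-variance weights. This is a minimal fixed-effect sketch under stated assumptions: the covariate values, standard errors and coefficients are invented, the outcomes are generated without noise purely so the fit can be checked against known coefficients, and a random-effects meta-regression would add a between-study variance term to the weights. Because duration and prior failure rise together in these hypothetical studies, their coefficient standard errors are inflated, echoing the difficulty of separating the two explanations discussed above.

```python
import numpy as np

def meta_regression(y, se, covariates):
    """Fixed-effect meta-regression fitted by weighted least squares.

    y: study-level effect estimates; se: their standard errors;
    covariates: array of shape (n_studies, n_covariates). An intercept is added.
    Returns the coefficient estimates and their standard errors.
    """
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(se, dtype=float) ** 2          # inverse-variance weights
    X = np.column_stack([np.ones(len(y)), np.asarray(covariates, dtype=float)])
    XtW = X.T * w                                       # X'W without building diag(w)
    cov = np.linalg.inv(XtW @ X)                        # (X'WX)^-1
    beta = cov @ (XtW @ y)
    return beta, np.sqrt(np.diag(cov))

# Hypothetical study-level covariates: mean disease duration (years) and the
# proportion of participants who had failed a previous biologic.
duration = np.array([2.0, 5.0, 8.0, 10.0, 12.0, 15.0])
prior_failure = np.array([0.0, 0.1, 0.3, 0.5, 0.7, 0.9])
covariates = np.column_stack([duration, prior_failure])
se = [0.20, 0.25, 0.15, 0.30, 0.22, 0.28]

# Noise-free effects from known coefficients, only to check the fit.
true_beta = np.array([1.2, -0.02, -0.5])
y = true_beta[0] + covariates @ true_beta[1:]

beta, beta_se = meta_regression(y, se, covariates)
print("intercept, duration, prior failure:", np.round(beta, 3))
print("standard errors:", np.round(beta_se, 3))
```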

Network Meta-Analysis Incorporating Multiple Treatment Lines, for Example, First- and Second-Line Treatments, as Separate Treatment Nodes

This approach was not developed for evaluating treatment sequences as such, but rather to evaluate methods for incorporating real-world data in the evidence synthesis of second-line treatment. In particular, the approach sought to optimise the evidence base by using first-line evidence to inform second-line effectiveness estimates. The methods were applied as part of the GetReal project case study of biologics in rheumatoid arthritis [47]. The authors had access to IPD from two national registries and five RCTs (two of which investigated second-line treatment). A series of Bayesian univariate and bivariate NMAs that incorporated both treatment lines was conducted. The data from RCTs provided separate networks of evidence for first- and second-line biologics. No RCT reported on both treatment lines; thus, an exchangeability assumption was needed to connect the two networks, by assuming that all treatment effects have a common distribution. The univariate analysis included the registry data directly as data, whereas the bivariate analyses used the registry data to inform the prior distribution for the correlation parameter between first- and second-line treatments. In the univariate analysis, relative effect estimates for first- versus second-line treatment were obtained from the registry, allowing the two networks to be connected and treatment comparisons (e.g. drug A in first line vs drug A in second line) to be obtained. Multivariate analysis allows separate outcomes to be modelled simultaneously, using the correlation between them to borrow information across multiple outcomes or time points. Here, the treatment effect for first-line treatment was modelled as outcome 1 and that for second-line treatment as outcome 2, and the correlation was that between treatment lines. The initial bivariate NMA was conducted using RCTs of first- and second-line treatments. The correlation estimate was obtained by conducting a standard pairwise meta-analysis of registry data split into first- and second-line response, and monitoring the correlation. In a second bivariate analysis, the registry data were included in the NMA by splitting them into multiple pairwise studies, which allowed the between-study correlation between the lines of therapy to be modelled. A third analysis used registry data reporting treatment effect estimates on both lines, which allowed the exchangeability assumption to be relaxed at the average level. The biggest challenge here was obtaining an estimate of the correlation between first- and second-line treatments with which to conduct the analysis. The assumptions of consistency and similarity across the pairwise contrasts within the NMA may also be difficult to justify, as discussed above for the NMA of sciatica treatments. The limitations of relying on observational studies comparing first- and second-line treatment are discussed in Sect. 3.2.1 and Box 1.

Developing a Specific Multiplication Factor that Can be Applied to the Summary Effect of a Treatment Used as First Line in Order to Represent Its Use at a Later Point in the Pathway

This approach is not a meta-analytic method as such, but was used to adapt the findings of meta-analyses of discrete (first-line) treatments to represent sequencing effects. The optimal approach for developing a multiplication factor is yet to be established; two approaches have been used to date [16, 48]. One study, investigating TNF inhibitors for psoriatic arthritis, obtained modifying factors from an observational study comparing the class of drugs used as first-line and subsequent treatment for rheumatoid arthritis; different multiplication factors were developed depending on whether the initial TNF inhibitor was discontinued due to inefficacy or adverse effects [16]. A second study developed a reduction factor based on the data available for one antiepileptic drug for which there was an RCT of its use at two different points in the pathway: as first-line monotherapy and later as add-on therapy [48]. Modification factors were primarily used by modelling studies, most of which did not report the methods used to develop them [18, 67–72]. Most used estimates based on available evidence, mainly an observational or previous modelling study, and the choice of source was frequently not justified. The reduction factor used in the most recent (2020) economic evaluation [45] was obtained from a pragmatic RCT of a non-TNF-targeted biologic versus a second TNF inhibitor to treat rheumatoid arthritis in patients with an insufficient response to their first TNF inhibitor (Gottenberg et al. [73]).
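The arithmetic behind a reduction factor is simple; the contestable part is the assumption that a factor derived for one drug (or one drug class) transfers to another. The sketch below assumes a ratio-based factor applied to an absolute response proportion, and reason-specific factors in the spirit of the psoriatic arthritis example; all values and function names are hypothetical. Applying the factor to absolute response rather than to a relative effect (e.g. an odds ratio) is itself an assumption that would need justifying.

```python
def reduction_factor(later_effect, first_line_effect):
    """Ratio of a treatment's effect later in the pathway to its first-line effect."""
    if first_line_effect <= 0:
        raise ValueError("a ratio-based factor needs a positive first-line effect")
    return later_effect / first_line_effect

def adjust_to_later_line(first_line_effect, factor):
    """Scale a first-line summary effect to represent use later in the pathway."""
    return first_line_effect * factor

# Hypothetical: drug P has trial evidence at two points (50% responders as
# first-line monotherapy, 30% as a later add-on); drug Q has first-line data only.
factor = reduction_factor(0.30, 0.50)
q_later = adjust_to_later_line(0.45, factor)
print(f"factor = {factor:.2f}; drug Q later-line response = {q_later:.2f}")

# Hypothetical reason-specific factors, depending on why the previous
# treatment was stopped.
factor_by_reason = {"inefficacy": 0.60, "adverse effects": 0.85}
for reason, f in factor_by_reason.items():
    print(f"after stopping for {reason}: {adjust_to_later_line(0.45, f):.3f}")
```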

Decision-Analytic Modelling

Decision Modelling Methods

Seventy-two modelling studies were reviewed, and fifty-two distinct models were identified [14–18, 45, 46, 48, 53, 56, 67–72, 74–127]. An overview of the included modelling studies is provided in Online Resource 3 (see the electronic supplementary material). Most modelling studies were conducted as part of an economic evaluation. A wide range of modelling techniques was used to address a broad spectrum of treatment-sequencing decision problems (Box 2): identifying the optimum sequence; adding a new drug to an established sequence; comparing ‘step-up’ or ‘step-down’ treatment approaches; comparing different treatments used at the same point within a sequence; evaluating a drug used at different points within a sequence; and comparing predefined sequences. The treatment sequences modelled ranged from fixed sequences with a limited number of treatment lines to variable treatment algorithms in which patient history and characteristics dictate the choice of subsequent treatments.

Box 2

Illustration of the different types of treatment-sequencing decision problems

As part of the review of modelling studies, a coding scheme was developed for categorising modelling studies according to the type of decision problem relating to treatment sequences that was evaluated. The codes used are illustrated below. Some studies include more than one decision problem type.

a) ‘Optimum sequences’

Identifying the best sequence out of all conceivable sequences (as opposed to comparing predefined sequences, thus selecting a manageable number of sequences for comparison in advance)

b) ‘Predefined sequences’

A - B - C

B - A - C

X - Y - Z

Comparison of pre-specified sequences; also incorporates the following:

c) ‘Disease approach’

A - B

B - A

or

X - A - B

A - B - X

Comparison of ‘step-up’ vs ‘step-down’ approaches, or the use of new drugs first vs starting with older, established drugs.

d) ‘Single point’

A - B - C - D

A - B - X - D

Comparison or decision point = C vs X. Treatment C is replaced by X in the second sequence

e) ‘Different points’

X - B - C - D

A - X - C - D

A - B - X - D

A - B - C - X

Comparison of X used at different points in the sequence

f) ‘Adding’ a new treatment to an established sequence

A - B - C - D

A - B - X - C - D

Comparison or decision point = C vs X. Treatment C is displaced by X in the second sequence
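The decision-problem codes in Box 2 can be expressed as simple operations on a base sequence. The Python sketch below is purely illustrative: the function names are hypothetical and the treatments are the placeholder letters used in Box 2. It also shows why the ‘optimum sequences’ problem grows quickly: with five candidate treatments and four lines there are already 5 × 4 × 3 × 2 = 120 ordered sequences to evaluate.

```python
from itertools import permutations

def all_sequences(treatments, n_lines):
    """'Optimum sequences': every ordered sequence of n_lines distinct treatments."""
    return list(permutations(treatments, n_lines))

def single_point(base, position, new):
    """'Single point': replace the treatment at one position with a new one."""
    seq = list(base)
    seq[position] = new
    return seq

def different_points(base, new):
    """'Different points': the new treatment substituted at each position in turn."""
    return [single_point(base, i, new) for i in range(len(base))]

def adding(base, position, new):
    """'Adding': insert a new treatment, displacing later treatments down the sequence."""
    return list(base[:position]) + [new] + list(base[position:])

base = ["A", "B", "C", "D"]
print(len(all_sequences(["A", "B", "C", "D", "X"], 4)))  # 120
print(single_point(base, 2, "X"))                        # ['A', 'B', 'X', 'D']
print(different_points(base, "X"))
print(adding(base, 2, "X"))                              # ['A', 'B', 'X', 'C', 'D']
```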

Two published taxonomies developed for categorising different modelling techniques according to their key features [128, 129], along with other guides and algorithms that have been developed to aid the selection of an appropriate modelling technique (or structure) for economic evaluation in general [97, 128–140], were used to categorise the included studies and inform the data extraction. The advantages and disadvantages of each modelling approach were assessed as part of the review. The choice of an appropriate modelling approach depends on the complexity of the underlying decision problem, the extent of the treatment sequences being investigated, and the disease condition.
Table 2 provides an abbreviated summary of the overall findings of the review of modelling studies, including how treatment sequences were conceptualised within the different modelling approaches (column 2) and the additional attributes of the decision problem (beyond the sequencing of individual treatments) and of the disease condition that were captured by the included models (column 3). A more detailed summary of the methods and findings of the review of modelling studies is provided in Online Resource 4 (see the electronic supplementary material). The modelling techniques used included deterministic decision trees, stochastic decision trees, Markov cohort models, partitioned survival cohort models, semi-Markov cohort models, individual-patient simulation state transition models, discrete event simulation, discretely integrated condition event (DICE) simulation, non-terminating population-based simulation, terminating population-based simulation, and dynamic Markov cohort models. No study compared any of these alternative approaches for evaluating treatment sequences to assess, for example, how sensitive the results were to the type of model used. A number of studies did report choosing discrete event simulation over a state transition model because of its greater computational efficiency [48, 68, 104, 122]. The level of complexity in the decision problem accounted for in the models varied considerably, even when similar treatment sequences were evaluated within the same disease condition. Models also simplified the decision problem by including a limited number of treatment lines, streamlining the disease process, and using a short time horizon. For example, some studies used a 2–5 year time frame, rather than a lifetime horizon, for modelling treatment sequences for rheumatoid arthritis, because a longer time horizon implied too many assumptions [71, 78, 79, 84, 112, 113, 124].
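As a minimal sketch of the "series of treatment-specific Markov states" approach in Table 2, the Python code below builds a cohort transition matrix for a fixed sequence (line 1, line 2, line 3, best supportive care, death) and compares two orderings of the same three drugs. All probabilities, the 40-cycle horizon and the function names are hypothetical, and the cycle count uses no half-cycle correction or discounting. Each drug's probability of staying on treatment is reused unchanged whichever line it occupies; this is exactly the kind of simplifying assumption (no effect of treatment history) that the taxonomy developed in this review is designed to make explicit.

```python
import numpy as np

def sequence_matrix(stay_probs, p_die=0.05, p_bsc_die=0.10):
    """Transition matrix for a fixed sequence of treatment-specific states.

    States are ordered: line 1 ... line n, best supportive care (BSC), dead.
    stay_probs[i] is the per-cycle probability of remaining on line i; the
    remainder (minus p_die) moves to the next line, or to BSC after the last line.
    """
    n = len(stay_probs)
    P = np.zeros((n + 2, n + 2))
    for i, stay in enumerate(stay_probs):
        P[i, i] = stay
        P[i, i + 1] = 1.0 - stay - p_die
        P[i, -1] = p_die
    P[n, n] = 1.0 - p_bsc_die
    P[n, -1] = p_bsc_die
    P[-1, -1] = 1.0  # death is absorbing
    return P

def expected_cycles(P, n_cycles=40):
    """Expected cycles spent in each state for a cohort starting on line 1."""
    state = np.zeros(P.shape[0])
    state[0] = 1.0
    total = np.zeros_like(state)
    for _ in range(n_cycles):
        total += state
        state = state @ P  # memoryless: next distribution depends on current state only
    return total

# Hypothetical drug-specific probabilities of staying on treatment per cycle.
stay = {"A": 0.85, "B": 0.75, "C": 0.70}
for seq in (["A", "B", "C"], ["B", "A", "C"]):
    P = sequence_matrix([stay[d] for d in seq])
    assert np.allclose(P.sum(axis=1), 1.0)  # each row is a probability distribution
    cycles = expected_cycles(P)
    on_drug = {d: round(float(c), 2) for d, c in zip(seq, cycles)}
    print("-".join(seq), on_drug)
```

Reordering the drugs changes how long the cohort spends on each one, but only through the order in which states are reached; any cross-resistance or line-specific effect would need tunnel states or line-specific probabilities, as noted in Table 2.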

Table 2

Summary of the different modelling approaches used and their advantages and disadvantages for evaluating treatment sequences

Description of modelling approach	How treatment sequences are conceptualised in the model	Further attributes of the decision problem captured in the model	Advantages	Disadvantages	Included models*
Cohort-based models: Simulates a closed group of individuals
Deterministic decision tree (DT)
Depicts all possible pathways (series of decisions with associated probabilities and outcomes) over a set period	Treatment sequences were implemented as: (1) a single DT (e.g. an initial decision node representing the choice between 1st-line treatments, and subsequent chance nodes for response to treatment, leading to either a terminal branch (continued treatment) or 2nd-line treatment); (2) a separate DT for each sequence (treatment algorithm). DT structure (and specified timing of chance nodes) was also used to account for the following: differential use of subsequent treatment depending on reason for discontinuation; successful treatment (and its discontinuation) leading to either permanent resolution or recurrence requiring re-treatment; multiple levels of treatment response (with different time to progression/mortality risk); occurrence of toxic death and all-cause mortality with differential probability and timing; a fixed period of treatment administration; some treatments being skipped	Relapse treated with a previous successful treatment. Duration of response differs according to levels of response. Reason for discontinuation impacts selection of subsequent treatments. Some treatments administered for a fixed period only whilst others continued until progression. Toxic death and all-cause mortality have different probabilities and timing	Can be relatively straightforward to develop and not computationally intensive. Can be easy to interpret and transparent. Can be used as an adjunct or in conjunction with other methods (e.g. Markov cohort, NICE CG152; or partitioned survival, NICE CG131 below)	No explicit time component; governed by fixed timing of outcomes and events (e.g. recurrence or intolerance to treatment). Has a finite time horizon; can become exponentially complex (or bushy) with additional events and disease states, or extensive treatment sequences. Cannot easily handle looping/recurring events (reflecting chronic diseases that evolve over time); becomes cumbersome and inefficient when the time horizon is long. Poorly suited to complex scenarios/sequences	Dranitsaris model, NICE CG81, NICE CG152, Sciatica model, Frankum model, Knoester model
Stochastic decision tree**
A type of decision tree that allows for parameter uncertainty	Same as decision tree	Same as decision tree. Not all patients receive all treatments in sequence (implemented in conjunction with partitioned survival, NICE CG131).	Same properties as decision tree	Same properties as decision tree	Advanced Simulation Model, Greenhalgh model, NICE CG131
Markov cohort
Simulation of a hypothetical cohort through a set of health states over time, which is divided into equal intervals (cycles). Involves time-dependent transition between states. In a Markov chain, transition probabilities are constant over time	Treatment sequences implemented using three different approaches: (1) a series of treatment-specific Markov states; (2) a series of treatment-specific states (or treatment lines) along with additional temporary states representing, e.g., adverse effects or relapse; (3) a Markov cycle tree***, with Markov states used to represent different levels of disease activity or natural history. Markov cohort models were used for decision problems relating to the comparison of predefined sequences, or to assessing all conceivable sequences (ordering treatments according to net benefit/cost-effectiveness per unit time) (Fig. 2)	Not all patients receive all treatments in the sequence, reflecting reality where patients may skip some treatments. Duration of response differs according to levels of response and treatment line. Probability of continuing treatment or developing toxicity varies with time and for individual treatments (using tunnel states; or making treatment duration dependent on response level, and the response treatment and line dependent, assuming homogeneous response across treatments but not line of therapy). Reason for treatment discontinuation impacts selection of subsequent treatments. Some treatments administered for a fixed period only; patients in remission may withdraw from treatment. Cross-resistance from previous treatment (using tunnel states to account for previous treatment). Monitoring different patient subgroups (using tunnel states). Cycle trees used to account for: consequences of adverse effects; different levels of treatment response; some patients continuing treatment despite not achieving full/clinical response; changes in disease activity; natural history of the disease; complex treatment pathways	Can be relatively straightforward to construct and communicate, but less so with more complex model structures. Has a time component; events (e.g. relapse or treatment switching) can occur at any time. Allows looping/recurring events. Transitions can be unidirectional or bidirectional. Can be used in conjunction with a decision tree (cycle tree) to allow for different treatment responses, fluctuating disease activity, complex treatment pathways, etc. The use of cloned subtrees makes updating easier	Markov assumption (memoryless): prohibits transition probabilities from depending on the patient’s history, i.e. time spent in the state or previous states visited. (Can be overcome using additional temporary or tunnel states, e.g. making transition dependent on disease progression, and stratification, e.g. response to each treatment line with and without adverse effects.) Patients can only be in one state at a time; implementing sequences as a series of treatment-specific states only (approach 1) does not allow for additional factors (e.g. different reasons for treatment discontinuation). Cannot account for multiple events within one cycle (e.g. toxicity and progression; can be overcome using short cycles or cycle trees). Transitions limited to fixed intervals defined by cycle length. Occurrence of events assumed to be constant over time (Markov chain). Exponential complexity with increasing number of states; modelling extensive treatment sequences (multiple lines) can lead to state explosion, especially when also accounting for additional factors	Albert model, Maetzel model, Welsing model, Tanno model, Wu model, York psoriasis model, Beard model, Cameron model, Davies model, Heeg cancer model, Lee model, Orme model, Sawyer model, NICE CG137, Shepherd model, Smith model, Soini model, Tebas model, Wong model
Semi-Markov cohort
Incorporates the use of a multiple-dimension transition matrix. Assumes transition probabilities depend on the current state, and the time spent in each state depends upon the current and next state	Treatment sequences implemented as a series of treatment-specific health states	Probability of treatment failure decreases with time spent on a specific drug	Same properties as Markov cohort. Reduced impact of Markovian assumption (not memoryless; incorporates time dependency). Multidimensional matrix can potentially allow model to reflect patient history or previous treatments (but no study included this)	Patients can only be in one state at a time. Transitions can only occur at fixed intervals defined by cycle length. Only one transition allowed per cycle. Becomes more complex with added states	York epilepsy model
Partitioned survival
Simulation of a hypothetical cohort through a set of exhaustive and mutually exclusive health states over time. Time spent in each health state calculated from the area under the curve of survival functions	Treatment sequences implemented as a series of treatment-specific health states	Decreasing probability of remaining on a given treatment with time	Can be relatively straightforward to develop and not computationally intensive. Not data intensive. Area under the curve can be calculated continuously over time; no cycles required. Can be used in conjunction with a decision tree. Can account for extensive treatment sequences (multiple treatment lines)	Cannot account for complex treatment-sequencing algorithms or additional attributes (e.g. adverse effects, disease duration). Underlying structural assumption: surrogate outcomes, e.g. progression-free survival, are independent of overall survival	Schadlich model, NICE CG131 (with decision tree), Hind model
Individual sampling models: Simulates one individual at a time. Tracks the past health states of each individual and stochastically models the risk of future events
State transition model
Simulates each individual through a set of exhaustive and mutually exclusive health states over time, which is advanced in fixed intervals	Fixed treatment sequences modelled; disease activity monitored for each individual over time. Health states generally represented response or non-response to each successive treatment, with the addition of adverse effects as a separate state in some models. Most models also allocated patients to different levels of response after initiating treatment, with those achieving a specific threshold (remission) continuing treatment	Duration of response differs according to levels of response. Changes in disease activity (assumed to relate to level of response, not treatment). Patients follow different disease courses, which cannot be predicted at the onset. Stepped care approach (treatment algorithms)	Not limited by Markov assumption (eliminating need for excessive number of states). A large number of characteristics can be ascribed to individually simulated patients. Access to individual patient data enabled key parameters and events in patient histories to be calculated using multivariate regression, allowing adjustment for important covariates. Can account for heterogeneous population	Transitions limited to fixed intervals defined by cycle length. Cannot account for multiple events in one cycle. Can be computationally intensive	Sheffield etanercept model, Bansback model, Sheffield BSRBR model, Diamantpoulus model, Sheffield AHRQ model, Kielhorn model, Kobelt model, Holmes model
Discrete event simulation (DES)
Simulates time to an event and subsequent events for each individual. Probability of the occurrence and timing of an event is determined by random sampling from a probability distribution. Simultaneously varies multiple variables; inputs vary over time	Treatment sequences implemented in three different ways: (1) fixed treatment sequences; (2) random selection of pre-defined sequences; (3) developed as part of the modelling process by selecting individual drugs, using a random process, at specific points in the sequence	When conceived as a simple structure: variable time to quitting treatment; duration of response differs according to levels of response; changes in disease activity (over time and on treatment); reason for discontinuation impacts selection of subsequent treatments. When developed as a more complex structure: treatment selection and cessation based on algorithms reflecting specific clinical guidelines (accounting for treatment eligibility); unpredictable nature of disease progression; multiple treatment outcomes; different levels of response, with partial/complete response leading to treatment withdrawal in some patients; not all patients go on to receive subsequent treatments in the sequence; differential treatment selection for subgroups	Can ascribe a large number of characteristics to individually simulated patients. Can account for heterogeneous population. Not limited by the use of fixed time advancement (cycles). Treatment duration can be modelled as a continuous distribution, specific to individual treatments. Patients can simultaneously be in multiple states, and experience different events. Allows for modelling of complex scenarios and treatment algorithms. Can be computationally more efficient than a state transition model. Can be easily adapted to incorporate additional events or patient attributes	Extensive data required, including time to event, which may be limited, e.g. time to treatment withdrawal due to adverse effects. Individual patient-level data preferred, but can be based on aggregate data. Model structures can be difficult to communicate and interpret. Computationally challenging in terms of model design and running it	BPM/BRAM, Tran-Duy model, Lindgren model, Birmingham epilepsy model, Denis model, Heeg schizophrenia model
Discretely integrated condition event (DICE) simulation
Decision problem conceptualised as a set of conditions (aspects that persist over time) and events (aspects that occur at a point in time) within spreadsheet tables that specify condition values and event consequences. Provides a single template for implementing a variety of model types (DES, Markov models or hybrids; cohort or individual sampling models; stochastic or deterministic)	Fixed treatment sequences modelled. Events included lack of response and loss of response that lead to treatment initiation. Whilst on treatment, patients could experience adverse effects (monitored)	Patient characteristics (baseline risk) and response (status, duration) impact disease milestones. Variable time to quitting treatment. Changes to disease activity (assumed to relate to level of response). Treatment switching based on clinical rules (disease severity and response). Probability of switching treatments in later lines reduced	Can ascribe a large number of characteristics to individually simulated patients. Can account for heterogeneous population. Flexible, allowing combinations of state-transition and time-to-event components in a single model. Treatment duration can be modelled as a continuous distribution, specific to individual treatments. Allows for modelling of complex scenarios and treatment algorithms. Transparent (specifications tabulated rather than programmed in code). Easily modified	Extensive data required, including time to event, which may be limited, e.g. time to treatment withdrawal due to adverse effects. Executing the simulation using a macro in a spreadsheet can be slow for complex models	HAS RA model, Deniz model
(Open) population-based models: Allows new cohorts to enter over time
Non-terminating population-based simulation
Individual sampling model (DES)	Pre-specified clinical thresholds used to invoke escalation to next treatment	Dynamic disease process (dynamic equations used to project haemoglobin levels over time)	Has the same properties as DES. No clear advantage of using an open model over other approaches identified for evaluating treatment sequences. However, the Cardiff model provides an example of a model that is continually being developed and updated and capable of running using various levels or types of data	Has the same properties as DES	Cardiff T2DM model
Markov multi-cohort model
Cohort model (Markov)	Markov cycle tree (Markov states represented individual treatments and ‘switching’ [entry/exit state])	Impact of adding a new drug on health care budget assessed using prevalence approach (target population kept constant over time, with entry of a newly diagnosed cohort at each cycle)	Has the same properties as Markov cohort. No clear advantage of using an open model over other approaches identified for evaluating treatment sequences	Has the same properties as Markov cohort	Launois model

BPM Birmingham preliminary model, BRAM Birmingham Rheumatoid Arthritis Model, CG clinical guidelines, NICE National Institute for Health and Care Excellence, T2DM type 2 diabetes mellitus

*A list of included modelling studies is provided in Online Resource 3 (see the electronic supplementary material)

**Base-case model analysed using Monte Carlo simulation

***A Markovian cycle tree (or Markov decision tree) is one in which events that can occur within a cycle are modelled as a series of chance nodes [183, 184]
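The individual-level flexibility attributed to discrete event simulation in Table 2 can be illustrated with a short sketch. The Python code below simulates patients moving through a fixed sequence, sampling two competing exponential event times on each treatment (inefficacy and adverse event), so that each patient carries a reason for discontinuation and continuous, treatment-specific time on treatment. Everything here is a hypothetical illustration: the event rates, the 60-month horizon and the function names are invented, death and treatment-free gaps are ignored, and reason-dependent choice of the next treatment is not implemented, although the per-patient history makes such rules straightforward to add.

```python
import random

def simulate_patient(sequence, rates, horizon, rng):
    """Simulate one patient moving through a fixed treatment sequence.

    rates[drug] = (inefficacy_rate, adverse_event_rate) per month. Two competing
    exponential event times are sampled; the earlier one ends the treatment and
    gives the reason for discontinuation.
    """
    t, history = 0.0, []
    for drug in sequence:
        if t >= horizon:
            break
        r_ineff, r_ae = rates[drug]
        t_ineff, t_ae = rng.expovariate(r_ineff), rng.expovariate(r_ae)
        reason = "inefficacy" if t_ineff <= t_ae else "adverse event"
        end = t + min(t_ineff, t_ae)
        if end >= horizon:
            end, reason = horizon, "censored"
        history.append((drug, t, end, reason))
        t = end
    return history

def summarise(sequence, rates, horizon=60.0, n=20_000, seed=2020):
    """Mean months on each drug and share of patients stopping it for adverse events."""
    rng = random.Random(seed)
    months = {d: 0.0 for d in sequence}
    stopped_ae = {d: 0 for d in sequence}
    for _ in range(n):
        for drug, start, end, reason in simulate_patient(sequence, rates, horizon, rng):
            months[drug] += end - start
            stopped_ae[drug] += int(reason == "adverse event")
    return {d: (months[d] / n, stopped_ae[d] / n) for d in sequence}

# Hypothetical monthly event rates (1 / mean months to each event).
rates = {"A": (1 / 18, 1 / 60), "B": (1 / 12, 1 / 40), "C": (1 / 9, 1 / 30)}
for drug, (mean_months, p_ae) in summarise(["A", "B", "C"], rates).items():
    print(f"{drug}: {mean_months:.1f} months on treatment, {p_ae:.1%} stopped for adverse events")
```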

Summary of the different modelling approaches used and their advantages and disadvantages for evaluating treatment sequences Treatment sequences were implemented as: (1) A single DT (e.g. initial node representing choice between 1st-line treatments, and subsequent chance nodes response to treatment leading to either a terminal branch (continued treatment) or 2nd-line treatment) (2) Separate DT for each sequence (treatment algorithm) DT structure (and specified timing of chance nodes) also used to account for the following: differential use of subsequent treatment depending on reason for discontinuation; successful treatment (and its discontinuation) leads to either permanent resolution or recurrence requiring re-treatment; multiple level of treatment response (with different time to progression/mortality risk); occurrence of toxic death and all-cause mortality with differential probability and timing; fixed period of treatment administration; some treatments can be skipped Relapse treated with a previous successful treatment. Duration of response differs according to levels of response. Reason for discontinuation impacts selection of subsequent treatments. Some treatments administered for a fixed period only whilst others continued until progression. Toxic death and all-cause mortality have different probabilities and timing Can be relatively straightforward to develop and not computationally intensive. Can be easy to interpret and transparent. Can be used as an adjunct or in conjunction with other methods (e.g. Markov cohort, NICE CG152; or partitioned survival, NICE CG131 below) No explicit time component; governed by fixed timing of outcomes and events (e.g. recurrence or intolerance to treatment). Has finite time horizon; can become exponentially complex (or bushy) with additional events and disease states, or extensive treatment sequences. 
Cannot handle looping/recurring events (reflecting chronic diseases that evolve over time) easily; becomes cumbersome and inefficient when time horizon is long. Poorly suited for complex scenarios/sequences Dranitsaris model NICE CG81 NICE CG152 Sciatica model Frankum model Knoester model Same as decision tree. Not all patients receive all treatments in sequence (implemented in conjunction with partitioned survival, NICE CG131). Advanced Simulation Model Greenhalgh model NICE CG131 Simulation of a hypothetical cohort through a set of heath states over time, which is divided into equal intervals (cycles). Involves time-dependent transition between states. In Markov chain transition probabilities are constant over time Treatment sequences implemented using three different approaches: (1) A series of treatment-specific Markov states (2) As a series of treatment-specific states (or treatment lines) along with additional temporary states representing e.g. adverse effects, relapse (3) As a Markov cycle tree***, with Markov states used to represent different levels of disease activity or natural history. Markov cohort was used for decision problems that related to: the comparison of predefined sequences; or assessing all conceivable sequences (ordering treatments according to net-benefit/cost-effectiveness per unit time) (Fig. 2) Not all patients receive all treatments in the sequence; reflecting reality where patients may skip some treatments. Duration of response differs according to levels of response and treatment line. Probability of continuing treatment or developing toxicity varies with time and for individual treatments (using tunnel states; or making treatment duration dependent on response level, and the response treatment and line dependent, assuming homogeneous response across treatments but not line of therapy). Reason for treatment discontinuation impacts selection of subsequent treatments. 
Some treatments administered for a fixed period only; patients in remission may withdraw from treatment. Cross-resistance from previous treatment (using tunnel states to account for previous treatment). Monitoring different patient subgroups (using tunnel states). Cycle trees used to account for: consequences of adverse effects; different levels of treatment response; some patients continuing treatment despite not achieving full/clinical response; changes in disease activity; natural history of the disease; complex treatment pathways Can be relatively straightforward to construct and communicate, but less so with more complex model structures. Has a time component; events (e.g. relapse or treatment switching) can occur at any point over the time horizon. Allows looping/recurring events. Transitions can be unidirectional or bidirectional. Can be used in conjunction with a decision tree (cycle tree) to allow for different treatment response, fluctuating disease activity, complex treatment pathways, etc. The use of cloned subtrees makes updating easier Markov assumption (memoryless): prohibits transition probabilities from depending on the patient’s history (time spent in the state, or previous states visited). (Can be overcome using additional temporary or tunnel states, e.g. making transition dependent on disease progression, and stratification, e.g. response to each treatment line with and without adverse effects.) Patients can only be in one state at a time; implementing sequences as a series of treatment-specific states only (1) does not allow for additional factors (e.g. different reasons for treatment discontinuation). Cannot account for multiple events within one cycle (e.g. toxicity and progression; can be overcome using short cycles or cycle trees). Transitions limited to fixed intervals defined by cycle length. Occurrence of events assumed to be constant over time (Markov chain).
Exponential complexity with an increasing number of states; modelling extensive treatment sequences (multiple lines) can lead to state explosion, especially when also accounting for additional factors Albert model Maetzel model Welsing model Tanno model Wu model York psoriasis model Beard model Cameron model Davies model Heeg cancer model Lee model Orme model Sawyer model NICE CG137 Shepherd model Smith model Soini model Tebas model Wong model Same properties as Markov cohort. Reduced impact of Markovian assumption (not memoryless; incorporates time dependency). A multidimensional matrix can potentially allow the model to reflect patient history or previous treatments (but no study included this) Patients can only be in one state at a time. Transitions can only occur at fixed intervals defined by cycle length. Only one transition allowed per cycle. Becomes more complex with added states Can be relatively straightforward to develop and not computationally intensive. Not data intensive. Area under the curve can be calculated continuously over time; no cycles required. Can be used in conjunction with a decision tree. Can account for extensive treatment sequences (multiple treatment lines) Cannot account for complex treatment-sequencing algorithms or additional attributes (e.g. adverse effects, disease duration). Underlying structural assumption: surrogate outcomes, e.g. progression-free survival, are independent of overall survival Schadlich model NICE CG131 (with decision tree) Hind model Fixed treatment sequences modelled; disease activity monitored for each individual over time. Health states generally represented response or non-response to each successive treatment, with the addition of adverse effects as a separate state in some models. Most models also allocated patients to different levels of response after initiating treatment, with those achieving a specific threshold (remission) continuing treatment Duration of response differs according to levels of response.
Changes in disease activity (assumed to relate to level of response, not treatment). Patients follow different disease courses, which cannot be predicted at the onset. Stepped care approach (treatment algorithms) Not limited by the Markov assumption (eliminating the need for an excessive number of states). A large number of characteristics can be ascribed to individually simulated patients. Access to individual patient data enabled key parameters and events in patient histories to be calculated using multivariate regression, allowing adjustment for important covariates. Can account for a heterogeneous population Transitions limited to fixed intervals defined by cycle length. Cannot account for multiple events in one cycle. Can be computationally intensive Sheffield etanercept model Bansback model Sheffield BSRBR model Diamantopoulos model Sheffield AHRQ model Kielhorn model Kobelt model Holmes model Treatment sequences implemented in three different ways: (1) Fixed treatment sequences (2) Random selection of pre-defined sequences (3) Developed as part of the modelling process by selecting individual drugs, using a random process, at specific points in the sequence When conceived as a simple structure: variable time to quitting treatment; duration of response differs according to levels of response; changes in disease activity (over time and on treatment); reason for discontinuation impacts selection of subsequent treatments When developed as a more complex structure: treatment selection and cessation based on algorithms reflecting specific clinical guidelines (accounting for treatment eligibility); unpredictable nature of disease progression; multiple treatment outcomes; different levels of response, with partial/complete response leading to treatment withdrawal in some patients; not all patients go on to receive subsequent treatments in the sequence; differential treatment selection for subgroups Can ascribe a large number of characteristics to individually simulated patients.
Can account for a heterogeneous population. Not limited by the use of fixed time advancement (cycles). Treatment duration can be modelled as a continuous distribution, specific to individual treatments. Patients can simultaneously be in multiple states, and experience different events. Allows for modelling of complex scenarios and treatment algorithms. Can be computationally more efficient than a state transition model. Can be easily adapted to incorporate additional events or patient attributes Extensive data required, including time-to-event data, which may be limited, e.g. time to treatment withdrawal due to adverse effects. Individual patient-level data preferred, but the model can be based on aggregate data. Model structures can be difficult to communicate and interpret. Computationally challenging in terms of model design and execution BPM/BRAM Tran-Duy model Lindgren model Birmingham epilepsy model Denis model Heeg schizophrenia model Decision problem conceptualised as a set of conditions (aspects that persist over time) and events (aspects that occur at a point in time) within spreadsheet tables that specify condition values and event consequences. Provides a single template for implementing a variety of model types (DES, Markov models or hybrids; cohort or individual sampling models; stochastic or deterministic) Fixed treatment sequences modelled. Events included lack of response and loss of response that lead to treatment initiation. Whilst on treatment, patients could experience adverse effects (monitored) Patient characteristics (baseline risk) and response (status, duration) impact disease milestones. Variable time to quitting treatment. Changes to disease activity (assumed to relate to level of response). Treatment switching based on clinical rules (disease severity and response). Probability of switching treatments in later lines reduced Can ascribe a large number of characteristics to individually simulated patients. Can account for a heterogeneous population.
Flexible, allowing combinations of state-transition and time-to-event components in a single model. Treatment duration can be modelled as a continuous distribution, specific to individual treatments. Allows for modelling of complex scenarios and treatment algorithms. Transparent (specifications tabulated rather than programmed in code). Easily modified Extensive data required, including time-to-event data, which may be limited, e.g. time to treatment withdrawal due to adverse effects. Executing the simulation using a spreadsheet macro can be slow for complex models HAS RA model Deniz model Has the same properties as DES. No clear advantage of using an open model over other approaches was identified for evaluating treatment sequences. However, the Cardiff model provides an example of a model that is continually being developed and updated and is capable of running using various levels or types of data Has the same properties as Markov cohort. No clear advantage of using an open model over other approaches was identified for evaluating treatment sequences
BPM Birmingham preliminary model, BRAM Birmingham Rheumatoid Arthritis Model, CG clinical guidelines, NICE National Institute for Health and Care Excellence, T2DM type 2 diabetes mellitus
*A list of included modelling studies is provided in Online Resource 3 (see the electronic supplementary material)
**Base-case model analysed using Monte Carlo simulation
***A Markovian cycle tree (or Markov decision tree) is one in which events that can occur within a cycle are modelled as a series of chance nodes [183, 184]
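To make the simplest Markov cohort implementation in the table concrete (approach (1), a fixed sequence modelled as a series of treatment-specific states), here is a minimal Python sketch under stated assumptions. It uses a single hypothetical two-line sequence followed by best supportive care. The state names and per-cycle transition probabilities are illustrative placeholders and are not taken from any included study.

```python
# Minimal Markov cohort sketch of approach (1): a fixed sequence modelled as a
# series of treatment-specific states. State names and per-cycle transition
# probabilities are hypothetical placeholders, not values from any included study.

STATES = ["line1", "line2", "bsc", "dead"]  # bsc = best supportive care

# TRANSITIONS[current][next]: each row sums to 1. Because the chain is
# memoryless, leaving "line2" does not depend on how long a patient has been there.
TRANSITIONS = {
    "line1": {"line1": 0.80, "line2": 0.15, "bsc": 0.00, "dead": 0.05},
    "line2": {"line1": 0.00, "line2": 0.70, "bsc": 0.24, "dead": 0.06},
    "bsc":   {"line1": 0.00, "line2": 0.00, "bsc": 0.90, "dead": 0.10},
    "dead":  {"line1": 0.00, "line2": 0.00, "bsc": 0.00, "dead": 1.00},
}


def run_cohort(n_cycles):
    """Return the cohort trace: the share of the cohort in each state, per cycle."""
    occupancy = {"line1": 1.0, "line2": 0.0, "bsc": 0.0, "dead": 0.0}  # all start on line 1
    trace = [occupancy]
    for _ in range(n_cycles):
        next_occupancy = {state: 0.0 for state in STATES}
        for source, share in occupancy.items():
            for target, prob in TRANSITIONS[source].items():
                next_occupancy[target] += share * prob
        occupancy = next_occupancy
        trace.append(occupancy)
    return trace


for cycle, occ in enumerate(run_cohort(5)):
    print(cycle, {state: round(share, 3) for state, share in occ.items()})
```

The memoryless limitation noted in the table is visible here: exit from "line2" is the same in every cycle. Splitting a state into tunnel states (e.g. hypothetical "line2_cycle1", "line2_cycle2", and so on) is the usual way to relax this, at the cost of more states.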

Simplifying Assumptions Regarding Sequences of Treatment

Treatment sequences were often modelled as a series of discrete treatments, each requiring a summary effect estimate conditional on its position in the treatment pathway. The scarcity of data to inform such estimates meant that simplifying assumptions were often applied to the available data on discrete treatments used at a single point in the pathway. A range of simplifying assumptions used to represent treatment-sequencing effect estimates was identified and used to develop a novel taxonomy of all possible assumptions (Table 3). The most common assumptions were that treatment effect is independent of positioning in the sequence, or that treatment effect is dependent on the number of previous treatments (treatment line) but independent of the type of treatments used (Table 4). These assumptions were frequently not validated, nor was their impact on the overall results assessed. Forty-nine studies (72%) assumed that the effect of either all or some of the treatments used after the first treatment modelled (or decision point) was independent of treatment sequence. Only six studies (9%) evaluated the impact of applying this assumption in sensitivity analyses, by reducing the effect of treatments used later in the sequence using a factor based on evidence [67, 69], an arbitrary amount [15, 93, 110], or expert consensus [14]. The assumption that treatment effect is dependent on line of therapy was often used in conjunction with the assumption of treatment independence, applied to treatments adopted later in the sequence.

Table 3

Taxonomy of simplifying assumptions relating to treatment-sequencing effects used by studies included in the review

Simplifying assumptions taxonomy
Treatment independence	Treatment effect is independent of positioning in treatment sequence
	Treatment effect is dependent on the number of previous treatments used, but independent of the type of treatments used
Substitution with another treatment effect	Treatment effect is the same as an alternative treatment from the same class, or a generic class effect—irrespective of positioning in the sequence (generic effect)
	Treatment effect is the same as an alternative treatment from the same class, or a generic class effect—matching the same position in the sequence (positional generic effect)
	Treatment effect is the same as an alternative (substitute) treatment from a different class of treatments, used at the same point in the sequence (substitute treatment)
Modification of treatment effect	Treatment effect is reduced/increased, in line with a multiplier (multiplication factor), when used at a later point in the sequence. (Here, the specific multiplication or reduction factor used to modify the effect is informed by the available evidence that is also relevant to the treatment of interest.)
	Treatment effect decrements by the same pre-set amount with each successive treatment (decrementing effect). (Here, the same generic proportional reduction, used to represent the diminishing effects, is applied at each point in the sequence irrespective of the treatment used. The proportion is not necessarily based on a specific evidence base.)
	Treatment effect is reduced with disease duration, and treatments are not as effective when they are used in late disease
Impact of time since previous treatment	Treatment effect is not affected by previous treatments if patients have been in long-term remission, and thus can re-use the same treatment(s)/class of treatment(s) as that which achieved the prior remission
Displacement effect ignored	A single treatment effect does not differ when it is displaced (i.e. its position in the sequence is changed) by the addition of a new prior treatment (displacement ignored)
The use of uncontrolled/observational studies without bias adjustment	Uncontrolled trials or observational studies provide an unbiased estimate of treatment (sequencing) effects
	Expert consensus provides an unbiased estimate of treatment-sequencing effects
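The "modification of treatment effect" rows of Table 3 differ mainly in how a first-line estimate is carried to later lines. The following Python sketch is one possible reading of four of the assumptions: independence of positioning (IP), dependence on the number of previous treatments (NPT), a multiplication factor (MF), and a decrementing effect (TD), using the abbreviations of Table 4. All response probabilities, the multiplier and the decrement are hypothetical placeholders, and the function names are illustrative only.

```python
# Hypothetical sketch of how four assumptions from Table 3 turn evidence on a
# discrete treatment into line-specific response probabilities for a model.
# Every number below is an illustrative placeholder.

FIRST_LINE_RESPONSE = 0.60                        # assumed estimate from first-line trials
RESPONSE_BY_LINE = {1: 0.60, 2: 0.45, 3: 0.35}    # assumed line-level (class-agnostic) estimates


def independent_of_positioning(line):
    """IP: the same effect is applied wherever the treatment sits in the sequence."""
    return FIRST_LINE_RESPONSE


def number_of_previous_treatments(line):
    """NPT: effect depends on the treatment line, not on which treatments came before.
    Lines beyond the last available estimate reuse that last estimate."""
    return RESPONSE_BY_LINE[min(line, max(RESPONSE_BY_LINE))]


def multiplication_factor(line, factor=0.8):
    """MF: a single evidence-informed multiplier applied when used after first line."""
    return FIRST_LINE_RESPONSE if line == 1 else FIRST_LINE_RESPONSE * factor


def decrementing_effect(line, decrement=0.10):
    """TD: the same generic proportional reduction compounds at every successive line."""
    return FIRST_LINE_RESPONSE * (1.0 - decrement) ** (line - 1)


for line in (1, 2, 3):
    print(line,
          independent_of_positioning(line),
          number_of_previous_treatments(line),
          round(multiplication_factor(line), 3),
          round(decrementing_effect(line), 3))
```

Under IP the third-line input is unchanged at 0.60. Under MF it drops once, to 0.48. Under TD it compounds to 0.6 × 0.9², or about 0.486. These choices propagate directly into the modelled benefit of any sequence.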

Table 4

Summary of the frequency of use of the simplifying assumptions

Simplifying assumption used	Total (n = 68)*	Rheumatology studies (n = 36)	Non-rheumatology studies (n = 32)
Treatment independence
Independent of positioning (IP)	49 (72%)	25 (71%)	24 (75%)
Dependent on number of previous treatments used (NPT)	29 (44%)	19 (54%)	10 (32%)
NPT used in conjunction with IP		11/19	6/8
Substitution with another treatment effect
Generic effect (GE)	16 (24%)	14 (40%)	2 (6%)
Positional generic effect (PGE)	17 (26%)	15 (43%)	2 (6%)
Substitute treatment (ST)	1 (2%)	1 (3%)	–
Modification of treatment effect
Multiplication factor (MF)	10 (15%)	7 (19%)	3 (10%)
Decrementing effect (TD)	4 (6%)	1 (3%)	3 (10%)
Reduced with disease duration (RDD)	4 (6%)	3 (9%)	1 (3%)
Impact of time since previous treatment
Long-term remission (LR)	2 (3%)	–	2 (6%)
Displacement effect ignored
Displacement ignored (DI)	28 (42%)	23 (66%)	5 (16%)
The use of uncontrolled/observational studies without bias adjustment (internal validity)**
Uncontrolled trials or observational studies (UOBS)	20 (30%)	16 (35%)	4 (13%)
Expert consensus (EXC)	4 (6%)	1 (3%)	3 (10%)

CG clinical guidelines, NICE National Institute for Health and Care Excellence

*Four modelling studies included in the review of modelling approaches were not included in the review of simplifying assumptions. These include two (NICE CG131; Hind, 2008: NICE TA79) [53, 56] that obtained data on clinical effectiveness from sequencing trials and two (McEwan et al., 2010, and Launois et al., 2008) [102, 108] that did not evaluate the clinical effectiveness of treatment sequences, but were included as they provided relevant examples of specific modelling techniques

**This relates to no adjustment being made for issues arising from the use of observational data rather than data from randomised controlled trials

The available evidence to inform treatment-sequencing effects impacts the type of assumptions required. The review focused on modelling studies that evaluated treatment sequences, but economic evaluations often focus on the comparison of discrete treatments and model downstream costs of subsequent treatments. The findings demonstrated that priority was often given to matching the evidence for the decision point, for example, comparing first-line biologics, rather than considering treatment sequences as a whole. Economic evaluations undertaken by, or on behalf of, manufacturers of health technology tended to focus on a specific decision point reflecting treatments used in pivotal RCTs matching the licence indication, for example, comparing a TNF inhibitor to a conventional DMARD [74, 80, 101], or a non-TNF-inhibitor biologic to a TNF inhibitor [101]. The data sources used alongside the simplifying assumptions for treatments used beyond the decision point varied, even when considering the same decision problem and addressing the same evidence gap.
For example, data sources used to inform sequential TNF inhibitors included the following: RCTs of TNF inhibitors used as first-line biologics [45, 67, 72, 83, 87, 89, 92, 96, 101, 109], a national patient registry [81, 101, 104, 115, 122], a large, uncontrolled, open-label study of a specific TNF inhibitor in patients who had previously discontinued TNF inhibitors [78, 79, 84, 107, 112, 113], and an RCT of a non-TNF-inhibitor biologic in participants with an inadequate response to TNF inhibitors. The effects of treatments administered later in the treatment pathway were also handled in different ways. For example, in a technology appraisal of TNF inhibitors for rheumatoid arthritis [83], the initial treatment response for each subsequent conventional DMARD was explicitly modelled, whilst in another technology appraisal of TNF inhibitors for psoriatic arthritis [16], the economic model assumed that patients experienced a steady long-term deterioration after the failure of the TNF inhibitor. Fluctuations caused by response to subsequent conventional DMARDs, which were considered to be administered as part of palliative care, were therefore ignored. The uncertainty in the quality of the alternative evidence to inform sequencing effects was not investigated in depth. Decision models that start at the point of diagnosis are more likely to reflect the complete sequence of treatments used in chronic conditions; for example, some studies of biologics in rheumatoid arthritis developed the decision population within the actual model, with patients entering the model being newly diagnosed with early disease [67, 75, 83, 85, 99, 122]. However, this increases the likelihood that no matching evidence exists, so more assumptions are required. Another approach is to model the initial treatment used prior to the decision point (e.g.
when comparing second-line biologics), and apply the assumption that the entire patient population on entering the model has an inadequate response to the first modelled treatment (e.g. first-line biologic). This approach was used in the Advanced Simulation Model to allow the initial treatments to be costed appropriately, reflecting treatment sequences used in practice [71, 78, 79, 84, 112, 113]. However, the evidence used to inform the treatment effects of the second TNF inhibitor did not match the specific TNF inhibitor that had previously failed (the first treatment modelled). The third and most common approach was to include a patient population entering the model that reflected the decision problem in terms of the number of previous treatments used, for example, patients receiving their first biologic therapy. Modelling studies that only consider the impact of subsequent treatments when, for example, comparing first-line biologics [71, 72, 74, 75, 83, 89, 101, 120, 123, 127] are generally based on the assumption that the sequences being compared are starting from a level playing field. The potential impact of this is not generally considered within the sensitivity analyses, as it is not part of the cost-effectiveness estimates. A frequent problem when evaluating the introduction of a new treatment to an established sequence is the lack of data to inform the ‘displaced effect’. For example, when adding a new drug (e.g. a non-TNF-inhibitor biologic) to an established sequence (e.g. starting with a TNF inhibitor), the existing drug is displaced lower down the sequence (Box 2), and is generally modelled as both the comparator (e.g. first-line) treatment in the baseline sequence and the subsequent (second-line) treatment, after the new drug, in the intervention sequence.
The same treatment effect is generally applied to the existing drug, irrespective of whether it is used early or later in the sequence (and disease trajectory), with no RCT data available on its effect in patients with an inadequate response to the new drug.
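The displacement problem described above can be illustrated with a toy calculation. The following Python sketch compares a baseline sequence (existing drug first) with an intervention sequence (new drug, then the existing drug). It does this once under the "displacement ignored" assumption and once with a hypothetical downward adjustment for the displaced drug. All probabilities and the adjustment factor are illustrative placeholders. The summary measure (probability of responding to at least one treatment) and the independence between lines are simplifying assumptions of this sketch, not of any included model.

```python
# Hypothetical illustration of the displacement problem. A new drug is placed
# ahead of an existing drug; all response probabilities and the adjustment
# factor are illustrative placeholders.


def prob_any_response(response_probs):
    """Probability of responding to at least one treatment in the sequence.

    Simplification (assumed): response at each line is independent of
    non-response at earlier lines, given the line-specific probability.
    """
    prob_no_response = 1.0
    for prob in response_probs:
        prob_no_response *= 1.0 - prob
    return 1.0 - prob_no_response


EXISTING_FIRST_LINE = 0.50   # hypothetical RCT estimate for the existing drug
NEW_FIRST_LINE = 0.55        # hypothetical RCT estimate for the new drug
DISPLACED_FACTOR = 0.70      # hypothetical; in practice no RCT data inform this

baseline = [EXISTING_FIRST_LINE]                                    # existing drug first-line
intervention_ignored = [NEW_FIRST_LINE, EXISTING_FIRST_LINE]        # displacement ignored (DI)
intervention_adjusted = [NEW_FIRST_LINE, EXISTING_FIRST_LINE * DISPLACED_FACTOR]

gain_ignored = prob_any_response(intervention_ignored) - prob_any_response(baseline)
gain_adjusted = prob_any_response(intervention_adjusted) - prob_any_response(baseline)
print(round(gain_ignored, 4), round(gain_adjusted, 4))
```

With these placeholder inputs, ignoring displacement credits the intervention sequence with a gain of 0.275 in the probability of any response, compared with 0.2075 when the displaced drug is assumed to work less well after the new one. The direction of this bias holds whenever the true second-line effect is smaller than the first-line estimate.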

Discussion

Summary of the Findings

The review identified a range of quantitative evidence synthesis methods used for evaluating the effectiveness of alternative treatment sequences. The findings demonstrated the following: Reviewing the evidence on treatment sequencing is neither trivial nor straightforward. In most cases, treatment sequences represent complex, multifaceted, dynamic intervention pathways, which will require advanced methods of quantitative evidence synthesis, especially if evaluated using a ‘one-step-at-a-time’ approach. Prospective sequencing trials are few in number and do not cover the breadth of decision making needed. The evidence synthesis would likely need to consider the inclusion of diverse study designs, including non-randomised studies. The problem has largely not been addressed using evidence synthesis methodology for clinical effectiveness, but is usually dealt with at the decision modelling stage. There is no single best way to evaluate treatment sequences; rather, there is a range of approaches and, as yet, no generalised methodology that encompasses the different assumptions used. Each approach has advantages and disadvantages and is influenced by the evidence available and the decision problem. When using a one-step-at-a-time approach, previous treatment is an important effect modifier, and subsequent treatments can confound long-term outcomes, such as survival. The reason for discontinuing treatment (lack of effect, loss of effect, or intolerance) has a differential effect on the effectiveness (and choice) of subsequent treatment, and is poorly reported in primary studies. The extent and type of sequences being evaluated tended to reflect the available research evidence, rather than clinical practice.

Comparison with Existing Reviews

We identified three existing reviews of methods for evaluating treatment sequences. These comprised two systematic reviews of economic evaluations [4, 141] and one review of published UK NICE technology appraisals [3]. Mauskopf et al. analysed treatment-sequencing assumptions after failure of the first biologic in cost-effectiveness models of psoriasis, and compared the modelled sequences with the most recent treatment guidelines [141]. They concluded that models of first-line biologics either do not include subsequent treatments or include only some of the regimens recommended in current guidelines, and that cost-effectiveness results may be sensitive to assumptions about treatment sequencing and about the choice and efficacy of subsequent treatment regimens. Tosh et al. assessed and critiqued how sequential DMARDs for rheumatoid arthritis have been modelled in economic evaluations [4]. They found that reporting of the methods and evidence used to assess the effect of downstream treatments was generally poor; when lifelong models and treatment sequences were considered, evidence gaps were identified. They concluded that methods were not applied consistently, leading to varied estimates of cost-effectiveness, and that treatment sequences were not fully considered and modelled, potentially resulting in inaccurate estimates of cost-effectiveness. Zheng et al. investigated approaches used to model treatment sequences in NICE appraisals to provide practical guidance on conceptualising whether and how to model sequences in health economic models [3]. They concluded that the biggest challenge is the scarcity of clinical data that capture the long-term impacts of sequences on efficacy and safety. Three commonly used assumptions to bridge the evidence gap were identified, but each had its own limitations.
These included the assumption that the efficacy of a treatment stayed unchanged regardless of line of therapy, the use of data from trials in different lines of therapy to directly model a treatment sequence, and the use of retrospective studies of clinical registries or databases. The findings of these reviews were consistent with ours, though their scope was more limited in that they focused on either a single condition or UK NICE appraisals.

Strengths and Limitations of the Review

This is the first review of methods to investigate the evaluation of treatment sequencing across all clinical scenarios, and to include both meta-analytic techniques and decision-analytic modelling. It is an extensive, in-depth review of current methods used to evaluate the clinical effectiveness of treatment sequences, a broad and disparate area of research. A potential limitation of our review is that the reference database searches were not updated. However, targeted hand searches were continued during the review process, and studies published after 2013 have been included. Nevertheless, more recent studies were only included if they contributed new findings, and the searches stopped when no new information was being found. This means that the review could potentially have missed new methods developed in the last few years. Updated targeted hand searches identified a new modelling technique (DICE) that was not previously included in our review. This has since been included. However, the methods used to conceptualise treatment sequences and the level of reality captured in the DICE model did not change the findings and recommendations of our review. The methods used to develop treatment-sequencing effect estimates and the accompanying simplifying assumptions made within the new studies [45, 46] were also the same as those included in our review. The assessment of recent studies included in a new systematic review of economic evaluations of sequences of biological treatments for patients with rheumatoid arthritis, published in 2020, did not identify any studies reporting methods or simplifying assumptions not already incorporated in our review [44].

Recommendations for Practice

Primary Study Design

The available evidence base for evaluating new treatments is often driven by the requirements for regulatory approval, and thus focuses on discrete treatments used at a defined point in the pathway [142, 143]. The lack of data on the effectiveness of these treatments when used at another point in the pathway is a barrier to making policy decisions about the optimal positioning of new treatments or treatment sequences. The GetReal project (Sect. 3.2.4) included a stakeholder engagement workshop to solicit views on the usefulness and acceptability of its analytic approach [144]. Interestingly, the regulators considered it to be of limited usefulness because the evidence requirements for marketing authorisation in rheumatoid arthritis are line specific, whilst the pharmaceutical research and development representatives considered it useful in principle, to better understand the gaps in the evidence across lines of therapy and to aid the design of future clinical trials [144]. The focus of primary research on discrete treatments is unlikely to change unless the regulatory authorities specify the importance of treatment sequencing or optimal positioning of new treatments. The reimbursement agencies and HTA bodies should also make recommendations on the nature of the clinical evidence required to inform treatment sequences [145, 146].

Health Technology Assessment

It is important to identify the relevance of, or the need to consider, treatment sequencing early in the technology assessment process, and to incorporate it into both the clinical and economic evaluations. Treatment sequencing was often considered as part of the economic evaluation only, and not in the clinical evaluation [17, 67, 83, 85, 95, 99, 106, 116, 126, 147]. A previous review of NICE technology appraisals also identified a lack of integration or direct use of the systematic review to inform the economic evaluation, and the need to consider the data requirements of the economic model at an early stage [148]. The development of an initial analytic or conceptual framework [40, 149] provides an essential tool for the planning and evaluation of treatment sequences. It can be used to consolidate the requirements of the clinical and economic evaluation; assist in communication within the research team and with a range of stakeholders; ‘think through’ the multiple components of the treatment pathways and disease-specific events in context; enhance the transparency of underlying assumptions; and inform choices about the level of structural complexity required by the model [40, 139, 150–153]. For some chronic diseases, it may be useful to create a disease-specific conceptual framework that can serve as a foundation for developing future HTAs and economic models of current and novel treatments [154], thus potentially allowing for greater stakeholder feedback and future improvement. There is also a need to depict treatment sequences as a tree, rather than a linear sequence of treatments, thus accounting for the complex and dynamic intervention pathways that they represent. Although methods were developed that accounted for the fact that the reason for treatment discontinuation (e.g. loss of effectiveness, adverse events, non-adherence) might determine the average effectiveness for the next line of therapy, the reality is that this may also affect the choice of therapy for the next line. A tree structure is adopted in the SMART design, which is a multistage trial designed to develop and compare treatment pathways that are adapted over time based on an individual’s response and/or adverse effects [28]. The time and resource constraints of HTA, accompanied by limited evidence, may render an extensive model unrealistic. It may therefore be tempting to simplify the treatment-sequencing decision problem. However, a model based on an oversimplification of the decision problem and clinical practice is also unlikely to be useful for decision makers. An alternative approach would be to develop a model that is designed to address multiple decision problems, rather than a single-use model. This may be relevant not only for chronic disease, but also for the introduction of new treatments in a rapidly changing clinical field, such as oncology [5]. The likelihood that the available data to inform sequencing effects may improve over time also supports developing a model that is easily updated. This is consistent with recent calls for the use of disease-specific reference models [155], pre-verified modules [156], and open-source models [157] to improve the accuracy of economic evaluations. Our review identified some good examples where a model was further developed over time to address multiple reimbursement decisions (e.g. Birmingham Rheumatoid Arthritis Model [BRAM] [75, 158], Tran-Duy model [68, 122], Sheffield rheumatoid arthritis models [159], and the Advanced Simulation Model [78]). However, each was further developed only by the research group that created it. An important challenge here is the need to make sufficient detail on the original model openly available.
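The point about depicting sequences as a tree, where the reason for discontinuation can change the next treatment, can be sketched as a simple nested data structure. The following Python example is a minimal illustration under stated assumptions. The drug names, discontinuation reasons and branching are hypothetical placeholders and do not represent any guideline or included model. Enumerating root-to-leaf paths recovers the linear sequences that a list-based model would have to specify one by one.

```python
# Hypothetical sketch: a treatment pathway stored as a tree in which the reason
# for discontinuation selects the next branch. Drug names and reasons are
# placeholders, not recommendations from any guideline.

PATHWAY = {
    "treatment": "drug_a",
    "next": {
        "lack_of_effect": {
            "treatment": "drug_b",
            "next": {"lack_of_effect": {"treatment": "drug_d", "next": {}}},
        },
        "loss_of_effect": {"treatment": "drug_c", "next": {}},
        "intolerance": {"treatment": "drug_d", "next": {}},
    },
}


def enumerate_pathways(node, treatments=(), reasons=()):
    """Return every root-to-leaf path as (treatments, discontinuation reasons)."""
    treatments = treatments + (node["treatment"],)
    if not node["next"]:
        return [(treatments, reasons)]
    paths = []
    for reason, child in node["next"].items():
        paths.extend(enumerate_pathways(child, treatments, reasons + (reason,)))
    return paths


for treatments, reasons in enumerate_pathways(PATHWAY):
    print(" -> ".join(treatments), "| stopped because:", ", ".join(reasons))
```

Keeping the reason on each edge means that effectiveness inputs can later be attached per branch (e.g. a different response probability for drug_d after intolerance than after lack of effect), which a flat list of drugs cannot express.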
A mathematical challenge when comparing multiple permutations of sequences is determining the proper starting point of the model. This is also relevant when using a model designed for multiple uses, which may start at the point of diagnosis [75], at a key point in the treatment pathway (e.g. initiation of DMARD therapy [122] or biologic therapy [78]), or at the point at which the decision is made (Sect. 3.3.2). All evaluations should start at the point of divergence (i.e. the point at which a decision might be made) [75]. Models used to compare multiple permutations of sequences often include the same first one, two, or three lines of treatment. This essentially 'dilutes' the true incremental effects (and costs) of treatment, since some patients will have died (and left the model) before the point of divergence. Thus, when incremental outcomes are calculated per patient, the denominator (all patients entering the model) is larger than the number of patients who actually reach the point of divergence, and the incremental results underestimate the true effects.
A number of studies developed a model based on an existing approach, and existing modelling approaches could potentially be adapted for use in a different disease condition. However, when using an existing model, it is important to consider which underlying assumptions regarding treatment sequences were applied. For example, the York psoriasis model [126], which has subsequently been used by multiple studies evaluating treatment sequences in psoriasis [141, 160, 161], is based on the underlying assumption of treatment independence. The underlying assumptions of some existing modelling approaches mean that they will not be suitable for assessing the treatment sequences for some chronic conditions.
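The dilution argument above comes down to simple arithmetic. The sketch below is a hypothetical illustration: it assumes that all incremental QALYs and costs between two compared sequences accrue after the point of divergence, and it uses made-up cohort sizes and totals. It compares per-patient increments computed over everyone entering the model with those computed over only the patients who reach the point of divergence.

```python
def per_patient_increments(n_entering, n_at_divergence, total_delta_qaly, total_delta_cost):
    """Return per-patient incremental (QALYs, costs) under two denominators.

    Assumes every incremental QALY and cost arises after the point of
    divergence, so patients who die during the shared early lines enlarge
    the first denominator but add nothing to the numerator.
    """
    diluted = (total_delta_qaly / n_entering, total_delta_cost / n_entering)
    undiluted = (total_delta_qaly / n_at_divergence, total_delta_cost / n_at_divergence)
    return diluted, undiluted


# Hypothetical cohort: 1000 patients enter at diagnosis and 600 survive the
# shared first lines to reach the point of divergence.
diluted, undiluted = per_patient_increments(1000, 600, 120.0, 1_800_000.0)
print("per patient entering the model:", diluted)
print("per patient reaching divergence:", undiluted)
print("dilution factor:", 600 / 1000)
```

Both increments shrink by the same factor (600/1000 here). In this simplified case the ICER, being a ratio of incremental costs to incremental QALYs, is unchanged, but absolute per-patient results such as incremental QALYs, incremental costs, or incremental net benefit are understated in magnitude.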

Taxonomy of Simplifying Assumptions Relating to Treatment-Sequencing Effects

The taxonomy of simplifying assumptions (Table 3) provides a unique resource to inform future practice and could be an important tool for clarifying the extent to which treatment-sequencing effects have been accounted for within a decision model. Modellers can use it as a checklist to help them consider whether treatment sequencing should be modelled and what implicit assumptions they may be making. Reviewers or policy decision makers can also use it to appraise or better understand an existing model. However, applying the taxonomy requires better reporting of the simplifying assumptions made. Our taxonomy focused on the simplifying assumptions made regarding the initial treatment effect (of discrete treatments conditional on their position in the treatment pathway). This incorporates the impact of previous treatment, differing reasons for discontinuing previous treatment, and increasing disease duration. However, the taxonomy did not consider the assumptions made about the long-term effect of treatment. Many treatments of chronic conditions, such as rheumatoid arthritis, produce an initial, short-term improvement followed by a period of waning effect. In some models, when patients move quickly through the sequence of treatments (for example, because of early discontinuation due to adverse effects), simulated patients can accrue multiple 'short-term' benefits from different treatments, gaining an artificial additive effect. Some included models of inflammatory arthritis attempted to overcome this problem by introducing a 'rebound' effect on discontinuation, which returns the patient either to their starting severity (used in, for example, the Diamantopoulos model [89]) or to a level reflecting some natural, background increase in severity (as used in the BRAM [158]). Although the evidence to support this type of assumption is weak, it is arguably better than the false benefits that models would otherwise generate.
Accumulating short-term benefit can also be problematic where there is an asymmetry in the sequences being compared, for example, in the ‘adding’ decision problem illustrated in Box 2. When a sequence plus a new treatment is compared with the same sequence without the added treatment, a false benefit can be introduced simply because the longer sequence allows more 'short-term' treatment effects.
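The accumulation problem described in the two paragraphs above can be reproduced with a toy simulation. This is a sketch under hypothetical assumptions, not a reconstruction of the Diamantopoulos model or the BRAM. Severity is a single score (starting at 2.0), each treatment lowers it by a fixed initial gain for a fixed number of cycles, and on discontinuation one of three assumed rules applies: no rebound (the gain is kept), rebound to starting severity, or rebound to starting severity plus a linear background progression. Treatments `A`, `B`, `C` and all numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Treatment:
    name: str
    gain: float   # hypothetical initial fall in severity score on starting treatment
    cycles: int   # hypothetical number of model cycles before discontinuation


def simulate(sequence, start=2.0, horizon=20, rule="none", progression=0.0):
    """Per-cycle severity of one simulated patient moving through `sequence`.

    rule="none":     improvement carries over to the next line (no rebound)
    rule="baseline": severity returns to `start` on discontinuation
    rule="natural":  severity returns to start + progression * elapsed cycles,
                     and keeps progressing on supportive care afterwards
    """
    severity, t, trajectory = start, 0, []
    for tx in sequence:
        severity = max(0.0, severity - tx.gain)  # short-term response
        for _ in range(tx.cycles):
            if t == horizon:
                return trajectory
            trajectory.append(severity)
            t += 1
        if rule == "baseline":
            severity = start
        elif rule == "natural":
            severity = start + progression * t
    while t < horizon:  # supportive care after the last active line
        if rule == "natural":
            severity = start + progression * t
        trajectory.append(severity)
        t += 1
    return trajectory


# The 'adding' comparison: the same sequence with and without treatment C.
A, B, C = Treatment("A", 0.5, 3), Treatment("B", 0.5, 3), Treatment("C", 0.5, 3)
for rule in ("none", "baseline"):
    without_c = sum(simulate([A, B], rule=rule))
    with_c = sum(simulate([A, B, C], rule=rule))
    print(rule, "severity-cycles avoided by adding C:", without_c - with_c)
```

With these made-up inputs, adding C appears to avoid 7.0 severity-cycles when there is no rebound, but only 1.5 (C's own three cycles of benefit) when severity rebounds to baseline; the remainder is benefit that carries over after C has stopped.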

Recommendations for Research

An important outcome of the review is the identification of gaps in the research evidence. More research is needed to establish when it is necessary to evaluate treatment sequences, and how best to make this decision. This is likely to be a condition-specific endeavour, but the methods will be relevant across different clinical scenarios. Further research is needed to identify how best to develop a summary treatment effect for whole sequences, or for discrete interventions conditional on their position in the sequence. This requires improved reporting of previous and subsequent treatments in primary studies, including better data on reasons for discontinuing or switching treatment. Access to individual patient-level data is also key here [35, 162]. Real-world, disease-specific data sources can provide essential follow-up data on entire treatment sequences, and could potentially be used to emulate a pragmatic randomised trial of dynamic treatment sequences [27, 163–165]. For these data sources to be useful, treatment sequences need to be considered during their planning and development stages, and the data will also need to go through many high-quality validation studies [164]. The evaluation of whole treatment sequences using real-world data also needs to take into account the potential biases listed in Box 1. Finally, existing research made little reference to the potential, or actual, role of incorporating patient perspectives into the evaluation of treatment sequences. Further work is needed to develop the optimal approach for involving members of the public in HTA of treatment sequences, which should be informed by existing guidance and recent research on patient and public involvement in systematic reviews and economic evaluations [166–171]. As experience-based experts, patients can contribute essential knowledge that is complementary to that of other key stakeholders, such as clinicians and policy makers.
Their involvement, on an equal basis to other stakeholders, is likely to be relevant to all stages of the HTA, including refining the scope and decision problem, the evidence synthesis, evidence interpretation and integration, and dissemination and application [172].
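One way to express a summary effect for discrete interventions conditional on their position in the sequence is a meta-regression in which each study's effect estimate is regressed on covariates describing previous treatment and disease duration. The sketch below is a fixed-effect (inverse-variance weighted least squares) meta-regression, assuming two hypothetical study-level covariates: number of prior treatment lines and mean disease duration in years. The study data are invented and noise-free so that the fitted coefficients can be checked; a real analysis would more usually fit a random-effects model with dedicated software.

```python
import numpy as np


def meta_regression(effects, variances, covariates):
    """Fixed-effect meta-regression by inverse-variance weighted least squares.

    effects:    one effect estimate per study (e.g. log odds ratios)
    variances:  within-study variance of each estimate
    covariates: 2-D array, one row per study, one column per covariate
    Returns (coefficients, standard errors), intercept first.
    """
    y = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(covariates, dtype=float)])
    xtwx = X.T @ (w[:, None] * X)  # X' W X
    beta = np.linalg.solve(xtwx, X.T @ (w * y))
    se = np.sqrt(np.diag(np.linalg.inv(xtwx)))
    return beta, se


# Hypothetical studies: prior lines of therapy and mean disease duration (years).
prior_lines = np.array([0, 0, 1, 1, 2, 2])
duration = np.array([2.0, 5.0, 4.0, 8.0, 6.0, 10.0])
log_or = 1.0 - 0.3 * prior_lines - 0.02 * duration  # noise-free, for checking
variances = np.array([0.04, 0.05, 0.03, 0.06, 0.05, 0.04])

beta, se = meta_regression(log_or, variances, np.column_stack([prior_lines, duration]))
print("intercept, per prior line, per year of duration:", beta.round(3))
```

Because prior lines and disease duration are correlated in these made-up data (as they tend to be in practice), fitting `prior_lines` alone would generally absorb part of the duration effect into the line-of-therapy coefficient.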

Conclusions

The review illuminates a significant gap in methods development. It also demonstrates important limitations in the primary studies, which tended to focus on the evaluation of discrete treatments, with poor reporting of any previous or subsequent treatments. The increasing use of NMA in HTA demonstrates an acknowledgment that clinical and policy decision making should account for the multiple treatments available for many chronic conditions. However, the sequential use of these treatments has yet to be accounted for within clinical evaluations, with most meta-analyses being conducted on discrete treatments that may or may not be stratified by line of therapy. Economic models expose the need to consider treatment sequences, but they are often based on the simplifying assumption of treatment independence. This can lead to misrepresentation of the true level of uncertainty, potential bias in estimating the effectiveness and cost-effectiveness of treatments, and, ultimately, the wrong decision. In summary, there has been no co-ordinated approach to the important issue of evaluating the effectiveness and cost-effectiveness of treatment sequencing. This is a major shortfall at a time when the cohort of people with complex chronic conditions requiring sequential treatments is increasing. The findings of the review will help policy makers and researchers gain traction in answering questions about the effectiveness of different treatment sequences.
Supplementary file 1 (PDF, 1290 kb) is available as electronic supplementary material.

Treatment sequences, where previous treatment and patient characteristics can affect both the choice and effectiveness of subsequent treatments, are increasingly common in chronic conditions and represent complex treatment pathways. Methods for evidence synthesis that produce the least biased estimates of treatment sequencing effects are required to inform reliable clinical and policy decision making.

Randomised controlled trials (RCTs) of treatment sequences are limited; the use of RCTs of discrete treatments may not provide good evidence on treatment sequencing effects, and observational studies are susceptible to confounding and bias.

The inclusion of discrete treatments used at different points in the treatment pathway may bias a network meta-analysis. Meta-regression needs to account for both previous treatment and duration of disease.

Modelling studies of treatment sequences often apply simplifying assumptions due to the absence of sequencing trials. This can lead to misrepresentation of the true level of uncertainty, potential bias in estimating the effectiveness and cost-effectiveness of treatments, and the wrong decision.

References (166 in total)

1. Dynamic treatment regimes for managing chronic health conditions: a statistical perspective.

Authors: Bibhas Chakraborty
Journal: Am J Public Health Date: 2010-11-18

2. Modelling methods for pharmacoeconomics and health technology assessment: an overview and guide.

Authors: James E Stahl
Journal: Pharmacoeconomics Date: 2008

3. Algorithm for neuropathic pain treatment: an evidence based proposal.

Authors: N B Finnerup; M Otto; H J McQuay; T S Jensen; S H Sindrup
Journal: Pain Date: 2005-10-06

4. Cost effectiveness of saxagliptin and metformin versus sulfonylurea and metformin in the treatment of type 2 diabetes mellitus in Germany: a Cardiff diabetes model analysis.

Authors: Wilma Erhardt; Klas Bergenheim; Isabelle Duprat-Lomon; Phil McEwan
Journal: Clin Drug Investig Date: 2012-03-01

5. Economic evaluation of sequential treatments for follicular non-Hodgkin lymphoma.

Authors: Erkki J Soini; Janne A Martikainen; Ville Vihervaara; Kim Mustonen; Tapio Nousiainen
Journal: Clin Ther Date: 2012-03-27

6. Budget impact model of rituximab after failure of one or more TNFalpha inhibitor therapies in the treatment of rheumatoid arthritis.

Authors: Robert Launois; Stéphanie Payet; Nathalie Saidenberg-Kermanac'h; Camille Francesconi; Lionel Riou França; Marie-Christophe Boissier
Journal: Joint Bone Spine Date: 2008-10-31

7. Modelling the cost-effectiveness of etanercept in adults with rheumatoid arthritis in the UK.

Authors: A Brennan; N Bansback; A Reynolds; P Conway
Journal: Rheumatology (Oxford) Date: 2003-07-30

8. Economic evaluation of systemic therapies for moderate to severe psoriasis.

Authors: S Sizto; N Bansback; S R Feldman; M K Willian; A H Anis
Journal: Br J Dermatol Date: 2008-12-15

9. Using observational data to emulate a randomized trial of dynamic treatment-switching strategies: an application to antiretroviral therapy.

Authors: Lauren E Cain; Michael S Saag; Maya Petersen; Margaret T May; Suzanne M Ingle; Roger Logan; James M Robins; Sophie Abgrall; Bryan E Shepherd; Steven G Deeks; M John Gill; Giota Touloumi; Georgia Vourli; François Dabis; Marie-Anne Vandenhende; Peter Reiss; Ard van Sighem; Hasina Samji; Robert S Hogg; Jan Rybniker; Caroline A Sabin; Sophie Jose; Julia Del Amo; Santiago Moreno; Benigno Rodríguez; Alessandro Cozzi-Lepri; Stephen L Boswell; Christoph Stephan; Santiago Pérez-Hoyos; Inma Jarrin; Jodie L Guest; Antonella D'Arminio Monforte; Andrea Antinori; Richard Moore; Colin Nj Campbell; Jordi Casabona; Laurence Meyer; Rémonie Seng; Andrew N Phillips; Heiner C Bucher; Matthias Egger; Michael J Mugavero; Richard Haubrich; Elvin H Geng; Ashley Olson; Joseph J Eron; Sonia Napravnik; Mari M Kitahata; Stephen E Van Rompaey; Ramón Teira; Amy C Justice; Janet P Tate; Dominique Costagliola; Jonathan Ac Sterne; Miguel A Hernán
Journal: Int J Epidemiol Date: 2016-12-01

10. GetReal in mathematical modelling: a review of studies predicting drug effectiveness in the real world.

Authors: Klea Panayidou; Sandro Gsteiger; Matthias Egger; Gablu Kilcher; Máximo Carreras; Orestis Efthimiou; Thomas P A Debray; Sven Trelle; Noemi Hummel
Journal: Res Synth Methods Date: 2016-08-16