
A Promising Approach to Optimizing Sequential Treatment Decisions for Depression: Markov Decision Process.

Fang Li¹, Frederike Jörg²,³, Xinyu Li⁴, Talitha Feenstra⁴,⁵.

Abstract

The most appropriate next step in depression treatment after the initial treatment fails is unclear. This study explores the suitability of the Markov decision process for optimizing sequential treatment decisions for depression. We conducted a formal comparison of a Markov decision process approach and mainstream state-transition models as used in health economic decision analysis to clarify differences in the model structure. We performed two reviews: the first to identify existing applications of the Markov decision process in the field of healthcare and the second to identify existing health economic models for depression. We then illustrated the application of a Markov decision process by reformulating an existing health economic model. This provided input for discussing the suitability of a Markov decision process for solving sequential treatment decisions in depression. The Markov decision process and state-transition models differed in terms of flexibility in modeling actions and rewards. In all, 23 applications of a Markov decision process within the context of somatic disease were included, 16 of which concerned sequential treatment decisions. Most existing health economic models relating to depression have a state-transition structure. The example application replicated the health economic model and enabled additional capacity to make dynamic comparisons of more interventions over time than was possible with traditional state-transition models. Markov decision processes have been successfully applied to address sequential treatment-decision problems, although the results have been published mostly in economics journals that are not related to healthcare. One advantage of a Markov decision process compared with state-transition models is that it allows extended action space: the possibility of making dynamic comparisons of different treatments over time. 
Within the context of depression, although existing state-transition models are too basic to evaluate sequential treatment decisions, the assumptions of a Markov decision process could be satisfied. The Markov decision process could therefore serve as a powerful model for optimizing sequential treatment in depression. This would require a sufficiently elaborate state-transition model at the cohort or patient level.

Year: 2022 PMID: 36100825 PMCID: PMC9550715 DOI: 10.1007/s40273-022-01185-z

Source DB: PubMed Journal: Pharmacoeconomics ISSN: 1170-7690 Impact factor: 4.558

Key Points for Decision Makers

Introduction

Depression is one of the most burdensome and costly of all mental health disorders, with a worldwide average lifetime and 12-month prevalence of 14.6% and 5.5%, respectively [1]. People with depression experience impairment in daily life, resulting in a quality of life that is lower than in the general population [2]. According to WHO projections, depression will rank first in terms of disability-adjusted life-years lost by 2030 [3]. The economic burden of depression is also high, having been estimated at US$326.2 billion for the United States in 2018 (price level 2020) [4]. Depression thus imposes a high burden on society, the healthcare system, and individuals [5]. To reduce this burden and support appropriate treatment selection, increasing attention is being directed to studies comparing different treatments regarding health outcomes and cost effectiveness. Most previous studies have examined only limited numbers of different treatments (e.g., psychotherapies, pharmacotherapy, brain stimulation therapy [6–8], and genetic testing to support targeted therapy [9]), using different health economic (HE) models. While such studies have supported choices between different treatments, they have yielded little insight into treatment duration or sequential treatment choices. To date, no consensus has been reached concerning how long (e.g., days, weeks, months, years) a patient should be treated with a specific treatment for depression [10]. Furthermore, it is unclear how consecutive treatments should be selected when initial treatment is not successful. One widespread approach is stepped care: a gradual increase in the intensity of treatments [11]. More recently, however, scholars have been directing greater attention to matched care [12], which implies that initial and sequential treatment steps are carefully adjusted to the personal characteristics and treatment history of the individual. 
Such adjustments are usually pragmatic and based on general guidelines, although they might also be informed by data-driven optimization. The Markov decision process (MDP) is a mathematical model for sequential decisions and dynamic optimization [13], which generalizes standard Markov models by embedding a sequential decision process into the model and allowing multiple decisions in multiple time periods [14]. To support optimization, MDP models have been applied to address a variety of industrial operation problems, including cost-effective maintenance [15-18], electricity supply [19], and dynamic pricing [20]. Recent studies have demonstrated that MDP has potential to support clinical decision making [14]. Steimle and Denton [21] argue that the MDP model is essential for guiding decision makers in treatment decisions for chronic diseases, as it provides an analytical framework for studying sequential decisions. The framework is very general, however, and not geared toward specific diseases, nor does it contain actual input data. The feasibility of its actual application is therefore unclear. For this reason, two questions are worth exploring. The first concerns the identification of any actual applications of MDP within the field of healthcare, and the second concerns whether MDP could be fruitfully applied to address treatment decision issues in depression. Given the state of knowledge as described above, the primary aim of this article was to examine how MDP has been implemented by reviewing all existing applications of MDP to medical decision making for diseases. It also provides a review of existing HE models of depression and an analysis of the potential of MDP to support sequential treatment decisions in depression, based on the reformulation of an existing HE model of depression and an assessment of the suitability of MDP.

Background

State-transition models (STMs) are structured around a set of mutually exclusive and collectively exhaustive health states, transitions, initial-state vectors, transition probabilities, cycle lengths, and state values (‘rewards’), which conceptualize a decision problem in terms of a set of health (or other) states and transitions among these states [22]. In this background section, the elements of an MDP are defined and compared with their analogues in STMs. The basic definition of an MDP comprises five elements, $(T, S, A, P(\cdot \mid s, a), R(s, a))$, described using a standard notation [23]. To build an MDP model, the decision epochs ($T$), state space ($S$), action space ($A$), transition probabilities ($P(\cdot \mid s, a)$), and rewards ($R(s, a)$) should be defined. All elements of an MDP are listed in Table 1, in comparison with the corresponding elements in a cohort-level STM.

Table 1

Elements of a Markov decision process (MDP) and comparable structures in a cohort-level state-transition model (STM)

MDP element	Definition	Analogous STM component
Decision epoch	The time at which decisions are made	Cycle time (decisions usually made only in Cycle 1/before the start of the model, by defining different scenarios)
State space	Set of mutually exclusive, collectively exhaustive conditions that describe the possible state of the model	States
Action space	Set of possible decisions that can be made at each decision epoch	No specific analogy
Transition probabilities	Probability of each possible state of the system in the following period (conditional on decision and current state)	Transition probabilities (conditional on current state and scenario)
Reward function	The immediate benefits of taking a particular decision at each state	Pay-offs: costs and utilities linked to each state
Decision rule	A specified decision for each possible state at a specific epoch	No specific analogy
Policy	A sequence of decision rules for all epochs following the beginning time point	The treatment strategy is always defined a priori

As demonstrated by this comparison, an MDP can be regarded as an extension of an STM. The difference is the addition of actions (e.g., stop treatment, remain on current treatment, change treatment) and rewards, which may depend on these actions, with transition probabilities being conditional on both current state and current action. Conversely, if each state has only one action and if all rewards depend only on the state, the MDP reduces to an STM. Note that most STMs that have been applied to actual HE evaluations deviate in some way from the pure Markov property (e.g., because mortality depends on age or because some pay-offs vary according to both state and model run-time). It stands to reason that no specific corresponding analogy exists for the MDP actions and decision rules, given that STMs are applied in HE evaluations primarily to compare two or more pre-specified strategies or scenarios. In contrast, states (as applied in STMs) are very similar to the states distinguished in MDPs. The decision epochs of an MDP are a set of points at which decisions are made, and they are analogous to the cycle time in standard Markov models. While cohort-level Markov models can thus be extended into an MDP at the aggregate level, it is also possible to define an MDP with parameters that depend on individual characteristics and to define optimal strategies that vary by individual. Such patient-level MDPs could be regarded as an extension of patient-level STMs, sometimes also called microsimulation models, or patient-level Markov models. Finally, MDPs can be defined in continuous rather than discrete time, and with a finite or infinite time horizon [24].
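The reduction of an MDP to an STM described above can be illustrated with a minimal sketch. The three states, the single action name, and all transition probabilities below are hypothetical and are not taken from any model discussed in this article; the sketch only shows that an MDP whose action space contains one action yields the same cohort trace as an STM with the same transition probabilities.

```python
# Hypothetical three-state example; every number here is an illustrative assumption.
STATES = ["healthy", "depressed", "dead"]

# MDP transition probabilities: P[action][state] -> distribution over next states.
P = {
    "usual_care": {
        "healthy":   [0.90, 0.08, 0.02],
        "depressed": [0.30, 0.65, 0.05],
        "dead":      [0.00, 0.00, 1.00],
    },
}


def cohort_trace(transitions, start, n_cycles):
    """Propagate a cohort distribution through a Markov chain (an STM trace)."""
    trace = [list(start)]
    for _ in range(n_cycles):
        current = trace[-1]
        nxt = [0.0] * len(STATES)
        for i, state in enumerate(STATES):
            for j, p in enumerate(transitions[state]):
                nxt[j] += current[i] * p
        trace.append(nxt)
    return trace


def mdp_trace(P, policy, start, n_cycles):
    """Cohort trace under a stationary policy that maps each state to an action."""
    # Fixing a policy turns the MDP into an ordinary Markov chain.
    induced = {s: P[policy[s]][s] for s in STATES}
    return cohort_trace(induced, start, n_cycles)


# With a single available action there is only one possible policy.
only_policy = {s: "usual_care" for s in STATES}
stm = cohort_trace(P["usual_care"], [1.0, 0.0, 0.0], 5)
mdp = mdp_trace(P, only_policy, [1.0, 0.0, 0.0], 5)
```

Adding a second action to `P` would give the policy something to choose between, which is the point at which the MDP becomes more general than the STM.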

Methods

We performed two reviews, the first to identify existing applications of the MDP in treatment of disease and the second to identify existing HE models in depression. Data were extracted from MDP applications to articulate assumptions and requirements for the MDP. We then illustrated the elements of an MDP by reformulating an existing HE model and examining its added value. This served as input for discussing the suitability of an MDP for solving sequential treatment decisions in depression. The methodological framework of the present study is displayed in Fig. 1. The protocols for the two reviews were registered in the Open Science Framework.

Fig. 1

Methodological framework of the present study. MDP Markov decision process, HE health economic

Review of Markov Decision Process (MDP) and Health Economic (HE) Models

The two reviews followed the guidelines for Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) (see Appendix Part 1 in the electronic supplementary material [ESM] for the checklist). The search strings for existing applications of MDP and HE decision models were designed to identify relevant literature (see Appendix Part 2 in the ESM). Web of Science and PubMed were searched in September 2021. An article was eligible for inclusion only if it addressed the treatment of diseases rather than the optimization of hospital operations, surgical techniques, or the application of healthcare devices using MDP. In the review of HE models for depression, publications were eligible for inclusion only if they concerned the economic evaluation of treatments for depression. Both reviews excluded papers published in languages other than English, meeting abstracts, reviews, and publications that were not available in full text. After eliminating duplicates, two reviewers (F.L., X.L.) independently screened titles and abstracts. Disagreements were initially addressed through discussion and consensus. Any remaining disputes between the two reviewers were resolved by appealing to a third author (T.F.). Two authors (F.L., X.L.) abstracted data on general study characteristics using a data extraction form. For the MDP review, data extraction focused on the structure of the MDP in each of the applications to evaluate the assumptions and requirements of MDP. The following elements were extracted: time horizon, disease, state space, action space, reward function, and main perspective. The authors also attempted to extract the requirements and assumptions of MDP when applied in healthcare settings based on the studies identified. Both general and specific assumptions related to specific applications were included. The review of HE models for depression started with the categorization of model structures. 
Given our interest in the structure of STMs and whether this structure could be used as a starting point for MDPs, we retained only models that are structured as a set of health (or other) states and transitions among these states. For these studies, further information on each model was collected. General study characteristics were authors and year, treatment types, and their comparators. Model characteristics were health states, time horizon, cycle length, and aim.

Illustration of Elements of an MDP Using a Reformulation of an Existing HE Model into an MDP

We use a case study to illustrate how a real-world HE decision problem can be reformulated as an MDP. The required model elements (e.g., states, transition probabilities, costs, quality-of-life weights) were first extracted from an existing HE model. We then translated the HE model to an MDP formulation based on the information collected. To investigate the consistency of conclusions between the existing HE model and the MDP approach, we compared the results from the existing HE model to those of the MDP model. Finally, we discussed the potential added value of MDP after reformulation.

Assessment of the Suitability of MDP for Optimizing Sequential Treatment in Depression

After comparing the findings of the two reviews to clarify the assumptions about the use of MDP in depression, we examined whether they might be satisfied. We then discussed how to define MDP structure when used in depression. In this step, we also discussed challenges associated with using MDP to optimize sequential treatment decisions for depression.

Results

Overview of Existing Applications of MDP in Treatment of Disease

All existing applications of MDP to optimize treatment concern somatic disease. As shown in Fig. 2, we selected a total of 23 applications of MDP for inclusion in the review. An overview of the characteristics of these applications is provided in Table 2.

Fig. 2

Flow chart of study selection for MDP applications in the field of healthcare. MDP Markov decision process

Table 2

Summary of MDP model applications in healthcare

Study	Time horizon (decision epoch)	Disease	State space	Action space	Reward function	Aim	Individual level
Eghbali-Zarch et al. (2019) [31]	Finite (annual)	Type 2 diabetes	10 states (defined by a discrete set of clinically relevant ranges of HbA1c levels)	Initial and delayed treatment as actions; (treatments considered: metformin, sulfonylureas, and insulin)	Expected QALYs	Minimize treatment decision-related adverse drug reactions	No
Mason et al. (2012) [33]	Finite (annual)	Type 2 diabetes	4 states (defined by adherence rate)	Initial and delayed treatment as actions (treatments considered: statins, metformin, sulfonylureas, and insulin)	Net monetary benefit	Optimize treatment decision	Yes
Meng et al. (2020) [32]	Finite (every 3 mo)	Type 2 diabetes	10 states (defined by HbA1c levels)	Initial and delayed treatment as actions (treatments considered: metformin, sulfonylureas, and insulin)	Expected QALYs	Optimize treatment decision	Yes
Oh et al. (2021) [34]	Finite (annual)	Type 2 diabetes	72 states (defined by complications of diabetes, risk of diabetes, time elapsed since the occurrence of diabetes, and fasting plasma glucose)	1. Initial monotherapy treatment (metformin, sulfonylurea, or others) 2. Initial dual-therapy (metformin + sulfonylureas, metformin + DPP-4 inhibitors or others) 3. Initial triple therapy (metformin + sulfonylureas + α-glucosidase inhibitor)	Expected discounted QALYs	Optimize treatment decision	No
Shifrin and Siegelmann (2020) [41]	Finite (6 times/d)	Diabetes	Unclear number of states (defined by blood glucose level, carbohydrate intake, and counter of treatment points)	1. Providing insulin boluses based on sensor data 2. Traditional insulin care	Blood glucose level	Optimize treatment decision	Yes
Abdollahian and Das (2015) [39]	Finite (annual)	Breast and ovarian cancer	8492 states (defined by screening status, preventive surgery status, and age)	At ages 30, 40, and 50 y: 1. Do nothing 2. Conduct surgery 3. Start screening At all intermediate ages (31–39, 41–49, and 51–65 y): 1. Do nothing 2. Start screening 3. Stop screening	Costs/QALYs	Optimize treatment decisions	No
Akhavan-Tabatabaei et al. (2017) [30]	Finite (every 6 mo)	Cervical cancer prevention	12 states (defined by patient diagnosis, age, and record of last screening test)	1. Do nothing 2. Pap test 3. Colposcopy without Pap test	Costs	Optimize screening policy	No
Kim et al. (2009) [42]	Finite (user-defined)	Cancer	Unclear number of states (defined by OAR and tumor)	Choose a non-zero dose in each fraction	Patient utility	Optimize the treatment decision	No
Maass and Kim (2020) [43]	Finite (user-defined)	Cancer	11 states (defined by history of treatment, tissue side effect, tumor progression)	1. Treatment modalities with a high risk 2. Treatment modalities with a lower risk 3. No treatment	Function of side effect and tumor progression	Optimize the treatment decision	No
Bazrafshan and Lotfi (2020) [35]	Finite (after every 8 treatment cycles)	Gastroesophageal cancer	3 states (low toxicity, moderate toxicity, high toxicity)	Sequential treatment (5 different types of chemotherapy/5 chemotherapy treatment strategies)	Expected costs	Optimize treatment decision	Yes
Alagoz et al. (2004) [27]	Infinite (unclear)	Severe liver failure (in need of transplantation)	Unclear number of states (defined by the patient’s end-stage liver score)	1. Transplant 2. Wait	QALYs	Optimize the timing of transplantation	No
Alagoz et al. (2007) [28]	Infinite (unclear)	End-stage liver disease	Unclear number of states (defined by the patient’s end-stage liver score)	1. Accepting the cadaveric offer 2. Accepting the living-donor liver 3. Waiting for one more period	QALYs	Optimize the timing of transplantation	No
Liu et al. (2017) [44]	Finite (2-year)	Hepatitis C	8 states (healthy, no fibrosis, portal fibrosis with no septa, portal fibrosis with few septa, numerous septa without cirrhosis, compensated cirrhosis, decompensated cirrhosis, hepatocellular carcinoma, and liver transplantation)	1. Accepting a specific drug treatment 2. Waiting	QALYs	Optimize sequential treatment	Yes
Hauskrecht and Fraser (2000) [36]	Infinite (unclear)	Ischemic heart disease	11 state variables (characterizing multiple combinations of cardiovascular complications and diagnostic test outcomes) Unclear number of states	1. No action 2. Medication treatment 3. Surgical procedure (angioplasty or coronary artery bypass surgery) 4. Investigative procedures (angiogram or stress test)	Costs	Optimize sequential treatment	Yes
Marrero et al. (2021) [45]	Finite (annual)	ASCVD	10 states (healthy, history of CHD but no adverse event, history of stroke but no adverse event, history of CHD and stroke but no adverse event, survived a CHD event, survived a stroke, death from non-cardiovascular disease, death from CHD event, death from stroke and dead)	1. No treatment 2. Moderate-intensity statin drug 3. High-intensity statin drug	QALYs	Optimize treatment plans of genetic testing	Yes
Schell et al. (2016) [40]	Finite (annual)	Hypertension	10 states (healthy, history of CHD, history of stroke, history of CHD and stroke, survived a CHD, survived a stroke, death from non-CVD, death from CHD, death from stroke and dead)	Choose an appropriate medication treatment (from thiazide diuretics, β-blockers, calcium channel blockers, angiotensin-converting enzyme inhibitors, and angiotensin II receptor antagonists)	Discounted QALYs	Optimize sequential treatment	Yes
Choi et al. (2017) [37]	Finite (annual)	Hypertension	7 states (well, adverse event without CVD history, MI, stroke, post CVD, adverse event with CVD history, death)	1. Stop medication treatment 2. Remain on current treatment 3. Change to other medication treatment	Discounted QALYs	Optimize sequential treatment	Yes
Ibrahim et al. (2016) [38]	Finite (every clinical visit)	Stroke prevention in atrial fibrillation	15 states (defined by patient sensitivity and the international normalized ratio of response to warfarin)	Choose an appropriate dosage of warfarin (0 mg; 2.5 mg; 5 mg; 7.5 mg; 10 mg)	Discounted QALYs	Optimize treatment decision	Yes
Tilson and Tilson (2013) [25]	Finite (3 mo)	Aneurysms	20 states (6 defined by recovery from treatment or a SAH event, plus 13 post-recovery states, and death)	1. No treatment 2. Surgery 3. Endovascular repair	Discounted QALYs	Optimize initial treatment selection	No
Wu et al. (2012) [29]	Finite (every hospital stay)	Acute ischemic stroke	30 states (defined by 6 patient characteristics)	1. Using anticoagulant agents 2. Using TCM treatment of replenishing qi and wen yang 3. Using TCM treatments for clearing heat and extinguishing wind 4. Using TCM treatments for relaxing the bowels 5. Using herbal medicine	Neurological functional impairment scores	Compare effectiveness of different treatment combinations	No
Shen et al. (2020) [26]	Finite (every 7 d)	Stroke	236 states (defined by age, disease history, complications, Western medicine diagnosis, and TCM syndrome differentiation)	1. Use rehabilitation therapy 2. Use traditional Chinese medicine decoction 3. Use acupuncture treatment 4. No intervention	Scores for neurological function impairment	Optimize initial treatment selection	No
Escandell-Montero et al. (2014) [47]	Finite (unclear)	Anemia	Unclear number of states (defined by degree of anemia, hemoglobin trend, dose of darbepoetin alfa, patient group)	Choose an appropriate dosage of darbepoetin alfa (0 µg/kg/wk; 0.25 µg/kg/wk; 0.50 µg/kg/wk; 0.70 µg/kg/wk; 1 µg/kg/wk)	Hemoglobin level	Optimize treatment decision	Yes
Suen et al. (2018) [46]	Finite (every mo)	Tuberculosis	7 states (cured of tuberculosis, patients with DS tuberculosis, patients with DR tuberculosis, healthy patients defaulted from treatment, DS patients defaulted from treatment, DR patients defaulted from treatment, death)	1. Drug-sensitivity tests 2. Waiting and observing 3. Waiting and not observing	Net monetary benefit	Optimize treatment decision	No

ASCVD atherosclerotic cardiovascular disease, CHD coronary heart disease, CVD cardiovascular disease, DPP-4 dipeptidyl peptidase 4, DR drug resistant, DS drug sensitivity, ESA erythropoiesis-stimulating agents, MDP Markov decision process, MI myocardial infarction, net monetary benefit monetary value of QALYs minus total cost, OAR organs at risk, QALYs quality-adjusted life-years, SAH subarachnoid hemorrhage, TCM traditional Chinese medicine

Researchers have applied MDP models to optimize initial treatment selection [25, 26] and the timing of transplantation [27, 28], to compare the effectiveness of different combinations of treatment [29], to optimize screening policy [30], and to prevent disease-related complications [31]. However, 16 studies concern the optimization of treatment decisions [32–47]. Five studies use the MDP to optimize treatment decisions for cancer [30, 35, 39, 42, 43], five focus on optimizing the treatment of diabetes mellitus [31–34, 41], and the remaining (N = 13) studies are concerned with liver diseases [27, 28], high blood pressure/hypertension [37, 40], hepatitis C [44], atherosclerotic cardiovascular disease [45], ischemic heart disease [29, 36], atrial fibrillation [38], anemia [47], tuberculosis [46], aneurysms [25], and stroke [26]. The MDP approach has been used to determine the optimal sequence of chemotherapy and radiation therapy [35, 39, 42, 43] and to select the appropriate drugs for anemia [47], tuberculosis [46], atherosclerotic cardiovascular disease [45], and hepatitis C [44]. In studies by Meng et al. [32], Mason et al. [33], and Shifrin and Siegelmann [41], MDP is applied to optimize the management of diabetes medication for glycemic control. An MDP-based treatment recommendation system for diabetes medication steps has also been proposed by Oh et al. [34]. In studies by Choi et al. [37] and Schell et al. 
[40], MDP is used to develop an automated strategy to select suitable anti-hypertensive medications and dosages for patients, thus accounting for their heterogeneity. In contrast, the articles by Ibrahim et al. [38] and Hauskrecht and Fraser [36] are primarily theoretical and do not apply MDP to any actual clinical settings. Of the studies identified, 11 address treatment decisions at the individual level, especially in applications for diabetes and ischemic heart disease [32, 33, 35–38, 40, 41, 44, 45, 47]. They apply risk engines using individual-level covariates (e.g., the Framingham model [48] and the UKPDS risk engine [49]) to calculate transition probabilities between states with different treatments. In the gastro-esophageal cancer treatment application, the transition probability is calculated individually using the expected toxicity level and demographic variables [35]. In contrast, for the hypertension application, the authors examined several individual-level covariates, including 11 variables used as treatment effect modifiers to modify baseline risks [37, 40]. Ibrahim et al. [38] include different transition probabilities when analyzing/optimizing the length of the initiation period of anticoagulation therapy. In all, 20 studies concern MDPs with a finite time horizon, while another three articles involve MDPs with an infinite time horizon [27, 28, 36]. Infinite-horizon MDPs do not require a pre-defined time horizon. For most algorithms to work and result in a well-defined optimal solution, however, these models do require a boundedness condition on the value function. Most studies define discrete states according to clinically relevant variables, with numbers ranging widely from 4 to 8492 states (see Table 2), except for one study [36] that reports 11 state variables rather than listing all the states. 
Nine studies consider three actions [25, 28, 30, 34, 37, 39, 43, 45, 46], while six consider two actions [27, 31–33, 41, 44], four use five actions [29, 35, 38, 40], three use four actions [26, 36, 47], and one does not specify the number of actions [42]. In most cases, larger numbers of actions distinguished are associated with greater complexity in the process of finding an optimal solution. Rewards most frequently consist exclusively of health benefits, with more than half of the studies having optimal treatment outcomes as their objective [25–29, 31, 32, 34, 37, 38, 40–45, 47]. Only three studies focus on minimizing costs [30, 35, 36], and three other studies use the combination of treatment outcomes and costs (or net benefits) as the reward function [33, 39, 46].
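The boundedness condition noted above for infinite-horizon MDPs can be made concrete for one common special case, the discounted MDP with bounded rewards. The notation here ($v^{\pi}$ for the value of a policy $\pi$, $\gamma$ for the discount factor, $r_{\max}$ for a bound on the per-period reward) is assumed for illustration and is not drawn from the reviewed studies, which may rely on other conditions.

```latex
\[
\left| v^{\pi}(s) \right|
= \left| \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 = s \right] \right|
\le \sum_{t=0}^{\infty} \gamma^{t} r_{\max}
= \frac{r_{\max}}{1 - \gamma},
\qquad 0 \le \gamma < 1,\quad |r_t| \le r_{\max}.
\]
```

For instance, with $\gamma = 0.95$ the value of any policy is bounded by $20\, r_{\max}$, so the optimization is well defined even though no time horizon is specified.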

Assumptions and Requirements of MDP

An MDP model explicates a stochastic control process and formally consists of four essential elements: states, actions, transition probabilities, and rewards. Three assumptions are common to all studies in clinical settings: (i) both the state space and the action space are finite sets; (ii) an absorbing state is included in the Markov process, either death or severe functional impairment, which is essential for any finite-horizon MDP to obtain an optimal solution; (iii) MDP states are observable and mutually exclusive. Several authors make additional assumptions based on the characteristics of specific research questions. For example, Alagoz et al. [28] assume that the reward function is positive and non-increasing in a particular state after a cadaveric liver transplant action. This implies that the intermediate reward does not increase as the patient deteriorates. For diabetes, Eghbali-Zarch et al. [31] assume that the treatment decision of insulin is irreversible (implying that, once patients initiate insulin, they remain on it until the end of the time horizon), thus avoiding optimal strategies that would not correspond to clinical practice. Similarly, to mimic current clinical practice, Choi et al. [37] exclude a dosage decrease in the action space. In the study by Kim et al. [42], a non-zero dose is assumed in each treatment. In this sense, additional assumptions could be added to the model to accommodate current treatment practice or to avoid clinically unrealistic or unacceptable solutions.
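One way such practice-driven assumptions can enter a model is through a state-dependent action set. The sketch below encodes an irreversibility rule of the kind described above for insulin; the state layout, the action names, and both helper functions are hypothetical and do not reproduce the formulation of any cited study.

```python
# Hypothetical encoding; action and state names are illustrative assumptions.
ACTIONS = ["wait", "start_metformin", "start_insulin"]


def feasible_actions(state):
    """Return the actions allowed in `state`, a (hba1c_band, on_insulin) tuple."""
    _, on_insulin = state
    if on_insulin:
        # Irreversibility assumption: once insulin is initiated, it is continued.
        return ["continue_insulin"]
    return list(ACTIONS)


def update_flag(state, action):
    """Carry treatment history in the state so that the process stays Markov."""
    band, on_insulin = state
    return (band, on_insulin or action == "start_insulin")
```

Because the rule depends on treatment history, the history flag has to be part of the state; otherwise the restriction could not be expressed without violating the Markov property.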

Overview of Existing HE Decision Models for Depression

In all, we identified 63 existing HE decision models in the review of existing HE models, more than half of which are STMs (Appendix Fig. S1 in ESM). The number of model states distinguished varies from three to eight (Appendix Table S1 in ESM). In 21 studies, states are defined by disease severity in terms of clinically relevant criteria (e.g., symptom scores for depression). Only five studies have a lifetime time horizon [50-54]. In the remaining studies, except for one study with a very short time horizon (3 months) [55] and one with a relatively long time horizon (11 years) [56], the time horizon varies between 1 and 5 years [6–9, 57–77]. The models focus predominantly on five categories of interventions (Appendix Fig. S2 in ESM). In all, 16 studies use a healthcare perspective [7, 8, 50–52, 63, 65–68, 70–73, 75, 76], while nine adopt a societal perspective [6, 9, 53, 55, 56, 59, 60, 64, 77]. Only two studies use the payer perspective [57, 62], and the rest present results for both the healthcare and societal perspectives [54, 58, 61, 69, 74]. We applied the MDP to reproduce the research carried out by Ssegonja et al. [74]. This study was chosen for three reasons. First, it involves a relatively small number of states, such that it is easy for readers to understand and suitable for use as an example. Second, the study reports all model parameters clearly, providing a basis for reformulating it into MDP. Finally, the model structure of a pure STM (rather than a combination of Markov and decision tree) facilitates reformulation. The study by Ssegonja et al. [74] uses a cost-effectiveness analysis at the cohort level to compare a group-based cognitive behavior therapy (GB-CBT) preventive intervention for depression with a non-intervention option in Sweden for adolescents, using an STM. The transition from subthreshold depression to depression and from subthreshold to healthy was affected by GB-CBT, as illustrated in Fig. 3.

Fig. 3

Simplification of model structure in the original paper [74]

Fig. 4 displays the process of reformulating this existing study as an MDP model designed to explore the best decision between treating adolescents with GB-CBT and leaving them untreated. The possible decisions are represented by the actions (treating with GB-CBT or leaving untreated).

Fig. 4

Process of the Markov decision process (MDP) model based on the original model by Ssegonja et al. [74]. Note: s denotes the current state; s′ denotes the next state; r_t denotes the reward at time t; γ is the discount factor. The reward combines the monetary value of quality-adjusted life-years and the total cost in state s at time t under the decision taken. v(s) denotes the state value function, which is the expected monetary return starting from state s; q(s, a) indicates the expected monetary return starting from state s, taking action a at time t; v*(s) indicates the optimal value function over all decisions in state s; q*(s, a) is the optimal value function for action a in state s; t is measured in years

According to the original study, the discount factor was 0.97. The Bellman optimality equation was used to find the solution [78]. Based on the optimal state value function at the following decision epoch, the optimal action-value function was calculated, as shown in Fig. 4. The model was coded in Python 3.3.8 using the MDP toolbox [79]. In keeping with the uncertainty analysis in the original study, we also considered different willingness-to-pay (WTP) thresholds. The values for each state are presented in Table 3, along with different WTP thresholds. Note that, for this simple example, the optimization reduces to deciding whether GB-CBT should be implemented in the first epoch, given that the action space for each decision epoch except the first is confined to a single action.
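The backward recursion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it uses plain Python rather than the MDP toolbox, a reduced three-state space, and placeholder transition probabilities and rewards that are not the parameters of Ssegonja et al. [74]. Only the discount factor of 0.97 is taken from the text. Unlike the example in the paper, the sketch leaves both actions available at every epoch.

```python
# Finite-horizon MDP solved by backward induction (Bellman optimality equation).
# ASSUMPTION: every probability and reward below is a made-up placeholder;
# only GAMMA = 0.97 comes from the article.

GAMMA = 0.97
STATES = ["healthy", "subthreshold", "depression"]
ACTIONS = ["no_intervention", "intervention"]

# P[action][state] -> {next_state: probability} (hypothetical values)
P = {
    "no_intervention": {
        "healthy": {"healthy": 0.90, "subthreshold": 0.10},
        "subthreshold": {"healthy": 0.20, "subthreshold": 0.60, "depression": 0.20},
        "depression": {"subthreshold": 0.30, "depression": 0.70},
    },
    "intervention": {
        "healthy": {"healthy": 0.90, "subthreshold": 0.10},
        "subthreshold": {"healthy": 0.30, "subthreshold": 0.60, "depression": 0.10},
        "depression": {"subthreshold": 0.30, "depression": 0.70},
    },
}

# R[action][state]: one-period reward in US$1000 (hypothetical values)
R = {
    "no_intervention": {"healthy": 18.0, "subthreshold": 15.0, "depression": 5.0},
    "intervention": {"healthy": 18.0, "subthreshold": 14.0, "depression": 5.0},
}


def backward_induction(horizon):
    """Return (v*, policy per epoch, q* at the first epoch) for `horizon` epochs."""
    v = {s: 0.0 for s in STATES}  # value after the final epoch
    policy = []
    q = {}
    for _ in range(horizon):
        # q*(s, a) = r(s, a) + gamma * sum over s' of p(s' | s, a) * v*(s')
        q = {
            s: {
                a: R[a][s] + GAMMA * sum(p * v[s2] for s2, p in P[a][s].items())
                for a in ACTIONS
            }
            for s in STATES
        }
        # Prepend so that policy[0] is the decision rule of the first epoch.
        policy.insert(0, {s: max(ACTIONS, key=lambda a: q[s][a]) for s in STATES})
        # v*(s) = max over a of q*(s, a)
        v = {s: max(q[s].values()) for s in STATES}
    return v, policy, q


if __name__ == "__main__":
    values, rules, q_first = backward_induction(horizon=2)
    print(rules[0]["subthreshold"], q_first["subthreshold"])
```

With these placeholder numbers and a two-epoch horizon, the intervention is preferred in the subthreshold state at the first epoch but not at the last one, where its cost can no longer be recouped through future transitions.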

Table 3

Value of different states with different willingness-to-pay thresholds

All values are in US$1000; the q* values refer to the subthreshold depression state.
WTP threshold (US$/QALY)	q*(subthreshold, intervention)	q*(subthreshold, no intervention)	v*(healthy)	v*(subthreshold depression)	v*(depression)	v*(recovered)	v*(remission)
20,000	134	131	137	122	−52	−12	32
60,000	440	435	430	407	160	216	280
100,000	747	738	722	692	372	445	528

At the WTP threshold of US$20,000/QALY, q*(subthreshold, intervention) was US$134,000 at t = 1, and q*(subthreshold, no intervention) was US$131,000. The optimal action value when choosing to implement GB-CBT is therefore higher than for the alternative strategy, and the former is thus optimal. This means that choosing the intervention brings a net profit. We therefore conclude that adolescents can benefit from the GB-CBT preventive intervention and that it can also generate good value for money, as compared with leaving adolescents with subthreshold depression untreated. This conclusion is consistent with Ssegonja et al. [74].

In contrast to the original HE model, the MDP structure allows for more flexibility. We could now extend the action space and consider other strategies (e.g., starting the preventive treatment after a person has been in the subthreshold state for one period). This could be achieved by dividing the time horizon into finer decision epochs, which would allow interventions to be performed at more appropriate times, as well as by increasing the number of actions, making it possible to compare multiple preventive treatments simultaneously. In addition, the comparison between different strategies is based on the reward function, and it might therefore be relatively easy to vary the weight assigned to health outcomes or costs to investigate the impact on the optimal decision.
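The comparison across WTP thresholds can be reproduced directly from the q* values in Table 3. The short Python sketch below does so and also shows one way the reward weighting mentioned above might be exposed. The function names and the `health_weight` parameter are hypothetical, and the net-monetary-benefit form of the reward (WTP times QALYs minus cost) is an assumption based on the description of the reward in Fig. 4.

```python
# q* values (US$1000) for the subthreshold state at t = 1, copied from Table 3.
Q_STAR = {
    20_000: {"intervention": 134, "no intervention": 131},
    60_000: {"intervention": 440, "no intervention": 435},
    100_000: {"intervention": 747, "no intervention": 738},
}


def net_monetary_reward(qalys, cost, wtp, health_weight=1.0):
    """ASSUMED reward form: monetary value of QALYs minus cost.

    `health_weight` is a hypothetical knob for varying the weight
    given to health outcomes relative to costs.
    """
    return health_weight * wtp * qalys - cost


def best_action(q_values):
    """Return the action with the highest optimal action value."""
    return max(q_values, key=q_values.get)


def incremental_value(q_values):
    """Gain (US$1000) from intervening rather than not intervening."""
    return q_values["intervention"] - q_values["no intervention"]


if __name__ == "__main__":
    for wtp, q_values in Q_STAR.items():
        print(wtp, best_action(q_values), incremental_value(q_values))
```

In this sketch the intervention is the preferred action at all three thresholds, and the gap between the two actions widens as the WTP threshold rises.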

Assessment of the Suitability of MDP for Solving Sequential Treatment Decisions in Depression

The Markov property is a precondition for any MDP. To assess the suitability of MDP for depression, it is important to recognize two further assumptions of an MDP. First, the state space and the action space are finite. A state explosion might occur, especially in a state-transition system with many processes or a complex data structure. This means that an infinite number of states could trap the model in an endless loop, causing it to fail in finding the optimal solution. All existing HE models for depression consist of a finite number of states (varying from three to eight), indicating that the application of MDP to optimizing sequential treatment decisions for depression would probably not result in a state explosion problem. The second assumption is that MDP states are observable, which essentially corresponds to the situation in which we know with certainty the disease state of the patient at all epochs.

As with other diseases, the five core elements of an MDP for depression are decision epoch, state space, action space, reward, and transition probabilities. The decision epoch of the MDP structure could be the beginning of each treatment cycle, with a decision made at every clinical visit. In practice, this would depend on the frequency of visits. The MDP states could be defined by depression severity. Depression differs from many somatic illnesses, in which states are distinguished according to clinical parameters (e.g., blood glucose level in diabetes mellitus). Such clinical parameters are not easily defined for depression. As illustrated by the review of HE models, the states in most studies concerning depression are defined by disease severity in terms of clinically relevant criteria (e.g., symptom scores for depression). Regarding the action space, depression interventions can largely be divided into two categories: psychotherapies and medications. The action/treatment choice for a patient at a specific point in time could thus be simplified to no intervention, psychotherapy, medication use, or both. In reality, however, many different medications and psychotherapies might be distinguished, and different intensities (dosages and hours of therapy per unit of time) and combinations could be considered. Finally, QALYs, costs, or their combination could serve as a reward, depending on the objective of the decision maker.

The heterogeneity of patients with depression could also be integrated into the MDP, allowing for individuals experiencing different trajectories. In theory, therefore, it would be feasible to use MDP to optimize the sequential treatment decision at the individual level. For depression, individual-level covariates (including age, gender, baseline symptomatology, educational level, or socio-economic position) could be used to calculate different transition probabilities between states under a specific treatment. This would nevertheless require sufficient data on how these covariates affect transition probabilities.

Although MDP appears suitable for supporting sequential treatment decisions for depression, several issues continue to require careful consideration. These include (i) how many states to distinguish and how to define them based on severity; (ii) how to decide on the proper granularity of the treatment choices and decision epochs considered; (iii) which individual characteristics are important to include when optimizing at the individual level; and (iv) how to achieve a balance between the level of detail in treatment specification and the feasibility of optimization.
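The five core elements listed above map naturally onto a small data structure. The Python sketch below is one possible layout, not a specification from the study: the class name, the three severity states, and all probabilities and rewards are hypothetical placeholders. Only the four simplified actions (no intervention, psychotherapy, medication, both) are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepressionMDP:
    """Container for the five core MDP elements (hypothetical layout)."""
    epochs: int          # number of decision epochs, e.g. one per clinical visit
    states: tuple        # severity-based states, least to most severe
    actions: tuple       # treatment choices
    transitions: dict    # (state, action) -> {next_state: probability}
    rewards: dict        # (state, action) -> reward, e.g. monetized QALYs minus cost

    def validate(self):
        """Check that every (state, action) pair has a reward and a proper distribution."""
        for s in self.states:
            for a in self.actions:
                row = self.transitions[(s, a)]
                if set(row) - set(self.states):
                    raise ValueError(f"unknown next state for {(s, a)}")
                if abs(sum(row.values()) - 1.0) > 1e-9:
                    raise ValueError(f"probabilities for {(s, a)} do not sum to 1")
                if (s, a) not in self.rewards:
                    raise ValueError(f"missing reward for {(s, a)}")


def build_transitions(states, improve, worsen):
    """Build a simple one-step-up / one-step-down transition table.

    ASSUMPTION: `improve` maps each action to a made-up probability of moving
    to the next less severe state; `worsen` is a made-up probability of moving
    to the next more severe state. Covariate-specific tables could be produced
    by calling this function with different inputs per patient subgroup.
    """
    table = {}
    for i, s in enumerate(states):
        for a, p_up in improve.items():
            p_better = p_up if i > 0 else 0.0
            p_worse = worsen if i < len(states) - 1 else 0.0
            row = {s: 1.0 - p_better - p_worse}
            if p_better:
                row[states[i - 1]] = p_better
            if p_worse:
                row[states[i + 1]] = p_worse
            table[(s, a)] = row
    return table


STATES = ("mild", "moderate", "severe")  # hypothetical severity states
IMPROVE = {"no intervention": 0.10, "psychotherapy": 0.25,
           "medication": 0.25, "both": 0.35}  # placeholder probabilities
ACTIONS = tuple(IMPROVE)

mdp = DepressionMDP(
    epochs=12,
    states=STATES,
    actions=ACTIONS,
    transitions=build_transitions(STATES, IMPROVE, worsen=0.15),
    rewards={(s, a): 0.0 for s in STATES for a in ACTIONS},  # placeholder rewards
)
mdp.validate()
```

Keeping the transition table keyed by (state, action) makes the granularity questions raised above explicit: adding states, actions, or covariate-specific tables enlarges this dictionary and therefore the data needed to fill it.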

Discussion

Markov decision processes can be regarded as an extension of the state-transition model, which is the most frequently applied model structure in health economic evaluations. The STM structure is based on the Markov chain, which is also the underlying structure in MDPs. In contrast to STMs, however, MDPs include actions and rewards, thereby allowing greater flexibility in defining treatment strategies and enhancing the optimization of these strategies. For optimizing sequential treatment decisions in depression, the MDP structure is relevant and worth pursuing further.

The current study identifies 23 applications of MDP in healthcare, 16 of which use MDP to solve sequential treatment decisions in somatic disease. This demonstrates how MDP has been used to address treatment issues related to somatic disease. In addition, the reformulation of the existing HE model provides insight into how MDP can be applied to depression and shows the added value of MDP: its capacity to make dynamic comparisons of more interventions over time than a traditional STM.

Our study is subject to several limitations. First, we merely analyze the potential use of MDP for depression in theory. In real-world practical settings, the sequential treatment decision problem might be more complex. Second, we do not assess the quality of each paper, as our main aim is to explore a model for optimizing treatment decisions for depression, rather than to analyze the existing publications systematically. Moreover, our search was limited to publications written in English. While we are relatively confident that we identified most existing HE models for depression, we are less certain about our coverage of MDP applications in healthcare, as there is a long list of journals in which such applications could potentially be published. Furthermore, the MDP structure is difficult to identify when it is not adequately described or when it is included as a component of a hybrid model. Third, our review of HE decision models is relatively brief and focused only on aspects that are relevant to the aims of our study. For a complete overview of existing models and their characteristics, other more extensive reviews are available [80, 81].

Sequential decision making in depression treatment is a difficult problem that has given rise to a large volume of research. While some trials have investigated the appropriate type of treatment for patients with depression [82, 83], optimization through a formal simulation modeling approach for depression has yet to be conducted. The repeated choice of optimal sequential treatment decisions (e.g., remain with the current intervention, change to another intervention, or stop treatment) could also help to identify the best treatment duration, based on individual characteristics and a predefined objective.

Recently, a new methodological framework known as whole disease modeling (WDM) has attracted attention. This framework is characterized by its ability to reflect decisions occurring at multiple points within the entire clinical trajectory of a disease. As with MDP, it aims to support decision making throughout the clinical trajectory. In contrast, however, WDM emphasizes macro-level HE evaluation, considering all relevant aspects of the disease and its treatment from the preclinical phase until death at the system level (e.g., of a national healthcare system). Like MDP, its decision node is transferable across the entire process, as opposed to the single decision node in conventional HE models. At the same time, however, MDP is suitable for supporting a sequence of treatment decisions that aims at optimal clinical treatment at the individual level, whereas WDM would not usually allow treatment decisions to be changed based on patient characteristics within a short period. More specifically, the scope of a WDM is usually wider, while its depth is lower.

The current study provides a review of MDP applications within the field of healthcare and demonstrates that the MDP has the potential to steer the optimization of sequential treatment to aid personalized treatment decisions in the treatment of depression. This could potentially inspire healthcare decision makers, modelers, and the research community with regard to optimizing the allocation of healthcare resources.

Conclusion

The MDP has been successfully used to address healthcare decision-making problems, especially those involving sequential treatment decisions. For depression, existing STMs have the potential to fit into the MDP approach, thereby laying a solid foundation for developing an MDP for depression. This approach might be better than an STM at depicting continuous treatment decision making. In addition to supporting clinicians by offering an optimal sequential treatment plan over time, this model also provides information about the best timing for starting and ending treatment for heterogeneous patient groups. This matters because, in current practice, clinicians lack decision rules on what to do for each patient, when, and in which order. We conclude that the MDP is a potentially powerful model for optimizing sequential treatment in depression and for finding the optimal treatment duration for individuals.

This article demonstrates that the Markov decision process (MDP) has the potential to steer the optimization of sequential treatment to facilitate personalized treatment decisions.

This article specifically identifies applications of the MDP that have been used to address sequential decision problems in somatic diseases. The results indicate that the MDP could potentially be useful for addressing sequential decision making in depression.

Our study reveals that, although the structure of the state-transition model could potentially be suitable for extension into the MDP model, doing so would require a sufficiently extensive model.
