
A neural network approach to optimising treatments for depression using data from specialist and community psychiatric services in Australia, New Zealand and Japan.

Aidan Cousins¹, Lucas Nakano², Emma Schofield¹, Rasa Kabaila³.

Abstract

This study investigated the application of a recurrent neural network for optimising pharmacological treatment for depression. A clinical dataset of 458 participants from specialist and community psychiatric services in Australia, New Zealand and Japan was extracted from an existing custom-built, web-based tool called Psynary. These data, which included baseline and self-completed reviews, were used to train and refine a novel algorithm in which a fully connected network feature extractor and a long short-term memory network were first trained in isolation and then integrated and annealed using low learning rates because of the low dimensionality of the data. The accuracy of predicting depression remission before processing patient review data was 49.8%. After processing only 2 reviews, the accuracy was 76.5%. When considering a change in medication, the precision of changing medications was 97.4% and the recall was 71.4%. The medication classes with the best predicted results were antipsychotics (88%) and selective serotonin reuptake inhibitors (87.9%). This is the first study that has created an all-in-one algorithm for optimising treatments for all subtypes of depression. Reducing treatment optimisation time for patients suffering with depression may lead to earlier remission and hence reduce the high levels of disability associated with the condition. Furthermore, in a setting where mental health conditions are placing increasing strain on mental health services, the use of web-based tools for remote monitoring and of machine/deep learning algorithms may assist clinicians in both specialist and primary care in extending specialist mental healthcare to a larger patient community.

Entities: Chemical

Keywords: Depression; LSTM; Machine learning; Mental health; Treatment optimisation

Year: 2022 PMID: 35039718 PMCID: PMC8754538 DOI: 10.1007/s00521-021-06710-3

Source DB: PubMed Journal: Neural Comput Appl ISSN: 0941-0643 Impact factor: 5.606

Introduction

Depression

According to the International Statistical Classification of Disease Revision 10 (ICD-10), depression is categorised under mood (affective) disorders where the central disturbance is a change in mood to depression or elation [1]. Depressive episodes occur when the patient suffers from decreased mood, energy, activity, self-esteem and self-confidence [1]. Patients exhibit lesser capacity for enjoyment and decreased interest in their avocations while also exhibiting ideas of guilt or worthlessness [1]. There are several subtypes of depression including major depressive disorder, bipolar disorder and melancholic depression [1]. Major depressive disorder (MDD) is described as a short- or long-term impairment causing significant reductions in quality of life and psychosocial functioning by affecting areas of life including mood, affect, motivation and cognition [2]. According to the ICD-10, bipolar disorder is characterised by two or more episodes where the patient’s mood and activity levels are significantly altered resulting in occasions of elevation of mood and increased energy or decreased energy and activity [1]. Hypomania is a period of persistent mild elevation of mood, increased energy and activity, while mania is mood elevated higher than what would be expected in the patient’s circumstances [1]. Melancholic depression has been a contentious issue within psychiatric circles, with some viewing the disorder as a dimension of a severe expression of clinical depression (“melancholia”), while others believe it is a separate type of depressive disorder [3-5]. Regardless of its classification, melancholic depression is a severe form of depression with prominent neurovegetative symptoms [3]. Depression is a major public health issue and cause of disability [6]. An analysis of data gathered in the Global Burden of Disease study (held from 1990 to 2017) showed that the incidence of depression had increased worldwide from 172 to 258 million or by 49.86% [7]. 
The study further analysed age-standardised incidence rates (ASR) and estimated annual percentage changes (EAPC) of the 195 countries included in the population study. ASR was found to be significantly increased in 29 countries, slightly increased in 132 countries, slightly decreased in 9 countries and significantly decreased in 9 countries [7]. Interestingly, the number of people with depression increased in all five socio-demographic index (SDI) levels (low, low-middle, middle, high-middle and high); however, the ASR only increased in the high-SDI region [7]. Geographically, the number of people with depression increased in all geographical locations. 93.7% of patients with depression in 2017 were found to have MDD [7]. A systematic review conducted on the same Global Burden of Disease data by Ferrari et al. showed that depressive disorders were the second leading cause of years lived with a disability (YLDs) in 2010. MDD accounted for 8.2% of global YLDs [6]. Depressive disorders were also a leading cause of disability-adjusted life-years (DALYs) with MDD accounting for 2.5% of global DALYs. Furthermore, MDD was found to be the cause of 16 million suicide DALYs. In Australia, the National Survey of Mental Health and Wellbeing completed in 2007 estimated that 45% of the Australian population aged between 16 and 85 years would experience a mental disorder during their lifetime and 1 in 5 had experienced a mental disorder in the last year [8]. Since 1998, there have been many attempts to quantify the prevalence of depression with a 2019 study showing depressive symptoms in 7.4% and 13.2% of Australian males and females, respectively [9-12]. 
The economic impact of mental illness in Australia was estimated to be $10.6 billion AUD in the 2018–2019 financial year, with studies projecting that health service costs will increase by 45% between 2006 and 2026, and that the cumulative cost of mental illness over the next 30 years will exceed $2.63 trillion AUD [8, 13–15]. In New Zealand, the 2018 New Zealand Mental Health Monitor (NZMHM) provides one of the most recent reviews of the mental health of the New Zealand general population [16]. The NZMHM found that 32% of New Zealanders had an experience with mental distress and an additional 32% of New Zealanders lived with someone with a lifetime experience of mental distress [16]. Furthermore, 49% of the population were aware of a close friend who experiences mental distress [16]. During the COVID-19 pandemic in 2021, Gasteiger et al. completed a cross-sectional study with a cohort of 681 adults older than 18 in New Zealand [17]. While the sample was 89% female and older than the general population (median age 40 and 37.4 years, respectively), they found that 64% of participants reported symptoms of depression, 53% reported symptoms of anxiety, 31% reported moderate-to-severe symptoms of depression, and 24% reported moderate-to-severe symptoms of anxiety [17]. Although outdated, the 2005 Depression Service Plan released by the Midcentral District Health Board in New Zealand estimated the cost of depression in New Zealand at $750 million per year [18]. More recent estimates are derived by adjusting Australian figures for population. Similar to Australia and New Zealand, Japan has a high mental health burden [19, 20]. Community-based mental health surveys play an important role in estimating the prevalence of mental disorders because most people do not seek treatment even if they experience psychological distress equivalent to diagnosable mental disorders [20]. 
The second World Mental Health Japan Survey was conducted from 2013 to 2015 with a sample of 2450 randomly selected residents between the ages of 20 and 75 [20, 21]. The survey found a lifetime prevalence of any mood disorder of 4.57% and 7.21% for men and women, respectively [20, 21]. It also found a 12-month prevalence of any mood disorder of 2.24% and 3.26% for men and women, respectively [20, 21]. Other studies have estimated the economic impact of depression to be between 1.29 trillion and 3 trillion yen (or $16.46 billion AUD to $38 billion AUD) yearly [22, 23].

Diagnosis, treatment and management

Currently, there are no biological markers for depression; therefore, diagnosis is based only on symptomatology [24]. A wide range of screening tests that vary in length, style, administration and psychometric evaluation are currently used [25]. The Sequenced Treatment Alternatives to Relieve Depression (STAR*D) Report, released in 2004, was an important milestone in optimising management of patients with depression [26, 27]. Four thousand patients from 41 different primary and psychiatric care sites were evaluated using a novel 4-level treatment paradigm. The remission rate, measured using the Quick Inventory of Depressive Symptomatology, was 32.9%, 30.6%, 13.6% and 14.7% for levels 1, 2, 3 and 4, respectively [27]. Although a cumulative remission rate of 65.8% was achieved, crucially, 34.2% did not improve with medical interventions for depression [27]. For non-remission patients, it is critical to identify the earliest point at which to stop further medication trials because the longer the time to remission, the less chance a patient has of reaching remission [27]. Optimising an individual patient’s treatment plan so that they have the highest likelihood of remission is a needed advancement in psychiatry.

Psynary

Psynary (www.psynary.com) is a web-based tool used to support clinicians, organisations and patients in the diagnosis and treatment of mood and anxiety disorders in accordance with the ICD-10 [28]. The Psynary system was designed to collect de-identified data in a format that could be represented numerically with no requirement for language processing. It is anonymous, which avoids privacy and data protection issues that can be associated with online platforms [29]. The Optimisation of Treatment for Mood and Anxiety Disorders Study 1 (OptiMA1) involved two parallel studies in New Zealand and Japan for the purpose of validating key outcome measures developed for the Psynary platform [28]. These outcome measures included the R8 Depression score and R8 Anxiety score [28]. The R8 Depression score was designed to fully capture the wide range of symptom domains seen in depression, across the full range of illness severity, and to be sensitive to treatment effects [28]. Participants (n = 270) were recruited from patients registered on Psynary by the public community mental health clinic at Nelson Marlborough District Health Board in New Zealand (n = 62) and by the private clinic serving the English-speaking population at the American Clinic Tokyo in Japan (n = 208) [28]. Patients with probable mood or anxiety disorders who registered to Psynary between 24th March 2016 and 25th October 2018 were invited to complete either an online or written consent process prior to participating in the study [28]. Inclusion criteria included completing Psynary in the English language, being over 18 years of age for NZ, or 20 years of age for Japan, and having an ICD-10 diagnosis of a current depressive episode (unipolar or bipolar) or anxiety disorder (ICD-10 F31.3, F31.4, F31.81, F32.1, F32.2, F33.1, F33.2, F40-F43) confirmed by the treating clinician at their initial appointment [28]. 
An early analysis of the cohort (n = 131) suggested a similar doubling of remission rates in response to treatment optimisation, but over a shorter 90-day time period compared to the STAR*D trial (Fig. 1) [29]. This validation study found that if patients had less than a 20% reduction in R8 Depression score after 6 days of treatment, the negative predictive value for non-remission was 98% [29] (Table 1).

Fig. 1

OptiMA1 Data vs STAR*D Trial Remission Percentages over a 48-week period [29]. The OptiMA1 study results are shown in colour, while the STAR*D trial is shown in grey (Colour figure online)

Table 1

Day 4–6 R8 Depression percentage reduction [29]

Group	R8 reduction < 20%	R8 reduction > 20%	Total (n)
Total (n)	101	64	165
Non-remission (n, %)	99 (98%)	38 (59%)	137 (83%)
Remission (n, %)	2 (2%)	26 (41%)	28 (17%)
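
The 98% figure quoted for early non-responders can be reproduced directly from the counts in Table 1. The short sketch below is illustrative only: the dictionary layout and function name are our own and are not part of Psynary, and an R8 reduction below 20% is treated as the signal that predicts non-remission.

```python
# Counts copied from Table 1 (day 4-6 R8 Depression percentage reduction).
# The dictionary layout and names are illustrative, not taken from the study code.
table_1 = {
    "reduction_lt_20": {"non_remission": 99, "remission": 2},
    "reduction_gt_20": {"non_remission": 38, "remission": 26},
}


def share_non_remission(group):
    """Fraction of a group that did not reach remission."""
    total = group["non_remission"] + group["remission"]
    return group["non_remission"] / total


# Patients with < 20% early reduction: how often does non-remission follow?
early_non_responders = share_non_remission(table_1["reduction_lt_20"])
# Patients with > 20% early reduction, for comparison.
early_responders = share_non_remission(table_1["reduction_gt_20"])

print(f"{early_non_responders:.0%} vs {early_responders:.0%}")
```

The two proportions correspond to the 98% and 59% entries in the non-remission row of the table.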

From OptiMA1, Psynary appears to be beneficial in monitoring response to treatment and guiding the timing of medication optimisation [28]. These findings suggested that the Psynary database could potentially be used to develop predictive models of response to treatment and to guide treatment selection [28]. Since then, OptiMA2 has been conducted, a qualitative study to establish a Nurse Practitioner-Psynary-assisted care pathway in Port Macquarie, New South Wales, Australia. This is now being followed up by OptiMA3, which will collect naturalistic clinical outcomes from that pathway. Both of these studies also include ethics approval to analyse the naturalistic clinical outcomes from all participants. These data have been included in the current study to examine the feasibility of applying machine learning approaches to the Psynary database to develop predictive algorithms that guide treatment selection.

Applying predictive tools in mental health

Unlike other medical specialties that heavily rely on quantitative biomarkers to assist in the diagnosis of diseases, treatment planning and measurement of outcomes, mental health still predominantly relies on clinical measures. Mental health team members use patient interviews, questionnaires and patient reports to evaluate signs and symptoms [30, 31]. The experience and subjectivity of the clinician are heavily leveraged to make inferences about symptomatology from this data, which is difficult to imitate using supervised deep learning (DL) models. Supervised DL models require a training set containing “true” labels to optimise model parameters before the model can be used to predict the diagnostic outcome of new subjects. Therefore, the quality of expert-provided diagnostic labels used for training sets the upper-bound for the predictive performance of the model [31-33].

Related studies and study rationale

Despite the difficulties, there has been a large amount of research into the use of ML/DL techniques in mental health. A number of studies have focussed on scraping social media posts to predict depression, for example using multinomial naive Bayes (with an accuracy of 76.69%) and event-driven tendency warning models (best recall rate of 0.668 and F-measure of 0.624) [34, 35]. Other studies have focussed on analysing electronic health records. Nemesure et al. used a sample of 4184 students who underwent a general health and psychiatric assessment for the diagnosis of MDD and Generalised Anxiety Disorder. A high-level XGBoost model was used to produce an area under the receiver operating characteristic (AUROC) of 0.67, a sensitivity of 0.55 and specificity of 0.7 for major depression [36]. Further studies have used ML/DL techniques to classify, diagnose and grade depression using clinical data, externally validated screening tests and biomarkers [37-44]. In 2018, Gao, Calhoun and Sui completed a review of 66 studies focussed on MDD that used magnetic resonance imaging either to classify MDD from controls or other mood disorders or to investigate treatment outcome predictors for individual patients [45]. The 66 studies investigated by Gao, Calhoun and Sui could be further classified into 9 groups of studies that were focussed on diagnosis/classification of only MDD [46-66], only bipolar disorder [67], diagnosis/classification of MDD compared to bipolar disorder [68-82], diagnosis/classification of MDD compared to Generalised Anxiety Disorder [83], diagnosis/classification of Schizophrenia and mood disorders (including MDD and Generalised Anxiety Disorder) [84, 85], brain abnormalities in patients who have been diagnosed with a particular mood disorder [86-94], analysing therapeutic responses to MDD [95-100], using neurobiological markers/neuroimaging for diagnosis/classification [101-109] and predicting responses to electroconvulsive therapy [110, 111]. 
In 2021, de Nijs et al. were able to create individualised models to predict 3- and 6-year symptomatic and global outcomes of patients with schizophrenia-spectrum disorders (mainly with established illness, but variable illness duration) based on patient-reportable data with a study size of 523 schizophrenia-spectrum patients [112]. Also in 2021, Taliaz et al. used genetic, clinical and demographic data from patients with solely MDD in the STAR*D Report to create an ML algorithm that generated an accurate predictor of response to three antidepressant medications with an average balanced accuracy of 72.3% and 70.1% across the medications in validation and test sets, respectively [113]. They then obtained data from the Pharmacogenomic Research Network Antidepressant Medication Pharmacogenomic Study (PGRN-AMPS) of patients treated with citalopram (a selective serotonin reuptake inhibitor) [113]. This external validation yielded accuracy of 60.5% and 61.3% for the STAR*D and PGRN-AMPS, respectively [113]. The aim of this study is to demonstrate how the clinical data collected by Psynary can be used to optimise the treatment of depression in the clinical setting. We hypothesised that ML/DL techniques could be used on the data collected in the Psynary database to support the optimisation of the treatment and management of depression. Although there have been previous studies that used clinical data to build ML/DL algorithms to diagnose depression and a quite recent study that predicted the response of three medications in treating MDD, we are unaware of any published study that has used clinical data to build a single, all-inclusive ML/DL algorithm to optimise the treatment of all subtypes of depression.

Methodology

Data collection and preprocessing

The Psynary database included details from patients being treated for depression (in any of its forms) at the following associated community and specialist psychiatric facilities between 24/03/2016 and 01/06/2021: American Clinic Tokyo, Tokyo, Japan; Tokyo Mental Health, Tokyo, Japan; Kawai Clinic, Otago, New Zealand; and Port Macquarie Base Hospital Mental Health Service, Port Macquarie, Australia. Other inclusion criteria were being 18 years or older, having capacity to consent and being fluent in English or Japanese (as Psynary is only available in those two languages). Patients with alcohol-related comorbidities were included if their primary diagnosis was a mood or anxiety disorder. Exclusion criteria were presentation with psychotic symptoms, significant comorbid alcohol and drug misuse where these were the primary diagnoses, terminal or life-threatening physical comorbidity, and cognitive impairment or intellectual disability. These criteria were assessed on the basis of past medical history, liaison with the GP and, with permission, collateral history from a relative or carer. All patients consented to their de-identified data being included in the Psynary database for the purposes of ongoing research, and the OptiMA1, 2 and 3 studies were approved by the relevant local ethics committees. OptiMA patients were instructed to create an online Psynary account and complete a baseline evaluation with the in-person support of a psychiatrist or nurse practitioner, and then to complete weekly reviews either by themselves remotely or with the help of a nurse practitioner with expertise in mental health. 
The Psynary database then logged this baseline information, which included: past history, such as family history, past episodes of depression and other disorders, age of onset, hospital admissions, past deliberate self-harm and past attempted suicide; deliberate self-harm questions, including suicidal thoughts intensity, suicidal planning and suicidal attempts; and other domains, such as alcohol intake, current psychiatric medication (name and dose), previous treatment changes and whether the patient had received electroconvulsive therapy sessions. The Psynary system incorporates the Hypomania Checklist 16 item (HCL-16) [114], Generalised Anxiety Disorder-7 (GAD-7) [115] and Patient Health Questionnaire-9 (PHQ-9) [25] scores and generates ICD-10 diagnoses for all major mood and anxiety disorders. It also includes the proprietary main outcome measures of this study, the R8 Depression and R8 Anxiety scores. All of these metrics were used in building the algorithm reported in this article except for the HCL-16. When a patient completed a review, the Psynary database logged the current psychiatric medication list, current alcohol intake, current deliberate self-harm responses, recent electroconvulsive therapy treatment, and current R8 Depression, R8 Anxiety, PHQ-9 and GAD-7 scores. All of these metrics were used in the review aspect of the model except for the HCL-16. On 1 June 2021, anonymised clinical data from Psynary were exported and preprocessing was undertaken. All patient demographic categorical data were one-hot encoded, and empty data values were handled by assigning 0 to “No” and -1 to “No Answer”, depending on the question field. All data preprocessing and algorithm development was carried out in JupyterLab version 3.0.14 with Python version 3.8.5, on an AMD Ryzen Threadripper 2950X 16-Core Processor (3.50 GHz) with 32.0 GB of installed RAM and an NVIDIA RTX 2080 GPU. 
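
The encoding step described above can be sketched as follows. This is a minimal illustration rather than the study's preprocessing code: the field names (`work_status`, `past_self_harm`, `family_history`), the shortened category list and the choice of 1 for “Yes” are assumptions made for the example; only the 0 for “No” and -1 for “No Answer” conventions come from the text.

```python
# Hypothetical categories and field names, for illustration only.
WORK_STATUS = ["Employed full-time", "Employed part-time", "Student", "Unemployed"]


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the matching category (all 0 if no match)."""
    return [1 if value == category else 0 for category in categories]


def encode_answer(value):
    """Encode a yes/no field: 0 for "No", -1 for no answer, 1 for "Yes" (assumed)."""
    if value is None or value == "":
        return -1
    return 1 if value == "Yes" else 0


def encode_patient(record):
    """Concatenate the one-hot demographic block with the encoded yes/no answers."""
    return one_hot(record.get("work_status"), WORK_STATUS) + [
        encode_answer(record.get("past_self_harm")),
        encode_answer(record.get("family_history")),
    ]


# A patient who answered "No" to one question and skipped the other.
example = {"work_status": "Student", "past_self_harm": "No"}
features = encode_patient(example)
print(features)
```

Keeping “No” and “No Answer” as distinct numeric values lets the network treat a skipped question differently from an explicit negative response.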
The review data for each patient were stored in chronological order and then filtered and preprocessed according to the medications taken during the period corresponding to each review. Medications were categorised into one of the following drug classes: analgesics, antidepressants, antihistamines, antipsychotics, anxiolytics, benzodiazepines, hypnotics, mood stabilisers, opioid antagonists and stimulants. Antidepressants were further divided by mechanism of action: mono-amine oxidase inhibitors (MAOIs), noradrenaline reuptake inhibitors (NaRIs), selective serotonin reuptake inhibitors (SSRIs), serotonin–norepinephrine reuptake inhibitors (SNRIs), tricyclic antidepressants (TCAs) and atypical antidepressant medications. Groups with low representation in reviews were discarded, as were patients without reviews for the better-represented medications. If, for a given review, a respondent had taken medications from more than one group, that review appears as a duplicate in each corresponding array. As is expected from this type of data, the frequency of review sequence lengths decreased exponentially (Fig. 2). The average number of reviews per patient was 11.5, with 95% of patients having 39 reviews or fewer.
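
The grouping rule described above, in which a review is duplicated into every medication class taken at that time, can be sketched as below. The drug-to-class mapping and the record layout (`r8`, `medications`) are hypothetical; the study's actual mapping and data structures are not given in the text.

```python
from collections import defaultdict

# Hypothetical drug-to-class mapping, for illustration only.
DRUG_CLASS = {
    "sertraline": "SSRI",
    "quetiapine": "antipsychotic",
    "lithium": "mood stabiliser",
}


def group_reviews_by_class(reviews):
    """Split one patient's chronological reviews into one sequence per drug class.

    A review taken while on drugs from several classes is copied into each
    corresponding sequence.
    """
    sequences = defaultdict(list)
    for review in reviews:
        classes = {DRUG_CLASS[m] for m in review["medications"] if m in DRUG_CLASS}
        for drug_class in sorted(classes):
            sequences[drug_class].append(review["r8"])
    return dict(sequences)


reviews = [
    {"r8": 40, "medications": ["sertraline"]},
    {"r8": 31, "medications": ["sertraline", "quetiapine"]},
    {"r8": 22, "medications": ["quetiapine"]},
]
by_class = group_reviews_by_class(reviews)
print(by_class)
```

In this example the second review contributes to both the SSRI and the antipsychotic sequence, which is the duplication the text refers to.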

Fig. 2

Frequency histogram illustrating the number of reviews participants completed

The objective of our analysis was to propose treatment by predicting the effectiveness of psychiatric medications for each individual patient. To better fit the available data, we formulated our problem as a regression of the best R8 Depression score the patient would achieve while taking any given medication [28]. While training the model, the sequence of reviews provided to the model was truncated randomly for each medication. The truncation length was drawn from a geometric distribution to allow for early prediction of remission. In addition to the regression of the R8 Depression score, the model was trained to predict which medications would be prescribed by clinicians. We found that this additional objective both helped to prevent overfitting, by increasing the ratio of training data to model capacity, and improved the performance of the model when used as a recommendation system. To do this, the medications prescribed by clinicians in all reviews are given as a multi-hot vector. For medication optimisation, predicted R8 Depression scores are corrected by coefficients calculated from the prescription probabilities scaled with a learned factor β, the purpose of which is to control the relative importance of the prescription probabilities. The algorithm then proposes the medication with the lowest scaled score (Eq. 1).
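
The three ingredients described above (geometric truncation of review sequences, a multi-hot prescription target, and selection of the lowest scaled score) can be sketched as follows. Eq. 1 is not reproduced in this text, so the correction used here, M_i = S_i (1 − β P_i), is an assumed form chosen only because it lowers the score of medications that clinicians are likely to prescribe; the success probability `p`, the fixed `beta` value and all function names are likewise hypothetical.

```python
import random


def geometric_length(rng, p, max_len):
    """Draw a length k >= 1 from a geometric distribution, capped at max_len."""
    k = 1
    while k < max_len and rng.random() >= p:
        k += 1
    return k


def truncate_reviews(sequence, rng, p=0.3):
    """Keep only the first k reviews, with k geometrically distributed (p is assumed)."""
    if not sequence:
        return []
    return sequence[: geometric_length(rng, p, len(sequence))]


def multi_hot(prescribed, classes):
    """1 for every medication class the clinician prescribed, 0 otherwise."""
    return [1 if c in prescribed else 0 for c in classes]


def propose_medication(scores, probabilities, beta, classes):
    """Pick the class with the lowest scaled score.

    Assumed form (not the paper's Eq. 1): M_i = S_i * (1 - beta * P_i).
    """
    scaled = [s * (1.0 - beta * p) for s, p in zip(scores, probabilities)]
    best = min(range(len(classes)), key=lambda i: scaled[i])
    return classes[best], scaled


classes = ["SSRI", "SNRI", "antipsychotic"]
rng = random.Random(0)
full_sequence = [34, 30, 27, 21, 15]
truncated = truncate_reviews(full_sequence, rng)
target = multi_hot({"SSRI", "antipsychotic"}, classes)
# In the study beta is learned; a fixed value is used here for illustration.
choice, scaled_scores = propose_medication(
    scores=[20.0, 18.0, 19.0], probabilities=[0.8, 0.1, 0.5], beta=0.5, classes=classes
)
print(len(truncated), target, choice)
```

Under this assumed form, a larger β gives more weight to what clinicians would prescribe, while β = 0 reduces the recommendation to the lowest predicted R8 Depression score.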

Implementation of long short-term memory model

Neural network models have become the de facto standard in most machine learning applications. There are several choices of architecture for sequence modelling, most notably recurrent neural networks (RNN), 1-dimensional convolutional networks and attention networks [116]. The advantages of the latter two have largely to do with better gradient propagation and performance with long sequence lengths [116]. The Psynary review data, having short sequence lengths and a unidirectional chronological structure, appear to be best suited to the simple-to-implement RNN structure. For this particular application, a long short-term memory (LSTM) network was chosen due to its ready availability in deep learning libraries and proven effectiveness [117]. The architecture of the model in this study separately processes each sequence of reviews corresponding to each medication using the same LSTM network. We suspect that the patterns learned by networks trained on each medication separately would be largely similar, and training an individual network for each medication group would severely reduce sample sizes due to subdivision of the dataset. Using a larger, single recurrent network was empirically found to produce better and more consistent results. Fully connected networks completed the architecture: they were used for feature extraction from both the baseline questionnaire and the reviews, and at the end of the network for the final R8 Depression score and doctor prescription predictions. Training was performed in three steps. The fully connected network (FCN) review feature extractor and LSTM modules were first trained in isolation on the R8 Depression score prediction task. This recurrent model was then integrated, with frozen weights, into the baseline questionnaire feature extractor and final FCN layers, and the remainder of the model was trained. Finally, the entire model was annealed using low learning rates. 
Due to the low dimensionality of the data, small size of the model and its recurrent structure, GPU acceleration was not used. Figure 3 shows a diagrammatic illustration of the designed algorithm.
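
To make the weight-sharing idea concrete, the sketch below applies one LSTM cell, with a single set of weights, to the review sequence of every medication class. It is a dependency-free illustration written for this text, not the study's implementation: the weights are random and untrained, the layer sizes and input features are arbitrary, and the feature extractors and output layers of the full model are omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM cell with randomly initialised (untrained) weights."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, candidate, output.
        self.weights = {
            g: [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(hidden_size)]
            for g in "ifgo"
        }
        self.bias = {g: [0.0] * hidden_size for g in "ifgo"}

    def _gate(self, name, xh):
        """Affine transform of the concatenated input and hidden state."""
        return [
            sum(w * v for w, v in zip(row, xh)) + b
            for row, b in zip(self.weights[name], self.bias[name])
        ]

    def step(self, x, h, c):
        """Standard LSTM update for one time step."""
        xh = list(x) + list(h)
        i = [sigmoid(v) for v in self._gate("i", xh)]
        f = [sigmoid(v) for v in self._gate("f", xh)]
        g = [math.tanh(v) for v in self._gate("g", xh)]
        o = [sigmoid(v) for v in self._gate("o", xh)]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def encode(self, sequence):
        """Run the cell over a sequence and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


# The same cell (same weights) summarises every medication's review sequence.
shared_lstm = TinyLSTM(input_size=2, hidden_size=3)
sequences = {
    "SSRI": [[0.5, 0.2], [0.4, 0.1]],  # two reviews, two features each (made up)
    "antipsychotic": [[0.3, 0.3]],
}
summaries = {name: shared_lstm.encode(seq) for name, seq in sequences.items()}
print(summaries)
```

Because a single network sees the sequences for all medication classes, every review contributes to the same parameters, which is the sample-size argument made in the text for not training one network per class.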

Fig. 3

Block diagram of created algorithm

Optimisation test of model

To further validate the significance of the model, an optimisation test was performed. The model’s outputs were tested by posing two questions: “After seeing the patient review data, should the medication regimen change or should it continue?” and “If the medication should change, what should the medication be changed to?”. Because of the extra dimensionality introduced by the patient reviews, the usual set of sensitivity, specificity, positive predictive value and negative predictive value was not used. Instead, the sensitivity, positive predictive value, false positive rate and correct medication change accuracy were calculated (Eqs. 2, 3, 4 and 5).
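
The first three of these metrics have standard definitions in terms of confusion-matrix counts, which the sketch below computes. The definition of correct medication change accuracy is an assumption on our part (the share of correctly flagged changes for which the proposed class matched the clinician's choice), since Eqs. 2 to 5 are not reproduced here, and the counts in the example are invented for illustration and are not study results.

```python
def change_metrics(tp, fp, fn, tn, correct_class_changes):
    """Metrics for the "change medication or continue" decision.

    tp: change predicted, change occurred    fp: change predicted, no change
    fn: no change predicted, change occurred tn: no change predicted, no change
    correct_class_changes: true positives where the proposed class also matched.
    """
    return {
        "sensitivity": tp / (tp + fn),  # also called recall
        "positive_predictive_value": tp / (tp + fp),  # also called precision
        "false_positive_rate": fp / (fp + tn),
        # Assumed definition; the paper's own equation is not available here.
        "correct_change_accuracy": correct_class_changes / tp,
    }


# Made-up counts, for illustration only.
metrics = change_metrics(tp=30, fp=2, fn=10, tn=58, correct_class_changes=24)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and positive predictive value are the same quantities that the abstract reports as recall and precision.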

Results

This study used data collected from the OptiMA1, 2 and 3 studies [28, 118], contained in the Psynary database. In total, data from 458 participants were included in this study (Table 3). The large majority of participants were from Japan (85.6%), followed by New Zealand (9.6%) and then Australia (4.8%). The sex distribution slightly favoured females (57%), and the median age and age range of the participants were 32.5 and 18 to 73 years, respectively (Fig. 4). Most of the participants were employed full-time (52.4%), spoke English (98.9%) and did not have the support of a carer (85.2%). The patients’ age distribution showed two anomalous data points (ages 2 and 17), which were deleted and treated as missing data.

Table 2

Variables and their descriptions used in the proposed algorithm

Symbol	Description
P	Prescription probabilities vector (network output)
S	Predicted R8 scores vector (network output)
β	Scalar balancing coefficient
M	Scaled medication scores

Fig. 4

Frequency histogram illustrating patient age distribution (in years)

The participants’ psychiatric characteristics were also examined (Table 4). The average R8 Depression raw score was 34.28 out of a maximum of 84, with depression remission defined as a raw score less than 14 [28]. The PHQ-9 score is grouped based on severity into none (0–4), mild (5–9), moderate (10–14), moderately severe (15–19) and severe (20–27) [25]. The participants’ average PHQ-9 score was 16.54 (within the moderately severe range), with 59% of the participants considered either moderately severe or severe. The GAD-7 score is grouped into mild (0–4), moderate (5–9) or severe (10–15) anxiety [115]. A total of 354 participants (77.3%) recorded scores reflecting severe anxiety. Furthermore, 326 participants (71%) were considered to have a subtype of unipolar depression and 84 participants (18.3%) were considered to have a subtype of bipolar disorder. The remaining 3 participants with a provided diagnosis (0.7%) were considered to have hypomania (Table 4).
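
The PHQ-9 severity bands and the R8 Depression remission threshold quoted above translate directly into a small lookup. The function and constant names below are our own; the cut-offs are those stated in the text.

```python
# Upper bound (inclusive) and label of each PHQ-9 band, as quoted in the text.
PHQ9_BANDS = [
    (4, "none"),
    (9, "mild"),
    (14, "moderate"),
    (19, "moderately severe"),
    (27, "severe"),
]

# Remission is defined in the text as an R8 Depression raw score below 14.
R8_REMISSION_THRESHOLD = 14


def phq9_severity(score):
    """Return the PHQ-9 severity band for a score between 0 and 27."""
    if not 0 <= score <= 27:
        raise ValueError("PHQ-9 scores range from 0 to 27")
    for upper, label in PHQ9_BANDS:
        if score <= upper:
            return label


def in_remission(r8_raw_score):
    """True if the R8 Depression raw score meets the remission definition."""
    return r8_raw_score < R8_REMISSION_THRESHOLD


print(phq9_severity(16.54), in_remission(34.28))
```

Applied to the cohort means, the PHQ-9 average of 16.54 falls in the moderately severe band and the R8 Depression average of 34.28 is well above the remission threshold.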

Table 3

Participant demographic characteristics

Demographic characteristics	n (%)
Total participants	458
Males	197 (43%)
Females	261 (57%)
Country of residence
Australia	22 (4.8%)
New Zealand	44 (9.6%)
Japan	392 (85.6%)
Work Status
Employed full-time	240 (52.4%)
Employed part-time	45 (9.8%)
Unpaid domestic work	26 (5.7%)
Permanently sick/disabled	4 (0.9%)
Retired	3 (0.7%)
Self-employed full-time	16 (3.5%)
Self-employed part-time	18 (3.9%)
Student	76 (16.6%)
Unemployed	30 (6.6%)
Language
English	453 (98.9%)
Japanese	5 (1.1%)

Table 4

Participant psychiatric characteristics

Psychiatric characteristics	n (%)
Total participants	458
R8 Depression score
Mean	34.28
Standard deviation	13.03
Maximum	77
Minimum	0
PHQ-9 score
Mean	16.54
Standard deviation	5.57
Max	27
Min	1
0–4	2 (0.4%)
5–9	54 (11.8%)
10–14	95 (20.7%)
15–19	133 (29%)
20–27	138 (30%)
Not measured	36 (7.9%)
GAD-7 score
Mean	9.04
< 5	7 (1.5%)
5–10	61 (13.3%)
11–15	354 (77.3%)
Not measured	36 (7.9%)
ICD-10 diagnosis
F30: hypomania	3 (0.7%)
F31.31: bipolar disorder, current episode depressed, mild	3 (0.7%)
F31.32: bipolar disorder, current episode depressed, moderate	7 (1.5%)
F31.4: bipolar affective disorder, current episode severe depression without psychotic symptoms	30 (6.6%)
F31.7: bipolar affective disorder, currently in remission	1 (0.2%)
F31.81: bipolar II disorder	43 (9.4%)
F32.0: mild depressive episode	23 (5%)
F32.1: moderate depressive episode	31 (6.8%)
F32.2: severe depressive episode without psychotic symptoms	95 (20.7%)
F33.0: recurrent depressive disorder, current episode mild	24 (5.2%)
F33.1: recurrent depressive disorder, current episode moderate	43 (9.4%)
F33.2: recurrent depressive disorder, current episode severe without psychotic symptoms	110 (24%)
Not provided	45 (9.8%)

The participants in this study were treated by two psychiatrists. An analysis of the frequency of medication classes prescribed showed that SSRIs were the most prescribed medication class, followed by antipsychotics and mood stabilisers (Fig. 5). SSRIs are likely to be the most prescribed medication class because the existing treatment guidelines in Japan [119], New Zealand and Australia [120] recommend them as first-line medications.

Fig. 5

Frequency histogram illustrating the number of patients prescribed a particular medication class

Without reviews, the model trained on the Psynary dataset predicted R8 Depression-defined remission with an accuracy of 49.8% (Fig. 6). After the model had processed 2 reviews, its prediction accuracy was 76.5%.
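One plausible way to turn a regressed R8 Depression score into a remission accuracy figure is to threshold both the predicted and the observed score at 14 and count agreements. This is a sketch under that assumption, not the authors' evaluation code; the function name and the example scores are hypothetical.

```python
REMISSION_THRESHOLD = 14  # R8 raw score below this counts as remission (from the text)


def remission_accuracy(predicted_scores, observed_scores,
                       threshold=REMISSION_THRESHOLD):
    """Share of patients whose predicted remission status matches the observed one."""
    if not predicted_scores or len(predicted_scores) != len(observed_scores):
        raise ValueError("need two non-empty score lists of equal length")
    hits = sum(
        (pred < threshold) == (obs < threshold)
        for pred, obs in zip(predicted_scores, observed_scores)
    )
    return hits / len(predicted_scores)


if __name__ == "__main__":
    # Hypothetical scores for four patients, for illustration only.
    predicted = [10.0, 20.0, 13.5, 30.0]
    observed = [12, 9, 15, 40]
    print(remission_accuracy(predicted, observed))  # 0.5
```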

Fig. 6

Accuracy of the developed model at predicting a patient’s R8 Depression score, with and without reviews

The accuracy of predicting the individual medication classes that led to the best R8 Depression scores was then examined to understand the strengths and weaknesses of the algorithm. Without including reviews, it was found that the most accurate group was SSRIs (78.9%), followed by antipsychotics (53.6%), mood stabilisers (36.8%), benzodiazepines (35.7%), atypical antidepressants (29.4%), NaRIs (16.7%) and finally SNRIs (10.5%). However, when the model was trained with the reviews, antipsychotics were predicted to be the medication class with the best R8 Depression scores (88%), followed by SSRIs (87.9%), mood stabilisers (69.1%), benzodiazepines (53.5%), atypical antidepressants (39.2%), SNRIs (27.7%) and NaRIs (25.9%) (Fig. 7). These findings are not surprising, as they mirror the frequency of patients prescribed a particular medication class (Fig. 5).

Fig. 7

Accuracy of model predicting by medication class with and without reviews

Analysis of the training and test loss per epoch showed no significant difference between training and test loss (Fig. 8).

Fig. 8

The developed model’s train/test loss per epoch

We also optimised the β coefficient (learned factor) to balance the trade-off between R8 Depression scores and prescribed medication prediction. The best β coefficient was found to be 0.6465 (Fig. 9).
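The text does not state how β enters the training objective. A minimal sketch, assuming a convex combination in which β weights the R8 Depression regression loss and (1 − β) weights the medication prediction loss, could look as follows. Both the functional form and the assignment of β to the regression term are assumptions, and `combined_loss` is a hypothetical name.

```python
def combined_loss(r8_regression_loss: float,
                  medication_loss: float,
                  beta: float = 0.6465) -> float:
    """Blend two loss terms with a single trade-off coefficient.

    Assumed form: beta * regression + (1 - beta) * medication prediction.
    The default of 0.6465 is the best coefficient reported in the text.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta is expected to lie in [0, 1]")
    return beta * r8_regression_loss + (1.0 - beta) * medication_loss


if __name__ == "__main__":
    # Hypothetical loss values, for illustration only.
    print(combined_loss(r8_regression_loss=2.0, medication_loss=0.5))
```

With β = 1 this objective reduces to pure R8 regression and with β = 0 to pure medication prediction, which is the trade-off that a sweep over β would explore.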

Fig. 9

Optimisation of the developed model’s β coefficient by accuracy with and without reviews

We then completed an optimisation assessment of our model. Due to the extra dimension of patient reviews, an error matrix was not developed. Instead, we evaluated whether the model, when asked about changing medication, continued with the current medication or changed it and, if it changed, whether it selected the correct medication (Fig. 10). After just 2 reviews, the precision of changing medications (Change Recommendation Precision) was 97.4%, the recall (Change Recommendation Recall) was 71.4%, the accuracy of changing medications to the correct medication (Correct Change Recommendation) was 54.3%, and the false positive rate (Incorrect Change Recommendation) was 2.6% (Fig. 10).
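The change-recommendation measures can be illustrated with the standard definitions of precision and recall. The sketch below assumes these standard definitions apply; the counts are hypothetical and are not the study's data. Note that the reported 97.4% and 2.6% sum to 100%, which is consistent with the Incorrect Change Recommendation figure being the share of change recommendations that were wrong (1 − precision).

```python
def change_metrics(true_pos: int, false_pos: int, false_neg: int) -> dict:
    """Precision and recall for the 'recommend a medication change' decision.

    true_pos:  change recommended and a change was needed
    false_pos: change recommended but no change was needed
    false_neg: no change recommended although a change was needed
    """
    recommended = true_pos + false_pos
    needed = true_pos + false_neg
    if recommended == 0 or needed == 0:
        raise ValueError("metrics are undefined without recommendations or needed changes")
    return {
        "precision": true_pos / recommended,
        "recall": true_pos / needed,
        # Share of change recommendations that were incorrect (1 - precision).
        "incorrect_change_share": false_pos / recommended,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(change_metrics(true_pos=30, false_pos=2, false_neg=10))
    # Reported values from the text: 1 - 0.974 rounds to 0.026.
    print(round(1 - 0.974, 3))
```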

Fig. 10

Optimisation change measures in relation to number of patient reviews

Finally, we reviewed the mean absolute error by number of reviews. Our model can predict R8 Depression scores before reviews with a mean absolute error of 14.6 ± 9.5. As the number of reviews increases, accuracy increases and deviation decreases significantly, though with some instability, likely due to the decreasing sample count (as shown in Fig. 11).
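A figure such as 14.6 ± 9.5 can be computed as the mean of the absolute prediction errors together with their spread. The sketch below assumes the ± term is the standard deviation of the absolute errors, which the text does not state explicitly; the function name and example scores are hypothetical.

```python
from statistics import mean, pstdev


def mae_with_spread(predicted, observed):
    """Return (mean absolute error, population SD of the absolute errors)."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need two non-empty score lists of equal length")
    abs_errors = [abs(p - o) for p, o in zip(predicted, observed)]
    return mean(abs_errors), pstdev(abs_errors)


if __name__ == "__main__":
    # Hypothetical R8 Depression scores for three patients, for illustration only.
    mae, spread = mae_with_spread([30, 20, 10], [20, 25, 10])
    print(f"{mae:.1f} ± {spread:.1f}")
```

Grouping patients by the number of reviews processed and calling the function once per group would give one point per review count, as plotted in Fig. 11.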

Fig. 11

The developed model’s mean absolute error per number of reviews

Discussion

In this study, we describe the development of a DL algorithm that predicts remission from depression in response to psychiatric medications. The algorithm initially predicted remission with an accuracy of 49.8% before processing reviews (Fig. 6). After only 2 reviews, the model had an accuracy of 76.5%. The medications predicted to have the best R8 Depression score results were antipsychotics (88%), followed by SSRIs (87.9%) and mood stabilisers (69.1%) (Fig. 7). This is of significant clinical importance, as current treatment protocols for all depression subtypes rely largely on trial-and-error application of treatment guidelines. Using the model created in this study as an adjunct to normal care could reduce the number of trial-and-error iterations.

Our model was designed around the restrictions imposed by our available data. Several design decisions took into account the specific biases imposed by the naturalistic dataset. Analysis of single medications was made difficult by the large variety of prescribed medications, which subdivided the data into very small subsets. To ameliorate this and to reduce dataset entropy, the data was categorised by medication class, with antidepressants divided into groups based on mechanism of action. Furthermore, many patients were given more than one medication at a time, which made unitary classification incompatible with the desired application of the model. Because the best indicator of remission was the R8 Depression score, regression of this value is a very similar task to binary prediction of remission, with the added benefit of giving a measure of expected improvement. The biggest performance improvement came from the addition of the auxiliary prediction of prescribed medications. This served as a heuristic that directed medication selection towards what a doctor is likely to prescribe during the treatment process. Put simply, it acts as a tiebreaker, allowing the model to accurately select between medication types it expects to perform well, without adding penalties to the R8 Depression regression.

Training a model on a naturalistic dataset like Psynary comes with several drawbacks. The most notable of these are bias caused by the skewed distributions of the data, information gaps caused by the selectiveness of prescribed medications and several layers of survivorship bias. Our dataset comes from patients treated in Australia, New Zealand and Japan, and from clinics with different population profiles. The clinic in Tokyo focuses on primary care of mostly foreign residents and its patients make up the majority of the dataset (85.6%), while patients from Oceania are largely secondary care patients. There is also asymmetric representation of medication types. As only two psychiatrists prescribed medications for the participants in this study, the dataset is heavily reliant on the two clinicians’ experience, which could be a potential source of survivorship bias.

Our choices for model design also came with drawbacks. Most notably, the aggregation of medication types, the use of regression over explicit recommendation and the implementation of a recurrent model were decisions that compromise the model in specific ways. The prediction of a medication class is less useful for a clinician than the prediction of a specific medication, as patient response can vary significantly between medications of the same action type. Throughout a treatment, a patient may be prescribed several different medications of the same class, which, as of now, our model cannot distinguish in any meaningful way. A possible solution for future work may be to use a finer category system, possibly derived from learned clusters. As our model is not an explicit recommender system, multi-medication recommendation becomes difficult, with significantly lower accuracy.
The main reason for choosing R8 Depression regression was the strong biases in the data. A binary recommender system would mostly replicate the biases of the prescribing clinicians, rather than learning from the performance of medications on patients. This was another reason that medications were aggregated by class, eliminating the bias arising from a preference for one specific brand over another. Future work should focus on this specific point, possibly with a parallel network that uses the activation values of the regression network to produce specific recommendations.

The use of recurrent models in neural networks may be on the decline. Many sequence modelling objectives are now better solved using convolutional or attention networks, such as the state-of-the-art audio generator WaveNet [121] and the transformer architecture for translation tasks [117]. These options have several advantages over recurrent networks, such as shorter gradient propagation distances, better performance with long sequences and easier training. Our choice of an LSTM made the network particularly difficult to train, as the parallel fully connected and recurrent portions did not train at equal rates. Using a different sequence modelling architecture would, however, have added complexity and development time that we felt was best allocated to other parts of the model.

Finally, the training target in the presented model is the best R8 Depression score reached during treatment. Because of this, the model does not handle a relapse of depression once a participant has reached R8 Depression-defined remission. In clinical practice, it is common for patients to relapse into depression, as is evident in this study, where 177 participants (38.6%) were diagnosed with a form of recurrent depression (Table 2). Future studies will include updating the current model to reflect this clinical situation.
To evaluate the performance of the model under the intended conditions of treatment optimisation, the model was evaluated on the binary decision problem of continuing the current medication or changing medication. We evaluated the model’s accuracy both in the choice of changing medication and in the precision in recommending a new medication. The results are presented as a function of the number of reviews in Fig. 10. After just 2 patient reviews, the model shows high recall (71.4%) in determining if a change in medication is needed. In the scenario of such a recommendation, the precision of the medication choice is similarly high (97.4%). To further validate the model, we then asked the model to select a different medication. The accuracy of changing to the correct medication was 54.3%. As illustrated in Fig. 10, these values are quite consistent over 40 patient reviews.

Taliaz et al. is the only published study similar to this one. Using genetic, clinical and demographic data from 1679 STAR*D participants, the team generated a hybrid, multi-step DL binary-response algorithm to predict whether a participant was a “responder” or “nonresponder” for citalopram (SSRI), sertraline (SSRI) and venlafaxine (SNRI). This yielded an average balanced accuracy of 70.1% for the final test set, compared to a 46.8% initial response rate for participants with MDD. While they used a wide variety of data to arguably create a more visible clinical picture of the participant, our model achieved an initial accuracy of 49.8% and an accuracy of 76.5% after 2 patient reviews while accommodating several different medication classes and subclasses and subtypes of depression. Taliaz et al. also completed additional statistical calculations of the model’s performance. The sensitivity, specificity, positive predictive value and negative predictive value were 68.7%, 71.4%, 71.7% and 69%, respectively [113]. In comparison, after only 2 reviews, our recall (also known as sensitivity) was 71.4%, similar to the results of Taliaz et al., and our precision (also known as positive predictive value) was 97.4%, which far exceeded their results.

From the 66 studies analysed by Gao, Calhoun and Sui, only 5 focussed on predicting the therapeutic response and all 5 focussed solely on MDD instead of all subtypes of depression [45]. In 2015, Korgaonkar et al. investigated objective brain volumetric measures of patients with MDD to reliably predict symptomatic remission with their initial antidepressant medication [96]. Their study found two decision trees that had high probability prediction scores of non-remission and were replicated [96]. These were 1) left middle frontal volume less than 14.8 mL and right angular gyrus volume greater than 6.3 mL, which discerned 55% of non-remitters with 85% accuracy; and 2) fractional anisotropy values less than 0.63 in the left cingulum bundle and less than 0.5 in the right superior fronto-occipital fasciculus, which discerned 15% of non-remitters with 84% accuracy [96]. Also in 2015, Williams et al. used MRI to investigate whether amygdala activation stimulated by emotion was a general or differential predictor of response to escitalopram (SSRI), sertraline (SSRI) and venlafaxine (SNRI) [97]. Their model classified responders vs non-responders with an overall accuracy, cross-validation accuracy, sensitivity and specificity of 75%, 75%, 77% and 72%, respectively [97]. Korgaonkar et al. examined whether diffusion tensor imaging measures of the anterior cingulate and limbic white matter are useful prognostic biomarkers for MDD [98]. They found that when adding age to a model that looked at the stria terminalis fractional anisotropy and the cingulate fractional anisotropy, their model had an overall accuracy of 74%, sensitivity of 74% and specificity of 75% [98]. Gong et al. investigated the diagnostic and prognostic potential of pre-treatment structural neuroanatomy using a support vector machine (SVM) in patients with non-refractory depressive disorder or refractory depressive disorder (two subtypes of MDD) [99]. Sixty-one patients were prescribed either an SSRI, TCA or SNRI. The diagnostic accuracy, sensitivity and specificity when applying SVM to both grey and white matter images were 65.22% [99]. The prognostic accuracy based on both grey and white matter images resulted in an accuracy, sensitivity and specificity of 69.57% [99]. Finally, Costafreda et al. investigated the functional neuroanatomy of patients with acute MDD viewing sad faces of different intensities before cognitive behavioural therapy, in order to predict clinical response [100]. They found that prediction of remission from MDD was significant at the lowest and highest intensities of sadness. Both situations had a sensitivity of 71% and specificity of 86% [100].

Our optimisation assessment illustrates that our model is highly comparable to these 5 studies in terms of sensitivity (71.4%). Importantly, our precision of 97.4%, false positive rate of 2.6% and correct change accuracy of 54.3% have no counterpart in these 5 studies and so cannot be compared with them. Additionally, when solely comparing model accuracies, our model has an accuracy of 76.5% after 2 patient reviews and performs better than all studies apart from Korgaonkar et al.

With depression prevalence rates increasing and placing pressure on existing services, new management techniques need to be considered to ensure all patients are treated, and in remission, as soon as possible [122]. Currently, the majority of patients with depression present to primary care [123] and secondary mental health services predominantly accommodate severe and/or high-risk presentations, leaving the majority of people with depression unable to access specialist mental health services [123–126].
Introduction of telemedicine as an aspect of management for mental disorders covers two important factors of care: it improves patient access in areas where specialists are limited (including regional, rural and remote areas), and it improves disease control and relapse prevention [127]. Self-completed web-based systems ensure consistency of data collection, and some patients are more likely to disclose relevant information on a self-completed assessment than in clinician-interviewed settings [128, 129]. Furthermore, online systems provide the opportunity to collect results and track progress [128].

This study has illustrated the benefits of using Psynary, a web-based tool, as an adjunct to normal care for depression. Unlike in typical mental health care settings, the data is recorded in a quantitative format that lends itself to analysis. This study has also shown the clinical importance of applying ML/DL algorithms to clinically collected data for the optimisation of depression treatment. To further validate the use of Psynary and the DL algorithm created in this study, further work will involve diversifying the demographic of participants and clinicians and the locations of the clinics, and incorporating genomic data (single nucleotide polymorphisms) with a genomics team to create a multi-dimensional, robust model.

Conclusion

Depression has a major health and socio-economic impact and is continuing to increase in prevalence, and mental health services are struggling with the increased demand. Using a combination of web-based tools and DL algorithms in a clinical setting, as outlined in this study, could increase clinician accessibility and reduce the time taken to reach optimal treatment protocols, thereby reducing the prevalence of depression and its socio-economic impact on societies.

Data and material availability

All clinical data used in this article is fully anonymised from the point of capture. It is available on application to Dr Andrew Kissane for the purpose of validation, regulation and further research. The data and the web-based Psynary system remain the intellectual property of International Medical K.K.

114 in total

1. Neural correlates of sad faces predict clinical remission to cognitive behavioural therapy in depression.

Authors: Sergi G Costafreda; Akash Khanna; Janaina Mourao-Miranda; Cynthia H Y Fu
Journal: Neuroreport Date: 2009-05-06 Impact factor: 1.837

2. Psychosis, depression and behavioural disturbances in Sydney nursing home residents: prevalence and predictors.

Authors: H Brodaty; B Draper; D Saab; L F Low; V Richards; H Paton; D Lie
Journal: Int J Geriatr Psychiatry Date: 2001-05 Impact factor: 3.485

3. Disorder-specific volumetric brain difference in adolescent major depressive disorder and bipolar depression.

Authors: Frank P MacMaster; Normand Carrey; Lisa Marie Langevin; Natalia Jaworska; Susan Crawford
Journal: Brain Imaging Behav Date: 2014-03 Impact factor: 3.978

4. Cost of depression among adults in Japan.

Authors: Yasuyuki Okumura; Teruhiko Higuchi
Journal: Prim Care Companion CNS Disord Date: 2011

5. A brief measure for assessing generalized anxiety disorder: the GAD-7.

Authors: Robert L Spitzer; Kurt Kroenke; Janet B W Williams; Bernd Löwe
Journal: Arch Intern Med Date: 2006-05-22

6. Diagnosing melancholic depression: some personal observations.

Authors: Gordon Parker
Journal: Australas Psychiatry Date: 2016-07-20 Impact factor: 1.369

7. Depression Disorder Classification of fMRI Data Using Sparse Low-Rank Functional Brain Network and Graph-Based Features.

Authors: Xin Wang; Yanshuang Ren; Wensheng Zhang
Journal: Comput Math Methods Med Date: 2017-04-12 Impact factor: 2.238

8. Machine learning in major depression: From classification to treatment outcome prediction (Review).

Authors: Shuang Gao; Vince D Calhoun; Jing Sui
Journal: CNS Neurosci Ther Date: 2018-08-23 Impact factor: 5.243

9. Subcortical volumes differentiate Major Depressive Disorder, Bipolar Disorder, and remitted Major Depressive Disorder.

Authors: Matthew D Sacchet; Emily E Livermore; Juan Eugenio Iglesias; Gary H Glover; Ian H Gotlib
Journal: J Psychiatr Res Date: 2015-06-16 Impact factor: 4.791

10. Optimizing prediction of response to antidepressant medications using machine learning and integrated genetic, clinical, and demographic data.

Authors: Dekel Taliaz; Amit Spinrad; Ran Barzilay; Zohar Barnett-Itzhaki; Dana Averbuch; Omri Teltsh; Roy Schurr; Sne Darki-Morag; Bernard Lerer
Journal: Transl Psychiatry Date: 2021-07-08 Impact factor: 6.222
