Philip J Batterham1, Alison L Calear1, Bridianne O'Dea2, Mark E Larsen2, David J Kavanagh3, Nickolai Titov4, Sonja March5, Ian Hickie6, Maree Teesson7, Blake F Dear4, Julia Reynolds1, Jocelyn Lowinger8, Louise Thornton7, Patrick Gorman1. 1. Centre for Mental Health Research, Research School of Population Health, The Australian National University, Canberra, Australia. 2. Black Dog Institute, University of New South Wales, Sydney, Australia. 3. Centre for Children's Health Research, Institute of Health & Biomedical Innovation, School of Psychology and Counselling, Queensland University of Technology, Brisbane, Australia. 4. eCentreClinic and MindSpot Clinic, Department of Psychology, Macquarie University, Sydney, Australia. 5. School of Psychology and Counselling, University of Southern Queensland, Ipswich, Australia. 6. Brain and Mind Centre, University of Sydney, Sydney, Australia. 7. Matilda Centre, University of Sydney, Sydney, Australia. 8. Think Feel ACT, Melbourne, Australia.
Abstract
BACKGROUND: Digital mental health interventions can be effective for treating mental health problems, but uptake by consumers and clinicians is not optimal. The lack of an accreditation pathway for digital mental health interventions is a barrier to their uptake among clinicians and consumers. However, there are a number of factors that may contribute to whether a digital intervention is suitable for recommendation to the public. The aim of this study was to identify the types of evidence that would support the accreditation of digital interventions. METHOD: An expert workshop was convened, including researcher, clinician, consumer (people with lived experience of a mental health condition) and policymaker representatives. RESULTS: Existing methods for assessing the evidence for digital mental health interventions were discussed by the stakeholders present at the workshop. Empirical evidence from randomised controlled trials was identified as a key component for evaluating digital interventions. However, information on the safety of users, data security, user ratings, and fidelity to clinical guidelines, along with data from routine care including adherence, engagement and clinical outcomes, were also identified as important considerations when evaluating an intervention. There are considerable challenges in weighing the evidence for a digital mental health intervention. CONCLUSIONS: Empirical evidence should be the cornerstone of any accreditation system to identify appropriate digital mental health interventions. However, robust accreditation systems should also account for program and user safety, user engagement and experience, and fidelity to clinical treatment guidelines.
There is extensive evidence that digital mental health interventions can be effective for
treating mental health problems.[1-5] However, uptake of some evidence-based services by community members and clinicians has been limited, which may result from
a failure of implementation into practice,[1,6] from limitations on the
appropriateness of research evidence,[7] or from market saturation of untested interventions.[8,9] Digital mental health interventions
are defined here as online programs and apps that deliver structured therapy (based on
existing evidence) to the user, in either a self-guided or clinician-supported format.
This definition excludes therapy delivered live by clinicians using technology,
informational or non-therapeutic programs, and tools used exclusively by clinicians.
Clinician and consumer barriers to the use of these interventions include limited
awareness of digital mental health interventions and their appropriateness for different
mental health problems, preference for face-to-face care, lack of knowledge of the
evidence base supporting their use, poorly integrated delivery pathways, concerns around
the privacy of using these interventions, a perceived gap between users’ actual needs
and the problems typically addressed by diagnostically driven online interventions, a
lack of user input into the design and delivery of interventions, and a lack of formal
accreditation processes that would feed into the identification and delivery of
appropriate digital interventions to the public.[1,6,10,11]A number of directories or portals of digital mental health interventions are now
available, and may be employed to support the use of such interventions in clinical,
community or individual settings. Some of these, such as Beacon[12] and Psyberguide,[13] include an evaluation of the empirical evidence for each available intervention,
which allows users to identify programs that have the strongest evidence base for
efficacy. However, there are broader considerations that may be important in assisting
potential users to determine the most appropriate intervention.[14] Accreditation systems for digital interventions need to account for multiple
factors in identifying interventions that are appropriate for public use and providing
recommendations to clinicians or consumers.

We report on the outcomes of an expert workshop that was convened to discuss the
challenges in assessing evidence for digital mental health interventions and how these
issues might feed into systems of accreditation for such interventions. Formal systems
of accreditation for digital interventions are emerging, such as the UK Digital
Assessment Questionnaire (DAQ)[15] and the US Food and Drug Administration’s procedures for approval of digital
tools as medical devices.[16] However, the workshop was designed to identify the viewpoints of diverse
stakeholders on how different types of evidence can be integrated into new accreditation
systems, without specific reference to these emerging accreditation systems.
Method
Attendees of the workshop (n = 15) were researchers in digital
mental health (n = 9, three were also clinicians), clinicians
(n = 5), a representative of people with lived experience of
a mental health condition (n = 1), Government representatives from
the Australian Department of Health (n = 2) and a note-taker.
Although only one consumer representative attended, specific discussion points
around consumer engagement were included in the agenda, with discussion in this area
led by the consumer representative. The 5-h workshop was conducted as an unmoderated
group discussion in February 2018 to inform the ongoing development of the
Australian Government’s headtohealth portal (https://headtohealth.gov.au), a website that lists Australian
information, resources and services to support mental health. The Beacon approach
for assessing empirical evidence for web-based therapy programs, apps and Internet
support groups was described. Beacon is a catalogue describing existing online
health interventions with an evidence rating based on randomised controlled trials
and other empirical designs.[12] Loosely following an agreed agenda, attendees discussed limitations of the Beacon methodology, along with broader challenges of relying primarily on evidence from
randomised controlled trials (RCTs) for each digital program or resource to
facilitate the selection of appropriate digital interventions. In addition to
empirical evidence, other important indicators to guide selection of digital mental
health interventions for clinicians and consumers were identified. Discussions were
structured around identifying appropriate evidence for three types of interventions:
online programs, apps, and Internet support groups (which are outside the scope of
the current paper).
Results
Existing approaches to rating digital health interventions
Existing techniques for assessing digital mental health interventions were
identified by workshop participants, as current approaches may provide more
efficient pathways to developing a rigorous evaluation or accreditation
framework. Some of these approaches focus only on empirical evidence, while
others account for user experience, security of data systems and/or alignment
with clinical treatment guidelines.

Relevant approaches to evaluating digital interventions identified in the workshop include:

- Beacon (https://beacon.anu.edu.au/): Australian-based directory of internationally available digital health programs (web-based, mobile apps), with ratings based on peer-reviewed scientific evidence;
- Psyberguide (https://psyberguide.org/): US-based directory of mental health apps, with ratings on scientific evidence, user experience and clarity of the privacy policy;
- Mobile App Rating Scale (MARS):[17] a tool to rate apps, focusing primarily on usability and engagement but also including an expert rating of content. The scale has potential for adaptation to rate websites;
- NICE Guidelines (https://www.nice.org.uk/guidance): a framework for identifying which therapeutic strategies are likely to be consistent with treatment guidelines.

Additional resources that include listings of digital interventions include:

- headtohealth (https://headtohealth.gov.au): recently developed Australian Government portal for digital and other mental health services, which does not provide an evaluation;
- NHS apps database (https://apps.beta.nhs.uk/): UK-based directory of apps, with inclusion based on multiple inputs, including a technical assessment of data security, DAQ assessments and NICE guidelines;
- National Registry of Evidence-based Programs and Practices (https://nrepp.samhsa.gov): US-based listing of evidence-based programs, including some digital programs (listings are provider-driven and must be accredited).
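To make the dimensions rated by these approaches concrete, the following minimal sketch (in Python; all field names, thresholds and rating cut-offs are our illustrative assumptions, not the actual schema or scoring rules of Beacon, Psyberguide or MARS) shows how a single directory entry might combine an evidence count, a usability rating and a privacy indicator, with a simple Beacon-style heuristic mapping the number of positive RCTs to a coarse evidence rating.

```python
# Illustrative sketch only: field names and cut-offs are assumptions, not the
# actual schema or scoring rules of Beacon, Psyberguide or MARS.
from dataclasses import dataclass

@dataclass
class DirectoryEntry:
    name: str
    url: str
    positive_rcts: int           # peer-reviewed RCTs with a positive outcome (Beacon-style)
    user_experience: float       # 1-5 usability/engagement rating (MARS/Psyberguide-style)
    privacy_policy_clear: bool   # clarity of the privacy policy (Psyberguide-style)

    def evidence_rating(self) -> str:
        """Map RCT count to a coarse rating: more positive RCTs, higher rating."""
        if self.positive_rcts >= 3:
            return "strong"
        if self.positive_rcts >= 1:
            return "some"
        return "none"

entry = DirectoryEntry(name="ExampleProgram", url="https://example.org",
                       positive_rcts=2, user_experience=3.8,
                       privacy_policy_clear=True)
print(entry.name, entry.evidence_rating())  # -> ExampleProgram some
```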
The role and limitations of empirical evidence
The workshop discussion primarily centred on the roles of empirical evidence and
other forms of evidence, and how different forms of evidence might be used to
inform accreditation processes. There was consensus on the majority of
discussion points, except where noted below. Disagreements in the workshop were
not confined to specific stakeholder groups, but primarily occurred within the
researcher (or researcher-clinician) group.

The attendees of the workshop noted the importance of clinicians and community users having access to an up-to-date database of available online interventions,
with the scientific evidence for each intervention described and rated in an
accessible way. Such databases provide clinicians and consumers with up-to-date
information to guide their decisions about interventions that are most likely to
be effective for them and suited to their needs. The Beacon directory assesses
empirical evidence for digital mental health interventions largely on the basis
of the number of RCTs with a positive outcome. Based on this system, more
positive RCTs result in a higher evidence rating. However, RCTs for specific
programs are not the only form of evidence, and there are a number of challenges
and nuances that were discussed when assessing empirical evidence.

RCTs are highly variable and may not be appropriate in some circumstances.[18] Discussion in the workshop noted that the quality of RCTs can vary, and
depends on factors including sample size, randomisation method, blinding, and
type of comparison/control condition. The quality of an RCT may impact on the
results obtained, raising questions as to how to account for low-quality studies
when assessing the evidence base. RCTs can be conducted in a range of settings.
For example, a prevention trial of a self-guided program in schools is likely to
have different expectations and outcomes compared to a treatment trial of a
clinician-guided program delivered in a clinical setting. The type of program
(self-guided versus clinician guided), the type of participants (e.g. healthy
population versus severe clinical population, or specific subgroups defined by
age, gender, ethnicity, etc.), the delivery setting (e.g. online, clinic,
school, community) and the mental health target (e.g. depression, anxiety,
substance use) are all likely to play a role in the outcomes of a trial.
Moreover, as interventions are redesigned, turned into apps, become outdated or are used in ways that differ markedly from trial conditions, the extent to which existing evidence remains valid is unclear. It was also noted in the
workshop that RCTs conducted by an organisation independent of the developers of
the intervention may be viewed by some as more rigorous than RCTs conducted by
the developers. Attendees of the workshop were of the opinion that, following
evidence from RCTs, large-scale effectiveness trials using data obtained from
routine care provide useful information to inform public policy but are rarely
conducted. There was also a consensus that evidence needed for apps was
equivalent to that for online programs.

In addition to empirical evidence for effectiveness, adherence to and engagement
with a digital intervention may be important considerations. Poor adherence may
indicate an intervention is not engaging, and trials with high drop-out may have
biases in the estimation of effects. Interventions that are shown to be
effective in an RCT may still have suboptimal user engagement.[19] However, it remains imperative that the designers of interventions aim to
optimise the user experience, as this is vital for effective, safe and engaging
delivery. To this end, additional data such as adherence rates or consumer
ratings may provide a clearer picture of how engaging a program is likely to be.
It was noted that adherence is a complex outcome, and that early drop-out from
an intervention may indicate a participant has had a positive response and
hence discontinued program use. Furthermore, trials with poor adherence or
small sample sizes are less likely to show a positive effect, as the power to
find a difference in effect is restricted. However, it was also noted that
trials with poor adherence may reflect a poorly designed and ineffective
intervention, emphasising the importance of monitoring outcomes from routine
care.

Safety was discussed extensively during the workshop, and is a critical element
for the identification of digital interventions that are appropriate for public
use. The concept of safety covers both user safety (i.e. assurances that a
digital intervention will not cause harm or increase the likelihood of
deterioration) and digital safety (i.e. privacy and data security). It was noted
that few trials report rates of deterioration, although interventions that are shown to be effective typically have lower rates of deterioration than control conditions.[20] Interventions that are delivered within a service setting may have more
extensive clinical data on user safety but may not have a comparator (i.e. a
control condition) to enable benchmarking of the expected rates of deterioration
without active intervention. It may be possible to assess data privacy and security by seeking information from providers about their policies, platforms and standards for security and for maintaining user privacy. However, independently verifying these claims remains challenging.[21,22]

There are other types of data that may indicate that a digital intervention is
likely to be efficacious or effective, beyond RCTs. Discussions focused on two
forms of data. Firstly, clinical service data from routine care (or relatedly,
other types of empirical studies such as open trials) can be used to support the
effectiveness and safety of an intervention. Programs that are delivered to the
community as a clinical service may have considerable and detailed data on how
specific types of users respond to programs over time.[23] Such services can continuously provide data on usage and clinical
outcomes, can monitor user safety and can be used to identify appropriate
clinical dosage. Such data are essential for determining the actual clinical benefits and risks of an intervention deployed in routine care, information that is vital for funders and planners. Secondly, fidelity to clinical guidelines may provide
further evidence that an intervention is likely to be effective. Ensuring that
program content is consistent with clinical treatment guidelines and does not
include unsupported treatment strategies is one way to provide some reassurance
that an intervention is likely to be useful and safe for users. Indeed, a
minority of attendees (reflecting some of the researchers and clinicians
providing Internet-based services) viewed fidelity to existing evidence-based
programs as sufficient for meeting minimum standards for accreditation, much as
other new (non-digital) clinical services that conform to clinical guidelines
are not expected to undergo RCTs. Fidelity to clinical guidelines or
evidence-based practice could be assessed using an expert clinical review of the
intervention, for example.[9] However, it was noted that a program may be entirely consistent with
clinical guidelines but have poor outcomes, potentially due to low user
engagement (e.g. if a program is too text-heavy).
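The benchmarking point above, that service data without a comparator cannot establish the expected rates of improvement or deterioration in the absence of an intervention, can be illustrated with a toy simulation (a sketch under assumed parameters, using no real data): even with no intervention at all, natural average improvement plus regression to the mean among help-seekers who enrol when their symptoms are high produces a sizeable apparent pre-post change.

```python
# Toy simulation (assumed parameters, no real data): an untreated group shows
# apparent pre-post "improvement", illustrating why uncontrolled service data
# cannot by themselves benchmark expected deterioration or improvement rates.
import random

random.seed(1)
n = 1000
true_severity = [random.gauss(20, 4) for _ in range(n)]     # underlying symptoms
baseline = [t + random.gauss(0, 3) for t in true_severity]  # noisy assessment
# Natural course: modest average improvement over time, with no treatment.
followup = [t - 2 + random.gauss(0, 3) for t in true_severity]

# "Help-seekers" enrol when baseline scores are high, as service users often do.
enrolled = [(b, f) for b, f in zip(baseline, followup) if b >= 24]
mean_change = sum(b - f for b, f in enrolled) / len(enrolled)
print(f"Apparent improvement with no intervention: {mean_change:.1f} points")
# Regression to the mean plus natural course yields a sizeable pre-post 'effect'
# that a naive uncontrolled evaluation could misattribute to the program.
```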
Discussion
The outcomes of the workshop with expert stakeholders, including clinician and
consumer representatives, indicated a number of challenges for the development of
accreditation systems that provide information about the suitability of Internet
interventions based on multiple sources of evidence. There was considerable
agreement among the attendees on issues such as the need for high-quality evidence
of effectiveness, digital safety and user safety, along with demonstration that an
intervention is engaging for its intended audience. There was also agreement that
fidelity to clinical guidelines and data from routine service delivery provide
important indicators of effectiveness and safety, although there was no consensus on
whether such data alone would be sufficient for an accreditation system. There was
also no consensus on the role of RCTs. Trials were seen by all attendees as
providing high-quality evidence but the challenges of conducting timely trials that
reflect real-world use of Internet interventions may limit their applicability and
feasibility, necessitating the use of other forms of evidence. A majority of
attendees viewed programs that do not have RCT data as problematic, as evaluations
without a control group may reflect non-specific effects of an intervention or a
natural course of improving symptoms. Overall, it was not possible to form a
consensus on the best balance between multiple forms of evidence – this balance
would need to be considered within the context of how an accreditation system is
designed and delivered.

Existing directory services such as Beacon that assess empirical evidence using
objective metrics (e.g. number of positive RCTs) are advantageous due to their
transparency, simplicity, objectivity and reliance on high-quality evidence.
However, this approach may not consider aspects of trial quality or other forms of empirical evidence, which may disadvantage interventions with poor adherence, those evaluated using small samples, or those evaluated using non-RCT methods. Reliance on data from RCTs may also be insufficient to demonstrate an intervention's uptake, clinical benefits and risks when implemented at scale as a clinical
service that is available to the public, an important consideration when developing
accreditation systems. There are other important factors to account for in assessing
whether an intervention is likely to be effective and safe. Some interventions may
be effective only for a subgroup of the population or when used in particular ways.
Consumer ratings and adherence rates may provide a guide to how engaging an
intervention is likely to be, an important consideration in recommending
interventions. Many of these factors are being incorporated into emerging
accreditation systems such as the DAQ. However, further consideration of the roles
of different forms of evidence in the development of recommendations to consumers
and clinicians is warranted, taking into account the diverse viewpoints of
developers and end users. In particular, questions remain around the feasibility,
sustainability and impact of accreditation processes and their inclusion of consumer
and clinician evaluations of interventions.
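One quantitative point raised above, that high drop-out or small samples restrict the power of a trial to detect a true effect, can be made concrete with a standard two-sample power calculation (a sketch only; the effect size and sample sizes are assumed for illustration and are not drawn from any cited trial):

```python
# Illustrative power calculation (assumed effect and sample sizes, not values
# from any cited trial): drop-out that shrinks a trial from 100 to 40 analysable
# participants per arm sharply reduces the chance of detecting a real effect.
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower()
d = 0.3  # assumed small-to-moderate standardised effect of the intervention

for n_per_arm in (100, 40):
    p = power.power(effect_size=d, nobs1=n_per_arm, alpha=0.05)
    print(f"n = {n_per_arm:>3} per arm -> power = {p:.2f}")
# Power falls from roughly 0.56 to roughly 0.27: the same true effect becomes
# far more likely to be missed, so a null result says little about the program.
```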
Developing a feasible, sustainable and impactful accreditation or
certification process
In developing an accreditation system for digital interventions, there are a
number of factors that are likely to be important to ensure that the system
provides useful recommendations to clinicians and consumers. At a minimum,
standards should account for some level of empirical evidence that an
intervention is effective and safe for users, along with evidence for data
security, including protections for the privacy of users and transparency around
security policies. Standards for reporting program content[24] and e-health trials,[25] including comparisons of deterioration rates and reporting of adverse
events, would assist in identifying programs that are likely to be safe for
users.

In addition, use of clinical service data[26] and fidelity to evidence-based practice or clinical treatment guidelines[27] are indicators that should be routinely reported. Such data will indicate
whether a program is likely to be appropriate for use in the community and is
delivering what it purports to deliver (e.g. cognitive behaviour therapy for
depression). Clinical service data can establish ongoing positive impact and low
deterioration rates, which may be used as a requirement for ongoing
accreditation of a publicly available service. Expert clinical judgement may
provide evidence for fidelity to clinical guidelines or existing treatment
protocols, although evaluation of fidelity requires an objective rating
framework to be developed and evaluated[24] and does not guarantee effectiveness, safety or acceptability.

User ratings from clinicians and consumers within a curated repository of digital
mental health interventions may also be a valuable metric to assist users to
determine whether a digital intervention might be appropriate and engaging.
There are also many challenges of implementing a user rating system, including
subjectivity of ratings, scope to ‘game’ the system, need for moderation, and
the resources required to set up and maintain a robust and user-friendly rating
system. It should also be noted that user star ratings within general app stores
have been shown to have limited correlation with measures of the clinical
quality of apps.[28] Establishing independent panels of clinician and consumer users to
provide a consistent rating process may overcome some of these challenges. An
efficacious intervention may be ineffective if it is not designed around the
needs of the user.

Indicators for the multiple attributes that may be used to identify an
appropriate digital intervention are likely to be independent (e.g. user
engagement may not always be consistent with effectiveness). Therefore, it may
be challenging to develop overall benchmarks to identify whether a program
should be recommended by an accreditation process. Reporting of individual
indicators may be preferable and allow users to focus on the areas that are most
important to them, with potential for an accrediting agency to establish minimum
standards for each attribute. Accreditation procedures may require flexibility
to assess the total body of evidence available for a digital intervention and
present this evidence in a way that meets the needs of both clinicians and
consumers. This information should include the contexts and specific outcomes
where the digital intervention has documented impact.

Currently, digital interventions operate along a continuum of regulatory
requirements, from therapist-guided interventions that typically must comply
with practitioner regulations, to self-guided apps that typically have no such
requirements. Apps are often developed by laypeople with no oversight or
accountability, or by software companies with limited expertise in mental health
or research methodology. Information on the attributes important to
accreditation may be collected in several ways. The onus could be placed on
providers to demonstrate that they meet minimal criteria for efficacy,
effectiveness when deployed within the targeted population or relevant service
setting, safety and user engagement, or an independent panel could collect this
information. There may be risks in a self-report model, such as a lack of
independence and limited expertise or resourcing. Alternatively, external peer
review models or a combination of methods could be used, although the volume of
interventions available is likely to require limits on the scope of external
reviews. The choice of an accreditation model may require selection of an appropriate business model to ensure that the process is dynamic and that listings remain up to date, followed by building public awareness of, and trust in, the system. Updating the evidence base as programs age will remain a challenge,
although linking evidence updates to an accreditation process may ensure greater
accountability. There also remains a need for sufficient expertise, training and
standard processes to rigorously measure each component of an accreditation
system.
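The reporting approach discussed above, individual indicators with a minimum standard for each attribute rather than one blended score, could be sketched as follows (the attributes and thresholds are purely illustrative assumptions, not proposed accreditation criteria):

```python
# Minimal sketch of per-attribute minimum standards; the attributes and floors
# below are illustrative assumptions, not proposed accreditation criteria.
MIN_STANDARDS = {
    "evidence": 2,             # e.g. at least two positive trials
    "user_safety": 1.0,        # e.g. adverse-event monitoring in place (1 = yes)
    "data_security": 1.0,      # e.g. independent security assessment passed
    "engagement": 3.0,         # e.g. mean user rating on a 1-5 scale
    "guideline_fidelity": 0.8, # e.g. proportion of content matching guidelines
}

def accreditation_report(scores: dict) -> dict:
    """Report each indicator against its own floor; no overall composite score."""
    return {attr: {"score": scores.get(attr, 0),
                   "meets_minimum": scores.get(attr, 0) >= floor}
            for attr, floor in MIN_STANDARDS.items()}

report = accreditation_report({"evidence": 3, "user_safety": 1, "data_security": 1,
                               "engagement": 2.6, "guideline_fidelity": 0.9})
for attr, result in report.items():
    print(attr, result)
# This hypothetical program fails only on engagement despite strong evidence,
# which individual-indicator reporting surfaces and a single aggregate would hide.
```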
Limitations
A limitation of this paper is that other types of digital support interventions, such
as Internet support groups and chat-based therapy, may be more difficult to
incorporate into an accreditation process, as it may be challenging to assess
clinical outcomes in these settings and standardised delivery cannot be guaranteed.
The marketing and promotion of directories or accreditation systems is an additional
challenge, as existing evidence-based portals compete with app stores, search
engines, and other established and well-funded sources of information that neglect
quality indicators. The inclusion of a variety of stakeholders in the workshop was a
strength, although it was not feasible to include a wide selection of consumers or
carers, overseas experts, or other specialists such as information technology experts,
who may have divergent viewpoints. Further investigation of how consumers, carers
and clinicians weigh different forms of evidence is warranted.
Conclusions
Empirical evidence for effectiveness should be the cornerstone of any accreditation
or directory system that identifies appropriate digital mental health interventions.
Although RCTs remain the strongest evidence that a program is efficacious, they may
not provide evidence of effectiveness. Furthermore, there are limitations to the use
of RCTs[7,18] and
limitations in the application of trial evidence to the delivery of interventions as
a clinical service. Ideally, RCT evidence should also be supported by evidence for
effectiveness from large-scale pragmatic effectiveness trials in the relevant
population or clinical settings, which enable regular reporting of outcomes that are
relevant to users, clinical services and funding organisations. Robust accreditation
systems should also account for program and user safety, user engagement and
experience, and fidelity to clinical treatment guidelines. The key outcomes and
indicators that go into any evaluation of existing digital interventions should be
transparent, systematic and objective.