Literature DB >> 30681967

The App Behavior Change Scale: Creation of a Scale to Assess the Potential of Apps to Promote Behavior Change.

Fiona H McKay¹, Sarah Slykerman¹, Matthew Dunn¹.

Abstract

BACKGROUND: Using mobile phone apps to promote behavior change is becoming increasingly common. However, there is no clear way to rate apps against their behavior change potential.
OBJECTIVE: This study aimed to develop a reliable, theory-based scale that can be used to assess the behavior change potential of smartphone apps.
METHODS: A systematic review of all studies purporting to investigate app's behavior change potential was conducted. All scales and measures from the identified studies were collected to create an item pool. From this item pool, 3 health promotion exerts created the App Behavior Change Scale (ABACUS). To test the scale, 70 physical activity apps were rated to provide information on reliability.
RESULTS: The systematic review returned 593 papers, the abstracts and titles of all were reviewed, with the full text of 77 papers reviewed; 50 papers met the inclusion criteria. From these 50 papers, 1333 questions were identified. Removing duplicates and unnecessary questions left 130 individual questions, which were then refined into the 21-item scale. The ABACUS demonstrates high percentage agreement among reviewers (over 80%), with 3 questions scoring a Krippendorff alpha that would indicate agreement and a further 7 came close with alphas >.5. The scale overall reported high interrater reliability (2-way mixed interclass coefficient=.92, 95% CI 0.81-0.97) and high internal consistency (Cronbach alpha=.93).
CONCLUSIONS: The ABACUS is a reliable tool that can be used to determine the behavior change potential of apps. This instrument fills a gap by allowing the evaluation of a large number of apps to be standardized across a range of health categories. ©Fiona H McKay, Sarah Slykerman, Matthew Dunn. Originally published in JMIR Mhealth and Uhealth (http://mhealth.jmir.org), 25.01.2019.

Entities: Chemical Disease Gene Species

Keywords: apps; mobile app; mobile phone; rating; scale development; smartphone

Mesh：

Year: 2019 PMID： 30681967 PMCID： PMC6367670 DOI： 10.2196/11130

Source DB: PubMed Journal: JMIR Mhealth Uhealth ISSN： 2291-5222 Impact factor: 4.773

Introduction

The delivery of psychological and public health interventions through technology is becoming an increasingly common way to prevent illness and promote health. Smartphones and tablets are well positioned to play a role in such interventions as they offer functionalities and opportunities for personalization through the widespread availability of a range of mobile phone apps [1]. Apps play an important role in the management of illness and are a low-cost, easy avenue for the promotion of health and well-being [2-4]. In 2017, there were 325,000 health apps across the 2 most common app platforms: Google Play and iTunes [5]. This includes apps that have been developed to assist patients in the management of a range of diseases and conditions, including diabetes mellitus type 1 or 2 [1,6], pain management [7,8], the promotion of increased physical activity [9,10], improve nutrition [11,12], and the promotion of improved mental health [13,14]. Although research investigating mobile phone–based technology over recent years has shown that short message service (SMS) text message–based interventions can have a positive impact on sexual health knowledge [15] and that most health interventions can benefit from some form of phone-based activity [16], research into the effectiveness of health behavior change through apps is in its infancy, and there is no clear consensus in the research around which specific features of apps can assist in behavior change. Content analyses of apps have identified some features that may promote health behavior change in apps for smoking cessation [17], alcohol reduction [18,19], and physical activity [20,21]. However, most apps only contain a few features that could be considered to have the potential to change behavior [22]. Features that have been found to promote health behavior change include the ability to provide direct advice about behavior change and track behaviors [17] or provide information on the consequences of continuing with the behavior [19]. Conversely, those studies that have found apps to be lacking in health behavior change features have highlighted the absence of individual tailoring such as personalized notifications or the collection of background information, for example, using global position system data to identify when a person might be at a high-risk area for alcohol use [18] or simply asking a user to set a smoking quit date [17]. Studies that report on user outcomes or experiences of apps have had similarly mixed results. One systematic review that investigated the role of apps and other digital media in physical activity and diet as it relates to cancer survivorship found an overall increase in minutes of physical activity with use of the app, but mixed evidence for improved diet, and no improvement for secondary outcomes such as a reduction in anxiety or depression [23]. A recent study investigating the role of apps in improving mental health found that after 30 days of app use, mental well-being improved in those using 1 of 3 mental well-being apps tested and those using 1 of the 3 apps tested showed improvements in depression. None resulted in improvements in anxiety [24]. A systematic review and meta-analysis of studies that employed a smartphone app to increase physical activity found that the use of apps could result in significant changes to body weight and body mass index; however, nonsignificant results were identified in changes to physical activity [25]. Alongside this growing body of interest in the identification of apps that may play a role in behavior change [26,27] is an increasing body of research that seeks to first understand the features of apps that may play a role in behavior change and then to measure and classify these features [28-30]. Common among these studies is an aim to identify features that employ best practice to allow health practitioners to better inform consumers and patients of the apps most suited to their needs. The ability of practitioners to give this advice is predicated on the ability of researchers to effectively classify and evaluate apps suitable for the most common health conditions through a reliable and valid measurement tool. As described by McKay et al [31], many studies investigating the potential of apps to change behaviors have employed a behavior change taxonomy (either the CALO-RE or 26 or 93 item taxonomy) for the rating and categorization of apps [10,20-22]. The aim of the systematic review undertaken by McKay et al [31] was to investigate ways in which researchers evaluate the potential health behavior change of apps to identify any current best practice approaches. Instruments identified in the review were created to investigate the behavior change potential of Web- and text-based health interventions [32]. The techniques present in these instruments have been identified in a range of studies and then linked back to behavior change potential. Most notable are by Abraham and Michie [33], who suggested a number of behavior change techniques common to many health behavior theories. Michie et al [34] identified 5 techniques present in physical activity and dietary interventions: self-monitoring, intention formation, specific goal setting, review of behavioral goals, and feedback on performance, finding that interventions that included self-monitoring with at least one other technique were responsible for the largest effect size [34]. These findings are supported by other work suggesting that self-monitoring is useful for increasing physical activity and improving diet for those who were overweight with comorbidities [35], with other work suggesting that self-monitoring is one of the strongest predictors of weight loss [36] and can also assist in decreasing alcohol consumption [37]. App-based studies that have employed these taxonomies have found apps to be lacking in the identified characteristics of a good behavior change intervention. For example, in an investigation of 166 apps that encourage medical adherence against 93 behavior change techniques, Morrissey et al [22] found most apps contained between 0 and 7 techniques, with the most common technique identified being action planning, where users are able to set a reminder to take medication at a specific time every day, and set prompts or cues, typically through the setting of an alarm. A total of 2 studies investigated physical activity but found few techniques for behavior change. Direito et al [21] found that most apps contained 8 techniques, most frequently providing instruction, setting graded tasks, and employing self-monitoring, whereas Conroy et al [10] identified 4 or fewer techniques in the physical activity apps they reviewed. As more practitioners begin to recommend apps to patients for a range of health care needs [38,39], it becomes essential that we have a valid and reliable way to evaluate these apps. Although both valid and reliable, the taxonomies of behavior change theory [33] were designed to evaluate the features of text and Web-based interventions [32,40,41], not for the review of apps. For instance, these taxonomies often feature a large number of items that are closely related, and are theoretically important in behavior change theory, but will often only appear once in an app. For example, the behavior change taxonomy used by Morrissey et al [22] includes 93 items, with each item allocated a score of 1 if present and 0 if absent. Many of these items are similar, for example, there are 11 items categorized as reward (including material incentives, material rewards, and nonspecific rewards), all of which are classified separately. For most apps, only one of these items would be present, thus although an app may offer rewards and the benefits that they bring to behavior change, they only offer 1 type means that app would receive a low score in that behavior change category. With increasing knowledge and the growing body of research into app-based interventions, there is a clear need for a purpose-designed app rating system to identify the potential for health behavior change. Although there is 1 scale, the Mobile App Rating Scale (MARS), that is able to describe the functionality of apps, including aesthetics and information shared [42], there is currently no scale that can measure the potential for behavior change. Over the past 3 years, the 3 authors of this study have been involved in rating and reviewing apps for the Victorian Health Promotion Foundation (VicHealth) Healthy Living Apps project [43]. The VicHealth Healthy Living Apps project is an annual rating activity using the MARS [42] and CALO-RE [40] scales to provide consumers with a guide to which apps may assist them best in promoting health. This project typically sees up to 400 apps rated annually for their functionality and ability to encourage or promote behavior change in 1 of the following 5 categories: healthy eating, physical activity, tobacco prevention, alcohol harm prevention, and mental well-being. These categories have been chosen as they form the key priority areas of VicHealth and, therefore, are those that are investigated in the VicHealth Healthy Living Apps project [43]. This experience has made clear to the authors that a purpose-designed scale to measure the health behavior change potential is needed for any app review that seeks to recommend apps to the public. This study aims to develop a reliable, theory-based scale that can be used to assess the behavior change potential of smartphone apps.

Methods

Study Design

The creation of this scale occurred in 4 phases. Phase 1 included a systematic review to identify all scales that have been used to rate the potential of an app to encourage behavior change. Results from this phase were analyzed and developed into a draft tool. Phases 2 to 4 consisted of series of deductive tests. The results of each round of testing were analyzed and incorporated into the next version of the scale until the team could be confident of reliability and validity of the scale. The final version of the scale was shared with a panel of experts for comment and feedback (see Figure 1 for an overview of the study procedure).

Figure 1

Study procedure. ICC: interclass coefficient; ABACUS: App Behavior Change Scale.

Phase 1: Systematic Review to Develop Initial Item Pool

A systematic search of the literature was conducted to gather all published evidence relating to the various ways that apps have been evaluated for behavior change potential to develop an item pool. This search was based on and extended a previous systematic review [31]. A total of 5 databases (Academic Search Complete, CINAHL Complete, E-Journal, MEDLINE Complete, and PsycINFO) were systematically searched. The search was completed on November 17, 2017, with no temporal limitations placed on the search. The search was limited to studies focusing on mobile phones, smartphones, cell phones, and tablets; used apps; and focused on health behaviors previously investigated in the VicHealth Healthy Living Apps project [43]. Search terms were health, wellbeing, preventative health, smok*, nutrition, alcohol, physical activity, or mental wellbeing. The inclusion criteria comprised studies that evaluated mobile health apps in English, evaluations or reviews of apps targeted at consumers, alone or in addition to health professionals, and studies that evaluated the effectiveness of mobile health apps. Study procedure. ICC: interclass coefficient; ABACUS: App Behavior Change Scale. Excluded studies comprised those that evaluated mobile health apps targeted only at health professionals, formative evaluations of mobile health apps, protocols for evaluations, apps that were not publicly or commercially available, studies that reported primarily on the validation of any mobile health app tool (eg, the MARS), and studies of apps not related to health behavior change. The papers were first screened by title and abstract based on the inclusion and exclusion criteria. The full texts of selected papers were then obtained for further assessment for final inclusion.

Phase 2: Face and Content Validity

The initial version of this scale was pilot-tested with 3 physical activity apps. The pilot testing was conducted by 2 experienced reviewers (FM and SS) and allowed the raters to (1) become familiar with the scale and (2) refine the wording of items and create item descriptors and examples. Following this pilot, the ABACUS version 1 was used to rate the 3 highest rating apps from each of the 5 categories (15 apps in total) from the VicHealth Healthy Living Apps project [43]. To undertake this testing, the reviewers downloaded and became familiar with each app. Similar to other studies [28,42,44], the authors spent approximately 10 to 15 min testing all app features before rating. After the apps were rated, the raters met to discuss the app and the allocated score as a way to achieve agreement among raters and strengthen the scale. This discussion allowed for an identification of the similarities and differences in rating and, importantly, the strengths and weakness of each question in the scale, including clarity and specificity. During this process, the raters added and refined descriptors and examples for each item.

Phase 3: Reliability Analysis

The ABACUS version 2 was used by 3 raters (FM, SS, and MD) to review 50 physical activity apps. Physical activity apps were chosen for this phase not only because there are a large number of physical activity apps in the Apple iTunes store, providing a large choice for consumers but also because past reviewing [43] suggests that they represent a wide range in app quality. Apps were downloaded from the app store, and mirroring testing in phase 2, the authors spent approximately 10 min to 15 min reviewing all the features of the app. Reliability of the scale was assessed using Krippendorff alpha. This allows for rating of ordinal data, can be used with an unlimited number of raters, and has been found to be superior to Cohen kappa [45,46]. Consistent with previous research, an alpha of more than .67 is used to indicate agreement [47], whereas, a negative alpha indicates less agreement than that would be expected by chance and suggests that there may have been inconsistencies in how measures were applied [48]. The internal consistency of the scale was calculated using Cronbach alpha. Interrater reliability was determined by interclass coefficient (ICC) [49]. Percentage agreement was also calculated.

Phase 4: Reliability Analysis 2

To investigate the discrepancies identified in phase 3, the same 3 raters (FM, SS, and MD) rated 5 unrated physical activity apps together. The apps were rated 1 at a time allowing for discussion of the results and for clarification of problem areas, specifically in item descriptions and examples. At the completion of this further moderation activity, an additional 20 apps were independently reviewed against ABACUS version 2, following the same procedure as phase 3.

Results

Phase 1

The search identified 593 unique papers. The abstracts and titles of all papers were reviewed, leaving 77 papers for full-text review. This review resulted in 50 papers that fully met the inclusion criteria and were included in this study (the list of resources is available in Table 1). To determine current best or common practice in app reviewing for behavior change, all scales used in the 50 papers identified were collected. For scales that were not provided as part of the manuscript or as a supplemental material, institution and academic sharing websites (such as Research.net) were searched. If the scale was not able to be located, the authors were emailed and a copy was requested. Only 2 scales [50,51] were unable to be obtained as the author had either moved on from that institution or there was no response to the email.

Table 1

Types and methods of evaluation.

Method	Location	Health condition	Reference
Data usage and user feedback	United Kingdom	Alcohol	Attwood et al [52]
Established evaluation checklist (Abraham and Michie 2008) [33]	New Zealand	Physical activity and dietary	Direito et al [21]
Established evaluation checklist (Abraham and & Michie 2008) [33]	The Netherlands	Health and Fitness	Middelweerd et al [20]
Established evaluation checklist (Abraham and Michie 2008) [33]	United States	Cancer	Vollmer et al [53]
Established evaluation checklist (CALO-RE)	United States	Physical activity	Conroy et al [10]
Established evaluation checklist (MARS^a and Abroms, 2013 checklist)	Australia	Smoking	Thornton et al [28]
Established evaluation checklist (MARS)	New Zealand	Weight loss and smoking cessation	Patel et al [54]
Established evaluation checklist (MARS)	New Zealand	Travel and dietary behavior associated with health and environmental impact	Sullivan et al [44]
Established evaluation checklist (MARS) and self-developed evaluation checklist based on literature review	United States	Weight management	Bardus et al [55]
Established evaluation checklist (Michie et al) [32]	Ireland	Medication adherence	Morrissey et al [22]
Matched case-control trial	Australia	Physical activity	Kirwan et al [56]
Not discussed	Spain and United Kingdom	Iron-deficiency anemia, hearing loss, migraine, low vision, asthma, diabetes mellitus, osteoarthritis, and unipolar depressive disorders	Martínez-Pérez et al [2]
Self-developed checklist based on diabetes guidelines	United States	Diabetes	Nie et al [58]
Self-developed checklist based on epilepsy guidelines	Australia	Epilepsy	Pandher et al [59]
Self-developed checklist, established evaluation checklist (system usability scale)	United States	Chronic illness	Singh et al [60]
Self-developed evaluation checklist	Unites States	Smoking cessation	Abroms et al [61]
Self-developed evaluation checklist	Germany	Diabetes	Arnhold et al [1]
Self-developed evaluation checklist	United States	Weight management	Azar et al [4]
Self-developed evaluation checklist	Canada	Cancer	Bender et al [62]
Self-developed evaluation checklist	South Korea	Smoking cessation	Choi et al [63]
Self-developed evaluation checklist; user feedback	Norway	Diabetes	Chomutare et al [6]
Self-developed evaluation checklist; user feedback	United States	Alcohol	Cohn et al [50]
Self-developed evaluation checklist	United States	Diabetes and endocrinology	Eng et al [64]
Self-developed evaluation checklist	Unites States	Smoking cessation	Hoeppner et al [17]
Self-developed evaluation checklist	United Kingdom	Asthma	Huckvale et al [3]
Self-developed evaluation checklist	Canada	Headache	Hundert et al [65]
Self-developed evaluation checklist	United Kingdom	Melanoma	Kassianos et al [66]
Self-developed evaluation checklist; user feedback	United States	Hypertension	Kumar et al [67]
Self-developed evaluation checklist	Spain	Heart disease	Martínez-Perez et al [57]
Self-developed evaluation checklist; user feedback	United Kingdom	Breast cancer	Mobasheri et al [68]
Self-developed evaluation checklist	Australia	Bipolar disorder	Nicholas et al [69]
Self-developed evaluation checklist	Italy	Hearing	Paglialonga et al [70]
Self-developed evaluation checklist	United States	Weight-loss	Pagoto et al [71]
Self-developed evaluation checklist; user feedback	United States	Cancer	Pandey et al [72]
Self-developed evaluation checklist; user feedback	Spain	Mindfulness	Plaza et al [73]
Self-developed evaluation checklist	United States	Mental health	Radovic et al [74]
Self-developed evaluation checklist; user feedback	United Kingdom	Pain	Reynoldson et al [8]
Self-developed evaluation checklist	Spain	HIV	Robustillo et al [75]
Self-developed evaluation checklist	United States	Health and wellness	Sama et al [76]
Self-developed evaluation checklist	Canada	Depression	Shen et al [77]
Self-developed evaluation checklist	United Kingdom	Smoking cessation	Ubhi et al [78]
Self-developed evaluation checklist; established evaluation checklist; user feedback	United States	Pediatric obesity	Wearing et al [79]
Self-developed evaluation checklist; user feedback	Australia	Alcohol	Weaver et al [80]
Self-developed evaluation checklist; user feedback	United States	Physical activity	Yang et al [81]
Self-developed evaluation checklist based on literature review	United States	Suicide prevention	Aguirre et al [82]
Self-developed evaluation checklist; established evaluation checklist (MARS)	United States	Pediatric medication adherence	Nguyen et al [83]
User feedback	Ireland	Physical activity	Casey et al [9]
User feedback	United Kingdom	Women’s health	Derbyshire and Dancey [84]
User feedback	United States	Smoking	Ferron et al [85]
User feedback	Spain	Type 2 diabetes, obesity, and breast-feeding	García-Gómez et al [51]

aMARS: Medication Adherence Rating Scale.

The scales identified in this systematic review were collated into a single document resulting in 1333 items (see Multimedia Appendix 1), with duplicates and questions present in the MARS removed, leaving 130 individual items (see Multimedia Appendix 1). Moreover, 2 authors (FM and SS) experienced in health promotion and health promoting apps reviewed the item pool. These authors had participated in the VicHealth Healthy Living Apps project [43], and each was experienced in rating hundreds of apps. The item pool was reviewed to identify or create items that were clear and based on previous work by these authors would be present in the highest quality apps [43]. From the 130 items, similar items were collapsed, for example, items that sought to identify avoidance or were collapsed with items that sought to minimize distraction; items that were presented as statements or single words were reworked into questions for ease of use. For example, 1 item that read “discrepancy between current behaviour and goal” was reworked to read “Does the app give the user the ability to quickly and easily understand the difference between current action and future goals?” This process resulted in an initial version of this scale, with 33 items that were categorized into 7 groups: (1) general, (2) goals, (3) feedback and monitoring, (4) knowledge and information, (5) actions, (6) rewards, and (7) environmental factors. These items formed the first version of the scale, the App Behavior Change Scale (ABACUS) version 1. Types and methods of evaluation. aMARS: Medication Adherence Rating Scale.

Phase 2

This process resulted in the removal of 9 questions that were deemed to be unclear or were found to be duplicates or unnecessary. For example, in the initial scale, there were 4 separate items that outlined behavior costs, rewards, and encouragement. The authors’ experience rating several hundreds of apps over a number of years, combined with the initial round of reviewing and discussion of this scale, allowed for the determination that more than one of these items were unlikely to be in the same app. As a result, these items were collapsed into 1 item: “Does the app provide general encouragement?” Other items were also removed at this point as it was determined that these questions were not relevant to behavior change, for example, a question about whether the app could be used without internet connection and 2 questions about expertise and consistency with national guidelines were collapsed into 1 question: “Was the app created with expertise and/or Does the app provides information that are consistent with national guidelines?” The resulting scale contained 24 items. Following this, the scale was tested with 15 apps, 3 from each category: physical activity; healthy eating; alcohol; smoking; and mental well-being. Again, this process allowed for a refinement of the scale and resulted in several changes, including clarifying words and descriptors, reordering items, and combining other items, for example, 3 items relating to material, social, and self-reward or incentive were collapsed into a single item: “Does the app provide a material or social reward or incentive?” The authors’ experience rating apps lead to the conclusion that it would unlikely that any 1 app would have more than 1 incentive or reward. Phase 2 resulted in the 22-item ABACUS version 2 with questions categorized into the following 4 categories: knowledge and information, goals and planning, feedback and monitoring, and actions. At this stage, the ABACUS version 2 was sent to 7 external experts for their comment on content. These experts included 3 experts on mental well-being, 1 expert on alcohol and tobacco, 1 on physical activity, 1 on behavioral science, and 1 on health promotion. These experts were able to offer suggestions on language and terminology used, resulting in refinement of terminology and descriptors. For example, one of the reviewers suggested that the descriptor of item 1.4 (Does the app provide instruction on how to perform the behavior?) also includes video instructions (the app is clear in telling the person how to perform a behavior or preparatory behaviors, either verbally, through video, or in written form. Please note, the behavior that is seeking to be changed, not information on how to use the app). This version of the scale is presented in Table 2.

Table 2

Percentage agreement and reliability of App Behavior Change Scale version 2.

Item #	Measure	Phase 3 (50 apps)
Item #	Measure	Interrater reliability (Krippendorff alpha)	Percent agreement
1.1	Customize and personalize features	−.010	56
1.2	Consistent with national guidelines or created with expertise	.25	88
1.3	Baseline information	.45	73
1.4	Instruction on how to perform the behavior	.79	91
1.5	Information about the consequences of continuing and/or discontinuing behavior	.21	92
2.1	Willingness for behavior change	−.01	97
2.2	Goal setting	.22	83
2.3	Review goals, update, and change when necessary	.33	75
3.1	Understand the difference between current action and future goals	.60	84
3.2	Self-monitor behavior	.53	81
3.3	Share behaviors with others and/or allow for social comparison	.30	65
3.4	User feedback (in person or automatically)	.12	88
3.5	Export data	.16	77
3.6	Material or social reward or incentive	.19	66
3.7	General encouragement	.23	65
4.1	Reminders and/or prompts or cues for activity	.23	61
4.2	Encourage positive habit formation	.11	71
4.3	Practice or rehearsal, in addition to daily activities	−.01	51
4.4	Opportunity to plan for barriers	−.01	97
4.5	Restructuring the physical or social environment	−.01	97
4.6	Distraction or avoidance	−.02	95

Phase 3

Phase 3 testing was conducted with 50 physical activity apps downloaded from the app store. All apps were rated independently by 3 reviewers against the ABACUS version 2, with ratings entered into Qualtrics to minimize user error. This phase found half of the questions to have high percentage agreement among reviewers (over 80%) with the scale overall reporting moderate interrater reliability (2-way mixed ICC=.69, 95% CI 0.52-0.82) and moderate internal consistency (Cronbach alpha=.71). However, some questions reported very low agreement. For example, question 4.3 “Does the app allow or encourage for practice or rehearsal, in addition to daily activities?” returned only an agreement of 51% with a negative Krippendorff alpha (alpha=−.01). Several other questions showed similarly low scores (see Table 2), and only 1 question achieved an alpha that would indicate agreement. These results prompted an additional round of discussion, and comparison was undertaken.

Phase 4

The initial discussion resulted in the collapsing of 2 goal questions into 1 from “Does the app allow for the setting of outcome (long-term) goals?” and “Does the app have the ability to set short and medium-term goals or a plan?” to “Does the app allow for the setting of goals?” Furthermore, a number of descriptors were reworded, and examples were provided for all questions. These changes resulted in ABACUS version 3 containing 21 questions (see Table 3 for final version of the scale).

Table 3

Final app behavior change scale, including examples.

Scale: item number and question			Definition	Example or further information	Source of question (from Table 1)
1. Knowledge and information
	1.1	Does the app have the ability to customize and personalize some features?	Elements of the app can be personalized through specific tools or functions that are specific to the individual using the app.	To select a disease type from among several available and then to follow a specific path or set of tools or systems. To select to receive emails or texts of a specific nature. To choose “yes” or “no” to a specific capability of the app would be considered personalization. To create a personalized exercise plan.	[44,54]
	1.2	Was the app created with expertise and/or Does the app provide information that is consistent with national guidelines?	This would be found in the about section or generally in the app.	Does the app suggest 30 min of exercise each day? Does it recommend 5 veg and 3 fruit? Does it seek to build resilience and promote help seeking? Is there any evidence that the app was created by an expert? (doctor/professional body/university)	[44,54]
	1.3	Does the app ask for baseline information?	This includes BMI^a, weight, smoking rate, exercise, or drinking behaviors	This might be at the set-up phase or in a profile setting.	[28,85]
	1.4	Does the app provide instruction on how to perform the behavior?	The app is clear in telling the person how to perform a behavior or preparatory behaviors, either verbally, through video, or in written form.NB: the behavior that is seeking to be changed, not information on how to use the app	This could include showing person how to use gym equipment, sharing sample plans for action, instruction on suitable clothing, recipes, and general tips.	[20,21,22,81]
	1.5	Does the app provide information about the consequences of continuing and/or discontinuing behavior?	The app gives the user information about the consequences of behavior in general, this includes information about the relationship between the behavior and its possible or likely consequences in the general case. This information can be general or personalized.	Consequences may include health, feelings, or cost consequences.	[22,81]
2. Goals and planning
	2.1	Does the app ask for willingness for behavior change?	Is there a feature during setup where you describe how ready you are for behavior change?	This may be in the form of a scale of readiness or in a question that asks the user to describe how ready you are.	[17,85]
	2.2	Does the app allow for the setting of goals?	The person is encouraged to make a behavioral resolution.The person is encouraged to set a general goal that can be achieved by behavioral means. This includes subgoals or preparatory behaviors and/or specific contexts in which the behavior will be performed. The behavior in this technique will be directly related to or be a necessary condition for the target behavior.	This is the explicit noting of a goal or choosing a goal from one provided within the app.	[20,21,40,44,54,55,81]
	2.3	Does the app have the ability to review goals, update, and change when necessary?	Involves a review or analysis of the extent to which previously set behavioral goals (regardless of short or long) were achieved.	This is where a goal can be changed. This allows people to act on previously set goals and then revise or adjust where needed.	[22,40,81]
3. Feedback and monitoring
	3.1	Does the app give the user the ability to quickly and easily understand the difference between current action and future goals?	Allows user to see how they are tracking against a goal and to see the difference between what they want to do and what they are currently doing. This will give some feedback on where they are at and what they need to change to get to where they want to be.	This could be in the form of a graph or some other visual describing how close the user is to meeting their goals.	[22,40,81]
	3.2	Does the app have the ability to allow the user to easily self-monitor behavior?	The app allows for a regular monitoring of the activity.	Connects with watch that records daily steps that can be reviewed. Allows for easy logging of exercise or meditation? Allows for tracking of weight loss. Allows logging of daily alcoholic drinks or cigarettes.	[20,21]
	3.3	Does the app have the ability to share behaviors with others (including social media or forums) and/or allow for social comparison?	The app allows the person to share his or her behaviors on social media or in forums. This could also include a buddy system or a leaderboard.	Share with Facebook or other socials Tell the user that they are doing x and at this time, other people like them are doing y	[4,20,21,22,85]
	3.4	Does the app have the ability to give the user feedback—either from a person or automatically?	The app is able to provide the person with feedback, comments, or data about their own recorded behavior. This might be automatic or could be personal.	Does the app have a coach function?	[22,40,81]
	3.5	Does the app have the ability to export data from app?	The app allows for the export of information and progress to an external user.	Export to a computer or to another user such as a doctor or fitness expert. Sharing to Facebook does not count.	[65]
	3.6	Does the app provide a material or social reward or incentive?	App provides rewards for attempts at achieving a behavioral goal. This might include efforts made toward achieving the behavior or progress made in preparatory steps toward the behavior or in achieving a goal.	Financial, either in returning money that was not spent on, for example, cigarettes or in paying someone to engage in a specific activity. Social or public, for example, congratulating the person for each day that he or she meets his or her exercise target.	[22,40,81]
	3.7	Does the app provide general encouragement?	The app provides general encouragement and positive reinforcement on actions leading to the goal.	This could include achievement badges or telling the user that they are a certain percentage closer to their goal.	[22,40,81]
4. Actions
	4.1	Does the app have reminders and/or prompts or cues for activity?	The app prompts the user to engage in the activity. The app has the ability to give notifications or reminders to cue the behavior.	This could be like the apple watch reminding you to stand or a meditation app telling you to meditate now.	[20,21]
	4.2	Does the app encourage positive habit formation?	The app prompts explicit rehearsal and repetition of the behavior–not just tracking or logging.	An example of this are the couch to 5 km apps that provide a training schedule.	[21,22,81]
	4.3	Does the app allow or encourage for practice or rehearsal, in addition to daily activities?	App does not have a lock on activities or a number that you cannot exceed daily.	This would include allowing the user to undertake extra activities in a single day.	[20,21]
	4.4	Does the app provide opportunity to plan for barriers?	The app encourages the person to think about potential barriers and identify ways of overcoming them.	Alcohol app might give strategies for a night out that would normally be a big night.	[55]
	4.5	Does the app assist with or suggest restructuring the physical or social environment?	The app prompts the person to alter the environment in ways so that it is more supportive of the target behavior.	Might suggest locking up or throw away or their high-calorie snacks or take their running shoes to work.	[21,22,81]
	4.6	Does the app assists with distraction or avoidance?	The app gives suggestions and advice on how the person can avoid situations or distract themselves when trying to reach their goal.	For example, a smoking cessation app may suggest that the user not drink coffee if this is typically combined with smoking behaviors that they are trying to cease.	[21,22,81]

This round of rating found over 80% of questions to have high percentage agreement among reviewers, with 3 questions scoring a Krippendorff alpha indicating agreement and a further 7 came close with alphas more than .5. The scale overall reported high interrater reliability (2-way mixed ICC=.91, 95% CI 0.81-0.97) and high internal consistency (Cronbach alpha=.93; see Table 4).

Table 4

Percentage agreement and reliability of App Behavior Change Scale version 3.

Item #	Measure	Phase 4 (20 apps)
Item #	Measure	Interrater reliability (Krippendorff alpha)	Percent agreement
1.1	Customize and personalize features	.52	83
1.2	Consistent with national guidelines or created with expertise	.73	83
1.3	Baseline information	.79	90
1.4	Instruction on how to perform the behavior	.63	87
1.5	Information about the consequences of continuing and/or discontinuing behavior	−.02	93
2.1	Willingness for behavior change	0	97
2.2	Goal setting	.58	83
2.3	Review goals, update, and change when necessary	.38	80
3.1	Understand the difference between current action and future goals	.34	80
3.2	Self-monitor behavior	.62	83
3.3	Share behaviors with others and/or allow for social comparison	.73	87
3.4	User feedback (in person or automatically)	.26	67
3.5	Export data	.43	87
3.6	Material or social reward or incentive	.15	60
3.7	General encouragement	.54	77
4.1	Reminders and/or prompts or cues for activity	.61	80
4.2	Encourage positive habit formation	.28	63
4.3	Practice or rehearsal, in addition to daily activities	.05	80
4.4	Opportunity to plan for barriers	.31	93
4.5	Restructuring the physical or social environment	.57	93
4.6	Distraction or avoidance	1	100

Percentage agreement and reliability of App Behavior Change Scale version 2. Final app behavior change scale, including examples. To select a disease type from among several available and then to follow a specific path or set of tools or systems. To select to receive emails or texts of a specific nature. To choose “yes” or “no” to a specific capability of the app would be considered personalization. To create a personalized exercise plan. Does the app suggest 30 min of exercise each day? Does it recommend 5 veg and 3 fruit? Does it seek to build resilience and promote help seeking? Is there any evidence that the app was created by an expert? (doctor/professional body/university) This might be at the set-up phase or in a profile setting. This could include showing person how to use gym equipment, sharing sample plans for action, instruction on suitable clothing, recipes, and general tips. Consequences may include health, feelings, or cost consequences. This may be in the form of a scale of readiness or in a question that asks the user to describe how ready you are. This is the explicit noting of a goal or choosing a goal from one provided within the app. This is where a goal can be changed. This allows people to act on previously set goals and then revise or adjust where needed. This could be in the form of a graph or some other visual describing how close the user is to meeting their goals. Connects with watch that records daily steps that can be reviewed. Allows for easy logging of exercise or meditation? Allows for tracking of weight loss. Allows logging of daily alcoholic drinks or cigarettes. Share with Facebook or other socials Tell the user that they are doing x and at this time, other people like them are doing y Does the app have a coach function? Export to a computer or to another user such as a doctor or fitness expert. Sharing to Facebook does not count. Financial, either in returning money that was not spent on, for example, cigarettes or in paying someone to engage in a specific activity. Social or public, for example, congratulating the person for each day that he or she meets his or her exercise target. This could include achievement badges or telling the user that they are a certain percentage closer to their goal. This could be like the apple watch reminding you to stand or a meditation app telling you to meditate now. An example of this are the couch to 5 km apps that provide a training schedule. This would include allowing the user to undertake extra activities in a single day. Alcohol app might give strategies for a night out that would normally be a big night. Might suggest locking up or throw away or their high-calorie snacks or take their running shoes to work. For example, a smoking cessation app may suggest that the user not drink coffee if this is typically combined with smoking behaviors that they are trying to cease. Percentage agreement and reliability of App Behavior Change Scale version 3.

Discussion

Principal Findings

This study reports on the creation of a scale (ABACUS) to measure the potential behavior change of smartphone apps. After conducting a systematic review to identify all research that has evaluated apps for behavior change, 133 items were identified and later modified after expert review to a final set of 21 items. The items within the scale are grouped into the following 4 categories: knowledge and information, goals and planning, feedback and monitoring, and actions. The ABACUS was reviewed by an expert panel and then tested first against 50 physical activity apps; however, because of concerns relating to moderate internal consistency and interrater reliability, an additional step of moderation was taken. This moderation saw the same raters come together to refine the scale, resulting in improved descriptors and the inclusion of examples for each question. Following this revision, the scale was used to rate an additional 20 apps. This round of ratings resulted in a high internal consistency and interrater reliability. Although previous studies evaluating smartphone apps have focused largely on features available in apps [21] or behavior change techniques through a self-developed evaluation checklist [4,10], the ABACUS provides researchers with a reliable and valid instrument to evaluate apps based on their behavior change potential. This scale will allow researchers to investigate the behavior change potential of a large number of apps reasonably quickly. This is important, as the fast-moving pace of app technology means that although randomized controlled trials (RCTs) remain important in understanding the impacts of individual apps on behavior [86], it has been suggested that the RCT may not be the most appropriate method to generate evidence around mobile apps [28]. RCTs can take a significant amount of time in planning and design meaning that by the time the RCT is available for publication, the information is no longer current [28]. The scale developed in this research is not a replacement for an RCT but rather will allow researchers and consumers to understand the behavior change potential of an app in the absence of an RCT. The MARS [42], a 23-item tool included 5 subscales for measuring app quality: engagement, functionality, aesthetics, information, and app subjective quality, with questions such as target age group, ease of navigation, or aesthetics can be used in conjunction with the ABACUS. The MARS is a useful tool in understanding the aesthetic and functional appeal of an app. When used together, the MARS and the ABACUS will allow researchers to provide users with 2 scores for each app: 1 that measures app quality and 1 that measures potential for behavior change. This study is only a starting point in the identification and interpretation of the behavior change potential of smartphone apps. This study only reports on the validation and reliability of physical activity apps, and as such, further testing of the scale should be conducted on additional health areas such as smoking, alcohol, and nutrition, as it is possible that different items may be important for these health areas. Furthermore, a more detailed investigation into the relative scores of apps will need to be undertaken. This will allow for an understanding of the importance of the overall score assigned to each app. At present, this scale is best understood as providing a continuous score rather than specific cut-off points. However, this is not to say that with more investigation and testing that clear scores could not provide a consumer with a numerical rating reflecting a behavioral outcome. This study has not purported to demonstrate correlation between an app’s score and the health outcome; however, this scale could be used in future along with a more detailed study of individual apps and the behavior change outcomes in using them. ABACUS has good interrater reliability and is a valid tool for evaluating the potential behavior change in smartphone apps. The validation and reliability testing of ABACUS contributes to the literature by providing a standardized method of evaluating smartphone apps for behavior change.

Limitations

Although this scale shows good reliability and validity, there are several limitations that need to be addressed. The first is that we have not sought to investigate criterion validity. The scale presented in this paper seeks to measure the theoretical behavior change potential of apps; and therefore, we do not seek to investigate the relationship between actual features of apps and behavioral outcomes. This scale has not been designed for this type of activity, so we leave this up to others to identify an appropriate method for such an investigation. Although reducing the numbers of items on the scale facilitates faster rating, there is a risk that removal of duplicate items and streamlining these items into 1 binary response may inflate a score. For example, by collapsing all goal-setting activities into 1 item, this scale recognizes apps that have any goals-setting ability, rather than the strength of that ability—a feature found in the behavior change taxonomy. Furthermore, there is a risk that by collapsing items that record starting a positive behavior with stopping a negative behavior, we may be missing a key aspect of behavior change. These decisions were made based on the authors’ experience of rating apps with an understanding that a single app will not include both of these features, and as such, in seeking to provide a succinct scale, it makes more sense to only measure 1 outcome. Like other similar studies [42], this study highlights the importance of rater’s knowledge of apps when completing such evaluations and with moderating 5 to 10 apps at the beginning of the process as a team is important to ensure a robust score. In addition, similar to other studies, raters in this study spent 10 min to 15 min with the app to become familiar before completing the evaluation. This time spent using the app is consistent with other studies that seek to review apps, as a longer time under review is not realistic [42,87]. Finally, 1 key limitation of this study is that the scale has been validated on physical activity apps. Although this scale seeks to be used in the future for other health behaviors, at this point in time, we are only confident that it can be used to rate the health behavior potential of physical activity apps. Other health behaviors will need to be investigated in future studies.

Conclusions

The ABACUS is a reliable tool that can be used to determine the behavior change potential of apps. This instrument fills a gap by allowing the evaluation of a large number of apps to be standardized across a range of health categories. This scale can be used by teams to rate apps that seek to promote behavior change, allowing for high-quality apps to then be recommended to the general public.

74 in total

Review 1. Characteristics of Smartphone Applications for Nutrition Improvement in Community Settings: A Scoping Review.

Authors: Emma Tonkin; Julie Brimblecombe; Thomas Philip Wycherley
Journal: Adv Nutr Date: 2017-03-15 Impact factor: 8.701

Review 2. The promise and peril of mobile health applications for diabetes and endocrinology.

Authors: Donna S Eng; Joyce M Lee
Journal: Pediatr Diabetes Date: 2013-04-30 Impact factor: 4.866

3. There is an app for that - Or is there? A content analysis of publicly available smartphone apps for managing alcohol use.

Authors: Bettina B Hoeppner; Melissa R Schick; Lourah M Kelly; Susanne S Hoeppner; Brandon Bergman; John F Kelly
Journal: J Subst Abuse Treat Date: 2017-09-12

4. Smartphone applications for seizure management.

Authors: Puneet Singh Pandher; Karamdeep Kaur Bhullar
Journal: Health Informatics J Date: 2014-07-18 Impact factor: 2.681

Review 5. Mobile health applications for the most prevalent conditions by the World Health Organization: review and analysis.

Authors: Borja Martínez-Pérez; Isabel de la Torre-Díez; Miguel López-Coronado
Journal: J Med Internet Res Date: 2013-06-14 Impact factor: 5.428

Review 6. Mobile applications for diabetics: a systematic review and expert-based usability evaluation considering the special requirements of diabetes patients age 50 years or older.

Authors: Madlen Arnhold; Mandy Quade; Wilhelm Kirch
Journal: J Med Internet Res Date: 2014-04-09 Impact factor: 5.428

Review 7. Commercially available mobile phone headache diary apps: a systematic review.

Authors: Amos S Hundert; Anna Huguet; Patrick J McGrath; Jennifer N Stinson; Mike Wheaton
Journal: JMIR Mhealth Uhealth Date: 2014-08-19 Impact factor: 4.773

Review 8. Apps for asthma self-management: a systematic assessment of content and tools.

Authors: Kit Huckvale; Mate Car; Cecily Morrison; Josip Car
Journal: BMC Med Date: 2012-11-22 Impact factor: 8.775

Review 9. Mindfulness-based mobile applications: literature review and analysis of current features.

Authors: Inmaculada Plaza; Marcelo Marcos Piva Demarzo; Paola Herrera-Mercadal; Javier García-Campayo
Journal: JMIR Mhealth Uhealth Date: 2013-11-01 Impact factor: 4.773

Review 10. Digital health behaviour change interventions targeting physical activity and diet in cancer survivors: a systematic review and meta-analysis.

Authors: Anna L Roberts; Abigail Fisher; Lee Smith; Malgorzata Heinrich; Henry W W Potts
Journal: J Cancer Surviv Date: 2017-08-04 Impact factor: 4.442

23 in total

1. Antifragile Behavior Change Through Digital Health Behavior Change Interventions.

Authors: Benjamin T Kaveladze; Sean D Young; Stephen M Schueller
Journal: JMIR Form Res Date: 2022-06-03

Review 2. Atrial fibrillation self-management: a mobile telephone app scoping review and content analysis.

Authors: Alice Pearsons; Coral L Hanson; Robyn Gallagher; Ronan E O'Carroll; Sahar Khonsari; Janet Hanley; Fiona E Strachan; Nicholas L Mills; Terence J Quinn; Brian McKinstry; Sheona McHale; Stacey Stewart; Mengying Zhang; Siobhan O'Connor; Lis Neubeck
Journal: Eur J Cardiovasc Nurs Date: 2021-05-22 Impact factor: 3.908

3. Digital interventions for the management of chronic obstructive pulmonary disease.

Authors: Sadia Janjua; Emma Banchoff; Christopher Jd Threapleton; Samantha Prigmore; Joshua Fletcher; Rebecca T Disler
Journal: Cochrane Database Syst Rev Date: 2021-04-19

4. Design and testing of a mobile health application rating tool.

Authors: David M Levine; Zoe Co; Lisa P Newmark; Alissa R Groisser; A Jay Holmgren; Jennifer S Haas; David W Bates
Journal: NPJ Digit Med Date: 2020-05-21

5. Mobile Health Projects in a High-Complexity Reference Hospital: Case Study.

Authors: Inmaculada Grau-Corral; Margarida Jansà; Pau Gascon; Raimundo Lozano-Rubí; Percy Efrain Pantoja; Daria Roca; Valentín Aragunde Miguens; Diego Hidalgo-Mazzei; Joan Escarrabill
Journal: JMIR Mhealth Uhealth Date: 2020-01-31 Impact factor: 4.773

6. Efficacy of a Just-in-Time Adaptive Intervention to Promote HIV Risk Reduction Behaviors Among Young Adults Experiencing Homelessness: Pilot Randomized Controlled Trial.

Authors: Diane Santa Maria; Nikhil Padhye; Michael Businelle; Yijiong Yang; Jennifer Jones; Alexis Sims; Marguerita Lightfoot
Journal: J Med Internet Res Date: 2021-07-06 Impact factor: 5.428

Review 7. Analysis of Secure Apps for Daily Clinical Use by German Orthopedic Surgeons: Searching for the "Needle in a Haystack".

Authors: Florian Dittrich; Sascha Beck; Anna Katharina Harren; Felix Reinecke; Sebastian Serong; Jochen Jung; David Alexander Back; Milan Wolf; Stefan Landgraeber
Journal: JMIR Mhealth Uhealth Date: 2020-05-07 Impact factor: 4.773

8. A Personalized Physical Activity Coaching App for Breast Cancer Survivors: Design Process and Early Prototype Testing.

Authors: Francisco Monteiro-Guerra; Gabriel Ruiz Signorelli; Shreya Tadas; Enrique Dorronzoro Zubiete; Octavio Rivera Romero; Luis Fernandez-Luque; Brian Caulfield
Journal: JMIR Mhealth Uhealth Date: 2020-07-15 Impact factor: 4.773

9. Commercial app use linked with sustained physical activity in two Canadian provinces: a 12-month quasi-experimental study.

Authors: Marc Mitchell; Erica Lau; Lauren White; Guy Faulkner
Journal: Int J Behav Nutr Phys Act Date: 2020-02-25 Impact factor: 6.457

10. Adding team-based financial incentives to the Carrot Rewards physical activity app increases daily step count on a population scale: a 24-week matched case control study.

Authors: Emma Pearson; Harry Prapavessis; Christopher Higgins; Robert Petrella; Lauren White; Marc Mitchell
Journal: Int J Behav Nutr Phys Act Date: 2020-11-19 Impact factor: 6.457