Dilawar Shah Zwakman, Debajyoti Pal, Chonlameth Arpnikanondt.
Abstract
The use of voice-assistants is on the rise, but a user-centric usability evaluation of these devices is essential to their success. The System Usability Scale (SUS) is a popular usability instrument for Graphical User Interface (GUI) scenarios. However, fundamental differences between GUI and voice-based systems make it uncertain whether SUS is suitable in a voice scenario. The present work has a twofold objective: to check the suitability of SUS for the usability evaluation of voice-assistants, and to develop a subjective scale in line with SUS that considers the unique aspects of voice-based communication. We call this scale the Voice Usability Scale (VUS). To fulfil these objectives, a subjective test is conducted with 62 participants. An Exploratory Factor Analysis suggests that SUS has a number of drawbacks for measuring voice usability. Moreover, for VUS, the optimal factor structure identifies three main components: usability, affective, and recognizability and visibility. The current findings should provide a starting point for a useful theoretical and practical basis for the subjective usability assessment of voice-based systems.
Keywords: Factor analysis; Graphical user interface; System usability scale; Usability; Voice usability scale; Voice-assistants
Year: 2021 PMID: 33458698 PMCID: PMC7798382 DOI: 10.1007/s42979-020-00424-4
Source DB: PubMed Journal: SN Comput Sci ISSN: 2661-8907
Summary of usability related studies of voice-assistants
| Study no. | Usability approach | Sample size | Drawbacks |
|---|---|---|---|
| [ | Heuristics based on Nielsen and Molich | 16 participants (2 groups of 8 each) | 1. Heuristic evaluation is done by experts 2. Typically, this method is used during early product lifecycle 3. How the participants interact with the voice-assistants is not elaborated |
| [ | Non-standardized tool | 8 participants | 1. Number of participants too low to come to valid statistical conclusions 2. No standardized questionnaire is used 3. Results only provide descriptive statistics |
| [ | SUS | 20 participants | 1. Results only provide descriptive statistics 2. No factor loadings of different SUS items are reported, nor their correlations 3. Whether SUS could capture the usability issues of voice-assistants is not clear |
| [ | SUS | 52 participants | 1. Results only provide descriptive statistics 2. No factor loadings of different SUS items are reported 3. The tasks that users had to do with the voice-assistants are not clearly mentioned 4. No empirical evidence and discussion toward the suitability of SUS in measuring usability of voice-assistants |
| [ | SUS | 12 participants | 1. The factor structure of SUS is not mentioned 2. The learnability dimension of SUS is non-significant indicating potential loading problems as per the original version 3. Differential loadings of the SUS items indicate that the voice scenario is different from GUI’s 4. High correlation of SUS score with some other scale does not imply that SUS is a good usability measure for voice-assistants |
| [ | Non-standardized tool | 1462 survey participants | 1. The exact nature of voice-assistants evaluated is not mentioned 2. The data is collected from a survey with no mention of whether the participants actually use a voice-assistant or not 3. How the participants interact with the voice-assistants is not mentioned, which can produce biased results |
Assessment scheme for the proposed VUS usability instrument
| Usability dimension | Explanation |
|---|---|
| General/usability | There is no specific focus on any aspect. It captures the general impression or sentiments of the users after using the voice-assistants |
| Affective | The psychological state of the users (emotions/feelings/impressions) after using the voice-assistants. The users’ feelings (happy/pleased/satisfied) are reflected by this dimension |
| Recognition & visibility | Users must recognize the various functions and options just through interaction and affordance with the voice-assistants. The voice-assistants must provide interaction in a natural and intuitive manner rather than stating what kinds of commands someone can give |
| Pragmatic | Ability of the voice-assistants to support ‘goal-oriented’ tasks (e.g., making a call, searching for information, etc.). The users will mainly judge the efficiency/usefulness of the voice-assistants |
| Errors & frustration | The voice-assistants should have constraints built in place to help users not to come across errors. Cascading errors should be avoided. In the event of some error, the voice-assistants must allow the users to exit from errors or a mistaken conversation |
| Guidance & help | The voice-assistants must provide guidance to the users through their interactions, so that they are not easily lost. Interaction must be short for minimizing the acoustic confusability of vocabulary (i.e., short yes/no type) |
Fig. 1 Flowchart of the overall methodology
SUS and VUS questionnaires
| Item | SUS | VUS |
|---|---|---|
| 1 | I think I would like to use the voice-assistants frequently | I thought the response from the voice-assistant was easy to understand |
| 2 | I found the voice-assistant unnecessarily complex | I thought the information provided by the voice-assistant was not relevant to what I asked |
| 3 | I thought the voice-assistant was easy to use | My interaction with the voice-assistants was fast |
| 4 | I think that I would need the support of a technical person to use this voice-assistant | I thought the voice-assistant had difficulty in understanding what I asked it to do |
| 5 | I found the various functions in this voice-assistant were well integrated | I felt the voice-assistant enabled me to successfully complete my tasks when I required help |
| 6 | I thought that there was too much inconsistency in this voice-assistant | It was easy to lose track of where you were in an interaction with the voice-assistants |
| 7 | I imagine that most people would learn to use this voice-assistant very quickly | The voice-assistant had all the functions and capabilities that I expected it to have |
| 8 | I found the voice-assistant very awkward to use | I found it difficult to customize the voice-assistant according to my needs and preferences |
| 9 | I felt very confident using the voice-assistant | Overall, I am satisfied with using the voice-assistant |
| 10 | I needed to learn a lot of things before I could get going with this voice-assistant | I found the voice-assistant difficult to use |
| 11 | × | I felt the response from the voice-assistant was sufficient |
| 12 | × | I found it frustrating to use the voice-assistant in a noisy and loud environment |
| 13 | × | I was able to recover easily from errors |
| 14 | × | The voice-assistant was unreliable |
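The SUS items above alternate between positively worded (odd-numbered) and negatively worded (even-numbered) statements. A minimal sketch of the conventional SUS scoring is given below, assuming responses on a 1 to 5 agreement scale; the source does not state the exact scoring procedure it used, and the function name `sus_score` is hypothetical. The VUS scoring is not described in this record, so it is not covered here.

```python
def sus_score(responses):
    """Conventional SUS score (0-100) from ten responses on a 1-5 scale.

    Assumption: standard SUS scoring, not confirmed by the source.
    Odd-numbered items contribute (response - 1); even-numbered items
    contribute (5 - response); the sum is multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS expects exactly 10 item responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must lie between 1 and 5")

    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # positively worded item
        else:
            total += 5 - response   # negatively worded item
    return total * 2.5


# A respondent who fully agrees with every positive item and fully
# disagrees with every negative item reaches the maximum score.
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))
```

Reversing the even-numbered items keeps a higher score aligned with better perceived usability regardless of how each item is worded.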
Overview for some of the questions
| Question no. | Questions |
|---|---|
| 1 | What is the weather like in Bangkok? How about tomorrow? Will it rain on Friday? |
| 2 | What is today's date? What time is it now? Why is the sky blue? |
| 3 | What is COVID-19? |
| 4 | What do you know about Asia? What about Nobel Prize? |
| 5 | Add milk to my shopping list. Add eggs and jam to my shopping list |
| 6 | How many eggs did you add on my shopping list? |
| 7 | What is on my shopping list? |
| 8 | Set an alarm for 08:30 |
| 9 | I need to make an appointment with doctor |
| 10 | Set a schedule for 12:30 |
| 11 | How can I protect myself from corona virus? |
| 12 | Give me some words of wisdom |
| 13 | What is 10 plus 5? Add 20 to the result. What is the final result? Divide 20 by 0 |
| 14 | How to go to Siam BTS? How much time will it take? |
Distribution of the SUS and VUS scores
| Parameters | Previous study [ | SUS (present study) | VUS (present study) |
|---|---|---|---|
| N | 2324 | 61 | 61 |
| Minimum | 0.00 | 38.32 | 45.00 |
| Maximum | 100.00 | 89.96 | 98.33 |
| Mean | 70.14 | 63.69 | 70.19 |
| Median | 75.00 | 63.31 | 69.99 |
| Standard deviation | 21.71 | 11.44 | 15.25 |
| Standard error | 0.45 | 1.46 | 1.95 |
| First quartile | 55.00 | 61.64 | 56.66 |
| Third quartile | 87.50 | 77.47 | 83.34 |
| Inter-quartile range | 32.50 | 15.83 | 26.68 |
| 99.9% confidence interval (upper) | 71.50 | 65.76 | 75.38 |
| 99.9% confidence interval (lower) | 68.70 | 58.62 | 64.99 |
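Two rows of this table can be re-derived from the others. The sketch below assumes the usual definitions, standard error = SD / sqrt(N) and inter-quartile range = Q3 - Q1, and takes the 2324 / 61 / 61 row as the sample sizes; the source does not state how these rows were computed. The confidence-interval rows are left out because the interval method is not given.

```python
import math

# Values copied from the score-distribution table.
# Assumption: the 2324 / 61 / 61 row holds the sample sizes.
summary = {
    "Previous study": {"n": 2324, "sd": 21.71, "q1": 55.00, "q3": 87.50},
    "SUS (present study)": {"n": 61, "sd": 11.44, "q1": 61.64, "q3": 77.47},
    "VUS (present study)": {"n": 61, "sd": 15.25, "q1": 56.66, "q3": 83.34},
}


def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of N."""
    return sd / math.sqrt(n)


def interquartile_range(q1, q3):
    """Spread of the middle half of the scores."""
    return q3 - q1


derived = {
    name: (
        round(standard_error(row["sd"], row["n"]), 2),
        round(interquartile_range(row["q1"], row["q3"]), 2),
    )
    for name, row in summary.items()
}

for name, (se, iqr) in derived.items():
    print(f"{name}: SE = {se:.2f}, IQR = {iqr:.2f}")
```

Under these assumptions the derived values agree with the standard error and inter-quartile range rows reported in the table.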
Fig. 2 SUS and VUS score distribution
Relevant statistical measures for KMO and Bartlett’s test
| Statistic | SUS | VUS |
|---|---|---|
| KMO (sampling adequacy) | 0.733 | 0.790 |
| Bartlett’s test Chi-square | 185.01 | 252.738 |
| Degrees of freedom (df) | 45 | 45 |
| Significance | < 0.001 | < 0.001 |
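For orientation, Bartlett's test of sphericity is commonly computed from the determinant of the item correlation matrix R as chi-square = -(n - 1 - (2p + 5)/6) ln|R|, with p(p - 1)/2 degrees of freedom for p items and n respondents. The following is a minimal pure-Python sketch of that textbook formula, not the authors' procedure; `determinant` and `bartlett_sphericity` are hypothetical helper names, and the p-value and the KMO statistic are not computed.

```python
import math


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_obs):
    """Return (chi_square, degrees_of_freedom) for a p x p correlation matrix."""
    p = len(corr)
    det = determinant(corr)
    if det <= 0:
        raise ValueError("correlation matrix must be positive definite")
    chi_square = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(det)
    dof = p * (p - 1) // 2
    return chi_square, dof


# Ten uncorrelated items (identity matrix) give a statistic of zero.
identity = [[1.0 if i == j else 0.0 for j in range(10)] for i in range(10)]
chi_square, dof = bartlett_sphericity(identity, n_obs=61)
print(chi_square, dof)
```

With p = 10 items, p(p - 1)/2 = 45, which matches the degrees of freedom reported for both scales. A large chi-square relative to that value indicates that the items are correlated enough for factor analysis to be meaningful.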
Fig. 3 Scree plot for SUS and VUS scales
Factor analysis for SUS dataset (four- vs. three- vs. two-factor solutions)
The numbers marked in bold represent the highest loading on the factor
Factor analysis for VUS dataset (four- vs. three- vs. two-factor solutions)
| Methodology | Items | Four-factor solution | Three-factor solution | Two-factor solution | ||||||
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 1 | 2 | 3 | 1 | 2 | ||
| Principal component analysis | VUS1 | 0.106 | 0.115 | 0.115 | −0.007 | 0.303 | 0.108 | | | |
| VUS2 | 0.177 | 0.247 | 0.126 | 0.258 | 0.222 | 0.276 | ||||
| VUS4 | 0.263 | 0.132 | 0.209 | 0.280 | 0.192 | 0.290 | ||||
| VUS5 | −0.023 | 0.479 | 0.026 | −0.071 | 0.255 | −0.028 | ||||
| VUS7 | 0.186 | 0.181 | −0.262 | 0.203 | −0.317 | 0.113 | ||||
| VUS8 | 0.366 | 0.002 | 0.134 | 0.321 | 0.041 | 0.186 | ||||
| VUS9 | 0.323 | 0.226 | 0.162 | 0.366 | 0.064 | 0.356 | ||||
| VUS10 | −0.023 | 0.094 | 0.301 | 0.007 | 0.320 | 0.054 | ||||
| VUS11 | −0.080 | 0.049 | 0.532 | 0.037 | 0.228 | 0.073 | ||||
| VUS12 | 0.071 | −0.281 | −0.113 | −0.071 | −0.244 | −0.150 | ||||
| Unweighted least squares | VUS1 | 0.101 | 0.126 | 0.063 | 0.115 | 0.038 | 0.117 | 0.502 | ||
| VUS2 | 0.220 | 0.196 | 0.029 | 0.241 | 0.160 | 0.290 | ||||
| VUS4 | 0.237 | 0.125 | 0.121 | 0.234 | 0.197 | 0.309 | ||||
| VUS5 | 0.033 | 0.290 | 0.300 | 0.003 | 0.155 | 0.003 | ||||
| VUS7 | 0.079 | 0.136 | 0.130 | 0.079 | 0.361 | 0.109 | ||||
| VUS8 | −0.051 | 0.224 | 0.164 | 0.291 | −0.110 | 0.193 | ||||
| VUS9 | 0.333 | 0.318 | 0.273 | 0.221 | 0.301 | 0.310 | ||||
| VUS10 | −0.043 | 0.062 | 0.088 | 0.118 | −0.067 | 0.059 | ||||
| VUS11 | 0.083 | 0.196 | 0.148 | 0.091 | 0.365 | 0.085 | 0.560 | |||
| VUS12 | 0.086 | −0.135 | −0.078 | −0.156 | 0.128 | 0.536 | −0.057 | |||
| Maximum-likelihood analysis | VUS1 | 0.092 | 0.204 | 0.217 | 0.106 | 0.039 | 0.134 | |||
| VUS2 | 0.208 | 0.200 | 0.121 | 0.248 | 0.166 | 0.292 | ||||
| VUS4 | 0.151 | 0.265 | 0.196 | 0.253 | 0.186 | 0.326 | ||||
| VUS5 | 0.021 | 0.170 | −0.074 | 0.019 | 0.157 | 0.050 | ||||
| VUS7 | 0.133 | 0.264 | −0.118 | 0.084 | 0.367 | 0.091 | ||||
| VUS8 | 0.334 | 0.120 | 0.028 | 0.324 | −0.151 | 0.181 | ||||
| VUS9 | 0.233 | 0.311 | 0.318 | 0.285 | 0.495 | 0.267 | ||||
| VUS10 | 0.130 | −0.079 | 0.187 | 0.120 | −0.060 | 0.030 | ||||
| VUS11 | 0.053 | 0.305 | 0.211 | 0.076 | 0.152 | 0.093 | ||||
| VUS12 | −0.134 | 0.104 | 0.073 | −0.118 | 0.088 | 0.008 | ||||
The numbers marked in bold represent the highest loading on the factor