Simon Schreibelmayr, Martina Mara.
Abstract
The growing popularity of speech interfaces goes hand in hand with the creation of synthetic voices that sound ever more human. Previous research has been inconclusive about whether anthropomorphic design features of machines are more likely to be associated with positive user responses or, conversely, with uncanny experiences. To avoid detrimental effects of synthetic voice design, it is therefore crucial to explore what level of human realism human interactors prefer and whether their evaluations may vary across different domains of application. In a randomized laboratory experiment, 165 participants listened to one of five female-sounding robot voices, each with a different degree of human realism. We assessed how much participants anthropomorphized the voice (by subjective human-likeness ratings, a name-giving task and an imagination task), how pleasant and how eerie they found it, and to what extent they would accept its use in various domains. Additionally, participants completed Big Five personality measures and a tolerance of ambiguity scale. Our results indicate a positive relationship between human-likeness and user acceptance, with the most realistic sounding voice scoring highest in pleasantness and lowest in eeriness. Participants were also more likely to assign real human names to the voice (e.g., "Julia" instead of "T380") if it sounded more realistic. In terms of application context, participants overall indicated lower acceptance of the use of speech interfaces in social domains (care, companionship) than in others (e.g., information & navigation), though the most human-like voice was rated significantly more acceptable in social applications than the remaining four. While most personality factors did not prove influential, openness to experience was found to moderate the relationship between voice type and user acceptance such that individuals with higher openness scores rated the most human-like voice even more positively. 
Study results are discussed in the light of the presented theory and in relation to open research questions in the field of synthetic voice design.
Keywords: anthropomorphism; application context; human–robot interaction; speech interface; synthetic voice; uncanny valley; user acceptance; voice assistant
Year: 2022 PMID: 35645911 PMCID: PMC9136288 DOI: 10.3389/fpsyg.2022.787499
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Description of the five experimental robot voices.
| Category | Voice name | Speech engine | Modification |
|---|---|---|---|
| Real human | Human | (Pro speaker) | Breath sounds filtered |
| High human-likeness | Synthetic I | Amazon Polly (German) | Original version |
| High human-likeness | Synthetic II | Microsoft Hedda (German) | Original version |
| Low human-likeness | Metallic | Amazon Polly | Metallic effect, Echo (10%) |
| Low human-likeness | Comic | Amazon Polly | Pitch shift (1.35) |
Means and standard deviations of the ratings of the five voices.
| Voice | Human-likeness (Mean) | Human-likeness (SD) | Eeriness (Mean) | Eeriness (SD) | Pleasantness (Mean) | Pleasantness (SD) |
|---|---|---|---|---|---|---|
| All voices | 2.27 | 1.13 | 2.81 | 0.93 | 2.81 | 1.20 |
| Human | 3.85 | 0.93 | 2.14 | 0.80 | 4.06 | 0.89 |
| Synthetic I | 2.19 | 0.60 | 2.41 | 0.80 | 3.15 | 0.96 |
| Synthetic II | 1.99 | 0.97 | 2.82 | 0.79 | 2.64 | 1.03 |
| Comic | 1.65 | 0.65 | 3.32 | 0.75 | 1.90 | 0.91 |
| Metallic | 1.52 | 0.42 | 3.45 | 0.79 | 2.16 | 0.87 |
Sample sizes: all voices N = 163; Human N = 34; Synthetic I N = 34; Synthetic II N = 33; Metallic N = 31; Comic N = 31.
Rated on a five-point semantic differential scale.
Rated on a five-point Likert scale from 1 (very unpleasant) to 5 (very pleasant).
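As a quick plausibility check on the table above, the "All voices" means should equal the sample-size-weighted average of the five per-voice means. The sketch below is not part of the original study; the variable and function names are illustrative, and because the inputs are the rounded values reported in the table, agreement is only expected to two decimal places.

```python
# Group sizes and per-voice means copied from the table and its sample-size note.
group_sizes = {
    "Human": 34, "Synthetic I": 34, "Synthetic II": 33,
    "Comic": 31, "Metallic": 31,
}
means = {
    "Human-likeness": {"Human": 3.85, "Synthetic I": 2.19, "Synthetic II": 1.99,
                       "Comic": 1.65, "Metallic": 1.52},
    "Eeriness": {"Human": 2.14, "Synthetic I": 2.41, "Synthetic II": 2.82,
                 "Comic": 3.32, "Metallic": 3.45},
    "Pleasantness": {"Human": 4.06, "Synthetic I": 3.15, "Synthetic II": 2.64,
                     "Comic": 1.90, "Metallic": 2.16},
}


def weighted_mean(values, weights):
    """Average of group means, weighted by the number of raters per group."""
    total = sum(weights[k] for k in values)
    return sum(values[k] * weights[k] for k in values) / total


# Illustrative helper output: one pooled mean per rating dimension.
grand_means = {
    name: round(weighted_mean(per_voice, group_sizes), 2)
    for name, per_voice in means.items()
}

for name, value in grand_means.items():
    print(f"{name}: {value:.2f}")
```

The pooled values can then be compared with the "All voices" row. The same check does not carry over to the standard deviations, which cannot be pooled from group SDs alone without also accounting for the spread between group means.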
Figure 1. The bar chart shows the mean values of Human-likeness, Eeriness, and Pleasantness for each voice heard. The five voices are arranged from left to right in increasing degree of Human-likeness.
Figure 2. The bar chart shows the percentage of invented names for each voice heard. Each name was assigned to one of the five name classes.
Figure 3. The bar chart shows the mean acceptance of the five voices in each application context. A Kruskal–Wallis test was used for pairwise group comparisons (**p < 0.01; *p < 0.05).