Tze Wei Liew1, Su-Mae Tan2, Wei Ming Pang1,2, Mohammad Tariqul Islam Khan1, Si Na Kew3.
Abstract
Modern text-to-speech voices can convey social cues ideal for narrating multimedia learning materials. Amazon Alexa is unique among modern text-to-speech vocalizers in that she can infuse enthusiasm cues into her synthetic voice. In this first study examining modern text-to-speech voice enthusiasm effects in a multimedia learning environment, a between-subjects online experiment was conducted in which learners from a large Asian university (n = 244) listened to one of four Alexa voices: (1) neutral, (2) low-enthusiastic, (3) medium-enthusiastic, or (4) high-enthusiastic, narrating a multimedia lesson on distributed denial-of-service attacks. While Alexa's enthusiastic voices did not enhance persona ratings compared to Alexa's neutral voice, learners could infer more enthusiasm from Alexa's medium- and high-enthusiastic voices than from Alexa's neutral voice. Regarding cognitive load, Alexa's low- and high-enthusiastic voices decreased intrinsic and extraneous cognitive load ratings compared to Alexa's neutral voice. While Alexa's enthusiastic voices did not impact affective-motivational ratings differently from Alexa's neutral voice, learners reported a significant increase in positive emotions from baseline after listening to Alexa's medium-enthusiastic voice. Finally, Alexa's enthusiastic voices did not enhance learning performance on immediate retention and transfer tests compared to Alexa's neutral voice. This study demonstrates that enthusiasm in a modern text-to-speech voice can positively affect learners' emotions and cognitive load during multimedia learning. Theoretical and practical implications are discussed through the lens of the Cognitive Affective Model of E-learning, the Integrated Cognitive Affective Model of Learning with Multimedia, and Cognitive Load Theory. We further outline this study's limitations and recommendations for extending and widening research on text-to-speech voice emotions.
Keywords: Amazon Alexa; Emotional design; Enthusiasm; Multimedia learning; Text-to-speech; Voice effect
Year: 2022 PMID: 35967831 PMCID: PMC9361884 DOI: 10.1007/s10639-022-11255-6
Source DB: PubMed Journal: Educ Inf Technol (Dordr) ISSN: 1360-2357
Fig. 1 Cognitive Affective Model of E-learning emphasizing augmented social connection with a positive instructor
Fig. 2 Cognitive Affective Model of E-learning emphasizing augmented affective-motivational state with a positive instructor
Fig. 3 The multimedia learning environment
Fig. 4 The Amazon Alexa Developer console
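As an illustration of how the four narration conditions could be produced, the sketch below builds SSML strings using Alexa's `<amazon:emotion>` tag with the `excited` emotion at `low`, `medium`, and `high` intensity. This record does not state which markup the authors used, so the tag choice, the helper name `narration_ssml`, and the sample sentence are assumptions for illustration only.

```python
from xml.sax.saxutils import escape

# Intensity levels accepted by Alexa's <amazon:emotion> SSML tag.
INTENSITIES = ("low", "medium", "high")


def narration_ssml(text, intensity=None):
    """Wrap narration text in SSML (hypothetical helper).

    intensity=None yields the plain (neutral) voice; otherwise the text is
    wrapped in an "excited" emotion tag at the requested intensity.
    """
    body = escape(text)  # escape &, <, > so the SSML stays well-formed
    if intensity is not None:
        if intensity not in INTENSITIES:
            raise ValueError(f"intensity must be one of {INTENSITIES}")
        body = (
            f'<amazon:emotion name="excited" intensity="{intensity}">'
            f"{body}</amazon:emotion>"
        )
    return f"<speak>{body}</speak>"


# Placeholder sentence, not taken from the study's lesson script.
script = "A distributed denial-of-service attack floods a server with traffic."

# One SSML string per experimental condition.
conditions = {"neutral": narration_ssml(script)}
for level in INTENSITIES:
    conditions[f"{level}-enthusiastic"] = narration_ssml(script, level)

for name, ssml in conditions.items():
    print(name, ssml)
```

Each resulting string could be pasted into a voice-testing tool that accepts SSML; whether a given locale or voice supports the emotion tag should be checked against Amazon's documentation.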
Learners’ profile
| Learners’ profile | Percentage or value |
|---|---|
| Male | 31.6% |
| Female | 68% |
| Undisclosed | 0.4% |
| Chinese | 83.2% |
| Malay | 7.0% |
| Indian | 7.4% |
| Other | 2.5% |
| Age range (years) | 17–25 |
| Age standard deviation (years) | 1.4 |
| Average age | 20 years |
| Pre-University | 0.8% |
| Diploma | 52.0% |
| Bachelor’s Degree | 47.1% |
| Accounting | 1.2% |
| Banking and Finance | 17.2% |
| Human Resource | 7.4% |
| International Business | 16.8% |
| Knowledge Management | 4.9% |
| Marketing | 12.3% |
| Business studies (general) | 40.2% |
| Johor | 23.8% |
| Kedah | 1.6% |
| Kelantan | 0.4% |
| Melaka | 7.8% |
| Negeri Sembilan | 13.1% |
| Pahang | 0.8% |
| Penang | 3.3% |
| Perak | 2.0% |
| Perlis | 0.4% |
| Sabah | 1.2% |
| Sarawak | 1.6% |
| Selangor | 28.3% |
| Terengganu | 1.6% |
| Undisclosed | 13.9% |
Means and standard deviations of the measures
| Measure, mean (SD) | Alexa’s neutral voice | Alexa’s low-enthusiastic voice | Alexa’s medium-enthusiastic voice | Alexa’s high-enthusiastic voice | Total |
|---|---|---|---|---|---|
| Prior Knowledge | 2.50 (1.30) | 2.84 (1.20) | 2.57 (1.29) | 2.72 (1.09) | 2.66 (1.23) |
| English Proficiency | 5.62 (1.49) | 5.73 (1.95) | 5.79 (1.68) | 6.29 (1.98) | 5.83 (1.78) |
| Positive emotions before learning engagement | 32.11 (6.81) | 32.08 (7.96) | 32.16 (6.95) | 31.96 (7.62) | 32.09 (7.29) |
| Perceived Alexa's enthusiasm | 5.32 (1.98) | 5.52 (1.83) | 6.01 (2.04) | 6.23 (2.09) | 5.75 (2.00) |
| Facilitating Learning | 4.38 (1.12) | 4.41 (1.08) | 4.43 (1.10) | 4.35 (1.12) | 4.40 (1.10) |
| Credibility | 4.88 (1.24) | 5.16 (1.15) | 5.15 (1.25) | 4.90 (1.07) | 5.03 (1.19) |
| Human-like | 3.83 (1.46) | 3.62 (1.38) | 4.01 (1.49) | 3.77 (1.38) | 3.81 (1.43) |
| Engaging | 4.02 (1.41) | 4.11 (1.26) | 4.26 (1.32) | 4.14 (1.41) | 4.13 (1.34) |
| Positive emotions after learning engagement | 32.92 (7.66) | 33.06 (7.79) | 33.84 (7.58) | 33.00 (8.46) | 33.23 (7.80) |
| Intrinsic motivation | 4.58 (1.14) | 4.39 (1.04) | 4.46 (1.15) | 4.55 (1.35) | 4.49 (1.16) |
| Intrinsic Load | 5.40 (1.66) | 4.68 (2.04) | 4.96 (1.87) | 4.80 (2.45) | 4.96 (2.01) |
| Extraneous Load | 3.77 (2.40) | 2.89 (2.02) | 3.19 (1.95) | 2.84 (1.99) | 3.19 (2.12) |
| Germane Load | 6.20 (1.57) | 6.20 (1.46) | 5.96 (1.80) | 5.96 (1.87) | 6.09 (1.67) |
| Retention | 1.39 (1.56) | 1.49 (2.00) | 1.31 (1.48) | 1.82 (2.29) | 1.48 (1.83) |
| Transfer | 4.26 (3.44) | 3.65 (3.75) | 4.12 (3.27) | 3.86 (4.28) | 3.98 (3.65) |
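To give a rough sense of the size of the cognitive load differences in the table above, the sketch below computes a standardized mean difference (Cohen's d) from the reported means and standard deviations. Per-condition group sizes are not given in this record, so the calculation assumes equal group sizes, and the function name `cohens_d_equal_n` is hypothetical. This is not the authors' analysis, only a back-of-the-envelope reading of the table.

```python
import math


def cohens_d_equal_n(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d with a pooled SD, ASSUMING both groups have the same size."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Extraneous load, mean and SD, copied from the table above.
neutral = (3.77, 2.40)
low = (2.89, 2.02)
high = (2.84, 1.99)

# Positive d means the neutral voice was rated as imposing more load.
d_neutral_vs_low = cohens_d_equal_n(*neutral, *low)
d_neutral_vs_high = cohens_d_equal_n(*neutral, *high)

print(f"neutral vs low-enthusiastic:  d = {d_neutral_vs_low:.2f}")
print(f"neutral vs high-enthusiastic: d = {d_neutral_vs_high:.2f}")
```

Under the equal-size assumption, both comparisons come out at roughly 0.4 standard deviations; unequal group sizes would shift the pooled SD and therefore the estimate.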
Summary of Alexa’s voice enthusiasm effects
| Measure | Alexa’s low-enthusiastic voice | Alexa’s medium-enthusiastic voice | Alexa’s high-enthusiastic voice |
|---|---|---|---|
| Perceived voice enthusiasm | No effect | Exuded more recognizable enthusiasm cues than Alexa’s neutral voice | Exuded more recognizable enthusiasm cues than Alexa’s neutral voice |
| Alexa’s persona ratings | No effect | No effect | No effect |
| Positive emotions | No effect | Induced a significant increase in positive emotions from baseline | No effect |
| Intrinsic motivation | No effect | No effect | No effect |
| Cognitive load ratings | Lower intrinsic load and extraneous load than Alexa’s neutral voice | No effect | Lower extraneous load than Alexa’s neutral voice |
| Learning performance | No effect | No effect | No effect |