Toshiaki Kakii¹, Hideyuki Fujiu², Guiming Dai¹. ¹ Sumitomo Electric Industries Ltd., Osaka, Japan. ² Department of Psychology, Faculty of Human Sciences, University of Tsukuba, Tokyo, Japan.
Abstract
The use of remote communication has grown globally since the COVID-19 outbreak. In some remote meetings, participants use audio only and keep their web cameras turned off, so little nonverbal information is exchanged. In this study, we defined an "ear animation" as an avatar consisting of a simple, face-like body with no facial features and animated ear-like parts protruding from it. The purpose of this study was to design the ear animation and evaluate user impressions of it as nonverbal information. The dependent variables were how well information was conveyed and how well emotion was conveyed. There were two independent variables. The first was the conveying method, with three conditions: ear animation presented silently, ear animation presented simultaneously with simple voice, and voice only. The second was the content conveyed by the ear animation, with three levels: "agreement", "skepticism", and "disagreement". We compared these conditions using a two-way repeated-measures ANOVA. The results suggest that ear animations presented simultaneously with voice, combined with appropriate ear movement forms, could be a new way of conveying nonverbal information.
Under the influence of COVID-19, remote communication has increased rapidly. For example, Microsoft Teams reportedly recorded 2.7 billion meeting minutes in a single day in March 2020 (Spataro, 2020). Good-quality video has become possible thanks to widely available high-speed internet and high-performance PCs and other devices. Although dynamic images such as facial expressions can be transmitted, participants sometimes join actual remote meetings with their cameras turned off. Surveys have asked people why they do not show their faces in remote communication (Castelli & Sarvary, 2021; Tobi et al., 2021). The most frequent reasons were "other people turning their video off", "wanting to multitask", "feeling self-conscious about their appearance", "not actively participating", and "the effort of being seen". In short, a reluctance to be seen, which has nothing to do with the technology of image transmission, has become an obstacle to transmitting facial information. When faces are hidden and only voice and shared documents are available, the nonverbal cues that accompany dialogue are not sufficiently conveyed. Nonverbal information is an important factor in smooth interaction (Archer & Akert, 1997; Haase & Tepper, 1972). Therefore, remote communication in which faces are not shown on screen hinders smooth communication (Rodeghero et al., 2021). Feedback and responses during dialogue are especially important forms of nonverbal communication. However, current remote communication does not convey such nonverbal communication effectively (Peper et al., 2021).
Communication in which no facial expressions are visible lacks these responses. This causes serious problems such as a reduced sense of liveliness in communication and increased physical fatigue.
As a new method for remote communication in which real faces are not shown, many studies have examined the use of avatars (Bailenson, 2018; Nowak & Fox, 2018). A comparison of a virtual reality meeting with a video conferencing environment revealed that avatars in virtual reality improved feelings of presence, closeness, and arousal (Campbell et al., 2019). However, avatars also raise issues. Especially in business videoconferencing, avatars can either help or hinder interaction with other participants (Forsberg & Kirchner, 2021; Junuzovic et al., 2012). When designing avatars, it is necessary to examine whether an avatar's appearance can interact effectively with the user's expressions, and this requires examining how the avatar works (Oh et al., 2016). For avatars used in video games, one study examined how avatar customization and identification influence users' communication behaviors (Takano & Taka, 2022). An avatar face that is too cartoonish becomes too playful to be appropriate for business use, and some avatar designs could be misleading in business meetings. In other words, avatars have great potential, but each avatar design also brings its own difficulties. For example, the impact of subtle differences in avatar facial design must be considered when avatars are introduced in videoconferencing. The uncanny valley problem has also been pointed out (Mori, MacDorman, & Kageki, 2012; Shin et al., 2019). In this problem, familiarity decreases once an avatar comes to resemble a real person, such as its user, too closely. These issues cannot be solved by technical development alone.
When web cameras are turned off, it may be beneficial to investigate ways of conveying nonverbal information other than avatars with facial expressions. This would create more options for diverse forms of communication.
Given these circumstances, we became interested in creating a new interface by asking ourselves: "Is it possible to establish lively remote communication with a sense of unity and presence by using nonverbal communication unrelated to facial expressions?" We reasoned that if simple animations were used for nonverbal communication, a wide range of people could use such an interface with easy, minimal customization. In real-time video interaction through monitors, maintaining direct eye contact can be uncomfortable, and several studies have discussed countermeasures for this (Bohannon et al., 2013; Park et al., 2021). Because the proposed interface eliminates direct eye contact, it is also expected to reduce this discomfort.
Purpose and structure of this study
An avatar is defined as a character representing a particular person on the internet. In this study, we defined an "ear animation" as an avatar consisting of a simple, face-like body with no facial features and animated ear-like parts protruding from the body. The purpose of this study was to design the ear animation and evaluate user impressions of it as nonverbal information. This study consists of three parts. First, we designed movement forms for the ear animations for nonverbal communication, based on human body movements such as head movements and gestures. Next, we conducted an experiment in which participants rated their impressions of the ear animations' movement forms. Then we compared psychological impressions using a two-way repeated-measures ANOVA with two independent variables: the content conveyed by the ear animations and the conveying condition, each with three levels. The three contents conveyed were "agreement", "skepticism", and "disagreement". Each corresponded to a movement form that we deliberately designed around rotation about one of three axes. The three conditions were: the ear animations appear silently; the ear animations appear simultaneously with voice; and only voice is heard, with no visual image. In the general discussion, we discuss the possibility of nonverbal communication created by combining our newly designed ear animations with voice and with the content conveyed (movement forms). We conclude by describing the limitations of this research and future challenges.
Ear animation design
Basic guidelines for designing ear animations
We set the following five guidelines for designing nonverbal communication based on ear-movement forms in remote communication where faces are not shown:
1. Facial features (eyebrows, eyes, and mouth), which would each require individual design, are not included.
2. They convey information in a way that is easy to understand.
3. They convey emotions in a way that is easy to understand.
4. They give a friendly impression.
5. They are easily noticeable.
As a new rendering that meets all five requirements, we focused on an ear animation representing a rabbit's ears. The animated ears need not be limited to a rabbit's; they could be the ears of any animal. However, we adopted rabbit ears as a representative example because their long ears are very expressive. We expect the following advantages from designing nonverbal communication with long ears:
1. No facial design is required, because only the ears are animated.
2. Ears that protrude upward from the top of the head improve visibility.
3. Ear movement can potentially express information, emotion, and appearance.
4. Rabbit ears give a friendly impression.
In the overall design, the ears are placed on top of the "main head/body" so that they protrude upward. In its simplest form, the head/body is a circle in two dimensions or a sphere in three dimensions. The main head/body includes no facial features such as eyes, eyebrows, or a mouth, so it looks like a featureless face. It represents a human head and body combined. It also has a display function that identifies the corresponding speaker, for example by name, initials, role (host or participant), and organization. Its shape is not limited to a circle or sphere. However, because this is basic research, a simple sphere was used for the main head/body. Fig. 1 shows the basic image of the ear animation, consisting of the main head/body and ears. The ear animations shown on the monitor as visual stimuli did not display any text, such as initials, that could identify a person. The color and texture of an ear animation could serve as identifying information unique to its user. However, this experiment aimed to evaluate impressions of generalized ear-movement responses rather than individual designs, so a simple design was used.
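The composition described above (a featureless main head/body, two ears, and an optional speaker label) can be summarized as a small data structure. The following Python sketch is purely illustrative. The class and field names (`EarAvatar`, `SpeakerLabel`, `Ear`, `show_label`) and the example speaker are hypothetical and do not come from the authors' implementation. The `show_label=False` case mimics the experimental stimuli, which displayed no identifying text.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class SpeakerLabel:
    """Identification text shown on the main head/body (hypothetical fields)."""
    name: str
    initials: str
    role: str            # "host" or "participant"
    organization: str


@dataclass
class Ear:
    """One ear's rotation in degrees about the X, Y, and Z axes of Fig. 2."""
    pitch: float = 0.0
    roll: float = 0.0
    yaw: float = 0.0


@dataclass
class EarAvatar:
    """A featureless main head/body (no eyes, eyebrows, or mouth) with two ears."""
    body_shape: str = "sphere"     # circle in 2D or sphere in 3D; other shapes allowed
    color: str = "white"           # color/texture could identify the user
    label: Optional[SpeakerLabel] = None
    show_label: bool = True        # the experimental stimuli showed no text
    ears: Tuple[Ear, Ear] = field(default_factory=lambda: (Ear(), Ear()))

    def label_text(self) -> str:
        """Text drawn on the body, or an empty string when hidden or absent."""
        if not self.show_label or self.label is None:
            return ""
        lab = self.label
        return f"{lab.name} ({lab.initials}), {lab.role}, {lab.organization}"


# Hypothetical speaker; in the experiments the label would be hidden.
speaker = SpeakerLabel("Taro Yamada", "TY", "host", "Example Corp.")
meeting_avatar = EarAvatar(label=speaker)
stimulus_avatar = EarAvatar(label=speaker, show_label=False)
print(meeting_avatar.label_text())         # Taro Yamada (TY), host, Example Corp.
print(repr(stimulus_avatar.label_text()))  # ''
```

Keeping the identification label separate from the geometry mirrors the design choice in the text: identity can be shown or hidden without changing the featureless body or the ears.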
Fig. 1
Ear animation's basic form (initial state before movement begins).
As far as we know, this is the first study to give featureless ear animations the central role in nonverbal communication during remote communication.
Specific design of ear animation
To identify the specific emotions relevant to remote communication, we used two previous studies as guidance for designing the ear animation. The first is the Specific Affect Coding System (Gottman, McCoy, Coan, & Collier, 1996; Harrigan et al., 2005). It uses a total of 20 specific affect codes to encode the nonverbal communication of couples in committed relationships and to categorize it as positive, negative, or neutral. However, these codes are intended for evaluating interaction between couples in intimate relationships, such as married couples. Not all of them necessarily fit professional meetings, which are our target. To select codes suited to our target, we adopted the viewpoint of a second study: the communication/persuasion matrix, which lists 12 output steps of listener responses in persuasion psychology (McGuire, 1985). Because we assumed meetings held via remote communication, steps three to seven of that matrix are particularly relevant. Based on these two studies, we designed six codes as basic targets for nonverbal communication, from the viewpoint of listener reactions in business meetings. These are 1) agreement and 2) applause as positive codes, 3) neutral and 4) confusion as neutral codes, and 5) skepticism and 6) disagreement as negative codes.
To design the ear animation movements corresponding to these nonverbal codes, we referred to human head movements and gestures. Research on detecting head movements has revealed the meanings of head movements as nonverbal information (Maynard, 1987; Buján, 2019). For example, validation is expressed by "nodding", a slow up-and-down head movement. A critical attitude is expressed by "neck swiveling", in which the head tilts from side to side.
Disgust or disagreement is expressed by "shaking", a head-turning movement.
Fig. 2 shows the ear animation's basic form and its rotation axes: X, Y, and Z. "Agreement" expresses positive nonverbal communication indicating agreement. It is represented by pitching (rotation about the X axis), a head-bending movement. "Skepticism" is designed to express mild negative nonverbal communication. It is represented by rolling (rotation about the Y axis), a head-tilting movement. "Disagreement" expresses a strong negative impression indicating rejection. It is represented by yawing (rotation about the Z axis), in which the head turns about its vertical axis. "Applause" is designed as the ears clapping together, which evokes the image of clapping hands. For "neutral", the ears contract vertically to express a state in which one cannot make a decision. "Confusion" is expressed by the ears entwining, showing that something is hard to understand. Fig. 3 shows the movement forms (typical drawings) we designed for the ear animations and the emotions we intended each to convey. Each movement form is described from its basic (starting) posture, through an intermediate posture, to its final posture. We set the ear movements to the same speed as the corresponding human head movements and hand clapping so that they would appear natural. Two certified public psychologists observed the ear movements and their speed in advance. They confirmed that the animations raised no particular concerns for use in dialogue. Videos of the six ear movements are available at: https://github.com/core-dx/mimichara/tree/main/paper/videodata/movement.
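The six codes, their valence groups, the axis assigned to each rotation-based content, and the basic, intermediate, and final posture description can be made concrete with a short Python sketch. It is an illustration under our own assumptions, not the authors' animation code. The coordinate convention (X to the viewer's right, Y toward the viewer, Z up), the unit ear-tip position, the keyframe angles, and the 1.5-second duration are hypothetical values. The duration is borrowed from the presentation time in Experiment 1 and is not a stated stimulus parameter. Linear interpolation between keyframes is likewise an assumed easing choice.

```python
import math

# Six listener-reaction codes and their valence groups, as listed in the text.
CODE_VALENCE = {
    "agreement": "positive",
    "applause": "positive",
    "neutral": "neutral",
    "confusion": "neutral",
    "skepticism": "negative",
    "disagreement": "negative",
}

# Rotation-based designs: content -> rotation axis (see Fig. 2).
# Assumed coordinates: X to the viewer's right, Y toward the viewer, Z up.
AXIS_FOR_CONTENT = {
    "agreement": "x",     # pitch: ears bend forward
    "skepticism": "y",    # roll: ears lean to one side
    "disagreement": "z",  # yaw: ears turn about the vertical axis
}


def codes_with_valence(valence):
    """Return the code names in one valence group, in the order listed."""
    return [code for code, v in CODE_VALENCE.items() if v == valence]


def rotate(point, axis, degrees):
    """Rotate a 3D point about the X, Y, or Z axis through the origin
    (right-hand rule for positive angles)."""
    x, y, z = point
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    if axis == "x":
        return (x, y * c - z * s, y * s + z * c)
    if axis == "y":
        return (x * c + z * s, y, -x * s + z * c)
    if axis == "z":
        return (x * c - y * s, x * s + y * c, z)
    raise ValueError(f"unknown axis: {axis}")


def angle_at(t, keyframes, duration=1.5):
    """Linearly interpolate an angle through evenly spaced keyframes
    (basic, intermediate, final posture); t is clamped to [0, duration]."""
    t = min(max(t, 0.0), duration)
    segments = len(keyframes) - 1
    pos = t / duration * segments
    i = min(int(pos), segments - 1)
    frac = pos - i
    return keyframes[i] + (keyframes[i + 1] - keyframes[i]) * frac


print(codes_with_valence("negative"))  # ['skepticism', 'disagreement']

# Hypothetical "agreement" pitch keyframes. Under this convention a
# negative pitch tips the ear tip toward the viewer (+Y).
ear_tip = (0.0, 0.0, 1.0)
keys = [0.0, -30.0, 0.0]
for t in (0.0, 0.75, 1.5):
    angle = angle_at(t, keys)
    x, y, z = rotate(ear_tip, AXIS_FOR_CONTENT["agreement"], angle)
    print(f"t={t:.2f} s  pitch={angle:6.1f}  tip=({x:.3f}, {y:.3f}, {z:.3f})")
```

In this sketch, the ear tip moves toward the viewer at the intermediate posture and returns upright at the final posture. Swapping the axis key to "y" or "z" gives the leaning and turning movements. The angle and speed would then become the tuning parameters that Experiment 1 shows to matter.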
Fig. 2
Relationship among rotation movement axes: X, Y, and Z of the ear animation.
Fig. 3
List of ear animation movement forms, their movement descriptions, and the content each was designed to convey.
Experiment 1: evaluating user impressions of ear-animation-movement forms as visual stimuli
Purpose of experiment 1
We designed the nonverbal communication needed in remote communication as ear-movement forms by translating typical head movements and hand gestures of Japanese people into ear animations. Would these ear animations convey the impressions we intended? The purpose of Experiment 1 was to obtain impression ratings of the nonverbal information conveyed by the movements of the ear animations. We used a one-way repeated-measures design with six levels. The six types of ear-animation movement we designed served as the independent variable, and ratings of the impressions each movement conveyed served as the dependent variables.
Hypotheses for Experiment 1
The hypotheses for Experiment 1 are as follows:
H1: The ears-bending-forward movement gives an impression of agreement.
H2: The ears-clapping movement gives an impression of applause.
H3: The ears-contracting movement gives a neutral impression.
H4: The ears-entwining movement gives an impression of comprehension difficulty.
H5: The ears-leaning-to-one-side movement gives an impression of skepticism.
H6: The ears-turning movement gives an impression of disagreement.
Method
Structure and procedure of Experiment 1
In Experiment 1, we tested H1 to H6 by examining whether each ear-animation-movement form conveyed its intended impression. Each ear movement was shown to the participants, who then answered six questions about the impressions the movement gave them (Table 1). Responses were given on a six-point Likert-type scale ranging from 0 (Never) to 5 (Yes, strongly agree). Participants watched a video of each ear-animation-movement form and then answered the questions in Table 1.
Table 1
Experiment 1: Questions for evaluating impressions of ear-animation-movement forms.
1. Did you feel that the movement expressed agreement, as if saying "I see"?
2. Did you feel that the movement expressed applause, as if saying "Great"?
3. Did you feel that the movement expressed comprehension difficulty, as if saying "I don't get it"?
4. Did you feel that the movement expressed a neutral attitude, as if saying "Neither"?
5. Did you feel that the movement expressed skepticism, as if saying "I'm not sure about that"?
6. Did you feel that the movement expressed disagreement, as if saying "I don't think so"?
Before rating the ear-animation-movement forms, participants reported their gender and age. Each participant sat approximately 50 cm in front of a display, an 11-inch iPad in portrait orientation. The ear-animation-movement forms were displayed on the upper part of a white screen, with each image sized at 50 mm × 60 mm. The videos were shown without sound. Each movement lasted 1.5 seconds and was repeated after a 2-second interval. The questions and the scale were shown beneath the animation, and participants entered their answers by selecting the number on the scale that matched their judgment. Once all questions had been answered, the NEXT button at the bottom of the screen became active. Selecting it advanced the screen to the next ear animation. To avoid order effects, the presentation order of the ear-animation-movement forms was counterbalanced by combining ascending and descending orders. After participants rated the last form and selected the NEXT button, a free-description box appeared for entering comments on the tablet. We also interviewed the participants to collect their comments. The numerical answers were compiled in SPSS for statistical testing. The experiment consisted of observing six movement patterns and answering six questions for each, for a total of 36 items to be judged.
It took approximately 10 minutes for each participant to complete the whole experiment, including the open-ended answers.
Experiment 1 used a within-subjects design. Using G*Power (Faul et al., 2009), we estimated the sample size required to detect a large effect (f = 0.40; 1 − β = 0.80), since head movements produce distinct movement differences. The analysis assumed one group and six ear-movement forms. According to this analysis, at least 11 participants were required.
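The presentation procedure above (a 1.5-second movement, a 2-second interval before it repeats, counterbalanced ascending and descending orders, and 6 × 6 = 36 rating items) can be sketched as follows. This Python example is a hypothetical reconstruction, not the authors' software. We assume that the "ascending" order is the list below and that participants alternately received the ascending and descending orders. The text does not specify the assignment scheme, and the function names are our own.

```python
# Six ear-animation-movement forms in an assumed "ascending" reference order.
MOVEMENTS = [
    "ears bending forward",
    "ears clapping",
    "ears contracting",
    "ears entwining",
    "ears leaning to one side",
    "ears turning",
]
QUESTIONS_PER_MOVEMENT = 6   # Table 1
MOVE_SECONDS = 1.5           # one movement
GAP_SECONDS = 2.0            # interval before the movement repeats


def presentation_order(participant_index: int) -> list:
    """Assumed counterbalancing: even-indexed participants get the ascending
    order, odd-indexed participants the descending order."""
    if participant_index % 2 == 0:
        return list(MOVEMENTS)
    return list(reversed(MOVEMENTS))


def repeat_onsets(n_repeats: int) -> list:
    """Onset times (s) of successive repetitions of one looping movement."""
    period = MOVE_SECONDS + GAP_SECONDS
    return [i * period for i in range(n_repeats)]


orders = [presentation_order(i) for i in range(24)]
print(sum(order[0] == MOVEMENTS[0] for order in orders))  # 12
print(len(MOVEMENTS) * QUESTIONS_PER_MOVEMENT)            # 36
print(repeat_onsets(3))                                   # [0.0, 3.5, 7.0]
```

With 24 participants, this assumed scheme gives each order to 12 people. A full Latin-square design would be an alternative way to balance position effects across all six movements.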
Participants
The research received full ethical approval from the Psychological Research Ethics Committee at the University of Tsukuba, Japan, before it began. Participants were required to live and work in Japan, to use remote meetings such as videoconferencing, and to speak and write Japanese in daily life and at work. They were recruited by opportunity sampling and received no compensation. No particular exclusion criteria were set. Before evaluating the ear animation movements, participants reported their gender and age. The final sample consisted of 24 participants (mean age = 45.04 years, SD = 11.91, range 21 to 62; 17 male and 7 female).
Results of experiment 1
To examine the impressions given by the ear-animation-movement forms, we conducted a one-way repeated-measures analysis of variance on the impression ratings for each of the six movement forms. Fig. 4 and Fig. 5 show the results of Experiment 1. In Fig. 4, the bold underlined value in each animation is the mean rating of the impression intended in that movement's design. The numbers below each impression show the results of the multiple comparisons against the intended impression.
Fig. 4
Results of the impression evaluation for each ear-movement form (mean (SD); the number below each impression indicates the result of multiple comparisons, and the bold underlined value in each animation corresponds to the intended design).
Fig. 5
Evaluation of impressions given by the ear-animation-movement forms.
For the ears-bending-forward movement, there was a significant main effect of impression (F(5, 138) = 129.97, p < 0.001, partial η² = 0.825). Bonferroni post-hoc tests (5% level) showed that the rating for "agreement", the intended impression, differed significantly from the ratings for all the other impressions.
For the ears-clapping movement, there was a significant main effect of impression (F(5, 138) = 121.87, p < 0.001, partial η² = 0.819). Bonferroni post-hoc tests (5% level) showed that the rating for "applause", the intended impression, differed significantly from the ratings for all the other impressions.
For the ears-contracting movement, there was a significant main effect of impression (F(5, 138) = 16.029, p < 0.001, partial η² = 0.367). Bonferroni post-hoc tests (5% level) showed that the rating for "neutral", the intended impression, differed significantly from "agreement", "applause", and "disagreement". No significant difference was found versus "confusion" and "skepticism".
For the ears-entwining movement, there was a significant main effect of impression (F(5, 138) = 12.536, p < 0.001, partial η² = 0.312). Bonferroni post-hoc tests (5% level) showed that the rating for "confusion", the intended impression, differed significantly from "agreement" and "applause", and from "disagreement" at p < 0.05. No significant difference was found versus "neutral" and "skepticism".
For the ears-leaning-to-one-side movement, there was a significant main effect of impression (F(5, 138) = 25.894, p < 0.001, partial η² = 0.486). Bonferroni post-hoc tests (5% level) showed that the rating for "skepticism", the intended impression, differed significantly from "agreement", "applause", "neutral", and "disagreement". No significant difference was found versus "confusion".
For the ears-turning movement, there was a significant main effect of impression (F(5, 138) = 12.401, p < 0.001, partial η² = 0.310). Bonferroni post-hoc tests (5% level) showed that the rating for "disagreement", the intended impression, differed significantly from "agreement", "applause", and "neutral". No significant difference was found versus "neutral", "confusion", and "skepticism".
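The statistics reported above can be illustrated with a small, stdlib-only Python sketch of a one-way repeated-measures ANOVA on a participants × impressions matrix. It is paired with the identity partial η² = F·df1 / (F·df1 + df2). This is a textbook computation offered as an illustration, not the SPSS procedure the authors ran. The toy data are invented, and no sphericity correction is applied. Applying the identity to the reported values reproduces, for example, partial η² = 0.825 from F(5, 138) = 129.97 and 0.367 from F(5, 138) = 16.029. For reference, the within-subjects error term below has (n − 1)(k − 1) degrees of freedom, which is 115 for n = 24 participants and k = 6 impressions. The reported denominator df of 138 equals 6 × 24 − 6.

```python
def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA.

    data: list of rows, one per participant; each row holds that
    participant's score under each of the k conditions (here: impressions).
    Returns (F, df_effect, df_error, partial_eta_squared).
    """
    n = len(data)            # participants
    k = len(data[0])         # conditions
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj   # condition x participant residual

    df_cond = k - 1
    df_error = (n - 1) * (k - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    eta_p = ss_cond / (ss_cond + ss_error)
    return f_value, df_cond, df_error, eta_p


def partial_eta_squared_from_f(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# Invented 3 participants x 3 conditions example.
toy = [[1, 2, 3],
       [2, 2, 5],
       [3, 5, 4]]
print(rm_anova_oneway(toy))  # (3.0, 2, 4, 0.6)

# Check against a value reported above: F(5, 138) = 129.97.
print(round(partial_eta_squared_from_f(129.97, 5, 138), 3))  # 0.825
```

In the toy example, the participant sum of squares is removed before forming the error term. This is what distinguishes the repeated-measures analysis from a between-subjects one-way ANOVA on the same numbers.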
Discussion of experiment 1
For the ears-bending-forward and ears-clapping movements, the "agreement" and "applause" ratings supported H1 and H2. We were thus able to confirm that the "agreement" and "applause" ear animations gave the impressions we intended. Open answers and interviews suggested that this was because those movements were natural enough to be easily understood. For the ears-contracting, ears-entwining, ears-leaning-to-one-side, and ears-turning movements, the impressions received did not match our design intentions. Therefore, H3, H4, H5, and H6 were not supported. In particular, the "disagreement" animation did not reflect our design intention. Its head turned through ±65 degrees at a fast speed. As a result, many participants commented in their open answers that it seemed to be looking around restlessly or searching for something. The turning angle and speed were probably set too high for Japanese participants to perceive the movement as intended. In addition, the "disagreement" movement, in which the head rotates about its vertical axis, gave participants an impression of "confusion" or "skepticism". The animation did convey a negative impression. However, the intensity (angle) and speed of a movement appear to be important for conveying subtle intentions. This also indicates a limit to the impressions that the movements alone could convey. The ear animation intended to express "neutral" ended up giving an impression of "confusion". Participants commented in their open answers or interviews that the movement was difficult to understand because they had never seen such a movement before. Because it was not a simple movement like turning the head, it seemed difficult to judge. Likewise, impressions of the ears-entwining movement for "confusion" were probably affected by how much it differs from natural movements.
The movements had difficulty conveying subtle impressions, but they had significant effects in classifying movements as negative or positive. This suggests that movements that do not exist in nature can give a positive or negative impression. This finding implies a certain degree of freedom in conveying subtle impressions. That freedom would exist if an ear animation were displayed together with words, or if the meaning of each movement were defined in advance. It further suggests that impressions could be either strengthened or softened. Furthermore, some open answers described the ear movements as cute and friendly-looking, indicating that participants had positive impressions of how the ear animations were presented.
Experiment 2: evaluation of user impressions when simple voice and ear animation are presented simultaneously
Purpose of experiment 2
In Experiment 1, we compared impression ratings for the six types of ear animation movement forms. In Experiment 2, we added the conveying method as an independent variable, with three conditions: ear animations presented silently, ear animations presented simultaneously with simple voice, and voice only. We compared psychological impressions across these three conditions. The second independent variable was the content conveyed, with three levels: "agreement", "skepticism", and "disagreement". These correspond to three movement forms: ears bending forward, ears leaning to one side, and ears turning. Each movement follows one rotation axis of the ear animation shown in Fig. 2 (X, Y, and Z, respectively). Experiment 2 therefore used a two-way repeated-measures design with conveying method and content conveyed as independent variables. The dependent variables were conveying information and conveying emotion, and participants rated their impressions from these two points of view. We added content conveyed as an independent variable to examine whether the influence of voice differed across ear animations. From Experiment 1, we selected three contents. "Agreement" produced strong, clear impressions and supported our hypothesis. "Skepticism" produced relatively strong impressions, and "disagreement" produced vague impressions. Including these three contents allowed us to observe how adding voice changed the intensity of the impression compared with the silent movement alone. To explore the potential of the ear animations for remote communication, we also asked participants whether they found the ear animation promising.
Hypotheses for Experiment 2
When voice and the ear animation are presented simultaneously, auditory and visual information are integrated. We expect this integration to make information and emotion easier to convey. As for the content conveyed, the movement forms of the ear animation are considered factors in conveying information and emotion. In Experiment 1, "agreement" (ears bending forward) received significantly higher impression ratings than "skepticism" (ears leaning to one side) and "disagreement" (ears turning). We therefore expected an interaction between conveying method and content conveyed. Specifically, we expected the ear animation with voice (voice and movement) combined with "agreement" to score higher than the other combinations. We also expected the ear animation with voice to score higher than voice-only communication. If so, it could be used to add nonverbal information to remote communication. The hypotheses for Experiment 2 are as follows:
1. In conveying information, there is an interaction between conveying method and content conveyed. The combination of the ear animation with voice (voice and movement) as the conveying method and "agreement" as the content conveyed scores higher than the other combinations.
2. In conveying emotion, there is an interaction between conveying method and content conveyed. The combination of the ear animation with voice (voice and movement) as the conveying method and "agreement" as the content conveyed scores higher than the other combinations.
3. An ear animation with voice is promising for remote communication.
Structure and procedure of experiment 2
The structure and procedure of Experiment 2 were basically the same as those of Experiment 1. The same ear animations for "agreement", "skepticism", and "disagreement" as in Experiment 1 were used. For each animation, we produced a voice sound using voice synthesis software: "I see" for agreement, "I'm not sure about that" for skepticism, and "I don't think so" for disagreement. All utterances were spoken in Japanese; we translated them into English in this paper for convenience. Videos of the ear movements with voice (in Japanese) are available at https://github.com/core-dx/mimichara/tree/main/paper/videodata/movementvoice. In Experiment 2, we compared evaluations of dialogue responses under three conveying methods: ear animation with voice (voice and movement), silent ear animation (movement), and voice only (voice). A male voice was used for synthesis. Each utterance was 1.5 seconds or shorter so that it fit the duration of the corresponding ear animation movement.
As in Experiment 1, each participant sat approximately 50 cm in front of an 11-inch iPad in portrait orientation. The ear-animation-movement forms were displayed on the upper part of a white screen, with each image sized at 50 mm × 60 mm. The display of the animations, the presentation of questions, the rating scale, the open-ended answers, and the interviews were all the same as in Experiment 1. In the voice-only condition, nothing was displayed in the 50 mm × 60 mm area, and the voice was played through the iPad's speaker. Table 2 lists the questions for Experiment 2. Participants observed three contents conveyed ("agreement", "skepticism", and "disagreement") under three conveying methods (voice only, silent ear animation, and ear animation with voice). They answered three questions for each combination, for a total of 27 items to be judged. It took approximately 10 minutes for each participant to complete the whole experiment, including the open-ended answers.
Table 2
Experiment 2: Questions for evaluating the communicative characteristics of dialogue responses using ear animations.
1. Did you feel this response conveyed the information from your partner?
2. Did you feel this response conveyed your partner's emotion?
3. Did you feel this way of conveying a response is promising to be used effectively in remote meetings?
In Experiment 2, we used a within-subject design. Using G*Power (Faul et al., 2009), we estimated the sample size required to detect a large effect size (f = 0.4; 1 − β = 0.80), since head movements produce distinct movement differences. We assumed a single group with nine measurements (three levels × three levels). According to this analysis, at least 9 participants were required. The research received full ethical approval from the Psychological Research Ethic Committee at the University of Tsukuba, Japan prior to its commencement. The participants in Experiment 1 also took part in Experiment 2. The final sample consisted of 24 participants (17 male, 7 female) with a mean age of 45.04 years (SD = 11.91, range 21 to 62).
Results of experiment 2
To analyze the ear animation with voice (voice and movement), we conducted two-way repeated-measures ANOVAs with conveying method and content conveyed as independent variables, separately for conveying information and for conveying emotion.
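The analysis described above can be sketched in code. The following is a minimal, hedged illustration of a two-way repeated-measures ANOVA with both factors within subjects, written in plain Python; it is not the authors' analysis script. The function name `rm_anova_2way`, the data layout `y[subject][content][method]`, and the demonstration data are hypothetical. The sketch partitions the sums of squares, tests each effect against its own effect × subject error term, and reports F, degrees of freedom and partial η². With 24 participants and two three-level factors it produces the degrees of freedom (2, 46) and (4, 92) that appear in the results. No sphericity correction is applied, and p-values would still have to be taken from the F distribution (for example with `scipy.stats.f.sf`).

```python
import random
from itertools import product
from statistics import fmean


def rm_anova_2way(y):
    """Two-way repeated-measures ANOVA (both factors within subjects).

    y[s][i][j] is the rating of subject s at level i of factor A and level j of factor B.
    Each effect is tested against its own effect-by-subject interaction as error term.
    """
    n, a, b = len(y), len(y[0]), len(y[0][0])
    S, A, B = range(n), range(a), range(b)
    g = fmean(y[s][i][j] for s, i, j in product(S, A, B))          # grand mean
    m_s = [fmean(y[s][i][j] for i, j in product(A, B)) for s in S]  # subject means
    m_a = [fmean(y[s][i][j] for s, j in product(S, B)) for i in A]  # factor A means
    m_b = [fmean(y[s][i][j] for s, i in product(S, A)) for j in B]  # factor B means
    m_ab = [[fmean(y[s][i][j] for s in S) for j in B] for i in A]   # cell means
    m_sa = [[fmean(y[s][i][j] for j in B) for i in A] for s in S]   # subject x A means
    m_sb = [[fmean(y[s][i][j] for i in A) for j in B] for s in S]   # subject x B means

    ss_a = n * b * sum((m_a[i] - g) ** 2 for i in A)
    ss_b = n * a * sum((m_b[j] - g) ** 2 for j in B)
    ss_ab = n * sum((m_ab[i][j] - m_a[i] - m_b[j] + g) ** 2 for i, j in product(A, B))
    # Error terms: each effect's interaction with subjects.
    ss_as = b * sum((m_sa[s][i] - m_s[s] - m_a[i] + g) ** 2 for s, i in product(S, A))
    ss_bs = a * sum((m_sb[s][j] - m_s[s] - m_b[j] + g) ** 2 for s, j in product(S, B))
    ss_abs = sum(
        (y[s][i][j] - m_sa[s][i] - m_sb[s][j] - m_ab[i][j] + m_s[s] + m_a[i] + m_b[j] - g) ** 2
        for s, i, j in product(S, A, B)
    )

    def effect(ss, df, ss_err, df_err):
        f = (ss / df) / (ss_err / df_err)
        return {"F": f, "df": (df, df_err), "partial_eta_sq": ss / (ss + ss_err)}

    return {
        "A": effect(ss_a, a - 1, ss_as, (a - 1) * (n - 1)),
        "B": effect(ss_b, b - 1, ss_bs, (b - 1) * (n - 1)),
        "AxB": effect(ss_ab, (a - 1) * (b - 1), ss_abs, (a - 1) * (b - 1) * (n - 1)),
    }


# Hypothetical demonstration data: 24 subjects x 3 contents x 3 methods, ratings 1-5.
rng = random.Random(0)
demo = rm_anova_2way([[[rng.randint(1, 5) for _ in range(3)] for _ in range(3)] for _ in range(24)])
for name, eff in demo.items():
    print(name, eff["df"], round(eff["F"], 2), round(eff["partial_eta_sq"], 3))
```

Testing each effect against its own subject interaction, rather than one pooled residual, is the usual choice for fully within-subject designs. It is also what yields error degrees of freedom of (a − 1)(n − 1).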
The result of a two-way ANOVA (repeated) with conveying information as the dependent variable
A main effect of the contents conveyed (F(2,46) = 5.60, p < 0.01, η² = 0.24) and a main effect of the conveying methods (F(2,46) = 9.42, p < 0.01, η² = 0.41) were significant. The interaction between the contents conveyed and the conveying methods (F(4,92) = 6.80, p < 0.01, η² = 0.30) was also significant, so we conducted a simple main effect analysis on this interaction.
The simple main effect of the contents conveyed with the ear animation with voice (voice and movement) was marginally significant (F(2,46) = 2.74, p < 0.1, η² = 0.12). Bonferroni multiple comparisons (5%) showed no significant differences among the contents conveyed with the ear animation with voice (voice and movement).
The simple main effect of the contents conveyed with the silent ear animation (movement) was significant (F(2,46) = 6.94, p < 0.01, η² = 0.30). Bonferroni multiple comparisons (5%) showed that, with the silent ear animation (movement), "agreement" scored significantly higher than "disagreement".
The simple main effect of the contents conveyed with voice-only (voice) was significant (F(2,46) = 7.36, p < 0.01, η² = 0.30). Bonferroni multiple comparisons (5%) showed that, with voice-only (voice), both "agreement" and "disagreement" scored significantly higher than "skepticism".
The simple main effect of the conveying methods for "agreement" was significant (F(2,46) = 6.07, p < 0.01, η² = 0.26). Bonferroni multiple comparisons (5%) showed that, for "agreement", the ear animation with voice (voice and movement) scored significantly higher than both the silent ear animation (movement) and voice-only (voice).
The simple main effect of the conveying methods for "skepticism" was significant (F(2,46) = 7.47, p < 0.01, η² = 0.32). Bonferroni multiple comparisons (5%) showed that, for "skepticism", the ear animation with voice (voice and movement) scored significantly higher than both the silent ear animation (movement) and voice-only (voice).
The simple main effect of the conveying methods for "disagreement" was significant (F(2,46) = 10.30, p < 0.01, η² = 0.44). Bonferroni multiple comparisons (5%) showed that, for "disagreement", the ear animation with voice (voice and movement) scored significantly higher than the silent ear animation (movement), and voice-only (voice) also scored significantly higher than the silent ear animation (movement).
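The simple main effects and Bonferroni comparisons reported above can be illustrated with a similar sketch. Here, as an assumption, a simple main effect is computed as a one-way repeated-measures ANOVA within one level of the other factor. For 24 participants this gives the reported degrees of freedom (2, 46). The pairwise follow-ups are paired t-tests whose p-values are multiplied by the number of comparisons (Bonferroni). The paper does not state exactly how its software formed the error terms. The function names and the demonstration ratings are hypothetical. To stay dependency-free, p-values come from the regularized incomplete beta function, evaluated with the modified Lentz continued fraction described in Numerical Recipes.

```python
import random
from itertools import combinations
from math import exp, lgamma, log, sqrt
from statistics import fmean, stdev


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_p(f, df1, df2):
    """P(F > f) for an F(df1, df2) distribution."""
    return betai(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def t_two_sided_p(t, df):
    """Two-sided p-value of Student's t with df degrees of freedom."""
    return betai(df / 2.0, 0.5, df / (df + t * t))


def simple_main_effect(scores):
    """One-way repeated-measures ANOVA; scores[s][k] is subject s under condition k."""
    n, k = len(scores), len(scores[0])
    g = fmean(v for row in scores for v in row)
    cond = [fmean(row[j] for row in scores) for j in range(k)]
    subj = [fmean(row) for row in scores]
    ss_cond = n * sum((m - g) ** 2 for m in cond)
    ss_err = sum((scores[s][j] - subj[s] - cond[j] + g) ** 2 for s in range(n) for j in range(k))
    df1, df2 = k - 1, (k - 1) * (n - 1)
    f = (ss_cond / df1) / (ss_err / df2)
    return {"F": f, "df": (df1, df2), "p": f_upper_p(f, df1, df2)}


def bonferroni_paired(scores, labels, alpha=0.05):
    """Paired t-tests for all condition pairs; p-values multiplied by the number of pairs."""
    pairs = list(combinations(range(len(labels)), 2))
    results = []
    for i, j in pairs:
        diffs = [row[i] - row[j] for row in scores]  # differences must not all be identical
        t = fmean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
        p_adj = min(1.0, t_two_sided_p(t, len(diffs) - 1) * len(pairs))
        results.append((labels[i], labels[j], round(t, 2), p_adj, p_adj < alpha))
    return results


# Hypothetical ratings (1-5) of one content by 24 subjects under three conveying methods.
rng = random.Random(1)
methods = ("voice", "movement", "voice+movement")
ratings = [[rng.randint(2, 5), rng.randint(1, 4), rng.randint(3, 5)] for _ in range(24)]
print(simple_main_effect(ratings))
for row in bonferroni_paired(ratings, methods):
    print(row)
```

Multiplying each p-value by the number of comparisons (capped at 1) is equivalent to testing each pair at α divided by the number of comparisons, which here is 0.05 / 3.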
The result of a two-way ANOVA (repeated) with conveying emotion as the dependent variable
A main effect of the contents conveyed (F(2,46) = 3.42, p < 0.05, η² = 0.15) and a main effect of the conveying methods (F(2,46) = 12.54, p < 0.01, η² = 0.54) were significant. The interaction between the contents conveyed and the conveying methods (F(4,92) = 6.29, p < 0.01, η² = 0.27) was also significant, so we conducted a simple main effect analysis on this interaction.
The simple main effect of the contents conveyed with the ear animation with voice (voice and movement) was significant (F(2,46) = 5.17, p < 0.05, η² = 0.22). Bonferroni multiple comparisons (5%) showed that, with the ear animation with voice (voice and movement), "agreement" scored significantly higher than "disagreement".
The simple main effect of the contents conveyed with the silent ear animation (movement) was significant (F(2,46) = 9.73, p < 0.01, η² = 0.42). Bonferroni multiple comparisons (5%) showed that, with the silent ear animation (movement), "agreement" scored significantly higher than "disagreement", and "skepticism" also scored higher than "disagreement".
The simple main effect of the contents conveyed with voice-only (voice) was not significant.
The simple main effect of the conveying methods for "agreement" was significant (F(2,46) = 12.26, p < 0.01, η² = 0.57). Bonferroni multiple comparisons (5%) showed that, for "agreement", the ear animation with voice (voice and movement) scored significantly higher than both the silent ear animation (movement) and voice-only (voice).
The simple main effect of the conveying methods for "skepticism" was significant (F(2,46) = 8.02, p < 0.01, η² = 0.35). Bonferroni multiple comparisons (5%) showed that, for "skepticism", the ear animation with voice (voice and movement) scored significantly higher than both the silent ear animation (movement) and voice-only (voice).
The simple main effect of the conveying methods for "disagreement" was significant (F(2,46) = 8.76, p < 0.01, η² = 0.38). Bonferroni multiple comparisons (5%) showed that, for "disagreement", the ear animation with voice (voice and movement) scored significantly higher than both the silent ear animation (movement) and voice-only (voice), and voice-only (voice) scored significantly higher than the silent ear animation (movement).
Table 3 shows the results of the evaluation of user impressions in Experiment 2.
Table 3
Evaluation results of user impressions on contents conveyed and conveying method by ear animations (Average value (SD)).
Contents conveyed | Conveying method | Information | Emotion
Agreement | Voice-only | 3.88 (1.33) | 2.58 (1.82)
Agreement | Movement-only | 3.83 (1.32) | 3.75 (1.59)
Agreement | Voice & Movement | 4.67 (0.55) | 4.58 (0.76)
Skepticism | Voice-only | 2.92 (1.63) | 2.71 (1.74)
Skepticism | Movement-only | 3.33 (1.40) | 3.08 (1.47)
Skepticism | Voice & Movement | 4.29 (1.14) | 4.13 (1.17)
Disagreement | Voice-only | 4.00 (1.15) | 3.13 (1.81)
Disagreement | Movement-only | 2.50 (1.63) | 2.00 (1.55)
Disagreement | Voice & Movement | 4.13 (1.27) | 3.79 (1.32)
Regarding user expectation that the ear animations could be used in remote communication, a main effect of conveying method was observed (F(2,207) = 17.057, p < 0.001, partial η² = 0.141). Bonferroni post-hoc tests showed that the ear animation with voice was rated significantly higher than both voice-only and the silent ear animation. A main effect of contents conveyed was also observed (F(2,207) = 4.957, p < 0.01, partial η² = 0.046), and the Bonferroni post-hoc test (5%) showed a significant difference between "agreement" and "disagreement". No interaction was observed between contents conveyed and conveying method (F(4,207) = 1.411, p = 0.231, partial η² = 0.027). Table 4 shows the results of evaluating user expectation on remote communication in Experiment 2.
Table 4
Results of user expectation on ear animations (voice and movement) to be used in remote communication.
Contents conveyed | Voice-only | Movement-only | Voice & movement
Agreement | 3.25 (1.57) | 3.63 (1.61) | 4.42 (1.10)
Skepticism | 2.21 (1.64) | 3.17 (1.69) | 4.17 (1.34)
Disagreement | 2.71 (1.99) | 2.29 (1.68) | 3.92 (1.28)
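Partial η² and F are both built from the same sums of squares. Because of this, partial η² can be recovered from a reported F value and its degrees of freedom: η²p = SS_effect / (SS_effect + SS_error) = F·df_effect / (F·df_effect + df_error). The short sketch below applies this identity to the F values reported for user expectation. The function name `partial_eta_squared` is hypothetical. The sketch is offered only as a quick consistency check when reading or reporting ANOVA results.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom.

    Since F = (SS_effect / df_effect) / (SS_error / df_error),
    SS_effect / (SS_effect + SS_error) = F * df_effect / (F * df_effect + df_error).
    """
    return f * df_effect / (f * df_effect + df_error)


# F values and degrees of freedom reported for user expectation in Experiment 2.
reported = {
    "conveying method": (17.057, 2, 207),
    "contents conveyed": (4.957, 2, 207),
    "interaction": (1.411, 4, 207),
}
for name, (f, df1, df2) in reported.items():
    print(f"{name}: partial eta^2 = {partial_eta_squared(f, df1, df2):.3f}")
```

For these inputs it prints 0.141, 0.046 and 0.027, which are the partial η² values reported for user expectation.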
Discussion of experiment 2
An interaction between conveying method and content conveyed was observed. The combination of the ear animation with voice (voice and movement) and "agreement" was superior to the silent ear animation (movement) and voice-only (voice). This supported hypotheses H7 and H8. A synergistic effect of the ear animation's movement and voice appears to have emerged in this conveying method. Many open-ended comments mentioned that it became much easier to understand what the ear animation intended to convey when voice was played simultaneously. Interestingly, in Experiment 1, the ears leaning to one side (designed to express "skepticism") and the ears turning (designed to express "disagreement") were perceived as "confusion". This indicates that the movements alone leave only vague impressions, but their communicative characteristics become clear when voice is added.
As for contents conveyed, "agreement" differed significantly from "disagreement" and "skepticism" in conveying information and emotion. As revealed in Experiment 1, this is conceivably because the ear movement for "agreement" made a very strong impression. This suggests that, with the ear animation with voice, the content conveyed affects both conveying information and conveying emotion, so it is important to optimize the movement forms of ear animations used with voice.
Regarding the interaction between contents conveyed and conveying methods, Fig. 6, Fig. 7 and Fig. 8 show the relationships between the total scores. Fig. 6 and Fig. 7 show the total scores for each combination of conveying method and content conveyed, for conveying information and conveying emotion respectively. In conveying information, the advantage of the ear animation with voice (voice & movement) was larger over movement-only for "agreement" and "disagreement", and larger over voice-only for "skepticism". In conveying emotion, its advantage was larger over voice-only for "agreement" and "skepticism", and larger over movement-only for "disagreement". Taken together, voice appears to be the main contributor to conveying information, while movement appears to be the main contributor to conveying emotion, and integrating voice and movement seems effective for conveying both. These results indicate that voice and movement can shape communicative characteristics without any facial expression. The large advantage over voice-only in conveying information for "skepticism" seems to stem from the vagueness of its utterance, "I'm not sure about that" (in Japanese), compared with the utterances for "agreement" and "disagreement". In this experiment, "disagreement" differed from "agreement" and "skepticism" in conveying emotion. As revealed in Experiment 1, the ear-turning movement designed to express "disagreement" was presumably weak in conveying impressions. In short, because the impression of the "disagreement" ear-movement form was vague, adding voice likely made the voice-and-movement condition score significantly higher than movement-only in conveying both information and emotion. In contrast, no significant difference was seen between its movement-only and voice-only conditions.
These weaknesses could be addressed by revising the wording of the "skepticism" utterance and the ear-movement form of "disagreement".
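The comparison of conveying methods in the discussion above can be checked directly against the means in Table 3. The sketch below computes, for each content and measure, how much the ear animation with voice gains over voice-only and over movement-only. It then reports which single-modality condition that gain is largest over. The names `TABLE3` and `combined_advantage` are hypothetical. The sketch uses only the published means, so it reflects differences in means rather than the significance tests. Because every participant rated every combination, the totals plotted in Fig. 6 and Fig. 7 are 24 times the means and give the same ordering.

```python
# Mean ratings from Table 3 as (information, emotion) for each content and conveying method.
TABLE3 = {
    "agreement": {"voice": (3.88, 2.58), "movement": (3.83, 3.75), "voice+movement": (4.67, 4.58)},
    "skepticism": {"voice": (2.92, 2.71), "movement": (3.33, 3.08), "voice+movement": (4.29, 4.13)},
    "disagreement": {"voice": (4.00, 3.13), "movement": (2.50, 2.00), "voice+movement": (4.13, 3.79)},
}
MEASURES = ("information", "emotion")


def combined_advantage(table):
    """Gain of voice+movement over each single-modality condition, and where it is largest."""
    result = {}
    for content, rows in table.items():
        for k, measure in enumerate(MEASURES):
            both = rows["voice+movement"][k]
            gains = {single: round(both - rows[single][k], 2) for single in ("voice", "movement")}
            result[(content, measure)] = (gains, max(gains, key=gains.get))
    return result


for (content, measure), (gains, largest) in combined_advantage(TABLE3).items():
    print(f"{content:12s} {measure:11s} over voice {gains['voice']:+.2f}, "
          f"over movement {gains['movement']:+.2f} -> largest over {largest}-only")
```

For information it identifies movement-only for "agreement" and "disagreement" and voice-only for "skepticism". For emotion it identifies voice-only for "agreement" and "skepticism" and movement-only for "disagreement".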
Fig. 6
Results of evaluating contents conveyed and conveying methods in conveying information.
Fig. 7
Results of evaluating contents conveyed and conveying methods in conveying emotion.
Fig. 8
Average scores of contents conveyed and conveying methods.
Regarding whether the ear animation with voice is promising for remote communication, participants rated it as more effective than both the silent ear animation and voice-only for all three contents: "agreement", "skepticism" and "disagreement". This result supported H9. It also indicates that the ear animations will need to be evaluated in actual remote communication in the future.
General discussion
Discussion
In this study, we conducted preliminary research on user impressions of featureless-face ear animations as nonverbal communication for remote communication with web cameras turned off. Regarding the design of the ear-animation-movement forms, statistical tests revealed that movements close to natural human movements, such as "agreement" and "applause", effectively conveyed the intended impressions. On the other hand, the "disagreement" movement of turning the ears from side to side gave an impression of looking around restlessly. Because it exaggerated a human movement, it produced a rather vague impression. When designing ear-movement forms for particular contents, the intensity and speed of the movement are likely to affect the strength of the impression. Therefore, designing an ear animation requires considering the content to be conveyed, movement forms suited to its intensity, and the optimal movement speed. Overall, Experiment 2 showed an interaction between conveying methods and contents conveyed in both conveying information and conveying emotion. The ear animation with voice combined with the ear-bending movement of "agreement" scored high in the impression ratings. This indicates that it is important to design the movement forms of ear animations appropriately and to integrate them with voice. Beyond conventional voice interaction, video interaction, and avatars with facial expressions, this study indicated the possibility of a new interaction style using ear animations with no facial information.
In a sense, using ear animations with no facial expression and no eye gaze means using nonverbal communication that lacks the most important part of human nonverbal communication. By integrating voice and movement, this communication style is thought to prompt people to use their imagination to make up for the lack of facial expression.
That is to say, rich emotions can be conveyed by combining voice and movement even though facial information is omitted and not every detail is visually expressed. Minimal design has been proposed as a way to emphasize the main points of communication by eliminating unnecessary design elements (Matsumoto et al., 2006). Moreover, a robot with voice but no facial expressions, built with the minimal design method for doll therapy, has been reported to be positively accepted by elderly adults with dementia (Sumioka et al., 2021). Although this research does not deal with a physical robot, it can be considered a new method of communication. It stimulates human imagination with a reduced amount of information by using the movements of ear animations without facial expressions.
As previously mentioned, it is difficult to make and recognize eye contact over a monitor, which has been a particular issue in real-time video interaction. The ear animations can be expected to avoid this issue because they omit eye gaze entirely. As long as human imagination can fill in this absence, having no gaze at all might be better than unnatural eye gaze appearing on the screen. Compared with audio-only interaction, the voice and ear animations were received more positively for use in remote meetings. This indirectly supports the acceptability of voice and ear animations even without facial or eye-contact information.
Focusing on the ear animations as visual stimuli, we can compare the differences and similarities between the ear animations and emojis. It has been over 20 years since emojis were first introduced. In recent years, emoji use has expanded remarkably, and emojis have been actively studied (Bai et al., 2019). In general, emoticons are used for expressing different emotions (Derks et al., 2008).
Many studies have revealed that emoticons and emojis make communication more enjoyable and stimulate arousal (Walther & D'Addario, 2001). As relatively simple visual stimuli, the ear animations have similarities to emojis, and emojis have been reported to be processed faster than words (Kaye et al., 2021). However, although the ear animation is a visual stimulus, it lacks the wide variety of expressions that emojis have. To address this disadvantage, we are interested in establishing hybrid communication that combines the ear animations and emojis, which we plan to investigate in future research.
Limitations of the ear animations in this study
In this study, we designed ear animations representing rabbit ears and evaluated user impressions of the nonverbal information, such as information and emotion, conveyed by specific movements of the ear animations. User impressions are affected by various factors such as the ear animations' design, movement forms, and movement speed. However, this study evaluated only specific ear animations and did not evaluate each factor separately. Regarding participants, age, gender, personality, ethnicity, and other characteristics may have an influence, but our sample was limited to certain groups of people. Although this study suggested combined effects of voice and ear animations, it was limited to combinations of specific words and specific ear animations. Furthermore, we need to consider how to present the ear animations on a screen, such as displaying a single ear animation or multiple ear animations at one time, as well as the viewpoint from which 3D ear animations are observed. The effect of the ear animations' background on user impressions should not be ignored either. The results of this study are limited to basic and specific examples of ear animations.
Future challenges for the ear animation
First, in this study, we proposed and designed the ear animations and conducted a preliminary evaluation of user impressions. For practical use, the characteristics of dialogues in remote communication using the ear animations need to be evaluated.
Second, in remote communication research using the ear animations, methods for displaying a self-image should be discussed. The ear animations can create a sense of presence as avatars in cyberspace. Therefore, it is important to investigate how ear animations are displayed in cyberspace so that they create a sense of the communication partner or of team unity in remote communication.
Third, although a basic potential for the ear animations has been indicated, further research on movement speed, movement intensity, and a variety of movement forms is needed to improve their communicative characteristics, as described in the discussion.
Conclusion
By proposing and designing the ear animations, and assessing user impressions of their movement forms and of the communicative characteristics of dialogue responses, this study suggests a use for ear animations without dynamically changing facial information: remote communication with web cameras turned off. In particular, "agreement" and "applause" were consistent with our design intentions in the evaluation of ear-animation-movement forms, and such natural movements easily convey impressions and dialogue responses. On the other hand, movement forms in this study that are unlike natural movements, such as "disagreement", "confusion" and "neutral", easily caused dispersion in conveying emotions. However, like the ear movement of "agreement", the ear movements of "skepticism" and "disagreement" were significantly improved by adding voice. This held for both conveying information and conveying emotion, compared with voice only and movement only. As a mechanism, it is likely that ear animations with voice produce a synergistic effect of auditory and visual perception.
To improve how impressions and dialogue responses are conveyed, and to adjust subtle expressions, we comprehensively evaluated the results of the statistical tests and the open answers. Based on these evaluations, we will further develop the design of the ear-movement forms and adjust their intensity (degree of dynamic movement), speed, and so on.
The ear animations in this study are not equipped with dynamically changing facial expressions, and the lack of facial information is considered to be complemented by human imagination. Because the ear animation design does not require subtle facial expressions to be customized, it needs relatively few sensors, which makes the ear animations easy to implement in remote communication. Based on the results of this study, the ear animations are expected to be applied in remote communication.
Going forward, it is desirable to evaluate communication using the ear animations.
Declaration of competing interest
The authors have no conflicts of interest directly relevant to the content of this article. The study received full approval from the Psychological Research Ethic Committee at the University of Tsukuba, Japan.