Using gaze patterns to predict task intent in collaboration
Chien-Ming Huang, Sean Andrist, Allison Sauppé, Bilge Mutlu
Abstract
In everyday interactions, humans naturally exhibit behavioral cues, such as gaze and head movements, that signal their intentions while interpreting the behavioral cues of others to predict their intentions. Such intention prediction enables each partner to adapt their behaviors to the intent of others, serving a critical role in joint action where parties work together to achieve a common goal. Among behavioral cues, eye gaze is particularly important in understanding a person's attention and intention. In this work, we seek to quantify how gaze patterns may indicate a person's intention. Our investigation was contextualized in a dyadic sandwich-making scenario in which a "worker" prepared a sandwich by adding ingredients requested by a "customer." In this context, we investigated the extent to which the customers' gaze cues serve as predictors of which ingredients they intend to request. Predictive features were derived to represent characteristics of the customers' gaze patterns. We developed a support vector machine-based (SVM-based) model that achieved 76% accuracy in predicting the customers' intended requests based solely on gaze features. Moreover, the predictor made correct predictions approximately 1.8 s before the spoken request from the customer. We further analyzed several episodes of interactions from our data to develop a deeper understanding of the scenarios where our predictor succeeded and failed in making correct predictions. These analyses revealed additional gaze patterns that may be leveraged to improve intention prediction. This work highlights gaze cues as a significant resource for understanding human intentions and informs the design of real-time recognizers of user intention for intelligent systems, such as assistive robots and ubiquitous devices, that may enable more complex capabilities and improved user experience.
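The pipeline described above — per-ingredient gaze features scored by an SVM, with the highest-scoring ingredient taken as the intended request — can be sketched as follows. This is a minimal illustration using scikit-learn's `SVC`, not the authors' implementation; the training data are synthetic placeholders shaped like the four gaze features reported in the paper (glance count, first-glance duration, total glance duration, and a most-recently-glanced flag).

```python
from sklearn.svm import SVC
import numpy as np

rng = np.random.default_rng(0)

def make_example(requested):
    """Synthetic per-ingredient feature vector:
    [n_glances, first_glance_ms, total_glance_ms, most_recent (0/1)].
    Label 1 = this ingredient was the one requested, 0 = it was not."""
    if requested:
        n = int(rng.integers(2, 6))            # several glances...
        first = rng.uniform(300.0, 800.0)      # ...each fairly long
        recent = 1                             # looked at most recently
    else:
        n = int(rng.integers(0, 3))
        first = rng.uniform(50.0, 300.0) if n else 0.0
        recent = 0
    total = first * max(n, 1)
    return [n, first, total, recent], int(requested)

data = [make_example(i % 4 == 0) for i in range(400)]
X = np.array([features for features, _ in data])
y = np.array([label for _, label in data])

clf = SVC(probability=True, random_state=0).fit(X, y)

# At prediction time, score every candidate ingredient and pick the
# one with the highest probability of being the intended request.
candidates = {
    "turkey": [4, 600.0, 2400.0, 1],
    "mustard": [1, 150.0, 150.0, 0],
}
scores = {name: clf.predict_proba([f])[0, 1] for name, f in candidates.items()}
predicted = max(scores, key=scores.get)
print(predicted)
```

In a real-time setting the candidate scores would be recomputed as each new fixation arrives, so the prediction can change as gaze evidence accumulates.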
Keywords: eye gaze; gaze patterns; intention; intention prediction; support vector machine
Year: 2015 PMID: 26257694 PMCID: PMC4513212 DOI: 10.3389/fpsyg.2015.01049
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. Data collection of dyadic interactions in a sandwich-making task. Left: Two participants, wearing gaze trackers, working together to make a sandwich. Middle: The participant's view of the task space from the gaze tracker. The orange circle indicates their current gaze target. Right: The layout of ingredients on the table. The ingredients, from top to bottom, left to right, are lettuce1, pickle1, tomato2, turkey, roast beef, bacon2, mustard, cheddar cheese, onions, pickle2, ham, mayo, egg, salami, swiss cheese, bologna, bacon1, peanut butter, lettuce2, pickle3, tomato1, ketchup, jelly.
Figure 2. Illustration of episodic prediction analysis. Each illustrated episode ends at the start of the verbal request. The top plot shows the probabilities of glanced ingredients that may be chosen by the customer. Note that each plotted probability is computed with respect to a single ingredient; normalizing these probabilities across all ingredients yields the likelihood that each ingredient will be chosen. The bottom plot shows the customer's gaze sequence. Ingredients are color coded. Purple indicates gazing toward the bread. Black indicates missing gaze data. An anticipation window is defined as the time period starting with the last change in the prediction and ending with the onset of the speech utterance. The beginning and end probabilities are the probabilities of the predicted ingredient at the beginning and end of the anticipation window.
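The normalization and anticipation-window computations described in the caption can be sketched as follows. The timestamps and per-ingredient probabilities in `frame_probs` are hypothetical values, not data from the study.

```python
# Hypothetical per-frame raw probabilities for each glanced ingredient,
# as (time_ms, {ingredient: probability}) pairs within one episode.
frame_probs = [
    (0,    {"ham": 0.20, "egg": 0.10, "jelly": 0.05}),
    (500,  {"ham": 0.25, "egg": 0.40, "jelly": 0.05}),
    (1000, {"ham": 0.15, "egg": 0.55, "jelly": 0.10}),
    (1500, {"ham": 0.10, "egg": 0.60, "jelly": 0.05}),
]
speech_onset_ms = 2000  # the verbal request ends the episode

def normalize(probs):
    """Normalize per-ingredient probabilities so they sum to 1."""
    total = sum(probs.values())
    return {k: v / total for k, v in probs.items()}

# The predicted ingredient at each frame is the argmax of the
# normalized probabilities.
predictions = []
for t, probs in frame_probs:
    norm = normalize(probs)
    predictions.append((t, max(norm, key=norm.get)))

# Anticipation window: from the last change in the prediction
# to the onset of the speech utterance.
change_times = [t for i, (t, pred) in enumerate(predictions)
                if i == 0 or pred != predictions[i - 1][1]]
anticipation_ms = speech_onset_ms - change_times[-1]
print(predictions[-1][1], anticipation_ms)
```

Here the prediction switches from "ham" to "egg" at 500 ms and then remains stable, so the anticipation window spans the final 1500 ms before the request.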
Summary of our quantitative evaluation of the effectiveness of different intention prediction approaches.
| Approach | Accuracy | Anticipation time |
| Chance | 4.35–11.11% | N/A |
| Attention-based | 65.22% | N/A |
| SVM-based | 76.36% | 1831 ms |
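For context on the chance baseline: with all 23 ingredients on the table (Figure 1), a uniformly random guess succeeds 1/23 ≈ 4.35% of the time. The 11.11% upper bound is consistent with nine remaining candidates (1/9); that denominator is our assumption about how the candidate set shrinks as ingredients are consumed during the task.

```python
# Reproducing the chance-baseline bounds in the table above.
lower = 1 / 23  # all 23 ingredients available (Figure 1)
upper = 1 / 9   # assumed: nine candidate ingredients remaining
print(f"{lower:.2%} to {upper:.2%}")
```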
Figure 3. Two main categories of correct predictions: a dominant choice (top) and a trending choice (bottom). Green indicates predictions by our SVM-based predictor that matched the ingredients actually requested by the customers. Purple indicates gazing toward the bread, and yellow indicates gazing toward the worker. Black indicates missing gaze data.
Figure 4. Examples of incorrect predictions. Red indicates the predictions made by the SVM-based predictor, whereas blue indicates the ingredients actually requested by the customers. Purple indicates gazing toward the bread, whereas yellow indicates gazing toward the worker. Black indicates missing gaze data.
Figure 5. Examples of special gaze patterns. Green indicates predictions by our SVM-based predictor that matched the ingredients actually requested by the customers. Blue indicates the ingredients that the customers picked, and red indicates our predictions. Purple indicates gazing toward the bread, whereas yellow indicates gazing toward the worker. Black indicates missing gaze data.
| Feature | Description |
| Feature 1 | Number of glances toward the ingredient before the verbal request (Integer) |
| Feature 2 | Duration (in milliseconds) of the first glance toward the ingredient before the verbal request (Real value) |
| Feature 3 | Total duration (in milliseconds) of all the glances toward the ingredient before the verbal request (Real value) |
| Feature 4 | Whether or not the ingredient was most recently glanced at (Boolean value) |
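Extracting these four features from a gaze sequence can be sketched as follows, assuming fixations are represented as hypothetical (target, duration) pairs preceding the verbal request; the sequence below is illustrative, not data from the study.

```python
# Hypothetical gaze fixations before the verbal request,
# as (target, duration_ms) pairs in temporal order.
gaze = [
    ("bread", 400.0), ("turkey", 600.0), ("ham", 200.0),
    ("turkey", 900.0), ("bread", 300.0), ("turkey", 500.0),
]

def gaze_features(gaze, ingredient):
    """Compute the four per-ingredient features from the table."""
    glances = [dur for target, dur in gaze if target == ingredient]
    return {
        "n_glances": len(glances),                          # Feature 1
        "first_glance_ms": glances[0] if glances else 0.0,  # Feature 2
        "total_glance_ms": sum(glances),                    # Feature 3
        "most_recent": gaze[-1][0] == ingredient,           # Feature 4
    }

features = gaze_features(gaze, "turkey")
print(features)
```

Computing this dictionary for every candidate ingredient yields one feature vector per ingredient, which is the form of input the SVM-based predictor scores.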