Enrique Cáceres (1), Miguel Carrasco (2), Sebastián Ríos (3). (1) Escuela de Informática y Telecomunicaciones, Universidad Diego Portales, Chile. (2) Facultad de Ingeniería y Ciencias, Universidad Adolfo Ibáñez, Av. Diagonal Las Torres 2700, Peñalolén, Santiago, Chile. (3) Departamento de Ingeniería Industrial, Universidad de Chile, Chile.
Abstract
Advances in eye-tracking technology have improved human-computer interaction by making it possible to control a computer without any kind of physical contact. This research describes the transformation of a commercial eye-tracker for use as an alternative peripheral device in human-computer interaction, implementing a pointer that needs only the eye movements of a user facing a computer screen and thus removes the need to control the software with hand movements. The experiment was performed with 30 test individuals who used the prototype with a set of educational videogames. The results show that, although most of the test subjects would prefer a mouse to control the pointer, the prototype has an empirical precision similar to that of the mouse, both when controlling pointer movement and when clicking on a point of the screen.
Over the last 30 years, many new technologies that make extensive use of human vision as a computer input and control system have been developed in areas as diverse as health services and patient evaluation (Gold et al., 2016; Asan and Yang, 2015), security and biometrics (Galdi et al., 2016), web usability (Cutrell and Guan, 2007; Penkar et al., 2013), interfaces (Yeoh et al., 2015), interactive systems (Milekic, 2003; Kiefer and Giannopoulos, 2012; Krassanakis, 2011), and cognition and neuroscience (Starr and Rayner, 2001; Johansson et al., 2001; Mrotek and Soechting, 2007). In particular, the development of applications for those with some degree of motor difficulties has been especially useful, since these facilitate the use of a mouse cursor and/or virtual keyboard and provide alternative access during the integration and rehabilitation processes (Biswas and Langdon, 2011; Adjouadi et al., 2004; Perini et al., 2006; Majaranta and Räihä, 2002; Majaranta, 2009; Zhai et al., 1999; Ward et al., 2000).

In spite of the growing amount of development, there are still several factors that limit precision in these systems; for example, the lack of a standard for the quality of the data obtained by such devices (Holmqvist et al., 2012; Lutteroth et al., 2015). A further and especially relevant factor is the correct evaluation of the point of regard, or observer gaze, since this must be predicted precisely to determine the point or button selected. This prediction capability depends on the type of application that is being developed (Biswas and Langdon, 2015; Barz et al., 2015). 
This is a field in continuous growth, and is mainly focused on facilitating the usability of applications, as well as diminishing cognitive overload in the observer.

Before the spread of the eye-tracker as a human-computer interaction tool, the mouse (and its variants) was the de facto tool for communicating with computers; however, many studies have demonstrated the difficulties this device presents as a control method, even when systems have been implemented to facilitate its use (Trewin et al., 2006; Keates and Trewin, 2005; Hwang et al., 2004). For patients with some degree of motor deficiency, such as amyotrophic lateral sclerosis, complete paralysis, or pyramidal syndrome (motor neuron disease), but with normal cognitive skills, sight is the best available communication option: the lack of a mechanism that allows them to write or read easily, among other things, becomes a serious barrier to their access to knowledge and decreases their chances for autonomy and personal development. Currently, there are many commercial devices available for people with disabilities; these include systems from LC Technologies, Tobii, and EagleEyes, which allow human vision to be used as the exclusive method for controlling a computer (Biswas et al., 2012; Biswas, 2016). In this line, our research seeks to integrate different technologies to develop a low-cost, head-mounted eye-tracker add-on that can be connected to any eye-tracker.

Although the focus of our research is directed at people with serious motor difficulties (in particular, those with complete immobility), we conducted our tests with volunteers who have average visual, cognitive, and motor skills. This is because we center the evaluation of this study on measuring system control capability, which isolates the user motor skill variable. 
As we discuss below, even when the paralysis factor is taken into consideration, previous studies on cognitive processes do not allow inferences to be made about the perceptive processes that occur in the observer, regardless of whether they have a motor condition. The visual movement patterns that currently exist allow for the estimation of where an observer is looking, but not necessarily what the observer wants to do (Hayhoe, 2004), which is why we focus this work on system performance for people who can fully follow basic system instructions.
Background
Thanks to enormous advances in eye-tracking technologies, it has been possible to advance the understanding of human vision in fields as diverse as psychology, product design, biology, cognitive neuroscience, and computer vision. One of the main uses of eye-tracking technology is the detection of eye movements in order to convert eye positions into movement patterns. The most common use of these devices is to obtain information for statistical purposes, either to determine the main sighting point of a user facing a billboard or other object, or to record the foci of his/her attention when looking at something in particular, along with the order of observation. A more complex application, which is gaining in importance, is research into human-computer interaction (HCI) by means of eye-tracking devices, through newly developed platforms that allow certain applications to be used with the technology or that incorporate it directly into the operating system of a computer or mobile device (MacKenzie et al., 2012; Jacob and Karn, 2003).

The eye-tracking concept refers to a set of technologies and procedures that make it possible to monitor and register the way in which a person fixes his/her visual attention on a given scene or image. It is important to distinguish two main approaches to performing the observation: (1) those that measure the position of the eye with respect to the head; and (2) those that measure the orientation of the eyes in space. The latter is known as the "point of regard". There are four techniques for measuring eye movements: (1) electrooculography (EOG); (2) suction cups or contact lenses; (3) photo- or video-oculography; and (4) video detection based on the pupil and the corneal reflection (Duchowski, 2007). 
This last technique allows researchers to measure a point of regard in relation to what is being observed, with high or low precision depending on the type of application required (Bates and Istance, 2002; Biswas, 2016). Although there is some debate about whether changes in observation position behave randomly, the most recent studies indicate that the process of fixing the gaze and looking at an object in a scene is an efficient, non-random process (Rajashekar, 2004; Riche et al., 2013), which is regulated by exogenous (stimulus-driven) or endogenous (cognitively-driven) factors (Smith and Mital, 2013).

Today, there are three kinds of eye-trackers that incorporate some combination of the above procedures: (1) those in which the head is supported and always remains still; (2) those that are affixed to the head through some type of helmet or lens, which allow for head movement; and (3) those that incorporate face movement (face tracking) so as to allow for significant movement without calibration losses. Regardless of type, the basic function of these devices is to capture an image produced by the optical reflection on the first layer of the cornea, and to use the vector formed between the center of the pupil and the center of the reflection points as the position value of the pupil for the eye-tracker. This vector, and not the absolute position of the pupil, is used because it is invariant with respect to involuntary movements of the device or the head. With calibration, these devices can detect the user's point of interest, i.e., the point at which they are actually looking (Bates and Istance, 2002; Ward et al., 2000; MacKenzie et al., 2012).

Eye-tracking systems operate through a series of sequential steps (represented graphically in Fig. 1). First, an infrared LED near the observer illuminates the eye, and the reflected image is captured by a nearby camera. 
The infrared LED allows the device to distinguish between the pupil and iris regions. At the same time, the camera also captures the corneal reflection; here, an application may use a predefined threshold to increase precision. The center of the pupil and the position of the corneal reflection are detected using segmentation algorithms. For each user, the system creates a transformation function that corresponds to every movement first, with respect to central vision, and second, relative to the observed image. This transformation takes place through a calibration process that is independent for each observer. After defining and applying the appropriate transformation function, the eye-tracker system can record every eye movement for the scene or image (Krassanakis, 2011; Duchowski, 2007).
Fig. 1
Diagram of the operation of an eye-tracker. (a) User's field-of-view, (b) eye-tracker output, (c) general eye-tracker workflow.
The calibration process mentioned above is usually performed by asking the user to look at several points on the system's screen, associating the position of the pupil fixed on each point with the coordinates of that point, thereby creating a transformation matrix between the two reference systems (Fig. 2a and b). If this process is performed incorrectly, the measurements produced by the system will also be erroneous. When the calibration error is small, an HCI cursor may be off by only a few pixels (fewer than 5), so the cursor always stays within a few pixels of the point at which the observer is looking (MacKenzie et al., 2012; Penkar et al., 2013). On the other hand, if the calibration of the system is outside this acceptable range, the position of the mouse cursor will have a significant error.
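As a minimal sketch of the calibration idea described above, the following Python code fits a mapping from pupil-to-reflection vectors to screen coordinates. It assumes an affine model fitted by least squares; the text only speaks of a "transformation matrix", and real devices may use higher-order models. The names `fit_affine` and `to_screen` and all sample values are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate calibration points (collinear?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_affine(eye_vectors, screen_points):
    """Least-squares affine map from eye vectors (vx, vy) to screen pixels (sx, sy)."""
    rows = [(vx, vy, 1.0) for vx, vy in eye_vectors]
    # Normal equations (A^T A) p = A^T b, solved once per screen axis.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for axis in range(2):
        atb = [sum(r[i] * p[axis] for r, p in zip(rows, screen_points))
               for i in range(3)]
        params.append(solve_linear(ata, atb))
    return params  # [[a, b, c], [d, e, f]]


def to_screen(params, vector):
    """Map one eye vector to screen coordinates with the fitted parameters."""
    vx, vy = vector
    return tuple(p[0] * vx + p[1] * vy + p[2] for p in params)


# Synthetic nine-point calibration grid (illustrative values only).
eye_vectors = [(vx, vy) for vy in (-8.0, 0.0, 8.0) for vx in (-10.0, 0.0, 10.0)]
screen_points = [(60.0 * vx + 800.0, 70.0 * vy + 600.0) for vx, vy in eye_vectors]

params = fit_affine(eye_vectors, screen_points)
gaze_x, gaze_y = to_screen(params, (5.0, -4.0))
```

If the user looks away from a target while it is being recorded, the corresponding pair is wrong and the fitted parameters shift for every later gaze sample, which is the calibration error the paragraph above describes.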
Fig. 2
Calibration of an eye-tracker based on video oculography: (a) Illustration of the pupil movements (black circles) and the quasi-stationary corneal reflections (small white circles) needed to calibrate an eye-tracker whose calibration depends on nine points (ASL eye-tracker); (b) Image of the user's eye taken by the eye-tracking device during system calibration.
Eye-tracking technology is most commonly used in the service of the disabled by controlling a mouse pointer through eye movements, i.e., the movements of the pupil are used to position the mouse pointer at the point of regard. This technology, which at first glance seems quite simple, takes multiple factors into account before any movement is made. In the first place, the eyes are constantly making small movements to correct their position at all times, even when the person believes that his/her sight is fixed on a point (this concept is called eye-fixation, while the small movements are known as saccades; see the discussion in Mrotek and Soechting (2007) and Lutteroth et al. (2015) for more details). If the pointer is designed to faithfully follow the movements of the user's pupil, without any kind of filtering or smoothing, its movement will be jittery. In order to simulate a stable position, predictive models of movement are generally used, or at least an average position with respect to a certain window of time (even if this method is imperfect due to changes in gaze velocity). Since the point of regard is related to the context of the task (Hayhoe et al., 2003), the majority of current models use this information to generate a control action or to manage the interface (Majaranta, 2009).

For disabled persons, the main benefit of emulating a mouse with an eye-tracking device is that it gives them access to any graphical user interface based on windows, icons, menus, and pointers (WIMP). Many operating systems also have virtual keyboards deployed on the screen that can be operated with the pointer. 
Thus, if an eye-tracking device can faithfully emulate a conventional mouse, it ensures complete control of the software system (Perini et al., 2006; Majaranta, 2009; Biswas and Langdon, 2015; Lutteroth et al., 2015). Depending on the focus and the task, many systems execute a click of the cursor based on the dwell time of the pointer in a given area, while others do so by means of a blink; that said, any other method that can be detected and interpreted by the system may be used. In addition to a left click on an object, other actions are needed for complete manipulation of traditional software systems: a double click, a right click, and dragging objects over the user interface. The most widely used solution in eye-tracking systems that emulate a mouse is that a short time on an object simulates a left click, while a long time on the same object simulates a double click (e.g., Biswas and Langdon, 2015; Biswas, 2016).

In spite of the advantages inherent to mouse operation, it is important to indicate that the applications for the users mentioned here should be designed to avoid cognitive overload, since this can rapidly tire the user out (Bates and Istance, 2002; Kumar et al., 2007; Biswas and Langdon, 2011). The onscreen keyboard is a clear example of overload, since it constrains the user to write each letter, and to repeat letters very frequently for redundant words (Ward et al., 2000). Other factors, such as letter size and spacing, affect application design, since they force vision to be situated over multiple parts of the screen (Drury and Hoffmann, 1992; Keates and Trewin, 2005). This point is especially important, since it is related to the amount of time necessary to execute an action on screen in a specific area and during a specific time; this performance is measured through Fitts's law (Fitts, 1954). 
Furthermore, even when using the best current eye-trackers, there is a precision error in visual angle of at least 0.5°, which affects the precision for smaller elements onscreen (Bates and Istance, 2002). It has been reported that, at a distance of 65 cm from the screen, the eye can see details in an area of 1.1 cm² (Yeoh et al., 2015).

Another relevant point to make on vision is that it is intrinsically linked to the cognitive processes of the observer. One of the first works to comment on this phenomenon was the well-known work of Yarbus (Yarbus, 1967; Tatler et al., 2010), which showed that the points the observer focused on changed spatially and temporally according to the type of question asked. In recent years, it has been possible to reveal that these points of fixation (places where we fix the gaze for a brief moment) are not sufficient to effectively determine the cognitive processes of the observer. This limitation implies that it is difficult to know what the observer is doing, in spite of the fact that we may use fixation points as special coordinates in relation to what is being observed (Hayhoe et al., 2003; Mrotek and Soechting, 2007). This further implies that, when faced with visual changes, only a small amount of visual information is retained in memory (Irwin, 1996; Starr and Rayner, 2001). This point is fundamental in designing and evaluating applications that make extensive use of visual information as a method to make inferences about the actions the observer performs; this is due in large part to the fact that what we see is related to subconscious processes, which are not necessarily affected by the physical condition of the user.
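To give a sense of scale for the 0.5° precision error quoted above, the following sketch converts a visual angle into an on-screen length. It assumes a flat screen viewed head-on from 65 cm and, purely for illustration, a 21 in 4:3 monitor at 1600 × 1200 pixels like the one used in the experiments; the function names are hypothetical.

```python
import math


def angle_to_length_cm(angle_deg, distance_cm):
    """On-screen length subtended by a visual angle centered on the line of sight."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def length_to_pixels(length_cm, screen_width_cm, screen_width_px):
    """Convert a physical length into pixels, assuming square pixels."""
    return length_cm * screen_width_px / screen_width_cm


error_cm = angle_to_length_cm(0.5, 65.0)

# Assumed display: for a 4:3 panel the width is 4/5 of the diagonal.
screen_width_cm = 21 * 2.54 * 4 / 5
error_px = length_to_pixels(error_cm, screen_width_cm, 1600)

print(f"0.5 deg at 65 cm is about {error_cm:.2f} cm, or {error_px:.0f} px")
```

Under these assumptions the error is a little over half a centimeter, so interface targets smaller than that cannot be selected reliably by gaze alone.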
Materials & methods
Most commercial eye-trackers have been designed not to serve as the basis for a peripheral that allows human-computer interaction, but rather to study the user's visual behavior in relation to their physical surroundings; some examples include systems developed by LC Technologies, Tobii, SMI, BLiNK, FaceLab, and EagleEyes. The developments and advances in these systems have allowed eye-trackers to become simpler and more portable than before, to be used by practically any user, and to have notably increased precision. This research focuses on a modification of a commercial, head-mounted eye-tracker in order to implement an efficient, low-cost, add-on system for vision-controlled computer interaction. To achieve this goal, we propose modifying the eye-tracker's control systems and integrating other technologies based on the concepts of computer vision. All these procedures and costs are listed below, and are detailed in the following sections.
Adding an infrared camera
Eye-trackers allow for the capture of visual panoramas and user eye movements with respect to their field of vision. In contrast, the main function of our proposal is to control the movements and actions of the UI cursor with respect to the user's eye movements: wherever the user looks on the screen, the pointer goes; to click on that point, the user needs only to fix their visual attention on it for a given time for the system to determine that the user wants to click there. In order to perform this modification, the coordinates of the user's gaze position must be estimated in reference to the user's screen, since the two are not identical. Mathematically, the aim is to transfer a point from one coordinate system to another. To do so, however, it must be verified that the user is looking at the screen area before the pointer is set at those coordinates. The main problem here is to detect a screen within the user's field of view. Because the user's gaze varies continuously, most image pattern recognition algorithms are not able to handle this detection in real time, especially under varying light conditions, screen sizes, and distances to the user's screen. Moreover, most such procedures are computationally costly and therefore take too much time to deliver results; furthermore, they are generally inaccurate in surroundings with multiple objects. On the other hand, integrated assistive devices based on sensors under the user's screen require the user to be placed in front of the screen with, at the very least, a direct line of sight to these sensors. This can be difficult for users with mobility restrictions.

To solve the above problem, we propose a simple and effective solution by means of a second camera placed on the head-mounted eye-tracker, whose sole purpose is to detect four previously configured infrared LEDs surrounding the user's screen (Fig. 3a and b). 
This second camera is complementary to the head-mounted eye-tracker, and allows for the registration of any movement made by the user that affects the view of the infrared camera or the eye-tracking camera image; as such, the same scene is integrated and the slight angular change is accounted for. The main benefit of this idea is that points of interest are filtered using a physical device, always visible to the additional infrared camera, and not by means of additional processing; this simple change reduces both processing time and the computational resources required. It also gives the user full mobility, because our sensors are placed in the user's field of view.
Fig. 3
Adapted devices and results of their unification: (a) SMI Eye-tracking Glasses including a second camera that detects infrared light; (b) Frame fitted with four infrared LEDs surrounding the monitor. These infrared lights are not visible to the human eye; (c) Detection of the four LEDs after the segmentation and mathematical morphology procedures; (d) Detection of the four LEDs that will finally be used.
Even though this method of filtering the image is very precise compared to image processing methods, any strong non-infrared light in the surroundings can still be detected slightly by the additional IR camera. To prevent this, the first step involves segmentation of the infrared LEDs. Here we use the Otsu algorithm (Gonzalez and Woods, 2008), which derives a decision threshold from the image histogram. The intensity of every pixel is compared with this threshold: if the condition is fulfilled, the intensity of that pixel is maintained; otherwise, it is set to zero (black). The decision threshold allows the high intensities of the detected LEDs to be separated from the faint traces of the ambient lighting.

Once the above filtering procedure has been carried out, it is necessary to correctly calibrate the position of the LEDs lining the screen with respect to the user. This is done by increasing the recorded size and intensity of the LEDs. Mathematical morphology procedures, e.g., erosion and dilation (Serra, 1983), are applied. First, a circular structuring element with a radius of three pixels is created to erode the captured image, which conserves only the center of each LED. Then the previous result is dilated with a circular structuring element with a radius of 20 pixels, thereby increasing its size. Because only the center of the LED is dilated, the final representation of each LED will be white and perfectly circular (Fig. 3d).

To establish a relation between the different cameras (and thus a permanent reference between the two coordinate systems), a frame the size of the test screen was designed with four infrared LEDs serving as points of reference. They are arranged such that the correct order can be determined even if one of the markers is not detected. Internally, an index was added to each LED, numbered from top to bottom and from left to right, and drawn on the LED itself in the window in which the positions of the LEDs are monitored. After carrying out all these procedures, the existing reference systems are unified, as described in detail in the following sections.
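The segmentation and morphology steps above can be sketched in plain Python as follows. This is an illustrative reimplementation, not the authors' OpenCV code: it works on a tiny synthetic frame with a single LED, and it uses structuring-element radii of 2 and 3 pixels instead of the 3 and 20 pixels reported for full camera frames. All function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold that maximises the between-class variance."""
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of their intensities
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def disk(radius):
    """Offsets (dy, dx) of a circular structuring element."""
    span = range(-radius, radius + 1)
    return [(dy, dx) for dy in span for dx in span
            if dy * dy + dx * dx <= radius * radius]


def erode(mask, radius):
    """Keep a pixel only if the whole disk around it lies inside the mask."""
    h, w, offsets = len(mask), len(mask[0]), disk(radius)
    return [[all(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                 for dy, dx in offsets) for x in range(w)] for y in range(h)]


def dilate(mask, radius):
    """Set a pixel if any pixel of the disk around it lies inside the mask."""
    h, w, offsets = len(mask), len(mask[0]), disk(radius)
    return [[any(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                 for dy, dx in offsets) for x in range(w)] for y in range(h)]


def centroid(mask):
    """Mean (x, y) position of all set pixels."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, on in enumerate(row) if on]
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


# Synthetic 16 x 16 frame: dark background, a faint ambient streak,
# one 5 x 5 LED blob and one isolated bright speck.
frame = [[10] * 16 for _ in range(16)]
for x in range(10):
    frame[0][x] = 60
for y in range(4, 9):
    for x in range(4, 9):
        frame[y][x] = 250
frame[13][2] = 250

threshold = otsu_threshold([v for row in frame for v in row])
mask = [[v > threshold for v in row] for row in frame]
led = dilate(erode(mask, 2), 3)   # erosion keeps the LED core, dilation regrows it
led_center = centroid(led)
```

In this toy frame the threshold separates the LED from the ambient streak, the erosion discards the isolated speck, and the dilation leaves one clean disk whose centroid can serve as the LED position.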
Homography matrix estimation
Because the system has two frames of reference (the user's field of view and the screen), a homography must be estimated between them so that coordinates can be transferred from one camera to the other. This process is performed with a homography matrix (known as an $H$ matrix), which is a linear operation in homogeneous coordinates that relates any point between two frames (Hartley, 1997). This procedure is carried out in two steps: (1) calibration between an image frame of the eye-tracker and an image captured by an infrared camera mounted on the eye-tracker (modified webcam, Fig. 3); and (2) calibration between the forward-facing infrared camera and the reference LEDs of the user's screen.

First, the relative position of the screen with respect to the reference system of the additional camera is obtained. In general, at least four point coordinates in each reference system are needed to estimate an $H$ matrix; as such, four coordinates that refer to the same point or object are needed per image (Fig. 4). So once the additional camera is fixed to the frame of the head-mounted eye-tracker, an image showing what both cameras detect at that exact instant must be captured. This allows the required four pairs of corresponding points to be determined from the cameras. Because the additional camera detects only the infrared lights, the points chosen in both images must be the same four LEDs that are lit on the edge of the monitor, because these are the only visible pixels in that image (Fig. 5b). In order to perform this calculation, we used the 2D re-projection described in Hartley and Zisserman (2000) to estimate an $H$ matrix. Briefly, the idea is to re-project a 2D point $\mathbf{m} = (x, y, 1)^\top$ into another plane by using a corresponding point $\mathbf{m}' = (x', y', 1)^\top$. This relationship can be defined, up to scale, as (Eq. (1)):

$$\mathbf{m}' = H\,\mathbf{m} \tag{1}$$

where $H$ is a 3 × 3 homography matrix. This problem can be redefined as a linear equation system for each corresponding point between two images (Eq. (2)):

$$\begin{bmatrix} x_i & y_i & 1 & 0 & 0 & 0 & -x'_i x_i & -x'_i y_i \\ 0 & 0 & 0 & x_i & y_i & 1 & -y'_i x_i & -y'_i y_i \end{bmatrix} \mathbf{h} = \begin{bmatrix} x'_i \\ y'_i \end{bmatrix} \tag{2}$$

where $(x_i, y_i) \leftrightarrow (x'_i, y'_i)$ is the $i$-th corresponding point and $\mathbf{h} = (h_{11}, h_{12}, h_{13}, h_{21}, h_{22}, h_{23}, h_{31}, h_{32})^\top$ collects the entries of $H$ with $h_{33} = 1$. Rearranging this linear equation as $A\mathbf{h} = \mathbf{b}$, this problem can be resolved by using a linear least square system, expressed as $\mathbf{h} = (A^\top A)^{-1} A^\top \mathbf{b}$. Since the two cameras cannot be related in real time, the additional camera must be fixed permanently to the frame of the eye-tracker, so that it cannot move and the point correspondences remain valid.
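The estimation step can be illustrated with a short sketch that builds the linear system of Eq. (2) from four point pairs and solves it for the entries of H, fixing the last entry to 1. With exactly four correspondences the system is square, so its least-squares solution coincides with the exact one and the sketch solves it directly. The function names and the pixel coordinates are hypothetical; this is not the authors' C#/OpenCV implementation.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_homography(src, dst):
    """Estimate the 3 x 3 matrix H (h33 = 1) from exactly four point pairs."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("this sketch expects exactly four correspondences")
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two equations per correspondence, one for each target coordinate.
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    h = solve_linear(rows, rhs) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(H, point):
    """Map a 2D point through H and divide by the homogeneous coordinate."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical LED positions seen by the added camera, and the screen corners
# (in pixels) they correspond to, ordered top to bottom and left to right.
leds_in_camera = [(100.0, 80.0), (520.0, 95.0), (90.0, 400.0), (540.0, 420.0)]
screen_corners = [(0.0, 0.0), (1600.0, 0.0), (0.0, 1200.0), (1600.0, 1200.0)]

H = estimate_homography(leds_in_camera, screen_corners)
mapped_corners = [apply_homography(H, p) for p in leds_in_camera]
```

With more than four correspondences the same rows would form an overdetermined system, which is where the least-squares formulation mentioned in the text becomes necessary.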
Fig. 4
Projective transformation process in an image from 2D to 3D space.
Fig. 5
Points in correspondence for determining matrix H, between the eye tracking device and the added camera, and its final representation. (a) Panoramic view of the eye tracking device showing its four corresponding points. (b) Panoramic view of the added camera showing its four corresponding points. (c) Monitoring window of the four infrared LEDs after the mathematical morphology, arrangement, and identification procedures, together with the representation of the user's visual attention, in the reference system of the added camera, by means of a green circle.
This procedure was carried out using the images seen in Fig. 5a and b; the coordinates of the required corresponding points of the two reference systems were determined and related, using either the eye-tracker or the additional camera as the initial reference system. With these coordinates, using Eq. (2), the values of the matrix H relating the two reference systems were calculated, and were then used in Eq. (1). This mathematical representation allows the coordinates of the user's visual attention to be transformed from their own reference system to that of the additional camera. Graphically, this new coordinate is represented by a green circle in the monitoring window of the proposed software (Fig. 5c).

The second step relates (1) the LED coordinates detected by the infrared camera and (2) the user's screen, on which the user fixes their visual attention. In order to use the reference of the user's screen, a new homography matrix must be estimated. The points in correspondence are given by the position of the infrared LEDs with respect to each reference system. In that of the additional camera, these points are updated continuously, because their coordinates are determined by the computer vision system described above, while in the reference system of the screen these coordinates are fixed and expressed in pixels. 
The procedures detailed in these points are summarized graphically in Fig. 6.
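The two-step mapping can be sketched as a composition of two homographies: a fixed one from the eye-tracker scene image to the added infrared camera, and one from the infrared camera to the screen that is re-estimated as the LEDs move in the image. The matrices below are made-up placeholders (a pure translation and a scale plus translation), and the function names are hypothetical.

```python
def matmul3(a, b):
    """Product of two 3 x 3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply_homography(H, point):
    """Map a 2D point through H and divide by the homogeneous coordinate."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def gaze_to_screen(h_eye_to_ir, h_ir_to_screen, gaze, screen_size):
    """Return the screen position of a gaze sample and whether it is on screen."""
    combined = matmul3(h_ir_to_screen, h_eye_to_ir)  # apply eye-to-IR first
    x, y = apply_homography(combined, gaze)
    width, height = screen_size
    return (x, y), (0 <= x < width and 0 <= y < height)


# Placeholder matrices, for illustration only.
h_eye_to_ir = [[1.0, 0.0, -20.0], [0.0, 1.0, -10.0], [0.0, 0.0, 1.0]]
h_ir_to_screen = [[4.0, 0.0, -400.0], [0.0, 4.0, -320.0], [0.0, 0.0, 1.0]]

on_screen_point, on_screen = gaze_to_screen(
    h_eye_to_ir, h_ir_to_screen, (320.0, 240.0), (1600, 1200))
off_screen_point, off_screen = gaze_to_screen(
    h_eye_to_ir, h_ir_to_screen, (50.0, 50.0), (1600, 1200))
```

The bounds check reflects the requirement that the pointer be moved only when the user is actually looking at the screen area.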
Fig. 6
Summary of the processes carried out to achieve human-computer interaction by means of eye-tracking glasses.
Ethical approval
Our analysis was carried out on 30 able-bodied subjects, for approximately 30 minutes per person. Approvals were obtained from the institutional review boards of the Universidad Diego Portales, and informed consent was obtained from all participants in our experiments. No compensation was offered to the participants. Before each test, every subject was informed verbally of the general aspects of this study: its objectives, its impacts, the overall way in which the system operates, and the way in which it is used were explained in detail. After this induction, the system (the eye-tracker with the aforementioned modifications) was fitted on each user, and calibrated by having the user look steadily at three points indicated by a supervisor at the beginning of the test. After carrying out all these procedures, the user was ready to control the mouse pointer using only their eye movements, being careful to keep the LEDs on the edge of the monitor within the visual range of the additional camera.
Analysis
Subjects interacted with two video games of the teaching software Activa tu Mente (owned by La Factoria d'lmages), using only their eye movements to control the mouse pointer (Fig. 7). A click action was triggered after a dwell time of 1200 ms, and only if the user's gaze remained within a 10 cm square area of the screen. This was programmed by inserting a click action in the flow of data that communicates the mouse with the operating system. This area size was chosen in relation to the type of menus used in this application, whose buttons are 20 mm × 50 mm on a 21 in monitor with a resolution of 1600 × 1200 pixels. It is important to mention that we used no other type of interaction with the cursor, such as drag-and-drop or right click; rather, our evaluation was centered on controlling the cursor and on the mental load of the user. The application was developed on a Lenovo ThinkPad X220 with an Intel Core i7 processor running Windows 7. The eye-tracker was a head-mounted SMI iView HED tracking system programmed with the SMI SDK in C# in Microsoft Visual Studio 2008, with the OpenCV 2.1 library.
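The dwell-based click described above can be sketched as a small state machine fed with timestamped gaze samples. The 1200 ms dwell time comes from the text; the tolerance region is expressed here as a half-width in pixels, which is an assumed parameter that would have to be derived from the physical area size and the monitor geometry. The class name is hypothetical, and the sketch only reports the click rather than injecting it into the operating system's mouse data flow.

```python
class DwellClicker:
    """Report a click once the gaze has stayed inside a square region long enough."""

    def __init__(self, dwell_ms=1200, half_size_px=50):
        self.dwell_ms = dwell_ms
        self.half_size_px = half_size_px  # assumed tolerance, not from the paper
        self.anchor = None                # center of the current dwell region
        self.start_ms = None              # time at which the current dwell began
        self.fired = False                # avoids repeated clicks on one dwell

    def update(self, x, y, t_ms):
        """Feed one gaze sample; return True exactly once per completed dwell."""
        moved_away = (
            self.anchor is None
            or abs(x - self.anchor[0]) > self.half_size_px
            or abs(y - self.anchor[1]) > self.half_size_px
        )
        if moved_away:
            # Start a new dwell centered on the current sample.
            self.anchor, self.start_ms, self.fired = (x, y), t_ms, False
            return False
        if not self.fired and t_ms - self.start_ms >= self.dwell_ms:
            self.fired = True
            return True
        return False


clicker = DwellClicker()
samples = [(500, 500, 0), (505, 498, 400), (498, 503, 800),
           (502, 500, 1200), (501, 500, 1600), (900, 500, 1700)]
clicks = [clicker.update(x, y, t) for x, y, t in samples]
```

Anchoring the region on the first sample of a dwell is one possible design; a running average of recent samples would be more tolerant of slow drift, at the cost of a less predictable trigger point.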
Fig. 7
Teaching software Activa tu Mente (owned by La Factoria d'lmages).
Subjective measurement on able-bodied users
The procedure for measuring and quantifying the experience of the subjects when interacting with the system was based mainly on a survey of 11 questions aimed at studying all the aspects perceptible to the user. The questions of the survey, together with the value of each non-numerical choice for each question (given in parentheses), are detailed below:

1. How would you grade this eye-tracking system in general? Grade it from 1 (bad) to 7 (perfect) in terms of your satisfaction.
2. Compared to a conventional mouse and in accordance with your needs, how do you find this device in general for controlling a computer? Choices: Much worse (−2), worse (−1), neither better nor worse (0), better (1), much better (2).
3. How comfortable did you feel physically using this device? Choices: Very uncomfortable (−2), uncomfortable (−1), neutral (0), comfortable (1), very comfortable (2).
4. How did you find the system's precision for controlling the mouse pointer with your sight? Grade it from 1 to 7 according to your satisfaction.
5. How difficult was it to adapt or get used to moving the mouse pointer with your sight? Choices: Very difficult (−2), difficult (−1), neutral (0), easy (1), very easy (2).
6. How long do you think it took you to adapt to moving the mouse pointer with your sight? Choices: Less than 5 minutes; between 5 and 10 minutes; between 10 and 15 minutes; between 15 and 20 minutes; more than 20 minutes.
7. How uncomfortable was controlling the constant involuntary movement of the pointer with your sight? Choices: Excessively uncomfortable (−4), very uncomfortable (−3), uncomfortable (−2), slightly uncomfortable (−1), imperceptible (0).
8. How did you find the system's precision for clicking with your sight? Grade it from 1 to 7 according to your satisfaction. Choices: 1, 2, 3, 4, 5, 6, 7.
9. How difficult was it for you to adapt to clicking with your sight? Choices: Very difficult (−2), difficult (−1), neutral (0), easy (1), very easy (2).
10. How difficult or uncomfortable did you find the calibration process of the eye-tracking device? Choices: Very difficult/uncomfortable (−2), difficult/uncomfortable (−1), neutral (0), easy/comfortable (1), very easy/comfortable (2).
11. Based on your performance using this device, how do you find this eye-tracking system as a tool to assist people who suffer a motor disability and are unable to use a conventional mouse? Choices: Very bad (−2), bad (−1), neutral (0), good (1), very good (2).
Objective measurement: system evaluation
A mathematical model initially proposed by Fitts (1954) has been widely adopted for evaluating the movement, time, and precision of a user moving a cursor from one point to another. This model has made it possible to compare different pointing devices, and it has also been incorporated as the basis of ISO/TS 9241-400 (2000). Since the focus of our study was not to compare one device to another, this research used the ISO 9241 Point-and-Select test to assess the time that the user requires to point and click in a real scenario (Biswas and Langdon, 2015). This task was carried out after the evaluation with the video game described before. Fig. 8a shows an image with a blue block in the center of the screen, which the user has to select. Immediately afterwards, a white block appears in a random position among several distraction blocks of the same color as the initial block (Fig. 8b). The time measured is the lapse between the selection of the blue block, which triggers the process, and the selection of the white block. This time does not include the 1200 ms (on average) necessary for the system to detect a selection. We also carried out the same test with a mouse left-click.
Fig. 8
ISO 9241 distractor task evaluated (Biswas and Langdon, 2015) (a) Initial block to seek user's attention. (b) Distractor task (white block).
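The selection mechanism referred to above, in which a click is detected once the gaze has rested on a region for long enough, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `DwellClicker`, the `radius_px` tolerance, and the sample format are assumptions made for illustration; only the 1200 ms detection time comes from the text.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DwellClicker:
    """Emit a click when the gaze stays within radius_px for dwell_ms."""

    radius_px: float = 40.0   # hypothetical tolerance for the fixation region
    dwell_ms: float = 1200.0  # average detection time reported in the text
    _anchor: Optional[Tuple[float, float]] = None
    _start_ms: float = 0.0

    def update(self, x: float, y: float, t_ms: float) -> Optional[Tuple[float, float]]:
        """Feed one gaze sample; return the click position, or None."""
        if self._anchor is None or math.dist(self._anchor, (x, y)) > self.radius_px:
            # The gaze left the region: restart the time count at the new point.
            self._anchor, self._start_ms = (x, y), t_ms
            return None
        if t_ms - self._start_ms >= self.dwell_ms:
            click = self._anchor
            self._anchor = None  # require a fresh fixation before the next click
            return click
        return None


clicker = DwellClicker(radius_px=10, dwell_ms=1200)
for sample in [(100, 100, 0), (102, 101, 600), (101, 99, 1200)]:
    result = clicker.update(*sample)
print(result)
```

Under this scheme, the task time reported for the eye-tracker would be the interval between two such click events minus the dwell time, which matches the way the 1200 ms is excluded from the measurements.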
Results & discussion
The ages of the 30 test subjects varied from 17 to 56 years, and their gender, educational level and technological knowledge also varied, as summarized in Fig. 9. The observations discussed in this section can be drawn from the graphs in Fig. 9.
Fig. 9
General information graphs of the test subjects. (a) Gender distribution. (b) Age distribution. (c) Educational level distribution. (d) Technological knowledge distribution of the subjects as perceived by themselves.
Subjective performance
A high proportion (80%) of the technologically more experienced subjects, who consider their own technological knowledge as advanced, gave the system a grade of 6.0 or higher (1.0 is bad, and 7.0 perfect), showing the positive view that this segment of technologically literate subjects had of the proposed system. Because of their knowledge and skills, technologically advanced subjects are the most critical and objective about technological systems, and their positive opinion of the system reveals that both the functionality and usability of the device are appropriate (Fig. 10).
Fig. 10
Graphs of the results of the survey applied to 30 test subjects. Distribution of the answers: (a) How would you grade this eye-tracking system in general? (b) Compared to a conventional mouse and in accordance with your needs, how do you find this device in general for controlling a computer? (c) How comfortable did you feel physically using this device? (d) How did you find the system's precision for controlling the mouse pointer with your sight? (e) How difficult was it to adapt or get used to moving the mouse pointer with your sight? (f) How long do you think it took you to adapt to moving the mouse pointer with your sight? (g) How uncomfortable was controlling the constant involuntary movement of the pointer with your sight? (h) Question 8 (i) How did you find the system's precision for clicking with your sight?. (j) How difficult was it for you to adapt to clicking with your sight? (k) Based on your performance using this device, how do you find this eye-tracking system as a tool to assist people who suffer a motor disability and are unable to use a conventional mouse? (l) box plot graph of questions (a,d,h).
The conventional mouse has been used for decades and satisfies users' needs thanks to its complete integration with the computer, which makes it an instrument that is hard to replace, even more so when compared with the proposed prototype, which may have lower precision. It is therefore not surprising that a large percentage of the subjects (60%) considered the prototype worse than the conventional peripheral, and only 6.67% of them considered it better. On the other hand, a considerable proportion of the subjects (33.33%) rated the system as neither better nor worse than a conventional mouse, which speaks to the potential of the device: it can be improved in the future and considered as a third input peripheral (Fig. 10b).
The microsaccades common to all users are detected by the eye-tracking device and reflected as a slight, but constant, involuntary movement of the pointer. In this respect, only 26.67% of the test subjects considered the jittery movements to be imperceptible, while 50% considered them slightly uncomfortable, 20% uncomfortable, and 3.33% very uncomfortable. The fact that most of these opinions are negative indicates that this aspect needs to be improved in future work by adding computational filters that reduce the jitter until it is practically imperceptible to the user. The main problem is that our eyes constantly make small movements within the observed zone, which can be difficult for some users to control. Normally, users learn to cope with this after some minutes of operation, so the level of discomfort decreases as they acquire more experience. Although this vibration was uncomfortable, it did not affect the subjects' performance or their opinion of the precision of the pointer. This is reflected in the excellent results achieved in Question 4 of the survey, where 60% of them graded the precision of the pointer with 6.0 or better, even in the presence of the involuntary vibration (Fig. 10d and g).
Almost all the subjects (96.67%) evaluated the precision of the pointer with a grade of 5.0 or higher, and 50% of the whole sample graded it with 6.0, which shows that the precision in controlling the pointer with the system is close to the precision achieved by the same user with a conventional mouse. This observation is extremely important, since the basic aim of this system is to achieve, solely with the user's eye movements, the same precision as when using a mouse to control the pointer, an aim that is fulfilled according to the high grades awarded by the subjects (Fig. 10d).
Using the proposed system, the test subjects had to fix their visual attention within a bounded range of the monitor in order for the system to detect that they wanted to click in that region. The precision of the system must be sufficient to distinguish between a click on a specific point of the screen and a fixation that was not long enough to trigger a click. The general opinion on this characteristic was positive, since 90% of the subjects gave it a grade of 5.0 or better. This tells us that a conventional user has no major difficulty in understanding that, to click on a point on the monitor using an eye-tracking device as an alternative computer peripheral, they need only fix their gaze on that point for a couple of seconds (Fig. 10h).
Of the participants, 53.33% graded the comfort of the system as comfortable or very comfortable, and only 16.67% considered it physically uncomfortable. This measurement shows that neither the robust system for holding the device on the subject's head nor the various physical transformations which it has undergone reduced the comfort with which the device was originally developed by the manufacturer. It should be noted that the physical comfort of subjects using a device that is attached to the head is an observation of primary importance. If the device causes discomfort, users will be unwilling to carry out the tests and will want to finish them as soon as possible, becoming more subjective when evaluating them; if the feeling is pleasant or neutral, participants will be more willing, resulting in more honest and objective measurements (Fig. 10c).
Perhaps one of the most interesting results of this study has to do with the opinion of the test subjects on the usefulness of the proposed tool to improve the quality of life of people suffering from some motor disability, judged from their personal performance with the device during the tests. On this point, 93.33% of the subjects considered that it would be a very good tool for people who are unable to use a conventional mouse due to some pathological or physical condition (Fig. 10k). It is important to clarify that, although the above evaluation is subjective, it seeks to place the user in the point of view of another person. This perception is valid inasmuch as it evaluates the tool based on sight, and people with motor difficulties do not necessarily have sight problems.
Finally, we performed an inferential analysis on Questions #1, #4 and #8 to assess the variability of each subgroup. First, we used a Kolmogorov-Smirnov (KS) test to check the normality of each subgroup at the 5% significance level. The results reveal that none of the above sets are normally distributed (Fig. 10a, d and h). For this reason, we applied a Wilcoxon signed rank test to find a confidence interval by varying the mean from 1.0 to 7.0. These results are shown in Fig. 10l. As was previously discussed, overall, most subjects perceived that the precision of the system was far above the neutral point of satisfaction (value 4.0). Although pointer and click precision are more spread out around the mean, both values are above grade 5.5 (see Table 1).
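The computational filtering mentioned above as future work for reducing pointer jitter could take many forms. The sketch below shows one common choice, an exponential moving average over gaze samples. It is an illustration only and is not part of the evaluated prototype; the function name `smooth_gaze` and the smoothing factor `alpha` are assumptions.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def smooth_gaze(samples: Iterable[Point], alpha: float = 0.2) -> List[Point]:
    """Exponentially smooth a sequence of (x, y) gaze samples.

    alpha close to 0 gives heavy smoothing (more lag); alpha = 1 leaves
    the samples unchanged. The default value is a hypothetical choice.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[Point] = []
    state = None
    for x, y in samples:
        if state is None:
            state = (x, y)  # initialise the filter with the first sample
        else:
            state = (alpha * x + (1 - alpha) * state[0],
                     alpha * y + (1 - alpha) * state[1])
        smoothed.append(state)
    return smoothed


# Synthetic jitter of +/-2 px around x = 100 (not data from the study).
raw = [(100 + (2 if i % 2 else -2), 50.0) for i in range(50)]
filtered = smooth_gaze(raw, alpha=0.2)
print(filtered[-1])
```

The trade-off in any such filter is between jitter suppression and pointer lag: stronger smoothing makes the pointer steadier but slower to follow deliberate eye movements.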
Table 1
Descriptive statistics of questions #1, #4, #8.
| Statistic | Overall score | Pointer precision | Click precision |
| --- | --- | --- | --- |
| Average | 6.36 | 5.60 | 5.90 |
| Standard deviation (STD) | 0.850 | 0.932 | 1.155 |
| Standard error of the mean (SEM) | 0.155 | 0.170 | 0.212 |
| Z-score | −15.245 | −9.401 | −9.009 |
| KS test p-value (5% significance level) | 0.001435 | 0.0230 | 0.02169 |
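As a quick arithmetic check of Table 1, the standard error of the mean follows from the standard deviation and the sample size of 30 subjects as SEM = STD / √n. The tabulated Z-scores also appear to be consistent with (4.0 − mean) / SEM, that is, with a comparison against the neutral grade of 4.0; the source does not state this formula, so that reading is an assumption. The sketch below recomputes both quantities from the rounded values in the table, so small differences in the last digit are expected.

```python
import math

N_SUBJECTS = 30       # number of test subjects reported in the study
NEUTRAL_GRADE = 4.0   # neutral point of satisfaction mentioned in the text

# (mean, standard deviation) as printed in Table 1
table1 = {
    "overall score": (6.36, 0.850),
    "pointer precision": (5.60, 0.932),
    "click precision": (5.90, 1.155),
}


def sem(std: float, n: int) -> float:
    """Standard error of the mean."""
    return std / math.sqrt(n)


def z_vs_neutral(mean: float, std: float, n: int) -> float:
    """Assumed reading of the Z-score row: distance of the neutral grade
    from the sample mean, in units of the standard error."""
    return (NEUTRAL_GRADE - mean) / sem(std, n)


for name, (mean, std) in table1.items():
    print(name, round(sem(std, N_SUBJECTS), 3),
          round(z_vs_neutral(mean, std, N_SUBJECTS), 2))
```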
System performance
Before the set of tests was applied to the subjects, the eye-tracking device was reset individually and a calibration procedure was carried out. This process required the subjects, once seated on the test chair, to fix their sight alternately on three specific points of the monitor for only a couple of seconds, following the instructions of the supervisor. The time associated with this process was variable, ranging from 5 to 15 minutes depending on the user and the eye-tracker adjustments needed. The precision of the transformation between the reference systems of the additional camera and the monitor depends only on the way in which the system detects the infrared LEDs around the monitor edge, and can be improved either by changing the camera's focus or by improving the filtering, segmentation and visual detection methods used by the system.
In the case of the transformation between the reference systems of the eye-tracking device and the additional camera, precision depends on only two factors: (1) the selection process of the four pairs of corresponding points between the two reference systems, which must be absolutely precise, since an error of a couple of pixels in their determination causes slight variations in the calculated transformation matrix and hinders the final performance of the prototype; and (2) once the reference systems have been calibrated to one another, the additional camera must not be moved from the position in which it was fixed to the eye-tracking device, because the slightest movement will de-calibrate the system completely. To avoid de-calibration, both cameras are fixed in such a way that the calibration remains correct as long as their relative position does not change. To calibrate both cameras, we use the methodology proposed by Hartley and Zisserman (2000), explained before.
In our research, we calculated the time taken by the user to click between the first assessment and the second (discounting the 1200 ms needed for the system to detect that a click is being made). This task was performed only once per user, and was not repeated. To assess the system's performance in clicking on the screen, we used the ISO 9241 test as reported in (Biswas and Langdon, 2015). Due to the limited number of study samples, we analyzed the results using the Bootstrap methodology (Efron, 1987) to estimate confidence intervals for the time distributions of the mouse and the eye-tracker. For the mouse, we obtained a median of 825.7 ms with a confidence interval of [823.8–827.6] ms, and a standard deviation in the interval [306.03–308.72] ms. For the eye-tracker, we obtained a median of 3792.9 ms with a confidence interval of [3777.66–3808.14] ms and a standard deviation in the interval [766.81–788.36] ms. The results indicate that the users carry out the task of clicking with the mouse consistently despite age differences (Fig. 11a). In contrast, the eye-tracker times show a greater range, especially for users over 30; these users may have been hindered by the optical devices they used in conjunction with the eye-tracker.
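The bootstrap estimation described above can be sketched as follows. This is the simple percentile variant, which may differ from the exact procedure the authors followed; the function name `bootstrap_ci`, the number of resamples, and the synthetic times are assumptions for illustration and are not data from the study.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def bootstrap_ci(
    data: Sequence[float],
    stat: Callable[[Sequence[float]], float] = statistics.median,
    n_resamples: int = 2000,
    level: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample with replacement and evaluate the statistic each time.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    tail = (1.0 - level) / 2.0
    lo_idx = max(0, int(tail * n_resamples))
    hi_idx = min(n_resamples - 1, int((1.0 - tail) * n_resamples) - 1)
    return estimates[lo_idx], estimates[hi_idx]


# Synthetic click times in ms for 30 hypothetical users.
times_ms = [800.0 + 10.0 * i for i in range(30)]
low, high = bootstrap_ci(times_ms)
print(statistics.median(times_ms), (low, high))
```

Passing `stat=statistics.stdev` instead of the default gives an interval for the standard deviation, in the same way that intervals are reported for both quantities in the text.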
Fig. 11
(a) Time taken by the user to click between the first assessment and the second. Test ISO 9241 (Biswas and Langdon, 2015), (b) Box-plot graph, (c) histogram of range 500 ms based on previous data, (d) CDF of paired t-test.
Next, we performed an inferential analysis of the data by using statistical tools. First, we used a Kolmogorov-Smirnov (KS) test to check the normality of each data set at the 5% significance level. The results of the KS test reveal that both measurements are normally distributed (Table 2). Then we conducted a paired-sample t-test to determine whether the two samples are likely to have come from the same underlying population. The results show that the null hypothesis is rejected, since the t statistic (−20.79) lies beyond the critical value (±2.045), which is to be expected given the inherent difference between the two tasks. Nonetheless, in order to find the point at which the p-value of the Cumulative Distribution Function (CDF) reaches its maximum, we incremented the mean time of the mouse data from 0 ms to 4000 ms and evaluated the p-value at each increment to align both distributions (Fig. 11c). After this procedure was completed, we observed that a difference of 3600 ms achieves the maximum p-value. The above analysis implies that the times of the pointing task behave like the mouse times shifted by a roughly constant offset: the reaction times for the mouse and the pointing task are significantly different, and it was necessary to add 3600 ms on average to the mouse reaction times to remove this difference.
Table 2
Descriptive statistics analysis.
| Normality analysis | Mouse | Pointing-task |
| --- | --- | --- |
| Average | 945.53 | 4091.2 |
| Variance | 110560.602 | 878152.097 |
| Observations | 30 | 30 |
| KS test (null hypothesis, 5%) for normality | True | True |
| KS test (p-value) | 0.93292 | 0.37637 |
| Paired t-test | Value |
| --- | --- |
| t | −20.78849931 |
| Cumulative Prob. P(T ≤ t) two tailed (p-value) | 5.74202E-19 |
| t-critical (two tailed) | 2.045229642 |
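The paired comparison and the offset search described in the text can be sketched with the standard library alone. The sketch computes the paired t statistic and then scans a grid of offsets added to the mouse times, keeping the one that minimises |t|, which is equivalent to maximising the two-tailed p-value. The function names, the grid step, and the synthetic times are assumptions; only the critical value 2.045 (two-tailed, 29 degrees of freedom) is taken from Table 2.

```python
import math
import statistics
from typing import Iterable, Sequence

T_CRITICAL = 2.045  # two-tailed critical value for n = 30, from Table 2


def paired_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Paired-sample t statistic for equal-length samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


def best_offset(mouse: Sequence[float], gaze: Sequence[float],
                offsets: Iterable[float]) -> float:
    """Offset (ms) added to the mouse times that minimises |t|."""
    return min(offsets,
               key=lambda d: abs(paired_t([m + d for m in mouse], gaze)))


# Synthetic times in ms for 30 hypothetical users (not data from the study).
mouse_ms = [900.0 + 5.0 * i for i in range(30)]
gaze_ms = [m + 3000.0 + (20.0 if i % 2 else -20.0)
           for i, m in enumerate(mouse_ms)]

t_stat = paired_t(mouse_ms, gaze_ms)
print("reject null hypothesis:", abs(t_stat) > T_CRITICAL)
print("aligning offset (ms):", best_offset(mouse_ms, gaze_ms, range(0, 4001, 100)))
```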
As a qualitative evaluation of the system, without claims of statistical significance, a control test was also applied, which consisted of tracing a circle on the screen by following a fixed circle (Fig. 12a). The user with the best performance in the ISO 9241 test was selected to do this test. The user could not see his results until the assessment was complete. Ten attempts were made; Fig. 12 shows the results of the first, fourth, seventh, and tenth attempts (Fig. 12b, c, d, and e). The graphic results confirmed that eye movements are always in straight lines and very rapid, with involuntary microsaccades occurring when the user fixes his gaze on a specific point; this agrees with the reports found in the literature (Biswas and Langdon, 2015). In the best result the user fixed his gaze on 8 points. Although the design of this test does not allow for inferences to be made about the potential behavior of different users, it is important for our method, since it allows us to visually understand the mechanisms at work when instructions are given to the user. Of interest is the consistency of these observations with the work of Yarbus (1967) on the behavior of eye movements.
Fig. 12
User's interaction for drawing a circle on the screen.
General costs
In relation to project costs, and discounting the use of a head-mounted SMI iView HED-tracking System, we used a Microsoft LifeCam 720HD, which we modified by deactivating the autofocus system and removing the IR filter, for a cost of $41 USD. Furthermore, a frame was built to hold the four infrared LEDs around the screen, as well as the transformer for powering them. The cost of the frame was $5 USD, the four LEDs $2 USD, and the transformer $7 USD. In total, the modification of the system cost around $55 USD. Recall that the system has been designed to be adapted to any head-mounted eye-tracker, so it could be commercialized as an adaptation add-on.
Conclusions
Most of the questions addressed by the empirical study fall under three basic study objectives: (1) the general performance of the proposed system; (2) its performance when the subjects attempted to control the pointer with their eye movements; and (3) its performance when the subjects attempted to click on a point of the screen by fixing their gaze on it. To that end, in each test we examined various factors that were useful for breaking down the general performance of the device with respect to its objectives, by observing the adaptation of the subjects to its use, their movement precision, their clicking precision, the physical comfort of the participants during the tests, etc. Finally, after collecting all the background information related to this research and studying all the results, the following conclusions were derived in relation to the adapted eye-tracking device and its inclusion in human-computer interaction.
In order to click on a point on the screen, the system requires the user to fix their visual attention on that point for a couple of seconds, and the system then carries out the action. The simplicity of this mechanism is reflected in the easy adaptation of the subjects to carrying out this procedure: more than 73% of them described it as easy or very easy to perform. Only 10% of them described it as a difficult procedure, sometimes because the subjects themselves moved their eyes unconsciously, restarting the time count that triggers the clicking action.
Adapting to moving the pointer with their eye movements was also easy for many subjects (40% of test subjects considered it easy), showing that it is not overly complex for the user to understand that the behavior of the pointer is regulated by their own eye movements. After the tests, the subjects commented that, although at first it was strange to see how the mouse pointer moved synchronously with their visual attention, after a few minutes this became normal and practically imperceptible. It should be noted that only 16.67% of the subjects described their adaptation as difficult, because at times the lag between the pointer and the user's point of visual attention caused confusion, and they unconsciously tried to follow and reach the pointer with their sight. Surprisingly, the time that the subjects considered they needed to adapt completely to moving the pointer by controlling their eye movements was shorter than expected; it took none of them more than 15 minutes to adapt, and more than half of them (60%) did so in less than 5 minutes. This result is key for the future integration of an eye-tracking device as an alternative computer peripheral, because it means that the time needed by a user to become familiar with it is minimal and does not depend on their technical knowledge.
As we have seen in the literature, the growing development of technology oriented towards users with motor skill problems is focused on improving their social integration and their rehabilitation processes. Our system, in addition to providing a mechanism for controlling a mouse pointer, provides a simple and efficient means to modify any commercial eye-tracker into a user-monitor interaction system. The design and integration of this new system is low-cost, and it can be installed on any eye-tracker with a software development kit (SDK), since it has been designed with open source code libraries (based on OpenCV).
Although our system was evaluated with volunteers with average visual, cognitive, and motor skills, the results are valid only under the assumption that the user has normal visual capability. Otherwise, it would not be possible to consistently control the mouse pointer. Furthermore, we have employed head-mounted technology, since current devices are low-weight and easily adapted to the user, and allow the user to move their head to a certain degree without losing system calibration. The greatest limitation is related to users who wear glasses. For this group, there are two options: (1) fabricate specific optical lenses for the eye-tracker, since many of these have interchangeable lenses; or (2) adapt the patient's glasses to the eye-tracker in such a way that the focus of the lens is adjusted to that of the eye-tracker. In sum, the proposed system is an innovative and low-cost modification that provides an alternative input peripheral for controlling the mouse pointer.
Declarations
Author contribution statement
Enrique Cáceres: Conceived and designed the experiments; Performed the experiments; Wrote the paper.
Miguel Carrasco, Sebastián Ríos: Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.
Funding statement
This work was supported by the National Commission of Science and Technology (CONICYT, Chile), Fondecyt grant no. 11100098, and by the School of Information and Telecommunication Engineering at Universidad Diego Portales.
Competing interest statement
The authors declare no conflict of interest.
Additional information
No additional information is available for this paper.