Alexander Nelson1, Sandy McCombe Waller2, Ryan Robucci3, Chintan Patel3, Nilanjan Banerjee3. 1. Department of Computer Science and Computer Engineering, University of Arkansas, Fayetteville, AR, USA. 2. Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, Baltimore, MD, USA. 3. Department of Physical Therapy and Rehabilitation Science, University of Maryland, School of Medicine, Baltimore, MD, USA.
Abstract
INTRODUCTION: This paper explores the feasibility of using touchless textile sensors as an input to environmental control for individuals with upper-extremity mobility impairments. These sensors are capacitive textile sensors that are embedded into clothing and act as proximity sensors. METHODS: We present results from five individuals with spinal cord injury as they perform gestures that mimic an alphanumeric gesture set. The gestures are used for controlling appliances in a home setting. Our setup included a custom visualization that provides feedback to the individual on how the system is tracking the movement and the type of gesture being recognized. Our study included a two-stage session at a medical school with five subjects with upper-extremity mobility impairment. RESULTS: The experimental sessions yielded binary gesture classification accuracies greater than 90% on average. The sessions also revealed intricate details in participants' motions, from which we draw two key insights on the design of the wearable sensor system. CONCLUSION: First, we provide evidence that personalization is a critical ingredient to the success of wearable sensing in this population group. The sensor hardware, the gesture set, and the underlying gesture recognition algorithm must be personalized to the individual's needs and injury level. Second, we show that explicit feedback to the user is useful when the user is being trained on the system. Moreover, being able to see the end goal of controlling appliances using the system is a key motivation to properly learn gestures.
Technological miniaturization and low-power systems have precipitated an explosive
growth in capability and adoption of wearable sensors. These sensors can be applied
to many medical and rehabilitative applications, including physiological
monitoring,[1] telemedicine,[2] rehabilitation
compliance,[3] and assistive input.[4] The prevalence of such systems
has increased to the point that wearable sensors and systems have become a major
fixture in medical rehabilitative and assistive devices and are poised to change the
way that medical practitioners interact with patients.

However, wearable sensors face issues with maintaining patient compliance. If the
patient chooses not to wear these devices, then no intervention can take place on
the person’s behalf. Therefore it is critical that compliance issues are
addressed by either reducing the burden of patient instrumentation or creating an
incentive for the user to wear these systems. To address the former, building
sensors directly into a user’s clothing or environment can greatly reduce
the burden of instrumentation, and provide a more seamless interface in which to
gather and actuate on collected information. E-textile systems solve this by using
textiles as the sensors themselves. For instance, Project Jacquard[5] from Google
Research is an industry project with the goal of creating fabric sensors embedded
into day-to-day clothing.

Assisting a user to perform tasks addresses the issue of creating an incentive to
wear such systems. These sensors can then be used to provide input for applications
such as environmental control and home automation. These kinds of sensors have
immediate impact as an accessibility tool, especially to those with upper-extremity
mobility impairments. Persons with this diagnosis, whether the result of a disease
or injury, often require wheelchairs for mobility. Depending on the severity of the
motor impairment, a user may require systems and sensors such as the
sip-n-puff,[6] eye gaze tracking, or electroencephalography (EEG)
monitoring[7] as an assistive interface to facilitate input. Sensors built
into clothing or into the environment such as bedsheets or wheelchair pads can act
as a simple and nonintrusive input for gesture recognition, facilitating additional
interaction patterns for individuals with these kinds of disabilities.

Designing a wearable gesture recognition system for upper-extremity mobility
impairment is a difficult and multifaceted problem. First, the amount of mobility
that a person has is highly dependent upon the type of injury or disease which
precipitates the mobility impairment. A stroke may remove fractionated movement
(ability to control specific portions of a limb) or somatosensation (feeling), while
a complete spinal cord injury removes all neurological functions below the injury
site. Second, there is large variation in injury levels and hence, the degree of
limb mobility that a user may have is highly variable. A person with a C6 spinal
cord injury may maintain the ability to pronate their forearm, while a person with a
C4 spinal cord injury may lose all or most motor function involving the arms. Third,
within the same injury level or exact diagnosis there exists a broad range in the
motions a person is capable of performing and in what they have relearned through
rehabilitation. Finally, even with all of these other factors held
equal, there remains large variation in hand postures, body build, and limb
reachability, all of which could potentially affect the best position to place an
assistive device. For these reasons, it is critical that the assistive device
conforms as much as possible to the mobility profile of the individual
user. With this requirement in mind, this paper asks and addresses the
following question: What underlying principles should govern the design of
sensors built into clothing for gesture recognition and environmental control
for individuals with upper-extremity mobility impairments?

This paper aims to highlight the primary challenges through a clinical study
addressing the use of textile wearable sensors as an accessibility tool for people
with these types of motor impairment. Our custom sensor, illustrated in Figure 1, is an array of
conductive textile plates sewn into fabric such as denim jeans. The flexible sensors
can also be built into items of daily use such as wheelchair pads, bedsheets, and
pillow covers using embroidery. The sensors capture movement in their proximity and
work on the principle of change in capacitance. A user wears the sensor array and
performs gestures in the proximity of the sensor. We focus on an alphanumeric
gesture set based on EdgeWrite.[8] The system uses a position
tracking and dynamic time warping (DTW)–based signal processing algorithm
that converts the raw capacitance measurements to an alphanumeric gesture. Each
classified gesture can be used to control appliances in a home setting. We perform a
usability study of the wearable sensor on five individuals with C3–C6 spinal
cord injuries. Figure 1
demonstrates our experimental setup. The cameras and accelerometer-enabled
smartwatch were used to capture ground-truth and baseline data for the system setup.
We use a custom visualization to provide feedback to the user on how they are
performing the gesture and how the system is recognizing the gestures. The system
was evaluated in a multiday study in a medical school setting.
Figure 1. The figure shows our prototype system and
experimental setup demonstrated by a subject. The system is composed of
a four by three capacitive sensor array sewn into the denim fabric using
conductive wires. The data from the sensors are analyzed using our
custom-designed wireless module, which uses capacitance measurement ICs,
an MSP430 microcontroller, and a Bluetooth wireless module. It also
demonstrates the smartwatch accelerometer, which was used to profile
gestures for confidence and intensity. The visualization demonstrates
two kinds of feedback: instantaneous positional data and post-gesture
classification.

Our study builds on related work on assistive technology and user studies that
evaluate assistive technology. Here we compare and contrast our work with the most
relevant literature.

Assistive technology: Assistive technology is a field that includes the use of any
tool that enables a user to perform a task that would be otherwise difficult or
impossible. This means that assistive technology can range from something as simple as a
“Mouth Stick” for various pointing exercises[9] to a complex
system such as EEG-driven wheelchairs[10] or electromyogram (EMG)
prostheses.[11] Gesture recognition in assistive technology has been
considered in recent years, including head gesture control of wheelchairs[12] and smart
interfaces to assist individuals with cognitive disabilities.[13] Recent years
have seen the growth of wearable sensors such as wrist-worn accelerometers, wrist
bands, and headgear for gesture recognition. More recently there is a surge of
systems where sensors can be built into items of daily use such as
clothing[5] for gesture recognition. While most of these systems are
touch-based, our textile sensor system is touchless and uses change in capacitance
to measure movement in the proximity of the sensors.[14] Touchless sensing is critical
for the considered population where users often experience limited sensitivity to
their periphery and continuous touch can lead to skin abrasion. This paper explores
the feasibility of using touchless wearable textile sensors built into clothing for
gesture recognition in individuals with limited mobility.

User studies on assistive devices: Assistive technology cannot be developed
completely in laboratory environments. The devices themselves are meant to apply to
specific populations, and therefore must be tested rigorously within that
population. Thus, user feasibility studies and evaluations have been performed to
evaluate assistive technology for multiple populations including cerebral
palsy,[15,16] dementia,[17] aging-in-place,[18] and spinal
cord injury.[19]
Motor learning for individuals with mobility impairments has been studied for
rehabilitation purposes. Amongst the salient conclusions drawn is the importance of
controlling feedback.[20,21] Additionally, the use of virtual reality has been considered
for rehabilitation of upper-extremity impairment.[22] Our study considers
individuals with upper-extremity mobility impairments as a result of spinal cord
injury to the cervical vertebrae. These individuals are typically wheelchair users
and use our wearable system for environmental control in a smart home. Our goal is
similar to that of the usability studies performed on various assistive care devices.
Through our system, we aim to identify the fundamental elements that must be considered
in the design of the sensor hardware and software, as well as the feedback
mechanisms for learning purposes.
Materials and methods
In our study, we use an array of textile capacitive sensors built into denim fabric.
The sensor array can be placed on a subject’s thigh or built directly into
clothing. The array used for the experiment is composed of three rows of four
sensors. Each sensor is one square inch (1 in. × 1 in.). The outside
of the array measures 6.5 in. by 7 in., with the sensors spaced equidistant from each
other. The metallic textiles couple electrically with the body of the user such that
hand gestures performed in the vicinity (within a few centimeters) of the sensor
array are captured by the system. We use a hierarchical signal-processing algorithm
to convert raw capacitance values to gestures. The on-body data-collection module
performs gesture classification in two steps: (1) the raw capacitance values are
converted to a two-dimensional projection of the geometrical centroid of the hand
onto the capacitor sensor array (CSA), and (2) a pattern-matching algorithm based on
dynamic time warping[23] classifies the gesture. This study focuses on interpreting
alphanumeric gestures based on the EdgeWrite gesture set.[8] A full write-up of the system is
available in our previous work.[24] Below we briefly describe the
tracking algorithm.

Hand tracking algorithm: The two-dimensional position of the hand is calculated as a
linear weighted summation of the sensor positions for any number of sensors
(N), each weighted by its capacitance (c). This
acts as a spatial centroid of capacitance, which is used to estimate the two-dimensional
position $(\hat{x}, \hat{y})$ by the following equations:

$$\hat{x} = \frac{\sum_{i=1}^{N} c_i x_i}{\sum_{i=1}^{N} c_i} \tag{1}$$

$$\hat{y} = \frac{\sum_{i=1}^{N} c_i y_i}{\sum_{i=1}^{N} c_i} \tag{2}$$

where $(x_i, y_i)$ is the position of sensor $i$ and $c_i$ is its measured capacitance.

Gesture classification: Gestures are segmented in real time by comparing the sum of
the capacitor sensors against a threshold T. A gesture
is inferred as the sequence of $(\hat{x}, \hat{y})$
tuples calculated from equations (1) and (2) while the summed capacitance
is less than T. If a gesture is exceedingly short or
long, then it is rejected as an inadvertent gesture. The remaining gestures are then
classified using DTW,[23] a distance-based vector quantizer, by comparing against a
set of training gestures called the codebook. The algorithm is described in
Algorithm 1. DTW uses dynamic programming to find the smallest summed distance
between two time series by compressing or dilating time.
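The tracking, segmentation, and classification steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the centroid is assumed to be normalized by the total capacitance, the point-to-point distance is assumed to be Euclidean, and all function names, parameters, and the layout of the codebook (a dictionary from gesture label to a list of template trajectories) are hypothetical.

```python
import math

def centroid(caps, positions):
    """Capacitance-weighted centroid of the sensor positions.

    caps: one capacitance reading per sensor.
    positions: one (x, y) sensor position per reading.
    Assumption: the weighted sum is normalized by the total capacitance.
    """
    total = sum(caps)
    if total == 0:
        return None  # no coupling measured, position undefined
    x = sum(c * px for c, (px, _) in zip(caps, positions)) / total
    y = sum(c * py for c, (_, py) in zip(caps, positions)) / total
    return (x, y)

def segment(sums, threshold, min_len, max_len):
    """Return (start, end) index pairs of candidate gestures.

    Following the text, a sample belongs to a gesture while the summed
    reading is below the threshold. Runs shorter than min_len or longer
    than max_len samples are rejected as inadvertent.
    """
    runs, start = [], None
    # A sentinel equal to the threshold closes a run that reaches the end.
    for i, s in enumerate(list(sums) + [threshold]):
        if s < threshold and start is None:
            start = i
        elif s >= threshold and start is not None:
            if min_len <= i - start <= max_len:
                runs.append((start, i))
            start = None
    return runs

def dtw(observed, model):
    """Warped distance between two sequences of (x, y) positions."""
    n, m = len(observed), len(model)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(observed[i - 1], model[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def classify(observed, codebook):
    """Label of the codebook entry with the smallest warped distance."""
    return min(
        codebook,
        key=lambda label: min(dtw(observed, t) for t in codebook[label]),
    )

# Hypothetical two-gesture codebook used only for illustration.
CODEBOOK = {
    "A": [[(0.0, 0.0), (1.0, 1.0)]],
    "B": [[(0.0, 1.0), (1.0, 0.0)]],
}
```

Because `dtw` can dilate time, a trajectory that pauses on a point still matches its template with zero cost, which is the property the codebook comparison relies on.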
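Algorithm 1: DTW(O, M)
Input: O (positions for the gesture), M (model positions for the gesture)
Output: d (warped distance)
    for i := 1 to len(O) do
        D[i, 0] := ∞
    end for
    for j := 1 to len(M) do
        D[0, j] := ∞
    end for
    D[0, 0] := 0
    for j := 1 to len(M) do
        for i := 1 to len(O) do
            distance := |O[i] − M[j]|
            D[i, j] := min(D[i−1, j], D[i, j−1], D[i−1, j−1]) + distance
        end for
    end for
    d := D[len(O), len(M)]
    return d
Complexity: O(len(O) · len(M))

Experimental setup: To analyze how each subject performed the gestures, a set of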
several input devices was used. Figure 1 demonstrates the setup that was utilized in the trials. Our
fabric CSA was used as a positional localization system to detect and recognize
gestures at very low power. Two cameras were mounted to capture
two views of the gestures: the distance of the hand from the CSA and the
x–y location of the hand with respect to the CSA. These two parameters are
important in the calculation of the position of the hand by the CSA device. A Sony
SmartWatch was used to capture accelerometer values from the hand that
performed the gestures. A virtual reality system built in the Unity
framework[25] provided instantaneous feedback to the user. This feedback
was given in two forms: a 3D tool that showed the motion of the arm as it performed the
gesture, and a 2D tool that showed the user’s calculated
positions. Finally, we used the gestures to control an off-the-shelf Z-Wave home
automation system. Specifically, we used the home automation system to control
lights, televisions, and fans.
User study trials
Our experiments were performed with five individuals who have spinal cord
injuries. Identities of the subjects were anonymized and the subjects were
compensated for their time. This study was approved by the University of
Maryland Institutional Review Board (HP-00060811). All participants signed
informed consent. The participants were males, right-hand dominant, with ages
ranging between 24 and 50 years. The demographics of the subjects in the study
are shown in more detail in Table 1, which provides the injury site, the American Spinal Injury
Association Impairment Scale classification,[26] the mode of transportation
that they most commonly use, and the time since the injury at the date of the user
study. Each subject participated in two sessions, with each session containing
several phases. The first session consisted of four phases: a training phase, an
examination phase, a testing phase, and a recall phase.
Table 1. Demographic information of users in study.

User  Level/type of injury   Transportation      Age  Since injury  Hand    Pointer
1     C6 complete ASIA A     Manual chair        40   6 months      Right   Finger
2     C5-6 complete ASIA A   Power chair         45   8 months      Right   Side of hand
3     C4 incomplete ASIA C   Power chair/walker  29   5 months      Either  Fist
4     C5 complete ASIA A     Power chair         24   8 months      Right   Fist
5     C7 complete ASIA A     Power/manual chair  38   12 months     Either  Palm

During the training phase, the user became comfortable using the system by moving
their hand around above the array and watching the virtual reality application
demonstrate the position extraction calculated by the system. The user then
learned each gesture in the set of gestures defined by EdgeWrite[8] and depicted in Figure 2, where each
gesture is defined by the alphanumeric character which it approximates. To be
considered trained on a gesture, the user had to correctly perform it five times in
succession on two different occasions. The virtual reality application was used
to provide instantaneous positional feedback as well as post-gesture
classification (e.g., “Gesture A”) feedback.
Figure 2. A pictorial depiction of the
EdgeWrite[8] gestures used in
the user study. Only gestures (a–e) were used in the first
trial, while gestures (j–v) were added to the original set
for the second trial.

During the examination phase, the user was presented a set of five gestures in
random order, and the user had to perform three sets correctly to verify that the
user was trained on the system and gestures. No instantaneous feedback was
provided, but the user was given feedback on classification through verbal
instruction.

The testing phase consisted of the user selecting three gestures from the set of
five gestures, and relating them to the three home automation components. For
instance, a user may choose to relate “A” to activating and
deactivating the fan. They were then presented 150 home automation commands in
random order, and attempted to correctly control the fixtures using the
gestures.

Finally, the recall phase consisted of the user attempting to create two sets of
five sequential correctly classified gestures for each of the five original
gestures.

The second trial consisted first of another “recall” session
where the subjects were not prompted with what the gestures looked like, and
were asked to perform the five gestures from the original gesture set. The
subjects then were given eight new gestures to learn to complete the set
depicted in Figure 2
from the EdgeWrite set, and trained on each of these gestures, again with
instantaneous feedback using our Unity 3D virtual reality system. They chose
five of these eight gestures, which were then given in a random order until they
were able to perform three sets of five gestures correctly. Lastly,
the subjects chose three gestures from the complete gesture set, and performed
home automation testing until the total time allotted for the session had
expired (typically ≈75 gestures). Results of the user study are
presented qualitatively in the following section through the distillation of two
specific insights, and quantitatively in Table 2 and in Figure 3.
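The home automation testing phase described above, in which three chosen gestures are bound to three appliances and prompted in random order, can be sketched as follows. This is only an illustration: `Appliance` is a hypothetical stand-in for a device on the home automation network and makes no real Z-Wave calls, and `recognize` is a hypothetical callback that returns the label the classifier produced for a prompted gesture (or `None` if the attempt was rejected).

```python
import random

class Appliance:
    """Hypothetical stand-in for a switchable device (fan, light, television)."""

    def __init__(self, name):
        self.name = name
        self.on = False

    def toggle(self):
        # Each recognized gesture activates or deactivates the device.
        self.on = not self.on
        return self.on

def run_testing_phase(bindings, recognize, n_commands=150, seed=0):
    """Prompt n_commands gestures in random order and return percent accuracy.

    bindings: dict mapping a gesture label to the Appliance it controls.
    recognize: callable taking the prompted label and returning the
               classified label, or None when no gesture was classified.
    """
    rng = random.Random(seed)
    labels = sorted(bindings)
    correct = 0
    for _ in range(n_commands):
        target = rng.choice(labels)
        result = recognize(target)
        if result in bindings:
            # The appliance bound to the classified gesture is actuated,
            # whether or not it was the one that was prompted.
            bindings[result].toggle()
        if result == target:
            correct += 1
    return 100.0 * correct / n_commands
```

A misclassification in this sketch actuates the wrong appliance, which is the kind of visible consequence that gives the user knowledge of results without positional feedback.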
Table 2. Accuracy of gesture classification when using a set of template gestures.

      Training and evaluation (% accuracy)   Initial testing (% accuracy)
User  “A”    “B”    “C”    “D”    “E”      “Testing 1”  “Testing 2”  “Recall 1”  “Recall 2”
1     100    92.5   92.5   100    87       97           100          98          95
2     90     100    100    100    85       90           N/A          98          96
3     100    82     100    100    50       67           90           N/A         92
4     100    92.5   100    86     92.5     90           87           98          98
5     100    92.5   100    100    70       92           93           98          92
Figure 3. Confusion matrices demonstrating the
percentage of classification of each gesture type to the other
classifications. Gestures with actual classification of
“?” are inadvertent gestures that were classified by
the system, and represent a false-positive classification. If a
subject had no inadvertent gestures, that column is labeled
“nan.”
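Confusion percentages of the kind shown in Figure 3 can be computed from a list of (actual, classified) pairs. The sketch below is an assumption about the bookkeeping, not the code used in the study: it normalizes over each actual class, uses the label "?" for inadvertent gestures, and yields NaN for an actual class that never occurred, which matches the “nan” entries described in the caption. The function name and data layout are hypothetical.

```python
import math
from collections import Counter

def confusion_percentages(pairs, labels):
    """Percentage of each actual class assigned to each classified label.

    pairs: iterable of (actual, classified) labels.
    labels: all labels to report, e.g. the gesture names plus "?".
    Returns a dict: actual -> {classified -> percent}. An actual class
    with no samples gets NaN for every entry.
    """
    counts = Counter(pairs)
    table = {}
    for actual in labels:
        total = sum(counts[(actual, predicted)] for predicted in labels)
        table[actual] = {
            predicted: (100.0 * counts[(actual, predicted)] / total)
            if total
            else math.nan
            for predicted in labels
        }
    return table

# Made-up trial data for illustration only; no inadvertent gestures occur.
EXAMPLE_PAIRS = [("A", "A"), ("A", "A"), ("A", "A"), ("A", "B"), ("B", "B")]
EXAMPLE_TABLE = confusion_percentages(EXAMPLE_PAIRS, ["A", "B", "?"])
```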
Results and discussion
Presented below is the raw classification accuracy of the system derived from trials
conducted during the user study. The gestures were classified in real time during
the study against a set of template gestures, which were created in a laboratory
setting. The accuracy is shown in confusion matrix form for each subject in Figure 3 and more granularly
in Table 2.

From a quantitative and qualitative analysis of the gestures performed during the
user study, we draw two salient insights.

Personalization is critical to the success of a wearable gesture
recognition system: Each individual performs gestures of different
sizes, different shapes, and with different speeds. This variation is a function of
the injury level, which affects the reach, speed, and way the gestures
are performed. Even for a single individual, the way a gesture is performed
varies based on factors like fatigue and motivation. This calls for
personalization at two levels: (1) sensor hardware construction and
placement on clothing and (2) design of gesture recognition algorithms
that adapt to the users.

Explicit feedback in the form of visualization is important for
training: Using controlled feedback through instantaneous and post hoc
methods helps users learn the gestures faster. It is also critical that
the learning occurs in the context of the target application. For
instance, in our study we found that our subjects were motivated to
learn the gestures when they could use them in the context of the smart
home automation system. Our results demonstrate that while sensor
hardware and software development is important, a critical ingredient to
the adoption of these systems is easy-to-use feedback and training
methods.

Using the above insights, we propose algorithms and hardware enhancements for
personalized adaptations to wearable sensors, and recommendations of feedback
mechanisms that the gesture recognition system will benefit from. Our study can
inform the design of usable wearable sensors for individuals with limited
upper-extremity mobility.
Insight 1: Personalization is critical to the success of a wearable gesture
recognition system
While it may seem natural that the inclusion of user preference and ability
should be considered in accessible and assistive technologies, many commercial
or off-the-shelf devices are not natively configurable, which can lead to device
abandonment.[27] Along with rapid prototyping, this fact has led to a
dramatic increase in Do-It-Yourself (DIY) assistive technology
development.[28] Wearable devices, such as the fabric CSA used in this
study, are well suited to individual configuration for particular users in
the same way that clothing can be tailored or fitted to specific body types.

User-level adaptation: The need for individual configuration became readily
apparent as the users began to perform gestures over the array. Each of the five
subjects chose to approach the CSA in a unique manner, as is demonstrated in
Figure 4.
Figure 4. Hand position of the five subjects in the
study. Each individual chose to use a unique hand position in
performing gestures, demonstrating the need for
personalization.

The exact rotation of the hand has a reduced effect on the operation of a
CSA-based system compared to inertial-based systems, as the position tracking is
calculated as the centroid of capacitance coupled to the remote body. Fidelity
of the position calculation is inversely proportional to the distance of the
remote body that is coupled into our CSA and directly proportional to the size
of that body. For example, subject 1 does not retract his fingers to make a
fist, and instead uses his fingers as a pointer. While the relative area of the
body is small (finger compared to a fist), the subject more accurately tracks a
specific location very near to the array, as is demonstrated in Figure 5(a).
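The dependence of the centroid on the size and distance of the coupled body can be illustrated with a one-dimensional toy model. The coupling model here is an assumption made purely for illustration and is not taken from the study: each pad's capacitance is taken to be proportional to the length of the body overlapping the pad divided by the body's height above it (a parallel-plate idealization), and the four pads are treated as contiguous with unit width. All names and numbers are hypothetical.

```python
def pad_capacitances(body_start, body_end, height, n_pads=4, pad_width=1.0):
    """Toy per-pad capacitance for a body spanning [body_start, body_end].

    Assumption: capacitance is (overlap length) / height, with pad k
    centered at k * pad_width and pads laid edge to edge.
    """
    caps = []
    for k in range(n_pads):
        lo = k * pad_width - pad_width / 2
        hi = lo + pad_width
        overlap = max(0.0, min(body_end, hi) - max(body_start, lo))
        caps.append(overlap / height)
    return caps

def centroid_1d(caps, pad_width=1.0):
    """Capacitance-weighted centroid of the pad centers, or None if uncoupled."""
    total = sum(caps)
    if total == 0:
        return None
    return sum(c * k * pad_width for k, c in enumerate(caps)) / total

# A narrow finger held close over pad 2.
finger = pad_capacitances(1.9, 2.1, height=0.2)
# A wider fist held farther away, centered over pad 2.
fist = pad_capacitances(1.25, 2.75, height=1.0)
# A hand over pad 0 with the forearm lying across the remaining pads.
arm = pad_capacitances(-0.5, 3.5, height=1.0)

finger_pos = centroid_1d(finger)
fist_pos = centroid_1d(fist)
arm_pos = centroid_1d(arm)
```

Under this model the finger and the fist both resolve to the pad beneath them, while the forearm pulls the centroid toward the middle of the array and away from the hand.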
Figure 5. Demonstration of how the capacitance centroid
calculation can be affected by various hand positions. The red
triangle indicates the position that the centroid would be
calculated in one dimension above the array. The black lines are
approximate capacitance based on distance to the body. Notice that
the finger (a) is very localized, and must be closer to obtain the
same amount of capacitance as the fist. The fist (b) has a large sum
capacitance and the centroid of capacitance is centered roughly with
the position of the fist. The arm (c) has a very large sum
capacitance, but the centroid is shifted away from the fist due to
the arm extending across the other sensor
pads.

Conversely, user 3 approaches the array with a fist, which is a much larger body,
and spreads out the capacitance over a broader area (e.g., Figure 5(b)), but allows a greater
distance from the array without losing fidelity. The rotational orientation of
the hand with respect to the CSA was typically the palm facing downward, but
subject 2 found it more comfortable to have his hand rotated 90°, while
subject 5 chose to alternate his palm between facing up or down as his arm would
become tired over time. These different hand positions and orientations could
potentially reduce accuracy in vision and inertial-based systems. The effect on
the CSA is demonstrated through the depiction of the position centroid
calculation in Figure 5.

An artifact that is more specific to the CSA-based system is that the angle at
which one approaches the CSA can affect the position calculation. Subject 3 has
multiple incomplete spinal cord injuries, and consequently his mobility profile
is more diverse. He is capable of using an upright platform walker, but his
movements are typically slower and his shoulder movement is less pronounced.
When he uses the CSA, his entire arm stretches across the array, which greatly
spreads out the calculated centroid for position. This effect is demonstrated in
Figure 5(c).
Therefore, it was easier for him to have the array oriented toward him on an
incline, shown in Figure
4(3B). Additionally, depending on the movement, he may choose to use
the hand that is closer to the majority of motions so that there is not as much
strain of reaching across the array.

These adaptations by the user allowed them to approach the CSA in a way that was
comfortable and provided maximal accuracy given a template gesture set. This
training phase is indicative of a typical motor-rehabilitation session with an
occupational therapist as a person learns to use a new assistive device.
However, assuming that a person will be able to adapt their gestures to the
device is risky, as many users may not have the ability to perform certain
motions because of mobility constraints. Therefore, in order to achieve per-person
personalization without requiring the user to adapt to the system, we
consider the inverse: a system which adapts to the user. To
enable this, we consider adapting the gestures to fit each individual’s
own mobility profile through custom templating. This adaptation is introduced,
and a potential solution is evaluated, in “Adaptations for
Personalization.”
Insight 2: Explicit feedback in the form of visualization is important for
training
Given the importance of controlled feedback in motor learning and
rehabilitation, a portion of the study was conducted to determine how a
user’s gestures may vary as a function of the type of feedback which
they are receiving. As the user was initially learning the system and the
EdgeWrite gesture set, instantaneous feedback was provided in the form of
virtual reality visualization. This component was removed and replaced by verbal
confirmation of correct gesture performance. The verbal confirmation was then
removed and replaced by activation of home automation hardware. In the second
trial, the user was instead prompted first with the verbal confirmation. We then
covered the array so that they did not have localized visual feedback, and were
only provided with verbal confirmation. While learning a new set of gestures,
they were provided with the online feedback. The session ended with the home
automation activation as feedback.

In all of these trials, the gestures that they were performing were evaluated for
accuracy in real time against template gestures. The home-automation component
provides motivation for the user to correctly perform each gesture as the system
is intended to be an accessibility device which could help these individuals in
the future. It is natural, then, for the method in which the user performs the
gesture to change over time as they attempt to match the template to obtain a
higher accuracy. The individual is learning the system based on
the visual online feedback. Therefore, the online feedback through instantaneous
positional data was removed once the user was trained on a gesture to prevent
manipulating the particular user’s form. The user was instead provided
only with knowledge of results based on confirmation from the attendant, or
activation of home automation hardware.
Analysis of gesture timing demonstrates an important factor in removing the
instantaneous feedback. When users are given the visual feedback, they tend to
try to trace the gesture precisely, which can lead to slower, more diverse
gestures. The gesture speed reduction is shown across four of the five subjects.
The gesture length throughout the trial is demonstrated in Figure 6. Without visual feedback, the
user tends to focus solely on their own hand in relative position to the CSA,
which results in quicker, more fluid motions.
Figure 6.
This figure
demonstrates the length (in seconds) of each gesture for Subject
1’s first trial. A linear fit line demonstrates a reduction
of over a second per gesture throughout the
trial.
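The linear fit mentioned in the Figure 6 caption can be sketched as an ordinary least-squares line over the gesture index. This is a minimal illustration only: the function names are hypothetical, the fitting procedure actually used in the study is not stated, and the durations in the usage note are made-up values, not study data.

```python
def linear_fit(durations):
    """Least-squares line through (i, durations[i]) for i = 0..n-1.

    Returns (slope, intercept); slope is in seconds per gesture.
    """
    n = len(durations)
    if n < 2:
        raise ValueError("need at least two gestures to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(durations) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(durations))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fitted_change(durations):
    """Change in the fitted duration between the first and last gesture."""
    slope, _ = linear_fit(durations)
    return slope * (len(durations) - 1)
```

For example, calling `fitted_change([4.0, 3.5, 3.0, 2.5, 2.0])` on these illustrative durations gives a negative value, meaning the fitted gesture length shrinks over the trial.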
To test this, we experimented with covering the CSA with a nonconductive material
so that the user did not have the ability to locate the gesture to specific
pads. This is shown in Figure 7. Removing the ability of the individual to see the CSA resulted in
smoother gestures, but also in slightly lower accuracy, as subjects would
move their hand outside the bounds of the array, causing the gesture to
terminate early. The overall gesture accuracy without any visual location
confirmation dropped approximately 7% compared with the condition in which
only the virtual reality feedback was removed.
Figure 7.
This figure shows a
portion of the trial where we cover the capacitor sensor array with
a nonconductive material so that the motions can still be captured,
but the user cannot see where their hand is with respect to the
plates.
In order to consider a particular gesture to be trained for a user, the motion
should be consistent and fluid, which is produced only once the instantaneous
feedback is removed. The VR feedback acts to demonstrate the gesture, and allow
the user to properly plan and learn the routes, but is a crutch that needs to be
removed for motor learning to properly occur.
Personalized gesture recognition
In this section, we introduce an adaptation to the CSA recognition algorithms
derived from our observations in the user study. Using personalized training
data for gesture classification based on each user’s own motions, we
attempt to derive a representative gesture set from the user’s own
motions. Using a data-driven approach to generate these templates guarantees
that the gestures fit within the user’s own ability, as they are created
from their own motions.
Gestures as input have become a familiar mode of interfacing with devices. Some
argue that gestures are a more natural user interface while others consider that
gestures, by their ephemeral nature, lead to inherent confusion when
misclassification occurs.[29] A common thread across the
field is that accuracy (reduction of both false-positive and false-negative
classifications) and availability are important factors when considering gesture
interfaces. Accuracy can be increased with sophisticated signal processing, but
that can come at the expense of increased power, which can limit the
availability of systems due to reduced battery life. A compromise exists in the
use of personalization in the gesture recognition software. The CSA system used
in this study uses DTW as a recognition algorithm. DTW is a vector-quantization
classifier that calculates a Euclidean distance of time-warped positional data,
and compares the relative distances against a set of template gestures in the
recognition codebook. Personalization of the gesture set can be implemented
through replacing the codebook with gestures that are indicative of those
performed by the actual user. This personalization is important for gesture
recognition; with more similarity between a candidate gesture and the template
gestures comes either higher accuracy or a more dense set of gestures.
When a user’s consistent motions are not reflected in the templates, the templates should be
corrected. Figure 8 demonstrates this phenomenon, as the largest proportion of a subject’s
motions is not captured correctly by the template gesture.
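The nearest-template classification described above can be sketched as follows. This is a minimal illustration, assuming gestures are lists of (x, y) positions and the codebook is a dictionary from gesture label to template gestures; the names `dtw_distance` and `classify` are hypothetical and this is not the CSA system's actual implementation.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two (x, y) sequences.

    The local cost is the Euclidean distance between matched points.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def classify(candidate, codebook):
    """Return (label, distance) of the closest template in the codebook."""
    best_label, best_dist = None, float("inf")
    for label, templates in codebook.items():
        for template in templates:
            d = dtw_distance(candidate, template)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label, best_dist
```

Under this sketch, personalization amounts to swapping the template lists in `codebook` for gestures recorded from the user, while the classifier itself stays unchanged.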
Figure
8.
This picture demonstrates a heat map of a
particular subject’s gesture “A” compared
against the template gesture “A,” which is
represented by the red line. The inconsistency along the left-hand
side of the gesture represents a consistent error that reduces the
relative accuracy between these gestures, and could potentially
result in an incorrect classification.
To perform this adaptation, we introduce two methods to replace the gesture
recognition codebook. First, we enable the user to create their own gesture set
through a distinct training session. Second, we consider beginning with a
template set, and including the user’s gesture data in the creation of
user-specific templates. Both adaptations are discussed in the following two
subsections.
Targeted training session: The first adaptation considers a specific targeted
“training” session. In this session, a user decides on a set of
gestures which they want to include in the set. The user then performs a number
of these gestures, and a representative set is chosen to be included in the
codebook. This process is manual, requiring a reviewing procedure for selecting
and curating gestures. We have created a toolkit that is usable by occupational
and physical therapists to help a user create a set of gestures. Our
implementation is depicted in Figure 9.
Figure 9.
Training tab for the CSA
system.
Incorporating in situ motions: As an alternative or extension to the training
session, a template set of gestures can be slowly replaced with the
user’s actual motions. This can be done in a number of ways. For one, a
user can provide periodic feedback about the correctness of a classified
gesture, either through verbal or behavioral feedback mechanisms (e.g., whether the
user leaves the home automation action that was performed in place, or
corrects it in a specific manner). Another method is to create a profile of the
average motions that are classified as a particular gesture. Either of these
methods has the additional advantage of allowing the system to evolve with the
user over time. Enabling in situ adaptation is particularly useful for assistive
applications where a user’s mobility profile may change over time,
whether through gaining additional motion through therapy or losing mobility
through the progression of a disease or injury state.
To simulate the creation of a test set, we chose three gestures of each
classification (i.e., three each of the “A”–“E” gestures) from the beginning of each
user’s first trial which had the smallest relative distance to the other
gestures in the trial. These gestures should be similar to a representative set
of gestures selected during a targeted training session. The gestures chosen for
the codebook were removed from the evaluation set, and then the accuracy was
re-evaluated with the new “personalized” gesture set. This
method resulted in the results shown in Table 3.
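The selection step above can be sketched as picking, for each gesture class, the gestures with the smallest summed distance to the others, then holding them out of the evaluation set. This is a sketch under assumptions: the function names are hypothetical, `distance` stands in for whatever gesture distance is used (for example, a DTW distance), and the caller is assumed to pass only the gestures from the beginning of the trial.

```python
def select_representatives(gestures, distance, k=3):
    """Indices of the k gestures with the smallest summed distance to the rest."""
    totals = []
    for i, g in enumerate(gestures):
        total = sum(distance(g, other)
                    for j, other in enumerate(gestures) if j != i)
        totals.append((total, i))
    totals.sort()  # ties are broken by the earlier index
    return [i for _, i in totals[:k]]


def split_codebook(gestures, distance, k=3):
    """Split one class of gestures into (codebook entries, evaluation set)."""
    chosen = set(select_representatives(gestures, distance, k))
    codebook = [g for i, g in enumerate(gestures) if i in chosen]
    evaluation = [g for i, g in enumerate(gestures) if i not in chosen]
    return codebook, evaluation
```

Keeping the chosen gestures out of `evaluation` mirrors the removal of codebook gestures from the evaluation set, so that no gesture is scored against itself.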
Table 3. Accuracy of gesture classification when using a personalized set of gestures.
User   Training and evaluation   “Testing 1”   “Recall 1”   “Recall 2”
1      100% (+5.6%)              97% (+0%)     100%         89%
2      95.8% (+0.8%)             88% (−2%)     92.9%        95%
3      100% (+13.6%)             72% (+5%)     N/A          79%
4      100% (+5.8%)              91% (+1%)     96.6%        94%
5      100% (+7.5%)              95% (+3%)     98.0%        92%
From Table 3, we can
draw two conclusions. First, the training segment showed the largest
accuracy increase when using a personalized gesture set. The gestures performed
during this segment were often different than the rest of the set in both
cadence and motion paths. After removing the instantaneous visual feedback from
the user, their motions began to be more consistent, and the motions more
typically followed the template set as the user properly
“learned” each gesture. Second, Subject 3, who had struggled to
use the system in the first trial, fundamentally changed the way he approached
the system such that the personalized gesture set tailored to him from the
training data of the first day no longer matched the motions he was performing.
This suggests that changes in status that manifest in new motion profiles can
greatly inhibit the consistent use of such a system over time. This last
conclusion points to the need for wearable assistive devices to continue
to learn about the user and adapt their internal classification
algorithms.
In situ gesture personalization
As described above, the ability to adapt to previously classified motions to
continually modify the template gestures can greatly extend the lifetime
usability of assistive wearable gesture recognition. In this section, we propose
and evaluate a method for calculating and inserting these personalized gestures
into the template gesture set.
Figure 10 demonstrates
an important factor in the creation of these personalized evaluation sets. The
histogram shows that the relative distances can be an additional predictor
toward the “correctness” of a given gesture. This knowledge
enables the ability to select specific gestures with high correctness
probabilities to be inserted into the template set without the need for explicit
feedback from the user. However, these correctness values are probabilistic, and
therefore the selection algorithm should provide an extra level of filtering to
remove misclassified gestures that only appear to be correct classifications.
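One way to act on the relative distance is to gate template updates on a distance threshold. The sketch below is an assumption, not the method evaluated in the paper: it sets the threshold at a percentile of the distances of gestures already known to be correct, and keeps only newly classified gestures at or below it. The function names and the choice of percentile are hypothetical.

```python
def percentile(values, q):
    """Percentile of values for q in [0, 100], with linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty list is undefined")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def confident_gestures(classified, threshold):
    """Keep (gesture, label) pairs whose match distance is within threshold.

    classified is a list of (gesture, label, distance) tuples.
    """
    return [(g, label) for g, label, d in classified if d <= threshold]
```

A threshold such as `percentile(correct_distances, 50)` trades coverage for safety: a lower percentile admits fewer gestures into the template set, but those admitted are less likely to be misclassifications.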
Figure
10.
This histogram demonstrates the relative
accuracy as opposed to the binary classification accuracy. Gestures
are cast into bins based on the distance metric calculated by the
dynamic time warping algorithm. Green bins are correctly classified
gestures, and red are incorrectly classified gestures. The dotted
lines are the 25th, 50th (median), and 75th percentile for each
correct and incorrect. This figure demonstrates that knowing the
relative distance can be useful in determining the correctness of a
classified gesture.
A median filter is a fairly simple approach to removing outliers and selecting
the average data from a series. To use a median filter for this application,
there are some modifications that must be made. First, gestures can have
different lengths as discussed above in Figure 6. In order to select the median
set of (x,y) positions in a time series, the inputs must be resampled to vectors
of the same size. Further, gestures may vary in cadence (i.e., the relative
speed of specific motions). This means that points in time along two gestures
may not line up to enable proper use of the median filter. Therefore, the
resampling method should not preserve the timing of the samples, but should instead
preserve the aspect (shape) and position data as closely as possible. A similar method is
discussed in the Wobbrock et al. $1 Recognizer paper.[30] This
resampling method calculates the total length of the composite line segments,
and selects evenly spaced points along the original path.
After resampling a set of gestures, the median filter then calculates the
median x coordinate and median y coordinate at each
point (t) in the time series. This is demonstrated in Figure 11 for the x-coordinates of
Subject 1’s “A” gestures. This procedure is done for
each subject and each gesture to create a personalized gesture recognition
codebook, which can be used for classifying the gesture set. The variability
that exists between these gestures on a subject-by-subject basis can be seen in
Figure 12.
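The resampling and median steps described above can be sketched as follows. This is a minimal illustration in the spirit of the $1 recognizer's path resampling, not the authors' code: gestures are assumed to be lists of (x, y) positions, the function names are hypothetical, and the default of 32 points per gesture is an arbitrary choice.

```python
import math
from statistics import median


def resample(points, n):
    """Resample a path to n points evenly spaced along its arc length."""
    path_len = sum(math.dist(points[i - 1], points[i])
                   for i in range(1, len(points)))
    if path_len == 0 or n < 2:
        return [points[0]] * n
    step = path_len / (n - 1)
    pts = list(points)
    out = [pts[0]]
    acc = 0.0  # distance walked since the last emitted point
    i = 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= step:
            t = (step - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # continue walking from the new point
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:  # rounding can drop the final point
        out.append(pts[-1])
    return out[:n]


def median_gesture(gestures, n=32):
    """Point-wise median of the resampled gestures, as one composite gesture."""
    resampled = [resample(g, n) for g in gestures]
    return [(median(g[k][0] for g in resampled),
             median(g[k][1] for g in resampled))
            for k in range(n)]
```

Because the points are spaced by distance along the path rather than by time, two gestures performed at different cadences still line up index by index before the median is taken.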
Figure
11.
This figure demonstrates a resampling of the
x-coordinates of a particular subject’s “A”
gestures from the first trial. Each black line is a single gesture
performed by the subject. The red line is the median value at each
point i in the time series. The median point at
each sample is collated into a composite gesture, and thus creates
the personalized gesture.
Figure
12.
This figure demonstrates each of the five
median ‘A’ gestures created by the subjects’
gestures during the first trial. These are used to classify gestures
from the second trial as an additional comparison point to static
templates.
To evaluate this gesture set, we compare the classification accuracy between the
template gesture set and the personalized gesture set in Figure 13. The classification was
performed only on Trial 2 gestures, while the personalized gestures were
selected from only Trial 1 gestures. Using the personalized gesture set resulted
in a reduction of 7% (three additional misclassifications) for the
“B” gesture, and a gain of 11% (four additional correct
classifications) in the “E” gesture. While this is a modest gain
in accuracy overall, continuing to adapt to the user’s motions should
result in a higher accuracy over time. This can be shown by the relative error
which can be calculated by the distance metric discussed above. Figure 14 demonstrates
this through two histograms of the relative distances between candidate and
codebook gestures for a personalized and a template set of gestures.
Figure
13.
These two confusion matrices demonstrate the
accuracy change when using the template gesture set (left) and the
personalized gesture set (right) for all
subjects.
Figure
14.
This figure demonstrates the relative
distances between the template gesture set (top) and the
personalized gesture set (bottom). The personalized gesture set has
a much smaller relative distance for classification than the
template gesture set.
Conclusion
This paper evaluates the usability of CSAs as an accessibility device for persons
with upper extremity mobility impairments. In verifying the accuracy of the system,
we establish two insights into the development and training of a wearable
accessibility device. First, personalization is important, both on the part of the
user adapting to the system, as well as the system adapting to the user in a
symbiotic feedback loop to create an accurate, responsive interface. Second,
training users in accessibility devices can be improved through the control of
knowledge of results. By modulating what feedback the user receives during training,
the practitioner can motivate particular behaviors to allow users to feel
comfortable using the system. From the first insight, we proposed and evaluated two
methods of creating a personalized gesture set through study of motions performed by
people with motor disabilities. The first uses a specific targeted
“training” session to generate static personalized gestures
comprised of the individual’s own motions, to ensure that the gesture set
fits in their own mobility profile. The second considers continuous adaptation by
selecting the average correctly classified gesture in a set to add to the codebook,
and slowly replace the template set. This adaptation allows the gesture recognition
algorithm to evolve over time with the user’s mobility
through the use of the individual’s own movements in training. The insights
and adaptations derived in this paper enable this textile accessibility system to be
more available and more accurate to users, and create a more sustainable and
reliable system for people to interact with their environment.