Von Ralph Dane Marquez Herbuela, Tomonori Karita, Yoshiya Furukawa, Yoshinori Wada, Akihiro Toya, Shuichiro Senba, Eiko Onishi, Tatsuo Saeki.
Abstract
Communication interventions have broadened from dialogical meaning-making and assessment approaches to remote-controlled interactive objects. Yet the interpretation of the mostly pre- or protosymbolic, distinctive, and idiosyncratic movements of children with intellectual disabilities (IDs) or profound intellectual and multiple disabilities (PIMD) using computer-based assistive technology (AT), machine learning (ML), and environment data (ED: location, weather indices, and time) remains insufficiently explored. We introduce a novel behavior-inference, computer-based communication-aid AT system structured on an ML framework to interpret the movements of children with PIMD/IDs using ED. To establish a stable system, our study aimed to train, cross-validate (10-fold), test, and compare the classification accuracy of ML classifiers (eXtreme gradient boosting [XGB], support vector machine [SVM], random forest [RF], and neural network [NN]) in classifying 676 movements into 2, 3, or 7 behavior outcome classes using our proposed dataset recalibration (adding ED to the movement datasets) with or without Boruta feature selection (53 child-characteristic, movement, and ED-related features). Natural child-caregiver dyadic interactions observed in 105 single-dyad, video-recorded (30-hour) sessions targeted the caregiver-interpreted facial, body, and limb movements of twenty 8- to 16-year-old children with PIMD/IDs, with ED collected simultaneously by app and sensors. Variances in classification accuracy, and the influences of and interactions among the recalibrated dataset, feature selection, classifier, and class on the pooled classification accuracy rates, were evaluated using three-way ANOVA. Results revealed that the Boruta- and NN-trained dataset in class 2 and the non-Boruta, SVM-trained dataset in class 3 had >76% accuracy rates. Statistically significant effects indicating high classification rates (>60%) were found among movement datasets with ED, without Boruta, in class 3, for SVM, RF, and NN. Similar trends (>69%) were found for the NN- and Boruta-trained movement dataset with ED in class 2, and for the SVM- and RF-trained, non-Boruta movement dataset with ED in class 3. These results support our hypotheses that adding environment data to the movement datasets, selecting important features using Boruta, using the NN, SVM, and RF classifiers, and classifying movements into 2 or 3 behavior outcomes can provide >73.3% accuracy rates, a promising performance for a stable ML-based behavior-inference communication-aid AT system for children with PIMD/IDs.
Year: 2022 PMID: 35771797 PMCID: PMC9246124 DOI: 10.1371/journal.pone.0269472
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
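The core pipeline described in the abstract (10-fold cross-validation of XGB, SVM, RF, and NN on a movement feature matrix) can be illustrated with a minimal scikit-learn/xgboost sketch. The placeholder data, hyperparameters, and variable names below are assumptions for illustration, not the authors' configuration.

```python
# Minimal sketch of the classifier comparison described in the abstract:
# 10-fold cross-validated accuracy for XGB, SVM, RF, and NN.
# X and y are random placeholders (676 movements x 53 features, 2-class outcome);
# hyperparameters are illustrative, not the authors' settings.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(676, 53))
y = rng.integers(0, 2, size=676)

classifiers = {
    "XGB": XGBClassifier(n_estimators=200, eval_metric="logloss"),
    "SVM": SVC(kernel="rbf", C=1.0),
    "RF": RandomForestClassifier(n_estimators=500, random_state=0),
    "NN": MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```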
Fig 1. Experimental set-up showing the location and weather sensors and the videotape recorder (VTR) placed 2 meters from the child with PIMD/IDs and the caregiver.
Fig 2. Data analysis workflow from dataset combination to classification accuracy comparison.
Movement definitions and manifestations of the 7 behavior outcome classes in comparison with the Attuning Theory.
| This study | | | Attuning Theory | |
|---|---|---|---|---|
| Category | Definition | Manifestations (sample extracts) | Category | Definition |
| Calling | Verbal (e.g. greetings, vocalization) or non-verbal behavior (e.g. smile, staring, pointing) aimed at getting the attention of the caregiver or teacher | -moves mouth to say only the "mas" part of "Ohayo gozaimasu"; | Engagement (joint attention) | Engagement of both partners in the dyad may be directed to the same focus. |
| Response | Verbal (e.g. “yes”, “bye-bye”) or non-verbal (e.g. raising a hand, nodding, waving hands, clapping) responses to another’s questions, or signals given to another person | -Pointing or pushing somewhere in the book with the left hand; | Assent | Demonstrates attuned agreement between the dyad: one partner carries out an action or asks a question and the other responds in a clear affirmative manner. |
| Emotions | Mostly non-verbal expressions of feelings such as happiness, pleasure, excitement, fun, anger, worry, or being troubled (e.g. smile, moving or opening the mouth, shaking the head vertically or the body, looking away) | - | Harmony, delight, pleased, pleasure | - |
| Interest | Verbal (e.g. “let me see”, “yes!”, “what’s that?”) or non-verbal behaviors (e.g. pointing, raising hands, standing up, nodding) that hint at interest in an object, person, or action, or in doing an action. | -Opens mouth wide, smiles and bends over. Says "Oh, hi, hi, hi" in a strained voice; | Interest | The communication partner demonstrates an obvious attention and interest in (attuning to) the action that is going on. The attention is focused through the action. The result of the interplay of attention and action is that the attuning level of the partners rises and falls in tandem with the attention displayed to the action. |
| Negative | Verbal (e.g. “no”, “don’t like”, “dislike”, or “end”) or non-verbal actions and vocalizations (e.g. closing the mouth, sticking out the tongue, turning the face away) to express refusal or disagreement. | -refuses to take a spoonful of rice in his mouth. Closes his mouth when a spoon is put close to his mouth; | Pro and negative attuning (Refusal) | In this state, pro attuning coexists with negative attuning between the communication partners. |
| Selecting | Mostly non-verbal actions or gestures (e.g. pointing, tapping, reaching) to express a decision or desire to choose between or among objects. | -Points to a picture book. Says a sound similar to "this"; | - | - |
| Physiological response | Verbal (e.g. saying “rice”, “sleepy”, “thirsty”) and non-verbal (e.g. closing eyes, not opening mouth) vocalizations and actions to express functions or desires relating to normal physical or bodily responses. | -Sleepy; eyelids close. Looks up and does not move; | - | - |
Feature (category) description of the 3 and 2 behavior outcome classes in comparison with the Attuning Theory.
| This study | | Attuning Theory | |
|---|---|---|---|
| Category | Definition | Category | Definition |
| Response | A one-way communication (from the perspective of the caregiver/teacher): a stimulus from the child (movements, gestures, facial expressions, vocalization, or other behavior) may affect or influence the attention of the caregiver or teacher but does not necessarily require an action response from the caregiver/teacher. | Stimulus (Non-action) | A stimulus is an attempt by one partner to encourage an action from another partner. |
| Action | A two-way or mutual communication (from the perspective of the caregiver/teacher) in which the stimulus from the child (movements, gestures, facial expressions, vocalization, or other behavior) affects or influences the attention of the caregiver or teacher, causing a response through action (e.g. attending to the child's needs). | Action | Actions are observable processes of behavioral change in an individual, demonstrated by movement, gestures, facial expression, vocalization, or other behaviors. An action can be dual, carried out by both participants in the dyad; that is, they may work together to achieve it. Dual action arises where one participant carries out part of an action and the other completes it. |
| Response/Action | A stimulus from the child (verbal or non-verbal responses or behavior manifestations through movements, gestures, facial expressions, vocalization, or other behavior) that affects or influences the attention of the caregiver or teacher and may or may not require responding through action. | - | - |
Fig 3. Behavior outcome classes in each outcome level.
Fig 4. Mean classification accuracy rates (%) of each recalibrated dataset combination (a-f), with within- and between-class comparisons using one-way ANOVA (2-tailed) with Bonferroni post hoc test. a = child characteristics with major movement category and environment data (CC+MajC+ED); b = child characteristics and major movement category (CC+MajC); c = child characteristics with minor movement category and environment data (CC+MinC+ED); d = child characteristics and minor movement category (CC+MinC); e = child characteristics with major and minor movement categories and environment data (CC+MajC+MinC+ED); f = child characteristics with major and minor movement categories (CC+MajC+MinC); p < 0.001***, p < 0.01**, p < 0.05*.
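A hedged sketch of the kind of analysis behind Fig 4: one-way ANOVA across the six dataset combinations (a-f), followed by Bonferroni-corrected pairwise comparisons. The accuracy values and group sizes below are random placeholders, not the paper's data.

```python
# Sketch of a one-way ANOVA with Bonferroni-corrected pairwise t-tests,
# as used in Fig 4 to compare the six recalibrated dataset combinations (a-f).
# Accuracy values are random placeholders.
from itertools import combinations
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
groups = {name: rng.normal(65, 3, size=12) for name in "abcdef"}

f_stat, p_val = stats.f_oneway(*groups.values())
print(f"one-way ANOVA: F = {f_stat:.2f}, p = {p_val:.4f}")

pairs = list(combinations(groups, 2))
alpha = 0.05 / len(pairs)  # Bonferroni-adjusted significance threshold
for g1, g2 in pairs:
    t, p = stats.ttest_ind(groups[g1], groups[g2])
    flag = "*" if p < alpha else ""
    print(f"{g1} vs {g2}: t = {t:.2f}, p = {p:.4f} {flag}")
```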
The classification accuracy rate by recalibrated dataset (with and without environment data) in each class.
| Classifier | 2-class (+)ED (+)Bor | 2-class (+)ED (-)Bor | 2-class (-)ED (+)Bor | 2-class (-)ED (-)Bor | 2-class Mean (SD) | 3-class (+)ED (+)Bor | 3-class (+)ED (-)Bor | 3-class (-)ED (+)Bor | 3-class (-)ED (-)Bor | 3-class Mean (SD) | 7-class (+)ED (+)Bor | 7-class (+)ED (-)Bor | 7-class (-)ED (+)Bor | 7-class (-)ED (-)Bor | 7-class Mean (SD) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| XGB | 69.03 | 67.60 | 59.10 | 64.37 | 65.03 (4.39) | 69.27 | 69.63 | 65.77 | 66.23 | 67.73 (2.00) | 43.80 | 46.73 | 39.63 | 43.20 | 43.34 (2.92) |
| SVM | 72.13 | 72.03 | 59.07 | 62.90 | 66.53 (6.57) | 70.67 | 73.50 | 65.43 | 67.17 | 69.19 (3.61) | 46.53 | 45.07 | 38.63 | 43.10 | 43.33 (3.45) |
| RF | 71.63 | 73.33 | 59.53 | 63.37 | 66.97 (6.57) | 70.93 | 73.30 | 65.60 | 66.90 | 69.18 (3.56) | 46.57 | 47.07 | 41.20 | 42.77 | 44.40 (2.88) |
| NN | 74.73 | 72.57 | 61.87 | 66.33 | 68.88 (5.86) | 69.60 | 70.93 | 65.47 | 67.83 | 68.46 (2.34) | 44.47 | 44.27 | 41.33 | 43.07 | 43.29 (1.47) |
| Mean | 71.88 (2.34) | 71.38 (2.57) | 59.89 (1.35) | 64.24 (1.50) | 66.84 (1.59) | 70.12 (0.79) | 71.84 (1.90) | 65.57 (0.17) | 67.03 (0.67) | 68.63 (0.70) | 45.34 (1.42) | 45.79 (1.32) | 40.20 (1.31) | 43.04 (0.17) | 43.60 (0.55) |
Note: (+) ED = dataset with environment data; (-) ED = dataset without environment data; Acc. = accuracy; (+) Bor = with Boruta feature selection; (-) Bor = without Boruta feature selection; XGB = eXtreme Gradient Boosting; SVM = support vector machine; RF = random forest; NN = neural network; SD = standard deviation.
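The "recalibrated dataset" conditions above are combinations of feature blocks. A minimal pandas sketch of how such combinations could be assembled; the block sizes, column names, and values are placeholders drawn from the figure legends, not the actual feature set.

```python
# Sketch of the paper's dataset recalibration: combining child characteristics
# (CC), major (MajC) and minor (MinC) movement categories, with or without
# environment data (ED). Feature counts and values are placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 676  # caregiver-interpreted movements
cc = pd.DataFrame(rng.normal(size=(n, 4)), columns=[f"CC{i}" for i in range(4)])
majc = pd.DataFrame(rng.integers(0, 2, size=(n, 3)),
                    columns=["MajC_EyeMove", "MajC_Voc", "MajC_HandMove"])
minc = pd.DataFrame(rng.integers(0, 2, size=(n, 3)),
                    columns=["MinC_Gaze", "MinC_Point", "MinC_Reach"])
ed = pd.DataFrame(rng.normal(size=(n, 3)), columns=["A9", "A10", "A11"])

datasets = {
    "CC+MajC+ED": [cc, majc, ed],
    "CC+MajC": [cc, majc],
    "CC+MinC+ED": [cc, minc, ed],
    "CC+MinC": [cc, minc],
    "CC+MajC+MinC+ED": [cc, majc, minc, ed],
    "CC+MajC+MinC": [cc, majc, minc],
}
for name, blocks in datasets.items():
    df = pd.concat(blocks, axis=1)  # one row per movement, one block per source
    print(f"{name}: {df.shape[1]} features")
```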
Fig 5. Mean classification accuracy rates (%) comparison by class, inclusion of environment data, feature selection, and classifier.
(+) ED = dataset with environment data; (-) ED = dataset without environment data; (+) Bor = with Boruta feature selection; (-) Bor = without Boruta feature selection; XGB = eXtreme Gradient Boosting; SVM = support vector machine; RF = random forest; NN = neural network.
Fig 6. Variable importance ranking based on the Boruta feature selection method in class 2.
CC = child characteristics; MajC = major movement category; MinC = minor movement category; ED = environment data; Condition = PIMD or IDs; MinC_Gaze = gazing; MinC_ChangeLOS = changing line of sight; MinC_FaceExp = facial expression (other than smile); MinC_Voc = vocalization as minor category; MinC_Point = pointing; MinC_Reach = reaching; MinC_Move = moving; MinC_Appro = approaching; MinC_BodyPartMove = movement of a part of the body; MajC_EyeMove = eye movement; MajC_FaceExp = facial expressions; MajC_Voc = vocalization as major category; MajC_HandMove = hand movements; MajC_BodyMove = body movements; GPS1 = longitude; GPS2 = latitude; iB4 = classroom; iB5 = other iBeacon device; S1 = ultraviolet (UV) range (mW/cm²); S2, S3, S4 = 6-axis (Accel+Geomag) sensor ranges [g]; S5, S6, S7 = 6-axis (Accel+Geomag) sensor resolutions [μT]; S8 = UV resolution [Lx]; S9 = pressure sensor range (hPa); S10 = temperature and humidity sensor range (°C); S11 = temperature and humidity sensor resolution (%RH); A7 = minimum temperature (°C); A8 = maximum temperature (°C); A9 = atmospheric pressure (hPa); A10 = main temperature (°C); A11 = humidity (%); A13 = cloudiness (%); A14 = wind direction (degrees); A15 = wind speed (meters/second); Mo_Feb = February; Mo_Sept = September; Mo_Oct = October; Mo_Nov = November; Mo_Dec = December; ShadowMin = minimum Z-score of a shadow attribute; ShadowMax = maximum Z-score of a shadow attribute; ShadowMean = average Z-score of a shadow attribute.
Fig 7. Variable importance ranking based on the Boruta feature selection method in class 3.
Abbreviations as in Fig 6.
Fig 8. Variable importance ranking based on the Boruta feature selection method in class 7.
Abbreviations as in Fig 6.
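The rankings in Figs 6-8 come from Boruta, which pits each real feature against shuffled "shadow" copies; the ShadowMin/Max/Mean markers in the legend are the minimum, maximum, and mean importance of those copies. A minimal sketch using the boruta_py package with placeholder data:

```python
# Sketch of Boruta feature selection (Figs 6-8): each feature's importance is
# compared against randomly shuffled "shadow" attributes, and features that
# consistently beat the best shadow are confirmed. Data are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy

rng = np.random.default_rng(0)
X = rng.normal(size=(676, 53))    # 53 candidate features
y = rng.integers(0, 3, size=676)  # 3-class behavior outcome

forest = RandomForestClassifier(n_jobs=-1, max_depth=5)
selector = BorutaPy(forest, n_estimators="auto", random_state=0)
selector.fit(X, y)  # BorutaPy expects NumPy arrays, not DataFrames

print("confirmed feature indices:", np.where(selector.support_)[0])
print("tentative feature indices:", np.where(selector.support_weak_)[0])
```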
Classification performance rates of the classifiers in each class.
| Classifier | 2-class Rec. | 2-class Spec. | 2-class Prec. | 2-class F1 | 2-class AUC | 3-class Rec. | 3-class Spec. | 3-class Prec. | 3-class F1 | 3-class AUC | 7-class Rec. | 7-class Spec. | 7-class Prec. | 7-class F1 | 7-class AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| XGB | 69.72 | 59.43 | 68.38 | 68.17 | 66.61 | 55.28 | 80.47 | 67.40 | 66.69 | 72.17 | 39.28 | 89.93 | 45.92 | 44.40 | 71.35 |
| SVM | 71.58 | 60.49 | 69.78 | 69.64 | 67.13 | 57.40 | 81.47 | 67.90 | 67.63 | 73.68 | 38.85 | 89.90 | 45.96 | 43.77 | 66.71 |
| RF | 69.06 | 64.48 | 71.42 | 68.91 | 68.33 | 57.52 | 81.33 | 68.30 | 67.70 | 73.50 | 40.16 | 90.12 | 47.68 | 45.50 | 71.68 |
| NN | 70.09 | 67.43 | 73.36 | 70.40 | 70.32 | 57.54 | 81.22 | 66.87 | 67.42 | 73.60 | 38.66 | 89.92 | 44.87 | 43.33 | 70.02 |
Note: XGB = eXtreme Gradient Boosting; SVM = support vector machine; RF = random forest; NN = neural network; Rec. = recall; Spec. = specificity; Prec. = precision; F1 = F1 score; AUC = area under the ROC curve.
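The recall/specificity/precision/F1/AUC columns above can, in principle, be reproduced from a confusion matrix. A minimal sketch, assuming macro-averaging over classes and synthetic placeholder data (scikit-learn has no built-in multiclass specificity, so it is derived from the confusion matrix):

```python
# Sketch of the metrics in the table above, macro-averaged over classes.
# Model and data are placeholders, not the paper's.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (confusion_matrix, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=676, n_classes=3, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

cm = confusion_matrix(y_te, y_pred)
fp = cm.sum(axis=0) - np.diag(cm)    # false positives per class
tn = cm.sum() - cm.sum(axis=1) - fp  # true negatives per class
specificity = (tn / (tn + fp)).mean()  # macro-averaged specificity

print("recall:     ", recall_score(y_te, y_pred, average="macro"))
print("specificity:", specificity)
print("precision:  ", precision_score(y_te, y_pred, average="macro"))
print("F1:         ", f1_score(y_te, y_pred, average="macro"))
print("AUC (OvR):  ", roc_auc_score(y_te, clf.predict_proba(X_te),
                                    multi_class="ovr"))
```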
Fig 9. Confusion matrices showing the mean classification accuracy rates (%) in binary and multiple outcome classes.
Three-way ANOVA results of the influences of dataset, feature selection, classifier, and class (and their interactions) on the mean classification accuracy rates (%).
| Factors | df | F-value | η² |
|---|---|---|---|
| Dataset | 1 | 351.79*** | 0.79 |
| Feature Selection | 1 | 28.29*** | 0.23 |
| Classifier | 3 | 4.77*** | 0.13 |
| Class | 2 | 2491.16*** | 0.98 |
| Dataset * Feature selection | 1 | 12.96** | 0.12 |
| Dataset * Classifier | 3 | 4.64*** | 0.13 |
| Dataset * Class | 2 | 29.77*** | 0.38 |
| Feature selection * Classifier | 3 | 0.23 | 0.01 |
| Feature selection * Class | 2 | 0.10 | 0.00 |
| Classifier * Class | 6 | 2.88* | 0.15 |
| Dataset * Feature selection * Classifier | 3 | 0.75 | 0.02 |
| Dataset * Feature selection * Class | 2 | 5.20** | 0.10 |
| Dataset * Classifier * Class | 6 | 0.82 | 0.05 |
| Feature selection * Classifier * Class | 6 | 0.82 | 0.05 |
| Dataset * Feature selection * Classifier * Class | 6 | 0.81 | 0.05 |
| Error | 96 | | |
Note: Dataset = recalibrated with child characteristics and major and minor movement categories, with or without environment data (ED); feature selection = with or without the Boruta algorithm; classifier = XGB, SVM, RF, or NN; class = class 2, 3, or 7; η² = partial eta squared; adjustment for multiple comparisons (Bonferroni); p < 0.001***, p < 0.01**, p < 0.05*.
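A hedged statsmodels sketch of the factorial ANOVA summarized above, which crosses the four factors listed in the table (dataset, feature selection, classifier, and class). The long-format accuracy values and the three replicates per cell are placeholders chosen only to reproduce the table's error df of 96.

```python
# Sketch of the factorial ANOVA in the table above: pooled accuracy modeled
# against dataset (+/- ED), feature selection (+/- Boruta), classifier, and
# class, with all interactions. Accuracy values are random placeholders.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
cells = pd.MultiIndex.from_product(
    [["+ED", "-ED"], ["+Bor", "-Bor"], ["XGB", "SVM", "RF", "NN"], ["2", "3", "7"]],
    names=["dataset", "selection", "classifier", "klass"],  # "class" is a Python keyword
).to_frame(index=False)
df = cells.loc[cells.index.repeat(3)].reset_index(drop=True)  # 3 replicates -> error df = 96
df["accuracy"] = rng.normal(60, 5, size=len(df))

model = ols("accuracy ~ C(dataset, Sum) * C(selection, Sum) "
            "* C(classifier, Sum) * C(klass, Sum)", data=df).fit()
print(sm.stats.anova_lm(model, typ=3))  # Type III sums of squares, SPSS-style
```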
Fig 10. Three-way ANOVA results of the differences in the pooled mean classification accuracy rate (%) by dataset, feature selection, classifier, and class.
(+) ED = dataset with environment data; (-) ED = dataset without environment data; (+) Bor = with Boruta feature selection; (-) Bor = without Boruta feature selection; XGB = eXtreme Gradient Boosting; SVM = support vector machine; RF = random forest; NN = neural network.