Ryan Anthony J de Belen, Tomasz Bednarz, Arcot Sowmya, Dennis Del Favero.
Abstract
The current state of computer vision methods applied to autism spectrum disorder (ASD) research has not been well established. Increasing evidence suggests that computer vision techniques have a strong impact on autism research. The primary objective of this systematic review is to examine how computer vision analysis has been useful in ASD diagnosis, therapy and autism research in general. A systematic review of publications indexed on PubMed, IEEE Xplore and ACM Digital Library was conducted from 2009 to 2019. Search terms included ['autis*' AND ('computer vision' OR 'behavio* imaging' OR 'behavio* analysis' OR 'affective computing')]. Results are reported according to the PRISMA statement. A total of 94 studies are included in the analysis. Eligible papers are categorised based on the potential biological/behavioural markers quantified in each study. The different computer vision approaches employed in the included papers are then described. Publicly available datasets are also reviewed, in order to rapidly familiarise researchers with datasets applicable to their field and to accelerate both new behavioural and technological work in autism research. Finally, future research directions are outlined. The findings of this review suggest that computer vision analysis is useful for the quantification of behavioural/biological markers, which can in turn lead to more objective analyses in autism research.
Year: 2020 PMID: 32999273 PMCID: PMC7528087 DOI: 10.1038/s41398-020-01015-w
Source DB: PubMed Journal: Transl Psychiatry ISSN: 2158-3188 Impact factor: 6.222
Magnetic resonance imaging (MRI)/functional MRI (fMRI).
| Reference | Focus | Participants | Age | Input data/device used | Method used | Dataset |
|---|---|---|---|---|---|---|
| Samson et al. | fMRI to study the neural bases of complex non-social sound processing | 15 ASD, 13 TD | ASD: 24.3 ± 6.25 TD: 23.5 ± 7.42 | fMRI scans/3 T TRIO MRI system | Image processing/ICBM152 (MNI) space and 3D Gaussian filtering | Own dataset |
| Abdelrahman et al. | MRI for diagnosis | 14 ASD, 28 TD | 7–38 years | MRI scans/1.5 T Signa MRI scanner | Mesh processing | Own dataset |
| Durrleman et al. | MRI for biomarker detection | 51 ASD, 25 TD and developmentally delayed children | 18–35 months | MRI, 1.5 T GE Signa MRI scanner | | |
| Ahmadi et al. | fMRI for biomarker detection | 24 ASD, 27 TD | | MRI scans/3 T MRI scanner | Machine learning, independent component analysis | Own dataset |
| Chaddad et al. | MRI for biomarker detection | 34 ASD, 30 TD | 4–24 years | MRI scans/3 T MRI scanner | Texture analysis | ABIDE I dataset |
| Chaddad et al. | MRI for biomarker detection | 539 ASD, 573 TD | ASD: 17.01 ± 8.36 TD: 17.08 ± 7.72 | MRI scans | Texture analysis | ABIDE I dataset |
| Eslami and Saeed | fMRI for diagnosis | 187 ASD, 183 TD | | fMRI scans | Deep learning, MLP with 2 hidden layers + SVM | Four datasets (NYU, OHSU, USM, UCLA) from ABIDE-I fMRI dataset |
| Li et al. | fMRI for diagnosis | 149 ASD, 161 TD | | rs-fMRI scans | Deep learning/SSAE | Four datasets (UM, UCLA, USM, LEUVEN) from ABIDE MRI dataset |
| Crimi et al. | fMRI for diagnosis | 31 ASD, 23 TD | | Imaging data, GE 3 T MR750 scanner | Machine learning/constrained autoregressive model | San Diego State University cohort of ABIDE II dataset |
| Chanel et al. | fMRI for diagnosis | 15 ASD, 14 TD | ASD: 28.6 ± 1.87 TD: 31.6 ± 2.61 | fMRI/3 T MRI scanner | Machine learning/SVM | Own dataset |
| Zheng et al. | MRI for biomarker detection | 66 ASD, 66 TD | | MRI scans | Multi-feature-based networks (MFN) and SVM | Four datasets (NYU, SBL, KUL, ISMMS) from ABIDE database |
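Several of the fMRI studies in the table above follow the same broad recipe: derive a functional-connectivity feature vector per subject and feed it to a classifier such as an SVM (e.g. Chanel et al., Zheng et al.). The sketch below illustrates that recipe on synthetic data; the cohort size, ROI count and SVM settings are illustrative assumptions, not values from any cited paper.

```python
# Minimal sketch of a connectivity-features + SVM pipeline, assuming
# per-subject ROI time series; all sizes here are synthetic stand-ins.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_subjects, n_rois, n_timepoints = 60, 90, 200   # hypothetical cohort/atlas sizes
labels = np.array([0] * 30 + [1] * 30)           # 0 = TD, 1 = ASD (synthetic)

def connectivity_features(timeseries: np.ndarray) -> np.ndarray:
    """Vectorise the upper triangle of the ROI-ROI correlation matrix."""
    corr = np.corrcoef(timeseries)               # (n_rois, n_rois)
    iu = np.triu_indices_from(corr, k=1)
    return corr[iu]

# Random noise stands in for real per-subject ROI time series.
X = np.stack([
    connectivity_features(rng.standard_normal((n_rois, n_timepoints)))
    for _ in range(n_subjects)
])

clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
scores = cross_val_score(clf, X, labels, cv=5)
print(f"5-fold accuracy: {scores.mean():.2f} (chance ~0.50 on random data)")
```

On real data the features would come from atlas-defined ROI time series (e.g. ABIDE preprocessed releases), and nested cross-validation would be needed for honest accuracy estimates.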
Facial expression/emotion.
| Reference | Focus | Participants | Age | Input data/device used | Method used | Dataset |
|---|---|---|---|---|---|---|
| Leo et al. | Facial expression for quantitative assessment | 17 ASD, 10 TD | 6–13 years | Image sequences | Deep learning | Own dataset |
| Kalantarian et al. | Facial emotion for mobile games | 8 ASD | 6–12 years | Mobile phone | Ensemble classification (AWS + Sighthound + Azure) | Own dataset |
| Kalantarian et al. | Facial expression for quantitative assessment | 8 ASD, 5 TD | ASD: 8.5 ± 1.85 TD: 4.4 ± 0.54 (in years) | Video, mobile phone | Histogram of Oriented Gradients (HOG) + SVM | Own dataset |
| Han et al. | Emotional expression recognition | 25 ASD | | Camera | Deep learning, CNN | |
| Tang et al. | Automatic smile detection | 11 ASD, 23 TD | 6–24 months | Video, two wireless cameras | Deep learning, CNN | GENKI-4K, CelebA |
| Daniels et al. | Emotion recognition for assistive technology | 23 ASD, 20 TD | 6–17 years | Google Glass | | n/a |
| Jazouli et al. | Emotion recognition for assistive technology | 10 ASD | | 3D image, Microsoft Kinect | | Own dataset |
| Washington et al. | Emotion recognition for assistive technology | 14 ASD | 9.57 years [3.37, 4–15] | Video/Google Glass and mobile phone | Machine learning, Histogram of Oriented Gradients (HOG) + SVM | |
| Voss et al. | Emotion recognition for assistive technology | 20 ASD, 20 TD | | Video/Google Glass and mobile phone | Machine learning, Histogram of Oriented Gradients (HOG) + SVM | n/a |
| Vahabzadeh et al. | Emotion recognition for assistive technology | 8 ASD | 11.7–20.5 years | Video, Google Glass | | n/a |
| Leo et al. | Emotion recognition for behaviour monitoring | 3 ASD | | Video, Robokind R25 robot | | |
| Pan et al. | Facial emotion for behaviour analysis | 2 ASD | | Video, NAO robot | | Own dataset |
| Coco et al. | Facial expression analysis for diagnosis | 5 ASD, 5 TD | 65.38 months [15.86, 48–65 months] | Video, webcam | Deep learning, Histogram of Oriented Gradients (HOG) features combined with a linear classifier, CNN | DISFA [24], SEMAINE [26] and BP4D [34] datasets |
| Leo et al. | Facial expression for quantitative assessment | 17 ASD | 6–13 years | Image sequences | Deep learning | Own dataset |
| Samad et al. | 3D facial imaging for physiology-based impairment detection | 8 ASD, 8 TD | 7–20 years | 3D images, high-resolution 3D facial imaging sensor, 3dMD | | n/a |
| Leo et al. | Facial expression recognition for assistive technology | 1 ASD, 1 TD | | Video | Deep learning, Facial Action Coding System (FACS) | Own dataset |
| Guha et al. | Facial expression for quantitative assessment | 20 ASD, 19 TD | 9–14 years | Motion capture data, 6 infra-red motion-capture cameras | Deep learning, Facial Action Coding System (FACS) | Own dataset |
| Ahmed and Goodwin | Facial expression for predicting engagement and learning performance | 7 ASD | 8–19 years | Video, camera | Computer Expression Recognition Toolbox | Own dataset |
| Harrold et al. | Facial expression for assistive technology | 2 ASD, 4 TD | 8–10 years | Video, Apple iPad | | n/a |
| Harrold et al. | Facial expression for assistive technology | 2 ASD, 4 TD | 8–10 years | Video, Apple iPad | | n/a |
| White et al. | Facial emotion expression and recognition | 20 ASD, 20 TD | 9–12 years | 3D data, Microsoft Kinect | | n/a |
| Garcia-Garcia et al. | Facial expression for learning emotional intelligence | 3 ASD | 8–10 years | Video, mobile phone | Affectiva SDK | n/a |
| Jain et al. | Facial expression recognition for assistive technology | 6 ASD | 5–12 years | Video, webcam | | |
| Li et al. | Facial attributes for ASD classification | 49 ASD, 39 TD | | Video, Apple iPad | Deep learning, CNN | Training: AffectNet; evaluation: own dataset |
| Shukla et al. | Facial image analysis for diagnosis | 91 ASD, 1035 NDD, 1126 TD | | Image, camera | Deep learning, CNN | Own dataset |
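Several rows above report a Histogram of Oriented Gradients (HOG) + SVM expression classifier (Kalantarian et al., Washington et al., Voss et al.). Below is a minimal sketch of that pipeline; the image size, HOG parameters and random stand-in "faces" are assumptions, since the papers' exact preprocessing is not given here.

```python
# Minimal HOG + SVM sketch; synthetic noise images stand in for
# cropped grayscale face frames from a real pipeline.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def hog_features(gray_face: np.ndarray) -> np.ndarray:
    # Orientation/cell/block sizes are illustrative defaults, not the papers'.
    return hog(gray_face, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

faces = rng.random((40, 64, 64))     # 40 fake 64x64 grayscale "faces"
y = rng.integers(0, 2, size=40)      # binary expression labels (e.g. smile)

X = np.stack([hog_features(f) for f in faces])
clf = LinearSVC(C=1.0).fit(X[:30], y[:30])
print("held-out accuracy:", clf.score(X[30:], y[30:]))
```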
Eye Gaze Data.
| Reference | Focus | Participants | Age | Input data/device used | Method used | Dataset |
|---|---|---|---|---|---|---|
| Pierce et al. | Biomarker detection | 444 subjects from 6 distinct groups | | Eye tracking data, Tobii T120 eye tracker | | Own dataset |
| Murias et al. | Biomarker detection | 25 ASD | 24–72 months | Eye tracking data, Tobii TX300 eye tracker | | Own dataset |
| Chawarska et al. | Eye movement to determine prodromal symptoms of ASD | 84 ASD | 6 months | Gaze trajectories, SensoMotoric Instruments iView X RED eye-tracking system | | Own dataset |
| Shi et al. | Visual stimuli design consideration | 13 ASD, 20 TD | 4–6 years | Infra-red eye-tracking recording, EyeLink 1000 | | Own dataset |
| Shic et al. | Visual attention preference | 28 ASD, 16 DD, 34 TD | 20 months | Gaze patterns, SMI iView X RED dark-pupil 60 Hz eye-tracking system | | Own dataset |
| Liu et al. | Eye movement for diagnosis | 29 ASD, 58 TD | 4–11 years | Gaze data, Tobii T60 eye tracker | Machine learning, k-nearest neighbours (kNN) | Own dataset |
| Tung et al. | Eye detection | 33 ASD | | Video, camera | | Own dataset |
| Balestra et al. | Eye tracking to study language impairments and text comprehension and production deficits | 1 ASD | 25 years | Eye tracking data, Tobii 1750 eye tracker | | n/a |
| Li et al. | Identification of fixations and saccades | 38 ASD, 179 TD | | Eye-tracking data | Modified DBSCAN algorithm | Own dataset |
| Matthews et al. | Eye gaze analysis for affective state recognition | 19 ASD, 19 TD | ASD: 41.05 ± 32.15 TD: 32.15 ± 9.93 (in years) | Video, Gazepoint GP3 eye tracker | Scanpath trend analysis and arousal sensing and detection of focal attention | n/a |
| Campbell et al. | Gaze pattern for saliency analysis | 15 ASD, 13 TD | 8–43 months | Gaze trajectories, SensoMotoric Instruments iView X RED eye-tracking system | Bayesian model | n/a |
| Syeda et al. | Eye gaze for visual face scanning and emotion analysis | 21 ASD, 21 TD | 5–17 years | Gaze data, Tobii EyeX controller | | Own dataset |
| Chrysouli et al. | Eye gaze analysis for affective state recognition | | | Video, Kinect camera | Deep learning, two-stream CNN | MaTHiSis dataset |
| Liu et al. | Eye movement for diagnosis | Children: 20 ASD, 21 TD; adults: 19 ASD, 22 intellectually disabled (ID), 28 TD | Children: ASD: 7.85 ± 1.59 TD: 7.73 ± 1.51; adults: ASD: 20.84 ± 3.27 ID: 23.59 ± 3.08 TD: 20.61 ± 2.90 | Eye tracking data, Tobii T60 eye tracker | Bag-of-Words (BOW) framework and SVM | |
| Vu et al. | Gaze pattern for diagnosis | 16 ASD, 16 TD | 2–10 years | Gaze data, Tobii EyeX controller | Machine learning, similarity matching + kNN | Own dataset |
| Jiang and Zhao | Visual attention preference for diagnosis | 20 ASD, 19 TD | ASD: 30.8 ± 11.1 TD: 32 ± 10.4 (in years) | Eye tracking data | Deep learning | |
| Higuchi et al. | Gaze direction for behaviour analysis | 2 ASD, 2 TD | | Video, camera | OpenFace toolkit | Own dataset |
| Chong et al. | Eye contact detection for behaviour analysis | 50 ASD, 50 TD | | Videos | Deep learning | Own dataset (subset is from MMDB) |
| Toshniwal et al. | Attention recognition for assistive technology | 10 ASD, 8 NDD | 12–18 years | Video, mobile phone | Android Face Detection API | Own dataset |
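Li et al. in the table above identify fixations and saccades with a modified DBSCAN. The sketch below shows the plain-DBSCAN version of that idea on a synthetic gaze trace: dense clusters of gaze samples become fixations, while sparse saccade samples fall out as noise. The eps/min_samples values and the synthetic trace are assumptions for illustration, not the paper's modification.

```python
# Minimal density-based fixation identification sketch (plain DBSCAN).
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Synthetic gaze trace: three fixations (tight clusters of gaze samples).
centres = [np.array([200, 150]), np.array([600, 400]), np.array([300, 500])]
points = np.vstack([c + rng.normal(scale=5.0, size=(50, 2)) for c in centres])

# Dense groups become fixation clusters; isolated samples get label -1 (noise).
labels = DBSCAN(eps=15.0, min_samples=10).fit_predict(points)

for k in sorted(set(labels) - {-1}):
    centroid = points[labels == k].mean(axis=0)
    print(f"fixation {k}: centroid ≈ ({centroid[0]:.0f}, {centroid[1]:.0f}) px")
```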
Motor control/movement pattern.
| Reference | Focus | Participants | Age | Input data/device used | Method used | Dataset |
|---|---|---|---|---|---|---|
| Dawson et al. | Head movement for digital phenotyping | 22 ASD, 82 TD | 16–31 months | Video, iPad | IntraFace, model-based object pose | Not publicly available |
| Martin et al. | Head movement analysis | 21 ASD, 21 TD | 2.5–6.5 years | Video, camera | ZFace to track pitch, yaw and roll of head movement | Not publicly available |
| Zunino et al. | Grasping actions for diagnosis | 20 ASD, 20 TD | ASD: 9.8 years TD: 9.5 years | Video, Vicon VUE video camera | Deep learning, CNN + LSTM | Publicly available |
| Vyas et al. | Motion pattern for diagnosis | | | Video, mobile phone | R-CNN | From the NODA programme of Behaviour Imaging company |
| Piana et al. | Body movement for emotional training | 10 ASD | Mean age: 9.6 years | Video and motion capture data, Microsoft Kinect v2 | | n/a |
| Bartoli et al. | Movement pattern analysis for game-based therapy | 5 ASD | 10–12 years | Video, Microsoft Xbox 360 Kinect | | n/a |
| Ringland et al. | Movement pattern analysis to support a therapeutic tool | 15 with neurodevelopmental disorder | 10–14 years | Video, Microsoft Kinect | | n/a |
| Magrini et al. | Gesture tracking for music therapy | 4 ASD | 5–7 years | Video, camera | | n/a |
| Dickstein-Fischer and Fischer | Robot-assisted therapy | | | Video, Penguin for Autism Behavioural Interventions (PABI) | | n/a |
| Bekele et al. | Head movement analysis for assistive technology | 6 ASD, 6 TD | ASD: 4.70 ± 0.70 TD: 4.26 ± 1.05 | Video, NAO robot with 2 vertical stereo cameras | Image processing | n/a |
| Dimitrova et al. | Movement analysis for assistive technology | | 7–9 years | Video, webcam | | n/a |
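Zunino et al. in the table above classify grasping-action videos with a CNN + LSTM. Below is a minimal PyTorch sketch of that architecture family: a small per-frame CNN produces features that an LSTM aggregates into a clip-level ASD/TD prediction. All layer sizes and the random input clip are illustrative assumptions, not the published network.

```python
# Minimal CNN + LSTM video-classification sketch.
import torch
import torch.nn as nn

class FrameCNN(nn.Module):
    """Tiny per-frame feature extractor (illustrative, not the paper's CNN)."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, x):                 # x: (batch*time, 3, H, W)
        return self.net(x)

class CNNLSTMClassifier(nn.Module):
    def __init__(self, feat_dim: int = 64, hidden: int = 32):
        super().__init__()
        self.cnn = FrameCNN(feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)   # two classes: ASD vs TD

    def forward(self, clip):               # clip: (batch, time, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, t, -1)
        _, (h_n, _) = self.lstm(feats)     # summarise the frame sequence
        return self.head(h_n[-1])          # logits from the last hidden state

clip = torch.randn(2, 16, 3, 64, 64)       # 2 random 16-frame clips
print(CNNLSTMClassifier()(clip).shape)     # torch.Size([2, 2])
```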
Stereotyped behaviours.
| Reference | Focus | Participants | Age | Input data/device used | Method used | Dataset |
|---|---|---|---|---|---|---|
| Hashemi et al. | Behaviour analysis | 6 ASD, 14 TD | 16–30 months | Video, iPad | IntraFace | Own dataset |
| Hashemi et al. | Sharing interest, visual tracking, and disengagement of attention detection | 3 ASD, 3 TD | 6–15 months | Video, two GoPro HD cameras | Histogram of Oriented Gradients (HOG) and SVM | Own dataset |
| Hashemi et al. | Behaviour analysis | 12 ASD | 5–16 months | GoPro Hero HD | HOG and SVM | Own dataset |
| Bidwell et al. | Behaviour analysis | | 15–30 months | Video, camera and Microsoft Kinect | Omron OKAO Vision Library | Multimodal Dyadic Behaviour (MMDB) dataset |
| Campbell et al. | Atypical orienting and attention behaviours for behavioural observation | 22 ASD, 82 TD or DD | 16–31 months | Tablet device | IntraFace | Own dataset |
| Hashemi et al. | Engagement, name-call responses, and emotional responses | 15 ASD, 18 TD | 16–31 months | Video, iPad | IntraFace | Own dataset |
| Wang et al. | Attention monitoring for diagnosis | 5 ASD, 12 TD | | Video, two RGB cameras | Microsoft SDK | Own dataset |
| Bovery et al. | Attention monitoring for behavioural assessment | 22 ASD, 82 TD | 16–31 months | Video, iPad | IntraFace | Own dataset |
| Rajagopalan and Goecke | Self-stimulatory behaviour detection | | | YouTube videos | Histogram of Dominant Motions (HDM) | Self-Stimulatory Behaviour Dataset, UCF101 and Weizmann datasets |
| Rajagopalan et al. | Self-stimulatory behaviour detection | | | YouTube videos | Space-Time Interest Points (STIP) with Harris3D detectors in a BOW framework | Self-Stimulatory Behaviour Dataset |
| Rajagopalan | Self-stimulatory behaviour detection | | | YouTube videos | Motion trajectories | Self-Stimulatory Behaviour Dataset, UCF101 and Hollywood2 datasets |
| Winoto et al. | Behaviour analysis | 4 ASD, 4 TD | | Microsoft Kinect v2 | | |
| Feil-Seifer and Matarić | Interaction with robots for behaviour analysis | 8 ASD | | Video, camera | Heuristics | Own dataset |
| Moghadas and Moradi | Interaction with robots for diagnosis | 8 ASD, 8 TD | ASD: 2.1–4.1 years TD: 2.11–7.6 years | Video, RobotParrot and two cameras | Kernelised Correlation Filter (KCF), cosine similarity and SVM | Own dataset |
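The self-stimulatory behaviour detectors above (Rajagopalan et al.) use local spatio-temporal descriptors in a bag-of-words (BOW) framework with an SVM: descriptors are quantised against a learned codebook and each video becomes a word histogram. The sketch below shows the BOW skeleton with random descriptors standing in for STIP-based features; the codebook size and train/test split are arbitrary assumptions.

```python
# Minimal bag-of-words video-classification sketch.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def video_descriptors() -> np.ndarray:
    # Stand-in for HOG/HOF descriptors at detected interest points.
    return rng.standard_normal((rng.integers(80, 120), 72))

videos = [video_descriptors() for _ in range(20)]
y = rng.integers(0, 2, size=20)          # 1 = self-stimulatory behaviour

# Learn a visual-word codebook over all local descriptors.
codebook = KMeans(n_clusters=16, n_init=10, random_state=0).fit(np.vstack(videos))

def bow_histogram(desc: np.ndarray) -> np.ndarray:
    """Quantise descriptors and return a normalised word histogram."""
    words = codebook.predict(desc)
    return np.bincount(words, minlength=16) / len(words)

X = np.stack([bow_histogram(v) for v in videos])
clf = SVC(kernel="rbf").fit(X[:15], y[:15])
print("held-out accuracy:", clf.score(X[15:], y[15:]))
```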
Multimodal data.
| Reference | Focus | Participants | Age | Input data/device used | Method used | Dataset |
|---|---|---|---|---|---|---|
| Egger et al. | Emotion and attention analysis | | 16–30 months | Video, mobile phone | IntraFace | BU-3D Facial Expression dataset |
| Rudovic et al. | Autism therapy | 35 ASD | 3–13 years | Synchronised video recordings of facial expressions, head and body movements, pose and gestures, audio recordings, and autonomic physiology | Deep learning, Personalised Perception of Affect Network (PPA-net) | Own dataset: multimodal dataset of children with ASC (MDCA) |
| Chen and Zhao | Attentional and image-viewing preference for diagnosis | Photo-taking: 22 ASD, 23 controls; image-viewing: 20 ASD, 19 controls | | Photo sequence + image and eye fixations | Deep learning, ResNet-50 and LSTM | Own dataset (photos and eye-tracking data) |
| Wang et al. | Mutual gaze and gesture recognition for diagnosis | 2 ASD, 6 TD | Children: mean 25 months; adults: mean 25 years | Image/two Logitech BRIO 4K Pro RGB cameras + Microsoft Kinect | Deep learning, VGG + SSD | Oxford Hand and EgoHands datasets |
| Mazzei et al. | Robotic social therapy | 5 ASD, 15 TD | 6–12 years | | | n/a |
| Coco et al. | Face detection, landmark extraction, gaze estimation, head pose estimation and FER for behaviour analysis | 8 ASD | 47–93 months | Mobile tablet and Zeno R25 robot | Facial landmark detection and tracking: conditional local neural fields | Own dataset |
| Palestra et al. | Head pose, body posture, eye contact and facial expression for robotics treatment of autism | 3 ASD | 8–13 years | Robokind Zeno R25 humanoid robot and a Microsoft Kinect | | Own dataset |
| Dickstein-Fischer et al. | Face recognition, head pose and eye gaze estimation for assistive technology | 5 ASD | 5–8 years | Video, Penguin for Autism Behavioural Intervention (PABI) | Face detection: Histogram of Oriented Gradients (HOG) + linear classifier; face recognition: LBPH; feature extraction: regression trees; head pose estimation: Perspective-n-Point problem | HELEN dataset |
| Mehmood et al. | Analysis of joint attention and imitation accuracy | 6 ASD, 2 TD | 4–10 years | 2 NAO robots, Microsoft Kinect, and EEG | | Own dataset |
| Peters et al. | Behaviour recognition for assistive technology | 2 ASD, 5 NDD | 41–56 years | Two cameras, flow sensor, x-IMU sensor | | Own dataset |
| Rehg et al. | Video, audio, and physiological data for behaviour analysis | 121 total | 15–30 months | Multimodal: cameras, Microsoft Kinect, microphone, Q-sensors | Smile/gaze detection: Omron OKAO Vision Library + SVM | Multimodal Dyadic Behaviour (MMDB) dataset |
| Liu et al. | Video and audio for diagnosis | 22 ASD, 21 TD | 2–3 years | Video and audio, camera | | Own dataset ('Response to Name') |
| Marinoiu et al. | Action and emotion for behaviour analysis | 7 ASD | | RGB + depth/Microsoft Kinect v2 | Deep learning | DE-ENIGMA dataset |
| Schwarzkopf et al. | Study of larger extrastriate population receptive fields in ASD | 15 ASD, 12 TD | 20–48 years | fMRI & eye gaze/3 T TIM Trio scanner & EyeLink 1000 MRI-compatible eye tracker | | Own dataset |
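A recurring pattern in the multimodal table above is late fusion: one model per modality, with the final decision combined from their outputs (e.g. video + audio in Liu et al.). The sketch below shows one common fusion scheme, averaging modality-specific class probabilities; the synthetic features and the choice of logistic regression are assumptions for illustration, not any paper's actual pipeline.

```python
# Minimal late-fusion sketch over two synthetic modalities.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 80
y = rng.integers(0, 2, size=n)

# Hypothetical per-modality feature vectors for the same n sessions,
# with a small label-dependent shift so there is signal to learn.
video_feats = rng.standard_normal((n, 32)) + y[:, None] * 0.4
audio_feats = rng.standard_normal((n, 20)) + y[:, None] * 0.3

train, test = slice(0, 60), slice(60, None)
video_clf = LogisticRegression(max_iter=1000).fit(video_feats[train], y[train])
audio_clf = LogisticRegression(max_iter=1000).fit(audio_feats[train], y[train])

# Late fusion: average the modality-specific posteriors, then pick the argmax.
proba = (video_clf.predict_proba(video_feats[test])
         + audio_clf.predict_proba(audio_feats[test])) / 2
fused_pred = proba.argmax(axis=1)
print("fused accuracy:", (fused_pred == y[test]).mean())
```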