Marco Leo, Giuseppe Massimo Bernava, Pierluigi Carcagnì, Cosimo Distante.
Abstract
Neurodevelopmental disorders (NDD) are impairments of the growth and development of the brain and/or central nervous system. In light of clinical findings on the early diagnosis of NDD, and prompted by recent advances in hardware and software technologies, several researchers have tried to introduce automatic systems to analyse babies' movements, even in cribs. Traditional technologies for automatic baby motion analysis leverage contact sensors. Alternatively, remotely acquired video data (e.g., RGB or depth) can be used, with or without active/passive markers positioned on the body. Markerless approaches are easier to set up and maintain (without any human intervention) and work well with non-collaborative users, making them the most suitable technologies for clinical applications involving children. On the other hand, they require complex computational strategies for extracting knowledge from data, and therefore they strongly depend on advances in computer vision and machine learning, which are among the fastest-expanding areas of research. As a consequence, markerless video-based analysis of movements in children with NDD has also been expanding rapidly but, to the best of our knowledge, no survey paper yet provides a broad overview of how recent scientific developments have impacted it. This paper tries to fill this gap, and it also lists specifically designed data acquisition tools and publicly available datasets. Besides, it gives a glimpse of the most promising techniques in computer vision, machine learning and pattern recognition that could be profitably exploited for children's motion analysis in videos.
Keywords: baby motion analysis; deep learning; early diagnosis; machine learning; neurodevelopmental disorders
Year: 2022 PMID: 35161612 PMCID: PMC8839211 DOI: 10.3390/s22030866
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Introduced taxonomy.
Main properties of publicly available datasets. N stands for Newborns, I for Infants, T for Toddlers, Misc for Miscellaneous, NA for Not Applicable, and U for Unknown.
| Database | Contents | Frame Size | Age | Info | Frames | Labels |
|---|---|---|---|---|---|---|
| BabyPose [ | 16 Videos | 640 × 480 | N | Depth | 16,000 | 12 Body |
| MINI-RGBD [ | 12 Videos | 640 × 480 | I | RGB/D | 12,000 | 25 Body |
| SyRIP [ | Images | Misc | I | RGB | 2000 | 17 Body |
| Dataset [ | 85 YouTube Videos | Misc | I | RGB | NA | 18 Body |
| SSBD [ | 75 YouTube Videos | Misc | NA | RGB | U | Behaviors |
| MMDB [ | 160 Videos | Misc | T | Multimodal | U | ASD Diagnosis |
| Tariq [ | 162 Videos | Misc | T | RGB | U | Behaviors |
| DREAM [ | 3121 Videos | NA | T | Depth | NA | 3D Skeleton, Gaze, ADOS scores |
| 3d-AD [ | 100 Videos | 512 × 424 | T | Depth | U | Behaviors |
Figure 2. Percentages of published papers with respect to the Age Range taxonomy.
Figure 3. Percentages of published papers with respect to the setup taxonomy (home, hospital, etc.).
Figure 4. 1D convolutional neural network architecture exploited in [60] for labelling observed movements as indicative of typically developing infants (Normal) or as potentially of concern to clinicians (Abnormal).
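To make the idea behind this kind of architecture concrete, here is a minimal, hypothetical sketch of a 1D convolution over a movement time series followed by global pooling and a logistic readout. It is not the architecture of [60]; the layer sizes, channel meanings, and all weights are illustrative placeholders.

```python
import numpy as np

def conv1d(x, kernels, bias):
    """Valid 1D convolution with ReLU: x (T, C_in), kernels (C_out, K, C_in)."""
    c_out, k, c_in = kernels.shape
    t_out = x.shape[0] - k + 1
    out = np.empty((t_out, c_out))
    for t in range(t_out):
        window = x[t:t + k]                                  # (K, C_in)
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)                              # ReLU activation

def classify(x, kernels, bias, w, b):
    """Conv -> global average pooling -> logistic score in (0, 1)."""
    h = conv1d(x, kernels, bias).mean(axis=0)                # (C_out,)
    return 1.0 / (1.0 + np.exp(-(h @ w + b)))

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 2))          # 100 time steps, 2 toy motion channels
kernels = rng.standard_normal((4, 5, 2)) * 0.1
score = classify(x, kernels, np.zeros(4), rng.standard_normal(4), 0.0)
# In a trained model, a score above a chosen threshold would be read as Abnormal.
```

A real system would stack several such layers and learn the weights from labelled recordings; the sketch only shows the forward pass.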
Summary of works dealing with movement assessment in newborns.
| Work | Setup | Input | CV/ML Task | Clinical Scope |
|---|---|---|---|---|
| [ | Hospital | Depth | Pose estimation by keypoint recognition | General |
| [ | NICU | Depth | Limb pose by 2 CNNs (detection + regression) | General |
| [ | Hospital | RGB | Deformable part models | General |
| [ | Hospital | Multimodal | Optical flow + audio features + logistic regression | Normal/Abnormal |
| [ | Hospital | RGB | Limb motion description by SVM, RF, LDA | WM vs. PR |
| [ | NA | Synthetic | Histograms + CNN | Normal/Abnormal |
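One row of the table above combines a motion representation with logistic regression. As a hedged illustration of that pattern, the sketch below uses mean absolute frame differencing as a crude stand-in for dense optical-flow magnitude (the cited work uses real optical flow plus audio features), and scores the result with an untrained logistic model; all names and values are illustrative.

```python
import numpy as np

def motion_energy(frames):
    """Mean absolute frame difference per step -- a crude stand-in for
    the magnitude of dense optical flow. frames: (T, H, W) grayscale."""
    diffs = np.abs(np.diff(frames.astype(float), axis=0))   # (T-1, H, W)
    return diffs.mean(axis=(1, 2))                          # one value per transition

def logistic_score(features, w, b):
    """Logistic-regression probability for the 'Abnormal' class."""
    return 1.0 / (1.0 + np.exp(-(features @ w + b)))

rng = np.random.default_rng(1)
frames = rng.integers(0, 256, size=(10, 32, 32))            # 10 toy grayscale frames
energy = motion_energy(frames)                               # (9,) motion profile
feats = np.array([energy.mean(), energy.std(), energy.max()])
p = logistic_score(feats, w=np.zeros(3), b=0.0)             # untrained weights -> 0.5
```

The weights `w` and `b` would be fitted on clinician-labelled clips; with zero weights the model is maximally uncertain.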
Figure 5. Normal or abnormal movement classification by means of VGG for feature extraction and LSTM for temporal modelling, as proposed in [66].
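The CNN-plus-LSTM pattern of Figure 5 can be sketched as a single LSTM cell unrolled over precomputed per-frame features, with the final hidden state fed to a logistic readout. This is a generic, minimal forward pass under assumed dimensions, not the model of [66]; random vectors stand in for VGG features.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(feats, Wx, Wh, b):
    """Run one LSTM layer over per-frame feature vectors (T, D).
    Wx: (D, 4H), Wh: (H, 4H), b: (4H,). Returns the final hidden state (H,)."""
    H = Wh.shape[0]
    h = np.zeros(H)
    c = np.zeros(H)
    for x in feats:
        gates = x @ Wx + h @ Wh + b
        i, f, g, o = np.split(gates, 4)          # input, forget, cell, output gates
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)               # update cell state
        h = o * np.tanh(c)                       # emit hidden state
    return h

rng = np.random.default_rng(2)
feats = rng.standard_normal((30, 8))             # 30 frames; random stand-in for CNN features
D, H = 8, 6
h_T = lstm_forward(feats,
                   rng.standard_normal((D, 4 * H)) * 0.1,
                   rng.standard_normal((H, 4 * H)) * 0.1,
                   np.zeros(4 * H))
prob = sigmoid(h_T @ rng.standard_normal(H))     # normal-vs-abnormal readout
```

In practice the feature extractor and the LSTM are trained jointly end-to-end on labelled clips.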
Summary of works dealing with movement assessment in infants.
| Work | Setup | Input | Method | Classification Goal |
|---|---|---|---|---|
| [ | Home/Hospital | RGB | Motion Feature + Gaussian mixture network | 4 types of mov. (WMs/FMs/PR/CS) |
| [ | Hospital | RGB | Motion + MEMD + HT + Decision Tree | CP risk |
| [ | Hospital | RGB | OpenPose+NN | FMs |
| [ | Treatment Center | RGB | Amount of Motion | Pain Level |
| [ | Home/Hospital | RGB | VGG9+LSTM | FMs |
| [ | Home/Hospital | RGB | OpenPose+LSTM | FMs |
| [ | Home | Smartphone | CIMA-Pose | CP risk |
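Several rows above build on pose keypoints (e.g., OpenPose output) or on a simple "amount of motion" measure. A minimal, assumed formulation of such a measure is the mean displacement of 2D joints between consecutive frames; the joint count and trajectories below are illustrative.

```python
import numpy as np

def amount_of_motion(keypoints):
    """Mean Euclidean displacement of body keypoints between consecutive frames.
    keypoints: (T, J, 2) array of per-frame 2D joint positions (e.g., from OpenPose)."""
    step = np.linalg.norm(np.diff(keypoints, axis=0), axis=2)  # (T-1, J) per-joint displacement
    return step.mean(axis=1)                                   # one motion value per transition

T, J = 50, 17
t = np.linspace(0, 2 * np.pi, T)
still = np.zeros((T, J, 2))                                    # motionless skeleton
moving = np.stack([np.tile(np.sin(t), (J, 1)).T,               # joints tracing a circle
                   np.tile(np.cos(t), (J, 1)).T], axis=2)
```

The resulting per-frame motion profile can then be thresholded, averaged over windows, or fed to a classifier, depending on the clinical target.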
Figure 6. The innovative tool proposed in [78]. It follows a Bag-of-Visual-Words configuration for recognising 4 repetitive actions that are a potential indication of ASD.
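The Bag-of-Visual-Words configuration mentioned in Figure 6 can be sketched in two steps: cluster training descriptors into a visual codebook, then represent each new clip as a normalised histogram of word occurrences. This is a generic BoVW illustration, not the pipeline of [78]; random vectors stand in for local spatio-temporal descriptors.

```python
import numpy as np

def kmeans(data, k, iters=20, seed=0):
    """Tiny k-means to build the visual codebook (centroids of descriptor space)."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((data[:, None] - centroids) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = data[labels == j].mean(axis=0)
    return centroids

def bovw_histogram(descriptors, centroids):
    """Quantise local descriptors against the codebook and return a
    normalised visual-word histogram (the clip-level BoVW representation)."""
    words = np.argmin(((descriptors[:, None] - centroids) ** 2).sum(-1), axis=1)
    hist = np.bincount(words, minlength=len(centroids)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(3)
train_desc = rng.standard_normal((200, 16))   # stand-in for local spatio-temporal descriptors
codebook = kmeans(train_desc, k=8)
hist = bovw_histogram(rng.standard_normal((50, 16)), codebook)
```

The histograms are then passed to an ordinary classifier (e.g., SVM) to decide which repetitive action a clip contains.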
Summary of works dealing with movement assessment in toddlers.
| Work | Setup | Input | Method | Goal |
|---|---|---|---|---|
| [ | Domestic (Tariq dataset) | RGB | Random Forests | Typical/Atypical |
| [ | Rehabilitation Environment | Multiview RGB | Faster R-CNN + LSTM + learnable fusion coefficients | 4 daily actions |
| [ | Domestic | RGB | 2D Mask R-CNN + particle filter + CNN classifier | Atypical/Typical Trajectories |
| [ | Domestic | RGB (from YouTube) | YOLOv3 + HOF + K-means + MLP | 4 repetitive Actions |
| [ | Domestic (SSBD dataset) | RGB | CNN + LSTM | ASD/Typical |
| [ | Domestic (Tariq dataset) | RGB | Various regressors/classifiers | ASD Features Rating |
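One multiview row above mentions "learnable fusion coefficients". A common, assumed realisation is a convex combination of per-view class scores whose weights are the softmax of learnable logits, so the coefficients stay positive and sum to one while receiving gradients during training. The sketch below shows only this fusion step with toy scores; it is not the exact mechanism of the cited work.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def fuse(view_scores, logits):
    """Convex combination of per-view class scores with learnable coefficients.
    view_scores: (V, C) class scores from V camera views; logits: (V,) parameters."""
    w = softmax(logits)          # fusion coefficients: positive, sum to 1
    return w @ view_scores       # fused (C,) score vector

rng = np.random.default_rng(4)
views = rng.standard_normal((3, 4))   # 3 camera views, 4 action classes
logits = np.zeros(3)                  # uniform fusion before any training
fused = fuse(views, logits)
```

With uniform logits the fusion is a plain average; as training pushes one view's logit up, the fused score converges to that view's prediction.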
Recent works on human motion analysis.
| Work | Improved Task | Key Contribution |
|---|---|---|
| [ | Motion Features Extraction | Spatiotemporal self-similarity |
| [ | Motion Features Extraction | MotionSqueeze module |
| [ | Pose Estimation (Keypoint Positioning) | Multi-branch regression |
| [ | Pose Estimation (Keypoint Positioning) | Cascade Transformers |
| [ | Pose Estimation (Keypoint Positioning) | Adversarial algorithms |
| [ | Human Completion | Topological Structure/Memory Bank |
| [ | Skeleton-Based Action Recognition | Memory Attention Networks |
| [ | Action Recognition | Temporal-Spatial pooling block |
| [ | Action Recognition | CNN+Autoencoder+LSTM |
| [ | Action Recognition | Contrastive Learning |
| [ | Action Recognition | Semi-supervised Action Detection |
| [ | Action Classification | Transformers |
| [ | Action Quality Assessment | Contrastive Regression |
| [ | Video Representation | Space-Time Graph |
| [ | Video Representation | Self-supervised learning |
| [ | Temporal Modeling | Two-level Motion Modeling |
| [ | Motion Segment Extraction | Hierarchical Framework |
| [ | Temporal Action Localization | End-to-end anchor-free method |
| [ | Temporal Action Localization | Anchor-Constrained Viterbi |
| [ | Temporal Action Localization | Memory Network |
| [ | Temporal Action Localization | Multi-Label Action Dependency layer |
| [ | Human Object Interaction | Transformer /Cascade detector |
| [ | Human Object Interaction | Graph Networks |
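The first row of this table credits "spatiotemporal self-similarity" as a motion representation. The core idea can be sketched with a frame-to-frame cosine-similarity matrix: the T × T pattern captures the temporal structure of a movement while being invariant to any rotation of the feature space, which is what makes it robust to appearance and viewpoint changes. The features below are random stand-ins; this is an illustration of the principle, not the cited module.

```python
import numpy as np

def self_similarity_matrix(feats):
    """Pairwise cosine-similarity matrix of per-frame features (T, D).
    The resulting T x T pattern encodes motion structure, not appearance."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return f @ f.T

rng = np.random.default_rng(5)
feats = rng.standard_normal((20, 8))
ssm = self_similarity_matrix(feats)

# Invariance check: rotating the feature space leaves the matrix unchanged.
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))   # random orthonormal basis
assert np.allclose(self_similarity_matrix(feats @ Q), ssm)
```

Downstream networks consume such matrices (or learned variants of them) as motion descriptors for action recognition.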