Literature DB >> 35721830

Intelligent wearable allows out-of-the-lab tracking of developing motor abilities in infants.

Manu Airaksinen1, Anastasia Gallen1, Anna Kivi1,2, Pavithra Vijayakrishnan1, Taru Häyrinen1,2, Elina Ilén3, Okko Räsänen4, Leena M Haataja1,2, Sampsa Vanhatalo1,5.   

Abstract

Background: Early neurodevelopmental care needs better, more effective, and objective solutions for assessing infants' motor abilities. Novel wearable technology opens possibilities for characterizing spontaneous movement behavior. This work seeks to construct and validate a generalizable, scalable, and effective method to measure infants' spontaneous motor abilities across all motor milestones, from lying supine to fluent walking.
Methods: A multi-sensor infant wearable was constructed, and 59 infants (age 5-19 months) were recorded during their spontaneous play. A novel gross motor description scheme was used for human visual classification of postures and movements at a second-level time resolution. A deep learning-based classifier was then trained to mimic human annotations, and aggregated recording-level outputs were used to provide posture- and movement-specific developmental trajectories, which enabled more holistic assessments of motor maturity.
Results: Recordings were technically successful in all infants, and the algorithmic analysis showed human-equivalent-level accuracy in quantifying the observed postures and movements. The aggregated recordings were used to train an algorithm for predicting a novel neurodevelopmental measure, Baba Infant Motor Score (BIMS). This index estimates the maturity of infants' motor abilities, and it correlates very strongly (Pearson's r = 0.89, p < 1e-20) with the chronological age of the infant.
Conclusions: The results show that out-of-hospital assessment of infants' motor ability is possible using a multi-sensor wearable. The algorithmic analysis provides metrics of motility that are transparent, objective, and intuitively interpretable, and that link strongly to infants' age. Such a solution could be automated and scaled to a global extent, holding promise for functional benchmarking in individualized patient care or early intervention trials.
© The Author(s) 2022.

Keywords:  Biomarkers; Paediatric research

Year:  2022        PMID: 35721830      PMCID: PMC9200857          DOI: 10.1038/s43856-022-00131-6

Source DB:  PubMed          Journal:  Commun Med (Lond)        ISSN: 2730-664X


Introduction

Early neurodevelopmental care is globally challenged by a scarcity of objective and scalable solutions available for early neurological assessments[1]. More than one in ten infants require active medical follow-up due to perinatal events or abnormal neurological findings[2]. Only a small minority of these infants will eventually be diagnosed with severe disabilities, such as the severe type of cerebral palsy[3], while a larger portion of infants will develop mild or moderate neurocognitive impairments, such as disorders of communication and attention[4,5]. All of these conditions prompt early therapeutic interventions[6]. However, how to distinguish these infants from the majority, who will show a typical range of neurodevelopmental outcomes despite early concerns, has remained elusive. The early development of an infant’s motor abilities provides an essential framework in the developmental cascade related to language and cognitive abilities[7-12]. This has prompted a wide clinical and research approach to survey or observe how the infant reaches developmental milestones, such as rolling, sitting, or walking[13-15]. Milestone assessments are useful for wide-scale population screening for developmental delays, and they even generalize fairly well across different cultures[13,15]. However, milestone assessment does not quantify the spontaneous motor ability of infants, and it is not sensitive to the wide variability that characterizes natural motor development[16,17]. More fine-grained information can be obtained by trained professionals using standardized neurodevelopmental assessments[14,18-20], which collate empirical sets of clinically observable or testable items, such as side turning or holding a toy; evaluating such items is at least partly subjective.
These test batteries are performed in a controlled environment, such as a doctor’s appointment, which is an unnatural situation from an infant’s perspective, compromising the ecological validity of the assessment. There is hence a demand to develop methods for early neurodevelopmental tracking that are robust to variability in infant physiology, the skills of the assessor, and the testing environment[1,19,21]. This could be solved with an objective measurement of spontaneous behavior at home, the most ecologically valid environment. Recent progress in sensor technology has made it possible to record extended periods of infants’ spontaneous motor ability in out-of-hospital settings[22-24], with quantitation of behavior at an accuracy that compares with human observers[22]. Here, we set an overall goal to construct and validate a generalizable, scalable, and effective method to measure infants’ spontaneous motor ability across all milestone levels of infant motor development, from lying supine to fluent independent walking. In the current study, we present an infant wearable that enables widely scalable out-of-hospital studies, and we record a total of 59 infants during their spontaneous play. We then develop a novel, unified structured scheme for classifying infant postures and movements (hereafter collectively called “motor ability”) for each second of recording and test its accuracy and generalizability across infant age groups and human observers. We train a deep-learning-based classifier to mimic human annotations of infant motor ability, which in turn enables the construction of posture- and movement-specific developmental trajectories. Finally, all the wearable data are combined to train an algorithm to predict a novel neurodevelopmental index, the Baba Infant Motor Score (BIMS), which estimates infants’ maturity of motor ability, to be used in individual tracking of neurodevelopment.

Methods

Study design

A primarily cross-sectional cohort of infants was recruited to develop a methodology for quantifying spontaneous motor ability by using a wearable suit, “MAIJU” (Motor ability Assessment of Infants with a JUmpsuit) (Fig. 1a, b). Parts of the sessions were recorded with a synchronized video to allow visual annotation of postures and movements (Fig. 1c) according to a novel infant motor ability description scheme. A self-supervised learning method was employed to confirm that these motor ability classes are genuinely present in the movement signals. Then, a deep-learning-based automatic classifier was trained to analyze infant posture and movement at a second-by-second level in all wearable recordings. These classifiers were shown to perform at a human-equivalent level, enabling the construction of computational indexes for assessing the maturity of infant motor ability (the BABA Infant Motor Score, BIMS), which was compared to a clinically used assessment scale and parental surveys.
Fig. 1

Overview of the MAIJU wearable, infant cohort, and recording data.

a A 10-month-old subject crawling at home with the MAIJU jumpsuit, equipped with movement sensors in the proximal pockets of each limb. The photograph has been published with informed parental consent. b Summary of the infant cohort (N = 59 infants, N = 64 recordings) recorded in the present study. Bars depict a monthly breakdown of the numbers of infants with MAIJU recordings with vs without synchronized annotated video recordings, as well as the total length of data available for each age. c An example recording in the annotation software showing 20 s of the raw 24-channel data obtained from the four MAIJU sensors, as well as the respective human annotations for postures and movements shown in the bars above the signals, colored according to the motor ability categories shown in Fig. 5a. Note the frequent transitions in posture and movement categories.

Fig. 5

Assessing maturation of infant motor ability with MAIJU.

a Graphs showing the occurrence of each posture (left) and motor ability class (right) as a function of infants’ prematurity-corrected age (N = 60). The black lines denote the interquartile range (IQR) of the age-related occurrence, and the red cross depicts the median age for the occurrence. The measures combine all analyzed 2.3-s time frames of the recording session, and all infants exhibit motor ability in several classes, which show clear developmental trajectories. Note also the clear developmental sequence in the movement categories within each posture. b Scatter plot showing the correlation between infants’ (N = 60) chronological age and the age prediction from the BIMS algorithm. c Dependence of the BIMS estimate on the length of recordings, between 10 and 100 min of data. Data were taken as randomly sampled segments from N = 12 recordings whose length was over 120 min (range 121–150 min). The values on the Y-axis are expressed as the mean absolute error (MAE) in the age prediction, as in (b) (bars show the median, IQR, and the range). Note how the MAE stabilizes with recording lengths over one hour. d Correlation between the BIMS and AIMS scores (purple) compared to the correlation between true age and the AIMS score (green) (N = 28). The result indicates that the BIMS score is biased towards the actual developmental level, as the correlation is significantly higher (cocor tests; p < 0.05; N = 28) than the chronological-age correlation. e Comparison between a parental estimate of the infant’s time spent in various postures and the corresponding MAIJU-derived measures (N = 20).

Participants and recordings

Infants (N = 59) were recruited from the Children’s Hospital, Helsinki University Hospital, Helsinki, Finland, to participate in a larger study that assesses neurodevelopment in low-risk term-born infants (N = 38) as well as infants with mild perinatal asphyxia (N = 10) or prematurity (N = 11). Respectively, the recruitment criteria in these three arms were: no clinical incidents in term-born infants (low-risk, healthy controls), clinical suspicion or diagnosis of mild perinatal asphyxia in term-born infants, and prematurity below 28 weeks of gestational age. For performing MAIJU recordings, we had no exclusion criteria, as the wearable testing and algorithmic development were not expected to be affected by the infant’s clinical condition. All 59 infants were followed up; 55 were found to develop typically, while four developed a neurodevelopmental condition. The recordings from these four infants were used in the training of the motor ability classifier, but they were excluded from the training of the BIMS score, as well as from the analyses of age correlations. While this cohort was primarily cross-sectional, five infants were recorded twice with an interval of 6 to 12 months, yielding a dataset of N = 64 recordings at ages 4.5 to 19.5 months. Corrected age was used for prematurely born infants. The infants were recorded with the MAIJU wearable (see below) at home (N = 40), or they came to a home-like environment (see below) in the BABA center due to logistical convenience (N = 24). The infant was dressed in the MAIJU suit, and the recordings lasted for 18 to 199 minutes (average 67 min), with a total recording time in the cohort corresponding to 71 h 30 min. Out of this time, 29 h (18–74 min per infant, average 43 min) in N = 41 infants were video recorded to allow motor ability annotation for classifier training (Fig. 1b, c).
During the recording, children were allowed and encouraged to move about and play freely, with minimal disturbance from the adults. The recording environment was somewhat variable between infants, which might have affected their behavior on top of the situational variance that is naturally present in spontaneous activity. There may be marked differences between homes in terms of physical layout, furniture, or child-relevant objects such as toys. However, a child’s own home is still the environment best known to the given child; hence, it may be considered ecologically relevant for studying natural behavior. Some infants could not be recorded at home for various reasons (e.g., logistics or parents’ preference), and they came to our research lab, the BABA center (www.babacenter.fi). BABA rooms are relatively large (4 × 4 meters) with a large window for natural lighting, as well as typical household furniture including a table, chairs, a book chest, a carpet, and age-appropriate toys. While this environment is not equal to a child’s own home, our experience has shown that it is natural enough to encourage children in seemingly normal exploratory behavior.

Research governance

The study was carried out in accordance with the Declaration of Helsinki and good clinical practice guidelines. Ethical approval was obtained from the Ethical Committee of Children’s Hospital in Helsinki, the study was approved by the Children’s Hospital, and informed written parental consent was obtained for each infant. The study was an observational methodological development study, and thus not registered as a clinical trial.

Description of the MAIJU wearable

The novel wearable MAIJU (Fig. 1a) was developed for an unobtrusive and comfortable tracking of spontaneous movement. The MAIJU suit was designed and manufactured in different sizes to fit tightly and comfortably on infants throughout the age range of interest. Small pockets with sensor connectors were laminated proximally on each limb to keep the sensors out of the infant’s reach. The garment was designed to tolerate repetitive laundry washing using detergents for synthetic materials. The fabric is akin to those used in swimming suits, i.e., a blend of polyamide and elastane to enable easy movement and a good fit for variable body shapes. The additional characteristics of the fabric include moisture transportation, stain repellency, quick drying, and mechanical stability over multiple use cycles. Prior studies[22] have shown that four sensors, placed proximally on each limb, are enough to provide a reliable estimate of body posture, and that a sampling rate of 52 Hz is sufficient for capturing details relevant to infant-typical movement types. The waterproof sensors (Movesense, Suunto, Finland) record tri-axial linear acceleration (accelerometer; m/s²) and angular velocity (gyroscope; deg/s), streaming the data wirelessly via Bluetooth 4.0 or 5.0 low energy (BLE) to an iOS mobile data logger application (Kaasa Solutions GmbH, Düsseldorf, Germany).

Development of the motor ability description scheme and visual annotations

We developed a phenomenological motor ability description scheme (Fig. 2) for a comprehensive, transparent, and minimally ambiguous annotation of video recordings during the infants’ spontaneous activity. The scheme had to adequately fulfill three constraints: (1) being descriptive of all time periods of independent movement, (2) being capturable by movement sensors, and (3) retaining an interpretable meaning from visual assessment. The resultant scheme recognizes five different postures and four movement levels in a manner that is physically observable with movement sensors and does not require observers’ inferences, such as estimating the child’s intention, which are commonly required by clinical assessment scales[25]. A specific description is given in the supplementary material (Supplementary Tables S1–3). The description scheme was developed through an iterative process[22] with frequent discussions using video examples and test annotations, and comments were invited from external informants to ensure both content and clarity.
Fig. 2

Study design and the infant motor ability description scheme.

a Flowchart depicting the overall study design. Coloring of the classification comparisons between humans (red) and human vs algorithm (blue) correspond to the same colors in section B. b Illustrations of the posture and movement categories identified in our motor ability description scheme. Numbers in each cell depict the proportion of each category within the annotated dataset (black), and the Fleiss’ kappa agreement between human observers (red) or between the algorithmic analysis and human observers (blue) in the classification of 2.3-s signal frames. c Correlation between infant age and the proportions of motor ability types (N = 42) identified from the video recordings by the human observers (individual points; the line indicates a quadratic regression model with 95% confidence intervals; r represents the Pearson’s correlation coefficient). Note a robust age-related decrease in prone posture, increases in standing and fluent movement, as well as the bell-shaped developmentally transient occurrence of the crawling posture.

Each study with a synchronized video recording was annotated by two (N = 9) or three (N = 32) independent human annotators (N = 5 annotators in total), trained for the task and with a background in infant health care or infant research. The inter-rater agreement was measured by the Fleiss’ kappa score computed from the compounded confusion matrices, as well as by confusion matrix-based metrics such as accuracy, recall, and F1 score (see Supplementary Figs. S1, 2). The Fleiss’ kappa was used as the primary performance metric, both for overall multi-class performance and for class-specific classification performance.
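The agreement metric can be illustrated with a minimal sketch of Fleiss' kappa for a frames-by-categories table of annotator counts. This is the standard formulation; the paper computes kappa from compounded confusion matrices, which may differ in detail:

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for a (n_frames x n_categories) table of rating counts.

    counts[i, j] = number of annotators who assigned frame i to category j.
    Assumes every frame was rated by the same number of annotators.
    """
    counts = np.asarray(counts, dtype=float)
    n_frames, _ = counts.shape
    n_raters = counts.sum(axis=1)[0]                    # ratings per frame
    p_j = counts.sum(axis=0) / (n_frames * n_raters)    # category prevalences
    # Per-frame observed agreement, then chance-corrected overall agreement.
    P_i = ((counts ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    P_bar, P_e = P_i.mean(), (p_j ** 2).sum()
    return (P_bar - P_e) / (1 - P_e)

# Perfect agreement among three annotators -> kappa == 1.0
perfect = np.array([[3, 0], [0, 3], [3, 0]])
```

Partial disagreement drives kappa towards (or below) zero, which is why it is a stricter metric than raw accuracy for multi-annotator data.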

Development of the motor ability classifier

The motor ability classifier was trained as an end-to-end convolutional neural network (CNN) with a specialized structure that takes as input pre-processed sensor signals in 2.3-s (120-sample) frames with 50% overlap and outputs frame-by-frame categorical probabilities for posture and movement. The preprocessing of the signals consists of the removal of gyroscope bias, linear interpolation of the received sensor data packets onto a common ideal time-stamp base (with a sampling frequency of 52 Hz), and temporal smoothing with a seven-tap median filter. The structure of the classifier model was similar to the one used in our previous research[22]. It consists of an encoder module, which produces a frame-specific fused representation of the sensor signals, and a classifier module, which models the frame-to-frame temporal structure of the signals and finally produces the classification output. The posture and movement classifiers were trained separately using the same model architecture. The architecture and implementation details are presented in Fig. 3. The entire annotated dataset (N = 41, 29.3 h, 91,449 frames) was used for training the system to be used for the classification of unannotated data, and the annotated data were classified with tenfold cross-validation.
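The preprocessing and framing steps described above can be sketched per channel as follows; the bias handling and channel layout are simplified assumptions, and `sliding_window_view` requires NumPy ≥ 1.20:

```python
import numpy as np

FS = 52            # target sampling rate (Hz)
FRAME = 120        # 120 samples at 52 Hz ~ 2.3 s
HOP = FRAME // 2   # 50% frame overlap

def preprocess(t_raw, x_raw, t_uniform, gyro_bias=0.0, taps=7):
    """Illustrative per-channel preprocessing: bias removal, linear
    interpolation onto an ideal 52 Hz time base, and 7-tap median smoothing.
    (`gyro_bias` as a scalar is a simplification.)"""
    x = np.interp(t_uniform, t_raw, x_raw - gyro_bias)  # resample to ideal grid
    pad = taps // 2
    xp = np.pad(x, pad, mode="edge")                    # edge-pad for the filter
    windows = np.lib.stride_tricks.sliding_window_view(xp, taps)
    return np.median(windows, axis=1)                   # 7-tap median filter

def frame_signal(x, frame=FRAME, hop=HOP):
    """Split a 1-D signal into 50%-overlapping frames of 120 samples."""
    n = 1 + (len(x) - frame) // hop
    return np.stack([x[i * hop : i * hop + frame] for i in range(n)])
```

A 10-second channel at 52 Hz (520 samples) thus yields 7 overlapping frames, each of which the classifier maps to categorical posture and movement probabilities.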
Fig. 3

Block diagram of the deep-learning-based motor ability classifier architecture.

Abbreviations: activation function (act), average (AVG), channels (ch), convolution operation (conv), dilation (dil), filter size (fw), leaky rectified linear unit (lrelu), padding (pad). The encoder module performs frame-level sensor fusion to obtain a 160-dimensional latent expression of the raw accelerometer and gyroscope signals. The classifier module models the frame-to-frame time dynamics of these features and outputs softmax probabilities for each category separately for each of the classification tracks (posture, movement, and carrying). The training was performed with minibatch gradient descent using the ADAM algorithm (batch size 100 consecutive frames, learning rate 10⁻⁴, beta1 = 0.9, beta2 = 0.999, epsilon = 10⁻⁸) with a weighted categorical cross-entropy loss. In the loss function, each frame’s error was weighted with the inverse probability of the target class’s occurrence in the training data to mitigate the effects of unbalanced category distributions within the training data. Sample dropout (p = 0.3), as well as sensor dropout (p = 0.3), was also applied randomly to the input signals during training to ensure the robustness of the trained models. The training was run for 200 epochs, and held-out validation data (20% of training data) was used to select the best-performing model in terms of the unweighted average F1 score. The code for the motor ability classifier was implemented with TensorFlow (v.1.12.0) and Python (v.3.6.9). The code is available on request.

The performance of supervised machine learning classifiers depends on the consistency of the training annotations. Here, we wanted to utilize all available human input, including time instances with a varying agreement between the annotators. The inter-rater ambiguities in the classifier training data were resolved by combining human- and machine-generated labels in a probabilistic fashion using the iterative annotation refinement (IAR) procedure introduced in ref. [22].
In IAR, contested frame annotations (which might suffer from human inconsistency) are weighted with a classifier’s probabilistic decision (which can be thought of as being consistent for all samples) to obtain more consistent ground-truth targets for classifier training.
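One plausible reading of this weighting step can be sketched as soft-target refinement; the exact IAR procedure in ref. [22] may differ in its weighting and iteration scheme:

```python
import numpy as np

def refine_targets(human_votes, clf_probs):
    """Illustrative IAR-style refinement: the annotators' empirical label
    distribution for each frame is re-weighted by the classifier's softmax
    output and renormalized, yielding soft training targets in which
    contested human labels are adjudicated by the (more consistent)
    classifier. (Hypothetical reading of the procedure in ref. [22].)"""
    human = human_votes / human_votes.sum(axis=1, keepdims=True)
    combined = human * clf_probs
    return combined / combined.sum(axis=1, keepdims=True)
```

Unanimous frames are left effectively unchanged, while frames with split votes are nudged towards the class the classifier considers most consistent with the signal.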

Analysis of latent signal structures with self-supervised learning

To obtain a general understanding of the signal structure present in MAIJU recordings, we employed contrastive predictive coding (CPC[26]) to learn a robust latent signal representation based on 42 h of unannotated MAIJU data (Fig. 4a). CPC is a self-supervised machine learning method, which utilizes a learnable encoder model to map the raw signals of an analysis frame at time t into an n-dimensional latent representation Z. In CPC training, the time structure of these latent states is modeled with a recurrent neural network (RNN) model to obtain a time-compounded representation C for each analysis frame, capturing the history of the encoded signal up to that point in time. From C, future states of the encoder latent representations Z are predicted with a simple linear projection. The training is done by applying the InfoNCE loss[26] between the target encoder value Z, the predicted value Ẑ, and a set of contrastive samples Z′, i.e., negative samples drawn from other sections of the recording. As a result of the training, CPC learns an encoder representation that best supports the separation of true future signal states from false potential future signal states, hence capturing structural discriminative properties of the data without any labels. In the present work, the encoder model trained with CPC was identical to the supervised motor ability classifier (Fig. 3). The latent dimension was n = 128, a gated recurrent unit (GRU) was used as the RNN model, a prediction distance of k = 5 frames (~5.8 s) into the future was used, and the InfoNCE loss utilized ten negative samples randomly drawn from the same recording.
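The InfoNCE objective for a single prediction step can be sketched as follows; scores here are plain dot products, whereas ref. [26] uses a log-bilinear scoring model:

```python
import numpy as np

def info_nce(z_pred, z_target, z_negatives):
    """InfoNCE loss for one prediction step: softmax-classify the true
    future latent `z_target` against negative samples, using the
    similarity of each candidate to the prediction `z_pred`."""
    candidates = np.vstack([z_target[None, :], z_negatives])
    scores = candidates @ z_pred                          # similarity scores
    log_probs = scores - np.log(np.sum(np.exp(scores)))   # log-softmax
    return float(-log_probs[0])                           # target at index 0
```

Minimizing this loss pushes the encoder to make true future states more similar to the prediction than any negative sample, which is what yields the label-free discriminative structure visible in the t-SNE plots of Fig. 4a.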
Fig. 4

Classifier development and computational analyses of MAIJU recordings.

a t-SNE plots obtained from self-supervised feature embeddings (CPC) with color codings for posture (top) and movement (bottom). Note the clear clustering of posture categories, while the movement categories show relatively more dispersion. b Confusion matrices showing recall values (in %) of the algorithm output (“Predicted class”) and the compounded human expert annotations (“Target class”). Note the high numbers in the diagonal line indicating high agreement. c Comparison of quantified motor ability between the classifier and human annotations (N = 42). In the upper graphs, the scatter plots show the proportion of time spent in the given postures or movements as estimated by the classifier algorithm (Y-axis) and the human annotations (X-axis). The Pearson’s r (and its p value) denotes the linear correlation between the proportion values. Below, the Bland–Altman plots of annotations vs classification errors are shown for assessing whether the classification errors have a systematic bias and/or are dependent on the amount of posture/movement identified by the algorithm. The stippled lines depict an average one-month developmental change (percentage points per month) as taken from a linear regression model fitted between the age (in months) and the given motor ability occurrence (cf. Fig. 2c). Note that 100 and 88% of the measurements in posture and fluent movement categories, respectively, are within these stippled lines. The shaded zone depicts the 95% confidence interval (in percentage points) of the classifier error. The t value depicts the two-tailed t-test result (with 40 degrees of freedom) on the null hypothesis that the error has a mean of zero; this shows that the proportional estimates are unbiased.


Development of the carrying detection classifier

Since MAIJU is primarily a wearable method for out-of-hospital recordings, it was essential to minimize the need for active parental input during the recording. The at-home recordings were instructed to contain a designated “playtime” of at least an hour, during which the parents were encouraged to let the infants play independently as much as possible. Since the parents would still be allowed to guide or possibly carry the infant during such playtimes, we found it important to build an additional layer of preprocessing that would automatically detect periods of independent infant movement versus movements due to external forces, such as parental carrying. To this end, we annotated the data and trained an additional frame-level binary classifier for active carrying detection (ACD) to be run at the preprocessing stage before motor ability classification. The ACD dataset consists of a subset of 17 videoed recordings from the full dataset that were performed at infants’ homes (total length of 17 h). The annotations for the ACD task were performed with a scheme of five categories: independent movement (i.e., an infant has no contact with anyone), passive support (e.g., infant sits and leans on the parent), active support (e.g., parent supports walking by holding hands), passive carrying (i.e., an infant is being held but no movement is present), and active carrying (i.e., an infant is moved by carrying) (see Supplementary Figs. S3, 4). The deep-learning classifier structure was identical to the motor ability classifiers (cf. Fig. 3). The most reliable detection performance (leave-one-subject-out a.k.a. LOSO cross-validation) was achieved with binary classification for active carrying (97.2% accuracy; 54.5% recall, 58.1% precision for carrying; 98.7% recall, 98.5% precision for non-carrying), which means that roughly half of the frames with carrying can be automatically filtered out from further analysis at the expense of only very few false detections. 
The trained (and, where applicable, cross-validated) ACD classifier was applied as a preprocessing step in all analyses of MAIJU recording distributions (Supplementary Fig. S5), which means that these results are obtained under a realistic use-case scenario.
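The binary ACD metrics reported above (overall accuracy plus per-class recall and precision) follow directly from a 2×2 confusion matrix. A minimal sketch with illustrative frame counts (hypothetical numbers, not the study's actual data, chosen only to land in the same ballpark as the reported figures):

```python
import numpy as np

def binary_metrics(cm):
    """cm: 2x2 confusion matrix, rows = true class, cols = predicted class.
    Index 0 = non-carrying, index 1 = carrying."""
    cm = np.asarray(cm, dtype=float)
    accuracy = np.trace(cm) / cm.sum()
    recall = np.diag(cm) / cm.sum(axis=1)     # per-class recall (row-wise)
    precision = np.diag(cm) / cm.sum(axis=0)  # per-class precision (column-wise)
    return accuracy, recall, precision

# Illustrative counts only (hypothetical):
cm = [[9850, 150],   # true non-carrying: 9850 kept correctly, 150 false alarms
      [ 180, 220]]   # true carrying:     180 missed,          220 detected
acc, rec, prec = binary_metrics(cm)
```

With these hypothetical counts, a carrying recall near 0.55 illustrates the paper's point: about half of the carrying frames are filtered out, while the non-carrying class retains near-perfect recall and precision.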

Performance assessment of the motor ability detection algorithm

The performance of the motor ability classifier was tested at multiple levels. At the lowest level, the performance of frame-to-frame classification was measured with a compounded confusion matrix: recording-level tenfold cross-validation was used to produce test-set predictions for the left-out recordings. As in the inter-rater agreement analysis, the predictions from all folds were compounded against all of the original human annotations into a single (posture- or movement-specific) confusion matrix (Fig. 4b and Supplementary Fig. S6). Additionally, the compounded confusion matrices against the IAR-derived training targets are presented in Supplementary Fig. S7. The most practically relevant performance measure is the accuracy of the recording-level motor ability distributions, which provide the primary output used in subsequent analyses. Notably, if the errors in short-term signal-frame classification are unbiased, they average out over sufficiently long recordings. Hence, recording-level distributions combined with recording-length analysis are the most informative for estimating the overall feasibility of the method. We measured this performance with a two-stage analysis for each annotated category (Fig. 4c; full set in Supplementary Figs. S8, S9): first, by measuring the correlation between the annotated and classifier-produced category distributions (Pearson’s r and its p value), and second, by Bland–Altman analysis of the classifier error against the annotated distributions. A two-tailed t-test was used to test the null hypothesis (at p < 0.05) that the errors have a mean of zero. To add further context to the Bland–Altman analysis, we estimated the standard deviation of the error (±2 SD error area, colored) and also compared the error to the monthly age-related change Δ (in percentage points per month of age; drawn with dashed lines).
The Δ values were computed as the slope of a least-squares linear regression model fitted to the [age, distribution value] scatter pairs from the annotated dataset. Notably, this representation is informative only for categories with monotonic age-dependent distributions (e.g., standing and fluent movement).
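The two-stage analysis and the Δ slope can be sketched as follows, using synthetic data for a single hypothetical category (e.g., percentage of time standing); variable names are illustrative, not the study's code:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ages = rng.uniform(5, 19, size=60)                  # infant ages in months
annotated = 5.0 * ages + rng.normal(0, 5, 60)       # synthetic annotated % values
predicted = annotated + rng.normal(0, 3, 60)        # classifier output with noise

# Stage 1: agreement between annotated and classifier-produced distributions
r, p = stats.pearsonr(annotated, predicted)

# Stage 2: Bland-Altman statistics of the classifier error
error = predicted - annotated
t_stat, p_bias = stats.ttest_1samp(error, 0.0)      # H0: mean error is zero
limits = (error.mean() - 2 * error.std(ddof=1),     # +/- 2 SD error area
          error.mean() + 2 * error.std(ddof=1))

# Age-related change Δ: least-squares slope (percentage points per month)
delta = np.polyfit(ages, annotated, 1)[0]
```

Here the synthetic Δ is close to 5 percentage points per month by construction; in the real analysis the slope is estimated per category from the annotated dataset.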

Development of the BABA infant motor score (BIMS) metric

The BABA infant motor score (BIMS) predictor was designed to estimate the relative maturity of an infant’s motor ability, which for typically developing infants reflects the most likely age of the infant given the classifier-produced category distributions of MAIJU recordings (see Supplementary Fig. S10): the posture and posture-conditional movement distributions. In the classifier, age-dependent multivariate Gaussian models of the MAIJU motor ability distributions were estimated from the dataset of typically developing infants (N = 55 infants; N = 60 recordings) at an age resolution of 1 month, including recordings within ±1 month of the center age in the estimation process. After estimation, the BIMS of a new, unseen recording was computed by first evaluating the multivariate Gaussian likelihood of each age-dependent model given the motor ability distributions of the present recording, and then calculating the weighted average of the ages corresponding to the Gaussian models, using the relative likelihoods of the models as weights. The method was evaluated with LOSO cross-validation to produce age estimates for all held-out subjects, after which the Pearson correlation between the target and predicted ages was used as the performance metric. Due to the limited availability of data for all age groups, diagonal covariance matrices were used in the multivariate Gaussian models. At least three recordings were used to determine the means and standard deviations of each age bin, and the standard deviations were floored at 10⁻⁴ to ensure model stability. If fewer than three recordings fell within the ±1-month range of the center age, recordings with the smallest age difference to the center bin were added to the group until three recordings were obtained. The modeled age range was 4 to 16 months, where the 16-month age pool included all recorded children over 16 months of age.
This was motivated by the fact (see also Fig. 4a) that infant motor ability in our description scheme saturates at around this age[27,28], analogous to the well-known ceiling effect of AIMS values after 18 months of age[27]. Likewise, the target age in the BIMS prediction evaluation for children over 16 months was set to 16, and the oldest age group was labeled “16+” in Fig. 4b. Similar logic would also apply to infants younger than 4 months (not present in the dataset), which makes BIMS a bounded scale of continuous values (4 to 16 months) normalized to a [0–100] scale with BIMS = (predicted_age_in_months − 4) × 100/(16 − 4), where 0 denotes “non-mature” motor ability (as in the ≤4-month-olds’ group) and 100 denotes “fully matured” motor ability (as in the ≥16-month-olds’ group). We tested the robustness of the BIMS estimates with respect to the length of the recording time from which the MAIJU distributions are computed. To this end, we systematically measured the mean absolute error (MAE) of the BIMS classifier (with LOSO cross-validation) on recordings of varying lengths. The classifier was trained with the full dataset (as in the main BIMS experiment), but for testing we used only the recordings with a usable length of over 120 min (N = 12) to ensure the underlying uniformity of the test data. From these recordings, we sampled subsegments (with random start times) ranging from 10 to 100 min in 10-min steps and computed the BIMS age from each segment’s distribution. The segment sampling was repeated for 1000 iterations, and finally the MAE between the BIMS scores and the true ages was computed. The MAE variability as a function of recording segment length is visualized with a boxplot showing the median, interquartile range, and range of the data distributions.
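The BIMS computation described above (diagonal-covariance Gaussian per age bin, likelihood-weighted mean age, [0–100] normalization) can be sketched as follows. The data are synthetic and the helper names are illustrative; the study's own implementation was in MATLAB:

```python
import numpy as np

def fit_age_models(features, ages, centers, halfwidth=1.0, min_n=3):
    """Fit a diagonal-covariance Gaussian (per-feature mean and std) per age bin.
    Bins with fewer than min_n recordings are widened to the nearest recordings."""
    models = {}
    for c in centers:
        dist = np.abs(ages - c)
        idx = np.argsort(dist)
        sel = idx[dist[idx] <= halfwidth]
        if len(sel) < min_n:
            sel = idx[:min_n]                 # add nearest-age recordings
        mu = features[sel].mean(axis=0)
        sd = np.maximum(features[sel].std(axis=0), 1e-4)  # stability floor
        models[c] = (mu, sd)
    return models

def bims(x, models, lo=4.0, hi=16.0):
    """Likelihood-weighted mean age over the bin models, scaled to 0-100."""
    keys = sorted(models)
    loglik = np.array([
        -0.5 * np.sum(((x - mu) / sd) ** 2 + np.log(2 * np.pi * sd ** 2))
        for mu, sd in (models[k] for k in keys)
    ])
    w = np.exp(loglik - loglik.max())         # relative likelihoods as weights
    w /= w.sum()
    age = float(np.dot(w, np.array(keys, dtype=float)))
    return (age - lo) * 100.0 / (hi - lo)

# Synthetic demo: one "distribution feature" that grows linearly with age
rng = np.random.default_rng(1)
ages = rng.uniform(4, 16, 80)
feats = (ages * 5 + rng.normal(0, 2, 80)).reshape(-1, 1)
models = fit_age_models(feats, ages, centers=range(4, 17))
score = bims(np.array([50.0]), models)        # feature value typical of ~10 months
```

Because the weighted age is bounded by the bin centers (4 to 16 months), the normalized score is guaranteed to stay within [0, 100], matching the bounded-scale design described above.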

Comparison of the algorithmic output to clinical development

We devised a visualization approach (Fig. 5a) to provide an intuitive and easy-to-interpret picture of the MAIJU-classifier-derived distributions that capture developmental change in infant motor ability as a function of age. In the visualization, the age-pooled averages of the category distribution values (pooled as in the BIMS classifier: center bins 1 month apart, each including recordings within ±1 month) are plotted with a violin plot to highlight their deviation from zero. For visual clarity, the left/right categories have been fused for both the movement and posture tracks, and for the movement track only a selection of posture-dependent movement categories is shown, highlighting the development of the typical movement modalities: prone crawling, crawling, and walking.

Assessing maturation of infant motor ability with MAIJU.

a Graphs showing the occurrence of each posture (left) and motor ability class (right) as a function of infants’ prematurity-corrected age (N = 60). The black lines denote the interquartile range (IQR) of the age-related occurrence, and the red cross depicts the median age of the occurrence. The measures combine all analyzed 2.3-s time frames of the recording session; all infants exhibit motor ability in several classes, which show clear developmental trajectories. Note also the clear developmental sequence in the movement categories within each posture. b Scatter plot showing the correlation between infants’ (N = 60) chronological age and the age prediction from the BIMS algorithm. c Dependence of the BIMS estimate on recording length, for 10 to 100 min of data. Data were taken as randomly sampled segments from N = 12 recordings whose length exceeded 120 min (range 121–150 min). The Y-axis shows the mean absolute error (MAE) in the age prediction as in b (bars show the median, IQR, and range). Note how the MAE stabilizes for recording lengths over one hour. d Correlation between the BIMS and AIMS scores (purple) compared to the correlation between true age and AIMS score (green) (N = 28). The result indicates that the BIMS score is biased towards the actual developmental level, as its correlation is significantly higher (cocor tests; p < 0.05; N = 28) than the chronological-age correlation. e Comparison between parental estimates of the infant’s time spent in various postures and the corresponding MAIJU-derived measures (N = 20).

A subset (N = 28, age range 8–17 months) of the recorded dataset was clinically evaluated by an experienced physiotherapist (T.H.) according to the Alberta Infant Motor Scale (AIMS[27]) on the same day as the MAIJU recordings. The Pearson correlations (rAGE, rBIMS) between the raw (not age-adjusted) AIMS score and infant age (chronological age and BIMS, respectively) were measured.
To test the hypothesis that the BIMS classifier corrects infants’ ages in the direction of their motor developmental level, we used the two-tailed comparing-correlations (cocor[29]) test between rAGE and rBIMS. Finally, another subset (N = 20) of the infant cohort completed an additional parental survey to evaluate the parents’ assessment of the amount of time the child typically spends in different postures. We used a larger custom-made questionnaire to assess many aspects of the project, including MAIJU design, infant development, and parents’ perceptions. The questionnaire was delivered on paper, and the parents/caregivers were asked to fill it in at the time of the MAIJU recording. For the present study, we chose two questions to compare with the MAIJU outputs: estimates of the average amount of the infant’s free play time spent in (1) the crawl posture and (2) the sitting posture. The answers were given on a verbally anchored scale (in Finnish) ranging from 1 to 9 (never, very rarely, rarely, sometimes, often, very often, most of the time). The survey answers were compared to the MAIJU-derived posture distribution values using Spearman’s rho (Fig. 4e) to estimate the reliability of such quantitative assessment. As the MAIJU-derived posture distributions can be assumed to be very close to the ground truth for the given recording session, discrepancies between the recorded distributions and the survey answers can be attributed mainly to two sources: (1) the normal day-to-day variability of infant movements, and (2) the estimation error of the parents. Our setup does not allow differentiating the relative sizes of these effects, but valuable intuitive insights can be gained by comparing the results across multiple posture categories.
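The survey-versus-recording comparison amounts to a rank correlation between ordinal parental ratings and continuous posture fractions. A minimal sketch with hypothetical data (the ratings, fractions, and their relationship are invented for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical example: parental ratings (ordinal scale) vs the fraction of
# recording time MAIJU attributed to the sitting posture (N = 20 infants).
rng = np.random.default_rng(2)
sitting_fraction = rng.uniform(0.05, 0.6, 20)          # MAIJU-derived fractions
ratings = np.clip(
    np.round(1 + 8 * sitting_fraction / 0.6 + rng.normal(0, 1, 20)),
    1, 9,                                              # clamp to the 1-9 scale
)

# Spearman's rho handles the ordinal ratings and any monotone (not necessarily
# linear) relation between the two measures.
rho, p = stats.spearmanr(ratings, sitting_fraction)
```

Spearman's rho is the appropriate choice here because the parental answers are ordinal: only their ordering, not their spacing, is meaningful.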

Statistics and reproducibility

Data preprocessing and analysis were performed using custom MATLAB code (version R2021a). The analysis code is publicly available at Zenodo[30]. The raw figure data have been made available in Supplementary Data S1. Due to the large class-frequency imbalance expected in the recordings (e.g., older infants rarely crawl, whereas younger infants do not stand), classification performance and inter-rater agreement were measured using a compounded confusion matrix: each individual recording’s (posture- or movement-specific) confusions were summed into a total confusion matrix over all data, from which the relevant statistics (kappa, F1, accuracy) were computed. Standard tenfold cross-validation was used for classifier evaluation, where the individual recordings were split into ten equal-sized groups (folds) at the recording (participant) level; nine folds were used to train a classifier, which was then tested on the remaining unseen fold, and the process was repeated for each possible test fold. Within the training data, 80% of the frames were used for classifier training and 20% for validation. Training was stopped when the classifier had reached maximum performance on the unseen validation data, as measured by the unweighted average F1 score. Correlations were computed using Pearson’s r, with p values computed under the two-tailed null hypothesis that the correlation is zero, except for the parental questionnaire data, where Spearman’s rho was applied with the same two-tailed null hypothesis. In the Bland–Altman analysis, a two-tailed t-test was used to test the null hypothesis (at p < 0.05) that the errors have a mean of zero. The comparing-correlations (cocor[29]) test battery was used to study the statistical significance of differences between correlation values with the following settings: two dependent groups, overlapping correlations, null hypothesis r.jk = r.jh, alternative hypothesis r.jk ≠ r.jh, alpha level 0.05.
The cocor test includes the following ten sub-tests: (1) Pearson and Filon’s z, (2) Hotelling’s t, (3) Williams’ t, (4) Olkin’s z, (5) Dunn and Clark’s z, (6) Hendrickson, Stanley, and Hills’ modification of Williams’ t, (7) Steiger’s modification of Dunn and Clark’s z using average correlations, (8) Meng, Rosenthal, and Rubin’s z, (9) Hittner, May, and Silver’s modification of Dunn and Clark’s z using a backtransformed average Fisher’s Z procedure, and (10) Zou’s confidence interval. No outlier or other exclusion criteria were applied to the data, as we assumed all data to be representative of data captured in and outside the lab.
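The compounded confusion matrix procedure described above can be sketched in Python (the per-recording counts below are illustrative, not the study's data, and the original analysis used MATLAB):

```python
import numpy as np

def compound_stats(per_recording_cms):
    """Sum per-recording confusion matrices (rows = true, cols = predicted)
    into one total matrix, then compute accuracy, Cohen's kappa, and the
    unweighted mean of the per-class F1 scores."""
    cm = np.sum(per_recording_cms, axis=0).astype(float)
    n = cm.sum()
    po = np.trace(cm) / n                                # observed agreement
    pe = np.dot(cm.sum(axis=1), cm.sum(axis=0)) / n**2   # chance agreement
    kappa = (po - pe) / (1 - pe)
    tp = np.diag(cm)
    f1 = 2 * tp / (cm.sum(axis=0) + cm.sum(axis=1))      # 2TP / (2TP + FP + FN)
    return po, kappa, np.nanmean(f1)

# Two illustrative 3-class recordings with very different class balances:
cms = [np.array([[40, 2, 0], [3, 10, 1], [0, 0, 4]]),
       np.array([[ 5, 1, 0], [2, 30, 3], [0, 2, 17]])]
acc, kappa, mean_f1 = compound_stats(cms)
```

Summing counts before computing the statistics weights each frame equally across recordings, which is what makes the approach robust to the per-recording class imbalance noted above.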

Sample size calculation

This study was observational rather than interventional; hence, no sample size calculation was performed.
References (74 in total; the first nine are shown below)

1. van Wassenaer-Leemhuis A. Parental report of early cognitive development: benefits, and next steps. Lancet Child Adolesc Health (2019).

2. Touwen BC. Variability and stereotypy of spontaneous motility as a predictor of neurological development of preterm infants. Dev Med Child Neurol (1990).

3. Franchak JM. Changing Opportunities for Learning in Everyday Life: Infant Body Position Over the First Year. Infancy (2018).

4. Ryll UC, Wagenaar N, Verhage CH, Blennow M, de Vries LS, Eliasson AC. Early prediction of unilateral cerebral palsy in infants with asymmetric perinatal brain injury - Model development and internal validation. Eur J Paediatr Neurol (2019).

5. Rosenberg SA, Zhang D, Robinson CC. Prevalence of developmental delays and participation in early intervention services for young children. Pediatrics (2008).

6. Adolph KE, Cole WG, Komati M, Garciaguirre JS, Badaly D, Lingeman JM, Chan GLY, Sotsky RB. How do you learn to walk? Thousands of steps and dozens of falls per day. Psychol Sci (2012).

7. Banville H, Chehab O, Hyvärinen A, Engemann DA, Gramfort A. Uncovering the structure of clinical EEG signals with self-supervised learning. J Neural Eng (2021).

8. Diedenhofen B, Musch J. cocor: a comprehensive solution for the statistical comparison of correlations. PLoS One (2015).

9. Futoma J, Simons M, Panch T, Doshi-Velez F, Celi LA. The myth of generalisability in clinical research and machine learning in health care. Lancet Digit Health (2020).
