Marianne van der Vaart1, Caroline Hartley1, Luke Baxter1, Gabriela Schmidt Mellado1, Foteini Andritsou1, Maria M Cobo1,2, Ria Evans Fry1, Eleri Adams3, Sean Fitzgibbon4, Rebeccah Slater1. 1. Department of Paediatrics, University of Oxford, Oxford OX3 9DU, UK. 2. Colegio de Ciencias Biologicas y Ambientales, Universidad San Francisco de Quito USFQ, Quito EC170901, Ecuador. 3. Newborn Care Unit, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford OX3 9DU, UK. 4. Wellcome Centre for Integrative Neuroimaging, FMRIB, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX3 9DU, UK.
Abstract
Pain assessment in preterm infants is challenging as behavioral, autonomic, and neurophysiological measures of pain are reported to be less sensitive and specific than in term infants. Understanding the pattern of preterm infants' noxious-evoked responses is vital to improve pain assessment in this group. This study investigated the discriminability and development of multimodal noxious-evoked responses in infants aged 28-40 weeks postmenstrual age. A classifier was trained to discriminate responses to a noxious heel lance from a nonnoxious control in 47 infants, using measures of facial expression, brain activity, heart rate, and limb withdrawal, and tested in two independent cohorts with a total of 97 infants. The model discriminates responses to the noxious from the nonnoxious procedure with an overall accuracy of 0.76-0.84 and an accuracy of 0.78-0.79 in the 28-31-week group. Noxious-evoked responses have distinct developmental patterns. Heart rate responses increase in magnitude with age, while noxious-evoked brain activity undergoes three distinct developmental stages, including a previously unreported transitory stage consisting of a negative event-related potential between 30 and 33 weeks postmenstrual age. These findings demonstrate that while noxious-evoked responses change across early development, infant responses to noxious and nonnoxious stimuli are discriminable in prematurity.
Pain assessment in preterm infants is challenging as behavioral, autonomic, and neurophysiological measures of pain are reported to be less sensitive and specific than in term infants. Understanding the pattern of preterm infants' noxious-evoked responses is vital to improve pain assessment in this group. This study investigated the discriminability and development of multimodal noxious-evoked responses in infants aged 28-40 weeks postmenstrual age. A classifier was trained to discriminate responses to a noxious heel lance from a nonnoxious control in 47 infants, using measures of facial expression, brain activity, heart rate, and limb withdrawal, and tested in two independent cohorts with a total of 97 infants. The model discriminates responses to the noxious from the nonnoxious procedure with an overall accuracy of 0.76-0.84 and an accuracy of 0.78-0.79 in the 28-31-week group. Noxious-evoked responses have distinct developmental patterns. Heart rate responses increase in magnitude with age, while noxious-evoked brain activity undergoes three distinct developmental stages, including a previously unreported transitory stage consisting of a negative event-related potential between 30 and 33 weeks postmenstrual age. These findings demonstrate that while noxious-evoked responses change across early development, infant responses to noxious and nonnoxious stimuli are discriminable in prematurity.
The measurement and treatment of pain in the neonatal intensive care unit (NICU) is vital to prevent short- and long-term consequences of pain in neonates (McPherson et al. 2020). However, the nervous system undergoes rapid changes in structure and function across the preterm period and observed responses to noxious stimuli evolve accordingly (Fitzgerald 2005; Hatfield 2014). When the term-aged neonate undergoes a painful medical procedure, such as a blood test, cannulation, or injection, distinct noxious-evoked activity can be observed across multiple levels of the nervous system. For example, a noxious procedure can evoke autonomic responses, including increased heart rate (Waxman et al. 2016) and skin conductance (Eriksson et al. 2008), spinally mediated limb withdrawal activity (Cornelissen et al. 2013; Hartley et al. 2015), and behavioral facial grimacing responses (Grunau et al. 1990; DiLorenzo et al. 2018). In addition, patterns of noxious-evoked brain activity that are distinct from that evoked by nonnoxious sensory stimulation have been well characterized (Slater et al. 2010; Hartley et al. 2017).In contrast, in premature neonates, the observed responses following painful medical procedures are reported to be less able to discriminate noxious from nonnoxious events. For example, a similar percentage of very preterm infants display facial grimacing in response to either heel lancing or nonnoxious touch (Green et al. 2019); similar magnitude limb withdrawal responses can occur in response to noxious and nonnoxious stimuli (Cornelissen et al. 2013); and some studies report that heart rate responses are smaller or even absent in extremely preterm infants compared with term infants (Craig et al. 1993; Porter et al. 1999) although others report no relationship between heart rate response to noxious stimuli and age in preterm infants (Walden et al. 2001; Bartocci et al. 2006). Similarly, the distinct noxious-evoked brain activity patterns observed in term-aged neonates are less frequently observed in younger infants, and in very premature infants generalized delta brush activity is observed in response to both noxious and nonnoxious sensory events (Fabrizi et al. 2011). Consequently, the sensitivity of clinical pain scales in the youngest infants has been questioned, leading to limitations in using these outcomes to measure pain or to assess the efficacy of analgesics in this population (Slater et al. 2020).Nevertheless, observations that noxious and nonnoxious stimulation evoke similar response patterns in premature neonates should not necessarily be interpreted to mean that the neonatal nervous system does not discriminate noxious from nonnoxious events. Animal studies show that subtypes of A-fibers, which are differentially activated by noxious and tactile inputs, are functional at birth in rodents (Brewer and Baccei 2020)—a developmental stage that corresponds to late-preterm infant development. This suggests that the preterm infant nervous system may also be able to generate discriminable responses during this time window. Therefore, depending on the age of the infant, it may be that we cannot discriminate between noxious- and tactile-evoked responses because the evoked patterns are truly not discriminable between these events (true negative) or, alternatively, that we cannot observe discriminable responses across these modalities due to limitations in our measurement techniques, despite response patterns being able to discriminate (false negative). Given that each technique provides relatively crude measures of central nervous system (CNS) activity due to the necessity that they are noninvasive and the essential requirement that they are clinically acceptable for use in neonates, there is an abundance of methodological limitations that could lead to a failure to reject the null hypothesis. For example, evoked electroencephalography (EEG) signals are susceptible to artifact arising from infant movement (Hoehl and Wahl 2012) and can have a low signal-to-noise ratio in single trials (Boudewyn et al. 2018). Utilizing multiple recording modalities and measurement types in premature infants may be one approach whereby we can improve our power to discriminate responses to noxious and nonnoxious stimuli, as has previously been shown in late preterm and term infants (Zamzmi et al. 2017; van der Vaart et al. 2019).The aim of this study was to determine whether infants between 28 and 40 weeks postmenstrual age (PMA) display discriminable physiological, behavioral, reflexive, and cerebral responses to noxious and nonnoxious events, and if so, to investigate how these responses change throughout early human development.
Materials and Methods
Study Design
The study consisted of two parts that related to the two main aims:Part 1: Discrimination: investigation of the discriminability of responses to noxious and nonnoxious stimulationPart 2: Development: investigation of the age-related development of noxious-evoked responses across 28–40 weeks PMA.A total of 144 infants were included in this study in three datasets (Table 1). The local “Oxford” dataset (n = 70) consists of the Oxford training dataset comprising 67% of the Oxford participants (n = 47) and an independent Oxford held-out test dataset comprising 33% of the Oxford participants (n = 23). The “UCL” dataset (n = 74) is an external dataset compiled by Jones and coworkers at University College London (UCL) (Jones et al. 2018a, 2018b). The datasets included both preterm (aged less than 37 weeks PMA at study) and term (aged 37–40 weeks PMA at study) infants to allow the derivation of a developmentally sensitive discriminative model and to provide a comprehensive description of the continuous developmental trajectories from the preterm to the term period.
Table 1
Demographic information of participating infants, split by dataset
Oxford training(n = 47)
Oxford held-out test (n = 23)
UCL(n = 74)
PMA at test occasion (weeks)
35.3 (32.2–37.6)
34.9 (32.4–37.4)
35.6 (33.0–37.1)
Gestational age at birth (weeks)
34.3 (29.2–37.2)
34.1 (30.2–36.7)
34.3 (31.0–36.3)
PNA at test occasion (weeks)
0.7 (0.3–2)
0.7 (0.2–1.5)
0.7 (0.6–1.6)
Sex
Female
21 (45)
10 (43)
38 (51)
Male
26 (55)
13 (57)
36 (49)
Birth weight (g)
2110 (1183–3034)
2010 (1558–3314)
2070 (1490, 2480)
Mode of delivery
Normal vaginal delivery
16 (34)
10 (43)
19 (26)
Vaginal assisted (ventouse/forceps/kiwi) or breech
9 (19)
4 (17)
15 (20)
Elective C-section
4 (9)
3 (13)
15 (20)
Emergency C-section/C-section in labor
18 (38)
6 (26)
25 (34)
Apgar at 1 min
8 (5.3–9)
7 (4–9)
8 (6–9)a
Apgar at 5 min
10 (8–10)
10 (8–10)
9 (9–10)a
Estimated number of previous skin-breaking blood tests
9 (3–14.8)
6 (3–10.5)b
13 (7–20)
Values are median (lower quartile, upper quartile) or number (%). Abbreviations: PMA = postmenstrual age; PNA = postnatal age; UCL = University College London.
aApgar scores missing for 2 infants.
bEstimated number of previous skin-breaking blood tests missing for one infant.
Demographic information of participating infants, split by datasetValues are median (lower quartile, upper quartile) or number (%). Abbreviations: PMA = postmenstrual age; PNA = postnatal age; UCL = University College London.aApgar scores missing for 2 infants.bEstimated number of previous skin-breaking blood tests missing for one infant.For Part 1: Discrimination, the Oxford training dataset was used for feature extraction, model training, and assessment of classification performance using cross-validation, while the Oxford held-out test dataset and the UCL dataset were retained separately. The model established in the Oxford training dataset was then tested in both the independent Oxford held-out test dataset and the independent UCL dataset. The Oxford held-out test dataset was used to test the generalizability of the classification accuracy in an independent dataset collected at the same site, while the UCL dataset was used to assess generalizability across different sites. For Part 2: Development, the Oxford training data, the Oxford held-out test data, and UCL data were combined to maximize sample size and to best represent the developmental trajectories of individual features.
The Oxford Dataset
Participants and Research Governance
Infants were selected from a database of all data previously recorded by our research group between 2012 and 2021 at the John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, UK. Ethical approval was obtained from the National Research Ethics Service (references: 12/SC/0447, 11/LO/0350 and 19/LO/1085), and informed written parental consent was obtained before each study. Studies conformed to the standards of the Declaration of Helsinki and Good Clinical Practice guidelines.Infants were included in the current analyses if their PMA at study was available in the database and was at least 28 weeks and less than 40 weeks, and if at least 1.55 s of time-locked artifact-free EEG was available at the Cz electrode for a heel lance and a control heel lance. Infants were excluded from this study if they had a postnatal age (PNA) of 8 weeks or more, intraventricular hemorrhage (IVH) grade 3 or 4, hypoxic-ischemic encephalopathy, if their data were used in the development (but not validation) of a previously described template of noxious-evoked brain activity (Hartley et al. 2017), or if they were in the intervention group of a study that specifically investigated the effect of a pain-relieving intervention (e.g., gentle touch or kangaroo care). The database contained fewer preterm infants than (near-)term infants. To prevent data from (near-)term age infants dominating the dataset, a maximum of 10 infants per postmenstrual week were included. For all postmenstrual weeks except 36 weeks, no more than 10 infants that fit the inclusion criteria were identified and all were included in the study. Eligible study numbers may have been missed at random (e.g., if PMA was missing on the database or if the database did not state that EEG was recorded). More than 10 infants were aged 36 weeks at the time of study. To arrive at the final sample for 36 weeks PMA, studies employing a less precise way of time-locking with a microphone instead of an automated event detection interface (n = 4) were removed and then 10 study numbers were selected at random from the remaining list before data analysis. Some infants in the database participated in multiple test occasions; only one test occasion per infant was included in this study. Parts of this dataset were previously used for other analyses (Hartley et al. 2016, 2017; Gursul et al. 2018; Green et al. 2019; van der Vaart et al. 2019; Cobo et al. 2021; Schmidt Mellado et al. 2021). Infants received a variety of medications on the day of study. The most prescribed medications were antibiotics (Oxford training dataset, n = 16, Oxford held-out test set, n = 10) and caffeine (Oxford training dataset, n = 12, Oxford held-out test set, n = 5). None of the infants received analgesics, anticonvulsants, sedatives, or muscle relaxants on the day of study.The selected infants were randomly split into the Oxford training dataset (n = 47) and the Oxford held-out test dataset (n = 19), stratified in weeks PMA at study, using the function cvpartition in MATLAB (Mathworks, version 2019a). During feature extraction and model development in the Oxford training dataset, four infants fitting the inclusion criteria were recruited and were included in the held-out test set bringing the total number of infants in the Oxford held-out test dataset to 23. Demographic information can be found in Table 1 and and Supplementary Figure 1.
Experimental Procedures
All infants were studied during a noxious stimulus, a clinically required heel lance performed for blood sampling, and a nonnoxious stimulus—a control heel lance where the lancet was rotated by 90° and held against the infant’s foot so that when released the blade did not pierce the infant’s skin. All infants received comfort measures in line with clinical practice and judged appropriate by the clinical member of the research team. Dependent on infant age and parental preference, this included swaddling, nonnutritive sucking, containment (by hands or by fabric, such as placing the infant in a nest) or being held by a parent.
Recording Techniques
EEG was recorded using a SynAmps RT 64-channel headbox and amplifiers (Compumedics Neuroscan) and CURRYscan7 neuroimaging suite (Compumedics Neuroscan). Activity was recorded at Cz and 3–20 other electrodes with the reference electrode at Fz and a ground electrode at FPz/forehead according to the international 10–20 system. In two infants, data were recorded with a reference electrode at FPz and re-referenced to Fz during analysis. Sampling frequency was 2000 Hz (68 infants) or 1000 Hz (two infants, upsampled to 2000 Hz during analysis). The scalp was gently cleaned with preparation gel (Nuprep gel, D.O. Weaver and Co.) before the placement of disposable Ag/AgCl cup electrodes (Ambu Neuroline) with conductive paste (Elefix EEG paste, 8 Nihon Kohden). Electrocardiogram (ECG) was recorded with an electrode on the chest which was referenced to the EEG reference electrode. Electromyogram (EMG) was recorded using surface bipolar electrodes (Ambu Neuroline) on the biceps femoris of both legs. The stimuli were time-locked to the electrophysiological (EEG, EMG, and ECG) recordings with an accelerometer and an automated detection interface (Worley et al. 2012). In two infants, the stimuli were time-locked using a microphone that recorded the click produced by the lancet and was recorded along with the electrophysiological recordings.Infants’ facial expressions were recorded using a video camera for 15 s before and until 30 s after the heel lance and control heel lance (see Facial Expressions). The timing of the stimulus was marked with an LED light that was activated by one of the investigators at the point of stimulation.
UCL Dataset
The dataset from Jones and coworkers, which is available on request through the UK Data Service (Jones et al. 2018a, 2018b), contains anonymized EEG recordings, ECG recordings, and facial expression scores for 112 infants during a heel lance, control heel lance, and auditory stimulus. These data were collected by an independent research group at a different site than our Oxford dataset but using similar acquisition methods and stimuli. The UCL study was approved by the NHS Health Research Authority (London—Surrey Borders) and conformed to the declaration of Helsinki (Jones et al. 2018b). Written parental consent was obtained before each test occasion (Jones et al. 2018b). Further details on research governance and recording methods can be found in the associated publication (Jones et al. 2018b). A total of 74 infants in the dataset met our inclusion criteria (they had both a control heel lance and a heel lance, were aged between 28 and 40 weeks PMA, and less than 8 weeks PNA at study) and were included in our analysis. Of these infants, nine had IVH of unknown severity. Comfort methods such as swaddling and kangaroo care were used for each test occasion. Infants received a variety of medications in the 24 h preceding the study. The most prescribed medications were antibiotics or antivirals (n = 15) and caffeine (n = 23). One infant received anticonvulsant therapy in the 24 h before the study. None of the infants received analgesics, sedatives, or muscle relaxants in the 24 h before the study. Demographic information can be found in Table 1 and age distributions are plotted in Supplementary Figure 1.
Analysis
Features of Interest
Relevant literature was reviewed to identify which of the modalities that were available in the Oxford dataset could potentially contribute to the discrimination of noxious from nonnoxious stimuli in a wide age range. The final list of modalities was based on the strongest evidence from the literature, the availability of the data in the Oxford training dataset and the need to restrict the number of features to avoid overfitting of the model (see: ``Part 1: Discrimination-Classification Model''). The list of modalities is not exhaustive. Facial expressions, noxious-evoked event-related potential (ERP) magnitude, and EEG spectral changes were included in the analysis because previous studies report that these modalities are more responsive to noxious stimulation than to nonnoxious stimulation in term infants (Grunau et al. 1990; Slater et al. 2010; Fabrizi et al. 2016; Hartley et al. 2017). Heart rate was included as it has discriminable value in (near-) term infants (Bartocci et al. 2006; van der Vaart et al. 2019) and potentially in preterm infants (Johnston et al. 1995; Holsti et al. 2005). Limb withdrawal is reported to be present in both term and preterm infants (Hartley et al. 2016), with discriminative ability increasing with infant PMA (Cornelissen et al. 2013). As preterm infants are less likely to display the well-characterized noxious-evoked ERP (Fabrizi et al. 2011), a data-driven approach using principal component analysis (PCA) (see below) was used to identify new potentially discriminable features in time-locked EEG recordings in preterm infants. Finally, PMA at test occasion was included in the model as it may modulate noxious-evoked responses. The modalities and features derived from them are listed in Table 2.
Table 2
Response modalities and features used in this study, organized by response domain
Response domain
Response modality
Measurement technique
Extracted feature
Feature name
Neural
Cerebral
EEG
Coefficient of previously derived template of noxious-evoked brain activity (Hartley et al. 2017)
Template
Event-related potentials summarized asPCs; coefficient of first 3 PCs derived from noxious, nonnoxious and combined noxious and nonnoxious EEG data
PC 1–3: noxious, PC 1–3: nonnoxious, PC 1–3: all data
ERSPs: mean log power in 5 time–frequency windows
Early delta, early alpha, late delta, late alpha, late beta
Autonomic
Cardiovascular
ECG
Heart rate increase from mean in 15 s prestimulus to 15 s poststimulus (14 s in the UCL dataset, see Materials and Methods).
Heart rate
Behavioral
Limb withdrawal
EMG
RMS of EMG activity recorded from the ipsilateral and contralateral biceps femoris in the first second poststimulus.
Ipsilateral reflex, contralateral reflex
Facial expressions
Video
Duration of brow bulge in seconds in the 30 s poststimulus
Response modalities and features used in this study, organized by response domainAbbreviations: ECG = electrocardiography; EEG = electroencephalography; EMG = electromyography; PMA = postmenstrual age.
EEG Preprocessing
For the EEG analysis, only the Cz electrode was considered as this has the highest signal-to-noise ratio for recording noxious-evoked brain activity in infants compared with other electrode sites (Hartley et al. 2017). The EEG data were processed using Brainstorm (Tadel et al. 2011) and EEGlab (version 2019.0, Delorme and Makeig 2004) functions in the MATLAB programming environment (Mathworks, version 2019a). Data were filtered with a low-pass filter of 30 Hz, a high-pass filter of 1 Hz and a notch filter at 50 Hz using a Hamming windowed-sinc FIR filter implemented in EEGlab. Data were epoched around the control heel lance and heel lance stimuli with 0.5 s before and 1.05 s after the stimulus for PCA and for projecting the previously developed template of noxious-evoked brain activity and 2.5 s before the stimulus and 4 s after the stimulus for the event-related spectral perturbation (ERSP) analysis. Epochs were visually inspected and rejected if any artifact was present (see Table 4).
Table 4
Overview of missing data for each feature, split by dataset
Number of missing entries
Oxford training
Oxford held-out test
UCL
EEG template of noxious-evoked brain activity and PCA (<1 s)
0/47
0/23
0/74
EEG ERSP analysis (4 s poststimulus)
3/47 heel lances excluded due to movement artifact occurring after 1 s
1/23 heel lances excluded due to movement artifact occurring after 1 s
N/A: only 2 s of poststimulus EEG data available
Heart rate
4/47 infants excluded in both conditions due to ECG recording artifact
1/23 infants excluded in both conditions due to the presence of ectopic beats
19/74 control heel lances unavailable (4 due to artefact, 15 due to less than 14 seconds of ECG) and 9/74 heel lances unavailable (7 due to artifact, 1 with ECG less than 14 s and 1 because ECG was not available)
Ipsilateral reflex
2/47 infants excluded due to movement and recording artifact
0/23
N/A: no EMG available
Contralateral reflex
1/47 infants excluded in both conditions due to movement and recording artifact, in 2/47 infants contralateral EMG was not recorded
0/23
N/A: no EMG available
Brow bulge duration
3/47 infants excluded in both conditions due to failed video recording
2/23 infants excluded in both conditions due to failed video recording
4/74 heel lances and 4/74 control heel lances unavailable (data from 5 infants)
A template of noxious-evoked brain activity suitable for infants aged 34–42 weeks PMA was previously developed and validated in an independent cohort of infants (Hartley et al. 2017). To calculate the weights of this template of noxious-evoked brain activity in the current datasets, EEG epochs were first baseline corrected to the prestimulus mean and then the template of noxious-evoked brain activity was projected to each infant’s data as described in the original publication (Hartley et al. 2017).
Extraction of Main ERP Waveforms using PCA
In many studies, the term ERP refers to the event-related response most typically derived by averaging many trials within and across participants time-locked to a stimulus. Here, we investigated responses to a clinically required noxious stimulus (and a nonnoxious control) which is necessarily a single trial and therefore required alternate methods for derivation. We use the term ERP to describe both the time-locked single-trial responses and the time-locked average responses across infants.PCA was used to quantify the magnitude of ERPs evoked by the noxious and nonnoxious stimuli. PCA permits the investigation of the entire waveform concurrently, instead of focusing on peak amplitudes. In our analysis, epochs were considered as variables and timepoints as observations. We aimed to extract the principal components (PCs) that explained most of the variance in the noxious and nonnoxious responses at the Cz electrode in the Oxford training dataset. Therefore, we used temporal PCA to extract the main waveforms from the responses to the noxious stimulus only, responses to the nonnoxious stimulus only, and the responses to both the noxious and nonnoxious stimuli together. The procedure below was followed for each set of responses.First, individual epochs were baseline corrected to the prestimulus mean. Then, each infant’s data were aligned to an age-weighted average using Woody filtering to allow for age-related interindividual differences in latency to evoked responses. To do so, an age-weighted average signal was calculated for each infant by average weighting of that infant’s and the other infants’ epochs. A Gaussian window with a length of 8 weeks (56 days) was used to assign weights between 0 and 1 to the other traces based on PMA at test occasion in days. This meant that weights were maximal for traces belonging to infants with the same PMA and decayed to 0.5 for traces belonging to infants who were ~2 weeks older or younger and to 0 for infants who were more than 4 weeks older or younger (see Supplementary Fig. 2). The infant’s signal was then aligned to this average using Woody filtering (with a maximum jitter of ±50 ms) in the first second after the stimulus. PCA was performed on the set of Woody-filtered signals in the first 1000 ms poststimulus to identify the waveforms explaining most of the variance in the data. For each set of responses (noxious, nonnoxious and combined noxious and nonnoxious), the first PCs that together explained at least 75% of the variance were considered for further analysis. This yielded three PCs per response set and nine PCs in total.The nine PCs that were identified in the Oxford training dataset were projected onto all data in the Oxford dataset (training and held-out test datasets) and UCL dataset. To do so, the PCs were decomposed using singular value decomposition and the resulting singular values were multiplied with each infant’s data (Hartley et al. 2017). Each trial was first Woody filtered with a maximum jitter of ±50 ms so that the data achieved best alignment with each PC. The corresponding weights were the features that were brought forward to the classification model.
ERSP Analysis
ERSP analysis was performed using EEGlab (Makeig 1993; Grandchamp and Delorme 2011). Data were decomposed into 200 × 59 time–frequency pixels using Morlet wavelets. The number of cycles was set to linearly increase from 3 cycles at 1 Hz to 45 cycles at 30 Hz, which corresponds to 50% of the number of cycles in the equivalent fast Fourier transform window (Makeig 1993). Individual trials were baseline corrected by dividing the poststimulus power at each frequency by the average power in the same frequency band in the 500-ms baseline period directly before the stimulus (Makeig 1993; Grandchamp and Delorme 2011). Data were then log transformed at the single-trial level.In the average ERSP plot in the Oxford training dataset, we visually identified five time–frequency windows that were responsive to noxious stimulation (prominent positive or negative deviations from zero), analogous to identifying peaks in temporal EEG analysis (Fig. 1, Table 3). The mean log power in the five time-frequency windows identified in the Oxford training dataset were used as features in the classification model. This feature was extracted for each individual epoch in the Oxford training dataset and the Oxford held-out test dataset. The UCL dataset only contained 2 s of poststimulus EEG recordings which was too short to calculate the time–frequency decomposition as described above.
Figure 1
Mean ERSP plot in response to the noxious stimulus in the Oxford training dataset. Boxes indicate the time–frequency windows of interest identified using this plot. The noxious stimulus (clinically required heel lance) was performed at time = 0.
Table 3
Time–frequency windows of interest
Name
Time poststimulus
Frequency
Early delta
250–750 ms
2–4 Hz
Early alpha
250–750 ms
7–15 Hz
Late delta
1000–2000 ms
1–2 Hz
Late alpha
1000–2000 ms
7–15 Hz
Late beta
1500–2300 ms
28–30 Hz
Mean ERSP plot in response to the noxious stimulus in the Oxford training dataset. Boxes indicate the time–frequency windows of interest identified using this plot. The noxious stimulus (clinically required heel lance) was performed at time = 0.Time–frequency windows of interest
ECG Preprocessing and Feature Extraction
ECG data were filtered between 12 and 40 Hz using a Hamming windowed-sinc FIR filter implemented in EEGlab (Delorme and Makeig 2004) and divided into epochs with 30 s pre- and poststimulus for the Oxford data and 14 s pre- and poststimulus for the UCL dataset (as 14 s of data were available for most infants in this dataset). The MATLAB (Mathworks, version 2019a) function findpeaks.m was used to identify R peaks; traces were visually examined and extra or missed peaks were manually corrected. Traces with severe artifact were rejected (see Table 4). Heart rate was calculated at every second by taking the inverse of the average RR interval in 3-s sliding windows. Mean heart rate in the 15 s before the stimulus was calculated and subtracted from the maximum heart rate in 15 s poststimulus, as this was shown to be a feature that could discriminate noxious from nonnoxious stimulus responses in a previous model developed in infants from 34 weeks PMA (van der Vaart et al. 2019), and this was used as the heart rate feature in the classification model. For the UCL dataset, the mean in the 14 s prestimulus was subtracted from the maximum in the 14 s poststimulus.Overview of missing data for each feature, split by datasetAbbreviations: ECG = electrocardiography; EEG = electroencephalography; EMG = electromyography; PCA = principal component analysis; PMA = postmenstrual age.
EMG Preprocessing and Feature Extraction
EMG data were rectified, filtered between 10 and 500 Hz with a notch filter at 50 Hz and harmonics (100 and 150 Hz) using a Hamming windowed-sinc FIR filter implemented in EEGlab (Delorme and Makeig 2004), and epoched with 5 s before the stimulus and 15 s after the stimulus. Data were visually inspected to identify artifacts and removed if necessary (see Table 4).The root mean square (RMS) of the signal was calculated in 250 ms bins, and the poststimulus RMS data were divided by the mean RMS in the 1 s prestimulus (to correct for interindividual differences in the baseline EMG signal). The mean baseline-corrected RMS in the first 1000 ms poststimulus was used as a feature in the classification model. This time window was chosen as EMG RMS activity in the first 1000 ms is differentially modulated by noxious and nonnoxious stimuli in term infants (Hartley et al. 2015), is more comparable between term and preterm infants than later time windows (Cornelissen et al. 2013), and is less likely to be contaminated by other behavioral responses, which tend to occur after 1 s (Slater et al. 2009).
Facial Expressions
The duration of brow bulge in seconds in the 30 s after the control heel lance and heel lance was scored according to the feature within the Premature Infant Pain Profile—Revised (PIPP-R) score (Stevens et al. 2014) by researchers trained in this scoring system. For the Oxford dataset, researchers were blinded to the stimulus type. For the UCL dataset, scoring was performed by the original research team in UCL, and we obtained the facial expression duration as part of the dataset that was shared (for details, see Jones et al. 2018b). Brow bulge was chosen as a feature for our model as it has been previously shown to discriminate noxious from nonnoxious stimuli and is highly correlated with other noxious-related facial expressions (van der Vaart et al. 2019).
Missing Data
Not all variables were available for all infants due to technical difficulties during recordings or artifacts. The UCL dataset did not contain EMG data. Table 4 summarizes missing data.
Part 1: Discrimination—Classification Model
To investigate the ability to discriminate noxious from nonnoxious responses across the age range from 28–40 weeks PMA using multimodal responses and PMA, we used a bagged decision tree classification model (Breiman et al. 1984; Breiman 2001). The bagged decision tree model was chosen because it provides robust predictions and can handle missing entries in the data (García-Laencina et al. 2010). The model was implemented in MATLAB (Mathworks, version 2019a) using the function TreeBagger, with 500 weak learners and a minimum leaf size of 3, to classify responses as either noxious (heel lance) or nonnoxious (control heel lance). The model used 20 features (see Table 2): nine EEG PC features obtained through PCA, the weight of the previously described template of noxious-evoked brain activity (Hartley et al. 2017), mean log power increase in the five time–frequency windows identified in the ERSP analysis, increase in heart rate, ipsilateral and contralateral limb withdrawal quantified by EMG RMS, the duration of brow bulge, and PMA at study. Surrogate splits were specified to improve predictions in data with missing variables and the interaction–curvature test was used to allow for interactions between pairs of variables and to obtain unbiased estimates of predictor importance. All the predictors could be selected at each split (in contrast to random forest) to get unbiased estimates of predictor importance.The model was trained in the Oxford training dataset using leave-one-infant out cross-validation (i.e., responses to both the noxious and nonnoxious stimulus for a single infant were left out) to get an estimate of performance in the training set. The model was then retrained on all the data in the Oxford training dataset, and this final model was tested on the two independent datasets, the Oxford held-out test dataset and the UCL dataset, to assess generalizability. The UCL dataset only contained temporal EEG features, heart rate increase, and brow bulge duration, and all other features were considered missing data. Classification accuracy with 95% binomial proportion confidence interval was used to assess model performance. Given the equal number of observations for the noxious and nonnoxious categories, this was a balanced classification problem with an expected null accuracy of 50%. False-positive rates (FPRs) and false-negative rates (FNRs) are also provided for greater insight into model performance. Subanalyses of accuracy in four different age groups (28 < 31, 31 < 34, 34 < 37 and 37 < 40 weeks) were performed in the Oxford training dataset (using the cross-validation results) and UCL dataset only as the subgroups in the Oxford held-out test set were too small to generate reliable results (n = 3 in the youngest age group).The permutation-based feature importance was estimated both in the out-of-bag samples in the final model derived from the Oxford training dataset (to assess feature importance in the training set) and on the Oxford held-out test dataset (to assess features important for generalization). The out-of-bag feature importance was calculated using a built-in functionality in the TreeBagger MATLAB (Mathworks, version 2019a) function which calculates the decrease in out-of-bag classification accuracy if the variable is permuted across the samples that are out-of-bag for a given tree. For each feature, this value is standardized by averaging across all trees and divided by the standard deviation over all trees in the model. Feature importance in the Oxford held-out test dataset was calculated in a similar way, by permuting the observations in the held-out set 50 times for each tree in the model separately for each feature. Mean accuracy loss across the 50 permutations was calculated for each feature and each tree and then averaged and standardized across all trees in the model. To further assess the power of the most important features, a model based on only the four most important features and PMA was trained using all available data, combining the Oxford training dataset, Oxford held-out test dataset, and UCL dataset, and accuracy was assessed by leave-one-infant out cross-validation.
Part 2: Development—Developmental Trajectories
To assess the developmental trajectories of the features included in the classification model, the data from the Oxford training dataset, Oxford held-out test dataset and the UCL dataset were combined, and all statistics were calculated across the three datasets. In the figures, the Oxford dataset (consisting of the Oxford training dataset and the Oxford held-out test dataset) and the UCL dataset are plotted in different colors in the same axes to show the similarity of the data across the two different research sites.Developmental trajectories of the ERP, heart rate, limb withdrawal and ERSP features were qualitatively assessed by plotting age-weighted average (ERP, heart rate, ERSP) or median (limb withdrawal, using the function weightedMedian (Haase 2021)) responses in 4-week sliding windows. Within the window, each infant’s data were weighted so that infants with a PMA close to the central PMA were upweighted and infants with a PMA further from the central PMA were downweighted. A Gaussian window was used to assign the weights between 0 and 1 so that infants who were 1 week from the central PMA were assigned a weight of ~0.5 (see Supplementary Fig. 2). Developmental trajectories of brow bulge were expressed as the proportion of infants showing a brow bulge response (defined as a brow bulge duration above 0 s) in 3-week sliding windows.The development of the subset of noxious-related features with PMA was also quantitatively assessed. The associations between PMA and noxious-evoked heart rate, PC 1 noxious, PC 2 noxious, PC 3 noxious, early delta, early alpha, late delta, late alpha, late beta, ipsilateral reflex, and contralateral reflex were quantified using linear regression models with P-values derived nonparametrically in the FSL PALM (permutation analysis of linear models) toolbox (Winkler et al. 2014) using 10 000 permutations. PALM was implemented in the MATLAB (Mathworks, version 2019a) environment. Each feature was tested separately resulting in a total of 11 statistical tests, and the Holm–Bonferroni correction (Holm 1979) implemented in R (version 4.1.0, R Core Team 2020) was used to correct for multiple testing. The relationship between PMA and brow bulge duration was not tested as we previously reported these results for the Oxford dataset (Green et al. 2019); PC 1–3 nonnoxious and PC 1–3 all data were not tested as these features are not extracted solely from the noxious data. Finally, the template was not tested because its waveform shows high similarity to PC3 noxious (see Fig. 5).
Figure 5
The three PCS derived in the noxious data in the Oxford training dataset and their associated age-related trajectories in the Oxford and UCL datasets. (A–C) The three PCs derived from the noxious data in the Oxford training dataset (scaled to maximum amplitude within each PC). In (C), the previously published template of noxious-evoked brain activity (Hartley et al. 2017) is scaled and overlaid in red to show the similarity of the waveforms. (D–F) Corresponding age-related trajectories of the PC coefficients. PC coefficients were obtained by projecting the PCs to the Oxford training dataset, Oxford held-out test dataset and UCL dataset. Dots represent individual infants (noxious responses only). Developmental trajectories (color and gray lines: noxious and nonnoxious trajectory, respectively) are estimated for visualization only by calculating the age-weighted average and standard deviation of the PC coefficients at each postmenstrual week (see Materials and Methods). Oxford data contain the Oxford training dataset and the Oxford held-out test dataset. Abbreviations: PMA = postmenstrual age.
Additional exploratory analyses were performed to investigate the effects that can more specifically be attributed to PNA or PMA adjusted for PNA by including PNA as a covariate in the linear models described above (see Supplementary Table 1). As we were primarily interested in the effect of an infant’s overall age (gestational age + postnatal age) on the different response features, the adjustment for PNA was not used as the main analysis. These additional exploratory analyses are provided to give further insights into our primary PMA association results. Due to the exploratory nature of these analyses, the P-values are presented unadjusted for multiple testing.Two-sided paired t-tests were used to compare the magnitude of PC 1–3 noxious between the noxious and the nonnoxious stimuli with P-values derived nonparametrically in the FSL PALM (permutation analysis of linear models) toolbox (Winkler et al. 2014) using 10 000 sign-flips. The Holm–Bonferroni correction was used to correct for multiple testing across these three tests.To examine the clustering of features, each feature was first normalized to Z-scores (subtraction of mean and division by standard deviation) within the feature across all ages, and then an age-weighted average of the normalized feature was calculated at each PMA in 4-week sliding windows as described above. Linear models were used to obtain estimates for the regression coefficient between PMA and the Z-scores, and features were sorted by the magnitude of the regression coefficient. Clusters of features with similar trajectories were visually identified.
Software
Shaded plots were created using the shadedErrorbar function implemented in MATLAB (Mathworks, version 2019a, Campbell 2021). Colors were generated using ColorBrewer (Stephen 2021).
Results
Neonates from 28 Postmenstrual Weeks Have Discriminable Responses to Noxious and Nonnoxious Stimulation
Acute noxious stimulation in neonates from 28–40 weeks PMA evokes changes in brain activity, limb withdrawal activity, heart rate, and facial expressions that can be discriminated from responses to nonnoxious stimulation at the group level (Fig. 2), thus demonstrating the appropriateness of including these modalities in a classification model.
Figure 2
Group response to the noxious stimulus (heel lance, top row) and nonnoxious stimulus (control, bottom row) in the Oxford training dataset (n = 47) for the cerebral, limb withdrawal, cardiovascular, and facial expression modalities. Activity is combined across infants aged 28–40 weeks PMA and may not be representative for every age group. The stimulus took place at time = 0 s. (A, B) Mean ERP at the Cz electrode. An early N–P complex is visible in both the noxious and nonnoxious responses (arrows), whereas the later N–P complex is only present in response to the noxious stimulus (asterisk). (C, D) Mean ERSP plot demonstrating an increase in delta, alpha, and beta power in the 2-s period poststimulation, which is greater in response to noxious stimulation compared with nonnoxious stimulation. (E, F) Median RMS of the ipsilateral limb withdrawal activity, plotted as fold-increase over a 1-s baseline. Limb withdrawal activity to the noxious stimulus is greater than to the nonnoxious stimulus. (G, H) Median RMS of the contralateral limb withdrawal activity. Limb withdrawal activity to the noxious stimulus is greater than to the nonnoxious stimulus. (I, J) Mean heart rate. Noxious stimulation elicits a greater increase in heart rate than nonnoxious stimulation. (K, L) Duration of brow bulge, shown as the percentage of infants displaying a brow bulge of a certain duration. A greater proportion of neonates display facial grimacing (brow bulge) to the noxious stimulation (66%) compared with the control condition (25%).
Abbreviations: ERP = event-related potential.
Group response to the noxious stimulus (heel lance, top row) and nonnoxious stimulus (control, bottom row) in the Oxford training dataset (n = 47) for the cerebral, limb withdrawal, cardiovascular, and facial expression modalities. Activity is combined across infants aged 28–40 weeks PMA and may not be representative for every age group. The stimulus took place at time = 0 s. (A, B) Mean ERP at the Cz electrode. An early N–P complex is visible in both the noxious and nonnoxious responses (arrows), whereas the later N–P complex is only present in response to the noxious stimulus (asterisk). (C, D) Mean ERSP plot demonstrating an increase in delta, alpha, and beta power in the 2-s period poststimulation, which is greater in response to noxious stimulation compared with nonnoxious stimulation. (E, F) Median RMS of the ipsilateral limb withdrawal activity, plotted as fold-increase over a 1-s baseline. Limb withdrawal activity to the noxious stimulus is greater than to the nonnoxious stimulus. (G, H) Median RMS of the contralateral limb withdrawal activity. Limb withdrawal activity to the noxious stimulus is greater than to the nonnoxious stimulus. (I, J) Mean heart rate. Noxious stimulation elicits a greater increase in heart rate than nonnoxious stimulation. (K, L) Duration of brow bulge, shown as the percentage of infants displaying a brow bulge of a certain duration. A greater proportion of neonates display facial grimacing (brow bulge) to the noxious stimulation (66%) compared with the control condition (25%).Abbreviations: ERP = event-related potential.A classification model, which was created based on the observed noxious-evoked cerebral, limb withdrawal, cardiovascular, and facial expression responses (see Table 2 in Materials and Methods for the definition of each feature), had a discriminative accuracy of 0.79 (95% CI: 0.70–0.87) between the noxious and nonnoxious conditions and an FPR of 0.28 and FNR of 0.15 in the Oxford training dataset (accuracy across all ages estimated using leave-one-infant-out cross-validation, see Fig. 3). The accuracy in preterm infants was 0.82 (95% CI: 0.73–0.92). The model performed equally well in the independent Oxford held-out dataset, with an accuracy of 0.76 (95% CI: 0.64–0.88), FPR of 0.22, and FNR of 0.26 in the full dataset and an accuracy of 0.74 (95% CI: 0.59–0.88) in preterm infants, showing generalizability across datasets collected at the Oxford site.
Figure 3
Overview of classification performance and feature importance estimates. (A) Accuracy of the classification model across age in the Oxford training dataset (estimated using leave-one-subject-out cross-validation). Error bars represent 95% confidence interval. (B) Number of infants in each age group in the Oxford training dataset. (C) Accuracy of the classification model across age in the UCL Dataset. (D) Number of infants in each age group in the UCL dataset. (E) Feature importance estimates obtained by permuting the observations in the Oxford held-out test dataset. See Table 2 in Materials and Methods for a description of each feature. Abbreviations: PMA = postmenstrual age.
Overview of classification performance and feature importance estimates. (A) Accuracy of the classification model across age in the Oxford training dataset (estimated using leave-one-subject-out cross-validation). Error bars represent 95% confidence interval. (B) Number of infants in each age group in the Oxford training dataset. (C) Accuracy of the classification model across age in the UCL Dataset. (D) Number of infants in each age group in the UCL dataset. (E) Feature importance estimates obtained by permuting the observations in the Oxford held-out test dataset. See Table 2 in Materials and Methods for a description of each feature. Abbreviations: PMA = postmenstrual age.Importantly, these results were confirmed in another independent dataset collected by a different research group at a different research site (UCL dataset, Jones et al. 2018b), demonstrating the generalizability of these findings (Fig. 3). The accuracy of the classifier in the UCL dataset was 0.84 (95% CI: 0.78–0.90), with an FPR of 0.11 and an FNR of 0.22 in the full dataset. In preterm infants, the model had an accuracy of 0.83 (95% CI: 0.76–0.90).Differences in accuracy across 3-week age groups were investigated in the Oxford training dataset and the UCL datasets, where sufficiently large subgroups could be generated based on PMA (Fig. 3). The accuracy was balanced across age groups. Notably, in the 28–31 week group, the accuracy was 0.78 (95% CI 0.59–0.97) in the Oxford training dataset and 0.79 (95% CI 0.57–1.00) in the UCL dataset confirming that neonatal responses to noxious and nonnoxious stimuli can be discriminated in very preterm infants.The importance of each feature to the model was evaluated, which showed that heart rate, late delta, ipsilateral limb withdrawal, and the template of noxious-evoked brain activity gave the greatest contribution to the classification accuracy in both the Oxford training dataset and the Oxford held-out test dataset (Fig. 3, Supplementary Fig. 3). These features span cardiovascular, behavioral, and cerebral modalities, thus demonstrating that classification benefits from data recorded across multiple domains. A new classifier trained on only these four features and PMA across all available data (the Oxford training dataset, Oxford held-out test dataset and the UCL dataset) performed equally well with an overall discriminative accuracy of 0.78, estimated using leave-one-out cross-validation (95% CI 0.73–0.83, varying from 0.69 in the 31–34 weeks PMA group to 0.81 in the 37–40 weeks PMA group).
Noxious-Evoked Cerebral, Limb Withdrawal, Cardiovascular, and Facial Expression Responses Are Age Dependent
Distinct morphological changes in noxious-evoked activity can be observed between 28 and 40 weeks PMA, which are highly consistent across the independently collected Oxford and UCL datasets (Fig. 4). To quantitatively investigate age-related changes in noxious-evoked responses, the Oxford training dataset, Oxford held-out test dataset, and UCL dataset were combined. Noxious-evoked increases in heart rate are small in the youngest infants and significantly increase with PMA (Fig. 4 = 130, beta = 1.69, t = 7.08, Holm-corrected P = 0.0011). In contrast, the magnitude of the withdrawal reflex in either the ipsilateral or contralateral limb in the first second poststimulus does not significantly change with age (Fig. 4; ipsilateral limb: n = 68, beta = −0.073, t = −0.39, Holm-corrected P = 1.0; Fig. 4; contralateral limb; n = 67, beta = 0.21, t = 0.90, Holm-corrected P = 1.0) although greater magnitude responses are visually apparent in the younger infants. The proportion of infants who showed facial grimacing to the noxious stimulus increases from 33% in the youngest age group to 60% in the oldest age group in the Oxford dataset and from 40% to 64% in the UCL dataset (Fig. 4).
Figure 4
Noxious-evoked ERP, heart rate, ipsilateral reflex, contralateral reflex, and brow bulge responses in neonates from 28 to 40 postmenstrual weeks, split by postmenstrual week. For the continuous variables, each trace is an age-weighted average (ERP, heart rate) or median (ipsi- and contralateral reflex) in 4-week sliding windows around the central PMA. For the brow bulge responses, bars demonstrate the percentage of infants who displayed a brow bulge response in a group of infants with a PMA that falls within 1.5 weeks relative to the central PMA. Oxford data contain the Oxford training dataset and the Oxford held-out test dataset. EMG is not available in the UCL dataset. Column (A): ERP morphology changes from a large negative peak in 30–33 week PMA to a positive waveform. Column (B): Heart rate rise increases with PMA. Columns (C, D): Noxious-evoked reflex activity in the first second poststimulus does not significantly change across development. Ipsilateral reflex responses visually decrease in magnitude when examining the entire reflex duration. Column (E): The proportion of infants displaying a brow bulge response increases with PMA. Abbreviations: ERP = event-related potential; PMA = postmenstrual age.
Noxious-evoked ERP, heart rate, ipsilateral reflex, contralateral reflex, and brow bulge responses in neonates from 28 to 40 postmenstrual weeks, split by postmenstrual week. For the continuous variables, each trace is an age-weighted average (ERP, heart rate) or median (ipsi- and contralateral reflex) in 4-week sliding windows around the central PMA. For the brow bulge responses, bars demonstrate the percentage of infants who displayed a brow bulge response in a group of infants with a PMA that falls within 1.5 weeks relative to the central PMA. Oxford data contain the Oxford training dataset and the Oxford held-out test dataset. EMG is not available in the UCL dataset. Column (A): ERP morphology changes from a large negative peak in 30–33 week PMA to a positive waveform. Column (B): Heart rate rise increases with PMA. Columns (C, D): Noxious-evoked reflex activity in the first second poststimulus does not significantly change across development. Ipsilateral reflex responses visually decrease in magnitude when examining the entire reflex duration. Column (E): The proportion of infants displaying a brow bulge response increases with PMA. Abbreviations: ERP = event-related potential; PMA = postmenstrual age.In the youngest infants, the time-locked EEG activity to both noxious and nonnoxious stimulation consists of a high-amplitude slow wave superimposed with high-frequency activity, resembling delta brush activity. From 30 weeks PMA, a prominent ERP is evoked by the noxious stimulation, which has not been previously reported, and consists of a single negative peak at ~400–500 ms that decreases in magnitude with increasing PMA (Fig. 4) and is not present in response to the nonnoxious procedure (Supplementary Fig. 4). From 33 weeks onward, a positive peak at ~500–600 ms is evoked.
The morphology of Noxious-Evoked Brain Activity Changes throughout Early Development
To investigate further the morphology of the developing noxious-evoked brain activity, we examined the presence of the main waveforms that were identified in the noxious responses in the Oxford training dataset. The first three waveforms (PC 1, PC 2, and PC 3, Fig. 5) account for 80% of the variance in the noxious-evoked activity in the Oxford training dataset and represent the activity that can be visually observed across distinct developmental stages (Fig. 4). PC 1 resembles the slow wave component of delta brush activity; PC 2 is a negative deflection with peak latency at ~445 ms; and PC 3 is composed of an early positive and negative deflection (latency: 214 and 429 ms, respectively) followed by a second positive deflection peaking at ~595 ms to which the template of noxious-evoked brain activity (Hartley et al. 2017) can be fit (Fig. 5). When the PCs are projected onto the Oxford and UCL datasets, they discriminate between the noxious and nonnoxious conditions; PC 2 and PC 3 are significantly greater in response to the noxious stimulation compared with nonnoxious stimulation (paired t-test, PC 2; n = 144, t = −5.32, Holm-corrected P = 0.0003, PC 3; n = 144, t = −5.14, Holm-corrected P = 0.0003), while PC 1 is significantly greater following nonnoxious stimulation (paired t-test, n = 144, t = 2.70, Holm-corrected p = 0.0082).The three PCS derived in the noxious data in the Oxford training dataset and their associated age-related trajectories in the Oxford and UCL datasets. (A–C) The three PCs derived from the noxious data in the Oxford training dataset (scaled to maximum amplitude within each PC). In (C), the previously published template of noxious-evoked brain activity (Hartley et al. 2017) is scaled and overlaid in red to show the similarity of the waveforms. (D–F) Corresponding age-related trajectories of the PC coefficients. PC coefficients were obtained by projecting the PCs to the Oxford training dataset, Oxford held-out test dataset and UCL dataset. Dots represent individual infants (noxious responses only). Developmental trajectories (color and gray lines: noxious and nonnoxious trajectory, respectively) are estimated for visualization only by calculating the age-weighted average and standard deviation of the PC coefficients at each postmenstrual week (see Materials and Methods). Oxford data contain the Oxford training dataset and the Oxford held-out test dataset. Abbreviations: PMA = postmenstrual age.The PC coefficients follow distinct age-related trajectories (Fig. 5). PC 1 is dominant in the neonates aged 28–29 weeks PMA and significantly decreases in magnitude with increasing PMA (n = 144, beta = −0.016, t = −5.30, Holm-corrected P = 0.0011). This coincides with the appearance of PC 2, which peaks at ~30 weeks PMA and steadily decreases until term age (n = 144, beta = −0.023, t = −6.89, Holm-corrected P = 0.0011). Meanwhile, PC 3 increases in magnitude with age (n = 144, beta = 0.013, t = 3.74, Holm-corrected p = 0.004) and reaches a plateau at term age. The results are similar when correcting for PNA (see Supplementary Table 1).The ERSP features show weaker associations with age (Fig. 6). Of the five ERSP features that were visually identified in response to the noxious stimulation in the Oxford training dataset (Fig. 1, Table 3), only early delta (n = 66, beta = −0.53, t= -2.77, Holm-corrected p = 0.047) and early alpha (n = 66, beta = −0.56, t = −3.12, Holm-corrected p = 0.017) significantly change with PMA. The other features are relatively constant in magnitude.
Figure 6
Age-related changes in the five ERSP features. Top row: Average ERSP response to the noxious stimulus in the Oxford data with time–frequency windows of interest: early delta (A), early alpha (B), late delta (C), late alpha (D), and late beta (E). The stimulus occurred at time = 0. Bottom row: corresponding developmental trajectories of the ERSP features. Dots represent individual infants (noxious responses only). Color and gray lines represent the mean noxious and nonnoxious trajectory, respectively, and are estimated for visualization only by calculating the age-weighted average and standard deviation of the ERSP feature magnitude at each postmenstrual week (see Materials and Methods). Oxford data contain the Oxford training dataset and the Oxford held-out test dataset. Abbreviations: PMA = postmenstrual age.
Age-related changes in the five ERSP features. Top row: Average ERSP response to the noxious stimulus in the Oxford data with time–frequency windows of interest: early delta (A), early alpha (B), late delta (C), late alpha (D), and late beta (E). The stimulus occurred at time = 0. Bottom row: corresponding developmental trajectories of the ERSP features. Dots represent individual infants (noxious responses only). Color and gray lines represent the mean noxious and nonnoxious trajectory, respectively, and are estimated for visualization only by calculating the age-weighted average and standard deviation of the ERSP feature magnitude at each postmenstrual week (see Materials and Methods). Oxford data contain the Oxford training dataset and the Oxford held-out test dataset. Abbreviations: PMA = postmenstrual age.
The Pattern of Noxious-Evoked Multimodal Activity Changes throughout Early Development
The maturation of the responses across the different modalities follows differing developmental trajectories (Fig. 7). Two clusters of responses are visually apparent. PC 2 noxious, PC 1 noxious, early delta, early alpha, and late alpha are greater in infants below 33 weeks and decrease in magnitude with increasing PMA. In contrast, heart rate increase, PC 3 noxious, brow bulge duration, magnitude of the predefined template of noxious-evoked activity, and late delta are greater in infants above 36 weeks and smaller in infants below 33 weeks.
Figure 7
Normalized magnitude of noxious-related responses at increasing postmenstrual weeks, separated by feature. Magnitudes were first normalized within modality across all infants, and then age-weighted averages were calculated in a 4-week sliding window where infants close to the central PMA were upweighted (see Materials and Methods). Abbreviations: PMA = postmenstrual age.
Normalized magnitude of noxious-related responses at increasing postmenstrual weeks, separated by feature. Magnitudes were first normalized within modality across all infants, and then age-weighted averages were calculated in a 4-week sliding window where infants close to the central PMA were upweighted (see Materials and Methods). Abbreviations: PMA = postmenstrual age.
Discussion
Characterizing noxious-evoked neonatal responses to potentially painful medical procedures is crucial to better measure and treat pain in hospitalized neonates. The youngest premature neonates, born from ~22 weeks’ gestation, experience the highest number of clinically necessary interventions per day (Carbajal et al. 2008), which occur during a developmental period where they are vulnerable to long-term detrimental consequences (Doesburg et al. 2013; Ranger et al. 2015; Duerden et al. 2018). While there are many lines of evidence to suggest that term-aged infants display discriminable patterns of behavioral, autonomic, and cerebral activity in response to noxious events, in contrast, in premature infants a lack of discrimination between observed responses to noxious and nonnoxious stimulation is often reported. Here we demonstrate that both term and preterm neonates display developmentally distinct activity patterns that discriminate between noxious and nonnoxious stimuli. Using a classification model, we show that multimodal activity, including evoked brain activity, behavioral responses, and autonomic responses, can discriminate noxious from nonnoxious stimuli with an accuracy of 76%–84%. Importantly, the accuracy was 78%–79% in the youngest age group (28–31 weeks PMA) demonstrating that even in infants of this age it is possible to discriminate the responses to the two stimuli.In this study we demonstrate that evoked brain activity patterns recorded using EEG change morphologically across the developmental period from 28–40 weeks PMA. Notably, we report the emergence of a transitory noxious-evoked waveform in infants aged ~30–33 weeks PMA that has not previously been described. Identification of this transitory activity, consisting of a negative deflection at ~400–500 ms following the noxious stimulus, has the potential to facilitate the use of noxious-evoked activity as a surrogate pain outcome measure in research investigations and clinical trials in neonates aged less than 34 weeks PMA, analogous to the use of a standardized template of noxious-evoked brain activity in term infants (Hartley et al. 2017, 2018). Additionally, we confirm previously reported observations that noxious stimulation evokes immature delta brushes in very premature infants (Fabrizi et al. 2011) and that from ~34 weeks PMA a previously well-characterized pattern of noxious-evoked activity emerges consisting of a positive deflection at ~500 ms following the noxious stimulus (Slater et al. 2010; Hartley et al. 2017). The observed changes in the noxious-evoked brain activity pattern follow a continuous developmental transition, which could potentially underpin differences in emotional and sensory perceptions evoked by the nociceptive input at different developmental ages. In contrast to the other modalities, where discrimination between the noxious- and the nonnoxious procedure is based on magnitude, ERP responses are qualitatively different between noxious and nonnoxious stimulation. While in the absence of verbal report, we do not know how infants perceive these stimuli; importantly, identification of morphologically distinct noxious-evoked brain activity from 30 weeks PMA highlights the possibility of infants differentially processing noxious and nonnoxious stimuli at this age and emphasizes the need for appropriate pain management for all infants.The developmental period studied from 28 to 40 weeks PMA coincides with major changes in the structural maturation of nociceptive pathways. Between 26 and 28 weeks’ gestation, thalamo-cortical fibers innervate somatosensory areas in the cortical plate through the subplate, a transient structure underlying the cortical plate (Kostović and Judaš 2010; Krsnik et al. 2017). Between 31 and 34 weeks’ gestation the cortical plate differentiates into its adult-like configuration with six layers (Kostović and Judaš 2010). From ~35 weeks’ gestation the subplate begins to disappear, giving way to mature thalamo-cortical connections, and inter-hemispheric connections start to form (Kostović and Judaš 2010). Considering these changes, the emergent transitory brain activity may represent early thalamo-cortical signaling whereas the more mature later positive waveform may represent further processing involving inter- and intrahemispheric connections between the cortex and other brain areas. This would be in line with findings in adults where the individual components of noxious-evoked ERPs covary with activity in distinct brain regions (Mobascher et al. 2009), and each mediates different aspects (perceptual, motor, or autonomic) of the pain-related response (Tiemann et al. 2018). An important next step is to investigate the development of the spatial organization of the noxious-related brain activity in preterm infants using techniques such as fMRI, which have successfully been used in term infants (Goksan et al. 2015; Williams et al. 2015; Duff et al. 2020). Importantly, the emergence of remarkably similar patterns of noxious-evoked activity have also been observed in rat pups after sensory fiber stimulation (Chang et al. 2020), facilitating the translation between laboratory and clinical investigations.Besides temporal EEG changes, we identified changes in EEG ERSPs in response to noxious stimulation. In line with previous findings (Fabrizi et al. 2016), we observed an increase in power mostly in the delta, alpha and high beta ranges. Late Delta (1–2 Hz, 1–2 s poststimulus) was an important discriminative feature in our classification model, while the higher frequency ERSP features—high beta (28–30 Hz) and early and late alpha (7–15 Hz)—had lower feature importance. This is in apparent contrast with some investigations in adults, where painful stimulation is reported to be mainly associated with decreases in alpha and beta activity and increases in gamma activity (Tu et al. 2016; Misra et al. 2017; Ploner et al. 2017). Pain-related responses in preterm infants may be of lower frequency than in adults, considering that the preterm EEG generally has a lower frequency content than the adult EEG (Niemarkt et al. 2011), and adult EEG frequency band cutoffs cannot be directly translated to infants (Saby and Marshall 2012). It would be of interest to further investigate the development of noxious-evoked ERSPs in infants at higher postnatal ages. Future work should also include multiple electrodes to investigate the spatial distribution of these responses.The maturational development of the autonomic and behavioral responses was also apparent in this study. The evoked change in heart rate and brow bulge increased in magnitude between 28 and 40 weeks PMA. Both responses matured concomitantly with the appearance of the more mature noxious-evoked brain activity that has been well characterized in term-aged infants. This is in line with previous work which suggested that discrimination in facial expression responses between noxious and nonnoxious stimuli is related to the maturity of the brain activity response (Green et al. 2019). Interestingly, the youngest infants here only display a small increase in heart rate to the noxious stimulus compared with term infants, which is apparent in both independently collected datasets. While a limitation of the study is the relatively small numbers of infants at the youngest ages (12 infants in the Oxford dataset and 7 infants in the UCL dataset were aged less than 31 weeks PMA), this dependence of the magnitude of the heart rate response on PMA highlights the need for developmentally adjusted clinical pain assessment.The magnitude of the noxious-evoked reflex activity did not seem to attenuate with age. However, this apparent discordance with previous work (Cornelissen et al. 2013; Hartley et al. 2016) may be explained by the relatively short time window (1 s) of response that was used for the analysis—it seems that while the initial magnitude of the rise of the reflex is relatively constant with age, the duration of the responses appears visually longer in the younger neonates, consistent with previous reports (Hartley et al. 2016). Future work to refine the model presented here should investigate whether different characteristics of limb withdrawal could be used at different ages to further improve discrimination accuracy.Importantly, the classification and feature importance results derived from the training data generalized to a statistically independent test set. The most important features spanned cardiovascular, cerebral, and limb withdrawal modalities in both the training and testing data, thus confirming the importance of multimodal assessments. Moreover, the classification results and developmental changes in noxious-evoked activity were reproducible in a fully independent dataset collected at a different center by a different team of researchers (Jones et al. 2018b). This is an initial demonstration of the consistency of these techniques and the utility of brain-derived approaches to better understand how painful procedures impact the activity of the central nervous system in developing infants. Further investigation of this external validity in prospective studies at different research sites is key to clinical translation of these findings. Although implementing these techniques in new centers can be challenging (Baxter et al. 2021), the development of a clinically usable tool with simplified EEG acquisition and integration of standard analysis techniques will facilitate the use of these methods for the measurement of noxious-evoked brain activity in research investigations and clinical trials.The goal of the current study was not to develop a clinical tool with the highest achievable classification accuracy but to investigate discrimination across age. While we achieved reasonably high classification accuracy, further investigations into alternative algorithms, feature selection, and hyperparameter optimization could facilitate the creation of a clinically usable tool to measure noxious-evoked cerebral, behavioral, and autonomic activity in infants from 28 weeks PMA that is developmentally sensitive. The time windows in which responses were measured, which were based on established practices in previous publications, could also be further optimized for machine learning purposes. In addition, the developmental trajectories described here could be further modeled to specifically investigate the transitions in multimodal patterns of activity and to determine the normal boundaries for interpreting noxious-related responses at different ages. As this study included only one type of noxious stimulation (a heel lance), future studies should address the similarity of responses evoked by other clinical procedures, such as immunizations or cannulations. In this work, the noxious and nonnoxious stimuli were closely matched in terms of tactile, vibratory, and auditory aspects. To further investigate the specificity of the noxious-evoked responses, it would also be of interest to investigate the discrimination between noxious procedures and other nonnoxious modalities (e.g., visual) that are matched in saliency (Mouraux and Iannetti 2018).A multimodal model like the one described in this work could play a role in clinical research in the NICU. Given a set of recordings from an infant, the model can generate the probability of exposure to a noxious event. This score could be used as an outcome measure in clinical trials to assess the effect of analgesics and ultimately impact clinical practice.In summary, we demonstrated that evoked responses to noxious and nonnoxious stimuli can be discriminated in in both term and preterm infants. We characterized the developmental changes in brain activity, behavioral, and autonomic responses between 28 and 40 weeks PMA, including the transient emergence of a previously uncharacterized pattern of noxious-evoked brain activity observed between ~30 and 33 weeks PMA. While developmental changes with distinct features occur across this age range, responses can be discriminated at every age. A better understanding of the response to noxious stimuli with age will lead to the construction of developmentally sensitive clinical pain measurement tools and more accurate assessment of the impact of analgesic interventions.
Funding
Wellcome Trust via a senior fellowship awarded to Rebeccah Slater (Grant number 207457/Z/17/Z); Bliss (a UK charity) research grant; Prins Bernhard Cultuurfonds Scholarship to M.v.d.V.; Wellcome Trust collaborative award (215573/Z/19/Z funded S.F.) and a senior fellowship (221933/Z/20/Z funded S.F.).
Notes
We would like to thank Laura Jones and Lorenzo Fabrizi for sharing the UCL dataset with us through the UK Data Service. We would like to thank all other members of the Pediatric Neuroimaging group for help with data acquisition. We are grateful to all the families who participated in the studies.Conflict of Interest: None declared.Click here for additional data file.
Authors: Hendrik J Niemarkt; Ward Jennekens; Jaco W Pasman; Titia Katgert; Carola Van Pul; Antonio W D Gavilanes; Boris W Kramer; Luc J Zimmermann; Sidarto Bambang Oetomo; Peter Andriessen Journal: Pediatr Res Date: 2011-11 Impact factor: 3.756
Authors: Sam M Doesburg; Cecil M Chau; Teresa P L Cheung; Alexander Moiseev; Urs Ribary; Anthony T Herdman; Steven P Miller; Ivan L Cepeda; Anne Synnes; Ruth E Grunau Journal: Pain Date: 2013-04-08 Impact factor: 6.961
Authors: Ricardo Carbajal; André Rousset; Claude Danan; Sarah Coquery; Paul Nolent; Sarah Ducrocq; Carole Saizou; Alexandre Lapillonne; Michèle Granier; Philippe Durand; Richard Lenclen; Anne Coursol; Philippe Hubert; Laure de Saint Blanquat; Pierre-Yves Boëlle; Daniel Annequin; Patricia Cimerman; K J S Anand; Gérard Bréart Journal: JAMA Date: 2008-07-02 Impact factor: 56.272
Authors: Lorenzo Fabrizi; Rebeccah Slater; Alan Worley; Judith Meek; Stewart Boyd; Sofia Olhede; Maria Fitzgerald Journal: Curr Biol Date: 2011-09-08 Impact factor: 10.834
Authors: Jordana A Waxman; Rebecca R Pillai Riddell; Paula Tablon; Louis A Schmidt; Angelina Pinhasov Journal: Pain Res Manag Date: 2016-04-20 Impact factor: 3.037