
State transition modeling of complex monitored health data.

Jörn Schulz¹, Jan Terje Kvaløy², Kjersti Engan¹, Trygve Eftestøl¹, Samwel Jatosh³, Hussein Kidanto⁴, Hege Ersdal⁵,⁶.

Abstract

This article considers the analysis of complex monitored health data, where often one or several signals are reflecting the current health status that can be represented by a finite number of states, in addition to a set of covariates. In particular, we consider a novel application of a non-parametric state intensity regression method in order to study time-dependent effects of covariates on the state transition intensities. The method can handle baseline, time varying as well as dynamic covariates. Because of the non-parametric nature, the method can handle different data types and challenges under minimal assumptions. If the signal that is reflecting the current health status is of continuous nature, we propose the application of a weighted median and a hysteresis filter as data pre-processing steps in order to facilitate robust analysis. In intensity regression, covariates can be aggregated by a suitable functional form over a time history window. We propose to study the estimated cumulative regression parameters for different choices of the time history window in order to investigate short- and long-term effects of the given covariates. The proposed framework is discussed and applied to resuscitation data of newborns collected in Tanzania.


Keywords: Aalen's linear model; Aalen-Johansen estimator; monitored health data; multi-state models; state transition intensity

Year: 2019 PMID: 35707576 PMCID: PMC9041820 DOI: 10.1080/02664763.2019.1698523

Source DB: PubMed Journal: J Appl Stat ISSN: 0266-4763 Impact factor: 1.416

Introduction

Monitoring of health data has increased rapidly in recent years. In medicine, it has always been important to monitor patient data for numerous reasons such as documentation, the exploration of causes of diseases or the tracing of treatment effects. In hospitals, the patients' health status is monitored in several situations. Examples are the tight monitoring of patients in an intensive care unit [12,28] or during sedation [11]. A further rapidly growing area comprises mobile monitoring devices like smart textiles or smart watches, made possible by the availability of low-priced sensors [5,7,20], which offer, for example, to track the heart rate (HR) 24 h a day. In particular, this is becoming increasingly important in an aging society [25]. In all these situations, the monitored health data have a common property. They consist of a complex data structure including baseline data (e.g. age and gender) and time-varying signals (e.g. expired lung volume, blood and head pressure) that are often a mixture of categorical and continuous variables. Further, monitored health data usually involve signals recorded by different sensors that are synchronized by the time offset but sample the signals at different time points. Examples of sensors are an electrocardiograph (ECG) sensor, a flow sensor or an acceleration sensor [20,34]. Each time-varying signal is locally dependent (e.g. the HR) and there might be correlation between signals (e.g. between the HR and expired volume). Last but not least, sections of recorded signals can be missing or of low signal quality. All these factors lead to a complex and challenging data structure requiring robust statistical methods. In several medical applications, one is interested in investigating the association between the health status and a set of covariates. This is exactly the targeted application of this manuscript. The health status is often defined by a finite set of states. States can be naturally defined by the data as for example in cardiac arrest [19].
Furthermore, states are frequently defined for continuous signals based on medical reasoning. Examples are the definition of low, medium or high ranges for the blood pressure, the concentration of vitamins and minerals in the blood or the viral load. Although binning of data should generally be avoided, in this context the classification of a continuous signal into a finite set of health states facilitates the medical interpretation and might introduce robustness in the presence of large noise or strong local variations, provided the thresholds are treated suitably, e.g. by using a hysteresis filter as suggested later in Section 3.2.2. The objective of this paper is to propose a framework for monitored health data to gain insights into the relationship between the health status and a set of covariates. Such a framework will facilitate a more comprehensive and descriptive data overview. Due to the observational design of such data and the inherent challenges regarding causality, the aim is not to uncover causal effects. Mixed effect and joint models, described in detail in Rizopoulos [32], might be used to describe the relationship between a set of covariates and being in a given health state. However, both are less suited for finding covariates that are associated with state changes of the health status and for describing how the impact of the covariates changes over time, which is the main focus of this article. Understanding which covariates are associated with state changes is important to aid improvements in treatment. In the statistical literature, substantial effort has been dedicated to multi-state models. A detailed review can be found in Putter et al. [31], Meira-Machado et al. [27] or Meier-Hirmer and Schumacher [26]. These methods are often based on assumptions like constant transition intensities, Markov or semi-Markov properties.
For non-homogeneous or non-Markovian models, the proposed non-parametric estimators of the transition probabilities are often based on a progressive illness-death model. Such models contain no cycles, i.e. the probability of returning to a state i after leaving it is zero. Monitored health data, however, often include such cycles, i.e. back and forth transitions between health states. In addition, the model should incorporate time-dependent covariate effects. This is essential in the targeted application as one would like to know what triggers a change in the health status at different time points in order to improve treatment. Kvaløy et al. [19] and Nordseth et al. [29] suggested the use of a non-parametric intensity regression method [2] for modeling cardiac arrest data including the state history. Gran et al. [16] used a similar model to study sickness absence after work rehabilitation including a set of baseline covariates. Due to its non-parametric nature, state intensity regression can handle data with the described challenges under minimal assumptions, given that the data are treated properly as discussed in detail in Section 3. The proposed intensity regression model does not require Markov or semi-Markov properties. Hidden Markov Models (HMMs) [13,24] and Gaussian Process State Space Models (GP-SSMs) [10,17] are alternative approaches to modeling monitored health data. Both models can also account for most of the above challenges. In addition, they formally incorporate noise within the statistical model. However, their specification is not straightforward. In HMMs, the definition of a concrete parametric model that describes the hidden layer is required. In GP-SSMs, a suitable mean function has to be defined. In addition, both models involve a large number of parameters or hyper-parameters and require training.
The hidden layer or the mean function as well as all the parameters have to be carefully chosen for each application. Thus, we believe that the proposed state intensity regression offers a simpler, more intuitive and more flexible approach with minimal tuning of parameters, while it can handle a wide variety of data. In this article, we generalize the work of Kvaløy et al. [19] and Nordseth et al. [29] to the wide variety of complex monitored health data. In addition, we present a new way of data exploration by using different lengths of time history windows where the covariates are aggregated by a suitable functional form. This allows significant covariates for state changes to be found for a family of time history windows. The idea is inspired by the scale-space approach described in Chaudhuri and Marron [8,9], where every relevant bandwidth of a smoothing Gaussian kernel is investigated in the family of smooth curve estimates in order to find significant curve features. The rest of the article is organized as follows. We begin with a motivating example in Section 2 that involves resuscitation data of newborns. In Section 3, notations are defined, signal processing methods are presented, the proposed statistical model is introduced and model assumptions and data challenges are discussed. In Section 4, the merits of the proposed method are illustrated on resuscitation data of newborns, followed by a discussion in Section 5.

Motivating application

In this section, we introduce an example of monitored data consisting of baseline data and time-varying signals with the main objective to determine time dependent treatment effects that are associated with a change of the health status.

Data and problem description

The data considered here are part of a larger study named Safer Births [1], a research and development collaboration between Tanzanian, Norwegian and international research institutions and Laerdal Global Health that aims to save newborn lives with a focus on low-resource settings. The Safer Births study is ethically approved by the National Institute for Medical Research in Tanzania (Ref.NIMR/HQ/R.8a/Vol.IX/1434) and the Regional Committee for Medical and Health Research Ethics in Norway (Ref.2013/110). From this study, resuscitation data of newborns with absence of spontaneous respirations at birth are investigated. Such newborns need additional care such as stimulation, suctioning or positive pressure ventilation [14]. Baseline data were recorded by a midwife in addition to data collected by a newborn resuscitation monitor from Laerdal Global Health. All midwives were trained to follow the Helping Babies Breathe guidelines [6], which emphasize starting positive pressure ventilation of non-breathing newborns within 1 minute of birth, the so-called ‘golden’ minute [30]. Non-breathing newborns were brought to the resuscitation table and ECG electrodes were applied over the abdomen. The resuscitation included a combination of positive pressure ventilations, stimulations and other activities. These interventions occurred in arbitrary and possibly overlapping order and with nonuniform length. The ECG and the ventilation signals were recorded by the monitor. Recorded ventilation signals include airway pressure and flow. The instantaneous HR, inspired and expired volume, leakage and pressure are among the quantities derived from these signals as described in Vu et al. [34]. In addition, acceleration signals of the HR sensor were recorded. These signals in combination with the ECG signal were utilized to detect stimulation activities automatically by a trained classifier described in more detail in Vu et al. [33,35,36].
In general, there are several challenges connected to this data set including missing values, low HR signal quality, different onsets and lengths of treatments, and strong correlation between some of the variables. Furthermore, this is an observational study, which causes difficulties in the interpretation of causal effects. For example, a longer time of ventilation can be associated with a negative impact on the HR, but this does not mean that more ventilation is harmful; rather, newborns with a critical status are ventilated longer. These challenges occur in various types of monitored health data and are discussed in Section 3.4. The data set was earlier studied by Linde et al. [22] and Vu et al. [18]. In Linde et al. [22], the association between HR and a set of covariates including expired volume (ml/kg) was analyzed by a generalized additive model (GAM). In their analysis, the data were aggregated in the first five ventilation and pause sequences of the observed data. A GAM is well suited to study associations between covariates and an outcome variable and to include non-linear effects but is less suited for finding covariates associated with state transitions, i.e. with a higher or lower health status. In Vu et al. [18], an exploratory tool is suggested to study independently the average of several ventilation parameters for two groups defined by the change in Apgar score. The Apgar score is a measure of the clinical status of a newborn and is manually recorded by the midwives after 1 and 5 min. Both studies give valuable insights for resuscitation of newborns but aggregate the data to a large extent and do not study time-dependent covariate effects. In Section 3, we will introduce a state intensity regression model for this kind of complex resuscitation data that can address the question of which baseline or time-varying covariates are related to an improved or declined health status during resuscitation of newborns at a given time.
Only resuscitation data with ventilation signals were included in the analysis, leading to 1111 cases. The response variable was defined by a weighted median filtered and categorized instantaneous HR using a hysteretic filter as discussed in further detail in Section 3.2. We have categorized the HR into four categories, 0–60, 60–120, 120–180 and >180 bpm, where 120–180 bpm is assumed to be the normal range of the HR of a newborn. The category below 60 bpm is based on resuscitation guidelines [37] and the category 120–180 bpm is based on Linde et al. [23] who studied the heart rate of healthy newborns. Thus, the four defined HR categories are based on the medical literature. Given these HR states, only state transitions between adjacent HR states were observed. This is reasonable as we would expect a possibly quick but always smooth HR change without jumps. Thus, the corresponding state transition graph can be depicted as in Figure 1. In addition, we might include in the model the two absorbing states ‘alive’ and ‘dead’, recorded by the midwives after 30 min at the latest and depicted in gray in Figure 1.

Figure 1.

State transitions during newborn resuscitation between four transient states defined by the HR categories and the two absorbing states alive and dead.

Overview of explanatory variables

The covariates included in our model comprise characteristics that might affect the HR during resuscitation by bag mask ventilation. The covariates are a mixture of baseline covariates and time-varying covariates. All baseline covariates are recorded manually whereas time-varying covariates are derived from recorded ventilation signals as discussed in Section 2.1.

Baseline covariates

Sex: A zero-one covariate defining the gender where 0 indicates female and 1 male.
BW: A continuous covariate that defines the birth weight.
Delivery: A zero-one covariate indicating whether the delivery mode was vaginal or C-section.
First HR: First HR of the weighted median filtered HR signal indicating the health status of the newborn at delivery.
Cord clamp: A continuous covariate that defines the recorded time between birth and cord clamp.
Treatment start: A continuous covariate that defines the recorded time between birth and start of ventilation.

Time-varying covariates

Ventilation: Fraction of time spent with ventilation within each time history window.
Stimulation: Fraction of time spent with stimulation within each time history window.
Expired volume: Sum of the expired volume within each time history window.
Peak pressure: Mean of peak inflation pressure during bag mask ventilation within each time history window.
BMV length: Median time between bag mask ventilations within each time history window, defined by the time between peak pressures.

We suggest in Section 3.3 to investigate a family of time history windows and, based on the results, to select one or two specific windows for a more detailed study of the time-dependent covariate effects. In addition, the proposed framework allows the inclusion of dynamic covariates as described in more detail in Sections 3.3 and 4.

Model and method

In this section, first the general workflow of modeling state transitions given a set of covariates is discussed. Next, data processing methods are presented to facilitate a robust categorization under certain assumptions in case a continuous signal is reflecting the health status. This includes the weighted median to filter continuous monitored signals and the definition of states via a hysteretic filter. Afterwards, the state intensity regression model is introduced including the time history window in which covariates can be aggregated by a suitable functional form. Finally, model assumptions and data challenges are reviewed.

Work-flow

The work-flow of the proposed state intensity regression modeling of health data is summarized in Figure 2. Given L individuals, we assume that monitored health data consists of time-varying covariates and baseline covariates as depicted in the top left and middle panel of Figure 2. Baseline covariates do not change over time and are easy to incorporate. Time-varying covariates require additional care, e.g. due to missingness and unbalanced sampling patterns between and inside each covariate.

Figure 2.

The workflow diagram depicts a general overview of data processing steps that are required for state intensity regression of monitored health data.

As depicted in the bottom left panel of Figure 2, new or additional features might be derived from the recorded time-varying signals by suitable signal processing methods. Examples mentioned in Section 2 are the HR that is derived from an ECG signal, the expired volume that is derived from an air-flow signal and treatment stimulation activities that are derived from an acceleration signal using a trained classifier. The kind of signal processing applied depends on the underlying problem and aims to derive signal features with medical relevance from the recorded signals. As a result of the signal processing, we obtain a set of covariates. For usability, the covariates are uniformly sampled on an equidistant time grid. The definition of states in the top right panel of Figure 2 is described in detail in the next section. Finally, in Section 3.3, the state intensity regression is discussed in which each covariate is described by a suitable summary of the time history of that covariate at each time point t in addition to the baseline covariates.

Data processing

In this section, two data pre-processing methods are discussed, which are particularly suited for the definition of health states from continuous signals. First, we discuss the weighted median as a method for smoothing a continuous signal that reflects the health status in order to remove noise and outliers before categorization. In the motivating example, the data from Safer Births, we only smoothed the HR signal. The weighted median is one option among other smoothing methods like Gaussian processes or kernel methods. It is beyond the scope of this manuscript to compare different smoothing methods. In the targeted application, we found that the weighted median is very robust against outliers and intuitive to apply, with good results. Afterwards, we discuss a robust procedure to categorize a signal.

Weighted median

Monitored health data often contain variables from continuous signals like the HR that satisfy some smoothness assumptions, e.g. the assumption of a smooth HR signal without jumps. However, the recorded signal might contain a lot of noise or outliers, e.g. due to a weak or changing skin contact of the sensor. We suggest using a weighted median filter for the signal, defined as follows. The weighted median for n ordered numbers $x_1 \le \dots \le x_n$ and positive weights $w_1, \dots, w_n$ with $\sum_{i=1}^{n} w_i = 1$ is the element $x_k$ such that

$$\sum_{i=1}^{k-1} w_i \le \frac{1}{2} \quad \text{and} \quad \sum_{i=k+1}^{n} w_i \le \frac{1}{2}.$$

Now, suppose we observe a signal $y(t_i)$, e.g. the HR, at n time points $t_1, \dots, t_n$ with $t_1 < \dots < t_n$. Then, the weighted median $\tilde{y}(t_i)$ is defined as the weighted median of $\kappa + 1$ neighboring signals, where κ is a smoothing parameter that is chosen manually. For simplicity, κ is assumed to be an even number so that $\tilde{y}(t_i)$ is the weighted median of $y(t_{i-\kappa/2}), \dots, y(t_{i+\kappa/2})$. With the application in mind, we suggest modeling the weights by a mixture of a signal quality indicator for each $y(t_j)$ and the distance of each neighboring signal to the center $t_i$. Each weight $w_j$, $j \in J = \{i-\kappa/2, \dots, i+\kappa/2\}$, is thus defined from a weight $w_j^{q}$ for the signal quality and a weight $w_j^{d}$ for the distance. The distance weights are calculated from the time distance to the center and from a parameter ω, the weight for the signal with the largest time distance in the window J. A suitable choice of the smoothing parameter κ is important. Too small a κ will lead to a noisy signal, whereas too large a κ leads to over-smoothing. We have studied different choices of κ in the Supplementary Material with respect to recovering the true signal and with respect to the stability of the state intensity regression. The weighted median filter is less suited for signals with very high frequency as there is an increased risk of over-smoothing, as discussed in the Supplementary Material. Finally, the median filtered signal could be interpolated if signal gaps are not larger than a certain threshold. In this way, missing data can be substituted with interpolated data, leading to a more robust analysis.
Figure 4 displays the weighted median filtered HR of two selected cases from the introduced resuscitation data. In addition, examples of the weighted median filtered HR for different choices of the smoothing parameter κ are depicted in the Supplementary Material.
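The filtering step can be illustrated with a minimal Python sketch. It assumes an equidistant time grid, so that index distance stands in for time distance, and it assumes one particular weight form that is not taken from the article: the signal quality indicator is multiplied by a distance weight that decays linearly from 1 at the center to `omega` at the edge of the window. The function names and the default `omega = 0.1` are illustrative choices only.

```python
def weighted_median(values, weights):
    """Return the element x_k for which the weight mass below x_k and the
    weight mass above x_k are both at most half of the total weight."""
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    if not pairs:
        return None  # nothing usable in the window
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= total / 2:
            return value
    return pairs[-1][0]


def weighted_median_filter(signal, quality, kappa, omega=0.1):
    """Filter `signal` with a window of kappa + 1 samples (kappa even).

    ASSUMPTION (not from the article): weight = quality * distance weight,
    where the distance weight falls linearly from 1 (center) to omega (edge).
    Missing samples are given as None and are skipped.
    """
    half = kappa // 2
    n = len(signal)
    filtered = []
    for i in range(n):
        values, weights = [], []
        for j in range(max(0, i - half), min(n, i + half + 1)):
            if signal[j] is None:
                continue
            if half > 0:
                w_dist = 1.0 - (1.0 - omega) * abs(j - i) / half
            else:
                w_dist = 1.0
            values.append(signal[j])
            weights.append(quality[j] * w_dist)
        filtered.append(weighted_median(values, weights))
    return filtered


# Heart rate with one outlier (250 bpm) at index 2.
hr = [100, 102, 250, 101, 99, 100, 98]
good = [1.0] * len(hr)
smoothed = weighted_median_filter(hr, good, kappa=4)
```

With full signal quality everywhere, the outlier at index 2 is replaced by a neighboring value; setting its quality to 0 removes it from the window altogether.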

Figure 4.

Continuous and categorized HR of two selected individuals. In (a), the instantaneous HR colored by the signal quality indicator is depicted and (b) shows the weighted median filtered HR. Next, in (c), the interpolated median filtered HR is visualized and finally in (d) the categorized HR of both individuals in addition to the two absorbing states alive and dead is presented (Color figure online).

Definition of states

In several applications, there are natural states given by the data. Examples are the medical health status of a patient or the different states during resuscitation of cardiac arrest [19]. However, in certain situations, it might be of interest to derive the health status from continuous signals, e.g. the HR, motivated by established ranges based on medical reasons like very low, low, normal and high HR [23,37]. In practice, these categories or thresholds should be important, interpretable and of medical interest. If thresholds are not based on established medical reasons and are rather arbitrary, a different modeling approach might be more suitable, or at least a robustness analysis that can validate these thresholds is necessary. For categorization of a continuous signal, we suggest first applying a weighted median or an alternative smoothing method as described above in order to smooth the signal and to remove outliers. In the weighted median, the smoothing parameter κ has to be chosen carefully as too little smoothing can result in undesirable state transitions induced by the noise, whereas over-smoothing might remove real state transitions between the health states. This is discussed in more detail in the Supplementary Material. Given a reasonably filtered signal and certain thresholds, the categorization can be performed using a hysteretic filter. The principle is known from threshold models, e.g. in engineering and economics, and was for example applied to time series models by Li et al. [21]. The application of a hysteretic filter leads to more robustness as it avoids a high number of state changes when the signal is close to the threshold. In the following, we assume a continuous signal $y(t)$ sampled at n time points $t_1 < \dots < t_n$. For simplicity, we consider first the case of a single threshold τ and two states $\{1, 2\}$ that transform the process $y(t)$ into a process $z(t)$ by $z(t_i) = 1$ if $y(t_i) \le \tau$ and $z(t_i) = 2$ otherwise. Now, we define a hysteresis region $[\tau^{-}, \tau^{+}]$ with $\tau^{-} < \tau < \tau^{+}$, e.g. $\tau^{\pm} = \tau \pm \delta$ with $\delta > 0$. Given the hysteresis region, the process $z(t)$ is defined by

$$z(t_i) = \begin{cases} 1 & \text{if } y(t_i) < \tau^{-}, \\ 2 & \text{if } y(t_i) > \tau^{+}, \\ z(t_{i-1}) & \text{otherwise,} \end{cases}$$

i.e. the process $z(t)$ remains unchanged as long as $y(t)$ falls within the hysteresis region. The principle is visualized in Figure 3. If the first recorded signals are inside the hysteresis region, then we selected the first signal $y(t_{i^*})$ that is outside of the hysteresis region and applied the hysteresis filter in a backward recursive fashion starting with $z(t_{i^*})$.

Figure 3.

Schematic visualization of two states defined by (a) a simple threshold τ and (b) a hysteresis region.

The extension to m>1 thresholds with $\tau_1 < \dots < \tau_m$ is straightforward under the assumption of non-overlapping hysteresis regions. Two examples of categorized signals can be found in Figure 4.
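The two-state hysteresis categorization described in this section can be sketched in a few lines of Python. The function name, the argument names `lower` and `upper` for the bounds of the hysteresis region and the state labels 0 (below) and 1 (above) are illustrative assumptions; the sketch also assumes a filtered signal without missing samples.

```python
def hysteresis_categorize(signal, lower, upper):
    """Categorize a signal into two states using a hysteresis region.

    The state becomes 0 when the signal falls below `lower`, becomes 1 when
    it rises above `upper`, and otherwise keeps its previous value.
    Leading samples inside the region inherit the state of the first sample
    outside the region (backward recursion).
    """
    n = len(signal)
    states = [None] * n
    # Index of the first sample that lies outside the hysteresis region.
    start = next(
        (i for i, y in enumerate(signal) if y < lower or y > upper), None
    )
    if start is None:
        return states  # the signal never leaves the region: undetermined
    states[start] = 0 if signal[start] < lower else 1
    for i in range(start + 1, n):  # forward recursion
        y = signal[i]
        if y < lower:
            states[i] = 0
        elif y > upper:
            states[i] = 1
        else:
            states[i] = states[i - 1]
    for i in range(start - 1, -1, -1):  # backward recursion
        states[i] = states[i + 1]
    return states


def count_transitions(states):
    """Number of state changes between consecutive samples."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


# Heart rate fluctuating around the threshold of 120 bpm.
hr = [118, 121, 119, 123, 126, 122, 118, 114, 117]
plain = [0 if y <= 120 else 1 for y in hr]            # simple threshold
robust = hysteresis_categorize(hr, lower=115, upper=125)
```

In this toy series the simple threshold produces four state changes, while the hysteresis region of 120 ± 5 bpm leaves a single change, which is the robustness gain the text refers to.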

State transition

In this section, first the non-parametric additive intensity regression model as suggested by Aalen [2] and Aalen et al. [4] is introduced that associates the conditional intensity of a state transition at a given time point with a set of time-varying covariates. Afterwards, the time history window and dynamic covariates will be discussed in more detail.

The state intensity regression model

Given a finite number of states $\mathcal{S} = \{1, \dots, M\}$, we denote a transition from state i to state j by $i \to j$ with $i, j \in \mathcal{S}$, $i \ne j$. In order to analyze the event history of monitored health data given a set of covariates, we denote by $\lambda_l^{ij}(t)$ the conditional intensity of a transition $i \to j$ for an individual l at time t as a function of the past, with $l = 1, \dots, L$. Then, the conditional intensity process given the history up to time t can be written as

$$\lambda_l^{ij}(t) = \beta_0^{ij}(t) + \beta_1^{ij}(t)\, x_{l1}(t) + \dots + \beta_K^{ij}(t)\, x_{lK}(t), \qquad (1)$$

where $x_{lk}(t)$, $k = 1, \dots, K$, are the covariates that might influence the intensity including baseline and time-varying covariates, e.g. age of an individual or lung pressure. Furthermore, $\beta_k^{ij}(t)$ are the regression parameters indicating the effect of the covariates, where $\beta_0^{ij}(t)$ is the baseline intensity. Notice that these regression parameters are allowed to vary non-linearly over time. Now, given L individuals and a transition $i \to j$, the process (1) can be written as

$$\boldsymbol{\lambda}^{ij}(t) = X(t)\, \boldsymbol{\beta}^{ij}(t), \qquad (2)$$

with $\boldsymbol{\lambda}^{ij}(t) = (\lambda_1^{ij}(t), \dots, \lambda_L^{ij}(t))^\top$, $\boldsymbol{\beta}^{ij}(t) = (\beta_0^{ij}(t), \dots, \beta_K^{ij}(t))^\top$, and $X(t)$ is an $L \times (K+1)$-matrix whose l-th row is $R_l(t)\,(1, x_{l1}(t), \dots, x_{lK}(t))$ with $R_l(t) = 1$ if an individual l is at risk at time t and $R_l(t) = 0$ otherwise, $l = 1, \dots, L$. The cumulative or integrated regression parameters $B_k^{ij}(t) = \int_0^t \beta_k^{ij}(s)\, ds$ with $k = 0, \dots, K$ can be estimated [2,4] by

$$\hat{\mathbf{B}}^{ij}(t) = \int_0^t X^{-}(s)\, d\mathbf{N}^{ij}(s),$$

where $X^{-}(s)$ denotes a generalized inverse of $X(s)$ and $\mathbf{N}^{ij}(t) = (N_1^{ij}(t), \dots, N_L^{ij}(t))^\top$ with $N_l^{ij}(t)$ the counting process of state changes from state i to state j until time t for an individual l. For the analysis of monitored health data, we suggest centering all covariates by $x_{lk}(t) - \bar{x}_k(t)$ with $\bar{x}_k(t)$ the average of the k-th covariate at time point t. Thereby the cumulative baseline intensity $B_0^{ij}(t)$ can be interpreted as the cumulative intensity of an average individual, which is identical to the Nelson-Aalen estimator [4]. The estimated integrated regression parameters can be depicted by plots together with confidence intervals. The estimated regression parameters are reflected by the slope in the corresponding plots. A positive slope means that the intensity is increasing as the covariate increases, and a negative slope means that the intensity is decreasing as the covariate increases. In general, $\beta_k^{ij}(t)$ can be understood as the change in intensity for one unit change of the covariate at time t. A significance test for the covariate effects is also given in Aalen [2].
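To make the estimator of the cumulative regression parameters more concrete, the following Python sketch computes it for a single transition type as a sum of least-squares increments, one per observed transition time. It is an illustration under stated assumptions, not the authors' implementation: the ordinary inverse of the normal equations stands in for the generalized inverse (so collinear covariates raise an error), covariates are centered among the individuals at risk, and all function names and the input layout are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("singular design: covariates are collinear")
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def aalen_increment(covariates, at_risk, event):
    """Increment (X'X)^{-1} X' dN at one transition time.

    covariates: one list of K covariate values per individual
    at_risk:    one boolean per individual (currently in the origin state)
    event:      index of the individual making the transition
    """
    risk = [l for l, r in enumerate(at_risk) if r]
    k_cov = len(covariates[0])
    # ASSUMPTION: center each covariate by its mean among those at risk.
    means = [sum(covariates[l][k] for l in risk) / len(risk) for k in range(k_cov)]
    rows = [[1.0] + [covariates[l][k] - means[k] for k in range(k_cov)] for l in risk]
    dn = [1.0 if l == event else 0.0 for l in risk]
    q = k_cov + 1
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(q)] for a in range(q)]
    xtdn = [sum(r[a] * y for r, y in zip(rows, dn)) for a in range(q)]
    return solve(xtx, xtdn)


def cumulative_coefficients(events):
    """Accumulate increments over (time, covariates, at_risk, event) tuples."""
    cumulative, path = None, []
    for time, covariates, at_risk, event in events:
        inc = aalen_increment(covariates, at_risk, event)
        cumulative = inc if cumulative is None else [c + d for c, d in zip(cumulative, inc)]
        path.append((time, cumulative))
    return path


cov = [[0.0], [1.0], [2.0], [3.0]]  # one covariate, four individuals
events = [
    (1.0, cov, [True, True, True, True], 3),   # individual 3 changes state
    (2.0, cov, [True, True, True, False], 0),  # then individual 0
]
path = cumulative_coefficients(events)
```

Because the covariate is centered, the baseline increment at each transition time equals one over the number at risk (1/4, then 1/3), which matches the Nelson-Aalen increments mentioned in the text.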

Time history window

Suppose we have K covariates $x_{l1}(t), \dots, x_{lK}(t)$ for each individual $l = 1, \dots, L$. Often, the interest is to describe the effect of a certain time history of each covariate on the intensity process (2) at time t. For this reason, we define for each covariate a suitable functional form $f_k$ that describes the time history of the corresponding covariate at time t by a summary $\tilde{x}_{lk}(t)$ of the values $x_{lk}(s)$, $s \le t$, where a weight function $w_h(s, t)$ defines the time history window. An example is $w_h(s, t) = 1$ if $t - h \le s \le t$ and $w_h(s, t) = 0$ otherwise, given a history length $h > 0$. The appropriate choice of the functional form depends on the corresponding covariate and the underlying research question. Simple examples are the mean or the cumulative value of the covariate, e.g. $\int_{t-h}^{t} x_{lk}(s)\, ds$. There are no limitations to using different time history lengths h for different covariates in the state intensity regression model (2). The choice of the time history window has an effect on the association of $\tilde{x}_{lk}(t)$ with the state intensity $\lambda_l^{ij}(t)$. Thus we suggest investigating the model (2) for a larger family of time history windows and examining the effects in more detail for one or two selected time history windows. In Section 4.1, a graphical visualization is presented for studying the intensity regression model (2) for different choices of h. In addition, medical criteria might guide the choice of the time history window. Notice that missing values have to be treated carefully by the functional form in order to avoid biasing the results. An example is the expired volume and the airway pressure from the resuscitation data introduced in Section 2. The expired volume could be aggregated by a mean in $[t-h, t]$. If we have two individuals, one with one expired volume record of size γ in $[t-h, t]$ and the other individual with two records of size γ in $[t-h, t]$, then the mean would be γ for both individuals. Therefore, the sum or the average by the time length h is more reasonable in this situation. In case of pressure, we might indeed be interested in the mean peak pressure.
Thus, the suitable choice of the functional form depends on the medical meaning and relation of the covariate with the outcome and the influence of missingness in this process.
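The effect of the functional form on records with missing values can be reproduced with a short Python sketch. It assumes a rectangular time history window covering the h time units up to and including t, represents missing records by `None`, and uses hypothetical function names; the two series mirror the expired volume example with one and two records of size γ = 5.

```python
def window_values(times, values, t, h):
    """Observed (non-missing) values whose time stamp lies in [t - h, t]."""
    return [v for s, v in zip(times, values) if t - h <= s <= t and v is not None]


def window_mean(times, values, t, h):
    """Mean of the observed values in the window, or None if there are none."""
    vals = window_values(times, values, t, h)
    return sum(vals) / len(vals) if vals else None


def window_sum(times, values, t, h):
    """Sum of the observed values in the window."""
    return sum(window_values(times, values, t, h))


def window_rate(times, values, t, h):
    """Sum of the observed values divided by the window length h."""
    return window_sum(times, values, t, h) / h


times = [1.0, 2.0, 3.0]
one_record = [None, 5.0, None]   # one expired volume record of size 5
two_records = [5.0, None, 5.0]   # two records of size 5

mean_a = window_mean(times, one_record, t=3.0, h=2.0)
mean_b = window_mean(times, two_records, t=3.0, h=2.0)
sum_a = window_sum(times, one_record, t=3.0, h=2.0)
sum_b = window_sum(times, two_records, t=3.0, h=2.0)
```

The mean cannot distinguish the two individuals (both give 5.0), whereas the sum (5.0 versus 10.0) and the average over the window length retain the difference in the amount of expired volume.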

Dynamic covariates

In modeling the state transitions of monitored health data, the inclusion of dynamic covariates is often of interest as they describe the development of the event history. The development of the event history might be important for the further development of the intensity process. Examples of dynamic covariates are the number of previous state transitions or the time spent in the current state since the last transition. As discussed in Aalen et al. [4], dynamic covariates can be incorporated in the non-parametric intensity regression model (2). Because dynamic covariates contain parts of the event history, by conditioning on them we potentially lose important relationships between the state intensities and the baseline and time-varying covariates. Thus, dynamic covariates cannot simply be added to the model (2). To prevent them from ‘stealing’ the effect of the covariates, dynamic covariates are orthogonalized with respect to the other covariates [15]. As (2) is an additive regression model, this can be done by a Gram-Schmidt orthogonalization where each dynamic covariate is successively replaced by its residuals from a least squares regression on the other explanatory variables.
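The orthogonalization step can be sketched in Python as follows. The sketch replaces one dynamic covariate, observed across individuals at a single time point, by its residuals from a least squares regression on an intercept and the other covariates, computed through Gram-Schmidt projections. The function names, the inclusion of an intercept and the toy numbers are assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def residualize(target, regressors):
    """Residuals of `target` after least squares regression on an intercept
    and the columns in `regressors`, obtained by Gram-Schmidt projections."""
    n = len(target)
    basis = []
    for column in [[1.0] * n] + regressors:
        v = column[:]
        for b in basis:  # remove components along earlier basis vectors
            c = dot(v, b) / dot(b, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        if dot(v, v) > 1e-12:  # skip columns that add no new direction
            basis.append(v)
    residual = target[:]
    for b in basis:  # the basis is orthogonal, so projections can be removed in turn
        c = dot(residual, b) / dot(b, b)
        residual = [ri - c * bi for ri, bi in zip(residual, b)]
    return residual


# One ordinary covariate and one dynamic covariate for four individuals.
x = [0.0, 1.0, 2.0, 3.0]
previous_transitions = [1.0, 3.0, 5.0, 8.0]
orthogonalized = residualize(previous_transitions, [x])
```

The residuals sum to zero and are uncorrelated with `x`, so the orthogonalized dynamic covariate can only carry information that is not already explained by the other covariate.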

Assumptions and challenges

Because state intensity regression is a non-parametric method, it can incorporate a wide variety of data under minimal assumptions, including baseline and time-varying data as well as binary and continuous data. The state intensity process (2) is modeled similarly to ordinary linear regression. The main assumptions are additivity, no or little multicollinearity, no unexplained heterogeneity and independence between the intensity processes. In contrast to ordinary linear regression models, there is no assumption about multivariate normality in state intensity regression due to the non-parametric nature of the method.

Additivity: In (2), it is assumed that the relationship between the state intensities and the covariates is additive at each time point, i.e. there is a linear relationship. However, the regression parameters are dependent on time t and are estimated non-parametrically. Therefore, the regression parameters as well as the state intensities can have a non-linear structure over time. Thus, the model is very flexible and suitable for monitored health data.

Multicollinearity: As in ordinary linear regression, strong multicollinearity will cause problems for state intensity regression. If covariates are too highly correlated with each other, we suggest removing covariates. For example, in Section 2.2, we have removed the additional covariate inspired volume from the list because of its strong correlation with expired volume.

Heterogeneity: The basic model assumes no frailty, i.e. there is no unexplained variation between the baseline intensities of individuals. However, when dynamic covariates are added to the model, this is equivalent to allowing frailty between individuals [3, Chapter 8]. Yet, we still need to assume that there is no further heterogeneity which is not already accounted for through the measured dynamic covariates.

Independence: Intensity regression requires independence between processes, i.e. between individuals.

Missing values within each process

In general, missing values affect the intensity regression process. We have to distinguish between two cases. For simplicity, we assume that the health status and covariates are measured on an equidistant time grid. First, if a baseline covariate is missing, the individual is excluded from the analysis. Second, the health status and time-varying covariates can be missing because of (i) no measurements from the start of the monitoring, (ii) no measurements in an interval during the monitoring and (iii) no further measurements after a time point before the end of the monitoring. The described missingness can be understood as interval or right censoring, which is assumed to be independent. The independent censoring assumption for the state intensity process (2) is essentially the same as sequential missingness at random [3, Chapter 2]. If the missingness is informative, i.e. the censoring is not independent, the estimated state intensity process is biased because this information is not captured by the model.

Individuals with missing covariates or health status are excluded only during the period of missingness and are included again as soon as all information is available. However, in Section 3.3, we have seen that the intensity process at a time point is usually not modeled by the exact covariate values at this time point, but rather by a summary of the history of these covariates up to this time point. Therefore, values missing at individual time points might be replaced by information available in the time history window, which makes the analysis more robust. It is assumed that missingness within an individual's record corresponds to independent censoring. For the introduced resuscitation data, this can be justified by the data collection process and the underlying guidelines as described in Section 2.
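The idea that a time history window can bridge isolated missing values can be sketched as follows. The function `history_window_mean` is hypothetical, and the mean is just one possible functional form; the article leaves the choice of functional form to the application.

```python
import numpy as np

def history_window_mean(values, h_steps):
    """Mean over the last `h_steps` grid points up to and including each point.

    Missing samples (NaN) are ignored. The result is NaN only where every
    sample in the window is missing, in which case the individual would be
    treated as censored at that time point.
    """
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape, np.nan)
    for i in range(values.size):
        window = values[max(0, i - h_steps + 1): i + 1]
        observed = window[~np.isnan(window)]
        if observed.size > 0:
            out[i] = observed.mean()
    return out
```

With a window of two grid points, a single missing sample is covered by its predecessor, whereas a gap longer than the window still yields a missing summary.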

Different onsets and lengths of records

The onset and length of the recorded monitored health data usually differ between individuals. Thus, the data have to be aligned according to the medical problem. The presented resuscitation data were aligned by the onset of birth for each individual. However, the lengths of the records might carry information, e.g. about the severity of the resuscitations in the introduced data. Thus, a careful interpretation of the results is important, in particular for observational studies.

Outliers and signal quality

Outliers and weak signal quality can influence the state intensity regression process (2). Therefore, in Section 3.2, we proposed a weighted median filter as a method to remove outliers and to smooth continuous signals, which is particularly suited as a pre-processing step for the categorization of health states from continuous signals. The signal quality can be incorporated into the weights. Similar to missing values above, potential outliers and noise in the time-varying covariates might be reduced by the functional form of the time history window.
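A weighted median filter with quality-based weights can be sketched as below. This is a simplified sliding-window version for illustration and not necessarily the filter defined in Section 3.2. The names `weighted_median` and `weighted_median_filter` and the parameter `half_width` are assumptions. Samples with weight 0 (e.g. bad signal quality) are ignored.

```python
import numpy as np

def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]

def weighted_median_filter(signal, quality_weights, half_width):
    """Sliding weighted median over 2 * half_width + 1 neighboring samples."""
    signal = np.asarray(signal, dtype=float)
    weights = np.asarray(quality_weights, dtype=float)
    out = np.full(signal.shape, np.nan)
    for i in range(signal.size):
        lo, hi = max(0, i - half_width), min(signal.size, i + half_width + 1)
        v, w = signal[lo:hi], weights[lo:hi]
        # Ignore missing samples and samples with zero weight.
        keep = (w > 0) & ~np.isnan(v)
        if keep.any():
            out[i] = weighted_median(v[keep], w[keep])
    return out
```

For example, a single spike in an otherwise smooth HR signal is replaced by a neighboring value, and setting its weight to 0 removes it from the window altogether.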

Sampling

For each individual, the continuous covariates might be recorded at different time points due to different sensors. Since the covariates at each time point are described by their history, uneven sampling patterns can be handled by the state intensity regression.

Analysis of newborn resuscitation data

In order to illustrate the application of the proposed framework to the analysis of monitored health data, this section presents findings from the resuscitation data of newborns introduced in Section 2. Our objective is to investigate potential treatment factors during resuscitation that are associated with an improvement of the newborn HR. The analysis includes 1111 individuals, namely those with an HR signal of at least 40 sampling points, any activity signal and a recorded time between birth and start of treatment. Furthermore, two patients with a delivery mode other than vaginal or C-section were excluded because of the small group size. The data were transformed onto an equidistant time grid with a spacing of 0.25 s in order to facilitate the data set-up. As explained in Section 2, based on medical reasoning the state space was defined by the four HR categories 0–60, 60–120, 120–180 and >180 bpm, where 120–180 bpm is assumed to be the range of a normal newborn HR. In addition, we included the two absorbing states alive and dead defined by the 30 min outcome. The transition into the absorbing state was defined to occur after the end of monitoring, when no further information was available. The corresponding state transition graph is depicted in Figure 1.

The HR was categorized in three steps. First, a weighted median filter was applied to the instantaneous HR signal as described in Section 3.2.1 with . Records with very good signal quality have double weight compared to normal quality signals as they are more certain, and bad quality signals have weight 0. Second, the median filtered signal was interpolated if signal gaps were not larger than a threshold of 10 s. Third, a hysteresis filter was applied to the filtered signal as described in Section 3.2.2 with bpm. The process is shown in Figure 4 for two selected individuals.

In Figure 5(a), the number of individuals in each state, i.e. in each HR category, at a time point t is depicted. In the very beginning, the proportion of individuals in the state 60–120 is highest. However, after around 1 min the proportion of individuals in the state 120–180 is higher and reaches its maximum around 4 min. Only very few individuals are in the state 0–60. The frequency for all states flattens out after a certain time because each individual moves into an absorbing state at some time point. If there were no missingness and no absorbing states, the four state frequencies would sum to 1111 at each time point t. Furthermore, we can observe that the peaks of the four curves are in sequential order, which indicates a positive treatment effect through the transition of individuals into a higher HR category over time.
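The third categorization step, a hysteresis filter, can be sketched as follows. This is a generic illustration and not the exact filter of Section 3.2.2. The function name `hysteresis_categorize` and the margin `delta=5.0` bpm are hypothetical, and the sketch assumes a gap-free, already smoothed HR signal. Category indices 0 to 3 stand for 0–60, 60–120, 120–180 and >180 bpm.

```python
import numpy as np

def hysteresis_categorize(hr, boundaries=(60.0, 120.0, 180.0), delta=5.0):
    """Categorize HR samples, changing state only beyond a margin `delta`.

    A new category is entered only when the signal passes a boundary of the
    current category by more than `delta`, which suppresses spurious
    transitions caused by small fluctuations around a boundary.
    """
    hr = np.asarray(hr, dtype=float)
    bounds = np.asarray(boundaries, dtype=float)
    states = np.empty(hr.size, dtype=int)
    # Start in the plain (margin-free) category of the first sample.
    state = int(np.searchsorted(bounds, hr[0], side="right"))
    for i, x in enumerate(hr):
        # Move up while the upper boundary is exceeded by more than delta.
        while state < bounds.size and x > bounds[state] + delta:
            state += 1
        # Move down while the lower boundary is undercut by more than delta.
        while state > 0 and x < bounds[state - 1] - delta:
            state -= 1
        states[i] = state
    return states
```

For a signal that hovers around 120 bpm, a plain threshold would register a transition at every crossing, whereas the hysteresis version changes state only after a clear move away from the boundary.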

Figure 5.

(a) Number of individuals in each HR category at time t not including the absorbing states and (b) cumulative number of state transitions until time t (Color figure online).

In Figure 5(b), the cumulative number of state transitions until time t is visualized. In general, most transitions are from 60–120 to 120–180, which corresponds to the higher number of individuals in the category 120–180 in Figure 5(a). Moreover, the number of state transitions flattens out after circa 8 min because an increasing fraction of individuals finish the resuscitation treatment and move into an absorbing state. Further, we observe a positive treatment progression, as the upward transitions, i.e. the transitions into a higher HR category, are always more frequent than the downward transitions.

Intensity regression model

To study the intensity regression model, we applied a modified version of the R-package Addreg [4]. We included 15 covariates in the intensity regression model (2), composed of 6 baseline covariates (sex, BW, delivery, first HR, cord clamp, treatment start) and 5 time-varying covariates (expired volume, peak pressure, BMV length, ventilation (), stimulation ()). The corresponding covariates were described in detail in Section 2.2. In addition, we included 4 dynamic covariates:

HR dispersion (in bpm): Mean of the distance between the weighted median filtered HR signal and the instantaneous HR within each time history window.

HR slope (in bpm): Mean of the HR slope within each time history window, where the slope is defined by a simple linear model inside a sliding window of 30 neighboring HR records.

Passed ST (in number): Number of previous State Transitions (ST) since the start of records.

Time since LST (in sec): Time spent in the current state, i.e. time since the Last State Transition (LST).

The dynamic covariates were orthogonalized with respect to the other covariates as described in Section 3.3.3. All five time-varying covariates and the two dynamic time-varying covariates HR dispersion and HR slope were aggregated in the time history window , whereas the dynamic covariate passed ST was aggregated in . The definition and inclusion of the baseline and time-varying covariates as well as the functional form of each time history window depend on the application and have to be adapted correspondingly. The integrated regression parameters (3) are estimated in a chosen time interval. From Figure 5(a), it can be seen that most of the transitions occur between 1 and 8 min, depicted by the two gray dashed vertical lines. Thus, we selected a starting time of 1 min and a stop time of 8 min for the intensity process. 
For reasons of space and clarity, we present only results for the two upward transitions and in Figure 6 and elaborate in further detail the state transition 2 for a selected time history.
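Two of the dynamic covariates listed above, passed ST and time since LST, can be derived directly from a categorized state sequence. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the grid spacing `dt=0.25` s follows the equidistant grid used in the article, and before the first transition the time is counted from the start of the record.

```python
import numpy as np

def passed_st_and_time_since_lst(states, dt=0.25):
    """Compute two dynamic covariates from a state sequence on a regular grid.

    Returns
    passed_st      : number of state transitions up to each grid point.
    time_since_lst : seconds since the last transition (or record start).
    """
    states = np.asarray(states)
    # True at grid points where the state differs from the previous one.
    changed = np.concatenate([[False], states[1:] != states[:-1]])
    passed_st = np.cumsum(changed)
    idx = np.arange(states.size)
    # Index of the most recent transition (0 if none has occurred yet).
    last_change = np.maximum.accumulate(np.where(changed, idx, 0))
    time_since_lst = (idx - last_change) * dt
    return passed_st, time_since_lst
```

Both outputs depend only on the individual's own event history, which is why they are orthogonalized before entering the model.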

Figure 6.

Unadjusted univariate models for state transitions (a) and (c) , and final multiple models for state transitions (b) and (d) given different time history lengths h. Univariate model: Each covariate is colored by its P-value. Final model: Included covariates are marked by colored dots according to their P-value. Excluded covariates are marked by small black circles (Color figure online).

In Figure 6, P-values for each covariate for the two upward transitions and different time history lengths h are presented. The left column shows results obtained from univariate models, i.e. only one covariate in addition to the intercept is included in the model (2). The P-values are colored (color figure online) from red (0) through white (0.05) to blue (1), where white corresponds to the chosen significance level of 0.05. The right column shows the final multiple model for the two upward transitions and different time history lengths h. Covariates were excluded by an iterative backward procedure based on the largest P-value. Significant covariates are depicted by red (0) to white (0.05) colored dots. Covariates that were excluded from the model are depicted by small black circles. For the state transition , the covariates first HR, BMV length and ventilation are statistically significant for most of the time history lengths h. From the univariate model, we can see that for expired volume, BMV length, ventilation and also for stimulation a certain time history length is necessary in order to reach significance. This suggests that the effect of resuscitation on newborns with a low HR of 0–60 might take longer than on newborns within 60–120. The expired volume and partially peak pressure and stimulation are not significant in the final model, in contrast to the univariate model. This can be explained by the correlation of the related parameters expired volume, peak pressure, BMV length, ventilation and stimulation during bag-mask ventilation. 
For the second state transition , the covariates expired volume and ventilation tend to be significant in a short time history window, whereas stimulation tends to be significant for larger h in Figure 6(c). This suggests that stimulation is an important factor on a longer time scale, whereas the other two covariates have a more immediate effect on the transition intensity that is smoothed out over time. In all final models, first HR is significant for all time scales. In Table 1 and Figure 7, more detailed results for the transition are presented given a time history length of h = 60 s. The P-values in Table 1 correspond to the colored points of the final model at 60 s in Figure 6(d). The estimated integrated regression parameters (3) are depicted together with confidence intervals in Figure 7. The estimated regression parameters are reflected by the slope in the corresponding plots. As argued in Kvaløy et al. [19], an average effect size may be calculated if the estimated regression parameter is fairly constant over time. This could be done for BMV length, ventilation, Time to LST and maybe sex, where we observe a constant positive effect over time. BW has a strong positive effect between ca. 2.5 and 3.5 min, and peak pressure has a positive effect in the first 4.5 min, after which the effect flattens out. Moreover, expired volume has a steep positive slope in the first 2.5 min. This means that the impact of expired volume on the state transition is higher in the beginning of the resuscitation than at a later time point. This is an important medical observation and highlights the need for, and benefit of, a model that allows for time-dependent effects.
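Because the regression parameter corresponds to the slope of the integrated regression parameter, one simple way to summarize a roughly constant effect is the slope of a least-squares line fitted to the cumulative estimate over a chosen interval. The sketch below is an assumption for illustration and not necessarily the procedure used in [19]. The function `average_effect` and its arguments are hypothetical.

```python
import numpy as np

def average_effect(times, cumulative_coef, t_start, t_stop):
    """Least-squares slope of a cumulative regression function on an interval.

    times           : time points at which the cumulative estimate is given.
    cumulative_coef : estimated integrated regression parameter at `times`.
    t_start, t_stop : interval over which the effect is assumed constant.
    """
    times = np.asarray(times, dtype=float)
    cumulative_coef = np.asarray(cumulative_coef, dtype=float)
    inside = (times >= t_start) & (times <= t_stop)
    # Degree-1 polynomial fit; the leading coefficient is the slope.
    slope, _intercept = np.polyfit(times[inside], cumulative_coef[inside], deg=1)
    return slope
```

Applied separately to an early and a late interval, the same helper would show a steep early slope followed by a flat one for a covariate whose effect fades over time.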

Table 1.

Summarized results for the state transition given a time history length of 60 s.

Covariate	Coefficient	95% CI lower	95% CI upper	P-value
Constant	9.199	8.165	10.233	0.000
Sex (male)	1.534	0.230	2.839	0.042
BW	0.001	0.000	0.002	0.002
First HR	0.135	0.105	0.164	0.000
Expired volume	0.016	0.009	0.022	0.000
Peak pressure	0.032	−0.022	0.086	0.009
BMV length	4.705	2.924	6.487	0.000
Ventilation (%)	−8.945	−12.601	−5.288	0.000
Time to LST	−0.031	−0.047	−0.016	0.000

Figure 7.

Plots of estimated integrated regression parameters with approximated confidence intervals (dashed lines) for the state transition given a time history length of 60 s.

The estimated cumulative regression parameters are fairly stable for different choices of the smoothing parameter κ for the weighted median, as discussed in detail in Section 1.4 of the Supplementary Material.

Discussion and future work

This article proposed an innovative application of intensity regression to monitored health data. In general, state intensity models can be applied in situations where the interest is to study the effect of the time history of a set of variables on state transitions. This is a common question in medicine. Because state intensity regression is a non-parametric method, there are no distributional assumptions and the model can be adapted to a wide range of data types. Further, state intensity regression allows for time-varying covariate effects and enables the inclusion of dynamic covariates. Thus, we believe that the proposed framework will support the utilization of intensity regression models in this field. In addition, we presented a novel view by studying multiple scales of the time history windows. This can help to visualize and distinguish between short- and long-term effects.

The presented results are based on the analysis window of 1–8 min, i.e. only data inside this interval were included in the intensity regression model. Similarly to the exploration of effects for different time history lengths, a set of different analysis windows could be explored. This can lead to further understanding of the impact of covariates on the state transition intensity. The proposed model (2) can analyze the association between given state intensities and baseline, time-varying and dynamic covariates. The model is not suited for making predictions. Alternative models have to be considered if the main interest is in prediction.

Further, if the health state signal is continuous and has to be categorized, noise and outliers should be removed before categorization because both have a direct influence on the number of state transitions. In this article, we have proposed the use of a weighted median, where a smoothing parameter has to be chosen manually by visual inspection because the true underlying signal is usually unknown. Automatically selected adaptive hysteresis regions could further increase the usability of the procedure when the health status is reflected by a continuous signal. The implementation of such adaptive regions is left for future work.

The application to resuscitation data of newborns emphasizes the potential of the framework for studying monitored health data. However, a principal limitation of this as well as other analysis approaches is the observational design of the study. For example, there are selection effects involved in the study, as difficult cases are likely to be resuscitated longer. Therefore, results have to be interpreted carefully in terms of causality. The approach can be extended to a multivariate state model [4], e.g. for modeling monitored health data where more than one variable describes the health status.

19 in total

1. Dynamic analysis of multivariate failure time data.

Authors: Odd O Aalen; Johan Fosen; Harald Weedon-Fekjaer; Ornulf Borgan; Einar Husebye
Journal: Biometrics Date: 2004-09 Impact factor: 2.571

2. Part 13: Neonatal Resuscitation: 2015 American Heart Association Guidelines Update for Cardiopulmonary Resuscitation and Emergency Cardiovascular Care (Reprint).

Authors: Myra H Wyckoff; Khalid Aziz; Marilyn B Escobedo; Vishal S Kapadia; John Kattwinkel; Jeffrey M Perlman; Wendy M Simon; Gary M Weiner; Jeanette G Zaichkin
Journal: Pediatrics Date: 2015-10-14 Impact factor: 7.124

Review 3. Part 7: Neonatal Resuscitation: 2015 International Consensus on Cardiopulmonary Resuscitation and Emergency Cardiovascular Care Science With Treatment Recommendations.

Authors: Jeffrey M Perlman; Jonathan Wyllie; John Kattwinkel; Myra H Wyckoff; Khalid Aziz; Ruth Guinsburg; Han-Suk Kim; Helen G Liley; Lindsay Mildenhall; Wendy M Simon; Edgardo Szyld; Masanori Tamura; Sithembiso Velaphi
Journal: Circulation Date: 2015-10-20 Impact factor: 29.690

4. Dynamic analysis of recurrent event data using the additive hazard model.

Authors: Johan Fosen; Ornulf Borgan; Harald Weedon-Fekjaer; Odd O Aalen
Journal: Biom J Date: 2006-06 Impact factor: 2.207

5. Tutorial in biostatistics: competing risks and multi-state models.

Authors: H Putter; M Fiocco; R B Geskus
Journal: Stat Med Date: 2007-05-20 Impact factor: 2.373

6. Normal Newborn Heart Rate in the First Five Minutes of Life Assessed by Dry-Electrode Electrocardiography.

Authors: Jørgen Erland Linde; Jörn Schulz; Jeffrey M Perlman; Knut Øymar; Fortunata Francis; Joar Eilevstjønn; Hege Langli Ersdal
Journal: Neonatology Date: 2016-06-02 Impact factor: 4.035

7. A linear regression model for the analysis of life times.

Authors: O O Aalen
Journal: Stat Med Date: 1989-08 Impact factor: 2.373

8. Multi-state models for the analysis of time-to-event data.

Authors: Luís Meira-Machado; Jacobo de Uña-Alvarez; Carmen Cadarso-Suárez; Per K Andersen
Journal: Stat Methods Med Res Date: 2008-06-18 Impact factor: 3.021

9. Implementing a simplified neonatal resuscitation protocol-helping babies breathe at birth (HBB) - at a tertiary level hospital in Nepal for an increased perinatal survival.

Authors: K C Ashish; Mats Målqvist; Johan Wrammert; Sheela Verma; Dhan Raj Aryal; Robert Clark; P K C Naresh; Ravi Vitrakoti; Kedar Baral; Uwe Ewald
Journal: BMC Pediatr Date: 2012-10-05 Impact factor: 2.125

10. Smart wearable body sensors for patient self-assessment and monitoring.

Authors: Geoff Appelboom; Elvis Camacho; Mickey E Abraham; Samuel S Bruce; Emmanuel Lp Dumont; Brad E Zacharia; Randy D'Amico; Justin Slomian; Jean Yves Reginster; Olivier Bruyère; E Sander Connolly
Journal: Arch Public Health Date: 2014-08-22