A Functional Model for Structure Learning and Parameter Estimation in Continuous Time Bayesian Network: An Application in Identifying Patterns of Multiple Chronic Conditions.

Syed Hasib Akhter Faruqui¹, Adel Alaeddini¹, Jing Wang², Carlos A Jaramillo³, Mary Jo Pugh⁴.

Abstract

Bayesian networks are powerful statistical models to study the probabilistic relationships among sets of random variables with significant applications in disease modeling and prediction. Here, we propose a continuous time Bayesian network with conditional dependencies represented as regularized Poisson regressions to model the impact of exogenous variables on the conditional intensities of the network. We also propose an adaptive group regularization method with an intuitive early stopping feature based on Gaussian mixture model clustering for efficient learning of the structure and parameters of the proposed network. Using a dataset of patients with multiple chronic conditions extracted from electronic health records of the Department of Veterans Affairs, we compare the performance of the proposed network with some of the existing methods in the literature for both short-term (one-year ahead) and long-term (multi-year ahead) predictions. The proposed model provides a sparse intuitive representation of the complex functional relationships between multiple chronic conditions. It also provides the capability of analyzing multiple disease trajectories over time, given any combination of preexisting conditions.

Keywords: Continuous time Bayesian network; Gaussian mixture model; Poisson regression; adaptive group lasso; multiple chronic conditions

Year: 2021 PMID: 35371895 PMCID: PMC8975131 DOI: 10.1109/access.2021.3122912

Source DB: PubMed Journal: IEEE Access ISSN: 2169-3536 Impact factor: 3.367

INTRODUCTION

Bayesian networks (BNs) are probabilistic graphical models that represent a set of random variables and their conditional dependencies via a directed acyclic graph (DAG) [1]–[3]. By encoding the uncertainty of the information in their structure, BNs offer valuable insights into the random variables and their interactions for complex data summarization and visualization, prediction and inference, and correlation and causation analysis. BNs also have applications in medicine for predictive modeling of multiple chronic conditions (MCC) [4], [5]. Although BNs were originally designed for studying the relationships among static random variables, they have recently been applied to study random variables with temporal behavior [5]–[7].

Multilevel temporal Bayesian networks (MTBNs) describe the temporal states of the network variables over a finite number of discretized times [4], [5]. The set of edges within each discretized time represents the regular conditional dependencies among random variables, while the edges between the (discretized) time points represent the temporal dependencies. Since MTBNs do not directly model the time and the dynamics of the random variables, classic structure learning algorithms can be used to learn the structure and parameters of the network. Dynamic Bayesian networks (DBNs) [8]–[10] are another extension of BNs that represent the temporal dynamics of random variables over an infinite number of discretized times. Unlike MTBNs, DBNs generally duplicate the time slices to represent the temporal dynamics of the random variables over a fixed time range and do not allow for a change in the structure of the network over time [11]. Temporal nodes Bayesian networks (TNBNs) are yet another alternative for modeling the dynamic processes of BN random variables. The nodes of TNBNs represent the time of occurrence, and the edges represent the causal-temporal relationships. The temporal nodes allow for time intervals of different durations to represent the possible delays between the occurrences of parent events (causes) and the corresponding child events [12].

MTBNs, DBNs, and TNBNs describe the states of temporal BNs over discrete time points but do not model time explicitly. This makes it very difficult to query them about the time at which the state of a random variable changes or an event occurs (i.e., at an irregular time). Furthermore, MTBNs, DBNs, and TNBNs slice time into fixed increments, but in reality, random variables such as chronic conditions evolve at different time granularities. This makes the inference process very challenging, especially for large-scale networks. Choosing too large or too small a granularity may change the network structure and lead to an inaccurate model (for a large time granularity) or to learning/inference inefficiencies (for a small time granularity) [13]–[15].

Continuous time Bayesian networks (CTBNs) [16], on the other hand, explicitly model time by defining a graphical structure over continuous time Markov processes (CTMPs) [17]. This allows explicit representation of the temporal dynamics and the probability distribution of the random variables over time, e.g., the emergence of a new chronic condition in MCC patients. However, CTBNs assume fixed conditional intensities for representing the relationships between random variables and, therefore, cannot model the impact of exogenous variables on the conditional dependencies of the network. Additionally, similar to DBNs, TNBNs, and MTBNs, learning the structure of CTBNs is challenging and is typically carried out by heuristic greedy search algorithms [18]. This restricts the application of CTBNs to problems with multiple exogenous variables of different levels.
An example of this problem is modeling the temporal relationship between the emergence of different chronic medical conditions, which is affected by individual patients’ gender, age, race, education, etc. To address the above challenges, we propose to represent the conditional intensities (dependencies) of the CTBN as regularized Poisson regressions to take into account the impact of various levels of exogenous variables on the network structure and parameters. We then transform the proposed functional CTBN (FCTBN) into a large-scale regularized regression estimation problem and propose an adaptive regularization framework with early stopping features for joint structure and parameter learning. Using a large dataset of patients with multiple chronic conditions extracted from electronic health records of the Department of Veterans Affairs, we compare the predictive performance of the proposed functional CTBN model with some of the existing methods in the literature, including LRMCL [19] and unsupervised MTBN [4]. We also demonstrate the performance of the proposed functional CTBN for analyzing the trajectories of MCC emergence over time.

Our paper makes the following contributions:

1) We propose to formulate the conditional intensities of the continuous time Bayesian network (CTBN) as a function of exogenous risk factors using regularized Poisson regression. The proposed functional CTBN (FCTBN) enables the personalization of the CTBN prediction and inference for individual patients according to their risk factors.

2) We propose an adaptive group regularization framework to simultaneously learn the structure and conditional intensities of the proposed functional CTBN. The information in the regularization path of the proposed learning algorithm helps users, e.g., medical practitioners and patients, to achieve the desired level of sparsity.

3) We propose a Gaussian mixture model (GMM) based approach for early stopping of the proposed learning algorithm without losing much information. The proposed approach uses clustering to expedite pushing insignificant parameters with very small values toward zero, which may otherwise take numerous additional iterations of the training algorithm.

The remainder of the paper is structured as follows. Section II reviews the literature relevant to the proposed study. Section III presents the preliminaries and background for the CTBN. Section IV describes the details of the proposed functional CTBN and the regularized regression model for learning its structure and parameters. Section V presents the study population, the resulting model structure and parameters, predictive performance, and trajectory analysis. Section VI provides the summary and concluding remarks.

RELEVANT LITERATURE

CTBNs are graphical models whose nodes are associated with random variables whose states evolve continuously over time. The evolution of each variable depends on the state of its parents in the graph. Nodelman et al. [13], [16] presented the framework of the CTBN in their previous works. It was built on the framework of homogeneous Markov processes [20], which provides the model of evolution in continuous time, while utilizing the ideas of Bayesian networks to provide a graphical representation of a system. CTBNs overcome the limitations of other temporal models (MTBNs, DBNs, TNBNs, etc.) by explicitly representing the temporal dynamics of a system, i.e., they can learn the probability distribution over time for systems (processes) that evolve at irregular time intervals [16]. CTBNs have been used for a variety of dynamic temporal systems, such as discovering social network dynamics [21], intrusion in networked computer systems [22], modeling sensor networks [23], reliability analysis of dynamic systems [24], robot motion monitoring [25], and monitoring and predicting cardiogenic heart failure [26]. Nodelman et al. [13] derived a Bayesian scoring function to learn the structure of a CTBN model from fully observed data. However, in real life, we often obtain only partially observable data. They therefore later provided an extension to learn the structure of a CTBN from partially observable data based on the structural EM algorithm [27]. Codecasa et al. [28] extended the CTBN structure learning model presented by Nodelman et al. [13] to CTBN classifiers by constraining the class nodes (not dependent on time). Their model combines conditional log-likelihood scoring with Bayesian parameter learning, which outperformed the previous log-likelihood scoring function. Yang et al. [3] developed a non-parametric approach to learn a CTBN structure in relational domains, with varying numbers of objects and relations among them.
Although a CTBN provides a more compact representation than a traditional CTMP, for a large or highly inter-dependent system, the complexity of learning a CTBN model grows exponentially with the number of a node’s parents. In the worst case, a node may depend on all other nodes in the network, resulting in a complexity equivalent to that of the original CTMP. Perreault et al. [29] imposed additional structure on the model to reduce the complexity of learning CTBN models. Cao and Dingzhou [52] modeled a logical OR gate utilizing CTBN nodes with deterministic transitions. Logan et al. [50] extended the Noisy-OR to CTBNs to reduce the number of required parameters. In this paper, to reduce the number of parameters to estimate, we assume the conditional effects of the parent nodes are multiplicative, which is on a par with the Noisy-OR [48], [51] and the CT-NOR [49]. The Noisy-OR model assumes the independence of the effects of parent nodes to reduce the model complexity. The natural parameterization of the NOR model is equivalent to the CT-NOR model in the limit as the bin width approaches zero. The inference process in CTBNs is different from inference in general BN models. Both exact and approximate inference in CTBNs are NP-hard even if the initial state values are given [30]. Exact inference [13] in CTBNs utilizes the full joint intensity matrix and computes the exponential of the matrix, which is often intractable. This method of inference ignores the factored nature of CTBNs; thus, most research on CTBN inference has focused on approximation algorithms [31]. Nodelman et al. [32] developed such an approximate inference method based on expectation propagation. Later, Saria et al. [33] extended the model to full belief propagation and provided an algorithm to adapt the approximation quality. A message-passing scheme is employed between neighboring nodes for each interval of evidence provided.
Messages are continually passed till a consistent distribution has been attained over the interval of evidence. Several sample based algorithms have also been developed. El-hay et al. [34] developed a Gibbs sampling based procedure to sample from the trajectories given a certain set of parent conditions while Fan et al. [35] developed an importance sampling algorithm that computes the expectations of any function of trajectory to perform the inference operation given a fixed set of constraints. Methods using variational techniques such as the belief propagation [36] and the mean-field approximation [37] have also been developed. These models utilize systems of ordinary differential equations to approximate the system distribution. To handle point evidence, Ng et al. [25] developed a continuous time particle filtering algorithm. Aside from CTBNs, a significant amount of work has also been done to integrate Poisson processes with Bayes nets to represent events in continuous time. Rajaram et al. [55] developed the Poisson network model for representing multivariate structured Poisson processes. They modeled the waiting times of a process by an exponential distribution with a piecewise constant rate function that depends on the event counts of its parents. They also adopt a Bayesian approach for learning the network structure. Simma et al. [49] presented CT-NOR, a generative model for representing and reasoning about the relationships among events in continuous time. Using a parameterized function, the CT-NOR can incorporate specific domain knowledge about the expected shape of the distribution of the time delay between events. They used the expectation-maximization (EM) algorithm to fit the parameters of the CT-NOR model. Simma et al. [54] presented a framework for building a probabilistic model of discrete events over continuous time based on cascades of Poisson processes. Their Poisson cascades model can exploit a wide range of delays, transitions, and fertilities. 
They used the EM algorithm for inference in the Poisson cascades model. Gunawardana et al. [53] described a set of graphical event models (GEMs) to approximate a broad class of smooth multivariate temporal point processes. They used BIC and ML for parameter and structure learning. They also provided theoretical results showing that the dependency structure of a universal family of point process models can be learned from data. In this paper, we extend the earlier works in the literature by formulating the conditional intensities of the transitions between the states of the CTBN as a regularized Poisson regression of exogenous risk factors, while assuming a multiplicative effect of parent nodes to reduce the number of parameters to estimate. We also propose using principal component analysis (PCA) or kernel PCA to extract a few informative features of the exogenous risk factors and reduce the dimensionality. We develop an adaptive group regularized regression-based framework to simultaneously learn the structure and parameters of the proposed functional CTBN model, where the information in the regularization path of the learning algorithm allows for achieving the desired level of sparsity. Additionally, we present a Gaussian mixture model to enable early stopping of the estimation procedure.

RELEVANT BACKGROUND

In this section, we review the major components of a CTBN [13], [16]. A CTBN represents finite-state, continuous time processes over a factored state. It explicitly represents the temporal dynamics and makes it possible to extract the probability distribution over time when a specific event occurs.

MARKOV PROCESS

Markov processes are an important class of random processes in which the future state of a random variable is independent of the past, given the present [17]. Let $X$ denote a random variable with $n$ distinct states, $Val(X) = \{x_1, x_2, \ldots, x_n\}$. The stochastic behavior of $X$ can be modeled by an initial distribution $P_X^0$ and a time-invariant transition intensity matrix $Q_X$ of size $n \times n$, which can be written as

$$Q_X = \begin{bmatrix} -q_{x_1} & q_{x_1 x_2} & \cdots & q_{x_1 x_n} \\ q_{x_2 x_1} & -q_{x_2} & \cdots & q_{x_2 x_n} \\ \vdots & \vdots & \ddots & \vdots \\ q_{x_n x_1} & q_{x_n x_2} & \cdots & -q_{x_n} \end{bmatrix} \tag{1}$$

where $q_{x_i x_j}$ represents the rate of transition from state $x_i$ to state $x_j$, and $q_{x_i} = \sum_{j \neq i} q_{x_i x_j}$. The time spent in the same state ($x_i$) is exponentially distributed with parameter $q_{x_i}$, so its probability density function ($f$) and probability distribution function ($F$) are calculated as

$$f(t) = q_{x_i} \exp(-q_{x_i} t), \qquad F(t) = 1 - \exp(-q_{x_i} t), \qquad t \geq 0 \tag{2}$$

After transitioning, which takes an expected transition time of $1/q_{x_i}$, the variable $X$ shifts to state $x_j$ with probability $q_{x_i x_j}/q_{x_i}$. While a Markov process provides a straightforward framework for modeling the temporal behavior of a random variable with finite states, it does not scale well to large state spaces, i.e., the size of the intensity matrix $Q_X$ grows exponentially with the number of variables. For example, for a discrete random variable with $n = 10$ states, a full $10 \times 10$ intensity matrix has to be estimated. To address this scalability issue, the concept of the conditional Markov process is introduced.
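As a minimal sketch of the quantities described above, the following Python snippet reads the leaving rate, the dwell-time density and distribution, the expected dwell time, and the next-state probabilities off an intensity matrix. The 3-state matrix `Q` and all function names are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math

# Hypothetical 3-state intensity matrix: off-diagonal entries are transition
# rates, each diagonal entry is minus the sum of the other entries in its row.
Q = [
    [-0.5, 0.3, 0.2],
    [0.1, -0.4, 0.3],
    [0.0, 0.6, -0.6],
]


def leaving_rate(Q, i):
    """Total rate of leaving state i (the negated diagonal entry)."""
    return -Q[i][i]


def dwell_density(Q, i, t):
    """Exponential density of the time spent in state i: q * exp(-q * t)."""
    q = leaving_rate(Q, i)
    return q * math.exp(-q * t)


def dwell_cdf(Q, i, t):
    """Probability of having left state i by time t: 1 - exp(-q * t)."""
    return 1.0 - math.exp(-leaving_rate(Q, i) * t)


def expected_dwell(Q, i):
    """Expected time spent in state i before a transition: 1 / q."""
    return 1.0 / leaving_rate(Q, i)


def next_state_probs(Q, i):
    """Probability of moving to each state j once state i is left."""
    q = leaving_rate(Q, i)
    return [0.0 if j == i else Q[i][j] / q for j in range(len(Q))]
```

For state 0 of this toy matrix, `expected_dwell(Q, 0)` is the reciprocal of the leaving rate 0.5, and `next_state_probs(Q, 0)` splits the next move between states 1 and 2 in proportion to their rates.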

CONDITIONAL MARKOV PROCESS

To improve the scalability of Markov processes for large state spaces, Nodelman et al. [16] introduced the idea of the conditional Markov process, in which the transition intensity matrix changes over time, not as a direct function of time but as a function of the state values of some parent variable, which also evolves as a Markov process. Let $\mathbf{U}$ denote the parent variable, with state space $Val(\mathbf{U})$ and states $u \in Val(\mathbf{U})$; then the conditional intensity matrix (CIM) $Q_{X|u}$ can be written as

$$Q_{X|u} = \begin{bmatrix} -q_{x_1|u} & q_{x_1 x_2|u} & \cdots & q_{x_1 x_n|u} \\ q_{x_2 x_1|u} & -q_{x_2|u} & \cdots & q_{x_2 x_n|u} \\ \vdots & \vdots & \ddots & \vdots \\ q_{x_n x_1|u} & q_{x_n x_2|u} & \cdots & -q_{x_n|u} \end{bmatrix} \tag{3}$$

Conditioning the transitions on parent conditions increases the sparsity of the intensity matrix considerably, which is especially helpful for modeling large state spaces. When no parent variable is present, the CIM is the same as the classic intensity matrix. When a parent variable is present, there is an intensity matrix associated with each state $u \in Val(\mathbf{U})$ of the parent variable. When multiple parent variables are present, there is an intensity matrix associated with each combination of the states of the parent variables, which can still be represented by $u$. For our case study in Section V, we model the transition intensities between different states of multiple chronic conditions based on conditional Markov processes. The model formulates the probability of transition between the (discrete) states of the multiple chronic conditions as a continuous function of time, namely an exponential distribution, with respect to the associated conditional intensities. Such a formulation helps to better capture the actual progression of the multiple chronic conditions (please also see Figure 5 in Section V-C).
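A conditional intensity matrix can be pictured as a lookup table with one ordinary intensity matrix per parent state. The sketch below assumes a binary child with a single binary parent; the rates, the dictionary layout, and the function names are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical CIM for a binary child X (0 = not having, 1 = having the
# condition) with one binary parent U: one 2x2 intensity matrix per parent state.
cim = {
    0: [[-0.1, 0.1], [0.05, -0.05]],  # parent condition absent
    1: [[-0.4, 0.4], [0.05, -0.05]],  # parent condition present
}


def onset_rate(cim, parent_state):
    """Rate of moving from state 0 to state 1, given the parent's state."""
    return cim[parent_state][0][1]


def number_of_matrices(parent_cardinalities):
    """One intensity matrix is needed per combination of parent states."""
    count = 1
    for cardinality in parent_cardinalities:
        count *= cardinality
    return count
```

With three binary parents, `number_of_matrices([2, 2, 2])` gives eight matrices, which illustrates why the number of parents drives the size of the model.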

FIGURE 5.

Learned functional CTBN structure for a set of given conditions and their progression over time. The conditions include Traumatic Brain Injury (TBI), Back Pain (BaPa), Post Traumatic Stress Disorder (PTSD), Depression (Depr), and Substance Abuse (SuAb). The thickness of each edge represents the strength of the conditional intensity, q. The patient under consideration has the pre-existing conditions of TBI, BaPa, and Depr. Overall, the figure illustrates the dynamics of the transition probabilities (risk of acquiring a new condition) of MCC for a sample patient.

CONTINUOUS TIME BAYESIAN NETWORK

Let us reconsider $x_i$ as a local variable representing the states $i \in \{0$: not having, $1$: having$\}$ of a set of chronic conditions (random variables) X = {diabetes, obesity, hyperlipidemia, hypertension, cognitive impairment}, which may change over time. A continuous time Bayesian network (CTBN) [16], [20] can be built by putting together a set of CIMs under a graph structure to represent the temporal behavior and internal relationships of the local variables (MCC conditions). The two main components of a CTBN are:

1) An initial distribution ($P_X^0$), which formulates the structure of the (conditional) relationship among the random variables (chronic conditions) and is specified as a Bayesian network.

2) A state transition model, which describes the transient behavior of each local variable (the states of a chronic condition) $x \in X$ given the states of the parent variables (related preexisting medical conditions) $u$, as specified by Equation 3.

Each node $X_i \in$ {diabetes, obesity, hyperlipidemia, hypertension, cognitive impairment} in the CTBN is a random variable with finite discrete states (in this study, we consider two states for each node, representing having or not having a disease). Each edge in the graph implies the effect of the parent node (preexisting condition) $X_i$ on the evolution of the child node (new condition) $X_j$. As the graph suggests, the child node’s evolution cannot simultaneously depend on the status of the parent nodes [16]. A CTBN explicitly represents the temporal dynamics of random variables, which enables the extraction of the probability distribution over time when a specific event occurs. Unlike traditional Bayesian networks, a CTBN allows for cycles in the graph, G. This is an important property for modeling reinforcing loops between random variables, as we will show in the case study for modeling the relationship between multiple chronic conditions. Later, we will also use this property to develop a regularization-based method for structure learning of the CTBN.

QUERIES AND INFERENCE

Similar to classic Bayesian networks, CTBNs can be used for making inferences or answering queries. For instance, having observations from some of the nodes (preexisting medical conditions), we can infer the probability distribution of some other nodes (new possible medical conditions). Once the joint intensity matrix of the CTBN is formed, it can be used to answer queries in the same way as for a Markov process. Given a joint intensity matrix $Q_X$, the distribution over the state of $X$ at any time $t$ can be calculated using Equation 4:

$$P_X(t) = P_X^0 \exp(Q_X t) \tag{4}$$

To calculate the joint distribution over any two points in time, Equation 4 can be modified as follows:

$$P_X(s, t) = P_X(s) \exp\big(Q_X (t - s)\big), \qquad t \geq s \tag{5}$$

The inference operation can be performed using either exact or approximate algorithms. Amalgamation [16] is an exact method that operates on a large joint matrix representation. However, for systems with a large state space, it becomes computationally inefficient (exact inference in CTBNs is also NP-hard); thus, approximation methods are typically used [32], [38]. Sampling-based algorithms can also be used to perform the inference operation.
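To illustrate how a state distribution at time t is obtained from an initial distribution and an intensity matrix through the matrix exponential, here is a small pure-Python sketch. The scaling-and-squaring Taylor routine is a simple stand-in for a library matrix exponential, and all names and numbers are hypothetical. The closed-form helper applies only to a two-state chain and is included as a sanity check.

```python
import math


def mat_mul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_exp(A, terms=20, squarings=10):
    """exp(A) via scaling and squaring with a truncated Taylor series."""
    n = len(A)
    scale = 2.0 ** squarings
    S = [[a / scale for a in row] for row in A]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, S)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def state_distribution(p0, Q, t):
    """Row vector p0 times exp(Q * t): the state distribution at time t."""
    E = mat_exp([[q * t for q in row] for row in Q])
    return [sum(p0[i] * E[i][j] for i in range(len(p0)))
            for j in range(len(p0))]


def two_state_closed_form(a, b, t):
    """P(state 1 at time t | state 0 at time 0) for Q = [[-a, a], [b, -b]]."""
    return a / (a + b) * (1.0 - math.exp(-(a + b) * t))


# Hypothetical binary condition: onset rate 0.3, remission rate 0.1 per year.
Q = [[-0.3, 0.3], [0.1, -0.1]]
p_five_years = state_distribution([1.0, 0.0], Q, 5.0)
```

For a joint query over two time points, the same routine can be applied to the interval `t - s`, starting from the distribution already computed at time `s`.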

PARAMETER ESTIMATION

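Having a dataset $D = \{\tau_1, \ldots, \tau_H\}$ of $H$ observed transitions, where $\tau_h$ represents the time at which the $h$-th transition has occurred, and $\mathcal{G}$ is a Bayesian network defining the structure of the (conditional) relationship among variables, we can use maximum likelihood estimation (MLE) (Equation 6) to estimate the parameters of the CTBN model as defined in Nodelman et al. [13]:

$$L_X(q : D) = \prod_{\mathbf{u}} \prod_{x} q_{x|\mathbf{u}}^{M[x|\mathbf{u}]} \exp\big(-q_{x|\mathbf{u}}\, T[x|\mathbf{u}]\big) \tag{6}$$

where $T[x|\mathbf{u}]$ is the total time $X$ spends in the same state $x$, and $M[x|\mathbf{u}]$ is the total number of times $X$ makes a transition out of state $x$, given the parent state $\mathbf{u}$. The log-likelihood function can then be written as

$$\ell_X(q : D) = \sum_{\mathbf{u}} \sum_{x} \Big( M[x|\mathbf{u}] \ln q_{x|\mathbf{u}} - q_{x|\mathbf{u}}\, T[x|\mathbf{u}] \Big) \tag{7}$$

Maximizing Equation 7 provides the maximum likelihood estimate of the conditional intensities, as shown in Equation 8:

$$\hat{q}_{x|\mathbf{u}} = \frac{M[x|\mathbf{u}]}{T[x|\mathbf{u}]} \tag{8}$$

The above estimate holds for the case of complete data. For incomplete datasets, expectation maximization (EM) algorithms can be used [27], [32].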
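With complete data, the maximum likelihood estimate of each conditional intensity reduces to a ratio of two sufficient statistics: the number of transitions out of a state and the total time spent in it, both collected per parent state. A minimal sketch follows, assuming a hypothetical record format of `(state, parent_state, duration, left)` tuples that is not taken from the paper.

```python
from collections import defaultdict


def estimate_intensities(segments):
    """Estimate leaving rates as (number of exits) / (total time in state).

    segments: iterable of (state, parent_state, duration, left) tuples, where
    `left` is True if the segment ended with a transition out of `state`
    (hypothetical format, assumed for illustration).
    """
    exits = defaultdict(int)         # M[x | u]: transitions out of x given u
    time_spent = defaultdict(float)  # T[x | u]: total time in x given u
    for state, parent_state, duration, left in segments:
        time_spent[(state, parent_state)] += duration
        if left:
            exits[(state, parent_state)] += 1
    return {key: exits[key] / total
            for key, total in time_spent.items() if total > 0}


# Toy records: years spent without a condition (state 0), by parent state.
segments = [
    (0, 0, 2.0, True),
    (0, 0, 3.0, True),
    (0, 1, 1.0, True),
    (0, 1, 1.0, False),  # still in state 0 when observation ended
]
rates = estimate_intensities(segments)
```

Segments that end without a transition still contribute their duration to the denominator, which is why the last record lowers the estimated rate for parent state 1.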

PROPOSED METHODOLOGY

We begin by formulating the conditional dependencies of the CTBN as a Poisson regression on some exogenous variables z. Next, we derive the likelihood function of the functional CTBN as a collection of Poisson regression likelihoods. Afterwards, we propose an adaptive group regularization method for structure and parameter learning of the functional CTBN. Finally, we present post-processing and early stopping based on Gaussian mixture model clustering of the estimated parameters.

FUNCTIONAL CONTINUOUS TIME BAYESIAN NETWORK WITH CONDITIONAL DEPENDENCIES AS POISSON REGRESSION

In many real-world problems, such as the progression of multiple chronic conditions discussed in our case study, the evolution of the state variables (chronic conditions) depends not only on their immediate past state and the states of their parent variables (pre-existing conditions) but also (possibly) on some exogenous variables (socio-demographic factors). We propose to formulate the conditional intensities of the CTBN as a function of exogenous risk factors using Poisson regression, which is a special case of generalized linear models. Let $\mathbf{z} = (z_1, \ldots, z_m)$ denote a set of exogenous variables, e.g., patient-level risk factors such as age, gender, race, education, marital status, etc. The rate of transition between any two states of a state variable (say, a chronic condition) can be derived as:

$$\log q_{x_i x_j | \mathbf{u}} = \beta_{0, x_i x_j | \mathbf{u}} + \boldsymbol{\beta}_{x_i x_j | \mathbf{u}}^{\top} \mathbf{z} \tag{9a}$$

$$\log q_{x_i | \mathbf{u}} = \beta_{0, x_i | \mathbf{u}} + \boldsymbol{\beta}_{x_i | \mathbf{u}}^{\top} \mathbf{z} \tag{9b}$$

where $\beta_0$ and $\boldsymbol{\beta}$ are the coefficients of the Poisson regression. When the state space of the system and related conditions is binary (as in our case study on MCC transitions, where MCC states include having/not having each of the conditions), the conditional intensities in $Q_{X|\mathbf{u}}$ can be estimated using Equation 9b alone, because for Markov processes with binary states $q_{x_i|\mathbf{u}} = q_{x_i x_j|\mathbf{u}}$. This feature considerably simplifies the estimation of the functional CTBN conditional intensity matrix based on Poisson regression.

Having a dataset of state variables’ transition trajectories, $D = \{\tau_{p,h}\}$, where $\tau_{p,h}$ represents the time at which the $h$-th (MCC) transition of the $p$-th subject has occurred, we can use maximum likelihood estimation (MLE) to estimate the parameters of the proposed functional CTBN. Assuming that all transitions are observed, the likelihood of $D$ can be decomposed as the product of the likelihoods of individual transitions. Let $d = \langle \mathbf{z}_p, \mathbf{u}, x_i, t_d, x_j \rangle$ represent a state transition for subject $p$ with risk factors $\mathbf{z}_p$ and parent variables $\mathbf{u}$, who made a transition to state $x_j$ after spending the amount of time $t_d$ in state $x_i$. If the state space of the conditions is binary, i.e., having/not having a chronic condition, the likelihood of the single transition $d$ can be written as in Equation 10:

$$L(d) = q_{x_i|\mathbf{u}}(\mathbf{z}_p) \exp\big(-q_{x_i|\mathbf{u}}(\mathbf{z}_p)\, t_d\big) \tag{10}$$

By multiplying the likelihoods of all transitions for all subjects (patients) and taking the log, we obtain the overall log-likelihood function as in Equation 11:

$$\ell(\boldsymbol{\beta} : D) = \sum_{d \in D} \Big[ \log q_{x_i|\mathbf{u}}(\mathbf{z}_p) - q_{x_i|\mathbf{u}}(\mathbf{z}_p)\, t_d \Big] \tag{11}$$

where $q_{x_i|\mathbf{u}}(\mathbf{z}_p)$ is given by Equation 9b. Equation 11 is concave in $\boldsymbol{\beta}$ (equivalently, its negative is convex) and can be maximized using a convex optimization algorithm such as Newton-Raphson. Given the structure of the functional CTBN (see Figure 1), i.e., the parent set for each variable, the maximum number of parameters to be estimated in Equation 11 grows with the number of state variables (conditions), the number of exogenous variables (risk factors) present in the system, and, exponentially, the maximum number of parents considered (pre-existing diseases for each condition). Therefore, as in classical Bayesian networks, the number of parents has a direct and exponential influence on the computational efficiency of the estimation process and should be limited to a small number. We propose to assume that the conditional effect of parents is multiplicative, i.e., $q_{x|\mathbf{u}} = \prod_{k} q_{x|u_k}$, which makes the conditional effect of the risk factors additive on the log scale given the set of parents, i.e., $\log q_{x|\mathbf{u}} = \sum_{k} \log q_{x|u_k}$. This assumption, which is on a par with the Noisy-OR [48], [51] and the CT-NOR [49], removes the exponential dependence of the maximum number of parameters on the number of parents. However, in situations where the number of exogenous variables (risk factors for the multiple chronic conditions) is large, the estimation of the Poisson regression parameters can still be computationally challenging, even with the multiplicative assumption. To address this problem, we propose to use principal component analysis (PCA) or kernel PCA (KPCA) to first extract a few informative features of $\mathbf{z}$, and then use those features of the original covariates to build the Poisson regression model for each conditional intensity [39].

While it reduces the interpretability of the estimated parameters ($\boldsymbol{\beta}$), dimensionality reduction (PCA or KPCA) helps with efficient modeling of the (non)linear relationship among the risk factors. Although PCA is restricted to linear correlation between (exogenous) variables, our analysis shows that it captures a considerable portion of the variation (> 82%) in our dataset using only the first principal component, which is also evidenced in the proposed model performance as described in Section V.
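The estimation idea can be sketched for the simplest case: a single binary condition, one risk factor, and fully observed dwell times, where the log of the leaving rate is modeled as `b0 + b1 * z` and the log-likelihood is maximized with Newton-Raphson. This is an illustrative sketch under those assumptions, not the authors' implementation; the function name and the toy data are hypothetical.

```python
import math


def fit_poisson_intensity(z, t, iters=25):
    """Newton-Raphson fit of log(rate) = b0 + b1 * z.

    z: one risk-factor value per observed transition (hypothetical data).
    t: time spent in the state before each transition.
    Log-likelihood: sum over transitions of (b0 + b1*z) - exp(b0 + b1*z) * t.
    Assumes z takes at least two distinct values, so the 2x2 system is solvable.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, ti in zip(z, t):
            mu = math.exp(b0 + b1 * zi) * ti  # expected transitions within ti
            g0 += 1.0 - mu                    # gradient w.r.t. b0
            g1 += (1.0 - mu) * zi             # gradient w.r.t. b1
            h00 += mu                         # entries of the negated Hessian
            h01 += mu * zi
            h11 += mu * zi * zi
        det = h00 * h11 - h01 * h01
        # Newton update: beta <- beta + (negated Hessian)^(-1) * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Toy data: two subjects with z = 0 and three subjects with z = 1.
z = [0, 0, 1, 1, 1]
t = [2.0, 4.0, 1.0, 1.0, 2.0]
b0, b1 = fit_poisson_intensity(z, t)
rate_z0 = math.exp(b0)
rate_z1 = math.exp(b0 + b1)
```

Because the toy model has one parameter per group, the fitted rates should coincide with the count-over-time estimates for each group (2/6 for `z = 0` and 3/4 for `z = 1`), which gives a quick check on the optimizer.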

FIGURE 1.

Illustration of the functional CTBN for 5 MCC including Traumatic Brain Injury (TBI), Back Pain (BaPa), Post Traumatic Stress Disorder (PTSD), Depression (Depr), and Substance Abuse (SuAb) based on the case study discussed in Section V. The thickness of each edge represents the strength of the conditional intensity, q.

ADAPTIVE GROUP REGULARIZATION FRAMEWORK FOR STRUCTURE LEARNING IN CONTINUOUS TIME BAYESIAN NETWORK

The parameter estimation approach presented above requires the parent set of each condition to be known, which is equivalent to knowing the structure of the Bayesian network. Here, we propose an adaptive group regularization-based framework to simultaneously learn the structure (G) and the conditional intensities (q) of the functional Bayesian network model. Regularization-based structure learning is a recent approach that is gaining popularity for parameter estimation in graphical models [40]–[42]. However, since regularization can result in cycles in graphical models, it is not generally considered for directed graphs. Given that the proposed functional CTBN has a special structure based on a conditional intensity matrix that allows for cycles, this study proposes to extend regularization-based structure learning to the functional CTBN. Considering the negative log-likelihood of the fully connected functional CTBN, we propose to add an adaptive group regularization term to the negative log-likelihood function to penalize the groups of parameters pertaining to each specific conditional intensity, as in Equation 12:

$$\min_{\boldsymbol{\beta}} \; -\ell(\boldsymbol{\beta} : D) + \sum_{j} \lambda_j \sqrt{k}\, \|\boldsymbol{\beta}_j\|_2 \tag{12}$$

where $\ell(\boldsymbol{\beta} : D)$ is the log-likelihood of Equation 11, $\|\boldsymbol{\beta}_j\|_2$ is the $L_2$ norm of the group of parameters associated with each conditional intensity, $k$ is the group size, which is based on the number of coefficients in the Poisson regression for each conditional intensity, and $\lambda_j$ are the tuning parameters of the adaptive group regularization that control the amount of shrinkage, where $\lambda$ is inversely weighted based on the unpenalized estimated value of the regression coefficients [43]. The index $j$ indicates the adaptive penalization applied to each group of parameters. The fast iterative shrinkage-thresholding algorithm (FISTA) can be used for solving Equation 12 [44]. Figure 3 shows the regularization path of the tuning parameter for some of the parameters of the proposed model.
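The mechanics of a group penalty can be sketched with its proximal operator, which shrinks each group of coefficients as a block and sets the whole block to zero when its norm falls below a threshold. The sketch below shows a plain proximal-gradient step (without the momentum term that FISTA adds) and adaptive weights that are inversely proportional to the norm of an unpenalized estimate. The function names, the `eps` safeguard, and the square-root group-size scaling are assumptions made for illustration.

```python
import math


def group_norm(beta):
    """Euclidean norm of one group of coefficients."""
    return math.sqrt(sum(b * b for b in beta))


def adaptive_weights(unpenalized_groups, lam, eps=1e-8):
    """Per-group penalties, inversely weighted by the unpenalized estimates."""
    return [lam / (group_norm(g) + eps) for g in unpenalized_groups]


def prox_group(beta, threshold):
    """Block soft-thresholding: shrink the group, or zero it out entirely."""
    norm = group_norm(beta)
    if norm <= threshold:
        return [0.0] * len(beta)
    scale = 1.0 - threshold / norm
    return [scale * b for b in beta]


def proximal_step(groups, grads, step, lambdas):
    """One proximal-gradient step on the negative log-likelihood.

    groups/grads: lists of coefficient groups and matching gradient groups.
    """
    updated = []
    for beta, grad, lam_j in zip(groups, grads, lambdas):
        moved = [b - step * g for b, g in zip(beta, grad)]
        threshold = step * lam_j * math.sqrt(len(beta))
        updated.append(prox_group(moved, threshold))
    return updated
```

A group that is zeroed out corresponds to removing the associated edge, so sweeping the base penalty `lam` from small to large traces a path from a dense to a sparse structure.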

FIGURE 3.

The regularization path of the tuning parameter for the proposed model (For the sake of simplicity only some of the total learned parameters are shown).

An interesting feature of the adaptive group-regularization based structure learning is that we can use the regularization path to control the level of sparsity in the proposed functional CTBN (See Figure 2).

FIGURE 2.

The effect of changing the tuning parameter (regularization path) on the structure of the functional CTBN for the 5 MCC in our case study including TBI, BaPa, PTSD, Depr, and SuAb.

POST PROCESSING AND EARLY STOPPING

Each conditional dependency (edge) in the proposed functional CTBN consists of a Poisson regression with multiple exogenous variables. When the number of parent variables and/or exogenous variables is large, structure and parameter learning requires substantial computation. However, most of the change in the estimated (Poisson regression) parameters happens in the early iterations of the learning algorithm. The (numerous) remaining iterations make only minor adjustments, mainly pushing parameters with small values toward zero, and can take many iterations without significantly changing the other (significant) parameter values. Meanwhile, for a sufficiently large choice of the tuning parameter, some of the parameters will be zero, because in a general regression problem setting the tuning parameter to infinity drives all coefficients except the intercept to zero. Conversely, a very small tuning parameter can leave all estimated parameters non-zero, which would be equivalent to a fully connected functional CTBN and is not expected in most cases. It is therefore plausible that an appropriate regularization method, such as the adaptive group regularization framework described in section IV-C, should discover some zero parameters.
To reduce the number of (additional) iterations spent zeroing non-significant parameters, we propose to use Gaussian mixture models (GMM). We stop the learning algorithm when there is no significant change in the estimated parameters and then fit a GMM to the estimated parameters [45]. The GMM is expected to have one cluster with a mean around zero (representing the insignificant parameters to be pushed toward zero) and one or more clusters with non-zero means (representing significant parameters). Once the clusters and their parameters are identified, we choose the cluster with (around) zero mean and assign a value of zero to all parameters within ±3σ (standard deviation) of its mean μ ≈ 0. We then run an additional iteration of the learning algorithm with the updated parameters to ensure convergence.
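The post-processing step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes exactly two mixture components (the text allows more), fits them with a plain EM loop, and uses synthetic coefficient values. The names `fit_gmm_1d` and `zero_near_zero_cluster` are hypothetical.

```python
import math


def fit_gmm_1d(values, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture with plain EM (assumed setup)."""
    # Start one component at zero and one at the largest-magnitude value.
    mu = [0.0, max(values, key=abs)]
    spread = (max(values) - min(values)) or 1.0
    sigma = [spread / 4.0, spread / 4.0]
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            dens = [
                weight[k] * math.exp(-0.5 * ((v - mu[k]) / sigma[k]) ** 2)
                / (sigma[k] * math.sqrt(2.0 * math.pi))
                for k in range(2)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            weight[k] = nk / len(values)
            mu[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, values)) / nk
            sigma[k] = max(math.sqrt(var), 1e-6)
    return mu, sigma, weight


def zero_near_zero_cluster(values, n_sigma=3.0):
    """Set to zero every value within n_sigma of the near-zero cluster mean."""
    mu, sigma, _ = fit_gmm_1d(values)
    k0 = min(range(2), key=lambda k: abs(mu[k]))  # cluster closest to zero
    lo = mu[k0] - n_sigma * sigma[k0]
    hi = mu[k0] + n_sigma * sigma[k0]
    return [0.0 if lo <= v <= hi else v for v in values]


# Synthetic coefficients: 60 near-zero values and 40 clearly non-zero values.
coefs = [0.01 * math.sin(i) for i in range(60)]
coefs += [1.5 + 0.3 * math.cos(i) for i in range(40)]
cleaned = zero_near_zero_cluster(coefs)
```

In practice the thresholded vector would be handed back to the learning algorithm for the final convergence check described above.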

COMPUTATIONAL COMPLEXITY

In this section, we derive the time complexity of the algorithms presented earlier. Let n denote the number of nodes/variables (chronic conditions) in the graph, k the number of observations, l the number of parents of a node x, q the number of transitions of a condition from one state to another, r the number of possible values/instances for each variable (in our study r = 2, which represents having/not having a condition), p the number of risk factors, and m the number of learned coefficients. The algorithm consists of two components: (1) Learning the Parameters of Functional CTBN: , and (2) GMM for Early Stopping and Structure Learning: . Integrating the complexities of the 2 components with some algebraic simplification, the overall complexity can be derived as .

CASE STUDY: IDENTIFYING PATTERNS OF MULTIPLE CHRONIC CONDITIONS

Long-lasting diseases, otherwise known as chronic conditions, can be considered a staple example of degradation processes that progress over time and contribute to the development of other new chronic conditions. The presence of two or more chronic medical conditions in an individual is commonly defined as multimorbidity, or multiple chronic conditions (MCC) [46]. Here, we use the proposed functional CTBN to find the impact of patient level risk factors on the conditional dependencies of MCC and the evolution of different chronic conditions over time.

STUDY POPULATION AND DEMOGRAPHICS

The dataset used for this study includes 608,503 patients with two or more MCC (including Traumatic Brain Injury (TBI), Post Traumatic Stress Disorder (PTSD), Depression (Depr), Substance Abuse (SuAb), and Back Pain (BaPa)) who received medical care from the Department of Veterans Affairs for at least three years between 2002 and 2015. For meaningful prediction, we removed patients whose data were not maintained for at least three years. Such dropout may be caused by, among other reasons, death or no longer requiring or receiving care. After dropping these records, the number of patients considered for the analysis is 257,633. The dataset includes the ICD-9-CM diagnosis codes documented during each inpatient or outpatient encounter over the course of VA care. The risk factors (exogenous variables) considered in the dataset include age at VA entry, sex, race/ethnicity (White, African American, Hispanic, Asian/Pacific Islander, Native American, unknown), and education (less than high school, high school graduate, some college, college graduate, post-college education). Table 1 summarizes the collected data by patients’ demographics. In this study, to reduce the computational complexity of the algorithm and to show the application of a dimensionality reduction technique, we use PCA to reduce the risk factors to a single component.
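As a rough illustration of reducing several risk factors to one component, the sketch below computes the first principal component with power iteration. It assumes the risk factors have already been encoded as numeric columns; how the categorical factors were encoded and scaled is not stated in the text, and the function name `first_principal_component` is hypothetical.

```python
import math


def first_principal_component(rows, n_iter=500):
    """Return the leading principal axis and each row's score along it."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration converges to the eigenvector with the largest
    # eigenvalue, provided the start vector is not orthogonal to it.
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    scores = [sum(c[j] * vec[j] for j in range(d)) for c in centered]
    return vec, scores


# Toy data: three numeric risk-factor columns, one of them constant.
toy_rows = [[float(t), 2.0 * t, 5.0] for t in range(10)]
axis, scores = first_principal_component(toy_rows)
```

The resulting `scores` column would play the role of the single exogenous variable fed to each Poisson regression.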

Table 1.

Demographics of the patients included in the study.

SI No.	Race	Gender		Marital Status		Age Group				Education
		Male	Female	Married	Un-Married	18-30	31-40	41-50	51 and above	Unknown	< High School	High School	Some College	College Graduate	Post College
1	White	148355	19183	74487	93051	96799	36003	26167	8569	2334	2037	129921	16743	12024	4479
		57.58%	7.45%	28.91%	36.12%	37.57%	13.97%	10.16%	3.33%	0.91%	0.79%	50.43%	6.50%	4.67%	1.74%
2	Black	35758	11828	23308	24278	20047	12468	12710	2361	658	504	37506	4819	3160	939
		13.88%	4.59%	9.05%	9.42%	7.78%	4.84%	4.93%	0.92%	0.26%	0.20%	14.56%	1.87%	1.23%	0.36%
3	Hispanic	25373	4232	14523	15082	17016	6606	4758	1225	386	360	23592	2933	1893	441
		9.85%	1.64%	5.64%	5.85%	6.60%	2.56%	1.85%	0.48%	0.15%	0.14%	9.16%	1.14%	0.73%	0.17%
4	Asian	5639	981	3067	3553	3235	1361	1564	460	131	60	4732	598	879	220
		2.19%	0.38%	1.19%	1.38%	1.26%	0.53%	0.61%	0.18%	0.05%	0.02%	1.84%	0.23%	0.34%	0.09%
5	Native	3081	707	1747	2041	2115	925	564	184	60	60	3004	376	217	71
		1.20%	0.27%	0.68%	0.79%	0.82%	0.36%	0.22%	0.07%	0.02%	0.02%	1.17%	0.15%	0.08%	0.03%
6	Unknown	2135	361	1346	1150	1062	625	673	136	51	22	1808	287	223	105
		0.83%	0.14%	0.52%	0.45%	0.41%	0.24%	0.26%	0.05%	0.02%	0.01%	0.70%	0.11%	0.09%	0.04%

DIAGNOSED HEALTH CONDITIONS

We used ICD-9-CM codes from the inpatient and outpatient data (excluding ancillary and telephone care) to identify Traumatic Brain Injury (TBI), Post Traumatic Stress Disorder (PTSD), Depression (Depr), Substance Abuse (SuAb), and Back Pain (BaPa) using validated published algorithms [47]. PTSD, SuAb, and BaPa required two diagnoses at least seven days apart, while TBI, which is an acute injury, required only a single diagnosis. Each condition was coded as “0” or “1” for each year of care, with 1 indicating a diagnosis for that condition regardless of the number of times the condition was diagnosed (additional information on the ICD-9 codes for the considered conditions can be found in appendix B).
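The yearly coding rule can be sketched as below. This is only an illustration under stated assumptions: the text does not say whether the two qualifying diagnoses must fall in the same calendar year (assumed here), and it gives no rule for Depression, so that condition is left out of the hypothetical `MIN_DIAGNOSES` table. The function name `yearly_indicator` is also hypothetical.

```python
from datetime import date

# Hypothetical rule table: minimum number of diagnoses per condition.
MIN_DIAGNOSES = {"TBI": 1, "PTSD": 2, "SuAb": 2, "BaPa": 2}
MIN_GAP_DAYS = 7


def yearly_indicator(condition, diagnosis_dates, years):
    """Return {year: 0 or 1} for one patient and one condition."""
    by_year = {y: [] for y in years}
    for d in diagnosis_dates:
        if d.year in by_year:
            by_year[d.year].append(d)
    flags = {}
    for y, dates in by_year.items():
        if MIN_DIAGNOSES[condition] == 1:
            # A single diagnosis in the year is enough.
            flags[y] = int(len(dates) >= 1)
        else:
            # Two diagnoses at least MIN_GAP_DAYS apart exist exactly when
            # the earliest and latest dates in the year are that far apart.
            far_enough = bool(dates) and (max(dates) - min(dates)).days >= MIN_GAP_DAYS
            flags[y] = int(far_enough)
    return flags


ptsd_flags = yearly_indicator(
    "PTSD",
    [date(2005, 1, 1), date(2005, 1, 5), date(2006, 3, 1), date(2006, 3, 10)],
    [2005, 2006, 2007],
)
```

Repeated diagnoses within a year do not change the flag, which matches the statement that the coding ignores the number of instances.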

STRUCTURE AND PARAMETER LEARNING

To identify the optimal value of the tuning parameter (λ) of the group regularization method for structure and parameter learning, we use the cross-validation error over several λ values (0, 10^0, 10^1, 10^2, ..., 10^6). Figure 4a shows the cross-validation error for the different λ values.

FIGURE 4.

(a) Tuning of the hyper parameter (λ) based on cross validation, (b) Post processing and early stopping of the structure and parameter learning process using Gaussian mixture model, and (c) The estimated parameters of functional CTBN based on the optimal value of the tuning parameter.

We obtain the structure of the functional CTBN and the conditional intensities from the parameters estimated at the optimal value λ = 10^3. Figure 4c shows the heatmap of the estimated parameters (β) of the learned functional CTBN at this value, which is equivalent to the graphical model presented in figure 1. As shown in figures 4a and 4c, setting λ = 10^3 not only yields a considerably low (cross-validation) error but also significantly reduces the number of non-zero parameters (a sparsity ratio of 64.75%). To identify the final structure of the functional CTBN from the sparse parameter matrix in figure 4c, we apply the following rule: if all parameters (coefficients) of the Poisson regression connecting a parent node to a child node are zero, there is no edge between them; if at least one of these parameters is non-zero, there is an edge between the two nodes, and the strength of the connection is represented by the conditional intensity value. Meanwhile, to reduce the number of training iterations needed to obtain the sparse matrix in figure 4c, we use a GMM as explained in section IV-D. Figure 4b shows the Gaussian densities fitted to the estimated parameters at iteration 30,000 of the learning algorithm. It shows two clusters: one with zero mean and small variance (non-significant parameters), and the other with non-zero mean and high variance (significant parameters). We assign a value of zero to all parameters within ±3σ (standard deviation) of the cluster with the mean around zero. We have verified this result by running the learning algorithm for an additional 20,000 iterations. Additionally, the functional CTBN allows for loops in the structure (as shown in figure 1).
This is an important feature in studying MCC because one chronic condition can simultaneously be the cause and/or the effect (result) of another, e.g., depression and substance abuse. The functional CTBN also allows for self-loops to represent staying in the same MCC state (existing/parent conditions) over a (fixed amount of) time [19]. Figure 5 illustrates the dynamics of the transition probabilities of MCC, namely the risk of acquiring a new condition over a period of three years for a sample patient with preexisting conditions TBI, BaPa, and Depression. As shown in the figure, the transition probabilities change as a function of time, which is intuitive in the presence of these preexisting conditions. In particular, having TBI, BaPa, and Depression as preexisting conditions increases the likelihood of acquiring the new disorders PTSD and SuAb over time. It also strengthens the reinforcing loop between the existing conditions, which is also intuitive. This can help health practitioners and patients better understand the short- and long-term (negative) impact of MCC on acquiring new conditions.
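The edge rule described above (an edge exists when at least one coefficient in the parent-to-child group is non-zero) can be sketched as follows. The coefficient values are made up for illustration, and the sparsity ratio is assumed here to mean the fraction of coefficients that are exactly zero; the text does not give its definition. Both function names are hypothetical.

```python
def edges_from_coefficients(beta, tol=0.0):
    """beta maps (parent, child) to that edge's Poisson-regression coefficients."""
    return sorted(
        (parent, child)
        for (parent, child), coefs in beta.items()
        if any(abs(c) > tol for c in coefs)
    )


def sparsity_ratio(beta, tol=0.0):
    """Fraction of coefficients that are zero (assumed definition)."""
    coefs = [c for group in beta.values() for c in group]
    return sum(abs(c) <= tol for c in coefs) / len(coefs)


# Illustrative values only, not estimates from the study.
beta = {
    ("TBI", "PTSD"): [0.4, 0.0],
    ("PTSD", "TBI"): [0.0, 0.0],
    ("Depr", "SuAb"): [0.0, -0.2],
    ("SuAb", "Depr"): [0.1, 0.3],
}
edges = edges_from_coefficients(beta)
ratio = sparsity_ratio(beta)
```

Because the rule works on whole coefficient groups, both directions between Depr and SuAb can be kept, which is how a loop appears in the learned structure.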

PERFORMANCE EVALUATION: PREDICTIVE POWER

We use the validation set method, with 250,000 patients for training and 7,633 patients for validation, together with the area under the curve (AUC) of the receiver operating characteristic (ROC) function, to evaluate the performance of the proposed FCTBN model. We also compare the FCTBN with two existing methods from the literature: unsupervised multilevel temporal Bayesian networks (MTBN) [4] and latent regression Markov mixture clustering (LRMCL) [19]. The step-by-step procedure for training and testing the compared algorithms is provided in appendix A. For the comparisons, given the patients’ existing MCC in the base year, which can be any combination of the 5 MCC including no condition, we use each of the compared methods to predict the future combination of conditions for the next 5 years. Table 2 reports the AUC of the compared methods for each of the five conditions (presented in the columns) for 2 to 5 years from the baseline (presented in the rows). As can be seen from the table, the proposed functional CTBN generally provides better accuracy than MTBN and LRMCL for 4 out of the 5 conditions (Depression, Substance Abuse, PTSD, and Back Pain) over both short- and long-term predictions (2–5 years). However, it shows less predictive power than MTBN for forecasting TBI. One explanation for the lower prediction accuracy for TBI may be the distinct temporal behavior of TBI occurrences. At the patient level, TBI occurrence is generally a more singular event with chronic clinical ramifications that are coded separately in the electronic medical record. Meanwhile, the performance gap improves for the longer predictions, i.e., years 4 and 5, as the FCTBN captures the temporal pattern of staying in the TBI state more effectively.
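For readers unfamiliar with the metric, the AUC of the ROC curve equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The sketch below computes it directly from that definition; the labels and scores are toy values, and `roc_auc` is a hypothetical helper rather than the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = condition present in the target year, 0 = absent.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of patients, which is acceptable for a validation set of a few thousand; a rank-based formula would be preferable for much larger sets.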

Table 2.

The AUC performance (of ROC) of the Functional CTBN (FCTBN) model for predicting future conditions in comparison to MTBN and LRMCL.

	Depression			Substance Abuse			PTSD			Back Pain			TBI
	FCTBN	MTBN	LRMCL	FCTBN	MTBN	LRMCL	FCTBN	MTBN	LRMCL	FCTBN	MTBN	LRMCL	FCTBN	MTBN	LRMCL
Year 2	75.89%	66.92%	67.34%	76.61%	72.09%	71.91%	79.72%	78.31%	67.02%	73.53%	64.28%	66.35%	65.72%	72.11%	67.59%
Year 3	70.98%	65.96%	56.12%	72.34%	69.09%	59.54%	75.75%	74.95%	63.88%	69.65%	62.16%	56.48%	63.36%	70.08%	57.03%
Year 4	68.92%	64.70%	49.61%	70.75%	68.02%	53.00%	73.76%	73.36%	62.25%	67.79%	55.32%	50.18%	63.33%	69.19%	51.02%
Year 5	67.40%	64.11%	46.44%	69.54%	59.41%	42.87%	72.38%	71.92%	61.18%	66.57%	62.66%	48.08%	62.92%	69.56%	49.21%

It may also be worth noting that the predictive performance of the functional CTBN as shown in Table 2 is based on the model trained on risk factors with reduced dimension (1st principal component of the risk factors) to improve the computational time, while the MTBN and LRMCL take advantage of the original risk factors. Therefore, we believe training the functional CTBN model with the complete risk factors may further enhance its predictive performance (with the trade-off of increasing the computational time). We believe the improved performance of the functional CTBN is partly because of the proposed adaptive group regularization-based learning framework for structure and parameter learning. Specifically, we believe the use of group regularization, in the way proposed in the manuscript, results in fewer yet more informative connections, which improve the training and querying (inference) process. We have shown this concept in one of our related research works [66]. We also believe allowing the model to contain loops (bidirectional connections) helps better capture the (reinforcing) dynamics of the multiple chronic conditions. Additionally, we think explicit capturing of the time can positively affect the learning of parameters of the model, even though the prediction task is actually on discrete time stamps, i.e., years.

TRAJECTORY ANALYSIS

An interesting feature of the proposed functional CTBN is the trajectory analysis of state variables (MCC conditions). Here, we demonstrate two cases of MCC trajectory analysis for different preexisting (parent) conditions and age groups (exogenous variables). In the first case, we investigate the effect of age (groups) on the trajectory of Substance Abuse given TBI and PTSD as the preexisting conditions. Figure 6 shows the most probable trajectory for the emergence of substance abuse for different age groups over the next 24 months given TBI and PTSD in the base month. It can be seen that the 51 and older age group is more prone to be diagnosed with substance abuse than the younger age groups, with the probability of developing substance abuse for this group exceeding 80% just after four months. In contrast, the 18–30 age group reaches 80% after 10 months, the 31–40 age group passes the 80% mark after 7 months, and the 41–50 group meets it after 5 months.
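To give a feel for how a higher conditional intensity translates into an earlier threshold crossing, the sketch below uses a deliberately simplified setting: a single transition with a constant monthly intensity q, so the onset probability is 1 - exp(-q t). The trajectories in the study come from the full network, and the intensity values used here are hypothetical, not estimates from the paper.

```python
import math


def onset_probability(t_months, monthly_intensity):
    """P(onset by time t) for one transition with constant intensity."""
    return 1.0 - math.exp(-monthly_intensity * t_months)


def months_to_reach(prob_threshold, monthly_intensity):
    """Time (in months) at which the onset probability reaches the threshold."""
    if not 0.0 < prob_threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if monthly_intensity <= 0.0:
        raise ValueError("intensity must be positive")
    return -math.log(1.0 - prob_threshold) / monthly_intensity


# Hypothetical intensities for two groups; the larger one crosses 80% sooner.
slow_group = months_to_reach(0.8, 0.2)
fast_group = months_to_reach(0.8, 0.4)
```

Under this simplification, doubling the intensity halves the time needed to reach any fixed probability level.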

FIGURE 6.

The risk trajectory of developing Substance Abuse disorder over time for patients of different age groups who are diagnosed with TBI and PTSD at baseline.

This is on par with findings in the medical literature. In a study of service members with mild TBI, Miller et al. [59] found an increased risk for addiction-related disorders, including alcohol and nicotine. In a separate study of 6,824 military personnel, Adams et al. [60] conducted a path-based analysis to examine the association of binge alcohol drinking with TBI and PTSD. They found that almost 70% of the total effect of TBI on binge drinking came from the direct path, and only 30% represented the indirect effect through PTSD. Graham et al. [65] found a decrease in substance abuse post-TBI in a younger age group, likely motivated by significant influences on lifestyle choices and functional status given proper support from the VA. In the second case, we investigate the effect of age (groups) on the trajectory of depression given PTSD as the existing (prior) condition. Figure 7 shows the most probable trajectory for the emergence of depression for different age groups over the next 24 months given PTSD in the base month. As can be seen from figure 7, the probability of developing depression after PTSD increases (almost) linearly over time, but with a different slope for different age groups. Unlike the first case, here the younger patients, i.e., the 18–30 age group, are more likely to develop depression than the other age groups. As the (blue) trajectory line in figure 7 shows, the 18–30 age group trajectory has a considerably high slope, reaching a risk of 50% after 20 months. Meanwhile, the slope of the trajectory line decreases for older age groups; e.g., the purple line in the figure shows only a marginal increase in the risk of depression for patients aged 51 and older. These differences in age group findings may reflect variability in clinical screening approaches, provider biases, and differences in clinical priorities across these patient populations, resulting in an increased or decreased likelihood of being diagnosed with these conditions.
For example, younger veterans have been undergoing a widespread national screening program to identify PTSD and to establish treatment and follow-up, which would likely lead to the additional diagnosis of depression, a known comorbid condition.

FIGURE 7.

The risk trajectory of developing Depression over time for patients of different age groups who are pre-diagnosed with PTSD.

The medical literature also supports this result. Lippa et al. [61] used factor analysis to identify patterns of comorbidity in a sample of 255 previously deployed post-9/11 service members and veterans who participated in a structured clinical interview. They found that over 90% of the patients had psychiatric conditions, and approximately half had three or more conditions. They also identified four clinically relevant psychiatric and behavioral factors (a deployment trauma factor, a somatic factor, an anxiety factor, and a substance abuse factor), which account for 76.9% of the variance in the data. They concluded that depression, PTSD, and a history of military mild TBI could comprise a harmful combination associated with a high risk for substantial disability. In a separate study, Duncan et al. [62] found that 36% of depressed patients screened positive for PTSD. Kobayashi et al. [63] found that younger and middle-aged VA primary care patients had more severe PTSD symptoms than older patients; however, it was impossible to disentangle the effects of age from the passage of time, limiting interpretation of the findings. In a separate study, women reported an average age of onset of depression of 23–24 years. About half the women were experiencing a mild-to-moderate depressive episode, with 47% experiencing a severe depressive episode [64]. Such information can help medical practitioners develop more individualized plans to prevent the emergence of new chronic conditions according to the patient’s specific risk factors and prior conditions.

CONCLUSION

In this paper, we propose a functional continuous time Bayesian network with conditional dependencies represented by regularized Poisson regression, whose structure and parameters can both be learned by solving a non-smooth convex optimization problem. While most Bayesian networks are sensitive to time granularity, the proposed functional CTBN can model finite-state continuous time Markov processes over a set of factored states at various time granularities. The FCTBN allows for extracting the probability distribution of various combinations of events at different times with respect to any predetermined values of exogenous variables. The model also uses an adaptive group regularization method to learn a sparse representation of the system. For the case study, we have used the proposed FCTBN to model the complex temporal relationships among multiple chronic conditions with respect to patient-level risk factors, based on a dataset from the Department of Veterans Affairs. The proposed model provides a considerable improvement in prediction performance in comparison to multilevel temporal Bayesian networks (MTBN) and latent regression Markov mixture clustering (LRMCL). It also effectively characterizes the trajectory of a medical condition over time for different sets of preexisting medical conditions and risk factors. The proposed FCTBN allows for the personalization of the predictions and therefore has both population- and patient-level applications. It can also inform clinicians about the emergence trajectory of MCC over time and the significant risk factors affecting the trajectory, which helps to guide clinical care to prevent or delay the onset of new chronic conditions.

1. A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects.

Authors: Wilson Truccolo; Uri T Eden; Matthew R Fellows; John P Donoghue; Emery N Brown
Journal: J Neurophysiol Date: 2004-09-08 Impact factor: 2.714

2. Risk for addiction-related disorders following mild traumatic brain injury in a large cohort of active-duty U.S. airmen.

Authors: Shannon C Miller; Suzanne H Baktash; Timothy S Webb; Casserly R Whitehead; Charles Maynard; Timothy S Wells; Clifford N Otte; Russel K Gore
Journal: Am J Psychiatry Date: 2013-04 Impact factor: 18.112

3. Impact of PTSD comorbidity on one-year outcomes in a depression trial.

Authors: Bonnie L Green; Janice L Krupnick; Joyce Chung; Juned Siddique; Elizabeth D Krause; Dennis Revicki; Lori Frank; Jeanne Miranda
Journal: J Clin Psychol Date: 2006-07

4. An update on substance use and treatment following traumatic brain injury. (Review)

Authors: David P Graham; Aaron L Cardon
Journal: Ann N Y Acad Sci Date: 2008-10 Impact factor: 5.691

5. Comorbidity assessments based on patient report: results from the Veterans Health Study.

Authors: Alfredo J Selim; Graeme Fincke; Xinhua S Ren; Austin Lee; William H Rogers; Donald R Miller; Katherine M Skinner; Mark Linzer; Lewis E Kazis
Journal: J Ambul Care Manage Date: 2004 Jul-Sep

6. Multilevel temporal Bayesian networks can model longitudinal change in multimorbidity.

Authors: Martijn Lappenschaar; Arjen Hommersom; Peter J F Lucas; Joep Lagro; Stefan Visscher; Joke C Korevaar; François G Schellevis
Journal: J Clin Epidemiol Date: 2013-09-12 Impact factor: 6.437

7. Polysomnographically measured sleep abnormalities in PTSD: a meta-analytic review. (Review)

Authors: Ihori Kobayashi; Jessica M Boarts; Douglas L Delahanty
Journal: Psychophysiology Date: 2007-05-22 Impact factor: 4.016

8. An Integrated Framework for Reducing Hospital Readmissions using Risk Trajectories Characterization and Discharge Timing Optimization.

Authors: Adel Alaeddini; Jonathan E Helm; Pengyi Shi; Syed Hasib Akhter Faruqui
Journal: IISE Trans Healthc Syst Eng Date: 2019-04-19

9. Deployment-related psychiatric and behavioral conditions and their association with functional disability in OEF/OIF/OND veterans.

Authors: Sara M Lippa; Jennifer R Fonda; Catherine B Fortier; Melissa A Amick; Alexandra Kenna; William P Milberg; Regina E McGlinchey
Journal: J Trauma Stress Date: 2015-02

10. Mining patterns of comorbidity evolution in patients with multiple chronic conditions using unsupervised multi-level temporal Bayesian network.

Authors: Syed Hasib Akhter Faruqui; Adel Alaeddini; Carlos A Jaramillo; Jennifer S Potter; Mary Jo Pugh
Journal: PLoS One Date: 2018-07-12 Impact factor: 3.240
