Literature DB >> 30001371

Mining patterns of comorbidity evolution in patients with multiple chronic conditions using unsupervised multi-level temporal Bayesian network.

Syed Hasib Akhter Faruqui¹, Adel Alaeddini¹, Carlos A Jaramillo², Jennifer S Potter³, Mary Jo Pugh⁴.

Abstract

Over the past few decades, the rise of multiple chronic conditions has become a major concern for clinicians. However, it is still not known precisely how multiple chronic conditions emerge among patients. We propose an unsupervised multi-level temporal Bayesian network to provide a compact representation of the relationships among the emergence of multiple chronic conditions and patient-level risk factors over time. To improve the efficiency of the learning process, we use an extension of the maximum weight spanning tree algorithm and a greedy search algorithm to learn the structure of the proposed network in three stages: first, the relationships among comorbidities within each year; next, the relationships of comorbidity emergence between consecutive years; and finally, the hierarchical relationships between comorbidities and patient-level risk factors. We also use a longest path algorithm to identify the most likely sequence of comorbidities emerging from and/or leading to specific chronic conditions. Using a de-identified dataset of more than 250,000 patients receiving care from the U.S. Department of Veterans Affairs over a period of five years, we compare the performance of the proposed unsupervised Bayesian network with that of Bayesian networks developed using supervised and semi-supervised learning approaches, as well as multivariate probit regression, multinomial logistic regression, and latent regression Markov mixture clustering, focusing on traumatic brain injury (TBI), post-traumatic stress disorder (PTSD), depression (Depr), substance abuse (SuAb), and back pain (BaPa). Our findings show that the unsupervised approach has noticeably accurate predictive performance, comparable to that of the best-performing semi-supervised and the second-best-performing supervised approaches. 
These findings also revealed that the unsupervised approach has improved performance over multivariate probit regression, multinomial logistic regression, and latent regression Markov mixture clustering.

Year:  2018        PMID: 30001371      PMCID: PMC6042705          DOI: 10.1371/journal.pone.0199768

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

For nearly a decade, clinicians caring for Veterans with traumatic brain injury (TBI) have described multimorbidity among those with persistent post-concussion symptoms and other commonly occurring comorbidities. This constellation was first described as the Polytrauma Clinical Triad (PCT), which included TBI, post-traumatic stress disorder (PTSD), and pain [1]. At the national level, approximately six percent of Post-9/11 Veterans were diagnosed with TBI, PTSD, and pain in 2009 [2]. The number of Veterans receiving care for PCT increased by a total of 144% over the triennium, and their health care costs were four times higher than those of similar Post-9/11 Veterans without TBI [3]. PCT has also been associated with sleep disturbance [4] and suicide [5]. Miller et al. [6] showed that service members with mild TBI are at increased risk for addiction-related disorders, including alcohol and nicotine. They also observed that mild TBI is distinguished from moderate to severe TBI in terms of the timing of the risk, indicating a need for rigorous TBI clinical screening. Corrigan et al. [7] identified seven clusters of lifetime history of TBI in relation to substance abuse problems. These clusters were characterized by injury severity, age at injury occurrence, and periods of repeated TBI. The clusters differed in their contribution to predicting future consequences, which suggests an underlying complexity in the medical history that results in substance abuse. Adams et al. [8] conducted a path-based analysis to examine the association of binge alcohol drinking with TBI and PTSD. The study sample included 6,824 military personnel and revealed that PTSD and a prior history of TBI both had a direct effect on binge drinking. Lippa et al. [9] used factor analysis to identify patterns of comorbidity in a sample of 255 previously deployed Post-9/11 service members and veterans who participated in a structured clinical interview. 
They found that over 90% of the patients had psychiatric conditions, and approximately half had three or more conditions. They also identified four clinically relevant psychiatric and behavioral factors (a deployment trauma factor, a somatic factor, an anxiety factor, and a substance abuse factor) that accounted for 76.9% of the variance in the data. They concluded that depression, PTSD, and a history of military mild TBI can form a harmful combination associated with a high risk of substantial disability. Using a broader range of comorbid conditions, Pugh et al. [10] used latent class analysis (LCA) to identify longitudinal comorbidity phenotypes in previously deployed Post-9/11 Veterans based on diagnoses received during the first three years of care in the Veterans Health Administration (VA). In analyses stratified by sex, they found five phenotypes that were consistent in men and women: Healthy, Chronic Disease, Pain, Mental Health, and Polytrauma Clinical Triad (Mental Health, TBI, and Pain). These subgroups demonstrated an increasing likelihood of having relevant diagnoses over time. Pugh et al. [11] showed that these comorbidity phenotypes are associated with measures of community reintegration, with individuals in the PCT and Mental Health groups being significantly more likely to report difficulty in the transition from military to civilian life, lower levels of social support, and higher rates of unemployment. Alaeddini et al. [12] developed a Latent Regression Markov Mixture Clustering (LRMCL) algorithm to identify major transitions among four MCC, namely hypertension (HTN), depression, PTSD, and back pain, in a cohort of 601,805 Iraq and Afghanistan war Veterans (IAVs). The LRMCL algorithm was also able to predict the exact status of comorbidities about 48% of the time. Zador et al. 
[13] used logistic regression to predict TBI outcome in the dataset of the corticosteroid randomisation after significant head injury (CRASH) trial, where they used a Bayesian network to assess the dependencies between the predictors of the model. This provided strong insight into formalizing clinical intuition about the demographic predictors being used. Zador et al. [14] employed a similar approach based on Bayesian networks to visualize the probabilistic associations between outcome predictors of acute aneurysmal subarachnoid hemorrhage. Cai et al. [15] also established a Bayesian network using a tree-augmented naïve Bayes algorithm to mine relationships between factors influencing hepatocellular carcinoma after hepatectomy. While Bayesian networks can be used to predict disease prevalence, it is also feasible to use them to provide accurate personalized survival estimates and treatment selection based on patient-specific variables. Sesen et al. [16] used such a Bayesian network model to create a decision support system for lung cancer care. Forsberg et al. [17] developed and trained a machine-learned Bayesian belief network model to estimate survival time in patients with operable skeletal metastases, using candidate features based on historical data. Stojadinovic et al. [18] also used a machine-learned Bayesian belief network to provide clinical decision support in estimating overall survival among colon cancer patients based on a set of prognostic factors at 12-, 24-, 36-, and 60-month post-treatment follow-up. Moreover, Bayesian networks can be used to study the evolutionary course of multiple disorders. Lappenschaar et al. [19] used a large dataset to develop a multilevel temporal Bayesian network to model the progression of six chronic cardiovascular conditions. While interesting, most of these studies have been cross-sectional or have focused on a relatively short period of time. 
Moreover, while these methods describe general comorbidity phenotypes, they do not provide insight into the impact of TBI and comorbid conditions on specific adverse outcomes. In this paper, we develop an unsupervised Multi-level Temporal Bayesian Network (MTBN) from big data to identify the relationships among the emergence of five deployment-related conditions (TBI, PTSD, Depr, SuAb, and BaPa) and patient-level risk factors (race, gender, age, education, and marital status) over time. We also use a Longest Path Algorithm (LPA) to identify the most probable sequence of comorbidities emerging from and/or leading to specific chronic conditions. Finally, we compare the performance of the proposed unsupervised MTBN with that of the semi-supervised and supervised MTBNs, as well as three baseline methods in the literature: multivariate probit regression [20], multinomial logistic regression [21], and Latent Regression Markov Mixture Clustering (LRMCL) [12].

Methods

Study population

The proposed study uses de-identified data from a large national cohort of patients (N = 608,503) who were deployed in support of the wars in Afghanistan and Iraq, who entered care in the Department of Veterans Affairs between 2002 and 2011, and who received care at least once a year in three different years between 2002 and 2015. For the purpose of this analysis, only patients with care in each of the first five consecutive years after entering VA care were included (n = 257,633). Dropout may result from not requiring care, leaving VA care, or death. Individuals in the cohort were identified using the roster of veterans who had been previously deployed in support of Operations Enduring Freedom, Iraqi Freedom, and New Dawn (OEF/OIF/OND roster). The inpatient and outpatient data were then obtained from the VA national databases in Austin, Texas. Data included ICD-9-CM diagnosis codes documented during each inpatient or outpatient encounter in the course of VA care. This study received institutional review board approval from the University of Texas Health Science Center at San Antonio and the Bedford VA Hospital with a waiver of informed consent.
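The five-year inclusion rule can be sketched as a simple filter. This is a minimal illustration, assuming encounters are available as hypothetical `(patient_id, days_since_entry)` pairs and that a "year of care" is a 365-day window counted from VA entry; neither the record layout nor the 365-day convention is specified in the study.

```python
from collections import defaultdict


def five_year_cohort(encounters, n_years=5, days_per_year=365):
    """Return IDs of patients with at least one encounter in each of the first n_years.

    encounters: iterable of (patient_id, days_since_entry) pairs (hypothetical layout).
    """
    years_seen = defaultdict(set)
    for pid, day in encounters:
        year = day // days_per_year  # 0-based care year since VA entry
        if 0 <= year < n_years:
            years_seen[pid].add(year)
    # Keep only patients observed in every one of the first n_years care years.
    return {pid for pid, years in years_seen.items() if len(years) == n_years}
```

A patient with encounters in care years 1 to 4 and then only in year 6 is excluded, which mirrors the dropout described above.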

Measures

Diagnosed health conditions

We used ICD-9-CM codes from inpatient and outpatient data (excluding ancillary and telephone care) to identify traumatic brain injury (TBI), post-traumatic stress disorder (PTSD), depression (Depr), substance abuse (SuAb), and back pain (BaPa) using validated published algorithms [22]. PTSD, SuAb, and BaPa required two diagnoses at least seven days apart, while TBI, which is an acute injury, required only a single diagnosis. Each condition was coded as “0” or “1” for each year of care, with 1 indicating a diagnosis of that condition regardless of the number of times it was diagnosed (see S1 Table for the ICD-9 codes of the conditions considered in this manuscript). Table 1 shows the prevalence of the five comorbidities in the final dataset based on the first five years of care in the VA.
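The yearly 0/1 coding can be sketched as follows. This is a minimal illustration under several assumptions. Diagnoses are taken to be already mapped from ICD-9-CM codes to hypothetical `(condition, care_year, date)` records. The two-diagnosis rule is applied within a single care year. Only PTSD, SuAb, and BaPa use that rule, as listed above. Depression's rule is not stated in the text, so here it defaults to a single diagnosis, which is an assumption.

```python
from collections import defaultdict
from datetime import date

CONDITIONS = ("TBI", "PTSD", "BaPa", "SuAb", "Depr")
# Conditions requiring two diagnoses >= 7 days apart, as listed in the text.
# Unlisted conditions (including Depr) default to one diagnosis: an assumption.
TWO_DX = {"PTSD", "SuAb", "BaPa"}


def yearly_flags(dx_records, n_years=5, min_gap_days=7):
    """Map (condition, care_year, date) records to {care_year: {condition: 0 or 1}}."""
    dates = defaultdict(list)
    for cond, year, when in dx_records:
        dates[(cond, year)].append(when)
    flags = {}
    for year in range(1, n_years + 1):
        flags[year] = {}
        for cond in CONDITIONS:
            ds = sorted(dates.get((cond, year), []))
            if cond in TWO_DX:
                # Some pair is >= min_gap_days apart iff the earliest and latest are.
                ok = len(ds) >= 2 and (ds[-1] - ds[0]).days >= min_gap_days
            else:
                ok = len(ds) >= 1
            flags[year][cond] = int(ok)  # 1 = diagnosed that year, however many times
    return flags
```

For example, two PTSD diagnoses two days apart leave PTSD coded as 0 for that year, while a single TBI diagnosis is enough to code TBI as 1.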
Table 1

The prevalence of the five comorbidities in the final dataset based on the five years of care.

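Each row is one combination of conditions (1 = diagnosed that year, 0 = not diagnosed); cells give the number of patients (percentage of all N = 257,633 patients).
TBI  PTSD BaPa SuAb Depr Year 1           Year 2           Year 3           Year 4           Year 5
0    0    0    0    0    137503 (53.37%)  109626 (42.55%)  105294 (40.87%)  103233 (40.07%)  103950 (40.35%)
0    0    0    0    1    10920 (4.24%)    12382 (4.81%)    12051 (4.68%)    11866 (4.61%)    11486 (4.46%)
0    0    0    1    0    3738 (1.45%)     4038 (1.57%)     3923 (1.52%)     3800 (1.47%)     3594 (1.40%)
0    0    0    1    1    2015 (0.78%)     2378 (0.92%)     2435 (0.95%)     2397 (0.93%)     2376 (0.92%)
0    0    1    0    0    30082 (11.68%)   32336 (12.55%)   31350 (12.17%)   30309 (11.76%)   29622 (11.50%)
0    0    1    0    1    4481 (1.74%)     5898 (2.29%)     5944 (2.31%)     5987 (2.32%)     5895 (2.29%)
0    0    1    1    0    1076 (0.42%)     1321 (0.51%)     1257 (0.49%)     1291 (0.50%)     1231 (0.48%)
0    0    1    1    1    656 (0.25%)      969 (0.38%)      1010 (0.39%)     1079 (0.42%)     1030 (0.40%)
0    1    0    0    0    16570 (6.43%)    20550 (7.98%)    22515 (8.74%)    23350 (9.06%)    23414 (9.09%)
0    1    0    0    1    10084 (3.91%)    13313 (5.17%)    14803 (5.75%)    15201 (5.90%)    15513 (6.02%)
0    1    0    1    0    3015 (1.17%)     4363 (1.69%)     4899 (1.90%)     5352 (2.08%)     5459 (2.12%)
0    1    0    1    1    3542 (1.37%)     5152 (2.00%)     5844 (2.27%)     6173 (2.40%)     6366 (2.47%)
0    1    1    0    0    7878 (3.06%)     11031 (4.28%)    11962 (4.64%)    12641 (4.91%)    12953 (5.03%)
0    1    1    0    1    5539 (2.15%)     8619 (3.35%)     9576 (3.72%)     10266 (3.98%)    10899 (4.23%)
0    1    1    1    0    1355 (0.53%)     2088 (0.81%)     2438 (0.95%)     2696 (1.05%)     2879 (1.12%)
0    1    1    1    1    1755 (0.68%)     3096 (1.20%)     3554 (1.38%)     4055 (1.57%)     4252 (1.65%)
1    0    0    0    0    2834 (1.10%)     2515 (0.98%)     2000 (0.78%)     1699 (0.66%)     1562 (0.61%)
1    0    0    0    1    514 (0.20%)      500 (0.19%)      448 (0.17%)      403 (0.16%)      339 (0.13%)
1    0    0    1    0    198 (0.08%)      178 (0.07%)      161 (0.06%)      133 (0.05%)      121 (0.05%)
1    0    0    1    1    115 (0.04%)      137 (0.05%)      122 (0.05%)      99 (0.04%)       85 (0.03%)
1    0    1    0    0    1528 (0.59%)     1352 (0.52%)     922 (0.36%)      876 (0.34%)      793 (0.31%)
1    0    1    0    1    396 (0.15%)      387 (0.15%)      316 (0.12%)      293 (0.11%)      260 (0.10%)
1    0    1    1    0    116 (0.05%)      108 (0.04%)      74 (0.03%)       60 (0.02%)       81 (0.03%)
1    0    1    1    1    86 (0.03%)       86 (0.03%)       76 (0.03%)       75 (0.03%)       68 (0.03%)
1    1    0    0    0    2937 (1.14%)     3303 (1.28%)     3034 (1.18%)     2916 (1.13%)     2639 (1.02%)
1    1    0    0    1    1858 (0.72%)     2293 (0.89%)     2253 (0.87%)     2187 (0.85%)     1990 (0.77%)
1    1    0    1    0    565 (0.22%)      787 (0.31%)      782 (0.30%)      801 (0.31%)      764 (0.30%)
1    1    0    1    1    834 (0.32%)      1182 (0.46%)     1195 (0.46%)     1216 (0.47%)     1171 (0.45%)
1    1    1    0    0    2429 (0.94%)     2995 (1.16%)     2878 (1.12%)     2677 (1.04%)     2435 (0.95%)
1    1    1    0    1    1804 (0.70%)     2741 (1.06%)     2518 (0.98%)     2452 (0.95%)     2396 (0.93%)
1    1    1    1    0    535 (0.21%)      678 (0.26%)      733 (0.28%)      702 (0.27%)      714 (0.28%)
1    1    1    1    1    675 (0.26%)      1231 (0.48%)     1266 (0.49%)     1348 (0.52%)     1296 (0.50%)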
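Counts like those in Table 1 can be tabulated directly from per-patient flags. This is a minimal sketch, assuming each patient's conditions for one care year are stored as a hypothetical 5-tuple of 0/1 values in the table's column order (TBI, PTSD, BaPa, SuAb, Depr). Enumerating patterns with `itertools.product` reproduces the table's row order, with TBI as the leading digit, and keeps empty patterns.

```python
from collections import Counter
from itertools import product


def pattern_prevalence(patterns):
    """Count each 5-condition 0/1 pattern and its share of patients for one care year.

    patterns: list of 5-tuples of 0/1 (TBI, PTSD, BaPa, SuAb, Depr), one per patient.
    Returns {pattern: (count, percent)} for all 32 patterns, in binary order.
    """
    n = len(patterns)
    counts = Counter(patterns)
    return {p: (counts[p], 100.0 * counts[p] / n) for p in product((0, 1), repeat=5)}
```

As a check against the table, 137,503 of 257,633 patients with no condition in year 1 gives 53.37%.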

Demographic characteristics

Sociodemographic characteristics included age at VA entry, sex, race/ethnicity and education. Age was identified during the first year of VA care. Based on our prior work, we categorized age as 18-30, 31-40, 41-50, and 51 and older. Sex, race/ethnicity (White, African American, Hispanic, Asian/Pacific Islander, Native American, unknown) and education (less than high school, high school graduate, some college, college graduate, post-college education) at the time of leaving the military were obtained from the OEF/OIF/OND Roster; missing values for race and sex were supplemented from VA data. Marital status was obtained from VA data (Table 2).
Table 2

Demographics of the patients included in the study.

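Cells give the number of patients (percentage of all N = 257,633 patients).
Characteristic               White            Black            Hispanic         Asian            Native           Unknown
Gender: Male                 148355 (57.58%)  35758 (13.88%)   25373 (9.85%)    5639 (2.19%)     3081 (1.20%)     2135 (0.83%)
Gender: Female               19183 (7.45%)    11828 (4.59%)    4232 (1.64%)     981 (0.38%)      707 (0.27%)      361 (0.14%)
Marital status: Married      74487 (28.91%)   23308 (9.05%)    14523 (5.64%)    3067 (1.19%)     1747 (0.68%)     1346 (0.52%)
Marital status: Unmarried    93051 (36.12%)   24278 (9.42%)    15082 (5.85%)    3553 (1.38%)     2041 (0.79%)     1150 (0.45%)
Age group: 18-30             96799 (37.57%)   20047 (7.78%)    17016 (6.60%)    3235 (1.26%)     2115 (0.82%)     1062 (0.41%)
Age group: 31-40             36003 (13.97%)   12468 (4.84%)    6606 (2.56%)     1361 (0.53%)     925 (0.36%)      625 (0.24%)
Age group: 41-50             26167 (10.16%)   12710 (4.93%)    4758 (1.85%)     1564 (0.61%)     564 (0.22%)      673 (0.26%)
Age group: 51 and older      8569 (3.33%)     2361 (0.92%)     1225 (0.48%)     460 (0.18%)      184 (0.07%)      136 (0.05%)
Education: Unknown           2334 (0.91%)     658 (0.26%)      386 (0.15%)      131 (0.05%)      60 (0.02%)       51 (0.02%)
Education: < High school     2037 (0.79%)     504 (0.20%)      360 (0.14%)      60 (0.02%)       60 (0.02%)       22 (0.01%)
Education: High school       129921 (50.43%)  37506 (14.56%)   23592 (9.16%)    4732 (1.84%)     3004 (1.17%)     1808 (0.70%)
Education: Some college      16743 (6.50%)    4819 (1.87%)     2933 (1.14%)     598 (0.23%)      376 (0.15%)      287 (0.11%)
Education: College graduate  12024 (4.67%)    3160 (1.23%)     1893 (0.73%)     879 (0.34%)      217 (0.08%)      223 (0.09%)
Education: Post-college      4479 (1.74%)     939 (0.36%)      441 (0.17%)      220 (0.09%)      71 (0.03%)       105 (0.04%)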

Bayesian networks

In this section, we describe the proposed unsupervised approach, along with the supervised and semi-supervised approaches, for structure learning in Bayesian networks to mine patterns of comorbidity evolution in patients with Multiple Chronic Conditions (MCC). We also present a longest path algorithm to identify the most likely path leading to the emergence of a chronic condition. This is one of the first studies to demonstrate the inherent association among chronic conditions and demographics in the emergence of new conditions. Fig 1 shows the general scheme of the proposed approach.
Fig 1

Framework.

General Scheme of the proposed method.

A Bayesian network [23-27] is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). In this study, two sets of variables are considered: (1) low-level binary variables representing the presence or absence of a chronic condition, namely TBI, PTSD, BaPa, Depr, and SuAb, and (2) high-level discrete variables representing demographic factors, namely race, gender, marital status, age group, and education. When the structure of a Bayesian network (DAG) is known, the joint probability distribution over the random variables $X_1, \dots, X_n$ can be derived from the dependencies represented in the graph:

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\left(X_i \mid \mathrm{Pa}(X_i)\right) \qquad (1)$$

where $\mathrm{Pa}(X_i)$ denotes the parents of $X_i$ in the DAG. Given a dataset of variable observations, i.e., MCC occurrences, Eq (1) can be used to calculate the likelihood and estimate the conditional probabilities of the Bayesian network [27-31].
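To make the factorization concrete, the sketch below evaluates the joint probability of one full assignment in a toy network. The structure (Age to PTSD, Age to Depr, PTSD to Depr) and every probability in it are hypothetical illustrations, not estimates from the study. Each node contributes its conditional probability given the values of its parents.

```python
# Hypothetical toy network: Age -> PTSD, Age -> Depr, PTSD -> Depr.
PARENTS = {"Age": (), "PTSD": ("Age",), "Depr": ("Age", "PTSD")}
# CPT[node][parent_values] = P(node = 1 | parents); all numbers are made up.
CPT = {
    "Age": {(): 0.3},  # P(Age = 1), e.g. an "older" age group
    "PTSD": {(0,): 0.2, (1,): 0.1},
    "Depr": {(0, 0): 0.05, (0, 1): 0.4, (1, 0): 0.08, (1, 1): 0.5},
}


def joint_probability(assignment):
    """Product over nodes of P(X_i | Pa(X_i)) for a full 0/1 assignment."""
    p = 1.0
    for node, parents in PARENTS.items():
        p1 = CPT[node][tuple(assignment[q] for q in parents)]
        p *= p1 if assignment[node] == 1 else 1.0 - p1
    return p
```

For example, `joint_probability({"Age": 0, "PTSD": 1, "Depr": 1})` is 0.7 × 0.2 × 0.4 = 0.056. Summing over all eight assignments gives 1, as the factorization guarantees.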

Structural learning

In most real-world cases, such as studies of multiple chronic conditions, the Bayesian network structure (DAG) is partially or completely unknown and must be learned along with the conditional probabilities. Generally, the Bayesian network structure can be learned using: (1) unsupervised learning algorithms, including score-based and constraint-based algorithms, (2) supervised learning algorithms, where expert knowledge provides the DAG, and (3) semi-supervised methods, which combine unsupervised and supervised learning.

Unsupervised learning algorithm

One efficient unsupervised structure learning approach begins by finding the Maximum Weight Spanning Tree (MWST) of the data, using mutual information as the edge weight, to provide an initial variable (node) ordering in O(n²) steps [32]. Next, a greedy search method such as the K2 algorithm [33] is used to build the DAG structure incrementally based on maximum likelihood in O(n²) steps.

Temporal learning

The unsupervised learning algorithm can be applied to temporal datasets as well. However, when the number of time slices, and consequently the number of variables, increases, as in MCC evolution over time, the computational cost of the MWST algorithm grows considerably. Therefore, to improve the efficiency of the learning process, we apply the MWST algorithm to each time slice separately and then integrate the results into a single topological ordering based on their time slices (see Fig 2).
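The per-slice ordering step can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation:

- Each variable is a column of discrete values.
- Edge weights are empirical mutual information.
- The maximum weight spanning tree is grown with Prim's algorithm.
- Each tree is turned into an ordering by breadth-first traversal from a chosen `root`.

The text does not say how the tree is converted into an ordering, so the root and the traversal rule are assumptions. Slice orderings are concatenated in year order, so every year-t variable precedes every year-(t+1) variable.

```python
import math
from collections import Counter, deque


def mutual_information(xs, ys):
    """Empirical mutual information (nats) between two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mwst_order(columns, root=0):
    """Order variables by BFS over a maximum weight spanning tree (Prim's algorithm)."""
    k = len(columns)
    w = [[mutual_information(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    in_tree, adj = {root}, {i: [] for i in range(k)}
    while len(in_tree) < k:
        # Heaviest edge connecting the current tree to a variable outside it.
        _, i, j = max((w[i][j], i, j) for i in in_tree for j in range(k) if j not in in_tree)
        adj[i].append(j)
        adj[j].append(i)
        in_tree.add(j)
    order, queue, seen = [], deque([root]), {root}
    while queue:
        v = queue.popleft()
        order.append(v)
        for u in adj[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return order


def temporal_order(slices, root=0):
    """Concatenate per-slice orderings; slices[t] holds the columns of year t + 1."""
    return [(t + 1, v) for t, cols in enumerate(slices) for v in mwst_order(cols, root)]
```

Running the MWST on each five-variable slice instead of on all 25 variables at once keeps every mutual-information matrix at 5 × 5.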
Fig 2

Unsupervised method.

Implementation of the Proposed Algorithm (Unsupervised).

Supervised learning algorithm (expert knowledge)

This approach uses expert opinion, e.g., physicians' input, and/or earlier related studies to specify the complete network structure (e.g., Fig 3). The conditional probabilities can then be estimated using Maximum Likelihood Estimation (MLE) [27], Bayesian Estimation [28], or the Expectation Maximization (EM) algorithm (for incomplete data) [30]. In our study, we use a literature review to identify the complete network structure, which is assumed to be consistent across time slices. We then use maximum likelihood parameter estimation to calculate the conditional probabilities from the study dataset.
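For complete data, maximum likelihood estimation of a conditional probability table reduces to counting. P(X = x | Pa = u) is the number of records with X = x and Pa = u, divided by the number of records with Pa = u. A minimal sketch follows, with hypothetical variable names such as `PTSD_1` (PTSD in year 1). Parent configurations that never occur in the data are simply absent here. Bayesian estimation with a prior is one way to fill them.

```python
from collections import Counter


def mle_cpt(records, child, parents):
    """Maximum likelihood CPT as {parent_values: {child_value: probability}}.

    records: list of dicts mapping variable name -> observed value (complete data).
    """
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in records)
    margin = Counter(tuple(r[p] for p in parents) for r in records)
    cpt = {}
    for (u, x), c in joint.items():
        cpt.setdefault(u, {})[x] = c / margin[u]  # N(x, u) / N(u)
    return cpt
```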
Fig 3

Supervised method.

The network structure inferred from literature review.

Semi-supervised method

This approach combines the previous two methods. First, the initial ordering of the MCC nodes is derived from expert opinion and/or literature review. Next, the ordered nodes are passed to the K2 algorithm to obtain the final network structure and conditional probabilities.

Inclusion of hierarchical level

To improve the predictive performance of the Bayesian network, demographic variables such as race, sex, age group, and education can be added to the DAG as higher-level variables. For this purpose, we connect every demographic variable to every condition, with the direction of the connections constrained to run from demographics to chronic conditions. We assume no dependence among the demographic variables themselves and therefore avoid any connections between them. The conditional probabilities linking the demographic variables to the chronic conditions can be learned in the same way as the other conditional dependencies in the network.
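The K2 step shared by the unsupervised and semi-supervised variants can be sketched as follows. Given a node ordering, each node greedily adds the predecessor that most improves its score. It stops when no addition helps or when a parent limit is reached.

The sketch makes these assumptions:

- It uses the classic Cooper-Herskovits (K2) Bayesian score, computed with log-gamma functions. The text describes the search as based on maximum likelihood, so a likelihood-based score such as BIC could be substituted.
- The `max_parents` limit and the column-per-variable data layout are assumptions.
- The fixed demographic-to-condition edges described above are not handled here.

```python
import math
from collections import Counter


def k2_log_score(data, child, parents):
    """Cooper-Herskovits log score of `child` given `parents`.

    data: dict mapping variable name -> list of discrete values (all the same length).
    """
    r = len(set(data[child]))  # number of observed states of the child
    n = len(data[child])
    configs = [tuple(data[p][m] for p in parents) for m in range(n)]
    n_ij = Counter(configs)
    n_ijk = Counter(zip(configs, data[child]))
    score = 0.0
    for nij in n_ij.values():
        score += math.lgamma(r) - math.lgamma(nij + r)  # log((r-1)! / (N_ij + r - 1)!)
    for c in n_ijk.values():
        score += math.lgamma(c + 1)  # log(N_ijk!)
    return score


def k2(data, order, max_parents=2):
    """Greedy K2 search: parents of each node are chosen only among its predecessors."""
    parents = {}
    for idx, node in enumerate(order):
        chosen = []
        best = k2_log_score(data, node, chosen)
        candidates = list(order[:idx])
        while len(chosen) < max_parents and candidates:
            score, pick = max((k2_log_score(data, node, chosen + [c]), c) for c in candidates)
            if score <= best:
                break  # no remaining predecessor improves the score
            best = score
            chosen.append(pick)
            candidates.remove(pick)
        parents[node] = chosen
    return parents
```

Because parents are drawn only from earlier nodes in the ordering, the result is acyclic by construction. This is why the quality of the initial ordering matters.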

Longest path algorithm (LPA)

Once the Bayesian network structure and the probabilities of the conditional dependencies have been estimated, various queries can be answered to support better decision making by practitioners seeking to slow down or stop the progression of targeted conditions. Among the most important is identifying the most likely sequence of comorbidities emerging from and/or leading to specific chronic conditions, e.g., the most likely path between TBI in year 1 and substance abuse in year 5. Such a query can be answered by treating the preexisting condition (e.g., TBI) as the source node and the target condition (e.g., SuAb) as the sink node, and finding the longest path between them on the graph. When only the initial chronic condition (source node) or only the terminal chronic condition (sink node) is of interest, e.g., finding the most likely path (from any comorbidity) to substance abuse in year 5, we can introduce a dummy source node into the MTBN, connect it to all of the conditions in the first year, and find the longest path between the dummy source node and the sink node (e.g., SuAb in year 5) on the graph. All the algorithms discussed in this section are included in S1 File.
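Because the MTBN is a DAG, the longest path can be found by dynamic programming over a topological order, in time linear in the number of edges. The text does not state the edge weights. This sketch assumes each weight is the log of an edge probability, so the longest path maximizes the product of probabilities along it. The node names, the weights, and the `add_dummy_source` helper are hypothetical.

```python
import math
from graphlib import TopologicalSorter  # Python 3.9+


def longest_path(edges, source, sink):
    """Maximum-weight source-to-sink path in a DAG.

    edges: dict mapping (u, v) -> weight. Returns (total_weight, path), or None if unreachable.
    """
    preds = {}
    for u, v in edges:
        preds.setdefault(v, set()).add(u)
        preds.setdefault(u, set())
    best = {source: (0.0, [source])}
    for v in TopologicalSorter(preds).static_order():  # predecessors come first
        if v == source:
            continue  # every path must start at the source
        for u in preds[v]:
            if u in best:
                cand = best[u][0] + edges[(u, v)]
                if v not in best or cand > best[v][0]:
                    best[v] = (cand, best[u][1] + [v])
    return best.get(sink)


def add_dummy_source(edges, first_year_nodes, name="START"):
    """Connect a hypothetical dummy node to every year-1 condition with weight 0."""
    extended = dict(edges)
    for v in first_year_nodes:
        extended[(name, v)] = 0.0
    return extended
```

With made-up weights such as `{("TBI_1", "PTSD_2"): math.log(0.6), ("PTSD_2", "SuAb_3"): math.log(0.5), ("TBI_1", "SuAb_3"): math.log(0.2)}`, the path TBI_1, PTSD_2, SuAb_3 (product 0.3) beats the direct edge (0.2).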

Results

Learned Bayesian structure

Fig 4 illustrates the proposed unsupervised Multi-level Temporal Bayesian Network (MTBN). The illustrated MTBN consists of five time slices, one for each year of care; five demographic variables, which enable practitioners to customize the network to specific patients; and the probabilistic dependencies among the five prevalent chronic conditions in the dataset (see S1 Fig for the supervised and semi-supervised MTBNs).
Fig 4

Unsupervised network.

Learned BN structure from the proposed method (Unsupervised Method).

Comparing the unsupervised MTBN (Fig 4) with the supervised and semi-supervised MTBNs, we found a high degree of similarity among the conditional dependencies, as shown in Table 3.
Table 3

Similarity between the learned matrices.

Measure (unsupervised vs. other)   Supervised   Semi-supervised
Cosine similarity                  79.0%        93.8%
Correlation coefficient            77.0%        93.0%
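One plausible way to compute the measures in Table 3 is shown below. It assumes each learned structure is represented as a 0/1 adjacency matrix over a common node order and that the matrices are flattened to vectors before comparison. The text does not spell out the exact procedure, so this is an illustration only.

```python
import math


def flatten(matrix):
    """Row-major flattening of a list-of-lists matrix."""
    return [x for row in matrix for x in row]


def cosine_similarity(a, b):
    """Cosine of the angle between two flattened matrices."""
    a, b = flatten(a), flatten(b)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pearson(a, b):
    """Pearson correlation coefficient between two flattened matrices."""
    a, b = flatten(a), flatten(b)
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```

The two measures differ in one respect: cosine similarity counts only shared edges, while the correlation also rewards shared absences of edges relative to the overall edge density.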

Performance comparisons

We use the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve [34], based on 10-fold cross validation, to compare the performance of the unsupervised MTBN with the semi-supervised and supervised MTBNs, as well as three baseline methods in the literature: multivariate probit regression [20], multinomial logistic regression [21], and Latent Regression Markov Mixture Clustering (LRMCL) [12]. Table 4 reports the AUC of the competing methods for predicting future comorbidities, given comorbidity information from past years. For example, the first block of rows in Table 4 shows the prediction accuracy for year-2 to year-5 comorbidities, given year-1 comorbidities. Likewise, the second block shows the prediction accuracy for year-3 to year-5 comorbidities, given year-1 and year-2 comorbidities. LRMCL can only incorporate comorbidity information from the immediately preceding year [12], so it appears only in the first block of Table 4. For all other competing methods, we report AUC for each amount of evidence, from year 1 alone up to years 1 to 4. As shown in Table 4, the MTBNs provide the best overall performance among the competing methods, followed by probit and logistic regression and finally LRMCL. Among the MTBNs, the semi-supervised method demonstrates the best overall performance, followed by the supervised and then the unsupervised method. Meanwhile, the unsupervised method, which uses no expert information or supervision for structure learning, provides performance comparable to that of the best-performing methods. Moreover, for all competing methods, prediction accuracy generally increases as more information (comorbidity information from past years) is provided.
Table 4

The Area Under the Curve (AUC) performance of the competing methods for predicting future comorbidities, given comorbidity information of past years.

Evidence provided: Year 1
Predicted  Method              TBI     PTSD    BaPa    SuAb    Depr
Year 2     Unsupervised        72.11%  78.31%  64.28%  72.09%  66.92%
Year 2     Semi-supervised     73.81%  80.10%  74.18%  79.34%  75.02%
Year 2     Supervised          75.08%  80.30%  74.42%  79.50%  76.42%
Year 2     Multinomial Logit   69.42%  72.54%  69.37%  77.16%  69.96%
Year 2     Multinomial Probit  69.23%  74.91%  69.81%  77.34%  70.70%
Year 2     LRMCL               67.59%  67.02%  66.35%  71.91%  67.34%
Year 3     Unsupervised        70.08%  74.95%  62.16%  69.09%  65.96%
Year 3     Semi-supervised     71.20%  76.89%  70.94%  75.38%  71.21%
Year 3     Supervised          71.35%  76.96%  71.61%  75.58%  71.56%
Year 3     Multinomial Logit   66.69%  70.53%  67.03%  72.98%  64.59%
Year 3     Multinomial Probit  66.07%  71.78%  67.29%  73.16%  64.93%
Year 3     LRMCL               57.03%  63.88%  56.48%  59.54%  56.12%
Year 4     Unsupervised        69.19%  73.36%  55.32%  68.02%  64.70%
Year 4     Semi-supervised     70.40%  75.10%  69.55%  74.60%  69.78%
Year 4     Supervised          70.11%  75.18%  68.61%  74.37%  68.90%
Year 4     Multinomial Logit   69.14%  67.75%  66.22%  72.19%  62.87%
Year 4     Multinomial Probit  68.50%  68.66%  66.31%  72.24%  63.10%
Year 4     LRMCL               51.02%  62.25%  50.18%  53.00%  49.61%
Year 5     Unsupervised        69.56%  71.92%  62.66%  59.41%  64.11%
Year 5     Semi-supervised     70.58%  73.64%  68.21%  72.82%  68.47%
Year 5     Supervised          67.18%  73.61%  65.39%  72.90%  67.29%
Year 5     Multinomial Logit   65.46%  66.45%  64.20%  71.32%  60.97%
Year 5     Multinomial Probit  64.35%  67.36%  64.27%  71.44%  61.13%
Year 5     LRMCL               49.21%  61.18%  48.08%  42.87%  46.44%

Evidence provided: Years 1-2
Predicted  Method              TBI     PTSD    BaPa    SuAb    Depr
Year 3     Unsupervised        79.49%  84.29%  64.06%  75.71%  73.93%
Year 3     Semi-supervised     80.30%  86.37%  81.18%  85.08%  82.15%
Year 3     Supervised          79.48%  87.11%  82.14%  85.11%  82.50%
Year 3     Multinomial Logit   72.59%  77.03%  73.80%  81.39%  71.40%
Year 3     Multinomial Probit  72.94%  79.79%  74.42%  81.58%  73.40%
Year 4     Unsupervised        76.17%  81.33%  55.10%  73.48%  72.35%
Year 4     Semi-supervised     77.03%  83.09%  77.72%  81.92%  78.47%
Year 4     Supervised          76.77%  83.90%  77.93%  81.91%  77.68%
Year 4     Multinomial Logit   72.28%  73.19%  71.55%  79.32%  66.83%
Year 4     Multinomial Probit  72.12%  74.88%  71.96%  79.51%  68.45%
Year 5     Unsupervised        75.93%  79.11%  68.25%  63.92%  70.64%
Year 5     Semi-supervised     76.60%  80.97%  75.87%  79.33%  75.79%
Year 5     Supervised          73.74%  81.33%  73.39%  79.65%  74.20%
Year 5     Multinomial Logit   71.45%  71.96%  69.24%  76.24%  65.26%
Year 5     Multinomial Probit  70.88%  73.54%  69.55%  76.46%  66.33%

Evidence provided: Years 1-3
Predicted  Method              TBI     PTSD    BaPa    SuAb    Depr
Year 4     Unsupervised        81.48%  85.38%  50.48%  74.11%  76.98%
Year 4     Semi-supervised     81.52%  87.37%  82.85%  87.23%  84.39%
Year 4     Supervised          81.73%  88.60%  83.49%  87.71%  84.19%
Year 4     Multinomial Logit   76.82%  78.32%  73.52%  85.43%  71.88%
Year 4     Multinomial Probit  77.06%  80.98%  74.25%  85.76%  74.40%
Year 5     Unsupervised        79.80%  82.58%  75.06%  64.77%  74.85%
Year 5     Semi-supervised     80.21%  84.58%  79.93%  83.50%  80.06%
Year 5     Supervised          78.71%  85.28%  79.19%  84.63%  78.99%
Year 5     Multinomial Logit   74.07%  76.57%  71.25%  79.80%  68.93%
Year 5     Multinomial Probit  73.96%  78.52%  71.77%  80.10%  70.69%

Evidence provided: Years 1-4
Predicted  Method              TBI     PTSD    BaPa    SuAb    Depr
Year 5     Unsupervised        83.47%  85.35%  81.99%  63.88%  78.35%
Year 5     Semi-supervised     83.61%  87.10%  83.77%  87.66%  84.35%
Year 5     Supervised          82.87%  89.06%  83.83%  88.67%  84.97%
Year 5     Multinomial Logit   77.44%  79.00%  74.18%  83.22%  72.75%
Year 5     Multinomial Probit  77.66%  81.76%  74.83%  83.61%  75.31%
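The evaluation described above can be sketched without external libraries. The AUC equals the probability that a randomly chosen patient who has the condition receives a higher predicted probability than a randomly chosen patient who does not, with ties counted as one half. This is the Mann-Whitney formulation. 10-fold cross validation partitions patients into ten disjoint folds.

The text does not state whether fold-level AUCs were pooled or averaged. The helper names and the fixed `seed` are assumptions. The pairwise AUC loop is quadratic and meant for illustration; a rank-based version scales better.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC: probability a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal, disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]
```

For example, labels `[0, 0, 1, 1]` with scores `[0.1, 0.4, 0.35, 0.8]` give an AUC of 0.75, since three of the four positive-negative pairs are ranked correctly.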
To illustrate the predicted frequency of each comorbid condition given the information of past years, Table 5 shows an example of the conditional probabilities of the comorbidities in year two, given year-one data, under the unsupervised, semi-supervised, and supervised MTBNs. The example is for a sample patient with the following risk factors: gender (male), marital status (unmarried), education (less than high school), race (white), and age (18-30). For economy of space, we use the following coding system to represent the comorbidities in the table: no comorbidity (“0”), TBI (“1”), PTSD (“2”), BaPa (“3”), SuAb (“4”), and Depr (“5”) (see S2 Table for a detailed description). To highlight the larger transition probabilities among comorbidities, higher probabilities are shaded with a darker color. To interpret the table, the numbers on the main diagonal show the probabilities of retaining the conditions, the numbers above the main diagonal show the probabilities of adding new conditions, and the numbers below the diagonal show the probabilities of remission from one or more existing conditions from year one to year two. As the table shows, the conditional probabilities and the patterns of high transition probabilities are considerably similar across the unsupervised, semi-supervised, and supervised methods. The most prominent pattern across all three MTBNs is retention of existing comorbidities, which is intuitive. The other major pattern is remission from existing conditions (numbers below the main diagonal), which is more likely for a small number of comorbidities, i.e., one or two.
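The state codes used in Table 5 can be decoded mechanically. This sketch maps a code such as “24” to its set of conditions and labels a year-1 to year-2 transition by comparing the two sets. The labels "retention", "addition", and "remission" follow the wording above. "Mixed" (a condition gained while another is lost) is a category introduced here for illustration only. The function names are hypothetical.

```python
CODE = {"1": "TBI", "2": "PTSD", "3": "BaPa", "4": "SuAb", "5": "Depr"}


def decode(code):
    """'0' -> empty set; '24' -> {'PTSD', 'SuAb'}."""
    return frozenset() if code == "0" else frozenset(CODE[c] for c in code)


def transition_type(code_year1, code_year2):
    """Classify a year-1 -> year-2 change in the set of diagnosed conditions."""
    before, after = decode(code_year1), decode(code_year2)
    gained, lost = after - before, before - after
    if not gained and not lost:
        return "retention"
    if gained and not lost:
        return "addition"
    if lost and not gained:
        return "remission"
    return "mixed"  # illustrative category: some conditions gained, others lost
```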
Table 5

Conditional probabilities of the comorbidities in year two, given year-one data, for a sample patient with the following risk factors: gender (male), marital status (unmarried), education (less than high school), race (white), and age (18-30).

Unsupervised Method
Rows: comorbidity state in year 1; columns: comorbidity state in year 2.
State      0     1     2     3     4     5    12    13    14    15    23    24    25    34    35    45   123   124   125   134   135   145   234   235   245   345  1234  1235  1345  1245  2345 12345
0      44.2%  0.4% 15.5%  0.3% 15.1%  0.2%  3.9%  0.2%  7.6%  0.3%  1.8%  0.2%  1.8%  0.2%  0.6%  0.2%  3.1%  0.1%  0.7%  0.1%  0.7%  0.1%  0.2%  0.1%  1.2%  0.1%  0.3%  0.1%  0.3%  0.1%  0.1%  0.1%
1      22.0%  1.6% 18.4%  1.2% 18.2%  0.9%  9.8%  0.8%  7.5%  1.2%  2.9%  0.8%  2.9%  0.6%  1.7%  0.5%  2.3%  0.3%  0.9%  0.2%  0.9%  0.2%  0.5%  0.1%  1.4%  0.2%  0.6%  0.2%  0.6%  0.1%  0.3%  0.1%
2      24.5%  3.9% 12.1%  2.3% 12.1%  2.6%  4.2%  1.7%  6.2%  2.9%  2.7%  1.6%  2.8%  1.9%  1.6%  1.2%  2.8%  1.2%  1.1%  0.7%  1.2%  0.8%  0.6%  0.5%  1.6%  1.1%  0.8%  0.6%  0.9%  0.7%  0.5%  0.5%
3       6.8%  8.6%  9.6%  5.6% 13.0%  5.7%  6.4%  4.1%  6.1%  4.4%  3.2%  2.7%  3.5%  2.7%  2.2%  1.9%  1.9%  1.2%  1.0%  0.8%  1.1%  0.8%  0.7%  0.6%  1.2%  0.9%  0.6%  0.5%  0.7%  0.5%  0.4%  0.4%
4      24.7%  3.6% 13.8%  2.9% 14.4%  2.0%  5.1%  1.7%  7.5%  2.7%  3.3%  2.2%  3.0%  1.5%  1.7%  1.3%  1.3%  0.6%  0.7%  0.5%  0.5%  0.4%  0.4%  0.3%  0.8%  0.6%  0.5%  0.5%  0.4%  0.3%  0.3%  0.3%
5       4.5%  8.4% 12.8%  6.9%  9.0%  5.0%  6.7%  4.4%  6.2%  4.9%  4.6%  4.1%  3.4%  2.9%  2.8%  2.6%  1.1%  1.0%  0.9%  0.8%  0.6%  0.5%  0.5%  0.5%  0.9%  0.8%  0.7%  0.7%  0.5%  0.5%  0.4%  0.4%
12      3.6% 10.1%  8.1%  6.3%  7.9%  5.9%  4.4%  3.9%  7.0%  6.4%  4.1%  3.8%  3.9%  3.7%  2.4%  2.3%  1.7%  1.6%  1.0%  1.0%  1.0%  0.9%  0.6%  0.6%  1.4%  1.4%  0.9%  0.9%  0.8%  0.8%  0.5%  0.5%
13      0.5% 13.2%  9.8%  9.4%  8.7%  8.3%  6.3%  6.1%  5.0%  5.0%  3.3%  3.3%  3.1%  3.1%  2.1%  2.1%  1.2%  1.1%  0.8%  0.8%  0.7%  0.7%  0.5%  0.5%  0.9%  0.9%  0.6%  0.6%  0.5%  0.5%  0.3%  0.3%
14      7.8%  0.3% 10.1%  0.2% 10.5%  0.2%  3.0%  0.2% 37.1%  0.3%  7.6%  0.2%  7.9%  0.2%  2.2%  0.2%  4.0%  0.1%  0.9%  0.1%  0.9%  0.1%  0.3%  0.0%  3.6%  0.1%  0.8%  0.1%  0.8%  0.1%  0.3%  0.0%
15      4.8%  1.4% 12.9%  0.9% 12.7%  0.8%  7.0%  0.6% 23.2%  1.3%  8.7%  0.9%  8.4%  0.8%  4.6%  0.6%  2.2%  0.3%  0.9%  0.2%  0.9%  0.2%  0.5%  0.1%  2.0%  0.3%  0.9%  0.2%  0.8%  0.2%  0.5%  0.1%
23      5.9%  3.4%  9.1%  2.0%  9.7%  2.5%  3.9%  1.6% 20.6%  3.3%  6.2%  2.0%  6.7%  2.4%  2.9%  1.6%  2.9%  0.9%  1.1%  0.6%  1.2%  0.7%  0.6%  0.4%  2.6%  0.9%  1.0%  0.6%  1.1%  0.7%  0.6%  0.4%
24      1.8%  6.3%  6.5%  4.0%  8.7%  4.4%  4.5%  3.0% 10.9%  6.1%  5.2%  3.9%  6.3%  4.2%  3.7%  2.9%  1.9%  1.5%  1.1%  0.9%  1.2%  0.9%  0.7%  0.7%  1.8%  1.4%  1.0%  0.9%  1.1%  0.9%  0.7%  0.7%
25      4.9%  3.5%  9.4%  2.9%  8.7%  2.0%  3.8%  1.7% 22.0%  3.4%  7.2%  2.8%  7.1%  2.0%  3.2%  1.7%  2.3%  0.8%  1.0%  0.7%  0.8%  0.5%  0.5%  0.4%  2.2%  0.8%  1.0%  0.7%  0.8%  0.5%  0.5%  0.4%
34      1.6%  6.3%  9.6%  4.8%  7.0%  3.7%  4.9%  3.1% 11.7%  6.1%  7.2%  4.7%  5.7%  3.6%  4.1%  2.9%  1.4%  1.0%  1.0%  0.8%  0.7%  0.6%  0.6%  0.5%  1.3%  1.0%  0.9%  0.8%  0.7%  0.6%  0.6%  0.5%
35      0.8%  8.0%  6.0%  5.0%  5.7%  4.7%  3.5%  3.2%  9.5%  7.7%  5.3%  4.8%  5.1%  4.6%  3.2%  3.0%  2.0%  1.8%  1.2%  1.2%  1.1%  1.1%  0.8%  0.7%  2.0%  1.8%  1.2%  1.2%  1.1%  1.1%  0.8%  0.7%
45      0.3%  8.5%  6.1%  5.8%  5.7%  5.4%  3.9%  3.8%  8.5%  8.2%  5.6%  5.5%  5.2%  5.1%  3.6%  3.6%  1.4%  1.4%  1.0%  0.9%  0.9%  0.9%  0.6%  0.6%  1.4%  1.4%  0.9%  0.9%  0.9%  0.8%  0.6%  0.6%
123    22.7%  0.4% 12.1%  0.3% 11.9%  0.2%  3.4%  0.2%  7.5%  0.3%  1.8%  0.2%  1.8%  0.2%  0.7%  0.2% 19.3%  0.3%  4.2%  0.2%  4.1%  0.2%  1.3%  0.1%  3.7%  0.2%  0.9%  0.2%  0.9%  0.1%  0.4%  0.1%
124    10.2%  1.6% 13.8%  1.2% 13.4%  0.9%  7.7%  0.8%  6.6%  1.2%  2.8%  0.9%  2.7%  0.7%  1.6%  0.6% 11.3%  0.9%  4.6%  0.7%  4.3%  0.5%  2.7%  0.5%  2.7%  0.7%  1.2%  0.5%  1.2%  0.4%  0.8%  0.4%
125    10.8%  3.4%  8.7%  2.0%  9.1%  2.4%  3.5%  1.6%  5.3%  2.5%  2.4%  1.4%  2.5%  1.7%  1.4%  1.1% 11.7%  2.4%  3.7%  1.4%  4.0%  1.7%  1.8%  1.1%  3.2%  1.8%  1.5%  1.1%  1.6%  1.3%  1.0%  0.8%
134     2.5%  6.2%  6.4%  4.1%  8.5%  4.3%  4.4%  3.1%  4.8%  3.7%  2.7%  2.3%  2.9%  2.4%  1.8%  1.6%  6.3%  3.5%  3.1%  2.4%  3.8%  2.5%  2.3%  1.8%  2.6%  2.2%  1.5%  1.4%  1.6%  1.4%  1.0%  1.0%
135    13.3%  3.4% 10.8%  2.9%  9.9%  1.9%  4.2%  1.7%  6.4%  2.5%  3.1%  2.1%  2.6%  1.4%  1.6%  1.2%  7.7%  2.0%  3.2%  1.7%  2.1%  1.2%  1.4%  1.0%  2.6%  1.5%  1.6%  1.3%  1.2%  0.9%  0.9%  0.8%
145     1.6%  5.9%  7.9%  5.0%  5.5%  3.4%  4.4%  3.1%  5.0%  4.4%  4.0%  3.7%  2.8%  2.6%  2.4%  2.3%  3.9%  3.2%  3.2%  2.7%  2.3%  1.8%  1.9%  1.6%  2.6%  2.6%  2.3%  2.2%  1.5%  1.5%  1.3%  1.3%
234     1.5%  7.5%  5.9%  4.8%  5.5%  4.4%  3.3%  3.0%  5.0%  4.7%  3.0%  2.9%  2.9%  2.7%  1.8%  1.7%  5.1%  4.4%  3.1%  2.9%  2.8%  2.6%  1.8%  1.8%  3.0%  2.9%  1.8%  1.8%  1.7%  1.7%  1.1%  1.1%
235     0.2%  8.3%  6.1%  5.9%  5.3%  5.0%  3.7%  3.7%  4.6%  4.5%  3.1%  3.1%  2.8%  2.7%  1.9%  1.9%  4.3%  4.2%  3.0%  3.0%  2.5%  2.4%  1.8%  1.7%  2.7%  2.7%  1.8%  1.8%  1.6%  1.6%  1.1%  1.1%
245     4.0%  0.2%  7.1%  0.2%  7.5%  0.1%  2.3%  0.1% 26.6%  0.2%  5.5%  0.2%  5.6%  0.1%  1.7%  0.1% 13.4%  0.1%  2.8%  0.1%  2.9%  0.1%  0.9%  0.1% 11.7%  0.1%  2.5%  0.1%  2.5%  0.1%  0.8%  0.1%
345     2.4%  1.2%  9.5%  0.8%  9.3%  0.7%  5.5%  0.6% 16.9%  1.2%  6.7%  0.8%  6.5%  0.7%  3.8%  0.6%  7.1%  0.8%  3.1%  0.6%  3.0%  0.5%  1.9%  0.4%  6.2%  0.8%  2.7%  0.6%  2.6%  0.5%  1.6%  0.4%
1234    3.0%  2.5%  6.6%  1.6%  7.2%  1.9%  3.0%  1.2% 15.1%  2.5%  4.7%  1.5%  5.0%  1.8%  2.2%  1.2%  8.7%  1.6%  2.8%  1.0%  3.2%  1.2%  1.5%  0.8%  7.1%  1.6%  2.4%  1.0%  2.6%  1.2%  1.3%  0.8%
1235    0.8%  4.8%  4.6%  3.1%  5.9%  3.3%  3.2%  2.3%  7.6%  4.7%  3.9%  3.0%  4.5%  3.2%  2.8%  2.2%  4.6%  3.2%  2.5%  2.1%  2.8%  2.1%  1.8%  1.5%  4.2%  3.2%  2.4%  2.0%  2.6%  2.1%  1.7%  1.5%
1345    2.8%  2.7%  7.2%  2.3%  6.1%  1.6%  2.9%  1.4% 16.5%  2.7%  5.6%  2.2%  5.1%  1.6%  2.5%  1.4%  7.4%  1.6%  2.8%  1.4%  2.0%  0.9%  1.2%  0.8%  6.7%  1.6%  2.6%  1.4%  2.0%  0.9%  1.2%  0.8%
1245    0.8%  4.7%  7.0%  3.7%  5.0%  2.8%  3.7%  2.4%  8.3%  4.6%  5.4%  3.6%  4.1%  2.7%  3.1%  2.3%  4.0%  2.6%  2.9%  2.1%  2.1%  1.6%  1.7%  1.4%  3.7%  2.6%  2.6%  2.1%  2.0%  1.6%  1.6%  1.3%
2345    0.5%  5.9%  4.5%  3.8%  4.2%  3.6%  2.6%  2.4%  7.1%  5.8%  4.1%  3.8%  3.9%  3.5%  2.5%  2.4%  4.1%  3.5%  2.5%  2.3%  2.3%  2.1%  1.5%  1.5%  4.0%  3.5%  2.4%  2.3%  2.3%  2.1%  1.5%  1.5%
12345   0.1%  6.1%  4.4%  4.2%  4.1%  3.8%  2.8%  2.7%  6.1%  5.9%  4.2%  4.1%  3.8%  3.7%  2.7%  2.6%  3.6%  3.5%  2.5%  2.4%  2.2%  2.2%  1.6%  1.6%  3.5%  3.4%  2.4%  2.4%  2.2%  2.1%  1.5%  1.5%
Semi-Supervised Method
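| | 0 | 1 | 2 | 3 | 4 | 5 | 12 | 13 | 14 | 15 | 23 | 24 | 25 | 34 | 35 | 45 | 123 | 124 | 125 | 134 | 135 | 145 | 234 | 235 | 245 | 345 | 1234 | 1235 | 1345 | 1245 | 2345 | 12345 |
| 0 | 34.1% | 6.8% | 4.7% | 1.9% | 9.1% | 1.8% | 1.0% | 0.5% | 11.5% | 4.0% | 2.6% | 1.3% | 3.3% | 1.3% | 0.8% | 0.4% | 4.1% | 1.0% | 0.7% | 0.3% | 1.5% | 0.5% | 0.3% | 0.1% | 2.6% | 0.9% | 0.6% | 0.3% | 1.1% | 0.4% | 0.3% | 0.1% |
| 1 | 8.0% | 27.0% | 4.8% | 4.1% | 6.5% | 4.9% | 1.0% | 0.9% | 9.4% | 7.7% | 2.3% | 2.1% | 2.7% | 2.1% | 0.7% | 0.6% | 2.8% | 2.2% | 0.7% | 0.6% | 1.0% | 0.8% | 0.3% | 0.3% | 1.8% | 1.5% | 0.7% | 0.6% | 0.8% | 0.6% | 0.3% | 0.2% |
| 2 | 10.3% | 5.8% | 25.1% | 4.7% | 6.4% | 1.5% | 4.3% | 1.1% | 8.1% | 2.7% | 6.4% | 2.2% | 2.3% | 0.9% | 1.9% | 0.8% | 2.9% | 0.7% | 2.2% | 0.6% | 1.1% | 0.3% | 0.8% | 0.3% | 1.8% | 0.6% | 1.5% | 0.5% | 0.8% | 0.3% | 0.6% | 0.3% |
| 3 | 3.7% | 16.0% | 16.8% | 12.8% | 4.1% | 3.0% | 2.9% | 2.3% | 6.0% | 4.8% | 4.7% | 3.9% | 1.7% | 1.4% | 1.3% | 1.1% | 1.8% | 1.4% | 1.5% | 1.2% | 0.7% | 0.5% | 0.5% | 0.4% | 1.2% | 0.9% | 0.9% | 0.8% | 0.5% | 0.4% | 0.4% | 0.3% |
| 4 | 11.0% | 6.3% | 3.7% | 1.5% | 31.3% | 4.9% | 2.7% | 1.2% | 8.8% | 3.3% | 2.0% | 1.0% | 6.8% | 2.6% | 1.5% | 0.8% | 2.2% | 0.6% | 0.4% | 0.2% | 1.9% | 0.5% | 0.3% | 0.2% | 1.3% | 0.5% | 0.3% | 0.2% | 1.2% | 0.5% | 0.3% | 0.2% |
| 5 | 2.2% | 19.7% | 3.3% | 2.9% | 18.8% | 14.0% | 2.4% | 2.1% | 6.4% | 5.1% | 1.5% | 1.3% | 4.9% | 3.8% | 1.2% | 1.0% | 1.3% | 1.0% | 0.3% | 0.3% | 1.1% | 0.9% | 0.3% | 0.3% | 0.8% | 0.7% | 0.3% | 0.3% | 0.7% | 0.6% | 0.3% | 0.2% |
| 12 | 2.9% | 4.8% | 17.6% | 3.6% | 19.1% | 3.7% | 12.2% | 2.7% | 5.4% | 2.0% | 4.2% | 1.6% | 4.2% | 1.6% | 3.3% | 1.3% | 1.3% | 0.4% | 1.0% | 0.3% | 1.2% | 0.3% | 0.9% | 0.3% | 0.8% | 0.3% | 0.7% | 0.3% | 0.7% | 0.3% | 0.6% | 0.3% |
| 13 | 1.0% | 11.9% | 11.8% | 9.1% | 11.9% | 8.6% | 8.2% | 6.3% | 4.0% | 3.2% | 3.2% | 2.5% | 3.1% | 2.4% | 2.4% | 1.9% | 0.8% | 0.6% | 0.7% | 0.5% | 0.7% | 0.6% | 0.6% | 0.5% | 0.5% | 0.4% | 0.4% | 0.3% | 0.5% | 0.4% | 0.4% | 0.3% |
| 14 | 7.5% | 6.0% | 4.9% | 1.6% | 6.4% | 1.2% | 0.8% | 0.3% | 34.8% | 5.8% | 4.7% | 1.5% | 5.7% | 1.2% | 0.8% | 0.3% | 4.8% | 0.8% | 0.7% | 0.2% | 1.4% | 0.3% | 0.2% | 0.1% | 4.6% | 0.8% | 0.7% | 0.2% | 1.3% | 0.3% | 0.2% | 0.1% |
| 15 | 2.4% | 17.4% | 3.3% | 2.6% | 3.9% | 2.8% | 0.5% | 0.4% | 22.2% | 16.3% | 3.1% | 2.5% | 3.6% | 2.7% | 0.5% | 0.4% | 2.9% | 2.1% | 0.5% | 0.4% | 0.8% | 0.6% | 0.2% | 0.1% | 2.8% | 2.1% | 0.5% | 0.4% | 0.8% | 0.6% | 0.2% | 0.1% |
| 23 | 2.8% | 4.7% | 16.4% | 3.8% | 3.6% | 0.9% | 2.5% | 0.7% | 19.9% | 4.4% | 14.5% | 3.6% | 3.2% | 0.8% | 2.4% | 0.7% | 2.8% | 0.6% | 2.1% | 0.5% | 0.8% | 0.2% | 0.6% | 0.2% | 2.6% | 0.6% | 2.0% | 0.5% | 0.8% | 0.2% | 0.6% | 0.2% |
| 24 | 1.2% | 10.5% | 10.9% | 8.3% | 2.4% | 1.7% | 1.7% | 1.4% | 13.4% | 9.8% | 10.1% | 7.9% | 2.2% | 1.7% | 1.7% | 1.3% | 1.7% | 1.3% | 1.3% | 1.1% | 0.5% | 0.4% | 0.4% | 0.3% | 1.7% | 1.3% | 1.3% | 1.0% | 0.5% | 0.4% | 0.4% | 0.3% |
| 25 | 2.4% | 4.3% | 2.9% | 1.1% | 18.0% | 3.4% | 2.2% | 0.8% | 20.6% | 4.1% | 2.8% | 1.0% | 15.4% | 3.3% | 2.1% | 0.8% | 3.0% | 0.6% | 0.4% | 0.2% | 2.4% | 0.5% | 0.4% | 0.1% | 2.8% | 0.6% | 0.4% | 0.2% | 2.3% | 0.5% | 0.4% | 0.1% |
| 34 | 0.8% | 11.1% | 2.1% | 1.6% | 11.4% | 8.3% | 1.5% | 1.2% | 13.8% | 10.3% | 1.9% | 1.5% | 10.3% | 7.7% | 1.5% | 1.1% | 1.8% | 1.4% | 0.3% | 0.3% | 1.5% | 1.1% | 0.3% | 0.2% | 1.8% | 1.3% | 0.3% | 0.3% | 1.5% | 1.1% | 0.3% | 0.2% |
| 35 | 1.0% | 3.2% | 10.1% | 2.6% | 10.7% | 2.5% | 7.3% | 2.0% | 12.3% | 3.0% | 9.0% | 2.4% | 9.2% | 2.3% | 6.7% | 1.9% | 1.8% | 0.4% | 1.3% | 0.4% | 1.5% | 0.4% | 1.1% | 0.3% | 1.7% | 0.4% | 1.3% | 0.3% | 1.4% | 0.4% | 1.1% | 0.3% |
| 45 | 0.4% | 6.9% | 6.9% | 5.4% | 7.0% | 5.1% | 5.0% | 4.0% | 8.5% | 6.4% | 6.4% | 5.1% | 6.3% | 4.8% | 4.8% | 3.9% | 1.1% | 0.8% | 0.9% | 0.7% | 0.9% | 0.7% | 0.7% | 0.6% | 1.1% | 0.8% | 0.9% | 0.7% | 0.9% | 0.7% | 0.7% | 0.6% |
| 123 | 17.3% | 5.7% | 4.3% | 1.7% | 5.5% | 1.2% | 0.7% | 0.3% | 10.5% | 3.6% | 2.4% | 1.2% | 2.2% | 0.8% | 0.5% | 0.3% | 17.9% | 3.0% | 2.3% | 1.0% | 3.1% | 0.7% | 0.5% | 0.2% | 6.1% | 2.1% | 1.4% | 0.7% | 1.5% | 0.6% | 0.3% | 0.2% |
| 124 | 3.0% | 17.3% | 3.9% | 3.4% | 3.3% | 2.5% | 0.7% | 0.6% | 8.7% | 7.2% | 2.6% | 2.3% | 1.8% | 1.4% | 0.6% | 0.5% | 10.0% | 7.4% | 2.1% | 1.9% | 1.9% | 1.4% | 0.5% | 0.4% | 4.7% | 3.9% | 1.7% | 1.5% | 1.1% | 0.9% | 0.4% | 0.4% |
| 125 | 4.9% | 4.4% | 18.6% | 3.6% | 3.6% | 0.9% | 2.5% | 0.7% | 6.9% | 2.3% | 5.6% | 1.9% | 1.4% | 0.6% | 1.2% | 0.5% | 11.6% | 2.1% | 8.4% | 1.8% | 2.0% | 0.5% | 1.5% | 0.4% | 4.0% | 1.3% | 3.4% | 1.1% | 0.9% | 0.4% | 0.8% | 0.3% |
| 134 | 1.4% | 10.9% | 11.8% | 9.1% | 2.2% | 1.6% | 1.7% | 1.3% | 5.7% | 4.6% | 4.5% | 3.8% | 1.2% | 0.9% | 0.9% | 0.7% | 6.5% | 4.9% | 5.5% | 4.3% | 1.2% | 0.9% | 1.0% | 0.8% | 3.1% | 2.5% | 2.5% | 2.1% | 0.7% | 0.6% | 0.6% | 0.5% |
| 135 | 5.6% | 4.4% | 2.8% | 1.2% | 19.8% | 3.4% | 2.1% | 0.9% | 6.9% | 2.5% | 1.6% | 0.8% | 5.4% | 2.1% | 1.2% | 0.7% | 10.5% | 2.0% | 1.4% | 0.6% | 8.5% | 1.8% | 1.2% | 0.5% | 3.6% | 1.4% | 0.9% | 0.5% | 3.3% | 1.3% | 0.8% | 0.4% |
| 145 | 1.0% | 11.8% | 2.5% | 2.2% | 11.1% | 8.4% | 1.9% | 1.6% | 5.7% | 4.5% | 1.6% | 1.4% | 4.4% | 3.5% | 1.3% | 1.2% | 5.8% | 4.4% | 1.2% | 1.1% | 4.8% | 3.7% | 1.1% | 1.0% | 2.8% | 2.2% | 1.0% | 0.9% | 2.5% | 2.0% | 0.9% | 0.8% |
| 234 | 1.5% | 3.3% | 11.9% | 2.6% | 12.3% | 2.6% | 8.3% | 2.0% | 4.3% | 1.6% | 3.4% | 1.3% | 3.4% | 1.3% | 2.7% | 1.1% | 6.5% | 1.4% | 4.7% | 1.2% | 5.3% | 1.3% | 3.8% | 1.0% | 2.2% | 0.9% | 1.9% | 0.8% | 2.1% | 0.8% | 1.7% | 0.7% |
| 235 | 0.5% | 7.5% | 7.9% | 6.1% | 7.3% | 5.4% | 5.5% | 4.3% | 3.7% | 3.0% | 3.0% | 2.4% | 2.9% | 2.3% | 2.3% | 1.8% | 3.8% | 2.9% | 3.3% | 2.5% | 3.2% | 2.4% | 2.7% | 2.1% | 1.8% | 1.4% | 1.5% | 1.2% | 1.7% | 1.3% | 1.3% | 1.1% |
| 245 | 3.7% | 4.3% | 3.6% | 1.2% | 4.5% | 0.9% | 0.6% | 0.2% | 24.5% | 4.1% | 3.5% | 1.1% | 4.1% | 0.9% | 0.6% | 0.2% | 13.4% | 2.1% | 1.9% | 0.6% | 2.6% | 0.5% | 0.4% | 0.1% | 12.2% | 2.1% | 1.8% | 0.6% | 2.5% | 0.5% | 0.4% | 0.1% |
| 345 | 1.1% | 12.2% | 2.6% | 2.1% | 2.8% | 2.0% | 0.4% | 0.3% | 15.8% | 11.6% | 2.5% | 2.0% | 2.6% | 2.0% | 0.4% | 0.3% | 8.0% | 5.7% | 1.5% | 1.2% | 1.6% | 1.2% | 0.3% | 0.2% | 7.6% | 5.6% | 1.5% | 1.2% | 1.6% | 1.2% | 0.3% | 0.2% |
| 1234 | 1.4% | 3.4% | 11.9% | 2.8% | 2.6% | 0.6% | 1.9% | 0.5% | 14.3% | 3.2% | 10.6% | 2.7% | 2.4% | 0.6% | 1.8% | 0.5% | 7.8% | 1.7% | 5.8% | 1.4% | 1.5% | 0.4% | 1.1% | 0.3% | 7.1% | 1.6% | 5.4% | 1.4% | 1.5% | 0.4% | 1.1% | 0.3% |
| 1235 | 0.6% | 7.6% | 8.0% | 6.1% | 1.7% | 1.3% | 1.3% | 1.1% | 9.8% | 7.2% | 7.6% | 5.9% | 1.6% | 1.3% | 1.3% | 1.0% | 5.0% | 3.6% | 3.8% | 3.0% | 1.0% | 0.8% | 0.8% | 0.6% | 4.8% | 3.5% | 3.8% | 2.9% | 1.0% | 0.8% | 0.8% | 0.6% |
| 1345 | 1.4% | 3.1% | 2.2% | 0.8% | 12.3% | 2.4% | 1.6% | 0.6% | 14.8% | 2.9% | 2.1% | 0.8% | 10.9% | 2.3% | 1.5% | 0.6% | 8.2% | 1.6% | 1.2% | 0.4% | 6.5% | 1.3% | 0.9% | 0.4% | 7.5% | 1.5% | 1.1% | 0.4% | 6.0% | 1.3% | 0.9% | 0.4% |
| 1245 | 0.5% | 7.9% | 1.7% | 1.3% | 7.9% | 5.8% | 1.2% | 0.9% | 10.0% | 7.4% | 1.6% | 1.3% | 7.3% | 5.5% | 1.2% | 0.9% | 5.2% | 3.8% | 0.9% | 0.7% | 4.1% | 3.0% | 0.8% | 0.6% | 4.9% | 3.7% | 0.9% | 0.7% | 3.9% | 3.0% | 0.8% | 0.6% |
| 2345 | 0.6% | 2.3% | 7.5% | 1.9% | 7.5% | 1.8% | 5.3% | 1.4% | 9.0% | 2.2% | 6.7% | 1.8% | 6.6% | 1.7% | 4.9% | 1.4% | 5.0% | 1.2% | 3.7% | 1.0% | 3.9% | 1.0% | 2.9% | 0.8% | 4.5% | 1.1% | 3.4% | 0.9% | 3.6% | 0.9% | 2.8% | 0.8% |
| 12345 | 0.2% | 5.0% | 5.1% | 4.0% | 5.0% | 3.7% | 3.7% | 3.0% | 6.3% | 4.7% | 4.8% | 3.9% | 4.6% | 3.5% | 3.5% | 2.9% | 3.3% | 2.4% | 2.5% | 2.0% | 2.6% | 1.9% | 2.0% | 1.6% | 3.1% | 2.4% | 2.4% | 2.0% | 2.5% | 1.9% | 2.0% | 1.6% |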
Supervised Method
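| | 0 | 1 | 2 | 3 | 4 | 5 | 12 | 13 | 14 | 15 | 23 | 24 | 25 | 34 | 35 | 45 | 123 | 124 | 125 | 134 | 135 | 145 | 234 | 235 | 245 | 345 | 1234 | 1235 | 1345 | 1245 | 2345 | 12345 |
| 0 | 33.9% | 7.0% | 4.6% | 1.9% | 9.0% | 1.9% | 1.0% | 0.5% | 11.4% | 4.0% | 2.6% | 1.3% | 3.3% | 1.3% | 0.7% | 0.4% | 4.1% | 1.2% | 0.7% | 0.4% | 1.5% | 0.5% | 0.3% | 0.2% | 2.5% | 1.0% | 0.6% | 0.3% | 1.1% | 0.5% | 0.3% | 0.1% |
| 1 | 7.9% | 27.9% | 4.9% | 4.1% | 6.5% | 4.9% | 1.0% | 0.8% | 9.3% | 7.5% | 2.3% | 2.0% | 2.7% | 2.0% | 0.7% | 0.5% | 2.9% | 2.0% | 0.8% | 0.7% | 1.1% | 0.7% | 0.3% | 0.2% | 1.9% | 1.3% | 0.7% | 0.6% | 0.8% | 0.5% | 0.3% | 0.2% |
| 2 | 9.6% | 6.2% | 24.0% | 4.9% | 6.2% | 1.9% | 4.2% | 1.4% | 8.3% | 3.0% | 6.6% | 2.5% | 2.4% | 1.0% | 1.9% | 0.9% | 2.7% | 0.9% | 2.1% | 0.8% | 1.0% | 0.4% | 0.7% | 0.4% | 1.6% | 0.7% | 1.3% | 0.7% | 0.7% | 0.3% | 0.5% | 0.3% |
| 3 | 3.5% | 15.9% | 16.4% | 12.6% | 4.1% | 2.9% | 2.9% | 2.2% | 6.4% | 5.2% | 5.0% | 4.1% | 1.8% | 1.4% | 1.3% | 1.0% | 1.7% | 1.5% | 1.5% | 1.3% | 0.6% | 0.5% | 0.4% | 0.4% | 1.1% | 1.0% | 0.9% | 0.9% | 0.5% | 0.4% | 0.3% | 0.3% |
| 4 | 11.4% | 5.5% | 3.8% | 1.5% | 32.0% | 4.3% | 2.8% | 1.1% | 9.0% | 3.0% | 1.9% | 1.1% | 6.9% | 2.4% | 1.4% | 0.8% | 2.2% | 0.6% | 0.4% | 0.2% | 1.9% | 0.6% | 0.3% | 0.2% | 1.4% | 0.5% | 0.3% | 0.2% | 1.2% | 0.5% | 0.3% | 0.2% |
| 5 | 3.1% | 19.0% | 3.3% | 2.7% | 19.2% | 14.1% | 2.3% | 2.1% | 6.3% | 4.8% | 1.4% | 1.2% | 4.9% | 3.8% | 1.1% | 1.0% | 1.3% | 1.1% | 0.3% | 0.3% | 1.1% | 1.0% | 0.3% | 0.3% | 0.8% | 0.7% | 0.3% | 0.3% | 0.8% | 0.7% | 0.3% | 0.3% |
| 12 | 3.0% | 4.1% | 18.5% | 3.1% | 19.9% | 3.1% | 12.8% | 2.2% | 6.1% | 1.8% | 4.8% | 1.4% | 4.7% | 1.6% | 3.7% | 1.2% | 1.4% | 0.2% | 1.0% | 0.1% | 1.2% | 0.1% | 0.9% | 0.1% | 0.8% | 0.1% | 0.6% | 0.0% | 0.7% | 0.1% | 0.6% | 0.0% |
| 13 | 1.2% | 11.6% | 11.8% | 9.1% | 11.8% | 8.9% | 8.3% | 6.8% | 4.3% | 3.4% | 3.2% | 2.7% | 3.3% | 2.8% | 2.4% | 2.1% | 0.7% | 0.4% | 0.5% | 0.2% | 0.7% | 0.3% | 0.5% | 0.2% | 0.5% | 0.3% | 0.3% | 0.2% | 0.5% | 0.3% | 0.3% | 0.2% |
| 14 | 7.5% | 6.3% | 4.7% | 1.5% | 6.2% | 1.5% | 0.8% | 0.3% | 33.9% | 5.9% | 4.5% | 1.5% | 5.5% | 1.5% | 0.8% | 0.3% | 4.7% | 1.1% | 0.7% | 0.3% | 1.3% | 0.4% | 0.2% | 0.1% | 4.5% | 1.1% | 0.6% | 0.3% | 1.3% | 0.4% | 0.2% | 0.1% |
| 15 | 2.5% | 17.5% | 3.3% | 2.6% | 3.9% | 2.8% | 0.6% | 0.4% | 22.0% | 16.2% | 3.1% | 2.5% | 3.6% | 2.7% | 0.5% | 0.4% | 2.9% | 2.3% | 0.5% | 0.4% | 0.8% | 0.6% | 0.2% | 0.1% | 2.8% | 2.3% | 0.5% | 0.4% | 0.8% | 0.6% | 0.2% | 0.1% |
| 23 | 2.5% | 4.8% | 15.9% | 3.8% | 3.5% | 1.1% | 2.5% | 0.8% | 19.6% | 4.6% | 14.3% | 3.7% | 3.2% | 1.1% | 2.4% | 0.8% | 2.6% | 0.8% | 2.0% | 0.7% | 0.7% | 0.3% | 0.6% | 0.2% | 2.5% | 0.8% | 1.9% | 0.6% | 0.7% | 0.3% | 0.6% | 0.2% |
| 24 | 1.0% | 10.6% | 10.7% | 8.4% | 2.3% | 1.8% | 1.8% | 1.4% | 13.5% | 10.0% | 10.1% | 8.0% | 2.2% | 1.7% | 1.7% | 1.4% | 1.7% | 1.3% | 1.3% | 1.0% | 0.5% | 0.4% | 0.4% | 0.3% | 1.7% | 1.2% | 1.3% | 1.0% | 0.5% | 0.4% | 0.4% | 0.3% |
| 25 | 2.5% | 3.6% | 3.0% | 1.0% | 18.4% | 3.0% | 2.3% | 0.8% | 21.0% | 3.4% | 2.9% | 1.0% | 15.7% | 2.8% | 2.2% | 0.8% | 3.0% | 0.6% | 0.5% | 0.2% | 2.5% | 0.6% | 0.4% | 0.2% | 2.9% | 0.6% | 0.5% | 0.2% | 2.4% | 0.6% | 0.4% | 0.2% |
| 34 | 1.0% | 10.8% | 2.1% | 1.6% | 11.5% | 8.2% | 1.6% | 1.2% | 13.7% | 10.0% | 1.9% | 1.5% | 10.2% | 7.6% | 1.5% | 1.1% | 1.9% | 1.4% | 0.4% | 0.3% | 1.5% | 1.2% | 0.3% | 0.3% | 1.8% | 1.4% | 0.4% | 0.3% | 1.5% | 1.2% | 0.3% | 0.3% |
| 35 | 0.9% | 2.9% | 10.3% | 2.4% | 10.9% | 2.3% | 7.5% | 1.8% | 12.7% | 2.7% | 9.3% | 2.2% | 9.4% | 2.1% | 7.0% | 1.7% | 1.7% | 0.4% | 1.3% | 0.4% | 1.5% | 0.4% | 1.1% | 0.3% | 1.7% | 0.4% | 1.3% | 0.4% | 1.4% | 0.4% | 1.1% | 0.3% |
| 45 | 0.4% | 6.5% | 7.0% | 5.3% | 6.9% | 5.0% | 5.2% | 4.2% | 8.5% | 6.2% | 6.5% | 5.0% | 6.3% | 4.8% | 4.9% | 4.0% | 1.1% | 0.9% | 0.9% | 0.7% | 0.9% | 0.8% | 0.7% | 0.6% | 1.1% | 0.9% | 0.9% | 0.7% | 0.9% | 0.8% | 0.7% | 0.6% |
| 123 | 16.9% | 6.3% | 4.1% | 1.7% | 5.3% | 1.3% | 0.7% | 0.3% | 10.2% | 3.8% | 2.4% | 1.2% | 2.1% | 0.9% | 0.5% | 0.3% | 17.4% | 3.6% | 2.2% | 1.0% | 3.0% | 0.9% | 0.5% | 0.2% | 6.0% | 2.4% | 1.4% | 0.8% | 1.4% | 0.6% | 0.3% | 0.2% |
| 124 | 2.9% | 17.8% | 3.9% | 3.5% | 3.4% | 2.5% | 0.7% | 0.5% | 9.0% | 7.0% | 2.6% | 2.3% | 1.8% | 1.2% | 0.6% | 0.4% | 10.3% | 7.3% | 2.1% | 1.9% | 1.9% | 1.3% | 0.5% | 0.4% | 4.8% | 3.5% | 1.7% | 1.5% | 1.1% | 0.7% | 0.4% | 0.3% |
| 125 | 4.6% | 4.9% | 17.3% | 4.1% | 3.3% | 1.1% | 2.3% | 0.9% | 6.5% | 2.4% | 5.4% | 2.2% | 1.4% | 0.6% | 1.1% | 0.6% | 10.8% | 2.8% | 7.8% | 2.5% | 1.9% | 0.7% | 1.4% | 0.6% | 3.9% | 1.5% | 3.4% | 1.5% | 0.9% | 0.4% | 0.7% | 0.4% |
| 134 | 1.4% | 10.7% | 11.2% | 9.1% | 2.0% | 1.5% | 1.4% | 1.1% | 5.6% | 4.8% | 4.6% | 4.1% | 1.2% | 1.0% | 0.8% | 0.7% | 6.1% | 5.1% | 5.4% | 4.7% | 1.2% | 0.9% | 0.8% | 0.6% | 3.3% | 3.0% | 2.8% | 2.7% | 0.8% | 0.7% | 0.5% | 0.5% |
| 135 | 5.7% | 4.1% | 2.8% | 1.2% | 19.8% | 3.4% | 2.1% | 1.0% | 6.9% | 2.4% | 1.5% | 0.9% | 5.4% | 1.9% | 1.2% | 0.7% | 10.5% | 2.1% | 1.4% | 0.7% | 8.5% | 2.0% | 1.2% | 0.6% | 3.6% | 1.3% | 0.9% | 0.5% | 3.3% | 1.2% | 0.8% | 0.5% |
| 145 | 1.3% | 11.6% | 2.3% | 2.0% | 11.0% | 8.6% | 1.7% | 1.6% | 5.7% | 4.6% | 1.6% | 1.4% | 4.5% | 3.7% | 1.3% | 1.2% | 5.6% | 4.7% | 1.0% | 0.9% | 4.7% | 4.0% | 0.9% | 0.9% | 2.6% | 2.4% | 0.9% | 0.9% | 2.5% | 2.2% | 0.9% | 0.9% |
| 234 | 1.7% | 2.6% | 13.4% | 2.0% | 13.8% | 2.0% | 9.4% | 1.4% | 4.8% | 0.9% | 3.8% | 0.6% | 3.9% | 0.8% | 3.0% | 0.4% | 7.2% | 0.9% | 5.3% | 0.7% | 5.9% | 0.8% | 4.4% | 0.6% | 2.6% | 0.3% | 2.1% | 0.1% | 2.4% | 0.2% | 1.9% | 0.0% |
| 235 | 0.7% | 7.5% | 8.6% | 5.5% | 8.3% | 5.4% | 6.1% | 3.7% | 4.3% | 3.3% | 3.0% | 2.2% | 3.6% | 2.8% | 2.4% | 1.7% | 4.0% | 2.1% | 3.1% | 1.3% | 3.5% | 1.6% | 2.7% | 0.8% | 2.2% | 1.6% | 1.3% | 0.8% | 2.2% | 1.6% | 1.3% | 0.8% |
| 245 | 3.6% | 4.8% | 3.4% | 1.2% | 4.3% | 1.2% | 0.6% | 0.3% | 23.4% | 4.6% | 3.2% | 1.2% | 3.9% | 1.1% | 0.6% | 0.3% | 12.9% | 2.7% | 1.7% | 0.7% | 2.5% | 0.7% | 0.4% | 0.2% | 11.7% | 2.7% | 1.7% | 0.7% | 2.4% | 0.7% | 0.4% | 0.2% |
| 345 | 1.2% | 12.7% | 2.5% | 2.0% | 2.7% | 2.0% | 0.5% | 0.3% | 15.3% | 11.8% | 2.4% | 2.0% | 2.6% | 1.9% | 0.5% | 0.3% | 7.9% | 6.3% | 1.4% | 1.1% | 1.6% | 1.2% | 0.3% | 0.2% | 7.4% | 6.1% | 1.4% | 1.1% | 1.5% | 1.2% | 0.3% | 0.2% |
| 1234 | 1.2% | 3.8% | 11.2% | 3.1% | 2.5% | 0.8% | 1.8% | 0.6% | 13.7% | 3.7% | 10.1% | 3.0% | 2.3% | 0.8% | 1.7% | 0.6% | 7.2% | 2.2% | 5.4% | 1.8% | 1.4% | 0.5% | 1.1% | 0.4% | 6.8% | 2.1% | 5.1% | 1.7% | 1.4% | 0.5% | 1.1% | 0.4% |
| 1235 | 0.5% | 7.7% | 7.9% | 6.2% | 1.7% | 1.3% | 1.3% | 1.0% | 9.9% | 7.4% | 7.5% | 6.0% | 1.6% | 1.2% | 1.3% | 1.0% | 4.9% | 3.6% | 3.7% | 3.0% | 1.0% | 0.7% | 0.8% | 0.6% | 4.8% | 3.6% | 3.7% | 3.0% | 1.0% | 0.7% | 0.8% | 0.6% |
| 1345 | 1.4% | 2.8% | 2.2% | 0.8% | 12.2% | 2.3% | 1.7% | 0.7% | 14.6% | 2.7% | 2.1% | 0.8% | 10.7% | 2.2% | 1.6% | 0.6% | 8.2% | 1.6% | 1.2% | 0.5% | 6.4% | 1.4% | 1.0% | 0.4% | 7.4% | 1.6% | 1.2% | 0.5% | 5.9% | 1.4% | 1.0% | 0.4% |
| 1245 | 0.6% | 7.7% | 1.7% | 1.3% | 7.9% | 5.8% | 1.3% | 1.0% | 9.8% | 7.2% | 1.6% | 1.3% | 7.2% | 5.4% | 1.2% | 1.0% | 5.2% | 3.8% | 1.0% | 0.8% | 4.1% | 3.1% | 0.8% | 0.7% | 4.8% | 3.6% | 1.0% | 0.8% | 3.9% | 3.0% | 0.8% | 0.7% |
| 2345 | 0.5% | 2.2% | 7.4% | 1.8% | 7.4% | 1.7% | 5.3% | 1.4% | 9.1% | 2.1% | 6.8% | 1.7% | 6.6% | 1.6% | 5.0% | 1.4% | 4.8% | 1.2% | 3.6% | 0.9% | 3.9% | 1.0% | 3.0% | 0.9% | 4.6% | 1.2% | 3.5% | 0.9% | 3.7% | 1.0% | 2.9% | 0.9% |
| 12345 | 0.2% | 4.8% | 5.1% | 3.9% | 4.8% | 3.7% | 3.7% | 3.1% | 6.2% | 4.7% | 4.9% | 3.8% | 4.6% | 3.6% | 3.6% | 3.0% | 3.1% | 2.4% | 2.4% | 2.0% | 2.5% | 2.0% | 2.0% | 1.7% | 3.1% | 2.4% | 2.4% | 2.0% | 2.4% | 2.0% | 2.0% | 1.7% |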
To demonstrate the clinical utility of the proposed method, Table 6 provides the confusion matrices of the unsupervised, semi-supervised, and supervised MTBNs for predicting the individual comorbid conditions that may occur in year 5, given data from years 1 to 4, using a 50% threshold. As expected from the earlier results, all three structure learning methods yield comparable confusion matrices, while the semi-supervised method provides the most competitive performance. Meanwhile, the true negative rate (specificity) of all MTBNs is acceptably high and consistent across the five comorbid conditions, whereas the true positive rate (recall) of the MTBNs fluctuates in predicting new cases of TBI.
Table 6

The confusion matrices of the unsupervised, semi-supervised, and supervised MTBNs for predicting comorbidities in year 5, given data from years 1 to 4, using a 50% threshold.

Unsupervised Method
| Condition | Total population | TP | FN | FP | TN | Prevalence | TPR (sensitivity, recall) | FNR (miss rate) | FPR (fall-out) | TNR (specificity) | PPV (precision) | FDR | FOR | NPV | Accuracy | LR+ | LR- | DOR |
| TBI | 7633 | 261 | 238 | 304 | 6830 | 6.54% | 52.30% | 47.70% | 4.26% | 95.74% | 46.19% | 53.81% | 3.37% | 96.63% | 92.90% | 12.27 | 0.50 | 24.64 |
| PTSD | 7633 | 2600 | 236 | 1768 | 3029 | 37.15% | 91.68% | 8.32% | 36.86% | 63.14% | 59.52% | 40.48% | 7.23% | 92.77% | 73.75% | 2.49 | 0.13 | 18.87 |
| Back pain | 7633 | 2052 | 225 | 2507 | 2849 | 29.83% | 90.12% | 9.88% | 46.81% | 53.19% | 45.01% | 54.99% | 7.32% | 92.68% | 64.21% | 1.93 | 0.19 | 10.36 |
| Substance abuse | 7633 | 506 | 437 | 2224 | 4466 | 12.35% | 53.66% | 46.34% | 33.24% | 66.76% | 18.53% | 81.47% | 8.91% | 91.09% | 65.14% | 1.61 | 0.69 | 2.33 |
| Back pain | 7633 | 1535 | 402 | 1678 | 4018 | 25.38% | 79.25% | 20.75% | 29.46% | 70.54% | 47.77% | 52.23% | 9.10% | 90.90% | 72.75% | 2.69 | 0.29 | 9.14 |

Semi-Supervised Method
| Condition | Total population | TP | FN | FP | TN | Prevalence | TPR (sensitivity, recall) | FNR (miss rate) | FPR (fall-out) | TNR (specificity) | PPV (precision) | FDR | FOR | NPV | Accuracy | LR+ | LR- | DOR |
| TBI | 7633 | 273 | 226 | 320 | 6814 | 6.54% | 54.71% | 45.29% | 4.49% | 95.51% | 46.04% | 53.96% | 3.21% | 96.79% | 92.85% | 12.20 | 0.47 | 25.72 |
| PTSD | 7633 | 2656 | 180 | 2227 | 2570 | 37.15% | 93.65% | 6.35% | 46.42% | 53.58% | 54.39% | 45.61% | 6.55% | 93.45% | 68.47% | 2.02 | 0.12 | 17.03 |
| Back pain | 7633 | 2043 | 234 | 1586 | 3770 | 29.83% | 89.72% | 10.28% | 29.61% | 70.39% | 56.30% | 43.70% | 5.84% | 94.16% | 76.16% | 3.03 | 0.15 | 20.75 |
| Substance abuse | 7633 | 739 | 204 | 612 | 6078 | 12.35% | 78.37% | 21.63% | 9.15% | 90.85% | 54.70% | 45.30% | 3.25% | 96.75% | 89.31% | 8.57 | 0.24 | 35.98 |
| Back pain | 5432 | 1698 | 239 | 1797 | 1698 | 35.66% | 87.66% | 12.34% | 51.42% | 48.58% | 48.58% | 51.42% | 12.34% | 87.66% | 62.52% | 1.70 | 0.25 | 6.71 |

Supervised Method
| Condition | Total population | TP | FN | FP | TN | Prevalence | TPR (sensitivity, recall) | FNR (miss rate) | FPR (fall-out) | TNR (specificity) | PPV (precision) | FDR | FOR | NPV | Accuracy | LR+ | LR- | DOR |
| TBI | 7633 | 163 | 336 | 178 | 6956 | 6.54% | 32.67% | 67.33% | 2.50% | 97.50% | 47.80% | 52.20% | 4.61% | 95.39% | 93.27% | 13.09 | 0.69 | 18.96 |
| PTSD | 7633 | 2006 | 830 | 1239 | 3558 | 37.15% | 70.73% | 29.27% | 25.83% | 74.17% | 61.82% | 38.18% | 18.92% | 81.08% | 72.89% | 2.74 | 0.39 | 6.94 |
| Back pain | 7633 | 977 | 1300 | 1817 | 3539 | 29.83% | 42.91% | 57.09% | 33.92% | 66.08% | 34.97% | 65.03% | 26.87% | 73.13% | 59.16% | 1.26 | 0.86 | 1.46 |
| Substance abuse | 7633 | 707 | 236 | 2712 | 3978 | 12.35% | 74.97% | 25.03% | 40.54% | 59.46% | 20.68% | 79.32% | 5.60% | 94.40% | 61.38% | 1.85 | 0.42 | 4.39 |
| Back pain | 7633 | 1353 | 584 | 1878 | 3818 | 25.38% | 69.85% | 30.15% | 32.97% | 67.03% | 41.88% | 58.12% | 13.27% | 86.73% | 67.75% | 2.12 | 0.45 | 4.71 |
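Every derived quantity in Table 6 follows from the four cell counts (TP, FN, FP, TN) of a confusion matrix. The Python sketch below recomputes them; the function names are illustrative rather than taken from the study's code, and treating a predicted probability of exactly 0.5 as positive is an assumption, since the text states only a 50% threshold.

```python
def threshold(probs, cutoff=0.5):
    """Binarize predicted probabilities; p >= cutoff is counted as positive (assumed tie rule)."""
    return [1 if p >= cutoff else 0 for p in probs]


def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN) for binary labels and binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def confusion_metrics(tp, fn, fp, tn):
    """Rates and ratios reported in Table 6 (assumes no zero denominators)."""
    total = tp + fn + fp + tn
    tpr = tp / (tp + fn)  # sensitivity, recall
    fnr = fn / (tp + fn)  # miss rate
    fpr = fp / (fp + tn)  # fall-out
    tnr = tn / (fp + tn)  # specificity
    ppv = tp / (tp + fp)  # precision
    npv = tn / (tn + fn)
    lr_pos = tpr / fpr
    lr_neg = fnr / tnr
    return {
        "prevalence": (tp + fn) / total,
        "accuracy": (tp + tn) / total,
        "TPR": tpr, "FNR": fnr, "FPR": fpr, "TNR": tnr,
        "PPV": ppv, "FDR": 1 - ppv, "NPV": npv, "FOR": 1 - npv,
        "LR+": lr_pos, "LR-": lr_neg, "DOR": lr_pos / lr_neg,
    }


# Counts from the unsupervised MTBN, TBI row of Table 6.
m = confusion_metrics(tp=261, fn=238, fp=304, tn=6830)
print({name: round(value, 4) for name, value in m.items()})
```

With the unsupervised TBI counts this gives a sensitivity of about 0.523, a specificity of about 0.957, and a DOR of about 24.64, in line with the table. The DOR can equivalently be computed as (TP × TN) / (FP × FN).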

Longest path algorithm (LPA)

Here, we use the longest path algorithm, a variation of the shortest path algorithm, to find the most likely path between the emergence of substance abuse (SuAb) in year 1 and SuAb in year 5 for the unsupervised, semi-supervised, and supervised MTBNs (Fig 5). The longest path shows the sequence of conditions that are caused by and/or correlated with substance abuse in the base year and that lead to a diagnosis of SuAb in year 5. As Fig 5 shows, the most likely paths from all MTBNs include recurrent SuAb across different years, which indicates that a previous history of SuAb is a major predictor of future SuAb. Our follow-up analysis supports the suggested SuAb sequence on the longest path: 10% of the patients with SuAb in year 5 went through it. Meanwhile, the semi-supervised and supervised Bayesian networks show additional conditions on their longest paths, suggesting possible correlations between PTSD, Depr, and the continuation of substance abuse.
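Finding the most likely path can be cast as a longest-path problem on a directed acyclic graph: if each edge carries a probability and a path's likelihood is taken as the product of its edge probabilities, then maximizing the sum of log-probabilities (equivalently, a shortest path under -log p weights) yields the most likely path. The Python sketch below assumes such edge weights and uses dynamic programming over a topological order; it is not necessarily the authors' implementation, and the node names (condition_year) and probabilities are hypothetical, not values from the MTBNs.

```python
import math
from graphlib import TopologicalSorter  # Python 3.9+


def most_likely_path(edges, source, target):
    """Longest path in a DAG under log-probability edge weights.

    edges maps (u, v) -> probability in (0, 1]. Returns (path, probability),
    or (None, 0.0) if target is not reachable from source.
    """
    preds = {}
    for u, v in edges:
        preds.setdefault(v, set()).add(u)
        preds.setdefault(u, set())
    best = {source: 0.0}  # best log-probability of reaching each node from source
    parent = {}
    # A topological order guarantees every predecessor is settled before its successors.
    for v in TopologicalSorter(preds).static_order():
        for u in preds[v]:
            if u in best:
                score = best[u] + math.log(edges[(u, v)])
                if score > best.get(v, -math.inf):
                    best[v] = score
                    parent[v] = u
    if target not in best:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return path[::-1], math.exp(best[target])


# Hypothetical conditional probabilities between (condition, year) nodes.
edges = {
    ("SuAb_1", "SuAb_2"): 0.6,
    ("SuAb_1", "PTSD_2"): 0.3,
    ("SuAb_2", "SuAb_3"): 0.5,
    ("PTSD_2", "Depr_3"): 0.7,
    ("SuAb_3", "SuAb_4"): 0.5,
    ("Depr_3", "SuAb_4"): 0.4,
    ("SuAb_4", "SuAb_5"): 0.6,
}
path, p = most_likely_path(edges, "SuAb_1", "SuAb_5")
print(path, round(p, 4))
```

In this toy graph the all-SuAb chain (0.6 × 0.5 × 0.5 × 0.6 = 0.09) beats the route through PTSD and Depr (0.3 × 0.7 × 0.4 × 0.6 = 0.0504), mirroring the kind of recurrent-SuAb trajectory described above. Working in log space avoids numerical underflow on long paths.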
Fig 5

Longest path.

The longest paths in the MTBNs. From left: (a) unsupervised network, (b) semi-supervised network, and (c) supervised network.

Discussion

Clinical data are often stored as hierarchical time series in large datasets. Analyzing these raw data can give insight into the evolution of diseases and the co-occurrence of chronic conditions. This study describes an unsupervised structure learning approach for constructing Multilevel Temporal Bayesian Networks (MTBN) to mine patterns of comorbidity evolution over time in a large population of VA patients. Comparing the structure learned by the proposed approach with those learned by the semi-supervised and supervised approaches demonstrates a high degree of similarity between the unsupervised approach and the approaches that use expert input. The results also show that the unsupervised approach has considerable predictive performance, comparable to that of the supervised and semi-supervised approaches and better than that of multivariate probit regression, multinomial logistic regression, and Latent Regression Markov Mixture Clustering (LRMCL). Thus, in the absence of a medical expert or sufficient literature, the unsupervised model can perform nearly as well as a semi-supervised or fully supervised model. The unsupervised model can also be used to generate strong hypotheses about the dependencies among chronic conditions, to be tested by other approaches including clinical investigation. We also present a Longest Path Algorithm (LPA) to mine major trajectories of comorbidities emerging from and/or leading to specific chronic conditions. Applying the LPA to mine the most probable sequence of comorbidities between the emergence of substance abuse in year 1 and substance abuse in year 5, we found that the major trajectory contains recurrent substance abuse across different years. This trajectory, which accounts for 10% of the patients who end up with substance abuse in year 5, suggests that a history of substance abuse is the major predictor of future substance abuse problems.
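The 10% figure is the share of patients with substance abuse in year 5 whose records follow the mined trajectory. One way to compute such a share is sketched below in Python, assuming each patient is represented as a mapping from year to the set of diagnosed conditions and that a patient "follows" the path when every (condition, year) node on it appears in their record; the text does not spell out the exact matching rule, and the function name and sample cohort are hypothetical.

```python
def path_coverage(patients, path, target):
    """Fraction of patients with `target` whose records contain every node of `path`.

    patients: list of dicts mapping year -> set of condition labels.
    path: list of (condition, year) nodes; target: (condition, year) endpoint.
    """
    cond_t, year_t = target
    with_target = [p for p in patients if cond_t in p.get(year_t, set())]
    if not with_target:
        return 0.0
    follows = [
        p for p in with_target
        if all(cond in p.get(year, set()) for cond, year in path)
    ]
    return len(follows) / len(with_target)


# Hypothetical cohort: four patients have SuAb in year 5 and one follows the full path.
patients = [
    {1: {"SuAb"}, 2: {"SuAb"}, 3: {"SuAb"}, 4: {"SuAb"}, 5: {"SuAb"}},
    {1: {"SuAb"}, 2: {"PTSD"}, 3: {"Depr"}, 4: {"SuAb"}, 5: {"SuAb"}},
    {1: set(), 2: set(), 3: {"BaPa"}, 4: set(), 5: {"SuAb"}},
    {1: {"SuAb"}, 2: {"SuAb"}, 3: set(), 4: {"SuAb"}, 5: {"SuAb", "PTSD"}},
    {1: {"TBI"}, 2: set(), 3: set(), 4: set(), 5: set()},  # excluded: no SuAb in year 5
]
suab_path = [("SuAb", year) for year in range(1, 6)]
print(path_coverage(patients, suab_path, ("SuAb", 5)))  # 0.25
```

A looser rule (for example, allowing gaps between years on the path) would raise the share, so the matching rule should be fixed before coverage figures are compared across networks.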
Such findings from the LPA are best understood in the context of both the health care system in which the diagnoses are made and the clinical associations between the diagnoses themselves. Given that substance abuse is also implicated in other adverse outcomes such as suicidality, homelessness, and mortality, this use-case analysis can provide prognostic insight for clinicians caring for Veterans with SuAb. It also demonstrates the utility of a method that may be used to identify individuals at greatest risk for adverse outcomes of interest, such as suicide, homelessness, and early mortality, and to mitigate that risk. Our findings highlight unique strengths and insights that can be gained from our algorithm and the LPA, and in this context we have identified several new lines of clinical inquiry. Can early intervention for substance abuse prevent future substance abuse problems? Are patients with PTSD and/or depression self-medicating with agents that predispose them to substance abuse problems, such as alcohol and illicit drugs? Were the treatments for these conditions suboptimal, and did they include medications that predisposed patients to an increased risk of substance abuse, such as benzodiazepines and opioids? This may be particularly relevant if back pain is present but is not a clinical priority.

Study limitations

In this work, patients whose data were not maintained over the study years were omitted. Aside from death, dropout can result from not requiring care or from receiving care from other health providers, which typically happens to healthier patients and those who have health insurance. Consequently, the omitted patients are likely to be biased toward healthier individuals and those with health insurance, which can affect the predictability of the conditions. In addition, omitting patients with missing data and dropout considerably reduces the number of records available for estimation and validation. The validity of this study also depends largely on the accuracy of the diagnoses and record keeping. The data used to train the model were obtained from the VA, and the dataset is partially biased toward conditions related to military service. For instance, the Veteran population with TBI is weighted toward mild TBI and includes a large number of patients with blast exposure, which is not common in the civilian population. In addition, while military service members are required to have a higher level of fitness than the general population, Veterans who receive VA care tend to be of lower socioeconomic status and have more comorbidities than Veterans who do not receive VA care. Thus, the model cannot be used for general-purpose medical analysis, and the findings may not generalize beyond the study cohort. However, the model can be retrained and extended for general-purpose analysis if its training data can be extended with public health registries that record similar diagnostic factors.

Conclusion

In this paper, we proposed an unsupervised Multi-level Temporal Bayesian Network (MTBN) for revealing hidden patterns of interaction among multiple chronic conditions and patient-level risk factors, which can support medical decisions in the absence of a medical expert or existing literature. The proposed approach develops a heuristic node ordering algorithm that reduces the computational complexity of structure learning in temporal Bayesian networks. It also proposes a multi-level structure to model the hierarchical relationship between patient-level risk factors and comorbidities. Additionally, it incorporates a Longest Path Algorithm (LPA) for identifying the most probable trajectories of comorbidities emerging from and/or leading to specific chronic conditions. We validated the performance of the proposed unsupervised approach against semi-supervised and supervised Bayesian networks, as well as multivariate probit regression, multinomial logistic regression, and Latent Regression Markov Mixture Clustering (LRMCL), using a large dataset of more than 250,000 patients monitored for 5 years. This approach has clinical implications for predicting how complex comorbid conditions may evolve. Importantly, this methodology can be used with large medical datasets to develop predictive models for a wide variety and large number of clinical conditions, including those without previously demonstrated physiological or epidemiological associations.

Semi-supervised and supervised networks.

Learned BN structure from: (a) the semi-supervised method and (b) the supervised method. (PDF)

Algorithms.

The algorithms used to learn the BN structures mentioned in the manuscript. (PDF)

ICD-9 codes.

The ICD-9 codes for the disease conditions considered in the manuscript. (PDF)

Disease codes in Table 5.

The list of disease codes used in Table 5 of the manuscript. (PDF)