
Cost-Effective Machine Learning Based Clinical Pre-Test Probability Strategy for DVT Diagnosis in Neurological Intensive Care Unit.

Li Luo¹, Ran Kou¹, Yuquan Feng¹, Jie Xiang¹, Wei Zhu².

Abstract

To overcome the shortcomings of current, costly DVT diagnosis and reduce the waste of valuable healthcare resources, we propose a new diagnostic approach based on machine learning pre-test prediction models using electronic health records (EHRs). We examined sociodemographic and clinical factors for predicting DVT in 518 patients admitted to the neurological intensive care unit (NICU), 189 of whom eventually developed DVT. We used cross-validation on the training data to determine the optimal parameters, and ROC analysis to evaluate the predictive strength of each model. The two models with the strongest ROC performance, a generalized linear model (GLM) and a support vector machine (SVM), were selected for DVT prediction. Based on these models, we optimized the current intervention and diagnostic process for DVT and examined the performance of the proposed approach through simulations. Machine learning based pre-test prediction models can simplify and improve the intervention and diagnostic process for NICU patients with suspected DVT, and reduce both the use of valuable healthcare resources and medical costs.


Keywords: deep vein thrombosis; economic consideration; electronic health records; machine learning; neurological ICU; risk factors

Year: 2021 PMID: 33928796 PMCID: PMC8114755 DOI: 10.1177/10760296211008650

Source DB: PubMed Journal: Clin Appl Thromb Hemost ISSN: 1076-0296 Impact factor: 2.389

Introduction

Venous thromboembolism (VTE) refers to blood clots that may form when blood flow changes or slows somewhere in the body, and it seriously threatens patients’ life and health. Unfortunately, the symptoms and signs of deep venous thrombosis (DVT) are not the same for everyone, which makes detection difficult in practice. In some cases, DVT symptoms include pain, swelling, redness, or other discomfort near the affected area.[1] In other cases, however, DVT causes no obvious symptoms until more serious complications occur, such as pulmonary embolism (PE).[2] DVT is currently a major cause of mortality in ICU patients,[3] because the majority of ICU patients have one or more risk factors for DVT.[4] Critically ill patients in the ICU have a higher risk of developing lower extremity DVT than hospitalized patients in other units.[5] During their hospital stay, ICU patients are further predisposed to DVT by prolonged immobilization,[6] vascular injury,[7] stroke,[8] sepsis from central venous catheters,[9] and other invasive interventions. DVT diagnosis and intervention are especially crucial and difficult for critically ill patients, since untreated DVT may lead to further complications, e.g. PE. Predicting the probability that DVT is present in an individual patient is therefore of utmost importance, since DVT can be prevented by thrombosis intervention (also known as thrombosis prophylaxis). Because the stakes are so high, the physicians in our on-site research referred all patients for further diagnostic work-up. However, DVT is confirmed in only 20% to 30% of the referred patients, which places a heavy economic burden on both patients and government medical expenditure.
The question then arises of how to improve the accuracy of DVT prediction to support diagnosis and reduce the waste of valuable hospital resources. One simple solution is to skip unnecessary tests or interventions for patients who have a low probability of DVT. For example, if a patient’s probability of DVT is known to be low and his/her first venous ultrasound imaging (the most accurate noninvasive test for diagnosing DVT) is normal, serial testing can be avoided.[10,11] The problem then becomes how to estimate the probability of DVT in an individual patient. In recent years, risk assessment models for individual patients have become more popular as an aid to clinical decision-making. Many models have been developed to estimate the probability of a certain outcome in an individual patient, based on his/her demographic, clinical, or laboratory characteristics.[12-14] Such prediction models make it possible to forecast the presence of DVT when symptoms are not obvious and to intervene early. Furthermore, identifying patients at lower risk of DVT can spare them a large number of expensive radiological tests. In this study, we set out to design a pre-test system for the NICU, based on machine learning methods using EHRs, that filters out patients who do not require repeated ultrasound imaging or prophylaxis therapy. We comprehensively incorporated all types of sociodemographic and clinical laboratory features from the EHRs system, and then examined the effectiveness of machine learning models in predicting the presence of DVT in NICU patients.
Compared with previous studies, this study contributes in the following respects: (1) we investigated which factors might help predict DVT risk in NICU patients, using both univariate and multivariate filtering; (2) we developed machine learning models to predict the risk of DVT in NICU patients; (3) we devised an improved clinical process for DVT in NICU patients based on our pre-test prediction results; and (4) we explored the cost-saving effect of the proposed approach through simulations. To the best of our knowledge, this is the first systematic attempt at DVT risk assessment in NICU patients, owing to a previous scarcity of suitable data.

Material and Methods

Data Source and Cohort Derivation

The samples were drawn from the EHRs system of West China Hospital (WCH) of Sichuan University (one of the largest public medical centers for complex and difficult-to-treat diseases in China), which covers around 14 million residents in 22 districts and counties. We collected data on patients in the NICU of the hospital from September 2016 to August 2018, and the study framework is shown in Figure 1. Patients in this study underwent repeated ultrasound as the reference diagnosis to determine the presence or absence of DVT, with imaging evidence used as the diagnostic criterion for thrombosis. DVT was diagnosed by upper and lower extremity venous color Doppler ultrasound and/or computed tomographic (CT) venography.

Figure 1.

Research methodology framework.

We extracted 593 records of inpatients admitted to the NICU of the hospital from September 2016 to August 2018, of which 518 were ultimately kept. Records were excluded if they (1) belonged to patients with missing DVT ultrasound results; (2) had missing lab test information; or (3) were duplicates (records with the same inpatient code and case code). First, the 593 records were checked for missing values, and subjects with any missing value were excluded from the analysis. Second, the outliers of each group were detected through the interquartile range method and removed before the start of the analysis, leaving 518 cases (with a DVT prevalence of 0.36). Various categories of features were extracted from the original EHRs, including sociodemographic and clinical laboratory factors. EHRs make large amounts of data available, providing an opportunity for more accurate prediction of patients’ outcomes (see Figure 2). By using data-driven predictive machine learning models, we sought to identify reproducible clinical parameters during hospitalization that may flag potential high-risk patients for intervention.
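The interquartile range filter mentioned above can be sketched as follows. This is only an illustration in Python (the authors worked in R and do not report their code); the 1.5 × IQR fence multiplier, the function names, and the fibrinogen values are assumptions made for the example, not details from the study.

```python
from statistics import quantiles

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences; k = 1.5 is an assumed, conventional multiplier."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(records, feature, k=1.5):
    """Keep only records whose value for `feature` lies inside the IQR fences."""
    lower, upper = iqr_bounds([r[feature] for r in records], k)
    return [r for r in records if lower <= r[feature] <= upper]

# Hypothetical fibrinogen values; the last one is an obvious outlier.
records = [{"id": i, "fibrinogen": v}
           for i, v in enumerate([2.4, 2.8, 3.0, 3.2, 3.4, 3.6, 15.0])]
kept = drop_outliers(records, "fibrinogen")
```

Since the text says outliers were detected within each group, such a filter would presumably be applied separately to the DVT and non-DVT records.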

Figure 2.

Flowchart of the study subjects.

Data Analysis

The descriptive data analysis and machine learning algorithms were implemented in R (version 3.3.2 for Windows).

Feature extraction (risk factors)

A machine learning based risk prediction model includes a feature extraction step, which determines the predictive power of candidate predictors. The extraction is performed on the candidate predictors (features) to mitigate the curse of dimensionality, and removing less predictive predictors reduces the odds of overfitting.[15-17] To identify the key predictors of DVT risk, we first screened the risk predictors using both univariate and multivariate filtering, namely a statistical analysis method (statistical), a machine learning method (feature extraction-random forest, FE-RF), and a regression method (Lasso). For the statistical method, appropriate tests such as the analysis of variance (ANOVA), the chi-square test, or the t-test were applied, and the cross association of variables was also investigated using logistic regression. RF can be used to rank the importance of predictors in a classification problem and provides 2 variable importance measures (VIMs): the Mean Decrease Accuracy (MDA), which is based on the classification accuracy on the out-of-bag (OOB) data from bagging, and the Mean Decrease Gini (MDG), which is based on the Gini index of node impurity (see Online Appendix 1). FE-RF determined the second subset of predictors, the one with the highest accuracy. We used penalized regression with the least absolute shrinkage and selection operator (Lasso) in a generalized linear mixed model, via the R package glmLasso,[18] to determine another subset of predictors. The accuracy-simplicity trade-off in Lasso regression is presented in Supplement 1. We used the 3 feature extraction methods and the original full set of predictors to construct different datasets; the numbers of risk factors in each dataset are listed in Online Appendix 2. In this study, we used real data on the diagnosis of DVT to examine our predictive models, and we compared the performance of the 3 feature extraction methods against the original baseline models.
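To make the Gini-based importance measure concrete, the sketch below computes the Gini impurity of a node and the impurity decrease produced by one split; MDG aggregates such decreases over all splits that use a given predictor across the trees of the forest. It is a Python illustration with made-up labels, not the authors' R implementation, and the function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted

# Made-up node: 3 DVT (1) and 5 non-DVT (0) patients, split on some predictor.
parent = [1, 1, 1, 0, 0, 0, 0, 0]
left, right = [1, 1, 1, 0], [0, 0, 0, 0]
decrease = gini_decrease(parent, left, right)
```

A predictor that repeatedly yields large decreases like this one ranks high under MDG.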

Machine learning methods

The data were randomly split into training (80%) and testing (20%) sets, maintaining the same proportion of each class in both sets. The same testing set was consistently held out and never used for model selection or parameter tuning. We compared the performance of models developed by 4 different machine learning approaches to predict the risk of DVT. To this end, we trained 4 machine learning models: an Xgboost (eXtreme Gradient Boosting) model, a 2-class support vector machine (SVM) model, a GLM model, and an RF[19] model. After splitting the data into the training set and the test set, all data pre-processing and parameter tuning were completed in R using the preProcess function and the train function. The train function in R can generate a set of candidate parameter values, and the trainControl argument controls how many are evaluated. By default, the function automatically chooses the tuning parameters with the best performance. To choose a sensible combination of predictors and modeling strategy, the feature sets and machine learning models were compared on the test dataset by ROC analysis. ROC analysis was preferred because other classification performance metrics (accuracy, specificity, sensitivity, etc.) change whenever the threshold of the classification model changes.
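The class-preserving 80/20 split described above can be illustrated with a short Python sketch. The class counts (189 DVT, 329 non-DVT) come from the study; the seed, the rounding rule, and the function name are assumptions, and the resulting set sizes are not figures reported by the authors.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test while keeping each class's share in both sets."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)  # assumed rounding rule
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

# 189 patients with DVT (1) and 329 without (0), as in the study cohort.
labels = [1] * 189 + [0] * 329
train_idx, test_idx = stratified_split(labels)
test_prevalence = sum(labels[i] for i in test_idx) / len(test_idx)
```

Holding `test_idx` fixed across all experiments mirrors the requirement that the testing data never influence model selection or tuning.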

Results

Descriptive Analyses

We examined 518 patients admitted to the NICU from the EHRs database of WCH, with a DVT prevalence of 36.49%. Continuous variables were summarized with the mean and standard deviation (SD), and categorical variables with frequencies and percentages. All statistical tests were 2-sided with a significance level of .05 and were run in the free statistical software R. The basic sociodemographic characteristics of the full cohort are summarized in Table 1. The mean (SD) age of the study population was 52.37 (17.35) years, and 53.1% of patients were male. The two groups did not differ significantly in gender, age, admission times, transfer status, rehospitalization plan, or length of stay (LOS), but did differ in race ethnicity, cost type, payment type, job status, marital status, and admission type. In particular, compared with patients without DVT, NICU patients who developed DVT had fewer surgeries on average (0.81 vs 0.89; P = .01 in Table 1); were more likely to be retired (16.4% vs 7.3%) or in management (4.8% vs 2.4%) and less likely to be manual workers (24.3% vs 30.4%) or students (1.6% vs 5.2%; P < .001); were more likely to be married (86.8% vs 79.9%) or widowed (5.8% vs 4.6%) and less likely to be single (4.8% vs 13.7%); were more likely to be of Zang (8.5% vs 3.0%) or Yi (2.6% vs 1.5%) ethnicity and less likely to be Han (86.8% vs 93.6%); and were more likely to pay with medical insurance (50.8% vs 36.8%; P < .001).

Table 1.

Patient Demographic Details and Other Factors for DVT.

Factors	Overall (n = 518)	No DVT (n = 329)	DVT (n = 189)	P
Surgery times (mean (SD))	0.86 (0.35)	0.89 (0.31)	0.81 (0.39)	.01
Age (mean (SD))	52.37 (17.35)	51.77 (17.60)	53.42 (16.90)	.297
LOS (mean (SD))	22.55 (24.03)	22.00 (26.50)	23.50 (19.01)	.495
Gender = M (%)	275 (53.1)	169 (51.4)	106 (56.1)	.345
Cost type (%)				.002
Cash	248 (47.9)	177 (53.8)	71 (37.6)
Insurance	217 (41.9)	121 (36.8)	96 (50.8)
Others	53 (10.2)	31 (9.4)	22 (11.6)
Marriage status (%)				.014
Divorced	11 (2.1)	6 (1.8)	5 (2.6)
Married	427 (82.4)	263 (79.9)	164 (86.8)
Single	54 (10.4)	45 (13.7)	9 (4.8)
Widowed	26 (5.0)	15 (4.6)	11 (5.8)
Job status (%)				.007
Labor	146 (28.2)	100 (30.4)	46 (24.3)
Management	17 (3.3)	8 (2.4)	9 (4.8)
Office	43 (8.3)	29 (8.8)	14 (7.4)
Others	213 (41.1)	134 (40.7)	79 (41.8)
Retired	55 (10.6)	24 (7.3)	31 (16.4)
Student	20 (3.9)	17 (5.2)	3 (1.6)
Unemployed	24 (4.6)	17 (5.2)	7 (3.7)
Race ethnicity (%)				.037
Han	472 (91.1)	308 (93.6)	164 (86.8)
Others	10 (1.9)	6 (1.8)	4 (2.1)
Yi	10 (1.9)	5 (1.5)	5 (2.6)
Zang	26 (5.0)	10 (3.0)	16 (8.5)
Pay type (%)				.001
Medical insurance	124 (23.9)	60 (18.2)	64 (33.9)
Others	349 (67.4)	240 (72.9)	109 (57.7)
Self-paid	35 (6.8)	22 (6.7)	13 (6.9)
Social insurance	10 (1.9)	7 (2.1)	3 (1.6)
Admission type (%)				.012
Emergency	337 (65.1)	202 (61.4)	135 (71.4)
Others	18 (3.5)	9 (2.7)	9 (4.8)
Outpatient	163 (31.5)	118 (35.9)	45 (23.8)
If transferred = T (%)	99 (19.1)	60 (18.2)	39 (20.6)	.581
Rehospitalization = T (%)	54 (10.4)	33 (10.0)	21 (11.1)	.812
Admission times (mean (SD))	1.39 (1.65)	1.45 (1.88)	1.30 (1.14)	.317

For the laboratory test dataset, the results of the coagulation, routine blood, and biochemical examinations performed before hospitalization were extracted and are classified in Table 2. For the coagulation examination, NICU patients who developed DVT showed higher mean values of fibrinogen (3.40 mg/dL vs 2.80 mg/dL; P < .001). For the routine blood examination, NICU patients who developed DVT showed higher mean values of white cell count (11.61 × 10⁹/L vs 10.21 × 10⁹/L; P = .002), percentage of neutrophils (85.13% vs 81.33%; P < .001), average red blood cell volume (92.18 fl vs 90.38 fl; P = .008), and red blood cell distribution width CV (14.27% vs 13.93%; P = .046) and SD (46.69 fl vs 44.76 fl; P < .001). They showed lower mean values of red blood cell count (3.63 × 10¹²/L vs 3.84 × 10¹²/L; P = .002), hemoglobin (108.40 g/L vs 114.49 g/L; P = .004), percentage of lymphocytes (9.09% vs 13.09%; P < .001), hematocrit (33% vs 35%; P = .035), and average red blood cell HGB concentration (325.26 g/L vs 330.54 g/L; P < .001). For the routine biochemical examination, NICU patients who developed DVT showed higher mean values of urea (6.43 mmol/L vs 5.33 mmol/L; P = .001), glucose (8.36 mmol/L vs 7.70 mmol/L; P = .022), and globulin (25.09 g/L vs 24.13 g/L; P = .041). They showed lower mean values of the albumin-to-globulin ratio (listed as “White ball ratio” in Table 2; 1.36 vs 1.50; P < .001) and albumin (32.92 g/L vs 34.93 g/L; P = .001).

Table 2.

Laboratory Predictors of Deep Vein Thrombosis.

Category	Overall (n = 518)	No DVT (n = 329)	DVT (n = 189)	P
Coagulation-Prothrombin time	13.07 (2.20)	12.97 (2.29)	13.26 (2.04)	0.155
Coagulation-INR	1.12 (0.20)	1.11 (0.20)	1.13 (0.19)	0.188
Coagulation-Activated partial thromboplastin time	32.50 (11.20)	32.58 (11.67)	32.35 (10.37)	0.82
Coagulation-Thrombin time	20.04 (10.28)	20.22 (9.57)	19.73 (11.43)	0.607
Coagulation-Fibrinogen	3.02 (1.49)	2.80 (1.42)	3.40 (1.54)	<0.001
Coagulation-Thromboplastin time ratio	1.17 (0.40)	1.17 (0.42)	1.16 (0.37)	0.861
Blood-Red blood cell count	3.77 (0.76)	3.84 (0.76)	3.63 (0.75)	0.002
Blood-Hemoglobin	112.26 (23.47)	114.49 (23.46)	108.40 (23.04)	0.004
Blood-Platelet count	158.19 (75.27)	158.77 (71.10)	157.19 (82.22)	0.818
Blood-White cell count	10.72 (4.97)	10.21 (4.53)	11.61 (5.54)	0.002
Blood-Percentage of neutrophils	82.71 (10.78)	81.33 (11.76)	85.13 (8.32)	<0.001
Blood-Percentage of Lymphocytes	11.63 (8.77)	13.09 (9.64)	9.09 (6.28)	<0.001
Blood-Percentage of eosinophils	0.62 (1.22)	0.63 (1.11)	0.60 (1.40)	0.764
Blood-Percentage of basophils	0.15 (0.21)	0.15 (0.18)	0.15 (0.25)	0.975
Blood-Hematocrit	0.34 (0.07)	0.35 (0.07)	0.33 (0.07)	0.035
Blood-Average red blood cell volume	91.04 (7.43)	90.38 (7.36)	92.18 (7.44)	0.008
Blood-Average red blood cell HGB	29.91 (2.67)	29.88 (2.72)	29.97 (2.58)	0.703
Blood-Average red blood cell HGB_concentration	328.62 (14.07)	330.54 (14.19)	325.26 (13.24)	<0.001
Blood-Red blood cell distribution width CV	14.05 (1.88)	13.93 (1.74)	14.27 (2.09)	0.046
Blood-Red blood cell distribution width SD	45.47 (5.99)	44.76 (5.37)	46.69 (6.78)	<0.001
Biochemical-Alanine aminotransferase	27.15 (40.11)	27.47 (46.40)	26.60 (25.85)	0.811
Biochemical-Aspartate aminotransferase	36.92 (137.11)	40.30 (170.66)	31.02 (28.77)	0.459
Biochemical-Urea	5.73 (3.66)	5.33 (3.06)	6.43 (4.44)	0.001
Biochemical-Total bilirubin	13.88 (7.81)	13.58 (7.64)	14.41 (8.08)	0.245
Biochemical-Direct bilirubin	6.21 (3.99)	6.09 (4.04)	6.43 (3.90)	0.354
Biochemical-Indirect bilirubin	7.72 (4.84)	7.58 (4.84)	7.96 (4.85)	0.395
Biochemical-Total protein	58.68 (9.09)	59.06 (9.53)	58.01 (8.26)	0.204
Biochemical-Albumin	34.20 (6.67)	34.93 (6.79)	32.92 (6.28)	0.001
Biochemical-Creatinine	78.65 (83.95)	74.48 (64.07)	85.92 (110.16)	0.135
Biochemical-Glucose	7.94 (3.15)	7.70 (3.05)	8.36 (3.30)	0.022
Biochemical-Alkaline phosphatase	70.78 (36.39)	71.36 (40.18)	69.78 (28.72)	0.636
Biochemical-Glutamyl transpeptidase	40.59 (50.61)	40.12 (53.61)	41.40 (45.04)	0.783
Biochemical-Sodium	143.45 (7.48)	143.08 (7.74)	144.08 (6.98)	0.144
Biochemical-Potassium	3.91 (0.54)	3.94 (0.53)	3.85 (0.55)	0.089
Biochemical-Chlorine	106.57 (8.02)	106.08 (8.26)	107.41 (7.53)	0.069
Biochemical-Globulin	24.48 (5.16)	24.13 (5.18)	25.09 (5.09)	0.041
Biochemical-White ball ratio	1.45 (0.39)	1.50 (0.40)	1.36 (0.38)	<0.001
Biochemical-Uric acid	216.71 (119.58)	221.61 (120.54)	208.19 (117.72)	0.219
Biochemical-Triglycerides	1.45 (1.19)	1.41 (1.21)	1.53 (1.16)	0.272


Predictive Analyses

We ran the 4 algorithms on the training set to build better classifiers by optimizing the parameters of each algorithm, and evaluated the classifiers on the testing set, which was never used for model selection or parameter tuning. Fine-tuning the classifiers entailed using different parameter combinations inside trainControl. The parameters producing the best classification performance for each algorithm were chosen using cross-validation on the training data. All classifiers used in this study were fine-tuned and have the same overall architecture. Several classifiers were selected to avoid bias toward a particular classifier. The 4 classifiers were run with 3 cohorts of subjects and feature group combinations. To examine the effectiveness of the feature extraction procedure, we developed predictive models using all features (original) as the baseline and compared them with the models using the 3 feature extraction methods. The results showed that models developed with the FE-RF feature extraction method performed best. The feature sets and machine learning models were evaluated on the testing dataset with ROC analysis by calculating the area under the ROC curve (AUC). Figure 3 shows that, with FE-RF feature extraction, the GLM and SVM prediction models had the largest AUCs, 0.77 and 0.78, respectively; the two did not differ significantly from each other (P = .6736, DeLong’s test), and both exceeded the other 2 methods (P < .1, DeLong’s test). Our predictive model could be used to stratify patients according to their DVT risk in randomized clinical trials and enables us to explore the optimal diagnosis and intervention process for patients in the NICU.
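As a reminder of what the AUC measures, the sketch below computes it directly as the probability that a randomly chosen DVT patient receives a higher predicted score than a randomly chosen non-DVT patient, with ties counted as one half. It is a Python illustration on made-up scores; the authors' own analysis was done in R, and the function name is hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))

# Made-up predicted probabilities for three DVT (1) and three non-DVT (0) patients.
scores = [0.90, 0.80, 0.35, 0.60, 0.40, 0.20]
labels = [1, 1, 1, 0, 0, 0]
example_auc = auc(scores, labels)   # 7 of the 9 pairs are ordered correctly
```

Because it depends only on the ranking of scores, the AUC does not change with the classification threshold, unlike accuracy, sensitivity, or specificity.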

Figure 3.

ROC curves using FE-RF feature extraction methods for (A) GLM, (B) Xgboost, (C) RF, and (D) SVM.

Simulation and Cost-Effect Analysis

Comprehensive ultrasound imaging is the most accurate noninvasive way to diagnose DVT. However, its availability is often limited by a lack of equipment or physicians. Widely accepted diagnostic approaches for DVT include judgment based on doctors’ clinical suspicion, the use of the Wells score for risk stratification, and D-dimer testing in low-risk patients to reduce unnecessary imaging.[20,21] In current practice, patients are first screened by compression ultrasound within the first week of admission to the NICU. Regardless of the results, however, patients must then receive compression ultrasound every week until they leave the hospital. After screening, physicians empirically initiate standard prophylaxis (intermittent pneumatic compression) when the ultrasound does not show DVT, even though most comprehensive studies are ultimately negative, while patients who test positive receive treatment. To overcome the shortcomings of the current practice of DVT diagnosis and reduce the waste of valuable healthcare resources, we propose prediction models based on EHRs data that forecast the presence of DVT before any further diagnosis. First, repeated screening may not be necessary for all patients. Schellong et al[22] concluded that compression ultrasound is safe for excluding DVT, thereby reducing the diagnostic workup of patients with suspected DVT to a single ultrasound screening. Other studies have also shown that it is safe to withhold repeated ultrasound in patients who have a low pretest probability and a normal compression ultrasound result.[23] We therefore made the first adjustment, to the repeated ultrasound screening procedure: if the predicted probability of DVT is relatively low for a patient, repeated ultrasound screening can be withheld.
Second, many studies have shown that standard DVT prophylaxis carries an associated risk of bleeding in many common NICU diseases.[24] In other words, giving standard DVT prophylaxis to all patients who test negative may do more harm than good. We therefore made the second adjustment, to indiscriminate DVT prophylaxis: prophylaxis is provided only when a patient’s predicted probability of DVT is relatively high. The proposed new process is shown in Figure 4.

Figure 4.

Proposed new diagnosis and intervention process of suspected DVT.

To examine the performance of the proposed approach, we conducted simulation experiments. Table 3 summarizes the notation used to calculate the expected cost for every patient in the simulation. All parameters in Table 3 were obtained from either the hospital’s historical information or the literature. In the new diagnosis process, all patients still undergo the first compression ultrasound within the first week of admission to the NICU. However, if a patient’s predicted probability of DVT is lower than P₁ and the first ultrasound is normal, no repeated ultrasound testing is needed. If a patient’s predicted probability of DVT is lower than P₂ but higher than P₁, no intervention is needed. Only when a patient’s predicted probability of DVT is higher than P₂ are both repeated ultrasound testing and intervention needed.
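The two-threshold rule above can be written as a small decision function. This Python sketch reflects one reading of the text and rests on assumptions: it covers only patients whose first ultrasound is normal, it treats the middle band as "repeat ultrasound but no prophylaxis", and it assigns probabilities exactly equal to a threshold to the middle band, which the source does not specify. The names `Plan` and `triage` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Plan:
    repeat_ultrasound: bool
    prophylaxis: bool

def triage(p, p1, p2):
    """Plan for a patient with a normal first ultrasound and predicted DVT probability p.

    Assumed reading of the rule: below p1 -> neither; between p1 and p2 ->
    repeat ultrasound only; above p2 -> repeat ultrasound and prophylaxis.
    """
    if not 0.0 <= p1 <= p2 <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= p1 <= p2 <= 1")
    if p < p1:
        return Plan(repeat_ultrasound=False, prophylaxis=False)
    if p <= p2:
        return Plan(repeat_ultrasound=True, prophylaxis=False)
    return Plan(repeat_ultrasound=True, prophylaxis=True)

# Thresholds reported for the GLM scenario in Table 4.
P1_GLM, P2_GLM = 0.532, 0.626
low = triage(0.30, P1_GLM, P2_GLM)
middle = triage(0.60, P1_GLM, P2_GLM)
high = triage(0.80, P1_GLM, P2_GLM)
```

Patients with a positive first ultrasound fall outside this rule, since they proceed to treatment in both the current and the proposed process.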

Table 3.

Mathematical Notations Summary.

	Notation	Description
Parameters	a	The total treatment cost per day
	t ₁	The number of days for treatment
	u	The cost of a single ultrasound screening
	t	The number of ultrasound screenings
	y	The success rate for the intervention
	i	The cost of intervention per day
	t ₂	The number of days for intervention
	P	The actual current prevalence of DVT
Variables	P	The predictive probability of a patient
	P ₁ and P ₂	The 2 thresholds for the predictive model

We compared the performance of the current diagnosis and intervention process with that of the proposed approach. A cost analysis was also carried out, aiming to establish the necessary screening and intervention at a more reasonable cost. Besides the machine learning models adopted in this paper, we also included a D-dimer test scenario.[23] The optimum cut-points (P₁ and P₂) were those that minimized the expected cost for every patient. The results are shown in Table 4.

Table 4.

Estimated Effect of the Current Diagnostic Process and the Proposed One on DVT Screening and Interventions.

Scenario	P ₁	P ₂	Expected cost per person
Actual (current)			¥6456.2
Optimized 0 (D-Dimer)	.51	.8	¥3856.5
Optimized 1 (GLM)	.532	.626	¥3143.3
Optimized 2 (SVM)	.583	.723	¥3272.7

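The relative saving implied by Table 4 follows from simple arithmetic on the reported expected costs. The Python snippet below only restates the table; the function name is hypothetical, and the underlying expected-cost model is not reproduced because the source does not give its formula.

```python
# Expected cost per person (in yuan) as reported in Table 4.
costs = {
    "Actual (current)": 6456.2,
    "Optimized 0 (D-Dimer)": 3856.5,
    "Optimized 1 (GLM)": 3143.3,
    "Optimized 2 (SVM)": 3272.7,
}

def relative_saving(baseline, optimized):
    """Fraction of the baseline cost saved by the optimized process."""
    return (baseline - optimized) / baseline

baseline = costs["Actual (current)"]
savings = {
    name: relative_saving(baseline, cost)
    for name, cost in costs.items()
    if name != "Actual (current)"
}
for name, saving in savings.items():
    print(f"{name}: {saving:.1%} lower than current practice")
```

On these figures the GLM scenario costs about 51% less per person than current practice, the SVM scenario about 49% less, and the D-dimer scenario about 40% less.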

Discussion

Because the current diagnosis and intervention process for DVT has many limitations, we adopted predictive models to reduce unnecessary tests and treatment by forecasting the probability that NICU patients develop DVT. In this paper, statistical analysis, FE-RF, and Lasso were used to analyze the candidate risk factors that influence the risk of DVT. The development of a machine learning model should be based on the characteristics of the data at hand and the conditions of use. We used the 2 models with AUCs of 0.77 and 0.78 to conduct the simulation, which indicated that the new process is cost-effective. In some previous studies,[25-27] univariate filtering was used, and factors with a P-value of less than .05 were considered statistically significant. Multiple logistic regression was used to identify the cross association between possible risk factors affecting the presence of DVT.[28-30] Simple scoring systems have also been used as DVT risk assessment models in practice.[31-33] Artificial intelligence, and more narrowly machine learning (ML), is beginning to expand humanity’s ability to analyze increasingly large and complex datasets, including in medical research and clinical practice.[34] Many studies have applied predictive analytics using ML techniques to support better decision making for patients with suspected DVT.[10,35,36] Nwosisi et al[37] proposed binary decision trees to predict DVT. Their results showed that the risk probability can indicate well whether a patient will develop DVT, which aids early diagnosis. Khorana et al[38] developed a logistic regression (LR) model to predict chemotherapy-associated VTE using patients’ clinical and laboratory information. Marquez et al[39] also used LR and recursive partitioning methods to develop risk prediction models with predictors from catheterized patients in the PICU.
Rochefort et al[40] assessed the accuracy of a statistical NLP technique using SVM models. Ferroni et al[41] proposed multiple kernel learning based on SVM and random optimization models, which were used to identify the VTE risk predictors yielding the best classification performance. The performance of other commonly used tools is also reviewed here for comparison with our machine learning tools. Eichinger et al[31] developed scoring systems whose AUCs (cross-validated discrimination indices) for predicting the cumulative recurrence risk after 5 years, calculated from baseline, 3, 9, and 15 months, were 0.63, 0.61, 0.61, and 0.58, respectively. Brateanu et al[35] developed a multiple logistic regression model to predict the probability of developing proximal DVT and/or PE within 3 months after an isolated episode of distal DVT. Their final model had a bootstrap bias-corrected c-statistic of 0.72 (95% CI 0.64 to 0.79). Their model might also be used to choose between anticoagulation intervention and monitoring with serial ultrasounds. De Haan et al[36] explored whether including established thrombosis-associated SNPs in a venous thrombosis risk model could improve risk prediction. In their study, the AUC of the risk model based on known non-genetic risk factors was 0.77 (95% CI 0.76-0.78). The optimization of the diagnostic strategy for ruling out DVT is another popular research topic. Given the high degree of heterogeneity and the competing risks of thrombosis and hemorrhage among neurocritical care patients, prevention of DVT in this group is challenging.[42] Predicting the probability that DVT is present in an individual patient is of utmost importance, since DVT can be prevented by thrombosis intervention (also known as thromboprophylaxis). Furthermore, identifying patients at lower risk of DVT can spare them a large number of expensive radiological tests.
Tick et al[43] evaluated a new noninvasive diagnostic strategy for ruling out DVT. Oudega et al[44] showed that DVT can be safely ruled out in a large number of primary care patients using 8 simple indicators from the patient’s history, physical examination, and D-dimer test, which can reduce both the burden on patients and health care costs. However, there is a paucity of evidence addressing thromboprophylaxis in neurocritical care patients, which calls for additional research in this unique care setting.[45] Data mining and ML provide great opportunities and promising results for predicting future health risk from current health predictors.[46] However, all risk prediction models have their own merits and pitfalls, depending on the characteristics of the data at hand and the conditions of use. This study has some limitations, and further research can be done. It is a retrospective study and carries the limitations inherent in that methodological approach. The major limitation is that we have not yet been able to implement our proposed pre-test strategy in an actual patient setting. Although simulation is widely reported in health care,[47] it is not clear how well the strategy would perform in actual implementation. Moreover, the small, single-center sample is not robust enough for broad usage, so large-sample comparative studies are needed to validate the results. Another limitation brought by insufficient data is that time-varying processes are not captured in the study. Although some variables included in the EHR can be time-varying and the risk of DVT should vary over the treatment course,[48] our data contain only the first lab results. In addition, the incidence of DVT may vary between countries and regions. Our analysis is based on the available regional data, which may not reflect the situation at the national level. Finally, in calculating the effect of the intervention, we considered only one type of intervention, i.e. IPC.
In future research, we should consider more personalized interventions with different success rates, side effects, and costs, e.g. pharmacological thromboprophylaxis,[49] which could further improve patients’ quality-adjusted life years. The public health service of China is developing rapidly while facing many problems, such as a shortage of money and resources. The old mode of DVT diagnosis and intervention is obsolete and overly costly. The possible gains of risk assessment models must be weighed against the costs of unnecessary tests, unnecessary follow-ups, and even unnecessary interventions prompted by incidental findings. In this study, the results of the cost-effect analysis support the implementation of this risk assessment model. Implemented this way, the new diagnostic mode uses fewer resources and less manpower per health care unit than the traditional one. The saved resources can be allocated elsewhere to patients who need them more.

Conclusion

Because the current diagnosis and intervention process for DVT has many limitations, we adopted predictive models to reduce unnecessary tests and treatment by predicting the probability that patients in the NICU will develop DVT. A prediction tool that uses the information contained in EHR systems supports clinical decision-making and could help healthcare practitioners improve clinical efficiency. Specifically, using such pre-test probability with a risk assessment model lets physicians easily identify NICU patients with suspected DVT, and can therefore decrease medical costs and reduce the waste of valuable healthcare resources. The simulation results, based on real data from WCH, indicate that our approach is effective and efficient.

Supplemental Material, sj-pdf-1-cat-10.1177_10760296211008650 for Cost-Effective Machine Learning Based Clinical Pre-Test Probability Strategy for DVT Diagnosis in Neurological Intensive Care Unit by Li Luo, Ran Kou, Yuquan Feng, Jie Xiang and Wei Zhu in Clinical and Applied Thrombosis/Hemostasis

45 in total

1. Probability of developing proximal deep-vein thrombosis and/or pulmonary embolism after distal deep-vein thrombosis.

Authors: Andrei Brateanu; Krishna Patel; Kevin Chagin; Pichapong Tunsupon; Pojchawan Yampikulsakul; Gautam V Shah; Sintawat Wangsiricharoen; Linda Amah; Joshua Allen; Aryeh Shapiro; Neha Gupta; Lillie Morgan; Rahul Kumar; Craig Nielsen; Michael B Rothberg
Journal: Thromb Haemost Date: 2015-12-10 Impact factor: 5.249

2. Ruling out deep venous thrombosis in primary care. A simple diagnostic algorithm including D-dimer testing.

Authors: Ruud Oudega; Karel G M Moons; Arno W Hoes
Journal: Thromb Haemost Date: 2005-07 Impact factor: 5.249

3. Safety of Chemical DVT Prophylaxis in Severe Traumatic Brain Injury with Invasive Monitoring Devices.

Authors: Bradley A Dengler; Paolo Mendez-Gomez; Amanda Chavez; Lacey Avila; Joel Michalek; Brian Hernandez; Ramesh Grandhi; Ali Seifi
Journal: Neurocrit Care Date: 2016-10 Impact factor: 3.210

Review 4. Deep vein thrombosis.

Authors: Gargi Bandyopadhyay; Subesha Basu Roy; Swaraj Haldar; Rabindra Bhattacharya
Journal: J Indian Med Assoc Date: 2010-12

5. Accuracy of clinical assessment of deep-vein thrombosis.

Authors: P S Wells; J Hirsh; D R Anderson; A W Lensing; G Foster; C Kearon; J Weitz; R D'Ovidio; A Cogo; P Prandoni
Journal: Lancet Date: 1995-05-27 Impact factor: 79.321

Review 6. Analytics with artificial intelligence to advance the treatment of acute respiratory distress syndrome.

Authors: Zhongheng Zhang; Eliano Pio Navarese; Bin Zheng; Qinghe Meng; Nan Liu; Huiqing Ge; Qing Pan; Yuetian Yu; Xuelei Ma
Journal: J Evid Based Med Date: 2020-11-13

Review 7. Reversal of Anticoagulation and Management of Bleeding in Patients on Anticoagulants.

Authors: Prajwal Dhakal; Supratik Rayamajhi; Vivek Verma; Krishna Gundabolu; Vijaya R Bhatt
Journal: Clin Appl Thromb Hemost Date: 2016-10-26 Impact factor: 2.389

8. Deep venous thrombosis: clinically silent in the intensive care unit.

Authors: Mark A Crowther; Deborah J Cook; Lauren E Griffith; Phillip J Devereaux; Christian C Rabbat; France J Clarke; Neala Hoad; Ellen McDonald; Maureen O Meade; Gordon H Guyatt; William H Geerts; Phillip S Wells
Journal: J Crit Care Date: 2005-12 Impact factor: 3.425

9. Development and validation of a predictive model for chemotherapy-associated thrombosis.

Authors: Alok A Khorana; Nicole M Kuderer; Eva Culakova; Gary H Lyman; Charles W Francis
Journal: Blood Date: 2008-01-23 Impact factor: 22.113

Review 10. Applications of Artificial Intelligence and Big Data Analytics in m-Health: A Healthcare System Perspective.

Authors: Z Faizal Khan; Sultan Refa Alotaibi
Journal: J Healthc Eng Date: 2020-08-30 Impact factor: 2.682