
Comparison of Models for the Prediction of Medical Costs of Spinal Fusion in Taiwan Diagnosis-Related Groups by Machine Learning Algorithms.

Ching-Yen Kuo^1,2, Liang-Chin Yu^1, Hou-Chaung Chen^3, Chien-Lung Chan^1,4.

Abstract

OBJECTIVES: The aims of this study were to compare the performance of machine learning methods for the prediction of the medical costs associated with spinal fusion in terms of profit or loss in Taiwan Diagnosis-Related Groups (Tw-DRGs) and to apply these methods to explore the important factors associated with the medical costs of spinal fusion.
METHODS: A data set was obtained from a regional hospital in Taoyuan city in Taiwan, which contained data from 2010 to 2013 on patients of Tw-DRG49702 (posterior and other spinal fusion without complications or comorbidities). Naïve-Bayesian, support vector machines, logistic regression, C4.5 decision tree, and random forest methods were employed for prediction using WEKA 3.8.1.
RESULTS: Five hundred thirty-two cases were categorized as belonging to the Tw-DRG49702 group. The mean medical cost was US $4,549.7, and the mean age of the patients was 62.4 years. The mean length of stay was 9.3 days. The length of stay was an important variable in terms of determining medical costs for patients undergoing spinal fusion. The random forest method had the best predictive performance in comparison to the other methods, achieving an accuracy of 84.30%, a sensitivity of 71.4%, a specificity of 92.2%, and an AUC of 0.904.
CONCLUSIONS: Our study demonstrated that the random forest model can be employed to predict the medical costs of Tw-DRG49702, and could inform hospital strategy in terms of increasing the financial management efficiency of this operation.

Keywords: Costs and Cost Analysis; Diagnosis-Related Groups; Machine Learning; Spinal Fusion; Taiwan

Year: 2018 PMID: 29503750 PMCID: PMC5820083 DOI: 10.4258/hir.2018.24.1.29

Source DB: PubMed Journal: Healthc Inform Res ISSN: 2093-3681

I. Introduction

Spinal fusion is one of the most common procedures performed by spine surgeons. Between 1998 and 2008, the annual number of spinal fusion discharges increased 2.4-fold, and the national bill for spinal fusion increased 7.9-fold in the United States, while laminectomy, hip replacement, and knee arthroplasty showed relative increases of only 11.3%, 49.1%, and 126.8%, respectively [1]. More recently, the annual number of lumbar spinal fusions has continued to increase, especially at high- and medium-volume hospitals in New York [2]. In many countries, Diagnosis-Related Groups (DRGs) have been created to control the rising costs of healthcare. In the late 1960s, the Yale Center for Health Studies developed DRGs to classify inpatient resource use. The goals were to motivate physicians to use hospital resources and other resources more economically, to document the relationship between medical and administrative decisions, and to define hospital products and services by diagnosis [3]. The DRG system was designed to control hospital reimbursements by replacing retrospective payments with prospective payments for hospital charges. Patients are assigned to a DRG based on their diagnosis, procedures, age, gender, discharge status, and the presence of complications or comorbidities [4]. For example, the Centers for Medicare and Medicaid Services (CMS) classifies spinal fusion DRGs by anterior or posterior spinal fusion and by the presence or absence of complications or comorbidities, giving a total of 8 DRGs. However, according to previous research, there are significant cost variations between different types of spinal surgical procedures, based on the complexity and extent of the surgical procedures, as well as variations within a given DRG [5,6]. DRGs are linked to fixed payment amounts based on the average treatment cost of patients in the group, not on the costs actually incurred.
Hospitals make financial gains by treating patients for whom the hospital costs are lower than the fixed DRG reimbursement rate. Conversely, hospitals suffer financial losses when treating patients whose costs exceed the fixed DRG reimbursement rate. Taiwan launched a single-payer National Health Insurance program on March 1, 1995. As of 2014, 99.9% of Taiwan's population were enrolled [7]. To control rising medical costs, Taiwan has been implementing DRGs (Tw-DRGs) since January 2010. The Tw-DRG 3.4 version of spinal fusion, which classifies patients into anterior or posterior spinal fusions, is divided into five DRGs: Tw-DRG496 (combined anterior/posterior spinal fusion), Tw-DRG49701 (posterior and other spinal fusion with complications or comorbidities), Tw-DRG49702 (posterior and other spinal fusion without complications or comorbidities), Tw-DRG49801 (anterior spinal fusion with complications or comorbidities), and Tw-DRG49802 (anterior spinal fusion without complications or comorbidities) [8]. It is very important to monitor spinal fusion DRGs by constructing prediction models of medical costs for spinal fusion and identifying the potential relationship between patient attributes and medical costs. For patients with potential for high medical resource consumption, hospitals can adopt effective treatment plans to improve patient care and manage hospital resources in advance. Machine learning techniques have recently been used in various healthcare applications [9]; machine learning models are non-parametric in nature and do not need the assumptions that are made in traditional statistical techniques [10]. Various machine learning strategies were previously compared using field-specific datasets, of which several had significantly better predictive power than the more conventional alternatives [11]. 
The application of machine learning techniques can solve classification problems, develop prediction models, and identify high-risk patients, but to the best of our knowledge, no study has employed machine learning to predict medical costs in DRGs. A wide set of machine learning techniques has been employed to develop prediction models, such as naïve-Bayesian, support vector machines (SVM), logistic regression, C4.5 decision tree, and random forest methods. All five models are typical examples of supervised machine learning. Therefore, the purposes of this study were to compare the performance of naïve-Bayesian, SVM, logistic regression, C4.5 decision tree, and random forest methods in predicting the medical costs of spinal fusion in terms of profit or loss effects according to patient characteristics in Tw-DRGs, and to apply these methods to explore the important factors associated with the medical costs of spinal fusion, enabling better management of these patients.

II. Methods

1. Data Collection and Preparation

In this study, a data set was obtained from a regional hospital containing data from January 2010 to December 2013, from which data of patients who underwent spinal fusion surgery were collected. The hospital is a public regional teaching hospital located in Taoyuan City in northern Taiwan, which currently has 972 beds and 1,545 employees, including 194 staff physicians. The data were the original claims data for inpatient admissions used for reimbursement. The database included basic characteristics of patients, admission dates, discharge dates, primary diagnoses, complications or comorbidity ICD-9-CM codes, medical orders, and costs. In the regional hospital, Tw-DRG49702 (posterior and other spinal fusion surgeries without complications or comorbidities) contained the largest number of cases among the five spinal fusion Tw-DRGs; therefore, we chose to use Tw-DRG49702 as the basis for analysis in this study. According to a previous study, the factors affecting the costs of performing a spinal fusion surgery include patient age, complications, gender, obesity, diabetic status, and depression [12]. The predictive variables obtained from the database were gender, age, primary disease, complications or comorbidities, number of complications or comorbidities, number of intervertebral cages, and length of stay. The class label was defined as ‘loss’ for those patients whose medical costs exceeded the Tw-DRG49702 fixed payment, meaning that the hospital took a loss, and ‘non-loss’ for patients whose medical costs fell below the Tw-DRG49702 fixed payment, in which cases the hospital did not take a loss. There were 532 cases in total, of which the medical costs for 124 (23.3%) patients exceeded the Tw-DRG49702 payment (‘loss’), and the costs for 408 (76.7%) patients fell below the Tw-DRG49702 fixed payment (‘non-loss’).
To redress the imbalance in the data distribution between loss and non-loss, we used the synthetic minority over-sampling technique (SMOTE), which is an important approach in which the positive class or the minority class is oversampled. The SMOTE approach can improve the accuracy of classifiers for a minority class [13].
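The labelling rule and the oversampling idea described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the WEKA filter used in the study: the function names `label_case` and `smote`, the toy feature values, the choice of `k`, and the treatment of a cost exactly equal to the payment as ‘non-loss’ are all assumptions made here, and the sketch handles numeric features only.

```python
import random

def label_case(cost, fixed_payment):
    """'loss' when the actual cost exceeds the fixed DRG payment, else 'non-loss'.
    Assumption: a cost exactly equal to the payment counts as 'non-loss'."""
    return "loss" if cost > fixed_payment else "non-loss"

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # random position on the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical toy data: [length of stay in days, number of intervertebral cages]
loss_cases = [[14.0, 3.0], [16.0, 2.0], [12.0, 4.0], [15.0, 3.0]]
new_cases = smote(loss_cases, n_new=4, k=2)
```

Because each synthetic case lies on a line segment between two real minority cases, the oversampled class stays inside the region already occupied by the observed ‘loss’ cases rather than simply duplicating them.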

2. Machine Learning Algorithm for Prediction

In this study, we assessed five classification models, namely, naïve-Bayesian, SVM, logistic regression, C4.5 decision tree, and random forest models.

1) Naïve-Bayesian algorithm

Naïve-Bayesian classifiers or simple-Bayesian classifiers based on Bayes' theorem assume that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called class conditional independence [14]. Naïve-Bayesian classifiers are among the simplest models in machine learning. Miranda et al. [15] detected cardiovascular disease risk factors using a naïve-Bayesian classifier.
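Class conditional independence can be written compactly. The notation below (class C, attribute values x_1, ..., x_n) is introduced here for illustration and is the standard textbook formulation rather than anything specific to this study:

```latex
P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C),
\qquad
\hat{C} \;=\; \arg\max_{C \in \{\text{loss},\ \text{non-loss}\}} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```

The product over attributes is exactly where the independence assumption enters: each attribute contributes its own factor, regardless of the values of the other attributes.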

2) Support vector machines algorithm

The SVM algorithm was proposed by Cortes and Vapnik [16] in 1995, and it has become one of the most influential classification algorithms in recent years. The SVM technique builds a maximum-margin hyperplane in a transformed input space that separates the pattern classes while maximizing the distance to the closest patterns of each class. SVM can be used to effectively perform non-linear classification. Kuo et al. [17] used SVM to predict the mortality of hospitalized motorcycle riders.

3) Logistic regression algorithm

Logistic regression is a regression model in which the dependent variable is categorical. It models the log-odds of an event occurring as a linear function of a set of predictor variables, and it is widely used in the medical field to predict diseases or patient survival [9].
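In the standard formulation, with p the probability of the event (here, a ‘loss’ case), x_1, ..., x_n the predictors, and β_0, ..., β_n the fitted coefficients (notation introduced here for illustration), the model reads:

```latex
\log \frac{p}{1 - p} \;=\; \beta_0 + \sum_{i=1}^{n} \beta_i x_i
\quad\Longleftrightarrow\quad
p \;=\; \frac{1}{1 + \exp\!\left(-\beta_0 - \sum_{i=1}^{n} \beta_i x_i\right)}
```

The left-hand form shows the linear relationship on the log-odds scale; the right-hand form gives the predicted probability, which always lies between 0 and 1.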

4) C4.5 decision tree algorithm

A decision tree is a flow-chart-like tree structure in which each node denotes a test on an attribute value, each branch represents an outcome of the test, and the tree leaves represent classes or class distributions [14]. There are several different decision tree algorithms, such as Iterative Dichotomiser 3 (ID3), C4.5, and Classification and Regression Trees (CART). C4.5, developed by Quinlan [18], is an algorithm for generating a decision tree and is an extension of Quinlan's earlier ID3 algorithm. C4.5 is often referred to as a statistical classifier [19], and it is widely used to address real-world problems [20]. Habibi et al. [21] used a decision tree to find the features related to type 2 diabetes risk factors to help in the screening of diabetes patients.

5) Random forest algorithm

A random forest classifier, proposed by Breiman [22], is an ensemble classifier that produces multiple decision trees using randomly selected features. In classification, the final class is obtained by combining the results of the individual decision trees through majority voting. Raju et al. [23] used the random forest model to explore factors associated with pressure ulcers.

The performance of the models considered in this study was assessed by computing the accuracy, sensitivity, specificity, and the total area under the receiver operating characteristic (ROC) curve (AUC). Accuracy is the ability to differentiate between loss and non-loss cases correctly. Sensitivity is the ability to identify loss cases correctly. Specificity is the ability to identify non-loss cases correctly. These measurements are expressed in terms of true positive (TP), false negative (FN), true negative (TN), and false positive (FP) values:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)

The ROC curve is a plot of the true positive rate, TP / (TP + FN), against the false positive rate, FP / (FP + TN). The classification performance is represented by the total AUC. The closer the area is to 0.5, the less accurate the corresponding model is. A model with perfect accuracy will have an area of 1.0 [14].

The Waikato Environment for Knowledge Analysis (WEKA) 3.8.1 was used for prediction in this study. To avoid overfitting due to use of the same data for training and testing of different classification methods, a 10-fold cross-validation method was used to minimize the bias associated with the random sampling of the training data. In the 10-fold cross-validation, the data set was divided into 10 parts. In each round, 9 parts were used for training and the remaining part was used for testing; the process was repeated until every part had been used for testing [10]. The goal of this process was to determine which data mining algorithm performs best so we could use it to generate our target predictive model [24].
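The performance measures described above can be computed directly from the four confusion-matrix counts. The sketch below is illustrative: the function names and the example counts are hypothetical and are not taken from the study's results. The `auc` helper uses the rank interpretation of the AUC (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half), which is equivalent to the area under the ROC curve.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, and false positive rate
    from confusion-matrix counts ('loss' is the positive class)."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
    }

def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    tied scores count as half a correct ranking."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical counts, for illustration only
m = classification_metrics(tp=70, fn=30, tn=180, fp=20)
```

A classifier that ranks every loss case above every non-loss case gives `auc(...) == 1.0`, while one that assigns the same score to all cases gives 0.5, matching the interpretation in the text.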
The process of data extraction and analysis in this study is shown in Figure 1.

Figure 1

Procedure for data extraction and analysis. Tw-DRG: Taiwan Diagnosis-Related Group, SVM: support vector machine, AUC: area under the receiver operating characteristic curve.
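The 10-fold cross-validation step in this procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not WEKA's implementation: the function name `k_fold_splits`, the shuffling seed, and the round-robin assignment of cases to folds are choices made here, and the sketch does not stratify folds by class.

```python
import random

def k_fold_splits(n_cases, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs so that every case
    is used for testing exactly once across the k rounds."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 532 is the number of Tw-DRG49702 cases reported in the study.
splits = list(k_fold_splits(532, k=10))
```

In each round a model would be trained on `train` and evaluated on `test`; pooling the test-fold predictions over all 10 rounds gives one prediction per case, from which the performance measures can be computed.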

III. Results

1. Spinal Fusion Patient Characteristics

Table 1 shows the demographic and clinical characteristics of patients in the Tw-DRG49702 group during the study period. There were 532 cases in total. The mean total medical cost was US $4,549.7 (SD = 1,581.7), and the mean age of the patients was 62.4 (SD = 12.5) years. The average number of complications or comorbidities was 2.0 (SD = 1.22). The mean length of stay was 9.3 (SD = 3.9) days. Among the subjects, 41.4% were male and 58.6% were female. Major primary diseases included lumbar stenosis, spondylolisthesis, lumbar disc displacement, acquired spondylolisthesis, lumbar atresia fracture, and scoliosis; complications or comorbidities included high blood pressure, sciatica, diabetes, and osteoporosis. There were significant differences in the actual medical cost, length of stay, number of intervertebral cages, lumbar disc displacement, lumbar atresia fracture, and scoliosis between the loss and non-loss groups. However, no significant differences were noted in terms of gender, age, number of complications or comorbidities, lumbar stenosis, high blood pressure, sciatica, spondylolisthesis, diabetes, acquired spondylolisthesis, and osteoporosis between the loss and non-loss groups.

Table 1

Spinal fusion Tw-DRG49702 patient characteristics

Values are presented as mean ± standard deviation or number (%).

CCs: complications or comorbidities.

**p<0.01, ***p<0.001.

2. Performance of Models

Table 2 summarizes the performance of all five models analyzed in this study, with accuracies ranging from 76.68% for the naïve-Bayesian model to 84.30% for the random forest model. The random forest model achieved better predictive performance than the other methods, with the highest accuracy, sensitivity, specificity, and AUC. The model achieved an accuracy of 84.30%, with a sensitivity of 71.40%, a specificity of 92.20%, and an AUC of 0.904. The next best model was logistic regression, with an 82.16% accuracy, a 69.80% sensitivity, an 89.70% specificity, and an AUC of 0.860. The worst model in terms of predictive value was the naïve-Bayesian model, with an accuracy of 76.68%, a sensitivity of 56.90%, a specificity of 88.70%, and an AUC of 0.815 (see details in Appendix 1).

Table 2

Comparison of performance of various prediction models

AUC: area under the receiver operating characteristic curve, SVM: support vector machine.

IV. Discussion

To the best of our knowledge, this study was the first to use machine learning to analyze DRG medical costs. The medical costs of performing spinal fusion in Tw-DRG49702 (posterior and other spinal fusion without complications or comorbidities) in a regional hospital in Taoyuan city in Taiwan were predicted, and the factors associated with profit and loss in terms of medical costs in Tw-DRG49702 were analyzed, using various machine learning techniques. The results of the study showed that the length of stay, number of intervertebral cages, lumbar disc displacement, lumbar atresia fracture, and scoliosis were important factors associated with the medical costs of Tw-DRG49702. In addition, we found that the random forest model had the best predictive performance in comparison with the logistic regression, SVM, C4.5 decision tree, and naïve-Bayesian models. The random forest method classified the Tw-DRG49702 cases as loss or non-loss with an accuracy of 84.30%.

The length of stay was an important variable in terms of determining medical costs for patients undergoing spinal fusion, with the loss group having a significantly longer length of stay. Future management leading to expected reductions in hospital stay will be based on continuous co-operative efforts to improve clinical guidelines or apply lean methods to produce standardized clinical pathways [25].

In our study, the random forest model had better classification accuracy than the C4.5 decision tree classifier (84.30% vs. 78.51%). The random forest algorithm, which is one of the most powerful ensemble algorithms, is an effective tool for prediction. Breiman argued that, because of the law of large numbers, random forests do not overfit as more trees are added [22]. Previous research has shown that an ensemble is often more accurate than any of the single classifiers in the ensemble [26]. Hu et al. [27] experimentally compared the performance of SVM, C4.5, bagging C4.5, AdaBoosting C4.5, and random forest methods for the analysis of seven microarray cancer data sets. The experimental results showed that all ensemble methods outperformed C4.5. Masetic and Subasi [28] confirmed the superiority of the random forest method over the C4.5 and SVM methods for the detection of congestive heart failure.

This study also found the random forest model to be superior to traditional logistic regression, a result similar to those of previous studies. The random forest model was more accurate than logistic regression in predicting clinical deterioration. A study of the accuracy of mortality prediction for patients with sepsis at the emergency department found that the random forest model was more accurate (AUC = 0.86) than the logistic regression model (AUC = 0.76, p ≤ 0.003), and the random forest model was also more accurate than the logistic regression model in predicting mortality after elective cardiac surgery [29]. Raju et al. [23] also found that the random forest model had the highest accuracy when used to explore factors associated with pressure ulcers in comparison with decision tree and logistic regression models. These results implied that the random forest model is suitable for classification of the medical costs of Tw-DRG49702.

The strength of this study was that it explored spinal fusion medical cost predictive models and identified important factors; however, the study had some limitations. First, the accuracy of this model was 84.30%, meaning that there are still other potential factors that could affect the medical costs of spinal fusion. Second, the study was performed at only a single hospital and with a small sample size. It is recommended that data from larger hospitals be analyzed in future studies.
Our study demonstrated that the random forest model can be used to predict the medical costs of Tw-DRG49702 (posterior and other spinal fusion without complications or comorbidities), and based on the important factors identified, this study can inform hospital strategy in terms of increasing the efficiency of management of this type of operation in financial terms. Furthermore, methods of this type can also be used to address related problems, such as predicting the costs of other DRGs.

18 in total

1. Diagnosis-related Groups and Hospital Inpatient Federal Reimbursement.

Authors: Simcha B Rimler; Brian D Gale; Deborah L Reede
Journal: Radiographics Date: 2015-10 Impact factor: 5.333

2. A new survival status prediction system for severe trauma patients based on a multiple classifier system.

Authors: José Sanz; Daniel Paternain; Mikel Galar; Javier Fernandez; Diego Reyero; Tomás Belzunegui
Journal: Comput Methods Programs Biomed Date: 2017-02-10 Impact factor: 5.428

3. Cost Variation Within Spinal Fusion Payment Groups.

Authors: David J Wright; Dana B Mukamel; Sheldon Greenfield; S Samuel Bederman
Journal: Spine (Phila Pa 1976) Date: 2016-11-15 Impact factor: 3.468

4. Congestive heart failure detection using random forest classifier.

Authors: Zerina Masetic; Abdulhamit Subasi
Journal: Comput Methods Programs Biomed Date: 2016-03-21 Impact factor: 5.428

5. Statistical-learning strategies generate only modestly performing predictive models for urinary symptoms following external beam radiotherapy of the prostate: A comparison of conventional and machine-learning methods.

Authors: Noorazrul Yahya; Martin A Ebert; Max Bulsara; Michael J House; Angel Kennedy; David J Joseph; James W Denham
Journal: Med Phys Date: 2016-05 Impact factor: 4.071

6. Exploring factors associated with pressure ulcers: a data mining approach.

Authors: Dheeraj Raju; Xiaogang Su; Patricia A Patrician; Lori A Loan; Mary S McCarthy
Journal: Int J Nurs Stud Date: 2014-08-18 Impact factor: 5.837

7. Spinal surgery: variations in health care costs and implications for episode-based bundled payments.

Authors: Beatrice Ugiliweneza; Maiying Kong; Kristin Nosova; Kevin T Huang; Ranjith Babu; Shivanand P Lad; Maxwell Boakye
Journal: Spine (Phila Pa 1976) Date: 2014-07-01 Impact factor: 3.468

8. Reduction of Inpatient Hospital Length of Stay in Lumbar Fusion Patients With Implementation of an Evidence-Based Clinical Care Pathway.

Authors: Alison Bradywood; Farrokh Farrokhi; Barbara Williams; Mark Kowalczyk; C Craig Blackmore
Journal: Spine (Phila Pa 1976) Date: 2017-02 Impact factor: 3.468

9. Type 2 Diabetes Mellitus Screening and Risk Factors Using Decision Tree: Results of Data Mining.

Authors: Shafi Habibi; Maryam Ahmadi; Somayeh Alizadeh
Journal: Glob J Health Sci Date: 2015-03-18

10. A Comparison of a Machine Learning Model with EuroSCORE II in Predicting Mortality after Elective Cardiac Surgery: A Decision Curve Analysis.

Authors: Jérôme Allyn; Nicolas Allou; Pascal Augustin; Ivan Philip; Olivier Martinet; Myriem Belghiti; Sophie Provenchere; Philippe Montravers; Cyril Ferdynus
Journal: PLoS One Date: 2017-01-06 Impact factor: 3.240
