
Predicting Venous Thrombosis in Osteoarthritis Using a Machine Learning Algorithm: A Population-Based Cohort Study.

Chao Lu1, Jiayin Song1, Hui Li1,2, Wenxing Yu1, Yangquan Hao1, Ke Xu1, Peng Xu1.   

Abstract

Osteoarthritis (OA) is the most common joint disease and is associated with pain and disability. OA patients are at a high risk for venous thrombosis (VTE). Here, we developed an interpretable machine learning (ML)-based model to predict VTE risk in patients with OA. To establish the prediction model, we used six ML algorithms with 35 variables. Recursive feature elimination (RFE) was used to select the clinical variables most strongly associated with VTE. SHapley Additive exPlanations (SHAP) were applied to interpret the ML model and determine the importance of the selected features. Overall, 3169 patients with OA (average age: 66.52 ± 7.28 years) were recruited from Xi'an Honghui Hospital. Of these, 352 and 2817 patients were diagnosed with and without VTE, respectively. The XGBoost algorithm showed the best performance. According to the RFE algorithm, 15 variables were retained for further modeling with the XGBoost algorithm. The top three predictors were Kellgren-Lawrence grade, age, and hypertension. Our study showed that the XGBoost model with 15 variables has a high potential to predict VTE risk in patients with OA.

Keywords:  VTE risk prediction; machine learning algorithm; osteoarthritis; population-based cohort study; venous thrombosis

Year:  2022        PMID: 35055429      PMCID: PMC8781369          DOI: 10.3390/jpm12010114

Source DB:  PubMed          Journal:  J Pers Med        ISSN: 2075-4426


1. Introduction

Osteoarthritis (OA) is the most common joint disease worldwide, with an age-associated increase in both incidence and prevalence [1,2]. It is estimated that approximately 302 million people globally suffer from this disease, and the associated healthcare resources and financial burden can be substantial [3,4]. OA, a primary cause of pain, disability, and joint replacement, is characterized by disease affecting the whole joint, including articular cartilage degradation, synovium and ligament inflammation, and changes to the subchondral bone [5,6,7]. Despite the symptomatic treatment of pain, stiffness, and swelling, there are no FDA-approved disease-modifying drugs [8]. OA is a complex disease, and a multitude of possible etiologies contribute to its development, including obesity, sedentary lifestyle, trauma, and aging [9,10,11]. Early prevention and elimination of risk factors are critical in delaying disease progression [12]. Nevertheless, despite these identifiable underlying causes, OA still cannot be effectively prevented. Venous thrombosis is a relatively common and potentially fatal condition, and an increased risk of VTE has been reported in arthritis, particularly in rheumatoid arthritis (RA) [13,14,15,16]. Li et al. reported that RA patients have an increased risk of VTE, pulmonary embolism, and deep vein thrombosis after diagnosis in comparison with the general population [17]. This suggests that VTE may play a vital role in chronic and systemic inflammatory autoimmune disease. However, the relationship between OA and VTE has not been elucidated. A recent study in a large population-based cohort revealed that knee and hip osteoarthritis might increase the risk of incident VTE by approximately 40% and 80%, respectively, compared with individuals without OA, an effect that may be partly mediated through joint replacement [18]. Thus, predicting VTE risk among OA patients is critical to reducing VTE-related morbidity and mortality.
Machine learning (ML) is a computer-based method of data analysis that is often used to construct predictive models based on large datasets [19]. In this study, we aimed to develop a model using ML algorithms to identify OA patients at high risk of VTE.

2. Materials and Methods

We performed a single-center cross-sectional study of OA patients in Xi’an Honghui Hospital between January 2018 and December 2020. Patients were consecutively recruited from the joint surgery department and were examined by venous ultrasound of the legs to assess VTE risk. The inclusion criteria were as follows: (1) diagnosed with knee osteoarthritis (guidelines for the diagnosis and treatment of osteoarthritis (2018 edition)) [20]; (2) radiographically evaluated by X-ray at Kellgren–Lawrence grade stages 3–4. Patients with a heart stent, ischemic stroke, cancer, or incomplete laboratory data were excluded from the study. The study was approved by the Ethics Committee of Xi’an Honghui Hospital and conducted in accordance with the Declaration of Helsinki. Written informed consent was waived owing to the retrospective nature of the study. All confidential patient information was deleted from the entire dataset prior to the analysis. All patient demographics and laboratory data at admission were extracted manually from electronic medical records using a standardized case report form.

2.1. Machine Learning Algorithms

To develop the machine learning models, 35 parameters were used for the analysis. Before developing the ML models, laboratory indices, which were continuous variables, were converted into categorical variables based on their normal range values. In addition, the patient’s age was treated as a continuous variable, with missing values replaced by the median value. All patients were randomly divided into a training set and a test set at a ratio of 8:2. Six ML algorithms, namely logistic regression (LR), random forest (RF), extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), gradient boosting decision tree (GBDT), and light gradient boosting machine (LGBM), were used to predict the VTE risk. We used the receiver operating characteristic (ROC) curve as the evaluation metric to compare the performance of each ML algorithm on the training and testing sets. The best-performing model was chosen, and recursive feature elimination (RFE) was employed to screen for the optimal combination of variables. For model interpretation, the SHapley Additive exPlanations (SHAP) algorithm was used to calculate the Shapley value of each variable, based on game theory, to further explain the best-performing model.
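
The preprocessing steps described above (dichotomizing laboratory indices by their normal range, imputing missing ages with the median, and splitting the data 8:2) can be sketched in plain Python. This is only an illustration: the reference range, field names, seed, and toy records are assumptions, since the text does not give the study's actual normal ranges or implementation.

```python
import random
from statistics import median

# Hypothetical reference range for illustration only; the study's actual
# normal ranges are not given in the text.
NORMAL_RANGES = {"uric_acid": (150.0, 420.0)}


def dichotomize(value, low, high):
    """Return 0 if the value lies within [low, high] (normal), else 1 (abnormal)."""
    return 0 if low <= value <= high else 1


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def train_test_split_80_20(rows, seed=0):
    """Shuffle a copy of rows and split it 8:2 into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Toy records (made-up values, not study data).
ages = impute_median([61, None, 70, 66, 74])
uric = [310.0, 455.0, 128.0, 400.0, 380.0]
low, high = NORMAL_RANGES["uric_acid"]
rows = [
    {"age": a, "uric_acid_abnormal": dichotomize(u, low, high)}
    for a, u in zip(ages, uric)
]
train, test = train_test_split_80_20(rows)
```

The split here is a simple random shuffle; whether the study additionally stratified the split by VTE status is not stated in this section.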

2.2. Statistical Analysis

All statistical analyses were conducted using Python software (version 3.8). Fisher’s exact test or the χ² test was used for binary variables, and Student’s t-test was used for continuous variables. Owing to the imbalance of the dataset, the synthetic minority oversampling technique (SMOTE) was applied to the training set. Six ML algorithms were used to screen for the best-performing prediction model. Using the RFE algorithm, all variables were filtered one by one to obtain the best combination, which was then used to build the selected ML prediction model. We also used the SHAP algorithm to interpret and evaluate the optimized model. Statistical significance was set at p ≤ 0.05.
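
SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbors. The sketch below shows that core idea in plain Python. The function name, the choice of `k`, the seed, and the toy points are assumptions, and it treats every feature as continuous, which the study's mostly categorical variables would not satisfy without adaptation.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of base (squared Euclidean distance).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbor))
        )
    return synthetic


# Toy minority class (made-up 2-D points, not study data).
minority = [(1.0, 1.0), (1.5, 1.2), (2.0, 0.8), (1.2, 1.6)]
new_points = smote(minority, n_new=6)
balanced_minority = minority + new_points
```

Oversampling of this kind is applied to the training set only, as the text states, so that the testing set keeps the original class ratio.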

3. Results

We excluded subjects with missing data and subsequently enrolled 3169 patients with an average age of 66.52 ± 7.28 years in the study (Figure 1). Of them, 2400 patients were male and 769 patients were female, accounting for 75.73% and 24.27% of all patients, respectively. All patients were divided into the VTE and non-VTE groups. There were 352 patients with VTE, with an average age of 68.05 ± 6.84 and 2817 patients without VTE, with an average age of 66.33 ± 7.31. In the VTE group, 281 patients were male (79.83%) and 71 patients were female (20.17%). In the non-VTE group, 2119 patients were male (75.22%) and 698 were female (24.78%). The baseline characteristics of patients stratified by VTE are summarized in Table 1.
Figure 1

Flow chart of patients for enrollment.

Table 1

Characteristics of the patients stratified by VTE or not.

Class a                           Total            None-Venous Thrombosis   Venous Thrombosis   p b
N                                 3169             2817                     352
Age (year) b                      66.52 ± 7.28     66.33 ± 7.31             68.05 ± 6.84        <0.001
Gender
  Male                            2400 (75.73%)    2119 (75.22%)            281 (79.83%)        0.066
  Female                          769 (24.27%)     698 (24.78%)             71 (20.17%)
Hypertension
  No                              1730 (54.59%)    1543 (54.77%)            187 (53.12%)        0.597
  Yes                             1439 (45.41%)    1274 (45.23%)            165 (46.88%)
Diabetes
  No                              2751 (86.81%)    2437 (86.51%)            314 (89.20%)        0.185
  Yes                             418 (13.19%)     380 (13.49%)             38 (10.80%)
Coronary heart disease
  No                              2207 (69.64%)    1974 (70.07%)            233 (66.19%)        0.152
  Yes                             962 (30.36%)     843 (29.93%)             119 (33.81%)
Kellgren–Lawrence grade
  0                               2269 (71.60%)    1943 (68.97%)            326 (92.61%)        <0.001
  III                             181 (5.71%)      178 (6.32%)              3 (0.85%)
  IV                              719 (22.69%)     696 (24.71%)             23 (6.54%)
Eosinophil ratio
  Normal Range                    2746 (86.65%)    2431 (86.30%)            315 (89.49%)        0.115
  Abnormal                        423 (13.35%)     386 (13.70%)             37 (10.51%)
Hematocrit
  Normal Range                    2535 (79.99%)    2254 (80.01%)            281 (79.83%)        0.991
  Abnormal                        634 (20.01%)     563 (19.99%)             71 (20.17%)
Mean platelet volume
  Normal Range                    2782 (87.79%)    2462 (87.40%)            320 (90.91%)        0.070
  Abnormal                        387 (12.21%)     355 (12.60%)             32 (9.09%)
Thrombocytocrit
  Normal Range                    2858 (90.19%)    2527 (89.71%)            331 (94.03%)        0.013
  Abnormal                        311 (9.81%)      290 (10.29%)             21 (5.97%)
Platelet-larger cell ratio
  Normal Range                    2390 (75.42%)    2112 (74.97%)            278 (78.98%)        0.114
  Abnormal                        779 (24.58%)     705 (25.03%)             74 (21.02%)
Uric acid
  Normal Range                    2554 (80.59%)    2261 (80.26%)            293 (83.24%)        0.208
  Abnormal                        615 (19.41%)     556 (19.74%)             59 (16.76%)
Glucose
  Normal Range                    2665 (84.10%)    2369 (84.10%)            296 (84.09%)        0.941
  Abnormal                        504 (15.90%)     448 (15.90%)             56 (15.91%)
Antistreptococcal hemolysin “O”
  Normal Range                    3074 (97.00%)    2726 (96.77%)            348 (98.86%)        0.045
  Abnormal                        95 (3.00%)       91 (3.23%)               4 (1.14%)
Anti-CCP antibody
  Normal Range                    2549 (80.44%)    2255 (80.05%)            294 (83.52%)        0.140
  Abnormal                        620 (19.56%)     562 (19.95%)             58 (16.48%)
Rheumatoid factors
  Normal Range                    2902 (91.57%)    2577 (91.48%)            325 (92.33%)        0.661
  Abnormal                        267 (8.43%)      240 (8.52%)              27 (7.67%)

a Continuous variables were transformed into dichotomous variables according to their normal range. b Values are presented as mean ± SD.
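
As a worked example of the χ² test used for the binary variables in Table 1, the sketch below applies it to the gender row using only the standard library. The use of Yates' continuity correction is an assumption (the text does not say whether a correction was applied), and the function name is illustrative.

```python
import math


def chi2_2x2_yates(a, b, c, d):
    """Chi-square test with Yates' correction for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row = (a + b, c + d)
    col = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            diff = max(abs(observed[i][j] - expected) - 0.5, 0.0)
            stat += diff ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Gender row of Table 1: (male, female) counts in the non-VTE and VTE groups.
stat, p = chi2_2x2_yates(2119, 698, 281, 71)
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```

Under this assumption the p-value comes out close to the 0.066 reported for gender in Table 1.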

The patients were randomly divided (8:2) into training and testing sets to evaluate the model performance. All 35 variables were then entered into the six ML algorithms (LR, RF, XGBoost, AdaBoost, GBDT, and LGBM) to identify the model with the best predictive performance. The XGBoost model performed best, with an area under the curve (AUC) of 0.741 (95% CI: 0.676, 0.806) on the testing set (Figure 2A,B). The AUC values of the other models are shown in Table 2.
Figure 2

The receiver operating characteristic (ROC) curves of the machine learning models on the training set (A) and testing set (B).
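
The area under the ROC curve can be read as the probability that a randomly chosen patient with VTE receives a higher predicted risk than a randomly chosen patient without VTE. A minimal sketch of that pairwise definition follows; the function name and the toy labels and scores are illustrative assumptions, and the study's own AUC and confidence-interval computation is not described in the text.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (1 = VTE) and predicted risks; made-up values, not study data.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.35, 0.40, 0.20, 0.80, 0.35, 0.50, 0.90]
auc = roc_auc(labels, scores)
```

Computing this separately on the training and testing sets gives the two columns that the models are compared on.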

Table 2

The area under the curve (AUC) of training set and testing set.

Model       Training Set (AUC, 95% CI)    Testing Set (AUC, 95% CI)
LR          0.843 (0.832, 0.855)          0.690 (0.620, 0.760)
RF          0.872 (0.862, 0.882)          0.685 (0.618, 0.753)
XGBoost     0.980 (0.977, 0.983)          0.741 (0.676, 0.806)
AdaBoost    0.858 (0.847, 0.868)          0.687 (0.619, 0.755)
GBDT        0.965 (0.960, 0.970)          0.720 (0.656, 0.784)
CatBoost    0.973 (0.969, 0.977)          0.724 (0.657, 0.790)

To further optimize the XGBoost model, the RFE method was used to screen the most important variables that can predict the VTE risk. Finally, 15 variables were employed to establish the final prediction model, and the new XGBoost model showed that the AUC of the testing dataset was 0.727 (95% CI = 0.662, 0.792) (Figure 3A,B).
Figure 3

Using the RFE method to screen the optimal variables. (A) The most important variables, screened by the RFE method; (B) The receiver operating characteristic (ROC) curves of the XGBoost model on the training set and testing set.
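
Recursive feature elimination repeatedly fits a model, ranks the features by importance, and drops the weakest one until the desired number remains. The sketch below illustrates the loop with a small logistic regression ranked by absolute weight. This is an assumption made to keep the example dependency-free: the study ran RFE with XGBoost, and the function names, hyperparameters, and toy data here are invented for illustration.

```python
import math


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def rfe(X, y, names, n_keep):
    """Refit, drop the feature with the smallest |weight|, repeat until n_keep remain."""
    kept = list(range(len(names)))
    while len(kept) > n_keep:
        X_sub = [[row[j] for j in kept] for row in X]
        w, _ = fit_logistic(X_sub, y)
        weakest = min(range(len(kept)), key=lambda i: abs(w[i]))
        del kept[weakest]
    return [names[j] for j in kept]


# Toy data (made up): only "kl_grade" tracks the label; the other two are noise.
names = ["kl_grade", "noise_a", "noise_b"]
X = [[-1, -1, -1], [-1, 1, -1], [-1, -1, 1], [-1, 1, 1],
     [1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
selected = rfe(X, y, names, n_keep=1)
```

In the study the same loop would stop at 15 of the 35 variables, with the retained set then used to refit the final model.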

Interpretation and Evaluation of Machine Learning Model

The SHAP method was also used to interpret the relative importance of each variable in the XGBoost model. Our results showed that age, eosinophil ratio (EOSR), hematocrit (HCT), mean platelet volume (MPV), thrombocytocrit (PCT), platelet-larger cell ratio (P-LCR), uric acid (UA), glucose, antistreptococcal hemolysin “O” (ASO), anti-cyclic citrullinated peptide antibody (ACPA), rheumatoid factor (RF), Kellgren–Lawrence grade (K–L grade), history of hypertension, diabetes, and coronary artery disease (CAD) were associated with the risk of VTE in OA patients. Particularly, K–L grade, age, and hypertension were the three vital variables (Figure 4A,B).
Figure 4

Interpretation and Evaluation of Machine Learning Model. (A) SHAP analysis on the dataset, which shows the 15 most important features and their impact on the model output. Each dot represents one patient, with blue color meaning the lowest range and red color meaning the highest range of the feature; (B) Ranking of the features’ importance indicated by SHAP analysis.
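
SHAP attributes a prediction to individual features using Shapley values from cooperative game theory: each feature's value is its marginal contribution averaged over every order in which features could be added. The brute-force sketch below computes exact Shapley values for a three-feature toy score. The scoring function and its coefficients are invented for illustration and are not the fitted XGBoost model; the SHAP library uses much faster algorithms for tree models.

```python
from itertools import permutations


def shapley_values(value, players):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


# Hypothetical additive-plus-interaction risk score for illustration only;
# the coefficients are invented and do not come from the fitted model.
def toy_risk(present):
    score = 0.0
    if "kl_grade" in present:
        score += 0.30
    if "age" in present:
        score += 0.20
    if "kl_grade" in present and "age" in present:
        score += 0.10  # interaction, split evenly between the two features
    if "hypertension" in present:
        score += 0.05
    return score


phi = shapley_values(toy_risk, ["kl_grade", "age", "hypertension"])
```

The attributions sum to the score with all features present, which is the additivity property that lets SHAP plots decompose each patient's predicted risk feature by feature.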

4. Discussion

Extensive efforts have been made to delay the progression of OA to the end stage. In this hospital-based cross-sectional study, we used ML algorithms to predict VTE risk in patients with OA. We found that the XGBoost model with 15 variables can predict VTE risk in OA patients, a group whose numbers may grow as the global population ages. OA is not simply a matter of mechanical damage to the joint but involves several additional risk factors [21]. Nevertheless, some patients still inevitably progress rapidly to the end stage [22]. As the 11th leading cause of disability worldwide, OA has led to a rapid increase in orthopedic surgeries over the last few decades [4]. Rather than medication, lifestyle modification is the most promising avenue for the prevention of OA [3,23]. Many risk factors, including VTE, have been identified, and these may be partly mediated through knee or hip replacement. In a large population-based cohort study, Zeng et al. reported that the risk of VTE was approximately 40% higher among individuals with knee OA and 80% higher among individuals with hip OA than among those without OA [18]. Machine learning is a crucial branch of artificial intelligence that utilizes historical data to predict the likelihood of a future outcome [24,25]. As a multidisciplinary approach, ML algorithms are increasingly being utilized to predict outcomes in lower-extremity total joint arthroplasty [26]. Lu et al. used ML to establish a model to predict surgical outcomes after non-compartmental knee arthroplasty [27]. Kunze et al. developed machine learning algorithms based on partially modifiable risk factors for predicting dissatisfaction after arthroplasty [28]. In this study, we found that the XGBoost algorithm was the best-performing algorithm. In this prediction model, 15 variables were found to be associated with VTE risk.
In addition to conventional risk factors such as age, hypertension, and diabetes, our study found that CAD, EOSR, HCT, MPV, PCT, P-LCR, UA, ASO, ACPA, RF, and Kellgren–Lawrence grade were also correlated with VTE. These associations have not been reported elsewhere. The present study has certain limitations. First, although ML algorithms are widely used in medical practice, their predictive value is limited by their “black box” character. Thus, rather than being used as a clinical judgment tool, an ML model should be used as a reference for physicians. Second, all the data analyzed in the present study were from a single institution, and the imbalanced sex ratio limits the generalizability of our results. Additionally, because of the observational nature of the study, some unmeasured confounding effects may persist; thus, additional validation and assessment of the relationship between the variables and VTE in OA patients should be performed in a larger population. Nevertheless, despite these limitations, to our knowledge, this is the first study to use a machine learning method to predict VTE risk in OA patients.

5. Conclusions

In conclusion, we developed an XGBoost model with a high accuracy in the prediction of VTE risk in patients with OA, which might serve as a complementary tool for screening populations at high risk of VTE.

References (27 in total; 10 listed)

Review 1.  Machine Learning and Artificial Intelligence: Definitions, Applications, and Future Directions.

Authors:  J Matthew Helm; Andrew M Swiergosz; Heather S Haeberle; Jaret M Karnuta; Jonathan L Schaffer; Viktor E Krebs; Andrew I Spitzer; Prem N Ramkumar
Journal:  Curr Rev Musculoskelet Med       Date:  2020-02

Review 2.  Osteoarthritis: Pathology, Diagnosis, and Treatment Options.

Authors:  Benjamin Abramoff; Franklin E Caldera
Journal:  Med Clin North Am       Date:  2019-12-18       Impact factor: 5.456

3.  Incidence and risk factors for clinically diagnosed knee, hip and hand osteoarthritis: influences of age, gender and osteoarthritis affecting other joints.

Authors:  Daniel Prieto-Alhambra; Andrew Judge; M Kassim Javaid; Cyrus Cooper; Adolfo Diez-Perez; Nigel K Arden
Journal:  Ann Rheum Dis       Date:  2013-06-06       Impact factor: 19.103

4.  Clinical features and myofascial pain syndrome in older adults with knee osteoarthritis by sex and age distribution: A cross-sectional study.

Authors:  Eleuterio A Sánchez-Romero; Daniel Pecos-Martín; Cesar Calvo-Lobo; David García-Jiménez; Victoria Ochoa-Sáez; Verónica Burgos-Caballero; Josué Fernández-Carnero
Journal:  Knee       Date:  2018-12-07       Impact factor: 2.199

5.  Noncardiac vascular disease in rheumatoid arthritis: increase in venous thromboembolic events?

Authors:  A Kirstin Bacani; Sherine E Gabriel; Cynthia S Crowson; John A Heit; Eric L Matteson
Journal:  Arthritis Rheum       Date:  2012-01

6.  Risk of venous thromboembolism in patients with rheumatoid arthritis and association with disease duration and hospitalization.

Authors:  Marie E Holmqvist; Martin Neovius; Jonas Eriksson; Ängla Mantel; Solveig Wållberg-Jonsson; Lennart T H Jacobsson; Johan Askling
Journal:  JAMA       Date:  2012-10-03       Impact factor: 56.272

Review 7.  Machine Learning in Medicine.

Authors:  Rahul C Deo
Journal:  Circulation       Date:  2015-11-17       Impact factor: 29.690

Review 8.  Changes in the osteochondral unit during osteoarthritis: structure, function and cartilage-bone crosstalk.

Authors:  Steven R Goldring; Mary B Goldring
Journal:  Nat Rev Rheumatol       Date:  2016-09-22       Impact factor: 20.543

Review 9.  Relationship between the Gut Microbiome and Osteoarthritis Pain: Review of the Literature.

Authors:  Eleuterio A Sánchez Romero; Erika Meléndez Oliva; José Luis Alonso Pérez; Sebastián Martín Pérez; Silvia Turroni; Lorenzo Marchese; Jorge Hugo Villafañe
Journal:  Nutrients       Date:  2021-02-24       Impact factor: 5.717

10.  2019 American College of Rheumatology/Arthritis Foundation Guideline for the Management of Osteoarthritis of the Hand, Hip, and Knee.

Authors:  Sharon L Kolasinski; Tuhina Neogi; Marc C Hochberg; Carol Oatis; Gordon Guyatt; Joel Block; Leigh Callahan; Cindy Copenhaver; Carole Dodge; David Felson; Kathleen Gellar; William F Harvey; Gillian Hawker; Edward Herzig; C Kent Kwoh; Amanda E Nelson; Jonathan Samuels; Carla Scanzello; Daniel White; Barton Wise; Roy D Altman; Dana DiRenzo; Joann Fontanarosa; Gina Giradi; Mariko Ishimori; Devyani Misra; Amit Aakash Shah; Anna K Shmagel; Louise M Thoma; Marat Turgunbaev; Amy S Turner; James Reston
Journal:  Arthritis Rheumatol       Date:  2020-01-06       Impact factor: 10.995

