Automated Risk Stratification of Hip Osteoarthritis Development in Patients With Femoroacetabular Impingement Using an Unsupervised Clustering Algorithm: A Study From the Rochester Epidemiology Project.

Sunho Ko1, Ayoosh Pareek2, Changwung Jo1, Hyuk-Soo Han1, Myung Chul Lee1, Aaron J Krych2, Du Hyun Ro1.   

Abstract

BACKGROUND: Studies evaluating the natural history of femoroacetabular impingement (FAI) are limited.
PURPOSE: To stratify the risk of progression to osteoarthritis (OA) in patients with FAI using an unsupervised machine-learning algorithm, compare the characteristics of each subgroup, and validate the reproducibility of staging.
STUDY DESIGN: Cohort study (prognosis); Level of evidence, 2.
METHODS: A geographic database from the Rochester Epidemiology Project was used to identify patients with hip pain between 2000 and 2016. Medical charts were reviewed to obtain characteristic information, physical examination findings, and imaging details. The patient data were randomly split into 2 mutually exclusive sets: train set (70%) for model development and test set (30%) for validation. The data were transformed via Uniform Manifold Approximation and Projection and were clustered using Hierarchical Density-based Spatial Clustering of Applications with Noise.
RESULTS: The study included 1071 patients with a mean follow-up period of 24.7 ± 12.5 years. The patients were clustered into 5 subgroups based on train set results: patients in cluster 1 were in their early 20s (20.9 ± 9.6 years), female dominant (84%), with low body mass index (<19 kg/m2); patients in cluster 2 were in their early 20s (22.9 ± 6.7 years), female dominant (95%), and pincer-type FAI (100%) dominant; patients in cluster 3 were in their mid-20s (26.4 ± 9.7 years) and were mixed-type FAI dominant (92%); patients in cluster 4 were in their early 30s (32.7 ± 7.8 years), with high body mass index (≥29 kg/m2) and diabetes (17%); and patients in cluster 5 were in their early 30s (30.0 ± 9.1 years), with a higher percentage of males (43%) compared with the other clusters and with limited internal rotation (14%). Mean survival for clusters 1 to 5 was 17.9 ± 0.6, 18.7 ± 0.3, 17.1 ± 0.4, 15.0 ± 0.5, and 15.6 ± 0.5 years, respectively, in the train set. The survival difference was significant between clusters 1 and 4 (P = .02), 2 and 4 (P < .005), 2 and 5 (P = .01), and 3 and 4 (P < .005) in the train set and between clusters 2 and 5 (P = .03) and 3 and 4 (P = .01) in the test set. Cluster characteristics and prognosis were well reproduced in the test set.
CONCLUSION: Using the clustering algorithm, it was possible to determine the prognosis for OA progression in patients with FAI in the presence of conflicting risk factors acting in combination.
© The Author(s) 2021.

Keywords:  FAI; UMAP; clustering; femoroacetabular impingement; machine learning; osteoarthritis

Year:  2021        PMID: 34778477      PMCID: PMC8573500          DOI: 10.1177/23259671211050613

Source DB:  PubMed          Journal:  Orthop J Sports Med        ISSN: 2325-9671


Femoroacetabular impingement (FAI) involves abnormal contact between the femoral head and the acetabulum, leading to labral damage and cartilaginous injury. It can occur in active young adults with hip pain and functional limitation. The most common structural deformities are an abnormal femoral head with increasing radius (cam-type lesion), general or local anterior overcoverage of the acetabulum (pincer-type lesion), or a combination of the two. Awareness of symptomatic FAI has been increasing: its incidence has been reported as 54.4 per 100,000 person-years and increased consistently between 2000 and 2016. The number of operations, including hip arthroscopy, surgical hip dislocation, and anteverting periacetabular osteotomy, has increased concordantly. Patients with hip FAI are known to be at an increased risk of hip osteoarthritis (OA), yet little is known about the risk factors for hip OA. Recently, Melugin et al conducted a large cohort study of 1104 patients in the Rochester Epidemiology Project (REP) database, analyzing risk factors for hip OA in patients with FAI without prior surgery. The REP is a unique population health resource and collaborative initiative established in 1966 in Olmsted County, Minnesota. The candidate risk factors included characteristic information, radiologic measurements, and physical examination findings. The rate of OA was 13.5%, and total hip arthroplasty (THA) was performed in 4% of patients. Male sex, body mass index (BMI) >29 kg/m2, and increased age were risk factors for OA. Many studies analyze independent risk factors for disease using conventional statistics. However, patients have multiple risk factors, and these risk factors are not independent of one another. Therefore, prediction models based on machine learning have been developed in orthopaedics to calculate the overall effect of these risk factors, classifying patients into high- and low-risk groups.
Certain risk factors are not prominent across the entire patient group but can have a great effect on outcomes in specific subgroups. It is difficult to identify these risk factors with existing conventional models. If patients are clustered based on the topological relationship between variables, the risk stratification and the dominant risk factors in each cluster can be studied. The purpose of this study was to stratify the risk of progression to OA in patients with hip FAI using an unsupervised machine-learning algorithm, compare the characteristics of each subgroup, and validate the reproducibility of staging performed via clustering analysis using a geographic population with long-term follow-up. We hypothesized that (1) risk stratification for progression to OA in patients with hip FAI can be achieved with unsupervised machine learning; (2) there are several subgroups of patients with hip FAI with respect to progression to OA, each with distinct risk factors; and (3) the staging produced by the clustering algorithm is reproducible.

Methods

Study Population

This study was approved by an institutional review board. Patients in the REP who presented to a physician with an International Classification of Diseases, 9th Revision (ICD-9) or 10th Revision (ICD-10), diagnostic code of hip pain, hip impingement, or hip joint disorders between January 2000 and December 2016 were reviewed. The REP is a medical record linkage system representing a population-based data resource combining the medical records of our institution and other community providers in the city of Rochester and in Olmsted County, Minnesota. Patients between the ages of 14 and 50 years were included. For all patients, the time from initial hip pain to follow-up was calculated. The patient data were monitored until December 2019 (minimum follow-up of 3 years). Patients with a history of other hip disorders (avascular necrosis, neuromuscular disorder, trochanteric bursitis, hip fracture, pelvic fracture, and hip dislocation) or a history of hip arthroplasty or hip preservation procedures were excluded. All patients had anteroposterior and at least 1 lateral (cross-table, frog-leg, or 45° Dunn) hip radiograph taken during their initial assessment. All radiographs were reviewed by attending or senior resident-level orthopaedic surgeons. The classification of FAI (cam, pincer, mixed) was based on radiographic criteria proposed by Clohisy et al. The radiographic reviews were evaluated by 2 orthopaedic surgeons to confirm the consistency of radiographic measurement. Clinical charts were reviewed to determine the physical examination findings. Patients with symptoms, clinical signs, and radiographic findings consistent with hip FAI as defined by the Warwick Agreement were included. Patient characteristic factors were identified during chart review. Symptomatic hip OA was defined as symptoms requiring treatment and Tönnis grade 1 or higher on hip radiographs. Symptoms included pain, hip clicking, catching, locking, stiffness, restricted range of motion, or giving way.
The data collection and preprocessing steps were performed as described by Melugin et al.

Statistical Analysis

The statistical analysis was performed using R 4.0 (R Core Team, http://www.R-project.org/) and Python 3.7 (Python Software Foundation, https://www.python.org). A descriptive statistical analysis of characteristic, radiographic, and physical examination features was performed. Chi-square and t tests were used to compare the OA and non-OA groups. One-way analysis of variance was used for multiple-group comparisons. Survival curves were plotted using the Kaplan-Meier method, and the log-rank test was used for comparison. Missing data were replaced by the median value for numerical features and by a constant value for categorical features (0 for binary categorical variables; otherwise, the category closest to the median value of the feature).
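The imputation rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the toy values are hypothetical, missing entries are represented as None, and categorical levels are assumed to be coded as integers.

```python
from statistics import median

def impute_numeric(values):
    """Replace missing (None) entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def impute_binary(values, fill=0):
    """Replace missing entries of a 0/1 feature with a constant (0 here)."""
    return [fill if v is None else v for v in values]

def impute_ordinal(values, categories):
    """Replace missing entries with the category closest to the observed median.

    If two categories are equally close, the one listed first is used
    (a tie-breaking assumption made for this sketch).
    """
    observed = [v for v in values if v is not None]
    m = median(observed)
    fill = min(categories, key=lambda c: abs(c - m))
    return [fill if v is None else v for v in values]

# Toy data (hypothetical, not taken from the study)
age = [21.0, None, 35.0, 28.0]
ir_limited = [1, None, 0, None]
tonnis_grade = [0, 1, None, 1, 2]

age_filled = impute_numeric(age)
ir_filled = impute_binary(ir_limited)
tonnis_filled = impute_ordinal(tonnis_grade, categories=[0, 1, 2, 3])
```

Filling binary features with 0 means that every patient with a missing examination finding is treated as negative for that finding, which matters for features with a large share of missing values.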

Model Creation

Initial feature selection retained only the features that differed significantly between the OA and non-OA groups, so that patients would be clustered on the risk factors for OA progression. The patient data were randomly split into 2 mutually exclusive sets: a train set (70%) for model development and a test set (30%) for validation. The features of the train set and test set were compared to assess the adequacy of the split.
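A random 70/30 split into mutually exclusive sets can be sketched as below. The function name, the seed, and the use of integer patient identifiers are assumptions made for illustration; the rounding rule here is not claimed to reproduce the exact set sizes reported in this study.

```python
import random

def train_test_split(ids, test_fraction=0.3, seed=0):
    """Shuffle the identifiers and cut them into disjoint train and test lists."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical identifiers for a cohort of 1071 patients
patient_ids = list(range(1071))
train_ids, test_ids = train_test_split(patient_ids)
```

After such a split, comparing each feature and the outcome rate between the two lists is what indicates whether the split is adequate.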

Clustering

In unsupervised machine learning, the training data are not labeled, and thus the system learns without any guidance. Clustering can first be used to group patients without any labels (train set); the resulting clusters can then be used to label the test set for validation or to determine which cluster a new patient belongs to. In this study, unsupervised machine-learning algorithms for dimension reduction and clustering were used to stage the risk of progression to OA in patients with hip FAI. Uniform manifold approximation and projection for dimension reduction (UMAP) is a powerful, nonlinear dimension reduction technique. UMAP projects the data into a low-dimensional space while conserving their original topological structure. Hierarchical density-based spatial clustering of applications with noise (HDBSCAN) is an enhanced version of density-based spatial clustering of applications with noise (DBSCAN) that reflects the local density and the hierarchical structure of the data. The data were first projected into 2-dimensional space via UMAP and then clustered by HDBSCAN.
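To make the density-based idea concrete, the sketch below implements plain DBSCAN (not HDBSCAN) on points that are already 2-dimensional, standing in for a UMAP embedding. It is a simplified stand-in rather than the pipeline used in this study: HDBSCAN additionally builds a hierarchy over varying densities, and the UMAP step is omitted. The parameter values and toy coordinates are hypothetical. In practice, the open-source `umap-learn` and `hdbscan` Python packages would typically be used instead.

```python
from math import hypot

NOISE = -1

def dbscan(points, eps, min_samples):
    """Label each 2-D point with a cluster index (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself)
        xi, yi = points[i]
        return [j for j, (x, y) in enumerate(points)
                if hypot(x - xi, y - yi) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # point i is a core point: new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_samples:
                queue.extend(nb)       # core point: keep growing the cluster
    return labels

# Toy 2-D embedding: two dense groups and one isolated point
embedding = [(0, 0), (0, 1), (1, 0), (1, 1),
             (10, 10), (10, 11), (11, 10), (11, 11),
             (5, 50)]
labels = dbscan(embedding, eps=1.5, min_samples=3)
```

The isolated point is labeled as noise, which mirrors the limitation noted later that a patient with unusual characteristics may not belong to any cluster.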

Comparison with Binary-Classification Machine-Learning Model

The clustering-based model was compared with the binary-classification machine-learning model. The binary-classification model was developed according to the gradient boosting machine algorithm. The model classifies patients into high-risk and low-risk groups. For comparison with the clustering technique, the test set of the previous study was clustered and visualized with the algorithm developed in this study.

Results

Overall, 1071 of 1893 patients met the inclusion criteria. Patients who had undergone hip arthroscopy or hip preservation procedures were excluded (n = 208). A flowchart of the process is shown in Figure 1. Mean follow-up time was 24.7 ± 12.5 years. Mean age at the onset of pain was 28.5 ± 9.3 years, and males constituted 29.6% of the cohort. Progression to OA was detected in 149 (13.9%) patients over an average follow-up of 40.1 months. The average follow-up of non-OA patients was 95.4 months.
Figure 1.

The study population. HDBSCAN, hierarchical density-based spatial clustering of applications with noise; UMAP, uniform manifold approximation and projection for dimension reduction.

Of the 37 variables, 14 were initially selected via univariate analysis. The full list of 37 variables can be found in Appendix Table A1. The characteristics of the OA and non-OA groups are listed in Table 1. The follow-up durations of the OA (40.1 ± 52.9 months) and non-OA (95.4 ± 58.9 months) groups were significantly different. The rates of labral tear on magnetic resonance imaging (MRI) in the OA (72%) and non-OA (52%) groups were also significantly different. Because MRI was performed electively, 740 (69%) patients did not have MRI data. Physical examination data also had a large proportion of missing values, ranging from 26% to 33%. There were no significant differences between the train and test sets (Appendix Table A2). Additionally, no difference in survival outcome was detected between the 2 sets (Appendix Figure A1), indicating an excellent split between the train and test datasets.
Table A1

Full List of Variables

Characteristic Variables
Age at onset of pain
Sex
Race
Education
BMI
Smoker
Diabetes
Ehlers-Danlos (preoperative)
Radiographic variables
Cam deformity
Joint space (mm)
LCEA
Tönnis angle
Tönnis grade
Crossover sign
Posterior wall sign
Ischial spine sign
Acetabular protrusio
Os acetabuli
Type of impingement (choice = pincer)
Type of impingement (choice = cam)
Type of impingement (choice = mixed)
MRI labral tear
Physical examination variables
Anterior/groin pain
Posterior/buttocks pain
Lateral pain
Back pain
Clicking/locking/instability
Stinchfield test
Sitting pain
FADIR test
FABER test
Iliopsoas snapping test
TFL snapping test
Athletic pubalgia test
Flexion limited
Flexion pain
Internal rotation limited
Internal rotation pain
External rotation limited
External rotation pain

BMI, body mass index; FABER, flexion, abduction, external rotation; FADIR, flexion, adduction and internal rotation; LCEA, lateral center edge angle; MRI, magnetic resonance imaging; TFL, tensor fasciae latae.

Table 1

Patient Characteristics and Physical Examination and Imaging Findings

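| Variable | All Patients (N = 1071) | OA Group (n = 149) | Non-OA Group (n = 922) | P | Missing |
|---|---|---|---|---|---|
| Follow-up, mo | 87.7 ± 61.2 | 40.1 ± 52.9 | 95.4 ± 58.9 | <.001 | 0 (0) |
| Age at onset of pain, y | 28.9 ± 9.3 | 35.7 ± 5.6 | 27.8 ± 9.3 | <.001 | 0 (0) |
| Male sex | 327 (31) | 66 (44) | 261 (28) | <.001 | 0 (0) |
| BMI | | | | <.001 | 19 (2) |
| <19 | 59 (6) | 2 (1) | 57 (6) | | |
| 19-23 | 354 (34) | 22 (15) | 332 (37) | | |
| 24-28 | 284 (27) | 42 (29) | 242 (27) | | |
| 29-33 | 202 (19) | 39 (27) | 163 (18) | | |
| ≥34 | 153 (15) | 42 (29) | 111 (12) | | |
| Diabetes | 84 (8) | 27 (18) | 57 (6) | <.001 | 1 (0) |
| Pain: anterior/groin | 605 (56) | 106 (71) | 499 (54) | <.001 | 1 (0) |
| Pain: back | 72 (7) | 21 (14) | 51 (6) | <.001 | 1 (0) |
| Flexion limited | 69 (10) | 25 (23) | 44 (7) | <.001 | 351 (33) |
| IR limited | 83 (11) | 31 (29) | 52 (8) | <.001 | 311 (29) |
| ER limited | 45 (6) | 21 (18) | 24 (4) | <.001 | 279 (26) |
| Labral tear on MRI scan | 186 (56) | 49 (72) | 137 (52) | .003 | 740 (69) |
| Tönnis grade | | | | <.001 | 2 (0) |
| 0 | 382 (36) | 23 (15) | 359 (39) | | |
| 1 | 622 (58) | 91 (61) | 531 (58) | | |
| 2 | 59 (6) | 31 (21) | 28 (3) | | |
| 3 | 6 (1) | 4 (3) | 2 (0) | | |
| Type of impingement | | | | .001 | 0 (0) |
| Pincer | 209 (20) | 16 (11) | 193 (21) | | |
| Cam | 106 (10) | 24 (16) | 82 (9) | | |
| Mixed | 755 (70) | 109 (73) | 646 (70) | | |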

Data are reported as mean ± SD or n (%). All variables were statistically significantly different between the study groups (P < .05). BMI, body mass index; ER, external rotation; IR, internal rotation; MRI, magnetic resonance imaging; OA, osteoarthritis.
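As an illustration of the univariate comparison behind Table 1, the sketch below applies a Pearson chi-square test to the male-sex row (66 of 149 in the OA group, 261 of 922 in the non-OA group). It assumes a 2 × 2 test without continuity correction, which the text does not specify, and the function name is hypothetical. For 1 degree of freedom the upper-tail probability of the chi-square distribution equals erfc(sqrt(x / 2)), so only the standard library is needed.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and P value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = erfc(sqrt(chi2 / 2))   # upper tail of chi-square with 1 degree of freedom
    return chi2, p

# Rows: OA, non-OA; columns: male, female (counts from Table 1)
chi2, p = chi_square_2x2(66, 149 - 66, 261, 922 - 261)
```

Under these assumptions the resulting P value falls below .001, in line with the value reported for male sex in Table 1.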

Table A2

Comparison of Train and Test Set Groups

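| Variable | Total (N = 1071) | Train (n = 749) | Test (n = 322) | P (Train vs Test) | Missing |
|---|---|---|---|---|---|
| Progression to OA | 149 (13.9) | 104 (13.9) | 45 (14.0) | .969 | 0 (0) |
| Follow-up, mo | 87.7 ± 61.2 | 87.6 ± 61.1 | 87.8 ± 61.5 | | 0 (0) |
| Age at onset of pain, y | 28.9 ± 9.3 | 28.9 ± 9.4 | 28.8 ± 9.1 | .947 | 0 (0) |
| Male sex | 327 (31) | 214 (29) | 113 (35) | .034 | 0 (0) |
| BMI | | | | .442 | 19 (2) |
| <19 | 59 (6) | 38 (5) | 21 (7) | | |
| 19-23 | 354 (34) | 257 (35) | 97 (30) | | |
| 24-28 | 284 (27) | 190 (26) | 94 (29) | | |
| 29-33 | 202 (19) | 145 (20) | 57 (18) | | |
| ≥34 | 153 (15) | 105 (14) | 48 (15) | | |
| Diabetes | 84 (8) | 64 (9) | 20 (6) | .191 | 1 (0) |
| Pain: anterior/groin | 605 (56) | 415 (55) | 190 (59) | .253 | 1 (0) |
| Pain: back | 72 (7) | 53 (7) | 19 (6) | .489 | 1 (0) |
| Flexion limited | 69 (10) | 42 (8) | 27 (13) | .063 | 351 (33) |
| IR limited | 83 (11) | 55 (10) | 28 (13) | .281 | 311 (29) |
| ER limited | 45 (6) | 30 (5) | 15 (6) | .527 | 279 (26) |
| Labral tear on MRI scan | 186 (56) | 125 (53) | 61 (64) | .085 | 740 (69) |
| Tönnis grade | | | | .902 | 2 (0) |
| 0 | 382 (36) | 264 (35) | 118 (37) | | |
| 1 | 622 (58) | 443 (59) | 179 (56) | | |
| 2 | 59 (6) | 39 (5) | 20 (6) | | |
| 3 | 6 (1) | 3 (0) | 3 (1) | | |
| Type of impingement | | | | .787 | 0 (0) |
| Pincer | 209 (20) | 147 (20) | 65 (20) | | |
| Cam | 106 (10) | 71 (9) | 35 (11) | | |
| Mixed | 755 (70) | 530 (71) | 225 (70) | | |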

Data are reported as n (%) or mean ± SD. BMI, body mass index; ER, external rotation; IR, internal rotation; MRI, magnetic resonance imaging; OA, osteoarthritis.

Figure A1.

Kaplan-Meier curve of train and test sets. Shading represents 95% CI.

Clustering Outcome

Figure 2 shows the clusters of the train and test datasets obtained via UMAP and DBSCAN. The train set was clustered into 5 subgroups (Figure 2A). Figure 2B annotates the patients who progressed to OA in the train set. Figure 2C shows the projection of test patients by the trained model. Figure 2D annotates the patients who progressed to OA in the test set. The distribution of patients and the progression to OA were well reproduced in the test set.
Figure 2.

(A) Clustering of patient data for the train set via UMAP-enhanced DBSCAN. (B) Progression to osteoarthritis (OA) in each cluster (characterized by a different color). The rate of OA progression is clearly distinguished within each cluster. (C) Clustering of patient data for the test set via UMAP-enhanced DBSCAN. (D) Progression to OA in each cluster (characterized by a different color). The test set shows the same distribution as the train set, and the rate of OA progression is also similar. DBSCAN, density-based spatial clustering of applications with noise; UMAP, uniform manifold approximation and projection for dimension reduction.

The characteristics and progression to OA of the clusters in the train set are listed in Table 2 and shown in Figure 3. Cluster 1 had a mean age of 20.9 years and was 16% male; all patients had a BMI of <19 kg/m2, and the mean Tönnis grade was 0.63. Cluster 2 had a mean age of 22.9 years and was 5% male; all patients had a BMI of 19 to 24 kg/m2, and the mean Tönnis grade was 0.74. Cluster 3 had a mean age of 26.4 years and was 29% male; all patients had a BMI of 19 to 24 kg/m2, and the mean Tönnis grade was 0.75. Cluster 4 had a mean age of 32.7 years and was 24% male; 58% of patients had a BMI of 29 to 34 kg/m2 and 42% had a BMI of ≥34 kg/m2, and the mean Tönnis grade was 0.10. Cluster 5 had a mean age of 30.0 years and was 43% male; all patients had a BMI of 24 to 29 kg/m2, and the mean Tönnis grade was 0.81. The corresponding characteristics and progression to OA of the test-set clusters are shown in Table 3 and Figure 4.
Table 2

Characteristics of the Clusters in the Train Set

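| Variable | All Train Set (N = 749) | Cluster 1 (n = 38) | Cluster 2 (n = 65) | Cluster 3 (n = 192) | Cluster 4 (n = 250) | Cluster 5 (n = 204) | P | Missing |
|---|---|---|---|---|---|---|---|---|
| Progression to OA | 104 (14) | 1 (3) | 1 (2) | 16 (8) | 56 (22) | 30 (15) | <.001 | 0 (0) |
| Follow-up, mo | 87.4 ± 61.0 | 82.6 ± 44.9 | 80.4 ± 54.5 | 86.5 ± 61.4 | 94.6 ± 65.3 | 83.4 ± 59.6 | .249 | 0 (0) |
| Age at onset of pain, y | 28.9 ± 9.4 | 20.9 ± 9.6 | 22.9 ± 6.7 | 26.4 ± 9.7 | 32.7 ± 7.8 | 30.0 ± 9.1 | <.001 | 0 (0) |
| Male sex | 214 (29) | 6 (16) | 3 (5) | 56 (29) | 61 (24) | 88 (43) | <.001 | 0 (0) |
| BMI | | | | | | | <.001 | 14 (2) |
| <19 | 38 (5) | 38 (100) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | | |
| 19-23 | 257 (35) | 0 (0) | 65 (100) | 192 (100) | 0 (0) | 0 (0) | | |
| 24-28 | 190 (26) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 190 (100) | | |
| 29-33 | 145 (20) | 0 (0) | 0 (0) | 0 (0) | 145 (58) | 0 (0) | | |
| ≥34 | 105 (14) | 0 (0) | 0 (0) | 0 (0) | 105 (42) | 0 (0) | | |
| Diabetes | 64 (9) | 0 (0) | 1 (2) | 7 (4) | 42 (17) | 14 (7) | <.001 | 1 (0) |
| Pain: anterior/groin | 415 (55) | 19 (50) | 39 (60) | 101 (53) | 139 (56) | 117 (57) | .481 | 0 (0) |
| Pain: back | 53 (7) | 1 (3) | 2 (3) | 5 (3) | 32 (13) | 13 (6) | <.001 | 0 (0) |
| Flexion limited | 42 (8) | 1 (4) | 0 (0) | 6 (5) | 21 (13) | 14 (9) | .016 | 241 (32) |
| IR limited | 55 (10) | 2 (8) | 2 (4) | 8 (5) | 23 (13) | 20 (14) | .041 | 207 (27) |
| ER limited | 30 (5) | 2 (9) | 1 (2) | 5 (3) | 15 (9) | 7 (4) | .124 | 188 (25) |
| Labral tear on MRI scan | 125 (53) | 3 (38) | 9 (45) | 36 (55) | 35 (51) | 42 (58) | .538 | 514 (68) |
| Tönnis grade | | | | | | | <.001 | 0 (0) |
| 0 | 264 (35) | 14 (37) | 58 (89) | 58 (30) | 69 (28) | 65 (32) | | |
| 1 | 443 (59) | 23 (61) | 7 (11) | 123 (64) | 164 (66) | 126 (62) | | |
| 2 | 39 (5) | 1 (3) | 0 (0) | 10 (5) | 17 (7) | 11 (5) | | |
| 3 | 3 (0) | 0 (0) | 0 (0) | 1 (1) | 0 (0) | 2 (1) | | |
| Type of impingement | | | | | | | <.001 | 0 (0) |
| Pincer | 147 (20) | 9 (24) | 65 (100) | 0 (0) | 34 (14) | 39 (19) | | |
| Cam | 71 (10) | 0 (0) | 0 (0) | 16 (8) | 24 (10) | 31 (15) | | |
| Mixed | 530 (71) | 29 (76) | 0 (0) | 176 (92) | 192 (77) | 133 (65) | | |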

Data are reported as n (%) or mean ± SD. Bolded values indicate significant differentiating factors among clusters. BMI, body mass index; ER, external rotation; IR, internal rotation; MRI, magnetic resonance imaging; OA, osteoarthritis.

Figure 3.

Characteristics of clusters in the train set as visualized using bar graphs. BMI, body mass index; OA, osteoarthritis.

Table 3

Characteristics of Clusters in the Test Set

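| Variable | All Test Set (N = 322) | Cluster 1 (n = 21) | Cluster 2 (n = 23) | Cluster 3 (n = 74) | Cluster 4 (n = 105) | Cluster 5 (n = 99) | P | Missing |
|---|---|---|---|---|---|---|---|---|
| Progression to OA | 45 (14) | 1 (5) | 0 (0) | 5 (7) | 25 (24) | 14 (14) | .002 | 0 (0) |
| Follow-up, mo | 87.8 ± 61.5 | 108.1 ± 65.2 | 76.3 ± 54.2 | 88.5 ± 55.8 | 89.1 ± 64.6 | 84 ± 63 | .487 | 0 (0) |
| Age at onset of pain, y | 28.9 ± 9.1 | 21.1 ± 9.9 | 23.1 ± 8.4 | 25.0 ± 9.4 | 32.4 ± 7.5 | 31 ± 8 | <.001 | 0 (0) |
| Male sex | 113 (35) | 6 (29) | 1 (4) | 29 (39) | 38 (36) | 39 (39) | .023 | 0 (0) |
| BMI | | | | | | | <.001 | 5 (2) |
| <19 | 21 (7) | 21 (100) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | | |
| 19-23 | 97 (31) | 0 (0) | 23 (100) | 74 (100) | 0 (0) | 0 (0) | | |
| 24-28 | 94 (30) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 94 (100) | | |
| 29-33 | 57 (18) | 0 (0) | 0 (0) | 0 (0) | 57 (54) | 0 (0) | | |
| ≥34 | 48 (15) | 0 (0) | 0 (0) | 0 (0) | 48 (46) | 0 (0) | | |
| Diabetes | 20 (6) | 0 (0) | 0 (0) | 1 (1) | 16 (15) | 3 (3) | <.001 | 0 (0) |
| Pain: anterior/groin | 190 (59) | 11 (55) | 13 (57) | 40 (54) | 63 (60) | 63 (64) | .765 | 1 (0) |
| Pain: back | 19 (6) | 0 (0) | 0 (0) | 2 (3) | 11 (11) | 6 (6) | .092 | 1 (0) |
| Flexion limited | 27 (13) | 2 (17) | 0 (0) | 6 (12) | 11 (16) | 8 (11) | .615 | 110 (34) |
| IR limited | 28 (13) | 0 (0) | 0 (0) | 6 (11) | 11 (17) | 11 (16) | .202 | 104 (32) |
| ER limited | 15 (7) | 0 (0) | 0 (0) | 1 (2) | 7 (10) | 7 (9) | .205 | 91 (28) |
| Labral tear on MRI scan | 61 (64) | 2 (50) | 1 (50) | 19 (56) | 22 (79) | 17 (61) | .396 | 226 (70) |
| Tönnis grade | | | | | | | <.001 | 2 (1) |
| 0 | 118 (37) | 9 (45) | 21 (91) | 24 (32) | 29 (28) | 35 (36) | | |
| 1 | 179 (56) | 10 (50) | 2 (9) | 47 (64) | 61 (58) | 59 (60) | | |
| 2 | 20 (6) | 1 (5) | 0 (0) | 3 (4) | 14 (13) | 2 (2) | | |
| 3 | 3 (1) | 0 (0) | 0 (0) | 0 (0) | 1 (1) | 2 (2) | | |
| Type of impingement | | | | | | | <.001 | 0 (0) |
| Pincer | 62 (19) | 5 (24) | 23 (100) | 0 (0) | 12 (11) | 22 (22) | | |
| Cam | 35 (11) | 2 (10) | 0 (0) | 7 (10) | 19 (18) | 7 (7) | | |
| Mixed | 225 (70) | 14 (67) | 0 (0) | 67 (91) | 74 (71) | 70 (71) | | |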

Data are reported as n (%) or mean ± SD. Bolded values indicate differentiating factors among clusters. BMI, body mass index; ER, external rotation; IR, internal rotation; MRI, magnetic resonance imaging; OA, osteoarthritis.

Figure 4.

Characteristics of clusters in the test set as visualized using bar graphs. BMI, body mass index; OA, osteoarthritis.

Comparison with Binary Classification Model

Figure 5 shows the difference between a binary-classification model and a clustering-based model. The patient groups based on clustering are clearly distinguished from other groups in terms of the risk or rate of OA progression and represent the group of patients who did not appear in the existing binary-classification, machine-learning model.
Figure 5.

Progression to osteoarthritis (OA) as shown using the binary-classification machine-learning model. (A) Green and yellow represent patients determined by machine-learning models as being at high risk and low risk, respectively. High-risk patients determined by the machine-learning model are distributed across several clusters. (B) OA progression in each cluster. The unique subgroups of patients identified using the clustering algorithm are undetected in the existing binary-classification machine-learning model.

Long-term Prognosis of Clusters

The progression to OA in each group is presented in Figure 6. The survival of each cluster with respect to progression to OA is listed in Table 4. In the train set, the mean survival for clusters 1 to 5 was 17.9 ± 0.6, 18.7 ± 0.3, 17.1 ± 0.4, 15.0 ± 0.5, and 15.6 ± 0.5 years, respectively. According to the log-rank test, there were significant differences in survival between train-set clusters 1 and 4 (P = .02), 2 and 4 (P < .005), 2 and 5 (P = .01), and 3 and 4 (P < .005), and between test-set clusters 2 and 5 (P = .03) and 3 and 4 (P = .01). The long-term prognosis of each group was clearly distinguished in a similar fashion in both the train and test sets.
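The survival proportions at fixed time points, such as those in Table 4, come from a Kaplan-Meier estimate. The sketch below shows the product-limit calculation on toy data; the function names and the follow-up values are hypothetical and are not taken from the study, where an event is progression to OA and patients without progression are censored at last follow-up.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up duration for each patient
    events : 1 if the patient progressed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Gather everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            surv *= 1 - n_events / at_risk   # product-limit step
            curve.append((t, surv))
        at_risk -= n_removed
    return curve

def survival_at(curve, t):
    """Read the step function: survival just after the last event time <= t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s

# Toy follow-up in years: events at 1 y and 3 y, all others censored
times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
s_2y = survival_at(curve, 2)
s_10y = survival_at(curve, 10)
```

Censored patients reduce the number at risk without lowering the curve, which is why clusters with different follow-up lengths can still be compared.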
Figure 6.

Survival functions for the (A) train set and (B) test set. The long-term prognosis of each group was clearly distinguished, in a similar fashion, in the train and test sets.

Table 4

Percentage of Progression to Osteoarthritis (Survivals) of Clusters for the Train Set and Test Set

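| Train Set | Cluster 1 (n = 38) | Cluster 2 (n = 65) | Cluster 3 (n = 192) | Cluster 4 (n = 250) | Cluster 5 (n = 204) |
|---|---|---|---|---|---|
| 2 y | 1.000 | 0.985 | 0.932 | 0.883 | 0.901 |
| 5 y | 1.000 | 0.985 | 0.919 | 0.852 | 0.874 |
| 10 y | 0.954 | 0.985 | 0.904 | 0.751 | 0.862 |
| 15 y | 0.954 | 0.985 | 0.904 | 0.723 | 0.771 |

| Test Set | Cluster 1 (n = 21) | Cluster 2 (n = 23) | Cluster 3 (n = 74) | Cluster 4 (n = 105) | Cluster 5 (n = 99) |
|---|---|---|---|---|---|
| 2 y | 0.952 | 1.000 | 0.972 | 0.884 | 0.959 |
| 5 y | 0.952 | 1.000 | 0.938 | 0.817 | 0.891 |
| 10 y | 0.952 | 1.000 | 0.909 | 0.725 | 0.820 |
| 15 y | 0.952 | 1.000 | 0.909 | 0.725 | 0.713 |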

Discussion

Using dimension reduction and clustering algorithms, patients with hip FAI were separated into 5 clusters. The characteristics and survival of each cluster were evaluated, and each cluster had a different risk of OA progression.

Cluster Characteristics

Table 2 clearly shows the differences in BMI and type of impingement between clusters in the train set. The best prognosis among the 5 clusters was seen in cluster 2, characterized by age in the early 20s (22.9 ± 6.7 years), female dominance (95%), and pincer-type FAI dominance (100%), with a mean survival of 18.7 ± 0.3 years. On the other hand, the worst prognosis was seen in cluster 4, characterized by age in the early 30s (32.7 ± 7.8 years), high BMI (≥29 kg/m2), and diabetes (17%), with a mean survival of 15.0 ± 0.5 years. Patients in cluster 3 were in their mid-20s (26.4 ± 9.7 years) and were mixed-type FAI dominant (92%), with a mean survival of 17.1 ± 0.4 years. Patients in cluster 1 were characterized by age in the early 20s (20.9 ± 9.6 years), female dominance (84%), and low BMI (<19 kg/m2), with a mean survival of 17.9 ± 0.6 years. Cluster 5 patients were in their early 30s (30.0 ± 9.1 years), with a higher percentage of males (43%) than the other clusters, limited internal rotation (14%), and a mean survival of 15.6 ± 0.5 years. The relationship between the characteristics of each cluster and its prognosis can be explained by the results of former studies. BMI >29 kg/m2, increased age, and male sex have all been identified as risk factors for OA progression. The difference in the type of impingement between clusters may originate in the sex difference. Pincer type is predominant in females, while cam-type lesions are dominant in young male athletes. This can be seen in cluster 2, which was female dominant (95%) and pincer-type dominant (100%), whereas cluster 3 had a higher percentage of males (29%) than cluster 2 and consisted of cam-type (8%) and mixed-type (92%) impingement. There was no significant difference between clusters in MRI labral tear or physical examination data, despite their clinical importance. MRI labral tear and physical examination data are binary categorical features with a large proportion of missing values. For binary categorical features, missing values can be imputed only with a constant value (usually, and in this case, zero), so that only a small portion of all patients have positive values. Because the clustering algorithm divides subgroups based on features that group the whole patient population, the influence of features with only a small percentage of positive values inevitably decreases in clustering.

Modeling Strengths

Classification models that predict high-risk groups and survival regression models such as random survival forest have mainly been used in prognostic models based on supervised machine-learning algorithms. The classification model allows clear interpretation of the result: risk status, sensitivity, and specificity data provide clear insight into patient status. However, follow-up time is not considered when modeling long-term follow-up data. In the case of a survival regression model, a quantified survival curve can be obtained, but the c-index, a widely used evaluation metric, evaluates the order of events (progression to OA), not the exact survival time. The advantage of the clustering technique is that there is no need to consider follow-up time, because it does not require an outcome variable in the modeling process, and the interpretation of results is also intuitive and clear.
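The point that the c-index reflects only the ordering of events can be seen in a small sketch of Harrell's concordance index. The function name and the toy values are hypothetical; ties in risk score are counted as half-concordant, which is one common convention.

```python
def concordance_index(times, events, risk):
    """Harrell's c-index: share of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when patient i had the event and did so
    before patient j's follow-up time; it is concordant when the model
    gave patient i the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy data: two events followed by one censored patient
times = [2, 5, 8]
events = [1, 1, 0]

c_a = concordance_index(times, events, risk=[0.9, 0.5, 0.1])
c_b = concordance_index(times, events, risk=[30, 20, 10])          # same ranking
c_c = concordance_index([20, 50, 80], events, risk=[0.9, 0.5, 0.1])  # rescaled times
c_rev = concordance_index(times, events, risk=[0.1, 0.5, 0.9])     # reversed ranking
```

Rescaling the risk scores or the follow-up times leaves the index unchanged as long as the ordering is preserved, so a high c-index does not imply accurate predicted survival times.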

Limitations

There are limitations in the patient population. Although the mean age of the group was young, patients up to 55 years old were included in the study. The risk of OA is expected to increase with age. The patients were from a single location, and the dominant race was White. Additional multiregional studies are needed to determine whether this model can be generalized to patients in other regions. There may be disagreement on the definition of symptomatic hip OA. In this study, symptomatic hip OA was defined as symptoms requiring treatment and Tönnis grade ≥1 on hip radiographs. Some may argue that including Tönnis grade 1 is too stringent in determining significant OA. The indication for THA was not standardized in all patients, so this group was excluded and not analyzed separately. There are also limitations in the modeling algorithm. Because the clustering method does not provide model interpretability (black box), the modeling process must be inferred from the characteristics of each cluster. Therefore, additional statistical analysis is required to determine the effect of specific factors. The model may not provide cluster information for some patients. Although all patients in the test set were successfully classified into the 5 clusters, a patient from a different cohort can have different characteristics and may not belong to any cluster. UMAP projects the data based on the topological relationship between input data, which means that causal relationships between the features are not considered. Therefore, domain knowledge is required to interpret the causal relationship between the features. Application of the clustering algorithm to other patient groups cannot stratify prognosis clearly if the risk factors in each cluster shift favorably or unfavorably. However, this also has clinical implications in that it is possible to determine the prognosis in the presence of conflicting risk factors acting in combination in actual patients. Therefore, the clustering algorithm is of high clinical value.

Conclusion

The candidate risk factors for OA progression in patients with FAI were selected; then, an unsupervised machine-learning algorithm was applied to stratify the risk of OA progression. The clusters were characterized by BMI, type of impingement, and sex, which have been identified as independent risk factors for OA progression by conventional statistics.

References

1.  Predictors of progression of osteoarthritis in femoroacetabular impingement: a radiological study with a minimum of ten years follow-up.

Authors:  N V Bardakos; R N Villar
Journal:  J Bone Joint Surg Br       Date:  2009-02

2.  Transfusion after total knee arthroplasty can be predicted using the machine learning algorithm.

Authors:  Changwung Jo; Sunho Ko; Woo Cheol Shin; Hyuk-Soo Han; Myung Chul Lee; Taehoon Ko; Du Hyun Ro
Journal:  Knee Surg Sports Traumatol Arthrosc       Date:  2019-06-28       Impact factor: 4.342

3.  History of the Rochester Epidemiology Project.

Authors:  L J Melton
Journal:  Mayo Clin Proc       Date:  1996-03       Impact factor: 7.616

4.  Hip morphology influences the pattern of damage to the acetabular cartilage: femoroacetabular impingement as a cause of early osteoarthritis of the hip.

Authors:  M Beck; M Kalhor; M Leunig; R Ganz
Journal:  J Bone Joint Surg Br       Date:  2005-07

5.  Prevalence of cam-type femoroacetabular impingement morphology in asymptomatic volunteers.

Authors:  Kalesha Hack; Gina Di Primio; Kawan Rakhra; Paul E Beaulé
Journal:  J Bone Joint Surg Am       Date:  2010-10-20       Impact factor: 5.284

6.  The Warwick Agreement on femoroacetabular impingement syndrome (FAI syndrome): an international consensus statement.

Authors:  D R Griffin; E J Dickenson; J O'Donnell; R Agricola; T Awan; M Beck; J C Clohisy; H P Dijkstra; E Falvey; M Gimpel; R S Hinman; P Hölmich; A Kassarjian; H D Martin; R Martin; R C Mather; M J Philippon; M P Reiman; A Takla; K Thorborg; S Walker; A Weir; K L Bennell
Journal:  Br J Sports Med       Date:  2016-10       Impact factor: 13.800

7.  Prevalence of associated deformities and hip pain in patients with cam-type femoroacetabular impingement.

Authors:  D Allen; P E Beaulé; O Ramadan; S Doucette
Journal:  J Bone Joint Surg Br       Date:  2009-05

Review 8.  Femoroacetabular impingement: a cause for osteoarthritis of the hip.

Authors:  Reinhold Ganz; Javad Parvizi; Martin Beck; Michael Leunig; Hubert Nötzli; Klaus A Siebenrock
Journal:  Clin Orthop Relat Res       Date:  2003-12       Impact factor: 4.176

9.  Deep learning-based survival prediction of oral cancer patients.

Authors:  Dong Wook Kim; Sanghoon Lee; Sunmo Kwon; Woong Nam; In-Ho Cha; Hyung Jun Kim
Journal:  Sci Rep       Date:  2019-05-06       Impact factor: 4.379

10.  Maximum lifetime body mass index is the appropriate predictor of knee and hip osteoarthritis.

Authors:  Sabine Patricia Singer; Dietmar Dammerer; Martin Krismer; Michael C Liebensteiner
Journal:  Arch Orthop Trauma Surg       Date:  2017-10-27       Impact factor: 3.067
