Christos Kokkotis1,2, Serafeim Moustakidis3, Vasilios Baltzopoulos4, Giannis Giakas2, Dimitrios Tsaopoulos1.
Abstract
Knee osteoarthritis (KOA) is a multifactorial disease that is responsible for more than 80% of the total burden of osteoarthritis. KOA is heterogeneous in terms of rates of progression, with several different phenotypes and a large number of risk factors that often interact with each other. A number of modifiable and non-modifiable systemic and mechanical parameters, along with comorbidities and pain-related factors, contribute to the development of KOA. Although models exist to predict the onset of the disease or to discriminate between asymptomatic and OA patients, only a few studies in the recent literature have focused on identifying risk factors associated with KOA progression. This paper contributes to the identification of risk factors for KOA progression via a robust feature selection (FS) methodology that overcomes two crucial challenges: (i) the high dimensionality and heterogeneity of the available data, which are obtained from the Osteoarthritis Initiative (OAI) database, and (ii) a severe class imbalance problem posed by the fact that the KOA progressors class is significantly smaller than the non-progressors class. The proposed feature selection methodology relies on a combination of evolutionary algorithms and machine learning (ML) models, leading to the selection of a relatively small feature subset of 35 risk factors that generalizes well on the whole dataset (mean accuracy of 71.25%). We investigated the effectiveness of the proposed approach in a comparative analysis with well-known FS techniques with respect to metrics related to both prediction accuracy and generalization capability. The impact of the selected risk factors on the prediction output was further investigated using SHapley Additive exPlanations (SHAP).
The proposed FS methodology may contribute to the development of new, efficient risk stratification strategies and identification of risk phenotypes of each KOA patient to enable appropriate interventions.
Keywords: explainability; feature selection; genetic algorithm; knee osteoarthritis prediction; machine learning
Year: 2021 PMID: 33804560 PMCID: PMC8000487 DOI: 10.3390/healthcare9030260
Source DB: PubMed Journal: Healthcare (Basel) ISSN: 2227-9032
Main categories of the feature subsets considered in this paper. A brief description is given along with the number of features considered per category and for each of the two visits.
| Category | Description | Number of Features from Baseline | Number of Features from Visit 1 |
|---|---|---|---|
| Subject characteristics | Includes anthropometric parameters (Body mass index (BMI), height, etc.) | 36 | 9 |
| Symptoms | Questionnaire data regarding arthritis symptoms and general arthritis or health-related function and disability | 120 | 80 |
| Behavioral | Includes variables of participants’ quality level of daily routine and social behavior | 61 | 43 |
| Medical history | Questionnaire results regarding a participant’s arthritis-related and general health histories and medications | 123 | 51 (only medications) |
| Medical imaging outcome | Medical imaging outcomes (e.g., joint space narrowing and osteophytes) | 21 | - |
| Nutrition | Block Food Frequency questionnaire | 224 | - |
| Physical activity | Questionnaire data regarding leisure activities, etc. | 24 | 24 |
| Physical exam | Participants’ measurements, including knee and hand exams, walking tests and other performance measures | 115 | 26 |
| Subtotal per visit | | 724 | 233 |
| Total number of features | | 957 (both visits combined) | |
Figure 1. Stratification of the patients in our study and formulation of the training dataset. Inclusion/exclusion criteria are presented along with the definition of the two data classes (knee osteoarthritis (KOA) progressors and non-progressors).
Figure 2. The proposed GenWrapper feature selection (FS) methodology that includes all the involved processing steps: (i) generation of the initial population; (ii) fitness measurement approach; (iii) stopping criterion; (iv) evolution mechanisms and (v) final feature ranking after the termination of the genetic algorithm (GA).
Figure 3. Definition of genes, chromosomes and population.
Figure 4. Proposed mechanism for estimating the fitness of each chromosome within a generation.
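The five steps listed in the Figure 2 caption can be illustrated with a minimal genetic-algorithm sketch. This is not the authors' GenWrapper implementation: the function `run_ga`, the truncation selection, the single-point crossover, the frequency-based final ranking and the toy fitness function are all assumptions made for illustration. In the paper the fitness of a chromosome is estimated with a classifier (Figure 4); here a stand-in fitness simply rewards a hypothetical set of informative features, and the stopping criterion is reduced to a fixed number of generations.

```python
import random

def run_ga(n_features, fitness, pop_size=50, generations=100,
           mutation_rate=0.1, crossover_fraction=0.8, elite_count=5, seed=0):
    """Evolve binary chromosomes; gene i == 1 means feature i is selected."""
    rng = random.Random(seed)
    # (i) Initial population: random bit strings, one gene per feature.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    n_cross = round(crossover_fraction * (pop_size - elite_count))
    best_history = []
    # (iii) Stopping criterion: fixed generation budget only (simplification).
    for _ in range(generations):
        # (ii) Fitness measurement; higher is better in this sketch.
        ranked = sorted(population, key=fitness, reverse=True)
        best_history.append(fitness(ranked[0]))
        # (iv) Evolution: elites survive unchanged.
        next_gen = [chrom[:] for chrom in ranked[:elite_count]]
        parents = ranked[:pop_size // 2]  # truncation selection (assumption)
        while len(next_gen) < elite_count + n_cross:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # single-point crossover
            next_gen.append(a[:cut] + b[cut:])
        while len(next_gen) < pop_size:
            parent = rng.choice(parents)
            # Flip each gene independently with probability mutation_rate.
            next_gen.append([g ^ 1 if rng.random() < mutation_rate else g
                             for g in parent])
        population = next_gen
    # (v) Final ranking: how often each feature is selected in the final
    # population (an assumed ranking rule, not taken from the paper).
    frequency = [sum(chrom[i] for chrom in population)
                 for i in range(n_features)]
    ranking = sorted(range(n_features), key=lambda i: frequency[i],
                     reverse=True)
    return ranking, best_history

# Toy stand-in for the classifier-based fitness: reward hypothetical
# informative features and lightly penalize subset size.
informative = {0, 3, 7}

def toy_fitness(chrom):
    hits = sum(chrom[i] for i in informative)
    return hits - 0.1 * sum(chrom)

ranking, history = run_ga(12, toy_fitness, pop_size=20, generations=30)
```

Because the elite chromosomes are copied unchanged, the best fitness recorded per generation never decreases in this sketch, which mirrors the monotone "best fitness" curve one would expect from an elitist GA.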
Hyperparameters of the optimized GenWrapper algorithm. A brief description of each hyperparameter is provided along with the finally selected value.
| Parameter | Description | Selected Value |
|---|---|---|
| Population size | Number of individual solutions in the population | 50 |
| Number of generations | Maximum number of generations before the algorithm halts | 100 |
| Mutation rate | Probability rate of being mutated | 0.1 |
| Crossover Fraction | The fraction of the population at the next generation, not including elite children, that the crossover function creates. | 0.8 |
| Elite Count | Positive integer specifying how many individuals in the current generation are guaranteed to survive into the next generation | 5 |
| StallGenLimit | The algorithm stops if the weighted average change in the fitness function value over StallGenLimit generations is less than Function tolerance | 50 |
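| Function tolerance | Threshold on the weighted average change in the fitness function value used by the StallGenLimit stopping criterion | 1 × 10⁻³ |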
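To make the interplay of these hyperparameters concrete, the sketch below works out how a generation of 50 individuals would be composed under the table's description of Crossover Fraction and Elite Count. The class `GAConfig` and both helper functions are hypothetical names; treating all remaining individuals as mutation children is an assumption based on the usual GA convention, and `has_stalled` is a simplified, unweighted version of the stall test rather than the weighted average described in the table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GAConfig:
    # Values copied from the hyperparameter table.
    population_size: int = 50
    max_generations: int = 100
    mutation_rate: float = 0.1
    crossover_fraction: float = 0.8
    elite_count: int = 5
    stall_gen_limit: int = 50
    function_tolerance: float = 1e-3

def next_generation_composition(cfg):
    """Return (elite, crossover, mutation) child counts for one generation."""
    non_elite = cfg.population_size - cfg.elite_count
    # Crossover fraction applies to the non-elite part of the population.
    crossover_children = round(cfg.crossover_fraction * non_elite)
    # Assumption: every remaining slot is filled by a mutation child.
    mutation_children = non_elite - crossover_children
    return cfg.elite_count, crossover_children, mutation_children

def has_stalled(best_fitness_history, cfg):
    """Simplified stall test: unweighted average change per generation
    over the last stall_gen_limit generations below the tolerance."""
    window = best_fitness_history[-(cfg.stall_gen_limit + 1):]
    if len(window) <= cfg.stall_gen_limit:
        return False  # not enough generations observed yet
    avg_change = abs(window[-1] - window[0]) / cfg.stall_gen_limit
    return avg_change < cfg.function_tolerance

cfg = GAConfig()
elite, crossover, mutation = next_generation_composition(cfg)
```

With the selected values, 45 of the 50 individuals are non-elite, so 0.8 × 45 = 36 would come from crossover and the remaining 9 from mutation.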
Figure 5. Fitness with respect to number of generations for GenWrapper. The black and blue dashed lines show the best and the mean fitness achieved at each generation, respectively.
Figure 6. Feature ranking produced by the proposed FS (the dashed line indicates the number of features that were finally selected).
Comparative analysis with respect to the final selection of features: proposed feature ranking versus the feature subset of the best individual solution in the final generation.
| FS Criterion | Average (10FCV accuracy, performed 10 times) | Min | Max | Std | No. of Features |
|---|---|---|---|---|---|
| Feature subset extracted from the “best” individual solution of the final generation | 70.10% | 67.59% | 72.04% | 1.13% | 42 |
| Proposed feature ranking | 71.25% | 69.22% | 73.33% | 1.57% | 35 |
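The summary columns of this table (average, min, max and standard deviation over ten repetitions of 10-fold cross-validation) can be produced along the following lines. This is a sketch under stated assumptions: `repeated_cv_summary`, `k_fold_indices` and the `train_and_score` callback are hypothetical names, the folds are not stratified, the statistics are taken over the per-repetition mean accuracies, and the sample standard deviation is used; the source does not specify these details. The dummy scorer only stands in for training and evaluating a real classifier.

```python
import random
import statistics

def k_fold_indices(n_samples, k, rng):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]

def repeated_cv_summary(n_samples, train_and_score, k=10, repeats=10, seed=0):
    """Run k-fold CV `repeats` times and summarize the per-run mean scores."""
    rng = random.Random(seed)
    run_means = []
    for _ in range(repeats):
        folds = k_fold_indices(n_samples, k, rng)
        scores = []
        for i, test_idx in enumerate(folds):
            # All folds except fold i form the training set.
            train_idx = [j for f, fold in enumerate(folds) if f != i
                         for j in fold]
            scores.append(train_and_score(train_idx, test_idx))
        run_means.append(statistics.mean(scores))
    return {
        "average": statistics.mean(run_means),
        "min": min(run_means),
        "max": max(run_means),
        "std": statistics.stdev(run_means),  # sample std (assumption)
    }

def dummy_scorer(train_idx, test_idx):
    # Placeholder for "fit a classifier on train_idx, return accuracy on
    # test_idx"; it returns the share of even indices in the test fold.
    return sum(1 for j in test_idx if j % 2 == 0) / len(test_idx)

summary = repeated_cv_summary(200, dummy_scorer)
```

In practice `train_and_score` would wrap the chosen classifier (an SVM in the paper's comparison) restricted to the feature subset being evaluated.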
Characteristics of the 35 most informative risk factors as selected by the proposed GenWrapper.
| Selected Features | Feature Category | Description |
|---|---|---|
| P01BMI, P01HEIGHT | Subject characteristics | Anthropometric parameters including height and BMI |
| KSXRKN1, V00WOMSTFR, KPLKN1, V00WPLKN2, DIRKN16, V00KOOSYML, V00INCOME | Symptoms | Symptoms related to pain, swelling, stiffness and knee difficulty |
| V00EDCV, V00KQOL4, V00KQOL2, V00CESD9, CEMPLOY | Behavioral | Participants’ quality level of daily routine and social behavior and social status |
| V00RXCHOND, V00RA, V00CHNFQCV | Medical history | Questionnaire data regarding a participant’s general health histories and medications |
| P01SVLKOST | Medical imaging outcome | Medical imaging outcomes (e.g., osteophytes) |
| V00SUPCA, V00FFQ59, V00FFQSZ13, V00FFQ33, V00SUPB2, V00FFQ12, V00SUPFOL, V00FFQ19 | Nutrition | Block Food Frequency questionnaire for daily average, how much each time or for past 12 months |
| PASE2, PASE6, V00PA130CV | Physical activity | Questionnaire results regarding activities during typical week or past 7 days |
| RKALNMT, V00lfmaxf, V00rfTHPL, V00lfTHPL, STEPST1, V00rkdefcv | Physical exam | Physical measurements of participants, including tests and other performance measures |
Figure 7. Accuracy (mean 10-fold cross-validation (10FCV)) with respect to selected features (curves): GenWrapper versus a classical wrapper using two classifiers (support vector machine (SVM) and logistic regression (LR)).
Figure 8. Accuracy (mean 10FCV) with respect to selected features: GenWrapper versus the remaining competing FS techniques. SVM was used for the classification task for all eight FS techniques.
Best performance (mean 10FCV) achieved by all competing FS techniques employing SVM along with the number of selected features in which this accuracy was accomplished.
| Approach | Best Accuracy (Mean 10FCV, %) | Number of Features | Statistical Comparison * | Execution Time (s) ** |
|---|---|---|---|---|
| GenWrapper | 71.25 | 35 | - | 311.6 |
| Wrapper | 69.79 | 31 | | 10.2 |
| CFS | 61.97 | 69 | | 0.1 |
| ILFS | 63.63 | 82 | | 0.5 |
| Inf-FS | 63.32 | 35 | | 0.1 |
| Lasso | 64.41 | 94 | | 21.2 |
| mRMR | 67.29 | 36 | | 2.3 |
| Hybrid | 67.85 | 41 | | 15.5 |
| PCA | 65.11 | 29 | | <0.1 |
* Statistical comparison with the proposed GenWrapper. ** All the algorithms were executed on an Intel Core i7-7500 processor, 2.70 GHz CPU (16 GB RAM) using MATLAB 2020b.
Figure 9. Bar graph comparison for the best models (SVMs trained on the optimum number of selected features per case). Red lines correspond to the mean 10FCV, blue boxes visualize the standard deviation of the obtained accuracies, dashed black lines show the min–max range and the red crosses depict outliers (if any).
Figure 10. (a) The SHAP summary plot and (b) the SHAP feature importance for the SVM trained on the features selected by the proposed GenWrapper.
Selected features that led to the best overall KOA prediction performance in our study. The features have been ranked according to their impact on the classification result as calculated by SHapley Additive exPlanations (SHAP).
| Selected Features | Description | Feature Category |
|---|---|---|
| P01SVLKOST | Left knee baseline X-ray: evidence of knee osteophytes | Medical imaging outcome |
| P01BMI | Body mass index | Subject characteristics |
| V00SUPCA | Block Brief 2000: average daily nutrients from vitamin supplements, calcium (mg) | Nutrition |
| V00EDCV | Highest grade or year of school completed | Behavioral |
| V00FFQ59 | Block Brief 2000: ice cream/frozen yogurt/ice cream bars, eat how often, past 12 months | Nutrition |
| V00KQOL2 | Quality of life: modified lifestyle to avoid potentially damaging activities to knee(s) | Behavioral |
| V00CHNFQCV | Chondroitin sulfate frequency of use, past 6 months | Medical history |
| V00WOMSTFR | Right knee: WOMAC Stiffness Score | Symptoms |
| V00FFQSZ13 | Block Brief 2000: french fries/fried potatoes/hash browns, how much each time | Nutrition |
| V00KQOL4 | Quality of life: in general, how much difficulty have with knee(s) | Behavioral |
| P01HEIGHT | Average height (mm) | Subject characteristics |
| V00lfTHPL | Left Flexion MAX Force High Production Limit | Physical exam |
| V00rkdefcv | Right knee exam: alignment varus or valgus | Physical exam |
| V00FFQ19 | Block Brief 2000: green beans/green peas, eat how often, past 12 months | Nutrition |
| V00FFQ33 | Block Brief 2000: beef steaks/roasts/pot roast (including in frozen dinners/sandwiches), eat how often, past 12 months | Nutrition |
| KPLKN1 | Left knee pain: twisting/pivoting on knee, last 7 days | Symptoms |
| PASE2 | Leisure activities: walking, past 7 days | Physical activity |
| V00INCOME | Yearly income | Behavioral |
| V00PA130CV | How often climb up total of 10 or more flights of stairs during typical week, past 30 days | Physical activity |
| V00CESD9 | How often thought my life had been a failure, past week | Behavioral |
| PASE6 | Leisure activities: muscle strength/endurance, past 7 days | Physical activity |
| DIRKN16 | Right knee difficulty: heavy chores, last 7 days | Symptoms |
| V00SUPB2 | Block Brief 2000: average daily nutrients from vitamin supplements, B2 (mg) | Nutrition |
| STEPST1 | 20-meter walk: trial 1 number of steps | Physical exam |
| V00FFQ12 | Block Brief 2000: any other fruit (e.g., grapes/melon/strawberries/peaches), eat how often, past 12 months | Nutrition |
| KSXRKN1 | Right knee symptoms: swelling, last 7 days | Symptoms |
| V00lfmaxf | Left Flexion MAX Force | Physical exam |
| V00rfTHPL | Right Flexion MAX Force High Production Limit | Physical exam |
| RKALNMT | Right knee exam: alignment, degrees (valgus negative) | Physical exam |
| CEMPLOY | Current employment | Behavioral |
| V00KOOSYML | Left knee: KOOS Symptoms Score | Symptoms |
| V00WPLKN2 | Left knee pain: stairs, last 7 days | Symptoms |
| V00RA | Charlson Comorbidity: have rheumatoid arthritis | Medical history |
| V00SUPFOL | Block Brief 2000: average daily nutrients from vitamin supplements, folate (mcg) | Nutrition |
| V00RXCHOND | Rx Chondroitin sulfate use indicator | Medical history |
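A ranking of this kind is commonly derived by averaging the absolute SHAP value of each feature over all samples and sorting in descending order. The sketch below assumes that convention; the source does not spell out the exact aggregation. The function name `rank_by_mean_abs_shap`, the placeholder feature names and the numbers in `toy_shap` are purely illustrative and are not results from the paper.

```python
def rank_by_mean_abs_shap(shap_values, feature_names):
    """Rank features by mean absolute SHAP value across samples.

    shap_values: list of rows, one per sample, one value per feature.
    Returns a list of (feature_name, importance), most important first.
    """
    n_samples = len(shap_values)
    importance = {
        name: sum(abs(row[j]) for row in shap_values) / n_samples
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)

# Illustrative values only (3 samples x 3 placeholder features).
toy_shap = [
    [0.30, -0.05, 0.10],
    [-0.20, 0.02, 0.15],
    [0.25, -0.01, -0.05],
]
names = ["feature_a", "feature_b", "feature_c"]
ranked = rank_by_mean_abs_shap(toy_shap, names)
```

Taking absolute values matters here: a feature that pushes predictions strongly in both directions would otherwise average out to a misleadingly small importance.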