Classifying patients with lumbar disc herniation and exploring the most effective risk factors for this disease.

Samira Jafari, Tania Dehesh, Farhad Iranmanesh

Abstract

Objectives: To classify patients suffering from low back pain (LBP) into two groups, patients with lumbar disc herniation (LDH) and patients without it, based on simple questions and without a magnetic resonance imaging (MRI) procedure, and to identify the most influential risk factors of LDH.
Methods: Four hundred patients aged over 18 years suffering from LBP for over 6 months were randomized into two groups in this cross-sectional study. The data were gathered at Besat clinic, in Kerman, southeast of Iran. Twelve dichotomous questions covering the main LDH risk factors were asked. Three statistical classification methods, K-nearest neighbors (KNN), support vector machine (SVM), and logistic regression (LR), were applied. LR was also used to identify the most important risk factors of LDH.
Results: The SVM method was the most efficient at small sample sizes, whereas the KNN method gave the best classification as the sample size increased. The LR model was the least efficient of the three. Drug use increased the odds of LDH more than 7-fold (OR=7.249), and the odds of LDH among people with an associated illness were 4.847 times those of people without one. Hookah use more than doubled the odds of LDH (OR=2.401), and the odds of LDH among smokers were nearly four times those of nonsmokers (OR=3.877).
Conclusion: The statistical classification methods had acceptable precision for diagnosing LDH patients. It is suggested that neurologists become more familiar with these methods and use them before prescribing MRI to reduce the unnecessary burden on health services. Addiction to drugs, cigarettes, and hookah was the main risk factor for lumbar disc herniation.

Keywords:  K nearest neighbors; classification; logistic regression; lumbar disc herniation; support vector machine

Year:  2019        PMID: 31114300      PMCID: PMC6489673          DOI: 10.2147/JPR.S189927

Source DB:  PubMed          Journal:  J Pain Res        ISSN: 1178-7090            Impact factor:   3.133


Introduction

Low back pain (LBP) is a common chronic disorder whose prevalence has increased in recent decades.1 There is no exact agreement on the definition of LBP in most of the literature.2 The condition can vary from a permanent ache to a sudden, severe pain in the patient’s back.3 It can be classified by the duration of the pain; chronic LBP lasts between 6 and 12 months.4 The reported complications of LBP are extensive. In some cases, it causes severe pain that leads to immobility and weakness and lowers the quality of life.5 LBP disrupts the daily activities of more than 80% of people around the world.6 Its annual prevalence is reported to be between 15% and 45%, and it affects 30% of people on average.7 LBP is most often experienced between the ages of 30 and 50.8 One of the most prevalent causes of LBP is lumbar disc herniation (LDH), a musculoskeletal disease.4 The most important causes of LDH are genetic, biochemical, traumatic, and psychosocial. In most cases, the disease is inherited from parents.9 The usual procedure for diagnosing LDH is to identify the herniated discs by magnetic resonance imaging (MRI).6 MRI is a medical procedure that is often used to assess people with spinal problems, and it places a heavy financial burden on patients and health-care services.10 The increased prevalence of LBP creates an economic burden on society and health services because physicians usually refer all patients with backache to an MRI center. Despite the considerable benefits of MRI in identifying the main cause of LBP, since it visualizes spinal discs, ligaments, vertebral bodies, vascular structures, muscle tissues, disc degeneration, and lumbar spinal canal stenosis,11 it may be possible to diagnose back pain due to LDH in the early stages with procedures other than MRI.
Many patients with LBP need conservative treatment first, and MRI could be the next step if the pain worsens. Classification methods, which assign observations to similar classes, have a long history in statistics. Logistic regression (LR), the classic classification method, is simple to interpret but is less suited to complicated data.12 Recently, machine learning methods, which identify the existing classes automatically, were introduced to the classification field.13 The machine learning methods most frequently used to classify observations are K-nearest neighbors (KNN) and support vector machine (SVM).14 Unlike the traditional classification method, LR, the machine learning methods are not restricted to a linear combination of predictor variables when finding the best decision boundary. LR maximizes the likelihood function to obtain the best coefficients and predictions. SVM tries to find the separating hyperplane with the largest distance to the closest points (the support vectors). If the cases cannot be separated by a linear function, SVM can project them into a higher-dimensional space with kernel functions and separate them there. KNN also classifies cases based on distances rather than a likelihood function; in fact, it makes no assumption about the probability density function of the outcome.15 SVM and KNN are among the newest classification methods, and LR is the best known.16 Many of these classification methods can classify patients into two groups with high precision. They rely on simple information obtained by asking patients simple questions at no cost. Using statistical classification methods, physicians can avoid sending people with backache for an unnecessary MRI, and patients who do not belong to the LDH group can be prescribed remedies first.
Patients can therefore be spared the stress of MRI as well as the unnecessary financial burden. This goal requires cooperation between biostatisticians and physicians; different scientific fields must help each other to provide better services to society. A study by Pedro J. García-Laencina et al in 2015 assessed and predicted breast cancer in women. That study compared KNN and LR in terms of specificity (SP) and concluded that the KNN model was more efficient than LR.17 In another study, by Eun-Suk Yang et al, KNN and LR were used to assess the best combination for ovarian cancer. In that study, according to the values of the receiver operating characteristic (ROC) curve, LR was recognized as the best model for classifying individuals.18 Chia-Hsun Hsieh et al assessed cancer screening in an asymptomatic population using multiple tumor markers; their results indicated the superiority of SVM over the KNN and LR methods.19 The present study aims to compare classification methods (LR, KNN, and SVM) for classifying patients with backache into two groups, patients with LDH and patients without LDH, with high precision, and to identify the most influential risk factors of LDH based on LR.

Methods

Data collection and preparation

The data of this cross-sectional study were collected from June to September 2017 at Besat clinic, the main clinic for lumbar disorders in the center of Kerman, southeast of Iran. Eligible patients were 253 women and 147 men aged over 18 years who had suffered from LBP for at least 6 months. They also complained of severe low back pain that limited routine activities such as walking, sitting, and standing. These patients were referred to the MRI center of Besat clinic by neurologists to investigate the main cause of their severe back pain. Exclusion criteria were inability to understand the Persian language and any kind of back surgery within the last 4 months. The data were divided into two parts, a train set and a test set, each composed of 200 samples. Three models were built on the train set, and their classification efficiency was then checked on the test set, which played no role in building the models. The main variables, which were the most influential risk factors for LDH, were collected through a checklist. This checklist, developed under the guidance of a neurologist, consisted of 12 questions, which are listed in Table 1. The history of associated illnesses such as diabetes, kidney stones, and hypertension, and of underlying diseases such as elongation of the ligaments or muscles of the lower back and low back arthritis,20,21 was also recorded as Yes or No.
Table 1

The list of 12 simple questions (Q)

Checklist’s items | With LDH, n (%) or mean ± SD | Without LDH, n (%) or mean ± SD
Q1: Sex
Female | 89 (35.2) | 164 (64.8)
Male | 48 (32.7) | 99 (67.3)
Q2: Any waist injury
Yes | 16 (38.1) | 26 (61.9)
No | 121 (33.8) | 237 (66.2)
Q3: Any sensory disorder in the body
Yes | 92 (33.7) | 181 (66.3)
No | 45 (35.4) | 82 (64.6)
Q4: Any movement disorder in the body
Yes | 45 (34.1) | 87 (65.9)
No | 92 (34.3) | 176 (65.7)
Q5: Any associated illness, such as diabetes, kidney stones, hypertension
Yes | 59 (38.1) | 96 (61.9)
No | 78 (31.8) | 167 (68.2)
Q7: Daily smoking
Yes | 14 (56.0) | 11 (44.0)
No | 123 (32.8) | 252 (67.2)
Q8: Daily drug use
Yes | 16 (48.5) | 17 (51.5)
No | 121 (33.0) | 246 (67.0)
Q9: Daily hookah use
Yes | 10 (41.7) | 14 (58.3)
No | 127 (33.8) | 249 (66.2)
Q10: How long have you experienced pain in the back (in months) | 66.83±89.593 | 60.86±80.886
Q11: How long have you experienced severe pain that disrupts daily activity (in months) | 12.41±25.570 | 12.68±29.099
Q12: Age | 46.70±14.287 | 44.94±14.276
Height (m) | 1.62±1.109 | 1.62±0.108
Weight (kg) | 71.64±15.100 | 70.66±15.456

Abbreviation: LDH, lumbar disc herniation.

The interviewing team was trained by the neurologist on the meaning of the checklist questions before being dispatched to the clinic. The participants completed a written informed consent form prior to enrollment in the study. The study was approved by the ethical committee of our institution, Kerman University of Medical Sciences (reference code: IR.KMU.REC.1397.078), and was conducted in compliance with the Helsinki Declaration. The outcome variable, having or not having a lumbar disc herniation, was determined by the neurologist after reviewing the MRI result.
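The division of the 400 patients into a train set and a test set of 200 each, as described in this section, can be sketched in a few lines. This is a minimal illustration in Python rather than the R code used in the study; the function name `split_train_test`, the seed, and the placeholder patient identifiers are assumptions, and the paper does not report how the random draw was implemented.

```python
import random


def split_train_test(records, n_train, seed=0):
    """Shuffle a copy of `records` and cut it into a train part and a test part."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # a fixed seed keeps the split reproducible
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers 0..399 standing in for the 400 completed checklists.
patient_ids = list(range(400))
train_ids, test_ids = split_train_test(patient_ids, n_train=200, seed=42)
print(len(train_ids), len(test_ids))
```

Because the test identifiers are never passed to any model-building step, the test set plays no role in fitting, which is the property the section relies on.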

KNN method

The K-nearest neighbors (KNN) method is one of the simplest machine learning algorithms; it has no model and no parameters to estimate. In the KNN method, the distances between a new observation (a new patient, in our study) and all observations in the training data are calculated, and the K nearest observations form its neighborhood. The new observation, which played no role in building the training data, is then assigned to the class that is most common among those K nearest neighbors.
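A distance-based classifier of this kind can be sketched as follows. This is a minimal illustration, not the study's implementation (which used R): the Hamming distance for yes/no answers, the value K=3, the function names, and the toy answers are all assumptions made for the example.

```python
from collections import Counter


def hamming(a, b):
    """Number of checklist answers on which two patients differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_x, train_y, new_x, k=3):
    """Assign `new_x` the majority label among its k nearest training cases."""
    order = sorted(range(len(train_x)), key=lambda i: hamming(train_x[i], new_x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Made-up answers (1 = Yes, 0 = No) to four yes/no items; not study data.
train_x = [(1, 1, 0, 1), (1, 0, 1, 1), (1, 1, 1, 1),
           (0, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)]
train_y = ["LDH", "LDH", "LDH", "no LDH", "no LDH", "no LDH"]

print(knn_predict(train_x, train_y, (1, 1, 1, 1), k=3))
print(knn_predict(train_x, train_y, (0, 0, 0, 0), k=3))
```

With dichotomous predictors many distances are tied, so the result for borderline patients can depend on how ties are broken; the two queries above were chosen to avoid ties.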

SVMs method

SVMs were first introduced by Vapnik et al (1990) as a family of machine-learning methods. The aim of SVMs is to separate observations by a line (in two dimensions) or by a plane or hyperplane (in more than two dimensions) in a complicated problem.22 Essentially, SVMs look for the best separating boundary between the classes.23
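The notion of a "best" separating boundary can be made concrete through the margin, the distance from the boundary to the closest observation. The sketch below does not train an SVM; it only compares two hand-picked candidate lines on invented two-dimensional data and selects the one with the larger margin, which is the quantity an SVM maximizes. The function name `geometric_margin`, the data, and the candidate lines are assumptions for illustration.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    Labels are +1 or -1; a negative result means at least one point
    lies on the wrong side of the hyperplane.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


# Toy two-dimensional data, invented for illustration.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, +1, +1]

# Two hypothetical separating lines, each given as (w, b).
candidates = {
    "x1 = 3": ((1.0, 0.0), -3.0),
    "x2 = 2.5": ((0.0, 1.0), -2.5),
}
margins = {name: geometric_margin(w, b, points, labels)
           for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins, best)
```

Both lines separate the two classes, but the second leaves more room to the nearest points; a trained SVM would search over all possible hyperplanes rather than two candidates.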

LR method

LR is a traditional statistical technique for classification. As in the previous two methods, its outcome is a dichotomous variable. It is a regression method that requires a link function.24 The explanatory (independent) variables may be continuous, discrete, or a combination of both.24

The training data were classified with the three methods in R software (version 3.4.4) using the caret and e1071 packages. R is free and widely used, especially when custom programming is needed. The models were examined on the testing data, and the best models were selected based on four classification criteria: sensitivity (SE), specificity (SP), correct classification rate (CCR), and kappa coefficient (KC). SE evaluates the ability of the test to detect true patients. SP evaluates the ability of the test to rule out patients who do not have the disease. CCR measures how correctly a diagnostic test identifies and rules out a certain condition; it can be determined from SE and SP together with the prevalence.25 The model with the highest values of SE, SP, CCR, and KC is identified as the best classification model. SE, SP, and CCR are defined in terms of the numbers of true positives (TP), true negatives (TN), false negatives (FN), and false positives (FP).

The homogeneity of the training and testing groups with respect to demographic variables was tested by the Chi-square test and the independent-samples t test. To explore the effect of sample size on the efficiency of the three models, the classification models were fitted at different sample sizes (50, 100, 150, and 200). The LR model was used to identify the most influential risk factors of LDH, and the odds ratio (OR) was used to express the size of each risk factor’s effect.
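The four criteria can be computed directly from the counts TP, FN, FP, and TN. The sketch below is an illustration under stated assumptions: the kappa coefficient is taken to be Cohen's kappa, the function name `classification_criteria` is hypothetical, and the counts are invented rather than taken from the study. It also checks the statement that CCR follows from SE, SP, and the prevalence.

```python
def classification_criteria(tp, fn, fp, tn):
    """Return SE, SP, CCR and Cohen's kappa from a 2x2 confusion table."""
    n = tp + fn + fp + tn
    se = tp / (tp + fn)        # sensitivity: true patients detected
    sp = tn / (tn + fp)        # specificity: non-patients ruled out
    ccr = (tp + tn) / n        # correct classification rate
    # Agreement expected by chance, from the row and column totals.
    p_yes = ((tp + fn) / n) * ((tp + fp) / n)
    p_no = ((fp + tn) / n) * ((fn + tn) / n)
    p_chance = p_yes + p_no
    kappa = (ccr - p_chance) / (1 - p_chance)
    return {"SE": se, "SP": sp, "CCR": ccr, "KC": kappa}


# Invented counts for a 200-patient test set; not taken from the study.
result = classification_criteria(tp=30, fn=38, fp=10, tn=122)
prevalence = (30 + 38) / 200

# CCR recovered from SE, SP and the prevalence.
ccr_from_rates = result["SE"] * prevalence + result["SP"] * (1 - prevalence)
print(result, ccr_from_rates)
```

The sketch does not guard against empty rows or columns (for example, a test set with no LDH cases), which would cause a division by zero.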

Results

Table 2 describes the demographic characteristics of all the patients in the present study. As shown, 36.8% were male and 63.2% female. Also, 137 of the 400 patients (34.2%) referred to the neurologist had herniated discs according to the MRI results. The mean (±SD) age of the patients was 45.47 (±14.77) years. Thus, the people who suffered from LBP were middle-aged, and their mean BMI was 27.06 kg/m2.
Table 2

Subject characteristics based on sex

Variables | Total (n=400) | Women (n=253) | Men (n=147) | P-value
Age (mean ± SD) | 45.47±14.77 | 47.31±14.12 | 43.64±15.26 | 0.091
BMI (mean ± SD), kg/m2 | 27.06±5.02 | 27.90±4.76 | 26.18±5.18 | 0.235
Duration of smoking (mean ± SD) (days) | 344.30±1754.38 | 100±1186.52 | 764.76±2387.47 | 0.002
Duration of drug use (mean ± SD) (days) | 388.89±1841.24 | 157.25±1009.60 | 787.55±2693.05 | 0.007
Duration of hookah use (mean ± SD) (days) | 58.09±310.51 | 32.67±183.79 | 101.84±449.61 | 0.077
Duration of LBP (mean ± SD) (days) | 1887.27±2517.18 | 1962.35±2376.65 | 1758.06±2745.93 | 0.435
Duration of severe LBP (mean ± SD) (days) | 377.61±837.27 | 433.02±941.46 | 282.24±609.48 | 0.053
LDH, n (%)
Has LDH | 137 (34.20) | 90 (35.60) | 49 (33.30) | 0.753
Does not have LDH | 263 (65.80) | 163 (64.60) | 98 (66.70) |
Waist injury, number (%)
Yes | 42 (10.50) | 24 (9.50) | 18 (12.20) | 0.192
No | 358 (89.50) | 229 (90.50) | 129 (87.80) |
Sensory disorders, number (%)
Yes | 273 (68.20) | 181 (71.50) | 93 (63.30) | 0.282
No | 127 (31.80) | 72 (28.50) | 54 (36.70) |
Movement disorders, number (%)
Yes | 132 (33) | 90 (35.60) | 43 (29.30) | 0.750
No | 268 (67) | 163 (64.40) | 104 (70.70) |
Smoking, number (%)
Yes | 25 (6.20) | 4 (1.60) | 20 (13.60) | 1
No | 375 (93.80) | 249 (98.40) | 127 (86.40) |
History of drug use, number (%)
Yes | 33 (8.20) | 11 (4.30) | 22 (15) | 0.364
No | 367 (91.80) | 242 (95.70) | 125 (85) |
History of hookah use, number (%)
Yes | 24 (6) | 11 (4.30) | 13 (8.80) | 0.068
No | 376 (94) | 242 (95.70) | 134 (91.20) |
Associated illness, number (%)
Having | 70 (17.50) | 46 (18.20) | 24 (16.30) |
Not having | 330 (82.50) | 207 (81.80) | 123 (83.70) | 0.638
Underlying disease, number (%)
Having | 155 (38.80) | 121 (47.80) | 34 (23.10) | <0.001
Not having | 245 (61.20) | 132 (52.20) | 113 (76.90) |

Abbreviation: LDH, lumbar disc herniation.

As observed, women made up the majority of LBP patients (n=253, 63.2%) compared with men (n=147, 36.8%), but the difference in the prevalence of herniated discs between them was not statistically significant (P=0.753). Most patients (89.5%) reported no injury to the back, so a hit to the back was not the main cause of LBP. Most patients had a sensory disorder (68.2%), but only 33% had a movement disorder. Contrary to the neurologist’s expectation, the prevalence of cigarette smoking (6.2%) and drug addiction (8.2%) among patients was low. These results suggest that, in most patients, LBP was not caused by the usual factors that the neurologist had listed in the checklist as the main ones. Comparing the two patient groups (train and test), there were no statistically significant differences in any demographic variable, which demonstrates the homogeneity of the two groups. Table 2 also shows no statistically significant difference between women and men in the demographic and LDH diagnostic variables, except for the duration of cigarette smoking (P=0.002), the duration of drug use (P=0.007), and underlying disease (P<0.001). Men had smoked and used drugs for significantly longer than women, while the difference in the duration of hookah use did not reach significance (P=0.077). Underlying disease was significantly more common in women than in men (P<0.001).

Table 3 shows the results of comparing the four criteria (SE, SP, KC, and CCR) between the three classification methods at different sample sizes (50, 100, 150, and 200). For the best classification, the values of all four criteria should be high (see the notes to Table 3). In general, the SVM model is useful for classifying small samples, and the efficiency of the KNN model increases as the sample size increases. According to the four criteria, the LR model was the least efficient at all sample sizes. The comparison of the three models on the four criteria at different sample sizes is also shown in Figure 1.
Table 3

The results of comparing KNN, SVM, and LR in different sample sizes

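N | Data set | KNN SE | KNN SP | KNN KC | KNN CCR | SVM SE | SVM SP | SVM KC | SVM CCR | LR SE | LR SP | LR KC | LR CCR
50 | Train | 0.23 | 0.94 | 0.21 | 0.70 | 0.18 | 1 | 0.22 | 0.72 | 0.29 | 0.90 | 0.23 | 0.70
50 | Test | 0.47 | 0.81 | 0.29 | 0.68 | 0.47 | 1 | 0.53 | 0.80 | 0.47 | 0.90 | 0.42 | 0.74
100 | Train | 0.03 | 0.98 | 0.02 | 0.66 | 0.85 | 1 | 0.88 | 0.95 | 0.17 | 0.90 | 0.10 | 0.66
100 | Test | 0.36 | 0.92 | 0.32 | 0.72 | 0.89 | 0.98 | 0.89 | 0.95 | 0.11 | 0.95 | 0.08 | 0.65
150 | Train | 0.02 | 0.98 | 0.14 | 0.65 | 0.45 | 0.99 | 0.50 | 0.81 | 0.04 | 0.95 | 0.001 | 0.64
150 | Test | 0.32 | 0.92 | 0.27 | 0.71 | 0.06 | 1 | 0.07 | 0.67 | 0.07 | 0.96 | 0.042 | 0.65
200 | Train | 0.01 | 0.98 | 0.01 | 0.65 | 0.66 | 0.98 | 0.69 | 0.87 | 0.29 | 0.90 | 0.23 | 0.70
200 | Test | 0.31 | 0.93 | 0.27 | 0.71 | 0.15 | 0.99 | 0.181 | 0.69 | 0.12 | 0.96 | 0.10 | 0.66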

Notes: The highest value of all four criteria (SP, SE, CCR and KC) shows the best model in each sample size, therefore these values are shown in bold.

Abbreviations: N, sample size; SE, sensitivity; SP, specificity; KC, kappa coefficient; CCR, correct classification rate; KNN, K-nearest neighbors; LR, logistic regression; SVM, support vector machine.

Figure 1

The comparison of three models with four criteria in different sample sizes.

Abbreviations: SE, sensitivity; SP, specificity; KC, kappa coefficient; CCR, correct classification rate; KNN, K-nearest neighbors; LR, logistic regression; SVM, support vector machine.

Table 4 shows the results of the LR model. According to the OR values, the most influential risk factors of LDH are, in descending order, drug use, associated illness, cigarette smoking, and hookah use. Drug use increased the odds of LDH more than 7-fold (OR=7.249), and the odds of LDH in people with an associated illness were 4.847 times those of people without one. Hookah use more than doubled the odds of LDH (OR=2.401), and the odds of LDH among smokers were nearly four times those of nonsmokers (OR=3.877). The odds of LDH were about 1.4 times higher in women than in men (OR=1.388), and a sensory disorder in the back also increased the odds of LDH (OR=1.488), although neither effect was statistically significant (P=0.227 and P=0.122, respectively). The other risk factors did not significantly affect the development of LDH.
Table 4

The effects of risk factors on LDH

Variables | β | P-value | 95% CI | OR
Intercept | −4.567 | 0.002 | −7.417 to 1.716 | 0.010
Age in years | 0.003 | 0.761 | −0.015 to −0.02 | 1.003
BMI | −0.013 | 0.595 | −0.06 to −0.034 | 0.987
Sex | −0.326 | 0.227 | −0.856 to −0.203 | 1.388
History of hit to the lumbar region | 0.392 | 0.303 | 0.353 to 1.137 | 1.480
Sensory disorders | −0.397 | 0.122 | −0.900 to −0.107 | 1.488
Movement disorders | −0.026 | 0.920 | −0.54 to −0.487 | 0.974
Smoking | 1.355 | 0.049 | −0.004 to 2.706 | 3.877
History of drug use | 1.981 | 0.005 | −0.611 to 3.351 | 7.249
Duration of smoking | −3.051 | 0.730 | 0.000 | 1.000
Duration of drug use | 0.000 | 0.015 | 8.072 to 0.001 | 1.000
Duration of hookah use | 0.001 | 0.366 | 0.000 to 0.002 | 1.001
Duration of LBP | −1.669 | 0.731 | 0.000 to 7.860 | 1.000
Duration of severe LBP | 9.914 | 0.508 | 0.000 | 1.000
History of hookah use | 0.876 | 0.036 | −0.573 to 2.324 | 0.036
Associated illness | 1.578 | 0.000 | −0.997 to 2.160 | 4.847
Underlying disease | 0.193 | 0.453 | −0.311 to 0.697 | 1.213
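In logistic regression, the odds ratio of a predictor is the exponential of its coefficient, OR = exp(β). The short check below applies this to four coefficients as printed in Table 4; it is an illustrative calculation, not part of the original analysis, and the dictionary names are assumptions. Because the printed β values are rounded to three decimals, the results can differ from the reported ORs in the last digit.

```python
import math

# Coefficients (beta) as printed in Table 4.
betas = {
    "History of drug use": 1.981,
    "Associated illness": 1.578,
    "Smoking": 1.355,
    "History of hookah use": 0.876,
}

# OR = exp(beta) for each risk factor.
odds_ratios = {name: math.exp(beta) for name, beta in betas.items()}
for name, value in odds_ratios.items():
    print(f"{name}: OR = {value:.3f}")
```

For hookah use, exp(0.876) is about 2.40, which matches the OR of 2.401 quoted in the text; the value 0.036 printed in the OR column of Table 4 for that row coincides with its P-value.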

Discussion

The results of the present study offer a perspective for researchers who want to use simpler statistical diagnostic methods before clinical diagnostics. They show that basic sciences can play an important role in clinical diagnosis, a point rarely mentioned in previous studies. In this study, the results of three statistical classification methods (KNN, SVM, and LR) were compared with the results of MRI in diagnosing patients with herniated discs, to determine whether they could be used as a step preceding clinical methods and spare patients the burden of stress and expense. For this comparison, patients were asked simple questions in the first step of diagnosis. Since this is the first study appraising these three classification methods with dichotomous questions to diagnose patients with herniated discs, there were no directly comparable studies in this context. Most of the previous studies compared classification methods with qualitative questions. The comparison used four criteria (SE, SP, KC, and CCR) at four different sample sizes. For small sample sizes (50, 100), the SVM models gave better classification on the four criteria, and the KNN model gave the best classification for larger sample sizes (150, 200). These results indicate that the SVM model classifies observations better than the other methods, especially in small samples. This may be because small samples with dichotomous questions carry little information. Since the SVM model makes no assumption about the distribution of the data and works by constructing a separating hyperplane in a high-dimensional space, these results are in line with expectations. In contrast, the KNN model is built on distances between observations, and distances are simpler to calculate with quantitative values than with dichotomous data.
The present results are also in accordance with previous work, such as the 2016 study by Wu Yunfeng on knee vibration quantum. That study compared SVM with LR, and the SVM model outperformed LR.26 LR is a fully parametric method that gives its best results in follow-up studies, whereas KNN is a nonparametric method based on distances between observations. Therefore, as the number of predictors increases, the KNN procedure shows better results. The present findings reveal that sample size influences the efficiency of KNN more than that of the other two methods. Another important finding is that addiction, especially the use of drugs, hookah, and cigarettes, increases the odds of LDH. These findings agree with a previous study.27 The odds of having LDH are higher for people with a sensory disorder and for women, who should be more careful about their backs. This result is also consistent with previous results.28 An important strength of the present study is that the predictor variables consist of the simplest possible questions, so the physician collects the information without any measurements. The researchers tried to show the precision of statistical models relative to the clinical method (MRI) in correctly diagnosing patients. Hence, with a single computer running statistical programs in the office, a doctor can ask patients suffering from LBP the simplest questions and enter their answers into the system. If a patient is classified in the LDH group, the neurologist can then refer the patient to MRI for a definitive diagnosis. Finally, most studies agree with the present results; small differences arise from differences in the type and number of predictors.
Therefore, methods that can diagnose patients without any measurement are valuable. Another important message of the present study is that basic sciences such as biostatistics must cooperate with medicine to spare patients many emotional and financial costs; separating scientific branches is a mistake when they can complement each other. The present study had some limitations that merit attention when interpreting the findings. First, most of the 12 questions in the checklist were based on the experience of one neurologist and on neurology texts. Other valuable questions could have been included, and other neurologists may have useful experience. Furthermore, LDH was diagnosed by a single neurologist, who could have diagnosed incorrectly; the cooperation of other neurologists would help to prevent incorrect diagnoses. Future work should compare these three models with a wider variety of questions and with the cooperation of more than one neurologist in order to demonstrate the efficiency of the models.

Conclusion

To the best of the present researchers’ knowledge, this study is one of the first to compare classification methods when the majority of predictors are binary. It supports the precision of statistical methods in classifying patients, which is close to that of MRI results. Hence, using them could prevent unnecessary MRI, especially in the first stage of the disease. Since the mean age of our patient population is not high, we emphasize that different types of addiction were the main cause of LDH in young people.