Mehdi Teimouri, Farshad Farzadfar, Mahsa Soudi Alamdari, Amir Hashemi-Meshkini, Parisa Adibi Alamdari, Ehsan Rezaei-Darzi, Mehdi Varmaghani, Aysan Zeynalabedini.
Abstract
Data on the prevalence of communicable and non-communicable diseases, one of the most important categories of epidemiological data, are used to interpret the health status of communities. This study aims to calculate the prevalence of outpatient diseases through the characterization of outpatient prescriptions. The data used in this study were collected from 1412 prescriptions covering various types of diseases, of which we focused on the identification of ten. Data mining methods are used to identify the diseases for which the prescriptions were written, and their performance is evaluated by comparison with a Naïve method. Combining (ensemble) methods are then applied to improve the results. The results show that the Support Vector Machine, with an accuracy of 95.32%, outperforms the other methods. The Naïve method, with an accuracy of 67.71%, performs about 20 percentage points worse than the K-Nearest Neighbor method, which has the lowest accuracy among the classification algorithms. These findings indicate that data mining algorithms perform well in characterizing outpatient diseases and can help in choosing appropriate methods for classifying prescriptions at larger scales.
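The abstract benchmarks the classifiers against a Naïve method without defining it here; a minimal sketch, assuming the baseline simply predicts the most frequent class seen in the training labels (the disease labels below are hypothetical, not the paper's data):

```python
from collections import Counter

def majority_class_baseline(train_labels, test_labels):
    """Accuracy (%) of a naive baseline that always predicts the most
    frequent class observed in the training labels."""
    majority = Counter(train_labels).most_common(1)[0][0]
    hits = sum(1 for y in test_labels if y == majority)
    return 100 * hits / len(test_labels)

# Hypothetical labels, for illustration only
train = ["cold"] * 6 + ["flu"] * 3 + ["allergy"] * 1
test = ["cold", "cold", "flu", "allergy", "cold"]
acc = majority_class_baseline(train, test)  # baseline always predicts "cold"
```

Any trained classifier should clear this bar; per the abstract, even the weakest data mining method beat the Naïve baseline by roughly 20 percentage points.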
Keywords: Data Mining; Voting; Diagnosis; Medical Prescription; Outpatient Diseases; Stacking; Weighted Voting
Year: 2016 PMID: 28228810 PMCID: PMC5242358
Source DB: PubMed Journal: Iran J Pharm Res ISSN: 1726-6882 Impact factor: 1.696
Figure 1. Weighted Voting algorithm
Figure 2. Comparing the classifier performances based on the average evaluation criteria
Figure 3. Confusion matrix of Support Vector Machine
Figure 4. Comparing the Naïve method with data mining algorithms based on average of evaluation metrics
Figure 5. Comparing the performances of combining algorithms
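Figure 1 presents the paper's Weighted Voting combiner, whose exact weighting scheme is not reproduced in this entry; the following is only an illustrative sketch, assuming each classifier's vote is weighted by its validation accuracy (names and numbers are hypothetical):

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine class predictions from several classifiers, weighting each
    vote by that classifier's weight (e.g. its validation accuracy)."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)

# Hypothetical example: three classifiers vote on one prescription
preds = ["flu", "cold", "flu"]
accs = [0.95, 0.90, 0.80]  # assumed validation accuracies
label = weighted_vote(preds, accs)  # "flu" wins with weight 1.75 vs 0.90
```

Stacking, also listed in the keywords, goes a step further: instead of fixed weights, it trains a meta-classifier on the base classifiers' outputs.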
The performance of classifiers in detecting diseases (%)

| Metric | Classifier | Diseases | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sensitivity | Decision Tree | 81 | 93.9 | 99 | 92 | 91.1 | 98.3 | 95.4 | 93.7 | 100 | 100 | 94 |
| | Support Vector Machine | 85.7 | 94.9 | 99 | 95.5 | 93.1 | 98.3 | 93.8 | 94.9 | 100 | 100 | 96.2 |
| | Neural Network | 85.7 | 94.9 | 99 | 95.5 | 93.1 | 96.6 | 95.4 | 94.9 | 100 | 100 | 95.8 |
| | Naïve Bayes | 81.7 | 91.9 | 92.8 | 89.3 | 93.1 | 96.6 | 89.2 | 86.1 | 95.2 | 98.4 | 88.9 |
| | Logistic Regression | 81 | 94.9 | 99 | 94.6 | 93.1 | 96.6 | 95.4 | 97.5 | 100 | 100 | 95.8 |
| | K-Nearest Neighbor | 77 | 88.9 | 87.6 | 85.7 | 92.1 | 98.3 | 89.2 | 78.5 | 95.2 | 96.9 | 92.2 |
| Specificity | Decision Tree | 98.6 | 99.98 | 100 | 99.4 | 98.9 | 99.9 | 99.2 | 99.8 | 99.9 | 99.9 | 96.3 |
| | Support Vector Machine | 99.2 | 99.8 | 100 | 99.5 | 99.5 | 100 | 99.2 | 99.99 | 100 | 100 | 96.8 |
| | Neural Network | 99.1 | 99.9 | 100 | 99.6 | 99.6 | 99.9 | 99.3 | 99.9 | 99.9 | 100 | 96.6 |
| | Naïve Bayes | 98.4 | 99.5 | 99.8 | 99.1 | 98.6 | 98.7 | 99.3 | 98.9 | 99.9 | 100 | 95.1 |
| | Logistic Regression | 99.3 | 99.7 | 100 | 99.8 | 99.7 | 99.9 | 99.3 | 99.4 | 99.9 | 100 | 95.9 |
| | K-Nearest Neighbor | 98.1 | 99.5 | 99.8 | 98.9 | 99.2 | 99.8 | 99.1 | 99.5 | 99.9 | 99.9 | 92.1 |
| Precision | Decision Tree | 85 | 96.9 | 100 | 92.8 | 86.8 | 98.3 | 84.9 | 97.4 | 98.4 | 98.5 | 94.1 |
| | Support Vector Machine | 91.5 | 96.9 | 100 | 93.9 | 94 | 100 | 84.7 | 98.7 | 100 | 100 | 95 |
| | Neural Network | 90.8 | 98.9 | 100 | 95.5 | 94.9 | 96.6 | 86.1 | 98.7 | 98.4 | 100 | 94.8 |
| | Naïve Bayes | 83.7 | 93.8 | 97.8 | 89.3 | 83.9 | 77 | 86.6 | 81.9 | 98.3 | 100 | 92.1 |
| | Logistic Regression | 91.9 | 95.9 | 100 | 98.1 | 95.9 | 98.3 | 87.3 | 90.6 | 100 | 100 | 93.8 |
| | K-Nearest Neighbor | 80.2 | 92.6 | 97.7 | 87.3 | 84.9 | 95.1 | 82.9 | 91.2 | 98.3 | 98.4 | 88.1 |
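The table's sensitivity, specificity, and precision values are the standard one-vs-rest metrics derived from a multiclass confusion matrix such as the one in Figure 3; a self-contained sketch (the 3x3 matrix below is a toy example, not the paper's data):

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class sensitivity, specificity, and precision (%) from a
    multiclass confusion matrix (rows = true class, cols = predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                  # correctly identified cases
    fn = cm.sum(axis=1) - tp          # missed cases of each class
    fp = cm.sum(axis=0) - tp          # other classes predicted as this one
    tn = cm.sum() - tp - fn - fp      # everything else
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "precision": 100 * tp / (tp + fp),
    }

# Toy 3-class confusion matrix, for illustration only
cm = [[50, 5, 0],
      [3, 40, 2],
      [1, 1, 48]]
m = per_class_metrics(cm)
```

Each column of the table above is one such per-class computation; averaging these one-vs-rest metrics over the ten diseases gives the summary values plotted in Figures 2 and 4.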