Adel Elmessiry, William O Cooper, Thomas F Catron, Jan Karrass, Zhe Zhang, Munindar P Singh.
Abstract
BACKGROUND: Unsolicited patient complaints can be a useful service recovery tool for health care organizations. Some patient complaints contain information that may necessitate further action on the part of the health care organization and/or the health care professional. Current approaches depend on the manual processing of patient complaints, which can be costly, slow, and challenging in terms of scalability.Entities:
Keywords: NLP; machine learning; natural language processing; patient complaints
Year: 2017 PMID: 28760726 PMCID: PMC5556254 DOI: 10.2196/medinform.7140
Source DB: PubMed Journal: JMIR Med Inform
Implemented classifiers.
| Classifier | Description |
| Scaled linear discriminant analysis (SLDA) | Expresses one dependent variable as a linear combination of other variables. SLDA is similar to ANOVA, but differs in that it assumes continuous independent variables and categorical dependent labels. SLDA is widely used in image and pattern recognition. |
| Support vector machines (SVM) | Learns a set of separating hyperplanes from the training data and classifies new data according to which side of the hyperplanes it falls on. SVM has been used for text classification. |
| Glmnet | An implementation of lasso and elastic-net regularized generalized linear models; Glmnet is popular for domains with large databases. |
| Max entropy | A probabilistic classifier that selects the model with maximum entropy from among a set of candidate models and uses it to classify data. |
| Boosting | Aggregates a set of weak learners (classifiers that perform only slightly better than random) into a strong learner by weighting them appropriately. |
| Random forests | An ensemble learning method, similar to boosting, that learns many decision trees and combines their predictions (e.g., by majority vote) to improve accuracy over any single tree. |
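The six classifiers above can be approximated with off-the-shelf scikit-learn estimators; this is an illustrative Python sketch, not the study's own implementation (the entry does not specify the toolkit used). The mappings for SLDA, Glmnet, and max entropy are nearest equivalents, noted in the comments.

```python
# Hypothetical scikit-learn analogues of the six classifiers listed above.
# These are assumptions for illustration; the study's own code is not shown.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

classifiers = {
    # Linear discriminant analysis: continuous features, categorical labels.
    "SLDA-like": LinearDiscriminantAnalysis(),
    "SVM": LinearSVC(),
    # Glmnet = lasso/elastic-net regularized GLM; logistic regression with
    # an elastic-net penalty is the closest scikit-learn equivalent.
    "Glmnet-like": LogisticRegression(
        penalty="elasticnet", solver="saga", l1_ratio=0.5, max_iter=5000
    ),
    # Maximum-entropy classification is equivalent to (multinomial)
    # logistic regression.
    "MaxEnt": LogisticRegression(max_iter=5000),
    "Boosting": GradientBoostingClassifier(),
    "Random forests": RandomForestClassifier(),
}
```

Each estimator exposes the same `fit`/`predict` interface, which makes the side-by-side comparison in the tables below straightforward to reproduce on any labeled corpus.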
Figure 1. Accuracy with term frequency (TF)-generated features, using 10-split Monte Carlo cross-validation.
Figure 2. Accuracy with term frequency-inverse document frequency (TF-IDF)-generated features, using 10-split Monte Carlo cross-validation.
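The two feature representations compared in the figures can be sketched as follows; the example corpus is a made-up placeholder, and the vectorizer settings are assumptions (the 0.99 sparsity cutoff in the table below roughly corresponds to dropping terms appearing in under 1% of documents, e.g. `min_df=0.01` on a real corpus).

```python
# Illustrative sketch of TF vs TF-IDF feature generation; the documents
# below are invented placeholders, not complaints from the study.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "the nurse was rude and dismissive",
    "long wait time but staff were helpful",
    "billing error was never resolved",
]

tf = CountVectorizer().fit_transform(docs)     # raw term frequency counts
tfidf = TfidfVectorizer().fit_transform(docs)  # TF downweighted by document frequency
print(tf.shape, tfidf.shape)  # same vocabulary, different weighting
```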
Accuracy, sensitivity, specificity, and F-score of each classifier with term frequency (TF) versus term frequency-inverse document frequency (TF-IDF) features, using 10-split Monte Carlo cross-validation at 0.99 sparsity.
| Classifier | TF Accuracy | TF Sensitivity | TF Specificity | TF F-score | TF-IDF Accuracy | TF-IDF Sensitivity | TF-IDF Specificity | TF-IDF F-score |
| SLDA | 0.76 | 0.72 | 0.80 | 0.76 | 0.74 | 0.66 | 0.83 | 0.74 |
| SVM | 0.79 | 0.71 | 0.86 | 0.78 | 0.75 | 0.67 | 0.82 | 0.74 |
| Glmnet | 0.76 | 0.71 | 0.81 | 0.75 | 0.76 | 0.64 | 0.86 | 0.73 |
| Max entropy | 0.77 | 0.71 | 0.83 | 0.76 | 0.77 | 0.69 | 0.84 | 0.76 |
| Boosting | 0.70 | 0.85 | 0.55 | 0.67 | 0.73 | 0.82 | 0.64 | 0.72 |
| Random forests | 0.80 | 0.74 | 0.87 | 0.80 | 0.82 | 0.76 | 0.87 | 0.81 |
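The four metrics reported in the table derive directly from a 2x2 confusion matrix; a minimal sketch, with illustrative counts (not taken from the study):

```python
# Accuracy, sensitivity, specificity, and F-score from confusion-matrix
# counts. The tp/fp/tn/fn values below are illustrative placeholders.
def metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, f_score

acc, sens, spec, f1 = metrics(tp=76, fp=16, tn=84, fn=24)
print(round(acc, 2), round(sens, 2), round(spec, 2))  # 0.8 0.76 0.84
```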
Classifier error analysis (n=3010).
| Number of classifiers sharing an error prediction | % of errors |
| 6 | 17.97 |
| 5 | 43.99 |
| 4 | 1.99 |
| 3 | 1.00 |
| 2 | 1.00 |
| 1 | 33.99 |
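The error-overlap tally above can be reproduced by counting, for each document, how many classifiers mispredicted it, then reporting what fraction of erroneous documents were missed by exactly k classifiers. A toy sketch, with made-up labels and predictions rather than study data:

```python
# Toy illustration of the error-overlap analysis: the labels and
# predictions below are invented placeholders, not study data.
from collections import Counter

true_labels = [1, 0, 1, 0, 1]
predictions = [            # one row of predictions per classifier
    [1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
]

# Number of classifiers that mispredict each document.
errors_per_doc = [
    sum(row[i] != true_labels[i] for row in predictions)
    for i in range(len(true_labels))
]
error_counts = [k for k in errors_per_doc if k > 0]  # documents with >= 1 error
dist = Counter(error_counts)
for k in sorted(dist, reverse=True):
    print(k, f"{100 * dist[k] / len(error_counts):.2f}%")
```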