Lin Chen, Kirsten Vallmuur, Richi Nayak.
Abstract
Narrative text is a useful source for identifying injury circumstances in routine emergency department data collections. Automatically classifying narratives with machine learning techniques is promising because it can reduce the tedious manual classification process. Existing work focuses on Naive Bayes, which does not always offer the best performance. This paper proposes Matrix Factorization approaches together with a learning enhancement process for this task. The results are compared with the performance of various other classification approaches. The impact of parameter settings on the classification results for a medical text dataset is discussed. With the right choice of dimension k, the Non-Negative Matrix Factorization model method achieves a 10-fold cross-validation (10 CV) accuracy of 0.93.
Year: 2015 PMID: 26043671 PMCID: PMC4460654 DOI: 10.1186/1472-6947-15-S1-S5
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Figure 1. Term frequency in log scale.
Dataset statistics.
| Dataset | # Documents | Average Length | Max Length | Min Length |
|---|---|---|---|---|
| Training Dataset | 10,000 | 70 | 254 | 1 |
| Testing Dataset | 5,000 | 68 | 245 | 1 |
Figure 2. Distribution of dataset. A) shows the external code distribution and B) shows the injury factor code distribution.
Evaluation measures.
| Measure | Definition | Equation |
|---|---|---|
| PPV | Proportion of cases assigned to a class that truly belong to it | TP/(TP+FP) |
| Sensitivity | Proportion of actual positives that are correctly identified | TP/(TP+FN) |
| Specificity | Proportion of actual negatives that are correctly identified | TN/(FP+TN) |
| 10 CV accuracy | Accuracy over 10-fold cross-validation: the dataset is split into 10 folds and classification is run 10 times | (TP+TN)/(TP+FP+TN+FN) |
| Kappa | Robust measure that takes agreement by chance into account | K = (I |
| AUC | Relation between sensitivity and specificity | Area under the curve plotting sensitivity against (1-specificity) |
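As an illustration only, the count-based equations in the table above, TP/(TP+FP), TP/(TP+FN), TN/(FP+TN) and (TP+TN)/(TP+FP+TN+FN), can be sketched in Python for a single class treated one-vs-rest. The `Counts` class, the function names, and the example counts are hypothetical and do not come from the paper. Because the kappa equation is truncated in this record, the `cohen_kappa` function uses the standard two-class Cohen's kappa as an assumption, which may differ from the authors' exact formulation.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """One-vs-rest confusion counts for a single class (hypothetical helper)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _ratio(numerator, denominator):
    # Return NaN instead of raising when the denominator is zero.
    return numerator / denominator if denominator else float("nan")


def ppv(c):
    # Positive predictive value: TP / (TP + FP)
    return _ratio(c.tp, c.tp + c.fp)


def sensitivity(c):
    # Sensitivity: TP / (TP + FN)
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c):
    # Specificity: TN / (FP + TN)
    return _ratio(c.tn, c.fp + c.tn)


def accuracy(c):
    # Accuracy: (TP + TN) / (TP + FP + TN + FN)
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


def cohen_kappa(c):
    # ASSUMPTION: standard two-class Cohen's kappa, not taken from the paper.
    n = c.tp + c.fp + c.tn + c.fn
    observed = _ratio(c.tp + c.tn, n)
    # Chance agreement from the row and column totals of the 2x2 table.
    expected = _ratio(
        (c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn),
        n * n,
    )
    return _ratio(observed - expected, 1 - expected)


if __name__ == "__main__":
    # Made-up counts for illustration; not results from the paper.
    example = Counts(tp=40, fp=10, tn=45, fn=5)
    for measure in (ppv, sensitivity, specificity, accuracy, cohen_kappa):
        print(f"{measure.__name__}: {measure(example):.3f}")
```

For a 10 CV estimate, one plausible approach is to compute these measures on each held-out fold and then average them, although the record does not state how the authors aggregated across folds.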
Figure 3. External cause classification.
Figure 4. Injury factor classification.
Performance for major injury factor codes.
| Class Description | Sensitivity | Specificity | PPV |
|---|---|---|---|
| Infant | 0.89 | 0.91 | 0.87 |
| Furnishing | 0.90 | 0.83 | 0.90 |
| Appliance | 0.86 | 0.98 | 0.81 |
| Utensil container | 0.90 | 0.91 | 0.93 |
| Transport | 0.94 | 0.90 | 0.95 |
| Sporting equipment | 0.92 | 0.95 | 0.92 |
| Tool | 0.92 | 0.90 | 0.86 |
| Food | 0.86 | 0.97 | 0.74 |
| Chemical substance | 0.70 | 0.98 | 0.65 |
| Structure fitting | 0.90 | 0.93 | 0.92 |
Performance for external cause codes.
| Class Description | Sensitivity | Specificity | PPV |
|---|---|---|---|
| Motor vehicle | 0.94 | 0.92 | 0.92 |
| Motorcycle | 0.90 | 0.93 | 0.90 |
| Pedal cyclist | 0.90 | 0.95 | 0.93 |
| Pedestrian | 0.77 | 0.98 | 0.57 |
| Unspecified transport | 0.73 | 0.98 | 0.21 |
| Horse related | 0.94 | 0.92 | 0.95 |
| Animal related (exclude horse) | 0.92 | 0.93 | 0.89 |
| Fall | 0.84 | 0.95 | 0.93 |
| Drowning submersion | 0.90 | 0.92 | 0.64 |
| Breathing | 0.48 | 0.97 | 0.36 |
| Fire | 0.64 | 0.98 | 0.66 |
| Exposure | 0.85 | 0.91 | 0.89 |
| Poisoning | 0.80 | 0.93 | 0.82 |
| Firearm | 0.60 | 0.98 | 0.18 |
| Cutting | 0.50 | 0.98 | 0.81 |
| Machinery | 0.63 | 0.98 | 0.73 |
| Electricity | 0.92 | 0.92 | 0.89 |
| Natural original | 0.85 | 0.96 | 0.66 |
| Unspecified | 0.80 | 0.95 | 0.51 |
| Struck | 0.81 | 0.95 | 0.82 |
Figure 5. Impact of training size.
Figure 6. Impact of training size against computation time.
Figure 7. Impact of top-n removal.
Figure 8. Impact of weighting measure.
Figure 9. Impact of k selection.
Figure 10. Impact of enhancement learning.