Chulho Kim, Vivienne Zhu, Jihad Obeid, Leslie Lenert.
Abstract
Year: 2019; PMID: 30818342; PMCID: PMC6394972; DOI: 10.1371/journal.pone.0212778
Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240
Fig 1. Preprocessing flow chart of the “quanteda” natural language processing package.
Fig 2. Difference in text character lengths between AIS and non-AIS reports.
AIS, acute ischemic stroke.
Fig 3. Keyness plot analysis of AIS and non-AIS reports.
AIS, acute ischemic stroke.
Fig 4. Comparison of ML and NLP algorithms for classifying the brain MRI reports.
ML, machine learning; NLP, natural language processing; BLR, binary logistic regression; NBC, naïve Bayesian classification; SDT, single decision tree; SVM, support vector machine; TFIDF, term frequency-inverse document frequency.
Performance of each machine learning algorithm.
| Classifier | TP | FP | FN | TN | Total | Sensitivity (Recall), % | Specificity, % | PPV (Precision), % | NPV, % | Accuracy, % | F1-measure, % | P for χ² |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BLR unigram | 100 | 106 | 29 | 671 | 906 | 77.5 | 86.4 | 48.5 | 95.9 | 85.1 | 59.7 | <0.001 |
| BLR tf-idf | 102 | 103 | 27 | 674 | 906 | 79.1 | 86.7 | 49.8 | 96.1 | 85.7 | 61.1 | <0.001 |
| BLR adding bigram | 64 | 298 | 65 | 479 | 906 | 49.6 | 61.6 | 17.7 | 88.1 | 59.9 | 26.1 | 0.020 |
| BLR adding bigram+tf-idf | 60 | 297 | 69 | 480 | 906 | 46.5 | 61.8 | 16.8 | 87.4 | 59.6 | 24.7 | 0.082 |
| NBC unigram | 110 | 170 | 19 | 607 | 906 | 85.3 | 78.1 | 39.3 | 97.0 | 79.1 | 53.8 | <0.001 |
| NBC tf-idf | 112 | 170 | 17 | 607 | 906 | 86.8 | 78.1 | 39.7 | 97.3 | 79.4 | 54.5 | <0.001 |
| NBC adding bigram | 111 | 112 | 18 | 665 | 906 | 86.0 | 85.6 | 49.8 | 97.4 | 85.7 | 63.1 | <0.001 |
| NBC adding bigram+tf-idf | 116 | 118 | 13 | 659 | 906 | 89.9 | 84.8 | 49.6 | 98.1 | 85.5 | 63.9 | <0.001 |
| SDT unigram | 123 | 12 | 6 | 765 | 906 | 95.3 | 98.5 | 91.1 | 99.2 | 98.0 | 93.2 | <0.001 |
| SDT tf-idf | 123 | 12 | 6 | 765 | 906 | 95.3 | 98.5 | 91.1 | 99.2 | 98.0 | 93.2 | <0.001 |
| SDT adding bigram | 123 | 12 | 6 | 765 | 906 | 95.3 | 98.5 | 91.1 | 99.2 | 98.0 | 93.2 | <0.001 |
| SDT adding bigram+tf-idf | 123 | 12 | 6 | 765 | 906 | 95.3 | 98.5 | 91.1 | 99.2 | 98.0 | 93.2 | <0.001 |
| SVM unigram | 76 | 5 | 53 | 772 | 906 | 58.9 | 99.4 | 93.8 | 93.6 | 93.6 | 72.4 | <0.001 |
| SVM tf-idf | 80 | 5 | 49 | 772 | 906 | 62.0 | 99.4 | 94.1 | 94.0 | 94.0 | 74.8 | <0.001 |
| SVM adding bigram | 43 | 0 | 86 | 777 | 906 | 33.3 | 100.0 | 100.0 | 90.0 | 90.5 | 50.0 | <0.001 |
| SVM adding bigram+tf-idf | 50 | 1 | 79 | 776 | 906 | 38.8 | 99.9 | 98.0 | 90.8 | 91.2 | 55.6 | <0.001 |
TP, true positive; FP, false positive; FN, false negative; TN, true negative; PPV, positive predictive value; NPV, negative predictive value; BLR, binary logistic regression; tf-idf, term frequency-inverse document frequency; NBC, naïve Bayesian classification; SDT, single decision tree; SVM, support vector machine.
* the best classifiers.
** the worst classifier.
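The percentage columns in the table above follow from the four confusion-matrix counts in each row. The sketch below shows that arithmetic using the standard definitions; the `ConfusionMatrix` class and its `metrics` method are hypothetical names introduced here for illustration, not code from the study. It does not guard against zero denominators, which do not occur in any row of the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for one classifier (hypothetical helper, not from the paper)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def metrics(self) -> dict:
        """Return the table's percentage columns, rounded to one decimal."""
        total = self.tp + self.fp + self.fn + self.tn
        sensitivity = self.tp / (self.tp + self.fn)   # recall
        specificity = self.tn / (self.tn + self.fp)
        ppv = self.tp / (self.tp + self.fp)           # precision
        npv = self.tn / (self.tn + self.fn)
        accuracy = (self.tp + self.tn) / total
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
        raw = {
            "sensitivity": sensitivity,
            "specificity": specificity,
            "ppv": ppv,
            "npv": npv,
            "accuracy": accuracy,
            "f1": f1,
        }
        return {name: round(100 * value, 1) for name, value in raw.items()}


# Counts copied from the "SDT unigram" row of the table.
sdt = ConfusionMatrix(tp=123, fp=12, fn=6, tn=765)
print(sdt.metrics())
# {'sensitivity': 95.3, 'specificity': 98.5, 'ppv': 91.1,
#  'npv': 99.2, 'accuracy': 98.0, 'f1': 93.2}
```

Running the same helper on the "BLR unigram" counts (100, 106, 29, 671) gives 77.5, 86.4, 48.5, 95.9, 85.1 and 59.7, matching that row as well.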
Error analysis of the single decision tree results in classifying AIS and non-AIS reports.
| Reason for misclassification | FP | FN |
|---|---|---|
| Other disease conditions that can be accompanied by MR diffusion restriction | 3 | 0 |
| Reports mentioning recent or old cerebral hemorrhage | 3 | 4 |
| Lesions with diffusion restriction on MRI but no relevant clinical symptoms | 5 | 0 |
| Miscellaneous | 1 | 2 |
| Total | 12 | 6 |
AIS, acute ischemic stroke; FN, false negative; FP, false positive.