Michel Oleynik, Amila Kugic, Zdenko Kasáč, Markus Kreuzthaler.
Abstract
OBJECTIVE: Automated clinical phenotyping is challenging because word-based features quickly turn it into a high-dimensional problem, in which small, privacy-restricted training datasets might lead to overfitting. Pretrained embeddings might solve this issue by reusing input representation schemes trained on a larger dataset. We sought to evaluate shallow and deep learning text classifiers and the impact of pretrained embeddings in a small clinical dataset.
Keywords: data mining; deep learning; machine learning; natural language processing
Year: 2019 PMID: 31512729 PMCID: PMC6798565 DOI: 10.1093/jamia/ocz149
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Overview of the target classification criteria in the 2018 n2c2 shared task track 1
| Criterion | Balance | Description |
|---|---|---|
| Abdominal | Balanced | History of intra-abdominal surgery. |
| Advanced-cad | Balanced | Presence of advanced cardiovascular disease. |
| Alcohol-abuse | Imbalanced | Current weekly alcohol use over recommended limits. |
| Asp-for-mi | Semibalanced | Use of aspirin to prevent myocardial infarction. |
| Creatinine | Balanced | Serum creatinine above the normal limit. |
| Dietsupp-2mos | Balanced | Use of dietary supplements in the last two months. |
| Drug-abuse | Imbalanced | Drug abuse. |
| English | Imbalanced | The patient can speak English. |
| Hba1c | Balanced | Glycated hemoglobin levels between 6.5% and 9.5%. |
| Keto-1yr | Imbalanced | Ketoacidosis in the last year. |
| Major-diabetes | Balanced | Major complication due to diabetes. |
| Makes-decisions | Imbalanced | The patient can make decisions by himself. |
| Mi-6mos | Imbalanced | Myocardial infarction in the last six months. |
In balanced criteria, the minority class accounted for at least one-third of samples; in the semibalanced criterion Asp-for-mi, “met” appeared in around 20% of samples; and in imbalanced criteria, 1 class accounted for <10% of the training samples.
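The balance categories in the footnote above can be sketched as a small function. This is a minimal illustration, not the authors' code: the function name `balance_category` and the example counts are hypothetical, and treating every minority share between 10% and one-third as "Semibalanced" is an assumption (the source names only Asp-for-mi, at around 20%, in that category).

```python
def balance_category(n_met: int, n_not_met: int) -> str:
    """Label a criterion by the share of its minority class.

    Thresholds follow the table footnote: at least one-third is
    "Balanced", below 10% is "Imbalanced". Calling everything in
    between "Semibalanced" is an assumption made for this sketch.
    """
    total = n_met + n_not_met
    if total == 0:
        raise ValueError("at least one sample is required")
    minority_share = min(n_met, n_not_met) / total
    if minority_share >= 1 / 3:
        return "Balanced"
    if minority_share < 0.10:
        return "Imbalanced"
    return "Semibalanced"


# Hypothetical class counts, chosen only to exercise each branch.
print(balance_category(40, 60))  # Balanced
print(balance_category(20, 80))  # Semibalanced
print(balance_category(5, 95))   # Imbalanced
```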
Overview of the evaluated methods and their characteristics
| Acronym | Classification method | Word embeddings |
|---|---|---|
| Baseline | Majority | N/A |
| RBC | Rule-based classifier | N/A |
| SVM | Support vector machine | N/A |
| SELF-LR | Logistic regression | Self-trained |
| PRE-LR | Logistic regression | Pretrained |
| SELF-LSTM | Long short-term memory | Self-trained |
| PRE-LSTM | Long short-term memory | Pretrained |
Pretrained word embeddings were obtained from BioWordVec.
N/A: not applicable.
Overall F1 score per criterion on the test set of the evaluated strategies when compared with the baseline, a majority classifier
| Criterion | Baseline | RBC | SVM | SELF-LR | PRE-LR | SELF-LSTM | PRE-LSTM |
|---|---|---|---|---|---|---|---|
| Abdominal | 0.3944 | 0.8720 | 0.6028 | 0.5681 | 0.5959 | 0.4930 | 0.5146 |
| Advanced-cad | 0.3435 | 0.7902 | 0.7281 | 0.7109 | 0.6838 | 0.5865 | 0.4788 |
| Alcohol-abuse | 0.4911 | 0.4881 | 0.4911 | 0.4911 | 0.4911 | 0.4911 | 0.4881 |
| Asp-for-mi | 0.4416 | 0.7095 | 0.6063 | 0.5962 | 0.6060 | 0.4948 | 0.4416 |
| Creatinine | 0.4189 | 0.8071 | 0.6532 | 0.7180 | 0.7399 | 0.4788 | 0.5322 |
| Dietsupp-2mos | 0.3385 | 0.9185 | 0.5814 | 0.6150 | 0.6261 | 0.5903 | 0.4640 |
| Drug-abuse | 0.4911 | 0.6910 | 0.4911 | 0.4911 | 0.4881 | 0.4850 | 0.4881 |
| English | 0.4591 | 0.8644 | 0.4591 | 0.4591 | 0.4591 | 0.5253 | 0.5176 |
| Hba1c | 0.3723 | 0.9382 | 0.6267 | 0.5393 | 0.5770 | 0.4682 | 0.5137 |
| Keto-1yr | 0.5000 | 0.5000 | 0.5000 | 0.5000 | 0.5000 | 0.5000 | 0.5000 |
| Major-diabetes | 0.3333 | 0.8369 | 0.7555 | 0.7518 | 0.7420 | 0.4883 | 0.5435 |
| Makes-decisions | 0.4911 | 0.4911 | 0.4911 | 0.4911 | 0.4911 | 0.4911 | 0.4881 |
| Mi-6mos | 0.4756 | 0.8752 | 0.6815 | 0.4756 | 0.4756 | 0.4658 | 0.4691 |
| Overall (macro) | 0.4270 | 0.7525 | 0.5899 | 0.5698 | 0.5751 | 0.5045 | 0.4953 |
| Overall (micro) | 0.7608 | 0.9100 | 0.8035 | 0.8017 | 0.8063 | 0.7362 | 0.7377 |
Overall F1 score is the simple mean of the F1 scores for the classes “met” and “not met.”
PRE-LR: pretrained logistic regression; PRE-LSTM: pretrained long short-term memory; RBC: rule-based classifier; SELF-LR: self-trained logistic regression; SELF-LSTM: self-trained long short-term memory; SVM: support vector machine.
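The overall F1 definition in the footnote can be illustrated with a short sketch. This is not the shared task's evaluation script: the helper names are hypothetical, the 83/3 class split is only inferred from a baseline accuracy of 0.9651 for illustration, and scoring a class with no gold and no predicted instances as F1 = 0 is an assumption (it is consistent with the rows reporting 0.5000 throughout). The last lines check that the "Overall (macro)" baseline agrees with a plain mean of the per-criterion baseline values in the table.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """F1 of one class; returns 0.0 when the class never occurs (assumption)."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def overall_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Simple mean of the F1 scores for "met" and "not met"."""
    return (f1_for_class(y_true, y_pred, "met")
            + f1_for_class(y_true, y_pred, "not met")) / 2


# Illustrative data: a majority classifier on an imbalanced criterion.
gold = ["not met"] * 83 + ["met"] * 3
pred = ["not met"] * 86
print(round(overall_f1(gold, pred), 4))  # 0.4911

# Baseline column of the table, one value per criterion, in row order.
baseline_f1 = [0.3944, 0.3435, 0.4911, 0.4416, 0.4189, 0.3385, 0.4911,
               0.4591, 0.3723, 0.5000, 0.3333, 0.4911, 0.4756]
macro = sum(baseline_f1) / len(baseline_f1)
print(round(macro, 4))  # 0.427, i.e. the 0.4270 reported as Overall (macro)
```

A majority classifier never predicts the minority class, so that class contributes an F1 of 0 and the overall score stays below 0.5 even when accuracy is high.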
Overall accuracy per criterion on the test set of the evaluated strategies when compared with the baseline, a majority classifier
| Criterion | Baseline | RBC | SVM | SELF-LR | PRE-LR | SELF-LSTM | PRE-LSTM |
|---|---|---|---|---|---|---|---|
| Abdominal | 0.6512 | 0.8837 | 0.6512 | 0.6279 | 0.6628 | 0.5233 | 0.6047 |
| Advanced-cad | 0.5233 | 0.7907 | 0.7326 | 0.7209 | 0.6977 | 0.5465 | 0.5465 |
| Alcohol-abuse | 0.9651 | 0.9535 | 0.9651 | 0.9651 | 0.9651 | 0.9535 | 0.9651 |
| Asp-for-mi | 0.7907 | 0.8605 | 0.7558 | 0.7674 | 0.7791 | 0.7442 | 0.7791 |
| Creatinine | 0.7209 | 0.8372 | 0.7209 | 0.7674 | 0.7907 | 0.5698 | 0.6395 |
| Dietsupp-2mos | 0.5116 | 0.9186 | 0.5814 | 0.6163 | 0.6279 | 0.6047 | 0.4651 |
| Drug-abuse | 0.9651 | 0.9651 | 0.9651 | 0.9651 | 0.9535 | 0.9651 | 0.9651 |
| English | 0.8488 | 0.9419 | 0.8488 | 0.8488 | 0.8488 | 0.8372 | 0.8488 |
| Hba1c | 0.5930 | 0.9419 | 0.6512 | 0.5814 | 0.6047 | 0.6047 | 0.5465 |
| Keto-1yr | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Major-diabetes | 0.5000 | 0.8372 | 0.7558 | 0.7558 | 0.7442 | 0.5349 | 0.5465 |
| Makes-decisions | 0.9651 | 0.9651 | 0.9651 | 0.9651 | 0.9651 | 0.9651 | 0.9651 |
| Mi-6mos | 0.9070 | 0.9651 | 0.9302 | 0.9070 | 0.9070 | 0.9070 | 0.9070 |
| Overall | 0.7648 | 0.9123 | 0.8095 | 0.8068 | 0.8113 | 0.7504 | 0.7522 |
Overall accuracy is calculated with “met” and “not met” being considered as positive and negative outcomes, respectively.
PRE-LR: pretrained logistic regression; PRE-LSTM: pretrained long short-term memory; RBC: rule-based classifier; SELF-LR: self-trained logistic regression; SELF-LSTM: self-trained long short-term memory; SVM: support vector machine.
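As a rough sketch of the accuracy calculation, the snippet below computes the share of correct "met"/"not met" decisions and then checks that the "Overall" baseline value matches a plain mean of the per-criterion baseline accuracies in the table. The helper name `accuracy` is hypothetical, and the 83/3 class split is only inferred from the reported 0.9651 for illustration; none of this is the authors' evaluation code.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Share of samples whose predicted label equals the gold label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative data: a majority classifier that always answers "not met".
gold = ["not met"] * 83 + ["met"] * 3
pred = ["not met"] * 86
print(round(accuracy(gold, pred), 4))  # 0.9651

# Baseline column of the table, one value per criterion, in row order.
baseline_accuracy = [0.6512, 0.5233, 0.9651, 0.7907, 0.7209, 0.5116, 0.9651,
                     0.8488, 0.5930, 1.0000, 0.5000, 0.9651, 0.9070]
overall = sum(baseline_accuracy) / len(baseline_accuracy)
print(round(overall, 4))  # 0.7648
```

The same predictions that score 0.9651 in accuracy reach only 0.4911 in overall F1, which is why the two tables rank the majority baseline so differently on imbalanced criteria.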