Liang Yao, Chengsheng Mao, Yuan Luo.
Abstract
BACKGROUND: Clinical text classification is a fundamental problem in medical natural language processing. Existing studies have conventionally focused on feature engineering based on rules or knowledge sources, but only a limited number of studies have exploited the effective representation learning capability of deep learning methods.
Keywords: Clinical text classification; Convolutional neural networks; Entity embeddings; Obesity challenge; Word embeddings
Year: 2019 PMID: 30943960 PMCID: PMC6448186 DOI: 10.1186/s12911-019-0781-4
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
The class distribution in the obesity challenge datasets
| Label | Training Set: Textual | Training Set: Intuitive | Test Set: Textual | Test Set: Intuitive |
|---|---|---|---|---|
| Y | 3208 | 3267 | 2192 | 2285 |
| N | 87 | 7362 | 65 | 5100 |
| Q | 39 | 26 | 17 | 14 |
| U | 8296 | 0 | 5770 | 0 |
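The imbalance is easier to judge as proportions. The short Python sketch below converts the training counts (transcribed from the table above) into per-label shares; the function name `class_shares` is illustrative only.

```python
# Training-set label counts transcribed from the class-distribution table above.
TRAIN_COUNTS = {
    "textual": {"Y": 3208, "N": 87, "Q": 39, "U": 8296},
    "intuitive": {"Y": 3267, "N": 7362, "Q": 26, "U": 0},
}


def class_shares(counts):
    """Return each label's fraction of all labeled examples, rounded to 4 places."""
    total = sum(counts.values())
    return {label: round(n / total, 4) for label, n in counts.items()}


for judgment, counts in TRAIN_COUNTS.items():
    print(judgment, class_shares(counts))
# textual {'Y': 0.2758, 'N': 0.0075, 'Q': 0.0034, 'U': 0.7133}
# intuitive {'Y': 0.3066, 'N': 0.6909, 'Q': 0.0024, 'U': 0.0}
```

U accounts for roughly 71% of textual training labels, while Q stays below 0.5% under both judgments. This is one plausible reason that macro-averaged F1, which weights every class equally, falls well below micro-averaged F1 in the result tables.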
Fig. 1 The test phase of our method
The types of CUIs we used
| TUI | Semantic type description |
|---|---|
| T023 | Body Part, Organ, or Organ Component |
| T033 | Finding |
| T034 | Laboratory or Test Result |
| T047 | Disease or Syndrome |
| T048 | Mental or Behavioral Dysfunction |
| T049 | Cell or Molecular Dysfunction |
| T059 | Laboratory Procedure |
| T060 | Diagnostic Procedure |
| T061 | Therapeutic or Preventive Procedure |
| T121 | Pharmacologic Substance |
| T122 | Biomedical or Dental Material |
| T123 | Biologically Active Substance |
| T184 | Sign or Symptom |
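One way to apply this semantic-type restriction is sketched below in Python. It assumes an upstream concept extractor (not shown) that yields `(cui, tui)` pairs; `filter_concepts` and the placeholder CUI strings are hypothetical, not the authors' code.

```python
# UMLS semantic type identifiers (TUIs) listed in the table above.
ALLOWED_TUIS = frozenset({
    "T023", "T033", "T034", "T047", "T048", "T049", "T059",
    "T060", "T061", "T121", "T122", "T123", "T184",
})


def filter_concepts(concepts):
    """Keep only (cui, tui) pairs whose semantic type is in ALLOWED_TUIS.

    `concepts` is assumed to be an iterable of (cui, tui) tuples produced by
    some upstream concept extractor, which this sketch does not implement.
    """
    return [(cui, tui) for cui, tui in concepts if tui in ALLOWED_TUIS]


# Hypothetical extractor output; the CUI strings are placeholders.
extracted = [("CUI_A", "T047"), ("CUI_B", "T081"), ("CUI_C", "T121")]
print(filter_concepts(extracted))  # [('CUI_A', 'T047'), ('CUI_C', 'T121')]
```

A set lookup keeps the filter O(1) per concept, so it adds negligible cost even for long notes.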
Fig. 2 Our knowledge-guided convolutional neural network architecture
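As a generic illustration of the building block that CNN text classifiers rely on, the pure-Python sketch below slides one convolutional filter over a sequence of token vectors, applies ReLU, and keeps the maximum activation (max-over-time pooling). It is not the authors' architecture: the filter width, the ReLU nonlinearity, and the function name `conv1d_max_pool` are assumptions for illustration.

```python
from typing import List

Vector = List[float]


def conv1d_max_pool(embeddings: List[Vector], filt: List[Vector], bias: float = 0.0) -> float:
    """Apply one filter across a token sequence and return its max activation.

    embeddings: n token vectors, each of dimension d.
    filt: a window of h weight vectors, each of dimension d.
    ReLU is used as the nonlinearity (an assumption for this sketch).
    """
    h = len(filt)
    if len(embeddings) < h:
        raise ValueError("sequence is shorter than the filter window")
    activations = []
    for start in range(len(embeddings) - h + 1):
        window = embeddings[start:start + h]
        # Elementwise product of the h x d filter and the h x d window, summed, plus bias.
        score = bias + sum(
            w * x
            for w_row, x_row in zip(filt, window)
            for w, x in zip(w_row, x_row)
        )
        activations.append(max(0.0, score))  # ReLU
    return max(activations)  # max-over-time pooling


# Toy example: 3 tokens with 2-dimensional embeddings and a width-2 filter.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filt = [[1.0, 1.0], [0.5, -1.0]]
print(conv1d_max_pool(embeddings, filt))  # 0.5
```

In a full model, many such filters (often of several widths) produce a feature vector that feeds a classification layer. Per the keywords, the token vectors here would come from word embeddings and entity embeddings; how the two are combined is not stated in this excerpt.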
Macro F1 scores and Micro F1 scores of Solt’s system [5] (paper) and our method with word and entity embeddings
| Disease | Solt’s paper: Textual Macro | Solt’s paper: Textual Micro | Solt’s paper: Intuitive Macro | Solt’s paper: Intuitive Micro | Ours (word + entity): Textual Macro | Ours (word + entity): Textual Micro | Ours (word + entity): Intuitive Macro | Ours (word + entity): Intuitive Micro |
|---|---|---|---|---|---|---|---|---|
| Asthma | 0.9434 | 0.9921 | 0.9784 | 0.9894 | 0.9434 | 0.9921 | 0.9784 | 0.9894 |
| CAD | 0.8561 | 0.9256 | 0.6122 | 0.9192 | 0.8551 | 0.9235 | | |
| CHF | 0.7939 | 0.9355 | 0.6236 | 0.9315 | 0.7939 | 0.9355 | 0.6236 | 0.9315 |
| Depression | 0.9716 | 0.9842 | 0.9346 | 0.9539 | 0.9716 | 0.9842 | | |
| DM | 0.9032 | 0.9761 | 0.9682 | 0.9729 | 0.9056 | 0.9801 | 0.9731 | 0.9770 |
| Gallstones | 0.8141 | 0.9822 | 0.9729 | 0.9857 | 0.8141 | 0.9822 | 0.9689 | 0.9837 |
| GERD | 0.4880 | 0.9881 | 0.5768 | 0.9131 | 0.4880 | 0.9881 | 0.5768 | 0.9131 |
| Gout | 0.9733 | 0.9881 | 0.9771 | 0.9900 | 0.9733 | 0.9881 | 0.9771 | 0.9900 |
| Hypercholesterolemia | 0.7922 | 0.9721 | 0.9053 | 0.9072 | 0.7922 | 0.9721 | | 0.9118 |
| Hypertension | 0.8378 | 0.9621 | 0.8851 | 0.9283 | 0.8378 | 0.9621 | | |
| Hypertriglyceridemia | 0.9732 | 0.9980 | 0.7981 | 0.9712 | 0.9434 | 0.9961 | 0.7092 | 0.9630 |
| OA | 0.9594 | 0.9761 | 0.6286 | 0.9589 | 0.9626 | 0.9781 | 0.6307 | 0.9610 |
| Obesity | 0.4879 | 0.9675 | 0.9724 | 0.9732 | 0.4885 | 0.9696 | 0.9747 | 0.9754 |
| OSA | 0.8781 | 0.9920 | 0.8805 | 0.9939 | 0.8781 | 0.9920 | 0.8805 | 0.9939 |
| PVD | 0.9682 | 0.9862 | 0.6348 | 0.9763 | 0.9682 | 0.9862 | 0.6314 | 0.9742 |
| Venous insufficiency | 0.8403 | 0.9822 | 0.8083 | 0.9625 | | | 0.8083 | 0.9625 |
| Overall | 0.8000 | 0.9756 | 0.6745 | 0.9590 | | | | |
Scores in bold are higher than the corresponding scores of both the paper and the Perl implementation.
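The large gap between macro and micro F1 in these tables (for example, GERD textual: 0.4880 macro vs. 0.9881 micro) follows from how the two averages are defined. The minimal Python sketch below uses the standard definitions; the exact conventions of the obesity challenge's evaluation script (such as which classes enter the macro average) are an assumption here, and `f1_scores` is an illustrative name.

```python
from collections import Counter
from typing import Dict, Sequence


def f1_scores(gold: Sequence[str], pred: Sequence[str], labels: Sequence[str]) -> Dict[str, float]:
    """Macro- and micro-averaged F1 over `labels` for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, which was wrong
            fn[g] += 1  # missed the true label g

    def f1(t: int, f_pos: int, f_neg: int) -> float:
        # F1 = 2TP / (2TP + FP + FN); taken as 0.0 when the class never occurs.
        denom = 2 * t + f_pos + f_neg
        return 2 * t / denom if denom else 0.0

    # Macro: average the per-class F1 values, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool counts across classes first, so frequent classes dominate.
    micro = f1(sum(tp[c] for c in labels), sum(fp[c] for c in labels), sum(fn[c] for c in labels))
    return {"macro": macro, "micro": micro}


# A rare class (N) that is never predicted drags macro F1 down far more than micro F1.
gold = ["Y", "Y", "U", "U", "U", "U", "N"]
pred = ["Y", "U", "U", "U", "U", "U", "U"]
scores = f1_scores(gold, pred, labels=["Y", "N", "U"])
print({k: round(v, 4) for k, v in scores.items()})  # {'macro': 0.4889, 'micro': 0.7143}
```

For single-label multi-class data, micro F1 computed over all classes equals accuracy (here 5/7).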
Macro F1 scores and Micro F1 scores of Solt’s system [5] (code) and our method with word embeddings only
| Disease | Solt’s code: Textual Macro | Solt’s code: Textual Micro | Solt’s code: Intuitive Macro | Solt’s code: Intuitive Micro | Ours (word only): Textual Macro | Ours (word only): Textual Micro | Ours (word only): Intuitive Macro | Ours (word only): Intuitive Micro |
|---|---|---|---|---|---|---|---|---|
| Asthma | 0.9434 | 0.9921 | 0.9784 | 0.9894 | 0.9434 | 0.9921 | 0.9784 | 0.9894 |
| CAD | 0.8551 | 0.9235 | 0.6122 | 0.9192 | 0.8551 | 0.9235 | 0.6122 | 0.9192 |
| CHF | 0.7939 | 0.9355 | 0.6236 | 0.9315 | 0.7939 | 0.9355 | 0.6236 | 0.9315 |
| Depression | 0.9716 | 0.9842 | 0.9346 | 0.9539 | 0.9716 | 0.9842 | | |
| DM | 0.9056 | 0.9801 | 0.9731 | 0.9770 | 0.9056 | 0.9801 | 0.9731 | 0.9770 |
| Gallstones | 0.8141 | 0.9822 | 0.9729 | 0.9857 | 0.8141 | 0.9822 | 0.9729 | 0.9857 |
| GERD | 0.4880 | 0.9881 | 0.5768 | 0.9131 | 0.4880 | 0.9881 | 0.5768 | 0.9131 |
| Gout | 0.9733 | 0.9881 | 0.9771 | 0.9900 | 0.9733 | 0.9881 | 0.9771 | 0.9900 |
| Hypercholesterolemia | 0.7922 | 0.9721 | 0.9101 | 0.9118 | 0.7922 | 0.9721 | 0.9042 | 0.9049 |
| Hypertension | 0.8378 | 0.9621 | 0.8861 | 0.9283 | 0.8378 | 0.9621 | | |
| Hypertriglyceridemia | 0.9732 | 0.9980 | 0.7092 | 0.9630 | 0.9732 | 0.9980 | 0.7092 | 0.9630 |
| OA | 0.9626 | 0.9781 | 0.6307 | 0.9610 | 0.9626 | 0.9781 | 0.6307 | 0.9610 |
| Obesity | 0.4885 | 0.9696 | 0.9747 | 0.9754 | 0.4885 | 0.9696 | 0.9747 | 0.9754 |
| OSA | 0.8781 | 0.9920 | 0.8805 | 0.9939 | 0.8781 | 0.9920 | 0.8805 | 0.9939 |
| PVD | 0.9682 | 0.9862 | 0.6314 | 0.9742 | 0.9682 | 0.9862 | 0.6314 | 0.9742 |
| Venous insufficiency | 0.8403 | 0.9822 | 0.8083 | 0.9625 | 0.8403 | 0.9822 | 0.8083 | 0.9625 |
| Overall | 0.8014 | 0.9760 | 0.6745 | 0.9592 | 0.8014 | 0.9760 | | |
Scores in bold are higher than the corresponding scores of both the paper and the Perl implementation.
Macro F1 scores and Micro F1 scores of Logistic Regression and SVM
| Disease | Logistic Regression: Textual Macro | Logistic Regression: Textual Micro | Logistic Regression: Intuitive Macro | Logistic Regression: Intuitive Micro | SVM: Textual Macro | SVM: Textual Micro | SVM: Intuitive Macro | SVM: Intuitive Micro |
|---|---|---|---|---|---|---|---|---|
| Asthma | 0.9434 | 0.9921 | 0.9784 | 0.9894 | 0.9434 | 0.9921 | 0.9784 | 0.9894 |
| CAD | 0.8551 | 0.9235 | 0.6204 | 0.9301 | 0.8551 | 0.9235 | 0.6122 | 0.9192 |
| CHF | 0.7939 | 0.9355 | 0.6236 | 0.9315 | 0.7939 | 0.9355 | 0.6236 | 0.9315 |
| Depression | 0.9716 | 0.9842 | 0.9573 | 0.9706 | 0.9716 | 0.9842 | 0.9573 | 0.9706 |
| DM | 0.9056 | 0.9801 | 0.9731 | 0.9770 | 0.9056 | 0.9801 | 0.9731 | 0.9770 |
| Gallstones | 0.8141 | 0.9822 | 0.9729 | 0.9857 | 0.8141 | 0.9822 | 0.9729 | 0.9857 |
| GERD | 0.4880 | 0.9881 | 0.5768 | 0.9131 | 0.4880 | 0.9881 | 0.5768 | 0.9131 |
| Gout | 0.9733 | 0.9881 | 0.9771 | 0.9900 | 0.9733 | 0.9881 | 0.9771 | 0.9900 |
| Hypercholesterolemia | 0.7922 | 0.9721 | 0.9043 | 0.9049 | 0.7922 | 0.9721 | 0.9134 | 0.9142 |
| Hypertension | 0.8378 | 0.9621 | 0.9271 | 0.9507 | 0.8378 | 0.9621 | 0.9271 | 0.9507 |
| Hypertriglyceridemia | 0.9732 | 0.9980 | 0.7092 | 0.9630 | 0.9732 | 0.9980 | 0.7092 | 0.9630 |
| OA | 0.9626 | 0.9781 | 0.6307 | 0.9610 | 0.9626 | 0.9781 | 0.6307 | 0.9610 |
| Obesity | 0.4885 | 0.9696 | 0.9747 | 0.9754 | 0.4885 | 0.9696 | 0.9747 | 0.9754 |
| OSA | 0.8781 | 0.9920 | 0.8805 | 0.9939 | 0.8781 | 0.9920 | 0.8805 | 0.9939 |
| PVD | 0.9682 | 0.9862 | 0.6314 | 0.9742 | 0.9682 | 0.9862 | 0.6314 | 0.9742 |
| Venous insufficiency | 0.8403 | 0.9822 | 0.8083 | 0.9625 | 0.8403 | 0.9822 | 0.8083 | 0.9625 |
| Overall | 0.8014 | 0.9760 | 0.6764 | 0.9619 | 0.8014 | 0.9760 | 0.6764 | 0.9618 |
Classes with very few examples are labeled by Solt’s system
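The note above leaves the routing rule implicit. One possible reading is sketched in Python below: a class counts as rare if it has fewer than a hypothetical threshold of training examples, and whenever Solt's rule-based system predicts such a class, its label replaces the classifier's. The threshold value and the helpers `rare_labels` and `combine` are illustrative assumptions, not the authors' procedure.

```python
from typing import Dict, FrozenSet


def rare_labels(counts: Dict[str, int], threshold: int = 100) -> FrozenSet[str]:
    """Labels seen in training but fewer than `threshold` times (threshold is hypothetical)."""
    return frozenset(label for label, n in counts.items() if 0 < n < threshold)


def combine(rule_label: str, model_label: str, rare: FrozenSet[str]) -> str:
    """Use the rule-based label when it names a rare class; otherwise keep the model's label."""
    return rule_label if rule_label in rare else model_label


# Textual training counts from the class-distribution table.
rare = rare_labels({"Y": 3208, "N": 87, "Q": 39, "U": 8296})
print(sorted(rare))             # ['N', 'Q']
print(combine("Q", "U", rare))  # Q
print(combine("Y", "U", rare))  # U
```

Excluding zero counts keeps labels that never occur under a judgment (such as U for intuitive judgments) out of the rare set.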