Benjamin C Knoll, Elizabeth A Lindemann, Arian L Albert, Genevieve B Melton, Serguei V S Pakhomov.
Abstract
Although a number of foundational natural language processing (NLP) tasks, such as text segmentation, are considered simple problems in the general English domain, which is dominated by well-formed text, the complexities of clinical documentation lead to poor performance of existing solutions designed for that domain. We present an alternative solution that relies on a convolutional neural network layer followed by a bidirectional long short-term memory layer (CNN-Bi-LSTM) for the task of sentence boundary disambiguation, and we describe an ensemble approach for domain adaptation using two training corpora. Implementations using the Keras neural-networks API are available at https://github.com/NLPIE/clinical-sentences.
Keywords: Machine Learning; Natural Language Processing; Neural Networks (Computer)
Year: 2019 PMID: 31437913 PMCID: PMC7360019 DOI: 10.3233/SHTI190211
Source DB: PubMed Journal: Stud Health Technol Inform ISSN: 0926-9630
Examples of sentences without termination
| Text |
|---|
| RECOMMENDATIONS FOR MDs/PROVIDERS TO ORDER: |
| Recommendations already ordered by Registered Dietitian (RD): Calorie counts reordered |
| Diet: dysphagia diet level 2 mechanical, thin liquids, magic cup between meals, Nepro between meals |
| Pt reported his appetite is getting better, he likes the supplements |
| (+) No chance of pregnancy C-spine cleared: N/A, no H/O Chronic pain,no other significant disability |
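
The examples above end without any terminal symbol, which is why a splitter that keys on punctuation cannot separate them. The following is a minimal sketch of that failure and of how token-level tags can still mark the boundaries. It assumes the usual reading of the B/I tags reported in the tables on this page (B marks the first token of a sentence, I marks the remaining tokens); the helper names `split_on_punctuation` and `bio_tags`, the whitespace tokenization, and the use of line breaks as the reference segmentation are illustrative assumptions, not the authors' implementation.

```python
import re

# Two lines taken from the examples table; neither ends with terminal punctuation.
note = (
    "Pt reported his appetite is getting better, he likes the supplements\n"
    "Diet: dysphagia diet level 2 mechanical, thin liquids"
)


def split_on_punctuation(text):
    """Naive baseline: split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def bio_tags(sentences):
    """Tag the first token of each sentence 'B' and every other token 'I'."""
    tagged = []
    for sentence in sentences:
        tokens = sentence.split()  # whitespace tokenization (assumption)
        tagged.extend(
            (tok, "B" if i == 0 else "I") for i, tok in enumerate(tokens)
        )
    return tagged


# The baseline finds no boundary, so both sentences stay merged.
naive = split_on_punctuation(note)

# Reference tags, using the line break as the sentence boundary (assumption).
gold = bio_tags(note.split("\n"))

print(len(naive), "segment(s) from the punctuation baseline")
print(sum(1 for _, tag in gold if tag == "B"), "sentence starts in the tag sequence")
```

Framed this way, a tagger only has to predict B at the first token of each sentence, so boundaries are recoverable whether or not a terminal symbol is present.
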
Inpatient note types per FV batch
| Note Type | Number |
|---|---|
| Progress Note | 3 |
| Plan of Care | 3 |
| ED Notes | 2 |
| 8 other note types | 1 each |
Outpatient note departments per FV batch
| Department | Number |
|---|---|
| Family Medicine | 5 |
| Internal Medicine | 4 |
| Pediatrics | 3 |
| Obstetrics and Gynecology | 3 |
| Hematology and Oncology | 3 |
| Urgent Care | 3 |
| Physical Therapy | 3 |
| Cardiovascular Disease | 3 |
| 13 other departments | 1 each |
Figure 1 – Word Representation Layer
Figure 2 – Bi-directional LSTM
Figure 3 – Complete Model Graph
Figure 4 – Ensemble of Two Networks
Figure 5 – Weighting of classes
Distribution of Tags in MIMIC
| Tag | Count | Percentage |
|---|---|---|
| B | 23,648 | 7.5% |
| I | 200,272 | 63.4% |
| O | 91,877 | 29.1% |
Distribution of Tags in FV
| Tag | Count | Percentage |
|---|---|---|
| B | 43,636 | 10.5% |
| I | 336,018 | 80.9% |
| O | 35,458 | 8.5% |
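
The percentages in the two tag tables follow directly from the counts, and the imbalance they show (B is far rarer than I in both corpora) is the usual motivation for weighting classes during training. The sketch below recomputes the percentages and derives one common set of weights. The inverse-frequency formula used here, total / (number of classes × count), is the heuristic scikit-learn calls "balanced"; it is an assumption for illustration and may differ from the weighting shown in Figure 5.

```python
# Tag counts copied from the MIMIC and FV tables.
mimic = {"B": 23648, "I": 200272, "O": 91877}
fv = {"B": 43636, "I": 336018, "O": 35458}


def proportions(counts):
    """Share of each tag among all tagged tokens."""
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def inverse_frequency_weights(counts):
    """total / (n_classes * count): rarer tags receive larger weights.

    Illustrative assumption, not necessarily the scheme in Figure 5.
    """
    total = sum(counts.values())
    return {tag: total / (len(counts) * n) for tag, n in counts.items()}


for name, counts in (("MIMIC", mimic), ("FV", fv)):
    shares = proportions(counts)
    weights = inverse_frequency_weights(counts)
    for tag in counts:
        print(f"{name:5s} {tag}: {shares[tag]:6.1%}  weight={weights[tag]:.2f}")
```

A dictionary of this shape, keyed by integer class index rather than tag letter, is the form that a per-class weight argument typically expects.
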
Sentence Termination Type
| Type | MIMIC | FV |
|---|---|---|
| Period | 12,698 (53.7%) | 13,619 (31.2%) |
| Exclamation Point | 4 | 19 |
| Question Mark | 24 (0.1%) | 261 (0.6%) |
| Semi-colon | 48 (0.2%) | 10 |
| Colon | 4,855 (20.5%) | 6,180 (14.2%) |
| Quotation | 4 | 58 (0.1%) |
| No symbol | 6,018 (25.4%) | 23,506 (53.8%) |
‘B’ Tag Accuracy Against FV Hold-out
| Method | Precision | Recall | F1 |
|---|---|---|---|
| LR-MIMIC | 0.511 | 0.840 | 0.636 |
| LR-FV | 0.650 | 0.948 | 0.771 |
| MIMIC | 0.829 | 0.971 | 0.895 |
| FV | 0.923 | 0.991 | 0.956 |
| MIMIC+FV | 0.919 | 0.956 | |
| MIMIC then FV | 0.910 | 0.992 | 0.949 |
| Ensemble | 0.989 |
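
F1 is the harmonic mean of precision and recall, so the rows that list all three values can be checked against one another. The sketch below does that for the complete rows only; rows with missing cells are left out rather than guessed. The function name `f1_score` and the tolerance are illustrative choices. Small differences in the third decimal are expected because the published precision and recall are themselves rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for the rows with all three values present.
rows = {
    "LR-MIMIC": (0.511, 0.840, 0.636),
    "LR-FV": (0.650, 0.948, 0.771),
    "MIMIC": (0.829, 0.971, 0.895),
    "FV": (0.923, 0.991, 0.956),
    "MIMIC then FV": (0.910, 0.992, 0.949),
}

TOLERANCE = 0.0015  # allows for rounding of the published inputs

for method, (p, r, reported) in rows.items():
    computed = f1_score(p, r)
    status = "ok" if abs(computed - reported) <= TOLERANCE else "mismatch"
    print(f"{method:14s} computed={computed:.3f} reported={reported:.3f} {status}")
```
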