Jordan Jouffroy, Sarah F Feldman, Ivan Lerner, Bastien Rance, Anita Burgun, Antoine Neuraz.
Abstract
BACKGROUND: Information related to patient medication is crucial for health care; however, up to 80% of the information resides solely in unstructured text. Manual extraction is difficult and time-consuming, and little research has addressed natural language processing methods for extracting medical information from unstructured text in French corpora.
Keywords: deep learning; electronic health records; hybrid system; medication information; natural language processing; rule-based system; recurrent neural network
Year: 2021 PMID: 33724196 PMCID: PMC8077811 DOI: 10.2196/17934
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. General architecture of the model. BiLSTM: bidirectional long short-term memory; CRF: conditional random field.
Figure 2. Annotation process with automatic annotation and completion with manual annotation.
Description of the task.
| Type | Description | Examples |
| Medication name | Descriptions that denote any medication, active molecule, association or protocol | doliprane, paracetamol, augmentin |
| Medication class | Descriptions that denote any Anatomical Therapeutic Chemical class or common therapy | β-lactam, antibiotherapy |
| Dosage | Dose or concentration of medication in prescription | 3 mg, 2 tablets |
| Frequency | Frequency of medication administration | 3 per day, every morning |
| Duration | Time range for the administration | 3 weeks, until the surgery |
| Route | Medication administration mode | intravenous, per os |
| Condition | The event which provokes the administration | if pain, if infection |
Number of slots and tokens for each class per data set.
| Label | Train tokens | Train slots | Development tokens | Development slots | Test tokens | Test slots |
| Medication name | 1385 | 1227 | 146 | 143 | 450 | 398 |
| Medication class | 309 | 228 | 38 | 30 | 97 | 76 |
| Dosage | 1366 | 761 | 115 | 62 | 606 | 311 |
| Frequency | 1604 | 600 | 142 | 46 | 468 | 184 |
| Duration | 161 | 70 | 26 | 13 | 68 | 37 |
| Route | 95 | 85 | 8 | 8 | 69 | 55 |
| Condition | 192 | 61 | 9 | 3 | 89 | 28 |
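The distinction between tokens and slots in the table above can be illustrated with a minimal sketch. It assumes that a slot is a maximal run of consecutive tokens carrying the same label and that unlabeled tokens are marked `O`; neither the tagging scheme nor the helper name `count_tokens_and_slots` comes from the source, and the example sentence is assembled from the examples in the task description table.

```python
from itertools import groupby


def count_tokens_and_slots(labels):
    """Return {label: (token_count, slot_count)} for a sequence of token labels.

    Assumption: a slot is a maximal run of identical labels, and "O" marks
    tokens outside any slot. Two adjacent slots of the same class would be
    merged here; a BIO-style scheme would be needed to tell them apart.
    """
    counts = {}
    for label, run in groupby(labels):
        if label == "O":
            continue
        run_length = len(list(run))
        tokens, slots = counts.get(label, (0, 0))
        counts[label] = (tokens + run_length, slots + 1)
    return counts


# Hypothetical prescription fragment built from the examples in the task table.
tokens = ["paracetamol", "3", "mg", "3", "per", "day", "if", "pain"]
labels = [
    "Medication name",
    "Dosage", "Dosage",
    "Frequency", "Frequency", "Frequency",
    "Condition", "Condition",
]

counts = count_tokens_and_slots(labels)
for label, (n_tokens, n_slots) in counts.items():
    print(f"{label}: {n_tokens} tokens, {n_slots} slots")
```

Under this reading, classes whose token count far exceeds their slot count (Frequency, Condition) are those whose mentions tend to span several tokens.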
Overall prediction metrics for medication component information, by model.
| Model^a | F-measure (%) | Precision (%) | Recall (%) | Slot error rate | Insertion error rate | Deletion error rate | Type error rate | Frontier error rate |
| RBS^b | 79.41 | 94.67 | 72.28 | 0.29 | 0.03 | 0.23 | 0.02 | 0.04 |
| BiLSTM^c | 73.93 | 83.89 | 67.57 | 0.45 | 0.09 | 0.25 | 0.07 | 0.15 |
| BiLSTM + FT^d | 88.08 | 89.48 | 87.17 | 0.21 | 0.07 | 0.08 | 0.03 | 0.09 |
| BiLSTM + ELMo^e | 88.03 | 88.81 | 87.38 | 0.24 | 0.1 | 0.08 | 0.03 | 0.1 |
| BiLSTM + RBS | 83.74 | 88.46 | 80.24 | 0.27 | 0.08 | 0.13 | 0.03 | 0.09 |
| BiLSTM + FT + RBS | 88.18 | 91.73 | 85.54 | 0.21 | 0.07 | 0.09 | 0.01 | 0.07 |
| BiLSTM + ELMo + RBS | 89.86 | 90.83 | 89.17 | 0.19 | 0.09 | 0.05 | 0.03 | 0.08 |
| BiLSTM-CRF^f | 70.12 | 79.04 | 65.57 | 0.53 | 0.11 | 0.26 | 0.11 | 0.21 |
| BiLSTM-CRF + FT | 87.16 | 88.58 | 86.41 | 0.25 | 0.09 | 0.08 | 0.03 | 0.12 |
| BiLSTM-CRF + ELMo | 88.66 | 87.95 | 89.44 | 0.23 | 0.11 | 0.06 | 0.02 | 0.11 |
| BiLSTM-CRF + RBS | 84.16 | 88.56 | 80.73 | 0.27 | 0.09 | 0.13 | 0.03 | 0.09 |
| BiLSTM-CRF + FT + RBS | 87.74 | 89.72 | 86.25 | 0.22 | 0.08 | 0.08 | 0.02 | 0.09 |
| BiLSTM-CRF + ELMo + RBS | 89.3 | 90.4 | 88.31 | 0.20 | 0.08 | 0.06 | 0.02 | 0.09 |
^a Models are described according to their components; if neither ELMo nor FT is mentioned, then we used skip-gram embedding.
^b RBS: rule-based system (ie, the outputs are added as extra features to the input of the deep learning module).
^c BiLSTM: bidirectional long short-term memory.
^d FT: FastText embedding.
^e ELMo: embeddings from language models.
^f CRF: conditional random field.
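The relationship between the slot error rate and its four components is not spelled out in this section. The sketch below checks one plausible reading against the published values: insertions and deletions count as full errors, while type and frontier errors count as half errors. That weighting is an assumption inferred from the numbers, not a formula stated by the source, and `slot_error_rate` is a hypothetical helper name.

```python
# (insertion, deletion, type, frontier, reported slot error rate),
# copied from four rows of the table above.
ROWS = {
    "RBS": (0.03, 0.23, 0.02, 0.04, 0.29),
    "BiLSTM": (0.09, 0.25, 0.07, 0.15, 0.45),
    "BiLSTM + ELMo + RBS": (0.09, 0.05, 0.03, 0.08, 0.19),
    "BiLSTM-CRF": (0.11, 0.26, 0.11, 0.21, 0.53),
}


def slot_error_rate(insertion, deletion, type_error, frontier_error, half_weight=0.5):
    """Assumed formula: full weight for insertions and deletions,
    half weight for type and frontier errors."""
    return insertion + deletion + half_weight * (type_error + frontier_error)


computed = {
    name: slot_error_rate(ins, dele, typ, fro)
    for name, (ins, dele, typ, fro, _) in ROWS.items()
}

for name, (*_, reported) in ROWS.items():
    print(f"{name}: computed {computed[name]:.3f}, reported {reported:.2f}")
```

Because the component rates are themselves rounded to two decimals, agreement can only be expected to within about 0.01.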
Prediction metrics for medication information, by label and model.
| Label | RBS^a: F-measure (%) | RBS: Precision (%) | RBS: Recall (%) | BiLSTM^b + ELMo^c: F-measure (%) | BiLSTM + ELMo: Precision (%) | BiLSTM + ELMo: Recall (%) | BiLSTM + ELMo + RBS: F-measure (%) | BiLSTM + ELMo + RBS: Precision (%) | BiLSTM + ELMo + RBS: Recall (%) |
| Medication name | 90.31 | 96.46 | 84.89 | 92.2 | 93.79 | 90.67 | 95.33 | 95.33 | 95.33 |
| Medication class | 13.33 | 87.5 | 7.22 | 62.3 | 66.28 | 58.76 | 64.36 | 61.9 | 67.01 |
| Dosage | 90.43 | 96.62 | 84.98 | 92.17 | 91.13 | 93.23 | 95.29 | 95.52 | 95.05 |
| Frequency | 86.13 | 98.89 | 76.28 | 92.8 | 93.3 | 92.31 | 92.24 | 93.04 | 91.45 |
| Duration | 48.89 | 49.25 | 48.53 | 82.17 | 86.89 | 77.94 | 78.79 | 81.25 | 76.47 |
| Route | 47.92 | 85.19 | 33.33 | 75.52 | 72.97 | 78.26 | 72.86 | 71.83 | 73.91 |
| Condition | 33.64 | 100 | 20.22 | 55.9 | 62.5 | 50.56 | 62.16 | 77.97 | 51.69 |
^a RBS: rule-based system.
^b BiLSTM: bidirectional long short-term memory.
^c ELMo: embeddings from language models.
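The per-label and overall scores can be related with a short check. The sketch below uses the RBS columns and the test-set token counts reported earlier, and tests two assumptions that are not stated in this section: that each per-label F-measure is the harmonic mean of precision and recall, and that the overall RBS scores (79.41, 94.67, 72.28) are averages of the per-label scores weighted by test tokens. The helper names are hypothetical.

```python
# Test-set token counts per label (from the slots-and-tokens table).
TEST_TOKENS = {
    "Medication name": 450, "Medication class": 97, "Dosage": 606,
    "Frequency": 468, "Duration": 68, "Route": 69, "Condition": 89,
}

# RBS per-label scores as (F-measure, precision, recall), in percent.
RBS = {
    "Medication name": (90.31, 96.46, 84.89),
    "Medication class": (13.33, 87.5, 7.22),
    "Dosage": (90.43, 96.62, 84.98),
    "Frequency": (86.13, 98.89, 76.28),
    "Duration": (48.89, 49.25, 48.53),
    "Route": (47.92, 85.19, 33.33),
    "Condition": (33.64, 100.0, 20.22),
}


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def token_weighted_mean(values, weights):
    """Average of per-label values weighted by per-label token counts."""
    return sum(values[k] * weights[k] for k in values) / sum(weights[k] for k in values)


# Recompute each per-label F-measure from its precision and recall.
recomputed_f = {label: f_measure(p, r) for label, (_, p, r) in RBS.items()}

# Token-weighted overall (F-measure, precision, recall) for RBS.
overall = tuple(
    token_weighted_mean({label: scores[i] for label, scores in RBS.items()}, TEST_TOKENS)
    for i in range(3)
)

for label, value in recomputed_f.items():
    print(f"{label}: recomputed F {value:.2f}, reported {RBS[label][0]:.2f}")
print("overall F, precision, recall:", ", ".join(f"{v:.2f}" for v in overall))
```

If both assumptions hold, the low overall recall of the rule-based system is driven mostly by the high-volume labels (Medication name, Dosage, Frequency) rather than by rare labels such as Condition, even though the latter have the lowest per-label recall.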