Ghada Alfattni, Maksim Belousov, Niels Peek, Goran Nenadic.
Abstract
BACKGROUND: Drug prescriptions are often recorded in free-text clinical narratives; making this information available in a structured form is important to support many health-related tasks. Although several natural language processing (NLP) methods have been proposed to extract such information, many challenges remain.
Keywords: discharge summaries; electronic health records; information extraction; medication prescriptions; natural language processing
Year: 2021 PMID: 33949962 PMCID: PMC8135022 DOI: 10.2196/24678
Source DB: PubMed Journal: JMIR Med Inform
Descriptive statistics of entity types in the National NLP Clinical Challenges (n2c2) data set.
| Entity types | Entities, n (%) | Links to 1 drug, n (%) | Links to multiple drugs, n (%) | Maximum number of drug associations |
| Drug | 26,800 (32.57) | —a | — | — |
| Form | 11,010 (13.38) | 10,980 (99.56) | 48 (<1) | 2 |
| Strength | 10,921 (13.27) | 10,913 (99.70) | 33 (<1) | 3 |
| Frequency | 10,293 (12.51) | 10,281 (99.39) | 63 (1) | 4 |
| Route | 8989 (10.92) | 9000 (99.08) | 84 (1) | 4 |
| Dosage | 6902 (8.39) | 6877 (99.38) | 43 (1) | 4 |
| Reason | 6400 (7.78) | 7158 (83.44) | 1421 (16.56) | 10 |
| Duration | 970 (1.2) | 991 (92.7) | 78 (7) | 4 |
aNot applicable.
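The percentages in the "Entities, n (%)" column appear to be each entity type's share of all annotated entities. A minimal sketch that recomputes them from the counts in the table follows; the variable names are illustrative, and the rounding to 2 decimal places is an assumption (the table reports the Duration share to 1 decimal place).

```python
# Entity counts copied from the "Entities, n (%)" column of the table above.
entity_counts = {
    "Drug": 26800,
    "Form": 11010,
    "Strength": 10921,
    "Frequency": 10293,
    "Route": 8989,
    "Dosage": 6902,
    "Reason": 6400,
    "Duration": 970,
}

# Total number of annotated entities across all types.
total_entities = sum(entity_counts.values())

# Share of each entity type as a percentage of the total (assumed rounding: 2 decimals).
entity_shares = {
    name: round(100 * count / total_entities, 2)
    for name, count in entity_counts.items()
}

for name, share in entity_shares.items():
    print(f"{name}: {entity_counts[name]} ({share}%)")
```

Under this reading, the recomputed shares agree with the reported ones, which suggests the percentages are taken over the sum of the eight entity counts.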
Descriptive statistics of relations between drugs and their associated attributes in the National NLP Clinical Challenges (n2c2) data set.
| Relation type | Relations, n (%) | Drugs with 1 link, n (%) | Drugs with more than 1 link, n (%) |
| Strength-drug | 10,946 (18.88) | 10,639 (97.20) | 307 (2.8) |
| Frequency-drug | 10,344 (17.84) | 10,054 (97.20) | 290 (2.8) |
| Route-drug | 9084 (15.67) | 8903 (98.01) | 181 (1.99) |
| Reason-drug | 8579 (14.80) | 7704 (89.80) | 875 (10.2) |
| Dosage-drug | 6920 (11.94) | 6765 (97.76) | 155 (2.2) |
| Form-drug | 11,028 (19.02) | 6511 (59.04) | 4517 (40.96) |
| Duration-drug | 1069 (1.84) | 1021 (95.51) | 48 (5) |
Figure 1. The architecture of bidirectional long-short term memory with conditional random field for the named entity recognition models. BiLSTM-CRF: bidirectional long-short term memory with conditional random field; PWE+CE: pretrained word embeddings and character embeddings; PWE: pretrained word embeddings; PWE+SFE: pretrained word embeddings and semantic-feature embeddings; RIWE: randomly initialized word embeddings; WE: word embeddings.
Figure 2. Semantic-feature token embeddings. B-Drug: begin-drug; B-Temporal: begin-temporal; CLAMP: Clinical Language Annotation, Modeling, and Processing Toolkit; cTakes: Clinical Text Analysis and Knowledge Extraction System; O: outside.
Figure 3. The architecture of context-aware long-short term memory for the relation extraction model. e: embedding; LSTM: long-short term memory.
Figure 4. Rule-based method for linking drug names to corresponding attributes in discharge summaries.
Evaluation results of the named entity recognition models on the test set (lenient evaluation).
| Entity | RIWEa | | | PWEb | | | (PWE+CE)c | | | (PWE+SFE)d | | |
| | Precision | Recall | F-score | Precision | Recall | F-score | Precision | Recall | F-score | Precision | Recall | F-score |
| Drug | 0.942 | 0.892 | 0.917 | 0.963 | 0.930 | 0.946 | 0.946 | 0.953 | 0.949 | 0.952 | 0.947 | |
| Strength | 0.977 | 0.959 | 0.968 | 0.979 | 0.970 | 0.975 | 0.973 | 0.976 | 0.974 | 0.977 | 0.977 | |
| Duration | 0.893 | 0.706 | 0.789 | 0.883 | 0.762 | 0.818 | 0.910 | 0.698 | 0.790 | 0.903 | 0.786 | |
| Route | 0.964 | 0.928 | 0.946 | 0.964 | 0.938 | 0.951 | 0.956 | 0.948 | 0.952 | 0.953 | 0.943 | 0.948 |
| Form | 0.964 | 0.935 | 0.949 | 0.965 | 0.940 | 0.952 | 0.969 | 0.944 | | 0.972 | 0.932 | 0.951 |
| Dosage | 0.928 | 0.912 | 0.920 | 0.932 | 0.931 | | 0.931 | 0.928 | 0.929 | 0.928 | 0.931 | 0.930 |
| Frequency | 0.945 | 0.925 | 0.935 | 0.965 | 0.952 | 0.959 | 0.980 | 0.933 | 0.956 | 0.968 | 0.968 | |
| Reason | 0.771 | 0.458 | 0.575 | 0.821 | 0.497 | 0.620 | 0.860 | 0.452 | 0.593 | 0.621 | 0.653 | |
| Micro | 0.943 | 0.863 | 0.901 | 0.951 | 0.892 | 0.921 | 0.950 | 0.894 | | 0.927 | 0.913 | 0.920 |
| Macro | 0.936 | 0.840 | 0.883 | 0.951 | 0.876 | 0.910 | 0.949 | 0.884 | | 0.923 | 0.901 | 0.910 |
aRIWE: bidirectional long-short term memory with conditional random fields with random word embeddings.
bPWE: bidirectional long-short term memory with conditional random fields with pretrained word embeddings.
c(PWE+CE): bidirectional long-short term memory with conditional random fields with pretrained word embeddings and character embeddings.
d(PWE+SFE): bidirectional long-short term memory with conditional random fields with pretrained word embeddings and semantic-feature embeddings.
eThe best results for each metric are italicized.
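The F-score columns are presumably the balanced F1 measure, the harmonic mean of precision and recall. A minimal sketch under that assumption follows; `f_score` is an illustrative helper name, not something taken from the paper. Because the table lists precision and recall already rounded to 3 decimal places, recomputed values can differ from the reported F-scores in the last digit.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are 0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) for the RIWE model, copied from the table above.
riwe_rows = {
    "Drug": (0.942, 0.892, 0.917),
    "Strength": (0.977, 0.959, 0.968),
    "Reason": (0.771, 0.458, 0.575),
}

for entity, (precision, recall, reported) in riwe_rows.items():
    recomputed = f_score(precision, recall)
    print(f"{entity}: recomputed {recomputed:.3f}, reported {reported:.3f}")
```

The harmonic mean penalizes an imbalance between the two metrics, which is why the Reason row, with high precision but low recall, has an F-score much closer to its recall.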
Evaluation results of the pretrained word embeddings+character embeddings named entity recognition model, the pretrained word embeddings+semantic-feature embeddings named entity recognition model, and the ensemble model on the test set (lenient evaluation).
| Entity | (PWE+CE)a | | | (PWE+SFE)b | | | Ensemble | | |
| | Precision | Recall | F-score | Precision | Recall | F-score | Precision | Recall | F-score |
| Drug | 0.946 | 0.953 | 0.949 | 0.952 | 0.947 | | 0.962 | 0.939 | |
| Strength | 0.973 | 0.976 | 0.974 | 0.977 | 0.977 | | 0.981 | 0.972 | |
| Duration | 0.910 | 0.698 | 0.790 | 0.903 | 0.786 | | 0.919 | 0.720 | 0.807 |
| Route | 0.956 | 0.948 | 0.952 | 0.953 | 0.943 | 0.948 | 0.963 | 0.944 | |
| Form | 0.969 | 0.944 | | 0.972 | 0.932 | 0.951 | 0.972 | 0.939 | 0.955 |
| Dosage | 0.931 | 0.928 | 0.929 | 0.928 | 0.931 | 0.930 | 0.943 | 0.930 | |
| Frequency | 0.980 | 0.933 | 0.956 | 0.968 | 0.968 | | 0.979 | 0.915 | 0.946 |
| Reason | 0.860 | 0.452 | 0.593 | 0.621 | 0.653 | | 0.858 | 0.476 | 0.613 |
| Micro | 0.950 | 0.894 | | 0.927 | 0.913 | 0.920 | 0.961 | 0.884 | |
| Macro | 0.949 | 0.884 | | 0.923 | 0.901 | 0.910 | 0.962 | 0.869 | 0.911 |
a(PWE+CE): bidirectional long-short term memory with conditional random fields with pretrained word embeddings and character embeddings.
b(PWE+SFE): bidirectional long-short term memory with conditional random fields with pretrained word embeddings and semantic-feature embeddings.
cThe best results for each metric are italicized.
Post-hoc analysis of variance (ANOVA) of the named entity recognition models: P values of two-tailed paired t tests for each pair of models.a
| Named entity recognition | PWEb | PWE+CEc | PWE+SFEd |
| RIWEe | <.001 | <.001 | <.001 |
| PWE | N/Af | .94 | .99 |
| PWE+CE | N/A | N/A | .95 |
aRIWE is significantly worse than the rest of the models. At the same time, there is no statistically significant difference between PWE, PWE+CE, and PWE+SFE.
bPWE: pretrained word embeddings.
cCE: character embedding.
dSFE: semantic-feature embeddings.
eRIWE: randomly initialized word embeddings.
fN/A: not applicable.
Evaluation results of the relation extraction models (using gold-standard entities) on the test set (lenient evaluation).
| Relation type | LSTMa | | | Rulesb | | |
| | Precision | Recall | F-score | Precision | Recall | F-score |
| Strength-drug | 0.973 | 0.961 | 0.967 | 0.963 | 0.988 | |
| Dosage-drug | 0.963 | 0.958 | 0.961 | 0.956 | 0.976 | |
| Duration-drug | 0.909 | 0.892 | 0.901 | 0.942 | 0.880 | |
| Frequency-drug | 0.962 | 0.904 | 0.932 | 0.964 | 0.988 | |
| Form-drug | 0.982 | 0.918 | 0.949 | 0.970 | 0.992 | |
| Route-drug | 0.958 | 0.934 | 0.946 | 0.962 | 0.972 | |
| Reason-drug | 0.741 | 0.830 | | 0.767 | 0.704 | 0.734 |
| Micro | 0.922 | 0.913 | 0.918 | 0.937 | 0.917 | |
| Macro | 0.914 | 0.910 | 0.909 | 0.935 | 0.902 | |
aLSTM: long-short term memory method.
bRules: rule-based method.
cThe best results for each metric are italicized.
Evaluation results of the end-to-end models (ie, output from the best-performing named entity recognition and relation extraction models) on the test set (lenient evaluation).
| Relation type | RIWEa+rules | | | PWEb+rules | | | (PWE+CE)c+rules | | | (PWE+SFE)d+rules | | |
| | Precision | Recall | F-score | Precision | Recall | F-score | Precision | Recall | F-score | Precision | Recall | F-score |
| Strength-drug | 0.919 | 0.914 | 0.917 | 0.952 | 0.943 | 0.947 | 0.948 | 0.950 | 0.949 | 0.948 | 0.964 | |
| Dosage-drug | 0.848 | 0.853 | 0.851 | 0.890 | 0.888 | 0.889 | 0.892 | 0.884 | 0.888 | 0.894 | 0.897 | |
| Duration-drug | 0.837 | 0.615 | 0.709 | 0.842 | 0.662 | 0.741 | 0.889 | 0.617 | 0.729 | 0.860 | 0.678 | |
| Frequency-drug | 0.878 | 0.874 | 0.876 | 0.931 | 0.919 | 0.925 | 0.949 | 0.902 | 0.925 | 0.934 | 0.947 | |
| Form-drug | 0.894 | 0.888 | 0.891 | 0.939 | 0.915 | 0.927 | 0.944 | 0.919 | 0.931 | 0.959 | 0.920 | |
| Route-drug | 0.885 | 0.866 | 0.875 | 0.924 | 0.895 | 0.909 | 0.919 | 0.904 | 0.911 | 0.920 | 0.908 | |
| Reason-drug | 0.635 | 0.333 | 0.437 | 0.702 | 0.371 | 0.485 | 0.744 | 0.343 | 0.470 | 0.503 | 0.472 | |
| Micro | 0.865 | 0.770 | 0.815 | 0.909 | 0.802 | 0.852 | 0.918 | 0.797 | | 0.871 | 0.830 | 0.850 |
| Macro | 0.859 | 0.733 | 0.784 | 0.902 | 0.770 | 0.824 | 0.918 | 0.765 | | 0.849 | 0.801 | 0.821 |
aRIWE: bidirectional long-short term memory with conditional random fields with random word embeddings.
bPWE: bidirectional long-short term memory with conditional random fields with pretrained word embeddings.
cPWE+CE: bidirectional long-short term memory with conditional random fields with pretrained word embeddings and character embeddings.
dPWE+SFE: bidirectional long-short term memory with conditional random fields with pretrained word embeddings and semantic-feature embeddings.
eThe best results for each metric are italicized.
Post-hoc analysis of variance (ANOVA) of the end-to-end models: P values of two-tailed paired t tests for each pair of models.
| End-to-end models | PWEa+rules | (PWE+CEb)+rules | (PWE+SFEc)+rules |
| RIWEd+rules | .01 | .01 | .03 |
| PWE+rules | N/Ae | .99 | .99 |
| (PWE+CE)+rules | N/A | N/A | .99 |
aPWE: pretrained word embeddings.
bCE: character embedding.
cSFE: semantic-feature embeddings.
dRIWE: randomly initialized word embeddings.
eN/A: not applicable.
Figure 5. Confusion matrix (token-level) from the output of bidirectional long-short term memory with conditional random field (with pretrained word embeddings and character embeddings) on the National NLP Clinical Challenges test set. The diagonal entries indicate labels that were correctly predicted, and the off-diagonal entries indicate errors. The total number of errors (sum of off-diagonal cells) was 693.
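The caption's error count is the sum of the off-diagonal cells of the confusion matrix. A minimal sketch of that calculation follows. The 3-label matrix and its counts are hypothetical and chosen only for illustration; they are not the values behind Figure 5, and `count_errors` and `count_correct` are illustrative helper names.

```python
def count_errors(matrix: list[list[int]]) -> int:
    """Sum of off-diagonal cells: tokens whose predicted label differs from the gold label."""
    return sum(
        cell
        for i, row in enumerate(matrix)
        for j, cell in enumerate(row)
        if i != j
    )


def count_correct(matrix: list[list[int]]) -> int:
    """Sum of diagonal cells: tokens whose predicted label matches the gold label."""
    return sum(matrix[i][i] for i in range(len(matrix)))


# Hypothetical token-level confusion matrix (rows: gold label, columns: predicted label).
labels = ["Drug", "Reason", "O"]
toy_matrix = [
    [50, 2, 1],
    [3, 40, 0],
    [4, 1, 30],
]

print("errors:", count_errors(toy_matrix))
print("correct:", count_correct(toy_matrix))
```

Applied to the full matrix in Figure 5, the same off-diagonal sum would give the total of 693 reported in the caption.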