Qi Li, Haijun Zhai, Louise Deleger, Todd Lingren, Megan Kaiser, Laura Stoutenborough, Imre Solti.
Abstract
OBJECTIVE: The goal of this work was to evaluate two machine learning methods, binary classification and sequence labeling, for medication-attribute linkage detection in two clinical corpora. DATA AND METHODS: We double-annotated 3000 clinical trial announcements (CTA) and 1655 clinical notes (CN) for medication named entities and their attributes. A binary support vector machine (SVM) classification method with parsimonious feature sets and a conditional random fields (CRF)-based multi-layered sequence labeling (MLSL) model were proposed to identify the linkages between the entities and their corresponding attributes. We evaluated the system's performance against the human-generated gold standard.
Keywords: attribute linkages; clinical notes; clinical trial announcements; multi-layered sequence labeling; natural language processing
Year: 2012 PMID: 23268488 PMCID: PMC3756265 DOI: 10.1136/amiajnl-2012-001487
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1. Examples of linkages between medications and their attributes. Access the article online to view this figure in colour.
Descriptive statistics of the two corpora
| | CN | | CTA | |
|---|---|---|---|---|
| Description of corpora | | | | |
| Total no. of documents | 1655 | | 3000 | |
| Description of entities | No. of entities | % | No. of entities | % |
| Total no. of entities | 42854 | 100 | 31652 | 100 |
| Total no. of medication entities | 16792 | 39 | 21575 | 69 |
| Medication | 12517 | 29 | 9968 | 32 |
| Medication type | 4275 | 10 | 11789 | 37 |
| Total no. of attributes | 26062 | 61 | 9077 | 31 |
| Date | 122 | 0.3 | 16 | 0.05 |
| Dosage | 1885 | 4 | 645 | 2 |
| Duration | 620 | 1 | 644 | 2 |
| Form | 4411 | 10 | 482 | 2 |
| Frequency | 4551 | 11 | 381 | 1 |
| Modifier | 1769 | 4 | 5827 | 18 |
| Route | 3237 | 8 | 893 | 3 |
| Status change | 2982 | 7 | 598 | 2 |
| Strength | 6485 | 15 | 409 | 1 |
CN, clinical notes; CTA, clinical trial announcements.
Figure 2. Features for binary classification-based linkage detection.
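The figure itself is not reproduced here, so the following is only a minimal sketch of how a binary linkage classifier can be set up: every (medication, attribute) pair becomes one candidate instance with a small feature dictionary, which a classifier such as an SVM would then label as linked or not linked. The `Entity` type, the function names, and the particular features (labels, relative order, tokens in between) are illustrative assumptions, not the authors' feature set.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Entity:
    """Hypothetical annotated span; offsets are token indices."""
    text: str
    label: str  # e.g. "Medication", "Strength", "Frequency"
    start: int  # index of the first token
    end: int    # index one past the last token


# Labels treated as medication entities (taken from the corpus statistics table).
MEDICATION_LABELS = {"Medication", "Medication type"}


def candidate_pairs(entities):
    """Pair every medication entity with every attribute entity."""
    meds = [e for e in entities if e.label in MEDICATION_LABELS]
    attrs = [e for e in entities if e.label not in MEDICATION_LABELS]
    return list(product(meds, attrs))


def pair_features(tokens, med, attr):
    """Build an illustrative (assumed) feature dictionary for one candidate pair."""
    left, right = sorted((med, attr), key=lambda e: e.start)
    between = tokens[left.end:right.start]
    feats = {
        "med_label=" + med.label: 1,
        "attr_label=" + attr.label: 1,
        "attr_before_med": int(attr.start < med.start),
        "token_distance": len(between),
    }
    # Token-level features: one indicator per token between the two entities.
    for tok in between:
        feats["between=" + tok.lower()] = 1
    return feats


tokens = "aspirin 81 mg daily and lisinopril 10 mg".split()
entities = [
    Entity("aspirin", "Medication", 0, 1),
    Entity("81 mg", "Strength", 1, 3),
    Entity("daily", "Frequency", 3, 4),
    Entity("lisinopril", "Medication", 5, 6),
    Entity("10 mg", "Strength", 6, 8),
]
for med, attr in candidate_pairs(entities):
    print(med.text, "<->", attr.text, pair_features(tokens, med, attr))
```

Each feature dictionary would be vectorized and passed to the classifier together with a gold-standard linked/not-linked label; that training step is omitted here.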
Figure 3. Examples of representative medication–attribute linkages using the multi-layered sequence labeling model. Access the article online to view this figure in colour.
Figure 4. Features for the multi-layered sequence labeling.
Figure 5. Evaluation measures.
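The figure is not reproduced here. Assuming the measures are the standard precision (P), recall (R), and balanced F-measure (F) used in the result tables, they can be sketched as follows. The function names and the example counts are hypothetical; only the P and R values in the final check come from the L0 unigram row of the multi-layered sequence labeling results.

```python
def f_measure(p, r):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def precision_recall_f(tp, fp, fn):
    """Compute P, R, and F from true positive, false positive, and false negative counts."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    return p, r, f_measure(p, r)


# Hypothetical counts: 93 correct linkages, 7 spurious, 7 missed.
print(precision_recall_f(93, 7, 7))

# Consistency check against reported values (unigram features, layer L0).
print(round(f_measure(0.938, 0.596), 3))  # CTA
print(round(f_measure(0.918, 0.276), 3))  # CN
```

Under this assumption the two checks give 0.729 and 0.424, matching the F values reported for that row.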
Results of binary classification-based linkage detection: TOKEN versus POS and SVM versus MaxEnt
| Corpus | Features | Method | P | R | F |
|---|---|---|---|---|---|
| CTA | TOKEN | SVM | 0.93 | 0.93 | 0.93 |
| CTA | POS | SVM | 0.82 | 0.81 | 0.81 |
| CTA | TOKEN | MaxEnt | 0.92 | 0.93 | 0.92 |
| CTA | POS | MaxEnt | 0.81 | 0.80 | 0.81 |
| CN | TOKEN | SVM | 0.93 | 0.94 | 0.94 |
| CN | POS | SVM | 0.85 | 0.90 | 0.87 |
| CN | TOKEN | MaxEnt | 0.93 | 0.94 | 0.93 |
| CN | POS | MaxEnt | 0.81 | 0.84 | 0.83 |
CN, clinical notes; CTA, clinical trial announcements; F, F-measure; MaxEnt, maximum entropy; P, precision; POS, part-of-speech; R, recall; SVM, support vector machine.
Cumulative results of multi-layered sequence labeling linkage detection
| Features | Layer | CTA P | CTA R | CTA F | CN P | CN R | CN F |
|---|---|---|---|---|---|---|---|
| Unigram (Order 1) | L0 | 0.938 | 0.596 | 0.729 | 0.918 | 0.276 | 0.424 |
| Unigram (Order 1) | L1 | 0.867 | 0.710 | 0.780 | 0.873 | 0.484 | 0.623 |
| Unigram (Order 1) | L2 | 0.852 | 0.721 | 0.781 | 0.848 | 0.608 | 0.708 |
| Unigram (Order 1) | L3 | 0.848 | 0.723 | 0.780 | 0.821 | 0.704 | 0.758 |
| Unigram (Order 1) | L4 | 0.847 | 0.724 | 0.780 | 0.784 | 0.771 | 0.777 |
| Unigram (Order 1) | L5 | 0.847 | 0.724 | 0.780 | 0.760 | 0.802 | 0.781 |
| Unigram (Order 1) | L6 | 0.847 | 0.724 | 0.780 | 0.749 | 0.812 | 0.779 |
| Unigram (Order 1) | L7 | 0.847 | 0.724 | 0.780 | 0.743 | 0.816 | 0.778 |
| 3Gram | L0 | 0.936 | 0.599 | 0.730 | 0.925 | 0.278 | 0.427 |
| 3Gram | L1 | 0.864 | 0.716 | 0.783 | 0.879 | 0.494 | 0.632 |
| 3Gram | L2 | 0.850 | 0.727 | 0.784 | 0.853 | 0.623 | 0.720 |
| 3Gram | L3 | 0.847 | 0.729 | 0.784 | 0.827 | 0.721 | 0.770 |
| 3Gram | L4 | 0.846 | 0.730 | 0.784 | 0.789 | 0.789 | 0.789 |
| 3Gram | L5 | 0.846 | 0.731 | 0.784 | 0.765 | 0.820 | 0.791 |
| 3Gram | L6 | 0.846 | 0.731 | 0.784 | 0.753 | 0.828 | 0.789 |
| 3Gram | L7 | 0.846 | 0.731 | 0.784 | 0.748 | 0.832 | 0.788 |
| 5Gram | L0 | 0.943 | 0.602 | 0.734 | 0.939 | 0.283 | 0.434 |
| 5Gram | L1 | 0.883 | 0.725 | 0.796 | 0.899 | 0.496 | 0.639 |
| 5Gram | L2 | 0.869 | 0.735 | 0.796 | 0.876 | 0.622 | 0.727 |
| 5Gram | L3 | 0.867 | 0.738 | 0.797 | 0.847 | 0.720 | 0.779 |
| 5Gram | L4 | 0.866 | 0.739 | 0.797 | 0.806 | 0.788 | 0.796 |
| 5Gram | L5 | 0.866 | 0.739 | 0.797 | 0.760 | 0.802 | 0.781 |
| 5Gram | L6 | 0.866 | 0.739 | 0.797 | 0.749 | 0.812 | 0.779 |
| 5Gram | L7 | 0.866 | 0.739 | 0.797 | 0.743 | 0.816 | 0.778 |
| Order 4 | L0 | 0.956 | 0.584 | 0.725 | 0.945 | 0.267 | 0.416 |
| Order 4 | L1 | 0.924 | 0.724 | 0.812 | 0.917 | 0.479 | 0.629 |
| Order 4 | L2 | 0.914 | 0.738 | 0.816 | 0.888 | 0.612 | 0.725 |
| Order 4 | L3 | 0.913 | 0.739 | 0.817 | 0.853 | 0.712 | 0.776 |
| Order 4 | L4 | 0.913 | 0.740 | 0.817 | 0.811 | 0.780 | 0.795 |
| Order 4 | L5 | 0.913 | 0.740 | 0.817 | 0.786 | 0.813 | 0.799 |
| Order 4 | L6 | 0.913 | 0.741 | 0.817 | 0.772 | 0.823 | 0.797 |
| Order 4 | L7 | 0.913 | 0.741 | 0.817 | 0.768 | 0.828 | 0.797 |
CN, clinical notes; CTA, clinical trial announcements; F, F-measure; P, precision; R, recall.
Description of experiment groups
| Experiment group | Token features | Context features | Semantic type | 3Gram | 5Gram | Order |
|---|---|---|---|---|---|---|
| Unigram | X | X | X | | | Order 1 |
| 3Gram | X | X | X | X | | Order 1 |
| 5Gram | X | X | X | | X | Order 1 |
| Order 4 | X | X | X | | | Order 4 |