Son Doan, Nigel Collier, Hua Xu, Hoang Duy Pham, Minh Phuong Tu.
Abstract
BACKGROUND: Extraction of clinical information such as medications or problems from clinical text is an important task of clinical natural language processing (NLP). Rule-based methods are often used in clinical NLP systems because they are easy to adapt and customize. Recently, supervised machine learning methods have proven to be effective in clinical NLP as well. However, combining different classifiers to further improve the performance of clinical entity recognition systems has not been investigated extensively. Combining classifiers into an ensemble classifier presents both challenges and opportunities to improve performance in such NLP tasks.
Year: 2012 PMID: 22564405 PMCID: PMC3502425 DOI: 10.1186/1472-6947-12-36
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Number of fields and descriptions with examples from the i2b2 2009 dataset
| Field | Count | Examples | Description |
| --- | --- | --- | --- |
| Medication | 12773 | “Lasix,” “Caltrate plus D,” “fluocinonide 0.5% cream,” “TYLENOL (ACETAMINOPHEN)” | Prescription substances, biological substances, over-the-counter drugs, excluding diet, allergy, lab/test, alcohol. |
| Dosage | 4791 | “1 TAB,” “One tablet,” “0.4 mg,” “0.5 m.g.,” “100 MG,” “100 mg x 2 tablets” | The amount of a single medication used in each administration. |
| Mode | 3552 | “Orally,” “Intravenous,” “Topical,” “Sublingual” | Describes the method for administering the medication. |
| Frequency | 4342 | “Prn,” “As needed,” “Three times a day as needed,” “As needed three times a day,” “x3 before meal,” “x3 a day after meal as needed” | Terms, phrases, or abbreviations that describe how often each dose of the medication should be taken. |
| Duration | 597 | “x10 days,” “10-day course,” “For ten days,” “For a month,” “During spring break,” “Until the symptom disappears,” “As long as needed” | Expressions that indicate for how long the medication is to be administered. |
| Reason | 1534 | “Dizziness,” “Dizzy,” “Fever,” “Diabetes,” “frequent PVCs,” “rare angina” | The medical reason for which the medication is stated to be given. |
Figure 1. An example of the i2b2 data: “m” is for medication, “do” is for dosage, “mo” is for mode, “f” is for frequency, “du” is for duration, “r” is for reason, “ln” is for “list/narrative”.
Statistics for the i2b2 2009 dataset
| # Notes | 268 |
| # Sentences | 9,689 |
| # Words | 326,474 |
| # Fields | 27,589 |
An example of the BIO representation
| Additionally, | Percocet | 1-2 | Tablets | p.o. | Q | 4 | prn, |
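In a BIO representation, the first token of each field is tagged `B-<field>`, later tokens of the same field are tagged `I-<field>`, and all other tokens are tagged `O`. The following is a minimal Python sketch of that encoding applied to the sentence above. The span boundaries and the helper name `to_bio` are illustrative assumptions, not annotations recovered from the table; the field abbreviations follow Figure 1.

```python
from typing import List, Tuple


def to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end_exclusive, field) token spans into BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, field in spans:
        tags[start] = f"B-{field}"          # first token of the field
        for i in range(start + 1, end):
            tags[i] = f"I-{field}"          # remaining tokens of the field
    return tags


tokens = ["Additionally,", "Percocet", "1-2", "Tablets", "p.o.", "Q", "4", "prn,"]

# Hypothetical span annotations, for illustration only:
# medication (m), dosage (do), mode (mo), frequency (f).
spans = [(1, 2, "m"), (2, 4, "do"), (4, 5, "mo"), (5, 8, "f")]

tags = to_bio(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Framing extraction as per-token tagging is what allows sequence classifiers such as CRF or SVM models to be trained on the annotated notes.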
Performance of the SVM-based system for different feature combinations in 10-fold cross-validation
| * | | | | | | 87.09/77.05/81.76 |
| * | * | | | | | 90.34/78.17/83.81 |
| * | | * | | | | 91.81/80.74/85.91 |
| * | | | * | | | 89.92/78.54/83.84 |
| * | | | | * | | 91.62/81.32/86.15 |
| * | | | | | * | 92.38/86.73/ |
| * | * | * | | | | 91.72/81.08/86.06 |
| * | * | * | * | | | 91.81/81.06/86.10 |
| * | * | * | * | * | | 91.78/81.29/86.22 |
| * | * | * | * | * | * | 93.75/87.55/90.54 |
* indicates that the corresponding feature in the first row is used. Each row gives a combination of features and its micro-averaged Precision (Pre)/Recall (Re)/F-score. The best F-score is underlined. F-scores that are significantly higher (p < 0.05) than all F-scores in the rows above are shown in bold.
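The F-score is the harmonic mean of precision and recall, and micro-averaging pools true positives, false positives, and false negatives across fields before computing it. The sketch below shows both steps in Python. The function names and the example counts are assumptions made for illustration; only the precision/recall pair 93.75/87.55 comes from the table.

```python
from typing import Iterable, Tuple


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (balanced F-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_prf(counts: Iterable[Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, F from per-field (tp, fp, fn) counts."""
    counts = list(counts)
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_score(precision, recall)


# Precision and recall for the all-features row, in percent.
all_features_f = round(f_score(93.75, 87.55), 2)
print(all_features_f)

# Hypothetical per-field counts, for illustration only.
example = micro_prf([(8, 2, 2), (1, 1, 3)])
print(example)
```

For 93.75/87.55 this gives 90.54, which agrees with the overall F-score reported for the SVM (all features) system in the cross-validation results.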
Results from the customized MedEx system, CRF (all features), SVM (all features) systems, Simple Majority, Local CRF-based and SVM-based voting in 10-fold cross-validation: “m” stands for medication, “do” for dosage, “mo” for mode, “f” for frequency, “du” for duration, “r” for reason
| System | Metric | All | m | do | mo | f | du | r |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Customized MedEx | Pre | 89.57 | 90.33 | 95.01 | 96.26 | 92.09 | 51.19 | 62.10 |
| | Re | 84.01 | 89.10 | 82.88 | 86.95 | 88.50 | 58.82 | 47.93 |
| | F-score | 86.67 | 89.68 | 88.50 | 91.32 | 90.16 | 54.20 | 53.78 |
| CRF (all features) | Pre | 94.38 | 93.99 | 96.47 | 97.63 | 95.61 | 77.40 | 79.34 |
| | Re | 86.92 | 90.38 | 89.42 | 92.11 | 91.38 | 62.13 | 43.41 |
| | F-score | 90.48 | 92.13 | 92.79 | 94.77 | 93.42 | 68.64 | 55.74 |
| SVM (all features) | Pre | 93.75 | 93.84 | 95.40 | 97.13 | 95.68 | 70.42 | 74.46 |
| | Re | 87.55 | 90.76 | 91.46 | 93.27 | 92.69 | 48.21 | 44.50 |
| | F-score | 90.54 | 92.26 | 93.38 | 95.14 | 94.14 | 56.89 | 55.50 |
| Simple Majority | Pre | 93.99 | 93.62 | 96.39 | 97.27 | 95.44 | 73.11 | 77.91 |
| | Re | 87.17 | 90.72 | 89.71 | 92.45 | 91.96 | 53.97 | 45.82 |
| | F-score | 90.43 | 92.12 | 92.91 | 94.78 | 93.63 | 61.65 | 57.37 |
| Local CRF-based voting | Pre | 94.11 | 93.86 | 95.43 | 97.16 | 95.65 | 70.58 | 85.81 |
| | Re | 87.81 | 90.79 | 91.49 | 93.27 | 92.64 | 65.76 | 40.87 |
| | F-score | 90.84 | 92.28 | 93.40 | 95.16 | 94.11 | 67.78 | 55.01 |
| SVM-based voting | Pre | 93.32 | 93.88 | 95.40 | 97.16 | 95.65 | 70.58 | 70.27 |
| | Re | 88.19 | 90.79 | 91.44 | 93.24 | 92.64 | 65.76 | 46.99 |
| | F-score | 90.67 | 92.30 | 93.37 | 95.14 | 94.11 | 67.78 | 56.08 |
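Simple majority voting over the outputs of several taggers can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the function name, the example predictions, and the tie-breaking rule (falling back to one designated system when no label has a strict majority) are all assumptions. Whether the CRF-based and SVM-based voting variants work this way is not stated in the text available here.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]], fallback: int = 0) -> List[str]:
    """Combine per-token labels from several systems by strict majority.

    predictions[k][i] is the label system k assigns to token i. When no
    label has a strict majority, the label of system `fallback` is used
    (an assumed tie-breaking rule).
    """
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all systems must label the same number of tokens")

    voted = []
    for i in range(n_tokens):
        labels = [p[i] for p in predictions]
        label, count = Counter(labels).most_common(1)[0]
        if count > len(predictions) / 2:
            voted.append(label)             # strict majority found
        else:
            voted.append(labels[fallback])  # no majority: defer to one system
    return voted


# Hypothetical outputs of three systems on a five-token sentence.
rule_based = ["O", "B-m", "O", "O", "B-f"]
crf = ["O", "B-m", "B-do", "I-do", "B-du"]
svm = ["O", "B-m", "B-do", "O", "O"]

combined = majority_vote([rule_based, crf, svm], fallback=1)
print(combined)
```

Voting token by token is simple but can yield tag sequences that no single system produced, so a real system may need a repair step for inconsistent BIO sequences.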
Statistical significance tests for differences in performance using approximate randomization in 10-fold cross-validation
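| | CRF (all features) | SVM (all features) | Simple Majority | Local CRF-based voting | SVM-based voting |
| --- | --- | --- | --- | --- | --- |
| Customized MedEx | all, m, mo, do, f, du, r | all, m, mo, do, f, du | all, m, mo, do, f, du, r | all, m, mo, do, f, du | all, m, mo, do, f, du |
| CRF (all features) | | do, f, du | du | all, do, du | do, f |
| SVM (all features) | | | du | du | du |
| Simple Majority | | | | all, do, du | du |
| Local CRF-based voting | | | | | NS |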
The entries in the cells indicate that the two systems differ significantly in F-score for the whole system (all), medication (m), dosage (do), mode (mo), frequency (f), duration (du), and reason (r). NS means “not significantly different”. Significance is decided at p = 0.05.
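Approximate randomization tests whether an observed difference between two systems could arise by chance: the outputs of the two systems are swapped at random on each test item, and the p-value is estimated as the share of shuffles whose difference is at least as large as the observed one. The Python sketch below illustrates the idea under stated assumptions. It uses per-note (tp, fp, fn) counts as the unit of shuffling, micro-averaged F-score as the statistic, and made-up counts; none of these details are confirmed by the text here.

```python
import random
from typing import List, Tuple

Counts = Tuple[int, int, int]  # (tp, fp, fn) for one note


def micro_f(counts: List[Counts]) -> float:
    """Micro-averaged F-score from pooled counts: 2tp / (2tp + fp + fn)."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def approximate_randomization(a: List[Counts], b: List[Counts],
                              trials: int = 2000, seed: int = 0) -> float:
    """Estimate the two-sided p-value for the F-score difference of a and b."""
    if len(a) != len(b):
        raise ValueError("both systems must be scored on the same notes")
    rng = random.Random(seed)
    observed = abs(micro_f(a) - micro_f(b))
    hits = 0
    for _ in range(trials):
        shuffled_a, shuffled_b = [], []
        for ca, cb in zip(a, b):
            if rng.random() < 0.5:      # swap the two outputs for this note
                ca, cb = cb, ca
            shuffled_a.append(ca)
            shuffled_b.append(cb)
        if abs(micro_f(shuffled_a) - micro_f(shuffled_b)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)    # add-one smoothing avoids p = 0


# Hypothetical per-note counts for two systems on 20 notes.
system_a = [(10, 0, 0)] * 20
system_b = [(5, 5, 5)] * 20

p_different = approximate_randomization(system_a, system_b)
p_identical = approximate_randomization(system_a, system_a)
print(p_different < 0.05, p_identical)
```

Under this sketch, a cell in the table would list a field when the estimated p-value for that field falls below 0.05, and NS when it does not for any field.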
Results from the first ranked system (Sydney), customized MedEx system, CRF (all features), SVM (all features) systems, Simple Majority, Local CRF-based and SVM-based voting on the test set from the 2009 i2b2 challenge: “m” stands for medication, “do” for dosage, “mo” for mode, “f” for frequency, “du” for duration, “r” for reason
| System | Metric | All | m | do | mo | f | du | r |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Sydney (first ranked) | Pre | 93.78 | 93.51 | 94.78 | 96.45 | 96.59 | 69.71 | 76.04 |
| | Re | 85.03 | 88.31 | 88.91 | 91.28 | 91.25 | 40.93 | 38.83 |
| | F-score | 89.19 | 90.84 | 91.75 | 93.80 | 93.85 | 51.58 | 51.41 |
| Customized MedEx | Pre | 89.51 | 89.97 | 94.95 | 96.23 | 92.18 | 50.94 | 62.31 |
| | Re | 84.95 | 89.68 | 84.04 | 88.14 | 89.80 | 60.62 | 47.87 |
| | F-score | 87.17 | 89.83 | 89.16 | 92.01 | 90.97 | 55.36 | 54.14 |
| CRF (all features) | Pre | 94.11 | 93.71 | 95.92 | 97.26 | 95.60 | 71.88 | 77.52 |
| | Re | 84.89 | 89.19 | 87.37 | 90.19 | 90.73 | 38.86 | 40.97 |
| | F-score | 89.26 | 91.39 | 91.44 | 93.59 | 93.10 | 50.45 | 53.63 |
| SVM (all features) | Pre | 93.35 | 93.98 | 94.79 | 96.56 | 95.37 | 68.73 | 68.54 |
| | Re | 85.42 | 89.18 | 88.73 | 91.01 | 91.71 | 38.34 | 40.75 |
| | F-score | 89.21 | 91.51 | 91.66 | 93.71 | 93.50 | 49.22 | 51.12 |
| Simple Majority | Pre | 93.91 | 93.62 | 95.86 | 97.23 | 95.58 | 72.73 | 75.90 |
| | Re | 85.76 | 90.19 | 87.62 | 90.44 | 91.20 | 44.21 | 43.67 |
| | F-score | 89.65 | 91.87 | 91.55 | 93.71 | 93.34 | 54.99 | 55.44 |
| Local CRF-based voting | Pre | 94.20 | 93.96 | 94.84 | 96.56 | 95.27 | 74.07 | 83.39 |
| | Re | 85.11 | 89.18 | 88.80 | 91.01 | 91.71 | 34.54 | 37.13 |
| | F-score | 89.42 | 91.51 | 91.72 | 93.71 | 93.46 | 47.11 | 51.38 |
| SVM-based voting | Pre | 93.03 | 94.06 | 94.77 | 96.56 | 95.37 | 66.94 | 65.83 |
| | Re | 85.76 | 89.18 | 88.71 | 90.98 | 91.66 | 42.66 | 44.52 |
| | F-score | 89.24 | 91.55 | 91.64 | 93.69 | 93.48 | 52.11 | 53.12 |
Statistical significance tests for differences in performance using approximate randomization on the test set from the 2009 i2b2 challenge
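| | Customized MedEx | CRF (all features) | SVM (all features) | Simple Majority | Local CRF-based voting | SVM-based voting |
| --- | --- | --- | --- | --- | --- | --- |
| Sydney (first ranked) | all, m, mo, do, f | m | m | all, m, du, r | m, du | m |
| Customized MedEx | | all, m, mo, do, f, du | all, m, mo, do, f, du, r | all, m, mo, do, f | all, m, mo, do, f, du | all, m, |
| CRF (all features) | | | NS | all, m, du | du | du |
| SVM (all features) | | | | all, du, r | NS | NS |
| Simple Majority | | | | | du, r | all, du, r |
| Local CRF-based voting | | | | | | du |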
The entries in the cells indicate that the two systems differ significantly in F-score for the whole system (all), medication (m), dosage (do), mode (mo), frequency (f), duration (du), and reason (r). NS means “not significantly different”. Significance is decided at p = 0.05.