Qi Li, Louise Deleger, Todd Lingren, Haijun Zhai, Megan Kaiser, Laura Stoutenborough, Anil G Jegga, Kevin Bretonnel Cohen, Imre Solti.
Abstract
BACKGROUND: Cincinnati Children's Hospital Medical Center (CCHMC) has built the initial Natural Language Processing (NLP) component to extract medications with their corresponding medical conditions (Indications, Contraindications, Overdosage, and Adverse Reactions) as triples of medication-related information ([(1) drug name]-[(2) medical condition]-[(3) LOINC section header]) for an intelligent database system, in order to improve patient safety and the quality of health care. The Food and Drug Administration's (FDA) drug labels are used to demonstrate the feasibility of building the triples as an intelligent database system task.
Year: 2013 PMID: 23617267 PMCID: PMC3646673 DOI: 10.1186/1472-6947-13-53
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Excerpt of a drug label (Urea) annotated for three selected medical conditions and the conditions collected in three triples
| Drug name | Medical condition | LOINC section |
|---|---|---|
| Urea | hyperkeratotic lesion | INDICATION |
| Urea | stinging | ADVERSE REACTION |
| Urea | burning | ADVERSE REACTION |
*The drug names and LOINC Sections are added to the triples via rules that utilize the XML tags of the original label.
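The rule-based step described in the note above can be illustrated with a minimal sketch. It assumes a simplified, hypothetical label markup (the `label`, `section`, and `condition` element names and the `drug` and `name` attributes are invented for illustration; real FDA label XML is considerably richer) and shows only how a drug name and a section header could be attached to each extracted medical condition to form a triple.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Triple:
    drug_name: str
    medical_condition: str
    loinc_section: str


# Hypothetical, simplified markup; not the actual FDA label schema.
LABEL_XML = """
<label drug="Urea">
  <section name="INDICATION">
    <condition>hyperkeratotic lesion</condition>
  </section>
  <section name="ADVERSE REACTION">
    <condition>stinging</condition>
    <condition>burning</condition>
  </section>
</label>
"""


def build_triples(xml_text):
    """Attach the drug name and enclosing section name to every condition."""
    root = ET.fromstring(xml_text)
    drug = root.get("drug")
    triples = []
    for section in root.iter("section"):
        for condition in section.iter("condition"):
            triples.append(Triple(drug, condition.text.strip(), section.get("name")))
    return triples


triples = build_triples(LABEL_XML)
for t in triples:
    print(t.drug_name, "|", t.medical_condition, "|", t.loinc_section)
```

In this sketch the medical conditions are already marked up; in the system described here they would come from the extraction component, and only the drug name and section would be read from the label's XML tags.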
List of 15 selected drug labels
| Diovan | 0083-4001-01 | UREA | 42192-101-10 |
| ARICEPT | 62856-851-30 | GlucaGen HypoKit | 0169-7065-15 |
| DORYX | 50546-550-01 | Tramadol Hydrochloride | 54868-4638-6 |
| BENICAR HCT | 65597-107-11 | Lisinopril | 51138-139-30 |
| Copaxone | 0088-1153-30 | Glyburide | 23155-058-10 |
| Natural Fiber Powder Orange Flavor | 53329-102-56 | ||
| WhiskCare 373 | 65585-373-04 | ||
| Degree for Men Clean Antiperspirant and Deodorant | 64942-0866-2 | ||
| Ultrasol Sunscreen Sunscreen Lotion SPF 34 | 59886-319-11 | ||
| Topcare Allergy | 36800-479-68 | ||
Descriptive statistics of medical conditions in the annotated drug labels
| 83 | 6,295 | 3,806 | 10,184 | |||
| 46 | 1,129 | 737 | 1,423 | |||
| 104 | 2,072 | 1,867 | 4,043 | |||
| 46 | 525 | 437 | 742 | |||
| 187 | 8,367 | 5,673 | 14,227 | |||
| 92 | 2,942 | 1,174 | 2,165 | |||
| 67 | 1,443 | 1,271 | 2,781 | |||
| 39 | 553 | 470 | 860 | |||
| 54 | 3,642 | 2,144 | 5,840 | |||
| 30 | 1,547 | 927 | 2,114 | |||
| 121 | 5,085 | 3,415 | 8,611 | |||
| 69 | 2,091 | 1,391 | 2,953 | |||
(Total = total number of tokens in that category; all = absolute number; unique = unique number.)
Figure 1. Automatic Medical Condition Extractor (AutoMCExtractor).
Figure 2. Excerpt of cTAKES Output.
Feature sets for CRF
| Current token features | |
| Tokens in the 5-window size | |
| Bigram of current token | The current token bigram and the previous token bigram. |
| POS features | The |
| Initial capital features | The features indicating whether the tokens (including the current token, the previous two tokens, and the next two tokens) are upper-case-initial. |
| Number or not features | The features indicating whether the current token is numeric, alphabetic, or mixed. |
| Capital feature | The feature indicating whether the current token is all capitalized or mixed with capital characters. |
| Prefix and suffix | The prefix and suffix of the current token (first or last two characters). |
| Token length | The character length of the current token. |
| CUI | |
| TUI | |
Note: the features in bold are utilized directly from the cTAKES output; the rest of the features are generated or modified by custom-developed processes.
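Several of the custom-generated features listed above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function and feature names are hypothetical, the reading of "bigram" as (current, next) and (previous, current) is an assumption, and the features taken from cTAKES (such as POS, CUI, and TUI) are omitted because they require that pipeline's output.

```python
def char_class(token):
    """Classify a token as numeric, alphabetic, or mixed."""
    if token.isdigit():
        return "digit"
    if token.isalpha():
        return "alpha"
    return "mixed"


def capital_class(token):
    """Classify a token as all capitals, partly capitalized, or neither."""
    if token.isupper():
        return "all_caps"
    if any(c.isupper() for c in token):
        return "mixed_caps"
    return "no_caps"


def token_features(tokens, i):
    """Build a feature dictionary for tokens[i] (feature names are hypothetical)."""
    tok = tokens[i]
    feats = {
        "token": tok,
        "char_class": char_class(tok),
        "capital": capital_class(tok),
        "prefix2": tok[:2],    # first two characters
        "suffix2": tok[-2:],   # last two characters
        "length": len(tok),
    }
    # Window of five tokens: previous two, current, next two.
    for offset in range(-2, 3):
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"token[{offset}]"] = tokens[j]
            feats[f"init_cap[{offset}]"] = tokens[j][:1].isupper()
    # Assumed bigram definitions (the table does not spell them out).
    if i + 1 < len(tokens):
        feats["bigram_current"] = tok + " " + tokens[i + 1]
    if i > 0:
        feats["bigram_previous"] = tokens[i - 1] + " " + tok
    return feats


sentence = ["Urea", "relieves", "hyperkeratotic", "lesion", "."]
features = token_features(sentence, 2)
print(features)
```

A dictionary per token is the input shape that common CRF toolkits accept, which is why the sketch returns one; the choice of toolkit is left open here.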
Features in baseline and experimental systems
| | |||||
|---|---|---|---|---|---|
| | | | X | | |
| X | | | | | |
| X | X | X | | | |
| X | X | X | X | | |
| X | X | X | X | ||
The token and span level results of the experiment
| Level | Measure | Precision | Recall | F-measure |
|---|---|---|---|---|
| Token | MC_B | 0.661 | 0.575 | 0.615 |
| | MC_I | 0.890 | 0.21 | 0.338 |
| | Overall | 0.775 | 0.391 | 0.476 |
| Span | Exact Match | 0.827 | 0.506 | 0.628 |
| Token | MC_B | 0.804 | 0.733 | 0.767 |
| | MC_I | 0.811 | 0.473 | 0.597 |
| | Overall | 0.808 | 0.603 | 0.681 |
| Span | Exact Match | 0.888 | 0.698 | 0.781 |
| Token | MC_B | 0.910 | 0.782 | 0.841 |
| | MC_I | 0.936 | 0.660 | 0.773 |
| | Overall | 0.919 | 0.731 | 0.814 |
| Span | Exact Match | 0.886 | 0.766 | 0.822 |
| | Left Match | 0.915 | 0.862 | 0.888 |
| | Right Match | 0.941 | 0.877 | 0.908 |
| | Partial Match | 0.982 | 0.849 | 0.911 |
| Token | MC_B | 0.928 | 0.831 | 0.877 |
| | MC_I | 0.942 | 0.686 | 0.793 |
| | Overall | 0.933 | 0.771 | 0.844 |
| Span | Exact Match | 0.900 | 0.812 | 0.854 |
| | Left Match | 0.931 | 0.841 | 0.886 |
| | Right Match | 0.944 | 0.852 | 0.900 |
| | Partial Match | 0.985 | 0.889 | 0.935 |
| Token | MC_B | 0.912 | 0.787 | 0.845 |
| | MC_I | 0.936 | 0.663 | 0.775 |
| | Overall | 0.920 | 0.735 | 0.817 |
| Span | Exact Match | 0.886 | 0.769 | 0.824 |
| | Left Match | 0.917 | 0.844 | 0.879 |
| | Right Match | 0.944 | 0.861 | 0.900 |
| | Partial Match | 0.982 | 0.852 | 0.912 |
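The span-level match types in the table above can be made concrete with a short sketch. It assumes the three numeric columns are precision, recall, and F-measure, in that order (the first MC_B row is consistent with F being the harmonic mean of the first two values), and it assumes the following definitions, which the table itself does not state: exact means both boundaries agree, left means the start offsets agree, right means the end offsets agree, and partial means the spans overlap at all. Spans are hypothetical `(start, end)` token offsets with an exclusive end.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def spans_match(pred, gold, mode):
    """Compare two (start, end) spans under an assumed match definition."""
    ps, pe = pred
    gs, ge = gold
    if mode == "exact":
        return ps == gs and pe == ge
    if mode == "left":
        return ps == gs
    if mode == "right":
        return pe == ge
    if mode == "partial":
        return ps < ge and gs < pe  # any overlap
    raise ValueError(f"unknown mode: {mode}")


def span_scores(predicted, gold, mode):
    """Return (precision, recall, F-measure) for one match mode."""
    matched_pred = sum(any(spans_match(p, g, mode) for g in gold) for p in predicted)
    matched_gold = sum(any(spans_match(p, g, mode) for p in predicted) for g in gold)
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical toy data, not taken from the annotated corpus.
gold_spans = [(0, 2), (5, 7), (10, 12)]
predicted_spans = [(0, 2), (5, 6)]

for mode in ("exact", "left", "right", "partial"):
    print(mode, span_scores(predicted_spans, gold_spans, mode))
```

On the toy data the second prediction has the correct start but a wrong end, so it counts under left and partial matching but not under exact or right matching. The relaxed match types in the table can score higher than exact match for the same reason.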
The statistical significance tests (with p-values < 0.007)
| 0.039 | |||
| 0.147 | 0.0409 | 0.0264 | |
| 0.215 |
Note 1: Statistical significance was tested using approximate randomization [32].
Note 2: Bonferroni correction was applied because of multiple comparisons [33].
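The two procedures named in the notes can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' test setup: the test statistic is the difference in mean per-document score between two systems, the per-document scores are hypothetical, and the number of comparisons passed to the Bonferroni helper is an arbitrary example value.

```python
import random


def approximate_randomization(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided paired approximate randomization test on mean score difference.

    scores_a and scores_b are paired per-document scores of two systems.
    In each trial the two systems' scores are swapped per document with
    probability 0.5, and the shuffled difference is compared with the
    observed one.
    """
    rng = random.Random(seed)
    n = len(scores_a)
    observed = abs(sum(scores_a) - sum(scores_b)) / n
    at_least_as_extreme = 0
    for _ in range(trials):
        sum_a = sum_b = 0.0
        for a, b in zip(scores_a, scores_b):
            if rng.random() < 0.5:
                a, b = b, a
            sum_a += a
            sum_b += b
        # Small tolerance guards against floating-point noise.
        if abs(sum_a - sum_b) / n >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimated p-value above zero.
    return (at_least_as_extreme + 1) / (trials + 1)


def bonferroni_threshold(alpha, num_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / num_comparisons


# Hypothetical per-document scores for two systems.
system_a = [0.9] * 20
system_b = [0.1] * 20
p_value = approximate_randomization(system_a, system_b, trials=2000)
threshold = bonferroni_threshold(0.05, 5)  # 5 comparisons is an example value
print(p_value, threshold, p_value < threshold)
```

The randomization test makes no distributional assumptions about the scores, which is one reason it is often used for comparing NLP systems. Dividing the significance level by the number of comparisons is what lowers the threshold below the usual 0.05.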