Yadan Fan, Lu He, Serguei V S Pakhomov, Genevieve B Melton, Rui Zhang.
Abstract
Clinical notes contain rich information about supplement use that is critical for detecting adverse interactions between supplements and prescribed medications. To correctly identify patients who currently take a supplement or took it in the past, it is important to know the context in which the supplement is mentioned in clinical notes. We applied text mining methods to automatically classify supplement use into four status categories: Continuing (C), Discontinued (D), Started (S), and Unclassified (U). We manually classified 1,300 sentences into these categories and split them into a training set (1,000 sentences) and a testing set (300 sentences). We evaluated 7 types of feature sets and 5 algorithms. The best model (SVM with unigrams, bigrams, and indicator words within a certain distance) achieved F-measures of 0.906, 0.913, 0.914, and 0.715 for status C, D, S, and U, respectively, on the testing set. This study demonstrates the feasibility of using text mining methods to classify supplement use status from clinical notes.
Year: 2017 PMID: 28815149 PMCID: PMC5543386
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Supplement name and lexical variations.
| Supplement | Lexical variations |
|---|---|
| alfalfa | alfalfa |
| echinacea | echinacea |
| fish oil | fish oil |
| garlic | garlic |
| ginger | ginger |
| ginkgo | ginkgo, ginko, gingko, ginkoba |
| ginseng | ginseng |
| melatonin | melatonin |
| St John’s Wort | St. John’s Wort, St. Johns Wort, St. John Wort |
| Vitamin E | Vitamin E, Vit E |
| bilberry | bilberry |
| biotin | biotin |
| black cohosh | black cohosh |
| Coenzyme Q10 | Coenzyme Q 10, Coenzyme Q-10, CoQ10 |
| cranberry | cranberry |
| dandelion | dandelion |
| flax seed | flax seed, flaxseed |
| folic acid | folic acid |
| glucosamine | glucosamine |
| glutamine | glutamine |
| kava | kava |
| lecithin | lecithin |
| milk thistle | milk thistle |
| saw palmetto | saw palmetto |
| turmeric | turmeric |
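
The lexical variations above lend themselves to a simple dictionary lookup. The following is a minimal sketch, not the authors' implementation: it assumes case-insensitive whole-word matching with a regular expression, and the names `VARIANTS`, `LOOKUP`, `PATTERN`, and `find_supplements` are hypothetical. Only a few rows of the table are included for brevity.

```python
import re

# Illustrative subset of the table above: canonical name -> lexical variations.
VARIANTS = {
    "ginkgo": ["ginkgo", "ginko", "gingko", "ginkoba"],
    "vitamin e": ["vitamin e", "vit e"],
    "coenzyme q10": ["coenzyme q 10", "coenzyme q-10", "coq10"],
    "flax seed": ["flax seed", "flaxseed"],
}

# Reverse lookup: each lowercase variation -> its canonical supplement name.
LOOKUP = {form: canon for canon, forms in VARIANTS.items() for form in forms}

# Try longer variations first so that, e.g., "ginkoba" is preferred over "ginko".
_alternatives = sorted(LOOKUP, key=len, reverse=True)
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(a) for a in _alternatives) + r")\b",
    re.IGNORECASE,
)


def find_supplements(sentence):
    """Return (character offset, canonical name) for each supplement mention."""
    return [(m.start(), LOOKUP[m.group(1).lower()]) for m in PATTERN.finditer(sentence)]


mentions = find_supplements("Pt takes Ginko and CoQ10 daily")
print(mentions)
```

Mapping every variation to one canonical name keeps downstream features independent of spelling differences such as "ginko" versus "gingko".
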
Indicator words and their corresponding morphological forms.
| Indicator word | Morphological forms |
|---|---|
| start | start, starts, started, starting |
| begin | begin, begins, began, begun |
| restart | restart, restarts, restarted, restarting |
| resume | resume, resumes, resumed, resuming |
| initiate | initiate, initiates, initiated, initiating |
| add | add, adds, added, adding |
| try | try, tries, tried, trying |
| increase | increase, increases, increased, increasing |
| decrease | decrease, decreases, decreased, decreasing |
| reduce | reduce, reduces, reduced, reducing |
| lower | lower, lowers, lowered, lowering |
| continue | continue, continues, continued, continuing, continuation |
| take | take, takes, took, taking, taken |
| consume | consume, consumes, consumed, consuming |
| tolerate | tolerate, tolerates, tolerated, tolerating |
| stop | stop, stops, stopped, stopping |
| discontinue | discontinue, discontinues, discontinued, discontinuing |
| hold | hold, holds, held, holding |
| recommend | recommend, recommends, recommended, recommendation |
| advise | advise, advises, advised, advising |
| avoid | avoid, avoids, avoided, avoiding |
| deny | deny, denies, denied, denying |
| decline | decline, declines, declined, declining |
| refuse | refuse, refuses, refused, refusing, refusal |
| neg | no, not, never |
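
One way to use this table is to map every surface form in a sentence back to its indicator word and treat the result as a feature. The sketch below is an assumption about how such a lookup could work, not the authors' code; `INDICATORS`, `FORM_TO_INDICATOR`, and `indicator_features` are hypothetical names, whitespace tokenization is assumed, and only a few rows of the table are included.

```python
# Illustrative subset of the table above: indicator word -> morphological forms.
INDICATORS = {
    "start": ["start", "starts", "started", "starting"],
    "stop": ["stop", "stops", "stopped", "stopping"],
    "continue": ["continue", "continues", "continued", "continuing", "continuation"],
    "take": ["take", "takes", "took", "taking", "taken"],
    "neg": ["no", "not", "never"],
}

# Reverse lookup: each morphological form -> its indicator word.
FORM_TO_INDICATOR = {form: base for base, forms in INDICATORS.items() for form in forms}


def indicator_features(tokens):
    """Return the sorted set of indicator words whose forms occur in the tokens."""
    found = {
        FORM_TO_INDICATOR[token.lower()]
        for token in tokens
        if token.lower() in FORM_TO_INDICATOR
    }
    return sorted(found)


print(indicator_features("She never stopped taking melatonin".split()))
```

Collapsing "stopped" and "stopping" into a single `stop` feature lets a classifier share evidence across tenses instead of learning each form separately.
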
Weighted average F-measure of the 5 classifiers at different token distances.
| Distance from the supplement mention | SVM | Maximum Entropy | Naïve Bayes | Decision Tree | Random Forest |
|---|---|---|---|---|---|
| L1 R1 | 0.630 | 0.625 | 0.462 | 0.582 | 0.605 |
| L2 R2 | 0.741 | 0.699 | 0.472 | 0.683 | 0.740 |
| L3 R3 | 0.780 | 0.758 | 0.526 | 0.736 | 0.783 |
| L4 R4 | 0.751 | 0.541 | 0.786 | ||
| L5 R5 | 0.740 | ||||
L1 R1 indicates a context window that extends from 1 word to the left through 1 word to the right of the supplement mention.
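
The window notation can be illustrated with a short sketch. It assumes whitespace tokenization and a single-token supplement mention, neither of which is specified here; `context_window` and `bigrams` are hypothetical helper names, and the example sentence is invented for illustration.

```python
def context_window(tokens, mention_index, left, right):
    """Return (left_context, right_context) around a single-token mention.

    left and right are the number of tokens kept on each side, so
    left=3, right=3 corresponds to the L3 R3 setting.
    """
    start = max(0, mention_index - left)
    left_context = tokens[start:mention_index]
    right_context = tokens[mention_index + 1 : mention_index + 1 + right]
    return left_context, right_context


def bigrams(tokens):
    """Return adjacent token pairs, usable as bigram features."""
    return list(zip(tokens, tokens[1:]))


# Invented example sentence; the mention is the token "ginseng".
tokens = "Patient was advised to stop ginseng before her surgery".split()
mention_index = tokens.index("ginseng")

left_ctx, right_ctx = context_window(tokens, mention_index, left=3, right=3)
print(left_ctx, right_ctx)
print(bigrams(left_ctx))
```

A wider window captures more indicator words but also more unrelated tokens, which is the trade-off the distance comparison in the table explores.
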
Performance of the 5 classifiers on the 7 types of feature sets on the training set (P: precision, R: recall, F: F-measure).
| Feature Sets | SVM P | SVM R | SVM F | Maximum Entropy P | Maximum Entropy R | Maximum Entropy F | Naïve Bayes P | Naïve Bayes R | Naïve Bayes F | Decision Tree P | Decision Tree R | Decision Tree F | Random Forest P | Random Forest R | Random Forest F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Type 0 | 0.771 | 0.751 | 0.748 | 0.778 | 0.762 | 0.760 | 0.726 | 0.738 | 0.718 | 0.717 | 0.789 | 0.763 | 0.753 | ||
| Type 1 | 0.799 | 0.772 | 0.760 | 0.735 | 0.734 | 0.734 | 0.659 | 0.639 | 0.596 | 0.792 | 0.759 | 0.743 | 0.791 | 0.767 | 0.756 |
| Type 2 | 0.839 | 0.838 | 0.838 | 0.813 | 0.794 | 0.786 | 0.635 | 0.579 | 0.497 | 0.790 | 0.785 | 0.772 | 0.753 | 0.747 | |
| Type 3 | 0.789 | 0.790 | 0.788 | 0.784 | 0.783 | 0.783 | 0.750 | 0.711 | 0.792 | ||||||
| Type 4 | 0.799 | 0.798 | 0.798 | 0.793 | 0.761 | 0.751 | 0.678 | 0.612 | 0.541 | 0.761 | 0.750 | 0.745 | 0.816 | 0.794 | 0.786 |
| Type 5 | 0.653 | 0.584 | 0.499 | 0.792 | 0.778 | 0.772 | 0.810 | 0.808 | |||||||
| Type 6 | 0.829 | 0.829 | 0.828 | 0.750 | 0.749 | 0.749 | 0.681 | 0.647 | 0.613 | 0.791 | 0.787 | 0.784 | 0.813 | ||
The performance of SVM classifier with type 5 feature set on the training data in 4 status categories.
| Category | Precision | Recall | F-measure |
|---|---|---|---|
| Continuing (C) | 0.835 | 0.864 | |
| Discontinued (D) | 0.878 | ||
| Started (S) | 0.842 | 0.842 | 0.842 |
| Unclassified (U) | 0.806 | 0.769 | 0.787 |
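
The F-measure column can be checked against the precision and recall columns. The sketch below assumes the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall), which is not stated explicitly here; `f_measure` is a hypothetical helper name.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rows from the training-set table above that report all three values.
rows = {
    "Started (S)": (0.842, 0.842, 0.842),
    "Unclassified (U)": (0.806, 0.769, 0.787),
}

for category, (p, r, reported_f) in rows.items():
    computed = round(f_measure(p, r), 3)
    print(category, computed, reported_f)
```

Under this assumption both complete rows reproduce the reported values to three decimals. Small discrepancies can still appear in other rows because the published precision and recall are themselves rounded.
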
The performance of SVM classifier with type 5 feature set on the testing data in 4 status categories.
| Category | Precision | Recall | F-measure |
|---|---|---|---|
| Continuing (C) | 0.869 | 0.906 | |
| Discontinued (D) | 0.894 | 0.913 | |
| Started (S) | 0.896 | 0.932 | 0.914 |
| Unclassified (U) | 0.786 | 0.657 | 0.715 |