Muhammad Zubair Asghar1, Maria Qasim1, Shakeel Ahmad2, Syeda Rabail Zahra1, Fazal Masud Kundi1.
Abstract
The exponential increase in health-related online reviews has played a pivotal role in the development of sentiment analysis systems for extracting and analyzing user-generated health reviews about a drug or medication. Existing general-purpose opinion lexicons, such as SentiWordNet, have limited coverage of health-related terms, which creates problems for the development of health-based sentiment analysis applications. In this work, we present a hybrid approach to creating a health-related, domain-specific lexicon for the efficient classification and scoring of health-related user sentiments. The proposed approach is based on a bootstrapping model, a dataset of health reviews, and corpus-based sentiment detection and scoring. In each iteration, the vocabulary of the lexicon is updated automatically from an initial seed cache, irrelevant words are filtered, words are declared as medical or non-medical entries, and finally a sentiment class and score are assigned to each word. The results obtained demonstrate the efficacy of the proposed technique.
Keywords: Bootstrapping; Domain specific; Health-related; Hybrid approach; Lexicon; Sentiment classification; Sentiment detection
Year: 2016 PMID: 27504237 PMCID: PMC4954801 DOI: 10.1186/s40064-016-2809-x
Source DB: PubMed Journal: Springerplus ISSN: 2193-1801
Fig. 1 Generic framework of SentiHealth
Fig. 2 Detailed architecture of SentiHealth
Sample patient review on health forum
| Drug | Avelox |
| Reason | Sinus infection |
| Rating | 4 |
| Sex | Female |
| Age | 38 |
| Comments | My major concern is the unusual headache. It’s severe and I am afraid of getting a stroke or aneurism from this. Still getting some bad leg pains, dizziness, crushing fatigue and some stomach upset. It did cure the infection, but these side effects make me unhappy |
| Dosage | 400 mg |
| Duration | 12 days (1 × d) |
POS tagging for a given sentence
| The/DT main/JJ problem/NN with/IN using/VBG Glucophage/NN is/VBZ severe/JJ ankle/NN swelling/VBG |
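The `word/TAG` notation in the tagged sentence can be read back into (word, tag) pairs with a few lines of Python. This is only a minimal sketch for illustration: the function name `parse_tagged` and the choice to pull out adjectives and nouns are assumptions, not part of the original method.

```python
from typing import List, Tuple


def parse_tagged(sentence: str) -> List[Tuple[str, str]]:
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        # rpartition splits on the LAST slash, so words containing '/' survive.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


# Tagged sentence copied from the table above.
tagged = (
    "The/DT main/JJ problem/NN with/IN using/VBG "
    "Glucophage/NN is/VBZ severe/JJ ankle/NN swelling/VBG"
)

pairs = parse_tagged(tagged)

# Hypothetical downstream step: collect candidate opinion and entity words.
adjectives = [word for word, tag in pairs if tag.startswith("JJ")]
nouns = [word for word, tag in pairs if tag.startswith("NN")]

print(adjectives)
print(nouns)
```

Splitting on the last slash keeps the sketch robust to tokens that themselves contain a slash.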
Parsed sentence
Initial seed cache (HL-1)
| Type | No. of words (Pos) | No. of words (Neg) | No. of words (Neu) |
|---|---|---|---|
| Noun | 300 | 300 | 300 |
| Verb | 250 | 250 | 250 |
| Adjectives | 200 | 200 | 200 |
| Adverbs | 150 | 150 | 150 |
Partial list of entries from intermediate lexicon (HL_2)
| Word | WL repository | Co-reference PMI score |
|---|---|---|
| Depression | Downheartedness | 1 |
| | Desperation | 0.856 |
| | Abasement | 0.628 |
| | Discouragement | 0.353 |
| | Gloominess | 0.335 |
| | Sadness | 0.323 |
| | Sorrow | 0.3 |
| | Trouble | 0.2 |
Filtered lexicon (HL-2)
| Word | WL repository | Co-reference PMI score |
|---|---|---|
| Depression | Downheartedness | 1 |
| | Desperation | 0.856 |
| | Abasement | 0.628 |
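The step from the intermediate lexicon to the filtered lexicon can be sketched as a simple score cutoff. The scores below are copied from the tables above. The threshold of 0.5 is a hypothetical value chosen only because it separates the retained entries from the dropped ones; the source does not state the actual cutoff or filtering rule.

```python
from typing import Dict

# Candidate related words for the seed "Depression", with the
# co-reference PMI scores listed in the intermediate lexicon table.
candidates: Dict[str, float] = {
    "Downheartedness": 1.0,
    "Desperation": 0.856,
    "Abasement": 0.628,
    "Discouragement": 0.353,
    "Gloominess": 0.335,
    "Sadness": 0.323,
    "Sorrow": 0.3,
    "Trouble": 0.2,
}

# ASSUMPTION: hypothetical cutoff, not taken from the source.
THRESHOLD = 0.5


def filter_candidates(scores: Dict[str, float],
                      threshold: float = THRESHOLD) -> Dict[str, float]:
    """Keep only the words whose score is at or above the threshold."""
    return {word: s for word, s in scores.items() if s >= threshold}


filtered = filter_candidates(candidates)
print(filtered)
```

With this assumed cutoff, the three retained words match the filtered lexicon table.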
Sample list of words and their UMLS codes
| S. no | Word | UMLS code |
|---|---|---|
| 1 | Sore throat | C0242429 |
| 2 | Heart burn | C0018834 |
| 3 | Stomach pain | C1963242 |
| 4 | Abdominal cramps | C0000729 |
| 5 | Dizzy | C0012833 |
| 6 | Increased blood pressure | C2917273 |
| 7 | Nausea | C0549206 |
| 8 | Weak | C0004093 |
| 9 | Moral distress | C1828099 |
| 10 | Diarrhea | C0011991 |
| 11 | Sleep walking | C0037672 |
| 12 | Dysenteric diarrhea | C0277526 |
| 13 | Weight gain adverse event | C2911647 |
| 14 | Heart attack | C0027051 |
| 15 | Heatstroke | C0018843 |
| 16 | Silent migraine | C3203712 |
Sample SWN entries
| POS | ID | Pos score | Neg score | Obj score | Synset terms | Gloss definition |
|---|---|---|---|---|---|---|
| Verb | 02109404 | 0.5 | 0 | 0.5 | tolerate#3 | Have a tolerance for a poison or strong drug or pathogen or environmental condition; “The patient does not tolerate the anti-inflammatory drugs we gave him” |
| Adjective | 02114746 | 0 | 0.5 | 0.5 | infective#2 infectious#1 | Caused by infection or capable of causing infection; “viruses and other infective agents” |
| Noun | 14259133 | 0 | 0.625 | 0.375 | temporal_arteritis#1 | Inflammation of the temporal arteries; characterized by headaches and difficulty chewing and (sometimes) visual impairment |
| Adverb | 00275035 | 0 | 0 | 1 | asleep#1 | Into a sleeping state; “he fell asleep” |
POS represents part-of-speech, ID represents SWN-entry’s identification key; Pos Score, Neg Score, and Obj Score represent positive, negative and objective scores of entry respectively. Synset Terms represents synonyms set of the entry, and Gloss definition represents textual explanation of the entry
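In SentiWordNet, the objective score of an entry is one minus the sum of its positive and negative scores, which matches every row in the table above. The sketch below models an entry and derives that score. The class name `SwnEntry` and the rule used to label polarity (whichever of the positive or negative score is larger, otherwise objective) are illustrative assumptions, not the paper's classification rule.

```python
from dataclasses import dataclass


@dataclass
class SwnEntry:
    pos_tag: str
    entry_id: str
    pos_score: float
    neg_score: float
    terms: str

    @property
    def obj_score(self) -> float:
        """Objective score: 1 - (positive + negative)."""
        return 1.0 - (self.pos_score + self.neg_score)

    @property
    def polarity(self) -> str:
        """ASSUMPTION: simple comparison rule, for illustration only."""
        if self.pos_score > self.neg_score:
            return "positive"
        if self.neg_score > self.pos_score:
            return "negative"
        return "objective"


# Entries copied from the sample table above.
entries = [
    SwnEntry("Verb", "02109404", 0.5, 0.0, "tolerate#3"),
    SwnEntry("Adjective", "02114746", 0.0, 0.5, "infective#2 infectious#1"),
    SwnEntry("Noun", "14259133", 0.0, 0.625, "temporal_arteritis#1"),
    SwnEntry("Adverb", "00275035", 0.0, 0.0, "asleep#1"),
]

for entry in entries:
    print(entry.terms, entry.obj_score, entry.polarity)
```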
Selected list of words and their polarity class
| Word (uni/bi-gram) | SWN class | Polarity class detection using Eq. |
|---|---|---|
| Sore throat | Not found | −ive |
| Atheroma | Objective | −ive |
| Fibroelastosis | Not found | −ive |
| Tuberculin | Objective | +ive |
| Puberty | Objective | +ive |
| Heatstroke | Objective | −ive |
| Diarrhea | Objective | −ive |
| Heart burn | Not found | −ive |
Selected list of words with modified polarity scores
| S. no | Word (unigram/bigram) | UMLS code | SWN score | Modified score using Eq. |
|---|---|---|---|---|
| 1 | Sore throat | C0242429 | Not found | Negative (1.2) |
| 2 | Heart burn | C0018834 | Not found | Negative (1.6) |
| 3 | Stomach Pain | C1963242 | Not found | Negative (1.9) |
| 4 | Abdominal cramps | C0000729 | Not found | Negative (2.0) |
| 5 | Atheroma | C0264956 | Neutral (1) | Negative (2.8) |
| 6 | Fibroelastosis | C0016038 | Not found | Negative (1.7) |
| 7 | Trypanosomiasis | C0041227 | Not found | Negative (2.1) |
| 8 | Tuberculin | C0022415 | Neutral (1) | Positive (2.2) |
| 9 | Diarrhea | C0011991 | Neutral (1) | Negative (1.7) |
| 10 | | C0034011 | Neutral (1) | Positive (2) |
| 11 | Heatstroke | C0018843 | Neutral (1) | Negative (2.5) |
| 12 | Silent migraine | C3203712 | Neutral (1) | Negative (1.7) |
Fig. 3 Accuracy-based performance evaluation of the proposed method
Fig. 4 Lexicon-wise accuracy comparison with respect to vote-switch algorithm
Performance comparison with other methods
| Work | Dataset | Noise Reduction | Features | Approach | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|---|
| Goeuriot et al. ( | 25,000 reviews | Filtering | Unigram and bigram | Hybrid (lexicon-based + information gain) | 0.76 | 0.52 | 0.62 |
| Asghar et al. ( | 15,000 reviews | Filtering, tokenization, stop word removal, stemming | Unigram, bigram and trigram | Supervised (revised mutual information) | 0.78 | 0.64 | 0.64 |
| Demiroz et al. ( | 9000 reviews | Filtering, stop word removal | Bag of words | Supervised (delta scoring) | 0.75 | 0.48 | 0.58 |
| Our work | 26,060 reviews | Filtering, tokenization, stop word removal, lemmatization, spell correction, co-reference resolution | Unigram, bigram and trigram | Hybrid (bootstrapping + corpus-based) | 0.89 | 0.79 | 0.83 |
All evaluation measures are as reported in the respective works.
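The F-measure is conventionally the harmonic mean of precision and recall. Assuming that definition (the table does not say which variant each work used), the sketch below recomputes it from the precision and recall columns. The recomputed values need not reproduce the reported column exactly, since the inputs are already rounded and the original works may have averaged per class or per fold.

```python
from typing import Dict, Tuple


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the comparison table above.
reported: Dict[str, Tuple[float, float]] = {
    "Goeuriot et al.": (0.76, 0.52),
    "Asghar et al.": (0.78, 0.64),
    "Demiroz et al.": (0.75, 0.48),
    "Our work": (0.89, 0.79),
}

recomputed = {
    name: round(f_measure(p, r), 2) for name, (p, r) in reported.items()
}

for name, value in recomputed.items():
    print(f"{name}: {value}")
```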