Irena Spasić, Pete Burnap, Mark Greenwood, Michael Arribas-Ayllon.
Abstract
The authors present a system developed for the 2011 i2b2 Challenge on Sentiment Classification, whose aim was to automatically classify sentences in suicide notes using a scheme of 15 topics, mostly emotions. The system combines machine learning with a rule-based methodology. The features used to represent each sentence were based on the lexico-semantic properties of individual words, together with regular expressions that capture patterns of word usage across different topics. A naïve Bayes classifier was trained on features extracted from the training data, which consisted of 600 manually annotated suicide notes. Classification was then performed using the naïve Bayes classifier together with a set of pattern-matching rules. Classification performance was evaluated against a manually prepared gold standard of 300 suicide notes, in which 1,091 of 2,037 sentences were associated with a total of 1,272 annotations. The competing systems were ranked using the micro-averaged F-measure as the primary evaluation metric. Our system achieved an F-measure of 53% (with 55% precision and 52% recall), which was significantly better than the average performance of 48.75% achieved by the 26 participating teams.
Keywords: natural language processing; naïve Bayes classifier; sentiment analysis; topic classification
Year: 2012 PMID: 22879764 PMCID: PMC3409485 DOI: 10.4137/BII.S8945
Source DB: PubMed Journal: Biomed Inform Insights ISSN: 1178-2226
Figure 1. Conceptual architecture of the proposed topic classification approach.
Semantic categories with examples taken from the corresponding lexicons.
| Semantic category | Lexicon size | Examples |
| --- | --- | --- |
| Family | 67 | |
| Personal names | 57 | |
| People | 54 | |
| Occupation | 112 | |
| Health | 282 | |
| Religion | 31 | |
| Pejorative words | 349 | |
Topics and examples of their lexical clues.
| Topic | Lexicon size | Examples of lexical clues |
| --- | --- | --- |
| Abuse | 13 | |
| Anger | 21 | |
| Blame | 10 | |
| Fear | 33 | |
| Forgiveness | 2 | |
| Guilt | 26 | |
| Happiness_peacefulness | 16 | |
| Hopefulness | 10 | |
| Hopelessness | 79 | |
| Information | 60 | |
| Instructions | 102 | |
| Love | 49 | |
| Pride | 5 | |
| Sorrow | 10 | |
| Thankfulness | 45 | |
Topics and examples of pattern-matching rules.
| Topic | Rule | Example | Count |
| --- | --- | --- | --- |
| Abuse | (drive|drove).{0,1} me.{0,10} (mad|crazy|insane|mind) | Your indifference, …, just plain | 7 |
| Anger | see .* hell | I will | 53 |
| Blame | ^(?!.*(never|not|n’t|to)) let me down | It looks like John | 47 |
| Fear | (not|n’t|no).{0,10} courage | I have | 10 |
| Forgiveness | forgive (you|him|her|anyone) | Tell him I | 3 |
| Guilt | I (ask|beg).{0,10} forgiveness | | 75 |
| Happiness_peacefulness | (not|n’t) mourn | Please do | 12 |
| Hopefulness | meet in heaven | I hope someday we will | 22 |
| Hopelessness | I ca(n’t|not).* any longer | | 206 |
| Information | I have.{0,20} (\$|dollar) | I believe | 71 |
| Instructions | I leave.{0,50} to my | | 209 |
| Love | (loads|lots) of love | | 57 |
| Pride | proud of | I’m | 2 |
| Sorrow | my heart.{0,20} br | | 11 |
| Thankfulness | I.{0,10} appreciate | | 26 |
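As an illustration of how rules of this kind can be applied, the following minimal Python sketch compiles a handful of the regular expressions listed above and reports which topics match a sentence. It is not the authors' implementation: the function name `match_topics`, the case-sensitive matching, and the choice of one rule per topic are assumptions made for the example, and the full rule set is not reproduced in this record.

```python
import re

# One rule per topic, copied from the table above. The complete rule
# set is not shown in the source, so this subset is illustrative only.
RULES = {
    "Fear": r"(not|n’t|no).{0,10} courage",
    "Forgiveness": r"forgive (you|him|her|anyone)",
    "Hopefulness": r"meet in heaven",
    "Love": r"(loads|lots) of love",
    "Thankfulness": r"I.{0,10} appreciate",
}

# Assumption: rules are matched case-sensitively, exactly as written.
COMPILED = {topic: re.compile(pattern) for topic, pattern in RULES.items()}


def match_topics(sentence):
    """Return the sorted list of topics whose rule matches the sentence."""
    return sorted(t for t, rx in COMPILED.items() if rx.search(sentence))


if __name__ == "__main__":
    for s in ["Tell him I forgive him",
              "I hope someday we will meet in heaven",
              "I really appreciate it, lots of love"]:
        print(s, "->", match_topics(s))
```

A sentence may match rules for several topics at once, which fits a task in which one sentence can carry more than one annotation.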
Features used to represent a sentence.
| Feature | No. of features | Description |
| --- | --- | --- |
| Sentence length | 1 | …tokens in a sentence. |
| POS | 21 | …tokens with a given POS tag. |
| WordNet lexical domains | 45 | …tokens mapped to a given lexical domain. |
| Lexicons (semantic categories) | 7 | …tokens found in a given lexicon. |
| Lexicons (topic clues) | 15 | …tokens found in a given lexicon. |
| Occupation words | 24 | …occurrences of a given word in a sentence. |
| Informative words (MI) | 153 | …occurrences of a given word in a sentence. |
| WordNet-Affect lexicon | 58 | …occurrences of words mapped to a given emotion directly or indirectly through inheritance. |
| SentiWordNet lexicon | 6 | Positive/negative polarity scores of individual words in a sentence aggregated as their maximum, average or sum. |
| Positive/negative polarity | 4 | …positive/negative words found in a given lexicon normalized by sentence length. |
| Negation | 1 | …occurrences of negation words in a sentence. |
| Pronouns | 2 | …occurrences of personal and possessive pronouns (1st person vs. all other persons). |
| Pattern-matching rules | 16 | …patterns successfully matched to a sentence. |
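To make a few of these feature types concrete, here is a rough Python sketch that computes sentence length, a negation count, the two pronoun counts and per-lexicon token counts for a single sentence. Everything specific in it is an assumption for illustration: the tokenizer, the word lists, the two tiny lexicons and the function names are hypothetical stand-ins, since the actual lexicon contents and preprocessing are not given here.

```python
import re

# Hypothetical word lists; the real lexicons are not reproduced in the source.
NEGATION = {"not", "no", "never", "cannot"}
FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}
OTHER_PERSON = {"you", "your", "yours", "he", "him", "his", "she", "her",
                "hers", "it", "its", "they", "them", "their", "theirs"}
LEXICONS = {
    "family": {"mother", "father", "wife", "husband", "son", "daughter"},
    "religion": {"god", "heaven", "pray", "church"},
}


def tokenize(sentence):
    """Lower-case word tokenizer that keeps contractions such as "don't"."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)?", sentence.lower())


def is_negation(token):
    """Treat listed negation words and n't contractions as negations."""
    return token in NEGATION or token.endswith(("n't", "n’t"))


def extract_features(sentence):
    """Map a sentence to a dictionary of simple count-based features."""
    tokens = tokenize(sentence)
    features = {
        "sentence_length": len(tokens),
        "negation": sum(is_negation(t) for t in tokens),
        "pronouns_first": sum(t in FIRST_PERSON for t in tokens),
        "pronouns_other": sum(t in OTHER_PERSON for t in tokens),
    }
    for name, words in LEXICONS.items():
        features["lexicon_" + name] = sum(t in words for t in tokens)
    return features


if __name__ == "__main__":
    print(extract_features("I don't want my mother to pray for me"))
```

Count-valued dictionaries of this shape could then be passed to a classifier such as naïve Bayes, although how the original system encoded its features for training is not stated here.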
The three test run results.
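| Run | Threshold | Predictions | Precision (%) | Recall (%) | F-measure (%) |
| --- | --- | --- | --- | --- | --- |
| Run 1 | 0.00 | 1,250 | 53.68 | 52.75 | 53.21 |
| Run 2 | 0.30 | 1,199 | 54.96 | 51.81 | 53.34 |
| Run 3 | 0.50 | 1,095 | 55.71 | 47.96 | 51.54 |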
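The last three values in each run are consistent with precision, recall and their harmonic mean (the F-measure). The short Python sketch below recomputes the F-measure from the reported precision and recall as a sanity check. The function name and the tolerance are choices made for this example. Because the inputs are already rounded to two decimals, the recomputed value can differ from the reported one in the last digit.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F-measure %) for the three test runs.
RUNS = {
    "Run 1": (53.68, 52.75, 53.21),
    "Run 2": (54.96, 51.81, 53.34),
    "Run 3": (55.71, 47.96, 51.54),
}

if __name__ == "__main__":
    for name, (p, r, reported) in RUNS.items():
        print(f"{name}: recomputed {f_measure(p, r):.3f}, reported {reported}")
```

Across the three runs, precision rises while recall falls, and the middle run gives the best balance between the two.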
The evaluation results achieved during training and testing phases.
| Topic | TP (train) | FP (train) | FN (train) | P % (train) | R % (train) | TP (test) | FP (test) | FN (test) | P % (test) | R % (test) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Abuse | 6 | 23 | 3 | 20.69 | 66.67 | 0 | 11 | 5 | 0.00 | 0.00 |
| Anger | 25 | 28 | 44 | 47.17 | 36.23 | 7 | 10 | 19 | 41.18 | 26.92 |
| Blame | 56 | 32 | 51 | 63.64 | 52.34 | 4 | 28 | 41 | 12.50 | 8.89 |
| Fear | 11 | 8 | 13 | 57.89 | 45.83 | 4 | 8 | 9 | 33.33 | 30.77 |
| Forgiveness | 5 | 3 | 1 | 62.50 | 83.33 | 2 | 4 | 6 | 33.33 | 25.00 |
| Guilt | 110 | 92 | 96 | 54.46 | 53.40 | 54 | 45 | 63 | 54.55 | 46.16 |
| Happiness_peacefulness | 15 | 6 | 10 | 71.43 | 60.00 | 6 | 4 | 10 | 60.00 | 37.50 |
| Hopefulness | 17 | 12 | 30 | 58.62 | 36.17 | 2 | 9 | 36 | 18.18 | 5.26 |
| Hopelessness | 286 | 107 | 168 | 72.77 | 63.00 | 121 | 74 | 108 | 62.05 | 52.84 |
| Information | 143 | 91 | 149 | 61.11 | 48.97 | 44 | 57 | 60 | 43.56 | 42.31 |
| Instructions | 580 | 263 | 224 | 68.80 | 72.14 | 224 | 165 | 156 | 57.58 | 58.95 |
| Love | 245 | 96 | 50 | 71.85 | 83.05 | 139 | 65 | 57 | 68.14 | 70.92 |
| Pride | 11 | 6 | 4 | 64.71 | 73.33 | 1 | 0 | 8 | 100.00 | 11.11 |
| Sorrow | 19 | 16 | 31 | 54.29 | 38.00 | 4 | 7 | 30 | 36.36 | 11.76 |
| Thankfulness | 82 | 39 | 8 | 67.77 | 91.11 | 41 | 51 | 4 | 44.57 | 91.11 |
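In each row, the percentages are consistent with reading the three preceding integers as true positives, false positives and false negatives. The following Python sketch, offered under that assumption, derives per-topic precision and recall from such counts and shows how micro-averaging pools the counts across topics before computing the metrics. The function names are hypothetical, and the three rows used are only a subset, so the micro-averaged figures it prints are illustrative and are not the system's overall scores.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall as fractions; 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean of precision and recall, guarding against 0/0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_average(counts):
    """Pool TP, FP and FN over all topics, then compute P, R and F once."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    p, r = precision_recall(tp, fp, fn)
    return p, r, f_measure(p, r)


# (TP, FP, FN) for three topics, taken from the first group of columns.
TRAINING_COUNTS = {
    "Abuse": (6, 23, 3),
    "Love": (245, 96, 50),
    "Thankfulness": (82, 39, 8),
}

if __name__ == "__main__":
    for topic, c in TRAINING_COUNTS.items():
        p, r = precision_recall(*c)
        print(f"{topic}: P={100 * p:.2f}% R={100 * r:.2f}%")
    p, r, f = micro_average(TRAINING_COUNTS)
    print(f"micro-average (3 topics only): P={100 * p:.2f}% R={100 * r:.2f}% F={100 * f:.2f}%")
```

Because micro-averaging sums raw counts, frequent topics such as Instructions and Love weigh far more heavily in the final score than rare ones such as Pride or Forgiveness.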