Sunghwan Sohn, Manabu Torii, Dingcheng Li, Kavishwar Wagholikar, Stephen Wu, Hongfang Liu.
Abstract
This paper describes the sentiment classification system developed by the Mayo Clinic team for the 2011 I2B2/VA/Cincinnati Natural Language Processing (NLP) Challenge. The sentiment classification task is to assign any pertinent emotions to each sentence in suicide notes. We implemented three systems trained on the suicide notes provided by the I2B2 challenge organizer: a machine learning system, a rule-based system, and a combination of the two. The machine learning system was trained on re-annotated data in which apparently inconsistent emotion assignments were adjusted. We then tested the machine learning methods (RIPPER and multinomial Naïve Bayes classifiers), the manual pattern matching rules, and the combination of the two systems on determining the emotions within sentences. The combination of the machine learning and rule-based systems performed best, with a micro-averaged F-score of 0.5640.
Keywords: machine learning; natural language processing; sentiment classification; suicidal emotion
Year: 2012 PMID: 22879759 PMCID: PMC3409488 DOI: 10.4137/BII.S8961
Source DB: PubMed Journal: Biomed Inform Insights ISSN: 1178-2226
Emotions and their annotation guidelines.
| Emotion | Annotation guideline |
| Abuse | Was abused verbally, physically, mentally ... |
| Anger | Is angry with someone ... |
| Blame | Is blaming someone ... |
| Fear | Is afraid of something ... |
| Guilt | Feels guilt ... |
| Hopelessness | Feels hopeless ... |
| Sorrow | Feels sorrow ... |
| Forgiveness | Is forgiving someone ... |
| Happiness_peacefulness | Is feeling happy or peaceful ... |
| Hopefulness | Has hope for future ... |
| Love | Feels love for someone ... |
| Pride | Feels pride ... |
| Thankfulness | Is thanking someone ... |
| Instructions | Giving directions on what to do next |
| Information | Giving practical information where things stand |
Figure 1. Statistics of 600 suicide notes in the training set.
Figure 2. Summary of pattern matching rules.
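Result statistics on the test set (TP, true positives; FN, false negatives; FP, false positives).
| Emotion | System 1 TP | System 1 FN | System 1 FP | System 2 TP | System 2 FN | System 2 FP | System 3 TP | System 3 FN | System 3 FP |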
| Instructions (382) | 254 | 128 | 135 | 254 | 128 | 135 | 254 | 128 | 135 |
| Hopelessness (229) | 130 | 99 | 79 | 123 | 106 | 85 | 143 | 86 | 119 |
| Love (201) | 133 | 68 | 53 | 135 | 66 | 54 | 143 | 58 | 74 |
| Guilt (117) | 58 | 59 | 49 | 49 | 68 | 42 | 64 | 53 | 57 |
| Information (104) | 51 | 53 | 79 | 51 | 53 | 79 | 51 | 53 | 79 |
| Thankfulness (45) | 31 | 14 | 20 | 38 | 7 | 34 | 39 | 6 | 34 |
| Blame (45) | 1 | 44 | 3 | 4 | 41 | 4 | 4 | 41 | 7 |
| Hopefulness (38) | 0 | 38 | 0 | 1 | 37 | 2 | 1 | 37 | 2 |
| Sorrow (34) | 0 | 34 | 1 | 0 | 34 | 12 | 0 | 34 | 12 |
| Anger (26) | 1 | 25 | 0 | 3 | 23 | 3 | 3 | 23 | 3 |
| Happiness_peacefulness (16) | 3 | 13 | 0 | 3 | 13 | 6 | 3 | 13 | 6 |
| Fear (13) | 2 | 11 | 4 | 3 | 10 | 5 | 3 | 10 | 5 |
| Pride (9) | 1 | 8 | 0 | 1 | 8 | 0 | 1 | 8 | 0 |
| Forgiveness (8) | 0 | 8 | 0 | 0 | 8 | 0 | 0 | 8 | 0 |
| Abuse (5) | 0 | 5 | 0 | 0 | 5 | 0 | 0 | 5 | 0 |
Note: In the Emotion column, the number in parentheses is the count of the given emotion in the gold standard.
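The per-emotion scores reported for the test set can be recomputed from these counts. The following is a minimal sketch, assuming each system's three columns are read as true positives, false negatives, and false positives (an interpretation that reproduces the reported scores, not something the table states); the helper name `prf` is hypothetical.

```python
def prf(tp: int, fn: int, fp: int) -> tuple[float, float, float]:
    """Precision, recall, and F-score from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-score is the harmonic mean of precision and recall.
    f_score = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f_score


# Hopelessness, first system: 130 | 99 | 79, read as TP | FN | FP (assumed).
hopelessness = prf(tp=130, fn=99, fp=79)
print([round(x, 3) for x in hopelessness])  # [0.622, 0.568, 0.594]
```

Under this reading, TP + FN equals the gold-standard count in parentheses (130 + 99 = 229 for Hopelessness).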
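Evaluation results on the test set.
| Emotion | System 1 Prec | System 1 Rec | System 1 F-sco | System 2 Prec | System 2 Rec | System 2 F-sco | System 3 Prec | System 3 Rec | System 3 F-sco |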
| Instructions | 0.653 | 0.665 | 0.659 | 0.653 | 0.665 | 0.659 | 0.653 | 0.665 | 0.659 |
| Hopelessness | 0.622 | 0.568 | 0.594 | 0.591 | 0.537 | 0.563 | 0.546 | 0.624 | 0.582 |
| Love | 0.715 | 0.662 | 0.687 | 0.714 | 0.672 | 0.692 | 0.659 | 0.711 | 0.684 |
| Guilt | 0.542 | 0.496 | 0.518 | 0.538 | 0.419 | 0.471 | 0.529 | 0.547 | 0.538 |
| Information | 0.392 | 0.490 | 0.436 | 0.392 | 0.490 | 0.436 | 0.392 | 0.490 | 0.436 |
| Thankfulness | 0.608 | 0.689 | 0.646 | 0.528 | 0.844 | 0.650 | 0.534 | 0.867 | 0.661 |
| Blame | 0.250 | 0.022 | 0.041 | 0.500 | 0.089 | 0.151 | 0.364 | 0.089 | 0.143 |
| Hopefulness | 0 | 0 | 0 | 0.333 | 0.026 | 0.049 | 0.333 | 0.026 | 0.049 |
| Sorrow | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Anger | 1.000 | 0.038 | 0.074 | 0.500 | 0.115 | 0.188 | 0.500 | 0.115 | 0.188 |
| Happiness_peacefulness | 1.000 | 0.188 | 0.316 | 0.333 | 0.188 | 0.240 | 0.333 | 0.188 | 0.240 |
| Fear | 0.333 | 0.154 | 0.211 | 0.375 | 0.231 | 0.286 | 0.375 | 0.231 | 0.286 |
| Pride | 1.000 | 0.111 | 0.200 | 1.000 | 0.111 | 0.200 | 1.000 | 0.111 | 0.200 |
| Forgiveness | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Abuse | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Note: Micro-averaged scores are obtained by using a global count of each emotion and averaging these sums.
Abbreviations: Prec, precision; Rec, recall; F-sco, F-score.
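To make the micro-averaging note concrete, the sketch below pools the counts over all 15 emotions before computing a single precision, recall, and F-score. It uses the third system's counts from the result statistics table, again assuming the columns are TP, FN, and FP; the names `SYSTEM3_COUNTS` and `micro_average` are hypothetical.

```python
# (TP, FN, FP) per emotion for the third system, copied from the result
# statistics table; the TP | FN | FP column reading is an assumption.
SYSTEM3_COUNTS = {
    "Instructions": (254, 128, 135),
    "Hopelessness": (143, 86, 119),
    "Love": (143, 58, 74),
    "Guilt": (64, 53, 57),
    "Information": (51, 53, 79),
    "Thankfulness": (39, 6, 34),
    "Blame": (4, 41, 7),
    "Hopefulness": (1, 37, 2),
    "Sorrow": (0, 34, 12),
    "Anger": (3, 23, 3),
    "Happiness_peacefulness": (3, 13, 6),
    "Fear": (3, 10, 5),
    "Pride": (1, 8, 0),
    "Forgiveness": (0, 8, 0),
    "Abuse": (0, 5, 0),
}


def micro_average(counts: dict) -> tuple:
    """Pool counts across all emotions, then compute one P/R/F triple."""
    tp = sum(c[0] for c in counts.values())
    fn = sum(c[1] for c in counts.values())
    fp = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


precision, recall, f_score = micro_average(SYSTEM3_COUNTS)
print(f"{f_score:.4f}")  # 0.5640
```

Because the counts are pooled first, frequent emotions such as Instructions dominate the micro-averaged score, while rare emotions with zero scores (Forgiveness, Abuse) have little effect.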
Emotion statistics in the test set.
| | Emot | Esent | Emot/Esent | Emot/Tsent |
| Gold standard | 1272 | 1098 | 1.16 | 0.61 |
| System 1 (ML) | 1088 | 889 | 1.22 | 0.52 |
| System 2 (rules) | 1126 | 907 | 1.24 | 0.54 |
| System 3 (union) | 1242 | 984 | 1.26 | 0.60 |
Notes: Emot is the number of emotions, Esent is the number of sentences that contain emotion, Tsent (=2086) is the total number of sentences in the test set.
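As an illustration of how a union-style combination and these statistics relate, the sketch below merges two systems' per-sentence label sets and then counts Emot, Esent, and the two ratios. It is only a sketch under stated assumptions: the toy predictions, the function names `union_labels` and `emotion_stats`, and the reading of the last two table columns as Emot/Esent and Emot/Tsent are all hypothetical and are not taken from the authors' implementation.

```python
def union_labels(ml, rules):
    """Per-sentence union of two systems' predicted emotion sets."""
    return [a | b for a, b in zip(ml, rules)]


def emotion_stats(predictions):
    """Return (Emot, Esent, Emot/Esent, Emot/Tsent) for a list of label sets."""
    emot = sum(len(labels) for labels in predictions)      # total emotions assigned
    esent = sum(1 for labels in predictions if labels)     # sentences with >= 1 emotion
    tsent = len(predictions)                               # all sentences
    per_esent = emot / esent if esent else 0.0
    per_tsent = emot / tsent if tsent else 0.0
    return emot, esent, per_esent, per_tsent


# Toy predictions for four sentences (hypothetical, not challenge data).
ml_pred = [{"Love"}, set(), {"Guilt"}, set()]
rule_pred = [{"Love", "Thankfulness"}, {"Instructions"}, set(), set()]

combined = union_labels(ml_pred, rule_pred)
print(emotion_stats(combined)[:2])  # (4, 3)

# Ratios for the System 3 row, from Emot = 1242, Esent = 984, Tsent = 2086.
print(round(1242 / 984, 2), round(1242 / 2086, 2))  # 1.26 0.6
```

A union can only add labels relative to either input system, which is consistent with System 3 assigning more emotions (1242) than System 1 (1088) or System 2 (1126).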