Anastazia Zunic, Padraig Corcoran, Irena Spasic.
Abstract
BACKGROUND: Sentiment analysis (SA) is a subfield of natural language processing whose aim is to automatically classify the sentiment expressed in a free text. It has found practical applications across a wide range of societal contexts including marketing, economy, and politics. This review focuses specifically on applications related to health, which is defined as "a state of complete physical, mental, and social well-being and not merely the absence of disease or infirmity."
Keywords: machine learning; natural language processing; sentiment analysis; text mining
Year: 2020 PMID: 32012057 PMCID: PMC7013658 DOI: 10.2196/16023
Source DB: PubMed Journal: JMIR Med Inform
Research questions.
| ID | Question |
| RQ1 | What are the major sources of data? |
| RQ2 | What is the originally intended purpose of spontaneously generated narratives? |
| RQ3 | What are the roles of their authors within health and care? |
| RQ4 | What are their demographic characteristics? |
| RQ5 | What areas of health and well-being are discussed? |
| RQ6 | What are the practical applications of SAa? |
| RQ7 | What methods have been used to perform SA? |
| RQ8 | What is the state-of-the-art performance of SA? |
| RQ9 | What resources are available to support SA related to health and well-being? |
aSA: sentiment analysis.
Inclusion criteria.
| ID | Criterion |
| IN1 | The input text represents spontaneously generated narrative. |
| IN2 | The input text discusses topics related to health and well-being. |
| IN3 | The input text captures the perspective of an individual personally affected by issues related to health and well-being (eg, patient or carer) rather than that of a health care professional. |
| IN4 | Sentiment is analyzed automatically using natural language processing. |
Exclusion criteria.
| ID | Criterion |
| EX1 | Sentiment analysis is performed in a language other than English. |
| EX2 | The article is written in a language other than English. |
| EX3 | The article is not peer reviewed. |
| EX4 | The article does not describe an original study. |
| EX5 | The article is published before January 1, 2000. |
| EX6 | The full text of the article is not freely available to the academic community. |
Quality assessment criteria.
| ID | Criterion |
| QA1 | Are the aims of the research clearly defined? |
| QA2 | Is the study methodologically sound? |
| QA3 | Is the method explained in sufficient detail to reproduce the results? |
| QA4 | Were the results evaluated systematically? |
Figure 1. Flow diagram of the literature review process.
Data extraction framework.
| Item | Description |
| Data | Provenance, purpose, selection criteria, size, and use. |
| Topic | General topic discussed in the given dataset including medical conditions and treatments. |
| Author | Author (data creator) demographics and their role in health care. |
| Application | Downstream application of SAa results. |
| Method | Type of SA method used, feature selection/extraction, and any resources used to support implementation of the method. |
| Evaluation | Measures used to evaluate the results, specific results reported, baseline method used, and improvements over the baseline (if any). |
aSA: Sentiment analysis.
Health-related websites and networks.
| Website | Description | Used in |
| RateMDs [ | Allows users to post reviews about health care staff and services. | [ |
| WebMD [ | Publishes content about health and care topics, including fora that allow users to create or participate in support groups and discussions. | [ |
| Ask a Patient [ | Allows users to share their personal experience about drug treatments. | [ |
| DrugLib.com [ | Allows users to rate and review prescription drugs. | [ |
| Breastcancer.org [ | A breast cancer community of 218,615 members in 81 fora discussing 154,832 topics. | [ |
| MedHelp [ | Allows users to share their personal experiences and evidence-based information across 298 topics related to health and well-being. | [ |
| DailyStrength [ | A social networking service that allows users to create support groups across 34 categories related to health and well-being. | [ |
| Cancer Survivors Network [ | A social networking service that connects users whose lives have been affected by cancer and allows them to share personal experience and expressions of caring. | [ |
| NHS website [ | The primary public-facing website of the United Kingdom’s National Health Service (NHS) with more than 43 million visits per month. It provides health-related information and allows patients to provide feedback on services. | [ |
| DiabetesDaily [ | A social networking service that connects people affected by diabetes where they can trade advice and learn more about the condition. | [ |
The roles of authors with respect to health and well-being.
| Role | Description | Studies |
| Sufferer | A person who is affected by a medical condition. | [ |
| Addict | A person who is addicted to a particular substance. | [ |
| Patient | A person receiving or registered to receive medical treatment. | [ |
| Carer | A family member or friend who regularly looks after a sick or disabled person. | [ |
| Suicide victim | A person who has committed suicide. | [ |
Recording and accessing demographic factors.
| Platform | Age | Gender | Education level | Income level | Marital status | Occupation | Religion | Used in |
| ?a/Ub | ?/Nc | Xd/N | X/N | X/N | X/N | X/N | [ | |
| Me/U | M/U | ?/U | X/N | ?/U | ?/U | ?/U | [ | |
| M/U | M/U | X/N | X/N | X/N | X/N | X/N | [ | |
| YouTube | M/U | ?/U | X/N | X/N | X/N | X/N | X/N | [ |
| X/N | X/N | X/N | X/N | X/N | X/N | X/N | [ | |
| Amazon | X/N | X/N | X/N | X/N | X/N | X/N | X/N | [ |
| RateMDs | X/N | X/N | X/N | X/N | X/N | X/N | X/N | [ |
| WebMD | M/U | ?/U | X/N | X/N | X/N | X/N | X/N | [ |
| Ask a Patient | M/Yf | M/Y | X/N | X/N | X/N | X/N | X/N | [ |
| DrugLib.com | M/Y | M/Y | X/N | X/N | X/N | X/N | X/N | [ |
| Breastcancer.org | M/U | ?/U | X/N | X/N | X/N | ?/U | X/N | [ |
| MedHelp | ?/U | M/U | X/N | X/N | X/N | X/N | X/N | [ |
| DailyStrength | M/U | M/U | X/N | X/N | X/N | X/N | X/N | [ |
| Cancer Survivors Network | ?/U | ?/U | X/N | X/N | X/N | X/N | X/N | [ |
| NHSg website | ?/U | ?/U | ?/U | X/N | X/N | X/N | X/N | [ |
| DiabetesDaily | ?/U | ?/U | X/N | X/N | X/N | ?/U | X/N | [ |
a? indicates optional recording.
bU: user-specific access.
cN: not accessible online.
dX: recording not available.
eM: recording mandatory.
fY: accessible online.
gNHS: National Health Service.
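The two-part codes in the table above (for example `M/U`) can be read mechanically from the footnotes. The following is a minimal sketch of such a decoder; the function name `decode_cell` and the dictionary names are illustrative assumptions, not part of the review.

```python
# Legend taken from the table footnotes: the code before the slash says how a
# platform records a demographic factor, the code after it says how the
# recorded value can be accessed.
RECORDING = {"M": "mandatory", "?": "optional", "X": "not available"}
ACCESS = {
    "Y": "accessible online",
    "U": "user-specific access",
    "N": "not accessible online",
}


def decode_cell(cell: str) -> tuple[str, str]:
    """Translate a cell such as 'M/U' into (recording, access) descriptions."""
    recording_code, access_code = cell.strip().split("/")
    return RECORDING[recording_code], ACCESS[access_code]


# Cells as they appear in the DrugLib.com row (age, then education level).
print(decode_cell("M/Y"))  # ('mandatory', 'accessible online')
print(decode_cell("X/N"))  # ('not available', 'not accessible online')
```

Keeping the two legends in separate dictionaries mirrors the table's layout, where recording and access are independent dimensions.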
Health-related problems studied by sentiment analysis.
| Problem | Studied in |
| Cancer | [ |
| Mental health | [ |
| Chronic condition | diabetes [ |
| Eating disorder | obesity [ |
| Addiction | smoking [ |
| Pain | [ |
| Infectious diseases | Ebola [ |
| Quality of life | [ |
Health care treatments studied by sentiment analysis.
| Treatment | Studied in |
| Medication | [ |
| Vaccine | [ |
| Surgery | [ |
| Orthodontic | [ |
| Physician | [ |
| Health care | [ |
Machine learning algorithms used in sentiment analysis related to health and well-being.
| Algorithm | Description | Used in |
| Support vector machine | Builds a classification model as a hyperplane that maximizes the margin between the training instances of 2 classes. | [ |
| Naïve Bayes classifier | A probabilistic classifier based on Bayes' theorem and an assumption that features are mutually independent given the class. | [ |
| Maximum entropy | A probabilistic classifier based on the principle of maximum entropy. | [ |
| Conditional random fields | A method for labeling and segmenting structured data based on a conditional probability distribution over label sequences given an observation sequence. | [ |
| Decision tree learning | A method that uses inductive inference to approximate a discrete-valued target function, which is represented by a decision tree. | [ |
| Random forest | An ensemble learning method that fits multiple decision trees on various data samples and combines them to improve accuracy and control overfitting. | [ |
| AdaBoost | Combines multiple weak classifiers into a strong one by iteratively retraining and reweighting them based on the accuracy achieved. | [ |
| k-nearest neighbors | A nonparametric, instance-based learning algorithm based on the labels of the k nearest training instances. | [ |
| Logistic regression | A method for modeling the log odds of a dichotomous outcome as a linear combination of the predictor variables. | [ |
| Convolutional neural network | A feed-forward neural network that learns to extract salient features that are useful for the given prediction task. Convolutions are used to filter features by using nonlinear functions. Pooling can then be used to reduce the dimensionality. | [ |
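To make one of the table entries concrete, the sketch below implements a small naïve Bayes sentiment classifier in plain Python. It is an illustration only: the training sentences are hypothetical, add-one smoothing and whitespace tokenization are assumptions chosen for brevity, and the reviewed studies did not necessarily work this way.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(text, model):
    """Return the class with the highest log posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training data, not taken from any reviewed dataset.
train = [
    ("this drug helped my pain", "positive"),
    ("great doctor very caring", "positive"),
    ("terrible side effects and pain", "negative"),
    ("the treatment made me feel worse", "negative"),
]
model = train_naive_bayes(train)
print(predict("caring doctor helped", model))  # positive
print(predict("worse side effects", model))    # negative
```

Summing log probabilities rather than multiplying raw probabilities keeps the computation numerically stable, and the per-word sum is exactly where the independence assumption enters.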
Implementations of machine learning algorithms.
| Library | Description | Used in |
| SVMlight [ | An implementation of SVMsa in C. | [ |
| PySVMLight [ | A Python binding to SVMlight (see above). | [ |
| LIBLINEAR (LIBSVM) [ | Integrated software for support vector classification, regression, and distribution estimation. It supports multiclass classification. | [ |
| Weka [ | A Java library that implements a collection of machine learning algorithms. | [ |
| scikit-learn [ | A Python library that implements a collection of machine learning algorithms. | [ |
| Keras [ | A high-level neural networks APIb written in Python. | [ |
| TextBlob [ | A Python library that supports NLPc and implements a collection of machine learning algorithms. | [ |
aSVM: support vector machine.
bAPI: application programming interface.
cNLP: natural language processing.
Classification performance.
| Study | Algorithma | Accuracy (%) | Precision (%) | Recall (%) | F-measure (%) |
| [ | SVMb | 70 | —c | — | — |
| [ | SVM | — | 55.72 | 54.72 | 55.22 |
| [ | SVM | — | — | — | 53.31 |
| [ | SVM | — | 49 | 46 | 47 |
| [ | SVM + CRFd + rules | — | 60.1 | 36.8 | 45.6 |
| [ | SVM | — | 51.9 | 48.59 | 50.18 |
| [ | KNNe, | — | 49.92 | 50.55 | 50.23 |
| [ | SVM + rules | — | 41.79 | 55.03 | 47.5 |
| [ | SVM, | — | 53.8 | 53.9 | 53.8 |
| [ | rules | — | 45.98 | 44.57 | 45.27 |
| [ | SVM | — | 46 | 54 | 49.41 |
| [ | SVM | — | 55.09 | 48.51 | 51.59 |
| [ | NBg, rules, | — | 57.09 | 55.74 | 56.4 |
| [ | NB + rules | — | 54.96 | 51.81 | 53.34 |
| [ | SVM, | — | — | — | 50.38 |
| [ | ME | — | 57.89 | 49.61 | 53.43 |
| [ | — | 56 | 62 | 59 | |
| [ | SVM + NB + MEh + CRF + lexicon | — | 58.21 | 64.93 | 61.39 |
| [ | LRi | — | 51.14 | 47.64 | 49.33 |
| [ | SVM, | 88.6 | — | — | 89 |
| [ | NB | — | — | — | 54 |
| [ | AdaBoost | 79.2 | — | — | — |
| [ | SVM, AdaBoost, | 79.4 | — | — | — |
| [ | AdaBoost | 79.2 | — | — | — |
| [ | NB, ME, | — | 85.25 | 65 | 73.76 |
| [ | NB, | — | 84.52 | 66.67 | 74.54 |
| [ | SVM | 88.6 | — | — | — |
| [ | SVM, LR, | 79.2 | — | — | — |
| [ | — | 71.47 | 66.91 | 67.23 | |
| [ | SVM, | — | — | — | 84 |
| [ | — | 63 | 82 | 73 | |
| [ | — | 75.8 | 74.3 | 73 | |
| [ | CNNj | 76.6 | 73.7 | 76.6 | 73.6 |
| [ | SVM + NB | 82.04 | — | — | — |
| [ | — | 68.73 | 51.42 | 58.83 | |
| [ | SVM | — | 78.6 | 78.6 | 78.6 |
| [ | LR, | 75 | 76.1 | — | — |
| [ | NB | 80 | — | — | — |
| [ | N-gram | — | 81.93 | 81.13 | 81.24 |
| [ | — | — | — | 82.4 | |
| [ | SVM, KNN, | — | 58 | 99 | 73 |
aWhere multiple algorithms were compared, the performance of the best performing algorithm is indicated by italic typeset.
bSVM: support vector machine.
cNot applicable.
dCRF: conditional random fields.
eKNN: k-nearest neighbors.
fDT: decision tree.
gNB: naïve Bayes classifier.
hME: maximum entropy.
iLR: logistic regression.
jCNN: convolutional neural network.
kRF: random forest.
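The F-measure column can be related to the precision and recall columns. Assuming a study reports the balanced F-measure (F1), it is the harmonic mean of the two; the sketch below checks this against three rows of the table above. Not every row reproduces this way, presumably because some studies average per-class scores or round intermediate values, so treat this as a sanity check rather than a description of how each study computed its figures.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) copied from three rows of the table.
rows = [
    (55.72, 54.72, 55.22),
    (60.1, 36.8, 45.6),
    (78.6, 78.6, 78.6),
]
for p, r, reported in rows:
    print(f"P={p} R={r} computed={f_measure(p, r):.2f} reported={reported}")
```

Because the harmonic mean is dominated by the smaller value, a row such as precision 60.1 and recall 36.8 yields an F-measure much closer to the recall than to the arithmetic mean.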
Overall classification performance.
| Aggregated value | Accuracy (%) | Precision (%) | Recall (%) | F-measure (%) |
| Minimum | 70.00 | 41.79 | 36.8 | 45.27 |
| Maximum | 88.6 | 85.25 | 99 | 89 |
| Median | 79.20 | 57.89 | 54.87 | 54.81 |
| Mean | 79.80 | 61.54 | 60.23 | 61.52 |
| Standard deviation | 5.39 | 12.63 | 14.55 | 13.15 |
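The accuracy column of this summary can be reproduced from the accuracy values listed in the "Classification performance" table. The sketch below uses Python's standard `statistics` module; the list was transcribed by hand from that table, and the use of the sample standard deviation (dividing by n - 1) is an inference from the fact that it matches the reported 5.39.

```python
import statistics

# Accuracy values (%) transcribed from the "Classification performance" table;
# rows that report no accuracy are skipped.
accuracy = [70, 88.6, 79.2, 79.4, 79.2, 88.6, 79.2, 76.6, 82.04, 75, 80]

summary = {
    "min": min(accuracy),
    "max": max(accuracy),
    "median": statistics.median(accuracy),
    "mean": round(statistics.mean(accuracy), 2),
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    "stdev": round(statistics.stdev(accuracy), 2),
}
print(summary)
```

The precision, recall, and F-measure columns could be checked the same way, although several rows in the source table have lost cells, so those lists would need to be assembled with more care.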
Lexical resources for sentiment analysis.
| Resource | Description | Used in |
| Affective Norms for English Words [ | A set of normative emotional ratings for a large number of words in terms of pleasure, arousal, and dominance. | [ |
| AFINN [ | A list of 2477 words and phrases manually rated for valence with an integer between –5 (negative) and 5 (positive). | [ |
| Harvard General Inquirer [ | A lexicon attaching syntactic, semantic, and pragmatic information to words. It includes 1915 positive and 2291 negative words. | [ |
| LabMT 1.0 [ | A list of 10,222 words with their average happiness evaluations according to users on Mechanical Turk. | [ |
| Multi-Perspective Question Answering [ | A subjectivity lexicon that provides polarity scores for approximately 8000 words. | [ |
| Emotion Lexicon (also called EmoLex) [ | A list of words and their associations with 8 basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and 2 sentiments (negative and positive). The annotations were done manually by crowdsourcing. | [ |
| OpinionKB [ | A knowledge base of indirect opinions about drugs represented by quadruples ( | [ |
| Opinion Lexicon [ | A list of around 6800 positive and negative opinion words. | [ |
| SentiSense [ | A lexicon attaching emotional category to 2190 WordNet synsets, which cover a total of 5496 words. | [ |
| SentiWordNet [ | An extension of WordNet that associates each synset with 3 sentiment scores: positivity, negativity, and objectivity. | [ |
| WordNet-Affect [ | An extension of WordNet that correlates a subset of synsets suitable to represent affective concepts with affective words. Its hierarchical structure was modelled on the WordNet hyponymy relation. | [ |
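Valence lists such as AFINN lend themselves to a simple lexicon-based scorer: look up each token and sum the ratings. The sketch below shows the idea under loud assumptions: `TOY_LEXICON` is a hypothetical handful of entries in the AFINN style (integers from -5 to 5), not an excerpt from the real list, and the scorer ignores negation, intensifiers, and multiword phrases.

```python
import re

# Hypothetical entries in the style of AFINN (integer valence from -5 to 5).
# The words and ratings are illustrative, not copied from the real resource.
TOY_LEXICON = {"good": 3, "helpful": 2, "pain": -2, "worse": -3, "terrible": -3}


def lexicon_score(text: str, lexicon: dict[str, int]) -> int:
    """Sum the valence of every token found in the lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


def polarity(score: int) -> str:
    """Map a summed valence score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for sentence in ("Helpful staff and good care", "The pain got terrible"):
    score = lexicon_score(sentence, TOY_LEXICON)
    print(sentence, "->", score, polarity(score))
```

A health-specific application would likely need domain adjustments, since a word such as "pain" may describe a symptom rather than express an opinion.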
Figure 2. The representation of the UMLS in sentiment lexica.