Irena Spasić, David Owen, Andrew Smith, Kate Button.
Abstract
BACKGROUND: The Knee injury and Osteoarthritis Outcome Score (KOOS) is an instrument used to quantify patients' perceptions of their knee condition and associated problems. It is administered as a 42-item closed-ended questionnaire in which patients self-assess five outcomes: pain, other symptoms, activities of daily living, sport and recreation activities, and quality of life. We developed KLOG as a 10-item open-ended version of the KOOS questionnaire in an attempt to obtain deeper insight into patients' opinions, including their unmet needs. However, the open-ended nature of the questionnaire incurs analytical overhead associated with the interpretation of responses. The goal of this study was to automate such analysis.
We implemented KLOSURE as a system for mining free-text responses to the KLOG questionnaire. It consists of two subsystems: one concerned with feature extraction and the other with classification of feature vectors. Feature extraction is performed by a set of four modules whose main functionalities are linguistic pre-processing, sentiment analysis, named entity recognition and lexicon lookup, respectively. The outputs produced by each module are combined into feature vectors, whose structure varies across the KLOG questions. Finally, Weka, a machine learning workbench, was used to classify the feature vectors.
Keywords: Named entity recognition; Natural language processing; Open-ended questionnaire; Patient reported outcome measure; Sentiment analysis; Text classification; Text mining
Year: 2019; PMID: 31711536; PMCID: PMC6849171; DOI: 10.1186/s13326-019-0215-3
Source DB: PubMed; Journal: J Biomed Semantics
Fig. 1. A sample of adjectives collocated with the word "pain"
Fig. 2. Two negated mentions of the word "pain"
Fig. 3. The role of the verb "to ease" in effectively negating a nominal subject (nsubj)
Fig. 4. A negated mention of the word "pain"
Feature extraction subsystem
| Module | Software | Resources | Output |
|---|---|---|---|
| linguistic pre-processing | Stanford Core NLP | language model | POS tags, dependencies |
| sentiment analysis | Stanford Core NLP | sentiment model | sentiment polarities |
| named entity recognition | MetaMap | UMLS | named entities, semantic types |
| lexicon lookup | N/A | 24 lexicons | matched items |
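
The way module outputs are combined into question-specific feature vectors can be illustrated with a minimal Python sketch. It is not the KLOSURE implementation: the lexicon names and entries, the feature names, the `Q6_SCHEMA` list and the sentiment and entity values are all hypothetical, chosen only to show a lexicon lookup feeding a vector whose layout depends on the question.

```python
from typing import Dict, List, Set

# Hypothetical lexicons; the table mentions 24 lexicons but does not list them.
LEXICONS: Dict[str, Set[str]] = {
    "negation": {"no", "not", "never"},
    "intensity": {"mild", "severe", "constant"},
}


def lexicon_lookup(tokens: List[str], lexicons: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count, per lexicon, how many tokens match one of its entries."""
    lowered = [token.lower() for token in tokens]
    return {
        name: sum(1 for token in lowered if token in entries)
        for name, entries in lexicons.items()
    }


def build_feature_vector(outputs: Dict[str, float], schema: List[str]) -> List[float]:
    """Order the combined module outputs by a per-question schema.

    Features missing from `outputs` default to 0.0, so each question can
    use its own subset of features.
    """
    return [float(outputs.get(name, 0.0)) for name in schema]


tokens = "No severe pain today".split()
outputs = {f"lex_{name}": count for name, count in lexicon_lookup(tokens, LEXICONS).items()}
outputs["sentiment"] = -1.0   # assumed polarity from a sentiment module
outputs["n_entities"] = 1.0   # assumed count from a named entity recogniser

# Hypothetical feature layout for one question.
Q6_SCHEMA = ["sentiment", "n_entities", "lex_negation", "lex_intensity"]
vector = build_feature_vector(outputs, Q6_SCHEMA)
print(vector)
```

Keeping the schema separate from the extraction step means the same module outputs can be reused for every question while only the list of selected features changes.
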
Definition of ordinal classes
| Question | 1 | 2 | 3 |
|---|---|---|---|
| Q3 | worse | same | better |
| Q4 | no | reasonably | fully |
| Q5–Q7 | none | some | severe |
| Q8–Q9 | not at all | somewhat | a lot |
| Q10 | negative | neutral | positive |
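
The ordinal classes above can be written down as a lookup table, which is convenient when converting between annotated labels and ranks 1 to 3. The sketch below is only an illustration: the names `ORDINAL_CLASSES`, `encode` and `decode` are hypothetical, and the question ranges Q5–Q7 and Q8–Q9 are expanded on the assumption that every question in a range shares the same labels.

```python
from typing import Dict, Tuple

# Labels in rank order (1, 2, 3), copied from the table of ordinal classes.
ORDINAL_CLASSES: Dict[str, Tuple[str, str, str]] = {
    "Q3": ("worse", "same", "better"),
    "Q4": ("no", "reasonably", "fully"),
    **{q: ("none", "some", "severe") for q in ("Q5", "Q6", "Q7")},
    **{q: ("not at all", "somewhat", "a lot") for q in ("Q8", "Q9")},
    "Q10": ("negative", "neutral", "positive"),
}


def encode(question: str, label: str) -> int:
    """Return the 1-based rank of a label for the given question."""
    return ORDINAL_CLASSES[question].index(label) + 1


def decode(question: str, rank: int) -> str:
    """Return the label for a 1-based rank, rejecting out-of-range ranks."""
    if not 1 <= rank <= 3:
        raise ValueError(f"rank must be 1, 2 or 3, got {rank}")
    return ORDINAL_CLASSES[question][rank - 1]


print(encode("Q6", "severe"), decode("Q8", 2))
```
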
Fig. 5. Inter-annotator agreement for questions Q3–Q10
Performance of the KLOSURE system
| Question | Topic | Classes | Method | Features | P (%) | R (%) | F (%) |
|---|---|---|---|---|---|---|---|
| Q1 | condition | 3 | MetaMap | N/A | 95.3 | 87.6 | 91.3 |
| Q2 | treatment | 4 | MetaMap | N/A | 84.9 | 61.6 | 71.4 |
| Q3 | changes | 3 | naive Bayes | 8 | 81.3 | 80.8 | 81.0 |
| Q4 | confidence | 3 | best-first decision tree | 8 | 70.1 | 67.3 | 66.9 |
| Q5 | stiffness | 3 | reduced error pruning tree | 8 | 85.3 | 79.6 | 75.6 |
| Q6 | pain | 3 | complement naive Bayes | 10 | 62.8 | 58.3 | 59.0 |
| Q7 | other symptoms | 3 | naive Bayes | 5 | 83.2 | 83.0 | 81.0 |
| Q8 | daily activities | 3 | J48 pruned tree | 14 | 77.3 | 72.3 | 73.2 |
| Q9 | other activities | 3 | random forest | 14 | 71.0 | 70.2 | 70.6 |
| Q10 | other comments | 3 | Stanford Core NLP | N/A | 75.3 | 72.3 | 73.8 |
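
As a reading aid for the P, R and F columns, the sketch below computes the standard F-measure as the harmonic mean of precision and recall, using values copied from the table. The function name `f_measure` and the `ROWS` dictionary are illustrative. For the MetaMap and Stanford Core NLP rows (Q1, Q2, Q10), the harmonic mean reproduces the reported F to one decimal place. For some classifier rows (for example Q4 and Q5) it does not, which suggests those figures were aggregated differently, for instance averaged over classes; the table itself does not say.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F) in percent, copied from the performance table.
ROWS = {
    "Q1": (95.3, 87.6, 91.3),
    "Q2": (84.9, 61.6, 71.4),
    "Q4": (70.1, 67.3, 66.9),
    "Q5": (85.3, 79.6, 75.6),
    "Q10": (75.3, 72.3, 73.8),
}

for question, (p, r, reported) in ROWS.items():
    computed = f_measure(p, r)
    agrees = abs(computed - reported) < 0.05
    print(f"{question}: harmonic mean {computed:.1f}, reported {reported:.1f}, agrees: {agrees}")
```
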
Fig. 6. Evaluation results