Jaya Chaturvedi, Aurelie Mascio, Sumithra U Velupillai, Angus Roberts.
Abstract
Pain has been an area of growing interest in the past decade and is known to be associated with mental health issues. Because pain is described ambiguously in text, it presents a unique natural language processing (NLP) challenge. Understanding how pain is described in text, and using this knowledge to improve NLP tasks, would be of substantial clinical importance, yet little work has previously been done in this space. For this reason, and in order to develop an English lexicon for use in NLP applications, an exploration of pain concepts within free text was conducted. The exploratory text sources included two hospital databases, a social media platform (Twitter), and an online community (Reddit). This exploration helped select appropriate sources and inform the construction of a pain lexicon. The terms within the final lexicon were derived from three sources: literature, ontologies, and word embedding models. The lexicon was validated by two clinicians and compared with an existing 26-term pain sub-ontology and MeSH (Medical Subject Headings) terms. The final validated lexicon consists of 382 terms. It will be used in downstream NLP tasks to help select appropriate pain-related documents from electronic health record (EHR) databases, and to pre-annotate these terms to support development of an NLP application for classifying mentions of pain within the documents. The lexicon and the code used to generate the embedding models have been made publicly available.
Keywords: electronic health records; lexicon; mental health; natural language processing; pain
Year: 2021; PMID: 34966903; PMCID: PMC8710455; DOI: 10.3389/fdgth.2021.778305
Source DB: PubMed; Journal: Front Digit Health; ISSN: 2673-253X
Figure 1. Google Trends for the medical condition search term “pain” compared with the other common symptoms “fever” and “cough.” The x-axis represents time in years. The y-axis represents search interest relative to the highest point on the chart (100 is the peak popularity for the term, 50 indicates the term is half as popular, and 0 means there was insufficient data for the term).
Count of mentions of “pain”, “chronic pain,” and “-algia” per 10,000 tokens (counts for “pain” include “chronic pain” instances too).
| | | |
|---|---|---|
| Pain | 29.59 | 44.13 |
| Chronic pain | 1.22 | 4.04 |
| *Algia | 1.14 | 1.44 |
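
The normalisation behind a "mentions per 10,000 tokens" figure can be sketched as follows. This is a minimal illustration only: the whitespace tokenisation, the regular expressions, and the two toy documents are assumptions made for the example, not the procedure or data used in the study.

```python
import re

def mentions_per_10k(texts, pattern):
    """Count regex matches per 10,000 whitespace-delimited tokens."""
    regex = re.compile(pattern, flags=re.IGNORECASE)
    total_tokens = sum(len(t.split()) for t in texts)
    total_matches = sum(len(regex.findall(t)) for t in texts)
    if total_tokens == 0:
        return 0.0
    return 10_000 * total_matches / total_tokens

# Toy documents, invented for illustration only.
docs = [
    "Patient reports chronic pain in the lower back.",
    "No pain today; neuralgia resolved.",
]

# Hypothetical patterns; "pain" also matches inside "chronic pain",
# mirroring the note in the table caption.
PATTERNS = {
    "pain": r"\bpain\b",
    "chronic pain": r"\bchronic\s+pain\b",
    "*algia": r"\b\w+algia\b",
}

rates = {name: mentions_per_10k(docs, p) for name, p in PATTERNS.items()}
```

Normalising by token count rather than by document count keeps sources with very different document lengths comparable.
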
Length of text in documents containing the word “pain”, based on a random sample of 50 documents from each of the four text sources.
| | | | | |
|---|---|---|---|---|
| Average length of text (characters) | 8,144 | 3,864 | 62 | 1,065 |
| Minimum length of text (characters) | 1,155 | 165 | 11 | 139 |
| Maximum length of text (characters) | 32,767 | 9,549 | 106 | 3,598 |
Common themes around “pain” in the 50 randomly selected documents from the four data sources.
| | | | | |
|---|---|---|---|---|
| CRIS | In constant pain | Overwhelmed by chronic pain problems | Drugs to numb the pain | Chronic back pain |
| MIMIC-III | Severe pain | – | PO as needed for pain | Chronic back pain |
| | Sharp pain | Could be causing pain | Helped my back pain | Shoulder pain |
| | To live pain-free | Muscle painbuster | Joint muscle pain | |
Collocates for “pain” with frequency > 10.
| | | | |
|---|---|---|---|
| Chronic | Control | Pain | Agony |
| Back | Acute | About | Amazingly |
| Clinic | Chronic | Anyone | Achieved |
| Physical | Assessment | Back | American |
| Health | Plan | Anything | Body |
Collocates for “pain” with an MI score > 6.
| | | | |
|---|---|---|---|
| Killers (R) | Chronic (R) | Board (R) | People (L) |
| Chronic (L) | Control (L) | Certified (L) | Amp (R) |
| Fibromyalgia (R) | Complains (L) | Suboxone (L) | Get (L) |
| Ongoing (R) | Incisional (L) | Chronic (L) | Medical (L) |
| Feet (R) | Acute (L) | Doctor (R) | Suffer (L) |
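
A mutual information (MI) score for collocates can be sketched as below. The sketch assumes the simple form MI = log2(observed / expected), with expected = f(node) × f(collocate) / N, and assumes that (L) and (R) mark whether the collocate sits to the left or right of “pain”. Corpus tools differ in the exact formula (some also scale by the span size), and the source does not state which variant was used, so this is illustrative only.

```python
import math
from collections import Counter

def collocate_mi(tokens, node, window=1):
    """Return {(collocate, side): MI} for words within `window` tokens of `node`.

    side is "L" if the collocate precedes the node and "R" if it follows.
    """
    n = len(tokens)
    freq = Counter(tokens)
    cooccurrence = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j == i:
                continue
            side = "L" if j < i else "R"
            cooccurrence[(tokens[j], side)] += 1
    scores = {}
    for (word, side), observed in cooccurrence.items():
        # Expected co-occurrence count if node and collocate were independent.
        expected = freq[node] * freq[word] / n
        scores[(word, side)] = math.log2(observed / expected)
    return scores

# Toy token stream, invented for illustration only.
tokens = "chronic pain and back pain need pain killers".split()
scores = collocate_mi(tokens, "pain")
```

On a real corpus a minimum frequency cut-off is usually applied alongside the MI threshold, because MI tends to inflate scores for rare words.
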
Figure 2. Conceptual diagram of pain, created using the online tool Grafo (43).
Number of words obtained from the different sources, and parameters/elbow threshold for the embedding models.
| Source | Parameters | Elbow threshold | | | Total |
|---|---|---|---|---|---|
| Literature | – | – | 71 | 170 | 241 |
| Ontologies | | | 83 | 440 | 523 |
| UMLS | – | – | 11 | 70 | 81 |
| SNOMED-CT | – | – | 67 | 368 | 435 |
| ICD-10 | – | – | 5 | 2 | 7 |
| Embedding models | | | 171 | – | 171 |
| MIMIC-II | w2v, size = 100, window = 5, | 0.57 | 33 | – | 33 |
| MIMIC-II | w2v, size = 400, window = 5, | 0.47 | 40 | – | 40 |
| MIMIC-III | w2v, size = 100, window = 5, | 0.66 | 4 | – | 4 |
| MIMIC-III | w2v, size = 400, window = 5, | 0.47 | 12 | – | 12 |
| MIMIC-III | w2v, size = 300, window = 10, | 0.44 | 26 | – | 26 |
| MIMIC-III | FastText, size = 300, window = 10, | 0.93 | 30 | – | 30 |
| CRIS (SMI) | w2v, size = 300, window = 10, | 0.69 | 16 | – | 16 |
| PubMed | w2v, size = 200, window = 5 | 0.73 | 10 | – | 10 |
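
One way an elbow threshold could be applied to embedding similarities is sketched below. The source does not say how the elbow was located, so the heuristic here (the point farthest below the straight line joining the first and last points of the sorted similarity curve) is an assumption, and the neighbour terms and scores are invented. In practice the scores would come from a nearest-neighbour query on a trained model, for example gensim's `most_similar`.

```python
def elbow_threshold(similarities):
    """Return the similarity value at the elbow of a descending-sorted curve.

    Heuristic (an assumption, not the source's method): the elbow is the
    point with the largest vertical distance below the chord that joins
    the first and last points.
    """
    sims = sorted(similarities, reverse=True)
    n = len(sims)
    if n == 0:
        raise ValueError("need at least one similarity score")
    if n < 3:
        return sims[-1]
    first, last = sims[0], sims[-1]
    best_idx, best_gap = 0, 0.0
    for i, s in enumerate(sims):
        chord = first + (last - first) * i / (n - 1)
        gap = chord - s
        if gap > best_gap:
            best_idx, best_gap = i, gap
    return sims[best_idx]

# Hypothetical nearest neighbours of "pain" with invented cosine similarities.
neighbours = {
    "ache": 0.95, "soreness": 0.90, "nausea": 0.60,
    "fatigue": 0.58, "cough": 0.57, "fever": 0.56,
}
threshold = elbow_threshold(neighbours.values())
# Keep only terms that sit above the elbow.
kept = sorted(t for t, s in neighbours.items() if s > threshold)
```

Because the threshold depends on the shape of each model's similarity curve, different models would be expected to yield different cut-offs, which is consistent with the varying thresholds in the table.
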
Lexicon coverage.
| | | |
|---|---|---|
| Literature | 218 | 241 |
| Ontologies | 291 | 523 |
| Embeddings | 68 | 171 |
Figure 3. Venn diagrams of unique terms generated from the different sources (A), different ontologies (B), and different embedding models (C).
Figure 4. Distribution of terms within the pain lexicon.
Top 13 common pain-related terms within a cohort of patients (n = 57,008) in the CRIS database.
| | |
|---|---|
| %ache% | 54 |
| %pain% | 36 |
| %burn% | 7 |
| %sore% | 3 |
| %algia% | < 1 |
| %spasm% | < 1 |
| %dynia% | < 1 |
| %algesia | < 1 |
| colic% | < 1 |
| hurt% | < 1 |
| sciatic% | < 1 |
| tender% | < 1 |
| cramp% | < 1 |
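
The `%` characters in the terms above look like SQL `LIKE` wildcards (any run of characters); that reading is an assumption, as the source does not say how the patterns were applied. Under that assumption, a minimal sketch of matching such patterns against individual tokens follows. The helper names and toy tokens are hypothetical, and the single-character `_` wildcard of `LIKE` is not handled.

```python
import re

def like_to_regex(pattern):
    """Translate a LIKE-style pattern ('%' = any run of characters)
    into a case-insensitive regex that must match a whole token."""
    parts = [re.escape(p) for p in pattern.split("%")]
    return re.compile("^" + ".*".join(parts) + "$", flags=re.IGNORECASE)

def match_terms(tokens, patterns):
    """Map each pattern to the tokens it matches, preserving token order."""
    compiled = {p: like_to_regex(p) for p in patterns}
    return {p: [t for t in tokens if rx.match(t)] for p, rx in compiled.items()}

# Toy tokens, invented for illustration only.
tokens = ["headache", "painful", "aches", "hurting", "unhurt", "hyperalgesia"]
matches = match_terms(tokens, ["%ache%", "%pain%", "hurt%", "%algesia"])
```

A leading `%` allows prefixes (so `%ache%` catches "headache"), whereas a pattern such as `hurt%` only matches tokens that start with the stem, which is why "unhurt" is excluded.
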