Nicolas Vandenbussche, Cynthia Van Hee, Véronique Hoste, Koen Paemeleire.
Abstract
BACKGROUND: Headache medicine is largely based on detailed history taking, with physicians analysing patients' descriptions of their headaches. Natural language processing (NLP) structures and processes linguistic data into quantifiable units. In this study, we apply NLP techniques to self-reported narratives by patients with headache disorders to investigate the potential of analysing and automatically classifying human-generated text, and of information extraction, in clinical contexts.
Keywords: Cluster headache; Machine learning; Migraine; Natural language processing
Year: 2022 PMID: 36180844 PMCID: PMC9524092 DOI: 10.1186/s10194-022-01490-0
Source DB: PubMed Journal: J Headache Pain ISSN: 1129-2369 Impact factor: 8.588
Demographic characteristics and textual characteristics for the main corpus and corpus with headache attack descriptions
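| | Main corpus: all patients | Main corpus: migraine | Main corpus: cluster headache | Attack descriptions: all patients | Attack descriptions: migraine | Attack descriptions: cluster headache |
|---|---|---|---|---|---|---|
| Patients (n) | 121 | 81 | 40 | 112 | 74 | 38 |
| | 45 (13) | 43.1 (12) | 48.9 (14.2) | 45 (13) | 42 (12) | 50 (13.6) |
| | 72 (60%) | 64 (80%) | 8 (20%) | 68 (60.7%) | 61 (82.4%) | 7 (18.4%) |
| | 476 (218–765) | 474 (227–745) | 508 (198–794) | 156 (80–242) | 152 (84–242) | 156 (71–223) |
| | 231 (130–321) | 224 (131–317) | 236 (126–341) | 94 (60–131) | 89 (61–133) | 96 (55–122) |
| | 23 (10–42) | 23 (11–41) | 22 (8–40) | 7 (3–12) | 8 (4–12) | 5 (2–11) |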
Legend: Q1 Lower quartile, Q3 Upper quartile, SD Standard deviation
Thematic analysis from full texts (median proportions per text with first and third quartiles)
| | | |
|---|---|---|
| 29.5% (15.8%–46.6%) | 28.4% (16.0%–45.3%) | 34.3% (15.8%–49.7%) |
| 17.5% (4.8%–29.2%) | 15.2% (4.6%–26.9%) | 20.5% (7.8%–30.3%) |
| 11.3% (3.9%–24.8%) | 11.3% (3.9%–28.7%) | 12.7% (4.2%–23.4%) |
| 11.9% (0%–25.7%) | 13.2% (3.9%–24.4%) | 6.8% (0%–26.7%) |
| 1.4% (0%–9.3%) | 2.8% (0%–9.7%) | 0% (0%–4.4%) |
| 0% (0%–0%) | 0% (0%–0%) | 0% (0%–0%) |
| 0% (0%–0%) | 0% (0%–0%) | 0% (0%–0%) |
Fig. 1 Key words per diagnosis (red: migraine, blue: cluster headache). Legend: (*) = p < 1×10⁻², (**) = p < 1×10⁻⁵, (***) = p < 1×10⁻⁸. Abbreviations: chi2abs = absolute value of the chi-squared statistic, en = English, nl = Dutch
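As an illustration of how a chi-squared key-word score like the one named in the caption can be computed, the sketch below applies the Pearson chi-squared statistic to a 2×2 table (a given word versus all other tokens, in one corpus versus another). The function name `chi2_keyness` and its parameters are hypothetical, and the source does not state that the authors used this exact formulation.

```python
def chi2_keyness(count_a, total_a, count_b, total_b):
    """Pearson chi-squared statistic for one word across two corpora.

    count_a / count_b: occurrences of the word in corpus A / corpus B.
    total_a / total_b: total number of tokens in corpus A / corpus B.
    (Hypothetical helper; names are illustrative only.)
    """
    # Observed 2x2 table: rows = corpora, columns = (word, all other tokens).
    observed = [
        [count_a, total_a - count_a],
        [count_b, total_b - count_b],
    ]
    grand_total = total_a + total_b
    row_sums = [total_a, total_b]
    col_sums = [count_a + count_b, grand_total - (count_a + count_b)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of word and corpus.
            expected = row_sums[i] * col_sums[j] / grand_total
            if expected > 0:
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2
```

A word that is equally frequent in both corpora scores 0; the larger the score, the more strongly the word is associated with one of the two corpora.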
Lexicon-based sentiment analysis statistics of the attack descriptions
| Dataset | Sentiment distribution within full texts | Sentiment distribution within headache attack descriptions |
|---|---|---|
| All patients | 86% negative (104/121), 14% positive (17/121) | 96% negative (107/112), 4% positive (5/112) |
| Cluster headache | 85% negative (34/40), 15% positive (6/40) | 95% negative (36/38), 5% positive (2/38) |
| Migraine | 86% negative (70/81), 14% positive (11/81) | 96% negative (71/74), 4% positive (3/74) |
Experimental results for multi-class classification and cluster headache class detection
| Two Classes | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.744 | 0.7 | 0.688 | 0.732 | 0.854 | 0.832 | 0.838 | 0.856 | 0.83 | 0.838 | ||||
| 0.802 | 0.816 | 0.808 | 0.821 | 0.804 | 0.816 | 0.808 | 0.821 | 0.804 | 0.816 | 0.808 | 0.821 | ||
| 0.744 | 0.7 | 0.688 | 0.732 | 0.838 | 0.826 | 0.828 | 0.849 | 0.848 | 0.84 | 0.84 | |||
| 0.676 | 0.586 | 0.586 | 0.834 | 0.74 | 0.778 | 0.838 | 0.74 | 0.779 | |||||
| 0.71 | 0.81 | 0.754 | 0.71 | 0.81 | 0.754 | 0.71 | 0.81 | 0.754 | |||||
| 0.676 | 0.586 | 0.586 | 0.792 | 0.76 | 0.769 | 0.808 | 0.784 | ||||||
Legend: Highest accuracy scores for the two classes are boldfaced, the best F1-score for the ‘cluster headache’ class is underlined. Abbreviations: Avg Average, P Precision, R Recall
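For reference, the precision (P), recall (R) and F1 abbreviations in the legend follow the standard definitions for a single class. The sketch below computes them from true-positive, false-positive and false-negative counts; the function name `precision_recall_f1` is hypothetical and the counts used to call it are not taken from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard per-class metrics from confusion counts.

    tp: items of the class that were predicted as the class.
    fp: items of other classes that were predicted as the class.
    fn: items of the class that were predicted as another class.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is commonly reported for the smaller class in an imbalanced two-class problem.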
Experimental results for the best classifiers (SVM and LR) using leave-one-out cross-validation with n-grams features only
| Two classes | ||||
|---|---|---|---|---|
| 0.86 | 0.85 | 0.86 | ||
| 0.82 | 0.79 | 0.8 | 0.83 | |
| 0.83 | 0.79 | |||
| 0.79 | 0.68 | 0.73 | ||
Legend: Highest accuracy score for the two classes is boldfaced. The best F1-score for the ‘cluster headache’ class is underlined. Abbreviations: Avg Average, P Precision, R Recall
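Leave-one-out cross-validation, as named in the table caption, holds out each text once, trains on all remaining texts, and scores the prediction for the held-out text. The sketch below shows that loop with word n-gram count features. It makes two loud assumptions: the source does not say whether the n-grams are word or character based, and the classifier here is a toy n-gram overlap scorer standing in for the SVM and LR models, not a reimplementation of them. All function names are hypothetical.

```python
from collections import Counter


def word_ngrams(text, n_max=2):
    """Count word n-grams of length 1..n_max (assumed feature type)."""
    tokens = text.lower().split()
    grams = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams[" ".join(tokens[i:i + n])] += 1
    return grams


def train_overlap_model(texts, labels):
    """Toy stand-in for SVM/LR: pool n-gram counts per class."""
    model = {}
    for text, label in zip(texts, labels):
        model.setdefault(label, Counter()).update(word_ngrams(text))
    return model


def predict(model, text):
    """Pick the class whose pooled n-gram profile overlaps most with the text."""
    grams = word_ngrams(text)

    def score(label):
        profile = model[label]
        total = sum(profile.values()) or 1  # avoid division by zero
        return sum(count * profile[g] for g, count in grams.items()) / total

    # sorted() makes tie-breaking deterministic (alphabetical by label).
    return max(sorted(model), key=score)


def leave_one_out_accuracy(texts, labels):
    """Hold out each text once, train on the rest, and average the hits."""
    correct = 0
    for i in range(len(texts)):
        train_texts = texts[:i] + texts[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        model = train_overlap_model(train_texts, train_labels)
        if predict(model, texts[i]) == labels[i]:
            correct += 1
    return correct / len(texts)
```

With a corpus of roughly a hundred texts, leave-one-out uses almost all data for training in every fold, at the cost of retraining the model once per text.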