Jorge Carrillo-de-Albornoz, Ahmet Aker, Emina Kurtic, Laura Plaza.
Abstract
INTRODUCTION: Surveys indicate that patients, particularly those suffering from chronic conditions, benefit strongly from the information found in social networks and online forums. One challenge in accessing online health information is differentiating factual from more subjective information. In this work, we evaluate the feasibility of exploiting lexical, syntactic, semantic, network-based and emotional properties of texts to automatically classify patient-generated content into three types: "experiences", "facts" and "opinions", using machine learning algorithms. Our goal is to develop automatic methods that make online health information more accessible and useful for patients, professionals and researchers.
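To make the classification task concrete, the following is a minimal sketch of a three-way sentence classifier built on bag-of-words counts with a multinomial Naive Bayes model and add-one smoothing. It is not the authors' pipeline: the training sentences are invented for illustration, and the names `TRAIN`, `tokenize`, `train_nb` and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy, made-up training sentences (NOT taken from the paper's corpus).
TRAIN = [
    ("i started the new medication last week and felt dizzy", "experience"),
    ("my doctor diagnosed me two years ago", "experience"),
    ("antihistamines block histamine receptors", "fact"),
    ("the disease affects the digestive tract", "fact"),
    ("i think this diet is the best option", "opinion"),
    ("you should probably avoid that clinic", "opinion"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer; a real system would do far more."""
    return text.lower().split()


def train_nb(examples):
    """Collect class counts, per-class word counts and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(TRAIN)
print(predict(model, "i think this clinic is the best"))  # prints: opinion
```

The bag-of-words representation here plays the role of the lexical features only; syntactic, semantic, network-based and emotional features would be added as further inputs to the classifier.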
Year: 2019 PMID: 30625206 PMCID: PMC6326476 DOI: 10.1371/journal.pone.0209961
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Distribution of sentences into information types (“Facts”, “Experiences” and “Opinions”).
| Disease | Facts | Experiences | Opinions |
|---|---|---|---|
| Allergies | 267 | 348 | 271 |
| Crohn | 273 | 931 | 389 |
| Breast cancer | 225 | 278 | 310 |
| Total | 765 | 1,557 | 970 |
Percent inter-annotator agreement for the three factuality labels and the three diseases.
| Experience | Opinion | Fact |
|---|---|---|
| 86% | 69% | 65% |
| 88% | 65% | 72% |
| 77% | 70% | 79% |
| 84% | 68% | 72% |
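A quick arithmetic check suggests that the last row of the agreement table is the unweighted mean of the three rows above it, rounded to the nearest percent. The sketch below only verifies that arithmetic; the row labels are missing in the source, so rows are identified by position, and `column_means` is a hypothetical helper name.

```python
# Agreement percentages from the first three rows of the table
# (row labels are not available, so rows are kept in table order).
rows = [
    (86, 69, 65),
    (88, 65, 72),
    (77, 70, 79),
]
labels = ("Experience", "Opinion", "Fact")


def column_means(rows):
    """Unweighted mean of each column, rounded to the nearest integer."""
    n = len(rows)
    return tuple(round(sum(col) / n) for col in zip(*rows))


means = dict(zip(labels, column_means(rows)))
print(means)  # {'Experience': 84, 'Opinion': 68, 'Fact': 72}
```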
Feature comparison for the allergies domain.
Results are reported as accuracy (Acc), F-measure (F-1), precision (Pr) and recall (Re), in percent. Best results are indicated in bold.
| Feature | Acc | F-1 | Pr | Re |
|---|---|---|---|---|
| BoW | 56 | 54,9 | 55,5 | 56 |
| BoW+Position | 55,9 | 55 | 55,3 | 55,9 |
| BoW+Position+Net | 55,7 | 54,8 | 54,9 | 55,6 |
| BoW+Position+Net+SA | 55,8 | 55,2 | 55,2 | 55,8 |
| BoW+Position+Net+SA+Neg | 55,6 | 55,2 | 55,2 | 55,6 |
| BoW+Position+Net+SA+Neg+verb | 56 | 55,5 | 55,5 | 56 |
| BoW+Position+Net+SA+Neg+verb+POS | ||||
| NP | 41,3 | 40,9 | 45,4 | 41,3 |
| NP+Position | 40,5 | 40,8 | 42,2 | 40,5 |
| NP+Position+Net | 43,6 | 44 | 45,6 | 43,6 |
| NP+Position+Net+SA | 47,9 | 48 | 48,3 | 47,9 |
| NP+Position+Net+SA+Neg | 51 | 51,1 | 51,4 | 51 |
| NP+Position+Net+SA+Neg+verb | 52,1 | 52,2 | 52,3 | 52,1 |
| NP+Position+Net+SA+Neg+verb+POS | ||||
| ST | 48,4 | 48,5 | 48,8 | 48,4 |
| ST+Position | 48,8 | 48,7 | 48,9 | 48,8 |
| ST+Position+Net | 48,3 | 48,3 | 48,5 | 48,3 |
| ST+Position+Net+SA | 50,2 | 50,3 | 50,5 | 50,2 |
| ST+Position+Net+SA+Neg | 51,1 | 51,2 | 51,3 | 51,1 |
| ST+Position+Net+SA+Neg+verb | 51,9 | 51,9 | 52 | 51,9 |
| ST+Position+Net+SA+Neg+verb+POS | ||||
| CUI | 47,8 | 47,2 | 47 | 47,7 |
| CUI+Position | 48 | 47,6 | 47,2 | 48 |
| CUI+Position+Net | 47,6 | 47,1 | 46,9 | 47,6 |
| CUI+Position+Net+SA | 51,9 | 51,8 | 51,8 | 51,9 |
| CUI+Position+Net+SA+Neg | 52,6 | 52,5 | 52,5 | 52,6 |
| CUI+Position+Net+SA+Neg+verb | 55,9 | 55,9 | 55,9 | 55,9 |
| CUI+Position+Net+SA+Neg+verb+POS | ||||
| W2V | 60,2 | 59,9 | 59,9 | 60,2 |
| W2V+Position | 60,5 | 60,2 | 60,2 | 60,5 |
| W2V+Position+Net | 60,5 | 60,2 | 60,2 | 60,5 |
| W2V+Position+Net+SA | 62,2 | 61,9 | 61,9 | 62,2 |
| W2V+Position+Net+SA+Neg | 62,8 | 62,9 | 62,9 | 62,8 |
| W2V+Position+Net+SA+Neg+verb | 64,5 | 64,6 | 64,6 | 64,5 |
| W2V+Position+Net+SA+Neg+verb+POS |
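In the table above, accuracy and recall coincide in almost every row, which is what one would expect if precision, recall and F-measure are averaged over the three classes with weights proportional to class size. That averaging scheme is an assumption, not something stated in this record. A minimal sketch of such support-weighted metrics, using the hypothetical helper `weighted_metrics` and made-up labels:

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    acc = sum(t == p for t, p in pairs) / n
    w_pr = w_re = w_f1 = 0.0
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in pairs)
        fp = sum(t != lab and p == lab for t, p in pairs)
        fn = sum(t == lab and p != lab for t, p in pairs)
        pr = tp / (tp + fp) if tp + fp else 0.0
        re = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * pr * re / (pr + re) if pr + re else 0.0
        w = support[lab] / n  # weight = share of true instances of this class
        w_pr += w * pr
        w_re += w * re
        w_f1 += w * f1
    return {"acc": acc, "pr": w_pr, "re": w_re, "f1": w_f1}


# Made-up example: E = experience, F = fact, O = opinion.
y_true = ["E", "E", "E", "F", "F", "O"]
y_pred = ["E", "E", "F", "F", "O", "O"]
m = weighted_metrics(y_true, y_pred)
print(m)
```

With this weighting, recall is algebraically equal to accuracy, while precision and F-measure can differ from it.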
Feature comparison for the breast cancer domain.
Results are reported as accuracy (Acc), F-measure (F-1), precision (Pr) and recall (Re), in percent. Best results are indicated in bold.
| Feature | Acc | F-1 | Pr | Re |
|---|---|---|---|---|
| BoW | 62,4 | 62,1 | 62,3 | 62,4 |
| BoW+Position | 63,6 | 63,4 | 63,5 | 63,6 |
| BoW+Position+Net | 64,1 | 63,7 | 64 | 64,1 |
| BoW+Position+Net+SA | 63,6 | 63,4 | 63,5 | 63,6 |
| BoW+Position+Net+SA+Neg | 63,6 | 63,4 | 63,5 | 63,6 |
| BoW+Position+Net+SA+Neg+verb | ||||
| BoW+Position+Net+SA+Neg+verb+POS | 64 | 63,7 | 63,8 | 64 |
| NP | 51,8 | 50,2 | 51,9 | 51,8 |
| NP+Position | 51,9 | 50,4 | 52,2 | 51,9 |
| NP+Position+Net | 51,8 | 50,2 | 51,9 | 51,8 |
| NP+Position+Net+SA | 55,7 | 54 | 56,7 | 55,7 |
| NP+Position+Net+SA+Neg | 55,1 | 53,5 | 55,9 | 55,1 |
| NP+Position+Net+SA+Neg+verb | 59,2 | 57,4 | 58,8 | 59,2 |
| NP+Position+Net+SA+Neg+verb+POS | ||||
| ST | 45,6 | 44,4 | 45,4 | 45,6 |
| ST+Position | 43,7 | 42,6 | 43,3 | 43,7 |
| ST+Position+Net | 48,3 | 46,4 | 49,9 | 48,3 |
| ST+Position+Net+SA | 50,6 | 49,6 | 51,3 | 50,6 |
| ST+Position+Net+SA+Neg | 49,3 | 48,4 | 49,8 | 49,3 |
| ST+Position+Net+SA+Neg+verb | 57,3 | 56,9 | 57,1 | 57,3 |
| ST+Position+Net+SA+Neg+verb+POS | ||||
| CUI | 52 | 51,5 | 51,7 | 52 |
| CUI+Position | 52,9 | 52,2 | 52,5 | 52,9 |
| CUI+Position+Net | 53,3 | 52,7 | 52,8 | 53,3 |
| CUI+Position+Net+SA | 54,5 | 53,9 | 54,1 | 54,5 |
| CUI+Position+Net+SA+Neg | 55 | 54,4 | 54,6 | 55 |
| CUI+Position+Net+SA+Neg+verb | 60,3 | 59,8 | 60,1 | 60,3 |
| CUI+Position+Net+SA+Neg+verb+POS | ||||
| W2V | 65,8 | 65,8 | 65,7 | 65,8 |
| W2V+Position | 65,9 | 65,9 | 65,9 | 65,9 |
| W2V+Position+Net | 65,1 | 65 | 65,1 | 65,1 |
| W2V+Position+Net+SA | 66,4 | 66,4 | 66,4 | 66,4 |
| W2V+Position+Net+SA+Neg | 66,1 | 66,1 | 66,1 | 66,1 |
| W2V+Position+Net+SA+Neg+verb | ||||
| W2V+Position+Net+SA+Neg+verb+POS |
Fig 1. Feature comparison for the allergy domain.
Fig 3. Feature comparison for the breast cancer domain.
Comparison between diseases (allergies, Crohn and breast cancer) for the best-performing feature combinations (F-measure).
Best results are indicated in bold.
| Feature | Allergies | Crohn | Breast cancer |
|---|---|---|---|
| BoW + Position + Network + SA + Neg + verb + POS | 56,3 | 70,6 | 63,7 |
| NP + Position + Network + SA + Neg + verb + POS | 54 | 65,5 | 58,5 |
| ST + Position + Network + SA + Neg + verb + POS | 52,4 | 64,8 | 57,5 |
| CUI + Position + Network + SA + Neg + verb + POS | 56,5 | 68,9 | 60,2 |
| W2V + Position + Network + SA + Neg + verb + POS | |||
| BoW baseline | 56 | 66,9 | 62,1 |
| Majority baseline | 39,3 | 58,4 | 38,1 |
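The majority baseline can be related to the sentence distribution reported earlier: always predicting the most frequent class is correct for exactly that class's share of the sentences. The sketch below assumes that the three count rows of the distribution table correspond, in order, to allergies, Crohn and breast cancer; under that assumption the majority-class shares match the baseline row above. `majority_share` is a hypothetical helper name.

```python
# Sentence counts per class, copied from the distribution table.
# ASSUMPTION: the rows are, in order, allergies, Crohn and breast cancer.
counts = {
    "Allergies": {"Facts": 267, "Experiences": 348, "Opinions": 271},
    "Crohn": {"Facts": 273, "Experiences": 931, "Opinions": 389},
    "Breast cancer": {"Facts": 225, "Experiences": 278, "Opinions": 310},
}


def majority_share(class_counts):
    """Return the most frequent class and its share of all sentences, in percent."""
    label = max(class_counts, key=class_counts.get)
    share = 100 * class_counts[label] / sum(class_counts.values())
    return label, round(share, 1)


baseline = {disease: majority_share(c) for disease, c in counts.items()}
for disease, (label, share) in baseline.items():
    print(f"{disease}: {label} {share}")
# Allergies: Experiences 39.3
# Crohn: Experiences 58.4
# Breast cancer: Opinions 38.1
```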
F-measure by class for the W2V classifier.
Best results are indicated in bold.
| Class | Allergies | Crohn | Breast cancer |
|---|---|---|---|
| Experiences | |||
| Facts | 52,9 | 55,2 | 60,5 |
| Opinions | 55,6 | 65 | 66 |
F-measure by class for the W2V classifier (Resample).
Best results are indicated in bold.
| Class | Allergies | Crohn | Breast cancer |
|---|---|---|---|
| Experiences | |||
| Facts | 77,1 | 78,6 | 82,0 |
| Opinions | 73,8 | 81,9 | 83,0 |
| Total | 79,1 | 87,1 | 83,2 |
Classification results when data for the three diseases are combined.
| Feature | Acc | Pr | Re | F-1 |
|---|---|---|---|---|
| W2V | 70,6 | 70,3 | 70,7 | 70,2 |
| W2V— | 83,4 | 83,3 | 83,4 | 83,4 |
Feature comparison for the Crohn domain.
Results are reported as accuracy (Acc), F-measure (F-1), precision (Pr) and recall (Re), in percent. Best results are indicated in bold.
| Feature | Acc | F-1 | Pr | Re |
|---|---|---|---|---|
| BoW | 67,7 | 66,9 | 66,5 | 67,7 |
| BoW+Position | 69,7 | 68,9 | 68,5 | 69,7 |
| BoW+Position+Net | 69,7 | 68,9 | 68,5 | 69,7 |
| BoW+Position+Net+SA | 70,1 | 69,3 | 69 | 70,1 |
| BoW+Position+Net+SA+Neg | 70,1 | 69,3 | 69 | 70,1 |
| BoW+Position+Net+SA+Neg+verb | ||||
| BoW+Position+Net+SA+Neg+verb+POS | ||||
| NP | 62,6 | 57,4 | 59,6 | 62,6 |
| NP+Position | 64,7 | 60,9 | 62,2 | 64,7 |
| NP+Position+Net | 64,9 | 61,2 | 62,4 | 64,9 |
| NP+Position+Net+SA | 65,2 | 62,2 | 62,1 | 65,2 |
| NP+Position+Net+SA+Neg | 64,4 | 60,9 | 61,5 | 64,4 |
| NP+Position+Net+SA+Neg+verb | 66,9 | 64,6 | 64,7 | 66,9 |
| NP+Position+Net+SA+Neg+verb+POS | ||||
| ST | 61,7 | 52,6 | 55,8 | 61,8 |
| ST+Position | 65,6 | 60 | 64,2 | 65,7 |
| ST+Position+Net | 65,9 | 60,5 | 64,1 | 66 |
| ST+Position+Net+SA | 65,9 | 60,5 | 64,1 | 66 |
| ST+Position+Net+SA+Neg | 65,6 | 60,8 | 62,9 | 65,6 |
| ST+Position+Net+SA+Neg+verb | ||||
| ST+Position+Net+SA+Neg+verb+POS | 67,7 | 64,8 | 65 | 67,7 |
| CUI | 65,2 | 63,3 | 63,1 | 65,2 |
| CUI+Position | 67,5 | 66,1 | 65,9 | 67,5 |
| CUI+Position+Net | 68,1 | 66,7 | 66,5 | 68,1 |
| CUI+Position+Net+SA | 67,7 | 66,4 | 66,2 | 67,7 |
| CUI+Position+Net+SA+Neg | 67,6 | 66,3 | 66 | 67,6 |
| CUI+Position+Net+SA+Neg+verb | 69,4 | 68,4 | 68,2 | 69,4 |
| CUI+Position+Net+SA+Neg+verb+POS | ||||
| W2V | 75,9 | 75,2 | 75 | 75,9 |
| W2V+Position | 76,1 | 75,4 | 75,3 | 76,1 |
| W2V+Position+Net | 76,1 | 75,4 | 75,3 | 76,1 |
| W2V+Position+Net+SA | 77,2 | 76,5 | 76,3 | 77,2 |
| W2V+Position+Net+SA+Neg | 77,2 | 76,5 | 76,3 | 77,2 |
| W2V+Position+Net+SA+Neg+verb | ||||
| W2V+Position+Net+SA+Neg+verb+POS | 77,2 | 76,5 | 76,3 | 77,2 |