Susan McRoy, Majid Rastegar-Mojarad, Yanshan Wang, Kathryn J Ruddy, Tufia C Haddad, Hongfang Liu.
Abstract
BACKGROUND: Patient education materials given to breast cancer survivors may not be a good fit for their information needs. Needs may change over time, be forgotten, or be misreported, for a variety of reasons. An automated content analysis of survivors' postings to online health forums can identify expressed information needs over a span of time and be repeated regularly at low cost. Identifying these unmet needs can guide improvements to existing education materials and the creation of new resources.
Keywords: automated content analysis; online health forum; text classification; text retrieval
Year: 2018 PMID: 29764801 PMCID: PMC5974460 DOI: 10.2196/cancer.9050
Source DB: PubMed Journal: JMIR Cancer ISSN: 2369-1999
Distribution of expressions of information need (HASN) and categories in the MayoConnect (MC) and Cancer Survivor’s Network (CSN) data sets.
| Category | MC total, n (%) | MC HASN, n (%) | CSN total, n (%) | CSN HASN, n (%) |
| Any | 1943 | 110 (6%) | 2246 | 196 (8%) |
| Medical | 597 (31%) | 34 (31%) | 473 (21%) | 48 (24%) |
| Resource | 87 (4%) | 9 (8%) | 32 (1%) | 9 (4%) |
| Social | 353 (18%) | 0 (0%) | 443 (20%) | 9 (4%) |
| Psychological | 61 (3%) | 0 (0%) | 63 (2%) | 5 (2%) |
| Background | 69 (4%) | 0 (0%) | 38 (1%) | 0 (0%) |
| Wellness | 88 (5%) | 3 (3%) | 78 (3%) | 5 (2%) |
| Physical | 167 (9%) | 18 (16%) | 193 (8%) | 15 (7%) |
| Previous | 147 (8%) | 38 (35%) | 425 (18%) | 81 (41%) |
| Other | 313 (16%) | 8 (7%) | 728 (32%) | 24 (12%) |
| Multiple | 60 (3%) | 0 (0%) | N/Aa | N/A |
aN/A: not applicable. This category was not used when annotating the CSN data set.
Five most frequent concepts for each information and topic category
| Category | Top 5 MCa concepts | Top 5 CSNb concepts |
| Information need | experience, side effects, look, surgery, chemo | help, chemo, treatment, normal, experience |
| No information need | cancer, breast cancer, chemo, years, now | now, take, good, chemo, feel |
| Medical | chemo, cancer, radiation, breast cancer, diagnosed | chemo, radiation, now, taxol, treatment |
| Social | thank, hope, good, luck, best | Hi, thank, good, love, take |
| Psychological | feel, make, right, better, depressed | scared, go, feel, cry, thing |
| Background | live, years, breast cancer, now, old | years, old, breast cancer, diagnosed, age |
| Wellness | help, shampoo, started, make, work | exercise, eat, help, diet, keep |
| Physical | hair, pain, back, side effect, issue | pain, back, hair, feel, Taxol |
| Previous | help, need, experience, make, see | one, help, think, out, now |
| Resource | website, research, mayo, cancer, breast cancer | book, breast cancer, insurance, groups, site |
| Other | one, out, need, go, cancer | make, think, thing, out, feel |
aMC: MayoConnect.
bCSN: Cancer Survivor’s Network.
The performance of Random Forest classifiers for each category for MayoConnect (MC) and Cancer Survivor’s Network (CSN) data.
| Category | MC data | | | | | | CSN data | | | | | |
| | Without local context features | | | With local context features | | | Without local context features | | | With local context features | | |
| | Preca | Recall | F-measure | Prec | Recall | F-measure | Prec | Recall | F-measure | Prec | Recall | F-measure |
| Medical | .74 | .73 | .73 | .90 | .91 | .90 | .65 | .64 | .64 | .78 | .75 | .76 |
| Social | .78 | .78 | .78 | .85 | .85 | .85 | .71 | .70 | .70 | .83 | .82 | .82 |
| Psychological | .73 | .72 | .72 | .77 | .76 | .76 | .69 | .68 | .68 | .73 | .74 | .73 |
| Background | .77 | .77 | .77 | .77 | .77 | .77 | .73 | .73 | .73 | .74 | .74 | .74 |
| Wellness | .76 | .75 | .75 | .80 | .79 | .79 | .67 | .66 | .66 | .70 | .71 | .70 |
| Physical | .80 | .79 | .79 | .82 | .83 | .83 | .64 | .64 | .64 | .70 | .70 | .70 |
| Previous | .61 | .61 | .61 | .58 | .58 | .58 | .70 | .70 | .70 | .71 | .71 | .71 |
| Other | .59 | .59 | .59 | .84 | .86 | .85 | .61 | .60 | .60 | .67 | .66 | .66 |
aPrec: precision.
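For readers less familiar with the metrics in the table above, the following is a minimal sketch of the standard definitions of precision, recall, and F-measure for a single class. The function name and the counts are hypothetical and are not taken from the study. The source does not say how the per-category figures were averaged, so a reported F-measure need not equal the harmonic mean of the reported precision and recall.

```python
def precision_recall_f(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard single-class metrics from confusion counts.

    tp: sentences correctly assigned to the category
    fp: sentences wrongly assigned to the category
    fn: sentences of the category that were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure (F1) is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts, chosen only for illustration.
p, r, f = precision_recall_f(tp=45, fp=5, fn=5)
print(f"precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
```

With these illustrative counts, precision and recall are both 45/50, so the F-measure equals the same value.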
The performance of Random Forest classifiers for each category tested on MayoConnect (MC) and Cancer Survivor’s Network (CSN) data and trained on either MC or CSN data, using local context features.
| Category | Test MC data | | | | | | Test CSN data | | | | | |
| | Train MC data | | | Train CSN data | | | Train CSN data | | | Train MC data | | |
| | Preca | Recall | F-measure | Prec | Recall | F-measure | Prec | Recall | F-measure | Prec | Recall | F-measure |
| Medical | .90 | .91 | .90 | .71 | .71 | .71 | .78 | .75 | .76 | .71 | .67 | .68 |
| Social | .85 | .85 | .85 | .61 | .69 | .66 | .83 | .82 | .82 | .75 | .77 | .76 |
| Psychological | .77 | .76 | .76 | .51 | .55 | .51 | .73 | .74 | .73 | .50 | .54 | .52 |
| Background | .77 | .77 | .77 | .51 | .53 | .51 | .74 | .74 | .74 | .50 | .52 | .51 |
| Wellness | .80 | .79 | .79 | .55 | .59 | .56 | .70 | .71 | .70 | .51 | .60 | .55 |
| Physical | .82 | .83 | .83 | .56 | .54 | .55 | .70 | .70 | .70 | .60 | .66 | .63 |
| Previous | .58 | .58 | .58 | .54 | .60 | .57 | .71 | .71 | .71 | .55 | .58 | .56 |
| Other | .84 | .86 | .85 | .65 | .56 | .60 | .67 | .66 | .66 | .61 | .63 | .62 |
aPrec: precision.
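One way to read the table above is to compare, for each category, the F-measure obtained when training and testing on the same forum with the F-measure obtained when training on the other forum. The sketch below does this arithmetic for the MC test set only, using the F-measure values copied from the table. The variable names are illustrative, and the comparison itself is not an analysis reported in the source.

```python
# F-measures from the table above (tested on MC, with local context features).
f_train_mc = {
    "Medical": 0.90, "Social": 0.85, "Psychological": 0.76, "Background": 0.77,
    "Wellness": 0.79, "Physical": 0.83, "Previous": 0.58, "Other": 0.85,
}
f_train_csn = {
    "Medical": 0.71, "Social": 0.66, "Psychological": 0.51, "Background": 0.51,
    "Wellness": 0.56, "Physical": 0.55, "Previous": 0.57, "Other": 0.60,
}

# Drop in F-measure when the classifier is trained on the other forum.
drops = {
    category: round(f_train_mc[category] - f_train_csn[category], 2)
    for category in f_train_mc
}

largest = max(drops, key=drops.get)
smallest = min(drops, key=drops.get)

for category, drop in sorted(drops.items(), key=lambda item: -item[1]):
    print(f"{category:<14} {drop:.2f}")
print(f"largest drop: {largest}; smallest drop: {smallest}")
```

The same dictionaries can be filled with the CSN test columns to repeat the comparison in the other direction.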
Results of classifier training to identify sentences expressing information need in CSN-R (data set of randomly selected sentences from the Cancer Survivor’s Network data set).
| Learning model | Precision | Recall | F-measure |
| Naïve Bayes | .57 | .75 | .59 |
| Random forest | .62 | .65 | .63 |
| Support Vector Machines | .58 | .71 | .61 |