Sarah A Marshall (1), Christopher C Yang (2), Qing Ping (2), Mengnan Zhao (2), Nancy E Avis (3), Edward H Ip (4, 5).
1. Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, NC, 27157, USA.
2. College of Computing and Informatics, Drexel University, Philadelphia, PA, 19104, USA.
3. Department of Social Sciences and Health Policy, Wake Forest School of Medicine, Winston-Salem, NC, 27157, USA.
4. Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, NC, 27157, USA. eip@wakehealth.edu.
5. Department of Social Sciences and Health Policy, Wake Forest School of Medicine, Winston-Salem, NC, 27157, USA. eip@wakehealth.edu.
Abstract
PURPOSE: User-generated content on social media sites, such as health-related online forums, offers researchers a tantalizing amount of information, but concerns regarding scientific application of such data remain. This paper compares and contrasts symptom cluster patterns derived from messages on a breast cancer forum with those from a symptom checklist completed by breast cancer survivors participating in a research study.
METHODS: Over 50,000 messages generated by 12,991 users of the breast cancer forum on MedHelp.org were transformed into a standard form and examined for the co-occurrence of 25 symptoms. The k-medoid clustering method was used to determine appropriate placement of symptoms within clusters. Findings were compared with a similar analysis of a symptom checklist administered to 653 breast cancer survivors participating in a research study.
RESULTS: The following clusters were identified using forum data: menopausal/psychological, pain/fatigue, gastrointestinal, and miscellaneous. Study data generated the clusters: menopausal, pain, fatigue/sleep/gastrointestinal, psychological, and increased weight/appetite. Although the clusters are somewhat different, many symptoms that clustered together in the social media analysis remained together in the analysis of the study participants. Density of connections between symptoms, as reflected by rates of co-occurrence and similarity, was higher in the study data.
CONCLUSIONS: The copious amount of data generated by social media outlets can augment findings from traditional data sources. When different sources of information are combined, areas of overlap and discrepancy can be detected, perhaps giving researchers a more accurate picture of reality. However, data derived from social media must be used carefully and with understanding of its limitations.
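The Methods step of clustering symptoms by co-occurrence can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the similarity measure, so Jaccard similarity is an assumption here, and the alternating assign/update loop with a fixed initialization is a simplified stand-in for a full k-medoid procedure. The function names and the toy messages are hypothetical and are not data from the study.

```python
from itertools import combinations

def jaccard_distance(messages, symptoms):
    """Return dist[a][b] = 1 - Jaccard similarity of the message sets mentioning a and b."""
    mentions = {s: {i for i, m in enumerate(messages) if s in m} for s in symptoms}
    dist = {s: {t: 0.0 for t in symptoms} for s in symptoms}
    for a, b in combinations(symptoms, 2):
        union = mentions[a] | mentions[b]
        sim = len(mentions[a] & mentions[b]) / len(union) if union else 0.0
        dist[a][b] = dist[b][a] = 1.0 - sim
    return dist

def k_medoids(dist, items, k, max_iter=100):
    """Simplified k-medoid clustering: alternate assignment and medoid update."""
    medoids = list(items[:k])  # assumption: deterministic start from the first k items
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for x in items:
            # A medoid always stays in its own cluster, so no cluster is empty.
            nearest = x if x in clusters else min(medoids, key=lambda m: dist[x][m])
            clusters[nearest].append(x)
        # New medoid: the member with the smallest total distance to its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][y] for y in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return clusters

# Toy input: each message reduced to the set of symptoms it mentions (invented data).
symptoms = ["hot flashes", "nausea", "night sweats", "vomiting"]
messages = [
    {"hot flashes", "night sweats"},
    {"hot flashes", "night sweats"},
    {"nausea", "vomiting"},
    {"nausea", "vomiting"},
    {"hot flashes"},
    {"nausea"},
]

dist = jaccard_distance(messages, symptoms)
clusters = k_medoids(dist, symptoms, k=2)
for medoid, members in clusters.items():
    print(medoid, "->", members)
```

On this toy input the two symptom pairs that co-occur in the same messages end up in the same cluster. With real forum text, the symptom sets would first have to be extracted from each message, which is the "transformed into a standard form" step the abstract mentions and which this sketch does not attempt.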
Keywords:
Breast cancer; MedHelp; Online forum; Social media; Symptom cluster; Text mining