Yuen Chi Phang¹, Azleena Mohd Kassim¹, Ernest Mangantig².
Abstract
OBJECTIVE: The main aim of this study was to use text mining on social media to analyze information and gain insight into the health-related concerns of thalassemia patients, thalassemia carriers, and their caregivers.
Keywords: Data Mining; Data Science; Natural Language Processing; Social Media; Thalassemia
Year: 2021 PMID: 34384202 PMCID: PMC8369049 DOI: 10.4258/hir.2021.27.3.200
Source DB: PubMed Journal: Healthc Inform Res ISSN: 2093-3681
Comparison between previous studies and this study
| Method (previous studies) | Limitation (previous studies) | Contribution of this study[ |
|---|---|---|
| Interview sessions with thalassemia patients and caregivers [ | Organizing interview sessions was time-consuming and costly, as face-to-face interviews often required travel to different places. | Time and costs were saved because no interview sessions had to be organized and no survey questions had to be prepared. |
| Digital surveys of thalassemia patients and caregivers [ | Some patients and caregivers may not have understood all the questions and may have given wrong answers. | Patients and caregivers were free to post issues related to thalassemia on social media, at their own level of understanding and from their own points of view. |
| Text mining approach on social media for breast cancer patients [ | Text data were in one language only (French). | A detailed workflow to pre-process social media texts was presented, and a pre-processing method called Malay-English social media text pre-processing (MESMTPP) was introduced. |
| Text mining approach on social media for general posts [ | Only basic text data cleaning was performed. | The text data included a mixture of two languages (English and Malay). |
This study applied text mining to social media posts by thalassemia patients, thalassemia carriers, and caregivers.
Number of posts extracted from each group, with attribute descriptions and data types

| | For ALL Thalassemia | Kelab Thalassemia |
|---|---|---|
| Social media platform | Facebook | Facebook |
| Total posts | 784 | 261 |
| Posts with text | 768 | 154 |

| Attribute[ | Description |
|---|---|
| Posts | Posts from Facebook group |
| Date | Date each post was made |
| Year | Year each post was made (2015–2020) |
| Number of likes | Number of likes received by each post |
| Number of comments | Number of comments on each post |
| Group | The group (“For ALL Thalassemia MALAYSIA” or “Kelab Thalassemia Malaysia”) in which each post was made |
Attribute data types can be text-based or numerical.
Figure 1. Number of posts, comments, and likes for each year, with the distribution of the total number of words per post.
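Figure 1 is a per-year aggregation over the attributes listed in the table above. A minimal Python sketch of that aggregation follows; the `Post` record, its field names, and the sample posts are hypothetical illustrations, not the study's data or code.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    """Hypothetical record mirroring the attribute table (not the study's schema)."""
    text: str
    posted: date
    likes: int
    comments: int
    group: str


def yearly_summary(posts):
    """Count posts and sum likes and comments for each calendar year."""
    summary = defaultdict(lambda: {"posts": 0, "likes": 0, "comments": 0})
    for post in posts:
        row = summary[post.posted.year]
        row["posts"] += 1
        row["likes"] += post.likes
        row["comments"] += post.comments
    return dict(sorted(summary.items()))


def words_per_post(posts):
    """Word count of every post that contains text (input for a histogram)."""
    return [len(post.text.split()) for post in posts if post.text.strip()]


# Invented sample records, for illustration only.
sample = [
    Post("Terima kasih semua", date(2019, 3, 1), 10, 2, "Kelab Thalassemia Malaysia"),
    Post("", date(2019, 5, 2), 3, 0, "Kelab Thalassemia Malaysia"),
    Post("Jadual transfusi darah minggu ini", date(2020, 1, 9), 7, 4, "For ALL Thalassemia MALAYSIA"),
]
print(yearly_summary(sample))
print(words_per_post(sample))  # [3, 5]
```

Posts without text still count toward the yearly totals here but are excluded from the word-count distribution; whether the study did the same is an assumption.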
Figure 2. Malay-English social media text pre-processing (MESMTPP) framework: a procedure to pre-process text from social media posts.
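The individual steps of MESMTPP are given in Figure 2 and are not reproduced here. As a rough, hypothetical illustration of the kind of cleaning a Malay-English pipeline needs, the sketch below lower-cases text, strips URLs, mentions, digits, punctuation, and emoji, expands a few shorthand spellings, and drops stop words. The `SLANG` and `STOPWORDS` tables are tiny invented examples, and the step order is an assumption rather than the authors' framework.

```python
import re

# Hypothetical shorthand dictionary; a real Malay-English lexicon would be far larger.
SLANG = {"x": "tidak", "tak": "tidak", "yg": "yang", "dgn": "dengan", "u": "you"}
# Tiny illustrative mixed-language stop-word list (an assumption).
STOPWORDS = {"yang", "dan", "di", "dengan", "the", "and", "to", "is"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z#\s]")  # anything except a-z, '#', whitespace


def preprocess(text):
    """Return cleaned tokens for one post."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # remove links
    text = MENTION_RE.sub(" ", text)     # remove @mentions
    text = NON_LETTER_RE.sub(" ", text)  # remove digits, punctuation, emoji
    tokens = [SLANG.get(tok, tok) for tok in text.split()]  # expand shorthand
    return [tok for tok in tokens if tok not in STOPWORDS and len(tok) > 1]


print(preprocess("Anak saya x sihat hari ni 😢 https://t.co/abc @Ali #thalassemia"))
# ['anak', 'saya', 'tidak', 'sihat', 'hari', 'ni', '#thalassemia']
```

Shorthand is expanded before the stop-word filter so that, for example, `yg` is normalized to `yang` and then removed like any other stop word.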
Figure 3. Word cloud: cleaned data without stemming and with stemming (with English translation).
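Figure 3 contrasts word clouds built with and without stemming. Word-cloud sizes are term frequencies, so the effect of stemming can be sketched by counting tokens before and after a stemmer merges inflected forms. The affix stripper below is a deliberately crude, hypothetical stand-in (the affix lists and the `min_stem` guard are assumptions); it is not the stemmer used in the study, and real Malay stemming relies on a root-word dictionary.

```python
from collections import Counter

# Toy affix lists (assumption); real Malay stemmers are dictionary-based
# and handle many more prefixes, suffixes, and spelling changes.
PREFIXES = ("mem", "men", "ber", "ter", "di")
SUFFIXES = ("kan", "nya", "an")


def toy_stem(word, min_stem=4):
    """Strip at most one prefix and one suffix, keeping at least min_stem letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            word = word[: -len(suffix)]
            break
    return word


def word_frequencies(token_lists, stem=False):
    """Term counts over all posts; these counts size the words in a word cloud."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(toy_stem(t) if stem else t for t in tokens)
    return counts


posts = [["rawatan", "darah"], ["dirawat", "darahnya"]]
print(word_frequencies(posts))             # four distinct words, one each
print(word_frequencies(posts, stem=True))  # rawat: 2, darah: 2
```

The `min_stem` guard keeps short roots such as `makan` from being truncated just because they happen to end in an affix-like string.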
Figure 4. Top 5 words with the highest TF-IDF (term frequency-inverse document frequency) values from a sample post (with English translation).
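Figure 4 ranks the words of one post by TF-IDF. A self-contained sketch of one common TF-IDF variant (term count normalized by post length, natural-log inverse document frequency) is shown below; the weighting scheme and the sample tokens are assumptions, since the study's exact formula is not given in this section.

```python
import math
from collections import Counter


def tfidf(docs):
    """One {term: score} dict per tokenized document.

    tf = count / document length and idf = ln(N / df). This is one textbook
    variant; libraries such as scikit-learn apply a smoothed idf by default.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return scores


def top_k(term_scores, k=5):
    """Highest-scoring terms, ties broken alphabetically."""
    return sorted(term_scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


docs = [
    ["darah", "besi", "darah", "ubat"],
    ["darah", "makan"],
    ["besi", "makan", "hospital"],
]
for term, score in top_k(tfidf(docs)[0], k=2):
    print(term, round(score, 3))
# ubat 0.275
# darah 0.203
```

`ubat` outranks `darah` despite appearing once, because it occurs in no other post and so receives the larger idf.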
Figure 5. Hashtag correlation plot.
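Figure 5 plots correlations between hashtags. One plausible way to compute such a plot, assumed here rather than taken from the study, is the phi coefficient (Pearson correlation of 0/1 presence indicators) for each pair of hashtags across posts. The sample posts are invented.

```python
import math
import re
from itertools import combinations

HASHTAG_RE = re.compile(r"#(\w+)")


def hashtags(text):
    """Lower-cased set of hashtags in one post."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def phi(x, y):
    """Pearson correlation of two equal-length 0/1 vectors (phi coefficient)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def hashtag_correlations(posts):
    """Phi coefficient for every pair of hashtags, based on co-occurrence in posts."""
    tag_sets = [hashtags(p) for p in posts]
    vocab = sorted(set().union(*tag_sets))
    vectors = {t: [1 if t in s else 0 for s in tag_sets] for t in vocab}
    return {(a, b): phi(vectors[a], vectors[b]) for a, b in combinations(vocab, 2)}


posts = [
    "Derma darah hari ini #BloodDonation #thalassemia",
    "#Thalassemia awareness",
    "Jom #blooddonation #thalassemia",
    "Selamat pagi",
]
print(round(hashtag_correlations(posts)[("blooddonation", "thalassemia")], 4))  # 0.5774
```

A hashtag that appears in every post (or in none) has zero variance, so its correlation is reported as 0.0 instead of dividing by zero.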
Comparison of coherence scores across a range of topic numbers after applying LDA in GenSim, LDA in MALLET, and LSA to the dataset
| Topic number | LDA in GenSim (without stemming) | LDA in MALLET (without stemming) | LSA (without stemming) | LDA in GenSim (with stemming) | LDA in MALLET (with stemming) | LSA (with stemming) |
|---|---|---|---|---|---|---|
| 2 | 0.29 | 0.332 | 0.334 | 0.293 | 0.314 | 0.354 |
| 4 | 0.293 | 0.365 | 0.363 | 0.319 | 0.357 | |
| 6 | 0.301 | 0.335 | | 0.308 | 0.402 | 0.398 |
| 8 | 0.304 | | 0.352 | 0.317 | 0.396 | 0.403 |
| 10 | 0.285 | 0.365 | | | 0.396 | 0.376 |
| 12 | 0.288 | 0.398 | 0.377 | 0.307 | 0.390 | 0.353 |
| 14 | 0.29 | 0.374 | 0.354 | 0.321 | | 0.331 |
| 16 | | 0.410 | 0.373 | 0.310 | 0.386 | 0.348 |
| 18 | 0.324 | 0.406 | 0.370 | 0.305 | 0.391 | 0.360 |
Bold text indicates the best coherence score for each method.
LDA: latent Dirichlet allocation, LSA: latent semantic analysis.
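The table above selects the number of topics by coherence score. The coherence measure behind these numbers is not stated in this section; GenSim's `CoherenceModel` offers several, including `u_mass` and `c_v`. As an illustration only, the sketch below computes UMass coherence from document co-occurrence counts (averaged over word pairs here, rather than summed as in the original definition) and then picks the topic number with the highest score. The toy corpus and candidate scores are invented.

```python
import math
from itertools import combinations


def umass_coherence(top_words, docs):
    """Mean UMass score over word pairs of one topic.

    top_words must be ordered by decreasing weight. For each pair with w_j
    ranked above w_i, the score is log((D(w_i, w_j) + 1) / D(w_j)), where D
    counts documents containing all given words.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_count(*words):
        return sum(1 for s in doc_sets if all(w in s for w in words))

    pair_scores = []
    for j, i in combinations(range(len(top_words)), 2):  # j < i
        w_j, w_i = top_words[j], top_words[i]
        d_j = doc_count(w_j)
        if d_j:  # skip words absent from the corpus
            pair_scores.append(math.log((doc_count(w_i, w_j) + 1) / d_j))
    return sum(pair_scores) / len(pair_scores) if pair_scores else 0.0


def model_coherence(topics, docs):
    """Average topic coherence of one fitted model."""
    return sum(umass_coherence(t, docs) for t in topics) / len(topics)


def best_topic_number(coherence_by_k):
    """Topic number with the highest coherence score."""
    return max(coherence_by_k, key=coherence_by_k.get)


docs = [["iron", "diet", "food"], ["iron", "food"], ["blood", "donation"], ["blood", "donation", "iron"]]
print(umass_coherence(["iron", "food"], docs))   # 0.0
print(umass_coherence(["blood", "diet"], docs))  # log(1/2), about -0.693
```

Words that often appear together (`iron`, `food`) score higher than words that never co-occur (`blood`, `diet`); in practice one model is fitted per candidate topic number and `best_topic_number` is applied to the resulting scores.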
Eight main topics generated with keywords and examples of posts (with English translations and descriptions)
| No. | Topic | Keywords (Malay) | Keywords (English) | Example of original posts in Malay with English descriptions |
|---|---|---|---|---|
| 1 | Treatment for thalassemia (umbilical cord, iron chelation therapy, and bone marrow-related treatment) | | patient, doctor, treatment, healthy, center, test, year, outcome, hospital, baby | “ |
| 2 | Challenges faced by thalassemia patients (illness, work, and treatment side effects) | | make, month, year, sick or sore, new, close, hit, feel, last, friend | “ |
| 3 | Iron-rich foods and diet | | iron, nutrients, medicine, eat, food, main, liver, high, transfusion, rate | “What should we eat in Thalassemia? Nutrition & Thalassemia. It is recommended that patients going through blood transfusion should opt for a low iron diet…” ( |
| 4 | Praying for strength and peer support | | child, sufferer, mother, self, time, Allah (God), able, hope, continue, strong | “ |
| 5 | Asking questions and seeking opinions about information on the disease and the machines used for treatment | | group, hospital, members, place, information, safe, share, prosperous, friend, blood | “ |
| 6 | Blood donation (issues related to blood shortages and side effects) | | donation, cell, type, body, transplant, low, normal, sign, problem | “ |
| 7 | Sharing experiences related to a thalassemia society and its activities | | health, life, share, family, world, friend, experience, country, disease, activity | “ |
| 8 | Genetics of thalassemia and issues among carrier couples | | Thalassemia, carrier, child, beta, alpha, trait, “, good, expert | “ |
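The keyword lists in the table above are the highest-weighted words of each fitted topic. MALLET fits LDA by Gibbs sampling, while GenSim's `LdaModel` uses online variational Bayes. The following is a minimal collapsed Gibbs sampler for LDA in plain Python, shown only to make the idea concrete; it is not MALLET's implementation, and the hyperparameters (`alpha=0.1`, `beta=0.01`, 200 iterations) are illustrative assumptions rather than the study's settings.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents.

    Returns topic-word counts (one Counter per topic) and doc-topic counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topics = range(num_topics)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in topics]
    topic_total = [0] * num_topics
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][n]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word, doc_topic


def top_words(topic_word, n=10):
    """Top-n keywords per topic, as in a keyword table."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


docs = [["iron", "diet", "food", "iron"], ["blood", "donation", "blood"],
        ["iron", "food"], ["donation", "blood", "cell"]]
topic_word, doc_topic = lda_gibbs(docs, num_topics=2, iterations=100)
print(top_words(topic_word, n=3))
```

Each sweep resamples every token's topic from its conditional given all other assignments; after enough sweeps, the topic-word counts approximate the topics from which keyword lists are read off.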
Figure 6. Visualization of the number of posts, likes, and comments on each topic.
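Figure 6 totals posts, likes, and comments per topic. The sketch below shows that tally under the assumption that each post is assigned to its single dominant topic, meaning the topic with the largest count or proportion in that post, with ties going to the lower index. The study's exact assignment rule is not stated in this section, and the sample numbers are invented.

```python
from collections import defaultdict


def dominant_topic(topic_weights):
    """Index of the topic with the largest weight in one post (ties -> lowest index)."""
    return max(range(len(topic_weights)), key=lambda k: topic_weights[k])


def engagement_by_topic(post_topic_weights, likes, comments):
    """Posts, likes, and comments summed over the posts assigned to each topic."""
    totals = defaultdict(lambda: {"posts": 0, "likes": 0, "comments": 0})
    for weights, n_likes, n_comments in zip(post_topic_weights, likes, comments):
        row = totals[dominant_topic(weights)]
        row["posts"] += 1
        row["likes"] += n_likes
        row["comments"] += n_comments
    return dict(sorted(totals.items()))


weights = [[3, 1], [0, 2], [2, 2]]  # per-post topic counts from any topic model
print(engagement_by_topic(weights, likes=[5, 1, 4], comments=[1, 0, 2]))
# {0: {'posts': 2, 'likes': 9, 'comments': 3}, 1: {'posts': 1, 'likes': 1, 'comments': 0}}
```

Assigning each post to one topic keeps the totals additive (every like is counted once); a soft alternative would weight each post's likes and comments by its topic proportions.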