Maya Stemmer, Yisrael Parmet, Gilad Ravid.
Abstract
BACKGROUND: Patients use social media as an alternative information source, where they share information and provide social support. Although large amounts of health-related data are posted on Twitter and other social networking platforms each day, research using social media data to understand chronic conditions and patients' lifestyles is limited.
Keywords: IBD; NLP; Twitter; inflammatory bowel disease; natural language processing; patient identification; sentiment analysis; user classification
Year: 2022 PMID: 35917151 PMCID: PMC9382547 DOI: 10.2196/29186
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 7.076
Figure 1. The general workflow of the first stage of the study: building a classifier of Twitter users for identifying patients with inflammatory bowel disease (IBD).
Figure 2. The general workflow of the second stage of the study: using the classification from the first stage for analyzing patients' tweets. IBD: inflammatory bowel disease.
Summary of classification features and their types.
| User classification feature, feature level, and features | Type |
|---|---|
| *Tweet level* | |
| Tweet counter | Integer |
| Retweet counter | Integer |
| Retweet to tweet ratio | Float (0 to 1) |
| IBDᵃ flag | Binary |
| User-level IBD ratio | Float (0 to 1) |
| Crohn flag | Binary |
| User-level Crohn ratio | Float (0 to 1) |
| Colitis flag | Binary |
| User-level colitis ratio | Float (0 to 1) |
| *User level* | |
| Tweet counter | Integer |
| Retweet counter | Integer |
| Retweet to tweet ratio | Float (0 to 1) |
| IBD counter | Integer |
| Bio-IBD flag | Binary |
| IBD ratio | Float (0 to 1) |
| Crohn counter | Integer |
| Bio-Crohn flag | Binary |
| Crohn ratio | Float (0 to 1) |
| Colitis counter | Integer |
| Bio-colitis flag | Binary |
| Colitis ratio | Float (0 to 1) |
| *Tweet level* | |
| Emoji counter | Integer |
| Interjection counter | Integer |
| Profanity counter | Integer |
| Mention counter | Integer |
| Hashtag counter | Integer |
| URL flag | Binary |
| First-person flag | Binary |
| Number of words | Integer |
| Number of characters | Integer |
| Polarity | Float (−1 to 1) |
| Positive polarity flag (1 if polarity > 0, else 0) | Binary |
| Negative polarity flag (1 if polarity < 0, else 0) | Binary |
| Subjectivity | Float (0 to 1) |
| LDAᵇ topic distribution (document=tweet) | 20×float (0 to 1) |
| *User level* | |
| Emoji sum | Integer |
| Emoji average | Float |
| Bio-emoji counter | Integer |
| Interjection sum | Integer |
| Interjection average | Float |
| Bio-interjection counter | Integer |
| Profanity sum | Integer |
| Profanity average | Float |
| Bio-profanity counter | Integer |
| Mention sum | Integer |
| Mention average | Float |
| Bio-mention counter | Integer |
| Hashtag sum | Integer |
| Hashtag average | Float |
| Bio-hashtag counter | Integer |
| URL sum | Integer |
| URL average | Float (0 to 1) |
| Bio-URL flag | Binary |
| First-person sum | Integer |
| First-person average | Float (0 to 1) |
| Bio–first-person flag | Binary |
| Word average | Float |
| Bio-number of words | Integer |
| Character average | Float |
| Bio-number of characters | Integer |
| Bio-polarity | Float (−1 to 1) |
| Positive polarity sum | Integer |
| Positive polarity average | Float (0 to 1) |
| Negative polarity sum | Integer |
| Negative polarity average | Float (0 to 1) |
| Subjectivity average | Float (0 to 1) |
| Bio-subjectivity | Float (0 to 1) |
| LDA topic distribution (document=all the user’s tweets) | 20×float (0 to 1) |
| *Tweet level* | |
| User-level log in-degree | Float |
| User-level log out-degree | Float |
| User-level closeness | Float (0 to 1) |
| *User level* | |
| Log in-degree | Float |
| Log out-degree | Float |
| Closeness | Float (0 to 1) |

ᵃIBD: inflammatory bowel disease.
ᵇLDA: latent Dirichlet allocation.
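Many of the features in this table are simple counts, flags, and ratios, and the user-level names (sum, average, ratio) suggest they aggregate tweet-level values; this is also why the average of a binary flag, such as URL average, lies between 0 and 1. The Python sketch below illustrates that tweet-to-user aggregation for a subset of the features. It is not the authors' code: the IBD term list, the first-person pronoun list, whitespace word counting, and the assumption that polarity is supplied by an external sentiment scorer are all illustrative choices, and every function and key name is hypothetical.

```python
import re

# Hypothetical lexicons: the study's actual keyword and pronoun lists are not given in the table.
IBD_TERMS = ("ibd", "inflammatory bowel disease")
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
WORD_RE = re.compile(r"[a-z']+")


def tweet_features(text: str, polarity: float) -> dict:
    """Tweet-level features for one tweet.

    `polarity` is assumed to be precomputed by some sentiment scorer
    returning a value in [-1, 1]; the scorer itself is not shown here.
    """
    lower = text.lower()
    words = WORD_RE.findall(lower)
    return {
        "mention_counter": len(MENTION_RE.findall(text)),
        "hashtag_counter": len(HASHTAG_RE.findall(text)),
        "url_flag": int(URL_RE.search(text) is not None),
        "first_person_flag": int(any(w in FIRST_PERSON for w in words)),
        # Whitespace tokenization; the study's tokenizer is not specified.
        "number_of_words": len(text.split()),
        "number_of_characters": len(text),
        # Plain substring match, so "#uchicagoibd" also counts as an IBD mention.
        "ibd_flag": int(any(term in lower for term in IBD_TERMS)),
        "polarity": polarity,
        "positive_polarity_flag": int(polarity > 0),
        "negative_polarity_flag": int(polarity < 0),
    }


# Tweet-level key -> prefix used for the user-level "sum" and "average" features.
AGGREGATED = {
    "mention_counter": "mention",
    "hashtag_counter": "hashtag",
    "url_flag": "url",
    "first_person_flag": "first_person",
    "positive_polarity_flag": "positive_polarity",
    "negative_polarity_flag": "negative_polarity",
}


def user_features(tweets: list) -> dict:
    """Aggregate one user's tweet-level feature dicts into user-level features."""
    n = len(tweets)
    if n == 0:
        raise ValueError("user has no tweets")
    out = {"tweet_counter": n}
    for tweet_key, prefix in AGGREGATED.items():
        total = sum(t[tweet_key] for t in tweets)
        out[f"{prefix}_sum"] = total
        out[f"{prefix}_average"] = total / n  # averages of 0/1 flags stay in [0, 1]
    out["ibd_counter"] = sum(t["ibd_flag"] for t in tweets)
    out["ibd_ratio"] = out["ibd_counter"] / n
    out["word_average"] = sum(t["number_of_words"] for t in tweets) / n
    return out
```

Applied to tweets 2 and 3 from the next table (with made-up polarity values of 0.5 and −0.2), the pair gives hashtag_sum = 4, hashtag_average = 2.0, and ibd_ratio = 1.0. The "Bio-" features would presumably run the same kind of extraction over the profile description, which this sketch does not attempt.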
Three examples of category classification and keyword sentiment extraction after text cleaning.
| Number | Original text | Text after cleaning | Category classification | Keyword sentiment |
|---|---|---|---|---|
| 1 | Spinach is an inflammatory food with a lot of sulfur. Ban that too. (I noticed my Crohn’s tended to flare around spinach season.) | Spinach is an inflammatory food with a lot of sulfur. Ban that too. (I noticed my Crohn’s tended to flare around spinach season.) | Food and drink | Spinach: −0.63 |
| 2 | @bottomline_ibd great poll. I do have the odd binge, but IBD has changed what I can drink. No more red wine or ale | great poll. I do have the odd binge, but IBD has changed what I can drink. No more red wine or ale | Food and drink | Red wine: −0.83; ale: −0.83 |
| 3 | I am living proof that yoga can help #uchicagoibd #studiothree #yoga #ibd | I am living proof that yoga can help #uchicagoibd #studiothree #yoga #ibd | Religion and spirituality | Yoga: 0.69 |
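In these examples, cleaning removed only the leading @-mention in example 2, while hashtags, punctuation, and capitalization were kept. The full set of cleaning rules is not described in this excerpt, so the Python sketch below reproduces only the behavior visible in the table; the regular expression and the `clean_tweet` name are assumptions.

```python
import re

# Assumption: only the run of @handles at the very start of a tweet (the reply prefix) is dropped.
# In-text mentions, hashtags, punctuation, and casing are kept, matching the three examples.
LEADING_MENTIONS = re.compile(r"^(?:@\w+\s+)+")


def clean_tweet(text: str) -> str:
    """Strip surrounding whitespace and any leading reply @handles from a tweet."""
    return LEADING_MENTIONS.sub("", text.strip())


if __name__ == "__main__":
    print(clean_tweet("@bottomline_ibd great poll. I do have the odd binge"))
```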
The 10-fold cross-validation and test results for the single instance (SI) and multiple instance (MI) classifications.
| Algorithm and metric | SI tweet-level, 10-fold | SI tweet-level, test | MI user-level, 10-fold | MI user-level, test |
|---|---|---|---|---|
| … | | | | |
| Precision | 0.6775 | 0.7241 | 0.6151 | 0.5902 |
| Recall | 0.6297 | 0.5385 | 0.7284 | 0.9231 |
| F1 score | 0.6525 | 0.6176 | 0.6542 | 0.7200 |
| ROC AUCᵃ | 0.7532 | 0.7248 | 0.8469 | 0.8226 |
| … | | | | |
| Precision | 0.7416 | 0.6471 | 0.6668 | 0.6735 |
| Recall | 0.6465 | 0.5641 | 0.6778 | 0.8462 |
| F1 score | 0.6906 | 0.6027 | 0.6711 | 0.7500 |
| ROC AUC | 0.7768 | 0.7154 | 0.8658 | 0.8342 |
| … | | | | |
| Precision | 0.7249 | 0.6667 | 0.6648 | 0.5814 |
| Recall | 0.6832 | 0.7179 | 0.6398 | 0.6410 |
| F1 score | 0.7034 | 0.6914 | 0.6472 | 0.6098 |
| ROC AUC | 0.7883 | 0.7812 | 0.8463 | 0.7205 |
| … | | | | |
| Precision | 0.7405 | 0.6333 | 0.6594 | 0.6250 |
| Recall | 0.6335 | 0.4872 | 0.6358 | 0.6410 |
| F1 score | 0.6829 | 0.5507 | 0.6423 | 0.6329 |
| ROC AUC | 0.7712 | 0.6825 | 0.8473 | 0.7372 |
| … | | | | |
| Precision | 0.7676 | 0.7333 | 0.6721 | 0.6444 |
| Recall | 0.4355 | 0.2821 | 0.6646 | 0.7436 |
| F1 score | 0.5555 | 0.4074 | 0.6595 | 0.6905 |
| ROC AUC | 0.6906 | 0.6188 | 0.8722 | 0.7829 |

ᵃROC AUC: area under the receiver operating characteristic curve.
ᵇSVM: support vector machine.
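Each F1 score is the harmonic mean of the precision and recall beside it, F1 = 2PR/(P + R). The Python sketch below checks this against the test columns of the first algorithm block; because the reported precision and recall are rounded to four decimals, the comparison uses a small tolerance. The 10-fold columns need not match the identity exactly (0.6775 and 0.6297 give about 0.6527 against the reported 0.6525), which would be consistent with per-fold scores being averaged, although the excerpt does not say how folds were aggregated. The function and variable names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the first algorithm block, test columns.
TEST_ROWS = {
    "SI tweet-level, test": (0.7241, 0.5385, 0.6176),
    "MI user-level, test": (0.5902, 0.9231, 0.7200),
}

if __name__ == "__main__":
    for column, (p, r, reported) in TEST_ROWS.items():
        computed = f1_score(p, r)
        # Inputs are rounded to 4 decimals, so allow a small tolerance.
        status = "ok" if abs(computed - reported) < 1e-3 else "MISMATCH"
        print(f"{column}: computed {computed:.4f}, reported {reported:.4f} ({status})")
```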
Figure 3. Test result comparison between the 2 classification approaches. MI: multiple instance; ROC AUC: area under the receiver operating characteristic curve; SI: single instance; SVM: support vector machine.
The 20 most positive and 20 most negative lifestyle keywords, sorted by mean sentiment.
| Rank | Keyword | Count | Sentiment, mean (SD) | Count of positive | Count of negative | Odds |
|---|---|---|---|---|---|---|
| 1 | Sushi | 9 | 0.466 (0.814) | 7 | 2 | 3.500 |
| 2 | Ginger ale | 5 | 0.407 (0.597) | 3 | 1 | 3.000 |
| 3 | Salmon | 7 | 0.344 (0.691) | 4 | 3 | 1.333 |
| 4 | Cherry | 10 | 0.33 (0.696) | 6 | 2 | 3.000 |
| 5 | Breakfast | 29 | 0.28 (0.75) | 19 | 9 | 2.111 |
| 6 | Garlic | 8 | 0.244 (0.671) | 4 | 2 | 2.000 |
| 7 | Bagel | 5 | 0.224 (0.633) | 3 | 1 | 3.000 |
| 8 | Almond | 9 | 0.193 (0.668) | 6 | 3 | 2.000 |
| 9 | Yogurt | 14 | 0.189 (0.688) | 7 | 3 | 2.333 |
| 10 | Yoga | 15 | 0.186 (0.693) | 7 | 5 | 1.400 |
| 11 | Ham | 5 | 0.184 (0.535) | 2 | 1 | 2.000 |
| 12 | Biscuit | 13 | 0.172 (0.75) | 8 | 5 | 1.600 |
| 13 | Spinach | 6 | 0.171 (0.76) | 4 | 2 | 2.000 |
| 14 | Vegan cheese | 5 | 0.164 (0.92) | 3 | 2 | 1.500 |
| 15 | Lamb | 5 | 0.14 (0.861) | 3 | 2 | 1.500 |
| 16 | Cake | 26 | 0.13 (0.752) | 16 | 9 | 1.778 |
| 17 | Fitness | 19 | 0.114 (0.728) | 9 | 6 | 1.500 |
| 18 | Ginger | 17 | 0.112 (0.724) | 8 | 7 | 1.143 |
| 19 | Tomato | 10 | 0.089 (0.608) | 5 | 3 | 1.667 |
| 20 | Cafe | 7 | 0.081 (0.783) | 3 | 3 | 1.000 |
| … | … | … | … | … | … | … |
| 125 | Fodmap | 12 | −0.501 (0.573) | 2 | 9 | 0.222 |
| 126 | Cocktail | 5 | −0.51 (0.769) | 1 | 4 | 0.250 |
| 127 | Fiber | 63 | −0.512 (0.547) | 7 | 47 | 0.149 |
| 128 | Spicy | 37 | −0.514 (0.572) | 7 | 28 | 0.250 |
| 129 | Vegetable | 49 | −0.533 (0.529) | 6 | 39 | 0.154 |
| 130 | Corn | 28 | −0.534 (0.487) | 2 | 22 | 0.091 |
| 131 | Alcohol | 64 | −0.545 (0.545) | 9 | 51 | 0.176 |
| 132 | Milkshake | 5 | −0.556 (0.811) | 1 | 4 | 0.250 |
| 133 | Milk | 44 | −0.565 (0.5) | 4 | 35 | 0.114 |
| 134 | Vegetarian diet | 10 | −0.567 (0.409) | 1 | 8 | 0.125 |
| 135 | Snack | 10 | −0.573 (0.568) | 2 | 8 | 0.250 |
| 136 | Fig | 5 | −0.578 (0.621) | 1 | 4 | 0.250 |
| 137 | Turkey | 10 | −0.608 (0.626) | 2 | 8 | 0.250 |
| 138 | Yeast | 16 | −0.624 (0.391) | 1 | 13 | 0.077 |
| 139 | Orange | 7 | −0.638 (0.449) | 0 | 5 | 0.000 |
| 140 | Beverage | 7 | −0.661 (0.616) | 1 | 6 | 0.167 |
| 141 | Cabbage | 8 | −0.675 (0.19) | 0 | 8 | 0.000 |
| 142 | Orange juice | 5 | −0.682 (0.385) | 0 | 4 | 0.000 |
| 143 | Flour | 6 | −0.785 (0.211) | 0 | 6 | 0.000 |
| 144 | Lentil | 6 | −0.785 (0.188) | 0 | 6 | 0.000 |
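The columns of this table are related in ways that can be checked from the rows: Odds equals Count of positive divided by Count of negative (Sushi: 7/2 = 3.500; Fiber: 7/47 ≈ 0.149), and Count can exceed the sum of the two (Ginger ale: 5 versus 3 + 1), which suggests neutral mentions are counted but classified as neither. The Python sketch below aggregates per-mention keyword sentiment scores, like those in the keyword sentiment column of the cleaning examples, into these columns. The score source, the use of the sample SD, the infinite odds when there are no negative mentions, and the minimum-count cutoff of 5 (the smallest count shown) are all assumptions, and the function names are hypothetical.

```python
from statistics import mean, stdev


def keyword_summary(scores: list) -> dict:
    """Summarize the per-mention sentiment scores collected for one keyword."""
    positive = sum(s > 0 for s in scores)
    negative = sum(s < 0 for s in scores)
    return {
        # Neutral (0.0) mentions count here but in neither of the columns below.
        "count": len(scores),
        "mean": mean(scores),
        # Sample SD is an assumption; the table does not say which estimator was used.
        "sd": stdev(scores) if len(scores) > 1 else 0.0,
        "count_positive": positive,
        "count_negative": negative,
        # Odds = positive / negative, as in the table; inf for zero negatives is an assumption.
        "odds": positive / negative if negative else float("inf"),
    }


def rank_keywords(keyword_scores: dict, min_count: int = 5) -> list:
    """Rank keywords by mean sentiment, highest first; `min_count` is a hypothetical cutoff."""
    summaries = {
        kw: keyword_summary(scores)
        for kw, scores in keyword_scores.items()
        if len(scores) >= min_count
    }
    return sorted(summaries.items(), key=lambda item: item[1]["mean"], reverse=True)
```

With scores of [0.1] × 7 and [−0.1] × 2, for example, `keyword_summary` reproduces the Sushi-style odds of 3.5; the scores themselves are placeholders, not study data.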