María José Aramburu, Rafael Berlanga, Indira Lanza.
Abstract
Background: Recent work in social network analysis has shown the usefulness of analysing and predicting outcomes from user-generated data in the context of Public Health Surveillance (PHS). Most of the proposals have focused on dealing with static datasets gathered from social networks, which are processed and mined off-line. However, little work has been done on providing a general framework to analyse the highly dynamic data of social networks from a multidimensional perspective. In this paper, we claim that such a framework is crucial for including social data in PHS systems.
Keywords: health surveillance; multidimensional analysis; social network analysis; text mining
Year: 2020 PMID: 32231152 PMCID: PMC7177443 DOI: 10.3390/ijerph17072289
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. Summary of the multidimensional model for Twitter streams analysis. We follow the usual notation for multidimensional models: (D) for dimensions, (F) for facts and (M) for metrics. Dotted lines indicate that dimensions are dynamically created from the streamed data.
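The facts/dimensions/metrics notation in the Figure 1 caption can be illustrated with a minimal sketch. The class names `Fact` and `Cube`, the `rollup` method, and the example dimension and metric names are all hypothetical; the concrete schema of the paper's model is not reproduced in this record. The only property taken from the caption is that dimension members are created as they appear in the stream.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Fact:
    """One fact: its dimension coordinates plus its numeric metrics."""
    dims: dict      # dimension name -> member, e.g. {"topic": "leukemia"}
    metrics: dict   # metric name -> value, e.g. {"tweets": 1}


@dataclass
class Cube:
    """Hypothetical in-memory cube; not the authors' implementation."""
    facts: list = field(default_factory=list)
    members: dict = field(default_factory=lambda: defaultdict(set))

    def add(self, fact):
        # Dimension members are registered the first time they are seen,
        # mimicking dimensions that are built dynamically from the stream.
        for name, member in fact.dims.items():
            self.members[name].add(member)
        self.facts.append(fact)

    def rollup(self, dim, metric):
        # Aggregate one metric along one dimension.
        totals = defaultdict(float)
        for f in self.facts:
            if dim in f.dims:
                totals[f.dims[dim]] += f.metrics.get(metric, 0)
        return dict(totals)
```

For instance, adding facts for two topics and calling `rollup("topic", "tweets")` would return the tweet count per topic, which is the kind of aggregate reported in the topic and event tables.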
Keywords for the Twitter stream for Public Health Surveillance.
Figure 2. Quality metrics for the top-2000 users according to published tweets in the stream. Green points are good users, black points are users that need manual inspection, and red points are users that should be filtered out.
Top-10 users’ quality metrics. In the screen names column, stream keywords are in bold face. Shadowed rows correspond to out-of-domain users.
| Screen Name (User) | #Tweets | Yule’s I | Coherence (×10⁻⁴) | Description |
|---|---|---|---|---|
| QunolOfficial | 3306 | 42.3 | 0.88 | Forum |
| | 2556 | 94.3 | 0.21 | Comics bot |
| | 2110 | 75.4 | 1.39 | Academic bot |
| treda10 | 1740 | 70.6 | 1.15 | Company CEO |
| AvosFromMexico | 1623 | 49.9 | 0.33 | Food sales |
| EurekaMag | 1194 | 71.2 | 1.87 | Magazine |
| medvizor | 1053 | 56.6 | 0.65 | Forum |
| beechnutwx | 1010 | 11.2 | 0.31 | Weather bot |
| SnoMiddleForkWX | 963 | 29.8 | 0.32 | Weather bot |
| Sara_ | 947 | 113.7 | 0.02 | Influencer |
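The Yule's I column is a lexical-richness score. As a rough sketch, the textbook definition is I = M1² / (M2 − M1), where M1 is the number of tokens and M2 is the sum of squared word frequencies. How the authors tokenize each user's tweets, and whether they rescale the score, is not stated in this record, so the function name `yules_i` and the pre-tokenized input are assumptions, and the output should not be expected to reproduce the values in the table.

```python
from collections import Counter


def yules_i(tokens):
    """Textbook Yule's I for a list of tokens (higher = richer vocabulary).

    Assumption: tokens are already normalized; the paper's own
    preprocessing is not described in this record.
    """
    if not tokens:
        return 0.0
    freqs = Counter(tokens)
    m1 = sum(freqs.values())                  # total number of tokens
    m2 = sum(f * f for f in freqs.values())   # sum of squared frequencies
    if m2 == m1:
        # Every word occurs exactly once: the ratio is unbounded.
        return float("inf")
    return m1 * m1 / (m2 - m1)
```

A low score indicates heavily repeated vocabulary, which is consistent with the weather bots in the table having the lowest values.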
Top frequent events (English).
| Event (bigram) | #Tweets | Main Topic | User’s Quality | #Users |
|---|---|---|---|---|
| Alex Trebek | 2659 | Pancreas cancer | (70, 31%, 6) | 2200/4535 |
| Carlos Carrasco | 659 | Leukemia | (72, 28%, 4) | 589/2473 |
| Tapeworm woman | 554 | False tumor | (65, 39%, 6) | 437/332 |
| Ross Perot | 357 | Leukemia | (85, 23%, 4) | 335/1093 |
| Bank holidays | 86 | Chemotherapy | (68, 0%, 5) | 84/27 |
Top frequent events (Spanish).
| Event (bigram) | #Tweets | Main Topic | User’s Quality | #Users |
|---|---|---|---|---|
| Tabare Vázquez | 921 | Lung Tumor | (57, 27%, 6) | 691/1521 |
| Carlos Carrasco | 854 | Leukemia | (55, 33%, 5) | 262/622 |
| | 338 | Homeopathy | (59, 11%, 6) | 293/638 |
| Luis Enrique | 282 | Cancer | (71, 13%, 3) | 102/1331 |
| | 108 | Cancer | (61, 7%, 7) | 103/84 |
Top frequent topics (English subset).
| Topic | #Tweets | User’s Quality | #Users | %Tweets |
|---|---|---|---|---|
| leukemia | 27,000 | (76, 6%, 12) | 3,200/17,300 | (16%, 12%) |
| melanoma | 26,791 | (78, 8%, 15) | 2,700/3,364 | (14%, 7%) |
| skin cancer | 19,000 | (72, 7%, 5) | 8,219/23,022 | (23%, 12%) |
| brain tumor | 18,900 | (70, 5%, 5) | 6,091/11,268 | (37%, 28%) |
| lung cancer | 1,500 | (75, 4%, 45) | 401/396 | (18%, 8%) |
Top frequent topics (Spanish subset).
| Topic | #Tweets | User’s Quality | #Users | %Tweets |
|---|---|---|---|---|
| | 12,390 | (57, 8%, 4) | 1,388/11,108 | (23%, 7%) |
| | 4,451 | (67, 5%, 3) | 1,264/9,951 | (23%, 16%) |
| | 3,726 | (60, 15%, 9) | 208/421 | (8%, 3%) |
| | 2,800 | (58, 4%, 6) | 221/188 | (17%, 12%) |
| | 1,100 | (57, 3%, 4) | 259/242 | (2%, 3%) |
Figure 3. Comparing profiles from different topics and events (authors in blue and audience in red): (a) Top: Leukemia (topic), Bottom: Carlos Carrasco (event); (b) Top: Skin Cancer (topic), Bottom: Tapeworm (event).
Language models for different profiles according to the users’ descriptions.
| Profile | Definition | Top Words in Users’ Descriptions |
|---|---|---|
| professional | Specialists | radiosurgery, oncoplastic, oncologist, haemato-oncology, consultants |
| ex-professional | Retired people | retired, former, senate, viet, colonel |
| student | Students | student, thesis, undergraduate, engineering, studying |
| sports | Sportspeople | runner, marathon, athlete, skater, rider |
| religion | Religious terms | amin, allah, savior, hindu, jesus |
| p.services | Public services | traffic, 24h, breaking, weather, protection |
| h.services | Health-care services | urgencias, screenings, specialties, uninsured, visitors |
| concerned | Disease-concerned people | survivor, reminder_ribbon, survivorship, warrior |
| pseudo | Alternative therapies | reiki, meditation, yoga, ayurveda, remedial |
| journalist | Press and publications | medline-indexed, issn, journal, indexed, open-access |
| others | Other people | zombies, yin-yang, weekends, voracious, virgo (all with low probs) |
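To make the table above concrete, the following sketch assigns a profile to a user description by counting overlaps with a profile's top words. This is a deliberately simplified keyword-matching stand-in, not the probabilistic language models the table refers to, whose estimation is not described in this record. `PROFILE_WORDS` (restricted to three profiles for brevity) and `assign_profile` are hypothetical names; the word lists are copied from the table.

```python
import re

# Top words copied from the table; only three profiles are shown.
PROFILE_WORDS = {
    "professional": {"radiosurgery", "oncoplastic", "oncologist",
                     "haemato-oncology", "consultants"},
    "student": {"student", "thesis", "undergraduate",
                "engineering", "studying"},
    "pseudo": {"reiki", "meditation", "yoga", "ayurveda", "remedial"},
}


def assign_profile(description):
    """Return the profile whose top words overlap most with the description.

    Falls back to "others" when no top word matches, loosely mirroring
    the catch-all row of the table.
    """
    tokens = set(re.findall(r"[a-z0-9_-]+", description.lower()))
    scores = {p: len(tokens & words) for p, words in PROFILE_WORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "others"
```

A real language model would weight each word by its probability under the profile rather than counting matches, so rare but distinctive terms such as "radiosurgery" would contribute more than common ones.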
Controversial events/topics.
| Topic/Event | User’s Quality | #Users | %Tweets |
|---|---|---|---|
| 1. black skin | (85, 5%, 2) | 180/5915 | (43%, 22%) |
| 2. force chemo | (97, 0%, 5) | 66/15 | (3%, 0%) |
| 3. | (24, 0%, 282) | 109/185 | (0%, 0%) |
| 4. | (25, 0%, 255) | 115/366 | (0%, 0%) |
| 5. | (56, 0%, 4) | 83/3 | (0%, 0%) |