Shiliang Wang, Michael J Paul, Mark Dredze.
Abstract
BACKGROUND: Recent studies have demonstrated the utility of social media data sources for a wide range of public health goals, including disease surveillance, mental health trends, and health perceptions and sentiment. Most such research has focused on English-language social media for the task of disease surveillance.
Keywords: air pollution; data mining; natural language processing; public health surveillance; social media; text mining
Year: 2015 PMID: 25831020 PMCID: PMC4400579 DOI: 10.2196/jmir.3875
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. Two pollution-related topics learned from a probabilistic topic model. The left topic is about air quality, and the right topic is about pollution in general.
Correlation of messages matching each filter in 74 cities with the average (ADV) and maximum (MDV) daily PM2.5 values in 2013.
| Filter | Including URLs: number of messages | Including URLs: Corr. (ADV)ᵃ | Including URLs: Corr. (MDV)ᵇ | Without URLs: number of messages | Without URLs: Corr. (ADV) | Without URLs: Corr. (MDV) |
|---|---|---|---|---|---|---|
| AQᶜ topic | 7665 | .546 | .545 | 5866 | .583 | .565 |
| POᵈ topic | 21,902 | .361 | .421 | 17,696 | .286 | .387 |
| “air” | 6321 | .552 | .593 | 4949 | .610 | .637 |
| “pollution” | 15,809 | .458 | .474 | 12,044 | .606 | .633 |
| “breathe” | 4807 | .351 | .257 | 4454 | .361 | .290 |
| “cough” | 12,437 | −.005 | −.151 | 11,921 | .027 | −.023 |
| AQ+“air” | 4133 | .564 | .557 | 3103 | .623 | .579 |
| AQ+“pollution” | 4866 | .630 | .619 | 3766 | .703 | .657 |

ᵃ Corr. (ADV): correlation with the average daily value.
ᵇ Corr. (MDV): correlation with the maximum daily value.
ᶜ AQ: air quality.
ᵈ PO: pollution.
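The filters above combine a topic label, a keyword, or both, and the two column groups differ only in whether messages containing URLs are kept. A minimal Python sketch of that filtering and per-city counting follows. It assumes each message already carries a topic label from the topic model; the `Message` record, the `URL_RE` pattern, and the helper names are hypothetical, not the authors' code, and the Chinese keywords in the usage lines (污染 for "pollution", 空气 for "air") are assumed forms of the English glosses shown in the table.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical record for one Weibo message; `topic` is assumed to be the
# label assigned by a separately trained topic model (e.g. "AQ" or "PO").
@dataclass
class Message:
    city: str
    text: str
    topic: Optional[str] = None

# Assumed URL detector; the exact rule used for spotting URLs is not given here.
URL_RE = re.compile(r"https?://\S+")

def matches_filter(msg: Message, keyword: Optional[str] = None,
                   topic: Optional[str] = None, exclude_urls: bool = False) -> bool:
    """True if msg satisfies every condition that is set.

    keyword      -- substring the text must contain (a keyword filter)
    topic        -- topic label the message must carry (a topic filter)
    exclude_urls -- drop messages with a URL (the "Without URLs" columns)
    Setting both keyword and topic gives a combined filter such as AQ+"pollution".
    """
    if exclude_urls and URL_RE.search(msg.text):
        return False
    if topic is not None and msg.topic != topic:
        return False
    if keyword is not None and keyword not in msg.text:
        return False
    return True

def count_by_city(messages: Iterable[Message], **criteria) -> Counter:
    """Count, per city, the messages that pass matches_filter(**criteria)."""
    return Counter(m.city for m in messages if matches_filter(m, **criteria))

# Toy, made-up messages for illustration only.
msgs = [
    Message("Beijing", "今天污染很严重", "AQ"),
    Message("Beijing", "污染 http://t.cn/abc", "AQ"),
    Message("Shanghai", "空气不错", "PO"),
]
print(count_by_city(msgs, keyword="污染", topic="AQ"))                     # Counter({'Beijing': 2})
print(count_by_city(msgs, keyword="污染", topic="AQ", exclude_urls=True))  # Counter({'Beijing': 1})
```

Keeping the URL exclusion as a flag on the same predicate mirrors how each filter in the table is reported twice, once with and once without URL-bearing messages.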
Figure 2. Scatter plot of average daily PM2.5 values (y-axis) against the Weibo rate for 74 cities, using the most highly correlated filter, AQ+“pollution” (r=.703).
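The r values in the correlation table and in Figure 2 are correlations across cities between a per-city message rate and a per-city PM2.5 summary. The sketch below computes ADV and MDV from one city's daily readings and then a correlation across cities. It assumes Pearson's r and assumes the Weibo rate is the fraction of a city's messages that match the filter, since neither detail is spelled out in this excerpt; the function names and all numbers are invented for illustration.

```python
import math
from statistics import mean

def adv_mdv(daily_pm25):
    """Return (ADV, MDV): the mean and the maximum of one city's daily PM2.5 values."""
    return mean(daily_pm25), max(daily_pm25)

def weibo_rate(matching: int, total: int) -> float:
    """Assumed rate definition: share of a city's messages that match the filter."""
    return matching / total

def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

# Invented data for three cities: daily PM2.5 readings and (matching, total) message counts.
daily = {"A": [40, 60, 80], "B": [100, 150, 200], "C": [20, 30, 25]}
counts = {"A": (12, 1000), "B": (30, 1000), "C": (5, 1000)}

cities = sorted(daily)
rates = [weibo_rate(*counts[c]) for c in cities]
advs = [adv_mdv(daily[c])[0] for c in cities]
mdvs = [adv_mdv(daily[c])[1] for c in cities]
print(round(pearson_r(rates, advs), 3), round(pearson_r(rates, mdvs), 3))
```

Normalizing counts into a rate keeps cities with very different Weibo volumes comparable; the raw counts in the table's "number of messages" columns are totals over all cities, not the per-city values that enter the correlation.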
Figure 3. Summary of annotation results on a sample of 170 messages. The tree structure indicates which codes depend on their parent codes; different branches are not mutually exclusive.
Percentage of annotated messages matching the criteria, along with annotator agreement statistics for each question.
| Code | Messages annotated, n | Agreement, n (%) | Agreement (kappa) |
|---|---|---|---|
| Relevant to air quality | 170 | 160 (94.1) | .869 |
| Request for action | 107 | 104 (97.2) | .557 |
| Firsthand experience | 107 | 87 (81.3) | .363 |
| Reactive behavior | 78 | 73 (93.6) | .864 |
| Health concern | 78 | 61 (78.2) | .429 |
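The kappa column corrects raw agreement for the agreement expected by chance, which may explain why a code can show high raw agreement but a modest kappa (for example, Request for action: 97.2% raw agreement vs a kappa of .557). The sketch below assumes two annotators and Cohen's kappa, which this excerpt does not state; the function names and the tiny label lists are illustrative only.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple

def raw_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> Tuple[int, float]:
    """Return (n agreeing, percent agreeing), the form of the 'Agreement, n (%)' column."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    agree = sum(x == y for x, y in zip(a, b))
    return agree, 100 * agree / len(a)

def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two annotators."""
    n = len(a)
    agree, _ = raw_agreement(a, b)
    p_o = agree / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: sum over labels of the product of both annotators' label frequencies.
    p_e = sum(ca[label] * cb[label] for label in set(ca) | set(cb)) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)

# Toy labels from two hypothetical annotators (1 = code applies).
ann1 = [1, 1, 0, 0]
ann2 = [1, 0, 0, 0]
print(raw_agreement(ann1, ann2))  # (3, 75.0)
print(cohens_kappa(ann1, ann2))   # 0.5
```

Because child codes depend on their parent codes, these functions would be run only on the messages eligible for each code, which is consistent with the smaller n in the lower rows of the table.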
Examples of messages with various labels (the original Chinese Weibo is shown, followed by an English translation).
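| Label | Message |
|---|---|
| Not about pollution | 累昏厥了。牢笼一般的机场巴士, 传说中根本不叫花钱的物价, 空气里的尿骚味以及灰蒙蒙的天。无论哪顿饭除了咖喱还是咖喱。 Translation: “Exhausted to the point of fainting. A cage-like airport bus, prices so low that, as the legend goes, spending money hardly counts, the smell of urine in the air, and the gray, hazy sky. No matter the meal, it is curry and nothing but curry.” |
| About pollution, not a firsthand experience | 老外说: 这幅画表达的是污染程度的北京。PM爆表。 Translation: “A foreigner said: this painting depicts Beijing’s level of pollution. PM off the charts.” |
| Request for action | 不能在空气质量重度污染时才想起低碳行动! Translation: “We can’t wait until the air is heavily polluted to start thinking about low-carbon action!” |
| Firsthand, reactive behavior | 今晚想出去跑步,一查空气指数,还是轻度污染,在家避毒吧。 Translation: “I wanted to go out for a run tonight, but when I checked the air quality index it was still lightly polluted, so I’ll stay home and avoid the toxins.” |
| Firsthand, health concern | 三天前开始咳嗽。一定是北京污染的天气有关, 以后出门戴口罩[生病]。 Translation: “Started coughing three days ago. It must be related to Beijing’s polluted weather; from now on I’ll wear a mask when I go out [sick].” |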