Zahra Shakeri Hossein Abad, Gregory P Butler, Wendy Thompson, Joon Lee.
Abstract
BACKGROUND: Crowdsourcing services, such as Amazon Mechanical Turk (AMT), allow researchers to use the collective intelligence of a wide range of web users for labor-intensive tasks. As the manual verification of the quality of the collected results is difficult because of the large volume of data and the quick turnaround time of the process, many questions remain to be explored regarding the reliability of these resources for developing digital public health systems.
Keywords: crowdsourcing; digital public health surveillance; machine learning; public health database; social media analysis
Year: 2022; PMID: 35040794; PMCID: PMC8808350; DOI: 10.2196/28749
Source DB: PubMed; Journal: J Med Internet Res; ISSN: 1438-8871; Impact factor: 5.428
Figure 1. A sample labeling task (ie, human intelligence task [HIT]) for sedentary behavior. Each HIT contains 4 questions (section 1), and each question asks whether the presented tweet is a self-reported physical activity, sedentary behavior, or sleep quality–related behavior (section 2). The fourth question is an easy qualification question used to check the quality of the worker (section 3).
Figure 2. The pipeline of the deep learning model used to predict labels using both textual information and meta-information. LSTM: long short-term memory.
Figure 3. The number of workers who completed different numbers of human intelligence tasks (HITs). Most workers completed a relatively small number of HITs.
Details of the collected labels and label consistency (LC) score for each of the physical activity, sleep quality, and sedentary behavior categories. LC ranges from 0 to 1; values closer to 1 indicate greater consistency among workers' labels.
| Type | Tweets, n (%) | LCmulti | LCbinary | Workers, n (%) |
| Physical activity | 48,576 (49.2) | 0.54 | 0.75 | 232 (38) |
| Sedentary behavior | 17,367 (17.6) | 0.55 | 0.74 | 157 (25.7) |
| Sleep quality | 32,779 (33.2) | 0.58 | 0.77 | 221 (36.2) |
| Total | 98,722 (100) | 0.56 | 0.75 | 610 (100) |
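The table reports LC values, but this excerpt does not define how the score is computed. Purely as an illustration, the sketch below assumes one plausible definition: 1 minus the normalized Shannon entropy of the labels a tweet received, averaged over tweets. The function names, the sample labels, and the rule that only YY maps to a binary "Yes" are assumptions made for the example.

```python
import math
from collections import Counter

def tweet_consistency(labels, n_classes):
    """1 - normalized Shannon entropy of one tweet's labels (assumed definition)."""
    counts = Counter(labels)
    total = len(labels)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    max_entropy = math.log(n_classes)  # entropy of a uniform split over all classes
    return 1.0 - entropy / max_entropy

def label_consistency(all_labels, n_classes):
    """Average the per-tweet score over every tweet in the batch."""
    scores = [tweet_consistency(labels, n_classes) for labels in all_labels]
    return sum(scores) / len(scores)

# Hypothetical worker labels for three tweets (4 multiclass options: YY, YN, NY, NN).
multiclass = [["YY", "YY", "YY"], ["YY", "YN", "NN"], ["NN", "NN", "YN"]]
# Assumed collapse to binary: only YY counts as "Yes".
binary = [["Yes" if label == "YY" else "No" for label in labels] for labels in multiclass]

lc_multi = label_consistency(multiclass, n_classes=4)
lc_binary = label_consistency(binary, n_classes=2)
print(f"LC multi = {lc_multi:.2f}, LC binary = {lc_binary:.2f}")
```

Under this assumed definition, disagreements among the three "No"-type labels disappear after the collapse, which is one way a binary score can exceed its multiclass counterpart.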
Characteristics of the ground truth data set used to develop and evaluate the supervised and unsupervised inference models.
| Variable | Physical activity (n=4000) | Sedentary behavior (n=2000) | Sleep quality (n=3000) |
| Binary label, n (%) | | | |
| Yes | 1629 (40.73) | 726 (36.3) | 1063 (35.43) |
| No | 2371 (59.28) | 1274 (63.7) | 1937 (64.57) |
| Multiclass label, n (%) | | | |
| YY [a] | 1629 (40.73) | 726 (36.3) | 1063 (35.43) |
| YN [b] | 550 (13.75) | 395 (19.75) | 862 (28.73) |
| NY [c] | 179 (4.48) | 19 (0.95) | 52 (1.73) |
| NN [d] | 1642 (41.05) | 860 (43) | 1023 (34.1) |
| Gender, n (%) | | | |
| Female | 1131 (28.28) | 576 (28.80) | 469 (15.63) |
| Male | 1980 (49.50) | 906 (45.30) | 490 (16.34) |
| Unknown | 889 (22.22) | 518 (25.90) | 2041 (68.03) |
| Age (years), n (%) | | | |
| ≤18 | 204 (5.10) | 170 (8.50) | 150 (5) |
| 19-29 | 743 (18.58) | 475 (23.75) | 331 (11.03) |
| 30-39 | 897 (22.42) | 365 (18.25) | 249 (8.30) |
| ≥40 | 1267 (31.68) | 472 (23.60) | 229 (7.64) |
| Unknown | 889 (22.22) | 518 (25.90) | 2041 (68.03) |
| Day of the week, n (%) | | | |
| Sunday | 664 (16.60) | 325 (16.25) | 440 (14.66) |
| Monday | 595 (14.88) | 307 (15.35) | 440 (14.66) |
| Tuesday | 493 (12.32) | 245 (12.25) | 435 (14.50) |
| Wednesday | 504 (12.60) | 278 (13.9) | 393 (13.10) |
| Thursday | 525 (13.12) | 270 (13.50) | 416 (13.86) |
| Friday | 531 (13.28) | 274 (13.70) | 421 (14.03) |
| Saturday | 668 (16.70) | 283 (14.15) | 433 (14.43) |
| Unknown | 20 (0.50) | 18 (0.90) | 22 (0.76) |
| Time (24 hours), Q1-Q3 | 10-19 | 10-19 | 5-18 |
| Month (range) | February to July | April to September | January to August |
| Account type, n (%) | | | |
| Organization | 563 (14.08) | 179 (8.95) | 97 (3.23) |
| Users | 3437 (85.93) | 1821 (91.05) | 2903 (96.77) |
[a] YY: self-reported and recent physical activity, sedentary behavior, or sleep quality experience.
[b] YN: self-reported but not recent physical activity, sedentary behavior, or sleep quality experience.
[c] NY: not self-reported but recent physical activity, sedentary behavior, or sleep quality experience.
[d] NN: neither self-reported nor recent physical activity, sedentary behavior, or sleep quality experience.
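The counts in the table imply how the four multiclass labels collapse into the binary label: in every category the Yes count equals the YY count, and the No count equals YN + NY + NN. The short check below reproduces that arithmetic. The counts are copied from the table, while the dictionary layout and function name are chosen only for illustration.

```python
# Multiclass counts copied from the table above, per category.
multiclass_counts = {
    "physical_activity": {"YY": 1629, "YN": 550, "NY": 179, "NN": 1642},
    "sedentary_behavior": {"YY": 726, "YN": 395, "NY": 19, "NN": 860},
    "sleep_quality": {"YY": 1063, "YN": 862, "NY": 52, "NN": 1023},
}

def to_binary(counts):
    """Collapse multiclass counts: YY -> Yes, every other label -> No."""
    yes = counts["YY"]
    no = sum(n for label, n in counts.items() if label != "YY")
    return {"Yes": yes, "No": no}

binary_counts = {cat: to_binary(c) for cat, c in multiclass_counts.items()}
for cat, c in binary_counts.items():
    total = c["Yes"] + c["No"]
    print(f"{cat}: Yes={c['Yes']} ({100 * c['Yes'] / total:.2f}%), No={c['No']}, total={total}")
```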
Performance of the truth inference methods using a ground truth data set of 9000 labeled tweets: 4000 physical activity, 2000 sedentary behavior, and 3000 sleep quality tweets. The top 4 rows of each PASS (physical activity, sedentary behavior, and sleep quality) category represent the results of the applied unsupervised truth inference models.
| Tweets and method | Precision, multiclass (%) | Precision, binary (%) | Recall, multiclass (%) | Recall, binary (%) | F1, multiclass (%) | F1, binary (%) | AUCPR [a], multiclass (%) | AUCPR [a], binary (%) |
| Physical activity | | | | | | | | |
| MV [b] | 72 | 85 | 70 | | 71 | 84 | 56 | 85 |
| DS [d] | 74 | 85 | 68 | | 70 | 84 | 54 | 85 |
| GLAD [e] | 73 | 84 | 70 | 84 | 71 | 83 | 57 | 84 |
| RY [f] | 74 | 85 | 68 | | 70 | 84 | 54 | 84 |
| LR [g] | 74 | 85 | | | | | | 87 |
| KNN [h] | 74 | 85 | 74 | | 73 | 84 | 60 | |
| SVM [i] | 72 | | 73 | | 73 | | | |
| RF [j] | 73 | 85 | 74 | 84 | 73 | | 60 | 87 |
| XGBoost | 72 | 81 | 72 | 81 | 71 | 81 | 58 | 83 |
| DLmeta [k] | | 84 | 68 | 84 | 73 | 84 | 60 | 78 |
| DLtext/meta | 78 | 84 | 70 | 84 | 73 | 84 | 60 | 78 |
| Sedentary behavior | | | | | | | | |
| MV | 71 | 82 | 68 | 82 | 68 | 82 | 54 | 80 |
| DS | 70 | 81 | 62 | 81 | 65 | 81 | 48 | 79 |
| GLAD | 71 | 79 | 68 | 79 | 68 | 79 | 54 | 77 |
| RY | 70 | 81 | 62 | 81 | 65 | 81 | 48 | 79 |
| LR | 72 | | | | 70 | | | |
| KNN | 71 | 82 | 71 | 82 | 67 | 82 | 56 | 80 |
| SVM | 73 | | | | 70 | | | |
| RF | 72 | | | 82 | 69 | | 57 | |
| XGBoost | 68 | 82 | 69 | 82 | 67 | 82 | 54 | 80 |
| DLmeta | | 80 | 65 | 80 | | 80 | 56 | 73 |
| DLtext/meta | | 80 | 65 | 80 | | 80 | 56 | 75 |
| Sleep quality | | | | | | | | |
| MV | 78 | | 74 | | 75 | | 61 | 87 |
| DS | 80 | | 74 | | | | 62 | 87 |
| GLAD | 79 | 85 | 75 | 85 | 76 | 85 | 62 | 82 |
| RY | 80 | | 74 | | 76 | | 62 | 87 |
| LR | 76 | 88 | | 87 | | 88 | 64 | 88 |
| KNN | 76 | | | | | | 63 | |
| SVM | 76 | 88 | | 88 | | 88 | 64 | 88 |
| RF | 75 | | 76 | | 76 | | 63 | |
| XGBoost | 72 | 87 | 72 | | 72 | 87 | 58 | 87 |
| DLmeta | | 86 | 72 | 86 | 76 | 86 | 63 | 81 |
| DLtext/meta | 80 | 87 | 72 | 87 | 76 | 87 | | 82 |
[a] AUCPR: area under the precision-recall curve.
[b] MV: majority voting.
[c] Italicization indicates the best performance for each metric within each PASS (physical activity, sedentary behavior, and sleep quality) category.
[d] DS: Dawid and Skene.
[e] GLAD: generative model of labels, abilities, and difficulties.
[f] RY: Raykar algorithm.
[g] LR: logistic regression.
[h] KNN: K-nearest neighbors.
[i] SVM: support vector machine.
[j] RF: random forest.
[k] DL: deep learning.
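Majority voting (MV) is the simplest of the unsupervised truth inference methods listed in the table: each tweet takes the label that most workers chose. The sketch below aggregates hypothetical worker labels with MV and scores the inferred labels against a hypothetical ground truth using binary precision, recall, and F1. The data, the tie-breaking rule (the first label seen wins), and the function names are assumptions for illustration, not details from the study.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label; on a tie, the label seen first wins."""
    return Counter(labels).most_common(1)[0][0]

def precision_recall_f1(truth, predicted, positive="Yes"):
    """Binary precision, recall, and F1 for the given positive class."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical labels from three workers for each of five tweets.
worker_labels = [
    ["Yes", "Yes", "No"],
    ["No", "No", "Yes"],
    ["Yes", "No", "Yes"],
    ["No", "Yes", "Yes"],
    ["No", "No", "No"],
]
ground_truth = ["Yes", "No", "Yes", "No", "No"]  # hypothetical expert labels

inferred = [majority_vote(labels) for labels in worker_labels]
precision, recall, f1 = precision_recall_f1(ground_truth, inferred)
print(inferred, round(precision, 2), round(recall, 2), round(f1, 2))
```

The multiclass columns in the table would need a per-class average of these metrics; the excerpt does not say which averaging scheme was used, so the sketch covers the binary case only.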
Figure 4. Incremental classification accuracy using pool-based active learning. KNN: K-nearest neighbors; LR: logistic regression; RF: random forest; SVM: support vector machine; XGB: XGBoost.
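In pool-based active learning, a model is retrained repeatedly while it picks, from a pool of unlabeled items, the next item to send for labeling. The sketch below shows the loop on synthetic 1-dimensional data. It uses a nearest-centroid classifier as a stand-in for the models named in the caption and assumes uncertainty sampling as the query strategy; the excerpt does not state which strategy the study used, and the data, the oracle, and every name here are hypothetical.

```python
import random

def fit_centroids(X, y):
    """Nearest-centroid model: the mean feature value of each class."""
    return {c: sum(x for x, label in zip(X, y) if label == c) / y.count(c) for c in set(y)}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def margin(centroids, x):
    """Gap between the two closest centroids; a small gap means high uncertainty."""
    distances = sorted(abs(x - m) for m in centroids.values())
    return distances[1] - distances[0]

def oracle(x):
    """Stands in for the human annotator: the true class is 1 when x > 0.5."""
    return int(x > 0.5)

random.seed(0)
pool = [random.random() for _ in range(200)]      # unlabeled candidates
test_set = [random.random() for _ in range(200)]  # held-out evaluation points

# Deliberately poor seed set: the initial decision boundary sits near 0.3.
labeled_X, labeled_y = [0.0, 0.6], [0, 1]
accuracy = []
for _ in range(10):
    model = fit_centroids(labeled_X, labeled_y)
    correct = sum(predict(model, x) == oracle(x) for x in test_set)
    accuracy.append(correct / len(test_set))
    query = min(pool, key=lambda x: margin(model, x))  # most uncertain pool item
    pool.remove(query)
    labeled_X.append(query)
    labeled_y.append(oracle(query))

print([round(a, 3) for a in accuracy])
```

Each queried point lies near the current decision boundary, so every new label pulls the boundary toward the true threshold and the accuracy list traces an incremental curve of the kind the figure plots.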
Figure 5. The estimated impact of each piece of meta-information on XGBoost when predicting the truth label. Age is in years. D&S: Dawid and Skene; GLAD: generative model of labels, abilities, and difficulties; LFC: Learning from Crowds (Raykar algorithm); SHAP: Shapley additive explanations.
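SHAP attributes a prediction to its input features using Shapley values. With only a few features, these can be computed exactly by averaging each feature's marginal contribution over all feature orderings. The sketch below does this for a hypothetical scoring function over three features named after the truth inference outputs in the caption. The toy model, its coefficients, and the all-zero baseline are illustrative assumptions and have no connection to the fitted XGBoost model.

```python
from itertools import permutations

def shapley_values(model, instance, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    features = list(instance)
    contributions = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)       # start with every feature at its baseline
        previous = model(current)
        for f in order:
            current[f] = instance[f]   # reveal one feature at a time
            value = model(current)
            contributions[f] += value - previous
            previous = value
    return {f: total / len(orderings) for f, total in contributions.items()}

def toy_model(x):
    """Hypothetical score with one interaction term between ds and lfc."""
    return 0.5 * x["ds"] + 0.3 * x["glad"] + 0.2 * x["ds"] * x["lfc"]

instance = {"ds": 1, "glad": 1, "lfc": 1}   # all three methods vote "Yes"
baseline = {"ds": 0, "glad": 0, "lfc": 0}   # reference point: all vote "No"

phi = shapley_values(toy_model, instance, baseline)
print({f: round(v, 3) for f, v in phi.items()})
```

The attributions sum to the difference between the model output for the instance and for the baseline, and the interaction term is split evenly between the two features it involves.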