Ashlynn R Daughton, Rumi Chunara, Michael J Paul.
Abstract
BACKGROUND: Internet data can be used to improve infectious disease models. However, the representativeness and individual-level validity of internet-derived measures are largely unexplored, as this requires ground truth data for study.
Keywords: bias; influenza, human; infodemiology; logistic models; selection bias; social media
Year: 2020 PMID: 32329741 PMCID: PMC7210500 DOI: 10.2196/14986
Source DB: PubMed Journal: JMIR Public Health Surveill ISSN: 2369-2960
Study demographics.
| Variable | Study cohort (n=81) | Original data (N=396) |
| | | |
| Male | 54 (30) | 235 (38) |
| Female | 24 (67) | 149 (59) |
| Other | 1 (1) | 6 (2) |
| | | |
| Black | 1 (1) | 18 (5) |
| White | 69 (85) | 311 (79) |
| Native | 2 (3) | 9 (2) |
| Latino | 2 (3) | 32 (8) |
| Islander | 11 (14) | 58 (15) |
| Age (years), mean (SD) | 40.91 (14.01) | 37.47 (14.24) |
Logistic regression model results.
| Outcome of interest and feature set | Area under the curve | P value |
| | | |
| | | |
| Topic model | 0.51 | .38 |
| Behavior features | 0.30 | <.001 |
| | | |
| Topic model | 0.57 | <.001 |
| Behavior features | 0.50 | —[a] |
| | | |
| Topic model | 0.47 | .02 |
| Behavior features | 0.50 | — |
| | | |
| Topic model | 0.47 | <.001 |
| Behavior features | 0.50 | — |
| | | |
| Topic model | 0.52 | .11 |
| Behavior features | 0.50 | — |
| | | |
| Topic model | 0.50 | — |
| Behavior features | 0.50 | — |
| | | |
| Topic model | 0.46 | <.001 |
| Behavior features | 0.50 | — |
| | | |
| Topic model | 0.50 | — |
| Behavior features | 0.50 | — |
| | | |
| Topic model | 0.51 | .28 |
| Behavior features | 0.50 | — |
| | | |
| Topic model | 0.50 | — |
| Behavior features | 0.50 | — |
| | | |
| Topic model | 0.48 | .27 |
| Behavior features | 0.50 | — |
| | | |
| Topic model | 0.47 | <.001 |
| Behavior features | 0.50 | — |
| | | |
| Topic model | 0.48 | .002 |
| Behavior features | 0.50 | — |
| | | |
| | | |
| Topic model | 0.67 | <.001 |
| Behavior features | 0.50 | — |
| | | |
| | | |
| Topic model | 0.50 | — |
| Behavior features | 0.50 | — |
[a] Instances where the P value cannot be calculated.
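To help read the area under the curve (AUC) values in the table above, the following is a minimal sketch of AUC computed in its rank (Mann-Whitney) form: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. It is not the authors' code. The function name `auc` and the toy scores and labels are hypothetical, chosen only for illustration. The sketch shows one way an AUC of exactly 0.50 can arise: a classifier that assigns every example the same score cannot rank positives above negatives.

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Counts, over all positive/negative pairs, how often the positive
    example has the higher score; tied scores count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
labels = [1, 1, 1, 0, 0, 0]
constant_scores = [0.2] * 6                      # same prediction for everyone
ranked_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]   # mostly separates the classes

auc_constant = auc(constant_scores, labels)
auc_ranked = auc(ranked_scores, labels)
auc_inverted = auc([-s for s in ranked_scores], labels)
```

Under this reading, values near 0.50 indicate ranking no better than chance, and values below 0.50 indicate scores that tend to rank the classes in the wrong order.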
Most important topics for the in-sample classifier and direction of association.
| Topic | Top words | Associated with |
| 13 | science new human scientists data microbiome learning research study using great lab dna brain gt machine biology paper talk work wcsj2017 project interesting cool ai deep citizen check bacteria | In sample |
| 24 | cancer study disease research new risk brain join heart treatment scientific patients contributed health pain blood humanitarian drug help gut therapy depression diseases flu high dr women years vaccine | In sample |
| 36 | spread help share awareness terrible disease time cpu wcg earned points donating results days word donated past wcgrid week month years day hours son old semicolon badge 3026 1650935 raise | In sample |
| 97 | 5points genes gene human dna cancer notes new data cells genome tumor cell variants genomes vs bog15 genetic rare finds expression nygc rna non agbt15 gt pg14 protein paper | In sample |
| 16 | gold olympics usa olympic ich medal die org und silver team der ist rio2016 old medals contact es ein hockey won win das teamusa women nicht wins war einen | Random sample |
| 29 | que la el en se es lo por los mi para una te del las ya si como pero todo ser yo su tu da eu os est qu hoy | Random sample |
| 38 | hai a1 ho ke india ki modi a3 ka a2 se hi a5 nahi kya ko bhi toh a4 na timepass aur ab main contest mein tu ye kar | Random sample |
| 52 | new photo facebook posted martin instagram king photos luther video page yorker album pic shoot jeff caption cover credit selfie beijing york shkreli beatbaker burger ad repost fb likes selfies | Random sample |
| 53 | follow retweet gain trapadrive followers fast let thanks appreciate gainwithxtiandela retweets 1ddrive likes tweet active time rts naijafollowtrain follows 500 bam gainwithpyewaw ifb gaining turn 100 quick mzanzifollotrain gainwithtrevor | Random sample |
| 69 | launch shared rocket sd first spacex holbrook falcon test elon musk space satellite ship says fund 10 percent barrier mission join landing location second stage project life new cruise | Random sample |