Hung Nguyen, Thin Nguyen, Duc Thanh Nguyen.
Abstract
We propose in this work a graph-based approach for automatic public health analysis using social media. In our approach, graphs are created to model the interactions between features and between tweets in social media. We investigated different graph properties and methods for constructing graph-based representations for population health analysis. The proposed approach is applied in two case studies: (1) estimating health indices, and (2) classifying the health situation of counties in the US. We evaluate our approach on a dataset including more than one billion tweets collected in the three years 2014, 2015, and 2016, and on health surveys from the Behavioral Risk Factor Surveillance System. We conducted realistic and large-scale experiments on various textual features and graph-based representations. Experimental results verified the robustness of the proposed approach and its superiority over existing ones in both case studies, confirming the potential of the graph-based approach for modeling interactions in social networks for population health analysis. © Springer Science+Business Media, LLC, part of Springer Nature 2020.
Keywords: Geo-tagged tweets; Graphs; Health on the web; Large-scale computing; Population health
Year: 2020 PMID: 33132740 PMCID: PMC7585996 DOI: 10.1007/s11042-020-10034-0
Source DB: PubMed Journal: Multimed Tools Appl ISSN: 1380-7501 Impact factor: 2.757
Fig. 1 Flowchart of our approach
Case study 1: health index estimation performance (Spearman’s rho) of inter-feature graphs vs existing work
| Features | Methods | 2014 Generic | 2014 Physical | 2014 Mental | 2015 Generic | 2015 Physical | 2015 Mental | 2016 Generic | 2016 Physical | 2016 Mental |
|---|---|---|---|---|---|---|---|---|---|---|
| LIWC | Non-graph [ ] | 0.67 | 0.59 | 0.55 | 0.58 | 0.52 | 0.52 | 0.57 | 0.50 | 0.49 |
| LIWC | WL subtree [ ] | 0.69 | 0.60 | 0.55 | 0.62 | 0.56 | 0.52 | 0.57 | 0.50 | 0.49 |
| LIWC | BC property [ ] | 0.69 | 0.62 | 0.58 | 0.62 | 0.57 | 0.56 | 0.58 | 0.52 | 0.51 |
| LIWC | WL subtree [ ] + LIWC | 0.69 | 0.60 | 0.55 | 0.62 | 0.56 | 0.52 | 0.57 | 0.50 | 0.49 |
| LIWC | BC property [ ] + LIWC | 0.69 | 0.62 | 0.58 | 0.62 | 0.57 | 0.56 | 0.58 | 0.52 | 0.51 |
| Latent topics | Non-graph [ ] | 0.69 | 0.65 | 0.68 | 0.62 | 0.59 | 0.62 | 0.57 | 0.55 | 0.59 |
| Latent topics | WL subtree [ ] | 0.33 | 0.27 | 0.27 | 0.28 | 0.25 | 0.22 | 0.24 | 0.18 | 0.18 |
| Latent topics | BC property [ ] | 0.64 | 0.53 | 0.51 | 0.61 | 0.53 | 0.48 | 0.55 | 0.45 | 0.43 |
| Latent topics | WL subtree [ ] + Latent topics | 0.67 | 0.61 | 0.64 | 0.62 | 0.59 | 0.62 | 0.54 | 0.49 | 0.52 |
| Latent topics | BC property [ ] + Latent topics | | | | | | | | | |
For graph-based methods, only the BC property and WL subtree kernel results are presented, as these perform best among the graph properties and kernels evaluated. The index ranges are [4, 51] for generic health, [1, 10] for physical health, and [1, 10] for mental health. In each year and on each health index, the best performances are highlighted
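The tables above report Spearman's rho between estimated and surveyed county health indices. As a minimal illustration (not from the paper), the rank correlation can be computed in pure Python by ranking both series (averaging ranks over ties) and taking the Pearson correlation of the ranks:

```python
def rank(values):
    """Assign 1-based ranks, averaging over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))
```

Any monotonic relationship between estimates and survey values yields rho = 1, which is why the metric suits index estimation where only the county ordering matters.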
Case study 1: health index estimation performance (Spearman’s rho) of inter-tweet graphs vs existing work
| Features | Methods | 2014 Generic | 2014 Physical | 2014 Mental | 2015 Generic | 2015 Physical | 2015 Mental | 2016 Generic | 2016 Physical | 2016 Mental |
|---|---|---|---|---|---|---|---|---|---|---|
| LIWC | Non-graph [ ] | 0.46 | 0.37 | 0.31 | 0.30 | 0.22 | 0.13 | 0.33 | 0.15 | 0.11 |
| LIWC | WL subtree [ ] | 0.46 | 0.37 | 0.33 | 0.33 | 0.24 | 0.17 | 0.34 | 0.17 | 0.15 |
| LIWC | BC property [ ] | 0.46 | 0.36 | 0.33 | 0.34 | 0.25 | 0.17 | 0.34 | 0.20 | 0.17 |
| Latent topics | Non-graph [ ] | 0.49 | 0.36 | 0.31 | 0.36 | 0.26 | 0.22 | 0.35 | 0.26 | 0.23 |
| Latent topics | WL subtree [ ] | 0.23 | 0.26 | 0.27 | | | | | | |
| Latent topics | BC property [ ] | 0.53 | 0.37 | 0.26 | 0.37 | | | | | |
For graph-based methods, only the BC property and WL subtree kernel results are presented, as these perform best among the graph properties and kernels evaluated. The index ranges are [4, 51] for generic health, [1, 10] for physical health, and [1, 10] for mental health. In each year and on each health index, the best performances are highlighted
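The "BC property" rows summarize a graph through node betweenness centrality. The record does not give the exact feature construction, but as an illustrative sketch, betweenness for an unweighted, undirected graph can be computed with Brandes' algorithm:

```python
from collections import deque

def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted, undirected graphs.
    adj: dict mapping each node to a list of its neighbours."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Single-source shortest paths by BFS, counting path multiplicities.
        stack = []
        pred = {v: [] for v in adj}          # predecessors on shortest paths
        sigma = {v: 0 for v in adj}; sigma[s] = 1  # number of shortest paths
        dist = {v: -1 for v in adj}; dist[s] = 0
        q = deque([s])
        while q:
            v = q.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Back-propagate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    for v in bc:
        bc[v] /= 2.0  # undirected: each pair was counted from both endpoints
    return bc
```

On a path graph a-b-c the middle node carries the only shortest path between the endpoints, so its centrality is 1 while the endpoints score 0.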
Fig. 2 Learning curves of our graph convolution model on the training and validation sets. We used the model trained for 50 epochs in evaluations and comparisons
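The record refers to a graph convolution model but does not describe its architecture. Purely as a sketch, one widely used propagation step (the Kipf and Welling formulation, which may differ from the paper's) is H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W):

```python
def gcn_layer(adj, h, w):
    """One graph-convolution step: H' = ReLU(D^{-1/2}(A+I)D^{-1/2} H W).
    adj: n x n adjacency matrix, h: n x f node features, w: f x f' weights."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization of the adjacency matrix.
    norm = [[a_hat[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)] for i in range(n)]
    # Aggregate neighbour features, then apply the linear map and ReLU.
    agg = [[sum(norm[i][k] * h[k][j] for k in range(n)) for j in range(len(h[0]))]
           for i in range(n)]
    out = [[max(0.0, sum(agg[i][k] * w[k][j] for k in range(len(w))))
            for j in range(len(w[0]))] for i in range(n)]
    return out
```

With no edges the layer reduces to a per-node linear map; with edges, each node's output mixes in its neighbours' features, which is what lets the model exploit inter-feature or inter-tweet graph structure.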
Case study 2: classification performance of inter-feature graphs on three health indices: generic, physical, and mental
| Year | Features | Index | Acc | Sen | Spe | Mean AUC | Std AUC |
|---|---|---|---|---|---|---|---|
| 2014 | LIWC | Generic | 0.87 | 0.74 | 1.00 | 0.92 | 0.02 |
| 2014 | LIWC | Physical | 0.83 | 0.67 | 1.00 | | 0.04 |
| 2014 | LIWC | Mental | 0.80 | 0.61 | 1.00 | | 0.05 |
| 2014 | Latent topics | Generic | 0.87 | 0.75 | 1.00 | | 0.03 |
| 2014 | Latent topics | Physical | 0.82 | 0.65 | 0.99 | 0.90 | 0.05 |
| 2014 | Latent topics | Mental | 0.82 | 0.65 | 1.00 | 0.88 | 0.06 |
| 2015 | LIWC | Generic | 0.85 | 0.71 | 1.00 | 0.91 | 0.04 |
| 2015 | LIWC | Physical | 0.82 | 0.64 | 1.00 | | 0.04 |
| 2015 | LIWC | Mental | 0.80 | 0.61 | 1.00 | | 0.05 |
| 2015 | Latent topics | Generic | 0.84 | 0.69 | 1.00 | | 0.04 |
| 2015 | Latent topics | Physical | 0.81 | 0.62 | 1.00 | 0.89 | 0.05 |
| 2015 | Latent topics | Mental | 0.77 | 0.54 | 1.00 | | 0.06 |
| 2016 | LIWC | Generic | 0.84 | 0.68 | 1.00 | | 0.05 |
| 2016 | LIWC | Physical | 0.78 | 0.57 | 1.00 | | 0.06 |
| 2016 | LIWC | Mental | 0.77 | 0.55 | 1.00 | | 0.06 |
| 2016 | Latent topics | Generic | 0.80 | 0.60 | 1.00 | 0.88 | 0.04 |
| 2016 | Latent topics | Physical | 0.79 | 0.59 | 1.00 | 0.87 | 0.06 |
| 2016 | Latent topics | Mental | 0.77 | 0.54 | 1.00 | 0.86 | 0.05 |
We report the classification performance of our inter-feature graphs built upon LIWC and latent topics in years 2014, 2015, and 2016. In each year and on each health index, best mean AUC performances are highlighted
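The classification tables report mean and standard deviation of AUC (presumably over evaluation folds; the fold setup is not stated in this record). As an illustrative sketch, AUC itself can be computed without plotting an ROC curve, via the rank-sum (Mann-Whitney) identity:

```python
def auc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    positive example outranks a randomly chosen negative one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def mean_std(values):
    """Mean and population standard deviation, as in a 'Mean AUC / Std AUC' column."""
    m = sum(values) / len(values)
    return m, (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5
```

For example, with labels [0, 0, 1, 1] and scores [0.1, 0.4, 0.35, 0.8], three of the four positive-negative pairs are ranked correctly, giving an AUC of 0.75.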
Fig. 3 Visualization of learned features in graph convolution. Each data point corresponds to a county whose graph is built on LIWC: (a) input graph features, (b) features learned after 50 epochs. Features are created by concatenating node features in graphs and presented in 2D using Principal Component Analysis; the two most prominent components are selected for this visualization. As shown, the learned features are better separated than the input features
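The 2D projection in the figure uses PCA on the concatenated node features. As a self-contained sketch (power iteration with deflation rather than a library eigensolver; the paper's tooling is not stated), the top-2 projection can be computed as:

```python
def pca_2d(X, iters=200):
    """Project rows of X onto the top-2 principal components.
    Uses power iteration with deflation on the covariance matrix."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]  # center the data
    cov = [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) / n
            for b in range(d)] for a in range(d)]

    def top_eigvec(c):
        # Power iteration: repeatedly apply c and renormalize.
        v = [1.0] * d
        for _ in range(iters):
            w = [sum(c[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = sum(x * x for x in w) ** 0.5 or 1.0
            v = [x / norm for x in w]
        return v

    comps = []
    for _ in range(2):
        v = top_eigvec(cov)
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        comps.append(v)
        # Deflate: subtract the found component so the next-largest dominates.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(row[j] * c[j] for j in range(d)) for c in comps] for row in Xc]
```

The first coordinate of each projected point then captures the direction of greatest variance, which is what the figure plots on its principal axes.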
Case study 2: comparison of our method with existing ones using mean AUC in three years 2014, 2015, and 2016
| Features | Methods | 2014 Generic | 2014 Physical | 2014 Mental | 2015 Generic | 2015 Physical | 2015 Mental | 2016 Generic | 2016 Physical | 2016 Mental |
|---|---|---|---|---|---|---|---|---|---|---|
| LIWC | Non-graph [15] | 0.80 | 0.76 | 0.75 | 0.74 | 0.71 | 0.70 | 0.63 | 0.65 | 0.61 |
| LIWC | Ours | | | | | | | | | |
| Latent topics | Non-graph [15] | 0.85 | 0.80 | 0.81 | 0.82 | 0.76 | 0.75 | 0.76 | 0.76 | |
| Latent topics | Ours | | | | | | | | | |
We report the performance of our method and the work in [15] on both LIWC and latent topics. In each year and on each health index, best performances are highlighted