Hong-Hee Won, Woojae Myung, Gil-Young Song, Won-Hee Lee, Jong-Won Kim, Bernard J Carroll, Doh Kwan Kim.
Abstract
Suicide is not only an individual phenomenon; it is also influenced by social and environmental factors. Given the high suicide rate and the abundance of social media data in South Korea, we studied the potential of this new medium for predicting completed suicide at the population level. We tested two social media variables (suicide-related and dysphoria-related weblog entries), along with classical social, economic and meteorological variables, as predictors of suicide over 3 years (2008 through 2010). Both social media variables were strongly associated with suicide frequency. The suicide variable displayed high variability and was reactive to celebrity suicide events, while the dysphoria variable showed longer secular trends with lower variability. We interpret these as reflections of social affect and social mood, respectively. In the final multivariate model, the two social media variables, especially the dysphoria variable, displaced two classical economic predictors: the consumer price index and the unemployment rate. The prediction model developed with the 2-year training data set (2008 through 2009) was validated on the data for 2010 and was robust in a sensitivity analysis controlling for celebrity suicide effects. These results indicate that social media data may be of value in national suicide forecasting and prevention.
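The abstract describes suicide counts aggregated into three-day epochs, with 2008 through 2009 used for training and 2010 for validation. The following is a minimal Python sketch of that bookkeeping, assuming that epochs are formed separately within each period and that leftover days after the last full epoch are dropped (this record does not say how leftover days were handled). The function names and the constant daily counts are hypothetical, and the lag pairing simply mirrors the t-1 notation used in the tables.

```python
from datetime import date, timedelta


def daily_range(start, end):
    """Yield every calendar date from start to end, inclusive."""
    d = start
    while d <= end:
        yield d
        d += timedelta(days=1)


def to_epochs(daily_counts, start, end, size=3):
    """Sum daily counts into consecutive, non-overlapping epochs of `size` days.

    Assumption: days left over after the last full epoch are dropped.
    Days missing from `daily_counts` count as zero.
    """
    days = list(daily_range(start, end))
    n_full = len(days) // size
    return [
        sum(daily_counts.get(d, 0) for d in days[i * size:(i + 1) * size])
        for i in range(n_full)
    ]


def lagged_pairs(epochs):
    """Pair each epoch with the one before it: (value at t-1, value at t)."""
    return [(epochs[i - 1], epochs[i]) for i in range(1, len(epochs))]


# Hypothetical constant daily counts, used only to exercise the functions.
daily = {d: 40 for d in daily_range(date(2008, 1, 1), date(2010, 12, 31))}

train = to_epochs(daily, date(2008, 1, 1), date(2009, 12, 31))  # training period
valid = to_epochs(daily, date(2010, 1, 1), date(2010, 12, 31))  # validation period
print(len(train), len(valid))  # 243 121
print(lagged_pairs(train)[:2])  # [(120, 120), (120, 120)]
```

Under these assumptions, the 731 days of 2008 through 2009 give 243 full epochs and the 365 days of 2010 give 121, the same epoch counts reported in the Figure 1 caption.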
Year: 2013 PMID: 23630615 PMCID: PMC3632511 DOI: 10.1371/journal.pone.0061809
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Prediction of the nationwide number of suicides in three-day epochs.
Vertical bars mark the one-month period following each celebrity suicide (N = 6) (see Methods); the intervals for the first two celebrity suicides overlapped. Data from 2008 and 2009 were used as the training set and data from 2010 as the validation set. (A) Prediction range accuracy: observed suicides (blue solid line) and prediction intervals (red dashed lines). Prediction intervals were computed at 85% probability. Prediction range accuracy was 0.88 for the training set and 0.79 for the validation set. (B) Predicted suicides (red) and observed suicides (blue). Correlations between predicted and observed counts were 0.82 for the 243 epochs of the training set and 0.74 for the 121 epochs of the validation set. (C) Celebrity suicides and social media data: suicide weblog counts (blue) and dysphoria weblog counts (black). Dysphoria weblog counts were divided by 5 so that both series fit on the same vertical axis.
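The caption reports two accuracy measures. Reading "prediction range accuracy" as the fraction of observed epoch counts that fall inside their 85% prediction interval is an interpretation on our part (the exact definition is in the paper's Methods, which this record does not include). Under that reading, both measures can be sketched in Python as follows; the function names and toy numbers are hypothetical.

```python
import math


def prediction_range_accuracy(observed, lower, upper):
    """Fraction of observations lying inside their [lower, upper] interval."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(observed, lower, upper))
    return hits / len(observed)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy values, not data from the study.
observed = [120, 135, 110, 150, 128]
predicted = [118, 130, 115, 138, 131]
lower = [p - 10 for p in predicted]  # toy interval bounds, not an 85% interval
upper = [p + 10 for p in predicted]

print(prediction_range_accuracy(observed, lower, upper))  # 0.8
print(pearson(predicted, observed))
```

If the intervals are well calibrated, an 85% interval should cover roughly 85% of observations. Under this reading, that is the benchmark against which the reported 0.88 (training) and 0.79 (validation) can be compared.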
Variables selected for multivariate prediction model development.
| Variable | Description | t value | P value | Adjusted R-squared |
|---|---|---|---|---|
| suicide (t-1) | 3-day sum of the observed number of suicides at time t-1 | 16.40 | <2.00×10⁻¹⁶ | 0.53 |
| suicide_5yr_avg (t) | average number of suicides in the same month over the preceding five years | 4.74 | 3.60×10⁻⁶ | 0.08 |
| consumer price index (t-1) | change in the monthly consumer price index from 13 months before to 1 month before | −2.89 | 0.004 | 0.03 |
| unemployment (t-1) | monthly unemployment rate in the previous month | 7.44 | 1.75×10⁻¹² | 0.18 |
| stock (t-1) | Korean stock index (KOSPI), average closing value over the most recent 3-day epoch | −7.11 | 1.29×10⁻¹¹ | 0.17 |
| temperature (t-1) | daily temperature averaged over the most recent 3-day epoch | 5.42 | 1.42×10⁻⁷ | 0.11 |
| celebrity (t-1) | 1 within one month after a celebrity suicide, otherwise 0 | 8.45 | 2.80×10⁻¹⁵ | 0.22 |
| suicide weblog count (t-1) | sum of weblog posts containing the Korean word over the most recent 3 days | 8.42 | 3.32×10⁻¹⁵ | 0.22 |
| dysphoria weblog count (t-1) | sum of weblog posts containing the Korean word over the most recent 3 days | 7.63 | 5.44×10⁻¹³ | 0.19 |
t indicates the predicted time point, and t-1 indicates the preceding time point (see Methods for details). With the exception of the consumer price index, all variables remained significant after Bonferroni correction for multiple testing (each with uncorrected P<0.0002). The table displays uncorrected P values.
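For each candidate predictor, the table above reports a signed test statistic, a P value and an adjusted R-squared, which is consistent with single-predictor linear regressions. That reading is an assumption of this sketch, as is the use of a normal approximation to the t distribution for the P value, which is reasonable for samples of a few hundred epochs but poor for tiny samples. The sketch also shows the Bonferroni rule named in the footnote: with m tests at family-wise level alpha, each test is judged at alpha / m. All function names are hypothetical.

```python
import math


def simple_ols(x, y):
    """Fit y = a + b*x by least squares.

    Returns (b, t statistic for b, adjusted R-squared) for a one-predictor model.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)  # residual variance on n - 2 degrees of freedom
    t_stat = slope / math.sqrt(sigma2 / sxx)
    r2 = 1 - rss / syy
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)  # adjustment for p = 1 predictor
    return slope, t_stat, adj_r2


def two_sided_p_normal(t_stat):
    """Two-sided P value from a normal approximation to the t distribution."""
    return math.erfc(abs(t_stat) / math.sqrt(2))


def bonferroni_threshold(alpha, m):
    """Per-test significance threshold when m tests share family-wise level alpha."""
    return alpha / m


# Hypothetical toy data (far too small for the normal approximation; shows calls only).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, t_stat, adj_r2 = simple_ols(x, y)
p = two_sided_p_normal(t_stat)
print(slope, t_stat, adj_r2, p)
print(p < bonferroni_threshold(0.05, 8))
```

As an arithmetic illustration only, a per-test threshold of 0.0002 corresponds to alpha = 0.05 spread over 250 tests. This record does not state how many tests the authors counted.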
Final variables included in the prediction model (adjusted R-squared = 0.66).
| Variable | Regression coefficient | t value | P value |
|---|---|---|---|
| constant | 4.07 | 34.83 | <2.00×10⁻¹⁶ |
| suicide (t-1) | 0.003 | 6.27 | 1.74×10⁻⁹ |
| dysphoria weblog count (t-1) | 3.18×10⁻⁵ | 5.95 | 9.66×10⁻⁹ |
| stock (t-1) | −2.30×10⁻⁴ | −4.83 | 2.49×10⁻⁶ |
| celebrity (t-1) | 0.10 | 3.11 | 0.002 |
| suicide_5yr_avg (t) | 2.84×10⁻⁴ | 3.10 | 0.002 |
| temperature (t-1) | 0.004 | 3.00 | 0.003 |
| suicide weblog count (t-1) | 3.23×10⁻⁵ | 2.00 | 0.047 |
t indicates the predicted time point, and t-1 indicates a previous time point (see Methods for details).
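The final model combines its seven predictors with the coefficients in the table above. The sketch below evaluates that linear predictor, assuming an ordinary additive linear model. The record does not say whether the outcome was transformed before fitting, so the result is on whatever scale the model was fitted on and should not be read directly as a suicide count. The dictionary keys are hypothetical names for the table's variables, and the coefficients are the rounded values as printed.

```python
# Coefficients as printed in the final-model table (rounded values).
CONSTANT = 4.07
COEFFICIENTS = {
    "suicide_t1": 0.003,             # suicide (t-1)
    "dysphoria_weblog_t1": 3.18e-5,  # dysphoria weblog count (t-1)
    "stock_t1": -2.30e-4,            # stock (t-1)
    "celebrity_t1": 0.10,            # celebrity (t-1), 0 or 1
    "suicide_5yr_avg_t": 2.84e-4,    # suicide_5yr_avg (t)
    "temperature_t1": 0.004,         # temperature (t-1)
    "suicide_weblog_t1": 3.23e-5,    # suicide weblog count (t-1)
}


def linear_predictor(features):
    """Return CONSTANT + sum(coefficient * value) over all seven predictors.

    Raises KeyError if any predictor is missing from `features`.
    """
    missing = sorted(set(COEFFICIENTS) - set(features))
    if missing:
        raise KeyError(f"missing predictors: {missing}")
    return CONSTANT + sum(coef * features[name] for name, coef in COEFFICIENTS.items())


# Hypothetical input: every predictor at zero except the celebrity indicator.
example = {name: 0 for name in COEFFICIENTS}
example["celebrity_t1"] = 1
print(linear_predictor(example))
```

Each coefficient is expressed per raw unit of its variable (weblog counts are large numbers, while the celebrity indicator is 0 or 1), so coefficient magnitudes are not directly comparable across rows.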