Sayanti Mukherjee, Zhiyuan Wei.
Abstract
Disparity in suicide rates across metropolitan areas in the US is growing. Besides personal genomics and pre-existing mental health conditions, which affect individual-level suicidal behaviors, contextual factors are instrumental in determining region- and community-level suicide risk. However, quantitative approaches for modeling the complex associations and interplay of socio-environmental factors with regional suicide rates are lacking. In this paper, we propose a holistic data-driven framework to model the associations of socio-environmental factors (demographic, socio-economic, and climate) with suicide rates, and compare the key socio-environmental determinants of suicides across the large and medium/small metros of the vulnerable US states, leveraging a suite of advanced statistical learning algorithms. We found that random forest outperforms all the other models in terms of both in-sample goodness-of-fit and out-of-sample predictive accuracy; it is therefore used for statistical inference. Overall, our findings show a significant difference in the relationships of socio-environmental factors with suicide rates between the large and medium/small metropolitan areas of the vulnerable US states. In particular, suicides in medium/small metros are more sensitive to socio-economic and demographic factors, while those in large metros are more sensitive to climatic factors. Our results also indicate that non-Hispanics, Native Hawaiians or Pacific Islanders, and adolescents aged 15–29 years residing in large metropolitan areas are more vulnerable to suicide than those living in medium/small metropolitan areas. We also observe that higher temperatures are positively associated with higher suicide rates, with large metros being more sensitive to this association than medium/small metros. Our proposed data-driven framework underscores the future opportunities of using big data analytics to analyze the complex associations of socio-environmental factors and to inform policy actions accordingly.
Year: 2021 PMID: 34818324 PMCID: PMC8612572 DOI: 10.1371/journal.pone.0258824
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Study samples.
| Urbanization level | Selected counties |
|---|---|
| Large Metros | Maricopa County (AZ), Adams County (CO), Arapahoe County (CO), Denver County (CO), Douglas County (CO), Jefferson County (CO), Johnson County (KS), Jefferson County (KY), St. Charles County (MO), St. Louis County (MO), Jackson County (MO), Clark County (NV), Oklahoma County (OK), Clackamas County (OR), Multnomah County (OR), Washington County (OR), Davidson County (TN), Shelby County (TN), Salt Lake County (UT), Clark County (WA), King County (WA), Pierce County (WA), Snohomish County (WA) |
| Medium/Small Metros | Mohave County (AZ), Pima County (AZ), Yavapai County (AZ), El Paso County (CO), Weld County (CO), Ada County (ID), Sedgwick County (KS), Washoe County (NV), Hillsborough County (NH), Bernalillo County (NM), Tulsa County (OK), Utah County (UT), Weber County (UT), Spokane County (WA) |
Fig 1. Violin plot depicting normalized suicide mortality rates between large and medium/small metropolitan areas.
Violin plots are similar to box plots, with a rotated kernel density plot on each side showing the probability density of the data at different values.
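The kernel density that gives a violin plot its outline can be sketched as follows. This is a minimal illustration assuming a Gaussian kernel and a hand-picked bandwidth; the function name `gaussian_kde` and the sample values are hypothetical, and the source does not state which kernel or bandwidth was used for Fig 1.

```python
import math

def gaussian_kde(sample, bandwidth):
    """Return a density function estimated from `sample`.

    Assumption: Gaussian kernel with a fixed, user-supplied bandwidth.
    """
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Sum one Gaussian bump per observation, then normalize.
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm

    return density

# Hypothetical normalized rates; a violin plot mirrors this curve on both sides.
rates = [0.8, 1.1, 1.2, 1.5, 2.3]
density = gaussian_kde(rates, bandwidth=0.3)
outline = [(x / 10, density(x / 10)) for x in range(0, 31)]
```

The width of the violin at a given value is proportional to `density(value)`, which is why thicker sections mark values that occur more often.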
Description of socio-environmental variables.
| Variable Name | Description | Periodicity |
|---|---|---|
| Urbanization level | Large metro or medium/small metro per county. | Annually |
| Unemployment Rate | Percent of unemployed workers in the total labor force. | Monthly |
| Poverty | Percent of people (of all ages) in poverty in the county. | Annually |
| Income | Median household income in the county. | Annually |
| Age Group 1 | Percent of county’s population aged below 14. | Annually |
| Age Group 2 | Percent of county’s population between ages 15–29. | Annually |
| Age Group 3 | Percent of county’s population between ages 30–44. | Annually |
| Age Group 4 | Percent of county’s population between ages 45–59. | Annually |
| Age Group 5 | Percent of county’s population between ages 60–74. | Annually |
| Age Group 6 | Percent of county’s population aged above 75. | Annually |
| Female | Percent of county’s population female. | Annually |
| NA | Percent of county’s population which is Native Hawaiian, Pacific Islander alone (i.e., no other race). | Annually |
| AA | Percent of county’s population which is Asian alone. | Annually |
| IA | Percent of county’s population which is American Indian, Alaska native alone. | Annually |
| BA | Percent of county’s population which is Black alone. | Annually |
| WA | Percent of county’s population which is White alone. | Annually |
| NH | Percent of county’s population which is non-Hispanic. | Annually |
| Education Group 1 | Percent of county’s population whose education level is less than a High School diploma. | Annually |
| Education Group 2 | Percent of county’s population whose education level is a High School diploma only. | Annually |
| Education Group 3 | Percent of county’s population whose education level is some college or an Associate’s degree. | Annually |
| Education Group 4 | Percent of county’s population whose education level is a Bachelor’s degree or higher. | Annually |
| DP10 | Number of days with ≥ 1.00 inch of precipitation in the month. | Monthly |
| DT00 | Number of days with minimum temperature ≤ 0 degrees Fahrenheit. | Monthly |
| DX32 | Number of days with maximum temperature ≤ 32 degrees Fahrenheit. | Monthly |
| DX70 | Number of days with maximum temperature ≥ 70 degrees Fahrenheit. | Monthly |
| DX90 | Number of days with maximum temperature ≥ 90 degrees Fahrenheit. | Monthly |
| EMXP | Extreme maximum daily precipitation total within month. Values are given in inches (to hundredths). | Monthly |
| CDSD | Cooling degree days (season-to-date). Running total of monthly cooling degree days through the end of the most recent month. Each month is summed to produce a season-to-date total. Season starts in July in Northern Hemisphere and January in Southern Hemisphere. | Monthly |
| HDSD | Heating degree days (season-to-date). Running total of monthly heating degree days through the end of the most recent month. Each month is summed to produce a season-to-date total. Season starts in July in Northern Hemisphere and January in Southern Hemisphere. | Monthly |
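The season-to-date variables (CDSD, HDSD) are running totals that reset at the start of each season. A minimal sketch of that accumulation is given below; the function name `season_to_date`, the input layout, and the example numbers are hypothetical, and the season start month is left as a parameter rather than fixed.

```python
def season_to_date(monthly_totals, season_start_month):
    """Accumulate monthly degree days into a season-to-date running total.

    monthly_totals: list of (month_number, degree_days) in chronological order.
    season_start_month: month (1-12) at which the running total resets.
    """
    running = 0.0
    result = []
    for month, degree_days in monthly_totals:
        if month == season_start_month:
            running = 0.0  # a new season begins, so the total starts over
        running += degree_days
        result.append((month, running))
    return result

# Hypothetical monthly heating degree days for May through August.
example = season_to_date([(5, 10), (6, 20), (7, 5), (8, 7)], season_start_month=7)
```

In this example the total grows through June, resets in July, and then accumulates again from the July value.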
Fig 2. Schematic of the proposed data-driven research framework.
Large metropolitan counties: Model performance comparison.
| # | Model | Goodness-of-fit R² | Goodness-of-fit RMSE | Goodness-of-fit MAE | Predictive accuracy R² | Predictive accuracy RMSE | Predictive accuracy MAE |
|---|---|---|---|---|---|---|---|
| 1 | Generalized Linear Model | 0.507 | 0.265 | 0.206 | 0.470 | 0.268 | 0.211 |
| 2 | Ridge Regression | 0.505 | 0.265 | 0.207 | 0.470 | 0.268 | 0.210 |
| 3 | Lasso Regression | 0.487 | 0.270 | 0.209 | 0.459 | 0.271 | 0.211 |
| 4 | Generalized Additive Model | 0.557 | 0.250 | 0.194 | 0.475 | 0.267 | 0.208 |
| 5 | Multivariate Adaptive Regression Splines [degree = 1] | 0.527 | 0.259 | 0.201 | 0.472 | 0.267 | 0.207 |
| 6 | Multivariate Adaptive Regression Splines [degree = 2] | 0.532 | 0.258 | 0.200 | 0.462 | 0.270 | 0.211 |
| 7 | Multivariate Adaptive Regression Splines [degree = 3] | 0.577 | 0.245 | 0.191 | 0.402 | 0.285 | 0.220 |
| 8 | Multivariate Adaptive Regression Splines [degree = 3; penalty = 2] | 0.506 | 0.264 | 0.206 | 0.442 | 0.275 | 0.213 |
| 9 | Random Forest | | | | | | |
| 10 | Gradient Boosting Method | 0.887 | 0.126 | 0.100 | 0.365 | 0.293 | 0.229 |
| 11 | Bayesian Additive Regression Trees | 0.574 | 0.246 | 0.190 | 0.484 | 0.265 | 0.205 |
| 12 | Null Model (Mean-only) | NA | 0.377 | 0.296 | NA | 0.369 | 0.292 |
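The RMSE and MAE columns, and the mean-only null model used as a baseline, can be illustrated with a short sketch. The function names and the toy numbers below are hypothetical and are not the study data; the sketch only shows how such error measures are conventionally computed.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def null_model_predictions(training_response, n):
    """Mean-only baseline: predict the training mean for every observation."""
    mean = sum(training_response) / len(training_response)
    return [mean] * n

# Toy example (not study data): compare a model against the mean-only baseline.
actual = [1.0, 2.0, 3.0, 4.0]
model_pred = [1.0, 2.0, 3.0, 6.0]
null_pred = null_model_predictions(actual, len(actual))
model_rmse, null_rmse = rmse(actual, model_pred), rmse(actual, null_pred)
```

A model is only informative if its errors fall below those of the null model, which is the comparison the last table row supports.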
Medium/Small metropolitan counties: Model performance comparison.
| # | Model | Goodness-of-fit R² | Goodness-of-fit RMSE | Goodness-of-fit MAE | Predictive accuracy R² | Predictive accuracy RMSE | Predictive accuracy MAE |
|---|---|---|---|---|---|---|---|
| 1 | Generalized Linear Model | 0.626 | 0.398 | 0.300 | 0.570 | 0.402 | 0.307 |
| 2 | Ridge Regression | 0.626 | 0.398 | 0.300 | 0.570 | 0.402 | 0.308 |
| 3 | Lasso Regression | 0.591 | 0.416 | 0.312 | 0.537 | 0.418 | 0.317 |
| 4 | Generalized Additive Model | 0.779 | 0.305 | 0.233 | 0.645 | 0.364 | 0.274 |
| 5 | Multivariate Adaptive Regression Splines [degree = 1] | 0.750 | 0.325 | 0.249 | 0.655 | 0.359 | 0.272 |
| 6 | Multivariate Adaptive Regression Splines [degree = 2] | 0.760 | 0.319 | 0.246 | 0.627 | 0.371 | 0.280 |
| 7 | Multivariate Adaptive Regression Splines [degree = 3] | 0.790 | 0.297 | 0.230 | 0.587 | 0.391 | 0.287 |
| 8 | Multivariate Adaptive Regression Splines [degree = 3; penalty = 2] | 0.724 | 0.340 | 0.264 | 0.617 | 0.379 | 0.286 |
| 9 | Random Forest | | | | | | |
| 10 | Gradient Boosting Method | 0.967 | 0.117 | 0.090 | 0.620 | 0.376 | 0.284 |
| 11 | Bayesian Additive Regression Trees | 0.804 | 0.287 | 0.218 | 0.667 | 0.354 | 0.266 |
| 12 | Null Model (Mean-only) | NA | 0.650 | 0.456 | NA | 0.619 | 0.440 |
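Two quantities help when reading this table: the gap between in-sample and out-of-sample error, and the relative improvement over the null model. The sketch below uses RMSE values copied from the rows above, taking the RMSE columns under goodness-of-fit and predictive accuracy as in-sample and out-of-sample error respectively; the helper names are hypothetical.

```python
# (in-sample RMSE, out-of-sample RMSE) for selected medium/small metro models,
# copied from the table; the column reading is an assumption noted in the text.
rmse_pairs = {
    "Generalized Linear Model": (0.398, 0.402),
    "Generalized Additive Model": (0.305, 0.364),
    "Gradient Boosting Method": (0.117, 0.376),
    "Bayesian Additive Regression trees": (0.287, 0.354),
}
NULL_OUT_OF_SAMPLE_RMSE = 0.619  # Null Model (Mean-only) row

def generalization_gap(in_sample, out_of_sample):
    """How much worse the model does on unseen data than on training data."""
    return out_of_sample - in_sample

def improvement_over_null(out_of_sample, null_rmse):
    """Fractional reduction in out-of-sample RMSE relative to the null model."""
    return (null_rmse - out_of_sample) / null_rmse

gaps = {name: generalization_gap(*pair) for name, pair in rmse_pairs.items()}
gains = {
    name: improvement_over_null(pair[1], NULL_OUT_OF_SAMPLE_RMSE)
    for name, pair in rmse_pairs.items()
}
```

Among the listed rows, Gradient Boosting shows the largest gap (0.376 versus 0.117), a pattern consistent with fitting the training data much more closely than it predicts held-out data.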
Fig 3. Large metropolitan counties: Model diagnostics of final random forest model.
(A) Residuals QQ plot (the blue dashed lines represent 95% confidence intervals); (B) Predicted versus actual suicide counts, normalized per 100,000 of population.
Fig 4. Medium/Small metropolitan counties: Model diagnostics of final random forest model.
(A) Residuals QQ plot (the blue dashed lines represent 95% confidence intervals); (B) Predicted versus actual suicide counts, normalized per 100,000 of population.
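The two diagnostic ingredients named in these captions, a rate normalized per 100,000 of population and the points of a residual QQ plot, can be sketched as follows. The function names and the plotting position `(i + 0.5) / n` are assumptions made for illustration; the source does not say how the QQ plots or their 95% confidence bands were constructed, and the bands are not reproduced here.

```python
from statistics import NormalDist, mean, pstdev

def rate_per_100k(count, population):
    """Suicide count normalized per 100,000 of population."""
    return count / population * 100_000

def qq_points(residuals):
    """Pair theoretical standard-normal quantiles with standardized residuals.

    Assumes at least two distinct residuals so the standard deviation is nonzero.
    """
    n = len(residuals)
    mu, sigma = mean(residuals), pstdev(residuals)
    observed = sorted((r - mu) / sigma for r in residuals)
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, observed))

# Toy residuals (not study data); points near the line y = x suggest normality.
points = qq_points([-1.0, 0.0, 1.0])
```

Plotting the second coordinate against the first gives panel (A); plotting predicted against actual normalized rates gives panel (B).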
Summary of top 15 variables in large and medium/small metropolitan areas.
| Variable | Description | Large Metros: Rank | Large Metros: Correlation | Medium/Small Metros: Rank | Medium/Small Metros: Correlation |
|---|---|---|---|---|---|
| AA | Percent of Asian population. | 1 | Negative | 2 | Negative |
| BA | Percent of Black population. | 12 | Mixed | 1 | Negative |
| NH | Percent of non-Hispanic population. | 9 | Positive | 15 | Positive |
| IA | Percent of American Indian, Alaska native population. | 10 | Mixed | 5 | Negative |
| NA | Percent of Native Hawaiian, Pacific Islander population. | 13 | Positive | 6 | Negative |
| Female | Percent of female population. | 7 | Mixed | 4 | Negative |
| Age_1 | Percent of children aged below 14 years old. | 6 | Mixed | 14 | Mixed |
| Age_2 | Percent of adolescents aged 15–29 years old. | 4 | Positive | 8 | Negative |
| Age_6 | Percent of older people aged above 75 years old. | - | - | 7 | Mixed |
| Education_1 | Percent of people with less than a high school degree. | 11 | Mixed | 11 | Positive |
| Education_2 | Percent of people with a high school degree. | 8 | Mixed | 3 | Positive |
| Education_3 | Percent of people with an associate degree. | 14 | Negative | 10 | Negative |
| Unemployment | Percent of unemployed workers in the total labor force. | - | - | 9 | Positive |
| Income | Median household income. | - | - | 13 | Mixed |
| DX90 | Number of days with temperature ≥ 90°F. | 2 | Positive | - | - |
| DX70 | Number of days with temperature ≥ 70°F. | 3 | Positive | - | - |
| HDSD | Seasonal heating degree days. | 5 | Mixed | - | - |
| EMXP | Extreme maximum daily precipitation total within month. | 15 | Mixed | - | - |
| CDSD | Seasonal cooling degree days. | - | - | 12 | Positive |
Note: a positive correlation means that the predictor and the response variable change in the same direction (both increasing or both decreasing), while a negative correlation means that they change in opposite directions. A mixed correlation indicates a combination of positive and negative relationships between the predictor and the response variable.
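The three labels in the note can be made concrete with a small sketch that classifies a response curve by the sign of its successive changes. This is an illustrative rule only: the function name `classify_direction` and the tolerance parameter are hypothetical, and the source does not state how the labels in the table were assigned.

```python
def classify_direction(curve, tol=0.0):
    """Label a sequence of response values as Positive, Negative, Mixed, or Flat.

    curve: response values ordered by increasing predictor value.
    tol: changes with absolute size at or below this are ignored.
    """
    diffs = [b - a for a, b in zip(curve, curve[1:])]
    rising = any(d > tol for d in diffs)
    falling = any(d < -tol for d in diffs)
    if rising and falling:
        return "Mixed"
    if rising:
        return "Positive"
    if falling:
        return "Negative"
    return "Flat"

# Toy curves (not study data).
labels = [
    classify_direction([1.0, 1.5, 2.0]),
    classify_direction([2.0, 1.5, 1.0]),
    classify_direction([1.0, 2.0, 1.5]),
]
```

A nonzero `tol` keeps small wiggles in an otherwise monotone curve from being labeled as mixed.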
Fig 5. Variable importance ranking of top 15 predictors.
The top 15 socio-environmental factors selected by the random forest in relation to suicide rates are shown in (a) for large metros and (b) for medium/small metros.
Fig 6. Suicide mortality rate and race: (A) Large metro areas; (B) Medium/small metro areas.
Rug lines on the x axis indicate prevalence of data points; black curve is the average marginal effect of the predictor variable; red lines indicate the 95% confidence intervals.
Fig 7. Suicide mortality rate and gender: (A) Large metro areas; (B) Medium/small metro areas.
Rug lines on the x axis indicate prevalence of data points; black curve is the average marginal effect of the predictor variable; red lines indicate the 95% confidence intervals.
Fig 8. Suicide mortality rate and age: (A) Large metro areas; (B) Medium/small metro areas.
Rug lines on the x axis indicate prevalence of data points; black curve is the average marginal effect of the predictor variable; red lines indicate the 95% confidence intervals.
Fig 9. Suicide mortality rate and education: (A) Large metro areas; (B) Medium/small metro areas.
Rug lines on the x axis indicate prevalence of data points; black curve is the average marginal effect of the predictor variable; red lines indicate the 95% confidence intervals.
Fig 10. Suicide mortality rate and economics: (B) Medium/small metro areas only.
Rug lines on the x axis indicate prevalence of data points; black curve is the average marginal effect of the predictor variable; red lines indicate the 95% confidence intervals.
Fig 11. Suicide mortality rate and climate: (A) Large metro areas; (B) Medium/small metro areas.
Rug lines on the x axis indicate prevalence of data points; black curve is the average marginal effect of the predictor variable; red lines indicate the 95% confidence intervals.
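The black curves in Figs 6–11 are described as the average marginal effect of a predictor. One common way to obtain such a curve from a fitted model is a partial dependence calculation, sketched below under that assumption; the function name, the dictionary-based rows, and the toy model are hypothetical, and the source does not confirm that this exact procedure was used.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average model prediction as one feature is swept over a grid.

    predict: function mapping a row (dict of feature -> value) to a prediction.
    rows: observed data rows; all other features keep their observed values.
    feature: name of the predictor being varied.
    grid: values at which to evaluate the curve.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original data is untouched
            modified[feature] = value  # force the feature to the grid value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

# Toy stand-in for a fitted model (not the study's random forest).
toy_predict = lambda row: 2 * row["x"] + row["z"]
toy_rows = [{"x": 0, "z": 1}, {"x": 5, "z": 3}]
toy_curve = partial_dependence(toy_predict, toy_rows, "x", [0, 1, 2])
```

Each point on the curve averages over the observed values of all other predictors, so the shape reflects the model's response to the swept feature alone.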