George Grekousis, Zhixin Feng, Ioannis Marakakis, Yi Lu, Ruoyu Wang.
Abstract
A growing number of studies show that the uneven spatial distribution of COVID-19 deaths is related to demographic and socioeconomic disparities across space. However, most studies fail to assess the relative importance of each factor to the COVID-19 death rate and, more importantly, how this importance varies spatially. Here, we assess which variables are most important locally using Geographical Random Forest (GRF), a local non-linear regression method. Through GRF, we estimated the non-linear relationships between the COVID-19 death rate and 29 socioeconomic and health-related factors during the first year of the pandemic in the USA (county level). GRF outputs are compared to global (Random Forest and OLS) and local (Geographically Weighted Regression) models. Results show that GRF outperforms all other models and that the importance of variables varies widely by location. For example, lack of health insurance is the most important factor in about one-third (34.86%) of US counties. Most of these counties are concentrated in the Midwest and South regions. On the other hand, no leisure-time physical activity is the primary factor for 19.86% of US counties. These counties are found in California, Oregon, Washington, and parts of the South region. Understanding the location-based characteristics and spatial patterns of socioeconomic and health factors linked to COVID-19 deaths is paramount for policy design and decision making. In this way, interventions can be designed and implemented based on the most important factors locally, thus avoiding general guidelines addressed to the entire nation.
Keywords: COVID-19; Random forest; USA; spatial Machine learning
Year: 2022 PMID: 35114614 PMCID: PMC8801594 DOI: 10.1016/j.healthplace.2022.102744
Source DB: PubMed Journal: Health Place ISSN: 1353-8292 Impact factor: 4.078
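The abstract describes GRF as a local non-linear regression method. A minimal sketch of the "local" part is given here, under stated assumptions: one model is fitted per county using only its `k` nearest counties by centroid distance. The names `k_nearest`, `fit_local_models`, and `fit_mean` are hypothetical, the data are made up, and the stand-in learner is a simple local mean rather than a random forest. The record does not specify how neighbourhoods or bandwidths were chosen, so this is not the authors' implementation.

```python
import math

def k_nearest(coords, i, k):
    """Indices of the k locations closest to location i (i itself included)."""
    xi, yi = coords[i]
    by_distance = sorted(
        range(len(coords)),
        key=lambda j: math.hypot(coords[j][0] - xi, coords[j][1] - yi),
    )
    return by_distance[:k]

def fit_local_models(coords, X, y, fit, k):
    """Fit one model per location on that location's k nearest neighbours.

    `fit` is any function (X_local, y_local) -> callable(row) -> prediction.
    A GRF would plug a random forest learner in here (assumption).
    """
    models = []
    for i in range(len(coords)):
        idx = k_nearest(coords, i, k)
        models.append(fit([X[j] for j in idx], [y[j] for j in idx]))
    return models

def fit_mean(X_local, y_local):
    """Stand-in learner: predicts the neighbourhood mean of the target."""
    local_mean = sum(y_local) / len(y_local)
    return lambda row: local_mean

# Hypothetical centroids forming two spatial clusters, one feature, one target.
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [10.0, 20.0, 30.0, 100.0, 110.0, 120.0]

models = fit_local_models(coords, X, y, fit_mean, k=3)
predictions = [model(row) for model, row in zip(models, X)]
```

Because every location gets its own model, quantities such as variable importance can differ from one county to the next, which is the property the abstract relies on.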
Names, descriptions and sources of the variables.
| Theme | Variable Name | Description | Source |
|---|---|---|---|
| Demographic | Population density | Population density per square km | US Census Bureau, 2019 |
| Demographic | %Age 20-39 | % Population by age: 20–39 years | |
| Demographic | %Age 40-59 | % Population by age: 40–59 years | |
| Demographic | %Age 60-79 | % Population by age: 60–79 years | |
| Demographic | %Age 80+ | % Population by age: 80 years and over | |
| Demographic | %African American | % Population by race: Black or African American alone | |
| Demographic | %Asian | % Population by race: Asian alone | |
| Demographic | %Disabled | % Civilian noninstitutionalized population with a disability | |
| Households | Household size | Average household size | US Census Bureau, 2019 |
| Housing Characteristics | %No vehicles | % Occupied housing units with no vehicles available | US Census Bureau, 2019 |
| Housing Characteristics | %Housing problem | % Households with at least 1 of 4 housing problems: overcrowding, high housing costs, lack of kitchen facilities, or lack of plumbing facilities | |
| Education | %> Bachelor | % Population 25 years and over with a bachelor's degree | US Census Bureau, 2019 |
| Employment | %Work construction and trade sector | % Workers in construction, manufacturing, wholesale trade, transportation, warehousing, utilities and retail trade | US Census Bureau, 2019 |
| Employment | %Work services sector | % Workers in information, finance, insurance, real estate, rental, leasing, professional, scientific, management, administrative and waste management services | |
| Employment | %Work social sector | % Workers in educational services, health care, and social assistance | |
| Economic | Median income | Median annual household income (in 1000 dollars) | US Census Bureau, 2019 |
| Economic | %Unemployment | % Unemployment rate for population 20–64 years | |
| Economic | %No insurance | % Current lack of health insurance among adults aged 18–64 years (2018, age-adjusted prevalence) | |
| Economic | %Poverty | % Population below the poverty level, among those for whom poverty status is determined | US Census Bureau, 2019 |
| Commuting | %Private transportation | % Workers 16 years and over by means of transportation to work: car, truck, or van (drove alone) | US Census Bureau, 2019 |
| Commuting | %Walking | % Workers 16 years and over by means of transportation to work: walked | |
| Commuting | %Work from home | % Workers 16 years and over who worked from home | |
| Commuting | Commuting time | Mean travel time to work (minutes) for workers 16 years and over who did not work from home | |
| Health Condition | Heart disease mortality | Heart disease mortality per 100,000 population (age-adjusted, spatially smoothed, 3-year average, 2016–2018) | |
| Health Condition | %Asthma | % Current asthma among adults aged ≥18 years (2018, age-adjusted prevalence) | |
| Health Condition | %Obesity | % Obesity among adults aged ≥18 years (2018, age-adjusted prevalence) | |
| Health Condition | %Sleep<7hrs | % Sleeping less than 7 h among adults aged ≥18 years (2018, age-adjusted prevalence) | |
| Health Condition | %No leisure-time PA | % No leisure-time physical activity among adults aged ≥18 years (2018, age-adjusted prevalence) | |
| Health Condition | %Smokers | % Current smoking among adults aged ≥18 years (2018, age-adjusted prevalence) | |
| Coordinates | X,Y | County centroid coordinates | US Census Bureau, 2019 |
| COVID-19 | Deaths per 100k | Cumulative COVID-19 deaths per 100,000 population as of February 5, 2021 | |
Model assessment metrics.
| Model | RMSE | MAE | R² | OOB |
|---|---|---|---|---|
| OLS | 74.83 | 57.19 | 0.30 | NA |
| GWR | 70.25 | 55.20 | 0.55 | NA |
| RF | 71.31 | 54.59 | 0.63 | 0.38 |
| GRF | 67.29 | 50.31 | 0.76 | 0.43 |
NA: Not applicable.
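For readers unfamiliar with the error metrics in the table, the following sketch shows how RMSE and MAE are conventionally computed from observed and predicted values. The function names and the four example death rates are hypothetical and are not taken from the study.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(observed, predicted):
    """Mean absolute error: mean of the absolute residuals."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)

# Hypothetical county death rates per 100,000 (not from the paper).
observed = [120.0, 80.0, 200.0, 50.0]
predicted = [110.0, 95.0, 170.0, 55.0]

print(round(rmse(observed, predicted), 2))  # 17.68
print(mae(observed, predicted))             # 15.0
```

RMSE penalises large residuals more heavily than MAE, so the two are usually reported together; lower values indicate a better fit for both.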
Fig. 1. RF variable importance. A higher increase (%) in mean squared error (%IncMSE) corresponds to higher importance.
Fig. 2. Partial dependence plots for the top 20 most important risk factors.
Fig. 3. Average local importance per variable in the GRF model. A higher increase (%) in mean squared error (%IncMSE) corresponds to higher importance.
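The %IncMSE measure in the captions above rests on a permutation idea: shuffle one variable, re-predict, and see how much the mean squared error grows. The sketch below illustrates that idea in simplified form. It is an assumption-laden illustration, not the computation used in the study: random forest implementations typically evaluate this per tree on out-of-bag samples and may scale the result, which is not reproduced here. The function names, the toy model, and the data are all hypothetical.

```python
import random

def mse(model, rows, targets):
    """Mean squared error of `model` over the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def pct_increase_in_mse(model, rows, targets, feature, n_repeats=20, seed=0):
    """Average % increase in MSE after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)  # must be non-zero
    increases = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        permuted = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
        increases.append(mse(model, permuted, targets) - baseline)
    return 100.0 * (sum(increases) / n_repeats) / baseline

# Toy data: the target depends on feature 0 only; feature 1 is irrelevant.
rows = [[float(i), float(i % 3)] for i in range(1, 9)]
targets = [3.0 * r[0] + (0.5 if i % 2 == 0 else -0.5) for i, r in enumerate(rows)]

def model(row):
    """Hypothetical fitted model that uses feature 0 and ignores feature 1."""
    return 3.0 * row[0]

importance = [pct_increase_in_mse(model, rows, targets, f) for f in (0, 1)]
```

In this toy setup the irrelevant feature gets an importance of exactly zero, while shuffling the feature the model relies on inflates the error, which mirrors how the figures should be read.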
Share of counties by primary factor (the factor with the highest local importance).
| Local primary factors | Share of counties (%) |
|---|---|
| Lack of health insurance (%) | 34.86 |
| No leisure-time physical activity (%) | 19.86 |
| Aged over 80 years (%) | 12.25 |
| No vehicles (in occupied housing units) (%) | 8.81 |
| Smokers (%) | 8.57 |
| Heart disease mortality rate | 4.47 |
| African American (%) | 4.40 |
| Households' median annual income | 3.21 |
| Other risk factors | 3.57 |
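A table of this kind can be derived from per-county importance scores by taking, for each county, the factor with the highest score and then tallying the results. The sketch below assumes a hypothetical input layout (a dictionary of counties, each mapping factor names to local importance values) and uses made-up numbers. It shows the tallying logic only, not the authors' code.

```python
from collections import Counter

def primary_factor_shares(local_importance):
    """Return {factor: % of counties where that factor ranks first}."""
    primary = {
        county: max(scores, key=scores.get)
        for county, scores in local_importance.items()
    }
    counts = Counter(primary.values())
    total = len(primary)
    return {factor: 100.0 * n / total for factor, n in counts.most_common()}

# Hypothetical local importance scores for four counties.
local_importance = {
    "county_a": {"no_insurance": 12.0, "no_leisure_pa": 7.5, "age_80_plus": 3.1},
    "county_b": {"no_insurance": 9.4, "no_leisure_pa": 4.0, "age_80_plus": 8.8},
    "county_c": {"no_insurance": 2.2, "no_leisure_pa": 6.9, "age_80_plus": 5.0},
    "county_d": {"no_insurance": 1.0, "no_leisure_pa": 3.3, "age_80_plus": 7.7},
}

shares = primary_factor_shares(local_importance)
```

By construction the shares sum to 100%, as the percentages in the table do.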
Fig. 4. Spatial distribution of importance of key factors.
Fig. 5. Primary local factor per county.
Fig. 6. (a) Spatial distribution of the standardized local residuals. (b) Local Moran's I of standardized residuals.
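Local Moran's I, referenced in the Fig. 6 caption, measures whether a location's value resembles those of its neighbours. A common form is I_i = (z_i / m2) * sum_j w_ij * z_j, where z_i is the deviation from the mean, m2 is the mean of the squared deviations, and w_ij are row-standardized spatial weights. The sketch below implements that form on a hypothetical four-location chain. The weighting scheme and the data are assumptions, since the record does not state which weights the authors used.

```python
def local_morans_i(values, weights):
    """Local Moran's I per location.

    values  : list of observations (e.g. standardized residuals)
    weights : weights[i] is a dict {j: w_ij}, assumed row-standardized
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # mean squared deviation
    result = []
    for i in range(n):
        spatial_lag = sum(w * z[j] for j, w in weights[i].items())
        result.append(z[i] / m2 * spatial_lag)
    return result

# Hypothetical chain of four locations: 0-1, 1-2, 2-3 are neighbours.
values = [1.0, 1.0, -1.0, -1.0]
weights = [
    {1: 1.0},
    {0: 0.5, 2: 0.5},
    {1: 0.5, 3: 0.5},
    {2: 1.0},
]

local_i = local_morans_i(values, weights)
```

Positive values indicate a location surrounded by similar values (clustering), while values near zero indicate no local pattern. Applied to residuals, strong clustering would suggest spatial structure the model has not captured.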