Abstract
Social restrictions, such as social distancing and self-isolation, imposed owing to the coronavirus disease 2019 (COVID-19) pandemic have decreased the demand for commodities and manufactured products. However, the factors influencing sales in commercial districts in the pre- and post-COVID-19 periods are not yet fully understood. Thus, this study uses machine learning techniques to identify how the important geographical factors affecting sales in commercial alleys changed between the two periods. In the post-COVID-19 period, the number of pharmacies, age groups of the working population, average monthly income, and number of families living in apartments priced higher than $600k in the catchment areas had relatively high importance in the prediction of a high level of sales. Moreover, the percentage of deciduous forests appeared to be an important factor in the post-COVID-19 period. As the average monthly income, the worker population in their 60s, and the numbers of pharmacies and banks increased after the pandemic, sales in commercial alleys also increased. The survival of commercial alleys has become a critical social problem in the post-COVID-19 era; therefore, this study is meaningful in that it suggests a policy direction that could contribute to the revitalization of commercial alley sales in the future and boost the local economy.
Keywords: Extreme gradient boosting; Feature importance; Geographic information system (GIS); Random forest; Shapley additive explanations (SHAP)
Year: 2022 PMID: 36158091 PMCID: PMC9484863 DOI: 10.1016/j.heliyon.2022.e10708
Source DB: PubMed Journal: Heliyon ISSN: 2405-8440
Figure 1. Commercial alleys (green color) in Seoul.
Figure 2. An example of a catchment area (pink color) around a commercial alley (green color).
Details of the response variable and influential factors.
| Category | Name | Description | Notes |
|---|---|---|---|
| Response | Sales | Level of total sales of all businesses in each commercial alley. | Seoul commercial alley data |
| Economy | Household related to the area of the apartment | Number of families living in apartments smaller than 66 m² or larger than 66, 99, 132, or 165 m². | Seoul commercial alley data |
| | Household related to the price of the apartment | Number of families living in apartments priced under $100,000 or over $100,000, $200,000, $300,000, $400,000, $500,000, or $600,000. | Seoul commercial alley data |
| | Income (currency: KRW) | Average monthly income. | Seoul commercial alley data |
| Magnet | Facility | Number of facilities including all facilities, public facilities, banks, hospitals, clinics, pharmacies, kindergartens, elementary schools, middle schools, high schools, colleges, department stores, supermarkets, theaters, accommodations, airports, railway stations, bus terminals, subway stations, and bus stops. | Seoul commercial alley data |
| | Green space | Percentage of deciduous forest, coniferous forest, mixed forest, natural grass, and artificial grass. | Land cover data |
| Population structure | Dynamic population | Total, male, female, 10s, 20s, 30s, 40s, 50s, or over 60, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday. | Seoul commercial alley data |
| | Resident population | Total, male, female, 10s, 20s, 30s, 40s, 50s, or over 60. | Seoul commercial alley data |
| | Worker population | Total, male, female, 10s, 20s, 30s, 40s, 50s, or over 60. | Seoul commercial alley data |
| Connectivity | Cluster | Two clusters based on distance. | K-means |
| Time | Quarter | Quarter of a year. | Seoul commercial alley data |
Factor names and their descriptions.
| Factor name | Description |
|---|---|
| A_a_# | Number of families living in apartments smaller than 66 m² or larger than 66, 99, 132, or 165 m². |
| A_p_# | Number of families living in apartments priced under $100,000 or over $100,000, $200,000, $300,000, $400,000, $500,000, or $600,000. |
| IC | Average monthly income. |
| F_t, F_pf, F_ba, F_hos, F_clin, F_pha, F_kin, F_ele, F_mid, F_high, F_col, F_ds, F_supm, F_thea, F_acco, F_air, F_railsta, F_buster, F_substa, F_busstp | Total number of all facilities, public facilities, banks, hospitals, clinics, pharmacies, kindergartens, elementary schools, middle schools, high schools, colleges, department stores, supermarkets, theaters, accommodations, airports, railway stations, bus terminals, subway stations, and bus stops, respectively. |
| LC_df, LC_cf, LC_mf, LC_ng, LC_ag | Percentages of deciduous forest, coniferous forest, mixed forest, natural grass, and artificial grass, respectively. |
| DP_t, DP_m, DP_f, DP_#, DP_M, DP_Tu, DP_W, DP_Th, DP_F, DP_Sa, DP_Su | Dynamic population total, male, female, 10s, 20s, 30s, 40s, 50s, and over 60, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday, respectively. |
| RP_t, RP_m, RP_f, RP_# | Resident population total, male, female, 10s, 20s, 30s, 40s, 50s, and over 60, respectively. |
| WP_t, WP_m, WP_f, WP_# | Worker population total, male, female, 10s, 20s, 30s, 40s, 50s, and over 60, respectively. |
| Cluster | Cluster. |
| Qs | Quarter. |
Figure 3. Silhouette plot to determine the optimal number of clusters via k-means.
Figure 4. Result of k-means cluster analysis of commercial alleys with the optimal number of clusters; x and y coordinates are relative coordinates, which represent the locations of commercial alleys and clusters.
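As a rough illustration of how a silhouette criterion can favor one cluster count over another, the sketch below computes the mean silhouette coefficient for two candidate labelings of the same points. The coordinates and the hand-assigned labels are hypothetical toy values, not the study's alley locations, and the labelings are not produced by k-means here; only the scoring step is shown.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient for labelled 2-D points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so dividing by len - 1 is correct)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical relative coordinates: two tight, well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]

# Candidate labelings (assumed, hand-assigned): k = 2 and an over-split k = 3.
candidates = {
    2: [0, 0, 0, 0, 1, 1, 1, 1],
    3: [0, 0, 2, 2, 1, 1, 1, 1],
}

scores = {k: silhouette_score(points, labs) for k, labs in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

On this toy layout the two-cluster labeling scores higher because splitting a compact group lowers the silhouette of its members; the same comparison across candidate values of k is what a silhouette plot summarizes.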
Hyperparameter ranges and optimized hyperparameter combinations.
| | Model | Values |
|---|---|---|
| Hyperparameter ranges used for tuning | RF | |
| | XGB | |
| Optimized hyperparameter combinations | RF | |
| | XGB | |
Figure 5. Example of a single decision tree constructed via RF.
Performances of the RF and XGB models for the two datasets: 2018–2019 and 2020–2021.
| Metric | RF (2018–2019) | RF (2020–2021) | XGB (2018–2019) | XGB (2020–2021) |
|---|---|---|---|---|
| Accuracy (%) | 89.34 | 91.29 | 89.89 | 91.58 |
| Precision (%) | 89.46 | 91.34 | 89.97 | 91.61 |
| Recall (%) | 89.34 | 91.29 | 89.89 | 91.58 |
| F1 (%) | 89.32 | 91.29 | 89.88 | 91.57 |
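In the table above, recall equals accuracy in every column. That pattern is what support-weighted averaging over classes produces, since weighted recall reduces algebraically to overall accuracy. The source does not state which averaging was used, so the sketch below is only an assumption-based illustration; the labels and predictions are hypothetical toy values.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for cls, n_cls in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        predicted = sum(p == cls for p in y_pred)
        p_cls = tp / predicted if predicted else 0.0
        r_cls = tp / n_cls
        f_cls = 2 * p_cls * r_cls / (p_cls + r_cls) if (p_cls + r_cls) else 0.0
        weight = n_cls / n  # class weight = share of true samples
        precision += weight * p_cls
        recall += weight * r_cls
        f1 += weight * f_cls
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical sales-level labels, for illustration only.
y_true = ["low", "low", "low", "mid", "mid", "high", "high", "high", "high", "high"]
y_pred = ["low", "low", "mid", "mid", "high", "high", "high", "high", "high", "low"]

metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

Weighted recall is the sum over classes of (n_c / N) * (TP_c / n_c), which simplifies to the total number of correct predictions divided by N, that is, accuracy. Precision and F1 do not simplify this way, which matches their slightly different values in the table.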
Figure 6. Prediction accuracies with errors of RF and XGB after 2- to 10-fold cross-validations.
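The following is a minimal sketch of how accuracy and its spread can be collected for 2- to 10-fold cross-validation. A simple nearest-centroid classifier on one synthetic feature stands in for RF and XGB so that the example stays self-contained; the data, the classifier, and all function names are assumptions made for illustration and do not reproduce the study's setup.

```python
import random
import statistics

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_centroids(xs, ys):
    """Stand-in model: the mean feature value of each class."""
    sums = {}
    for x, y in zip(xs, ys):
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}

def predict(centroids, x):
    """Assign the class whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))

def cross_validate(xs, ys, k, seed=0):
    """Return (mean accuracy, standard deviation) over k folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit_centroids([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        hits = sum(predict(model, xs[i]) == ys[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return statistics.mean(accuracies), statistics.stdev(accuracies)

# Synthetic one-feature data: two overlapping classes (hypothetical).
rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(20)] + [rng.gauss(3.0, 1.0) for _ in range(20)]
ys = ["low"] * 20 + ["high"] * 20

results = {k: cross_validate(xs, ys, k) for k in range(2, 11)}
for k, (mean_acc, std_acc) in results.items():
    print(f"{k:2d}-fold: accuracy = {mean_acc:.3f} +/- {std_acc:.3f}")
```

Each sample is held out exactly once per value of k, so the mean and standard deviation per k correspond to the kind of accuracy-with-error summary the figure describes.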
Figure 7. Importance of influential factors in the XGB model.
Figure 8. Impact of important influential factors on the prediction of a high level of sales in the period before COVID-19.
Figure 9. Impact of important influential factors on the prediction of a high level of sales in the period after COVID-19.
Figure 10. Dependence of influential factors on the prediction of a high level of sales in the pre-COVID-19 period.
Figure 11. Dependence of influential factors on the prediction of a high level of sales in the post-COVID-19 period.
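The impact and dependence figures rest on Shapley additive explanations (SHAP), listed in the keywords. As a sketch of the underlying idea only, the code below computes exact Shapley values by brute force for a tiny hypothetical scoring function. It is not the algorithm the SHAP library uses for tree models, and the model, feature meanings, and input values are all assumptions.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features absent from a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical score with an interaction between features 0 and 2.
# The feature meanings (e.g. pharmacies, income, workers) are illustrative only.
def toy_model(z):
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[0] * z[2]

x = [3.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]

phi = shapley_values(toy_model, x, baseline)
print(phi)  # approximately [9, 2, 3]
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline (here 14), and the interaction term is split evenly between the two features involved. This additivity is what allows per-factor contributions to be read off impact and dependence plots.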