
Recreational and philanthropic sectors are the worst-hit US industries in the COVID-19 aftermath.

Satyaki Roy, Ronojoy Dutta, Preetam Ghosh.

Abstract

Lockdown measures to curb the spread of COVID-19 have brought the world economy to the brink of a recession. It is imperative that nations formulate administrative policies based on the changing economic landscape. In this work, we apply a statistical approach, called topic modeling, to text documents of job loss notices from 26 US states to identify the specific states and industrial sectors affected economically by this ongoing public health crisis. Our analysis reveals that there is a considerable incongruity in job loss patterns between the pre- and during-COVID timelines in several states, and that the recreational and philanthropic sectors register high job losses. It further shows that the interplay among several possible socioeconomic factors leads to job losses in many sectors while also creating new job opportunities in other sectors such as public service, pharmaceuticals and media, making job loss trends a key indicator of the world economy. Finally, we compare the low income job loss rates against overall job losses due to COVID-19 in the US counties, and discuss the implications of press reports on reopening businesses and on the unemployed workforce being absorbed by other sectors.


Keywords: COVID-19; Economy; Job losses; Policymaking; Topic model

Year: 2020 PMID: 34173505 PMCID: PMC7723762 DOI: 10.1016/j.ssaho.2020.100098

Source DB: PubMed Journal: Soc Sci Humanit Open

Introduction

Human history is scarred by plague, influenza and Ebola outbreaks that have claimed millions of lives worldwide (Coronavirus: what have be, 2020). Infectious disease is a leading cause of human deaths and will continue to affect many more in the years to come (Walsh, 2020). COVID-19, the ongoing pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has led to 1.28 million deaths since its emergence in Wuhan, China in December 2019 (Coronavirus world map: wh, 2020). Despite the strictest lockdown measures (Kimball et al., 2020) to contain infection spread, efforts by frontline health workers and clinical trials of vaccines (Smith, 2020), there is no end in sight to this global health crisis. Epidemiologists, clinicians and computer scientists are applying their expertise to identify the factors behind contagion and economic downturn, and their implications (Adhikari et al., 2020). First, there are attempts to apply machine learning (ML) to build prediction models on epidemiological and clinical data. Given existing clinical data, prediction models (Wynants et al., 2020) and therapeutic approaches can help identify vulnerable groups (Alimadadi et al., 2020; Randhawa et al., 2020). Epidemiologists are trying to identify the spread dynamics of COVID-19. Holmdahl and Buckee (Holmdahl & Buckee, 2020) analyze the pros and cons of forecasting models that make predictions through curve fitting or mechanistic models, while supervised and unsupervised ML is helping trace the trends in infection dynamics (Wang et al., 2020). Khan et al. used regression tree analysis, cluster analysis and principal component analysis on Worldometer infection count data to gauge the variability and effect of testing in the prediction of confirmed cases (Khan et al., 2020). Roy and Ghosh perform regression analysis to identify pre-lockdown factors that affect the post-lockdown pandemic numbers (Roy & Ghosh, 2020). The other aspect of the ensuing lockdown is the looming financial crisis worldwide. 
Economists expect an unprecedented decline in industrial output and stock market performance, an increase in the price of goods (Khan et al., 2020), as well as a potential contraction in US GDP (Baker et al., 2020). While national governments try to offset the slump in commodity prices and to support households, firms, and financial markets by providing economic assistance to the affected groups (Gopinath, 2020), it is clear that such an arrangement is not sustainable for developing nations. It is imperative to study the implications of COVID-19 on the global fiscal ecosystem to design effective administrative policies.

Contributions. In this work, we study the industrial impacts of COVID-19 in 26 US states. We apply topic modeling, a widely used statistical and natural language processing approach for identifying latent topics in documents, to recognize the affected job sectors. We generate a repository of industry types (such as hospitality, health care, etc.) for each state as follows: we process an open-access dataset of layoff notices filed under the Worker Adjustment and Retraining Notification (WARN) Act (from July 1998 to present) and connect them with the industry types from a Kaggle database of 7 million companies. Our topic analysis reveals the following details about the US states and industry types: (1) Arkansas, Colorado, Connecticut, Georgia, Kentucky, North Carolina and Virginia are the states with the highest disparity in job loss patterns between the pre- and during-COVID timelines, and (2) the recreational and philanthropic sectors are the worst affected industries, while blue collar jobs, public service (including pharmaceuticals) and print media industries undergo fewer job losses in the wake of COVID-19; food distribution, politics, electronic media and research are unaffected. We also study the job loss data from the US counties to gauge the impact of COVID-19 on the low income job loss rates. 
Finally, we discuss the implications of media reports on businesses reopening during and after the pandemic as well as the unemployed workforce being absorbed by other sectors.

Materials and methods

Data preprocessing

We consider three datasets. (1) A dataset of 7 million companies from 237 countries, sourced from standard sources like LinkedIn (available at https://www.kaggle.com/peopledatalabssf/). It includes the following fields: company name, year founded, industry type, country, number of employees, etc. (2) Layoff notices filed under the Worker Adjustment and Retraining Notification (WARN) Act (U.S. Department of Labor), which helps ensure advance notice in cases of qualified plant closings and mass layoffs. For each of the 26 states (those for which all the fields could be parsed properly), there are the following fields: company name, layoff, total workforce, local area and WARN date. (3) Estimates by the Urban Institute data science team, based on data from the US Bureau of Labor Statistics, of the number of low income jobs (less than $40,000 salary) lost due to COVID-19 (https://datacatalog.urban.org/dataset/estimated-low-income-jobs-lost-covid-19). We make the datasets available at https://github.com/satunr/COVID-19/tree/master/JobDataset.

We process the Kaggle dataset to generate a hash table of the following format, H: company name → industry type. For each company-name entry in the WARN data of state $i$, we record the industry type (i.e., H(company name)) in a new document $d_i$. Thus, $d_i$ is a space-delimited text file of industry type tokens, processed further to filter out stopwords (such as prepositions, punctuation, articles, etc.). We then utilize the date field in the WARN data to break down $d_i$ into pre-COVID (up to December 31st, 2019) and during-COVID (January 1st, 2020 onwards) datasets named $d_i^{\mathrm{pre}}$ and $d_i^{\mathrm{dur}}$, respectively.
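
The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the record layouts, the field names (`name`, `industry`, `state`, `company`, `warn_date`), the `normalize` helper and the underscore-joined tokens are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

# Cut-off taken from the text: notices dated up to December 31st, 2019 are pre-COVID.
PRE_COVID_END = date(2019, 12, 31)


def normalize(name):
    """Hypothetical normalization so that company names from both sources match."""
    return " ".join(name.lower().split())


def build_documents(companies, warn_notices):
    """Return {state: {"all": [...], "pre": [...], "dur": [...]}} of industry tokens.

    companies    -- iterable of dicts with assumed keys "name" and "industry"
    warn_notices -- iterable of dicts with assumed keys "state", "company", "warn_date"
    """
    # Hash table H: company name -> industry type.
    H = {normalize(c["name"]): c["industry"] for c in companies}

    docs = defaultdict(lambda: {"all": [], "pre": [], "dur": []})
    for notice in warn_notices:
        industry = H.get(normalize(notice["company"]))
        if industry is None:
            continue  # company not found in the lookup table; skip the notice
        # Keep a multi-word industry type as a single token (an assumption).
        token = industry.lower().replace(" ", "_")
        period = "pre" if notice["warn_date"] <= PRE_COVID_END else "dur"
        docs[notice["state"]]["all"].append(token)
        docs[notice["state"]][period].append(token)
    return dict(docs)


# Made-up records, for illustration only.
companies = [
    {"name": "Acme Hotels", "industry": "hospitality"},
    {"name": "Best Mart", "industry": "retail"},
]
warn_notices = [
    {"state": "VA", "company": "ACME  Hotels", "warn_date": date(2020, 4, 2)},
    {"state": "VA", "company": "Best Mart", "warn_date": date(2019, 6, 1)},
    {"state": "VA", "company": "Unknown Co", "warn_date": date(2020, 5, 1)},
]
docs = build_documents(companies, warn_notices)
print(docs["VA"])
```

Exact-match lookup after light normalization is the simplest choice; real company names may need fuzzier matching, which this sketch does not attempt.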

Topic modeling

A document can be represented as a distribution of latent topics, while a topic is a distribution over a vocabulary set V.

Latent semantic analysis

Latent semantic analysis captures the association between documents and words using a matrix $M$, where $M_{d,w}$ is the importance of word $w$ in document $d$, measured as term frequency-inverse document frequency (tf-idf), given by:

$$M_{d,w} = n_{w,d} \times \log \frac{N}{N_w}$$

Here $n_{w,d}$ is the number of occurrences of $w$ in $d$, $N_w$ is the number of documents containing $w$, and $N$ is the total number of documents. In other words, the tf-idf score is high when $w$ occurs in few documents but has a high presence in the current document $d$ (Ramos et al.).
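
As a small illustration of the tf-idf weighting, the sketch below builds the rows of such a matrix for a toy corpus. It assumes raw counts for the term frequency and a natural-log inverse document frequency, which is one common variant; the function name `tfidf_matrix` and the example documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(documents):
    """Return one {word: tf-idf score} dict per document (a sparse row of M)."""
    n_docs = len(documents)
    # Document frequency: number of documents containing each word.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    rows = []
    for doc in documents:
        term_freq = Counter(doc)  # occurrences of each word in this document
        rows.append({w: tf * math.log(n_docs / doc_freq[w]) for w, tf in term_freq.items()})
    return rows


documents = [
    ["hospitality", "hospitality", "retail"],
    ["retail", "finance"],
    ["retail", "mining"],
]
M = tfidf_matrix(documents)
print(M[0])
```

In this toy corpus "retail" appears in every document, so its score is zero, while "hospitality" is frequent in the first document and absent elsewhere, so it scores highest there.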

Latent Dirichlet allocation

Latent Dirichlet allocation (LDA) is a variant of latent semantic analysis that uses a Bayesian probabilistic model (Hoffman et al., 2010). Given that the distribution of topics in a document $d$ is drawn from a Dirichlet distribution, $\theta_d \sim \mathrm{Dir}(\alpha)$, and the distribution of words in a topic $k$ is drawn from a Dirichlet distribution, $\phi_k \sim \mathrm{Dir}(\beta)$, LDA models the generation of each word of $d$ in an iterative, two-step generative approach: (1) choose a topic $k$ from $K$ based on a multinomial distribution $\mathrm{Mult}(\theta_d)$; (2) choose a word $w$ from $V$ based on a multinomial distribution $\mathrm{Mult}(\phi_k)$.

Learning the topic and word distributions. One approach to learn $\theta$ and $\phi$ is to minimize the KL divergence between the approximate and true posterior. Alternatively, the Markov Chain Monte Carlo approach uses a Gibbs sampling method that repeatedly samples the topic assignment of each word conditioned on the data and all other topic assignments (Darling, 2011). The probability that a word $w$ belongs to topic $k$ in document $d$ is given by:

$$P(z = k \mid w, d) \propto \frac{n_{d,k} + \alpha}{n_d + |K|\alpha} \times \frac{n_{k,w} + \beta}{n_k + |V|\beta}$$

In the above equation, the first term represents the proportion of words in document $d$ that are assigned to topic $k$, as the ratio between the number of words assigned to $k$ ($n_{d,k}$) and the total number of words in $d$ ($n_d$); the second term is the proportion of assignments to topic $k$ across all documents that come from $w$, calculated as the ratio between the number of times $w$ is assigned to topic $k$ ($n_{k,w}$) and the total number of word assignments to $k$ ($n_k$). The Dirichlet parameters $\alpha$ and $\beta$ act as smoothing terms on these counts.
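
To make the Gibbs sampling step concrete, here is a minimal collapsed Gibbs sampler in plain Python. It is a sketch under stated assumptions (symmetric priors `alpha` and `beta`, a fixed number of sweeps, the hypothetical function name `gibbs_lda`), not the implementation used in this work.

```python
import random


def gibbs_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of token lists.

    Returns (doc_topic, topic_word) count tables after the final sweep.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]       # words of doc d assigned to topic k
    topic_word = [[0] * V for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                     # total assignments to topic k
    assignments = []

    # Random initial topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for pos, w in enumerate(doc):
                wid = word_id[w]
                k_old = assignments[d][pos]
                # Remove the current assignment from the counts.
                doc_topic[d][k_old] -= 1
                topic_word[k_old][wid] -= 1
                topic_total[k_old] -= 1

                # Unnormalized probability of each topic given all other assignments.
                # The document-side denominator is identical for every k, so it is omitted.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta) / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                k_new = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assignments[d][pos] = k_new
                doc_topic[d][k_new] += 1
                topic_word[k_new][wid] += 1
                topic_total[k_new] += 1

    return doc_topic, topic_word


docs = [["hospitality", "retail", "restaurants"], ["finance", "aviation", "retail"]]
doc_topic, topic_word = gibbs_lda(docs, num_topics=2)
print(doc_topic)
```

Normalizing the rows of `doc_topic` and `topic_word` (after adding `alpha` and `beta`) gives point estimates of the per-document topic distributions and per-topic word distributions.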

Similarity

A document $d$ is a vector of topics and a topic $k$ is a vector of words. The following metrics are used to gauge the inter-document or inter-topic distance.

Hellinger distance. Given two distributions $p$ and $q$, it is calculated as

$$H(p, q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i} \left( \sqrt{p_i} - \sqrt{q_i} \right)^2}$$

It always takes a value between 0 and 1 (Hellinger, 1909).

Kullback-Leibler divergence. Given two distributions $p$ and $q$, it is calculated as

$$D_{KL}(p \,\|\, q) = \sum_{i} p_i \log \frac{p_i}{q_i}$$

(Kullback & Leibler, 1951).
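
Both distances are short to compute for discrete distributions. The sketch below uses only the standard library; the function names and the example vectors are illustrative, and the natural logarithm is assumed for the KL divergence.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (lists of probabilities)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2.0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) using the natural logarithm.

    Terms with p_i = 0 contribute 0; q_i must be positive wherever p_i is positive.
    """
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


p = [0.7, 0.2, 0.1]
q = [0.1, 0.2, 0.7]
print(hellinger(p, p), hellinger(p, q), kl_divergence(p, q))
```

Identical distributions give a distance of 0 under both measures, and distributions with disjoint support give a Hellinger distance of 1. The KL divergence is unbounded and not symmetric in general, which is why it can exceed 1.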

Implementation

We implement topic modeling with the Python Gensim library (Rehurek & Sojka, 2011), which is used for document indexing and similarity retrieval in text documents. Given a set of states $L$, we have a set of three documents per state $i$: $d_i$, $d_i^{\mathrm{pre}}$ and $d_i^{\mathrm{dur}}$. Gensim processes the corpora and represents each document as a list of industry type tokens (in a manner similar to the approach described in Sec. 2.2.1). It then applies LDA (see Sec. 2.2.2) to identify the latent topics in the documents of $L$ and the distribution of words in each topic. We use the Hellinger distance and KL divergence (see Sec. 2.2.3) routines to gauge the inter-document distance between the pre- and during-COVID documents $d_i^{\mathrm{pre}}$ and $d_i^{\mathrm{dur}}$.

One sample proportion Z-test

It is a standard hypothesis testing approach. Given the number of trials and the number of successful trials, one can test a null hypothesis ($H_0$) that the proportion (i.e., fraction of successful trials) in the data equals a prespecified value.
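
A minimal sketch of the test follows. It uses the usual statistic $Z = (\hat{p} - p_0) / \sqrt{p_0 (1 - p_0) / n}$ and assumes a one-sided alternative (proportion greater than $p_0$); the function name and the example counts are hypothetical and are not taken from the study's data.

```python
import math


def one_sample_proportion_ztest(successes, trials, p0):
    """Return (z, one-sided p value) for H0: proportion == p0 vs H1: proportion > p0."""
    p_hat = successes / trials
    standard_error = math.sqrt(p0 * (1.0 - p0) / trials)
    z = (p_hat - p0) / standard_error
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value


# Hypothetical counts, for illustration only.
z, p_value = one_sample_proportion_ztest(successes=780, trials=1000, p0=0.70)
print(f"Z = {z:.2f}, p = {p_value:.2e}")
```

The standard error is computed under the null value `p0`, which is the conventional choice for this test; a two-sided version would double the tail probability of `abs(z)`.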

Results

The results are classified into the following two subsections: (a) distribution of industry types (or words) in topics and (b) identification of the most affected US states and industry types.

Distribution of words in topics

For our experiments we consider $|K| = 10$ topics. Each topic $k$ is a distribution of words belonging to the vocabulary set $V$. Thus, $k$ can be represented as a distribution $(\omega_{k,1}, \ldots, \omega_{k,|V|})$, where $\omega_{k,i}$ is the weight of word $w_i$ in $k$ such that $\sum_{i} \omega_{k,i} = 1$. Below are the top 5 words in each topic in decreasing order of weight.

Topic 0. hospitality, health care, retail, oil energy, restaurants
Topic 1. hospitality, health care, retail, finance, food and beverages
Topic 2. retail, food production, oil energy, finance, outsourcing
Topic 3. retail, hospitality, automotive, health care, mining
Topic 4. hospitality, health care, retail, oil energy, automotive
Topic 5. retail, hospitality, health care, restaurants, IT
Topic 6. hospitality, retail, restaurants, advertising, health care
Topic 7. retail, health care, hospitality, finance, automotive
Topic 8. hospitality, restaurants, retail, food and beverage, health care
Topic 9. finance, aviation and aerospace, food and beverages, retail, IT

For each topic $k$, we estimate the total number of statistically significant words, defined as words $w_i$ whose weight $\omega_{k,i}$ is greater than a predefined cut-off. Fig. 1a shows that, given a cut-off equal to 0.005, the statistically significant words are evenly distributed across topics, showing very little deviation from the mean (represented by a black dotted line).

Fig. 1

Distribution of words in topics. (a) The statistically significant words (with cut-off 0.005) are evenly distributed across topics; (b) mean pairwise Hellinger and KL distance scores for words in each topic.

Similar to topics, a word $w$ too can be represented as a vector comprising the weights it has in each topic, i.e., $w = (\omega_{1,w}, \ldots, \omega_{|K|,w})$, where $\omega_{k,w}$ is the weight of $w$ in topic $k$. We calculate the mean pairwise Hellinger and KL distance (see Sec. 2.2.3) between the statistically significant words in each topic. Fig. 1b shows that the Hellinger distance (equal to 1 - similarity) scores range between 0.15 and 0.2, suggesting that the words in each topic are quite similar. KL divergence (shown as a black line), which does not necessarily lie between 0 and 1, correlates with and corroborates the Hellinger distance scores.

Identification of significant states and industry types

A document $d_i$ (representing the job losses for a state $i$) is broken down into pre- and during-COVID timelines $d_i^{\mathrm{pre}}$ and $d_i^{\mathrm{dur}}$ (as stated in Sec. 2.1). We record the Hellinger and KL distances between $d_i^{\mathrm{pre}}$ and $d_i^{\mathrm{dur}}$ for each state $i$, where, once again, $d_i^{\mathrm{pre}}$ and $d_i^{\mathrm{dur}}$ can each be written as a vector of topic weights. Fig. 2a shows that Arkansas, Colorado, Connecticut, Georgia, Kentucky, North Carolina and Virginia (marked with red bars) exhibit a Hellinger distance higher than the cut-off of 0.9. These states have the highest disparity in job loss patterns between the pre- and during-COVID timelines.

Fig. 2

Identification of significant states and industry types. (a) Hellinger and KL distances between the pre- and during-COVID timelines for each state; (b) mean weight of topics across states in pre- and during-COVID timelines.

In order to pinpoint the topics showing the highest variation between the pre- and during-COVID timelines, we plot the mean topic distribution of all the states. Fig. 2b shows that the mean scores for topics 3 and 9 have increased and decreased, respectively, in the during-COVID timeline. Both these topics have retail as a common word (see Sec. 3.1), suggesting the following: one or more of the industry types (1) hospitality, automotive, health care and mining have suffered high job losses, and (2) finance, aviation and aerospace, food and beverages, and IT have recorded fewer layoffs.

Significant industry types. We dig deeper to recognize the specific sectors that show an increase or decrease in job loss trends in the during-COVID world. To achieve this, we calculate the cumulative contribution of each word towards the pre- and during-COVID mean scores (reported in Fig. 2b). For a word $w_i$, its pre-COVID score (and analogously during-COVID score) is $S_i^{\mathrm{pre}} = \sum_{k \in K} \mu_k^{\mathrm{pre}} \, \omega_{k,i}$, where $\mu_k^{\mathrm{pre}}$ is the mean pre-COVID weight of topic $k$ in Fig. 2b and $\omega_{k,i}$ is the weight of $w_i$ in topic $k$. The final score for $w_i$ is $S_i = S_i^{\mathrm{dur}} / S_i^{\mathrm{pre}}$. The interpretation of $S_i$ is as follows.

$S_i \gg 1$: proportion of job losses for industry type $w_i$ increased in the during-COVID timeline.
$S_i \approx 1$: proportion of job losses for industry type $w_i$ remained roughly the same.
$S_i \ll 1$: proportion of job losses for industry type $w_i$ decreased in the during-COVID timeline.

Our analysis reveals that (1) the recreational and philanthropic sectors show a spike in job losses (i.e., $S_i \gg 1$). In increasing order, religious institutions, recreational facilities and services, museums, arts and crafts, luxury goods and jewelry, computer gaming, hospitality, non-profit organizations, performing arts and gambling have the highest $S_i$; (2) the food distribution, politics, electronic media and research sectors are largely unaffected (i.e., $S_i \approx 1$), as supermarkets, scientific research, political organizations, broadcast media and marketing match the pre-COVID weights; and (3) the blue collar jobs, public service and print media sectors exhibit a drop in job losses (i.e., $S_i \ll 1$). In increasing order, market research, telecommunication, pharmaceuticals, aviation, food and beverage production, newspapers, textiles, consumer goods and printing show the lowest $S_i$.
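
The per-word score can be sketched in a few lines. The example assumes that the final score is the ratio of the during-COVID score to the pre-COVID score, which is inferred from the interpretation of values above and below 1; the function name, the two-topic weights and the mean topic weights are made up for illustration.

```python
def job_loss_ratio(topic_word_weights, mean_pre, mean_dur):
    """Return {word: S} where S = during-COVID score / pre-COVID score.

    topic_word_weights -- one {word: weight} dict per topic
    mean_pre, mean_dur -- mean topic weights across states, one value per topic
    """
    words = {w for topic in topic_word_weights for w in topic}
    scores = {}
    for w in words:
        # Cumulative contribution of the word, weighted by the mean topic weights.
        pre = sum(m * topic.get(w, 0.0) for m, topic in zip(mean_pre, topic_word_weights))
        dur = sum(m * topic.get(w, 0.0) for m, topic in zip(mean_dur, topic_word_weights))
        if pre > 0:  # the ratio is undefined when the pre-COVID score is zero
            scores[w] = dur / pre
    return scores


# Hypothetical two-topic example; the numbers are made up for illustration.
topic_word_weights = [
    {"hospitality": 0.6, "retail": 0.4},
    {"finance": 0.7, "retail": 0.3},
]
scores = job_loss_ratio(topic_word_weights, mean_pre=[0.5, 0.5], mean_dur=[0.8, 0.2])
for word, s in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{word}: S = {s:.2f}")
```

In this toy case the first topic gains weight during COVID, so a word that appears only in it ("hospitality") scores above 1, while a word that appears only in the shrinking topic ("finance") scores below 1.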

Economic strata of unemployed workers

We utilize the job loss estimates due to COVID-19 in the US counties (see Sec. 2.1). Let the overall job loss rate and low income job loss rate of each US county $c$ be denoted by $r_c^{\mathrm{all}}$ and $r_c^{\mathrm{low}}$. We create a vector $X = (x_1, \ldots, x_n)$, where $x_c$ is an indicator variable set to 1 if $r_c^{\mathrm{all}} > r_c^{\mathrm{low}}$, and 0 otherwise. We then apply the one-sample proportion Z-test (see Sec. 2.2.5) based on the proportion of counties where the overall job loss rates exceed low income job loss rates, i.e., $\hat{p} = \frac{1}{n} \sum_{c} x_c$. We perform the one-sample Z-test with the following null hypothesis: 70% of US counties have higher overall job loss rates than low income job loss rates. We estimate the Z-scores for hypothesized proportions of 0.60, 0.65, 0.70 and 0.75. Fig. 3 illustrates that the Z-scores and p values follow decreasing trends, and that the test is statistically significant (i.e., p value = $4.7 \times 10^{-9}$, Z-score = 5.7) for proportions up to 0.7 (or 70%). Thus, there is statistical evidence to reject the null hypothesis and state that the overall job loss rates are higher than low income job loss rates in over 70% of the US counties, i.e., that higher income jobs are more adversely affected by COVID-19.

Fig. 3

Z-scores and p values (in black) for the proportion of US counties where the overall job loss rate exceeds the low income job loss rate (<$40,000 annual salary).

Absorption into other sectors and reopening

There is little evidence of large-scale absorption (i.e., rehiring) of unemployed workers from one sector into another, or even into the same sector. There is no available dataset that lists such migrations of workers to new sectors. It is worth mentioning that several artificial intelligence-based firms are attempting to minimize human contact by replacing human labor with digitization. This suggests that unemployed workers may be “reskilled” in technology and inducted into the warehousing, manufacturing, and retail sectors in the near future (The Hill Vilas Dhar, 2020). As far as reopening is concerned, media reports suggest that most of the affected sectors have either stalled plans for reopening or are struggling to find consumers in the COVID “new normal” era. The U.S. Army Corps of Engineers, Jacksonville District, FL announced a reopening of Corps-managed recreational areas starting October 9, 2020, while ensuring necessary precautionary measures (U.S. ARMY CORPS OF ENGINEERS JACKSONVILLE DISTRICT, 2020). The administration of the City of San Jose, California decided to reopen parks and recreational neighborhoods on October 23, 2020 (City of San Jose, 2020). Museums across the US were originally contemplating reopening by May 2020 (Elassar, 2020), but at present most of them are looking at an indefinite shutdown and a few run the risk of closing forever (Vankin, 2020). Major jewelry outlets in the US have reopened but are finding it difficult to garner sales (Doiron, 2020), and this trend is projected to continue for a long time (CNBC, 2020). The lodging real estate investment trusts of the USA continue to under-perform as people remain wary of going out (Krishnan et al., 2020). Although several hospitality businesses have resumed operation, they are struggling to find customers (Loh et al., 2020). Also, the performing arts sectors are considering pushing back reopening to next year (Cooper, 2020; Limbong, 2020).

Discussions

In this work we study the effect of COVID-19 on the American economy with respect to job losses. We apply a statistical approach, called topic modeling, to a combination of exhaustive datasets from the Worker Adjustment and Retraining Notification (WARN) Act and a repository of 7 million companies to identify the worst affected US states and job sectors. While Arkansas, Colorado, Connecticut, Georgia, Kentucky, North Carolina and Virginia show a high incongruity in job loss patterns between the pre- and during-COVID timelines, the recreational and philanthropic sectors record the highest job losses. At present, the majority of the affected sectors are contemplating extended periods of shutdown or finding fewer consumers despite reopening. A few observations made in the course of this study stand out. First, it is crucial to understand that the economic landscape will change immensely in the during-COVID world, i.e., some industries will take a hit, while others will get a boost. Second, our findings suggest that some industries may in fact continue to stay relevant due to the interplay of several socioeconomic factors. For instance, contrary to expectation, aviation and retail (with scores S = 0.77 and 0.85, respectively) show a lower proportion of job losses in the during-COVID timeline. Third, our analysis of the job loss datasets suggests that higher income jobs are more adversely affected than their low income counterparts. This could be because some low income jobs, such as those in grocery stores, maintenance, food joints, security, etc., must stay operational despite the lockdown. This study can have wide-ranging implications for public policymaking to bolster the economy and for government subsidization of endangered sectors.

CRediT authorship contribution statement

Satyaki Roy: Conceptualization, Data curation, Formal analysis, Methodology, Software, Validation, Writing - original draft. Ronojoy Dutta: Data curation, Formal analysis, Methodology, Software. Preetam Ghosh: Conceptualization, Formal analysis, Methodology, Writing - review & editing.

Declaration of competing interest

We have no conflict of interest.

References

1. Wrong but Useful - What Covid-19 Epidemiologic Models Can and Cannot Tell Us.

Authors: Inga Holmdahl; Caroline Buckee
Journal: N Engl J Med Date: 2020-05-15 Impact factor: 91.245

2. Prediction of epidemic trends in COVID-19 with logistic model and machine learning technics.

Authors: Peipei Wang; Xinqi Zheng; Jiayang Li; Bangren Zhu
Journal: Chaos Solitons Fractals Date: 2020-07-01 Impact factor: 9.922

3. Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study.

Authors: Gurjit S Randhawa; Maximillian P M Soltysiak; Hadi El Roz; Camila P E de Souza; Kathleen A Hill; Lila Kari
Journal: PLoS One Date: 2020-04-24 Impact factor: 3.240

4. Artificial intelligence and machine learning to fight COVID-19.

Authors: Ahmad Alimadadi; Sachin Aryal; Ishan Manandhar; Patricia B Munroe; Bina Joe; Xi Cheng
Journal: Physiol Genomics Date: 2020-03-27 Impact factor: 3.107

5. Epidemiology, causes, clinical manifestation and diagnosis, prevention and control of coronavirus disease (COVID-19) during the early outbreak period: a scoping review.

Authors: Sasmita Poudel Adhikari; Sha Meng; Yu-Ju Wu; Yu-Ping Mao; Rui-Xue Ye; Qing-Zhi Wang; Chang Sun; Sean Sylvia; Scott Rozelle; Hein Raat; Huan Zhou
Journal: Infect Dis Poverty Date: 2020-03-17 Impact factor: 4.520

6. Factors affecting COVID-19 infected and death rates inform lockdown-related policymaking.

Authors: Satyaki Roy; Preetam Ghosh
Journal: PLoS One Date: 2020-10-23 Impact factor: 3.240

7. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal.

Authors: Laure Wynants; Ben Van Calster; Gary S Collins; Richard D Riley; Georg Heinze; Ewoud Schuit; Marc M J Bonten; Darren L Dahly; Johanna A A Damen; Thomas P A Debray; Valentijn M T de Jong; Maarten De Vos; Paul Dhiman; Maria C Haller; Michael O Harhay; Liesbet Henckaerts; Pauline Heus; Michael Kammer; Nina Kreuzberger; Anna Lohmann; Kim Luijken; Jie Ma; Glen P Martin; David J McLernon; Constanza L Andaur Navarro; Johannes B Reitsma; Jamie C Sergeant; Chunhu Shi; Nicole Skoetz; Luc J M Smits; Kym I E Snell; Matthew Sperrin; René Spijker; Ewout W Steyerberg; Toshihiko Takada; Ioanna Tzoulaki; Sander M J van Kuijk; Bas van Bussel; Iwan C C van der Horst; Florien S van Royen; Jan Y Verbakel; Christine Wallisch; Jack Wilkinson; Robert Wolff; Lotty Hooft; Karel G M Moons; Maarten van Smeden
Journal: BMJ Date: 2020-04-07