
Aggravated social segregation during the COVID-19 pandemic: Evidence from crowdsourced mobility data in twelve most populated U.S. metropolitan areas.

Xiao Li¹, Xiao Huang², Dongying Li³, Yang Xu⁴.

Abstract

The notion of social segregation refers to the degree of separation between socially different population groups. Many studies have examined spatial and residential separations among different socioeconomic or racial populations. However, with the advancement of transportation and communication technologies, people's activities and social interactions are no longer limited to their residential areas. Therefore, there is a growing necessity to investigate social segregation from a mobility perspective by analyzing people's mobility patterns. Taking advantage of crowdsourced mobility data derived from 45 million mobile devices, we quantify social segregation for the twelve most populated U.S. metropolitan statistical areas (MSAs). We analyze the mobility patterns between different communities within each MSA to assess their separations over two years. We also explore how social segregation changed under the COVID-19 pandemic. The results demonstrate that New York and Washington D.C. are the most and least segregated MSAs, respectively, among the twelve MSAs. Since the COVID-19 pandemic began, six of the twelve MSAs experienced a statistically significant increase in segregation. This study also shows that, within each MSA, the most and least vulnerable groups of communities are prone to interacting with similar communities, indicating a higher degree of social segregation.


Keywords: COVID-19; Mobility homophily; Smartphone data; Social segregation; Social vulnerability

Year: 2022 PMID: 35371911 PMCID: PMC8964479 DOI: 10.1016/j.scs.2022.103869

Source DB: PubMed Journal: Sustain Cities Soc ISSN: 2210-6707 Impact factor: 10.696

Introduction

The legal battle against segregation is won, but the community battle goes on. –Dorothy Day

Segregation is a long-standing social phenomenon, broadly defined as the degree of spatial separation/isolation between two or more population groups, limiting their contacts, communications, and social relations (Freeman, 1978; Newby, 1982). Because early modern cities were characterized by the separation of different social, ethnic, and racial groups, this segregated social structure was preserved and still exists in today's society (Shlay & Balzarini, 2015; Wong, 2016). Studies have demonstrated that segregation directly reflects and exacerbates social inequalities, which not only harms socially vulnerable populations but also produces negative impacts on society as a whole (Acs, Pendall, Treskon, & Khare, 2017; Yao et al., 2019). Therefore, effectively measuring and better understanding the nature of social segregation is of great importance for urban planning and policymaking (Buck et al., 2021; Johnston et al., 2014).

Segregation studies: from place-based to mobility-based

A considerable number of place-based studies have been carried out to evaluate the population mix and potential interactions within geographic units, known as racial or ethnic segregation (Echenique & Fryer, 2007; Wang et al., 2018), or to examine the regional differences in housing and living environments experienced by different population groups, known as residential segregation (Jeon & Jung, 2019; Jiang et al., 2021; Massey, 1990; Moya-Gómez et al., 2021; Musterd et al., 2017). However, there are several obvious limitations associated with place-based studies (Wang et al., 2018; Yao et al., 2019). For example, these studies treat different regions as isolated geographic units, ignoring their interconnections. Meanwhile, most studies focus mainly on residential space and fail to take into account the places people visit and spend time in across the day. It is worth noting that with the advancement of transportation and communication technologies, people's activities and social interactions are no longer limited to their residential areas (Graif et al., 2017; Small, 2006). More and more social interactions take place at varying locations (a.k.a. third places) across the city (Park & Kwan, 2017). Therefore, there is an increasing need to rethink and evaluate social segregation from a mobility perspective by considering people's non-residential activities and the homophily of mobility patterns in an interconnected and mobile society. With the advancement of mobile sensing technologies, smartphones have become a game-changing data acquisition platform. Various emerging data sources can be collected from mobile phones (e.g., social media, activity-tracking apps, cellular signals), which can effectively capture fine-grained activity and mobility patterns from a huge number of users at little or no extra cost (Li et al., 2019; Li & Goldberg, 2018; Macias et al., 2013).
Fueled by these individual-level mobility data sources, recent studies attempt to re-assess social segregation by incorporating individuals' activities and mobility patterns into their analytical frameworks (Candipan et al., 2021; Liu, 2021; Xu et al., 2019; Yip et al., 2016). Four types of individual-level mobility data sources have been intensively utilized in existing studies: mobility/activity surveys (Farber et al., 2015; Park & Kwan, 2018), social media (Heine et al., 2021; Wang et al., 2018), call detail records (CDRs) (Amini et al., 2014; Xu et al., 2019), and activity-tracking mobile apps (Yip et al., 2016).

COVID-19 induced mobility change

The ongoing (at the time of writing) COVID-19 pandemic has brought suffering to all populations, especially in socially vulnerable communities. In the face of this unprecedented public health crisis, public policies at the state and local levels diverged in type, in the timing and speed of adoption and application, and in the stringency of the measures adopted (Warner & Zhang, 2021), which may further impact the behavioral patterns of various subgroups. Studies have demonstrated that the pandemic has produced unequal impacts on different population groups' mobility and daily activities (Glaeser et al., 2020; Huang, Lu, et al., 2021). In the past two years, considerable efforts have been devoted to understanding changes in urban mobility during COVID-19. Many studies discovered a dramatic decrease in mobility during the pandemic (Gao et al., 2020), especially when lockdowns were implemented (Xiong et al., 2020). There was a larger reduction in long-distance travel (Schlosser et al., 2020). In many U.S. cities, the pandemic also affected the routines of pedestrians, with a general decrease in utilitarian walking but an increase in recreational walking (Hunter et al., 2021). A notable finding from many studies is that the pandemic's impact on mobility is heterogeneous across income groups (Hong et al., 2021). Larger reductions in mobility were observed among wealthier populations (Hernando et al., 2020; Heroy et al., 2021; Weill et al., 2020). These studies reveal critical changes in collective travel behavior and structural changes in urban mobility. Despite these fruitful outcomes, little effort has been devoted to understanding the impact of mobility changes on socioeconomic segregation. Since mobility changes are jointly affected by many factors, such as travel frequency, distance, and behavioral heterogeneity across social groups, their impact on socioeconomic segregation is not directly observable. It would be meaningful to explicitly quantify how mobility changes during the pandemic directly affect the interactions and exposure of various socioeconomic groups.

Examining the impacts of COVID-19 on social segregation: solution and new contributions

This study aims to address a meaningful but still unanswered research question: whether and to what extent did the COVID-19 pandemic aggravate social segregation in the twelve most populated U.S. metropolitan statistical areas (MSAs)? We evaluate the change in social segregation before and during the COVID-19 pandemic within these twelve MSAs based on massive-volume individual-level mobility records. Compared to existing research efforts, this study makes the following new contributions. First, we utilize a fine-grained mobility dataset collected from 45 million phone users to comprehensively evaluate the social segregation of twelve U.S. MSAs at a monthly interval over two years. Second, this study incorporates a compound index, the social vulnerability index proposed by the Centers for Disease Control and Prevention (CDC SVI), into the segregation assessment. Different from existing efforts focusing on racial or economic segregation, this study primarily examines how socially vulnerable communities connect with or separate from other communities by analyzing mobility flows. Third, to the best of the authors' knowledge, this study marks the first attempt to comprehensively assess the influence of the COVID-19 pandemic on social segregation at two geographic scales (MSA level and census tract level).

Study areas and data sources

Study areas

This study selected the twelve most populated metropolitan statistical areas (MSAs) in the United States as the study areas (Fig. 1). Each MSA refers to a statistical region comprising one or more adjacent counties with at least one central city defined by a built-up area with a population of more than 50,000. All communities within the same MSA are closely linked and integrated socially and economically, making the MSA an ideal geographic unit for studying social segregation.

Fig. 1

Distribution of the twelve most populated MSAs in the United States.

Table 1 lists the rank, population estimate in 2020, and the percent population change (2010-2020) of the selected MSAs. As reported by the U.S. Census Bureau, residents within these twelve MSAs make up approximately 30% of the U.S. population. Most of the selected MSAs, except Chicago, experienced an apparent population increase from 2010 to 2020, as shown in Table 1. Therefore, it is more meaningful to reveal and quantify the social segregation within these densely populated regions.

Table 1

The ranks and population of the twelve most populated MSAs in the U.S.

Rank	MSA	2020 population estimate	Population change % (2010–2020)
1	New York-Newark-Jersey City (New York)	19,124,359	+1.2%
2	Los Angeles-Long Beach-Anaheim (Los Angeles)	13,109,903	+2.19%
3	Chicago-Naperville-Elgin (Chicago)	9,406,638	-0.58%
4	Dallas-Fort Worth-Arlington (Dallas)	7,694,138	+20.85%
5	Houston-The Woodlands-Sugar Land (Houston)	7,154,478	+20.84%
6	Washington-Arlington-Alexandria (Washington D.C.)	6,324,629	+11.95%
7	Miami-Fort Lauderdale-Pompano Beach (Miami)	6,173,008	+10.93%
8	Philadelphia-Camden-Wilmington (Philadelphia)	6,107,906	+2.39%
9	Atlanta-Sandy Springs-Alpharetta (Atlanta)	6,087,762	+15.15%
10	Phoenix-Mesa-Chandler (Phoenix)	5,059,909	+20.68%
11	Boston-Cambridge-Newton (Boston)	4,878,211	+7.16%
12	San Francisco-Oakland-Berkeley (San Francisco)	4,696,902	+8.34%


Mobility data

The mobility data used in this study are derived from SafeGraph (https://www.safegraph.com/), a commercial company that aggregates anonymized location data from various digital device applications to provide insights on visitation of physical places. Specifically, the data we used are openly available human movement records from SafeGraph's Social Distancing Metrics (SafeGraph, 2020), a Census Block Group (CBG) level mobility data product that covers the entire conterminous U.S. from January 1, 2019, to April 16, 2021, at a daily temporal granularity. Mobility records from the Social Distancing Metrics are collected using a panel of GPS points from around 45 million anonymous mobile devices (around 10% of mobile devices in the U.S.). Such a high penetration ratio makes it an ideal data source to summarize human spatial interactions in the U.S., thus benefiting our understanding of the hidden intra-urban, intra-rural, and urban-rural movement patterns. The home locations of device users are first determined (to a Geohash-7 granularity) using the common nighttime location of each device over a six-week period, and users' daily movement patterns at the CBG level are further reported (Li et al., 2021; Li, Huang, et al., 2021; SafeGraph, 2020). That is to say, the OD matrix extracted from this dataset measures daily movement patterns with the home location as the origin.

The Centers for Disease Control and Prevention's Social Vulnerability Index (CDC SVI)

The CDC SVI, maintained by the Agency for Toxic Substances and Disease Registry (ATSDR), has been extensively utilized to help researchers, health officials, and emergency response planners identify socially vulnerable communities and better respond to hazardous events. The CDC SVI comprises fifteen carefully selected social factors, grouped under four themes: socioeconomic status, household composition & disability, minority status & language, and housing type & transportation, which are fully presented in Appendix A. Each census tract receives distinct scores and rankings for individual variables and themes, as well as an overall score and ranking that considers all selected social factors. This study used the composite SVI calculated from the scores of all themes (SPL_THEMES) in the latest CDC SVI (CDC SVI 2018) to quantify the social vulnerability of census tracts within our study areas. We obtained the SVI data from the CDC/ATSDR SVI Data and Documentation Download page (https://www.atsdr.cdc.gov/placeandhealth/svi/data_documentation_download.html).

Methods

Fig. 2 illustrates the workflow of this study. First, we obtained two-year mobility records from SafeGraph for the twelve most populated MSAs in the United States, aggregated to the census tract level. Next, we combined the aggregated mobility records with CDC SVI to calculate the social segregation index for each MSA (Global SSI) and the census tracts within each MSA (Local SSI). Then, we implemented the optimized hot spot analysis (OHSA) to examine the spatial distribution of highly segregated census tracts based on the Local SSI before and during the pandemic. Last, we examined the difference in social segregation before and during the pandemic and across socially different population groups through three statistical tests based on Global and Local SSI.

Fig. 2

Methodology flowchart.

Mobility data preprocessing

To protect users' privacy, SafeGraph excludes CBG information if fewer than five devices visit an establishment in a month from a given CBG (SafeGraph, 2020). Other privacy protection measures are also implemented, such as the introduction of Laplacian noise. However, these measures do not greatly affect the quality of SafeGraph mobility records. In general, the mobile device owners sampled by SafeGraph correlate highly with the Census population in various demographic and socioeconomic settings (Squire, 2019). Although SafeGraph reported a 10% penetration ratio for its dataset, this representativeness is not geographically consistent. More details regarding the representativeness of SafeGraph samples in the selected MSAs can be found in Huang et al. (2021). To match the census tract level CDC SVI data, we further aggregated the SafeGraph mobility data from the CBG level to the census tract level. This re-aggregation also helps mitigate the low sampling issues that occur at the CBG level. After re-aggregation, a considerable number of mobility records have their origins and destinations within the same census tract. In this study, these records were removed to eliminate the influence of intra-tract connections, as suggested in Xu et al. (2019), and to highlight the segregation based on mobility patterns between different census tracts. We also removed mobility records whose origins or destinations fall outside the MSA to eliminate the influence of excessively long external travel.
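The re-aggregation and filtering steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `aggregate_to_tracts`, the dictionary layout of the flows, and the toy GEOIDs are all assumptions. It relies only on the fact that a 12-digit CBG GEOID begins with the 11-digit GEOID of its census tract.

```python
from collections import Counter

def aggregate_to_tracts(cbg_flows, msa_tracts):
    """Sum CBG-level OD counts to tract level (hypothetical helper).

    cbg_flows:  {(origin_cbg_geoid, dest_cbg_geoid): count}
    msa_tracts: set of 11-digit tract GEOIDs belonging to one MSA
    """
    tract_flows = Counter()
    for (origin_cbg, dest_cbg), count in cbg_flows.items():
        # The first 11 digits of a CBG GEOID identify its census tract.
        origin, dest = origin_cbg[:11], dest_cbg[:11]
        if origin == dest:
            continue  # drop intra-tract movements
        if origin not in msa_tracts or dest not in msa_tracts:
            continue  # drop flows with an end outside the MSA
        tract_flows[(origin, dest)] += count
    return tract_flows

# Toy input (made-up counts, for illustration only)
cbg_flows = {
    ("360610001001", "360610002001"): 5,
    ("360610001002", "360610002002"): 3,
    ("360610001001", "360610001002"): 7,  # same tract, dropped
    ("360610001001", "340130001001"): 2,  # destination outside the MSA, dropped
}
msa_tracts = {"36061000100", "36061000200"}
tract_flows = aggregate_to_tracts(cbg_flows, msa_tracts)
```

In this toy input the two cross-tract records collapse into a single tract-level OD pair, while the intra-tract and out-of-MSA records are discarded.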

Social segregation index (SSI) calculation

In this study, we adapted the method proposed by Xu et al. (2019) to quantify social segregation. This method was originally designed to quantify the social segregation level of individuals, which was modified in this study to assess the segregation of MSAs at the census tract level. The principle of the method is that if a census tract primarily interacted with its “similar” census tracts (a.k.a. mobility homophily), this census tract would be identified as socially segregated. In this study, the similarity of census tracts was quantified by their social vulnerability (SV) levels. The interactions of census tracts are represented by the mobility activities recorded from each pair of census tracts. Based on the SV similarity and crowdsourced mobility records, two social segregation indices (SSIs) were calculated, including a Global SSI for each MSA and a Local SSI for each census tract within the MSA.

SV distance and SV similarity

This study utilized the SPL_THEMES derived from CDC SVI to represent the SV levels of each census tract. For a collection of census tracts within the same MSA, we first ranked these tracts based on their SPL_THEMES values, resulting in a sequence $R = \{r_1, r_2, \ldots, r_n\}$, where $r_i$ represents the rank of the $i$th tract and $n$ represents the total number of tracts within the MSA. To calculate the SV distance $d_{i \to j}$ from the tract $i$ to the tract $j$, we first calculated the absolute difference of their SV ranks, $|r_i - r_j|$, and compared it with the absolute differences between the tract $i$ and the rest of the tracts within the same MSA, $|r_i - r_k|$, where $k \neq i$. Next, we identified and generated a set of tracts $C_{i \to j}$ that are closer in SV ranks to the tract $i$ than the tract $j$ is, and a set of tracts $E_{i \to j}$ that are at the same distance to tract $i$ as the tract $j$. Then, we calculated the SV distance using Eq. (1):

$$d_{i \to j} = \frac{|C_{i \to j}| + |E_{i \to j}| / 2}{n} \tag{1}$$

where $|C_{i \to j}|$ and $|E_{i \to j}|$ represent the cardinality of the sets $C_{i \to j}$ and $E_{i \to j}$. The SV distance $d_{i \to j}$ represents the number of census tracts that are closer to tract $i$ than tract $j$ is, normalized by the total number of census tracts within an MSA. The value of the SV distance is in the range of 0 to 1. The higher the number, the longer the SV distance. Based on the SV distance, we can calculate the SV similarity between tracts $i$ and $j$ using Eq. (2):

$$s_{i \to j} = 1 - d_{i \to j} \tag{2}$$

The value of $s_{i \to j}$ is also in the range of 0 to 1, with a higher value indicating a higher SV similarity.
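A small sketch may help make the rank-based distance concrete. It assumes one reading of Eq. (1): tracts tied with tract j count half, tract j itself belongs to the tie set, and the sum is normalized by the number of tracts. It also assumes that similarity is the complement of distance. The function names and the five-tract ranking are hypothetical.

```python
def sv_distance(ranks, i, j):
    """SV distance from tract i to tract j (assumed reading of Eq. 1)."""
    n = len(ranks)
    d_ij = abs(ranks[i] - ranks[j])
    others = [k for k in range(n) if k != i]
    # Tracts strictly closer in rank to tract i than tract j is
    closer = sum(1 for k in others if abs(ranks[i] - ranks[k]) < d_ij)
    # Tracts at exactly the same rank distance as tract j (includes j itself)
    tied = sum(1 for k in others if abs(ranks[i] - ranks[k]) == d_ij)
    return (closer + tied / 2) / n

def sv_similarity(ranks, i, j):
    """SV similarity, assumed here to be 1 minus the SV distance."""
    return 1.0 - sv_distance(ranks, i, j)

# Five hypothetical tracts already ranked by SPL_THEMES
ranks = [1, 2, 3, 4, 5]
d_near = sv_distance(ranks, 0, 1)   # adjacent ranks
d_far = sv_distance(ranks, 0, 4)    # opposite ends of the ranking
d_fwd = sv_distance(ranks, 0, 2)    # from an extreme tract to the middle one
d_back = sv_distance(ranks, 2, 0)   # from the middle tract to the extreme one
```

Under these assumptions the measure is not symmetric: the middle tract has more neighbors within a given rank difference than an extreme tract does, so `d_back` is larger than `d_fwd`.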

Global SSI

After we calculated the SV similarity for each pair of census tracts, we combined it with the crowdsourced mobility data to generate a Global SSI for quantifying the social segregation of each MSA. This study defines the Global SSI as the weighted average of SV similarity for all mobility activities recorded from different pairs of origin and destination (OD) within the same MSA. For the crowdsourced mobility records, we first grouped them based on the combination of their OD. This process generated a set of distinct OD pairs for mobility records within the same MSA, $P = \{p_1, p_2, \ldots, p_m\}$, where $m$ denotes the number of distinct OD pairs within an MSA, and $p_k$ denotes the $k$-th OD pair. We also counted the number of records for each OD pair, $W = \{w_1, w_2, \ldots, w_m\}$, where $w_k$ denotes the number of mobility records with the same OD pair $p_k$. Since the OD for each mobility record is mapped to the census tract level as introduced above, we can directly calculate the SV similarity for each OD pair using Eqs. (1) and (2). Then, the Global SSI can be expressed as Eq. (3):

$$\mathrm{SSI}_{\mathrm{Global}} = \frac{\sum_{k=1}^{m} w_k \, s_k}{\sum_{k=1}^{m} w_k} \tag{3}$$

where $s_k$ denotes the SV similarity for the $k$-th OD pair.
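The weighted average described above can be written in a few lines. The sketch below assumes the record counts and SV similarities are already available as dictionaries keyed by (origin, destination); the function name `global_ssi` and the toy values are hypothetical.

```python
def global_ssi(pair_counts, similarity):
    """Count-weighted average of SV similarity over all distinct OD pairs."""
    total_records = sum(pair_counts.values())
    weighted_sum = sum(count * similarity[pair]
                       for pair, count in pair_counts.items())
    return weighted_sum / total_records

# Toy example: most trips from tract "a" go to a similar tract "b"
pair_counts = {("a", "b"): 8, ("a", "c"): 2}
similarity = {("a", "b"): 0.9, ("a", "c"): 0.4}
msa_ssi = global_ssi(pair_counts, similarity)  # (8*0.9 + 2*0.4) / 10
```

Because the weights are record counts, heavily travelled OD pairs dominate the index, which is what lets it reflect where people actually go rather than which tract pairs merely exist.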

Local SSI

This study also calculated a Local SSI to quantify the social segregation for each census tract. Similar to the Global SSI, we calculated the weighted average of SV similarity for mobility records with the same origin (census tract) as the Local SSI for that tract. Therefore, we first generated a subset of distinct OD pairs with a specified census tract $i$ as the origin, $P_i \subseteq P$. Meanwhile, we also counted the number of mobility activities recorded for each distinct OD pair within $P_i$, denoted $W_i$. Then, the Local SSI for the census tract $i$ can be calculated using Eq. (4):

$$\mathrm{SSI}_{\mathrm{Local}}(i) = \frac{\sum_{p_k \in P_i} w_k \, s_k}{\sum_{p_k \in P_i} w_k} \tag{4}$$

As Xu et al. (2019) demonstrated, if a census tract interacts equally with other census tracts, its Local SSI should equal 0.5. A Local SSI closer to 1 means the census tract is more tightly related to census tracts at similar social vulnerability levels. A Local SSI closer to 0 indicates the census tract primarily interacts with census tracts at different social vulnerability levels.
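The per-tract version is the same weighted average, grouped by origin. A minimal sketch, assuming the same dictionary layout keyed by (origin, destination); the name `local_ssi` and the toy values are hypothetical.

```python
from collections import defaultdict

def local_ssi(pair_counts, similarity):
    """Count-weighted average SV similarity of outgoing flows, per origin tract."""
    weighted_sum = defaultdict(float)
    total_records = defaultdict(float)
    for (origin, dest), count in pair_counts.items():
        weighted_sum[origin] += count * similarity[(origin, dest)]
        total_records[origin] += count
    return {origin: weighted_sum[origin] / total_records[origin]
            for origin in total_records}

# Toy example with two origin tracts
pair_counts = {("a", "b"): 8, ("a", "c"): 2, ("b", "a"): 5}
similarity = {("a", "b"): 0.9, ("a", "c"): 0.4, ("b", "a"): 0.6}
tract_ssi = local_ssi(pair_counts, similarity)
```

Tracts that never appear as an origin get no entry in the result, which matches the removal of tracts without observed mobility data reported later in the paper.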

Optimized hot spot analysis

Based on the obtained Local SSI, this study implemented the optimized hot spot analysis (OHSA) to examine the spatial distribution of highly segregated census tracts—tracts with higher Local SSI values within each MSA. OHSA is an advanced spatial statistic method for identifying statistically significant hot spots—features with a high value surrounded by other high-value features. OHSA combines the Getis-Ord Gi* (Gi*) statistic with the incremental spatial autocorrelation, which could automatically determine the scale of analysis (e.g., searching distance) to generate the optimal hot spot results. Implementing OHSA contains three steps, including (1) initial data assessment, (2) scale of analysis, and (3) hot spot analysis (Lu et al., 2019; Peeters et al., 2015). The first step was the initial data assessment, which aimed to check if the dataset contains an adequate number of features for implementing OHSA. The second step was to identify the most appropriate scale of analysis—the searching distance for the Gi* statistic, which could be obtained by performing incremental spatial autocorrelation. After obtaining the optimal searching distance, the last step was to run the Gi* statistic to identify the significant spatial clusters of high values (hot spots) and low values (cold spots).
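The third step, the Gi* statistic itself, can be illustrated with a short sketch. It uses the standard Getis-Ord Gi* z-score with binary weights inside a fixed distance band and with each feature counted as its own neighbor. The fixed `distance` argument is an assumption: OHSA selects the search distance automatically through incremental spatial autocorrelation, which is not reproduced here. The function name, coordinates, and values are hypothetical.

```python
import math

def getis_ord_gi_star(values, coords, distance):
    """Gi* z-scores with binary fixed-distance weights (self included).

    Assumes values are not all identical and that no feature has every
    other feature inside its distance band (both give a zero denominator).
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z_scores = []
    for xi, yi in coords:
        weights = [1.0 if math.hypot(xi - xj, yi - yj) <= distance else 0.0
                   for xj, yj in coords]
        sum_w = sum(weights)
        sum_w2 = sum(w * w for w in weights)
        numerator = sum(w * v for w, v in zip(weights, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores

# Six hypothetical tracts on a line: three high Local SSI, three low
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
values = [0.8, 0.8, 0.8, 0.5, 0.5, 0.5]
z = getis_ord_gi_star(values, coords, distance=1.0)
```

Positive z-scores mark features surrounded by high values and negative z-scores mark features surrounded by low values. In this toy layout the tract in the middle of the high-value run gets a larger z-score than the one at the edge of the line, because more of its neighbors are also high.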

Segregation change examination

This study utilized three statistical tests, the paired sample t-test, Analysis of Variance (ANOVA), and Tukey's range test, to test for statistically significant differences between two or more data groups. These tests were performed to explore the influence of COVID-19 on social segregation and to assess the variation of segregation levels across different population groups.

Paired sample t-test

Paired sample t-test, also called the dependent samples t-test, is one of the most used statistical methods to examine whether the mean difference between two datasets is significant, which yields a p-value indicating the significance level of difference (Li et al., 2020; Mishra et al., 2019). This test is particularly suitable to measure the difference of the same objects at two different time points.
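As an illustration of the test applied to two time points, the sketch below computes the paired t-statistic from the per-object differences using only the standard library. The sign convention (before minus after, so an increase gives a negative t) and the toy values are assumptions. The p-value itself requires the t distribution, which is available for example through `scipy.stats.ttest_rel`, so the sketch compares against a tabulated critical value instead.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """t = mean(d) / (sd(d) / sqrt(n)) for paired differences d = before - after."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error

# Hypothetical index values for the same four objects at two time points
before = [0.60, 0.61, 0.62, 0.60]
after = [0.61, 0.63, 0.63, 0.62]
t_stat = paired_t_statistic(before, after)

# Two-tailed critical value at alpha = 0.05 with n - 1 = 3 degrees of freedom
T_CRITICAL_DF3 = 3.182
is_significant = abs(t_stat) > T_CRITICAL_DF3
```

With twelve monthly observations per period, the degrees of freedom would be 11 and the corresponding two-tailed critical value at the 0.05 level is 2.201.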

ANOVA test and Tukey's range test

The ANOVA test is commonly used to examine the differences between three or more data groups. As an omnibus test, a significant p-value from the ANOVA test indicates that at least one pair of data groups is statistically different. However, it cannot tell where those differences lie (e.g., which pairs of groups are different). To overcome this limitation, Tukey's range test is usually applied as a post hoc test for ANOVA to examine the existence of significant differences between each pair of groups (Mishra et al., 2019; Park & Kwan, 2018).
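The omnibus nature of ANOVA is visible in how its F statistic is built: it compares variation between group means with variation inside the groups, without reference to any particular pair. A minimal standard-library sketch follows, with a hypothetical function name and toy data. Obtaining the p-value requires the F distribution, and the pairwise post hoc step is not shown.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    all_values = [v for group in groups for v in group]
    n_total, n_groups = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(group) / len(group) for group in groups]
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, group_means))
    ss_within = sum((v - mean) ** 2
                    for group, mean in zip(groups, group_means)
                    for v in group)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    return ms_between / ms_within

# Three toy groups; the third is clearly shifted from the other two
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(groups)
```

A large F here signals that some group differs, but only a pairwise procedure such as Tukey's range test can say that it is the third group driving the result.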

Results

This study utilized two-year (from 2019-03 to 2021-02) mobility data to assess the social segregation for the twelve most populated U.S. MSAs. We calculated both the monthly Global and Local SSI for each MSA. According to the timeline of COVID-19 development (AJMC, 2021), WHO declared COVID-19 as a pandemic in March 2020. In the same month, The White House announced COVID-19 as a national emergency. Therefore, these monthly Global and Local SSI were further grouped into two study periods: Before COVID-19 (from 2019-03 to 2020-02) and During COVID-19 (from 2020-03 to 2021-02).

Global SSI before and during COVID-19

In this study, we first performed the paired sample t-test to determine whether the mean values of monthly Global SSI in the two study periods (Before COVID-19 and During COVID-19) are significantly different. The t-test results are summarized in Table 2, including the averaged monthly Global SSI over the two study periods, the t-statistic values, and the p-values for two-tailed tests. In general, the two-tailed test examines whether any difference (positive or negative) exists between two data groups; the one-tailed test examines the existence of a specific type of difference (positive or negative). This study utilized the two-tailed test results to examine whether a significant difference exists in the Global SSI between the Before COVID-19 and During COVID-19 periods. The results indicate that six of the twelve MSAs experienced a statistically significant change in social segregation between the two study periods: New York, Los Angeles, Phoenix, Dallas, Houston, and San Francisco.

Table 2

Paired sample t-test results between monthly Global SSI before and after COVID-19.

MSAs	Mean Global SSI (Before)	Mean Global SSI (After)	t-statistic	p-value (two-tailed)
New York	0.635	0.642	-9.592	0.000*
Los Angeles	0.618	0.621	-4.934	0.000*
Phoenix	0.618	0.620	-3.743	0.003*
Chicago	0.618	0.618	-0.365	0.722
Philadelphia	0.611	0.610	1.661	0.125
Dallas	0.604	0.606	-4.271	0.001*
Houston	0.603	0.605	-5.183	0.000*
Boston	0.604	0.604	0.837	0.421
Miami	0.595	0.596	-2.189	0.051
San Francisco	0.584	0.590	-7.594	0.000*
Atlanta	0.584	0.585	-1.073	0.305
Washington D.C.	0.568	0.569	-1.885	0.086

* represents statistically significant p-value (<0.05).

The boxplots in Fig. 3 show the distribution of the monthly Global SSI before and during COVID-19 for the examined MSAs. Each "box" spans the first and third quartiles of the data, with the middle line indicating the median. As shown in this figure, all the "boxes" show distributions of values above 0.5, suggesting that social segregation exists in all twelve MSAs to a mild or severe degree.

Fig. 3

Monthly Global SSI for the most populated MSAs before and during COVID-19.

Among these MSAs, the New York MSA shows the most severe segregation, implying that its census tracts interact more closely with tracts of similar social vulnerability. Washington D.C. remains the least segregated MSA before and during COVID-19. In terms of segregation severity, the rest of the MSAs are ranked as Los Angeles, Phoenix, Chicago, Philadelphia, Dallas, Houston, Boston, Miami, San Francisco, and Atlanta. By comparing the two periods' boxplots, we can clearly see that the COVID-19 pandemic significantly aggravated the segregation severity of New York and San Francisco. Their mean values of monthly Global SSI increased from 0.635 to 0.642 and from 0.584 to 0.590, respectively. Meanwhile, we also observed a noticeable increase in social segregation since COVID-19 began in Los Angeles, Phoenix, Dallas, and Houston. The rest of the MSAs maintained the same segregation severity before and during COVID-19. Only Boston and Philadelphia show slight decreases in their monthly Global SSI, which are not significant in our t-test.

Local SSI before and during COVID-19

In this study, we also calculated the monthly Local SSI to depict the spatial variation of segregation within each MSA at the census tract level. For each census tract, we first aggregated the monthly Local SSI into two groups, including Before COVID-19 and During COVID-19. Then we performed the paired sample t-test (two-tailed) on the aggregated (averaged) monthly Local SSI to examine the segregation difference at the census tract level before and during COVID-19. Table 3 shows the descriptive statistics and the t-test results of the aggregated monthly Local SSI for the twelve most populated U.S. MSAs at two study periods.

Table 3

Descriptive statistics of aggregated monthly Local SSI in the twelve most populated MSAs.

MSAs	Number of Census Tracts	Before COVID-19: Mean	Before COVID-19: Maximum	Before COVID-19: Minimum	Before COVID-19: SD	After COVID-19: Mean	After COVID-19: Maximum	After COVID-19: Minimum	After COVID-19: SD	Paired t-test: t-statistic	Paired t-test: p-value
New York	4462	0.626	0.838	0.203	0.084	0.633	0.861	0.166	0.090	-19.490	0.000*
Los Angeles	2893	0.615	0.828	0.291	0.075	0.617	0.849	0.286	0.078	-6.515	0.000*
Phoenix	982	0.613	0.767	0.309	0.067	0.615	0.791	0.300	0.070	-3.291	0.001*
Chicago	2202	0.616	0.777	0.298	0.075	0.619	0.794	0.301	0.077	-6.319	0.000*
Philadelphia	1460	0.607	0.813	0.339	0.079	0.609	0.822	0.271	0.083	-5.140	0.000*
Dallas	1309	0.600	0.778	0.308	0.077	0.602	0.783	0.286	0.078	-7.927	0.000*
Houston	1064	0.595	0.794	0.356	0.075	0.598	0.789	0.352	0.075	-8.487	0.000*
Boston	991	0.600	0.783	0.338	0.084	0.604	0.800	0.312	0.085	-5.424	0.000*
Miami	1196	0.593	0.783	0.374	0.072	0.594	0.806	0.343	0.074	-4.716	0.000*
San Francisco	972	0.577	0.769	0.302	0.078	0.584	0.779	0.300	0.081	-10.07	0.000*
Atlanta	946	0.587	0.792	0.301	0.079	0.592	0.801	0.306	0.082	-10.939	0.000*
Washington D.C.	1350	0.568	0.733	0.337	0.068	0.571	0.791	0.348	0.077	-5.081	0.000*

SD = standard deviation;

The Mean, Maximum, and Minimum were calculated based on the aggregated monthly Local SSI for all census tracts within each MSA at two different study periods

* represents statistically significant p-value (<0.05).

The results show that, since the COVID-19 pandemic began, the aggregated monthly Local SSI also experienced a statistically significant increase (p-value < 0.05) in all twelve MSAs (Table 3), with New York showing the most significant increase: the mean value of its aggregated monthly Local SSI rose from 0.626 to 0.633. Meanwhile, as shown in Table 3, the increased maximum value and standard deviation and the decreased minimum value of Local SSI in most MSAs also imply that COVID-19 produced varying degrees of influence on segregation across each MSA's census tracts. Please note that some census tracts with zero population estimates do not have a valid social vulnerability score, and some census tracts have no mobility data observed from them. These census tracts were therefore removed from this study.

Local SSI by social vulnerability levels (SV Levels)

Based on the calculated Local SSI, we also explored how social segregation varies along with the social vulnerability (SV) levels of census tracts. In this study, we first divided each MSA's census tracts into ten SV Levels (from 1 to 10) with equal intervals based on their CDC SVI values. SV Level 1 represents the least vulnerable group of census tracts, and SV Level 10 represents the most vulnerable group of census tracts. To examine the segregation differences experienced by census tracts with different SV Levels, we first calculated the average value of the monthly Local SSI for each census tract over the two-year period. Then, we performed the ANOVA test to assess whether the averaged values are significantly different at different SV Levels. In addition, we also applied Tukey's range test to examine the existence of significant differences between each pair of groups (census tract groups at different SV Levels) (Mishra et al., 2019; Park & Kwan, 2018). The ANOVA result shows that the averaged segregation values (averaged Local SSI) of tracts at different SV Levels are significantly different (p-value < 0.05) in all twelve MSAs. To further assess where these differences occur, we performed Tukey's range test for each MSA. Fig. 4 shows the pairwise comparisons between different SV Levels' census tract groups in each MSA. A red cell represents a significant difference (p-value < 0.05) and a grey cell represents a non-significant difference (p-value > 0.05). This figure shows that New York and Chicago have the greatest number of red cells among the twelve MSAs, indicating that their census tracts at different SV Levels experience significantly different social segregation. For other MSAs, low-vulnerability census tracts (SV Levels 1 to 2) and high-vulnerability census tracts (SV Levels 9 to 10) typically show significant differences from other census tracts in terms of segregation.
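The equal-interval grouping into ten SV Levels can be sketched as below. The text does not state the interval bounds, so the sketch assumes they run from the minimum to the maximum SVI value observed within the MSA. The function name and the example range of 0 to 10 are hypothetical.

```python
def sv_level(value, lowest, highest, n_levels=10):
    """Map an SVI value to an equal-interval level from 1 to n_levels.

    Assumes highest > lowest and lowest <= value <= highest; the interval
    bounds are an assumption, not taken from the paper.
    """
    width = (highest - lowest) / n_levels
    level = int((value - lowest) / width) + 1
    # The maximum value would otherwise land in a non-existent extra level
    return min(level, n_levels)

# Hypothetical MSA whose SVI values span 0.0 to 10.0
levels = [sv_level(v, 0.0, 10.0) for v in (0.0, 3.5, 9.99, 10.0)]
```

Equal-interval classes generally contain different numbers of tracts, unlike deciles, which matters when comparing group means across SV Levels.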

Fig. 4

Pairwise comparisons of averaged Local SSI between census tracts at different SV Levels (Tukey's range test).

Fig. 5 illustrates the mean value of monthly Local SSI for census tracts aggregated to different SV Levels. Consistent with the results of Tukey’s range test, a clear pattern emerges in all MSAs: the greatest segregation is consistently observed in the most or the least vulnerable census tract groups. This indicates that people living in the most (SV Level 10) or least (SV Level 1) vulnerable communities are more likely to commute to socially similar communities. For most MSAs, the closer the SV Level is to the extremes (Level 1 or Level 10), the greater the segregation it exhibits. However, in the Boston, San Francisco, and Washington D.C. MSAs, almost all of the more vulnerable groups (SV Levels 6 to 10) show greater segregation than the less vulnerable groups (SV Levels 1 to 5), which implies that vulnerable communities overall experienced higher segregation in these MSAs.

Fig. 5

The mean value of monthly Local SSI for census tracts at different SV Levels before and during COVID-19.

In addition, we found that the Local SSI of the more vulnerable groups has fluctuated noticeably since the COVID-19 pandemic began in nine of the twelve MSAs: New York, Los Angeles, Chicago, Philadelphia, Houston, Miami, San Francisco, Atlanta, and Washington D.C. This suggests that socially vulnerable communities were more strongly influenced by the pandemic, leading to an increase in social segregation.

Local SSI hot spots before and during COVID-19

This study also implemented the OHSA to examine the spatial clustering of segregated census tracts in each MSA. As introduced before, we divided the data into two study periods: Before and During COVID-19. We used the mean value of each census tract's Local SSI during each period as the examined variable to identify hot spots. We then compared the hot spots of the two periods to assess whether the spatial distribution of highly segregated census tracts changed due to the pandemic. Table 4 lists the number of hot spots identified in the selected MSAs during each study period. Since this study primarily focuses on highly segregated census tracts (tracts with higher Local SSI), we applied an additional criterion to the OHSA results to select hot spots. In this study, hot spots are defined as census tracts with p-value < 0.01, z-value > 2.58, and Local SSI > the mean value of census tracts within the MSA.
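The selection rule stated above can be written as a simple filter. The sketch below assumes that the hot spot analysis has already produced a z-value and p-value for each tract; the record layout, field names, and sample values are hypothetical and only illustrate the three stated criteria.

```python
from statistics import mean


def select_hot_spots(tracts, p_max=0.01, z_min=2.58):
    """Return the IDs of tracts that meet all three criteria stated in the text."""
    msa_mean = mean(t["local_ssi"] for t in tracts)
    return [
        t["tract_id"]
        for t in tracts
        if t["p_value"] < p_max          # statistically significant cluster
        and t["z_score"] > z_min         # cluster of high (not low) values
        and t["local_ssi"] > msa_mean    # tract itself is above the MSA mean
    ]


# Hypothetical per-tract results for one MSA.
tracts = [
    {"tract_id": "A", "local_ssi": 0.80, "z_score": 3.10, "p_value": 0.002},
    {"tract_id": "B", "local_ssi": 0.75, "z_score": 2.00, "p_value": 0.046},   # not significant
    {"tract_id": "C", "local_ssi": 0.40, "z_score": -3.00, "p_value": 0.003},  # cold spot
    {"tract_id": "D", "local_ssi": 0.55, "z_score": 2.70, "p_value": 0.007},   # below the MSA mean
]

hot_spots = select_hot_spots(tracts)
```

Only tract A passes all three tests here; tract D shows why the third criterion matters, since a tract can sit inside a significant cluster of high values while its own Local SSI is below the MSA mean.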

Table 4

Identified hot spots for high segregated census tracts before and during COVID-19.

MSA	Number of Census Tracts	Hot Spots Before COVID-19 (Count)	Hot Spots Before COVID-19 (Percent)	Hot Spots During COVID-19 (Count)	Hot Spots During COVID-19 (Percent)
New York	4462	528	11.8%	617	13.8%
Los Angeles	2893	581	20.1%	572	19.8%
Phoenix	982	188	19.1%	183	18.6%
Chicago	2202	430	19.5%	432	19.6%
Philadelphia	1460	291	19.9%	334	22.9%
Dallas	1309	201	15.4%	198	15.1%
Houston	1064	96	9.0%	96	9.0%
Boston	991	154	15.5%	141	14.2%
Miami	1196	274	22.9%	203	17.0%
San Francisco	972	159	16.4%	135	13.9%
Atlanta	946	70	7.4%	114	12.1%
Washington D.C.	1350	415	30.7%	390	28.9%

Hot spots are defined as census tracts with p-value < 0.01, z-value > 2.58, and Local SSI > the mean value of census tracts within the MSA.

Fig. 6 provides an example of the OHSA results, illustrating the hot spots identified in the Washington D.C. MSA before the pandemic. The figure shows that highly segregated census tracts were more densely clustered in D.C. and its neighboring counties and cities, including Montgomery County, Prince George's County, Arlington County, and Alexandria City. This illustrates the effectiveness of OHSA for identifying spatial clusters of high values.

Fig. 6

An example of OHSA results of the identified hot spots for Washington D.C. MSA in the before-COVID period.

Before COVID-19, Washington D.C., Miami, and Los Angeles were the three MSAs with the highest percentage of census tracts identified as hot spots, which implies that highly segregated census tracts tend to be spatially clustered in these MSAs, as shown in Table 4. During COVID-19, the top three MSAs in terms of the percentage of hot spots changed to Washington D.C., Philadelphia, and Los Angeles. The bottom three MSAs were Houston, Atlanta, and New York in both study periods, which means highly segregated census tracts are less clustered in these MSAs. By comparing the counts and percentages of hot spots identified in these MSAs before and during COVID-19, we noticed that the localized spatial autocorrelation structure of the Local SSI remains almost the same in all MSAs (in Table 4, the share of hot spots changes by less than 6 percentage points in every MSA), which implies that the distribution of highly segregated census tracts is spatially consistent before and during COVID-19.
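The comparison between the two periods can be reproduced directly from the counts in Table 4. The sketch below recomputes each MSA's hot spot share and the change in percentage points; the counts are copied from Table 4, while the variable and function names are illustrative.

```python
# (number of census tracts, hot spots before, hot spots during), copied from Table 4.
TABLE_4 = {
    "New York": (4462, 528, 617),
    "Los Angeles": (2893, 581, 572),
    "Phoenix": (982, 188, 183),
    "Chicago": (2202, 430, 432),
    "Philadelphia": (1460, 291, 334),
    "Dallas": (1309, 201, 198),
    "Houston": (1064, 96, 96),
    "Boston": (991, 154, 141),
    "Miami": (1196, 274, 203),
    "San Francisco": (972, 159, 135),
    "Atlanta": (946, 70, 114),
    "Washington D.C.": (1350, 415, 390),
}


def share(count, total):
    """Percentage of an MSA's census tracts identified as hot spots."""
    return 100.0 * count / total


# Change in hot spot share, in percentage points (during minus before).
changes = {
    msa: share(during, n) - share(before, n)
    for msa, (n, before, during) in TABLE_4.items()
}

largest_shift = max(changes, key=lambda msa: abs(changes[msa]))
```

With the Table 4 counts, the largest shift is in Miami (about -5.9 percentage points), the largest increase is in Atlanta (about +4.7 points), and Houston is unchanged.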

Discussion

Main findings and interpretations

This study examined the monthly variation of social segregation within the twelve most populated U.S. MSAs by calculating two social segregation indices, Global SSI (MSA level) and Local SSI (census tract level), based on high-volume mobility records. The results indicate that different MSAs showed different levels of social segregation, and half of them experienced a statistically significant increase in segregation since the COVID-19 pandemic began. Meanwhile, the degree of social segregation also varies across census tracts at different social vulnerability levels. The most and least vulnerable census tract groups generally experience significantly higher degrees of segregation than other tracts within each MSA. Although identifying the causes of these varying changes in segregation is beyond the scope of this study, we believe the phenomenon could potentially be explained by the urban homophily documented in various studies (Xu et al., 2019, 2022). Considerable evidence has revealed that the COVID-19 pandemic reduced long-distance travel, restricting citizens’ mobility to smaller locales (Fatmi et al., 2021). U.S. urban layouts are intrinsically segregated, with locales of similar sociodemographic settings geographically clustered (Fogli & Guerrieri, 2019). Thus, the reduction of long-distance travel is expected to further aggravate segregation levels, especially for the most and the least vulnerable groups, as they tend to present stronger geographic clustering patterns.

This study innovatively utilized a crowdsourced fine-grained mobility dataset collected from 45 million phone users to comprehensively examine social segregation based on mobility homophily. A major contribution of this study is the framing of mobility-based spatial segregation based on travel trajectories and social mixing. Residential locations and the racial/economic dissimilarity in the neighborhood have long been widely adopted to measure segregation. While residential segregation is a key component of the structural inequalities of society, it does not represent the other dimensions where discriminatory policies are institutionalized, such as workplaces, schools, social institutions, and even third places (Riley, 2018). Furthermore, because they rely solely on arbitrary static administrative boundaries, results on residential segregation are susceptible to the modifiable areal unit problem (Wong et al., 1999). Studies have repeatedly called for an expanded perspective of segregation beyond residential isolation alone (Riley, 2018; Tan et al., 2021). Specifically, a focus on residential context alone neglects that individuals are mobile and that their daily trajectories usually intersect various urban settings (Shareck et al., 2014).

Although this study primarily assesses social segregation dynamics during COVID-19 based on mobility homophily and the social vulnerability index, the results offer some insights that support studies on segregation through the lens of racial and residential isolation. For example, our results demonstrated that different census tracts within the same MSA may experience significantly different social segregation. The OHSA results also showed that spatial clusters of highly segregated census tracts exist to varying degrees in each MSA. Segregation has long been reported to vary in degree between and within cities (Wong et al., 1999). As segregation is associated with environmental inequalities and health burdens (Collins & Williams, 1999; Morello-Frosch & Lopez, 2006), elucidating the global and local patterns of spatial segregation may help identify populations at risk of environmental inequalities.

More importantly, results from this work help clarify whether the mobility-based segregation gap has widened or narrowed for socially vulnerable groups under the global pandemic. This study demonstrated that six of the twelve most populated MSAs in the U.S. (i.e., New York City, Los Angeles, Phoenix, Dallas, Houston, and San Francisco) experienced a statistically significant increase in social segregation. The differences in changes in city-level segregation after the onset of COVID-19 may be attributable to variations in COVID-19 prevalence and pandemic policy responses at the state, county, or city levels. California, Texas, and New York States witnessed consistently high confirmed cases during 2020 and early 2021. Compared to cities such as Miami and Atlanta, New York City and Los Angeles had stricter mask mandates in indoor places through most of the waves of the pandemic. The New York City, Los Angeles, and San Francisco governments were among the earliest to react and announce work-from-home orders and had stricter guidelines against COVID-19 (Warner & Zhang, 2021). For example, as one of the quickest to respond, Los Angeles County declared a state of emergency on March 4th, 2020, and the State of California implemented a stay-at-home order on March 19th (City of Los Angeles, 2022). While indoor dining remained closed for many months in New York City, in Atlanta businesses such as gyms were reopened and restaurants allowed in-person dining in late April (City of Atlanta, 2022). Another pattern worth noting is that cities that have experienced high residential income segregation in the past seem to show stronger aggravation in mobility-based segregation. For example, in 2010, Houston, Dallas, New York, and Los Angeles were the four most segregated metropolitan areas (Fry & Taylor, 2012); they also rose to the top in our results, suggesting that COVID-19 may have reinforced the existing patterns of residential segregation by limiting diversity and social mixing through travel.

The segregation of the more vulnerable groups has fluctuated noticeably since the COVID-19 pandemic began in nine of the twelve MSAs. Traditional literature has mostly referred to mobility as an asset: wealthier residents may be better equipped or socially situated to escape their neighborhood and enjoy amenities and social mixing in more advantaged areas, whereas more disadvantaged residents may be more bound to the neighborhood (Krivo et al., 2013; Wang et al., 2012). Social network research has revealed that close relatives are likely to live close to each other, with minority groups such as Black and Mexican Americans being more likely to live with their kin than White residents (Kim & McKenry, 1998). As such, disadvantaged groups may display more geographic homophily (Smith et al., 2014) and therefore weaker social networks that would support or necessitate travel across the city to other destinations during the pandemic. On the other hand, disadvantaged populations may be forced to travel for work and consequently be exposed to risks of infection under the pandemic (Dyer, 2020; Huang, Lu, et al., 2021; Kirby, 2020). These populations are more exposed to COVID-19 as they are more likely to be employed in essential service sectors and lack the privilege to work remotely (Clouston et al., 2021; Cyrus et al., 2020; Thomas et al., 2020). Given this hypothesis, wider activity space and less segregation may be observed in the vulnerable populations compared to the better-off groups during COVID-19. For example, Bassolas et al. (2021) coined a measure of diffusion segregation to characterize the spatial mix among racial groups based on residential and activity locations. By examining the relationships between diffusion segregation and COVID-19 incidence, they uncovered that segregation caused by mobility patterns showed stronger associations with COVID-19 incidence and mortality rates.

Policy implications

Our study calls policy attention to COVID-related stressors that may further limit transportation and destination options and reinforce existing segregation. Our findings show that half of the MSAs analyzed experienced significantly aggravated segregation. Prior studies examining patterns of post-pandemic transit demand also revealed an approximate 40% decline in transit commute trips relative to pre-pandemic times and suggested that such behavioral shifts may become the “new normal” (Salon et al., 2021). With such transitions, new urban planning and policies that can address these inequalities are warranted. Our findings also outline some potential complications regarding public health policy during the pandemic. As cities that had higher COVID-19 prevalence rates and more stringent infection control policies experienced increased segregation, it is essential to rethink the traditional values associated with spatial and social proximity, such as mixing and isolation, in the wake of the COVID-19 pandemic. Mitigating segregation while controlling disease exposure pathways for socially vulnerable groups would be critical. The literature has suggested a relationship between racial and socioeconomic segregation and COVID-19 related mortality (Khanijahani & Tomassoni, 2022), which, together with our results, depicts a vicious cycle that places a disproportionate burden on residents in socially vulnerable neighborhoods.

Our study and several recent studies that consider segregation in a dynamic spatiotemporal framework together open new horizons for interventions against segregation. Reactions to segregation and the risk of concentrated poverty have typically been divided: mixing is often adopted by governments to enhance diversity and justice, but escapist strategies (Smets & Salman, 2008), which reinforce separation, can be popular among wealthier populations. The accepted norm for pursuing urban diversity through policy measures is to bring a housing mix into disadvantaged neighborhoods in order to achieve a fair distribution of amenities and disamenities across social groups (Bolt et al., 2010). However, research that examines the effects of policies providing middle-class housing in worse-off precincts reveals complex findings (Graham et al., 2009; Manley et al., 2012; Smets & Salman, 2008). Given our findings on how the dynamics of mobility can be related to variations in segregation, diverse policies that promote mixing outside of residential settings by providing better transit opportunities and inclusive destinations may be an additional strategy to alleviate the burdens of segregation. The heterogeneity of multidimensional realms of urban space, rather than residential areas alone, may serve as an incentive for breaking down implicit biases and strengthening links for the urban conglomerate.

Limitations and future work

We need to acknowledge the limitations of this work and provide guidelines for future studies. First, our segregation measurement is built upon CDC's SVI, a social vulnerability measurement with four themes that considers a total of 15 demographic and socioeconomic variables (see Appendix A). Although the SVI is comprehensive thanks to its consideration of multidimensional social factors, its overall vulnerability index, i.e., SPL_THEMES, is a summation of the vulnerability from four themes. Such a summation assumes that these four themes are equally weighted, which is not usually the case given the complex interactions among these variables, potentially leading to the dilution of certain variables, such as race/ethnicity, in our spatial segregation measurement. In addition, the potential contribution to spatial segregation of demographic and socioeconomic variables beyond the 15 factors involved in CDC's SVI deserves further exploration. One possible improvement is to involve a greater number of demographic and socioeconomic variables, followed by a principal component analysis to select important components in an uncorrelated manner. Such an approach has been adopted in the construction of many social vulnerability measurements, with one notable effort by Cutter et al. (2003).

Second, the advent of geopositioning techniques provides us with an opportunity to closely monitor human spatial interactions from digital device holders. However, due to privacy and confidentiality concerns, mobility records from digital devices are not usually accessible to the public, and proper aggregation steps need to be performed before the data release. For example, the SafeGraph mobility dataset we used in this study is originally aggregated at the Census Block Group (CBG) level. Such aggregated mobility records lack individual details and are limited in scalability. Demographic and socioeconomic information on the individuals making the trips was not available. Likewise, contextual information regarding the types of trips, the motivations and purposes of travel, and the time spent at each stop is unfortunately not included. To supplement the contextual information behind trips, a growing number of studies resort to social media, taking advantage of the locational, textual, and visual information in social media posts (Beiró et al., 2016; Huang et al., 2020; Wu et al., 2014). Surveys can also provide essential contextual information (e.g., purposes of travel, detailed locations of destinations, demographic profiles, and socioeconomic status) that contributes to a comprehensive story behind human movements (Hu et al., 2021), thus leading to a better understanding of the spatial segregation phenomenon observed in this study.

Third, the representativeness of SafeGraph data also needs to be discussed. Although SafeGraph reports a 10% penetration ratio for its dataset, we should acknowledge the “Digital Divide” issue (Brown et al., 2011), whereby underprivileged members of society who do not have access to digital devices, especially the poor and the elderly, are largely missing from this study. Meanwhile, the representativeness of SafeGraph data in rural and suburban areas is lower than in urban areas, which may influence the reliability of the obtained results. Due to these data limitations, this study only examined the twelve most populated MSAs, where we can obtain sufficient SafeGraph samples to ensure the accuracy and reliability of the results. Human mobility is characterized by its multi-faceted nature, as evidenced by many existing studies (González et al., 2008; Huang et al., 2021). Therefore, we urge more efforts to understand spatial segregation issues using multi-source mobility datasets that capture a broader population spectrum for both urban and rural areas.

Fourth, since the focus of this study is to understand the impact of mobility changes on interactions and social exposure among tracts, we have not considered intra-tract movements that reflect more localized interactions between different social groups. Given the notable impact of COVID-19 on travel distance, it would be meaningful to further examine how these intra-tract movements contribute to the segregation or integration of various socioeconomic groups in these MSAs. This would require more fine-grained observations of the spatial distribution of socioeconomic groups within the tracts and is a possible direction for future study.

Finally, while focusing on revealing the spatial segregation patterns before and during the pandemic in selected U.S. MSAs, our study does not explore the mechanistic pathways that lead to the observed disparity in spatial segregation across MSAs and between the periods before and during the COVID-19 pandemic. Studies are still needed to investigate the underlying reasons for the spatial segregation dynamics. Meanwhile, integrating residential, mobility, social network, and other metrics for assessing segregation is also suggested for future research.

Conclusions

Segregation is a long-standing social issue. Effectively quantifying social segregation is of great importance for city planning, management, and policymaking. This study innovatively utilized fine-grained individual-level movement data collected from 45 million phone users to quantify social segregation based on the mobility homophily of socially different communities. We examined the monthly dynamics of social segregation in the twelve most populated U.S. MSAs over two years (from 2019-03 to 2021-02) at two levels and focused on exploring whether the COVID-19 pandemic aggravated social segregation. The key findings are summarized as follows: (1) Among the twelve MSAs, New York and Washington D.C. remain the most and least segregated MSAs, respectively, before and during COVID-19. (2) Since the COVID-19 pandemic began, six of the twelve MSAs show statistically significant increases in the degree of segregation. (3) The highest segregation is consistently observed in the most or the least vulnerable communities (census tract groups) in all twelve MSAs. (4) The spatial clustering structure of highly segregated communities remains almost the same in all MSAs before and during COVID-19.

Declaration of Competing Interest

The authors declare no conflict of interest.

Table A1

CDC SVI variables and themes.

CDC SVI Themes		CDC SVI Variables
Name	Description	Name	Description
SPL_THEME1	Sum of series for Socioeconomic theme	EP_POV	Percentage of persons below poverty
		EP_UNEMP	Unemployment rate
		EP_PCI	Per capita income
		EP_NOHSDP	Percentage of persons with no high school diploma (age 25+)
SPL_THEME2	Sum of series for Household composition & disability theme	EP_AGE65	Percentage of persons aged 65 and older
		EP_AGE17	Percentage of persons aged 17 and younger
		EP_DISABL	Percentage of civilian noninstitutionalized population with a disability
		EP_SNGPNT	Percentage of single parent households with children under 18
SPL_THEME3	Sum of series for Minority status & language barrier theme	EP_MINRTY	Percentage minority (all persons except white, non-Hispanic)
		EP_LIMENG	Percentage of persons (age 5+) who speak English “less than well”
SPL_THEME4	Sum of series for Housing type & transportation theme	EP_MUNIT	Percentage of housing in structures with 10 or more units
		EP_MOBILE	Percentage of mobile houses
		EP_CROWD	Percentage of occupied housing units with more people than rooms
		EP_NOVEH	Percentage of households with no vehicle available
		EP_GROUPQ	Percentage of persons in group quarters
SPL_THEMES	Sum of all themes
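
The equal-weighting assumption behind SPL_THEMES, noted in the limitations, can be made explicit with a small sketch. It treats the overall score as a weighted sum of the four theme scores in which every weight defaults to 1. The function name, the sample theme scores, and the alternative weights are hypothetical and are not taken from CDC's SVI or from this study.

```python
THEMES = ("SPL_THEME1", "SPL_THEME2", "SPL_THEME3", "SPL_THEME4")


def overall_vulnerability(tract, weights=None):
    """Weighted sum of the four theme scores; each weight defaults to 1 (plain summation)."""
    if weights is None:
        weights = {theme: 1.0 for theme in THEMES}
    return sum(weights[theme] * tract[theme] for theme in THEMES)


# Hypothetical theme scores for a single census tract.
tract = {"SPL_THEME1": 2.0, "SPL_THEME2": 1.5, "SPL_THEME3": 1.0, "SPL_THEME4": 2.5}

# Equal weighting, i.e. the plain sum of all themes.
equal_weighted = overall_vulnerability(tract)

# Hypothetical alternative that doubles the weight of the minority status & language theme.
custom_weights = {"SPL_THEME1": 1.0, "SPL_THEME2": 1.0, "SPL_THEME3": 2.0, "SPL_THEME4": 1.0}
reweighted = overall_vulnerability(tract, custom_weights)
```

Comparing the two results for the same tract shows how much a single theme can be diluted or amplified by the weighting choice, which is the concern raised about race/ethnicity in the limitations.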

32 in total

1. Investigating the relationship between environmental quality, socio-spatial segregation and the social dimension of sustainability in US urban areas.

Authors: Kyle D Buck; J Kevin Summers; Lisa M Smith
Journal: Sustain Cities Soc Date: 2021-01-14 Impact factor: 7.587

2. Urban mobility and neighborhood isolation in America's 50 largest cities.

Authors: Qi Wang; Nolan Edward Phillips; Mario L Small; Robert J Sampson
Journal: Proc Natl Acad Sci U S A Date: 2018-07-09 Impact factor: 11.205

3. Structural Racism and COVID-19 in the USA: a County-Level Empirical Analysis.

Authors: Shin Bin Tan; Priyanka deSouza; Matthew Raifman
Journal: J Racial Ethn Health Disparities Date: 2021-01-19

4. Diffusion segregation and the disproportionate incidence of COVID-19 in African American communities.

Authors: Aleix Bassolas; Sandro Sousa; Vincenzo Nicosia
Journal: J R Soc Interface Date: 2021-01-27 Impact factor: 4.118

5. Socioeconomic and Racial Segregation and COVID-19: Concentrated Disadvantage and Black Concentration in Association with COVID-19 Deaths in the USA.

Authors: Ahmad Khanijahani; Larisa Tomassoni
Journal: J Racial Ethn Health Disparities Date: 2021-01-19

6. Socioeconomic inequalities in the spread of coronavirus-19 in the United States: A examination of the emergence of social inequalities.

Authors: Sean A P Clouston; Ginny Natale; Bruce G Link
Journal: Soc Sci Med Date: 2020-11-30 Impact factor: 4.634