Literature DB >> 35645430

Rapid surveillance of COVID-19 by timely detection of geographically robust, alive and emerging hotspots using Particle Swarm Optimizer.

Abstract

A novel virus, called severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has rapidly become a pandemic called Coronavirus disease 2019 (COVID-19). According to the World Health Organization, COVID-19 was first detected in Wuhan city in December 2019 and has affected 216 countries with 9473214 confirmed cases and 484249 deaths globally as on June 26th, 2020. Also, this outbreak continues to grow in many countries like the United States of America (U.S.), Brazil, India, and Russia. To ensure rapid surveillance and better decision-making by government authorities in different countries, it is vital to identify alive and emerging hotspots within a country promptly. State-of-the-art methods based on space-time scan statistics (like SaTScan) are not geographically robust. Also, due to the enumeration of many Spatio-temporal cylinders, the computation cost of Spatio-temporal SaTScan (ST-SaTScan) is very high. In the applications like COVID-19 where we need to detect the emerging hotspots daily as soon as the new count of cases gets updated, ST-SaTScan seems inefficient. Therefore, this paper proposes a Particle Swarm Optimizer-based scheme to timely detect geographically robust, alive, and emerging COVID-19 hotspots in a country. Timely detection can help government officials design better control strategies like increasing testing in hotspots, imposing stricter containment rules, or setting up temporary hospital beds. Performance of ST-SaTScan and proposed scheme have been analyzed for four worst-hit U.S. states for the incubation period of 14 days between June 11th, 2020, and June 24th, 2020. Results indicate that the proposed scheme detects hotspots of a higher likelihood ratio (a measure to indicate the significance of hotspot) than ST-SaTScan in significantly less time. We also applied the proposed scheme to detect the emerging COVID-19 hotspots in all states of the U.S. During the study period, the proposed scheme has detected 104 emerging COVID-19 hotspots.

Entities: Chemical

Keywords: COVID-19; Coronavirus; Hotspot detection; Particle swarm optimizer; Spatio-temporal analysis

Year: 2022 PMID： 35645430 PMCID： PMC9127146 DOI： 10.1016/j.apgeog.2022.102719

Source DB: PubMed Journal: Appl Geogr ISSN： 0143-6228

Introduction

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was first identified in December 2019 in Mainland China, with its epicenter at Wuhan City (C. Wang, Horby, Hayden, & Gao, 2020). The most common symptoms of COVID-19 include fever, cough, shortness of breath, and fatigue; however, severe cases may suffer from organ failures, heart attacks, pneumonia, or even death (Sanche et al., 2020). The novelty of COVID-19, its highly contagious nature, and lack of appropriate medicinal knowledge resulted in its rapid outbreak worldwide. Consequently, COVID-19 was declared a pandemic by the World Health Organization on March 11th, 2020 (Cucinotta & Vanelli, 2020). On June 27th, 2020, there were 9473214 confirmed cases and 484249 deaths globally. However, the death rate of COVID-19 lies between 1% and 5%, and 80% of the total cases are mild ones (Ruan, Yang, Wang, Jiang, & Song, 2020). The lack of proper medication and vaccines for COVID-19 forced the governments of different countries to impose a complete lockdown on the entire country. Depending on the outbreak's severity, many countries strictly implemented the lockdown; however, governments in some countries have taken various other preventive measures. These measures include restrictions on international and domestic flights, restricted inter-territorial movements for non-essential commodities, etc. The complete lockdown and other preventive measures had severe and apparent effects on the economies, resulting in lifting the lockdowns even with a surge of confirmed COVID-19 cases. However, in fighting the outbreak of a highly infectious disease like COVID-19 and also to ensure little adversity in economies, it is essential to target the prioritized locations (known as hotspots) for stricter containment measures, increased testing, and ramping up of medical facilities (Desjardins, Hohl, & Delmelle, 2020). For many years, spatial hotspot detection techniques have been applied to detect significant spatial hotspots for various applications. These techniques can also be used to identify COVID-19 hotspots and thus prioritize them for immediate actions and preventive measures. Formally, a spatial hotspot is defined as an area (or zone) where cases inside the area are significantly greater than the number of cases outside that area (Eftelioglu, Tang, & Shekhar, 2015). In addition to spatial dimensions, time is used as a third dimension in Spatio-temporal analysis, which helps identify active, emerging, diminishing, or persistent hotspots. In monitoring outbreaks like COVID-19, decision-makers are often interested in those Spatio-temporal hotspots which are active at the end of the study period, called alive or emerging hotspots. Researchers have made significant contributions to detect spatial and spatio-temporal hotspots of different shapes useful in various application domains in the past few years. Some of the contemporary contributions made in this regard are discussed next.

Related work

The contemporary methods for spatial and spatio-temporal hotspot detection can be broadly classified into two categories: clustering-based approaches and Scan-statistics-based approaches. Clustering-based techniques assume hotspots to be clusters of spatial locations, where, location points within a cluster are closer, and the points in different clusters are farther from each other. Considering this interpretation, numerous approaches have been proposed to detect the hotspots which utilize the basic principles of clustering. These works identify hotspots using conventional clustering methods. Recently, many authors (Kumar, n.d.; Lawson, 2010; Shi & Pun-Cheng, 2019) consolidated and reported the details of some of the clustering based (partition based, density based etc.) hotspot detection methods. In one contribution, authors (Tayal et al., 2014) developed a partition based clustering approach for crime hotspot detection and criminal identification on dataset of India. They used k-means clustering to identify circular hotspots. Certain limitations of k-means clustering in context of hotspot detection were observed. First, the value of k (number of clusters) needs to be specified, which may not be known beforehand. Second, k-means simply partitions all the points into k clusters, where each cluster contains approximately same number of points. Thus, this type of distribution may not represent significant hotspot regions. Further, in another contribution, authors (F Di Martino, Loia, & Sessa, 2008) extended k-means by incorporating fuzzy logic to determine the value of k automatically. Furthermore, another work (Ferdinando Di Martino & Sessa, 2013) introduced a Fuzzy PSO based clustering approach to identify spatial hotspots. The authors claimed that Fuzzy PSO approach for partition clustering is efficient in determining the optimal value of k in k-means, is robust to outliers and takes less computational cost than Fuzzy k-means approach. Thus, they proved their scheme to be more efficient for large datasets. Apart from partition-based clustering, numerous conventional density-based clustering methods like DBSCAN (Ester, Kriegel, Xu, & Miinchen, 1996), ST-DBSCAN (Birant & Kut, 2007), CLIQUE (Agrawal, Gehrke, Gunopulos, & Raghavan, 1998), STING (W. Wang, Yang, & Muntz, 1997), etc. have been used for identifying spatial and spatio-temporal hotspots. Out of these approaches DBSCAN (Ester et al., 1996) is the most widely used method which groups nearby dense points in the given spatial space. The grouping contains at least points and then also includes other points within ε distance (called ε neighbourhood) of these points. In addition, ST-DBSCAN (Birant & Kut, 2007) is the extension of DBSCAN proposed by authors to deal with spatio-temporal data. Choosing optimal values of user specific parameters viz. and in DBSCAN and ST-DBSCAN requires domain expertise. To select the optimal values of DBSCAN related parameters, researchers have proposed different solutions. For instance, in one work, authors (Scitovski, 2018) proposed a modified DBSCAN called rough DBSCAN which automatically identifies the DBSCAN related parameters. They also applied their proposed scheme to identify earthquake prone zones. In another similar study for earthquake zoning, authors of (Ankita & Thakur, 2018) presented a meta-heuristic algorithm to identify DBSCAN related parameters and showed effectiveness of density based clustering schemes for irregular shaped spatial hotspot detection. From the related work discussed in clustering based hotspot detection category, it can be observed that clustering-based techniques offers low computational complexity; however, they may result in false hotspots due to lack of statistical significance tests (Eftelioglu, Shekhar, Kang, & Farah, 2016). Further, clustering-based approaches need some preliminary information related to the method, which may not be available beforehand. On the other hand, scan statistics-based approaches use a scan window of the desired shape (circular, ring, ellipse, cylindrical etc.) and a statistical test to identify significant hotspots. Kulldorff has made the initial contribution in scan statistics-based methods for detecting spatial hotspots of circular and elliptical hotspots (Martin Kulldorff, 1997; Martin Kulldorff, Huang, Pickle, & Duczmal, 2006). The work was extended on spatio-temporal domain by Kulldorff (Martin Kulldorff, 2001) and is popularly known as ST-SaTScan. Kulldorff's space-time scan statistic has been used in literature to detect emerging hotspots of various diseases like influenza (Li, Fu, Lin, & Jiang, 2019), brain cancer (Martin Kulldorff, Athas, & Miller, 1998), malnutrition (Verdiana & Widyaningsih, 2017), legionnaires disease (Edens et al., 2019) and most recently COVID-19 (Desjardins et al., 2020). ST-SaTScan uses a cylindrical window of three dimensions where the base represents the space and height represents the time. This cylindrical window moves in the spatial and temporal domain in search space to enumerate candidate cylindrical hotspots. Spatially, many cylinder bases with different geographical centers and radius are generated by considering each county (or equivalent administrative units) as center and distance to every other county (or equivalent administrative units) as radii (Martin Kulldorff, 2001). Thus, spatially SaTScan allows only the location of counties to be the center of the cylinder. Due to the selection criteria of center and radii, SaTScan may miss detecting some significant hotspots where the center of hotspot does not lie exactly on the location of a county and hence is not geographically robust (not sensitive to geographical barriers). Temporally, for each cylinder base, consideration of different starting dates results in different cylinder heights. Hence, many candidate cylindrical hotspots with varying bases and heights are generated. A statistical measure called Log-Likelihood Ratio (Martin Kulldorff, 1997), indicating the strength of a hotspot, is then calculated for each cylinder. The significance of a hotspot is confirmed by relying on randomization tests to determine whether the observed pattern is significant or has occurred by chance. Due to the generation of a large number of cylinders, followed by a randomization test, SaTScan is a high computational cost method. In subsequent years various researchers have tried to improve Kulldorff's approach in terms of shape and efficiency by applying pruning, filtering, and speedups. In one contribution, authors (Eftelioglu et al., 2015) improved the SaTScan algorithm for circular hotspot detection by using a cubic grid circle listing algorithm. They applied an upper bound on each grid's likelihood ratio to detect circular hotspots. Through their approach, authors identify non-contiguous crime hotspots separated by rivers, mountains, etc. However, their approach is limited to the spatial domain. In another contribution, the authors (Tang, Eftelioglu, & Shekhar, 2015) proposed a fast elliptical hotspot detection method and detected crime hotspots in Denver city. Using a lookup table and upper bound pruning technique, the authors achieved lower computational complexity than Kulldorff's elliptical hotspot detection method. Another study (Eftelioglu, Shekhar, et al., 2016) proposed a ring-shaped hotspot detection method based on dual grid pruning and found ring-shaped crime hotspots in San Diego city. Also, a technique for linear hotspot detection was proposed by (Tang, Eftelioglu, Oliver, & Shekhar, 2017) and used to detect pedestrian fatality hotspots in Florida. The author proposed an efficient method by applying the shortest path tree pruning and speeding up the Monte Carlo simulations. Also, irregularly shaped hotspots have been detected in (Duczmal, Cançado, Takahashi, & Bessegato, 2007) by using a Genetic Algorithm based scan statistics approach for optimizing the shape of the scan window. Another approach for irregularly shaped hotspot detection based on a multi-objective Genetic algorithm was proposed by (Cançado et al., 2010). It optimizes two objectives: likelihood ratio and the penalty function for restricting excessive freedom in shape. Finally, in a study, authors (Izakian & Pedrycz, 2012) used Particle Swarm Optimization (PSO) to optimize the shape of scanning window hotspot detection. In general, the discussed contributions fall under two broad categories (Clustering-based and Scan statistics based) and suggests following: (a) clustering-based hotspot detection techniques may detect false hotspots and also needs prior information like the number of clusters (Eftelioglu, Shekhar, et al., 2016) whereas, (b) scan statistics based approaches rules out the false hotspots and detects such hotspots which are statistically significant but at a high computation cost (c) the modifications to improve SaTScan/ST-SaTScan available in the literature are limited to optimizing the shape of hotspots and not in terms of improving its efficiency or geographical robustness.

Highlights and outline of the paper

Considering the limitations of discussed approaches, we have proposed a novel Particle Swarm Optimizer based scheme for detecting geographically robust Spatio-temporal hotspots in this paper. The proposed approach facilitates the timely identification of geographically robust Spatio-temporal, alive, and emerging COVID-19 hotspots and contributes to the ongoing COVID-19 surveillance. We have also presented the comparison between the proposed approach and the traditional SaTScan method using county-level COVID-19 data of a few worst-hit states of the United States of America (U.S.), viz. New York, Texas, Florida, and California. The comparison is made using time taken to detect the hotspots and the quality of detected hotspots (in terms of Log-Likelihood Ratio). Motivated by the reduced computational cost taken by the proposed scheme to identify hotspots in worst-hit U.S. states, the PSO-based scheme is further applied to identify all U.S. states' alive and emerging hotspots. In all these analyses (either comparison with SaTScan over worst-hit U.S. states or finding emerging hotspots of all the U.S. states), the COVID 19 data of the U.S. from June 11th, 2020 to June 24th, 2020 is considered. The rest of the paper is organized as follows: Section 2 reviews the Kulldorff's SaTScan method adopted for COVID-19 hotspots detection. It also forms a baseline for our proposed approach. Section 3 presents the proposed Particle Swarm Optimizer based approach for detecting alive and emerging COVID hotspots and is referred to as ST-PSO-GRHD (Spatio Temporal PSO-based Geographically Robust Hotspot Detection). Section 4 details the experimental studies and results. It includes brief details of the datasets, followed by a comparative analysis of both approaches on four worst-hit states of the U.S. using day-wise confirmed cases count. In this section, the performance of the proposed PSO-based approach is also presented to detect alive and emerging hotspots of the entire U.S. (all the 3026 counties). Finally, conclusions and future scope of the work are discussed in Section 5.

Spatio-temporal SaTScan: A review in the context of COVID-19 hotspots

SaTScan is one of the often-used approaches (Martin Kulldorff, 1999) or software (M Kulldorff, 2018) to detect spatial, temporal, and Spatio-temporal clusters (or hotspots). The basis of SaTScan is scan statistics. In search of active and emerging COVID-19 hotspots, SaTScan has been explained here to detect cylindrical hotspots, i.e., Spatio-temporal (S.T.) hotspots. Depending on data availability, SaTScan can be used either for point data (each point represents the location of one event occurrence) or aggregated data (e.g., county or equivalent administrative block level). The considered data for COVID-19 is available at the aggregate level, i.e., county-wise data of U.S. states (The New York Times, 2020); therefore, SaTScan has been explained here in the context of aggregate data. In this study, the considered administrative unit is U.S. county. For Spatio-temporal analysis, Spatio-temporal SaTScan (ST-SaTScan) moves a cylindrical scan window of varying radii and heights in the search space S to detect areas of unusually high concentration of an event. Each cylindrical zone is defined using the center of the base representing latitude and longitude of a county, radius of the base , and height of the cylinder representing last days of the study period. To detect active and emerging hotspots, the duration of cylindrical window is considered between any one day of study period as the starting date and the last day of the study period as the end date. All the candidate hotspots are enumerated. The detailed procedure of enumerating cylinders is presented in Step 1 of Algorithm 1. Further, a hypothesis test is required, and for each cylinder, the following question needs to be addressed: “Is there any difference in the concentration of events inside vs. outside the cylinder?“. In ST-SaTScan, the hypothesis test is constructed with two outcomes, null hypothesis and alternate hypothesis . The null hypothesis states that “the number of occurrences of an event follows complete spatial randomness (CSR) in the study area ”, whereas the alternate hypothesis states that “the frequency of occurrence of an event is higher within the cylinder as compared to outside”. In the hypothesis test, the strength of a cylindrical window is calculated to check its candidature for a hotspot. Log-likelihood ratio (LLR) (Xie, 2019) (Shi & Pun-Cheng, 2019) is one of the often-used measures in ST-SaTScan to compute the strength and accordingly decide the candidature of the obtained cylindrical window as a hotspot. Let be the total number of cases in all counties. Let be the actual observed number of cases within all counties of cylinder . As per the null hypothesis , the expected number of cases within denoted by can be defined using (1) by the unitary method.where is the population of all counties within the circular base of cylinder , is the total population of study area , is the number of days under consideration for analysis. Next, the probability that a particular case lies within the hotspot if were true is given by and for the case to be outside Z is . If all the cases are uniformly distributed and H0 is true, the total probability that exactly points are present within is given by as shown in (2). Further, the alternate hypothesis considers that H0 is not true. In this case, the probability that any case lies within the hotspot is given by and that the case lies outside Z is . The total probability that exactly points are present within if the actual distribution is true is given by , as shown in (3). The likelihood ratio is defined as .Taking logarithm on both sides, the expression for LLR is shown in (4).where is a binary indication function. has a value of when a zone has more activity points than expected ; otherwise, it is set to to prevent the detection of low cases areas. Along with LLR, its distribution under the null hypothesis is also required to perform the hypothesis test. This distribution indicates what to expect when there are no hotspots in the study area. A randomization test is carried out to obtain this distribution as follows: Monte Carlo Simulations (MCS) is used to perform the randomization test and get the distribution of LLR under . In total, random datasets in the study area are generated using the Poisson point process (CSR) (assuming that COVID-19 cases follow a Poisson distribution according to the population of the geographic area). All possible cylinders are enumerated for each of the random datasets. As the higher value of (likelihood ratio) indicates that more cases have been reported than expected, the highest LLR value out of all cylinders is stored. This is done for all datasets, and the values are stored in a list. The list is named as . Next, to obtain the distribution, which suggests what type of LLRs value to expect in the same study area when there are no hotspots, the list is sorted in descending order. After enumerating all the candidate cylinders and their corresponding LLR values, the cylinder with maximum LLR is kept among overlapping cylinders, and the rest are removed. The statistical significance of each cylinder is then calculated using its LLR value and distribution of LLR under the null hypothesis. A cylindrical zone is referred as a statistically significant hotspot based on its p-value (referred as significance level). Relative position of in list is determined first to calculate the p-value of the cylindrical zone Z. The p-value is then calculated by taking the ratio between and , where is the number of Monte Carlo Simulations. If the p-value of a cylindrical zone is less than the desired level of significance , then is considered as a statistically significant alive or emerging hotspot; otherwise, the alternate hypothesis is rejected for that . Obviously, the number of enumerated cylinders is very large, so each cylinder is expanded till the user-inputted maximum spatial bound (radius in km) and maximum temporal bound (duration in days) is reached. As detailed in Section 4, experimental analysis is carried out with user-inputted spatial bound, and temporal bound is taken as 50% of the study period. However, decision-makers of different agencies can decide these bounds so that the government can adequately look and cater to each hotspot. The ST-SaTScan scheme for detecting alive and emerging COVID-19 hotspots is presented in Algorithm 1. The computational complexity of ST-SaTScan scheme presented in Algorithm 1 is computed as follows: Step 1 of the algorithm enumerates all cylinders by generating O(C2) spatial circles, and for each spatial base, O(T) temporal windows are generated. For each of the O(C2T) cylindrical windows, LLR computation takes O(C) time. Hence, Step 1 requires O(C3T) time. Step 2 of the ST-SaTScan scheme, repeats Step 1 for ‘m’ number of times to find the cylinder with maximum LLR in each Monte Carlo simulation. Thus, the computation time required for Step 2 is O(m × C3 × T) time. Step 3 checks each cylinder for its statistical significance and takes O(C2T) time. Hence, the total time complexity of ST-SaTScan is O (m × C3 × T). As seen from Algorithm 1, SaTScan enumerates only those cylinders which are centered at the centroid of some county. Therefore, it may miss detecting some of the significant hotspots whose center lies at some location other than a county centroid (Eftelioglu, Tang, & Shekhar, 2016). Thus, the hotspots detected by ST-SaTScan may or may not be geographically robust and may report reduced significance. Geographical robustness can be added by allowing any random location (apart from the centroid of a county) to be selected as the center of the spatial base. But it will enlist an infinite number of cylinders. Thus, adding geographical robustness to Spatio-temporal hotspot detection makes it an NP-class problem. Considering the significance of geographical robustness in detecting COVID 19 alive and emerging hotspots, we propose a Particle Swarm Optimizer (PSO) based scheme for the same in the next section. ST-SaTScan for detecting COVID-19 Hotspots

Proposed Scheme-Spatio Temporal Hotspot detection using PSO

As discussed in Section 2, finding geographically robust hotspots using ST-SaTScan is an NP-class problem. Therefore, this section proposes an efficient scheme for detecting geographically robust alive and emerging COVID-19 hotspots using Particle Swarm Optimizer (PSO). The motivation to use PSO is to reduce the time taken to detect Spatio-temporal COVID-19 hotspots. This section first gives a brief detailing of the Particle Swarm Optimizer. Secondly, we propose and model the Spatio-temporal hotspot detection problem as an optimization problem. Finally, a complete scheme for detecting significant COVID-19 Spatio-temporal hotspots along with its complexity analysis is presented. The proposed PSO-based scheme for Spatio-Temporal Hotspots Detection is now onwards referred to as ST-PSO-GRHD.

Particle Swarm Optimizer

Particle Swarm Optimizer (PSO) was proposed in 1995 by Kennedy and Eberhart (Eberhart & Kennedy, 1995) to copy the optimization behavior seen in a group of organisms in nature like fishes and birds (called as particles) and use it to solve real-world optimization problems. In PSO, a group of particles coordinates to reach an optimal solution to a problem. Each particle possesses a d-dimensional position (representing a solution) and a d-dimensional velocity, where d is the dimension of the search space. Each candidate solution is evaluated using a fitness function f().The first step of the algorithm is random initialization of each particle's position and zero initialization of its velocity . Each particle then iteratively moves to better positions in the search space using its own experience and the social experience of other particles. The movement of particles over the iterations is modeled using two equations, viz. velocity update (5) and position update (6). Each particle's movement is guided by its own local best position (called ) as well as the best position obtained by the entire group of particles (called ). The velocity of particle at iteration and the new position of the particle are obtained using (5), (6) respectively (Clerc & Kennedy, 2002). where represents the inertia weight of the particle, and are random numbers in the range and and represent the weightage given to cognitive component and social component of the swarm, respectively. The particles keep updating their positions and velocities until some convergence criteria are met. Convergence criteria can be either a maximum number of iterations or the same fitness if obtained for a fixed number of iterations. in (5) is a constriction coefficient and is defined in (7).where should be greater than 4 to ensure stability. With the inclusion of the constriction factor, PSO can obtain better quality solutions by maintaining a balance between exploration and exploitation (Clerc & Kennedy, 2002)(Lim, Montakhab, & Nouri, 2009).

An optimization model for geographically robust spatio-temporal hotspot detection

In this section, we model the Spatio-temporal hotspot detection problem as an optimization problem. The presented optimization model is aimed to detect alive and emerging COVID-19 hotspots. For the COVID-19 dataset having day-wise cases in counties (of U.S.) for number for days, the optimization model aims to find such an alive or emerging cylinder in search space with the highest likelihood ratio. The obtained cylinder with the highest likelihood ratio represents the most significant hotspot. The model is reused by removing counties obtained in the previous cylindrical hotspot to find the subsequent hotspots. The model is used repeatedly until a non-significant hotspot (identified using a hypothesis test) is obtained. The inputs, decision variables, fitness function, and constraints of the optimization model are presented in Table 1 , Table 2 , Equation (8), and Equations (9)–(12), respectively.

Table 1

Inputs to the optimization model for detecting the most significant COVID-19 hotspot.

Input	Description
Cases [C][T]	C: number of counties
	T: number of days in the study period.
	Cases(i,j): represents the number of confirmed COVID-19 cases reported in ith county on jth day.
Location[C]	(Latitude, Longitude) pairs of C counties
Population[C]	The population of C counties
min_rad, max_rad	The lower and upper bound of desired hotspots

Table 2

Decision Variables in the optimization model for detecting most significant COVID-19 hotspot.

Decision Variable	Description
(xc,yc)	Center of the base of cylinder (a latitude-longitude pair representing a location in search space S)
r	The radius of the cylinder
t	Height of cylinder from the end of the study period
	(representing time window (T-t+1)^th to T^th day).
	This allows only alive and emerging hotspots.

Inputs to the optimization model for detecting the most significant COVID-19 hotspot. Decision Variables in the optimization model for detecting most significant COVID-19 hotspot. Maximize where, Subject to The objective function (8) represents the Log-Likelihood Ratio of the candidate cylinder defined using the center, radius, and height which are decision variables for the model. The model aims to find the cylinder with the maximum likelihood ratio. To calculate the expected count of cases in a cylinder, we assume that COVID-19 cases follow a Poisson point process according to the population of the geographic area. Constraints (9) and (10) are used to keep the center ( of the cylinder within the study area . Constraint (11) represents the bounding condition for the radius of the cylinder. The lower and upper bounds on height representing the temporal dimension are presented in (12). The bounds are kept the same for fair comparative analysis in ST-SaTScan and ST-PSO-GRHD.

ST-PSO-GRHD scheme

The proposed ST-PSO-GRHD scheme to detect the geographically robust hotspot detection is presented in Algorithm 2. There are three major steps in the proposed scheme: (1) generating the distribution of Log-Likelihood ratio under null hypothesis H0, (2) finding the cylindrical hotspot with maximum likelihood ratio in the search space S, and (3) statistical significance using hypothesis test. The first step of the proposed scheme involves generating the distribution of LLR under null hypothesis , which states that there is no significant hotspot in the search area. This distribution is called and is obtained using Monte Carlo simulations in a similar manner, as discussed in Section 2. In the second step, the proposed scheme uses PSO to find the cylindrical hotspot with maximum likelihood ratio in the search space . The decision variables, fitness function, and constraints used by PSO are presented in Section 3.2. Initially, each of the particles is randomly placed in the search space where each particle has 4 dimensions representing the center , radius , and height of the cylindrical window. To detect alive and emerging hotspots only, the height of the cylindrical window is considered from the end date of the study period. The fitness value (LLR) of each particle is calculated using (8) to identify the global best () and personal best ( position of each particle. While calculating LLR for a cylindrical zone, all those counties whose centroid location lies within the circular spatial base of the cylinder are considered inside the zone . The sum of the day-wise count of cases from (T-t+1)th to Tth day in counties inside is used as . Particles then update their velocities and positions based on and using (5), (6). All the particles keep wandering in the search space to reach the optimum value of LLR until a convergence criterion is met. The ST-PSO-GRHD scheme converges if the best solution in the swarm does not improve continuously for a fixed number of iterations or a maximum number of iterations has been reached. After convergence, the global best value represents the cylindrical hotspot with the maximum LLR value. The third step of the proposed scheme checks the statistical significance of hotspot C.H. using a hypothesis test. The p-value of C.H. is obtained in a similar way as discussed in Section 2. If the p-value of C.H. is less than , then we store C.H. in a list of significant cylindrical hotspots. Searching for the next non-overlapping hotspot begins by removing the counties within C.H. from county set C and repeating steps 2 and 3 with the remaining counties. If the p-value of any cylindrical hotspot obtained in step 2 is greater than , we say that C.H. is not a significant hotspot and terminate the algorithm. Finally, the list of significant hotspots is outputted. The complete ST-PSO-GRHD scheme is presented in algorithm 2. ST-PSO-GRHD for detecting emerging and alive COVID-19 hotspots

Complexity analysis of ST-PSO-GRHD

The function, cylinder_maxLLR(), which uses PSO to detect cylinder with maximum LLR value, is used in two of the three major steps of the proposed ST-PSO-GRHD scheme. The function, cylinder_maxLLR() requires K × 4 computational steps to initialize K particles each of 4 dimensions in the search space followed by calculation of their fitness; hence time complexity is O(K). Each particle's fitness value calculation (LLR) requires O(C) computations. Based on the particle's personal best and global best positions, each particle's position and velocity are updated until convergence or for the maximum number of iterations ‘iter’. Thus, the time complexity of the function cylinder_maxLLR() is O(iter × K × C). Step 1 of the algorithm runs ‘m’ Monte Carlo simulations and hence calls cylinder_maxLLR() function ‘m’ times, i.e. time complexity for Step 1 is O(m × iter × K × C). In Step 2, cylinder_maxLLR() is called once; hence, the time complexity for Step 2 is O(iter × K × C). Finally, Step 3 calculates the p-value in constant time. However, Step 2 and Step 3 are repeated ‘n’ times, where n is the number of significant hotspots in the search space. Thus, the combined time complexity for Step 2 and Step 3 is O(n × iter × K × C). Therefore, as, m > n, the time complexity of ST-PSO-GRHD is O(m × iter × K × C).

Experimental studies and results

This section presents the experimental details and results obtained for ST-SaTScan and ST-PSO-GRHD schemes on the COVID-19 dataset of the United States of America (U.S.). First, we briefly describe the COVID-19 dataset we used for our experiments. Subsequently, we investigate and compare the performance of ST-SaTScan and ST-PSO-GRHD schemes in terms of quality of hotspot detected (effect of geographical robustness) and time taken for the detection. For analysis, county-wise data of the 4 worst-hit states (as on June 25th, 2020) in the U.S is used. Finally, we present the alive, and emerging hotspots of the entire U.S. country detected using the proposed PSO-based scheme. The experiments are conducted on an Intel CORE i7 processor with 8 GB RAM using MATLAB R2018a.

Dataset

The COVID-19 case count data for the United States is obtained from the “COVID-19-data” repository published by the New York Times on their GitHub page. It is an ongoing repository maintained and published by The New York Times by compiling time series data from state governments and health departments. These data are freely available on their GitHub page (https://github.com/nytimes/covid-19-data) (The New York Times, 2020) at different levels of aggregation viz. National-level data, State-level data, and county-level data. To identify hotspots at the lowest level of detail, we have used county-level case count data for our study. As the incubation period of coronavirus is 14 days, we have taken 14 days of data from June 11th, 2020 to June 24th, 2020. This data is available for 3026 counties of the U.S. The dataset contains a daily cumulative case count of different counties. Hence, to obtain a day-wise count for spatio-temporal analysis, the case count for each day in the study period is obtained by subtracting the previous day-case count from the current day case count. For some counties, the obtained day-wise count was negative for a few days. This indicates fewer cumulative cases on a particular day than the previous day, which is practically impossible. Therefore, we have cleaned the data by replacing these negative values with zero (assuming that the count of reported cases on that day is zero). Further, longitude and latitude pairs of the centroid of each U.S. county have been obtained from the Data.Healthcare.Gov web page, a federal government website managed by the Centers for Medicare and Medicaid Services in Baltimore, US (https://data.healthcare.gov/dataset/Geocodes-USA-with-Counties/52wv-g36k)(Geocodes USA with Counties, 2013). Also, the population of each U.S. County is required to generate the distribution of test statistics under the null hypothesis of no hotspots. The estimated 2019 population of each county is obtained from the United States Census Bureau website(https://www.census.gov/data/datasets/time-series/demo/popest/2010s-counties-total.html) (County Population Totals: 2010–2019, n.d.).

Comparative analysis and discussions

This subsection presents the comparative analysis between ST-SaTScan and ST-PSO-GRHD schemes to detect COVID-19 hotspots. The comparison is made using the Log-likelihood ratio, a statistical indicator of the strength of the detected hotspots, and the time taken by both the schemes to detect COVID-19 hotspots. For comparison, we have used COVID-19 data of four worst-hit U.S. states (as on June 24th, 2020) viz. New York, California, Texas, and Florida. Execution time reported in both the schemes does not include the time for Monte Carlo simulations. For both the schemes, m (number of Monte Carlo Simulations) and αp (p-value threshold) are set to 99 and 0.01, respectively (m is computed experimentally as detailed in Appendix 1). In general, an upper bound (i.e., a maximum allowed value) for the cylinder base (spatial upper bound) is used to avoid detection of extremely large hotspots. This value is either chosen by experts or it is chosen proportionately to the overall study area under consideration. Therefore, the value of this upper bound (or maximum allowed radius) is not same for all the states and chosen as 300 km, 300 km, 200 km, and 400 km for California, New York, Florida and Texas respectively in our experiments. Similarly, to detect the emerging hotspots, an upper bound on height of the cylinder (temporal upper bound) is used. The temporal upper bound is set as 50% of the study period (June 11th, 2020 and June 24th, 2020) so that only emerging hotspots can be detected. The values of other PSO-specific parameters in ST-PSO-GRHD are presented in Table 3 . These are the typical values of PSO parameters used in the literature (Lim et al., 2009) (Sengupta, Basak, & Peters, 2018).

Table 3

PSO specific Parameter Setting in ST-PSO-GRHD.

Parameter	Value
Population Size	50
α (cognitive component)	2.05
β (social component)	2.05
Ω (inertia weight)	Linearly decreased from 0.9 to 0.2
Maximum Number of iterations	500
Number of iterations for convergence	100

Further, the characteristics of the statistically significant emerging COVID-19 hotspots which are reported for both the schemes include the duration of the hotspot, count of counties in the hotspot, the observed and expected number of cases in the hotspot, LLR value of the hotspot, and time taken to detect them.

PSO specific Parameter Setting in ST-PSO-GRHD. Further, the characteristics of the statistically significant emerging COVID-19 hotspots which are reported for both the schemes include the duration of the hotspot, count of counties in the hotspot, the observed and expected number of cases in the hotspot, LLR value of the hotspot, and time taken to detect them. Table 4 presents the characteristics of COVID-19 hotspots detected in four worst-hit U.S. states by ST-SaTScan and ST-PSO-GRHD schemes. These states include California State (58 counties), New York State (58 counties), Florida State (67 counties), and Texas State (252 counties) with a maximum radius of the circular base of the cylindrical hotspot (spatial upper bound) as 300 km (km), 300 km, 200 km, and 400 km respectively. As presented in Table 4 , both the schemes detected 4 emerging hotspots, each in California and Florida; however, counties' count in detected hotspots is different. Further, ST-SaTScan detected 1 and 5 emerging hotspots in New York and Texas, respectively, whereas ST-PSO-GRHD detected 2 and 6 emerging hotspots in New York and Texas states, respectively.

Table 4

U.S. State, Count of counties in the state, Spatial upper bound while detecting the hotspots (in km)	Hotspot #	ST-SaTScan						ST-PSO-GRHD
	Hotspot #	Duration	Count of Counties (in Hotspot)	Observed	Expected	LLR	Required Time (in seconds)	Duration	Count of Counties (in Hotspot)	Observed	Expected	LLR	Required Time (in seconds)
California State, 58 counties, 300 km	1	18th to 24th June	10	17331	10545.4	2361.3	165.59	21st to 24th June	5	12178	6003.5	2838.9	36.76
	2	22nd to 24th June	2	2716	1756.68	232.71		22nd to 24th June	12	2143	1076.9	419.04
	3	20th to 24th June	2	1171	661.68	161.48		22nd to 24th June	2	449	227.77	83.95
	4	20th to 24th June	3	727	449.04	73.02		22nd to 24th June	2	509	390.87	16.41
New York State, 58 counties, 300 km	1	18th to 24th June	4	2694	1123.84	942.77	170.38	18th to 24th June	1	2403	606.03	1708.78	41.58
New York State, 58 counties, 300 km	2	- -	- -	- -	- -	- -	- -	18th to 24th June	1	138	85.09	13.97	41.58
Florida State, 67 counties, 200 km	1	18th to 24th June	25	21774	14894.2	2359.98	429.06	18th to 24th June	25	21774	14894.2	2359.98	65.34
	2	20th to 24th June	2	170	40.7704	113.70		20th to 24th June	3	215	53.58	137.62
	3	19th to 24th June	2	1141	869.97	39.31		18th to 24th June	1	1225	929	43.89
	4	23rd to 24th June	2	84	12.49	88.61		21st to 24th June	4	127	58.54	29.96
Texas State, 252 counties, 400 km	1	19th to 24th June	47	17489	9613.5	3418.44	67419.1	19th to 24th June	47	17350	9442.19	3479.7	133.73
	2	22nd to 24th June	21	809	328.30	251.26		22nd to 24th June	21	832	321.64	283.03
	3	23rd to 24th June	5	1908	1266.7	144.56		18th to 24th June	4	3298	2581.25	96.87
	4	18th to 24th June	2	368	211.44	47.61		18th to 24th June	2	371	212.62	48.41
	5	23rd to 24th June	4	104	46.43	26.33		18th to 24th June	1	43	7.91	37.76
	6	- -	- -	- -	- -	- -	- -	18th to 24th June	1	37	6.26	35.03

COVID 19 emerging hotspots detected by ST-SaTScan and ST-PSO-GRHD schemes for four worst-hit U.S. states. The first column lists the name of the four U.S. states along with the count of counties in that state, and spatial upper bound in kilometers (km). The remaining fields represent the Characteristics (count of counties in the hotspot, observed and expected number of cases in the hotspot and respective LLR, followed by time taken in seconds to detect the hotspots) of the detected emerging hotspots. The spatial distribution of emerging Spatio-temporal COVID-19 hotspots detected by both schemes in four worst-hit U.S. states, California, New York, Florida, and Texas, have been presented in Fig. 1 , Fig. 2 , Fig. 3 , and Fig. 4 , respectively. In these figures, the most significant hotspot (hotspot with highest LLR) in a state is labeled as Hotspot 1. The second most significant hotspot in the state is Hotspot 2 and so on.

Fig. 1

Fig. 2

Spatial distribution of emerging Spatio-temporal COVID-19 hotspots detected by ST-SaTScan (left) and ST-PSO-GRHD (right) for New York State. Hotspot with the highest LLR is labeled as Hotspot 1, the second-highest LLR is labeled as Hotspot 2 and so on.

Fig. 3

Spatial distribution of emerging Spatio-temporal COVID-19 hotspots detected by ST-SaTScan (left) and ST-PSO-GRHD (right) for Florida State. Hotspot with the highest LLR is labeled as Hotspot 1, the second-highest LLR is labeled as Hotspot 2, and so on.

Fig. 4

Spatial distribution of emerging Spatio-temporal COVID-19 hotspots detected by ST-SaTScan (left) and ST-PSO-GRHD (right) for Texas State. Hotspot with the highest LLR is labeled as Hotspot 1, the second-highest LLR is labeled as Hotspot 2, and so on.

Spatial distribution of emerging Spatio-temporal COVID-19 hotspots detected by ST-SaTScan (left) and ST-PSO-GRHD (right) for California State. Hotspot with the highest LLR is labeled as Hotspot 1, second highest LLR is labeled as Hotspot 2, and so on. Spatial distribution of emerging Spatio-temporal COVID-19 hotspots detected by ST-SaTScan (left) and ST-PSO-GRHD (right) for New York State. Hotspot with the highest LLR is labeled as Hotspot 1, the second-highest LLR is labeled as Hotspot 2 and so on. Spatial distribution of emerging Spatio-temporal COVID-19 hotspots detected by ST-SaTScan (left) and ST-PSO-GRHD (right) for Florida State. Hotspot with the highest LLR is labeled as Hotspot 1, the second-highest LLR is labeled as Hotspot 2, and so on. Spatial distribution of emerging Spatio-temporal COVID-19 hotspots detected by ST-SaTScan (left) and ST-PSO-GRHD (right) for Texas State. Hotspot with the highest LLR is labeled as Hotspot 1, the second-highest LLR is labeled as Hotspot 2, and so on. While comparing the performance (in terms of LLR of the detected hotspots) of both the schemes, it is observed that the LLR values of most likely hotspots detected by the proposed scheme for all the four states of the U.S. are higher than LLR of most likely hotspots detected by ST-SaTScan. For example, the LLR of hotspot #1 (i.e., most significant hotspot) detected by ST-SaTScan in California State is 2361.3. In contrast, ST-PSO-GRHD detected the most significant hotspot (hotspot #1) with improved LLR as 2838.9. Compared to ST-SaTScan, similar improvements in LLR have been observed in the detection of subsequent hotspots (hotspot #2, i.e., second most significant hotspot, etc.) in California State by ST-PSO-GRHD. Obviously, the detection of the hotspots with improved LLR is the impact of geographical robustness addressed in ST-PSO-GRHD. Besides LLR, the performance of both schemes (ST-SaTScan, ST-PSO-GRHD) has also been compared using the time required to detect the hotspot(s). Although geographical robustness is not considered in ST-SaTScan (due to enumeration of an infinite number of circular bases), it has been observed that the time taken by ST-SaTScan is significantly higher than ST-PSO-GRHD. For example, the required time by ST-SaTScan to detect all hotspots in California State is 165.59 s, whereas it is 36.76 for ST-PSO-GRHD. In another observation, there are around 4 times more counties in Texas than in California, but the required time by ST-SaTScan to detect all hotspots in Texas significantly increased by 1834 times of the required time for California. Compared to ST-SaTScan, more scalable performance has been shown by ST-PSO-GRHD where the time required to detect all hotspots in Texas is increased by 4 times the required time for California. In general, as the number of counties in the state under study increases, time taken by ST-SaTScan increases significantly. In contrast, the time taken by ST-PSO-GRHD increases almost linearly with an increase in the number of counties. The apparent reason behind the observed performance is the dependency of these schemes on the count of counties (C). Recall that the required time in ST-SaTScan is proportional to C3 (Section 2) and may not be time-efficient in the detection of hotspots with bigger values of C. In contrast, the time required by ST-PSO-GRHD is proportional to C (Section 3.4) and may be effective in the detection of hotspots with bigger values of C.

Analysis over two different timespans

In this section, we present and analyze the results of ST-PSO-GRHD over two incubation periods viz. May 28th, 2020 to June 10th, 2020, and June 11th, 2020 to June 24th, 2020. ST-PSO-GRHD is applied to the COVID-19 data of California, New York, Florida and Texas over two timespans. The spatial distribution of emerging Spatio-temporal COVID-19 hotspots detected over both the timespans in California, New York, Florida, and Texas have been presented in Fig. 5 , Fig. 6 , Fig. 7 , and Fig. 8 , respectively.

Fig. 5

Emerging COVID-19 hotspots in California detected by ST-PSO-GRHD for two consecutive timespans.

Fig. 6

Emerging COVID-19 hotspots in New York detected by ST-PSO-GRHD for two consecutive timespans.

Fig. 7

Emerging COVID-19 hotspots in Florida detected by ST-PSO-GRHD for two consecutive timespans.

Fig. 8

Emerging COVID-19 hotspots in Texas detected by ST-PSO-GRHD for two consecutive timespans.

Emerging COVID-19 hotspots in California detected by ST-PSO-GRHD for two consecutive timespans. Emerging COVID-19 hotspots in New York detected by ST-PSO-GRHD for two consecutive timespans. Emerging COVID-19 hotspots in Florida detected by ST-PSO-GRHD for two consecutive timespans. Emerging COVID-19 hotspots in Texas detected by ST-PSO-GRHD for two consecutive timespans. As observed from Fig. 5, three hotspots covering 8 counties were identified in California from May 28th, 2020 to June 10th, 2020. However, in the next 14 days, these hotspots expanded and reported 21 counties as the hotspots. These 21 counties include 7 out of 8 counties reported as emerging hotspots in the last 14 days. Similar observations were made for New York state. As evident from Fig. 6, New York reported only one hotspot consisting of New York city counties in the timespan May 28th, 2020 to June 10th, 2020. However, in the next 14 days, along with New York City, Oneida county also emerged as a hotspot. Further, it is evident from Fig. 7 that the 18 counties which were part of hotspots identified in Florida from May 28th, 2020 to June 10th, 2020 are also part of 33 hotspot counties identified in the next 14 days i.e., June 11th, 2020 to June 24th, 2020. Finally, the observations over two consecutive timespans for Texas state indicate that only 23 counties were emerging hotspots as on June 10th, 2020. However, in the next 14 days timespan, the count increased to 76 counties. It is evident from Fig. 8 that the majority of counties that were emerging hotspots in the first time span are also a part of emerging hotspots in the second time span. Motivated with the performance (quality of the detected hotspots and time requirement) of the proposed ST-PSO-GRHD scheme over ST-SaTScan, we applied the proposed ST-PSO-GRHD scheme to detect the emerging hotspots in all the U.S. states (totaling 3026 counties). In the subsequent sections, we present the observed results.

Case study: emerging hotspots in the United States

This section presents the alive and emerging COVID-19 hotspots detected for all the states in the U.S. Considering the need of timely initiation of the preventive measures for COVID-19 outbreak and inefficiency of ST-SaTScan in handling large datasets (recall from Section 4.1, there are 3026 counties in the U.S. for which day-wise confirmed COVID-19 cases are available), ST-PSO-GRHD is applied to detect the hotspots for all the states in the U.S. For uniformity, the spatial upper bound (maximum radius of the circular base of the cylindrical hotspot) is kept as 100 km (km), and the temporal upper bound (height of the cylinder) is kept as 50% of the study period (between June 11th, 2020 and June 24th, 2020). During the study period of 14 days, ST-PSO-GRHD detected 104 emerging COVID-19 hotspots. Out of 104, the characteristics of the 30 most significant emerging COVID-19 hotspots detected in the U.S. are presented in Table 5 . These characteristics include the duration of the hotspot, its LLR value, the observed (or actual) number of cases, expected number of cases (calculated using (1)) and the number of counties forming the hotspot. The characteristics of the remaining 74 hotspots are listed in Appendix 2.

Table 5

Hotspot number	Duration	LLR	Observed	Expected	Number of Counties
1	18th to 24th June	11840.71	14215	2802.954	2
2	18th to 24th June	3235.547	10087	4037.074	7
3	18th to 24th June	3213.829	8783	3264.843	4
4	18th to 24th June	3213.798	9274	3556.827	9
5	19th to 24th June	2771.25	11263	5132.634	1
6	18th to 24th June	2253.346	6237	2330.538	8
7	18th to 24th June	1819.053	3448	1003.561	4
8	18th to 24th June	1802.955	4048	1327.246	16
9	18th to 24th June	1494.514	1071	108.0902	1
10	18th to 24th June	1419.09	1885	405.6899	7
11	18th to 24th June	1070.055	4551	2105.665	17
12	18th to 24th June	944.681	5347	2773.984	2
13	18th to 24th June	873.5656	1973	648.2487	19
14	18th to 24th June	826.9278	884	151.6107	3
15	19th to 24th June	719.3879	2094	802.2832	24
16	18th to 24th June	686.27	797	149.6394	1
17	18th to 24th June	664.3452	5628	3327.263	8
18	18th to 24th June	662.4306	889	193.0186	13
19	18th to 24th June	662.1652	1135	305.0777	20
20	19th to 24th June	655.1483	1871	709.1135	17
21	20th to 24th June	607.0873	1060	289.0985	19
22	18th to 24th June	550.2235	2326	1071.669	2
23	18th to 24th June	535.1361	1923	821.8592	4
24	19th to 24th June	461.4817	1283	478.6709	1
25	18th to 24th June	424.8899	802	232.1271	4
26	18th to 24th June	414.6855	2987	1679.363	19
27	18th to 24th June	407.8246	628	154.3832	1
28	18th to 24th June	379.3361	1938	965.2368	3
29	18th to 24th June	365.7823	701	205.1724	2
30	21st to 24th June	291.0555	739	261.2061	19

COVID 19 active and emerging hotspots detected in the United States by ST-PSO-GRHD scheme. The first column indicates the hotspot number. The remaining fields represent the Characteristics (duration of the hotspot, LLR value of the hotspot, the observed and expected number of cases in the hotspot, followed by the number of counties inside the hotspot) of the emerging hotspots. Out of 104, the most significant hotspot referred to as Hotspot #1 is found in Arizona State with an LLR value of 11840.71 and includes 2 counties viz. Yuma and Maricopa. A total of 14215 cases are observed in these two counties for the last 7 days of the study period, which is significantly greater than the expected count of 2802.95 cases. The second most significant hotspot, referred to as Hotspot 2 with 10087 observed cases, is detected in Florida and consists of 7 counties viz. Broward, Collier, Glades, Hendry, Martin, Miami Dade, and Palm Beach. The expected number of cases in these 7 counties is 4037.07, which is much less than the observed cases, and hence it is a significant hotspot with an LLR value of 3235.54. Further, the next three significant hotspots, i.e., Hotspot #3 (includes 4 counties with LLR as 3213.82), Hotspot #4 (includes 9 counties with LLR as 3213.79), and Hotspot #5 (includes 1 county with LLR as 2771.25), are detected in Texas, Florida, and California respectively. The observed duration for the first four most significant hotspots is between June 18th, 2020 and June 24th, 2020 whereas the duration for the fifth-most significant hotspot is between June 19th, 2020 and June 24th,2020. All the 104 alive or emerging hotspots in the U.S. detected by ST-PSO-GRHD are shown in Fig. 9 . It can be observed that most of the hotspots are detected in California, Texas, Florida, New York, New Jersey, Arizona, Alabama, Mississippi, Utah, and North Carolina states of the U.S. Other states containing a few hotspots include Nevada, Colorado, Kansas, Idaho, South Dakota, Iowa, Minnesota, Tennessee, and Virginia. Out of 104, the maximum number of significant emerging hotspots is detected in Texas State whereas Michigan, Colorado, Ohio, and the Wisconsin States have the least number of hotspots.

Fig. 9

Spatial distribution of active and emerging COVID-19 hotspots detected by ST-PSO-GRHD in the U.S.

Conclusion and future scope

This paper proposed a PSO-based geographically robust and efficient scheme for detecting Spatio-temporal alive and emerging hotspots. The proposed scheme, ST-PSO-GRHD, does not rely on considering a county centroid location as the center of the hotspot and thus detects more significant hotspots (evident with improved LLR) than ST-SaTScan. Also, the ability of ST-PSO-GRHD to promptly detect significant hotspots makes it a valuable surveillance scheme that can be used to detect emerging hotspots (or outbreaks) on a daily basis and as soon as they unfold. This study utilized the COVID-19 day-wise count of confirmed cases in the United States for one incubation period of 14 days between June 11th, 2020, and June 24th, 2020 to detect emerging hotspots. Performance of the ST-SaTScan and ST-PSO-GRHD schemes to detect emerging COVID-19 hotspots have been analyzed on 4 worst-hit states of the U.S. viz. New York, California, Texas, and Florida. It has been observed that ST-PSO-GRHD detects more significant hotspots (higher LLR) than ST-SaTScan quickly. Hence, the ST-PSO-GRHD method was applied to detect emerging hotspots in all states of the U.S. The counties belonging to 104 significant emerging hotspots detected by ST-PSO-GRHD should be given attention and priority in testing and allocating resources to slow down the virus transmission. Health officials and government agencies can use the rapid surveillance and statistical analysis provided by the proposed scheme to identify high-risk areas and take timely measures to contain the COVID-19 outbreak. Besides the strengths and applicability of this study, aggregation of data at the lowest level can help officials further drill down to affected zones that may further improve the precision in handling the COVID-19 outbreak. Also, the addition of some relevant information in the context of COVID-19, viz. count of older people, count of people with preexisting health conditions, etc., in future studies may further strengthen the conclusions.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

CRediT authorship contribution statement

Ankita Wadhwa: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing – original draft, Writing – review & editing, Visualization. Manish Kumar Thakur: Conceptualization, Methodology, Validation, Formal analysis, Resources, Writing – review & editing, Supervision.

Declaration of competing interest

None.

m	average	Stdev	Time(in sec)
9	7.107938	2.879943	12.304297
19	5.803806	2.249776	25.678133
49	6.620155	2.869332	65.58123
99	6.209283	2.651164	125.32
199	6.488089	2.516975	246.85
299	6.316002	2.425302	384.927146
399	6.347084	2.561424	521.640005
499	6.2629	2.64859	656.99209
999	6.328002	2.683821	1301.67

Hotspot number	Duration	LLR	Observed	Expected	Number of Counties
31	18th to 24th June	289.9761	524	146.7059	2
32	18th to 24th June	281.2931	1225	571.2766	1
33	18th to 24th June	269.7471	1396	698.5231	3
34	18th to 24th June	268.1499	1085	489.8453	4
35	18th to 24th June	264.895	748	281.4323	1
36	18th to 24th June	263.6708	444	117.5777	8
37	18th to 24th June	240.8672	369	90.28718	5
38	18th to 24th June	237.2377	608	216.0992	1
39	19th to 24th June	234.3723	2111	1267.37	2
40	18th to 24th June	212.9769	1427	783.0567	2
41	18th to 24th June	212.4057	1527	857.2695	13
42	18th to 24th June	207.1469	1202	627.5904	20
43	19th to 24th June	201.4234	502	175.3917	2
44	19th to 24th June	194.532	1088	560.4645	10
45	18th to 24th June	192.3892	2668	1780.853	2
46	19th to 24th June	153.4877	419	154.5761	8
47	19th to 24th June	149.2828	104	10.02947	1
48	18th to 24th June	145.1174	295	90.06648	3
49	18th to 24th June	121.1934	941	540.8403	2
50	18th to 24th June	119.026	451	197.4701	3
51	18th to 24th June	115.6583	119	19.51784	1
52	18th to 24th June	115.1042	634	324.6582	12
53	19th to 24th June	113.2196	2348	1693.732	7
54	18th to 24th June	112.6252	789	439.1247	7
55	18th to 24th June	107.8076	609	314.771	10
56	19th to 24th June	105.7793	286	104.9098	5
57	21st to 24th June	105.5969	284	103.8541	9
58	18th to 24th June	102.0974	249	85.81601	8
59	18th to 24th June	91.39835	815	487.6019	1
60	19th to 24th June	88.51591	518	271.1683	3
61	18th to 24th June	86.3387	134	33.14848	5
62	18th to 24th June	85.38336	1647	1172.723	11
63	18th to 24th June	84.1971	91	15.7869	1
64	18th to 24th June	80.44444	757	459.6297	1
65	18th to 24th June	77.53729	77	12.11201	1
66	19th to 24th June	77.53255	661	390.4487	5
67	19th to 24th June	69.65585	269	118.8152	2
68	18th to 24th June	66.60587	309	147.9007	5
69	18th to 24th June	66.01095	190	72.22951	5
70	18th to 24th June	62.83659	931	629.7371	12
71	18th to 24th June	60.01844	95	23.89597	1
72	18th to 24th June	56.30142	124	40.00079	5
73	18th to 24th June	56.22217	201	85.5798	1
74	18th to 24th June	51.00192	43	5.489356	1
75	18th to 24th June	47.76887	102	32.22037	1
76	18th to 24th June	46.58505	37	4.346511	1
77	19th to 24th June	46.19377	596	391.1809	6
78	21st to 24th June	45.39494	655	440.5409	2
79	18th to 24th June	45.1104	164	70.39356	1
80	18th to 24th June	44.54471	212	102.534	3
81	21st to 24th June	42.63214	448	279.944	6
82	18th to 24th June	41.64425	104	36.36945	3
83	20th to 24th June	41.63051	189	89.64546	2
84	21st to 24th June	39.04052	420	263.9994	3
85	18th to 24th June	37.49152	89	30.1506	4
86	18th to 24th June	37.43705	184	90.15118	1
87	18th to 24th June	36.93593	47	9.682573	1
88	18th to 24th June	36.37135	29	3.425554	2
89	19th to 24th June	33.74168	98	37.44089	4
90	18th to 24th June	33.29322	307	185.333	9
91	18th to 24th June	32.9358	51	12.59039	1
92	18th to 24th June	30.75873	394	258.1548	3
93	18th to 24th June	29.47358	65	20.98577	2
94	19th to 24th June	28.31764	181	97.69437	3
95	19th to 24th June	27.1288	71	25.54377	1
96	18th to 24th June	26.1122	89	37.00589	4
97	19th to 24th June	23.11535	375	258.309	3
98	18th to 24th June	20.71059	73	30.86339	2
99	18th to 24th June	20.45434	52	18.38335	2
100	18th to 24th June	20.40657	268	176.575	3
101	18th to 24th June	20.209	40	11.979	1
102	18th to 24th June	20.1643	66	26.88252	2
103	18th to 24th June	19.57009	116	60.98775	3
104	19th to 24th June	18.28813	16	2.146286	1

10 in total

1. An elliptic spatial scan statistic.

Authors: Martin Kulldorff; Lan Huang; Linda Pickle; Luiz Duczmal
Journal: Stat Med Date: 2006-11-30 Impact factor: 2.373

2. Evaluating cluster alarms: a space-time scan statistic and brain cancer in Los Alamos, New Mexico.

Authors: M Kulldorff; W F Athas; E J Feurer; B A Miller; C R Key
Journal: Am J Public Health Date: 1998-09 Impact factor: 9.308

3. Penalized likelihood and multi-objective spatial scans for the detection and inference of irregular clusters.

Authors: André L F Cançado; Anderson R Duarte; Luiz H Duczmal; Sabino J Ferreira; Carlos M Fonseca; Eliane C D M Gontijo
Journal: Int J Health Geogr Date: 2010-10-29 Impact factor: 3.918

4. Multistate analysis of prospective Legionnaires' disease cluster detection using SaTScan, 2011-2015.

Authors: Chris Edens; Nisha B Alden; Richard N Danila; Mary-Margaret A Fill; Paul Gacek; Alison Muse; Erin Parker; Tasha Poissant; Patricia A Ryan; Chad Smelser; Melissa Tobin-D'Angelo; Stephanie J Schrag
Journal: PLoS One Date: 2019-05-30 Impact factor: 3.240

9. Rapid surveillance of COVID-19 in the United States using a prospective space-time scan statistic: Detecting and evaluating emerging clusters.

Authors: M R Desjardins; A Hohl; E M Delmelle
Journal: Appl Geogr Date: 2020-04-08

10. WHO Declares COVID-19 a Pandemic.

Authors: Domenico Cucinotta; Maurizio Vanelli
Journal: Acta Biomed Date: 2020-03-19