Literature DB >> 35645430

Rapid surveillance of COVID-19 by timely detection of geographically robust, alive and emerging hotspots using Particle Swarm Optimizer.

Ankita Wadhwa1, Manish Kumar Thakur1.   

Abstract

A novel virus, called severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has rapidly become a pandemic called Coronavirus disease 2019 (COVID-19). According to the World Health Organization, COVID-19 was first detected in Wuhan city in December 2019 and has affected 216 countries with 9473214 confirmed cases and 484249 deaths globally as on June 26th, 2020. Also, this outbreak continues to grow in many countries like the United States of America (U.S.), Brazil, India, and Russia. To ensure rapid surveillance and better decision-making by government authorities in different countries, it is vital to identify alive and emerging hotspots within a country promptly. State-of-the-art methods based on space-time scan statistics (like SaTScan) are not geographically robust. Also, due to the enumeration of many Spatio-temporal cylinders, the computation cost of Spatio-temporal SaTScan (ST-SaTScan) is very high. In the applications like COVID-19 where we need to detect the emerging hotspots daily as soon as the new count of cases gets updated, ST-SaTScan seems inefficient. Therefore, this paper proposes a Particle Swarm Optimizer-based scheme to timely detect geographically robust, alive, and emerging COVID-19 hotspots in a country. Timely detection can help government officials design better control strategies like increasing testing in hotspots, imposing stricter containment rules, or setting up temporary hospital beds. Performance of ST-SaTScan and proposed scheme have been analyzed for four worst-hit U.S. states for the incubation period of 14 days between June 11th, 2020, and June 24th, 2020. Results indicate that the proposed scheme detects hotspots of a higher likelihood ratio (a measure to indicate the significance of hotspot) than ST-SaTScan in significantly less time. We also applied the proposed scheme to detect the emerging COVID-19 hotspots in all states of the U.S. During the study period, the proposed scheme has detected 104 emerging COVID-19 hotspots.
© 2022 Elsevier Ltd. All rights reserved.

Entities:  

Keywords:  COVID-19; Coronavirus; Hotspot detection; Particle swarm optimizer; Spatio-temporal analysis

Year:  2022        PMID: 35645430      PMCID: PMC9127146          DOI: 10.1016/j.apgeog.2022.102719

Source DB:  PubMed          Journal:  Appl Geogr        ISSN: 0143-6228


Introduction

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was first identified in December 2019 in Mainland China, with its epicenter at Wuhan City (C. Wang, Horby, Hayden, & Gao, 2020). The most common symptoms of COVID-19 include fever, cough, shortness of breath, and fatigue; however, severe cases may suffer from organ failures, heart attacks, pneumonia, or even death (Sanche et al., 2020). The novelty of COVID-19, its highly contagious nature, and lack of appropriate medicinal knowledge resulted in its rapid outbreak worldwide. Consequently, COVID-19 was declared a pandemic by the World Health Organization on March 11th, 2020 (Cucinotta & Vanelli, 2020). On June 27th, 2020, there were 9473214 confirmed cases and 484249 deaths globally. However, the death rate of COVID-19 lies between 1% and 5%, and 80% of the total cases are mild ones (Ruan, Yang, Wang, Jiang, & Song, 2020). The lack of proper medication and vaccines for COVID-19 forced the governments of different countries to impose a complete lockdown on the entire country. Depending on the outbreak's severity, many countries strictly implemented the lockdown; however, governments in some countries have taken various other preventive measures. These measures include restrictions on international and domestic flights, restricted inter-territorial movements for non-essential commodities, etc. The complete lockdown and other preventive measures had severe and apparent effects on the economies, resulting in lifting the lockdowns even with a surge of confirmed COVID-19 cases. However, in fighting the outbreak of a highly infectious disease like COVID-19 and also to ensure little adversity in economies, it is essential to target the prioritized locations (known as hotspots) for stricter containment measures, increased testing, and ramping up of medical facilities (Desjardins, Hohl, & Delmelle, 2020). For many years, spatial hotspot detection techniques have been applied to detect significant spatial hotspots for various applications. These techniques can also be used to identify COVID-19 hotspots and thus prioritize them for immediate actions and preventive measures. Formally, a spatial hotspot is defined as an area (or zone) where cases inside the area are significantly greater than the number of cases outside that area (Eftelioglu, Tang, & Shekhar, 2015). In addition to spatial dimensions, time is used as a third dimension in Spatio-temporal analysis, which helps identify active, emerging, diminishing, or persistent hotspots. In monitoring outbreaks like COVID-19, decision-makers are often interested in those Spatio-temporal hotspots which are active at the end of the study period, called alive or emerging hotspots. Researchers have made significant contributions to detect spatial and spatio-temporal hotspots of different shapes useful in various application domains in the past few years. Some of the contemporary contributions made in this regard are discussed next.

Related work

The contemporary methods for spatial and spatio-temporal hotspot detection can be broadly classified into two categories: clustering-based approaches and Scan-statistics-based approaches. Clustering-based techniques assume hotspots to be clusters of spatial locations, where, location points within a cluster are closer, and the points in different clusters are farther from each other. Considering this interpretation, numerous approaches have been proposed to detect the hotspots which utilize the basic principles of clustering. These works identify hotspots using conventional clustering methods. Recently, many authors (Kumar, n.d.; Lawson, 2010; Shi & Pun-Cheng, 2019) consolidated and reported the details of some of the clustering based (partition based, density based etc.) hotspot detection methods. In one contribution, authors (Tayal et al., 2014) developed a partition based clustering approach for crime hotspot detection and criminal identification on dataset of India. They used k-means clustering to identify circular hotspots. Certain limitations of k-means clustering in context of hotspot detection were observed. First, the value of k (number of clusters) needs to be specified, which may not be known beforehand. Second, k-means simply partitions all the points into k clusters, where each cluster contains approximately same number of points. Thus, this type of distribution may not represent significant hotspot regions. Further, in another contribution, authors (F Di Martino, Loia, & Sessa, 2008) extended k-means by incorporating fuzzy logic to determine the value of k automatically. Furthermore, another work (Ferdinando Di Martino & Sessa, 2013) introduced a Fuzzy PSO based clustering approach to identify spatial hotspots. The authors claimed that Fuzzy PSO approach for partition clustering is efficient in determining the optimal value of k in k-means, is robust to outliers and takes less computational cost than Fuzzy k-means approach. Thus, they proved their scheme to be more efficient for large datasets. Apart from partition-based clustering, numerous conventional density-based clustering methods like DBSCAN (Ester, Kriegel, Xu, & Miinchen, 1996), ST-DBSCAN (Birant & Kut, 2007), CLIQUE (Agrawal, Gehrke, Gunopulos, & Raghavan, 1998), STING (W. Wang, Yang, & Muntz, 1997), etc. have been used for identifying spatial and spatio-temporal hotspots. Out of these approaches DBSCAN (Ester et al., 1996) is the most widely used method which groups nearby dense points in the given spatial space. The grouping contains at least points and then also includes other points within ε distance (called ε neighbourhood) of these points. In addition, ST-DBSCAN (Birant & Kut, 2007) is the extension of DBSCAN proposed by authors to deal with spatio-temporal data. Choosing optimal values of user specific parameters viz. and in DBSCAN and ST-DBSCAN requires domain expertise. To select the optimal values of DBSCAN related parameters, researchers have proposed different solutions. For instance, in one work, authors (Scitovski, 2018) proposed a modified DBSCAN called rough DBSCAN which automatically identifies the DBSCAN related parameters. They also applied their proposed scheme to identify earthquake prone zones. In another similar study for earthquake zoning, authors of (Ankita & Thakur, 2018) presented a meta-heuristic algorithm to identify DBSCAN related parameters and showed effectiveness of density based clustering schemes for irregular shaped spatial hotspot detection. From the related work discussed in clustering based hotspot detection category, it can be observed that clustering-based techniques offers low computational complexity; however, they may result in false hotspots due to lack of statistical significance tests (Eftelioglu, Shekhar, Kang, & Farah, 2016). Further, clustering-based approaches need some preliminary information related to the method, which may not be available beforehand. On the other hand, scan statistics-based approaches use a scan window of the desired shape (circular, ring, ellipse, cylindrical etc.) and a statistical test to identify significant hotspots. Kulldorff has made the initial contribution in scan statistics-based methods for detecting spatial hotspots of circular and elliptical hotspots (Martin Kulldorff, 1997; Martin Kulldorff, Huang, Pickle, & Duczmal, 2006). The work was extended on spatio-temporal domain by Kulldorff (Martin Kulldorff, 2001) and is popularly known as ST-SaTScan. Kulldorff's space-time scan statistic has been used in literature to detect emerging hotspots of various diseases like influenza (Li, Fu, Lin, & Jiang, 2019), brain cancer (Martin Kulldorff, Athas, & Miller, 1998), malnutrition (Verdiana & Widyaningsih, 2017), legionnaires disease (Edens et al., 2019) and most recently COVID-19 (Desjardins et al., 2020). ST-SaTScan uses a cylindrical window of three dimensions where the base represents the space and height represents the time. This cylindrical window moves in the spatial and temporal domain in search space to enumerate candidate cylindrical hotspots. Spatially, many cylinder bases with different geographical centers and radius are generated by considering each county (or equivalent administrative units) as center and distance to every other county (or equivalent administrative units) as radii (Martin Kulldorff, 2001). Thus, spatially SaTScan allows only the location of counties to be the center of the cylinder. Due to the selection criteria of center and radii, SaTScan may miss detecting some significant hotspots where the center of hotspot does not lie exactly on the location of a county and hence is not geographically robust (not sensitive to geographical barriers). Temporally, for each cylinder base, consideration of different starting dates results in different cylinder heights. Hence, many candidate cylindrical hotspots with varying bases and heights are generated. A statistical measure called Log-Likelihood Ratio (Martin Kulldorff, 1997), indicating the strength of a hotspot, is then calculated for each cylinder. The significance of a hotspot is confirmed by relying on randomization tests to determine whether the observed pattern is significant or has occurred by chance. Due to the generation of a large number of cylinders, followed by a randomization test, SaTScan is a high computational cost method. In subsequent years various researchers have tried to improve Kulldorff's approach in terms of shape and efficiency by applying pruning, filtering, and speedups. In one contribution, authors (Eftelioglu et al., 2015) improved the SaTScan algorithm for circular hotspot detection by using a cubic grid circle listing algorithm. They applied an upper bound on each grid's likelihood ratio to detect circular hotspots. Through their approach, authors identify non-contiguous crime hotspots separated by rivers, mountains, etc. However, their approach is limited to the spatial domain. In another contribution, the authors (Tang, Eftelioglu, & Shekhar, 2015) proposed a fast elliptical hotspot detection method and detected crime hotspots in Denver city. Using a lookup table and upper bound pruning technique, the authors achieved lower computational complexity than Kulldorff's elliptical hotspot detection method. Another study (Eftelioglu, Shekhar, et al., 2016) proposed a ring-shaped hotspot detection method based on dual grid pruning and found ring-shaped crime hotspots in San Diego city. Also, a technique for linear hotspot detection was proposed by (Tang, Eftelioglu, Oliver, & Shekhar, 2017) and used to detect pedestrian fatality hotspots in Florida. The author proposed an efficient method by applying the shortest path tree pruning and speeding up the Monte Carlo simulations. Also, irregularly shaped hotspots have been detected in (Duczmal, Cançado, Takahashi, & Bessegato, 2007) by using a Genetic Algorithm based scan statistics approach for optimizing the shape of the scan window. Another approach for irregularly shaped hotspot detection based on a multi-objective Genetic algorithm was proposed by (Cançado et al., 2010). It optimizes two objectives: likelihood ratio and the penalty function for restricting excessive freedom in shape. Finally, in a study, authors (Izakian & Pedrycz, 2012) used Particle Swarm Optimization (PSO) to optimize the shape of scanning window hotspot detection. In general, the discussed contributions fall under two broad categories (Clustering-based and Scan statistics based) and suggests following: (a) clustering-based hotspot detection techniques may detect false hotspots and also needs prior information like the number of clusters (Eftelioglu, Shekhar, et al., 2016) whereas, (b) scan statistics based approaches rules out the false hotspots and detects such hotspots which are statistically significant but at a high computation cost (c) the modifications to improve SaTScan/ST-SaTScan available in the literature are limited to optimizing the shape of hotspots and not in terms of improving its efficiency or geographical robustness.

Highlights and outline of the paper

Considering the limitations of discussed approaches, we have proposed a novel Particle Swarm Optimizer based scheme for detecting geographically robust Spatio-temporal hotspots in this paper. The proposed approach facilitates the timely identification of geographically robust Spatio-temporal, alive, and emerging COVID-19 hotspots and contributes to the ongoing COVID-19 surveillance. We have also presented the comparison between the proposed approach and the traditional SaTScan method using county-level COVID-19 data of a few worst-hit states of the United States of America (U.S.), viz. New York, Texas, Florida, and California. The comparison is made using time taken to detect the hotspots and the quality of detected hotspots (in terms of Log-Likelihood Ratio). Motivated by the reduced computational cost taken by the proposed scheme to identify hotspots in worst-hit U.S. states, the PSO-based scheme is further applied to identify all U.S. states' alive and emerging hotspots. In all these analyses (either comparison with SaTScan over worst-hit U.S. states or finding emerging hotspots of all the U.S. states), the COVID 19 data of the U.S. from June 11th, 2020 to June 24th, 2020 is considered. The rest of the paper is organized as follows: Section 2 reviews the Kulldorff's SaTScan method adopted for COVID-19 hotspots detection. It also forms a baseline for our proposed approach. Section 3 presents the proposed Particle Swarm Optimizer based approach for detecting alive and emerging COVID hotspots and is referred to as ST-PSO-GRHD (Spatio Temporal PSO-based Geographically Robust Hotspot Detection). Section 4 details the experimental studies and results. It includes brief details of the datasets, followed by a comparative analysis of both approaches on four worst-hit states of the U.S. using day-wise confirmed cases count. In this section, the performance of the proposed PSO-based approach is also presented to detect alive and emerging hotspots of the entire U.S. (all the 3026 counties). Finally, conclusions and future scope of the work are discussed in Section 5.

Spatio-temporal SaTScan: A review in the context of COVID-19 hotspots

SaTScan is one of the often-used approaches (Martin Kulldorff, 1999) or software (M Kulldorff, 2018) to detect spatial, temporal, and Spatio-temporal clusters (or hotspots). The basis of SaTScan is scan statistics. In search of active and emerging COVID-19 hotspots, SaTScan has been explained here to detect cylindrical hotspots, i.e., Spatio-temporal (S.T.) hotspots. Depending on data availability, SaTScan can be used either for point data (each point represents the location of one event occurrence) or aggregated data (e.g., county or equivalent administrative block level). The considered data for COVID-19 is available at the aggregate level, i.e., county-wise data of U.S. states (The New York Times, 2020); therefore, SaTScan has been explained here in the context of aggregate data. In this study, the considered administrative unit is U.S. county. For Spatio-temporal analysis, Spatio-temporal SaTScan (ST-SaTScan) moves a cylindrical scan window of varying radii and heights in the search space S to detect areas of unusually high concentration of an event. Each cylindrical zone is defined using the center of the base representing latitude and longitude of a county, radius of the base , and height of the cylinder representing last days of the study period. To detect active and emerging hotspots, the duration of cylindrical window is considered between any one day of study period as the starting date and the last day of the study period as the end date. All the candidate hotspots are enumerated. The detailed procedure of enumerating cylinders is presented in Step 1 of Algorithm 1. Further, a hypothesis test is required, and for each cylinder, the following question needs to be addressed: “Is there any difference in the concentration of events inside vs. outside the cylinder?“. In ST-SaTScan, the hypothesis test is constructed with two outcomes, null hypothesis and alternate hypothesis . The null hypothesis states that “the number of occurrences of an event follows complete spatial randomness (CSR) in the study area ”, whereas the alternate hypothesis states that “the frequency of occurrence of an event is higher within the cylinder as compared to outside”. In the hypothesis test, the strength of a cylindrical window is calculated to check its candidature for a hotspot. Log-likelihood ratio (LLR) (Xie, 2019) (Shi & Pun-Cheng, 2019) is one of the often-used measures in ST-SaTScan to compute the strength and accordingly decide the candidature of the obtained cylindrical window as a hotspot. Let be the total number of cases in all counties. Let be the actual observed number of cases within all counties of cylinder . As per the null hypothesis , the expected number of cases within denoted by can be defined using (1) by the unitary method.where is the population of all counties within the circular base of cylinder , is the total population of study area , is the number of days under consideration for analysis. Next, the probability that a particular case lies within the hotspot if were true is given by and for the case to be outside Z is . If all the cases are uniformly distributed and H0 is true, the total probability that exactly points are present within is given by as shown in (2). Further, the alternate hypothesis considers that H0 is not true. In this case, the probability that any case lies within the hotspot is given by and that the case lies outside Z is . The total probability that exactly points are present within if the actual distribution is true is given by , as shown in (3). The likelihood ratio is defined as .Taking logarithm on both sides, the expression for LLR is shown in (4).where is a binary indication function. has a value of when a zone has more activity points than expected ; otherwise, it is set to to prevent the detection of low cases areas. Along with LLR, its distribution under the null hypothesis is also required to perform the hypothesis test. This distribution indicates what to expect when there are no hotspots in the study area. A randomization test is carried out to obtain this distribution as follows: Monte Carlo Simulations (MCS) is used to perform the randomization test and get the distribution of LLR under . In total, random datasets in the study area are generated using the Poisson point process (CSR) (assuming that COVID-19 cases follow a Poisson distribution according to the population of the geographic area). All possible cylinders are enumerated for each of the random datasets. As the higher value of (likelihood ratio) indicates that more cases have been reported than expected, the highest LLR value out of all cylinders is stored. This is done for all datasets, and the values are stored in a list. The list is named as . Next, to obtain the distribution, which suggests what type of LLRs value to expect in the same study area when there are no hotspots, the list is sorted in descending order. After enumerating all the candidate cylinders and their corresponding LLR values, the cylinder with maximum LLR is kept among overlapping cylinders, and the rest are removed. The statistical significance of each cylinder is then calculated using its LLR value and distribution of LLR under the null hypothesis. A cylindrical zone is referred as a statistically significant hotspot based on its p-value (referred as significance level). Relative position of in list is determined first to calculate the p-value of the cylindrical zone Z. The p-value is then calculated by taking the ratio between and , where is the number of Monte Carlo Simulations. If the p-value of a cylindrical zone is less than the desired level of significance , then is considered as a statistically significant alive or emerging hotspot; otherwise, the alternate hypothesis is rejected for that . Obviously, the number of enumerated cylinders is very large, so each cylinder is expanded till the user-inputted maximum spatial bound (radius in km) and maximum temporal bound (duration in days) is reached. As detailed in Section 4, experimental analysis is carried out with user-inputted spatial bound, and temporal bound is taken as 50% of the study period. However, decision-makers of different agencies can decide these bounds so that the government can adequately look and cater to each hotspot. The ST-SaTScan scheme for detecting alive and emerging COVID-19 hotspots is presented in Algorithm 1. The computational complexity of ST-SaTScan scheme presented in Algorithm 1 is computed as follows: Step 1 of the algorithm enumerates all cylinders by generating O(C2) spatial circles, and for each spatial base, O(T) temporal windows are generated. For each of the O(C2T) cylindrical windows, LLR computation takes O(C) time. Hence, Step 1 requires O(C3T) time. Step 2 of the ST-SaTScan scheme, repeats Step 1 for ‘m’ number of times to find the cylinder with maximum LLR in each Monte Carlo simulation. Thus, the computation time required for Step 2 is O(m × C3 × T) time. Step 3 checks each cylinder for its statistical significance and takes O(C2T) time. Hence, the total time complexity of ST-SaTScan is O (m × C3 × T). As seen from Algorithm 1, SaTScan enumerates only those cylinders which are centered at the centroid of some county. Therefore, it may miss detecting some of the significant hotspots whose center lies at some location other than a county centroid (Eftelioglu, Tang, & Shekhar, 2016). Thus, the hotspots detected by ST-SaTScan may or may not be geographically robust and may report reduced significance. Geographical robustness can be added by allowing any random location (apart from the centroid of a county) to be selected as the center of the spatial base. But it will enlist an infinite number of cylinders. Thus, adding geographical robustness to Spatio-temporal hotspot detection makes it an NP-class problem. Considering the significance of geographical robustness in detecting COVID 19 alive and emerging hotspots, we propose a Particle Swarm Optimizer (PSO) based scheme for the same in the next section. ST-SaTScan for detecting COVID-19 Hotspots

Proposed Scheme-Spatio Temporal Hotspot detection using PSO

As discussed in Section 2, finding geographically robust hotspots using ST-SaTScan is an NP-class problem. Therefore, this section proposes an efficient scheme for detecting geographically robust alive and emerging COVID-19 hotspots using Particle Swarm Optimizer (PSO). The motivation to use PSO is to reduce the time taken to detect Spatio-temporal COVID-19 hotspots. This section first gives a brief detailing of the Particle Swarm Optimizer. Secondly, we propose and model the Spatio-temporal hotspot detection problem as an optimization problem. Finally, a complete scheme for detecting significant COVID-19 Spatio-temporal hotspots along with its complexity analysis is presented. The proposed PSO-based scheme for Spatio-Temporal Hotspots Detection is now onwards referred to as ST-PSO-GRHD.

Particle Swarm Optimizer

Particle Swarm Optimizer (PSO) was proposed in 1995 by Kennedy and Eberhart (Eberhart & Kennedy, 1995) to copy the optimization behavior seen in a group of organisms in nature like fishes and birds (called as particles) and use it to solve real-world optimization problems. In PSO, a group of particles coordinates to reach an optimal solution to a problem. Each particle possesses a d-dimensional position (representing a solution) and a d-dimensional velocity, where d is the dimension of the search space. Each candidate solution is evaluated using a fitness function f().The first step of the algorithm is random initialization of each particle's position and zero initialization of its velocity . Each particle then iteratively moves to better positions in the search space using its own experience and the social experience of other particles. The movement of particles over the iterations is modeled using two equations, viz. velocity update (5) and position update (6). Each particle's movement is guided by its own local best position (called ) as well as the best position obtained by the entire group of particles (called ). The velocity of particle at iteration and the new position of the particle are obtained using (5), (6) respectively (Clerc & Kennedy, 2002). where represents the inertia weight of the particle, and are random numbers in the range and and represent the weightage given to cognitive component and social component of the swarm, respectively. The particles keep updating their positions and velocities until some convergence criteria are met. Convergence criteria can be either a maximum number of iterations or the same fitness if obtained for a fixed number of iterations. in (5) is a constriction coefficient and is defined in (7).where should be greater than 4 to ensure stability. With the inclusion of the constriction factor, PSO can obtain better quality solutions by maintaining a balance between exploration and exploitation (Clerc & Kennedy, 2002)(Lim, Montakhab, & Nouri, 2009).

An optimization model for geographically robust spatio-temporal hotspot detection

In this section, we model the Spatio-temporal hotspot detection problem as an optimization problem. The presented optimization model is aimed to detect alive and emerging COVID-19 hotspots. For the COVID-19 dataset having day-wise cases in counties (of U.S.) for number for days, the optimization model aims to find such an alive or emerging cylinder in search space with the highest likelihood ratio. The obtained cylinder with the highest likelihood ratio represents the most significant hotspot. The model is reused by removing counties obtained in the previous cylindrical hotspot to find the subsequent hotspots. The model is used repeatedly until a non-significant hotspot (identified using a hypothesis test) is obtained. The inputs, decision variables, fitness function, and constraints of the optimization model are presented in Table 1 , Table 2 , Equation (8), and Equations (9)–(12), respectively.
Table 1

Inputs to the optimization model for detecting the most significant COVID-19 hotspot.

InputDescription
Cases [C][T]C: number of counties
T: number of days in the study period.
Cases(i,j): represents the number of confirmed COVID-19 cases reported in ith county on jth day.
Location[C](Latitude, Longitude) pairs of C counties
Population[C]The population of C counties
min_rad, max_radThe lower and upper bound of desired hotspots
Table 2

Decision Variables in the optimization model for detecting most significant COVID-19 hotspot.

Decision VariableDescription
(xc,yc)Center of the base of cylinder (a latitude-longitude pair representing a location in search space S)
rThe radius of the cylinder
tHeight of cylinder from the end of the study period
(representing time window (T-t+1)th to Tth day).
This allows only alive and emerging hotspots.
Inputs to the optimization model for detecting the most significant COVID-19 hotspot. Decision Variables in the optimization model for detecting most significant COVID-19 hotspot. Maximize where, Subject to The objective function (8) represents the Log-Likelihood Ratio of the candidate cylinder defined using the center, radius, and height which are decision variables for the model. The model aims to find the cylinder with the maximum likelihood ratio. To calculate the expected count of cases in a cylinder, we assume that COVID-19 cases follow a Poisson point process according to the population of the geographic area. Constraints (9) and (10) are used to keep the center ( of the cylinder within the study area . Constraint (11) represents the bounding condition for the radius of the cylinder. The lower and upper bounds on height representing the temporal dimension are presented in (12). The bounds are kept the same for fair comparative analysis in ST-SaTScan and ST-PSO-GRHD.

ST-PSO-GRHD scheme

The proposed ST-PSO-GRHD scheme to detect the geographically robust hotspot detection is presented in Algorithm 2. There are three major steps in the proposed scheme: (1) generating the distribution of Log-Likelihood ratio under null hypothesis H0, (2) finding the cylindrical hotspot with maximum likelihood ratio in the search space S, and (3) statistical significance using hypothesis test. The first step of the proposed scheme involves generating the distribution of LLR under null hypothesis , which states that there is no significant hotspot in the search area. This distribution is called and is obtained using Monte Carlo simulations in a similar manner, as discussed in Section 2. In the second step, the proposed scheme uses PSO to find the cylindrical hotspot with maximum likelihood ratio in the search space . The decision variables, fitness function, and constraints used by PSO are presented in Section 3.2. Initially, each of the particles is randomly placed in the search space where each particle has 4 dimensions representing the center , radius , and height of the cylindrical window. To detect alive and emerging hotspots only, the height of the cylindrical window is considered from the end date of the study period. The fitness value (LLR) of each particle is calculated using (8) to identify the global best () and personal best ( position of each particle. While calculating LLR for a cylindrical zone, all those counties whose centroid location lies within the circular spatial base of the cylinder are considered inside the zone . The sum of the day-wise count of cases from (T-t+1)th to Tth day in counties inside is used as . Particles then update their velocities and positions based on and using (5), (6). All the particles keep wandering in the search space to reach the optimum value of LLR until a convergence criterion is met. The ST-PSO-GRHD scheme converges if the best solution in the swarm does not improve continuously for a fixed number of iterations or a maximum number of iterations has been reached. After convergence, the global best value represents the cylindrical hotspot with the maximum LLR value. The third step of the proposed scheme checks the statistical significance of hotspot C.H. using a hypothesis test. The p-value of C.H. is obtained in a similar way as discussed in Section 2. If the p-value of C.H. is less than , then we store C.H. in a list of significant cylindrical hotspots. Searching for the next non-overlapping hotspot begins by removing the counties within C.H. from county set C and repeating steps 2 and 3 with the remaining counties. If the p-value of any cylindrical hotspot obtained in step 2 is greater than , we say that C.H. is not a significant hotspot and terminate the algorithm. Finally, the list of significant hotspots is outputted. The complete ST-PSO-GRHD scheme is presented in algorithm 2. ST-PSO-GRHD for detecting emerging and alive COVID-19 hotspots

Complexity analysis of ST-PSO-GRHD

The function, cylinder_maxLLR(), which uses PSO to detect cylinder with maximum LLR value, is used in two of the three major steps of the proposed ST-PSO-GRHD scheme. The function, cylinder_maxLLR() requires K × 4 computational steps to initialize K particles each of 4 dimensions in the search space followed by calculation of their fitness; hence time complexity is O(K). Each particle's fitness value calculation (LLR) requires O(C) computations. Based on the particle's personal best and global best positions, each particle's position and velocity are updated until convergence or for the maximum number of iterations ‘iter’. Thus, the time complexity of the function cylinder_maxLLR() is O(iter × K × C). Step 1 of the algorithm runs ‘m’ Monte Carlo simulations and hence calls cylinder_maxLLR() function ‘m’ times, i.e. time complexity for Step 1 is O(m × iter × K × C). In Step 2, cylinder_maxLLR() is called once; hence, the time complexity for Step 2 is O(iter × K × C). Finally, Step 3 calculates the p-value in constant time. However, Step 2 and Step 3 are repeated ‘n’ times, where n is the number of significant hotspots in the search space. Thus, the combined time complexity for Step 2 and Step 3 is O(n × iter × K × C). Therefore, as, m > n, the time complexity of ST-PSO-GRHD is O(m × iter × K × C).

Experimental studies and results

This section presents the experimental details and results obtained for ST-SaTScan and ST-PSO-GRHD schemes on the COVID-19 dataset of the United States of America (U.S.). First, we briefly describe the COVID-19 dataset we used for our experiments. Subsequently, we investigate and compare the performance of ST-SaTScan and ST-PSO-GRHD schemes in terms of quality of hotspot detected (effect of geographical robustness) and time taken for the detection. For analysis, county-wise data of the 4 worst-hit states (as on June 25th, 2020) in the U.S is used. Finally, we present the alive, and emerging hotspots of the entire U.S. country detected using the proposed PSO-based scheme. The experiments are conducted on an Intel CORE i7 processor with 8 GB RAM using MATLAB R2018a.

Dataset

The COVID-19 case count data for the United States is obtained from the “COVID-19-data” repository published by the New York Times on their GitHub page. It is an ongoing repository maintained and published by The New York Times by compiling time series data from state governments and health departments. These data are freely available on their GitHub page (https://github.com/nytimes/covid-19-data) (The New York Times, 2020) at different levels of aggregation viz. National-level data, State-level data, and county-level data. To identify hotspots at the lowest level of detail, we have used county-level case count data for our study. As the incubation period of coronavirus is 14 days, we have taken 14 days of data from June 11th, 2020 to June 24th, 2020. This data is available for 3026 counties of the U.S. The dataset contains a daily cumulative case count of different counties. Hence, to obtain a day-wise count for spatio-temporal analysis, the case count for each day in the study period is obtained by subtracting the previous day-case count from the current day case count. For some counties, the obtained day-wise count was negative for a few days. This indicates fewer cumulative cases on a particular day than the previous day, which is practically impossible. Therefore, we have cleaned the data by replacing these negative values with zero (assuming that the count of reported cases on that day is zero). Further, longitude and latitude pairs of the centroid of each U.S. county have been obtained from the Data.Healthcare.Gov web page, a federal government website managed by the Centers for Medicare and Medicaid Services in Baltimore, US (https://data.healthcare.gov/dataset/Geocodes-USA-with-Counties/52wv-g36k)(Geocodes USA with Counties, 2013). Also, the population of each U.S. County is required to generate the distribution of test statistics under the null hypothesis of no hotspots. The estimated 2019 population of each county is obtained from the United States Census Bureau website(https://www.census.gov/data/datasets/time-series/demo/popest/2010s-counties-total.html) (County Population Totals: 2010–2019, n.d.).

Comparative analysis and discussions

This subsection presents the comparative analysis between ST-SaTScan and ST-PSO-GRHD schemes to detect COVID-19 hotspots. The comparison is made using the Log-likelihood ratio, a statistical indicator of the strength of the detected hotspots, and the time taken by both the schemes to detect COVID-19 hotspots. For comparison, we have used COVID-19 data of four worst-hit U.S. states (as on June 24th, 2020) viz. New York, California, Texas, and Florida. Execution time reported in both the schemes does not include the time for Monte Carlo simulations. For both the schemes, m (number of Monte Carlo Simulations) and αp (p-value threshold) are set to 99 and 0.01, respectively (m is computed experimentally as detailed in Appendix 1). In general, an upper bound (i.e., a maximum allowed value) for the cylinder base (spatial upper bound) is used to avoid detection of extremely large hotspots. This value is either chosen by experts or it is chosen proportionately to the overall study area under consideration. Therefore, the value of this upper bound (or maximum allowed radius) is not same for all the states and chosen as 300 km, 300 km, 200 km, and 400 km for California, New York, Florida and Texas respectively in our experiments. Similarly, to detect the emerging hotspots, an upper bound on height of the cylinder (temporal upper bound) is used. The temporal upper bound is set as 50% of the study period (June 11th, 2020 and June 24th, 2020) so that only emerging hotspots can be detected. The values of other PSO-specific parameters in ST-PSO-GRHD are presented in Table 3 . These are the typical values of PSO parameters used in the literature (Lim et al., 2009) (Sengupta, Basak, & Peters, 2018).
Table 3

PSO specific Parameter Setting in ST-PSO-GRHD.

ParameterValue
Population Size50
α (cognitive component)2.05
β (social component)2.05
Ω (inertia weight)Linearly decreased from 0.9 to 0.2
Maximum Number of iterations500
Number of iterations for convergence100

Further, the characteristics of the statistically significant emerging COVID-19 hotspots which are reported for both the schemes include the duration of the hotspot, count of counties in the hotspot, the observed and expected number of cases in the hotspot, LLR value of the hotspot, and time taken to detect them.

PSO specific Parameter Setting in ST-PSO-GRHD. Further, the characteristics of the statistically significant emerging COVID-19 hotspots which are reported for both the schemes include the duration of the hotspot, count of counties in the hotspot, the observed and expected number of cases in the hotspot, LLR value of the hotspot, and time taken to detect them. Table 4 presents the characteristics of COVID-19 hotspots detected in four worst-hit U.S. states by ST-SaTScan and ST-PSO-GRHD schemes. These states include California State (58 counties), New York State (58 counties), Florida State (67 counties), and Texas State (252 counties) with a maximum radius of the circular base of the cylindrical hotspot (spatial upper bound) as 300 km (km), 300 km, 200 km, and 400 km respectively. As presented in Table 4 , both the schemes detected 4 emerging hotspots, each in California and Florida; however, counties' count in detected hotspots is different. Further, ST-SaTScan detected 1 and 5 emerging hotspots in New York and Texas, respectively, whereas ST-PSO-GRHD detected 2 and 6 emerging hotspots in New York and Texas states, respectively.
Table 4

COVID 19 emerging hotspots detected by ST-SaTScan and ST-PSO-GRHD schemes for four worst-hit U.S. states. The first column lists the name of the four U.S. states along with the count of counties in that state, and spatial upper bound in kilometers (km). The remaining fields represent the Characteristics (count of counties in the hotspot, observed and expected number of cases in the hotspot and respective LLR, followed by time taken in seconds to detect the hotspots) of the detected emerging hotspots.

U.S. State, Count of counties in the state, Spatial upper bound while detecting the hotspots (in km)Hotspot #ST-SaTScan
ST-PSO-GRHD
DurationCount of Counties (in Hotspot)ObservedExpectedLLRRequired Time (in seconds)DurationCount of Counties (in Hotspot)ObservedExpectedLLRRequired Time (in seconds)
California State, 58 counties, 300 km118th to 24th June101733110545.42361.3165.5921st to 24th June5121786003.52838.936.76
222nd to 24th June227161756.68232.7122nd to 24th June1221431076.9419.04
320th to 24th June21171661.68161.4822nd to 24th June2449227.7783.95
420th to 24th June3727449.0473.0222nd to 24th June2509390.8716.41
New York State, 58 counties, 300 km118th to 24th June426941123.84942.77170.3818th to 24th June12403606.031708.7841.58
2- -- -- -- -- -- -18th to 24th June113885.0913.97
Florida State, 67 counties, 200 km118th to 24th June252177414894.22359.98429.0618th to 24th June252177414894.22359.9865.34
220th to 24th June217040.7704113.7020th to 24th June321553.58137.62
319th to 24th June21141869.9739.3118th to 24th June1122592943.89
423rd to 24th June28412.4988.6121st to 24th June412758.5429.96
Texas State, 252 counties, 400 km119th to 24th June47174899613.53418.4467419.119th to 24th June47173509442.193479.7133.73
222nd to 24th June21809328.30251.2622nd to 24th June21832321.64283.03
323rd to 24th June519081266.7144.5618th to 24th June432982581.2596.87
418th to 24th June2368211.4447.6118th to 24th June2371212.6248.41
523rd to 24th June410446.4326.3318th to 24th June1437.9137.76
6- -- -- -- -- -- -18th to 24th June1376.2635.03
COVID 19 emerging hotspots detected by ST-SaTScan and ST-PSO-GRHD schemes for four worst-hit U.S. states. The first column lists the name of the four U.S. states along with the count of counties in that state, and spatial upper bound in kilometers (km). The remaining fields represent the Characteristics (count of counties in the hotspot, observed and expected number of cases in the hotspot and respective LLR, followed by time taken in seconds to detect the hotspots) of the detected emerging hotspots. The spatial distribution of emerging Spatio-temporal COVID-19 hotspots detected by both schemes in four worst-hit U.S. states, California, New York, Florida, and Texas, have been presented in Fig. 1 , Fig. 2 , Fig. 3 , and Fig. 4 , respectively. In these figures, the most significant hotspot (hotspot with highest LLR) in a state is labeled as Hotspot 1. The second most significant hotspot in the state is Hotspot 2 and so on.
Fig. 1

Spatial distribution of emerging Spatio-temporal COVID-19 hotspots detected by ST-SaTScan (left) and ST-PSO-GRHD (right) for California State. Hotspot with the highest LLR is labeled as Hotspot 1, second highest LLR is labeled as Hotspot 2, and so on.

Fig. 2

Spatial distribution of emerging Spatio-temporal COVID-19 hotspots detected by ST-SaTScan (left) and ST-PSO-GRHD (right) for New York State. Hotspot with the highest LLR is labeled as Hotspot 1, the second-highest LLR is labeled as Hotspot 2 and so on.

Fig. 3

Spatial distribution of emerging Spatio-temporal COVID-19 hotspots detected by ST-SaTScan (left) and ST-PSO-GRHD (right) for Florida State. Hotspot with the highest LLR is labeled as Hotspot 1, the second-highest LLR is labeled as Hotspot 2, and so on.

Fig. 4

Spatial distribution of emerging Spatio-temporal COVID-19 hotspots detected by ST-SaTScan (left) and ST-PSO-GRHD (right) for Texas State. Hotspot with the highest LLR is labeled as Hotspot 1, the second-highest LLR is labeled as Hotspot 2, and so on.

Spatial distribution of emerging Spatio-temporal COVID-19 hotspots detected by ST-SaTScan (left) and ST-PSO-GRHD (right) for California State. Hotspot with the highest LLR is labeled as Hotspot 1, second highest LLR is labeled as Hotspot 2, and so on. Spatial distribution of emerging Spatio-temporal COVID-19 hotspots detected by ST-SaTScan (left) and ST-PSO-GRHD (right) for New York State. Hotspot with the highest LLR is labeled as Hotspot 1, the second-highest LLR is labeled as Hotspot 2 and so on. Spatial distribution of emerging Spatio-temporal COVID-19 hotspots detected by ST-SaTScan (left) and ST-PSO-GRHD (right) for Florida State. Hotspot with the highest LLR is labeled as Hotspot 1, the second-highest LLR is labeled as Hotspot 2, and so on. Spatial distribution of emerging Spatio-temporal COVID-19 hotspots detected by ST-SaTScan (left) and ST-PSO-GRHD (right) for Texas State. Hotspot with the highest LLR is labeled as Hotspot 1, the second-highest LLR is labeled as Hotspot 2, and so on. While comparing the performance (in terms of LLR of the detected hotspots) of both the schemes, it is observed that the LLR values of most likely hotspots detected by the proposed scheme for all the four states of the U.S. are higher than LLR of most likely hotspots detected by ST-SaTScan. For example, the LLR of hotspot #1 (i.e., most significant hotspot) detected by ST-SaTScan in California State is 2361.3. In contrast, ST-PSO-GRHD detected the most significant hotspot (hotspot #1) with improved LLR as 2838.9. Compared to ST-SaTScan, similar improvements in LLR have been observed in the detection of subsequent hotspots (hotspot #2, i.e., second most significant hotspot, etc.) in California State by ST-PSO-GRHD. Obviously, the detection of the hotspots with improved LLR is the impact of geographical robustness addressed in ST-PSO-GRHD. Besides LLR, the performance of both schemes (ST-SaTScan, ST-PSO-GRHD) has also been compared using the time required to detect the hotspot(s). Although geographical robustness is not considered in ST-SaTScan (due to enumeration of an infinite number of circular bases), it has been observed that the time taken by ST-SaTScan is significantly higher than ST-PSO-GRHD. For example, the required time by ST-SaTScan to detect all hotspots in California State is 165.59 s, whereas it is 36.76 for ST-PSO-GRHD. In another observation, there are around 4 times more counties in Texas than in California, but the required time by ST-SaTScan to detect all hotspots in Texas significantly increased by 1834 times of the required time for California. Compared to ST-SaTScan, more scalable performance has been shown by ST-PSO-GRHD where the time required to detect all hotspots in Texas is increased by 4 times the required time for California. In general, as the number of counties in the state under study increases, time taken by ST-SaTScan increases significantly. In contrast, the time taken by ST-PSO-GRHD increases almost linearly with an increase in the number of counties. The apparent reason behind the observed performance is the dependency of these schemes on the count of counties (C). Recall that the required time in ST-SaTScan is proportional to C3 (Section 2) and may not be time-efficient in the detection of hotspots with bigger values of C. In contrast, the time required by ST-PSO-GRHD is proportional to C (Section 3.4) and may be effective in the detection of hotspots with bigger values of C.

Analysis over two different timespans

In this section, we present and analyze the results of ST-PSO-GRHD over two incubation periods viz. May 28th, 2020 to June 10th, 2020, and June 11th, 2020 to June 24th, 2020. ST-PSO-GRHD is applied to the COVID-19 data of California, New York, Florida and Texas over two timespans. The spatial distribution of emerging Spatio-temporal COVID-19 hotspots detected over both the timespans in California, New York, Florida, and Texas have been presented in Fig. 5 , Fig. 6 , Fig. 7 , and Fig. 8 , respectively.
Fig. 5

Emerging COVID-19 hotspots in California detected by ST-PSO-GRHD for two consecutive timespans.

Fig. 6

Emerging COVID-19 hotspots in New York detected by ST-PSO-GRHD for two consecutive timespans.

Fig. 7

Emerging COVID-19 hotspots in Florida detected by ST-PSO-GRHD for two consecutive timespans.

Fig. 8

Emerging COVID-19 hotspots in Texas detected by ST-PSO-GRHD for two consecutive timespans.

Emerging COVID-19 hotspots in California detected by ST-PSO-GRHD for two consecutive timespans. Emerging COVID-19 hotspots in New York detected by ST-PSO-GRHD for two consecutive timespans. Emerging COVID-19 hotspots in Florida detected by ST-PSO-GRHD for two consecutive timespans. Emerging COVID-19 hotspots in Texas detected by ST-PSO-GRHD for two consecutive timespans. As observed from Fig. 5, three hotspots covering 8 counties were identified in California from May 28th, 2020 to June 10th, 2020. However, in the next 14 days, these hotspots expanded and reported 21 counties as the hotspots. These 21 counties include 7 out of 8 counties reported as emerging hotspots in the last 14 days. Similar observations were made for New York state. As evident from Fig. 6, New York reported only one hotspot consisting of New York city counties in the timespan May 28th, 2020 to June 10th, 2020. However, in the next 14 days, along with New York City, Oneida county also emerged as a hotspot. Further, it is evident from Fig. 7 that the 18 counties which were part of hotspots identified in Florida from May 28th, 2020 to June 10th, 2020 are also part of 33 hotspot counties identified in the next 14 days i.e., June 11th, 2020 to June 24th, 2020. Finally, the observations over two consecutive timespans for Texas state indicate that only 23 counties were emerging hotspots as on June 10th, 2020. However, in the next 14 days timespan, the count increased to 76 counties. It is evident from Fig. 8 that the majority of counties that were emerging hotspots in the first time span are also a part of emerging hotspots in the second time span. Motivated with the performance (quality of the detected hotspots and time requirement) of the proposed ST-PSO-GRHD scheme over ST-SaTScan, we applied the proposed ST-PSO-GRHD scheme to detect the emerging hotspots in all the U.S. states (totaling 3026 counties). In the subsequent sections, we present the observed results.

Case study: emerging hotspots in the United States

This section presents the alive and emerging COVID-19 hotspots detected for all the states in the U.S. Considering the need of timely initiation of the preventive measures for COVID-19 outbreak and inefficiency of ST-SaTScan in handling large datasets (recall from Section 4.1, there are 3026 counties in the U.S. for which day-wise confirmed COVID-19 cases are available), ST-PSO-GRHD is applied to detect the hotspots for all the states in the U.S. For uniformity, the spatial upper bound (maximum radius of the circular base of the cylindrical hotspot) is kept as 100 km (km), and the temporal upper bound (height of the cylinder) is kept as 50% of the study period (between June 11th, 2020 and June 24th, 2020). During the study period of 14 days, ST-PSO-GRHD detected 104 emerging COVID-19 hotspots. Out of 104, the characteristics of the 30 most significant emerging COVID-19 hotspots detected in the U.S. are presented in Table 5 . These characteristics include the duration of the hotspot, its LLR value, the observed (or actual) number of cases, expected number of cases (calculated using (1)) and the number of counties forming the hotspot. The characteristics of the remaining 74 hotspots are listed in Appendix 2.
Table 5

COVID 19 active and emerging hotspots detected in the United States by ST-PSO-GRHD scheme. The first column indicates the hotspot number. The remaining fields represent the Characteristics (duration of the hotspot, LLR value of the hotspot, the observed and expected number of cases in the hotspot, followed by the number of counties inside the hotspot) of the emerging hotspots.

Hotspot numberDurationLLRObservedExpectedNumber of Counties
118th to 24th June11840.71142152802.9542
218th to 24th June3235.547100874037.0747
318th to 24th June3213.82987833264.8434
418th to 24th June3213.79892743556.8279
519th to 24th June2771.25112635132.6341
618th to 24th June2253.34662372330.5388
718th to 24th June1819.05334481003.5614
818th to 24th June1802.95540481327.24616
918th to 24th June1494.5141071108.09021
1018th to 24th June1419.091885405.68997
1118th to 24th June1070.05545512105.66517
1218th to 24th June944.68153472773.9842
1318th to 24th June873.56561973648.248719
1418th to 24th June826.9278884151.61073
1519th to 24th June719.38792094802.283224
1618th to 24th June686.27797149.63941
1718th to 24th June664.345256283327.2638
1818th to 24th June662.4306889193.018613
1918th to 24th June662.16521135305.077720
2019th to 24th June655.14831871709.113517
2120th to 24th June607.08731060289.098519
2218th to 24th June550.223523261071.6692
2318th to 24th June535.13611923821.85924
2419th to 24th June461.48171283478.67091
2518th to 24th June424.8899802232.12714
2618th to 24th June414.685529871679.36319
2718th to 24th June407.8246628154.38321
2818th to 24th June379.33611938965.23683
2918th to 24th June365.7823701205.17242
3021st to 24th June291.0555739261.206119
COVID 19 active and emerging hotspots detected in the United States by ST-PSO-GRHD scheme. The first column indicates the hotspot number. The remaining fields represent the Characteristics (duration of the hotspot, LLR value of the hotspot, the observed and expected number of cases in the hotspot, followed by the number of counties inside the hotspot) of the emerging hotspots. Out of 104, the most significant hotspot referred to as Hotspot #1 is found in Arizona State with an LLR value of 11840.71 and includes 2 counties viz. Yuma and Maricopa. A total of 14215 cases are observed in these two counties for the last 7 days of the study period, which is significantly greater than the expected count of 2802.95 cases. The second most significant hotspot, referred to as Hotspot 2 with 10087 observed cases, is detected in Florida and consists of 7 counties viz. Broward, Collier, Glades, Hendry, Martin, Miami Dade, and Palm Beach. The expected number of cases in these 7 counties is 4037.07, which is much less than the observed cases, and hence it is a significant hotspot with an LLR value of 3235.54. Further, the next three significant hotspots, i.e., Hotspot #3 (includes 4 counties with LLR as 3213.82), Hotspot #4 (includes 9 counties with LLR as 3213.79), and Hotspot #5 (includes 1 county with LLR as 2771.25), are detected in Texas, Florida, and California respectively. The observed duration for the first four most significant hotspots is between June 18th, 2020 and June 24th, 2020 whereas the duration for the fifth-most significant hotspot is between June 19th, 2020 and June 24th,2020. All the 104 alive or emerging hotspots in the U.S. detected by ST-PSO-GRHD are shown in Fig. 9 . It can be observed that most of the hotspots are detected in California, Texas, Florida, New York, New Jersey, Arizona, Alabama, Mississippi, Utah, and North Carolina states of the U.S. Other states containing a few hotspots include Nevada, Colorado, Kansas, Idaho, South Dakota, Iowa, Minnesota, Tennessee, and Virginia. Out of 104, the maximum number of significant emerging hotspots is detected in Texas State whereas Michigan, Colorado, Ohio, and the Wisconsin States have the least number of hotspots.
Fig. 9

Spatial distribution of active and emerging COVID-19 hotspots detected by ST-PSO-GRHD in the U.S.

Spatial distribution of active and emerging COVID-19 hotspots detected by ST-PSO-GRHD in the U.S.

Conclusion and future scope

This paper proposed a PSO-based geographically robust and efficient scheme for detecting Spatio-temporal alive and emerging hotspots. The proposed scheme, ST-PSO-GRHD, does not rely on considering a county centroid location as the center of the hotspot and thus detects more significant hotspots (evident with improved LLR) than ST-SaTScan. Also, the ability of ST-PSO-GRHD to promptly detect significant hotspots makes it a valuable surveillance scheme that can be used to detect emerging hotspots (or outbreaks) on a daily basis and as soon as they unfold. This study utilized the COVID-19 day-wise count of confirmed cases in the United States for one incubation period of 14 days between June 11th, 2020, and June 24th, 2020 to detect emerging hotspots. Performance of the ST-SaTScan and ST-PSO-GRHD schemes to detect emerging COVID-19 hotspots have been analyzed on 4 worst-hit states of the U.S. viz. New York, California, Texas, and Florida. It has been observed that ST-PSO-GRHD detects more significant hotspots (higher LLR) than ST-SaTScan quickly. Hence, the ST-PSO-GRHD method was applied to detect emerging hotspots in all states of the U.S. The counties belonging to 104 significant emerging hotspots detected by ST-PSO-GRHD should be given attention and priority in testing and allocating resources to slow down the virus transmission. Health officials and government agencies can use the rapid surveillance and statistical analysis provided by the proposed scheme to identify high-risk areas and take timely measures to contain the COVID-19 outbreak. Besides the strengths and applicability of this study, aggregation of data at the lowest level can help officials further drill down to affected zones that may further improve the precision in handling the COVID-19 outbreak. Also, the addition of some relevant information in the context of COVID-19, viz. count of older people, count of people with preexisting health conditions, etc., in future studies may further strengthen the conclusions.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

CRediT authorship contribution statement

Ankita Wadhwa: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing – original draft, Writing – review & editing, Visualization. Manish Kumar Thakur: Conceptualization, Methodology, Validation, Formal analysis, Resources, Writing – review & editing, Supervision.

Declaration of competing interest

None.
maverageStdevTime(in sec)
97.1079382.87994312.304297
195.8038062.24977625.678133
496.6201552.86933265.58123
996.2092832.651164125.32
1996.4880892.516975246.85
2996.3160022.425302384.927146
3996.3470842.561424521.640005
4996.26292.64859656.99209
9996.3280022.6838211301.67
Hotspot numberDurationLLRObservedExpectedNumber of Counties
3118th to 24th June289.9761524146.70592
3218th to 24th June281.29311225571.27661
3318th to 24th June269.74711396698.52313
3418th to 24th June268.14991085489.84534
3518th to 24th June264.895748281.43231
3618th to 24th June263.6708444117.57778
3718th to 24th June240.867236990.287185
3818th to 24th June237.2377608216.09921
3919th to 24th June234.372321111267.372
4018th to 24th June212.97691427783.05672
4118th to 24th June212.40571527857.269513
4218th to 24th June207.14691202627.590420
4319th to 24th June201.4234502175.39172
4419th to 24th June194.5321088560.464510
4518th to 24th June192.389226681780.8532
4619th to 24th June153.4877419154.57618
4719th to 24th June149.282810410.029471
4818th to 24th June145.117429590.066483
4918th to 24th June121.1934941540.84032
5018th to 24th June119.026451197.47013
5118th to 24th June115.658311919.517841
5218th to 24th June115.1042634324.658212
5319th to 24th June113.219623481693.7327
5418th to 24th June112.6252789439.12477
5518th to 24th June107.8076609314.77110
5619th to 24th June105.7793286104.90985
5721st to 24th June105.5969284103.85419
5818th to 24th June102.097424985.816018
5918th to 24th June91.39835815487.60191
6019th to 24th June88.51591518271.16833
6118th to 24th June86.338713433.148485
6218th to 24th June85.3833616471172.72311
6318th to 24th June84.19719115.78691
6418th to 24th June80.44444757459.62971
6518th to 24th June77.537297712.112011
6619th to 24th June77.53255661390.44875
6719th to 24th June69.65585269118.81522
6818th to 24th June66.60587309147.90075
6918th to 24th June66.0109519072.229515
7018th to 24th June62.83659931629.737112
7118th to 24th June60.018449523.895971
7218th to 24th June56.3014212440.000795
7318th to 24th June56.2221720185.57981
7418th to 24th June51.00192435.4893561
7518th to 24th June47.7688710232.220371
7618th to 24th June46.58505374.3465111
7719th to 24th June46.19377596391.18096
7821st to 24th June45.39494655440.54092
7918th to 24th June45.110416470.393561
8018th to 24th June44.54471212102.5343
8121st to 24th June42.63214448279.9446
8218th to 24th June41.6442510436.369453
8320th to 24th June41.6305118989.645462
8421st to 24th June39.04052420263.99943
8518th to 24th June37.491528930.15064
8618th to 24th June37.4370518490.151181
8718th to 24th June36.93593479.6825731
8818th to 24th June36.37135293.4255542
8919th to 24th June33.741689837.440894
9018th to 24th June33.29322307185.3339
9118th to 24th June32.93585112.590391
9218th to 24th June30.75873394258.15483
9318th to 24th June29.473586520.985772
9419th to 24th June28.3176418197.694373
9519th to 24th June27.12887125.543771
9618th to 24th June26.11228937.005894
9719th to 24th June23.11535375258.3093
9818th to 24th June20.710597330.863392
9918th to 24th June20.454345218.383352
10018th to 24th June20.40657268176.5753
10118th to 24th June20.2094011.9791
10218th to 24th June20.16436626.882522
10318th to 24th June19.5700911660.987753
10419th to 24th June18.28813162.1462861
  10 in total

1.  An elliptic spatial scan statistic.

Authors:  Martin Kulldorff; Lan Huang; Linda Pickle; Luiz Duczmal
Journal:  Stat Med       Date:  2006-11-30       Impact factor: 2.373

2.  Evaluating cluster alarms: a space-time scan statistic and brain cancer in Los Alamos, New Mexico.

Authors:  M Kulldorff; W F Athas; E J Feurer; B A Miller; C R Key
Journal:  Am J Public Health       Date:  1998-09       Impact factor: 9.308

3.  Penalized likelihood and multi-objective spatial scans for the detection and inference of irregular clusters.

Authors:  André L F Cançado; Anderson R Duarte; Luiz H Duczmal; Sabino J Ferreira; Carlos M Fonseca; Eliane C D M Gontijo
Journal:  Int J Health Geogr       Date:  2010-10-29       Impact factor: 3.918

4.  Multistate analysis of prospective Legionnaires' disease cluster detection using SaTScan, 2011-2015.

Authors:  Chris Edens; Nisha B Alden; Richard N Danila; Mary-Margaret A Fill; Paul Gacek; Alison Muse; Erin Parker; Tasha Poissant; Patricia A Ryan; Chad Smelser; Melissa Tobin-D'Angelo; Stephanie J Schrag
Journal:  PLoS One       Date:  2019-05-30       Impact factor: 3.240

5.  Spatiotemporal Variation and Hotspot Detection of the Avian Influenza A(H7N9) Virus in China, 2013⁻2017.

Authors:  Zeng Li; Jingying Fu; Gang Lin; Dong Jiang
Journal:  Int J Environ Res Public Health       Date:  2019-02-22       Impact factor: 3.390

6.  A novel coronavirus outbreak of global health concern.

Authors:  Chen Wang; Peter W Horby; Frederick G Hayden; George F Gao
Journal:  Lancet       Date:  2020-01-24       Impact factor: 79.321

7.  Clinical predictors of mortality due to COVID-19 based on an analysis of data of 150 patients from Wuhan, China.

Authors:  Qiurong Ruan; Kun Yang; Wenxia Wang; Lingyu Jiang; Jianxin Song
Journal:  Intensive Care Med       Date:  2020-03-03       Impact factor: 17.440

8.  A new PSO-optimized geometry of spatial and spatio-temporal scan statistics for disease outbreak detection.

Authors:  Hesam Izakian; Witold Pedrycz
Journal:  Swarm Evol Comput       Date:  2012-02-13       Impact factor: 7.177

9.  Rapid surveillance of COVID-19 in the United States using a prospective space-time scan statistic: Detecting and evaluating emerging clusters.

Authors:  M R Desjardins; A Hohl; E M Delmelle
Journal:  Appl Geogr       Date:  2020-04-08

10.  WHO Declares COVID-19 a Pandemic.

Authors:  Domenico Cucinotta; Maurizio Vanelli
Journal:  Acta Biomed       Date:  2020-03-19
  10 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.