
Comparing the dynamics of COVID-19 infection and mortality in the United States, India, and Brazil.

Nick James¹, Max Menzies², Howard Bondell¹.

Abstract

This paper compares and contrasts the spread and impact of COVID-19 in the three countries most heavily impacted by the pandemic: the United States (US), India and Brazil. All three of these countries have a federal structure, in which the individual states have largely determined the response to the pandemic. Thus, we perform an extensive analysis of the individual states of these three countries to determine patterns of similarity within each. First, we analyse structural similarity and anomalies in the trajectories of cases and deaths as multivariate time series. Next, we study the lengths of the different waves of the virus outbreaks across the three countries and their states. Finally, we investigate suitable time offsets between cases and deaths as a function of the distinct outbreak waves. In all these analyses, we consistently reveal more characteristically distinct behaviour between US and Indian states, while Brazilian states exhibit less structure in their wave behaviour and changing progression between cases and deaths.


Keywords: COVID-19; Federal states; Nonlinear dynamics; Population dynamics; Time series analysis

Year: 2022 PMID: 35075315 PMCID: PMC8769590 DOI: 10.1016/j.physd.2022.133158

Source DB: PubMed Journal: Physica D ISSN: 0167-2789 Impact factor: 2.300

Introduction

The United States (US), India and Brazil have each been severely impacted by COVID-19 and lead the world in both case and death counts. While the three countries have quite different cultures and levels of economic and technological development, they each have a similar federal structure, with governing responsibilities divided between federal and state governments. In all three countries, government responses have consistently differed between constituent states and over time [1], [2], [3], yielding different levels of virus transmission and impact on communities. Thus, a careful analysis of the most and least successful states is of great relevance to a response to the ongoing threat of COVID-19. Moreover, it is worthwhile to compare and contrast the state-by-state behaviours of the pandemic between the three countries as a whole.

In the US, India and Brazil, as well as throughout the world, the scientific response to COVID-19 has been as multifaceted and as significant as the government response. Medical researchers have uncovered numerous means of treating infections [4], [5], [6], [7], culminating in the production of vaccines [8], [9]. Outside the medical field, analytical approaches to model and study the virus and its impact have been broad. First, many models building on established mathematical frameworks, such as the Susceptible–Infected–Recovered (SIR) model and the basic reproductive ratio $R_0$, have been proposed and systematically collated by researchers [10], [11]. These have been utilised for various purposes, including diagnosis and prognosis of COVID-19 patients, studies of the efficacy of medications, and vaccine development.
Next, nonlinear dynamics researchers have proposed several sophisticated extensions to the classical predictive SIR model, including analytic techniques to find explicit solutions [12], [13], modifications to the SIR model with additional variables [14], [15], [16], [17], [18], [19], incorporation of Hamiltonian dynamics [20] or network models [21], and a closer analysis of uncertainty in the SIR equations [22]. Other mathematical approaches to prediction and analysis include power-law models [23], [24], [25], forecasting models [26], fractal approaches [27], [28], [29], neural networks [30], Bayesian methods [31], distance analysis [32], network models [33], [34], [35], [36], analyses of the dynamics of transmission and contact [37], [38], clustering [39], [40] and many others [41], [42], [43], [44], [45]. Finally, numerous articles have been devoted to understanding the spatial components of the virus’ spread, in numerous countries [46], [47], [48], [49]. We have a different motivation and approach relative to the aforementioned work. Numerous works have studied trends in COVID-19 prevalence on a country-by-country basis [50] or state-by-state basis, frequently within the US or Brazil [2], [51]. However, we are unaware of any work to consider more than one federation of states at once. We were motivated to compare the US, India and Brazil for several reasons. First, these are the three countries most impacted by COVID-19, both in case and death counts. Secondly, the level of human development varies drastically from country to country, but less so within each federation of states. Third, during the COVID-19 pandemic, international movement drastically decreased, leaving such large federal states almost as self-contained regions in which COVID-19 spread independently from what was occurring in other countries. 
Thus, tracking the heterogeneity of COVID-19 prevalence and behaviour within and between federations could be used to distinguish the effects of policies at the state and federal level. For example, countries whose federal government had less of a policy role could see more heterogeneity of behaviours with states, if states implemented drastically differing policies. This work could assist various researchers in different fields. Analysing and predicting the spread of COVID-19 is consistently challenging due to the inability to establish true control groups; indeed, it is practically impossible to split entire countries into different regions where certain mitigation measures are or are not implemented. By comparing states within and between different federations, policy researchers can approximate the existence of control groups, and investigate which socioeconomic features and interventions were associated with better and worse outcomes. For policymakers, a comparison of different states within each federation can provide opportunities for state governments to learn from each other’s triumphs and setbacks. Across the three countries, this analysis could reveal relationships between COVID-19 spread and the intervention of the national government or the underlying level of economic development. This paper is structured in such a way as to thoroughly investigate numerous aspects of the spread and human cost of COVID-19 in the three federations. First, Section 2 investigates the structural similarity and anomalies in the trajectories of cases, deaths and rolling mortality rate on a state-by-state basis in the three countries. We explore commonalities in virus behaviour within the three countries as well as the extent of heterogeneity across each country as a whole. Next, Section 3 performs a closer analysis of a highly significant aspect of COVID-19 epidemiology: differing waves of the outbreak. 
Using a newly introduced turning point algorithm and a distance between finite sets, we perform clustering on all the individual states of the US, India and Brazil to identify characteristic wave behaviours across the entire collection. Finally, Section 4 draws upon the previous two sections to address a highly pertinent metric: the average progression between cases and deaths. This paper introduces a variety of novel optimisation methods to estimate this, and takes a new approach, separating this feature according to the mathematically determined waves of the pandemic. We employ five different optimisation methods, each of which uses state-by-state data [52], [53], [54], to estimate an appropriate offset between case and death time series for the US, India and Brazil as a whole. This allows us to track the changing nature of COVID-19 mortality among the different waves of the pandemic. We summarise all our findings and insights in Section 5.

In addition to the above motivation and specific questions we study, the methodologies used in this paper have applicability well beyond the COVID-19 pandemic, and could be used in any setting of multivariate time series. In particular, Section 2 presents a new approach to carefully quantify the extent of heterogeneity in a multivariate time series (or in other spaces more generally) that handles the existence of outlier elements well, while Section 4 could be used to study various other time series where lagging is to be expected. Given the fourth wave of COVID-19 that Europe is currently facing, scientists should seek to learn from the countries most severely impacted by COVID-19, and their prior waves of COVID-19 cases. This manuscript provides computational tools and findings that would be of great relevance to this audience.

Trajectory analysis, structural similarity and anomaly detection

In this section, we explore the similarity and structure between case, death and rolling mortality time series for the US, India and Brazil. Our data spans 26 Feb 2020 to 23 May 2021, a period of $T$ days. For each country, let the multivariate time series of new COVID-19 cases and deaths be $x_i(t)$ and $y_i(t)$, where $t = 1, \dots, T$ indexes the days and $i = 1, \dots, n$ indexes states under consideration. Throughout this manuscript, we will examine either one country at a time, with $n = 51$ states (including the District of Columbia) for the US, $n = 36$ states (including union territories) for India, $n = 27$ states (including the Federal District) for Brazil, or the entire collection of individual states, with $n = 114$. In addition, we define a 30-day rolling mortality rate for each state as follows:

$$m_i(t) = \frac{\sum_{s=t-29}^{t} y_i(s)}{\sum_{s=t-29}^{t} x_i(s)}, \qquad t = 30, \dots, T.$$

We wish to examine the three aforementioned multivariate time series to determine the structure and degree of heterogeneity within each country’s states and, collectively, between all countries’ underlying states. To a case time series $x_i(t)$ we associate the following probability distribution:

$$f_i = \frac{1}{\sum_{s=1}^{T} x_i(s)} \sum_{t=1}^{T} x_i(t)\, \delta_t, \tag{2}$$

where $\delta_t$ is the Dirac delta distribution at $t$. That is, $f_i$ is a distribution that apportions to day $t$ the weight of the new cases observed on that day as a proportion of the total cases across the whole period. Then, we define

$$D^C_{ij} = W(f_i, f_j),$$

where $W$ is the $L^1$-Wasserstein metric [55] between distributions on $\mathbb{R}$. Analogously, we associate distributions $g_i$ and $h_i$ to death and mortality time series $y_i(t)$ and $m_i(t)$, respectively. We define trajectory distance matrices between state trajectories for deaths and mortality analogously as follows:

$$D^D_{ij} = W(g_i, g_j), \qquad D^M_{ij} = W(h_i, h_j).$$

This distance has several advantageous properties over previously used discrepancy measures between normalised trajectories. Previous work [51] has used the $L^1$ norm and metric between normalised trajectories, defined as follows:

$$d(x_i, x_j) = \left\| \frac{x_i}{\|x_i\|_1} - \frac{x_j}{\|x_j\|_1} \right\|_1.$$

This treats each time series as a vector in $\mathbb{R}^T$, normalises by its $L^1$ norm, and compares these normalised vectors with the $L^1$ metric [56].

This distance is suitable in most instances but has some undesirable properties when quantifying discrepancy between noisy time series. Specifically, this distance attains its maximal possible value of 2 when $x_i$ and $x_j$ have disjoint support. Practically, this would mean that two states’ trajectories would receive a large discrepancy measure if the cases were simply reported to fall on different days. For example, if state $i$ and state $j$ had broadly similar trends in cases, but in state $i$ cases were reported more on Mondays and Wednesdays while state $j$ reported more on Tuesdays and Thursdays, then the distance measure would overstate the dissimilarity between them. Smoothing and 7-day averaging can resolve some of these issues, but the Wasserstein metric ameliorates this issue even more, as it is robust to small translations of distributions. That is, if $f$ is a distribution and $g(t) = f(t - a)$ is its translation by $a$, then $W(f, g) = |a|$, as shown in [57]. This means the Wasserstein metric assigns a low value in the case that states $i$ and $j$ have similar trajectories where cases just fall on nearby but distinct days.

We will examine the matrices defined above ($D^C$, $D^D$ and $D^M$) for each individual country (with $n = 51$ for the US, 36 for India, 27 for Brazil) as well as the entire collection of states, with $n = 114$. In Fig. 1, we display the matrices $D^C$, $D^D$ and $D^M$, each for the totality of the collection. In Table 1, we record the $L^1$-norms $\|D^C\|$, $\|D^D\|$ and $\|D^M\|$, each restricted to one of the three federations. For example, for the US, $D^C$, $D^D$ and $D^M$ are 51 × 51 matrices, whereas they are 36 × 36 matrices for India. For an $n \times n$ matrix $A$, we define its norm by

$$\|A\| = \frac{1}{n(n-1)} \sum_{i,j=1}^{n} |a_{ij}|. \tag{9}$$

This calculates a total magnitude of the matrix, appropriately normalised for the number of non-zero elements. For our distance matrices $D^C$, $D^D$ and $D^M$, these norms reflect the heterogeneity among trajectories within each country. As the Wasserstein distance is taken between appropriately normalised distributions, it is possible to compare between case, death and mortality time series. Due to the normalisation coefficient $1/(n(n-1))$, it is also possible to compare these norms between countries with different numbers of states.
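The contrast between the normalised $L^1$ discrepancy and the Wasserstein trajectory distance can be illustrated with a short Python sketch. This is not the authors' code: the function names are hypothetical, the norm is assumed to be the sum of absolute entries divided by $n(n-1)$, and the Wasserstein distance is computed with `scipy.stats.wasserstein_distance`, treating day indices as support points and daily counts as weights.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def trajectory_distance_matrix(series):
    """Pairwise Wasserstein distances between normalised daily-count series.

    series: array-like of shape (n_states, n_days), non-negative counts.
    """
    series = np.asarray(series, dtype=float)
    n, n_days = series.shape
    days = np.arange(n_days)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # SciPy normalises the weights, so each series acts as a
            # probability distribution over the days.
            d = wasserstein_distance(days, days, series[i], series[j])
            dist[i, j] = dist[j, i] = d
    return dist


def normalised_norm(dist):
    """Assumed norm: sum of absolute entries over the n(n-1) off-diagonal slots."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    return np.abs(dist).sum() / (n * (n - 1))


def l1_trajectory_distance(x, y):
    """L1 distance between two series after dividing each by its total."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.abs(x / x.sum() - y / y.sum()).sum()


# Toy example with made-up counts: the same trend reported on alternating days.
a = [5, 0, 5, 0, 5, 0]
b = [0, 5, 0, 5, 0, 5]
print(l1_trajectory_distance(a, b))
print(trajectory_distance_matrix([a, b])[0, 1])
```

In this toy example the two series have disjoint support, so the normalised $L^1$ discrepancy takes its maximal value of 2, while the Wasserstein distance is only one day, reflecting the one-day shift in reporting.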

Fig. 1

Trajectory distance matrices, as defined in Section 2, with respect to (a) cases, (b) deaths and (c) mortality rate time series. Each matrix is computed using the entire collection of 114 states and ordered with US states first, then Indian states, then Brazilian states. Darker values indicate smaller entries of the matrix, signifying greater similarity between states. India exhibits particular heterogeneity between mortality rates.

Table 1

Trajectory distance matrix norms. Normalised matrix norms, as defined in Section 2, for each of the three countries with respect to case, death and mortality time series. The higher values for India indicate greater heterogeneity between its states.

Country	Cases	Deaths	Mortality rate
US	27.39	43.70	43.03
India	40.09	55.08	76.45
Brazil	30.30	35.33	29.67

Table 1 reveals that India exhibits the highest heterogeneity between states regarding all three behaviours, with norms of 40.09, 55.08 and 76.45 for cases, deaths and mortality, respectively. For case trajectories, the US and Brazil have similar levels of total homogeneity. For deaths and mortality trajectories, however, Brazil’s norms of 35.33 and 29.67 are considerably lower than the US’ scores of 43.70 and 43.03. This highlights the relative homogeneity in death and mortality trajectories among Brazilian states.

Next, we wish to further examine the heterogeneity between states of each country, as well as identify the presence of any outlier states that may be influencing the total norms recorded in Table 1. Given each country’s trajectory matrix (with respect to cases, deaths or mortality rates), we perform the following procedure to sequentially identify the most anomalous state, remove it, and compute the resulting norm of the reduced collection. This is described in Algorithm 1. In Fig. 2, we display the sequence of norm scores for each of the case, death and mortality matrices for the US, India and Brazil. By removing the most anomalous state in each step of the algorithm, this sequence of norm scores is necessarily decreasing. As all norms are appropriately normalised, we may compare these decreasing sequences between all our different countries and time series. Several insights can be gained from these figures. First, India consistently produces the largest anomaly score for all three attributes.
This can be seen by the magnitude of the decreasing trend for India throughout the plots. This is consistent with the analysis in Table 1, and confirms that the result is not simply due to the presence of a small number of outlier states. Second, relative to cases and deaths, mortality rate trajectories are significantly more dissimilar in the case of India. For the US and Brazil, there is greater uniformity in anomaly trajectories among each of the three attributes. When examining the nine sequential norm trajectories, it is pertinent to look for sharp drops, which would indicate that a particular state accounts for a disproportionate amount of heterogeneity. This effect is seen in the Indian mortality rate norms (Fig. 2(h)) and to a lesser extent in the Indian cases and deaths norms (Figs. 2(b) and 2(e), respectively).
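Algorithm 1 is not reproduced in this text, so the following Python sketch shows only one plausible reading of the sequential removal procedure. It assumes that the anomaly score of a state is its total distance to all remaining states (a row sum of the distance matrix) and that the norm is the sum of absolute entries divided by $n(n-1)$. The function name and the toy data are hypothetical.

```python
import numpy as np


def sequential_anomaly_removal(dist, labels, n_remove=5):
    """Repeatedly drop the most anomalous item and record the reduced norm.

    dist:   symmetric (n, n) distance matrix with zero diagonal.
    labels: sequence of n names, one per row of dist.
    Returns the removed labels in order and the norm after each removal.
    """
    dist = np.asarray(dist, dtype=float)
    keep = list(range(dist.shape[0]))
    removed, norms = [], []
    for _ in range(n_remove):
        if len(keep) < 3:
            break  # need at least two items left to form a norm
        sub = dist[np.ix_(keep, keep)]
        # Assumed anomaly score: total distance to all remaining items.
        scores = sub.sum(axis=1)
        worst = int(np.argmax(scores))
        removed.append(labels[keep[worst]])
        del keep[worst]
        sub = dist[np.ix_(keep, keep)]
        m = len(keep)
        norms.append(np.abs(sub).sum() / (m * (m - 1)))
    return removed, norms


# Toy example: three nearby points and one far outlier on a line.
positions = np.array([0.0, 1.0, 2.0, 100.0])
toy_dist = np.abs(positions[:, None] - positions[None, :])
order, norm_sequence = sequential_anomaly_removal(
    toy_dist, ["A", "B", "C", "Z"], n_remove=2
)
print(order, norm_sequence)
```

Under the row-sum assumption, the removed row always has at least the average row sum, so the sequence of normalised norms cannot increase. A sharp initial drop, as with the outlier `Z` in the toy data, indicates that a single item accounts for much of the heterogeneity.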

Fig. 2

Sequences of decreasing matrix norms, as determined in Algorithm 1, for (a) US cases (b) Indian cases (c) Brazilian cases (d) US deaths (e) Indian deaths (f) Brazilian deaths (g) US mortality rates (h) Indian mortality rates (i) Brazilian mortality rates. These norms are obtained sequentially after removing the most anomalous state at each step. India exhibits sharper drops at the start, indicating a small collection of highly anomalous states, particularly for mortality rates.

Table 2 records the five most anomalous states in each country with respect to cases, deaths and mortality rates, as determined by Algorithm 1, and also reveals several insights. In the US, there is a pronounced geographic trend in all three attributes’ anomaly trajectories. Northeastern states New York, New Jersey, Connecticut and Vermont are identified as anomalous in at least two attributes’ trajectories each. Several other Northeastern states appear, such as New Hampshire, Maine, Massachusetts and DC. In addition, there is substantial consistency in the states exhibiting anomalous behaviours in cases, deaths and mortality. In India, the union territory of Lakshadweep is the most anomalous in cases, deaths and mortality, but otherwise relatively little repetition is observed among the most anomalous states. Lakshadweep’s status as an anomaly can also explain the sharp drops observed for India in Fig. 2, which are absent for the US and Brazil. Brazil exhibits even greater variability in the most anomalous states than the US or India, with little consistency in the states exhibiting anomalous behaviours among cases, deaths and mortality.

Table 2

The five most anomalous states in each country with respect to case, death and mortality rate time series, as determined by Algorithm 1.

Country	Rank	Cases	Deaths	Mortality
US	1	Vermont	New York	Oklahoma
US	2	Maine	New Jersey	Vermont
US	3	New Hampshire	Connecticut	New Jersey
US	4	New York	DC	Connecticut
US	5	Michigan	Massachusetts	New York
India	1	Lakshadweep	Lakshadweep	Lakshadweep
India	2	Andaman & Nicobar Islands	Tripura	Mizoram
India	3	Tripura	Andhra Pradesh	Nagaland
India	4	Arunachal Pradesh	Odisha	Himachal Pradesh
India	5	Assam	Dadra and Nagar Haveli	Gujarat
Brazil	1	Maranhão	Pernambuco	Pernambuco
Brazil	2	Roraima	Paraná	Piauí
Brazil	3	Amapá	Minas Gerais	Ceará
Brazil	4	Distrito Federal	Rio Grande do Sul	Distrito Federal
Brazil	5	Minas Gerais	Santa Catarina	Paraíba


Wave behaviour analysis

In this section, we investigate one of the most significant aspects of the spread of COVID-19: the tendency for the virus to exhibit multiple distinct waves of prevalence. As in the last section, we analyse either each country on a state-by-state basis (with $n = 51$, $36$ and $27$ states) or the entire collection of states across the three countries together ($n = 114$ states). To each state, we apply a newly introduced turning point algorithm [51] to identify non-trivial local maxima (peaks) and minima (troughs) in the new case time series. We first apply a Savitzky–Golay filter to each new case time series to generate a smoothed collection of time series $\hat{x}_i(t)$, $i = 1, \dots, n$ and $t = 1, \dots, T$. We then apply a two-stage turning point algorithm, detailed in the Appendix, to generate non-empty sets $\mathcal{P}_i$ and $\mathcal{T}_i$ of non-trivial local maxima (peaks) and local minima (troughs), respectively. These turning points alternate between a trough and peak, beginning with a trough at $t = 1$, when there are no cases.

Next, we use an appropriate distance measure to quantify the similarity between two sets of turning points. We apply the semi-metric first introduced in [57]. Given two non-empty finite sets $S_1, S_2 \subset \mathbb{R}$, this is defined as

$$d(S_1, S_2) = \frac{1}{2} \left( \frac{1}{|S_1|} \sum_{a \in S_1} d(a, S_2) + \frac{1}{|S_2|} \sum_{b \in S_2} d(b, S_1) \right),$$

where $d(b, S_1)$ is the minimal distance from $b$ to the set $S_1$. The distance measure is symmetric, non-negative, and zero if and only if $S_1 = S_2$. We then define turning point distance matrices by

$$D^{TP}_{ij} = d(\mathcal{P}_i, \mathcal{P}_j) + d(\mathcal{T}_i, \mathcal{T}_j).$$

As before, this may be computed for the entire collection ($n = 114$) or one specific country. In Figs. 3(a), 3(b) and 3(c), respectively, we display hierarchical clustering on the three obtained turning point matrices restricted to the states of the US, India and Brazil separately.

Fig. 3

Examining these three dendrograms reveals a similar cluster structure between the US and India. Both countries display a dense majority cluster and a small collection of outlier states. Brazil, by contrast, exhibits quite a different structure, with two similarly sized clusters that contain the majority of elements, and then some outliers. We can further examine the cluster-split behaviour of Brazil by examining the results of clustering all states in our collection in Fig. 4. This total dendrogram contains a majority cluster containing 90% of all states, and two small outlier clusters of five and four states (clusters B and C respectively). The majority cluster contains two subclusters (A1 and A2), featuring a break between US and Indian states, with almost no intersection between the two countries. However, Brazil’s states are far more widely distributed. Not only do the outlier clusters B and C consist only of Brazilian states, but Brazil’s states are spread throughout both A1 and A2, interleaving between US and Indian states. This finding suggests that US and Indian states exhibit higher intra-collection homogeneity and inter-collection heterogeneity in their wave behaviours when compared to Brazilian states.

Fig. 4

Hierarchical clustering on the turning point distance matrix, defined in Section 3, for all states in our collection. The majority cluster contains two subclusters A1 and A2, broadly consisting of US states and Indian states, respectively. Brazilian states, labelled with two letters for visibility, are interleaved among A1 and A2 and also the two outlier clusters B and C.

To elucidate the reasons behind these state clustering patterns, we study the distribution of the location of the first non-trivial trough, $T_1$. This trough indicates the end of the first wave; thus, the value $T_1$ gives the total length of the first wave in each state. Table 3 documents the median and standard deviation of $T_1$ among each country’s states, while Fig. 5 displays kernel density estimates of the full distribution of $T_1$ values. There is significant variability in the states’ first wave lengths between the three countries. The US has a median value of 92 and a standard deviation of 76.9, indicating that most states experienced a short first wave. By contrast, Indian states mostly experienced a long first wave, with a median value of 231 and a standard deviation of 63.6. This suggests that the first wave of COVID-19 cases in Indian states was typically about 2.5 times as long as in US states (comparing medians), with limited variance between states. As in Fig. 3, Brazil does not exhibit as strong a characteristic behaviour, with a median value of 143 and a significantly higher standard deviation among Brazilian states of 109. Notably, the median value of Brazilian states is located between the US and Indian median values. Also of note is the highly skewed distribution for the Brazilian states, with a substantial number of high values despite the relatively lower peak. When viewed in conjunction with Fig. 4, one can see how the heterogeneous turning point behaviours of Brazilian states are classified into predominantly US or Indian subclusters (A1 and A2, respectively). Fig. 5 shows in more detail that the lengths of the first wave among Brazilian states are broadly positioned between those of US and Indian states.

Table 3

Median and standard deviation of the length of the first wave $T_1$, defined in Section 3 and measured by the first non-trivial trough. The US has the shortest first wave, India has the longest, while Brazil exhibits the greatest variability.

Country	Median T1	Standard deviation T1
US	92	76.9
India	231	63.6
Brazil	143	109

Fig. 5

Kernel density estimates of distributions of the first wave length $T_1$, defined in Section 3, over each country. The US exhibits the smallest first wave length, India the greatest, while Brazil has the greatest variability.

Hierarchical clustering on the turning point distance matrix, defined in Section 3, between the individual sets of (a) US states (b) Indian states (c) Brazilian states. A broadly similar cluster structure is observed between the US and India, while Brazil’s structure is quite different.

Offsets between cases and deaths

In this section, we combine the motivating questions from the previous two sections: the different wave behaviour of the virus, and the time-varying properties of cases, deaths and mortality rates by state. Here, we investigate various methods to quantify and analyse the changing offset from cases to deaths in the different waves of the pandemic in the three countries under consideration. To standardise our comparison of offsets between constituent states, we consider a uniform partition into waves for each entire country. That is, let $x(t)$ be the new daily case time series for an entire country (total counts for the US, India, or Brazil). As in the previous section, we use the methodology of [51], detailed in the Appendix, to divide each aggregated country’s case time series into a first, second and possibly third wave. Let $T_0 = 1$, let $T_1$ be the first non-trivial trough, and let $T_2$ be the second non-trivial trough, if it exists. For India and Brazil, this does not exist, so we set $T_2 = T$. Then the interval $[T_0, T_1]$ represents the first wave, $[T_1, T_2]$ the second wave, and in the case of the US only, $[T_2, T]$ represents the third wave. For notational convenience, we set $T_3 = T$ for the US. Thus, the $k$th wave can be described by the interval $[T_{k-1}, T_k]$, where $k = 1, 2$ for India and Brazil and $k = 1, 2, 3$ for the US. These turning points for the three countries’ aggregated cases are displayed in Fig. 6.

Fig. 6

New daily case time series and determined turning points, defined in Section 3, for (a) the US (b) India (c) Brazil.

We apply five different methods to estimate suitable values of the offset between case and death time series for each wave in each country. Each method determines an appropriate offset using case and death data only between $T_{k-1}$ and $T_k$. Let $S = T_k - T_{k-1}$ be the length of this interval. For a candidate offset $\tau$, each method compares cases $x_i(t)$ on the truncated interval $t \in [T_{k-1}, T_k - \tau]$ with the shifted deaths $y_i(t + \tau)$ over the same range of $t$, that is, with deaths on $[T_{k-1} + \tau, T_k]$. We describe the five methods below.

Affinity matrices: For a given wave and country, let the offset be chosen as follows: on each day $t$, let $D^C(t), D^D(t)$ be the matrices of differences between cases and deaths, respectively. That is, $D^C(t)$ is an $n \times n$ matrix defined by $D^C(t)_{ij} = |x_i(t) - x_j(t)|$, where indices $i, j$ range over the states of one country, and similarly for $D^D(t)$. To any distance matrix $D$, we can assign a corresponding affinity matrix defined by

$$A_{ij} = 1 - \frac{D_{ij}}{\max_{k,l} D_{kl}}.$$

Let $A^C(t)$ and $A^D(t)$ be the affinity matrices corresponding to $D^C(t), D^D(t)$, respectively. Given an offset $\tau$, with $0 \le \tau \le S$, let the normalised total affinity difference be defined as

$$\frac{1}{S - \tau + 1} \sum_{t = T_{k-1}}^{T_k - \tau} \left\| A^C(t) - A^D(t + \tau) \right\|. \tag{13}$$

The matrix norm is the same as defined in (9). Then, the affinity offset $\tau^A$ of a wave is defined as the value $\tau$ that minimises this total difference.

Probability density function (PDF): For a given wave and country, let the offset be chosen as follows: on each day $t$, let $p^C(t), p^D(t)$ be the probability vectors for new cases and deaths on day $t$. That is, $p^C(t)$ is a length $n$ vector defined by

$$p^C_i(t) = \frac{x_i(t)}{\sum_{j=1}^{n} x_j(t)},$$

where $i$ ranges over the states of one country, and similarly for $p^D(t)$. Given an offset $\tau$, let the normalised total pdf difference be defined as

$$\frac{1}{S - \tau + 1} \sum_{t = T_{k-1}}^{T_k - \tau} \left\| p^C(t) - p^D(t + \tau) \right\|_1,$$

where $\|\cdot\|_1$ is the $L^1$ norm between vectors. Then the pdf offset $\tau^P$ of a wave is defined as the value $\tau$ that minimises this total difference.

Wasserstein distance: Again, we assume a given country and wave is under consideration. For each constituent state $i$, let $\tau_i$ be the offset that minimises the Wasserstein distance $W(f_i^{\tau}, g_i^{\tau})$, where $f_i^{\tau}$ is the distribution associated to $x_i(t)$ over the interval $t \in [T_{k-1}, T_k - \tau]$, as in (2), and similarly $g_i^{\tau}$ is associated to the shifted deaths $y_i(t + \tau)$ over the same range of $t$. Then, let $\tau^W$ be the nearest integer to the mean of the estimated offsets $\tau_i$ for each state $i$.

Energy distance: Using similar notation as the above method, for each constituent state $i$, let $\tau_i$ be the offset that minimises the energy distance [58] between $f_i^{\tau}$ and $g_i^{\tau}$, where $f_i^{\tau}$ and $g_i^{\tau}$ are the distributions defined above and the energy distance is computed from the integral norm between the associated cumulative distribution functions [58]. Then, let $\tau^E$ be defined analogously as before.

Normalised inner product: Using similar notation as the above method, for each constituent state $i$, let $\tau_i$ be the offset that maximises the normalised inner product between the truncated case vector $u = (x_i(t))$ and the shifted death vector $v = (y_i(t + \tau))$, $t \in [T_{k-1}, T_k - \tau]$, defined as

$$\frac{\langle u, v \rangle}{\|u\|_2 \, \|v\|_2}.$$

Then, let $\tau^I$ be defined analogously as before.

Thus we have offsets $\tau^A, \tau^P, \tau^W, \tau^E, \tau^I$ for each country and wave $k$. Each of these methods considers case and death data on a state-by-state basis, taking into account the federal structure of each country. We remark that the affinity matrix and PDF methods share common features of analysing relationships between different states’ proportional sizes of case and death counts. Also, the Wasserstein and energy methods share common features of truncating time series and computing distances between distributions. Before we present the results of this methodology, we present a proposition that demonstrates our methods work well in the case of simulated data.

Proposition. Let the multivariate time series of cases and deaths for a federation be $x_i(t)$ and $y_i(t)$, $i = 1, \dots, n$. Suppose they have the property that there exists a consistent and proportionate progression from cases to deaths after a time lag of $\tau_0$. That is,

$$y_i(t) = \lambda\, x_i(t - \tau_0) \ \text{ for } t > \tau_0, \qquad y_i(t) = 0 \ \text{ for } t \le \tau_0, \tag{19}$$

where $\lambda > 0$ and $\tau_0$ are constants. Then, for any wave of length at least $\tau_0$, all five methods above return $\tau_0$. That is, all five methods identify the correct offset for this simulated example.

Proof. Let $[a, b]$ be a fixed interval of length $S$. Then the normalised total affinity difference (13), evaluated for $\tau = \tau_0$, produces the value

$$\frac{1}{S - \tau_0 + 1} \sum_{t = a}^{b - \tau_0} \left\| A^C(t) - A^D(t + \tau_0) \right\|.$$

By (19), $y_i(t + \tau_0) = \lambda x_i(t)$ for all $t$ in the interval $[a, b - \tau_0]$. Thus, $D^D(t + \tau_0) = \lambda D^C(t)$. Due to the normalisation process of computing the affinity matrix, this implies $A^D(t + \tau_0) = A^C(t)$ for all $t$. Thus, the normalised total affinity difference for the value $\tau_0$ produces the minimal possible value of zero, so the method selects $\tau^A = \tau_0$.

Next, for the PDF method, the normalised total pdf difference evaluated for $\tau = \tau_0$ produces

$$\frac{1}{S - \tau_0 + 1} \sum_{t = a}^{b - \tau_0} \left\| p^C(t) - p^D(t + \tau_0) \right\|_1.$$

Again by (19), we have $y_i(t + \tau_0) = \lambda x_i(t)$ for all $t$ in the interval $[a, b - \tau_0]$, so $p^D(t + \tau_0) = p^C(t)$ for all $t$. Thus, the normalised total pdf difference for the value $\tau_0$ produces the minimal possible value of zero, so the method selects $\tau^P = \tau_0$.

Next, we turn to the Wasserstein and energy distance methods. Here, we can again show that for the selected offset $\tau_0$, the corresponding Wasserstein distance is equal to zero. Indeed, $y_i(t + \tau_0)$ is a scalar multiple of $x_i(t)$, so when both are normalised to distributions $g_i^{\tau_0}$ and $f_i^{\tau_0}$ respectively, they coincide. Thus, $\tau_0$ produces the minimal possible value of zero for the Wasserstein distance and so the method selects $\tau_i = \tau_0$ for each state $i$, hence $\tau^W = \tau_0$. The same argument holds mutatis mutandis for the energy distance. Finally, for the normalised inner product method, the same reasoning shows that the normalised inner product achieves its maximal value of 1 when $\tau = \tau_0$, so the method selects $\tau_i = \tau_0$ for each state $i$. Hence, $\tau^I$ is analogously chosen to be equal to $\tau_0$.

We remark that the procedure of truncating the interval to $[a, b - \tau]$ for the case time series and $[a + \tau, b]$ for the death time series is essential for the proof to work as above. Indeed, in this simulated example, the death time series has exactly $\tau_0$ days of leading zeros before it coincides with a shifted constant times $x_i$, and the truncation is necessary for the methods to select the correct offset. □

Table 4 documents the wave-specific offsets for all three countries among our five methods. We observe broad similarity across all countries and waves between the results obtained by pairs of related methods (affinity and PDF, Wasserstein and energy). Each country presents a unique pattern in the length of its progression from cases to deaths for each wave of the pandemic. First, the US is the only country determined to experience three waves of COVID-19 cases within our analysis window. For all five methods, the first wave produces a significantly lower offset than the second and third waves of COVID-19.
The timing of the first wave corresponds to the first half of 2020, when many US states (especially those located in the Northeast) were overwhelmed by early case numbers. As a result, many cases went undetected, and hospitals were unable to administer optimal care to patients. Furthermore, early in the pandemic, there was greater uncertainty within the medical community on suitable treatments for COVID-19 patients.
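The proposition on simulated data stated earlier in this section can also be checked numerically. The following is a minimal sketch, not the implementation used for Table 4: it assumes a single state, integer offsets, and the truncation convention described in the proof (cases truncated at the end of the interval, deaths at the start). The function names, the synthetic wave shape, the 14-day lag and the 2% proportionality constant are all hypothetical choices made for illustration. The `p=2` variant of `cdf_distance` is only a stand-in for the energy distance, to which it is closely related in one dimension.

```python
import math


def normalised_inner_product(f, g):
    """Return <f, g> / (||f|| ||g||); equals 1 when f and g are proportional."""
    dot = sum(a * b for a, b in zip(f, g))
    norm = math.sqrt(sum(a * a for a in f)) * math.sqrt(sum(b * b for b in g))
    return dot / norm if norm > 0 else 0.0


def cdf_distance(f, g, p=1):
    """Sum of |F - G|**p over time, where F and G are the cumulative
    distributions of f and g after each is normalised to sum to 1.
    p=1 is the discrete 1-D Wasserstein distance; p=2 is a squared-CDF
    discrepancy used here as a stand-in for the energy distance."""
    total_f, total_g = sum(f), sum(g)
    cum_f = cum_g = dist = 0.0
    for a, b in zip(f, g):
        cum_f += a / total_f
        cum_g += b / total_g
        dist += abs(cum_f - cum_g) ** p
    return dist


def select_offset(cases, deaths, max_offset, score, maximise=False):
    """Try each integer offset tau, truncating cases to [0, T - tau) and
    deaths to [tau, T) so that the two windows have equal length."""
    T = len(cases)
    scores = {}
    for tau in range(max_offset + 1):
        scores[tau] = score(cases[: T - tau], deaths[tau:])
    chooser = max if maximise else min
    return chooser(scores, key=scores.get)


# Simulated wave (hypothetical values): deaths are a constant multiple of
# cases, delayed by TRUE_LAG days, with leading zeros before the lag.
TRUE_LAG, RATE, T = 14, 0.02, 120
cases = [1000.0 * math.exp(-(((t - 50) / 15.0) ** 2)) + 5.0 for t in range(T)]
deaths = [RATE * cases[t - TRUE_LAG] if t >= TRUE_LAG else 0.0 for t in range(T)]

offset_ip = select_offset(cases, deaths, 30, normalised_inner_product, maximise=True)
offset_w1 = select_offset(cases, deaths, 30, cdf_distance)
offset_l2 = select_offset(cases, deaths, 30, lambda f, g: cdf_distance(f, g, p=2))
print(offset_ip, offset_w1, offset_l2)
```

In this toy setting the truncated death series is an exact scalar multiple of the truncated case series at the true lag, so the distances vanish and the inner product reaches 1 there, which is the mechanism the proof relies on.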

Table 4. Offsets (in days) between cases and deaths by country and wave of the pandemic, computed with five different methods, as described in Section 4. Only the US is determined to have a third wave within our period of analysis.

Methodology	Wave 1	Wave 2	Wave 3
Affinity (US)	6	37	16
PDF (US)	5	23	16
Wasserstein (US)	11	19	41
Energy (US)	9	17	38
Inner product (US)	10	20	29
Affinity (India)	11	8	n/a
PDF (India)	8	7	n/a
Wasserstein (India)	32	5	n/a
Energy (India)	32	5	n/a
Inner product (India)	13	8	n/a
Affinity (Brazil)	9	9	n/a
PDF (Brazil)	9	9	n/a
Wasserstein (Brazil)	18	13	n/a
Energy (Brazil)	15	11	n/a
Inner product (Brazil)	12	21	n/a

India, which exhibits two waves of COVID-19 in our analysis window, shows almost the opposite pattern. As shown in Table 3, the length of the first wave in India was 2.5 times that of the US, and it exhibited a more gradual progression (and subsequent decline) in daily cases until states reached their first peak and trough, respectively. Although much shorter, the second wave was more severe among Indian states, with universally rapid growth in cases and deaths. All five optimisation methods determined the offset of the second wave to be shorter than that of the first wave. This mirrors our finding in the case of the US: when states are overwhelmed with COVID-19, hospitals are stretched beyond capacity and many cases go undetected, which shortens the offset between cases and deaths. This can most likely be explained by latent COVID-19, the inability to access critical equipment (such as ventilators), and inferior treatment within hospitals.

Brazil presents a different picture again, with little consistency in the offset trend between its first and second waves. Several reasons may explain the variability in our estimates. First, the Brazilian data is quite noisy, with more missing data and reporting issues than the US and India. Second, the variability in the distribution of states’ values may suggest limited collective consistency in offset trends among the Brazilian states. Accordingly, we see no clear trend in offset behaviours as we progress from the first to the second wave of the outbreak.
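The wave-to-wave pattern described above can be tallied directly from the values in Table 4. The short sketch below simply counts, for each country, how many of the five methods give a longer, shorter or unchanged offset in the second wave relative to the first. The dictionary layout and the function name are illustrative choices, not part of the original methodology.

```python
# Offsets (days) copied from Table 4: (wave 1, wave 2) per method and country.
# The US third-wave values are omitted because only wave 1 vs wave 2 is compared.
TABLE_4 = {
    "US": {
        "Affinity": (6, 37), "PDF": (5, 23), "Wasserstein": (11, 19),
        "Energy": (9, 17), "Inner product": (10, 20),
    },
    "India": {
        "Affinity": (11, 8), "PDF": (8, 7), "Wasserstein": (32, 5),
        "Energy": (32, 5), "Inner product": (13, 8),
    },
    "Brazil": {
        "Affinity": (9, 9), "PDF": (9, 9), "Wasserstein": (18, 13),
        "Energy": (15, 11), "Inner product": (12, 21),
    },
}


def tally_direction(offsets_by_method):
    """Count methods whose wave-2 offset is longer, shorter or equal to wave 1."""
    counts = {"longer": 0, "shorter": 0, "unchanged": 0}
    for wave1, wave2 in offsets_by_method.values():
        if wave2 > wave1:
            counts["longer"] += 1
        elif wave2 < wave1:
            counts["shorter"] += 1
        else:
            counts["unchanged"] += 1
    return counts


summary = {country: tally_direction(m) for country, m in TABLE_4.items()}
for country, counts in summary.items():
    print(country, counts)
```

All five methods agree on a longer second-wave offset for the US and a shorter one for India, while the Brazilian tallies are split across the three categories.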

Discussion

In this paper, we perform a detailed analysis of the three countries most impacted by COVID-19, the US, India and Brazil. Given COVID-19’s severe yet varied impact on countries worldwide, our motivation is to understand the differences in the dynamics of the virus’ propagation among the world’s three worst affected countries. We seek to study both internal structural similarity between states within each country and differences between the countries with respect to several attributes around COVID-19. Comparing the structural dynamics of separate countries’ COVID-19 outbreaks may provide insights into the influence different governments, cultures and healthcare systems have had in the evolution of the pandemic. In addition to this explicit contrast, we wanted to explore variability within each country, namely similarity between countries’ constituent states. First, we study the similarity between case, death and mortality rate trajectories produced by each of our three countries’ constituent states. In Section 2, we offer methodological contributions as well as non-trivial findings regarding heterogeneity between states in each federation. Our procedure in Algorithm 1 not only identifies a sequence of the most anomalous elements (in this case states) of a collection, it also produces an easily interpretable decreasing curve quantifying the collective heterogeneity. This procedure is robust to the existence of one or even several outlier elements. By the scale of the curves displayed in Fig. 2, one can immediately see that India exhibits the greatest heterogeneity between states with respect to the three trajectories analysed, particularly rolling mortality rates. This is a robust finding that consistently holds even when we remove anomalous states, and highly non-trivial given the findings of Section 3 discussed below. The specific identification of the most anomalous states is also non-obvious, revealing different patterns in each federation. 
In the US, we find that the most anomalous behaviour is consistently located in the Northeast. In India, the state Lakshadweep is consistently identified as most anomalous in cases, deaths and mortality. In Brazil, there is less consistency in the type of anomalies identified among our three attributes. The insights generated above concern broad structure in the data on a state-by-state basis. We have combined existing statistical learning methodologies (such as clustering), a new distance between trajectories, and a new algorithmic approach to identify specific states and quantify overall heterogeneity, with robustness to outliers. The insights presented in this manuscript would not be possible without a combination of existing (rather sophisticated) and new (rather bespoke) procedures, all carefully considered for the application. More broadly, most COVID-19 data consumed by the general public is reported at the national level; most variation within states is ignored, especially a detailed quantification of heterogeneity. Our methods combine non-trivial mathematical investigation with data sets that are typically not examined in detail at the state level.

In Section 3, we apply our turning point algorithm to study wave behaviours among the three countries. In the US, where three waves of COVID-19 cases are observed, the median first wave length among states is 92 days. By contrast, Indian states produced a median first wave length of 231 days, with a lower variance than the US, and just two waves of COVID-19 cases overall. In Brazil, where two waves of cases were also identified, the median length of states’ first wave was 143 days, with high variance. Our analysis suggests that US and Indian states exhibit stronger characteristic behaviours than those exhibited by Brazil.
Indeed, clustering reveals that the US and India are quite dissimilar in wave behaviour, almost entirely clustering among themselves, while Brazil is quite heterogeneous, with some states similar to US states, some similar to Indian states, and some outlier states. These findings are highly non-trivial without undertaking judicious mathematical analysis as we have done. Numerous papers on COVID-19 simply estimate the duration of waves by inspection or other unreliable methods, while we use a careful algorithm to do so. Unlike most work, we do so on a state-by-state basis, and thus must deal with data issues such as anomalous counts and missing values. Our findings contrast notably with Section 2 and are highly non-trivial to guess. While it is predictable that US and Indian states exhibit relatively strong characteristic wave behaviours among themselves, it is certainly non-trivial that Brazilian states interleave between US and Indian states with respect to wave behaviour, and that the distribution of first wave length among Brazilian states (Fig. 5) is so broad. Further, it is striking that Section 2 reveals the greatest heterogeneity between Indian states in terms of trajectories, but Section 3 demonstrates the least variance in first wave length (Table 3). This is not necessarily contradictory but is highly non-obvious: case and death curves exhibit substantial differences but the overall wave pattern is more uniform across India. Finally, Section 4 introduces new optimisation methodologies to study the progression of COVID-19 cases to deaths in each of our three countries’ waves of the pandemic. We believe this is the first work to explicitly acknowledge that the progression from cases to deaths may vary between different waves of the pandemic and aim to study this. In the US, we highlight a significantly longer period between diagnosis and death in the second and third waves of COVID-19 cases. This finding is consistent among all five optimisation methods. 
In India, all five methods demonstrate a sharp reduction in the length of this offset as we progress from the first to the second wave. In Brazil, we find limited consistency among our methods, with no clear takeaway regarding the change in the length of the COVID-19 case life cycle between the first and second waves. In aggregate, our analysis suggests that when countries become overwhelmed with COVID-19 cases, the length of the case-to-death progression decreases. This may be due to overwhelmed hospital systems, sub-optimal medical treatment, limited access to medical resources such as ventilators and an increase in undetected cases. We also include theoretical validation of our methodology, which is non-trivial due to the truncation of time series inherent in the case and death data (that is, death data lag behind cases and non-zero counts begin later).

There are several reasons why these determinations of offsets between cases and deaths are not particularly obvious. First, they are computed in a high dimensional manner with several methods that use the federal structure of the three countries. Second, the changes between waves of these offsets are different for all three federations, which we believe shows the impossibility of a straightforward prediction of their behaviour. Algorithmic techniques must be used to identify time series turning points (corresponding to waves of the pandemic), and the relationship between cases and deaths is fluid, varying over time, across countries and between countries’ constituent states and territories.

Although the offset in the progression from COVID-19 cases to deaths is only one facet of a hugely complex global pandemic, it is of great importance to understand for the future treatment and management of COVID-19 cases. COVID-19 data follows a causal structure: any COVID-19 case will ultimately progress into either the recovered or death category.
This causal structure is typically modelled via SIRD models and their variants described in Section 1. These have their utility, but are not ideal to study the multi-wave dynamics of COVID-19 brought about by regularly shifting government restrictions and community behaviour. We choose to exclusively address the transition from cases to deaths without the strong parametric assumptions in SIRD models; we believe this progression to be of direct importance in treating COVID-19 patients currently burdening many countries’ healthcare systems.

Future work

There are many avenues for potential future work, in both methodological and applied contexts. First, one could investigate the reasons for more or less heterogeneity among constituent states for various countries. For example, one could explore why Brazil’s states experienced rather different outcomes relative to wave behaviours and progression from cases to deaths. In this paper, we highlight that these differences are far more significant than those observed in the US and India. Indeed, Brazil’s human development index (HDI) of 0.765 is between that of the US (0.926) and India (0.645), and it is conceivable that development differs more among Brazilian states than among US or Indian states. This, along with other predictors, may help construct supervised and unsupervised learning algorithms where relationships can be learned and associations can be formed, respectively.

Next, the methods that are introduced in this paper could be extended. Although the offsets in this paper have been implemented in discrete time partitions, these methods could conceivably be implemented in a rolling manner, where a continuous (time-varying) offset may be estimated. Furthermore, the theoretical aspects of these estimators could be further investigated, and tested on data generated from a variety of data generating processes. This may include noise generated from a wide variety of distributions, adversarial data such as extreme points and outliers, and so on.

In addition, future work could further explore the aforementioned causal structure in the data, including offsets between time series of COVID-19 cases, counts of recovered patients (including those who experience “long Covid” [59]) and COVID-19 deaths. One could compare the offsets between COVID-19 cases and deaths, and COVID-19 cases and recovered patients separately, and then study whether there is a latent relationship between these two offsets and, more specifically, how they evolve with time.
Our descriptive and nonparametric analysis could conceivably be combined with judiciously chosen SIRD models on a wave-by-wave basis. At the time of writing this paper, many parts of the world are experiencing a fourth wave of COVID-19 cases. Many European countries, such as Austria and Germany, are attracting a substantial amount of publicity regarding the growth in their new daily COVID-19 cases. It would be of great interest to compare the heterogeneity of COVID-19 epidemiology within differing states or regions of these countries, and estimate the offset in the progression from cases to deaths during the fourth wave of the pandemic. In particular, with the appropriate data, one could distinguish between the vaccinated and unvaccinated populations.
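To make the rolling extension suggested in this section concrete, the sketch below estimates an offset in each sliding window rather than once per wave. It is a hypothetical illustration only, not a method from this paper: it uses a single state, the normalised inner product as the matching criterion, and synthetic data in which the lag is assumed to change from 7 to 14 days at day 100. The function names, window length, step size and search range are arbitrary choices.

```python
import math


def normalised_inner_product(f, g):
    """Return <f, g> / (||f|| ||g||); equals 1 when f and g are proportional."""
    dot = sum(a * b for a, b in zip(f, g))
    norm = math.sqrt(sum(a * a for a in f)) * math.sqrt(sum(b * b for b in g))
    return dot / norm if norm > 0 else 0.0


def rolling_offsets(cases, deaths, window, max_offset, step=10):
    """For each window start s, return (s, tau), where tau maximises the
    normalised inner product between cases[s : s + window] and
    deaths[s + tau : s + tau + window]."""
    estimates = []
    last_start = len(cases) - window - max_offset
    for s in range(0, last_start + 1, step):
        c = cases[s : s + window]
        scores = [
            normalised_inner_product(c, deaths[s + tau : s + tau + window])
            for tau in range(max_offset + 1)
        ]
        estimates.append((s, scores.index(max(scores))))
    return estimates


# Synthetic two-wave example (hypothetical values): deaths follow cases with
# a 7-day lag before day 100 and a 14-day lag afterwards.
T = 200
cases = [
    100.0 * math.exp(-(((t - 40) / 10.0) ** 2))
    + 100.0 * math.exp(-(((t - 140) / 10.0) ** 2))
    + 1.0
    for t in range(T)
]
deaths = []
for t in range(T):
    lag = 7 if t < 100 else 14
    deaths.append(0.02 * cases[t - lag] if t >= lag else 0.0)

estimates = dict(rolling_offsets(cases, deaths, window=40, max_offset=20))
print(estimates[20], estimates[120])
```

Windows lying entirely within one regime recover that regime's lag; windows straddling the change point mix the two and would need more careful treatment in any real application.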

Conclusion

Overall, we have identified numerous features that characterise the nature of the pandemic within the US, India and Brazil. India exhibits the greatest heterogeneity in its trajectories, and yet simultaneously the most homogeneity in its wave behaviours due to a very long first wave and a rapid second wave in almost every state. The US and India cluster quite separately in trajectory and wave behaviours, while Brazilian states are interleaved between them, characterised by the greatest variance in wave lengths. A similar distinction is observed in offsets, where the US case-to-death progressions drastically lengthen between first and subsequent waves, the reverse holds for India, while Brazil is again a mixture of the two. Throughout this work, we have identified specific states within the three federations as the most anomalous and determined various non-trivial features in the federations’ COVID-19 behaviour, including heterogeneity of trajectories, wave behaviour, and the progression from cases to deaths. New methodologies have been presented for this purpose, including the ability to more robustly determine distances between trajectories and determine patterns in overall heterogeneity without too much vulnerability to outliers. We have identified numerous avenues for future work to apply these methods in new contexts, such as Europe’s fourth wave, or to undertake closer analysis with researchers from other disciplines to investigate some of the policy measures or regional features that could be contributing to these patterns.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
