
Spatio-temporal variation of Covid-19 health outcomes in India using deep learning based models.

Abstract

Deep learning methods have become the state of the art for spatio-temporal predictive analysis in a wide range of fields, including environmental management, public health, urban planning, and pollution monitoring. Although a variety of powerful deep learning-based models can address problem-specific issues in different research domains, no single model has been found to perform best everywhere. Over the last two years, various deep learning-based studies have reported a variety of best-performing techniques for predicting COVID-19 health outcomes. In this context, this case study investigates the spatio-temporal variation in the performance of deep learning-based methods for predicting COVID-19 health outcomes in India. Several widely applied deep learning models, namely CNN (convolutional neural network), RNN (recurrent neural network), Vanilla LSTM (long short-term memory), LSTM Autoencoder, and Bidirectional LSTM, are considered. The effectiveness of the models is assessed using several metrics based on COVID-19 mortality time series from 36 states and union territories of India.


Keywords: Covid-19; Deep learning; Spatio-temporal variation

Year: 2022 PMID: 35938066 PMCID: PMC9345394 DOI: 10.1016/j.techfore.2022.121911

Source DB: PubMed Journal: Technol Forecast Soc Change ISSN: 0040-1625

Introduction

Spatio-temporal data analysis is becoming increasingly important owing to the growing availability of large-scale data in many fields of research. Deep learning models have recently been used for spatio-temporal predictive analysis in a variety of applications, including pollution monitoring (e.g., particulate matter prediction), climate monitoring (e.g., precipitation forecasting), and public health monitoring (e.g., COVID-19 health outcome prediction). However, because the target variables themselves vary across space and time, the models’ performance also varies greatly both spatially and temporally. In this context, the primary objective of this research is to investigate spatio-temporal variation in the performance of deep learning-based methods for estimating COVID-19 health outcomes in India. At the beginning of 2020, the World Health Organization declared the COVID-19 outbreak a public health emergency of international concern. So far, COVID-19 has been associated with over 256 million confirmed cases and about 5.2 million fatalities globally. Many countries have sought to protect their citizens’ health by imposing mitigation strategies including social distancing, quarantines, travel restrictions, hard and soft lockdowns, and the postponement or cancellation of activities. In this context, several deep learning-based models for predicting COVID-19 health outcomes have been introduced in existing works to aid healthcare resource management and mitigation planning. Specifically, accurate prediction could help decide the goals, guiding principles, and strategies needed to mitigate the future consequences of COVID-19. Various previous deep learning based studies (Mohimont et al., 2021, Chimmula and Zhang, 2020, Zeroual et al., 2020, Said et al., 2021, Shastri et al., 2020) from around the world have each reported different top-performing methods for forecasting COVID-19 health outcomes.
No single deep learning based forecasting model has been observed to perform best everywhere. This is because COVID-19 health outcome time series vary widely. This variation depends on risk factors that themselves vary greatly across geographical space. Moreover, several studies (Middya and Roy, 2021, Sannigrahi et al., 2020) highlight that the association between COVID-19 health outcomes and the complex risk factors is geographically varying in nature. Hence, some variation in the models’ performance is to be expected. In this context, this study explores the spatio-temporal variation in the effectiveness of deep learning based predictive methods for predicting covid-related health outcomes. Deep learning techniques can learn and extract features from raw and imperfect data on their own. Specifically, in a time-series forecasting scenario they can automatically learn the temporal dependency from the data. Moreover, advanced deep learning techniques, e.g., LSTM (long short-term memory) (Middya and Roy, 2022, Nath et al., 2021, Das et al., 2021) and its variants, are highly capable of detecting patterns in input data that span lengthy periods of time. These techniques can reduce the need for extensive feature engineering, data scaling, and differencing to make the data stationary. Classical time-series forecasting methods do not perform well when there are complex associations between the variables, noisy data, missing data, or irregular temporal patterns. In other words, to perform successfully, classical time-series forecasting methods often require clean and comprehensive data sets.
On the other hand, deep learning based techniques are comparatively robust to noise in the input data, and can often still learn and forecast in the presence of missing data. There are several existing deep learning based methods for COVID-19 health outcome forecasting. For instance, Mohimont et al. (2021) employed a temporal CNN to forecast various health outcomes, e.g., the number of confirmed cases, deaths, etc. Using LSTM (long short-term memory) networks, Chimmula and Zhang (2020) predicted COVID-19 transmission in Canada. Moreover, some previous works (Zeroual et al., 2020, Said et al., 2021, Shastri et al., 2020) explore various LSTM variants such as Bi-LSTM, Stacked LSTM, etc. for COVID-19 prognosis. However, to the best of our knowledge, none of the existing literature focuses on investigating spatio-temporal variation in the performance of deep learning based models. This study explores such spatio-temporal variation in the models’ performance and also identifies the spatial and temporal distribution of the best methods for predicting covid-related health outcomes in India. The major contributions of this work can be summarized as follows: (i) exploring the spatio-temporal variability pattern in the performance of deep learning based forecasting models (namely CNN, RNN, Vanilla LSTM, LSTM Autoencoder, and Bidirectional LSTM) for predicting COVID-19 health outcomes in India; (ii) thoroughly evaluating the performance of the methods, based on various performance indicators, on COVID-19 mortality time-series data of 36 states and union territories of India for various forecast horizons; (iii) finding the spatial distribution of the best-performing models for predicting COVID-19 mortality data over different forecast horizons (e.g., one week, two weeks, etc.). The article is structured as follows. Section 2 reviews related work. Section 3 provides the problem definition and the presented approach. In Section 4, the experimental setting, evaluation metrics, and results are provided. Future directions and the conclusion are provided in Section 5.

Related work

This section discusses existing works that are primarily concerned with spatio-temporal data modelling of COVID-19 health outcomes. Specifically, in the following paragraphs, we present related works, their limitations/gaps, and how our work attempts to bridge the gaps. Several previous works capture patterns of COVID-19 progression using mathematical, machine learning, and deep learning techniques (John et al., 2022). Kavouras et al. (2022) studied the efficacy of employing deep learning techniques to properly model COVID-19 transmission. However, they focus on coarse-grained (country-level) data modelling and analysis over EU (European Union) regions. Furthermore, they created models that can only predict COVID-19 health outcomes on a daily basis (i.e., a short forecast horizon). Nikparvar et al. (2021) conducted spatio-temporal forecasting of COVID-19 health outcomes in US counties. However, they only used a vanilla LSTM model and did not compare the effectiveness of their proposed model against more advanced deep learning models, e.g., other LSTM variants. In da Silva et al. (2021), the authors used spatial and time-dependent features to perform monitoring and real-time spatio-temporal prediction of COVID-19 in Brazil. They focus only on shallow machine learning based models (e.g., random forests, support vector machines, etc.) and end up with low predictive accuracy. Bhimala et al. (2021) used a deep learning technique integrating meteorological parameters to estimate COVID-19 incidence for various states of India. However, they only consider a vanilla LSTM model for a relatively short forecast horizon. Conventional epidemiological methods (e.g., the Susceptible–exposed–infectious–removed (SEIR) method and its variations) (Tang et al., 2020, Zhao et al., 2020, Fanelli and Piazza, 2020) have also been used to quantify viral dissemination and predict the influence of policy actions on infection rates. However, these methods suffer from various shortcomings.
For instance, these methods rely on a large number of hypothesized input factors. Since SEIR methods are highly sensitive to variations in these input factors, their predictive accuracy might suffer significantly. In contrast to earlier research, this study focuses on data modelling to investigate whether advanced deep learning models can better capture the spatio-temporal variability of COVID-19 health outcomes. Specifically, a variety of advanced deep learning models, namely CNN, RNN, Vanilla LSTM, LSTM Autoencoder, and Bidirectional LSTM, are built to perform a fine-grained, state-level COVID-19 variability analysis in India. This research work also focuses on developing models with a relatively longer forecast horizon (a maximum forecast horizon of 21 days, i.e., 3 weeks). Moreover, compared to the existing works, the advanced deep learning models are rigorously evaluated on multiple forecast horizons to verify how well the models scale to longer horizons.

Problem statement and proposed approach

Problem statement

Suppose $S = \{s_1, s_2, \ldots, s_n\}$ is the set of states (28) and union territories (8) of India, so that $n = 36$. We are given a total of $n$ different COVID-19 death time-series datasets, one for each state or union territory. Let $D_i = \{d_{i,1}, d_{i,2}, \ldots, d_{i,T}\}$ denote the time-series dataset of daily COVID-19 death counts for $s_i$. Here, $d_{i,t}$ is the number of COVID-19 deaths on the $t$th day in $s_i$. Suppose $M = \{m_1, m_2, \ldots, m_k\}$ is a set of popular deep learning based time-series forecasting models that could be used for predicting future values of COVID-19 health outcomes. The goal of this research work is to determine the optimal set of forecasting models $M^{*} \subseteq M$ and a mapping between $S$ and $M^{*}$, i.e., $f: S \rightarrow M^{*}$. More specifically, the objective is to find a state-wise spatial mapping of optimal deep learning based methods for estimating the number of covid-related deaths in India. Moreover, this study explores spatial variability in the performance of the optimal forecasting models.

Proposed approach

Fig. 1 provides a detailed framework for exploring spatio-temporal variation in the performance of deep learning based methods for estimating state-level COVID-19 mortality in India. This section demonstrates how state-level COVID-19 mortality time-series data is acquired, preprocessed, batched, and subsequently utilized for model training and performance evaluation.

Fig. 1

Overall approach for exploring spatio-temporal variation in the performance of deep learning based methods for estimating state-level COVID-19 mortality in India.

Dataset description

Location-specific optimal forecasting models are found on the basis of daily time-series data from India. The COVID-19 death time-series data are collected for a total of 36 states and union territories of India. More specifically, 36 time-series datasets (one for each of the states and union territories) are constructed. The raw COVID-19 data are collected up to November 01, 2021 from two sources, namely (i) the MOHFW website and (ii) the COVID19INDIA website. COVID19INDIA is a crowd-sourcing project that documents COVID-19 data from India’s states and union territories. Each of the 36 datasets contains about 576 days of data samples. The geographical distribution of state-level death counts is presented in Fig. 2. More than 46% of the states report more than 10,000 COVID-19 related deaths. The state of Maharashtra is the most severely affected by COVID-19, with more than 140,000 deaths.

Fig. 2

Spatial distribution of state-level COVID-19 death counts.

Data pre-processing and batching

This section provides the details of the pre-processing of the raw time-series data and the batching of the pre-processed data. The COVID-19 mortality time series for each state and UT is smoothed with an exponential smoothing function to lessen the jaggedness of the curve. Eq. (1) provides the mathematical form of the exponential smoothing function:

$$s_t = \alpha x_t + (1 - \alpha)\, s_{t-1}, \quad t > 0 \tag{1}$$

where $\alpha$ represents the smoothing factor and $0 < \alpha < 1$. The raw COVID-19 mortality time series, denoted by $x_t$, starts at $t = 0$, and the smoothed output is represented as $s_t$. Note that Eq. (1) considers that $s_0 = x_0$. The data are divided into train and test sets once they have been smoothed. The time-series data cannot be shuffled since the order is determined by the timestamp. As a result, the test set is made up of the last 150 days of data, which constitutes around 25% of the whole dataset. After that, we fit a standard scaler to the training data only, then use that scaler to standardize both the training and test data. Standardization can be expressed mathematically as in Eq. (2):

$$z = \frac{x - \mu}{\sigma} \tag{2}$$

where $x$ denotes a data point, and $\mu$ and $\sigma$ represent the mean and standard deviation respectively. Next, the time-series sequence data are converted into supervised form, which we call the batching process. Let $Z = \{z_1, z_2, \ldots, z_N\}$ be the sequence of data values obtained after standardizing the mortality time series. The sequence can be split into several input–output vectors, where $p$ consecutive values are considered as an input vector and the next $q$ consecutive values are considered as an output vector (i.e., target vector). If the input and corresponding output vectors are denoted by $X_j$ and $Y_j$ respectively, the scaled data after batching can be represented as $\{(X_j, Y_j)\}_{j=1}^{m}$, where $X_j = (z_j, \ldots, z_{j+p-1})$ and $Y_j = (z_{j+p}, \ldots, z_{j+p+q-1})$. Note that $m = N - p - q + 1$. The details of the batching process are provided in Algorithm 1.
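The pre-processing and batching steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the toy series, the smoothing factor of 0.5, and the window lengths are all assumptions made for the example. The scaler uses the population standard deviation, which is also what scikit-learn's `StandardScaler` computes.

```python
from statistics import mean, pstdev


def exponential_smoothing(series, alpha):
    """s_0 = x_0; s_t = alpha * x_t + (1 - alpha) * s_{t-1} for t > 0."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    smoothed = [float(series[0])]
    for x in series[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


def split_ordered(series, test_size):
    """Keep temporal order: the last `test_size` points form the test set."""
    return series[:-test_size], series[-test_size:]


def standardize(series, mu, sigma):
    """Apply z = (x - mu) / sigma with statistics fitted elsewhere."""
    return [(x - mu) / sigma for x in series]


def make_windows(series, n_in, n_out):
    """Turn a sequence into (input, target) pairs of lengths n_in and n_out."""
    pairs = []
    for start in range(len(series) - n_in - n_out + 1):
        x = series[start:start + n_in]
        y = series[start + n_in:start + n_in + n_out]
        pairs.append((x, y))
    return pairs


# Toy daily counts (made-up numbers, not the study's data).
raw = [3, 5, 4, 8, 12, 9, 7, 10, 14, 11, 6, 5]
smooth = exponential_smoothing(raw, alpha=0.5)
train, test = split_ordered(smooth, test_size=3)

# Fit the scaler on the training part only, then reuse it for the test part.
mu, sigma = mean(train), pstdev(train)
train_z = standardize(train, mu, sigma)
test_z = standardize(test, mu, sigma)

windows = make_windows(train_z, n_in=4, n_out=2)
print(len(train), len(test), len(windows))
```

Fitting the mean and standard deviation on the training part alone keeps information from the held-out days from leaking into the inputs the model sees during training.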

Deep learning models

This section discusses the deep learning models that are employed in predicting state-level COVID-19 health outcomes in India. Spatial variation in the models’ performance can then be used to find the geographical distribution of the best-performing models.

Convolutional neural network (CNN)

The CNN (Albawi et al., 2017, Zhao et al., 2017) is usually used to extract features from two-dimensional data and is commonly utilized for image processing. However, because of their ability to learn both local and global characteristics from sequence data, CNNs are also used in several time-series applications. In this context, a one-dimensional CNN (1D-CNN) is employed for forecasting COVID-19 health outcomes from COVID-19 mortality time-series data. The CNN’s major feature learning layers are the convolutional and pooling layers. Eq. (3) gives the procedure for performing a 1D convolutional operation:

$$y_i = \sum_{j=1}^{k} w_j\, x_{i+j-1} \tag{3}$$

where $x$ and $w$ are the input vector and the kernel respectively, and $k$ is the kernel size. Pooling is then applied to the feature maps produced by the convolution layer to reduce their size.

Recurrent neural network (RNN)

For sequence prediction tasks, RNNs (Hewamalage et al., 2021) are the most often utilized neural network architecture. An RNN consists of several recurrent units. Elman developed the ERNN (Elman RNN) cell, which is a type of base recurrent unit (Elman, 1990). An ERNN cell can be expressed mathematically using Eqs. (4), (5):

$$h_t = \sigma(W_i h_{t-1} + V_i x_t + b_i) \tag{4}$$

$$z_t = \tanh(W_o h_t + b_o) \tag{5}$$

where the input and output of the cell at the $t$th time step are $x_t$ and $z_t$ respectively, and $h_t$ indicates the hidden state. For the hidden state, $W_i$ and $V_i$ represent the weight matrices, and $b_i$ represents the bias vector. Similarly, $W_o$ and $b_o$ are the cell output’s weight matrix and bias vector, respectively. Here, $\sigma$ represents the sigmoid function. Note that the present hidden state is determined by the earlier time step’s hidden state as well as the present input.
For lengthy sequences, RNNs suffer from the exploding gradient and vanishing gradient issues. This means that an RNN is not capable of carrying long-range dependencies into the future.

Vanilla LSTM

Many variants of the basic RNN have been produced over time to overcome its flaws (Hewamalage et al., 2021). The LSTM model, which is able to learn long-range dependencies, is possibly the most popular branch of RNN. Unlike the standard RNN cell, an LSTM cell has two states: the internal cell state and the hidden state. Eq. (6) - Eq. (12) describe an LSTM cell mathematically:

$$i_t = \sigma(W_i h_{t-1} + V_i x_t + b_i) \tag{6}$$

$$o_t = \sigma(W_o h_{t-1} + V_o x_t + b_o) \tag{7}$$

$$f_t = \sigma(W_f h_{t-1} + V_f x_t + b_f) \tag{8}$$

$$\tilde{C}_t = \tanh(W_c h_{t-1} + V_c x_t + b_c) \tag{9}$$

$$C_t = i_t \odot \tilde{C}_t + f_t \odot C_{t-1} \tag{10}$$

$$h_t = o_t \odot \tanh(C_t) \tag{11}$$

$$z_t = h_t \tag{12}$$

where the cell state and hidden state at the $t$th time step are represented by $C_t$ and $h_t$, and $\tilde{C}_t$ denotes the candidate cell state. The input and output are $x_t$ and $z_t$ respectively. $W_c$, $W_f$, $W_o$, and $W_i$ represent the weight matrices of the cell state, forget gate, output gate, and input gate respectively. Similarly, the weight matrices related to the present input are represented by $V_c$, $V_f$, $V_o$, and $V_i$. On the other hand, $b_c$, $b_f$, $b_o$, and $b_i$ are bias vectors. The forget, output, and input gate vectors are denoted by $f_t$, $o_t$, and $i_t$ respectively. The function $\sigma$ denotes the sigmoid activation. For time-series forecasting, LSTM is one of the most commonly utilized models.

LSTM Autoencoder

An autoencoder (AE) (Hinton and Salakhutdinov, 2006) is a form of neural network architecture utilized for dimensionality reduction and feature extraction. Specifically, an AE compresses the data to a lower-dimensional code, which it then uses to rebuild the output. In general, it is made up of two key parts: an encoder and a decoder. Compressed representations are learnt by the model during the encoding stage. The outputs are then reconstructed from the compressed representation in the decoding stage. The encoding and decoding can be expressed mathematically using Eqs. (13), (14):

$$h = f(W x + b) \tag{13}$$

$$\hat{x} = g(W' h + b') \tag{14}$$

where $f$ and $g$ are the encoder and decoder functions respectively; $x$ represents the input to the encoder; $\hat{x}$ represents the output from the decoder; $h$ is the encoder’s output, which is also used as the input to the decoder; $W$ and $W'$ are the weight matrices; and $b$ and $b'$ represent the bias vectors. The AE is trained by minimizing the reconstruction error. The LSTM autoencoder (Zeroual et al., 2020) is an AE in which LSTM networks serve as both the encoder and the decoder components. The capability of LSTMs to identify patterns in time-series data over extended periods makes them well suited to building a forecasting model.

Bidirectional LSTM

Bidirectional LSTMs (Bi-LSTMs) (Zeroual et al., 2020) are a type of LSTM that considers both past and future states in order to increase predictive performance. The Bi-LSTM is made up of two separate LSTM networks, one processing the time-series data in the forward direction and the other processing it in the backward direction. It can therefore learn from sequential input in both directions using the LSTM architecture. The forecast for the upcoming time step is generated by combining the outputs of the backward and forward LSTM networks. In this work, Bi-LSTM is found to be an effective technique for the future prediction of COVID-19 health outcomes.
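To make the gating mechanism of an LSTM cell concrete, the sketch below runs a single time step for one hidden unit using only the standard library. It follows a commonly used LSTM formulation (sigmoid gates, a tanh candidate state, and an element-wise update of the cell state). The parameter names (`W_*` for recurrent weights, `V_*` for input weights, `b_*` for biases) and the all-zero example parameters are assumptions for illustration, not values from the study.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step for a single hidden unit (all quantities are scalars)."""
    i_t = sigmoid(p["W_i"] * h_prev + p["V_i"] * x_t + p["b_i"])  # input gate
    f_t = sigmoid(p["W_f"] * h_prev + p["V_f"] * x_t + p["b_f"])  # forget gate
    o_t = sigmoid(p["W_o"] * h_prev + p["V_o"] * x_t + p["b_o"])  # output gate
    c_tilde = math.tanh(p["W_c"] * h_prev + p["V_c"] * x_t + p["b_c"])  # candidate
    c_t = i_t * c_tilde + f_t * c_prev  # blend new candidate with the old cell state
    h_t = o_t * math.tanh(c_t)          # hidden state exposed to the next layer
    return h_t, c_t


# With all parameters at zero every gate equals 0.5 and the candidate is 0,
# so the cell state is simply halved at each step.
names = [f"{kind}_{gate}" for kind in ("W", "V", "b") for gate in ("i", "f", "o", "c")]
zero_params = {name: 0.0 for name in names}

h, c = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, p=zero_params)
print(h, c)
```

Because the forget gate multiplies the previous cell state directly, a gate value close to 1 lets information persist over many steps, which is the property that helps LSTMs retain long-range dependencies.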

Implementation of deep learning models

This section discusses the details of the problem-specific model architectures, training, hyper-parameter tuning, etc. Fig. 3 presents the architecture of the models used in this study to predict COVID-19 health outcomes at the state level in India. Fig. 3(a) presents the CNN-based model architecture for prediction, which consists of two convolutional blocks (ConvB1 and ConvB2). Each of the convolutional blocks has two Conv1D layers followed by a max pooling layer and a dropout layer. For each Conv1D layer, the number of filters (Nf) is 32 and the kernel size (Sf) is 4. A pool size (Ps) of 2 and a dropout rate (Dr) of 0.2 are used for the max pooling and dropout layers respectively. The output of ConvB2 is flattened by the flatten layer, which is followed by two dense layers. In the first dense layer, the number of units (Nu) is 48. In the second dense layer, Nu is the value of the forecast horizon (i.e., the number of output steps). Fig. 3(b) presents the architecture of the LSTM auto-encoder. The encoder part contains two LSTM layers (L1 and L2) with 100 and 50 units respectively. Similarly, the decoder part has two LSTM layers (L1 and L2) with 50 and 100 units. There is a repeat vector layer between the encoder and decoder parts. Finally, similar to the CNN, there are two dense layers with the same Nu values. The architecture of the RNN based predictive model is presented in Fig. 3(c). It is made up of an RNN layer followed by a dropout layer and two dense layers. Here, Nu in the RNN layer is 80 and Dr in the dropout layer is 0.2. Fig. 3(d) presents the architecture of the vanilla LSTM. It has a single LSTM layer with 100 units followed by a dropout layer (Dr = 0.2) and two dense layers. Finally, the bi-directional LSTM model’s architecture is shown in Fig. 3(e). There are two bi-directional LSTM layers, each with Nu of 70. These layers are followed by an LSTM layer (Nu = 50), a dropout layer (Dr = 0.2), and two dense layers.
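As a worked check on the CNN description, the following sketch traces how the sequence length shrinks through two convolutional blocks (two Conv1D layers with kernel size 4, then max pooling with pool size 2) and reports the size of the flattened vector fed to the first dense layer. The padding mode, the stride of 1, and the input window length of 30 days are not stated in the text and are assumptions here; the helper names are hypothetical.

```python
def conv1d_length(n, kernel, padding):
    """Output length of a stride-1 1D convolution."""
    return n if padding == "same" else n - kernel + 1


def cnn_flatten_size(n_in, padding="valid", filters=32, kernel=4,
                     pool=2, blocks=2, convs_per_block=2):
    """Length of the flattened output after the stacked convolutional blocks."""
    n = n_in
    for _ in range(blocks):
        for _ in range(convs_per_block):
            n = conv1d_length(n, kernel, padding)
            if n < 1:
                raise ValueError("input window too short for the convolutions")
        n = n // pool  # non-overlapping max pooling
        if n < 1:
            raise ValueError("input window too short for the pooling layer")
    return n * filters


# Hypothetical 30-day input window.
print(cnn_flatten_size(30, padding="valid"))
print(cnn_flatten_size(30, padding="same"))
```

Under the "valid" assumption the two blocks need an input window of at least 22 steps; shorter windows leave nothing to pool in the second block, which is one reason "same" padding is often chosen for short sequences.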

Fig. 3

Model architecture for state-level COVID-19 death count forecasting models (a) CNN (b) LSTM Autoencoder (c) RNN (d) Vanilla LSTM (e) Bi-Directional LSTM.

As previously stated, approximately 25% of the most recent COVID-19 mortality data is included in the test set, while the remaining 75% is used for training purposes. When creating learning models, hyper-parameter optimization is a critical step. Manually trying random combinations of parameter values is slow and gives no guarantee of finding a good setting. To address this, a parameter tuning technique known as grid search is applied over sets of candidate parameter values to identify the optimal parameter settings. In this work, model-specific parameter settings are identified by considering a search space specific to each model. The parameters considered during model tuning are: learning rate, batch size, kernel size, number of epochs, dropout rate, and number of units. The parameter settings are selected based on a validation set taken from the training data. MSE (mean squared error), which is expressed in Eq. (15), is the loss function used for model training:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{15}$$

where $y_i$ and $\hat{y}_i$ denote the actual and estimated death counts and $n$ denotes the total number of instances. The MSE loss is preferred over the MAE loss because the MAE derivative remains constant in magnitude, whereas the MSE derivative shrinks as the estimate approaches the target. The effective update size under MSE therefore decreases near the target, whereas under MAE it stays the same. The Adam optimizer is used to train the predictive models. It can be expressed using the following set of equations (Eq. (16) - Eq. (19)):

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, \frac{\partial L}{\partial w_t} \tag{16}$$

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) \left( \frac{\partial L}{\partial w_t} \right)^2 \tag{17}$$

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \tag{18}$$

$$w_{t+1} = w_t - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \tag{19}$$

where the loss function is denoted by $L$; the learning rate is denoted by $\eta$; the momentum is represented by $m_t$; the running sum of the squares of the previous gradients is denoted by $v_t$; $\beta_1$ and $\beta_2$ represent the decay coefficients; the bias-corrected forms of $m_t$ and $v_t$ are $\hat{m}_t$ and $\hat{v}_t$ respectively; $w_t$ represents the weight at the $t$th step; and $\epsilon$ is a small constant for numerical stability.
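The contrast between the MSE and MAE gradients, and the Adam update rule, can be illustrated on a one-parameter problem. The sketch below fits a single constant `w` to a few made-up target values by minimizing MSE with a standard Adam update (first and second moment estimates with bias correction). The hyper-parameter values are the commonly used defaults apart from the learning rate, and all names and numbers are assumptions for illustration rather than the study's settings.

```python
import math


def mse_grad(w, targets):
    """Derivative of mean((y - w)^2) w.r.t. w: shrinks as w nears the targets' mean."""
    return -2.0 * sum(y - w for y in targets) / len(targets)


def mae_grad(w, targets):
    """Derivative of mean(|y - w|) w.r.t. w: its size does not shrink near a target."""
    return -sum((y > w) - (y < w) for y in targets) / len(targets)


def adam_minimize(grad, w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a one-dimensional loss with the Adam update rule."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g        # first moment (momentum)
        v = beta2 * v + (1.0 - beta2) * g * g    # second moment (squared gradients)
        m_hat = m / (1.0 - beta1 ** t)           # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


targets = [2.0, 4.0, 6.0]  # toy values; the MSE minimizer is their mean
w_star = adam_minimize(lambda w: mse_grad(w, targets), w0=0.0)
print(round(w_star, 3))

# Far from and close to a single target of 4.0:
print(mse_grad(0.0, [4.0]), mse_grad(3.9, [4.0]))
print(mae_grad(0.0, [4.0]), mae_grad(3.9, [4.0]))
```

The last two lines show the point made in the text: the MSE gradient becomes small near the target, while the MAE gradient keeps the same magnitude however close the estimate is.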

Evaluation

Experimental setup

The numerical experiments are conducted under a Windows environment on a PC with 3.30 GHz Intel(R) Xeon(R) CPU E3-1226 v3 processor and 128 GB of RAM. Python (https://docs.python.org/3.7/) is used to implement all of the deep learning-based COVID-19 health outcome prediction models and various analyses. The predictive models are developed using the following major Python libraries: (i) numpy (version 1.19.5) (ii) pandas (version 1.1.5) (iii) keras (version 2.7.0) (iv) sklearn (1.0.1). Moreover, for the visualization of spatial data, QGIS (Quantum GIS) is utilized.

Evaluation metrics

Three evaluation metrics (namely mean absolute error, root mean squared error, and coefficient of determination) are used to assess the effectiveness of the methods for predicting COVID-19 mortality counts. The mathematical forms of root mean squared error (RMSE), mean absolute error (MAE), and coefficient of determination (R2) are expressed in Eq. (20), Eq. (21), and Eq. (22) respectively:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{20}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{21}$$

$$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} \tag{22}$$

where $y_i$ and $\hat{y}_i$ are the $i$th observed and predicted values, $n$ is the total number of predictions, the sum of squares of residuals is represented by RSS, and the total sum of squares is represented by TSS.
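For reference, the three metrics can be computed in plain Python as sketched below, using their standard definitions (RSS is the sum of squared residuals and TSS the sum of squared deviations of the observations from their mean). The function names and the small example values are illustrative assumptions, not results from the study.

```python
import math


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def r2(observed, predicted):
    """Coefficient of determination, 1 - RSS / TSS."""
    mean_obs = sum(observed) / len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    tss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - rss / tss


# Made-up observed and predicted daily counts.
observed = [10, 12, 15, 11]
predicted = [9, 13, 14, 12]
print(mae(observed, predicted), rmse(observed, predicted), r2(observed, predicted))
```

MAE and RMSE are in the units of the target (deaths per day here), so they are not directly comparable between states of very different size, whereas R2 is unitless.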

Spatial variability in the models’ performance

For the different states and union territories, deep learning based forecasting models are built using state-level COVID-19 mortality data. The geographical distributions of MAE (mean absolute error) and RMSE (root mean squared error) values for the different states and union territories are provided in Fig. 4 and Fig. 5 respectively. The errors are computed for a 3-week forecast horizon. It is observed that for the north-western (primarily the states of Rajasthan and Punjab) and south-eastern (primarily the states of Andhra Pradesh, Telangana, and Odisha) parts of India, the CNN-based predictive models perform better in terms of MAE and RMSE values. Almost all of the models achieve better accuracy in the northern and eastern parts. In the southern parts (primarily the states of Kerala and Tamil Nadu), the LSTM auto-encoder performs relatively better than the other methods with respect to both the MAE and RMSE values. In comparison to the other models, the Bi-LSTM model performs better in the western regions (primarily the states of Gujarat, Maharashtra, and Karnataka).

Fig. 4

Spatial distribution of MAE values for different models. (a) LSTM Auto-Encoder (b) Bi-LSTM (c) CNN (d) vanilla LSTM (e) RNN.

Fig. 5

Spatial distribution of RMSE values for different models. (a) LSTM Auto-Encoder (b) Bi-LSTM (c) CNN (d) vanilla LSTM (e) RNN.

The distribution of the models’ predictive accuracy for COVID-19 health outcomes is also observed to vary significantly with the forecast horizon. Fig. 6, Fig. 7, and Fig. 8 present the distributions of MAE, RMSE, and R2 values over three forecast horizons (1 week, 2 weeks, and 3 weeks). The accuracy distribution is visualized using box plots overlaid with the corresponding swarm plots. The box plots show the usual distribution summaries, namely the minimum value, first quartile, median, third quartile, and maximum value. Note that the swarm plot corresponding to a box plot indicates where each data point falls within these summaries. From the box plots of MAE (Fig. 6) and RMSE (Fig. 7) values, it is clear that all the methods perform well for a shorter horizon. For a shorter forecast horizon, the interquartile range (IQR) of each model is smaller. This implies that the predictive accuracy does not differ much across most of the states for shorter forecast horizons. The increase in IQR as the forecast horizon grows is smallest for the Bi-LSTM model. Moreover, for both MAE and RMSE, the visual analysis reveals that models like Bi-LSTM and CNN achieve better median accuracies across the forecast horizons. Fig. 8 shows that relatively high R2 values, with a smaller IQR, are achieved by the Bi-LSTM and CNN models.

Fig. 6

Box and swarm plots of MAE values for various predictive models over different forecast horizons (a) one week (b) two weeks (c) three weeks.

Fig. 7

Box and swarm plots of RMSE values for various predictive models over different forecast horizons (a) one week (b) two weeks (c) three weeks.

Fig. 8

Box and swarm plots of R2 values for various predictive models over different forecast horizons (a) one week (b) two weeks (c) three weeks.

The observation versus prediction plots of COVID-19 death counts for some of the highly covid-affected states are provided in Fig. 9. For each state, the plot is provided only for the best-performing model. The results show that, in the majority of cases, the estimated COVID-19 death counts lie close to the line of perfect agreement with the observed values. The models are unable to predict the higher values for some states and union territories (e.g., Maharashtra, Delhi, etc.).

Fig. 9

Observed versus predicted plots of COVID-19 death count for some of the highly covid-affected states (a) West Bengal (b) Maharashtra (c) Karnataka (d) Tamil Nadu (e) Delhi (f) Uttar Pradesh (g) Punjab (h) Andhra Pradesh.

Spatial distribution of optimal models

This section presents the spatially varying distribution of the best-performing models for predicting COVID-19 death counts in India. No single method performs best across all states in India, owing to the wide variation in the COVID-19 mortality time series. Fig. 10 shows the distribution of the state-specific optimal models in India for different forecast horizons (1 week, 2 weeks, and 3 weeks). It is interesting to observe how the distribution of best-performing models changes over the forecast horizons. Fig. 10(a) presents the models’ distribution when the forecast horizon is 1 week. Note that in the case of a shorter horizon, the basic models, CNN and RNN, dominate in most of the states. For instance, CNN mainly dominates in the southern, northern, and eastern parts of India. On the other hand, RNN mainly dominates in the central and western parts of India. However, in the case of a longer forecast horizon, more advanced architectures like Bi-LSTM and the LSTM auto-encoder outperform RNN. Overall, Bi-LSTM and CNN achieve better results across the states over the three forecast horizons. Bi-LSTM provides optimal performance in approximately 14%, 39%, and 34% of the states for forecast horizons of 1 week, 2 weeks, and 3 weeks respectively. On the other hand, CNN achieves optimal performance in approximately 42%, 28%, and 39% of the states for forecast horizons of 1 week, 2 weeks, and 3 weeks respectively.
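The state-to-model mapping and the percentage shares reported above amount to picking, for each state, the model with the lowest error and then counting how often each model wins. A minimal sketch follows; the selection criterion (lowest value of a single error metric), the state labels, and the error values are assumptions made up for illustration, not the study's results.

```python
from collections import Counter


def best_model_per_state(errors):
    """Map each state to the model with the lowest error for that state."""
    return {state: min(scores, key=scores.get) for state, scores in errors.items()}


def model_shares(best):
    """Percentage of states in which each model is the best performer."""
    counts = Counter(best.values())
    return {model: 100.0 * count / len(best) for model, count in counts.items()}


# Hypothetical per-state errors for one forecast horizon.
errors = {
    "State A": {"CNN": 4.2, "RNN": 3.9, "Bi-LSTM": 4.5},
    "State B": {"CNN": 2.1, "RNN": 2.8, "Bi-LSTM": 2.4},
    "State C": {"CNN": 6.0, "RNN": 6.3, "Bi-LSTM": 5.1},
    "State D": {"CNN": 1.5, "RNN": 1.9, "Bi-LSTM": 1.7},
}

best = best_model_per_state(errors)
shares = model_shares(best)
print(best)
print(shares)
```

Repeating the same selection for each forecast horizon gives one map per horizon, which is how a figure like Fig. 10 can be read.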

Fig. 10

Geographical distribution of the best performing models for different forecast horizons (a) one week (b) two weeks (c) three weeks.

Discussion

Existing studies (Middya and Roy, 2021) reveal that COVID-19 health outcomes depend on complex factors including socio-economic conditions, environmental pollution, meteorological parameters, local mitigation strategies, and public health measures. The impact of these factors varies greatly across geographical space in countries like India. Consequently, it is not unexpected that the nature of the COVID-19 mortality time series varies across states. Because the association between COVID-19 health outcomes and the complex risk factors varies geographically, the predictive models’ performance differs significantly across geographic space. The results of the various deep learning models suggest that they could be adopted by most of the states to forecast COVID-19 health outcomes in advance. Specifically, deep learning methods like Bi-LSTM and CNN produce better overall predictive accuracies than the others. However, the findings show that selecting an appropriate forecasting model is influenced not only by the geographic location but also by the forecast horizon for which the prediction is made. For example, for the shorter horizon (e.g., 1 week), CNN dominates in India’s southern, northern, and eastern regions, whereas RNN dominates in the country’s central and western regions. At this horizon, either CNN or RNN is the best-performing model in 65% of the states and union territories. However, the basic RNN does not perform well over longer horizons, such as 2 or 3 weeks, because it is unable to carry long-term dependencies into the future. For a longer forecast horizon (see Fig. 10(b, c)), Bi-LSTM provides better accuracies than RNN. This can also be verified from the box and swarm plots of Figs. 6, 7, and 8. The IQR and the median MAE and RMSE values in Fig. 6(b, c) and Fig. 7(b, c) are smaller for Bi-LSTM than for RNN. Furthermore, when compared to the basic RNN, Bi-LSTM has a significantly higher median R2 value.

Conclusion

This paper focuses on finding the spatial distribution of the best-performing deep learning-based techniques for predicting COVID-19 health outcomes over India. Specifically, it investigates the use of widely applied deep learning techniques for such forecasting at the state level, as well as the geographic variability of their predictive performance. To find the state-specific optimal deep learning model and the spatial variability of model performance, a total of 36 time-series datasets (one for each of India's states and union territories) are used. Various deep learning-based time-series forecasting models, including CNN, RNN, Vanilla LSTM, LSTM Autoencoder, and Bidirectional LSTM, are investigated in this work. It is found that no single method performs best across all states in India, owing to the wide variation in the COVID-19 mortality time-series data. Additionally, the geographical distribution of the best-performing models varies significantly across forecast horizons. Models like CNN and RNN dominate for shorter horizons, whereas models like Bi-LSTM and CNN dominate for longer horizons in most states. We would like to explore some possible extensions of this work in the future. For instance, one of the major research questions that could be addressed is how the relationship between COVID-19 mortality and risk factors will change in the future.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

18 in total

1. Reducing the dimensionality of data with neural networks.

Authors: G E Hinton; R R Salakhutdinov
Journal: Science Date: 2006-07-28 Impact factor: 47.728

2. Pollutant specific optimal deep learning and statistical model building for air quality forecasting.

Authors: Asif Iqbal Middya; Sarbani Roy
Journal: Environ Pollut Date: 2022-02-17 Impact factor: 8.071

3. COVID-19 Spatio-Temporal Evolution Using Deep Learning at a European Level.

Authors: Ioannis Kavouras; Maria Kaselimi; Eftychios Protopapadakis; Nikolaos Bakalos; Nikolaos Doulamis; Anastasios Doulamis
Journal: Sensors (Basel) Date: 2022-05-11 Impact factor: 3.847

4. An updated estimation of the risk of transmission of the novel coronavirus (2019-nCov).

Authors: Biao Tang; Nicola Luigi Bragazzi; Qian Li; Sanyi Tang; Yanni Xiao; Jianhong Wu
Journal: Infect Dis Model Date: 2020-02-11

5. Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak.

Authors: Shi Zhao; Qianyin Lin; Jinjun Ran; Salihu S Musa; Guangpu Yang; Weiming Wang; Yijun Lou; Daozhou Gao; Lin Yang; Daihai He; Maggie H Wang
Journal: Int J Infect Dis Date: 2020-01-30 Impact factor: 3.623