A COVID-19 Pandemic Artificial Intelligence-Based System With Deep Learning Forecasting and Automatic Statistical Data Acquisition: Development and Implementation Study.

Cheng-Sheng Yu1,2,3,4, Shy-Shin Chang1,2, Tzu-Hao Chang3,5, Jenny L Wu1,3, Yu-Jiun Lin1,2, Hsiung-Fei Chien6, Ray-Jade Chen4,7,8.   

Abstract

BACKGROUND: More than 79.2 million confirmed COVID-19 cases and 1.7 million deaths were caused by SARS-CoV-2; the disease was named COVID-19 by the World Health Organization. Control of the COVID-19 epidemic has become a crucial issue around the globe, but there are limited studies that investigate the global trend of the COVID-19 pandemic together with each country's policy measures.
OBJECTIVE: We aimed to develop an online artificial intelligence (AI) system to analyze the dynamic trend of the COVID-19 pandemic, facilitate forecasting and predictive modeling, and produce a heat map visualization of policy measures in 171 countries.
METHODS: The COVID-19 Pandemic AI System (CPAIS) integrated two data sets: the data set from the Oxford COVID-19 Government Response Tracker from the Blavatnik School of Government, which is maintained by the University of Oxford, and the data set from the COVID-19 Data Repository, which was established by the Johns Hopkins University Center for Systems Science and Engineering. This study utilized four statistical and deep learning techniques for forecasting: autoregressive integrated moving average (ARIMA), feedforward neural network (FNN), multilayer perceptron (MLP) neural network, and long short-term memory (LSTM). With regard to 1-year records (ie, whole time series data), records from the last 14 days served as the validation set to evaluate the performance of the forecast, whereas earlier records served as the training set.
RESULTS: A total of 171 countries that featured in both databases were included in the online system. The CPAIS was developed to explore variations, trends, and forecasts related to the COVID-19 pandemic across several countries. For instance, the number of confirmed monthly cases in the United States reached a local peak in July 2020 and another peak of 6,368,591 in December 2020. A dynamic heat map with policy measures depicts changes in COVID-19 measures for each country. A total of 19 measures were embedded within the three sections presented on the website, and only 4 of the 19 measures were continuous measures related to financial support or investment. Deep learning models were used to enable COVID-19 forecasting; the performances of ARIMA, FNN, and the MLP neural network were not stable because their forecast accuracy was only better than LSTM for a few countries. LSTM demonstrated the best forecast accuracy for Canada, as the root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) were 2272.551, 1501.248, and 0.2723075, respectively. ARIMA (RMSE=317.53169; MAPE=0.4641688) and FNN (RMSE=181.29894; MAPE=0.2708482) demonstrated better performance for South Korea.
CONCLUSIONS: The CPAIS collects and summarizes information about the COVID-19 pandemic and offers data visualization and deep learning-based prediction. It might be a useful reference for predicting a serious outbreak or epidemic. Moreover, the system undergoes daily updates and includes the latest information on vaccination, which may change the dynamics of the pandemic. ©Cheng-Sheng Yu, Shy-Shin Chang, Tzu-Hao Chang, Jenny L Wu, Yu-Jiun Lin, Hsiung-Fei Chien, Ray-Jade Chen. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 20.05.2021.

Keywords:  COVID-19; artificial intelligence; data visualization; deep learning; machine learning; pandemic; statistical analysis; time series

Year:  2021        PMID: 33900932      PMCID: PMC8139395          DOI: 10.2196/27806

Source DB:  PubMed          Journal:  J Med Internet Res        ISSN: 1438-8871            Impact factor:   5.428


Introduction

In December 2019, the first cases of a new respiratory disease caused by a novel coronavirus were reported in Wuhan, Hubei province, China [1]. The novel coronavirus was subsequently identified and named SARS-CoV-2, and the disease caused by SARS-CoV-2 was named COVID-19 by the World Health Organization (WHO) [2,3]. Since the time the first cases were reported, many confirmed cases have been reported in various other countries. By March 11, 2020, more than 118,000 confirmed cases and 4291 deaths had been reported across 114 countries. The WHO declared the COVID-19 outbreak a pandemic [4], which continues to worsen. As of December 27, 2020, there were more than 79.2 million confirmed cases and 1.7 million deaths [5]. COVID-19 management has emerged as an urgent global issue. Many studies have investigated the factors that contribute to the spread of COVID-19. Demographic, geographic, and economic factors have influenced the spread of the disease. However, social factors, especially governmental response to the pandemic, have significantly influenced disease severity within certain countries [6-11]. Some countries have shown that implementing rigorous public health care management strategies can successfully control infection spread and maintain normal societal functioning [11]. The rapid development of artificial intelligence (AI) in the health care field offers new opportunities to medical researchers. There are many studies that employ AI techniques in disease predictions, such as Yu et al, who have established an online machine learning health assessment system for metabolic syndrome and chronic kidney diseases [12]. Lin et al utilized multicenter data to develop an end-stage liver disease mortality prediction scoring system [13]. Ayyoubzadeh et al analyzed the rate of COVID-19 incidence in Iran using Google Trends data and deep learning methods [14]. 
Yeung et al combined several online COVID-19 data sets to train and evaluate five non–time series machine learning models in predicting confirmed infection growth [15]. These studies have shown that AI is suitable for evaluating disease trends and can provide governments with information that can be used to prevent further spread. There are abundant research findings on COVID-19–related AI prediction and the utilization of mobile sensor data with cell broadcast to identify and manage potential contacts [14,16-20]. However, most of these studies have been conducted in a specific region or single country. There is public health consensus that vaccination is an effective prevention strategy. However, with regard to its efficiency and medical expenditure, long-term follow-up investigation is needed to evaluate the clinical effects of vaccines that have not undergone the standard approval process and tests of their mid- and long-term side effects on different groups [21]. Moreover, different studies have focused on different time frames in pandemic trend prediction. They have drawn the same conclusion: there is a high possibility that COVID-19 will remain a common illness or become endemic in the future, and we must learn to coexist with it. Many factors influence how the pandemic will progress (eg, herd immunity), and governmental and individual responses vary widely across nations [22,23]. Successful epidemic prevention and control measures remain the most efficient solution for public health problems. However, there is limited literature on the relationship between governmental responses and the severity of the domestic spread of COVID-19 [24,25]. Therefore, we constructed an online AI system that contains worldwide COVID-19–related data, each country’s governmental responses to the COVID-19 pandemic, and each country’s population data [26]. 
The COVID-19 Pandemic AI System (CPAIS) can be used to analyze the dynamic trend of the COVID-19 pandemic, facilitate forecasting and predictive modeling, and produce heat map visualization of policy measures in different countries.

Methods

Data Acquisition and System

The CPAIS integrated two data sets: the data set from the Oxford COVID-19 Government Response Tracker (OxCGRT) from the Blavatnik School of Government, which is maintained by the University of Oxford, and the data set from the COVID-19 Data Repository, which was established by Johns Hopkins University Center for Systems Science and Engineering (CSSE). The COVID-19 Data Repository also contains each country’s population data, which are obtained from the United Nations World Population Prospects [27-31]. A total of 171 countries that featured in the databases were included in the system. The CPAIS was placed on a server and embedded with time series deep learning models to provide forecasting analyses using the statistical program R, version 3.6.3 (The R Foundation). We used the React.js, version 16.14.0, framework; the styling language Sass (Syntactically Awesome Style Sheets), version 4; and the programming language JavaScript ES6 for front-end implementation. As for back-end implementation, we used Java 8; Spring Boot, version 2.0.2 (VMware, Inc); and R as the programming languages, and we used the MySQL (Structured Query Language), version 5.7.21, database as the storage system. In addition, this AI-based system has been programmed to update itself by auto-retrieving information from all data sets each morning at 9 AM (GMT + 8). The auto-retrieval can be summarized in the following three steps: (1) setting the crawler to fetch the data from the source databases, (2) integrating the updated data into our own MySQL database, and (3) conducting statistical analysis using the database-stored procedure. The COVID-19 Data Repository established by Johns Hopkins University CSSE contains three categories of data concerning COVID-19 incidence (confirmed cases, recovered cases, and number of deaths) with country geolocation retrieved from 192 affected countries since January 21, 2020. 
For most of the countries, country-level data concerning the numbers of reported cases are available. Province- and city-level data concerning reported cases are available for some countries. To depict the COVID-19 pandemic comprehensively, we archived country-level data. The number of reported cases was updated daily using data retrieved from multiple online sources. The number of cases was retrieved from the WHO and the regional and local health departments of the affected countries, including their centers for disease control and prevention. All data were shared freely through GitHub. OxCGRT has been collecting and documenting governmental responses to the COVID-19 pandemic based on several parameters since January 1, 2020. The data set includes 183 countries and 20 items (19 indicators and 1 free response) that characterize governmental responses. There are three types of items: (1) ordinal scale for severity or intensity, (2) numeric scale for specific numbers, and (3) text for other information types. These items can be further classified into four groups: (1) containment and closure policies (8 indicators), (2) economic policies (4 indicators), (3) health system policies (7 indicators), and (4) miscellaneous policies (1 free response). Miscellaneous policies were not included in this system because they were assessed using a free-text response format and limited data were available. OxCGRT data were retrieved from publicly available sources and regularly updated on GitHub.
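The integration of the two sources can be pictured as a join on country and date, restricted to countries that appear in both data sets. The following is a minimal Python sketch under stated assumptions, not the authors' implementation (which used Java, R, and MySQL): the function name `integrate` and the field names `country`, `date`, `confirmed`, and `school_closing` are hypothetical and chosen only for illustration.

```python
def integrate(case_rows, policy_rows):
    """Join case and policy records on (country, date).

    Only countries present in BOTH sources are kept, mirroring the idea that
    the system includes countries featured in both databases.
    Field names are hypothetical placeholders.
    """
    shared = {r["country"] for r in case_rows} & {r["country"] for r in policy_rows}
    policies = {(r["country"], r["date"]): r for r in policy_rows}
    merged = []
    for row in case_rows:
        if row["country"] not in shared:
            continue  # country missing from the policy source
        # A missing policy record for that date simply contributes no extra fields.
        policy = policies.get((row["country"], row["date"]), {})
        merged.append({**policy, **row})
    return sorted(merged, key=lambda r: (r["country"], r["date"]))
```

In a scheduled job, a function like this would sit between the fetch step and the database write; the scheduling and storage layers are omitted here.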

Statistical Analysis and Deep Learning Techniques

Overview

Four time series models were considered for this study. Each model was applied to all the countries in our system to facilitate forecasting. With regard to 1-year records (ie, whole time series data), records from the last 14 days served as the validation set, whereas earlier records served as the training set. Using records from the last 14 days, forecasting performance was evaluated based on the following five indices: mean error (ME), root mean square error (RMSE), mean absolute error (MAE), mean percentage error (MPE), and mean absolute percentage error (MAPE) [32,33]. RMSE, MAE, and MAPE are always nonnegative, whereas ME and MPE can take either sign; MPE and MAPE are percentage-based measures and therefore do not depend on the scale of the data. The hyperparameters for each model can be found in Table S1 in Multimedia Appendix 1, and the diagram of the neural networks can be found in Figure 1. R, version 3.6.3 (The R Foundation), was used to conduct statistical analysis and apply deep learning techniques.
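The 14-day holdout and the five error indices can be illustrated with a short Python sketch. It is not the authors' R code. Two conventions are assumptions here: the error is taken as actual minus forecast, and MPE and MAPE are expressed in percent. The function names are hypothetical.

```python
import math

def split_holdout(series, horizon=14):
    """Split a series into training values and the last `horizon` validation values."""
    if len(series) <= horizon:
        raise ValueError("series must be longer than the holdout horizon")
    return series[:-horizon], series[-horizon:]

def forecast_errors(actual, predicted):
    """Return ME, RMSE, MAE, MPE, and MAPE for paired actual and predicted values.

    Assumes error = actual - predicted and nonzero actual values
    (the percentage errors divide by the actual value).
    """
    errors = [a - p for a, p in zip(actual, predicted)]
    pct = [100.0 * e / a for e, a in zip(errors, actual)]
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": sum(pct) / n,
        "MAPE": sum(abs(p) for p in pct) / n,
    }
```

Because ME and MPE keep the sign of each error, over- and underestimates can cancel out; MAE, RMSE, and MAPE do not have this property, which is why they are the usual basis for comparing models.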
Figure 1

The structure of the COVID-19 Pandemic AI System (CPAIS). ARIMA: autoregressive integrated moving average; CSSE: Center for Systems Science and Engineering; FNN: feedforward neural network; LSTM: long short-term memory; MLP: multilayer perceptron; NN: neural network.

Autoregressive Integrated Moving Average

An autoregressive integrated moving average (ARIMA) model is a statistical regression analysis that utilizes time series data to either understand the data set better or predict future trends. The purpose of ARIMA is to forecast future trends by examining differences between values in the series rather than by using actual values [34,35]. The three main components of ARIMA are autoregression, integration, and moving average. Autoregression refers to a model with a changing variable that regresses on its lag values. Integration represents the differences between data values and their previous values for stationary time series. Moving average incorporates the dependence between an observation and an error term from a moving average model. An ARIMA model can be comprehended by outlining each component, which serves as a parameter with a standard notation. For ARIMA models, there are three standard notations, wherein integer values serve as substitutes for the parameters to indicate the type of ARIMA model used. The parameters can be defined as follows: (1) p, the number of time lags; (2) d, the degree of differencing; and (3) q, the size of the moving average window. In this study, we used the auto.arima function for R, which returns the best ARIMA model based on either the Akaike information criterion value or Bayesian information criterion value. The function searches for possible models within the order constraints provided in the forecast package for R [36,37].
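To make the roles of differencing and autoregression concrete, the following Python sketch hand-rolls a forecast with p=1, d=1, and q=0. It is a simplified illustration only: it has no moving average term, no intercept, and no model selection, so it is not equivalent to the auto.arima function used in this study. All function names are hypothetical.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing (the 'integration' order d)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def fit_ar1(series):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] (no intercept)."""
    num = sum(prev * cur for prev, cur in zip(series, series[1:]))
    den = sum(prev * prev for prev in series[:-1])
    return num / den if den else 0.0

def forecast_arima_110(series, horizon):
    """Forecast `horizon` steps with an AR(1) model fitted to first differences."""
    diffs = difference(series, d=1)
    phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(horizon):
        last_diff = phi * last_diff   # autoregression on the differenced series
        level = level + last_diff     # undo the differencing to get back to levels
        forecasts.append(level)
    return forecasts
```

For a series that grows by a constant amount each day, the differences are constant, the fitted coefficient is 1, and the forecast continues the same line.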

Feedforward Neural Network

A feedforward neural network (FNN) is the simplest type of artificial neural network [38]. The FNN algorithm is biologically inspired. It consists of several simple neuron-like units that are organized in layers. In FNN, information moves in one direction: from the input nodes, through the hidden nodes, and to the output nodes. The mechanism of an FNN is different from that of recurrent neural networks (RNNs) in that connections between the units do not form cycles or loops in FNNs [38,39]. In this study, we used the nnetar function for R, which constructs FNNs with a single hidden layer and lagged inputs for the purpose of forecasting univariate time series. Within the forecast package, this function fits the single-hidden-layer network by calling the nnet function from the nnet package for R [40,41].
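The idea of lagged inputs feeding a single hidden layer can be sketched in a few lines of Python. This is an illustration of the data flow only, not a reimplementation of nnetar: the function names are hypothetical, the weights are supplied by the caller rather than learned, and a sigmoid hidden layer with a linear output unit is assumed.

```python
import math

def make_lagged_samples(series, n_lags):
    """Turn a univariate series into (previous n_lags values, next value) pairs."""
    return [(series[i - n_lags:i], series[i]) for i in range(n_lags, len(series))]

def forward_single_hidden(inputs, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: inputs -> sigmoid hidden layer -> linear output.

    w_hidden holds one weight list per hidden unit; information only moves
    forward, so there are no cycles between units.
    """
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(weights, inputs)) + bias)))
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out
```

With two lags, the series 1, 2, 3, 4, 5 yields three training pairs: the inputs [1, 2] with target 3, [2, 3] with target 4, and [3, 4] with target 5.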

Multilayer Perceptron Neural Network

Like FNNs, multilayer perceptron (MLP) neural networks are common deep learning feedforward networks. An MLP neural network is also a supervised learning algorithm used for classification. The main difference is that between the input and output layer, there can be multiple nonlinear layers, called hidden layers, which are the true computational engine of the MLP neural network. MLP neural networks use a learning technique called back-propagation for training. Their multiple layers and nonlinear activation distinguish MLP neural networks from a linear perceptron [42-44]. In other words, MLP neural networks are designed to solve nonlinearly separable problems. Specifically, the units of MLP neural networks apply a sigmoid function as an activation function. In the back-propagation technique, the difference between the output values and the ground truth is calculated using a predefined error function. The error is fed back through the network. Using this information, the algorithm adjusts the weight of each connection to reduce the value of the error function. In this study, MLP neural networks for time series forecasting were fitted using the mlp function from the nnfor package for R [45-47].
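Back-propagation can be shown on a deliberately tiny network. The Python sketch below assumes one input, one sigmoid hidden layer without biases, a linear output, and a squared error function; it is an illustration of the weight update, not the nnfor implementation, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(x, target, w1, w2, lr=0.1):
    """One back-propagation update for a 1-input, sigmoid-hidden, linear-output network.

    w1: input-to-hidden weights (one per hidden unit).
    w2: hidden-to-output weights (one per hidden unit).
    Returns the squared-error loss BEFORE the update and the updated weights.
    """
    # Forward pass
    hidden = [sigmoid(w * x) for w in w1]
    output = sum(v * h for v, h in zip(w2, hidden))
    error = output - target
    loss = 0.5 * error * error
    # Backward pass: chain rule from the loss back to each weight
    grad_w2 = [error * h for h in hidden]
    grad_w1 = [error * v * h * (1.0 - h) * x for v, h in zip(w2, hidden)]
    # Gradient descent update
    new_w1 = [w - lr * g for w, g in zip(w1, grad_w1)]
    new_w2 = [v - lr * g for v, g in zip(w2, grad_w2)]
    return loss, new_w1, new_w2
```

Calling `train_step` repeatedly on the same sample with a small learning rate drives the loss down, which is the behavior the error feedback described above is meant to produce.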

Long Short-term Memory

Long short-term memory (LSTM) networks are a special type of recurrent deep learning neural network that learns order dependence in sequence prediction problems. LSTM was introduced by Hochreiter and Schmidhuber in 1997, and it is now widely used in a variety of studies and projects [48,49]. A typical RNN makes use of sequential information. These networks are described as recurrent because they use their internal state to process the variable length sequences of inputs. It is difficult for a standard RNN to carry forward information from prior time steps to later ones if a sequence is too long, because it may exclude important information from the beginning. Therefore, LSTM has an advantage in that information can be remembered for long periods of time. Unlike traditional FNNs, LSTM has feedback connections, whereby the output from the previous step is supplied as input in the current step [50]. A common LSTM unit includes a cell, an input gate, an output gate, and a forget gate. The cell recalls values over an arbitrary time interval, and the three gates regulate the flow of information in and out of the cell. In this study, we used the keras R package to call TensorFlow for conducting the LSTM analysis [51]. TensorFlow was developed by the Google Brain team and released in 2015. It is a free open-source software library for machine learning techniques, particularly deep neural networks [52].
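The interplay of the cell and the three gates can be written out for a single scalar LSTM unit. The Python sketch below follows the standard LSTM equations; it is illustrative only and unrelated to the keras and TensorFlow code used in this study. The function name and the `params` layout are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM unit.

    params maps 'f' (forget gate), 'i' (input gate), 'o' (output gate), and
    'g' (candidate value) to a (w_x, w_h, bias) tuple.
    Returns the new hidden state h and the new cell state c.
    """
    def pre(name):
        w_x, w_h, bias = params[name]
        return w_x * x + w_h * h_prev + bias

    f = sigmoid(pre("f"))       # how much of the old cell state to keep
    i = sigmoid(pre("i"))       # how much of the new candidate to write
    o = sigmoid(pre("o"))       # how much of the cell state to expose
    g = math.tanh(pre("g"))     # candidate value
    c = f * c_prev + i * g      # updated cell state
    h = o * math.tanh(c)        # output fed back in at the next step
    return h, c
```

When the forget gate is close to 1 and the input gate is close to 0, the cell state passes through each step almost unchanged, which is how the unit retains information over long sequences.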

Data Visualization of Time Series Data Sets

Heat maps can be generated to depict variations in policy measures for the COVID-19 pandemic across time. Gradient color bars represent changes in measures across different levels and the support received in the form of financial assistance and investments. The time schedule presented along the horizontal axis will be updated daily. Cumulative and monthly records are represented using histograms and line charts, respectively. This system also provides a download option for countries of interest and comparison services with dynamic rankings of the total number of confirmed cases and deaths and declining trends for the COVID-19 pandemic. The following simple regression formula is used to examine declining trends with dynamic time intervals: y = α + βt + ε, where t is time, α is the intercept, ε is the error term, and β is the slope that represents an increasing or decreasing trend.
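The slope β of a simple regression on time can be estimated by ordinary least squares. The Python sketch below assumes the time index runs 0, 1, 2, and so on over the selected interval; the function name is hypothetical and the system's own calculation may differ in detail.

```python
def trend_slope(values):
    """Ordinary least-squares slope of `values` regressed on time index 0..n-1.

    A negative slope indicates a declining trend over the interval;
    a positive slope indicates an increasing trend.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    t_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den
```

For example, daily counts of 10, 8, 6, and 4 give a slope of -2, that is, a decline of two cases per day over the interval.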

Results

In this study, the CPAIS was developed to explore variations, trends, and forecasts related to the COVID-19 pandemic across several countries. A drop-down list for country selection is available. The framework of the CPAIS, from data acquisition and preprocessing to deep learning model application, forecasting, and data visualization, is presented in Figure 1. It includes a combination of two data sets, construction of databases for deep learning prediction and statistical analysis, four statistical or deep learning models for forecasting, and front-end functions for data visualization. The numbers of confirmed cases, recovered individuals, and deaths in 15 countries are listed by month in Table 1. The number of confirmed monthly cases in the United States reached a local peak in July 2020 and another peak of 6,368,591 in December 2020. Regarding the United States, the number of recovered cases after December 14, 2020, is not recorded in the COVID-19 Data Repository database. The total population for each of the 15 countries in 2020 is also mentioned in the table. The dynamic heat map with policy measures is shown in Figure 2, which depicts changes in COVID-19 measures for each country, with Australia used as an example. A total of 19 measures were embedded within the three main policy sections (ie, containment and closure policies, economic policies, and health system policies). Economic policies have the fewest measures, and only 4 of the 19 measures are continuous measures related to financial support or investment.
Table 1

The numbers of confirmed cases, recovered individuals, and deaths in 15 countries by month in 2020.

Country (total population^a) and cases | Jan | Feb | Mar | Apr | May | June | July | Aug | Sept | Oct | Nov | Dec
United States (N=329,466,283)
Confirmed717192,152884,047718,241834,3591,922,7301,464,6761,201,8221,914,9934,466,4516,368,591
Deaths01527160,69941,70320,11326,30629,59123,51523,92837,03877,572
Recovered077017146,923290,811275,873717,529746,665655,863771,7901,533,8411,151,763b
Canada (N=37,855,702)
Confirmed416850745,93038,02213,61812,18412,63730,18976,206144,244202,852
Deaths0010132094064127633019317384119603485
Recovered00158619,83227,78919,90733,78613,11421,16161,771105,643189,043
Mexico (N=127,792,286)
Confirmed04121118,00971,440135,425198,548174,923143,656181,746181,746312,551
Deaths00291830807117,83918,91917,72613,23214,10714,10719,867
Recovered003511,38852,349110,766152,577169,107131,785151,364151,364251,209
Brazil (N=212,559,409)
Confirmed02571581,47081,470887,1921,260,4441,245,787902,663724,670800,2731,340,095
Deaths002015805580530,28032,88128,90622,57115,93213,23621,829
Recovered0012735,80835,808581,7631,220,5361,259,7371,006,183730,387592,6411,251,042
Argentina (N=45,195,777)
Confirmed001054337412,42347,679126,772226,433333,266415,923257,609200,981
Deaths002719132176822365117827714,06577284515
Recovered002401016408016,69261,752217,415293,450379,294283,288169,449
Chile (N=19,116,209)
Confirmed02284214,858105,848155,84376,27456,05951,26547,26541,48757,230
Deaths00122158274634376918321452146612031198
Recovered00156842434,147198,50287,09855,55252,71050,05339,96250,778
United Kingdom (N=67,886,004)
Confirmed25938,754139,95678,76827,67719,57733,290117,763558,947618,940862,498
Deaths00245724,29710,7732952795315644441211,90015,077
Recovered08171680331180692436914667311909
France (N=65,273,512)
Confirmed59552,727114,47221,71013,05423,13493,789285,045808,678864,165400,792
Deaths02353020,847442610414223721346484015,99311,940
Recovered012950139,96318,99779265365502611,84224,46344,81832,229
Greece (N=10,423,056)
Confirmed041310127732649210685840815820,77666,02033,579
Deaths0049913517146012523517802432
Recovered005213220002430788211,388070,690
Taiwan (N=23,816,775)
Confirmed92928310713520212641120124
Deaths014110000000
Recovered093028310114322213250106
Thailand (N=69,799,978)
Confirmed172316091303127901391071522152243155
Deaths00104431001013
Recovered52331423422799369149105213219462
South Korea (N=51,269,183)
Confirmed103139663698872913471486584637072746801727,117
Deaths0161468623111923915160391
Recovered02753813664135011911620196564682691352815,068
India (N=1,380,004,385)
Confirmed12139433,466155,746394,8721,110,5071,995,1782,621,4181,871,4981,278,727803,865
Deaths00351119425411,99219,11128,77733,39023,43315,51011,117
Recovered03120894582,784256,060746,4621,745,5082,433,3192,218,3121,398,072970,695
Australia (N=25,459,700)
Confirmed91645342207436718936085391277499317513
Deaths001875101974562311911
Recovered293475384876422294311,3673434552266160
Egypt (N=102,334,403)
Confirmed017094827482743,32625,767486142594357835622,151
Deaths004634634619941852616509336384981
Recovered011561224122412,42321,17833,29123,565295832669387

aTotal population in 2020.

bThe number of recovered cases after December 14, 2020, were not recorded in the COVID-19 Data Repository database (the record only includes cases from December 1 to 14, 2020); therefore, this value was underreported.

Figure 2

The interface of the dynamic heat map with policy measures on the COVID-19 Pandemic AI System (CPAIS) website.

Deep learning and statistical learning models were used to enable COVID-19 forecasting. The function facilitates 14-day forecasting using four algorithms (Figure 3). ARIMA is the statistical learning model with time series regression; the other models are deep learning neural network algorithms with a single hidden layer, multiple hidden layers, or recurrent techniques. The performance of forecasting for each model for the 15 countries listed in Table 1 is shown in Table 2. A smaller error value indicates a better fit to the data, but the comparison between the different countries was not meaningful because they had different baselines based on their populations. For most of the countries, LSTM demonstrated better forecast accuracy with fewer errors than the other models. The performances of ARIMA, FNN, and the MLP neural network were not stable because their forecast accuracy was only competitive with LSTM for some specific countries. For example, LSTM demonstrated the best forecast accuracy for Canada. The RMSE, MAE, and MAPE were 2272.551, 1501.248, and 0.2723075, respectively. ARIMA (RMSE=317.53169; MAPE=0.4641688) and FNN (RMSE=181.29894; MAPE=0.2708482) demonstrated better performance for South Korea.
Figure 3

The COVID-19 Pandemic AI System (CPAIS) interface for machine learning prediction models facilitating 14-day COVID-19 forecasting. The plot shows the curve for deep learning modeling of total cumulative confirmed cases.

Table 2

Forecasting performance for each model in the validation set for the 15 countries.

Country (total population^a) and methods | Mean error^b | Root mean square error^b | Mean absolute error^b | Mean percentage error^b | Mean absolute percentage error^b
United States (N=329,466,283)
ARIMAc–183,472.5153229,501.345183,888.691–0.95382650.9562102
FNNd–197,967.69975251,014.19201,574.807–1.0279881.048648
MLPe34,016.7158945,932.60935,569.5610.17748210.1862749
LSTMf–17,670.38 41,667.98 g 31,092.06 –0.09409045 0.1664009
Canada (N=37,855,702)
ARIMA–3786.814634953.76593786.8146–0.68283420.6828342
FNN–1902.82187733146.81612133.5721–0.35030410.3898707
MLP–6056.71044307294.19336056.7104–1.0946431.094643
LSTM306.1702 2272.551 1501.248 0.04896196 0.2723075
Mexico (N=127,792,286)
ARIMA–3776.62376281.9874841.25440.35012431.2391347
FNN–15,894.20024119,622.06616,156.1290–1.1455241.165534
MLP–3551.3816356534.1195455.281–0.25176120.3969063
LSTM–1137.118 2883.836 2334.178 –0.08386455 0.1716616
Brazil (N=212,559,409)
ARIMA–52,913.866169,053.9554,328.55–0.70321640.7228866
FNN–168,251.54394204,577.061168,251.544–2.2406812.240681
MLP–28,723.3393843,395.96531,117.856–0.37972250.412664
LSTM–2746.457 16,085.02 14,347.73 –0.03768765 0.1931052
Argentina (N=45,195,777)
ARIMA10,240.49591212,832.603510,240.49590.64339340.6433934
FNN22,285.96240426,555.12822,285.96241.4020421.402042
MLP10,914.14327513,689.553910,929.68740.68577690.6867919
LSTM1253.045 3920.961 3202.607 0.07803485 0.2024643
Chile (N=19,116,209)
ARIMA1823.552161992.351823.55220.30485020.3048502
FNN8171.77230609157.98818171.77231.3639511.363951
MLP2169.7023072435.45402169.70230.36226280.3622628
LSTM595.9308 790.8397 648.5224 0.1001373 0.1090634
United Kingdom (N=67,886,004)
ARIMA40,161.748155,436.73541,580.21551.70539441.776331
FNN–17,129.95094323,936.14417,129.951–0.73045110.7304511
MLP81,031.84102,155.323881,031.8413.4821553.482155
LSTM15,560.98 17,735.29 15,560.98 0.6832804 0.6832804
France (N=65,273,512)
ARIMA1807.5070 8181.384 6633.665 0.07287266 0.2565254
FNN61,075.9902367,684.57561,075.9902.3408442.340844
MLP9601.59485111,456.38210,239.3080.37266480.3969022
LSTM6262.6939254.2647784.8040.2415490.3000627
Greece (N=10,423,056)
ARIMA5423.21436072.07735423.21434.0033384.003338
FNN–21.8694361 561.98452 400.61927 –0.01977488 0.2937978
MLP–1145.1654051341.15961145.1654–0.8443990.844399
LSTM–512.1191565.7909512.1191–0.38215590.3821559
Taiwan (N=23,816,775)
ARIMA–15.9743447717.28850115.97434–2.03799692.037997
FNN–6.5710071467.3796796.571007–0.846062320.8460623
MLP–9.48517912.9252389.9162023–1.20057061.257011
LSTM–2.059649 3.322996 2.978151 –0.3227033 0.3820354
Thailand (N=69,799,978)
ARIMA1471.0821531620.870091471.08215323.784223823.784224
FNN1463.1099101611.2395731463.10991023.65952423.659524
MLP1517.219840661674.5850041517.21984124.516502524.516502
LSTM173.2286 308.695 202.2714 2.950519 3.435209
South Korea (N=51,269,183)
ARIMA–260.265311317.53169265.29603–0.45403950.4641688
FNN–75.7162332 181.29894 154.2065 –0.1226205 0.2708482
MLP–1138.03524761419.839111145.57606–1.9631961.978379
LSTM323.9709342.9156323.97090.59787930.5978793
India (N=1,380,004,385)
ARIMA19,113.7783421,947.37519,113.7780.18746880.1874688
FNN–10,156.962689 13,612.018 10,156.963 –0.09945817 0.09948717
MLP20,964.357626624,556.93620,964.3580.20557180.20055718
LSTM–13,037.6414,480.9113,037.64–0.1281780.1281378
Australia (N=25,459,700)
ARIMA26.960602030.4020826.960600.095420630.09542063
FNN187.8959192205.6998187.895920.66340380.6637038
MLP–15.6908569576.4818662.261210–0.054785760.2197826
LSTM5.898776 14.39023 11.91991 0.02086999 0.04212132
Egypt (N=102,334,403)
ARIMA2392.2857143239.047322392.285711.78445941.784459
FNN1944.55868802641.981681944.558691.450171.45017
MLP669.96030638936.05245669.960310.49886670.4988667
LSTM437.0412 500.6487 438.0092 0.3304228 0.3311979

aTotal population in 2020.

bFive commonly used measures for evaluation of forecasting include mean error, root mean square error (RMSE), mean absolute error (MAE), mean percentage error, and mean absolute percentage error (MAPE), according to the records of the latest 14 days in 2020. The RMSE, MAE, and MAPE are always positive values.

cARIMA: autoregressive integrated moving average.

dFNN: feedforward neural network.

eMLP: multilayer perceptron.

fLSTM: long short-term memory.

gThe values for best performances in each country are italicized.

Figure 4 presents descriptive statistics for specific countries. On the website, three countries can be simultaneously compared, and the period can be customized. Users can select the countries that are of interest to them and compare the COVID-19–related data. For each respective country, a line chart showing the number of confirmed cases, recoveries, and deaths per month is generated. In addition, a global comparison is also provided on the website.
Figure 4

The interface of descriptive statistics for selected countries with customization on the COVID-19 Pandemic AI System (CPAIS) website. CSV: comma-separated values.

Users can rank 171 countries based on five different parameters: (1) the number of confirmed cases, (2) confirmed cases by percentage of population, (3) the number of confirmed deaths, (4) confirmed deaths by percentage of population, and (5) declining trend. Figure 5 shows an example of how the top 20 countries can be ranked using confirmed cases by percentage of population. With regard to customization, the ranking function is flexible. The selected countries and specific time period can be changed by the user.
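Ranking by confirmed cases as a percentage of population amounts to a simple sort. The Python sketch below is an illustration only; the function name and the input layout (a mapping from country name to a pair of confirmed cases and population) are hypothetical, and it does not reproduce the website's implementation.

```python
def rank_by_case_share(countries, top=20):
    """Rank countries by confirmed cases as a percentage of population.

    `countries` maps a country name to (confirmed_cases, population).
    Returns up to `top` (name, percentage) pairs, highest percentage first.
    """
    scored = [
        (name, 100.0 * confirmed / population)
        for name, (confirmed, population) in countries.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]
```

Normalizing by population changes the order relative to raw counts: a small country with few cases in absolute terms can rank above a large country once both are expressed as a share of the population.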
Figure 5

The interface for the ranking of selected countries with customization on the COVID-19 Pandemic AI System (CPAIS) website.

Discussion

Principal Findings

Combining data on COVID-19 incidence with data on policy measures makes it possible to examine the relationship between the progression of the pandemic and governmental epidemic prevention efforts. The CPAIS can help users assess whether policy measures are successful in preventing COVID-19 transmission. According to a report published by the Lowy Institute for International Policy [53], which ranked countries by their performance in managing the COVID-19 pandemic, New Zealand, Vietnam, and Taiwan had the three highest average scores across the report's six indicators. Moreover, New Zealand and Taiwan controlled the COVID-19 outbreak without international financial support (Figures S1-S3 in Multimedia Appendix 1). Specifically, New Zealand immediately implemented infection control and closure policies and adapted its measures flexibly; Taiwan enforced strict guidelines on international travel, which not only contributed to infection control but also rendered the strict measures described in the containment and closure policies unnecessary. Furthermore, both countries made great efforts to maximize testing and contact tracing during 2020. In this regard, both countries are outstanding examples. The heat maps in the CPAIS illustrate time-dependent fluctuations in the measures and help users monitor variations in, and the effects of, policy measures in each country.

Several time series learning techniques were used for forecasting. Both statistical learning and deep learning models performed well for different countries. Although the error values are not absolute, they are comparable between countries with different total populations. Compared with the results of a past study [19], performance for the same model and country was better in this study because our system included more extensive time series data.
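
To make the evaluation concrete, the sketch below holds out the last 14 observations of a series and computes the usual forecast error measures (mean error, RMSE, MAE, mean percentage error, and MAPE). It is an illustration under stated assumptions rather than the code behind the CPAIS: the function names are hypothetical, the sign convention (actual minus predicted) and the scaling of percentage errors by 100 are assumptions, and the numbers are made up. Percentage errors such as MAPE are scale-free, which is one way errors can be compared between countries with very different case counts.

```python
import math


def split_holdout(series, horizon=14):
    """Split a series into a training part and the last `horizon` points."""
    if len(series) <= horizon:
        raise ValueError("series must be longer than the hold-out horizon")
    return series[:-horizon], series[-horizon:]


def forecast_errors(actual, predicted):
    """Return ME, RMSE, MAE, MPE, and MAPE for paired observations.

    Errors are defined as actual minus predicted (an assumed convention);
    percentage errors are undefined when an actual value is zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": 100.0 * sum(e / a for e, a in zip(errors, actual)) / n,
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


# Made-up numbers: a 30-point series and a short hand-written forecast.
train, valid = split_holdout(list(range(30)), horizon=14)
scores = forecast_errors(actual=[100, 200, 400], predicted=[110, 190, 400])
```

RMSE and MAE are expressed in the units of the series (here, case counts), whereas MPE and MAPE are relative to the observed values.
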
In addition, 14-day COVID-19 trend forecasting can serve as a useful alert that helps governments and experts reduce the incidence of COVID-19. Furthermore, different AI learning techniques have unique advantages. According to the Wold decomposition theorem [34,54,55], the autoregressive moving average model is theoretically sufficient to describe a regular stationary time series. A nonstationary time series can be transformed into a stationary one, for example, by differencing. As noted earlier, ARIMA models have three components: autoregression, integration, and moving average. They are applied to data that show evidence of nonstationarity in the mean; an initial differencing step can be applied one or more times to eliminate the nonstationarity of the mean function, that is, the trend. We used the auto.arima function in R to choose the best model according to the Akaike information criterion, the corrected Akaike information criterion, or the Bayesian information criterion; the function conducts the model search within the order constraints provided. FNN is similar to ARIMA in that the fitted model is analogous to an autoregressive model of order p, AR(p), but with nonlinear functions; the data in this study were treated as nonseasonal. The model is therefore denoted as a neural network autoregression model, NNAR(p,k), where k is the number of hidden nodes. This is why ARIMA and FNN yielded similar forecast accuracy for some countries. Differences between the two models remain, however: only for FNN can the error be reduced by increasing the number of iterations, at the cost of longer training time. The capabilities of neural networks are attributable to their hierarchical, multilayered structure. Such a structure can represent features at different scales or resolutions and combine them into higher-order features.
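
The differencing step mentioned above can be illustrated with a short sketch. The helper names and the numbers are hypothetical; the point is only that differencing a cumulative case series once yields daily new cases, and differencing again removes a linear trend in those daily counts.

```python
def difference(series, order=1):
    """Apply first differencing `order` times: y[t] = x[t] - x[t-1]."""
    out = list(series)
    for _ in range(order):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


def undifference(first_value, diffs):
    """Invert one round of differencing, given the first original value."""
    out = [first_value]
    for d in diffs:
        out.append(out[-1] + d)
    return out


# Made-up cumulative confirmed cases over six days.
cumulative = [10, 25, 45, 70, 100, 135]

daily_new = difference(cumulative)            # trending upward
second_diff = difference(cumulative, order=2)  # constant, so the trend is gone
restored = undifference(cumulative[0], daily_new)
```

In an ARIMA(p,d,q) model, d is the number of times this differencing is applied before the autoregressive and moving average parts are fitted; forecasts on the differenced scale are then cumulated back, as `undifference` does here.
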
After the learning process has been repeated for a sufficient number of training cycles, the network reaches a state in which the error term is sufficiently small. Generalization and fault tolerance are the two main characteristics of neural networks. First, neural networks permit generalization because they can classify unknown patterns together with known patterns that share the same distinguishing features. Second, neural networks are highly fault tolerant: because of their distributed nature, they can continue to function even if a significant fraction of neurons and interconnections fail. In general, increasing the number of hidden nodes may enhance prediction performance, and training several networks allows their outputs to be combined into an ensemble forecast. The core idea of LSTM lies in the cell state, the horizontal line that runs along the chain and carries information with it (Figure 6). LSTMs can remove information from or add information to the cell state; this is controlled by gates, which are pathways that selectively let information through. Each gate consists of a sigmoid neural net layer and a pointwise multiplication operation. LSTM networks are well suited to forecasting series data because there can be lags of unknown duration between events in a time series. Unlike traditional recurrent neural networks (RNNs), LSTM networks are designed to mitigate the vanishing gradient problem. Thus, LSTM has the advantages of being relatively insensitive to time intervals and of making fewer errors in prediction when compared to other methods.
Figure 6

Diagram of the long short-term memory neural network with three functional gates.
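
As a rough illustration of how the three gates act on the cell state, the sketch below runs one time step of a scalar LSTM cell in its commonly used formulation (forget, input, and output gates). It is not the network trained in this study: the parameter layout, the names, and the zero-valued weights are assumptions chosen to keep the arithmetic easy to follow.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell.

    `params` maps each of "forget", "input", "output", and "candidate"
    to a tuple (w_x, w_h, b) of input weight, recurrent weight, and bias.
    Returns the new hidden state h and the new cell state c.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))      # how much old state to keep
    i = sigmoid(pre_activation("input"))       # how much new content to add
    o = sigmoid(pre_activation("output"))      # how much state to expose
    g = math.tanh(pre_activation("candidate"))  # proposed new content

    c = f * c_prev + i * g   # pointwise update of the cell state
    h = o * math.tanh(c)     # gated output
    return h, c


# With all weights and biases at zero, every gate is exactly half open
# and the candidate content is zero, so the cell state is simply halved.
zero_params = {name: (0.0, 0.0, 0.0)
               for name in ("forget", "input", "output", "candidate")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
```

Because the cell state is updated additively (`f * c_prev + i * g`) rather than being passed through a squashing function at every step, gradients can flow across many time steps, which is the usual explanation for why LSTMs suffer less from vanishing gradients than simple RNNs.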

In the CPAIS, long-term cumulative records of confirmed cases, recoveries, and deaths are included. In addition, daily figures for these metrics are provided for each month, so short-term trends can also be examined. Users can compare three or more countries and visualize the relative incidence of COVID-19 within a specific time period. Short-term and long-term trends can be viewed simultaneously. In previous studies [14,19,20], only a limited number of countries were included for forecasting. Our system covers 171 countries and provides information about policy measures. Further, data visualization, statistical and deep learning for incidence forecasting, and customized ranking are available. Users can select countries and time periods according to their objectives; for example, similar cultural backgrounds, geographical proximity, or frequent trade between countries may make a particular comparison attractive. In particular, a declining-trend ranking is calculated by our system to explore the effectiveness of COVID-19 management strategies implemented in 2020. Thus, the CPAIS is a comprehensive AI-based service that is available on the internet. It relies on big data and offers data visualization, deep learning–based prediction, and customized comparison. This system can be used to investigate COVID-19 progression trends.

Limitations

To the best of our knowledge, this is the first web-based machine learning system that can explore variations, trends, and forecasts related to the COVID-19 pandemic across 171 countries. This pilot system still has several limitations. First, the system relies heavily on the source databases and shares their limitations. For example, the source databases did not account for COVID-19 patients who were traveling internationally, which may result in inaccurate analysis for a small number of countries. However, we think that the number of such patients is small, as most countries required a negative COVID-19 test result or proof of vaccination before allowing travelers to enter. Second, the CPAIS cannot be updated daily if the source databases are not updated. For example, the number of recoveries in the United States was last updated on December 14, 2020; therefore, the number of recoveries in the United States may not be accurate. Finally, since the main purpose of this platform is to consolidate raw data retrieved from various databases together with associated measures of pandemic policy implementation, we advise readers to use text mining, local reports, and information retrieved from the medical system of a given country for further assessment.

Conclusions

In general, the CPAIS collects and summarizes information about the COVID-19 pandemic and offers data visualization and deep learning–based prediction. It may be a useful reference resource for any serious outbreak or epidemic that may occur in the future. In addition, information about the vaccine is stored in our system and may be used to evaluate the efficacy of the vaccine in different countries in the future. Moreover, the 2-week machine learning forecasts may serve as warning signs and highlight current trends in the epidemic that have been made apparent by AI techniques. To conclude, the CPAIS can be used to summarize several factors that can influence the effectiveness of epidemic prevention and to predict the next serious outbreak.