
A data-driven eXtreme gradient boosting machine learning model to predict COVID-19 transmission with meteorological drivers.

Md Siddikur Rahman, Arman Hossain Chowdhury.

Abstract

The COVID-19 pandemic has become a major global public health concern. Examining the meteorological risk factors and accurately predicting the incidence of COVID-19 is an extremely important challenge. Therefore, in this study, we analyzed the relationship between meteorological factors and COVID-19 transmission in SAARC countries. We also compared the predictive accuracy of the Autoregressive Integrated Moving Average with exogenous variables (ARIMAX) and eXtreme Gradient Boosting (XGBoost) methods for precise modelling of COVID-19 incidence. We compiled a daily dataset including confirmed COVID-19 case counts, minimum and maximum temperature (°C), relative humidity (%), surface pressure (kPa), precipitation (mm/day) and maximum wind speed (m/s) from the onset of the disease to January 29, 2022, in each country. The data were divided into training and test sets. The training data were used to fit the ARIMAX model to identify significant meteorological risk factors. All significant factors were then used as covariates in the ARIMAX and XGBoost models to predict COVID-19 confirmed cases. We found that maximum temperature had a positive impact on COVID-19 transmission in Afghanistan (β = 11.91, 95% CI: 4.77, 19.05) and India (β = 0.18, 95% CI: 0.01, 0.35). Surface pressure had a positive influence in Pakistan (β = 25.77, 95% CI: 7.85, 43.69) and Sri Lanka (β = 411.63, 95% CI: 49.04, 774.23). We also found that the XGBoost model can improve the prediction of COVID-19 cases in SAARC countries over the ARIMAX model. The study findings will help scientific communities and policymakers establish a more accurate early warning system to control the spread of the pandemic.


Year:  2022        PMID: 36099253      PMCID: PMC9469970          DOI: 10.1371/journal.pone.0273319

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.752


Introduction

The novel coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1, 2], has become a serious global public health threat. The disease has spread quickly around the world because it transmits so readily from human to human [3-5]. As of July 02, 2022, more than 553.87 million confirmed cases and over 6.36 million deaths had been reported globally [6]. Meteorological factors such as temperature, relative humidity and wind speed have previously been linked to the transmission of recognized coronavirus infections such as Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome Coronavirus (MERS-CoV) [7, 8]. According to laboratory tests, SARS-CoV-2 is very stable in cold conditions but vulnerable to rising temperatures [9]. Previous studies have also reported that meteorological factors such as temperature [10-12], humidity [13, 14], and wind speed [15] might affect COVID-19 transmission [16-18]. COVID-19 transmission has been reported to decline as temperature rises in China as well as in other regions of the world [13, 16, 18]. Wind speed was found to have lagged correlations with COVID-19 incidence in various Turkish cities [19]. Humidity was also a major meteorological factor in reducing COVID-19 transmission in China, Pakistan, Sri Lanka and other countries [11, 14, 20], and it was likewise negatively associated with the COVID-19 epidemic in Indonesia and New York [21, 22]. Studies have used a range of statistical approaches, including correlation, regression analysis, generalized additive models, and generalized linear models, to analyze the influence of environmental variables on COVID-19 transmission [5, 13, 19, 21–24].
In addition, several studies have used the Autoregressive Integrated Moving Average with exogenous variables (ARIMAX) model to determine the association of climate variables with COVID-19 transmission and to forecast it [23, 25, 26]. Time-series modelling is a popular forecasting method for understanding the dynamic association of important variables. However, COVID-19 transmission is often influenced by several factors with nonlinear effects, which cause problems for such models [27]. Machine learning techniques can address this problem [28, 29]. Given the uncertainty about when the disease will emerge and subside, short-term forecasting is crucial for better planning and more appropriate responses. eXtreme Gradient Boosting (XGBoost) is an increasingly popular machine learning technique in time series modelling. Because it learns directly from the data, the XGBoost model can produce high-precision results. This study contributes to the advancement of time-series prediction of COVID-19 and provides an initial benchmark that demonstrates the potential of machine learning for future research. The study further suggests that a data-driven XGBoost machine learning model can bring genuine novelty to COVID-19 prediction. To our knowledge, no study has yet used this technique to determine the association between meteorological factors and COVID-19 transmission and prediction. Therefore, the study aimed to: (a) identify the meteorological risk factors; and (b) compare the predictive accuracy of the ARIMAX and XGBoost models for precise modelling of COVID-19 incidence in the South Asian Association for Regional Cooperation (SAARC) countries. Our proposed methodology (Fig 1) and results are useful for selecting a suitable model for COVID-19 prediction.
The findings from this study will help policymakers in these countries adopt effective strategies and establish a more accurate early warning system to control the spread of the pandemic.
Fig 1

Schematic of the proposed methodology.

Materials and methods

Data source

The daily COVID-19 confirmed cases data of the SAARC countries (Afghanistan, Bangladesh, Bhutan, India, Maldives, Nepal, Pakistan, and Sri Lanka) were collected from the Johns Hopkins Coronavirus Resource Center [30]. The meteorological data of each country were obtained based on hourly meteorological observations from the NASA Langley Research Center (LaRC) website [31], including minimum and maximum temperatures (°C), relative humidity (%), maximum wind speed (m/s), surface pressure (kPa) and precipitation (mm/day). The study period was from the onset of COVID-19 to January 29, 2022, for each SAARC country.

Model building, prediction, and performance evaluation

Predictive modeling and statistical analyses were conducted using RStudio (Version 4.1.0) [32]. The ‘tseries’ and ‘stats’ packages were used to process the time series. The ARIMAX models were built with the ‘forecast’ package, using the auto.arima function to choose the best model based on the corrected Akaike Information Criterion (AICc) [33]. The ‘forecastxgb’ package was used to build the XGBoost model. Detailed data and the code needed for predictive modeling and statistical analysis are provided in the supplements (S1 Table and S1 Text). In this study, the predictive accuracy of the ARIMAX and XGBoost models was compared to determine which was more suitable for predicting COVID-19 confirmed cases in SAARC countries based on meteorological risk factors. The data were divided into training and test sets. All the significant meteorological factors were used as covariates in the ARIMAX and XGBoost models to predict COVID-19 confirmed cases. The XGBoost models, an ensemble machine learning technique, were built using the lagged meteorological variables as covariates, with several parameters adjusted repeatedly. The parameters adjusted for each SAARC country's model were nrounds, nrounds_method = ‘cv’, nfold, seas_method, and trend_method = ‘none’. Predictive accuracy refers to the capacity of the model to predict COVID-19 incidence. There are several metrics for computing a model’s accuracy [34]. In this study, we used four widely used performance metrics: the mean absolute percentage error (MAPE), mean percentage error (MPE), mean absolute error (MAE) and root mean square error (RMSE). The mathematical forms of the error measures are as follows:

$$\mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{e_t}{y_t}\right|, \qquad \mathrm{MPE} = \frac{100}{n}\sum_{t=1}^{n}\frac{e_t}{y_t}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|e_t\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}e_t^{2}}$$

where $n$ represents the number of observations, $e_t$ represents the error between the predicted and actual value at time $t$, and $y_t$ is the actual value at time $t$.
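The four error measures named above can be computed in a few lines. The following is a minimal Python sketch rather than the authors' R code (which is in S1 Text); the function name `forecast_errors`, the sign convention (error = actual minus predicted), the rule of skipping zero-count days for the percentage metrics, and the sample numbers are all assumptions made for illustration.

```python
import math


def forecast_errors(actual, predicted):
    """Return MAPE, MPE, MAE and RMSE for paired actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")

    # Assumed sign convention: error = actual - predicted.
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Percentage errors are undefined when the actual count is zero, so those
    # days are skipped here (an assumption, not the authors' stated rule).
    pct = [100.0 * e / a for e, a in zip(errors, actual) if a != 0]
    if not pct:
        raise ValueError("percentage errors need at least one non-zero actual value")
    mape = sum(abs(p) for p in pct) / len(pct)
    mpe = sum(pct) / len(pct)

    return {"MAPE": mape, "MPE": mpe, "MAE": mae, "RMSE": rmse}


# Hypothetical daily case counts, for illustration only.
metrics = forecast_errors([100, 200, 400], [110, 180, 400])
print(metrics)
```

Because the minimum daily count is zero in every country, the treatment of zero-count days matters for MAPE and MPE; MAE and RMSE are unaffected by it.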

ARIMAX model

The Autoregressive Integrated Moving Average (ARIMA) model introduced by Box and Jenkins (2013) is widely used for predicting time series data because of its capacity to handle non-stationary data [35]. ARIMA(p, d, q) combines the Autoregressive (AR) and Moving Average (MA) models, with the ‘I’ indicating integration; p stands for the autoregressive order, d for the differencing order, and q for the moving average order [36]. The AR(p) component expresses a variable’s future value as a linear combination of p prior observations plus a random error term, which can be written as

$$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \varepsilon_t \tag{1}$$

where $Y_t$ and $\varepsilon_t$ are the actual value and the random error term at time $t$, $\phi_i$ ($i = 1, 2, \dots, p$) are model parameters, $c$ is a constant, and the positive integer $p$ is the order of the model. The MA(q) model expresses the dependent variable in terms of previous errors:

$$Y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q} \tag{2}$$

where $\mu$ indicates the series mean, $\theta_j$ ($j = 1, 2, \dots, q$) are model parameters, and $q$ is the model’s order [37]. The ARIMA model may be stated in its basic form as

$$Y'_t = c + \phi_1 Y'_{t-1} + \dots + \phi_p Y'_{t-p} + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} + \varepsilon_t \tag{3}$$

where $Y'_t$ represents the differenced series (it may be differenced more than once); $\phi_1, \phi_2, \dots, \phi_p$ are the coefficients of the AR terms and $\theta_1, \theta_2, \dots, \theta_q$ are the coefficients of the moving average terms. The ARIMAX model is a generalization of the ARIMA model. It extends the ARIMA model by including meteorological information such as temperature, humidity, precipitation and other meteorological conditions in the time series model. An ARIMAX model can be formulated as follows:

$$y_t = \beta_0 + \beta_1 x_{1,t} + \dots + \beta_k x_{k,t} + \eta_t \tag{4}$$

where $y_t$ represents the response variable of the given time series; $x_{1,t}, \dots, x_{k,t}$ are the features or exogenous variables that potentially explain $y_t$; $\beta_0, \dots, \beta_k$ are the regression coefficients; and $\eta_t$ is the regression error, which follows the ARIMA model (Eq 3) [38].
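To make the ARIMAX structure concrete, the sketch below computes a one-step-ahead forecast for the simplest case that appears in this study, ARIMAX(1,1,0) with a single exogenous variable (the order reported for Maldives and Nepal). It is a hand-rolled Python illustration under stated assumptions, not the `forecast::auto.arima` fit used by the authors: the function name `arimax_110_forecast` is hypothetical, the coefficients `beta` and `phi` are made-up values supplied by the caller rather than estimated, and any constant level is assumed to be absorbed into the error series because of the first differencing.

```python
def arimax_110_forecast(y, x, x_next, beta, phi):
    """One-step-ahead forecast from a regression with ARIMA(1,1,0) errors.

    Assumed model:
        y_t = beta * x_t + eta_t
        (eta_t - eta_{t-1}) = phi * (eta_{t-1} - eta_{t-2}) + eps_t
    """
    if len(y) != len(x) or len(y) < 2:
        raise ValueError("y and x must have equal length of at least 2")

    # Regression residuals: the part of y not explained by the covariate.
    eta = [yt - beta * xt for yt, xt in zip(y, x)]

    # AR(1) on the first-differenced residuals, then undo the differencing.
    last_diff = eta[-1] - eta[-2]
    eta_next = eta[-1] + phi * last_diff

    return beta * x_next + eta_next


# Hypothetical values, for illustration only.
cases = [10.0, 12.0, 15.0]
covariate = [1.0, 2.0, 3.0]
forecast = arimax_110_forecast(cases, covariate, x_next=4.0, beta=2.0, phi=0.5)
print(forecast)
```

In practice both coefficients are estimated jointly from the training data; the sketch only shows how the regression part and the differenced AR part combine once they are known.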

XGBoost model

The XGBoost model is a supervised machine learning technique that has emerged in recent years as a method for time series forecasting [39, 40]. It uses an improved generalized gradient boosting library that can rapidly assess the value of all input attributes [41-43]. Boosting is a technique that combines hundreds of low-accuracy prediction models into a single high-accuracy model by repeatedly integrating the models under tolerable parameter values [44-46]. The objective function of the model is as follows:

$$\mathrm{Obj}^{(m)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(m-1)} + f_m(x_i)\right) + R(f_m)$$

where $y_i$ stands for the observed values, $x_i$ for the feature vector, $n$ for the sample size, $m$ for the number of iterations, and $f_m$ for the function (the new tree) that models the error at iteration $m$. $l$ stands for the loss function, which computes the deviation between the label and the sum of the forecast from the previous phase, $\hat{y}_i^{(m-1)}$, and the output of the new tree, and $R$ stands for the regularization term, which reduces the new tree’s output variation [37, 39, 47].
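The role of the regularization term is easiest to see for a single tree split. The sketch below is a plain Python illustration, not the `forecastxgb` implementation. It assumes the standard XGBoost formulation, in which the regularizer penalizes the number of leaves (weight `gamma`) and the squared leaf outputs (weight `lam`), and a squared-error loss of the form ½(y − ŷ)², whose gradient is ŷ − y and whose second derivative is 1. The text above does not specify the form of R or the loss, so both are assumptions, as are the function names and the toy data.

```python
def squared_error_grad_hess(y, y_pred):
    """Gradient and Hessian of 0.5 * (y - y_pred)**2 with respect to y_pred."""
    grads = [p - a for a, p in zip(y, y_pred)]
    hess = [1.0 for _ in y]
    return grads, hess


def leaf_weight(grads, hess, lam):
    """Optimal leaf output under an L2 penalty `lam` on leaf weights."""
    return -sum(grads) / (sum(hess) + lam)


def _score(grads, hess, lam):
    total = sum(grads)
    return total * total / (sum(hess) + lam)


def split_gain(grads, hess, left_mask, lam, gamma):
    """Reduction in the regularized objective from splitting one leaf in two."""
    gl = [g for g, m in zip(grads, left_mask) if m]
    hl = [h for h, m in zip(hess, left_mask) if m]
    gr = [g for g, m in zip(grads, left_mask) if not m]
    hr = [h for h, m in zip(hess, left_mask) if not m]
    return 0.5 * (_score(gl, hl, lam) + _score(gr, hr, lam)
                  - _score(grads, hess, lam)) - gamma


# Toy example: two low and two high targets, starting from a prediction of 0.
y = [1.0, 2.0, 9.0, 10.0]
grads, hess = squared_error_grad_hess(y, [0.0] * len(y))
mask = [True, True, False, False]
gain = split_gain(grads, hess, mask, lam=1.0, gamma=0.0)
left_w = leaf_weight(grads[:2], hess[:2], lam=1.0)
right_w = leaf_weight(grads[2:], hess[2:], lam=1.0)
print(gain, left_w, right_w)
```

A larger `lam` shrinks the leaf outputs toward zero and a larger `gamma` makes splits harder to justify, which is the sense in which the regularization term reduces the new tree's output variation.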

Results

As of 29 January 2022, India had reported the highest total of COVID-19 confirmed cases (41.1 million), with 0.5 million fatalities, whereas Bhutan had reported the fewest confirmed cases and fatalities among the SAARC countries (Table 1).
Table 1

Summary statistics of COVID-19 confirmed cases and deaths for SAARC countries till January 29, 2022.

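| Country | Cases: Min | Cases: Max | Cases: Mean ± SD | Cases: Total | Deaths: Min | Deaths: Max | Deaths: Mean ± SD | Deaths: Total |
|---|---|---|---|---|---|---|---|---|
| Afghanistan | 0 | 3243 | 228.50 ± 397.09 | 161,306 | 0 | 159 | 10.49 ± 18.95 | 7405 |
| Bangladesh | 0 | 16,230 | 2559 ± 3163.93 | 1,773,149 | 0 | 264 | 40.88 ± 53.22 | 28,329 |
| Bhutan | 0 | 205 | 6.57 ± 19.39 | 4566 | 0 | 1 | 0.005 ± 0.08 | 4 |
| India | 0 | 533,035 | 56,214 ± 85,201.04 | 41,092,522 | 0 | 4529 | 665.80 ± 904.03 | 486,718 |
| Maldives | 0 | 2813 | 192.30 ± 390.21 | 133,288 | 0 | 10 | 0.40 ± 0.98 | 274 |
| Nepal | 0 | 10,052 | 1287.20 ± 1937.83 | 947,394 | 0 | 619 | 15.90 ± 39.42 | 11,703 |
| Pakistan | 0 | 12,073 | 2011 ± 1758.87 | 1,417,991 | 0 | 313 | 41.49 ± 38.87 | 29,248 |
| Sri Lanka | 0 | 11,366 | 829.80 ± 1237.04 | 609,047 | 0 | 334 | 20.98 ± 42.64 | 15,400 |

Cases: daily confirmed cases; Deaths: daily deaths.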

Min: Minimum; Max: Maximum; SD: Standard Deviation.

The maximum temperature among the SAARC countries varied from -3.38°C (Nepal) to 47.01°C (India), and the minimum temperature varied from -26.17°C (Afghanistan) to 31.67°C (India). The highest average maximum temperature was observed in India (32.54°C). The highest level of humidity was observed in Maldives (100%) and the lowest in Afghanistan (5.06%). Bangladesh had the highest maximum wind speed at 10 m (15.68 m/s), whereas Bhutan had the lowest (1.62 m/s). The highest surface pressure was observed in Bangladesh (101.88 kPa) and the lowest in Nepal (66.08 kPa), as illustrated in Fig 2.
Fig 2

Boxplot of meteorological variables for SAARC countries.

Max. tem: Maximum temperature; Min. temp: Minimum temperature; Rel. humidity: Relative humidity; S. pressure: Surface pressure; Max. w. speed: Maximum wind speed.

The time series figure depicts the trend of COVID-19 confirmed cases from the onset of the disease to January 29, 2022, in each SAARC country. Daily confirmed cases in Bangladesh, Nepal and Pakistan fluctuated over different periods, including a sharp upward trend. The patterns in Afghanistan and Sri Lanka were remarkably similar, indicating a downward tendency. Overall, Bhutan and Maldives had comparatively lower rates of COVID-19 transmission than the other SAARC countries (Fig 3). The cross-correlation between COVID-19 confirmed cases and meteorological variables was computed at lags of 0 to 30 days. Only positive lags were considered, to explore the influence of meteorological factors on COVID-19 transmission over a certain period [48]. In Afghanistan, the maximum and minimum temperatures at lag 0 showed a significant relationship with COVID-19 confirmed cases. In India, maximum temperature showed a significant relationship at lag 4. Maximum wind speed showed a significant relationship in Bangladesh at a lag of 9 days and in Maldives at a lag of 13 days. Relative humidity at a lag of 26 days in Bhutan and a lag of 10 days in Nepal showed a significant correlation with COVID-19 confirmed cases. Surface pressure showed a significant correlation with COVID-19 confirmed cases in India at a lag of 9 days, in Sri Lanka at a lag of 13 days and in Pakistan at a lag of 28 days (Fig 4).
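The lagged cross-correlation screening described here can be sketched as follows. This is a minimal Python illustration, not the R routine the authors ran: it uses the usual sample estimator (full-series means, normalization by n) and correlates the weather value on day t with the case count on day t + k, so that only positive lags are produced. The function name and toy series are hypothetical, and the ±1.96/√n band is a common rule of thumb that is assumed here rather than taken from the paper.

```python
import math


def cross_correlation(weather, cases, max_lag):
    """Sample cross-correlation between weather[t] and cases[t + k], k = 0..max_lag."""
    n = len(weather)
    if n != len(cases) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")

    mean_w = sum(weather) / n
    mean_c = sum(cases) / n
    sd_w = math.sqrt(sum((v - mean_w) ** 2 for v in weather) / n)
    sd_c = math.sqrt(sum((v - mean_c) ** 2 for v in cases) / n)
    if sd_w == 0 or sd_c == 0:
        raise ValueError("series must not be constant")

    ccf = {}
    for k in range(max_lag + 1):
        cov = sum((weather[t] - mean_w) * (cases[t + k] - mean_c)
                  for t in range(n - k)) / n
        ccf[k] = cov / (sd_w * sd_c)
    return ccf


# Toy series: a pulse in the weather variable is followed by a pulse in cases 2 days later.
weather = [0, 0, 1, 0, 0, 0, 0, 0]
cases = [0, 0, 0, 0, 1, 0, 0, 0]
ccf = cross_correlation(weather, cases, max_lag=3)
best_lag = max(ccf, key=lambda k: abs(ccf[k]))
band = 1.96 / math.sqrt(len(weather))  # approximate 95% band (assumed rule of thumb)
print(best_lag, ccf[best_lag], band)
```

With daily data and lags of 0 to 30 days, the same loop would run with `max_lag=30`, and lags whose correlation falls outside the band would be candidates for the ARIMAX covariates.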
Fig 3

Time series plot showing the trend of COVID-19 confirmed cases for SAARC countries.

Fig 4

Cross-correlation between COVID-19 confirmed cases and meteorological variables in SAARC countries.

Max. temperature: Maximum temperature; Max. W. speed: Maximum Wind speed; CCF: Cross-Correlation Function.

The aforementioned meteorological factors were used as covariates in the ARIMAX model at their respective lags to determine their influence on COVID-19 confirmed cases. For example, in Afghanistan, the maximum and minimum temperatures at lag 0 were used as covariates for building the ARIMAX model. Similarly, for Bangladesh, Bhutan, India, Maldives, Nepal, Pakistan and Sri Lanka, the lagged variables were used as covariates, and the influence of those variables on the disease is shown in Table 2.
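Using a covariate "at lag k" means pairing each day's case count with the weather observed k days earlier, and the training and test sets must then be cut in time order rather than at random. The sketch below shows both steps in plain Python. The helper names are hypothetical, the data are made up, and the 80/20 split is only an illustrative assumption because the split proportion is not stated in the text.

```python
def lagged_pairs(cases, covariate, lag):
    """Pair cases on day t with the covariate observed `lag` days earlier."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if len(cases) != len(covariate):
        raise ValueError("series must have equal length")
    return [(covariate[t - lag], cases[t]) for t in range(lag, len(cases))]


def chronological_split(rows, train_fraction):
    """Split rows into training and test sets without shuffling."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Hypothetical 10-day series, for illustration only.
daily_cases = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
max_temp = [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]

rows = lagged_pairs(daily_cases, max_temp, lag=4)  # e.g. maximum temperature at lag 4
train, test = chronological_split(rows, train_fraction=0.8)  # assumed 80/20 split
print(rows[0], len(train), len(test))
```

The first `lag` days drop out because no earlier weather observation exists for them, so longer lags (such as 28 days) shorten the usable series slightly.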
Table 2

Estimated parameters with 95% confidence intervals of significant meteorological factors of ARIMAX models.

| Factor (lag in days) | Country | Model | β | 95% CI |
|---|---|---|---|---|
| Min. temperature (0) | Afghanistan | ARIMAX (3,1,0) | -8.93* | (-14.30, -3.56) |
| Max. temperature (0) | Afghanistan | ARIMAX (3,1,0) | 11.91* | (4.77, 19.05) |
| Max. temperature (4) | India | ARIMAX (2,1,0) | 0.18* | (0.01, 0.35) |
| Max. W. speed (9) | Bangladesh | ARIMAX (0,1,0) | -53.89* | (-93.45, -14.32) |
| Max. W. speed (13) | Maldives | ARIMAX (1,1,0) | -4.24* | (-8.31, -0.18) |
| Rel. humidity (10) | Nepal | ARIMAX (1,1,0) | -4.84* | (-9.20, -0.48) |
| Rel. humidity (26) | Bhutan | ARIMAX (5,1,0) | -0.12* | (-0.22, -0.02) |
| Surface pressure (9) | India | ARIMAX (2,1,0) | -1.91* | (-3.75, -0.06) |
| Surface pressure (13) | Sri Lanka | ARIMAX (5,1,0) | 411.63* | (49.04, 774.23) |
| Surface pressure (28) | Pakistan | ARIMAX (4,1,0) | 25.77* | (7.85, 43.69) |

Max. temperature: Maximum temperature; Min. temperature: Minimum temperature; Rel. humidity: Relative humidity; Max. W. speed: Maximum Wind speed;

* indicates significance at 5% level.

Table 2 shows that minimum temperature with a lag of 0 (i.e., same day) in Afghanistan (β = -8.93, 95% CI: -14.30, -3.56) negatively affected the transmission of COVID-19 cases. Maximum temperature with a lag of 4 days in India (β = 0.18, 95% CI: 0.01, 0.35) and with a lag of 0 (i.e., same day) in Afghanistan (β = 11.91, 95% CI: 4.77, 19.05) had a positive influence on the transmission of COVID-19 confirmed cases. Maximum wind speed with a lag of 9 days in Bangladesh (β = -53.89, 95% CI: -93.45, -14.32) and a lag of 13 days in Maldives (β = -4.24, 95% CI: -8.31, -0.18) negatively affected the transmission of COVID-19 confirmed cases. Relative humidity with a lag of 10 days in Nepal (β = -4.84, 95% CI: -9.20, -0.48) and a lag of 26 days in Bhutan (β = -0.12, 95% CI: -0.22, -0.02) negatively affected COVID-19 confirmed cases. Surface pressure positively affected COVID-19 confirmed cases in Pakistan (β = 25.77, 95% CI: 7.85, 43.69) with a lag of 28 days and in Sri Lanka (β = 411.63, 95% CI: 49.04, 774.23) with a lag of 13 days. Moreover, surface pressure with a lag of 9 days in India (β = -1.91, 95% CI: -3.75, -0.06) negatively affected the transmission of COVID-19 confirmed cases. The detailed results on the influence of meteorological factors on COVID-19 transmission are presented in Table 2. The average value of the error measures for the XGBoost model is lower than that for the ARIMAX models in all the SAARC countries (Fig 5). Hence, our study found that XGBoost performs better in predicting COVID-19 confirmed cases in most of the SAARC countries. The detailed procedure for fitting the ARIMAX and XGBoost models for COVID-19 confirmed case prediction is presented in S1 File.
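The significance flags in Table 2 can be checked directly from the reported intervals: a coefficient is significant at the 5% level when its 95% confidence interval excludes zero. The Python sketch below applies that check to the values reported above. It also backs out an implied standard error, which assumes symmetric Wald-type intervals of the form β ± 1.96·SE; that form is an assumption (the text does not say how the intervals were built), although the reported estimates do sit at the interval midpoints up to rounding.

```python
# (country, factor, lag in days, beta, lower, upper) as reported in Table 2.
TABLE2 = [
    ("Afghanistan", "Min. temperature", 0, -8.93, -14.30, -3.56),
    ("Afghanistan", "Max. temperature", 0, 11.91, 4.77, 19.05),
    ("India", "Max. temperature", 4, 0.18, 0.01, 0.35),
    ("Bangladesh", "Max. W. speed", 9, -53.89, -93.45, -14.32),
    ("Maldives", "Max. W. speed", 13, -4.24, -8.31, -0.18),
    ("Nepal", "Rel. humidity", 10, -4.84, -9.20, -0.48),
    ("Bhutan", "Rel. humidity", 26, -0.12, -0.22, -0.02),
    ("India", "Surface pressure", 9, -1.91, -3.75, -0.06),
    ("Sri Lanka", "Surface pressure", 13, 411.63, 49.04, 774.23),
    ("Pakistan", "Surface pressure", 28, 25.77, 7.85, 43.69),
]


def excludes_zero(lower, upper):
    """True when the interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0


def implied_standard_error(lower, upper, z=1.96):
    """Standard error implied by a symmetric interval beta +/- z * SE (assumed form)."""
    return (upper - lower) / (2 * z)


all_significant = all(excludes_zero(lo, hi) for *_, lo, hi in TABLE2)
max_midpoint_gap = max(abs((lo + hi) / 2 - beta) for *_, beta, lo, hi in TABLE2)
se_afg_max_temp = implied_standard_error(4.77, 19.05)
print(all_significant, max_midpoint_gap, se_afg_max_temp)
```

Several intervals come close to zero (for example India's maximum temperature and surface pressure), so these effects are significant at the 5% level but estimated with limited precision.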
Fig 5

Performance metrics of the ARIMAX and XGBoost models for predicting COVID-19 confirmed cases in SAARC countries.

Discussion

This study examined the effect of meteorological factors on the transmission of COVID-19 confirmed cases in SAARC countries. In South Asia, Bangladesh experiences subtropical monsoon weather, with annual average temperatures ranging from 26 to 36°C [49]. Afghanistan experiences hot, dry summers and cold winters: the highest summer temperature reaches up to 50°C, whereas winter temperatures fall to -25°C [50]. India has two distinct climate conditions: tropical monsoon weather and tropical wet and dry weather [51]. Pakistan has a wide range of typical temperatures, from 2°C to 38°C [52]. Sri Lanka’s tropical climate has distinct rainy and dry seasons. Its coastal regions have year-round temperatures of 28°C, whereas the highland regions experience lower, more moderate temperatures of 16°C to 20°C [53]. The vast elevational differences in Bhutan result in a diversified climate [54]. The climate of Nepal varies with altitude: subtropical with a rainy season in the southern flat strip, moderate in the low mountains, and cold in the Himalayan peaks [55]. The study found that meteorological factors have both positive and negative influences on the transmission of COVID-19 confirmed cases. For instance, maximum temperature had a positive influence on the transmission of COVID-19 confirmed cases in Afghanistan and India, which is consistent with some previous studies in the EU [56]. This study also found a negative impact of minimum temperature on COVID-19 transmission, which is in line with some previous studies conducted in China and the USA [10, 16, 57]. However, some previous studies conducted in Spain and China reported that temperature had no impact on COVID-19 transmission [58, 59]. We also found that relative humidity had a negative influence on the transmission of COVID-19 cases in Bhutan and Nepal, which is in line with some previous studies [10, 21, 22, 60].
We found that surface pressure had a positive influence on COVID-19 confirmed cases in Pakistan and Sri Lanka and a negative impact in India, which is also in line with some previous studies [61, 62]. Some studies, however, reported that surface pressure had no impact on COVID-19 transmission [25]. This study also found a statistically significant association of maximum wind speed with COVID-19 confirmed cases in Bangladesh and Maldives, which is similar to a previous study [63]. This study did not find any statistically significant association of precipitation with COVID-19 confirmed cases, whereas previous studies reported that precipitation is associated with the transmission of COVID-19 confirmed cases [64]. This paper evaluated the applicability of two popular models, ARIMAX and XGBoost, for predicting COVID-19 incidence in SAARC countries. The models showed promising results in predicting the time series without the assumptions that traditional epidemiological models require. Machine learning models, as an alternative to epidemiological models, showed potential for COVID-19 prediction. Considering the availability of only a small amount of training data, it is expected that machine learning will be developed further as the basis for, or a component of, future COVID-19 outbreak prediction models. The XGBoost model is an increasingly popular machine learning technique in time series modelling. The novelty of our study is that we predicted COVID-19 confirmed cases with the ARIMAX model and a data-driven eXtreme Gradient Boosting algorithm using the significant meteorological factors as covariates. The XGBoost technique offers several benefits for model forecasting, including no requirement for data preprocessing, complete feature extraction and high prediction accuracy. This study used this technique to predict COVID-19 confirmed cases using the significant meteorological variables because it imposes a heavier pruning penalty than a standard gradient boosting decision tree, which reduces the likelihood of overfitting [65]. The XGBoost model was developed by adjusting its different parameters. We selected the traditional ARIMAX model as the baseline for our study. The study found that the XGBoost model performs better in predicting COVID-19 confirmed cases in most of the SAARC countries. In this study, we used these models as a case study to find the significant relationships between meteorological factors and COVID-19 transmission, and compared the prediction accuracy of the models to determine the best one. The findings of this study may also be useful for other COVID-19-affected countries similar to the SAARC countries.

Limitations

This study used ARIMAX and XGBoost predictive models to investigate the impact of meteorological factors on COVID-19 transmission in SAARC countries. A limitation of the study is that other covariates, such as socioeconomic and demographic factors, healthcare facilities, human mobility and population density, were not incorporated. These covariates might be correlated with COVID-19 transmission and should be investigated further, subject to data availability.

Conclusion

This study shows that the machine learning-based XGBoost model performs better than the ARIMAX model in predicting COVID-19 incidence in SAARC countries. In the absence of effective COVID-19 prevention strategies, our proposed predictive model can help government authorities, researchers and planners put forward strategic plans to control the spread of COVID-19. Other nations could therefore adopt the suggested frameworks and prevention measures. By exploring the influence of meteorological risk factors on COVID-19 transmission, this work can help establish a more accurate early warning system and inform appropriate environmental policies to control the spread of the pandemic.

Time series COVID-19 data of SAARC countries along with meteorological variables from the onset of COVID-19 incidence to January 29, 2022. (XLSX)

R codes. (TXT)

(DOCX)

24 Jun 2022
PONE-D-22-14088
Identification of significant meteorological risk factors and accuracy comparison of ARIMAX and XGBoost models for COVID-19 prediction in SAARC Countries
PLOS ONE

Dear Dr. Rahman,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Aug 08 2022 11:59PM.

We look forward to receiving your revised manuscript.

Kind regards,
Randeep Singh
Academic Editor
PLOS ONE

Journal Requirements:

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming.
2. Please amend either the abstract on the online submission form (via Edit Submission) or the abstract in the manuscript so that they are identical.
3. Please ensure that you refer to Figure 1 in your text as, if accepted, production will need this reference to link the reader to the figure.
4. We note that Figure 2 in your submission contain [map/satellite] images which may be copyrighted. We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission.

Reviewers' comments:

Reviewer's Responses to Questions

1. Is the manuscript technically sound, and do the data support the conclusions?
Reviewer #1: Partly. Reviewer #2: Partly. Reviewer #3: Yes. Reviewer #4: Yes.

2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: No. Reviewer #2: Yes. Reviewer #3: Yes. Reviewer #4: Yes.

3. Have the authors made all data underlying the findings in their manuscript fully available?
Reviewer #1: Yes. Reviewer #2: Yes. Reviewer #3: Yes. Reviewer #4: Yes.

4. Is the manuscript presented in an intelligible fashion and written in standard English?
Reviewer #1: No. Reviewer #2: Yes. Reviewer #3: Yes. Reviewer #4: Yes.

5. Review Comments to the Author
You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This study found that meteorological factors had both positive and negative impacts on the transmission of COVID-19 confirmed cases. However, to conclude this, the study needs a rigorous regression analysis that controls for other essential covariates (demographic, socioeconomic, healthcare, human mobility, etc.). In Table 1, some values reported as significant at the 5% level are inconsistent with their confidence intervals. If a confidence interval contains zero, there is strong evidence that there is no 'significant' difference between the two population means. There are typographical or grammatical errors, including in the agent's name for COVID-19. Finally, the authors need to explain their research findings, as the evidence to date is inconsistent.

Reviewer #2:
Title: The title is not attractive or easily understandable. Abbreviations should be avoided in the title. The title is long, so please make it concise and attractive for lay readers as well.
Abstract: The methodology needs to be described in more detail, including how the data were analyzed. The conclusion is too general; it should state what specific actions need to be taken.
Introduction: This section is very short. How climate is related to COVID-19 needs more emphasis in the introduction.
Methodology: The data come from a secondary source, but this is not clearly stated. How and from where permission to use the data was obtained needs to be mentioned. The parameters used in the study need to be added. Whether ethical permission was obtained for this study should be stated.
Result: Interesting findings. However, why the results differ by country is a question that should be explained. In Afghanistan and Pakistan, minimum temperature had a negative effect on the transmission of COVID-19 cases, but the reverse was found in two other countries. Why is this so, and are there other factors involved? The degree of error in your study findings needs to be measured again. The findings do not address all three of the objectives you mentioned equally.
Conclusion: The conclusion is not compelling and should be rewritten without generic sentences, given how innovative this research is. What can be done in the future should be addressed in the conclusion.

Reviewer #3: The study aimed to identify significant meteorological risk factors and to compare the accuracy of ARIMAX and XGBoost models for COVID-19 prediction in SAARC countries. The study was well designed. The authors used sophisticated machine learning algorithms, and in my judgment they were executed accurately. The results were presented in a well-organized manner. The findings are well-grounded and will be very helpful for the scientific community working in the field of COVID-19. Therefore, I recommend this study for publication.

Reviewer #4: The paper is interesting; however, the key contribution is not clear. Please add a paragraph on the theoretical and practical significance of the research. Also, please copy-edit the entire manuscript. The references do not follow the proper guidelines. For example:

29. Hyndman RJ. AG. Forecasting: principles and practice. [cited 28 Feb 2022]. Available: https://books.google.com.bd/books?hl=en&lr=&id=_bBhDwAAQBAJ&oi=fnd&pg=PA7&dq=Forecasting+principles+and+practice&ots=Tii-tiZPFJ&sig=Q6Y58Rd860QSTBEi1BxeriFw4Z0&redir_esc=y#v=onepage&q=Forecasting principles and practice&f=false
30. Franses PH. Primary Demand for Beer in The Netherlands : An Application of ARMAX Model. 1991;xxvm: 240–245.
31. Lv CX, An SY, Qiao BJ, Wu W. Time series analysis of hemorrhagic fever with renal syndrome in mainland China by using an XGBoost forecasting model. BMC Infect Dis. 2021;21: 1–13. doi:10.1186/S12879-021-06503-Y/TABLES/5

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: Yes: Abu Sayeed Md. Abdullah
Reviewer #3: Yes: Ishtiaque Ahammad
Reviewer #4: Yes: Syed Far Abid Hossain

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

8 Jul 2022

Attached
Submitted filename: Response to Reviewers.docx

21 Jul 2022
PONE-D-22-14088R1
Time series prediction of COVID-19 transmission in SAARC countries using a data-driven XGBoost machine learning model
PLOS ONE

Dear Dr. Rahman,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Sep 04 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Randeep Singh
Academic Editor
PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.
Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed
Reviewer #4: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: No
Reviewer #2: Yes
Reviewer #4: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No
Reviewer #2: Yes
Reviewer #4: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes
Reviewer #2: No
Reviewer #4: Yes

**********

6. Review Comments to the Author

Reviewer #1: The key findings of this study are based on Table 2, which is not valid and is inconsistent with its description in the text. The authors failed to interpret the findings in Table 2. Most of the beta values are statistically insignificant but are described as significant findings in the manuscript. There is a close relationship between confidence intervals and significance tests. Specifically, if a statistic is significantly different from 0 at the 0.05 level, then the 95% confidence interval will not contain 0.

Reviewer #2: The title needs to be changed to make it more attractive. The abbreviation used in the title needs to be removed. Three objectives are too many for one article and might confuse the reader. The objectives need to be made easier to understand. How the study findings can guide the authorities needs to be mentioned in the conclusion of the abstract. The methodology is relatively complex to understand. The conclusion should include some recommendations.

Reviewer #4: The author(s) investigated time series prediction of COVID-19 transmission in SAARC countries using a data-driven XGBoost machine learning model. The revised version is good.

**********

7. PLOS authors have the option to publish the peer review history of their article. Do you want your identity to be public for this peer review?

Reviewer #1: Yes: Farid Uddin Ahmed
Reviewer #2: No
Reviewer #4: Yes: Dr. Syed Far Abid Hossain

**********

26 Jul 2022

Attached
Submitted filename: Response to Reviewers.docx

8 Aug 2022

A data-driven eXtreme Gradient Boosting machine learning model to predict COVID-19 transmission with meteorological drivers

PONE-D-22-14088R2

Dear Dr. Rahman,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date.
If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Randeep Singh
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes
Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes
Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes
Reviewer #2: No

**********

6. Review Comments to the Author

Reviewer #1: Interestingly and surprisingly, the authors have addressed the previous issues regarding the analysis. It would now be better if they clarified what the problem was: either with the data or with the analysis method!

Reviewer #2: The title can be revised to make it easier to understand. One main objective can be highlighted in the research. The conclusion of the study is not strong enough. The discussion section needs to be expanded to cover other factors that influence the epidemiological distribution of COVID-19. Your highlighted findings would be better supported by comparison with similar findings in other countries.

**********

7. PLOS authors have the option to publish the peer review history of their article. Do you want your identity to be public for this peer review?

Reviewer #1: Yes: Farid Uddin Ahmed
Reviewer #2: No

**********

2 Sep 2022

PONE-D-22-14088R2

A data-driven eXtreme Gradient Boosting machine learning model to predict COVID-19 transmission with meteorological drivers

Dear Dr. Rahman:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Randeep Singh
Academic Editor
PLOS ONE