
The practicality of Malaysia dengue outbreak forecasting model as an early warning system.

Suzilah Ismail1, Robert Fildes2, Rohani Ahmad3, Wan Najdah Wan Mohamad Ali3, Topek Omar4.   

Abstract

Dengue is a harmful tropical disease that causes death to many people. Currently, the dengue vaccine development is still at an early stage, and only intervention methods exist after dengue cases increase. Thus, previously, two scientific experimental field studies were conducted in producing a dengue outbreak forecasting model as an early warning system. Successfully, an Autoregressive Distributed Lag (ADL) Model was developed using three factors: the epidemiological, entomological, and environmental with an accuracy of 85%; but a higher percentage is required in minimizing the error for the model to be useful. Hence, this study aimed to develop a practical and cost-effective dengue outbreak forecasting model with at least 90% accuracy to be embedded in an early warning computer system using the Internet of Things (IoT) approach. Eighty-one weeks of time series data of the three factors were used in six forecasting models, which were Autoregressive Distributed Lag (ADL), Hierarchical Forecasting (Bottom-up and Optimal combination) and three Machine Learning methods: (Artificial Neural Network (ANN), Support Vector Machine (SVM) and Random Forest). Five error measures were used to evaluate the consistency performance of the models in order to ensure model performance. The findings indicated Random Forest outperformed the other models with an accuracy of 95% when including all three factors. But practically, collecting mosquito related data (the entomological factor) was very costly and time consuming. Thus, it was removed from the model, and the accuracy dropped to 92% but still high enough to be of practical use, i.e., beyond 90%. However, the practical ground operationalization of the early warning system also requires several rain gauges to be located at the dengue hot spots due to localized rainfall. Hence, further analysis was conducted in determining the location of the rain gauges. 
This has led to the recommendation that the rain gauges should be located about 3-4 km apart at the dengue hot spots to ensure the accuracy of the rainfall data to be included in the dengue outbreak forecasting model so that it can be embedded in the early warning system. Therefore, this early warning system can save lives, and prevention is better than cure.
© 2022 The Authors.

Keywords:  Early warning system; IoT; Machine learning; dengue; rainfall

Year:  2022        PMID: 36091345      PMCID: PMC9418377          DOI: 10.1016/j.idm.2022.07.008

Source DB:  PubMed          Journal:  Infect Dis Model        ISSN: 2468-0427


Introduction

Dengue is a harmful disease caused by a virus carried by Aedes aegypti and Aedes albopictus mosquitoes (Johari et al., 2019; Nor Aliza et al., 2019; Rohani et al., 2014). Every year, an estimated 100–400 million people are affected by dengue (WHO, 2021), and about 5% of cases result in death (World Mosquito Program, 2021). At present, a dengue vaccine is still at an early stage of development (Lim & Poh, 2018; Sheng-Qun et al., 2020). Thus, only intervention methods exist to combat dengue. In many countries, including Malaysia, the current intervention around a dengue outbreak is usually conducted after dengue cases have occurred, such as fogging the affected area with insecticide (MaHTAS, 2019; Song-Quan, 2016), but this conventional approach causes harm to the community. Consequently, many studies have been conducted to predict dengue outbreaks using mathematical models (either statistical or artificial intelligence or both) in attempts to create an early warning system to prevent dengue. Sanchez-Gendriz et al. (2022) developed a data-driven computational intelligence approach to forecasting the dengue outbreak at Natal, Brazil, using Regression and Neural Network models. Liu, Yin, et al. (2021) implemented Machine Learning Models (Artificial Neural Network and Support Vector Machine) in Guangzhou, China, by integrating them with environmental features. Patil and Pandya (2021) employed Regression, Machine Learning (Random Forest, Decision Trees, Support Vector Machine), and Time Series Models (Moving average, Exponential Smoothing, and ARIMA) in Maharashtra State, India, in forecasting dengue hotspots using meteorological parameters. Colon-Gonzalez et al. (2021) applied Bayesian spatiotemporal models in predicting dengue cases in Vietnam. Johansson et al. (2019) used probabilistic forecasting for dengue epidemics in Peru and Puerto Rico. Ong et al. (2018) used Random Forest to predict the risk of dengue transmission based on dengue, population, entomological and environmental data in Singapore. The outcomes of these studies highlighted the significance of weather data as the key driver and machine learning methods as the best performing dengue outbreak forecasting models.

In Malaysia, the search for a useful dengue outbreak forecasting model started 15 years ago with scientific experimental field studies to establish appropriate impactful factors, because many past studies had failed to establish sufficiently accurate models to be of practical value (Ang & Li, 2002; Cheah et al., 2006; Cheng, 2006; Fatimah et al., 2005; Luz et al., 2003; Seng et al., 2005, pp. 109–123; Usman, 2003). One reason was that they related a seasonal pattern to the epidemiological factor (dengue cases), whereas climate change has amplified the inconsistency of the rainfall season in Malaysia (Anyamba et al., 2006; Bartley et al., 2002; Cheah et al., 2006; Hales et al., 2002; Hartley et al., 2002; Sulaiman et al., 1996; Woodruff et al., 2006). Another reason was that, although some studies used two factors, epidemiological (dengue cases) and environmental (weather data), they relied on weather data from the Meteorological Department (secondary data), which jeopardized the accuracy of the forecasting model (Barbazan et al., 2002; Cheah et al., 2006; Kumarasamy, 2006; Ram et al., 1998; Rosa-Freitas et al., 2006) because of the localized rainfall in Malaysia. Thus, previously, two scientific studies were implemented using an experimental field design to establish factors determining dengue outbreaks in Malaysia, which were epidemiological (dengue cases data), entomological (mosquito-related data), and environmental (weather data) factors. These studies aimed to produce a dengue outbreak forecasting model suitable to be incorporated into an early warning system. 
Experimental field design was used to collect the primary data of the entomological and environmental factors to guarantee the accuracy of the obtained dengue outbreak forecasting model. Past studies have shown that using secondary weather data in the dengue prediction model has lowered accuracy, such as in Salim et al. (2021), who only obtained 70% and Jain et al. (2019), attaining 73%. The first experimental study was implemented from 2007 until 2009 in four high dengue cases housing areas in Malaysia (Rohani et al., 2011). But due to the small areas selected, only two factors (entomological and environmental) were significant. However, the study contributed important findings regarding the relationship between entomological (mosquito related data) and environmental (rainfall, temperature, and humidity); where it revealed that the previous week's rainfall (which was collected primarily by placing a mobile weather station in the housing areas) influenced the dengue mosquito population's increment. But the rainfall data (secondary data) obtained from the Meteorological Department based on their weather stations showed insignificant results. This is because Malaysia's rainfall pattern is very localized (Muhammad et al., 2020; Singh et al., 2022), meaning there are rains in certain areas but no rain just a few blocks away. The localized rain can jeopardize rainfall data quality (obtained from Meteorological Department). But rainfall is an important indicator in dengue outbreak forecasting model (Hii et al., 2012; Liu, Yin, et al., 2021; Ong et al., 2018; Patil & Pandya, 2021; Singh et al., 2022) because rainfall influence the Aedes mosquito population (the vector of dengue virus) to breed. Therefore, a practical solution is by placing several rain gauges (mobile weather stations) in the dengue hot spot areas, thereby enhancing the quality of the rainfall data, and potentially increasing the accuracy of the dengue outbreak forecasting model. 
The first study's findings and limitations were considered when implementing the second experimental field study in two dengue-prone areas in Malaysia (Rohani et al., 2018). Two larger areas were selected based on five consecutive years of high dengue cases: ovitraps were used to collect the Aedes larvae, and several mobile weather stations comprising rain gauges, temperature and humidity data loggers were installed in both areas. Findings similar to those of Rohani et al. (2011) were obtained regarding the relationship between the entomological and environmental factors. The findings also indicated that all three factors were significant, and in-depth relationships among them were established. An Autoregressive Distributed Lag (ADL) Model was developed to forecast dengue outbreaks with an accuracy of 85%. Hence, this study aims to develop a dengue outbreak forecasting model with at least 90% accuracy to be embedded in an early warning computer system using the Internet of Things (IoT) approach that is practical and cost-effective for ground operationalization of the system. An accuracy of at least 90% is essential to minimize the costs and ensure the benefits when the system is implemented in dengue hot spot areas. The last few years have seen considerable interest and recent success in developing new forecasting methods. Two areas have seen the greatest success: first, applying machine learning (ML) methods to time series forecasting problems such as the one outlined here (see Makridakis et al., 2018), and second, combining forecasts through hierarchical models. We, therefore, consider three standard ML methods and analyze their performance compared to a state-of-the-art statistical approach, whilst also examining the value that taking a hierarchical approach brings. 
Thus, the contribution of this study is twofold: to develop a forecasting method sufficiently accurate to be of value to those working on Dengue fever, and second, to examine the benefits of introducing more advanced forecasting methods into the field.

Methodology

We used the Rohani et al. (2018) time series data, which were collected weekly for eighty-one (81) weeks and involved three factors (epidemiological, entomological, and environmental) for two large areas in Malaysia (Selayang and Bandar Baru Bangi). The two areas were selected based on five consecutive years of high dengue cases. The epidemiological data (notified dengue cases, onset cases and the number of interventions) were obtained from the e-Dengue system developed by the Malaysia Ministry of Health. Notified dengue cases were defined as cases that met the clinical description of dengue and were notified to the nearest health office, while onset cases were dated to the first day of high fever related to dengue symptoms. The number of interventions was based on interventions conducted in the areas with dengue cases. Entomological data were the number of Aedes larvae collected weekly using 50 ovitraps in Selayang and 55 ovitraps in Bandar Baru Bangi, screened using the reverse transcriptase-polymerase chain reaction (RT-PCR) method to detect the dengue virus (positive PCR). This process was time consuming and very costly because every week all 105 ovitraps had to be collected and replaced with fresh ovitraps. The ovitraps were then brought to the laboratory, where the eggs/larvae were allowed to develop further for up to 5 days and 10 days respectively for Aedes identification, which requires a total of 15 days for each weekly batch. Next, the larvae were screened using RT-PCR to detect the type of dengue virus, with each RT-PCR costing around MYR100, a total cost of MYR200,000 (USD50,000) for 81 weeks. This large cost was justified by the deep understanding it established of the relationships among the three factors used as parameters in the dengue outbreak forecasting model. 
The environmental data were collected using ten and eleven mobile weather stations allocated at both areas, respectively, which consisted of rain gauges, temperature, and humidity data loggers. The Air Pollution Index (API) was obtained from Malaysia Environment Department. Refer to Rohani et al. (2018) for the details and description of the data collections. Based on Fig. 1, all the data were aggregated for level 0 described as the pooled data because the aim was to develop one dengue outbreak forecasting model for Malaysia. Total was calculated to pool the epidemiological data (notified dengue cases, onset cases and number of interventions) in both areas as well as the entomological data (Aedes larvae and number of positive PCR) and rainfall data. Next, the data were partitioned for model estimation/training (week 1 until week 70) and model evaluation/testing (week 71 until week 81) in measuring the model's performance.
Fig. 1

Hierarchical structure.

Six forecasting methods were used: the Autoregressive Distributed Lag (ADL) model, Hierarchical Forecasting (Bottom-up and Optimal combination), and Machine Learning (ML) methods (Artificial Neural Network (ANN), Support Vector Machine (SVM), and Random Forest). The ADL and ML forecasting methods were used because past studies (Chiung et al., 2018; Guo et al., 2017; Liu, Yin, et al., 2021; Ong et al., 2018; Patil & Pandya, 2021; Sanchez-Gendriz et al., 2022) showed their relevance in predicting dengue outbreaks. They had also proved their value in various forecasting evaluation studies (see e.g., Makridakis et al., 2018). Hierarchical Forecasting approaches (Bottom-up and Optimal combination) were also adopted: although the aim is to produce only one generic dengue outbreak forecasting model, improved accuracy may be expected from pooling the data of the two areas, as several studies have shown that pooling the data can increase the accuracy of the resulting model (Matthias & Feng, 2022; Petropoulos et al., 2022; Siriyasatien et al., 2018; Spiliotis et al., 2019; Zhao et al., 2020).

a. Autoregressive Distributed Lag (ADL)

The ADL model used in this study has the general form

$$y_t = \alpha_0 + \sum_{j} \alpha_j y_{t-j} + \sum_{i=1}^{10} \sum_{j} \varphi_{ij} x_{i,t-j} + \varepsilon_t$$

where $j$ is the lag length, $t = 1, 2, \ldots, T$ (time periods) and $y_t$ is the target variable, which is the notified dengue cases. $\{x_{1,t}, \ldots, x_{10,t}\}$ represent the ten predictors, which are epidemiological, entomological, and environmental variables. $\varepsilon_t$ are independent identically distributed random errors with mean zero and variance $\sigma^2$; $\alpha$ and $\varphi$ are unknown parameters to be estimated using Ordinary Least Squares (Mohd Alias, 2013).

Based on the Autoregressive Distributed Lag (ADL) models obtained in Rohani et al. (2018), the following variables contributed to the 85% accuracy, supported by the theoretical explanation of the dengue experts; these have been based on the Aedes mosquito life cycle related to weather, virus incubation period, dengue symptoms, and intervention taken by the authority when there were reported cases.

Dependent (Target Variable):
Notified Dengue Cases ($y_t$)

Independent Variables (Predictors):
Onset dengue cases (Onset)
Last week Onset dengue cases (Onset1)
Last 3 weeks Larvae (Larvae3)
Last 3 weeks Positive PCR (PCR3)
Last 4 weeks Rainfall (Rainfall4)
Last 3 weeks Minimum Temperature (MinTemp3)
Last 3 weeks Maximum Temperature (MaxTemp3)
Last 3 weeks Maximum Humidity (MaxHumid3)
Last 4 weeks Air Pollution Index (API4)
Next week Intervention (InterventionLD1)

In this study, the pooled data of these variables were used in the ADL model and also in the following forecasting methods to determine the best dengue outbreak forecasting model that can produce at least 90% accuracy.
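To illustrate how a model with lagged (and lead) predictors of this kind can be estimated by Ordinary Least Squares, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the helper names `shift` and `fit_ols` are hypothetical, the series are synthetic and noise-free, and only three of the ten predictors are shown.

```python
import numpy as np

def shift(x, k):
    """Shift a weekly series by k weeks.

    k > 0 returns the value k weeks earlier (a lag, e.g. Rainfall4 uses k=4);
    k < 0 returns the value |k| weeks later (a lead, e.g. InterventionLD1 uses k=-1).
    Positions without data are filled with NaN.
    """
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, np.nan)
    if k > 0:
        out[k:] = x[:-k]
    elif k < 0:
        out[:k] = x[-k:]
    else:
        out[:] = x
    return out

def fit_ols(y, columns):
    """OLS with an intercept; rows containing NaN are dropped before fitting."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y))] + list(columns))
    keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return beta

# Synthetic weekly data (illustrative only, not the study data).
rng = np.random.default_rng(0)
T = 81
onset = rng.poisson(20, T).astype(float)
rainfall = rng.gamma(2.0, 30.0, T)

# A noise-free target built from known coefficients, so OLS should recover them.
notified = 5.0 + 0.5 * onset + 0.1 * shift(onset, 1) + 0.05 * shift(rainfall, 4)

beta = fit_ols(notified, [onset, shift(onset, 1), shift(rainfall, 4)])
print(np.round(beta, 3))  # intercept, Onset, Onset1, Rainfall4
```

Because the longest lag is 4 weeks, the first four rows are dropped before estimation; with real data the remaining predictors would be added as further columns in the same way.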

b. Hierarchical Forecasting

There are four hierarchical forecasting methods: Top-down, Bottom-up, Middle out, and Optimal combination (Hyndman & Athanasopoulos, 2021). However, in this study, only Bottom-up and Optimal combination were used because the aim was to produce one general model for Malaysia by pooling the data, as explained and illustrated in Fig. 1. Top-down was excluded because it would generate a separate individual model for each of the two areas. Middle out was also excluded because it requires at least two hierarchical levels, and in this study there was only one level.

In the Bottom-up approach, two ADL models were estimated at the bottom level of the hierarchical structure (level 1), as in Fig. 1. The forecast values of the two models were then added together to produce the level 0 forecast values. In the Optimal combination approach, forecast values were produced at level 1 and level 0 using three ADL models (i.e., two for the individual areas' data and one using the pooled data); these are the base forecast values. Next, the three sets of base forecast values were used in the following equation to produce coherent forecasts (Hyndman & Athanasopoulos, 2021):

$$\tilde{y}_h = S G \hat{y}_h$$

Therefore, the optimal reconciled forecasts are given by

$$\tilde{y}_h = S \left( S' W_h^{-1} S \right)^{-1} S' W_h^{-1} \hat{y}_h, \qquad W_h = k_h \hat{W}_1 \text{ for all } h,$$

where $k_h > 0$ and

$$\hat{W}_1 = \frac{1}{T} \sum_{t=1}^{T} \hat{e}_t \hat{e}_t'$$

where $\tilde{y}_h$ is the vector of coherent forecasts, $\hat{y}_h$ is the vector of base forecasts, $G$ is a matrix that maps the base forecasts into the bottom level, the summing matrix $S$ sums these up using the aggregation structure to produce a set of coherent forecasts, $W_h$ is the forecast error variance of the $h$-step-ahead base forecasts, and $\hat{e}_t$ is the vector of residuals of the models that generated the base forecasts, stacked in the same order as the data.
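The two reconciliation approaches described above can be sketched for this two-area hierarchy as follows. This is an illustrative NumPy sketch under stated assumptions, not the authors' code: the base forecasts and residuals are made-up numbers, the function names are hypothetical, and the series are ordered as (pool, Selayang, Bandar Baru Bangi).

```python
import numpy as np

# Summing matrix: rows are (pool, Selayang, Bandar Baru Bangi),
# columns are the two bottom-level (area) series.
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

def bottom_up(base):
    """Keep the two area forecasts and add them to obtain the pooled forecast."""
    G = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
    return S @ G @ base

def estimate_w1(residuals):
    """Sample covariance (1/T) * sum_t e_t e_t' from a (T, 3) array of residuals."""
    residuals = np.asarray(residuals, dtype=float)
    return residuals.T @ residuals / residuals.shape[0]

def optimal_combination(base, W):
    """Reconcile base forecasts using G = (S' W^-1 S)^-1 S' W^-1."""
    W_inv = np.linalg.inv(W)
    G = np.linalg.solve(S.T @ W_inv @ S, S.T @ W_inv)
    return S @ G @ base

# Hypothetical, incoherent base forecasts: 100 != 40 + 55.
base = np.array([100.0, 40.0, 55.0])

# Hypothetical in-sample residuals of the three base models (70 training weeks).
rng = np.random.default_rng(1)
residuals = rng.normal(size=(70, 3))

bu = bottom_up(base)
identity_w = optimal_combination(base, np.eye(3))            # simplification: W = I
estimated_w = optimal_combination(base, estimate_w1(residuals))

print(np.round(bu, 3))
print(np.round(identity_w, 3))
print(np.round(estimated_w, 3))
```

In every case the reconciled pooled forecast equals the sum of the two reconciled area forecasts; the approaches differ only in how the 5-case discrepancy in the base forecasts is distributed across the three series.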

c. Machine Learning Methods

Three methods were used in this study: Artificial Neural Network (ANN), Support Vector Machine (SVM) and Random Forest. ANN adopts the Multi-Layer Perceptron algorithm together with the back propagation method (Blokdyk, 2020, p. 305). SVM was applied for regression with a polynomial kernel (Kowalczyk, 2017). Random Forest constructs multiple decision trees with bootstrap aggregation (Geneur & Poggi, 2020). All these ML methods were run in WEKA (Brownlee, 2020).

After obtaining the six estimated models from the forecasting methods, forecast errors were generated on the out-of-sample data (weeks 71–81, as stated earlier for model evaluation), and the following error measures (Koutsandreas et al., 2022; Chen et al., 2017) were used to evaluate the performance of the models. In this application, several error measures were required in order to determine the performance consistency of the forecasting models (Davydenko & Fildes, 2013), because there are multiple potential users and applications of the final forecasting model who do not necessarily share the same loss functions. In the definitions below, $e_t = y_t - \hat{y}_t$ is the out-of-sample forecast error of the model being evaluated and $n$ is the number of out-of-sample forecasts.

a. Geometric Mean Relative Absolute Error,

$$\mathrm{GMRAE} = \left( \prod_{t=1}^{n} \left| \frac{e_t}{e_t^{*}} \right| \right)^{1/n}$$

where $e_t^{*}$ is the out-of-sample forecast error obtained from the Random Walk Model $y_t = y_{t-1} + \varepsilon_t$, and $\varepsilon_t$ is a white noise variable with zero mean and constant variance $\sigma^2$.

b. Relative Mean Absolute Error,

$$\mathrm{RelMAE} = \frac{\mathrm{MAE}}{\mathrm{MAE}^{*}}$$

where $\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} |e_t|$ and $\mathrm{MAE}^{*}$ is the out-of-sample MAE obtained from the same Random Walk Model.

c. Mean Absolute Scaled Error,

$$\mathrm{MASE} = \frac{\mathrm{MAE}}{\mathrm{MAE}^{*}_{\text{in}}}$$

where $\mathrm{MAE}^{*}_{\text{in}}$ is the in-sample MAE obtained from the same Random Walk Model.

d. Unscaled Mean Bounded Relative Absolute Error,

$$\mathrm{UMBRAE} = \frac{\mathrm{MBRAE}}{1 - \mathrm{MBRAE}}$$

where $\mathrm{MBRAE} = \frac{1}{n} \sum_{t=1}^{n} \frac{|e_t|}{|e_t| + |e_t^{*}|}$ and $e_t^{*}$ is the out-of-sample forecast error obtained from the same Random Walk Model.

e. Mean Absolute Percentage Error,

$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{e_t}{y_t} \right|$$

MAPE was used to calculate the percentage accuracy of the model (100 − MAPE).

These forecast errors were calculated for forecasts up to 4 weeks ahead (Lead 1, 2, 3 and 4), and the median was used to summarize each error measure across the 4 leads. Next, the medians were ranked from 1 to 6 to determine the best forecasting model.
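As a rough illustration of how the five error measures relate to one another, the sketch below computes them with NumPy for a small made-up example. The function name `error_measures` and all numbers are hypothetical; the benchmark is the one-step Random Walk forecast (the previous week's observed value), and the sketch assumes no forecast error or actual value is exactly zero.

```python
import numpy as np

def error_measures(actual, forecast, naive, train):
    """Compute GMRAE, RelMAE, MASE, UMBRAE and MAPE.

    actual   : out-of-sample observations
    forecast : out-of-sample forecasts of the model being evaluated
    naive    : out-of-sample Random Walk forecasts (benchmark)
    train    : in-sample observations, used only for the MASE scaling term
    Assumes all errors and actual values are non-zero.
    """
    actual, forecast, naive, train = (
        np.asarray(a, dtype=float) for a in (actual, forecast, naive, train)
    )
    e = np.abs(actual - forecast)      # |e_t|
    e_star = np.abs(actual - naive)    # |e_t*|
    mae = e.mean()
    mbrae = np.mean(e / (e + e_star))
    return {
        "GMRAE": float(np.exp(np.mean(np.log(e / e_star)))),
        "RelMAE": float(mae / e_star.mean()),
        "MASE": float(mae / np.mean(np.abs(np.diff(train)))),
        "UMBRAE": float(mbrae / (1.0 - mbrae)),
        "MAPE": float(100.0 * np.mean(e / np.abs(actual))),
    }

# Made-up weekly case counts (illustrative only).
train = [10, 12, 15, 13]
actual = [14, 16, 15]
naive = [13, 14, 16]          # previous week's observed value
forecast = [13.5, 15, 15.5]

m = error_measures(actual, forecast, naive, train)
accuracy = 100.0 - m["MAPE"]  # percentage accuracy as defined in the text
print({k: round(v, 4) for k, v in m.items()}, round(accuracy, 2))
```

In this example the model's errors are exactly half the Random Walk errors at every step, so GMRAE, RelMAE and UMBRAE all equal 0.5; values below 1 indicate that the model beats the Random Walk benchmark.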

Results and discussion

Fig. 2 shows 81 weeks of notified dengue cases of Selayang, Bandar Baru Bangi, and aggregated (pool) cases for both areas. Selayang and Bandar Baru Bangi have quite similar patterns, although the numbers of cases were more in Bandar Baru Bangi as compared to Selayang. Rohani et al. (2018) has established a relationship between notified dengue cases and three factors which were the epidemiological, entomological, and environmental using correlation and an Autoregressive Distributed Lag (ADL) model.
Fig. 2

Notified dengue cases for Selayang, Bandar Baru Bangi and pool data.

Table 1 displays the correlations between the notified dengue cases (target variable) and the predictors for Selayang, Bandar Baru Bangi, and the pooled data. In the pooled data, notified dengue cases were significantly related to all the predictors: this week's and last week's onset cases; the last 3 weeks' larvae; the last 3 weeks' positive PCR; the last 3 weeks' minimum and maximum temperature; the last 3 weeks' maximum humidity; the last 4 weeks' rainfall; and the last 4 weeks' air pollution index (API). Notified cases were also related to next week's intervention.
Table 1

Correlations between notified dengue cases (target variable) and predictors.

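| Area | Onset | Onset1 | API4 | Larvae3 | PCR3 | Rainfall4 | MinTemp3 | MaxTemp3 | MaxHumid3 | InterventionLD1 |
|---|---|---|---|---|---|---|---|---|---|---|
| Selayang | 0.782a | 0.814a | −0.637a | 0.847a | 0.537a | 0.678a | −0.453a | −0.452a | 0.674a | 0.533a |
| Bandar Baru Bangi | 0.819a | 0.811a | −0.393a | 0.747a | 0.407a | 0.678a | −0.379a | −0.474a | 0.656a | 0.304a |
| Pool | 0.837a | 0.871a | −0.415a | 0.797a | 0.465a | 0.665a | −0.485a | −0.518a | 0.649a | 0.260∗ |

Entries are correlations of each predictor with notified dengue cases (target variable). a Significant at 1%. ∗ Significant at 5%.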

Based on the correlations obtained in Table 1, Fig. 3 displays the conceptual relationship between the target variable and predictors according to the interconnection of the three factors: epidemiological, entomological, and environmental. The relationship was established by dengue experts by connecting the variables with the Aedes mosquito life cycle in relation to weather, the virus incubation period, dengue symptoms, and the intervention taken by the authority when cases were reported. Rainfall plays an important role in the Aedes mosquito life cycle by providing potential breeding sites for the mosquito to lay eggs and develop to the adult stage (Banulata et al., 2021; Malinda et al., 2012; Masnita et al., 2018; Seidahmed & Eltahir, 2016), but long-term air pollution (API) due to haze may interfere with the life cycle of the Aedes mosquito and thus reduce its population (Ahmad Khairul Hakim et al., 2020; Carneiro et al., 2017; Rohani et al., 2018; Wilder-Smith et al., 2010). Next, the eggs hatch into larvae within 2–4 days, depending on temperature and humidity. Lower minimum temperature (21–22 °C), higher minimum temperature (26 °C and above) and higher maximum temperature (36 °C and above) decreased the number of larvae, but higher maximum humidity (90% and above) increased the number of larvae. However, the temperature and humidity must be supported by the amount of rain to influence the multiplication of larvae (Drakou et al., 2020; Masnita et al., 2018; Tran et al., 2020). According to Tran et al. (2020), the effects of precipitation and relative humidity on the Breteau Index (BI) were more obvious when the average temperature exceeded the threshold. A larger number of larvae also contributed to a higher chance of obtaining more positive PCR results (Fansiri et al., 2021; Peña-García et al., 2016). 
Then after 5–10 days, adult Aedes mosquitoes (male and female) emerge and mate within 3–5 days. Later, the infected adult female mosquito bites humans for the blood needed to produce eggs (Harrison et al., 2021; Clements, 1992). The Aedes mosquito acquires the virus while feeding on the blood of an infected (viraemic) human. It takes 3–8 days before the mosquito becomes capable of transmitting the virus (Alto & Bettinardi, 2013; Carrington & Simmons, 2014), either vertically (transovarially) or horizontally. Vertical or transovarial transmission involves the passage of dengue viruses to the offspring through the eggs of infected female mosquitoes (Wan Najdah et al., 2021; Windyaraini et al., 2019; da Costa et al., 2017; Joshi et al., 2002), while horizontal transmission involves the passage of dengue viruses to a vertebrate host via a vector. Typically, 4 days after being bitten by an infected Aedes mosquito, a person will develop viremia, a condition in which there is a high level of the dengue virus in the blood. Viremia lasts for approximately 5–12 days depending on the virus incubation period in the human. On the first day of viremia, the person generally shows no symptoms of dengue fever. Five to 7 days after being bitten by the infected mosquito, the person develops symptoms of dengue fever, starting with high fever; these are defined as onset dengue cases. Typically, 70% of dengue cases are notified to the nearest health office 2–6 days after onset (termed notified dengue cases) because in the early stage people usually misdiagnose themselves (MaHTAS, 2015). Once the cases have been notified, the Health Department conducts an intervention at the dengue hot spot (MaHTAS, 2015). 
For larva control, the strategies conducted would be environment management, source reduction, use of larvicides such as temephos (Abate), house inspection and enforcement of Destruction of Disease-Bearing Insect Act 1975; while for adult control, fogging will be carried out based on the viral cases reported (Kumarasamy, 2006; Tham, 2000, pp. 15–23). All the fogging activities are employed by the trained practitioners of Ministry of Health and Ministry of Housing and Local Government. Since the intervention was implemented after 4 weeks that larvae have existed, there is ample time for dengue transmission. We then can expect a subsequent initial increase in dengue cases being notified from the locality 3–4 weeks after the existence of larvae (Sanchez-Gendriz et al., 2022; Rohani et al., 2018). The reason is these intervention methods cannot prevent dengue outbreaks by themselves as they are not sustainable. An important reason for this is the low vector threshold for dengue transmission (Shamsul et al., 2012). As low as 2-3 adult female mosquitoes emerging every day in a locality of 100 people is sufficient to start an outbreak.
Fig. 3

Conceptual relationship (target variable & predictors): Epidemiological, entomological and environmental factors based on weeks.

Table 2 presents the estimated ADL models for Selayang, Bandar Baru Bangi and the pooled data. Five variables were significant in the pooled model: Onset, Onset1, Larvae3, Rainfall4 and InterventionLD1. The other variables were not significant in the model, but they were retained because they are important in explaining the notified dengue cases and their relationships have been verified by experts and established in the previous study (Rohani et al., 2018). Thus, the same set of variables was used in the other forecasting methods considered.
Table 2

Estimated autoregressive distributed lag (ADL) model.

| Variables (Predictors) | Selayang | Bandar Baru Bangi | Pool |
|---|---|---|---|
| Intercept | −18.400 | 8.138 | −80.4545 |
| Onset | 0.372a | 0.393a | 0.490a |
| Onset1 | 0.381a | 0.399a | 0.137b |
| Larvae3 | 0.007a | 0.005b | 0.011a |
| Rainfall4 | −0.021 | 0.017 | 0.055a |
| MinTemp3 | −0.272 | 1.352 | 1.282 |
| MaxTemp3 | 0.705 | −1.323 | 1.305 |
| MaxHumidity3 | 0.039 | 0.051 | −0.181 |
| PCR3 | 0.624b | 0.727 | 0.262 |
| API4 | −0.016 | 0.007 | 0.002 |
| InterventionLD1 | 0.061 | 0.058 | 0.113a |
| Adjusted R2 | 0.8465 | 0.8388 | 0.8506 |
| Information Criterion | 135.039 | 268.466 | 206.447 |
| Accuracy | 84.9% (∼85%) | 84.1% (∼84%) | 85.7% (∼86%) |

Target Variable: Notified Dengue Cases. a Significant at 1%. b Significant at 5%.

Table 3 shows the error measure results based on RelMAE, MASE, GMRAE and UMBRAE for forecasts up to 4 weeks ahead (lead 1, 2, 3 and 4). All error measures indicated Random Forest as the best forecasting model, followed by Support Vector Machine (SVM) and Artificial Neural Network (ANN). These results demonstrate that the machine learning (ML) methods performed better than the Autoregressive Distributed Lag (ADL) model and than Hierarchical Forecasting built on ADL base forecasts. Among the ADL-based methods, three error measures (RelMAE, MASE and GMRAE) indicated that Optimal combination performs better than ADL and Bottom-up. ADL on pooled data in turn outperformed Bottom-up, which showed that pooling the data initially was better than estimating the areas individually as in the Bottom-up approach. UMBRAE ranked the models slightly differently from RelMAE, MASE, and GMRAE, but in the same order as MAPE (Table 4): ADL outperformed both hierarchical methods, while Optimal combination still outperformed Bottom-up. These findings align with Zhao et al. (2020), where pooling the data and using Random Forest enhanced dengue prediction. Other studies have also highlighted the significance of data pooling (Matthias & Feng, 2022; Petropoulos et al., 2022; Siriyasatien et al., 2018; Spiliotis et al., 2019). In this study, pooling the data increased the number of dengue cases as well as the values of the other predictors, thus providing sufficient data points for the dengue modelling process; previously, Rohani et al. (2011) only established the model using 2 factors (epidemiological and environmental) due to the insufficient number of dengue cases in small areas.
Table 3

Error measures based on RelMAE, MASE, GMRAE and UMBRAE.

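| Forecasting Methods | Lead | RelMAE | MASE | GMRAE | UMBRAE |
|---|---|---|---|---|---|
| Autoregressive Distributed Lag (ADL) | 1 | 0.9195 | 0.5672 | 0.9916 | 0.7497 |
| | 2 | 0.7535 | 0.5142 | 0.9690 | 0.6261 |
| | 3 | 0.7473 | 0.548 | 0.9642 | 0.5938 |
| | 4 | 0.5636 | 0.4496 | 0.9214 | 0.4330 |
| | Median | 0.7504 (5) | 0.5311 (5) | 0.9666 (5) | 0.6100 (4) |
| Hierarchical Forecasting: Bottom Up using ADL | 1 | 0.9434 | 0.5819 | 0.9942 | 0.9333 |
| | 2 | 0.7748 | 0.5288 | 0.9721 | 0.7618 |
| | 3 | 0.7338 | 0.5381 | 0.962 | 0.6955 |
| | 4 | 0.5824 | 0.4645 | 0.9257 | 0.5834 |
| | Median | 0.7543 (6) | 0.5335 (6) | 0.9670 (6) | 0.7287 (6) |
| Hierarchical Forecasting: Optimal Combination using ADL | 1 | 0.912 | 0.5626 | 0.9908 | 0.8221 |
| | 2 | 0.7442 | 0.5079 | 0.9677 | 0.6559 |
| | 3 | 0.7247 | 0.5314 | 0.9605 | 0.6024 |
| | 4 | 0.565 | 0.4507 | 0.9217 | 0.5049 |
| | Median | 0.7344 (4) | 0.5197 (4) | 0.9641 (4) | 0.5317 (5) |
| Artificial Neural Network (ANN) | 1 | 0.6798 | 0.4194 | 0.4208 | 0.6339 |
| | 2 | 0.6876 | 0.4693 | 0.5841 | 0.5735 |
| | 3 | 0.6693 | 0.4908 | 0.5093 | 0.6956 |
| | 4 | 0.866 | 0.6907 | 0.6316 | 0.6037 |
| | Median | 0.6837 (3) | 0.4800 (3) | 0.5467 (3) | 0.5317 (3) |
| Support Vector Machine (SVM) | 1 | 0.5066 | 0.3125 | 0.3331 | 0.4947 |
| | 2 | 0.4423 | 0.3019 | 0.2557 | 0.394 |
| | 3 | 0.3579 | 0.2625 | 0.2215 | 0.3282 |
| | 4 | 0.303 | 0.2417 | 0.1819 | 0.2641 |
| | Median | 0.4001 (2) | 0.2822 (2) | 0.2386 (2) | 0.3611 (2) |
| Random Forest | 1 | 0.3881 | 0.2394 | 0.2070 | 0.4048 |
| | 2 | 0.3376 | 0.2304 | 0.2463 | 0.3191 |
| | 3 | 0.2428 | 0.1781 | 0.1600 | 0.2233 |
| | 4 | 0.2315 | 0.1847 | 0.1351 | 0.2118 |
| | Median | 0.2902 (1) | 0.2076 (1) | 0.1835 (1) | 0.2712 (1) |

() - Rank.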

Table 4

Model accuracy based on MAPE.

| MAPE | Lead 1 | Lead 2 | Lead 3 | Lead 4 | Median | Rank | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| Autoregressive Distributed Lag (ADL) | 15.9573 | 13.8988 | 14.7043 | 11.2449 | 14.3016 | 4 | 85.70 |
| Hierarchical Forecasting: Bottom Up using ADL | 17.0057 | 15.0344 | 15.2610 | 12.7246 | 15.1477 | 6 | 84.85 |
| Hierarchical Forecasting: Optimal Combination using ADL | 16.1123 | 14.0466 | 14.6186 | 11.7883 | 14.3326 | 5 | 85.67 |
| Artificial Neural Network (ANN) | 11.6450 | 12.5917 | 15.1991 | 20.7552 | 13.8954 | 3 | 86.10 |
| Support Vector Machine (SVM) | 8.5833 | 8.1001 | 6.9631 | 6.1788 | 7.5316 | 2 | 92.47 |
| Random Forest | 6.7609 | 6.2222 | 4.6470 | 4.6981 | 5.4602 | 1 | 94.54 |
| Random Forest Less 2 variables | 7.8956 | 8.7415 | 5.4089 | 9.1792 | 8.3186 | | 91.68 |
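The Median, Rank and Accuracy columns of Table 4 follow directly from the per-lead MAPE values: the median is taken over the four leads, the six models are ranked by that median, and accuracy is 100 minus the median. A short Python sketch of this arithmetic, using the Table 4 values (the variable names are illustrative only):

```python
import statistics

# Per-lead MAPE values (Lead 1 to Lead 4) copied from Table 4.
mape_by_lead = {
    "ADL": [15.9573, 13.8988, 14.7043, 11.2449],
    "Bottom-up (ADL)": [17.0057, 15.0344, 15.2610, 12.7246],
    "Optimal combination (ADL)": [16.1123, 14.0466, 14.6186, 11.7883],
    "ANN": [11.6450, 12.5917, 15.1991, 20.7552],
    "SVM": [8.5833, 8.1001, 6.9631, 6.1788],
    "Random Forest": [6.7609, 6.2222, 4.6470, 4.6981],
}

# Median MAPE across the four leads, then rank models from lowest to highest.
median_mape = {model: statistics.median(v) for model, v in mape_by_lead.items()}
ranking = sorted(median_mape, key=median_mape.get)

for rank, model in enumerate(ranking, start=1):
    accuracy = 100.0 - median_mape[model]
    print(f"{rank}. {model}: median MAPE {median_mape[model]:.4f}, accuracy {accuracy:.2f}%")

# The reduced Random Forest (without Larvae3 and PCR3) is not ranked in Table 4.
reduced_rf_accuracy = 100.0 - statistics.median([7.8956, 8.7415, 5.4089, 9.1792])
print(f"Random Forest less 2 variables: accuracy {reduced_rf_accuracy:.2f}%")
```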
Error measures based on RelMAE, MASE, GMRAE and UMBRAE. () - Rank. Model accuracy based on MAPE.

Table 4 also displays the percentage accuracy calculated using MAPE. The ADL model based on the pooled data has 86% accuracy, compared with the individual areas of Selayang (85%) and Bandar Baru Bangi (84%) in Table 2; pooling the data increased the accuracy by 1–2%. Although the Artificial Neural Network (ANN) is a machine learning method, its percentage accuracy was not much different from that of ADL and Hierarchical Forecasting, with values ranging from 85% to 86%. However, the other machine learning methods, Support Vector Machine (SVM) and Random Forest, performed very well, with accuracy of 90% and above. The percentage accuracy of Random Forest was the highest (95%). Past dengue forecasting studies have also highlighted Random Forest as the best model (Carvajal et al., 2018; Ong et al., 2018; Silitonga et al., 2020; Zhao et al., 2020). Random Forest performed well here, as in other comparisons, because it uses the bootstrap method (i.e., resampling with replacement) to produce multiple decision trees (Cutler et al., 2011; Fawagreh et al., 2014; Schonlau & Yuyan Zou, 2020; Khaled), which allowed it to capture different patterns (i.e., behaviours) in the relationship between notified cases, larvae and rainfall.

The first pattern showed that the amount of rainfall from the last 4 weeks increases the larvae of the last 3 weeks (and thus the mosquito population of the last 2 weeks), finally increasing this week's notified cases. However, a second pattern also existed in the relationship because of how rainfall was measured. Both previous studies by Rohani et al. (2011, 2018) recorded rainfall as an amount and not as an intensity. Heavy rainfall (i.e., high intensity) can flush out or kill the larvae in the ovitrap (Promprous et al., 2005). As a result, some weeks had a high amount of rainfall but fewer larvae and thus fewer notified cases, which explains the moderate correlation between notified cases and the last 4 weeks rainfall (rainfall4: 0.665) in Table 1. However, based on our data, this situation occurred less than 10% of the time, as indicated in Fig. 4 (weeks 50, 56, 57 and 72).
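The lag structure and the bootstrap step described above can be illustrated with a minimal Python sketch. It assumes that "last 4 weeks rainfall" (rainfall4) and "last 3 weeks larvae" (Larvae3) mean the values observed 4 and 3 weeks before the forecast week; the function names and the weekly numbers are hypothetical and are not the study data or the authors' implementation.

```python
import random


def lagged_rows(cases, larvae, rainfall, larvae_lag=3, rain_lag=4):
    """Pair each week's notified cases with larvae and rainfall from earlier weeks.

    Assumption: rainfall4 is rainfall lagged by 4 weeks and larvae3 is the
    larvae count lagged by 3 weeks.
    """
    start = max(larvae_lag, rain_lag)  # first week with both lags available
    return [
        {
            "week": t,
            "rainfall4": rainfall[t - rain_lag],
            "larvae3": larvae[t - larvae_lag],
            "cases": cases[t],
        }
        for t in range(start, len(cases))
    ]


def bootstrap_sample(rows, rng):
    """Resample the rows with replacement, as done before growing each tree."""
    return [rng.choice(rows) for _ in rows]


# Made-up weekly values, for illustration only.
rainfall = [10, 80, 20, 5, 60, 15, 30, 45]
larvae = [3, 4, 12, 6, 4, 11, 5, 7]
cases = [1, 1, 2, 2, 6, 3, 2, 5]

rows = lagged_rows(cases, larvae, rainfall)
sample = bootstrap_sample(rows, random.Random(0))
```

Each tree in a random forest would be fitted to a different `sample`, so some trees see mostly weeks that follow the first pattern and others see more of the heavy-rainfall weeks, which is one way to read the explanation given above.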
Fig. 4

Total larvae, amount of rainfall and notified dengue cases.

Although Random Forest was the best model with 95% accuracy, it is costly and time-consuming in practice to collect mosquito-related data (Larvae3) and to screen for the virus using RT-PCR (PCR3), as explained in the methodology section. Thus, we removed these two entomological variables (Larvae3 and PCR3) and re-ran Random Forest. Table 4 shows that the percentage accuracy of Random Forest without these two variables was 92%, a reduction of 3% from the full model but still acceptable for embedding in the early warning system, which can then rely on just two factors (i.e., epidemiological and environmental). The epidemiological data (notified dengue cases, onset cases, and the number of interventions) can be retrieved automatically by the early warning system through integration with the e-Dengue system developed by Malaysia's Ministry of Health. Meanwhile, the environmental data (rainfall, temperature, humidity, and Air Pollution Index) can be collected by the mobile weather station and linked automatically to the system through the Internet of Things (IoT) approach. Furthermore, this Random Forest model can forecast up to 4 weeks in advance (based on leads 1, 2, 3 and 4), giving the authorities time to act before dengue cases occur, whereas at present intervention only happens after notified dengue cases increase.

However, the real-time operationalization of this system also requires several rain gauges to be located at the dengue hot spots because the rainfall pattern is very localized, as explained in the previous section. Thus, further analysis was conducted to determine the distances between the rain gauges. Figs. 5 and 6 show the locations of the 10 rain gauges at Selayang (RG1 to RG10) and the 11 rain gauges at Bandar Baru Bangi (RG11 to RG21), respectively (Rohani et al., 2018). The rain gauges were spread across the two large areas to measure the localized amount of rainfall.
Tables 5 and 6 display the distances between the rain gauges (upper diagonal) and the correlations of the rainfall amounts (lower diagonal) for both areas. In Selayang, there was a strong correlation (0.893) between RG1 and RG2, at a distance of 0.97 km, but a weak correlation (0.162) between RG1 and RG6, at a distance of 6.14 km. A similar finding was obtained for Bandar Baru Bangi, with a strong correlation (0.871) between RG11 and RG13 at a distance of 1.13 km but a weak correlation (0.078) between RG11 and RG15 at a distance of 4.73 km. These correlations reveal the localized pattern of rainfall amount in the two areas based on the distance between the rain gauges, as shown in Figs. 7 and 8: when the rain gauges were near each other the amounts of rainfall were similar, but when they were far apart the amounts differed. Other rain gauges showed similar findings, indicating the existence of localized rainfall in both areas, which often occurs in Malaysia (Muhammad et al., 2020; Singh et al., 2022).

Next, we extracted the moderate correlations and their respective distances, as presented in Table 7 for both areas. The mean rain gauge distance was 3.34 km, with a 95% confidence interval of 2.82–3.85 km. Based on this, it is recommended that the rain gauges be located about 3–4 km apart at the dengue hot spots to ensure the accuracy of the rainfall amounts collected for the dengue outbreak forecasting model embedded in the early warning system.
Fig. 5

Location of Rain gauge (Mobile Weather Station) at Selayang.

Fig. 6

Location of Rain gauge (Mobile Weather Station) at Bandar Baru Bangi.

Table 5

Rain gauge distance in kilometre (KM) and correlation (Selayang).

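| Gauge | RG1 | RG2 | RG3 | RG4 | RG5 | RG6 | RG7 | RG8 | RG9 | RG10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RG1 | _ | 0.97 | 3.88 | 4.37 | 4.85 | 6.14 | 4.04 | 2.59 | 1.62 | 2.91 |
| RG2 | .893a | _ | 4.20 | 4.69 | 5.34 | 6.79 | 5.01 | 3.56 | 2.59 | 3.72 |
| RG3 | .483a | .484a | _ | 0.89 | 2.26 | 3.23 | 3.49 | 4.38 | 3.56 | 1.78 |
| RG4 | .481a | .485a | .773a | _ | 1.46 | 2.26 | 2.91 | 4.21 | 3.57 | 1.63 |
| RG5 | .411a | .423a | .477a | .476a | _ | 1.45 | 1.78 | 3.72 | 3.55 | 1.77 |
| RG6 | .162∗ | .188∗ | .269a | .306a | .375a | _ | 2.91 | 5.02 | 5.01 | 3.22 |
| RG7 | .314a | .321a | .351a | .388a | .267a | .185∗ | _ | 2.27 | 2.59 | 1.94 |
| RG8 | .508a | .512a | .492a | .519a | .363a | .286a | .431a | _ | 1.13 | 2.58 |
| RG9 | .521a | .532a | .493a | .535a | .396a | .311a | .408a | .886a | _ | 2.1 |
| RG10 | .392a | .390a | .608a | .634a | .410a | .221a | .344a | .520a | .685a | _ |

Upper triangle: distance (KM). Lower triangle: correlation. a Sig. at 1%, ∗ Sig. at 5%.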
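The extraction of moderately correlated gauge pairs reported in Table 7 can be sketched from the Selayang values in Table 5. The paper does not state the cut-offs it used; the range 0.5 ≤ r < 0.6 below is an assumption inferred from the values that appear in Table 7, and the names `CORR`, `DIST` and `moderate_pairs` are hypothetical.

```python
# Selayang values transcribed from Table 5.
# CORR[i] holds the correlations of gauge RG(i+2) with RG1..RG(i+1) (lower triangle).
CORR = [
    [.893],
    [.483, .484],
    [.481, .485, .773],
    [.411, .423, .477, .476],
    [.162, .188, .269, .306, .375],
    [.314, .321, .351, .388, .267, .185],
    [.508, .512, .492, .519, .363, .286, .431],
    [.521, .532, .493, .535, .396, .311, .408, .886],
    [.392, .390, .608, .634, .410, .221, .344, .520, .685],
]
# DIST[i] holds the distances (km) from gauge RG(i+1) to RG(i+2)..RG10 (upper triangle).
DIST = [
    [0.97, 3.88, 4.37, 4.85, 6.14, 4.04, 2.59, 1.62, 2.91],
    [4.20, 4.69, 5.34, 6.79, 5.01, 3.56, 2.59, 3.72],
    [0.89, 2.26, 3.23, 3.49, 4.38, 3.56, 1.78],
    [1.46, 2.26, 2.91, 4.21, 3.57, 1.63],
    [1.45, 1.78, 3.72, 3.55, 1.77],
    [2.91, 5.02, 5.01, 3.22],
    [2.27, 2.59, 1.94],
    [1.13, 2.58],
    [2.1],
]


def moderate_pairs(corr, dist, low=0.5, high=0.6):
    """Return (gauge_a, gauge_b, correlation, distance_km) with low <= r < high.

    The default cut-offs are an assumption, not values given in the paper.
    """
    pairs = []
    for i, row in enumerate(corr):
        b = i + 2  # gauge number of this lower-triangle row
        for j, r in enumerate(row):
            a = j + 1  # gauge number of this column
            if low <= r < high:
                # Distance sits in the upper triangle: row of gauge a, offset to gauge b.
                pairs.append((a, b, r, dist[a - 1][b - a - 1]))
    return sorted(pairs)


pairs = moderate_pairs(CORR, DIST)
for a, b, r, km in pairs:
    print(f"RG{a} & RG{b}: r = {r:.3f}, distance = {km:.2f} km")
```

Under this assumed range, the sketch selects seven Selayang pairs, the same pairs and distances listed for Selayang in Table 7.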

Table 6

Rain gauge distance in kilometre (KM) and correlation (Bandar Baru Bangi).

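| Gauge | RG11 | RG12 | RG13 | RG14 | RG15 | RG16 | RG17 | RG18 | RG19 | RG20 | RG21 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RG11 | _ | 1.96 | 1.31 | 2.44 | 4.73 | 1.79 | 1.41 | 2.76 | 3.25 | 3.57 | 3.26 |
| RG12 | .797a | _ | 2.12 | 3.41 | 5.03 | 1.47 | 2.44 | 1.79 | 4.54 | 5.03 | 3.91 |
| RG13 | .871a | .756a | _ | 1.95 | 4.58 | 2.46 | 2.11 | 3.09 | 3.73 | 3.91 | 2.45 |
| RG14 | .744a | .720a | .763a | _ | 3.28 | 3.71 | 2.93 | 4.55 | 3.43 | 3.11 | 1.63 |
| RG15 | 0.078 | 0.108 | 0.074 | 0.149 | _ | 5.05 | 4.82 | 5.84 | 3.62 | 3.43 | 3.94 |
| RG16 | .776a | .741a | .770a | .720a | 0.078 | _ | 1.78 | 1.30 | 4.06 | 4.54 | 4.21 |
| RG17 | .810a | .746a | .804a | .711a | 0.067 | .815a | _ | 2.62 | 2.95 | 3.24 | 3.74 |
| RG18 | .723a | .654a | .721a | .697a | 0.038 | .820a | .754a | _ | 4.70 | 5.35 | 5.02 |
| RG19 | .415a | .395a | .433a | .515a | .228a | .465a | .461a | .433a | _ | 1.32 | 4.38 |
| RG20 | .372a | .352a | .398a | .488a | .238a | .441a | .431a | .422a | .867a | _ | 4.05 |
| RG21 | .542a | .537a | .583a | .680a | 0.119 | .576a | .538a | .573a | .436a | .430a | _ |

Upper triangle: distance (KM). Lower triangle: correlation. a Sig. at 1%, ∗ Sig. at 5%.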

Fig. 7

Rainfall amount pattern of near and far rain gauge (Selayang).

Fig. 8

Rainfall amount pattern of near and far rain gauge (Bandar Baru Bangi).

Table 7

Moderate correlation & distance (KM).

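| Area | Rain Gauge | Moderate Correlation | Distance (KM) |
| --- | --- | --- | --- |
| Selayang | RG1 & RG8 | 0.508 | 2.59 |
| | RG1 & RG9 | 0.521 | 1.62 |
| | RG2 & RG8 | 0.512 | 3.56 |
| | RG2 & RG9 | 0.532 | 2.59 |
| | RG4 & RG8 | 0.519 | 4.21 |
| | RG4 & RG9 | 0.535 | 3.57 |
| | RG8 & RG10 | 0.520 | 2.58 |
| Bandar Baru Bangi | RG19 & RG14 | 0.515 | 3.43 |
| | RG21 & RG11 | 0.542 | 3.91 |
| | RG21 & RG12 | 0.537 | 2.45 |
| | RG21 & RG13 | 0.583 | 4.21 |
| | RG21 & RG16 | 0.576 | 3.74 |
| | RG21 & RG17 | 0.538 | 5.02 |
| | RG21 & RG18 | 0.573 | 3.26 |
| Overall | Mean | | 3.34 |
| | Standard Deviation | | 0.89 |
| | 95% Confidence Interval for Mean (Lower Bound) | | 2.82 |
| | 95% Confidence Interval for Mean (Upper Bound) | | 3.85 |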
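The summary rows of Table 7 can be checked with a short Python sketch. It assumes the interval is the usual Student's t interval for a mean with n − 1 = 13 degrees of freedom, which the paper does not state explicitly; the critical value is hard-coded from standard t tables so that only the standard library is needed.

```python
import math
import statistics

# Distances (km) of the 14 moderately correlated gauge pairs listed in Table 7.
distances_km = [2.59, 1.62, 3.56, 2.59, 4.21, 3.57, 2.58,
                3.43, 3.91, 2.45, 4.21, 3.74, 5.02, 3.26]

# Two-sided 95% critical value of Student's t with 13 degrees of freedom
# (assumption: the paper's interval is a t interval).
T_CRIT_DF13 = 2.160

n = len(distances_km)
mean = statistics.mean(distances_km)
sd = statistics.stdev(distances_km)  # sample standard deviation (n - 1 divisor)
half_width = T_CRIT_DF13 * sd / math.sqrt(n)
lower, upper = mean - half_width, mean + half_width

print(f"mean = {mean:.2f} km, sd = {sd:.2f} km")
print(f"95% CI = {lower:.2f} to {upper:.2f} km")
```

Under this assumption the sketch gives values that agree with the mean, standard deviation and bounds reported in Table 7, which supports the recommended spacing of about 3–4 km.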
Fig. 9 displays the conceptual framework of the early warning system, which uses the Internet of Things (IoT) approach to integrate the epidemiological data readily available in the e-Dengue system with environmental data from automated mobile weather stations (around USD500 each) located at the dengue hot spots. To implement a real-time warning system, the Random Forest forecasting model needs to be transformed into an algorithm and embedded in the e-Dengue system, including the parameters listed in Fig. 9. Real-time notification of a potential dengue outbreak will then be sent to the Health Department and COMBI, a community engagement group set up by the Ministry of Health (My Health, 2017) to prevent dengue by destroying Aedes breeding sites. Thus, the proposed e-Dengue early warning system can save lives, because prevention is better than cure.

According to Lee et al. (2010), the cost of dengue patient hospitalization (dengue illness cost) in Malaysia is 11 times the amount of government spending on Aedes vector control. This relationship indicates that increased investment in prevention could potentially reduce these illness costs. That study also found that dengue may adversely affect tourism and create emotional and long-term burdens on families affected by illness and deaths. Md Shahin et al. (2016) stated that the mean total cost per case of dengue infection was USD365.16 and that the average duration of dengue illness was 9.69 days. All of these negative impacts of dengue can be avoided through the proposed e-Dengue early warning system, which is practical and cost-effective using the Internet of Things (IoT) approach.
Fig. 9

e-dengue early warning system conceptual framework.

Conclusion

The search for a Malaysian dengue outbreak forecasting model that integrates three important factors (epidemiological, entomological, and environmental) through scientific experimental field studies started 15 years ago, because many past studies had failed to establish a practical model that could be embedded in an early warning system. One reason was the use of only the epidemiological factor (dengue cases) in relation to seasonal patterns, when climate change has affected the consistency of the rainfall season in Malaysia. Another reason for failure, even when two factors were used (epidemiological (dengue cases) and environmental (weather data)), was the use of weather data from the Meteorological Department (secondary data), which jeopardized the accuracy of the forecasting model because of the localized rainfall in Malaysia. Hence, to overcome these limitations, two scientific experimental field studies were conducted to establish the relationship among the three factors. However, the first study (Rohani et al., 2011) only managed to relate two factors (entomological (mosquito-related data) and environmental (weather data)), because its reliance on small localities yielded an insufficient number of dengue cases (epidemiological). The second study was conducted in two large areas with high numbers of dengue cases; it successfully established specific relationships among the three factors and obtained 85% accuracy using an ADL model (Rohani et al., 2018). However, an accuracy of at least 90% was required for embedding the forecasting model in the dengue outbreak early warning system (e-Dengue). Thus, the search continued in this study using the Rohani et al. (2018) data and six forecasting methods (Autoregressive Distributed Lag (ADL), Hierarchical Forecasting (Bottom-up and Optimal combination), and the Machine Learning methods Artificial Neural Network (ANN), Support Vector Machine (SVM) and Random Forest), with the aim of at least 90% accuracy.
Random Forest proved the most appropriate method for forecasting dengue outbreaks in Malaysia, with 95% accuracy when using the three factors, but it was very costly and time-consuming to collect mosquito-related data (entomological). Thus, only two factors (epidemiological and environmental) were used in the model, and while the accuracy dropped to 92%, this was still acceptable. However, in practice the localized rainfall requires several rain gauges to be located about 3–4 km apart in the dengue hot spot areas, to ensure the accuracy of the forecasting model when it is embedded in the early warning system using an Internet of Things (IoT) approach. The e-Dengue early warning system integrates the epidemiological data readily available in the system with environmental data from automated mobile weather stations (the IoT objects/sensors) located at the dengue hot spots, and uses Random Forest to forecast dengue cases up to 4 weeks ahead. Real-time notification of a potential dengue outbreak will then be sent to the Health Department and COMBI, a community engagement group set up by the Ministry of Health to prevent dengue by destroying Aedes breeding sites. Thus, the proposed practical and cost-effective e-Dengue early warning system can save lives, because prevention is better than cure. The research contribution here is to show how recent advances in forecasting methods, when combined with careful, practice-based simplification of the included drivers of dengue outbreaks, have led to a successful, implementable warning system.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References

1.  Associations between dengue and combinations of weather factors in a city in the Brazilian Amazon.

Authors:  Maria Goreti Rosa-Freitas; Kathleen V Schreiber; Pantelis Tsouris; Ellem Tatiani de Souza Weimann; José Francisco Luitgards-Moura
Journal:  Rev Panam Salud Publica       Date:  2006-10

2.  Environmental factors can influence dengue reported cases.

Authors:  Marco Antonio F Carneiro; Beatriz da C A Alves; Flávia de Sousa Gehrke; José Nuno Domingues; Nelson Sá; Susana Paixão; João Figueiredo; Ana Ferreira; Cleonice Almeida; Amaury Machi; Eriane Savóia; Vânia Nascimento; Fernando Fonseca
Journal:  Rev Assoc Med Bras (1992)       Date:  2017-11       Impact factor: 1.209

3.  A novel dengue fever (DF) and dengue haemorrhagic fever (DHF) analysis using artificial neural network (ANN).

Authors:  Fatimah Ibrahim; Mohd Nasir Taib; Wan Abu Bakar Wan Abas; Chan Chong Guan; Saadiah Sulaiman
Journal:  Comput Methods Programs Biomed       Date:  2005-09       Impact factor: 5.428

4.  Eco-virological survey of Aedes mosquito larvae in selected dengue outbreak areas in Malaysia.

Authors:  A Rohani; A R Aidil Azahary; M Malinda; M N Zurainee; H Rozilawati; W M A Wan Najdah; H L Lee
Journal:  J Vector Borne Dis       Date:  2014-12       Impact factor: 1.688

5.  The Effects of Meteorological Factors on Dengue Cases in Malaysia.

Authors:  Sarbhan Singh; Lai Chee Herng; Lokman Hakim Sulaiman; Shew Fung Wong; Jenarun Jelip; Norhayati Mokhtar; Quillon Harpham; Gina Tsarouchi; Balvinder Singh Gill
Journal:  Int J Environ Res Public Health       Date:  2022-05-26       Impact factor: 4.614

6.  Potential effect of population and climate changes on global distribution of dengue fever: an empirical model.

Authors:  Simon Hales; Neil de Wet; John Maindonald; Alistair Woodward
Journal:  Lancet       Date:  2002-09-14       Impact factor: 79.321

7.  A new accuracy measure based on bounded relative error for time series forecasting.

Authors:  Chao Chen; Jamie Twycross; Jonathan M Garibaldi
Journal:  PLoS One       Date:  2017-03-24       Impact factor: 3.240

8.  Entomological Risk Assessment for Dengue Virus Transmission during 2016-2020 in Kamphaeng Phet, Thailand.

Authors:  Thanyalak Fansiri; Darunee Buddhari; Nattaphol Pathawong; Arissara Pongsiri; Chonticha Klungthong; Sopon Iamsirithaworn; Anthony R Jones; Stefan Fernandez; Anon Srikiatkhachorn; Alan L Rothman; Kathryn B Anderson; Stephen J Thomas; Timothy P Endy; Alongkot Ponlawat
Journal:  Pathogens       Date:  2021-09-24

9.  Infection Rates by Dengue Virus in Mosquitoes and the Influence of Temperature May Be Related to Different Endemicity Patterns in Three Colombian Cities.

Authors:  Víctor Hugo Peña-García; Omar Triana-Chávez; Ana María Mejía-Jaramillo; Francisco J Díaz; Andrés Gómez-Palacio; Sair Arboleda-Sánchez
Journal:  Int J Environ Res Public Health       Date:  2016-07-21       Impact factor: 3.390

10.  Estimating the Threshold Effects of Climate on Dengue: A Case Study of Taiwan.

Authors:  Bao-Linh Tran; Wei-Chun Tseng; Chi-Chung Chen; Shu-Yi Liao
Journal:  Int J Environ Res Public Health       Date:  2020-02-21       Impact factor: 3.390
