COVID-19 modelling by time-varying transmission rate associated with mobility trend of driving via Apple Maps.

Min Jing¹, Kok Yew Ng², Brian Mac Namee³, Pardis Biglarbeigi², Rob Brisk⁴, Raymond Bond⁵, Dewar Finlay², James McLaughlin².

Abstract

Compartment-based infectious disease models that consider the transmission rate (or contact rate) as a constant during the course of an epidemic can be limiting regarding effective capture of the dynamics of infectious disease. This study proposed a novel approach based on a dynamic time-varying transmission rate with a control rate governing the speed of disease spread, which may be associated with the information related to infectious disease intervention. Integration of multiple sources of data with disease modelling has the potential to improve modelling performance. Taking the global mobility trend of vehicle driving available via Apple Maps as an example, this study explored different ways of processing the mobility trend data and investigated their relationship with the control rate. The proposed method was evaluated based on COVID-19 data from six European countries. The results suggest that the proposed model with dynamic transmission rate improved the performance of model fitting and forecasting during the early stage of the pandemic. Positive correlation has been found between the average daily change of mobility trend and control rate. The results encourage further development for incorporation of multiple resources into infectious disease modelling in the future.

Keywords: COVID-19; Data integration; Dynamic transmission rate; Infectious disease modelling; Mobility trend

Year: 2021 PMID: 34481056 PMCID: PMC8410221 DOI: 10.1016/j.jbi.2021.103905

Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 8.000

Introduction

Mathematical modelling of infectious diseases plays an important role in understanding and controlling the transmission dynamics of epidemics such as coronavirus disease (COVID-19); it helps to identify trends, make general forecasts and support intervention measures. A well-established compartmental model is the SEIR model [1], which divides the population into four compartments, Susceptible (S), Exposed (E), Infectious (I) and Recovered (R), and then models how the disease moves across the compartments over time. The SEIR model has been extended and widely applied to model the dynamics of COVID-19 [2], [3], [4], [5], [6], [7], [8], [9], [10]. In compartmental models, the parameters are often set based on individual decisions or assumptions; for example, some infectious disease models assume that the transmission rate remains constant during the entire epidemic. However, a constant transmission rate may not be adequate to capture the real dynamics, because many external factors, such as intervention measures or changes in social behaviour, can influence disease transmission. Therefore, a dynamic transmission rate is more desirable than a constant one. For example, one study [4] took social distancing into account: the authors assumed that social behaviour would change due to the fear of increased deaths, and proposed a time-varying transmission rate modelled by the daily change in deaths. Although their assumption is limited, since transmission rates may change for many reasons other than a fear of increased deaths, their study provided a new idea for a dynamic transmission rate in COVID-19 modelling. To study lockdowns and second waves, another study [11] modified the SEIR model to use different transmission rates before and after lockdown, which were estimated separately according to each country’s lockdown date. 
One study [6] attempted to model the change in social behaviour by integrating epidemiological and economic models. The authors proposed a group-dependent contact rate, which measures the probability that a susceptible person in one group meets an infectious person from another group and then becomes infectious, and they then took social distancing into account in their model. Several studies have proposed a time-varying version of the transmission rate [12], [13], [14], which takes into account the subexponential growth dynamics in empirical data and the variety of mechanisms in the Ebola outbreak. A similar model was also extended to associate with the reproduction number [15] and applied to study the 2014 Ebola Virus Disease (EVD) outbreak in West Africa [16]. Incorporating prior information into mathematical models, or multimodal data fusion, has been widely applied in healthcare, for example for brain image decomposition [17], fusion of EEG and fMRI [18] and prediction of clinical measures from neuroimages [19]. However, integration of data from multiple sources in the modelling of infectious diseases such as COVID-19 has not yet been widely explored. The Centers for Disease Control and Prevention (CDC) has published the factors contributing to COVID-19 acceleration [20], and many studies have been carried out in those areas. These factors include ongoing travel-associated spread of the virus [7], [21], [22], large gatherings [23], introductions into high-risk workplaces or settings (such as hospitals or care homes) [24], [25], crowding and high population density [26], [27], and cryptic transmission (such as presymptomatic or asymptomatic spread) [28], [29]. Since these factors directly affect the occurrence of infections, incorporating them into infectious disease modelling may improve modelling performance. However, it is not always easy to quantify such information for use in modelling. 
Even if the information can be quantified, the way of incorporating it into the model may not be straightforward. Some studies have focused on using social contact matrices to quantify population contact patterns [30] or on associating social contact metrics with the reproduction number [31]. The research presented in this paper focuses on associating the mobility trend with dynamic infectious disease models. Recent studies of COVID-19 have highlighted the importance of mobility trends in disease transmission. A study based in the USA [21] revealed that mobility patterns were strongly correlated with decreased COVID-19 case growth rates for the 20 most affected counties. The authors used daily mobility data derived from aggregated and anonymised mobile phone data to capture real-time trends in movement patterns for each US county, and used these data to generate a social distancing metric. Another study [7] worked on an SEIR-like transmission model that included a network of 107 provinces in Italy connected by mobility at high resolution. That study did not use the mobility data in the model itself, but used it as a reference to assess the connections between regions. A further study [32] explored the relationship between the effective reproduction number and mobility levels during COVID-19 lockdowns for 56 countries, based on the mobility trend data obtained from Apple Maps [33]. Although these studies suggested the importance of the mobility trend in disease transmission by analysing the relationship between their findings and the mobility trend, they did not directly incorporate the mobility trend data into disease modelling. As a proof of concept, this study aimed to explore the potential of integrating multiple data resources into infectious disease modelling, thereby enhancing model performance. 
In order to connect the disease model with the extrinsic factors that may contribute to disease transmission, a dynamic model with a control rate has been introduced, through which information from multiple sources can be incorporated within the model. In this study, the mobility trend data were used as an example to associate with the control rate, but other types of data (if they can be quantified), such as a social contact matrix [34], may be used for different applications (involving social distancing or exit strategies). The contribution of this study rests on three main aspects: (1) we modified the time-varying transmission rate proposed in [12], [15] and deployed it for COVID-19 modelling. The results based on six European countries suggest that the proposed method improved the performance in model fitting and forecasting during the early stage of the pandemic. The idea of a dynamic transmission rate can be applied to different infectious disease models; (2) we associated the control rate in the dynamic transmission rate with disease intervention by incorporating mobility trends (mobility of driving via Apple Maps was used as an example). Different ways of processing the mobility trend data were explored; (3) we investigated the relationship between the control rate and the processed mobility trend by predicting 20-day death rates in four stages. The results warrant further development for the incorporation of mobility into infectious disease modelling. The rest of this paper is arranged as follows: in Section 2, the dynamic transmission rate is defined together with the simulation, and the mobility trend data are introduced together with four ways of processing them. In Section 3, the evaluation of model fitting and prediction based on six European countries is presented and the results are discussed. The paper finishes with a discussion in Section 4 and a conclusion in Section 5.

Methodology

Dynamic transmission rate

Several studies based on the 2014–2015 Ebola epidemic [12], [13], [14] have found that subexponential growth resulted in an early decline in the effective reproduction number, due to the rapid onset of behaviour changes and intervention strategies to control the spread of the disease. They proposed a time-varying version of the transmission rate based on an “exponential decay” model to take into account the subexponential growth dynamics in empirical data. Similarly, restriction measures were introduced to control the spread of COVID-19, and these proved effective in controlling the acceleration of infection cases in most countries during the first wave of the pandemic. Some studies have observed subexponential growth of COVID-19 in China [35] and in South Korea [36] due to effective containment and implementation of social distancing measures. To capture the subexponential growth dynamics of COVID-19, the time-varying transmission rate β(t) based on the “exponential decay” model proposed in [12], [15] was adopted in this study; that is, the transmission rate declines exponentially from an initial value towards a final value at a given control rate. Here, we modified their model and defined the dynamic β(t) as in Eq. (1), where |·| denotes the absolute value. Under the control rate, which governs the speed of the disease spreading over time, the transmission rate changes from its initial value to its final value. When there is no control (a control rate of zero), the transmission rate is a constant, as in the conventional SEIR model. Here we considered possible change in both directions, especially after relaxing the level of restrictions such as easing lockdown; therefore the absolute change of the transmission rate was used. The impact of the control rate on the dynamic model is demonstrated by plotting β(t) in Fig. 1(a) (with parameter values chosen for demonstration purposes). As seen in Fig. 1(a), the higher the control rate, the quicker β(t) declines. Fig. 1(b) presents β(t) with added random noise and shows that the transmission rate may not always decline as in the smoothed version. 
In reality it may fluctuate for various reasons, but the overall trend remains a decline over a period of time with the control rate implemented. Note that the proposed transmission rate appears to decline because it was based on the “exponential decay” model, which takes into account the subexponential growth dynamics. As COVID-19 evolves during the pandemic, variants, strains and mutations [37], [38], [39] will add to the complexity because they can have very different transmission rates, so different strategies may need to be considered in future studies.
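
The behaviour described above can be illustrated with a minimal sketch. The exact form of Eq. (1) is not reproduced in this text, so the function below assumes a generic exponential-decay form, β(t) = β1 + (β0 − β1)·exp(−q·t); the names `dynamic_beta`, `beta0`, `beta1` and `q`, and all numerical values, are hypothetical and chosen for illustration only.

```python
import numpy as np

def dynamic_beta(t, beta0, beta1, q):
    """Assumed exponential-decay transmission rate (not the paper's exact Eq. (1)).

    Moves from the initial value beta0 towards beta1 at control rate q.
    With q = 0 the rate stays constant at beta0, as in a conventional SEIR model.
    """
    t = np.asarray(t, dtype=float)
    return beta1 + (beta0 - beta1) * np.exp(-q * t)

# Illustrative values only: 100 days, four control rates.
t = np.arange(0, 101)
curves = {q: dynamic_beta(t, beta0=0.5, beta1=0.1, q=q) for q in (0.0, 0.01, 0.03, 0.05)}
```

Under this assumed form, a larger `q` gives a smaller β(t) on every day after day 0, which matches the qualitative statement that a higher control rate makes the transmission rate decline more quickly.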

Fig. 1

The impact of the control rate on the transmission rate β(t) for different control rates: (a) smooth version; (b) with noise. Both cases show that the higher the control rate, the quicker β(t) declines.

To demonstrate the impact of β(t) on the infected cases, the proposed β(t) was applied to a modified SIR model [4], given in Eq. (2), where S denotes susceptible, I is infectious, R denotes resolving (i.e. sick but not infectious), D is deceased, C is recovered, and N is the population. Infectiousness resolves at a Poisson rate. After the infectious period is over, a constant fraction of people exit the “Resolving” state R, with a fraction of them deceased and the rest recovered. Fixed parameter values were assumed for the SIR model and for the proposed β(t). Fig. 2 presents the comparison of the infected cases by SIR and its dynamic version with three different control rates: 0.01, 0.03 and 0.05. It can be seen from Fig. 2(a) that, in the dynamic model, starting from the same initial transmission rate, increasing the control rate not only delays but also lowers the peak of infected cases, which is vital for controlling the spread of the disease so that the number of infected cases does not exceed healthcare capacity. Fig. 2(b) shows the fluctuations of infected cases under β(t) with noise, in which a similar impact of the control rate can be observed as in Fig. 2(a).
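
The qualitative effect described here (a higher control rate delays and lowers the infection peak) can be explored with a small simulation. This is only a sketch under stated assumptions: the compartment structure follows the description in the text (S, I, resolving R, deceased D, recovered C), the transmission rate uses an assumed exponential-decay form rather than the paper's Eq. (1), the integration is a simple forward Euler scheme, and every parameter name and value (`beta0`, `beta1`, `gamma`, `theta`, `delta`, population size) is hypothetical.

```python
import numpy as np

def simulate_sird(q, beta0=0.5, beta1=0.25, gamma=0.2, theta=0.1, delta=0.01,
                  N=1_000_000, I0=10, days=300, dt=0.1):
    """Forward-Euler simulation; returns the infectious curve I(t) on a dt grid."""
    steps = round(days / dt)
    S, I, R, D, C = float(N - I0), float(I0), 0.0, 0.0, 0.0
    infectious = np.empty(steps + 1)
    infectious[0] = I
    for k in range(steps):
        # Assumed exponential-decay transmission rate with control rate q.
        beta = beta1 + (beta0 - beta1) * np.exp(-q * k * dt)
        new_infections = beta * S * I / N
        dS = -new_infections
        dI = new_infections - gamma * I        # infectiousness resolves at rate gamma
        dR = gamma * I - theta * R             # people leave the resolving state at rate theta
        dD = delta * theta * R                 # a fraction delta of those die
        dC = (1.0 - delta) * theta * R         # the rest recover
        S, I, R, D, C = S + dt * dS, I + dt * dI, R + dt * dR, D + dt * dD, C + dt * dC
        infectious[k + 1] = I
    return infectious

control_rates = (0.0, 0.01, 0.03, 0.05)
results = {}
for q in control_rates:
    curve = simulate_sird(q)
    results[q] = (curve.max(), curve.argmax() * 0.1)  # (peak size, peak day)
```

In this sketch the final transmission rate `beta1` is kept above `gamma`, so the epidemic still grows under control and the peak is postponed rather than suppressed; with other parameter choices the timing effect may differ.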

Fig. 2

Comparison of the infected cases with a fixed transmission rate and (a) dynamic β(t) and (b) β(t) with noise. It can be seen that in the dynamic model, a higher control rate not only delays but also lowers the peak of infected cases.

Apply dynamic β(t) to infectious disease model

Proposed model

In the experiments on real data, we applied β(t) to the general SEIR (GSEIR) model proposed in [3], which was based on a study of COVID-19 in China and has also been applied to COVID-19 studies in Spain [40], [41] and Italy [42]. Apart from S, E, I and R in the classical SEIR model [1], the GSEIR model introduced three additional states to model the epidemic dynamics: Quarantined (Q), Deceased (D) and Insusceptible (P). The quarantine state Q was originally proposed in [43], where it was used to refer to isolated individuals (as in quarantine). In the GSEIR model, the period from I to Q was defined as the time from becoming infectious to the case being confirmed. The block diagram of the GSEIR model with dynamic β(t) is shown in Fig. 3, and the dynamic GSEIR model can be expressed by a set of ordinary differential equations (ODEs) as in Eq. (3). The total population is N = S + E + I + Q + R + D + P. A set of coefficients {α, β(t), γ⁻¹, δ⁻¹, λ(t), κ(t)} represents the protection rate, dynamic transmission rate, average latent time, average time to enter the quarantine state, cure rate and mortality rate, respectively. The system of ODEs can be solved using the classic 4th-order Runge–Kutta method, and the model parameters can be estimated by curve fitting via the least square estimation (LSE) technique.
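
As a rough illustration of how such a system can be integrated, the sketch below writes a seven-compartment right-hand side in the commonly published GSEIR form (as in [3]) and advances it with a classic 4th-order Runge–Kutta step. It is not the authors' MATLAB implementation: the function names (`gseir_rhs`, `rk4`), the decay form used for β(t), the constant cure and mortality rates, the population size and the initial values are all assumptions made for this example. Only the latent and quarantine times (4.30 and 7.89 days) are taken from the UK row of Table 1.

```python
import numpy as np

def gseir_rhs(t, y, alpha, beta_fn, gamma, delta, lam_fn, kappa_fn, N):
    """Right-hand side for states S, E, I, Q, R, D, P (assumed GSEIR form)."""
    S, E, I, Q, R, D, P = y
    beta, lam, kappa = beta_fn(t), lam_fn(t), kappa_fn(t)
    return np.array([
        -beta * S * I / N - alpha * S,   # susceptible: infected or protected
        beta * S * I / N - gamma * E,    # exposed: 1/gamma is the latent time
        gamma * E - delta * I,           # infectious: 1/delta is the time to quarantine
        delta * I - lam * Q - kappa * Q, # quarantined (confirmed) cases
        lam * Q,                         # recovered
        kappa * Q,                       # deceased
        alpha * S,                       # insusceptible (protected)
    ])

def rk4(rhs, y0, t_end, dt, **params):
    """Classic 4th-order Runge-Kutta integration on a fixed step dt."""
    n = round(t_end / dt)
    ys = np.empty((n + 1, len(y0)))
    ys[0] = y0
    for k in range(n):
        t, y = k * dt, ys[k]
        k1 = rhs(t, y, **params)
        k2 = rhs(t + dt / 2, y + dt / 2 * k1, **params)
        k3 = rhs(t + dt / 2, y + dt / 2 * k2, **params)
        k4 = rhs(t + dt, y + dt * k3, **params)
        ys[k + 1] = y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return ys

# Illustrative run: hypothetical population, initial state and rates.
N = 1_000_000.0
y0 = np.array([N - 90.0, 70.0, 20.0, 0.0, 0.0, 0.0, 0.0])
traj = rk4(gseir_rhs, y0, t_end=100, dt=0.1,
           alpha=0.01,
           beta_fn=lambda t: 0.2 + 0.8 * np.exp(-0.04 * t),  # assumed decay form
           gamma=1 / 4.30, delta=1 / 7.89,
           lam_fn=lambda t: 0.02, kappa_fn=lambda t: 0.01, N=N)
```

Because the seven derivatives sum to zero, the total population is conserved along the trajectory, which is a convenient sanity check on any implementation of this kind.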

Fig. 3

The block diagram for the GSEIR model with dynamic β(t).

Parameter setting

The LSE optimisation can usually provide a good fit to the data; however, some auto-fitted parameters may not be plausible from the perspective of infectious disease, especially γ⁻¹, the latent period, and δ⁻¹, the period between becoming infectious (I) and being confirmed (Q). (For example, for the UK data, the auto-fitted values of the latent period and the time to confirmation appeared far from reality and from those reported in the literature.) In the GSEIR model, the period from I to Q was defined as the time from becoming infectious to the case being confirmed, which can vary between countries depending on the disease control policy. Therefore, we fixed the parameters γ⁻¹ and δ⁻¹ during model fitting after exploring different values reported in the literature, and left the rest to be estimated by LSE. (The fixed parameters can differ between countries, as summarised in Table 1 in the experiment section.) Note that this study focused on early data, before serious mutations may have occurred; different strategies for parameter setting may be needed for more complex situations arising from COVID-19 variants, strains and mutations [37], [38], [39].

Table 1

The parameter setting for selected countries.

Country	Starting Date	γ⁻¹ (days)	δ⁻¹ (days)	E(0) − Q(0)	I(0) − Q(0)
NI	12/03/2020	6.67	5.48	25	0
UK	23/03/2020	4.30	7.89	70	20
Italy	23/02/2020	4.30	5.48	200	100
Spain	02/03/2020	2.84	5.98	200	100
France	29/02/2020	4.30	5.48	150	120
Germany	01/03/2020	3.31	7.65	250	200

The latent period, γ⁻¹, is the period between the time at which a person is exposed to the virus (E) and the time at which they become infectious (I). During this period, the pathogen is present in a “latent” stage, without clinical symptoms or signs of infection in the host. Currently, there is no agreement on how long it takes an infected individual to become infectious. It has been reported [9], and is largely accepted, that transmission of COVID-19 infection may occur from an infectious but asymptomatic individual. In [44], it was reported that the median time prior to symptom onset is 3 days, with the shortest time being 1 day and the longest as much as 24 days. Another study [7] summarised that the latency period reported in the literature varies from 3.44–3.69 days [5] to 7 days [45], [46]. Based on these reports, we fixed γ⁻¹ within a range of 2.5–7 days, with slight variation across the six selected countries. The period δ⁻¹ between being infectious (I) and becoming confirmed (Q) can vary depending on a country’s testing capability, quarantine policy or the efficiency of its reporting system. In countries such as China, people with symptoms were required to go to hospital immediately and be quarantined as soon as possible. In the UK (at the time this study was conducted), people with symptoms were asked to self-quarantine for at least 7 days (for an individual) or 14 days (for the whole household) before going to hospital for a test. Therefore, it is not easy to use one value for all countries, and the final value of δ⁻¹ was set within a range of 5–14 days. Note that the cure rate and mortality rate are also defined as time-varying, and they can be specified in different ways. In [3] they were based on estimation from the reported recovered and mortality data in China. Here we adopted the simple functions used in [40], [41], in which the cure rate increases and the mortality rate decreases over time. 
For each country, the parameters of these two functions were added to the set of coefficients to be estimated by model fitting, so they can differ between countries. For those interested in tailoring the cure rate and mortality rate to suit individual country scenarios, more variations of the functions and their implementation can be found in [47].

Mobility trend

Data via Apple Maps

Theoretically, the control rate can be associated with different types of disease control measures as long as they can be quantified, although this is not always easy in practice. In this study, we used the mobility trend as an example and explored how it can be associated with infectious disease models. The daily global COVID-19 mobility trend data published by Apple Maps [33] were used. The data were provided as CSV files showing the mobility of driving, walking and transport for most countries. Users can request the mobility trend data by country, region or city. The daily data were compared with the data from January 13th, 2020, which Apple Maps used as the baseline (set as 100) for the entire mobility trend. Due to our geographical location, this study initially focused on the United Kingdom (UK), particularly Northern Ireland (NI), and was later extended to four more European countries: Italy, Spain, France and Germany. Since only the driving trends were available for NI, we considered only the driving trends in this study. Fig. 4(a) presents the original mobility trend of driving over 100 days for the six selected countries. There are fluctuations in the data that may be due to changes in mobility during weekends and public holidays. To mitigate the fluctuation, a 7-day moving average was applied to capture the overall trends. Fig. 4(b) shows the mobility trend smoothed by the 7-day moving average and normalised by dividing by the baseline. The x-axis in the figure shows the number of days. Because the outbreaks began at different times in each country, for a fair comparison the starting date in this study was set to the day on which the country confirmed its 100th case, except for NI, where we started from confirmation of the 20th case because of the relatively small number of confirmed cases compared to the other countries. 
The starting dates for each country are: (a) NI: Mar.12th, 2020; (b) UK: Mar.23rd, 2020; (c) Italy: Feb.23rd, 2020; (d) Spain: Mar.2nd, 2020; (e) France: Feb.29th, 2020; and (f) Germany: Mar.1st, 2020.

Fig. 4

The mobility trend of driving for six selected countries based on: (a) the original data from Apple Maps; (b) smoothed by 7-day averaging and normalised by dividing by the baseline of 100 (as set by Apple Maps [33]).

In Fig. 4(b), it can be seen that driving in most countries started to drop during the first 20 days, suggesting the impact of control measures such as lockdown or tighter social distancing rules. The trend for the UK appears to drop a few days behind the others, which suggests that the UK’s response was slower. NI appears to have reduced mobility earlier than the other countries, because most activities had stopped from March 17th due to the St Patrick’s Day public holiday. The two universities in NI closed from March 18th and all primary schools closed from March 20th, three days before the UK government announced the lockdown on March 23rd. Early reduction in mobility may have contributed to the fact that NI had the fewest infection cases and deaths compared to the rest of the UK during the first wave of the outbreak.

Processing mobility trend data

To incorporate the mobility trend data into the dynamic model, one instinctive idea was to apply the mobility trend directly as the control rate in Eq. (1). However, more consideration is needed before doing so. For example, the mobility trend data need to be processed before being linked to the disease models. There are different ways to process the trend data: which one should be used, and based on what criterion? In terms of technical implementation, one may also need to consider whether the mobility trend should be incorporated as a fixed value (such as the average mobility over a period of time) or as a time series (which is technically more challenging). In the present study, the focus was to find out which format of processed mobility trend can be associated with the control rate by measuring the correlation (as shown in the later experiments). Further development of direct incorporation of the mobility trend in the dynamic model will be considered in the second phase of the study. Next, we introduce four types of processing for the mobility trend, which can be applied to each country individually. For each country, the control rate was considered to be proportional to the mobility trend. The trend can be expressed in two different forms: the 7-day smoothed daily mobility trend, and the daily change of mobility relative to the baseline. The mobility trend is always positive, but the daily change of mobility can be positive or negative. A negative daily change means that mobility is lower than the baseline, which suggests reduced mobility during lockdown. When the daily change is zero or positive, the mobility trend is close to or higher than the baseline, as before lockdown or after lockdown is eased. To obtain the average mobility trend within a time period, two types of average can be calculated for each of the two forms: the average over the most recent 20-day block, and the average over all days from the start up to the end of that block. For example, the first type calculates the average of 20 days (from day 21 to day 40), while the second calculates the mean of all 40 days. 
The examples of the mobility trend processed by four different ways are provided in the results section (Fig. 7). In the experiments, we investigated how the control rate can be associated with the mobility trend in these four different forms.
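
The four processed forms can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: the moving-average alignment (`mode="valid"`), the definition of the daily change as the normalised trend minus 1, the helper names and the synthetic input series are all assumptions.

```python
import numpy as np

def smooth_and_normalise(raw, window=7, baseline=100.0):
    """7-day moving average of the raw index, divided by the Apple Maps baseline (100)."""
    kernel = np.ones(window) / window
    return np.convolve(raw, kernel, mode="valid") / baseline

def block_and_cumulative_means(x, block=20):
    """Return (mean per 20-day block, mean over all days up to the end of each block)."""
    n_blocks = len(x) // block
    per_block = np.array([x[i * block:(i + 1) * block].mean() for i in range(n_blocks)])
    cumulative = np.array([x[:(i + 1) * block].mean() for i in range(n_blocks)])
    return per_block, cumulative

# Synthetic driving index: at baseline for 26 days, then a drop to 40 (illustrative only).
raw = np.concatenate([np.full(26, 100.0), np.full(80, 40.0)])
m = smooth_and_normalise(raw)   # smoothed, normalised mobility trend (always positive)
change = m - 1.0                # assumed daily change relative to baseline (can be negative)

m_block, m_cum = block_and_cumulative_means(m)        # average mobility per 20 / per +20 days
c_block, c_cum = block_and_cumulative_means(change)   # average change per 20 / per +20 days
```

With 100 smoothed days this yields five values per form, corresponding to days 1–20, 21–40, 41–60, 61–80 and 81–100 for the block averages, and days 1–20, 1–40, and so on for the cumulative averages.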

Fig. 7

The box plot of four types of processed mobility trend from six selected countries: (a) average mobility change per 20 days; (b) average mobility change per +20 days; (c) average mobility per 20 days; (d) average mobility per +20 days.

Experiments and results

Model performance can be evaluated in different ways, such as via model fitting or prediction. Model fitting means fitting a model to experimental data and choosing the model parameters that best fit the data. Model prediction assesses, via data partitioning, how well the model predicts points that were not used in model fitting. In the experiments, we investigated whether model fitting and prediction can be improved by introducing the control rate into the infectious disease models, and how the control rate can be associated with the different formats of mobility trend. The proposed method was compared to GSEIR and to a modified SEIR model (obtained by excluding the states Q and P from the GSEIR model). The performance was evaluated based on the root mean square error (RMSE) and the mean absolute error (MAE). The proposed model was implemented in MATLAB R2019b. The code for the GSEIR model is available as open source [47].
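
For reference, the two error measures can be written as below. This is a generic sketch in Python rather than the MATLAB used in the study, and the function names and sample numbers are illustrative.

```python
import numpy as np

def rmse(predicted, observed):
    """Root mean square error: penalises large deviations more heavily."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(predicted, observed):
    """Mean absolute error: average size of the deviations."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)
    return float(np.mean(np.abs(diff)))

# Illustrative numbers only.
example_rmse = rmse([1, 2, 3], [1, 2, 5])
example_mae = mae([1, 2, 3], [1, 2, 5])
```

RMSE is never smaller than MAE for the same residuals, and the gap widens when a few days have large errors, which is why reporting both gives a fuller picture of fit quality.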

Data

The COVID-19 data for UK, Italy, Spain, France and Germany were obtained from Johns Hopkins University data repository [8], and the data for NI were obtained from COVID-19 UK Data via Github [48]. For each country, the data includes the case numbers for the total confirmed cases, deaths and recovered cases each day, which are required for infectious disease modelling. It has been noticed that the data for the recovered cases in the UK and NI were not properly reported. For Spain and France there were some data fluctuations in some days, such as reduced numbers in cumulative cases, which could be due to the issues in these countries’ reporting system. The instability of data may affect the performance in the results. The mobility trend data is available via Apple Maps [33] and has been explained in Section 2.3.1.

Parameters

The parameters γ⁻¹ and δ⁻¹ used in the experiments and the starting date for the data in each country are summarised in Table 1. Initial values were also set for the remaining parameters to be estimated. The initial values of the reported compartments can be found from the reported data. E(0) and I(0) are expected to be greater than Q(0), and were determined empirically by adding values from 20 to 250 to Q(0).

Model fitting

The proposed model was fitted to the 100-day data and the performance was evaluated based on RMSE and MAE, which provide different ways of quantifying the difference between the estimated and the reported values. The model was run for a set of control rates varying from 0.02 to 1.0 at intervals of 0.02. The RMSE was calculated for each, and the optimal control rate was the one giving the minimum RMSE. The comparison of performance by RMSE and MAE for SEIR, GSEIR and the proposed method is given in Table 2 for deaths and Table 3 for confirmed cases. The best result for each measure for a given country is in bold. It can be seen that the proposed method outperformed SEIR and GSEIR in fitting both deaths and confirmed cases. The optimal control rates for deaths and confirmed cases are slightly different for some countries, which may be due to the complexity of the reported data. For example, the number of confirmed cases can be affected by testing capability, the reporting system and disease control policy. Like some studies [4] that focused on the mortality data only, the following prediction experiments used only the mortality data.
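
The search over control rates described above amounts to a one-dimensional grid search. The sketch below shows the idea with a stand-in model; `grid_search_control_rate` and `toy_model` are hypothetical names, and the toy curve is not an epidemic model, only a placeholder for "run the fitted model at a given control rate and return its predicted series".

```python
import numpy as np

def grid_search_control_rate(model, observed, rates=None):
    """Return the control rate with minimum RMSE, and that RMSE.

    `model` is any callable mapping a control rate to a predicted series.
    """
    if rates is None:
        rates = np.round(np.arange(1, 51) * 0.02, 2)  # 0.02, 0.04, ..., 1.00
    observed = np.asarray(observed, dtype=float)
    errors = np.array([np.sqrt(np.mean((model(q) - observed) ** 2)) for q in rates])
    best = int(np.argmin(errors))
    return float(rates[best]), float(errors[best])

# Stand-in model for demonstration: a saturating curve controlled by q.
days = np.arange(100)

def toy_model(q):
    return 1000.0 * (1.0 - np.exp(-q * days))

observed = toy_model(0.10)  # synthetic "reported" data generated with q = 0.10
best_q, best_rmse = grid_search_control_rate(toy_model, observed)
```

A grid of 50 values is cheap to evaluate exhaustively, which avoids adding the control rate to the least-squares parameter set; the trade-off is that the result is only resolved to the grid spacing of 0.02.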

Table 2

Comparison of model fitting for death cases by SEIR, GSEIR and the proposed method.

	SEIR		GSEIR		Proposed
Country	RMSE	MAE	RMSE	MAE	RMSE	MAE	Control Rate
NI	375	312	15	14	14	12	0.45
UK	15470	11389	3740	3001	788	582	0.04
Italy	9478	7462	671	541	372	308	0.10
Spain	4438	3448	1407	1238	561	444	0.12
France	12534	9214	1167	828	1148	764	0.06
Germany	1108	1011	827	681	450	351	0.10

Table 3

Comparison of model fitting for confirmed cases by SEIR, GSEIR and the proposed method.

	SEIR		GSEIR		Proposed
Country	RMSE	MAE	RMSE	MAE	RMSE	MAE	Control Rate
NI	2066	1734	130	109	102	90	0.45
UK	101200	79812	8777	6833	5795	4512	0.10
Italy	72417	58732	7954	6540	2671	1727	0.10
Spain	47438	35258	10163	6939	5739	4800	0.12
France	87292	65856	7167	4359	7051	4194	0.03
Germany	61801	46280	5301	4171	4020	3178	0.10

The results of model fitting by the proposed method are given in Fig. 5 for the cumulative confirmed cases and deaths. Fig. 6 presents the results for the daily confirmed cases. There are several noticeable spikes and drops in the daily data for France and Spain, which may be due to adjustments made by these countries in their data reporting systems. For example, as seen in Fig. 5 for Spain, the reported cumulative confirmed cases dropped between days 50 and 60, which explains the negative number of daily confirmed cases in Spain in Fig. 6. A similar observation can be made for France in Fig. 5, which explains the corresponding negative values in Fig. 6.

Fig. 5

The results of model fitting based on the cumulative confirmed and deaths data by the proposed method in 100 days for six selected countries.

Fig. 6

The results of model fitting for the daily confirmed cases in 100 days by the proposed model for six selected countries. (The negative number of daily confirmed cases in Spain and France may be due to possible adjustment made in their data reporting system as noticed corresponding drops in their reported cumulative confirmed cases in Fig. 5).

It is also noted that the model may not perfectly represent the saddle point of infection in the UK between days 30 and 60 (as seen in Fig. 6), which may be because the data for recovered cases in the UK were not properly reported. Technically, a better fitting performance can be achieved by adjusting the parameters via auto-fitting by LSE optimisation; however, as explained in Section 2.2.2, those parameters may not be explainable from the perspective of infectious disease. In the experiment, we applied the proposed method to the data from six countries; the purpose was not to compare the performance among those countries but to evaluate the performance of the proposed method. The overall performance shows that the model fits the data well. The results suggest that a relatively consistent performance was achieved across different countries, which is encouraging for future studies.

Processing of mobility trend

To investigate how the control rate can be related to the mobility trend, we processed the mobility data in four different ways as explained in Section 2.3.2. The average mobility was calculated per 20 days and per +20 days, and these averages were associated with the control rate for death prediction per 20 days in the next experiment. The box plots of the four types of mobility trend for the six selected countries are shown in Fig. 7, which includes: (a) average change of mobility trend per 20 days; (b) average change of mobility trend per +20 days; (c) average mobility trend per 20 days; and (d) average mobility trend per +20 days. For example, Fig. 7(a) presents the average mobility trend change in five predefined time periods: days 1–20, 21–40, 41–60, 61–80 and 81–100. It can be seen that the average mobility drops during days 21–40, then gradually rises back close to normal during days 81–100. The change per +20 days in Fig. 7(b) shows slightly different trends from those in Fig. 7(a). The average mobility in Fig. 7(c) and Fig. 7(d) presents the same trends as in Fig. 7(a) and (b), respectively, but on different scales (as shown on the y-axis).

Four stage mortality prediction

The prediction performance was evaluated on the prediction of deaths at four stages, in which four data lengths were used (the same as those set for mobility processing). In practice, it is better to use as much of the available data as possible to fit the model before forecasting the unseen case numbers in future days. To ensure the prediction was completely out-of-sample, in the first stage, data from days 1–20 were used to estimate the parameters, which were then used to predict the deaths for the next 20 days (shown as 21–40 in the Days for Prediction column of Table 4 and Table 5). At the second stage, with more data available, the data from days 1–40 were used for model fitting, then the parameters were applied to predict days 41–60, and so on. The RMSE and MAE were calculated at each stage between the predicted 20 days and their corresponding reported data only.
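The four-stage, out-of-sample protocol described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the `linear_trend_forecaster` is a hypothetical stand-in for the compartmental model, and the death series is synthetic.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between reported and predicted values."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error between reported and predicted values."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def four_stage_evaluation(deaths, fit_and_forecast, stage_len=20, n_stages=4):
    """Fit on days 1..20k, predict the next 20 days, and score only those days."""
    deaths = np.asarray(deaths, dtype=float)
    results = []
    for stage in range(1, n_stages + 1):
        train_end = stage * stage_len
        test_end = train_end + stage_len
        train, test = deaths[:train_end], deaths[train_end:test_end]
        pred = fit_and_forecast(train, stage_len)
        results.append({
            "days": f"{train_end + 1}-{test_end}",
            "rmse": rmse(test, pred),
            "mae": mae(test, pred),
        })
    return results

def linear_trend_forecaster(train, horizon):
    """Hypothetical placeholder model: extrapolate a straight-line fit."""
    t = np.arange(len(train))
    slope, intercept = np.polyfit(t, train, 1)
    future_t = np.arange(len(train), len(train) + horizon)
    return slope * future_t + intercept

deaths = 5.0 * np.arange(100)  # synthetic cumulative deaths, illustrative only
results = four_stage_evaluation(deaths, linear_trend_forecaster)
for row in results:
    print(row)
```

Any model that can be fitted on a training window and asked for a 20-day forecast could be passed in place of the placeholder forecaster.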

Table 4

Comparison of four stage predictions by SEIR, GSEIR and the proposed model.

Country	Days for Prediction	SEIR RMSE	SEIR MAE	GSEIR RMSE	GSEIR MAE	Proposed RMSE	Proposed MAE
NI	21–40	507	330	61	45	6	5
	41–60	378	373	31	30	12	9
	61–80	500	499	4	3	3	3
	81–100	538	538	10	10	10	9

UK	21–40	5006	3834	3138	2248	3371	2456
	41–60	81644	58921	4284	3185	984	889
	61–80	99711	76297	883	822	402	306
	81–100	41943	31328	10076	9999	241	208

Italy	21–40	18780	11974	804	676	417	342
	41–60	138317	104928	4375	3930	222	170
	61–80	99291	84941	1646	1556	106	99
	81–100	63726	59998	2127	2093	332	300

Spain	21–40	53490	32407	2259	2070	2123	1938
	41–60	122825	97770	2129	2065	968	817
	61–80	71432	64205	2204	2127	960	885
	81–100	25706	25268	1353	1198	701	415

France	21–40	5039	3827	4363	3248	4314	3210
	41–60	66391	51848	1234	1120	3914	3843
	61–80	150076	128539	4851	4660	1746	1625
	81–100	102708	93317	908	899	249	224

Germany	21–40	1079	779	1173	884	821	611
	41–60	4830	4677	3810	3613	3654	3449
	61–80	3145	3105	1269	1244	1360	1334
	81–100	552	551	1269	1262	1048	1040

Table 5

Results of control rate and four types of mobility trend.

Country	Days for Prediction	Control Rate	|Mc1|	|Mc2|	Md1	Md2
NI	21–40	0.18	0.56	0.48	0.44	0.52
	41–60	0.16	0.47	0.48	0.53	0.52
	61–80	0.52	0.27	0.43	0.73	0.57
	81–100	0.02	0.06	0.35	0.94	0.65

UK	21–40	0.20	0.64	0.36	0.36	0.64
	41–60	0.90	0.61	0.45	0.39	0.55
	61–80	0.84	0.49	0.46	0.51	0.54
	81–100	0.18	0.27	0.42	0.73	0.58

Italy	21–40	0.08	0.82	0.53	0.18	0.46
	41–60	0.34	0.79	0.62	0.21	0.38
	61–80	0.12	0.66	0.63	0.34	0.37
	81–100	0.14	0.34	0.57	0.65	0.43

Spain	21–40	0.10	0.85	0.56	0.15	0.44
	41–60	0.88	0.79	0.64	0.21	0.36
	61–80	0.22	0.64	0.64	0.35	0.36
	81–100	0.20	0.36	0.58	0.63	0.42

France	21–40	0.12	0.78	0.50	0.22	0.50
	41–60	1.00	0.71	0.57	0.30	0.43
	61–80	0.84	0.47	0.54	0.53	0.46
	81–100	0.68	0.47	0.44	0.96	0.56

Germany	21–40	0.02	0.50	0.31	0.50	0.69
	41–60	0.98	0.37	0.32	0.63	0.67
	61–80	0.04	0.19	0.29	0.81	0.71
	81–100	0.06	0.10	0.21	1.10	0.79

The comparison of RMSE and MAE for prediction by SEIR, GSEIR and the proposed model is given in Table 4, and the corresponding control rates and the mobility trend processed in four different ways are presented in Table 5. The best result for each measure for a given country is in bold. It can be seen that in most cases the proposed method achieved better performance than GSEIR and SEIR according to both the RMSE and MAE measures, except for three cases in which GSEIR performed better and one case in which SEIR performed better. Note that the main focus here was not on maximising predictive accuracy, although this may be achieved by further tailoring the model to each individual country scenario. Instead, the goal was to evaluate the proposed model with a control rate, which can then provide a basis for capturing the dynamics of infectious disease at the early stage and can potentially be associated with additional information for disease control. The results presented so far suggest that the proposed model captures the disease transition and can be used to make reasonable predictions.
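The count of cases in which each model performed best can be checked directly against Table 4. The short Python sketch below tallies the lowest RMSE per country and stage, using the RMSE values transcribed from Table 4; the helper name `tally_best` is hypothetical, and ties are counted separately.

```python
# RMSE values transcribed from Table 4, as (SEIR, GSEIR, Proposed) per stage
# in the order 21-40, 41-60, 61-80, 81-100.
rmse_table = {
    "NI":      [(507, 61, 6), (378, 31, 12), (500, 4, 3), (538, 10, 10)],
    "UK":      [(5006, 3138, 3371), (81644, 4284, 984), (99711, 883, 402), (41943, 10076, 241)],
    "Italy":   [(18780, 804, 417), (138317, 4375, 222), (99291, 1646, 106), (63726, 2127, 332)],
    "Spain":   [(53490, 2259, 2123), (122825, 2129, 968), (71432, 2204, 960), (25706, 1353, 701)],
    "France":  [(5039, 4363, 4314), (66391, 1234, 3914), (150076, 4851, 1746), (102708, 908, 249)],
    "Germany": [(1079, 1173, 821), (4830, 3810, 3654), (3145, 1269, 1360), (552, 1269, 1048)],
}
MODELS = ("SEIR", "GSEIR", "Proposed")

def tally_best(table):
    """Count how often each model has the strictly lowest error; ties counted apart."""
    counts = {name: 0 for name in MODELS}
    counts["tie"] = 0
    for stages in table.values():
        for row in stages:
            best = min(row)
            winners = [name for name, value in zip(MODELS, row) if value == best]
            if len(winners) == 1:
                counts[winners[0]] += 1
            else:
                counts["tie"] += 1
    return counts

counts = tally_best(rmse_table)
print(counts)  # {'SEIR': 1, 'GSEIR': 3, 'Proposed': 19, 'tie': 1}
```

By RMSE, the proposed model is strictly best in 19 of the 24 country-stage cases, GSEIR in three, SEIR in one, and one case (NI, days 81–100) is a tie between GSEIR and the proposed model.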

Relationship between mobility and control rate

One of the objectives of this study was to investigate whether and how the control rate can be associated with the mobility trend. To measure the degree of association between two variables quantitatively, the correlation coefficient was used. A correlation between two variables indicates that as one variable changes in value, the other variable tends to change in the same or the opposite direction. The correlation coefficient measures both the direction and the strength of the tendency to vary together. A positive correlation coefficient indicates that both variables change in the same direction. For example, in the financial markets, a positive or negative correlation coefficient indicates that two stocks move in the same or opposite direction, respectively. For this study, a positive correlation between the control rate and any form of processed mobility trend indicates that the direction of changes in the control rate aligns with the changes in the mobility trend. The control rates obtained in the four-stage predictions and the mobility trend processed in four different ways are presented in Table 5. The number of days in the second column is the data length used for prediction. The control rate was varied from 0.02 to 1.0, the prediction was run for four data lengths, and the optimum rate was selected by minimum RMSE (using MAE produced the same rates). Note that Mc1 and Mc2 can be negative, but the control rate needs to be positive; therefore, the absolute values |Mc1| and |Mc2| were used in Table 5. To assess how an association may be established between the control rate and the mobility trend processed in four different ways, the correlation coefficients between the rates and the mobility trend were calculated. The resulting correlation coefficients are given in Fig. 8. It can be seen that a positive correlation is found between the control rates and the average mobility changes |Mc1| and |Mc2|, which suggests that the change of control rates is in line with the change of |Mc1| and |Mc2|.
In addition, apart from France, the average change of mobility within the entire prediction period has a higher correlation with the control rate than the rest. These results are encouraging and suggest the potential of further development for incorporating the mobility trend into dynamic disease modelling.
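The selection of the optimum control rate by minimum RMSE amounts to a one-dimensional grid search, which can be sketched in Python as follows. Several parts are assumptions made for illustration only: the step of 0.02 is assumed (the rates in Table 5 are all multiples of 0.02, but the step is not stated here), and `toy_cumulative_deaths` is a hypothetical placeholder curve, not the proposed model.

```python
import numpy as np

def toy_cumulative_deaths(rate, days):
    """Hypothetical placeholder forecaster; NOT the model proposed in the paper."""
    return 1000.0 * (1.0 - np.exp(-rate * days / 10.0))

def select_control_rate(observed, days, forecaster, rates):
    """Return the candidate rate with the lowest prediction RMSE, and that RMSE."""
    errors = [
        float(np.sqrt(np.mean((forecaster(rate, days) - observed) ** 2)))
        for rate in rates
    ]
    best = int(np.argmin(errors))
    return float(rates[best]), errors[best]

# Candidate rates 0.02, 0.04, ..., 1.00 (step size is an assumption).
rates = np.round(np.arange(1, 51) * 0.02, 2)
days = np.arange(21, 41)                       # one 20-day prediction window
observed = toy_cumulative_deaths(0.30, days)   # synthetic "reported" data
best_rate, best_rmse = select_control_rate(observed, days, toy_cumulative_deaths, rates)
print(best_rate, best_rmse)
```

Because the synthetic data are generated with a rate that lies on the grid, the search recovers it; with reported data the minimum RMSE would be non-zero and the selected rate is only the best candidate on the grid.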

Fig. 8

Results of correlation coefficients between the control rate and four types of mobility trend based on 20-day death prediction for six countries.
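The correlation calculation can be illustrated with the NI rows of Table 5. The Python sketch below assumes the Pearson correlation coefficient, since the type of coefficient is not specified in this section; the resulting values are therefore illustrative and need not reproduce Fig. 8.

```python
import numpy as np

# NI rows of Table 5, for prediction days 21-40, 41-60, 61-80 and 81-100.
control_rate = [0.18, 0.16, 0.52, 0.02]
mobility = {
    "|Mc1|": [0.56, 0.47, 0.27, 0.06],
    "|Mc2|": [0.48, 0.48, 0.43, 0.35],
    "Md1":   [0.44, 0.53, 0.73, 0.94],
    "Md2":   [0.52, 0.52, 0.57, 0.65],
}

def pearson_r(x, y):
    """Pearson correlation coefficient (assumed choice of coefficient)."""
    return float(np.corrcoef(x, y)[0, 1])

correlations = {name: pearson_r(control_rate, values) for name, values in mobility.items()}
for name, r in correlations.items():
    print(f"{name}: {r:+.2f}")
```

For these rows Md1 equals 1 − |Mc1| and Md2 equals 1 − |Mc2|, so under this assumption the coefficients for the average mobility are the negatives of those for the average mobility change. With only four stages per country, each coefficient rests on four data points and should be read with caution.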

Discussion

Disease modelling is challenging because it involves many factors, ranging from biology and social behaviour to healthcare systems and intervention policy. Good performance in model fitting and prediction may require incorporating different levels of complexity, resulting in more complicated models than basic compartmental models such as SIR or SEIR. The purpose of this study was not to design a complex model to suit all countries; instead, the goal was to propose an informatics approach to capture the dynamics of disease transmission and to integrate different types of data into disease modelling, thereby improving model performance. The mobility trend data were used as an example in this study, but the proposed ideas may be extended to incorporate other information, such as a social contact matrix [34], for other types of applications involving social distancing or exit strategies. Since disease modelling relies on the reported data, any factors that affect the data will have an impact on the model performance. For example, in this study, the data for the recovered cases in the UK and NI were not properly reported, and the instability in the data reported in Spain and France appears as negative changes in daily confirmed cases at several data points (as seen in the model fitting results in Fig. 6). Testing strategies will also have an impact on the reported data. Different types of testing strategies have been introduced, including diagnostic testing, screening testing, and public health surveillance. Many countries have adopted testing strategies such as rapid professional-use and home-use antigen lateral flow tests [49], but how end-users perform rapid testing in different environmental settings still requires further investigation [50], [51].
How effectively the testing strategies are implemented, and how accurately and promptly the results are reported to the data system, will have a direct impact on the data and hence may affect the model performance, which is a challenge that disease modelling needs to tackle in general. Future improvements may draw on studies [52] that have taken testing strategies into account for COVID-19 modelling. One limitation of this study is that the mobility trend data were not directly applied in model fitting or prediction, which is another exciting research topic. Instead, as the first phase of this study, the main focus was to investigate how the dynamics in an infectious disease model can be captured by introducing the control rate, and whether the control rate can be associated with the mobility trend processed in four different ways. Effectively integrating the mobility trend (either as a fixed value or as a time series) into the model will require further technical development, which is considered the next phase of this study. As a proof of concept, the preliminary results from this study are positive, which suggests that there is some utility in incorporating mobility trend data into infectious disease modelling. However, applying the proposed approach to different scenarios is still a challenging task, and several factors may need to be taken into account. Firstly, to incorporate any information into a mathematical model, such information needs to be quantifiable; mobility trends and a social contact matrix [34] are good examples. Secondly, the information should have a potential association with the outcomes of modelling, such as case growth rates or the reproductive number (R number) [21], [31], [32]. Thirdly, the application may need to be adjusted for certain time frames or for a specific country or region.
The current study was based on data from an early stage (100 days) of the COVID-19 pandemic, and it did not consider more complex scenarios that have developed over a longer period of time, such as virus mutation or the second wave. For viral mutation, as reported in [39], the overall evolutionary rate of SARS-CoV-2 is very low, as it will take some time for the virus to acquire substantial genetic diversity. As COVID-19 evolved during the pandemic, its variants, strains and mutations [37], [38] will certainly add to the complexity because they have very different transmission rates, for which different modelling strategies may be needed. For the second wave, taking Germany as an example, the lockdown strategy failed to work [53], given that public health offices were overwhelmed by the increased workload of testing citizens returning from summer holidays; furthermore, the spread of the virus went beyond clusters of cases into wider communities, making it harder to pinpoint the source of infection. Since the proposed model did not consider the second wave, it may not be suitable for a situation like Germany's unless further modifications are made with additional region-specific knowledge and data. Due to the complexity involved in infectious disease modelling, it is difficult to apply one model to suit all countries or different real-life scenarios. A better research direction may be to tailor the study to one country or to target a specific scenario so that the model can be adjusted accordingly, which, however, is beyond the scope of this study. Future studies may be conducted from different perspectives by employing the proposed model. For example, the current study was based on the global data published in the Johns Hopkins University data repository, which unfortunately does not include age-specific data.
Some studies have been carried out using age-specific data [54], which may be used to investigate the age-related dynamics of the proposed model as a basis for future research.

Conclusion

This study presented a novel approach that introduced a dynamic transmission rate into an infectious disease model for COVID-19. A control rate was included to govern the speed of disease spread, which can be associated with quantifiable information related to disease control, such as mobility trend data from Apple Maps. The impact of the dynamic transmission rate on the overall number of infection cases was demonstrated by simulation. The results based on six European countries suggest that the proposed approach provided an overall improvement in model fitting and mortality prediction during the early days of the pandemic. The relationship between the control rate and four types of mobility trend representation was investigated, and the results suggest that the control rate is correlated with the average mobility changes. Integration of multiple sources of data into disease modelling is a challenging task, and it is difficult to have a universal approach that can capture all the characteristics of infectious disease. Nevertheless, this study presented a new direction for disease modelling, by which we hope to inspire more studies to integrate information from multiple sources with infectious disease modelling in the future.

CRediT authorship contribution statement

Min Jing: Conceptualization, Methodology, Software, Formal analysis, Investigation, Writing – original draft, Writing – review & editing, Visualization. Kok Yew Ng: Methodology, Validation, Writing – review & editing. Brian Mac Namee: Methodology, Writing – review & editing. Pardis Biglarbeigi: Writing – review & editing. Rob Brisk: Resources. Raymond Bond: Writing – review & editing. Dewar Finlay: Writing – review & editing. James McLaughlin: Supervision, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

