Forecasting the transmission trends of respiratory infectious diseases with an exposure-risk-based model at the microscopic level.

Ziwei Cui¹, Ming Cai², Yao Xiao³, Zheng Zhu⁴, Mofeng Yang⁵, Gongbo Chen⁶.

Abstract

Respiratory infectious diseases (e.g., COVID-19) have caused enormous damage to human society, and accurate prediction of their transmission trends is essential for both the health system and policymakers. Most related studies focus on epidemic trend forecasting at the macroscopic level, which ignores the microscopic social interactions among individuals. Meanwhile, current microscopic models still cannot sufficiently decipher the individual-based spreading process and lack valid quantitative tests. To tackle these problems, we propose an exposure-risk-based model at the microscopic level that comprises 4 modules: individual movement, virion-laden droplet movement, individual exposure risk estimation, and prediction of transmission trends. The first two modules reproduce the movements of individuals and of the droplets from infectors' expiratory activities, respectively. Their outputs are then fed to the third module to estimate the personal exposure risk. Finally, the number of new cases is predicted in the fourth module. By predicting the new COVID-19 cases in the United States, the performance of our model is compared with that of 4 existing macroscopic or microscopic models. Specifically, the mean absolute error, root mean square error, and mean absolute percentage error of the proposed model are respectively 2454.70, 3170.51, and 3.38% smaller than the minimum results of the comparison models. The quantitative results reveal that our model can accurately predict the transmission trends from a microscopic perspective, and it can benefit the further investigation of many microscopic disease transmission factors (e.g., non-walkable areas and facility layouts).

Entities: Chemical

Keywords: COVID-19; Environmental epidemiology; Exposure risk; Microscopic model; Public health; Respiratory infectious diseases

Year: 2022; PMID: 35568232; PMCID: PMC9095069; DOI: 10.1016/j.envres.2022.113428

Source DB: PubMed; Journal: Environ Res; ISSN: 0013-9351; Impact factor: 8.431

Introduction

The unexpected outbreak and rapid spread of Corona Virus Disease 2019 (COVID-19) have damaged the world tremendously and profoundly influenced fields such as medicine, industry, and energy (Sharifi et al., 2021; Anand et al., 2021; Bontempi et al., 2020; Wang et al., 2020). At present, a growing number of countries have moved into a post-pandemic phase, i.e., the overall spread of COVID-19 has been controlled, but intermittent small-scale outbreaks still occur (Jin et al., 2021). COVID-19 is a respiratory infectious disease (RID) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (Zhang et al., 2022a). RIDs pose a huge threat to the population and public health, and recent studies have found that the loss of life expectancy due to RIDs stood at 1.29 years globally in 2017 (Huang and Guo, 2022). To live with the ongoing challenges of COVID-19 or the re-emergence of other respiratory viral infections, people must raise awareness and take measures for infection prevention and control in daily life (Coccia, 2021; Zhu and Zhu, 2021). The factors that affect the transmission of RIDs can be grouped into 6 main categories (see Table 1): virus-related factors, population characteristics, economic factors, scene factors, environmental and geographical factors, and prevention and control measures. Examining the impacts of these factors on transmission is important because it benefits disease prevention and control. For example, non-pharmaceutical interventions (NPIs) (e.g., maintaining safe social distance (Flaxman et al., 2020; Lai et al., 2020); entry limitation policy (Xiao et al., 2021)), which belong to the last category, play essential roles in stopping the spread of respiratory viral infections. NPIs can efficiently respond to emerging epidemics, and they are long-term approaches that interfere with people's social behavior (Duives et al., 2021; Perra, 2021). In a public place, it is necessary to find the most critical NPIs, i.e., those that most effectively reduce the number of infected cases. Thus, to help evaluate the impacts of these factors, a model that can accurately describe the disease transmission process is necessary (Ren et al., 2020).

Table 1

There are some factors that affect the transmission of RIDs.

Categories	Examples and references
Virus-Related Factors	Strains of the Virus (Abu-Hammad et al., 2020)
Population Characteristics	Age (Abu-Hammad et al., 2020); Genetic Makeup of Populations Blood Groups (Abu-Hammad et al., 2020)
Economic Factors	Gross Domestic Product (GDP) (Zhang et al., 2021); Commercial Trade (Bontempi and Coccia, 2021; Bontempi, 2021, 2022; Bontempi et al., 2021)
Scene Factors	Facility Layouts (Xiao et al., 2021); Inter-Provincial Travels (Ahmadi et al., 2021)
Environmental and Geographical Factors	Air Pollution (Urrutia-Pereira et al., 2020; Conticini et al., 2020; Lembo et al., 2021; Wu et al., 2021); Solidwaste Generated by Infected Individuals (Al Huraimel et al., 2020); Sunspot Numbers (Nasirpour et al., 2021)
Prevention and Control Measures	NPIs, e.g., Maintaining Safe Social Distance (Flaxman et al., 2020; Lai et al., 2020), Entry Limitation Policy (Xiao et al., 2021); Vaccines (Daniels et al., 2021; Verma et al., 2022); Statins and Other Medicines (Davoudi et al., 2021); Virus Detection Technologies (Li et al., 2021; Zhang et al., 2022b)

Mathematical epidemiological models help decipher the complex transmission process of epidemics (Araya, 2021; Hosseini et al., 2020), and they generally fall into macroscopic-level and microscopic-level models (Xiao et al., 2021). At the macroscopic level, researchers focus on the infection and recovery process of a disease among the population (Earn, 2008) and have proposed many compartmental models and computational intelligence methods. The former include the susceptible–infected–removed (or recovered) (SIR) model (Kermack and Mckendrick, 1927), the susceptible–infected–recovered–susceptible (SIRS) model (Hethcote, 1976), and their extended modifications (Liu et al., 1987; Black and Mckane, 2010; Cross et al., 2007); the latter include the Long Short-Term Memory (LSTM) method (Aragão et al., 2022), the hybrid intelligent approach based on fractal theory and fuzzy logic (Castillo and Melin, 2020), the multiple ensemble neural network model with fuzzy response aggregation (Melin et al., 2020), and others. Although the macroscopic models have established the research discipline of mathematical epidemiology, most of them need continuously updated data or a large amount of data to obtain optimized system parameters, and they may ignore details of the social interactions among individuals. Therefore, macroscopic approaches can be insensitive when evaluating NPIs or require strong assumptions to overcome this limitation. Microscopic models address these limitations to some extent because they focus on disease spreading between individuals. However, current microscopic models still cannot sufficiently decipher the individual-based spreading process and lack valid quantitative tests. In this study, an exposure-risk-based model at the microscopic level is developed, and its principal contributions are as follows.
First, the movements of individuals and the dynamics of droplets are modeled separately and then coupled to calculate the personal exposure risk, so that an integrated transmission process of the RID at the individual level is formulated. Second, a bridge is built between the macroscopic epidemic transmission data and our microscopic model at the individual level. As a result, our model can be quantitatively calibrated and validated with macroscopic data, such as the number of new cases. Third, based on the proposed model, the influences of factors that affect the model input values or scenarios (e.g., non-walkable areas and facility layouts) can be quantitatively evaluated. The rest of the paper is organized as follows. Section 2 reviews previous studies. Section 3 introduces our model, and Section 4 presents applications of the model based on real-world COVID-19 data from the United States. Section 5 compares different models. Section 6 reports discussions and future perspectives. Finally, Section 7 summarizes the paper.

Literature review

Several microscopic models forecast the spreading trends of RIDs. For example, previous scholars have estimated individual-level mobility due to daily activities (such as work, study, or shopping) (Eubank et al., 2004). In these models, RIDs can transmit between individuals at one specific location and spread between locations due to mobility. It should be noted that the location mentioned here is a specific area like a school or shopping mall, not the physical position of the individual. Unlike these activity-based models, some studies focus on the transmission of respiratory viruses, and they have constructed Wells-Riley models and aerosol infectious dose-response models (Bazant and Bush, 2021; Kriegel et al., 2020). These models assume that infectious particles are distributed homogeneously, which results in the same infection risk for all individuals regardless of their physical distance from the infector. Recent studies have considered the spatiotemporal distribution of pathogens in the environment under different transmission routes, and the personal exposure risk is determined based on the duration and distance of infectious contacts (Arav et al., 2021; Gao et al., 2021). Nevertheless, transmission is modeled as a static event between infectors and susceptible individuals in the aforementioned models. Thus, they cannot describe the epidemic spreading process when the physical distances between individuals vary over time as individuals move. Several studies have integrated pedestrian dynamics into epidemic spreading models to tackle this issue (Kim and Quaini, 2020; Ronchi and Lovreglio, 2020). Indeed, pedestrian dynamics are suitable for describing individual decisions and actions in mass gathering scenarios.
Pedestrian-based epidemic spreading models are composed of an individual movement module (e.g., the social force model (Xiao et al., 2021), the nomad model (Duives et al., 2021), a social force model coupled with an Eikonal equation (Abdul Salam et al., 2021)) and a disease transmission module (e.g., the model based on the cut-off distance (Xiao et al., 2021), the QVEmod (Duives et al., 2021), the non-local SEIS contagion model (Abdul Salam et al., 2021)). The former module simulates the general crowd movement and outputs time series of personal positions. Based on the outputs from the first module, the second module evaluates the disease transmission risk from the infectors to susceptible individuals. However, previous pedestrian-based epidemic spreading models ignored critical factors for simplification, e.g., the exposure risk is a fixed term whenever the individual is within the infection risk area (Xiao et al., 2021), so they potentially overestimate or underestimate the number of high-risk people. Besides, only limited situations have been analyzed, such as the cruise ship (Fang et al., 2020), the supermarket (Parisi et al., 2020), and the academic building (Romero et al., 2020). Moreover, different scenarios have various scales and geometries, e.g., tables, chairs, and other furnishings and decorations in a restaurant (10 m × 9 m) (Duives et al., 2021). These variable settings affect the virus transmission risks and bring more model inputs and computation costs. In contrast, a general public place can stand in for scenes of the same scale, e.g., an empty room of 10 m × 9 m represents a restaurant or store of the same scale without considering the indoor geometries. Meanwhile, most outputs of existing microscopic-level models, such as the personal exposure risk and the number of high-risk exposed people, are hard to verify due to the lack of actual data. As a result, these models are used directly or after a qualitative analysis, without quantitative tests.
Hence, building a model for a general public place that can be validated and applied to all scenes is necessary and practical.

Model

To construct the microscopic-level exposure-risk-based model, we first define the exposure risk of individuals. RIDs are transmitted through viral droplets produced by respiratory activities such as breathing, talking, coughing, and sneezing. Here, coughing, the typical symptom of most RIDs, is considered in our model. Thus, we define the instantaneous personal exposure risk as the maximal mass of virion-laden droplets, produced by a typical cough of an infector, to which an individual is exposed at a certain moment. The personal exposure risk is then calculated by summing the instantaneous personal exposure risks over the dwell time. The components of our model are shown in Fig. 1. The input consists of three important parameters: the number of individuals $C_{total}$, the number of infectors among the individuals $C_{inf}$, and the mean dwell time of individuals $T_{dwell}$. The output includes not only the result of existing pedestrian-based spreading models (i.e., the number of high-risk exposed people) but also the key outcome (i.e., the number of new cases). In addition, the personal exposure risk $E_i$ is obtained as an intermediate output.

Fig. 1

The flowchart of our model.

Individual movement simulation

In this part, the movements of individuals are modeled within a general public place. First, we set up the simulation space and the individuals to mirror reality. Second, the pedestrian dynamic model (i.e., the social force model) used to reproduce individuals' movements is introduced. Finally, time series of individual positions are produced for Section 3.3. Herein, the simulation space is a general place without obstacles, but it has boundaries like an empty indoor room. Whether or not the individuals from the model inputs are infected, their movements are modeled in the same way, as shown in Fig. 2, where each person is represented as a circle of radius $r$. Thus, once we model the motion of one person during the dwell time, the movements of all others can be determined in the same way.

Fig. 2

The sketch map of the individuals moving in the simulation space.

The social force model is one of the most widely used microscopic models, and it is also the basic model used to simulate pedestrian dynamics in commercial software (e.g., PTV-Vissim, MassMotion) (Bouchnita and Jebrane, 2020a, 2020b). Hence, the social force model is applied here. The movement of individual $i$ (with mass $m_i$) at time $t$ is driven by the resultant of three forces: a driving force that reflects the desire of individual $i$ to maintain a certain walking speed toward a certain direction within a relaxation time; the interaction force between the objective individual $i$ and each neighboring individual $j$; and the instant interaction force between the objective individual $i$ and each wall or obstacle $w$. More details about the social force model can be found in the literature (Chraibi et al., 2011; Helbing and Molnár, 1995; Kretz et al., 2018). As the motion of individual $i$ follows Newton's second law with second-order dynamics, the resultant force determines the velocity and the unit direction vector of individual $i$, from which the location of individual $i$ is finally obtained. In addition, to simulate the crowd movements, a random walking process is set by adjusting the desired initial direction. When individual $i$ is close to hitting the boundary, the model forces the person to change the desired direction to any direction that points away from the boundary.
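The movement update described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it keeps only the driving-force term of the social force model (the interaction forces with neighbors and walls are omitted), it uses an explicit Euler step, and the names `driving_force`, `step`, `MARGIN`, and `DT`, as well as the 80 kg body mass, are hypothetical. The relaxation time (0.50 s), desired speed (1.34 m/s), and room size (22 m) are the values reported in the simulation setups.

```python
import math
import random

TAU = 0.50        # relaxation time in s (from the simulation setups)
V_DESIRED = 1.34  # desired speed in m/s (from the simulation setups)
ROOM = 22.0       # side length of the square room in m (from the simulation setups)
MASS = 80.0       # body mass in kg (assumed, not given in the text)
DT = 0.04         # pedestrian time step in s (assumed)
MARGIN = 0.5      # assumed distance to a wall that triggers a change of direction


def driving_force(vel, direction):
    """Force that relaxes the current velocity toward the desired velocity."""
    return tuple(MASS * (V_DESIRED * e - v) / TAU for v, e in zip(vel, direction))


def step(pos, vel, direction, rng):
    """Advance one pedestrian by one explicit Euler step."""
    near_wall = any(p < MARGIN or p > ROOM - MARGIN for p in pos)
    if near_wall:
        # Redraw the desired direction until it has a positive component
        # toward the room centre, i.e. it points back into the room.
        to_centre = (ROOM / 2.0 - pos[0], ROOM / 2.0 - pos[1])
        while True:
            angle = rng.uniform(0.0, 2.0 * math.pi)
            direction = (math.cos(angle), math.sin(angle))
            if direction[0] * to_centre[0] + direction[1] * to_centre[1] > 0.0:
                break
    force = driving_force(vel, direction)
    # Newton's second law: acceleration = force / mass.
    vel = tuple(v + f / MASS * DT for v, f in zip(vel, force))
    # Integrate the velocity and keep the position inside the room.
    pos = tuple(min(max(p + v * DT, 0.0), ROOM) for p, v in zip(pos, vel))
    return pos, vel, direction
```

Starting from rest, repeated calls to `step` bring the speed close to the desired speed after a few relaxation times, which is the behavior the driving-force term is meant to produce.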

Cough droplet movement simulation

Virion-laden droplet transmission of a typical cough is modeled through Computational Fluid Dynamics (CFD) simulations within a closed environment. First, the computational domain and grids are constructed. Second, we choose appropriate methods and parameter settings from the literature for the numerical simulation. Finally, after completing the simulation with the commercial CFD solver ANSYS Fluent release 2020 R2, time series of the positions and masses of each cough droplet can be determined for Section 3.3. An infector is represented by a source manikin 1.70 m tall, representing an average-sized human, as shown in Fig. 3(a) (Liu et al., 2017). The mouth of the infector is 1.50 m from the ground, and the mouth outflow is roughly horizontal. For susceptible people whose height ranges from 1.40 m to 2.00 m, we take 1.20 m–1.80 m as the breathing height area, as shown in the blue area of Fig. 3(a). The computational domain is a cuboid and is illustrated in Fig. 3(b), in which the “x”, “y”, and “z” axes represent the lateral, axial (streamwise), and vertical directions, respectively. The length of the computational domain along the “y” axis is set to 3.00 m to fully cover the scope of a cough. The blue area of Fig. 3(a) is the view of Fig. 3(b) along the “x” axis. The inlet is a square with an area of 3.70 × 10⁻⁴ m², representing the human mouth (Borro et al., 2021). The center of the inlet is denoted by (0, 0, 0), which is also the origin of the coordinate system. The computational domain and grids are generated with Gambit 2.4.6. The grid resolution is 0.01 m, and the total number of computational cells is approximately 1.80 million.

Fig. 3

Schematic diagram of (a) the computational domain with source and susceptible manikins, and (b) the numerical simulation computational domain.

To accurately estimate the consequences of the coughing event, reliable modeling methods and numerical simulation settings are important. The transmission medium of the cough is modeled as an incompressible ideal gas with constant properties calculated at ambient conditions. The flow field evolution of coughing is time-dependent, so the simulations are conducted under a transient condition. The gravitational acceleration is −9.81 m/s² along the “z” axis, and the energy equation is solved. Since the droplet volume fraction is very low in the cough flow, the Eulerian-Lagrangian method is used in this study (Zhu and Zhu, 2021; Gupta et al., 2010; Zhu et al., 2006). Following a previous study (Borro et al., 2021), a time step size of 0.01 s is used, with 10 sub-iterations. The total flow time of the transient simulation is 15.00 s, which is enough to investigate the dynamic characteristics of the droplets produced by coughing. More detailed settings are determined based on the literature (Aliabadi et al., 2010; Bi, 2018; Chao et al., 2009; Duguid, 1946; Gao and Niu, 2006; Zhang and Li, 2012; Zhang et al., 2017). The droplet dispersion process in the computational domain is obtained after the simulation. The results show that droplets with a larger diameter fall faster, and that droplets can remain suspended for a longer period when the diameter is less than or equal to 1.00 × 10⁻⁵ m, consistent with (Borro et al., 2021).

Individual exposure risk estimation

In this part, we first give the mathematical representation of the instantaneous personal exposure risk. As mentioned above, when a person and the cough droplets meet in the same simulation place, the instantaneous exposure risk is defined as the maximal possible mass of droplets received. Then, based on the numerical simulation results of the typical cough, we count how the mass of droplets varies with time at several planes at fixed distances and analyze the data distribution to determine the model function, which is the final formula of the instantaneous exposure risk. Finally, the total exposure risk during the visiting time is estimated for each person. It should be noted that, according to a few studies that analyze the dispersion of cough-generated droplets in the wake of a walking person (Li et al., 2020), the situation can be more complicated when there are more individuals. Besides, the direction of the cough is likely to differ from that of the individual's movement, so the impact of walking on cough transmission is unpredictable. Therefore, in this model, we assume that infector $j$ is stationary at the position and time at which he/she starts the $k$-th cough. The movements of the cough droplets are then determined only by the physical distance from that position and the time elapsed since that moment. All positions needed here can be determined from the crowd simulation in Section 3.1. When individual $i$ is at a given position at time $t$, the instant distance from the cough position and the time interval since the start of the cough can be computed. If the distance and the time interval are within the cough infectious distance and the infectious time, respectively, then by definition the instantaneous exposure risk of individual $i$ exposed to the $k$-th cough of infector $j$ equals the maximal mass of droplets generated by that cough that have spread to that distance after that time interval.
On the assumption that all coughs are typical, it is sufficient to analyze the change of droplet mass during the transmission of one typical cough to obtain a deterministic formula for this maximal mass. Since the density of all droplets in Section 3.2 is set to the same value, the larger the diameter, the greater the droplet mass. Droplets with a large mass fall quickly, and their destinations are always close to the injector. In contrast, droplets with a small mass stay in the computation region for a long time, and their physical distances from the injector are random. Hence, it is hard to find a suitable deterministic formula to describe the change of droplet mass with the time-varying distance. To solve this problem, we select representative x-z planes and model how the arriving droplet mass varies with time for each plane. For a typical cough, the mass of droplets passing through the x-z plane at a given distance along the “y” axis at a given time can be counted. Since each individual is represented as a circle, the calculation domain in the simulation is divided into segments 0.20 m long along the “y” axis, and the x-z plane at the center of each segment is taken as the representative of that segment. More exactly, the plane distance ranges from 0.10 m to 2.90 m in steps of 0.20 m, which gives 15 representative x-z planes as shown in Fig. 4.

Fig. 4

15 representative x-z planes in the three-dimensional schematic diagram of the computational domain.

Moreover, to avoid undercounting the droplet mass because of the timestep setting in the pedestrian dynamics simulation, the module uses the total mass of droplets passing through a plane during one pedestrian-simulation timestep as the counted mass. As set in the second module (Section 3.2), the simulation time step size of cough droplet transmission is 0.01 s. Therefore, the simulation time step size of pedestrian dynamics should be larger than 0.01 s. When we set the pedestrian timestep to 0.02 s, 0.03 s, 0.04 s, and 0.05 s, respectively, the statistical results show that the counted droplet mass conforms to a Gaussian function, as shown in Fig. 5. Thus, the droplet mass is obtained from the fitted Gaussian function, which has three fitted parameters.
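The plane bookkeeping and the three-parameter Gaussian can be sketched as follows. This is only an illustration under assumptions: the original formula did not survive in this copy of the text, so the parameter names `amplitude`, `centre`, and `width` are hypothetical, and whether the Gaussian variable is time or distance is not recoverable here, so the independent variable is kept generic as `u`. The segment length of 0.20 m and the 15 planes are taken from the text.

```python
import math

SPACING = 0.20   # segment length along the "y" axis in m (from the text)
N_PLANES = 15    # number of representative x-z planes (from the text)


def plane_index(distance_m):
    """Return the index of the 0.20 m segment that contains the distance,
    or None when the distance lies outside the 3.00 m domain."""
    if distance_m < 0.0:
        return None
    idx = int(distance_m // SPACING)
    return idx if idx < N_PLANES else None


def plane_centre(idx):
    """Distance of the representative plane at the centre of segment idx."""
    return (idx + 0.5) * SPACING


def gaussian_mass(u, amplitude, centre, width):
    """Three-parameter Gaussian; the parameters would come from a curve fit."""
    return amplitude * math.exp(-((u - centre) ** 2) / (2.0 * width ** 2))
```

With this indexing, the first and last representative planes sit at 0.10 m and 2.90 m, which matches the range stated above.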

Fig. 5

Counted droplet mass at different times, panels (a) to (d).

After the droplet mass function has been determined for a typical cough, the same function applies to the $k$-th cough of any infector $j$, because all coughs are assumed to be typical. If the distance and the time interval are both within the infectious distance and the infectious time, the instantaneous exposure risk is given by the fitted Gaussian function; otherwise, it is zero. The exposure risk $E_i$ of individual $i$ during the visit is then the sum of the instantaneous exposure risks over all time steps between the time at which individual $i$ enters the place and the end of his/her dwell time, over all infectors in the place at each time, and over all infectious coughs of each infector at that time.
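The accumulation of exposure risk over a visit can be sketched as a pair of nested loops. This is a minimal illustration, not the authors' code: the function names, the `(t, x, y)` track format, and the cough tuple format are assumptions, and `mass_fn` stands in for the fitted droplet mass function. The infectious distance of 1.70 m is the value reported in the simulation setups.

```python
import math

INFECTIOUS_DISTANCE = 1.70  # m (from the simulation setups)


def instantaneous_risk(distance, elapsed, infectious_time, mass_fn):
    """Droplet mass received from one cough at one time step, or zero when
    the individual is outside the infectious distance or infectious time."""
    if distance > INFECTIOUS_DISTANCE:
        return 0.0
    if elapsed < 0.0 or elapsed > infectious_time:
        return 0.0
    return mass_fn(distance, elapsed)


def exposure_risk(track, coughs, mass_fn):
    """Sum the instantaneous risks over a visit.

    track:  list of (t, x, y) positions of one susceptible individual,
            one entry per pedestrian time step during the dwell time.
    coughs: list of (t0, x0, y0, infectious_time), one entry per cough of
            any infector; the infector is treated as stationary at (x0, y0).
    """
    total = 0.0
    for t, x, y in track:
        for t0, x0, y0, infectious_time in coughs:
            distance = math.hypot(x - x0, y - y0)
            total += instantaneous_risk(distance, t - t0, infectious_time, mass_fn)
    return total
```

A constant `mass_fn` is enough to check the gating logic: only time steps that fall inside both the infectious distance and the infectious time contribute to the total.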

Prediction of transmission trends

In this part, we first determine the number of high-risk exposed people during the simulation horizon. Then, based on this number, we present a novel method to predict the number of new cases. An individual $i$ whose exposure risk $E_i$ exceeds α is defined as a high-risk exposed individual, where α is the cut-line of high exposure risk. When the total number of individuals in the simulation is $C_{total}$, the number of high-risk exposed people changes with α. The number of new cases always increases with the number of high-risk exposed people, i.e., the two are positively correlated. Mathematically, the number of new cases can therefore be written as a function of the number of high-risk exposed people. We consider the simplest relationship by assuming that this function is linear. Moreover, in the extreme case without infectious disease, there are no high-risk exposed people and no new cases, i.e., the function is zero when its argument is zero. Based on these analyses, the function passes through the origin and is defined as the number of high-risk exposed people multiplied by β, where β is the proportionality coefficient. To determine the optimal values of parameters α and β, the actual historical numbers of daily new cases are needed. The predicted and actual values of each historical observation are compared, and indexes such as the mean absolute error (MAE) can be used to describe the error between them. Herein, we select MAE as the index, and the appropriate values of α and β are those that minimize the MAE over the historical observations. Finally, the number of new cases in the future can be predicted with the estimated α and β via Eqs. (9), (10), and (11).
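The calibration of α and β described above can be sketched as an exhaustive grid search. The sketch rests on assumptions: the text does not say which optimizer was used, so the grid search is a stand-in, the function names are hypothetical, and "high-risk" is taken to mean an exposure risk strictly greater than α.

```python
def count_high_risk(exposures, alpha):
    """Number of individuals whose exposure risk exceeds the cut-line alpha
    (strict inequality is an assumption)."""
    return sum(1 for e in exposures if e > alpha)


def mean_absolute_error(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def fit_alpha_beta(daily_exposures, actual_cases, alphas, betas):
    """Return (mae, alpha, beta) with the smallest MAE on the given days.

    daily_exposures: one list of personal exposure risks per day.
    actual_cases:    observed new cases for the same days.
    alphas, betas:   candidate values to try (the grid is an assumption).
    """
    best = None
    for alpha in alphas:
        counts = [count_high_risk(day, alpha) for day in daily_exposures]
        for beta in betas:
            # Linear relation through the origin: new cases = beta * high-risk count.
            predicted = [beta * c for c in counts]
            err = mean_absolute_error(predicted, actual_cases)
            if best is None or err < best[0]:
                best = (err, alpha, beta)
    return best
```

Because the high-risk count depends only on α, it is computed once per α and reused for every β, which keeps the search cheap even on a fine grid.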

Applications in the COVID-19 of the United States

Data sources

To estimate RID transmission based on the actual situation, we used data from all 50 states and Washington, D.C. in the United States (U.S.) during the spread of COVID-19. From May 1st to July 25th in 2020, 79 days with complete national-level data were utilized, and the public data were aggregated by day. There were 3 inputs in our model, as shown in Fig. 1, and they were obtained as follows: 1) The number of individuals each day ($C_{total}$) was approximated as the daily population traveling out of home, which was collected from the U.S. Department of Transportation's Bureau of Transportation Statistics Trips by Distance - National data product (Maryland Transportation Institute and Center for Advanced Transportation Technology Laboratory at the University of Maryland, 2021; Zhang et al., 2021). 2) The number of infectors among individuals ($C_{inf}$) was not available from public data because whether each infector had a trip on a given day was unknown. As there were some undiagnosed cases in the travel crowd, we made a reasonable assumption, i.e., the ratio of infectors to individuals was equal to the ratio of the total number of cases to the population traveling out of home. The total number of cases on each day was obtained from the U.S. Centers for Disease Control and Prevention (Centers for Disease Control and Preventions, 2021a). 3) The mean dwell time of individuals ($T_{dwell}$) represented the average time each person spent in the public space. We collected these data from the University of Maryland COVID-19 Impact Analysis Platform (Xiao et al., 2021; Zhang et al., 2021). The study period (79 days selected from May 1st to July 25th) was divided into a training set (the first 60 days, selected from May 1st to July 6th) and a testing set (the remaining 19 days, from July 7th to July 25th), as shown in Fig. 6.
Besides, to reduce the computational cost, the number of individuals and the number of infectors among individuals were scaled down by a fixed proportion for the simulation in both the training and testing sets. Consequently, the main outputs of the simulation, i.e., the number of high-risk exposed people and the number of daily new cases, should be expanded by the same proportion after the simulation. We used the first day of the study period (May 1st, 2020) as the benchmark: the 245,469,060 individuals who had at least one trip on that day were scaled down to 10,000 individuals for the simulation, which fixes the proportion. The values of α and β were estimated on the training set and then adopted on the testing set to evaluate the model.
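The scaling step amounts to simple arithmetic, sketched below. The two benchmark numbers come from the text; the function names are hypothetical, and expressing the proportion as "real people per simulated individual" is an assumption about its direction, since the stated value did not survive in this copy of the text.

```python
BENCHMARK_TRAVELLERS = 245_469_060  # people with at least one trip on May 1st, 2020
SIMULATED_INDIVIDUALS = 10_000      # size of the scaled-down crowd on that day

# Real people represented by one simulated individual.
SCALE = BENCHMARK_TRAVELLERS / SIMULATED_INDIVIDUALS


def scale_down(real_count):
    """Convert a real-world head count into a simulated head count."""
    return round(real_count / SCALE)


def scale_up(simulated_count):
    """Expand a simulation output back to the real-world scale."""
    return simulated_count * SCALE


def simulated_infectors(total_cases, travellers, simulated_total):
    """Infectors in the simulated crowd under the stated assumption that
    infectors / individuals equals total cases / travelling population."""
    return round(simulated_total * total_cases / travellers)
```

Under this convention each simulated individual stands for about 24,547 real travellers, so a simulated count of new cases is multiplied by that factor before it is compared with the reported national figures.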

Fig. 6

Schematic diagram of data relations used in this case.

In the model validation, the model's outcome (i.e., the predicted number of new cases) was compared with the actual number of daily new cases, and the evaluation indexes were the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). The actual number of daily new cases was obtained from the U.S. Centers for Disease Control and Prevention (Centers for Disease Control and Preventions, 2021a).
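For reference, the three evaluation indexes can be computed as in the sketch below, which uses their standard textbook definitions; the function name `error_metrics` is hypothetical, and MAPE is returned in percent.

```python
import math


def error_metrics(predicted, actual):
    """Return (MAE, RMSE, MAPE in percent) for paired daily values.

    MAPE divides by the actual values, so they must all be non-zero.
    """
    n = len(actual)
    errors = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return mae, rmse, mape
```

RMSE weights large daily misses more heavily than MAE, while MAPE expresses the error relative to the size of the daily case count, which is why the three are reported together.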

Simulation setups

In fact, there are various indoor scenes (e.g., restaurants, cinemas, and subway hubs), and their specific geometries or activities may affect the transmission process, which our microscopic model can capture. Even so, paying too much attention to the heterogeneous settings is unnecessary. Since considering all specific settings is impractical, a more practical way is to formulate a general public space. Accordingly, a 22 m × 22 m indoor room without obstacles was formulated, with an entrance on one side of the room and an exit on the opposite side (see Fig. 7). The red and blue dots represent the infectors and the susceptible individuals, respectively.

Fig. 7

The sketch map of the simulation space in this case.

In the simulations, the number of individuals, the number of infectors, and the mean dwell time were determined according to the real-world data introduced in Section 4.1. Initially, no individual was in the room, and the crowd entered the indoor room through the entrance in sequence. Each individual followed a random walking pattern in the room during the dwell time and then left the room through the exit. Each individual was represented by a circle, and the relaxation time and the desired speed were 0.50 s and 1.34 m/s, respectively. An infector coughed every 15 s on average after entering the room, and the infectious time of a cough followed a uniform distribution from 0 to 15 s. The infectious distance of a typical cough was set at 1.70 m because the droplet mass was almost 0 at greater distances according to Fig. 5(c).

Simulation outputs

Individual exposure risk estimation

The exposure risk of each person in a single day is obtained from the simulation. Taking 4 days (May 1st, May 17th, July 10th, July 25th) as examples, the inputs of the simulation (i.e., $C_{total}$, $C_{inf}$, and $T_{dwell}$) and the statistical results of the simulation outputs (i.e., the number of susceptible individuals $C_{sus}$ whose $E_i = 0$ or $E_i > 0$) are shown in Table 2 and Fig. 8.

Table 2

These are examples of simulation inputs and statistical results.

Date (2020)	Ctotal (input)	Cinf (input)	Tdwell/minutes (input)	Csus with Ei=0 (result)	Csus with Ei>0 (result)
05/01	10,000	39	26	1092	8869
05/17	10,210	55	24	724	9431
07/10	10,210	128	25	225	9857
07/25	9844	169	23	138	9537

Fig. 8

The number of individuals varies in different exposure risk segments.

According to Table 2 and Fig. 8, the simulation results show that the number of individuals in the lower exposure-risk segment (in μg) on May 1st is 2722 more than that on July 25th, whereas the number of individuals in the higher exposure-risk segment on May 1st is 1652 fewer than that on July 25th. In other words, as the virus spreads over time, more and more individuals have higher exposure risks. This trend is consistent with existing knowledge and with reality, which supports the validity of our model.
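The counts in Table 2 can be cross-checked with a short script. The rows are copied from the table; treating the susceptible population as the total number of individuals minus the infectors is an inference from those numbers rather than a statement in the text, and the names `TABLE_2` and `exposed_share` are hypothetical.

```python
# (date, C_total, C_inf, C_sus with E_i = 0, C_sus with E_i > 0), from Table 2
TABLE_2 = [
    ("05/01", 10000, 39, 1092, 8869),
    ("05/17", 10210, 55, 724, 9431),
    ("07/10", 10210, 128, 225, 9857),
    ("07/25", 9844, 169, 138, 9537),
]


def exposed_share(row):
    """Fraction of susceptible individuals with a non-zero exposure risk."""
    _, total, infectors, unexposed, exposed = row
    susceptible = total - infectors
    # The two result columns should add up to the susceptible population.
    if unexposed + exposed != susceptible:
        raise ValueError("row does not balance: " + row[0])
    return exposed / susceptible


shares = {row[0]: exposed_share(row) for row in TABLE_2}
```

All four rows balance, and the share of susceptible individuals with a non-zero exposure risk rises from roughly 0.89 on 05/01 to roughly 0.99 on 07/25, in line with the trend described above.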

Prediction of new cases

In order to analyze the contributions of the 2 parameters (α and β) to the model results, the Sobol method is adopted, and the total-effect index is used as the evaluation measurement. According to the previous results of personal exposure risk, the variation range of α is between 0.50 g and 100.00 g. Besides, the range of β is set through numerous tests. Saltelli's sampling scheme generates 6000 samples for each training day. Based on the training set, the average total-effect indices of α and β on each training day are shown in Fig. 9, and the averages over all days are 0.986 for α and 0.269 for β, where the confidence level is 95.00%. The results illustrate that the output of our model depends on both parameters. Parameter α contributes more, as expected, because it is the cut-line of high exposure risk, which is the key to determining both the number of high-risk exposed people and the number of new cases.
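A total-effect index can be estimated with only the standard library, as in the sketch below. It is not the authors' procedure: plain pseudo-random sampling replaces Saltelli's quasi-random scheme, the Jansen form of the total-effect estimator is used, and the function name and arguments are hypothetical.

```python
import random


def total_effect_indices(model, bounds, n_base, seed=0):
    """Estimate Sobol total-effect indices with the Jansen estimator.

    model:  function that maps a list of parameter values to a number.
    bounds: list of (low, high) ranges, one per parameter.
    n_base: number of base samples; the model is run n_base * (k + 2) times.
    """
    rng = random.Random(seed)
    k = len(bounds)

    def sample():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    a_rows = [sample() for _ in range(n_base)]
    b_rows = [sample() for _ in range(n_base)]
    f_a = [model(row) for row in a_rows]
    f_b = [model(row) for row in b_rows]

    outputs = f_a + f_b
    mean = sum(outputs) / len(outputs)
    variance = sum((f - mean) ** 2 for f in outputs) / (len(outputs) - 1)

    indices = []
    for i in range(k):
        squared = 0.0
        for a, b, fa in zip(a_rows, b_rows, f_a):
            # Row of A with only parameter i replaced by its value from B.
            mixed = a[:i] + [b[i]] + a[i + 1:]
            squared += (fa - model(mixed)) ** 2
        indices.append(squared / (2.0 * n_base * variance))
    return indices
```

For a model that depends on only its first parameter, the estimate is close to 1 for that parameter and exactly 0 for the other, which is a quick sanity check before applying the estimator to a real simulation output.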

Fig. 9

The average total-effect of parameters α and β in each training day.

To predict the daily new cases in the testing set, parameters α and β should be estimated first. After expanding the predicted results with the proportion ( in Section 4.1), the variation of MAE with α and β can be seen in Fig. 10. Herein, we find μg and (corresponding MAE is 6080.89) based on the training set according to Equation (12). Note that the dark blue curve in Fig. 10(a) indicates that increases with the growth of as expected.
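The parameter search can be pictured as a grid search that minimizes the training MAE. The sketch below is a loose illustration only: Equations (9) to (12) are not reproduced in this section, so it assumes the simplest proportional form (predicted new cases = β × number of individuals whose exposure risk is at least α), and it runs on synthetic exposure values rather than the paper's data. All names, grids, and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic per-day exposure risks for 10 days (hypothetical, NOT the paper's data).
daily_exposures = [rng.lognormal(mean=1.0, sigma=1.0, size=500) for _ in range(10)]

def predict_new_cases(exposures_by_day, alpha, beta):
    """Assumed form: beta times the number of individuals with exposure >= alpha."""
    counts = np.array([np.sum(e >= alpha) for e in exposures_by_day])
    return beta * counts

# Generate noise-free "actual" values from known parameters so the search can be checked.
true_alpha, true_beta = 5.0, 1.2
actual = predict_new_cases(daily_exposures, true_alpha, true_beta)

alpha_grid = np.arange(1, 11).astype(float)  # candidate cut-lines
beta_grid = np.arange(5, 21) / 10.0          # candidate slopes 0.5, 0.6, ..., 2.0

best = None
for alpha in alpha_grid:
    for beta in beta_grid:
        pred = predict_new_cases(daily_exposures, alpha, beta)
        mae = np.mean(np.abs(actual - pred))
        if best is None or mae < best[0]:
            best = (mae, alpha, beta)

best_mae, best_alpha, best_beta = best
print(best_alpha, best_beta, best_mae)
```

Because the synthetic targets are generated without noise, the search recovers the generating pair. With real data the minimum MAE is non-zero, as with the value 6080.89 reported for the training set.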

Fig. 10

MAE varies with α and β. (a) The side view of when μg and . (b–d) The front view and 2 side views when MAE are less than 6500.

Then, with the estimated parameters α and β, the prediction in the testing set can be calculated and expanded with the proportion . As a result, our model achieves a good prediction effect under different evaluation indexes: MAE is 8140.00, RMSE is 9598.17, and MAPE is 12.01%.

For some machine learning models, a training set of only 60 days is small and cannot reflect the optimal performance of the model (Zhang and Jiang, 2021). To verify whether 60 days are enough for training our model, we randomly select several days (e.g., 1 day, 2 days, …, 59 days) from the 60 days as new training sets, and analyze the changes of the prediction effects on the same testing set (i.e., the remaining 19 days). We estimate α and β in the same way as before for the different new training sets, and the evaluation results on the testing set are shown in Fig. 11. The values of MAE, RMSE, and MAPE decrease as the number of days in the training set increases, and each of the indexes converges to a stable value within 60 days. Hence, 60 days are sufficient as the training set, the parameters obtained from it are reliable, and the corresponding prediction results can represent the best effect of our model.
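For reference, the three evaluation indexes can be computed as in the following minimal sketch, which uses their standard definitions. The function names and the small example arrays are illustrative only and are not the paper's data.

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(a - p)))

def rmse(actual, predicted):
    """Root mean square error."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((a - p) ** 2)))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs((a - p) / a)) * 100.0)

# Illustrative values only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

MAE and RMSE are expressed in the unit of the forecast (daily new cases), while MAPE is scale-free, which is why the three indexes are reported together.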

Fig. 11

(a) MAE, RMSE, and (b) MAPE change with the number of days randomly selected as the training set.

Comparison with other models

In this section, following the study region and period in Section 4, we predict transmission trends via our model and several existing models from macroscopic or microscopic levels, and then compare their performances.

Microscopic-level models

Since the proposed model focuses on a general place and determines the personal exposure risk at the microscopic level, the comparison methods should share the same focus; thus Xiao's model (Xiao et al., 2021) and Hernández-Orallo's model (Hernández-Orallo and Armero-Martínez, 2021) are adopted. Because neither of these two models directly predicts daily new cases, we follow Section 3.4 to make the forecast after determining the personal exposure risk with each model. The input data fields required by the two models are the same as those of our proposed model, and they use the same training and testing sets as in Section 4. For the contact approach and cough approach of Xiao's model, the cut-off distances of the exposure are 1.00 m and 2.50 m, respectively, following the settings in (Xiao et al., 2021). In Hernández-Orallo's model, the contact cut-off distance is also 1.00 m, and the parameter for adjusting the model is set to the same value, 1/30, as in (Hernández-Orallo and Armero-Martínez, 2021). The simulation place is an indoor room, as shown in Fig. 7, which has low air renewal, high temperatures, and low solar radiation, so the value of the quality of the medium in Hernández-Orallo's model is (1 + 1/3 + 1)/3 = 7/9. Based on the training set, we follow Equation (12) to find α and β for each model: 10.00 s and 1.19 in Xiao's model (corresponding MAE is 7205.67); 0.60 MEMs and 1.41 in Hernández-Orallo's model (corresponding MAE is 7172.78). Hence, based on Eqs. (9), (10), (11), the prediction results of the testing set by the two microscopic-level models are reported in Fig. 12 and Table 3.

Fig. 12

The actual values and different model results of change with the date.

Table 3

These are MAE, RMSE, and MAPE between actual and predicted results by different models.

Model	MAE	RMSE	MAPE
SIR Model	11,589.07	13,638.09	16.84%
Grey Model	10,594.70	12,768.68	15.39%
Xiao's Model	38,965.99	39,468.87	58.46%
Hernández-Orallo's Model	34,989.95	35,537.92	52.41%
Our Model	8140.00	9598.17	12.01%

NOTE. MAE, mean absolute error; RMSE, root mean square error; MAPE, mean absolute percentage error.


Macroscopic-level models

The typical SIR model (Kermack and Mckendrick, 1927; Maier and Brockmann, 2020) and grey model (Luo et al., 2021; Zhang and Jiang, 2021) are adopted as the macroscopic-level comparison models. The SIR model is a traditional infectious disease model that assumes the population in the study region is uniform and homogeneously mixed. In the SIR model, the population is divided into three classes, namely, S: susceptible, I: infected, and R: removed (by recovery and death) (Maier and Brockmann, 2020). The time-varying number of cases in each class is governed by the infectivity rate and the removal rate (Zamiri et al., 2015). Thus, when we know the initial S, I, and R and the constant values of the two rates, we can predict the number of infectors in the future, from which the number of daily new cases is calculated.

The grey model, favored for its “simple model, strong adaptability, and easy parameter changes”, has been widely used in the field of infectious diseases (Zhang and Jiang, 2021). Unlike neural network methods, which need a substantial amount of data for training system parameters, the grey model can predict the trend of COVID-19 well with limited information. Specifically, only the historical daily new infected cases are needed to forecast the daily new cases in the following days.

In Section 4, only 60 days (selected from May 1st to July 6th) are used as the training set due to the lack of data. Fortunately, the data required by the SIR model and grey model for all 67 days (from May 1st to July 6th) are available from public datasets. Thus, the time-series data from May 1st to July 6th at the national level of the U.S. are adopted as the training set here. The testing set is the same as in Section 4, i.e., 19 days from July 7th to July 25th.

The parameters of the SIR model and grey model are set based on the national dataset. In the SIR model, the population size of the U.S. is , which comes from the U.S. Department of Transportation's Bureau of Transportation Statistics Trips by Distance – National data product (Maryland Transportation Institute and Center for Advanced Transportation Technology Laboratory at the University of Maryland, 2021; Zhang et al., 2021). The cumulative number of infected cases is obtained from the U.S. Centers for Disease Control and Prevention (Centers for Disease Control and Preventions, 2021a). The number of removed cases (by recovery and death) is provided by the COVID-19 Data Hub (Guidotti and Ardia, 2020). The number of susceptible individuals is obtained as . After fitting the training set with ordinary least squares, we get the parameters that minimize the sum of squared errors: and . Besides, the basic reproduction number is consistent with existing research (Centers for Disease Control and Preventions, 2021b), and it is in line with the situation in which no epidemic prevention and control policies were implemented in the U.S. For the grey model, the cumulative number of infected cases is adopted from the U.S. Centers for Disease Control and Prevention (Centers for Disease Control and Preventions, 2021a), and there are no other parameters to be set. Thus, the prediction results of the testing set by the two macroscopic-level models are reported in Fig. 12 and Table 3.
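To make the SIR forecasting step concrete, the sketch below advances the three classes one day at a time and records the daily new infections. It is a minimal sketch, assuming a forward-Euler daily step of the classical SIR equations. The function name, the argument names (`infect_rate`, `remove_rate`), and the numerical values are hypothetical and are not the fitted values of the paper. The least-squares fitting step is not shown.

```python
import numpy as np

def simulate_sir(s0, i0, removed0, infect_rate, remove_rate, days):
    """Daily-step SIR simulation; returns daily new infections and the final state."""
    n = s0 + i0 + removed0  # total population, assumed constant
    s, i, r = float(s0), float(i0), float(removed0)
    new_cases = []
    for _ in range(days):
        new_inf = infect_rate * s * i / n  # susceptible -> infected
        new_rem = remove_rate * i          # infected -> removed
        s -= new_inf
        i += new_inf - new_rem
        r += new_rem
        new_cases.append(new_inf)
    return np.array(new_cases), (s, i, r)

# Hypothetical illustration: 1000 people, 10 initially infected.
infect_rate, remove_rate = 0.3, 0.1
new_cases, final_state = simulate_sir(990, 10, 0, infect_rate, remove_rate, days=19)
basic_reproduction_number = infect_rate / remove_rate
print(new_cases.round(2), basic_reproduction_number)
```

In this formulation the basic reproduction number is the ratio of the infectivity rate to the removal rate, so a fitted pair of rates can be checked against published estimates in the way the paragraph above describes.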

Comparison of results

The time series of daily new cases in the testing set, as estimated by the proposed model, the macroscopic models, and the microscopic models, and as obtained from the real-world dataset, are shown in Fig. 12 and Table 3. Based on these, the proposed model achieves the best prediction performance among the five models. The SIR model and grey model at the macroscopic level cannot determine the personal exposure risk and the number of high-risk exposed people, but these can be estimated by microscopic-level models such as ours. Meanwhile, even though time series data are used for training the SIR model and grey model, the number of samples in the training set may still be too small to guarantee their best performance. However, based on the analyses of Fig. 11, the number of samples in this case is enough to show the good performance of our proposed model; thus our model is more suitable for small-sample data. We admit that these two models are classical and traditional, and that there are now many improved versions of them, which may have better prediction ability than our model.

The personal exposure risk is defined differently in various microscopic-level models, e.g., Xiao's model, Hernández-Orallo's model, and ours. In Xiao's model, the exposure risk in the influence area is set to a constant value that does not change with the physical distance. Hernández-Orallo's model defines the exposure risk based on the distance and time between individuals and the quality of the medium. Our model considers the cough droplet dispersion processes and the time-varying physical distances between individuals, and determines the exposure risk as the total maximum droplet mass that the individual may be exposed to. The results in Fig. 12 and Table 3 show that our model's definition of exposure risk is more appropriate than those of the other 2 models for forecasting the transmission trends of RIDs.
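The size of the improvement can be checked directly from Table 3. The short sketch below copies the table values and computes, for each index, the margin between the proposed model and the best of the four comparison models. The dictionary layout and variable names are illustrative.

```python
# Values copied from Table 3: (MAE, RMSE, MAPE in %).
table3 = {
    "SIR Model": (11589.07, 13638.09, 16.84),
    "Grey Model": (10594.70, 12768.68, 15.39),
    "Xiao's Model": (38965.99, 39468.87, 58.46),
    "Hernández-Orallo's Model": (34989.95, 35537.92, 52.41),
}
ours = (8140.00, 9598.17, 12.01)

margins = []
for k, name in enumerate(("MAE", "RMSE", "MAPE")):
    best_other = min(values[k] for values in table3.values())
    margins.append(round(best_other - ours[k], 2))
    print(name, margins[-1])
```

For all three indexes the best comparison model is the grey model, and the resulting margins (2454.70, 3170.51, and 3.38 percentage points) agree with the figures quoted in the abstract.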

Discussions and future perspectives

In the construction of the proposed model, to forecast the number of new cases, the assumption that is a linear equation has been illustrated, and other functional forms (e.g., exponential function, power function) can be further studied. In the simulation setups (Section 4), due to the lack of real-world scene and human activity data, a general indoor space without non-walkable areas and a random walking pattern of individuals are formulated to reproduce the microscopic epidemic transmission, and by this means, the simulation results of our model can be compared with the macroscopic epidemic transmission data. Fortunately, the potential drawbacks of these idealized assumptions for the model comparison results are minimized, since all the models involved here are applied in the same scene. It should be noted that our model can be used to simulate scenes with different obstacles, activities, and other settings, once the specific scene setting data are available. The proposed model can predict transmission trends under different model inputs or scenes. Hence, for the factors shown in Table 1, if they affect the model inputs or scenes, the model outputs change accordingly, and the transmission trends caused by these factors can then be quantitatively explored. For example, GDP is an economic indicator (Zhang et al., 2021). Generally, there are more people and more advanced transportation systems in areas with high-level GDP, where people have more activities in public spaces. Therefore, for the inputs of our model, the number of individuals and the mean dwell time in high-level GDP areas will be larger than those in regions with low-level GDP, resulting in different prediction results based on the proposed model.
Similarly, in a scene with a large number of visits, the NPI named entry limitation policy (Xiao et al., 2021), which limits the maximum number of visitors inside the indoor place, directly changes the number of individuals, and its contributions can be captured by our model. However, we must admit that the influence of several factors (e.g., sunspot numbers (Nasirpour et al., 2021)) cannot be considered in our model because they have limited effect on the model inputs or scenarios. Besides, for several prevention and control measures that block the spread of cough droplets directly near the mouth, such as facemask wearing, the transmission of droplets is different from a typical cough droplet transmission. Since the virion-laden droplet transmission of a typical cough is set as a deterministic model in Section 3, it is necessary to improve the model to evaluate these NPIs’ contributions. Moreover, the direct and indirect effects of influence factors on the prediction results can be further determined with methods such as structural equation modeling (Ma and Bennett, 2021). In fact, policy-makers and political communicators should consider not only the influences of factors, but also the social acceptance and effects of different rules, and these can be further studied (Bontempi, 2021; Gollwitzer et al., 2021; Smith et al., 2021).

Conclusions

In this paper, we developed a microscopic-level exposure-risk-based model to predict the transmission trends of RIDs in a general public place. Specifically, to determine the personal exposure risk, our model coupled the motions of individuals with the time-varying cough droplet dispersion process. Then, the number of new cases was predicted based on the assumption that it is a linear function of the number of high-risk exposed people, and the prediction could be tested quantitatively against actual values. Based on COVID-19 data collected in the U.S., we constructed real-world simulations and compared the prediction effect of our model with those of several existing models. Our model achieves superior performance in forecasting the transmission trends of RIDs and provides guidance for epidemic prevention and control.

Funding

This work was supported by the (Grant No. 72101276), the Shenzhen Science and Technology Program (Grant No. GXWD 20200830165201001) and the Fundamental Research Funds for the Central Universities, Sun Yat-sen University (Grant No. 22qntd1710).

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

60 in total

1. The size and the duration of air-carriage of respiratory droplets and droplet-nuclei.

Authors: J P Duguid
Journal: J Hyg (Lond) Date: 1946-09

2. VaCoChain: Blockchain-Based 5G-Assisted UAV Vaccine Distribution Scheme for Future Pandemics.

Authors: Ashwin Verma; Pronaya Bhattacharya; Mohd Zuhair; Sudeep Tanwar; Neeraj Kumar
Journal: IEEE J Biomed Health Inform Date: 2022-05-05 Impact factor: 5.772

3. COVID-19 indoor exposure levels: An analysis of foot traffic scenarios within an academic building.

Authors: Van Romero; William D Stone; Julie Dyke Ford
Journal: Transp Res Interdiscip Perspect Date: 2020-08-06

4. A novel grey model based on traditional Richards model and its application in COVID-19.

Authors: Xilin Luo; Huiming Duan; Kai Xu
Journal: Chaos Solitons Fractals Date: 2020-11-17 Impact factor: 5.944

5. A guideline to limit indoor airborne transmission of COVID-19.

Authors: Martin Z Bazant; John W M Bush
Journal: Proc Natl Acad Sci U S A Date: 2021-04-27 Impact factor: 11.205

6. Theoretical investigation of pre-symptomatic SARS-CoV-2 person-to-person transmission in households.

Authors: Yehuda Arav; Ziv Klausner; Eyal Fattal
Journal: Sci Rep Date: 2021-07-14 Impact factor: 4.379

7. Inference of person-to-person transmission of COVID-19 reveals hidden super-spreading events during the early outbreak phase.

Authors: Liang Wang; Xavier Didelot; Jing Yang; Gary Wong; Yi Shi; Wenjun Liu; George F Gao; Yuhai Bi
Journal: Nat Commun Date: 2020-10-06 Impact factor: 14.919

8. Can commercial trade represent the main indicator of the COVID-19 diffusion due to human-to-human interactions? A comparative analysis between Italy, France, and Spain.

Authors: E Bontempi; M Coccia; S Vergalli; A Zanoletti
Journal: Environ Res Date: 2021-06-18 Impact factor: 6.498