Using mobile phone data to estimate dynamic population changes and improve the understanding of a pandemic: A case study in Andorra.

Alex Berke¹, Ronan Doorley¹, Luis Alonso¹, Vanesa Arroyo², Marc Pons², Kent Larson¹.

Abstract

Compartmental models are often used to understand and predict the progression of an infectious disease such as COVID-19. The most basic of these models consider the total population of a region to be closed. Many incorporate human mobility into their transmission dynamics, usually based on static and aggregated data. However, mobility can change dramatically during a global pandemic as seen with COVID-19, making static data unsuitable. Recently, large mobility datasets derived from mobile devices have been used, along with COVID-19 infections data, to better understand the relationship between mobility and COVID-19. However, studies to date have relied on data that represent only a fraction of their target populations, and the data from mobile devices have been used for measuring mobility within the study region, without considering changes to the population as people enter and leave the region. This work presents a unique case study in Andorra, with comprehensive datasets that include telecoms data covering 100% of mobile subscribers in the country, and results from a serology testing program that more than 90% of the population voluntarily participated in. We use the telecoms data to both measure mobility within the country and to provide a real-time census of people entering, leaving and remaining in the country. We develop multiple SEIR (compartmental) models parameterized on these metrics and show how dynamic population metrics can improve the models. We find that total daily trips did not have predictive value in the SEIR models while country entrances did. As a secondary contribution of this work, we show how Andorra's serology testing program was likely impacted by people leaving the country. Overall, this case study suggests how using mobile phone data to measure dynamic population changes could improve studies that rely on more commonly used mobility metrics and the overall understanding of a pandemic.

Year: 2022 PMID: 35472092 PMCID: PMC9041757 DOI: 10.1371/journal.pone.0264860

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752

Introduction

At the start of the COVID-19 pandemic, nonpharmaceutical interventions (NPIs) were widely deployed in an effort to stymie the rate of new infections. These interventions included stay-at-home orders and restrictions on economic activity, which were used as a means to reduce contact and hence transmission rates, effectively limiting mobility. Country border restrictions were also put in place to reduce the chance of importing the virus through inter-country travel. At the same time, tests became more available to better track population infection rates [1]. There has been an influx of data and research used to study the efficacy of various interventions [2-4]. In particular, this work addresses the use of population movement data. Research preceding COVID-19 has indicated a close relationship exists between human mobility and the spread of infectious disease [5]. Past studies have shown how mobility data, such as commuter trips, can be used to improve disease forecasting models [6]. These earlier works highlighted the importance of combining their modeling frameworks with mobility data to address potential future emergent respiratory viruses, while also citing a lack of real-time mobility data as a limitation. In the wake of COVID-19, such real-time mobility data became widely available to study the pandemic, largely collected through airlines or via mobile phones. This is demonstrated in early works using aggregated metrics from Baidu LBS [7] to estimate domestic population movement in China. By combining this data with airline transportation data to estimate international travel, researchers modeled the effect of travel restrictions and the international spread of COVID-19 [8]. Similarly, the Baidu LBS data was also used to model the spatial spread of COVID-19 from Wuhan to evaluate the impact of domestic control measures [9]. 
Mobility data collected from mobile phones has also since been made available by Google [10], Facebook [11], Safegraph [12], transit apps [13], telecoms, and other companies [14]. Metrics based on these sources have been used to model or predict COVID-19 transmission rates [15-20] as well as to verify model results [21], with the assumption that changes in transmission rates are correlated with changes in the mobility metrics. Researchers have also combined mobile phone data from multiple sources to better understand the spatiotemporal dynamics of how the virus can spread. This includes work that simulated relationships between the number of virus cases imported to an area, subsequent population mobility, and virus spread in multiple European countries [22]. Another study tracked a specific fast-spreading lineage of COVID-19 in the United Kingdom by combining aggregated mobility metrics from both Google and the O2 telecommunications service provider with genomic data [23]. Despite the broad use of these mobility data sources, their relationship to COVID-19 remains unclear. The published mobility metrics are often aggregated statistics representing the number of trips taken, such as those measured through transit apps, or based on foot traffic to points-of-interest (POIs). Furthermore, the mobility data used in each of the above works are limited in that they report on only a fraction of the population. For example, Baidu LBS and O2 have about 30% and 35% market share, respectively [9, 23], and Safegraph has one of the larger U.S. datasets, yet in 2019 it covered only about 10% of the U.S. population and acknowledged reporting bias [24]. Likewise, other studies using reported cases microdata or air travel data to analyze the risks of importing the virus via inter-country travel (e.g. [25, 26]) are also limited by data sources that only report on a fraction of the true data.

Contribution

This work presents a unique case study in Andorra, with comprehensive datasets that include telecoms data covering 100% of mobile subscribers in the country, and results from a serology testing program that more than 90% of the population voluntarily participated in. Previous work used these data sources to compare various mobility metrics and infection rates with retrospective correlation analysis [27]. This work builds upon these previous findings and develops compartmental epidemic models. At the start of the pandemic in Andorra, border restrictions and economic lockdowns drastically reduced country entrances and internal country mobility. This study includes that period as well as when restrictions were lifted. The mobile phone data are used to estimate mobility metrics representing trips, similar to related works, as well as to conduct a real-time census and estimate metrics that represent the dynamic population changes, such as daily country entrances. These data are then used to improve the understanding of the pandemic in Andorra in multiple ways. First, we show how Andorra’s serology testing program, conducted in May 2020, was likely impacted by people leaving the country. We then show how the estimated country entrances data can improve epidemiological (SEIR) models that otherwise rely on mobility measured by trips. Related works have used meta-population SEIR models where the modeled sub-populations are dynamic, yet based on static census commuting data or based on a combination of POI visits and static commuting data (e.g. [28]). In contrast, this work uses comprehensive telecoms data to estimate a real-time census to more accurately capture the changing dynamics of the population during the period of study. We develop and test multiple (SEIR) models that differ in how they parameterize transmission rates based on the trips and entrances metrics developed in this work. 
The models are deliberately simple; their purpose is to illustrate how different types of mobility information can be better incorporated into SEIR models. Finally, we use the best model to simulate a counterfactual scenario in which economic and border restrictions had not been put in place and the trips and entrances metrics had not dropped drastically.

Outline

Before presenting our methods and results, we provide background information, with a timeline of events around the start of COVID-19 in Andorra, and the features of the country that contribute to a unique case study. We also provide background information about compartmental epidemic models to guide the reader in the presentation of our models.

Background

Andorra and COVID-19

The study region of this work is the small country of Andorra, which is located in the Pyrenees mountains and shares borders with only France and Spain. The country has a population of approximately 77,000 [29], yet attracts more than 8 million visitors annually, mostly for tourism associated with skiing and nature-related activities [30]. In addition, a large number of cross-border temporary workers reside in the country, mainly employed in the tourism industry. Andorra lacks an airport or train service so the primary way to enter or exit the country is by crossing the French or Spanish border by car. The country is divided into 7 municipalities, called parishes. Partly because of the country’s small size and limited border crossings, Andorra was able to implement comprehensive policies at the start of the COVID-19 pandemic, as well as implement a serology testing program which more than 90% of the population participated in. Furthermore, there is one telecoms provider for the entire country, which contributes a comprehensive view of all mobile subscribers who spend any time in Andorra, whether they are Andorran nationals or have foreign SIM cards. The telecoms data and serology data are used in this work and are described in the Data sources and preprocessing section.

Timeline of COVID-19 cases and policies

The first COVID-19 case in Andorra was reportedly imported via Italy and confirmed March 2, 2020 [31]. Reported cases then rose rapidly in March before falling again in April (see Fig 1). On March 13, government officials ordered the closure of public establishments and a quarantine was requested of the entire population. A series of COVID-19 related policies followed and neighboring country borders were restricted. In accordance with these policies, mobility within the country dropped and border crossings ceased. Other NPIs, such as masks and hand sanitizer, were also deployed. The lockdown measures in Andorra were gradually lifted in April and May, and fully lifted starting June 1. Borders also reopened in June and border crossings resumed. Table A.1 in S1 Appendix shows a timeline of COVID-19 related events.

Fig 1

Daily reported cases, trips, and entrances metrics at the start of the COVID-19 pandemic in Andorra.

The time series data are plotted for March to August, 2020, which covers the study period. Solid lines show values smoothed over a 7-day rolling window.

Nationwide serology testing program

In May of 2020, Andorra conducted a nationwide serology testing program. This resulted in the first published seroprevalence study universally testing the entire population of a country and one of the largest of its kind [32]. Anyone over the age of 2 was invited to participate in the study, including the country’s temporary workers. The testing was conducted in two phases: May 4–14 and May 18–28, 2020. The objectives of the second phase were (a) to track the progression of COVID-19 between the two surveys and (b) to account for indeterminate or potential false negative results from the first survey. More than 90% of the population participated voluntarily in at least one of the two surveys. However, many participants in the first phase did not participate in the second, which limited the data collection and the impact of the two-phase study. This issue is further explored and addressed in the Results section.

SEIR models and COVID-19

SEIR models, and their variations, are compartmental models used in epidemiology. They have been widely used in forecasting COVID-19 transmission and modeling the outcomes of government policies [15, 33, 34]. The basic concept of these models is that the population is partitioned into sequential compartments, and transitions through the compartments over time. This framework was first developed by Kermack and McKendrick in 1927 [35] and has been well described more recently by Keeling et al. [36]. In short, the SEIR model takes its name from its compartments:

S = Susceptible
E = Exposed
I = Infectious
R = Removed (quarantined, recovered, or deceased)

S represents the number of Susceptible people in the population who have not yet been exposed to the virus. Individuals transition from Susceptible to Exposed after exposure to individuals in the Infectious (I) compartment. Hence the transition S to E is a function of the number of people in the Susceptible (S) and Infectious (I) compartments, as well as the transmission rate, β, and the total population size, N. The standard model considers N constant, and the following conservation holds for any time, t:

$$N = S(t) + E(t) + I(t) + R(t) \tag{1}$$

Transitions between compartments are modeled by a set of ordinary differential equations (ODEs):

$$\begin{aligned}
S'(t) &= -\beta \frac{S(t)\, I(t)}{N} \\
E'(t) &= \beta \frac{S(t)\, I(t)}{N} - \sigma E(t) \\
I'(t) &= \sigma E(t) - \gamma I(t) \\
R'(t) &= \gamma I(t)
\end{aligned} \tag{2}$$

where

β = transmission rate of the infection
σ = latent rate
γ = removal rate

The latent rate, σ, is the average rate to become infectious after exposure (i.e. σ⁻¹ = average incubation period) and the removal rate, γ, is the average rate at which individuals transition from I to R. The modeled compartments and transitions are simplifications, yet this simple framework may be well applied to COVID-19 at the start of the pandemic, before populations were vaccinated or encountering re-infections. (Models for diseases over longer periods of time may also incorporate changes in the population via birth and death rates, while other models handle individuals becoming susceptible again [36].)
An epidemic is often characterised by the basic reproduction number, R0. The estimation and value of the reproduction number are complex and often misrepresented, but in general it represents the expected number of secondary infections which would be caused by a typical infected case if everyone in the population were susceptible [37, 38]. R0 can be calculated as the ratio of the transmission rate to the removal rate (R0 = β/γ). Often in compartmental models, both of these parameters are constant in time. However, if one or both of these parameters is time-varying, then the variation of R0 over time can be estimated. While R0 only represents the true reproductive rate at the start of the pandemic when the whole population is susceptible, the variation of this ratio over time isolates the impact of changes in human behavior and NPIs on the reproductive rate. (The effective reproductive rate R, on the other hand, represents the actual reproductive number at any point in time, given the behaviour as well as the susceptible portion of the population [39].) Estimates for reproduction numbers have been used to understand the state of a pandemic and to measure the effectiveness of interventions [4, 34, 40–42]. Because R0 is a function of both the transmission rate and the removal rate, a change in either one changes R0. The removal rate represents the rate at which infectious individuals are removed from the population and then are no longer at risk of infecting susceptible individuals. Removal might occur because they isolate, or recover and are no longer infectious, or die. The removal rate may vary due to changes in testing procedures (e.g. more proactive testing can identify more cases and cause individuals to isolate earlier in their infectious period) or government policies (e.g. quarantine rules). Likewise, the transmission rate can change due to governmental policies and behavioral changes (e.g. staying home, wearing masks, and other NPIs).
Recent models that address COVID-19 have taken into account that transmission rates vary over time [15, 43–45]. Many models do so by incorporating mobility metrics to estimate behavioral changes and model changes in transmissibility based on these data. However, these mobility metrics are often based on sources that report on a small fraction of the population, and where the mobility metrics are aggregated statistics based on the number of trips to points of interest (POIs), which may not be the most important indicators of COVID-19 transmission. This is in contrast to the telecoms data used in this work, which covers all mobile subscribers within the country of Andorra, and is provided as a complete and unaggregated dataset, not limited to trips to POIs. We note that any of the models referenced or presented in this work are oversimplifications of the complex dynamics of disease spread. They also suffer from unreliable case reports data, limited by the availability of tests, and reactive to changes in testing protocols [1].
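To make the basic framework concrete, the following is a minimal sketch of the standard SEIR model with constant β, σ and γ, integrated with simple forward Euler steps. It illustrates the textbook model only, not the models developed in this work. The function names, the Euler scheme, and all parameter values in the usage example are illustrative assumptions; only the population size of roughly 77,000 comes from the text.

```python
def simulate_seir(beta, sigma, gamma, s_init, e_init, i_init, r_init,
                  days, steps_per_day=10):
    """Integrate the basic SEIR ODEs with forward Euler steps.

    Returns a list of (S, E, I, R) tuples, one per day, starting at day 0.
    The integration scheme is an assumption of this sketch.
    """
    n = s_init + e_init + i_init + r_init  # closed population: N is constant
    dt = 1.0 / steps_per_day
    s, e, i, r = float(s_init), float(e_init), float(i_init), float(r_init)
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E
            new_infectious = sigma * e * dt       # E -> I
            new_removed = gamma * i * dt          # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        history.append((s, e, i, r))
    return history


def basic_reproduction_number(beta, gamma):
    """R0 as the ratio of the transmission rate to the removal rate."""
    return beta / gamma


# Hypothetical parameter values, chosen only for illustration.
history = simulate_seir(beta=0.5, sigma=0.2, gamma=0.25,
                        s_init=76990, e_init=0, i_init=10, r_init=0, days=30)
r_naught = basic_reproduction_number(beta=0.5, gamma=0.25)
```

Because every person leaving one compartment enters the next, the four compartments always sum to N, which matches the conservation property of the standard model.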

Materials and methods

This section describes the SEIR models used in this work, and how they are trained and tested. It then describes data sources and preprocessing methods.

Code and data availability

All aggregated metrics and code used in this work are made available and documented in a public repository: https://github.com/CityScope/CSL_Andorra_COVID_Public. The code includes analysis notebooks as well as the preprocessing scripts that produced the aggregated metrics. The data reporting on individuals, which were used to compute the aggregate metrics, are sensitive and kept private.

Modeling

This work develops and compares multiple SEIR models that differ in how they incorporate trips and entrances data in order to model transmission rates. The trips data measure mobility behavior within the country while the entrances data measure new country entrances (described in the Data sources and preprocessing section). The aim is to evaluate the relative impact of the trips and entrances data on model performance; the aim is not to build a state-of-the-art, accurate predictive model. To this end, the models are highly simplified.

Comparison models

In SEIR models, β(t) typically represents the average number of people an infected person would expose per unit time if everyone were susceptible. In particular, β(t) is used to model the transition from the Susceptible to the Exposed compartment. The use of β(t) in our models is captured by the following equation from Eq (2):

$$E'(t) = \beta(t) \frac{S(t)\, I(t)}{N} - \sigma E(t)$$

We develop multiple models that differ only in how they define β(t). In the following descriptions, b0, …, bn are parameters of β(t) and are estimated during model training for each model in which they are included. One model uses trips without entrances data (model ii). Another model uses both trips and entrances data (model iii). A model that uses neither data source is used as a baseline (model i). Each of the models uses the same framework, methods, and training and testing periods, described further below.

Model i: constant transmissibility. This is a baseline, dummy model where β(t) is constant.

Model ii: transmission as a function of trips data.

Model iii: transmission as a function of trips and entrances data. In this model, the average rate at which the susceptible population is exposed can be impacted by the behavior of people within the country (e.g. mobility measured in trips) as well as the import of new cases (entrances). The model includes a term f(entrances(t)) that represents the likelihood of new country entrants importing the virus. The term reflects the assumption that the likelihood of new country entrants being infectious tracks with the timeline of infection rates in Andorra. This assumption is based on the fact that during the study period, the timeline of infections in Andorra was highly correlated with the timeline of infections in Spain and France (with Pearson correlation coefficients of 0.922 (p < 0.001) and 0.932 (p < 0.001), respectively), and the primary way to enter Andorra is through the Spanish or French borders.
Furthermore, telecoms data showed that 86% of entrances by foreign SIMs were either Spanish or French, and when accounting for entrances by Andorran SIMs, 68% of all entrances were by Spanish or French SIMs. See section A.3 in S1 Appendix. The above functions using entrances and trips can be combined into one equivalent expression representing transmissibility. We do this to simplify modeling and maintain a common expression for E′(t).

Model framework

The SEIR framework used in this work is illustrated in Fig 2 and is described by the ODEs in Eq (3). We note that many traditional SEIR models use the I compartment to represent the entirety of an individual’s infectious period. Our modeling framework assumes that individuals transition from I to R as soon as they suspect they are infectious. Individuals may then seek a test, and the result of the test will be reported with some delay. C represents the report of a positive test after that delay, d.

Fig 2

Schematic representing the SEIR model framework used in this work.

The population is divided into compartments where individuals transition through the compartments: Susceptible, Exposed, Infected, Removed, Case reported, where the transitions are described by ODEs (Eq (3)). C(t) is cumulative case reports and accounts for reporting delay, d, and the reporting rate, r. Given initial values for the compartments and the other model parameters, time series data for the compartments can be deterministically estimated by integrating over the ODEs into the future, where each compartment time series represents the compartment population on each day, t. This is done to calibrate parameters during model training as well as to generate forecasts beyond the training period. Initial values for R and C at t = 0 are set based on the number of cumulative reported cases at the start of the study period. Initial values for E and I are estimated by model training, along with γ and the parameters of β(t). The reporting rate, r, is set to a value estimated from the serology and case reports data (Data sources and preprocessing section). The latent rate, σ, is set to a value estimated by prior work [46]. The reporting delay, d, is set to 7 days, consistent with related works [21, 47] and empirical checks (see section A.7 in S1 Appendix). d is the average time from when an infectious individual is removed (isolated) to the time the case is reported, and must account for the time it takes to seek a test, for the test to be processed, and for the result to be included in reported cases data. At the start of the pandemic, tests in Andorra were sent to Spain for processing, which may have increased reporting delays. The reporting delay is incorporated into the models by shifting the trips and entrances metrics time series by d. See Table A.5 in section A.6 of S1 Appendix for a concise description of model parameters.

Training and testing

Cumulative reported cases in Andorra reached a threshold of 2 (over a 7-day average) on March 14. The serology tests, which were used to estimate the reporting rate, were conducted in May. In September, massive testing programs began and even before then, testing started to become more available. These programs and test availability increased the case identification rate, impacting both the reporting rate and the removal rate, changing the dynamics in modeling. For these reasons, the study period includes March to August, 2020. The period of March 14 to May 31 is used for model training and the following 10 weeks are used for testing.

Training. Parameters and initial values for E(t), I(t) at t = 0 were fit with maximum likelihood estimation (MLE). Log-likelihood was computed by comparing time series values of predicted cumulative reported cases (C) to the time series of actual cumulative reported cases:

$$\log L = \sum_{t} \log P(k_t, \lambda_t)$$

where the sum is over all days in the training data, P(k, λ) is the Poisson distributed probability mass function, k is actual reported cases, and λ is predicted reported cases. Parameters were optimized by minimizing the negative log-likelihood using the L-BFGS-B method [48]. See section A.5 in S1 Appendix for details.

Testing. Median absolute percentage error (MAPE) over cumulative estimates has been used in a recent framework to evaluate and compare COVID-19 models [49], where the errors incorporate an intercept shift. MAPE is similarly used to evaluate and compare the performance of models in this work. Given model training estimates S, E, I, R, C up to time t, the trained model is tested starting at time t + 1 as follows. The value of C(t) is corrected to the true reported cases at time t and further integration over the ODEs is used to continue the simulation over the test period. The resulting C estimated over the test period is compared to actual reported cases via MAPE.
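The two quantities used for training and testing can be sketched in a few lines. This is an illustrative sketch, not the code from the public repository: the function names are hypothetical, the example numbers are invented, and the intercept shift mentioned above is not included.

```python
import math
from statistics import median


def poisson_log_pmf(k, lam):
    """Log of the Poisson probability mass function P(k, lam); requires lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def negative_log_likelihood(actual_cases, predicted_cases):
    """Negative Poisson log-likelihood summed over all days in the series."""
    return -sum(poisson_log_pmf(k, lam)
                for k, lam in zip(actual_cases, predicted_cases))


def median_abs_pct_error(actual_cases, predicted_cases):
    """Median absolute percentage error, in percent; actual values must be nonzero."""
    return 100 * median(abs(p - a) / a
                        for a, p in zip(actual_cases, predicted_cases))


# Invented cumulative case counts, for illustration only.
actual = [100, 200, 400]
predicted = [110.0, 190.0, 400.0]
nll = negative_log_likelihood(actual, predicted)
mape = median_abs_pct_error(actual, predicted)
```

In a fitting loop, a function like `negative_log_likelihood` would be evaluated on the simulated C(t) for each candidate parameter set and handed to an optimizer, for example `scipy.optimize.minimize` with `method="L-BFGS-B"`.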

Data sources and preprocessing

Three main data sources are used in this work and are further described below: (i) serology data from the nationwide testing program conducted in May 2020, (ii) telecoms data covering all mobile subscribers in the country, (iii) official COVID-19 case and death reports. All time series metrics estimated from (ii) and (iii) are smoothed by taking the mean over a 7-day rolling window.
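As a small illustration of the smoothing step, the sketch below computes a 7-day rolling mean over a daily series. The text does not say whether the window is trailing or centered, so the trailing window used here, and the handling of the first few days, are assumptions of this sketch.

```python
def rolling_mean(values, window=7):
    """Trailing rolling mean over a daily series.

    Assumption: the window covers the current day and the preceding
    (window - 1) days; early entries average over the days available so far.
    """
    smoothed = []
    for idx in range(len(values)):
        start = max(0, idx - window + 1)
        chunk = values[start:idx + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Invented daily counts, for illustration only.
daily_counts = [0, 3, 1, 4, 10, 2, 8, 6, 5, 9]
smoothed_counts = rolling_mean(daily_counts)
```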

Serology data

As described in the Andorra and COVID-19 section, a nationwide serology testing program was conducted in May of 2020. The program was voluntary and conducted in 2 phases, and 91% of the population participated. The program was conducted for a previous research study, in which the methods and results are detailed [32]. The study was approved by the Institutional Review Board of the Servei Andorra Atencio Sanitaria (register number 0720). An anonymized version of the dataset was also provided to researchers in our lab as part of a research partnership. The dataset includes a unique identifier for each participant and results from the 1st and 2nd round of tests; test results were left empty when there was a lack of participation. The dataset also includes demographic information for participants, including their home parish and whether they are a temporary worker. As previously described, an issue with the serology testing program was that many of the participants from the first phase of testing did not participate in the second phase (see Table A.3 in S1 Appendix). From the serology data, Bayes' theorem [50] was used to estimate the portion of the population infected up to May. With this number and the official reported cases data, we estimated the case reporting rate. This reporting rate is used in the epidemiology models described in this work.
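The exact calculation behind the reporting rate is not given in this section. As a rough sketch of the idea, one common approach is to correct the raw fraction of positive serology tests for test sensitivity and specificity, and then divide cumulative reported cases by the estimated number of infections. The correction formula, the function names, and every number below except the population of roughly 77,000 are assumptions for illustration and may differ from the procedure actually used.

```python
def adjusted_prevalence(positive_fraction, sensitivity, specificity):
    """Correct a raw positive-test fraction for imperfect test accuracy.

    Solves positive_fraction = sens * p + (1 - spec) * (1 - p) for p,
    then clamps the result to [0, 1]. Assumed method, not from the source.
    """
    raw = (positive_fraction + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return min(1.0, max(0.0, raw))


def reporting_rate(cumulative_reported_cases, infected_fraction, population):
    """Share of estimated infections that appear in official case reports."""
    return cumulative_reported_cases / (infected_fraction * population)


# Hypothetical inputs, chosen only so the arithmetic is easy to follow.
prevalence = adjusted_prevalence(positive_fraction=0.10,
                                 sensitivity=0.90, specificity=0.98)
rate = reporting_rate(cumulative_reported_cases=700,
                      infected_fraction=prevalence, population=77000)
```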

Telecoms data and metrics

Andorra has one telecoms provider (Andorra Telecom), which provided the data for this study. Since they are the sole provider, the dataset covers 100% of mobile subscribers in the country, including subscribers using foreign SIM cards. This is unlike most telecoms datasets where the market is fragmented. Each data point includes a unique ID for the subscriber, a timestamp, the coordinates of the device, and nationality for the subscriber’s home network. The data have been further described in [51]. The stay-point extraction algorithm of Li et al. (2008) [52] was used to reduce the series of data points for each subscriber into a series of stay-points of 10 minutes or more within a radius of 200m or less. The stay-points represent a more concise and reliable series of places the subscriber spent time; stay-points were used to infer presence in the country, dynamic population changes, and compute the trips and entrances metrics. There are gaps in the available telecoms data and the resulting trips and entrances metrics during the period of study (data gaps are June 28–29, and July 21–27, 2020). Missing values were imputed by taking the mean across the values from the 7 days surrounding each missing period of data.

Dynamic population inference and metrics. On each day, a subscriber was considered present in the country if they had a stay-point in the country within a 7-day window. The window accounts for unobserved subscriber devices due to a combination of inactivity, lack of reception in certain areas, or noisy data. The beginnings and endings of periods of presence were counted as entrances to and departures from the country, respectively.

Trips metrics. Daily trips for subscribers were counted as their daily number of stay-points minus 1, since a new stay-point is recorded when a subscriber moves beyond a 200m radius. Daily trips by subscribers were summed as a total daily trips metric.

Home inference. The home parish of each subscriber was inferred from the telecoms data, to come up with a population count for each of the 7 parishes of Andorra. This was done by first assigning each stay-point to the parish in which it was contained. Each subscriber’s home parish was then determined to be the parish in which they spent the most cumulative time during night-time hours (12:00am to 6:00am). Related studies of human mobility that use cellular data have employed similar methods [53-56]. These inferred parish-level populations were compared to the published 2020 population statistics [29]. There is a Pearson correlation coefficient of 0.959 (p < 0.001), suggesting that the telecoms data are representative of the true population. (See Table A.2 and Fig A.1 in S1 Appendix.) Inferring the parish of residence is done both to check methodology as well as compare populations to serology test participation (see the Serology tests and country departures section).
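The metrics described above can be sketched as follows. This is an illustration under stated assumptions, not the preprocessing code from the public repository. In particular, the sketch interprets the 7-day window as: two days with stay-points belong to the same period of presence when they are at most 7 days apart. All function names and the example data are hypothetical.

```python
from collections import Counter
from datetime import date


def presence_periods(stay_days, window_days=7):
    """Merge the days on which a subscriber has stay-points into periods of presence.

    Assumption: observed days at most `window_days` apart belong to one period.
    """
    periods = []
    for day in sorted(set(stay_days)):
        if periods and (day - periods[-1][1]).days <= window_days:
            periods[-1][1] = day          # extend the current period
        else:
            periods.append([day, day])    # start a new period
    return [tuple(period) for period in periods]


def entrances_and_departures(stay_days_by_subscriber, window_days=7):
    """Count period beginnings as entrances and period endings as departures."""
    entrances, departures = Counter(), Counter()
    for stay_days in stay_days_by_subscriber.values():
        for start, end in presence_periods(stay_days, window_days):
            entrances[start] += 1
            departures[end] += 1
    return entrances, departures


def daily_trips(stay_point_count):
    """Trips in a day: number of stay-points minus 1, never below zero."""
    return max(stay_point_count - 1, 0)


def home_parish(night_minutes_by_parish):
    """Parish with the most cumulative night-time minutes for one subscriber."""
    return max(night_minutes_by_parish, key=night_minutes_by_parish.get)


# Invented example data, for illustration only.
example = {
    "sub_a": [date(2020, 5, 1), date(2020, 5, 3), date(2020, 5, 20)],
    "sub_b": [date(2020, 5, 1)],
}
entrances, departures = entrances_and_departures(example)
```

One caveat of this simple version is that the first and last days of the dataset would register as entrances and departures for every subscriber observed on them, so the edges of the observation period would need separate handling.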

COVID-19 infection data

This dataset was made available by Johns Hopkins University [57] and downloaded from OWID [58] as a time series of daily reports. Reported cases in Andorra were used for model estimation and prediction. There were cases identified in Andorra through the May serology testing program that were reported late, on June 2 [59]. This reporting error was handled by removing the excess case reports. Fig 1 plots the resulting daily new and cumulative case reports over the period of study. Reported deaths data for Andorra and its neighboring countries, Spain and France, were used in model assumptions (see section A.3 in S1 Appendix).

Results

2019 versus 2020 metrics

Before presenting our main findings, we first present the start of the pandemic in Andorra through a series of plots, and compare this period to the same period in 2019, when Andorra experienced a normal economy with tourism. Fig 3 shows that by the start of March of 2020, there were already fewer people (mobile subscribers) in the country than in 2019. This number then substantially dropped with the start of the border restrictions and economic lockdown in mid March. There were also already fewer total daily trips being taken at the start of March, 2020, compared to 2019. This is largely due to fewer people in the country making the trips. This metric also substantially dropped at the start of the lockdown. This drop was partly due to even fewer people in the country making trips, and due to the government imposing restrictions on movement. The number of trips gradually rose again before the border restrictions were lifted in June, indicating that the population increased internal mobility. The number of daily entrances to (and departures from) Andorra also significantly dropped in mid March of 2020, as tourists and others left the country and border restrictions were imposed, limiting entry to the country. These daily metrics remained near zero throughout April and May, until border restrictions were lifted in June.

Fig 3

Estimated population, trips, country entrances and departures metrics for 2020 vs 2019.

(Top) daily mobile subscribers counted as present in the country, (middle) daily total trips, and (bottom) daily country entrances and departures, for the country of Andorra during the start of the pandemic in 2020 versus the same period in 2019. All metrics are estimated from telecoms data that covers 100% of mobile subscribers in the country. Solid lines show values smoothed over a 7-day rolling window.

COVID-19 cases and mobility

The time series of reported COVID-19 cases is shown with the time series of the trips and entrances metrics in Fig 1. Other studies have implied that changes in case growth often lag changes in behavior and mobility metrics by 14 or more days [17, 21, 27]. However, Fig 1 shows that daily trips were able to increase throughout May of 2020 while newly reported cases remained low. Case growth did not increase again until daily entrances increased again when the border restrictions were lifted in June. This suggests that the entrances metric is more related to case growth than the trips metric in this case study. The relative predictive power of these metrics is further shown by the model results (Models results section).

Serology tests and country departures

Andorra’s nationwide serology testing program conducted in May, 2020 involved two phases of testing (see the Andorra and COVID-19 section). An issue with this program was that many of the participants from the first phase of testing did not participate in the second phase, limiting the impact of the study. An important question for a country conducting such a program might be why this happened. This drop in participation might be particularly concerning, as we found the drop in participation was more than 3 times higher among temporary workers versus the general population, and results from the testing program showed that temporary workers had higher seroprevalence (infection rates) versus the general population. See Table A.4 in S1 Appendix. This might imply that a more infected demographic group was then less monitored. By combining the serology test data with information inferred from the telecoms data, we find that test participants likely left the country after their first test. We counted the number of mobile subscribers, by inferred home parish, who were in the country during the first and second phases of testing (May 4–14 and May 18–28, 2020). Subscribers were counted as present during a testing period if they had at least one “stay” within the period. We estimated how many subscribers left the country after the first test by counting how many subscribers were present during only the first test period versus both test periods. These numbers were compared to the parish-level serology test participant populations. Namely, the portion of serology test participants who did test 1 but not test 2 was compared to the estimated portion of mobile subscribers who left the country between test periods, and this comparison was done for each home parish. Comparing across parishes, there is a statistically significant Pearson correlation coefficient of 0.937 (p = 0.0019). 
To check the robustness of this result, we also restricted the May 2020 telecoms data to subscribers who had at least 7 days, or 4 nights, of data. The results are similar with Pearson correlation coefficients of 0.925 (p = 0.0028), and 0.955 (p = 0.0008), respectively. To further validate that the decline in test participation was related to people leaving the country, we repeated these tests using 2019 telecoms data: we estimated the number of subscribers by home parish who were in the country during the periods May 4–14 and May 18–28 of 2019 (using 2019 telecoms data) and compared the number of subscribers who left the country between those periods to the serology test participation. In this case, there is a Pearson correlation coefficient of 0.4928 (p = 0.2612). If the May 2020 subscribers had left the country for reasons not related to the pandemic, we would expect the correlation to be similar for the 2019 and 2020 data. However, the correlation for the 2019 data is much lower and not statistically significant. See Table A.4 in S1 Appendix.
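The parish-level comparison described above can be sketched as follows. This is a minimal illustration, not the authors' code: the input layout (one `(home_parish, stay_dates)` pair per subscriber), the function names, and the choice of denominator (subscribers present in the first period) are assumptions made for the example. Only the two testing windows and the "at least one stay" presence rule come from the text. The sketch computes the correlation coefficient only, not the p-value.

```python
from datetime import date
from math import sqrt

# Testing windows as stated in the text (May 4-14 and May 18-28, 2020).
PHASE_1 = (date(2020, 5, 4), date(2020, 5, 14))
PHASE_2 = (date(2020, 5, 18), date(2020, 5, 28))


def present(stay_dates, window):
    """A subscriber counts as present if at least one stay falls in the window."""
    start, end = window
    return any(start <= d <= end for d in stay_dates)


def share_left_by_parish(subscribers):
    """Share of phase-1 subscribers per parish who were absent in phase 2.

    `subscribers` is an iterable of (home_parish, stay_dates) pairs.
    This layout is hypothetical; the real telecoms schema is not given here.
    """
    in_first, left = {}, {}
    for parish, stays in subscribers:
        if not present(stays, PHASE_1):
            continue  # never counted for phase 1, so cannot have "left"
        in_first[parish] = in_first.get(parish, 0) + 1
        if not present(stays, PHASE_2):
            left[parish] = left.get(parish, 0) + 1
    return {p: left.get(p, 0) / n for p, n in in_first.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

In use, the per-parish shares from `share_left_by_parish` would be paired with the per-parish share of serology participants who took test 1 but not test 2, and the two lists passed to `pearson_r`.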

Models results

Simple models based on the SEIR framework were developed to compare the impact of trips and entrances data on transmission rates and predicted infections. The baseline, dummy model (i) assumes a constant transmission rate. For model (ii), transmission is a function of mobility measured by trips data, and for model (iii), transmission is a function of both trips and entrances data (see the Modeling section for details). Models were trained over the period March 14 to May 31, 2020. Table A.5 and Fig A.5 in S1 Appendix show the parameter values for the best fit models and the corresponding time series values for the estimated R0, the compartment populations, and the predicted reported cases, over the training period. Models were evaluated by their prediction performance over the weeks that followed the training period. This was done using the median absolute percentage error (MAPE), based on the framework used by Friedman et al. to evaluate leading COVID-19 models [49]. Results for 1–10 forecasting weeks are shown in Table 1. All models performed relatively well during the period of study. (As a point of comparison, Friedman et al. found, in their global evaluation of COVID-19 models, MAPE values of 1–2% for 1 week forecasts and 17–25% for 10 week forecasts. See Figs 3 and 5 in [49]. Note that their evaluation used cumulative deaths data whereas this work uses cumulative cases data.)
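As a small illustration of the evaluation metric, the sketch below computes a median absolute percentage error for one forecast window. The function name, and the assumption that the median is taken over the days within a forecast window, are illustrative choices; the text does not spell out the aggregation. The function returns a fraction (0.1 means 10%).

```python
from statistics import median


def median_ape(actual, predicted):
    """Median absolute percentage error, returned as a fraction.

    `actual` and `predicted` are equal-length sequences, for example
    cumulative reported cases for each day of one forecast window.
    Days where the actual value is zero are skipped, since the
    percentage error is undefined there.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = [abs(p - a) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not errors:
        raise ValueError("no non-zero actual values to compare against")
    return median(errors)
```

Because the error is relative to the true value, the metric has a lower bound of 0 (perfect agreement) and no upper bound.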

Table 1

MAPE results for the 3 models.

MAPE by model
forecasting weeks	i. constant β	ii. trips data	iii. trips & entrances data
1	0.03	0.02	0.24
2	0.20	0.30	0.19
3	0.65	0.80	0.25
4	0.97	1.16	0.34
5	1.22	1.44	0.52
6	1.47	1.73	0.76
7	1.36	1.60	1.01
8	1.39	1.54	1.08
9	1.59	1.78	1.09
10	1.80	1.98	1.12

Median absolute percentage error (MAPE) used to evaluate the 3 models. The MAPE measures errors relative to the true values and can vary from 0 to infinity, where 0 represents perfect agreement. The models differ in whether they incorporate trips and entrances data to model transmissibility. Model (i) is a baseline, dummy model where transmissibility is constant, model (ii) uses trips data, and model (iii) uses trips and entrances data.

All models used the same framework and methods. Model (iii), which used both trips and entrances data, outperformed the other models in every forecasting week except the first week that followed the training period. More importantly, model (ii), which used trips data to model transmission rates (without entrances data), had results similar to, and in all but the first forecasting week slightly worse than, the baseline model (i), which assumed a constant transmission rate. This is not surprising, as the data indicated trips were able to increase without impacting transmission rates (Fig 1). This is also shown in that the best fit for model (ii) had parameters that flattened the impact of the trips data, resulting in a nearly flat reproduction number, R0. Given that there were few new infections at the end of the training period (i.e. a smaller population in the I compartment), this resulted in relatively flat predictions for new reported cases for model (ii) over the forecasting weeks that followed the training period (similar to model (i)). This is in contrast to model (iii), which used both trips and entrances data, and where predictions for new reported cases closely tracked the actual reported cases. See Fig 4. Overall, these estimated R0 values are reasonable and within the range of values estimated by previous works [60].

Fig 4

Fit model results.

Time series values for (top) the estimated R0 and (bottom) actual versus predicted reported cases that resulted from model training. Left: Plotted values for the model which uses just trips data. Right: Plotted values for the model which uses both trips and entrances data. Models were trained over the period March 14 to May 31 and tested over the weeks that followed. The training and testing periods are divided by gray and white backgrounds, respectively. Axes for the R0 values are set to highlight that values were flattened for the trips data model. See Fig A.5 in S1 Appendix for plots that show the full variation in the R0 values.

As a robustness check, all models were trained and tested over an additional set of training and testing periods that ended slightly earlier than those used for the main results. (The training period for the robustness check was March 14 to May 14, 2020.) The results are similar to the main results, and shown in Table A.6 and Fig A.6 in section A.8 of S1 Appendix. However, in this case, model (iii), which used trips and entrances data, consistently outperformed the other models for all forecasting weeks.

These results may seem surprising and their interpretation remains unclear. In epidemiology, the 3 models may be considered as (i) a homogeneous mixing model, (ii) a model of one population in which transmission depends on local mixing only, and (iii) a model that accounts for local mixing and external seeding, where trips are a proxy for local mixing and entrances are a proxy for external seeding. It is possible that the lack of predictive power of trips in the model is due to the model being calibrated during a lockdown period, when transmission opportunities represented by trips were not as important without external seeding. However, it is also possible that while trips have been used as a proxy for mixing in related works, trips did not necessarily convert to transmission opportunities in this case.
This may be due to trips being safely taken with social distancing guidelines and other NPIs in place. And again, this may partly be due to the model being calibrated during a lockdown. At the same time, the entrances metric may represent more than external seeding, and also represent a more open economy and additional activities that may increase transmission opportunities.
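To make the distinction between the three model variants concrete, the following is a minimal discrete-time SEIR sketch. It is not the authors' model. The linear form of the transmission rate, the parameter names `b0`, `b1`, `b2`, the fixed population size, the default `sigma` and `gamma` values, and the assumption that trips and entrances are pre-scaled to comparable ranges are all hypothetical choices made for illustration. The actual specification is in the Modeling section and S1 Appendix.

```python
def beta_constant(n_days, b0):
    """Model (i): constant transmission rate (homogeneous mixing)."""
    return [b0] * n_days


def beta_trips(trips, b0, b1):
    """Model (ii): transmission depends on local mixing (trips) only.

    The linear form is an assumption made for this sketch.
    """
    return [max(0.0, b0 + b1 * t) for t in trips]


def beta_trips_entrances(trips, entrances, b0, b1, b2):
    """Model (iii): local mixing (trips) plus an entrances term.

    Entrances stand in for external seeding; again a hypothetical form.
    """
    return [max(0.0, b0 + b1 * t + b2 * e) for t, e in zip(trips, entrances)]


def simulate_seir(beta, population, e0, i0, sigma=1 / 5, gamma=1 / 7):
    """Run a daily-step SEIR simulation with a time-varying transmission rate.

    `beta` holds one transmission rate per day. `sigma` (rate of leaving E)
    and `gamma` (removal rate) are placeholder values, not fitted ones.
    Returns a list of (S, E, I, R) tuples, one per day.
    """
    s, e, i, r = population - e0 - i0, e0, i0, 0.0
    history = []
    for b in beta:
        new_exposed = b * s * i / population
        new_infectious = sigma * e
        new_removed = gamma * i
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
        r += new_removed
        history.append((s, e, i, r))
    return history
```

In this sketch, a fit that drives `b1` toward zero makes model (ii) behave like model (i), which is one way to picture the "flattened" R0 reported for the trips-only model.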

Counterfactuals

What if Andorra had not imposed a lockdown, which caused reduced mobility? What if border restrictions had not been put in place, which caused a drop in entrances? Overall, what if the population mobility, measured in total trips and entrances, had not dropped in March? In this section we explore such a counterfactual scenario by using the best fit model (iii) from the Models results section, which uses the trips and entrances data. The lockdown in Andorra began on March 13, 2020, and there was a large drop in trips and entrances surrounding this date (see Fig 1). We again take a simplified approach to modeling, and create hypothetical trips and entrances data for a counterfactual scenario where mobility and border restrictions were not put in place. We do this by using the true metrics up to March 13 of 2020, and then keeping the metrics constant at the March 13 values. This is shown in Fig 5. We then estimate counterfactual case reports by using the previously fit model (i.e. we use the model parameters that were fit with the true trips and entrances time series values) and replace the model’s trips and entrances data with the counterfactual data. We then run the simulation over the same period that was used to train the original model. The result is a prediction of 2941 cumulative reported cases up to May 31, 2020 under the counterfactual scenario, versus the 766 cases actually reported up to May 31. The difference is an additional 2175 reported cases during this time period, so the counterfactual total is more than 3x the actual total.
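The construction of the counterfactual inputs, and the arithmetic behind the comparison, can be sketched as follows. The function name and the `(date, value)` series layout are assumptions made for this example. The cutoff date and the case totals are taken from the text.

```python
from datetime import date

LOCKDOWN_START = date(2020, 3, 13)  # cutoff date stated in the text


def hold_constant_after(series, cutoff):
    """Keep observed values up to `cutoff`, then repeat the cutoff value.

    `series` is a list of (day, value) pairs sorted by day. This layout
    is hypothetical; any daily trips or entrances metric would fit.
    """
    frozen = None
    counterfactual = []
    for day, value in series:
        if day <= cutoff:
            frozen = value  # last observed value on or before the cutoff
            counterfactual.append((day, value))
        else:
            if frozen is None:
                raise ValueError("no observation on or before the cutoff")
            counterfactual.append((day, frozen))
    return counterfactual


# Totals reported in the text for the period up to May 31, 2020.
counterfactual_cases = 2941
actual_cases = 766
additional_cases = counterfactual_cases - actual_cases  # 2175
ratio = counterfactual_cases / actual_cases  # roughly 3.8
```

The same helper would be applied separately to the trips series and the entrances series before rerunning the fitted model.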

Fig 5

Counterfactual results.

Top: Hypothetical total trips and entrances metrics that are used to simulate reported cases for a counterfactual scenario where mobility and border restrictions had not been put in place. Bottom: Simulated reported cases for such a counterfactual scenario, versus the actual reported cases that occurred in the true scenario.

Discussion

When COVID-19 was introduced to Andorra at the start of March 2020, the country and its bordering neighbors responded quickly with economic and border restrictions. These interventions and other NPIs proved effective in Andorra, as the country brought case growth under control between March and May 2020, before the restrictions were fully lifted. The counterfactual scenario modeled in this work shows a stark alternative had the mobility changes observed during this period not occurred, with an estimated more than 3x as many cases, which would likely have overwhelmed the hospital system. Numerous other works have also used mobility data collected from mobile phones to model the impacts of mobility restrictions on COVID-19 transmission. However, these studies have relied on data about trips, and the data represented only a small sample of the target populations. Other works using meta-population SEIR models, where the modeled sub-populations are dynamic, have been based on static census data. In contrast, this work leverages data collected from mobile phones that represent 100% of subscribers in a country. We showed how these data could be used to build on previous works by computing daily trips metrics as well as estimating a dynamic, real-time population census. We then showed how these data can be used to improve the understanding of a pandemic in two main ways. First, these data were used to better understand why participation in the nationwide serology testing program dropped between the first and second phases of testing. The drop in participation may have been concerning as the second phase of testing was intended to help better detect and track the virus. This decreased ability to track the virus might have been particularly concerning because the test results showed that the temporary worker population had the highest infection rates and this population also had the largest drop in test participation. 
However, the analysis, which leveraged the telecoms data to estimate dynamic population changes, suggested that the decline in participation was likely due to test participants leaving the country after their first test. Second, we showed how the dynamic population data could be used to improve epidemiological (SEIR) models that otherwise rely on mobility measured by trips. In our contribution, we developed simple SEIR models that differed in how they used the trips and entrances metrics developed through this work. These models performed well compared to the 7 global COVID-19 models evaluated by Friedman et al. (2021) [15, 43, 44, 49, 61–63], but their purpose was not to be highly accurate; the purpose of these models was to illustrate the relative importance of trips mobility data versus real-time population data, namely country entrances. In particular, for the case of Andorra, we find that the population was able to regain internal mobility measured in daily total trips with limited growth in cases, and that total trips per day did not have predictive value in the SEIR models while country entrances did. While we show that the entrances metric had superior predictive power over the trips metric in Andorra, we do not mean to draw a direct line between country entrances and new COVID-19 cases. Changes in the entrances metric may have been highly correlated with other changes that impacted transmission rates, such as changes in COVID-19 policies and cautions. In general, the models were limited by their simplifications. For example, there was likely an interaction effect between the trips and entrances metrics that was not captured in the models. The models also assumed that the case identification rate (and hence removal rate) and reporting rate were constant, which related works have as well (e.g. [21]). However, these rates likely changed with Andorra’s increased testing. 
Future work can more accurately model the impacts of mobility and entrances, and the interaction between these metrics. This might also include incorporating data on the infection rates for other countries whose populations contribute to entrances. Future work can also incorporate data on testing rates to better model changes in the removal and reporting rates. Furthermore, our modeling approach was able to leverage features that make Andorra a special case study compared to other countries. In particular, Andorra normally has a highly dynamic population, given its small population and its relatively large volume of cross-border traffic and number of temporary workers. These features, along with the fact that our study was conducted over one period at the start of COVID-19, may make our results less transferable to other countries or contexts. Despite these limitations, overall, this case study suggests how using mobile phone data to measure dynamic population changes could improve studies that rely on more commonly used mobility metrics and the overall understanding of a pandemic.

Supplementary appendix.

(PDF)

13 Dec 2021

PONE-D-21-35256

Using mobile phone data to estimate dynamic population changes and improve the understanding of a pandemic: A case study in Andorra

PLOS ONE Dear Dr. Berke, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Both referees have positively evaluated the manuscript, raising only minor comments. Please, take them into account in a revised version of the manuscript, in particular considering the concerns of Referee #1. Please submit your revised manuscript by Jan 27 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Michele Tizzoni Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. 
Please provide additional details regarding participant consent. If you are reporting a retrospective study of medical records, archived samples or third party data, please ensure that you have discussed whether all data were fully anonymized before you accessed them and/or whether the IRB or ethics committee waived the requirement for informed consent. If patients provided informed written consent to have data from their medical records used in research, please include this information. Once you have amended this/these statement(s) in the Methods section of the manuscript, please add the same text to the “Ethics Statement” field of the submission form (via “Edit Submission”). 3. Please include a complete copy of PLOS’ questionnaire on inclusivity in global research in your revised manuscript. Our policy for research in this area aims to improve transparency in the reporting of research performed outside of researchers’ own country or community. The policy applies to researchers who have travelled to a different country to conduct research, research with Indigenous populations or their lands, and research on cultural artefacts. The questionnaire can also be requested at the journal’s discretion for any other submissions, even if these conditions are not met. Please find more information on the policy and a link to download a blank copy of the questionnaire here: https://journals.plos.org/plosone/s/best-practices-in-research-reporting. Please upload a completed version of your questionnaire as Supporting Information when you resubmit your manuscript. 4. Please include your tables as part of your main manuscript and remove the individual files. Please note that supplementary tables (should remain/ be uploaded) as separate "supporting information" files. 5. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. 
PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability. Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized. Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access. We will update your Data Availability statement to reflect the information you provide in your cover letter. 6. Please include a caption for figure 5. 7. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. 
If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. 
Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The manuscript highlights a very important phenomenon on the role played by cases importation on local epidemics and how this can be best used in our current epidemic models to reproduce and understand the observed trends. The authors study the epidemic in Andorra which is indeed a very nice example of the role played by seeding thanks to the high ratio of external flows versus inhabitants. I think that the manuscript is well written and clear in the research purpose that it wants to address. The methodology and the data suit well this purpose. I think that the manuscript is suitable for acceptance after fulfilling a minor revision. - “The value of R0 can change over the course of an epidemic due to changes in human behavior and NPIs; a goal of control efforts, such as those employed during the COVID-19 pandemic, is to reduce R0 to bring an epidemic under control” The authors make some confusion on the description of R0 (which does not change), probably confusing it with Rt. Please check the definitions to avoid misconceptions and correct the paragraph. https://royalsociety.org/-/media/policy/projects/set-c/set-covid-19-R-estimates.pdf - line 212: “Model ii: transmission a function of trips data” → as - line 244: It is not clear to me why C(t) should be proportional to the removals R with a delay in time. The cumulative reported cases should be reported with a certain delay after the individuals get infected, i.e. when they start being positive to tests, not when they get recovered or they isolate or die. 
- line 315: “telecom data on mobility is provided by Andorra Telecom”, but then the authors say “Furthermore, telecoms data showed that 86% of entrances by foreign SIMs were either Spanish or French, and when accounting for entrances by Andorran SIMs, 68% of all entrances were by Spanish or French SIMs”. So the mobility data includes also foreign countries SIM cards? - on march 163h the govt of Andorra imposed a restriction to public activities and schools. Your model is trained on this period until May 31, can this affect the calibration of your model to more general periods, like for example periods with no activity restrictions? - Fig3 resolution is very poor and it is very difficult to understand as it is. - Fig4 shows a flat R0 for the trips-only model. However in Fig5 total trips seem to vary a lot during the period of observation. How come this scenario does not produce any variation of Rt with respect to trips (internal mixing)? I understand that Andorra only counts for 77,000 inhabitants and that the external flow is very high with respect to the local population relatively to other countries, however I don’t understand how a model based only on internal trips does not provide a varying reproductive number along the time of simulation. - line 432: only to get this clear in epidemiological terms, model 1 is a homogeneous mixing, model 2 is a model of one population in which transmission depends on local mixing only, model 3 accounts for local mixing and external seeding (importation and exportation) - line 453: “More importantly, the model (ii) that used trips data to model transmission rates (without entrances data) had results similar to, and slightly worse than, the baseline model (i) which assumed a constant transmission rate. 
This is not surprising, as the data indicated trips were able to increase without impacting transmission rates” I would say that this happens also because the model is calibrated in a lockdown period, hence internal mixing does not necessarily convert to spreading opportunities because of strong social distancing measures. - line 454: “This is also shown in that the best fit for model (ii) had parameters that flattened the impact of the trips data, resulting in a nearly flat reproduction number, R0.” indeed, the effect of the lockdown calibration, but of course using a fixed transmission parameter leads to this problem. The authors may need to address this model scenario limitation. (check grammar of the first part of the sentence) - Table 1: in the description of the error metric it would be better to state also the range of the values that this metric can reach. From 0 (perfect) to N (bad). - Fig.5, very interesting, why not plotting also the new cases time-series? - Some recent and very related work on the effect on local epidemics of population changes and seeding caused by entrances is missing from the references, see for example: -Mazzoli, M., Valdano, E., & Colizza, V. (2021). Projecting the COVID-19 epidemic risk in France for the summer 2021. Journal of travel medicine, 28(7), taab129. -Kraemer, M. U., Hill, V., Ruis, C., Dellicour, S., Bajaj, S., McCrone, J. T., ... & Pybus, O. G. (2021). Spatiotemporal invasion dynamics of SARS-CoV-2 lineage B.1.1.7 emergence. Science, 373(6557), 889-895. -Kraemer, M. U., Yang, C. H., Gutierrez, B., Wu, C. H., Klein, B., Pigott, D. M., ... & Scarpino, S. V. (2020). The effect of human mobility and control measures on the COVID-19 epidemic in China. Science, 368(6490), 493-497. -Mazzoli, M., Pepe, E., Mateo, D., Cattuto, C., Gauvin, L., Bajardi, P., ... & Ramasco, J. J. (2021). Interplay between mobility, multi-seeding and lockdowns shapes COVID-19 local impact. PLoS computational biology, 17(10), e1009326. 
Reviewer #2: The authors study the problem of how COVID-19 transmission dynamics relate to human mobility patterns. In particular, the aim is to compare the predictive power of domestic mobility and cross-border mobility and thus infer how important dynamic population mapping is in the context of infectious disease dynamics. This is a very interesting and relevant research question, as few studies on infectious disease spread take a dynamic population into account, often because mobility datasets are limited to a single country and do not include cross-border traffic which could inform population changes. The authors study the case of Andorra, use mobility data to inform an SEIR model, and make the surprising observation that within-country trips hardly had any predictive power, while taking into account cross-border mobility improved predictions. The manuscript is very well written. The methods and results are described in clear, unambiguous language. The underlying data is presented in detail, including the mobility data results of the serological study. How mobility is measured and how the data is processed is described clearly, where the authors make use of data-processing practices commonly used in literature (such as the stay detection and home detection). The serological study is properly discussed, including caveats stemming from the two waves of testing. The data used in the study is made available online. The authors have also made software code used in the study available. I have downloaded the data and code and was able to replicate several results and figures used in the manuscript. The SEIR model is set up, trained and evaluated appropriately. The two components of mobility are incorporated in a reasonable manner in the model. The use of three model scenarios is an appropriate way to test the influence of trips- and entrance-data on the prediction. 
I was honestly surprised by the extreme lack of predictive power of the model scenario using only trips data (model ii) and was very skeptical at first, but I could not find a flaw in the methodology. I do think that the entrances are such a good predictor as they are probably highly correlated with a host of other interventions that affect transmission rate, but the authors properly address this in the discussion. Altogether, I think the results, conclusions and their interpretations are sound. I would recommend the manuscript for publication as is.

I have only one minor remark which I would appreciate being addressed by the authors: It is fair to say that Andorra is a very special case compared to other countries, due to its very small population and relatively large amount of cross-border traffic and temporary visitors. I would assume it has a much more "dynamic" population than most countries, which might mean that the results are not very transferable to other countries, where cross-border traffic and temporary visitors are of less importance. I think it would be appropriate to address this in the discussion. It would also be interesting to include comparative values for Andorra and other countries regarding how dynamic their populations are, if available (such as the fraction of cross border traffic among all traffic, the number of tourists relative to population, number of temporary workers, etc).

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No

27 Jan 2022

Please see the document titled "Response to Reviewers" that is attached in our uploaded files. You will find here the same content without proper formatting.

Dear editor and reviewers,

We thank you for your thoughtful review and comments. The feedback has undoubtedly improved the original manuscript. In the following letter we respond to each of your pieces of feedback separately.

Editor and journal requirements

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming.

Response: Please see that we have made the necessary changes. We have reached out to the PLOS ONE staff and confirmed that our updated manuscript meets style requirements (we corresponded with Chloe Anderson, January 18).

2. Please provide additional details regarding participant consent.

Response: Please see the changes to our Methods section and Ethics statement that clarify that the provided data was anonymized.

3. Please include a complete copy of PLOS’ questionnaire on inclusivity in global research in your revised manuscript.

Response: Please see the uploaded file titled 'Questionnaire on inclusivity in global research' included in our submission file.

4. Please include your tables as part of your main manuscript and remove the individual files.

Response: Please note that we have updated the tables in our manuscript and confirmed with the PLOS ONE staff that our new manuscript meets journal guidelines for tables.

5. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found.

Response: We have updated our data availability statement to be more informative. We are happy to note that both reviewers confirmed data availability. Reviewer #2 even ran the code and reproduced our results.

6. Please include a caption for figure 5.
Response: The original submission had a caption for this figure directly below Table 1, so no change has been made for this in the revised manuscript.

7. Please review your reference list to ensure that it is complete and correct.

Response: Thank you for pointing out that we should review our reference list. In general, we observed that some of our references were not easy to find by searching for them on the web. We added URL and DOI links to the references list, where appropriate, in order to address this. We have also added the references recommended by reviewer #1.

Reviewer 1

The manuscript highlights a very important phenomenon on the role played by cases importation on local epidemics and how this can be best used in our current epidemic models to reproduce and understand the observed trends. The authors study the epidemic in Andorra which is indeed a very nice example of the role played by seeding thanks to the high ratio of external flows versus inhabitants. I think that the manuscript is well written and clear in the research purpose that it wants to address. The methodology and the data suit well this purpose. I think that the manuscript is suitable for acceptance after fulfilling a minor revision.

- “The value of R0 can change over the course of an epidemic due to changes in human behavior and NPIs; a goal of control efforts, such as those employed during the COVID-19 pandemic, is to reduce R0 to bring an epidemic under control” The authors make some confusion on the description of R0 (which does not change), probably confusing it with Rt. Please check the definitions to avoid misconceptions and correct the paragraph. https://royalsociety.org/-/media/policy/projects/set-c/set-covid-19-R-estimates.pdf

Response: We thank the reviewer for their positive words. We also thank the reviewer for highlighting this potential confusion in terminology.
We have updated this section of the text to clarify our use of the term R0 and to distinguish this from Rt.

- line 212: “Model ii: transmission a function of trips data” → as

Response: We thank the reviewer for suggesting this grammatical change. We have made this change to both the Model ii and Model iii descriptions.

- line 244: It is not clear to me why C(t) should be proportional to the removals R with a delay in time. The cumulative reported cases should be reported with a certain delay after the individuals get infected, i.e. when they start being positive to tests, not when they get recovered or they isolate or die.

Response: We recognize that many other SEIR works have modeled the I compartment to represent the entire time an individual is infected, and then model R as "Removed" where Removed indicates they either recovered (are no longer infectious) or died. However, given the guidelines and policies surrounding COVID-19, it became common for individuals to isolate themselves, and hence enter the (R) Removed compartment, as soon as they suspected themselves to be infected. For this reason, our modeling framework assumes that individuals transition from I to R as soon as they suspect they are infectious. Individuals may then seek a test, and the result of the test will be reported with some delay. We model C to represent the report of a positive test after that delay, d. We thank the reviewer for pointing out that this was unclear. We have added additional text to the Model framework section to clarify how this assumption differs from many other SEIR models.

- line 315: “telecom data on mobility is provided by Andorra Telecom”, but then the authors say “Furthermore, telecoms data showed that 86% of entrances by foreign SIMs were either Spanish or French, and when accounting for entrances by Andorran SIMs, 68% of all entrances were by Spanish or French SIMs”. So the mobility data includes also foreign countries SIM cards?
Response: We thank the reviewer for pointing out this should be clarified. In the original manuscript, we mentioned in the Andorra and COVID-19 section that Andorra Telecom data covers "...all mobile subscribers who spend any time in Andorra, whether they are Andorran nationals or have foreign SIM cards". For clarity, we have also updated the Telecoms data and metrics section to say that Andorra Telecom data covers subscribers using foreign SIMs.

- on march 163h the govt of Andorra imposed a restriction to public activities and schools. Your model is trained on this period until May 31, can this affect the calibration of your model to more general periods, like for example periods with no activity restrictions?

Response: We thank the reviewer for pointing out that we should better address this. In the Training and testing subsection we explain the data complications that cause us to use this period. The training period of March 14 - May 31 includes both severe restrictions as well as a gradual reopening of the economy. In the Model results subsection we explain that we also conduct a robustness check over a modified training and testing period, which does not include the Phase 3 reopening period, where results are summarized and then provided in further detail in the SI Appendix. To further address the reviewer's point we have added text in the Model results section to comment on the impact of calibrating the model during the lockdown period. We have also added a comment in the discussion that notes how limiting our study to this specific period is a limitation to transferring our results to other contexts.

- Fig3 resolution is very poor and it is very difficult to understand as it is.

Response: We thank the reviewer for pointing this out. We have revised the figure and hope the reviewer finds it more informative.

- Fig4 shows a flat R0 for the trips-only model. However in Fig5 total trips seem to vary a lot during the period of observation.
How come this scenario does not produce any variation of Rt with respect to trips (internal mixing)? I understand that Andorra only counts for 77,000 inhabitants and that the external flow is very high with respect to the local population relatively to other countries, however I don’t understand how a model based only on internal trips does not provide a varying reproductive number along the time of simulation.

Response: We note that while Fig 4 shows a flattened R0, this is due to the scale of the plot. There is variation which is fully shown in Fig A.5 in S1 Appendix. We call the reader's attention to these plots in the Model results subsection and particularly in the caption of Fig 4. Even so, we agree with the reviewer that this flattened result with the trips data is surprising and reason as follows. During our modeling period, which followed Andorra's lockdown, we found that trips were not correlated with transmission rate (as shown in Fig 1). Due to the lack of predictive power of internal trips in the model, variation in these trips did not impact variation in the transmission rates. Since transmission rate was parameterized on trips only in this model, the modeled transmission rate was effectively almost constant. This counter-intuitive result demonstrated the need to incorporate more information - such as the entrances data - into the modeling of transmission rate. As for how it could be the case that trips did not impact the modeled transmission rates, we have added text to address this following your comments below.
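To illustrate the reasoning in this response, here is a minimal sketch of why a trips-only model can yield a nearly flat reproduction number even when trips vary a lot. It is not the authors' code. The linear form `beta(t) = b0 + b1 * trips(t)`, the parameter values, and the synthetic trips series are all assumptions made for illustration; the manuscript's actual functional form and fitted values are not reproduced here.

```python
import numpy as np

def reproduction_number(trips, b0, b1, gamma):
    """Time-varying reproduction number, beta(t) / gamma, under an ASSUMED
    linear transmission rate beta(t) = b0 + b1 * trips(t)."""
    beta = b0 + b1 * trips
    return beta / gamma

rng = np.random.default_rng(0)
# Synthetic daily trips normalized to [0, 1]; this is NOT the Andorra data.
trips = np.clip(
    0.5 + 0.3 * np.sin(np.linspace(0, 6, 80)) + 0.05 * rng.standard_normal(80),
    0.0,
    1.0,
)

gamma = 1 / 5  # hypothetical removal rate (per day)

# A fit that assigns almost no weight to trips (b1 close to 0) ...
r_weak = reproduction_number(trips, b0=0.25, b1=0.001, gamma=gamma)
# ... versus a hypothetical fit in which trips drive transmission.
r_strong = reproduction_number(trips, b0=0.05, b1=0.4, gamma=gamma)

spread_weak = r_weak.max() - r_weak.min()
spread_strong = r_strong.max() - r_strong.min()
print(f"spread of R with b1=0.001: {spread_weak:.4f}")
print(f"spread of R with b1=0.4:   {spread_strong:.4f}")
```

Under these assumptions the spread of the reproduction number scales directly with the fitted coefficient `b1`, so a best fit that drives `b1` toward zero flattens the curve regardless of how much the trips series itself varies.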
- line 432: only to get this clear in epidemiological terms, model 1 is a homogeneous mixing, model 2 is a model of one population in which transmission depends on local mixing only, model 3 accounts for local mixing and external seeding (importation and exportation)
- line 453: “More importantly, the model (ii) that used trips data to model transmission rates (without entrances data) had results similar to, and slightly worse than, the baseline model (i) which assumed a constant transmission rate. This is not surprising, as the data indicated trips were able to increase without impacting transmission rates” I would say that this happens also because the model is calibrated in a lockdown period, hence internal mixing does not necessarily convert to spreading opportunities because of strong social distancing measures.
- line 454: “This is also shown in that the best fit for model (ii) had parameters that flattened the impact of the trips data, resulting in a nearly flat reproduction number, R0.” indeed, the effect of the lockdown calibration, but of course using a fixed transmission parameter leads to this problem. The authors may need to address this model scenario limitation. (check grammar of the first part of the sentence)

Response: We thank the reviewer for suggesting these clarifications. We have added text to the Model results section that describes the 3 models in the epidemiological terms that the reviewer recommends. "In epidemiology, the 3 models may be considered as (i) a homogeneous mixing model, (ii) a model of one population in which transmission depends on local mixing only, and (iii) a model that accounts for local mixing and external seeding, where trips are a proxy for local mixing and entrances are a proxy for external seeding." We then add text to further interpret the results and address the fact that the lack of predictive power in the trips metric may be due to calibrating the model during a lockdown period.
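The three scenarios described above, together with the delayed-reporting assumption from the earlier response (reported cases C follow removals R after a delay d), can be sketched as a small discrete-time SEIR simulation. This is an illustrative sketch only. The linear transmission-rate forms, the function names `transmission_rate` and `simulate_reported_cases`, the reporting fraction, and every parameter value are hypothetical, and the trips and entrances series are synthetic.

```python
import numpy as np

def transmission_rate(scenario, trips, entrances, b0, b1=0.0, b2=0.0):
    """Daily transmission rate beta(t) for the three scenarios (ASSUMED linear forms)."""
    if scenario == "i":    # constant transmission (homogeneous mixing)
        return np.full(len(trips), b0)
    if scenario == "ii":   # transmission depends on local mixing (trips) only
        return b0 + b1 * trips
    if scenario == "iii":  # trips plus entrances (proxy for external seeding)
        return b0 + b1 * trips + b2 * entrances
    raise ValueError(f"unknown scenario: {scenario}")

def simulate_reported_cases(beta, population=77_000, e0=10.0,
                            sigma=1 / 3, gamma=1 / 5, delay=7, report_frac=1.0):
    """Daily-step SEIR; returns cumulative reported cases C(t) = report_frac * R(t - delay).

    All default values are hypothetical. Individuals move from I to R when
    they isolate, and their positive test is reported `delay` days later.
    """
    days = len(beta)
    s, e, i, r = population - e0, e0, 0.0, 0.0
    removed = np.zeros(days)
    for t in range(days):
        new_e = min(beta[t] * s * i / population, s)  # S -> E
        new_i = sigma * e                             # E -> I
        new_r = gamma * i                             # I -> R (isolation)
        s, e, i, r = s - new_e, e + new_e - new_i, i + new_i - new_r, r + new_r
        removed[t] = r
    reported = np.zeros(days)
    reported[delay:] = report_frac * removed[:max(days - delay, 0)]
    return reported

days = 80
t = np.arange(days)
trips = 0.5 + 0.3 * np.sin(t / 10)        # synthetic, normalized
entrances = np.linspace(0.1, 0.6, days)   # synthetic, normalized

cases = {
    "i": simulate_reported_cases(
        transmission_rate("i", trips, entrances, b0=0.3)),
    "ii": simulate_reported_cases(
        transmission_rate("ii", trips, entrances, b0=0.3, b1=0.0)),
    "iii": simulate_reported_cases(
        transmission_rate("iii", trips, entrances, b0=0.3, b1=0.0, b2=0.2)),
}
for name, series in cases.items():
    print(f"model {name}: cumulative reported cases on final day = {series[-1]:.1f}")
```

With the trips coefficient set to zero, scenario (ii) collapses to the constant-rate baseline (i), which mirrors the behaviour discussed in the response; only scenario (iii) departs from the baseline, through the entrances term.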
- Table 1: in the description of the error metric it would be better to state also the range of the values that this metric can reach. From 0 (perfect) to N (bad).

Response: In the revised manuscript, the following explanation has been added to the caption: “The MAPE measures errors relative to the true values and can vary from 0 to infinity where 0 represents perfect agreement.”

- Fig.5, very interesting, why not plotting also the new cases time-series?

Response: We are glad that the reviewer found Figure 5 and the corresponding analysis interesting. Following the reviewer's suggestion, we have updated this figure to include a plot with the new cases timeseries.

- Some recent and very related work on the effect on local epidemics of population changes and seeding caused by entrances is missing from the references, see for example: …

Response: We thank the reviewer for calling our attention to these related works. We have incorporated each of them into the Introduction section.

Reviewer 2

The manuscript is very well written. The methods and results are described in clear, unambiguous language. The underlying data is presented in detail, including the mobility data results of the serological study. How mobility is measured and how the data is processed is described clearly, where the authors make use of data-processing practices commonly used in literature (such as the stay detection and home detection). The serological study is properly discussed, including caveats stemming from the two waves of testing. The data used in the study is made available online. The authors have also made software code used in the study available. I have downloaded the data and code and was able to replicate several results and figures used in the manuscript. The SEIR model is set up, trained and evaluated appropriately. The two components of mobility are incorporated in a reasonable manner in the model.
The use of three model scenarios is an appropriate way to test the influence of trips- and entrance-data on the prediction. I was honestly surprised by the extreme lack of predictive power of the model scenario using only trips data (model ii) and was very skeptical at first, but I could not find a flaw in the methodology. I do think that the entrances are such a good predictor as they are probably highly correlated with a host of other interventions that affect transmission rate, but the authors properly address this in the discussion. Altogether, I think the results, conclusions and their interpretations are sound.

Response: We thank the reviewer for both their skepticism and for thoroughly reviewing our methods and using our open source code and data to replicate results.

I would recommend the manuscript for publication as is. I have only one minor remark which I would appreciate being addressed by the authors: It is fair to say that Andorra is a very special case compared to other countries, due to its very small population and relatively large amount of cross-border traffic and temporary visitors. I would assume it has a much more "dynamic" population than most countries, which might mean that the results are not very transferable to other countries, where cross-border traffic and temporary visitors are of less importance. I think it would be appropriate to address this in the discussion. It would also be interesting to include comparative values for Andorra and other countries regarding how dynamic their populations are, if available (such as the fraction of cross border traffic among all traffic, the number of tourists relative to population, number of temporary workers, etc).

Response: We agree with the reviewer that our modeling approach, and hence results, are particularly well suited to a country like Andorra, where there is relatively high cross-border traffic compared to the stable population.
We thank the reviewer for suggesting that we note this in the discussion and have added a paragraph describing this.

Submitted filename: Response to Reviewers.pdf

18 Feb 2022

Using mobile phone data to estimate dynamic population changes and improve the understanding of a pandemic: A case study in Andorra

PONE-D-21-35256R1

Dear Dr. Berke,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Michele Tizzoni
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1.
If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous.
Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

**********

6. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: All my comments have been addressed, I have no further comments and I find the manuscript suitable for publication
Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

24 Feb 2022

PONE-D-21-35256R1

Using mobile phone data to estimate dynamic population changes and improve the understanding of a pandemic: A case study in Andorra

Dear Dr. Berke:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.
Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Michele Tizzoni
Academic Editor
PLOS ONE

33 in total

1. Determining the value of diagnostic and screening tests.

Authors: B J McNeil; S J Adelstein
Journal: J Nucl Med Date: 1976-06 Impact factor: 10.057

2. Multiscale mobility networks and the spatial spreading of infectious diseases.

Authors: Duygu Balcan; Vittoria Colizza; Bruno Gonçalves; Hao Hu; José J Ramasco; Alessandro Vespignani
Journal: Proc Natl Acad Sci U S A Date: 2009-12-14 Impact factor: 11.205

3. Predictive performance of international COVID-19 mortality forecasting models.

Authors: Joseph Friedman; Patrick Liu; Christopher E Troeger; Austin Carter; Robert C Reiner; Ryan M Barber; James Collins; Stephen S Lim; David M Pigott; Theo Vos; Simon I Hay; Christopher J L Murray; Emmanuela Gakidou
Journal: Nat Commun Date: 2021-05-10 Impact factor: 14.919

4. Temporal Changes in Ebola Transmission in Sierra Leone and Implications for Control Requirements: a Real-time Modelling Study.

Authors: Anton Camacho; Adam Kucharski; Yvonne Aki-Sawyerr; Mark A White; Stefan Flasche; Marc Baguelin; Timothy Pollington; Julia R Carney; Rebecca Glover; Elizabeth Smout; Amanda Tiffany; W John Edmunds; Sebastian Funk
Journal: PLoS Curr Date: 2015-02-10