SARS-CoV-2 introductions and early dynamics of the epidemic in Portugal.

Vítor Borges¹, Joana Isidro¹, Nídia Sequeira Trovão², Sílvia Duarte³, Helena Cortes-Martins⁴, Hugo Martiniano³, Isabel Gordo⁵, Ricardo Leite⁵, Luís Vieira³,⁶, Raquel Guiomar⁷, João Paulo Gomes¹.

Abstract

Background: Genomic surveillance of SARS-CoV-2 in Portugal was rapidly implemented by the National Institute of Health in the early stages of the COVID-19 epidemic, in collaboration with more than 50 laboratories distributed nationwide.
Methods: By applying recent phylodynamic models that allow integration of individual-based travel history, we reconstructed and characterized the spatio-temporal dynamics of SARS-CoV-2 introductions and early dissemination in Portugal.
Results: We detected at least 277 independent SARS-CoV-2 introductions, mostly from European countries (namely the United Kingdom, Spain, France, Italy, and Switzerland), which were consistent with the countries with the highest connectivity with Portugal. Although most introductions were estimated to have occurred during early March 2020, it is likely that SARS-CoV-2 was silently circulating in Portugal throughout February, before the first cases were confirmed.
Conclusions: Here we conclude that the earlier implementation of measures could have minimized the number of introductions and subsequent virus expansion in Portugal. This study lays the foundation for genomic epidemiology of SARS-CoV-2 in Portugal, and highlights the need for systematic and geographically-representative genomic surveillance.

Keywords: SARS-CoV-2; Viral infection

Year: 2022 PMID: 35603268 PMCID: PMC9053228 DOI: 10.1038/s43856-022-00072-0

Source DB: PubMed Journal: Commun Med (Lond) ISSN: 2730-664X

Introduction

SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2), the causative agent of COVID-19, is a novel betacoronavirus that was first reported in December 2019 in Wuhan, China[1,2]. By 29 March 2021, it had already caused more than 126 million cases and 2.7 million deaths worldwide[3,4]. To control the arrival and spread of the virus, many countries adopted strict public health measures, including complete border closures and general lockdowns, with tremendous economic and social consequences. At the early stages of an epidemic, the success of public health measures is particularly dependent on their timely implementation, which requires comprehensive diagnosis/surveillance systems that are able to efficiently trace where the virus is being introduced and circulating[5-7]. Taking advantage of the extraordinary advances in sequencing technologies, modern surveillance systems increasingly rely on genomic epidemiology as a crucial tool for outbreak investigation and for tracking virus evolution and spread[7-9]. Genomic surveillance of SARS-CoV-2 can be particularly useful to: (i) understand the contribution of “new introductions” versus “local transmission” to the number of new cases at continent/country/regional levels; (ii) evaluate the impact of non-pharmaceutical interventions on the outcomes of transmission chains; (iii) characterize the genetic variability that may negatively affect molecular diagnostic tests; (iv) monitor genetic variability affecting antigens and targets of antiviral drugs with potential impact on the development/effectiveness of prophylactic (vaccines) and therapeutic measures; and (v) investigate potential associations between genetic variants and infectious load, patient immunological status, and clinical outcomes (e.g., infection duration and disease severity)[5,10].
Acting as the National Reference Laboratory for SARS-CoV-2, the Portuguese National Institute of Health (INSA) Doutor Ricardo Jorge rapidly established a genome-based molecular surveillance strategy for SARS-CoV-2 in Portugal, setting up a large nationwide network involving more than 50 laboratories. A bilingual website (https://insaflu.insa.pt/covid19) was launched, providing updated data regarding the analysis of the SARS-CoV-2 genetic diversity and geotemporal dynamics, based on state-of-the-art methodologies for real-time tracking of pathogen evolution[11,12]. In addition, “situation reports” with major highlights are released periodically to participating laboratories, national and regional public health authorities, and other stakeholders. Despite all the advantages of genomic surveillance, the uneven geographic sampling of viral genomes can severely skew phylogeographic inferences based on discrete trait ancestral reconstruction[13], thereby hindering the ability to accurately trace the seeding and dissemination patterns of SARS-CoV-2. The COVID-19 pandemic has been characterized by an unprecedented amount of genomic data and associated metadata, such as information on the patients’ recent movements before symptom onset. In the present study, we reconstruct and characterize the spatio-temporal dynamics of SARS-CoV-2 introductions and early dissemination in Portugal using newly developed phylodynamic models that allow integration of individual-based travel history, in order to obtain a more realistic reconstruction of the viral dynamics[13]. This includes inferences of the timelines of the first introductions, the geographic location of ancestral lineages, and the contribution of detected introductions to the evolution of the epidemic.

Methods

Sample characterization

Samples used in this study were collected as part of the ongoing national SARS-CoV-2 laboratory surveillance conducted by INSA, Portugal, in collaboration with Instituto Gulbenkian de Ciência (IGC). SARS-CoV-2 positive samples (either clinical specimens or RNA) were provided by a nationwide network, consisting of more than 50 laboratories, that was established at the beginning of the epidemic in Portugal. Anonymized date of sample collection, date of illness onset, and travel history were provided by laboratories and Regional and National Health Authorities. Geographical data presented in this study refers to the Region (“Health Administration region”) of the patients’ residence or, when no information was available, to the Region of exposure or of the hospital/laboratory that collected/sent the sample.

SARS-CoV-2 genome sequencing and assembly

SARS-CoV-2 positive RNA samples were subjected to genome sequencing using a whole-genome amplification strategy with tiled, multiplexed primers[14] and the ARTIC Consortium protocol (https://artic.network/ncov-2019; https://www.protocols.io/view/ncov-2019-sequencing-protocol-bbmuik6w), with slight modifications, as previously described[15]. Analysis of sequence read data was conducted using the bioinformatics pipeline implemented in INSaFLU (https://insaflu.insa.pt/; https://github.com/INSaFLU), which is a web-based (and also locally installable) platform for amplicon-based next-generation sequencing data analysis[16]. Sequence inspection and validation was performed as previously described[15].

Classification by clades and lineages

We explored the diversity of INSA sequences using a variety of nomenclature strategies, namely Nextstrain (using https://clades.nextstrain.org/; 9 November 2020), GISAID (https://www.gisaid.org/; 23 July 2020) and Phylogenetic Assignment of Named Global Outbreak LINeages (cov-lineages.org) (https://pangolin.cog-uk.io/; 16 October 2020)[17]. While Nextstrain and GISAID clade nomenclatures provide a less detailed categorisation of globally circulating diversity, cov-lineages.org classification is focused on identifying highly specific lineages that are actively transmitting in the population. Classification is provided in Supplementary material (Supplementary Data 1).

Assessment of genome sequencing by country

To assess the contribution of each country to the set of publicly available SARS-CoV-2 genomes, and to determine the ratio of sequenced genomes to the total number of reported COVID-19 cases (genome sampling) in each country during the study period (until 31 March 2020), we obtained the number of cases per country from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv) and the number of genomes from GISAID (as of 8 August 2020). Only genomes with a collection date up to 31 March 2020 were considered. When a genome lacked the day of collection and specified only the month, it was assigned to the last day of that month. When the number of genomes on a given day exceeded the number of cases, the number of cases was used for graphical representation. Final data on the assessment of genome sampling by country are provided in Supplementary material (Supplementary Data 2).
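The date and capping rules above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the `YYYY-MM-DD` / `YYYY-MM` date strings, and the choice to compare cumulative genome counts against cumulative case counts are all assumptions made for illustration.

```python
import calendar
from datetime import date

STUDY_END = date(2020, 3, 31)  # end of the study period stated in the text


def parse_collection_date(text):
    """Parse 'YYYY-MM-DD' or 'YYYY-MM' (hypothetical input format).

    Month-only dates are assigned to the last day of that month.
    """
    parts = [int(p) for p in text.split("-")]
    if len(parts) == 3:
        return date(*parts)
    if len(parts) == 2:
        year, month = parts
        return date(year, month, calendar.monthrange(year, month)[1])
    raise ValueError(f"date lacks month information: {text!r}")


def genome_sampling(collection_dates, cumulative_cases):
    """Return {day: (genomes_plotted, proportion)} for days up to STUDY_END.

    cumulative_cases maps a date to the cumulative number of confirmed
    cases on that day (assumption: cumulative counts on both sides).
    """
    parsed = [parse_collection_date(d) for d in collection_dates]
    parsed = [d for d in parsed if d <= STUDY_END]
    result = {}
    for day, cases in sorted(cumulative_cases.items()):
        if day > STUDY_END:
            continue
        genomes = sum(1 for d in parsed if d <= day)
        plotted = min(genomes, cases)  # cap genomes at the number of cases
        proportion = plotted / cases if cases else 0.0
        result[day] = (plotted, proportion)
    return result
```

With this sketch, a genome dated only `2020-03` counts towards 31 March 2020, and a genome dated 1 April 2020 is ignored.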

Selecting a genomic background dataset

For the phylogenetic analyses, we downloaded full-length viral genome sequences with collection dates before 1 April 2020 from GISAID (https://www.gisaid.org/) on 6 August 2020 (Supplementary Data 2). For computational efficiency in the downstream operations, we analysed the A and B lineages separately. Multiple sequence alignments with a reference genome (MN908947.3) were performed using MAFFT v7.458 with the parameter --addfragments[18]. Sequences with fewer than 75% unambiguous bases were excluded, as were duplicate sequences, defined as having identical nucleotide composition and being collected on the same date and in the same country. The resulting dataset was trimmed at the 5′ and 3′ ends, yielding a multiple sequence alignment of 29,780 nucleotides. Sequences with date information only at the year level were also excluded. This dataset was subjected to multiple iterations of phylogeny reconstruction using IQ-TREE multicore software version 1.6.12[19] with parameters -m GTR+G, and exclusion of outlier sequences whose genetic divergence was incongruent with sampling date using TempEst software version 1.5.2[20], resulting in 1632 and 22,124 sequences for the A and B datasets, respectively. The GISAID acknowledgment table for the background dataset is provided as Supplementary Material (Supplementary Data 3).
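The two exclusion criteria (fewer than 75% unambiguous bases, and duplicates sharing sequence, date, and country) could be applied along the lines of the following Python sketch. The record layout, the function names, the use of sequence length as the denominator, and the reading of "identical nucleotide composition" as an identical sequence string are assumptions for illustration, not details taken from the study.

```python
UNAMBIGUOUS = set("ACGT")


def unambiguous_fraction(seq):
    """Fraction of positions that are A, C, G or T.

    Assumption: gaps and IUPAC ambiguity codes (N, R, Y, ...) all count
    as ambiguous, and the denominator is the sequence length.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in UNAMBIGUOUS for base in seq) / len(seq)


def filter_records(records, min_fraction=0.75):
    """Drop low-coverage sequences and same-date, same-country duplicates.

    records: iterable of dicts with hypothetical keys
    'name', 'seq', 'date' and 'country'. The first record of each
    duplicate group is kept.
    """
    seen = set()
    kept = []
    for rec in records:
        if unambiguous_fraction(rec["seq"]) < min_fraction:
            continue  # fewer than 75% unambiguous bases
        key = (rec["seq"].upper(), rec["date"], rec["country"])
        if key in seen:
            continue  # identical sequence, date and country
        seen.add(key)
        kept.append(rec)
    return kept
```

Identical sequences collected in different countries or on different dates are retained, since they may still be informative for between-country diffusion.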

Subsampling strategy

The magnitude of the lineage B datasets prohibits a full Bayesian inference approach in a reasonable timeframe. To overcome this constraint, we used a subsampling strategy that removes sequences such that monophyletic clusters that consist entirely of sequences from a particular country are represented by a single sequence. The excess sequences in a country-specific monophyletic clade do not contribute any additional information to the between-country diffusion process we aim to infer[21]. This process resulted in a dataset with 13,489 sequences (B_CS). Despite the almost 40% downsampling in the B lineage dataset, its size is still excessive for timely computational inferences. To further address this issue, we have built a phylogeny using IQ-TREE, as described previously, and partitioned the tree into six monophyletic clades (B_CS1 through B_CS6). These clades were examined for outlier sequences whose genetic divergence and sampling date were incongruent using TempEst software version 1.5.2[20]. All data sets exhibited a positive correlation between genetic divergence and sampling time and appear to be suitable for phylogenetic molecular clock analysis[20] (Supplementary Fig. 1).
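The country-level subsampling idea can be sketched in Python as follows. This is an illustrative toy, not the procedure used in the study: it assumes a rooted tree encoded as nested lists with leaf names as strings, a hypothetical `country_of` lookup, and an arbitrary choice of representative (the first leaf encountered in each clade).

```python
def collapse_country_clades(node, country_of):
    """Collapse every maximal single-country clade to one representative leaf.

    node: a leaf name (str) or a list of child nodes (rooted tree).
    country_of: dict mapping leaf name -> country (hypothetical metadata).
    Returns (pruned_node, set_of_countries_below_node).
    """
    if isinstance(node, str):
        return node, {country_of[node]}
    pruned_children, countries = [], set()
    for child in node:
        pruned, child_countries = collapse_country_clades(child, country_of)
        pruned_children.append(pruned)
        countries |= child_countries
    if len(countries) == 1:
        # Every child was already reduced to a single leaf, so keeping the
        # first one leaves exactly one sequence for this clade.
        return pruned_children[0], countries
    return pruned_children, countries


def leaves(node):
    """List the leaf names of a (possibly pruned) tree."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]
```

For a toy tree `[[["PT1", "PT2"], "PT3"], ["ES1", ["UK1", "UK2"]]]`, the Portuguese clade and the UK pair would each be reduced to one sequence, while the mixed Spain/UK clade keeps its structure.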

Bayesian evolutionary inference of SARS-CoV-2 detected in Portugal

A total of 1275 SARS-CoV-2 genome sequences (obtained from positive samples collected until 31 March 2020) from Portugal were analysed in this study (INSA’s collection, as of 23 July 2020; Supplementary Data 1). Our interest lies in estimating the viral evolutionary history and spatial diffusion process during the early epidemics in the country. Travel history data is of particular importance when analyzing low diversity data, such as that for SARS-CoV-2, using Bayesian joint inference of sequence and location traits because sharing the same location state can contribute to the phylogenetic clustering of taxa[13]. For each of the datasets (A, and B_CS1 through B_CS6), we performed a joint genealogical and phylogeographic inference of time-measured trees using Markov chain Monte Carlo (MCMC) sampling implemented in the Bayesian Evolutionary Analysis Sampling Trees (BEAST) package[22]. We applied a Hasegawa-Kishino-Yano 85 (HKY85) [23] substitution model with gamma-distributed rate variation among sites[24]. We used an uncorrelated lognormal relaxed molecular clock to account for evolutionary rate variation among lineages[25] and specified an exponential growth coalescent prior in our analyses. To integrate the travel history information obtained from (returning) travelers, we followed Lemey et al.[13] and augmented the phylogeny with ancestral nodes that are associated with a location state (but not with a known sequence), and enforced the ancestral location at a point in the past of a lineage. We specified normal prior distributions on the travel times informed by an estimate of time of infection and truncated to be positive (back-in-time) relative to sampling date. 
Specifically, we used a period of 14 days (the incubation period of 99% of patients[26], where travel history information was collected for all recent movements), and a period between symptom onset and testing with an estimated mean of 4.70 days for the patients in the INSA cohort (estimated from data available for 717/1275 individuals), and a standard deviation of 4.06 days to incorporate the uncertainty on the period between symptom onset and testing. The location traits associated with taxa and with the ancestral nodes were modeled using a bidirectional asymmetric discrete diffusion process[27]. We ran and combined at least eight independent MCMC analyses for 50 million generations, sampling every 50,000th generation, and removed 10% as chain burn-in. Stationarity and mixing were investigated using Tracer software version 1.7.1[28], making sure that effective sample sizes for the continuous parameters were greater than 100. We used the high-performance computational capabilities of the Biowulf cluster at the National Institutes of Health (Bethesda, MD, USA) (http://biowulf.nih.gov) to perform these analyses. Portuguese clusters were assumed for phylogeographic summaries if their topology posterior probability was ≥0.001. If the excluded genomes had known travel history, they were re-integrated along with same-cluster sequences if those did not cluster in a clade-defining polytomy (recovered a total of 14 sequences, 7 of them with travel history). The location of the most recent common ancestor (MRCA) of Portuguese clusters is usually inferred as Portugal; we therefore compared the estimated locations and times for the parent nodes of the MRCA (for simplicity, hereafter referred to as PMRCA) across Portuguese BEAST clades, representing the origin and timing of seeding events into Portugal.
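As a worked illustration of the MCMC settings above, the following Python sketch counts the posterior samples that remain after burn-in. The function name is hypothetical, and the count assumes one logged sample per sampling interval, ignoring the initial state that some loggers also record.

```python
def retained_samples(chain_length, sample_every, burnin_percent, n_chains):
    """Count posterior samples left after discarding burn-in from each chain.

    Assumption: each chain logs chain_length // sample_every samples
    (the state at generation 0 is not counted).
    """
    per_chain = chain_length // sample_every
    burnin = per_chain * burnin_percent // 100
    kept_per_chain = per_chain - burnin
    return kept_per_chain, kept_per_chain * n_chains


# Settings stated in the text: 50 million generations, sampling every
# 50,000th generation, 10% burn-in, and a minimum of eight chains.
kept_per_chain, kept_total = retained_samples(50_000_000, 50_000, 10, 8)
print(kept_per_chain, kept_total)  # 900 7200
```

Under these assumptions, each chain contributes 900 samples, so eight combined chains give 7200 samples against which the effective sample size threshold of 100 is checked.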

Real-time data sharing of SARS-CoV-2 genetic diversity and geotemporal spread in Portugal

A website (https://insaflu.insa.pt/covid19) was launched on 28 March 2020 for real-time data sharing on SARS-CoV-2 genetic diversity and geotemporal spread in Portugal. This site gives access to “situation reports” of the study and provides interactive data navigation using both Nextstrain (https://nextstrain.org/)[11] and Microreact (https://microreact.org/)[12] tools. As of 23 July 2020, genomic and phylogenetic analyses were performed using the SARS-CoV-2 Nextstrain pipeline version from 23 March 2020 (https://github.com/nextstrain/ncov), with slight modifications[15]. For data navigation, an IQ-TREE-derived[19] phylogenetic tree comprising the 1275 studied sequences, together with the associated metadata, can be visualized interactively at https://microreact.org/project/cM6KURnU7rUpqdAnBq5DAf/a2d3840e.

Ethical approval

Samples were obtained in the frame of the ongoing national SARS-CoV-2 genomic surveillance coordinated by the Portuguese National Institute of Health (INSA), being collected as part of the routine clinical care and laboratory procedures of the laboratories/hospitals (“Portuguese network for SARS-CoV-2 genomics”) collaborating in this system. This study was approved by the Ethical Committee (“Comissão de Ética para a Saúde”) of INSA, dismissing the need for individuals’ informed consent. Designations of all genome sequences are fully anonymized, and no identifying information of the associated patients is provided. Anonymized date of sample collection, date of illness onset and travel history were provided by laboratories and Regional and National Health Authorities.

29 in total

1. Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples.

Authors: Joshua Quick; Nathan D Grubaugh; Steven T Pullan; Ingra M Claro; Andrew D Smith; Karthik Gangavarapu; Glenn Oliveira; Refugio Robles-Sikisaka; Thomas F Rogers; Nathan A Beutler; Dennis R Burton; Lia Laura Lewis-Ximenez; Jaqueline Goes de Jesus; Marta Giovanetti; Sarah C Hill; Allison Black; Trevor Bedford; Miles W Carroll; Marcio Nunes; Luiz Carlos Alcantara; Ester C Sabino; Sally A Baylis; Nuno R Faria; Matthew Loose; Jared T Simpson; Oliver G Pybus; Kristian G Andersen; Nicholas J Loman
Journal: Nat Protoc Date: 2017-05-24 Impact factor: 13.491

2. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA.

Authors: M Hasegawa; H Kishino; T Yano
Journal: J Mol Evol Date: 1985 Impact factor: 2.395

3. Ancient hybridization and an Irish origin for the modern polar bear matriline.

Authors: Ceiridwen J Edwards; Marc A Suchard; Philippe Lemey; John J Welch; Ian Barnes; Tara L Fulton; Ross Barnett; Tamsin C O'Connell; Peter Coxon; Nigel Monaghan; Cristina E Valdiosera; Eline D Lorenzen; Eske Willerslev; Gennady F Baryshnikov; Andrew Rambaut; Mark G Thomas; Daniel G Bradley; Beth Shapiro
Journal: Curr Biol Date: 2011-07-07 Impact factor: 10.834

4. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies.

Authors: Lam-Tung Nguyen; Heiko A Schmidt; Arndt von Haeseler; Bui Quang Minh
Journal: Mol Biol Evol Date: 2014-11-03 Impact factor: 16.240

5. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10.

Authors: Marc A Suchard; Philippe Lemey; Guy Baele; Daniel L Ayres; Alexei J Drummond; Andrew Rambaut
Journal: Virus Evol Date: 2018-06-08

6. Nextstrain: real-time tracking of pathogen evolution.

Authors: James Hadfield; Colin Megill; Sidney M Bell; John Huddleston; Barney Potter; Charlton Callender; Pavel Sagulenko; Trevor Bedford; Richard A Neher
Journal: Bioinformatics Date: 2018-12-01 Impact factor: 6.931

7. Coast-to-Coast Spread of SARS-CoV-2 during the Early Epidemic in the United States.

Authors: Joseph R Fauver; Mary E Petrone; Emma B Hodcroft; Kayoko Shioda; Hanna Y Ehrlich; Alexander G Watts; Chantal B F Vogels; Anderson F Brito; Tara Alpert; Anthony Muyombwe; Jafar Razeq; Randy Downing; Nagarjuna R Cheemarla; Anne L Wyllie; Chaney C Kalinich; Isabel M Ott; Joshua Quick; Nicholas J Loman; Karla M Neugebauer; Alexander L Greninger; Keith R Jerome; Pavitra Roychoudhury; Hong Xie; Lasata Shrestha; Meei-Li Huang; Virginia E Pitzer; Akiko Iwasaki; Saad B Omer; Kamran Khan; Isaac I Bogoch; Richard A Martinello; Ellen F Foxman; Marie L Landry; Richard A Neher; Albert I Ko; Nathan D Grubaugh
Journal: Cell Date: 2020-05-07 Impact factor: 41.582

8. In Search of Covariates of HIV-1 Subtype B Spread in the United States-A Cautionary Tale of Large-Scale Bayesian Phylogeography.

Authors: Samuel L Hong; Simon Dellicour; Bram Vrancken; Marc A Suchard; Michael T Pyne; David R Hillyard; Philippe Lemey; Guy Baele
Journal: Viruses Date: 2020-02-05 Impact factor: 5.048

9. Establishment and lineage dynamics of the SARS-CoV-2 epidemic in the UK.

Authors: Louis du Plessis; John T McCrone; Alexander E Zarebski; Verity Hill; Christopher Ruis; Moritz U G Kraemer; Andrew Rambaut; Oliver G Pybus; Bernardo Gutierrez; Jayna Raghwani; Jordan Ashworth; Rachel Colquhoun; Thomas R Connor; Nuno R Faria; Ben Jackson; Nicholas J Loman; Áine O'Toole; Samuel M Nicholls; Kris V Parag; Emily Scher; Tetyana I Vasylyeva; Erik M Volz; Alexander Watts; Isaac I Bogoch; Kamran Khan; David M Aanensen
Journal: Science Date: 2021-01-08 Impact factor: 47.728

10. Geographical and temporal distribution of SARS-CoV-2 clades in the WHO European Region, January to June 2020.

Authors: Erik Alm; Eeva K Broberg; Thomas Connor; Emma B Hodcroft; Andrey B Komissarov; Sebastian Maurer-Stroh; Angeliki Melidou; Richard A Neher; Áine O'Toole; Dmitriy Pereyaslov
Journal: Euro Surveill Date: 2020-08
