The phylodynamics of SARS-CoV-2 during 2020 in Finland.

Phuoc Truong Nguyen¹, Ravi Kant^1,2, Frederik Van den Broeck^3,4, Maija T Suvanto^1,2, Hussein Alburkat¹, Jenni Virtanen^1,2, Ella Ahvenainen¹, Robert Castren¹, Samuel L Hong³, Guy Baele³, Maarit J Ahava⁵, Hanna Jarva^5,6,7, Suvi Tuulia Jokiranta^6,7, Hannimari Kallio-Kokko⁵, Eliisa Kekäläinen^5,6, Vesa Kirjavainen⁵, Elisa Kortela⁸, Satu Kurkela⁵, Maija Lappalainen⁵, Hanna Liimatainen⁵, Marc A Suchard⁹, Sari Hannula¹⁰, Pekka Ellonen¹⁰, Tarja Sironen^1,2, Philippe Lemey³, Olli Vapalahti^1,2,5, Teemu Smura^1,5.

Abstract

Background: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused millions of infections and fatalities globally since its emergence in late 2019. The virus was first detected in Finland in January 2020, after which it rapidly spread among the populace in spring. However, compared to other European nations, Finland has had a low incidence of SARS-CoV-2. To gain insight into the origins and turnover of SARS-CoV-2 lineages circulating in Finland in 2020, we investigated the phylogeographic and phylodynamic history of the virus.
Methods: The origins of SARS-CoV-2 introductions were inferred via travel-aware Bayesian time-measured phylogeographic analyses. Sequences for the analyses included virus genomes belonging to the B.1 lineage and carrying the D614G mutation from countries of likely origin, which were determined using Google mobility data. We collected all available sequences from the spring and fall peaks to study lineage dynamics.
Results: We observed rapid turnover among Finnish lineages during this period. Clade 20C became the most prevalent among sequenced cases and was replaced by other strains in fall 2020. Bayesian phylogeographic reconstructions suggested 42 independent introductions into Finland during spring 2020, mainly from Italy, Austria, and Spain.
Conclusions: A single introduction from Spain might have seeded one-third of cases in Finland during spring in 2020. The investigations of the original introductions of SARS-CoV-2 to Finland during the early stages of the pandemic and of the subsequent lineage dynamics could be utilized to assess the role of transboundary movements and the effects of early intervention and public health measures.

Keywords: SARS-CoV-2; Viral epidemiology

Year: 2022 PMID: 35698660 PMCID: PMC9187640 DOI: 10.1038/s43856-022-00130-7

Source DB: PubMed Journal: Commun Med (Lond) ISSN: 2730-664X

Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a betacoronavirus (genus Betacoronavirus), causes coronavirus disease 2019 (COVID-19), a respiratory infection that in severe cases leads to respiratory failure and multiorgan manifestations in humans, and is responsible for the current socially and economically devastating pandemic. The virus has infected more than 184 million people in 221 countries and has caused over 3.9 million deaths as of July 5, 2021[1]. The virus is similar to other betacoronaviruses in having a relatively high evolutionary rate (~9.8 × 10⁻⁴ substitutions per site per year)[2], which has led to the emergence of multiple viral lineages circulating the globe. Viral lineages may become more common in a given host population due to selective advantages or by chance, e.g., due to the founder effect or genetic drift. Although there is currently a plethora of viral lineages, only a small proportion of these are classified as variants of concern (VOCs), i.e., are considered to have enhanced transmissibility, pathogenicity, evasion of immune responses, or resistance to vaccines[3]. Currently, this category contains only the lineages B.1.617.2 (Delta)[4], first detected in India[5], and B.1.1.529 (Omicron), discovered in Botswana[6], but it has previously included the lineages B.1.1.7 (Alpha), first detected in the United Kingdom (UK)[7], B.1.351 (Beta), first detected in South Africa[8], and P.1 (Gamma), first detected in Brazil[9].

The first Finnish SARS-CoV-2 case was detected on January 29, 2020, in a tourist from Wuhan, China[10] (Fig. 1); however, this infection did not lead to onward transmission. The first epidemic wave in Finland began in week 9 (end of February 2020), peaked during week 14 (beginning of April), and ended by week 24 (early June). The incidence was low during the following summer (from mid-June to July). The second epidemic wave began in week 32 (beginning of August 2020) and lasted approximately until week 45 (early November). For additional information about the introduction and spread of SARS-CoV-2 in Finland during 2020, see Supplementary Note 1.

Fig. 1

Weekly SARS-CoV-2 statistics and general timeline of Finland in 2020.

The number of PCR tests (total n = 967,885) and positive findings (total n = 21,731) based on the COVID-19 infectious diseases registry of the Finnish Institute for Health and Welfare (THL) are shown in panel (a). The color of each line matches its axis, i.e., the axis indicating the number of tests is on the right and the number of positive cases on the left. The number of SARS-CoV-2 sequences submitted to GISAID (total n = 1,597) is displayed in panel (b). Panel (c) depicts the general timeline of the arrival of SARS-CoV-2 in Finland and the subsequent responses by the Finnish government and health authorities, which are indicated by letters A-J in panels (a) and (b). Exact dates for each response are given in brackets. This information is based on public records by THL. HUS = Hospital District of Helsinki and Uusimaa.

To gain insight into the geographic source and relative contribution of viral introductions that seeded the first epidemic wave in Finland, and to study phylodynamic aspects of the viruses circulating during 2020, such as genetic diversity and lineage turnover, we sequenced 1,597 SARS-CoV-2 genomes from Finland and analyzed these using travel-aware Bayesian phylogeographic approaches including 1,643 genomes from 17 European countries.

Methods

Sequencing and analyses of Finnish SARS-CoV-2 genomes

Research data for this report consists of SARS-CoV-2 genomes (n = 1,597) that were sequenced from SARS-CoV-2 PCR positive patient samples diagnosed in HUS Diagnostic Center, HUSLAB, University of Helsinki and Helsinki University Hospital (Fig. 2). This study was approved by the Research Administration of Helsinki University Hospital (HUS/32/2018 and HUS/157/2020) and no identifiable patient data were described in this study. As this was a retrospective registry study with no patient intervention, ethics committee approval and informed consent were not required by Finnish national legislation in accordance with the Medical Research Act of Finland 488/1999. RNA was reverse-transcribed to cDNA with the LunaScript RT SuperMix kit (New England Biolabs). Primer pools[11] targeting SARS-CoV-2 were designed using the PrimalScheme tool[12] (Supplementary Data 1) and PCR was conducted with PhusionFlash PCR master mix (Thermo Scientific). Sequencing libraries were prepared with NEBNext ultra II FS DNA library kit (New England Biolabs) according to the manufacturer’s instructions and sequenced with Illumina NovaSeq and MiSeq. Due to HUSLAB initially being the only clinical laboratory sequencing patient samples, some of the virus sequences originate from outside the HUS area e.g., from testing points on the border. The collection period was from spring to fall 2020 and the sampling was random. However, the data might be biased for the most severe cases of SARS-CoV-2, and there was no contact tracing for our data at that time. Consensus sequence data for Finnish SARS-CoV-2 was computed and classified either with the HAVoC pipeline[13] (which utilizes fastp[14] for quality filtering, BWA-MEM[15] for assembly, LoFreq[16] for variant calling and SAMtools[17] for consensus calling) or a modified pipeline consisting of Jovian[18] and pangolin[19]. Sequences were then submitted to the GISAID database. Clade and lineage assignment was done using Nextclade[20] and pangolin.

Fig. 2

Flowchart of sequence data acquisition and sampling and/or selection for spring and fall analyses.

Available Finnish SARS-CoV-2 sequences (Supplementary Data 2) were divided into spring and fall datasets based on the peaks in COVID-19 cases during 2020.

For the phylogenetic analysis of Finnish fall sequences, the local sequence data from weeks 32–38 (n = 77) were combined with a global reference dataset of SARS-CoV-2 genomes (n = 745) selected from sequences from other countries (n = 20,720) from the same time period. These were obtained from the GISAID database (Supplementary Data 2). Viral sequences from fall 2020 were aligned with MAFFT[21], and the phylogenetic tree was computed with a SARS-CoV-2 version of IQ-TREE (version 2.1.3)[22] with 1,000 bootstrap replicates and the best-fitting substitution model selected by ModelFinder[23]. Finally, the tree was visualized in R with ggtree[24] and ggtreeExtra[25].

Bayesian time-measured phylogeographic analyses

To infer the geographic source(s) of SARS-CoV-2 lineages contributing to the first wave in Finland, we extended our dataset of Finnish genomes with genomes available for other European countries (Fig. 2). A recent phylogeographic analysis demonstrated that SARS-CoV-2 spread in Europe was strongly predicted by Google mobility flows[26]. To inform our sampling, we therefore turned to the Google COVID-19 Aggregated Mobility Research Dataset, which contains anonymized mobility flows aggregated over users who have turned on the Location History setting (on a range of platforms[27]). Aggregated mobility flows between Finland and all other European countries were summarized between January and April 2020, and we selected the following 16 countries, which were responsible for 95% of international travel to and from Finland: Estonia, Latvia, Norway, Hungary, Poland, Turkey, Sweden, the Netherlands, Austria, Denmark, Italy, Germany, Switzerland, Spain, France and the United Kingdom. For these countries, we downloaded the available SARS-CoV-2 genomes from GISAID on April 17, 2020. For six countries (Estonia, Latvia, Norway, Hungary, Poland and Turkey) represented by a relatively small number of genomes, we augmented our dataset with genomes from GISAID with a sampling date up to the end of April 2020. We selected only sequences from the B.1 lineage with the D614G mutation for the analyses. We removed duplicate genomes for each country using SeqKit (version 0.11)[28]. For Finland, we retained duplicate genomes when these were sampled from cases with different travel histories. All genomes were aligned using MAFFT[21] and trimmed at the 5′ and 3′ ends. We then subsampled each country proportionally to the cumulative number of cases on April 17, 2020 by setting an arbitrary threshold of 7.5 genomes per 10,000 cases, with a minimum of 100 sequences per country. For the six countries where the number of unique genomes was below 100, all genomes were included in the analysis.
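The subsampling rule described above can be sketched as a small calculation. This is only an illustration: the function name, the rounding direction (rounding up), and the case counts in the usage lines are assumptions, not details reported in the study. Only the rate of 7.5 genomes per 10,000 cases and the minimum of 100 come from the text.

```python
import math

GENOMES_PER_10K_CASES = 7.5  # threshold stated in the text
MIN_PER_COUNTRY = 100        # minimum stated in the text


def subsample_target(cumulative_cases: int, unique_genomes: int) -> int:
    """Number of genomes to keep for one country (hypothetical helper)."""
    # Proportional quota; rounding up is an assumption of this sketch.
    proportional = math.ceil(GENOMES_PER_10K_CASES * cumulative_cases / 10_000)
    target = max(MIN_PER_COUNTRY, proportional)
    # A country with fewer unique genomes than the target contributes all of them.
    return min(target, unique_genomes)


# Hypothetical case counts, for illustration only.
print(subsample_target(200_000, 1_500))  # quota driven by case count
print(subsample_target(10_000, 500))     # quota driven by the minimum
print(subsample_target(1_000, 4))        # fewer genomes than the minimum
```

With this rule, countries with few cases are raised to the minimum, and countries with fewer unique genomes than the quota are included in full, matching the treatment of the six sparsely sampled countries.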
To maximize the spatial and temporal coverage of the subsampling, we partitioned each country’s genome pool by week and sampled as evenly as possible, selecting sequences from a different region within the country when available. We checked the resulting dataset for potential outliers with a root-to-tip regression using TempEst (version 1.5.3)[29] on a maximum likelihood tree inferred using IQ-TREE (version 2.0.3)[22], and removed 9 genomes. The final dataset, covering spring only, consisted of 1,643 genomes out of an initial 8,513 genomes. The total, unique and downsampled numbers of genomes by country are given in Supplementary Table 1. All genomes were associated with exact sampling dates, except for the four genomes from Estonia that were sampled in March 2020. We performed Bayesian evolutionary reconstruction of timed phylogeographic history using BEAST (version 1.10)[30] incorporating genome sequences, their country and date of sampling, Google mobility data, and individual travel history[31,32]. Uncertainty in the sampling time for the four Estonian genomes was accommodated by sampling uniformly across the reported collection month in the Markov chain Monte Carlo (MCMC) analysis. We modeled sequence evolution using a strict molecular clock model and an HKY nucleotide substitution model[33] with gamma-distributed rate variation among sites[34]. We assumed an exponential growth coalescent model as the tree-generative process prior because we only used viral sequences sampled up to April 17, which means that the large majority of sequences were sampled from a viral population experiencing exponential growth. To demonstrate that our results are not sensitive to the choice of this coalescent prior, we also repeated the BEAST phylogeographic reconstruction with travel history using the Skygrid as a tree prior.
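As an illustration of the week-partitioned subsampling mentioned at the start of this passage, the following minimal sketch draws genomes in round-robin fashion across ISO weeks. The function name, the seed, and the toy data are hypothetical, and the sketch ignores the additional preference for different regions within a country.

```python
import random
from collections import defaultdict
from datetime import date


def sample_evenly_by_week(genomes, target, seed=0):
    """Pick up to `target` genome IDs, spread as evenly as possible over ISO weeks.

    `genomes` is a list of (genome_id, sampling_date) tuples.
    """
    rng = random.Random(seed)
    by_week = defaultdict(list)
    for genome_id, sampled in genomes:
        iso = sampled.isocalendar()
        by_week[(iso[0], iso[1])].append(genome_id)  # key: (ISO year, ISO week)
    # Shuffle within each week so the choice inside a week is random.
    for ids in by_week.values():
        rng.shuffle(ids)
    weeks = sorted(by_week)
    picked = []
    # Round-robin: one genome per week per pass, until the target is reached
    # or every weekly pool is exhausted.
    while len(picked) < target and any(by_week[w] for w in weeks):
        for week in weeks:
            if by_week[week] and len(picked) < target:
                picked.append(by_week[week].pop())
    return picked


# Toy pool: five genomes in one week, one in the next, three in the third.
toy = (
    [(f"a{i}", date(2020, 3, 2)) for i in range(5)]
    + [("b0", date(2020, 3, 10))]
    + [(f"c{i}", date(2020, 3, 17)) for i in range(3)]
)
print(sample_evenly_by_week(toy, 6))
```

A round-robin pass keeps weeks with few genomes represented instead of letting heavily sampled weeks dominate the subsample.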
Our phylogeographic model incorporated the country of sampling as discrete traits associated with the sampled genomes, and following a recent European SARS-CoV-2 phylogeographic analysis[26], we adopted a generalized linear model (GLM) specification to parametrize each rate of among-location movement as a log linear function of the total Google mobility flows (i.e. relative population flow between each pair of geographical areas over a given time interval) for the January-April, 2020 period. Total mobility flows were log-transformed and standardized after adding a pseudocount to each entry in the matrix. The main goal of our GLM extension was to obtain well-informed phylodynamic estimates. To demonstrate that our GLM parameterization is a better option than the standard inference procedure with BSSVS, we have estimated marginal likelihoods using a path sampling (PS) and stepping stone sampling (SS) approach. To make this procedure efficient for the large data set investigated here, we averaged over the same set of empirical trees for both parameterizations. Our results demonstrated that the GLM model (−1956.98 (PS) and −1956.78 (SS) log marginal likelihoods) outperforms the standard model with BSSVS (−2186.41 (PS) and −2186.39 (SS) log marginal likelihoods) by over 200 log marginal likelihood units. As the ancestral reconstruction of locations depends on the availability of samples, over- or undersampling of sequences from a given location can greatly impact the estimated ancestral locations[31]. To mitigate sampling bias and improve the location-transition history reconstructions, we augmented our elementary phylogeographic model by incorporating travel history information obtained from 44 cases that returned to Finland from Austria (n = 20), Italy (n = 13), Spain (n = 7), Estonia (n = 1), Germany (n = 1), Switzerland (n = 1) and United Kingdom (n = 1). 
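The predictor transformation and the model comparison described above can be sketched in a few lines. This is not the BEAST implementation: the pseudocount value, the use of the population standard deviation, the toy flow matrix, and the function names are assumptions, and the sketch omits the indicator variables that BEAST's GLM attaches to each predictor. The log marginal likelihoods are the values reported in the text.

```python
import math


def standardized_log_flows(flows, pseudocount=1.0):
    """Log-transform and standardize off-diagonal flows (pseudocount is assumed)."""
    n = len(flows)
    logged = {
        (i, j): math.log(flows[i][j] + pseudocount)
        for i in range(n)
        for j in range(n)
        if i != j
    }
    mean = sum(logged.values()) / len(logged)
    variance = sum((v - mean) ** 2 for v in logged.values()) / len(logged)
    sd = math.sqrt(variance)
    return {pair: (v - mean) / sd for pair, v in logged.items()}


def log_rates(predictor, coefficient):
    """Log-linear model with one predictor: log(rate_ij) = coefficient * x_ij."""
    return {pair: coefficient * x for pair, x in predictor.items()}


# Hypothetical 3-location flow matrix (rows: origin, columns: destination).
toy_flows = [[0, 10, 100], [20, 0, 5], [300, 1, 0]]
x = standardized_log_flows(toy_flows)
rates = log_rates(x, coefficient=0.5)  # coefficient chosen arbitrarily

# Log marginal likelihoods as reported in the text.
log_ml = {
    "GLM": {"PS": -1956.98, "SS": -1956.78},
    "BSSVS": {"PS": -2186.41, "SS": -2186.39},
}
log_bf = {m: round(log_ml["GLM"][m] - log_ml["BSSVS"][m], 2) for m in ("PS", "SS")}
print(log_bf)  # difference in log marginal likelihood, GLM minus BSSVS
```

The differences come to roughly 229 log units under both estimators, which is the "over 200 log marginal likelihood units" stated in the text.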
We also investigated how unsampled diversity for six European countries or oversampling of Finnish SARS-CoV-2 diversity may impact our phylogeographic reconstructions. Building on our extended phylogeographic model including sampling locations and individual travel histories, we incorporated unsampled taxa for the undersampled countries Estonia (n = 96 taxa added), Latvia (n = 83), Norway (n = 56), Hungary (n = 54), Poland (n = 46) and Turkey (n = 41) to arrive at a minimum of 100 genomes for all countries. Unsampled taxa without observed sequence data were added with associated location and sampling times, for which we randomly sampled dates from case count distributions per country. For this analysis, we also downsampled the Finnish genome dataset to 100 taxa, while ensuring we incorporated the 44 samples with known travel histories. We performed inference under the full model specification using MCMC sampling while employing the BEAGLE library (version 3)[35] to increase computational performance. Because MCMC burn-in takes considerable computational time due to the size of our dataset, with the tree topology representing the most challenging parameter for convergence, we started our analyses with a standard BEAST model considering only sequence evolution (strict molecular clock model, HKY nucleotide substitution model, and exponential growth tree prior). The resulting phylogenetic tree was subsequently used as a starting tree in our phylogeographic analyses. Multiple independent MCMC runs were performed to ensure that their combined posterior samples achieved effective sample sizes (ESSs) larger than 100 for all continuous parameters. Transition histories were summarized using the tree sample tool, TreeMarkovJumpHistoryAnalyzer, implemented in BEAST to collect Markov jumps[36] and their timings from a posterior tree distribution annotated with Markov jump histories[26].
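The padding with unsampled taxa described above can be illustrated with a short sketch. It computes how many placeholder taxa a country needs to reach the minimum of 100 and draws their sampling dates with probability proportional to reported cases. The function names, the seed, and the daily case curve are hypothetical. The only figure taken from the text is the Estonian example: four sequenced genomes and 96 added taxa.

```python
import random
from datetime import date, timedelta

MIN_TAXA = 100  # minimum number of taxa per country stated in the text


def n_unsampled(n_genomes: int, minimum: int = MIN_TAXA) -> int:
    """Number of placeholder taxa needed to reach the minimum."""
    return max(0, minimum - n_genomes)


def draw_sampling_dates(daily_cases, n, seed=0):
    """Draw n dates with probability proportional to the reported cases per day."""
    rng = random.Random(seed)
    days = list(daily_cases)
    weights = [daily_cases[day] for day in days]
    return rng.choices(days, weights=weights, k=n)


# Estonia contributed four genomes, so 96 unsampled taxa are needed.
to_add = n_unsampled(4)

# Hypothetical daily case counts, for illustration only.
start = date(2020, 3, 1)
toy_curve = {start + timedelta(days=i): c for i, c in enumerate([0, 5, 20, 0, 10])}
placeholder_dates = draw_sampling_dates(toy_curve, to_add)
print(to_add, len(placeholder_dates))
```

Weighting by case counts places the placeholder taxa at times when infections were actually being reported, so days with no cases receive none.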
