
Whole genome and phylogenetic analysis of two SARS-CoV-2 strains isolated in Italy in January and February 2020: additional clues on multiple introductions and further circulation in Europe.

Paola Stefanelli, Giovanni Faggioni, Alessandra Lo Presti, Stefano Fiore, Antonella Marchi, Eleonora Benedetti, Concetta Fabiani, Anna Anselmo, Andrea Ciammaruconi, Antonella Fortunato, Riccardo De Santis, Silvia Fillo, Maria Rosaria Capobianchi, Maria Rita Gismondo, Alessandra Ciervo, Giovanni Rezza, Maria Rita Castrucci, Florigio Lista.

Abstract

Whole genome sequences of SARS-CoV-2 obtained from two patients, a Chinese tourist visiting Rome and an Italian, were compared with sequences from Europe and elsewhere. In a phylogenetic tree, the Italian patient's sequence clustered with sequences from Germany while the tourist's sequence clustered with other European sequences. Some additional European sequences in the tree segregated outside the two clusters containing the patients' sequences. This suggests multiple SARS-CoV-2 introductions in Europe or virus evolution during circulation.

Keywords:  Europe; SARS-CoV-2; Whole genome sequence; phylogenetic analysis

Year:  2020        PMID: 32265007      PMCID: PMC7140597          DOI: 10.2807/1560-7917.ES.2020.25.13.2000305

Source DB:  PubMed          Journal:  Euro Surveill        ISSN: 1025-496X


An outbreak of a viral respiratory illness (officially named coronavirus disease, COVID-19, by the World Health Organization) caused by the newly discovered severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) started around mid-December 2019 in the city of Wuhan, Hubei province, China [1]. The outbreak subsequently spread further and, as at 31 March 2020, 750,890 cases of COVID-19 have been confirmed worldwide, including 668,345 outside China [2]. Since 20 February 2020, sustained local transmission has been documented in Italy [3], where to date 98,716 COVID-19 cases testing positive for SARS-CoV-2 have been diagnosed, with 10,943 deaths [4]. To gain further understanding of the molecular epidemiology of the outbreak in Italy, we characterised the full-genome sequences of two SARS-CoV-2 strains isolated from two patients diagnosed in the country. The first patient was a Chinese tourist from Wuhan, diagnosed at the end of January, who had visited Rome and had not been in the areas of Lombardy later found to be the initially affected areas of the epidemic in Italy. The second patient was an Italian person with no apparent direct epidemiological link with China, who was diagnosed in the second half of February in Lombardy. The sequences presented are analysed in the context of other available genome sequences from Europe and elsewhere.

Patients, virus cultivation and whole genome sequencing

The two patients in this study had both been hospitalised with an acute respiratory illness (pneumonia), showing bilateral lung involvement with ground-glass opacity and requiring intensive care. The Chinese patient had had onset of symptoms on 29 January 2020 and had been diagnosed in a hospital in Rome. The Italian patient, whose onset of symptoms had occurred on 10 February 2020, had been diagnosed in a hospital in Milan. Biological samples from both patients had been confirmed as SARS-CoV-2 positive by the National Reference Laboratory (NRL) of the Istituto Superiore di Sanità (ISS) in Rome. The samples used for this study were nasopharyngeal swabs. For the Chinese tourist, the swab had been taken on the day of hospitalisation, when symptoms occurred; for the Italian patient, it had been taken 10 days after symptom onset. An aliquot of each patient’s nasopharyngeal sample was used to generate in vitro cultures in Vero cells grown in modified Eagle’s medium (MEM; Gibco, Thermo Fisher, United Kingdom) supplemented with GlutaMAX. A total of 140 µL of each culture’s supernatant was used for viral RNA extraction using the QIAamp Viral RNA Mini Kit (Qiagen, Hilden, Germany). The obtained genomic RNAs were reverse-transcribed using the SuperScript III Reverse Transcriptase kit (Invitrogen, Carlsbad, United States (US)) and double-stranded DNAs were subsequently obtained with Klenow enzyme (Roche, Basel, Switzerland) according to the manufacturer’s instructions. The Nextera XT kit was used for library preparation and whole genome sequencing was performed using the Illumina MiSeq Reagent Nano Kit v2 (2 x 150 cycles) on the Illumina MiSeq instrument (Illumina, San Diego, US). The reads were trimmed for quality and length and assembled by mapping to the reference genome from Wuhan, China (GenBank accession number: NC_045512.2) using Geneious Prime (www.geneious.com) [5].
Viral sequences from the two patients were deposited in the Global Initiative on Sharing All Influenza Data (GISAID; https://www.gisaid.org/epiflu-applications/next-hcov-19-app/ ).
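The read-trimming step described above was carried out in Geneious Prime, and its exact settings are not given. As a rough illustration only, the idea of trimming for quality and length can be sketched in plain Python as below. The function names, the Phred+33 encoding and the thresholds (minimum quality 20, minimum length 50) are assumptions made for this sketch and are not taken from the study.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality_string]


def trim_read(sequence, quality_string, min_quality=20, min_length=50):
    """Trim low-quality bases from the 3' end of a read.

    Returns (sequence, quality) after trimming, or None if the trimmed
    read is shorter than min_length. Both thresholds are illustrative
    assumptions, not the values used by the authors.
    """
    scores = phred_scores(quality_string)
    end = len(sequence)
    # Walk back from the 3' end while the base quality is below the threshold.
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    if end < min_length:
        return None
    return sequence[:end], quality_string[:end]


# Toy read: six high-quality bases ('I' = Q40) followed by two poor ones ('#' = Q2).
print(trim_read("ACGTACGT", "IIIIII##", min_quality=20, min_length=5))
```

Reads that survive such a filter would then be mapped to the reference genome (NC_045512.2); the mapping itself is left to dedicated software.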

Phylogenetic analysis

To analyse the SARS-CoV-2 genomes obtained from the infected Chinese tourist (GISAID accession ID: EPI_ISL_412974) and the Italian patient (GISAID accession ID: EPI_ISL_412973) in a phylogenetic context, a dataset of 40 available SARS-CoV-2 complete genomes from different countries was retrieved from GISAID (https://www.gisaid.org/, last access 2 March 2020; Supplementary material). Sequence alignment was performed using MUltiple Sequence Comparison by Log-Expectation (MUSCLE) software (http://www.clustal.org) [6]. Estimation of the best fitting substitution model (Hasegawa, Kishino, and Yano, HKY model) and inference of the phylogenetic tree were conducted by a maximum likelihood approach using Molecular Evolutionary Genetics Analysis across Computing Platforms (MEGA X; https://www.megasoftware.net/) [7]. Support for the tree topology was estimated with 1,000 bootstrap replicates. The maximum likelihood phylogenetic tree in the Figure shows a main clade containing several clusters. The viral genome sequence of the Chinese tourist (GISAID accession ID: EPI_ISL_412974) was identical to that retrieved from one sample of another Chinese tourist, hospitalised at the same hospital in Rome (GISAID accession ID: EPI_ISL_410546). The latter was closely related to that of another sample taken from the same patient (GISAID accession ID: EPI_ISL_410545). These three genome sequences were located in a cluster with genomes mainly from Europe (England, France, Italy, Sweden), but also one from Australia (Figure, highlighted in dark red).
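The bootstrap support mentioned above rests on resampling alignment columns with replacement and re-inferring a tree from each pseudo-replicate. The study did this inside MEGA X; the snippet below is only a minimal sketch of the resampling step in plain Python. The function names, the toy alignment and the seed are assumptions for illustration, and tree inference is deliberately left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap pseudo-replicate by sampling columns with replacement.

    `alignment` maps sequence names to aligned sequences of equal length.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    # Draw as many column indices as there are sites, with replacement.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Return a list of pseudo-replicate alignments (1,000 by default, as in the text)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Hypothetical toy alignment, not real SARS-CoV-2 data.
toy = {"ref": "ACGTAC", "s1": "ACGTAT", "s2": "ATGTAC"}
replicates = bootstrap_replicates(toy, n_replicates=10, seed=1)
print(len(replicates), len(replicates[0]["ref"]))
```

A tree would be inferred from each replicate, and the support for a cluster is the proportion of replicate trees in which that cluster appears.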
Figure

Phylogenetic analysis of two SARS-CoV-2 complete genome sequences retrieved in this study, with available complete sequences from different countries^a (n = 40 genome sequences)

GISAID: Global Initiative on Sharing All Influenza Data; HKY: Hasegawa, Kishino, and Yano; MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms; SARS-CoV-2: severe acute respiratory syndrome coronavirus 2.

Main clusters are highlighted in different colours. The Wuhan reference genome is in larger font (GenBank accession number: NC_045512.2). The filled circles represent the main supported clusters (bootstrap support values are indicated at the level of the nodes). The scale bar at the bottom of the tree represents 0.000050 nt substitutions per site. The cluster containing the viral sequence of the Chinese tourist who had visited Rome, Italy (GISAID accession ID: EPI_ISL_412974) is in dark red. This cluster includes viral sequences derived from two samples (sputum and nasopharyngeal swabs) of another Chinese tourist visiting Rome (GISAID accession IDs: EPI_ISL_410545 and EPI_ISL_410546). The viral genome sequence (GISAID accession ID: EPI_ISL_412973) derived from a patient from Lombardy, Italy, is in a cluster highlighted in green, which is different from that containing the Chinese tourist’s sequence.

^a The tree was built by using the best fitting substitution model (HKY) through MEGA X software.

The genome sequence from the Italian patient in Lombardy (EPI_ISL_412973), in contrast, was located in a different cluster that included two genome sequences from Germany (EPI_ISL_406862 Bavaria/Munich and EPI_ISL_412912 Baden-Wuerttemberg-1) and one genome sequence from Mexico (EPI_ISL_412972) (Figure, highlighted in green). In the tree, some sequences from other SARS-CoV-2 collected in Europe segregated in clusters separate from the two clusters containing the respective patient sequences characterised in this study.
There was, for example, a cluster formed by two sequences from England and a cluster formed by three sequences from France. Using an alignment, the single nt polymorphism (SNP) composition and the potentially resulting variable amino acids in derived protein sequences, compared with the Wuhan reference sequences (MN908947 and NC_045512), were investigated for the genome sequences retrieved in this study, as well as for three other genome sequences (EPI_ISL_412972, EPI_ISL_412912, EPI_ISL_406862) that clustered with the sequence of the patient in Lombardy. The genome-wide SNPs are reported in Table 1 (positions refer to the reference sequence; GenBank accession number: NC_045512). The corresponding amino acid positions and variations inside the proteins are shown in Table 2.
Table 1

Single nt polymorphisms (SNPs)^a deduced by comparison of two whole genome sequences of SARS-CoV-2 characterised in this study^b with selected SARS-CoV-2 sequences (n = 7 compared sequences)

| SARS-CoV-2 sequence ID (country from which the sequence originated) | 241 | 3037 | 10265 | 11083 | 13206 | 14408 | 15806 | 23403 | 26144 | 28881 | 28882 | 28883 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Genome region | 5' UTR | ORF1ab gene | ORF1ab gene | ORF1ab gene | ORF1ab gene | ORF1ab gene | ORF1ab gene | Gene S | ORF3a gene | Gene N | Gene N | Gene N |
| NC_045512 (China) | C | C | G | G | C | C | A | A | G | G | G | G |
| MN908947 (China) | C | C | G | G | C | C | A | A | G | G | G | G |
| EPI_ISL_412972 (Mexico) | T | T | G | G | G | T | - | G | G | A | A | C |
| EPI_ISL_412912 (Germany) | T | T | A | G | C | T | A | G | G | A | A | C |
| EPI_ISL_406862 (Germany) | T | T | G | G | C | C | A | G | G | G | G | G |
| EPI_ISL_412973 (Italy) | T | T | G | G | C | T | A | G | G | G | G | G |
| EPI_ISL_412974 (Italy) | C | C | G | T | C | C | A | A | T | G | G | G |

N: nucleocapsid protein; ORF: open reading frame; ORF1ab: ORF encoding polyprotein; S: surface glycoprotein; SARS-CoV-2: severe acute respiratory syndrome coronavirus 2; SNP: single nt polymorphism; UTR: untranslated region.

^a SNPs are shown according to nt positions in the genome sequence and gene location.

^b The two sequences characterised in this study are the ones from Italy (EPI_ISL_412973 and EPI_ISL_412974).
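The comparison behind Table 1 can be reproduced in a few lines. The sketch below encodes the table's alleles as strings, one character per listed position, and reports where each sequence differs from the Wuhan reference. The variable and function names are hypothetical, and treating "-" as "no call" rather than as a variant is an assumption of this sketch.

```python
# Genome positions (NC_045512 coordinates) and alleles transcribed from Table 1.
POSITIONS = [241, 3037, 10265, 11083, 13206, 14408, 15806, 23403, 26144, 28881, 28882, 28883]
REFERENCE = "CCGGCCAAGGGG"  # NC_045512 / MN908947
ALLELES = {
    "EPI_ISL_412972": "TTGGGT-GGAAC",  # Mexico
    "EPI_ISL_412912": "TTAGCTAGGAAC",  # Germany
    "EPI_ISL_406862": "TTGGCCAGGGGG",  # Germany
    "EPI_ISL_412973": "TTGGCTAGGGGG",  # Italy (patient from Lombardy)
    "EPI_ISL_412974": "CCGTCCAATGGG",  # Italy (Chinese tourist, Rome)
}


def snps_vs_reference(alleles, reference=REFERENCE, positions=POSITIONS):
    """List (position, reference base, observed base) for every difference.

    A '-' is skipped rather than counted as a SNP (assumption of this sketch).
    """
    return [
        (pos, ref, alt)
        for pos, ref, alt in zip(positions, reference, alleles)
        if alt != ref and alt != "-"
    ]


for seq_id, alleles in ALLELES.items():
    print(seq_id, snps_vs_reference(alleles))
```

With the values transcribed from Table 1, the tourist's sequence (EPI_ISL_412974) differs from the reference at positions 11083 and 26144, and the Lombardy sequence (EPI_ISL_412973) at positions 241, 3037, 14408 and 23403.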

Table 2

Amino acid variations^a deduced by comparing translations of two whole genome sequences of SARS-CoV-2 characterised in this study^b with those of selected SARS-CoV-2 sequences (n = 7 compared sequences)

| SARS-CoV-2 strains | 924 | 3334 | 3606 | 4314 | 4704 | 5170 | 614 | 251 | 203 | 204 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Protein | ORF1ab | ORF1ab | ORF1ab | ORF1ab | ORF1ab | ORF1ab | Surface glycoprotein | ORF3a | Nucleocapsid phosphoprotein | Nucleocapsid phosphoprotein |
| NC_045512 (China) | F | G | L | A | P | Q | D | G | R | G |
| MN908947 (China) | F | G | L | A | P | Q | D | G | R | G |
| EPI_ISL_412972 (Mexico) | F | G | L | G | L | -^c | G | G | K | R |
| EPI_ISL_412912 (Germany) | F | S | L | A | L | Q | G | G | K | R |
| EPI_ISL_406862 (Germany) | F | G | L | A | P | Q | G | G | R | G |
| EPI_ISL_412973 (Italy) | F | G | L | A | L | Q | G | G | R | G |
| EPI_ISL_412974 (Italy) | F | G | F | A | P | Q | D | V | R | G |

ORF: open reading frame; ORF1ab: ORF encoding polyprotein; SARS-CoV-2: severe acute respiratory syndrome coronavirus 2.

^a The amino acid positions refer to those in each respective protein sequence of the Wuhan reference (GenBank accession number: MN908947), starting from the first methionine.

^b The two sequences characterised in this study are the ones from Italy (EPI_ISL_412973 and EPI_ISL_412974).

^c -: possible sequencing error.
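The amino acid changes in Table 2 can be written in the usual reference-position-variant form (for example D614G) with a small helper. The sketch below transcribes the table's residues and derives the set of distinct substitutions. The names are hypothetical, the short protein labels (S, N) follow the Table 1 abbreviations, and skipping the "-" entry (flagged in the table as a possible sequencing error) is an assumption of this sketch.

```python
# (protein, amino acid position) for each Table 2 column, with reference residues.
COLUMNS = [
    ("ORF1ab", 924), ("ORF1ab", 3334), ("ORF1ab", 3606), ("ORF1ab", 4314),
    ("ORF1ab", 4704), ("ORF1ab", 5170), ("S", 614), ("ORF3a", 251),
    ("N", 203), ("N", 204),
]
REFERENCE = "FGLAPQDGRG"  # NC_045512 / MN908947
RESIDUES = {
    "EPI_ISL_412972": "FGLGL-GGKR",  # Mexico
    "EPI_ISL_412912": "FSLALQGGKR",  # Germany
    "EPI_ISL_406862": "FGLAPQGGRG",  # Germany
    "EPI_ISL_412973": "FGLALQGGRG",  # Italy (patient from Lombardy)
    "EPI_ISL_412974": "FGFAPQDVRG",  # Italy (Chinese tourist, Rome)
}


def substitutions(residues):
    """Return labels such as 'S:D614G' for residues that differ from the reference."""
    labels = []
    for (protein, position), ref, alt in zip(COLUMNS, REFERENCE, residues):
        if alt not in (ref, "-"):  # '-' is not treated as a substitution here
            labels.append(f"{protein}:{ref}{position}{alt}")
    return labels


distinct = sorted({label for r in RESIDUES.values() for label in substitutions(r)})
print(len(distinct), distinct)
```

Run on the transcribed values, this yields eight distinct substitutions across the five non-reference sequences, four of them in ORF1ab.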

The genome sequence from the Chinese tourist hospitalised in Rome differed in two nt positions from that of the COVID-19 patient in Wuhan (NC_045512), while the genome sequence isolated from the Italian patient showed four nt variations (Table 1). For the sequence of the Chinese tourist, the first SNP inside ORF1ab (nt position 3037, amino acid position 924) did not result in an amino acid change. In Table 2, which depicts five sequences characterised outside of China, eight missense mutations overall can be observed compared with the two Wuhan reference sequences: four are located in the ORF1ab polyprotein, of which only the mutation L3606F has previously been reported by Phan, 2020 [8]; one, D614G, is located in the surface glycoprotein and has been observed previously [8], but is not in the receptor binding domain (RBD), which is responsible for virus entry into the host cell; one is in the ORF3a protein and two are in the nucleocapsid protein. The sequence of the Chinese tourist hospitalised in Rome on 29 January (EPI_ISL_412974) presented the mutation L3606F in ORF1ab, i.e. an F at amino acid position 3606 where the Wuhan reference genome has an L. In ORF3a, this sequence had a V at amino acid position 251, as opposed to a G in the references from Wuhan. Meanwhile, the sequence of the Italian patient from Lombardy (EPI_ISL_412973) presented an L at amino acid position 4704 of ORF1ab, where the Wuhan reference genome has a P. It also had a mutation in the surface glycoprotein at amino acid position 614, where it showed a G compared with the reference sequences from Wuhan, which presented a D at that position. With regard to the nucleocapsid protein, the sequences from both the Italian patient and the Chinese tourist presented the same amino acids as the Wuhan reference genomes.

Discussion

In this study, the full-length genomes of two SARS-CoV-2 strains (EPI_ISL_412973 and EPI_ISL_412974) isolated in Italy, one from an Italian patient and the other from a Chinese tourist visiting Rome, were completely sequenced and analysed after virus cultivation. Compared with the viral genome sequence of the COVID-19 patient in Wuhan, the sequence from the Chinese tourist had two nt differences, while that of the Italian patient had four. Phylogenetic analysis consistently placed the Italian patient’s strain in a cluster distinct from that of the tourist’s strain. The strain of the Italian patient grouped with other viral strains identified in Germany and Mexico, while the strain from the Chinese tourist, related to the Wuhan virus strain, clustered with different European strains and a strain from Australia. Other sequences from strains collected in Europe, which were included in the phylogenetic analysis, ended up in clusters separate from the ones containing the sequences of the two patients reported here. The results are consistent with several introductions of SARS-CoV-2 in Europe and/or further circulation of the single strain originating in Wuhan, with concurrent evolution and accumulation of mutations. The mutations found in the virus identified in Lombardy, compared with the reference Wuhan strain, and the resulting amino acid changes should be further investigated to understand whether they may affect virus characteristics. Some limitations need to be mentioned: first, the lack of epidemiological information available with most sequences deposited in the database; second, the limited number of genomes available at the time of the analysis and, consequently, their selection. Nevertheless, these data may be useful for understanding the dynamics of the local transmission of SARS-CoV-2 in Europe.

1.  MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms.

Authors:  Sudhir Kumar; Glen Stecher; Michael Li; Christina Knyaz; Koichiro Tamura
Journal:  Mol Biol Evol       Date:  2018-06-01       Impact factor: 16.240

2.  Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data.

Authors:  Matthew Kearse; Richard Moir; Amy Wilson; Steven Stones-Havas; Matthew Cheung; Shane Sturrock; Simon Buxton; Alex Cooper; Sidney Markowitz; Chris Duran; Tobias Thierer; Bruce Ashton; Peter Meintjes; Alexei Drummond
Journal:  Bioinformatics       Date:  2012-04-27       Impact factor: 6.937

3.  Real-time tentative assessment of the epidemiological characteristics of novel coronavirus infections in Wuhan, China, as at 22 January 2020.

Authors:  Peng Wu; Xinxin Hao; Eric H Y Lau; Jessica Y Wong; Kathy S M Leung; Joseph T Wu; Benjamin J Cowling; Gabriel M Leung
Journal:  Euro Surveill       Date:  2020-01

4.  MUSCLE: a multiple sequence alignment method with reduced time and space complexity.

Authors:  Robert C Edgar
Journal:  BMC Bioinformatics       Date:  2004-08-19       Impact factor: 3.169

5.  Genetic diversity and evolution of SARS-CoV-2.

Authors:  Tung Phan
Journal:  Infect Genet Evol       Date:  2020-02-21       Impact factor: 3.342
