
Metagenomic evidence for the co-existence of SARS and H1N1 in patients from 2007-2012 flu seasons in France.

Qi Liu1,2,3, Zhenglin Du2,4,3, Sihui Zhu2,4,3, Wenming Zhao2,4,3, Hua Chen1,2,3, Yongbiao Xue2,4,3.   

Abstract

By re-analyzing public metagenomic data from 101 patients infected with influenza A virus during the 2007-2012 H1N1 flu seasons in France, we identified 22 samples containing SARS-CoV sequences. In three of them, the complete SARS-CoV genome could be assembled. These sequences are highly similar (99.99% and 99.70%) to the artificially constructed recombinant SARS-CoV (SARSr-CoV) strains generated by the J. Craig Venter Institute in the USA. Moreover, samples from different flu seasons carry different SARS-CoV strains, and the divergence between these strains cannot be explained by natural evolution. Our study also shows that retrospective studies using public metagenomic data from past major epidemic outbreaks can serve as a genomic strategy for researching the origins or spread of infectious diseases.

Keywords:  Influenza A Virus; Metagenomics; Retrospective study; SARS-CoV

Year:  2021        PMID: 34778742      PMCID: PMC8577621          DOI: 10.1016/j.bsheal.2021.11.002

Source DB:  PubMed          Journal:  Biosaf Health        ISSN: 2590-0536


Introduction

Genome sequencing has been used to identify the pathogen, trace virus origin, and provide outbreak surveillance for infectious disease studies. For example, the availability of the complete genome of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in January 2020 sped up the identification of the pathogen and facilitated the development of effective vaccines. With the accumulation of extensive metagenomic sequence data in the past years, one potential application is to carry out genome-based retrospective studies on major historical outbreaks to understand the occurrence and development of viral epidemics.

Materials and Methods

Data collection

The metagenomic data of 101 patients infected by the influenza A virus were downloaded from the NCBI SRA database (project ID: PRJEB11406). These samples were collected by the National Influenza Center near Paris, France, between 2007 and 2012, spanning five consecutive flu seasons. Sequencing data of these samples were submitted by Institute Pasteur, France, in 2018 (Table S1) [1].

Variant calling and consensus sequence generation

After removing sequencing adapters and trimming consecutive low-quality bases from the 5' and 3' read ends with cutadapt [2], clean reads were mapped to the SARS-CoV genome (NC_004718.3) using BWA (v0.7.12) [3] with default parameters. Next, Picard (http://picard.sourceforge.net) was used to sort the mapping results in BAM format and mark PCR duplicates. GATK (v4.1.6.0) [4] was then used for SNP and indel calling. Finally, consensus sequences were generated by applying the VCF variants to the reference sequence using bcftools (v1.9) [5].
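
The final step above, applying called variants to a reference to obtain a consensus, can be illustrated with a minimal pure-Python sketch. This is not the bcftools implementation: the `Variant` record and the `apply_variants` function are hypothetical names introduced here, and the sketch assumes already-filtered, non-overlapping variants with 1-based positions as in VCF.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    """Simplified VCF-like record (hypothetical, for illustration only)."""
    pos: int   # 1-based position of the first reference base, as in VCF
    ref: str   # reference allele
    alt: str   # alternate allele


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Build a consensus by substituting each ALT allele into the reference.

    Assumes the variants do not overlap. They are applied from right to
    left so that an indel never shifts the coordinates of a variant that
    still has to be applied.
    """
    seq = reference
    for v in sorted(variants, key=lambda x: x.pos, reverse=True):
        start = v.pos - 1
        end = start + len(v.ref)
        if seq[start:end] != v.ref:
            raise ValueError(f"REF allele mismatch at position {v.pos}")
        seq = seq[:start] + v.alt + seq[end:]
    return seq


# Toy data, not taken from the study.
toy_reference = "ACGTACGTAC"
toy_variants = [
    Variant(pos=2, ref="C", alt="T"),     # SNP
    Variant(pos=5, ref="AC", alt="A"),    # 1-bp deletion
    Variant(pos=9, ref="A", alt="AGG"),   # 2-bp insertion
]
consensus = apply_variants(toy_reference, toy_variants)
print(consensus)
```

Applying variants from the highest position downward is a design choice that keeps the reference coordinates valid throughout, at the cost of requiring non-overlapping records.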

Sequence alignment, phylogenetic and network analysis

Two hundred fifty SARS-CoV genomes longer than 29,000 bases were downloaded from the NCBI Nucleotide database (https://www.ncbi.nlm.nih.gov/nuccore). We then constructed a multiple sequence alignment of 253 genomes (the 250 downloaded genomes plus the three genomes assembled in this study) using MAFFT v7.453 with the parameter “--auto” [6]; the final alignment contains 30,327 nucleotides. Neighbor-joining (NJ) phylogenetic trees of the 253 genome sequences were constructed using MEGA X 10.1.8 with the maximum composite likelihood model and default parameters [7]. In addition, phylogenetic relationships and mutations among the unique genomes in this set were inspected through median-joining networks [8], using Network 10 (http://www.fluxus-engineering.com/), to examine how genetic variation changed across locations and over time. For the network analysis, an 81-bp block at the 5’-end, including gaps, and a 77-bp block at the 3’-end, including gaps and the poly-A tails, were trimmed from the alignment, leaving a final alignment of 30,169 nucleotides.
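
The end-trimming step and its column arithmetic can be sketched as follows. This is a minimal illustration, not the procedure used by Network 10; `trim_alignment` is a hypothetical helper and the two toy sequences are invented.

```python
from typing import Dict


def trim_alignment(alignment: Dict[str, str], left: int, right: int) -> Dict[str, str]:
    """Drop `left` columns from the 5' end and `right` columns from the 3' end."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    width = lengths.pop()
    if left < 0 or right < 0 or left + right > width:
        raise ValueError("trim sizes do not fit the alignment width")
    return {name: seq[left:width - right] for name, seq in alignment.items()}


# Column arithmetic for the figures given in the text.
full_width = 30_327
trimmed_width = full_width - 81 - 77
print(trimmed_width)

# Toy alignment (invented): 2 columns removed on the left, 4 on the right.
toy = {"seq_a": "--ACGTACGTAAAA", "seq_b": "NNACGTTCGTAA--"}
trimmed = trim_alignment(toy, left=2, right=4)
print(trimmed)
```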

Pairwise sequence alignment

We used the BLAST online tools with default parameters (https://blast.ncbi.nlm.nih.gov/Blast.cgi) to align two sequences.
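
As an illustration of what a percent-identity figure summarizes, the sketch below counts matching columns in two pre-aligned sequences. It is not BLAST's algorithm. Conventions differ on how gapped columns are counted; skipping them, as done here, is an assumption and may not match the value BLAST reports. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of compared columns with identical bases.

    Columns where either sequence has a gap ('-') are skipped
    (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy example: 9 ungapped columns, 8 of them identical.
toy_identity = percent_identity("ACGT-ACGTA", "ACGTTACGAA")
print(f"{toy_identity:.2f}%")
```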

Results

We re-analyzed public metagenomic data from 101 patients infected with influenza A virus (H1N1), collected by the Pasteur Institute in France between 2007 and 2012, spanning five consecutive flu seasons [1]. Mapping the reads to the SARS-CoV reference genome (NC_004718.3) identified 22 (21.78%) of the 101 patient samples as containing SARS-CoV fragments in varying proportions (0.0003% to 0.6127%) (Table S1). Among them, three samples (ERR1091908, ERR1091910, and ERR1091914) were deeply sequenced, with sequencing depth >30 and genome coverage >99%; for these, complete genomes were generated from the consensus sequences using variant identification and substitution (Table S1, Figure S1). The 22 samples carry different numbers of mutations, ranging from 10 to 116, compared with the SARS-CoV reference genome (NC_004718.3, see Table S1). In addition, samples collected during the same flu season share similar mutations, while samples from different flu seasons possess different sets of mutations (Fig. S1, Table 1), suggesting that distinct SARS-CoV strains existed in different flu seasons in France.
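
The selection criteria above (depth >30, genome coverage >99%) can be sketched from a per-base depth vector. The text does not define either metric precisely; this sketch assumes that "depth" is the mean per-base depth and that "coverage" is the percentage of reference positions covered by at least one read. The function names and toy depth vectors are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def coverage_stats(depths: Sequence[int]) -> Tuple[float, float]:
    """Return (mean depth, percent of positions with depth >= 1)."""
    if not depths:
        raise ValueError("depth vector is empty")
    covered = sum(1 for d in depths if d > 0)
    return float(mean(depths)), 100.0 * covered / len(depths)


def passes_thresholds(depths: Sequence[int],
                      min_depth: float = 30.0,
                      min_coverage: float = 99.0) -> bool:
    """Apply the two cut-offs quoted in the text (strictly greater than)."""
    mean_depth, pct_covered = coverage_stats(depths)
    return mean_depth > min_depth and pct_covered > min_coverage


# Share of samples with SARS-CoV fragments, as reported in the text.
detection_rate = 100.0 * 22 / 101
print(f"{detection_rate:.2f}%")

# Toy depth vectors (invented): one deep sample, one shallow sample.
deep_sample = [40] * 995 + [0] * 5
shallow_sample = [5] * 1000
print(passes_thresholds(deep_sample), passes_thresholds(shallow_sample))
```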
Table 1

Summary of three SARS-CoV sequences and their closest sequences.

| Cluster | Sample ID | Sample collection location | Collection date | Organism | Submitter | Identity with SARS-CoV sequences (%) |
|---|---|---|---|---|---|---|
| 1 | ERR1091914 | Haute Normandie, France | 2008/12/23 | Influenza A virus | Institute Pasteur | 99.99 |
| | FJ882938* | Tennessee, USA | 2007/9/22 | SARS-CoV wtic-MB | J. Craig Venter Institute | |
| 2 | ERR1091908 | Lorraine, France | 2008/1/15 | Influenza A virus | Institute Pasteur | 99.70 |
| | ERR1091910 | Picardie, France | 2008/1/16 | Influenza A virus | | |
| | FJ882941 | Nashville, Tennessee, USA | 2008/3/28 | SARS-CoV ExoN1 | J. Craig Venter Institute | |

*One sequence close to ERR1091914 is shown; the remaining 18 sequences close to ERR1091914 are presented in Table S2.

SARS-CoV, the cause of a life-threatening respiratory infectious disease [9], caused an outbreak of SARS in 2002 but disappeared from human populations after 2003. To investigate the origin of the SARS-CoV sequences detected in patients infected with influenza A virus between 2007 and 2012, we pooled the three assembled SARS-CoV genome sequences (ERR1091908, ERR1091910, and ERR1091914) with 250 complete SARS-CoV genomes downloaded from the NCBI database. We then constructed a phylogeny of these sequences with the neighbor-joining approach and a haplotype network with the median-joining approach (Fig. 1). We found that ERR1091908 and ERR1091910 cluster with the SARS-CoV ExoN1 strain (colored in blue in Fig. 1), while ERR1091914 clusters with the SARS-CoV wtic-MB strain (colored in black in Fig. 1), consistent with the finding of differential SARS-CoV mutations in the patient samples from different flu seasons. Note that SARS-CoV ExoN1, SARS-CoV wtic-MB, SARS-CoV MA15, and SARS-CoV MA15 ExoN1 all belong to recombinant SARSr-CoV, a group of SARS-CoV sequences artificially constructed using the infectious clone (ic) recombinant virus strain of SARS-CoV Urbani (AY278741), which was initially isolated from a patient with SARS [10], [11], [12]. Moreover, the three newly assembled SARS-CoV sequences, along with all the recombinant SARSr-CoV sequences, are separated from the naturally occurring SARS-CoV sequences, including AY278741 (highlighted in Fig. 1), in the phylogeny and haplotype network. These results indicate that all three SARS-CoV sequences (ERR1091908, ERR1091910, and ERR1091914) are more likely to be artificially constructed recombinant SARSr-CoV strains.
Fig. 1

Phylogeny tree and haplotype network of 253 SARS-CoV genome sequences. a. A phylogeny of the 253 SARS-CoV genome sequences is constructed with the neighbor-joining approach. b. Haplotype network of the 253 SARS-CoV genome sequences is constructed with the median-joining approach. Size of each circle represents the number of identical sequences.

ERR1091914 is almost identical (99.99%) to SARS-CoV wtic-MB (19 identical sequences in the data), with only one base pair (bp) difference (R10626A, relative to the genome position of FJ882938.1) after trimming 20 bp at the 5’-end and 44 bp at the 3’-end of ERR1091914 (Table 1). Here, R represents an A or G base. There are 15 mutation differences in total between ERR1091914 and AY278741, the closest naturally occurring SARS-CoV, and all 15 are shared by all the recombinant SARSr-CoV sequences, indicating that ERR1091914 is derived from the SARSr-CoV wtic-MB sequences rather than having evolved independently from naturally occurring SARS-CoV sequences. The other two newly identified SARS-CoV sequences (ERR1091908 and ERR1091910) are almost identical to each other, with only one base difference (99.99%). Both sequences are highly similar (99.70%) to the SARS-CoV ExoN1 sequence FJ882941.1, with 88 base differences after trimming 21 bp at the 5’-end and 65 bp at the 3’-end (Table 1). Annotation showed that 84 of the 88 differences are located in coding regions of SARS-CoV and result in 45 amino acid (AA) substitutions. Most of these substitutions are in open reading frame 1a (ORF1a) (13 substitutions), ORF1ab (7 substitutions), and the spike (S) glycoprotein (9 substitutions, Table 2). ORF1ab and the S glycoprotein are functionally related to viral replication, transmission, and pathogenicity [13], [14], suggesting potential gain-of-function effects of these mutations in ERR1091908 and ERR1091910.
SARS-CoV ExoN1 strains have been reported to show a 21-fold increase in mutation rate during replication [12], which is consistent with the fact that the sequences within the SARS-CoV ExoN1 clades of the haplotype network are highly divergent compared with other recombinant SARSr-CoV clades (Fig. S2). Assuming a mutation rate of 1.0 × 10⁻³ per site per year for naturally evolving SARS-CoV, the 21-fold increase reported for ExoN1, and a SARS-CoV genome length of 29,751 bp, only 1.69 months (50.7 days) are needed to generate the 88 site differences, indicating a recent divergence between the two sequences (ERR1091908 and ERR1091910) and FJ882941.1.
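
The divergence-time estimate can be reproduced with a short calculation. The stated 1.69 months is obtained only when the 21-fold ExoN1 rate increase is applied to the baseline rate; that reading, the single-lineage accumulation of all 88 differences, and the 30-day month are assumptions of this sketch rather than details spelled out in the text.

```python
# Inputs quoted in the text.
base_rate = 1.0e-3          # substitutions per site per year (natural SARS-CoV)
exon1_fold_increase = 21    # reported rate increase for ExoN1 strains [12]
genome_length = 29_751      # bp
site_differences = 88

# Expected substitutions per genome per year under the elevated rate.
substitutions_per_year = base_rate * exon1_fold_increase * genome_length

# Time needed to accumulate the observed differences (assumed: one lineage).
years = site_differences / substitutions_per_year
months = years * 12
days = months * 30          # assumed 30-day month

# For contrast: the same calculation at the baseline rate only.
months_at_base_rate = site_differences / (base_rate * genome_length) * 12

print(f"{months:.2f} months, {days:.1f} days")
print(f"{months_at_base_rate:.1f} months at the baseline rate")
```

At the baseline rate alone the same 88 differences would take roughly three years, so the short estimate depends on the elevated ExoN1 rate.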
Table 2

The positions of 84 base differences between ERR1091908 and FJ882941.1

| ORF | CDS (position in NC_004718.3) | Nucleotide changes in ERR1091908 compared to FJ882941.1: positions (number) | Amino acid changes in ERR1091908 compared to FJ882941.1: positions in SARS-CoV protein (number) | Function [ref] |
|---|---|---|---|---|
| ORF 1a | 265-13413 | 654, 707, 1771, 1905, 2976, 3229, 3491, 3603, 3845, 4731, 4808, 5015, 5061, 5236, 5412, 6087, 6265, 6459, 6476, 7484, 8004, 8922, 10119, 10658, 12411, 13149 (26) | 148, 503, 989, 1076, 1194, 1489, 1515, 1584, 1658, 2001, 2071, 2407, 3465 (13) | Involved in viral replication and transcription, and virus pathogenesis [13], [15] |
| ORF 1ab | 265-21485 | 13874, 13925, 14178, 14630, 14876, 15497, 15605, 15740, 15821, 15905, 16356, 16386, 17269, 17602, 18238, 18239, 18244, 18245, 18749, 18860, 19082, 19814, 19917, 20528, 20555, 20789, 21038 (27) | 987, 997, 1291, 1402, 1614, 1616, 2174 (7) | |
| S | 21492-25259 | 21860, 22206, 22352, 22423, 23243, 23374, 23468, 23518, 23823, 24249, 24873, 24910, 24957 (13) | 239, 311, 628, 676, 778, 920, 1128, 1140, 1156 (9) | Associated with cell entry of SARS-CoV and viral transmission [14], [16] |
| ORF 3a | 25268-26092 | 25550, 25626, 25783, 25800, 26049 (5) | 120, 178, 261 (3) | Plays roles in virus uptake and release, virus-related apoptosis, and formation of the viral envelope [17] |
| ORF 3b | 25689-26153 | 25783, 25800, 26049, 26121 (4) | 32, 38, 121, 145 (4) | Involved in immunomodulation; acts as an interferon antagonist [17] |
| E | 26117-26347 | 26121, 26226, 26241, 26335 (4) | 2, 37, 42 (3) | A small integral membrane protein with roles in virus morphogenesis, assembly, budding, and replication [15] |
| M | 26398-27063 | NA | NA | NA |
| ORF 6 | 27074-27265 | 27167, 27248 (2) | 32, 59 (2) | Acts as a β-interferon antagonist and contributes to virulence [17] |
| ORF 7a | 27273-27641 | 27290, 27639 (2) | 123 (1) | Involved in virus-host interaction; contributes to SARS-CoV pathogenesis [17] |
| ORF 7b | 27638-27772 | 27639, 27648 (2) | 1, 4 (2) | A potential attenuating factor [17] |
| ORF 8a | 27779-27898 | NA | NA | Potential roles in the host ubiquitin-proteasome system [17] |
| ORF 8b | 27864-28118 | 27917 (1) | NA | |
| N | 28120-29388 | 28557, 29271, 29324 (3) | 402 (1) | Plays a role in virus replication and transcription; acts as an interferon antagonist [15], [16] |
| ORF 9b | 28130-28426 | NA | NA | Induces caspase-dependent apoptosis [17] |
| Total number | NA | 84 | 45 | NA |
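
The per-ORF counts in Table 2 sum to more than 84 because some positions fall in overlapping reading frames and are listed under more than one ORF. The sketch below, which only re-tabulates the positions given in Table 2, checks that the number of distinct positions is 84 and that the amino acid counts sum to 45. The variable names are introduced here for illustration.

```python
from collections import Counter

# Nucleotide positions listed per ORF in Table 2 (ORFs with NA omitted).
nucleotide_changes = {
    "ORF 1a": [654, 707, 1771, 1905, 2976, 3229, 3491, 3603, 3845, 4731,
               4808, 5015, 5061, 5236, 5412, 6087, 6265, 6459, 6476, 7484,
               8004, 8922, 10119, 10658, 12411, 13149],
    "ORF 1ab": [13874, 13925, 14178, 14630, 14876, 15497, 15605, 15740,
                15821, 15905, 16356, 16386, 17269, 17602, 18238, 18239,
                18244, 18245, 18749, 18860, 19082, 19814, 19917, 20528,
                20555, 20789, 21038],
    "S": [21860, 22206, 22352, 22423, 23243, 23374, 23468, 23518, 23823,
          24249, 24873, 24910, 24957],
    "ORF 3a": [25550, 25626, 25783, 25800, 26049],
    "ORF 3b": [25783, 25800, 26049, 26121],
    "E": [26121, 26226, 26241, 26335],
    "ORF 6": [27167, 27248],
    "ORF 7a": [27290, 27639],
    "ORF 7b": [27639, 27648],
    "ORF 8b": [27917],
    "N": [28557, 29271, 29324],
}

# Amino acid substitution counts per ORF from Table 2.
amino_acid_counts = {
    "ORF 1a": 13, "ORF 1ab": 7, "S": 9, "ORF 3a": 3, "ORF 3b": 4,
    "E": 3, "ORF 6": 2, "ORF 7a": 1, "ORF 7b": 2, "N": 1,
}

# Count how many ORFs list each position.
position_counts = Counter(
    pos for positions in nucleotide_changes.values() for pos in positions
)
listed_total = sum(position_counts.values())      # entries across all rows
unique_positions = set(position_counts)           # distinct genome positions
shared_positions = sorted(p for p, n in position_counts.items() if n > 1)
total_amino_acid_changes = sum(amino_acid_counts.values())

print(listed_total, len(unique_positions), shared_positions)
print(total_amino_acid_changes)
```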
The sequencing data of the 101 patients were submitted by the Pasteur Institute in France, which has established a laboratory meeting the level-three biosafety standard and has conducted research on SARS-CoV [18], [19], [20]. All 116 recombinant SARSr-CoV sequences were submitted by the J. Craig Venter Institute (JCVI) from Tennessee, USA. The two institutes have collaborated and published their work on viral genome sequences [21].

Discussion

One possible explanation for the co-existence of SARS-CoV and H1N1 sequences in the patients is that the artificially constructed recombinant SARSr-CoV caused co-infections outside the laboratory during 2007-2012 without resulting in a SARS-CoV epidemic; an alternative hypothesis is contamination of the samples in the laboratory, since the Pasteur Institute also conducted SARS-CoV studies. Moreover, samples from different flu seasons carry different strains of SARS-CoV, and the divergence between these strains cannot be explained by natural evolution. This finding warrants further investigation to identify the source. In 2014, the Pasteur Institute in France lost vials containing patient samples collected during the SARS outbreak (https://www.sciencemag.org/news/2014/05/frances-institut-pasteur-under-fire-over-missing-sars-vials). This raises a serious concern about laboratory biosafety in both institutions. Our study also shows that retrospective studies using public metagenomic data from past major epidemic outbreaks can serve as a suitable genomic strategy for researching the origins or spread of infectious diseases.
  20 in total

1.  Median-joining networks for inferring intraspecific phylogenies.

Authors:  H J Bandelt; P Forster; A Röhl
Journal:  Mol Biol Evol       Date:  1999-01       Impact factor: 16.240

2.  Highly heterogeneous temperature sensitivity of 2009 pandemic influenza A(H1N1) viral isolates, northern France.

Authors:  I Pelletier; D Rousset; V Enouf; F Colbere-Garapin; S van der Werf; N Naffakh
Journal:  Euro Surveill       Date:  2011-10-27

3.  The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Authors:  Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark Daly; Mark A DePristo
Journal:  Genome Res       Date:  2010-07-19       Impact factor: 9.043

4.  MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms.

Authors:  Sudhir Kumar; Glen Stecher; Michael Li; Christina Knyaz; Koichiro Tamura
Journal:  Mol Biol Evol       Date:  2018-06-01       Impact factor: 16.240

5.  Reverse genetics with a full-length infectious cDNA of severe acute respiratory syndrome coronavirus.

Authors:  Boyd Yount; Kristopher M Curtis; Elizabeth A Fritz; Lisa E Hensley; Peter B Jahrling; Erik Prentice; Mark R Denison; Thomas W Geisbert; Ralph S Baric
Journal:  Proc Natl Acad Sci U S A       Date:  2003-10-20       Impact factor: 11.205

Review 6.  SARS-CoV and emergent coronaviruses: viral determinants of interspecies transmission.

Authors:  Meagan Bolles; Eric Donaldson; Ralph Baric
Journal:  Curr Opin Virol       Date:  2011-12       Impact factor: 7.090

7.  Twelve years of SAMtools and BCFtools.

Authors:  Petr Danecek; James K Bonfield; Jennifer Liddle; John Marshall; Valeriu Ohan; Martin O Pollard; Andrew Whitwham; Thomas Keane; Shane A McCarthy; Robert M Davies; Heng Li
Journal:  Gigascience       Date:  2021-02-16       Impact factor: 6.524

8.  Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors:  Heng Li; Richard Durbin
Journal:  Bioinformatics       Date:  2009-05-18       Impact factor: 6.937

9.  The complete genome sequence of Yersinia pseudotuberculosis IP31758, the causative agent of Far East scarlet-like fever.

Authors:  Mark Eppinger; M J Rosovitz; Wolfgang Florian Fricke; David A Rasko; Galina Kokorina; Corinne Fayolle; Luther E Lindler; Elisabeth Carniel; Jacques Ravel
Journal:  PLoS Genet       Date:  2007-07-10       Impact factor: 5.917

Review 10.  Accessory proteins of SARS-CoV and other coronaviruses.

Authors:  Ding Xiang Liu; To Sing Fung; Kelvin Kian-Long Chong; Aditi Shukla; Rolf Hilgenfeld
Journal:  Antiviral Res       Date:  2014-07-01       Impact factor: 5.970
