Seasonal Genetic Drift of Human Influenza A Virus Quasispecies Revealed by Deep Sequencing.

Cyril Barbezange^1,2,3,4, Louis Jones^2,3,4,5, Hervé Blanc^1,3, Ofer Isakov^6, Gershon Celniker^6, Vincent Enouf^2,3,4, Noam Shomron^6, Marco Vignuzzi^1,3, Sylvie van der Werf^2,3,4.

Abstract

After a pandemic wave in 2009 following their introduction in the human population, the H1N1pdm09 viruses replaced the previously circulating, pre-pandemic H1N1 virus and, along with H3N2 viruses, are now responsible for the seasonal influenza type A epidemics. So far, the evolutionary potential of influenza viruses has been mainly documented by consensus sequencing data. However, like other RNA viruses, influenza A viruses exist as a population of diverse, albeit related, viruses, or quasispecies. Interest in this quasispecies nature has increased with the development of next generation sequencing (NGS) technologies that allow a more in-depth study of the genetic variability. NGS deep sequencing methodologies were applied to determine the whole genome genetic heterogeneity of the three categories of influenza A viruses that circulated in humans between 2007 and 2012 in France, directly from clinical respiratory specimens. Mutation frequencies and single nucleotide polymorphisms were used for comparisons to address the level of natural intrinsic heterogeneity of influenza A viruses. Clear differences in single nucleotide polymorphism profiles between seasons for a given subtype also revealed the constant genetic drift that human influenza A virus quasispecies undergo.

Keywords: NGS; genetic drift; influenza season; influenza virus; quasispecies

Year: 2018 PMID: 30429836 PMCID: PMC6220372 DOI: 10.3389/fmicb.2018.02596

Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640

Introduction

The development of deep sequencing technology, also known as next generation sequencing (NGS), offers a powerful tool to study the intrinsic heterogeneity of nucleic acids. For RNA viruses, it represents a major improvement in the study of their quasispecies nature over cloning and clone sequencing. The notion of quasispecies refers to the fact that RNA viruses exist as heterogeneous populations of closely related genetic variants (Domingo, 2006; Lauring and Andino, 2010), because their polymerase lacks fidelity and introduces point mutations during replication (Steinhauer et al., 1989a,b). Although the number of studies on RNA virus quasispecies evaluated by deep sequencing technology has dramatically increased in the past few years (Quinones-Mateu et al., 2014), there are still a limited number of such studies for influenza viruses. Influenza A viruses belong to the Orthomyxoviridae viral family. One particularity of influenza A viruses is the segmented nature of their genome, which is composed of eight single-stranded RNA molecules of negative polarity. Each genomic segment encodes one major protein: the three polymerase sub-units PB2, PB1, and PA; the nucleoprotein (NP) associated with each of the RNA segments; the two envelope glycoproteins hemagglutinin (HA) and neuraminidase (NA); the matrix protein M1; and the multifunctional non-structural protein NS1. In addition, several segments encode one or more supplementary proteins through mechanisms of alternative splicing (PB2, MP/M, NS segments), alternative translation initiation (PB1, PA segments), or ribosomal frame-shift (PA segment), among which the ion channel protein M2 encoded by the M segment and the nuclear export protein NEP/NS2 encoded by the NS segment are best characterized (Wise et al., 2011; Jagger et al., 2012; Muramoto et al., 2013; Shaw and Palese, 2013; Yamayoshi et al., 2015). 
The two glycoproteins in the viral envelope, HA and NA, are used to define subtypes among influenza A viruses (Munster et al., 2007; Tong et al., 2013). Continuous genetic variation of influenza viruses due to the low fidelity of the viral polymerase may translate into antigenic variations through mutations in the HA and/or NA, a phenomenon also known as antigenic drift. In addition, introductions into the human population of new influenza A viruses, or antigenic shift, can occur through cross-species transmission from the animal reservoir (mainly poultry and pigs), and occasionally lead to devastating pandemics. After a pandemic wave, the newly introduced viruses gradually become seasonal (Tscherne and Garcia-Sastre, 2011). They are then responsible for yearly epidemics, mostly causing a relatively mild disease of the upper respiratory tract, that nonetheless represent a major burden for public health worldwide, with significant morbidity and mortality (Molinari et al., 2007; Ortega-Sanchez et al., 2012). Since 1977, two influenza A virus subtypes, H1N1 and H3N2, have co-circulated in the human population. While undergoing genetic variation, H1N1 viruses with the H275Y mutation in NA that confers resistance to antiviral oseltamivir treatment emerged in late 2007 in the human population and circulated independently of the use of antiviral treatment to eventually become dominant over the sensitive H1N1 viruses by late 2008 (Meijer et al., 2009; Yang et al., 2011). These H1N1 viruses were fully replaced by the H1N1pdm09 viruses that originated from pigs and resulted in the 2009 pandemic (Neumann et al., 2009). The H1N1pdm09 viruses became established in the human population and currently co-circulate with H3N2 viruses.
Deep sequencing technology for influenza virus has mainly been used to determine the consensus sequence more rapidly than conventional methods and for a large number of samples simultaneously (Hoper et al., 2011; Ren et al., 2013; Rutvisuttinunt et al., 2013; Lin et al., 2014; Monne et al., 2014; Lee et al., 2016; Zhao et al., 2016), and to study the intrinsic heterogeneity of the virus genome on cell-adapted viruses (Watson et al., 2013; Thyagarajan and Bloom, 2014) or in clinical samples but focusing primarily on the HA and NA genes (Chen et al., 2010; Ghedin et al., 2011; Fordyce et al., 2013; Téllez-Sosa et al., 2013; Pizzorno et al., 2014; Dinis et al., 2016; Poon et al., 2016; Pichon et al., 2017). Deep sequencing was also applied to monitor the emergence of antiviral resistance in patients treated with oseltamivir (Ghedin et al., 2012; Rogers et al., 2015) or to understand transmission pathways and investigate local outbreaks (Ghedin et al., 2012; Poon et al., 2016; Meinel et al., 2018). Here, by evaluating the virus quasispecies diversity directly in infected human respiratory specimens, we demonstrated differences in the intrinsic genetic diversity between subtypes and showed that the composition of the quasispecies evolves season after season.

Results

Sample Characteristics

The samples (Supplementary Table 1) used in this study were collected during five consecutive seasons of influenza surveillance by the Northern France National Influenza Center (from 2007–2008 to 2011–2012). Respiratory specimens were selected from the collection of samples that tested positive for influenza A virus and for which the virus subtype was determined. For each season, more samples were selected from the dominant subtype that was circulating: pre-pandemic H1N1 (sH1N1) in 2007–2008, H3N2 (sH3N2) in 2008–2009 and 2011–2012, and pandemic H1N1pdm09 (pH1N1) in 2009–2010 and 2010–2011. For sH1N1 positive samples, selection was also based on the presence or absence of the H275Y mutation in NA, which confers resistance to oseltamivir. In addition, the selection included samples from mild cases collected through the GROG community surveillance network, and from severe cases defined as requiring external respiratory assistance and collected through the RENAL network of hospital laboratories. The median patient age was 22, 41, and 29 years old for sH1N1, H3N2 and pH1N1, respectively (95% CI of median: 5–38, 30–50, and 16–36 years old, respectively), with the patient age ranging from a minimum of 2, 0.8, and 3 years-old to a maximum of 81, 91, and 83 years old, respectively. The interval from onset of symptoms to sample collection was similar for the three subtypes, with a median interval of 1, 1.5, and 1 day for sH1N1, H3N2, and pH1N1, respectively (95% CI of median: 1–2 days for the three subtypes). The maximum sampling time from symptoms onset was 9, 5, and 6 days for sH1N1, H3N2, and pH1N1, respectively. PCRs were performed on cDNA synthesized from RNA directly extracted from the clinical specimen in order to avoid artifacts due to virus amplification in cell culture or in eggs. The strategy implemented to cover as much as possible of the coding regions was based on 12 specific PCRs, the four larger segments being amplified by two overlapping PCRs. 
Moreover, high-fidelity enzymes for both cDNA synthesis and PCR were used to reduce the incorporation of errors during viral genome amplification.

Read Quality Control and Read Cleaning

For each sample, reads obtained by Illumina sequencing were checked for quality (Supplementary Table 1) and cleaned in three successive steps using the FastQC and fqcleaner Galaxy tools (Mareuil et al., 2017): first, reads were cleaned of Illumina adaptors and filtered on Phred scores (retained bases had a score above 30); then reads were cleaned of PCR primer sequences and non-PCR target contaminants; finally, the remaining reads were cleaned of non-influenza virus contaminants, using the sequences of other viruses manipulated in the laboratory. For all samples, this last cleaning step showed that very little contamination occurred during sample preparation. For good runs, between 60 and 70% of the reads were retained after the three steps of cleaning. One run for pH1N1 was characterized by an excess of Illumina adaptor contaminants, but the remaining cleaning steps led to proportions of retained reads similar to those observed for good runs, even though the overall proportion of retained reads was lower. The reads obtained after the three steps of cleaning were deposited in the European Nucleotide Archive database for NGS sequences (Accession No. ERP012790). The consensus sequence was extracted for each segment of each sample and was deposited in the Global Initiative on Sharing All Influenza Data (GISAID) database (accession numbers in Supplementary Table 1).
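
The Phred-based step of the cleaning described above can be illustrated with a minimal sketch. It is not the fqcleaner algorithm: it assumes Phred+33 encoded FASTQ quality strings and a simple end-trimming rule, and the function names `phred_scores` and `trim_low_quality` are hypothetical.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumption: Phred+33 encoding, as used by recent Illumina software.
    """
    return [ord(ch) - offset for ch in quality]


def trim_low_quality(seq, quality, min_score=31):
    """Trim both read ends until a base with score >= min_score is reached.

    min_score=31 mirrors "a score above 30" from the text. The trimming
    rule itself is an illustrative assumption, not the tool's behaviour.
    """
    scores = phred_scores(quality)
    keep = [i for i, score in enumerate(scores) if score >= min_score]
    if not keep:
        return "", ""  # no base passes the quality filter
    start, end = keep[0], keep[-1] + 1
    return seq[start:end], quality[start:end]


if __name__ == "__main__":
    # '#' encodes Phred 2, 'I' encodes Phred 40, '!' encodes Phred 0.
    print(trim_low_quality("ACGTACGT", "#IIIII#!"))
```

In this toy read, the first base and the last two bases fall below the cutoff and are removed.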

Comparison of ViVAN and LoFreq Pipelines to Analyze Virus NGS Data

To analyze the intrinsic genetic diversity, two pipelines were used to process the cleaned reads and identify positions with significant variants, using each sample’s own consensus sequence as reference. Both pipelines were developed by independent groups and use different algorithms to identify significant variants. The ViVAN pipeline is an all-inclusive pipeline (its own mapper is included) and was recently developed specifically for virus NGS data (Isakov et al., 2015). LoFreq was also specifically designed for virus NGS data and is already recognized within the scientific community (Wilm et al., 2012); it was used in conjunction with the Bowtie mapper, but we will refer to the “LoFreq pipeline” for the whole process of mapping and extracting variant positions. We first compared the “depth of sequencing” obtained by both pipelines. Mapping the cleaned reads on the reference sequence made it possible to extract the number of times each position was covered, i.e., the depth of sequencing. For both pipelines, it was found to be homogeneous along each PCR product. The mean depth of sequencing was calculated for each viral genomic segment sequence of each sample and ranged from around 200 to around 10^5 reads per position, with the majority of the mean depth values being above 10^3 reads per position. Both pipelines gave similar results for a given sample, except above 10^4 reads per position, where the LoFreq pipeline mapped more of the cleaned reads (Figure 1A and Supplementary Table 1).
Consequently, the median of the mean depth values was higher with LoFreq (in reads per position: for sH1N1, median 11989, 95% CI of median 9894–14065, min 269, max 89267; for sH3N2, median 11567, 95% CI of median 10237–12293, min 273, max 94094; for pH1N1, median 3790, 95% CI of median 2816–4781, min 175, max 110570) than with ViVAN (in reads per position: for sH1N1, median 6228, 95% CI of median 5796–6471, min 268, max 7747; for sH3N2, median 6398, 95% CI of median 6034–6703, min 271, max 7633; for pH1N1, median 3698, 95% CI of median 2785–4343, min 173, max 7751).
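
As an illustration of how a per-position depth of sequencing and its mean can be derived from mapped reads, here is a minimal sketch. It assumes each mapped read is summarized as a 0-based, half-open `(start, end)` interval on the reference, and the helper names `depth_per_position` and `mean_depth` are hypothetical. Neither pipeline is claimed to work this way internally.

```python
from itertools import accumulate


def depth_per_position(read_spans, ref_length):
    """Return the number of reads covering each reference position.

    read_spans: iterable of (start, end) intervals, 0-based and half-open
    (an assumed representation of mapped reads).
    """
    diff = [0] * (ref_length + 1)
    for start, end in read_spans:
        diff[start] += 1   # coverage rises where a read begins
        diff[end] -= 1     # and falls just after it ends
    # The running sum of the difference array gives the depth.
    return list(accumulate(diff[:-1]))


def mean_depth(depths):
    """Mean depth of sequencing over a segment, in reads per position."""
    return sum(depths) / len(depths)


if __name__ == "__main__":
    spans = [(0, 4), (2, 6), (2, 3)]
    depths = depth_per_position(spans, ref_length=6)
    print(depths, mean_depth(depths))
```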

FIGURE 1

Comparison of LoFreq and ViVAN. (A) Correlation of the mean depths of sequencing (in number of reads per position) obtained by both pipelines. X-axis: LoFreq; Y-axis: ViVAN. Each dot represents one segment of one sample: polymerase sub-unit PB2: olive; polymerase sub-unit PB1: green; polymerase sub-unit PA: purple; hemagglutinin HA: red; nucleoprotein NP: dark blue; neuraminidase NA: pink; matrix and ion channel proteins M: orange; non-structural protein and nuclear export protein NS: light blue. (B) Mean number of variant positions identified by the pipelines, without or with setting a mutation frequency threshold. For each bar, from left to right: gray, specific to LoFreq; light gray, common to LoFreq and ViVAN; dark gray, specific to ViVAN. Error bars represent the standard error of the mean. (C) Correlation of the global mutation frequencies obtained with both pipelines with or without setting a mutation frequency threshold. X-axis: LoFreq; Y-axis: ViVAN. Each dot represents one segment of one sample: PB2: olive; PB1: green; PA: purple; HA: red; NP: dark blue; NA: pink; M: orange; NS: light blue.

Both pipelines include tests that take into account the depth of sequencing, the complementarity of read orientation, the error rate of the Illumina method and the nature of the nucleotides in the sequence surrounding the position of interest, in order to identify positions with significant variants, even if those are present at a very low frequency (for a given position, the mutation frequency is the number of times a base differs from the reference sequence, here the consensus sequence of the given sample, divided by the depth of sequencing). It is commonly accepted that very low frequency variants might be unreliable.
We compared the numbers of significant variants obtained by both pipelines without a threshold or after setting a mutation frequency threshold at 0.005 for retaining significant variants (meaning that only positions with a significant variant representing more than 0.5% of the reads would be counted). This threshold represents five times the error rate recognized for the Illumina method using standard, non-high-fidelity enzymes to prepare PCR products from RNA (Nakamura et al., 2011). While a large number of variant positions were specific to each pipeline when no threshold was set, Figure 1B shows that a 0.005 frequency threshold reduced the number of pipeline-specific variant positions (Supplementary Tables 1, 2). This was particularly true for the LoFreq pipeline, for which only a few pipeline-specific variants remained for the three subtypes. We further estimated the viral intrinsic genetic diversity by calculating, for each variant caller pipeline, the “global mutation frequency” for each segment of each sample, using the data generated by the ViVAN and LoFreq pipelines when no threshold or a 0.005 threshold was used (Supplementary Table 1). The number of variant bases corresponding to all identified or retained variant positions was divided by the total number of sequenced bases (i.e., the sum of the depth of sequencing) covering a given item (segment-sample). When no variant was identified in a segment (especially when the 0.005 threshold was applied), we allocated an arbitrary value for global mutation frequency slightly below the lowest calculated value among the samples of the same group (one genomic segment of one subtype), in order to avoid null global mutation frequencies. A good correlation between the two pipelines was found when the global mutation frequencies were compared (Figure 1C and Table 1).
No correlation was found between possible biases (mean depth, age of patient, virus load) and the global mutation frequency (Supplementary Figure 1 and Supplementary Table 3).
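
The global mutation frequency described above can be sketched as follows. This is a minimal illustration under stated assumptions: the positions with significant variants are taken as given (the statistical tests of LoFreq and ViVAN are not reproduced), and the function name and input layout are hypothetical.

```python
def global_mutation_frequency(variant_counts, depths, threshold=0.0):
    """Variant bases at retained positions divided by all sequenced bases.

    variant_counts: dict mapping a 0-based position to the number of reads
        carrying a non-consensus base at a significant variant position
        (assumed to come from a variant caller).
    depths: list with the depth of sequencing at every position.
    threshold: minimum mutation frequency; a position is retained only if
        its frequency is strictly above it (0.005 means more than 0.5%).
    """
    retained_bases = sum(
        count
        for position, count in variant_counts.items()
        if count / depths[position] > threshold
    )
    return retained_bases / sum(depths)


if __name__ == "__main__":
    depths = [1000] * 10           # toy segment of 10 positions
    variants = {2: 3, 5: 20}       # frequencies 0.003 and 0.02
    print(global_mutation_frequency(variants, depths))         # no threshold
    print(global_mutation_frequency(variants, depths, 0.005))  # 0.005 threshold
```

With the 0.005 threshold only the second position is retained, so the toy frequency drops from 23 to 20 variant bases per 10,000 sequenced bases.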

Table 1

Correlation between global mutation frequencies obtained with LoFreq and ViVAN, evaluated by Spearman non-parametric test.

Virus	Threshold (%)	Genomic segment	Pairs	Spearman rs	95% CI of rs
sH1N1	0.0	PB2	27	0.9695	0.9318–0.9865
		PB1	27	0.9811	0.9574–0.9917
		PA	27	0.9536	0.8973–0.9794
		HA	27	0.9470	0.8832–0.9764
		NP	27	0.9707	0.9344–0.9870
		NA	27	0.9690	0.9307–0.9863
		M	26	0.9096	0.8020–0.9600
		NS	22	0.9367	0.8476–0.9744
	0.5	PB2	27	0.9858	0.9679–0.9937
		PB1	27	0.9750	0.9439–0.9889
		PA	27	0.8630	0.7129–0.9375
		HA	27	0.9321	0.8516–0.9696
		NP	27	0.9807	0.9565–0.9915
		NA	27	0.9670	0.9263–0.9854
		M	26	0.9304	0.8456–0.9694
		NS	22	0.9133	0.7948–0.9647
pH1N1	0.0	PB2	38	0.9941	0.9883–0.9970
		PB1	35	0.9950	0.9897–0.9975
		PA	39	0.9921	0.9845–0.9959
		HA	39	0.9883	0.9771–0.9940
		NP	38	0.9932	0.9866–0.9966
		NA	39	0.9954	0.9910–0.9976
		M	39	0.9860	0.9728–0.9928
		NS	39	0.9775	0.9564–0.9885
	0.5	PB2	38	0.9965	0.9931–0.9982
		PB1	35	0.9988	0.9976–0.9994
		PA	39	0.9860	0.9727–0.9928
		HA	39	0.9931	0.9866–0.9965
		NP	38	0.9928	0.9859–0.9964
		NA	39	0.9945	0.9893–0.9972
		M	39	0.9880	0.9767–0.9939
		NS	39	0.9642	0.9310–0.9815
sH3N2	0.0	PB2	35	0.9549	0.9101–0.9776
		PB1	35	0.9826	0.9649–0.9915
		PA	35	0.9504	0.9014–0.9754
		HA	35	0.9437	0.8884–0.9720
		NP	35	0.9894	0.9784–0.9948
		NA	35	0.9731	0.9459–0.9867
		M	35	0.9717	0.9431–0.9860
		NS	32	0.9065	0.8120–0.9547
	0.5	PB2	35	0.9661	0.9320–0.9832
		PB1	35	0.9694	0.9386–0.9849
		PA	35	0.9569	0.9139–0.9786
		HA	35	0.9383	0.8779–0.9693
		NP	35	0.9874	0.9744–0.9938
		NA	35	0.9837	0.9671–0.9920
		M	35	0.9935	0.9868–0.9968
		NS	32	0.9781	0.9542–0.9896
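
For readers who want to reproduce the kind of coefficient reported in Table 1, the following is a minimal pure-Python sketch of the Spearman rank correlation (Pearson correlation of average ranks). The helper names are hypothetical, the input values are invented for illustration, and the confidence intervals of the table are not computed here.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Invented global mutation frequencies for four segment-sample pairs.
    lofreq = [1e-5, 2e-5, 5e-5, 1e-4]
    vivan = [2e-5, 1e-5, 6e-5, 2e-4]
    print(spearman_rs(lofreq, vivan))
```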


Group Comparisons: Severity Status Within Subtype; Seasons Within Subtype; Between Subtypes

Global mutation frequency data generated with the LoFreq pipeline at a 0.005 threshold were used to analyze the differences between different groups. Mann–Whitney–Wilcoxon non-parametric test was used for all pairwise comparisons. Three subtypes were considered: sH1N1, sH3N2, and pH1N1. Within sH1N1, subgroups were defined, based on the mutation conferring resistance to oseltamivir (H275Y in the neuraminidase) and on the season. Within sH3N2 and pH1N1, subgroups were defined, based on the severity status (mild and severe cases) and on the season. The few samples of the non-dominant subtype during a season were not included and only the samples belonging to the specified groups within a season were used for comparison. The global mutation frequencies calculated for each segment of each sample of each group are shown in Figure 2.
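
The pairwise comparisons rely on the Mann–Whitney–Wilcoxon U statistic, which can be sketched by direct pair counting. This is an illustration only: the function name is hypothetical, the p-value computation is omitted, and reporting the smaller of the two U values is an assumed convention rather than something stated in the text.

```python
def mann_whitney_u(group_a, group_b):
    """Return the smaller of the two Mann-Whitney U statistics.

    U for group_a counts the pairs (a, b) with a > b, with ties counting
    as one half. Reporting min(U_a, U_b) is an assumed convention.
    """
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return min(u_a, u_b)


if __name__ == "__main__":
    # Invented global mutation frequencies for two seasons.
    season_1 = [1e-5, 4e-5, 6e-5]
    season_2 = [2e-5, 3e-5, 5e-5]
    print(mann_whitney_u(season_1, season_2))
```

A U value close to zero indicates almost no overlap between the two groups, while a value close to half the number of pairs indicates similar distributions.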

FIGURE 2

Representation of the global mutation frequencies according to season and group of interest. Values were calculated from the data obtained with LoFreq at a 0.005 threshold. Each symbol represents one segment of one sample: polymerase sub-unit PB2: olive; polymerase sub-unit PB1: green; polymerase sub-unit PA: purple; hemagglutinin HA: red; nucleoprotein NP: dark blue; neuraminidase NA: pink; matrix and ion channel proteins M: orange; non-structural protein and nuclear export protein NS: light blue. For pre-pandemic sH1N1 viruses, groups were based on the oseltamivir-resistance mutation H275Y (275H: sensitive; 275Y: resistant) in the neuraminidase; for both H3N2 and pandemic pH1N1 viruses, groups were based on the severity according to the network of sampling (mild through community physicians, severe through hospitals).

Comparisons of Global Mutation Frequency Distributions Within Each Subtype

Whatever the genomic segment, no significant difference in the global mutation frequency distribution between any subgroups was observed for sH1N1 samples; they were pooled to form a unique group “sH1N1.” The median global mutation frequency for sH1N1 ranged from 6.8 mutations per 10^6 bases for the NS segment to 1.4 mutations per 10^4 bases for the PB1 segment (Table 2). Similarly for sH3N2, whatever the genomic segment, no significant difference was observed in the global mutation frequency distribution between the mild and severe cases or between the two studied seasons; all samples were pooled to form the “sH3N2” group. The median global mutation frequency for sH3N2 ranged from one mutation per 10^5 bases for the NA segment to 3.2 mutations per 10^5 bases for the NP segment (Table 2). For pH1N1, whatever the genomic segment, no significant difference was observed between mild and severe cases in each studied season. When comparing seasons, a significant difference was observed in the global mutation frequency distribution for genomic segment PA encoding the PA polymerase sub-unit (Mann–Whitney–Wilcoxon U = 61; N_2009–2010 = 20; N_2010–2011 = 16; actual difference = 2.6 × 10^-4; 95% CI of difference = 6.4 × 10^-5 to 8.9 × 10^-4) and for segment MP encoding the M1 matrix and M2 ion channel proteins (Mann–Whitney–Wilcoxon U = 95; N_2009–2010 = 20; N_2010–2011 = 16; actual difference = 1.9 × 10^-4; 95% CI of difference = 0 to 3.9 × 10^-4). The samples were pooled to form the “pH1N1” group, except for PA and MP, for which each season was kept separated for the subtype comparison. The pH1N1 median global mutation frequency ranged from 6.1 mutations per 10^5 bases for the MP segment of season 2009–2010 to 4.2 mutations per 10^4 bases for the PB2 segment encoding the PB2 polymerase sub-unit (Table 2).

Table 2

Median global mutation frequency per segment for each virus.

Virus	Genomic segment	Specific season	Sample number	Median	95% CI of median, actual %	95% CI of median, lower to upper
sH1N1	PB2		27	2.5 × 10^-5	98.08	6.4 × 10^-6 to 5.0 × 10^-5
	PB1		27	1.4 × 10^-4	98.08	3.5 × 10^-5 to 8.3 × 10^-4
	PA		27	2.9 × 10^-5	98.08	1.1 × 10^-5 to 4.0 × 10^-5
	HA		27	5.1 × 10^-5	98.08	9.4 × 10^-6 to 1.2 × 10^-4
	NP		27	4.0 × 10^-5	98.08	9.9 × 10^-6 to 2.1 × 10^-4
	NA		27	4.3 × 10^-5	98.08	1.6 × 10^-5 to 1.1 × 10^-4
	M		26	2.3 × 10^-5	97.10	7.0 × 10^-6 to 8.2 × 10^-5
	NS		22	6.7 × 10^-6	98.31	2.0 × 10^-7 to 6.3 × 10^-5
sH3N2	PB2		31	2.3 × 10^-5	97.06	7.0 × 10^-6 to 7.5 × 10^-5
	PB1		31	2.9 × 10^-5	97.06	9.4 × 10^-6 to 8.3 × 10^-5
	PA		31	2.0 × 10^-5	97.06	7.8 × 10^-6 to 1.1 × 10^-4
	HA		31	2.5 × 10^-5	97.06	8.0 × 10^-6 to 6.2 × 10^-5
	NP		31	3.2 × 10^-5	97.06	1.8 × 10^-5 to 6.7 × 10^-5
	NA		31	1.0 × 10^-5	97.06	3.0 × 10^-6 to 5.3 × 10^-5
	M		31	1.9 × 10^-5	97.06	3.8 × 10^-6 to 6.7 × 10^-5
	NS		28	2.3 × 10^-5	96.43	9.0 × 10^-6 to 4.5 × 10^-5
pH1N1	PB2		35	4.2 × 10^-4	95.90	1.0 × 10^-4 to 5.4 × 10^-4
	PB1		32	1.7 × 10^-4	97.99	6.1 × 10^-5 to 7.0 × 10^-4
	PA	2009–2010	20	6.9 × 10^-5	95.86	3.9 × 10^-5 to 2.3 × 10^-4
		2010–2011	16	3.3 × 10^-4	97.87	1.0 × 10^-4 to 1.2 × 10^-3
	HA		36	1.7 × 10^-4	97.12	6.4 × 10^-5 to 5.4 × 10^-4
	NP		35	1.3 × 10^-4	95.90	5.9 × 10^-5 to 2.6 × 10^-4
	NA		36	1.1 × 10^-4	97.12	5.6 × 10^-5 to 2.8 × 10^-4
	M	2009–2010	20	6.1 × 10^-5	95.86	1.0 × 10^-6 to 1.6 × 10^-4
		2010–2011	16	2.5 × 10^-4	97.87	2.2 × 10^-5 to 7.6 × 10^-4
	NS		36	1.3 × 10^-4	97.12	6.6 × 10^-5 to 2.9 × 10^-4
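
The “actual %” column of Table 2 is consistent with a distribution-free confidence interval of the median built from order statistics, whose exact coverage follows from the binomial distribution. The sketch below works under that assumption (the source does not state how the intervals were computed), and the function name `median_ci` is hypothetical.

```python
from math import comb


def median_ci(values, confidence=0.95):
    """Order-statistic confidence interval for the median.

    Returns (lower, upper, actual_coverage). The interval runs from the
    k-th smallest to the k-th largest value, with k the largest rank whose
    two-sided binomial tail probability stays within 1 - confidence.
    """
    data = sorted(values)
    n = len(data)
    alpha = 1 - confidence
    k = 0
    tail = 0.0  # invariant: tail = P(X <= k - 1) for X ~ Binomial(n, 0.5)
    while k < n // 2:
        next_tail = tail + comb(n, k) / 2 ** n
        if 2 * next_tail > alpha:
            break
        tail = next_tail
        k += 1
    if k == 0:
        raise ValueError("too few values for the requested confidence level")
    return data[k - 1], data[n - k], 1 - 2 * tail


if __name__ == "__main__":
    lower, upper, coverage = median_ci(list(range(1, 28)))  # n = 27
    print(lower, upper, round(100 * coverage, 2))
```

Under this assumption, a group of 27 samples gives an actual coverage of 98.08% and a group of 20 samples gives 95.86%, which matches the corresponding rows of the table.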


Comparisons of Global Mutation Frequency Distributions Between Subtypes

No significant difference in the global mutation frequency distribution was observed between sH1N1 and sH3N2, except for the PB1 polymerase sub-unit genomic segment. In contrast, pH1N1 differed significantly from both sH1N1 and sH3N2 for most genomic segments (Figure 3 and Supplementary Table 4), and was found to be more heterogeneous, with higher global mutation frequencies in most genomic segments.

FIGURE 3

Differences between subtypes. Results of the pairwise comparisons by Mann–Whitney–Wilcoxon test of the global mutation frequency distributions (Supplementary Table 4). The genomic segments considered are mentioned on the left. X-axis: difference between the medians (symbol) and 95% CI of the difference (line). Symbols: circle, sH1N1 vs. sH3N2; square, sH1N1 vs. pH1N1 (for MP and PA, open square for pH1N1 season 2009–2010, plain square for pH1N1 season 2010–2011); diamond, sH3N2 vs. pH1N1 (for MP and PA, open square for pH1N1 season 2009–2010, plain square for pH1N1 season 2010–2011). ns, non-significant; *0.05 ≥ p-value > 0.01; **0.01 ≥ p-value > 0.001; ***0.001 ≥ p-value > 0.0001; ****p-value ≤ 0.0001.

Comparisons of vSNP Pattern Between Seasons

To further analyze the quasispecies structure for the different viruses, we examined the distribution of the positions with a significant variant along each of the genomic segments. A viral single nucleotide polymorphism (vSNP) was defined as a significant variant position that was shared by more than 15% of the samples within a subtype, within a given season. Differences in the vSNP patterns were observed between the two studied seasons for both sH3N2 and pH1N1 subtypes (Figure 4). Subtype sH3N2 was characterized by very few vSNPs in the HA and NA segments, and vSNPs in the other segments that corresponded to variants that were mainly shared by less than 40% of the samples (with the exceptions of position 1731 in the PB1, 1849 in the PA, and 761 in the NS segments). Subtype pH1N1 was characterized by many vSNPs in the HA and NA segments. During season 2009–2010, most vSNPs were variants that were shared by less than 50% of the samples (Figure 4B, green). During season 2010–2011, several vSNPs in PB2, PA, HA, NP, NA, M, and NS segments were variants shared by 50–80% of the samples, whereas most vSNPs in the PB1 segment were shared by less than 50% of the samples (Figure 4B, orange).
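
The vSNP definition above can be sketched as a simple counting step over the samples of one subtype and season. This is a minimal illustration: the input layout (one set of significant variant positions per sample) and the function name `find_vsnps` are assumptions, and the strict "more than 15%" cutoff follows the wording of this section.

```python
from collections import Counter


def find_vsnps(sample_variant_positions, min_fraction=0.15):
    """Return {position: fraction of samples} for positions that qualify.

    sample_variant_positions: one collection of significant variant
        positions per sample (an assumed input layout).
    A position qualifies when it is shared by strictly more than
    min_fraction of the samples.
    """
    n_samples = len(sample_variant_positions)
    counts = Counter(
        position
        for positions in sample_variant_positions
        for position in set(positions)  # count each sample at most once
    )
    return {
        position: count / n_samples
        for position, count in sorted(counts.items())
        if count / n_samples > min_fraction
    }


if __name__ == "__main__":
    # Ten invented samples of one segment.
    samples = [{100, 300}, {100, 300}, {300}, {300}, {300}, {300},
               {200}, set(), set(), set()]
    print(find_vsnps(samples))
```

In this toy group, position 200 is seen in only one of ten samples and is therefore not reported as a vSNP.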

FIGURE 4

Distribution of vSNPs along the genome. A viral Single Nucleotide Polymorphism (vSNP) is a position with a significant variant that has been identified in at least 15% of the samples in a group. X-axis: nucleotide position along each genomic segment; Y-axis: percentage of samples sharing a significant variant. (A) sH3N2 samples. In purple: season 2008–2009; in blue: season 2011–2012. (B) pH1N1 samples. In green: season 2009–2010; in orange: season 2010–2011.

Discussion

The intrinsic heterogeneity of nearly the entire viral genome of influenza A viruses was evaluated directly in respiratory specimens collected from a large number of infected patients over five consecutive seasons in a defined geographic area (Northern half of France). Depth of sequencing was not only high, with a coverage largely above 1000 for most PCR products, but it was also found to be extremely homogeneous along the PCR products, which represented an improvement compared to some published data (Kuroda et al., 2010; Kampmann et al., 2011; Lin et al., 2014; Welkers et al., 2015). Differences in the library generation protocols probably explain those results and highlight the fact that many parameters are still not totally controlled when deep sequencing technology is applied (Beerenwinkel et al., 2012). The use of high-fidelity enzymes for both the reverse transcription and the PCR steps limited the introduction of errors during the viral genome amplification, with an estimated 150-fold overall error reduction compared to classical enzymes according to the manufacturers’ data. The introduction of errors during the preparation of libraries for deep sequencing runs is a well-known limitation of NGS that needs to be taken into account (Willerth et al., 2010; Acevedo and Andino, 2014). A 0.001 mutation frequency has been used by others as a threshold to define viral subpopulations based on Illumina technology results (Nakamura et al., 2011). We decided to use, for specific comparisons, only positions with a mutation frequency above 0.005 that were identified by two independent pipelines specifically designed for the identification of virus quasispecies, making our results extremely reliable and comparable to previous studies (Welkers et al., 2015). However, the physiological relevance of low mutation frequencies is still unknown and it is thus difficult to determine the impact of any subpopulations described by deep sequencing.
The global mutation frequency that we calculated for the influenza virus samples appeared slightly lower than the published data on RNA virus polymerase error-rates (Parvin et al., 1986; Drake and Holland, 1999; Sanjuan et al., 2010), suggesting that some bottleneck events might occur between replication of the genome in the cells of the respiratory tract and excretion of the virus as collected in respiratory specimens (Xue et al., 2018). The fact that no bias was observed demonstrated that the level of heterogeneity detected by deep sequencing was not correlated to the virus load in the sample or to the age of the patient. This is an important point, as one could have hypothesized that the virus variability could be higher in infected children or that a higher mutation frequency could artificially be the consequence of more virus genomes being present in the sample. Deep sequencing approaches based on clinical samples instead of cell culture-amplified virus to evaluate the viral quasispecies have long focused on HIV and hepatitis viruses (Quinones-Mateu et al., 2014). Most studies concerned the appearance of resistance and the evolution of specific related mutations under antiviral treatments (Simen et al., 2009; Nasu et al., 2011; Fisher et al., 2012; Nishijima et al., 2012; Svarovskaia et al., 2012; Li et al., 2013; Rodriguez et al., 2013). A few studies were in experimental in vivo conditions, using animal models (Moncla et al., 2013; Sutton et al., 2014), and some focused on the intrinsic heterogeneity of the virus as it is excreted, following, for example, HIV1/HCV quasispecies in the semen of naturally (co-)infected males (Paranjpe et al., 2002; Briat et al., 2005). 
The importance of studying the quasispecies composition of RNA viruses was recently highlighted when deep sequencing analysis made it possible to follow the virus evolutionary trajectories upon the emergence of a Chikungunya virus variant responsible for an epidemic in the Indian Ocean in 2006 (Stapleford et al., 2014). The description of vSNPs common to several samples within a subtype has been previously reported for influenza viruses mainly at positions involved in resistance to antivirals (Téllez-Sosa et al., 2013; Pizzorno et al., 2014). In the present study, vSNPs in the whole genome could be identified because a relatively large number of samples were analyzed for each subtype and for two seasons for sH3N2 and pH1N1. The physiological importance of the vSNPs we found in the different genomic segments is totally unknown and will require further investigation with the help of reverse genetics systems. Most of the positions with vSNPs concerned the third nucleotide of codons, and the variant mutations were consequently mainly synonymous (Supplementary Table 5), meaning that the phenotype of the variant subpopulations was probably not different from that of the main subpopulation. If intrinsic genetic variability does not confer variability in phenotypes, i.e., potentially immediate fitness advantages (Debbink et al., 2017; McCrone et al., 2017), it must have an importance in terms of evolution and adaptability. A heterogeneous population, even with variants at very low frequency, could facilitate or speed up evolution. Comparison of two seasons for a given subtype strikingly highlighted the evolution dynamics and quasispecies plasticity of influenza A viruses. Whereas no difference was observed in the global mutation frequency distribution between the two seasons for both sH3N2 and pH1N1, the distribution of vSNPs along the genome clearly showed differences between the two seasons.
Interestingly, contrary to sH3N2, for which only a few positions were vSNPs in the HA and NA gene sequences, pH1N1 was characterized by the presence of many vSNPs in the glycoprotein gene sequences. Together with the higher level of global heterogeneity compared to sH1N1 and sH3N2, and despite the relative stability at the consensus sequence level since its introduction in humans in 2009 (Klein et al., 2014), it demonstrated that pH1N1, a virus recently introduced in the human population, is still adapting to the human host, exploring different areas of the sequence space. This plasticity of the quasispecies was nonetheless also observed for sH3N2 viruses, which have been circulating in the human population since the late 1960s, since vSNP differences were identified between seasons. This phenomenon was less pronounced than for pH1N1, but it illustrated the permanent, underlying genetic drift occurring in human influenza viruses. Two intriguing observations will require further investigation. First, for both pH1N1 and sH3N2, non-synonymous vSNPs were identified in viral components involved in intracellular steps of the virus replication, such as the polymerase sub-units, the nucleoprotein, the matrix M1 protein, and the non-structural NS1 protein. We hypothesize that these vSNPs might provide some flexibility in different processes, either directly by modulating the activity of a given protein, or indirectly by affecting the interaction with cellular factors. The second point concerns the main determinants of influenza virus antigenic evolution: the envelope glycoproteins HA and NA. It is unclear why so few vSNPs were identified in the HA and NA for sH3N2. We wonder whether a similar situation would be observed for sH3N2 samples collected during a season when sH3N2 does not dominate and whether there could be a relationship between the level of intrinsic heterogeneity and the alternate dominance observed between subtypes.
We also noticed that several variants that were identified as minority mutations in the samples used in this study were later detected as dominant in other parts of the world and used to define some genetic clades/groups. Thus, for pH1N1, mutations A134T/S183P in the HA and Q313R/V394I in the NA, that we identified as minority mutations during the 2010–2011 season, were used the following season by WHO Collaborating Centers (Barr et al., 2014) to define group 3 sequences (represented by A/Hong-Kong/3934/2011). Similarly, mutations S143G/A197T in the HA and N44S in the NA were used to define group 7 sequences (represented by A/St-Petersburg/100/2011). Studying the intrinsic heterogeneity of influenza A viruses by deep sequencing directly from respiratory specimens clearly gave interesting insights into the virus evolution dynamics by adding a new dimension to analyses performed on consensus sequences. It would now be interesting to process samples from more epidemic seasons to better describe the phenomenon of evolutionary plasticity. Studies by other groups have not been able to address this question due to the limited number of seasons (Dinis et al., 2016; Poon et al., 2016) or samples (McCrone et al., 2017) that were under consideration or available. Thus, analyzing future seasons of pH1N1 dominance would show if the virus would maintain its high level of heterogeneity and continue to explore the sequence space, or if some equilibrium might soon be reached with a level of heterogeneity similar to that observed for sH3N2. Given the large dispersion observed in the global mutation frequency distribution within a group, it nonetheless seems necessary to process a large set of samples in order to identify statistically significant differences; this will be more achievable as the NGS technologies evolve and the cost decreases.

Materials and Methods

Clinical Samples

Samples were selected among influenza A virus-positive respiratory specimens available in the collection of the National Influenza Centre (NIC) for Northern France. For yearly epidemiological surveillance of influenza viruses circulating during seasonal epidemics, nasopharyngeal swabs were regularly collected by physicians of the GROG network from patients with ARI (acute respiratory illness) and sent to the NIC for virus detection and characterization. As part of its routine surveillance activities, the NIC also received respiratory specimens (nasopharyngeal swabs or nasopharyngeal aspirates) from the hospital laboratories belonging to the RENAL network, which mainly sent samples from hospitalized patients. In this study, for H3N2 (sH3N2) and pandemic 2009 H1N1 (pH1N1), mild cases were defined as patients from the community sampled by the GROG network, while severe cases were patients with severe acute respiratory distress requiring external respiratory assistance and were sampled by the RENAL network. For pre-pandemic H1N1 (sH1N1), sensitivity and resistance to Oseltamivir were based on the amino acid nature at position 275 in the neuraminidase NA protein (H and Y, respectively). All specimens were declared to the Ministère de l’Enseignement Supérieur et de la Recherche (French Research and Higher Education Ministry) as a collection of samples that may be used for research activities including viral quasispecies genetic characterization (Number DC-2010-1197, Collection Number 4). Access to personal data was limited to the patient age.
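The rule used to assign Oseltamivir sensitivity for sH1N1 (H at position 275 of NA for sensitive, Y for resistant) can be written as a short sketch. The function name, the "undetermined" fallback, and the assumption that the input is a full-length NA protein sequence numbered from its first residue are illustrative choices, not details taken from the study.

```python
def oseltamivir_phenotype(na_protein, position=275):
    """Return the Oseltamivir phenotype implied by the NA residue at `position`.

    na_protein: NA amino acid sequence (one-letter code), assumed to be
                numbered from residue 1 so that index position-1 is residue 275.
    """
    if len(na_protein) < position:
        raise ValueError("sequence does not cover the requested position")
    residue = na_protein[position - 1].upper()
    if residue == "H":
        return "sensitive"
    if residue == "Y":
        return "resistant"
    # Any other residue is outside the H/Y rule described in the text.
    return "undetermined"


# Toy sequences: 274 placeholder residues followed by the residue of interest.
print(oseltamivir_phenotype("A" * 274 + "H"))
print(oseltamivir_phenotype("A" * 274 + "Y"))
```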

Viral Genome Amplification

Viral RNA was extracted from 140 μL of respiratory specimens with Qiagen QIAamp Viral RNA kit (Cat# 52904). Ten microliters of purified RNA were then reverse transcribed with Agilent AccuScript High Fidelity 1st strand cDNA Synthesis kit (Cat# 200820) using Uni1 (5′-AGCRAAAGCAGG-3′) primer. The viral genome was then amplified by PCR using Thermo Scientific Phusion High-Fidelity DNA Polymerase (Cat# F-530). All reactions were performed according to the manufacturers’ instructions. Twelve PCRs were designed to cover as much as possible of the coding regions of the eight genomic segments (primer sequences and specific annealing temperature and elongation time conditions are available in Supplementary Table 6).
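The Uni1 primer contains the degenerate base R, which stands for A or G in IUPAC nucleotide notation. As a small illustration, the sketch below expands a degenerate primer into the explicit sequences it represents; the helper name `expand_primer` is hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """List every non-degenerate sequence encoded by a degenerate primer."""
    options = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*options)]


# Uni1 as given in the text (5' to 3').
for sequence in expand_primer("AGCRAAAGCAGG"):
    print(sequence)
```

For Uni1 this yields two sequences, which differ only at the fourth position.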

Deep Sequencing

After purification with Macherey-Nagel Nucleospin Gel and PCR Clean-up kit (Cat# 740609), the PCR products were quantified with Invitrogen Quant-iT Picogreen dsDNA Assay Kit (Cat# P7589). For each sample, two pools were prepared with the 12 PCRs covering the genome, fragmented (New England BioLabs NEBNext dsDNA Fragmentase, Cat# M0348), multiplexed, sequenced with Illumina cBot and GAIIX technology (Illumina TruSeq SR Cluster Kit v2-cBot-GA, Cat# GD-300-2001; Illumina TruSeq SBS Kit v5-GA, Cat# FC-105-5001), and analyzed with established deep sequencing data analysis tools and in-house scripts. Briefly, clipping was performed using the Galaxy tools (Mareuil et al., 2017), removing common adapters and other contaminants and trimming low-quality bases (Phred < 30). Clipped reads were aligned to sequences of sH1N1 A/New Caledonia/20/1999 (GISAID Accession No. EPI_ISL_649) or sH3N2 A/Brisbane/10/2007 (GISAID Accession No. EPI_ISL_25019) or pH1N1 A/California/7/2009 (GISAID Accession No. EPI_ISL_159427) as reference, to extract the consensus viral sequence for each sample. For the ViVAN pipeline (Isakov et al., 2015), clipped reads of a given sample were aligned to the consensus sequence of that sample using Burrows Wheeler Aligner BWA v0.5.9 (Li and Durbin, 2009); alignments were then processed using SAMTools (Li et al., 2009) to obtain a pileup of the called bases at each position, and statistically significant variants were identified above the background noise due to sequencing error at every sufficiently covered site (>100×). In parallel, the clipped reads were aligned to the consensus sequence using Bowtie Aligner (Langmead et al., 2009) and significant variants were identified with LoFreq (Wilm et al., 2012).
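The general idea of calling variants above sequencing-error background at sufficiently covered sites can be sketched with a simple binomial test. This is not the algorithm implemented in ViVAN or LoFreq; the fixed per-base error rate of 0.001 (roughly Phred 30), the significance threshold, the absence of multiple-testing correction, and the pileup representation as one dictionary of base counts per position are all simplifying assumptions made for illustration.

```python
from math import exp, lgamma, log, log1p


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)


def call_variants(pileup, consensus, error_rate=1e-3, min_cov=100, alpha=0.05):
    """Report non-consensus bases whose counts exceed the error background.

    pileup:    list of {base: count} dictionaries, one per position
    consensus: consensus sequence of the sample (same length as pileup)
    Returns a list of (position, base, frequency, p_value) tuples.
    """
    calls = []
    for pos, counts in enumerate(pileup):
        coverage = sum(counts.values())
        if coverage <= min_cov:           # keep only sites covered >100x
            continue
        for base, count in sorted(counts.items()):
            if base == consensus[pos] or count == 0:
                continue
            p_value = binom_upper_tail(count, coverage, error_rate)
            if p_value < alpha:
                calls.append((pos, base, count / coverage, p_value))
    return calls


# Toy pileup: site 0 is noise, site 1 carries a 2% variant, site 2 is too shallow.
toy_pileup = [{"A": 998, "G": 2}, {"C": 980, "T": 20}, {"G": 50, "A": 10}]
for call in call_variants(toy_pileup, "ACG"):
    print(call)
```

In this toy example only the T at position 1 is reported: two mismatches in about 1,000 reads are compatible with the assumed error rate, and the third site is excluded by the coverage filter.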

Statistical Analysis

Correlations were evaluated using Spearman non-parametric tests (Kornbrot, 2014). Differences in global mutation frequency distributions between two groups (pairwise comparisons) were assessed with the two-tailed Mann–Whitney–Wilcoxon test, a non-parametric test based on rank-sums (Moses, 2014).
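Both tests operate on ranks rather than raw values, which the following self-contained sketch makes explicit. It uses only the Python standard library; the two-tailed Mann–Whitney p-value relies on the normal approximation with tie and continuity corrections, which is an assumption of this sketch (the software actually used for the published analysis is not specified here), and the input values are toy numbers.

```python
from collections import Counter
from math import erfc, sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1       # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def mann_whitney_u(a, b):
    """Two-tailed Mann-Whitney-Wilcoxon test (normal approximation)."""
    n1, n2 = len(a), len(b)
    pooled = list(a) + list(b)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = sqrt(n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1))))
    z = max(abs(u - n1 * n2 / 2) - 0.5, 0.0) / sigma   # continuity correction
    return u, erfc(z / sqrt(2))


# Toy data standing in for per-sample values in two groups.
print(spearman_rho([1, 2, 3, 4, 5], [2, 4, 5, 8, 16]))
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

For very small groups an exact p-value would be preferable to the normal approximation used here.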

Data Availability

The cleaned reads (as FASTQ files) were deposited in the European Nucleotide Archive database for NGS sequences (https://www.ebi.ac.uk/ena; accession number ERP012790). In addition, the original reads (prior to cleaning) will be made available by the authors, without undue reservation, to any qualified researcher.

Author Contributions

CB, SW, and MV conceptualized the study. CB and HB performed the template preparations and NGS sequencing. LJ, OI, GC, and NS designed the software and performed the bioinformatics analyses with the variant callers. VE gathered the resources. CB contributed to formal analysis and visualization of the data, and wrote the original draft of the manuscript. CB, SW, and MV wrote, reviewed, and edited the manuscript. SW, MV, and NS acquired the funding.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

68 in total

1. High nucleotide substitution error frequencies in clonal pools of vesicular stomatitis virus.

Authors: D A Steinhauer; J C de la Torre; J J Holland
Journal: J Virol Date: 1989-05 Impact factor: 5.103

2. Characterization of oseltamivir-resistant influenza virus populations in immunosuppressed patients using digital-droplet PCR: Comparison with qPCR and next generation sequencing analysis.

Authors: Maxime Pichon; Alexandre Gaymard; Laurence Josset; Martine Valette; Gilles Millat; Bruno Lina; Vanessa Escuret
Journal: Antiviral Res Date: 2017-08-03 Impact factor: 5.970

3. Identification of novel influenza A virus proteins translated from PA mRNA.

Authors: Yukiko Muramoto; Takeshi Noda; Eiryo Kawakami; Ramesh Akkina; Yoshihiro Kawaoka
Journal: J Virol Date: 2012-12-12 Impact factor: 5.103

4. A comprehensive deep sequencing strategy for full-length genomes of influenza A.

Authors: Dirk Höper; Bernd Hoffmann; Martin Beer
Journal: PLoS One Date: 2011-04-29 Impact factor: 3.240

5. Simultaneous and complete genome sequencing of influenza A and B with high coverage by Illumina MiSeq Platform.

Authors: Wiriya Rutvisuttinunt; Piyawan Chinnawirotpisan; Sriluck Simasathien; Sanjaya K Shrestha; In-Kyu Yoon; Chonticha Klungthong; Stefan Fernandez
Journal: J Virol Methods Date: 2013-07-12 Impact factor: 2.014

6. Intrahost dynamics of antiviral resistance in influenza A virus reflect complex patterns of segment linkage, reassortment, and natural selection.

Authors: Matthew B Rogers; Timothy Song; Robert Sebra; Benjamin D Greenbaum; Marie-Eve Hamelin; Adam Fitch; Alan Twaddle; Lijia Cui; Edward C Holmes; Guy Boivin; Elodie Ghedin
Journal: MBio Date: 2015-04-07 Impact factor: 7.867

7. Full genome of influenza A (H7N9) virus derived by direct sequencing without culture.

Authors: Xianwen Ren; Fan Yang; Yongfeng Hu; Ting Zhang; Liguo Liu; Jie Dong; Lilian Sun; Yafang Zhu; Yan Xiao; Li Li; Jian Yang; Jianwei Wang; Qi Jin
Journal: Emerg Infect Dis Date: 2013-11 Impact factor: 6.883

8. New world bats harbor diverse influenza A viruses.

Authors: Suxiang Tong; Xueyong Zhu; Yan Li; Mang Shi; Jing Zhang; Melissa Bourgeois; Hua Yang; Xianfeng Chen; Sergio Recuenco; Jorge Gomez; Li-Mei Chen; Adam Johnson; Ying Tao; Cyrille Dreyfus; Wenli Yu; Ryan McBride; Paul J Carney; Amy T Gilbert; Jessie Chang; Zhu Guo; Charles T Davis; James C Paulson; James Stevens; Charles E Rupprecht; Edward C Holmes; Ian A Wilson; Ruben O Donis
Journal: PLoS Pathog Date: 2013-10-10 Impact factor: 6.823

9. The inherent mutational tolerance and antigenic evolvability of influenza hemagglutinin.

Authors: Bargavi Thyagarajan; Jesse D Bloom
Journal: Elife Date: 2014-07-08 Impact factor: 8.140

10. Vaccination has minimal impact on the intrahost diversity of H3N2 influenza viruses.

Authors: Kari Debbink; John T McCrone; Joshua G Petrie; Rachel Truscon; Emileigh Johnson; Emily K Mantlo; Arnold S Monto; Adam S Lauring
Journal: PLoS Pathog Date: 2017-01-31 Impact factor: 7.464
