Literature DB >> 21373190

Emergence and genetic variation of neuraminidase stalk deletions in avian influenza viruses.

Jinling Li¹, Heinrich Zu Dohna, Carol J Cardona, Joy Miller, Tim E Carpenter.

Abstract

When avian influenza viruses (AIVs) are transmitted from their reservoir hosts (wild waterfowl and shorebirds) to domestic bird species, they undergo genetic changes that have been linked to higher virulence and broader host range. Common genetic AIV modifications in viral proteins of poultry isolates are deletions in the stalk region of the neuraminidase (NA) and additions of glycosylation sites on the hemagglutinin (HA). Even though these NA deletion mutations occur in several AIV subtypes, they have not been analyzed comprehensively. In this study, 4,920 NA nucleotide sequences, 5,596 HA nucleotide and 4,702 HA amino acid sequences were analyzed to elucidate the widespread emergence of NA stalk deletions in gallinaceous hosts, the genetic polymorphism of the deletion patterns and association between the stalk deletions in NA and amino acid variants in HA. Forty-seven different NA stalk deletion patterns were identified in six NA subtypes, N1-N3 and N5-N7. An analysis that controlled for phylogenetic dependence due to shared ancestry showed that NA stalk deletions are statistically correlated with gallinaceous hosts and certain amino acid features on the HA protein. Those HA features included five glycosylation sites, one insertion and one deletion. The correlations between NA stalk deletions and HA features are HA-NA-subtype-specific. Our results demonstrate that stalk deletions in the NA proteins of AIV are relatively common. Understanding the NA stalk deletion and related HA features may be important for vaccine and drug development and could be useful in establishing effective early detection and warning systems for the poultry industry.

Entities: CellLine Chemical Disease Gene Species

Mesh：

Substances：
Neuraminidase

Year: 2011 PMID： 21373190 PMCID： PMC3044137 DOI： 10.1371/journal.pone.0014722

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.240

Introduction

Wild aquatic birds, such as Anseriformes (ducks and swans) and Charadriiformes (gulls and shorebirds) are reservoir hosts for avian influenza viruses (AIV) [1], [2], [3]. However, AIVs can cause outbreaks in poultry [4], [5], [6], [7], [8]. In some instances, AIV strains from poultry hosts have increased pathogenicity for poultry species [9], [10], and have acquired an ability to infect mammalian hosts [11], [12], [13], and/or have caused fatal infections in humans [14], [15]. Therefore, to minimize adverse effects in humans and poultry from AIV infections, it is important to understand what evolutionary changes occur in AIVs when they are transmitted from wild birds to poultry. Understanding these evolutionary changes can lead to better detection, prevention and control strategies. AIVs interact with their hosts mostly through two glycoproteins, Hemagglutinin (HA) and Neuraminidase (NA). HA recognizes receptors on target cells and NA, a sialidase, assists virus entry and release [16], [17]. One observation that has been reported in viruses isolated during separate poultry outbreaks is a deletion in the stalk region of the NA [18], [19], [20], [21]. The stalk is a structure that separates the enzymatically and antigenically active “head” from the hydrophobic domain embedded in the viral membrane [22], [23], [24]. Little is known about the biological function of the NA stalk. Previous studies have shown that deletions in the NA stalk region influenced the virus' replication efficiency in vivo, increased its host range, reduced its NA enzymatic activity, and in some cases increased the virus' virulence [13], [25], [26]. Stalk deleted NAs (referred to as SΔNA hereafter) were reported sporadically in some AIV subtypes, e.g. H5N1, H6N1, H7N1, H7N3 and H9N2 [8], [9], [27], [28], [29]. SΔNA are often accompanied by observations of variants on the HA protein, such as the addition of glycosylation sites, presumably to maintain functional balance between HA and NA which is necessary for viral infectivity [30], [31], [32]. These HA variants could further influence viral antigenicity, virulence and pathogenicity [32], [33]. Although the SΔNA has been identified in different subtypes of poultry isolates, it is unclear whether there is a general correlation of SΔNAs with species in the order of Galliformes across different NA subtypes as claimed in previous publications [8], [9]. Galliformes (gallinaceous hosts) is an order of birds that includes important domestic and game birds, such as chickens, turkeys, pheasants, and quails. The aim of this study is to provide a broad understanding of the emergence of SΔNA through a comprehensive analysis of NA sequences. Our analysis showed SΔNA prevalence by subtypes, host and time period and demonstrated – with some exceptions – a general correlation between SΔNA and gallinaceous hosts. In addition, we analyzed the statistical correlation between SΔNA and variants on HA proteins.

Results

Occurrence of NA stalk deletion mutations

Among 4,920 neuraminidase sequences analyzed, 45.5% (2,238) carry stalk deletion mutations, i.e. 45.5% lost a stretch of amino acid residues in the stalk region (Table 1). The earliest SΔNA on record was from an H7N1 chicken-origin virus sampled in 1934 in Germany [34]. SΔNAs were observed with varying prevalence in six of nine NA subtypes (N1–N3 and N5–N7). Eighteen of 109 (16.5%) reported HA-NA subtypes in the public database (http://www.flu.lanl.gov/) contain SΔNAs. Eleven of these 18 (61%) HA-SΔNA subtypes include cases from poultry outbreaks as supported by publications [4], . Viruses from nine of 16 (56%) HA subtypes (H2–H7 and H9–H11) had SΔNA genes (Table 1). Forty-seven distinctive SΔNA patterns were identified among six NA subtypes with deleted portions ranging from one amino acid to 36 amino acid residues (Figure 1, Table S1). The only length polymorphism that we did not count as stalk deletion pattern was a Serine at position 40 among N5 sequences that was present in all North American isolates and with no counterpart amino acid in any Eurasian isolate.

Table 1

Occurrence of NA stalk deletions (SΔNA) among all reported avian influenza HA-NAa subtypes.

	N1	N2	N3	N4	N5	N6	N7	N8	N9	N1–N9
H1	136b	5	7	-	5	4	+	-	4	161
H2	19	*1/11	42	1	2	1	3	4	18	1/101
H3	25	*1/110	9	2	9	48	3	239	3	1/448
H4	4	1/19	7	5	4	1/213	4	52	3	2/311
H5	*1526c/1665b	*97/212	*3/58	1	1	1	6	2	8	1626/1954
H6	*113/184	*20/122	5	2	2/17	10	+	44	3	135/387
H7	*49/71	*144/163	*30/127	6	1	2	4/54	4	4	227/432
H8	-	+	1	35	1	-	1	-	-	38
H9	9	*229/705	2	1	3	3	+	1	-	229/724
H10	5	2	12/19	4	3	6	3/110	6	2	15/157
H11	10	22	1/8	1	1	3	-	2	71	1/118
H12	1	+	1	3	36	-	-	-	3	44
H13	-	3	1	-	-	10	-	2	7	23
H14	-	-	-	-	2	-	-	-	-	2
H15	-	+	-	-	-	+	-	1	4	5
H16	-	-	10	-	-	-	-	-	1	11
unknown	3	1/1								1/4
Total samples analyzed	2,132/441d	1,375	297	61	85	301	181	357	131	4,920/3,229d
Total # of SΔNA	1,688/174d	494	46	0	2	1	7	0	0	2,238/724d
SΔNA prevalence (%)	79.17/39.45d	35.93	15.49	0	2.35	0.33	3.87	0	0	45.49/22.42d

: hemagglutinin-neuraminidase.

: Single number and the number behind slash denote the number of samples analyzed.

: Number before slash denotes the number of sequences with SΔNA.

: excluding sequences from HPAI H5N1 viruses.

-: No isolate has been reported for the subtype in publically accessible database, GenBank.

+:No samples were included due to unqualified sequences of the reported HA-NA subtype.

*: Some cases of the subtype were reported in poultry outbreaks.

Figure 1

Patterns and prevalence of AIV SΔNA.

Patterns and prevalence of AIV SΔNA.

Bars represent positions of missing amino acid residues. Colors indicate NA subtypes (red = N1, blue = N2, orange = N3, green = N5, purple = N6 and brown = N7). The first column on the left shows the deletion IDs assigned to each pattern. The second column shows the number of sequences having the corresponding deletion pattern and in parentheses the percentage of this pattern among all sequences of the same NA subtype. Percentages do not sum to 100 for the entire graph since they are calculated per NA subtype. : hemagglutinin-neuraminidase. : Single number and the number behind slash denote the number of samples analyzed. : Number before slash denotes the number of sequences with SΔNA. : excluding sequences from HPAI H5N1 viruses. -: No isolate has been reported for the subtype in publically accessible database, GenBank. +:No samples were included due to unqualified sequences of the reported HA-NA subtype. *: Some cases of the subtype were reported in poultry outbreaks. The site of deletion and the frequency of each missing amino acid vary among NA subtypes. For each NA subtype, the most commonly dispensed positions are from 55 to 70 among all SΔNA subtypes except N5 (Figure S1).

Distribution of NA stalk deletions

The phylogenetic trees of NA nucleotides show that deletion patterns usually group into monophyletic clades that contain sequences belonging to a single deletion pattern (Figures 2, S2, S3, S4, S5 for N1, N2, N3 and N7 genes). Exceptions to monophyletic deletions are instances in which the same deletion pattern arose multiple times independently (deletion patterns 9, 12, 22, and 29, Figures 2, and S3), or patterns that are nested within a larger clade with a different deletion pattern (patterns 6, 7 and 10 within pattern 5, and pattern 17 within pattern 14). The patterns in the latter exception were most likely derived from existing patterns with additional deletions since they have the same deleted amino acid positions as those in the clade they are nested in (Figures 1, 2 and S3).

Figure 2

Phylogenetic tree of a subset of N1 sequences and distribution of SΔNA patterns (constructed in MrBayes 3.1.2).

Phylogenetic tree of a subset of N1 sequences and distribution of SΔNA patterns (constructed in MrBayes 3.1.2).

Values at nodes show estimated posterior probabilities for bipartitions. Branch colors indicate host order. Genes of isolates from gallinaceous hosts are shown by tree branches with extended grey lines. The first column from the left shows a black dash for each isolate with SΔNA. The second column shows the deletion pattern ID. Sequences with the same deletion pattern are denoted by square brackets numbered with pattern ID. The third column shows which continent each isolate was from and the fourth column indicates the HA subtype of each isolate. Most (38/47, 79%) of the SΔNA patterns were associated with a single HA subtype and a few patterns combined with multiple HA subtypes (9/47, 19%) (Table S1). Most mutants were limited to small geographic areas and a few patterns were found in isolates from multiple locations (10/47, 21%) (Table S1). The majority of SΔNAs (67%) persisted for less than a year (Figure 3). However, some patterns (4, 5, 14, 19, and 22) existed for many years (Figure 3, Table S1). Pattern 22 persisted for 25 years and appeared in both Asian and North American isolates (Figure 3 and S3).

Figure 3

Spatial-temporal distribution of SΔNA patterns.

Each line indicates an instance at which an SΔNA pattern emerged (refer to for definition). Pattern IDs on the left are the same as in

Spatial-temporal distribution of SΔNA patterns.

Each line indicates an instance at which an SΔNA pattern emerged (refer to for definition). Pattern IDs on the left are the same as in

Gallinaceous hosts and NA stalk deletions

The percentage of SΔNA virus was higher among gallinaceous (1,355/1,955, 69%) than non-gallinaceous hosts (893/2,424, 36%), mainly in the orders Charadriiformes and Anseriformes. A high percentage of SΔNA mutants was observed among non-gallinaceous isolates in N1 deletions in the 2000s (804/1,086, 74%) due to the large number (∼800) of HPAIV H5N1 samples from infected wild birds. Apart from the HPAI H5N1 viruses, there were 97 SΔNA isolates from non-gallinaceous hosts. Among them, only two SΔNAs of the non-gallinaceous isolates had no obvious connection to domestic poultry. One was an H5N3 virus from a migrating mallard in Japan [45] and the other an H6N5 virus from a shearwater in Australia [46]. Most other SΔNAs of non-gallinaceous isolates were either from domestic birds or linked to poultry for the following reasons: (i) 33 were isolated from farm ducks [8], [47], [48], [49], (ii) ten were isolated from birds in live bird markets [50], [51], [52], (iii) 18 were isolated from hosts species that were not indigenous to the geographic region of the isolate, therefore, domestic, and (iv) the remaining 34 clustered within gallinaceous isolates in the phylogenetic tree. Our analysis of the joint transition rates between all combinations of host types and NA states showed that estimated transition rates generally created a positive correlation between gallinaceous hosts and SΔNA. The rates leading to character combinations supporting a positive correlation (the combinations of gallinaceous/SΔNA and non-gallinaceous/full-length-NA) were subtracted from the rates leading to opposing character combinations (non-gallinaceous/SΔNA and gallinaceous/full-length-NA). A posterior distribution for this rate difference (ΔQ) was estimated. From this posterior distribution, a posterior probability that the rate difference exceeds zero was calculated (Table 2). A positive rate difference indicates that the positive correlation between gallinaceous hosts and SΔNA is due to uneven transition rates. For subclade 1 of N1, subclade 1 of N2 and N3, the posterior probabilities for a positive rate difference were greater than 0.95 (Table 2). However, this is not true for the following two groups, subclade 2 in N1 and subclade 2 in N2. Subclade 2 of N2, composed of N2 genes of H9N2 isolates of poultry outbreaks in China, includes 89% viruses from gallinaceous hosts but contains only 31% SΔNA mutants (Figure S3). The reverse is true for the HPAIV H5N1 subclade that contains 53% viruses from non-gallinaceous hosts yet has 99% SΔNA sequences (Figure S2).

Table 2

Bayesian estimation of joint transitions between gallinaceous and non-gallinaceous hosts and full-length-NA and SΔNA.

					Posterior probability that deletions are irreversible
Subtype	Groupa	Number of sequences analyzed	ΔQ mean (95% range)b	Posterior probability that ΔQ>0	Non- Galliformes	Galliformes
N1	Subclade 1 (not HPc H5N1)	498	386 (316; 468)	1	0.04	0.92
	Subclade 2 (HP H5N1)	1291	225 (−890; 1400)	0.65	0.26	0.61
N2	Subclade 1 (not Eurasian H9N2)	592	425 (213; 567)	1	0	0.97
	Subclade 2 (Eurasian H9N2)	625	−13 (−199; 114)	0.12	0	0.98
N3	Entire tree	261	87 (0–513)	0.97	0.06	0.3

: Analysis was performed separately for major clades when the trees were too large.

: Difference of all rates that lead to the character combinations Galliformes/SΔNA and non-Galliformes/full-length-NA minus all rates that lead the character combinations non-Galliformes/SΔNA and Galliformes/full-length-NA. (refer to Methods).

: highly pathogenic.

: Analysis was performed separately for major clades when the trees were too large. : Difference of all rates that lead to the character combinations Galliformes/SΔNA and non-Galliformes/full-length-NA minus all rates that lead the character combinations non-Galliformes/SΔNA and Galliformes/full-length-NA. (refer to Methods). : highly pathogenic. Models with a zero transition rate from deleted to full-length NA in Galliformes have high posterior probabilities among N1 and N2 sequences. This result implies that restoration of full-length-NA from SΔNA is unlikely for N1 and N2 in Galliformes (Table 2). These posterior probabilities are generally low in non-Galliformes (Table 2).

Geographic area and NA stalk deletions

We tested for a correlation between SΔNA and Asia for N1, N2 and N3. In subclade 1 of N1 and in both N2 subclades, there is a significant positive correlation between SΔNA and an isolate being from Asia (Table 3). In subclade 2 of N1, the correlation is negative and no correlation was found for N3 (Table 3).

Table 3

Bayesian estimation of joint transitions between Asia and non-Asia and full-length-NA and SΔNA.

Subtype	Groupa	ΔQ mean (95% range)b	Posterior probability that ΔQ>0
N1	Subclade 1 (not HPc H5N1)	213 (145; 256)	1
	Subclade 2 (HPc H5N1)	−1184 (−1515; −673)	0
N2	Subclade 1 (not Eurasian H9N2)	286 (170; 401)	1
	Subclade 2 (Eurasian H9N2)	771 (373; 1563)	1
N3	Entire tree	−207 (−511; 2)	0.19

: Analysis was performed separately for major clades when the trees were too large.

: Difference of all rates that lead to the character combinations Asia/SΔNA and non-Asia/full-length-NA minus all rates that lead the character combinations non-Asia/SΔNA and Asia/full-length-NA. (refer to Methods).

: highly pathogenic.

: Analysis was performed separately for major clades when the trees were too large. : Difference of all rates that lead to the character combinations Asia/SΔNA and non-Asia/full-length-NA minus all rates that lead the character combinations non-Asia/SΔNA and Asia/full-length-NA. (refer to Methods). : highly pathogenic.

HA modifications and NA stalk deletions

To understand the potential impact of SΔNA on other viral properties, such as antigenicity, we tested whether any amino acid residues of HA protein features, such as glycosylation sites, are statistically correlated with the SΔNA genotype. Seven HA features were positively associated with SΔNA as indicated by the posterior probability of a positive rate difference (i.e. a rate difference that generates a positive correlation) exceeding 0.95. The features include five glycosylation sites, one deletion and one insertion that are identified in H5, H6 and H7 (Figure 4). All of these associations are NA-subtype-specific because no HA variant is significantly associated with SΔNA in more than one NA subtype (Figure 4, Table 4). The distribution of the seven features on the HA phylogenies is shown in the supplemental material (Figures S6, S7, S8 for H5–H7).

Figure 4

Proportions of HA protein features among viruses with full-length-NA (filled bars) or with SΔNA (open bars).

Table 4

Bayesian estimation of joint transitions between full-length NA and SΔNA and selected HA features.

Subtype	Number of sequences analyzed	Feature	Position	ΔQ mean (95% range)a	Posterior probability that ΔQ>0	Associated NA deletion patternsc
H5N1	119	Glycosylation	158	859 (30; 1776)	0.998	4, 5, 6, 11
H5N2	120	Glycosylation	131	546 (1; 1557)	0.978	25, 33
			240	136 (29; 425)	>0.999	20, 25, 30
H6N1	136	Insertionb	141	943 (212; 1293)	>0.999	1, 2, 4
H7N1	42	Glycosylation	133	520 (13; 1603)	0.987	9, 10
		Glycosylation	159d	749 (114; 1726)	0.995	9
H7N2	138	Deletionb	221	1545 (266; 3046)	0.991	21
		Glycosylation	160	499 (6; 1585)	0.955	39

: Difference of all rates that lead to the character combinations feature present/SΔNA and feature absent/full-length-NA minus all rates that lead the character combinations feature absent/SΔNA and feature present/full-length-NA (refer to Methods).

: Blanks in HA sequences that are associated with SΔNA are designated as deletions and blanks in HA sequences that are associated with full-length-NA are designated as insertions in HA sequences associated with SΔNA.

: NA deletion pattern IDs that are associated with the HA feature that has a higher proportion among AIVs with SΔNA than among AIVs with full-length-NA.

: There is no exact corresponding position on the H3 sequence. This position is on an insertion between position 159 and 160 on the H3 sequence.

Proportions of HA protein features among viruses with full-length-NA (filled bars) or with SΔNA (open bars).

The X-axis indicates the features, where Glyc denotes a glycosylation site, Del denotes a deletion, Ins denotes an insertion and the number indicates the amino acid position according to the H3 numbering. NA subtypes are color-coded (red = N1, blue = N2, orange = N3, brown = N7). Asterisks denote HA features whose SΔNA are statistically correlated with the HA feature since the posterior probability that ΔQ exceeds zero is greater than 0.95 (refer to Methods and Table 4). : Difference of all rates that lead to the character combinations feature present/SΔNA and feature absent/full-length-NA minus all rates that lead the character combinations feature absent/SΔNA and feature present/full-length-NA (refer to Methods). : Blanks in HA sequences that are associated with SΔNA are designated as deletions and blanks in HA sequences that are associated with full-length-NA are designated as insertions in HA sequences associated with SΔNA. : NA deletion pattern IDs that are associated with the HA feature that has a higher proportion among AIVs with SΔNA than among AIVs with full-length-NA. : There is no exact corresponding position on the H3 sequence. This position is on an insertion between position 159 and 160 on the H3 sequence.

Discussion

In this study, we conducted a comprehensive analysis of the polymorphic AIV NA stalk regions using a large set of sequences of natural isolates (as opposed to laboratory-adapted isolates). Forty-seven different NA stalk deletion (SΔNA) patterns were identified in six NA subtypes, N1–N3 and N5–N7. An analysis that controlled for phylogenetic dependence due to shared ancestry showed SΔNA to be positively correlated with gallinaceous hosts and with some amino acid features on the HA. The analysis of the HA features estimated rates at which SΔNA were gained and lost on the HA tree depending on the presence of an HA feature. This analysis concerned only the overall rate of transitions and did not differentiate between transitions on the NA due to reassortment or due to mutation. It was therefore not necessary to estimate the reassortment rate between HA and SΔNA which is unknown for the analyzed dataset. Five glycosylation sites, one insertion and one deletion on the HA were identified to be statistically SΔNA-associated and all of these associations were specific to HA-NA subtype combinations. Our results further suggested that SΔNA mutations essentially cannot be restored to full-length in gallinaceous hosts. One challenge for our analysis was the unevenness of the sequence data due to varying sampling schemes. By fitting transition rates based on phylogenies, we could account for the uneven relatedness and the tendency of shared characteristics among closely related sequences. This was especially important for HPAIVs that are the source of a large number of very closely related sequences. However, when no SΔNAs were observed in non-gallinaceous hosts, the fitted transition rates could not distinguish between a restoration to a full-length NA stalk and an extinction of SΔNA mutants. Despite the limitation of our analysis, the findings shed light on the origin, spread and persistence of SΔNA as well as the functional balance between HA and NA [53]. The results of our analysis that controlled for phylogenetic dependence due to shared NA ancestry suggest that the positive correlation between SΔNA and gallinaceous hosts is caused by host-dependent transition rates between full-length-NA and SΔNA. Previous studies have indicated that naturally-occurring SΔNA mutants tend to appear in gallinaceous hosts but did not demonstrate a general correlation between host and SΔNA [8], [9], [28]. Laboratory studies showed that SΔNA mutations arose from duck-origin viruses after these viruses were passaged through gallinaceous hosts [13], [19], [54]. Our results provide further evidence that the processes demonstrated in laboratory studies generated the patterns observed in naturally-occurring viruses. The emergence of SΔNA in AIV infected gallinaceous hosts seems to be a general phenomenon that is widely detected in several NA subtypes. However, the correlation between gallinaceous hosts and SΔNA mutations is not ubiquitous. In the N2 subclade 2, only 31% of the H9N2 isolates from poultry outbreaks were SΔNA mutants (Tables 2 and S1, Figure S3). As for the N1 subclade of HPAIV H5N1, almost all non-gallinaceous isolates had SΔNA (Figure S2). Besides the HPAIV H5N1 N1 subclade, there were 97 SΔNA sequences of other subtypes from non-gallinaceous isolates. Hence, SΔNA mutants appear to be generated in Galliformes but can be transmitted to other hosts. Whether the SΔNA of other subtypes can persist at low prevalence in non-gallinaceous birds is unknown. The high diversity of deletion patterns found in several NA subtypes suggests that these deletions convey a general advantage for the viruses in gallinaceous hosts that does not depend on the particular NA amino acid sequence. The fact that the same deletion pattern arose more than once in several instances indicates that the number of possible deletion patterns that confer this advantage is limited. Our results showed a positive correlation between Asian locations and SΔNAs in three instances (both subclades of N2 and subclade 1 of N1). This correlation could be due to different farm practices in Asia (lower compliance with biosecurity measures and more mixed-species farms) and/or AIV sampling schemes biased towards poultry species in Asia. Among HP H5N1, there is a negative correlation between Asian locations and SΔNAs because all HP H5N1 that spread from Asia to other regions carried SΔNAs whereas within Asia two forms of NA (SΔNA and full-length-NA) of HP H5N1 co-existed (Figure S2). This study also uncovered amino acid features on HA proteins that are statistically correlated with SΔNA. Previous studies demonstrated that shortened NA stalk length reduces NA activity [25] and that an efficient virus replication requires a matching reduction of HA activity [32], [55]. It has been shown for H1N1 and H7N1 that glycosylation sites on the HA globular head structure reduce HA affinity for receptor binding and make the virus less dependent on NA function [32], [56], [57], [58]. Viruses with these additional HA glycosylation sites replicated efficiently when combined with SΔNA and were less sensitive to NA inhibiting drugs [32], [33], [56], [57], [58]. In H7N1, the glycosylation site that conferred these effects was Asn149 [32], [33], [57] (or 133 according to the H3 numbering [59]), the same glycosylation site that is significantly correlated with SΔNA on N1 according to our results (Figure 4, Table 4). Our analysis showed other putative glycosylation sites in HA of other subtypes, as well as insertion and deletion that are statistically correlated with SΔNA (Figure 4, Table 4). Given the requirement of functional balance between HA and NA [32], [56], [57], [58], we speculate that the HA features identified by our methods could also reduce the HA affinity for host cell receptors and therefore, the virus' sensitivity to NA inhibitors. The fact that SΔNAs are widely associated with isolates of gallinaceous hosts while each HA feature occurs only with a single NA subtype could be an evidence that SΔNAs are adaptations to gallinaceous hosts whereas the accompanying HA features are secondary adaptations to SΔNAs. In theory, some of the correlations between HA features and SΔNAs could be the by-product of two pair-wise correlations with a third feature on an internal gene. Investigating such effects would require techniques to analyze multiple correlations while controlling for phylogenetic dependencies, which will be topics for future studies. In summary, our results showed that SΔNAs are widely observed in AIVs. They are repeatedly associated with poultry outbreaks but are also found in non-poultry hosts. SΔNA mutants should be of special concern for the poultry industry since they could imply an adaptation of a virus to gallinaceous hosts. Previous researchers suggested that these viruses bear the risk of a pandemic [60], [61], [62]. AIV with SΔNA and SΔNA-associated HA features might be less sensitive to NA inhibiting drugs or reduce the efficacy of vaccines developed using a similar virus with full-length NA. Therefore, we believe it is important to closely monitor the emergence of SΔNA mutants in poultry and other species and prevent extended AIV circulation in poultry.

Methods

Sequence retrieval

AIV NA and HA nucleotide sequences and HA amino acid sequences were retrieved on December 17, 2009 from public influenza database (http://www.flu.lanl.gov/) using keywords type A and avian host. Nucleotide sequences of all lengths were retrieved while amino acid sequences were restricted to full length. NA nucleotide sequences were excluded from the analysis if the stalk region was not fully sequenced, i.e. if the first sequenced nucleotide position was after position 90 or the last sequenced position was before 270, counting adenine (A) in the start codon (ATG) as position 1. Furthermore, HA and NA sequences were excluded if they were acquired from viruses that were manipulated in laboratories (e.g. being expanded through laboratory animal or modified for the purpose of vaccine development), or if sequences were duplicates of an earlier submission of the same gene of the same strain, or erroneous as identified in a previous publication [63], or from non-avian isolates. A total of 4,920 NA, 5,596 HA nucleotide and 4,702 HA amino acid sequences were included in this study.

Sequence alignment and identification of stalk deletions

All qualified nucleotide sequences of the same NA subtype were aligned using Clustal W (BioEdit v7.0.5, Ibis Therapeutics, Carlsbad, CA). All nucleotide sequences were translated into amino acid sequences within each subtype alignment using the same program. Deletion patterns in the stalk region (positions 30 to 90 for amino acids and 90 to 270 for nucleotides) were adjusted manually at both nucleotide and amino acid levels by moving nucleotides or amino acids between both positions that flank the deleted region. Nucleotides were adjusted to avoid stretches of missing nucleotides being flanked by incomplete codons. After translation, amino acids flanking a deletion were assigned to the flanking positions that best matched the consensus sequence. If permitted by the consensus sequences, deletion patterns were further adjusted to minimize the numbers of distinct deletion patterns. Deletion patterns were ordered by NA subtype and by size of the deleted region within each NA subtype. An identification (ID) number was assigned to each deletion pattern by numbering the ordered deletion patterns consecutively. Phylogenetic trees were constructed for each NA subtype from aligned NA nucleotide sequences using weighted neighbor-joining (function bionj in the R-package ape) [64]. Trees were rooted with corresponding gene segments of the 1918 H1N1 human virus. The tree branches were labeled with deletion patterns. Due to their sizes, all trees are included in the supplemental material (Figures S2, S3, S4, S5). The first and last observation of each separate emerging SΔNA was recorded. Each different SΔNA pattern and distinct clades of the same SΔNA pattern were counted as separate emerging SΔNA. Two clades of the same SΔNA pattern were counted as distinct if they were separated by non-deleted sequences and had an estimated distance of more than 1% nucleotide changes between their nodes.

Identification of HA features correlated with NA stalk deletion

HA nucleotide and amino acid sequences were aligned using MUSCLE with two alignment iterations [65], [66]. Aligned HA amino acid sequences, excluding positions within cleavage site, were screened for three possible features, namely putative glycosylation sites (NXT/S, where X could be any amino acid residues except Proline) [28], deletions or insertions. A script was written in R [67] that identifies the positions of all three features in all HA sequences (Script S1). HA and NA sequences were linked by strain names. HA and NA sequences with matching location, host species, serial number and year of sampling were assumed to be from the same isolate. The H3 numbering system was used for all HA subtypes as described in a previous study [59].

Estimation of correlations between two characters

We tested for correlations of SΔNA with host, with HA features and with geographic region while controlling for phylogenetic dependence due to shared NA or HA ancestry. The associations between NA stalk state and host or region were analyzed based on NA trees. The association between NA stalk state and HA features was analyzed based on HA trees. Characters of interest (i.e. state of NA stalk, host type, presence of HA feature and geographic region) were coded as binary characters and estimating the transition rates among the four possible combinations of two binary characters based on phylogenetic trees using the software package BayesTraits [68], [69] (Figure 5). Character A represents NA stalk state (with or without stalk deletion) and character B can be either host (gallinaceous or non-gallinaceous), HA amino acid feature (present or absent) or region (Asia or non-Asia). Independent character evolution would imply that the transition rates between two states of one character do not depend on the state of the other character.

Figure 5

Illustration of all possible combinations of two binary characters (A and B) and the transition rates between these combinations.

Illustration of all possible combinations of two binary characters (A and B) and the transition rates between these combinations.

The model fits rates for the transitions per infinitely small time interval and therefore allows the change of only one state at a time (i.e. no rates are fitted for diagonal transitions). The shaded character combinations are expected to dominate if characters A and B are positively correlated. This positive correlation is created by transition rates if ΔQ = q 21+q 23+q 41+q 43−(q 12+q 14+q 32+q 34)>0. For a given association between character A and B, there are two supporting and two opposing character state combinations described in Figure 5. For example, an association between gallinaceous hosts and SΔNAs is supported by the character combinations Galliformes/SΔNA and non-Galliformes/full-length-NA and opposed by non-Galliformes/SΔNA and Galliformes/full-length-NA. The difference between all rates leading to the two supporting character state combinations minus all rates leading to the opposing character state combinations, ΔQ, was calculated as a measure for whether the character correlation is generated by interdependent transition rates (Figure 5). Using a Bayesian framework, posterior distributions of the transition rates were estimated given a distribution of phylogenetic trees. The distributions of phylogenetic HA and NA trees were estimated by fitting a general time reversible (GTR) model with gamma distributed rates to nucleotide sequences using MrBayes 3.1.2 [70]. The Markov chains for the trees were run for 3 Mio time steps with a burn-in period of 250,000 and trees were sampled every 2,000 steps leading to 1375 sampled trees per chain. Two independent chains were run for each tree and it was verified that the chains converged to the same tree by comparing the topological distances between trees within and between the chains. Each BayesTraits analysis sampled from a total of 2750 trees per subtype. Analysis using large trees can lead to log-likelihood values of the transition model below the smallest number that the BayesTraits code can represent. To avoid such low log-likelihood values, sequences were divided into groups and a separate parameter estimation was performed per group. NA sequences were grouped into NA subtypes and N1 and N2 were divided into two subclades within each subtype. N1 sequences were distinguished into the subclade that contains all HPAIV H5N1 NAs or all other N1 sequences. N2 sequences were separated into those from isolates of H9N2 outbreaks in China and those from other isolates. For HA sequences, trees were constructed for each HA-NA subtype, e.g. a separate tree was constructed for HA sequences from H5N1 viruses and from H5N2 viruses. HA sequences were grouped in this fashion since HA trees (Figures S6,S7, S8) suggested that the association between HA features and SΔNA depended on NA subtype. The N1 and H5 trees of the HPAIV H5N1 clade were further reduced by randomly selecting a single sequence from all clades whose maximum distance from its basal node was less than 0.01. Representative sequences belonging to closely related branches were retained by this trimming method. A reversible-jump Monte Carlo Markov Chain was used to simultaneously estimate the posterior distributions of transition rates and the posterior probabilities of constraints imposed on the transition rates [69]. A posterior distribution of ΔQ was calculated. A range containing 95% of ΔQ values and a posterior probability that ΔQ exceeds zero was determined from the posterior distribution. A high posterior probability of ΔQ exceeding zero was interpreted as evidence that observed positive correlations between characters were due to correlated transition rates and not simply due to shared ancestry. The correlation between SΔNA and HA features were analyzed based on HA trees. HA trees were chosen because in that case SΔNA can be acquired or lost by mutation and/or reassortment. Hence the NA states are a more labile character on the HA tree than on the NA tree. Larger volatility of the NA state is more likely to reveal functional correlations if they are present. A single rate was fitted for each transition, without distinguishing whether transitions were due to mutations or reassortment. Proportions of sequences with deleted amino acids at positions in the stalk region of AIV Neuraminidases. (0.29 MB EPS) Click here for additional data file. Neighbor-joining tree of N1 sequences. Branch colors indicate host types, pink for Galliformes, yellow for Anseriformes, green for Charadriiformes and grey for birds from other orders. Genes of isolates from gallinaceous hosts are shown by tree branches with extended grey lines. The first column from the left shows a black dash for each isolate with SΔNA. The second column shows the deletion pattern ID. Sequences with the same deletion pattern are denoted by square brackets numbered with pattern ID. Deletion patterns that are nested within larger deletion pattern clades are indicated by deletion pattern IDs on the left side of the brackets. The third column indicates the HA subtype of each isolate. The fourth column shows the continent each isolated is from. The two N1 subclades are shown by brackets in the last column. The box indicates the part of the tree that is shown in Figure 2. (0.63 MB EPS) Click here for additional data file. Neighbor-joining tree of N2 sequences. Symbol descriptions are the same as in Figure S2. (0.49 MB EPS) Click here for additional data file. Neighbor-joining tree of N3 sequences. Symbol descriptions are the same as in Figure S2. (0.27 MB EPS) Click here for additional data file. Neighbor-joining tree of N7 sequences. Symbol descriptions are the same as in Figure S2. (0.24 MB EPS) Click here for additional data file. Neighbor-joining tree of H5 sequences. Branch colors indicate host types, pink for Galliformes, yellow for Anseriformes, green for Charadriiformes and grey for birds from other orders. Genes of isolates from gallinaceous hosts are shown in pink branches with extended grey lines. The columns from left to right show NA stalk state for each sequence (a pattern ID is given for each isolate with SΔNA), NA subtype, continent of origin and various HA protein features. NA subtypes and continents are color-coded. The presence of described HA features (Glyc = glycosylation, Del = deletion and Ins = insertion) are shown by black dashes. (0.61 MB EPS) Click here for additional data file. Neighbor-joining tree of H6 sequences. Symbol descriptions are the same as in Figure S6. Strain names are shown. (0.29 MB EPS) Click here for additional data file. Neighbor-joining tree of H7 sequences. Symbol descriptions are the same as in Figure S6. Strain names are shown. (0.31 MB EPS) Click here for additional data file. Summary of HA/NA subtypes, prevalence, deleted region, sampling location and time per deletion pattern. (0.03 MB DOC) Click here for additional data file. R code to identify glycosylation sites and deletions in amino acid sequences. (0.00 MB TXT) Click here for additional data file.

67 in total

1. Use of reverse transcriptase PCR to subtype N1 to N9 neuraminidase genes of avian influenza viruses.

Authors: Kenji Tsukamoto; Takayoshi Ashizawa; Koji Nakanishi; Noriyuki Kaji; Kotaro Suzuki; Makiko Shishido; Masatoshi Okamatsu; Masaji Mase
Journal: J Clin Microbiol Date: 2009-04-29 Impact factor: 5.948

Review 2. Evolution and ecology of influenza A viruses.

Authors: R G Webster; W J Bean; O T Gorman; T M Chambers; Y Kawaoka
Journal: Microbiol Rev Date: 1992-03

3. Identifying errors in avian influenza virus gene sequences and implications for data usage of public databases.

Authors: Jinling Li; Heinrich zu Dohna; Joy Miller; Carol J Cardona; Tim E Carpenter
Journal: Genomics Date: 2009-09-18 Impact factor: 5.736

4. Characterization of the pathogenicity of members of the newly established H9N2 influenza virus lineages in Asia.

Authors: Y J Guo; S Krauss; D A Senne; I P Mo; K S Lo; X P Xiong; M Norwood; K F Shortridge; R G Webster; Y Guan
Journal: Virology Date: 2000-02-15 Impact factor: 3.616

5. Balanced hemagglutinin and neuraminidase activities are critical for efficient replication of influenza A virus.

Authors: L J Mitnaul; M N Matrosovich; M R Castrucci; A B Tuzikov; N V Bovin; D Kobasa; Y Kawaoka
Journal: J Virol Date: 2000-07 Impact factor: 5.103

6. The influenza virus gene pool in a poultry market in South central china.

Authors: Ming Liu; Shiqin He; David Walker; NanNan Zhou; Daniel R Perez; Bing Mo; Fan Li; Xiaotian Huang; Robert G Webster; Richard J Webby
Journal: Virology Date: 2003-01-20 Impact factor: 3.616

7. Adaptation of influenza A/Mallard/Potsdam/178-4/83 H2N2 virus in Japanese quail leads to infection and transmission in chickens.

Authors: E M Sorrell; D R Perez
Journal: Avian Dis Date: 2007-03 Impact factor: 1.577

8. Characterization of recent H5 subtype avian influenza viruses from US poultry.

Authors: Chang-Won Lee; Dennis A Senne; Jose A Linares; Peter R Woolcock; David E Stallknecht; Erica Spackman; David E Swayne; David L Suarez
Journal: Avian Pathol Date: 2004-06 Impact factor: 3.378

9. A novel genotype H9N2 influenza virus possessing human H5N1 internal genomes has been circulating in poultry in eastern China since 1998.

Authors: Pinghu Zhang; Yinghua Tang; Xiaowen Liu; Wenbo Liu; Xiaorong Zhang; Hongqi Liu; Daxin Peng; Song Gao; Yantao Wu; Luyong Zhang; Shan Lu; Xiufan Liu
Journal: J Virol Date: 2009-06-24 Impact factor: 5.103

10. MUSCLE: a multiple sequence alignment method with reduced time and space complexity.

Authors: Robert C Edgar
Journal: BMC Bioinformatics Date: 2004-08-19 Impact factor: 3.169

48 in total

1. Genetic evolution of the H9N2 avian influenza virus in Korean poultry farms.

Authors: Chang-Hee Lee; Sung-Hwan Byun; Youn-Jeong Lee; In-Pil Mo
Journal: Virus Genes Date: 2012-03-28 Impact factor: 2.332

2. Highly pathogenic avian influenza H5N1 clade 2.3.2.1 and clade 2.3.4 viruses do not induce a clade-specific phenotype in mallard ducks.

Authors: Mariette Ducatez; Stephanie Sonnberg; Jeri Carol Crumpton; Adam Rubrum; Phouvong Phommachanh; Bounlom Douangngeun; Malik Peiris; Yi Guan; Robert Webster; Richard Webby
Journal: J Gen Virol Date: 2017-06-20 Impact factor: 3.891

3. Length variations in the NA stalk of an H7N1 influenza virus have opposite effects on viral excretion in chickens and ducks.

Authors: T W Hoffmann; S Munier; T Larcher; D Soubieux; M Ledevin; E Esnault; A Tourdes; G Croville; J-L Guérin; P Quéré; R Volmer; N Naffakh; D Marc
Journal: J Virol Date: 2011-10-19 Impact factor: 5.103

4. Influenza H7N9 and H9N2 viruses: coexistence in poultry linked to human H7N9 infection and genome characteristics.

Authors: Xinfen Yu; Tao Jin; Yujun Cui; Xiaoying Pu; Jun Li; Jin Xu; Guang Liu; Huijue Jia; Dan Liu; Shili Song; Yang Yu; Li Xie; Renjie Huang; Hua Ding; Yu Kou; Yinyan Zhou; Yayu Wang; Xun Xu; Ye Yin; Jian Wang; Chenyi Guo; Xianwei Yang; Liangping Hu; Xiaopeng Wu; Hailong Wang; Jun Liu; Guoqiu Zhao; Jiyong Zhou; Jingcao Pan; George F Gao; Ruifu Yang; Jun Wang
Journal: J Virol Date: 2014-01-08 Impact factor: 5.103

5. Avian influenza from an ecohealth perspective.

Authors: Les Simms; Martyn Jeggo
Journal: Ecohealth Date: 2014-04-24 Impact factor: 3.184

Review 6. Transmission of influenza A viruses.

Authors: Gabriele Neumann; Yoshihiro Kawaoka
Journal: Virology Date: 2015-03-24 Impact factor: 3.616

Review 7. Genetic Adaptation of Influenza A Viruses in Domestic Animals and Their Potential Role in Interspecies Transmission: A Literature Review.

Authors: Olga Munoz; Marco De Nardi; Karen van der Meulen; Kristien van Reeth; Marion Koopmans; Kate Harris; Sophie von Dobschuetz; Gudrun Freidl; Adam Meijer; Andrew Breed; Andrew Hill; Rowena Kosmider; Jill Banks; Katharina D C Stärk; Barbara Wieland; Kim Stevens; Sylvie van der Werf; Vincent Enouf; Gwenaelle Dauphin; William Dundon; Giovanni Cattoli; Ilaria Capua
Journal: Ecohealth Date: 2015-01-29 Impact factor: 3.184