When avian influenza viruses (AIVs) are transmitted from their reservoir hosts (wild waterfowl and shorebirds) to domestic bird species, they undergo genetic changes that have been linked to higher virulence and broader host range. Common genetic AIV modifications in viral proteins of poultry isolates are deletions in the stalk region of the neuraminidase (NA) and additions of glycosylation sites on the hemagglutinin (HA). Even though these NA deletion mutations occur in several AIV subtypes, they have not been analyzed comprehensively. In this study, 4,920 NA nucleotide sequences, 5,596 HA nucleotide and 4,702 HA amino acid sequences were analyzed to elucidate the widespread emergence of NA stalk deletions in gallinaceous hosts, the genetic polymorphism of the deletion patterns and association between the stalk deletions in NA and amino acid variants in HA. Forty-seven different NA stalk deletion patterns were identified in six NA subtypes, N1-N3 and N5-N7. An analysis that controlled for phylogenetic dependence due to shared ancestry showed that NA stalk deletions are statistically correlated with gallinaceous hosts and certain amino acid features on the HA protein. Those HA features included five glycosylation sites, one insertion and one deletion. The correlations between NA stalk deletions and HA features are HA-NA-subtype-specific. Our results demonstrate that stalk deletions in the NA proteins of AIV are relatively common. Understanding the NA stalk deletion and related HA features may be important for vaccine and drug development and could be useful in establishing effective early detection and warning systems for the poultry industry.
When avian influenza viruses (AIVs) are transmitted from their reservoir hosts (wild waterfowl and shorebirds) to domestic bird species, they undergo genetic changes that have been linked to higher virulence and broader host range. Common genetic AIV modifications in viral proteins of poultry isolates are deletions in the stalk region of the neuraminidase (NA) and additions of glycosylation sites on the hemagglutinin (HA). Even though these NA deletion mutations occur in several AIV subtypes, they have not been analyzed comprehensively. In this study, 4,920 NA nucleotide sequences, 5,596 HA nucleotide and 4,702 HA amino acid sequences were analyzed to elucidate the widespread emergence of NA stalk deletions in gallinaceous hosts, the genetic polymorphism of the deletion patterns and association between the stalk deletions in NA and amino acid variants in HA. Forty-seven different NA stalk deletion patterns were identified in six NA subtypes, N1-N3 and N5-N7. An analysis that controlled for phylogenetic dependence due to shared ancestry showed that NA stalk deletions are statistically correlated with gallinaceous hosts and certain amino acid features on the HA protein. Those HA features included five glycosylation sites, one insertion and one deletion. The correlations between NA stalk deletions and HA features are HA-NA-subtype-specific. Our results demonstrate that stalk deletions in the NA proteins of AIV are relatively common. Understanding the NA stalk deletion and related HA features may be important for vaccine and drug development and could be useful in establishing effective early detection and warning systems for the poultry industry.
Wild aquatic birds, such as Anseriformes (ducks and swans) and Charadriiformes (gulls and shorebirds) are reservoir hosts for avian influenza viruses (AIV) [1], [2], [3]. However, AIVs can cause outbreaks in poultry [4], [5], [6], [7], [8]. In some instances, AIV strains from poultry hosts have increased pathogenicity for poultry species [9], [10], and have acquired an ability to infect mammalian hosts [11], [12], [13], and/or have caused fatal infections in humans [14], [15]. Therefore, to minimize adverse effects in humans and poultry from AIV infections, it is important to understand what evolutionary changes occur in AIVs when they are transmitted from wild birds to poultry. Understanding these evolutionary changes can lead to better detection, prevention and control strategies.AIVs interact with their hosts mostly through two glycoproteins, Hemagglutinin (HA) and Neuraminidase (NA). HA recognizes receptors on target cells and NA, a sialidase, assists virus entry and release [16], [17]. One observation that has been reported in viruses isolated during separate poultry outbreaks is a deletion in the stalk region of the NA [18], [19], [20], [21]. The stalk is a structure that separates the enzymatically and antigenically active “head” from the hydrophobic domain embedded in the viral membrane [22], [23], [24]. Little is known about the biological function of the NA stalk. Previous studies have shown that deletions in the NA stalk region influenced the virus' replication efficiency in vivo, increased its host range, reduced its NA enzymatic activity, and in some cases increased the virus' virulence [13], [25], [26]. Stalk deleted NAs (referred to as SΔNA hereafter) were reported sporadically in some AIV subtypes, e.g. H5N1, H6N1, H7N1, H7N3 and H9N2 [8], [9], [27], [28], [29]. SΔNA are often accompanied by observations of variants on the HA protein, such as the addition of glycosylation sites, presumably to maintain functional balance between HA and NA which is necessary for viral infectivity [30], [31], [32]. These HA variants could further influence viral antigenicity, virulence and pathogenicity [32], [33].Although the SΔNA has been identified in different subtypes of poultry isolates, it is unclear whether there is a general correlation of SΔNAs with species in the order of Galliformes across different NA subtypes as claimed in previous publications [8], [9]. Galliformes (gallinaceous hosts) is an order of birds that includes important domestic and game birds, such as chickens, turkeys, pheasants, and quails. The aim of this study is to provide a broad understanding of the emergence of SΔNA through a comprehensive analysis of NA sequences. Our analysis showed SΔNA prevalence by subtypes, host and time period and demonstrated – with some exceptions – a general correlation between SΔNA and gallinaceous hosts. In addition, we analyzed the statistical correlation between SΔNA and variants on HA proteins.
Results
Occurrence of NA stalk deletion mutations
Among 4,920 neuraminidase sequences analyzed, 45.5% (2,238) carry stalk deletion mutations, i.e. 45.5% lost a stretch of amino acid residues in the stalk region (Table 1). The earliest SΔNA on record was from an H7N1chicken-origin virus sampled in 1934 in Germany [34]. SΔNAs were observed with varying prevalence in six of nine NA subtypes (N1–N3 and N5–N7). Eighteen of 109 (16.5%) reported HA-NA subtypes in the public database (http://www.flu.lanl.gov/) contain SΔNAs. Eleven of these 18 (61%) HA-SΔNA subtypes include cases from poultry outbreaks as supported by publications [4], . Viruses from nine of 16 (56%) HA subtypes (H2–H7 and H9–H11) had SΔNA genes (Table 1). Forty-seven distinctive SΔNA patterns were identified among six NA subtypes with deleted portions ranging from one amino acid to 36 amino acid residues (Figure 1, Table S1). The only length polymorphism that we did not count as stalk deletion pattern was a Serine at position 40 among N5 sequences that was present in all North American isolates and with no counterpart amino acid in any Eurasian isolate.
Table 1
Occurrence of NA stalk deletions (SΔNA) among all reported avian influenza HA-NAa subtypes.
N1
N2
N3
N4
N5
N6
N7
N8
N9
N1–N9
H1
136b
5
7
-
5
4
+
-
4
161
H2
19
*1/11
42
1
2
1
3
4
18
1/101
H3
25
*1/110
9
2
9
48
3
239
3
1/448
H4
4
1/19
7
5
4
1/213
4
52
3
2/311
H5
*1526c/1665b
*97/212
*3/58
1
1
1
6
2
8
1626/1954
H6
*113/184
*20/122
5
2
2/17
10
+
44
3
135/387
H7
*49/71
*144/163
*30/127
6
1
2
4/54
4
4
227/432
H8
-
+
1
35
1
-
1
-
-
38
H9
9
*229/705
2
1
3
3
+
1
-
229/724
H10
5
2
12/19
4
3
6
3/110
6
2
15/157
H11
10
22
1/8
1
1
3
-
2
71
1/118
H12
1
+
1
3
36
-
-
-
3
44
H13
-
3
1
-
-
10
-
2
7
23
H14
-
-
-
-
2
-
-
-
-
2
H15
-
+
-
-
-
+
-
1
4
5
H16
-
-
10
-
-
-
-
-
1
11
unknown
3
1/1
1/4
Total samples analyzed
2,132/441d
1,375
297
61
85
301
181
357
131
4,920/3,229d
Total # of SΔNA
1,688/174d
494
46
0
2
1
7
0
0
2,238/724d
SΔNA prevalence (%)
79.17/39.45d
35.93
15.49
0
2.35
0.33
3.87
0
0
45.49/22.42d
: hemagglutinin-neuraminidase.
: Single number and the number behind slash denote the number of samples analyzed.
: Number before slash denotes the number of sequences with SΔNA.
: excluding sequences from HPAI H5N1 viruses.
-: No isolate has been reported for the subtype in publically accessible database, GenBank.
+:No samples were included due to unqualified sequences of the reported HA-NA subtype.
*: Some cases of the subtype were reported in poultry outbreaks.
Figure 1
Patterns and prevalence of AIV SΔNA.
Bars represent positions of missing amino acid residues. Colors indicate NA subtypes (red = N1, blue = N2, orange = N3, green = N5, purple = N6 and brown = N7). The first column on the left shows the deletion IDs assigned to each pattern. The second column shows the number of sequences having the corresponding deletion pattern and in parentheses the percentage of this pattern among all sequences of the same NA subtype. Percentages do not sum to 100 for the entire graph since they are calculated per NA subtype.
Patterns and prevalence of AIV SΔNA.
Bars represent positions of missing amino acid residues. Colors indicate NA subtypes (red = N1, blue = N2, orange = N3, green = N5, purple = N6 and brown = N7). The first column on the left shows the deletion IDs assigned to each pattern. The second column shows the number of sequences having the corresponding deletion pattern and in parentheses the percentage of this pattern among all sequences of the same NA subtype. Percentages do not sum to 100 for the entire graph since they are calculated per NA subtype.: hemagglutinin-neuraminidase.: Single number and the number behind slash denote the number of samples analyzed.: Number before slash denotes the number of sequences with SΔNA.: excluding sequences from HPAI H5N1 viruses.-: No isolate has been reported for the subtype in publically accessible database, GenBank.+:No samples were included due to unqualified sequences of the reported HA-NA subtype.*: Some cases of the subtype were reported in poultry outbreaks.The site of deletion and the frequency of each missing amino acid vary among NA subtypes. For each NA subtype, the most commonly dispensed positions are from 55 to 70 among all SΔNA subtypes except N5 (Figure S1).
Distribution of NA stalk deletions
The phylogenetic trees of NA nucleotides show that deletion patterns usually group into monophyletic clades that contain sequences belonging to a single deletion pattern (Figures 2, S2, S3, S4, S5 for N1, N2, N3 and N7 genes). Exceptions to monophyletic deletions are instances in which the same deletion pattern arose multiple times independently (deletion patterns 9, 12, 22, and 29, Figures 2, and S3), or patterns that are nested within a larger clade with a different deletion pattern (patterns 6, 7 and 10 within pattern 5, and pattern 17 within pattern 14). The patterns in the latter exception were most likely derived from existing patterns with additional deletions since they have the same deleted amino acid positions as those in the clade they are nested in (Figures 1, 2 and S3).
Figure 2
Phylogenetic tree of a subset of N1 sequences and distribution of SΔNA patterns (constructed in MrBayes 3.1.2).
Values at nodes show estimated posterior probabilities for bipartitions. Branch colors indicate host order. Genes of isolates from gallinaceous hosts are shown by tree branches with extended grey lines. The first column from the left shows a black dash for each isolate with SΔNA. The second column shows the deletion pattern ID. Sequences with the same deletion pattern are denoted by square brackets numbered with pattern ID. The third column shows which continent each isolate was from and the fourth column indicates the HA subtype of each isolate.
Phylogenetic tree of a subset of N1 sequences and distribution of SΔNA patterns (constructed in MrBayes 3.1.2).
Values at nodes show estimated posterior probabilities for bipartitions. Branch colors indicate host order. Genes of isolates from gallinaceous hosts are shown by tree branches with extended grey lines. The first column from the left shows a black dash for each isolate with SΔNA. The second column shows the deletion pattern ID. Sequences with the same deletion pattern are denoted by square brackets numbered with pattern ID. The third column shows which continent each isolate was from and the fourth column indicates the HA subtype of each isolate.Most (38/47, 79%) of the SΔNA patterns were associated with a single HA subtype and a few patterns combined with multiple HA subtypes (9/47, 19%) (Table S1). Most mutants were limited to small geographic areas and a few patterns were found in isolates from multiple locations (10/47, 21%) (Table S1). The majority of SΔNAs (67%) persisted for less than a year (Figure 3). However, some patterns (4, 5, 14, 19, and 22) existed for many years (Figure 3, Table S1). Pattern 22 persisted for 25 years and appeared in both Asian and North American isolates (Figure 3 and S3).
Figure 3
Spatial-temporal distribution of SΔNA patterns.
Each line indicates an instance at which an SΔNA pattern emerged (refer to
for definition). Pattern IDs on the left are the same as in
Spatial-temporal distribution of SΔNA patterns.
Each line indicates an instance at which an SΔNA pattern emerged (refer to
for definition). Pattern IDs on the left are the same as in
Gallinaceous hosts and NA stalk deletions
The percentage of SΔNA virus was higher among gallinaceous (1,355/1,955, 69%) than non-gallinaceous hosts (893/2,424, 36%), mainly in the orders Charadriiformes and Anseriformes. A high percentage of SΔNA mutants was observed among non-gallinaceous isolates in N1 deletions in the 2000s (804/1,086, 74%) due to the large number (∼800) of HPAIV H5N1 samples from infected wild birds. Apart from the HPAI H5N1 viruses, there were 97 SΔNA isolates from non-gallinaceous hosts. Among them, only two SΔNAs of the non-gallinaceous isolates had no obvious connection to domestic poultry. One was an H5N3 virus from a migrating mallard in Japan [45] and the other an H6N5 virus from a shearwater in Australia [46]. Most other SΔNAs of non-gallinaceous isolates were either from domestic birds or linked to poultry for the following reasons: (i) 33 were isolated from farm ducks [8], [47], [48], [49], (ii) ten were isolated from birds in live bird markets [50], [51], [52], (iii) 18 were isolated from hosts species that were not indigenous to the geographic region of the isolate, therefore, domestic, and (iv) the remaining 34 clustered within gallinaceous isolates in the phylogenetic tree.Our analysis of the joint transition rates between all combinations of host types and NA states showed that estimated transition rates generally created a positive correlation between gallinaceous hosts and SΔNA. The rates leading to character combinations supporting a positive correlation (the combinations of gallinaceous/SΔNA and non-gallinaceous/full-length-NA) were subtracted from the rates leading to opposing character combinations (non-gallinaceous/SΔNA and gallinaceous/full-length-NA). A posterior distribution for this rate difference (ΔQ) was estimated. From this posterior distribution, a posterior probability that the rate difference exceeds zero was calculated (Table 2). A positive rate difference indicates that the positive correlation between gallinaceous hosts and SΔNA is due to uneven transition rates. For subclade 1 of N1, subclade 1 of N2 and N3, the posterior probabilities for a positive rate difference were greater than 0.95 (Table 2). However, this is not true for the following two groups, subclade 2 in N1 and subclade 2 in N2. Subclade 2 of N2, composed of N2 genes of H9N2 isolates of poultry outbreaks in China, includes 89% viruses from gallinaceous hosts but contains only 31% SΔNA mutants (Figure S3). The reverse is true for the HPAIV H5N1 subclade that contains 53% viruses from non-gallinaceous hosts yet has 99% SΔNA sequences (Figure S2).
Table 2
Bayesian estimation of joint transitions between gallinaceous and non-gallinaceous hosts and full-length-NA and SΔNA.
Posterior probability that deletions are irreversible
Subtype
Groupa
Number of sequences analyzed
ΔQ mean (95% range)b
Posterior probability that ΔQ>0
Non- Galliformes
Galliformes
N1
Subclade 1 (not HPc H5N1)
498
386 (316; 468)
1
0.04
0.92
Subclade 2 (HP H5N1)
1291
225 (−890; 1400)
0.65
0.26
0.61
N2
Subclade 1 (not Eurasian H9N2)
592
425 (213; 567)
1
0
0.97
Subclade 2 (Eurasian H9N2)
625
−13 (−199; 114)
0.12
0
0.98
N3
Entire tree
261
87 (0–513)
0.97
0.06
0.3
: Analysis was performed separately for major clades when the trees were too large.
: Difference of all rates that lead to the character combinations Galliformes/SΔNA and non-Galliformes/full-length-NA minus all rates that lead the character combinations non-Galliformes/SΔNA and Galliformes/full-length-NA. (refer to Methods).
: highly pathogenic.
: Analysis was performed separately for major clades when the trees were too large.: Difference of all rates that lead to the character combinations Galliformes/SΔNA and non-Galliformes/full-length-NA minus all rates that lead the character combinations non-Galliformes/SΔNA and Galliformes/full-length-NA. (refer to Methods).: highly pathogenic.Models with a zero transition rate from deleted to full-length NA in Galliformes have high posterior probabilities among N1 and N2 sequences. This result implies that restoration of full-length-NA from SΔNA is unlikely for N1 and N2 in Galliformes (Table 2). These posterior probabilities are generally low in non-Galliformes (Table 2).
Geographic area and NA stalk deletions
We tested for a correlation between SΔNA and Asia for N1, N2 and N3. In subclade 1 of N1 and in both N2 subclades, there is a significant positive correlation between SΔNA and an isolate being from Asia (Table 3). In subclade 2 of N1, the correlation is negative and no correlation was found for N3 (Table 3).
Table 3
Bayesian estimation of joint transitions between Asia and non-Asia and full-length-NA and SΔNA.
Subtype
Groupa
ΔQ mean (95% range)b
Posterior probability that ΔQ>0
N1
Subclade 1 (not HPc H5N1)
213 (145; 256)
1
Subclade 2 (HPc H5N1)
−1184 (−1515; −673)
0
N2
Subclade 1 (not Eurasian H9N2)
286 (170; 401)
1
Subclade 2 (Eurasian H9N2)
771 (373; 1563)
1
N3
Entire tree
−207 (−511; 2)
0.19
: Analysis was performed separately for major clades when the trees were too large.
: Difference of all rates that lead to the character combinations Asia/SΔNA and non-Asia/full-length-NA minus all rates that lead the character combinations non-Asia/SΔNA and Asia/full-length-NA. (refer to Methods).
: highly pathogenic.
: Analysis was performed separately for major clades when the trees were too large.: Difference of all rates that lead to the character combinations Asia/SΔNA and non-Asia/full-length-NA minus all rates that lead the character combinations non-Asia/SΔNA and Asia/full-length-NA. (refer to Methods).: highly pathogenic.
HA modifications and NA stalk deletions
To understand the potential impact of SΔNA on other viral properties, such as antigenicity, we tested whether any amino acid residues of HA protein features, such as glycosylation sites, are statistically correlated with the SΔNA genotype. Seven HA features were positively associated with SΔNA as indicated by the posterior probability of a positive rate difference (i.e. a rate difference that generates a positive correlation) exceeding 0.95. The features include five glycosylation sites, one deletion and one insertion that are identified in H5, H6 and H7 (Figure 4). All of these associations are NA-subtype-specific because no HA variant is significantly associated with SΔNA in more than one NA subtype (Figure 4, Table 4). The distribution of the seven features on the HA phylogenies is shown in the supplemental material (Figures S6, S7, S8 for H5–H7).
Figure 4
Proportions of HA protein features among viruses with full-length-NA (filled bars) or with SΔNA (open bars).
The X-axis indicates the features, where Glyc denotes a glycosylation site, Del denotes a deletion, Ins denotes an insertion and the number indicates the amino acid position according to the H3 numbering. NA subtypes are color-coded (red = N1, blue = N2, orange = N3, brown = N7). Asterisks denote HA features whose SΔNA are statistically correlated with the HA feature since the posterior probability that ΔQ exceeds zero is greater than 0.95 (refer to Methods and Table 4).
Table 4
Bayesian estimation of joint transitions between full-length NA and SΔNA and selected HA features.
Subtype
Number of sequences analyzed
Feature
Position
ΔQ mean (95% range)a
Posterior probability that ΔQ>0
Associated NA deletion patternsc
H5N1
119
Glycosylation
158
859 (30; 1776)
0.998
4, 5, 6, 11
H5N2
120
Glycosylation
131
546 (1; 1557)
0.978
25, 33
240
136 (29; 425)
>0.999
20, 25, 30
H6N1
136
Insertionb
141
943 (212; 1293)
>0.999
1, 2, 4
H7N1
42
Glycosylation
133
520 (13; 1603)
0.987
9, 10
Glycosylation
159d
749 (114; 1726)
0.995
9
H7N2
138
Deletionb
221
1545 (266; 3046)
0.991
21
Glycosylation
160
499 (6; 1585)
0.955
39
: Difference of all rates that lead to the character combinations feature present/SΔNA and feature absent/full-length-NA minus all rates that lead the character combinations feature absent/SΔNA and feature present/full-length-NA (refer to Methods).
: Blanks in HA sequences that are associated with SΔNA are designated as deletions and blanks in HA sequences that are associated with full-length-NA are designated as insertions in HA sequences associated with SΔNA.
: NA deletion pattern IDs that are associated with the HA feature that has a higher proportion among AIVs with SΔNA than among AIVs with full-length-NA.
: There is no exact corresponding position on the H3 sequence. This position is on an insertion between position 159 and 160 on the H3 sequence.
Proportions of HA protein features among viruses with full-length-NA (filled bars) or with SΔNA (open bars).
The X-axis indicates the features, where Glyc denotes a glycosylation site, Del denotes a deletion, Ins denotes an insertion and the number indicates the amino acid position according to the H3 numbering. NA subtypes are color-coded (red = N1, blue = N2, orange = N3, brown = N7). Asterisks denote HA features whose SΔNA are statistically correlated with the HA feature since the posterior probability that ΔQ exceeds zero is greater than 0.95 (refer to Methods and Table 4).: Difference of all rates that lead to the character combinations feature present/SΔNA and feature absent/full-length-NA minus all rates that lead the character combinations feature absent/SΔNA and feature present/full-length-NA (refer to Methods).: Blanks in HA sequences that are associated with SΔNA are designated as deletions and blanks in HA sequences that are associated with full-length-NA are designated as insertions in HA sequences associated with SΔNA.: NA deletion pattern IDs that are associated with the HA feature that has a higher proportion among AIVs with SΔNA than among AIVs with full-length-NA.: There is no exact corresponding position on the H3 sequence. This position is on an insertion between position 159 and 160 on the H3 sequence.
Discussion
In this study, we conducted a comprehensive analysis of the polymorphic AIV NA stalk regions using a large set of sequences of natural isolates (as opposed to laboratory-adapted isolates). Forty-seven different NA stalk deletion (SΔNA) patterns were identified in six NA subtypes, N1–N3 and N5–N7. An analysis that controlled for phylogenetic dependence due to shared ancestry showed SΔNA to be positively correlated with gallinaceous hosts and with some amino acid features on the HA. The analysis of the HA features estimated rates at which SΔNA were gained and lost on the HA tree depending on the presence of an HA feature. This analysis concerned only the overall rate of transitions and did not differentiate between transitions on the NA due to reassortment or due to mutation. It was therefore not necessary to estimate the reassortment rate between HA and SΔNA which is unknown for the analyzed dataset. Five glycosylation sites, one insertion and one deletion on the HA were identified to be statistically SΔNA-associated and all of these associations were specific to HA-NA subtype combinations. Our results further suggested that SΔNA mutations essentially cannot be restored to full-length in gallinaceous hosts. One challenge for our analysis was the unevenness of the sequence data due to varying sampling schemes. By fitting transition rates based on phylogenies, we could account for the uneven relatedness and the tendency of shared characteristics among closely related sequences. This was especially important for HPAIVs that are the source of a large number of very closely related sequences. However, when no SΔNAs were observed in non-gallinaceous hosts, the fitted transition rates could not distinguish between a restoration to a full-length NA stalk and an extinction of SΔNA mutants. Despite the limitation of our analysis, the findings shed light on the origin, spread and persistence of SΔNA as well as the functional balance between HA and NA [53].The results of our analysis that controlled for phylogenetic dependence due to shared NA ancestry suggest that the positive correlation between SΔNA and gallinaceous hosts is caused by host-dependent transition rates between full-length-NA and SΔNA. Previous studies have indicated that naturally-occurring SΔNA mutants tend to appear in gallinaceous hosts but did not demonstrate a general correlation between host and SΔNA [8], [9], [28]. Laboratory studies showed that SΔNA mutations arose from duck-origin viruses after these viruses were passaged through gallinaceous hosts [13], [19], [54]. Our results provide further evidence that the processes demonstrated in laboratory studies generated the patterns observed in naturally-occurring viruses. The emergence of SΔNA in AIV infected gallinaceous hosts seems to be a general phenomenon that is widely detected in several NA subtypes.However, the correlation between gallinaceous hosts and SΔNA mutations is not ubiquitous. In the N2 subclade 2, only 31% of the H9N2 isolates from poultry outbreaks were SΔNA mutants (Tables 2 and S1, Figure S3). As for the N1 subclade of HPAIV H5N1, almost all non-gallinaceous isolates had SΔNA (Figure S2). Besides the HPAIV H5N1 N1 subclade, there were 97 SΔNA sequences of other subtypes from non-gallinaceous isolates. Hence, SΔNA mutants appear to be generated in Galliformes but can be transmitted to other hosts. Whether the SΔNA of other subtypes can persist at low prevalence in non-gallinaceous birds is unknown. The high diversity of deletion patterns found in several NA subtypes suggests that these deletions convey a general advantage for the viruses in gallinaceous hosts that does not depend on the particular NA amino acid sequence. The fact that the same deletion pattern arose more than once in several instances indicates that the number of possible deletion patterns that confer this advantage is limited.Our results showed a positive correlation between Asian locations and SΔNAs in three instances (both subclades of N2 and subclade 1 of N1). This correlation could be due to different farm practices in Asia (lower compliance with biosecurity measures and more mixed-species farms) and/or AIV sampling schemes biased towards poultry species in Asia. Among HP H5N1, there is a negative correlation between Asian locations and SΔNAs because all HP H5N1 that spread from Asia to other regions carried SΔNAs whereas within Asia two forms of NA (SΔNA and full-length-NA) of HP H5N1 co-existed (Figure S2).This study also uncovered amino acid features on HA proteins that are statistically correlated with SΔNA. Previous studies demonstrated that shortened NA stalk length reduces NA activity [25] and that an efficient virus replication requires a matching reduction of HA activity [32], [55]. It has been shown for H1N1 and H7N1 that glycosylation sites on the HA globular head structure reduce HA affinity for receptor binding and make the virus less dependent on NA function [32], [56], [57], [58]. Viruses with these additional HA glycosylation sites replicated efficiently when combined with SΔNA and were less sensitive to NA inhibiting drugs [32], [33], [56], [57], [58]. In H7N1, the glycosylation site that conferred these effects was Asn149 [32], [33], [57] (or 133 according to the H3 numbering [59]), the same glycosylation site that is significantly correlated with SΔNA on N1 according to our results (Figure 4, Table 4). Our analysis showed other putative glycosylation sites in HA of other subtypes, as well as insertion and deletion that are statistically correlated with SΔNA (Figure 4, Table 4). Given the requirement of functional balance between HA and NA [32], [56], [57], [58], we speculate that the HA features identified by our methods could also reduce the HA affinity for host cell receptors and therefore, the virus' sensitivity to NA inhibitors. The fact that SΔNAs are widely associated with isolates of gallinaceous hosts while each HA feature occurs only with a single NA subtype could be an evidence that SΔNAs are adaptations to gallinaceous hosts whereas the accompanying HA features are secondary adaptations to SΔNAs. In theory, some of the correlations between HA features and SΔNAs could be the by-product of two pair-wise correlations with a third feature on an internal gene. Investigating such effects would require techniques to analyze multiple correlations while controlling for phylogenetic dependencies, which will be topics for future studies.In summary, our results showed that SΔNAs are widely observed in AIVs. They are repeatedly associated with poultry outbreaks but are also found in non-poultry hosts. SΔNA mutants should be of special concern for the poultry industry since they could imply an adaptation of a virus to gallinaceous hosts. Previous researchers suggested that these viruses bear the risk of a pandemic [60], [61], [62]. AIV with SΔNA and SΔNA-associated HA features might be less sensitive to NA inhibiting drugs or reduce the efficacy of vaccines developed using a similar virus with full-length NA. Therefore, we believe it is important to closely monitor the emergence of SΔNA mutants in poultry and other species and prevent extended AIV circulation in poultry.
Methods
Sequence retrieval
AIV NA and HA nucleotide sequences and HA amino acid sequences were retrieved on December 17, 2009 from public influenza database (http://www.flu.lanl.gov/) using keywords type A and avian host. Nucleotide sequences of all lengths were retrieved while amino acid sequences were restricted to full length. NA nucleotide sequences were excluded from the analysis if the stalk region was not fully sequenced, i.e. if the first sequenced nucleotide position was after position 90 or the last sequenced position was before 270, counting adenine (A) in the start codon (ATG) as position 1. Furthermore, HA and NA sequences were excluded if they were acquired from viruses that were manipulated in laboratories (e.g. being expanded through laboratory animal or modified for the purpose of vaccine development), or if sequences were duplicates of an earlier submission of the same gene of the same strain, or erroneous as identified in a previous publication [63], or from non-avian isolates. A total of 4,920 NA, 5,596 HA nucleotide and 4,702 HA amino acid sequences were included in this study.
Sequence alignment and identification of stalk deletions
All qualified nucleotide sequences of the same NA subtype were aligned using Clustal W (BioEdit v7.0.5, Ibis Therapeutics, Carlsbad, CA). All nucleotide sequences were translated into amino acid sequences within each subtype alignment using the same program. Deletion patterns in the stalk region (positions 30 to 90 for amino acids and 90 to 270 for nucleotides) were adjusted manually at both nucleotide and amino acid levels by moving nucleotides or amino acids between both positions that flank the deleted region. Nucleotides were adjusted to avoid stretches of missing nucleotides being flanked by incomplete codons. After translation, amino acids flanking a deletion were assigned to the flanking positions that best matched the consensus sequence. If permitted by the consensus sequences, deletion patterns were further adjusted to minimize the numbers of distinct deletion patterns. Deletion patterns were ordered by NA subtype and by size of the deleted region within each NA subtype. An identification (ID) number was assigned to each deletion pattern by numbering the ordered deletion patterns consecutively.Phylogenetic trees were constructed for each NA subtype from aligned NA nucleotide sequences using weighted neighbor-joining (function bionj in the R-package ape) [64]. Trees were rooted with corresponding gene segments of the 1918 H1N1human virus. The tree branches were labeled with deletion patterns. Due to their sizes, all trees are included in the supplemental material (Figures S2, S3, S4, S5). The first and last observation of each separate emerging SΔNA was recorded. Each different SΔNA pattern and distinct clades of the same SΔNA pattern were counted as separate emerging SΔNA. Two clades of the same SΔNA pattern were counted as distinct if they were separated by non-deleted sequences and had an estimated distance of more than 1% nucleotide changes between their nodes.
Identification of HA features correlated with NA stalk deletion
HA nucleotide and amino acid sequences were aligned using MUSCLE with two alignment iterations [65], [66]. Aligned HA amino acid sequences, excluding positions within cleavage site, were screened for three possible features, namely putative glycosylation sites (NXT/S, where X could be any amino acid residues except Proline) [28], deletions or insertions. A script was written in R
[67] that identifies the positions of all three features in all HA sequences (Script S1). HA and NA sequences were linked by strain names. HA and NA sequences with matching location, host species, serial number and year of sampling were assumed to be from the same isolate. The H3 numbering system was used for all HA subtypes as described in a previous study [59].
Estimation of correlations between two characters
We tested for correlations of SΔNA with host, with HA features and with geographic region while controlling for phylogenetic dependence due to shared NA or HA ancestry. The associations between NA stalk state and host or region were analyzed based on NA trees. The association between NA stalk state and HA features was analyzed based on HA trees. Characters of interest (i.e. state of NA stalk, host type, presence of HA feature and geographic region) were coded as binary characters and estimating the transition rates among the four possible combinations of two binary characters based on phylogenetic trees using the software package BayesTraits [68], [69] (Figure 5). Character A represents NA stalk state (with or without stalk deletion) and character B can be either host (gallinaceous or non-gallinaceous), HA amino acid feature (present or absent) or region (Asia or non-Asia). Independent character evolution would imply that the transition rates between two states of one character do not depend on the state of the other character.
Figure 5
Illustration of all possible combinations of two binary characters (A and B) and the transition rates between these combinations.
The model fits rates for the transitions per infinitely small time interval and therefore allows the change of only one state at a time (i.e. no rates are fitted for diagonal transitions). The shaded character combinations are expected to dominate if characters A and B are positively correlated. This positive correlation is created by transition rates if ΔQ = q
21+q
23+q
41+q
43−(q
12+q
14+q
32+q
34)>0.
Illustration of all possible combinations of two binary characters (A and B) and the transition rates between these combinations.
The model fits rates for the transitions per infinitely small time interval and therefore allows the change of only one state at a time (i.e. no rates are fitted for diagonal transitions). The shaded character combinations are expected to dominate if characters A and B are positively correlated. This positive correlation is created by transition rates if ΔQ = q
21+q
23+q
41+q
43−(q
12+q
14+q
32+q
34)>0.For a given association between character A and B, there are two supporting and two opposing character state combinations described in Figure 5. For example, an association between gallinaceous hosts and SΔNAs is supported by the character combinations Galliformes/SΔNA and non-Galliformes/full-length-NA and opposed by non-Galliformes/SΔNA and Galliformes/full-length-NA. The difference between all rates leading to the two supporting character state combinations minus all rates leading to the opposing character state combinations, ΔQ, was calculated as a measure for whether the character correlation is generated by interdependent transition rates (Figure 5).Using a Bayesian framework, posterior distributions of the transition rates were estimated given a distribution of phylogenetic trees. The distributions of phylogenetic HA and NA trees were estimated by fitting a general time reversible (GTR) model with gamma distributed rates to nucleotide sequences using MrBayes 3.1.2 [70]. The Markov chains for the trees were run for 3 Mio time steps with a burn-in period of 250,000 and trees were sampled every 2,000 steps leading to 1375 sampled trees per chain. Two independent chains were run for each tree and it was verified that the chains converged to the same tree by comparing the topological distances between trees within and between the chains. Each BayesTraits analysis sampled from a total of 2750 trees per subtype.Analysis using large trees can lead to log-likelihood values of the transition model below the smallest number that the BayesTraits code can represent. To avoid such low log-likelihood values, sequences were divided into groups and a separate parameter estimation was performed per group. NA sequences were grouped into NA subtypes and N1 and N2 were divided into two subclades within each subtype. N1 sequences were distinguished into the subclade that contains all HPAIV H5N1 NAs or all other N1 sequences. N2 sequences were separated into those from isolates of H9N2 outbreaks in China and those from other isolates. For HA sequences, trees were constructed for each HA-NA subtype, e.g. a separate tree was constructed for HA sequences from H5N1 viruses and from H5N2 viruses. HA sequences were grouped in this fashion since HA trees (Figures S6,S7, S8) suggested that the association between HA features and SΔNA depended on NA subtype. The N1 and H5 trees of the HPAIV H5N1 clade were further reduced by randomly selecting a single sequence from all clades whose maximum distance from its basal node was less than 0.01. Representative sequences belonging to closely related branches were retained by this trimming method.A reversible-jump Monte Carlo Markov Chain was used to simultaneously estimate the posterior distributions of transition rates and the posterior probabilities of constraints imposed on the transition rates [69]. A posterior distribution of ΔQ was calculated. A range containing 95% of ΔQ values and a posterior probability that ΔQ exceeds zero was determined from the posterior distribution. A high posterior probability of ΔQ exceeding zero was interpreted as evidence that observed positive correlations between characters were due to correlated transition rates and not simply due to shared ancestry.The correlation between SΔNA and HA features were analyzed based on HA trees. HA trees were chosen because in that case SΔNA can be acquired or lost by mutation and/or reassortment. Hence the NA states are a more labile character on the HA tree than on the NA tree. Larger volatility of the NA state is more likely to reveal functional correlations if they are present. A single rate was fitted for each transition, without distinguishing whether transitions were due to mutations or reassortment.Proportions of sequences with deleted amino acids at positions in the stalk region of AIV Neuraminidases.(0.29 MB EPS)Click here for additional data file.Neighbor-joining tree of N1 sequences. Branch colors indicate host types, pink for Galliformes, yellow for Anseriformes, green for Charadriiformes and grey for birds from other orders. Genes of isolates from gallinaceous hosts are shown by tree branches with extended grey lines. The first column from the left shows a black dash for each isolate with SΔNA. The second column shows the deletion pattern ID. Sequences with the same deletion pattern are denoted by square brackets numbered with pattern ID. Deletion patterns that are nested within larger deletion pattern clades are indicated by deletion pattern IDs on the left side of the brackets. The third column indicates the HA subtype of each isolate. The fourth column shows the continent each isolated is from. The two N1 subclades are shown by brackets in the last column. The box indicates the part of the tree that is shown in Figure 2.(0.63 MB EPS)Click here for additional data file.Neighbor-joining tree of N2 sequences. Symbol descriptions are the same as in Figure S2.(0.49 MB EPS)Click here for additional data file.Neighbor-joining tree of N3 sequences. Symbol descriptions are the same as in Figure S2.(0.27 MB EPS)Click here for additional data file.Neighbor-joining tree of N7 sequences. Symbol descriptions are the same as in Figure S2.(0.24 MB EPS)Click here for additional data file.Neighbor-joining tree of H5 sequences. Branch colors indicate host types, pink for Galliformes, yellow for Anseriformes, green for Charadriiformes and grey for birds from other orders. Genes of isolates from gallinaceous hosts are shown in pink branches with extended grey lines. The columns from left to right show NA stalk state for each sequence (a pattern ID is given for each isolate with SΔNA), NA subtype, continent of origin and various HA protein features. NA subtypes and continents are color-coded. The presence of described HA features (Glyc = glycosylation, Del = deletion and Ins = insertion) are shown by black dashes.(0.61 MB EPS)Click here for additional data file.Neighbor-joining tree of H6 sequences. Symbol descriptions are the same as in Figure S6. Strain names are shown.(0.29 MB EPS)Click here for additional data file.Neighbor-joining tree of H7 sequences. Symbol descriptions are the same as in Figure S6. Strain names are shown.(0.31 MB EPS)Click here for additional data file.Summary of HA/NA subtypes, prevalence, deleted region, sampling location and time per deletion pattern.(0.03 MB DOC)Click here for additional data file.R code to identify glycosylation sites and deletions in amino acid sequences.(0.00 MB TXT)Click here for additional data file.
Authors: Y J Guo; S Krauss; D A Senne; I P Mo; K S Lo; X P Xiong; M Norwood; K F Shortridge; R G Webster; Y Guan Journal: Virology Date: 2000-02-15 Impact factor: 3.616
Authors: Ming Liu; Shiqin He; David Walker; NanNan Zhou; Daniel R Perez; Bing Mo; Fan Li; Xiaotian Huang; Robert G Webster; Richard J Webby Journal: Virology Date: 2003-01-20 Impact factor: 3.616
Authors: Chang-Won Lee; Dennis A Senne; Jose A Linares; Peter R Woolcock; David E Stallknecht; Erica Spackman; David E Swayne; David L Suarez Journal: Avian Pathol Date: 2004-06 Impact factor: 3.378
Authors: Mariette Ducatez; Stephanie Sonnberg; Jeri Carol Crumpton; Adam Rubrum; Phouvong Phommachanh; Bounlom Douangngeun; Malik Peiris; Yi Guan; Robert Webster; Richard Webby Journal: J Gen Virol Date: 2017-06-20 Impact factor: 3.891
Authors: T W Hoffmann; S Munier; T Larcher; D Soubieux; M Ledevin; E Esnault; A Tourdes; G Croville; J-L Guérin; P Quéré; R Volmer; N Naffakh; D Marc Journal: J Virol Date: 2011-10-19 Impact factor: 5.103
Authors: Olga Munoz; Marco De Nardi; Karen van der Meulen; Kristien van Reeth; Marion Koopmans; Kate Harris; Sophie von Dobschuetz; Gudrun Freidl; Adam Meijer; Andrew Breed; Andrew Hill; Rowena Kosmider; Jill Banks; Katharina D C Stärk; Barbara Wieland; Kim Stevens; Sylvie van der Werf; Vincent Enouf; Gwenaelle Dauphin; William Dundon; Giovanni Cattoli; Ilaria Capua Journal: Ecohealth Date: 2015-01-29 Impact factor: 3.184