Nicole Creanza, Jason S Schwarz, Joel E Cohen.
Abstract
Long-term influenza evolution has been well studied, but the patterns of sequence diversity within seasons are less clear. H3N2 influenza genomes sampled from New York State over ten years indicated intraseasonal changes in evolutionary dynamics. Using the mean Hamming distance of a set of amino acid or nucleotide sequences as an indicator of its diversity, we found that influenza sequence diversity was significantly higher during the early epidemic period than later in the influenza season. Diversity was lowest during the peak of the epidemic, most likely due to the high prevalence of a single dominant amino acid sequence or very few dominant sequences during the peak epidemic period, corresponding with rapid expansion of the viral population. The frequency and duration of dominant sequences varied by influenza protein, but all proteins had an abundance of one distinct sequence during the peak epidemic period. In New York State from 1995 to 2005, high sequence diversity during the early epidemic suggested that seasonal antigenic drift could have occurred primarily in this period, followed by a clonal expansion of typically one clade during the peak of the epidemic, possibly indicating a shift to neutral drift or purifying selection.
Year: 2010 PMID: 20049091 PMCID: PMC2796395 DOI: 10.1371/journal.pone.0008544
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Sequence diversity and number of sequences over time.
The mean Hamming distance of the aligned sequences, averaged across proteins, within each period of each season dipped during the peak epidemic periods (solid red line). Error bars indicate mean standard errors of mean sequence diversity across proteins. For every peak period, the standard error of mean Hamming distance was less than 1, too small to be visible here. The non-overlap of the error bars of the peak epidemic periods and the other periods indicates the statistical significance of the reduction in mean Hamming distance during the peak epidemic periods. The peaks in the total number of sequences sampled (dashed blue line) coincided with the dips in mean Hamming distance, suggesting that dominant sequence expansion during the peak epidemic period may have accounted for reduced diversity during the epidemic peak. The scale on the left applies to the number of sequences (dashed blue line).
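The diversity measure used in this figure can be illustrated with a minimal sketch. It assumes that the mean Hamming distance of a set is the average over all unordered pairs of aligned sequences, which the text above does not spell out; the function names and the toy sequences are hypothetical and are not taken from the study.

```python
from itertools import combinations
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_hamming_distance(seqs: Sequence[str]) -> float:
    """Average Hamming distance over all unordered pairs (assumed definition)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


# Toy aligned sequences, invented for illustration only.
early = ["MKTII", "MKTIV", "MRTII", "MKSII"]  # several distinct variants
peak = ["MKTII", "MKTII", "MKTII", "MKTIV"]   # one variant dominates

print(mean_hamming_distance(early))  # 1.5
print(mean_hamming_distance(peak))   # 0.5
```

In this toy case the set dominated by one sequence has the lower mean distance, which is the kind of dip the caption describes for the peak epidemic periods.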
Significant differences in amino acid diversity between intraseasonal periods.
| | Early epidemic | Peak epidemic | Late epidemic |
| High diversity | 12* | 4 | 3 |
| Low diversity | 1 | 11* | 0 |
The early epidemic period was significantly more diverse than expected relative to the peak epidemic and late epidemic periods. The peak epidemic period was significantly less diverse than expected relative to the early and late epidemic periods. The null hypothesis that the three periods had equal median diversity was rejected by a non-parametric Kruskal-Wallis test (χ² = 48.89, p < 0.001). This observation was consistent with the hypothesis that the epidemic peak was associated with the rapid proliferation of a dominant amino acid sequence. An asterisk indicates that the number of high or low diversity events would occur by chance with probability less than 0.01.
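As a rough illustration of the Kruskal-Wallis test named in this legend, the sketch below computes the H statistic in plain Python and converts it to a p-value. It assumes three groups, so that H is compared against a chi-square distribution with 2 degrees of freedom, whose upper tail is exactly exp(-H/2). It omits the correction for ties, and the diversity values are invented; none of this is presented as the authors' actual procedure.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, without the tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def p_value_three_groups(h):
    """Upper tail of chi-square with 2 degrees of freedom (three groups only)."""
    return math.exp(-h / 2)


# Hypothetical mean Hamming distances per period (not data from the study).
early = [9.1, 8.4, 7.9, 8.8]
peak = [1.2, 0.8, 1.5, 0.9]
late = [4.0, 3.6, 5.1, 4.4]

h = kruskal_wallis_h([early, peak, late])
print(h, p_value_three_groups(h))
```

Under the same 2-degrees-of-freedom assumption, `p_value_three_groups(48.89)` is far below 0.001, in line with the reported p < 0.001.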
Significant differences in nucleotide diversity between intraseasonal periods.
| | Early epidemic | Peak epidemic | Late epidemic |
| High diversity | 7* | 2 | 4 |
| Low diversity | 5 | 15* | 0 |
The nucleotide results mirrored the amino acid results. The early epidemic period was more diverse and the peak epidemic period less diverse than the other periods. The null hypothesis that the three periods had equal median diversity was rejected by the non-parametric Kruskal-Wallis test (χ² = 43.43, p < 0.001). These observations were consistent with the results for amino acid sequences, supporting the hypothesis that the epidemic peak was associated with the rapid proliferation of a dominant amino acid sequence. An asterisk indicates that the number of high or low diversity events would occur by chance with probability less than 0.01.
Figure 2. Dominant sequences over time.
Different influenza proteins had different temporal patterns of dominant sequences. HA and NA had a different dominant sequence each season, each of which made up a relatively small proportion of samples from that season. Dominant sequences were a larger proportion of all sequences of the internal proteins. The two proteins with the highest proportion of dominant sequences and the longest duration of dominance were NS2 and M2. The fraction of sequences for each dominant sequence changed by season, but nearly all dominant sequences persisted for multiple seasons in these proteins. The x-axis represents influenza season, and the y-axis represents the fraction of all sequences in that season. Each line color represents a distinct dominant sequence, and each occurrence of the dominant sequence had an identical amino acid sequence. For example, for the HA protein, the dominant sequence represented by the dark blue line made up 20% of all HA sequences in the 1996 season, but was absent in 1995 and in 1997 and thereafter.
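The quantity on the y-axis can be sketched as follows. The text does not give the criterion used to call a sequence dominant, so this example assumes the dominant sequence of a season is simply its most frequent amino acid sequence; the function name and the records are hypothetical.

```python
from collections import Counter, defaultdict


def dominant_by_season(records):
    """Map each season to (most common sequence, its fraction of that season).

    `records` is an iterable of (season, amino_acid_sequence) pairs.
    Treating "most common" as "dominant" is an assumption of this sketch.
    """
    by_season = defaultdict(list)
    for season, seq in records:
        by_season[season].append(seq)

    result = {}
    for season, seqs in sorted(by_season.items()):
        seq, count = Counter(seqs).most_common(1)[0]
        result[season] = (seq, count / len(seqs))
    return result


# Toy records for one protein, invented for illustration only.
records = [
    (1996, "MKT"), (1996, "MKT"), (1996, "MRT"), (1996, "MKS"),
    (1997, "MRT"), (1997, "MRT"), (1997, "MRT"), (1997, "MKT"),
]

for season, (seq, fraction) in dominant_by_season(records).items():
    print(season, seq, fraction)
```

Tracking the same sequence string across seasons gives one line per dominant sequence, as in the figure, because each line stands for an identical amino acid sequence.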