Sergey Kryazhimskiy, Jonathan Dushoff, Georgii A Bazykin, Joshua B Plotkin.
Abstract
The surface proteins of human influenza A viruses experience positive selection to escape both human immunity and, more recently, antiviral drug treatments. In bacteria and viruses, immune-escape and drug-resistant phenotypes often appear through a combination of several mutations that have epistatic effects on pathogen fitness. However, the extent and structure of epistasis in influenza viral proteins have not been systematically investigated. Here, we develop a novel statistical method to detect positive epistasis between pairs of sites in a protein, based on the observed temporal patterns of sequence evolution. The method rests on the simple idea that a substitution at one site should rapidly follow a substitution at another site if the sites are positively epistatic. We apply this method to the surface proteins hemagglutinin and neuraminidase of influenza A virus subtypes H3N2 and H1N1. Compared to a non-epistatic null distribution, we detect substantial amounts of epistasis and determine the identities of putatively epistatic pairs of sites. In particular, using sequence data alone, our method identifies epistatic interactions between specific sites in neuraminidase that have recently been demonstrated, in vitro, to confer resistance to the drug oseltamivir; these epistatic interactions are responsible for widespread drug resistance among H1N1 viruses circulating today. This experimental validation demonstrates the predictive power of our method to identify epistatic sites of importance for viral adaptation and public health. We conclude that epistasis plays a large role in shaping the molecular evolution of influenza viruses. In particular, sites with dN/dS < 1, which would normally not be identified as positively selected, can facilitate viral adaptation through epistatic interactions with their partner sites.
The knowledge of specific interactions among sites in influenza proteins may help us to predict the course of antigenic evolution and, consequently, to select more appropriate vaccines and drugs.
Year: 2011 PMID: 21390205 PMCID: PMC3040651 DOI: 10.1371/journal.pgen.1001301
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Detecting positive epistasis between mutations at two sites.
The epistasis statistic is defined in terms of the total time elapsed between all pairs of consecutive substitutions at the two sites (see “Materials and Methods”). In this schematic figure, substitutions at the first and second site are denoted by red and blue circles, respectively. Substitutions on different lineages do not form a consecutive pair, and neither do substitutions separated by an intervening substitution at one of the two sites.
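The notion of consecutive substitutions in Figure 1 can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` class, the toy tree, the site labels `"i"` and `"j"`, and the rule that only a further substitution at the leading site breaks a pair are all assumptions made here for illustration. The precise definition of the statistic is given in “Materials and Methods”; the sketch only collects the elapsed times that such a statistic would be built from.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node; `subs` holds the sites substituted on the
    branch leading to this node, `time` is the node's time (any unit)."""
    name: str
    time: float
    parent: Optional["Node"] = None
    subs: frozenset = frozenset()


def consecutive_pairs(nodes, lead, trail):
    """Return (leading node, trailing node, elapsed time) for every
    substitution at `trail` that descends from a substitution at `lead`.

    Assumption: walking up to the nearest ancestor carrying a `lead`
    substitution guarantees same lineage and no intervening `lead`
    substitution.
    """
    pairs = []
    for node in nodes:
        if trail not in node.subs:
            continue
        anc = node.parent
        while anc is not None and lead not in anc.subs:
            anc = anc.parent
        if anc is not None:
            pairs.append((anc.name, node.name, node.time - anc.time))
    return pairs


# Toy tree (hypothetical): A carries a substitution at site i, and so on.
root = Node("root", 0.0)
A = Node("A", 1.0, root, frozenset({"i"}))
B = Node("B", 2.0, A, frozenset({"j"}))
C = Node("C", 4.0, A, frozenset({"j"}))
D = Node("D", 3.0, root, frozenset({"j"}))  # different lineage from A
E = Node("E", 5.0, B, frozenset({"i"}))
F = Node("F", 6.0, E, frozenset({"j"}))     # pairs with E, not with A
tree = [root, A, B, C, D, E, F]

forward = consecutive_pairs(tree, "i", "j")
reverse = consecutive_pairs(tree, "j", "i")
total_forward = sum(dt for _, _, dt in forward)
```

Because the pair of sites is ordered, `forward` and `reverse` generally differ, which mirrors the asymmetry between a pair and its inverse discussed for Figure 2.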
Table 1. Summary of data used in our analysis.
| | H3N2 | | H1N1 | |
| | H3 | N2 | H1 | N1 |
| number of sequences | 2149 | 2339 | 1219 | 1836 |
| PDB accession number | 2VIU | 1NN2 | 1RUZ | 3BEQ |
| protein length | 566 | 469 | 565 | 470 |
| epitopic sites | 131 | 45 | – | – |
| sites considered | 141 | 111 | 115 | 113 |
| pairs considered | 19740 | 12210 | 13110 | 12656 |
| 63 | 54 | 78 | 60 | |
| pairs significant at 0.01 (exp) | 132 | 122 | ||
| pairs significant at 0.01 (obs) | 225 | 205 | ||
| FDR, % | 58.8 | 64.8 | 58.7 | 59.3 |
Epitopic sites are taken from [8], [17] for H3 and from [8], [19], [20] for N2.
“exp” and “obs” denote the expected and observed numbers of nominally significant pairs. FDR stands for “false discovery rate” (see “Materials and Methods”).
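Two pieces of arithmetic behind the table can be checked with a short sketch. The "pairs considered" row equals n(n − 1) for n sites considered, that is, the number of ordered pairs of distinct sites. The FDR formula used below (expected divided by observed count of nominally significant pairs) is an assumption made here, since the source defers the definition to “Materials and Methods”; the function names are hypothetical.

```python
# Values copied from the "sites considered" and "pairs considered" rows.
sites_considered = {"H3": 141, "N2": 111, "H1": 115, "N1": 113}
pairs_reported = {"H3": 19740, "N2": 12210, "H1": 13110, "N1": 12656}


def ordered_pairs(n_sites: int) -> int:
    """Number of ordered pairs of distinct sites."""
    return n_sites * (n_sites - 1)


def fdr_percent(expected: float, observed: float) -> float:
    """Assumed FDR estimate: expected / observed significant pairs, in %."""
    return 100.0 * expected / observed


pairs_computed = {p: ordered_pairs(n) for p, n in sites_considered.items()}

# Counts 132 (exp) and 225 (obs) are taken from the table.
example_fdr = round(fdr_percent(132, 225), 1)
print(pairs_computed == pairs_reported, example_fdr)
```

Counting ordered rather than unordered pairs is consistent with the method treating a pair and its inverse as separate tests.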
Figure 2. Phylogenetic tree of HA (subtype H3) illustrating a putatively epistatic interaction between sites 391 (red circles) and 73 (blue circles).
Site 391 is not in an epitope; site 73 is in epitope E. Only substitutions at internal nodes are displayed. Branch lengths are equal to the total number of substitutions across all sites. Vertical bars show the approximate years in which the sequences were isolated. Substitutions C, D, E, and F at site 73 closely follow substitution A at site 391, leading to a highly significant value of the epistasis statistic. As a result, the ordered pair of sites (391, 73) is detected as epistatic by our method. At the same time, only a single substitution, B, at site 391 follows substitution F at site 73, and only after a long period of time, resulting in a low value of the epistasis statistic for the inverse pair. Therefore, the ordered pair of sites (73, 391) is not detected as epistatic by our method.
Figure 3. Phylogenetic tree of NA (subtype N1) illustrating a putatively epistatic interaction between the leading site 344 (red circles) and the trailing site 275 (blue circles).
Other notations are as in Figure 2.
Table 2. Characterization of epistatic pairs in subtype H3N2 surface proteins, compared to expectations for randomly chosen pairs.
| | | H3 | | N2 | |
| | | exp | obs | exp | obs |
| leading site | average dN/dS | | | | |
| | fraction ept | | | | |
| trailing site | average dN/dS | | | | |
| | fraction ept | | | | |
| fraction of pairs | both npt | | | | |
| | (npt, ept) | | | | |
| | (ept, npt) | | | | |
| | same ept | | | | |
| | diff. ept | | | | |
| time between consecutive substitutions (in synonymous substitutions) | mean | N/A | 14.71 | N/A | 13.82 |
| | std | N/A | 16.76 | N/A | 15.72 |
| time between consecutive substitutions (in years) | mean | N/A | 3.68 | N/A | 4.22 |
| | std | N/A | 4.20 | N/A | 4.80 |
| average distance | linear | | | | |
| | physical, Å | | | | |
P-values are obtained from two-tailed tests, except for the last two rows, which report one-sided tests regarding distances between sites. Single and double asterisks denote significance at the 0.05 and 0.01 levels, respectively; “ns”, “ept”, “npt”, “obs”, “exp”, and “N/A” denote not significant, epitopic, non-epitopic, observed, expected, and not applicable, respectively.
Distances are computed only over those significant pairs in which both residues are present in the crystal structure (see “Materials and Methods”).
Table 3. Characterization of epistatic pairs in subtype H1N1 surface proteins, compared to expectations for randomly chosen pairs.
| | | H1 | | N1 | |
| | | exp | obs | exp | obs |
| leading site | average dN/dS | | | | |
| trailing site | average dN/dS | | | | |
| time between consecutive substitutions (in synonymous substitutions) | mean | N/A | 25.56 | N/A | 14.75 |
| | std | N/A | 28.64 | N/A | 17.09 |
| time between consecutive substitutions (in years) | mean | N/A | 5.80 | N/A | 4.40 |
| | std | N/A | 6.50 | N/A | 5.10 |
| average distance | linear | | | | |
| | physical, Å | | | | |
Notations as in Table 2.