James Lara, Michael A Purdy, Yury E Khudyakov.
Abstract
Hepatitis E virus (HEV) causes epidemic and sporadic cases of hepatitis worldwide. HEV genotypes 3 (HEV3) and 4 (HEV4) infect humans and animals, with swine being the primary reservoir. The relevance of HEV genetic diversity to host adaptation is poorly understood. We employed a Bayesian network (BN) analysis of HEV3 and HEV4 to detect epistatic connectivity among protein sites and its association with host specificity in each genotype. The data imply coevolution among ∼70% of polymorphic sites from all HEV proteins and an association of numerous coevolving sites with adaptation to swine or humans. BN models for individual proteins and for domains of the nonstructural polyprotein detected the host origin of HEV strains with accuracies of 74–93% and 63–87%, respectively. These findings, taken together with the lack of phylogenetic association with host, suggest that HEV host specificity is a heritable and convergent phenotypic trait achievable through a variety (abundance) of genetic pathways, and they explain the broad host range of HEV3 and HEV4. Published by Elsevier B.V.
Keywords: Adaptation; Bayesian network; Coevolution; HEV ORFs; Hepatitis E virus; Prediction
Year: 2014 PMID: 24667049 PMCID: PMC5745802 DOI: 10.1016/j.meegid.2014.03.011
Source DB: PubMed Journal: Infect Genet Evol ISSN: 1567-1348 Impact factor: 3.342
Fig. 1. Coordination among HEV protein sites modeled with BNs. The BN models show genome-wide epistatic connectivity among aa sites (structural coefficient threshold = 2.0) and association with the host variable (structural coefficient threshold = 0.95). Nodes represent polymorphic aa sites, and arcs between them represent dependencies. Nodes are color-coded according to ORF and ORF1-domain and numbered according to aa position in the respective ORF. Orf1(x) encompasses UNK and denotes aa sites that fall outside known ORF1 domains (n = 16 in BNHEV3 and n = 18 in BNHEV4). (A) Learned BN of HEV3 sequences (n = 65); (B) learned BN of HEV4 sequences (n = 55).
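The nodes in Fig. 1 are polymorphic aa sites. As a minimal sketch of how such sites can be pulled out of an alignment, assuming pre-aligned protein sequences and a simple "more than one residue observed" rule, the following could be used; the function name, its parameters, the gap/`X` filtering, and the toy sequences are hypothetical and are not the authors' pipeline.

```python
from collections import Counter

def polymorphic_sites(alignment, min_minor_count=1, ignore="-X"):
    """Return 1-based positions of polymorphic columns in a protein alignment.

    alignment: list of equal-length aligned aa sequences.
    Characters in `ignore` (gaps and 'X' by default) are skipped; this
    filtering rule is an assumption, not taken from the study.
    A column is polymorphic when it holds at least two residues and the
    non-majority residues together occur at least `min_minor_count` times.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment if seq[pos] not in ignore)
        if len(counts) < 2:
            continue
        minor = sum(counts.values()) - counts.most_common(1)[0][1]
        if minor >= min_minor_count:
            sites.append(pos + 1)  # report 1-based aa positions
    return sites

# Toy alignment (hypothetical sequences, not HEV data).
toy = ["MEAHQF", "MEVHQF", "MEAHKF", "M-AHQF"]
print(polymorphic_sites(toy))  # [3, 5]
```

Raising `min_minor_count` discards sites where a variant appears only once, a common way to suppress singleton noise; whether the study applied such a filter is not stated here.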
Fig. 2. Contribution of proteins and ORF1-domains to BNHEV3 and BNHEV4. (A) Bar charts show the number of aa sites involved in the BNs for each protein and ORF1-domain. (B) Number of links among aa sites from all proteins and ORF1-domains observed in the BNs. Numbers outside and inside parentheses are for BNHEV3 and BNHEV4, respectively.
Fig. 3. Strengths of epistatic influences. Strength of linkages (primary y-axis) and number of links (secondary y-axis; crosses joined by dashed lines) among HEV proteins and ORF1-domains (x-axis) were computed from the learned BNHEV3 and BNHEV4. (A) Overall strengths measured for each protein or ORF1-domain in the entire BNs; directionality of influences is color-coded. (B–J) Strength and number of links to all proteins and ORF1-domains in BNHEV3 and BNHEV4 observed for each protein and ORF1-domain.
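Fig. 3 summarizes, per protein or ORF1-domain, the strength and number of links and their direction. A minimal sketch of that bookkeeping follows, assuming per-arc strengths (for example KL-divergence-based values) have already been computed and that per-pair totals are simple sums; the aggregation rule, function names, and arc values are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical arcs: (source node, target node, strength), where a node is
# (protein or ORF1-domain, aa position). Strengths are placeholders.
ARCS = [
    (("Plp", 557), ("Pol", 1692), 1.2),
    (("Plp", 557), ("ORF2", 11), 0.8),
    (("Pol", 1692), ("Pol", 1704), 2.5),
    (("ORF2", 11), ("Plp", 546), 0.4),
]

def link_summary(arcs):
    """Sum strengths and count arcs for each directed (source, target) protein pair."""
    strength = defaultdict(float)
    count = defaultdict(int)
    for (src, _), (dst, _), s in arcs:
        strength[(src, dst)] += s
        count[(src, dst)] += 1
    return dict(strength), dict(count)

def directional_strength(arcs, protein):
    """Total outgoing and incoming arc strength for one protein or domain.

    Intra-protein arcs contribute to both totals.
    """
    outgoing = sum(s for (src, _), _, s in arcs if src == protein)
    incoming = sum(s for _, (dst, _), s in arcs if dst == protein)
    return outgoing, incoming

strength, count = link_summary(ARCS)
print(count[("Pol", "Pol")], directional_strength(ARCS, "Plp"))
```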
Fig. 4. The most interconnected (n ⩾ 5) and influential (global KL-divergence ⩾ 2.0) nodes in BNHEV3 and BNHEV4. Bars show global strength of influence (primary y-axis) and number of links (secondary y-axis) for aa sites identified in BNs (Fig. 1). The numbers of total and intra-protein (or intra-domain) links are shown with crosses and rhombi, respectively.
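The selection in Fig. 4 combines two criteria: at least 5 links and a global influence of at least 2.0, with total and intra-protein links counted separately. A minimal sketch of that filter, assuming the per-node global strength values are already available (how they are computed is not shown); the function name and the toy network are hypothetical.

```python
from collections import Counter

def hub_nodes(arcs, node_strength, min_links=5, min_strength=2.0):
    """Select nodes with >= min_links arcs and global strength >= min_strength.

    arcs: iterable of (node_a, node_b); a node is (protein or domain, aa position).
    node_strength: node -> global strength of influence, assumed precomputed.
    Returns {node: (total_links, intra_protein_links)}.
    """
    total, intra = Counter(), Counter()
    for a, b in arcs:
        total[a] += 1
        total[b] += 1
        if a[0] == b[0]:  # both ends in the same protein or ORF1-domain
            intra[a] += 1
            intra[b] += 1
    return {
        node: (total[node], intra[node])
        for node in total
        if total[node] >= min_links and node_strength.get(node, 0.0) >= min_strength
    }

# Hypothetical network: Pol site 1692 linked to five sites, three of them in Pol.
arcs = [
    (("Pol", 1692), ("Pol", 1704)),
    (("Pol", 1692), ("Pol", 1632)),
    (("Pol", 1692), ("Pol", 1346)),
    (("Pol", 1692), ("Plp", 557)),
    (("ORF2", 11), ("Pol", 1692)),
]
strengths = {("Pol", 1692): 2.4, ("Plp", 557): 3.0}
print(hub_nodes(arcs, strengths))  # {('Pol', 1692): (5, 3)}
```

Plp site 557 is excluded here despite its high strength because it has only one link, which illustrates that both thresholds must be met.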
Fig. 5. HEV3 and HEV4 protein sites most associated with host specificity. Bars show relative mutual information (MI) values (primary y-axis) for aa sites identified in the BNs (Fig. 1). The MI values of Plp site 557 in BNHEV3 (MI = 0.16) and Pol site 1692 in BNHEV4 (MI = 0.19) were each assigned a relative value of 1. P values for each site are marked with crosses and shown as percentages (secondary y-axis).
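Fig. 5 ranks sites by MI with the host variable, scaled so that the most informative site equals 1. A minimal sketch of a plug-in MI estimate and that scaling follows, assuming discrete aa columns paired with host labels; the log base (bits), the estimator, the function names, and the toy data are assumptions, so its values are not comparable to the caption's MI = 0.16 or 0.19.

```python
import math
from collections import Counter

def mutual_information(xs, ys, base=2.0):
    """Plug-in estimate of I(X;Y) from paired discrete observations."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("need equal-length, non-empty inputs")
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log[p(x,y) / (p(x) p(y))], written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]), base)
    return mi

def relative_mi(site_columns, hosts):
    """MI of each site with host, scaled so the most informative site equals 1."""
    scores = {site: mutual_information(col, hosts) for site, col in site_columns.items()}
    top = max(scores.values())
    return {site: (s / top if top > 0 else 0.0) for site, s in scores.items()}

hosts = ["swine", "swine", "human", "human"]
sites = {
    557: ["A", "A", "V", "V"],   # tracks host perfectly: 1 bit
    1692: ["T", "S", "T", "S"],  # independent of host: 0 bits
}
print(relative_mi(sites, hosts))  # {557: 1.0, 1692: 0.0}
```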
Protein sites with relevant effects on the BN host variable.
| Genotype | ORF | Protein sites | Standardized total effects |
|---|---|---|---|
| HEV3 | ORF 1 | 89, 113, 509, 546, 547, 555, 557, 598, 605, 721, 746, 765, 782, 784, 796, 846, 1017, 1018, 1219, 1252, 1285, 1370 and 1508 | 0.459–0.0784 |
| | ORF 2 | 5, 11, 13, 25 and 529 | 0.177–0.069 |
| | ORF 3 | 82, 88 and 89 | 0.188–0.125 |
| HEV4 | ORF 1 | 161, 304, 462, 469, 488, 516, 546, 555, 557, 566, 573, 611, 620, 676, 683, 687, 720, 727, 736, 738, 740, 742, 743, 745, 755, 759, 762, 767, 769, 773, 774, 779, 781, 783, 786, 789, 790, 906, 938, 964, 1003, 1007, 1036, 1237, 1242, 1346, 1349, 1356, 1632, 1692 and 1704 | 0.5140–0.081 |
| | ORF 2 | 11, 37, 39, 537, 597, 609 and 632 | 0.182–0.081 |
| | ORF 3 | 70, 73, 92, 94 and 103 | 0.298–0.081 |
Numbering represents protein positions in the respective ORFs. Listed sites correspond to those in BNHEV3 and BNHEV4 (Fig. 1).
Range of estimated values observed for the corresponding protein sites (see Section 2 for details of the estimates).
Fig. 6. HEV3 and HEV4 host-specific epistatic motifs. (A) HEV3-BNSwine contains 40 arcs connecting 42 aa sites; (B) HEV3-BNHuman contains 44 arcs connecting 48 aa sites; (C) HEV4-BNSwine contains 34 arcs connecting 40 aa sites; and (D) HEV4-BNHuman contains 60 arcs connecting 66 aa sites. All links are statistically significant (p ⩽ 10⁻⁵) and highly correlated (average r = 0.9326 and 0.8791 for A and B, and 0.8075 and 0.7934 for C and D, respectively). Color coding and numbering are as in Fig. 1.
Quality assessment of BNHEV3 and BNHEV4. Log-likelihood values are compared within shaded and unshaded row pairs.
a Statistical tests were performed on networks shown in Fig. 6.
b The BN learned from data of HEV variants sampled from humans (BNHuman) was tested on HEV data sampled from swine (DSwine).
c BN learned from swine data (BNSwine) was tested on HEV data sampled from humans (DHuman).
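Footnotes b and c describe scoring a BN learned on one host's sequences against the other host's data by log-likelihood. A minimal sketch of that cross-host test follows, assuming the network structure (parent sets) is already known, conditional probabilities are Laplace-smoothed counts, and the data are toy residues; structure learning is not shown, and all names and values are hypothetical rather than the study's software.

```python
import math
from collections import Counter

def fit_cpts(data, parents):
    """Count parent configurations and (configuration, value) pairs per variable.

    data: list of dicts {variable: value}; parents: {variable: tuple of parents}.
    The network structure is taken as given; structure learning is not shown.
    """
    cpts = {}
    for v, pa in parents.items():
        joint = Counter((tuple(row[p] for p in pa), row[v]) for row in data)
        marginal = Counter(tuple(row[p] for p in pa) for row in data)
        cpts[v] = (joint, marginal)
    return cpts

def log_likelihood(data, parents, cpts, states, alpha=1.0):
    """Sum over rows and variables of log P(v | parents(v)).

    Probabilities use Laplace smoothing with pseudo-count `alpha` over the
    value set `states[v]` (an assumption, so unseen values stay finite).
    """
    ll = 0.0
    for row in data:
        for v, pa in parents.items():
            joint, marginal = cpts[v]
            cfg = tuple(row[p] for p in pa)
            k = len(states[v])
            ll += math.log((joint[(cfg, row[v])] + alpha) / (marginal[cfg] + alpha * k))
    return ll

# Toy two-site network s1 -> s2 with hypothetical residues.
parents = {"s1": (), "s2": ("s1",)}
states = {"s1": ["A", "V"], "s2": ["S", "T"]}
human = [{"s1": "A", "s2": "T"}] * 4 + [{"s1": "V", "s2": "S"}] * 4
swine = [{"s1": "A", "s2": "S"}] * 4 + [{"s1": "V", "s2": "T"}] * 4

bn_human = fit_cpts(human, parents)
print(log_likelihood(human, parents, bn_human, states))  # own-host data
print(log_likelihood(swine, parents, bn_human, states))  # other-host data, lower
```

When the two datasets differ in size, comparing per-sequence averages (dividing by `len(data)`) keeps the comparison fair; whether the study normalized this way is not stated.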
Validation of host-specific dependency among aa sites.
| HEV genotype | Host-specific network | Cross-validation test: edges (e) | Cross-validation test: frequency (f) |
|---|---|---|---|
| HEV3 | BNHuman (44) | 9 | 100% |
| | | 30 | 80.0–93.3% |
| | | 5 | 0% |
| | BNSwine (40) | 20 | 100% |
| | | 18 | 73.3–93.3% |
| | | 2 | 0–60% |
| HEV4 | BNHuman (60) | 29 | 100% |
| | | 27 | 73.3–93.3% |
| | | 4 | 46.7–60.0% |
| | BNSwine (34) | 2 | 100% |
| | | 28 | 73.3–93.3% |
| | | 4 | 46.7–66.7% |
Cross-validation tests were performed using the jackknife method to determine the percent frequency (f) with which edges (e) appeared in the sampled networks relative to the corresponding reference BN (Fig. 6).
Values in parentheses denote the total arc counts in the reference BNs.
Values represent edge counts between aa sites observed for a given (f).
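The table counts how many reference edges reappear at each frequency f across the jackknife networks. A minimal sketch of that tally follows, assuming the sampled networks have already been learned and that edges are compared as unordered pairs; function names and toy edges are hypothetical. Several reported frequencies (73.3%, 93.3%, 46.7%) are multiples of 1/15 (for example 73.3% ≈ 11/15), which is consistent with, though not proof of, 15 (or a multiple of 15) sampled networks.

```python
from collections import Counter

def edge_frequencies(reference_edges, sampled_networks):
    """Percent of sampled (jackknife) networks in which each reference edge recurs.

    Edges are compared as unordered pairs (an assumption); use the tuples
    directly instead of frozenset() if arc direction must also match.
    """
    reference = {frozenset(e) for e in reference_edges}
    hits = Counter()
    for network in sampled_networks:
        present = {frozenset(e) for e in network}
        hits.update(reference & present)
    n = len(sampled_networks)
    return {e: 100.0 * hits[e] / n for e in reference}

def edges_per_frequency(freqs):
    """Number of reference edges observed at each frequency f (as in the table)."""
    return Counter(round(f, 1) for f in freqs.values())

reference = [(1, 2), (2, 3), (3, 4)]
samples = [[(1, 2), (2, 3)], [(2, 1)], [(1, 2), (3, 4)], [(1, 2)]]
freqs = edge_frequencies(reference, samples)
print(edges_per_frequency(freqs))  # Counter({25.0: 2, 100.0: 1})
```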
Overall performance of Bayesian network classifiers (BNCs) constructed for each ORF-encoded protein.
| ORF | HEV3 protein sites | HEV3 F-measure (swine/human) | HEV3 CA (%) | HEV4 protein sites | HEV4 F-measure (swine/human) | HEV4 CA (%) |
|---|---|---|---|---|---|---|
| ORF1 | 514, 557, 643, 719, 720, 724, 728, 764, 775, 783, 795, 836, 855, 1005, 1234, 1449, 1506, 1507, 1599 and 1612 | 0.86/0.89 | 87.7 | 17, 462, 523, 531, 546, 560, 562, 574, 650, 683, 732, 733, 742, 756, 772, 779, 802, 804, 1096 and 1456 | 0.88/0.95 | 92.7 |
| ORF2 | 2, 12, 25, 30, 34, 48, 53, 67, 76, 103, 113, 149, 158, 188, 356, 473, 501, 511, 527, 529, 554, 571, 609, 649, 651 and 652 | 0.79/0.84 | 81.5 | 38, 39, 46, 78, 96, 98, 119, 146, 175, 318, 521, 527, 546, 609 and 614 | 0.73/0.88 | 83.6 |
| ORF3 | 3, 4, 7, 31, 33, 36, 38, 41, 56, 70, 76, 78, 81, 83, 85, 93, 94 and 99 | 0.59/0.81 | 73.8 | 2, 17, 29, 32, 42, 53, 67, 73, 79, 82, 84, 90 and 92 | 0.60/0.85 | 78.2 |
Numbering represents protein positions in the respective ORFs.
CA, overall classification accuracy estimated by 10-fold cross-validation (CV).
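Each classifier is reported with a per-host F-measure (swine/human) and CA. A minimal sketch of how these two quantities are commonly computed from true and predicted host labels follows; the function names and toy predictions are hypothetical.

```python
def f_measure(y_true, y_pred, label):
    """F-measure (F1) for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no correct predictions of this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def classification_accuracy(y_true, y_pred):
    """Percent of strains whose predicted host matches the true host."""
    return 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical predictions for 10 strains (4 swine, 6 human).
y_true = ["swine"] * 4 + ["human"] * 6
y_pred = ["swine", "swine", "human", "human"] + ["human"] * 6
print(round(f_measure(y_true, y_pred, "swine"), 2),
      round(f_measure(y_true, y_pred, "human"), 2),
      classification_accuracy(y_true, y_pred))  # 0.67 0.86 80.0
```

When no strain of a class is classified correctly, that class's F-measure is 0.0 while the other class can still score well, which is one way an entry such as 0.0/0.83 can arise.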
Overall performance of BNCs for different ORF1-domains.
| Domain | HEV3 protein sites | HEV3 F-measure (swine/human) | HEV3 CA (%) | HEV4 protein sites | HEV4 F-measure (swine/human) | HEV4 CA (%) |
|---|---|---|---|---|---|---|
| | 70, 81, 89, 129, 137, 141, 151, 152, 158, 161, 189, 192, 200 and 206 | 0.51/0.72 | 64.6 | 61, 72, 75, 89, 122, 139, 148, 150, 189 and 204 | 0.0/0.83 | 70.9 |
| | 219, 240, 246, 248, 302, 323, 355, 363, 393, 399 and 400 | 0.29/0.75 | 63.1 | 239, 248, 274, 277, 304, 306, 332, 335, 338, 340, 356, 357, 359, 363, 413, 423, 428 and 429 | 0.56/0.87 | 80.0 |
| | 468, 495, 502, 509, 512, 514, 517, 530, 542, 557 and 589 | 0.69/0.77 | 73.8 | 462, 546, 560, 562 and 574 | 0.79/0.91 | 87.3 |
| | 719, 720, 724, 728, 740, 749, 764, 775, 783, 790, 791, 795 and 797 | 0.83/0.86 | 84.6 | 707, 718, 732, 733, 742, 756, 760, 767, 772 and 779 | 0.77/0.90 | 85.5 |
| | 843, 855, 873, 876, 879, 914, 915, 941, 949, 959, 972, 985 and 992 | 0.52/0.77 | 69.2 | 803, 804, 811, 817, 835, 836, 846, 869, 874, 876, 902, 942 and 952 | 0.31/1.0 | 80.0 |
| | 1016, 1024, 1043, 1076, 1101, 1146, 1160, 1233, 1234 and 1242 | 0.59/0.72 | 66.2 | 977, 981, 983, 984, 1003, 1007, 1044, 1047, 1064, 1094, 1096, 1100, 1117, 1120, 1124, 1136 and 1196 | 0.64/0.88 | 81.8 |
| | 1370, 1285, 1508, 1599, 1732, 1386, 1612, 1283, 1481, 1533, 1746, 1426, 1652, 1499, 1638 and 1555 | 0.79/0.83 | 81.5 | 1235, 1236, 1247, 1266, 1303, 1355, 1360, 1447, 1456, 1572, 1632, 1648, 1652, 1692, 1693 and 1704 | 0.67/0.89 | 83.6 |
Site numbering based on polyprotein positions.
CA, overall classification accuracy estimated by 10-fold cross-validation (CV).
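Both classifier tables report CA estimated by 10-fold CV. A minimal sketch of that evaluation loop follows. Because the captions do not specify the BNC structure, a naive Bayes classifier (the simplest Bayesian network classifier) is swapped in; the stratified round-robin folds, Laplace smoothing, all function names, and the toy data are assumptions.

```python
import math
import random
from collections import Counter, defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    slot = 0
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for i in members:  # deal indices round-robin across folds
            folds[slot % k].append(i)
            slot += 1
    return folds

def train_naive_bayes(X, y):
    """Count class frequencies and per-class residue counts at each site."""
    class_counts = Counter(y)
    site_counts = defaultdict(Counter)  # (class, site index) -> residue counts
    site_values = defaultdict(set)      # site index -> residues seen in training
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            site_counts[(c, j)][v] += 1
            site_values[j].add(v)
    return class_counts, site_counts, site_values

def predict(model, row, alpha=1.0):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_counts, site_counts, site_values = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log(nc / n)
        for j, v in enumerate(row):
            k = len(site_values[j] | {v})
            score += math.log((site_counts[(c, j)][v] + alpha) / (nc + alpha * k))
        if score > best_score:
            best, best_score = c, score
    return best

def cv_accuracy(X, y, k=10, seed=0):
    """Overall classification accuracy (%) pooled across k held-out folds."""
    correct = 0
    for fold in stratified_folds(y, k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = train_naive_bayes([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
    return 100.0 * correct / len(y)

# Hypothetical data: site 0 tracks host perfectly, site 1 is constant.
X = [["A", "T"]] * 10 + [["V", "T"]] * 10
y = ["swine"] * 10 + ["human"] * 10
print(cv_accuracy(X, y))  # 100.0
```

Pooling correct predictions across folds gives one overall CA; averaging per-fold accuracies is a common alternative, and which one the study used is not stated.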
Fig. 7. Host-specific separation of HEV3 and HEV4 strains in LP-modeled physicochemical space. Shown are LP plots of physicochemical properties of aa sites from Pol and Pp (Table 4). Probability mapping of human and swine strains is color-coded, with human space shown in blue and swine space in red; color density is proportional to probability. (A) LP map of HEV3 variants (n = 65) using Pol aa physicochemical properties, or markers (n = 16); (B) LP map of HEV4 variants (n = 55) using Pol markers (n = 16); and (C) LP map of HEV4 variants (n = 55) using Pp markers (n = 10). Below the maps, line charts show the prediction results (probability scores) on validation datasets for the corresponding LP maps; the y-axis represents probability [0–1], and p(H) and p(S) are the probabilities of human (blue line) and swine (red line) origin of a strain, respectively. GenBank accession numbers (x-axis) are shown for each test sequence; black triangles and circles identify HEV strains obtained from humans and swine, respectively.