Jisung Jang, Sook Hee Yoon, Wonseok Lee, Jihyun Yu, Joon Yoon, Seunghyun Shim, Heebal Kim.
Abstract
Porcine epidemic diarrhea virus (PEDV), the causative agent of porcine epidemic diarrhea (PED), has caused tremendous economic losses in the global swine industry. Although the phylogeny of PEDV has been investigated extensively at the molecular level, no time-calibrated phylogenomic study of the virus had been reported. To improve insight into this topic, we analyzed 138 published genome sequences using Bayesian coalescent analysis together with Bayesian inference and maximum likelihood methods. All of the global PEDV isolates were divided into six groups, except for one unclassified isolate. Of the six groups, Groups 1-5 comprised pandemic viruses, while the remaining Group 6 contained classical isolates. Interestingly, the two clades, pandemic and classical, each carried clade-specific amino acid sequences in five genes: ORF1a, ORF1b, S, ORF3, and N. Within the pandemic clade, Groups 1 and 2 originated from North America, whereas Groups 3-5 were derived from Asia. In Group 2, the S INDEL isolates shared a common origin. Within each group, there was no apparent association between temporal or geographic origin and heterogeneity of PEDVs. Our findings also showed that PEDV evolved at a rate of 3.38 × 10⁻⁴ substitutions/site/year, and that the most recent common ancestor of the virus emerged 75.9 years ago. Our Bayesian skyline plot analysis indicated that PEDV maintained a constant effective population size except for a short period around 2012, when a valley-shaped decline in the effective number of infections occurred.
Keywords: Phylogenetics; Population size; Porcine epidemic diarrhea virus; The most recent common ancestor
Year: 2018 PMID: 30047110 PMCID: PMC7096959 DOI: 10.1007/s13258-018-0686-0
Source DB: PubMed Journal: Genes Genomics ISSN: 1976-9571 Impact factor: 1.839
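To put the rate and age quoted in the abstract on a more tangible scale, the short sketch below multiplies them out. It assumes the single mean rate applies uniformly across the 27,436-site coding alignment reported in this record and along one lineage, which is a simplification; the variable and function names are illustrative and not taken from the study.

```python
# Figures quoted from this record; names are illustrative only.
RATE = 3.38e-4            # substitutions/site/year (abstract)
TMRCA_YEARS = 75.9        # age of the most recent common ancestor (abstract)
ALIGNMENT_SITES = 27_436  # coding-genome alignment length, gaps included


def substitutions_per_genome_per_year(rate, sites):
    """Expected substitutions per year across the whole alignment."""
    return rate * sites


def substitutions_per_site_since_mrca(rate, years):
    """Expected substitutions per site along one lineage since the MRCA."""
    return rate * years


per_genome = substitutions_per_genome_per_year(RATE, ALIGNMENT_SITES)
per_site = substitutions_per_site_since_mrca(RATE, TMRCA_YEARS)

print(f"{per_genome:.2f} substitutions per coding genome per year")
print(f"{per_site:.4f} substitutions per site since the MRCA")
```

Under these assumptions the rate corresponds to roughly nine substitutions per coding genome per year, and to about 0.026 substitutions per site along a single lineage over the 75.9 years.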
The best fit evolutionary models estimated for PEDV genomic regions with Modeltest
| Genomic region | Model | −lnL | Base frequencies (A, C, G) | Substitution matrix (A–C, A–G, A–T, C–G, C–T) | Pinvar | Nst |
|---|---|---|---|---|---|---|
| Coding region | GTR + I + G | 74448.13 | 0.25, 0.19, 0.23 | 0.78, 2.06, 0.54, 0.54, 5.62 | 0.69 | 6 |
| ORF1a | GTR + I + G | 29562.00 | 0.24, 0.18, 0.24 | 0.74, 2.02, 0.36, 0.28, 5.06 | 0.67 | 6 |
| ORF1b | GTR + I + G | 18720.92 | 0.26, 0.19, 0.22 | 0.45, 2.56, 0.62, 0.24, 9.04 | 0.75 | 6 |
| S | GTR + I + G | 14662.38 | 0.25, 0.20, 0.21 | 1.29, 1.91, 0.69, 1.04, 4.55 | 0.46 | 6 |
| ORF3 | TrN + I | 1699.44 | 0.24, 0.19, 0.19 | 1.00, 3.26, 1.00, 1.00, 6.24 | 0.64 | 6 |
| E | TrN | 563.85 | 0.25, 0.19, 0.18 | 1.00, 0.84, 1.00, 1.00, 6.26 | 0.00 | 6 |
| M | TrN + I | 1422.62 | 0.22, 0.22, 0.23 | 1.00, 2.31, 1.00, 1.00, 8.02 | 0.76 | 6 |
| N | GTR + G | 3976.63 | 0.30, 0.23, 0.24 | 0.49, 1.47, 0.57, 0.64, 4.15 | 0.00 | 6 |
−lnL: negative log-likelihood score; Pinvar: proportion of invariable sites; Nst: number of substitution types
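The parameters in this table are enough to write down the instantaneous rate matrix of a GTR-type model. The sketch below does so for the S gene row. It assumes that the frequency of T is the remainder after A, C and G, and that the unlisted G–T rate is the reference rate fixed at 1.0 (the usual Modeltest convention, not stated in this record). It covers only the substitution matrix; the +I and +G components are left out. The function name is hypothetical.

```python
def gtr_rate_matrix(freqs_acg, rates):
    """Build a normalized GTR rate matrix Q in the order A, C, G, T.

    freqs_acg: (pi_A, pi_C, pi_G); pi_T is taken as the remainder (assumption).
    rates: (A-C, A-G, A-T, C-G, C-T); G-T is assumed to be fixed at 1.0.
    """
    pi = list(freqs_acg) + [1.0 - sum(freqs_acg)]
    ac, ag, at, cg, ct = rates
    gt = 1.0  # assumed reference rate
    # Symmetric exchangeability matrix.
    r = [
        [0.0, ac, ag, at],
        [ac, 0.0, cg, ct],
        [ag, cg, 0.0, gt],
        [at, ct, gt, 0.0],
    ]
    # Off-diagonal entries: q_ij = r_ij * pi_j.
    q = [[r[i][j] * pi[j] if i != j else 0.0 for j in range(4)]
         for i in range(4)]
    # Diagonal entries make each row sum to zero.
    for i in range(4):
        q[i][i] = -sum(q[i])
    # Scale so the mean rate at equilibrium is one substitution per unit time.
    mean_rate = -sum(pi[i] * q[i][i] for i in range(4))
    return [[x / mean_rate for x in row] for row in q], pi


# S gene row of the table above.
Q_S, PI_S = gtr_rate_matrix((0.25, 0.20, 0.21), (1.29, 1.91, 0.69, 1.04, 4.55))
for base, row in zip("ACGT", Q_S):
    print(base, " ".join(f"{x:8.4f}" for x in row))
```

The TrN rows fit the same function because their three transversion-type entries are already listed as 1.00, leaving only the two transition rates free.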
Summary of the genomic regions of the entire PEDV coding genome
| Genomic region | Total sites including gaps, nt/aa | Conserved sites (%), nt/aa | Average identities (%), nt/aa | ω value (dN/dS) |
|---|---|---|---|---|
| Coding region | 27,436/9131 | 23,670 (86.2%)/7694 (84.3%) | 98.9/99.1 | 0.167 |
| ORF1a | 12,309/4103 | 11,047 (89.7%)/3586 (87.4%) | 99.1/99.2 | 0.188 |
| ORF1b | 8035/2678 | 7326 (91.2%)/2475 (92.4%) | 99.1/99.7 | 0.077 |
| S | 4179/1383 | 2900 (69.4%)/868 (62.8%) | 97.9/97.7 | 0.243 |
| ORF3 | 672/224 | 508 (75.6%)/164 (73.2%) | 98.8/98.9 | 0.248 |
| E | 228/76 | 174 (76.3%)/58 (76.3%) | 99.0/99.1 | 0.215 |
| M | 690/226 | 623 (90.3%)/205 (90.7%) | 99.5/99.4 | 0.272 |
| N | 1323/441 | 1085 (82.0%)/338 (76.6%) | 98.6/98.8 | 0.271 |
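The ω column can be read with the conventional rule of thumb that ω < 1 suggests purifying selection, ω close to 1 neutrality, and ω > 1 positive selection. The sketch below ranks the regions by the ω values in the table and applies that rule. It is a simple illustration rather than the study's own analysis, and a gene-wide average can mask individual sites under positive selection. The helper names are hypothetical.

```python
# ω (dN/dS) values copied from the table above.
OMEGA = {
    "Coding region": 0.167,
    "ORF1a": 0.188,
    "ORF1b": 0.077,
    "S": 0.243,
    "ORF3": 0.248,
    "E": 0.215,
    "M": 0.272,
    "N": 0.271,
}


def selection_regime(omega):
    """Conventional reading of a gene-wide dN/dS ratio."""
    if omega < 1.0:
        return "purifying"
    if omega > 1.0:
        return "positive"
    return "neutral"


def rank_by_omega(omega_by_region):
    """Return (region, ω) pairs from most to least constrained."""
    return sorted(omega_by_region.items(), key=lambda item: item[1])


for region, omega in rank_by_omega(OMEGA):
    print(f"{region:<14} ω = {omega:.3f}  {selection_regime(omega)}")
```

Every region in the table falls below 1, with ORF1b the most constrained and M and N the least.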
Conserved clade-specific amino acid sequences in the classical and pandemic PEDV clades
| Genomic region | Classical clade (n = 11) | Pandemic clade (n = 127) |
|---|---|---|
| ORF1a | V-813a, E-962, E-1021, V-1596, G-1894, A-2306, S-3074, A-3075, G-3299 | A-813, D-962, D-1019, D-1021, N-1037, L-1596, D-1894, I-2138, S-2306, G-3074, V-3075, S-3299 |
| ORF1b | A-4276, V-6512, V-6740 | S-4276, M-5552, I-6512, I-6740 |
| S | I-6785, I-6847, I-7136, E-7145, T-7329, G-7374, N-7504, A-7739, S-7824, G-7953, Y-7974, S-8012, R-8078 | T-7136, Q-7145, S-7329, S-7374, S-7504, S-7546, V-7739, F-7743, A-7824, D-7953, R-8012, Q-8078 |
| ORF3 | F-8255 | |
| N | H-8931, Q-9086 | L-8931, L-9086 |
a All positions refer to the coding genome sequence of PEDV strain CV777 (GenBank accession no. AF353511)
Fig. 1 Nucleotide (a) and amino acid (b) sequence differences throughout the complete coding genomes of 138 global PEDVs. The number of differences at each site represents the number of variable isolates estimated from the multiple sequence alignment. Each color indicates a different genomic region. (Color figure online)
Fig. 2 Bayesian maximum clade credibility phylogenetic tree derived from the complete coding genome sequences of 138 PEDVs. The data set (27,436 bp) was also analyzed phylogenetically using Bayesian inference (BI) and maximum likelihood (ML) methods, and both produced an identical topology. Divergence times (in years) are positioned below the nodes, and the 95% HPD intervals are in brackets. The credibility of the phylogenetic analysis is presented above the nodes: the left numbers represent Bayesian posterior probabilities (> 0.80) and the right ones represent ML bootstrap values (> 60%). Groups are indicated above the corresponding nodes using colored circles. (Color figure online)
Fig. 3 Bayesian skyline plot based on the entire genome sequences of 138 PEDV isolates. The bold line shows the effective population size estimated through time. The upper and lower lines indicate the 95% HPD intervals for this estimate.