Adrian J Gibbs, Kazusato Ohshima, Ryosuke Yasaka, Musa Mohammadi, Mark J Gibbs, Roger A C Jones.
Abstract
Potato virus Y (PVY) is a major pathogen of potatoes and other solanaceous crops worldwide. It is most closely related to potyviruses first or only found in the Americas, and it almost certainly originated in the Andes, where its hosts were domesticated. We have inferred the phylogeny of the published genomic sequences of 240 PVY isolates collected since 1938 worldwide, but not in the Andes. All fall into five groupings, which mostly, but not exclusively, correspond with groupings already devised using biological and taxonomic data. Only 42 percent of the sequences are not recombinant, and all of these fall into one of three phylogroups: the previously named C (common), O (ordinary), and N (necrotic) groups. There are also two other distinct groups of isolates, all of which are recombinant: the R-1 isolates have N (5' terminal minor) and O (major) parents, and the R-2 isolates have R-1 (major) and N (3' terminal minor) parents. Many isolates also have additional minor intra- and inter-group recombinant genomic regions. The complex interrelationships between the genomes were resolved by progressively identifying and removing recombinants using partitioned sequences of synonymous codons. Least squares dating (LSD) and BEAST analyses of two datasets of gene sequences from non-recombinant, heterochronously sampled isolates (seventy-three non-recombinant major ORFs and 166 partial ORFs) found that the 95% confidence intervals of the TMRCA estimates overlap around 1000 CE (Common Era; AD). We attempted to identify the most accurate datings by comparing the estimated phylogenetic dates with historical events in the worldwide adoption of potato and other PVY hosts as crops, but found that more evidence from gene sequences of non-potato isolates, especially from South America, was required.
Keywords: Potato virus Y; least squares dating; phylogenetics; probabilistic dating; recombination
Year: 2017 PMID: 28458913 PMCID: PMC5399925 DOI: 10.1093/ve/vex002
Source DB: PubMed Journal: Virus Evol ISSN: 2057-1577
Figure 1. The branches of NJ phylogenies calculated from the main genomic ORFs of (A) 240 isolates of PVY and (B) 103 of the isolates that showed no significant evidence of recombination (see text). The marked clusters are of groups named in Kehoe and Jones (2016); cluster C is of groups C1(II) and C2(III); O is of O(I) and O5(X); N is of N(IV), XIII and NA-N(IX); NTN-1 is of NTN-NW, SYR-I(XII), NTN-B(VI), NTN-NW, SYR-II(XI), N-Wi(VII), and N:O(VIII); and NTN-2 of NTN-A(V). Chile 3 is isolate Accession Code FJ214726.
Figure 2. The branches of SplitsTree phylogenies of the main genomic ORFs of (A) 240 isolates of PVY and (B) 103 of those isolates that showed no significant evidence of recombination. The marked clusters are the same as those in Fig. 1.
Figure 3. A cartoon summarizing the relationships and genomic maps of the five major phylogroups of the PVY isolates shown in Figs 1 and 2.
Figure 4. A cartoon summarizing the divergence dates of a ML phylogeny of seventy-three dated n-rec ORF sequences (Table 1 - line 1) estimated by the LSD method, and the dates from the MCC tree of a BEAST analysis of the same data. Bars indicate the 95% CI ranges for both analyses. Arrows indicate the period, 1935 CE–2016 CE, during which the isolates were collected.
Figure 5. The NJ phylogeny of the central core regions of the genomes of 166 dated C, O, N, R-1, and R-2 isolates. (A) The Accession Codes of the C isolates are blue, O red, R-1 green, R-2 orange, and N yellow. (B) Summary of the phylogroup composition of the collapsed clusters a–f. The node dates are from LSD estimates of ML and NJ phylogenies and from the MCC phylogeny of a BEAST analysis. These node dates were used to calculate the date scales assuming linearity. Labeled arrows indicate the dates of the likely origins of the O, R-1, and R-2 populations.
Table 1. LSD dates of the TMRCA and major nodes in phylogenies of 73 n-rec PVY sequences.
| Dataset | Data | Algorithm | Sites used | Node 1 date (95% CI) | Node 2 date | Node 3 date | Node 4 date | Node 5 date | Node 6 date | Rate (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|
| ORF | Nucs | ML | All | 1085.4 (1251–123) | 1269.3 | 1551.2 | 1651.4 | 1898.2 | 1901.8 | 2.07 (0.98–2.55) |
| ORF | Nucs | ML | n-syn codons | 1158.3 (1265–792) | 1320.8 | 1575.3 | 1680.5 | 1906.6 | 1903.4 | 2.78 (1.94–3.23) |
| ORF | Nucs | ML | syn codons | 1017.6 (1234–151) | 1243.7 | 1538.3 | 1629.3 | 1881.3 | 1904 | 1.77 (0.92–2.21) |
| ORF | Nucs | NJ | All | 1204.9 (1496 to −17272) | 1280.8 | 1515.2 | 1443.1 | 1890 | 1834.9 | 1.11 (0.05–1.71) |
| ORF | Nucs | NJ | n-syn codons | 1243.5 (1513 to −230) | 1309.1 | 1543 | 1462.8 | 1901.9 | 1839.6 | 1.53 (0.55–2.28) |
| ORF | Nucs | NJ | syn codons | 1152.5 (1467 to −7.2 × 10⁸) | 1242 | 1438.7 | 1459.8 | 1846.6 | 1828.4 | 0.91 (1 × 10⁻⁶–1.46) |
| ORF | AAs | ML | All | 1363.6 (1501–238) | 1427.1 | 1603 | 1702.1 | 1943 | 1933.6 | 1.05 (0.41–1.26) |
| ORF | AAs | NJ | All | 1459.9 (1642–70) | 1513 | 1615 | 1702.3 | 1933.1 | 1910.4 | 0.86 (0.23–1.19) |
| Core | Nucs | ML | All | 1151.2 (1296–703) | 1259.9 | 1626.6 | 1622.8 | 1918.2 | 1925.7 | 2.02 (1.34–2.43) |
| Core | Nucs | NJ | All | 1307.7 (1521 to −71) | 1307.7 | 1624.8 | 1544.6 | 1906.2 | 1897.5 | 1.19 (0.40–1.57) |
Node dates: positive dates are CE (Common Era = AD), negative are BCE (=BC).
Dataset: ORF, major open reading frame; Core, nucleotides 2206–5706.
Data: Nucs, nucleotides; AAs, encoded amino acids.
Algorithm: ML, maximum likelihood (PhyML); NJ, neighbor-joining (ClustalX).
Nodes numbered as in Fig. 4; 95% confidence intervals for Node 1 only.
Rate: evolutionary rate, substitutions/site/year; 95% confidence intervals for Node 1.
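The dating of heterochronously sampled sequences relies on genetic distance from the root increasing with sampling date. As a hedged illustration of that idea only, the sketch below fits an ordinary least-squares root-to-tip regression, the kind of check TempEst performs; it is not the LSD algorithm itself, which fits dates over the whole tree. The function name and the sampling years and distances are hypothetical toy values, not data from the paper.

```python
def root_to_tip_regression(sampling_years, root_to_tip_distances):
    """Fit distance = rate * (year - tmrca) by ordinary least squares.

    Returns (rate, tmrca): the slope in substitutions/site/year and the
    x-intercept, i.e. the year at which the fitted distance is zero.
    """
    n = len(sampling_years)
    mean_year = sum(sampling_years) / n
    mean_dist = sum(root_to_tip_distances) / n
    sxx = sum((t - mean_year) ** 2 for t in sampling_years)
    sxy = sum(
        (t - mean_year) * (d - mean_dist)
        for t, d in zip(sampling_years, root_to_tip_distances)
    )
    rate = sxy / sxx
    intercept = mean_dist - rate * mean_year
    tmrca = -intercept / rate
    return rate, tmrca


# Toy data (assumption): a perfect clock of 1e-4 subs/site/year
# with the root placed at year 1000 CE.
toy_years = [1940, 1970, 2000, 2013]
toy_distances = [1e-4 * (year - 1000) for year in toy_years]

rate, tmrca = root_to_tip_regression(toy_years, toy_distances)
print(f"rate = {rate:.2e} subs/site/year, TMRCA = {tmrca:.1f} CE")
```

With real data the points scatter around the line, and the wide intervals on Node 1 in the table reflect how far the fit must be extrapolated back from a sampling window of under a century.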
Figure 6. A cartoon summarizing the TMRCA dates and 95% CIs (vertical scale) of PVY estimated by TempEst, LSD (ML and NJ trees) and BEAST (MCC tree) analyses of the seventy-three n-rec ORF and 166 core sequences, together with historical events that may have influenced the evolution of the virus.
BEAST dates of TMRCA nodes in phylogenies of PVY sequences.
| Dataset | 73 n-rec complete ORFs | All 166 core regions | 73 n-rec core regions | 93 rec core regions |
|---|---|---|---|---|
| Sequence length (nt) | 9201 | 3501 | 3501 | 3501 |
| No. of sequences | 73 | 166 | 73 | 93 |
| Sampling date range | 1938–2013 | 1938–2013 | 1938–2012 | 1970–2013 |
| TMRCA (years before 2013) | 3603 (1411–6566) | 1989 (1084–3096) | 2227 (951–3952) | 2006 (688–3755) |
| TMRCA (date) | −1590 (602 to −4553) | 24 (929 to −1083) | −214 (1062 to −1939) | 7 (1325 to −1742) |
| Substitution rate (nt/site/year) | 5.97 × 10⁻⁵ (2.50 × 10⁻⁵–9.56 × 10⁻⁵) | 9.99 × 10⁻⁵ (6.88 × 10⁻⁵–1.32 × 10⁻⁴) | 9.24 × 10⁻⁵ (4.33 × 10⁻⁵–1.39 × 10⁻⁴) | 8.66 × 10⁻⁵ (3.71 × 10⁻⁵–1.37 × 10⁻⁴) |
TMRCA, ‘time to the most recent common ancestor’. The first TMRCA row gives years before 2013; 95% credibility intervals (CI) in parentheses.
The second TMRCA row gives the same estimates as dates; positive dates are CE (Common Era = AD), negative are BCE (=BC). 95% credibility intervals (CI) in parentheses.
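The two TMRCA rows carry the same information: each date is 2013 minus the corresponding age. The short sketch below reproduces the date row from the age row by simple subtraction, which is what the tabulated values imply (no year-zero adjustment is applied). The function and variable names are illustrative assumptions; the numbers are copied from the table.

```python
REFERENCE_YEAR = 2013  # ages in the table are years before 2013


def age_to_year(age, reference_year=REFERENCE_YEAR):
    """Convert an age in years before the reference year to a calendar year.

    Positive results are CE, negative results are BCE, following the
    table's convention (plain subtraction, no year-zero correction).
    """
    return reference_year - age


def interval_to_years(younger_age, older_age, reference_year=REFERENCE_YEAR):
    """Convert an age interval to (latest_year, earliest_year)."""
    return (
        age_to_year(younger_age, reference_year),
        age_to_year(older_age, reference_year),
    )


# (mean age, younger bound, older bound), copied from the TMRCA age row.
tmrca_ages = {
    "73 n-rec complete ORFs": (3603, 1411, 6566),
    "All 166 core regions": (1989, 1084, 3096),
    "73 n-rec core regions": (2227, 951, 3952),
    "93 rec core regions": (2006, 688, 3755),
}

tmrca_dates = {
    name: (age_to_year(mean), *interval_to_years(younger, older))
    for name, (mean, younger, older) in tmrca_ages.items()
}

for name, (mean, latest, earliest) in tmrca_dates.items():
    print(f"{name}: {mean} ({latest} to {earliest})")
```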