Literature DB >> 35591877

Widespread interspecific phylogenetic tree incongruence between mosquito-borne and insect-specific flaviviruses at hotspots originally identified in Zika virus.

Michael W Gaunt1, John H-O Pettersson2, Goro Kuno3, Bill Gaunt4, Xavier de Lamballerie5, Ernest A Gould5.   

Abstract

Intraspecies (homologous) phylogenetic incongruence, or 'tree conflict' between different loci within the same genome of mosquito-borne flaviviruses (MBFV), was first identified in dengue virus (DENV) and subsequently in Japanese encephalitis virus (JEV), St Louis encephalitis virus, and Zika virus (ZIKV). Recently, the first evidence of phylogenetic incongruence between interspecific members of the MBFV was reported in ZIKV and its close relative, Spondweni virus. Uniquely, these hybrid proteomes were derived from four incongruent trees involving an Aedes-associated DENV node (1 tree) and three different Culex-associated flavivirus nodes (3 trees). This analysis has now been extended across a wider spectrum of viruses within the MBFV lineage targeting the breakpoints between phylogenetic incongruent loci originally identified in ZIKV. Interspecies phylogenetic incongruence at these breakpoints was identified in 10 of 50 viruses within the MBFV lineage, representing emergent Aedes and Culex-associated viruses including JEV, West Nile virus, yellow fever virus, and insect-specific viruses. Thus, interspecies phylogenetic incongruence is widespread amongst the flaviviruses and is robustly associated with the specific breakpoints that coincide with the interspecific phylogenetic incongruence previously identified, inferring they are 'hotspots'. The incongruence amongst the emergent MBFV group was restricted to viruses within their respective associated epidemiological boundaries. This MBFV group was RY-coded at the third codon position ('wobble codon') to remove transition saturation. The resulting 'wobble codon' trees presented a single topology for the entire genome that lacked any robust evidence of phylogenetic incongruence between loci. Phylogenetic interspecific incongruence was therefore observed for exactly the same loci between amino acid and the RY-coded 'wobble codon' alignments and this incongruence represented either a major part, or the entire genomes. Maximum likelihood codon analysis revealed positive selection for the incongruent lineages. Positive selection could result in the same locus producing two opposing trees. These analyses for the clinically important MBFV suggest that robust interspecific phylogenetic incongruence resulted from amino acid selection. Convergent or parallel evolutions are evolutionary processes that would explain the observation, whilst interspecific recombination is unlikely.
© The Author(s) 2022. Published by Oxford University Press.

Entities:  

Keywords:  Mosquito-borne viruses; RY-coding; flavivirus; phylogenetic incongruence; recombination; selection

Year:  2022        PMID: 35591877      PMCID: PMC9113262          DOI: 10.1093/ve/veac027

Source DB:  PubMed          Journal:  Virus Evol        ISSN: 2057-1577


Introduction

The genus Flavivirus (family Flaviviridae) contains more than 50 RNA positive-stranded mosquito-borne flaviviruses (MBFV) that infect well in excess of 400 million humans annually, throughout the tropics, sub-tropics, and warmer temperate regions. They include four dengue virus serotypes, yellow fever virus (YFV), Japanese encephalitis virus (JEV), St Louis encephalitis virus (SLEV), West Nile virus (WNV), and Zika virus (ZIKV). The MBFV lineage also includes some insect-specific flaviviruses (ISFV) (Huhtamo et al. 2009, 2014; Grard et al. 2010) and Ecuador Paraiso Escondido virus (EPEV) an isolate from sandflies (Alkan et al. 2015) which does not replicate in mammalian cells even though it is placed phylogenetically in the MBFV lineage with viruses that are pathogens for mammals, including YFV and DENV. Bat-associated viruses which have no recognized arthropod vector, are also placed phylogenetically in the MBFV lineage. The genus Flavivirus also contains basal members of the ISFV which exist outside the MBFV lineage and are closely related to cell fusing agent virus (CFAV), possibly, a non-vectored precursor of the genus (Cook et al. 2012). Phylogenetic analyses robustly divided the MBFV lineage into two phylo-epidemiological groups, viz. Aedes-associated and Culex-associated MBFV, using either single protein gene trees (Gaunt et al. 2001; Gould et al. 2001) or whole-genome data (Huhtamo et al. 2009, 2014; Cook et al. 2012; Kolodziejek et al. 2013; Moureau et al. 2015). The Aedes-associated groups included YFV and DENV represented as paraphyletic clades (Huhtamo et al. 2009; Kuno et al. 2009; Kolodziejek et al. 2013). However, the Culex-associated flaviviruses formed a monophyletic group that included JEV, WNV, SLEV, and Rocio virus (ROCV) (Gaunt et al. 2001; Gould et al. 2001; Grard et al. 2010; Moureau et al. 2015). Intraspecific homologous phylogenetic incongruence in the MBFV has been identified in each of the four DENV serotypes, JEV, SLEV, and African ZIKV with most studies notably focusing on the E-protein (Holmes et al. 1999; Twiddy and Holmes 2003; Aaskov et al. 2007; Weaver and Vasilakis 2009; Carney et al. 2012; Faye et al. 2014; Worobey and Holmes 1999, Worobey et al. 1999). Surprisingly, in the above studies, intraspecific tree incongruence was not detected in YFV. This might be explained by the fact that the analyses were restricted because they were based on the flavivirus envelope protein (Twiddy and Holmes 2003), limiting their analytical power. Preliminary evidence for potential recombination was also reported in the non-MBFV lineage flavivirus CFAV (Cook et al. 2012), and CFAV recombination has also been observed between structural proteins and non-structural (NS) proteins within the CFAV-associated ISFV group (Baidaliuk et al. 2020). Phylogenetic incongruence can result from different underlying evolutionary processes, including recombination (Simon-Loriere and Holmes 2011; Thézé et al. 2015), selection, and associated parallel or convergent evolution (Sackman et al. 2017; Stern et al. 2017), as well as systematic error involving the tree-building methods, for example, where the substitution rate variation leads to ‘long branch attraction’ (Felsenstein 1978; Huelsenbeck 1997; Olsen 2013) or compositional bias (Phillips et al. 2004; Ishikawa et al. 2012; Breinholt and Kawahara 2013; Song et al. 2016). Recombination is of interest because in flaviviruses it makes biological sense within the theory of protein structure-stability associated with compensatory mutations. State-of-the-art analysis assumes two residues are physically associated in protein space despite not being contiguous within the coding sequence (Jones and Kandathil 2018). For example, the flavivirus E-protein monomer comprises parallel beta-pleated sheets, and a mutation in one beta-pleated sheet could destabilize the structure unless it is accompanied by a compensatory mutation in the parallel sheet. However, recombination ensures that both alleles are transferred to the progeny. In support of this argument, compensatory mutations that were reported in hepatitis C virus recombinants were identified in separate studies (Kalinina et al. 2002; Cusick et al. 2011) and interspecific recombination was detected in the NS5B protein within the genus Hepacivirus of the Flaviviridae family (Thézé et al. 2015). Amongst the MBFV lineage viruses, phylogenetic evidence of interspecific phylogenetic incongruence was reported between the most recent common ancestor of ZIKV and SPOV i.e. an Aedes-associated virus and more than one virus representing the Culex-associated virus clades (Gaunt et al. 2020). This discovery provided a novel and rational explanation for the long-standing observation of phylogenetic incongruence observed in several separate MBFV evolutionary studies. For years, it was unclear whether ZIKV and closely related SPOV belonged, topologically, in the DENV-Aedes-associated clade or the Culex-associated clade. The study provided robust phylogenetic evidence that the ancestral Aedes-associated lineage of ZIKV and SPOV had undergone three distinct interspecific recombination events, each of which involved a genetic crossover with one or more Culex-associated viruses, resulting in hybrid proteomes. With the evidence of robust interspecific phylogenetic incongruence established for ZIKV and SPOV, the recombined loci were observed to associate with distinct immunological properties; these genomes encompassed domains corresponding to the DENV pre-membrane (prM), envelope (E), and non-structural protein (NS1), all of which primarily induce human B-cell immune responses, plus the NS4B protein. In addition, the ZIKV and SPOV proteins that generate dominant CD8+ and/or CD4+ T-cell responses, formed three independent incongruent monophyletic groups within two Culex-associated clades. One of these clades comprised Australasian/Melanesian flavivirus strains with dominant HLA-associated CD8+ epitopes. The other clade involved the JEV-related Culex-associated viruses. The three incongruent Culex clade protein loci responsible for T-cell responses, comprised the capsid (C), NS2A to NS3 DEAD domain, the C-terminal NS3 HELICc domain to NS4A protein, inclusively, and the NS5 protein (Gaunt et al. 2020). Importantly, it has recently been demonstrated that the T-cell response following immunization with a live-attenuated JEV vaccine, produced a cross-protection against ZIKV infection in humans (Wang et al. 2020). This was phylogenetically predicted in our previous publication which identified interspecific recombination between DENV and JEV-related Culex-associated flaviviruses (Gaunt et al. 2020). The discovery of interspecific phylogenetic incongruence between flaviviruses occupying different but potentially sympatric ecological niches raised three important questions, ‘are the three independent occurrences of ancestral hybrids of ZIKV and SPOV isolated events within the MBFV lineage?’. Alternatively, ‘do the interspecific phylogenetic incongruence breakpoints identified in the ZIKV and SPOV ancestor represent “hotspots” for other MBFV lineage viruses?’ More fundamentally, given that the ZIKV and SPOV/DENV (Aedes/Culex) hybrid proteomes represent phylogenetic incongruence over a large genetic distance, ‘could any of the other MBFV lineage viruses, including the ISFV, demonstrate phylogenetic incongruence at these breakpoints?’ In this respect, the previously reported tree incongruence between CFAV-associated ISFV, which are ancestral to the MBFV lineage, is important because it could help towards understanding the evolutionary processes in viruses nearer to or at the root of the tree (Cook et al. 2012; Baidaliuk et al. 2020). Here, following an extensive programme of phylogenetic analyses we report the identification of widespread interspecific phylogenetic incongruence between MBFV lineage viruses notably for amino acid or first and second codon position alignments, whilst third position codon alignments demonstrated within loci phylogenetic incongruence against the other two data types.

Methods

Bayesian amino acid and nucleotide phylogenies analysed using Beast

The uncorrelated relaxed clock was run using all available models, viz. the lognormal, gamma (γ), and exponential distributions. (Suchard et al. 2018) The strict and all relaxed clock models were run under two coalescent priors constant size and Skyline plot (Drummond et al. 2005). All Bayesian MCMC calculations were performed in duplicate using two different random number seeds that were identical between data sets. Each Bayesian MCMC comprised 5 × 106 replications and a burn-in of 0.5 × 106 replications. If convergence was not observed the calculation was repeated for 1 × 107, 2 × 107, or 5 × 107 replications until convergence was observed. Bayesian MCMC convergence was assessed firstly using the tools within the Beast v1.10.4 package via the effective sample size (ESS) (‘loganalyser’) where the burn-in was applied, the two MCMC run tree and log files combined (‘logcombiner’) and the following parameters assessed for an ESS > 200: marginal, likelihood, prior, population size, tree model root height, tree length, ucld/ucgd mean, coefficient of variation, covariance, tree likelihood, branch rates, and coalescent. All statistical outputs of the ESS are available on request. The duplicate MCMC runs were assessed individually for convergence against each other using the diversity of graphing tools in Tracer 1.7 (Rambaut et al. 2018). The relaxed clock model, running under an exponential distribution, did not achieve convergence within 2 × 107 replications for all large data sets and was therefore discontinued. Hereafter, we refer to the remaining Bayesian phylogenetic calculations as the ‘6 models’, viz. one strict and two relaxed clocks (lognormal and gamma distributions) each for two population size assumptions (constant and ‘skyline’).

Beast and maximum likelihood amino acid phylogenies analysed using the LG matrix

The phylogenies using all members of the MBFV were generated using amino acid data rather than nucleotides to reduce problems arising from substitutional saturation in the third codon position. Moreover, amino acid data carry more information because they report on transversion substitutions within the third codon position for a large proportion of amino acid mutations. Recent applications of the LG matrix (Le and Gascuel 2008) support this assumption (Wolf et al. 2018; Gaunt et al. 2020).

Tree visualization

Dendroscope 3 was used as the primary phylogenetic tool for visualizing different tree structures between loci under the network clustering algorithm (Huson and Scornavacca 2012). In Dendroscope 3, differences between trees are represented as reticulation, approximating to a ‘tri-furcating’ tree.

Phylogenetic robustness analysed non-parametric, parametric bootstrapping, and posterior probabilities

Subsequent analyses were based on the observation that there was a maximum of one phylogenetic incongruence for a given flavivirus, i.e. a robustly incompatible tree structure from an amino acid alignment identified using maximum likelihood non-parametric bootstrapping and then assessed by Bayesian MCMC posterior probabilities for all six models. These two alternative structures were examined using parametric bootstrapping and again verified for supporting evidence using posterior probabilities for all six models (below). The rationale for the two approaches is that non-parametric bootstrap analysis provides a means of data mining whereas parametric bootstrapping (SOWHAT test) enables two alternative scenarios to be precisely examined (below). Posterior probabilities provide support for both methods because they have been observed to provide higher support than non-parametric bootstrapping (Douady et al. 2003; Alfaro et al. 2003). Phylogenetic incongruence was considered robust by non-parametric bootstrapping when nodes were supported by values greater than or equal to 80 per cent and for Bayesian analysis a posterior probability greater than 0.95 is considered robust.

Alignments

The genome-wide alignments in this study comprised seven loci representing robust interspecific phylogenetic breakpoints identified in the ZIKV–SPOV lineage are presented in Fig. 1 (Gaunt et al. 2020) together with the immunological associations for the breakpoints. It is important to note that the two combinatorial genomic loci identified by breakpoints in Fig. 1 (defined below) were combined without reference to B- or T-cell immunity, instead the aim was to obtain the most robust phylogenies with which to identify the interspecific phylogenetic incongruence that resulted in ZIKV and SPOV.
Figure 1.

Schematic representation of the recombination breakpoints and resulting seven protein loci across the MBFV amino acid genome, viz. prM, E, NS1–NS3 HELICc, NS2A–NS3 DEAD, C-terminal NS3–NS4A, NS4B and C–NS5, identified in the interspecific recombination events between ZIKV–SPOV (DENV Aedes-associated clade) with the Culex I and II-associated MBFV resulting in a chimeric genome. (Supplementary Information 1). Combined locus pairs (NS1–NS3 HELICc and C–NS5) were identical in topology using the SOWHAT test prior amalgamation. The predominant immunological associations are indicated in the diagram below after the recombination breakpoints were robustly established for both contiguous and combined loci. Note, C-protein and NS5-protein are juxtaposed during replication due to genome circularisation.

Schematic representation of the recombination breakpoints and resulting seven protein loci across the MBFV amino acid genome, viz. prM, E, NS1–NS3 HELICc, NS2A–NS3 DEAD, C-terminal NS3–NS4A, NS4B and C–NS5, identified in the interspecific recombination events between ZIKV–SPOV (DENV Aedes-associated clade) with the Culex I and II-associated MBFV resulting in a chimeric genome. (Supplementary Information 1). Combined locus pairs (NS1–NS3 HELICc and C–NS5) were identical in topology using the SOWHAT test prior amalgamation. The predominant immunological associations are indicated in the diagram below after the recombination breakpoints were robustly established for both contiguous and combined loci. Note, C-protein and NS5-protein are juxtaposed during replication due to genome circularisation. The methods and resulting alignments used to confirm interspecific phylogenetic incongruence in ZIKV–SPOV are described in Supplementary Information 1 (Forslund et al. 2004; Lemey et al. 2009; Gaunt et al. 2020). In summary, the MBFV lineage alignments of Fig. 1 and Supplementary Information 1 delineate: the proteins prM and E; a contiguous amino acid locus from NS2A to the NS3 DEAD domain (hereafter termed NS2A–NS3 DEAD); a contiguous locus from N-terminal NS3 HELIC domain and NS4A protein; NS4B protein; combined loci of NS1 and partial NS3 HELICc domains (hereafter termed NS1–NS3 HELICc); and a combined locus of C-protein and NS5 protein (hereafter termed C–NS5).

Amino acid alignments for maximum likelihood bootstrapping and Beast

Two variations in alignments were used between the two core approaches in this study described in Fig. 2: ‘data mining’ for non-parametric bootstrap analysis (70 taxa alignment) and the Bayesian analysis and ‘SOWHAT’ test used in the analysis (50 taxa alignment). Non-parametric bootstrapping was conducted for 1000 replications using 70 flavivirus genome sequences, comprising 50 different members of the MBFV lineage and two tick-borne virus genomes to root the tree (Supplementary Table S1). A reduced taxa alignment was required for amino acid parametric bootstrapping because overall this calculation is more intensive than non-parametric bootstrapping (SOWHAT test described below). The Bayesian and parametric bootstrap analysis used the same alignments as the non-parametric bootstrap analysis but with 50 flavivirus genomes, comprising 48 different members of the MBFV lineage and two tick-borne flaviviruses (Supplementary Table S1). Duck egg-drop syndrome virus (DEDSV) and Israel turkey meningoencephalitis virus (ITV) were discarded because genetically they were almost identical at all loci to Baiyangdian virus (BYDV) and Bagaza virus (BAGV), respectively.
Figure 2.

The analytical approach to genome-wide interspecific recombination using ‘data mining’ via non-parametric bootstrap analysis and then the SOWHAT test followed by Bayesian analysis between the MBFV described for the seven loci described in Fig. 1. The’data mining’ method and SOWHAT amino acid alignment datasets, differ in the number of taxa in the analysis, but not in interspecific genetic diversity. Note, the trees used for SOWHAT analysis are zero constrained.

The analytical approach to genome-wide interspecific recombination using ‘data mining’ via non-parametric bootstrap analysis and then the SOWHAT test followed by Bayesian analysis between the MBFV described for the seven loci described in Fig. 1. The’data mining’ method and SOWHAT amino acid alignment datasets, differ in the number of taxa in the analysis, but not in interspecific genetic diversity. Note, the trees used for SOWHAT analysis are zero constrained.

Nucleotide alignments, RY-coding, and associated phylogenies

Nucleotide phylogenies were produced for the same loci but utilizing restricted taxa to minimize nucleotide saturation. This analysis comprised two alignments each for first and second codon positions and a separate alignment for third codon positions, hereafter called ‘wobble codons’: (1) Culex I group using the KOKV group as the outgroup; (2) YFV group using the SOKV group as the outgroup. Each MBFV group listed here is described in Fig. 2. To overcome third codon position saturation all wobble codons were purine (R) and pyrimidine (Y), RY, transformed (Phillips et al. 2004). Nucleotide analysis used both maximum likelihood (RAxML) non-parametric bootstrapping for 1000 replications (Stamatakis 2014) and Bayesian methods (Beast v1.10.4; Suchard et al. 2018) both under a general time reversible (GTR) model with 4-category γ-distribution. All further methods involved in nucleotide tree construction were identical to the amino acid methods, with the exception that a single relaxed clock (lognormal) was run under both a constant and ‘Skyline’ (3–5 groups) population size assumption. The Bayesian MCMC could not achieve convergence in any RY-coded data set therefore phylogenetic analysis was discontinued for these data sets.

Parametric bootstrapping

Likelihood-based parametric tests of tree topologies have a long history in phylogenetics and the power of a parametric bootstrap over non-parametric bootstrapping resampling is well established (Goldman 1993; Hillis et al. 1996; Huelsenbeck and Bull 1996), in particular one method designated the Swofford–Olsen–Waddell–Hillis (SOWH) test (Hillis et al. 1996) being known for its rigorous verification (Goldman et al. 2000). A standard SOWH test compares the log-likelihood of the maximum likelihood (unconstrained tree) to the log-likelihood of an a priori tree hypothesis and the difference in log-likelihood scores between the unconstrained and constrained tree is the test statistic and denoted δ, i.e. δ ≡ LtML–Lt1 or t2. The null distribution is calculated by parametric bootstrapping using the tree parameters and the maximum likelihood tree of the constrained tree (t1) to generate a Monte Carlo simulation of amino acid sequence data. In this study 100 parametric bootstrap replicates were calculated for each test using amino acid data and 500 replicates for nucleotide data both using Seq-Gen v1.3.4 (Rambaut and Grassly 1997). Seq-Gen permitted simulation under the same models used for non-parametric bootstrapping (LG matrix for amino acid data and GTR for nucleotide data both with a 4-category γ distribution). The test statistic was compared against a null distribution at 1 per cent threshold for amino acid data and 5 per cent for nucleotide data due to the increased number of replicates; thus for amino acid data if more than one value in the null distribution was greater than δ the constrained tree could not be significantly different from the unconstrained maximum likelihood tree. Furthermore if more than 10 per cent of the null distribution was greater than δ then the constrained tree was assumed to be the unconstrained maximum likelihood tree. SOWHAT is the most recent development of SOWH (Church et al. 2015) and was used in this study. The default option was set to ensure that SOWHAT propagates the same number of insertions/deletions (‘indels’) as there are in the original comparative sequence alignment. The SOWHAT pipeline was modified to ensure that the ‘LG’ variable could be inserted into both the RAxML and Seq-Gen (regex modification at line 510). SOWHAT is a Perl5 data pipeline using RAxML v8.2.12 and Seq-Gen v1.3.4, which in addition to automating SOWH incorporates more recent developments in parametric bootstrapping (Susko 2014). SOWHAT was leveraged within an optimized Docker container for nucleotide data (shchurch/sowhat). We followed these recommendations using constrained trees that are minimally resolved and hereafter we use the formal designation describing such trees as a ‘zero-constrained’ tree.

Phylogenetic incongruence

In this analysis, phylogenetic incongruence is a conditional probability where, P(H1 | H0) or P(H0 | H1) ≤ 0.01 Here H0 is tree 1 (t1) and H1 is tree 2 (t2). ‘True’ means is not significantly different from the null distribution. In other words, one tree must be accepted where P > 0.1 and the alternative tree rejected if incongruence is to be identified. If P(H0 and H1) ≤ 0.01 then the tree is unresolved and if P(H0 and H1) > 0.05 the result is not significant, most likely resulting from a lack of phylogenetic resolution. For simplicity in Results we define for example P(H1 | H0) < 0.01 as ‘H0 accepted, H1 rejected’. The calculation is very similar to the likelihood ratio test (LRT) using parametric bootstrapping developed and used to identify genetic loci which were phylogenetically incongruent (Huelsenbeck and Bull 1996).

Positive selection analysis ω

We investigated whether selection was acting on individual amino acid residues using a maximum likelihood method to assess the nonsynonymous (amino acid–altering) to synonymous (silent) substitution rate ratio (ω = dN/dS) specifically at branches representing phylogenetic incongruence under the branch-lineage model through ‘CodeML’ within PAML (Yang and Nielsen 2002). This will result in two values for ω, one concerning the phylogenetic incongruence branch under investigation and the background ω for all other branches. The incongruent branch was designated as either: (1) the branch associated with the ancestral node if the incongruence resulted in a monophyletic group or (2) the branch connecting two paraphyletic groups if the phylogenetic incongruence resulted in a paraphyly. Multiple incongruence was present in the bifurcating trees under ω analysis, therefore the incongruent lineages not under immediate analysis were presented as polytomies, in addition to the root lineages and for SLEV group NTA group viruses. A PAML (Yang 2007) pipeline was leveraged through the ETE3 suite (Huerta-Cepas et al. 2016). All materials described in Methods, including all alignments, are available on request.

Results

Phylogenetic overview

A cluster network maximum likelihood consensus amino acid phylogeny was generated for 50 members of the MBFV lineage representing 70 taxa (Fig. 3). The consensus phylogeny superimposes the E- and C-NS5 protein trees onto a single tree, where differences in tree branching order (incongruence) between the E-protein and C–NS5 locus are represented in curved blue lines that support ‘floating branches’. The phylogeny summarises the common genetic pattern of interspecific tree incongruence observed across the MBFV lineage, as identified by incongruence between the E- and C–NS5 proteins, with the exception of YFV which does not show incongruence between these two loci in this specific analysis (tree 1), and the ISFV remain unresolved. The horizontal red branches, labelled trees 1–8, each highlight a putative phylogenetic incongruence, viz. YFV (Tree 1); Uganda S virus (UGSV), Banzi virus (BANV) and Bouboui virus (BOUV) (Tree 2); JEV and Usutu virus (USUV) (Tree 3); WNV and Yaounde virus (YAOV) (Tree 4); Rocio virus (ROCV) and Ilheus virus (ILHV) (Tree 5); Ecuador Paraiso Escondido virus (EPEV)(Tree 6); the ‘bat MBFV lineage viruses’ comprising the three members of the Sokuluk virus (SOKV) group (Tree 7); and two members of the ISFV, BJV, and NOUV, from the six members of this group (Tree 8). In summary, Trees 1–2 represent the YFV group, Trees 3–5 represent the Culex-associated clades, and Trees 6–8 inclusively represent the MBFV lineage viruses not associated with a mosquito-vertebrate transmission cycle. The rationale of combining the C-NS5 proteins into a single locus for comparison with the other combinatorial locus in this analysis, is summarized in Methods and was described in detail previously (Gaunt et al. 2020).
Figure 3.

Amino acid maximum likelihood network consensus phylogeny (Dendroscope 3) between the E-protein and C-NS5 protein phylogenies where differences in tree branching order (incongruence) between the E-protein and C-NS5 trees are represented by curved vertical lines that support ‘8 floating horizontal lines’ representing the MBFV members of interest to this study. ‘Trees 1–8’ denote the eight interspecific recombination events, better defined in Table 1.

Amino acid maximum likelihood network consensus phylogeny (Dendroscope 3) between the E-protein and C-NS5 protein phylogenies where differences in tree branching order (incongruence) between the E-protein and C-NS5 trees are represented by curved vertical lines that support ‘8 floating horizontal lines’ representing the MBFV members of interest to this study. ‘Trees 1–8’ denote the eight interspecific recombination events, better defined in Table 1.
Table 1.

The null and alternate hypothesis trees defined by robust non-parametric bootstrap analysis described in Fig. 4 in addition to further topologies defined by Bayesian amino acid trees.

Phylogenetic incongruenceNull hypothesis (H0) treeAlternate hypothesis (H1) tree
Tree 1YFVaMonophyly with WSLV & SEPVBasal to the YFV groupd
Tree 2UGSVSister group with BANVSister group with BOUV
Tree 3USUVSister group with JEVParaphyly with JEV
Tree 4WNVb,cParaphyly with YAOV ancestralMonophyly with YAOV
Tree 5ILHV & ROCVBasal to the NTAV groupdBasal to the Culex I groupd
Tree 6EPEVBasal to the YFV groupBasal to DENV, ISFV & Culex group
Tree 7SOKV groupcBasal to YFV group & EPEVBasal to MBFV
Tree 8BJV & NOUVMonophyly of all ISFVcParaphyly with other ISFV & basal to DENV & Culex group

A third tree extended Tree 1 H0: the YFV, WSLV, and SEPV monophyly included both EHV and BGV.

A third tree extending Tree 4 H0 and is denoted H2: H2 described WNV, CPCV and YAOV as monophyletic.

A further tree extending Tree 4 H0 is denoted H2: WNV and YAOV remain paraphyletic however WNV defined the immediate ancestral node rather than YAOV.

YFV, Culex, SOKV, DENV and ISFV groups are defined in Fig. 3.

An extensive programme of phylogenetic analyses, involving 50 recognized MBFV lineage members was instigated to look for evidence of widespread interspecific phylogenetic incongruence amongst the MBFV. The figures present amino acid maximum likelihood analyses and posterior probabilities of phylogenetic incongruence for all non-ZIKV MBFV and ISFV using the breakpoints for ZIKV described recently (Gaunt et al. 2020). Numbers above each node are percentage bootstrap values which in most cases are above 75 per cent. The order of the bootstraps and posterior probabilities along each internal branch represents the separate protein trees that identify the protein loci defined immediately above each tree. For example, in Fig 4a i, the left-hand tree represents the E-protein, whilst the right-hand mirror tree depicts separate protein alignments for NS1–NS3 HELICc (bootstrap ‘a’), NS2A–NS3 DEAD (bootstrap ‘b’) and NS4A–NS3 HELICc (bootstrap ‘c’) and the respective bootstrap scores a, b,c:, are indicated above the major branches followed by the posterior probabilities a; b;c. This protocol is adopted for all trees presented in Fig 4. The ‘Bat MBFV’ in Fig. 4c refers to bat-borne flaviviruses within the MBFV lineage. Given the precedent set by the discovery of phylogenetic incongruence involving ZIKV and SPOV (Gaunt et al. 2020), it seemed justified to ask the question, ‘do the remaining genomic loci in the analysis look like the E tree (alternate hypothesis, H1) or the C-NS5 tree (null hypothesis, H0)’? By definitively identifying which of the other loci in the genome associate with either the ‘E-protein’ tree topology or a ‘C–NS5 protein’ tree topology, the hybrid structure of tree incongruence in the given MBFV genome could be determined initially following the procedures employed for ZIKV and SPOV. The E- versus C–NS5 trees provide a general pattern (not a rule) to conceptualize the analysis for MBFV lineage virus hybrid genomes.

Is phylogenetic incongruence widespread amongst the MBFV?

Data mining

In attempting to assess the extent of the putative interspecific phylogenetic incongruence amongst the MBFV lineages, the first approach was data mining as described in Fig. 2. This examined the maximum likelihood non-parametric bootstrapping for the 70 flavivirus alignments and was confirmed using Bayesian posterior probabilities as indicated in ‘Phylogenetic Overview’ and in Fig. 3, for each locus originally defined by the breakpoints identified for the interspecific phylogenetic incongruence of ZIKV and SPOV (Fig. 1) (Gaunt et al. 2020). Interspecific tree incongruence was a possibility if the two mirror phylogenetic trees produced different robust branching topologies for two different MBFV lineage viruses (Methods). Robust phylogenetic incongruence identified by the non-parametric bootstrapping for Trees 1–8 is described in Fig. 4 and summarised in Table 1 using the loci described in Fig. 1. Each mirror tree in Trees 1–8 revealed two examples of interspecific incongruence between two or more loci and generated a null hypothesis (H0) and an alternative hypothesis (H1) which are summarised in Table 1. In all cases, C–NS5 loci were considered to be the ‘null hypothesis’ and E ‘the alternative hypothesis’, where incongruent lineages for C–NS5 or E are robust. In cases, where incongruence results in a monophyletic group against a paraphyletic group the monophyly is considered to be ‘null’.
Figure 4.

An extensive programme of phylogenetic analyses, involving 50 recognized MBFV lineage members was instigated to look for evidence of widespread interspecific phylogenetic incongruence amongst the MBFV. The figures present amino acid maximum likelihood analyses and posterior probabilities of phylogenetic incongruence for all non-ZIKV MBFV and ISFV using the breakpoints for ZIKV described recently (Gaunt et al. 2020). Numbers above each node are percentage bootstrap values which in most cases are above 75 per cent. The order of the bootstraps and posterior probabilities along each internal branch represents the separate protein trees that identify the protein loci defined immediately above each tree. For example, in Fig 4a i, the left-hand tree represents the E-protein, whilst the right-hand mirror tree depicts separate protein alignments for NS1–NS3 HELICc (bootstrap ‘a’), NS2A–NS3 DEAD (bootstrap ‘b’) and NS4A–NS3 HELICc (bootstrap ‘c’) and the respective bootstrap scores a, b,c:, are indicated above the major branches followed by the posterior probabilities a; b;c. This protocol is adopted for all trees presented in Fig 4. The ‘Bat MBFV’ in Fig. 4c refers to bat-borne flaviviruses within the MBFV lineage.

The null and alternate hypothesis trees defined by robust non-parametric bootstrap analysis described in Fig. 4 in addition to further topologies defined by Bayesian amino acid trees. A third tree extended Tree 1 H0: the YFV, WSLV, and SEPV monophyly included both EHV and BGV. A third tree extending Tree 4 H0 and is denoted H2: H2 described WNV, CPCV and YAOV as monophyletic. A further tree extending Tree 4 H0 is denoted H2: WNV and YAOV remain paraphyletic however WNV defined the immediate ancestral node rather than YAOV. YFV, Culex, SOKV, DENV and ISFV groups are defined in Fig. 3.

Non-parametric bootstrapping and Bayesian phylogenetic analysis

All observations of possible incongruence were confirmed, using Bayesian posterior probabilities for Trees 1–5 and Tree 8 (Section 3.3.3). Trees 6 and 7 which represented the deepest nodes in the MBFV tree (Fig. 4) were not confirmed. The E-protein locus topology for Tree 6 H1 (Table 1; EPEV) was supported for a relaxed clock with a dynamic (‘skyline’) population size at pp = 0.96, but pp = 0.88–0.93 for all other models. Secondly, the NS4A topology for Tree 7 H1 had pp = 0.951 for a relaxed ‘skyline’ clock, but pp = 0.77–0.88 for other models.

Non-parametric bootstrapping summary

Most loci did not show significant incongruence using non-parametric bootstrapping and are not included in Fig. 4. Non-parametric bootstrapping was insufficiently powerful specifically to identify, genome-wide interspecific phylogenetic incongruence for the viruses represented in Trees 1–8. For example, the C-NS5 protein topology Tree 1 H1 is robustly supported by Bayesian relaxed clock models but not by non-parametric bootstrapping (Fig. 4d). However non-parametric bootstrap analysis was useful because H0 and H1 for Trees 1–8 could then be examined on a genome-wide basis for a given locus using LRT parametric bootstrapping, a more powerful phylogenetic method.

Genome recombination: LRTs via parametric bootstrapping

Parametric bootstrapping was used to resolve the null and alternative trees for all loci within the genome for the same phylogenetic model (LG amino acid substitution matrix) under the SOWHAT test (Materials and Methods). The SOWHAT test for phylogenetic incongruence specified in Materials and Methods is a conditional probability where either H0 must be rejected and the alternative H1 accepted or vice versa. All results of the SOWHAT analysis are schematically represented in Fig. 5 and numerically described in Table 2 for MBFV that produce clinically apparent infections (Trees 1–5) and Table 3 for MBFV lineage viruses not associated with a mosquito-vertebrate transmission cycle (Trees 6–8).
Figure 5.

Schematic diagram of the MBFV lineage genome showing phylogenetic incongruence using the SOWHAT test (Tables 2 and 3), amino acid Bayesian posterior probabilities (models 1–6) and first and second codon trees for SLEV and YFV group alignments (Methods). NS3 HEL§ refers to NS3 HELICc and NS4A* refers to the C-terminal NS3 HELICc as well as NS4A. The line below each genome denotes the circularisation during replication connecting C to NS5. All non-parametric bootstraps and posterior probabilities are available on request.

Table 2.

Interspecific genomic incongruence for the MBFV using the likelihood ratio test performed using parametric bootstrapping to determine between two alternative interspecific trees for the clinically important Aedes- and Culex-associated viruses, including YFV, UGSV, JEV/USUV, WNV and ROCV.

Tre no.MBFV TreeLocusLikelihood ratio test (SOWHAT)Phylo. Incon-gruent Gen.?
Null (H0)Alt. (H1)
Δ P δ P
Aedes-associated1 H 1:E13.3<0.01*00.31Yes
YFV-basal YFV grpC–NS54.9<0.01*6.0e−60.1
prM1.9<0.01*3.2<0.01*Unre-solved
H 0: YFV-WSLV/SEP monophylyNS1-NS3 HELICc00.3114.9<0.01*Yes
NS2A-NS3 DEAD1.0e−60.997.9<0.01*
3ʹHELICc–NS4A7.6e−50.0916.5<0.01*
NS4B2.6e−30.211.9<0.01*
2 H 1:E22.2<0.01*00.53Yes
UGSV-BOUV3ʹHELICc–NS4A1.20.0572.4e−50.3
NS4B3.5<0.01*2.9<0.01Unre- solved
NS2A-NS3 DEAD0.5<0.01*3.2<0.01
H 0:prM00.46.7<0.01*Yes
UGSV-BANVNS1-NS3 HELICc00.4819.6<0.01*
C–NS51e−60.247.9<0.01*
Culex-associated3 H 1:E7.3<0.01*00.4Yes
USUV-ALFV/MVEV monophylyi.e. JEV-USUV paraphyly3ʹHELICc–NS4A2.1<0.01*1.6e−50.2
NS1-NS3 HELICc17.1<0.01*00.46
NS2A-NS3 DEAD5.2<0.01*2 e−60.41
NS4B2.10.019e−60.38
prM0.360.010.560.13
H0: JEV-USUVC–NS52.9e−50.1327.3<0.01*Yes
4 H 1: WNV–YAOV monophylyprM4.9<0.01*7e−60.6Yes
NS1-NS3 HELICc16.8<0.01*00.39
NS2A-NS3 DEAD8.6<0.01*3 e−60.29
E3.8<0.01*4.6<0.01*Unre-solved
3ʹHELICc–NS4A3.1<0.01*3.84<0.01*
NS4B9.6<0.01*10.18<0.01*
H0: WNV–YAOV paraphylyC-NS51e−60.313.5<0.01*Yes
5 H 1: ROCV/ILHV-basal Culex IE4.3<0.01*00.33Yes
NS1-NS3 HELICc7.9<0.01*00.39
prM4.4<0.01*0.090.04Not sig.
NS2A-NS3 DEAD3.2<0.01*9.6<0.01Unre-solved
H 0:ROCV/ILHV—NTAV grp (Culex I) monophyly3ʹHELICc–NS4A4 e−60.275.4<0.01Yes
NS4B0.0010.212.9<0.01
C-NS57 e−5<0.0110.2<0.01

Note: ‘-’ replaces ‘–’ for brevity.

 = significant at 1 per cent.

Table 3.

Interspecific genomic incongruence for the MBFV lineage as described in Table 1 to distinguish between two alternative interspecific trees for all members which are not associated with a dual-host-affiliated insect-specific flavivirus.

Tre. no.MBFV TreeLocusLikelihood ratio test (SOWH)Genome(s) phylogenet. incongruent?
Null (H0)Alt. (H1)
δ P δ P
Sandfly-associated6 H 1:EPEV basal to DENV & CulexE29.2<0.01*00.47Yes
H 01: EPEV basal toprM2e−60.543.2<0.01*Yes n.b. H01 identical to Tree7 H0
YFV & SOKV grpsNS1-NS3 HELICc00.2721.0<0.01*
NS2A-NS3 DEAD00.548.4<0.01*
NS4B−1.3e−311.8<0.01*
H 02: EPEV basal3ʹ HELICc-NS4A1.3e−50.214.2<0.01*Yes
singly to YFV grpC-NS5−3e−6122.4<0.01*
Bat-associated7 H 1:SOKV grp—basal MBFV3ʹHELICc—NS4A6.9<0.01*5.2e−50.19Yes
E6.3<0.01*8.2<0.01*See Tree 6 H1:EPEV-DENV/Culex
H 0:prM2e−60.5413.3<0.01*Yes
SOKV grp basalNS1-NS3 HELICc00.2721.0<0.01*
to YFV grpNS2A-NS3 DEAD00.545.4<0.01*
NS4B−1.3e−30.911.75<0.01*
C-NS500.9718.9<0.01*
Insect-specific8 H 1: ISF paraphyly—E00.67121.3<0.01*Yes
to DENV & CulexNS2A-NS3 DEAD1 e−60.2713.9<0.01*
H 0: ISF monophylyprM4.4<0.01*00.61Yes
NS1-NS3 HELICc7.7<0.01*00.48
3ʹHELICc—NS4A0<0.01*8e−60.26
NS4B16.2<0.01*00.99
C-NS55.5<0.01*1e−60.94

n.b. Tree 6 H01 and Tree 7 H0 are the same tree because EPEV is basal to YFV and SOKV groups and the SOKV group is basal to the YFV group. However, Tree 6 H1 and Tree 7 H1 are very different trees.

 = significant at 1 per cent; grp = group; grps = groups. See ‘Note’ in Table 2.

Summarized observations of parametric bootstrapping

The SOWHAT technique robustly supported the phylogenetic incongruence hypothesis defined for Trees 1–8 (Table 1). Overall, a total of 47 out of 55 loci could be defined as either tree H0 or tree H1 at 1 per cent significance (Fig. 5; Tables 2 and 3). Therefore, all SOWHAT observations support and significantly extend the non-parametric bootstrap analysis across each flavivirus genome that revealed interspecific hybrid genomes.

Genome resolution

The parametric bootstrap analysis resulted in 4 of the 8 phylogenetic incongruence topologies being fully resolved as defined in Table 1 across their respective flavivirus genomes for Tree 3 (USUV–JEV), Tree 6 (EPEV), Tree 7 (SOKV group), and Tree 8 (ISFV).

Comparison of robustness between parametric bootstrapping and posterior probabilities

An assessment of parametric bootstrap robustness with posterior probabilities for all 6 Bayesian models deployed (Methods) showed a high level of congruence between the amino acid tree methods (Fig. 5). The posterior probabilities confirmed phylogenetic incongruence defined by Trees 1–5 and for one locus in Tree 8 that involved the NS2A locus (H1; pp = 0.99–1.0). The Bayesian approach was equivocal for the topological incongruence described in Trees 6 and 7 (Section 3.2.2, viz ‘Non-parametric bootstrapping and Bayesian phylogenetic analysis’) and lacked support for the E-protein phylogenetic incongruence for Tree 8. Overall, the 6 Bayesian models resulted in a total of 40 out of 55 loci being robustly defined as either an H0 or H1 tree (Fig. 5), hereafter simplified to ‘robust tree loci’: 38 of these robust tree loci were congruent amongst the 47 robust tree loci defined by parametric bootstrapping; and 2 of these robust tree loci (prM, Tree 1 and NS4A Tree 3) corresponded to loci that could not be resolved by parametric bootstrapping. The two disparities between the methods were due to a third phylogeny that was neither H0 nor H1 for the prM locus that involved Tree 1 (YFV; pp = 0.97–1.0), as well as the NS4B locus that involved Tree 4 H0 (WNV; pp = 1.0) and was described in the footnotes of Table 1. Most loci lacking posterior probability support were small proteins either prM or NS4A.

Comparison of phylogenetic incongruence between amino acid and nucleotide data

Nucleotide analysis on first and second codon position.

To assess whether interspecific recombination was the evolutionary process of phylogenetic incongruence it is reasonable to assume the neutral phylogeny would be congruent with the amino acid tree for the same locus. Therefore, restricted taxa nucleotide alignments were produced based on the YFV group and the SLEV groups to minimize substitution saturation as described in Methods (Tree 1–5). Phylogenetic analysis on first and second codon trees for each of the seven loci (Fig. 1) showed robust congruence with the amino acid trees, Tree 1 (YFV) and Tree 2 (YFV group) being particularly notable examples (Fig. 5). The exceptions were two loci for Tree 3, supported singly by Bayesian posterior probabilities and one locus for Tree 4 supported by non-parametric bootstraps and posterior probabilities described below (Fig. 5). All trees, non-parametric bootstraps, and posterior probabilities are available on request. Schematic diagram of the MBFV lineage genome showing phylogenetic incongruence using the SOWHAT test (Tables 2 and 3), amino acid Bayesian posterior probabilities (models 1–6) and first and second codon trees for SLEV and YFV group alignments (Methods). NS3 HEL§ refers to NS3 HELICc and NS4A* refers to the C-terminal NS3 HELICc as well as NS4A. The line below each genome denotes the circularisation during replication connecting C to NS5. All non-parametric bootstraps and posterior probabilities are available on request. Wobble codon alignments of the SLEV and separately the YFV groups were firstly RY-coded (Phillips et al. 2004) (Methods) because both alignments were close to saturation for this codon position with an average p-distance of 0.61–0.64 for all loci (minimum of 0.49 and most maxima <0.72, with minority <0.76 all associated with NMV [Fig. 6]), with the exception of the C-NS5 locus which had an average p-distance of 0.57–0.59 (minimum 0.49 and maximum 0.65). The RY transformed wobble codon data sets were concatenated into single alignments to boost the phylogenetic signal based on whether a given locus supported H0 or H1 under the parametric bootstrapping analysis for amino acid data (Fig. 5) and are described in Table 4.
Figure 6.

RY-coded wobble codon maximum likelihood trees from concatenated loci and their comparison to single locus amino acid maximum likelihood and Bayesian trees. Note the Bayesian MCMC would not converge for the RY-codon ‘wobble codon data’ but did achieve convergence for non-RY-coded, nucleotide data (Fig. 5).

Table 4.

Interspecific genomic incongruence for the RY-coded third codon nucleotide data of the MBFV using the likelihood ratio test performed using parametric bootstrapping to determine between two alternative interspecific trees for the clinically important Aedes- and Culex-associated viruses, including YFV, UGSV, JEV/USUV, WNV and ROCV. See ‘Note’ in Table 2.

Tree no.MBFV TreeCombined LocusaLikelihood ratio test (SOWH)Robust Typology
Null (H0)Alt. (H1)Alt (H2)
δ P δ P δ P
Aedes-associated1 H 1: YFV-basal YFV grpprM, E, C–NS50.00.370.7740.014 H 0
H 0: YFV-WSLV/SEP monophylyNS1-NS3 HELICc, NS2A-NS3 DEAD,3ʹHELICc–NS4A, NS4B0.00.990.00.994None
2 H 1 UGSV-BOUVE, 3ʹHELICc–NS4A,0.00.342.820.036 H 0
H 0: UGSV-BANVprM, NS1-NS3 HELICc, C–NS50.0140.662.540.044 H 0
Culex-associated3 H 1: USUV-ALFV/MVEV monophyly (JEV-USUV paraphyly)prM, E, NS1-NS3 HELICc, NS2A-NS3 DEAD, 3ʹHELICc–NS4A, NS4B0.020.123.90.008 H 0
H0: JEV-USUVC-NS50.6530.170.5240.276None
4 H 1: WNV–YAOV monophylyprM, NS1-NS3 HELICc, NS2A-NS3 DEAD2.670.0461.520.010.00.24 H 2
H 0: WNV–YAOV paraphylyNS50.170.0480.00.9961.110.14 H 1 or H2
5 H 1: ROCV/ILHV-basal Culex IprM, E, NS1-NS3 HELICc, NS2A-NS3 DEAD4.20.0323.320.160.00.24 H 2
H 0:ROCV/ILHV—NTAV gp (Culex I) monophyly3ʹHELICc–NS4A, NS4B, C–NS52.350.0362.300.0240.00.83 H 2

Combined locus of wobble codons comprising all the genes listed for a given row as described in Fig. 1.

Parametric bootstrapping of RY-coded third codon data produced a clear and striking result for the clinically important MBFV because it did not support phylogenetic incongruence and no alignment for Trees 1–5 unequivocally supported H1 (Table 4). Genome-wide unequivocal support was observed singly comprising H0 for Tree 2 genomes, and H2 for Tree 5 genomes (Table 4). The genome-wide Tree 5 H2 topology is further supported by non-parametric bootstrap analysis (Fig. 6). Furthermore, the genomes of Tree 1 and Tree 3 in Table 1 represent a genome-wide H0 topology because the equivocal locus in both cases was H0 under amino acid analysis (Table 4). In addition, Tree 3 H0 is supported by non-parametric bootstrapping (Fig. 6). The Tree 4 genomes supported H2, defined in Table 1, and only Tree 4 C–NS5 locus supported either H1 or H2, but importantly rejected H0 which was the amino acid tree. Thus, the RY-coded third position codon alignments are incongruent with amino acid phylogenies concerning the same loci for either a sizeable region of a given genome for Tree 1–3 or else the entire genome for Tree 4 and 5. This strongly infers that the evolutionary process resulting in phylogenetic incongruence for amino acid data does not involve recombination (Discussion). The effectiveness of RY-coding for recovering phylogenetic signal from wobble codon data is demonstrated for the C-NS5 locus Tree4 because the tree is H1 for both first and second codon position alignments using Bayesian and non-parametric bootstrapping analysis (not shown), H1 for RY-coded third codon alignments using parametric bootstrapping (Table 4), but no phylogenetic signal was recovered for all three codon positions without RY-coding for any alignment in the analysis (not shown).

Amino acid positive selection.

Amino acid selection analysis was performed as described in Methods Section 2.7 to assess lineage specific positive selection for Trees 1–5. This analysis is limited by the saturation of transition substitutions, however, this scenario for this precise positive selection analysis has been described as robust via simulation (Gharib and Robinson-Rechavi 2013). Positive selection where ω ≫ 1 is evident in all lineages specifically associated with amino acid incongruence for Trees 1–5 (Table 5), whilst the minima and maxim background ω is 0.002–0.04 across all analyses. If the RY wobble codon tree (H0 for Trees 1–3 and H2 for Trees 4–5) was the ‘neutral tree’ and amino acid tree incongruence was driven by selection acting on individual amino acids then the positive selection in Table 5 would provide an explanation of phylogenetic incongruence for Trees 2, 3, and 4. Under this hypothesis a lack of positive selection (ω ≪ 1) is acceptable where the hypothesis topology tested is congruent with the amino acid tree and would explain the pattern of lower ω. However, ω ≪ 1 for Tree 1 H1 topology for NS1 to NS4B and Tree 5 H2 topology for prM to NS2A–NS3 DEAD, even though both should show ω ≫1. This is consistent with the prediction of false negatives for ω under simulation conditions of saturation (Gharib and Robinson-Rechavi 2013) and these given branches represent the deepest nodes in this specific analysis.
Table 5.

The nonsynonymous to synonymous substitution rate ratio for incongruent lineages was calculated under a maximum likelihood branch-specific model. The numbers underlined simply highlight that the hypothesis tree being tested is congruent with the amino acid tree for the combined locus under investigation. See ‘Note’ in Table 2.

Tree no.MBFV TreeCombined Locusa d N /dS (ω)bPositive selection
Null (H0)Alt. (H1)Alt (H2)
Aedes-associated1 H 1: YFV-basal YFV grpprM, E, C–NS57.90.25 H 0 only
H 0: YFV-WSLV/SEP monophylyNS1-NS3 HELICc, NS2A-NS3 DEAD,3ʹHELICc–NS4A, NS4B0.030.008No
2 H 1 UGSV-BOUVE, 3ʹHELICc–NS4A,360.13.3 H 0 & H1
H 0: UGSV-BANVprM, NS1-NS3 HELICc, C–NS50.002176.5 H 1 only
Culex-associated3 H 1: USUV-ALFV/MVEV monophyly (JEV-USUV paraphyly)prM, E, NS1-NS3 HELICc, NS2A-NS3 DEAD, 3ʹHELICc–NS4A, NS4B149.324.6 H 0 & H1
H0: JEV-USUVC-NS5333.0999.0 H 0 & H1
4 H 1: WNV–YAOV monophylyprM, NS1-NS3 HELICc, NS2A-NS3 DEAD16.664.942.8 H 0, H1, H2
H 0: WNV–YAOV paraphylyNS50.05951.710.2 H 1 & H2
5 H 1: ROCV/ILHV-basal Culex IprM, E, NS1-NS3 HELICc, NS2A-NS3 DEAD33.84.10.01 H 0 & H1
H 0:ROCV/ILHV—NTAV gp (Culex I) monophyly3ʹHELICc–NS4A, NS4B, C–NS5577.429.042.9 H 0, H1, H2

Combined locus of all three codon positions comprising all the genes listed for a given row as described in Fig. 1. The locus will represent H0 or H1 for amino acid data described in Fig. 5.

The background ω is 0.002–0.04, the largest value being for Tree 1 H0, the prM, E and NS5 locus.

Interspecific genomic incongruence for the MBFV using the likelihood ratio test performed using parametric bootstrapping to determine between two alternative interspecific trees for the clinically important Aedes- and Culex-associated viruses, including YFV, UGSV, JEV/USUV, WNV and ROCV. Note: ‘-’ replaces ‘–’ for brevity. = significant at 1 per cent. Interspecific genomic incongruence for the MBFV lineage as described in Table 1 to distinguish between two alternative interspecific trees for all members which are not associated with a dual-host-affiliated insect-specific flavivirus. n.b. Tree 6 H01 and Tree 7 H0 are the same tree because EPEV is basal to YFV and SOKV groups and the SOKV group is basal to the YFV group. However, Tree 6 H1 and Tree 7 H1 are very different trees. = significant at 1 per cent; grp = group; grps = groups. See ‘Note’ in Table 2. Interspecific genomic incongruence for the RY-coded third codon nucleotide data of the MBFV using the likelihood ratio test performed using parametric bootstrapping to determine between two alternative interspecific trees for the clinically important Aedes- and Culex-associated viruses, including YFV, UGSV, JEV/USUV, WNV and ROCV. See ‘Note’ in Table 2. Combined locus of wobble codons comprising all the genes listed for a given row as described in Fig. 1. The nonsynonymous to synonymous substitution rate ratio for incongruent lineages was calculated under a maximum likelihood branch-specific model. The numbers underlined simply highlight that the hypothesis tree being tested is congruent with the amino acid tree for the combined locus under investigation. See ‘Note’ in Table 2. Combined locus of all three codon positions comprising all the genes listed for a given row as described in Fig. 1. The locus will represent H0 or H1 for amino acid data described in Fig. 5. The background ω is 0.002–0.04, the largest value being for Tree 1 H0, the prM, E and NS5 locus. RY-coded wobble codon maximum likelihood trees from concatenated loci and their comparison to single locus amino acid maximum likelihood and Bayesian trees. Note the Bayesian MCMC would not converge for the RY-codon ‘wobble codon data’ but did achieve convergence for non-RY-coded, nucleotide data (Fig. 5).

Discussion

Intraspecific (homologous) tree incongruence within a single flavivirus species or serotype has been a focus of virological investigations for decades and was first reported in DENV. Prior to its recognition, it was believed that genetic diversity, as seen in DENV, was clonal and arose from the accumulation of substitutions (Blok et al. 1992). Subsequently, it was generally believed that tree incongruence was due to recombination and represented an efficient mechanism for generating genetic diversity (Worobey and Holmes 1999). However, it then took another 20 years to demonstrate that interspecific tree incongruence potentially represents an evolutionary process for shaping the epidemiology of flaviviruses. This followed the discovery that the Zika virus genome is a hybrid representing phylogenetic incongruence between Aedes-associated MBFV and flaviviruses transmitted primarily by Culex species, i.e. related to Japanese encephalitis virus (Gaunt et al. 2020). This discovery explained why different authors had produced conflicting MBFV lineage phylogenies or incongruence (Kuno et al. 1998, 2009; Gaunt et al. 2001; Huhtamo et al. 2009; Grard et al. 2010; Kolodziejek et al. 2013; Moureau et al. 2015; Kuno 2020). Moreover, whilst Aedes and/or Culex species mosquitoes are competent to transmit both ZIKV and DENV in their natural settings, one species in particular viz., the anthropophilic, urban-dwelling Aedes aegypti is the primary transmission vector of DENV and ZIKV throughout the tropics and sub-tropics (Boyer et al. 2018). Indeed, there are many reports of mixed infections in field-isolated Ae. aegypti (Chaves et al. 2018), inferring epidemiological competition between these viruses in regions where they overlap geographically. Moreover, the T-cell immune response of the ZIKV hybrid genome is strongly represented by T-cell loci obtained from Culex-associated flaviviruses rather than those of its immediate DENV ancestor. Thus, interspecific phylogenetic incongruence provided ZIKV with a potential genetic adaptation to escape the immune barrier presented by DENV (Gaunt et al. 2020). As far as we are aware, the putative immunological selection pressure for ZIKV interspecific phylogenetic incongruence makes the observation unique. For example, interspecific recombination was reported for Hepacivirus (Thézé et al. 2015), but this was not linked with an identifiable selection pressure. The significance of interspecific tree incongruence extends far beyond viral genetics, evolution, and systematics. Indeed if this is a consistent evolutionary process it could have wide-ranging potential impacts on vector or host specificity, immune response in vertebrates, virulence or disease severity, serological diagnostics, and other aspects of epidemiology. Based on phylogenetic evidence, the Culex-associated flaviviruses are the direct descendants of the Aedes-associated viruses (Gaunt et al. 2001; Gould et al. 2001) many of which, include the ancestors of JEV, USUV, WNV, and YAOV, that share the sympatric environment of the African rainforest with the Aedes-associated viruses including BANV, BOUV, UGSV, and YFV. We therefore proposed that tree incongruence for ZIKV–SPOV between the DENV and Culex spp. MBFV was an evolutionary adaptation of ZIKV and SPOV to this ecoregion driven by selective immune pressure from T-cell epitope rich loci within the Culex-associated MBFV, effectively altering the immunological phenotype of ZIKV and SPOV, potentially shaping their epidemiology in terms of emerging arboviruses. In this context, SPOV was first identified outside Africa in 2016 when it was isolated from a pool of Culex quinquefasciatus mosquitoes in Haiti (White et al. 2018). Thus, in common with other flaviviruses, alphaviruses, and bunyaviruses, SPOV has the potential to emerge in the New World. Identification of interspecific hybrid genomes using tree incongruence is not unique to ZIKV and SPOV. Hybrid genomes were identified in a further 10 of 50 MBFV-related flaviviruses analysed by both parametric bootstrapping and Bayesian amino acid trees. This currently represents the widest diversity of interspecific tree incongruence amongst the positive-stranded RNA arboviruses to our knowledge. Phylogenetic incongruence that involved recent, or shallow nodes on the MBFV phylogeny represent extant (non-extinct) MBFV lineage viruses of clinical importance. They are dual-host-affiliated insect-specific flaviviruses (dISF) that are characterized. The phylogenetic incongruence identified here that is basal within the phylogenetic tree, is restricted to MBFV lineage viruses that lack dual-host affiliation, viz., sandfly-vectored MBFV lineage virus (EPEV; Tree 6), the SOKV group (bat associated; Tree 7), and the ISFV group (Tree 8). Epidemiologically, both ZIKV and SPOV originated and co-circulate in an African sylvatic environment common to other pathogenic and clinically important MBFV including DENV, UGSV, WNV, and YFV. In context, the ZIKV and SPOV interspecific phylogenetic incongruence breakpoints can be defined as ‘hotspots’ because they are far from unique within the MBFV. The study was facilitated by the statistical phylogenetic power of parametric bootstrapping, which has received minimal attention in virology, and it was broadly supported by Bayesian phylogenetic models. Parametric bootstrapping is applicable because it repeatedly produced one alternative tree hypothesis for a given member of the MBFV lineage and Bayesian analysis identified the possibility of a third phylogeny. This approach provides a template to look for further and more powerful evidence of arbovirus interspecific phylogenetic incongruence. Viral phylogenetic incongruence could result from an evolutionary process involving recombination (reviewed by Simon-Loriere and Holmes 2011; Thézé et al. 2015) and selection or adaptation particularly through parallel or convergent evolution (reviewed by Stern et al. 2017; Geoghegan and Holmes 2018), tree-building methods (Felsenstein 1978; Huelsenbeck 1997; Alfaro et al. 2003; Drummond et al. 2005; Dávalos et al. 2012), or systematic bias in the nucleotide data (Phillips et al. 2004). Both recombination and parallel evolution have been identified as possible virulence factors through in vivo virus passage (Stern et al. 2017). In this study the congruence between maximum likelihood parametric bootstrapping and Bayesian methods both for amino acid and first and second codon positions infers that interspecific phylogenetic incongruence is an evolutionary process rather than a methodological discrepancy, notably for the viruses of clinical importance and including YFV. However, MBFV members that lack a dual-host transmission cycle, notably EPEV and the SOKV group demonstrated phylogenetic incongruence between viruses in deep (ancestral) nodes with a single locus exception (ISFV), only for non-parametric and parametric bootstrapping. This suggests caution for interpreting phylogenetic incongruence at deep nodes based on single methods. RY-coding of the wobble codon used in this study, albeit comparatively novel to virus phylogenetics, is a mainstream approach for overcoming compositional bias in insect evolution with a long history (Delsuc, Phillips, and Penny 2003; Breinholt and Kawahara 2013; Song et al. 2016; Yang et al. 2018). RY-coding has performed strongly in simulation studies using maximum likelihood phylogenetic analysis (Phillips et al. 2004; Ishikawa et al. 2012). In this study RY-coding was restricted to incongruent taxa that are not subject to complete saturation and is therefore a conservative application. Parametric and non-parametric bootstrapping of the RY-coded trees either inferred or robustly concluded there was a single phylogeny for all the clinically important MBFV and did not support genome-wide phylogenetic incongruence. This splits the analysis into: ‘between locus incongruence’ and; ‘within locus incongruence’. The ‘within locus incongruence’ is phylogenetic incongruence between data-types predominantly amino acid trees, but also first and second codon position trees against the RY-coded wobble codon trees. Regardless of whether RY-coded wobble codons are ‘neutral’, it is difficult to consider interspecific recombination as the evolutionary process causing ‘between locus phylogenetic incongruence’ for the clinically important MBFV. Moreover, for 2 of the 5 occurrences of ‘within locus incongruence’, both being SLEV group viruses, there was complete phylogenetic incongruence between data-types across their genomes, viz. Tree 4 and 5. Positive selection analyses (ω) for the clinically important MBFV support an explanation that interspecific phylogenetic incongruence was caused by selection pressure on amino acids physically localized within a locus resulting in similar or identical mutations after two or more lineages diverged. Two specific evolutionary processes that could generate interspecific ‘between and within locus’ phylogenetic incongruence are parallel and convergent evolution. It is important to reiterate simulation studies on saturated neutral substitutions infer the PAML method used (Yang and Nielsen 2002), was strongly biased towards false negative ω rather than false positives (Gharib and Robinson-Rechavi 2013). However, further simulation studies on recombination and selection available through forward in time methods (Peng et al. 2015; Haller and Messer 2019; Jariani et al. 2019) would be beneficial to support the findings of Walid and Robinson-Rechavi (2013). Positive selection is a very clear substitution process that could result in the same locus producing two opposing trees. Yellow fever (Tree 1) is a particularly interesting example of interspecific ‘between loci’ phylogenetic incongruence within the flaviviruses where the E and C–NS5 loci are phylogenetically basal within the YFV-related group, whereas all NS-proteins form sister groups with WSLV and SEPV. This would be expected to have profound immunological implications during vertebrate host infections for example by removing major CD4+ and CD8+ T-cell epitopes normally expressed by the original parent, of YFV. The implications are that WSLV and SEPV could generate T-cell cross-protective responses against YFV and this might, in part, explain the absence of YFV in New Guinea where SEPV is known to be present (Kuno 2020). This potential scenario has parallels with the ZIKV and SPOV interspecific hybrid genome/proteome associated with minimizing the cross-protective CD4+ and CD8+ T-cell immune responses resulting from previous DENV infections (Gaunt et al. 2020), but which rendered it susceptible to cross-protective responses induced by the recombinant Culex-associated MBFV. Given the diverse range of flaviviruses from arthropods, mammals, and birds, there is unlikely to be a singular selective force driving interspecific phylogenetic incongruence, with its consequent impact on flavivirus pathogenesis and epidemiology. For example, evidence of interspecific phylogenetic incongruence involving the C–NS5 protein was detected in USUV, JEV and probably in WNV and YAOV. This would strongly influence the cross-reactive CD4+ response, in an infected host if it was comparable to that generated against DENV, and presumably would also alter virus replication characteristics. Interspecific phylogenetic incongruence that resulted from interspecific recombination should not be excluded and could be facilitated by a range of ecologic mechanisms for example co-circulation and co-infections. Flavivirus infections in mosquito larvae were first reported in 1938 during YFV studies (Whitman and Antunes 1938) and also identified in DENV, SLEV, and WNV (Bara et al. 2013; Kuno 2016; Du et al. 2019). One of several possible mechanisms of co-infection is when an infective larva dies, other nearby larvae become infected through ingestion (Bara et al. 2013; Kuno 2016; Kuno et al. 2017; Du et al. 2019) and in addition there are many predators of mosquito larvae including birds. Mosquito vertical transmission is an alternative or contributory method of transmission and could play an essential role in co-infection (Burgdorfer and Varma 1967; Lequime et al. 2016; Tesh et al. 2016). All the viruses in this study with known vectors or close association with those viruses in the clade structure, demonstrated phylogenetic incongruence and remain within the phyloepidemiological boundaries that have been defined previously (Gaunt et al. 2001; Gould et al. 2001). This fundamental observation makes sense for the necessity of co-infection of the vector and/or vertebrate, generating mixed infections in vector and/or host cells as a prerequisite for either recombination or adaptation/selection involving co-circulation. The interspecific phylogenetic incongruence inferred here has many potential implications, not just for evolution, emergence, and biodiversity but also for their future potential impact on the development and widespread use of live-attenuated chimeric vaccines in regions of the world where co-infections between related flaviviruses are known to occur (Seligman and Gould 2004). Interspecific co-infections increase the potential for recombination and/or different mechanisms of adaptation such as convergent evolution. There is a consensus that MBFV transmission in general is not a constant parameter due to shifting environmental factors (French and Holmes 2020). For example, African YFV disperses sequentially from a sylvatic, to perisylvatic environment and then to the urban ecoregion of large towns and cities (Christopher 1960), where the domestic variant of Ae. aegypti is the primary transmission vector. In this context co-circulation, a prerequisite for interspecific recombination, could be temporally dynamic and environmentally sensitive. The potentially damaging global impact of human activities on the environment, including climate change, deforestation, pollution, urbanization, exploitation of fossil fuels, and widespread distribution of live-attenuated chimeric vaccines, must not be overlooked. Click here for additional data file.
  73 in total

1.  Widespread intra-serotype recombination in natural populations of dengue virus.

Authors:  M Worobey; A Rambaut; E C Holmes
Journal:  Proc Natl Acad Sci U S A       Date:  1999-06-22       Impact factor: 11.205

2.  Comparison of Bayesian and maximum likelihood bootstrap measures of phylogenetic reliability.

Authors:  Christophe J Douady; Frédéric Delsuc; Yan Boucher; W Ford Doolittle; Emmanuel J P Douzery
Journal:  Mol Biol Evol       Date:  2003-02       Impact factor: 16.240

Review 3.  Live flavivirus vaccines: reasons for caution.

Authors:  Stephen J Seligman; Ernest A Gould
Journal:  Lancet       Date:  2004-06-19       Impact factor: 79.321

4.  Comment on "Hexapod origins: monophyletic or paraphyletic?".

Authors:  Frédéric Delsuc; Matthew J Phillips; David Penny
Journal:  Science       Date:  2003-09-12       Impact factor: 47.728

Review 5.  Molecular evolution of dengue viruses: contributions of phylogenetics to understanding the history and epidemiology of the preeminent arboviral disease.

Authors:  Scott C Weaver; Nikos Vasilakis
Journal:  Infect Genet Evol       Date:  2009-02-13       Impact factor: 3.342

6.  High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features.

Authors:  David T Jones; Shaun M Kandathil
Journal:  Bioinformatics       Date:  2018-10-01       Impact factor: 6.937

7.  A natural intergenotypic recombinant of hepatitis C virus identified in St. Petersburg.

Authors:  Olga Kalinina; Helene Norder; Sergey Mukomolov; Lars O Magnius
Journal:  J Virol       Date:  2002-04       Impact factor: 5.103

8.  SLiM 3: Forward Genetic Simulations Beyond the Wright-Fisher Model.

Authors:  Benjamin C Haller; Philipp W Messer
Journal:  Mol Biol Evol       Date:  2019-03-01       Impact factor: 16.240

9.  Phylotranscriptomics: saturated third codon positions radically influence the estimation of trees based on next-gen data.

Authors:  Jesse W Breinholt; Akito Y Kawahara
Journal:  Genome Biol Evol       Date:  2013       Impact factor: 3.416

10.  Novel flaviviruses from mosquitoes: mosquito-specific evolutionary lineages within the phylogenetic group of mosquito-borne flaviviruses.

Authors:  Eili Huhtamo; Shelley Cook; Gregory Moureau; Nathalie Y Uzcátegui; Tarja Sironen; Suvi Kuivanen; Niina Putkuri; Satu Kurkela; Ralph E Harbach; Andrew E Firth; Olli Vapalahti; Ernest A Gould; Xavier de Lamballerie
Journal:  Virology       Date:  2014-08-09       Impact factor: 3.616

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.