Kiril M Dimitrov1, Celia Abolnik2, Claudio L Afonso3, Emmanuel Albina4, Justin Bahl5, Mikael Berg6, Francois-Xavier Briand7, Ian H Brown8, Kang-Seuk Choi9, Ilya Chvala10, Diego G Diel11, Peter A Durr12, Helena L Ferreira13, Alice Fusaro14, Patricia Gil15, Gabriela V Goujgoulova16, Christian Grund17, Joseph T Hicks5, Tony M Joannis18, Mia Kim Torchetti19, Sergey Kolosov10, Bénédicte Lambrecht20, Nicola S Lewis21, Haijin Liu22, Hualei Liu23, Sam McCullough12, Patti J Miller24, Isabella Monne14, Claude P Muller25, Muhammad Munir26, Dilmara Reischak27, Mahmoud Sabra28, Siba K Samal29, Renata Servan de Almeida15, Ismaila Shittu18, Chantal J Snoeck25, David L Suarez30, Steven Van Borm20, Zhiliang Wang23, Frank Y K Wong12.
Abstract
Several classification systems for avian paramyxovirus 1 (synonymous with Newcastle disease virus, or NDV, used hereafter) have been proposed for strain identification and differentiation. These systems pioneered classification efforts; however, they were based on different approaches and lacked objective criteria for the differentiation of isolates. These differences have created discrepancies among systems, rendering discussions and comparisons across studies difficult. Although a system that used objective classification criteria was proposed by Diel and co-workers in 2012, the ample worldwide circulation and constant evolution of NDV, and utilization of only some of the criteria, led to identical naming and/or incorrect assignment of new sub/genotypes. To address these issues, an international consortium of experts was convened to undertake in-depth analyses of NDV genetic diversity. This consortium generated curated, up-to-date, complete fusion gene class I and class II datasets of all known NDV for public use, performed comprehensive phylogenetic neighbor-joining, maximum-likelihood, Bayesian and nucleotide distance analyses, and compared these inference methods. An updated NDV classification and nomenclature system that incorporates phylogenetic topology, genetic distances, branch support, and epidemiological independence was developed. This new consensus system maintains two NDV classes and existing genotypes, identifies three new class II genotypes, and reduces the number of sub-genotypes. In order to track the ancestry of viruses, a dichotomous naming system for designating sub-genotypes was introduced. In addition, a pilot dataset and sub-tree rooting guidelines for rapid preliminary genotype identification of new isolates are provided. Guidelines for sequence dataset curation and phylogenetic inference, and a detailed comparison between the updated and previous systems are included.
To increase the speed of phylogenetic inference and ensure consistency between laboratories, detailed guidelines for the use of a supercomputer are also provided. The proposed unified classification system will facilitate future studies of NDV evolution and epidemiology, and comparison of results obtained across the world.
Keywords: Avian paramyxovirus 1 (APMV-1); Classification; Genotype; Newcastle disease virus (NDV); Nomenclature; Phylogenetic analysis
Year: 2019 PMID: 31200111 PMCID: PMC6876278 DOI: 10.1016/j.meegid.2019.103917
Source DB: PubMed Journal: Infect Genet Evol ISSN: 1567-1348 Impact factor: 3.342
Criteria for classification of NDV isolates.
| Criterion | Description |
|---|---|
| 1 | Assignment of viruses into new genotypes and sub-genotypes is done based on complete fusion gene phylogenetic analysis (sequences of at least 1645 nucleotides or longer). |
| 2 | Assignment of viruses into new genotypes and sub-genotypes is done only utilizing a complete dataset of sequences from all existing genotypes. All classification criteria listed below need to be fulfilled for naming new genotypes and sub-genotypes. |
| 3 | Sub-trees and pilot tree can be used for assigning new isolates to existing sub/genotypes. |
| 4 | New genotypes or sub-genotypes are created only when four or more independent isolates, without a direct epidemiologic link (i.e. distinct outbreaks), are available. |
| 5 | New genotypes and sub-genotypes are created based on the phylogenetic tree topology (viruses need to cluster into monophyletic branches) using the Maximum Likelihood method and the general time-reversible (GTR) model with gamma distribution (Γ) utilizing RAxML or a comparable tool. |
| 6 | The mean nucleotide distance (evolutionary distances) between groups is inferred as the number of base substitutions per site from averaging over all sequence pairs between groups using MEGA v. 5/6/7 software (or a comparable tool) and utilizing the Maximum Composite Likelihood model with rate variation among sites that was modeled with a gamma distribution (shape parameter = 1). |
| 7 | Different genotypes have an average distance per site above 10% (0.1). |
| 8 | Different sub-genotypes have an average distance per site above 5% (0.05). |
| 9 | The bootstrap value at the genotype and sub-genotype defining node is 70% or above (≥70%). |
| 10 | Viruses that do not fulfill all classification criteria are assigned to the lower order (closer to the root) sub/genotype (see |
Criteria that were adopted from the Diel et al. (2012a) classification system.
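The quantitative criteria in the table above (4, 5, 7, 8 and 9) can be read as a simple checklist. The following is a minimal sketch of that checklist, not the consortium's software: the `CandidateGroup` class, its field names and the `qualifies` function are hypothetical, and the inputs (isolate count, monophyly, bootstrap support, mean distance) are assumed to have been obtained beforehand from a complete fusion gene analysis.

```python
from dataclasses import dataclass

# Thresholds taken from the criteria table.
GENOTYPE_MIN_DISTANCE = 0.10     # criterion 7: above 10% between genotypes
SUBGENOTYPE_MIN_DISTANCE = 0.05  # criterion 8: above 5% between sub-genotypes
MIN_BOOTSTRAP = 70.0             # criterion 9: >= 70% at the defining node
MIN_INDEPENDENT_ISOLATES = 4     # criterion 4: four or more independent isolates


@dataclass
class CandidateGroup:
    """Hypothetical summary of a candidate clade (illustrative field names)."""
    independent_isolates: int  # isolates without a direct epidemiologic link
    is_monophyletic: bool      # clusters as one branch in the ML tree (criterion 5)
    bootstrap: float           # support (%) at the defining node
    mean_distance: float       # mean between-group distance, substitutions per site


def qualifies(group: CandidateGroup, level: str) -> bool:
    """Return True if the group meets every numeric criterion for `level`.

    `level` is either "genotype" or "sub-genotype".
    """
    thresholds = {
        "genotype": GENOTYPE_MIN_DISTANCE,
        "sub-genotype": SUBGENOTYPE_MIN_DISTANCE,
    }
    return (
        group.independent_isolates >= MIN_INDEPENDENT_ISOLATES
        and group.is_monophyletic
        and group.bootstrap >= MIN_BOOTSTRAP
        and group.mean_distance > thresholds[level]
    )


# Made-up example values: enough isolates and support, distance of 7.8%.
candidate = CandidateGroup(independent_isolates=5, is_monophyletic=True,
                           bootstrap=85.0, mean_distance=0.078)
print(qualifies(candidate, "sub-genotype"))
print(qualifies(candidate, "genotype"))
```

A group that fails any one check is not named at that level, which mirrors criterion 10: such viruses keep the name of the lower-order sub/genotype.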
Nomenclature criteria for existing and new NDV genotypes and sub-genotypes [a].
| Criterion | Description |
|---|---|
| 1 | All existing genotypes (as per |
| 2 | The lowercase Latin letter system to name sub-genotypes is replaced by the numerical-decimal system using Arabic numerals. |
| 3 | Dichotomous splitting is used at every defining node (at which separation into sub-genotypes is done) using the numerals 1 and 2. |
| 4 | Class I sub/genotypes receive a numerical-decimal address (Arabic numerals separated by periods) that starts with the Arabic numeral of the genotype. Further numeration is made using the dichotomous system at every defining node using numerals 1 and 2. (e.g. 1.1 and 1.2). |
| 5 | Class II sub/genotypes receive a numerical-decimal address (Roman-Arabic numerals separated by periods) that starts with the Roman numeral of the genotype. Further numeration is made using the dichotomous system at every defining node (separating sub-genotypes) using numerals 1 and 2. (e.g. VII.1 and VII.2) (example in |
| 6 | At the higher order (next defining node closer to the tips) within sub-genotype VII.1, for example, the sub-genotypes that are one order higher will become VII.1.1 and VII.1.2. At the next node within sub-genotype VII.2, the two further sub-genotypes will be named VII.2.1 and VII.2.2, respectively (example in |
| 7 | If a branch has unresolved topology, low branch support (i.e. there are polytomies or low bootstrap values), or an insufficient number of isolates, the viruses within this branch are not assigned to a higher order, and are assigned the name of the lower order until the topology/support/number of isolates is resolved and all criteria are fulfilled, even if the distance criterion is met. |
| 8 | Newly identified virus diversity (a group of viruses undescribed before) that meets all classification criteria will be classified as new genotype and will receive a subsequent Roman numeral (e.g. currently XXII is the next available) (example in |
| 9 | Existing sub-genotypes that fulfill the genotype criteria will not be designated with a new name in order to maintain the ancestral information in their names (examples in |
| 10 | If a new sub-genotype of viruses is identified later, but still falls into a higher order within an existing genotype, this new sub-genotype receives the next consecutive numerical address for the respective level of order in the phylogeny to avoid re-numbering all existing sub-genotypes that are of higher order. For example if a new sub-genotype that outgroups VII.1.1 and VII.1.2 is identified but it is still within VII.1, this new sub-genotype will be named VII.1.3 (example in |
The names of genotypes and sub-genotypes used in this table do not correspond to the names in the phylogenetic trees presented in the current study. The names in this table were used for demonstration purposes only.
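The numerical-decimal addresses described in the nomenclature table behave like paths in a tree, which is what makes ancestry traceable from the name alone. The sketch below illustrates criteria 3, 6 and 10 under that reading. The function names `split_names`, `ancestry` and `next_sibling` are hypothetical, and the genotype labels are the same demonstration names used in the table.

```python
from typing import List, Tuple


def split_names(parent: str) -> Tuple[str, str]:
    """Criteria 3 and 6: a defining node splits a group into .1 and .2."""
    return f"{parent}.1", f"{parent}.2"


def ancestry(name: str) -> List[str]:
    """List every ancestor encoded in a name, from the genotype down."""
    parts = name.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def next_sibling(parent: str, existing: List[str]) -> str:
    """Criterion 10: a later-identified group gets the next free number
    at its level, so existing names are never renumbered."""
    depth = parent.count(".") + 1
    used = [
        int(name.rsplit(".", 1)[1])
        for name in existing
        if name.startswith(parent + ".") and name.count(".") == depth
    ]
    return f"{parent}.{max(used, default=0) + 1}"


print(split_names("VII.1"))                           # ('VII.1.1', 'VII.1.2')
print(ancestry("VII.1.2"))                            # ['VII', 'VII.1', 'VII.1.2']
print(next_sibling("VII.1", ["VII.1.1", "VII.1.2"]))  # VII.1.3
```

The same functions work for class I addresses, which start with an Arabic numeral instead of a Roman one (for example `split_names("1")` gives `1.1` and `1.2`).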
Estimates of evolutionary distances between class I and class II NDV sub-genotypes [a, b, c]
| A. Sub-genotypes within genotype 1 of class I | B. Sub-genotypes within genotype I of class II | C. Sub-genotypes within genotype V of class II | D. Sub-genotypes within genotype VII of class II | ||||
|---|---|---|---|---|---|---|---|
| 7.27 | 8.58 | 5.86 | 9.91 | ||||
| 7.99 | 8.52 | 5.58 | |||||
| 10.96 | |||||||
| E. Sub-genotypes within genotype VI of class II | |||||||
| 8.05 | 8.55 | 6.14 | 8.12 | ||||
| 6.59 | 5.15 | ||||||
Inferred from the complete nucleotide F gene sequences.
The nucleotide distances were calculated at every defining node.
The number of base substitutions per site from averaging all sequence pairs between class I and class II sub-genotypes is shown. Analysis was conducted using the Maximum Composite Likelihood model (Tamura et al., 2004). The rate variation among sites was modeled with a gamma distribution (shape parameter = 1). The number of nucleotide sequences in each sub-genotype is shown in parentheses. Codon positions included were 1st, 2nd, 3rd and noncoding. All positions containing gaps and missing data were eliminated. Evolutionary analyses were conducted in MEGA6 (Tamura et al., 2013).
For easier comparison, the former sub-genotype names are provided in parentheses.
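The between-group distance reported in these tables is an average over every pair formed by taking one sequence from each group. The sketch below shows only that averaging step. As a loudly stated simplification, it uses the uncorrected p-distance (proportion of differing sites) in place of the Maximum Composite Likelihood model with gamma-distributed rates used in MEGA, so its values will not match the published ones. It also skips gaps pair by pair rather than deleting those columns from the whole alignment. The function names and the toy sequences are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Stand-in for the MCL + gamma distance used in the source. Positions
    where either sequence has a gap or ambiguous base are skipped.
    """
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def mean_between_group_distance(
    group_a: Sequence[str],
    group_b: Sequence[str],
    distance: Callable[[str, str], float] = p_distance,
) -> float:
    """Average the pairwise distance over all (a, b) pairs between groups."""
    dists = [distance(a, b) for a, b in product(group_a, group_b)]
    return sum(dists) / len(dists)


# Toy 10-nt "alignments"; real input would be complete F gene sequences.
group_a = ["ACGTACGTAC", "ACGTACGTAA"]
group_b = ["ACGTACGTAC", "TCGTACGTAC"]
print(round(mean_between_group_distance(group_a, group_b), 3))  # 0.1
```

Passing a different `distance` callable lets the same averaging be reused with a model-corrected distance computed elsewhere.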
Fig. 1. Maximum likelihood phylogenetic trees of class I (A) and class II (B).
Phylogenetic analyses are based on the full-length nucleotide sequence of the fusion gene of isolates representing Newcastle disease virus class I (A, n = 284) and class II (B, n = 1672). The evolutionary history was inferred by using RAxML (Stamatakis, 2014) and utilizing the Maximum Likelihood method based on the General Time Reversible model with 1000 bootstrap replicates. The trees with the highest log likelihood (class I = −18,683.27, class II = −106,684.34) are shown. A discrete gamma distribution was used to model evolutionary rate differences among sites and the rate variation model allowed for some sites to be evolutionarily invariable. For imaging purposes, the taxa tips are not displayed and colors are randomly assigned to indicate the different sub/genotypes. Three new class II genotypes assigned in the current study are highlighted in red font. The trees are drawn to scale, with branch lengths measured in the number of substitutions per site. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
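The settings named in this legend (GTR model, discrete gamma rates, invariable sites, 1000 bootstrap replicates) map onto standard RAxML version 8 options. The sketch below only assembles such a command line. It is an assumption about how a comparable run could be launched, not the authors' actual command: the rapid-bootstrap mode (`-f a`), the seed, the binary name `raxmlHPC` and the file names are all illustrative choices.

```python
from typing import List


def raxml_command(alignment: str, run_name: str, bootstraps: int = 1000,
                  seed: int = 12345, binary: str = "raxmlHPC") -> List[str]:
    """Build an argument list for a RAxML v8 ML search with bootstrapping.

    Assumed choices (not stated in the source): rapid bootstrapping via
    "-f a", a fixed seed, and the default binary name.
    """
    return [
        binary,
        "-f", "a",               # rapid bootstrap plus best-scoring ML tree search
        "-m", "GTRGAMMAI",       # GTR + gamma + proportion of invariable sites
        "-p", str(seed),         # random seed for parsimony starting trees
        "-x", str(seed),         # random seed for rapid bootstrapping
        "-#", str(bootstraps),   # number of bootstrap replicates
        "-s", alignment,         # input alignment file
        "-n", run_name,          # suffix for output files
    ]


# Hypothetical file and run names, printed rather than executed.
print(" ".join(raxml_command("classII_F.phy", "classII")))
```

The list can be handed to `subprocess.run` on a machine where RAxML is installed. Installations often ship the binary under a different name, which is why it is a parameter here.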
Estimates of evolutionary distances between genotypes of class II NDV [a, b].
| Genotype (number of analyzed sequences) | No. of base substitutions per site | ||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| I | II | III | IV | V | VI | VII | VIII | IX | X | XI | XII | XIII | XIV | XVI | XVII | XVIII | XIX | XX | |
| I (n = 120) | |||||||||||||||||||
| II (n = 17) | 13.06 | ||||||||||||||||||
| III (n = 6) | 11.23 | 13.76 | |||||||||||||||||
| IV (n = 8) | 12.79 | 14.86 | 10.11 | ||||||||||||||||
| V (n = 47) | 18.06 | 19.69 | 16.25 | 15.03 | |||||||||||||||
| VI (n = 265) | 19.69 | 20.64 | 18.06 | 16.07 | 16.06 | ||||||||||||||
| VII (n = 772) | 18.92 | 21.63 | 17.34 | 15.85 | 16.12 | 14.62 | |||||||||||||
| VIII (n = 6) | 15.53 | 16.92 | 13.78 | 12.41 | 12.89 | 13.35 | 14.15 | ||||||||||||
| IX (n = 6) | 11.44 | 12.85 | 10.14 | 16.05 | 17.86 | 17.38 | 13.50 | ||||||||||||
| X (n = 22) | 12.16 | 11.83 | 13.52 | 14.59 | 19.85 | 20.85 | 20.58 | 17.16 | 12.88 | ||||||||||
| XI (n = 14) | 20.12 | 22.60 | 18.52 | 15.08 | 21.76 | 23.72 | 24.27 | 20.56 | 17.33 | 22.43 | |||||||||
| XII (n = 23) | 19.48 | 22.38 | 18.14 | 16.51 | 15.72 | 13.81 | 12.70 | 14.05 | 18.23 | 20.96 | 24.58 | ||||||||
| XIII (n = 70) | 18.79 | 21.28 | 17.81 | 16.15 | 15.87 | 15.21 | 12.91 | 13.98 | 17.35 | 20.49 | 23.45 | 11.92 | |||||||
| XIV (n = 77) | 22.32 | 25.71 | 22.18 | 19.46 | 18.48 | 18.29 | 15.90 | 16.90 | 22.44 | 23.70 | 28.31 | 14.47 | 14.83 | ||||||
| XVI (n = 4) | 17.71 | 20.31 | 16.63 | 14.68 | 15.92 | 17.22 | 17.67 | 13.90 | 16.56 | 19.16 | 23.68 | 17.38 | 17.22 | 20.47 | |||||
| XVII (n = 85) | 17.71 | 21.30 | 17.61 | 16.19 | 15.90 | 15.66 | 13.76 | 14.67 | 17.35 | 21.01 | 22.75 | 12.20 | 12.15 | 13.59 | 18.19 | ||||
| XVIII (n = 17) | 18.75 | 21.55 | 17.95 | 16.34 | 15.92 | 14.53 | 13.26 | 14.46 | 17.37 | 20.66 | 23.27 | 12.03 | 11.99 | 13.95 | 17.74 | 10.48 | |||
| XIX (n = 38) | 20.60 | 21.49 | 18.80 | 17.49 | 10.12 | 17.71 | 17.26 | 15.00 | 18.41 | 21.93 | 24.47 | 18.12 | 17.69 | 20.57 | 18.56 | 17.94 | 18.03 | ||
| XX (n = 17) | 16.77 | 18.83 | 15.55 | 13.71 | 13.69 | 10.12 | 12.84 | 11.41 | 15.24 | 19.20 | 21.85 | 12.91 | 13.28 | 16.78 | 15.10 | 13.33 | 13.10 | 16.14 | |
| XXI (n = 51) | 19.83 | 21.76 | 18.52 | 16.81 | 16.72 | 11.88 | 15.06 | 14.44 | 18.08 | 21.81 | 24.41 | 15.46 | 16.02 | 18.61 | 18.23 | 16.68 | 15.72 | 18.51 | 10.78 |
Inferred from the complete nucleotide F gene sequences.
The sequences from genotype XV are identified as recombinant forms and are not included in this analysis.
The number of base substitutions per site from averaging all sequence pairs between class II genotypes is shown. Analysis was conducted using the Maximum Composite Likelihood model (Tamura et al., 2004). The rate variation among sites was modeled with a gamma distribution (shape parameter = 1). The analysis involved 1664 nucleotide sequences (8 unclassified sequences not assigned as members of any genotype, UNCL 1–8, were excluded from the analysis; see Supplemental Table S2). Codon positions included were 1st, 2nd, 3rd and noncoding. All positions containing gaps and missing data were eliminated. Evolutionary analyses were conducted in MEGA6 (Tamura et al., 2013). The shaded cell marks an inter-genotype nucleotide distance that is lower than 10%.
Side-by-side comparison between the “lineage” system (Aldous et al., 2003), the “genotype” system (Diel et al., 2012a), and the updated classification system for NDV isolates proposed in this study. The names in bold font represent the sub/genotype names based on the new classification and nomenclature systems.
| Sub/lineage | Sub/genotype | Current study |
|---|---|---|
| Class II | ||
| 1 | I a | |
| I b | ||
| I c | ||
| I d | ||
| 2 | II | |
| 3a, 3f | III | |
| 3b | IV | |
| 3c | V b | |
| V c | ||
| V d | ||
| VI a | ||
| VI n | ||
| 4b, | VI b | |
| VI e | ||
| VI f | ||
| VI h | ||
| VI j | ||
| VI k | ||
| 4c | – | |
| 5a | VII a – one sequence | |
| 5d | VII b | |
| VII b | ||
| VII j | ||
| VII l | ||
| 5c | VII e | |
| VII f | ||
| – | VII g (RF) | |
| 5a | VII h | |
| – | VII k | |
| 5e | VII | |
| 3d | VIII | |
| 3e | IX | |
| 1 | X a | |
| X b | ||
| 3 g | XI | |
| 5b | XII a | |
| XII b | ||
| 5b | XIII a | |
| XIII a | ||
| XIII b | ||
| XIII b | ||
| 5 h, 7d | XIV | |
| 5f, 7c | XIV a | |
| XIV b | ||
| – | XV (RF) | |
| 3d | XVI | |
| 5 g, 7a | XVII a | |
| XVII b | ||
| 7b | XVIII a | |
| XVIII b | ||
| 3c | V a | |
| 4a, 4d | VI c | |
| – | VI l | |
| – | VI i | |
| – | VI g | |
| – | VI m | |
| Class I | ||
| 6 | 1a | |
| 1b | ||
| 1c | ||
| 1d | ||
Fig. 2. Class II “pilot” tree.
Phylogenetic analysis is based on the full-length nucleotide sequence of the fusion gene of selected isolates representing all class II Newcastle disease virus sub/genotypes (n = 125). The evolutionary history was inferred by using RAxML (Stamatakis, 2014) and utilizing the Maximum Likelihood method based on the General Time Reversible model with 1000 bootstrap replicates. The tree with the highest log likelihood (−28,785.59) is shown. A discrete gamma distribution was used to model evolutionary rate differences among sites and the rate variation model allowed for some sites to be evolutionarily invariable. The Roman numerals presented in the taxa names in the phylogenetic tree represent the respective genotype for each isolate. The new (decimal naming) and the old names (alpha-numerical) are provided for easier comparison. The taxa names also include the GenBank identification number, host name, country of isolation, strain designation, and year of isolation. Three new genotypes assigned in the current study are highlighted in red font. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)