
Phylodynamic analysis of porcine circovirus type 2: Methodological approach and datasets.

Giovanni Franzo¹, Martì Cortey², Joaquim Segalés², Joseph Hughes³, Michele Drigo¹.

Abstract

Since its first description, PCV2 has emerged as one of the most economically relevant pathogens for the swine industry. Despite the introduction of vaccines effective in controlling clinical syndromes, PCV2 spread was not prevented, and potential evidence of vaccine immune escape has recently been reported ("Complete genome sequence of a novel porcine circovirus type 2b variant present in cases of vaccine failures in the United States" (Xiao and Halbur, 2012) [1], "Genetic and antigenic characterization of a newly emerging porcine circovirus type 2b mutant first isolated in cases of vaccine failure in Korea" (Seo et al., 2014) [2]). In this article, we used a collection of PCV2 full genomes, provided in the present manuscript, and several phylogenetic, phylodynamic and bioinformatic methods to investigate different aspects of PCV2 epidemiology, history and evolution (more thoroughly described in "Phylodynamic analysis of porcine circovirus type 2 reveals global waves of emerging genotypes and the circulation of recombinant forms" [3]). The methodological approaches used to consistently detect recombination events and to estimate the population dynamics and spreading patterns of rapidly evolving ssDNA viruses are reported herein. The programs used are described and the original scripts are provided. The assembled databases are also made available. These consist of a broad collection of complete genome sequences (i.e. 843 sequences: 63 complete genomes of PCV2a, 310 of PCV2b, 4 of PCV2c, 217 of PCV2d, 64 of CRF01, 140 of CRF02 and 45 of CRF03) of PCV2 genotypes and major circulating recombinant forms (CRFs), divided into different regions (i.e. ORF1, ORF2 and intergenic regions) and annotated with the respective collection date and country. Overall, these data can be used as a starting point for further studies and for classification purposes.

Year: 2016 PMID: 27508215 PMCID: PMC4962815 DOI: 10.1016/j.dib.2016.06.005

Source DB: PubMed Journal: Data Brief ISSN: 2352-3409

Value of the data

- Most extensive collection of PCV2 full genome sequences with available metadata.
- Proper annotation linking genetic data to country of origin and collection date.
- Full description of several approaches used to analyze different aspects of viral evolution.
- Datasets suitable for further evolutionary studies and for PCV2 classification purposes.
- Standardized approach that can be used for follow-up studies on PCV2 evolution.

Data

Supplementary data 1 provides a table reporting the accession numbers of all (i.e. 843) PCV2 complete genomes and the PCV2a ORF2 sequences used in Franzo et al. [3]. For each sequence, the country where it was sampled and the collection date are also reported. The alignments of all major PCV2 genotypes (i.e. PCV2a, PCV2b, PCV2c and PCV2d) and circulating recombinant forms (CRFs) are provided in Supplementary data 2 and can be used for comparison purposes and as a starting point for further studies. Finally, Supplementary data 3 provides an R script for ancestral state reconstruction of per-site amino acid sequences using a maximum likelihood approach.

Experimental design, materials and methods

Dataset

A total of 925 PCV2 complete genome sequences with known collection dates and country of origin were downloaded from GenBank (accessed 06/10/2014; listed in “Supplementary data 1.xls” in the online version of this article) and aligned using MAFFT [4]. All poorly aligned sequences, and those displaying degenerate nucleotides or indels that altered the reading frame (suggesting sequencing errors), were removed from the dataset, leaving 898 sequences (Supplementary data 1).
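The filtering step above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the toy records and the rule that an ungapped ORF length must be a multiple of three are assumptions made only to show the idea of discarding sequences with degenerate nucleotides or frame-altering indels.

```python
# Illustrative quality filter; all names and toy sequences are hypothetical.
VALID_BASES = set("ACGT")


def has_degenerate_bases(seq):
    """True if the sequence contains anything other than A, C, G or T (gaps ignored)."""
    return any(base not in VALID_BASES for base in seq.upper().replace("-", ""))


def breaks_reading_frame(orf):
    """True if the ungapped ORF length is not a multiple of three (assumed proxy for a frameshift)."""
    return len(orf.replace("-", "")) % 3 != 0


def passes_quality_check(genome, orfs):
    """Keep a record only if it has no degenerate bases and every ORF stays in frame."""
    return not has_degenerate_bases(genome) and not any(breaks_reading_frame(o) for o in orfs)


# Toy records: name -> (genome, list of ORF sequences). Not real PCV2 data.
records = {
    "seq_ok": ("ATGACGTAA", ["ATGACGTAA"]),
    "seq_degenerate": ("ATGACNTAA", ["ATGACNTAA"]),
    "seq_frameshift": ("ATGACGTA", ["ATGACGTA"]),
}
kept = [name for name, (genome, orfs) in records.items() if passes_quality_check(genome, orfs)]
print(kept)  # ['seq_ok']
```

A visual check of the alignment, as described in the text, would still be needed to identify poorly aligned sequences; this sketch only covers the two criteria that can be tested per sequence.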

Recombination analysis

The whole dataset was tested for recombination using two programs based on different approaches: RDP4 [5] and GARD [6]. When RDP4 was used, only recombination events detected by more than two methods with a significance value lower than 10^-5 (p-value < 10^-5), with Bonferroni correction applied, were accepted. The non-recombinant sequences as well as those sharing recombination events were split into separate datasets and expanded to their original size.
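The acceptance rule for RDP4 events can be written down as a short Python sketch. The data layout, the class name and the p-values below are invented for illustration, and the sketch assumes that each method reports a p-value that has already been Bonferroni corrected; it does not parse real RDP4 output.

```python
from dataclasses import dataclass

P_THRESHOLD = 1e-5  # significance cut-off stated in the text
MIN_METHODS = 3     # "more than two methods"


@dataclass
class RecombinationEvent:
    event_id: str
    p_values: dict  # hypothetical layout: method name -> corrected p-value


def supporting_methods(event):
    """Methods whose p-value falls below the threshold."""
    return sorted(m for m, p in event.p_values.items() if p < P_THRESHOLD)


def is_accepted(event):
    """Accept an event only if enough methods support it."""
    return len(supporting_methods(event)) >= MIN_METHODS


# Made-up events and p-values, for illustration only.
events = [
    RecombinationEvent("event_A", {"RDP": 1e-8, "GENECONV": 3e-7, "MaxChi": 2e-6, "Chimaera": 0.02}),
    RecombinationEvent("event_B", {"RDP": 1e-9, "GENECONV": 4e-6, "MaxChi": 0.001}),
]
accepted = [e.event_id for e in events if is_accepted(e)]
print(accepted)  # ['event_A']
```

Here event_B is rejected because only two methods reach the threshold, which matches the "more than two methods" requirement.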

Genotyping and database preparation

The non-recombinant sequences were classified into genotypes PCV2a, PCV2b, PCV2c or PCV2d according to Franzo et al. 2015 [7]. The most appropriate nucleotide substitution model was selected according to the Akaike information criterion (AIC) score calculated using jModelTest 2.1.2 [8]. A phylogenetic tree was reconstructed using the maximum likelihood (ML) approach implemented in PhyML [9]. The best tree search method included the combination of two branch swapping algorithms: nearest neighbor interchange (NNI) and subtree pruning and regrafting (SPR). The robustness of the monophyly of the taxa subsets was estimated with the fast non-parametric version of the aLRT (Shimodaira–Hasegawa [SH]-aLRT), developed and implemented in PhyML 3.0 [10]. On the basis of the recombination and phylogenetic analyses, sequences were divided into independent datasets, corresponding to different genotypes and CRFs (i.e. those including more than 30 sequences collected in two or more countries). Every dataset was further divided into three regions, namely ORF1, ORF2 and the intergenic region (obtained by merging the major and the minor intergenic regions), and a new alignment was generated for each dataset. The coding regions were aligned at the amino acid level and then the nucleotide sequences were back-translated using the MAFFT algorithm implemented in TranslatorX [11]. All these datasets, comprising different gene alignments, are provided in Supplementary data 2. These include 63 complete genomes of PCV2a, 310 of PCV2b, 4 of PCV2c, 217 of PCV2d, 64 of CRF01, 140 of CRF02 and 45 of CRF03. Additionally, a dataset of 83 PCV2a ORF2 sequences is provided.
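The CRF criterion quoted above (more than 30 sequences collected in two or more countries) can be expressed as a small Python sketch. The record layout, the event identifiers and the country labels are placeholders chosen for the example, not data from the article.

```python
from collections import defaultdict

MIN_SEQUENCES = 31  # "more than 30 sequences"
MIN_COUNTRIES = 2   # "two or more countries"


def select_crf_datasets(records):
    """Group sequences by shared recombination event and keep groups meeting both thresholds.

    records: iterable of (accession, event_id, country) tuples (hypothetical layout).
    Returns a dict mapping event_id to the list of retained accessions.
    """
    groups = defaultdict(list)
    for accession, event_id, country in records:
        groups[event_id].append((accession, country))

    selected = {}
    for event_id, members in groups.items():
        countries = {country for _, country in members}
        if len(members) >= MIN_SEQUENCES and len(countries) >= MIN_COUNTRIES:
            selected[event_id] = [accession for accession, _ in members]
    return selected


# Placeholder records: 35 sequences from two countries, 40 from one country, 5 from three.
records = (
    [(f"X{i}", "event_X", "country_1" if i % 2 else "country_2") for i in range(35)]
    + [(f"Y{i}", "event_Y", "country_1") for i in range(40)]
    + [(f"Z{i}", "event_Z", f"country_{i % 3}") for i in range(5)]
)
selected = select_crf_datasets(records)
print(sorted(selected))  # ['event_X']
```

Only event_X passes: event_Y is large but comes from a single country, and event_Z is geographically spread but too small.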

BEAST and selective pressures analysis

The time to most recent common ancestor (tMRCA), substitution rates, phylogeography and population dynamics were jointly estimated using a Bayesian serial coalescent approach implemented in BEAST 1.8.1 [12]. The selective pressure on the viral proteins was estimated using different methods based on the ratio between non-synonymous and synonymous substitution rates (dN/dS). Pervasive diversifying/purifying selection was estimated using the SLAC, FEL and FUBAR methods, while episodic diversifying selection was evaluated using MEME [13], [14], [15]. The action of selective pressures was compared among different genes using the dNdSDistributionComparison.bf batch file implemented in HyPhy [16]. Differences in the site-by-site selection patterns among different genotypes were investigated for each gene using the batch file CompareSelectivePressure.bf implemented in the same program. Ancestral state reconstruction of the per-site amino acid sequence was performed, based on the time-scaled phylogenetic trees, using the maximum likelihood approach of the ape package implemented in R [17]. The corresponding script is provided in Supplementary data 3.
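As a rough illustration of how per-site dN and dS estimates are typically interpreted, the Python sketch below labels a site as under diversifying or purifying selection. It does not reproduce SLAC, FEL, FUBAR or MEME; the function names, the input values and the 0.05 significance threshold are assumptions, since the text does not state the cut-off used.

```python
ALPHA = 0.05  # assumed significance threshold, not taken from the article


def omega(dn, ds):
    """Return dN/dS, or None when dS is zero and the ratio is undefined."""
    return None if ds == 0 else dn / ds


def classify_site(dn, ds, p_value, alpha=ALPHA):
    """Label a site from its dN and dS estimates and the p-value of the test dN != dS."""
    if p_value >= alpha or dn == ds:
        return "no significant selection"
    return "diversifying" if dn > ds else "purifying"


# Invented per-site estimates: (site, dN, dS, p-value).
sites = [
    (1, 2.4, 0.6, 0.01),
    (2, 0.1, 1.5, 0.001),
    (3, 1.1, 0.9, 0.40),
]
for site, dn, ds, p in sites:
    print(site, omega(dn, ds), classify_site(dn, ds, p))
```

In this toy input, site 1 would be reported as diversifying (dN > dS), site 2 as purifying (dN < dS) and site 3 as not significant.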

Specifications Table

Subject area	Biology, Genetics and Genomics
More specific subject area	Phylogenetics and Phylogenomics
Type of data	Excel file
How data was acquired	Sequence data were downloaded from GenBank, manually checked and annotated. Analyses were performed using state-of-the-art, freely available programs for phylogeny, population dynamics and selective pressure analysis.
Data format	Raw, filtered
Experimental factors	PCV2 complete genome sequences were downloaded from GenBank and annotated with the respective collection country and date. Sequences were aligned and the consistency of the alignment was checked. All sequences were scanned for recombination and subdivided into genotypes or recombinant forms. The databases generated in this way were used for further analysis.
Experimental features	PCV2 sequence download, quality check and annotation.
Experimental features	Sequence alignment, recombination analysis, coalescent-based analysis of population parameters and reconstruction of viral spreading patterns. Analysis of selective pressure acting on different coding regions.
Data source location	n/a
Data accessibility	Data are within the article

References (17 in total)

1. GARD: a genetic algorithm for recombination detection.

Authors: Sergei L Kosakovsky Pond; David Posada; Michael B Gravenor; Christopher H Woelk; Simon D W Frost
Journal: Bioinformatics Date: 2006-11-16 Impact factor: 6.937

2. FUBAR: a fast, unconstrained bayesian approximation for inferring selection.

Authors: Ben Murrell; Sasha Moola; Amandla Mabona; Thomas Weighill; Daniel Sheward; Sergei L Kosakovsky Pond; Konrad Scheffler
Journal: Mol Biol Evol Date: 2013-02-18 Impact factor: 16.240

3. Genetic and antigenic characterization of a newly emerging porcine circovirus type 2b mutant first isolated in cases of vaccine failure in Korea.

Authors: Hwi Won Seo; Changhoon Park; Ikjae Kang; Kyuhyung Choi; Jiwoon Jeong; Su-Jin Park; Chanhee Chae
Journal: Arch Virol Date: 2014-07-18 Impact factor: 2.574

4. MAFFT multiple sequence alignment software version 7: improvements in performance and usability.

Authors: Kazutaka Katoh; Daron M Standley
Journal: Mol Biol Evol Date: 2013-01-16 Impact factor: 16.240

5. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations.

Authors: Federico Abascal; Rafael Zardoya; Maximilian J Telford
Journal: Nucleic Acids Res Date: 2010-04-30 Impact factor: 16.971

6. Phylodynamic analysis of porcine circovirus type 2 reveals global waves of emerging genotypes and the circulation of recombinant forms.

Authors: Giovanni Franzo; Marti Cortey; Joaquim Segalés; Joseph Hughes; Michele Drigo
Journal: Mol Phylogenet Evol Date: 2016-04-23 Impact factor: 4.286

7. Complete genome sequence of a novel porcine circovirus type 2b variant present in cases of vaccine failures in the United States.

Authors: Chao-Ting Xiao; Patrick G Halbur; Tanja Opriessnig
Journal: J Virol Date: 2012-11 Impact factor: 5.103

8. Bayesian phylogenetics with BEAUti and the BEAST 1.7.

Authors: Alexei J Drummond; Marc A Suchard; Dong Xie; Andrew Rambaut
Journal: Mol Biol Evol Date: 2012-02-25 Impact factor: 16.240

9. RDP3: a flexible and fast computer program for analyzing recombination.

Authors: Darren P Martin; Philippe Lemey; Martin Lott; Vincent Moulton; David Posada; Pierre Lefeuvre
Journal: Bioinformatics Date: 2010-08-26 Impact factor: 6.937

10. Survey of branch support methods demonstrates accuracy, power, and robustness of fast likelihood-based approximation schemes.

Authors: Maria Anisimova; Manuel Gil; Jean-François Dufayard; Christophe Dessimoz; Olivier Gascuel
Journal: Syst Biol Date: 2011-05-03 Impact factor: 15.683
