Frank Vandenbussche, Elisabeth Mathijs, Marylène Tignon, Tamara Vandersmissen, Ann Brigitte Cay.
Abstract
Porcine reproductive and respiratory syndrome virus (PRRSV) is the causative agent of one of the most widespread and economically devastating diseases in the swine industry. Typing circulating PRRSV strains by means of sequencing is crucial for developing adequate control strategies. Most genetic studies only target the highly variable open reading frame (ORF) 5, for which an extensive database is available. In this study, we performed whole-genome sequencing (WGS) on a collection of 124 PRRSV-1 positive serum samples that were collected over a 5-year period (2015–2019) in Belgium. Our results show that (nearly) complete PRRSV genomes can be obtained directly from serum samples with a high success rate. Analysis of the coding regions confirmed the exceptionally high genetic diversity, even among Belgian PRRSV-1 strains. To gain more insight into the added value of WGS, we performed phylogenetic cluster analyses on separate ORF datasets as well as on a single, concatenated dataset (CDS) containing all ORFs. A comparison between the CDS and ORF clustering schemes revealed numerous discrepancies. To explain these differences, we performed a large-scale recombination analysis, which allowed us to identify a large number of potential recombination events that were scattered across the genome. As PRRSV does not contain typical recombination hot-spots, typing PRRSV strains based on a single ORF is not recommended. Although the typing accuracy can be improved by including multiple regions, our results show that the full genetic diversity among PRRSV strains can only be captured by analysing (nearly) complete genomes. Finally, we also identified several vaccine-derived recombinant strains, which once more raises the question of the safety of these vaccines.
Keywords: genotyping; porcine reproductive and respiratory syndrome virus; recombination; whole-genome sequencing
Year: 2021 PMID: 34960688 PMCID: PMC8707199 DOI: 10.3390/v13122419
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Number of InDels found exclusively in the Belgian dataset or in the complete dataset.
| ORF | Protein | Belgian Dataset (internal/terminal) | Complete Dataset (internal/terminal) |
|---|---|---|---|
| ORF1ab | NSP1 | 0/0 | 6/0 |
| ORF1ab | NSP2 | 50/0 | 122/0 |
| ORF1ab | NSP3 | 0/0 | 0/0 |
| ORF1ab | NSP4 | 0/0 | 0/0 |
| ORF1ab | NSP5 | 0/0 | 0/0 |
| ORF1ab | NSP6 | 0/0 | 0/0 |
| ORF1ab | NSP7α | 0/0 | 0/0 |
| ORF1ab | NSP7β | 0/0 | 1/0 |
| ORF1ab | NSP8 | 0/0 | 0/0 |
| ORF1ab | NSP9 | 0/0 | 0/0 |
| ORF1ab | NSP10 | 0/0 | 1/0 |
| ORF1ab | NSP11 | 0/0 | 0/0 |
| ORF1ab | NSP12 | 1/5 | 1/7 |
| ORF2 | GP2 | 0/1 | 1/1 |
| ORF3 | GP3 | 9/12 | 18/15 |
| ORF4 | GP4 | 11/0 | 22/0 |
| ORF5 | GP5 | 0/0 | 0/0 |
| ORF6 | M | 1/0 | 1/0 |
| ORF7 | N | 0/0 | 2/2 |
Internal InDels are the result of the insertion or deletion of one or more codons within an ORF, whereas terminal InDels are due to the introduction or disruption of a stop codon.
Vaccine-derived recombinants found among the Belgian PRRSV-1 sequences.
| Event | Recombinant | Major | Minor | Start/Stop | Region |
|---|---|---|---|---|---|
| 7 | MZ417409 | MZ417402 | KT988004 | 11,927/13,412 | ORF2-ORF5 |
| 15 | MZ417409 | MZ417402 | KT988004 | 13,413/14,733 | ORF5-ORF7 |
| 5 | MZ417426 | MZ417421 | DD093450 | 6049/9047 | ORF1a-ORF1b |
| 36 | MZ417449 | MZ417446 | KT988004 | 8104/8543 | ORF1b |
| 9 a | MZ417459 | MZ417446 | DD093450 | 6514/8680 | ORF1a-ORF1b |
| 13 | MZ417463 | MZ417462 | KT988004 | 13,157/14,587 | ORF5-ORF7 |
| 2 | MZ417464 | MZ417463 | DD093450 | 2721/10,884 | ORF1a-ORF1b |
| 13 | MZ417464 | MZ417462 | KT988004 | 13,232/14,587 | ORF5-ORF7 |
| 3 | MZ417469 | MZ417467 | GU067771 | 6271/10,979 | ORF1a-ORF1b |
| 51 | MZ417469 | MZ417467 | GU067771 | 1/542 | ORF1a |
| 20 b | MZ417496 | MZ417417 | KT988004 | 13,366/14,695 | ORF5-ORF7 |
Potential recombination events were identified using RDP5.05 [51]. Sequences were considered to be recombinant when the same recombination signal was detected by at least 4 methods. The start/stop positions refer to the position in the recombinant sequence of the WGS dataset, which does not contain the terminal untranslated regions (see File S2 for more details). (a) The same recombination event was detected in sequences MZ417456-MZ417458. (b) The same recombination event was detected in sequence MZ417495.
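The "at least 4 methods" criterion in the note above can be illustrated with a minimal sketch. This is not RDP5 code or its output format: the event IDs, the dictionary layout, and the assignment of methods to events are hypothetical and serve only to show the filtering rule.

```python
from typing import Dict, List, Set

MIN_METHODS = 4  # threshold stated in the table note


def supported_events(signals: Dict[int, Set[str]],
                     min_methods: int = MIN_METHODS) -> List[int]:
    """Return the IDs of events detected by at least `min_methods` methods."""
    return sorted(event for event, methods in signals.items()
                  if len(methods) >= min_methods)


# Hypothetical input: event ID -> set of detection methods that flagged it.
example_signals = {
    101: {"RDP", "GENECONV", "BootScan", "MaxChi", "Chimaera"},
    102: {"RDP", "GENECONV", "MaxChi", "3Seq"},
    103: {"RDP", "MaxChi"},  # only two methods, so this event is dropped
}

print(supported_events(example_signals))  # [101, 102]
```

Storing the methods as a set means a method that reports the same signal twice is counted only once.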
Figure 1. Recombination region count matrix (upper right) and recombination region hot/cold spot matrix (lower left) of the WGS dataset. Unique recombination events were mapped onto the matrices based on their estimated breakpoint positions. The colours in the upper right matrix are a function of the number of times that pairs of nucleotides (plotted on the x and y axes) are separated by observable recombination events. Highly exchangeable regions are depicted by warm colours, whereas less exchangeable regions are depicted by cool colours. The blue and red colours in the lower left matrix represent pairs of sites that appear to be the least and most separable by recombination, respectively. The genome organization is represented on top of the graph. Non-structural and structural proteins are shown in red and blue, respectively.
Figure 2. Distribution of recombination breakpoints within the WGS dataset. A 200 bp window was moved along the alignment 1 bp at a time, and the number of breakpoints detected within the window region was counted and plotted (solid lines). The horizontal upper and lower dashed lines indicate the 99% and 95% confidence thresholds for globally significant breakpoint clusters. Light and dark grey areas indicate the 99% and 95% local breakpoint clustering thresholds. The positions of body TRS and TRS-like elements are depicted as vertical lines. The red solid lines represent the body TRS elements in the alignment. The positions of TRS-like elements that are present in at least 33% of the sequences are shown as red dotted lines. The genome organization is represented on top of the graph. Non-structural and structural proteins are shown in red and blue, respectively.
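The sliding-window count described in this caption (a 200 bp window moved 1 bp at a time) can be sketched as follows. This is an illustrative reimplementation under simple assumptions, not the RDP5 routine: the function name, the 1-based coordinate convention, and the toy input are hypothetical, and the global and local significance thresholds shown in the figure are not reproduced.

```python
from typing import List


def breakpoint_density(breakpoints: List[int], alignment_length: int,
                       window: int = 200) -> List[int]:
    """Count breakpoints inside each window, sliding 1 bp at a time.

    Positions are 1-based. Element i of the result covers alignment
    positions i + 1 .. i + window.
    """
    if alignment_length < window:
        return []
    # Number of breakpoints at each alignment position (index 0 unused).
    per_site = [0] * (alignment_length + 1)
    for pos in breakpoints:
        if 1 <= pos <= alignment_length:
            per_site[pos] += 1
    # First window, then update the running total as the window slides.
    current = sum(per_site[1:window + 1])
    counts = [current]
    for start in range(2, alignment_length - window + 2):
        current += per_site[start + window - 1] - per_site[start - 1]
        counts.append(current)
    return counts


# Toy example with a 10 bp window on a 30 bp alignment (hypothetical data).
toy = breakpoint_density([5, 6, 20], alignment_length=30, window=10)
print(len(toy), toy[0], max(toy))  # 21 2 2
```

The running total avoids recounting the whole window at every step, which matters for a genome-length alignment of roughly 15,000 positions.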
Phylogenetic signal present in the complete PRRSV-1 datasets.
| Dataset | Length | % Fully Resolved | % Partly Resolved | % Unresolved |
|---|---|---|---|---|
| CDS | 15,591 | 97.65 | 2.24 | 0.11 |
| ORF1a | 7506 | 94.85 | 4.53 | 0.62 |
| ORF1b | 4458 | 89.26 | 8.81 | 1.93 |
| ORF2a | 747 | 66.06 | 15.89 | 18.05 |
| ORF3 | 810 | 74.79 | 15.28 | 9.92 |
| ORF4 | 555 | 66.94 | 19.58 | 13.48 |
| ORF5 | 603 | 82.62 | 9.20 | 8.18 |
| ORF6 | 519 | 75.04 | 13.67 | 11.29 |
| ORF7 | 393 | 61.40 | 13.85 | 24.75 |
The phylogenetic content of each dataset was evaluated by likelihood mapping analysis [35] as implemented in IQ-TREE v2.1.0 [32].
Figure 3. Maximum likelihood tree of the CDS dataset. The phylogenetic tree was reconstructed using the maximum likelihood (ML) method as implemented in IQ-TREE v2.1.0 [36]. Optimal partitioning schemes and nucleotide substitution models were selected using ModelFinder v2.1.0 [33,34]. Fifty independent runs were performed using the best partitioning schemes and nucleotide substitution models. Branch supports were assessed using UFBoot2 [37] as well as SH-aLRT [38] with 10,000 replicates. The tree with the highest ML-score was mid-point rooted and annotated in the R environment using the packages phytools [52] and ggtree [39], respectively. The tree was coloured according to the CDS clustering scheme (C1–C31).
Figure 4. Comparison of the renumbered clustering schemes of the CDS and ORF datasets. Maximum likelihood trees were generated for the CDS and ORF datasets using IQ-TREE v2.1.0 [36]. The resulting trees were subjected to cluster analysis according to the methodology described by Prosperi et al. [40]. Clustering schemes of the ORF datasets were compared with the CDS clustering scheme, which served as reference. The renumbered CDS and ORF clusters of all PRRSV-1 strains were plotted against each other using the R package ggplot2 [46]. (A) Comparison between the ORF1a and CDS clustering schemes; (B) Comparison between the ORF1b and CDS clustering schemes; (C) Comparison between the ORF2a and CDS clustering schemes; (D) Comparison between the ORF3 and CDS clustering schemes; (E) Comparison between the ORF4 and CDS clustering schemes; (F) Comparison between the ORF5 and CDS clustering schemes; (G) Comparison between the ORF6 and CDS clustering schemes; (H) Comparison between the ORF7 and CDS clustering schemes.
Comparison of the renumbered clustering schemes of the CDS and ORF datasets.
| Dataset | AMI | F(b3) |
|---|---|---|
| CDS/ORF1a | 0.93 | 0.90 |
| CDS/ORF1b | 0.90 | 0.86 |
| CDS/ORF2a | 0.81 | 0.75 |
| CDS/ORF3 | 0.80 | 0.71 |
| CDS/ORF4 | 0.74 | 0.68 |
| CDS/ORF5 | 0.79 | 0.73 |
| CDS/ORF6 | 0.74 | 0.65 |
| CDS/ORF7 | 0.73 | 0.65 |
Clustering schemes of the ORF datasets were compared with the CDS clustering scheme, which served as reference. The adjusted mutual information (AMI) and BCubed F-score (F(b3)) were calculated using the R packages aricode [49] and DPBBM [50], respectively.
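To make the BCubed F-score concrete, the sketch below computes it from two cluster assignments using the standard definition: per-strain precision and recall are averaged, then combined as a harmonic mean. It is written in Python for illustration only. The study used the R package DPBBM, whose exact implementation may differ, and the function name and toy labels here are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def bcubed(reference: Sequence[Hashable],
           predicted: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Return BCubed (precision, recall, F) for two cluster assignments.

    `reference[i]` and `predicted[i]` are the cluster labels of strain i
    in the reference scheme and in the scheme being evaluated.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(reference)
    ref_sizes = Counter(reference)
    pred_sizes = Counter(predicted)
    # Number of strains sharing both the reference and the predicted label.
    overlap = Counter(zip(reference, predicted))
    pairs = list(zip(reference, predicted))
    precision = sum(overlap[(r, p)] / pred_sizes[p] for r, p in pairs) / n
    recall = sum(overlap[(r, p)] / ref_sizes[r] for r, p in pairs) / n
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy example: five strains, reference (CDS-like) vs. single-ORF-like labels.
cds_labels = ["C1", "C1", "C1", "C2", "C2"]
orf_labels = ["A", "A", "B", "B", "B"]
print(bcubed(cds_labels, orf_labels))
```

In this toy case one strain is placed with the wrong group, and precision, recall, and F all equal 11/15 (about 0.73). Identical clustering schemes give a score of 1.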