Ana Catalina Blazquez¹, Ariel José Berenstein¹, Carolina Torres²,³, Agustín Izquierdo⁴, Carol Lezama⁵, Guillermo Moscatelli⁶, Elena Noemí De Matteo¹, Mario Alejandro Lorenzetti¹, María Victoria Preciado¹
Abstract
The sequence variability of the Epstein–Barr virus (EBV) has been studied extensively in recent years in isolates from various geographic regions, and the resulting variations have been described at both the genetic and the genomic level. However, isolates from South America have been underrepresented in these studies. Here, we sequenced 15 complete EBV genomes and analyzed them, together with publicly available raw next-generation sequencing (NGS) data for 199 EBV isolates from other parts of the globe, using a custom-built bioinformatic pipeline. We assessed the phylogenetic relationships of the genomes, the geographic structure and variability of the data set, and the evolution rates for the whole genome and for each gene. The present work helps to overcome the scarcity of complete EBV genomes from South America and is the most comprehensive geography-related variability study to date, as it determined the actual contribution of each EBV gene to the geographic segregation of the entire genome. Moreover, to the best of our knowledge, we established for the first time the evolution rate of the entire EBV genome under an assumption that does not depend on host-virus codivergence, and we assessed evolution rates on a gene-by-gene basis, relating them to the function of the encoded proteins. Considering the evolution of dsDNA viruses with a codivergence-independent approach may lay the basis for future research on EBV evolution. The exhaustive bioinformatic analysis performed on this new data set allowed us to draw a novel set of conclusions regarding the genome evolution of EBV.
Keywords: EBV Argentina; Epstein–Barr virus; evolution rate; geographic variability; next-generation sequencing
Year: 2021 PMID: 34207433 PMCID: PMC8235469 DOI: 10.3390/v13061172
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Figure 1. Alignment entropy analysis: (A) entropic positions in the alignment generated by the single pipeline (SP); (B) entropic positions in the alignment generated directly from GenBank (GB)-downloaded sequences, in which highly entropic positions are depicted as black spikes or blocks; (C) entropy density chart for both the SP and GB alignments, considering all alignment positions; (D) entropy comparison considering only positions with entropy > 0, depicted in box plots.
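Per-position entropy of a multiple sequence alignment is commonly computed as the Shannon entropy of the residue frequencies in each column. The sketch below assumes that measure (in bits) and, as a further assumption, ignores gap characters; the function names are hypothetical and this is not the study's pipeline. The positions it reports (entropy > 0) correspond to the subset compared in panel D.

```python
import math
from collections import Counter


def column_entropies(alignment, ignore_gaps=True):
    """Shannon entropy (bits) of each column of an alignment.

    alignment: list of aligned sequences of equal length.
    ignore_gaps: drop '-' from the counts (an assumption; gaps could
    instead be counted as an extra state).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    entropies = []
    for i in range(length):
        column = [seq[i].upper() for seq in alignment]
        if ignore_gaps:
            column = [c for c in column if c != "-"]
        counts = Counter(column)
        total = sum(counts.values())
        h = 0.0
        for n in counts.values():
            p = n / total
            h -= p * math.log2(p)
        entropies.append(h)
    return entropies


def entropic_positions(entropies):
    """0-based indices of columns with entropy > 0."""
    return [i for i, h in enumerate(entropies) if h > 0]


# Usage: toy alignment of four sequences.
aln = ["ACGT", "ACGA", "ACTA", "AC-A"]
ents = column_entropies(aln)
print(entropic_positions(ents))  # [2, 3]
```

Running the same functions on two alignments of the same isolates (for example, one per pipeline) gives the two entropy profiles to compare.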
Figure 2. Phylogenetic reconstruction of 214 complete EBV genomes from different geographic regions. The phylogenetic tree was constructed using the maximum likelihood method with 1000 ultrafast bootstrap replicates; only bootstrap values over 70 are shown. The green-shaded clade contains the EBV2 sequences and the pink-shaded clade contains the EBV1 sequences. The color of each sequence represents the geographic region of the viral isolate. One sequence without a reliable origin of isolation is labeled in black (omitted from further analyses). The five subclades are highlighted with differently colored external bars. South American sequences correspond either to isolates sequenced in the present study or to previously sequenced isolates from Argentina.
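Showing only support values over 70 can be done on the tree file itself before plotting. A minimal sketch, assuming a Newick string with a single numeric bootstrap value written right after each internal node's closing parenthesis; the function name is hypothetical, and the sketch only edits labels, it does not build the tree.

```python
import re

# Support values of internal nodes sit right after a closing parenthesis,
# e.g. "(A:0.1,B:0.2)95:0.05". Assumes one numeric support value per node.
_SUPPORT = re.compile(r"\)(\d+(?:\.\d+)?)")


def hide_low_support(newick, threshold=70.0):
    """Remove internal-node support labels that are not above `threshold`."""

    def keep_or_drop(match):
        value = float(match.group(1))
        return match.group(0) if value > threshold else ")"

    return _SUPPORT.sub(keep_or_drop, newick)


tree = "((A:0.1,B:0.2)95:0.05,(C:0.1,D:0.1)40:0.02);"
print(hide_low_support(tree))
# ((A:0.1,B:0.2)95:0.05,(C:0.1,D:0.1):0.02);
```

Values exactly equal to the threshold are dropped, matching the "over 70" wording.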
Figure 3. Quantification of EBV1 genomic variation by geographic origin: (A) quantification and comparison of the numbers of variants from different regions, depicted in box plots; (B–I) positions of common variants relative to the EBV1 reference along the genome for each geographic region.
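Counting variants against a reference and calling "common" variants per region can be sketched as below. This assumes isolates are already aligned to the EBV1 reference, skips gaps and ambiguous `N` calls, and uses a hypothetical `min_fraction` cut-off to define "common"; the caption does not give the study's criterion. The sequences and region labels are toy data.

```python
from collections import Counter

SKIP = {"-", "N"}  # gaps and ambiguous calls are not counted (assumption)


def variant_positions(reference, sequence):
    """0-based positions where an aligned isolate differs from the reference."""
    return {
        i
        for i, (r, s) in enumerate(zip(reference.upper(), sequence.upper()))
        if s not in SKIP and r != s
    }


def variants_per_isolate(reference, isolates_by_region):
    """Number of variants per isolate, grouped by region (as for a box plot)."""
    return {
        region: [len(variant_positions(reference, s)) for s in seqs]
        for region, seqs in isolates_by_region.items()
    }


def common_variants(reference, isolates_by_region, min_fraction=0.5):
    """Per region, positions where >= min_fraction of isolates carry a variant.

    min_fraction is a hypothetical cut-off, not a value from the study.
    """
    result = {}
    for region, seqs in isolates_by_region.items():
        counts = Counter()
        for seq in seqs:
            counts.update(variant_positions(reference, seq))
        result[region] = sorted(
            pos for pos, n in counts.items() if n / len(seqs) >= min_fraction
        )
    return result


ref = "ACGTACGT"
regions = {
    "South America": ["ACGTACGA", "ACGAACGA"],
    "Europe": ["ACGTACGT", "TCGTACGT", "ACGTACGT"],
}
print(variants_per_isolate(ref, regions))
print(common_variants(ref, regions))
```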
Figure 4. Principal component analysis (PCA): (A) PCA of EBV1 sequences showing PC1 and PC2, with sequences colored according to their geographic origin; (B) geographic distribution of the sequences, in which the color intensity on the heat-map scale represents the segregation potential of PC1; (C,D) quantification of the differences in PCA distribution for (C) PC1 and (D) PC2.
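PCA on a SNP matrix reduces each isolate to a few coordinates whose spread can then be compared between regions. The sketch below is only an illustration under stated assumptions: sites are encoded 0/1 for reference/alternative allele, and PC1 is obtained by power iteration on the sample covariance matrix rather than with a PCA library. The function name is hypothetical, and the sign of a principal component is arbitrary.

```python
import math
import random


def pc1_scores(matrix, iterations=200, seed=0):
    """Scores of each sample (row) on the first principal component.

    matrix: rows are isolates, columns are SNP sites encoded 0/1
    (assumed encoding). Needs at least two rows.
    """
    n, m = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in matrix]
    # Sample covariance matrix between sites (m x m).
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(m)]
        for a in range(m)
    ]
    # Power iteration converges to the leading eigenvector of cov
    # when the largest eigenvalue is unique.
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(m)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("cov @ v is zero; PC1 cannot be estimated")
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(m)) for r in centered]


snps = [
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
scores = pc1_scores(snps)
print([round(s, 3) for s in scores])
```

Scores grouped by region can then be compared, for example with box plots, as in panels C and D.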
Figure 5. Discriminant analysis of principal components (DAPC): (A) ROC curve depicting the segregation potential of the sequences shown in Table 1; (B) DAPC variance contribution of each site (SNP) along the EBV1 genome. The red horizontal line indicates the threshold (0.00066) above which structural SNPs were identified. Colored boxes delimit the four informative coding regions of the genome, and the most relevant genes within each region are indicated at the top of each box.
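The thresholding step in panel B can be illustrated as follows. This sketch assumes per-site variance contributions are already available (for example, exported from a DAPC fit) as a mapping from genome position to contribution; it applies the 0.00066 cut-off from the caption and then merges nearby hits into candidate regions using a hypothetical `max_gap`. It does not compute DAPC itself, and the input values are toy data.

```python
STRUCTURAL_THRESHOLD = 0.00066  # cut-off value given in the Figure 5 caption


def structural_snps(contributions, threshold=STRUCTURAL_THRESHOLD):
    """Sorted positions whose DAPC variance contribution exceeds the threshold.

    contributions: dict mapping genome position -> variance contribution.
    """
    return sorted(pos for pos, c in contributions.items() if c > threshold)


def group_into_regions(positions, max_gap=5000):
    """Merge sorted positions into (start, end) regions.

    Consecutive positions at most `max_gap` bp apart share a region;
    max_gap is a hypothetical parameter, not taken from the study.
    """
    regions = []
    for pos in sorted(positions):
        if regions and pos - regions[-1][1] <= max_gap:
            regions[-1][1] = pos
        else:
            regions.append([pos, pos])
    return [tuple(r) for r in regions]


toy = {1000: 0.001, 1500: 0.0002, 3000: 0.0009, 90000: 0.0012, 150000: 0.0001}
hits = structural_snps(toy)
print(hits)                      # [1000, 3000, 90000]
print(group_into_regions(hits))  # [(1000, 3000), (90000, 90000)]
```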
Figure 6. Analysis of evolutionary rates: (A) evolution rates for the 60 analyzed genes, where dots depict mean evolution rates, error bars indicate the 95% HPDI (highest posterior density interval), the red dotted line depicts the genome's mean evolution rate, and the gray horizontal stripe denotes the 95% HPDI for the genome's mean evolution rate; (B–F) comparison of the genes' evolution rates among the groups within each biological category (gene names listed in Table 1), depicted in box plots. Asterisks denote statistical significance: adjusted p-value < 0.05 (*); adjusted p-value > 0.05 (ns).
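The 95% HPDI is the narrowest interval that holds 95% of the posterior samples. A common sample-based estimator is sketched below, together with one possible p-value adjustment. The caption does not state which correction was used, so Benjamini-Hochberg here is an illustrative assumption, and both function names are hypothetical.

```python
def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the posterior samples.

    Assumes a unimodal posterior; the interval spans round(mass * n)
    consecutive sorted samples.
    """
    xs = sorted(samples)
    n = len(xs)
    k = max(1, min(n, round(mass * n)))  # samples inside the interval
    best = (xs[0], xs[k - 1])
    for i in range(1, n - k + 1):
        lo, hi = xs[i], xs[i + k - 1]
        if hi - lo < best[1] - best[0]:
            best = (lo, hi)
    return best


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(hpd_interval([0, 1, 2, 3, 4, 5, 6, 7, 8, 100], mass=0.9))  # (0, 8)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Unlike an equal-tailed interval, the shortest interval excludes the long tail (the value 100 in the toy example), which is why it suits skewed rate posteriors.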
Table 1. Groups defined by unsupervised analysis using the Gene Ontology database and groups created based on their biological significance, together with the genes belonging to each group.
| Group |
|---|
| Viral immunological escape and transcription |
| Viral infective |
| Viral replicative |
| |
| Host cell plasma membrane |
| Host cell nucleus/cytoplasm |
| Extracellular space |
| |
| Structural proteins |
| Non-structural proteins |
| |
| Enzymatic function |
| Non-enzymatic function |
| |
| DNA/RNA-binding proteins |
| Non-DNA/RNA-binding proteins |