Fanny Wegner, Florent Lassalle, Daniel P Depledge, François Balloux, Judith Breuer.
Abstract
Epstein-Barr virus (EBV) is one of the most common viruses infecting humans and persists within its host for life. EBV therefore represents an extremely successful virus that has evolved complex strategies to evade the host's innate and adaptive immune responses during both the initial and persistent stages of infection. Here, we conducted a comparative genomics analysis of 223 whole-genome sequences of EBV strains from around the world. We recover extensive genome-wide linkage disequilibrium (LD) despite pervasive genetic recombination. This pattern is explained by the global EBV population being subdivided into three main subpopulations: one found primarily in East Asia, one in Southeast Asia and Oceania, and a third comprising most of the other globally distributed genomes we analyzed. Additionally, sites in LD were overrepresented in immunogenic genes. Taken together, our results suggest that host immune selection and local adaptation to different human host populations have shaped the genome-wide patterns of genetic diversity in EBV.
Year: 2019 PMID: 31273385 PMCID: PMC6805225 DOI: 10.1093/molbev/msz152
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. (A) Heatmap of SNP pairs in significant LD. Darker colors indicate lower P values; nonsignificant pairs are shown in white. Numbers denote genome positions (sampled uniformly); rows and columns that contain no significant pair of SNPs in LD have been removed. (B) Association network of all sites in LD with at least one other site. Each node represents a biallelic site, and each link between two nodes indicates that the two sites are in LD with each other. (C) Frequency plot of all connected components in the association network.
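The caption does not state which LD test or multiple-testing correction was used. As a minimal sketch, assuming a Pearson chi-square test on the 2 × 2 allele table of two biallelic sites (equivalently n·r²) and a Bonferroni threshold, the code below shows how significant pairs become the links of an association network like panel B and how connected-component sizes for a plot like panel C can be tallied. The test, the threshold and the helper names `ld_test`, `ld_edges` and `connected_components` are illustrative assumptions, not the authors' pipeline.

```python
import math
from collections import Counter
from itertools import combinations


def ld_test(a, b):
    """Return (r^2, P value) for two biallelic sites coded 0/1 across the same haploid strains."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(x * y for x, y in zip(a, b)) / n  # frequency of the 1-1 haplotype
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0, 1.0  # a monomorphic site carries no LD signal
    r2 = (pab - pa * pb) ** 2 / denom
    chi2 = n * r2  # Pearson chi-square of the 2x2 allele table (assumed test), 1 degree of freedom
    return r2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)


def ld_edges(sites, alpha=0.05):
    """Link every pair of sites whose P value passes a Bonferroni threshold (an assumption)."""
    pairs = list(combinations(sorted(sites), 2))
    if not pairs:
        return []
    threshold = alpha / len(pairs)
    return [(i, j) for i, j in pairs if ld_test(sites[i], sites[j])[1] < threshold]


def connected_components(edges):
    """Union-find over the nodes that appear in at least one edge (sites in LD with another site)."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for v in list(parent):
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())


# Toy data: genome position -> alleles of 20 hypothetical strains.
sites = {
    100: [0] * 10 + [1] * 10,
    200: [0] * 10 + [1] * 10,  # identical to site 100: perfect LD
    300: [0, 1] * 10,          # uncorrelated with site 100
    400: [0, 1] * 10,          # identical to site 300
}
edges = ld_edges(sites)  # [(100, 200), (300, 400)]
size_counts = Counter(len(c) for c in connected_components(edges))  # data behind a size-frequency plot
```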
Fig. 2. Population assignment for all genome sequences, assuming k = 3 populations, for different subsets of sites. Each bar represents a strain that has been preassigned to “Africa,” “Asia,” or “Western” (comprising American, European, and Australian isolates). Bar colors show the proportion of the input sites assigned to each population. (A) All biallelic sites; (B) all sites in LD in the largest component; (C) nonsynonymous pairs of sites in LD; (D) synonymous pairs of sites in LD; (E) sites not in LD. Panels (B–D) refer to the subset of sites in LD that lie in the largest component of the association network (fig. 1).
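The caption does not name the clustering software. As a hedged illustration only, the sketch below fits a generic haploid admixture model by expectation-maximization (EM); it is not necessarily the authors' method. In this model `Q[i, k]` is the expected fraction of strain i's sites assigned to population k, which is one way to read the bar colors above. The function name `admixture_em` and its parameters are hypothetical.

```python
import numpy as np


def admixture_em(X, k, n_iter=200, seed=0, eps=1e-6):
    """Fit a haploid admixture model by EM (a generic sketch, not the paper's tool).

    X: (strains x sites) array of 0/1 alleles.
    Returns Q (strains x k): expected fraction of each strain's sites assigned to each population,
            P (k x sites): allele-1 frequency of each site in each population,
            history: log-likelihood at the start of every iteration (non-decreasing under EM).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    Q = rng.dirichlet(np.ones(k), size=n)   # random initial ancestry proportions
    P = rng.uniform(0.2, 0.8, size=(k, m))  # random initial allele frequencies
    history = []
    for _ in range(n_iter):
        # E-step: posterior probability that allele (i, j) came from population k
        like = np.where(X[:, None, :] == 1, P[None, :, :], 1.0 - P[None, :, :])  # (n, k, m)
        joint = Q[:, :, None] * like
        mix = joint.sum(axis=1, keepdims=True)                                  # (n, 1, m)
        history.append(float(np.log(mix).sum()))
        z = joint / mix
        # M-step: closed-form updates; P is kept inside (eps, 1 - eps) to avoid log(0)
        Q = z.mean(axis=2)
        P = np.clip((z * X[:, None, :]).sum(axis=0) / z.sum(axis=0), eps, 1 - eps)
    return Q, P, history


# Toy data: 12 hypothetical strains, 40 biallelic sites, k = 3 as in the figure.
data_rng = np.random.default_rng(1)
X = data_rng.integers(0, 2, size=(12, 40))
Q, P, history = admixture_em(X, k=3, n_iter=50)
```

Each row of `Q` sums to 1 and could be drawn as one stacked bar. Like other admixture fits, EM can stop at a local optimum, so in practice several seeds would be compared by their final log-likelihood.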
Fig. 3. (A) Proportion of pairs of nonsynonymous sites in LD with each other, as a function of LD strength. (B–C) Number of links between nonsynonymous sites in different categories of genes: (B) all sites (chi-square test, P < 2.2e-16); (C) sites at least 1 kb apart (chi-square test, P < 2.2e-16).
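The caption reports chi-square tests but not the null model behind them. A minimal sketch, assuming links fall on eligible site pairs uniformly at random and that sites carry hypothetical labels "IG" and "NIG", compares observed link counts per category pair with their expectation; `min_distance=1000` mimics the 1 kb filter of panel C. The function `category_link_test` is illustrative, not the authors' code.

```python
import math
from collections import Counter
from itertools import combinations


def category_link_test(site_category, links, min_distance=0):
    """Chi-square goodness-of-fit test of LD links across gene-category pairs.

    site_category: genome position -> "IG" or "NIG" (hypothetical labels)
    links: (pos_a, pos_b) pairs of sites in LD
    min_distance: ignore site pairs closer than this many bp (e.g. 1000 for a 1 kb filter)
    Null model (an assumption): links fall on eligible site pairs uniformly at random.
    """
    def pair_type(a, b):
        return tuple(sorted((site_category[a], site_category[b])))

    kept = [(a, b) for a, b in links if abs(a - b) >= min_distance]
    possible = Counter(
        pair_type(a, b)
        for a, b in combinations(sorted(site_category), 2)
        if abs(a - b) >= min_distance
    )
    if not kept or len(possible) != 3:
        raise ValueError("sketch assumes surviving links and all three category pairs")
    observed = Counter(pair_type(a, b) for a, b in kept)
    total = sum(possible.values())
    expected = {t: len(kept) * n / total for t, n in possible.items()}
    chi2 = sum((observed[t] - e) ** 2 / e for t, e in expected.items())
    return dict(observed), chi2, math.exp(-chi2 / 2)  # chi-square(2) upper tail is exp(-x/2)


# Toy example: every link joins two IG sites, far more than random pairing predicts.
categories = {0: "IG", 100: "IG", 200: "IG", 5000: "NIG", 6000: "NIG", 7000: "NIG"}
links = [(0, 100), (0, 200), (100, 200)]
observed, chi2, p = category_link_test(categories, links)  # chi2 is 12.0 (up to rounding)
```

With three pair types and expected proportions fixed by the null rather than estimated, the test has 3 − 1 = 2 degrees of freedom, which is why the P value reduces to `exp(-chi2 / 2)`.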
List of 33 Genes Present in the Gene Network Considered to Code for Immunogenic Proteins.
| Protein | ORF | Number of Epitopes |
|---|---|---|
| Major DNA-binding protein |  | 2 |
| Tripartite terminase subunit UL28 homolog |  | 1 |
| Envelope glycoprotein B |  | 16 |
| DNA polymerase catalytic subunit |  | 1 |
| Ribonucleoside-diphosphate reductase small chain |  | 1 |
| Portal protein UL6 homolog |  | 1 |
| Major capsid protein |  | 3 |
| Triplex capsid protein VP23 homolog |  | 1 |
| Capsid protein VP26 |  | 13 |
| Protein BGLF3 |  | 1 |
| Apoptosis regulator BHRF1 |  | 3 |
| Envelope glycoprotein GP350 |  | 2 |
| Deoxyuridine 5′-triphosphate nucleotidohydrolase |  | 3 |
| DNA polymerase processivity factor BMRF1 |  | 8 |
| Protein BMRF2 |  | 1 |
| Major tegument protein |  | 4 |
| Protein BOLF1 |  | 2 |
| Ribonucleoside-diphosphate reductase large subunit |  | 1 |
| Replication and transcription activator |  | 8 |
| Tegument protein BRRF2 |  | 2 |
| DNA primase |  | 1 |
| Envelope glycoprotein H |  | 11 |
| Transactivator protein BZLF1 |  | 24 |
| Epstein–Barr nuclear antigen 1 |  | 82 |
| Epstein–Barr nuclear antigen 2 |  | 12 |
| Epstein–Barr nuclear antigen 3 |  | 30 |
| Epstein–Barr nuclear antigen 4 |  | 23 |
| Epstein–Barr nuclear antigen 6 |  | 33 |
| Protein LF2 |  | 1 |
| Uncharacterized protein LF3 |  | 1 |
| Latent membrane protein 1 |  | 17 |
| Latent membrane protein 2 |  | 32 |
Note.—Each epitope must have at least two references listed in IEDB.
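The two-reference criterion in the note can be expressed as a simple filter. This is a minimal sketch assuming epitope records have already been exported as hypothetical `(protein, epitope_id, reference_id)` tuples; it does not reflect the actual IEDB export format or API.

```python
from collections import defaultdict


def count_supported_epitopes(records, min_refs=2):
    """Count, per protein, the epitopes backed by at least `min_refs` distinct references.

    records: iterable of (protein, epitope_id, reference_id) tuples (hypothetical format).
    """
    refs = defaultdict(set)
    for protein, epitope, reference in records:
        refs[(protein, epitope)].add(reference)  # duplicate citations of one reference count once
    counts = defaultdict(int)
    for (protein, _), reference_set in refs.items():
        if len(reference_set) >= min_refs:
            counts[protein] += 1
    return dict(counts)


records = [
    ("LMP1", "e1", "r1"), ("LMP1", "e1", "r2"),  # two references: counted
    ("LMP1", "e2", "r1"),                        # one reference: excluded
    ("EBNA1", "e3", "r1"), ("EBNA1", "e3", "r3"), ("EBNA1", "e3", "r3"),
]
epitope_counts = count_supported_epitopes(records)  # {"LMP1": 1, "EBNA1": 1}
```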
Fig. 4. Whole-gene network, colored by eigenvector centrality, with warm colors indicating higher scores and cool colors lower scores. Square nodes denote genes belonging to IG (immunogenic genes); circular nodes denote genes belonging to NIG (nonimmunogenic genes).
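Eigenvector centrality scores a node highly when it is linked to other highly scored nodes. The caption does not say how the scores were computed or how the gene network was built; the sketch below assumes nodes are genes and edges join genes that share sites in LD, and computes the scores by power iteration. The function name is hypothetical; `networkx.eigenvector_centrality` is an off-the-shelf alternative.

```python
import math


def eigenvector_centrality(adjacency, max_iter=1000, tol=1e-12):
    """Power iteration for eigenvector centrality of a connected, undirected graph.

    adjacency: node -> set of neighbouring nodes (assumed: genes linked by shared LD).
    Iterates with (A + I) instead of A: the eigenvectors are the same, but the shift
    prevents the oscillation plain power iteration shows on bipartite graphs.
    """
    x = {v: 1.0 for v in adjacency}
    for _ in range(max_iter):
        new = {v: x[v] + sum(x[u] for u in adjacency[v]) for v in adjacency}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}  # unit Euclidean norm
        if sum(abs(new[v] - x[v]) for v in adjacency) < tol:
            return new
        x = new
    raise RuntimeError("power iteration did not converge")


# Toy star graph: a hypothetical hub gene shares LD links with four other genes.
star = {"hub": {"g1", "g2", "g3", "g4"}, "g1": {"hub"}, "g2": {"hub"}, "g3": {"hub"}, "g4": {"hub"}}
scores = eigenvector_centrality(star)
ranking = sorted(scores, key=scores.get, reverse=True)  # "hub" ranks first
```

For the star, the hub's score is twice each leaf's score (about 0.707 versus 0.354), the eigenvector of the adjacency matrix's largest eigenvalue, 2.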
Most Influential Nodes in the Network.
| Eigenvector Centrality Rank | ORF | Protein | IG |
|---|---|---|---|
| 1 |  | Large tegument protein deneddylase | ○ |
| 2 |  | TBP-like protein |  |
| 3 |  | DNA replication helicase |  |
| 4 |  | Epstein–Barr nuclear antigen 4 | ● |
| 5 |  | Envelope glycoprotein GP350 | ● |
| 6 |  | Major tegument protein | ● |
| 7 |  | Epstein–Barr nuclear antigen 1 | ● |
| 8 |  | Tripartite terminase subunit 1 |  |
| 9 |  | Capsid vertex component 1 | ○ |
| 10 |  | Tegument protein |  |
| 11 |  | Epstein–Barr nuclear antigen 3 | ● |
| 12 |  | Latent membrane protein 1 | ● |
| 13 |  | Epstein–Barr nuclear antigen 6 | ● |
| 14 |  | Protein BOLF1 | ● |
| 15 |  | Tegument protein | ○ |
| 16 |  | Major DNA-binding protein |  |
| 17 |  | BDLF3 (Glycoprotein) |  |
| 18 |  | DNA primase |  |
| 19 |  | Capsid vertex component 2 | ○ |
| 20 |  | Replication and transcription activator | ● |
| 21 |  | Envelope glycoprotein H | ● |
| 22 |  | Uncharacterized protein |  |
| 23 |  | Uncharacterized protein |  |
| 24 |  | DNA helicase/primase complex-associated protein | ○ |
| 25 |  | Envelope glycoprotein B | ● |
Note.—Circles in the IG column mark proteins for which an immune response has been reported: filled circles mark proteins that meet the criterion of at least two references, and empty circles those with fewer than two.
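The table combines two inputs: a centrality ranking and a per-protein count of references reporting an immune response. A minimal sketch of that assembly, assuming both inputs are already available as dictionaries (the function name, argument names and gene names are hypothetical):

```python
def ranked_rows(centrality, reference_counts, top=25):
    """Build (rank, gene, IG mark) rows from centrality scores.

    centrality: gene -> eigenvector centrality score
    reference_counts: gene -> number of references reporting an immune response (hypothetical input)
    """
    def ig_mark(n_refs):
        if n_refs >= 2:
            return "●"  # response reported, meets the two-reference criterion
        if n_refs == 1:
            return "○"  # response reported, but by fewer than two references
        return ""       # no immune response reported

    ranked = sorted(centrality, key=centrality.get, reverse=True)[:top]
    return [(rank, gene, ig_mark(reference_counts.get(gene, 0)))
            for rank, gene in enumerate(ranked, start=1)]


rows = ranked_rows({"geneA": 0.12, "geneB": 0.48, "geneC": 0.31}, {"geneB": 3, "geneC": 1})
```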