Daniel P Depledge, Eleanor R Gray, Samit Kundu, Samantha Cooray, Anja Poulsen, Peter Aaby, Judith Breuer.
Abstract
Varicella-zoster virus (VZV), a double-stranded DNA alphaherpesvirus, is associated with seasonal outbreaks of varicella in nonimmunized populations. Little is known about whether these outbreaks are associated with a single or multiple viral genotypes and whether new mutations rapidly accumulate during transmission. Here, we take advantage of a well-characterized population cohort in Guinea-Bissau and produce a unique set of 23 full-length genome sequences, collected over 7 months from eight households. Comparative sequence analysis reveals that four distinct genotypes cocirculated among the population, three of which were present during the first week of the outbreak, although no patients were coinfected, which indicates that exposure to infectious virus from multiple sources is common during VZV outbreaks. Transmission of VZV was associated with length polymorphisms in the R1 repeat region and the origin of DNA replication. In two cases, these were associated with the formation of distinct lineages and point to the possible coevolution of these loci, despite the lack of any known functional link in VZV or related herpesviruses. We show that these and all other sequenced clade 5 viruses possess a distinct R1 repeat motif that increases the acidity of an ORF11p protein domain and postulate that this has either arisen or been lost following divergence of the major clades. Thus, sequencing of whole VZV genomes collected during an outbreak has provided novel insights into VZV biology, transmission patterns, and (recent) natural history.
IMPORTANCE: VZV is a highly infectious virus and the causative agent of chickenpox and shingles, the latter being particularly associated with the risk of painful complications. Seasonal outbreaks of chickenpox are very common among young children, yet little is known about the dynamics of the virus during person-to-person transmission or whether multiple distinct viruses seed and/or cocirculate during an outbreak. In this study, we have sequenced chickenpox viruses from an outbreak in Guinea-Bissau that are supported by detailed epidemiological data. Our data show that multiple different virus strains seeded and were maintained throughout the 6-month outbreak period and that viruses transmitted between individuals accumulated new mutations in specific genomic regions. Of particular interest is the potential coevolution of two distinct parts of the genomes and our calculations of the rate of viral mutation, both of which increase our understanding of how VZV evolves over short periods of time in human populations.
Year: 2014 PMID: 25275123 PMCID: PMC4249134 DOI: 10.1128/JVI.02337-14
Source DB: PubMed Journal: J Virol ISSN: 0022-538X Impact factor: 5.103
Overview of sample collection, household association, and putative transmission chains
| Sample | Household | Age (yr) | Date of infection (day/mo) | Role(s) in putative infection routes (A to C) | Accession no. |
|---|---|---|---|---|---|
| Bandim 1 | 1 | 7 | 7/2 | Index | KM355696 |
| Bandim 2 | 1 | 3 | 28/2 | Recipient | KM355697 |
| Bandim 3 | 2 | 9 | 15/1 | Index | KM355698 |
| Bandim 4 | 2 | 2 | 22/1 | Recipient; Index | KM355699 |
| Bandim 5 | 2 | 2 | 1/2 | Recipient; Recipient | KM355700 |
| Bandim 6 | 2 | 12 | 7/2 | Recipient | KM355701 |
| Bandim 7 | 3 | 2 | 12/2 | Recipient; Index | KM355702 |
| Bandim 8 | 3 | 6 | 24/1 | Index; Index | KM355703 |
| Bandim 10 | 3 | NA | 1/3 | Recipient | KM355704 |
| Bandim 11 | 4 | 6 | 4/5 | Index | KM355705 |
| Bandim 12 | 4 | 2 | 21/5 | Recipient | KM355706 |
| Bandim 13 | 5 | 44 | 5/6 | Recipient | KM355707 |
| Bandim 14 | 5 | 1 | 14/5 | Index | KM355708 |
| Bandim 15 | 6 | 12 | 31/5 | Index | KM355709 |
| Bandim 16 | 6 | 4 | 11/6 | Recipient; Index | KM355710 |
| Bandim 17 | 6 | 10 | 18/6 | Recipient; Recipient | KM355711 |
| Bandim 18 | 6 | 4 | 18/6 | Recipient; Recipient | KM355712 |
| Bandim 19 | 7 | 8 | 17/1 | Index | KM355713 |
| Bandim 20 | 7 | NA | 17/1 | Index | KM355714 |
| Bandim 21 | 7 | NA | 30/1 | Recipient; Recipient; Index | KM355715 |
| Bandim 22 | 7 | NA | 15/2 | Recipient | KM355716 |
| Bandim 23 | 8 | 18 | 5/3 | Index | KM355717 |
| Bandim 24 | 8 | 1 | 21/3 | Recipient | KM355718 |
Up to three putative infection routes could be established per household given a transmission period of between 7 and 21 days after the onset of the first symptoms.
NA, not available.
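The 7-to-21-day window described in the footnote can be checked for any index and recipient pair. The following is a minimal sketch, not the authors' procedure. It assumes that the table dates can be treated as symptom-onset dates, that all dates fall in 2001 (the outbreak year), and that the window is inclusive at both ends; `parse_day_month` and `within_window` are hypothetical helper names.

```python
from datetime import date

# Assumption: every date in the table falls in 2001, the outbreak year.
YEAR = 2001


def parse_day_month(text: str) -> date:
    """Parse a 'day/month' string such as '28/2' into a date in YEAR."""
    day, month = (int(part) for part in text.split("/"))
    return date(YEAR, month, day)


def within_window(index_onset: str, recipient_onset: str,
                  min_days: int = 7, max_days: int = 21) -> bool:
    """Return True if the recipient date is 7 to 21 days after the index date.

    Assumption: the window is inclusive at both ends.
    """
    gap = (parse_day_month(recipient_onset) - parse_day_month(index_onset)).days
    return min_days <= gap <= max_days


# Household 1: Bandim 1 (7/2, index) and Bandim 2 (28/2, recipient).
print(within_window("7/2", "28/2"))
# Household 2: Bandim 3 (15/1) and Bandim 6 (7/2) are 23 days apart.
print(within_window("15/1", "7/2"))
```

Under these assumptions the first pair falls inside the window and the second falls outside it, which would be consistent with Bandim 6 belonging to a later route within household 2.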
FIG 1 Geographic and temporal distribution of Bandim samples. Sampling of patients from eight households was performed within a 4-km² area of the Bandim township during the start, middle, and end of a varicella outbreak. (A) Period of symptomatic infection for each patient. Data are colored according to the lineages described in Fig. 2 and 3. (B) Location of each sample household during the months in which sampling took place; again, the data are colored according to lineage. Bisected circles indicate multiple lineages within a single household (e.g., household 1 during February).
FIG 2 Neighbor-joined phylogeny identified two major clade 5 genotypes in Bandim samples. A neighbor-joined phylogeny (500 bootstraps) comprising all publicly available VZV genomes with the genomes presented here shows clear segregation of the major geographical clades, while the primary genogroups 5A and 5B segregate within clade 5. Bootstrap scores are not given where bootstrap values fall below 0.9. Well-supported nodes (>0.9) are indicated by an asterisk.
Sequencing metrics for Bandim sample collection
| Sample | No. of paired reads (total) | No. of paired reads (mapping) | %OTR | MRD | %COV (>1×) | %COV (>100×) | OriS motif (Sanger) | OriS motif (NGS) |
|---|---|---|---|---|---|---|---|---|
| Bandim 1 | 449,170 | 220,828 | 52.30 | 307 | 99.99 | 80.44 | 8-15 | 8-15 |
| Bandim 2 | 479,705 | 425,669 | 91.96 | 583 | 99.99 | 99.21 | 8-15 | 8-15 |
| Bandim 3 | 494,775 | 423,408 | 88.17 | 582 | 99.99 | 90.01 | 10-8 | 10-8 |
| Bandim 4 | 477,905 | 431,040 | 92.53 | 589 | 99.99 | 96.83 | 10-8 | 10-8 |
| Bandim 5 | 496,764 | 425,040 | 87.86 | 583 | 99.99 | 94.38 | 10-8 | 10-8 |
| Bandim 6 | 605,683 | 424,736 | 73.13 | 577 | 99.99 | 93.99 | 9-12 | 9-12 |
| Bandim 7 | 618,631 | 403,137 | 68.71 | 549 | 99.99 | 98.33 | 9-8 | 9-8 |
| Bandim 8 | 622,270 | 480,279 | 79.65 | 659 | 99.99 | 94.12 | ||
| Bandim 10 | 466,700 | 398,238 | 87.60 | 550 | 99.99 | 94.85 | 8-15 | 8-15 |
| Bandim 11 | 508,867 | 453,839 | 91.97 | 623 | 99.99 | 99.38 | ||
| Bandim 12 | 458,520 | 234,607 | 56.08 | 324 | 99.99 | 88.82 | 10-8 | 10-8 |
| Bandim 13 | 1,198,584 | 946,813 | 80.84 | 1,955 | 99.99 | 97.47 | 9-15 | 9-15 |
| Bandim 14 | 1,227,196 | 949,880 | 79.64 | 1,972 | 99.99 | 98.27 | 9-16 | 9-16 |
| Bandim 15 | 1,410,853 | 1,209,428 | 87.69 | 2,471 | 99.99 | 97.02 | 9-9 | 9-9 |
| Bandim 16 | 1,386,654 | 694,294 | 55.87 | 1,203 | 99.99 | 95.94 | 9-9 | 9-9 |
| Bandim 17 | 891,210 | 619,377 | 74.58 | 848 | 99.99 | 95.07 | 9-9 | 9-9 |
| Bandim 18 | 1,997,274 | 1,737,784 | 89.30 | 3,578 | 99.99 | 99.74 | 9-16 | 9-16 |
| Bandim 19 | 1,714,100 | 1,479,316 | 88.37 | 3,074 | 99.99 | 98.74 | 10-8 | 10-8 |
| Bandim 20 | 1,757,018 | 1,446,020 | 84.64 | 2,990 | 99.99 | 97.73 | 10-8 | 10-8 |
| Bandim 21 | 845,990 | 745,560 | 89.61 | 1,543 | 99.99 | 97.89 | 10-8 | 10-8 |
| Bandim 22 | 1,562,621 | 1,284,750 | 84.50 | 2,678 | 99.99 | 96.31 | 10-8 | 10-8 |
| Bandim 23 | 1,428,708 | 1,213,904 | 86.75 | 2,444 | 99.99 | 99.69 | 8-14 | 8-14 |
| Bandim 24 | 1,550,409 | 1,356,234 | 89.19 | 2,817 | 99.99 | 98.78 | 8-14 | 8-14 |
%OTR, on-target read percentage (i.e., the percentage of VZV mapping reads).
MRD, mean read depth per base.
%COV, percentage of genome covered at defined read depth.
OriS motif: the first number indicates the number of TA repeats, and the second number indicates the number of GA repeats. Boldfacing indicates where a discrepancy exists between Sanger and NGS data.
FIG 3 Phylogenies generated from full genome sequences and repeat region sequences only for Bandim sample collection. The neighbor-joined phylogeny comprising all Bandim samples (derived from Fig. 2) with asterisks denoting bootstrap support >0.9 (A) and a UPGMA (unweighted pair-group method with arithmetic averages) phylogeny (constructed from a distance matrix calculated from the repeat data where the variable is the number of repeat units) for the same sample collection using just the R1-R5 repeat region sequences (B) are shown. Repeat region patterns are identified by color and correspond to data shown in Table S1 in the supplemental material. (C) Phylogenetic network reveals multiple genotypes. Two primary genogroups are present in the Guinea-Bissau data set, while Bandim 6 and Bandim 13/14 are also considered different lineages. Nodes are colored according to the lineage (gray nodes are median joins), labeled according to the sample, and “sized” according to the number of identical genomes (not including variation in repeat regions). The numbers of SNP differences between consensus sequences are labeled along the branches (i.e., branch lengths are not scaled to number of changes).
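The UPGMA construction mentioned in panel B can be illustrated with a small sketch. This is not the authors' pipeline. The R1-R5 repeat-unit counts live in Table S1 and are not reproduced here, so the OriS (TA, GA) repeat counts of five samples from the sequencing-metrics table stand in as illustrative input. The Manhattan distance between count vectors and the helper names `manhattan` and `upgma` are assumptions. The function returns only the tree topology as a nested-parenthesis string, without branch lengths.

```python
from itertools import combinations

# Illustrative input only: OriS (TA, GA) repeat counts for five samples,
# standing in for the R1-R5 repeat-unit counts reported in Table S1.
REPEATS = {
    "Bandim 1": (8, 15),
    "Bandim 3": (10, 8),
    "Bandim 6": (9, 12),
    "Bandim 15": (9, 9),
    "Bandim 23": (8, 14),
}


def manhattan(a, b):
    """Assumed distance: summed absolute difference in repeat-unit counts."""
    return sum(abs(x - y) for x, y in zip(a, b))


def upgma(repeats):
    """Cluster samples by UPGMA and return the topology as nested parentheses."""
    # Pairwise distance matrix between leaves, keyed by unordered pair.
    dist = {frozenset((a, b)): manhattan(repeats[a], repeats[b])
            for a, b in combinations(repeats, 2)}
    # Each cluster is keyed by the tuple of its leaves and maps to its subtree.
    clusters = {(name,): name for name in repeats}

    def average_distance(c1, c2):
        # UPGMA: mean of all leaf-to-leaf distances between the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Join the pair of clusters with the smallest average distance.
        c1, c2 = min(combinations(sorted(clusters), 2),
                     key=lambda pair: average_distance(*pair))
        subtree = f"({clusters.pop(c1)},{clusters.pop(c2)})"
        clusters[c1 + c2] = subtree
    return next(iter(clusters.values()))


print(upgma(REPEATS))
```

With this stand-in input, samples sharing similar OriS motifs (for example Bandim 1 and Bandim 23) are joined first. A real analysis would use the full R1-R5 count vectors for all 23 samples.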
FIG 4 Estimates of VZV evolutionary rates. The evolutionary rate of VZV (mean and 95% highest posterior density [HPD]) is estimated from the true dates of sampling (highlighted in red) and from the shuffled dates of isolation (in black). The mean and 95% HPD interval are shown for each condition (the true and shuffled analyses). The results from either all 23 VZV whole viral genomes (a) or genogroup 5B alone (b) show that the estimated rate from both data sets overlaps with the 95% HPD intervals of the shuffled runs, indicating that there is little evidence of temporal structure.
Substitution rates and estimates of time since the lineages diverged: clock model, mean evolutionary rate, and 95% highest posterior density interval
| Genogroup(s) | Clock model | Mean evolutionary rate (per site/yr) | 95% HPD lower | 95% HPD upper |
|---|---|---|---|---|
| 5B and 5A | Strict | 1.82 × 10⁻⁵ | 5.45 × 10⁻⁷ | 3.85 × 10⁻⁵ |
| 5B and 5A | Relaxed lognormal | 2.19 × 10⁻⁵ | 1.66 × 10⁻⁷ | 5.08 × 10⁻⁵ |
| 5B and 5A | Relaxed exponential | 2.65 × 10⁻⁵ | 1.05 × 10⁻⁷ | 5.81 × 10⁻⁵ |
| 5B only | Strict | 5.91 × 10⁻⁵ | 3.86 × 10⁻¹⁰ | 2.05 × 10⁻⁴ |
| 5B only | Relaxed lognormal | 5.79 × 10⁻⁵ | 4.76 × 10⁻⁹ | 1.43 × 10⁻⁴ |
| 5B only | Relaxed exponential | 6.65 × 10⁻⁵ | 1.17 × 10⁻⁷ | 1.64 × 10⁻⁴ |
HPD, highest posterior density.
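For intuition, a per-site rate can be converted into an expected number of substitutions per genome per year by multiplying by the genome length. The sketch below uses the mean rates from the table and assumes a genome length of 124,884 bp, the length of the Dumas reference (NC_001348). That figure is not stated in this record, and `substitutions_per_genome_per_year` is a hypothetical helper name.

```python
# Assumption: genome length of the Dumas reference (NC_001348); not given in the table.
GENOME_LENGTH_BP = 124_884

# Mean evolutionary rates (substitutions per site per year) from the table.
MEAN_RATES = {
    ("5B and 5A", "strict"): 1.82e-5,
    ("5B and 5A", "relaxed lognormal"): 2.19e-5,
    ("5B and 5A", "relaxed exponential"): 2.65e-5,
    ("5B only", "strict"): 5.91e-5,
    ("5B only", "relaxed lognormal"): 5.79e-5,
    ("5B only", "relaxed exponential"): 6.65e-5,
}


def substitutions_per_genome_per_year(rate_per_site: float,
                                      genome_length: int = GENOME_LENGTH_BP) -> float:
    """Scale a per-site yearly rate to the whole genome."""
    return rate_per_site * genome_length


for (genogroups, clock), rate in MEAN_RATES.items():
    per_genome = substitutions_per_genome_per_year(rate)
    print(f"{genogroups:10s} {clock:20s} {per_genome:.2f}")
```

Under these assumptions the strict-clock estimate for both genogroups corresponds to roughly two substitutions per genome per year. Given the width of the HPD intervals, this is best read as an order-of-magnitude figure.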
Substitution rates and estimates of time since the lineages diverged: segregation, years since divergence, and 95% confidence interval
| Genogroup rate used | Segregation | Time (yr) since divergence | 95% CI lower | 95% CI upper |
|---|---|---|---|---|
| 5B and 5A | 5A and 5B | 7.14 | 1.71 | 28.36 |
| 5B and 5A | 5B and Bandim 6 | 3.47 | 0.86 | 13.70 |
| 5B and 5A | 5B and Bandim 13/14 | 2.00 | 0.53 | 7.91 |
| 5B only | 5A and 5B | 2.29 | 1.68 | 2.95 |
| 5B only | 5B and Bandim 6 | 1.16 | 0.79 | 1.61 |
| 5B only | 5B and Bandim 13/14 | 0.81 | 0.56 | 1.12 |
| Firth et al. | 5A and 5B | 30.37 | 20.85 | 41.05 |
| Firth et al. | 5B and Bandim 6 | 14.21 | 8.26 | 21.24 |
| Firth et al. | 5B and Bandim 13/14 | 8.03 | 3.87 | 12.93 |
Time since divergence is the time to the most contemporaneous tip (18 June 2001).
CI, confidence interval.
FIG 5 Forty-four SNPs segregate VZV genomes isolated during the 2001 outbreak. Black boxes indicate SNP differences from the Dumas reference genome (while empty boxes indicate agreement). The positions identified (nucleotide base and codon) are equivalent to Dumas (NC_001348). Nonsynonymous changes are highlighted in gray in the upper part of the table, while the repeat regions R1, R2, R4, R5, and the OriS are highlighted in purple. Red and blue shading of the repeat regions indicate the relative conservation of the repeat region sequences in each of the genogroups (e.g., R5 differs between genogroups but is perfectly conserved within each genogroup) and correlates with data shown in Fig. 3. Putative transmission chains are grouped (based on geographic information) in the first column with the month during which the sample was isolated shown in the second column. Repeat regions that could not be amplified are indicated by an “X” (note that R3 repeat regions are not included). nc, noncoding region; *, stop codon.
R1 repeat regions differ between clade 5 genomes and all other genomes
| GenBank accession no./sequence identification | Clade | R1 motif | Length (aa) |
|---|---|---|---|
| 1 | [αββ][αββ][αββ][αβ][αβ][αβ] | 81 | |
| 1 | [αββ][αββ][αββ][αβ][αββ1][αβ] | 86 | |
| 1 | [αββ][αββ][αββ][αββ][αβ][αβ] | 101 | |
| 1 | [αββ][αββ][αββ][αβ][βββ][αβδ][αβ] | 86 | |
| 1 | [αββ][αββ][αββ][αβδ][αβ][αβ] | 97 | |
| 1 | [αββ][αββ][αββ][αβδ][αβ][αβ][αβ] | 97 | |
| 1 | [αββ][αββ][αββ][αβδ][αββ][αββ][αβ] | 107 | |
| 1 | [αββ][αββ][αββ][βββ][αβδ][αβ] | 90 | |
| 1 | [αββ][αββ][αβδ][αβ][αβ][αβ] | 81 | |
| 1 | [αβ][βββ][αββ][αββ][αβ][α1βδ][αβ][αβ] | 107 | |
| 2 | [αββ][αββ][αββ][αββ][αββ][αβ] | 91 | |
| 2 | [αββ][αββ][αββ][δ][αβ][αβ] | 75 | |
| 2 | [αββ][αββ][αββ][δ][αβ][αβ][αβ] | 86 | |
| 3 | [αββ][αβ][ββ1β][αββ1][αβ] | 69 | |
| 3 | [αβδ3][αββ][αβ][αββ][αβδ3][αβ] | 86 | |
| 3 | [αβδ3][αδ2β][αββ][αββ][αβ] | 75 | |
| 3 | [αδ2β][αβ][αββ][αβ][αβ] | 65 | |
| 4 | [αββ][αββ][αββ][δ][αβ][αβ][αβ] | 86 | |
| 4 | [αββ][αβ][ββδ][αβ][ββδ][αβ][αβ][αβ] | 101 | |
| Bandim 15/16/17 | 5 | [αβδ1][αββ][αββ][αβ][εβ][εβ][εβ][εβ][εβ][εβ][εβ][εβ][εβ] | 122 |
| Bandim 18 | 5 | [αβδ1][αδ2β][αββ][αβ][εβ][εβ][εβ] | 80 |
| Bandim 3/4/5/19/20/21/22/11/12/7 | 5 | [αβδ1][αδ2β][αββ][αβ][εβ][εβ][εβ][εβ][εβ][εβ][εβ][εβ] | 115 |
| Bandim 8/10/1/2/23/24 | 5 | [αβδ1][αδ2β][αββ][αββ][εβ][εβ] | 78 |
| Bandim 6 | 5 | [αβδ1][αδ2β][αββ][αββ][εβ][εβ][εβ][εβ][εβ][εβ][εβ][εβ] | 120 |
| Bandim 13/14 | 5 | [αβδ1][αδ2δ2β][αββ][αβ][εβ][εβ][εβ][εβ] | 92 |
| | 5 | [αβδ3][αββ][αββ][αβ][βββ][αβ][εβ][εβ] | 99 |
| | 5 | [αβδ3][αδ2β][αββ][αββ][αββ][εβ] | 87 |
| | 5 | [αβδ3][αδ2β][αββ][αβ][βββ][αββ1][εβ] | 97 |
| | 5 | [αβδ3][αδ2β][αββ][αβ][βββ][αββ2][εβ] | 97 |
α, DAIDDE; β, GEAEE; δ, DAAEE; δ1, GETEE; δ2, GDAEE; ε, DE.
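The motif notation can be expanded into an amino acid sequence using the legend, which makes it possible to check the tabulated lengths and to quantify how the ε (DE) units change the proportion of acidic residues. The following is a minimal sketch using only the units defined in the legend. The names `expand_motif` and `acidic_fraction` are hypothetical. Units that appear in the table but not in the legend (for example δ3 or β1) are deliberately left unsupported and raise a `KeyError`.

```python
import re

# Repeat units exactly as given in the table legend.
UNITS = {
    "α": "DAIDDE",
    "β": "GEAEE",
    "δ": "DAAEE",
    "δ1": "GETEE",
    "δ2": "GDAEE",
    "ε": "DE",
}

# A unit symbol is one Greek letter optionally followed by a digit (e.g. δ2).
TOKEN = re.compile(r"[αβδε]\d?")


def expand_motif(motif: str) -> str:
    """Expand a bracketed motif such as '[αββ][εβ]' into an amino acid string."""
    return "".join(UNITS[symbol] for symbol in TOKEN.findall(motif))


def acidic_fraction(sequence: str) -> float:
    """Fraction of residues that are aspartate (D) or glutamate (E)."""
    return sum(residue in "DE" for residue in sequence) / len(sequence)


# R1 motif reported for Bandim 8/10/1/2/23/24 (length 78 aa in the table).
bandim_8 = expand_motif("[αβδ1][αδ2β][αββ][αββ][εβ][εβ]")
print(len(bandim_8), round(acidic_fraction(bandim_8), 3))

# Compare a single [εβ] block with a single [αβ] block.
print(round(acidic_fraction(expand_motif("[εβ]")), 3))
print(round(acidic_fraction(expand_motif("[αβ]")), 3))
```

Comparing the acidic fraction of an [εβ] block with that of an [αβ] block gives a simple, sequence-level view of why the clade 5 motif is described as more acidic.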
FIG 6 The major circulating genogroups and lineages diverged prior to the 2001 outbreak. Dated Bayesian phylogenetic trees showing the temporal relationships between VZVs sampled during the outbreak in Guinea-Bissau. Nodes with a posterior probability of >0.75 are indicated by asterisks, and the scale bar indicates time in years (i.e., since the date of the most contemporaneous sample, 18 June 2001). Genogroup 5A is highlighted in red, and genogroup 5B is highlighted in blue. Dates were inferred with a strict clock (A), the rate fixed to that estimated from a strict clock analysis of just the genogroup 5B samples (B), and the rate fixed to that estimated from a previous study (26) (C).