Florent Lassalle, Daniel P Depledge, Matthew B Reeves, Amanda C Brown, Mette T Christiansen, Helena J Tutill, Rachel J Williams, Katja Einer-Jensen, Jolyon Holdstock, Claire Atkinson, Julianne R Brown, Freek B van Loenen, Duncan A Clark, Paul D Griffiths, Georges M G M Verjans, Martin Schutten, Richard S B Milne, Francois Balloux, Judith Breuer.
Abstract
Human cytomegalovirus (HCMV) infects most of the population worldwide, persisting throughout the host's life in a latent state with periodic episodes of reactivation. While typically asymptomatic, HCMV can cause fatal disease among congenitally infected infants and immunocompromised patients. These clinical issues are compounded by the emergence of antiviral resistance and the absence of an effective vaccine, the development of which is likely complicated by the numerous immune evasins encoded by HCMV to counter the host's adaptive immune responses, a feature that facilitates frequent super-infections. Understanding the evolutionary dynamics of HCMV is essential for the development of effective new drugs and vaccines. By comparing viral genomes from uncultivated or low-passaged clinical samples of diverse origins, we observe evidence of frequent homologous recombination events, both recent and ancient, and no structure of HCMV genetic diversity at the whole-genome scale. Analysis of individual gene-scale loci reveals a striking dichotomy: while most of the genome is highly conserved, recombines essentially freely and has evolved under purifying selection, 21 genes display extreme diversity, structured into distinct genotypes that do not recombine with each other. Most of these hyper-variable genes encode glycoproteins involved in cell entry or escape of host immunity. Evidence that half of them have diverged through episodes of intense positive selection suggests that rapid evolution of hyper-variable loci is likely driven by interactions with host immunity. It appears that this process is enabled by recombination unlinking hyper-variable loci from strongly constrained neighboring sites. 
It is conceivable that viral mechanisms facilitating super-infection have evolved to promote recombination between diverged genotypes, allowing the virus to continuously diversify at key loci to escape immune detection, while maintaining a genome optimally adapted to its asymptomatic infectious lifecycle.
Keywords: CMV; immune evasion; recombination; viral evolution
Year: 2016 PMID: 30288299 PMCID: PMC6167919 DOI: 10.1093/ve/vew017
Source DB: PubMed Journal: Virus Evol ISSN: 2057-1577
List of genome sequences used in the present study and associated metadata
| Strain name | GenBank accession | Country of isolation | Sample type | Immune status |
|---|---|---|---|---|
| NL/Rot1/Urine/2012 | KT726940 | Netherlands | Urine (1 passage) | Competent |
| NL/Rot2/Urine/2012 | KT726941 | Netherlands | Urine (1 passage) | Competent |
| NL/Rot3/Nasal/2012 | KT726942 | Netherlands | Nasal rinse (1 passage) | Competent |
| NL/Rot4/Nasal/2012 | KT726943 | Netherlands | Nasal rinse (1 passage) | Competent |
| NL/Rot5/Urine/2012 | KT726944 | Netherlands | Urine (1 passage) | Competent |
| NL/Rot6/Nasal/2012 | KT726945 | Netherlands | Nasal rinse (1 passage) | Competent |
| NL/Rot7/Urine/2012 | KT726946 | Netherlands | Urine (1 passage) | Competent |
| UK/Lon1/Blood/2013 | KT726947 | UK | Blood (EDTA) | Compromised |
| UK/Lon2/Blood/2013 | KT726948 | UK | Blood (EDTA) | Compromised |
| UK/Lon6/Urine/2011 | KT726949 | UK | Urine | Competent |
| UK/Lon7/Urine/2011 | KT726950 | UK | Urine | Competent |
| UK/Lon8/Urine/2012 | KT726951 | UK | Urine | Competent |
| UK/Lon3/Plasma/2012 | KT726952 | UK | Plasma | Compromised |
| UK/Lon9/Urine/2012 | KT726953 | UK | Urine | Competent |
| UK/Lon4/Bile/2011 | KT726954 | UK | Whole blood & bile | Compromised |
| UK/Lon5/Blood/2010 | KT726955 | UK | Whole blood & biopsy | Compromised |
| Merlin | NC_006273 | UK | Urine | Competent |
| 3157 | GQ221974 | UK | Urine (3 pass) | Competent |
| JP | GQ221975 | UK | Biopsy (prostate) | Compromised |
| HAN20 | GQ396663 | Germany | BAL (2 pass) | Unknown |
| HAN38 | GQ396662 | Germany | BAL (2 pass) | Unknown |
| 3301 | GQ466044 | UK | Urine | Competent |
| JHC | HQ380895 | South Korea | Whole blood | Compromised |
| U8 | GU179288 | Italy | Urine | Competent |
| U11 | GU179290 | UK | Urine | Competent |
| VR1814 | GU179289 | Italy | Cervical secretion | Competent |
| TR | KF021605 | USA | Vitreous humor (>4 pass) | Compromised |
| BE/9/2010 | KC519319 | Belgium | Urine (2 pass) | Competent |
| BE/10/2010 | KC519320 | Belgium | Urine (2 pass) | Competent |
| BE/11/2010 | KC519321 | Belgium | Urine (2 pass) | Competent |
| BE/21/2010 | KC519322 | Belgium | Urine | Compromised |
| BE/27/2010 | KC519323 | Belgium | Urine (4 pass) | Compromised |
| HAN1 | JX512199 | Germany | BAL | Unknown |
| HAN2 | JX512200 | Germany | BAL (3 pass) | Unknown |
| HAN3 | JX512201 | Germany | BAL (3 pass) | Unknown |
| HAN8 | JX512202 | Germany | BAL (3 pass) | Unknown |
| HAN12 | JX512203 | Germany | BAL (3 pass) | Unknown |
| HAN16 | JX512204 | Germany | Urine (2 pass) | Unknown |
| HAN19 | JX512205 | Germany | BAL (2 pass) | Unknown |
| HAN22 | JX512206 | Germany | BAL (2 pass) | Unknown |
| HAN28 | JX512207 | Germany | BAL (3 pass) | Unknown |
| HAN31 | JX512208 | Germany | BAL (2 pass) | Unknown |
pass, passage.
a Longitudinal sampling (3 timepoints).
b Simultaneous sampling of two body compartments.
Figure 1. Heat map of correlation significance between the distribution of sequence genotypes in the 142 strains along (A) the whole genome or (B) in a close-up of the first 19 genes (25 non-recombining loci) in the prototypic HCMV genome organization. P values are indicated by coloring of the matrix cells (see Supplementary methods for attribution of genotypes). Each individual gene alignment was first scanned using GARD to identify recombination breakpoints; the alignment was then split on either side of each breakpoint and the resulting segments were treated as separate loci.
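The splitting step described in this caption can be illustrated with a minimal Python sketch. It assumes the breakpoints are already known as 0-based alignment column indices; the function name `split_alignment`, the dictionary layout, and the toy sequences are hypothetical and do not reproduce GARD's own input or output formats.

```python
def split_alignment(alignment, breakpoints):
    """Split aligned sequences into consecutive segments.

    alignment:   dict mapping strain name -> aligned sequence (equal lengths).
    breakpoints: 0-based column indices at which a new segment starts
                 (an assumed convention, not GARD's output format).
    Returns a list of dicts, one per non-recombining segment.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    length = lengths.pop()
    # Keep only breakpoints that fall strictly inside the alignment.
    inner = sorted({b for b in breakpoints if 0 < b < length})
    cuts = [0] + inner + [length]
    return [
        {name: seq[start:end] for name, seq in alignment.items()}
        for start, end in zip(cuts, cuts[1:])
    ]


# Toy alignment with a single assumed breakpoint at column 4.
toy = {"strainA": "ATGCATGCAT", "strainB": "ATGCTTGCAA"}
segments = split_alignment(toy, [4])
```

Each returned segment can then be genotyped independently, which is how one gene can contribute more than one locus to the correlation matrix.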
Figure 2. Conservation of protein sequences in cytomegalovirus. Syntenic circular genome maps showing protein sequence conservation (percent identity) against HCMV strain Merlin for (A) all 42 HCMV strains used in this study and (B) the non-human CMVs, from outer to inner track: CCMV, GMCMV, CyCMV, RhCMV, OMCMV, SMCMV, and MCMV. The percentage sequence identity is illustrated by the color legend for both A and B. (C) Maximum-likelihood tree of eight cytomegalovirus species (as in (B)) based on a whole genome alignment of conserved syntenic blocks. All bootstrap support values (based on 100 samples) were 100 percent.
Figure 3. Circular genome map showing linkage disequilibrium and nucleotide diversity. (A) The purple backbone represents the classical HCMV genome arrangement TRL–UL–IRL–IRS–US–TRS, where repeat sequences are shown in a lighter shade, the origin of lytic replication is highlighted in pink, and the RL11 gene family is indicated by the red shading. Tracks are numbered inwards: (1) map of the protein-coding genes; (2) presence of epitopes for CD4+ (blue), CD8+ (red) or both (purple) T-cells (Sylwester et al. 2005); (3) location of genes that have undergone positive selection episodes; (4) local LD index, computed in 700-bp windows; the hotspots of LD (top 5 percent values) are highlighted in dark blue, with corresponding gene names shown outside of the plot; (5) HCMV nucleotide diversity, computed in adjacent windows of 100 bp; the hypervariable loci (top 5 percent values) are highlighted in dark red.
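Track (5) can be illustrated with a minimal Python sketch of windowed nucleotide diversity. It assumes diversity is the mean proportion of differing sites over all sequence pairs, that gapped or ambiguous columns are skipped, and that the top 5 percent is taken by simple ranking; none of these choices is stated in the caption, and the function names are hypothetical.

```python
from itertools import combinations


def pairwise_diversity(seqs):
    """Mean proportion of differing sites over all pairs of sequences.

    Only columns where both sequences carry A, C, G or T are compared
    (an assumed way of handling gaps and ambiguity codes).
    """
    total, n_pairs = 0.0, 0
    for a, b in combinations(seqs, 2):
        compared = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
        if compared:
            total += sum(x != y for x, y in compared) / len(compared)
            n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


def windowed_diversity(seqs, window=100):
    """Diversity in adjacent, non-overlapping windows: (start, value) pairs."""
    length = len(seqs[0])
    return [
        (start, pairwise_diversity([s[start:start + window] for s in seqs]))
        for start in range(0, length, window)
    ]


def top_fraction(values, fraction=0.05):
    """Indices of the windows whose value ranks in the top `fraction`."""
    k = max(1, round(len(values) * fraction))
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return sorted(order[:k])


# Toy alignment, 4-bp windows instead of 100 bp to keep the example small.
toy = ["AAAAAAAA", "AAAAAATT", "AAAAAAAA"]
windows = windowed_diversity(toy, window=4)
hyper = top_fraction([value for _, value in windows], fraction=0.5)
```

On real data the same functions would be called with `window=100` and `fraction=0.05` on the whole-genome alignment.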
List of genes with the strongest hotspots of linkage disequilibrium (high-LD genes)
| Gene (a) | Family/region | Functional annotation (b) | CD4+ / CD8+ (c) | LD score (d) | Nuc. div. (e) |
|---|---|---|---|---|---|
| | | Envelope glycoprotein, modulation of chemo- and/or cytokine receptor through binding (CCR5/CXCR4) | + | | |
| | # | Glycoprotein, repression of replication | ++ / + | | |
| | | Membrane glycoprotein, modulation of T-cell signalling/function | | | |
| | | Glycoprotein B (gB), heparan-binding, viral entry of the host cell | ++ / ++ | | |
| | | Glycoprotein H (gH), viral entry of the host cell | + / + | | 0.09 |
| | # | Putative membrane glycoprotein | | | |
| | * | Putative membrane glycoprotein sharing sequence homology with CD24 | | | |
| | # | Putative membrane glycoprotein | | | |
| | # | Putative membrane glycoprotein | | | |
| | | Glycoprotein N (gN) | | | |
| | | Membrane glycoprotein, modulation of T-cell signalling/function | | | |
| | | Putative membrane glycoprotein | | | |
| | | Major capsid protein (forms icosahedral capsid with UL85, UL80, UL48/49, and UL46) | ++ / + | | 0.02 |
| | # | Membrane glycoprotein, modulates chemo- and/or cytokine production | | | |
| | | Membrane glycoprotein, MHC-I homologue, mitochondrial inhibitor of apoptosis (vMIA) | ++ | | |
| | | Membrane glycoprotein, predicted involvement in virion assembly and egress | | 9.8 | |
| | # | Putative membrane glycoprotein | | 9.8 | |
| | | Putative membrane glycoprotein | | 9.8 | |
| | * | Type I transmembrane glycoprotein and a potent activator of NFκB-induced transcription | + | 9.3 | |
| | | Transcriptional activator involved in DNA replication | | 8.6 | 0.04 |
| | | Glycoprotein M (gM), viral entry of the host cell | ++ | 8.5 | 0.05 |
| | | Tegument phosphoprotein | | 8.0 | 0.03 |
| | # | Putative membrane glycoprotein | | 7.4 | |
| | | Membrane glycoprotein, binds IgG Fc domain, involved in immune regulation | | 7.4 | |
| | # | Membrane glycoprotein, binds IgG Fc domain, involved in immune regulation | | 6.7 | |
| | | Transcriptional regulator IE1, involved in immune regulation | ++ / ++ | 6.7 | |
| | * | Putative membrane glycoprotein (homology to MHC-I) with NK cell evasion function | | 5.5 | |
| | | Capsid portal protein | | 5.5 | 0.02 |
| | | Virion-associated regulatory protein | | 5.0 | 0.06 |
| | | Putative membrane glycoprotein (homology to MHC-I) with NK cell evasion function | ++ | 5.0 | 0.06 |
| | | Putative multiple transmembrane protein | | 5.0 | 0.03 |
#, RL11 family; *, UL/b′ region.
a Gene name in the Merlin reference sequence annotation (NCBI RefSeq NC_006273/AY446894); genes are ranked by decreasing LD score.
b Functional annotation summarized from the Merlin reference sequence annotation in the NCBI RefSeq record (NC_006273/AY446894) and from the extensive review provided by Van Damme and Van Loock (2014).
c Antigenic status derived from Sylwester et al. (2005). ++, the gene was among the top 30 when ranked by total memory-corrected response over 33 seropositive subjects; +, the gene elicited a positive response in at least 4 of the 33 tested seropositive subjects.
d The highest local LD index among all windows included within the gene boundaries; the local LD index is the −log10 transform of the P value of a Mann–Whitney–Wilcoxon test for each 700-bp window located in the gene, under the null hypothesis that LD in the window is no higher than the genome-wide average. Genes presented in this table contain windows scoring in the top 5 percent (high-LD genes); bold values are in the top 2 percent.
e Nucleotide diversity, highest value recorded in 100-bp windows within the gene; bold values are in the top 5 percent (hyper-variable genes).
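The LD score defined in footnote d can be illustrated with a minimal Python sketch. It assumes each window is summarized by a list of pairwise LD values (for example r² between pairs of sites) compared against a genome-wide background list, and it uses a normal approximation to the one-sided Mann–Whitney–Wilcoxon test without tie or continuity correction. These inputs, the approximation, and the function names are assumptions for illustration and not the authors' implementation.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mwu_greater_pvalue(window, background):
    """One-sided P value for 'window values tend to exceed background values'.

    Normal approximation to the Mann-Whitney U statistic (assumed here;
    no tie or continuity correction is applied).
    """
    n1, n2 = len(window), len(background)
    ranks = average_ranks(list(window) + list(background))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


def local_ld_index(window, background):
    """-log10 of the one-sided P value; floored to avoid log10(0)."""
    return -math.log10(max(mwu_greater_pvalue(window, background), 1e-300))


# Toy values: a window with uniformly higher LD than the background.
high_ld_window = [0.9, 0.8, 0.95, 0.85]
genome_background = [0.1, 0.2, 0.15, 0.3, 0.05, 0.25]
score = local_ld_index(high_ld_window, genome_background)
```

A gene's LD score would then be the maximum of `local_ld_index` over the 700-bp windows that fall within its boundaries.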