Nicolás M Suárez1, Gavin S Wilkie1, Elias Hage2,3, Salvatore Camiolo1, Marylouisa Holton1, Joseph Hughes1, Maha Maabar1, Sreenu B Vattipally1, Akshay Dhingra2, Ursula A Gompels4, Gavin W G Wilkinson5, Fausto Baldanti6,7, Milena Furione6, Daniele Lilleri8, Alessia Arossa9, Tina Ganzenmueller2,3,10, Giuseppe Gerna8, Petr Hubáček11, Thomas F Schulz2,3, Dana Wolf12, Maurizio Zavattoni6, Andrew J Davison1.
Abstract
The genomic characteristics of human cytomegalovirus (HCMV) strains sequenced directly from clinical pathology samples were investigated, focusing on variation, multiple-strain infection, recombination, and gene loss. A total of 207 datasets generated in this and previous studies using target enrichment and high-throughput sequencing were analyzed, in the process enabling the determination of genome sequences for 91 strains. Key findings were that (i) it is important to monitor the quality of sequencing libraries in investigating variation; (ii) many recombinant strains have been transmitted during HCMV evolution, and some have apparently survived for thousands of years without further recombination; (iii) mutants with nonfunctional genes (pseudogenes) have been circulating and recombining for long periods and can cause congenital infection and resulting clinical sequelae; and (iv) intrahost variation in single-strain infections is much less than that in multiple-strain infections. Future population-based studies are likely to continue illuminating the evolution, epidemiology, and pathogenesis of HCMV.
Keywords: gene loss; genome sequence; genotype; human cytomegalovirus; multiple-strain infection; mutation; recombination; target enrichment; variation
Year: 2019 PMID: 31050742 PMCID: PMC6667795 DOI: 10.1093/infdis/jiz208
Source DB: PubMed Journal: J Infect Dis ISSN: 0022-1899 Impact factor: 5.226
Selected Characteristics of Sample Collections 1–3
| Characteristic | Collection 1 | Collection 2 | Collection 3 |
|---|---|---|---|
| Patients, No.a | 48 | 29 | 25 |
| Patient condition | Congenital infection | Mostly transplant recipients | Various |
| Samples, No. | 53 | 89 | 57 |
| Sample source, city (prefix) | Pavia (PAV), Jerusalem (JER), Prague (PRA) | Hannover (Child, RTR, SCTR), Pavia (PAV) | Rotterdam (Rot), London (Lon, Pat_) |
| Datasets, No. | 53 | 97b | 57c |
| Duplicated libraries, No. | 0 | 7 | 0 |
| HCMV load, IU/µLd | 26–559 968 | 5–194 840 | 104–18 377 |
| Genome copies for library, No.e | 225–8 399 520 | 280–3 896 800 | Unknown |
| Reads in Merlin alignment, % | 2–91 | 0–85 | 0–90 |
| Coverage ratio in Merlin alignment, % unique/total reads | 0.40–83.12 | 0.00–76.09 | 0.00–90.21 |
| Genome sequences determined, No.f | 42 | 25 | 24 |
Details are provided in Supplementary Tables 1–3.
Abbreviation: HCMV, human cytomegalovirus.
aArchived diagnostic samples were used, and clinical data were retrieved, with the approval of the institutional review boards of Policlinico San Matteo, Pavia (reference numbers 35853/2010 and 35854/2010), Hadassah University Hospital, Jerusalem (reference number HMO-063911), Motol University Hospital, Prague (reference number EK-701a/16) and Hannover Medical School, Hannover (reference number 2527-2014).
bWe reported 68 of the Hannover datasets previously [21].
cThese datasets were reported previously by others, and were either provided by the authors [19] or downloaded from the European Nucleotide Archive (study PRJEB12814) [20].
dViral load in most extracted samples was quantified in the laboratory of origin or the sequencing laboratory. In some instances, the entire sample was used blind to generate a sequencing library.
eAssumes that 1 IU is equivalent to 1 genome copy.
fThe trimmed paired-read data were aligned to the UCSC hg19 human reference genome (http://genome.ucsc.edu/) using Bowtie2. Nonmatching reads were assembled de novo into contigs using SPAdes version 3.5.0 [27]. The contigs were ordered using Scaffold_builder version 2.2 [28] by reference to a version of the strain Merlin sequence lacking all but 100 nt of the terminal repeat regions (TRL at the left end and TRS at the right end; Figure 1), and merged into a draft genome sequence. Residual gaps were filled by identifying relevant reads anchored in flanking regions and assembling them manually in a reiterative fashion. TRL and TRS were reinstated, and the complete genome sequence was verified by aligning it against the read data using Bowtie2 and inspecting the alignment in Tablet. An annotated genome sequence was produced using Sequin (https://www.ncbi.nlm.nih.gov/Sequin/).
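The first two steps of footnote f (host-read removal with Bowtie2, then de novo assembly with SPAdes) can be sketched as command lines. This is a minimal illustration, not the authors' actual invocation: the index name `hg19`, all file names, and the output directory are hypothetical, and using `--un-conc` to collect the nonmatching read pairs is an assumption about how those reads were captured. The commands are only constructed here, not executed.

```python
def host_filter_cmd(index, reads_1, reads_2, unaligned_pattern, sam_out):
    """Build a Bowtie2 command that aligns read pairs to a host index.

    --un-conc writes pairs that fail to align concordantly; a '%' in the
    path is replaced by 1 or 2 for the two mate files.
    """
    return [
        "bowtie2",
        "-x", index,            # prefix of a prebuilt index (hypothetical name)
        "-1", reads_1,          # trimmed mate 1 reads
        "-2", reads_2,          # trimmed mate 2 reads
        "--un-conc", unaligned_pattern,
        "-S", sam_out,          # SAM output for the host alignment
    ]


def assembly_cmd(reads_1, reads_2, out_dir):
    """Build a SPAdes command for de novo assembly of paired reads."""
    return ["spades.py", "-1", reads_1, "-2", reads_2, "-o", out_dir]


# Hypothetical file names, for illustration only.
filter_cmd = host_filter_cmd(
    "hg19", "trimmed_1.fastq", "trimmed_2.fastq",
    "nonhuman_%.fastq", "human.sam",
)
spades_cmd = assembly_cmd("nonhuman_1.fastq", "nonhuman_2.fastq", "spades_out")

print(" ".join(filter_cmd))
print(" ".join(spades_cmd))
```

The later steps (contig ordering with Scaffold_builder, manual gap filling, reinstating TRL and TRS, and verification in Tablet) are interactive or tool-specific and are left out of this sketch.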
Figure 1. Locations in the human cytomegalovirus strain Merlin genome of genes used for genotyping. The genome consists of 2 unique regions, UL (1325–194 343 bp) and US (197 627–233 108 bp), the former flanked by inverted repeats TRL (1–1324 bp) and IRL (194 344–195 667 bp), and the latter flanked by inverted repeats IRS (195 090–197 626 bp) and TRS (233 109–235 646 bp). Protein-coding regions are indicated by shaded arrows, and noncoding RNAs as narrower, white arrows, with gene nomenclature below. Introns are shown as narrow white bars. The 12 genes (RL5A, RL6, RL12, RL13, UL1, UL9, UL11, UL73, UL74, UL120, UL146, and UL139) used for motif read-matching are in dark gray (red in online version). Two of these genes (RL13 and UL146) were also used for genotype read-matching. The additional 5 genes (UL20, UL33, UL37, UL55, and US9) used to genotype sequences by alignment are medium gray (orange in online version). All other genes are shown in white (pink in online version).
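As a quick check on the coordinates quoted in the Figure 1 caption, the region lengths can be computed directly. The sketch below uses only the 1-based, inclusive coordinates given in the caption; the dictionary and function names are illustrative.

```python
# Region coordinates (1-based, inclusive) as listed in the Figure 1 caption.
REGIONS = {
    "TRL": (1, 1324),
    "UL": (1325, 194343),
    "IRL": (194344, 195667),
    "IRS": (195090, 197626),
    "US": (197627, 233108),
    "TRS": (233109, 235646),
}


def region_length(name):
    """Length in bp of a named region, counting both end coordinates."""
    start, end = REGIONS[name]
    return end - start + 1


def overlap(name_a, name_b):
    """Number of positions shared by two regions (0 if they are disjoint)."""
    start_a, end_a = REGIONS[name_a]
    start_b, end_b = REGIONS[name_b]
    return max(0, min(end_a, end_b) - max(start_a, start_b) + 1)


for name in REGIONS:
    print(name, region_length(name))
print("IRL/IRS shared positions:", overlap("IRL", "IRS"))
```

With these coordinates, TRL and IRL are both 1324 bp, and the listed IRL and IRS ranges share 578 positions (195 090–195 667).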
Groups of Nonrecombinant Strains
Genotypes (columns RL5A through US9)a
| Group | Strain | RL5A | RL6 | RL12 | RL13 | UL1 | UL9 | UL11 | UL20 | UL33 | UL37 | UL55N | UL73 | UL74 | UL120 | UL146 | UL139 | US9 | Mutated Genes | Differencesb | Shared Mutations |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PRA8 | 1 | 1 | 6 | 6 | 6 | 6 | 2 | 5 | 1 | 5 | 2/3 | 4C | 1C | 2B | 1 | 4 | 1 | UL145 | 0 | These strains share a UL145 mutation, were characterized in different studies, and were confirmed as having been derived from the same patient |
| | CZ/3/2012 | 1 | 1 | 6 | 6 | 6 | 6 | 2 | 5 | 1 | 5 | 2/3 | 4C | 1C | 2B | 1 | 4 | 1 | UL145 | | |
| 2 | BE/3/2011 | 2 | 4 | 1B | 1 | 1 | 4 | 1 | 6 | 2 | 2 | 2/3 | 4A | 3 | 1A | 8 | 2 | 1 | None | 1 | None |
| | BE/21/2011 | 2 | 4 | 1B | 1 | 1 | 4 | 1 | 6 | 2 | 2 | 2/3 | 4A | 3 | 1A | 8 | 2 | 1 | None | | |
| 3 | UK/Lon6/Urine/2011 | 5 | 1 | 7 | 7 | 7 | 1 | 1 | 6 | 2 | 1 | 4 | 3A | 1B | 3B | 13 | 1A | 1 | None | 23 | None |
| | 2CEN15 | 5 | 1 | 7 | 7 | 7 | 1 | 1 | 6 | 2 | 1 | 4 | 3A | 1B | 3B | 13 | 1A | 1 | None | | |
| | BE/5/2012 | 5 | 1 | 7 | 7 | 7 | 1 | 1 | 6 | 2 | 1 | 4 | 3A | 1B | 3B | 13 | 1A | 1 | None | | |
| 4 | BE/14/2012 | 3 | 5 | 4A | 4A | 4 | 6 | 6 | 5 | 4 | 5 | 4 | 3A | 1B | 2A | 9 | 5 | 1 | RL6, UL9, UL40, US7 | 26 | These strains share a UL9 mutation and also RL6 and UL40 mutations that are present in other strains |
| | BE/36/2011 | 3 | 5 | 4A | 4A | 4 | 6 | 6 | 5 | 4 | 5 | 4 | 3A | 1B | 2A | 9 | 5 | 1 | RL6, UL9, UL40 | | |
| 5 | BE/10/2012 | 6 | 3 | 1A | 1 | 1 | 1 | 1 | 2 | 4 | 5 | 4 | 3A | 1B | 2A | 3 | 7 | 1 | None | 35 | None |
| | BE/26/2011 | 6 | 3 | 1A | 1 | 1 | 1 | 1 | 2 | 4 | 5 | 4 | 3A | 1B | 2A | 3 | 7 | 1 | None | | |
| 6 | BE/1/2011 | 1 | 1 | 4A | 4A | 4 | 6 | 6 | 7 | 4 | 6 | 2/3 | 4A | 3 | 2A | 1 | 4 | 1 | UL1, UL9 | 65 | These strains bear a UL9 mutation that is present in other strains, and 2 strains share a UL1 mutation |
| | BE/8/2010 | 1 | 1 | 4A | 4A | 4 | 6 | 6 | 7 | 4 | 6 | 2/3 | 4A | 3 | 2A | 1 | 4 | 1 | UL9 | | |
| | BE/9/2012 | 1 | 1 | 4A | 4A | 4 | 6 | 6 | 7 | 4 | 6 | 2/3 | 4A | 3 | 2A | 1 | 4 | 1 | UL1, UL9 | | |
| 7 | NAN1LA | 3 | 5 | 5 | 5 | 5 | 7 | 3 | 2 | 2 | 6 | 2/3 | 4D | 5 | 3B | 7 | 5 | 2 | RL6, US9 | 73 | These strains share RL6 and US9 mutations that are present in other strains |
| | BE/6/2012 | 3 | 5 | 5 | 5 | 5 | 7 | 3 | 2 | 2 | 6 | 2/3 | 4D | 5 | 3B | 7 | 5 | 2 | RL6, US9, US27 | | |
| 8 | BE/7/2012 | 2 | 4 | 1A | 1 | 1 | 4 | 1 | 7 | 5 | 3 | 2/3 | 4A | 3 | 2B | 13 | 5 | 1 | RL5A, RL13, UL150 | 125 | These strains share a UL150 mutation that is present in other strains |
| | BE/11/2012 | 2 | 4 | 1A | 1 | 1 | 4 | 1 | 7 | 5 | 3 | 2/3 | 4A | 3 | 2B | 13 | 5 | 1 | UL150 | | |
| | BE/16/2012 | 2 | 4 | 1A | 1 | 1 | 4 | 1 | 7 | 5 | 3 | 2/3 | 4A | 3 | 2B | 13 | 5 | 1 | UL150 | | |
| | BE/26/2010 | 2 | 4 | 1A | 1 | 1 | 4 | 1 | 7 | 5 | 3 | 2/3 | 4A | 3 | 2B | 13 | 5 | 1 | UL150 | | |
| | BE/30/2011 | 2 | 4 | 1A | 1 | 1 | 4 | 1 | 7 | 5 | 3 | 2/3 | 4A | 3 | 2B | 13 | 5 | 1 | UL150 | | |
| 9 | JER851 | 1 | 1 | 3 | 3 | 3 | 2 | 1 | 3 | 4 | 6 | 1 | 2 | 2B | 3B | 7 | 4 | 1 | UL1, UL9, UL111A | 135 | These strains share a UL111A mutation that is present in another strain |
| | JER4041 | 1 | 1 | 3 | 3 | 3 | 2 | 1 | 3 | 4 | 6 | 1 | 2 | 2B | 3B | 7 | 4 | 1 | UL111A | | |
| | BE/25/2010 | 1 | 1 | 3 | 3 | 3 | 2 | 1 | 3 | 4 | 6 | 1 | 2 | 2B | 3B | 7 | 4 | 1 | UL111A | | |
| 10 | JER5695 | 1 | 1 | 7 | 7 | 7 | 1 | 1 | 6 | 2 | 1 | 2/3 | 3B | 2A | 4B | 13 | 2 | 1 | UL9, UL111A | 138 | These strains share a UL111A mutation that is present in other strains, and have different UL9 mutations |
| | BE/15/2010 | 1 | 1 | 7 | 7 | 7 | 1 | 1 | 6 | 2 | 1 | 2/3 | 3B | 2A | 4B | 13 | 2 | 1 | RL1, UL9, UL111A | | |
| 11 | PRA7 | 1 | 1 | 4B | 4B | 4 | 9 | 6 | 5 | 5 | 6 | 6 | 4D | 5 | 4B | 10 | 2 | 1 | RL5A, UL111A | 143 | These strains share RL5A and UL111A mutations that are present in other strains |
| | JP | 1 | 1 | 4B | 4B | 4 | 9 | 6 | 5 | 5 | 6 | 6 | 4D | 5 | 4B | 10 | 2 | 1 | RL5A, UL111A | | |
| | BE/4/2010 | 1 | 1 | 4B | 4B | 4 | 9 | 6 | 5 | 5 | 6 | 6 | 4D | 5 | 4B | 10 | 2 | 1 | RL5A, UL111A | | |
| 12 | BE/6/2011 | 5 | 1 | 4B | 4B | 4 | 9 | 6 | 1 | 5 | 3 | 2/3 | 3A | 1B | 1B | 9 | 5 | 1 | UL9 | 155 | Two strains share a UL9 mutation that is present in other strains |
| | BE/18/2011 | 5 | 1 | 4B | 4B | 4 | 9 | 6 | 1 | 5 | 3 | 2/3 | 3A | 1B | 1B | 9 | 5 | 1 | None | | |
| | BE/27/2011 | 5 | 1 | 4B | 4B | 4 | 9 | 6 | 1 | 5 | 3 | 2/3 | 3A | 1B | 1B | 9 | 5 | 1 | UL9 | | |
aSee Supplementary Figures 1 and 2 for genotype definitions. G prefix omitted.
bTotal number of differences among all strains in the group, not including size variations in tandem repeats. To exclude repeat regions, sequences were aligned from the TATA box of RL1 to the end of US, omitting the region from the AATAAA polyadenylation signal of UL150A to the beginning of TRS.
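One way to read the "Differences" column is as the number of alignment positions at which the strains in a group do not all carry the same base. The sketch below implements that reading only; it is an assumption, since footnote b does not say how differences were tallied, and the toy sequences are invented for illustration. It expects sequences that are already aligned to equal length and already trimmed to the interval described in footnote b.

```python
def count_differences(aligned_sequences):
    """Count alignment columns where the sequences are not all identical.

    Assumes the input is a multiple alignment (equal-length strings) from
    which repeat regions have already been excluded.
    """
    lengths = {len(seq) for seq in aligned_sequences}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for column in zip(*aligned_sequences) if len(set(column)) > 1)


# Toy alignment (invented): columns 1 and 4 vary among the three sequences.
toy_group = ["ACGT", "ACGA", "TCGT"]
print(count_differences(toy_group))
```

Under this reading, a group of identical sequences scores 0, as for group 1 in the table.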
Mutated Genes in Order of Decreasing Frequency
| Gene | Feature(s) | Strains Mutated, No.a (Passagedb) | Strains Mutated, No.a (Clinicalc) | Strains Mutated, No.a (Alld) | Strains Mutated, %a (Passagedb) | Strains Mutated, %a (Clinicalc) | Strains Mutated, %a (Alld) |
|---|---|---|---|---|---|---|---|
| UL9 | RL11 family; type 1 membrane protein | 50 | 31 | 81 | 32.89 | 34.07 | 33.33 |
| RL5A | RL11 family | 31 | 27 | 58 | 20.39 | 29.67 | 23.87 |
| UL1 | RL11 family; type 1 membrane protein | 20 | 18 | 38 | 13.16 | 19.78 | 15.64 |
| RL6 | RL11 family | 23 | 14 | 37 | 15.13 | 15.38 | 15.23 |
| US9 | US6 family; type 1 membrane protein | 26 | 11 | 37 | 17.11 | 12.09 | 15.23 |
| UL111A | Viral interleukin-10 | 16 | 7 | 23 | 10.53 | 7.69 | 9.47 |
| UL150 | Unknown | 11 | 3 | 14 | 7.24 | 3.30 | 5.76 |
| US7 | US6 family; type 1 membrane protein | 7 | 7 | 14 | 4.61 | 7.69 | 5.76 |
| UL40 | Type 1 membrane protein | 8 | 2 | 10 | 5.26 | 2.20 | 4.12 |
| UL30 | UL30 family | 2 | 3 | 5 | 1.32 | 3.30 | 2.06 |
| UL142 | MHC family; type 1 membrane protein | 2 | 3 | 5 | 1.32 | 3.30 | 2.06 |
| RL12 | RL11 family; type 1 membrane protein | 3 | 1 | 4 | 1.97 | 1.10 | 1.65 |
| RL1 | RL1 family | 1 | 2 | 3 | 0.66 | 2.20 | 1.23 |
| UL136 | Potential transmembrane domain | 3 | 0 | 3 | 1.97 | 0.00 | 1.23 |
| US13 | US12 family; type 3 membrane protein | 3 | 0 | 3 | 1.97 | 0.00 | 1.23 |
| UL133 | Potential transmembrane domain | 2 | 0 | 2 | 1.32 | 0.00 | 0.82 |
| US6 | US6 family; type 1 membrane protein | 1 | 1 | 2 | 0.66 | 1.10 | 0.82 |
| US8 | US6 family; type 1 membrane protein | 0 | 2 | 2 | 0.00 | 2.20 | 0.82 |
| US27 | GPCR family; type 3 membrane protein | 2 | 0 | 2 | 1.32 | 0.00 | 0.82 |
| UL11 | RL11 family; type 1 membrane protein | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| UL13 | Unknown | 0 | 1 | 1 | 0.00 | 1.10 | 0.41 |
| UL14 | UL14 family; type 1 membrane protein | 0 | 1 | 1 | 0.00 | 1.10 | 0.41 |
| UL15A | Potential transmembrane domain | 0 | 1 | 1 | 0.00 | 1.10 | 0.41 |
| UL20 | Type 1 membrane protein | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| UL43 | US22 family | 0 | 1 | 1 | 0.00 | 1.10 | 0.41 |
| UL99 | Envelope-associated protein | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| UL148 | Type 1 membrane protein | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| UL147 | CXCL family | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| UL145 | Unknown | 0 | 1 | 1 | 0.00 | 1.10 | 0.41 |
| UL150A | Unknown | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| IRS1 | US22 family | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| US1 | US1 family | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| US12 | US12 family; type 3 membrane protein | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| US19 | US12 family; type 3 membrane protein | 0 | 1 | 1 | 0.00 | 1.10 | 0.41 |
Abbreviations: CXCL, chemokine (CXC motif) ligand; GPCR, G protein–coupled receptor; MHC, major histocompatibility complex.
aOmitting mutations that occurred in RL13, UL128, UL130, and UL131A probably during passage, or that were engineered during bacterial artificial chromosome construction.
bStrains sequenced from virus passaged in cell culture, not taking into account the minority of mutations confirmed from the clinical samples (n = 152, excludes CZ/3/2012, which is the same strain as PRA8).
cStrains sequenced directly from clinical material (n = 91).
dStrains sequenced directly from clinical material or passaged virus (n = 243).
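The percentage columns follow from the counts and the denominators given in footnotes b to d (152 passaged, 91 clinical, 243 in total). A minimal sketch of that arithmetic, with illustrative names, is:

```python
# Denominators taken from footnotes b, c, and d of the table.
DENOMINATORS = {"passaged": 152, "clinical": 91, "all": 243}


def percent_mutated(count, group):
    """Percentage of strains in a group carrying a mutation, to 2 decimals."""
    return round(100 * count / DENOMINATORS[group], 2)


# UL9 row: 50 passaged, 31 clinical, 81 in total.
ul9 = {
    "passaged": percent_mutated(50, "passaged"),
    "clinical": percent_mutated(31, "clinical"),
    "all": percent_mutated(81, "all"),
}
print(ul9)
```

For UL9 this reproduces the tabulated 32.89%, 34.07%, and 33.33%.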
Figure 2. Box-and-whisker graphs created using ggplot2 (https://ggplot2.tidyverse.org) showing the total number of single-nucleotide polymorphisms (SNPs) detected at a frequency of >2% in single-strain and multiple-strain infections using LoFreq (A) and V-Phaser (B). Single-strain (n = 134 and 131, respectively) and multiple-strain datasets (n = 29 and 29, respectively) for which consensus genome sequences had been derived were identified by motif read-matching, and the total number of SNPs in each dataset was enumerated (insertions, deletions, and length polymorphisms were not considered). LoFreq employed a minimal coverage depth of 10 reads (minimal SNP quality [phred] 64) and strand-bias significance with a false discovery rate correction of P < .001. V-Phaser employed phasing with a window size of 500 nucleotides and quality score (phred) 20 for calibrating the significance of strand-bias at P < .05. Each box (light gray for single strains and dark gray for multiple strains) encompasses the first to third quartiles (Q1–Q3) and shows the median as a thick line. For each box, the horizontal line at the end of the upper dashed whisker marks the upper extreme (defined as the smaller of Q3 + 1.5 [Q3–Q1] and the highest single value), and the horizontal line at the end of the lower dashed whisker marks the lower extreme (the greater of Q1 – 1.5 [Q3–Q1] and the lowest single value).
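The whisker definition in the Figure 2 caption can be made concrete with a short calculation. This sketch follows the caption's wording literally; it is not ggplot2's code, and the quartile method (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption, since the caption does not state which quartile convention was used. The SNP counts are invented for illustration.

```python
import statistics


def box_summary(values):
    """Quartiles and whisker extremes as defined in the Figure 2 caption."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Upper extreme: the smaller of Q3 + 1.5*IQR and the highest value.
        "upper": min(q3 + 1.5 * iqr, max(values)),
        # Lower extreme: the greater of Q1 - 1.5*IQR and the lowest value.
        "lower": max(q1 - 1.5 * iqr, min(values)),
    }


# Invented SNP counts with one extreme dataset.
snp_counts = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = box_summary(snp_counts)
print(summary)
```

In this toy case the upper whisker stops at 13 rather than 100, so the extreme dataset would be drawn as an outlier beyond the whisker.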