Tomaž M Zorec, Denis Kutnjak, Lea Hošnjak, Blanka Kušar, Katarina Trčko, Boštjan J Kocjan, Yu Li, Miljenko Križmarić, Jovan Miljković, Maja Ravnikar, Mario Poljak.
Abstract
Molluscum contagiosum virus (MCV) is the sole member of the Molluscipoxvirus genus and the causative agent of molluscum contagiosum (MC), a common skin disease. Although it is an important and frequent human pathogen, its genetic landscape and evolutionary history remain largely unknown. In this study, ten novel complete MCV genome sequences of the two most common MCV genotypes were determined (five MCV1 and five MCV2 sequences) and analyzed together with all MCV complete genomes previously deposited in freely accessible sequence repositories (four MCV1 and a single MCV2). In comparison to MCV1, a higher degree of nucleotide sequence conservation was observed among MCV2 genomes. Large-scale recombination events were identified in two newly assembled MCV1 genomes and one MCV2 genome. One recombination event was located in a newly identified recombinant region of the viral genome, and all previously described recombinant regions were re-identified in at least one novel MCV genome. MCV genes comprising the identified recombinant segments have been previously associated with viral interference with host T-cell and NK-cell immune responses. In conclusion, the two most common MCV genotypes emerged along divergent evolutionary pathways from a common ancestor, and the differences in the heterogeneity of MCV1 and MCV2 populations may be attributed to the strictness of the constraints imposed by the host immune response.
Keywords: complete genome; evolution; genetic landscape; immune evasion; molluscum contagiosum virus; recombination
Year: 2018 PMID: 30373153 PMCID: PMC6266040 DOI: 10.3390/v10110586
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Summary of origin, sequencing, and assembly approaches, estimated viral loads, remapping statistics, and genome characteristics of 15 MCV isolates included in the study.
| No. | Viral Genotype | GenBank Acc. No. | Reference | Country of Origin | Sequencing Technique (Platform) | Assembly | Viral Load (Viral Copies/Cell) | Per-base Short Read Depth of Coverage (Mean ± SD) | Percentage of Mapped Short Reads (%) | Genome Length (nt) | ITR Length (nt) | Number of Annotated Genes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | MCV1 | U60315 | Senkevich et al. [ | Unknown | Applied Biosystems AB373A (primer-walking) | / | / | / | / | 190,289 | 4711 | 178 |
| 2 | MCV1 | KY040275 | López-Bueno et al. [ | Spain | Illumina MiSeq (2 × 300 nt) | Short-read | / | / | / | 188,253 | 3821 | 181 |
| 3 | MCV1 | KY040276 | López-Bueno et al. [ | Spain | Illumina MiSeq (2 × 300 nt) | Short-read | / | / | / | 189,098 | 4252 | 179 |
| 4 | MCV1 | KY040277 | López-Bueno et al. [ | Spain | Illumina MiSeq (2 × 300 nt) | Short-read | / | / | / | 188,458 | 3758 | 179 |
| 5 | MCV1 | MH320553 | This study | Slovenia | Illumina HiSeq4000 (2 × 150 nt), ONT | Hybrid | 4237 | 1772.92 ± 282.67 | 12.30 | 187,558 | 3519 | 177 |
| 6 | MCV1 | MH320552 | This study | Slovenia | Illumina HiSeq4000 (2 × 150 nt), ONT | Hybrid | 2527 | 3864.52 ± 526.58 | 26.11 | 187,884 | 3651 | 176 |
| 7 | MCV1 | MH320547 | This study | Slovenia | Illumina HiSeq4000 (2 × 150 nt) | Short-read | 1021 | 2243.29 ± 750.52 | 18.37 | 187,826 | 3559 | 177 |
| 8 | MCV1 | MH320555 | This study | Slovenia | Illumina HiSeq2000 (2 × 150 nt, 2 × 250 nt), ONT | Hybrid | 546,855 | 635.62 ± 208.74 | 87.98 | 189,292 | 4354 | 176 |
| 9 | MCV1 | MH320554 | This study | Slovenia | Illumina HiSeq2000 (2 × 150 nt; 2 × 250 nt), ONT | Hybrid | 40,351 | 581.67 ± 134.87 | 44.27 | 196,781 | 7975 | 175 |
| 10 | MCV2 | KY040274 | López-Bueno et al. [ | Spain | Illumina MiSeq (2 × 300 nt) | Short-read | / | / | / | 192,183 | 4086 | 170 |
| 11 | MCV2 | MH320550 | This study | Slovenia | Illumina HiSeq4000 (2 × 150 nt), ONT | Hybrid | 26,717 | 2913.56 ± 417.96 | 18.53 | 196,206 | 7762 | 170 |
| 12 | MCV2 | MH320548 | This study | Slovenia | Illumina HiSeq4000 (2 × 150 nt) | Short-read | 5226 | 5270.58 ± 1499.21 | 27.27 | 190,319 | 4937 | 170 |
| 13 | MCV2 | MH320556 | This study | Slovenia | Illumina HiSeq4000 (2 × 150 nt) | Short-read | 4573 | 5861.15 ± 622.65 | 39.18 | 189,257 | 4319 | 170 |
| 14 | MCV2 | MH320551 | This study | Slovenia | Illumina HiSeq4000 (2 × 150 nt), ONT | Hybrid | 1828 | 3543.65 ± 546.386 | 24.24 | 192,156 | 5979 | 170 |
| 15 | MCV2 | MH320549 | This study | Slovenia | Illumina HiSeq4000 (2 × 150 nt), ONT | Hybrid | 8727 | 1912.23 ± 416.643 | 13.30 | 193,271 | 6432 | 170 |
SD = standard deviation, nt = nucleotides, ONT = Oxford Nanopore Technologies, ITR = inverted terminal repeats.
Summary of 18 genes that were not found in at least one of the 15 complete MCV genome sequences analyzed. These genes comprise approximately 10% of all MCV genes reported by Senkevich et al. [22].
| Gene | Missing in Genomes (Count) | Missing in Genomes (Sequence No.) | Function/Homologues/Reference |
|---|---|---|---|
| | 3 | 7, 8, 9 | Predicted non-globular protein/MC164L/Senkevich et al. [ |
| | 6 | 10 *, 11, 12, 13, 14, 15 | Unknown/ /Senkevich et al. [ |
| | 2 | 1 *, 4 * | Predicted non-globular protein/ /Senkevich et al. [ |
| | 1 | 1 * | Predicted non-globular protein/ /Senkevich et al. [ |
| | 12 | 3 *, 5, 6, 7, 8, 9, 8, 10 *, 11, 12, 13, 15 | Predicted non-globular protein/ /Senkevich et al. [ |
| | 6 | 3 *, 5, 6, 7, 8, 9 | Unknown/ /Senkevich et al. [ |
| | 8 | 1 *, 2 *, 10 *, 11, 12, 13, 14, 15 | Predicted structural protein/ /Senkevich et al. [ |
| | 6 | 10 *, 11, 12, 13, 14, 15 | Unknown/ /Senkevich et al. [ |
| | 13 | 3 *, 4 *, 5, 6, 7, 8, 9, 10 *, 11, 12, 13, 14, 15 | Predicted structural protein/ /Senkevich et al. [ |
| | 7 | 4 *, 10 *, 11, 12, 13, 14, 15 | Predicted C-terminal transmembrane helix/ /Senkevich et al. [ |
| | 6 | 10 *, 11, 12, 13, 14, 15 | Unknown/ /Senkevich et al. [ |
| | 6 | 10 *, 11, 12, 13, 14, 15 | Predicted long non-globular protein/ /Senkevich et al. [ |
| | 1 | 1 * | Predicted non-globular protein/ /Senkevich et al. [ |
| | 6 | 10 *, 11, 12, 13, 14, 15 | Unknown/ /Senkevich et al. [ |
| | 7 | 6, 10 *, 11, 12, 13, 14, 15 | Unknown/ /Senkevich et al. [ |
| | 1 | 3 * | Unknown/ /Senkevich et al. [ |
| | 7 | 6, 10 *, 11, 12, 13, 14, 15 | Predicted peptide, putative secreted protein/ /NCBI Gene database |
| | 9 | 5, 8, 9, 10 *, 11, 12, 13, 14, 15 | Predicted non-globular protein/MC001R/Senkevich et al. [ |
* indicates MCV genome sequences that were available in GenBank prior to this study.
Figure 1. (Left) Maximum likelihood phylogenetic tree (GTR + I + G) with metric branch lengths and aLRT branch support values constructed based on the alignment of 15 complete MCV genome nucleotide sequences. (Right) Genome-to-genome p-distance plots, depicting a relatively large gap between the genomes of two different MCV genotypes. The phylogenetic tree was visualized using the BioPython Phylo module [57], and visualization of the pairwise p-distance plots was done using the Matplotlib (v2.2.2) Python module [58].
Mean sample to sample p-distances (with and without combinatorial subsampling, balancing) between complete MCV genomes and concatenated sequences of consensus MCV genes, and GC content of the complete MCV genomes and consensus MCV genes interrogated. Fields with underlined boldface text indicate mean distance centroid sequences (minimum mean p-distance to all other MCV genomes/concatenated consensus genes).
| No. | Viral Genotype | Mean p-Distance, Sample vs. All: Genome | Mean p-Distance, Sample vs. All: Genome (Balancing) | Mean p-Distance, Sample vs. All: Consensus Genes | Mean p-Distance, Sample vs. All: Consensus Genes (Balancing) | Mean p-Distance, Intra-Genotype: Genome | Mean p-Distance, Intra-Genotype: Consensus Genes | GC Content: Genome | GC Content: Consensus Genes |
|---|---|---|---|---|---|---|---|---|---|
| 1 | MCV1 | 0.02821 ± 0.02885 | 0.03500 ± 3.3 × 10⁻⁴ | 0.02490 ± 0.02582 | 0.03100 ± 3.0 × 10⁻⁴ | 0.002909 ± 2.546 × 10⁻³ | 0.002314 ± 2.521 × 10⁻³ | 0.6336 | 0.6435 |
| 2 | MCV1 | 0.02793 ± 0.02877 | 0.03471 ± 3.2 × 10⁻⁴ | 0.02484 ± 0.02595 | 0.03100 ± 3.0 × 10⁻⁴ | 0.002730 ± 2.493 × 10⁻³ | 0.002158 ± 2.497 × 10⁻³ | 0.6342 | 0.6333 |
| 3 | MCV1 | 0.02954 ± 0.02465 | 0.03535 ± 9 × 10⁻⁵ | 0.02690 ± 0.02138 | 0.03193 ± 5 × 10⁻⁵ | 0.007318 ± 2.658 × 10⁻³ | 0.007500 ± 2.675 × 10⁻³ | 0.6338 | 0.6430 |
| 4 | MCV1 | 0.02827 ± 0.02966 | 0.03526 ± 2.5 × 10⁻⁴ | 0.02504 ± 0.02642 | 0.03127 ± 3.0 × 10⁻⁴ | 0.002317 ± 1.959 × 10⁻³ | 0.001969 ± 2.675 × 10⁻³ | 0.6345 | 0.6433 |
| 5 | MCV1 | 0.02822 ± 0.02991 | 0.03527 ± 3.2 × 10⁻⁴ | 0.02497 ± 0.02657 | 0.03123 ± 3.0 × 10⁻⁴ | | 0.001795 ± 2.418 × 10⁻³ | 0.6341 | 0.6431 |
| 6 | MCV1 | 0.02824 ± 0.02987 | 0.03528 ± 3.2 × 10⁻⁴ | 0.02500 ± 0.02660 | 0.03127 ± 3.0 × 10⁻⁴ | 0.002152 ± 2.411 × 10⁻³ | 0.001794 ± 2.417 × 10⁻³ | 0.6339 | 0.6432 |
| 7 | MCV1 | 0.02824 ± 0.02989 | 0.03529 ± 3.2 × 10⁻⁴ | 0.02512 ± 0.02658 | 0.03138 ± 3.0 × 10⁻⁴ | 0.002140 ± 2.408 × 10⁻³ | 0.001919 ± 2.440 × 10⁻³ | 0.6340 | 0.6431 |
| 8 | MCV1 | 0.02823 ± 0.02982 | 0.03526 ± 3.3 × 10⁻⁴ | 0.02496 ± 0.02657 | 0.03122 ± 3.0 × 10⁻⁴ | 0.002181 ± 2.445 × 10⁻³ | | 0.6332 | 0.6430 |
| 9 | MCV1 | 0.02798 ± 0.02881 | 0.03477 ± 3.3 × 10⁻⁴ | 0.02482 ± 0.02598 | 0.03094 ± 3.0 × 10⁻⁴ | 0.002736 ± 2.547 × 10⁻³ | 0.002122 ± 2.508 × 10⁻³ | 0.6312 | 0.6434 |
| 10 | MCV2 | 0.03999 ± 0.02877 | | 0.03552 ± 0.02551 | 0.03035 ± 2.0 × 10⁻⁴ | 0.001233 ± 2.271 × 10⁻³ | 0.001168 ± 2.410 × 10⁻³ | 0.6432 | 0.6524 |
| 11 | MCV2 | 0.04005 ± 0.02877 | 0.03421 ± 2.3 × 10⁻⁴ | 0.03557 ± 0.02554 | 0.03039 ± 2.0 × 10⁻⁴ | 0.001263 ± 2.256 × 10⁻³ | 0.001173 ± 2.408 × 10⁻³ | 0.6403 | 0.6524 |
| 12 | MCV2 | 0.04002 ± 0.02881 | 0.03418 ± 2.4 × 10⁻⁴ | 0.03557 ± 0.02554 | 0.03040 ± 2.0 × 10⁻⁴ | | 0.001177 ± 2.415 × 10⁻³ | 0.6438 | 0.6523 |
| 13 | MCV2 | 0.04271 ± 0.02710 | 0.03721 ± 9 × 10⁻⁵ | 0.03866 ± 0.02389 | 0.03380 ± 7 × 10⁻⁵ | 0.005307 ± 2.379 × 10⁻³ | 0.005506 ± 2.464 × 10⁻³ | 0.6441 | 0.6518 |
| 14 | MCV2 | 0.04002 ± 0.02880 | 0.03418 ± 2.4 × 10⁻⁴ | 0.03557 ± 0.02554 | | 0.001231 ± 2.263 × 10⁻³ | | 0.6424 | 0.6523 |
| 15 | MCV2 | 0.04004 ± 0.02850 | 0.03426 ± 2.3 × 10⁻⁴ | 0.03560 ± 0.02539 | 0.03045 ± 2.0 × 10⁻⁴ | 0.001580 ± 2.314 × 10⁻³ | 0.001365 ± 2.4426 × 10⁻³ | 0.6414 | 0.6523 |
The data dispersion term is given as standard deviation.
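The quantities reported in the table above (p-distance, GC content, and the mean-distance centroid) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy sequences, and the choice to skip alignment columns containing a gap are assumptions made only for illustration.

```python
def p_distance(a, b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Assumption: columns where either sequence has a gap are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    bases = [c for c in seq.upper() if c in "ACGT"]
    return sum(c in "GC" for c in bases) / len(bases) if bases else 0.0


def centroid(aligned):
    """Return (name, means): the sequence with the minimum mean
    p-distance to all other sequences, plus every mean."""
    means = {}
    for name, seq in aligned.items():
        others = [p_distance(seq, s) for n, s in aligned.items() if n != name]
        means[name] = sum(others) / len(others)
    return min(means, key=means.get), means


# Hypothetical toy alignment, not real MCV data.
toy = {
    "g1": "ACGTACGTAC",
    "g2": "ACGTACGTAT",
    "g3": "ACGTACGAAT",
}
best, means = centroid(toy)
print(best, means)
```

In this toy alignment `g2` differs from each of the other two sequences at a single site, so it has the smallest mean p-distance and is reported as the centroid.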
Figure 2. First-order linkage maps of the MCV genomes interrogated, where each genome is represented as a colored node. Nodes are colored according to the MCV genotype (blue = MCV1; red = MCV2). Edges connect MCV genomes according to their nearest neighbors based on pairwise nucleotide sequence similarities (linkage) in different contexts (colored: black, blue, red, green, and purple). Black edges connect MCV genomes according to their linkage in the complete genome alignment. Blue edges represent linkage according to concatenated alignments of consensus genes. Linkage in individual genes is represented with green (intra-genotype) and purple (inter-genotype) edges. Counts of relevant neighboring MCV genes supporting each gene edge versus MCV genes that exhibit variation in the alignment of the relevant context (intra- and inter-genotype) are shown above or below the genome identifiers. Visualization was prepared using the Matplotlib (v2.2.2) Python module [58].
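The nearest-neighbor linkage described in this caption can be sketched as follows. This is only an illustration under stated assumptions: similarity is taken as one minus the fraction of mismatching alignment columns, ties are kept as multiple edges, and the function names and toy sequences are hypothetical rather than taken from the study.

```python
def mismatch_fraction(a, b):
    """Fraction of alignment columns at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def first_order_edges(aligned):
    """Connect every sequence to its nearest neighbor(s).

    Returns a set of undirected edges (sorted name pairs).
    Assumption: all neighbors tied at the minimum distance are linked.
    """
    edges = set()
    for name, seq in aligned.items():
        dists = {
            other: mismatch_fraction(seq, other_seq)
            for other, other_seq in aligned.items()
            if other != name
        }
        best = min(dists.values())
        for other, d in dists.items():
            if d == best:
                edges.add(tuple(sorted((name, other))))
    return edges


# Hypothetical toy alignment, not real MCV data.
toy = {
    "g1": "ACGTACGTAC",
    "g2": "ACGTACGTAT",
    "g3": "ACGTACGAAT",
}
print(sorted(first_order_edges(toy)))
```

Running the same function on a whole-genome alignment, a concatenated consensus-gene alignment, and individual gene alignments would give one edge set per context, which is the kind of comparison the colored edges in the figure convey.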
Figure 3. Box and whisker plots of maximum p-distances observed in complete gene codon multiple nucleotide sequence alignments (MSAs) and intra-genotype codon MSAs (MCV1, MCV2); “-w/R” suffixes indicate exclusion of recombinant genes. Orange boxes represent 95% CI of the median (red line), as determined by 1000 bootstrap iterations; means are shown as red diamonds. Whiskers encode the data range and extend between the fifth and 95th percentile of data; data points above or below this range are shown as green circles. Colored lines connect maximum p-distance points of genes that lie above the 95th percentile (recombinant genes excluded) in six different contexts. The figure indicates considerably lower p-distances in the intra-genotype context, compared to the overall context. Most of the anomalously high intra-genotype p-distances can be explained by recombination, whereas the highest p-distances in the overall context can mainly be attributed to MCV genotype divergence (the same genes are closely related in the intra-genotype context). The per-gene maximum dissimilarity measure suggested another possible recombination event between a known (MCV1) and an unknown MCV genotype in MC149.1R (the remaining outlying point after decoupling recombination in context MCV1-w/R), although this recombination event could not be confirmed by inspection of phylogenetic trees based on nucleotide and/or codon MSAs, nor could the recombination breakpoints be elucidated by any of the recombination detection methods employed by RDP4 [53]. Visualization was carried out using the Matplotlib (v2.2.2) Python module [58].
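A bootstrap confidence interval of the median, as mentioned in this caption, can be sketched with the Python standard library. The percentile method, the fixed seed, and the toy values below are assumptions for illustration; the caption does not state how the interval was computed beyond the 1000 iterations.

```python
import random
import statistics


def bootstrap_median_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI of the median.

    Resamples `values` with replacement `n_boot` times and returns the
    (alpha/2, 1 - alpha/2) percentiles of the resampled medians.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = medians[int((alpha / 2) * n_boot)]
    upper = medians[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical per-gene maximum p-distances, not values from the study.
max_p = [0.01, 0.02, 0.02, 0.03, 0.05, 0.08, 0.13]
print(bootstrap_median_ci(max_p))
```

Matplotlib's `boxplot` also accepts `bootstrap` and `whis` arguments, which may be a more direct route when the interval is only needed for plotting.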
Figure 4. Maximum likelihood phylogenetic trees (GTR + I + G) of recombinant genes grouped according to recombinant segments. Phylogenetic trees are annotated with gene designations, lengths of gene alignments (N), and minimum values of silhouette coefficients calculated from gene alignments (min(S)). Branches are equipped with branch support values (red) and branch lengths (black). Tree branches (wherever not dotted) are metric. Sample names of recombinant end nodes are highlighted with transparent red rectangles. Phylogenetic trees were visualized using the ETE3 toolkit [47].
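The min(S) annotation refers to silhouette coefficients. A minimal sketch of the standard silhouette formula, s = (b − a) / max(a, b), is given below, where a is the mean distance to members of the sample's own cluster and b is the smallest mean distance to any other cluster. It assumes that clusters correspond to MCV genotypes and that distances are simple mismatch fractions; both are assumptions for illustration, as are the names and toy sequences.

```python
def mismatch_fraction(a, b):
    """Fraction of alignment columns at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def silhouettes(aligned, labels):
    """Silhouette coefficient per sequence for the given cluster labels."""
    scores = {}
    for name, seq in aligned.items():
        by_cluster = {}
        for other, other_seq in aligned.items():
            if other != name:
                d = mismatch_fraction(seq, other_seq)
                by_cluster.setdefault(labels[other], []).append(d)
        own = by_cluster.pop(labels[name], [])
        if not own or not by_cluster:
            scores[name] = 0.0  # convention for singletons or a single cluster
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        scores[name] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return scores


# Hypothetical toy gene alignment: "a3" is labeled as genotype A but
# carries a genotype-B-like sequence, mimicking a recombinant gene.
toy = {
    "a1": "AAAAAAAAAA",
    "a2": "AAAAAAAAAC",
    "a3": "CCCCCAAAAA",
    "b1": "CCCCCAAAAA",
    "b2": "CCCCCAAAAC",
}
labels = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
scores = silhouettes(toy, labels)
print(min(scores, key=scores.get), min(scores.values()))
```

A strongly negative minimum flags a sequence that sits closer to the other cluster than to its own, which is the pattern expected for a gene acquired from the other genotype.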
Figure 5. Schematic alignment of MCV genomes, depicting positions of recombinant segments. Individual recombinant segments are annotated and enumerated by position (RS1-3) and event number (in order of appearance: RS1.E1, RS1.E2, etc.). Individual recombination event annotations are structured in the following format: Recombinant segment (RS#), number of the individual event (.E#): affected genes; predicted MCV recombination donor; and location of the recombinant region in the genome (location of the recombinant region in an alignment). Semi-transparent bands indicate alignment positions of putative recombination hotspots.
Figure 6. (Left) Maximum likelihood phylogenetic tree (GTR + I + G) with metric branch lengths and aLRT branch support values constructed based on the alignment of 15 complete MCV genome nucleotide sequences that have been stripped of the recombinant regions. (Right) Genome-to-genome p-distance plots after removal of identified recombinant regions, depicting a relatively large gap between the genomes of two different MCV genotypes. The phylogenetic tree was visualized using the BioPython Phylo module [57], and visualization of the pairwise p-distance plots was done using the Matplotlib (v2.2.2) Python module [58].