Paul D Ling1, Simon Y Long2, Angela Fuery1, Rong-Sheng Peng1, Sarah Y Heaggans2, Xiang Qin3, Kim C Worley3, Shannon Dugan3, Gary S Hayward2.
Abstract
A novel group of mammalian DNA viruses called elephant endotheliotropic herpesviruses (EEHVs) belonging to the Proboscivirus genus has been associated with nearly 100 cases of highly lethal acute hemorrhagic disease in young Asian elephants worldwide. The complete 180-kb genomes of prototype strains from three AT-rich branch viruses, EEHV1A, EEHV1B, and EEHV5, have been published. However, less than 6 kb of DNA sequence each from EEHV3, EEHV4, and EEHV7 showed them to be a hugely diverged second major branch with GC-rich characteristics. Here, we determined the complete 206-kb genome of EEHV4(Baylor) directly from trunk wash DNA by next-generation sequencing and de novo assembly procedures. Among a total of 119 genes with an overall colinear organization similar to those of the AT-rich EEHVs, major features of EEHV4 include a family of 26 paralogous 7xTM and vGPCR-like genes plus 25 novel or missing genes. The genome also contains an unusual distribution of tracts of 5 to 11 successive A or T nucleotides in intergenic domains between the mostly much higher GC content protein coding regions. Furthermore, an extremely high GC-rich bias in the third wobble position of codons clearly delineates the coding regions for many but not all proteins. There are also two novel captured cellular genes, including a C-type lectin (vECTL) and an O-linked acetylglucosamine transferase (vOGT), as well as an unusually large and complex Ori-Lyt dyad symmetry domain. Finally, 30 kb from a second strain proved to include three small chimeric domains, indicating the existence of distinct EEHV4A and EEHV4B subtypes.
IMPORTANCE Multiple species of herpesviruses from three different lineages of the Proboscivirus genus (EEHV1/6, EEHV2/5, and EEHV3/4/7) infect both Asian and African elephants, but lethal hemorrhagic disease is largely confined to Asian elephant calves and is predominantly associated with EEHV1. Milder disease caused by EEHV5 or EEHV4 is being increasingly recognized as well, but little is known about the latter, which is estimated to have diverged at least 35 million years ago from the others within a distinctive GC-rich branch of the Proboscivirus genus. Here, we have determined the complete genomic DNA sequence of a strain of EEHV4 obtained from a trunk wash sample collected from a surviving Asian elephant calf undergoing asymptomatic shedding during convalescence after an acute hemorrhagic disease episode. This represents the first example from among the three known GC-rich branch Proboscivirus species to be assembled and fully annotated. Several distinctive features of EEHV4 compared to AT-rich branch genomes are described.
Keywords: Elephas maximus calf; G-plus-C nucleotide content bias; acute hemorrhagic disease; elephant endotheliotropic herpesviruses; evolutionary divergence; trunk wash shedding
Year: 2016 PMID: 27340695 PMCID: PMC4911795 DOI: 10.1128/mSphere.00081-15
Source DB: PubMed Journal: mSphere ISSN: 2379-5042 Impact factor: 4.389
FIG 1 Annotated physical gene map of the complete EEHV4(Baylor) genome. The intact 206-kb EEHV4B(Baylor) genome determined here (GenBank accession no. KT832477) is depicted to scale. The relative sizes and orientations of all predicted open reading frames (ORFs) are indicated by horizontal arrows. Gene nomenclature is shown below each of the ORFs. The color key indicates groups of ORFs shared between all herpesviruses or subsets of herpesvirus subfamilies or multiple paralogues of repetitive gene families. Gray arrows indicate captured cellular genes, and white arrows denote novel genes that do not have obvious orthologues outside of the probosciviruses. Thin lines connecting arrows indicate introns. The position of the putative lytic replication origin is marked by a black rectangle.
Gene content and major features of the complete 205,896-bp EEHV4(Baylor) genome
| Gene name and orientation | Protein name | Type | Family or status | Position coordinates | % GC content | Protein size (aa) | % amino acid identity (% length matched) to: EEHV1A Kimba | to: EEHV1A Raman | to: EEHV1B Emelia | to: EEHV5 Vijay | Note or comment |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TR | 3.5× 22-bp | 340–420 | Related to multimerized 17-bp repeats in TR of EEHV1A/1B/5 | ||||||||
| TR | Regulatory motifs | 1070–1410 | All have a cluster of 6–9× palindromic (8-bp) CREB-binding sites | ||||||||
| Nil | vFUT9 | Novel | E47 | EE63 | EE63 | EE63 | Absent in EEHV4 | ||||
| Nil | 7xTM | Novel | Nil | Nil | Nil | EE62B | Unique to EEHV5 | ||||
| Nil | IgFam | Nil | Nil | Nil | EE62A | Unique to EEHV5 | |||||
| Nil | vGPCR7 | 7xTM | E3fam | E48 | EE62 | Nil | EE62 | Absent in EEHV4 | |||
| Nil | E49fam | E49 | EE61 | Nil | Frag | Absent in EEHV4 | |||||
| Nil | vIgF1 | Novel | E50 | EE60 | Nil | Nil | Absent in EEHV4, -5 outside the probosciviruses | ||||
| Nil | vGPCR8 | 7xTM | E3fam | Frag | Frag | EE59 | Nil | Absent in EEHV4 | |||
| Nil | E49fam | E51 | EE58 | EE58 | Nil | Absent in EEHV4 | |||||
| Nil | vIgF2 | IgFam | E52 | EE57 | Frag | Nil | Absent in EEHV4 | ||||
| Nil | E49fam | Nil | Nil | EE56 | Nil | Absent in EEHV4 | |||||
| Nil | IgFam | Nil | Nil | EE55 | Nil | Absent in EEHV4 | |||||
| Nil | IgFam | Nil | Nil | EE54 | Nil | Absent in EEHV4 | |||||
| Nil | vIgF2.4 | IgFam | Frag | EE53 | EE53 | Nil | Absent in EEHV4 | ||||
| Nil | vIgF2.5 | IgFam | E53 | EE52 | EE52 | EE52 | Absent in EEHV4 | ||||
| Nil | vOX2-1 | Novel | E54 | EE51 | EE51 | EE51 | Absent in EEHV4 | ||||
| Nil | vIgF3 | IgFam | E55 | EE50 | EE50 | EE50 | Absent in EEHV4 | ||||
| Nil | vCD48 | IgFam | Nil | Nil | Nil | EE49D | Unique to EEHV5 | ||||
| Nil | IgFam | Nil | Nil | Nil | EE49C | Unique to EEHV5 | |||||
| Nil | vCD48 | IgFam | Nil | Nil | Nil | EE49B | Unique to EEHV5 | ||||
| Nil | IgFam | Nil | Nil | Nil | EE49A | Unique to EEHV5 | |||||
| E1, F | 7xTM | E3fam | 2061–3608 | 69 | 515 | 25 (45) | EE49 | EE49 | EE49 | N-term S/T extended | |
| Nil | Cys-rich | 7xTM | E3fam | E2 | EE48 | EE48 | EE48 | Absent in EEHV4 | |||
| E3, F | vGPCR6 | 7xTM | E3fam | 3981–4928 | 68 | 315 | 30 (70) | EE47 | EE47 | EE47 | Match to RAIP3 or C-5-Afam |
| E3.1, F | vGPCR6.1 | 7xTM | E3fam | 4991–6169 | 62 | 392 | 28 (60) | EE45 | EE45 | EE45 | Unique to EEHV4 |
| E3.2, F | vGPCR6.2 | 7xTM | E3fam | 6747–7727 | 67 | 328 | 30 (45) | EE45 | EE45 | EE45 | Unique to EEHV4 |
| E2A, F | 7xTM | Novel | 8015–9295 | 58 | 426 | Nil | Nil | Nil | Nil | Unique to EEHV4; S/T dom | |
| E3.3, F | vGPCR6.3 | 7xTM | E3fam | 9714–10721 | 50 | 335 | 34 (41) | EE45 | EE45 | EE45 | Unique to EEHV4 |
| E3.4, F | vGPCR6.4 | 7xTM | E3fam | 11637–12674 | 56 | 345 | 38 (69) | EE45 | EE45 | EE45 | Unique to EEHV4 |
| E4, F | vGCNT1 | AcTransf | Novel | 13026–14642 | 63 | 538 | 61 (68) | EE46 | EE46 | EE46 | — |
| Nil | vGPCR5 | 7xTM | E3fam | E5 | EE45 | EE45 | EE45 | Absent in EEHV4 | |||
| Nil | Novel | E5A | EE44 | EE44 | Nil | Short, memb, absent EEHV4 | |||||
| Nil | vCD48 | IgFam | Nil | Nil | Nil | EE44A | Unique to EEHV5 | ||||
| E4B, F | Novel | 15712–16578 | 56 | 288 | Nil | Nil | Nil | Nil | Unique to EEHV4 | ||
| E4C, F | Novel | 16779–17234 | 54 | 151 | Nil | Nil | Nil | Nil | Unique to EEHV4 | ||
| E6A, C | E27ex1 | Novel | 17811–17398 | 57 | 137 | E27 | EE20 | Nil | EE20 | 35% (44%) match to E27 | |
| E6B, F | Novel | 17978–18256 | 49 | 92 | Nil | Nil | Nil | Nil | Unique to EEHV4 | ||
| E6, C | 7xTM | E6fam | 19679–18855 | 60 | 274 | 33 (89) | EE43 | EE43 | EE43 | ||
| Nil, C | vCXCL2? | E7A | Nil | Nil | Nil | Absent in EEHV1B/4/5 | |||||
| E7, C | 7xTM | E6fam | 20663–19956 | 55 | 235 | 32 (84) | EE42 | EE42 | EE42 | ||
| E10A, C | 7xTM | E6fam | 26078–25197 | 61 | Nil | Nil | Nil | 25 (46) | Matches central EE40(EEHV5) only | ||
| Nil | 7xTM | E6fam | E10 | EE39 | EE39 | EE39 | Absent in EEHV4 | ||||
| E11, C | 7xTM | E6fam | 27127–26363 | 61 | 254 | 36 (96) | EE38 | EE38 | EE38 | ||
| E12, C | 7xTM | E6fam | 28314–27457 | 60 | 285 | 34 (77) | EE37 | EE37 | EE37 | ||
| E12A, C | Novel | 28547–28281 | 60 | 88 | Nil | Nil | Nil | Nil | Unique to EEHV4 | ||
| E13, C | 7xTM | E6fam | 29780–28965 | 58 | 271 | 52 (88) | EE36 | EE36 | EE36 | ||
| E14.1, C | 7xTM | E14fam | 31024–30215 | 59 | 269 | 27 (91) | Nil | Nil | Nil | Duplication of E14 | |
| E14.2, C | 7xTM | E14fam | 32191–31280 | 52 | 303 | 24 (77) | Nil | Nil | Nil | Duplication of E14 | |
| E14, C | 7xTM | E14fam | 33228–32419 | 57 | 269 | 26 (79) | EE35 | EE35 | EE35 | ||
| E15, C | vGPCR4 | 7xTM | E15fam | 34435–33425 | 58 | 336 | 34 (86) | EE34 | EE34 | EE34 | 26% (49%) match to Lox C-5-C |
| E16, C | 7xTM | E14fam | 35571–34768 | 57 | 267 | 40 (97) | EE33 | EE33 | EE33 | ||
| Nil | Novel | E16C | Nil | Nil | Nil | Conserved in EEHV1 and EEHV5 | |||||
| Nil | Novel | E16A/B | Nil | Nil | Nil | Spliced; unique to EEHV1A/B | |||||
| E17A, F | Novel | 36944–37252 | 63 | 102 | Nil | Nil | Nil | Nil | Unique to EEHV4 | ||
| Nil | Novel | Nil | Nil | Nil | EE32A | Unique to EEHV5 | |||||
| E18, F | 7xTM | E18fam | 37215–38018 | 57 | 267 | 32 (70) | EE31 | EE31 | EE31 | Related to E28 by 30% (51%) | |
| Nil | Novel | E18B | EE30A | EE30A | EE30A | Absent in EEHV4 | |||||
| Nil | Novel | E18A | EE30 | EE30 | EE30 | Absent in EEHV4 | |||||
| E18C, F | Novel | 38433–38720 | 59 | 94 | Nil | Nil | Nil | Nil | Unique to EEHV4 | ||
| E19, F | ORF-F2 | U54.5fam | 39303–41042 | 68 | 579 | 52 (89) | EE29 | EE29 | EE29 | 25% (77%) to ORF-F1 | |
| E20, C | vGPCR4A | 7xTM | E15fam | 43188–42154 | 57 | 344 | 46 (84) | EE28 | EE28 | EE28 | 23% (67%) match to Lox RAIP3 |
| E20B, C | Novel | 45323–44904 | 65 | 139 | Nil | Nil | Nil | Nil | Unique to EEHV4 | ||
| E20A, F | Novel | 45338–45664 | 61 | 108 | 43 (33) | EE27 | EE27 | EE27 | |||
| E21, C | vGPCR4B | 7xTM | E15fam | 46925–45810 | 59 | 371 | 35 (76) | EE26 | EE26 | EE26 | 27% (35%) match to Lox RAIP3 |
| E22, F | Novel | 47646–47921 | 49 | 91 | 48 (96) | EE25 | EE25 | EE25 | |||
| E22A, F | Novel | 48621–48730 | 53 | 79 | 46 (48) | EE24 | EE24 | EE24 | |||
| E23B, C | Novel | 48993–49325 | 57 | 110 | Nil | Nil | Nil | Nil | Unique to EEHV4 | ||
| E24B, C | vOX2-Bex2 | Novel | 49596–49250 | 56 | 132 | E54 | EE51 | EE51 | EE51 | ||
| vOX2-Bex1 | Novel | 50001–49950 | Short first exon | ||||||||
| Nil | vOX2-3 | E24 | EE23 | EE23 | EE23 | Absent in EEHV4 | |||||
| Nil | vOX2-V (E23A) | Nil | Nil | Nil | EE22A | Unique to EEHV5 | |||||
| Nil | vOX2-2 | E25 | EE22 | EE22 | EE22 | Absent in EEHV4 | |||||
| E26, C | vGPCR3 | 7xTM | E3fam | 51287–50418 | 50 | 289 | 42 (92) | EE21 | EE21 | EE21 | Match to ChemR C-5-C |
| E27, F | E27ex1 | E27 | Novel | 52138–52606 | 55 | 245 | 57 (58) | EE20ex1 | EE20ex1 | EE20ex1 | Related to E6A, E17 |
| E27ex2 | Novel | 52785–53053 | 59 | Nil | Nil | Nil | Nil | Unrelated to EE20ex2 | |||
| E28, F | 7xTM | E18fam | 53115–53861 | 54 | 248 | 44 (90) | EE19 | EE19 | EE19 | Related to E18 by 30% (51%) | |
| E29, F | 7xTM | Novel | 54092–54781 | 55 | 229 | 42 (91) | EE18 | EE18 | EE18 | ||
| E30, C | Novel | 55453–54911 | 55 | 180 | <15 | EE17 | EE17 | EE17 | Acidic similarity only | ||
| Nil | E31 | EE16 | EE16 | EE16 | Absent in EEHV4 | ||||||
| E31A, C | Novel | 56923–56321 | 56 | 200 | 35 (46) | EE15 | EE15 | EE15 | Only N-term cons | ||
| E31B, C | Novel | 57113–56610 | Nil | Nil | Nil | Nil | Unique to EEHV4 | ||||
| E31C, C | E31Cex | Novel | 57646–57182 | 61 | 136 | 65 (12) | EE14 | EE14 | EE14 | No ATG, splice to E32? | |
| E32, C | U14.5 | βδ? | 60473–57717 | 60 | 918 | 45 (82) | EE13 | EE13 | EE13 | ||
| Nil, F | E33 | EE12A | Frag | Nil | Unique to EEHV1A | ||||||
| E33A | Novel | 60991–61236 | 50 | 81 | 37 (59) | EE12 | EE12 | EE12 | |||
| U14, C | U14 | βδ | 63078–61408 | 59 | 556 | 37 (75) | U14 | U14 | U14 | ||
| U13.5, C | U13.5 | βδ | 64742–63444 | 52 | 432 | 76 (54) | UL34 | UL34 | UL34 | ||
| U12, C | vGPCR2ex2 | 7xTM | βδ | 67327–65060 | 57 | 783 | 50 (53) | U12 | U12 | U12 | |
| vGPCR2ex1 | 67537–67454 | 56 (100) | U12 | U12 | U12 | Short first exon | |||||
| E34, F | ORF-C | Novel | 68025–74276 | 59 | 2,083 | 42 (16) | U11 | U11 | U11 | Only N-term cons in EEHV1 and -5 | |
| U4, F | U4 | U4 | βδ | 74389–76053 | 61 | 554 | 58 (94) | U4 | U4 | U4 | 24% (38%) HHV6 U4 |
| U4.5, F | ORF-B | U4 | βδ | 76687–78444 | 63 | 585 | 59 (93) | EE11 | EE11 | EE11 | 24% (30%) U4 |
| E35, F | ORF-A | Novel | 79137–81659 | 63 | 840 | 51 (46) | EE10 | EE10 | EE10 | ||
| U44, C | U44 | Core | 82518–83729 | 55 | 403 | 77 (23) | U44 | U44 | U44 | Only C-term cons in EEHV1 and -5 | |
| U43, F | PRI | Core | 83629–87234 | 59 | 1,201 | 51 (84) | U43 | U43 | U43 | Primase subunit | |
| U42, F | MTAex1 | 87495–87627 | 65 (51) | U42 | U42 | U42 | Short first exon | ||||
| MTAex2 | Core | 87948–92020 | 64 | 1,401 | 52 (24) | U42 | U42 | U42 | Posttranscriptional regulator | ||
| Ori-Lyt | 92346–93325 | — | |||||||||
| U41, F | MDBP | Core | 93861–97376 | 61 | 1,171 | 63 (99) | U41 | U41 | U41 | SS DNA binding protein | |
| U40, F | TER2 | Core | 97498–99591 | 59 | 697 | 73 (98) | U40 | U40 | U40 | ||
| U39, F | gB | Core | 99533–102121 | 57 | 862 | 64 (94) | U39 | U39 | U39 | Env glycoprotein B | |
| U38, F | POL | Core | 102273–105527 | 62 | 1,084 | 65 (99) | U38 | U38 | U38 | DNA polymerase | |
| U37, C | DOC | Core | 106545–105740 | 59 | 268 | 64 (97) | U37 | U37 | U37 | Docking protein | |
| U36, C | Core | 108154–106538 | 63 | 538 | 69 (89) | U36 | U36 | U36 | |||
| U35, F | Core | 108266–108553 | 48 | 95 | 69 (98) | U35 | U35 | U35 | |||
| U34, F | Core | 108717–109619 | 53 | 300 | 63 (99) | U34 | U34 | U34 | |||
| U33, F | CRP | βγδ | 109914–111518 | 64 | 534 | 49 (91) | U33 | U33 | U33 | Cys-rich protein | |
| U32, F | SCP | Core | 111409–111672 | 58 | 87 | 46 (67) | U32 | U32 | U32 | Small capsid protein | |
| U31, C | TEG-L | Core | 118856–111873 | 65 | 2,321 | 44 (89) | U31 | U31 | U31 | Large tegument | |
| U30, C | TEG-S | Core | 124420–119289 | 61 | 1,713 | 45 (53) | U30 | U30 | U30 | Small tegument | |
| U29, F | TRI1 | Core | 124423–125313 | 60 | 296 | 60 (98) | U29 | U29 | U29 | Capsid triplex 1 | |
| U28, F | RRA | Core | 125563–128073 | 61 | 836 | 68 (66) | U28 | U28 | U28 | Ribonucleotide reductase A | |
| U27.5, F | RRB (ORF-H) | αγδ | 128212–129117 | 53 | 301 | 75 (99) | EE9 | EE9 | EE9 | Ribonucleotide reductase B | |
| U27, F | PPF | Core | 129702–131023 | 64 | 437 | 52 (65) | U27 | U27 | U27 | Pol processivity factor | |
| U45.7, F | ORF-J | Novel | 131035–131784 | 58 | 216 | 44 (33) | EE8 | EE8 | EE8 | ||
| U48.5, C | TK (ORF-E) | αγδ | 136101–135052 | 56 | 349 | 50 (87) | EE7 | EE7 | EE7 | Thymidine kinase | |
| U49, F | Core | 136100–136801 | 57 | 233 | 48 (85) | U49 | U49 | U49 | |||
| U50, F | PAC2 | Core | 136620–138347 | 57 | 575 | 64 (99) | U50 | U50 | U50 | Packaging | |
| U51, F | vGPCR1 | 7xTM | βδ | 138430–139647 | 56 | 405 | 42 (95) | U51 | U51 | U51 | — |
| U52, C | Core | 140605–139832 | 53 | 257 | 65 (98) | U52 | U52 | U52 | |||
| U53, F | SCA/PRO | Core | 140698–142485 | 60 | 595 | 49 (91) | U53 | U53 | U53 | Scaffold protease | |
| U54.5, C | ORF-F1 | U54.5fam | 144154–142715 | 61 | 479 | 38 (99) | U54 | U54 | U54 | 27% (95%) match to ORF-F2 | |
| U56, C | TRI2 | Core | 145344–144445 | 59 | 299 | 68 (99) | U56 | U56 | U56 | Capsid triplex 2 | |
| U57, C | MCP | Core | 149563–145511 | 63 | 1,350 | 71 (99) | U57 | U57 | U57 | Major capsid protein | |
| U58, F | vTBP | βγδ | 150117–153134 | 61 | 1,005 | 63 (87) | U58 | U58 | U58 | TATA-binding protein | |
| U59, F | βγδ | 152794–154152 | 62 | 452 | 48 (79) | U59 | U59 | U59 | |||
| U60, C | TERex3 | Core | 155504–154377 | 57 | 660 | 92 (99) | U60 | U60 | U60 | Terminase subunit 1 | |
| U62, F | βγδ | 155733–156005 | 54 | 90 | 57 (97) | U62 | U62 | U62 | |||
| U63, F | βγδ | 155944–156546 | 51 | 200 | 67 (72) | U63 | U63 | U63 | |||
| U64, F | PAC1 | Core | 156527–158552 | 64 | 541 | 48 (63) | U64 | U64 | U64 | Packaging | |
| U65, F | Core | 158055–159071 | 59 | 338 | 48 (98) | U65 | U65 | U65 | |||
| U66, C | TERex2 | Novel | 159272–159153 | 90 (100) | U66 | U66 | U66 | Terminase subunit 1 | |||
| TERex1 | Core | 160224–159490 | 53 | 89 (99) | U66 | U66 | U66 | Terminase subunit 1 | |||
| U67, F | βγδ | 160612–161742 | 58 | 376 | 61 (98) | U67 | U67 | U67 | |||
| U68, F | Core | 161739–162104 | 51 | 121 | 66 (98) | U68 | U68 | U68 | |||
| U69, F | CPK | Core | 162613–164232 | 59 | 539 | 57 (96) | U69 | U69 | U69 | Conserved protein kinase | |
| U70, F | EXO | Core | 164529–166094 | 61 | 521 | 53 (97) | U70 | U70 | U70 | Exonuclease | |
| U71, F | MyrTeg | Core | 166031–166342 | 56 | 103 | 41 (67) | U71 | U71 | U71 | Myristylated tegument | |
| U72, C | gM | Core | 167655–166537 | 55 | 372 | 66 (93) | U72 | U72 | U72 | Envelope glycoprotein M | |
| U73, F | OBP (ORF-G) | αδ | 168094–171645 | 61 | 1,183 | 65 (68) | U73 | U73 | U73 | Origin-binding protein | |
| U74, F | PAF | Core | 171659–173830 | 63 | 723 | 61 (91) | U74 | U74 | U74 | Pol-associated factor | |
| U75, C | Core | 174625–173813 | 63 | 270 | 54 (84) | U75 | U75 | U75 | |||
| U76, C | POR | Core | 176750–174579 | 63 | 723 | 74 (78) | U76 | U76 | U76 | Portal protein | |
| U77, F | HEL | Core | 176701–179574 | 63 | 957 | 79 (79) | U77 | U77 | U77 | Helicase subunit | |
| E36, F | ORF-M | Novel | 180833–183805 | 65 | 990 | 62 (19) | U79 | U79 | U79 | Env glycoprotein M (only N-term cons) | |
| Nil | ORF-N, vCXCL1 | Novel | E36A | EE6 | Nil | EE6 | Chemokine-like, absent in 1B, 4 | ||||
| U81, C | UDG | Core | 185284–184277 | 63 | 335 | 68 (68) | U81 | U81 | U81 | Uracil DNA glycosylase | |
| U82, C | gL | Core | 186080–185253 | 48 | 275 | 37 (94) | U82 | U82 | U82 | Env glycoprotein L | |
| E40, C | ORF-K | Novel | 194372–189975 | 66 | 1,465 | 72 (16) | EE2 | EE2 | EE2 | Only C-term cons | |
| E44A, C | ORF-S | Novel | 200795–199809 | 64 | 328 | 36 (84) | EE1A | EE1A | EE1A | Overlaps ORF-L | |
| E44, C | ORF-L IE-like | Novel | 201282–195226 | 65 | 2,018 | 52 (11) | EE1 | EE1 | EE1 | Transcriptional regulator | |
| TR | Palindrome | 202971–203014 | 45-bp hairpin | ||||||||
| TR | Packaging motifs | 205612–205665 | — | ||||||||
| TR | Packaging motifs | 205784–205896 | — | ||||||||
Fucosyl transferase 9 = EC 2.4.1.152.
Acetylglucosamine transferase 1 = EC 2.4.1.1.
UDP-β-Gal N-acetylglucosamine transferase 3, also known as O-linked N-acetylglucosamine transferase = EC 2.4.1.255.
Complex dyad symmetry. Resemblance to alphaherpesvirus Ori-L and Ori-S as well as HHV6 Ori-Lyt, but not to cytomegalovirus Ori-Lyt, much larger than EEHV1 and EEHV5 versions, 3× 90-bp and other dyad symmetry elements with 5× OBP-binding site motifs plus 35× 20-bp AT-rich tandem repeats.
No matches to other betaherpesvirus vGPCRs.
83% DNA match over 54 bp to terminal repeat motifs at 2852 to 2905 and 180311 to 180358 in EEHV1B(Emelia).
72% DNA match over 112 bp to terminal repeat motifs present in all three copies of the “a” sequence of HSV-1(KOS).
The six clusters of genes or exons with unusually low GC content are shown in bold.
Abbreviations: TR, tandem repeat; N-term, N terminal; dom, domain; memb, membrane; C-term, C terminal; esp, especially; cons, conservation; SS, single stranded; UDG, uracil DNA glycosylase; Frag, fragmented; F, forward strand; C, complementary strand.
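The protein sizes listed in the table can be cross-checked against the position coordinates: an ORF with inclusive coordinates a to b spans |b − a| + 1 nucleotides, and dividing by 3 and dropping the stop codon gives the residue count. The following is a minimal Python sketch of that arithmetic, assuming an unspliced single-exon ORF whose coordinates include the stop codon; the function name `protein_size_from_coords` is hypothetical and not part of the original work.

```python
def protein_size_from_coords(start: int, end: int) -> int:
    """Amino acid count for an unspliced ORF given inclusive coordinates.

    Assumption: the coordinates include the stop codon. Complementary-strand
    genes are listed with start > end, so the absolute difference is used.
    """
    length_nt = abs(end - start) + 1
    if length_nt % 3 != 0:
        raise ValueError(f"ORF length {length_nt} is not a multiple of 3")
    return length_nt // 3 - 1  # drop the stop codon


# Rows copied from the table: (gene, start, end, listed protein size in aa)
rows = [
    ("E1", 2061, 3608, 515),
    ("E3", 3981, 4928, 315),
    ("E6", 19679, 18855, 274),
    ("U38", 102273, 105527, 1084),
]
for gene, start, end, listed in rows:
    assert protein_size_from_coords(start, end) == listed, gene
```

Spliced genes (for example U42 or U66, which are listed exon by exon) would not satisfy this simple relation and would need their exon lengths summed first.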
FIG 2 Global alignment patterns for the intact EEHV4 genome compared to EEHV1 and HCMV. The dot matrix diagrams showing direct linear nucleotide alignments were generated as implemented at http://blast.ncbi.nlm.nih.gov/Blast.cgi. (a) Comparison across the intact 206-kb genome of EEHV4(Baylor) (KT832477) from the GC-rich branch of the Proboscivirus genus with the intact 180-kb genome of EEHV1A(Kimba) (KC618527) from the AT-rich branch of the Proboscivirus genus derived from the work of Ling et al. (24) when aligned in the same orientation. (b) Comparison across the intact 206-kb genome of EEHV4(Baylor) (KT832477) with the intact 235-kb genome of HCMV(Merlin) (AY446834.2) in the Cytomegalovirus genus of the mammalian betaherpesvirus subfamily derived from the work of Dolan et al. (41), with the latter aligned in the standard orientation.
FIG 3 EEHV4 carries a very large family of distantly related paralogous 7xTM and vGPCR-like genes. Linear distance-based Bayesian bootstrap phylogenetic tree comparisons for all 26 members of the 7xTM-containing multigene family from the EEHV4(Baylor) GC-rich branch Proboscivirus compared to their nearest host cell analogue Lox RAIP3 as the outgroup. The entire family is loosely divided into five subgroups, whose designated prototypes of E15, E3, E18, E14, and E6 are indicated. Note that, as indicated by the distance values, all of these paralogues are very highly diverged from one another, with the exception of E14.1 and E14.2, which most likely represent the most recent duplication event. A subset of the genes in this family (p3 or δ3) exhibit features of GPCR genes as described in greater detail in the accompanying paper (27).
FIG 4 Codon-specific scanning GC content panels showing the wobble codon GC bias effect across selected representative segments of the EEHV4(Baylor) genome. Diagrams showing the percent G-plus-C content of each of the three potential translated codon frames across four selected 18-kb segments of the EEHV4B(Baylor) genomic DNA sequence as implemented under the codon-specific G-plus-C percent toolbox item in MacVector 12. Short vertical bars indicate forward direction terminators. Annotated ORF positions and sizes are denoted by open arrows. Highly GC-biased wobble position blocks with average values between 80 and 100% are marked with solid bars. For a hypothetical ORF with an initiator codon beginning in frame 1 at position x in the diagram, the wobble position codons are represented by the succeeding frame 3 line (or frame 1 for a frame 2 initiator and frame 2 for a frame 3 initiator). (a) Forward-directed strand across coordinates 1 to 18000 at the extreme left side encompassing 10 out of 11 rightward-oriented genes from E1 to E4C, including E4 (vGCNT1) with high wobble position GC bias. (b) Inverted segment of the complementary strand across coordinates 37000 to 19000 encompassing predominantly leftward-oriented genes (16 between E6 and E16D) and two rightward-oriented genes (E9A and E17), including 11 genes displaying uniformly high wobble codon GC bias plus seven genes in two blocks, E7B-E9 (vOGT)-E9A-E9B-E9C and E16D (vECTL)-E17-E17A, that do not display wobble GC bias (all of the latter are labeled with asterisks). (c) Forward strand across coordinates 86000 to 104000 encompassing six rightward-oriented core region genes, U43 (PRI) to U38 (POL), with high wobble position GC bias on either side of the predicted novel Ori-Lyt domain. (d) Inverted segment of the complementary strand from the extreme right side across coordinates 187867 to 205894 encompassing E44A (ORF-S), E44 (ORF-L), and E40 (ORF-K). The only three high-GC-bias wobble codon blocks found within this region occur in ORF-S and in the conserved C-terminal domains of ORF-L and ORF-K (marked with solid bars).
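The wobble-position GC bias described in this legend can be illustrated by computing percent G-plus-C separately for each of the three codon positions of an ORF. The sketch below is not the MacVector scanning tool: it returns whole-ORF values rather than a windowed plot, and the function name `gc_by_codon_position` and the toy sequence are assumptions made for illustration.

```python
def gc_by_codon_position(orf: str) -> tuple:
    """Percent G+C at codon positions 1, 2 and 3 of an in-frame ORF.

    Assumption: the sequence starts at the first base of a codon.
    A trailing partial codon, if any, is ignored.
    """
    seq = orf.upper()
    usable = len(seq) - len(seq) % 3
    if usable == 0:
        raise ValueError("sequence shorter than one codon")
    percents = []
    for pos in range(3):
        bases = seq[pos:usable:3]  # every third base, starting at this codon position
        gc = sum(base in "GC" for base in bases)
        percents.append(100.0 * gc / len(bases))
    return tuple(percents)


# Toy ORF (not from the genome): codons ATG GCC GAC TTC
print(gc_by_codon_position("ATGGCCGACTTC"))  # (50.0, 25.0, 100.0)
```

In the toy example the third (wobble) position is 100% G-plus-C while positions 1 and 2 are much lower, which is the pattern the solid bars in the figure mark for blocks averaging 80 to 100%.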
FIG 5 Complex tandem and inverted repeat patterns within the predicted Ori-Lyt domain of EEHV4. (a) Dot matrix self-comparison of the DNA from coordinates 92021 to 93860 encompassing the entire intergenic region between the N terminus of U41 (MDBP) and the C terminus of U42 (MTA) from EEHV4(Baylor). The proposed 1.2-kb Ori-Lyt domain spans from coordinates 92120 to 93325. Direct tandemly repeated structure is indicated by additional lines parallel to the main diagonal, whereas inverted repeats are indicated by additional lines perpendicular to the main diagonal. (b) Features of the unusual expanded dyad symmetry Ori-Lyt region of EEHV4 compared to those of EEHV1 and other alphaherpesvirus-like dyad-symmetry-type origins. Cartoon diagram comparing the sizes and major structural features of the predicted dyad symmetry domains of EEHV4(Baylor) and EEHV1A(Kimba) with those of the HHV6 version and with both Ori-L and Ori-S of HSV-1. Circles denote alternating A-plus-T dinucleotide runs. Short horizontal pointed bars represent copies of the consensus OBP-binding site motif. Other, larger sets of arrows designate various types of inverted repeats as well as the 24× and 10× copies of the 20-bp AT-rich direct tandem repeats in the EEHV4(Baylor) version.
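The geometry of the dot matrix in panel a can be reproduced with a simple word-match self-comparison. In the minimal sketch below, direct repeats yield points with a constant offset j − i (lines parallel to the main diagonal), and inverted repeats yield points with a constant sum i + j (lines perpendicular to it). The word size, function names, and toy sequences are assumptions, and the original figure was not necessarily produced this way.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def self_dot_matrix(seq: str, word: int = 4):
    """Return (direct, inverted) sets of (i, j) exact word matches.

    direct:   the word at i equals the word at j
    inverted: the word at i equals the reverse complement of the word at j
    """
    seq = seq.upper()
    index = {}
    for i in range(len(seq) - word + 1):
        index.setdefault(seq[i:i + word], []).append(i)
    direct, inverted = set(), set()
    for kmer, starts in index.items():
        partners = index.get(revcomp(kmer), [])
        for i in starts:
            for j in starts:
                direct.add((i, j))
            for j in partners:
                inverted.add((i, j))
    return direct, inverted


# Toy direct repeat: ACGG occurs at positions 0 and 4
direct, _ = self_dot_matrix("ACGGACGG")
assert (0, 4) in direct

# Toy inverted repeat: CCGT at position 8 is the reverse complement of ACGG at 0
_, inverted = self_dot_matrix("ACGGTTTTCCGT")
assert (0, 8) in inverted
```

Plotting the two point sets for the region at coordinates 92021 to 93860 would be expected to show both kinds of lines, given the tandem and dyad symmetry elements described in the legend.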
FIG 6 Positions and sizes of three identified EEHV4A-EEHV4B chimeric domains and boundaries relative to those of EEHV1A-EEHV1B chimeric domains. The diagrams show Simplot (40) comparisons of the nucleotide identity patterns between EEHV4A(NAP22) and EEHV4B(Baylor) across three mapped chimeric domains, CD-I (1.1 kb), CD-II (3.7 kb), and CD-IV (4.7 kb), shown as blue lines in comparison to superimposed data for CD-I (3.2 kb) and CD-II (3.7 kb) of EEHV1A(Kimba) versus EEHV1B(Emelia), shown as red lines. CD-IV of EEHV4 has no equivalent in EEHV1, and there are no data available for the presumed region CD-III of EEHV4. (Top) CD-I chimeric region within U39 (gB) of EEHV4A versus EEHV4B at EEHV4(Baylor) map coordinates 99993 to 101150 compared to the much larger overlapping CD-I of EEHV1A versus EEHV1B, which encompasses part of U40, all of U39 (gB), and part of U38 (POL). Vertical arrows denote the positions of the EEHV1A-1B chimeric domain boundaries. (Middle) CD-II chimeric region encompassing part of ORF-J, all of gN-gO-gH, and part of TK of EEHV4A versus EEHV4B at EEHV4(Baylor) map coordinates 131750 to 135400 compared to the nearly equivalent superimposed CD-II of EEHV1A versus EEHV1B. Vertical arrows denote the positions of the EEHV1A-1B chimeric domain boundaries. (Bottom) CD-IV chimeric domain of EEHV4A versus EEHV4B at EEHV4(Baylor) map coordinates 20541 to 25210. Vertical arrows denote the positions of the EEHV4A-4B chimeric domain boundaries that encompass part of E7 and all of E7B, E9, E9A (vOGT), E9B, and E9C but end before E10A.
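The identity profiles in this figure are sliding-window comparisons of two aligned sequences, where a chimeric domain shows up as a localized dip in percent identity. The sketch below is a simplified stand-in for that idea and not the Simplot program itself: it assumes the two sequences are already aligned to equal length with `-` as the gap character, and the window size, step, and function name are illustrative choices that are not stated in the source.

```python
def sliding_identity(a: str, b: str, window: int = 200, step: int = 20):
    """Percent nucleotide identity in sliding windows over two aligned sequences.

    Returns a list of (window_midpoint, percent_identity) pairs.
    Columns with a gap in either sequence are excluded from the count.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    if window <= 0 or step <= 0 or window > len(a):
        raise ValueError("invalid window or step for this alignment")
    a, b = a.upper(), b.upper()
    points = []
    for start in range(0, len(a) - window + 1, step):
        pairs = [
            (x, y)
            for x, y in zip(a[start:start + window], b[start:start + window])
            if x != "-" and y != "-"
        ]
        if not pairs:
            continue  # window consists entirely of gap columns
        same = sum(x == y for x, y in pairs)
        points.append((start + window // 2, 100.0 * same / len(pairs)))
    return points


# Toy alignment (not real data): one mismatch in the first window of five
print(sliding_identity("ACGTACGTAC", "ACGTTCGTAC", window=5, step=5))
# [(2, 80.0), (7, 100.0)]
```

Plotting the returned midpoints against percent identity gives a curve of the kind shown in the panels, with domain boundaries read off where the curve drops and recovers.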